Speech to text with the Google Cloud Speech API and Ruby
This guide is a quick overview of how to set up speech-to-text conversion with the Google Cloud Speech API in your Ruby application. The Google Cloud Speech API provides speech recognition for over 80 languages, powered by machine learning.
First, let’s head over to the Google Cloud Platform to create an account. After creating your account, create your first project. To use the Cloud Speech API, activate the API from your API management dashboard.
Finally, after activating the API, go to your API management dashboard and select Credentials. Here you can create an API key that we will use for authentication.
Setting up the Google API client
In this example, we will use a basic Ruby project/gem. Feel free to do this in Rails or any other framework you prefer.
bundle gem my_project
To access the Google Cloud Speech API, we will use the Google API Ruby client.
Add the gem to your Gemfile:
gem 'google-api-client', '~> 0.9'
The API client we want to use is speech_v1beta1. In service.rb, a few methods are provided that allow us to work with the speech recognition API. To perform the audio processing, the Ruby API client provides us with two options:
- synchronous, with the sync_recognize_speech method. The results are returned only after all audio has been sent and processed.
- asynchronous, with the async_recognize_speech method. The audio is processed asynchronously, and the call returns an ‘operation’ with the operation status and/or results. This is the method that we will use in our example.
Processing our audio file
In the Transcriber module, add the following two methods:
- async_request: This will send a request object containing our audio file and configuration.
- get_operation: This will retrieve the operation with the status of our request, and the results if the operation is finished.
In the example, we use the Google Cloud audio sample (brooklyn.flac). If you want, you can change this to use the content attribute instead, reading the audio from a local .flac file.
lib/transcriber.rb
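A minimal sketch of what such a module could look like, not necessarily the original listing. It assumes the class names generated by the google-api-client gem for speech_v1beta1 (SpeechService, AsyncRecognizeRequest, RecognitionConfig, RecognitionAudio). The GOOGLE_API_KEY environment variable, the bucket URI and the FLAC / 16000 Hz settings are placeholders to replace with your own values.

```ruby
# lib/transcriber.rb (sketch)
module Transcriber
  # Hypothetical placeholders: substitute your own API key and audio location.
  API_KEY   = ENV.fetch('GOOGLE_API_KEY', 'YOUR_API_KEY')
  AUDIO_URI = 'gs://your-bucket/brooklyn.flac'

  # Builds a client authenticated with the API key. The gem is required
  # here, so it is only loaded once a request is actually made.
  def self.service
    require 'google/apis/speech_v1beta1'
    client = Google::Apis::SpeechV1beta1::SpeechService.new
    client.key = API_KEY
    client
  end

  # Starts asynchronous recognition and returns the resulting operation.
  def self.async_request
    client = service
    speech = Google::Apis::SpeechV1beta1
    request = speech::AsyncRecognizeRequest.new(
      # Assumed audio settings; adjust them to match your file.
      config: speech::RecognitionConfig.new(encoding: 'FLAC', sample_rate: 16_000),
      audio: speech::RecognitionAudio.new(uri: AUDIO_URI)
    )
    client.async_recognize_speech(request)
  end

  # Fetches an operation by name to check its status and results.
  def self.get_operation(name)
    service.get_operation(name)
  end
end
```

Reading the key from the environment keeps it out of version control. Hard-coding it works for a quick experiment but is easy to commit by accident.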
Using the module
Run irb from the lib directory, followed by require_relative "transcriber" to load your module. Or use the console if you’re using Rails.
To make your first request:
request = Transcriber::async_request
This will respond with an ‘operation’ that has a name attribute. The name is what we will use to identify our operation and to retrieve the results.
Depending on the length of your file, it can take some time to process the audio. Use the get_operation method with the operation name to retrieve the operation status.
operation = Transcriber::get_operation(request.name)
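Rather than calling get_operation by hand until the audio is processed, the check can be wrapped in a small polling loop. The helper below is a hypothetical sketch (wait_for_operation is not part of the API client). It assumes the returned operation exposes a done attribute that becomes true once processing has finished.

```ruby
# Calls the given block until it returns an operation whose `done`
# attribute is truthy, waiting `interval` seconds between attempts.
# Returns the finished operation, or nil after `max_attempts` polls.
def wait_for_operation(max_attempts: 10, interval: 2)
  max_attempts.times do
    operation = yield
    return operation if operation.done
    sleep interval
  end
  nil
end
```

It could then be used as wait_for_operation { Transcriber::get_operation(request.name) }, with the attempt count and interval tuned to the length of your audio.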
If the operation is finished, check out the transcript from your request with
operation.response["results"]
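The results come back as nested data, so a small helper can pull out just the text. This sketch assumes each result holds a list of "alternatives" with a "transcript" field, ordered with the most likely alternative first. The transcripts method name is made up for illustration.

```ruby
# Returns the first (assumed most likely) transcript of each result.
# Results without any alternatives are skipped.
def transcripts(results)
  Array(results).map do |result|
    best = Array(result['alternatives']).first
    best && best['transcript']
  end.compact
end
```

For example, transcripts(operation.response["results"]) would return an array of strings, one per recognized segment of audio.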
For more information about the Google Cloud Speech API: