Speech to text with the Google Cloud Speech API and Ruby

This guide is a quick overview of how to set up speech-to-text conversion with the Google Cloud Speech API in a Ruby application. The Cloud Speech API uses machine learning to provide speech recognition for over 80 languages.

First, head over to the Google Cloud Platform and create an account. Then create your first project. To use the Cloud Speech API, enable it for that project from your API management dashboard.

Finally, once the API is enabled, go to your API management dashboard and select Credentials. Here you can create an API key, which we will use for authentication.

Setting up the Google API client

In this example, we will use a basic Ruby gem project. Feel free to do this in Rails or any other framework you prefer.

bundle gem my_project

To access the Cloud Speech API, we will use the Google API Ruby client (the google-api-client gem).

Add the gem to your Gemfile (then run bundle install):

gem 'google-api-client', '~> 0.9'

The client service we want to use is speech_v1beta1. Its service.rb provides the methods that let us work with the speech recognition API. To process audio, the Ruby client gives us two options:

  • Synchronous, with the sync_recognize_speech method. The call returns only after all audio has been sent and processed, and the response contains the results.
  • Asynchronous, with the async_recognize_speech method. The audio is processed asynchronously: the call returns an ‘operation’ that reports the processing status and, once finished, the results. This is the method we will use in our example.

Processing our audio file

In the Transcriber module (lib/transcriber.rb), add the following two methods:

  • async_request: sends a request object containing our audio file and recognition configuration, starting the asynchronous recognition.
  • get_operation: retrieves the operation by name, with the status of our recognition request and, once the operation is finished, the results.

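In the example, we use the Google Cloud audio sample brooklyn.flac, which is stored in Cloud Storage and referenced with the uri attribute. To transcribe a local .flac file instead, replace uri with the content attribute and set it to the file's raw audio bytes (for example File.binread(path)), not to the file path.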
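A minimal sketch of choosing between the two audio sources, using a hypothetical audio_source helper (not part of the client library) that returns a Hash with either a uri or a content key; you could use its result wherever the example sets the request's audio. It assumes the client library base64-encodes content for you before sending it, and that the local file is small enough to send inline.

```ruby
# Hypothetical helper (not part of google-api-client): build the attributes
# for the request's audio from either a Cloud Storage URI or a local file path.
def audio_source(location)
  if location.start_with?('gs://')
    # The API reads the file directly from Cloud Storage.
    { uri: location }
  else
    # Send the audio inline: the raw bytes of the file, not its path.
    { content: File.binread(location) }
  end
end
```

For the sample file, audio_source('gs://cloud-samples-tests/speech/brooklyn.flac') returns the same uri hash used in the example; a local path such as 'audio/my_recording.flac' (hypothetical) returns the file's bytes under content.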

lib/transcriber.rb

require 'google/apis/speech_v1beta1'

module Transcriber
  Speech = Google::Apis::SpeechV1beta1

  # Build a Speech API client authenticated with your API key.
  # The key is read from the GOOGLE_API_KEY environment variable
  # so it does not end up in your source code.
  def self.service
    speech = Speech::SpeechService.new
    speech.key = ENV.fetch('GOOGLE_API_KEY')
    speech
  end

  # Start asynchronous recognition; returns an operation you can look up by name.
  def self.async_request
    request = Speech::AsyncRecognizeRequest.new(
      config: Speech::RecognitionConfig.new(
        encoding: 'FLAC',
        sample_rate: 16000,      # sample rate of the audio in hertz
        language_code: 'en-US'
      ),
      audio: Speech::RecognitionAudio.new(
        # content: File.binread('path/to/audio.flac')
        uri: 'gs://cloud-samples-tests/speech/brooklyn.flac'
      )
    )

    service.async_recognize_speech(request)
  end

  # Retrieve an operation by name to check its status and results.
  def self.get_operation(operation_name)
    service.get_operation(operation_name)
  end
end
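To see the shape of what async_request sends, here is a sketch of the equivalent JSON request body for the v1beta1 asynchronous recognition call, built with Ruby's standard json library. The camelCase field names (sampleRate, languageCode) are an assumption based on the API's JSON naming convention; the Ruby client exposes them as sample_rate and language_code. A body like this can help when debugging a request or comparing it with the REST examples in the API documentation.

```ruby
require 'json'

# Sketch of the JSON body for a v1beta1 asynchronous recognition request,
# mirroring the AsyncRecognizeRequest built in lib/transcriber.rb.
# The camelCase field names are an assumption about the wire format.
payload = {
  config: {
    encoding: 'FLAC',
    sampleRate: 16000,        # sample rate of the audio in hertz
    languageCode: 'en-US'
  },
  audio: {
    uri: 'gs://cloud-samples-tests/speech/brooklyn.flac'
  }
}

puts JSON.pretty_generate(payload)
```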

Using the module

Start irb from the lib directory, then run require_relative 'transcriber' to load your module. If you are using Rails, use the Rails console instead.

To make your first request:

request = Transcriber.async_request

This returns an ‘operation’ with a name attribute. The name identifies the operation; we use it to check its status and retrieve the results.

Depending on the length of your file, it can take some time to process the audio. Use the get_operation method with the operation name to retrieve the operation status.

operation = Transcriber.get_operation(request.name)
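Instead of calling get_operation by hand until the operation finishes, you can poll it with an increasing delay. This is a minimal sketch using a hypothetical wait_for_operation helper. It assumes the returned operation exposes done and error attributes, as the API's Operation resource does (error being a status with a message); the fetch step is passed in as a lambda so the helper itself does not depend on the client library.

```ruby
# Hypothetical helper: poll a long-running operation until it is done,
# doubling the delay between polls up to max_interval.
# `fetch` is any callable that takes the operation name and returns an object
# responding to #done and #error, for example
#   ->(name) { Transcriber.get_operation(name) }
def wait_for_operation(name, fetch:, interval: 1.0, max_interval: 30.0, timeout: 600)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    operation = fetch.call(name)

    if operation.done
      # A finished operation carries either an error or a response.
      raise "Operation #{name} failed: #{operation.error.message}" if operation.error
      return operation
    end

    # Give up if the next wait would pass the deadline.
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) + interval > deadline
      raise "Timed out waiting for operation #{name}"
    end

    sleep interval
    interval = [interval * 2, max_interval].min
  end
end
```

Usage: operation = wait_for_operation(request.name, fetch: ->(name) { Transcriber.get_operation(name) }). Doubling the delay keeps short files responsive while avoiding a flood of status requests for long ones.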

Once the operation is finished (operation.done is true), read the transcript from your request's response with

operation.response["results"]
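Each entry in results holds one or more alternatives, ordered with the most probable first, and each alternative has a transcript and a confidence score. A minimal sketch of pulling out the best transcript per result, assuming the response is the plain Hash with string keys that operation.response returns; transcripts is a hypothetical helper name.

```ruby
# Collect the most likely transcript from each result in an operation's response.
# `response` is the Hash returned by operation.response; each result lists its
# alternatives with the most probable one first.
def transcripts(response)
  Array(response['results']).map do |result|
    best = Array(result['alternatives']).first
    best && best['transcript']
  end.compact
end
```

Usage: puts transcripts(operation.response).join(' ')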

For more information about the Google Cloud Speech API, see:

Cloud Speech API - Getting started