I'm trying to build software that will identify the language being spoken.
My plan is to use Google's Cloud Speech-to-Text to transcribe the speech, then put the transcription through the Cloud Translation API to detect its language.
However, since Speech-to-Text requires a language code to be set before transcribing, I was planning to run it multiple times with different language codes, compare the "confidence" values to find the most confident transcription, and pass that transcription to the Cloud Translation API.
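Roughly what I have in mind, as a sketch using the Python client library (the candidate language list and audio URI below are just placeholders):

```python
from google.cloud import speech

client = speech.SpeechClient()
audio = speech.RecognitionAudio(uri="gs://my-bucket/sample.wav")  # placeholder URI

best = None
for code in ["en-US", "es-ES", "ja-JP"]:  # candidate languages to try
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=code,
    )
    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        alternative = result.alternatives[0]  # top alternative carries the confidence
        if best is None or alternative.confidence > best[0]:
            best = (alternative.confidence, code, alternative.transcript)

print(best)  # (confidence, language_code, transcript) of the most confident run
```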
Would this be the ideal way? Or would there be any other possible options?
Maybe you can check the "Detecting language spoken automatically" page in the Google Cloud Speech documentation.
I am working on an application that gathers a user's voice input for an IVR. The input we're capturing is a limited set of proper nouns, but even though we have added hints for all of the possible options, we very frequently get back unintelligible results, possibly because our users have accents from all parts of the world. I'm looking for a way to further improve the speech recognition results beyond just using hints. The available Google adaptation classes won't be useful, as there are none that match the type of input we're gathering. I see that Twilio recently added something called experimental_utterances that may help, but I'm finding little technical documentation on what it does or how to implement it.
Any guidance on how to improve our speech recognition results?
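For reference, here is roughly how hints can be supplied via Google's v1p1beta1 Python client; the phrases, boost value and telephony settings below are illustrative placeholders, and whether Twilio exposes these knobs is a separate question:

```python
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

# Weighted hints: in v1p1beta1, each SpeechContext can carry a boost that
# weights its phrases more strongly than a plain hint list would.
context = speech.SpeechContext(
    phrases=["Acme Corp", "Zhang Wei", "Okonkwo"],  # placeholder proper nouns
    boost=15.0,  # higher values favor these phrases more aggressively
)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=8000,  # typical telephony sample rate
    language_code="en-US",
    speech_contexts=[context],
    model="phone_call",  # telephony-tuned model, where available
)
```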
Google does a decent job recognizing proper names, but only asynchronously, not in real time. I've not seen a PaaS tool that can do this in real time. I recommend changing your approach: identify callers based on ANI or an account number, or have them record their name for manual transcription.
david
I've noticed that certain apps on Android (e.g., Gboard) support translating phrases such as 'poop emoji' into the actual emoji as part of speech recognition. I was wondering if this is something that is supported through Google's Cloud Speech APIs that I could similarly use in my own applications?
In my initial scan of the API I can't see anything that might indicate a way to turn this on (e.g., RecognitionConfig et al. have no obvious toggles for it), and in some quick one-off tests in my own app I wasn't given emoji-fied results from the service.
I've done a bunch of googling but found nothing so far.
Any insight here would be awesome, thanks!
-edit- Thanks to the answer below I have learned this currently is not supported. I've gone to Google's issue tracker to request this feature. If anyone wishes to track the feature request the link is:
https://issuetracker.google.com/u/1/issues/113978818
The Cloud Speech-to-Text API doesn't currently support recognition of emoji phrases; however, you can use the Send Feedback button located at the lower left and upper right corners of the service's public documentation, or take a look at the Issue Tracker tool if you want to raise a Speech API feature request to notify Google about this desired functionality.
Finally, you can refer to the Release Notes section of the Speech-to-Text API to keep track of new features and functionality added to the service.
Is there an option to automatically detect the spoken language using the Google Cloud Platform Machine Learning Speech API?
https://cloud.google.com/speech/docs/languages lists the supported languages; the user needs to set this parameter manually to perform speech-to-text.
Thanks
Mahesh
As of last month, Google added support for detecting the spoken language to its speech-to-text API: Google Cloud Speech v1p1beta1.
It's a bit limited, though: you have to provide a list of probable language codes, up to 3 of them only, and it's said to be supported only for voice command and voice search modes. It's useful if you have a clue which other languages may be in your audio.
From their docs:
alternative_language_codes[]: string

"Optional. A list of up to 3 additional BCP-47 language tags, listing possible alternative languages of the supplied audio. See Language Support for a list of the currently supported language codes. If alternative languages are listed, the recognition result will contain recognition in the most likely language detected, including the main language_code. The recognition result will include the language tag of the language detected in the audio. NOTE: This feature is only supported for Voice Command and Voice Search use cases, and performance may vary for other use cases (e.g., phone call transcription)."
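Put together, a minimal sketch with the v1p1beta1 Python client (the language codes and audio URI here are examples, not recommendations):

```python
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",  # main language
    alternative_language_codes=["es-ES", "fr-FR", "de-DE"],  # up to 3 alternatives
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/sample.wav")  # placeholder URI

response = client.recognize(config=config, audio=audio)
for result in response.results:
    # Each result is tagged with the language the service actually detected.
    print(result.language_code, result.alternatives[0].transcript)
```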
Requests to the Google Cloud Speech API require the following configuration parameters: encoding, sampleRateHertz and languageCode.
https://cloud.google.com/speech/reference/rest/v1/RecognitionConfig
Thus, it is not possible for the Google Cloud Speech API service to automatically detect the language used: the languageCode parameter configures the service to recognize speech in that specific language.
If you had in mind a parallel with the Google Cloud Translation API, where the input language is detected automatically, please consider that automatically detecting the language used in an audio file requires much more bandwidth, storage space and processing power than in a text file. Also, the Google Cloud Speech API offers Streaming Speech Recognition, a real-time speech-to-text service, where the languageCode parameter is especially required.
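To illustrate, a sketch of a streaming request with the Python client, where languageCode must be fixed up front (the audio chunk source is left abstract):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",  # required up front; streaming cannot auto-detect it
)
streaming_config = speech.StreamingRecognitionConfig(config=config, interim_results=True)

def audio_requests(chunks):
    # chunks: any iterable of raw audio byte buffers, e.g. read from a microphone
    for chunk in chunks:
        yield speech.StreamingRecognizeRequest(audio_content=chunk)

def transcribe_stream(chunks):
    responses = client.streaming_recognize(
        config=streaming_config, requests=audio_requests(chunks)
    )
    for response in responses:
        for result in response.results:
            print(result.alternatives[0].transcript)
```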
I've been researching several iOS speech recognition frameworks and have found it hard to accomplish something I would think is pretty straightforward.
I have an app that allows people to record their voices. After a recording is made, they have the option to create a text version.
Looking into the services out there (e.g., Nuance), most require you to use the microphone. OpenEars allows you to do this, but the dictionary is very limited because it is an offline solution (they recommend 300 or fewer words).
There are a few other things going on with the app that would make it very unappealing to switch from the current recording method. For what it is worth, I am using the Amazing Audio Engine framework.
Does anyone have any other suggestions for frameworks? Or is there a way to dig deeper with Nuance to transcribe a recorded file?
Thank you for your time.
There are a few cloud-based hosted speech recognition services you can use. You simply post the audio file to their URL and receive the text back. Most of them don't have any constraints on the vocabulary, and you can of course choose any recording method you like.
See here: Server-side Voice Recognition. Many of them offer free trials as well.
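The pattern is generic: POST the audio, parse the response. A sketch in Python, with an entirely hypothetical endpoint and credential; each hosted service documents its own URL, authentication scheme and request format:

```python
import requests

# Hypothetical endpoint and key: substitute whatever your chosen service documents.
URL = "https://api.example-speech-service.com/v1/recognize"

with open("recording.wav", "rb") as f:
    response = requests.post(
        URL,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"audio": f},
    )

print(response.json())  # most services return the transcript as JSON
```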
Is anyone aware of any BlackBerry 10 speech-to-text libraries that can use a predefined word (grammar) list and do their processing entirely on the phone? I have a client that doesn't want to use their data plan (and in some cases won't have internet connectivity) for speech-to-text processing, but is really only interested in using a handful of key words to perform certain actions in their application.