Blackberry 10 Speech to Text on phone

Is anyone aware of any Blackberry 10 Speech to Text libraries that can use a predefined word (grammar) list and do their processing entirely on the phone? I have a client that doesn't want to use their data plan (and in some cases won't have internet connectivity) for processing speech to text, but is really only interested in using a handful of key words to perform certain actions in their application.
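For a sense of what an on-device keyword list looks like in practice, here is a minimal keyword-spotting sketch using CMUSphinx's pocketsphinx, the same engine behind OpenEars mentioned in an answer below; it runs entirely offline. The classic Python bindings stand in for the C API a BlackBerry 10 app would call, and the model paths, keyword file, and audio file are all placeholders.

```python
# Minimal offline keyword spotting with the classic pocketsphinx
# Python bindings (pip install pocketsphinx). All paths are placeholders.
from pocketsphinx import Decoder

# keywords.list contains one phrase per line with a detection threshold:
#   open door /1e-20/
#   take photo /1e-15/
config = Decoder.default_config()
config.set_string('-hmm', 'model/en-us')                # acoustic model
config.set_string('-dict', 'model/cmudict-en-us.dict')  # pronunciation dict
config.set_string('-kws', 'keywords.list')              # predefined word list

decoder = Decoder(config)
decoder.start_utt()
with open('mic_capture.raw', 'rb') as audio:            # 16 kHz, 16-bit mono PCM
    while True:
        buf = audio.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
        if decoder.hyp() is not None:                   # a keyword was spotted
            print('action:', decoder.hyp().hypstr)
            decoder.end_utt()                           # reset, keep listening
            decoder.start_utt()
decoder.end_utt()
```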

Related

Ways to transcribe speech when the language is not identified?

I'm trying to build software that will identify the language being spoken.
My plan is to use Google's Cloud Speech-to-Text to transcribe the speech, and put the transcription through the Cloud Translation API to detect its language.
However, since Speech-to-Text requires a language code to be set prior to transcribing, I was planning to run it multiple times with different sets of languages and compare the "confidence" values to find the most confident transcription, which would then be put through the Cloud Translation API.
Would this be the ideal way, or are there other possible options?
Maybe you can check the Detecting language spoken automatically page in the Google Cloud Speech documentation.
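Should you still want to try the multi-pass approach from the question, here is a minimal sketch using the google-cloud-speech Python client (>= 2.0); the bucket URI, sample rate, and candidate list are placeholders, and note that every pass is billed as a separate request.

```python
from google.cloud import speech  # pip install google-cloud-speech

def most_confident_language(audio_uri, candidate_languages):
    """Transcribe once per candidate language code and keep the result
    with the highest confidence, as proposed in the question."""
    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=audio_uri)
    best = None
    for lang in candidate_languages:
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,   # must match the audio
            language_code=lang,
        )
        response = client.recognize(config=config, audio=audio)
        for result in response.results:
            top = result.alternatives[0]
            if best is None or top.confidence > best[1]:
                best = (lang, top.confidence, top.transcript)
    return best  # (language_code, confidence, transcript) or None

# e.g. most_confident_language("gs://my-bucket/sample.wav",
#                              ["en-US", "ja-JP", "de-DE"])
```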

Reading and identifying multiple USB keyboards

I'm in charge of technology at my local camera club, a not-for-profit charity in Malvern UK. We have a database-centric competition management system which is home-brewed by me in Delphi 6 and now we wish to add a scoring system to it. This entails attaching 5 x cheap-and-standard USB numeric keypads to a PC (using a USB hub) and being able to programmatically read the keystrokes from each keyboard as they are entered by the 5 judges. Of course, they will hit their keys in a completely parallel and asynchronous way, so I need to identify which key has been struck by which judge, so as to assemble the scores (i.e. possible multiple keystrokes each) they have entered individually.
From what I can gather, Windows grabs the attention of keyboard devices and looks after the character strings they produce, simply squirting the chars into the normal keyboard queue (and I have confirmed that by experiment!). This won't do for my needs, as I really must collect the 5 sets of (possibly multiple) key-presses and allocate the received characters to 5 separate variables for the scoring system to manipulate thereafter.
Can anyone (a) suggest a method for doing this in Delphi and (b) offer some guide to the code that might be needed? Whilst I am pretty Delphi-aware, I have no experience of accessing USB devices, or capturing their data.
Any help or guidance would be most gratefully received!
Windows provides a Raw Input API, which can be used for this purpose. In the reference at the link provided, one of the advantages is listed as:
An application can distinguish the source of the input even if it is from the same type of device. For example, two mouse devices.
While this is more work than regular Windows input messages, it is a lot easier than writing USB device drivers.
One example of its use (while not written in Delphi) demonstrates what it can do, and provides some information on using it:
Using Raw Input from C# to handle multiple keyboards.
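To make the shape of the solution concrete, here is a runnable Python/ctypes sketch (Windows only) that makes the same Win32 calls a Delphi implementation would: RegisterRawInputDevices to subscribe, then GetRawInputData inside a WM_INPUT handler, using header.hDevice to tell the five keypads apart. The window class name and the judge mapping are placeholders.

```python
import ctypes
from ctypes import wintypes

user32 = ctypes.windll.user32
kernel32 = ctypes.windll.kernel32

# winuser.h constants
RIDEV_INPUTSINK  = 0x00000100   # receive input even when not in foreground
RIM_TYPEKEYBOARD = 1
WM_INPUT         = 0x00FF
WM_KEYDOWN       = 0x0100
RID_INPUT        = 0x10000003
HWND_MESSAGE     = -3           # parent for a message-only window

class RAWINPUTDEVICE(ctypes.Structure):
    _fields_ = [("usUsagePage", wintypes.USHORT), ("usUsage", wintypes.USHORT),
                ("dwFlags", wintypes.DWORD), ("hwndTarget", wintypes.HWND)]

class RAWINPUTHEADER(ctypes.Structure):
    _fields_ = [("dwType", wintypes.DWORD), ("dwSize", wintypes.DWORD),
                ("hDevice", wintypes.HANDLE), ("wParam", wintypes.WPARAM)]

class RAWKEYBOARD(ctypes.Structure):
    _fields_ = [("MakeCode", wintypes.USHORT), ("Flags", wintypes.USHORT),
                ("Reserved", wintypes.USHORT), ("VKey", wintypes.USHORT),
                ("Message", wintypes.UINT), ("ExtraInformation", wintypes.ULONG)]

class RAWINPUT(ctypes.Structure):
    _fields_ = [("header", RAWINPUTHEADER), ("keyboard", RAWKEYBOARD)]

# 64-bit safe prototypes
user32.GetRawInputData.argtypes = [wintypes.HANDLE, wintypes.UINT,
                                   ctypes.c_void_p,
                                   ctypes.POINTER(wintypes.UINT), wintypes.UINT]
user32.DefWindowProcW.restype = ctypes.c_ssize_t
user32.DefWindowProcW.argtypes = [wintypes.HWND, wintypes.UINT,
                                  wintypes.WPARAM, wintypes.LPARAM]
user32.CreateWindowExW.restype = wintypes.HWND
kernel32.GetModuleHandleW.restype = wintypes.HMODULE

WNDPROC = ctypes.WINFUNCTYPE(ctypes.c_ssize_t, wintypes.HWND, wintypes.UINT,
                             wintypes.WPARAM, wintypes.LPARAM)

def wnd_proc(hwnd, msg, wparam, lparam):
    if msg == WM_INPUT:
        size = wintypes.UINT(0)
        user32.GetRawInputData(lparam, RID_INPUT, None, ctypes.byref(size),
                               ctypes.sizeof(RAWINPUTHEADER))
        buf = ctypes.create_string_buffer(size.value)
        user32.GetRawInputData(lparam, RID_INPUT, buf, ctypes.byref(size),
                               ctypes.sizeof(RAWINPUTHEADER))
        ri = ctypes.cast(buf, ctypes.POINTER(RAWINPUT)).contents
        if (ri.header.dwType == RIM_TYPEKEYBOARD
                and ri.keyboard.Message == WM_KEYDOWN):
            # hDevice is a stable per-keyboard handle: map it to judge 1..5
            print("keyboard %s pressed vkey %d"
                  % (ri.header.hDevice, ri.keyboard.VKey))
    return user32.DefWindowProcW(hwnd, msg, wparam, lparam)

class WNDCLASSW(ctypes.Structure):
    _fields_ = [("style", wintypes.UINT), ("lpfnWndProc", WNDPROC),
                ("cbClsExtra", ctypes.c_int), ("cbWndExtra", ctypes.c_int),
                ("hInstance", wintypes.HINSTANCE), ("hIcon", wintypes.HICON),
                ("hCursor", wintypes.HANDLE), ("hbrBackground", wintypes.HBRUSH),
                ("lpszMenuName", wintypes.LPCWSTR),
                ("lpszClassName", wintypes.LPCWSTR)]

proc = WNDPROC(wnd_proc)                      # keep the callback alive
wc = WNDCLASSW(lpfnWndProc=proc,
               hInstance=kernel32.GetModuleHandleW(None),
               lpszClassName="RawKbSink")
user32.RegisterClassW(ctypes.byref(wc))
hwnd = user32.CreateWindowExW(0, wc.lpszClassName, None, 0, 0, 0, 0, 0,
                              wintypes.HWND(HWND_MESSAGE), None,
                              wc.hInstance, None)

# usage page 0x01 / usage 0x06 = generic desktop keyboard
rid = RAWINPUTDEVICE(0x01, 0x06, RIDEV_INPUTSINK, hwnd)
user32.RegisterRawInputDevices(ctypes.byref(rid), 1,
                               ctypes.sizeof(RAWINPUTDEVICE))

msg = wintypes.MSG()
while user32.GetMessageW(ctypes.byref(msg), None, 0, 0) > 0:
    user32.TranslateMessage(ctypes.byref(msg))
    user32.DispatchMessageW(ctypes.byref(msg))
```

The same structures and calls translate almost one-to-one to Delphi records and Windows unit declarations; the key design point is that each keypad's hDevice handle persists for the session, so a dictionary from handle to judge number is enough to demultiplex the five score entries.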

How to detect the language spoken in the Google Cloud Platform Machine Learning Speech API

Is there an option to automatically detect the spoken language using Google Cloud Platform Machine Learning's Speech API?
https://cloud.google.com/speech/docs/languages lists the supported languages; the user needs to set this parameter manually to perform speech-to-text.
As of last month, Google added support for detection of spoken languages into its speech-to-text API. Google Cloud Speech v1p1beta1
It's a bit limited, though: you have to provide a list of probable language codes, up to 3 of them only, and it's said to be supported only for voice command and voice search modes. It's useful if you have a clue which other languages may be in your audio.
From their docs:
alternative_language_codes[]: string
"Optional. A list of up to 3 additional BCP-47 language tags, listing possible alternative languages of the supplied audio. See Language Support for a list of the currently supported language codes. If alternative languages are listed, the recognition result will contain recognition in the most likely language detected, including the main language_code. The recognition result will include the language tag of the language detected in the audio. NOTE: This feature is only supported for Voice Command and Voice Search use cases, and performance may vary for other use cases (e.g., phone call transcription)."
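Going by that quoted reference, a minimal sketch with the v1p1beta1 Python client might look like this; the bucket URI, sample rate, and language codes are placeholders.

```python
from google.cloud import speech_v1p1beta1 as speech  # beta client

client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",                          # main language
    alternative_language_codes=["es-ES", "fr-FR"],  # up to 3 alternatives
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/voice-command.wav")

response = client.recognize(config=config, audio=audio)
for result in response.results:
    # each result carries the language tag actually detected in the audio
    print(result.language_code, "->", result.alternatives[0].transcript)
```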
Requests to Google Cloud Speech API require the following configuration parameters: encoding, sampleRateHertz and languageCode.
https://cloud.google.com/speech/reference/rest/v1/RecognitionConfig
Thus, the Google Cloud Speech API service cannot automatically detect the language used; the languageCode parameter configures the service to recognize speech in that specific language.
If you had in mind a parallel with the Google Cloud Translation API, where the input language is automatically detected, please consider that automatically detecting the language used in an audio file requires much more bandwidth, storage space and processing power than doing the same for a text file. Also, Google Cloud Speech API offers Streaming Speech Recognition, a real-time speech-to-text service, where the languageCode parameter must be supplied up front.

Synchronise an audio to accurate transcription on iOS

I'm trying to synchronise text in my iOS app to audio that is being streamed simultaneously. The text is a very accurate transcription of the audio, previously done manually. Is it possible to use keyword spotting or audio-to-text to assist with this?
The text is already indexed in the app with the CLucene search engine, so it'll be very easy to search for any string of text/words in any paragraph. Even if the audio-to-text conversion is not 100% accurate, the search engine should be able to handle it and still find the best match within a couple of tries.
Could you point me to any open source libraries for the audio-to-text conversion that would assist with this? I would prefer one that can convert the streamed audio to text directly rather than relying on the microphone, as is common in speech-to-text libraries, since users may use headphones with the app and/or there may be background noise.
To recognize an audio file or audio stream on iOS you can use CMUSphinx with OpenEars.
To recognize a file you need to set pathToTestFile; see the details at
http://www.politepix.com/openears/#PocketsphinxController_Class_Reference
To recognize a stream you can feed the audio into pocketsphinx through the Pocketsphinx API.
Since you know the text beforehand you can create a grammar from it and the recognition will be accurate.
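To illustrate the grammar idea, here is a sketch that builds a tiny JSGF grammar from the known transcript and decodes a pre-recorded file with pocketsphinx (the engine behind OpenEars); the Python bindings stand in for the iOS API, and all paths and the grammar content are placeholders.

```python
from pocketsphinx import Decoder

# transcript.gram, generated from the already-indexed text, e.g.:
#   #JSGF V1.0;
#   grammar transcript;
#   public <line> = now is the winter of our discontent;
config = Decoder.default_config()
config.set_string('-hmm', 'model/en-us')
config.set_string('-dict', 'model/cmudict-en-us.dict')
config.set_string('-jsgf', 'transcript.gram')   # grammar from the known text

decoder = Decoder(config)
decoder.start_utt()
with open('stream_dump.raw', 'rb') as audio:    # 16 kHz, 16-bit mono PCM
    while True:
        buf = audio.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

if decoder.hyp() is not None:
    print(decoder.hyp().hypstr)                 # search this via CLucene
```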

Use audio files or text to speech for iOS application

I am creating an iOS game in which I have to inform the user about events in the game with voice, e.g. "you have moved one piece", "two pieces", or "well done, you have performed well".
The problem is that there are a large number of such voice prompts, and if I ship an audio file for each one the app size will grow very large.
The second option I have discovered is to use a text-to-speech library. I have tried "OpenEars", but the issue is that I want a voice like a cartoon character or a bird, which is not available in any of the open source text-to-speech libraries as far as I have searched.
Can anybody suggest a better way to handle this, or a text-to-speech framework with the different voice capabilities mentioned above?
VoiceForge offers different TTS voices.
http://www.voiceforge.com
