How to get Google Cloud Speech (voice-to-text) to recognize letters and sounds - google-cloud-speech

Is there a way to get the Google Cloud Speech API to recognize letters and letter sounds?
As an example use case, I want to build a spelling game where a voice says, "Spell restaurant", and the recognizer listens for each letter and recognizes them as they come through.
Similarly, is there a way to identify specific letter sounds like "oo", "ew", "k" (as in cat), or "s" (as in circle)?

It seems to already do a reasonable job at least in some cases. E.g., when I spell out "cee ay tee" it recognizes "c a t". It is also possible to provide "word hints" as described in this post:
Google Cloud Speech API word Hints
Supplying a list of single-letter "words" as hints, i.e.
phrases = ['a', 'b', 'c', 'd' ... ]
appears to give improved results in this area.
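As a rough illustration of the hint approach, the sketch below builds a request body for the REST speech:recognize method with every letter supplied as a single-letter phrase. The speechContexts and phrases field names follow the v1 REST API; the helper name build_letter_hint_request, the default language code, and the placeholder audio URI are assumptions made for this example only.

```python
import json
import string


def build_letter_hint_request(audio_uri, language_code="en-US"):
    """Build a recognize request body that hints every single letter.

    Hypothetical helper: only the dictionary layout follows the REST API.
    """
    letters = list(string.ascii_lowercase)  # ['a', 'b', ..., 'z']
    return {
        "config": {
            "languageCode": language_code,
            "speechContexts": [{"phrases": letters}],
        },
        "audio": {"uri": audio_uri},
    }


# Placeholder bucket path, not a real recording.
request_body = build_letter_hint_request("gs://your-bucket/spelling.flac")
print(json.dumps(request_body, indent=2))
```

The printed JSON would still have to be sent with an authenticated POST; that part is left out here on purpose.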

Related

Convert ID from speech to text without spaces

I'm using Google Cloud Speech API with IBM Voice Gateway in order to interact with a VoiceBot through a phone.
If I say an identifier containing letters and numbers through the phone, Google Cloud Speech converts it into a string with spaces. For example, if I say "A1B2C3", it converts it into the string "a 1 b 2 c 3".
Do you know if there is a way to avoid these useless spaces?
Thanks for your help!
Lucas
I don't see any way in which you can eliminate spaces from the API response. What you could do is experiment with the available features, as this is probably your best chance to get a recognition more similar to what you are looking for.
For example: you can provide some sample hint phrases echoing your use case, indicate that the audio is a phone call, or use an enhanced model (although for the latter to be available you need to first opt in for data logging).
Honestly though, for your case, it might be better to post-process the returned string (e.g. with a simple "a 1 b 2 c 3".replace(' ', '')).
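The post-processing suggestion could look like the minimal sketch below. The function name compact_identifier is made up for this example, and it assumes that every whitespace character in the transcript should be dropped.

```python
def compact_identifier(transcript):
    """Remove all whitespace from a transcript such as 'a 1 b 2 c 3'."""
    # str.split() with no arguments splits on any run of whitespace,
    # so tabs and repeated spaces are removed as well.
    return "".join(transcript.split())


print(compact_identifier("a 1 b 2 c 3"))  # a1b2c3
```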

Youtube API v3 : no wildcard in search?

I'm making a list of the videos of my channel, and want to use the search endpoint of the API : https://developers.google.com/youtube/v3/docs/search/list
There is a "q" parameter to send the query. What completely bugs me is that no wildcard is mentioned in the documentation, and using * doesn't do anything. For example, in order to find any video containing "television" in the title, the full word has to be entered! Sending "tel" won't work, nor will sending "televisio".
Did I miss something? Is there a way around this?
Thanks!
YouTube searching works along the same paradigm as Google searching, which is quite a bit different than the character-wildcard keyword approach. It's semantic probabilistic searching, looking for relevance based on the terms you give it, so while the * does represent a wildcard, it represents a whole word. For example, you can search for "a * saved" and it will return to you the videos which score the highest relevance score where any word could be substituted in place of your wildcard.
You can also use other punctuation based search operators ... the + sign, - sign, quotation marks, etc. Just make sure they're all URL encoded before you send the query in.
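To make the URL-encoding step concrete, here is a small sketch that builds a search.list URL with the standard library. The helper name build_search_url and the YOUR_API_KEY placeholder are assumptions for this example; the code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key):
    """Return a search.list URL with the query safely URL-encoded."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "key": api_key,
    }
    # urlencode escapes spaces, '*', '+', quotes and other operators.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url("a * saved", "YOUR_API_KEY")
print(url)
```

With urlencode, the spaces in the query become "+" and the "*" becomes "%2A", so the operators reach the API intact.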

How can I adjust OpenEars wrong recognition

I use OpenEars in my app to recognize only the letters "a" to "z".
But it recognizes letters much less reliably than it recognizes words.
So, how can I use my own sound model to improve OpenEars' recognition?
And how can I use OpenEars to recognize special sounds?
For example, I give OpenEars a dog sound and I want it to give me back "dog".
So this is a two-part question which might be better split up for the community. From what I understand, OpenEars works best with words that are in its dictionary. If you want it to recognize letters of the alphabet, I would try using the phonetic spelling of each letter instead of just the letter. So instead of using 'f', use "ef".
As for the second part of the question, you might be able to recognize specific types of dogs which go "ruff" but smaller dogs with more of a "yip!" would have to be added to the initial dictionary as well.
I would get the demo app and really just experiment with these words.
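The phonetic-spelling idea can be sketched independently of OpenEars itself. The Python snippet below is only an illustration: the spellings in LETTER_NAMES are assumed, not taken from any OpenEars dictionary, and the helper names are hypothetical. It produces the word list to give the recognizer and maps a recognized hypothesis back to letters.

```python
# Assumed spellings of English letter names; adjust them to whatever
# pronunciations the recognizer's dictionary actually contains.
LETTER_NAMES = {
    "a": "ay", "b": "bee", "c": "cee", "d": "dee", "e": "ee", "f": "ef",
    "g": "gee", "h": "aitch", "i": "eye", "j": "jay", "k": "kay", "l": "el",
    "m": "em", "n": "en", "o": "oh", "p": "pee", "q": "cue", "r": "ar",
    "s": "ess", "t": "tee", "u": "you", "v": "vee", "w": "double-u",
    "x": "ex", "y": "why", "z": "zee",
}

# Reverse lookup: recognized word -> letter.
NAME_TO_LETTER = {name: letter for letter, name in LETTER_NAMES.items()}


def vocabulary():
    """Words to load into the recognizer instead of bare letters."""
    return sorted(LETTER_NAMES.values())


def decode(hypothesis):
    """Turn recognized words like 'cee ay tee' back into 'cat'."""
    words = hypothesis.lower().split()
    # Unknown words are marked with '?' rather than silently dropped.
    return "".join(NAME_TO_LETTER.get(word, "?") for word in words)


print(decode("cee ay tee"))
```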

Encoding spaces for exact bigram match in twitter streaming API track keywords

I'm using the Twitter streaming API. It works wonderfully for single words, but seemingly cannot filter by an exact bigram (two word string).
I'm testing this by searching for common words that commonly occur together:
e.g. "feel good"
This is the URL (requires OAuth login):
https://stream.twitter.com/1.1/statuses/filter.json?track=keywords_go_here
Things that don't work:
track=feel%20good ==> still produces: "text":"Feels so good outside!..."
track=%27feel%20good%27 ==> produces nothing
track=feel%20good, ==> still produces "good that my friend has an ED too because I can feel..."
Any ideas on getting this to work?
edit: someone sort-of answered this in early 2010: Twitter Streaming API - tracking exact multiple keywords in exact order , but are there any updates on this issue?
It seems like you can do that search according to the api: https://dev.twitter.com/docs/using-search
"happy hour" containing the exact phrase "happy hour"
You just need to put your phrase in quotation marks.
I am sorry, but the answer is
Exact matching of phrases (equivalent to quoted phrases in most search engines) is not supported.
Furthermore,
Punctuation and special characters will be considered part of the term they are adjacent to.
So if you track "feel good", you will get messages such as
He said, "feel it", and I replied, "I am good".
If you want exact matches, then you have two options:
A) track both terms and then discard all tweets that don't have exact matches, or
B) get a paid subscription to the Twitter firehose with Gnip or DataSift. Twitter makes a living out of things like this, so I don't think it's ever gonna be available on the Streaming API.
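Option A can be sketched as a client-side filter applied to each tweet's text after the stream delivers it. This is a minimal example, assuming a case-insensitive match with words separated only by whitespace is what "exact" means here; make_phrase_filter and the sample tweets are illustrative, not part of any Twitter library.

```python
import re


def make_phrase_filter(phrase):
    """Return a predicate that is True only for texts containing the phrase."""
    words = [re.escape(word) for word in phrase.split()]
    # \b keeps 'feel' from matching inside 'feels'; \s+ allows only
    # whitespace (no punctuation or other words) between the terms.
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return lambda text: pattern.search(text) is not None


has_feel_good = make_phrase_filter("feel good")

tweets = [
    "Feels so good outside!",
    'He said, "feel it", and I replied, "I am good".',
    "I feel good about this release",
]
kept = [text for text in tweets if has_feel_good(text)]
print(kept)
```

Only the last sample survives the filter; the first two contain both terms but not the phrase itself.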

Algorithm for keyword/phrase trend search similar to Twitter trends

I'd like some ideas about building a tool that scans text sentences (written in English) and builds a keyword ranking based on the most frequent words or phrases in the texts.
This would be very similar to Twitter trends, wherein Twitter detects and reports the top 10 words within the tweets.
I have identified the high-level steps of the algorithm as follows:
1. Scan the text and remove all the common, frequent words (such as "the", "is", "are", "what", "at", etc.).
2. Add the remaining words to a hashmap. If the word is already in the map, increment its count.
3. To get the top 10 words, iterate through the hashmap and find the top 10 counts.
Steps 2 and 3 are straightforward, but for step 1 I do not know how to detect the important words within a text and separate them from the common words (prepositions, conjunctions, etc.).
Also, if I want to track phrases, what could be the approach?
For example if I have a text saying "This honey is very good"
I might want to track "honey" and "good" but I may also want to track the phrases "very good" or "honey is very good"
Any suggestions would be greatly appreciated.
Thanks in advance
For detecting phrases, I suggest using a chunker. You can use one provided by an NLP tool such as OpenNLP or Stanford CoreNLP.
NOTE
"honey is very good" is not a phrase; it is a clause. "very good" is a phrase.
In information retrieval, those common words are called stop words.
Actually, your step 1 is quite similar to step 3, in the sense that you first need a fixed database of the most common words in the English language. Such a list is easily available on the internet (Wikipedia even has an article listing the 100 most common words in the English language). You can store those words in a hashmap and, while scanning your text, simply ignore the common tokens.
If you don't trust Wikipedia and the already existing listing for common words, you can build your own database. For that purpose, just scan thousands of tweets (the more the better) and make your own frequency chart.
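The three steps from the question, combined with a stop-word list as described above, can be sketched with the standard library alone. The STOP_WORDS set here is a tiny illustrative sample rather than a real list, and the tokenizer (lowercase runs of letters and apostrophes) is an assumption.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "is", "are", "what", "at", "this", "a", "an", "and", "very"}


def top_keywords(texts, n=10):
    """Count non-stop-words across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Step 1: skip stop words. Step 2: count everything else.
        counts.update(token for token in tokens if token not in STOP_WORDS)
    # Step 3: most_common does the "top n counts" scan.
    return counts.most_common(n)


sample = [
    "This honey is very good",
    "The honey is at the market",
    "What good honey",
]
print(top_keywords(sample, n=2))  # [('honey', 3), ('good', 2)]
```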
You're facing an n-gram-like problem.
Do not reinvent the wheel. What you seem to be wanting to do has been done thousands of times, just use existing libs or pieces of code (check the External Links section of the n-gram Wikipedia page.)
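To show what the n-gram view means for phrase tracking, here is a short standard-library sketch that counts runs of n consecutive words. It is only meant to make the idea concrete; for real use an existing library is probably the better choice, as noted above. The function names and the simple tokenizer are assumptions.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_phrases(texts, n=2, k=10):
    """Count n-word phrases across all texts and return the k most frequent."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(" ".join(gram) for gram in ngrams(tokens, n))
    return counts.most_common(k)


sample = ["This honey is very good", "That tea is very good too"]
print(top_phrases(sample, n=2, k=2))
```

In this sample, "is very" and "very good" each occur twice, so they are the two phrases returned.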
Check out the NLTK library. It has code that handles steps one, two, and three:
1 Removing common words can be done using stopwords or a stemmer
2,3 getting the most common words can be done with FreqDist
Second, you can use tools from Stanford NLP for tracking your text.

Resources