I am using a Twilio Studio flow to build an IVR and I don't want to miss a single command from the customer. When the caller says a full sentence, the Gather Input widget works, but when they say a single word like "sales", the widget doesn't detect anything and triggers the "No input" transition. Can someone suggest how to configure the Gather Input widget in our flow so that it detects even a single word? I have set hints and the language, and I have also tried speech models like numbers and commands, but I am not sure how to use them correctly.
Waiting for Answer.
Thanks
I am trying to detect single words as well from customers in Twilio Studio.
You're not going to be able to do both well; you need to direct your user to either say something short or say something long. I recommend you go with short for best results. Additionally, if you don't want to miss anything your customer said, you might want to record the call and do post-call analysis to see if there was anything important there. Honestly, if this is your first time introducing speech to your customers, do this:
"Thank you for calling X, how may I help you?"
Customer says whatever.
"Let's try this a different way for X press 1, for Y press 2."
This helps you better understand what your customers would normally ask for in their own words, and then gives you a better idea of whether you really need to capture long sentences or short words.
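Since the Gather Input widget is built on the TwiML Gather verb, it can help to look at the equivalent raw TwiML. Below is a minimal sketch using the Twilio Python helper library; the hint words, menu wording, and webhook paths are placeholders rather than anything from your flow, so treat it as an illustration of which knobs exist, not a drop-in config:

    from twilio.twiml.voice_response import VoiceResponse, Gather

    response = VoiceResponse()

    # Accept both keypresses and speech, so a cough or a missed word
    # can still fall back to DTMF.
    gather = Gather(
        input="dtmf speech",
        num_digits=1,
        timeout=5,                            # seconds of silence before "no input"
        language="en-US",
        hints="sales, support, billing",      # the short words you expect
        speech_timeout="auto",                # stop as soon as the caller pauses
        speech_model="numbers_and_commands",  # tuned for short commands
        action="/handle-choice",              # placeholder webhook
        method="POST",
    )
    gather.say("For sales press 1, for support press 2, or just say the name of the team you need.")
    response.append(gather)

    # Reached only if Gather got neither digits nor speech.
    response.say("Sorry, I didn't catch that.")
    response.redirect("/voice")               # placeholder: loop back and re-prompt

    print(str(response))

The Studio widget exposes the same settings (hints, language, speech model, and the timeouts) in its config panel; in my experience, keeping the prompt short and the speech timeout on auto (or a small number of seconds) is usually what makes a one-word answer like "sales" get captured instead of triggering "No input".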
david
I'm sure this is a fairly basic question. I have completed the example at https://www.twilio.com/blog/2017/06/how-to-use-twilio-speech-recognition.html, and when I run it and say either "cat", "number", or "chuck facts", it repeats exactly what I have said perfectly but says that it can't match it up with the switch statement. Has anyone experienced this before, and if so, could they let me know where to look?
Additionally, I have also set up a basic IVR using a Studio flow, and that also doesn't accept the speech.
Any help would be appreciated.
Thanks
Steve
I am trying to implement accessibility in my iOS project.
Is there a way to correct the pronunciation of some specific words when VoiceOver is turned on? For example, the correct pronunciation of 'speech' is [spiːtʃ], but I want VoiceOver to read every occurrence of the word 'speech' the same as 'speak' [spiːk] throughout my whole project.
I know one way is to set the accessibility label of any UI element whose pronunciation I want to change to 'speak'. However, some elements are dynamic. For example, we get the label text from the back end, so we never know when the label text will be 'speech'. If I get the word 'speech' from the back end, I would like to hear VoiceOver read it as 'speak'.
Therefore, I would like to change a setting for VoiceOver so that every time a word is 'speech', VoiceOver reads it as 'speak'.
Can I do it?
Short answer
Yes, you can do it, but please do not.
Long answer
Can I do it?
Yes, of course you can.
Simply fetch the data from the backend, do a find-and-replace on the string for any words you want spoken differently (using a dictionary of replacements), and then set the new version of the string as the accessibility label.
SHOULD you do it?
Absolutely not.
Every time someone tries to "fix" pronunciation it ends up making things a lot worse.
I don't even understand why you would want screen reader users to hear "speak" whenever anyone else sees "speech"; it does not make sense and is likely to break the meaning of sentences:
"I attended the speech given last night, it was very informative".
Would transform into:
"I attended the speak given last night, it was very informative"
Screen reader users are used to it.
A screen reader user is used to hearing things said differently (and incorrectly!); my guess is you have not been using a screen reader long enough to get used to the idiosyncrasies of screen reader speech.
Far from helping screen reader users, you will actually end up making things worse.
I have only ever overridden screen reader default behaviour twice: once when a version number was being read as a date, and once in a password manager that read the password back and would try to read things as words.
Other than those very narrow examples I have not come across a reason to change things for a screen reader.
What about braille users?
You could change things because they don't sound right, but braille users also use screen readers, and changing things for them could be very confusing (as per the "speech" example above).
What about best practices?
"Give assistive technology users as similar an experience as possible to non assistive tech users". That is the number one guiding principle of accessibility, the second you change pronunciations and words, you potentially change the meaning of sentences and therefore offer a different experience.
Summing up
Anyway, this is turning into a rant when it isn't meant to be (my apologies, I am just trying to get the point across, as I answer similar questions quite often!). Hopefully you get the idea: leave it alone and present the same information. I haven't even covered different speech synthesizers, language translation, and the other things that using "unnatural" language can interfere with.
The easiest solution is to return a 2nd string from the backend that is used just for the accessibilityLabel.
If you need a bit more control, you can pass an attributed string as the accessibilityLabel, with a number of different options for controlling pronunciation:
https://medium.com/macoclock/ios-attributed-accessibility-labels-f54b8dcbf9fa
I know Autopilot doesn't support Portuguese, but this is a major bummer.
My bot asks a "yes or no" question (sim ou não), and if the answer is typed without the accent (nao), it doesn't understand that it means 'não'.
I can't even add 'nao' as a synonym in the custom field type; it says 'FieldValue already in use'.
I asked Twilio support 9 days ago and haven't got a reply yet. What do you guys think? Is there a way around this? =/
Twilio developer evangelist here.
The Autopilot speech recognition engine currently only works for English, so the transcribed text will always be in English: having "não" as a field value won't work for queries like "nao", but it would be helpful to add "nao" as a field value that is a synonym of "não".
It seems your assistant does not have samples that use fields, which means that if you do not use the Collect action, those values will never be recognized, even when not using voice, since the model does not contain them. Using Collect will be useful in your case; otherwise, I think you should add samples with fields to certain tasks.
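One thing you could try while you wait on support: the Autopilot REST API lets you create a field value with a SynonymOf parameter, so you may be able to add the unaccented spelling programmatically even though the console form rejects it. Below is a rough sketch with the Python helper library; the credentials and SIDs are placeholders, and it is possible the same "already in use" uniqueness check applies through the API as well:

    from twilio.rest import Client

    # Placeholder credentials and SIDs - use your own.
    client = Client("ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", "your_auth_token")
    assistant_sid = "UAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    field_type_sid = "UBXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # your yes/no custom field type

    field_values = (client.autopilot
                    .assistants(assistant_sid)
                    .field_types(field_type_sid)
                    .field_values)

    # The canonical value (skip this call if it already exists).
    field_values.create(language="pt-br", value="não")

    # The unaccented spelling, registered as a synonym of the accented value.
    field_values.create(language="pt-br", value="nao", synonym_of="não")

    # Remember to rebuild the assistant's model afterwards so the new
    # values are picked up by training.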
Let me know if this helps at all!
I am trying to use OpenEars for a small part of my app. I have three or four keywords that I want to be able to "listen" for, something like "Add", "Subtract", etc. I am just using the sample app found here. I want to have a special case in the app when I hear "Add" etc., as opposed to a word that is not one of my four keywords. Right now I set my language to be only the four keywords, but whenever the OpenEars API hears anything, it picks between my four keywords. So if I cough, it picks the closest word out of the four.
How can I listen for a specific word without always choosing one of the keywords?
I was thinking I could have a whole bunch of words, a few hundred, and just check which word was spoken, with a special case for my four keywords, but I don't want to have to type out each word. Does OpenEars provide any default language models?
OpenEars developer here. Check out the dynamic grammar generation API that was just added in OpenEars 1.7, which may provide the right results for your requirements: http://www.politepix.com/2014/04/10/openears-1-7-introducing-dynamic-grammar-generation/
This approach might be more suitable for keyword detection and detection of fixed phrases. Please bring further questions to the OpenEars forums if you'd like to troubleshoot them with me.
I'm hoping somebody can point me in the right direction to learn about separating out actions from a bunch of text.
Suppose I have this text:
Drop off the dry cleaning, and go to the corner store and pick-up a jug of milk and get a pint of strawberries.
Then, go pick up the kids from school. First, get John who is in the daycare next to the library, and then get Sam who is two blocks away.
By the time you've got the kids, you'll need to stop by the doctor's office for the prescription. Tim's flight arrives at 4pm.
It's American Airlines flight 331 arriving from Dallas. It will be getting close to rush hour, so make sure you leave yourself enough time.
I'm trying to have it split up into:
Drop off the dry cleaning,
and go to the corner store and pick-up a jug of milk and get a pint of strawberries.
Then, go pick up the kids from school. First, get John who is in the daycare next to the library, and then get Sam who is two blocks away.
By the time you've got the kids, you'll need to stop by the doctor's office for the prescription.
Tim's flight arrives at 4pm.
It's American Airlines flight 331 arriving from Dallas. It will be getting close to rush hour, so make sure you leave yourself enough time.
I haven't been able to find anything in my searches that is specifically action based. It would need to be smarter than just picking out verbs, as there are sometimes multiple verbs associated with one action; for instance, the second item has 'go', 'pick-up', and 'get', but that is all part of a single action. And of course, "Tim's flight" only suggests an action with the present participle, with the verb coming toward the end of the segment.
Any suggestions on where to look to do this kind of thing? Things to watch out for, recommended readings, etc.?
Simple approach: parse the text using [your favorite parser], then select the sentences or SBAR phrases that are in the imperative mood. The Stanford Parser just so happens to have "Improved recognition of imperatives" in its very latest release.
There's probably no need for machine learning beyond what is already incorporated in standard parser programs.
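If you would rather prototype in Python, here is a minimal sketch of that idea using spaCy instead of the Stanford Parser. It flags a sentence as an action when its root is a base-form verb with no explicit subject, which is only a rough stand-in for proper imperative-mood detection (it assumes spaCy and the en_core_web_sm model are installed):

    # Rough imperative-sentence picker using spaCy (pip install spacy,
    # then: python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")

    text = (
        "Drop off the dry cleaning, and go to the corner store and pick-up a jug "
        "of milk and get a pint of strawberries. Tim's flight arrives at 4pm."
    )

    for sent in nlp(text).sents:
        root = sent.root
        # Heuristic: an imperative clause usually has a base-form verb (tag VB)
        # as its root and no explicit nominal subject attached to it.
        has_subject = any(tok.dep_ in ("nsubj", "nsubjpass") for tok in root.children)
        is_action = root.tag_ == "VB" and not has_subject
        print(("ACTION: " if is_action else "OTHER:  ") + sent.text.strip())

Grouping conjoined verbs like 'go', 'pick-up', and 'get' into one item is then a matter of walking the conj children of the root verb rather than treating each verb as its own action.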
This domain is called Information Extraction.
The general approach to sentence understanding is either:
extract a Part-of-Speech-tagged parse tree (Python: spaCy.io, NLTK, CoreNLP, etc.)
extract a word vector (e.g. word2vec)