I have my Twilio flow configured right, but even though the app recognizes the word, it doesn't follow the path it should. I tried to put all possible variations of the word in the widget, but it made no difference.
I am banging my head on this. Twilio Studio says it supports SSML using Amazon Polly voices on the Say and Gather widgets: https://www.twilio.com/docs/studio/widget-library/sayplay#ssml-support-for-polly-voices
I cannot make them work no matter what I try.
I tried using the examples from their docs, but nothing. What I currently have is this.
(Screenshot: Twilio Gather widget configuration)
I have also tried wrapping the whole block of text in valid SSML, using neural and non-neural voices, single quoting, and escaping. Nothing seems to work the way the docs tell me it will.
When I look in the call log, the converted TwiML just strips all of the SSML. It looks like this:
(Screenshot: Twilio call log details)
Any idea what I am doing wrong?
I figured this out in the end. The whole text block needed to be wrapped in a <speak> </speak> block, and the ampersands in the text needed to be removed.
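For anyone else hitting this, the widget text ended up looking roughly like the snippet below (the wording and the SSML tags here are just placeholders; the points that mattered were the outer <speak> wrapper and having no bare & characters in the text, which I replaced with the word "and"):

```xml
<speak>
  Hello, and thanks for calling.
  <break time="500ms"/>
  <prosody rate="slow">Please say sales or support.</prosody>
</speak>
```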
I have an idea for a live commentary assistant for football matches, and so far this is what I have achieved:
I am using Dialogflow and have linked it with Actions on Google, so every time I say something, Dialogflow can detect it and fire an event so that Google Home responds.
What I would like to know is how to handle, for example, a team scoring a goal so that Google Assistant catches it instantly, in real time.
One possibility is this: I have an API that returns all the matches and scores, updated every time a team scores. I could fetch it every second and then trigger an intent in Dialogflow that fires on the Google Home, but I suspect that is not the best approach.
Does anyone have any idea about this?
Sorry I am not adding any code; in this case the code is not as important as the approach and the idea of how to achieve it.
Unfortunately, Actions on Google is not suited for this kind of use case. The platform is designed for conversational experiences, where there's a back-and-forth. The platform intentionally limits real-time features like continual background listening, as well as things like push notifications.
Push notifications do work on phones, although not on other surfaces like smart speakers (e.g. Google Home). You can use them to get close to the behavior you want, but otherwise the platform may not be suitable for your use case.
I've noticed that certain apps on Android (e.g. Gboard) support translating phrases such as 'poop emoji' into the actual emoji as part of speech recognition. I was wondering if this is something that is supported through Google's Cloud Speech APIs that I could similarly use in my own applications?
In my initial scan of the API I can't see anything that might indicate a way to turn this on (e.g. RecognitionConfig et al. has no obvious toggle for it), and in some quick one-off tests in my own app I wasn't given emoji-fied results from the service.
I've done a bunch of googling but found nothing so far.
Any insight here would be awesome, thanks!
Edit: Thanks to the answer below, I have learned this is not currently supported. I've gone to Google's issue tracker to request this feature. If anyone wishes to track the feature request, the link is:
https://issuetracker.google.com/u/1/issues/113978818
The Cloud Speech-to-Text API doesn't currently support recognition of emoji phrases. However, you can use the Send Feedback button located at the lower-left and upper-right corners of the service's public documentation, or take a look at the Issue Tracker tool if you want to raise a Speech API feature request, to notify Google about this desired functionality.
Finally, you can refer to the Release Notes section of the Speech-to-Text API to keep track of the new features and functionality added to the service.
I was investigating various speech recognition strategies and I liked the idea of grammars as defined in the Web Speech spec. It seems that if you can tell the speech recognition service that you expect “Yes” or “No”, the service could more reliably recognize a “Yes” as “Yes” and a “No” as “No”, and hopefully also be able to say “it didn’t sound like either of those!”.
However, in SFSpeechRecognitionRequest, I only see taskHint with values from SFSpeechRecognitionTaskHint of confirmation, dictation, search, and unspecified.
I also see SFSpeechRecognitionRequest.contextualStrings, but it seems to be for a different purpose. I.e., I think I should put brands/trademark type things in there. Putting “Yes” and “No” in wouldn’t make those words any more likely to be selected because they already exist in the system dictionary (this is an assumption I’m making based on the little the documentation says).
Is there a way with the API to do something more like grammars or, even more simply, just provide a list of expected phrases so that the speech recognition is more likely to come up with a result I expect instead of similar-sounding gibberish/homophones? Does contextualStrings perhaps increase the likelihood that the system chooses one of those strings instead of just expanding the system dictionary? Or maybe I’m taking the wrong approach and am supposed to enforce the grammar on my own and enumerate over SFSpeechRecognitionResult.transcriptions until I find one matching an expected word?
Unfortunately, I can’t test these APIs myself; I am merely researching the viability of writing a native iOS app and do not have the necessary development environment.
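For concreteness, this is roughly what I have in mind (untested, since I don’t have a development environment; the function name, locale, and the yes/no word list are just illustrative):

```swift
import Speech

// Rough sketch: bias recognition toward "Yes"/"No" via contextualStrings and a
// confirmation task hint, then scan the alternative transcriptions myself.
// (SFSpeechRecognizer.requestAuthorization would need to have been granted first.)
func recognizeYesNo(from audioFileURL: URL) {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")) else { return }

    let request = SFSpeechURLRecognitionRequest(url: audioFileURL)
    request.taskHint = .confirmation               // this is a yes/no style interaction
    request.contextualStrings = ["Yes", "No"]      // hoping this boosts them over homophones

    _ = recognizer.recognitionTask(with: request) { result, error in
        guard let result = result, result.isFinal else { return }

        let expected = ["yes", "no"]
        // Walk the alternative transcriptions, best first, looking for an expected word.
        let match = result.transcriptions
            .map { $0.formattedString.lowercased() }
            .first { expected.contains($0) }

        print(match ?? "it didn't sound like either of those!")
    }
}
```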
We're working on an app for blind and visually impaired users. We've been experimenting with a third-party library to get spoken user input and convert it to text, which we then parse as commands to control the app. The problem is that the word recognition is not very good, and certainly nowhere near as good as what iOS uses to get voice input into a text field.
I'd like to experiment with that, but our users are mostly unable to tap a text field, then hit the mic button on the popup keyboard, then hit the done button, or even dismiss any of it. I'm not even sure how they would deal with a single tap on the whole screen; it might be too difficult for some. So I'd like to automate that for them, but I don't see anything in the docs that indicates it is possible. So, is it even possible, and if so, what's the proper way to do it so that it passes verification?
The solution for you is to implement keyword spotting, so that speech recognition is activated by a keyword instead of a button tap. After that you can record commands/text and recognize them with any service you need. Something like the "Ok Google" activation on the Moto X.
There are several keyword activation libraries for iOS; one possible solution is OpenEars, based on the open-source speech recognition library CMUSphinx. If you want to use Pocketsphinx directly, you can find the keyword activation implementation in the kws branch in Subversion (branches/kws).
The only way to get the iOS dictation engine is to sign up yourself through Nuance: http://dragonmobile.nuancemobiledeveloper.com/. It's expensive, because it's the best. Presumably, Apple's contract prevents them from exposing an API.
The built-in iOS accessibility features allow immobilized users to access dictation (and other keyboard buttons) through tools like VoiceOver and AssistiveTouch. It may not be worth reinventing this if your users might be familiar with these tools.