Say "one moment" after a Gather finishes - twilio

I'd like to say "one moment" in a Gather block between the moment a user stops speaking and the moment the speech recognition processor delivers the words they said (anecdotally this can take a few seconds, resulting in dead air in the meantime).
I can't find anything like that in the documentation. All of the examples are for things like:
<Response>
<Gather>
<Say>Voice prompt to read to the user before collection</Say>
<Say>Say more things if you want</Say>
</Gather>
<Say>Something to say if the user doesn't provide feedback</Say>
</Response>
Having around 5 seconds of dead air isn't the worst thing ever, but it lacks polish.

Twilio developer evangelist here.
There is no way to play a message between the caller finishing speaking to the <Gather> and the speech result being sent to the action URL. However, I think you might be mischaracterising the delay.
Twilio streams the voice to the speech detection service, so we get real-time results (you can get partial results by setting a partialResultCallback URL). The time that elapses between the end of the caller speaking and the action being called is instead based on the timeout, which is 5 seconds by default.
What I would suggest is that you try different values for the speechTimeout attribute, including auto, which "will stop speech recognition when there is a pause in speech and return the results immediately."
Let me know if that helps at all.
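As a rough illustration of the suggestion above, here is a minimal Python sketch that builds the TwiML with the standard library only. The speechTimeout and partialResultCallback attributes come from the answer; the function name, the URLs and the prompt text are made up for the example.

```python
import xml.etree.ElementTree as ET


def build_gather_twiml(action_url, partial_url=None, speech_timeout="auto"):
    """Return TwiML for a speech <Gather> that ends on a pause in speech."""
    response = ET.Element("Response")
    attrs = {
        "input": "speech",
        "action": action_url,  # hypothetical endpoint in your app
        "speechTimeout": str(speech_timeout),
    }
    if partial_url:
        # Optional: receive partial results while the caller is still talking.
        attrs["partialResultCallback"] = partial_url
    gather = ET.SubElement(response, "Gather", attrs)
    ET.SubElement(gather, "Say").text = "Please tell us how we can help."
    # Reached only if the caller says nothing.
    ET.SubElement(response, "Say").text = "We did not hear anything. Goodbye."
    return ET.tostring(response, encoding="unicode")


print(build_gather_twiml("/handle-speech", partial_url="/partial"))
```

Passing a number instead of "auto" (for example speech_timeout=2) makes it easy to compare how long the silence feels with different values.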

Related

At what speed does Twilio send DTMF tones when using the sendDigits command?

My Twilio application dials our conference line, waits two seconds and then sends the conference PIN, followed by #.
$dial->number('442031234567', ['sendDigits' => 'wwww123456789'] );
I would like to be able to give my users an estimate of how long they should expect silence (while Twilio is sending the PIN digits) before the call is ready. I can make the call multiple times and time the delay, but that seems less exact than finding the underlying timings!
I know that each w character takes 0.5s, but I can't find any documentation for the amount of time each digit takes after that wait.
I've looked at Twilio's docs for sendDigits and also play.
Twilio developer evangelist here.
I don't believe we give any guidance on how long the DTMF tones will take, but I believe they are a constant time. I would recommend trying it a few times, along with the system that you are dialling in to in order to estimate the time for your users.
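Once you have timed a few calls, the estimate can be reduced to simple arithmetic. This is only a sketch: the 0.5 s per w comes from the question, while seconds_per_digit is not a documented Twilio value and has to be whatever you measured yourself (the 0.3 below is a placeholder, not a real figure).

```python
def estimate_send_digits_seconds(send_digits, seconds_per_digit):
    """Estimate how long a sendDigits string takes to play.

    Each 'w' is a 0.5 second wait; every other character is treated as
    one tone lasting `seconds_per_digit` (a value you measured, assumed
    constant per tone).
    """
    waits = send_digits.count("w")
    tones = len(send_digits) - waits
    return waits * 0.5 + tones * seconds_per_digit


# Placeholder per-digit time; replace with your own measurement.
estimate = estimate_send_digits_seconds("wwww123456789", seconds_per_digit=0.3)
print(f"Expect roughly {estimate:.1f} seconds of silence")
```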

Real-time speech recognition from a phone call recorded with Twilio

I'm currently using Twilio to make phone calls and I'd like to add a speech recognition element such that if a user says a specific phrase, my backend can take specific actions. If you're familiar with Twilio, something akin to the Gather verb. It needs to be real-time since if there are issues with recognition, the user would be prompted for clarification.
To add speech recognition to the Twilio Gather verb, add "speech" to the Gather input value, for example: input="dtmf speech". After the caller says something and goes quiet, the Twilio server translates the speech into text and sends the text to the action URL, then waits for response instructions. Your program can use the text to respond however you choose. One choice is to have your program respond with correction instructions (Say verb) and have the caller say something more, which would be processed again by your action URL.
Twilio Gather documentation including the implementation of speech recognition:
https://www.twilio.com/docs/api/twiml/gather
Example TwiML with a Gather verb using the speech recognition identifier.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Gather input="dtmf speech" language="en-us"
numDigits="1"
timeout="6"
action="http://hostname/processUserResponse.py">
<Say voice="alice" language="en-CA">
Okay, speech recognition test. Enter any digit or say something.
</Say>
</Gather>
<Say voice="alice" language="en-CA">
Waited too long to say something. Response canceled ....
</Say>
</Response>
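For the action URL side, here is a minimal sketch of the logic in Python, written as a plain function so it can sit behind any web framework. Twilio posts the recognized text as SpeechResult and any key presses as Digits; the function name, the expected phrase and the /processUserResponse path are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def handle_gather_result(params, expected_phrase="customer service"):
    """Build the TwiML reply for the parameters posted to the action URL."""
    digits = params.get("Digits", "")
    speech = params.get("SpeechResult", "").strip().lower()
    response = ET.Element("Response")
    if digits or expected_phrase in speech:
        # Recognized: carry on with the call flow.
        ET.SubElement(response, "Say").text = "Thank you, one moment."
    else:
        # Not recognized: ask again and send the next attempt back here.
        gather = ET.SubElement(response, "Gather", {
            "input": "dtmf speech",
            "numDigits": "1",
            "action": "/processUserResponse",  # hypothetical path
        })
        ET.SubElement(gather, "Say").text = (
            "Sorry, I did not catch that. Please say it again."
        )
    return ET.tostring(response, encoding="unicode")


print(handle_gather_result({"SpeechResult": "Customer service, please."}))
```

Any digit is accepted here just to keep the sketch short; a real menu would branch on the value of Digits.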
This was briefly covered here: https://stackoverflow.com/a/30224103/6189694
Seems like you would have to set up a conference call, and then join in as a muted user to listen in on the call.
I don't believe there is anything that works in real-time to do this. You could, however, use voice recording, pass the recording to another service (IBM's Watson Speech to Text comes to mind) and then handle it from there. It should be able to do this relatively quickly with the right workflow. I have never used Watson, just seen it used. So I am not sure on how long it would take to process the recording. I would think one or two word commands should be completed quickly.
Sorry I can't provide more guidance. Someone else in the community may have another method.
C# .NET Core IVR Gather example using a list of enums instead of the combined enum available in the official old C# example, as per my comment above (I also had to convert the url.actionurl to this monstrosity):
List<Gather.InputEnum> bothDtmfAndSpeech =
new List<Gather.InputEnum>(2){
Gather.InputEnum.Dtmf, Gather.InputEnum.Speech
};
var gather = new Gather(
action: new Uri(Url.Action("Show", "Menu")),
numDigits: 1, input:bothDtmfAndSpeech, bargeIn: true);
The IBM Watson Speech To Text service (STT) has this capability, it is called Keyword Spotting (https://www.ibm.com/watson/developercloud/doc/speech-to-text/output.shtml). Watson STT will let you push a live stream of telephony audio and produce not only recognition hypotheses but also it will be able to detect whether the user said sentences or commands specified beforehand. There is actually a demo that showcases this functionality, please give it a try:
https://speech-to-text-demo.mybluemix.net/

TwiML: create an automatic navigation & wait before playing the message

I am using Twilio to call shops' landlines and play a message.
Some of the shops use an answering machine with a navigation menu to reach a specific department: "press 3 for customer service..."
I would like to create an automatic navigation: when the system recognizes an answering machine, I will use a digit sequence for each shop to reach the right department.
My problem is that after the system has finished navigating to the right department, I don't know how long it will take the person in that department to pick up the phone, and the message should only play after that.
This is what I am trying to do:
<Response>
<Play digits="wwww3"/>
<Pause length="?"/> <!-- I don't know how long to wait. -->
<Play>https://mySite/message.mp3</Play>
</Response>
Is there an option to know when this person picks up the phone?
There is no easy answer. This is why most telemarketing companies use call screening (see below), or just play their message no matter what when someone answers.
Telemarketing is a multi-million dollar industry, and if this type of system were readily available or easy to develop, companies would use it. They do not.
The only way you can be 100% sure a human has answered the phone and is listening is call screening as explained here: https://www.twilio.com/docs/tutorials/walkthrough/ivr-screening/php/laravel
While not a complete match for your query it is also covered here
https://www.twilio.com/help/faq/voice/can-twilio-tell-whether-a-call-was-answered-by-a-human-or-machine where it says
One alternative to AMD is Call Screening, aka “Human Detection”.
That is it for easy implementation. Outside of that you could code your own machine that could listen into the call via conference to try to identify when a human speaks and then process it. However, this type of system is very expensive and complex from what I have seen and even then it is not 100% reliable.
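To make the call screening idea concrete, here is a minimal Python sketch of the two TwiML documents involved: one that asks whoever answers to press a key, and one that plays the message only once a key press arrives. The function names, URLs, prompt wording and the 10 second timeout are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def screening_twiml(confirm_url):
    """Ask the person who picked up to press any key before continuing."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "numDigits": "1",
        "timeout": "10",
        "action": confirm_url,  # hypothetical endpoint in your app
    })
    ET.SubElement(gather, "Say").text = (
        "You have a recorded message. Press any key to hear it."
    )
    # Nobody pressed a key, so give up on this attempt.
    ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


def message_twiml(digits, message_url):
    """Play the message only if the screening step received a key press."""
    response = ET.Element("Response")
    if digits:
        ET.SubElement(response, "Play").text = message_url
    else:
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


print(screening_twiml("/confirm"))
print(message_twiml("5", "https://example.com/message.mp3"))
```

A key press is what tells you a human is on the line, so no fixed Pause length has to be guessed.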

Sending tones via a manual process with Twilio

Our call center deals with businesses and we use Twilio to make our calls. However, many businesses have a menu to navigate before we get to talk to someone. How can I create a 10-key pad on our end and use it to send menu selections to the call we are connected with?
I know about the sendDigits attribute for dialing numbers with Twilio, but this sends preprogrammed tones. We have no way of knowing what the tones need to be until we are connected and in the menu, so this won't work.
I've been through the API pretty thoroughly and can't seem to find anything relating to this.
If there is nothing, can anyone recommend other software that allows making outbound calls, generating recordings of calls, and sending key tones manually after the call has started?
Check out the digits attribute of the 'Play' tag.
https://www.twilio.com/docs/api/twiml/play#attributes-digits
Each 'w' character tells Twilio to wait 0.5 seconds instead of playing a digit.
Assuming I am understanding your problem, could you not use MP3s of DTMF tones (http://jetcityorange.com/dtmf/) and <Play> to send the tones after the call has started?
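For the keypad idea, here is a small Python sketch of just the TwiML-building part: it takes whatever keys the agent pressed on your on-screen 10-key pad and wraps them in the digits attribute of Play. The function name and the allowed-character set are assumptions for the example, and how that TwiML gets delivered to the call that is already in progress is outside the sketch.

```python
import xml.etree.ElementTree as ET

# Keypad characters plus 'w' (a 0.5 second wait).
ALLOWED_KEYS = set("0123456789*#w")


def keypad_twiml(keys):
    """Return TwiML that plays the given keypad presses as DTMF tones."""
    if not keys or any(key not in ALLOWED_KEYS for key in keys):
        raise ValueError(f"unsupported keypad input: {keys!r}")
    response = ET.Element("Response")
    ET.SubElement(response, "Play", {"digits": keys})
    return ET.tostring(response, encoding="unicode")


print(keypad_twiml("3"))
print(keypad_twiml("w12#"))
```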

Is it possible to transcribe a Twilio call "as you speak"?

Does anyone know if it is possible with Twilio to create multiple audio recordings during a call based on some kind of audio flag or pattern, such as silence? That way you could fire a callback at the end of each portion of speech to generate text during the call.
thank...
Twilio Evangelist here.
So, you could use the timeout attribute on the <Record> verb to get short 'bursts' of spoken text, but this may mean you time out while the caller is speaking a word. So you would only get half of it! This may make it difficult to decipher what is being said, and I would personally not use this approach.
You can end recording on a key-press (a DTMF tone) with the finishOnKey attribute, which may help your needs.
You cannot currently get a live, or near realtime transcription. You will receive the transcription very quickly, but we only support the timeout and key presses to end a recording and begin transcription.
Hope this helps!
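For reference, the 'bursts' approach described above can be sketched in Python as follows. The timeout, finishOnKey, transcribe and transcribeCallback attributes are standard <Record> attributes; the function name, the URLs and the 2 second silence timeout are assumptions, and the caveat about cutting a caller off mid-word still applies.

```python
import xml.etree.ElementTree as ET


def record_burst_twiml(action_url, transcribe_callback_url, silence_timeout=2):
    """Return TwiML that records until a short silence or the # key."""
    response = ET.Element("Response")
    ET.SubElement(response, "Record", {
        "timeout": str(silence_timeout),  # seconds of silence that end the recording
        "finishOnKey": "#",
        "transcribe": "true",
        "transcribeCallback": transcribe_callback_url,  # hypothetical endpoint
        "action": action_url,  # hypothetical endpoint; can return this TwiML again
    })
    return ET.tostring(response, encoding="unicode")


print(record_burst_twiml("/next-burst", "/transcript"))
```

If the action URL answers with the same TwiML, the call keeps producing one short recording (and one transcription callback) per pause.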
