Twilio: multiple-step recording

I want to get four related pieces of information from a caller, and get recordings of each. Do I need to implement this as four separate calls (Say/Record pairs in the XML file) to the Twilio API and my web endpoint (the Record 'action' or StatusCallback)? I'm using Python and Flask, but examples in other languages and frameworks would be helpful too.

Twilio evangelist here.
Four separate recordings is probably the easiest way to do what you want.
<Response>
  <Say>Begin Recording 1</Say>
  <Record/>
  <Say>Begin Recording 2</Say>
  <Record/>
  <Say>Begin Recording 3</Say>
  <Record/>
  <Say>Begin Recording 4</Say>
  <Record/>
  <Say>Thanks for recording four things</Say>
</Response>
The hard part is going to be knowing that the user actually gave you the right information, so you might want to think about adding a step at the end of your call flow that allows your caller to hear their recording and choose to re-record the info if they want.
Hope that helps.
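Since the question mentions Python, here is a minimal sketch of one way to chain the four Say/Record pairs, with each Record's action pointing at the next step. The `/record/<step>` paths, the `PROMPTS` list, and the `maxLength` and `finishOnKey` values are assumptions chosen for illustration. A Flask view would return the string produced here with a `text/xml` content type and could store the `RecordingUrl` parameter that Twilio sends to the action URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical prompts, one per piece of information to collect.
PROMPTS = [
    "Begin Recording 1",
    "Begin Recording 2",
    "Begin Recording 3",
    "Begin Recording 4",
]


def twiml_for_step(step: int) -> str:
    """Return TwiML for a 0-based step, or a closing message once all steps are done."""
    response = ET.Element("Response")
    if step < len(PROMPTS):
        ET.SubElement(response, "Say").text = PROMPTS[step]
        # The action URL is an assumed route that serves the next step.
        ET.SubElement(response, "Record", {
            "action": f"/record/{step + 1}",
            "maxLength": "30",
            "finishOnKey": "#",
        })
    else:
        ET.SubElement(response, "Say").text = "Thanks for recording four things"
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(twiml_for_step(0))
    print(twiml_for_step(len(PROMPTS)))
```

Splitting the flow into one request per step also gives you a natural place to add the review and re-record step: the handler for a step can play back the recording before moving on.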

Related

Play live stream on twilio hold

I have a need to play LIVE audio to enqueued twilio calls (instead of playing hold music from an mp3 file for instance).
I've tried pointing the hold music to a live mp3 stream (icecast), which didn't work.
The only thing I can think of is to start a conference, put a call into it that is sending the audio I need to be played on hold, and then on-hold calls are placed (muted) into that conference.
That doesn't seem like the best way, though, and I'd like to avoid conference costs (there are millions of minutes per month of on-hold time).
Is there a more elegant solution for this problem?
Twilio developer evangelist here.
As far as I am aware this is not possible with <Play>. When you give Twilio an mp3 file to play it first downloads and caches the file (if the headers allow for it). When working with mp3 files Twilio expects an existing file and a finite file size.
I think using a conference, or a series of direct one to one calls, to play the stream as you suggested is likely the best solution. If you do have millions of minutes per month then I recommend you get in touch with the Twilio sales team who might be able to make those minutes more affordable.

Real-time speech recognition from a phone call recorded with Twilio

I'm currently using Twilio to make phone calls and I'd like to add a speech recognition element such that if a user says a specific phrase, my backend can take specific actions. If you're familiar with Twilio, something akin to the Gather verb. It needs to be real-time since if there are issues with recognition, the user would be prompted for clarification.
To add speech recognition to the Twilio Gather verb, add "speech" to the Gather input value, for example: input="dtmf speech". After the caller says something and then goes quiet, the Twilio server transcribes the speech to text and sends the text to the action URL, then waits for response instructions. Your program can use the text to respond however you choose. One choice is to have your program respond with correction instructions (Say verb) and have the caller say something more, which would be processed again by your action URL.
Twilio Gather documentation including the implementation of speech recognition:
https://www.twilio.com/docs/api/twiml/gather
Example TwiML with a Gather verb using the speech recognition identifier.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Gather input="dtmf speech" language="en-us"
          numDigits="1"
          timeout="6"
          action="http://hostname/processUserResponse.py">
    <Say voice="alice" language="en-CA">
      Okay, speech recognition test. Enter any digit or say something.
    </Say>
  </Gather>
  <Say voice="alice" language="en-CA">
    Waited too long to say something. Response canceled ....
  </Say>
</Response>
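To illustrate the action URL side of this, here is a minimal sketch of a handler that inspects the `SpeechResult` and `Digits` parameters Twilio posts after a Gather and either confirms or prompts again. The expected phrase, the `/processUserResponse` path, and the function name `handle_gather` are assumptions for illustration; in a real web framework the `params` dict would come from the request's form data.

```python
import xml.etree.ElementTree as ET

# Hypothetical phrase the backend is listening for.
EXPECTED_PHRASE = "cancel my order"


def handle_gather(params: dict) -> str:
    """Build a TwiML reply from the parameters posted to the Gather action URL."""
    speech = params.get("SpeechResult", "").strip().lower()
    digits = params.get("Digits", "")
    response = ET.Element("Response")
    if digits or EXPECTED_PHRASE in speech:
        ET.SubElement(response, "Say").text = "Got it."
    else:
        # Nothing usable was recognized: ask for clarification and gather again.
        gather = ET.SubElement(response, "Gather", {
            "input": "dtmf speech",
            "numDigits": "1",
            "timeout": "6",
            "action": "/processUserResponse",  # assumed route
        })
        ET.SubElement(gather, "Say").text = (
            "Sorry, I did not catch that. Please try again."
        )
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(handle_gather({"SpeechResult": "Cancel my order."}))
    print(handle_gather({"SpeechResult": "hello"}))
```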
This was briefly covered here: https://stackoverflow.com/a/30224103/6189694
Seems like you would have to set up a conference call, and then join in as a muted user to listen in on the call.
I don't believe there is anything that works in real-time to do this. You could, however, use voice recording, pass the recording to another service (IBM's Watson Speech to Text comes to mind) and then handle it from there. It should be able to do this relatively quickly with the right workflow. I have never used Watson, just seen it used. So I am not sure on how long it would take to process the recording. I would think one or two word commands should be completed quickly.
Sorry I can't provide more guidance. Someone else in the community may have another method.
C# .NET Core IVR Gather example using a list of enums instead of the combined enum available in the official old C# example, as per my comment above (I also had to convert the url.actionurl to this monstrosity):
List<Gather.InputEnum> bothDtmfAndSpeech =
    new List<Gather.InputEnum>(2) {
        Gather.InputEnum.Dtmf, Gather.InputEnum.Speech
    };
var gather = new Gather(
    action: new Uri(Url.Action("Show", "Menu")),
    numDigits: 1, input: bothDtmfAndSpeech, bargeIn: true);
The IBM Watson Speech To Text service (STT) has this capability; it is called Keyword Spotting (https://www.ibm.com/watson/developercloud/doc/speech-to-text/output.shtml). Watson STT lets you push a live stream of telephony audio and produces not only recognition hypotheses but can also detect whether the user said sentences or commands specified beforehand. There is a demo that showcases this functionality, please give it a try:
https://speech-to-text-demo.mybluemix.net/

Twilio Recording: Pause and Resume

I believe the answer is no, but does Twilio provide the ability to pause/resume a recording? The use case is recording a call, but pausing the recording when collecting sensitive information. From the REST documentation, it doesn't appear to be a supported capability. I thought someone might have found some options for this requirement.
This is possible, though it's not wholly obvious how from the documentation.
You can modify call state using the REST API, as per https://www.twilio.com/docs/api/rest/change-call-state , and we basically use it to tell the call to re-dial to the same agent (presumably this is a call centre?) but with no-record, and then again with record re-activated once we're done.
You end up with two separate recordings for the call, which in our case we download, stitch together, and store back to our storage platform.
Edit:
Having discussed this issue with Twilio support, there's another possibility which allows you to just have a single recording.
Instead of dialling the two ends of the call together, you instead put them both into a conference that's recorded when you initially connect the call. When you want to pause it, using the REST API, you add a new "hold" leg into the conference, then move the two real ends of the call onto a new conference that isn't recorded. When you're done, you move them back again and it's "unpaused". You then only have one recording from the original conference.
None of these is ideal, and apparently they are working on a proper support setup for this (fairly obvious!) requirement, but this should solve it for now.
The Recording Pause & Resume feature is now supported in the Twilio API. Here's a link that gets you started:
https://support.twilio.com/hc/en-us/articles/360010199074-Getting-Started-with-Call-Recording-Controls#pause_resume
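As a rough sketch of what a pause or resume request might look like from Python, the helper below only builds the HTTP request and does not send it. The endpoint shape, the `Status` values, and the `Twilio.CURRENT` placeholder reflect my understanding of the recording controls and should be checked against the article linked above; the function name and the SIDs in the usage lines are hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

ALLOWED_STATUSES = ("paused", "in-progress", "stopped")


def build_recording_update(account_sid: str, auth_token: str, call_sid: str,
                           status: str,
                           recording_sid: str = "Twilio.CURRENT") -> Request:
    """Build (but do not send) a POST that changes a call recording's status."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    # Assumed endpoint layout; verify against the current Twilio docs.
    url = (
        "https://api.twilio.com/2010-04-01/Accounts/"
        f"{account_sid}/Calls/{call_sid}/Recordings/{recording_sid}.json"
    )
    body = urlencode({"Status": status}).encode()
    token = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return Request(url, data=body, method="POST",
                   headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder credentials and SIDs, for illustration only.
    pause = build_recording_update("ACxxxx", "secret", "CAxxxx", "paused")
    resume = build_recording_update("ACxxxx", "secret", "CAxxxx", "in-progress")
    print(pause.get_method(), pause.full_url, pause.data)
    print(resume.get_method(), resume.full_url, resume.data)
```

Passing the returned object to `urllib.request.urlopen` would send it; pausing just before the sensitive part of the call and setting the status back to in-progress afterwards would cover the use case in the question.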

Sending tones via a manual process with Twilio

Our call center deals with businesses and we use Twilio to make our calls. However, many businesses have a menu to navigate before we get to talk to someone. How can I create a 10-key pad on our end and use it to send menu selections to the call we are connected with?
I know about the senddigits attribute on Dialing numbers with Twilio, but this sends preprogrammed tones. We have no way of knowing what the tones need to be until we are connected and in the menu, so this won't work.
I've been through the API pretty thoroughly and can't seem to find anything relating to this.
If there is nothing, is there another software that anyone can recommend that allows for making calls out, generating recordings of calls and allows me to send keytones manually after the call has been started?
Check out the digits attribute of the 'Play' tag.
https://www.twilio.com/docs/api/twiml/play#attributes-digits
Each 'w' character tells Twilio to wait 0.5 seconds instead of playing a digit.
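A minimal sketch of generating that TwiML from a key pressed on a keypad in your own UI, assuming your application has some way of delivering the resulting document to the live call. The function name, the `lead_pause` parameter, and the set of accepted characters are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET


def play_digits_twiml(keys: str, lead_pause: int = 0) -> str:
    """Return TwiML that plays DTMF tones for keys, after lead_pause half-second waits."""
    # Accept only keypad characters and 'w' (a 0.5 second wait).
    if not re.fullmatch(r"[0-9*#w]+", keys):
        raise ValueError(f"invalid DTMF sequence: {keys!r}")
    response = ET.Element("Response")
    ET.SubElement(response, "Play", {"digits": "w" * lead_pause + keys})
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # One second of waiting, then the "1" key.
    print(play_digits_twiml("1", lead_pause=2))
```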
Assuming I am understanding your problem, could you not use MP3s of DTMF tones (http://jetcityorange.com/dtmf/) and PLAY to send the tones after the call has started?

Is it possible to transcribe a Twilio call "as you speak"?

Does anyone know if it is possible with Twilio to create multiple audio recordings during a call based on some kind of audio flag or pattern, such as silence? That way you could fire a callback at the end of each portion of speech to generate text during the call.
thank...
Twilio Evangelist here.
So, you could use the timeout attribute on the <Record> verb to get short 'bursts' of spoken text, but this may mean you time out while the caller is speaking a word. So you would only get half of it! This may make it difficult to decipher what is being said, and I would personally not use this approach.
You can end recording on a key-press (a DTMF tone) with the finishOnKey attribute, which may help your needs.
You cannot currently get a live, or near realtime transcription. You will receive the transcription very quickly, but we only support the timeout and key presses to end a recording and begin transcription.
Hope this helps!
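For reference, here is a minimal sketch of a Record verb that combines the two ways of ending a recording mentioned above with transcription turned on. The `/next-chunk` and `/transcription` paths, the function name, and the three-second default are assumptions for illustration; Twilio posts the transcription text to the transcribeCallback URL once it is ready, which happens after the recording ends rather than live.

```python
import xml.etree.ElementTree as ET


def record_chunk_twiml(silence_seconds: int = 3, finish_key: str = "#") -> str:
    """Return TwiML that records until silence or a key press, then transcribes."""
    response = ET.Element("Response")
    ET.SubElement(response, "Record", {
        "timeout": str(silence_seconds),   # seconds of silence that end the recording
        "finishOnKey": finish_key,         # key press that ends the recording
        "transcribe": "true",
        "transcribeCallback": "/transcription",  # assumed route
        "action": "/next-chunk",                 # assumed route
    })
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(record_chunk_twiml())
```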
