How to use the Twilio bi-directional stream feature to play raw audio data

I'm using Twilio Programmable Voice to process phone calls.
I want to use the bi-directional stream feature to send raw audio data for Twilio to play. The initialization code looks like this:
from twilio.twiml.voice_response import Connect, VoiceResponse, Stream
response = VoiceResponse()
connect = Connect()
connect.stream(url='wss://mystream.ngrok.io/audiostream')
response.append(connect)
Then, when I get the WSS connection from Twilio, I start sending raw audio data to Twilio, like this:
async def send_raw_audio(self, ws, stream_sid):
    print('send raw audio')
    import base64
    import json
    with open('test.wav', 'rb') as wav:
        while True:
            frame_data = wav.read(1024)
            if len(frame_data) == 0:
                print('no more data')
                break
            base64_data = base64.b64encode(frame_data).decode('utf-8')
            print('send base64 data')
            media_data = {
                "event": "media",
                "streamSid": stream_sid,
                "media": {
                    "playload": base64_data
                }
            }
            media = json.dumps(media_data)
            print(f"media: {media}")
            await ws.send(media)
    print('finished sending')
test.wav is a WAV file encoded as audio/x-mulaw with a sample rate of 8000 Hz.
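It is worth double-checking that test.wav really is mu-law at 8000 Hz before streaming it. A minimal sketch (the function name is illustrative and it assumes a canonical 44-byte RIFF header; Python's built-in wave module rejects non-PCM files such as mu-law, so the fmt chunk fields are read directly with struct):

```python
import struct

def wav_fmt_info(header: bytes):
    """Return (format_code, channels, sample_rate) from a canonical
    44-byte RIFF/WAVE header; format code 7 means mu-law."""
    return struct.unpack_from('<HHI', header, 20)

# Demo with a synthetic mu-law header rather than a real file:
hdr = bytearray(44)
struct.pack_into('<HHI', hdr, 20, 7, 1, 8000)
print(wav_fmt_info(bytes(hdr)))  # (7, 1, 8000)
```

For a real file, pass the first 44 bytes of test.wav to the same helper.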
But when it runs, I can't hear anything, and the Twilio console shows:
31951 - Stream - Protocol - Invalid Message
Possible Causes
- Message does not have JSON format
- Unknown message type
- Missing or extra field in message
- Wrong Stream SID used in message
I have no idea which part is wrong. Does anyone know what my problem is? I can't find an example of this scenario; I just followed the instructions here. I would really appreciate it if someone could point me to an example. Thanks.

Not sure if this will fix it, but I use .decode("ascii"), not "utf-8".

The question is probably not relevant anymore, but I came across this while debugging my own bi-directional stream, so it might be useful for someone:
The main reason you were receiving this error is a typo in the JSON content: you are sending "playload" instead of "payload".
Another issue when sending data to a Twilio stream is that you should send a mark message at the end of the data stream to notify Twilio that the complete payload was sent. https://www.twilio.com/docs/voice/twiml/stream#message-mark-to-twilio
When sending data back to a Twilio stream, also be aware that the payload must not contain the audio file's header bytes, so make sure you remove them from your recording, or alternatively skip them while sending data to Twilio.
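Putting those fixes together, a minimal sketch (the helper names are illustrative; `ws` is any object with an async `send()`, e.g. from the websockets package, and the fixed 44-byte header skip assumes a canonical WAV header):

```python
import base64
import json

def media_message(stream_sid: str, frame: bytes) -> str:
    """One Media Streams media message -- note "payload", not "playload"."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(frame).decode('ascii')},
    })

def mark_message(stream_sid: str, name: str = "audio_complete") -> str:
    """Mark message telling Twilio the complete payload was sent."""
    return json.dumps({
        "event": "mark",
        "streamSid": stream_sid,
        "mark": {"name": name},
    })

async def send_raw_audio(ws, stream_sid):
    with open('test.wav', 'rb') as wav:
        wav.read(44)  # skip the WAV header: Twilio expects raw mu-law bytes
        while frame := wav.read(1024):
            await ws.send(media_message(stream_sid, frame))
    await ws.send(mark_message(stream_sid))
```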

Related

How can I make sure the Twilio Video dataTrack message gets sent and received?

I have a simple dataTrack for twilio-video
let dataTrack = new LocalDataTrack()
const tracks = track.concat(dataTrack)
room = await connectToRoom(roomConnectId, tracks)
....
function closeRoom() {
    dataTrack.send(JSON.stringify('disconnected'))
}
How can I make sure this message is sent and received? Can I wrap it in a loop that keeps trying until it works? Are there any other options I can add to make it work?
I need the receiver to get this message 100% of the time. Is that possible?
The DataTrack API uses WebRTC data channels which cannot 100% guarantee delivery. From the docs on configuring DataTrack reliability:
DataTracks are intended for low-latency communication between Participants. Importantly, to optimize for lowest latency possible, delivery of DataTrack messages is not guaranteed. You can think of them more like UDP messages, rather than TCP.
However there are some extra reliability features with the Twilio Video DataTrack.
You can set two parameters for the DataTrack API:
maxPacketLifeTime - the time in milliseconds during which the DataTrack will transmit or retransmit a message until that message is acknowledged
maxRetransmits - the maximum number of retransmit attempts that will be made.
Check out more on using these parameters in the Twilio Video DataTrack documentation.

How to change TWIML during a Media Streams call in Twilio

I would like to implement a fully programmatic interaction with Twilio: the user calls, my server decides what to say, the user speaks, the server analyses the audio and decides what to answer using text-to-speech.
But I can't find in the documentation how to use Media Streams and text-to-speech at the same time.
With this code I can receive and send ulaw/8000 encoded audio:
@sockets.route('/')
def echo(ws):
    while not ws.closed:
        message = ws.receive()
        if message is None:
            continue
        data = json.loads(message)
        if data['event'] == "media":
            # b64decode media.payload and audioop.ulawtolin() it
            ...
            # make a media object with audio in media.payload and ws.send() it
With this I can say something when the user answers the call:
@app.route("/voice", methods=['GET', 'POST'])
def voice():
    """Respond to incoming phone calls with a 'Hello world' message"""
    resp = VoiceResponse()
    resp.say("hello world!", voice='alice')
    return str(resp)
How can I use the two simultaneously, during a long user-server interaction?

WhatsApp audio media message (MediaUrl0) Transcribe to text

I have a Dialogflow chatbot that communicates with WhatsApp for Business users through Twilio.
I would like to enhance the chatbot's capability and allow WhatsApp users to also send voice messages.
WhatsApp voice media messages sent to Twilio have a URI parameter with the location of the media file, but this URI does not have a file extension. How can I extract the file, send it to a speech-to-text service (Google or AWS) to have it transcribed into text, and then send the text to Dialogflow for intent recognition?
Any ideas how I would go about doing this?
Twilio message log for a media message:
Request Inspector
POST https://xxxxxxxxxxxx
2021-04-27 08:35:39 UTC (502)
Request parameters:
MediaContentType0 "audio/ogg"
SmsMessageSid "MMea4e6bcb3a9654a03d8d2a607c6d4cdd"
NumMedia "1"
ProfileName "xxxxx"
SmsSid "MMea4e6bcb3a9654a03d8d2a607c6d4cdd"
WaId "xxxxxxxxx"
SmsStatus "received"
Body ""
To "whatsapp:+32460237475"
NumSegments "1"
MessageSid "MMea4e6bcb3a9654a03d8d2a607c6d4cdd"
AccountSid "ACef27744806d8f8e68f25211b2ba8af60"
From "whatsapp:+32474317098"
MediaUrl0 "https://api.twilio.com/2010-04-01/Accounts/ACef27744806d8f8e68f25211b2ba8af60/Messages/MMea4e6bcb3a9654a03d8d2a607c6d4cdd/Media/ME27fbc66d47d8de49f1ae00e433884097"
ApiVersion "2010-04-01"
Message text:
sourceComponent "14100"
httpResponse "502"
url "https://xxxxxxxxx"
ErrorCode "11200"
LogLevel "ERROR"
Msg "Bad Gateway"
EmailNotification "false"
I think you don't need the extension for this use case; you will probably need the language code for the resulting text, and maybe the audio encoding and sample rate for the transcription service.
Here are some examples from my code for Watson / Google Cloud Speech-to-Text and Dialogflow. AWS and Microsoft are very similar.
// for IBM Watson
RecognizeOptions recognizeOptions = new RecognizeOptions.Builder()
    .model(RecognizeOptions.Model.ES_ES_NARROWBANDMODEL)
    .audio(new ByteArrayInputStream(bytes))
    .contentType(HttpMediaType.AUDIO_WAV)
    .build();
// Google Speech-to-Text
RecognitionConfig config = RecognitionConfig.newBuilder()
    .setSampleRateHertz(48000)
    .setLanguageCode(langcode)
    .setEncoding(RecognitionConfig.AudioEncoding.OGG_OPUS)
    .build();
// Dialogflow (sending audio directly)
InputAudioConfig inputAudioConfig = InputAudioConfig
    .newBuilder()
    .setLanguageCode(langcode)
    .setSampleRateHertz(sampleRateHertz)
    .build();
In the end, in all cases, what you send to the service is not a file but an array of bytes (more or less).
Anyway, even though there is no one-to-one relation between content type and file extension, the "MediaContentType0" parameter in the request gives you a good starting point: "audio/ogg".
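To get the bytes and content type in the first place, a sketch along these lines can work (the helper names are illustrative; Twilio media URLs accept HTTP Basic auth with your account SID and auth token):

```python
import base64
import urllib.request

def basic_auth_header(account_sid: str, auth_token: str) -> str:
    """Build the Basic auth header value for a Twilio media URL."""
    creds = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return f"Basic {creds}"

def fetch_media(media_url: str, account_sid: str, auth_token: str):
    """Download the media file; returns (content_type, raw_bytes)."""
    req = urllib.request.Request(media_url)
    req.add_header("Authorization", basic_auth_header(account_sid, auth_token))
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Type"), resp.read()
```

The returned content type ("audio/ogg" here) then maps to the encoding you pass to the transcription service, e.g. OGG_OPUS for Google.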

Click Noise after sending Audio Data to server

I am trying to record sound and send it to a server.
When I record the sound there is no clicking noise, but after sending it to the server I hear a click sound.


Since the server expects a raw WAV stream without any "RIFF" header, I am removing the header and sending it to the server.

Here is my code
var newAudioData = audioData.advanced(by: 44)

I am removing 44 bytes from the front of the original buffer and sending newAudioData to the server.

How is that click sound produced?

Thanks in advance
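For reference, a fixed 44-byte skip assumes the file has no extra chunks (LIST, fact, etc.); leftover header bytes rendered as audio are one plausible source of a click. A sketch that walks the RIFF chunks instead (illustrative, in Python rather than the Swift above):

```python
import struct

def data_chunk_offset(wav_bytes: bytes) -> int:
    """Walk the RIFF chunks to find where the 'data' chunk's audio
    actually starts, instead of assuming a fixed 44-byte header."""
    pos = 12  # skip 'RIFF' + size + 'WAVE'
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (size,) = struct.unpack_from('<I', wav_bytes, pos + 4)
        if chunk_id == b'data':
            return pos + 8
        pos += 8 + size + (size & 1)  # chunks are word-aligned
    raise ValueError('no data chunk found')
```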

Twilio API for making outbound calls with a speech stream

I have a scenario where, say at 5:00 AM every morning, a server-side script / batch job wakes up, selects a phone number from a list based on an algorithm, places a call to that number, and uses text-to-speech to deliver a customized message. I have two questions:
Which Twilio API can I use to achieve this? Bear in mind there is no app UI and all the code would be on the back end. Think NodeRED flow or a Python script that is made to run at a given time.
Instead of specifying the text in the TwiML, can I pass say an audio stream from Watson's Text to Speech to the appropriate Twilio API?
To do this, you would need to use the Programmable Voice API from Twilio. This lets you play audio files, use text-to-speech, make and manipulate phone calls, and so on. I have never used Watson Text-to-Speech, but if it can output an audio file, you can play that with Twilio TwiML.
Here's an example in Node.
npm install twilio
// require the Twilio module and create a REST client
var client = require('twilio')('ACCOUNT_SID', 'AUTH_TOKEN');
client.makeCall({
    to: '+16515556677',   // Any number Twilio can call
    from: '+14506667788', // A number you bought from Twilio
    url: 'url/to/twiml/which/may/have/WatsonURL' // A URL that produces TwiML
}, function(err, responseData) {
    // executed when the call has been initiated
    console.log(responseData.from); // outputs "+14506667788"
});
The TwiML could look like this:
<Response>
    <Play loop="1">https://api.twilio.com/cowbell.mp3</Play>
</Response>
This would play the cowbell sound from the Twilio API. Just a default sound. This could be easily generated to play a Watson sound file if you can get a URL for that.
You could do the same thing in Node, if you'd rather not build the XML manually.
var resp = new twilio.TwimlResponse();
resp.say('Welcome to Twilio!')
    .pause({ length: 3 })
    .say('Please let us know if we can help during your development.', {
        voice: 'woman',
        language: 'en-us'
    })
    .play('http://www.example.com/some_sound.mp3');
If you call toString() on this, it outputs formatted XML (TwiML):
console.log(resp.toString());
This outputs:
<Response>
    <Say>Welcome to Twilio!</Say>
    <Pause length="3"></Pause>
    <Say voice="woman" language="en-us">Please let us know if we can help during your development.</Say>
    <Play>http://www.example.com/some_sound.mp3</Play>
</Response>
Hopefully this clears it up for you.
Scott
