Google Cloud Speech-to-Text DTMF support? - google-cloud-speech

We also need a way to identify DTMF tones.
Scenario:
During a phone conversation we ask the user "Please enter your ID number".
We stream the audio to Google Cloud Speech-to-Text.
We would like to support both options at the same time: (a) the user says the ID number, or (b) the user enters the ID number using the phone keys (DTMF).

It seems that DTMF cannot be detected using Cloud Speech-to-Text. I've opened a public issue on your behalf requesting this feature. You can star the issue and follow its progress.
As a workaround, you can use the spectrogram function from the scipy.signal library to detect the unique DTMF frequencies.
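To make the idea of detecting the DTMF frequencies concrete, here is a minimal sketch. Note that it swaps the spectrogram approach for the Goertzel algorithm, a common alternative when only a few known frequencies matter, so that it runs with the Python standard library alone. The function names, the min_ratio threshold, and the assumption of 8 kHz mono samples given as floats are all illustrative and would need tuning against real call audio.

```python
import math

# Standard DTMF keypad: each key is one low (row) tone plus one high (column) tone, in Hz.
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, sample_rate, freq):
    """Return the signal power at `freq` using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect_dtmf_key(samples, sample_rate=8000, min_ratio=5.0):
    """Return the DTMF key present in one short audio frame, or None."""
    low = [goertzel_power(samples, sample_rate, f) for f in LOW_FREQS]
    high = [goertzel_power(samples, sample_rate, f) for f in HIGH_FREQS]
    row = max(range(4), key=low.__getitem__)
    col = max(range(4), key=high.__getitem__)
    # Assumed heuristic: the strongest tone in each group must clearly
    # dominate the other three, otherwise treat the frame as "no key".
    for powers, best in ((low, row), (high, col)):
        others = sum(p for i, p in enumerate(powers) if i != best)
        if powers[best] < min_ratio * max(others, 1e-12):
            return None
    return KEYPAD[row][col]


def make_tone(key, sample_rate=8000, duration=0.1):
    """Synthesize a DTMF tone for `key` (test helper, not part of detection)."""
    row = next(i for i, keys in enumerate(KEYPAD) if key in keys)
    col = KEYPAD[row].index(key)
    n = int(sample_rate * duration)
    return [
        0.5 * math.sin(2 * math.pi * LOW_FREQS[row] * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * HIGH_FREQS[col] * t / sample_rate)
        for t in range(n)
    ]


if __name__ == "__main__":
    print(detect_dtmf_key(make_tone("5")))
```

In the streaming scenario from the question, a detector like this could run on short frames of the same audio that is being sent to Speech-to-Text, so that spoken digits and key presses are handled side by side.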

Related

Can I set up my own WhatsApp photo and name, or use my own phone number, in the Twilio WhatsApp API? And a sandbox question

I am trying to build a program that sends notifications to our customers via WhatsApp when a product has been updated.
I have tried Twilio (sandbox), but I am a little confused about the Twilio WhatsApp API. Here are my questions:
If I use the "Enabled WhatsApp Senders" function, can I use my own phone number, or only the Twilio phone number (trial number)?
If I use the "Enabled WhatsApp Senders" function, can I set the WhatsApp photo and name?
After using the "Enabled WhatsApp Senders" function, do I need to add all customer numbers to the sandbox, or can I send messages directly to customers from code?
After using the "Enabled WhatsApp Senders" function, do I always need to be logged in to WhatsApp? I ask because I tried another WhatsApp API that requires staying logged in to WhatsApp for the API to work.
You can use a Twilio number in your account that has been WhatsApp enabled. There is a process you must follow to get a Twilio number WhatsApp enabled, which is defined here and here and here.
A picture is part of the WhatsApp business profile creation, covered above. Also, pay attention to the behavior difference between a WhatsApp business account and a verified business account for very large brands.
Once your account has its own WhatsApp enabled number, you no longer need to have users subscribe to the shared Sandbox number. You do need to respect the 24-hour window for free-form communications and use templates for out-of-session communications, as covered in the documentation I shared above, and here.
There is no need to be logged into WhatsApp. In fact, the Twilio WhatsApp integration supports read receipts, so you know when the end user has read your message.

Open Google Voice app with specific number

I want to implement a call feature without sharing my actual SIM number, so I am trying to use Google Voice for that. Is it possible to open the Google Voice app and automatically select a specific number? I will perform the call action manually.
Is this possible?

Speech to Text using Twilio

We use Microsoft Bot Framework for our chatbots, and we would like to add a voice channel to our bot. Is there a way to do this? Does Twilio have anything that can add speech capabilities to our bot? Our bots are exposed via webchat components, Skype, Facebook Messenger, etc.
Twilio developer evangelist here.
There's no way within Bot Framework to add voice capabilities from Twilio. However, receiving calls works in a similar way: when someone calls your Twilio number, you receive a webhook, which you can respond to with TwiML to tell Twilio what to do with the call.
To then perform things by voice action you can <Record> the caller's response and set the transcribe parameter to true. You also need to set a transcribeCallback URL as the transcription is done asynchronously. Once you receive that callback, the text of the transcription will be available as a parameter in the request. You could also perform the transcription yourself with a third party service by just taking the recording and sending it off.
Once you receive the transcription, you can decide on the next step of the conversation and redirect the live call to that step of your process using the REST API.
This is just a high level overview of how you might accomplish this. Let me know if it is of any help.
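As a rough illustration of the <Record> step described above, the following sketch builds the TwiML response as a plain XML string using only the Python standard library. The function name, the prompt text, and the callback URL are hypothetical; transcribe and transcribeCallback are the attributes named in the answer, while maxLength is an additional <Record> attribute added here as an assumption about what you would want.

```python
import xml.etree.ElementTree as ET


def build_record_twiml(prompt, transcribe_callback_url, max_length=30):
    """Return TwiML that speaks a prompt, then records and transcribes the reply."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = prompt
    ET.SubElement(
        response,
        "Record",
        {
            # Transcription is asynchronous, so Twilio posts the result
            # to the callback URL once it is ready.
            "transcribe": "true",
            "transcribeCallback": transcribe_callback_url,
            "maxLength": str(max_length),
        },
    )
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


if __name__ == "__main__":
    # Hypothetical callback URL; replace with an endpoint you control.
    print(build_record_twiml(
        "Please say your question after the beep.",
        "https://example.com/transcription",
    ))
```

Your webhook handler would return this string with an XML content type. The handler behind the callback URL would then read the transcription text from the request parameters and pass it on to the bot.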
Voximal offers a product similar to Twilio's, but based on VoiceXML. The difference is that Voximal natively integrates most STT engines (Microsoft, Google, Watson, iSpeech) into the solution (you only need to set the key or the user/password to configure them). You use the built-in grammar "text" to transcribe. The processing is then very similar to Twilio's: you push the content to a chatbot engine (HTTP/XML/JSON), and you can play the result with a TTS engine.
Have a look at the Parrot example (a script that repeats everything you say using STT and TTS):
https://github.com/voximal/voicexml-examples/blob/master/parrot/parrot.vxml

Twilio: how to detect which participant is speaking in a conference

I'm working on implementing an audio-only conference app, something like Google Hangouts but without video.
In Google Hangouts, all participants can see which participant is currently speaking via visual feedback. In other words, when someone starts speaking, their avatar immediately comes to the foreground for all participants.
So, here are my questions about Twilio and its client SDKs:
Is there any way to detect the current speaker (and give some feedback to users)?
Is there any way to get the input level of a microphone via the SDKs?
Interesting use cases.
I think both can be achieved with Twilio. Here are my thoughts on how:
Detecting current speaker
What you essentially need is a flag that is shared globally across all participants. Whoever is speaking should be able to update it in real time, and the update should be pushed to the other participants, again in real time. Simply put, you want a shared resource where each Twilio Client can 'publish' and 'subscribe' to 'speaking' state. You could achieve this with Twilio Sync.
To do that, you could create a list object in Sync and add each participant whose audio level rises above the threshold at which you consider them to be speaking. All Client instances in the conference should subscribe to this list, so that on 'itemAdded' or 'itemRemoved' each instance can get the list of participants who are speaking. UI changes can then be made based on it.
You can get the audio level (output and input) at each client instance by querying Twilio Voice Insights. For the audio input level, the value is passed in a parameter named AudioLevelIn.
Note: Both these products require requesting access.
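The thresholding logic behind that shared list can be sketched independently of Twilio. The example below does not call Twilio Sync or Voice Insights. It only shows how one round of audio-level readings could be turned into the additions and removals that would be published to the shared list. The function name, the level scale, and the threshold value are hypothetical.

```python
def diff_speakers(previous, levels, threshold=0.1):
    """Compare new audio levels against the previous set of speakers.

    previous: set of participant names currently marked as speaking.
    levels:   dict mapping participant name to its latest input level
              (assumed here to be a float where higher means louder).
    Returns (speaking, added, removed) as sets of names.
    """
    speaking = {name for name, level in levels.items() if level >= threshold}
    added = speaking - previous    # would be added to the shared list
    removed = previous - speaking  # would be removed from the shared list
    return speaking, added, removed


if __name__ == "__main__":
    current = {"alice"}
    readings = {"alice": 0.02, "bob": 0.4, "carol": 0.0}
    current, added, removed = diff_speakers(current, readings)
    print(sorted(current), sorted(added), sorted(removed))
```

Publishing only the differences keeps the shared list small, and it maps directly onto the 'itemAdded' and 'itemRemoved' events that the other clients listen for.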

How do you retrieve a caller ID from a digital line or a VoIP phone?

At my workplace, we have some Business Communications Manager 450 (BCM450) PBX telephone systems and some BCM50 PBXs as well. The BCM450 is hybrid, which means it can use digital lines and VoIP phones at the same time.
Right now, for example, a user can have a Nortel Avaya 1120E or a Nortel T7316 Norstar on their desk.
I would like to know if there is a way to get the phone number of the caller, so that I can use that number in custom software for the company. In other words, I want the phone number of the person calling me (or the extension, if that is the case).
What I'm looking for is that, when a customer calls, the information associated with their telephone number is shown on the screen programmatically (without input from the system operator).
I have seen people mention the TAPI API, but I believe this is only for analog lines? Maybe somebody can put me on the right path, or provide an example of how it is done in any programming language.
If the VoIP phones are SIP-based, you could sniff the Ethernet ports of the phones. SIP messages contain the caller ID and the called ID.
You can search for "SIP sniffer" to find source code examples...
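As an illustration of the parsing half of that approach, here is a minimal sketch that pulls the caller and called user parts out of the From and To headers of a SIP message. Capturing the packets is not shown. The sample INVITE, the addresses in it, and the function name are made up for the example, and the parser ignores details such as folded headers.

```python
import re

# Matches "sip:user@host" or "sips:user@host"; the user part is optional.
SIP_URI = re.compile(r"sips?:(?:([^@;>\s]+)@)?([^;>\s]+)", re.IGNORECASE)

# Hypothetical captured message, with made-up numbers and hosts.
EXAMPLE_INVITE = "\r\n".join([
    "INVITE sip:200@pbx.example.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds",
    'From: "Alice" <sip:5551234@pbx.example.com>;tag=1928301774',
    "To: <sip:200@pbx.example.com>",
    "Call-ID: a84b4c76e66710@192.0.2.10",
    "CSeq: 314159 INVITE",
    "Content-Length: 0",
]) + "\r\n\r\n"


def parse_sip_parties(message):
    """Return (caller, called) user parts from a raw SIP message."""
    headers = {}
    for line in message.split("\r\n")[1:]:  # skip the request line
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), value.strip())

    def user_of(value):
        match = SIP_URI.search(value or "")
        return match.group(1) if match else None

    # "f" and "t" are the compact forms of the From and To headers.
    caller = user_of(headers.get("from") or headers.get("f"))
    called = user_of(headers.get("to") or headers.get("t"))
    return caller, called


if __name__ == "__main__":
    print(parse_sip_parties(EXAMPLE_INVITE))
```

The caller value could then be used as the lookup key for the customer record that the custom software displays.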
