I'm developing an iOS app that does voice based AI; i.e. it's meant to take voice input from the microphone, turn it into text, send it to an AI agent, then output the returned text through the speaker. I've got everything working, though using a button to start and stop recording the speech (SpeechKit for voice recognition, API.AI for the AI, Amazon's Polly for the output).
The piece that I need is to have the microphone always on and to automatically start and stop the recording of the user's voice as they begin and end talking. This app is being developed for an unorthodox context, where there will be no access to the screen for the user (but they will have a high-end shotgun mic for recording their text).
My research suggests this piece of the puzzle is known as 'Voice Activity Detection' and seems to be one of the hardest steps in the whole voice-based AI system.
I'm hoping someone can either supply some straightforward (Swift) code to implement this myself, or point me in the direction of some decent libraries / SDKs that I can implement in this project.
For good VAD algorithm implementation you can use py-webrtcvad.
It is a Python interface for C code, you can just import C files from the project and use them from swift.
Related
I'm new to iOS programming and I don't know where to start. I found code examples how to read frequencies from the microphone with AudioKit framework. But this is not what I am looking for. Is it possible to retrieving frequency of the currently playing song in real time without using a microphone?
Thank you for help.
The iOS security sandbox prevents apps from capturing general audio output of any other app, such as the Music app.
Certain music apps, such as GarageBand might share inter-app audio, but this isn't supported by the majority of apps that output "songs".
An app might play the "song" itself, via an AVAudioPlayer, and tap the AVPlayer's output to get raw sample data for spectral frequency and pitch analysis (two very different things, by-the-way).
Currently, I am working on developing the iOS App that triggers an event upon voice command.
I saw a camera app, where a user says "start recording," then the camera starts to the recording mode.
This is an in-app voice control capability, so I am thinking it is different from SiriKit or SpeechRecognizer, which I have already implemented.
How would I achieve it?
My question is NOT the voice dictation where a user has to press a button to start dictation.
App needs to passively wait for a keyword, or intent, which is something like "myApp, start recording" or "myApp, stop recording", then the app starts/stop that event function accordingly.
Thanks.
OpenEars : Free speech recognition and speech synthesis for the iPhone.
OpenEars makes it simple for you to add offline speech recognition in many languages and synthesized speech/TTS to your iPhone app quickly and easily. It lets everyone get the great results of using advanced speech app interface concepts.
Check out this link.
http://www.politepix.com/openears/
or
Building an iOS App like Siri
https://www.raywenderlich.com/60870/building-ios-app-like-siri
Thank you.
How would I achieve it?
There's an iOS 13 new feature called Voice Control that will allow you to reach your goal.
You can find useful information in the Customize Commands section where all the vocal commands are available (you can create a custom one as well):
For the example of the camera you mentioned, everything can be done vocally as follows:
I showed the items names to understand the vocal commands I used but they can be hidden if you prefer (hide names).
Voice Control is a built-in feature you can use inside your apps as well.
The only thing to do as a developer is eventually adapting the accessibilityUserInputLabels properties if you need specific names to be displayed for some items in your apps.
If you're looking for a voice command without pressing a button on iOS, the Voice Control is THE perfect candidate.
I'd like my iOS app to use text-to-speech to read to the user some information that it receives from a server, and I'd also like to allow the user to stop such speech by a voice command. I have tried speech recognition frameworks for iOS like OpenEars and I find the problem that it is listening and detecting the information the app itself is "saying" and it intereferes in the recognition of user's voice commands.
Has somebody dealt with this scenario in iOS and found a solution for that? Thanks in advance
It is not a trivial thing to implement. Unfortunately iOS and others record the sound which is playing through speaker. The only choice you have is to use the headset. In that case speech recognition can continue listening for input. In Openears recognition is disabled during TTS unless headset is plugged in.
If you still want to implement this feature which is called "barge-in" you have to do the following:
Store the audio you play though microphone
Implement noise cancellation algorithm which effectively will remove the audio from the recording. You can use cross-correlation to find a proper offset in the recording and spectral subtraction to remove the audio.
Recognize the speech in remaining signal.
It is not possible to do that without significant modification of openears sources.
Related question is Android Speech Recognition while music is playing
Currently I am developing an IOS app.I need once I open the camera through application then when i say "start recording" then recording should start automatically. Once i say "Cut" then it should stop. and ask for save video share etc.
My main concern is regarding voice command. recording should start stop through voice command.
Looking forward for experts suggestions.
Many Thanks
As Siri is not yet exposed to developers there are lot of third party sdks dat can be used for recognizing speech.
I have used
http://dragonmobile.nuancemobiledeveloper.com/public/Help/DragonMobileSDKReference_iOS/Introduction.html
There are number of other sdks too ,u can easily google them
U have to set the words dat u want to recognize and perform intended action.
All d best with your app!!
I'd like to stream video from the camera on an iOS device to a receiver via wifi, in effect turning the device into a wireless webcam. Is there a way to build a small app that captures video input on an iOS app and sends it via an RTSP stream or similar?
As this is an ad hoc experiment, I'm not concerned about App Store guidelines and can jailbreak if necessary.
If I interpret your question correctly you more or less need to solve four problems:
Get the camera feed.
Convert/encode this to the right format.
Stream the data.
Prevent the phone from locking itself and going into deep sleep.
The first one is fairly simple and Apple has as always provided good documentation and examples -> API link. Make sure you check out their example in the end as you will get a CMSampleBufferRef data object back.
For the second and third part, you should check out the CFNetwork framework and specially CFFTPStream for streaming using FTP.
If your are only building this for yourself then you can always turn off the Auto-Lock feature in the settings. If you on the other hand would like to distribute this to other users you could use a trick to play a mute sound every 10 seconds. This is more or less how all the alarm clocks work in the App Store. Here's a tutorial. =)
I hope I helped a little bit at least.
Good luck and best regards!
I'm 70% of the way to doing the same thing. Here's how I did it:
Capture content from video input
Chop video into files for use in HTML Live Streaming.
Spin up a web server on the iPhone and make the video files available.
Connect to the IP address of the phone and viola! you've got live streaming video.
Last time I touched the code I was trying to debug my Live Streaming not working. I'll try and get my source code posted on github this weekend, if you'd like to take a look.