iOS Speech Recognition - SFSpeechAudioBufferRecognitionRequest overwrites edited transcription

My project uses iOS 10 speech recognition. For continuous speech, I use SFSpeechAudioBufferRecognitionRequest from Apple's Speech framework and save the result to a UITextView.
When the user pauses for x seconds, I want to add a period to the transcription, but the new transcription always overwrites the period, because iOS speech recognition keeps the entire transcription in a single string, keeps appending to it, and continuously relays the whole string to my app.
For example: my transcription is hello it's a test and my UI correctly adds a period. But then the user keeps talking (without pressing the microphone button again, because recognition is continuous). Since the speech engine knows nothing about the period, it overwrites it: the screen shows hello it's a test talking again and I lose my edits. What is the best way to prevent this from happening?
This answer from another post suggests using a Timer. A Timer correctly adds the period, but it doesn't solve the underlying issue: the speech engine doesn't know the period is already in the UI.
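One approach (a sketch, not an Apple API) is to track the raw transcription you last received alongside the edited text you display, and append only the newly recognized suffix on each result. The helper below and its parameter names are hypothetical; it assumes the engine only grows the string, which is not always true (it can revise earlier words), so the guard falls back to keeping your edits in that case.

```swift
import Foundation

/// Appends only the newly recognized portion of the transcription to the
/// text you have already edited (e.g. with punctuation added).
/// `edited`   – what the UITextView currently shows, including your period.
/// `previous` – the raw transcription the last time a result arrived.
/// `latest`   – the raw transcription from the newest recognition result.
func mergedText(edited: String, previous: String, latest: String) -> String {
    // The engine keeps growing one string, so the new speech is the suffix
    // beyond what we saw last time. If nothing new arrived (or the engine
    // revised earlier words), keep our edited text as-is.
    guard latest.count > previous.count, latest.hasPrefix(previous) else {
        return edited
    }
    return edited + latest.dropFirst(previous.count)
}
```

In the recognition task's result handler you would call this with result.bestTranscription.formattedString and then store that string as previous. A production version would need a smarter diff for the case where the recognizer rewrites earlier words.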

Related

How can I set delay time for each line using AVSpeechSynthesizer on iOS?

I would like to set a delay time for each line when using AVSpeechSynthesizer in an iOS app.
Let's say I have this text, which has 3 lines.
The world's richest person decided to buy Twitter.
Mr Musk said he had backed out because Twitter failed to provide enough information.
Twitter says it plans to pursue legal action to enforce the agreement.
I want to set a delay after each of the 3 lines while using AVSpeechSynthesizer. I tried postUtteranceDelay, but it sets a delay after the full article, not after each line.
PS:
This text-to-speech app has a setting for exactly that per-line delay using iOS text-to-speech, so I wonder how it can be done.
https://apps.apple.com/jp/app/id935297933
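Since postUtteranceDelay applies per AVSpeechUtterance, one way (a sketch) is to split the text into lines and enqueue one utterance per line, each with its own delay. AVSpeechSynthesizer queues utterances and speaks them in order; the delay value here is an arbitrary example.

```swift
import AVFoundation

let synthesizer = AVSpeechSynthesizer()

let text = """
The world's richest person decided to buy Twitter.
Mr Musk said he had backed out because Twitter failed to provide enough information.
Twitter says it plans to pursue legal action to enforce the agreement.
"""

// Enqueue one utterance per line; the synthesizer speaks them in order,
// and postUtteranceDelay inserts a pause after each one.
for line in text.split(separator: "\n") {
    let utterance = AVSpeechUtterance(string: String(line))
    utterance.postUtteranceDelay = 1.5   // seconds of silence after this line
    synthesizer.speak(utterance)
}
```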

Voice Activity Detection from mic input on iOS

I'm developing an iOS app that does voice-based AI; i.e. it's meant to take voice input from the microphone, turn it into text, send it to an AI agent, then output the returned text through the speaker. I've got everything working, though currently using a button to start and stop recording the speech (SpeechKit for voice recognition, API.AI for the AI, Amazon's Polly for the output).
The piece that I need is to have the microphone always on and to automatically start and stop the recording of the user's voice as they begin and end talking. This app is being developed for an unorthodox context, where there will be no access to the screen for the user (but they will have a high-end shotgun mic for recording their text).
My research suggests this piece of the puzzle is known as 'Voice Activity Detection' and seems to be one of the hardest steps in the whole voice-based AI system.
I'm hoping someone can either supply some straightforward (Swift) code to implement this myself, or point me in the direction of some decent libraries / SDKs that I can implement in this project.
For a good VAD algorithm implementation you can use py-webrtcvad.
It is a Python interface to the WebRTC VAD C code; you can import the C files from that project directly and call them from Swift.
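If bridging the C code is more than you need at first, a much cruder alternative is an energy threshold on the microphone buffers. This is a sketch only: a real VAD like WebRTC's also models noise and uses hangover timers, and the threshold value here is an assumption you would tune for your shotgun mic.

```swift
import AVFoundation

/// Very crude voice activity detection: returns true if the buffer's
/// RMS energy exceeds a threshold. Not robust to noise; illustrative only.
func containsSpeech(_ buffer: AVAudioPCMBuffer, threshold: Float = 0.02) -> Bool {
    guard let samples = buffer.floatChannelData?[0] else { return false }
    let count = Int(buffer.frameLength)
    guard count > 0 else { return false }
    var sum: Float = 0
    for i in 0..<count {
        sum += samples[i] * samples[i]   // accumulate squared amplitude
    }
    let rms = sqrt(sum / Float(count))   // root mean square energy
    return rms > threshold
}
```

You would call this from an AVAudioEngine input-node tap and start/stop your SpeechKit recording when the result flips, ideally with a short hold-over so brief pauses don't end the recording.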

Voice Command without Pressing a Button on iOS

Currently, I am working on an iOS app that triggers an event upon a voice command.
I saw a camera app where the user says "start recording" and the camera switches to recording mode.
This is an in-app voice control capability, so I think it is different from SiriKit or SpeechRecognizer, which I have already implemented.
How would I achieve it?
My question is NOT about voice dictation, where the user has to press a button to start dictating.
The app needs to passively wait for a keyword, or intent, something like "myApp, start recording" or "myApp, stop recording", then start/stop that event function accordingly.
Thanks.
OpenEars: free speech recognition and speech synthesis for the iPhone.
OpenEars makes it simple to add offline speech recognition in many languages and synthesized speech/TTS to your iPhone app quickly and easily, bringing advanced speech interface concepts within reach of any app.
Check out this link.
http://www.politepix.com/openears/
or
Building an iOS App like Siri
https://www.raywenderlich.com/60870/building-ios-app-like-siri
Thank you.
How would I achieve it?
There's a new iOS 13 feature called Voice Control that will allow you to reach your goal.
You can find useful information in the Customize Commands section, where all the vocal commands are available (you can create a custom one as well).
For the camera example you mentioned, everything can be done vocally.
I showed the item names so the vocal commands I used are clear, but they can be hidden if you prefer (say "hide names").
Voice Control is a built-in feature you can use inside your apps as well.
The only thing to do as a developer is to adapt the accessibilityUserInputLabels property if you need specific names to be spoken for some items in your app.
If you're looking for a voice command without pressing a button on iOS, Voice Control is THE perfect candidate.
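For example, here is a minimal sketch of giving a control the spoken names Voice Control should respond to via accessibilityUserInputLabels (the button and label strings are illustrative):

```swift
import UIKit

let recordButton = UIButton(type: .system)
recordButton.setImage(UIImage(systemName: "record.circle"), for: .normal)

// Voice Control will accept any of these phrases to activate the button,
// e.g. the user can say "Tap start recording".
recordButton.accessibilityUserInputLabels = ["Start recording", "Record", "Go"]
```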

Swift - Stop speech recognition on no talk

I am working on an app that uses the new Speech framework in iOS 10 to do some speech-to-text stuff. What is the best way of stopping the recognition when the user stops talking?
Not the best, but a possible solution: track the elapsed time since the last result and stop recognition after a certain amount of silence.
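A sketch of that idea, wrapped in a hypothetical class (the class name, the 1.5-second interval, and the callback are assumptions; the Speech-framework setup follows the usual boilerplate):

```swift
import Speech
import AVFoundation

final class SilenceStoppingRecognizer {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer()
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private var silenceTimer: Timer?
    private let silenceInterval: TimeInterval = 1.5  // tune to taste

    func start(onText: @escaping (String) -> Void) throws {
        let input = audioEngine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024,
                         format: input.outputFormat(forBus: 0)) { [weak self] buffer, _ in
            self?.request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        recognizer?.recognitionTask(with: request) { [weak self] result, _ in
            guard let self = self, let result = result else { return }
            onText(result.bestTranscription.formattedString)
            // Restart the countdown on every partial result; if it fires,
            // nothing new was recognized for `silenceInterval` seconds.
            self.silenceTimer?.invalidate()
            self.silenceTimer = Timer.scheduledTimer(
                withTimeInterval: self.silenceInterval, repeats: false) { _ in
                self.stop()
            }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request.endAudio()  // lets the recognizer deliver its final result
    }
}
```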

iOS AVAudioRecorder insert or append to recording

I'm in the process of creating an iPhone app to record/playback user dictations, with the resulting file being uploaded to a server for speech recognition and transcription. I have everything working apart from being able to append or insert dictation into a recording that's being created. For example, when a user creates a dictation they will record some speech, then play back some or all of that recording, then switch back into recording. The subsequent speech recording should append itself if the playback reached the end, or overwrite the original recording from the point at which the user stopped the playback.
I have been looking for a solution to this for a couple of weeks now and found nothing but vague suggestions. I also got a copy of the Learning Core Audio book, but I'm no further forward. If anyone can shed some light on how to proceed I would be most grateful. Thanks in advance.
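One common workaround (a sketch, not the only approach): AVAudioRecorder cannot insert into an existing file, so record each take to its own file and splice them with AVMutableComposition. The function below is hypothetical; it replaces everything after the cut point with the new take, which degenerates to an append when the cut point equals the original's duration.

```swift
import AVFoundation

enum SpliceError: Error { case noAudioTrack }

/// Keeps `original` up to `insertAt` seconds, then continues with `newTake`.
func spliced(original: URL, newTake: URL,
             insertAt seconds: Double) throws -> AVMutableComposition {
    let composition = AVMutableComposition()
    let newAsset = AVURLAsset(url: newTake)
    guard let track = composition.addMutableTrack(
              withMediaType: .audio,
              preferredTrackID: kCMPersistentTrackID_Invalid),
          let originalTrack = AVURLAsset(url: original)
              .tracks(withMediaType: .audio).first,
          let newTrack = newAsset.tracks(withMediaType: .audio).first else {
        throw SpliceError.noAudioTrack
    }
    let cut = CMTime(seconds: seconds, preferredTimescale: 600)

    // Keep the original up to the cut point...
    try track.insertTimeRange(CMTimeRange(start: .zero, duration: cut),
                              of: originalTrack, at: .zero)
    // ...then continue with the new take from the cut point onward.
    try track.insertTimeRange(CMTimeRange(start: .zero, duration: newAsset.duration),
                              of: newTrack, at: cut)
    return composition
}
```

You would then write the composition out with AVAssetExportSession (e.g. the AVAssetExportPresetAppleM4A preset) before uploading it to the server.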
