Process the text once voice input is stopped from SFSpeechRecognizer - ios

I am developing a voice-to-text application using the iOS SFSpeechRecognizer API.
I found a great tutorial here: and it worked fine.
I want to process the text and perform some action as soon as the voice input stops. So I was curious whether SFSpeechRecognizer has a delegate method that recognizes when the voice input has stopped, so that I can capture the input and process it further.

Not built into the SFSpeechRecognizer API, no. On the contrary, that is exactly why you must provide an interface that lets the user tell the recognizer that the input is finished (e.g. a Done button of some sort). Your app will be rejected if you omit that interface.
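The Done-button flow described in this answer can be sketched as follows, assuming live microphone input fed through an SFSpeechAudioBufferRecognitionRequest. Speech and microphone authorization and the audio session setup are omitted, and `TranscriptCollector` and `DictationController` are hypothetical names. The button would call `done()`, which stops the engine and calls `endAudio()` on the request; the recognizer should then deliver a result with `isFinal == true`, which is the point at which to process the text. Only the plain-Swift collector compiles off iOS.

```swift
#if os(iOS)
import AVFoundation
import Speech
#endif

/// Hypothetical helper: keeps the latest partial transcript and hands the
/// final one to `onFinished` exactly once.
final class TranscriptCollector {
    private(set) var latestText = ""
    private var finished = false
    private let onFinished: (String) -> Void

    init(onFinished: @escaping (String) -> Void) {
        self.onFinished = onFinished
    }

    func receive(text: String, isFinal: Bool) {
        guard !finished else { return }   // ignore anything after the final result
        latestText = text
        if isFinal {
            finished = true
            onFinished(text)
        }
    }
}

#if os(iOS)
/// Hypothetical controller: start() streams the microphone into a live request,
/// done() is what a "Done" button would call.
final class DictationController {
    private let recognizer = SFSpeechRecognizer()
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func start(onFinished: @escaping (String) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        let collector = TranscriptCollector(onFinished: onFinished)

        let input = audioEngine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024,
                         format: input.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)   // feed microphone audio to the recognizer
        }

        task = recognizer?.recognitionTask(with: request) { result, error in
            if let result = result {
                collector.receive(text: result.bestTranscription.formattedString,
                                  isFinal: result.isFinal)
            } else if let error = error {
                print("Recognition failed: \(error.localizedDescription)")
            }
        }
        self.request = request
        audioEngine.prepare()
        try audioEngine.start()
    }

    /// Stop capturing and tell the recognizer the input is over; the task
    /// should then deliver a result with isFinal == true.
    func done() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
    }
}
#endif
```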

A possible solution may be to use a third-party library like FDSoundActivatedRecorder, which starts recording when sound is detected and
stops recording when the user is done talking.
Then you can use the recorded audio, as in this link, to convert it to text in one go.
import Speech
func transcribeAudio(url: URL) {
    // create a new recognizer and point it at our audio
    // (speech recognition must already be authorized via SFSpeechRecognizer.requestAuthorization)
    let recognizer = SFSpeechRecognizer()
    let request = SFSpeechURLRecognitionRequest(url: url)
    // start recognition! (self is not used, so no capture list is needed)
    recognizer?.recognitionTask(with: request) { result, error in
        // abort if we didn't get any transcription back
        guard let result = result else {
            // error can be nil here, so don't force-unwrap it
            print("There was an error: \(error?.localizedDescription ?? "unknown error")")
            return
        }
        // if we got the final transcription back, print it
        if result.isFinal {
            // pull out the best transcription...
            print(result.bestTranscription.formattedString)
        }
    }
}
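FDSoundActivatedRecorder's internals are not reproduced here; the general idea behind sound-activated recording can be sketched as a level threshold plus a silence timeout. The sketch below computes each buffer's RMS level in dBFS and runs a small state machine. The thresholds (-30 dB to start, -40 dB for silence) and the silent-buffer count are arbitrary assumptions to tune, and `SoundActivationDetector` is a hypothetical name. In an app the levels would come from microphone buffers, and reaching `.finished` is the point to stop recording and pass the file to `transcribeAudio(url:)`.

```swift
import Foundation

/// RMS level of one buffer of samples in dBFS (0 dB = full scale).
func rmsLevelDB(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return -Float.infinity }
    let meanSquare = samples.reduce(0) { $0 + $1 * $1 } / Float(samples.count)
    return 10 * log10(meanSquare)   // 10*log10(mean square) == 20*log10(RMS)
}

/// Hypothetical start/stop logic in the spirit of a sound-activated recorder.
struct SoundActivationDetector {
    enum State { case waiting, recording, finished }

    let startThresholdDB: Float     // a buffer at least this loud starts recording
    let silenceThresholdDB: Float   // a buffer quieter than this counts as silence
    let silentBuffersToStop: Int    // consecutive silent buffers that end recording
    private(set) var state: State = .waiting
    private var silentCount = 0

    init(startThresholdDB: Float = -30, silenceThresholdDB: Float = -40,
         silentBuffersToStop: Int = 20) {
        self.startThresholdDB = startThresholdDB
        self.silenceThresholdDB = silenceThresholdDB
        self.silentBuffersToStop = silentBuffersToStop
    }

    /// Feed one buffer's level; returns the state after this buffer.
    @discardableResult
    mutating func process(levelDB: Float) -> State {
        switch state {
        case .waiting:
            if levelDB >= startThresholdDB { state = .recording }
        case .recording:
            if levelDB < silenceThresholdDB {
                silentCount += 1
                if silentCount >= silentBuffersToStop { state = .finished }
            } else {
                silentCount = 0   // speech resumed, restart the silence count
            }
        case .finished:
            break
        }
        return state
    }
}
```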

Related

audioEngine.start() throws exception when called

Problem
I'm trying to record microphone data using AVAudioEngine. This is my setup.
First I get the shared audio session, set its category and activate it.
let audioSession = AVAudioSession.sharedInstance()
do {
    try audioSession.setCategory(.record)
    try audioSession.setActive(true)
} catch {
    ...
}
After that I get the input format for bus 0, connect the input and output nodes, and install a tap on the input node (I also tried tapping the output node).
let inputFormat = self.audioEngine.inputNode.inputFormat(forBus: 0)
self.audioEngine.connect(self.audioEngine.inputNode, to: self.audioEngine.outputNode, format: inputFormat)
self.audioEngine.outputNode.installTap(onBus: 0, bufferSize: bufferSize, format: inputFormat) { (buffer, time) in
    let theLength = Int(buffer.frameLength)
    var samplesAsDoubles: [Double] = []
    for i in 0 ..< theLength {
        let theSample = Double((buffer.floatChannelData?.pointee[i])!)
        samplesAsDoubles.append(theSample)
    }
    ...
}
All of the above is in one function. I also have another function called startRecording which contains the following.
do {
    try audioEngine.start()
} catch {
    ...
}
I have also verified that microphone permission has been granted.
The start() call fails with this error:
The operation couldn’t be completed. (com.apple.coreaudio.avfaudio error -10875.)
Questions
According to the documentation, there are three possible causes:
- There’s a problem in the structure of the graph, such as the input can’t route to an output or to a recording tap through converter nodes.
- An AVAudioSession error occurs.
- The driver fails to start the hardware.
I don't believe it's the session, because I set it up following the documentation.
If the driver fails, how would I detect and handle that?
If the graph is set up incorrectly, where did I mess it up?
If you want to record the microphone, attach the tap to the inputNode, not the outputNode. Taps observe the output of a node. There is no "output" of the outputNode. It's the sink.
If you need to install a tap "immediately before the outputNode" (which might not be the input node in a more complicated graph), you can insert a mixer between the input(s) and the output and attach the tap to the mixer.
Also, make sure you really want to connect the input node to the output node here. There's no need to do that unless you want to play back audio live. If you just want to record the microphone, you can attach a tap to the input node without building any more of the graph. (This also might be causing part of your problem, since you set your category to .record, i.e. no playback. I don't know that wiring something to the output in that case causes an exception, but it definitely doesn't make sense.)
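Following that advice, a minimal record-only setup attaches the tap to `inputNode` and never touches `outputNode`. This is a sketch rather than the asker's full code: the `MicrophoneTap` name and the 1024-frame buffer size are assumptions, and the tap format is taken from the input node's output bus (the side a tap observes). The AVFoundation part is iOS-only; the sample conversion is split into a plain function that avoids the per-sample force unwrap from the question.

```swift
#if os(iOS)
import AVFoundation
#endif

/// Copies `frameLength` samples of one channel into a [Double].
/// Plain Swift, so it can be tested without an audio device.
func samplesAsDoubles(_ channel: UnsafePointer<Float>, frameLength: Int) -> [Double] {
    UnsafeBufferPointer(start: channel, count: frameLength).map { Double($0) }
}

#if os(iOS)
/// Hypothetical recorder: record-only session, tap on the input node, no output wiring.
final class MicrophoneTap {
    private let audioEngine = AVAudioEngine()

    func start(bufferSize: AVAudioFrameCount = 1024,
               onSamples: @escaping ([Double]) -> Void) throws {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.record)
        try session.setActive(true)

        let input = audioEngine.inputNode
        // A tap observes a node's output, so use the input node's output-bus format.
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: bufferSize, format: format) { buffer, _ in
            // floatChannelData is nil for non-float formats; skip instead of crashing.
            guard let channel = buffer.floatChannelData?[0] else { return }
            onSamples(samplesAsDoubles(channel, frameLength: Int(buffer.frameLength)))
        }
        // Nothing is connected to outputNode: the .record category does no playback.
        audioEngine.prepare()
        try audioEngine.start()
    }

    func stop() {
        audioEngine.inputNode.removeTap(onBus: 0)
        audioEngine.stop()
    }
}
#endif
```

If audio must also pass through other nodes first, the answer's suggestion of inserting a mixer and tapping the mixer applies instead.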

Programmatically trigger the action that a headphone pause button would do

I am trying to find a way to pause any playing media on the device, so I was thinking of triggering the same logic that fires when a user presses the headphone "middle button".
I managed to prevent music from resuming (after I pause it within my app, which basically starts an AVAudioSession for recording) by NOT setting the AVAudioSession's active property to false and leaving it hanging, but I am pretty sure that's a bad way to do it. If I deactivate the session, the music resumes. The other option I am considering is playing some kind of silent loop that would "imitate" the silence I need. But if what I am seeking is doable, I think it would be the best approach; as I understood from this question, it cannot be done using the normal means.
func stopAudioSession() {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        if audioSession.secondaryAudioShouldBeSilencedHint {
            print("someone is playing....")
        }
        try audioSession.setActive(false, options: .notifyOthersOnDeactivation)
        isSessionActive = false
    } catch let error as NSError {
        print("Unable to deactivate audio session: \(error.localizedDescription)")
        print("retrying.......")
    }
}
In this code snippet, as the function name implies, I set active to false. I tried to find other options, but I could not find another way to stop my recording session and prevent the other app that was already playing from resuming.
Can someone point me to the library I should look into? For example, can I tap into the hardware side and trigger it, or find out which library listens for this button press and handles the pause/play functionality?
A friend of mine who is more experienced in iOS development suggested the following workaround, and it worked. I am posting it here as it might help someone trying to achieve similar behaviour.
In order to stop/pause what is currently playing on a user's device, you need to add a music player to your app. Then, at the point where you need to pause/stop the current media, you just initialize the player, play it and then pause/stop it. Simple :)
Like so:
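let musicPlayer = MPMusicPlayerApplicationController.applicationQueuePlayer
func stopMedia() {
    MPMediaLibrary.requestAuthorization { (newPermissionStatus: MPMediaLibraryAuthorizationStatus) in
        // only touch the media library if access was granted
        guard newPermissionStatus == .authorized else { return }
        // the authorization handler is not guaranteed to run on the main queue
        DispatchQueue.main.async {
            self.musicPlayer.setQueue(with: .songs())
            self.musicPlayer.play()
            print("Stopping music player")
            self.musicPlayer.pause()
            print("Stopped music player")
        }
    }
}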
The MPMediaLibrary.requestAuthorization part is needed to avoid an authorisation error when accessing the user's media library.
And of course you will need to add the Privacy - Media Library Usage Description
key (NSAppleMusicUsageDescription) to your Info.plist file.

Callkit loudspeaker bug / how WhatsApp fixed it?

I have an app with CallKit functionality. When I press the loudspeaker button, it flashes and animates to the OFF state (sometimes the speaker is set to LOUD but the icon is still OFF). When I tap it multiple times, it is clear that this functionality is not behaving correctly.
WhatsApp, however, starts with the loudspeaker turned OFF, and after 3+ seconds it activates it and it works. Has anyone encountered anything similar and can give me a solution?
YouTube video link demonstrating my problem
There is a workaround proposed by an Apple engineer which should fix CallKit not activating the audio session correctly:
a workaround would be to configure your app's audio session (call configureAudioSession()) earlier in your app's lifecycle, before the -provider:performAnswerCallAction: method is invoked. For instance, you could call configureAudioSession() immediately before calling -[CXProvider reportNewIncomingCallWithUUID:update:completion:] in order to ensure that the audio session is fully configured prior to informing CallKit about the incoming call.
From: https://forums.developer.apple.com/thread/64544#189703
If this doesn't help, you probably should post an example project which reproduces your behaviour for us to be able to analyse it further.
The above answer is correct: "VoiceChat" mode ruins everything.
Swift 4 example for WebRTC.
After the connection is established, call the following:
let rtcAudioSession = RTCAudioSession.sharedInstance()
rtcAudioSession.lockForConfiguration()
do {
    try rtcAudioSession.setCategory(AVAudioSession.Category.playAndRecord.rawValue,
                                    with: AVAudioSession.CategoryOptions.mixWithOthers)
    try rtcAudioSession.setMode(AVAudioSession.Mode.default.rawValue)
    try rtcAudioSession.overrideOutputAudioPort(.none)
    try rtcAudioSession.setActive(true)
} catch let error {
    debugPrint("Couldn't force audio to speaker: \(error)")
}
rtcAudioSession.unlockForConfiguration()
You can also use AVAudioSession.sharedInstance() instead of RTCAudioSession.
Referred from: Abnormal behavior of speaker button on system provided call screen
The same issue occurred in previous versions as well, so this is not a new CallKit issue.
This issue has to be resolved by Apple in iOS; we don't have any control over it.
Please go through these Apple developer forum threads:
CallKit/detect speaker set
and
[CALLKIT] audio session not activating?
Maybe you can set the mode to AVAudioSessionModeDefault.
When I use CallKit + WebRTC:
I configure AVAudioSessionModeDefault mode.
I alloc a CXProvider and call reportNewIncomingCallWithUUID.
I use WebRTC; after ICEConnected, WebRTC changes the mode to AVAudioSessionModeVoiceChat, and then the speaker issue happens.
Later I set the mode back to AVAudioSessionModeDefault, and the speaker works well.
I've fixed the issue with the following steps.
In CXAnswerCallAction, use the code below to set the audio session configuration.
RTCDispatcher.dispatchAsync(on: RTCDispatcherQueueType.typeAudioSession) {
    let audioSession = RTCAudioSession.sharedInstance()
    audioSession.lockForConfiguration()
    let configuration = RTCAudioSessionConfiguration.webRTC()
    // Swift 4.2+ spelling of the category options (AVAudioSession.CategoryOptions)
    configuration.categoryOptions = [.allowBluetoothA2DP, .duckOthers, .allowBluetooth]
    try? audioSession.setConfiguration(configuration)
    audioSession.unlockForConfiguration()
}
After the call is connected, I reset the audio session back to the default mode:
func configureAudioSession() {
    let session = RTCAudioSession.sharedInstance()
    session.lockForConfiguration()
    do {
        try session.setCategory(AVAudioSession.Category.playAndRecord.rawValue, with: .allowBluetooth)
        try session.setMode(AVAudioSession.Mode.default.rawValue)
        try session.setPreferredSampleRate(44100.0)
        try session.setPreferredIOBufferDuration(0.005)
    } catch let error {
        debugPrint("Error changing AVAudioSession category: \(error)")
    }
    session.unlockForConfiguration()
}
Thanks to SO user @Алексей Смольский for the help.

CallKit can reactivate sound after swapping call

I'm developing a CallKit application and I have a problem: call holding fails to restart audio when "swapping" calls on the CallKit screen, until the user returns to the in-app call screen. I can bypass this by setting:
supportsHolding = false
but can I solve this problem properly? WhatsApp, for example, handles this correctly!
P.S. I'm using WebRTC to make the call!
Thanks!
EDIT:
This is the provider code:
public func provider(_ provider: CXProvider, perform action: CXSetHeldCallAction) {
    guard let call = conductor!.callWithUUID(uuid: action.callUUID) else {
        WebRtcConductor.debug("\(self.TAG) 🔴 failed to perform HeldAction: uuid: \(action.uuid), calluiid: \(action.callUUID)")
        action.fail()
        return
    }
    setIsHeld(call: call, isHeld: action.isOnHold)
    action.fulfill()
}
The setIsHeld function simply does:
audioTrack.isEnabled = enabled
If I use the "mute" button on the CallKit screen, everything works fine. But if I have 2 active calls and swipe from the WebRTC call to the normal call, CXSetHeldCallAction is called and the audio track is disabled. If I swipe back to the WebRTC call, the audio track is enabled but I hear nothing; if I return to the main app screen, audio works fine again!
Actually, there is a limitation in the Google WebRTC library which leads to the described problem when implementing a CallKit integration that supports swapping calls.
WebRTC Issue 8126 has been known for over a year now, but it is not yet merged into the WebRTC master branch. However, you can find the code changes needed to fix this problem in the original ticket.
As a workaround, you can trigger the system notification that WebRTC subscribes to internally.
Post an AVAudioSessionInterruptionType.ended notification in the "didActivate audioSession" method of the CallKit provider:
// Swift 4.2+ names for the interruption type and notification
var userInfo = [AnyHashable: Any]()
let interruptionEndedRaw = AVAudioSession.InterruptionType.ended.rawValue
userInfo[AVAudioSessionInterruptionTypeKey] = interruptionEndedRaw
NotificationCenter.default.post(name: AVAudioSession.interruptionNotification, object: self, userInfo: userInfo)
PS: Star the ticket to improve the chances of a merge ;-)
I had the same issue. With 1 active call, when a new call comes in I tap Hold & Accept. The new call works, but after using Swap in CallKit, audio stops working.
I found that the provider:performSetHeldCallAction: method of the CXProviderDelegate protocol is the spot where you can actually deactivate/activate audio for calls swapped via the native CallKit interface.
In my case, I used the audioController.deactivateAudioSession() method for the call being put on hold.
But I found that the same provider:performSetHeldCallAction: method is also fired for the other call, the one being made active (from the on-hold state), when you tap the Swap button in CallKit.
So you just need to deactivate/activate audio according to each call's state (on hold or not).
In general, it should look like this:
func provider(_ provider: CXProvider, perform action: CXSetHeldCallAction) {
    // Retrieve the Call instance corresponding to the action's call UUID
    guard let call = callManager.callWithUUID(uuid: action.callUUID) else {
        action.fail()
        return
    }
    // Update the Call's underlying hold state.
    call.isOnHold = action.isOnHold
    // Stop or start audio in response to holding or unholding the call.
    if call.isOnHold {
        stopAudio()
    } else {
        startAudio()
    }
    // Signal to the system that the action has been successfully performed.
    action.fulfill()
}
P.S. It looks like you should have some class that is responsible for the audio session. It should implement something like activate audio session / deactivate audio session.
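Building on that P.S., one possible shape for such a class is sketched below. `CallAudioControlling` and `CallHoldCoordinator` are hypothetical names, not CallKit or WebRTC API. As a design choice beyond the answer, it tracks which of the app's calls are not on hold and only deactivates audio when none remain, so the result is the same whichever of the two Swap actions arrives first (an assumption about ordering this sketch deliberately does not rely on).

```swift
import Foundation

/// Hypothetical interface for whatever owns the call audio
/// (for example a class wrapping the WebRTC audio track or audio session).
protocol CallAudioControlling: AnyObject {
    func activateAudio()
    func deactivateAudio()
}

/// Hypothetical helper that the CXSetHeldCallAction handler can forward to.
final class CallHoldCoordinator {
    private let audio: CallAudioControlling
    /// Calls owned by this app that are currently live (not on hold).
    private(set) var unheldCalls = Set<UUID>()

    init(audio: CallAudioControlling) {
        self.audio = audio
    }

    func callStarted(_ callUUID: UUID) {
        unheldCalls.insert(callUUID)
        audio.activateAudio()
    }

    func callEnded(_ callUUID: UUID) {
        unheldCalls.remove(callUUID)
        if unheldCalls.isEmpty { audio.deactivateAudio() }
    }

    func setHeld(_ isOnHold: Bool, callUUID: UUID) {
        if isOnHold {
            unheldCalls.remove(callUUID)
            // Only silence audio when none of this app's calls is still live.
            if unheldCalls.isEmpty { audio.deactivateAudio() }
        } else {
            unheldCalls.insert(callUUID)
            audio.activateAudio()
        }
    }
}

// In the CXProviderDelegate (iOS only), for example:
// func provider(_ provider: CXProvider, perform action: CXSetHeldCallAction) {
//     coordinator.setHeld(action.isOnHold, callUUID: action.callUUID)
//     action.fulfill()
// }
```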

NotificationCenter stops working when the screen is locked

I'm having trouble with an app I'm building. The app's objective is to play audio files: it requests an audio file from a public API, plays it and waits until it ends; after it ends, it requests another audio file and starts over.
Here's a shortened version of the code that does this; I omitted the error-checking part for simplicity:
func requestEnded(audioSource: String) {
    let url = URL(string: "https://example.com/audios/" + audioSource)
    audio = AVPlayer(url: url!)
    NotificationCenter.default.addObserver(self, selector: #selector(MediaItem.finishedPlaying(_:)), name: NSNotification.Name.AVPlayerItemDidPlayToEndTime, object: audio?.currentItem)
    audio?.play()
}
@objc func finishedPlaying(_ notification: NSNotification) {
    print("Audio ended")
    callAPI()
}
func callAPI() {
    // do all the http request stuff
    requestEnded(audioSource: "x.m4a")
}
// callAPI() is called when the app is initialized
It works well when the screen is unlocked. When I lock the phone, the current audio keeps playing, but when it ends finishedPlaying() never gets called (the print statement doesn't appear in the console).
How can I make the app know that the audio ended and trigger another one, all while locked? In the Android version I got around the screen-lock problem by using a partial wakelock, which made it run normally even with the screen off.
It has to be done this way because the API decides the audio in real time on the backend, so there is no way to buffer more than one audio file without breaking the app's requirements.
What are my options here? Any help is appreciated.
