iOS/Swift/AVFoundation: Crash on audioEngine.inputNode declaration inside guard statement

I am new to Swift and trying to implement the audioEngine for a microphone in order to record.
At one point I declare the inputNode (microphone) with the statement:
print("before input node")
guard let inputNode = audioEngine.inputNode else {
fatalError("Audio engine has no input node")
}
print("after check of input node")
Stepping through the code in the debugger, an exception occurs at runtime during the guard let inputNode statement. The code prints "before input node" but never reaches the fatalError or the "after check of input node" line.
I was under the impression that a guard statement in Swift detects a nil value and thereby avoids a crash, but that is not happening in this case.
Would appreciate any suggestions on what might be going wrong.
For reference, prior to this point in the method the following code runs without issue:
public func startRecording() {
if recognitionTask != nil {
recognitionTask?.cancel()
recognitionTask = nil
}
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(AVAudioSessionCategoryRecord)
try audioSession.setMode(AVAudioSessionModeMeasurement)
try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
print("audioSession properties weren't set because of an error.")
}
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
Edit:
Here is additional code relating to the engine:
Declaration in viewDidLoad:
private let audioEngine = AVAudioEngine()
And the attempt to start it later in the startRecording method referenced above:
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
self.recognitionRequest?.append(buffer)
}
audioEngine.prepare()
do {
try audioEngine.start()
} catch {
print("audioEngine couldn't start because of an error.")
}
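For what it's worth, the "required condition is false" family of crashes comes from an Objective-C exception raised inside AVFoundation, not from a nil value, so a Swift guard cannot intercept it; the fatalError is never reached because the exception aborts first. Commonly reported triggers are a denied or missing microphone permission and running where no usable audio input exists. Below is a minimal sketch of checking record permission before touching inputNode; the helper name startRecordingIfPermitted is made up for illustration and assumes it lives in the same view controller as startRecording above:
// Ask for microphone permission first; only touch the engine once it is granted.
func startRecordingIfPermitted() {
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        DispatchQueue.main.async {
            guard granted else {
                print("Microphone permission denied; not starting the audio engine")
                return
            }
            self.startRecording()   // the method shown above
        }
    }
}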

Related

SwiftUI: Speech recognition works on iPhone but crashes on iPad

I don't know why the app works on iPhone but crashes on iPad. I am building a speech-to-text feature.
This is my speech-to-text code:
func StartRecording() -> String {
// Configure the audio session for the app.
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
try! audioSession.setActive(true, options: .notifyOthersOnDeactivation)
let inputNode = audioEngine.inputNode
//
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
self.recognitionRequest?.append(buffer)
}
audioEngine.prepare()
try! audioEngine.start()
// Create and configure the speech recognition request.
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object") }
recognitionRequest.shouldReportPartialResults = true
// Create a recognition task for the speech recognition session.
// Keep a reference to the task so that it can be canceled.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
var isFinal = false
if let result = result {
// Update the text view with the results.
self.recognizedText = result.bestTranscription.formattedString
isFinal = result.isFinal
}
if error != nil || isFinal {
// Stop recognizing speech if there is a problem.
self.audioEngine.stop()
inputNode.removeTap(onBus: 0)
self.recognitionRequest = nil
self.recognitionTask = nil
}
}
return recognizedText
}
The app works fine on iPhone but not on iPad.
This is the error I get when I try to run speech recognition on the iPad simulator:
Terminating app due to uncaught exception 'com.apple.coreaudio.avfaudio', reason: 'required condition is false: format.sampleRate == hwFormat.sampleRate'
what is causing the crash and how can I fix it?
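The exception is raised by installTap(onBus:bufferSize:format:block:) when the format passed in does not match the hardware input format; one commonly reported cause on the iPad simulator is that outputFormat(forBus: 0) comes back with a sample rate of 0 (or a rate that differs from the hardware) before the session is really ready. A hedged sketch of guarding against that inside StartRecording follows; the early return and the nil-format fallback are illustrative choices, not a guaranteed fix:
let inputNode = audioEngine.inputNode
let recordingFormat = inputNode.outputFormat(forBus: 0)
// Bail out instead of crashing if the reported hardware format is unusable.
guard recordingFormat.sampleRate > 0, recordingFormat.channelCount > 0 else {
    print("Input node has no valid hardware format; is a microphone available?")
    return ""
}
inputNode.removeTap(onBus: 0)   // avoid stacking a second tap on bus 0
// Passing nil lets the tap use the bus's own format rather than forcing one.
inputNode.installTap(onBus: 0, bufferSize: 1024, format: nil) { buffer, _ in
    self.recognitionRequest?.append(buffer)
}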

How to get frequency of sound from AVAudioEngine

I'm implementing a speech recognition module for an app. It works fine; however, there are some additional things that I need to do. For example, I need to know if a user is speaking or shouting. I know I can achieve that by knowing the frequency of the sound. Here is how I implement it:
let audioEngine = AVAudioEngine()
let speechRecognizer: SFSpeechRecognizer? = SFSpeechRecognizer()
let request = SFSpeechAudioBufferRecognitionRequest()
var recognitionTask = SFSpeechRecognitionTask()
func recordAndRecognizeSpeech() {
let node = audioEngine.inputNode
let recordingFormat = node.outputFormat(forBus: 0)
node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, _) in
self.request.append(buffer)
}
audioEngine.prepare()
do {
try audioEngine.start()
} catch {
return print(error)
}
guard let myRecoginizer = SFSpeechRecognizer() else {
return
}
if !myRecoginizer.isAvailable {
return
}
recognitionTask = (speechRecognizer?.recognitionTask(with: request, resultHandler: { (result, error) in
//Handling speech recognition tasks here
}))!
}
This works fine for the speech recognition, but how can I get the frequency or amplitude value of the sound?
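For the loudness part, here is a minimal sketch (assuming the tap delivers Float32 PCM buffers, which is the usual case for the input node) that computes an RMS level from each buffer; distinguishing speaking from shouting is mostly an amplitude question, and true frequency/pitch estimation would need an FFT or a pitch-tracking algorithm on top of this:
import AVFoundation

// Root-mean-square level of a buffer: near 0 is quiet, values approaching 1.0 are loud.
func rmsLevel(of buffer: AVAudioPCMBuffer) -> Float {
    guard let samples = buffer.floatChannelData?[0], buffer.frameLength > 0 else { return 0 }
    let frameCount = Int(buffer.frameLength)
    var sumOfSquares: Float = 0
    for i in 0..<frameCount {
        sumOfSquares += samples[i] * samples[i]
    }
    return (sumOfSquares / Float(frameCount)).squareRoot()
}
You could call this inside the existing tap, next to self.request.append(buffer), and compare the result against a threshold you tune empirically for "shouting".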

Getting crash while continuing conversation on Speech framework iOS 10

I am getting a crash while continuing to talk with speech.framework: the AVAudioEngine is getting NULL and I get the crash below.
*** Terminating app due to uncaught exception 'com.apple.coreaudio.avfaudio', reason: 'required condition is false:
nullptr == Tap()'
This is because in some cases my AudioEngine is getting null.
Here is my startRecording function code:
func startRecording() {
if recognizationTask != nil {
recognizationTask?.cancel()
recognizationTask = nil
}
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(AVAudioSessionCategoryRecord)
try audioSession.setMode(AVAudioSessionModeSpokenAudio)
try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
print("Audion session properies weren't set because of an error.")
}
recognizationRequest = SFSpeechAudioBufferRecognitionRequest()
guard let inputNode = audioEngine.inputNode as AVAudioInputNode? else {
fatalError("Audio engine has no input node")
}
guard let recognizationRequest = recognizationRequest else {
fatalError("Unable to create an SFSpeechAudioBufferRecognizationRequest object.")
}
recognizationRequest.shouldReportPartialResults = true
recognizationTask = speechRecognizer?.recognitionTask(with: recognizationRequest, resultHandler: { (result, error) in
var isFinal = false
if result != nil{
self.txtViewSiriDetecation.text = result?.bestTranscription.formattedString
isFinal = (result?.isFinal)!
}
if error != nil || isFinal {
self.audioEngine.stop()
inputNode.removeTap(onBus:0)
self.recognizationRequest = nil
self.recognizationTask = nil
self.btnSiri.isEnabled = true
}
})
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
self.recognizationRequest?.append(buffer)
}
audioEngine.prepare()
do {
try audioEngine.start()
} catch {
print("audio engine couldn't start b'cus of an error.")
}
txtViewSiriDetecation.text = "Say something, I'm listening!"
}
How can I overcome this situation of getting NULL?
Can anyone guide me on this?
Thanks in advance!
I had that problem too, adding audioEngine.inputNode.removeTap(onBus: 0) fixed it for me.
fileprivate func stopRecording() {
recordingMic.isHidden = true
audioEngine.stop()
audioEngine.inputNode.removeTap(onBus: 0)
recognitionRequest?.endAudio()
recognitionTask?.cancel()
self.detectSpeechButton.isEnabled = true
self.detectSpeechButton.setTitle("Detect Speech", for: .normal)
recordingMic.isHidden = true
self.textView.isHidden = false
}
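Another pattern that is often suggested for the nullptr == Tap() precondition (a sketch against the startRecording code in the question; it matters when startRecording can be entered a second time while a tap is still installed) is to drop any existing tap right before installing the new one:
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.removeTap(onBus: 0)   // harmless if no tap is installed; prevents stacking a second tap on bus 0
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
    self.recognizationRequest?.append(buffer)
}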
A non-nil format is being passed to installTap. Per the documentation, this should only be done when attaching to an output bus which is not connected to another node; an error will result otherwise. The tap and connection formats (if non-nil) on the specified bus should be identical; otherwise, the latter operation will override any previously set format.

Voice recognizer breaks voice synthesizer

I must be overlooking something, but when I try to combine voice synthesis and voice recognition in Swift, I get bad results ("Could not get attribute 'LocalURL': Error Domain=MobileAssetError Code=1 "Unable to copy asset attributes" UserInfo={NSDescription=Unable to copy asset attributes}"), and the end result is that speech-to-text still works afterwards, but text-to-speech is broken until the app is restarted.
let identifier = "\(Locale.current.languageCode!)_\(Locale.current.regionCode!)" // e.g. en-US
speechRecognizer = SFSpeechRecognizer(locale: Locale.init(identifier: identifier))!
if audioEngine.isRunning {
audioEngine.stop() // will also stop playing music.
recognitionRequest?.endAudio()
speechButton.isEnabled = false
} else {
recordSpeech() // here we do steps 1 .. 12
}
// recordSpeech() :
if recognitionTask != nil { // Step 1
recognitionTask?.cancel()
recognitionTask = nil
}
let audioSession = AVAudioSession.sharedInstance() // Step 2
do {
try audioSession.setCategory(AVAudioSessionCategoryRecord)
try audioSession.setMode(AVAudioSessionModeMeasurement)
try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
print("audioSession properties weren't set because of an error.")
}
recognitionRequest = SFSpeechAudioBufferRecognitionRequest() // Step 3
guard let inputNode = audioEngine.inputNode else {
fatalError("Audio engine has no input node")
} // Step 4
guard let recognitionRequest = recognitionRequest else {
fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
} // Step 5
recognitionRequest.shouldReportPartialResults = true // Step 6
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in // Step 7
var isFinal = false // Step 8
if result != nil {
print(result?.bestTranscription.formattedString as Any)
isFinal = (result?.isFinal)!
if (isFinal) {
if (result != nil) {
self.speechOutput.text = self.speechOutput.text + "\n" + (result?.bestTranscription.formattedString)!
}
}
}
if error != nil || isFinal { // Step 10
self.audioEngine.stop()
inputNode.removeTap(onBus: 0)
self.recognitionRequest = nil
self.recognitionTask = nil
self.speechButton.isEnabled = true
}
})
let recordingFormat = inputNode.outputFormat(forBus: 0) // Step 11
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
self.recognitionRequest?.append(buffer)
}
audioEngine.prepare() // Step 12
do {
try audioEngine.start()
} catch {
print("audioEngine couldn't start because of an error.")
}
I used this tutorial as the basis for my code:
http://www.appcoda.com/siri-speech-framework/
func say(_ something : String, lang : String ) {
let synth = AVSpeechSynthesizer()
synth.delegate = self
print(something) // debug code, works fine
let identifier = "\(Locale.current.languageCode!)-\(Locale.current.regionCode!)"
let utterance = AVSpeechUtterance(string: something)
utterance.voice = AVSpeechSynthesisVoice(language: identifier)
synth.speak(utterance)
}
So if I use the "say" method on its own it works well; if I combine the two, then after doing speech recognition the synthesizer does not work anymore. Any hints in the direction of a solution? I suppose something is not being gracefully restored to its prior state, but I can't seem to figure out what.
Grrr...
This is the solution. Sorry about not looking well enough; it cost me a lot of time though.
func say(_ something : String, lang : String ) {
let audioSession = AVAudioSession.sharedInstance()
do {
// this is the solution:
try audioSession.setCategory(AVAudioSessionCategoryPlayback)
try audioSession.setMode(AVAudioSessionModeDefault)
// the recognizer uses AVAudioSessionCategoryRecord
// so we want to set it to AVAudioSessionCategoryPlayback
// again before we can say something
} catch {
print("audioSession properties weren't set because of an error.")
}
synth = AVSpeechSynthesizer()
synth.delegate = self
print(something)
let identifier = "\(Locale.current.languageCode!)-\(Locale.current.regionCode!)"
let utterance = AVSpeechUtterance(string: something)
utterance.voice = AVSpeechSynthesisVoice(language: identifier)
synth.speak(utterance)
}
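A related place to do the same thing (purely illustrative, reusing the names from the recognition code earlier in this question) is the isFinal/error branch of the recognition task, so the session is already back in a playback-friendly state by the time anything tries to speak:
if error != nil || isFinal {
    self.audioEngine.stop()
    inputNode.removeTap(onBus: 0)
    self.recognitionRequest = nil
    self.recognitionTask = nil
    self.speechButton.isEnabled = true
    // Hand the session back to playback before any synthesis happens.
    try? AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayback)
    try? AVAudioSession.sharedInstance().setMode(AVAudioSessionModeDefault)
}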

SFSpeechRecognizer - detect end of utterance

I am hacking on a little project using iOS 10's built-in speech recognition. I have working results using the device's microphone; my speech is recognized very accurately.
My problem is that the recognition task callback is called for every available partial transcription, and I want it to detect that the person has stopped talking and call the callback with the isFinal property set to true. That is not happening: the app listens indefinitely.
Is SFSpeechRecognizer ever capable of detecting the end of a sentence?
Here's my code. It is based on an example found on the internet; it is mostly the boilerplate needed to recognize from the microphone source.
I modified it by adding a recognition taskHint. I also set shouldReportPartialResults to false, but it seems to have been ignored.
func startRecording() {
if recognitionTask != nil {
recognitionTask?.cancel()
recognitionTask = nil
}
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(AVAudioSessionCategoryRecord)
try audioSession.setMode(AVAudioSessionModeMeasurement)
try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
print("audioSession properties weren't set because of an error.")
}
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
recognitionRequest?.shouldReportPartialResults = false
recognitionRequest?.taskHint = .search
guard let inputNode = audioEngine.inputNode else {
fatalError("Audio engine has no input node")
}
guard let recognitionRequest = recognitionRequest else {
fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
}
recognitionRequest.shouldReportPartialResults = true
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
var isFinal = false
if result != nil {
print("RECOGNIZED \(result?.bestTranscription.formattedString)")
self.transcriptLabel.text = result?.bestTranscription.formattedString
isFinal = (result?.isFinal)!
}
if error != nil || isFinal {
self.state = .Idle
self.audioEngine.stop()
inputNode.removeTap(onBus: 0)
self.recognitionRequest = nil
self.recognitionTask = nil
self.micButton.isEnabled = true
self.say(text: "OK. Let me see.")
}
})
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
self.recognitionRequest?.append(buffer)
}
audioEngine.prepare()
do {
try audioEngine.start()
} catch {
print("audioEngine couldn't start because of an error.")
}
transcriptLabel.text = "Say something, I'm listening!"
state = .Listening
}
It seems that the isFinal flag doesn't become true when the user stops talking, as you expected. I guess this is intended behaviour on Apple's part, because the event "user stops talking" is an undefined event.
I believe that the easiest way to achieve your goal is the following:
You have to establish an "interval of silence". That means if the user doesn't talk for longer than your interval, he has stopped talking (e.g. 2 seconds).
Create a Timer at the beginning of the audio session:
var timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)
When you get new transcriptions in recognitionTask, invalidate and restart your timer:
timer.invalidate()
timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)
If the timer expires, it means the user hasn't talked for 2 seconds. You can safely stop the audio session and exit.
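A minimal sketch of what the timer's handler could look like (the name didFinishTalk follows the snippet above; the body is an assumption that reuses the cleanup calls from the question):
@objc func didFinishTalk() {
    // The silence interval elapsed: end the audio so the recognizer delivers its final result.
    recognitionRequest?.endAudio()
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)
}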
Based on my test on iOS 10, when shouldReportPartialResults is set to false, you have to wait 60 seconds to get the result.
I am using Speech to text in an app currently and it is working fine for me. My recognitionTask block is as follows:
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
var isFinal = false
if let result = result, result.isFinal {
print("Result: \(result.bestTranscription.formattedString)")
isFinal = result.isFinal
completion(result.bestTranscription.formattedString, nil)
}
if error != nil || isFinal {
self.audioEngine.stop()
inputNode.removeTap(onBus: 0)
self.recognitionRequest = nil
self.recognitionTask = nil
completion(nil, error)
}
})
Inside the result handler, whenever a new result arrives, I restart a timer so listening is stopped after a period of silence:
if result != nil {
self.timerDidFinishTalk.invalidate()
self.timerDidFinishTalk = Timer.scheduledTimer(timeInterval: TimeInterval(self.listeningTime), target: self, selector:#selector(self.didFinishTalk), userInfo: nil, repeats: false)
let bestString = result?.bestTranscription.formattedString
self.fullsTring = bestString!.trimmingCharacters(in: .whitespaces)
self.st = self.fullsTring
}
Here self.listeningTime is the time after which you want to stop, once you've reached the end of the utterance.
I have a different approach that I find far more reliable in determining when the recognitionTask is done guessing: the confidence score.
When shouldReportPartialResults is set to true, the partial results will have a confidence score of 0.0. Only the final guess will come back with a score over 0.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
if let result = result {
let confidence = result.bestTranscription.segments[0].confidence
print(confidence)
self.transcript = result.bestTranscription.formattedString
}
}
The segments array above contains each word in the transcription. 0 is the safest index to examine, so I tend to use that one.
How you use it is up to you, but if all you want to do is know when the guesser is done guessing, you can just call:
let myIsFinal = confidence > 0.0
You can also look at the score (1.0 is totally confident) and group responses into bands of low to high confidence guesses if that helps your application.
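Putting that together, here is a sketch (not a drop-in replacement; the cleanup lines mirror the earlier snippets in this thread, and the zero threshold is the assumption described above) of using a non-zero confidence as the "done guessing" signal:
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    if let result = result {
        let confidence = result.bestTranscription.segments.first?.confidence ?? 0.0
        self.transcript = result.bestTranscription.formattedString
        if confidence > 0.0 {
            // The recognizer has committed to a final transcription; stop listening.
            self.audioEngine.stop()
            self.audioEngine.inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
        }
    }
}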
