I'm trying to let the user speak and capture what they say accurately. I have read about 20 articles on speech recognition, and almost all of them do the same thing: they keep listening to the user for a minute or more. I want recognition to stop when the user stops speaking, because I only need to catch a word or a few words. Is there something that limits how long the user can speak?
My code:
func recordAndRecognizeSpeech(){
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    let node = audioEngine.inputNode
    guard let request = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }
    //request.shouldReportPartialResults = true
    // Setting requiresOnDeviceRecognition to false would use the Apple Cloud for speech recognition.
    if speechRecognizer?.supportsOnDeviceRecognition ?? false{
        request.requiresOnDeviceRecognition = true
    }
    guard let myRecognizer = SFSpeechRecognizer() else {
        // A recognizer is not supported for the current locale
        return
    }
    if !myRecognizer.isAvailable {
        // A recognizer is not available now
        return
    }
    recognitionTask = speechRecognizer?.recognitionTask(with: request, resultHandler: { result, error in
        if let result = result {
            DispatchQueue.main.async {
                let bestString = result.bestTranscription.formattedString
                print(bestString)
            }
        } else if let error = error {
            print(error)
            self.audioEngine.stop()
            node.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.speakButton.isEnabled = true
        }
    })
    let recordingFormat = node.outputFormat(forBus: 0)
    node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat){buffer,_ in
        self.recognitionRequest!.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        return print(error)
    }
}
You can monitor the power of the audio input. Once it falls to a minimum level (silence), start a timer (say, 3 seconds) and stop recognition when the timer fires.
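// `recorder` must already be recording with isMeteringEnabled = true
var recorder: AVAudioRecorder?
recorder?.updateMeters()
// Average power in dBFS: 0 is full scale, about -160 is silence
let dB = recorder?.averagePower(forChannel: 0)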
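The timer logic described in this answer can be sketched as a small value type that is fed meter readings. This is only an illustration under assumptions: `SilenceDetector`, its `-40` dB threshold and its 3-second timeout are hypothetical names and values that do not come from the answer, and the sketch resets the countdown whenever the input gets loud again, which the answer does not spell out.

```swift
import Foundation

// Hypothetical helper (not part of any Apple framework): decides when the
// speaker has been quiet long enough to stop listening.
struct SilenceDetector {
    let thresholdDB: Float      // readings below this count as silence (assumed value)
    let timeout: TimeInterval   // how long the silence must last before stopping
    private var silenceStart: TimeInterval?

    init(thresholdDB: Float = -40, timeout: TimeInterval = 3) {
        self.thresholdDB = thresholdDB
        self.timeout = timeout
    }

    // Feed one meter reading taken at `time` (in seconds).
    // Returns true once the input has stayed below the threshold for `timeout`.
    mutating func shouldStop(powerDB: Float, at time: TimeInterval) -> Bool {
        if powerDB >= thresholdDB {
            // The user is speaking again, so forget any running countdown.
            silenceStart = nil
            return false
        }
        // Remember when the current stretch of silence began.
        let start = silenceStart ?? time
        silenceStart = start
        return time - start >= timeout
    }
}
```

One way to use it would be to call `shouldStop(powerDB:at:)` from a repeating timer with the value of `averagePower(forChannel: 0)`, and to call `recognitionRequest?.endAudio()` and stop the audio engine when it returns true.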
Related
I'm using the Speech framework to transcribe the user's response in a quiz game. We present an image of an animal and the user has to say the name of the animal. When I evaluate the user's response, "result.transcriptions" contains only one transcription, the same as "result.bestTranscription". Is there any way to get multiple candidate transcriptions?
Here is my code:
if recognitionTask != nil {
    recognitionTask?.finish()
    recognitionTask = nil
}
do {
    try AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .default, options: .mixWithOthers)
    try AVAudioSession.sharedInstance().overrideOutputAudioPort(AVAudioSession.PortOverride.speaker)
    try AVAudioSession.sharedInstance().setActive(true, options: .notifyOthersOnDeactivation)
} catch {
    print("audioSession properties weren't set because of an error.")
}
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
let inputNode = audioEngine.inputNode
guard let recognitionRequest = recognitionRequest else {
    fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
}
recognitionRequest.shouldReportPartialResults = true
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { [weak self] result, error in
    if let error = error {
        debugPrint(error)
    }
    guard let result = result else { return }
    if result.bestTranscription.formattedString.lowercased() == self.searchTerm.lowercased() {
        found = true
        self.response = result.bestTranscription.formattedString
    } else {
        result.transcriptions.forEach { transcription in
            if transcription.formattedString.lowercased() == self.searchTerm.lowercased() {
                self.response = transcription.formattedString
                found = true
            }
            debugPrint("Transcription")
            debugPrint(transcription.formattedString)
        }
    }
})
In one of my applications I am using the Speech framework to convert the user's voice into text.
Basically, I want my application to be hands-free and operated by a few voice commands.
Apple imposes a limit of 1000 requests per hour, and an SFSpeechRecognitionTask lasts only about 1 minute.
I want the SFSpeechRecognitionTask to stay alive and keep recognising the voice.
What is the best way to do this in code? Would it drain too much battery if I restart the SFSpeechRecognitionTask every minute?
I have written the code below to start detecting voice, and it stops after 1 minute.
Please help me out if there is a way to achieve this.
func startRecording() {
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        if result != nil {
            if self.speechTimer != nil
            {
                if (self.speechTimer?.isValid)!
                {
                    self.speechTimer?.invalidate()
                }
                self.speechTimer = nil;
            }
            print(result?.bestTranscription.formattedString as Any)
            self.speechTimer = Timer.scheduledTimer(withTimeInterval: 2.0, repeats: false, block: { (timer) in
                print("Recognition task restart")
            })
            isFinal = (result?.isFinal)!
            if isFinal {
                print("Final String: \(result?.bestTranscription.formattedString ?? "No string")")
            }
        }
    })
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
}
The video does not play after I use the speech recognizer. I don't get any error; playback just gets stuck on the AVPlayerViewController. I stop the speech recognizer before trying to play the video. The video plays perfectly before the speech recognizer is used.
It's possible that this code does not actually stop the speech recognizer, so the problem may be in stopRecording().
@IBAction func btnRecord(_ sender: Any) {
    player.pause()
    player.seek(to: CMTime.init(value: 0, timescale: player.currentTime().timescale))
    if self.audioEngine.isRunning {
        self.audioEngine.stop()
        self.recognitionRequest?.endAudio()
    }
    else {
        try! self.startRecording()
    }
}
private func startRecording() throws {
    // Cancel the previous task if it's running.
    if let recognitionTask = recognitionTask {
        recognitionTask.cancel()
        self.recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSession.Category.record, mode: .default, options: [])
    try audioSession.setMode(AVAudioSession.Mode.measurement)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    let inputNode = audioEngine.inputNode
    //else { fatalError("Audio engine has no input node") }
    guard let recognitionRequest = recognitionRequest else { fatalError("Unable to created a SFSpeechAudioBufferRecognitionRequest object") }
    // Configure request so that results are returned before audio recording is finished
    recognitionRequest.shouldReportPartialResults = true
    // A recognition task represents a speech recognition session.
    // We keep a reference to the task so that it can be cancelled.
    recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
        var isFinal = false
        if let result = result {
            self.text = result.bestTranscription.formattedString
            self.lblText.text = self.text
            isFinal = result.isFinal
        }
        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
        }
    }
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        self.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
}
private func stopRecording() {
    audioEngine.stop()
    recognitionRequest?.endAudio()
    if let recognitionTask = recognitionTask {
        recognitionTask.cancel()
        self.recognitionTask = nil
    }
}
@IBAction func btnDonePopup(_ sender: Any) {
    self.stopRecording()
    self.playVideo()
}
Please use audioSession.setCategory to reset the audio session category to its default value once recognition has finished:
if error != nil || isFinal {
    self.audioEngine.stop()
    inputNode.removeTap(onBus: 0)
    self.recognitionRequest = nil
    self.recognitionTask = nil
    do {
        try audioSession.setCategory(.soloAmbient, mode: .measurement, options: [])
    } catch { }
}
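The same reset could also live in a separate function, so that it can be called from stopRecording() before the video is played. The following is a minimal sketch, not the answerer's code: `restoreDefaultAudioSession` is a hypothetical name, and using `.default` as the mode (instead of the `.measurement` kept in the answer) is an assumption.

```swift
import AVFoundation

// Hypothetical helper, not from the original post: puts the shared audio
// session back into its default category once recognition has finished.
func restoreDefaultAudioSession() {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        // .soloAmbient is the category an iOS audio session starts in.
        // The .default mode is an assumption; the answer keeps .measurement.
        try audioSession.setCategory(.soloAmbient, mode: .default, options: [])
        try audioSession.setActive(true)
    } catch {
        // Report the failure instead of silently swallowing it.
        print("Could not reset the audio session: \(error)")
    }
}
```

Printing the error in the catch block, rather than leaving it empty, makes it easier to see whether the category change is actually taking effect.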
I'm trying to run text-to-speech (AVSpeechSynthesizer) along with speech-to-text from SiriKit, but I'm stuck.
My TTS works perfectly until I run the STT code; after that, the TTS no longer works. I debugged the code and no errors occur during execution, but my text is not converted to speech. I think the STT is somehow disabling the audio output, and that's why the TTS no longer speaks the text, but that's just a theory. Note: my TTS stops working, but my STT works perfectly.
Any tips?
Here's my view controller's code:
@IBOutlet weak var microphoneButton: UIButton!
//text to speech
let speechSynthesizer = AVSpeechSynthesizer()
//speech to text
private var speechRecognizer: SFSpeechRecognizer!
private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var recognitionTask: SFSpeechRecognitionTask?
private var audioEngine = AVAudioEngine()
@IBAction func textToSpeech(_ sender: Any) {
    if let word = wordTextField.text{
        if !speechSynthesizer.isSpeaking {
            //get current dictionary
            let dictionary = fetchSelectedDictionary()
            //get current language
            let language = languagesWithCodes[(dictionary?.language)!]
            let speechUtterance = AVSpeechUtterance(string: word)
            speechUtterance.voice = AVSpeechSynthesisVoice(language: language)
            speechUtterance.rate = 0.4
            //speechUtterance.pitchMultiplier = pitch
            //speechUtterance.volume = volume
            speechSynthesizer.speak(speechUtterance)
        }
        else{
            speechSynthesizer.continueSpeaking()
        }
    }
}
@IBAction func speechToText(_ sender: Any) {
    if audioEngine.isRunning {
        audioEngine.stop()
        recognitionRequest?.endAudio()
        microphoneButton.isEnabled = false
        microphoneButton.setTitle("Start Recording", for: .normal)
    } else {
        startRecording()
        microphoneButton.setTitle("Stop Recording", for: .normal)
    }
}
func startRecording() {
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        if result != nil {
            self.wordTextField.text = result?.bestTranscription.formattedString
            isFinal = (result?.isFinal)!
        }
        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.microphoneButton.isEnabled = true
        }
    })
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
    wordTextField.text = "Say something, I'm listening!"
}
}
This line:
try audioSession.setMode(AVAudioSessionModeMeasurement)
is probably the reason. It can cause the volume to be throttled so low that it sounds as if it is off. Try:
try audioSession.setMode(AVAudioSessionModeDefault)
and see if it works.
This is probably because your audio session is in the Record category. You have two solutions. The first is to change try audioSession.setCategory(AVAudioSessionCategoryRecord) to AVAudioSessionCategoryPlayAndRecord (this will work). A cleaner way is to use a separate function for speaking and set your AVAudioSessionCategory to AVAudioSessionCategoryPlayback there.
Hope this helped.
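The "separate function" idea from this answer might look roughly like the sketch below. It is written against the current AVFoundation spelling of the constants (`.playback` rather than `AVAudioSessionCategoryPlayback`); the function name `speak(_:languageCode:using:)` and its parameters are hypothetical and not taken from the question.

```swift
import AVFoundation

// Hypothetical helper: switches the shared audio session to playback,
// then speaks the given text with the given synthesizer.
func speak(_ text: String, languageCode: String, using synthesizer: AVSpeechSynthesizer) {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        // Leave the record-only configuration used for speech recognition.
        try audioSession.setCategory(.playback, mode: .default, options: [])
        try audioSession.setActive(true)
    } catch {
        print("Could not switch the audio session to playback: \(error)")
    }
    let utterance = AVSpeechUtterance(string: text)
    // The initializer returns nil for an unknown language code; in that case
    // the utterance falls back to the system default voice.
    utterance.voice = AVSpeechSynthesisVoice(language: languageCode)
    synthesizer.speak(utterance)
}
```

The sketch assumes it is called only while recognition is not running, since it changes the category that startRecording() sets. The synthesizer is passed in so that it stays alive for the duration of the speech, as the view controller's speechSynthesizer property does.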
This is my first time using SFSpeechRecognizer in Swift, and one piece of functionality isn't working. When I press the audioButtonPressed button, recognition seems to start fine, and pressing it again stops it. When I press it a third time to start recognition again, recognition doesn't work and I'm left with a blank text view. How should I do this?
Here's my code:
@IBAction func audioButtonPressed(_ sender: Any) {
    if isRecording {
        stopRecording()
        delegate?.speechRecognitionComplete(query: query)
        audioButton.backgroundColor = UIColor.red
        isRecording = false
    } else {
        startRecording()
        audioButton.backgroundColor = UIColor.green
        isRecording = true
    }
}
func stopRecording() {
    audioEngine.stop()
    audioEngine.inputNode?.removeTap(onBus: 0)
    recognitionRequest = nil
    recognitionTask = nil
}
func startRecording() {
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let recognitionRequest = recognitionRequest else {
        return
    }
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        if result != nil {
            self.query = result?.bestTranscription.formattedString
            self.audioTextField.text = self.query
            isFinal = (result?.isFinal)!
        }
        if error != nil || isFinal {
            self.stopRecording()
        }
    })
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("the audio session isn't configured correctly")
    }
    let recordingFormat = audioEngine.inputNode?.outputFormat(forBus: 0)
    audioEngine.inputNode?.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, time) in
        self.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
        audioTextField.text = "How may I help you"
    } catch {
        print("audio engine failed to start")
    }
}
When I first press the audio button, startRecording is called and works perfectly; pressing it again calls stopRecording, which also works fine. Pressing it a third time, however, does not start recognition again. Any ideas?
I think you are missing recognitionTask.cancel() before you release the task in the stopRecording function.
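A teardown along those lines could be sketched as below. It is a guess at what the answer has in mind, not a confirmed fix: `tearDownRecognition` and its parameters are hypothetical, the call to `endAudio()` is an addition the answer does not mention, and the sketch assumes a current SDK in which `inputNode` is non-optional.

```swift
import AVFoundation
import Speech

// Hypothetical helper: stops audio capture and cancels the running task
// before the caller drops its references to the request and the task.
func tearDownRecognition(engine: AVAudioEngine,
                         request: SFSpeechAudioBufferRecognitionRequest?,
                         task: SFSpeechRecognitionTask?) {
    engine.stop()
    // Remove the tap so that a new one can be installed on the next start.
    engine.inputNode.removeTap(onBus: 0)
    // Tell the recognizer that no more audio will arrive (an assumption here).
    request?.endAudio()
    // Cancel the task explicitly instead of only setting it to nil.
    task?.cancel()
}
```

In the question's stopRecording(), this would be called with audioEngine, recognitionRequest and recognitionTask just before the two properties are set to nil.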