I'm trying to create an app with Swift.
I integrated speech-to-text and text-to-speech correctly: my app works perfectly. You can find my project here.
After speech-to-text, the app makes an HTTP request to a server (sending the recognized text), and the response (a string, e.g. "Ok, I'll show you something") is spoken via text-to-speech. But there is a big issue I can't solve.
When the app speaks the text, the voice is far too quiet, as if it were playing in the background behind something more important (though nothing else is playing).
While debugging, I discovered that the issue starts when audioEngine (AVAudioEngine) is used inside the recordAndRecognizeSpeech() function. Running the app without this function and speaking a random text works like a charm.
So, in my opinion, when the app speaks the text it thinks the audio engine is still active, and that is why the volume is so low.
But before speaking the text, I call these functions (look inside the ac function, line 96):
audioEngine.stop()
audioEngine.reset()
How can I solve this issue?
EDIT:
I found a partial solution. Now, before the app speaks the text, my code is:
audioEngine.inputNode.removeTap(onBus: 0)
audioEngine.stop()
audioEngine.reset()
recognitionTask?.cancel()
isRecording = false
microphoneButton.setTitle("Avvia..", for: UIControl.State.normal)
do {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSession.Category.ambient)
    try audioSession.setActive(false, options: .notifyOthersOnDeactivation)
} catch {
    print(error)
}
make_request(msg: self.speech_result.text!)
The .setCategory call works and the volume is back to the default. But when I then try to call recordAndRecognizeSpeech() again, the app throws this exception:
VAEInternal.h:70:_AVAE_Check: required condition is false: [AVAudioIONodeImpl.mm:910:SetOutputFormat: (IsFormatSampleRateAndChannelCountValid(hwFormat))]
This exception is caused by .setCategory(AVAudioSession.Category.ambient); it should be .playAndRecord, but with that value the volume goes back to being low.
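For reference, a minimal sketch of how recognition is often restarted after the session category has been changed. The property names mirror those in the question, but this is an assumption, not the project's actual code; the key point is re-reading the input node's current hardware format for the new tap, since a stale format is what typically triggers the `IsFormatSampleRateAndChannelCountValid` assertion:

```swift
import AVFoundation
import Speech

// Hedged sketch: assumes `audioEngine`, `recognitionRequest` and
// `recognitionTask` properties like those in the question.
func restartRecording() throws {
    let session = AVAudioSession.sharedInstance()
    // Switch back to a category that allows input before touching the engine.
    try session.setCategory(.playAndRecord, mode: .measurement,
                            options: .defaultToSpeaker)
    try session.setActive(true, options: .notifyOthersOnDeactivation)

    let inputNode = audioEngine.inputNode
    inputNode.removeTap(onBus: 0) // avoid installing a second tap on the same bus
    // Use the node's *current* hardware format, queried after the
    // session change, not a format captured earlier.
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        self.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
}
```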
Try this one:
let speaker = AVSpeechSynthesizer()
var myUtterance = AVSpeechUtterance()

func say(text: String, language: String) {
    // Configure the audio session for simultaneous playback and recording
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSession.Category.playAndRecord)
        try audioSession.setMode(AVAudioSession.Mode.default)
        try audioSession.setActive(true)
        // Route output to the loudspeaker; .playAndRecord otherwise
        // routes to the quiet earpiece receiver
        try audioSession.overrideOutputAudioPort(AVAudioSession.PortOverride.speaker)
    } catch {
        return
    }

    if speaker.isSpeaking {
        speaker.stopSpeaking(at: .immediate)
    } else {
        myUtterance = AVSpeechUtterance(string: text)
        myUtterance.rate = AVSpeechUtteranceDefaultSpeechRate
        myUtterance.voice = AVSpeechSynthesisVoice(language: language)
        myUtterance.pitchMultiplier = 1
        myUtterance.volume = 1 // volume is clamped to the range [0, 1]
        DispatchQueue.main.async {
            self.speaker.speak(self.myUtterance)
        }
    }
}
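For reference, the function above could then be called like this (assuming it lives in the same class; the text and language code are just examples):

```swift
say(text: "Ok, I'll show you something", language: "en-US")
```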
Try this. Set the rate to control how fast the text is spoken:
var speedd = AVSpeechSynthesizer()
var voicert = AVSpeechUtterance()

voicert = AVSpeechUtterance(string: "This is a sample sentence spoken at the default rate.")
voicert.voice = AVSpeechSynthesisVoice(language: "en-US")
voicert.rate = 0.5
speedd.speak(voicert)
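For context, a small sketch of how the rate constants relate (the 1.2 multiplier is just an illustrative example, not a recommendation):

```swift
import AVFoundation

let synth = AVSpeechSynthesizer() // keep a strong reference while speaking
let utterance = AVSpeechUtterance(string: "Testing the speech rate")
// rate is clamped to [AVSpeechUtteranceMinimumSpeechRate,
// AVSpeechUtteranceMaximumSpeechRate]; the default is
// AVSpeechUtteranceDefaultSpeechRate (0.5)
utterance.rate = AVSpeechUtteranceDefaultSpeechRate * 1.2 // slightly faster
synth.speak(utterance)
```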
My code is below:
do {
    file = try AKAudioFile(readFileName: "Sound1.mp3", baseDir: .resources)
    // file = try AKAudioFile(forReading: SingletonClass.sharedInstance.recordedURLs[SingletonClass.sharedInstance.recordedURL]!)
    // AKSettings.defaultToSpeaker = true
} catch {
    print(error)
}

do {
    player = try AKAudioPlayer(file: file)
} catch {
    print(error)
}

let lfoAmplitude = 1_000
let lfoRate = 1.0 / 3.428
let resonance = 0.9

// Filter section effect below
filterSectionEffect = AKOperationEffect(tracker) { input, _ in
    let lfo = AKOperation.sineWave(frequency: lfoRate, amplitude: lfoAmplitude)
    return input.moogLadderFilter(cutoffFrequency: lfo + cutoffFrequency,
                                  resonance: resonance)
}
AudioKit.output = filterSectionEffect
AudioKit.start()
Whenever I play the audio using a button wired to player.play(), the audio plays properly. It also plays properly when I connect headphones, but as soon as I disconnect them, I see the error:
It happens the same way for both wired and Bluetooth headphones.
My app is stuck because of this issue, and it happens only with AKOperationEffect. Any help would be appreciated.
The comment from Kunal Verma that this is fixed is correct, but just for completeness here is the commit that fixed it.
https://github.com/AudioKit/AudioKit/commit/ffac4acbe93553764f6095011e9bf5d71fdc88c2
I'd like to know how to properly let my iPhone speak one sentence while my app is in the background, but then return to whatever was playing before.
My question is quite similar to AVSpeechSynthesizer in background mode, but again with the difference that I want to be able to "say something" while in the background without having to stop music that is playing. So while my AVSpeechSynthesizer is speaking, music should pause (or get a bit quieter) but then resume, even when my app is currently in the background.
What I am trying to achieve is a spoken summary of tracking stats during GPS tracking in my fitness app. The chance that the user is listening to music is quite high, and I don't want to disturb them...
I found the answer myself...
The important part is to configure the AVAudioSession with the .duckOthers option:
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setCategory(AVAudioSessionCategoryPlayback, with: .duckOthers)
This makes playback of, e.g., music less loud, but it would stay less loud even after speech is done. This is why you need to set a delegate for the AVSpeechSynthesizer and then handle it like so:
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
    guard !synthesizer.isSpeaking else { return }
    let audioSession = AVAudioSession.sharedInstance()
    try? audioSession.setActive(false)
}
That way, music continues at its normal volume after speech is done.
Also, right before speaking, I activate my audio session just to be sure (I'm not certain that is strictly necessary, but since doing so I have had no more problems).
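Pulling the pieces of this answer together, a minimal sketch (the class name and `say(_:)` helper are illustrative, not part of the original answer):

```swift
import AVFoundation

final class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func say(_ text: String) {
        let session = AVAudioSession.sharedInstance()
        // Duck (rather than stop) other audio while speaking.
        try? session.setCategory(.playback, options: .duckOthers)
        try? session.setActive(true)
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        guard !synthesizer.isSpeaking else { return }
        // Deactivating the session restores other apps' volume.
        try? AVAudioSession.sharedInstance()
            .setActive(false, options: .notifyOthersOnDeactivation)
    }
}
```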
For Swift 3, import AVKit, then add:
try? AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayback)
For Swift 4.2 / Xcode 10:
Unfortunately that setCategory overload is no longer available; I managed to make it work like this:
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(AVAudioSession.Category.ambient, mode: .default)
Swift 5.0
let synthesizer = AVSpeechSynthesizer()
let synthesizerVoice = AVSpeechSynthesisVoice(language: "en-US")
let str = "String"
let utterance = AVSpeechUtterance(string: str)
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(
AVAudioSession.Category.playback,
options: AVAudioSession.CategoryOptions.mixWithOthers
)
utterance.rate = 0.5
utterance.voice = synthesizerVoice
synthesizer.speak(utterance)
With these two lines before the usual text-to-speech code, background music doesn't stop, and your spoken text plays even when the phone is in silent mode:
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(AVAudioSessionCategoryPlayback, with: AVAudioSessionCategoryOptions.mixWithOthers)
I am having a really difficult time with playing audio in the background of my app. The app is a timer that is counting down and plays bells, and everything worked using the timer originally. Since you cannot run a timer over 3 minutes in the background, I need to play the bells another way.
The user has the ability to choose bells and set the time for these bells to play (e.g. play bell immediately, after 5 minutes, repeat another bell every 10 minutes, etc).
So far I have tried scheduling the sounds with DispatchQueue.main, and this works fine if the user does not pause the timer. But if they re-enter the app and pause, I cannot seem to cancel or pause that queue in any way.
Next I tried using AVAudioEngine, and created a set of nodes. These will play while the app is in the foreground but seem to stop upon backgrounding. Additionally when I pause the engine and resume later, it won't pause the sequence properly. It will squish the bells into playing one after the other or not at all.
If anyone has any ideas of how to solve my issue, that would be great. Technically I could remove everything from the engine and recreate it from the paused time whenever the user pauses/resumes, but this seems quite costly. It also doesn't solve the problem of the audio stopping in the background. I have the required background mode 'App plays audio or streams audio/video using AirPlay', and it is also checked under the background modes in capabilities.
Below is a sample of how I tried to set up the audio engine. The registerAndPlaySound method is called several other times to create the chain of nodes (or is this done incorrectly?). The code is somewhat messy at the moment because I have been trying many ways to get this to work.
func setupSounds() {
    if attached {
        engine.detach(player)
    }
    engine.attach(player)
    attached = true

    let mixer = engine.mainMixerNode
    engine.connect(player, to: mixer, format: mixer.outputFormat(forBus: 0))

    var bell = ""
    do {
        try engine.start()
    } catch {
        return
    }

    if currentSession.bellObject?.startBell != nil {
        bell = (currentSession.bellObject?.startBell)!
        guard let url = Bundle.main.url(forResource: bell, withExtension: "mp3") else {
            return
        }
        registerAndPlaySound(url: url, delay: warmUpTime)
    }
}
func registerAndPlaySound(url: URL, delay: Double) {
    do {
        let file = try AVAudioFile(forReading: url)
        let format = file.processingFormat
        let capacity = file.length
        // AVAudioPCMBuffer's initializer is failable
        guard let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                            frameCapacity: AVAudioFrameCount(capacity)) else {
            return
        }
        try file.read(into: buffer)

        let sampleRate = buffer.format.sampleRate
        let sampleTime = sampleRate * delay
        let futureTime = AVAudioTime(sampleTime: AVAudioFramePosition(sampleTime), atRate: sampleRate)
        player.scheduleBuffer(buffer, at: futureTime, options: [], completionHandler: nil)
        player.play()
    } catch {
        return
    }
}
To explain my situation a little better: I'm trying to make an app that plays a ping noise when a button is pressed, and then immediately records and transcribes the user's voice.
For the ping sound I'm using System Sound Services, to record the audio I'm using AudioToolbox, and to transcribe it I'm using Speech kit.
I believe the crux of my problem lies in the timing of the asynchronous System sound services play function:
// Button pressed function
let audiosession = AVAudioSession.sharedInstance()

let filename = "Ping"
let ext = "wav"
if let soundUrl = Bundle.main.url(forResource: filename, withExtension: ext) {
    var soundId: SystemSoundID = 0
    AudioServicesCreateSystemSoundID(soundUrl as CFURL, &soundId)
    AudioServicesAddSystemSoundCompletion(soundId, nil, nil, { (soundid, _) -> Void in
        AudioServicesDisposeSystemSoundID(soundid)
        print("Sound played!")
    }, nil)
    AudioServicesPlaySystemSound(soundId)
}

do {
    try audiosession.setCategory(AVAudioSessionCategoryRecord)
    try audiosession.setMode(AVAudioSessionModeMeasurement)
    try audiosession.setActive(true, with: .notifyOthersOnDeactivation)
    print("Changing modes!")
} catch {
    print("error with audio session")
}

recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let inputNode = audioEngine.inputNode else {
    fatalError("Audio engine has no input node!")
}
guard let recognitionRequest = recognitionRequest else {
    fatalError("Unable to create a speech audio buffer recognition request object")
}
recognitionRequest.shouldReportPartialResults = true
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, delegate: self)

let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
    self.recognitionRequest?.append(buffer)
}
audioEngine.prepare()
do {
    try audioEngine.start()
    delegate?.didStartRecording()
} catch {
    print("audioEngine couldn't start because of an error")
}
What happens when I run this code is that it records and transcribes the voice successfully. However, the ping is never played. The two (non-error) print statements fire in this order:
Changing modes!
Sound played!
So to my understanding, the reason the ping sound isn't being played is because by the time it actually completes I've already changed the audio session category from playback to record. Just to verify this is true, I tried removing everything but the sound services ping and it plays the sound as expected.
So my question is what is the best way to bypass the asynchronous nature of the AudioServicesPlaySystemSound call? I've experimented with trying to pass self into the completion function so I could have it trigger a function in my class which then runs the recording chunk. However I haven't been able to figure out how one actually goes about converting self to an UnsafeMutableRawPointer so it can be passed as clientData. Furthermore, even if I DID know how to do that, I'm not sure if it's even a good idea or the intended use of that parameter.
Alternatively, I could probably solve this problem by relying on something like notification center. But once again that just seems like a very clunky way of solving the problem that I'm going to end up regretting later.
Does anyone know what the correct way to handle this type of situation is?
Update:
As per Gruntcake's request, here is my attempt to access self in the completion block.
First I create a userData constant which is an UnsafeMutableRawPointer to self:
var me = self
let userData = withUnsafePointer(to: &me) { ptr in
    return unsafeBitCast(ptr, to: UnsafeMutableRawPointer.self)
}
Next I use that constant in my callback block, and attempt to access self from it:
AudioServicesAddSystemSoundCompletion(soundId, nil, nil, { (sounded, me) -> Void in
    AudioServicesDisposeSystemSoundID(sounded)
    let myself = Unmanaged<myclassname>.fromOpaque(me!).takeRetainedValue()
    myself.doOtherStuff()
    print("Sound played!")
}, userData)
Your attempt to call doOtherStuff() in the completion block is a correct approach (the only other option is notifications; those are the only two).
What complicates it in this case is the bridging from Obj-C to Swift that is necessary. Code to do that is:
let myData = unsafeBitCast(self, UnsafeMutablePointer<Void>.self)
AudioServicesAddSystemSoundCompletion(YOUR_SOUND_ID, CFRunLoopGetMain(), kCFRunLoopDefaultMode, { (mSound, mVoid) in
    let me = unsafeBitCast(mVoid, YOURCURRENTCLASS.self)
    // `me` is your current object, so if you have a variable like
    // `var someVar`, you can do:
    print(me.someVar)
}, myData)
Credit: This code was taken from an answer to this question, though it is not the accepted answer:
How do I implement AudioServicesSystemSoundCompletionProc in Swift?
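On current Swift versions, the same bridging is usually done with `Unmanaged` rather than `unsafeBitCast`. A hedged sketch (`YourClass` and `doOtherStuff()` are illustrative placeholders; with `passUnretained`, the callback must not outlive the object):

```swift
import AudioToolbox

// Pass `self` through the C callback's clientData pointer.
let clientData = Unmanaged.passUnretained(self).toOpaque()
AudioServicesAddSystemSoundCompletion(soundId,
                                      CFRunLoopGetMain(),
                                      CFRunLoopMode.defaultMode.rawValue,
                                      { soundId, clientData in
    AudioServicesDisposeSystemSoundID(soundId)
    guard let clientData = clientData else { return }
    // Recover the object without transferring ownership.
    let me = Unmanaged<YourClass>.fromOpaque(clientData).takeUnretainedValue()
    me.doOtherStuff()
}, clientData)
```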
AVSpeechSynthesizer has a fairly simple API, which doesn't have support for saving to an audio file built-in.
I'm wondering if there's a way around this - perhaps recording the output as it's played silently, for playback later? Or something more efficient.
This is finally possible, in iOS 13 AVSpeechSynthesizer now has write(_:toBufferCallback:):
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "test 123")
utterance.voice = AVSpeechSynthesisVoice(language: "en")

var output: AVAudioFile?

synthesizer.write(utterance) { (buffer: AVAudioBuffer) in
    guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
        fatalError("unknown buffer type: \(buffer)")
    }
    if pcmBuffer.frameLength == 0 {
        // done
    } else {
        // append buffer to file (AVAudioFile's initializer and write(from:) throw)
        if output == nil {
            output = try? AVAudioFile(
                forWriting: URL(fileURLWithPath: "test.caf"),
                settings: pcmBuffer.format.settings,
                commonFormat: .pcmFormatInt16,
                interleaved: false)
        }
        try? output?.write(from: pcmBuffer)
    }
}
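Once all the buffers have been written, the resulting file can be played back like any other audio file. A minimal sketch (the path matches the example above; AVAudioPlayer is just one option):

```swift
import AVFoundation

let url = URL(fileURLWithPath: "test.caf")
let player = try? AVAudioPlayer(contentsOf: url)
player?.play() // keep a strong reference to `player` while it plays
```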
As of now, AVSpeechSynthesizer does not support this. There is no way to get the audio file using AVSpeechSynthesizer. I tried this a few weeks ago for one of my apps and found out that it is not possible; also, nothing has changed for AVSpeechSynthesizer in iOS 8.
I too thought of recording the sound as it is being played, but there are many flaws with that approach: the user might be using headphones, the system volume might be low or muted, and it might pick up other external sounds, so it's not advisable to go that way.
You can use OS X (or, maybe, some OS X-based service) to prepare AIFF files via the NSSpeechSynthesizer method
startSpeakingString:toURL:
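In Swift that method looks roughly like this (macOS only; the text and output path are just examples):

```swift
import AppKit

// Render speech straight to an AIFF file instead of the speakers.
let synth = NSSpeechSynthesizer()
let url = URL(fileURLWithPath: "/tmp/speech.aiff")
_ = synth.startSpeaking("Hello from macOS", to: url)
```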