AVSpeechSynthesizer has a fairly simple API, with no built-in support for saving the output to an audio file.
I'm wondering if there's a way around this - perhaps recording the output as it's played silently, for playback later? Or something more efficient.
This is finally possible: as of iOS 13, AVSpeechSynthesizer has write(_:toBufferCallback:):
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "test 123")
utterance.voice = AVSpeechSynthesisVoice(language: "en")

var output: AVAudioFile?

synthesizer.write(utterance) { (buffer: AVAudioBuffer) in
    guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
        fatalError("unknown buffer type: \(buffer)")
    }
    if pcmBuffer.frameLength == 0 {
        // A zero-length buffer marks the end of the utterance.
    } else {
        // Lazily create the output file from the first buffer's format,
        // then append every buffer to it. Both the AVAudioFile initializer
        // and write(from:) are throwing, hence the try?.
        if output == nil {
            output = try? AVAudioFile(
                forWriting: URL(fileURLWithPath: "test.caf"),
                settings: pcmBuffer.format.settings,
                commonFormat: .pcmFormatInt16,
                interleaved: false)
        }
        try? output?.write(from: pcmBuffer)
    }
}
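Note that a bare relative path like "test.caf" usually isn't writable inside an iOS sandbox; in practice you would point the AVAudioFile at a writable location such as the Documents directory. A minimal sketch (the speech.caf file name is just an example):
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
let fileURL = documentsURL.appendingPathComponent("speech.caf") // assumed file name
// ...then pass fileURL to AVAudioFile(forWriting:...) above.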
As of now AVSpeechSynthesizer does not support this. There is no way to get the audio file using AVSpeechSynthesizer. I tried this a few weeks ago for one of my apps and found out that it is not possible. Also, nothing has changed for AVSpeechSynthesizer in iOS 8.
I too thought of recording the sound as it is being played, but there are many flaws with that approach: the user might be using headphones, the system volume might be low or muted, and it might pick up other external sounds, so it's not advisable to go that way.
You can use OS X (or, maybe, some OS X-based service) to prepare AIFF files via the NSSpeechSynthesizer method
startSpeakingString:toURL:
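In Swift on the Mac that would look roughly like this (a sketch; the output path is just an example, and speaking is asynchronous, so keep the synthesizer alive and use its delegate's didFinishSpeaking callback to know when the file is complete):
import AppKit

let synth = NSSpeechSynthesizer()
let outputURL = URL(fileURLWithPath: "/tmp/speech.aiff") // example path
// Renders the speech into an AIFF file instead of playing it aloud.
synth.startSpeaking("test 123", to: outputURL)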
I'm trying to create an app with Swift.
I integrated correctly speech-to-text and text-to-speech: my app works perfectly. You can find my project here.
After speech-to-text, the app makes an HTTP request to a server (sending the recognized text), and the response (a string, e.g. "Ok, I'll show you something") is spoken aloud via text-to-speech. But there is a big issue and I can't solve it.
When the app speaks the text, the voice is too quiet, as if it were in the background, as if something more important were playing over it (there actually isn't).
While debugging, I discovered that the issue starts when I use audioEngine (AVAudioEngine) inside the recordAndRecognizeSpeech() function. Running the app without this function and speaking a random text works like a charm.
So, in my opinion, when the app speaks the text it thinks the audio engine is still active, and that is why the volume is very low.
But before speaking the text I call these functions (look inside the ac function, line 96):
audioEngine.stop()
audioEngine.reset()
How can I solve this issue?
EDIT:
I found a partial solution. Now, before the app speaks the text, my code is:
audioEngine.inputNode.removeTap(onBus: 0)
audioEngine.stop()
audioEngine.reset()
recognitionTask?.cancel()
isRecording = false
microphoneButton.setTitle("Avvia..", for: UIControl.State.normal);
do {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSession.Category.ambient)
    try audioSession.setActive(false, options: .notifyOthersOnDeactivation)
} catch {
    print(error)
}
make_request(msg: self.speech_result.text!)
The .setCategory call works and the volume goes back to the default level. But when I try to call recordAndRecognizeSpeech() again, the app throws this exception:
VAEInternal.h:70:_AVAE_Check: required condition is false: [AVAudioIONodeImpl.mm:910:SetOutputFormat: (IsFormatSampleRateAndChannelCountValid(hwFormat))]
This exception is caused by .setCategory(AVAudioSession.Category.ambient); it should be .playAndRecord, but with that value the volume goes back to being low.
Try this one:
let speaker = AVSpeechSynthesizer()

func say(text: String, language: String) {
    // Start the audio session and force output to the main speaker.
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSession.Category.playAndRecord)
        try audioSession.setMode(AVAudioSession.Mode.default)
        try audioSession.setActive(true)
        try audioSession.overrideOutputAudioPort(AVAudioSession.PortOverride.speaker)
    } catch {
        return
    }

    if speaker.isSpeaking {
        speaker.stopSpeaking(at: .immediate)
    } else {
        let myUtterance = AVSpeechUtterance(string: text)
        myUtterance.rate = AVSpeechUtteranceDefaultSpeechRate
        myUtterance.voice = AVSpeechSynthesisVoice(language: language)
        myUtterance.pitchMultiplier = 1
        myUtterance.volume = 1 // volume is clamped to the 0...1 range
        DispatchQueue.main.async {
            self.speaker.speak(myUtterance)
        }
    }
}
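For example, you would call it like this:
say(text: "Hello, this should come out of the main speaker.", language: "en-US")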
Try this. Set the rate to control how fast it speaks:
var speedd = AVSpeechSynthesizer()
var voicert = AVSpeechUtterance()
voicert = AVSpeechUtterance(string: "Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon")
voicert.voice = AVSpeechSynthesisVoice(language: "en-US")
voicert.rate = 0.5
speedd.speak(voicert)
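Note that 0.5 is actually AVSpeechUtteranceDefaultSpeechRate, so the snippet above speaks at normal speed; to speed it up, raise the rate toward AVSpeechUtteranceMaximumSpeechRate, for example:
voicert.rate = 0.7 // faster than the 0.5 default; valid values run from AVSpeechUtteranceMinimumSpeechRate to AVSpeechUtteranceMaximumSpeechRate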
I'd like to know how to properly let my iPhone speak one sentence while my app is in the background, but then return to whatever was playing before.
My question is quite similar to AVSpeechSynthesizer in background mode, but with the difference that I want to be able to "say something" while in the background without having to stop the music that is playing. So while my AVSpeechSynthesizer is speaking, music should pause (or get a bit quieter) but then resume, even when my app is currently in the background.
What I am trying to achieve is a spoken summary of tracking stats during GPS tracking in my fitness app. The chance that the user is listening to music is quite high, and I don't want to disturb them...
I found the answer myself...
The important part is to configure the AVAudioSession with the .duckOthers option:
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setCategory(AVAudioSessionCategoryPlayback, with: .duckOthers)
This will make other playback (e.g. music) less loud, but it stays less loud even after the speech is done. That is why you need to set a delegate on the AVSpeechSynthesizer and handle the end of speech like so:
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
    guard !synthesizer.isSpeaking else { return }
    let audioSession = AVAudioSession.sharedInstance()
    try? audioSession.setActive(false)
}
That way, music will continue with normal volume after speech is done.
Also, right before speaking, I activate my audioSession just to make sure (not sure if that would really be necessary, but since I do so, I have no more problems...)
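Putting it together, the pre-speech part could look roughly like this (a sketch; speakSummary is a made-up name, and the class is assumed to adopt AVSpeechSynthesizerDelegate so the didFinish handler above fires):
let synthesizer = AVSpeechSynthesizer()

func speakSummary(_ text: String) {
    let audioSession = AVAudioSession.sharedInstance()
    // Duck other audio (e.g. music) while the summary is spoken.
    try? audioSession.setCategory(AVAudioSessionCategoryPlayback, with: .duckOthers)
    try? audioSession.setActive(true)
    synthesizer.delegate = self // so the didFinish delegate method can deactivate the session
    synthesizer.speak(AVSpeechUtterance(string: text))
}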
For Swift 3, import AVKit, then add
try? AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayback)
For Swift 4.2 / Xcode 10
Unfortunately the old setCategory(_:with:) overload is no longer available from Swift; I managed to make it work like this:
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(AVAudioSession.Category.ambient, mode: .default)
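.duckOthers itself still exists as a category option, so if you want ducking rather than ambient mixing, the three-argument overload should also work here (an untested sketch using the same audioSession as above):
try? audioSession.setCategory(.playback, mode: .default, options: .duckOthers)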
Swift 5.0
let synthesizer = AVSpeechSynthesizer()
let synthesizerVoice = AVSpeechSynthesisVoice(language: "en-US")
let str = "String"
let utterance = AVSpeechUtterance(string: str)
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(
    AVAudioSession.Category.playback,
    options: AVAudioSession.CategoryOptions.mixWithOthers
)
utterance.rate = 0.5
utterance.voice = synthesizerVoice
synthesizer.speak(utterance)
Background music doesn't stop, and your speech is played even when the phone is in silent mode, if you add these two lines of code before the usual text-to-speech code:
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(AVAudioSessionCategoryPlayback, with: AVAudioSessionCategoryOptions.mixWithOthers)
I am having a really difficult time playing audio in the background of my app. The app is a timer that counts down and plays bells, and everything worked originally using the timer. Since you cannot run a timer for more than about 3 minutes in the background, I need to play the bells another way.
The user has the ability to choose bells and set the time for these bells to play (e.g. play a bell immediately, after 5 minutes, repeat another bell every 10 minutes, etc).
So far I have tried using notifications with DispatchQueue.main, and this works fine as long as the user does not pause the timer. If they re-enter the app and pause, though, I cannot seem to cancel or pause that queue in any way.
Next I tried using AVAudioEngine and created a set of nodes. These play while the app is in the foreground but seem to stop upon backgrounding. Additionally, when I pause the engine and resume later, it won't pause the sequence properly: it squishes the bells into playing one right after the other, or not at all.
If anyone has any ideas on how to solve my issue, that would be great. Technically I could remove everything from the engine and recreate it from the paused time when the user pauses/resumes, but this seems quite costly, and it also doesn't solve the problem of the audio stopping in the background. I have the required background mode 'App plays audio or streams audio/video using AirPlay', and it is also checked under Background Modes in Capabilities.
Below is a sample of how I tried to set up the audio engine. The registerAndPlaySound method is called several other times to create the chain of nodes (or is this done incorrectly?). The code is kind of messy at the moment because I have been trying many different ways to get this to work.
func setupSounds() {
    // Detach the player if it was attached on a previous call, then attach it fresh.
    if attached {
        engine.detach(player)
    }
    engine.attach(player)
    attached = true

    let mixer = engine.mainMixerNode
    engine.connect(player, to: mixer, format: mixer.outputFormat(forBus: 0))

    do {
        try engine.start()
    } catch {
        return
    }

    if let bell = currentSession.bellObject?.startBell {
        guard let url = Bundle.main.url(forResource: bell, withExtension: "mp3") else {
            return
        }
        registerAndPlaySound(url: url, delay: warmUpTime)
    }
}
func registerAndPlaySound(url: URL, delay: Double) {
    do {
        let file = try AVAudioFile(forReading: url)
        let format = file.processingFormat
        let capacity = file.length

        // AVAudioPCMBuffer(pcmFormat:frameCapacity:) is failable, so unwrap it.
        guard let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                            frameCapacity: AVAudioFrameCount(capacity)) else {
            return
        }
        do {
            try file.read(into: buffer)
        } catch {
            return
        }

        // Schedule the buffer `delay` seconds into the player's timeline.
        let sampleRate = buffer.format.sampleRate
        let sampleTime = sampleRate * delay
        let futureTime = AVAudioTime(sampleTime: AVAudioFramePosition(sampleTime), atRate: sampleRate)

        player.scheduleBuffer(buffer, at: futureTime, options: [], completionHandler: nil)
        player.play()
    } catch {
        return
    }
}
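One thing worth checking that isn't shown above: even with the audio background mode enabled, AVAudioEngine only keeps playing in the background if an AVAudioSession with a playback-type category is active. A minimal sketch of that setup (Swift 4.2+ spelling), placed before engine.start():
let session = AVAudioSession.sharedInstance()
try? session.setCategory(.playback, mode: .default, options: [])
try? session.setActive(true)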
Is it possible to use the enhanced/high-quality voices (Alex in the U.S.) with the speech synthesizer? I have downloaded the voices but find no way to tell the synthesizer to use them rather than the default voice.
Since voices are generally selected by BCP-47 codes and there is only one for US English, it appears there is no way to further differentiate voices. Am I missing something? (One would think Apple might have considered a need for different dialects, but I am not seeing it.)
TIA.
Yes, it is possible to pick from the two that seem to be available on my system, like this:
class Speak {
    let voices = AVSpeechSynthesisVoice.speechVoices()
    let voiceSynth = AVSpeechSynthesizer()
    var voiceToUse: AVSpeechSynthesisVoice?

    init() {
        for voice in voices {
            if voice.name == "Samantha (Enhanced)" && voice.quality == .enhanced {
                voiceToUse = voice
            }
        }
    }

    func sayThis(_ phrase: String) {
        let utterance = AVSpeechUtterance(string: phrase)
        utterance.voice = voiceToUse
        utterance.rate = 0.5
        voiceSynth.speak(utterance)
    }
}
Then, somewhere in your app, do something like this:
let voice = Speak()
voice.sayThis("I'm speaking better Seppo, now!")
This was a bug in previous versions of iOS: apps using the synthesizer weren't getting the enhanced voices. It has been fixed in iOS 10, which now uses the enhanced voices.
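If it is specifically Alex you are after, you can also try selecting the voice by identifier instead of scanning by name (a small sketch; the voice must already be downloaded on the device, otherwise the initializer returns nil and the default voice is used):
let synth = AVSpeechSynthesizer() // keep a strong reference while it speaks
let utterance = AVSpeechUtterance(string: "Testing the enhanced voice.")
utterance.voice = AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)
synth.speak(utterance)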
When using an AVAudioPlayerNode to schedule a short buffer to play immediately on a touch event ("Touch Up Inside"), I've noticed audible glitches/artifacts on playback while testing. The audio does not glitch at all in the iOS simulator, but there is audible distortion on playback when I run the app on an actual iOS device. The distortion occurs randomly (the triggered sound sometimes sounds great, while other times it sounds distorted).
I've tried using different audio files, file formats, and preparing the buffer for playback using the prepareWithFrameCount method, but unfortunately the result is always the same and I'm stuck wondering what could be going wrong.
I've stripped the code down to globals for clarity and simplicity. Any help or insight would be greatly appreciated. This is my first attempt at developing an iOS app and my first question posted on Stack Overflow.
let filePath = NSBundle.mainBundle().pathForResource("BD_withSilence", ofType: "caf")!
let fileURL: NSURL = NSURL(fileURLWithPath: filePath)!
var error: NSError?
let file = AVAudioFile(forReading: fileURL, error: &error)
let fileFormat = file.processingFormat
let frameCount = UInt32(file.length)
let buffer = AVAudioPCMBuffer(PCMFormat: fileFormat, frameCapacity: frameCount)
let audioEngine = AVAudioEngine()
let playerNode = AVAudioPlayerNode()
func startEngine() {
    var error: NSError?
    file.readIntoBuffer(buffer, error: &error)
    audioEngine.attachNode(playerNode)
    audioEngine.connect(playerNode, to: audioEngine.mainMixerNode, format: buffer.format)
    audioEngine.prepare()

    func start() {
        var error: NSError?
        audioEngine.startAndReturnError(&error)
    }
    start()
}
startEngine()
let frameCapacity = AVAudioFramePosition(buffer.frameCapacity)
let frameLength = buffer.frameLength
let sampleRate: Double = 44100.0
func play() {
    func scheduleBuffer() {
        playerNode.scheduleBuffer(buffer, atTime: nil, options: AVAudioPlayerNodeBufferOptions.Interrupts, completionHandler: nil)
        playerNode.prepareWithFrameCount(frameLength)
    }

    if playerNode.playing == false {
        scheduleBuffer()
        let time = AVAudioTime(sampleTime: frameCapacity, atRate: sampleRate)
        playerNode.playAtTime(time)
    } else {
        scheduleBuffer()
    }
}
// triggered by a "Touch Up Inside" event on a UIButton in my ViewController
@IBAction func triggerPlay(sender: AnyObject) {
    play()
}
Update:
Ok I think I've identified the source of the distortion: the volume of the node(s) is too great at output and causes clipping. By adding these two lines in my startEngine function, the distortion no longer occurred:
playerNode.volume = 0.8
audioEngine.mainMixerNode.volume = 0.8
However, I still don't know why I need to lower the output; my audio file itself does not clip. I'm guessing it might be a result of how AVAudioPlayerNodeBufferOptions.Interrupts is implemented. When a buffer interrupts another buffer, could there be an increase in output volume as a result of the interruption, causing output clipping? I'm still looking for a solid understanding of why this occurs. If anyone is willing/able to provide any clarification about this, that would be fantastic!
Not sure if this is the problem you experienced in 2015; it may be the same issue that #suthar experienced in 2018.
I experienced a very similar problem, and it was due to the fact that the sample rate on the device is different from the simulator's. On macOS it is 44100, and on iOS devices (late-model ones) it is 48000.
So when you fill your buffer with 44100 samples per second on a 48000 device, you get 3900 samples of silence each second. When played back, it doesn't sound like silence; it sounds like a glitch.
I used the mainMixer format when connecting my playerNode and also when creating my pcmBuffer. Don't refer to 48000 or 44100 anywhere in the code.
audioEngine.attach(playerNode)
audioEngine.connect(playerNode, to: mixerNode, format: mixerNode.outputFormat(forBus: 0))

let pcmBuffer = AVAudioPCMBuffer(pcmFormat: SynthEngine.shared.audioEngine.mainMixerNode.outputFormat(forBus: 0),
                                 frameCapacity: AVAudioFrameCount(bufferSize))
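If you want to confirm what rate the hardware is actually running at, something like this should do (a quick diagnostic sketch):
let hwFormat = audioEngine.outputNode.outputFormat(forBus: 0)
print("Hardware sample rate:", hwFormat.sampleRate) // e.g. 48000 on recent devices, 44100 in the simulator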