AVSpeechSynthesizer: how to display in the default player view - iOS

I use AVSpeechSynthesizer to play text books as audio:
private lazy var synthesizer: AVSpeechSynthesizer = {
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.delegate = self
    return synthesizer
}()

let utterance = AVSpeechUtterance(string: text)
utterance.voice = AVSpeechSynthesisVoice(
    language: languageIdentifier(from: language)
)
synthesizer.speak(utterance)
I want to update the information in the iPhone's default player view (probably my naming is wrong 🙏):
indicate the currently playing chapter with some text
enable the next button to play the next chapter
How can I accomplish this?

I really don't think you want to hack your way through this, but if you really do, I would (a rough sketch follows these steps):
Listen to remote commands (UIApplication.sharedApplication().beginReceivingRemoteControlEvents(); see the Apple sample project).
Set your properties on MPNowPlayingInfoCenter: MPNowPlayingInfoCenter.default().nowPlayingInfo?[MPMediaItemPropertyTitle] = "Title"
Implement the AVSpeechSynthesizerDelegate, try to map the delegate functions to playback states, and estimate the playback progress using speechSynthesizer(_:willSpeakRangeOfSpeechString:utterance:) (I don't know if that's possible).
You might have to play with the usesApplicationAudioSession property of AVSpeechSynthesizer to have more control over the audio session (set categories, etc.).
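Putting those pieces together, a rough Swift sketch might look like the following. The class and property names (ChapterReader, chapterTitle, playNextChapter) are illustrative, not from your code, and the audio session usually also needs to be active with a playback category before the lock-screen info appears:

import AVFoundation
import MediaPlayer

// Rough sketch: remote commands and Now Playing info drive the system player view
// (lock screen / Control Center). All names here are illustrative.
final class ChapterReader: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    var chapterTitle = "Chapter 1"
    var playNextChapter: (() -> Void)?

    override init() {
        super.init()
        synthesizer.delegate = self

        // Enable the "next" button shown by the system player view.
        let commandCenter = MPRemoteCommandCenter.shared()
        commandCenter.nextTrackCommand.isEnabled = true
        _ = commandCenter.nextTrackCommand.addTarget { [weak self] _ in
            self?.playNextChapter?()
            return .success
        }
    }

    func speak(_ text: String) {
        synthesizer.speak(AVSpeechUtterance(string: text))

        // Publish the chapter title so it appears in the player view.
        MPNowPlayingInfoCenter.default().nowPlayingInfo = [
            MPMediaItemPropertyTitle: chapterTitle,
            MPNowPlayingInfoPropertyPlaybackRate: 1.0
        ]
    }

    // Map delegate callbacks to a playback state for the system UI.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        MPNowPlayingInfoCenter.default().nowPlayingInfo?[MPNowPlayingInfoPropertyPlaybackRate] = 0.0
    }
}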

Related

Play audio buffers generated by AVSpeechSynthesizer directly

We have a requirement for audio processing on the output of AVSpeechSynthesizer, so we started using the write method of the AVSpeechSynthesizer class to apply processing on top of it. What we currently have:
var synthesizer = AVSpeechSynthesizer()
var playerNode: AVAudioPlayerNode = AVAudioPlayerNode()

func play(audioCue: String) {
    let utterance = AVSpeechUtterance(string: audioCue)
    synthesizer.write(utterance, toBufferCallback: { [weak self] buffer in
        guard let self = self, let pcmBuffer = buffer as? AVAudioPCMBuffer else { return }
        // We do our processing here, including conversion from pcmFormatFloat16
        // to the pcmFormatFloat32 format that AVAudioPlayerNode supports.
        self.playerNode.scheduleBuffer(pcmBuffer, completionCallbackType: .dataPlayedBack) { _ in }
    })
}
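(For context, and not part of the original snippet: the player node presumably also has to be attached to a running AVAudioEngine before anything is audible. A minimal sketch of that setup, with an illustrative format, might be:)

let engine = AVAudioEngine()
engine.attach(playerNode)
// Connecting with the format of the first buffer delivered by the callback is safer;
// the explicit format here is purely illustrative.
let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 22050, channels: 1, interleaved: false)
engine.connect(playerNode, to: engine.mainMixerNode, format: format)
try? engine.start()
playerNode.play()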
All of it was working fine before iOS 16, but with iOS 16 we started getting this exception:
[AXTTSCommon] TTSPlaybackEnqueueFullAudioQueueBuffer: error -66686 enqueueing buffer
We're not sure what this exception means exactly, so we are looking for a way to address it, or maybe a better way of playing the buffers.
UPDATE:
I created an empty project for testing, and it turns out that the write method, when called with an empty block, generates these logs as well.
Code I have used for a Swift project:
let synth = AVSpeechSynthesizer()
let myUtterance = AVSpeechUtterance(string: message)
myUtterance.rate = 0.4
synth.speak(myUtterance)
You can move let synth = AVSpeechSynthesizer() out of this method and declare it at the top of the class as a property instead.
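A minimal sketch of that (the class name is illustrative); keeping the synthesizer as a stored property also prevents it from being deallocated before it finishes speaking:

final class SpeechManager {
    // Held as a property so the synthesizer stays alive while it is speaking.
    private let synth = AVSpeechSynthesizer()

    func speak(_ message: String) {
        let myUtterance = AVSpeechUtterance(string: message)
        myUtterance.rate = 0.4
        synth.speak(myUtterance)
    }
}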
Settings to enable for Xcode 14 & iOS 16: if you are using Xcode 14 and iOS 16, it may be that the voices under Spoken Content are not downloaded, and you will get an error on the console saying identifier, source, content nil. All you need to do is go to Settings -> Accessibility -> Spoken Content -> Voices -> select any language and download any voice. After this, run your code again and you will be able to hear the speech for the passed text.
It is working for me now.

Real-time rate and pitch adjustments in Swift

I am setting up a TTS app with AVSpeechSynthesizer. I have to do real-time pitch and rate adjustments, and I am using a UISlider for adjusting pitch and rate.
Here is my code:
@IBAction func sl(_ sender: UISlider) {
    if synthesizer.isSpeaking {
        synthesizer.stopSpeaking(at: .immediate)
        self.rate = sender.value
        if currentRange.length > 0 {
            let valuee = currentRange.length + currentRange.location
            let neww = self.tvEditor.text.dropFirst(valuee)
            self.tvEditor.text = String(neww)
            synthesizer.speak(buildUtterance(for: rate, pitch: pitch, with: String(neww), language: self.preferredVoiceLanguageCode2 ?? "en"))
        }
    } else {
    }
}
I may have understood your problem even though few details are provided: you can't take the new rate and pitchMultiplier values into account while the speech is running.
To explain the following details, I refer to this example, which contains code snippets (ObjC, Swift) and illustrations.
Create your AVSpeechUtterance instances with their rate and pitchMultiplier properties.
Add each of them to an array that will represent the queue to be spoken.
Loop over that queue and have the synthesizer read out every element.
Now, if you want to change the property values in real time, follow the steps hereafter once one of your sliders moves:
Get the current spoken utterance thanks to the AVSpeechSynthesizerDelegate protocol.
Run the synthesizer's stopSpeaking method, which removes from the queue the utterances that haven't been spoken yet.
Recreate the removed utterances with the new property values.
Redo steps 2 and 3 to resume where you stopped, with these updated values.
The synthesizer queues all the information to be spoken long before you ask for new values, and those values don't affect the utterances that are already stored: you must remove and recreate the utterances with their new property values for them to be spoken.
If the code example provided by the link above isn't enough, I suggest taking a look at this WWDC video's detailed summary dealing with AVSpeechSynthesizer.
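A condensed Swift sketch of that stop-and-requeue idea, assuming plain text where character counts and NSRange offsets line up; all names here are illustrative:

import AVFoundation

final class RealTimeSpeaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var fullText = ""
    private var baseOffset = 0   // characters dropped by earlier requeues
    private var rangeInCurrentUtterance = NSRange(location: 0, length: 0)

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func start(text: String, rate: Float, pitch: Float) {
        fullText = text
        baseOffset = 0
        rangeInCurrentUtterance = NSRange(location: 0, length: 0)
        speakRemainder(rate: rate, pitch: pitch)
    }

    // Call this from the slider handlers: drop what hasn't been spoken yet
    // and requeue it with the new rate/pitch.
    func update(rate: Float, pitch: Float) {
        guard synthesizer.isSpeaking else { return }
        synthesizer.stopSpeaking(at: .immediate)
        baseOffset += rangeInCurrentUtterance.location + rangeInCurrentUtterance.length
        rangeInCurrentUtterance = NSRange(location: 0, length: 0)
        speakRemainder(rate: rate, pitch: pitch)
    }

    private func speakRemainder(rate: Float, pitch: Float) {
        let remaining = String(fullText.dropFirst(baseOffset))
        let utterance = AVSpeechUtterance(string: remaining)
        utterance.rate = rate
        utterance.pitchMultiplier = pitch
        synthesizer.speak(utterance)
    }

    // Track how far the current utterance has progressed.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        rangeInCurrentUtterance = characterRange
    }
}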

Use AVSpeechSynthesizer in the background without stopping, e.g., the Music app

I'd like to know how to properly let my iPhone speak one sentence while my app is in the background, and then return to whatever was playing before.
My question is quite similar to AVSpeechSynthesizer in background mode, but with the difference that I want to be able to "say something" while in the background without having to stop the music that is playing. So while my AVSpeechSynthesizer is speaking, music should pause (or get a bit quieter) but then resume, even when my app is currently in the background.
What I am trying to achieve is a spoken summary of tracking stats during GPS tracking in my fitness app. The chance that the user is listening to music is quite high, and I don't want to disturb them...
I found the answer myself...
The important part is to configure the AVAudioSession with the .duckOthers option:
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setCategory(AVAudioSessionCategoryPlayback, with: .duckOthers)
This will make playback of, e.g., music less loud, but it would stay less loud even after the speech is done. That is why you need to set a delegate on the AVSpeechSynthesizer and then handle it like so:
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
    guard !synthesizer.isSpeaking else { return }
    let audioSession = AVAudioSession.sharedInstance()
    try? audioSession.setActive(false)
}
That way, music will continue at its normal volume after the speech is done.
Also, right before speaking, I activate my audio session just to make sure (I'm not sure that's really necessary, but since I started doing it, I've had no more problems...).
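For reference, a minimal sketch of the same flow with the current API names (the .playback/.spokenAudio/.duckOthers combination is one reasonable choice, not the only one):

import AVFoundation

final class SummarySpeaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        let session = AVAudioSession.sharedInstance()
        // Duck (lower) other audio such as music while we speak.
        try? session.setCategory(.playback, mode: .spokenAudio, options: [.duckOthers])
        try? session.setActive(true)
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        guard !synthesizer.isSpeaking else { return }
        // Hand the session back so other audio returns to full volume.
        try? AVAudioSession.sharedInstance().setActive(false, options: .notifyOthersOnDeactivation)
    }
}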
For Swift 3, import AVKit, then add
try? AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayback)
For Swift 4.2 / Xcode 10:
unfortunately, the old setCategory(_:with:) overload is no longer available from Swift; I managed to make it work like this:
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(AVAudioSession.Category.ambient, mode: .default)
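Note that the option itself still exists as AVAudioSession.CategoryOptions.duckOthers; with the newer overload it can be requested like this (a small sketch):

let session = AVAudioSession.sharedInstance()
try? session.setCategory(.playback, mode: .spokenAudio, options: [.duckOthers])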
Swift 5.0
let synthesizer = AVSpeechSynthesizer()
let synthesizerVoice = AVSpeechSynthesisVoice(language: "en-US")
let str = "String"
let utterance = AVSpeechUtterance(string: str)

let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(
    AVAudioSession.Category.playback,
    options: AVAudioSession.CategoryOptions.mixWithOthers
)

utterance.rate = 0.5
utterance.voice = synthesizerVoice
synthesizer.speak(utterance)
With these two lines of code before the usual text-to-speech code, background music doesn't stop, and your speech plays even when the phone is in silent mode:
let audioSession = AVAudioSession.sharedInstance()
try! audioSession.setCategory(AVAudioSessionCategoryPlayback, with: AVAudioSessionCategoryOptions.mixWithOthers)

AVSpeechSynthesizer High Quality Voices

Is it possible to use the enhanced/high-quality voices (Alex in the U.S.) with the speech synthesizer? I have downloaded the voices but find no way to tell the synthesizer to use them rather than the default voice.
Since voices are generally selected by BCP-47 codes, and there is only one for US English, it appears there is no way to further differentiate voices. Am I missing something? (One would think Apple might have considered a need for different dialects, but I am not seeing it.)
TIA.
Yes, it's possible to pick from the two that seem to be available on my system, like this:
class Speak {
    let voices = AVSpeechSynthesisVoice.speechVoices()
    let voiceSynth = AVSpeechSynthesizer()
    var voiceToUse: AVSpeechSynthesisVoice?

    init() {
        for voice in voices {
            if voice.name == "Samantha (Enhanced)" && voice.quality == .enhanced {
                voiceToUse = voice
            }
        }
    }

    func sayThis(_ phrase: String) {
        let utterance = AVSpeechUtterance(string: phrase)
        utterance.voice = voiceToUse
        utterance.rate = 0.5
        voiceSynth.speak(utterance)
    }
}
Then, somewhere in your app, do something like this:
let voice = Speak()
voice.sayThis("I'm speaking better Seppo, now!")
This was a bug in previous versions of iOS: apps using the synthesizer weren't getting the enhanced voices. The bug has been fixed in iOS 10, which now uses the enhanced voices.
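Besides filtering speechVoices() by name, a specific voice such as Alex can also be requested by identifier, assuming the user has downloaded it; a short sketch with a fallback to the default en-US voice:

let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "Hello from Alex.")
// AVSpeechSynthesisVoiceIdentifierAlex is the built-in identifier constant for Alex;
// the initializer returns nil if that voice isn't installed on the device.
utterance.voice = AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)
    ?? AVSpeechSynthesisVoice(language: "en-US")
synthesizer.speak(utterance)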

AVSpeechSynthesizer output as file?

AVSpeechSynthesizer has a fairly simple API, which doesn't have built-in support for saving to an audio file.
I'm wondering if there's a way around this - perhaps recording the output as it's played silently, for playback later? Or something more efficient.
This is finally possible: as of iOS 13, AVSpeechSynthesizer has write(_:toBufferCallback:):
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "test 123")
utterance.voice = AVSpeechSynthesisVoice(language: "en")

var output: AVAudioFile?

synthesizer.write(utterance) { (buffer: AVAudioBuffer) in
    guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
        fatalError("unknown buffer type: \(buffer)")
    }
    if pcmBuffer.frameLength == 0 {
        // done
    } else {
        // append buffer to file
        if output == nil {
            output = try? AVAudioFile(
                forWriting: URL(fileURLWithPath: "test.caf"),
                settings: pcmBuffer.format.settings,
                commonFormat: .pcmFormatInt16,
                interleaved: false)
        }
        try? output?.write(from: pcmBuffer)
    }
}
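A practical note on the snippet above: on iOS, a bare relative path like "test.caf" usually isn't writable from the app sandbox, so writing into, say, the temporary directory and keeping a strong reference to the player tends to work better (paths and playback step are illustrative):

let fileURL = FileManager.default.temporaryDirectory.appendingPathComponent("test.caf")
// ...pass fileURL to AVAudioFile(forWriting:...) inside the callback above, then later:
let player = try? AVAudioPlayer(contentsOf: fileURL)
player?.play()   // keep `player` referenced (e.g. as a property) while it plays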
As of now, AVSpeechSynthesizer does not support this. There is no way to get the audio file using AVSpeechSynthesizer. I tried this a few weeks ago for one of my apps and found out that it is not possible; also, nothing has changed for AVSpeechSynthesizer in iOS 8.
I too thought of recording the sound as it is being played, but there are many flaws with that approach: the user might be using headphones, the system volume might be low or muted, it might pick up other external sounds, so it's not advisable to go with that approach.
You can use OS X to prepare AIFF files (or, maybe, some OS X-based service) via the NSSpeechSynthesizer method startSpeakingString:toURL:
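For completeness, a macOS-only sketch of that route; the output path is illustrative, and since synthesis to file is asynchronous, a real tool would wait for the NSSpeechSynthesizerDelegate to report completion before exiting:

import AppKit

let macSynthesizer = NSSpeechSynthesizer()
let outputURL = URL(fileURLWithPath: "/tmp/speech.aiff")   // illustrative path
let started = macSynthesizer.startSpeaking("Hello from the Mac", to: outputURL)
print("Started writing AIFF:", started)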
