Play audio buffers generated by AVSpeechSynthesizer directly - ios

We have a requirement for audio processing on the output of AVSpeechSynthesizer. So we started using the write method of the AVSpeechSynthesizer class so that we can apply processing on top of it. What we currently have:
var synthesizer = AVSpeechSynthesizer()
var playerNode: AVAudioPlayerNode = AVAudioPlayerNode()

func play(audioCue: String) {
    let utterance = AVSpeechUtterance(string: audioCue)
    synthesizer.write(utterance, toBufferCallback: { [weak self] buffer in
        guard let self = self, let pcmBuffer = buffer as? AVAudioPCMBuffer else { return }
        // We do our processing here, including conversion from the synthesizer's output format
        // to the pcmFormatFloat32 format which is supported by AVAudioPlayerNode.
        self.playerNode.scheduleBuffer(pcmBuffer, completionCallbackType: .dataPlayedBack) { _ in }
    })
}
All of it was working fine before iOS 16 but with iOS 16 we started getting this exception:
[AXTTSCommon] TTSPlaybackEnqueueFullAudioQueueBuffer: error -66686 enqueueing buffer
Not sure what this exception means exactly, so we are looking for a way of addressing it, or maybe a better way of playing the buffers.
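One pattern that may help (shown below as a sketch, not an exact fix for the -66686 error) is to make sure the player node is attached to a running AVAudioEngine, skip the final zero-length buffer, and convert each buffer to a Float32 format before scheduling it. The class name SpokenAudioPipeline and the lazy engine setup are illustrative assumptions, not part of the original code:

import AVFoundation

// Minimal sketch: assumes the synthesizer delivers PCM buffers whose sample rate and
// channel count can be kept while converting the samples to Float32 (the format
// AVAudioPlayerNode accepts).
final class SpokenAudioPipeline {
    private let engine = AVAudioEngine()
    private let playerNode = AVAudioPlayerNode()
    private let synthesizer = AVSpeechSynthesizer()
    private var isConfigured = false

    func play(audioCue: String) {
        let utterance = AVSpeechUtterance(string: audioCue)
        synthesizer.write(utterance) { [weak self] buffer in
            guard let self = self,
                  let inBuffer = buffer as? AVAudioPCMBuffer,
                  inBuffer.frameLength > 0 else { return }          // a zero-length buffer marks the end

            // Target format: Float32, same sample rate and channel count as the input.
            let inFormat = inBuffer.format
            guard let floatFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                                  sampleRate: inFormat.sampleRate,
                                                  channels: inFormat.channelCount,
                                                  interleaved: false),
                  let converter = AVAudioConverter(from: inFormat, to: floatFormat),
                  let outBuffer = AVAudioPCMBuffer(pcmFormat: floatFormat,
                                                   frameCapacity: inBuffer.frameCapacity)
            else { return }

            do {
                // Same sample rate on both sides, so the simple conversion API is enough.
                try converter.convert(to: outBuffer, from: inBuffer)
            } catch {
                print("Conversion failed: \(error)")
                return
            }

            // Set up the engine once we know the stream format, then schedule buffers.
            if !self.isConfigured {
                self.engine.attach(self.playerNode)
                self.engine.connect(self.playerNode, to: self.engine.mainMixerNode, format: floatFormat)
                try? self.engine.start()
                self.playerNode.play()
                self.isConfigured = true
            }
            self.playerNode.scheduleBuffer(outBuffer, completionCallbackType: .dataPlayedBack) { _ in }
        }
    }
}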
UPDATE:
Created an empty project for testing, and it turns out the write method generates these logs if it is called with an empty block:

Code I have used for the Swift project:
let synth = AVSpeechSynthesizer()
let myUtterance = AVSpeechUtterance(string: message)
myUtterance.rate = 0.4
synth.speak(myUtterance)
You can move let synth = AVSpeechSynthesizer() out of this method and declare it at the top of the class instead.
Settings to enable for Xcode 14 & iOS 16: If you are using Xcode 14 and iOS 16, it may be that the voices under Spoken Content are not downloaded, and you will get an error on the console saying the identifier, source, and content are nil. All you need to do is go to Settings -> Accessibility -> Spoken Content -> Voices -> select any language and download any voice. After this, run your app again and you will be able to hear the speech for the passed text.
It is working for me now.
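If you want to verify from code which voices are actually installed (rather than relying only on the Settings step above), something along these lines can help; the helper name is made up for illustration:

import AVFoundation

// Hypothetical helper: print the voices installed on the device so you can confirm
// a usable voice exists before calling speak(_:).
func logInstalledVoices() {
    for voice in AVSpeechSynthesisVoice.speechVoices() {
        print(voice.language, voice.name, voice.quality.rawValue, voice.identifier)
    }
}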

Related

AVSpeechSynthesizer: how to display in a default player view

I use AVSpeechSynthesizer to play textbooks via audio.
private lazy var synthesizer: AVSpeechSynthesizer = {
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.delegate = self
    return synthesizer
}()

let utterance = AVSpeechUtterance(string: text)
utterance.voice = AVSpeechSynthesisVoice(
    language: languageIdentifier(from: language)
)
synthesizer.speak(utterance)
I want to update information in the iPhone's default player view (probably the naming is wrong 🙏):
indicate the chapter being played with some text
enable the next button to play the next chapter
How can I accomplish this?
I really don't think you want to hack your way through this. But if you really do, I would:
Listen to remote commands (UIApplication.sharedApplication().beginReceivingRemoteControlEvents(); see Apple's sample project).
Set your properties on MPNowPlayingInfoCenter: MPNowPlayingInfoCenter.default().nowPlayingInfo?[MPMediaItemPropertyTitle] = "Title"
Implement AVSpeechSynthesizerDelegate and try to map the delegate functions to playback states, estimating the playback progress from speechSynthesizer(_:willSpeakRangeOfSpeechString:utterance:) (not sure whether this is possible).
You might have to play with the usesApplicationAudioSession property of AVSpeechSynthesizer to have more control over the audio session (set categories etc.). A rough sketch of how these pieces fit together follows below.
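For what it's worth, here is a rough sketch of that approach. The class and method names (ChapterReader, speak(chapterTitle:text:), playNextChapter()) are made up for illustration, and the progress mapping is only a character-based estimate:

import AVFoundation
import MediaPlayer
import UIKit

final class ChapterReader: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self

        // 1. Opt in to remote control events and handle the "next track" command.
        UIApplication.shared.beginReceivingRemoteControlEvents()
        _ = MPRemoteCommandCenter.shared().nextTrackCommand.addTarget { [weak self] _ in
            self?.playNextChapter()
            return .success
        }
    }

    func speak(chapterTitle: String, text: String) {
        // 2. Publish metadata for the lock screen / Control Center.
        MPNowPlayingInfoCenter.default().nowPlayingInfo = [
            MPMediaItemPropertyTitle: chapterTitle
        ]
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    // 3. Roughly map delegate callbacks to playback progress.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        let fractionSpoken = Double(characterRange.location) / Double(max(utterance.speechString.count, 1))
        var info = MPNowPlayingInfoCenter.default().nowPlayingInfo ?? [:]
        info[MPMediaItemPropertyPlaybackDuration] = 1.0
        info[MPNowPlayingInfoPropertyElapsedPlaybackTime] = fractionSpoken
        MPNowPlayingInfoCenter.default().nowPlayingInfo = info
    }

    private func playNextChapter() {
        // Load the next chapter and call speak(chapterTitle:text:) again.
    }
}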

How can I play sound system with text?

Do we have any way to get sound from text?
For example, we have:
let str = "Hello English"
I want the system to speak that text.
As Ravy Chheng answered, this uses the built-in AVFoundation framework to provide basic text-to-speech functionality. For more information, check out Apple's documentation: https://developer.apple.com/reference/avfoundation/avspeechsynthesizer
import AVFoundation

// Note: in practice, keep the synthesizer as a stored property; a local instance
// can be deallocated before it finishes speaking.
func playSound(str: String) {
    let speechSynthesizer = AVSpeechSynthesizer()
    let speechUtterance = AVSpeechUtterance(string: str)
    speechSynthesizer.speak(speechUtterance)
}

Record audio from text to speech ios AVAudioSession

I am using Xamarin to develop an iOS app. I need to be able to record the audio from an AVSpeechUtterance to a file, or stream the audio buffer from internal audio to a method that will populate a byte array.
I've searched high and low for how to record internal audio (from the text-to-speech API). I'm only able to use the internal microphone for recording, not the audio being played by the app.
Currently using Xamarin's example for text to speech:
public void Speak (string text)
{
    var speechSynthesizer = new AVSpeechSynthesizer ();
    var speechUtterance = new AVSpeechUtterance (text) {
        Rate = AVSpeechUtterance.MaximumSpeechRate / 2,
        Voice = AVSpeechSynthesisVoice.FromLanguage ("en-US"),
        Volume = 0.5f,
        PitchMultiplier = 1.0f
    };
    speechSynthesizer.SpeakUtterance (speechUtterance);
}
Hints/solution can be C#, Swift or Objective C. I just need to be pointed in the general right direction.
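One direction that may help (on iOS 13 and later): instead of recording what the speakers play, ask the synthesizer for its buffers directly via write(_:toBufferCallback:) and copy the PCM bytes into a byte array. The sketch below is Swift, and the same AVSpeechSynthesizer API should also be reachable through the Xamarin bindings; renderSpeech is a made-up name:

import AVFoundation

let synthesizer = AVSpeechSynthesizer()   // keep a strong reference while buffers are delivered

func renderSpeech(_ text: String, completion: @escaping (Data) -> Void) {
    let utterance = AVSpeechUtterance(string: text)
    var data = Data()

    synthesizer.write(utterance) { buffer in
        guard let pcmBuffer = buffer as? AVAudioPCMBuffer else { return }
        if pcmBuffer.frameLength == 0 {
            completion(data)   // a zero-length buffer marks the end of the utterance
        } else {
            // Copy the raw PCM bytes out of the buffer's underlying AudioBufferList.
            // Speech output is mono, so reading the first AudioBuffer is enough here.
            let bytesPerFrame = Int(pcmBuffer.format.streamDescription.pointee.mBytesPerFrame)
            let byteCount = Int(pcmBuffer.frameLength) * bytesPerFrame
            if let mData = pcmBuffer.audioBufferList.pointee.mBuffers.mData {
                data.append(Data(bytes: mData, count: byteCount))
            }
        }
    }
}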

AVSpeechSynthesizer High Quality Voices

Is it possible to use the enhanced/high-quality voices (Alex in the U.S.) with the speech synthesizer? I have downloaded the voices but find no way to tell the synthesizer to use them rather than the default voice.
Since voices are generally selected by BCP-47 codes and there is only one for US English, it appears there is no way to further differentiate voices. Am I missing something? (One would think Apple might have considered a need for different dialects, but I am not seeing it.)
TIA.
Yes, it is possible to pick from the two that seem to be available on my system, like this:
class Speak {
    let voices = AVSpeechSynthesisVoice.speechVoices()
    let voiceSynth = AVSpeechSynthesizer()
    var voiceToUse: AVSpeechSynthesisVoice?

    init() {
        for voice in voices {
            if voice.name == "Samantha (Enhanced)" && voice.quality == .enhanced {
                voiceToUse = voice
            }
        }
    }

    func sayThis(_ phrase: String) {
        let utterance = AVSpeechUtterance(string: phrase)
        utterance.voice = voiceToUse
        utterance.rate = 0.5
        voiceSynth.speak(utterance)
    }
}
Then, somewhere in your app, do something like this:
let voice = Speak()
voice.sayThis("I'm speaking better Seppo, now!")
There was a bug in previous versions of iOS where apps using the synthesizer would not get the enhanced voices. This has been fixed in iOS 10, which now uses the enhanced voices.
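On newer iOS versions you can also pick a specific voice by identifier instead of scanning speechVoices(); Alex, for example, has a dedicated constant. This is a small sketch, assuming the voice has been downloaded on the device; sayWithAlex is a made-up name:

import AVFoundation

let synthesizer = AVSpeechSynthesizer()   // keep a strong reference while speaking

func sayWithAlex(_ phrase: String) {
    let utterance = AVSpeechUtterance(string: phrase)
    // AVSpeechSynthesisVoiceIdentifierAlex is provided by AVFoundation; if the voice
    // is not installed, this initializer returns nil and the default voice is used.
    utterance.voice = AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)
    synthesizer.speak(utterance)
}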

AVSpeechSynthesizer output as file?

AVSpeechSynthesizer has a fairly simple API, which doesn't have built-in support for saving to an audio file.
I'm wondering if there's a way around this - perhaps recording the output as it's played silently, for playback later? Or something more efficient.
This is finally possible: in iOS 13, AVSpeechSynthesizer gained write(_:toBufferCallback:):
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "test 123")
utterance.voice = AVSpeechSynthesisVoice(language: "en")

var output: AVAudioFile?
// Write to a location the app can actually write to (e.g. the temporary directory).
let outputURL = FileManager.default.temporaryDirectory.appendingPathComponent("test.caf")

synthesizer.write(utterance) { (buffer: AVAudioBuffer) in
    guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
        fatalError("unknown buffer type: \(buffer)")
    }
    if pcmBuffer.frameLength == 0 {
        // done: a zero-length buffer marks the end of the utterance
    } else {
        // Lazily create the file using the buffer's own format, then append each buffer.
        do {
            if output == nil {
                output = try AVAudioFile(
                    forWriting: outputURL,
                    settings: pcmBuffer.format.settings,
                    commonFormat: pcmBuffer.format.commonFormat,
                    interleaved: pcmBuffer.format.isInterleaved)
            }
            try output?.write(from: pcmBuffer)
        } catch {
            print("Failed to write buffer: \(error)")
        }
    }
}
As of now, AVSpeechSynthesizer does not support this. There is no way to get the audio file using AVSpeechSynthesizer. I tried this a few weeks ago for one of my apps and found out that it is not possible; also, nothing has changed for AVSpeechSynthesizer in iOS 8.
I too thought of recording the sound as it is being played, but there are many flaws with that approach: the user might be using headphones, the system volume might be low or muted, and it might pick up other external sound, so it's not advisable to go that way.
You can use macOS (or, maybe, some macOS-based service) to prepare AIFF files via the NSSpeechSynthesizer method
startSpeakingString:toURL:
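On macOS that looks roughly like the sketch below; the output path is just an example, and speaking is asynchronous, so keep the synthesizer alive until it finishes:

import AppKit

let synthesizer = NSSpeechSynthesizer()
let outputURL = URL(fileURLWithPath: "/tmp/speech.aiff")   // example path
// Renders the utterance to an AIFF file instead of the speakers; returns false on failure.
if !synthesizer.startSpeaking("Hello from macOS", to: outputURL) {
    print("Could not start speaking")
}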
