iOS/Swift/AVSpeechSynthesizer: Control Speed of Enqueued Utterances

In the spirit of this tutorial for an audiobook (although I'm not following it exactly), I have tried to exert greater control over speech by sending smaller pieces of a string, such as phrases, as separate chunks. The speech synthesizer enqueues each utterance and speaks them one after the other. In theory, this should give you greater control and make the speech sound less robotic.
I can get the synthesizer to speak the chunks in order; however, there is a long delay between each one, so it sounds far worse than just sending all the text at once.
Is there any way to speed up the queue so that the utterances are spoken one after the other with no delay?
Setting the properties utt.preUtteranceDelay and utt.postUtteranceDelay to zero seconds does not seem to have any effect.
Here is my code:
phraseCounter = 0

func readParagraph(test: String) {
    // Split the text into chunks (here: on spaces) and enqueue one utterance per chunk
    let phrases = test.components(separatedBy: " ")
    for phrase in phrases {
        phraseCounter = phraseCounter + 1
        let utt = AVSpeechUtterance(string: phrase)
        let preUtteranceDelayInSecond = 0
        let postUtteranceDelayInSecond = 0
        utt.preUtteranceDelay = TimeInterval(preUtteranceDelayInSecond)
        utt.postUtteranceDelay = TimeInterval(postUtteranceDelayInSecond)
        voice.delegate = self   // voice is the AVSpeechSynthesizer instance
        if phraseCounter == 2 {
            utt.rate = 0.8
        }
        voice.speak(utt)
    }
}

Is there any way to speed up the queue so that the utterances are spoken one after the other with no delay?
As you did, the only way is to set the postUtteranceDelay and preUtteranceDelay properties to 0, which is the default value by the way.
As recommended here, I implemented the code snippet below (Xcode 10, Swift 5.0 and iOS 12.3.1) to check the impact of different utterance-delay values ⟹ 0 is the best value for improving the speed of enqueued utterances.
var synthesizer = AVSpeechSynthesizer()
var playQueue = [AVSpeechUtterance]()

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)

    // Build ten utterances with no pre/post delay, then enqueue them all at once
    for i in 1...10 {
        let stringNb = "Sentence number " + String(i) + " of the speech synthesizer."
        let utterance = AVSpeechUtterance(string: stringNb)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        utterance.pitchMultiplier = 1.0
        utterance.volume = 1.0
        utterance.postUtteranceDelay = 0.0
        utterance.preUtteranceDelay = 0.0
        playQueue.append(utterance)
    }

    synthesizer.delegate = self

    for utterance in playQueue {
        synthesizer.speak(utterance)
    }
}
If the delay is still too long with a value of 0 in your code, the incoming string may be the problem (adapt the code snippet above to your needs).
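For example, if the input is being fed to the synthesizer one word at a time, grouping it into sentences keeps the number of enqueued utterances (and therefore the number of gaps) much smaller. A minimal sketch, assuming the rest of the setup stays the same:

import AVFoundation

func readParagraph(_ text: String, with synthesizer: AVSpeechSynthesizer) {
    // Enumerate whole sentences instead of splitting on spaces,
    // so the synthesizer gets fewer, longer utterances to chain.
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: .bySentences) { sentence, _, _, _ in
        guard let sentence = sentence, !sentence.isEmpty else { return }
        let utterance = AVSpeechUtterance(string: sentence)
        utterance.preUtteranceDelay = 0
        utterance.postUtteranceDelay = 0
        synthesizer.speak(utterance)
    }
}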

Related

AVSpeechSynthesizer error: An AVSpeechUtterance shall not be enqueued twice

I am a little bit stuck.
Here is my code:
let speaker = AVSpeechSynthesizer()
var playQueue = [AVSpeechUtterance]()   // current queue
var backedQueue = [AVSpeechUtterance]() // queue backup
...

func moveBackward(_ currentUtterance: AVSpeechUtterance) {
    speaker.stopSpeaking(at: .immediate)
    let currentIndex = getCurrentIndexOfText(currentUtterance)
    // out-of-range check was deleted
    let previousElement = backedQueue[currentIndex - 1]
    playQueue.insert(previousElement, at: 0)
    for utterance in playQueue {
        speaker.speak(utterance) // error here
    }
}
According to the docs for AVSpeechSynthesizer.stopSpeaking(at:):
Stopping the synthesizer cancels any further speech; in contrast with
when the synthesizer is paused, speech cannot be resumed where it left
off. Any utterances yet to be spoken are removed from the
synthesizer’s queue.
I always get the error (AVSpeechUtterance shall not be enqueued twice) when I insert an AVSpeechUtterance into the AVSpeechSynthesizer queue. But according to the docs, the queue should have been emptied.
When stopping the player, the utterances are definitely removed from the queue.
However, in your moveBackward function, you insert another AVSpeechUtterance at playQueue[0], and that complete array represents the player queue.
Assuming the stop happens at currentIndex = 2, the following steps show that the same object is injected twice into the queue:
Copy backedQueue[1], which is a copy of playQueue[1] (same memory address).
Insert backedQueue[1] at playQueue[0] (the former playQueue[1] becomes the new playQueue[2]).
Unfortunately, as the system indicates, an AVSpeechUtterance shall not be enqueued twice, and that's exactly what you're doing here: the objects at playQueue indexes 0 and 2 have the same memory address.
The last loop, after inserting the new object at index 0, asks the speech synthesizer to enqueue all the utterances of its brand-new queue... and two of them are the same object.
Instead of copying playQueue into backedQueue (both would then contain the same memory addresses) or appending the same utterance to both arrays, I suggest creating separate utterance instances, as follows:
for i in 1...5 {
    let stringNb = "number " + String(i) + " of the speech synthesizer."
    let utterance = AVSpeechUtterance(string: stringNb)
    playQueue.append(utterance)

    // A distinct instance with the same text, so the two queues never share an object
    let utteranceBis = AVSpeechUtterance(string: stringNb)
    backedQueue.append(utteranceBis)
}
Following this piece of advice, you shouldn't run into the error AVSpeechUtterance shall not be enqueued twice.
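For instance, a moveBackward that rebuilds its queue from fresh copies instead of re-enqueuing the stopped utterances could look roughly like this (a sketch only, assuming getCurrentIndexOfText behaves as in your code):

func moveBackward(_ currentUtterance: AVSpeechUtterance) {
    speaker.stopSpeaking(at: .immediate)
    let currentIndex = getCurrentIndexOfText(currentUtterance)
    guard currentIndex > 0 else { return }

    // Re-create every utterance that still has to be spoken so that
    // no object handed to speak(_:) was ever enqueued before.
    playQueue = backedQueue[(currentIndex - 1)...].map {
        AVSpeechUtterance(string: $0.speechString)
    }
    for utterance in playQueue {
        speaker.speak(utterance)
    }
}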

AudioKit - AKOperationGenerator with AKParameters - CPU Issue

I need help with sending AKParameters to an AKOperationGenerator. My current solution uses a lot of CPU. Is there a better way to do it?
Here is my example code:
import AudioKit
import UIKit

class SynthVoice: AKNode {
    override init() {
        let synth = AKOperationGenerator { p in
            // (1) - 30% CPU
            let osc: AKOperation = AKOperation.squareWave(frequency: p[0], amplitude: p[1], pulseWidth: p[2])
            // (2) - 9% CPU
            // let osc: AKOperation = AKOperation.squareWave(frequency: 440, amplitude: 1, pulseWidth: 0.5)
            return osc
        }
        synth.parameters[0] = 880
        synth.parameters[1] = 1
        synth.parameters[2] = 0.5
        super.init()
        self.avAudioNode = synth.avAudioNode
        synth.start()
    }
}

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        let mixer: AKMixer = AKMixer([SynthVoice(), SynthVoice(), SynthVoice(), SynthVoice(), SynthVoice(), SynthVoice()])
        AudioKit.output = mixer
        AudioKit.start()
    }
}
I need a 6-voice oscillator bank with an envelope filter for each voice. I did not find any oscillator bank with an envelope filter in AudioKit, so I started to write my own via AKOperationGenerator... but the CPU usage is too high (about 100% in my project: 6 AKOperationGenerators with a PWM square oscillator and an envelope filter each, plus a lot of AKParameters that can be changed via the UI).
Thanks for any response.
I'd definitely do this at the DSP kernel level. It's C/C++, but it's really not too bad. Use one of the AKOscillatorBank-type nodes as your model, but in addition to the amplitude envelope, put in a filter envelope the same way. We're releasing an open-source synth that does this exact thing in a few months, if you can wait.

Image change with every sentence said in iOS?

I want my image view to change with every sentence in str, but it doesn't change.
let elements = ["1","2","3"]
var cx = 0
for str in components{
OutImage.image = UIImage(named: elements1[cx+1])
myUtterance = AVSpeechUtterance(string: str)
myUtterance.rate = 0.4
myUtterance.pitchMultiplier = 1.3
myUtterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
myUtterance.voice = voiceToUse
synth.speak(myUtterance)
}
You can't do that in a for loop, because speak is asynchronous, meaning that it executes in "parallel" with the rest of your code. By the time your for loop finishes, the speech synthesiser probably hasn't even started synthesising your first sentence!
The solution is to use an AVSpeechSynthesizerDelegate. In particular, you need to implement this method:
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                       didFinish utterance: AVSpeechUtterance)
This delegate method is called when the synthesiser finishes an utterance.
In that method, you can figure out which image should be shown by looking at the cx variable, then increment cx. After that, use cx to get the next utterance to be spoken and call speak. You should also remember to check whether cx has reached the end of the array!
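A minimal sketch of that approach, assuming the cx, components, elements, OutImage, and voiceToUse names from the question, and that the view controller has been set as the synthesizer's delegate (synth.delegate = self). You would speak only the first sentence yourself and let the delegate chain the rest:

extension ViewController: AVSpeechSynthesizerDelegate {

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        // One sentence has finished speaking: advance to the next one.
        cx += 1
        guard cx < components.count, cx < elements.count else { return }

        // Show the image that belongs to the next sentence...
        OutImage.image = UIImage(named: elements[cx])

        // ...and only now enqueue that sentence.
        let next = AVSpeechUtterance(string: components[cx])
        next.rate = 0.4
        next.pitchMultiplier = 1.3
        next.voice = voiceToUse
        synthesizer.speak(next)
    }
}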

Tesseract in iOS (Swift) - How to Separate Text and Numbers in UITextField?

I have a Swift-based application that currently implements the Tesseract OCR framework (similar to the form in this tutorial: http://www.raywenderlich.com/93276/implementing-tesseract-ocr-ios). So upon taking a picture and employing Tesseract, I obtain the following output in a UITextField object:
Subtotal 155.60
Tax 14.02
Total 169.82
So now I would like to separate the text from the numbers in the UITextField. I was considering using the contains function built into Swift on an array containing all values in price format ([0.01, 0.02, etc.]), but this will only return a Boolean, as outlined in this post (How to have a textfield scan for all values in an array individually in swift?). Does anyone have any suggestions on how to do this? Cheers!
Tesseract Implementation
func performImageRecognition(image: UIImage) {
    // 1
    let tesseract = G8Tesseract()
    // 2
    tesseract.language = "eng"
    // 3
    tesseract.engineMode = .TesseractCubeCombined
    // 4
    tesseract.pageSegmentationMode = .Auto
    // 5
    tesseract.maximumRecognitionTime = 60.0
    // 6
    tesseract.image = image.g8_blackAndWhite()
    tesseract.recognize()
    // 7
    textView.text = tesseract.recognizedText
    textView.editable = true
}
Sounds like you might want to look into using Regular Expressions
func seperate(text: String) -> (text: String?, value: String?) {
    // You might want to do an extra check here to ensure the whole string is valid
    // i.e., nothing in between the two parts of the string
    let textMatch = text.rangeOfString("^([A-Z]|[a-z])+", options: .RegularExpressionSearch)
    let priceMatch = text.rangeOfString("[0-9]*\\.[0-9]{2}$", options: .RegularExpressionSearch)
    // You might want to adjust the regex to handle price edge cases, such as 15 (rather than 15.00), etc.
    if let textMatch = textMatch, priceMatch = priceMatch {
        let textValue = text.substringWithRange(textMatch)
        let priceValue = text.substringWithRange(priceMatch)
        return (textValue, priceValue)
    } else {
        return (nil, nil)
    }
}

seperate("Subtotal 155.60") // -> ("Subtotal", "155.60")

AVPlayer not synchronized

I'm really out of ideas so I'll have to ask you guys again...
I'm building an iPhone application which uses three instances of AVPlayer. They all play at the same time and it's very important that they do so. I used to run this code:
CMClockRef syncTime = CMClockGetHostTimeClock();
CMTime hostTime = CMClockGetTime(syncTime);
[self.playerOne setRate:1.0f time:kCMTimeInvalid atHostTime:hostTime];
[self.playerTwo setRate:1.0f time:kCMTimeInvalid atHostTime:hostTime];
[self.playerThree setRate:1.0f time:kCMTimeInvalid atHostTime:hostTime];
which worked perfectly. But a few days ago it just stopped working: the three players are delayed by about 300-400 ms (which is way too much; anything under 100 ms would be okay). Two of these AVPlayers do some audio processing, which takes somewhat longer than a "normal" AVPlayer, but it used to work before, and the currentTime property tells me that these players are delayed, so the syncing seems to fail.
I have no idea why it stopped working; I didn't really change anything. I'm using an observer where I can read the self.playerX.currentTime property, which shows a delay of about 0.3-0.4 seconds... I already tried to resync the players when the delay is greater than 0.1 s, but the delay is still there. So I don't think the audio processing of players 1 and 2 can be responsible for the delay, since the currentTime property does know they are delayed (I hope you know what I mean). Maybe one of you knows why I'm getting such a horrible delay, or can give me another idea.
Thanks in advance!
So, I found the solution. I forgot to call [self.playerX prerollAtRate:]. I thought that if the observer reported readyToPlay, the player was "really" ready. In fact, it is not. After an AVPlayer is readyToPlay, it has to be prerolled. Once that is done, you can sync your players. The delay is now around 0.000006 seconds.
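A minimal Swift sketch of that idea (the player array and its setup are assumptions, not taken from the question's project):

import AVFoundation

func startSynchronized(_ players: [AVPlayer]) {
    let group = DispatchGroup()

    // Preroll every player first; readyToPlay alone is not enough.
    for player in players {
        // Required so that setRate(_:time:atHostTime:) can start playback immediately.
        player.automaticallyWaitsToMinimizeStalling = false
        group.enter()
        player.preroll(atRate: 1.0) { _ in group.leave() }
    }

    // Only once all players are prerolled, start them against the same host-clock timestamp.
    group.notify(queue: .main) {
        let hostTime = CMClockGetTime(CMClockGetHostTimeClock())
        for player in players {
            player.setRate(1.0, time: .invalid, atHostTime: hostTime)
        }
    }
}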
Full function to sync AVPlayers across multiple iOS devices:
private func startTribePlayer() {
    let dateFormatterGet = DateFormatter()
    dateFormatterGet.dateFormat = "yyyy-MM-dd"
    guard let refDate = dateFormatterGet.date(from: "2019-01-01") else { return }

    // Position in the track shared by all devices, derived from wall-clock time
    let tsRef = Date().timeIntervalSince(refDate)
    // currentDuration is avplayeritem.duration().seconds
    let remainder = tsRef.truncatingRemainder(dividingBy: currentDuration)
    let ratio = remainder / currentDuration
    let seekTime = ratio * currentDuration
    let bufferTime = 0.5
    let bufferSeekTime = seekTime + bufferTime

    let mulFactor = 10000.0
    let timeScale = CMTimeScale(mulFactor)
    let seekCMTime = CMTime(value: CMTimeValue(CGFloat(bufferSeekTime * mulFactor)), timescale: timeScale)

    let syncTime = CMClockGetHostTimeClock()
    let hostTime = CMClockGetTime(syncTime)

    tribeMusicPlayer?.seek(to: seekCMTime, toleranceBefore: .zero, toleranceAfter: .zero, completionHandler: { [weak self] (successSeek) in
        guard let tvc = self, tvc.tribeMusicPlayer?.currentItem?.status == .readyToPlay else { return }
        tvc.tribeMusicPlayer?.preroll(atRate: 1.0, completionHandler: { [tvc] (successPreroll) in
            tvc.tribePlayerDidPlay = true
            tvc.tribeMusicPlayer?.setRate(1.0, time: seekCMTime, atHostTime: hostTime)
        })
    })
}
