Text-to-speech conversion in Swift 4 - iOS

I'm trying to integrate text-to-speech functionality in my iOS app.
For this I'm using the AVSpeechUtterance and AVSpeechSynthesisVoice classes of the AVFoundation framework.
import AVFoundation

extension String {
    func speech(with pronunciation: String) {
        let utterance = AVSpeechUtterance(attributedString: NSAttributedString(string: self, attributes: [.accessibilitySpeechIPANotation: pronunciation]))
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        let synth = AVSpeechSynthesizer()
        DispatchQueue.main.async {
            synth.speak(utterance)
        }
    }
}
The problem I'm facing is with the pronunciation of the word "wind" as a verb and as a noun, i.e.
wind as a verb is pronounced: waɪnd
and wind as a noun is pronounced: wɪnd
The above pronunciation strings follow the International Phonetic Alphabet (IPA).
But, I'm not getting the expected results.

If you want an IPA translation of a specific spelling, I suggest using the iOS feature located at:
Settings > General > Accessibility > Speech > Pronunciations (iOS 12)
Settings > Accessibility > Spoken Content > Pronunciations (iOS 13)
Once you choose the desired result, you can use it in your code to be vocalized by the speech synthesizer.
EDIT
This solution also doesn't work for me.
I'm quite surprised by your comment, because when I follow every step of the provided link, I get the code snippet below:
override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)

    let pronunciationKey = NSAttributedString.Key(rawValue: AVSpeechSynthesisIPANotationAttribute)

    // let attrStr = NSMutableAttributedString(string: "blablablaNOUN",
    //                                         attributes: [pronunciationKey: "ˈwɪnd"])
    let attrStr = NSMutableAttributedString(string: "blablablaVERB",
                                            attributes: [pronunciationKey: "ˈwa͡ɪnd"])

    let utterance = AVSpeechUtterance(attributedString: attrStr)
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.speak(utterance)
}
... and when I launch this blank app after changing the iPhone language in the Settings > General > Language & Region menu, I get the correct pronunciations for the verb and the noun.
Copy and paste the code snippet above and test it yourself.
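A hedged side note: the working snippet above builds its attribute key from AVSpeechSynthesisIPANotationAttribute, whereas the extension in the question uses the accessibility key .accessibilitySpeechIPANotation. Rewriting the original extension with the same key as the working snippet would look like this (an untested sketch, kept close to the question's code):

import AVFoundation

extension String {
    func speech(with pronunciation: String) {
        // Same pronunciation key as in the working viewDidAppear snippet.
        let pronunciationKey = NSAttributedString.Key(rawValue: AVSpeechSynthesisIPANotationAttribute)
        let utterance = AVSpeechUtterance(attributedString:
            NSAttributedString(string: self, attributes: [pronunciationKey: pronunciation]))
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        // Note: in a real app, keep the synthesizer alive beyond this call.
        let synth = AVSpeechSynthesizer()
        synth.speak(utterance)
    }
}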

Related

AVSpeechSynthesizer under iOS 16 not working (same code fine on iOS 12 / iOS 15) on UIKit

This is what I am using for text-to-speech with AVSpeechSynthesizer, and it works on iOS 12 and iOS 15, but when I try it on iOS 16.1, no voice can be heard with the code below.
I have confirmed that Spoken Content is working (Accessibility > Spoken Content > Speak Selection > Enabled) and I can get the phone to speak out a whole screen of text.
However, it's just not working for my app.
import Foundation
import AVFoundation

struct TTS {
    // let synthesizer = AVSpeechSynthesizer()
    static func speak(messages: String) {
        let message = "Hello World"
        let synthesizer = AVSpeechSynthesizer()
        let utterance = AVSpeechUtterance(string: messages)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        utterance.postUtteranceDelay = 0.005
        synthesizer.speak(utterance)
    }
}
This has helped some, but only for the problem of AVSpeechSynthesizer not working under the Xcode simulator:
AVSpeechSynthesizer isn't working under iOS 16 anymore
The other solutions in that question don't seem to work for my case.
Some of the proposed solutions ask to move let synthesizer = AVSpeechSynthesizer() out of the function, but when I do that, Xcode complains: Instance member 'synthesizer' cannot be used on type 'TTS' (see the sketch below).
https://developer.apple.com/forums/thread/712809
I think there is possibly something wrong with my code for iOS 16.
This is for UIKit and not SwiftUI, which another SO question (Swift TTS, no audio-output) has a solution for.
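For what it's worth, that compiler error appears because a static method cannot access an instance property. If you wanted to keep the struct, making the synthesizer static as well should compile and would also keep it alive between calls; a minimal, untested sketch under that assumption:

import Foundation
import AVFoundation

struct TTS {
    // A static stored property lives for the whole app run, so the
    // synthesizer is not deallocated before it finishes speaking,
    // and a static func can access it without an instance.
    static let synthesizer = AVSpeechSynthesizer()

    static func speak(messages: String) {
        let utterance = AVSpeechUtterance(string: messages)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        synthesizer.speak(utterance)
    }
}

The fix below takes the other route and switches to a class instead.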
Changed my struct to a class since AVSpeechSynthesizer can't be a local variable.
This now works
import Foundation
import AVFoundation

class TTS {
    let synthesizer = AVSpeechSynthesizer()

    func speak(messages: String) {
        print("[TTS][Speak]\n\(messages)")
        let utterance = AVSpeechUtterance(string: messages)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        utterance.postUtteranceDelay = 0.005
        synthesizer.speak(utterance)
    }
}
How to Use
let tts = TTS()
tts.speak(messages: "Hello World")
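Note that the TTS instance itself also has to outlive the call; a minimal sketch (the view controller name is illustrative) that keeps it as a stored property in UIKit:

import UIKit

class SpeakingViewController: UIViewController {
    // Stored property, so the underlying AVSpeechSynthesizer stays
    // alive while the utterance is being spoken.
    let tts = TTS()

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        tts.speak(messages: "Hello World")
    }
}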

AVSpeechSynthesizer volume too low

I'm trying to create an app with Swift.
I correctly integrated speech-to-text and text-to-speech: my app works perfectly. You can find my project here.
After speech-to-text, the app makes an HTTP request to a server (sending the recognized text), and the response (a string, e.g. "Ok, I'll show you something") is spoken aloud by text-to-speech. But there is a big issue and I can't solve it.
When the app speaks the text, the volume is too low, as if the voice were in the background, as if there were something more important than the voice to be played (there actually isn't).
Debugging, I discovered that the issue starts when audioEngine (AVAudioEngine) is used inside the function recordAndRecognizeSpeech(). Running the app without this function and speaking some arbitrary text, it works like a charm.
So, in my opinion, when the app speaks the text it thinks the audio engine is still active, so the volume is very low.
But, before speaking the text, I call these functions (look inside the ac function, line 96):
audioEngine.stop()
audioEngine.reset()
How can I solve this issue?
EDIT:
I found a partial solution. Now before the app plays the text vocally my code is:
audioEngine.inputNode.removeTap(onBus: 0)
audioEngine.stop()
audioEngine.reset()
recognitionTask?.cancel()
isRecording = false
microphoneButton.setTitle("Avvia..", for: UIControl.State.normal)

do {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSession.Category.ambient)
    try audioSession.setActive(false, options: .notifyOthersOnDeactivation)
} catch {
    print(error)
}

make_request(msg: self.speech_result.text!)
The .setCategory call works and the volume is back to the default. But when I try to call the recordAndRecognizeSpeech() function again, the app throws this exception:
VAEInternal.h:70:_AVAE_Check: required condition is false: [AVAudioIONodeImpl.mm:910:SetOutputFormat: (IsFormatSampleRateAndChannelCountValid(hwFormat))]
This exception is caused by .setCategory(AVAudioSession.Category.ambient); it should be .playAndRecord, but with that value the volume goes back to being low.
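That assertion is typically hit when a tap is installed on the input node while the session is still in a playback-only category or is inactive, so the hardware input format is invalid. One common approach, sketched here as an assumption rather than a verified fix for this project, is to restore a record-capable category and reactivate the session before calling recordAndRecognizeSpeech() again (the helper name is illustrative):

func prepareSessionForRecording() throws {
    // Illustrative helper: restore a record-capable category and
    // reactivate the session before installing the input tap again.
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker])
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
}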
Try this one.
let speaker = AVSpeechSynthesizer()

func say(text: String, language: String) {
    // Start audio session
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSession.Category.playAndRecord)
        try audioSession.setMode(AVAudioSession.Mode.default)
        try audioSession.setActive(true)
        try AVAudioSession.sharedInstance().overrideOutputAudioPort(AVAudioSession.PortOverride.speaker)
    } catch {
        return
    }

    if speaker.isSpeaking {
        speaker.stopSpeaking(at: .immediate)
    } else {
        let myUtterance = AVSpeechUtterance(string: text)
        myUtterance.rate = AVSpeechUtteranceDefaultSpeechRate
        myUtterance.voice = AVSpeechSynthesisVoice(language: language)
        myUtterance.pitchMultiplier = 1
        myUtterance.volume = 1 // volume is clamped to the 0...1 range
        DispatchQueue.main.async {
            self.speaker.speak(myUtterance)
        }
    }
}
Try this.
Set the rate to adjust the playback speed.
var speedd = AVSpeechSynthesizer()
var voicert = AVSpeechUtterance()

voicert = AVSpeechUtterance(string: "Text to be spoken goes here")
voicert.voice = AVSpeechSynthesisVoice(language: "en-US")
voicert.rate = 0.5
speedd.speak(voicert)
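For context, rate is clamped between AVSpeechUtteranceMinimumSpeechRate and AVSpeechUtteranceMaximumSpeechRate, and 0.5 is AVSpeechUtteranceDefaultSpeechRate, so the snippet above actually speaks at the default speed. A small sketch of a genuinely faster rate:

import AVFoundation

let utterance = AVSpeechUtterance(string: "Testing a faster speaking rate")
// Values between AVSpeechUtteranceDefaultSpeechRate (0.5) and
// AVSpeechUtteranceMaximumSpeechRate (1.0) speak faster than the default.
utterance.rate = 0.65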

How to detect text (string) language in iOS?

For instance, given the following strings:
let textEN = "The quick brown fox jumps over the lazy dog"
let textES = "El zorro marrón rápido salta sobre el perro perezoso"
let textAR = "الثعلب البني السريع يقفز فوق الكلب الكسول"
let textDE = "Der schnelle braune Fuchs springt über den faulen Hund"
I want to detect the used language in each of them.
Let's assume the signature for the implemented function is:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String?
which returns an optional string in case no language is detected.
Thus the appropriate results would be:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Is there an easy approach to achieve it?
Latest versions (iOS 12+)
Briefly:
You could achieve it by using NLLanguageRecognizer, as:
import NaturalLanguage

func detectedLanguage(for string: String) -> String? {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(string)
    guard let languageCode = recognizer.dominantLanguage?.rawValue else { return nil }
    let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLanguage
}
Older versions (iOS 11+)
Briefly:
You could achieve it by using NSLinguisticTagger, as:
func detectedLanguage<T: StringProtocol>(for string: T) -> String? {
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(string)) else { return nil }
    let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLanguage
}
Details:
First of all, you should be aware that what you are asking about mainly relates to the world of Natural Language Processing (NLP).
Since NLP is more than text language detection, the rest of the answer will not contain specific NLP information.
Obviously, implementing such functionality is not that easy, especially once you start caring about the details of the process, such as splitting text into sentences and even into words, then recognizing names, punctuation, and so on. I bet you would think "what a painful process! It is not even logical to do it by myself"; fortunately, iOS does support NLP (actually, the NLP APIs are available on all Apple platforms, not only iOS) to make what you are aiming for easy to implement. The core component you would work with is NSLinguisticTagger:
Analyze natural language text to tag part of speech and lexical class,
identify names, perform lemmatization, and determine the language and
script.
NSLinguisticTagger provides a uniform interface to a variety of
natural language processing functionality with support for many
different languages and scripts. You can use this class to segment
natural language text into paragraphs, sentences, or words, and tag
information about those segments, such as part of speech, lexical
class, lemma, script, and language.
As mentioned in the class documentation, the method that you are looking for - under the Determining the Dominant Language and Orthography section - is dominantLanguage(for:):
Returns the dominant language for the specified string.
[...]
Return Value
The BCP-47 tag identifying the dominant language of the string, or the
tag "und" if a specific language cannot be determined.
You might notice that NSLinguisticTagger has existed since iOS 5. However, the dominantLanguage(for:) method is only supported on iOS 11 and above; that's because it was built on top of the Core ML framework:
[...]
Core ML is the foundation for domain-specific frameworks and
functionality. Core ML supports Vision for image analysis, Foundation
for natural language processing (for example, the NSLinguisticTagger
class), and GameplayKit for evaluating learned decision trees. Core ML
itself builds on top of low-level primitives like Accelerate and BNNS,
as well as Metal Performance Shaders.
Calling dominantLanguage(for:) and passing "The quick brown fox jumps over the lazy dog":
NSLinguisticTagger.dominantLanguage(for: "The quick brown fox jumps over the lazy dog")
returns the optional string "en". However, so far that is not the desired output; the expectation is to get "English" instead! Well, that is exactly what you get by calling the localizedString(forIdentifier:) method on the Locale structure, passing the returned language code:
Locale.current.localizedString(forIdentifier: "en") // English
Putting it all together:
As shown in the brief snippet above, the function would be:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String? {
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(forString)) else {
        return nil
    }
    let detectedLanguage = Locale.current.localizedString(forIdentifier: languageCode)
    return detectedLanguage
}
Output:
It would be as expected:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Note that:
There are still cases where you won't get a language name for a given string, like:
let textUND = "SdsOE"
let undefinedDetectedLanguage = detectedLanguage(textUND) // => Unknown language
Or it could be even nil:
let rubbish = "000747322"
let rubbishDetectedLanguage = detectedLanguage(rubbish) // => nil
Still, that is not a bad result for providing useful output...
Furthermore:
About NSLinguisticTagger:
Although I am not going to dive deep into NSLinguisticTagger usage, I would like to note that it has a couple of really useful features beyond simply detecting the language of a given text. As a pretty simple example: using the lemma scheme when enumerating tags is very helpful when working with information retrieval, since it lets you recognize the word "driving" from its base form "drive"; a sketch of this follows below.
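A minimal sketch of that lemma feature (iOS 11+; the sample sentence is illustrative):

import Foundation

let text = "She was driving home"
let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
tagger.string = text
let range = NSRange(location: 0, length: text.utf16.count)
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
// Each word is reduced to its dictionary form, e.g. "driving" -> "drive".
tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, _ in
    if let lemma = tag?.rawValue {
        print(lemma)
    }
}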
Official Resources
Apple Video Sessions:
For more about Natural Language Processing and how NSLinguisticTagger works: Natural Language Processing and your Apps.
Also, for getting familiar with Core ML:
Introducing Core ML.
Core ML in depth.
You can use NSLinguisticTagger's tag(at:) method. It supports iOS 5 and later.
func detectLanguage<T: StringProtocol>(for text: T) -> String? {
    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    tagger.string = String(text)
    guard let languageCode = tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)?.rawValue else { return nil }
    return Locale.current.localizedString(forIdentifier: languageCode)
}
detectLanguage(for: "The quick brown fox jumps over the lazy dog") // English
detectLanguage(for: "El zorro marrón rápido salta sobre el perro perezoso") // Spanish
detectLanguage(for: "الثعلب البني السريع يقفز فوق الكلب الكسول") // Arabic
detectLanguage(for: "Der schnelle braune Fuchs springt über den faulen Hund") // German
I tried NSLinguisticTagger with a short input text like "hello", and it always recognizes it as Italian.
Luckily, Apple recently added NLLanguageRecognizer, available on iOS 12, and it seems more accurate :D
import NaturalLanguage

if #available(iOS 12.0, *) {
    let languageRecognizer = NLLanguageRecognizer()
    languageRecognizer.processString(text)
    if let code = languageRecognizer.dominantLanguage?.rawValue {
        let language = Locale.current.localizedString(forIdentifier: code)
    }
}
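For short inputs like "hello", it can also help to look at the recognizer's ranked guesses rather than only the single dominant language; NLLanguageRecognizer exposes these via languageHypotheses(withMaximum:):

import NaturalLanguage

let recognizer = NLLanguageRecognizer()
recognizer.processString("hello")
// Up to three candidate languages with their probabilities,
// e.g. English 0.8, Italian 0.1 (the values shown are illustrative).
let hypotheses = recognizer.languageHypotheses(withMaximum: 3)
for (language, confidence) in hypotheses.sorted(by: { $0.value > $1.value }) {
    print(language.rawValue, confidence)
}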

How can I play sound system with text?

Do we have any way to get sound from text?
For example, we have:
let str = "Hello English"
I want the system to play that text as sound.
As Ravy Chheng answered, this uses the built-in AVFoundation framework to perform basic text-to-speech. For more information, check out Apple's documentation: https://developer.apple.com/reference/avfoundation/avspeechsynthesizer
import AVFoundation

func playSound(str: String) {
    let speechSynthesizer = AVSpeechSynthesizer()
    let speechUtterance = AVSpeechUtterance(string: str)
    speechSynthesizer.speak(speechUtterance)
}
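One caveat worth noting: speech playback is asynchronous, so a synthesizer created inside the function can in principle be deallocated before it finishes (or even starts) speaking, as discussed in the iOS 16 question above. A minimal sketch (the wrapper class name is illustrative) that keeps it alive:

import AVFoundation

final class SoundPlayer {
    // Retained as a property so the synthesizer outlives each call.
    private let speechSynthesizer = AVSpeechSynthesizer()

    func playSound(str: String) {
        let speechUtterance = AVSpeechUtterance(string: str)
        speechSynthesizer.speak(speechUtterance)
    }
}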

AVSpeechSynthesizer High Quality Voices

Is it possible to use the enhanced/high quality voices (Alex in the U.S.) with the speech synthesizer? I have downloaded the voices but find no way to tell the synthesizer to use them rather than the default voice.
Since voices are generally selected by BCP-47 codes and there is only one for US English, it appears there is no way to further differentiate voices. Am I missing something? (One would think Apple might have considered a need for different dialects, but I am not seeing it.)
TIA.
Yes, it is possible to pick from the two that seem to be available on my system, like this:
import AVFoundation

class Speak {
    let voices = AVSpeechSynthesisVoice.speechVoices()
    let voiceSynth = AVSpeechSynthesizer()
    var voiceToUse: AVSpeechSynthesisVoice?

    init() {
        for voice in voices {
            if voice.name == "Samantha (Enhanced)" && voice.quality == .enhanced {
                voiceToUse = voice
            }
        }
    }

    func sayThis(_ phrase: String) {
        let utterance = AVSpeechUtterance(string: phrase)
        utterance.voice = voiceToUse
        utterance.rate = 0.5
        voiceSynth.speak(utterance)
    }
}
Then, somewhere in your app, do something like this:
let voice = Speak()
voice.sayThis("I'm speaking better Seppo, now!")
This was a bug in previous versions of iOS: apps using the synthesizer weren't using the enhanced voices. The bug has been fixed in iOS 10, which now uses the enhanced voices.
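As an alternative to matching by name, a voice can also be requested by identifier; Alex has a dedicated constant (AVSpeechSynthesisVoiceIdentifierAlex, iOS 9+), though the initializer still returns nil if that voice has not been downloaded on the device:

import AVFoundation

// Falls back to the default en-US voice when Alex is not installed.
let alexVoice = AVSpeechSynthesisVoice(identifier: AVSpeechSynthesisVoiceIdentifierAlex)
let utterance = AVSpeechUtterance(string: "Testing the Alex voice")
utterance.voice = alexVoice ?? AVSpeechSynthesisVoice(language: "en-US")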
