After integrating VoiceOver through accessibility labels and testing the interaction on its own, it was time to turn on the voice. Fortunately, it worked perfectly well for English text... but I wasn't so lucky with Arabic.
Apparently, VoiceOver utters "unpronounceable" when it reaches Eastern Arabic-Indic numerals:
١ ، ٢ ، ٣
Listening to each accessibility label to make sure is really inefficient, so I thought there would be some sort of query we could run against the TTS engine and write tests around that.
All I know after some research is that the underlying TTS engine is AVSpeechSynthesizer, but that doesn't seem to offer anything of the sort.
Related
I am trying to implement accessibility in my iOS project.
Is there a way to correct the pronunciation of some specific words when VoiceOver is turned on? For example, the correct pronunciation of 'speech' is [spiːtʃ], but I want VoiceOver to read every occurrence of 'speech' the same as 'speak' [spiːk] throughout my whole project.
I know one way is to set the accessibility label of any UI element whose pronunciation I want to change to 'speak'. However, some elements are dynamic. For example, we get the label text from the back end, so we never know when the label text will be 'speech'. If I get the word 'speech' from the back end, I would like to hear VoiceOver read it as 'speak'.
Therefore, I would like to change a setting for VoiceOver so that every time the word is 'speech', VoiceOver reads it as 'speak'.
Can I do it?
Short Answer
Yes, you can, but please don't.
Long Answer
Can I do it?
Yes, of course you can.
Simply fetch the data from the backend and do a find-replace on the string for any words you want spoken differently using a dictionary of words to replace, then add the new version of the string as the accessibility label.
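As a sketch of that approach (the replacement dictionary, the word list, and the helper name below are made up for illustration):

```swift
import Foundation

// Hypothetical dictionary of words to speak differently.
let pronunciationOverrides = ["speech": "speak"]

// Replace whole words only, so e.g. "speechless" is left alone.
func accessibilityText(for backendText: String) -> String {
    var result = backendText
    for (word, replacement) in pronunciationOverrides {
        result = result.replacingOccurrences(
            of: "\\b\(word)\\b",
            with: replacement,
            options: [.regularExpression, .caseInsensitive]
        )
    }
    return result
}

// label.accessibilityLabel = accessibilityText(for: fetchedText)
```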
SHOULD you do it?
Absolutely not.
Every time someone tries to "fix" pronunciation, it ends up making things a lot worse.
I don't even understand why you would want screen reader users to hear "speak" wherever everyone else sees "speech"; it does not make sense and is likely to break the meaning of sentences:
"I attended the speech given last night, it was very informative".
Would transform into:
"I attended the speak given last night, it was very informative"
Screen reader users are used to it.
A screen reader user is used to hearing things said differently (and incorrectly!); my guess is you have not been using a screen reader long enough to get used to the idiosyncrasies of screen reader speech.
Far from helping screen reader users you will actually end up making things worse.
I have only ever overridden screen reader default behaviour twice: once when a version number was being read as a date, and once in a password manager that read the password back and would try to read it as words.
Other than those very narrow examples I have not come across a reason to change things for a screen reader.
What about braille users?
You could change things because they don't sound right. But braille users also use screen readers, and changing things for them could be very confusing (as per the "speech" example above).
What about best practices?
"Give assistive technology users as similar an experience as possible to non-assistive-tech users." That is the number one guiding principle of accessibility: the second you change pronunciations and words, you potentially change the meaning of sentences and therefore offer a different experience.
Summing up
Anyway, this is turning into a rant when it isn't meant to be (my apologies, I am just trying to get the point across, as I answer similar questions quite often!). Hopefully you get the idea: leave it alone and present the same information. I haven't even covered the different speech synthesizers, language translation and more that using "unnatural" language can interfere with.
The easiest solution is to return a second string from the backend that is used just for the accessibilityLabel.
If you need a bit more control, you can pass an attributed string as the accessibilityLabel, with a number of different options for controlling pronunciation:
https://medium.com/macoclock/ios-attributed-accessibility-labels-f54b8dcbf9fa
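As a minimal sketch of that attributed-label technique (the button, text, and IPA value are illustrative), you can attach an IPA pronunciation to just the word that needs it:

```swift
import UIKit

// Attach an explicit IPA pronunciation (illustrative value) to
// one word of an accessibility label, leaving the rest untouched.
let text = "Play speech"
let attributed = NSMutableAttributedString(string: text)
if let range = text.range(of: "speech") {
    attributed.addAttribute(
        .accessibilitySpeechIPANotation,
        value: "spiːk",   // how VoiceOver should say this word
        range: NSRange(range, in: text)
    )
}

let button = UIButton(type: .system)
button.accessibilityAttributedLabel = attributed
```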
We are trying to use the built-in iOS text-to-speech tool to read Chinese words in the app.
It's good at reading texts, but it has problems reading separate words.
For example, take the character 还. It can be pronounced "hái", meaning "also, in addition", or "huán", meaning "to return".
In the phrase 我还要还钱 (wǒ hái yào huán qián) it pronounces 还 both ways (correctly).
For a standalone "还", iOS prefers to read it only as "hái". How can we make it pronounce characters the way we need (if that's possible)?
As a quick solution, you can cut the required words out of longer audio files and play them instead of using TTS.
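If pre-recorded audio is not an option, another avenue worth trying (an untested sketch; the IPA string below is illustrative and may need tuning, e.g. by generating a value in Settings > Accessibility > Speech > Pronunciations) is building the utterance from an attributed string carrying the IPA notation attribute:

```swift
import AVFoundation

// Try to force a specific reading of a standalone 还.
let text = "还"
let attributed = NSMutableAttributedString(string: text)
attributed.addAttribute(
    NSAttributedString.Key(AVSpeechSynthesisIPANotationAttribute),
    value: "xu̯an˧˥",   // illustrative IPA for "huán"; verify on device
    range: NSRange(location: 0, length: attributed.length)
)

let utterance = AVSpeechUtterance(attributedString: attributed)
utterance.voice = AVSpeechSynthesisVoice(language: "zh-CN")

let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(utterance)
```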
I'm working on an application that requires the use of a text-to-speech synthesizer. Implementing this was rather simple for iOS using AVSpeechSynthesizer. However, when it comes to customizing the synthesis, I was directed to documentation about speech synthesis for an OS X-only API, which allows you to input phoneme pairs in order to customize word pronunciation. Unfortunately, this interface is not available on iOS.
I was hoping someone might know of a similar library or plugin that might accomplish the same task. If you do, it would be much appreciated if you would lend a hand.
Thanks in advance!
AVSpeechSynthesizer on iOS is not capable (out of the box) of working with phonemes. NSSpeechSynthesizer is capable of it, but that's not available on iOS.
You can create an algorithm that produces short phonemes, but it would be incredibly difficult to make it sound good by any means.
... allows you to input phoneme pairs, in order to customize word pronunciation. Unfortunately, this interface is not available on iOS.
This kind of interface is definitely available on iOS: in your device settings (iOS 12), once the General - Accessibility - Speech - Pronunciations menu is reached:
Select the '+' icon to add a new phonetic element.
Name this new element in order to quickly find it later on.
Tap the microphone icon.
Vocalize an entire sentence or a single word.
Listen to the different system proposals.
Validate your choice with the 'OK' button or cancel to start over.
Tap the back button to confirm the newly created phonetic element.
Find all the generated elements in the Pronunciations page.
Following the steps above, you will be able to synthesize speech using phonemes for iOS.
The Mac OS speech synthesizer has a set of embedded commands that let you do things like change the pitch, speech rate, level of emphasis, etc. For example, you might use
That is [[emph +]]not[[emph -]] my dog!
To add emphasis to the word "not" in the phrase
That is not my dog!
Is there any such support in the iOS speech synthesizer? It looks like there is not, but I'm hoping against hope somebody knows of a way to do this.
As a follow-on question, is there a way to make global changes to the "stock" voice you get for a given locale? In the Siri settings you can select the language and country as well as the gender. AVSpeechSynthesizer, however, appears to give you only a single, semi-random gender for each language/country. (For example, the voice for en-US is female, en-GB is male, en-AU is female, with no apparent way to change it.)
I agree that it doesn't seem possible. From the docs, it seems Apple intends that you would create separate utterances and manually adjust the pitch/rate:
Because an utterance can control speech parameters, you can split text into sections that require different parameters. For example, you can emphasize a sentence by increasing the pitch and decreasing the rate of that utterance relative to others, or you can introduce pauses between sentences by putting each one into an utterance with a leading or trailing delay. Because the speech synthesizer sends messages to its delegate as it starts or finishes speaking an utterance, you can create an utterance for each meaningful unit in a longer text in order to be notified as its speech progresses.
I'm thinking of creating a category extension on AVSpeechUtterance to parse embedded commands (as in your example) and automatically create separate utterances. If someone else has done this, or wants to help, please let me know. I'll update here.
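As a minimal sketch of the split-utterance approach the docs describe (the pitch and rate values are illustrative), emphasis on "not" becomes its own utterance with adjusted parameters:

```swift
import AVFoundation

let synthesizer = AVSpeechSynthesizer()

// "That is NOT my dog!" - emphasize "not" by raising the pitch and
// slowing the rate of its own utterance (illustrative values).
let parts: [(text: String, pitch: Float, rate: Float)] = [
    ("That is", 1.0, AVSpeechUtteranceDefaultSpeechRate),
    ("not",     1.3, AVSpeechUtteranceDefaultSpeechRate * 0.8),
    ("my dog!", 1.0, AVSpeechUtteranceDefaultSpeechRate)
]

for part in parts {
    let utterance = AVSpeechUtterance(string: part.text)
    utterance.pitchMultiplier = part.pitch
    utterance.rate = part.rate
    synthesizer.speak(utterance)   // utterances are queued in order
}
```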
Currently I am using OpenEars to detect a phrase, and it works pretty well, although I would like to recognize all words in the English language and add them to a text field. I had two thoughts on how to approach this:
1) Somehow load the entire English dictionary into OpenEars.
(I don't think this is a good idea, because they recommend around 2-300 words or something like that.)
2) Activate the native iOS voice recognition without deploying the keyboard.
I'm leaning towards the second way, if possible, because I love the live recognition in iOS 8; it works flawlessly for me.
How do I recognize all words using one of the two methods (or a better way, if you know of one)?
Thank you
The answer is that you can't do 1) or 2), at least not the way you want to. OpenEars won't handle the whole English dictionary, and you can't get iOS voice recognition without the keyboard widget. You might want to look into Dragon Dictation, which is the speech engine that Siri uses, or SILVIA. You'll have to pay for a license, though.