How can you synthesize speech using phonemes on iOS?

I'm working on an application that requires the use of a text-to-speech synthesizer. Implementing this was rather simple on iOS using AVSpeechSynthesizer. However, when it comes to customizing the synthesis, I was directed to documentation about speech synthesis for an OS X-only API, which allows you to input phoneme pairs, in order to customize word pronunciation. Unfortunately, this interface is not available on iOS.
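For context, the straightforward part looks something like this (simplified):

```swift
import AVFoundation

// Keep a strong reference to the synthesizer so speech isn't cut off mid-utterance.
let synthesizer = AVSpeechSynthesizer()

let utterance = AVSpeechUtterance(string: "Hello, world.")
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
utterance.rate = AVSpeechUtteranceDefaultSpeechRate

synthesizer.speak(utterance)
```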
I was hoping someone might know of a similar library or plugin that might accomplish the same task. If you do, it would be much appreciated if you would lend a hand.
Thanks in advance!

AVSpeechSynthesizer on iOS is not capable (out of the box) of working with phonemes. NSSpeechSynthesizer is capable of it, but that's not available on iOS.
You could write your own algorithm that produces short phoneme-like sounds, but it would be incredibly difficult to make it sound good.
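For illustration only, that kind of approach might queue short, phoneme-like fragments as separate utterances; the fragments below are made up, and the result will sound very robotic:

```swift
import AVFoundation

let synthesizer = AVSpeechSynthesizer()

// Hypothetical phoneme-like fragments approximating the word "tomato".
let fragments = ["tuh", "may", "toh"]

for fragment in fragments {
    let utterance = AVSpeechUtterance(string: fragment)
    utterance.preUtteranceDelay = 0    // keep the gaps between fragments as small as possible
    utterance.postUtteranceDelay = 0
    synthesizer.speak(utterance)       // utterances are queued and spoken in order
}
```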

... allows you to input phoneme pairs, in order to customize word pronunciation. Unfortunately, this interface is not available on iOS.
This kind of interface is definitely available on iOS: in the device settings (iOS 12), open General > Accessibility > Speech > Pronunciations, then:
Select the '+' icon to add a new phonetic element.
Name this new element in order to quickly find it later on.
Tap the microphone icon.
Vocalize an entire sentence or a single word.
Listen to the different system proposals.
Validate your choice with the 'OK' button or cancel to start over.
Tap the back button to confirm the newly created phonetic element.
Find all the generated elements in the Pronunciations page.
Following the steps above, you will be able to synthesize speech using phonemes for iOS.
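If a programmatic route is needed as well, note that on more recent iOS versions (10 and later) an AVSpeechUtterance can be built from an attributed string whose ranges carry an IPA pronunciation; a minimal sketch, where the IPA string is only illustrative:

```swift
import AVFoundation

// Attach an IPA pronunciation to the word "tomato" in the utterance text.
let text = NSMutableAttributedString(string: "The word tomato")
let ipaKey = NSAttributedString.Key(rawValue: AVSpeechSynthesisIPANotationAttribute)
text.addAttribute(ipaKey, value: "tə.ˈme͡ɪ.do͡ʊ", range: NSRange(location: 9, length: 6))

let utterance = AVSpeechUtterance(attributedString: text)
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(utterance)
```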

Related

Where to find emojis' accessibility texts for the iOS VoiceOver feature?

I am working on an app that uses emojis on screen.
These emojis are displayed on buttons that can be pressed by the users.
To make this app compatible with "accessibility requirements", a.k.a. VoiceOver, etc., I need to get every emoji's description text, so that when a user has VoiceOver enabled, the emojis can be read aloud to them.
For example, when the user chooses a "smiley face" emoji, VoiceOver should read "smiley face" to the user. However, I cannot manually label each emoji, because there are thousands of them.
I am wondering where should I get all the emoji description texts?
Thanks!!
As you've noticed already, the Accessibility subsystem already knows how to accessibly describe an emoji if given one as part of an accessibility-oriented text (like the accessibilityLabel for a control).
However, should you ever need emoji descriptions for other purposes (perhaps some kind of accessibility accommodation that doesn't go through the OS's Accessibility system), it might help to know how to find them yourself.
You can do this with Swift String.applyingTransform or ObjC NSString.stringByApplyingTransform:. (Both of these are wrappers for CoreFoundation's CFStringTransform API, which is better documented and featured in an old NSHipster post.) Use the toUnicodeName transform to get the names for emoji and other special characters — for example, as noted in the docs, that transforms “🐶🐮” into “{DOG FACE}{COW FACE}”.
(As you might notice in the StringTransform docs and the aforelinked NSHipster article, there are lots of other fun things you can do with string transforms, too, like latinizing text from other scripts or producing the XML/HTML hex escape codes for unusual characters.)
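A quick sketch of that transform in Swift (the exact wrapper characters in the output can vary by OS version):

```swift
import Foundation

let emoji = "🐶🐮"

// Transform each character into its Unicode name, e.g. DOG FACE and COW FACE.
if let names = emoji.applyingTransform(.toUnicodeName, reverse: false) {
    print(names)   // prints something like \N{DOG FACE}\N{COW FACE}
}
```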
Forgot to post my answer the other day.
Turns out that Apple has already handled this in the framework.
All we need to do is set the control's accessibilityLabel to the emoji itself. Then it all reads out correctly, such as "smiley face", when the VoiceOver feature is turned on.
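In code, that amounts to something like this (a minimal sketch; the button is just an example):

```swift
import UIKit

let emojiButton = UIButton(type: .system)
emojiButton.setTitle("😀", for: .normal)

// Hand VoiceOver the emoji itself; the system supplies the spoken description.
emojiButton.accessibilityLabel = "😀"
```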
Awesome!

Access iOS Voice recognition in app

Currently I am using OpenEars to detect a phrase and it works pretty well, although I would like to recognize all words in the English language and add them to a text field. So I had two thoughts on how to approach this:
1) Somehow load the entire English dictionary into OpenEars.
(I don't think this is a good idea because they say it is meant for something like 2-300 words.)
2) Activate the native iOS voice recognition without deploying the keyboard.
I'm leaning towards the second way if possible because I love the live recognition in iOS 8; it works flawlessly for me.
How do I recognize all words using one of the two methods (or a better way if you know of one)?
Thank you
The answer is that you can't do 1) or 2), at least not the way you want to. OpenEars won't handle the whole English dictionary, and you can't get iOS voice recognition without the keyboard widget. You might want to look into Dragon Dictation, which is the speech engine that Siri uses, or SILVIA. You'll have to pay for a license though.
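For what it's worth, later iOS versions (10 and up) added Apple's Speech framework, which does expose open-ended dictation to apps without the keyboard; a minimal sketch using a recorded file, where the file name is a placeholder and the speech-recognition usage description must be added to Info.plist:

```swift
import Speech

// Request permission before recognizing anything.
SFSpeechRecognizer.requestAuthorization { status in
    guard status == .authorized else { return }

    let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    // "recording.m4a" is a placeholder for audio captured elsewhere in the app.
    guard let url = Bundle.main.url(forResource: "recording", withExtension: "m4a") else { return }
    let request = SFSpeechURLRecognitionRequest(url: url)

    recognizer?.recognitionTask(with: request) { result, _ in
        if let result = result, result.isFinal {
            print(result.bestTranscription.formattedString)
        }
    }
}
```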

Speech input for visually impaired users without the need to tap the screen

We're working on an app for blind and visually impaired users. We've been experimenting with a third-party library to get spoken user input and convert it to text, which we then parse as commands to control the app. The problem is that the word recognition is not very good, and certainly nowhere near as good as what iOS uses to get voice input into a text field.
I'd like to experiment with that, but our users are mostly unable to tap a text field, then hit the mic button on the popup keyboard, then hit the done button or even dismiss any of it. I'm not even sure how they would deal with a single tap anywhere on the screen; it might be too difficult for some. So I'd like to automate that for them, but I don't see anything in the docs that indicates it is possible. Is it even possible, and if so, what's the proper way to do it so that it passes review?
The solution for you is to implement keyword spotting, so that speech recognition is activated by a spoken keyword instead of a button tap. After that you can record commands/text and recognize them with any service you need. Something like the "OK Google" activation on the Moto X.
There are several keyword-activation libraries for iOS; one possible solution is OpenEars, which is based on the open-source speech recognition library CMUSphinx. If you want to use Pocketsphinx directly, you can find the keyword activation implementation in the kws branch in Subversion (branches/kws).
The only way to get iOS-grade dictation is to sign up yourself through Nuance: http://dragonmobile.nuancemobiledeveloper.com/. It's expensive, because it's the best. Presumably, Apple's contract prevents them from exposing an API.
The built in iOS accessibility features allow immobilized users to access dictation (and other keyboard buttons) through tools like VoiceOver and Assistive Touch. It may not be worth reinventing this if your users might be familiar with these tools.

Use audio files or text to speech for iOS application

I am creating an iOS game in which I have to inform the user about events in the game with voice, such as "you have moved one piece", "you have moved two pieces", or "well done, you have performed well".
The problem is that there are a large number of these voice prompts, and if I ship an audio file for each one, the app size will grow very large.
The second option I have discovered is to use a text-to-speech library. I have tried "OpenEars", but the issue is that I want a voice like a cartoon character or a bird, which is not available in any of the open-source text-to-speech libraries as far as I have searched.
Can anybody suggest a better way to handle this, or any text-to-speech framework with different voice capabilities as mentioned in the paragraph above?
Thanks in advance.
VoiceForge offers different TTS voices.
http://www.voiceforge.com
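Another option worth trying before paying for a service: the built-in AVSpeechSynthesizer lets you raise the pitch and adjust the rate, which can push the default voices toward a more cartoon-like delivery; a minimal sketch, with the wording purely illustrative:

```swift
import AVFoundation

let synthesizer = AVSpeechSynthesizer()

let utterance = AVSpeechUtterance(string: "Well done, you have moved two pieces!")
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
utterance.pitchMultiplier = 1.8    // allowed range is 0.5 to 2.0; higher sounds squeakier
utterance.rate = AVSpeechUtteranceDefaultSpeechRate * 1.1

synthesizer.speak(utterance)
```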

Vista Speech Recognition in Delphi

I would like to be able to dictate into my Delphi application using Microsoft Vista's speech recognition. However, when attempting to dictate into a TMemo, it simply does not work. I noticed that Firefox 3.0 has the same issue, and after the Firefox developers contacted Microsoft about the problem, they were told that they needed to implement the Text Services Framework in their application.
I am wondering if there is any way to implement this in Delphi so that I can dictate into a TMemo or a TRichEdit. Searching Google for a solution didn't return any relevant results. Where would I start in finding a solution?
Edit: I found out that there is a way to enable speech recognition in all programs, even those that don't support it, simply by going to the options of Windows Speech Recognition and selecting Enable dictation everywhere. However when you use this to dictate into an editbox that doesn't use the Text Services Framework, it always pops up the Alternates Panel which displays the prompt Say the number next to the item you want, followed by OK. While this may work for short sentences, it does not have many of the useful features such as correcting or deleting a word. So I am still trying to figure out how to enable speech recognition without relying on the Enable dictation everywhere option.
Text to speech in Vista
Just tested it with a button like the demo code on that page, works fine in Vista SP1/D2007. (funny, I clicked the 'Vista' tag-link and found it there...)
