Currently I am using OpenEars to detect a phrase and it works pretty well, but I would like to recognize all words in the English language and add them to a text field. I had two thoughts on how to approach this:
1) Somehow load the entire English dictionary into OpenEars.
(I don't think this is a good idea, because they say it handles 2-300 words or something like that.)
2) Activate the native iOS voice recognition without deploying the keyboard.
I'm leaning towards the second way if possible, because I love the live recognition in iOS 8; it works flawlessly for me.
How do I recognize all words using one of the two methods (or a better way, if you know one)?
Thank you
The answer is that you can't do 1) or 2), at least not the way you want to. OpenEars won't handle the whole English dictionary, and you can't get iOS voice recognition without the keyboard widget. You might want to look into Dragon Dictation, which is the speech engine that Siri uses, or SILVIA. You'll have to pay for a license though.
I'm working on an application that requires a text-to-speech synthesizer. Implementing this was rather simple on iOS using AVSpeechSynthesizer. However, when it came to customizing synthesis, I was directed to documentation for an OS X-only speech synthesis API, which allows you to input phoneme pairs in order to customize word pronunciation. Unfortunately, this interface is not available on iOS.
I was hoping someone might know of a similar library or plugin that might accomplish the same task. If you do, it would be much appreciated if you would lend a hand.
Thanks in advance!
AVSpeechSynthesizer for iOS cannot work with phonemes out of the box. NSSpeechSynthesizer can, but it is not available on iOS.
You can create an algorithm that produces short phonemes, but it would be incredibly difficult to make it sound good by any means.
"... allows you to input phoneme pairs, in order to customize word pronunciation. Unfortunately, this interface is not available on iOS."
This kind of interface is definitely available on iOS. In your device settings (iOS 12), go to General - Accessibility - Speech - Pronunciations, then:
1) Select the '+' icon to add a new phonetic element.
2) Name this new element so you can quickly find it later on.
3) Tap the microphone icon.
4) Vocalize an entire sentence or a single word.
5) Listen to the different system proposals.
6) Validate your choice with the 'OK' button, or cancel to start over.
7) Tap the back button to confirm the newly created phonetic element.
8) Find all the generated elements in the Pronunciations page.
Following the steps above, you will be able to synthesize speech using phonemes for iOS.
Before I spend a lot of time (I guess) reading up on SpeechSynthesisUtterance, is this the way to go to add voice recognition to my Dart+Polymer web app?
In fact, all I need is dictation (to fill in a text box). Nothing particularly clever (on my part at least!).
Or [polymer] "is there an element for that?" ;-)
cheers
Steve
If it works in JS, you can make it work in Dart. You don't need Polymer for this, but it will work with Polymer.dart as well.
I'm working on an application in Swift and I was thinking about a way to get non-speech sound recognition into my project.
I mean, is there a way to take in sound input, match it against some predefined sounds already incorporated in the project, and perform a particular action if a match occurs?
Is there any way to do the above? I'm thinking of breaking up the sounds and doing the checks, but I can't seem to get any further than that.
My personal experience follows matt's comment above: this requires serious technical knowledge.
There are several ways to do this, and a typical one goes as follows: extract some properties from the sound segment of interest (audio feature extraction), then classify the resulting audio feature vector with some kind of machine learning technique. This typically requires a training phase in which the machine learning technique is given examples of the sounds you want to recognize (your predefined sounds), so that it can build a model from that data.
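To make the pipeline above concrete, here is a minimal sketch in Python (chosen for brevity; the same steps port to Swift). Everything in it is an assumption for illustration: the two features (RMS energy and zero-crossing rate), the nearest-centroid classifier, the 8 kHz sample rate, and the synthetic `tone` helper that stands in for real recordings. A real recognizer would need richer features and far more training data.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed rate for the synthetic clips below


def extract_features(samples):
    """Audio feature extraction: return (RMS energy, zero-crossing rate)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return (rms, crossings / (len(samples) - 1))


def train(labelled_clips):
    """Training phase: average the feature vectors of each label."""
    sums = {}
    for label, samples in labelled_clips:
        feats = extract_features(samples)
        total, count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = (tuple(t + f for t, f in zip(total, feats)), count + 1)
    return {
        label: tuple(t / n for t in total)
        for label, (total, n) in sums.items()
    }


def classify(model, samples):
    """Return the label whose centroid is closest to the clip's features."""
    feats = extract_features(samples)
    return min(model, key=lambda label: math.dist(model[label], feats))


def tone(freq_hz, seconds=0.1, amplitude=0.5):
    """Hypothetical stand-in for a recorded clip: a pure sine tone."""
    n = int(SAMPLE_RATE * seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


# The "predefined sounds": two examples per class.
model = train([
    ("low", tone(200)), ("low", tone(220)),
    ("high", tone(2000)), ("high", tone(2100)),
])
print(classify(model, tone(250)))   # prints "low"
print(classify(model, tone(1900)))  # prints "high"
```

In an app, the action to perform would hang off the label returned by `classify`, and you would probably also want a distance threshold so that sounds unlike any trained class are rejected rather than forced into the nearest one.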
Without knowing what types of sounds you want to recognize, I can only suggest that our C/C++ SDK, available here, might do the trick for you: http://www.samplesumo.com/percussive-sound-recognition
There's a technical demo on that page that you can download and try with your sounds. It's a C/C++ library, and there is a Mac, Windows and iOS version, so you should be able to integrate it with a Swift app on iOS. Maybe this will allow you to do what you need?
If you want to develop your own technology, you may want to start by finding and reading some scientific papers using the keywords "sound classification", "audio recognition", "machine listening", "audio feature classification", ...
Matt,
We've been developing a bunch of cool tools to speed up iOS development, especially in Swift. One of these tools is what we call TLSphinx: a Swift wrapper around Pocketsphinx which can perform speech recognition without the audio leaving the device.
I assume TLSphinx can help you solve your problem, since it is a totally open source library. Search for it on GitHub ('TLSphinx'); you can also download our iOS app ('Tryolabs Mobile Showcase') and try the module live to see how it works.
Hope it is useful!
Best!
We're working on an app for blind and visually impaired users. We've been experimenting with a third party library to get spoken user input and convert it to text, which we then parse as commands to control the app. The problem is that the word recognition is not very good, and certainly nowhere near as good as what iOS uses to get voice input on a text field.
I'd like to experiment with that, but our users are mostly unable to tap a text field, then hit the mic button on the popup keyboard, then hit the done button or even dismiss any of it. I'm not even sure how they can deal with a single tap on the whole screen; it might be too difficult for some. So I'd like to automate that for them, but I don't see anything in the docs that indicates it is possible. Is it even possible, and if so, what's the proper way to do it so that it passes verification?
The solution for you is to implement keyword spotting, so that speech recognition is activated by a spoken keyword instead of a button tap. After that you can record commands/text and recognize them with any service you need. It works something like the "Ok Google" activation on the Motorola X.
There are several keyword activation libraries for iOS. One possible solution is OpenEars, which is based on the open source speech recognition library CMUSphinx. If you want to use Pocketsphinx directly, you can find a keyword activation implementation in the kws branch in Subversion (branches/kws).
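The control flow of keyword activation can be sketched independently of any particular library. The Python below is only an illustration, not OpenEars or Pocketsphinx code: it assumes a hypothetical stream of already-recognized words in which `None` marks a pause, and the wake word "computer" is made up. In a real app the keyword would come from an on-device spotter and the command audio would go to whatever recognition service you choose.

```python
def commands_after_keyword(tokens, keyword="computer"):
    """Yield each command spoken after the wake keyword.

    `tokens` is a hypothetical stream of recognized words, with None
    marking a pause. Anything heard before the keyword is ignored.
    """
    listening = False
    command = []
    for token in tokens:
        if not listening:
            # Idle state: only the wake keyword has any effect.
            if token is not None and token.lower() == keyword:
                listening = True
            continue
        if token is None:
            # A pause ends the command, if one has been started.
            if command:
                yield " ".join(command)
                listening = False
                command = []
        else:
            command.append(token)
    # Flush a command that was still open when the stream ended.
    if listening and command:
        yield " ".join(command)


stream = ["hello", "there", None,
          "computer", "open", "mail", None,
          "ignored", "computer", None, "read", "messages"]
for command in commands_after_keyword(stream):
    print(command)
```

The point of the two-state design is that the user never has to touch the screen: the app stays idle until it hears the keyword, treats everything up to the next pause as one command, and then goes back to idle.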
The only way to get the iOS dictation is to sign up yourself through Nuance: http://dragonmobile.nuancemobiledeveloper.com/ - it's expensive, because it's the best. Presumably, Apple's contract prevents them from exposing an API.
The built-in iOS accessibility features allow immobilized users to access dictation (and other keyboard buttons) through tools like VoiceOver and AssistiveTouch. It may not be worth reinventing this if your users are likely to be familiar with these tools.
What is the best way to accomplish voice to text for a search box? I don't know if there is a good API to help accomplish this. Or is it better to use the built-in voice to text that Apple gave us? I am hoping to get this working on older iPhone devices.
Trying to get something done in spite of Apple's OS restrictions is never a good way to approach things in a maintainable way. There is probably a library out there to do voice-to-text, but the built-in one is device restricted because voice-to-text requires some heavy lifting.