Would I be able to change the speaking voice of OpenEars to another one? I don't quite like the default. Is that possible, or would I have to use another API? Sorry if this is a stupid question.
Yes, you can, but it's a somewhat complex process. OpenEars uses Flite for speech synthesis, so you need to change the voice in Flite. Flite already ships with 13 voices to choose from, and you also have the option to build a new voice.
To build a new voice you need to follow the Flite documentation. The Festvox documentation may also be useful for understanding the basics. If you just want to switch among existing voices, see the sketch below.
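For switching voices, a hedged sketch, assuming the OEFliteController and Slt voice classes shown in the OpenEars tutorial and their standard Swift bridging (which I have not verified):

```swift
import OpenEars  // module names assumed; OpenEars is an Objective-C framework
import Slt       // each Flite voice ships as its own framework

// In the tutorial the Objective-C call is
// [self.fliteController say:@"..." withVoice:self.slt];
// the Swift spelling below is an assumption based on standard bridging.
let fliteController = OEFliteController()
let voice = Slt()  // swap in a different Flite voice class to change the voice
fliteController.say("Hello from a different voice.", with: voice)
```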
I am a comic developer, and I am a deaf iOS developer. I don't want to hire a hearing person, but I know iOS has a feature that can speak a selected sentence aloud. I need to know what that feature is called, plus a code sample for speaking a String whenever it arrives, in Swift. Thanks.
I think you are asking for “speech synthesis”, also called “text-to-speech” (TTS for short).
Apple provides speech synthesis in the AVFoundation framework; a minimal sketch follows the links below.
Speech Synthesis documentation
AVSpeechSynthesizer: Making iOS Talk
Create a seamless speech experience in your apps
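Here is a minimal sketch of speaking a String with AVSpeechSynthesizer (the "en-US" voice is just an example locale):

```swift
import AVFoundation

// Keep a strong reference to the synthesizer; if it is deallocated,
// speech stops mid-utterance.
let synthesizer = AVSpeechSynthesizer()

func speak(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")  // example locale
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    synthesizer.speak(utterance)
}

speak("Hello, this sentence is spoken aloud.")
```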
Apple's accessibility system also provides speech synthesis. This feature is called VoiceOver.
UIAccessibilityReadingContent protocol documentation
Creating an Accessible Reading Experience
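For the VoiceOver route, a view can vend its text line by line by conforming to UIAccessibilityReadingContent. A hedged sketch follows; the `lines` store and the simplified geometry are illustration-only assumptions:

```swift
import UIKit

class PageView: UIView, UIAccessibilityReadingContent {
    var lines = ["First line of the page.", "Second line of the page."]

    // Map a touch point to a line index; simplified to the first line here.
    func accessibilityLineNumber(for point: CGPoint) -> Int { 0 }

    // Text VoiceOver should read for a given line.
    func accessibilityContent(forLineNumber lineNumber: Int) -> String? {
        lines.indices.contains(lineNumber) ? lines[lineNumber] : nil
    }

    // On-screen rectangle of that line; simplified to the whole view.
    func accessibilityFrame(forLineNumber lineNumber: Int) -> CGRect { bounds }

    // The full page content, read when the user swipes through it.
    func accessibilityPageContent() -> String? { lines.joined(separator: "\n") }
}
```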
I know Siri provides a limited set of intents, and that we have to fit our app into one of those domains to be able to take input from Siri.
But I would like to create my own intent so that users can access my app via Siri.
I couldn't find much support for this anywhere; any helpful pointers are welcome.
Update
Now you can create custom intents and use them with Siri Shortcuts. Here is a simplified tutorial by Ray Wenderlich
https://www.raywenderlich.com/6462-siri-shortcuts-tutorial-in-ios-12
or you may prefer
This article for beginners
For more details, see Apple's official documentation:
https://developer.apple.com/documentation/sirikit
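To get a feel for the Shortcuts flow, here is a hedged sketch of donating a shortcut with NSUserActivity on iOS 12+ (the activity type string is hypothetical and must also appear under NSUserActivityTypes in Info.plist); custom intents built from an .intentdefinition file follow a similar donate-then-handle pattern:

```swift
import UIKit
import Intents

class NotesViewController: UIViewController {
    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        // "com.example.openNotes" is a hypothetical activity type.
        let activity = NSUserActivity(activityType: "com.example.openNotes")
        activity.title = "Open My Notes"          // what Siri/Shortcuts displays
        activity.isEligibleForSearch = true
        activity.isEligibleForPrediction = true   // lets Siri suggest the action (iOS 12+)
        userActivity = activity                   // donates it to the system
    }
}
```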
Outdated Answer
You can't create a custom intent for now. Maybe in later versions they will add support for custom intents; maybe they won't. With their current approach, Apple holds all the control over intent (operation) types, data, privacy, etc. I'm not sure they will change that.
If you really need custom voice commands, you can implement them inside your application (not outside the app like Siri). There are alternatives like the following; see the sketch after this list:
Apple's AVSpeechSynthesizer
Apple's Speech
IBM's Watson
Nuance Speech Kit
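For example, here is a hedged sketch of in-app voice commands using Apple's Speech framework (authorization requests and error handling are trimmed; Info.plist needs NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription entries):

```swift
import AVFoundation
import Speech

final class CommandListener {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private let audioEngine = AVAudioEngine()
    private var task: SFSpeechRecognitionTask?

    // Streams microphone audio into the recognizer and reports each transcription.
    func start(onText: @escaping (String) -> Void) throws {
        let input = audioEngine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            self.request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
        task = recognizer.recognitionTask(with: request) { result, _ in
            if let text = result?.bestTranscription.formattedString {
                onText(text)  // match this against your own command list
            }
        }
    }
}
```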
You can create a custom intent as of iOS 12. It is quite a complicated process, but there are some tutorials available that can help you out.
I think Apple's WWDC example is a good starting point.
I'm working on an application in Swift and I was thinking about a way to get non-speech sound recognition into my project.
I mean: is there a way to take in sound input and match it against predefined sounds already incorporated in the project, so that if a match occurs, the app performs some particular action?
Is there any way to do the above? I'm thinking of breaking up the sounds and doing the checks, but I can't seem to get any further than that.
My personal experience matches matt's comment above: this requires serious technical knowledge.
There are several ways to do this, and one is typically as follows: extract some properties from the sound segment of interest (audio feature extraction), and classify the resulting audio feature vector with some kind of machine learning technique. This typically requires a training phase in which the machine learning technique is given examples of the sounds you want to recognize (your predefined sounds) so that it can build a model from that data.
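To make that concrete, here is a minimal sketch of the feature-extraction step alone, computing two classic features (RMS energy and zero-crossing rate) over a sample buffer; a real system would extract richer features such as MFCCs and feed the vector to a trained classifier:

```swift
import Foundation

// Two simple audio features over a buffer of mono samples in [-1, 1].
func extractFeatures(from samples: [Float]) -> (rms: Float, zeroCrossingRate: Float) {
    precondition(!samples.isEmpty)
    // RMS energy: overall loudness of the segment.
    let energy = samples.reduce(0) { $0 + $1 * $1 }
    let rms = (energy / Float(samples.count)).squareRoot()
    // Zero-crossing rate: a rough proxy for how noisy/high-frequency the sound is.
    var crossings = 0
    for i in 1..<samples.count where (samples[i - 1] < 0) != (samples[i] < 0) {
        crossings += 1
    }
    return (rms, Float(crossings) / Float(samples.count))
}
```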
Without knowing what types of sounds you're aiming to recognize, maybe our C/C++ SDK available here might do the trick for you: http://www.samplesumo.com/percussive-sound-recognition
There's a technical demo on that page that you can download and try with your sounds. It's a C/C++ library, and there is a Mac, Windows and iOS version, so you should be able to integrate it with a Swift app on iOS. Maybe this will allow you to do what you need?
If you want to develop your own technology, you may want to start by finding and reading some scientific papers using the keywords "sound classification", "audio recognition", "machine listening", "audio feature classification", ...
Matt,
We've been developing a bunch of cool tools to speed up iOS development, especially in Swift. One of these tools is what we call TLSphinx: a Swift wrapper around Pocketsphinx which can perform speech recognition without the audio leaving the device.
TLSphinx may help you solve your problem, since it is a fully open-source library. Search for it on GitHub ('TLSphinx'), and you can also download our iOS app ('Tryolabs Mobile Showcase') and try the module live to see how it works.
Hope it is useful!
Best!
I want to have a set of images/labels. Let's call them labelOne and labelTwo. When labelOne is present, is it possible to change it to labelTwo using voice?
Example: as I say "one", labelOne changes to labelTwo.
Is this possible with a speech kit? If so, can I host the voice recognition in the app so it doesn't have to contact a server?
Yes, it's possible using iSpeech, but iSpeech is paid. The better option is to do this via OpenEars.
Follow this official doc
Yes, you can do that. But if you only want to track a few keywords, don't go with a full API implementation; it will be time-consuming. Just use "keyword spotting". You can use an open-source package like OpenEars, which I used for the voice recognition logic in my project.
Check the following references, which I bookmarked for myself:
http://appaloud.com/top-sdks-to-voice-enable-mobile-apps-quickly/
http://www.raywenderlich.com/60870/building-ios-app-like-siri
http://www.politepix.com/openears
http://www.politepix.com/openears/tutorial
Rejecto Plugin
I hope it will help you.
I use OpenEars (http://www.politepix.com/openears/) to do my voice recognition. It works offline and supports grammar-based rules too.
Here is a video of it in action: http://youtu.be/idq7IRnrVq8
I've been researching several iOS speech recognition frameworks and have found it hard to accomplish something I would think is pretty straightforward.
I have an app that allows people to record their voices. After a recording is made, they have the option to create a text version.
Looking into the services out there (e.g., Nuance), most require you to use the microphone. OpenEars allows you to do this, but the dictionary is very limited because it is an offline solution (they recommend 300 words or fewer).
There are a few other things going on with the app that would make it very unappealing to switch from the current recording method. For what it is worth, I am using the Amazing Audio Engine framework.
Does anyone have any other suggestions for frameworks? Or is there a way to dig deeper with Nuance to transcribe a recorded file?
Thank you for your time.
For services, there are a few cloud-based hosted speech recognition services you can use. You simply post the audio file to their URL and receive the text back. Most of them don't have any constraint on the vocabulary, and you can of course choose any recording method you like.
See here: Server-side Voice Recognition. Many of them offer a free trial as well.
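The round trip is usually just an HTTP upload. A hedged sketch follows; the endpoint URL, content type, and plain-text response are placeholders, since every hosted service defines its own endpoint, authentication, and response schema:

```swift
import Foundation

func transcribe(recordingAt fileURL: URL, completion: @escaping (String?) -> Void) {
    // Hypothetical endpoint; replace with your chosen service's URL and auth.
    var request = URLRequest(url: URL(string: "https://speech.example.com/v1/recognize")!)
    request.httpMethod = "POST"
    request.setValue("audio/wav", forHTTPHeaderField: "Content-Type")
    URLSession.shared.uploadTask(with: request, fromFile: fileURL) { data, _, _ in
        // Most services return JSON; a plain-text body is assumed here.
        completion(data.flatMap { String(data: $0, encoding: .utf8) })
    }.resume()
}
```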