I am creating an iOS game in which I have to inform user about events in the game with voice, that you have moved one piece, 2 pieces or well done you have performed well.
The problem is that voices are in large amount and if I replace audio files for each voice the app size will grow very large.
Second option I have discovered is to use text-to-speech library. I have tried "OpenEars" but the issue is I want voice like cartoon character or bird like which is not available in any of open source text-to-speech libraries as far as I have searched.
Can anybody suggest me what is the better way to handle it or any text-to-speech framework with different voice capabilities as mentioned in above paragraph.
Thanks in advance.
VoiceForge offers different TTS voices.
http://www.voiceforge.com
Related
I'm trying to synchronise text in my iOS app to audio that is being streamed simultaneously. The text is a very very accurate transcription of the audio that has been previously done manually. Is it possible to use keyword spotting or audio to text to assist with this?
The text is already indexed in the app with the clucene search engine, so it'll be very easy to search for any string of text/words in any paragraph in the text. Even if the audio to text conversion is not 100% accurate the search engine should be able to handle it and still find the best match in text within a couple tries.
Could you point me to any open source libraries for the audio to text conversion that would assist with this? I would prefer one that can convert the streamed audio to text directly and not rely on the microphones as is common in speech the text libraries as there may be cases where users may use headphones with the app and/or their may be background noise.
To recognize audiofile or audiostream on iOS you can use CMUSphinx with Openears.
To recognize a file you need to set pathToTestFile, see for details
http://www.politepix.com/openears/#PocketsphinxController_Class_Reference
To recognize the stream you can feed the audio into pocketsphinx through Pocketsphinx API
Since you know the text beforehand you can create a grammar from it and the recognition will be accurate.
I've been researching several iOS speech recognition frameworks and have found it hard to accomplish something I would think is pretty straightforward.
I have an app that allows people to record their voices. After a recording is made, they have the option to create a text version.
Looking into the services out there (i.e., Nuance) most require you to use the microphone. OpenEars allows you to do this, but the dictionary is so limited because it is an offline solution (they recommend 300 or less words).
There are a few other things going on with the app that would make it very unappealing to switch from the current recording method. For what it is worth, I am using the Amazing Audio Engine framework.
Anyone have any other suggestions for frameworks. Or is there a way to dig deeper with Nuance to transcribe a recorded file?
Thank you for your time.
For services, there are a few cloud based hosted speech recognition services you can use. You simply post the audio file to their URL and receive back the text. Most of them don't have any constraint on the vocabulary. You can of course choose any recording method you like.
See here: Server-side Voice Recognition . Many of them offer free trial as well.
After finally successfully finding a way to concatenate multiple voice files into one single audio file on the iPhone, I am am now trying to superimpose an audio file over the length of the voice file.
So basically I have two .m4a files:
voice.m4a which is about 10 seconds for example.
music.m4a which is about 5 seconds.
What I require is that two file be combined in such a manner that the resulting single audio file now contains the music in the background of the voice file for the length of it, so basically the resulting output should have the 10 seconds of voice and the 5seconds of music repeated twice. It is absolutely important to have a single file that contains all of this.
I am trying to get all of this done in an application on the iPhone.
Can anyone please help me out with this?
If you are looking to do that programmatically, you will need to go deeper down into CoreAudio. For a simpler solution you could use AudioQueues or for more fine grained control AudioUnits and an AUGraph. The MultiChannelMixer is the Audio Unit you are looking for. Unfortunately there is no space for an elaborate tutorial here (would take a couple of days to write just the tutorial itself), but I am hoping I could point you to the right direction.
If you decide to go down that path and want to do further audio programming then this one time simple example, then I strongly suggest you buy "Learning Core Audio, A Hands-on Guide to Audio Programming for Mac and iOS" - Chris Adamson, Kevin Avila. You can find it on Amazon, paperback or Kindle.
I have been doing some research on the best way to program a music game for iOS similar to Tap Tap Revenge, Guitar Hero, Rock Band etc. Portability is a plus.
This video explains that Open AL has some great ways of handling sounds, playing multiple sounds at once and recycling memory. I have also come across Cocoas2d Denshion for handling audio at low latency.
This article states that HTML5 is terrible for audio playback especially polyphony. He goes on to state that Phonegap's Media class works nicely and by using the native plugin model you can create a low latency solution with Phonegap
If you were to choose an API which would you choose to create a low latency audio based game and why? If you have a different suggestion than the ones mentioned please describe and why. Thank you.
The RemoteIO Audio Unit, when configured with an Audio Session requesting very short buffers, will allow the lowest latencies on current iOS devices. OpenAL appears to be built on top of it.
There are ways to address HTML5 latency, as described here and here. I suggest you try those out on your phone and see if they feel responsive enough. If not, then Novocaine is probably your best bet.
If you should decide to go the PhoneGap route then Andy Trice's Low Latency Audio Plugin should address your concerns.
Wedge.js is something I saw on Hacker News today, maybe it'll help you out
http://www.boxuk.com/labs/wedge-js
I'm developing a virtual instrument app for iOS and am trying to implement a recording function so that the app can record and playback the music the user makes with the instrument. I'm currently using the CocosDenshion sound engine (with a few of my own hacks involving fades etc) which is based on OpenAL. From my research on the net it seems I have two options:
Keep a record of the user's inputs (ie. which notes were played at what volume) so that the app can recreate the sound (but this cannot be shared/emailed).
Hack my own low-level sound engine using AudioUnits & specifically RemoteIO so that I manually mix all the sounds and populate the final output buffer by hand and hence can save said buffer to a file. This will be able to be shared by email etc.
I have implemented a RemoteIO callback for rendering the output buffer in the hope that it would give me previously played data in the buffer but alas the buffer is always all 00.
So my question is: is there an easier way to sniff/listen to what my app is sending to the speakers than my option 2 above?
Thanks in advance for your help!
I think you should use remoteIO, I had a similar project several months ago and wanted to avoid remoteIO and audio units as much as possible, but in the end, after I wrote tons of code and read lots of documentations from third party libraries (including cocosdenshion) I end up using audio units anyway. More than that, it's not that hard to set up and work with. If you however look for a library to do most of the work for you, you should look for one written a top of core audio not open al.
You might want to take a look at the AudioCopy framework. It does a lot of what you seem to be looking for, and will save you from potentially reinventing some wheels.