I am working on voice recognition to display the phonemes and, if possible, the waveform, using the built-in voice recognition on Vista and Windows 7 with Delphi 2009. Other programming languages are welcome.
To get the waveform, you need to enable retained audio using SetAudioOptions:
m_pRecoCtxt->SetAudioOptions(SPAO_RETAIN_AUDIO, NULL, NULL);
Once you have the reco result, you can get the audio using ISpRecoResult::GetAudio and do whatever processing you need.
For phonemes, I'd look at the answers on your other question here.
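Assuming you write the retained audio out to a WAV file first (the stream returned by GetAudio can be saved with SPBindToFile or similar), a simple waveform display reduces to computing peak amplitudes per window. A minimal sketch in Python rather than Delphi; `waveform_peaks` is a hypothetical helper, not part of SAPI:

```python
import struct
import wave

def waveform_peaks(path, num_buckets=100):
    """Reduce a 16-bit mono WAV to per-bucket peak amplitudes for display."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "expects 16-bit samples"
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    bucket = max(1, len(samples) // num_buckets)
    # One peak value per bucket -- enough to draw a simple amplitude envelope.
    return [max(abs(s) for s in samples[i:i + bucket])
            for i in range(0, len(samples), bucket)]
```

Each returned value is the tallest sample in its window; drawing one vertical bar per value gives the familiar waveform outline.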
I'm trying to synchronise text in my iOS app to audio that is being streamed simultaneously. The text is a very accurate transcription of the audio, previously produced manually. Is it possible to use keyword spotting or audio-to-text conversion to assist with this?
The text is already indexed in the app with the CLucene search engine, so it will be very easy to search for any string of text or words in any paragraph. Even if the audio-to-text conversion is not 100% accurate, the search engine should be able to handle it and still find the best match in the text within a couple of tries.
Could you point me to any open-source libraries for audio-to-text conversion that would assist with this? I would prefer one that can convert the streamed audio to text directly rather than relying on the microphone, as is common in speech-to-text libraries, since users may use headphones with the app and/or there may be background noise.
To recognize an audio file or audio stream on iOS you can use CMUSphinx via OpenEars.
To recognize a file you need to set pathToTestFile; see the PocketsphinxController class reference for details:
http://www.politepix.com/openears/#PocketsphinxController_Class_Reference
To recognize a stream you can feed the audio into Pocketsphinx through the Pocketsphinx API.
Since you know the text beforehand you can create a grammar from it and the recognition will be accurate.
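Since the transcript is known in advance, building that grammar can be automated. A rough sketch (`make_jsgf` is a hypothetical helper, not part of the OpenEars API) that turns the known phrases into a JSGF grammar Pocketsphinx can load:

```python
def make_jsgf(name, phrases):
    """Build a minimal JSGF grammar that accepts any of the known phrases."""
    # Each phrase becomes one alternative of a single public rule.
    alts = " | ".join("( %s )" % p.strip().lower() for p in phrases if p.strip())
    return ("#JSGF V1.0;\n"
            "grammar %s;\n"
            "public <phrase> = %s ;\n" % (name, alts))
```

Feeding the grammar only the sentences that can actually occur is what makes the recognition accurate even with noisy audio.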
From what I understand, the iPhone 5 has 3 separate microphones (see here). Is it possible to record audio from all 3 mics simultaneously? I've been digging through the documentation, and I've started digging into RemoteIO and Core Audio, but I can't figure out whether it's even possible to specify which built-in microphone to record from. Does anyone have any experience with this, or know whether it's even possible?
Thanks in advance.
EDIT: Pi's comment below is probably correct: you can select which mic to record from, but you can't record from multiple mics at the same time.
Apple's documentation says it's possible since iOS 7:
Using APIs introduced in iOS 7, developers can perform tasks such as locating a port description that represents the built-in microphone, locating specific microphones like the "front", "back" or "bottom", setting your choice of microphone as the preferred data source, setting the built-in microphone port as the preferred input and even selecting a preferred microphone polar pattern if the hardware supports it. See AVAudioSession.h.
I've been researching several iOS speech recognition frameworks and have found it hard to accomplish something I would think is pretty straightforward.
I have an app that allows people to record their voices. After a recording is made, they have the option to create a text version.
Looking into the services out there (e.g., Nuance), most require you to use the microphone. OpenEars allows you to do this, but its dictionary is very limited because it is an offline solution (they recommend 300 words or fewer).
There are a few other things going on with the app that would make it very unappealing to switch from the current recording method. For what it is worth, I am using the Amazing Audio Engine framework.
Does anyone have any other suggestions for frameworks, or is there a way to dig deeper with Nuance to transcribe a recorded file?
Thank you for your time.
For services, there are a few cloud-based hosted speech recognition services you can use: you simply post the audio file to their URL and receive the text back. Most of them don't have any constraint on the vocabulary, and you can of course choose any recording method you like.
See here: Server-side Voice Recognition. Many of them offer a free trial as well.
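The post-audio, receive-text pattern is roughly the same across these services. A hedged sketch of building such a request in Python, where the URL and header names are placeholders for whatever the specific service actually documents:

```python
import urllib.request

def build_transcribe_request(url, wav_bytes, language="en-US"):
    """Build (but do not send) an HTTP POST uploading raw WAV audio.

    The URL and header names here are placeholders: each hosted service
    documents its own endpoint, parameters, and authentication.
    """
    req = urllib.request.Request(url, data=wav_bytes, method="POST")
    req.add_header("Content-Type", "audio/wav")
    req.add_header("Accept-Language", language)
    # urllib.request.urlopen(req).read() would then return the service's
    # response, typically JSON containing the transcribed text.
    return req
```

Because the audio is posted as a file, this works with any recording method; the microphone never has to be involved on the client side.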
I am creating an iOS game in which I have to inform the user about events in the game with voice, e.g. "you have moved one piece", "two pieces", or "well done, you have performed well".
The problem is that there are a large number of these voice prompts, and if I bundle an audio file for each one, the app size will grow very large.
The second option I have discovered is to use a text-to-speech library. I have tried OpenEars, but the issue is that I want a cartoon-character or bird-like voice, which is not available in any of the open-source text-to-speech libraries I have found so far.
Can anybody suggest a better way to handle this, or a text-to-speech framework with the different voice capabilities mentioned above?
Thanks in advance.
VoiceForge offers different TTS voices.
http://www.voiceforge.com
I'm trying to save time on a project I'm beginning that will record audio from the connected audio input devices on a Windows XP or Windows 7 PC. In the past I have used the DSPACK components for Delphi 6 Pro to do video capture on a Windows PC, but I am wondering if it is the best solution for a project that only needs to record audio, not video. Is DSPACK still the way to go, or is there a faster/easier solution for recording audio via DirectShow from the PC's connected audio input devices? Sample-rate conversion and other similar features in a suggested solution would be desirable too. Links to tutorials, etc. are also appreciated.
If you are familiar with DSPack and with using DirectShow filters, then it is a good choice for the job. DSP-Worx have an audio filter (DCDSPFilter) that provides a range of effects, and they also have a DirectShow interface (LameDShowIntf) to the LAME encoder.
You may also want to consider using GMFBridge to keep latency to a minimum.
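On the sample-rate conversion point: whichever component set you choose, the conversion itself is conceptually simple. A naive linear-interpolation resampler, sketched in Python rather than Delphi purely for illustration (real converters, including the DirectShow filters, add proper low-pass filtering to avoid aliasing):

```python
def resample(samples, src_rate, dst_rate):
    """Naive linear-interpolation sample-rate conversion.

    Fine for illustration only; production converters low-pass filter
    the signal first to avoid aliasing when downsampling.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    ratio = src_rate / dst_rate
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * ratio          # fractional position in the source
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```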
http://www.mitov.com/html/audiolab.html
I think you will find these components really useful for your work.