iOS / C: Algorithm to detect phonemes - ios

I am searching for an algorithm to determine whether realtime audio input matches one of 144 given (and comfortably distinct) phoneme-pairs.
Preferably the lowest level that does the job.
I'm developing radical / experimental musical training software for iPhone / iPad.
My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc in response to visual stimulus.
I have done a lot of research into this, it looks like my best bet may be to use one of the iOS Sphinx wrappers ( iPhone App › Add voice recognition? is the best source of information I have found ). However, I can't see how I would adapt such a package, can anyone with experience using one of these technologies give a basic rundown of the steps that would be required?
Would training be necessary by the user? I would have thought not, as it is such an elementary task, compared with full language models of thousands of words and far greater and more subtle phoneme base. However, it would be acceptable (not ideal) to have the user train 12 phoneme pairs: { consonant1+vowel1, consonant2+vowel2, ..., consonant12+vowel12 }. The full 144 would be too burdensome.
Is there a simpler approach? I feel like using a fully featured continuous speech recogniser is using a sledgehammer to crack a nut. It would be far more elegant to use the minimum technology that would solve the problem.
So really I'm hunting for any open source software that recognises phonemes.
PS I need a solution which runs pretty much real-time. so even as they are singing the note, firstly it blinks on to illustrate that it picked up the phoneme pair that was sung, and then it glows to illustrate whether they are singing the correct note pitch

If you are looking for a phone-level open source recogniser, then I would recommend HTK. Very good documentation is available with this tool in the form of the HTK Book. It also contains an entire chapter dedicated to building a phone level real-time speech recogniser. From your problem statement above, it seems to me like you might be able to re-work that example into your own solution. Possible pitfalls:
Since you want to do a phone level recogniser, the data needed to train the phone models would be very high. Also, your training database should be balanced in terms of distribution of the phones.
Building a speaker-independent system would require data from more than one speaker. And lots of that too.
Since this is open-source, you should also check into the licensing info for any additional details about shipping the code. A good alternative would be to use the on-phone recorder and then have the recorded waveform sent over a data channel to a server for the recognition, pretty much something like what google does.

I have a little bit of experience with this type of signal processing, and I would say that this is probably not the type of finite question that can be answered definitively.
One thing worth noting is that although you may restrict the phonemes you are interested in, the possibility space remains the same (i.e. infinite-ish). User training might help the algorithms along a bit, but useful training takes quite a bit of time and it seems you are averse to too much of that.
Using Sphinx is probably a great start on this problem. I haven't gotten very far in the library myself, but my guess is that you'll be working with its source code yourself to get exactly what you want. (Hooray for open source!)
...using a sledgehammer to crack a nut.
I wouldn't label your problem a nut, I'd say it's more like a beast. It may be a different beast than natural language speech recognition, but it is still a beast.
All the best with your problem solving.

Not sure if this would help: check out OpenEars' LanguageModelGenerator. OpenEars uses Sphinx and other libraries.

http://www.hfink.eu/matchbox
This page links to both YouTube video demo and github source.
I'm guessing it would still be a lot of work to mould it into the shape I'm after, but is also definitely does do a lot of the work.

Related

iOs Image Recognition, Categorizing and matching pattern

I have been investigation a little about image recognition, But Haven't found something useful for me yet.
For my Wife who Is a Dentist that has to make his Tesis, I need to make an App that recognize all teeth Shape from a picture taken at standard conditions.
I need to find the best match based on teeth pattern predefined to categorize and see which match best. I know this is a big issue and not a simple solution.
Does someone know an image recognition software that makes me able to give it a a number of patterns, and then have an image and see wich pattern fits the best? Or maybe just some orientation to start searching and working on solving this problem.
Thanks!
OpenCV would be the way to go here but let me give you the facts before you start ripping your hair out.
I don't know your development experience but although OpenCV has an iOS wrapper you will be working with low-level, C libraries. If that makes you uncomfortable then turn back now. Furthermore, you will be writing the majority of the recognition/detection algorithms yourself and it takes a lot of time and patience to get these to the point where they work to an extent. Additionally, don't expect the end product to be all that reliable, professional image recognition/manipulation tools take years of development by teams of experts in computer vision. No disrespect but something that has been hacked together over a few weeks by one person will be sub-par and lacking.
Nonetheless if you want to go ahead, you can download OpenCV for iOS here:
http://docs.opencv.org/2.4/doc/tutorials/introduction/ios_install/ios_install.html

Robotics library in Forth?

I have read the documentation for the Roboforth environment from STrobotics and recognized that this a nice way for programming a robot. What I missed is a sophisticated software library with predefined motion primitives. For example, for picking up a object, for regrasping or for changing a tool.
In other programming languages like Python or C++, a library is a convenient way for programming repetitive tasks and for storing expert knowledge into machine-readable files. Also a library is good way for not-so-talented programmers to get access on higher-level-functions. In my opinion Forth is the perfect language for implementing such an API, but I didn't find information about it. Where should I search? Are there any examples out there?
I am author of RoboForth, and you make a good point. I have approached the problem of starting off new users with videos on YouTube; see How to... (playlist with 6 items, e.g "ST Robotics How-to number 1 - getting started") which is a playlist covering basics and indeed tool changing.
I never wrote any starter programs, because the physical positions (coordinates) would be different from one user to the next, however I think it can be done, and I will do it. Thanks for the heads up.

"Sound" Recognition in Swift?

I'm working on an applicaion in Swift and I was thinking about a way to get Non-Speech sound recognition in my project.
I mean is there a way in which I can take in sound inputs and match them against some predefined sounds already incorporated in the project and if a match occurs, it should do some particular action?
Is there any way to do the above? I'm thinking breaking up the sounds and doing the checks, but can't seem to get any further than that.
My personal experience follows matt's comment above: requires serious technical knowledge.
There are several ways to do this, and one is typically as follows: extract some properties from the sound segment of interest (audio feature extraction), and classify this audio feature vector with some kind of machine learning technique. This typically requires some training phase where the machine learning technique was given some examples to learn what sounds you want to recognize (your predefined sounds) so that it can build a model from that data.
Without knowing what types of sounds you're aiming for to be recognized, maybe our C/C++ SDK available here might do the trick for you: http://www.samplesumo.com/percussive-sound-recognition
There's a technical demo on that page that you can download and try with your sounds. It's a C/C++ library, and there is a Mac, Windows and iOS version, so you should be able to integrate it with a Swift app on iOS. Maybe this will allow you to do what you need?
If you want to develop your own technology, you may want to start by finding and reading some scientific papers using the keywords "sound classification", "audio recognition", "machine listening", "audio feature classification", ...
Matt,
We've been developing a bunch of cool tools to speed up iOS development, specially in Swift. One of these tools is what we called TLSphinx: a Swift wrapper around Pocketsphinx which can perform speech recognition without the audio leaving the device.
I assume TLSphinx can help you solve your problem since it is a totally open source library. Search for it on Github ('TLSphinx') and you can also download our iOS app ('Tryolabs Mobile Showcase') and try the module live to see how it works.
Hope it is useful!
Best!

shazam for voice recognition on iphone

I am trying to build an app that allows the user to record individual people speaking, and then save the recordings on the device and tag each record with the name of the person who spoke. Then there is the detection mode, in which i record someone and can tell whats his name if he is in the local database.
First of all - is this possible at all? I am very new to iOS development and not so familiar with the available APIs.
More importantly, which API should I use (ideally free) to correlate between the incoming voice and the records I have in the local db? This should behave something like Shazam, but much more simple since the database I am looking for a match against is much smaller.
If you're new to iOS development, I'd start with the core app to record the audio and let people manually choose a profile/name to attach it to and worry about the speaker recognition part later.
You obviously have two options for the recognition side of things: You can either tie in someone else's speech authentication/speaker recognition library (which will probably be in C or C++), or you can try to write your own.
How many people are going to use your app? You might be able to create something basic yourself: If it's the difference between a man and a woman you could probably figure that out by doing an FFT spectral analysis of the audio and figure out where the frequency peaks are. Obviously the frequencies used to enunciate different phonemes are going to vary somewhat, so solving the general case for two people who sound fairly similar is probably hard. You'll need to train the system with a bunch of text and build some kind of model of frequency distributions. You could try to do clustering or something, but you're going to run into a fair bit of maths fairly quickly (gaussian mixture models, et al). There are libraries/projects that'll do this. You might be able to port this from matlab, for example: https://github.com/codyaray/speaker-recognition
If you want to take something off-the-shelf, I'd go with a straight C library like mistral, as it should be relatively easy to call into from Objective-C.
The SpeakHere sample code should get you started for audio recording and playback.
Also, it may well take longer for the user to train your app to recognise them than it's worth in time-saving from just picking their name from a list. Unless you're intending their voice to be some kind of security passport type thing, it might just not be worth bothering with.

Custom robotics for building an auto CD-loading arm

Where would you recommend that I find a company to develop or buy a CD/DVD loading arm similar to: http://www.dextimus.com/
Preferably programmable via USB but if I only can get one with a serial interface that would be fine. Drivers dont matter - I can interface directly with the unit as my situation is very unique.
If you have some experience with electronics, you can give it a shot and build it yourself, like this or this.
I should add that the schematics and the source code are included, and in more details in the first project.
I suppose I might just shorten this by giving a list of resources first:
http://www.embedinc.com/ I trust this company to do good work. Expensive (actually, they are reasonably priced in the design community, but would be considered expensive by most hobbyists and individuals). Not great at people skills, but very very very good at what they do.
You should check out the various microcontroller communities and forums for hobbyists and professionals that can do this. Search for microchip, atmel, msp430, arm, powerpc, etc.
Sparkfun is a supplier to the electronics community - they have great forums where you can post your request, and you'll find people who might do it for fun with only the cost of materials. Might take longer, might not be as 'professional' or well packaged and delivered, but it might be your best low cost option.
There are many electronic design companies that could do this (for instance, I can do this sort of thing).
But there are many questions you haven't answered (and may not have researched) that could prevent success:
Is this patented?
What CD loading/unloading methods are not patented, are out of patent, or otherwise available?
What is your design goal - a one off just for you, or a device that can be built in the hundreds for industrial use, or a device meant for general office workers/consumers that is built in the millions?
Do you realize that this design qould surely cost mroe than simply buying one, if one is all you need?
As an example, assuming you don't need the nice enclosure and don't mind a 'prototype' look, just the mechanicals, electronics, and firmware design (no software on the PC) would likely be 100-250 billable hours for a design firm. At a cheap $90/hr, that's $9k to nearly $25k for one prototype. Add PC software and the nice enclosure, etc and you'll double that.
If you can find a local 'Make' group (techshop, GoTech, or similar) then you might be able to find a hobbyist that is willing to play with this idea for the cost of materials.
But if you define what your goal is, and give us an idea of your resources you may find a better answer.
-Adam
You can create a very nice simple solution using radio control servos. They come in many sizes, but even the small ones have enough torque to move a big arm to move a cd.
The real bonus with servos is that they normally have 180 degrees of rotation and internally have a variable resistor (rheostat) for positioning feedback. Positioning accuracy is normally within 1 degree of rotation which should be fine for a cd loader.
For picking up the CDs, nothing will beat a vacuum. I recommend a small battery powered vacuum cleaner. Funnel the suction into a 1/4 inch pipe. At the other end of the pipe a one inch diameter cup should provide more than enough lift from the small amount of suction.
As for the pile of blank CDs to be burnt, I would advise in moving the pile up rather than an arm down to it. probably having the top blank cd about 1/4 inch higher than the cd tray - By doing this, the arm only needs to rotate in one axis and the vacuum should be enough to suck the cd back out of the tray.
Now, for the electronics. For the servo control I suggest an rs232 serial servo controller. I've used the one from http://www.basicx.com/Products/servo/servo8t.htm as it also gives back torque information from the current draw.
For the low sample rate digital i/o, i suggest (for windows) inpout32.dll which is a dll to give you direct access to the bits of a parallel port. This will allow you to turn on the vacuum at the correct time and possibly sense when cd's have run out. Note that a parallel port can sink more current than it sources so for outputs you should connect to the 5v power line and set the output pin to 0 to turn on the output and 1 to turn it off.
The other nice option, which is very, very simple to interface and very cheap is to get hold of a picaxe from http://www.rev-ed.co.uk/picaxe/. These use a very simple programming language (a BASIC spin off) allowing you to read serial data in and control the servos and digital I/O on one chip. Last time I used one, the language was a bit simple - if statements had to jumped labels, else didn't exist.
If you do use a microcontroller and servos, it is best to use a dual voltage power supply as servos are noisy and can cause the microcontrollers to reset.
As for switching loads such as the vacuum on, you'll need to use a mosfet or (if money is no object) the simpler option, a solid state relay.
All digital inputs you use on the microcontroller should be pulled either to +V or ground with say a 5k resistor so they never float.
I cannot stress how simple and cheap the picaxes are. They have a built in interpreter so although code space is minimal on the small 8 pin units, they are programmable via a simple serial lead.
Good luck. Once you get into automation control, you'll never be able to stop. I'm in the middle of building a 3 axis CNC router so I can cut parts for other projects (I tell my girlfriend it's so she can cut out xmas decorations!).
You might want to contact Aaron Shephard about his Florian project.
I've found that a really easy board to control stepper motors or sorvos are produced by phidgets - the API is incredibly easy, and available for a vast array of platforms.

Resources