How to compare a webcam image/video to a specific image/video? - parsing

I am just starting out in computer programming and am mostly fluent in basic Java. I have an idea for an ASL (American Sign Language) to English translator, and my initial problem is how to identify hand movements from a webcam and then compare them to signs that are already stored as an image or another video. If the problem is a bit too advanced for me, then please list any major concepts that I can learn. Please and thank you.

You clearly have a challenging problem ^^. Trying to explain everything you need to solve it would be very hard, mainly because there are many ways to do this. I advise you to read a good book about image processing (Gonzalez's book is a nice choice) and the OpenCV documentation (it is implemented in C and C++ and has Python bindings, but it is a library that implements a lot of image processing techniques). Maybe you should focus your study on feature detection, motion analysis and object tracking. As sign language uses not just hand signs (static state) but also hand movements (dynamic state) to express something, object tracking may be a good way to describe the signs. I hope this information helps you, at least a little -^.^- Bye bye.
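To make the motion analysis idea a bit more concrete, here is a minimal sketch of frame differencing in plain C++ (standard library only, no OpenCV). It assumes you already have two consecutive grayscale frames in memory. The `Frame` and `Motion` types, the `detectMotion` name and the threshold value are all made up for illustration; a real system would need much more than this (noise filtering, hand segmentation, tracking over many frames).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical type: a grayscale frame stored row by row, one byte per pixel.
struct Frame {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // size == width * height
};

// Hypothetical result type: how many pixels changed and where, on average.
struct Motion {
    int changedPixels;
    double centerX;  // only meaningful when changedPixels > 0
    double centerY;
};

// Frame differencing: a pixel counts as "moving" when its brightness changed
// by more than `threshold` between the previous and the current frame.
Motion detectMotion(const Frame& previous, const Frame& current, int threshold) {
    Motion result{0, 0.0, 0.0};
    if (previous.width != current.width || previous.height != current.height) {
        return result;  // frames of different size cannot be compared
    }
    long long sumX = 0, sumY = 0;
    for (int y = 0; y < current.height; ++y) {
        for (int x = 0; x < current.width; ++x) {
            std::size_t i = static_cast<std::size_t>(y) * current.width + x;
            int diff = std::abs(int(current.pixels[i]) - int(previous.pixels[i]));
            if (diff > threshold) {
                ++result.changedPixels;
                sumX += x;
                sumY += y;
            }
        }
    }
    // The centroid of the changed pixels is a crude estimate of where the
    // movement happened; following it over time is a very simple tracker.
    if (result.changedPixels > 0) {
        result.centerX = double(sumX) / result.changedPixels;
        result.centerY = double(sumY) / result.changedPixels;
    }
    return result;
}
```

Calling `detectMotion(previousFrame, currentFrame, 30)` on each new webcam frame gives a rough position of the moving region, and the sequence of those positions is one possible way to describe the dynamic part of a sign.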

Look at OpenCV. They have a lot of libraries that you might find handy.
http://opencv.willowgarage.com/wiki/

Related

fuzzy logic for RPL objective function experiment

I intend to develop a new OF (objective function) for RPL in the Cooja simulator, but I can't find any tutorial or example on how to do so!
Also, there are hundreds of published papers on this topic, yet no guidance on how to conduct your own experiments.
Is there any help or tutorial I can follow?
Furthermore, I need to know which tools are needed to do so, such as MATLAB, Python or C++ libraries.
I am very confused and cannot figure out where to start.
Please help. I have been searching and reading a lot, but have found nothing except journal papers that discuss things theoretically.

iOS Image Recognition, Categorizing and matching pattern

I have been investigating image recognition a little, but haven't found anything useful for me yet.
For my wife, who is a dentist and has to write her thesis, I need to make an app that recognizes all tooth shapes from a picture taken under standard conditions.
I need to find the best match against predefined tooth patterns in order to categorize the teeth. I know this is a big problem without a simple solution.
Does someone know of image recognition software that lets me give it a number of patterns and then, given an image, see which pattern fits best? Or maybe just some pointers on where to start searching and working on this problem.
Thanks!
OpenCV would be the way to go here but let me give you the facts before you start ripping your hair out.
I don't know your development experience, but although OpenCV has an iOS wrapper you will be working with low-level C libraries. If that makes you uncomfortable then turn back now. Furthermore, you will be writing the majority of the recognition/detection algorithms yourself, and it takes a lot of time and patience to get these to the point where they work to an extent. Additionally, don't expect the end product to be all that reliable; professional image recognition/manipulation tools take years of development by teams of experts in computer vision. No disrespect, but something that has been hacked together over a few weeks by one person will be sub-par and lacking.
Nonetheless if you want to go ahead, you can download OpenCV for iOS here:
http://docs.opencv.org/2.4/doc/tutorials/introduction/ios_install/ios_install.html
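To give an idea of what "see which pattern fits best" can look like at the lowest level, here is a minimal sketch in plain C++ (standard library only, not the OpenCV API). It assumes the photo and every predefined pattern have already been cropped and scaled to the same size and flattened into arrays of grayscale intensities. The names `normalizedCorrelation` and `bestPattern` are made up for illustration, and normalized correlation is just one simple similarity measure, not necessarily the right one for teeth.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation between two equal-sized grayscale images given as
// flat arrays of intensities. The result lies in [-1, 1]; 1 means the images
// differ only by brightness and contrast. Returns 0 when the sizes differ or
// when either image is completely flat.
double normalizedCorrelation(const std::vector<double>& a,
                             const std::vector<double>& b) {
    if (a.size() != b.size() || a.empty()) return 0.0;
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= a.size();
    meanB /= b.size();
    double num = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double da = a[i] - meanA;
        double db = b[i] - meanB;
        num += da * db;
        varA += da * da;
        varB += db * db;
    }
    if (varA == 0.0 || varB == 0.0) return 0.0;
    return num / std::sqrt(varA * varB);
}

// Index of the pattern that correlates best with the image, or -1 when no
// pattern has the same size as the image.
int bestPattern(const std::vector<std::vector<double>>& patterns,
                const std::vector<double>& image) {
    int best = -1;
    double bestScore = -2.0;  // lower than any possible correlation
    for (std::size_t p = 0; p < patterns.size(); ++p) {
        if (patterns[p].size() != image.size()) continue;  // skip mismatched sizes
        double score = normalizedCorrelation(patterns[p], image);
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(p);
        }
    }
    return best;
}
```

This only works when the pictures really are taken under standard conditions (same framing, scale and orientation). Anything beyond that is where the real work on detection and alignment begins.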

Robotics library in Forth?

I have read the documentation for the RoboForth environment from ST Robotics and found it to be a nice way of programming a robot. What I missed is a sophisticated software library with predefined motion primitives, for example for picking up an object, for regrasping or for changing a tool.
In other programming languages like Python or C++, a library is a convenient way of programming repetitive tasks and of storing expert knowledge in machine-readable files. A library is also a good way for not-so-talented programmers to get access to higher-level functions. In my opinion Forth is the perfect language for implementing such an API, but I didn't find any information about it. Where should I search? Are there any examples out there?
I am the author of RoboForth, and you make a good point. I have approached the problem of starting off new users with videos on YouTube; see How to... (playlist with 6 items, e.g. "ST Robotics How-to number 1 - getting started"), which is a playlist covering the basics and indeed tool changing.
I never wrote any starter programs, because the physical positions (coordinates) would be different from one user to the next. However, I think it can be done, and I will do it. Thanks for the heads up.

OpenCV and Computer Vision, where do we stand now?

I want to do a project involving Computer Vision. Mostly object detection/identification. After some research, I keep coming back to OpenCV. But all of the tutorials are from 2008 (I guess it was big for a bit then). It doesn't compile in Python on the mac apparently. I'm using the C++ framework right out of Xcode, but none of the tutorials work as they're outdated and the documentation sucks from what I can parse.
Is there a better solution for what I'm doing, and does anyone have any suggestions for learning how to use OpenCV?
Thanks
I have had similar problems getting started with OpenCV and from my experience this is actually the biggest hurdle to learning it. Here is what worked for me:
This book: "OpenCV 2 Computer Vision Application Programming Cookbook." It's the most up-to-date book and has examples on how to solve different Computer Vision problems (You can see the table of contents on Amazon with "Look Inside!"). It really helped ease me into OpenCV and get comfortable with how the library works.
Like others have said, the samples are very helpful. For things that the book skips or covers only briefly you can usually find more detailed examples when looking through the samples. You can also find different ways of solving the same problem between the book and the samples. For example, for finding keypoints/features, the book shows an example using FAST features:
vector<KeyPoint> keypoints;
FastFeatureDetector fast(40);
fast.detect(image, keypoints);
But in the samples you will find a much more flexible way (if you want to have the option of choosing which keypoint detection algorithm to use):
vector<KeyPoint> keypoints;
Ptr<FeatureDetector> featureDetector = FeatureDetector::create("FAST");
featureDetector->detect(image, keypoints);
From my experience things eventually start to click and for more specific questions you start finding up-to-date information on blogs or right here on StackOverflow.
Let me add a couple of things. First, I can assure you that the Python bindings to OpenCV work on a Mac. I use them every day.
Many people like OpenCV for many reasons:
The license is good, friendly to integration into commercial products, etc.
It is quite good from a technical stand point. It gives you a reference implementation of state of the art algorithms.
It tends to be quite fast compared to the alternatives (Matlab I'm looking at you).
Like everything in life, it is not perfect:
It is a good example of a software library that is a moving target. I have a 300 line python program that uses OpenCV and every few months when a new version of OpenCV is released I have to change it to adapt to the new function names/calling conventions, etc. The library does advance, a lot, however it is a pain to have to change the same program 3 times per year.
It has a learning curve, like computer vision itself, it is quite technical and not easy to learn.
There are alternatives (with other pros and cons); MATLAB with the Image Processing Toolbox is one such example.
The simplest answer that comes to mind is to read the example code with a bit of understanding, and to try out whether your ideas work. The API does change, and most of the tutorials were written for the first versions of OpenCV, and it looks like nobody bothered to rewrite them. Nevertheless, the core ideas behind it are not changing. So if you find a tutorial answering your questions but written for the old API, just look in the documentation for modern replacements of the functions it uses. It's not easy or quick, but it looks like it works. If you use the newest version (currently 2.3), I suggest using both the 2.1 documentation and the 2.3 docs + tutorials. You should also look into the samples, which should have been installed alongside the library. They contain lots of hints about how to use certain structures, and tricks that aren't mentioned in the documentation. Finally, don't be afraid to look inside the code of the library itself (if you compiled it on your own). Unfortunately, that's the only source I know of for checking, for example, what code corresponds to which type of Mat object.

iOS / C: Algorithm to detect phonemes

I am searching for an algorithm to determine whether realtime audio input matches one of 144 given (and comfortably distinct) phoneme-pairs.
Preferably the lowest level that does the job.
I'm developing radical / experimental musical training software for iPhone / iPad.
My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc in response to visual stimulus.
I have done a lot of research into this, it looks like my best bet may be to use one of the iOS Sphinx wrappers ( iPhone App › Add voice recognition? is the best source of information I have found ). However, I can't see how I would adapt such a package, can anyone with experience using one of these technologies give a basic rundown of the steps that would be required?
Would training be necessary by the user? I would have thought not, as it is such an elementary task, compared with full language models of thousands of words and far greater and more subtle phoneme base. However, it would be acceptable (not ideal) to have the user train 12 phoneme pairs: { consonant1+vowel1, consonant2+vowel2, ..., consonant12+vowel12 }. The full 144 would be too burdensome.
Is there a simpler approach? I feel like using a fully featured continuous speech recogniser is using a sledgehammer to crack a nut. It would be far more elegant to use the minimum technology that would solve the problem.
So really I'm hunting for any open source software that recognises phonemes.
PS I need a solution which runs pretty much in real time, so that even as they are singing the note, it first blinks on to show that it picked up the phoneme pair that was sung, and then glows to show whether they are singing the correct pitch.
If you are looking for a phone-level open source recogniser, then I would recommend HTK. Very good documentation is available with this tool in the form of the HTK Book. It also contains an entire chapter dedicated to building a phone level real-time speech recogniser. From your problem statement above, it seems to me like you might be able to re-work that example into your own solution. Possible pitfalls:
Since you want to build a phone-level recogniser, the amount of data needed to train the phone models would be very large. Also, your training database should be balanced in terms of the distribution of the phones.
Building a speaker-independent system would require data from more than one speaker. And lots of that too.
Since this is open source, you should also check the licensing information for any additional details about shipping the code. A good alternative would be to use the on-phone recorder and then have the recorded waveform sent over a data channel to a server for the recognition, pretty much like what Google does.
I have a little bit of experience with this type of signal processing, and I would say that this is probably not the type of finite question that can be answered definitively.
One thing worth noting is that although you may restrict the phonemes you are interested in, the possibility space remains the same (i.e. infinite-ish). User training might help the algorithms along a bit, but useful training takes quite a bit of time and it seems you are averse to too much of that.
Using Sphinx is probably a great start on this problem. I haven't gotten very far in the library myself, but my guess is that you'll be working with its source code yourself to get exactly what you want. (Hooray for open source!)
...using a sledgehammer to crack a nut.
I wouldn't label your problem a nut, I'd say it's more like a beast. It may be a different beast than natural language speech recognition, but it is still a beast.
All the best with your problem solving.
Not sure if this would help: check out OpenEars' LanguageModelGenerator. OpenEars uses Sphinx and other libraries.
http://www.hfink.eu/matchbox
This page links to both YouTube video demo and github source.
I'm guessing it would still be a lot of work to mould it into the shape I'm after, but it definitely does do a lot of the work.

Resources