I am newbie in image processing, so sorry if it will seem dumb to you.
I am building app that will show user the keys that he/she must press.
I want to recognize the keys on piano.
How can I achieve this? What can you advise me(tutorials, online courses)?
Try openCV tutorials, it is free and well documented. Just read all and think how can you use this feature for your task. Spend 1-2 month, and ask here again but more distinctly, like: I use binarization and morphology operations and Hough line detectors but now I have this problem...
You may use terms "binarization", "morphology operations" and "Hough line detector" as a hint.
Instead of recognizing all keys, try something simpler that wiil just do.
For instance recognizing the piano as a whole would be enough I presume. Just ask the user to press C key then each time one octave upper C or whatever. It would be easy to divide 7, the distance between the two keys, even on 2D image of 3D perspective.
Related
First, you need to know that I'm a beginner in this subject. Initially, I'm an Embedded System Developpers but I never worked with image recognition.
Let me expose my main goal:
I would like to create my own database of Logos and be able to
recognize them in a larger image. Typical application would be, for
example, to make a database of pepsi logos and coca-cola logos and
when I take a photo of a bottle of Soda, it tells me if it one of
them or an another.
So, here is my problem:
I first wanted to use the Auto ML Kit of Google. I gave him my
databases so it could train itself on it. My first attempt was to
take photos of bottle entirely and then compare. It was ok but not
too efficient. I then tried to give him only logos but after
training, it couldnt recognize anything in the whole image of a
bottle.
I think I didn't give enough images in the first case. But I'd prefer to use the second case (by giving only logo) so that the machine would search something similar in the image.
Finally, my questions:
If you've worked with ML Kit from Google, were you able to train a
model by giving images that should be recognized in a larger image?
If yes, do you have any hints to give me?
Do you know reliable software that could help me to perform tests of this kind? I thought about Azure Machine Learning Studio from
Microsoft (since I develop on Visual Studio).
In a first time, I'd like to code as few as I can just for testing. Maybe later I could try to code my own Machine Learning System but I think it's a big challenge.
I also thought that I would need to split my image in smaller image and then send each of this images into the Machine but it would be time consuming and I need a fast reaction (like < 2 seconds).
Thanks in advance for your answer. I don't need complete answer with full tutorial (Stack Overflow is not intended for that anyway ^^) but just some advices would already be good.
Have a good day!
Azure’s Custom Vision is great for this: https://www.customvision.ai
Let’s say you want to detect a pepsi logo. Upload 70 images of products with the logo on them. Use Custom Vision to draw a box around the logo for each photo. Click “train”, and you get a tensorflow model with code.
Look up any tutorial for it, it’s pretty incredible and really easy to use.
I have been investigation a little about image recognition, But Haven't found something useful for me yet.
For my Wife who Is a Dentist that has to make his Tesis, I need to make an App that recognize all teeth Shape from a picture taken at standard conditions.
I need to find the best match based on teeth pattern predefined to categorize and see which match best. I know this is a big issue and not a simple solution.
Does someone know an image recognition software that makes me able to give it a a number of patterns, and then have an image and see wich pattern fits the best? Or maybe just some orientation to start searching and working on solving this problem.
Thanks!
OpenCV would be the way to go here but let me give you the facts before you start ripping your hair out.
I don't know your development experience but although OpenCV has an iOS wrapper you will be working with low-level, C libraries. If that makes you uncomfortable then turn back now. Furthermore, you will be writing the majority of the recognition/detection algorithms yourself and it takes a lot of time and patience to get these to the point where they work to an extent. Additionally, don't expect the end product to be all that reliable, professional image recognition/manipulation tools take years of development by teams of experts in computer vision. No disrespect but something that has been hacked together over a few weeks by one person will be sub-par and lacking.
Nonetheless if you want to go ahead, you can download OpenCV for iOS here:
http://docs.opencv.org/2.4/doc/tutorials/introduction/ios_install/ios_install.html
I cant seem to see anything in my searches.
I am trying to find a Rails 4 or 5 gem/plugin that acts similar to when you hold the camera to an app store card and it converts the text on the card to an object.
in my case i want a user to submit a photo and it reads two boxes in the screen shot / photo and converts the text in the image to text to save to an appropriate field and saves on from submit..
is there anything like this out there or am i just thinking a bit too far out of the box here..??
thanks.
Hope you're ready to invest a lot of time and probably money.
What you're looking for is OCR (Optical character recognition). ABBYY makes finereader which is a business solution for OCR. It's probably the best you can hope for at this point. There probably are quite a few open source solutions out there that will work more or less well.
Check out this project (or find similar ones): https://sourceforge.net/projects/tesseract-ocr/
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have an application in mind that I want to produce. We have wall-mounted schedule boards that are divided into small rectangles using black lines on a white background. Magnetic name tags are placed into a particular partition to indicate this person is to work in that cell. This system works very well for communication among people, but I would like an automatic way of saving this schedule information into a database automatically.
I am envisioning a system where a camera is set in a fix position focusing on the schedule board. Periodically the camera will take a picture of the board. I want to write some code to decipher which name tags are in which area. This would require some OCR or symbol recognition. There are big numbers on each name tag that I will use to identify the person whose name tag it is.
I naturally go to Python when tackling a new programming problem. I found this post -> python image recognition which looks like a good place to start (with PIL and numpy).
Do you know a good way to do this?
Update: I have tried SimpleCV and it seems good for now.
This is actually a pretty hard problem, even though it looks quite simple. But you can make it a lot easier by doing some stuff to your image to make this manageable. I have the following suggestions:
Try to make it so that your camera is looking straight at the board with a reasonable lens so that there is minimal distortion of the image on the edges, and no perspective distortion.
Given that you'll be shooting the occasional image for analysis I think performance is in no way an issue, so shoot high-resolution images, with a flash or with a long exposure time (because everything you're shooting is stationary) to get the best possible picture quality.
If the number of different tags you expect is not too large you might find it easier to just try to match reference images of these tags in your image through template matching rather than going for full OCR of numbers. This is a lot easier to get working if your image is good enough. The python opencv interface is very complete.
High Performance Mark has a good comment to your question about including barcodes on the tags. I would add the option of QR codes, but that is just the same thing. Both are easy to detect and there are good libraries to help you read them.
If you decide you do need OCR, you should look into available OCR packages and not try to roll your own. Try pytesser for the tesseract engine or the OCRopus python interface.
Since you mentioned that you would like to use Python for this problem, perhaps you could take a look at SimpleCV. It will provides you an easy way to grab the image from the camera and do basic image processing.
I strongly agree with jilles de witt that OCR would be an extremely hard image analysis task to develop from scratch. Code reading would be a better option, but that also will be difficult to program and will require sophisticated or somewhat challenging imaging as others have noted. However, for this app you really do not need to implement OCR or formal bar codes, QR or other 2d codes.
Since your application is constrained to a limited number of targets, perhaps you could make your own simple code. For example, you could place 0 to 4 big dots in a 2x2 array after each person's name. This simple example code uniquely identifies 16 unique tags, and the features will be much easier to image, extract and decode than formal codes. Add a locator line if the code position is not consistent.
I am searching for an algorithm to determine whether realtime audio input matches one of 144 given (and comfortably distinct) phoneme-pairs.
Preferably the lowest level that does the job.
I'm developing radical / experimental musical training software for iPhone / iPad.
My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc in response to visual stimulus.
I have done a lot of research into this, it looks like my best bet may be to use one of the iOS Sphinx wrappers ( iPhone App › Add voice recognition? is the best source of information I have found ). However, I can't see how I would adapt such a package, can anyone with experience using one of these technologies give a basic rundown of the steps that would be required?
Would training be necessary by the user? I would have thought not, as it is such an elementary task, compared with full language models of thousands of words and far greater and more subtle phoneme base. However, it would be acceptable (not ideal) to have the user train 12 phoneme pairs: { consonant1+vowel1, consonant2+vowel2, ..., consonant12+vowel12 }. The full 144 would be too burdensome.
Is there a simpler approach? I feel like using a fully featured continuous speech recogniser is using a sledgehammer to crack a nut. It would be far more elegant to use the minimum technology that would solve the problem.
So really I'm hunting for any open source software that recognises phonemes.
PS I need a solution which runs pretty much real-time. so even as they are singing the note, firstly it blinks on to illustrate that it picked up the phoneme pair that was sung, and then it glows to illustrate whether they are singing the correct note pitch
If you are looking for a phone-level open source recogniser, then I would recommend HTK. Very good documentation is available with this tool in the form of the HTK Book. It also contains an entire chapter dedicated to building a phone level real-time speech recogniser. From your problem statement above, it seems to me like you might be able to re-work that example into your own solution. Possible pitfalls:
Since you want to do a phone level recogniser, the data needed to train the phone models would be very high. Also, your training database should be balanced in terms of distribution of the phones.
Building a speaker-independent system would require data from more than one speaker. And lots of that too.
Since this is open-source, you should also check into the licensing info for any additional details about shipping the code. A good alternative would be to use the on-phone recorder and then have the recorded waveform sent over a data channel to a server for the recognition, pretty much something like what google does.
I have a little bit of experience with this type of signal processing, and I would say that this is probably not the type of finite question that can be answered definitively.
One thing worth noting is that although you may restrict the phonemes you are interested in, the possibility space remains the same (i.e. infinite-ish). User training might help the algorithms along a bit, but useful training takes quite a bit of time and it seems you are averse to too much of that.
Using Sphinx is probably a great start on this problem. I haven't gotten very far in the library myself, but my guess is that you'll be working with its source code yourself to get exactly what you want. (Hooray for open source!)
...using a sledgehammer to crack a nut.
I wouldn't label your problem a nut, I'd say it's more like a beast. It may be a different beast than natural language speech recognition, but it is still a beast.
All the best with your problem solving.
Not sure if this would help: check out OpenEars' LanguageModelGenerator. OpenEars uses Sphinx and other libraries.
http://www.hfink.eu/matchbox
This page links to both YouTube video demo and github source.
I'm guessing it would still be a lot of work to mould it into the shape I'm after, but is also definitely does do a lot of the work.