Auto Text Recognition (OCR) from Image - image-processing

I want to recognize nutrient information from package labels (see the sample nutrient label image). This is one package image; different brands may style/lay out their labels differently. But I know some things for sure: the layout will be roughly tabular, with certain keywords in the heading like 'Nutrient', and the table content will contain certain common words like Energy, Fat, etc. I want to extract these values as text and save them into my DB.
The sample image is part of a bigger problem: finding the contour/box that is likely to contain this 'Nutrient Label' section.
As I understand it, there are 3 broad steps (a rough sketch of the pipeline follows the list):
1. Scan the input image (product front/back/side image) to look for the best contour that could be my target contour containing the nutrient information.
2. Go to this contour and perform OCR (ideally retaining the layout information rather than outputting everything on one line).
3. Scan the text and look for the needed info.
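To make it concrete, here is the rough kind of pipeline I am imagining. This is only a sketch: it assumes OpenCV and pytesseract, and the keyword list and regex are placeholders I made up.

    import re

    import cv2
    import pytesseract  # requires the Tesseract binary to be installed separately

    def extract_nutrients(image_path):
        img = cv2.imread(image_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Step 1: look for large contours that could be the nutrient label box
        # (OpenCV 4.x return signature assumed for findContours).
        thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                       cv2.THRESH_BINARY_INV, 21, 10)
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        candidates = sorted(contours, key=cv2.contourArea, reverse=True)[:5]

        results = {}
        for c in candidates:
            x, y, w, h = cv2.boundingRect(c)
            roi = gray[y:y + h, x:x + w]

            # Step 2: OCR the candidate region; --psm 6 keeps a rough line layout.
            text = pytesseract.image_to_string(roi, config="--psm 6")

            # Step 3: keep the region only if it mentions the expected keywords,
            # then pull out "Energy  250 kcal"-style lines with a simple regex.
            if "nutrient" in text.lower() or "energy" in text.lower():
                for line in text.splitlines():
                    m = re.match(r"\s*([A-Za-z ]+?)\s+([\d.]+)\s*(kcal|kj|g|mg)?", line, re.I)
                    if m:
                        results[m.group(1).strip()] = (m.group(2), m.group(3))
                break
        return results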
I am a beginner in image recognition, so it would be a great help if I could get:
1. Feedback on my approach. For instance, should I look for text in the image, or gather similar images, train a model and then do classification (similar to performing face recognition)?
2. Pointers, if someone has already solved this problem (there is no fun in reinventing the wheel).
3. If it's a research problem, relevant code/libraries/pointers/similar SO questions that I could refer to.
It would be highly appreciated if the answers are not too general (e.g. "perform feature extraction"; I would have no clue what feature extraction is). A sample code pointer would be awesome.
I thank you for your time and help.
thanks
Chahat

1. You would need to collect at least 200-300 images for sufficient training.
2/3. I did solve this problem, but it was done with a non-free solution, so I'm not in a position to give directions here.

Related

How to recognize or match two images?

I have one image stored in my bundle or in the application.
Now I want to scan images through the camera and compare them with my locally stored image. When the image is matched, I want to play a video, and if the user moves the camera away from that particular image, I want to stop the video.
For that I have tried the Wikitude SDK for iOS, but it is not working properly; it keeps crashing because of memory issues or other reasons.
Other things that came to mind are Core ML and ARKit, but Core ML detects an image's properties like name, type, colors, etc., whereas I want to match the image itself. ARKit does not support all devices and iOS versions, and I have no idea whether image matching as per my requirement is even possible with it.
If anybody has any idea how to achieve this, please share. Every bit of help will be appreciated. Thanks :)
The easiest way is ARKit's image detection. You know the device limitations it comes with, but the results it gives are good and it is really easy to implement. Here is an example.
Next is Core ML, which is the hardest way. You need to understand machine learning, at least briefly. Then comes the tough part: training with your dataset. The biggest drawback is that you have a single image, so I would discard this method.
Finally, the middle-ground solution is to use OpenCV. It might be hard, but it suits your need. You can find different feature-matching methods to locate your image in the camera feed (example here; a rough sketch of the idea follows below). You can use Objective-C++ to write the C++ code for iOS.
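Here is roughly what the OpenCV feature-matching route looks like as a Python sketch (the same calls exist in the C++ API you would wrap with Objective-C++; the match-count threshold of 25 is just an assumption to be tuned):

    import cv2

    # Reference image shipped with the app, and a frame grabbed from the camera feed.
    reference = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(reference, None)
    kp2, des2 = orb.detectAndCompute(frame, None)

    # Hamming distance is the right metric for ORB's binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des1, des2, k=2)

    # Lowe's ratio test keeps only distinctive matches.
    good = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])

    # Simple heuristic: enough good matches means the reference is in the frame
    # (start the video), too few means it is not (stop the video).
    print("image present" if len(good) > 25 else "image absent")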
Your task is image similarity; you can do it simply and with more reliable results using machine learning. Since your task involves camera scanning, the better option is Core ML. You can refer to this link by Apple for image similarity. You can improve your results by training with your own datasets. If any more clarification is needed, comment.
Another approach is to use a so-called "siamese network", which really means that you run both images through a model such as Inception-v3 or MobileNet and compare their outputs.
However, these models usually give a classification output, i.e. "this is a cat". But if you remove that classification layer from the model, it gives an output that is just a bunch of numbers that describe what sort of things are in the image but in a very abstract sense.
If these numbers for two images are very similar -- if the "distance" between them is very small -- then the two images are very similar too.
So you can take an existing Core ML model, remove the classification layer, run it twice (once on each image), which gives you two sets of numbers, and then compute the distance between these numbers. If this distance is lower than some kind of threshold, then the images are similar enough.
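The answer above is framed in Core ML terms, but the idea is framework-agnostic. Here is a rough Python illustration, with torchvision's MobileNetV2 standing in for the Core ML model; the 0.8 similarity threshold is an arbitrary assumption.

    import torch
    import torch.nn.functional as F
    from PIL import Image
    from torchvision import models, transforms

    # Pretrained MobileNetV2 with the classifier ignored: model.features produces
    # the "bunch of numbers" (an embedding) once we average-pool it spatially.
    model = models.mobilenet_v2(weights="DEFAULT").eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def embed(path):
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            features = model.features(x)   # shape (1, 1280, 7, 7)
        return features.mean(dim=[2, 3])   # global average pool -> (1, 1280)

    a, b = embed("image_a.jpg"), embed("image_b.jpg")
    # Cosine similarity close to 1.0 (i.e. a small distance) means "similar images".
    similarity = F.cosine_similarity(a, b).item()
    print("similar" if similarity > 0.8 else "different")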

when using opencv cvMatchTemplate, how to choose a good template for more accurate matching?

I am using OpenCV's cvMatchTemplate to find a trademark pattern in a bunch of images.
What I did is look at the picture, find some unique patch of the trademark, and use it as my template.
I found that using the whole trademark image is not necessarily better than using part of it. My questions are:
Is this normal, or did I do something wrong?
If it is normal, how can I choose a good template for more accurate matching?
Or, in other words, is there any mathematical theory behind selecting a good template that can help me select the best one?
I am not using feature detection, as I found it is not as accurate as template matching.
Why does template matching work as well as feature detection (when everyone seems to love feature detection)?
Template matching is different from feature detection, as it assumes that what you are looking for is in the same plane (the image plane) as the template. A "warped" template will not work (template matching with warped templates is called Digital Image Correlation).
Thus, if you are looking for a logo in an image of a sheet of paper aligned to the camera, then template-matching is your thing, but if you are looking for a logo on a random image of a street, then feature detection is your thing.
How does template matching work?
Well openCV has a brilliant example of it:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
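For reference, the core of that tutorial boils down to a few lines in Python (the file names here are placeholders):

    import cv2

    scene = cv2.imread("product_photo.jpg", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("trademark_patch.png", cv2.IMREAD_GRAYSCALE)

    # Normalised correlation coefficient; TM_SQDIFF / TM_CCORR variants also exist.
    scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)

    h, w = template.shape
    top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv2.rectangle(scene, top_left, bottom_right, 255, 2)
    print("best score", max_val, "at", top_left)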
Why does a piece of the template work as well as the whole?
However, you are wondering why just a piece of the template can be as good as the whole thing (not always, but it can happen).
This is easy to understand: your "part" of the template has enough information to be identified.
Example*:
If I ask you to find the following image, would you find it accurately?
I hope the answer is yes. Why didn't you need the whole image to find it?
Because that part of the image has enough information to be identified accurately! You don't need the whole image!
However, if I had given you the following image:
You wouldn't be able to identify the logo, as there are at least 3 or 4 other logos that have yellow on them.
How can I know if a piece of my template is good enough?
There is no way to know "for sure" whether a template will be enough to be uniquely identified, but there is a way to know how much information a template carries.
Since template matching relies on correlation coefficients and sum-of-squares coefficients, the more "distinct" information the template has, the better. This can be approximated with the sum of the image gradients.
Compute the gradients of the template in the X and Y directions, take their magnitudes, and sum them over the whole patch. The bigger that number is, the better the template!**
*Logos are a brilliant example, I have no association with any of these companies.
** This is mathematically proven in http://www.ncbi.nlm.nih.gov/pubmed/18545407
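A rough Python/OpenCV sketch of that gradient-sum score (how big "big enough" is, you would have to calibrate on your own templates; the file names are placeholders):

    import cv2
    import numpy as np

    def template_information(template_path):
        gray = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE).astype(np.float64)
        # Sum of absolute gradients in X and Y: the more edges/texture the patch
        # contains, the larger this value and the more distinctive the template.
        gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
        return float(np.abs(gx).sum() + np.abs(gy).sum())

    # Compare candidate crops of the trademark and keep the highest-scoring one.
    for candidate in ["full_logo.png", "logo_text_only.png", "logo_icon_only.png"]:
        print(candidate, template_information(candidate))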

Use Azure Machine learning to detect symbol within an image

4 years ago I posted this question and got a few answers that were unfortunately outside my skill level. I just attended a Build Tour conference where they spoke about machine learning, and this got me thinking about the possibility of using ML as a solution to my problem. I found this on the Azure site, but I don't think it will help me because its scope is pretty narrow.
Here is what i am trying to achieve:
I have a source image:
and I want to know which of the following symbols (if any) are contained in the image above:
The comparison needs to support minor distortion, scaling, color differences, rotation, and brightness differences.
The number of symbols to match will ultimately be greater than 100.
Is ML a good tool to solve this problem? If so, any starting tips?
As far as I know, Project Oxford (MS Azure CV API) wouldn't be suitable for your task. Their APIs are very focused on face-related tasks (detection, verification, etc.), OCR and image description, and apparently you can't extend their models or train new ones from the existing ones.
However, even though I don't know of an out-of-the-box solution for your object detection problem, there are simple enough approaches that you could try and that would give you some starting-point results.
For instance, here is a naive method you could use:
1) Create your dataset:
This is probably the most tedious step and, paradoxically, a crucial one. I will assume you have a good amount of images to work with. What you would need to do is pick a fixed window size and extract positive and negative examples from those images.
If some of the images in your dataset are of different sizes, you would need to rescale them to a common size. You don't need to get too crazy about the size; 30x30 images would probably be more than enough. To make things easier I would turn the images to gray scale too. A small sketch of this preprocessing step follows.
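A tiny sketch of that preprocessing step (the folder names, the PNG extension and the 30x30 window are just the assumptions made above):

    import glob

    import cv2

    WINDOW = (30, 30)

    def load_examples(folder, label):
        examples = []
        for path in glob.glob(folder + "/*.png"):
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # gray scale keeps things simple
            img = cv2.resize(img, WINDOW)                  # common fixed window size
            examples.append((img, label))
        return examples

    dataset = load_examples("positives", 1) + load_examples("negatives", 0)
    print(len(dataset), "examples")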
2) Pick a classification algorithm and train it:
There is an awful lot of classification algorithms out there, but if you are new to machine learning I will pick the one that is easiest to understand. Keeping that in mind, I would check out logistic regression, which gives decent results, is easy enough for starters, and has a lot of libraries and tutorials, for instance this one or this one. At first I would say to focus on a binary classification problem (like whether there is a UD logo in the picture or not), and once you master that you can jump to the multi-class case. There are resources for that too, or you can always keep several models, one per logo, and run this recipe for each one separately.
To train your model, you just need to read the images generated in step 1, turn them into vectors, and label them accordingly. That would be the dataset that feeds your model. If you are using images in gray scale, then each position in the vector corresponds to a pixel value in the range 0-255. Depending on the algorithm you might need to rescale those values to the range [0-1], because some algorithms perform better with values in that range. Rescaling in this case is fairly easy (new_value = value/255).
You also need to split your dataset, reserving some examples for training, a subset for validation and another one for testing. Again, there are different ways to do this, but I'm keeping this answer as naive as possible. A short scikit-learn sketch of this step follows.
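Putting steps 1 and 2 together with scikit-learn, a sketch that reuses the hypothetical dataset list of (image, label) pairs from the previous snippet:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Flatten each 30x30 gray-scale window into a 900-dimensional vector in [0, 1].
    X = np.array([img.reshape(-1) / 255.0 for img, _ in dataset])
    y = np.array([label for _, label in dataset])

    # Hold out part of the data for testing; a validation split works the same way.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))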
3) Perform the detection:
So now let's start the fun part. Given any image, you want to run your model and produce coordinates in the picture where there is a logo. There are different ways to do this, and I will describe one that is probably neither the best nor the most efficient, but in my opinion the easiest to develop.
You are going to scan the picture, extracting the pixels in a "window", rescaling those pixels to the size you selected in step 1, and then feeding them to your model.
If the model gives you a positive answer, you mark that window in the original image. Since the logo might appear at different scales, you need to repeat this process with different window sizes. You also need to tweak the amount of space between windows. A naive version of this loop is sketched below.
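A naive sliding-window loop matching the description above, again assuming the 30x30 classifier from the earlier snippets; the stride and scale values are arbitrary and need tuning:

    import cv2

    def detect(image_gray, clf, window=30, stride=10, scales=(1.0, 0.75, 0.5)):
        hits = []
        for scale in scales:
            resized = cv2.resize(image_gray, None, fx=scale, fy=scale)
            h, w = resized.shape
            for y in range(0, h - window, stride):
                for x in range(0, w - window, stride):
                    patch = resized[y:y + window, x:x + window].reshape(1, -1) / 255.0
                    if clf.predict(patch)[0] == 1:
                        # Map the hit back to coordinates in the original image.
                        hits.append((int(x / scale), int(y / scale), int(window / scale)))
        return hits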
4) Rinse and repeat:
On the first iteration it's very likely that you will get a lot of false positives. You then need to take those as negative examples and retrain your model. This is an iterative process, and hopefully on each iteration you will have fewer and fewer false positives and false negatives.
Once you are reasonably happy with your solution, you might want to improve it. You might want to try other classification algorithms like SVM or deep-learning artificial neural networks, or better object detection frameworks like Viola-Jones. Also, you will probably need to use cross-validation to compare all your solutions (you can actually use cross-validation from the beginning). By that point I bet you will be confident enough to use OpenCV or another ready-to-use framework, in which case you will have a fair understanding of what is going on under the hood.
Also, you could just disregard this whole answer and go for an OpenCV object detection tutorial like this one, or take an answer from another question like this one. Good luck!

Haar training - where to obtain eyeglasses images?

I want to train a new haar-cascade for glasses as I'm not satisfied with the results I'm getting from the cascade that is included in OpenCV.
My main problem is that I'm not sure where to get eyeglasses images. I can manually search and download, but that's not practical for the amount of images I really need. I'm specifically looking for images of people wearing eyeglasses.
As this forum contains many experienced computer vision experts, I hope someone here can offer guidance on how to obtain images for training.
I'll also be happy to hear other approaches for detecting eyeglasses (on people).
Thanks in advance,
Gil
If you simply want images, it looks like @herhuyongtao pointed you to a good place. Then you can follow OpenCV's tutorial on training.
Another option is to see what others have trained:
There's a trained data set found here that might be of use, which states simply that it is "better". I'm assuming that it's supposed to be better than opencv.
I didn't immediately see any other places for trained or labeled data.
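Whichever cascade you end up with (your own or a downloaded one), running it with OpenCV's Python bindings looks roughly like this; glasses_cascade.xml and the detection parameters are placeholders:

    import cv2

    # Placeholder path: whatever XML opencv_traincascade produced (or you downloaded).
    cascade = cv2.CascadeClassifier("glasses_cascade.xml")

    img = cv2.imread("person.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Typical detectMultiScale parameters; they usually need tuning per cascade.
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                          minSize=(40, 20))
    for (x, y, w, h) in detections:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("detected.jpg", img)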

Sketch-based Image Retrieval with OpenCV or LIRe

I'm currently reading for BSc Creative Computing with the University of London and I'm in my last year of my studies. The only remaining module I have left in order to complete the degree is the Project.
I'm very interested in the area of content-based image retrieval and my project idea is based on that concept. In a nutshell, my idea is to help novice artists draw sketches in perspective by using 3D models as references. I intend to achieve this by rendering the side/top/front views of each 3D model in a collection, pre-processing these images and indexing them. While drawing, the user gets a series of models (that have been pre-processed) that best match his/her sketch, which can be used as guidelines to further enhance the sketch. Since this approach relies on 3D models, it is also possible for the user to rotate the sketch in 3D space and continue drawing from that perspective. Such an approach could help comic artists or concept designers quickly sketch their ideas.
While carrying out my research I came across LIRe and I must say I was really impressed. I downloaded the LIRe demo v0.9 and played around with the included sample. I've also developed a small application which automatically downloads, indexes and searches for similar images in order to better understand the inner workings of the engine. Both approaches returned very good results even with a limited set of images (~300).
The next experiment was to test the output when a sketch rather than an actual image is provided as input. As mentioned earlier, the system should be able to provide a set of matching models based on the user's sketch. This can be achieved by matching the sketch against the rendered images (which are in turn linked to the 3D models). I tried this approach by comparing several sketches to a small set of images and the results were quite good - see http://claytoncurmi.net/wordpress/?p=17. However, when I tried with a different set of images, the results weren't as good as in the previous scenario. I used the Bag of Visual Words (using SURF) technique provided by LIRe to create and search through the index.
I'm also trying out some sample code that comes with OpenCV (I've never used this library and I'm still finding my way).
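To make the question more concrete, this is roughly the kind of experiment I have been running in Python: reducing both the sketch and the rendered views to edge maps and ranking views by ORB descriptor matches. It is only a toy and the file names are placeholders, not the approach I expect to ship.

    import cv2

    def edge_descriptors(path):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        edges = cv2.Canny(gray, 50, 150)        # reduce both sketch and render to edges
        orb = cv2.ORB_create(nfeatures=500)
        return orb.detectAndCompute(edges, None)

    _, sketch_des = edge_descriptors("user_sketch.png")
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    # Rank the pre-rendered model views by the number of descriptor matches.
    scores = []
    for view in ["model1_front.png", "model1_side.png", "model2_front.png"]:
        _, view_des = edge_descriptors(view)
        matches = matcher.match(sketch_des, view_des)
        scores.append((len(matches), view))
    print(sorted(scores, reverse=True))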
So, my questions are:
1. Has anyone tried implementing a sketch-based image retrieval system? If so, how did you go about it?
2. Can LIRe/OpenCV be used for sketch-based image retrieval? If so, how can this be done?
PS. I've read several papers on this subject; however, I didn't find any documentation about the actual implementation of such a system.
Any help and/or feedback is greatly appreciated.
Regards,
Clayton
