Our project is about OCR, and based on my research, the image goes through a pre-processing stage before character recognition is performed. I know we could use OpenCV for that, but our rules don't allow it.
My question is: can someone tell me the step-by-step pre-processing procedure and the best method/algorithm to use for each step?
From what I know so far:
1. YUV luminance
2. Greyscale
3. Otsu thresholding
4. Binarization
5. Hough transform
Original image > YUV luminance > greyscale > what's next?
Thanks!
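For reference, here is a minimal pure-Python sketch of the greyscale and Otsu steps listed above, written without OpenCV as the question requires. It assumes the image has already been decoded into rows of (R, G, B) tuples (decoding JPEG/PNG is out of scope), uses the ITU-R BT.601 luma weights for the Y channel, and the function names are hypothetical. Note that the Y (luma) channel of YUV already is a greyscale image, so those two steps collapse into one, and Otsu thresholding is one way of choosing the threshold for binarization rather than a separate stage.

```python
# A sketch under the assumptions above; helper names are hypothetical.

def luminance(rgb_pixels):
    """Convert rows of (R, G, B) tuples to 8-bit luma using BT.601 weights."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_pixels]

def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance.

    Pixels <= t form class 0, pixels > t form class 1.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0      # number of pixels in class 0
    sum0 = 0    # intensity sum of class 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def binarize(gray, t):
    """Map pixels above t to 255 (background) and the rest to 0 (ink)."""
    return [[255 if v > t else 0 for v in row] for row in gray]
```

Otsu picks a single global threshold, so it works best when the lighting is roughly even across the image.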
In some of my older blog posts, I addressed some parts of your questions:
Binarization on various image qualities from mobile cameras:
http://www.ocr-it.com/guide-to-better-mobile-images-from-cell-phone-camera-for-higher-quality-ocr
Image pre-processing and segmentation for better OCR:
http://www.ocr-it.com/user-scenario-process-digital-camera-pictures-and-ocr-to-extract-specific-numbers
In reality, in my experience, there is no fixed step-by-step procedure. You could use the original image for OCR if you wanted to, which means no pre-processing is strictly necessary. Pre-processing will help, but which steps help depends on the source and type of your images (which you did not specify). For example, a typical office document scanned on a professional scanner with Kofax VRS requires no pre-processing before OCR. A mobile camera image requires a lot of pre-processing. A picture from a parking garage camera will also require a lot of pre-processing, but with different steps and algorithms than a mobile camera picture.
I suggest deciding what the next major limiting factor in your images is, pre-processing against it, and then looking for the next correctable issue.
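When lighting is uneven, as is typical of mobile camera pictures, a single global threshold often fails. One commonly used alternative (not necessarily what the linked posts describe) is adaptive mean thresholding: each pixel is compared with the mean of its own neighbourhood, which an integral image makes cheap to compute. The sketch below assumes a greyscale image given as rows of 0-255 ints; the function names and the `radius`/`offset` defaults are hypothetical and need tuning.

```python
# A sketch of adaptive mean thresholding; names and defaults are hypothetical.

def integral_image(gray):
    """ii[y][x] = sum of gray over rows 0..y-1 and columns 0..x-1 (zero-padded)."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def adaptive_mean_threshold(gray, radius=7, offset=10):
    """Mark a pixel as ink (0) if it is darker than its local mean minus offset."""
    h, w = len(gray), len(gray[0])
    ii = integral_image(gray)
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            # Window sum from four integral-image lookups.
            s = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            row.append(0 if gray[y][x] < s / area - offset else 255)
        out.append(row)
    return out
```

Near a sharp boundary between a dark and a bright region, the local mean is pulled toward the other region, so a window radius comfortably larger than the stroke width is usually needed.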
I'm building a business card scanner for my final exam in digital image processing, and I would like to ask how I should preprocess a photo of a business card so that Tesseract can recognize the text. I tried a lot of things, like erosion, dilation, and thresholding, but I can't get a good result... Can you help me?
Thank you
Marco
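Erosion and dilation, two of the operations mentioned above, can be sketched in pure Python as follows. This assumes a binary image given as rows of 0/1 values with 1 = ink (invert first if your thresholding produces black text as 0) and a square structuring element; the function names are hypothetical. Opening (erode, then dilate) removes isolated specks, and closing (dilate, then erode) bridges small gaps in strokes.

```python
# A sketch of binary morphology; helper names are hypothetical.

def _neighbourhood(img, y, x, r):
    """Values in the (2r+1)x(2r+1) square around (y, x), clipped at the border."""
    h, w = len(img), len(img[0])
    return (img[yy][xx]
            for yy in range(max(0, y - r), min(h, y + r + 1))
            for xx in range(max(0, x - r), min(w, x + r + 1)))

def dilate(img, r=1):
    """A pixel becomes 1 if any pixel in its neighbourhood is 1."""
    return [[1 if any(_neighbourhood(img, y, x, r)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]

def erode(img, r=1):
    """A pixel stays 1 only if every in-bounds pixel in its neighbourhood is 1."""
    return [[1 if all(_neighbourhood(img, y, x, r)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]

def opening(img, r=1):
    """Erode then dilate: removes specks smaller than the structuring element."""
    return dilate(erode(img, r), r)

def closing(img, r=1):
    """Dilate then erode: fills small gaps and holes in strokes."""
    return erode(dilate(img, r), r)
```

Whether an operation thins or thickens the text depends on which polarity is treated as foreground, which is easy to get backwards.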
If your concern is only text recognition and not the pre-processing itself, consider using ScanTailor. It is an excellent, open-source pre-processing tool.
If you want to implement the pre-processing yourself, you might want to have a look at this paper, especially the skew correction and the background estimation. The results of the algorithms described there are good, and ScanTailor uses some of them.
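As an illustration of skew estimation, here is a sketch of the projection-profile method, one common approach and not necessarily the one used in the paper: for each candidate angle, the ink pixel coordinates are rotated and histogrammed by row, and the angle giving the sharpest profile (largest sum of squared row counts) is taken as the skew. It assumes a binary image of rows of 0/1 values with 1 = ink; the function name, the ±10° search range and the 0.5° step are hypothetical defaults.

```python
import math

def estimate_skew(binary, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) with the sharpest row projection profile.

    Aligned text lines concentrate ink into few rows (tall peaks, empty gaps),
    which maximizes the sum of squared bin counts.
    """
    points = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    best_angle, best_score = 0.0, -1.0
    n = int(round(2 * max_angle / step))
    for i in range(n + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        s, c = math.sin(theta), math.cos(theta)
        bins = {}
        for x, y in points:
            # Row coordinate of the point after rotating it by -theta.
            ry = int(round(y * c - x * s))
            bins[ry] = bins.get(ry, 0) + 1
        score = sum(v * v for v in bins.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Deskewing then means rotating the image by the negative of the returned angle. A positive angle here means text lines that descend to the right, since image rows grow downward.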
I would recommend the open-source C++ image processing library OpenCV in combination with the open-source, free Optical Character Recognition (OCR) library Tesseract.
Since the description of your problem isn't very specific, I can only answer your question in general terms.
The main procedure in OCR is:
1. Perform some kind of preprocessing on the image.
2. Text detection, to get your ROI (region of interest, the region containing your text).
3. Character recognition (take the text-only image and use it as input for Tesseract).
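For a clean, deskewed document page, text detection can be as simple as a horizontal projection profile: consecutive rows containing ink form a text line, and each line is cropped to its ink before being passed to the recognizer. The sketch below assumes a binary image with 1 = ink, and the function names are hypothetical; scene text (photos, signs) generally needs a proper text detector instead.

```python
# A projection-profile sketch for document pages; names are hypothetical.

def find_text_lines(binary, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, of consecutive inked rows."""
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # a text line ends
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines

def crop_line(binary, top, bottom):
    """Crop a band of rows and trim empty columns: the ROI for the recognizer."""
    band = binary[top:bottom]
    cols = [x for x in range(len(band[0])) if any(r[x] for r in band)]
    return [r[cols[0]:cols[-1] + 1] for r in band]
```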
A few words about Tesseract:
There is a lot of information about the library available online. It is an open-source library from Google, used for Google Books OCR. It can also handle layout analysis of your image, but it isn't perfect at this, so doing the preprocessing yourself and using Tesseract only for the actual character recognition can lead to better results. Feel free to ask if you still have questions, or if I misunderstood your question.
Can someone tell me how I can detect pictures of architecture or sculpture?
I think the Hough transform is a good approach, but I'm new to CV and maybe there are better methods to detect patterns. I heard about Haar cascades. Can I use them for architecture, too?
For example, I want to detect these kinds of pictures:
http://img842.imageshack.us/img842/4748/resizeimg0931.jpg
If you want an algorithm to detect them, keep in mind that detecting an object in an image requires a description of that object that a machine can understand. For sculpture or architecture, how can you have such a uniform definition when they vary so much in every sense? For example, your two input images are very different. How do we differentiate between a house and architecture? A lot of problems arise from your question. Even with the Hough transform, how are you supposed to differentiate a big house from a big building?
Check out this SO question: Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
The asker wants to detect Coca-Cola cans, but not Coca-Cola bottles. If you look into it closely, you will see that cans and bottles are almost alike, and it is difficult to differentiate between them. You can find many of the difficulties in the answers there. The major problem is that in some cases it is difficult even for humans to tell them apart.
For your second image, even if you train a cascade on images like it, there is a chance it will also detect live lions if they are present in the image, since a sculpted lion and a real lion look almost the same to a machine.
Haar cascades may not be very effective here, since you would have to train them on a lot of these kinds of images.
If you have some sample images and want to check whether those objects appear in your image, maybe you can use SURF features, etc. But you need some sample images to compare against first. For a demo of SURF, check out this SO question: OpenCV 2.4.1 - computing SURF descriptors in Python
Another option is template matching, but it is slow, and it is not scale- or orientation-invariant. You also need some template images for it.
I think I have seen some papers on this topic (but I don't remember them now). Maybe googling will find them. I will update the answer if I find them.
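To show why plain template matching is slow and tied to a single scale and orientation, here is a brute-force sketch using the sum of squared differences (SSD), one common matching score. It assumes greyscale images as rows of ints, and the function name is hypothetical. The template is compared at every position, so the cost grows with the image size times the template size, and finding a different size or rotation would need resized or rotated templates.

```python
def match_template(image, template):
    """Slide the template over the image; return (best_ssd, x, y) with the lowest SSD."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = 0
            for dy in range(th):
                irow, trow = image[y + dy], template[dy]
                for dx in range(tw):
                    d = irow[x + dx] - trow[dx]
                    ssd += d * d
            if best is None or ssd < best[0]:
                best = (ssd, x, y)
    return best
```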
What is the best approach to identify a pattern (it could be text, a signature, or a logo; NOT faces, objects, people, etc.) in an image, given that all images are taken from the same angle? That means the pattern to identify will ALWAYS be visible at the same angle, but not at the same position, size, quality, brightness, etc.
Assuming I have the logo, I would like to run a test on 1000 images of different sizes and quality and get the images that have this pattern embedded, or at least a high probability of having it embedded.
Thanks,
Perhaps you could show a couple of images, but template matching (perhaps with a distance transform) seems like an ideal candidate for your problem.
Perl? I'd have suggested using OpenCV with Python or C, since you're on the Linux platform.
You could check out SURF and SIFT (the linked resource explains how to do this with OpenCV and C++, with code attached), which can do decent template matching (logos, etc.).
Text detection is a different kettle of fish. I'd suggest the paper "Robust Text Detection in Natural Images with Edge-Enhanced Maximally Stable Extremal Regions", which is the most recent I've seen that does robust text detection in natural scenes without becoming overly intricate.
Training a neural network on the expected patterns seems to be the best all-round approach, though training will take a long time. Actual identification is almost real-time, though.
Here's a discussion of MSER implementations in two libraries: a) OpenCV, b) VLFeat.
Have you checked AForge.NET (aforgenet.com)? It has great libraries for blob processing. It's in .NET.
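One way to read "template matching with a distance transform" is chamfer matching, sketched here under that assumption: compute, for every pixel, the distance to the nearest edge pixel of the image, then slide the template's edge points over that distance map and keep the position with the smallest mean distance. Because only edges are compared, the score is less sensitive to brightness, and it increases smoothly for slightly misaligned shapes. The sketch assumes an edge map (rows of 0/1) is already available and uses the city-block distance for simplicity; the function names are hypothetical. Like plain template matching, it handles one scale only, so the size variation in the question would require searching over several template scales.

```python
INF = float("inf")

def distance_transform(edges):
    """City-block distance from each pixel to the nearest edge pixel (two passes)."""
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    for y in range(h):                    # forward pass: top and left neighbours
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):        # backward pass: bottom and right neighbours
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d

def chamfer_match(edges, template_points):
    """Return (score, x, y) minimizing the mean distance under the template's edge points.

    template_points: list of (dx, dy) offsets of the template's edge pixels.
    """
    d = distance_transform(edges)
    h, w = len(d), len(d[0])
    tw = max(p[0] for p in template_points) + 1
    th = max(p[1] for p in template_points) + 1
    best = None
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            score = sum(d[y + dy][x + dx] for dx, dy in template_points) / len(template_points)
            if best is None or score < best[0]:
                best = (score, x, y)
    return best
```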
I'm using OpenCV on the iPhone and need to detect numbers in an image. I split the image into smaller images so that each image contains only one digit (1-9). All numbers are printed, NOT handwritten.
What would be the best approach to figure out the numbers with OpenCV?
UPDATE:
I have successfully found the numbers and extracted them. They look like this:
http://img198.imageshack.us/img198/5671/101ht.jpg
http://img824.imageshack.us/img824/539/606yu.jpg
When they are extracted, they are all the same size, and so on. I have saved a bunch of images and put them in an OCR directory where they are categorized by number, like ocr/1/100.jpg, 101.jpg, ... and ocr/2/200.jpg, 201.jpg, ...
Then I was going to use the same approach as in the Basic OCR tutorial: http://blog.damiles.com/?p=93
However, I'm programming for the iPhone and can't use C++ code (compile errors and so on), and I don't have access to highgui.
I tried using cvMatchTemplate() to match against a bunch of images, but it seems to work pretty badly...
Any other ideas I can try?
You could start by reading about Principal Component Analysis (PCA), Fisher's Linear Discriminant Analysis (LDA), and Support Vector Machines (SVMs). These are methods that are extremely useful for OCR (PCA and LDA for reducing and projecting features, SVMs for classification), and there are libraries for them in many languages, including C++, Python, C#, etc.
It turns out that OpenCV already includes excellent implementations of PCA and SVMs[dead link]. I haven't seen any OpenCV code examples for OCR, but you can use a modified version of face classification to perform character classification. An excellent resource for face recognition code for OpenCV is this website[dead link].
If the numbers are printed, the job is quite simple: you just need to figure out a good set of features to match. If the numbers are all in one font, you can get away with this approach:
1. Extract the number.
2. Find its bounding box.
3. Scale the image down to something like 10x8, trying to match the digits' aspect ratio.
4. Do this for a small training set, and take the 'average' image for each number.
5. For new images, follow the steps above, but make the last step an absolute image difference with each of the number templates. Then take the sum of the differences (over all pixels of the difference image). The template with the minimum sum is your number.
All of the above are basic OpenCV operations.
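The steps above can be sketched in pure Python as follows, assuming each digit image is given as rows of ints with ink > 0 and a hypothetical target size of 10 rows by 8 columns; all function names are hypothetical too. Nearest-neighbour resizing keeps the sketch short: the crop is stretched to exactly 10x8, so the target size should be chosen close to the digits' typical aspect ratio, and an area-averaging resize would give smoother templates.

```python
# A sketch of mean-template digit classification; names are hypothetical.

def bounding_box(img):
    """(top, bottom, left, right) of pixels > 0, bottom/right exclusive."""
    ys = [y for y, row in enumerate(img) if any(row)]
    xs = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return ys[0], ys[-1] + 1, xs[0], xs[-1] + 1

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize to out_h rows by out_w columns."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def normalize(img, out_h=10, out_w=8):
    """Crop to the bounding box, then scale to a fixed size."""
    top, bottom, left, right = bounding_box(img)
    crop = [row[left:right] for row in img[top:bottom]]
    return resize_nearest(crop, out_h, out_w)

def average(images):
    """Pixel-wise mean of equally sized images."""
    n = len(images)
    return [[sum(im[y][x] for im in images) / n for x in range(len(images[0][0]))]
            for y in range(len(images[0]))]

def train(samples):
    """samples: {label: [image, ...]} -> {label: mean normalized image}."""
    return {label: average([normalize(im) for im in ims]) for label, ims in samples.items()}

def classify(templates, img):
    """Return the label whose template has the smallest sum of absolute differences."""
    v = normalize(img)
    def sad(t):
        return sum(abs(a - b) for ra, rb in zip(v, t) for a, b in zip(ra, rb))
    return min(templates, key=lambda label: sad(templates[label]))
```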
Basically, your problem is just to classify a feature vector, which is the set of pixel intensities after some preprocessing steps. You can use any classifier for this task, e.g. neural networks, which should have a C implementation inside OpenCV. You might also try libsvm, a C library for Support Vector Machines.
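As an example of "any classifier", here is a k-nearest-neighbour sketch (a simple alternative that this answer does not mention), taking flattened, preprocessed pixel intensities as feature vectors. The function name, k = 3 and the squared Euclidean distance are assumptions.

```python
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Return the majority label among the k training vectors closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), label)   # squared Euclidean distance
        for tx, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```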
There is a good site related to this problem with a lot of papers and a training database.
Maybe the simplest and most convenient way is to use an SVM as the ML algorithm
http://opencv.willowgarage.com/documentation/cpp/support_vector_machines.html
and gray images as feature vectors.
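If OpenCV's SVM is not usable in your build, a linear SVM can also be trained from scratch. The sketch below uses Pegasos-style stochastic subgradient descent on the hinge loss, a swapped-in technique rather than what the linked OpenCV page implements, and it supports only a linear kernel. Features are assumed to be flattened grey images scaled to a small range (e.g. 0 to 1), labels are -1/+1, and the function names as well as `lam`, `epochs` and `seed` are hypothetical.

```python
import random

def train_linear_svm(xs, ys, lam=0.1, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM; labels must be -1 or +1.

    A constant 1.0 feature is appended so the last weight acts as a bias
    (it is regularized too, which is a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                          # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]         # shrink step from the regularizer
            if margin < 1:                                 # hinge loss active: move towards y
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def decision(w, x):
    """Signed score; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def predict(w, x):
    return 1 if decision(w, x) >= 0 else -1
```

For the digits 1 to 9, one such classifier per digit (one-vs-rest) can be trained, and the digit whose classifier gives the highest `decision` score wins.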
Objective-C++?
Try renaming your .m files to .mm; you can then use C++ in your iPhone project.
Convolutional neural networks are by far the best algorithms for handwritten digits. They are implemented in most such systems, like those of the USPS. Here are a few papers explaining the algorithms.
http://yann.lecun.com/exdb/lenet/
This is a nice open-source project, an OCR demo on the iPhone. I hope it is useful to you.
Simple Digit Recognition OCR in OpenCV-Python
This might help you out. Converting the code from Python to C++ is not a difficult task, since the OpenCV APIs are the same for both.
Tesseract is also a nice free OCR engine that is readily available for the iPhone and allows you to use your own sets of training images:
http://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/
HOG + SVM (try playing with the kernels).
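A simplified HOG (histogram of oriented gradients) feature extractor, whose output can be fed to a classifier such as an SVM, might look like the sketch below. It omits the overlapping-block normalization of full HOG and assumes a greyscale image given as rows of ints; the function name, the cell size of 4 and the 9 orientation bins are hypothetical settings. OpenCV also ships a complete implementation (HOGDescriptor).

```python
import math

def hog_features(gray, cell=4, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude.

    Histograms are concatenated and L2-normalized as a whole (simplified HOG).
    """
    h, w = len(gray), len(gray[0])
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
                    gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
                    mag = math.hypot(gx, gy)
                    if mag == 0:
                        continue
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0   # unsigned orientation
                    hist[int(angle // (180.0 / bins)) % bins] += mag
            feats.extend(hist)
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]
```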
I have decided to spend my personal time after office hours learning the building blocks of how images (of JPEG type) are parsed and represented on screen. My interest is object recognition in images, so I want to know where to start. I know there is math involved in this, so I need a step-by-step list of which resources on the Internet to look at.
I'd need a lot more information on what you want, but take a look at OpenCV for good examples:
http://sourceforge.net/projects/opencvlibrary/
I'd get Ritter's book (warning: costly!) and give it serious study. If you just want to grab existing code and go play, then perhaps you should look at libraries like OpenCV (see Lou's answer).
The ultimate goal of most image processing is to extract information about some high-level and application-dependent objects from an image available in low-level (pixel) form. The objects may be of everyday interest, as in robotics; cosmic ray showers or particle tracks, as in physics; chromosomes, as in biology; or houses, roads, or differently used agricultural surfaces, as in aerial photography or synthetic-aperture radar, etc.
This task of pattern recognition is usually preceded by multiple steps of image restoration and enhancement, image segmentation, or feature extraction, steps which can be described in general terms. The final description in problem-dependent terms, and even more so the eventual image reconstruction, escapes such generality, and the literature of application areas has to be consulted.