I have photos/scans of documents in a few dozen known formats. Every document contains some known attributes (date/time, names, and a list of items).
Can you please suggest which apps/libs to start with (in Objective-C/C++)? Can OpenCV do that? What about OCR? Layout recognition for OCR?
Thanks!
P.S. Please suggest how to rephrase my post
P.P.S. I have found a promising tool, with an example for iOS: https://code.google.com/p/tesseract-ocr/ and https://github.com/robmathews/OCR-iOS-Example
To detect where the text is on the page, I would recommend using OpenCV, then sending the regions of text to Tesseract.
Find text:
Erode the image
Find contours
Get bounding boxes of the contours
Those bounding boxes should contain either text or a logo/picture. A rough sketch of the pipeline is below.
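Here is a minimal sketch of that idea in C++, assuming dark text on a light scan. Note that dilating the inverted binary image is equivalent to eroding the original; the file name, kernel size, and area threshold are placeholders to tune for your own documents.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("document.png", cv::IMREAD_GRAYSCALE);

    // Binarize: text becomes white on black so morphology can merge it.
    cv::Mat bin;
    cv::threshold(img, bin, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);

    // Dilate (grow white regions) so neighboring letters fuse into blobs.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(15, 3));
    cv::Mat blobs;
    cv::dilate(bin, blobs, kernel);

    // Contours of the blobs -> candidate text regions.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(blobs, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (const auto& c : contours) {
        cv::Rect box = cv::boundingRect(c);
        if (box.area() > 100)               // drop specks
            cv::rectangle(img, box, cv::Scalar(0), 2);
    }
    cv::imwrite("regions.png", img);
    return 0;
}
```

Each surviving box can then be cropped and passed to Tesseract individually.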
I'm trying to detect an object composed of other objects. Actually, there are three circles in my binary image which form a triangle, as shown here:
These circles are correctly detected, but only as single objects as shown here:
What I need to have is an aggregation or composition of these objects, so they get detected as one big object as shown here:
The bigger goal is to compute the image moments to get the rotation and scale of the shape. Please share your ideas or code if you have any; it would be much appreciated.
I would suggest using the bounding box functions of OpenCV.
Here is a link to an example of a bounding box in C++ OpenCV. However, if you are using something like Python, it might be worth your while to look at this link, which is a full set of tutorials for working with binary images and contours (including bounding box/ellipse).
Again, if you are using the Python port, look at this set of tutorials; they really are great and have a massive supply of information on most functions of OpenCV.
Hope this helps.
Good luck.
Your question is very similar to this question, which has answers with code examples. Alternatively check the documentation of OpenCV. If you are interested in the convex hull of your points, see cv::convexHull().
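A minimal sketch of that convex-hull approach, assuming the three circles are white blobs in a thresholded image ("circles.png" is a placeholder path): pool the contour points, take a single hull, and compute the moments on it.

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    cv::Mat binary = cv::imread("circles.png", cv::IMREAD_GRAYSCALE);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(binary, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    // Pool the points of every contour into a single set.
    std::vector<cv::Point> all;
    for (const auto& c : contours)
        all.insert(all.end(), c.begin(), c.end());

    // One hull around all three circles = the aggregated object.
    std::vector<cv::Point> hull;
    cv::convexHull(all, hull);

    // Moments of the hull give area (scale), centroid, and orientation.
    cv::Moments m = cv::moments(hull);
    double cx = m.m10 / m.m00, cy = m.m01 / m.m00;
    double angle = 0.5 * std::atan2(2 * m.mu11, m.mu20 - m.mu02);

    std::printf("centroid (%.1f, %.1f), orientation %.3f rad\n", cx, cy, angle);
    return 0;
}
```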
I am trying to develop a character-matching application which takes an image from a camera and matches it against a provided image template. So far I have tried matchShapes on contours, which works fine for simple shapes but not for characters. I also tried matchTemplate, but that fails whenever the size, font, or rotation of the character in the camera image differs from the template image.
I am now thinking I need to do feature extraction after segmenting the camera image into sets and compare these sets with a feature set of reference images. Can anyone please give me a starting direction or suggestion?
For example, this is an image from the camera:
and this is the template image I need to find:
I must stress that I am no expert at optical character recognition, so please do thorough research on your end as well. The following two links might help you achieve your goal using character feature sets:
http://blog.damiles.com/2008/11/basic-ocr-in-opencv/
http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_ml/py_knn/py_knn_opencv/py_knn_opencv.html
I have an image of a mobile phone credit recharge card and I want to extract only the recharge number (the gray area) as a sequence of digits that can be used to recharge the phone directly.
This is a sample photo only and cannot be considered standard: the rectangular area may differ in position and background, and the card may also differ in size. The scratch area may not be fully scratched, and the camera's depth and position may differ too. I have read lots of papers on the internet but I can't find anything relevant; most papers discuss detection of handwritten numbers.
Any links or algorithm names would be very useful.
You can search the papers on vehicle plate number detection with machine learning methods. Basically you need to extract the number region first: you can use a Sobel filter to extract the vertical edges, then threshold (binary image) and apply morphological operations (remove the blank spaces between the vertical edge lines and connect all regions that have a high number of edges). Finally, retrieve the contours and fill in the connected components with a mask.
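A hedged sketch of that plate-style pipeline in OpenCV C++ (the file name, kernel size, and shape constraints are assumptions to tune per card):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat gray = cv::imread("card.jpg", cv::IMREAD_GRAYSCALE);

    // dx=1, dy=0 responds to vertical strokes (digit edges).
    cv::Mat sobel;
    cv::Sobel(gray, sobel, CV_8U, 1, 0);

    cv::Mat bin;
    cv::threshold(sobel, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // Wide closing kernel bridges the gaps between neighboring digits.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(17, 3));
    cv::morphologyEx(bin, bin, cv::MORPH_CLOSE, kernel);

    // Contours of the merged regions are the candidate number strips.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(bin, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    for (const auto& c : contours) {
        cv::Rect r = cv::boundingRect(c);
        // Keep wide, short regions: a digit strip is much wider than tall.
        if (r.width > 3 * r.height && r.area() > 500)
            cv::rectangle(gray, r, cv::Scalar(255), 2);
    }
    cv::imwrite("candidates.png", gray);
    return 0;
}
```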
After you extract the numbers, you can use a machine learning method such as a neural network or an SVM to recognize them.
Hope it helps.
Extract the gray part from the image, then use Tesseract (OCR) to read the text written in that region.
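A minimal sketch using the Tesseract C++ API on a cropped region; the rectangle coordinates below are placeholders, and in practice you would get them from a detection step like the one described above.

```cpp
#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>
#include <cstdio>

int main() {
    cv::Mat img = cv::imread("card.jpg", cv::IMREAD_GRAYSCALE);
    cv::Rect grayArea(100, 200, 400, 60);    // placeholder coordinates
    cv::Mat crop = img(grayArea).clone();

    tesseract::TessBaseAPI ocr;
    if (ocr.Init(nullptr, "eng")) return 1;  // default tessdata path
    // Restricting to digits makes recognition far more robust here.
    ocr.SetVariable("tessedit_char_whitelist", "0123456789");
    ocr.SetImage(crop.data, crop.cols, crop.rows, 1,
                 static_cast<int>(crop.step));

    char* text = ocr.GetUTF8Text();
    std::printf("recharge number: %s\n", text);
    delete[] text;
    ocr.End();
    return 0;
}
```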
I don't think you will find a ready-made algorithm on the internet for reading these numbers; nobody will disclose that. If you are a hardcore programmer, though, you can crack it with your own code. I tried it on screenshots, where the fonts were clearer, and the algorithm was simple; here the algorithm will need to be more complex, since you are reading from a photo source instead of a screenshot.
Follow these steps:
Load the image.
Select the digits (by contour finding and applying constraints on the area and height of the letters to avoid false detections). This will split the image and thus modularize the OCR operation you want to perform.
Use a simple k-nearest-neighbour algorithm for the identification and classification; a sketch follows the list.
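A sketch of that last step with OpenCV's ml module. It assumes you have already built "samples" (one flattened, normalized digit image per row, CV_32F) and "labels" (one label per row, CV_32S) from your own training data; the 20x20 size and k = 5 are arbitrary choices.

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>

// Train a k-nearest-neighbour model on prepared row samples.
cv::Ptr<cv::ml::KNearest> trainKnn(const cv::Mat& samples, const cv::Mat& labels) {
    cv::Ptr<cv::ml::KNearest> knn = cv::ml::KNearest::create();
    knn->train(samples, cv::ml::ROW_SAMPLE, labels);
    return knn;
}

// Classify one segmented digit: resize to the training size, flatten,
// and ask for the nearest k neighbours.
int classifyDigit(const cv::Ptr<cv::ml::KNearest>& knn, const cv::Mat& digit) {
    cv::Mat small, row;
    cv::resize(digit, small, cv::Size(20, 20));
    small.convertTo(row, CV_32F);
    row = row.reshape(1, 1);             // 1 x 400 feature vector

    cv::Mat results;
    knn->findNearest(row, 5, results);   // k = 5
    return static_cast<int>(results.at<float>(0, 0));
}
```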
If the end goal is just to make a bot, you could probably pull the text directly from the app rather than worrying about OCR; but if you want to learn more about machine learning and you haven't used them already, the MNIST and CIFAR-10 datasets are fantastic places to start.
If you preprocess your image so that the yellow pixels become black and all others white, you will have a much cleaner source to work with.
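A small sketch of that preprocessing step: isolate yellow with an HSV range check, then invert so the text ends up black on white for the OCR engine. The hue bounds are a guess to tune on your own screenshots, and "screen.png" is a placeholder.

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat img = cv::imread("screen.png"), hsv, mask;
    cv::cvtColor(img, hsv, cv::COLOR_BGR2HSV);

    // Yellow sits around hue 30 on OpenCV's 0-179 hue scale.
    cv::inRange(hsv, cv::Scalar(20, 100, 100), cv::Scalar(40, 255, 255), mask);

    // mask: yellow = 255. Invert so text is black, everything else white.
    cv::Mat clean;
    cv::bitwise_not(mask, clean);
    cv::imwrite("clean.png", clean);
    return 0;
}
```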
If you want to push forward with Tesseract and the preprocessing isn't enough, then you will probably have to retrain it for this font. You will need to prepare a corpus, process it similarly to how you expect your source data to look, and then use something like qt-box-editor to correct the data. This guide should walk you through the basic steps of retraining.
I was watching this talk from PyCon (http://youtu.be/B1d9dpqBDVA?t=15m34s). Around the 15:33 mark the speaker talks about extracting lines from an image (a receipt) and then feeding them to the OCR engine so that the text can be extracted more accurately.
I have a similar need where I'm passing images to the OCR engine. However, I don't quite understand what he means by extracting lines from an image. What are some open-source tools I can use to extract lines from an image?
Take a look at the technique used to detect the skew angle of a text.
Groups of lines are used to isolate the text in an image (this is the interesting part).
From this result you can easily detect the upper/lower limits of each line of text; the text itself will be located inside them. I've faced a similar problem before, and the code might be useful to you.
All you need to do from here is crop each pair of lines and feed the result as an image to Tesseract.
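One simple, hedged way to get those per-line crops is a horizontal projection profile: sum each row of the binarized image and split at the empty gaps between lines ("receipt.png" is a placeholder, and a small noise threshold may be needed instead of the strict > 0 test).

```cpp
#include <opencv2/opencv.hpp>
#include <cstdio>

int main() {
    cv::Mat img = cv::imread("receipt.png", cv::IMREAD_GRAYSCALE), bin;
    cv::threshold(img, bin, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);

    // One value per row: total intensity across that row.
    cv::Mat rowSum;
    cv::reduce(bin, rowSum, 1, cv::REDUCE_SUM, CV_32S);

    bool inLine = false;
    int top = 0;
    for (int y = 0; y < rowSum.rows; ++y) {
        bool busy = rowSum.at<int>(y, 0) > 0;
        if (busy && !inLine) { top = y; inLine = true; }
        if (!busy && inLine) {
            inLine = false;
            cv::Mat line = img(cv::Rect(0, top, img.cols, y - top));
            // ...pass "line" to the OCR engine here...
            std::printf("text line from row %d to %d\n", top, y);
        }
    }
    return 0;
}
```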
I can tell you a simple technique for feeding the images to the OCR: perform some operations to get the region of interest (ROI) of your image and localize the area containing the text after binarizing it. Then find the contours and, by tuning the threshold value and requiring a minimum contour area, you can feed the resulting image to the OCR. :)
(Sorry for the rough explanation.)
Direct answer: you extract lines from an image with the Hough Transform.
You can find an analytical guide here.
Text lines can be detected as well; Karlphillip's answer is based on the Hough Transform too.
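A minimal Hough Transform sketch: Canny edges first, then the probabilistic variant returns line segments directly. The thresholds are placeholders to tune per image.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("receipt.png", cv::IMREAD_GRAYSCALE), edges;
    cv::Canny(img, edges, 50, 150);

    std::vector<cv::Vec4i> lines;   // each line: x1, y1, x2, y2
    cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 80,
                    /*minLineLength=*/100, /*maxLineGap=*/10);

    cv::Mat out;
    cv::cvtColor(img, out, cv::COLOR_GRAY2BGR);
    for (const cv::Vec4i& l : lines)
        cv::line(out, cv::Point(l[0], l[1]), cv::Point(l[2], l[3]),
                 cv::Scalar(0, 0, 255), 2);
    cv::imwrite("lines.png", out);
    return 0;
}
```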
I'm trying to create a simple OCR engine using OpenCV. I have this image: https://dl.dropbox.com/u/63179/opencv/test-image.png
I have saved all possible characters as images and am trying to detect these images in the input image.
From here I need to identify the code. I have been trying matchTemplate and FAST detection. Both seem to fail (or, more likely, I'm doing something wrong).
When I used the matchTemplate method, I found the edges of both the input image and the reference images using Sobel. This provides a working result, but the accuracy is not good enough.
When using the FAST method, it seems I can't get any interesting descriptors from the cvExtractSURF method.
Any recommendations on the best way to read this kind of code?
UPDATE 1 (2012-03-20)
I have made some progress. I'm trying to find the bounding rects of the characters, but the matrix font is killing me. See the samples below:
My font: https://dl.dropbox.com/u/63179/opencv/IMG_0873.PNG
My font filled in: https://dl.dropbox.com/u/63179/opencv/IMG_0875.PNG
Other font: https://dl.dropbox.com/u/63179/opencv/IMG_0874.PNG
As seen in the samples, I can find the bounding rects for a less complex font, and if I can fill in the space between the dots of my font it also works. Is there a way to achieve this with OpenCV? If I could find the bounding box of each character, it would be much simpler to recognize it.
Any ideas?
UPDATE 2 (2012-03-21)
OK, I had some luck finding the bounding boxes. See the image:
https://dl.dropbox.com/u/63179/opencv/IMG_0891.PNG
I'm not sure where to go from here. I tried to use matchTemplate, but I guess that is not a good option in this case? I guess it is better suited to searching for an exact match in a bigger picture?
I tried to use SURF, but when I try to extract the descriptors with cvExtractSURF for each bounding box I get 0 descriptors... Any ideas?
What method would be most appropriate for matching a bounding box against a reference image?
You're going the hard way with FAST+SURF, because they were not designed for this task.
In particular, FAST detects corner-like features that are ubiquitous in structure-from-motion but far less present in OCR.
Two suggestions:
maybe build a feature vector from the number and locations of FAST keypoints; I think you can rapidly check whether these features are discriminant enough, and if so, train a classifier from that
(the one I would choose myself) partition your image samples into smaller squares, compute the SURF descriptor for each square, and concatenate all of them to form the feature vector for a given sample. Then train a classifier with these feature vectors.
Note that option 2 works with any descriptor that you can find in OpenCV (SIFT, SURF, FREAK...).
Answer to update 1
Here is a little trick that senior people taught me when I started.
On your image with the dots, you can project your binarized data onto the horizontal and vertical axes.
By searching for holes (disconnections) in the projected profiles, you are likely to recover almost all the bounding boxes in your example; a sketch is below.
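A hedged sketch of that trick ("code.png" is a placeholder): sum the binarized image along each axis, find the runs of non-empty rows/columns, and cross them to get the boxes. This works for dotted fonts as long as the dots of one glyph overlap in projection.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Return the [start, end) runs of non-zero entries in a 1-D profile.
std::vector<cv::Range> findRuns(const cv::Mat& profile) {
    std::vector<cv::Range> runs;
    bool in = false;
    int start = 0;
    for (int i = 0; i < (int)profile.total(); ++i) {
        bool busy = profile.at<int>(i) > 0;
        if (busy && !in) { start = i; in = true; }
        if (!busy && in) { runs.push_back(cv::Range(start, i)); in = false; }
    }
    if (in) runs.push_back(cv::Range(start, (int)profile.total()));
    return runs;
}

int main() {
    cv::Mat bin = cv::imread("code.png", cv::IMREAD_GRAYSCALE);
    cv::threshold(bin, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    cv::Mat cols, rows;
    cv::reduce(bin, cols, 0, cv::REDUCE_SUM, CV_32S);  // vertical projection
    cv::reduce(bin, rows, 1, cv::REDUCE_SUM, CV_32S);  // horizontal projection

    // Cross the row runs with the column runs to get the boxes.
    for (const cv::Range& r : findRuns(rows))
        for (const cv::Range& c : findRuns(cols))
            cv::rectangle(bin, cv::Rect(c.start, r.start,
                                        c.end - c.start, r.end - r.start),
                          cv::Scalar(128));
    cv::imwrite("boxes.png", bin);
    return 0;
}
```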
Answer to update 2
At this point, you're back to my initial answer: SURF will be of no use here.
Instead, a standard way is to binarize each bounding box (to 0/1 depending on background/letter), normalize the bounding boxes to a standard size, and train a classifier from there.
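A small sketch of that normalization step (the 16x16 patch size is an arbitrary choice):

```cpp
#include <opencv2/opencv.hpp>

// Binarize a grayscale crop, resize to a fixed patch, and flatten into
// a float row vector that any OpenCV ml classifier can consume.
cv::Mat normalizeGlyph(const cv::Mat& grayCrop) {
    cv::Mat bin, small, feat;
    cv::threshold(grayCrop, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    cv::resize(bin, small, cv::Size(16, 16), 0, 0, cv::INTER_AREA);
    small.convertTo(feat, CV_32F, 1.0 / 255.0);  // scale to 0..1
    return feat.reshape(1, 1);                   // 1 x 256 row vector
}
```

Stack these row vectors into one matrix and you have the training set for whichever classifier you pick.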
There are several tutorials and blog posts on the web about how to do digit recognition using neural networks or SVMs; you just have to replace the digits with your letters.
Your work is almost done! Training and using a classifier is tedious but straightforward.