Hello, I'm trying to detect characters in this specific image.
Original Image:
After some preprocessing I've finally got this image:
Currently I can detect single characters using the SURF feature detection and matching algorithm.
I can also detect single characters with template matching.
But with these algorithms I cannot really classify my matches to determine which character is which.
After some research and testing I decided to extract HOG descriptors and train an SVM on them to classify the characters.
When I cropped the characters for SURF and template matching, I cropped them manually in Photoshop, so their sizes are not equal to each other. But the HOG algorithm uses a fixed width x height, so every character has to be the same size.
So my question is: since I am going to use this method only on the original image, can I manually crop and resize every character for HOG? Or do I have to first detect each character, extract it, and then prepare the extracted characters for HOG? If the latter, what kind of methods would you suggest?
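For what it's worth, here is a minimal sketch of the fixed-size constraint using OpenCV's HOGDescriptor in Python; the 32x32 window and the block/cell sizes are assumptions to be tuned, not recommended values:

    import cv2

    # HOG needs a fixed window, so every character crop must be
    # resized to the same dimensions before computing the descriptor.
    WIN_SIZE = (32, 32)  # assumed size; tune for your characters
    hog = cv2.HOGDescriptor(WIN_SIZE, (16, 16), (8, 8), (8, 8), 9)

    def hog_features(char_img):
        # char_img: a grayscale crop of one character, any size
        resized = cv2.resize(char_img, WIN_SIZE)
        return hog.compute(resized).ravel()  # 1-D float32 feature vector

Whether the crops were made manually or by an automatic detector makes no difference to HOG, as long as they are all resized consistently.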
Related
I want to find the occurrences of a particular word in any webpage given as input.
I used a pyramid sliding window, where I generated HOG (Histogram of Oriented Gradients) features for all the sliding windows.
For now, I am comparing the HOG features of each window with the HOG features of the word I want to extract.
For the comparison of two HOG feature vectors, I am just taking the sum of (vector1[i] - vector2[i]) over all i.
However, the results are below expectations.
My query is: is there a better way to compare the HOG features of each window with those of the word I want to find?
Or should I train a classifier, like an SVM, to classify the HOG features of a window?
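(For concreteness: a plain signed sum lets positive and negative differences cancel out, which may explain the poor results. Standard vector distances look like this in NumPy:)

    import numpy as np

    def l2_distance(h1, h2):
        # Euclidean distance between HOG vectors: smaller = more similar
        return np.linalg.norm(h1 - h2)

    def cosine_similarity(h1, h2):
        # Cosine similarity: closer to 1 = more similar; less sensitive
        # to overall gradient magnitude than the L2 distance
        return np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-12)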
For training the classifier, I can have at most 100-200 positive examples of the word in my data set. And since for an SVM it's better to have an equal number of positive and negative examples, how do I restrict the non-word representations (negative examples) to 100-200?
For non-word data elements in the training set, I have :
1. ICDAR-2003 (this word data set does not contain the word I want to extract)
2. CIFAR image data set
The reason I am not extracting/finding this word in the HTML code is that the word can also occur in an image.
Moreover, since the word I want to find is fixed, how many images of the word should I have in the data set?
If you have a fixed font and are looking only for a particular word, here is a simple workaround:
https://stackoverflow.com/a/9647509/8682088
You have to extract the word box and resize it to, for example, 40x10 pixels. The grayscale pixel values can be your feature vector. Then you can train your SVM. It is primitive, but surprisingly effective.
It works perfectly fine with a fixed font and simple symbols.
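A rough sketch of that workaround with OpenCV's built-in SVM (the 40x10 size comes from above; the linear kernel and everything else here are assumptions):

    import cv2
    import numpy as np

    BOX_SIZE = (40, 10)  # (width, height)

    def word_feature(gray_box):
        # Resize the extracted word box and use the raw grayscale
        # pixel values as the feature vector
        return cv2.resize(gray_box, BOX_SIZE).astype(np.float32).ravel()

    def train_svm(boxes, labels):
        # boxes: grayscale word-box images; labels: 1 = target word, 0 = other
        X = np.vstack([word_feature(b) for b in boxes])
        y = np.asarray(labels, dtype=np.int32)
        svm = cv2.ml.SVM_create()
        svm.setType(cv2.ml.SVM_C_SVC)
        svm.setKernel(cv2.ml.SVM_LINEAR)
        svm.train(X, cv2.ml.ROW_SAMPLE, y)
        return svm

    # prediction: svm.predict(word_feature(box).reshape(1, -1))[1]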
I am developing an image recognition algorithm that finds characters on dirty panels from the real world. Specifically, the image is a car registration plate containing letters, digits and mud.
The algorithm must classify characters into two classes: alphabetic characters and digits. Is it possible to train an LBP or Haar cascade to discriminate between the two classes, and will the training result be stable given the shape variety of the digits?
Could you explain briefly or recommend a better method, please?
"The algorithm must classify characters into two classes: alphabet characters and digits.” - you forgot mud and background though technically you can add them to a broad category “other”. Haars cascades are used for something like face detection since they typically approximate wavelets on the middle spatial scale where faces have characteristic features. Your problem is different.You need to first understand your problem structure, read the literature and only then try to use a sheer force of learning algorithms. This book actually talks a bit about people starting to think about method first instead of analyzing the problem which is not always a good idea.
Technically, you first need to find the text in the image, which can be more challenging than recognizing it, given the current state-of-the-art OCR that is typically used as a library rather than created from scratch. To find text in the image, I suggest first doing adaptive thresholding to create a binary map (1 = foreground, that is, letters and numbers; 0 = background), then running connected components on the foreground, coupled with the SWT (stroke width transform): http://research.microsoft.com/pubs/149305/1509.pdf
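SWT itself is not shipped with OpenCV, but the thresholding and connected-components part of that pipeline could look roughly like this in the modern Python API (the block size and the character-proportion filter are assumptions):

    import cv2

    gray = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input

    # Adaptive threshold: 255 = foreground (strokes), 0 = background
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 15)

    # Connected components; keep blobs whose shape is plausible for a character
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    candidates = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if 10 < h < 100 and 0.2 < w / h < 1.5:  # assumed proportions
            candidates.append((x, y, w, h))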
I'm trying to create a simple OCR engine using OpenCV. I have this image: https://dl.dropbox.com/u/63179/opencv/test-image.png
I have saved all possible characters as images and am trying to detect these images in the input image.
From here I need to identify the code. I have been trying matchTemplate and FAST detection. Both seem to fail (or more likely: I'm doing something wrong).
When I used the matchTemplate method, I found the edges of both the input image and the reference images using Sobel. This provides a working result, but the accuracy is not good enough.
When using the FAST method, it seems I can't get any interesting descriptors from the cvExtractSURF method.
Any recommendations on the best way to read this kind of code?
Update 1 (2012-03-20)
I have had some progress. I'm trying to find the bounding rects of the characters but the matrix font is killing me. See the samples below:
My font: https://dl.dropbox.com/u/63179/opencv/IMG_0873.PNG
My font filled in: https://dl.dropbox.com/u/63179/opencv/IMG_0875.PNG
Other font: https://dl.dropbox.com/u/63179/opencv/IMG_0874.PNG
As seen in the samples, I can find the bounding rects for the less complex font, and if I can fill in the space between the dots in my font, it works there too. Is there a way to achieve this with OpenCV? If I can find the bounding box of each character, it will be much simpler to recognize the character.
Any ideas?
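For reference, one standard OpenCV way to fill the space between the dots is a morphological close (dilate, then erode); a rough sketch with the modern Python API, the kernel size being a guess that depends on the dot spacing:

    import cv2

    binary = cv2.imread("font_dots.png", cv2.IMREAD_GRAYSCALE)  # hypothetical binarized input

    # Close the gaps between the dots so each character becomes one blob
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # tune to dot spacing
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

    # The bounding rects then come from the contours of the merged blobs
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]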
Update 2 (2012-03-21)
Ok, I had some luck with finding the bounding boxes. See image:
https://dl.dropbox.com/u/63179/opencv/IMG_0891.PNG
I'm not sure where to go from here. I tried to use matchTemplate, but I guess that is not a good option in this case? I guess it is better suited to searching for an exact match in a bigger picture?
I tried to use SURF, but when I try to extract the descriptors with cvExtractSURF for each bounding box, I get 0 descriptors... Any ideas?
What method would be most appropriate for matching a bounding box against a reference image?
You're going the hard way with FAST+SURF, because they were not designed for this task.
In particular, FAST detects corner-like features that are ubiquitous in structure-from-motion but far less present in OCR.
Two suggestions:
Maybe build a feature vector from the number and locations of FAST keypoints. I think you can rapidly check whether these features are discriminant enough, and if so, train a classifier from that.
(The one I would choose myself.) Partition your image samples into smaller squares. Compute only the SURF descriptor for each square and concatenate all of them to form the feature vector for a given sample. Then train a classifier with these feature vectors.
Note that option 2 works with any descriptor that you can find in OpenCV (SIFT, SURF, FREAK...).
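A sketch of option 2 with the modern OpenCV Python API, using SIFT on a fixed grid of keypoints (the grid step and keypoint size are assumptions; any of the descriptors above would fit the same skeleton):

    import cv2

    sift = cv2.SIFT_create()

    def grid_descriptor(gray, step=8, size=8):
        # Place keypoints on a regular grid and compute one descriptor
        # per grid cell; concatenate them into one feature vector.
        # All samples must share the same size so vectors have equal length.
        h, w = gray.shape
        kps = [cv2.KeyPoint(float(x), float(y), float(size))
               for y in range(step // 2, h, step)
               for x in range(step // 2, w, step)]
        _, desc = sift.compute(gray, kps)
        return desc.ravel()  # train any classifier on vectors like this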
Answer to update 1
Here is a little trick that senior people taught me when I started.
On your image with the dots, you can project your binarized data to the horizontal and vertical axes.
By searching for holes (disconnections) in the projected patterns, you are likely to recover almost all the bounding boxes in your example.
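A minimal NumPy sketch of that projection trick, assuming a binarized image where ink pixels are 1 and background is 0:

    import numpy as np

    def runs(profile):
        # Return (start, end) index pairs of the non-zero runs in a 1-D profile
        nz = np.flatnonzero(profile > 0)
        if nz.size == 0:
            return []
        breaks = np.flatnonzero(np.diff(nz) > 1)
        starts = np.r_[nz[0], nz[breaks + 1]]
        ends = np.r_[nz[breaks], nz[-1]]
        return list(zip(starts, ends))

    def bounding_boxes(binary):
        boxes = []
        for top, bottom in runs(binary.sum(axis=1)):       # text lines
            line = binary[top:bottom + 1]
            for left, right in runs(line.sum(axis=0)):     # characters
                boxes.append((left, top, right, bottom))
        return boxes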
Answer to update 2
At this point, you're back to my initial answer: SURF will do no good here.
Instead, a standard way is to binarize each bounding box (to 0-1 depending on background/letter), normalize the bounding boxes to a standard size, and train a classifier from there.
There are several tutorials and blog posts on the web about how to do digit recognition using neural networks or SVMs; you just have to replace the digits with your letters.
Your work is almost done! Training and using a classifier is tedious but straightforward.
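A sketch of that binarize-and-normalize step with OpenCV, feeding a k-NN classifier in the style of the classic digit-recognition tutorials (the 16x16 cell size is an assumption):

    import cv2
    import numpy as np

    CELL = (16, 16)  # assumed normalized size

    def box_feature(gray, box):
        # Binarize one bounding box and normalize it to a standard size
        x, y, w, h = box
        crop = gray[y:y + h, x:x + w]
        _, binary = cv2.threshold(crop, 0, 1,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        return cv2.resize(binary, CELL).astype(np.float32).ravel()

    # With labeled reference characters stacked into train_X / train_y:
    # knn = cv2.ml.KNearest_create()
    # knn.train(train_X, cv2.ml.ROW_SAMPLE, train_y)
    # _, result, _, _ = knn.findNearest(box_feature(gray, box).reshape(1, -1), k=3)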
I am a beginner in image mining. I would like to know the minimum dimensions required for effective classification of textured images. My feeling is that if an image is too small, the feature extraction step will not extract enough features, and if the image size goes beyond a certain dimension, the processing time will increase steeply with image size.
This is a complex question that requires a bit of thinking.
Short answer: It depends.
Long answer: It depends on the type of texture you want to classify and the type of feature your classification is based on. If the extracted feature is, say, color only, you can use a "texture" as small as 1x1 pixel (in that case, using the word "texture" is a bit of an abuse). If you want to classify, for example, characters, you can usually extract a lot of local information from edges (Hough transform, Gabor filters, etc.). The image plane just has to be big enough to hold the characters (say, 16x16 pixels for the Latin alphabet).
If you want to be able to classify any kind of image in any number of classes, you can also base your classification on global information, like entropy, correlogram, energy, inertia, cluster shade, cluster prominence, color and correlation. Those features are used for content-based image retrieval.
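Several of those global statistics (energy, inertia/contrast, correlation, entropy) can be read off a grey-level co-occurrence matrix; a sketch using scikit-image, which is an assumed dependency here:

    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    def texture_features(gray):
        # Global GLCM statistics of an 8-bit grayscale image
        glcm = graycomatrix(gray, distances=[1], angles=[0],
                            levels=256, symmetric=True, normed=True)
        feats = {prop: graycoprops(glcm, prop)[0, 0]
                 for prop in ("energy", "contrast", "correlation", "homogeneity")}
        p = glcm[:, :, 0, 0]
        feats["entropy"] = -np.sum(p[p > 0] * np.log2(p[p > 0]))
        return feats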
Off the top of my head, I would try using textures as small as 32x32 pixels if the kind of texture is a priori unknown. If, on the contrary, the kind of texture is a priori known, I would choose one or more features that I know would classify the images according to my needs (1x1 pixel for color only, 16x16 pixels for characters, etc.). Again, it really depends on what you are trying to achieve. There isn't a unique answer to your question.
As we know, the Fourier transform is sensitive to noise (like salt and pepper),
so how can it still be used for image recognition?
Is there an FT expert here?
Update to actually answer the question you asked... :) Pre-process the image with a non-linear filter to suppress the salt & pepper noise. Median filter maybe?
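For instance, with OpenCV (the 3x3 kernel is a guess; larger kernels remove more noise but blur more detail):

    import cv2

    img = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
    # The median filter replaces each pixel by the median of its neighborhood,
    # which removes isolated salt & pepper pixels without smearing edges much
    denoised = cv2.medianBlur(img, 3)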
A basic lesson on FFTs and matched filters follows...
The classic way of detecting a smaller image within a larger image is the matched filter. Essentially, this involves doing a cross correlation of the larger image with the smaller image (the thing you're trying to recognize).
    For every position in the larger image
        Overlay the smaller image on the larger image
        Multiply all corresponding pixels
        Sum the results
        Put that sum in this position in the filtered image
The matched filter is optimal where the only noise in the larger image is white noise.
This IS computationally slow, but it can be decomposed into FFT (fast Fourier transform) operations, which are much more efficient. There are much more sophisticated approaches to image matching that tolerate other types of noise much better than the matched filter does. But few are as efficient as the matched filter implemented using FFTs.
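A bare-bones NumPy version of the FFT-based matched filter sketched above (no normalization or padding against wrap-around, which a practical implementation would add):

    import numpy as np

    def matched_filter(large, small):
        # Cross-correlate 'small' against 'large' via the FFT and return
        # the (row, col) offset of the best match. Note this is circular
        # correlation; pad both images in practice to avoid wrap-around.
        H, W = large.shape
        F_large = np.fft.fft2(large)
        F_small = np.fft.fft2(small, s=(H, W))  # zero-pad template to full size
        corr = np.fft.ifft2(F_large * np.conj(F_small)).real
        return np.unravel_index(np.argmax(corr), corr.shape)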
Google "matched filter", "cross correlation" and "convolution filter" for more.
For example, here's one brief explanation that also points out the drawbacks of this very old-school image matching approach: http://www.dspguide.com/ch24/6.htm
Not sure exactly what you're asking. If you are asking about how FFT can be used for image recognition, here are some thoughts.
The FFT can be used to perform image "classification". It can't be used to recognize different faces or objects, but it can be used to classify the type of image. The FFT measures the spatial frequency content of the image. So, for example, a natural scene, a face, and a city scene will have different FFTs. Therefore you can classify an image, or even regions within an image (e.g. classifying terrain in an aerial photo).
Also, the FFT is used in pre-processing for image recognition. It can be used in OCR (optical character recognition) to rotate the scanned image into the correct orientation, since the FFT of typed text has a strong orientation. The same applies to parts inspection in industrial automation.
I don't think you'll find many methods in use that rely on Fourier Transforms for image recognition.
In the case of salt and pepper noise, it can be considered high-frequency noise, and thus you could low-pass filter your FFT before making a comparison with the target image.
I would imagine that it would work, but also that different images that are somewhat similar (like two photographs both taken outdoors) would register as being the same image.
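A sketch of that low-pass step in NumPy (the cutoff radius is an arbitrary assumption):

    import numpy as np

    def lowpass(img, cutoff=30):
        # Zero out spatial frequencies beyond 'cutoff' before comparing images
        F = np.fft.fftshift(np.fft.fft2(img))
        h, w = img.shape
        y, x = np.ogrid[:h, :w]
        mask = (y - h / 2) ** 2 + (x - w / 2) ** 2 <= cutoff ** 2
        return np.fft.ifft2(np.fft.ifftshift(F * mask)).real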