I have implemented Tesseract OCR for text recognition on iOS. I preprocess the input image with the following steps and then pass it to Tesseract, but the recognition results are poor:
1. Erode function
2. Dilate function
3. Bitwise_not function
cv::Mat MCRregion; // grayscale crop of the MRZ region (loading/cropping code not shown)
// structuring element for the morphology ops; 3x3 is an illustrative size, tune it to the stroke width
cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
cv::dilate(MCRregion, MCRregion, kernel);
cv::erode(MCRregion, MCRregion, kernel);
cv::bitwise_not(MCRregion, MCRregion);
UIImage *croppedMCRregion = [self UIImageFromCVMat:MCRregion];
Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
[tesseract setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.>,'`;-:</" forKey:@"tessedit_char_whitelist"];
[tesseract setImage:croppedMCRregion];
// [tesseract setImage:image];
[tesseract recognize];
NSLog(@"%@", [tesseract recognizedText]);
Input image: [linked image not reproduced]
1. How can I improve the text recognition rate of Tesseract?
2. Does Tesseract apply any other preprocessing steps?
3. Does Tesseract OCR dewarp text?
Tesseract is a highly configurable piece of software, though its configuration options are poorly documented (unless you want to dig deep into its 150K lines of code). A good, comprehensive list is available here: http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version.
Also look at https://code.google.com/p/tesseract-ocr/wiki/ControlParams and https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality.
You can improve the quality tremendously if you feed Tesseract more information about the data you're OCR'ing.
For example, if the images are all national IDs or passports that follow a standard MRZ format, you can configure Tesseract to use that information.
For the image you attached (an MRZ), I got the following result [result image not reproduced] by using this config:
# disable dictionaries, frequency tables, etc., which would distract Tesseract when OCR'ing an MRZ
load_system_dawg F
load_freq_dawg F
load_unambig_dawg F
load_punc_dawg F
load_number_dawg F
load_fixed_length_dawgs F
load_bigram_dawg F
wordrec_enable_assoc F
# the MRZ allows only these characters
tessedit_char_whitelist 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ<
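The whitelist only restricts which characters Tesseract may output. Another piece of MRZ format knowledge you can use after OCR (outside Tesseract itself) is the check digit defined in ICAO Doc 9303: fields such as the document number and the dates are each followed by a digit computed with the repeating weights 7, 3, 1. A minimal Python sketch, assuming the OCR'd line has already been split into fields; the function names are hypothetical:

```python
def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit of an MRZ field.

    Digits keep their value, letters A-Z count as 10-35 and the filler '<'
    counts as 0; the values are weighted 7, 3, 1, 7, 3, 1, ... and summed.
    """
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"{ch!r} is not a valid MRZ character")
        total += value * weights[i % 3]
    return total % 10


def field_is_consistent(field: str, check_char: str) -> bool:
    """True if the OCR'd check character agrees with the field's content."""
    if len(check_char) != 1 or check_char not in "0123456789":
        return False
    return mrz_check_digit(field) == int(check_char)


print(mrz_check_digit("740812"))             # 2
print(field_is_consistent("L898902C3", "6"))  # True
```

A field whose check digit does not match is a good candidate for character substitutions or for re-running OCR on that region.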
Also make sure your installation is trained for the relevant fonts to get more accurate results. In your case it seems to be the OCR-B font.
It is not strictly necessary to go through the tedious task of retraining Tesseract. Yes, you will get much better results, but in some cases you can get pretty far with the eng training set.
You can improve your results by paying attention to the following things:
Use a binary image as input and make sure you have black text on a white background.
By default, Tesseract will try to make words out of things that have no spacing. Try to segment each character separately and place the characters in a new image with lots of spacing. Especially if you have combinations of letters and numbers, Tesseract will "correct" characters to match the surrounding ones.
Try to segment different parts of your image, with a whitelist for the characters you know should be in each part. If you're only looking for digits in the first part, use a separate instance of Tesseract with a digits-only whitelist to detect them.
If you use the same object multiple times without resetting it, Tesseract seems to have a memory. That means you can get a different result each time you perform OCR. You can reset Tesseract to counter this, or just create a new object.
Last but not least, use the ResultIterator to go through the boxes that Tesseract returns. You can check the size and confidence of each character and filter accordingly.
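For the first point (a binary, black-on-white input), here is a minimal, dependency-free Python sketch of Otsu's method, the same kind of global threshold OpenCV computes with cv2.threshold and the cv2.THRESH_OTSU flag. It assumes an 8-bit grayscale image stored as a list of rows and that the background covers more pixels than the text; in a real pipeline you would call OpenCV rather than this loop.

```python
def otsu_threshold(gray):
    """Otsu's threshold for a 2-D list of 8-bit gray values (0-255)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels <= t
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # between-class variance (up to a constant factor)
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize_black_on_white(gray):
    """Threshold with Otsu, then invert if needed so text is 0 on a 255 background.

    Assumption: the majority class after thresholding is the background.
    """
    t = otsu_threshold(gray)
    binary = [[255 if v > t else 0 for v in row] for row in gray]
    white = sum(v == 255 for row in binary for v in row)
    black = sum(len(row) for row in binary) - white
    if black > white:  # background came out black: invert, like cv::bitwise_not
        binary = [[255 - v for v in row] for row in binary]
    return binary


print(binarize_black_on_white([[10, 10, 10], [10, 240, 10], [10, 10, 10]]))
# [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
```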
Based on my experience:
1. How can I improve the text recognition rate of Tesseract?
First, preprocessing. Ensure that the input image is a binary image produced with a good threshold. OpenCV has a good set of functions for thresholding, such as Otsu's method, as well as contour detection, which helps with correcting warping and rotation.
You can also use contour detection in OpenCV to distinguish between lines of text.
Some filtering will also remove noise, which often confuses Tesseract and increases processing time.
Set up proper configurations for Tesseract (e.g. eng.config). The full list of config variables is here (http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version). Some examples include blacklists, whitelists, chopping, etc.
Use proper flags, e.g. -psm 6 if you are recognizing blocks of text rather than single lines.
Having trained my own language data, I would say do so only if you have lots of time and resources, or if your font is very peculiar (e.g. dot matrix).
More recent versions of Tesseract (3.0x) allow multiple language files to be used in the same pass (-l one+two). This means you can have one specially trained for text and another for numbers. In our case, it seemed to work well.
Postprocessing of Tesseract results was particularly important for us too: string replacements for typical misrecognitions and so on.
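The string-replacement step can be sketched as follows, assuming you know the expected layout of each field. The confusion tables and the pattern syntax ('A' for a letter position, '9' for a digit position) are illustrative assumptions, not part of Tesseract; build yours from the errors you actually observe.

```python
# Hypothetical confusion tables: typical look-alike characters.
LETTER_TO_DIGIT = {"O": "0", "Q": "0", "D": "0", "I": "1", "L": "1",
                   "Z": "2", "S": "5", "B": "8", "G": "6"}
DIGIT_TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "8": "B", "6": "G"}


def fix_by_pattern(text, pattern):
    """Replace look-alike characters position by position.

    pattern: 'A' = this position must be a letter, '9' = must be a digit,
    anything else = leave the character untouched.
    """
    if len(text) != len(pattern):
        return text  # unexpected layout: leave it for manual review
    out = []
    for ch, kind in zip(text, pattern):
        if kind == "A":
            out.append(DIGIT_TO_LETTER.get(ch, ch))
        elif kind == "9":
            out.append(LETTER_TO_DIGIT.get(ch, ch))
        else:
            out.append(ch)
    return "".join(out)


print(fix_by_pattern("AC2 4529", "AAA 9999"))  # ACZ 4529
```

The plate layout "AAA 9999" in the usage line is a hypothetical example; the point is that position-dependent rules are safer than global replacements, which would also corrupt correct characters.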
2. Does Tesseract apply any other preprocessing steps?
Tesseract uses the Leptonica library for preprocessing.
3. Does Tesseract OCR dewarp text?
I am inclined to think so, considering that dewarping functions are part of Leptonica.
I'm learning image processing. My problem is segmentation in RGB vector space: how do I implement the Euclidean distance (Formula 6.7-1, Chapter 6 of Gonzalez's image processing book) to segment an RGB image in C? Thanks.
Minimum requirements to solve this problem:
1) learn C (to at least a medium-advanced level)
Presuming that you're not going to decode JPEGs, or whatever image format you have, from scratch:
2) learn how to use libraries in C.
3) find a library that allows you to read and write the image file format at hand
4) implement the algorithm and apply it to the image data
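The distance in question is the Euclidean distance between a pixel's color z and the average color a of a sample region: D(z, a) = [(z_R - a_R)^2 + (z_G - a_G)^2 + (z_B - a_B)^2]^(1/2); a pixel is assigned to the region of interest when D(z, a) <= D0 for a threshold D0 you choose. Step 4 can be sketched as below, in Python for brevity (the nested loops map directly onto loops over a pixel buffer in C). It assumes the image has already been decoded into rows of (R, G, B) tuples; the function names are illustrative.

```python
import math


def mean_color(samples):
    """Average color a of a list of (R, G, B) sample pixels."""
    n = len(samples)
    return tuple(sum(p[c] for p in samples) / n for c in range(3))


def euclidean_distance(z, a):
    """D(z, a) = sqrt((zR-aR)^2 + (zG-aG)^2 + (zB-aB)^2)."""
    return math.sqrt(sum((zc - ac) ** 2 for zc, ac in zip(z, a)))


def segment_rgb(image, a, d0):
    """Binary mask: 1 where D(z, a) <= d0, else 0.

    image is a list of rows, each row a list of (R, G, B) tuples.
    """
    return [[1 if euclidean_distance(z, a) <= d0 else 0 for z in row]
            for row in image]


image = [[(250, 10, 10), (10, 10, 250)],
         [(240, 20, 20), (0, 0, 0)]]
a = mean_color([(250, 10, 10), (240, 20, 20)])  # reddish sample region
print(segment_rgb(image, a, 30))  # [[1, 0], [1, 0]]
```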
I need to recognize images with hand-written numerals with known values. The physical objects with the numbers are always identical but come with slight variations in position, scale and lighting. There are about 100 of them, each about 100x500 px in size.
In the first pass, the code should "learn" possible inputs, and then recognize them (classify them as being close to one of the "training" images) when they come again.
I was mostly following the Feature Matching Python-OpenCV tutorial.
Input images are analyzed first, keypoints & descriptors are remembered in the orbTrained list:
import cv2
import collections

for img in trainingImgs:
    ...  # (loop body not shown in the question)
A typical result of this first stage looks like this: [image not reproduced]
Then in the next loop, for each real input image, cycle through all training images to see which is matching the best:
for img in inputImgs:
    ...  # (parts of the loop not shown in the question)
    for train in orbTrained:
        ...
        dist = sum([m_.distance for m_ in m])
        ...
    # sort matching images based on score
    mm.sort(key=lambda m: m.dist)
    print([m.dist for m in mm[:5]])
    ...
    best.match.sort(key=lambda x: x.distance)  # sort matches in the best match
The result I get is nonsensical, and consistently so (only when I run with pixel-identical input is the result correct): [result not reproduced]
What is the problem? Am I completely misunderstanding what to do, or do I just need to tune some parameters?
First, are you sure you're not reinventing the wheel by creating your own OCR library? There are many free frameworks, some of which support training with custom character sets.
Second, you should understand what feature matching is. It finds similar small areas, but isn't aware of other feature pairs. It will match similar corners of characters, not the characters themselves. You might experiment with a larger patchSize so that it covers at least half of the digit.
You can minimize false pairs by running feature detection only on a single digit at a time using thresholding and contours to find character bounds.
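Finding character bounds can be sketched without OpenCV: after thresholding, group ink pixels into 8-connected blobs and take each blob's bounding box as one character. This is plain connected-component labelling, used here as a stand-in for OpenCV's cv2.findContours plus cv2.boundingRect; the function name and the list-of-rows binary format are assumptions. Each box can then be cropped and passed to the feature detector on its own.

```python
from collections import deque


def character_boxes(binary):
    """Bounding boxes (x, y, w, h) of 8-connected blobs of 1-pixels.

    binary: list of rows, 1 = ink, 0 = background. Boxes are sorted by x,
    i.e. left to right as characters on a line.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # breadth-first flood fill of one blob, tracking its extent
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return sorted(boxes)


print(character_boxes([[1, 1, 0, 0, 1],
                       [1, 0, 0, 0, 1],
                       [0, 0, 0, 1, 1]]))  # [(0, 0, 2, 2), (3, 0, 2, 3)]
```

Hand-written digits with broken strokes will come out as several boxes, so in practice you may need to merge boxes that overlap horizontally.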
If the text isn't rotated, a rotation-invariant feature descriptor such as ORB isn't the best option; try a rotation-variant one instead, such as BRIEF descriptors on FAST keypoints (FAST by itself is only a keypoint detector).
According to the authors of the ORB and ORB-SLAM papers, ORB is invariant to rotation and scale only "in a certain range". Maybe you should first test matching with small scale or rotation changes.
I am making license plate recognition software. I have already trained my language using SunnyPage 2.7. Plate detection currently works well, but Tesseract is not giving me good recognition results. For example, it reads this plate [image not reproduced] as AC2 4529, which is close, but when I load the same image in SunnyPage with my language I get ACZ 4529, which is correct. I ended up configuring Tesseract with tess.setPageSegMode(10) (single-character mode), segmenting the individual characters and processing each character one by one in Tesseract. That increased accuracy, but not by much. Below is my Tesseract configuration:
import net.sourceforge.tess4j.Tesseract;

Tesseract instance = new Tesseract();
instance.setTessVariable("tessedit_char_whitelist", "ACPBZRT960847152");
instance.setTessVariable("load_system_dawg", "false");
instance.setTessVariable("load_freq_dawg", "false");
Does anyone know how I can get results as good as SunnyPage's? As far as I know my image is good: it is deskewed and well segmented, so the problem most likely lies with Tesseract alone.
The best thing to do would be to train tesseract with actual images of your license plates. This will make your results much more accurate because tesseract will actually know what a Z and 2 look like and it will recognize them much more accurately.
I need to improve image quality, from low quality to high (HD) quality. I am using the OpenCV libraries. I experimented a lot with GaussianBlur(), Laplacian(), transformation functions, filter functions, etc., but all I managed was to convert the image to HD resolution while keeping the same quality. Is it possible to do this? Do I need to implement my own algorithm, or is there an established way to do it? I will really appreciate any kind of help. Thanks in advance.
I used this link for my reference. It has other interesting filters that you can play with.
If you are using C++:
void cv::detailEnhance(InputArray src, OutputArray dst, float sigma_s = 10, float sigma_r = 0.15f)
If you are using python:
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
The parameter sigma_s determines how large the pixel neighbourhood used for filtering is (its range is 0 to 200).
The parameter sigma_r determines how the different colours within the neighbourhood are averaged with each other. Its range is 0 to 1. A smaller value means only similar colours are averaged, while dissimilar colours remain as they are.
Since you are looking for sharpness in the image, I would suggest keeping the kernel as small as possible.
Here is the result I obtained for a sample image:
[Images not reproduced: 1. original image; 2. sharpened image with a lower sigma_r value; 3. sharpened image with a higher sigma_r value.]
Check the above mentioned link for more information.
How about applying Super Resolution in OpenCV? A reference article with more details can be found here: https://learnopencv.com/super-resolution-in-opencv/
So basically you will need the Python package opencv-contrib-python installed. It already includes the main modules of opencv-python, and the OpenCV pip packaging notes advise installing only one of these packages in a given environment.
There are different techniques for the Super Resolution in OpenCV you can choose from, including EDSR, ESPCN, FSRCNN, and LapSRN. Code examples in both Python and C++ have been included in the tutorial article as well for easy reference.
A correction is needed; the call should be:
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
Passing a kernel argument will give an error.
+1 to Kris Stern's answer.
If you are looking for a practical implementation of super resolution using a pretrained model in OpenCV, have a look at the notebook below and the video describing the details.
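Below is sample code using OpenCV (the model file name and input path are examples; the .pb file must be downloaded separately):
import cv2
modelname, scale = "edsr", 4  # algorithm name and upscaling factor; must match the model file
model_pretrained = cv2.dnn_superres.DnnSuperResImpl_create()
# load the pretrained weights
model_pretrained.readModel("EDSR_x4.pb")
# setting up the model initialization
model_pretrained.setModel(modelname, scale)
img_small = cv2.imread("input.png")
# prediction or upscaling
img_upscaled = model_pretrained.upsample(img_small)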
I need a simple and fast way to compare two images for similarity, i.e. I want to get a high value if they contain exactly the same thing but may have a slightly different background and may be moved or resized by a few pixels.
(More concretely, if that matters: one picture is an icon and the other is a subarea of a screenshot, and I want to know whether that subarea is exactly the icon.)
I have OpenCV at hand but I am still not that used to it.
One possibility I thought about so far: Divide both pictures into 10x10 cells and for each of those 100 cells, compare the color histogram. Then I can set some made up threshold value and if the value I get is above that threshold, I assume that they are similar.
I haven't tried yet how well that works, but I guess it would be good enough. The images are already very similar (in my use case), so I can use a pretty high threshold value.
I guess there are dozens of other possible solutions for this which would work more or less (as the task itself is quite simple as I only want to detect similarity if they are really very similar). What would you suggest?
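The cell-histogram idea from the question can be sketched in a few lines. This is a minimal, dependency-free version under stated assumptions: grayscale input as lists of rows, 16 bins per cell, and histogram intersection as the per-cell score (OpenCV provides the same kind of comparison through cv2.calcHist and cv2.compareHist with cv2.HISTCMP_INTERSECT). The threshold is still yours to choose.

```python
def cell_histograms(gray, cells=10, bins=16):
    """Split a grayscale image (rows of 0-255 values) into cells x cells
    blocks and return one normalized histogram per block."""
    h, w = len(gray), len(gray[0])
    if h < cells or w < cells:
        raise ValueError("image must be at least cells x cells pixels")
    hists = []
    for cy in range(cells):
        y0, y1 = cy * h // cells, (cy + 1) * h // cells
        for cx in range(cells):
            x0, x1 = cx * w // cells, (cx + 1) * w // cells
            hist = [0] * bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    hist[gray[y][x] * bins // 256] += 1
            n = (y1 - y0) * (x1 - x0)
            hists.append([v / n for v in hist])
    return hists


def similarity(gray_a, gray_b, cells=10, bins=16):
    """Mean histogram intersection over all cells: 1.0 = identical histograms,
    0.0 = no overlap. The images may differ in size; cells are proportional."""
    ha = cell_histograms(gray_a, cells, bins)
    hb = cell_histograms(gray_b, cells, bins)
    scores = [sum(min(p, q) for p, q in zip(a, b)) for a, b in zip(ha, hb)]
    return sum(scores) / len(scores)
```

Because each cell only looks at its own pixel distribution, a shift of a few pixels changes the score gradually rather than abruptly, which matches the tolerance the question asks for.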
There are a few very related / similar questions about obtaining a signature/fingerprint/hash from an image:
OpenCV / SURF How to generate a image hash / fingerprint / signature out of the descriptors?
Image fingerprint to compare similarity of many images
Near-Duplicate Image Detection
OpenCV: Fingerprint Image and Compare Against Database.
Also, I stumbled upon these implementations which have such functions to obtain a fingerprint:
imgSeek (GitHub repo) (GPL) based on the paper Fast Multiresolution Image Querying
image-match. Very similar to what I was searching for. Similar to pHash, based on An image signature for any kind of image, Goldberg et al. Uses Python and Elasticsearch.
ImageHash. Supports pHash.
Image Deduplicator (imagededup). Supports CNN, PHash, DHash, WHash, AHash.
Some discussions about perceptual image hashes: here
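To make the idea of a perceptual hash concrete, here is a minimal sketch of a difference hash (dHash), one of the hash families that libraries such as ImageHash and imagededup implement. It is a simplified, dependency-free version: it uses nearest-neighbour downsampling instead of the antialiased resize those libraries use, so its hashes will not match theirs bit for bit. Two images count as near-duplicates when the Hamming distance between their hashes is small; the cut-off is an assumption you need to tune.

```python
def dhash(gray, hash_size=8):
    """Difference hash of a grayscale image (rows of 0-255 values).

    The image is sampled down to hash_size rows x (hash_size + 1) columns,
    then each bit records whether a pixel is brighter than its right-hand
    neighbour. Returns an int with hash_size * hash_size bits.
    """
    h, w = len(gray), len(gray[0])
    rows, cols = hash_size, hash_size + 1
    small = [[gray[r * h // rows][c * w // cols] for c in range(cols)]
             for r in range(rows)]
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


img = [[(x * 37 + y * 11) % 200 for x in range(40)] for y in range(30)]
brighter = [[v + 50 for v in row] for row in img]
print(hamming(dhash(img), dhash(brighter)))  # 0: a uniform brightness shift keeps every comparison
```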
A bit off-topic: there exist many methods to create audio fingerprints. MusicBrainz, a web service which provides fingerprint-based lookup for songs, has a good overview in their wiki. They are using AcoustID now. This is for finding exact (or mostly exact) matches. For finding similar matches (or if you only have some snippets or high noise), take a look at Echoprint. A related SO question is here. So it seems like this is solved for audio. All these solutions work quite well.
A somewhat more generic question about fuzzy search in general is here. E.g. there is locality-sensitive hashing and nearest neighbor search.
Can the screenshot or icon be transformed (scaled, rotated, skewed ...)? There are quite a few methods off the top of my head that could possibly help you:
Simple Euclidean distance as mentioned by @carlosdc (doesn't work with transformed images and you need a threshold).
(Normalized) cross-correlation: a simple metric which you can use for comparison of image areas. It's more robust than the simple Euclidean distance but doesn't work on transformed images, and you will again need a threshold.
Histogram comparison: if you use normalized histograms, this method works well and is not affected by affine transforms. The problem is determining the correct threshold. It is also very sensitive to color changes (brightness, contrast etc.). You can combine it with the previous two.
Detectors of salient points/areas, such as MSER (Maximally Stable Extremal Regions), SURF or SIFT. These are very robust algorithms, and they might be too complicated for your simple task. The good thing is that you do not need an exact area containing only the icon; these detectors are powerful enough to find the right match. A nice evaluation of these methods is in this paper: Local invariant feature detectors: a survey.
Most of these are already implemented in OpenCV; see for example the cvMatchTemplate method (template matching with squared-difference and (normalized) cross-correlation scores): http://dasl.mem.drexel.edu/~noahKuntz/openCVTut6.html. The salient point/area detectors are also available; see OpenCV Feature Detection.
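A minimal sketch of the normalized cross-correlation score, assuming two equally sized grayscale patches given as lists of rows. This is the zero-mean variant, which corresponds to what cv2.matchTemplate computes with cv2.TM_CCOEFF_NORMED at a single position; the threshold for "same icon" remains an assumption to tune.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Returns a value in [-1, 1]; 1 means identical up to a brightness offset
    and contrast gain. Flat patches (zero variance) return 0.0 by convention.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("patches must have the same size")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


icon = [[1, 2], [3, 4]]
print(ncc(icon, [[12, 14], [16, 18]]))  # 1.0: same pattern, different gain/offset
```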
I faced the same issue recently. To solve this problem (a simple and fast algorithm to compare two images) once and for all, I contributed an img_hash module to opencv_contrib; you can find the details at this link.
The img_hash module provides six image hash algorithms and is quite easy to use.
Code example
[Images not reproduced: original Lena, blurred Lena, resized Lena, shifted Lena.]
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/img_hash.hpp>
#include <opencv2/imgproc.hpp>

#include <iostream>

void compute(cv::Ptr<cv::img_hash::ImgHashBase> algo)
{
    auto input = cv::imread("lena.png");
    if (input.empty()) {
        std::cerr << "cannot read lena.png" << std::endl;
        return;
    }
    cv::Mat similar_img;

    //detect similar image after blur attack
    cv::GaussianBlur(input, similar_img, {7,7}, 2, 2);
    cv::imwrite("lena_blur.png", similar_img);
    cv::Mat hash_input, hash_similar;
    algo->compute(input, hash_input);
    algo->compute(similar_img, hash_similar);
    std::cout<<"gaussian blur attack : "<<
               algo->compare(hash_input, hash_similar)<<std::endl;

    //detect similar image after shift attack (content moved up by 10 rows)
    similar_img = cv::Mat::zeros(input.size(), input.type());
    input(cv::Rect(0,10, input.cols,input.rows-10)).
            copyTo(similar_img(cv::Rect(0,0, input.cols,input.rows-10)));
    cv::imwrite("lena_shift.png", similar_img);
    algo->compute(similar_img, hash_similar);
    std::cout<<"shift attack : "<<
               algo->compare(hash_input, hash_similar)<<std::endl;

    //detect similar image after resize
    cv::resize(input, similar_img, {120, 40});
    cv::imwrite("lena_resize.png", similar_img);
    algo->compute(similar_img, hash_similar);
    std::cout<<"resize attack : "<<
               algo->compare(hash_input, hash_similar)<<std::endl;
}

int main()
{
    using namespace cv::img_hash;

    //disabling opencl acceleration may (or may not) speed up img_hash
    cv::ocl::setUseOpenCL(false);

    //if the value after compare <= 8, the images are
    //very similar to each other
    compute(ColorMomentHash::create());

    //there are other algorithms you can try out
    //every algorithm has its pros and cons
    compute(AverageHash::create());
    compute(PHash::create());
    compute(MarrHildrethHash::create());
    compute(RadialVarianceHash::create());
    //BlockMeanHash supports mode 0 and mode 1, they correspond to
    //mode 1 and mode 2 of the PHash library
    compute(BlockMeanHash::create(0));
    compute(BlockMeanHash::create(1));
}
In this case, ColorMomentHash gives us the best result:
gaussian blur attack : 0.567521
shift attack : 0.229728
resize attack : 0.229358
Pros and cons of each algorithm: [table image not reproduced]
The performance of img_hash is good too
Speed comparison with the PHash library (100 images from ukbench): [chart not reproduced]
If you want to know the recommended thresholds for these algorithms, please check this post (http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html).
If you are interested in how I measured the performance of the img_hash module (including speed and different attacks), please check this link (http://qtandopencv.blogspot.my/2016/06/speed-up-image-hashing-of-opencvimghash.html).
Does the screenshot contain only the icon? If so, the L2 distance of the two images might suffice. If the L2 distance doesn't work, the next step is to try something simple and well established, like Lucas-Kanade, which I'm sure is available in OpenCV.
If you want an index of the similarity of the two pictures, I suggest the SSIM index among the available metrics. It is more consistent with human visual perception. Here is an article about it: Structural Similarity Index
It is implemented in OpenCV too, and it can be accelerated with GPU: OpenCV SSIM with GPU
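A compact sketch of the formula behind SSIM, assuming two equally sized grayscale images as lists of rows. For brevity it evaluates the index once over the whole image; the standard SSIM computes it in small sliding windows (e.g. an 11x11 Gaussian window) and averages the results, which makes it more sensitive to local structure. The constants K1 = 0.01 and K2 = 0.03 are the usual defaults.

```python
def global_ssim(a, b, data_range=255):
    """Single-window SSIM of two equally sized grayscale images.

    SSIM = ((2*mx*my + c1) * (2*cov + c2)) /
           ((mx^2 + my^2 + c1) * (vx + vy + c2))
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    n = len(xs)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))


img = [[10, 20, 30], [40, 50, 60]]
print(global_ssim(img, img))  # 1.0 (up to floating-point rounding)
```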
If you can be sure to have precise alignment of your template (the icon) to the testing region, then any old sum of pixel differences will work.
If the alignment is only going to be a tiny bit off, then you can low-pass both images with cv::GaussianBlur before finding the sum of pixel differences.
If the quality of the alignment is potentially poor then I would recommend either a Histogram of Oriented Gradients or one of OpenCV's convenient keypoint detection/descriptor algorithms (such as SIFT or SURF).
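A sketch of the second suggestion (low-pass both images, then compare), assuming equally sized grayscale images as lists of rows. A simple box (mean) filter stands in for cv::GaussianBlur to keep the code dependency-free; both are low-pass filters, but the numbers will differ from OpenCV's. Blurring spreads each edge over neighbouring pixels, so a small misalignment produces a smaller difference than on the raw images.

```python
def box_blur(gray, radius=1):
    """Mean over a (2*radius+1)^2 window; the window is cropped at the borders."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += gray[yy][xx]
                    count += 1
            row.append(total / count)
        out.append(row)
    return out


def mean_abs_difference(a, b, radius=1):
    """Blur both images, then average the absolute pixel differences.
    0 means identical after blurring; larger means less similar."""
    ba, bb = box_blur(a, radius), box_blur(b, radius)
    diffs = [abs(p - q) for ra, rb in zip(ba, bb) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


icon = [[0, 0, 0, 0], [0, 255, 255, 0], [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0], [0, 0, 255, 255], [0, 0, 0, 0]]  # one pixel to the right
print(mean_abs_difference(icon, shifted, radius=0))  # 42.5 (no blur)
```

With the default radius of 1 the same pair scores lower than 42.5, which is the effect the blur is meant to have.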
For matching identical images, here is code for the L2 distance:
#include <opencv2/core.hpp>

// Compare two images by getting the L2 error (square root of the sum of squared errors).
// Lower is more similar: 0 means the images are identical.
double getSimilarity(const cv::Mat& A, const cv::Mat& B) {
    if (A.rows > 0 && A.rows == B.rows && A.cols > 0 && A.cols == B.cols) {
        // Calculate the L2 error between the images (they must also have the same type).
        double errorL2 = cv::norm(A, B, cv::NORM_L2);
        // Convert to a reasonable scale, since L2 error is summed across all pixels of the image.
        double similarity = errorL2 / (double)(A.rows * A.cols);
        return similarity;
    }
    else {
        // Images have a different size
        return 100000000.0;  // Return a bad value
    }
}
Fast. But not robust to changes in lighting/viewpoint etc.
If you want to compare images for similarity, I suggest using OpenCV. OpenCV offers several feature matching and template matching methods. For feature matching, there are detectors such as SURF, SIFT and FAST. You can use these to detect and describe keypoints and then match the images. After that, you can use the number of matches between the two images as a similarity index.
Hu invariant moments are a very powerful tool for comparing two images.
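To show what makes Hu moments useful for comparison, here is a sketch that computes the first two of the seven Hu invariants from scale-normalized central moments, assuming a binary shape given as a list of rows (nonzero = shape). In OpenCV you would call cv2.moments and cv2.HuMoments, or cv2.matchShapes, which compares shapes using Hu moments; this version just makes the invariance visible.

```python
def hu_first_two(binary):
    """First two Hu invariants (h1, h2) of a binary shape.

    Invariant to translation and rotation; scale invariance holds only
    approximately on a pixel grid.
    """
    pts = [(x, y) for y, row in enumerate(binary)
           for x, v in enumerate(row) if v]
    m00 = len(pts)
    if m00 == 0:
        raise ValueError("empty shape")
    xc = sum(x for x, _ in pts) / m00
    yc = sum(y for _, y in pts) / m00

    def mu(p, q):  # central moment: removes the effect of translation
        return sum((x - xc) ** p * (y - yc) ** q for x, y in pts)

    def eta(p, q):  # normalized central moment: removes the effect of scale
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2


shape = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]        # an "L"
rotated = [list(r) for r in zip(*shape[::-1])]   # the same "L" rotated 90 degrees
print(hu_first_two(shape))
print(hu_first_two(rotated))  # same values as the line above
```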
Hash functions are used in the undouble library to detect (near-)identical images (disclaimer: I am also the author). This is a simple and fast way to compare two or more images for similarity. It works using a multi-step process: pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping images based on a threshold value.