Tesseract (Tess4j) increasing accuracy - opencv

I am making a licence plate recognition program. I already trained my own language data using SunnyPage 2.7, and currently the detection is good, except that Tesseract is not giving me good results. For example, it reads this plate as AC2 4529, which is close, but when I load the same image in SunnyPage with my language I get ACZ 4529, which is correct. I ended up configuring Tesseract with tess.setPageSegMode(10) (single-character mode), segmenting the individual characters and processing them one by one in Tesseract; that increased accuracy, but not by much. Below is my Tesseract configuration:
Tesseract instance = new Tesseract();
instance.setLanguage(LANGUAGE);
instance.setHocr(false);
instance.setTessVariable("tessedit_char_whitelist", "ACPBZRT960847152");
instance.setTessVariable("load_system_dawg", "false");
instance.setTessVariable("load_freq_dawg", "false");
instance.setOcrEngineMode(TessOcrEngineMode.OEM_CUBE_ONLY);
instance.setPageSegMode(TessPageSegMode.PSM_SINGLE_CHAR); // same as setPageSegMode(10)
Does anyone know how I can get results as good as SunnyPage? As far as I know my image is good: it is deskewed and well segmented, so the problem most likely lies with Tesseract alone.

The best thing to do would be to train Tesseract with actual images of your license plates. This will make your results much more accurate, because Tesseract will actually know what a Z and a 2 look like in your plate font and will be able to tell them apart.
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
http://vietocr.sourceforge.net/training.html

Related

Improve image quality

I need to improve image quality, from low quality to high HD quality. I am using the OpenCV libraries. I experimented a lot with GaussianBlur(), Laplacian(), transformation functions, filter functions etc., but all I managed to do was convert the image to HD resolution while keeping the same quality. Is it possible to do this? Do I need to implement my own algorithm, or is there an established way to do it? I will really appreciate any kind of help. Thanks in advance.
I used this link for my reference. It has other interesting filters that you can play with.
If you are using C++:
detailEnhance(InputArray src, OutputArray dst, float sigma_s = 10, float sigma_r = 0.15f)
If you are using Python:
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
The variable 'sigma_s' determines how big the neighbourhood of pixels must be to perform filtering.
The variable 'sigma_r' determines how the different colours within the neighbourhood of pixels will be averaged with each other. Its range is 0 to 1. A smaller value means only similar colours are averaged out, while distinctly different colours remain as they are.
Since you are looking for sharpness in the image, I would suggest you keep these parameters as small as possible.
Here is the result I obtained for a sample image:
1. Original image:
2. Sharpened image for lower sigma_r value:
3. Sharpened image for higher sigma_r value:
Check the above mentioned link for more information.
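For reference, here is a minimal Python sketch of how the two results above could be produced (the input file name and the higher sigma_r value of 0.45 are just placeholders to experiment with):
import cv2

# Load the image to be sharpened (placeholder file name)
src = cv2.imread("input.jpg")

# Lower sigma_r: only very similar colours are smoothed, so edges stay crisp
sharp_low = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)

# Higher sigma_r: more aggressive averaging across different colours
sharp_high = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.45)

cv2.imwrite("sharpened_low_sigma_r.jpg", sharp_low)
cv2.imwrite("sharpened_high_sigma_r.jpg", sharp_high)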
How about applying Super Resolution in OpenCV? A reference article with more details can be found here: https://learnopencv.com/super-resolution-in-opencv/
So basically you will need to have the Python dependency opencv-contrib-python installed, together with a working version of opencv-python.
There are different techniques for the Super Resolution in OpenCV you can choose from, including EDSR, ESPCN, FSRCNN, and LapSRN. Code examples in both Python and C++ have been included in the tutorial article as well for easy reference.
A correction is needed: the call should be
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
Passing a kernel argument will give an error.
+1 to kris stern's answer.
If you are looking for a practical implementation of super resolution using a pretrained model in OpenCV, have a look at the notebook below, as well as the video describing the details.
https://github.com/pankajr141/experiments/blob/master/Reasoning/ComputerVision/super_resolution_enhancing_image_quality_using_pretrained_models.ipynb
https://www.youtube.com/watch?v=JrWIYWO4bac&list=UUplf_LWNn0a9ubnKCZ-95YQ&index=4
Below is some sample code using OpenCV (requires opencv-contrib-python for the dnn_superres module):
import cv2

# Setting up / initializing the model
model_pretrained = cv2.dnn_superres.DnnSuperResImpl_create()
model_pretrained.readModel(filemodel_filepath)   # filemodel_filepath: path to the downloaded model file (e.g. an EDSR .pb file)
model_pretrained.setModel(modelname, scale)      # modelname: "edsr", "espcn", "fsrcnn" or "lapsrn"; scale must match the model file
# Prediction / upscaling
img_upscaled = model_pretrained.upsample(img_small)   # img_small: the low-resolution input image

How to improve text recognition using Tesseract OCR?

I have implemented Tesseract OCR for text recognition on iOS. I preprocess the input image and pass it to the Tesseract method, but it gives poor recognition results.
Steps:
1. Erode function
2. Dilate function
3. Bitwise_not function
Mat MCRregion;
cv::dilate(MCRregion, MCRregion, 24);
cv::erode(MCRregion, MCRregion, 24);
cv::bitwise_not(MCRregion, MCRregion);
UIImage *croppedMCRregion = [self UIImageFromCVMat:MCRregion];
Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
[tesseract setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.>,'`;-:</" forKey:@"tessedit_char_whitelist"];
[tesseract setImage:[self UIImageFromCVMat:MCRregion]];
// [tesseract setImage:image];
[tesseract recognize];
NSLog(@"%@", [tesseract recognizedText]);
Input Image:
Image Link
1. How can I improve the text recognition rate using Tesseract?
2. Are any other preprocessing steps applied inside Tesseract?
3. Is text dewarping done in Tesseract OCR?
Tesseract is a highly configurable piece of software -- though its configurations are poorly documented (unless you want to dig deep in the 150K lines of code). A good comprehensive list is present here http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version.
Also look at
https://code.google.com/p/tesseract-ocr/wiki/ControlParams and https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
You can improve the quality tremendously if you feed Tesseract more info about the data you're OCR'ing.
For example, if the images are all national IDs or passports that follow the standard MRZ format, you can configure Tesseract to use that information.
For the image you attached (an MRZ), I got the following result:
IDFRADOUEL<<<<<<<<<<<<<<<<<<<<9320
05O693202O438CHRISTIANE<<N1Z90620<3
by using the following config
# disable dict, freq tables etc which would distract OCR'ing an MRZ
load_system_dawg F
load_freq_dawg F
load_unambig_dawg F
load_punc_dawg F
load_number_dawg F
load_fixed_length_dawgs F
load_bigram_dawg F
wordrec_enable_assoc F
# mrz allows only these chars
tessedit_char_whitelist 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ<
Also make sure your installation is trained for the font in question to get more accurate results. In your case it appears to be the OCR-B font.
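If you drive Tesseract from Python, a minimal sketch of applying the same kind of restrictions at run time might look like this (pytesseract is assumed, the image path is a placeholder, and newer Tesseract versions take --psm instead of -psm):
import pytesseract
from PIL import Image

img = Image.open("mrz_crop.png")  # placeholder: the cropped MRZ region

# Mirror the config above: restrict the character set and disable the dictionaries
config = ("--psm 6 "
          "-c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ< "
          "-c load_system_dawg=F "
          "-c load_freq_dawg=F")

print(pytesseract.image_to_string(img, config=config))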
It is not necessary to go through the tedious task of retraining Tesseract. Yes, you will get much better results but in some cases you can get pretty far with the ENG training set.
You can improve your results by paying attention to the following things:
Use a binary image as input and make sure you have black text on a white background
By default Tesseract will try to make words out of things that have no spacing. Try to segment each character separately and place them in a new image with lots of spacing. Especially if you have combinations of letters and numbers, Tesseract will "correct" these to match the surrounding characters.
Try to segment different parts of your image with a whitelist for the characters you know should be in there. If you're only looking for digits in the first part, then use a separate instance of Tesseract to detect those numbers with a digits-only whitelist.
If you use the same object multiple times without resetting it, Tesseract seems to have a memory. That means you can get a different result each time you perform OCR. You can reset Tesseract to counter this, or just create a new object.
Last but not least, use the resultIterator to go through the boxes that Tesseract can give as a result. You can check the size and confidence of each character and filter accordingly.
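As a rough illustration of that last point in Python (using pytesseract's image_to_data rather than the C++ ResultIterator; the file name and thresholds are placeholders):
import pytesseract
from pytesseract import Output
from PIL import Image

img = Image.open("segmented_chars.png")  # placeholder input

# One entry per detected box: text, position, size and confidence
data = pytesseract.image_to_data(img, output_type=Output.DICT)

for text, conf, h in zip(data["text"], data["conf"], data["height"]):
    # Skip empty boxes, low-confidence results and implausibly small boxes
    if text.strip() and float(conf) > 60 and h > 10:
        print(text, conf)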
Based on my experience:
1. How can I improve the text recognition rate using Tesseract?
Firstly, preprocessing: ensure that the input image is a binary image with a good threshold. OpenCV has a good set of thresholding functions, such as the Otsu algorithm, as well as contour detection to help with warping and rotation.
You can also use contour detection in OpenCV to distinguish between lines of text.
Some filtering also removes noise, which often confuses Tesseract and increases processing time.
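A rough sketch of that kind of preprocessing with OpenCV in Python (the blur size, area threshold and file names are assumptions to tune for your own images):
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name

# Light denoising before thresholding
blurred = cv2.medianBlur(img, 3)

# Otsu picks the global threshold automatically; invert so text is white for contour search
_, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Contours help locate text blocks/lines and filter out small specks of noise
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
text_boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 50]

# Tesseract expects dark text on a light background, so invert back before OCR
cv2.imwrite("ocr_input.png", cv2.bitwise_not(binary))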
Set up proper configurations for Tesseract (e.g. eng.config). A full list of configs is here (http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version). Some examples include blacklists, whitelists, chopping, etc.
Use proper flags. E.g. -psm 6 if you are doing blocks of text rather than lines
Having trained my own language data... I would say do so only if you have lots of time and resources. Or if your font is very peculiar (e.g. dot matrix).
More recent versions of Tesseract (closer to 3.0) allow for multiple language files to be used on the same pass (-l one+two). This means you can have one specially trained for text and another for numbers. In our case, it seemed to work well.
Postprocessing of tesseract results was particularly important for us too. String replacements of typical mis-recognitions and what not.
2. Are any other preprocessing steps applied inside Tesseract?
Tesseract uses the Leptonica library for preprocessing.
3. Is text dewarping done in Tesseract OCR?
I am inclined to think yes, considering that warping functions are part of Leptonica.

Determine skeleton joints with a webcam (not Kinect)

I'm trying to determine skeleton joints (or at the very least to be able to track a single palm) using a regular webcam. I've looked all over the web and can't seem to find a way to do so.
Every example I've found is using Kinect. I want to use a single webcam.
There's no need for me to calculate the depth of the joints - I just need to be able to recognize their X, Y position in the frame. Which is why I'm using a webcam, not a Kinect.
So far I've looked at:
OpenCV (the "skeleton" functionality in it is a process of simplifying graphical models, but it's not a detection and/or skeletonization of a human body).
OpenNI (with NiTE) - the only way to get the joints is to use the Kinect device, so this doesn't work with a webcam.
I'm looking for a C/C++ library (but at this point would look at any other language), preferably open source (but, again, will consider any license) that can do the following:
Given an image (a frame from a webcam) calculate the X, Y positions of the visible joints
[Optional] Given a video capture stream call back into my code with events for joints' positions
Doesn't have to be super accurate, but would prefer it to be very fast (sub-0.1 sec processing time per frame)
Would really appreciate it if someone can help me out with this. I've been stuck on this for a few days now with no clear path to proceed.
UPDATE
2 years later a solution was found: http://dlib.net/imaging.html#shape_predictor
Tracking a hand using a single camera without depth information is a serious task and a topic of ongoing scientific work. I can supply you with a bunch of interesting and/or highly cited scientific papers on the topic:
M. de La Gorce, D. J. Fleet, and N. Paragios, “Model-Based 3D Hand Pose Estimation from Monocular Video.,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, Feb. 2011.
R. Wang and J. Popović, “Real-time hand-tracking with a color glove,” ACM Transactions on Graphics (TOG), 2009.
B. Stenger, A. Thayananthan, P. H. S. Torr, and R. Cipolla, “Model-based hand tracking using a hierarchical Bayesian filter.,” IEEE transactions on pattern analysis and machine intelligence, vol. 28, no. 9, pp. 1372–84, Sep. 2006.
J. M. Rehg and T. Kanade, “Model-based tracking of self-occluding articulated objects,” in Proceedings of IEEE International Conference on Computer Vision, 1995, pp. 612–617.
Hand tracking literature survey in the 2nd chapter:
T. de Campos, “3D Visual Tracking of Articulated Objects and Hands,” 2006.
Unfortunately I don't know of any freely available hand-tracking library.
There is a simple way of detecting a hand using skin tone; perhaps this could help. You can see the results in this YouTube video. Caveat: the background shouldn't contain skin-colored things like wood.
Here is the code:
''' Detect human skin tone and draw a boundary around it.
Useful for gesture recognition and motion tracking.
Inspired by: http://stackoverflow.com/a/14756351/1463143
Date: 08 June 2013
'''
# Required modules
import cv2
import numpy

# Constants for finding range of skin color in YCrCb
min_YCrCb = numpy.array([0, 133, 77], numpy.uint8)
max_YCrCb = numpy.array([255, 173, 127], numpy.uint8)

# Create a window to display the camera feed
cv2.namedWindow('Camera Output')

# Get pointer to video frames from primary device
videoFrame = cv2.VideoCapture(0)

# Process the video frames
keyPressed = -1  # -1 indicates no key pressed

while keyPressed < 0:  # any key pressed has a value >= 0
    # Grab video frame, decode it and return next video frame
    readSuccess, sourceImage = videoFrame.read()

    # Convert image to YCrCb
    imageYCrCb = cv2.cvtColor(sourceImage, cv2.COLOR_BGR2YCR_CB)

    # Find region with skin tone in YCrCb image
    skinRegion = cv2.inRange(imageYCrCb, min_YCrCb, max_YCrCb)

    # Do contour detection on skin region
    contours, hierarchy = cv2.findContours(skinRegion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Draw the contours on the source image
    for i, c in enumerate(contours):
        area = cv2.contourArea(c)
        if area > 1000:
            cv2.drawContours(sourceImage, contours, i, (0, 255, 0), 3)

    # Display the source image
    cv2.imshow('Camera Output', sourceImage)

    # Check for user input to close program
    keyPressed = cv2.waitKey(1)  # wait 1 millisecond in each iteration of the while loop

# Close window and camera after exiting the while loop
cv2.destroyWindow('Camera Output')
videoFrame.release()
cv2.findContours is quite useful: you can find the centroid of a "blob" by using cv2.moments after you find the contours. Have a look at the OpenCV documentation on shape descriptors.
I haven't yet figured out how to get the skeletons that lie in the middle of the contour, but I was thinking of "eroding" the contours until only a single line remains. In image processing this is called "skeletonization" or the "morphological skeleton". Here is some basic info on skeletonization.
Here is a link that implements skeletonization in OpenCV and C++.
Here is a link for skeletonization in OpenCV and Python.
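A rough sketch of both ideas in Python (centroid via cv2.moments plus an iterative-erosion morphological skeleton), assuming a binary mask like the skinRegion produced above; the file name is a placeholder:
import cv2
import numpy as np

mask = cv2.imread("skin_mask.png", cv2.IMREAD_GRAYSCALE)  # placeholder: a binary 0/255 mask

# Centroid of the largest blob via image moments
contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
m = cv2.moments(largest)
cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])

# Morphological skeleton: repeatedly erode, collecting what an opening would remove
skeleton = np.zeros_like(mask)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
work = mask.copy()
while cv2.countNonZero(work) > 0:
    eroded = cv2.erode(work, kernel)
    opened = cv2.dilate(eroded, kernel)
    skeleton = cv2.bitwise_or(skeleton, cv2.subtract(work, opened))
    work = eroded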
hope that helps :)
--- EDIT ----
I would highly recommend that you go through these papers by Deva Ramanan (scroll down after visiting the linked page): http://www.ics.uci.edu/~dramanan/
C. Desai, D. Ramanan. "Detecting Actions, Poses, and Objects with Relational Phraselets." European Conference on Computer Vision (ECCV), Florence, Italy, Oct. 2012.
D. Park, D. Ramanan. "N-Best Maximal Decoders for Part Models." International Conference on Computer Vision (ICCV), Barcelona, Spain, November 2011.
D. Ramanan. "Learning to Parse Images of Articulated Objects." Neural Info. Proc. Systems (NIPS), Vancouver, Canada, Dec. 2006.
The most common approach can be seen in the following YouTube video: http://www.youtube.com/watch?v=xML2S6bvMwI
This method is not quite robust, as it tends to fail if the hand is rotated too much (e.g. if the camera is looking at the side of the hand or at a partially bent hand).
If you do not mind using two cameras, you can look into the work of Robert Wang. His current company (3GearSystems) uses this technology, augmented with a Kinect, to provide tracking. His original paper uses two webcams but has much worse tracking.
Wang, Robert, Sylvain Paris, and Jovan Popović. "6d hands: markerless hand-tracking for computer aided design." Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, 2011.
Another option (again, if using "more" than a single webcam is possible) is to use an IR emitter. Your hand reflects IR light quite well, whereas the background does not. By adding a filter to the webcam that blocks normal light (and removing the standard filter that does the opposite) you can create quite effective hand tracking. The advantage of this method is that segmenting the hand from the background is much simpler. Depending on the distance and the quality of the camera, you may need more IR LEDs in order to reflect sufficient light back into the webcam. The Leap Motion uses this technology to track fingers and palms (it uses 2 IR cameras and 3 IR LEDs to also get depth information).
All that being said, I think the Kinect is your best option here. Yes, you don't need the depth, but depth information does make it a lot easier to detect the hand (by using it for segmentation).
My suggestion, given your constraints, would be to use something like this:
http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html
Here is a tutorial for using it for face detection:
http://opencv.willowgarage.com/wiki/FaceDetection?highlight=%28facial%29|%28recognition%29
The problem you have described is quite difficult, and I'm not sure that trying to do it using only a webcam is a reasonable plan, but this is probably your best bet. As explained here (http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html?highlight=load#cascadeclassifier-load), you will need to train the classifier with something like this:
http://docs.opencv.org/doc/user_guide/ug_traincascade.html
Remember: Even though you don't require the depth information for your use, having this information makes it easier for the library to identify a hand.
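For what it's worth, running an already-trained cascade on webcam frames looks roughly like this in Python (hand_cascade.xml is a placeholder for a cascade you would have to train yourself, e.g. with opencv_traincascade):
import cv2

cascade = cv2.CascadeClassifier("hand_cascade.xml")  # placeholder: your trained cascade
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Each detection is an (x, y, w, h) box in frame coordinates
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) >= 0:
        break

cap.release()
cv2.destroyAllWindows()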
At last I've found a solution. It turns out the open-source dlib project has a "shape predictor" that, once properly trained, does exactly what I need: it estimates (with pretty satisfactory accuracy) the "pose". A "pose" is loosely defined as "whatever you train it to recognize as a pose" by training it with a set of images annotated with the shapes to extract from them.
The shape predictor is described here on dlib's website.
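A minimal Python sketch of using such a predictor once trained (the model file, input frame and detection rectangle are all placeholders; in practice the rectangle would come from a detector):
import cv2
import dlib

predictor = dlib.shape_predictor("hand_predictor.dat")  # placeholder: your trained model

frame = cv2.imread("frame.png")  # placeholder input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# The predictor needs a bounding rectangle around the object of interest
rect = dlib.rectangle(left=100, top=100, right=300, bottom=300)
shape = predictor(gray, rect)

# Each part is one trained landmark, i.e. an (x, y) joint position in the frame
joints = [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
print(joints)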
I don't know about possible existing solutions. If supervised (or semi-supervised) learning is an option, training decision trees or neural networks might already be enough (the Kinect uses random forests, from what I have heard). Before you go down such a path, do everything you can to find an existing solution. Getting machine learning right takes a lot of time and experimentation.
OpenCV has machine learning components; what you would need is training data.
With the motion-tracking features of the open-source Blender project it is possible to create a 3D model based on 2D footage. No Kinect needed. Since Blender is open source, you might be able to use its Python scripts outside the Blender framework for your own purposes.
Have you ever heard of EyesWeb?
I have been using it for one of my projects and I thought it might be useful for what you want to achieve.
Here are some interesting publications: LNAI 3881 - Finger Tracking Methods Using EyesWeb, and Powerpointing-HCI using gestures.
Basically the workflow is:
You create your patch in EyesWeb
Prepare the data you want to send with a network client
Use the processed data in your own server (your app)
However, I don't know if there is a way to embed the real-time image processing part of EyesWeb into your software as a library.

OpenCV Multilevel B-Spline Approximation

Hi (sorry for my English). I'm working on a project for university in which I need to use the MBA (Multilevel B-Spline Approximation) algorithm to get some points (control points) of an image to use in other operations.
I have been reading a lot of papers about this algorithm, and I think I understand it, but I can't manage to write it.
The idea is: read an image, process the image (OpenCV), then get the control points of the image and use those points.
So the problem here is:
The algorithm uses a set of points {(x,y,z)}; this set of points is approximated by a surface generated from the control points obtained from MBA. The set of points {(x,y,z)} represents the data we need to approximate (the image).
So, the image is in cv::Mat format; how can I transform it into an ordinary array so I can simply access and manipulate the data?
Here are some papers with an explanation of the method:
(Paper) REGULARIZED MULTILEVEL B-SPLINE REGISTRATION
(Paper)Scattered Data Interpolation with Multilevel B-splines
(Matlab)MBA
If someone can help with a guideline, an idea or anything else, it will be appreciated.
Thanks in advance.
EDIT: I finally wrote the algorithm in C++ using Armadillo and OpenCV.
I'm using Armadillo, a C++ linear algebra library, to work with the matrices needed by the algorithm.

How to compute SVD using CImg (or maybe the OpenCV or Eigen library)?

Could anyone give me a quick guide on how to use CImg to compute the SVD of a 3-dimensional array?
I just want to get the decomposition of the array in order to compress it and speed up further processing.
What values should I pass in where, and how do I get the output?
I've searched around and still can't understand how it works, and I don't really fully understand how SVD works either; I only know that it can be used to compress a matrix.
At the same time I found that the OpenCV and Eigen libraries can also do the job, so please let me know the steps for those if they are much easier.
(An alternative to SVD for me is PCA, for which I found a source/library, but I also don't know how to use it.)
Thanks!
See http://cimg.sourceforge.net/reference/structcimg__library_1_1CImg.html#a9a79f3a0849388b3ec13bd140b67a12e
CImg<float> A(3,3);                // a 3x3 matrix, decomposed as A = U*S*V'
A.rand(0,1);
CImgList<float> USV = A.get_SVD(); // USV[0] = U, USV[1] = S, USV[2] = V
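Since the question also mentions OpenCV, here is a rough Python sketch of the same idea (for a 3-dimensional array you would first reshape it into a 2-D matrix, e.g. one slice per column; that layout choice is an assumption):
import cv2
import numpy as np

# A random 3x3 matrix, analogous to the CImg example above
A = np.random.rand(3, 3).astype(np.float32)

# OpenCV: A = u * diag(w) * vt, with the singular values in w
w, u, vt = cv2.SVDecomp(A)

# Keeping only the k largest singular values gives a compressed approximation of A
k = 2
A_approx = (u[:, :k] * w[:k, 0]) @ vt[:k, :]

# Sanity check: the full reconstruction matches A
print(np.allclose(A, u @ np.diag(w.ravel()) @ vt, atol=1e-5))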
