I am currently using Tesseract to scan receipts. The quality wasn't good, so I read this article on how to improve it: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#noise-removal. I implemented resizing, deskewing (aligning), and Gaussian blur, but none of them seem to improve the OCR accuracy except the deskewing. Here is my code for resizing and Gaussian blur. Am I doing anything wrong? If not, what else can I do to help?
Code:
+(UIImage *) prepareImage: (UIImage *)image{
//converts UIImage to Mat format
Mat im = cvMatWithImage(image);
//grayscale image
Mat gray;
cvtColor(im, gray, CV_BGR2GRAY);
//deskews text
//did not provide code because I know it works
Mat preprocessed = preprocess2(gray);
double skew = hough_transform(preprocessed, im);
Mat rotated = rot(im,skew* CV_PI/180);
//resize image
Mat scaledImage = scaleImage(rotated, 2);
//Gaussian blur
GaussianBlur(scaledImage, scaledImage, cv::Size(1, 1), 0, 0);
return UIImageFromCVMat(scaledImage);
}
// Organization -> Resizing
Mat scaleImage(Mat mat, double factor){
Mat resizedMat;
double width = mat.cols;
double height = mat.rows;
double aspectRatio = width/height;
resize(mat, resizedMat, cv::Size(width*factor*aspectRatio, height*factor*aspectRatio));
return resizedMat;
}
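Two details in the code above are worth checking. First, a 1x1 Gaussian kernel has a single weight of 1, so `GaussianBlur(..., cv::Size(1, 1), ...)` leaves the image unchanged; a 3x3 or 5x5 kernel would actually smooth it. Second, `scaleImage()` multiplies both sides by `factor * aspectRatio`, so the effective scale depends on the image shape: for a tall receipt (aspectRatio < 1) the image is scaled by less than `factor`, and it even shrinks whenever `aspectRatio * factor < 1`. The stdlib-only sketch below contrasts the two computations; the `Dims` struct and both helpers are hypothetical stand-ins for `cv::Size`. In OpenCV the uniform version can be written as `cv::resize(mat, resizedMat, cv::Size(), factor, factor, cv::INTER_CUBIC)`, where an empty `dsize` tells `resize` to use the `fx`/`fy` factors instead.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for cv::Size, so the sketch has no OpenCV dependency.
struct Dims {
    int width;
    int height;
};

// Mirrors the original scaleImage(): both sides are multiplied by
// factor * aspectRatio, so the effective scale is factor * (width / height).
// The cast truncates, as passing doubles to cv::Size's int constructor does.
Dims originalScaledDims(int width, int height, double factor) {
    double aspectRatio = static_cast<double>(width) / height;
    return {static_cast<int>(width * factor * aspectRatio),
            static_cast<int>(height * factor * aspectRatio)};
}

// Uniform scaling: multiplying both sides by the same factor already
// preserves the aspect ratio, so no aspect-ratio term is needed.
Dims uniformScaledDims(int width, int height, double factor) {
    assert(factor > 0.0);
    return {static_cast<int>(std::lround(width * factor)),
            static_cast<int>(std::lround(height * factor))};
}
```

For a 1000x3000 receipt and a factor of 2, the original formula yields roughly 666x2000 (a shrink), while the uniform version yields 2000x6000.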
Receipt:
If you read the Tesseract documentation, you will see that the Tesseract engine works best with text arranged in single lines within a rectangular block. Passing it the whole receipt image reduces the engine's accuracy. What you need to do is use the newer Core Image text detector on iOS (a CIDetector of type CIDetectorTypeText, which returns CITextFeature objects) to split the text in your receipt into multiple image blocks. Only then should you pass those images to Tesseract for processing.
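As a different, library-free alternative to the Core Image detector (not what the answer proposes), a binarized and deskewed receipt can be cut into text lines with a horizontal projection profile: rows that contain ink belong to a line, and runs of empty rows separate lines. The sketch assumes a row-major 8-bit buffer in which non-zero means ink; `RowRange`, `textLineRows` and the `minInk` parameter are hypothetical names. Each resulting strip could then be passed to Tesseract with the page segmentation mode set to a single line (`SetPageSegMode(tesseract::PSM_SINGLE_LINE)`).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open row interval [begin, end) containing one line of text.
struct RowRange {
    int begin;
    int end;
};

// Splits a binarized image into horizontal text strips using a row
// projection profile. A row counts as "text" when it holds at least
// minInk non-zero pixels; consecutive text rows form one strip.
std::vector<RowRange> textLineRows(const std::vector<std::uint8_t>& binary,
                                   int cols, int rows, int minInk = 1) {
    assert(static_cast<int>(binary.size()) == cols * rows);
    std::vector<RowRange> lines;
    int start = -1;  // first row of the current strip, or -1 if none is open
    for (int r = 0; r < rows; ++r) {
        int ink = 0;
        for (int c = 0; c < cols; ++c)
            if (binary[r * cols + c] != 0) ++ink;
        bool isText = ink >= minInk;
        if (isText && start < 0) {
            start = r;                    // a new strip begins
        } else if (!isText && start >= 0) {
            lines.push_back({start, r});  // the current strip ends
            start = -1;
        }
    }
    if (start >= 0) lines.push_back({start, rows});  // strip touching the bottom edge
    return lines;
}
```

Raising `minInk` makes the split less sensitive to isolated noise pixels between lines.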
I want to compare two images and find the parts that are the same and the parts that differ. I tried the "cv::compare" and "cv::absdiff" methods, but I am confused about which one suits my case, since they give different results. How can I achieve this?
Here's an example how you can use cv::absdiff to find image similarities:
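#include <opencv2/opencv.hpp>

int main()
{
cv::Mat input1 = cv::imread("../inputData/Similar1.png");
cv::Mat input2 = cv::imread("../inputData/Similar2.png");
// absdiff requires two images of the same size and type
if (input1.empty() || input2.empty() || input1.size() != input2.size() || input1.type() != input2.type())
return 1;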
cv::Mat diff;
cv::absdiff(input1, input2, diff);
cv::Mat diff1Channel;
// WARNING: this will weight channels differently! - instead you might want some different metric here. e.g. (R+B+G)/3 or MAX(R,G,B)
cv::cvtColor(diff, diff1Channel, CV_BGR2GRAY);
float threshold = 30; // pixel may differ only up to "threshold" to count as being "similar"
cv::Mat mask = diff1Channel < threshold;
cv::imshow("similar in both images" , mask);
// use similar regions in new image: Use black as background
cv::Mat similarRegions(input1.size(), input1.type(), cv::Scalar::all(0));
// copy masked area
input1.copyTo(similarRegions, mask);
cv::imshow("input1", input1);
cv::imshow("input2", input2);
cv::imshow("similar regions", similarRegions);
cv::imwrite("../outputData/Similar_result.png", similarRegions);
cv::waitKey(0);
return 0;
}
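On the cv::compare versus cv::absdiff question: `cv::compare(a, b, dst, cv::CMP_EQ)` marks only pixels that are exactly equal, so noise or compression artifacts make most pixels count as "different", whereas `absdiff` followed by a threshold tolerates small differences. (The expression `diff1Channel < threshold` in the code above is itself a comparison that yields a 255/0 mask.) The stdlib-only sketch below shows the same two steps on grayscale buffers; the function names are hypothetical, and it assumes both images are already single-channel and of equal size.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Per-pixel |a - b| on 8-bit grayscale buffers, like cv::absdiff on CV_8UC1.
std::vector<std::uint8_t> absDiff(const std::vector<std::uint8_t>& a,
                                  const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<std::uint8_t>(std::abs(int(a[i]) - int(b[i])));
    return out;
}

// 255 where the difference is below the threshold ("similar"), otherwise 0,
// mirroring `diff1Channel < threshold` above.
std::vector<std::uint8_t> similarMask(const std::vector<std::uint8_t>& diff,
                                      int threshold) {
    std::vector<std::uint8_t> mask(diff.size());
    for (std::size_t i = 0; i < diff.size(); ++i)
        mask[i] = diff[i] < threshold ? 255 : 0;
    return mask;
}
```

Inverting the mask (difference at or above the threshold) gives the regions that differ instead.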
Using these two inputs:
You'll get this output (black background):
I have a question about how Tesseract OCR works. As far as I understand, after shape detection, symbols (their forms) are scaled (resized) to a specific font size.
That font size is based on the trained data. Basically, the training set defines the symbols (their geometry and shape), and maybe their representation.
I am using Tesseract 3.01 (the latest version) on iOS.
I checked the Tesseract FAQ and looked at the forum, but I do not understand why some images give such low recognition quality.
It is said that the font should be bigger than 12pt and the image should have more than 300 DPI. I did all the necessary preprocessing, such as blurring (where needed) and contrast enhancement.
I even tried the other engine in Tesseract OCR, called Cube.
But for some images I get bad recognition results, even though they are big (MIN(width, height) > 1000, and I rescale them for Tesseract):
http://goo.gl/l9uJMe
However, on another set of images the results are better:
http://goo.gl/cwA9DC
Those images are smaller, and I do not resize them (I just convert them to grayscale).
If what I wrote about the engine is correct, suppose the training set is based on a 14pt font. Symbols from the pictures are resized to that specific size, so I do not see any reason why they should not be recognised.
I also tried custom dictionaries to penalise non-dictionary words, but this did not improve recognition much.
tesseract = new tesseract::TessBaseAPI();
GenericVector<STRING> variables_name(1),variables_value(1);
variables_name.push_back("user_words_suffix");
variables_value.push_back("user-words");
int retVal = tesseract->Init([self.tesseractDataPath cStringUsingEncoding:NSUTF8StringEncoding], NULL,tesseract::OEM_TESSERACT_ONLY, NULL, 0, &variables_name, &variables_value, false);
bool ok = (retVal == 0);
// &= so that a single failed call marks initialization as failed (|= would hide failures)
ok &= tesseract->SetVariable("language_model_penalty_non_dict_word", "0.2");
ok &= tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "0.2");
if (!ok)
{
NSLog(@"Error initializing tesseract!");
}
So my question is: should I train Tesseract on another font?
And, honestly speaking, why should I have to train it? With the default trained data I get good recognition on text from the Internet or from a PC (Mac) screen.
I also checked the original Tesseract English trained data. It has 38 TIFF files, which belong to the following font families:
1) arial
2) verdana
3) trebuc
4) times
5) georgia
6) cour
It seems that the font in my images does not belong to this set.
In your case, the size of the image is not the problem. As I can see from your attached images (and I'm surprised that nobody mentioned it before), the problem is that the text in the images that give bad results is not laid out on straight lines.
One of the things Tesseract does at an early stage of the OCR process is to detect the image layout and extract whole lines of text.
This image is the best example to illustrate this part of the process:
As you can see, the engine expects each line of text to run straight and horizontally, perpendicular to the left and right edges of the image.
If you have done all the necessary image processing, try this; it may help:
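// Use the CGImage's pixel dimensions; [image size] is in points and
// differs from the pixel size when image.scale != 1 (e.g. Retina images)
CGImageRef cgImage = [image CGImage];
int width = (int)CGImageGetWidth(cgImage);
int height = (int)CGImageGetHeight(cgImage);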
uint32_t* _pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));
if (!_pixels) {
return;//Invalid image
}
// Clear the pixels so any transparency is preserved
memset(_pixels, 0, width * height * sizeof(uint32_t));
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
// Create a context with RGBA _pixels
CGContextRef context = CGBitmapContextCreate(_pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);
// Paint the bitmap to our context which will fill in the _pixels array
CGContextDrawImage(context, CGRectMake(0, 0, width, height), [image CGImage]);
// We're done with the context and color space
CGContextRelease(context);
CGColorSpaceRelease(colorSpace);
_tesseract->SetImage((const unsigned char *) _pixels, width, height, sizeof(uint32_t), width * sizeof(uint32_t));
_tesseract->SetVariable("tessedit_char_whitelist", ".#0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/-!");
_tesseract->SetVariable("tessedit_consistent_reps", "0");
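char* utf8Text = _tesseract->GetUTF8Text();
NSString *str = nil;
if (utf8Text) {
str = [NSString stringWithUTF8String:utf8Text];
// GetUTF8Text() allocates the string with new[]; the caller must free it
delete[] utf8Text;
}
// Release the pixel buffer once recognition is finished
free(_pixels);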
Let me start by saying that I'm still a beginner with OpenCV. Some things might seem obvious, and once I learn them, hopefully they will become obvious to me too.
My goal is to use the floodFill function to generate a separate image containing only the filled area. I have looked into this post, but I'm a bit lost on how to convert the filled mask into an actual BGRA image with the fill color. Besides that, I also need to crop the newly filled image so that it contains only the filled area. I'm guessing OpenCV has some magical function that could do the trick.
Here is what I'm trying to achieve:
Original image:
Filled image:
Filled area only:
UPDATE 07/07/13
I was able to do a fill on a separate image using the following code. However, I still need to figure out the best approach to get only the filled area. Also, my flood-fill solution has an issue with filling an image that contains alpha values...
static int floodFillImage (cv::Mat &image, int premultiplied, int x, int y, int color)
{
cv::Mat out;
// un-multiply color
unmultiplyRGBA2BGRA(image);
// convert to no alpha
cv::cvtColor(image, out, CV_BGRA2BGR);
// create our mask
cv::Mat mask = cv::Mat::zeros(image.rows + 2, image.cols + 2, CV_8U);
// floodfill the mask
cv::floodFill(
out,
mask,
cv::Point(x,y),
255,
0,
cv::Scalar(),
cv::Scalar(),
4 | (255 << 8) | cv::FLOODFILL_MASK_ONLY); // 4-connectivity, write 255 into the mask, leave 'out' unchanged
// set new image color
cv::Mat newImage(image.size(), image.type());
cv::Mat maskedImage(image.size(), image.type());
// set the solid color we will mask out of
newImage = cv::Scalar(ARGB_BLUE(color), ARGB_GREEN(color), ARGB_RED(color), ARGB_ALPHA(color));
// crop the 2 extra pixels w and h that were given before
cv::Mat maskROI = mask(cv::Rect(1,1,image.cols,image.rows));
// mask the solid color we want into new image
newImage.copyTo(maskedImage, maskROI);
// pre multiply the colors
premultiplyBGRA2RGBA(maskedImage, image);
return 0;
}
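For cropping to just the filled area: `cv::floodFill` has an optional `cv::Rect* rect` output parameter (the fifth argument, passed as `0` in the code above) that receives the bounding rectangle of the filled region. Passing the address of a `cv::Rect` there and then taking `maskedImage(rect)` should give the cropped result; it is worth double-checking that the rectangle is in image coordinates rather than the 2-pixel-larger mask coordinates. If you only have the mask, the bounding box of its non-zero pixels gives the same crop. The stdlib-only sketch below computes it; `Box` and `nonZeroBoundingBox` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Axis-aligned rectangle, analogous to cv::Rect (x, y, width, height).
struct Box {
    int x, y, width, height;
};

// Bounding box of all non-zero pixels in a row-major 8-bit mask.
// Returns an empty Box {0, 0, 0, 0} when the mask has no set pixels.
Box nonZeroBoundingBox(const std::vector<std::uint8_t>& mask, int cols, int rows) {
    assert(static_cast<int>(mask.size()) == cols * rows);
    int minX = cols, minY = rows, maxX = -1, maxY = -1;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            if (mask[y * cols + x] == 0) continue;
            minX = std::min(minX, x);
            minY = std::min(minY, y);
            maxX = std::max(maxX, x);
            maxY = std::max(maxY, y);
        }
    }
    if (maxX < 0) return {0, 0, 0, 0};  // nothing was filled
    return {minX, minY, maxX - minX + 1, maxY - minY + 1};
}
```

When scanning the floodFill mask itself, remember that it is 2 pixels larger than the image, so subtract 1 from `x` and `y` (or scan `maskROI` instead).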
You can take the difference of the two images to find the pixels that differ.
Pixels with no difference will be zero, and the others will have positive values.
cv::Mat A, B, C;
A = getImageA();
B = getImageB();
C = A - B;
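Handle negative values if they can occur (I presume they don't in your case). Note that for 8-bit images OpenCV's subtraction saturates, so wherever B is brighter than A the result is clamped to 0; cv::absdiff(A, B, C) avoids this.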
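To make the saturation point concrete, here is a stdlib-only sketch of the per-pixel behaviour of the two operations on 8-bit data; the function names are hypothetical. With plain subtraction, a pixel where B is brighter than A becomes 0 and looks unchanged, while the absolute difference keeps the magnitude in both directions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Per-pixel result of C = A - B on 8-bit images: negative values saturate to 0.
std::uint8_t saturatingSub(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : 0;
}

// Per-pixel result of cv::absdiff(A, B, C): |a - b|, symmetric in a and b.
std::uint8_t absDiffPixel(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::abs(int(a) - int(b)));
}
```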
I have been trying for hours to run an Xcode project with OpenCV. I have built the source, imported it into the project, and included
#ifdef __cplusplus
#import <opencv2/opencv.hpp>
#endif
in the .pch file.
I followed the instructions from http://docs.opencv.org/trunk/doc/tutorials/introduction/ios_install/ios_install.html
Still, I am getting many Apple Mach-O linker errors when I compile:
Undefined symbols for architecture i386:
"std::__1::__vector_base_common<true>::__throw_length_error() const", referenced from:
Please help, I am really lost.
UPDATE:
All the errors are fixed, and now I am trying to detect circles.
Mat src, src_gray;
cvtColor( image, src_gray, CV_BGR2GRAY );
vector<Vec3f> circles;
/// Apply the Hough Transform to find the circles
HoughCircles( src_gray, circles, CV_HOUGH_GRADIENT, 1, image.rows/8, 200, 100, 0, 0 );
/// Draw the circles detected
for( size_t i = 0; i < circles.size(); i++ )
{
Point center(cvRound(circles[i][0]), cvRound(circles[i][1]));
int radius = cvRound(circles[i][2]);
// circle center
circle( src, center, 3, Scalar(0,255,0), -1, 8, 0 );
// circle outline
circle( src, center, radius, Scalar(0,0,255), 3, 8, 0 );
}
I am using the code above, but no circles are drawn on the image. Is there something obvious that I am doing wrong?
Try the solution in my answer to this question:
How to resolve iOS Link errors with OpenCV
I also have a couple of simple working samples on GitHub, built with a recent OpenCV framework.
NB: OpenCVSquares is simpler than OpenCVSquaresSL. The latter was adapted for Snow Leopard backwards compatibility: it contains two builds of the OpenCV framework and 3 targets, so you are better off using the simpler OpenCVSquares if it runs on your system.
To adapt OpenCVSquares to detect circles, I suggest that you start with the Hough Circles C++ sample from the OpenCV distribution, and use it to adapt or replace CVSquares.cpp and CVSquares.h with, say, CVCircles.cpp and CVCircles.h.
The principles are exactly the same:
remove the UI code from the C++; the UI is provided on the Obj-C side
transform the main() function into a static member function of the class declared in the header file. In form, this should mirror an Objective-C message to the wrapper (which translates the Obj-C method into a C++ function call).
From the Objective-C side, you pass a UIImage to the wrapper object, which:
converts the UIImage to a cv::Mat image
passes the Mat to a C++ class for processing
converts the result from Mat back to UIImage
returns the processed UIImage back to the objective-C calling object
update
The adapted houghcircles.cpp should look something like this at its most basic (I've replaced the CVSquares class with a CVCircles class):
cv::Mat CVCircles::detectedCirclesInImage (cv::Mat img)
{
//expects a grayscale image on input
//returns a colour image on output
Mat cimg;
medianBlur(img, img, 5);
cvtColor(img, cimg, CV_GRAY2RGB);
vector<Vec3f> circles;
HoughCircles(img, circles, CV_HOUGH_GRADIENT, 1, 10,
100, 30, 1, 60 // change the last two parameters
// (min_radius & max_radius) to detect larger circles
);
for( size_t i = 0; i < circles.size(); i++ )
{
Vec3i c = circles[i];
circle( cimg, Point(c[0], c[1]), c[2], Scalar(255,0,0), 3, CV_AA);
circle( cimg, Point(c[0], c[1]), 2, Scalar(0,255,0), 3, CV_AA);
}
return cimg;
}
Note that the input parameters are reduced to one - the input image - for simplicity. Shortly I will post a sample on github which will include some parameters tied to slider controls in the iOS UI, but you should get this version working first.
As the function signature has changed you should follow it up the chain...
Alter the houghcircles.h class definition:
static cv::Mat detectedCirclesInImage (const cv::Mat image);
Modify the CVWrapper class to accept a similarly-structured method which calls detectedCirclesInImage
+ (UIImage*) detectedCirclesInImage:(UIImage*) image
{
UIImage* result = nil;
cv::Mat matImage = [image CVGrayscaleMat];
matImage = CVCircles::detectedCirclesInImage (matImage);
result = [UIImage imageWithCVMat:matImage];
return result;
}
Note that we are converting the input UIImage to grayscale, as the houghcircles function expects a grayscale image on input. Take care to pull the latest version of my GitHub project: I found an error in the CVGrayscaleMat category, which is now fixed. The output image is in colour (colour is applied to the grayscale input image to pick out the found circles).
If you want your input and output images in colour, you just need to make a grayscale copy of your input image to send to HoughCircles(), e.g. cvtColor(input_image, gray_image, CV_RGB2GRAY);, and then draw the found circles on the colour input image (which becomes your return image).
Finally in your CVViewController, change your messages to CVWrapper to conform to this new signature:
UIImage* image = [CVWrapper detectedCirclesInImage:self.image];
If you follow all of these details your project will produce circle-detected results.
update 2
OpenCVCircles now on Github
With sliders to adjust HoughCircles() parameters
Is there a quick way to restrict the ROI to the inside of the contour of the blob I'm interested in?
My ideas so far:
Using the boundingRect, but it contains too much stuff I don't want to analyse.
Applying goodFeaturesToTrack to the whole image and then looping through the output coordinates to eliminate the ones outside my blob's contour
Thanks in advance!
EDIT
I found what I need: cv::pointPolygonTest() seems to be the right thing, but I'm not sure how to implement it …
Here's some code:
// ...
IplImage forground_ipl = result;
IplImage *labelImg = cvCreateImage(forground.size(), IPL_DEPTH_LABEL, 1);
CvBlobs blobs;
bool found = cvb::cvLabel(&forground_ipl, labelImg, blobs);
IplImage *imgOut = cvCreateImage(cvGetSize(&forground_ipl), IPL_DEPTH_8U, 3);
if (found) {
cvb::CvBlob *greaterBlob = blobs[cvb::cvGreaterBlob(blobs)];
cvb::cvRenderBlob(labelImg, greaterBlob, &forground_ipl, imgOut);
CvContourPolygon *polygon = cvConvertChainCodesToPolygon(&greaterBlob->contour);
}
"polygon" contains the contour I need.
goodFeaturesToTrack is implemented this way:
- (std::vector<cv::Point2f>)pointsFromGoodFeaturesToTrack:(cv::Mat &)_image
{
std::vector<cv::Point2f> corners;
cv::goodFeaturesToTrack(_image,corners, 100, 0.01, 10);
return corners;
}
So next I need to loop through the corners and check each point with cv::pointPolygonTest(), right?
You can create a mask over your region of interest:
EDIT
How to make a mask:
Make a mask;
Mat mask(origImg.size(), CV_8UC1);
mask.setTo(Scalar::all(0));
// here I assume your contour is extracted with findContours,
// and is stored in a vector<vector<Point>>
// and that you know which contour is the blob
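// if it's not the case, use fillPoly instead of drawContours();
Scalar color(255,255,255); // white; the mask is single-channel, so only the first value is used
drawContours(mask, contours, contourIdx, color, CV_FILLED); // CV_FILLED fills the interior; the default thickness of 1 draws only the outline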
// fillPoly(Mat& img, const Point** pts, const int* npts,
// int ncontours, const Scalar& color)
And now you're ready to use it. BUT, look carefully at the result - I have heard about some bugs in OpenCV regarding the mask parameter for feature extractors, and I am not sure if it's about this one.
// note the mask parameter:
void goodFeaturesToTrack(InputArray image, OutputArray corners, int maxCorners,
double qualityLevel, double minDistance,
InputArray mask=noArray(), int blockSize=3,
bool useHarrisDetector=false, double k=0.04 )
This can also improve the speed of your application: goodFeaturesToTrack takes a huge amount of time, and if you apply it only to a smaller image, the overall gain is significant.
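For the alternative the question describes (keeping only the corners that fall inside the blob's contour), each corner can be tested with `cv::pointPolygonTest(contour, pt, false)`, which returns +1 inside, 0 on the edge and -1 outside; keep the points where the result is >= 0. The cvBlob polygon would first need converting to a `std::vector<cv::Point>`. The stdlib-only sketch below shows the same filtering using the even-odd (ray casting) rule instead of OpenCV; `Pt`, `insidePolygon` and `keepInside` are hypothetical names, and points lying exactly on an edge may be classified either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt {
    float x;
    float y;
};

// Even-odd rule: cast a horizontal ray from p to the right and count how
// many polygon edges it crosses; an odd count means p is inside.
bool insidePolygon(const std::vector<Pt>& poly, Pt p) {
    if (poly.size() < 3) return false;
    bool inside = false;
    for (std::size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        const Pt& a = poly[i];
        const Pt& b = poly[j];
        // Only edges that straddle the ray's y coordinate can be crossed,
        // which also guarantees a.y != b.y in the division below.
        if ((a.y > p.y) != (b.y > p.y)) {
            float xCross = (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x;
            if (p.x < xCross) inside = !inside;
        }
    }
    return inside;
}

// Keeps only the corners (e.g. from goodFeaturesToTrack) inside the contour.
std::vector<Pt> keepInside(const std::vector<Pt>& corners,
                           const std::vector<Pt>& contour) {
    std::vector<Pt> kept;
    for (const Pt& c : corners)
        if (insidePolygon(contour, c)) kept.push_back(c);
    return kept;
}
```

Filtering afterwards still runs the detector on the whole image, so the mask approach above remains the better choice when speed matters.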