iOS Tesseract OCR: why is recognition so poor? Engine principle

I have a question about the Tesseract OCR principle. As far as I understand it, after shape detection the symbols (their forms) are scaled (resized) to some specific font size.
That font size comes from the trained data: the training set defines the symbols (their geometry and shape), and perhaps their representation.
I am using Tesseract 3.01 (the latest version) on the iOS platform.
I checked the Tesseract FAQ and looked at the forum, but I still do not understand why I get low recognition quality for some images.
It is said that the font should be bigger than 12pt and the image should have more than 300 DPI. I did all the necessary preprocessing, such as blurring (where needed) and contrast enhancement.
I even used the other engine in Tesseract OCR, called CUBE.
But for some images, despite the fact that they are large (MIN(width, height) > 1000; I rescale them for Tesseract), I get bad recognition results:
http://goo.gl/l9uJMe
However, on another set of images the results are better:
http://goo.gl/cwA9DC
Those images are smaller, so I do not resize them (I just convert them to grayscale).
If what I wrote about the engine is correct, then suppose the trained set is based on a 14pt font: symbols from the pictures are resized to some specific size, and I do not see any reason why they would not be recognized in that case.
I also tried custom dictionaries to penalize non-dictionary words, but that did not benefit recognition much:
tesseract = new tesseract::TessBaseAPI();
GenericVector<STRING> variables_name(1), variables_value(1);
variables_name.push_back("user_words_suffix");
variables_value.push_back("user-words");
int retVal = tesseract->Init([self.tesseractDataPath cStringUsingEncoding:NSUTF8StringEncoding], NULL,
                             tesseract::OEM_TESSERACT_ONLY, NULL, 0,
                             &variables_name, &variables_value, false);
BOOL ok = (retVal == 0);
ok &= tesseract->SetVariable("language_model_penalty_non_dict_word", "0.2");
ok &= tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "0.2");
if (!ok)
{
    NSLog(@"Error initializing tesseract!");
}
So my question is: should I train Tesseract on another font?
And, honestly speaking, why should I train it at all? With the default trained data I get good recognition for text from the Internet or from a PC/Mac screenshot.
I also checked the original Tesseract English trained data; it has 38 TIFF files belonging to the following font families:
1) Arial
2) Verdana
3) Trebuchet
4) Times
5) Georgia
6) Courier
It seems that the font in my images does not belong to this set.

In your case the size of the image is not the problem. As I can see from your attached images (and I'm surprised that nobody mentioned it before), the problem is that the text in the images that give bad results is not placed on straight lines.
One of the things that Tesseract does at an early stage of the OCR process is detecting the image layout and extracting whole lines of text.
This image is the best example to illustrate that part of the process:
As you can see, the engine expects the text lines to be perpendicular to the edge of the image.

If you are done with all the necessary image processing, then try this; it may help you:
CGSize size = [image size];
int width = size.width;
int height = size.height;
uint32_t *_pixels = (uint32_t *)malloc(width * height * sizeof(uint32_t));
if (!_pixels) {
    return; // Invalid image
}
// Clear the pixels so any transparency is preserved
memset(_pixels, 0, width * height * sizeof(uint32_t));
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
// Create a context with RGBA pixels
CGContextRef context = CGBitmapContextCreate(_pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                             kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);
// Paint the bitmap into our context, which fills in the _pixels array
CGContextDrawImage(context, CGRectMake(0, 0, width, height), [image CGImage]);
// We're done with the context and color space
CGContextRelease(context);
CGColorSpaceRelease(colorSpace);
_tesseract->SetImage((const unsigned char *)_pixels, width, height, sizeof(uint32_t), width * sizeof(uint32_t));
_tesseract->SetVariable("tessedit_char_whitelist", ".#0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/-!");
_tesseract->SetVariable("tessedit_consistent_reps", "0");
char *utf8Text = _tesseract->GetUTF8Text();
NSString *str = nil;
if (utf8Text) {
    str = [NSString stringWithUTF8String:utf8Text];
    delete[] utf8Text; // GetUTF8Text allocates the string with new[]
}
free(_pixels);

Related

Improving Tesseract OCR Quality Fails

I am currently using Tesseract to scan receipts. The quality wasn't good, so I read this article on how to improve it: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#noise-removal. I implemented resizing, deskewing (aligning), and Gaussian blur, but none of them seems to have a positive effect on the accuracy of the OCR except the deskewing. Here is my code for resizing and Gaussian blur. Am I doing anything wrong? If not, what else can I do to help?
Code:
+ (UIImage *)prepareImage:(UIImage *)image {
    // Convert the UIImage to Mat format
    Mat im = cvMatWithImage(image);
    // Grayscale the image
    Mat gray;
    cvtColor(im, gray, CV_BGR2GRAY);
    // Deskew the text
    // (deskewing code not shown because I know it works)
    Mat preprocessed = preprocess2(gray);
    double skew = hough_transform(preprocessed, im);
    Mat rotated = rot(im, skew * CV_PI / 180);
    // Resize the image
    Mat scaledImage = scaleImage(rotated, 2);
    // Gaussian blur
    GaussianBlur(scaledImage, scaledImage, cv::Size(1, 1), 0, 0);
    return UIImageFromCVMat(scaledImage);
}

// Organization -> Resizing
Mat scaleImage(Mat mat, double factor) {
    Mat resizedMat;
    double width = mat.cols;
    double height = mat.rows;
    double aspectRatio = width / height;
    resize(mat, resizedMat, cv::Size(width * factor * aspectRatio, height * factor * aspectRatio));
    return resizedMat;
}
Receipt:
If you read the Tesseract documentation you will see that the engine works best with single lines of text in a rectangular block. Passing it the whole receipt image reduces the engine's accuracy. What you need to do is use the iOS Core Image text detector (CIDetector with CIDetectorTypeText, whose results are CITextFeature objects) to split the text on your receipt into multiple image blocks. Only then should you pass those images to Tesseract for processing.

Find average color of an area inside UIImageView [duplicate]

I am writing this method to calculate the average R, G, B values of an image. The following method takes a UIImage as input and returns an array containing the R, G, B values of the input image. I have one question, though: how/where do I properly release the CGImageRef?
- (NSArray *)getAverageRGBValuesFromImage:(UIImage *)image
{
    CGImageRef rawImageRef = [image CGImage];
    // This function returns the raw pixel values
    const UInt8 *rawPixelData = CFDataGetBytePtr(CGDataProviderCopyData(CGImageGetDataProvider(rawImageRef)));
    NSUInteger imageHeight = CGImageGetHeight(rawImageRef);
    NSUInteger imageWidth = CGImageGetWidth(rawImageRef);
    // Here I sum the R, G, B values and average them over the whole image
    int i = 0;
    unsigned int red = 0;
    unsigned int green = 0;
    unsigned int blue = 0;
    for (int column = 0; column < imageWidth; column++)
    {
        int r_temp = 0;
        int g_temp = 0;
        int b_temp = 0;
        for (int row = 0; row < imageHeight; row++) {
            i = (row * imageWidth + column) * 4;
            r_temp += (unsigned int)rawPixelData[i];
            g_temp += (unsigned int)rawPixelData[i + 1];
            b_temp += (unsigned int)rawPixelData[i + 2];
        }
        red += r_temp;
        green += g_temp;
        blue += b_temp;
    }
    NSNumber *averageRed = [NSNumber numberWithFloat:(1.0 * red) / (imageHeight * imageWidth)];
    NSNumber *averageGreen = [NSNumber numberWithFloat:(1.0 * green) / (imageHeight * imageWidth)];
    NSNumber *averageBlue = [NSNumber numberWithFloat:(1.0 * blue) / (imageHeight * imageWidth)];
    // Then I store the result in an array
    NSArray *result = [NSArray arrayWithObjects:averageRed, averageGreen, averageBlue, nil];
    return result;
}
I tried two things:
Option 1:
I leave it as it is, but then after a few cycles (5+) the program crashes and I get the "low memory warning error"
Option 2:
I add one line
CGImageRelease(rawImageRef)
before the method returns. Now it crashes after the second cycle: I get an EXC_BAD_ACCESS error for the UIImage that I pass to the method. When I try to Analyze (instead of Run) in Xcode, I get the following warning on this line:
"Incorrect decrement of the reference count of an object that is not owned at this point by the caller"
Where and how should I release the CGImageRef?
Thanks!
Your memory issue results from the copied data, as others have stated. But here's another idea: use Core Graphics's optimized pixel interpolation to calculate the average.
Create a 1x1 bitmap context.
Set the interpolation quality to medium (see later).
Draw your image scaled down to exactly this one pixel.
Read the RGB value from the context's buffer.
(Release the context, of course.)
This might result in better performance because Core Graphics is highly optimized and might even use the GPU for the downscaling.
Testing showed that medium quality seems to interpolate pixels by taking the average of color values. That's what we want here.
Worth a try, at least.
Edit: OK, this idea seemed too interesting not to try. So here's an example project showing the difference. The measurements below were taken with the included 512x512 test image, but you can change the image if you want.
It takes about 12.2 ms to calculate the average by iterating over all pixels in the image data. The draw-to-one-pixel approach takes 3 ms, so it's roughly 4 times faster. It seems to produce the same results when using kCGInterpolationQualityMedium.
I assume the huge performance gain comes from Quartz noticing that it does not have to decompress the JPEG fully and can use only the lower-frequency parts of the DCT. That's an interesting optimization strategy when compositing JPEG-compressed pixels at a scale below 0.5. But I'm only guessing here.
Interestingly, when using your method, 70% of the time is spent in CGDataProviderCopyData and only 30% in the pixel-data traversal. This hints at a lot of time being spent in JPEG decompression.
Note: here's a late follow-up on the example image above.
You don't own the CGImageRef rawImageRef because you obtain it using [image CGImage]. So you don't need to release it.
However, you own rawPixelData because you obtained it using CGDataProviderCopyData and must release it.
CGDataProviderCopyData
Return Value:
A new data object containing a copy of the provider’s data. You are responsible for releasing this object.
I believe your issue is in this statement:
const UInt8 *rawPixelData = CFDataGetBytePtr(CGDataProviderCopyData(CGImageGetDataProvider(rawImageRef)));
You should be releasing the return value of CGDataProviderCopyData.
Your mergedColor works great on an image loaded from a file, but not on an image captured by the camera, because CGBitmapContextGetData() on a context created from a captured sample buffer doesn't return its bitmap. I changed your code to the following; it works on any image and is as fast as your code.
- (UIColor *)mergedColor
{
    CGImageRef rawImageRef = [self CGImage];
    // Scale the image down to a one-pixel image
    uint8_t bitmapData[4];
    int bitmapByteCount;
    int bitmapBytesPerRow;
    int width = 1;
    int height = 1;
    bitmapBytesPerRow = width * 4;
    bitmapByteCount = bitmapBytesPerRow * height;
    memset(bitmapData, 0, bitmapByteCount);
    CGColorSpaceRef colorspace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(bitmapData, width, height, 8, bitmapBytesPerRow, colorspace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedFirst);
    CGColorSpaceRelease(colorspace);
    CGContextSetBlendMode(context, kCGBlendModeCopy);
    CGContextSetInterpolationQuality(context, kCGInterpolationMedium);
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), rawImageRef);
    CGContextRelease(context);
    // The context is BGRA, so index 2 is red and index 0 is blue
    return [UIColor colorWithRed:bitmapData[2] / 255.0f
                           green:bitmapData[1] / 255.0f
                            blue:bitmapData[0] / 255.0f
                           alpha:1];
}
CFDataRef abgrData = CGDataProviderCopyData(CGImageGetDataProvider(rawImageRef));
const UInt8 *rawPixelData = CFDataGetBytePtr(abgrData);
...
CFRelease(abgrData);

Generating a 54 megapixel image on iPhone 4/4S and iPad 2

I'm currently working on a project that must generate a 9000x6000-pixel collage from 15 photos. The problem I'm facing is that when I finish drawing I get an empty image (those 15 images are not drawn into the context).
This problem only occurs on devices with 512 MB of RAM, like the iPhone 4/4S or iPad 2, and I think it is caused by the system failing to allocate enough memory for the app. When I run this line: UIGraphicsBeginImageContextWithOptions(outputSize, opaque, 1.0f); the app's memory usage rises by 216 MB and the total memory usage reaches ~240 MB of RAM.
The thing that I cannot understand is why on Earth the images that I'm trying to draw within the for loop are not always rendered into the currentContext. I emphasize the word always, because only once in 30 tests were the images rendered (without changing a single line of code).
Question nr. 2: if this is a problem caused by the system failing to allocate enough memory, is there another way to generate this image, such as a CGContextRef backed by a file output stream, so that the image isn't kept in memory?
This is the code:
CGSize outputSize = CGSizeMake(9000, 6000);
BOOL opaque = YES;
UIGraphicsBeginImageContextWithOptions(outputSize, opaque, 1.0f);
CGContextRef currentContext = UIGraphicsGetCurrentContext();
CGContextSetFillColorWithColor(currentContext, [UIColor blackColor].CGColor);
CGContextFillRect(currentContext, CGRectMake(0, 0, outputSize.width, outputSize.height));
for (NSUInteger i = 0; i < strongSelf.images.count; i++)
{
    @autoreleasepool
    {
        AGAutoCollageImageData *imageData = (AGAutoCollageImageData *)strongSelf.layout.images[i];
        CGRect destinationRect = CGRectMake(floorf(imageData.destinationRectangle.origin.x * scaleXRatio),
                                            floorf(imageData.destinationRectangle.origin.y * scaleYRatio),
                                            floorf(imageData.destinationRectangle.size.width * scaleXRatio),
                                            floorf(imageData.destinationRectangle.size.height * scaleYRatio));
        CGRect sourceRect = imageData.sourceRectangle;
        // Draw the clipped image
        CGImageRef clippedImageRef = CGImageCreateWithImageInRect(((ALAsset *)strongSelf.images[i]).defaultRepresentation.fullScreenImage, sourceRect);
        CGContextDrawImage(currentContext, destinationRect, clippedImageRef);
        CGImageRelease(clippedImageRef);
    }
}
// Pull the image from our context
strongSelf.result = UIGraphicsGetImageFromCurrentImageContext();
// Pop the context
UIGraphicsEndImageContext();
P.S: The console doesn't show anything but 'memory warnings', which are expected to see.
Sounds like a cool project.
Tactical: try also releasing imageData at the end of every loop iteration (explicitly, after releasing the clippedImageRef).
Strategic:
If you really do need to support such "low" RAM with such "high" input, consider two alternative options:
Compress (obviously): even minimal JPEG compression, invisible to the naked eye, can go a long way.
Split: never "really" merge the image. Keep an array-based data structure that represents the BigImage, and write utilities for the presentation logic.

Variable size of CGContext

I'm currently using UIGraphicsBeginImageContext(resultingImageSize); to create an image.
But when I call this function, I don't yet know the exact width of resultingImageSize.
Indeed, I've developed a kind of video processing that consumes lots of memory, and I cannot process first and then draw afterwards: I must draw during the video processing.
If I set, for example, UIGraphicsBeginImageContext(CGSizeMake(300, 400));, the part drawn beyond 400 is lost.
So is there a way to give a CGContext a variable size, or to resize a CGContext with very little memory consumption?
I found a solution: create a new, larger context each time the image must be resized. Here's the magic function:
void MPResizeContextWithNewSize(CGContextRef *c, CGSize s) {
    size_t bitsPerComponent = CGBitmapContextGetBitsPerComponent(*c);
    size_t numberOfComponents = CGBitmapContextGetBitsPerPixel(*c) / bitsPerComponent;
    CGContextRef newContext = CGBitmapContextCreate(NULL, s.width, s.height, bitsPerComponent,
                                                    sizeof(UInt8) * s.width * numberOfComponents,
                                                    CGBitmapContextGetColorSpace(*c),
                                                    CGBitmapContextGetBitmapInfo(*c));
    // Copy the old context's content into the new one
    CGImageRef im = CGBitmapContextCreateImage(*c);
    CGContextDrawImage(newContext, CGRectMake(0, 0, CGBitmapContextGetWidth(*c), CGBitmapContextGetHeight(*c)), im);
    CGImageRelease(im);
    CGContextRelease(*c);
    *c = newContext;
}
I wonder whether it could be optimized, for example with memcpy, as suggested here. I tried, but it made my code crash.

iOS Resize and crop not squared images - high quality

I'm facing the following problem: I have several UIImages (not square) and I need to resize and crop them. I have read almost every question on Stack Overflow, but the results I get are not good; I mean, the produced image has poor quality (it is blurry).
This is the scenario :
1) Original image size: width 208 pixels, height variable (e.g. from 50 to 2500)
2) Result images: width 100 pixels, height max 200 pixels
This is what I've done so far to achieve this result:
..... // missing code
CGFloat height = (100 * image.size.height) / image.size.width;
self.thumbnail = [image resizedImage:CGSizeMake(100, height)
                interpolationQuality:kCGInterpolationHigh];
..... // missing code
The method that I use to resize the image can be found here; once the image is resized, I crop it with the following code:
CGRect croppedRect = CGRectMake(0, 0, self.thumbnail.size.width, 200);
CGImageRef tmp = CGImageCreateWithImageInRect([self.thumbnail CGImage], croppedRect);
self.thumbnail = [UIImage imageWithCGImage:tmp];
CGImageRelease(tmp);
Long story short, the image is resized and cropped, but the quality is really poor, considering that the original image had really good quality.
So the question is: how can I achieve this while keeping the image quality high?
If you target iOS 4 or later, you should use ImageIO to resize images:
http://www.cocoabyss.com/coding-practice/uiimage-scaling-using-imageio/
