Tesseract OCR read text from photo - image-processing

I have more than 5000 images like the one below. The text in the images has about 10 variants, such as T122 R-2, T123 R-12, T45 R-1, etc.
I want to read the text in the image, but the result looks like this:
, | 5 ğ ve . > | >
İmes <ğ
de 3
U | | i
( ' g
i : .
İM ; »
| ? > e
* >
| NN | va
, İ
( di
(Ny ,
i
>
i |
, > N i | Ni
How can I improve the OCR accuracy?
I have tried various filters, but the result is almost the same.
I am familiar with Tesseract, so it would be easiest for me to keep using it, but if there is a better OCR engine I can try that as well.
P.S. I know Google Vision gives better results, but I couldn't find a way to automate it.

Any OCR should be able to handle that image without a problem if you roughly deskew it first,
so use the CLI deskew tool from https://github.com/galfar/deskew.
Read the options; in this case the skew-angle limit needed to be raised with -a 89:
deskew -a angle: Maximal expected skew angle (both directions) in degrees (default: 10)
It is also worth adding -b 000000 to fill the background with black, and setting the output to .pgm.
Deskew 1.30 (2019-06-07) x64 by Marek Mauder
http://galfar.vevb.net/deskew/
Preparing input image (yRXJb.jpg [2000x1500/R8G8B8]) ...
Calculating skew angle...
Skew angle found [deg]: -75.025
Rotating image...
Saving output (\deskewed-yRXJb.pgm [1966x2320/Gray8]) ...
Done!
For Tesseract there may be better adjustments in some cases, such as the threshold, but I left it at the default for this one test case. Tesseract needs either --psm 11 or --psm 12 to see the text in this busy image.
I would have hoped for T122 -R-2 but got T122 R-2, and I can't seem to do better than that without considering training.
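If you want to drive the same two steps from a script, here is a minimal sketch in Python. It assumes the deskew binary is on your PATH and that pytesseract and Pillow are installed; the file names are simply taken from the log above and would need adjusting to your own setup.
import subprocess
import pytesseract
from PIL import Image

# Same options as above: widen the allowed skew angle (-a 89) and fill the
# background with black (-b 000000).
subprocess.run(["deskew", "-a", "89", "-b", "000000", "yRXJb.jpg"], check=True)

# Output name taken from the log above; adjust it to however you direct
# deskew's output (e.g. if you do not force .pgm output).
deskewed = Image.open("deskewed-yRXJb.pgm")

# --psm 11 (sparse text) was needed for Tesseract to see text in this busy image.
print(pytesseract.image_to_string(deskewed, config="--psm 11"))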

Tesseract has many corner cases that produce poor results. One of these is rotation. The documentation says:
The quality of Tesseract’s line segmentation reduces significantly if
a page is too skewed, which severely impacts the quality of the OCR.
Before getting to the others: the biggest problem in your case is rotation, so you should fix that first.
The second is noise, which shows up as spurious characters in the result. To improve this, check the Tesseract documentation (Improving the quality of the output), which is very clear.
I can also suggest the method I explained here before: first detect the text region, then recognize the letters inside it. This will help you avoid recognizing unexpected characters.
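A rough illustration of that detect-then-recognize idea (this is not the exact method from the linked answer; the file name, the morphology kernel size, and the minimum box area are assumptions you would tune for your images):
# Sketch: find candidate text regions first, then OCR only those crops.
import cv2
import pytesseract

img = cv2.imread("plate.jpg")                    # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Binarize and dilate so the characters of one label merge into a single blob.
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
blobs = cv2.dilate(thresh, kernel, iterations=1)

# OpenCV 4 return signature (contours, hierarchy).
contours, _ = cv2.findContours(blobs, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h < 500:                              # skip tiny blobs (noise)
        continue
    roi = gray[y:y + h, x:x + w]
    # --psm 7 treats the crop as a single text line.
    text = pytesseract.image_to_string(roi, config="--psm 7").strip()
    if text:
        print((x, y, w, h), text)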

Related

Image preprocessing methods for identifying an industrial part's name (stuck on or engraved) on its surface?

I am working on a project where my task is to identify a machine part by its part number, which is either written on a label attached to the part or engraved on its surface. One example of each (label and engraving) is shown in the figures below.
My task is to recognise a 9- or 10-character alphanumeric string (03C 997 032 D in the first image and 357 955 531 in the second). This seems like an easy task; however, I am having trouble distinguishing the useful information from the rest of the part, i.e. there are many other numbers and characters in both images and I want to focus only on the numbers mentioned. I have tried many things but had no success so far. Does anyone know which image preprocessing methods or ML/DL models I should apply to get the desired result?
Thanks in advance!
JD
You can use OCR to get all the characters from the image and then use regular expressions to extract the desired patterns.
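For example, a minimal sketch of that OCR-plus-regex approach (the file name is a placeholder, and the pattern is only a guess based on the two examples "03C 997 032 D" and "357 955 531"; adjust it to your real numbering scheme):
import re
import pytesseract
from PIL import Image

# OCR the whole label/part image.
text = pytesseract.image_to_string(Image.open("label.jpg"))

# Three groups of 3 alphanumerics, optionally followed by a single letter.
pattern = re.compile(r"\b[0-9A-Z]{3}\s+[0-9A-Z]{3}\s+[0-9A-Z]{3}(?:\s+[A-Z])?\b")
print(pattern.findall(text))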
You can use an OCR engine, like Tesseract.
You may want to clean the images before running the text-recognition step by filtering out noise and removing extra information, for example:
Convert to grayscale (colours are not relevant, are they?)
Crop to the region of interest
Canny filter
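Those cleanup steps might look roughly like this (a sketch only; the file name and crop coordinates are placeholders, and whether the Canny edges actually help the OCR is something to verify on your images, since a plain threshold may feed Tesseract better):
import cv2

img = cv2.imread("part.jpg")                      # placeholder file name

# 1. Convert to grayscale (colour is not relevant here).
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 2. Crop to the region of interest around the part number.
y0, y1, x0, x1 = 100, 300, 50, 600                # placeholder coordinates
roi = gray[y0:y1, x0:x1]

# 3. Canny edge filter (mainly useful for locating the text).
edges = cv2.Canny(roi, 50, 150)

cv2.imwrite("part_preprocessed.png", edges)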
A good starting point could be one of these tutorials:
OpenCV OCR with Tesseract (Python API)
Recognizing text/number with OpenCV (C++ API)

Tesseract & OpenCV - Processing Image

I"m trying to process this chalkboard image I recorded with tesseract, but am having no luck. I thought about doing pre-processing to help improve our results but am unsure of what methods to call on it.
Here is the image I have:
And here are the tesseract commands I'm running:
convert 468.jpg -bordercolor black -border 20x20 468-b.jpg
tesseract 468-b.jpg - -psm 11
Please note it doesn't have to process all the math symbols correctly, but it should at least get the 1 + 4 = 5 and maybe the 2 x 12 = 24 x 1.
I had a previous photo that worked well using just these commands. Here's the photo:
And here are the results it would spit out:
I+I
2+2m
It's not perfect, but it was much better than what I was getting before. How can I improve the results for my new chalkboard image? Do I need to use OpenCV? If so, an example implementation would be very, very helpful.
Thanks in advance.
P.S. Here is the original question I asked on Tesseract's GitHub that led me to better results:
https://github.com/tesseract-ocr/tesseract/issues/468
Tesseract cannot recognize handwritten text well, so I think you should use deep learning to recognize the handwriting.
Here is a tutorial you may be interested in:
https://www.tensorflow.org/versions/r0.12/tutorials/mnist/beginners/index.html#mnist-for-ml-beginners
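That tutorial targets an old TensorFlow release; the same idea in the current tf.keras API looks roughly like the sketch below. It only learns the digits 0-9 from MNIST, so the chalkboard's '+', 'x', and '=' symbols would need their own training data.
import tensorflow as tf

# Load the MNIST handwritten-digit dataset and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected classifier, as in the beginner tutorial.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))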

Scan video for text string?

My goal is to find the title screen from a movie trailer. I need a service where I can search a video for a string and get back the frame containing that string. Pretty obscure; does anything like this exist?
e.g. for this movie, I'd scan for "Sausage Party" and retrieve this frame:
Edit: I found the CloudSight API, which would actually work, except the cost is prohibitive: $0.04 per call, assuming I need to split the video into 1-second intervals and scan every image (at least 60 calls per video).
No exact service that I can find, but you could attempt to do this yourself...
ffmpeg -i sausage_party.mp4 -r 1 %04d.png
/usr/local/bin/parallel --no-notice -j 8 \
/usr/local/bin/tesseract -psm 6 -l eng {} {.} \
::: *.png
This extracts one frame per second from the video file, and then uses tesseract to extract the text via OCR into files named after the corresponding image frame (e.g. 0135.txt). However, your results are going to vary massively depending on the font used and the quality of the video file.
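Once the per-frame .txt files exist, finding the frame that mentions the title can be scripted; a minimal sketch (assuming the text files produced above sit in the current directory and that a plain case-insensitive substring match is enough):
import glob

query = "sausage party"
for txt in sorted(glob.glob("*.txt")):
    with open(txt, errors="ignore") as f:
        if query in f.read().lower():
            # 0135.txt came from frame 0135.png, i.e. second 135 of the video.
            print(txt.replace(".txt", ".png"))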
You'd probably find it cheaper/easier to use something like Amazon Mechanical Turk, especially since OCR is going to have a hard time doing this automatically.
Another option could be to implement this service yourself using the Scene Text Detection and Recognition module in OpenCV (docs.opencv.org/3.0-beta/modules/text/doc/text.html). You can take a look at this video to get an idea of how such a system operates. As pointed out above, the accuracy will depend on the font used in the movie titles, the quality of the video files, and the OCR.
OpenCV relies on Tesseract as the underlying OCR, but alternatively you could use the text detection and localization functions (docs.opencv.org/3.0-beta/modules/text/doc/erfilter.html) in OpenCV to find text areas in the image and then employ a different OCR engine to perform the recognition. The text detection and localization stage can be done very quickly, so achieving real-time performance would mostly be a matter of picking a fast OCR.

setting threshold and batch processing in ImageJ (FIJI) macro

I know this has been posted elsewhere and that this is by no means a difficult problem, but I'm very new to writing macros in FIJI and am having a hard time even understanding the solutions described in various online resources.
I have a series of images, all in the same folder, and want to apply the same operations to each of them and save the resulting Excel files and images in an output folder. Specifically, I'd like to open each image, smooth it, do a max-intensity Z projection, and then threshold the images at the same relative value.
The thresholding is the one step causing a problem. By relative value I mean that I would like to set the threshold so that the same percentage of the intensity histogram is included. Currently, in FIJI, if you go to Image ▶ Adjust ▶ Threshold you can move the sliders so that a certain percentage of the image is thresholded, and that value is displayed in the window. In my case 98% is what I am trying to achieve, i.e. thresholding all but the top 2% of the data.
Once the threshold is applied to the MIP, I convert it to binary, do particle analysis, and save the results (summary table, results, image overlay).
My approach has been to try to automate all the steps / do batch processing, but I have been having a hard time adapting what I have written based on the instructions found online. Instead, I've been opening every image in the directory one by one, applying the macro that I wrote, and then saving the results manually. Obviously this is a tedious approach, so any help would be much appreciated!
What I have been using for my simple macro:
run("Smooth", "stack");
run("Z Project...", "projection=[Max Intensity]");
setAutoThreshold("Default");
//run("Threshold...");
run("Convert to Mask");
run("Make Binary");
run("Analyze Particles...", " show=[Overlay Masks] display exclude clear include summarize in_situ");
You can use the Process ▶ Batch ▶ Macro... command for this.
For further details, see the Batch Processing page of the ImageJ wiki.

Tesseract on iOS - bad results

After spending over 10 hours compiling Tesseract with libc++ so it works with OpenCV, I'm having trouble getting any meaningful results. I'm trying to use it for digit recognition; the image data I'm passing in is a small square (50x50) image containing either one digit or none.
I've tried using both the eng and equ tessdata (from Google Code); the results differ, but neither guesses the digits. Using the eng data I get '4\n\n' or '\n\n' as the result most of the time (even when there's no digit in the image), with confidence anywhere from 1 to 99.
Using the equ data I get '\n\n' with confidence 0-4.
I also tried binarizing the image, and the results are more or less the same. I don't think there's a need for it anyway, since the images are already filtered pretty well.
I'm assuming something is wrong, since these images should be easy to recognize compared to even the simplest of the example images.
Here's the code:
Initialization:
_tess = new TessBaseAPI();
_tess->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
_tess->SetVariable("tessedit_char_whitelist", "0123456789");
_tess->SetVariable("classify_bln_numeric_mode", "1");
Recognition:
char *text = _tess->TesseractRect(imageData, (int)bytes_per_pixel, (int)bytes_per_line, 0, 0, (int)imageSize.width, (int)imageSize.height);
I'm getting no errors. TESSDATA_PREFIX is set properly and I've tried different methods for recognition. imageData looks ok when inspected.
Here are some sample images:
http://imgur.com/a/Kg8ar
Should this work with the regular training data?
Any help is appreciated; this is my first time trying Tesseract out and I could have missed something.
EDIT:
I've found this:
_tess->SetPageSegMode(PSM_SINGLE_CHAR);
I'm assuming it should be used in this situation; I tried it but got the same results.
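One way to sanity-check whether the problem is the iOS integration or the crops themselves is to run the same configuration through a desktop Tesseract install. A minimal sketch with pytesseract (the file name is a placeholder; --psm 10 is the command-line counterpart of PSM_SINGLE_CHAR, and the whitelist mirrors the SetVariable call above):
import pytesseract
from PIL import Image

crop = Image.open("digit_crop.png")        # placeholder 50x50 crop

# --psm 10 = treat the image as a single character (PSM_SINGLE_CHAR).
config = "--psm 10 -c tessedit_char_whitelist=0123456789"
print(repr(pytesseract.image_to_string(crop, config=config)))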
I think Tesseract is a bit overkill for this task. You would be better off with a simple neural network trained explicitly on your images. At my company, we recently tried to use Tesseract on iOS for an OCR task (scanning utility bills with the camera), but it was too slow and inaccurate for our purposes (scanning took more than 30 seconds on an iPhone 4 at a tremendously low FPS). In the end, I trained a neural network specifically for our target font, and that solution not only beat Tesseract (it could scan flawlessly even on an iPhone 3GS), but also a commercial ABBYY OCR engine we were given a sample of.
This course's material would be a good starting point for machine learning.
