pytesseract - Read text from images with more accuracy - opencv

I am working on pytesseract. I want to read data from a driving licence or a similar document. At the moment I am converting the .jpg image to binary (grayscale) format using OpenCV, but I am not getting accurate results. How do you solve this? Is there a standard image size?

Localize your detection by setting the rectangles where Tesseract has to look. You can then restrict, per rectangle, which type of data is expected at that place (for example numerical, alphabetic, etc.). You can also give Tesseract a dictionary file to improve accuracy (this can be used for detecting the card holder's name by listing common names in a file). If there is disturbance in the background, design a filter to remove it. Good luck!
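A minimal sketch of both ideas with pytesseract and OpenCV; the file name, rectangle coordinates and the names.txt word list are placeholders to replace with your own layout, and the --user-words hint may behave differently depending on your Tesseract version and engine mode:

import cv2
import pytesseract

# Placeholder path; replace with your own scan.
img = cv2.imread("license.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Crop the rectangle where the licence number is printed (coordinates are guesses).
x, y, w, h = 40, 120, 300, 40
number_roi = gray[y:y + h, x:x + w]

# Restrict this field to digits: treat it as a single text line and whitelist 0-9.
number = pytesseract.image_to_string(
    number_roi,
    config="--psm 7 -c tessedit_char_whitelist=0123456789",
)

# For the name field, a word list of common names can bias recognition.
name_roi = gray[10:50, 40:340]  # another guessed rectangle
name = pytesseract.image_to_string(name_roi, config="--psm 7 --user-words names.txt")

print(number.strip(), name.strip())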

Related

How to overlay the RTDOSE and Image data in the correct position?

I'm currently an MS student in Medical Physics and I have a great need to be able to overlay an isodose distribution from an RTDOSE file onto a CT image from a .dcm file set.
I've managed to extract the image and the dose pixel arrays myself using pydicom and dicom_numpy, but the two arrays are not the same size! So, if I overlay the two together, the dose will not be in the correct position based on what the Elekta Gamma Plan software exported it as.
I've played around with dicompyler and 3DSlicer, and they obviously are able to do this even though the arrays are not the same size. However, I don't think I can export the numerical data when using those programs; I can only scroll through and view it as an image. How can I overlay the RTDOSE onto a CT image?
Thank you
For what you want, it sounds like you should use SimpleITK (or an equivalent; my experience is with sitk) to do the DICOM handling, not pydicom.
DICOM has a complete built-in system for 3D point and location specification for all the pixel data in patient coordinates. This uses a bunch of attributes in the DICOM files in the Image Plane Module set of tags. See here for a good overview.
SimpleITK fully understands and uses the full 3D Image Plane tags to identify and locate any images in patient coordinates by default, irrespective of things such as the specific pixel spacing, slice thickness, etc.
So - in your case - if you use SITK to open your studies, then you should be able to overlay them correctly "out of the box", because SITK will do all the work to parse the Image Plane Module tags and locate the data in patient coordinates - just like you get with 3DSlicer.
Pydicom, in contrast, doesn't itself try to use any of that information at all. It only gives you the raw pixel arrays (for images).
Note I use both pydicom and SITK. This isn't something bad about pydicom, but more a question of right tool for the job. In fact, for many (most?) things I use pydicom, but for any true 3D type work, SITK is the easier toolkit to use.
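A minimal sketch of that with SimpleITK, assuming a CT series in ct_series/ and an RTDOSE export in rtdose.dcm (both placeholders); note the resampled dose is still in stored pixel values, so the DoseGridScaling tag may still need to be applied to get Gy:

import SimpleITK as sitk

# Paths are placeholders for your own export.
ct_dir = "ct_series/"
dose_file = "rtdose.dcm"

# Read the CT series; SimpleITK parses the Image Plane Module tags
# (origin, spacing, direction) so the volume is placed in patient coordinates.
reader = sitk.ImageSeriesReader()
reader.SetFileNames(reader.GetGDCMSeriesFileNames(ct_dir))
ct = reader.Execute()

# Read the RTDOSE grid the same way.
dose = sitk.ReadImage(dose_file)

# Resample the dose onto the CT grid so the two arrays line up voxel for voxel.
dose_on_ct = sitk.Resample(dose, ct, sitk.Transform(), sitk.sitkLinear, 0.0, dose.GetPixelID())

# Both arrays now have the same shape and can be overlaid directly.
ct_array = sitk.GetArrayFromImage(ct)
dose_array = sitk.GetArrayFromImage(dose_on_ct)
print(ct_array.shape, dose_array.shape)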

Best practices for tesseract ocr using python

I'm working on a project in which I want to recognize text from a credit-card-sized document. The document contains details like name, phone number, address, etc. I'm capturing the image and passing it into the Tesseract engine using
text = pytesseract.image_to_string(Image.open(filename), lang = 'eng'). Sometimes I get decent results for each field, but most of the time the result is very bad. How do I resolve this issue? What are the best practices? How do document readers work with OCR? Is it possible to do region-based OCR on the document?
A single approach can't read every kind of text; you have to apply different approaches to different types of document.
If the text is not horizontal, you have to rotate it. If the text is skewed or curved, you have to estimate and undo that distortion first (e.g. with a Hough transform to find the dominant text angle); see the sketch below.
In short, for the package to read text reliably, the text should be clear and horizontal. Otherwise you need to create rules and transform the image first.
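A rough deskewing sketch with OpenCV and pytesseract, assuming the input is a photo with roughly straight (just rotated) text; the file name, Canny thresholds and Hough parameters are arbitrary choices to tune:

import cv2
import numpy as np
import pytesseract

# "scan.jpg" is a placeholder for your own capture.
img = cv2.imread("scan.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Find near-horizontal line segments and take their median angle as the skew.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=gray.shape[1] // 4, maxLineGap=20)
angles = []
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 45:  # keep only roughly horizontal segments
            angles.append(angle)

skew = float(np.median(angles)) if angles else 0.0

# Rotate the whole image back to horizontal before OCR.
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), skew, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h),
                          flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)

print(pytesseract.image_to_string(deskewed))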

How can I improve Tesseract results quality?

I'm trying to read the NIRPP number (social security number) from a French vital card using Tesseract's OCR (I'm using TesseractOCRiOS 4.0.0). So here is what I'm doing:
First, I request a picture of the whole card :
Then, using a custom cropper, I ask the user to zoom specifically on the card number:
And then I take this image (1291x202 px) and, using Tesseract, I try to read the number:
let tesseract = G8Tesseract(language: "eng")
tesseract?.image = pickedImage
tesseract?.recognize()
print("\(tesseract?.recognizedText ?? "")")
But I'm getting pretty bad results: only about 30% of the time is Tesseract able to find the right number, and even then I sometimes need to trim some characters (like alpha characters, dots, dashes...).
So is there a solution for me to improve these results?
Thanks for your help.
To improve your results:
1. Zoom your image to an appropriate level. The right amount of zoom will improve your accuracy by a lot.
2. Configure Tesseract so that only digits are whitelisted (I am assuming that what you are trying to extract contains only digits). Whitelisting only digits will improve your chances of recognizing 0 as the digit zero and not the letter O.
3. If your extracted text should match a regex, configure Tesseract to use that regex as well.
4. Preprocess your image to remove any background colour, and apply morphology effects such as erosion to increase the space between your characters/digits. If they are too close together, Tesseract will have a hard time recognizing them correctly. Most image processing libraries come with those effects built in.
5. Use TIFF as the image format.
Once you have the right preprocessing pipeline and configuration for Tesseract, you will usually get a very good and consistent result.
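A rough pytesseract version of such a pipeline (the question itself uses TesseractOCRiOS, so this is only an illustration of the same steps); the file name, zoom factor and kernel size are placeholders to tune on real crops:

import cv2
import pytesseract

# "nirpp_crop.png" stands in for the 1291x202 crop of the card number.
img = cv2.imread("nirpp_crop.png")

# Upscale a little; Tesseract tends to do better with larger character heights.
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

# Drop background colour, then binarize.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Morphological opening with a small kernel to separate touching digits (size is a guess).
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# Single text line, digits only.
config = "--psm 7 -c tessedit_char_whitelist=0123456789"
text = pytesseract.image_to_string(binary, config=config)
print(text.strip())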
There are a couple of things you need to do:
1. Apply black-and-white or grayscale conversion to the image. You can use built-in functionality such as the Graphics framework, or a third-party library like OpenCV or GPUImage, to apply the black-and-white or grayscale conversion.
2. Then apply text detection using the Vision framework. From Vision's text detection you can crop the text regions according to the detected coordinates.
3. Pass these cropped images (the detected text) to TesseractOCRiOS.
I hope it will work for your use case.
Thanks
I have a similar issue. I discovered that Tesseract recognizes text reliably only if the given image contains just the region of interest.
I solved the problem using Apple's Vision framework. It has VNDetectTextRectanglesRequest, which returns the CGRect of detected text relative to the image. You can then crop the image to the region where text is present and send it to Tesseract for recognition.
Ray Smith says:
Since HP had independently-developed page layout analysis technology that was used in products, (and therefore not released for open-source) Tesseract never needed its own page layout analysis. Tesseract therefore assumes that its input is a binary image with optional polygonal text regions defined.

Parsing / Scraping information from an image

I am looking for a library that would help scrape the information from the image below.
I need the current value so it would have to recognise the values on the left and then estimate the value of the bottom line.
Any ideas if there is a library out there that could do something like this? Language isn't really important but I guess Python would be preferable.
Thanks
I don't know of any "out of the box" solution for this and I doubt one exists. If all you have is the image, then you'll need to do some image processing. A simple binarization method (like Otsu binarization) would make it easier to process, because after binarization the pixels are either "on" or "off."
The locations for the lines can be found by searching for some number of pixels that are all on horizontally (5 on in a row while iterating on the x axis?).
Then a possible solution would be to pass the image to an OCR engine to get the numbers (Tesseract is an open-source OCR engine, written in C++ and hosted at Google). You'd still have to find out where the numbers are in the image by iterating through it.
Then, you'd have to find where the lines are relative to the keys on the left and do a little math and you can get your answer.
OpenCV is a beefy computer vision library that has things like the binarization. It is also a C++ library.
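For illustration, a small sketch of the binarization step with OpenCV plus a naive scan for rows containing a horizontal run of "on" pixels, using the 5-in-a-row figure mentioned above; the file name is a placeholder:

import cv2

# "chart.png" is a placeholder for the screenshot in question.
gray = cv2.imread("chart.png", cv2.IMREAD_GRAYSCALE)

# Otsu binarization: pixels become either 0 ("off") or 255 ("on").
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Naive detection of horizontal line segments: rows containing a run of
# at least 5 consecutive "on" pixels (the run length is an arbitrary choice).
def rows_with_horizontal_runs(img, min_run=5):
    hits = []
    for y in range(img.shape[0]):
        run = 0
        for value in img[y]:
            run = run + 1 if value else 0
            if run >= min_run:
                hits.append(y)
                break
    return hits

print(rows_with_horizontal_runs(binary)[:10])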
Hope that helps.

Semi-Automatic Text Highlighting in Images?

Greetings Overflowers,
Given that:
- I have images of documents with text of mixed languages
- I need this text to be highlightable (word by word) by end users
- I have this text in plain digital format already
- I will help my program to figure out where words are
- I do not want my help to be tedious to me
- I will also manually fix small inaccuracies after my program has run
What is the best, easiest help I can provide for my program to be able to draw rectangles around selected words? What algorithm would you use for this program? I tried OCR tools like OmniPage Pro, but they do not provide this functionality.
Regards
I implemented word bounding boxes and word highlighting in an application some years ago. You said "I have this text in plain digital format". One key component is to have coordinates of the characters or words in order to map them to the proper image areas, as in a searchable PDF: when you select text, the selection is internally mapped to the image layer, and conversely a selection on the image selects the matching text. But even from PDF those coordinates cannot be exported, I believe.
If no such coordinate information currently exists for your text, the easiest route is probably to re-OCR the images with a high-quality engine that can produce coordinates as part of its output. If you were to use WiseTREND OCR Cloud 2.0, its XML output would give you all that detailed metadata. Once the coordinate information exists, all the major components are there and the rest is a matter of efficient UI design.
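If re-OCRing the images is acceptable, word coordinates can also be obtained from open-source Tesseract rather than a commercial engine, for example via pytesseract's word-level output (the image path is a placeholder):

import pytesseract
from PIL import Image

# "page.png" is a placeholder for one scanned document image.
data = pytesseract.image_to_data(Image.open("page.png"),
                                 output_type=pytesseract.Output.DICT)

# Each index with non-empty text is one recognized word with its bounding box,
# which can be matched against the existing plain-text version and used to
# draw the highlight rectangles.
for i, word in enumerate(data["text"]):
    if word.strip():
        x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        print(word, (x, y, w, h))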
