What technology does Google use for OCR in Google Drive?

I need to use an OCR library and have been trying to use Tesseract. The problem is that the accuracy of Tesseract on a few test images is not very good. I'm not able to improve the quality of the images either as they are going to be submitted by users.
However, when I uploaded the file using the OCR feature in Google Docs, it was able to read the image perfectly. Is there a Google OCR library available? I know Google has been improving Tesseract, but is its own OCR library strictly proprietary (I presume it is)?
Thanks

To get a good result and reliable detection, you may need to improve the input by applying some filters and automatic thresholding. You can try the Pillow library for filters and image manipulation, for example:
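A minimal sketch of that Pillow clean-up, assuming the file name "card.png" and the fixed threshold of 140 are placeholders you would tune for your own images (Pillow has no built-in Otsu, so a fixed threshold stands in for the automatic one here):

import pillow  # noqa: illustrative; actual import below
from PIL import Image, ImageFilter, ImageOps

img = Image.open("card.png").convert("L")           # convert to greyscale
img = ImageOps.autocontrast(img)                    # stretch the contrast range
img = img.filter(ImageFilter.MedianFilter(size=3))  # remove speckle noise
img = img.point(lambda p: 255 if p > 140 else 0)    # simple global threshold
img.save("card_clean.png")                          # feed this file to Tesseract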

Related

Image pre-processing in OCR

Our project is all about OCR and, based on my research, the image goes through a pre-processing stage before the character recognition is performed. I know we could use OpenCV for that, but we can't use it under our rules.
My question is: can someone tell me the step-by-step pre-processing procedure and the best methods/algorithms to use?
From what I know:
1. YUV luminance
2. Greyscale
3. Otsu thresholding
4. Binarization
5. Hough transform
Original image > YUV luminance > greyscale > what's next?
thanks!
In some of my older blog posts, I addressed some parts of your questions:
Binarization on various image qualities from mobile cameras:
http://www.ocr-it.com/guide-to-better-mobile-images-from-cell-phone-camera-for-higher-quality-ocr
Image pre-processing and segmentation for better OCR:
http://www.ocr-it.com/user-scenario-process-digital-camera-pictures-and-ocr-to-extract-specific-numbers
In reality, there is no universal step-by-step, in my experience. You could use the original image for OCR if you wanted to, which means no pre-processing is necessary. Yes, pre-processing will help, but it depends on the source and type of your images (which you did not specify). For example, a typical office document scanned on a professional scanner with Kofax VRS requires no pre-processing before OCR. A mobile camera image requires a lot of pre-processing. A picture from a parking garage camera will also require a lot of pre-processing, but with different steps and algorithms than a mobile camera picture.
I suggest deciding what the next major limiting factor in your images is, pre-processing against it, and then looking for the next correctable issue.
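Since OpenCV is off the table for you, here is a sketch of steps 1-4 from your list using only Pillow and NumPy (an assumption that those are allowed; "input.jpg" is a placeholder), with Otsu's threshold computed by hand from the greyscale histogram:

import numpy as np
from PIL import Image

img = Image.open("input.jpg").convert("L")   # steps 1-2: luminance/greyscale
hist = np.bincount(np.asarray(img).ravel(), minlength=256).astype(float)

# steps 3-4: Otsu's method -- pick the threshold that maximizes the
# between-class variance of the greyscale histogram
total = hist.sum()
sum_all = np.dot(np.arange(256), hist)
best_t, best_var = 0, 0.0
w_b = sum_b = 0.0
for t in range(256):
    w_b += hist[t]                            # background class weight so far
    if w_b == 0 or w_b == total:
        continue
    sum_b += t * hist[t]
    m_b = sum_b / w_b                         # background mean
    m_f = (sum_all - sum_b) / (total - w_b)   # foreground mean
    var_between = w_b * (total - w_b) * (m_b - m_f) ** 2
    if var_between > best_var:
        best_var, best_t = var_between, t

binary = img.point(lambda p: 255 if p > best_t else 0)   # binarize at Otsu's threshold
binary.save("binarized.png")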

iOS: Real Time OCR on top of live camera feed (similar to iTunes Redeem Gift Card)

Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed?
I know that in iOS 7 there is now the AVMetadataMachineReadableCodeObject class which, AFAIK, only represents barcodes. I'm more interested in detecting and reading the contents of a short string. Is this possible using publicly available API methods, or some other third party SDK that you might know of?
There is also a video of the process in action:
https://www.youtube.com/watch?v=c7swRRLlYEo
Best,
I'm working on a project that does something similar to the Apple app store redeem with camera as you mentioned.
A great starting place on processing live video is a project I found on GitHub. This is using the AVFoundation framework and you implement the AVCaptureVideoDataOutputSampleBufferDelegate methods.
Once you have the image stream (video), you can use OpenCV to process the video. You need to determine the area in the image you want to OCR before you run it through Tesseract. You have to play with the filtering, but the broad steps you take with OpenCV are below (a condensed sketch follows the steps):
Convert the images to grayscale using cv::cvtColor(inputMat, outputMat, CV_RGBA2GRAY);
Threshold the images to eliminate unnecessary elements. You specify the threshold value to eliminate, and then set everything else to black (or white).
Determine the lines that form the boundary of the box (or whatever you are processing). You can either create a "bounding box" if you have eliminated everything but the desired area, or use the HoughLines algorithm (or the probabilistic version, HoughLinesP). Using this, you can determine line intersection to find corners, and use the corners to warp the desired area to straighten it into a proper rectangle (if this step is necessary in your application) prior to OCR.
Process the portion of the image with the Tesseract OCR library to get the resulting text. It is possible to create training files for letters in OpenCV so you can read the text without Tesseract. That could be faster, but also a lot more work. In the App Store case, they display the recognized text overlaid on top of the original image. This adds to the cool factor, so it just depends on what you need.
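A condensed sketch of those broad steps, written in Python for brevity (on iOS you would use the equivalent C++ calls); pytesseract is assumed here as a common Tesseract wrapper, and the fixed threshold value and simple bounding-box approach are illustrative choices, not the app's actual method:

import cv2
import pytesseract

frame = cv2.imread("frame.png")                     # stand-in for one captured video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # step 1: convert to grayscale
_, bw = cv2.threshold(gray, 120, 255, cv2.THRESH_BINARY)  # step 2: threshold (value needs tuning)
coords = cv2.findNonZero(255 - bw)                  # step 3: locate the remaining dark content...
x, y, w, h = cv2.boundingRect(coords)               # ...and take its bounding box
roi = bw[y:y + h, x:x + w]
print(pytesseract.image_to_string(roi))             # step 4: OCR only the cropped region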
Some other hints:
I used the book "Instant OpenCV" to get started quickly with this. It was pretty helpful.
Download OpenCV for iOS from OpenCV.org/downloads.html
I have found adaptive thresholding to be very useful; you can read all about it by searching for "OpenCV adaptiveThreshold". Also, if your image has very little mid-tone content between its light and dark elements, you can use Otsu's binarization, which automatically determines the threshold value based on the histogram of the grayscale image.
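For illustration, a minimal sketch contrasting the two thresholding approaches with OpenCV's Python bindings; the block size of 11 and constant of 2 are common starting values to tune, and "scan.png" is a placeholder:

import cv2

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# adaptive: the threshold is computed per neighborhood, which copes with uneven lighting
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)

# Otsu: one global threshold picked automatically from the grayscale histogram
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)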
This Q&A thread seems to consistently be one of the top search hits for the topic of OCR on iOS, but is fairly out of date, so I thought I'd post some additional resources that might be useful that I've found as of the time of writing this post:
Vision Framework
https://developer.apple.com/documentation/vision
As of iOS 11, you can now use the included CoreML-based Vision framework for things like rectangle or text detection. I've found that I no longer need to use OpenCV with these capabilities included in the OS. However, note that text detection is not the same as text recognition or OCR so you will still need another library like Tesseract (or possibly your own CoreML model) to translate the detected parts of the image into actual text.
SwiftOCR
https://github.com/garnele007/SwiftOCR
If you're just interested in recognizing alphanumeric codes, this OCR library claims significant speed, memory consumption, and accuracy improvements over Tesseract (I have not tried it myself).
ML Kit
https://firebase.google.com/products/ml-kit/
Google has released ML Kit as part of its Firebase suite of developer tools, in beta at the time of writing this post. Similar to Apple's CoreML, it is a machine learning framework that can use your own trained models, but it also has pre-trained models for common image processing tasks like the Vision framework does. Unlike the Vision framework, it also includes a model for on-device text recognition of Latin characters. Currently, use of this library is free for on-device functionality, with charges for using Google's cloud/SaaS API offerings. I have opted to use this in my project, as the speed and accuracy of recognition seem quite good, and I will also be creating an Android app with the same functionality, so having a single cross-platform solution is ideal for me.
ABBYY Real-Time Recognition SDK
https://rtrsdk.com/
This commercial SDK for iOS and Android is free to download for evaluation and limited commercial use (up to 5000 units as of time of writing this post). Further commercial use requires an Extended License. I did not evaluate this offering due to its opaque pricing.
'Real time' is just a set of images. You don't even need to think about processing all of them, just enough to broadly represent the motion of the device (or the change in the camera position). There is nothing built into the iOS SDK to do what you want, but you can use a 3rd party OCR library (like Tesseract) to process the images you grab from the camera.
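As a rough sketch of that idea, assuming OpenCV for frame capture and the pytesseract wrapper (both assumptions; the camera index 0 and N = 10 are placeholders), you might OCR only every Nth frame rather than all of them:

import cv2
import pytesseract

cap = cv2.VideoCapture(0)   # device camera, index 0 assumed
N = 10                      # OCR only every 10th frame
count = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    count += 1
    if count % N == 0:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        print(pytesseract.image_to_string(gray))
cap.release()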
I would look into Tesseract. It's an open source OCR library that takes image data and processes it. You can add different regular expressions and only look for specific characters as well. It isn't perfect, but from my experience it works pretty well. Also it can be installed as a CocoaPod if you're into that sort of thing.
If you want to capture that in real time, you might be able to use GPUImage to catch frames in the live feed and process the incoming images to speed up Tesseract, either by applying different filters or by reducing the size or quality of the incoming images.
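A hedged sketch of those two speed-ups, assuming the pytesseract wrapper (the whitelist string and the 50% scale factor are placeholders to tune): restrict Tesseract to the characters you expect and shrink the frame before OCR:

import cv2
import pytesseract

frame = cv2.imread("frame.png")
small = cv2.resize(frame, None, fx=0.5, fy=0.5,
                   interpolation=cv2.INTER_AREA)    # smaller input means faster OCR
# tessedit_char_whitelist limits recognition to the listed characters
config = "-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
print(pytesseract.image_to_string(small, config=config))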
There's a project similar to that on github: https://github.com/Devxhkl/RealtimeOCR

image preprocessing tesseract

I'm building a business card scanner for my final examination on digital image processing, and I would like to ask how I should preprocess a photo of a business card so that Tesseract can recognize the text. I have tried a lot of things, like erosion, dilation, and thresholding, but I can't get a good result... Can you help me?
Thank you
Marco
If your concern is only about text recognition and not about preprocessing, consider using ScanTailor. It is an excellent pre-processing tool and it is open source.
If you want to implement the pre-processing yourself, you might want to have a look at this paper, especially the skew correction and the background estimation. The results of the algorithms described there are good. ScanTailor uses some of them.
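If you roll your own skew correction, one common approach (not the exact algorithm from the paper) is to estimate the page angle from the minimum-area rectangle of the dark pixels and rotate it back; note that OpenCV's angle convention changed across versions, so the normalization below may need adjusting, and "page.png" is a placeholder:

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
coords = cv2.findNonZero(bw)                  # all foreground (text) pixels
angle = cv2.minAreaRect(coords)[-1]           # tilt of the tightest rotated box
if angle > 45:                                # map to a small tilt around zero
    angle -= 90
h, w = gray.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
cv2.imwrite("deskewed.png", deskewed)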
I would recommend the open source C++ image processing library OpenCV in combination with Tesseract, the free, open source Optical Character Recognition (OCR) library.
Since your description of the problem isn't very specific, I can only answer your question in general terms.
The main procedure in OCR is:
perform some kind of preprocessing on the image
text detection to get your ROI (region of interest, the region containing your text)
character recognition (take the text-only image and use it as input for Tesseract)
A few words about Tesseract:
There is a lot of information about the library available online. It is a Google open source library used for the Google Books OCR effort. It can also handle layout analysis of your image, but it isn't perfect at this, so doing the pre-processing yourself and using Tesseract only for the actual character recognition can lead to a better result. Feel free to ask if you still have questions, or if I misunderstood your question.
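To make the three steps concrete, here is a rough end-to-end sketch with OpenCV's Python bindings and the pytesseract wrapper (an assumption; morphology plus contours is just one simple way to find text regions, and it works best on clean backgrounds):

import cv2
import pytesseract

img = cv2.imread("photo.jpg")                        # placeholder input
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)         # 1. preprocessing
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 2. text detection: dilate so characters merge into blocks, then take contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
blocks = cv2.dilate(bw, kernel, iterations=2)
contours, _ = cv2.findContours(blocks, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# 3. character recognition on each detected region
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    roi = gray[y:y + h, x:x + w]
    print(pytesseract.image_to_string(roi))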

How to use Opencv for Document Recognition with OCR?

I'm a beginner in computer vision, but I know how to use some functions in OpenCV. I'm trying to use OpenCV for document recognition, and I would like help finding the steps for it.
I'm thinking of using the OpenCV example find_obj.cpp, but documents, for example passports, have variable fields: name, birthdate, pictures. So I need help defining the steps, and, if possible, which functions to use in each step.
I'm not asking for whole code, but if anyone has an example link or can just type a walkthrough, it would be a great help.
There are two very different steps involved here. One is detecting your object, and the other is analyzing it.
For object detection, you're just trying to figure out whether the object is in the frame, and approximately where it's located. The OpenCV features framework is great for this. For some tutorials and comprehensive sample code, see the OpenCV features2d tutorials and especially the feature matching tutorial.
For analysis, you need to dig into optical character recognition (OCR). OpenCV does not include OCR libraries, but I recommend checking out tesseract-ocr, which is a great OCR library. If your documents have a fixed structure (consistent layout of text fields) then tesseract-ocr is all you need. For more advanced analysis, check out OCRopus, which uses tesseract-ocr but adds layout analysis.
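For the detection half, a minimal feature-matching sketch along the lines of find_obj.cpp, using ORB rather than the patented SURF; "template.png" (a reference image of the document type) and "photo.jpg" are placeholders:

import cv2

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
photo = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(template, None)
kp2, des2 = orb.detectAndCompute(photo, None)

# Hamming distance suits ORB's binary descriptors; sort so the best matches come first
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(len(matches), "matches; use the top ones to locate the document in the frame")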

Which SDK should I use to visualize medical images in 3D?

I need to process DICOM-formatted medical images and visualize them in 3D, and also do some image processing on these images in real time. Therefore, I am asking this question to learn which SDK has better real-time characteristics for medical visualization and image processing.
The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization.
You can find details here.
Another solution would be modifying or utilizing a 3D engine that supports volume rendering.
Moreover, for computer vision algorithms, OpenCV seems promising.
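For a feel for the VTK route, here is a bare-bones sketch of loading a DICOM series and volume rendering it with VTK's Python bindings; the directory path and the transfer-function points are placeholders you would adapt to your data:

import vtk

reader = vtk.vtkDICOMImageReader()
reader.SetDirectoryName("/path/to/dicom/series")    # folder containing the .dcm slices

mapper = vtk.vtkSmartVolumeMapper()                 # selects GPU or CPU rendering automatically
mapper.SetInputConnection(reader.GetOutputPort())

color = vtk.vtkColorTransferFunction()              # map scalar values to color...
color.AddRGBPoint(0, 0.0, 0.0, 0.0)
color.AddRGBPoint(1000, 1.0, 1.0, 1.0)
opacity = vtk.vtkPiecewiseFunction()                # ...and to opacity
opacity.AddPoint(0, 0.0)
opacity.AddPoint(1000, 0.8)

prop = vtk.vtkVolumeProperty()
prop.SetColor(color)
prop.SetScalarOpacity(opacity)

volume = vtk.vtkVolume()
volume.SetMapper(mapper)
volume.SetProperty(prop)

renderer = vtk.vtkRenderer()
renderer.AddVolume(volume)
window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
interactor = vtk.vtkRenderWindowInteractor()
interactor.SetRenderWindow(window)
window.Render()
interactor.Start()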
osgVolume is an add-in to the popular OpenSceneGraph library for doing this.
Just use GDCM+VTK. In 2D simply use gdcmviewer. In 3D you need to build gdcmorthoplanes.
Ref:
http://sourceforge.net/apps/mediawiki/gdcm/index.php?title=Gdcmviewer
http://sourceforge.net/apps/mediawiki/gdcm/index.php?title=Using_GDCM_API
You could check out MITK (http://mitk.org) which combines the already mentioned VTK with the Insight Toolkit (http://www.itk.org) for image processing. Another option to start from could be Slicer (http://www.slicer.org), but this depends on the license you need.
At uni we were taught MATLAB for DICOM file processing. I think it has pretty nice and easy-to-use plugins for that as well. The end result was that using MATLAB I was able to do all kinds of DICOM image processing, filtering, and so forth.
As you probably know, MATLAB is not an SDK but a complete environment. Nevertheless, you can write scripts to achieve normal application behavior: create windows, buttons, images, etc.
