I'm new to video processing and I'm wondering what libraries I can use to do things like detecting letters, drawing boxes around them and so on. If you can name me a couple of good ones, I'd appreciate it very much!
OpenCV (Open Source Computer Vision) is a cross-platform library of programming functions for real-time computer vision.
It provides interfaces for both the C and C++ programming languages.
As for detecting the text region and drawing boxes around it, you can take a look at this article, which explains how to do this stuff using OpenCV. For better OCR capabilities I think that tesseract is the best open source tool available right now.
I've worked on a similar project some time ago and used OpenCV to detect the text region and then tesseract to do proper text recognition.
Related
I have a project about Character Recognition (using openCV libraries).
I don't know how to detect characters in a text image.
Can you recommend some methods to do this?
Thanks all!
Here is a tutorial, though it is dated and uses the C-style API. This online book has a lot of material related to OCR using OpenCV in chapter 5. Many people have done work integrating tesseract (an OCR engine) with OpenCV, so you might want to check that out.
Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed?
I know that in iOS 7 there is now the AVMetadataMachineReadableCodeObject class which, AFAIK, only represents barcodes. I'm more interested in detecting and reading the contents of a short string. Is this possible using publicly available API methods, or some other third party SDK that you might know of?
There is also a video of the process in action:
https://www.youtube.com/watch?v=c7swRRLlYEo
Best,
I'm working on a project that does something similar to the Apple app store redeem with camera as you mentioned.
A great starting place on processing live video is a project I found on GitHub. This is using the AVFoundation framework and you implement the AVCaptureVideoDataOutputSampleBufferDelegate methods.
Once you have the image stream (video), you can use OpenCV to process the video. You need to determine the area in the image you want to OCR before you run it through Tesseract. You have to play with the filtering, but the broad steps you take with OpenCV are:
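Convert the images to grayscale using cv::cvtColor(inputMat, outputMat, CV_RGBA2GRAY); (the constant is named cv::COLOR_RGBA2GRAY in newer OpenCV versions).
Threshold the images to eliminate unnecessary elements. You specify a threshold value; pixels on one side of it are set to black and all remaining pixels to white, leaving a true black-and-white image.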
Determine the lines that form the boundary of the box (or whatever you are processing). You can either create a "bounding box" if you have eliminated everything but the desired area, or use the HoughLines algorithm (or the probabilistic version, HoughLinesP). Using this, you can determine line intersection to find corners, and use the corners to warp the desired area to straighten it into a proper rectangle (if this step is necessary in your application) prior to OCR.
Process the portion of the image with Tesseract OCR library to get the resulting text. It is possible to create training files for letters in OpenCV so you can read the text without Tesseract. This could be faster but also could be a lot more work. In the App Store case, they are doing something similar to display the text that was read overlaid on top of the original image. This adds to the cool factor, so it just depends on what you need.
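The corner-finding step above can be sketched in a few lines. This is only an illustration, not the code from the project mentioned in this answer: it assumes the lines are given as (rho, theta) pairs, which is the form cv::HoughLines returns, and the function name intersect_polar_lines is made up for the example.

```python
import math

def intersect_polar_lines(line_a, line_b, eps=1e-9):
    """Intersect two lines given in Hough (rho, theta) form.

    Each line satisfies x*cos(theta) + y*sin(theta) = rho.
    Returns (x, y), or None when the lines are (nearly) parallel.
    """
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    a1, b1 = math.cos(theta1), math.sin(theta1)
    a2, b2 = math.cos(theta2), math.sin(theta2)
    # Determinant of the 2x2 system; zero means no unique intersection.
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    # Cramer's rule for the two unknowns x and y.
    x = (rho1 * b2 - rho2 * b1) / det
    y = (a1 * rho2 - a2 * rho1) / det
    return (x, y)

# Usage: a vertical line x = 10 and a horizontal line y = 5.
corner = intersect_polar_lines((10.0, 0.0), (5.0, math.pi / 2))
```

Intersecting each near-vertical line with each near-horizontal line gives the candidate corners you would then pass to a perspective warp.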
Some other hints:
I used the book "Instant OpenCV" to get started quickly with this. It was pretty helpful.
Download OpenCV for iOS from OpenCV.org/downloads.html
I have found adaptive thresholding to be very useful; you can read all about it by searching for "OpenCV adaptiveThreshold". Also, if you have an image with very little in between the light and dark elements, you can use Otsu's Binarization. This automatically determines the threshold value based on the histogram of the grayscale image.
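To make the idea behind Otsu's method concrete, here is a minimal pure-Python sketch. It is not OpenCV's implementation; it assumes the image is supplied as a flat sequence of 8-bit grayscale values, and the names otsu_threshold and binarize are invented for this example. It picks the threshold that maximizes the variance between the dark and light classes of the histogram.

```python
def otsu_threshold(pixels):
    """Return a threshold t (0..255) chosen by Otsu's method.

    pixels: iterable of integer gray values in 0..255.
    Values <= t form the dark class, values > t the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels in the dark class so far
    sum_dark = 0    # sum of gray values in the dark class so far
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # dark class still empty
        w_light = total - w_dark
        if w_light == 0:
            break               # light class empty from here on
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor).
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize(pixels, t):
    """Map each pixel to 1 if it is brighter than t, else 0."""
    return [1 if p > t else 0 for p in pixels]
```

For a clearly bimodal image, for example half the pixels at 10 and half at 200, the returned threshold falls between the two groups, so binarize separates them cleanly.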
This Q&A thread seems to consistently be one of the top search hits for the topic of OCR on iOS, but is fairly out of date, so I thought I'd post some additional resources that might be useful that I've found as of the time of writing this post:
Vision Framework
https://developer.apple.com/documentation/vision
As of iOS 11, you can now use the included CoreML-based Vision framework for things like rectangle or text detection. I've found that I no longer need to use OpenCV with these capabilities included in the OS. However, note that text detection is not the same as text recognition or OCR so you will still need another library like Tesseract (or possibly your own CoreML model) to translate the detected parts of the image into actual text.
SwiftOCR
https://github.com/garnele007/SwiftOCR
If you're just interested in recognizing alphanumeric codes, this OCR library claims significant speed, memory consumption, and accuracy improvements over Tesseract (I have not tried it myself).
ML Kit
https://firebase.google.com/products/ml-kit/
Google has released ML Kit as part of its Firebase suite of developer tools, in beta at the time of writing this post. Similar to Apple's CoreML, it is a machine learning framework that can use your own trained models but, like Vision Framework, it also has pre-trained models for common image processing tasks. Unlike Vision Framework, this also includes a model for on-device text recognition of Latin characters. Currently, use of this library is free for on-device functionality, with charges for using cloud/SaaS API offerings from Google. I have opted to use this in my project, as the speed and accuracy of recognition seem quite good, and I will also be creating an Android app with the same functionality, so having a single cross-platform solution is ideal for me.
ABBYY Real-Time Recognition SDK
https://rtrsdk.com/
This commercial SDK for iOS and Android is free to download for evaluation and limited commercial use (up to 5000 units as of time of writing this post). Further commercial use requires an Extended License. I did not evaluate this offering due to its opaque pricing.
'Real time' is just a set of images. You don't even need to think about processing all of them, just enough to broadly represent the motion of the device (or the change in the camera position). There is nothing built into the iOS SDK to do what you want, but you can use a 3rd party OCR library (like Tesseract) to process the images you grab from the camera.
I would look into Tesseract. It's an open source OCR library that takes image data and processes it. You can add different regular expressions and only look for specific characters as well. It isn't perfect, but from my experience it works pretty well. Also it can be installed as a CocoaPod if you're into that sort of thing.
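As a rough illustration of the regular-expression idea, the sketch below filters raw OCR output after recognition rather than configuring Tesseract itself. The 16-character uppercase alphanumeric format is an assumption made up for the example (adjust it to whatever your codes look like), and extract_codes is a hypothetical helper, not part of any OCR library.

```python
import re

# Assumed code format for this example: exactly 16 letters or digits.
CODE_PATTERN = re.compile(r"\b[A-Z0-9]{16}\b")

def extract_codes(ocr_text):
    """Return every substring of the OCR output that looks like a code."""
    # Normalize case first, since OCR may mix upper and lower case.
    cleaned = ocr_text.upper()
    return CODE_PATTERN.findall(cleaned)

# Usage: only the 16-character token survives the filter.
codes = extract_codes("Redeem code: x9fk2lq8zr41mmt7\nnoise ab12")
```

Running the filter on each processed frame and only accepting a frame when it yields a match is a cheap way to discard noisy recognitions.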
If you wanted to capture that in real time you might be able to use GPUImage to catch images in the live feed and do processing on the incoming images to speed up Tesseract by using different filters or reducing the size or quality of the incoming images.
There's a project similar to that on github: https://github.com/Devxhkl/RealtimeOCR
I'm building a business card scanner for my final exam in digital image processing, and I would like to ask how I should preprocess a photo of a business card so that tesseract can recognize the text. I tried a lot of things, like erosion, dilation, and thresholding, but I can't get a good result... Can you help me?
Thank you
Marco
If your concern is only about text recognition and not about preprocessing, consider using ScanTailor. It is an excellent pre-processing tool and it is open source.
If you want to implement the pre-processing yourself, you might want to have a look at this paper, especially the skew correction and the background estimation. The results of the algorithms described there are good. ScanTailor uses some of these.
I would recommend the open source C++ image processing library OpenCV in combination with the free, open source Optical Character Recognition (OCR) library tesseract.
Since the description of your problem isn't very specific, I can only answer your question in general terms.
The main procedure in OCR is:
perform some kind of preprocessing on the image
text detection to get your ROI (Region of interest, the region containing your text)
character recognition (take the text-only image and use it as input for tesseract)
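As a simple illustration of the ROI step, the sketch below assumes preprocessing has already produced a binary image (a list of rows of 0/1 values) in which only the text pixels are set, and crops the tightest rectangle around them. Real text detection is usually more involved; the helper names bounding_box and crop are invented for this example.

```python
def bounding_box(binary):
    """Return (top, left, bottom, right), inclusive, of all nonzero pixels.

    binary: list of equal-length rows containing 0/1 values.
    Returns None if the image has no foreground pixels.
    """
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return None
    width = len(binary[0])
    cols = [c for c in range(width) if any(row[c] for row in binary)]
    return rows[0], cols[0], rows[-1], cols[-1]

def crop(image, box):
    """Cut the region described by an inclusive (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

# Usage: the foreground occupies rows 1-2 and columns 1-2.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
roi = crop(mask, bounding_box(mask))
```

The cropped region is what you would hand to the OCR engine in the final step.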
a few words about tesseract:
There is a lot of information about the library available online. It is a Google open source library used for Google Books OCR. It can also handle layout analysis of your image, but it isn't perfect at this, so doing the preprocessing yourself and using tesseract only for the actual character recognition can lead to better results. Feel free to ask if you still have questions, or if I misunderstood your question.
I'm a beginner in computer vision, but I know how to use some OpenCV functions. I'm trying to use OpenCV for document recognition, and I would like help working out the steps for it.
I'm thinking of using the OpenCV example find_obj.cpp, but the documents (a passport, for example) have variable fields: name, birthdate, pictures. So I need help defining the steps and, if possible, which functions to use at each step.
I'm not asking for complete code, but if anyone has an example link or can just type up a walkthrough, it would be a great help.
There are two very different steps involved here. One is detecting your object, and the other is analyzing it.
For object detection, you're just trying to figure out whether the object is in the frame, and approximately where it's located. The OpenCV features framework is great for this. For some tutorials and comprehensive sample code, see the OpenCV features2d tutorials and especially the feature matching tutorial.
For analysis, you need to dig into optical character recognition (OCR). OpenCV does not include OCR libraries, but I recommend checking out tesseract-ocr, which is a great OCR library. If your documents have a fixed structure (consistent layout of text fields), then tesseract-ocr is all you need. For more advanced analysis, check out ocropus, which uses tesseract-ocr but adds layout analysis.
Does anybody here do computer vision work on Mathematica? I would like to know what external libraries are available for doing that. The built in image processing functions are not enough. I am looking for things like SURF, stereo, camera calibration, multi-view geometry etc.
How difficult would it be to wrap OpenCV for use in Mathematica?
Apart from the extensive set of image processing tools that are now (version 8) natively present in Mathematica, and which include a number of CV algorithms like finding morphologic objects, image segmentation and feature detection (see figure below), there's the new LibraryLink functionality, which makes working with DLLs very easy. You wouldn't have to change OpenCV much to be able to call it from Mathematica. Just some wrappers for the functions to be called and you're basically done.
I don't think such a thing exists, but I'm getting started.
It has the advantage that you can perform some analytic methods... for example rather than hacking in openCV or even Matlab endlessly, you can compute analytically a quantity, and see that the method leading to this matrix is numerically unstable as a function of input variables. Thus you do not need to hack, as it would be pointless.
As for wrapping opencv, that doesn't seem to make sense. The correct procedure would be to fix bad implementations in opencv based on your analysis in Mathematica and on paper.
Agreeing with Peter, I don't believe that forcing Mathematica to use OpenCV is a great thing.
All of the computer vision people that I've talked to, read about, or seen examples from are using Matlab and the Imaging toolkit. It's either that, or go with an OpenCV-compatible language + OpenCV.
Mathematica has a rich set of tools for image processing, but I'm uncertain about the computer vision capabilities.