My current project involves transcribing texts from PDFs into text files. I first tried putting the image file directly into the OCR program (Tesseract), and it didn't do that well.
The original image files are basically old newspapers and have some background noise, which I am sure Tesseract has problems with. So I am trying to apply some image preprocessing before feeding them into Tesseract. Are there any suggestions for an open-source image preprocessing engine that fits this situation? Instructions on how to use it would be even more appreciated!
I've never heard of an "image preprocessing engine" for that purpose, but you can take a look at OpenCV (Open Source Computer Vision Library) and implement your own "pre-processing engine". OpenCV is a computer vision library that offers many features for image processing.
One interesting thing you might want to test as a preprocessing step is applying a threshold to the image to remove noise. Anyway, I've talked about this kind of stuff in this thread.
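For example, a minimal Python/OpenCV sketch of that thresholding idea (the file names are placeholders; Otsu's method picks the global threshold automatically, which is often enough for scanned newsprint):

    import cv2

    # Load the scan as grayscale and binarize it with an automatic (Otsu) threshold
    gray = cv2.imread("newspaper_scan.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite("newspaper_clean.png", binary)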
Like #karlphillip mentioned, I highly doubt there's a readily available preprocessing engine for your purposes, as the preprocessing techniques vary greatly with the desired result.
Some common approaches to clearing up the text in noisy images include:
1. Adaptive thresholding (Sauvola or Niblack binarization)
2. Applying a median filter of a size slightly larger than the text to get a background image, then subtracting the background from the original image (to remove larger noise like creases, stains, handwritten notes, etc.).
OpenCV has implementations of these filters/binarization methods. If you have access to published literature there's quite a bit of work on binarization of noisy documents.
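As a rough illustration of both approaches, here is a hedged Python/OpenCV sketch. I believe the Sauvola/Niblack variants live in the opencv-contrib ximgproc module; this uses the built-in mean-based adaptive threshold instead, and the filter sizes and file names are placeholders to tune:

    import cv2

    gray = cv2.imread("noisy_page.png", cv2.IMREAD_GRAYSCALE)

    # Approach 2 from the list above: estimate the background with a median
    # filter larger than the text strokes, then subtract it out to flatten
    # creases, stains, and other large-scale noise.
    background = cv2.medianBlur(gray, 21)
    flattened = cv2.subtract(background, gray)   # text becomes bright on dark
    flattened = cv2.normalize(flattened, None, 0, 255, cv2.NORM_MINMAX)

    # Approach 1: adaptive (local) thresholding on the flattened image
    binary = cv2.adaptiveThreshold(flattened, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 35, -10)
    cv2.imwrite("binarized.png", binary)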
Check out ScanTailor. It has pretty impressive pre-processing functionality and it is open source.
My first hurdle so far is that running Tesseract vanilla on images of MTG cards doesn't recognize the card title (honestly, that's all I need, because I can use that text to pull the rest of the card info from a database). I think the issue might be needing to train Tesseract to recognize the font used in MTG cards, but I'm wondering if it might instead be an issue with Tesseract not looking for, or not detecting, text in a section of the image (specifically the title).
Edit: including an image of an MTG card for reference: http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=175263&type=card
OK, so after asking on Reddit programming forums, I think I found an answer that I am going to pursue:
The training feature of tesseract is indeed for improving rates for unusual fonts, but that's probably not the reason you have low success.
The environment the text is in is not well controlled - the card background can be a texture in one of five colours plus artifacts and lands. Tesseract greyscales the image before processing, so the contrast between the text and background is not sufficient.
You could put your cards through a preprocessor which mutes coloured areas to white and enhances monotones. That should increase the contrast so tesseract can make out the characters.
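To make that concrete, a hedged Python/OpenCV sketch of such a preprocessor might look like this (the saturation threshold and file names are guesses to tune):

    import cv2

    img = cv2.imread("card.png")
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    coloured = hsv[:, :, 1] > 60     # high saturation = card art / coloured frame
    gray[coloured] = 255             # mute coloured areas to white
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite("card_for_ocr.png", bw)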
If anyone still following this believes the above path to be the wrong one to start down, please say so.
TLDR
I believe you're on the right track, doing preprocessing.
But you will need to do both preprocessing AND Tesseract training.
Preprocessing
Basically, you want to get the title text, and only the title text, for Tesseract to read. I suggest you follow the steps below (a rough code sketch follows the list):
Identify the borders of the card.
Cut out the title area for further processing.
Convert the image to black'n'white.
Use contours to identify the exact text area, and crop it out.
Send the image you got to Tesseract.
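Here is a rough sketch of those steps in Python/OpenCV, as a desktop prototype rather than a finished solution; the crop fractions and file name are placeholders, and it assumes OpenCV 4's findContours return signature:

    import cv2
    import numpy as np
    import pytesseract

    card = cv2.imread("card.jpg")

    # Steps 1-2: assuming the card borders are already found (e.g. via
    # perspective correction), cut out the title bar near the top of the card.
    h, w = card.shape[:2]
    title = card[int(0.04 * h):int(0.11 * h), int(0.05 * w):int(0.80 * w)]

    # Step 3: convert to black and white (text becomes white on black)
    gray = cv2.cvtColor(title, cv2.COLOR_BGR2GRAY)
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Step 4: use contours to find the exact text area and crop it out
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, cw, ch = cv2.boundingRect(np.vstack(contours))
    text_img = bw[y:y + ch, x:x + cw]

    # Step 5: send the crop to Tesseract (pytesseract here, single-line mode)
    print(pytesseract.image_to_string(text_img, config="--psm 7"))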
How to create a basic preprocessing pipeline is shown in the YouTube video Automatic MTG card sorting: Part 2 - Automatic perspective correction with OpenCV. Also have a look at the third part in that series.
With this said, there are a number of problems you will encounter. How do you handle split cards? Will your algorithm manage white borders? What if the card is rotated or upside-down? Just to name a few.
Need for Training
But even if you manage to create a perfect preprocessing algorithm, you will still have to train Tesseract. This is due to the special text font used on the cards (which happens to differ depending on the age of the card!).
Consider the card "Kinjalli's Caller".
http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=435169&type=card
Note how similar the "j" is to the "i". An untrained Tesseract tends to mix them up.
Conclusion
All this considered, my answer to you is that you need to do both preprocessing of the card image AND to train Tesseract.
If you're still interested I would like to suggest that you have a look at this MTG-card reading project on GitHub. That way you don't have to reinvent the wheel.
https://github.com/klanderfri/ReadMagicCard
I have 1000 images with a text watermark. The position of the watermark is random.
I just want to remove it using MATLAB, as doing it manually (in Photoshop) is a big task.
My question is: what should the workflow be?
Any help regarding literature, links, MATLAB routines, etc. would be appreciated.
You need to use an image recovery algorithm; more specifically, an inpainting algorithm would serve your purpose.
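As a hedged illustration, here is roughly what that step looks like with OpenCV's inpainting in Python (the same idea carries over to MATLAB). It assumes you can build a binary mask of the watermark pixels, e.g. by thresholding on the watermark's intensity; the threshold and file names are placeholders:

    import cv2

    img = cv2.imread("watermarked.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Example mask: a very light watermark, so flag everything brighter than 230
    _, mask = cv2.threshold(gray, 230, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)   # grow the mask a little

    # Fill the masked pixels from their surroundings (radius 3, Telea method)
    restored = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
    cv2.imwrite("restored.png", restored)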
I am building an iOS app that, as a key feature, incorporates image matching. The problem is that the images I need to recognize are small orienteering 10x10 plaques with simple large text on them. They can be quite reflective and will be outside (so the light conditions will be variable). Sample image
There will be up to 15 of these types of image in the pool and really all I need to detect is the text, in order to log where the user has been.
The problem I am facing is that the image matching software I have tried (aurasma and, slightly more successfully, arlabs) can't distinguish between them, as they are primarily built to work with detailed images.
I need to accurately detect which plaque is being scanned and have considered using GPS to refine the selection, but the only reliable way I have found is to get the user to manually enter the text. One of the key attractions we have based the product around is being able to detect these images that are already in place and not having to set up any additional material.
Can anyone suggest a piece of software that would work (and is iOS friendly), or a method of detection that would be effective and interactive/pleasing for the user?
Sample environment:
http://www.orienteeringcoach.com/wp-content/uploads/2012/08/startfinishscp.jpeg
The environment can change substantially; basically anywhere a plaque could be positioned, they are: fences, walls, and posts, in either wooded or open areas, but overwhelmingly outdoors.
I'm not an iOS programmer, but I will try to answer from an algorithmic point of view. Essentially, you have a detection problem ("Where is the plaque?") and a classification problem ("Which one is it?"). Asking the user to keep the plaque in a pre-defined region is certainly a good idea. This solves the detection problem, which is often harder to solve with limited resources than the classification problem.
For classification, I see two alternatives:
The classic "Computer Vision" route would be feature extraction and classification. Local Binary Patterns and HOG are feature extractors known to be fast enough for mobile (the former more than the latter), and they are not too complicated to implement. Classifiers, however, are non-trivial, and you would probably have to search for an appropriate iOs library.
Alternatively, you could try to binarize the image, i.e. classify pixels as "plate" / white or "text" / black. Then you can use an error-tolerant similarity measure for comparing your binarized image with a binarized reference image of the plaque. The chamfer distance measure is a good candidate. It essentially boils down to comparing the distance transforms of your two binarized images. This is more tolerant to misalignment than comparing binary images directly. The distance transforms of the reference images can be pre-computed and stored on the device.
Personally, I would try the second approach. A (non-mobile) prototype of the second approach is relatively easy to code and evaluate with a good image processing library (OpenCV, Matlab + Image Processing Toolbox, Python, etc).
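For instance, a rough (non-mobile) Python/OpenCV sketch of such a prototype, assuming the plaque has already been cropped and roughly aligned (file names are placeholders):

    import cv2
    import numpy as np

    def binarize(gray):
        # Otsu thresholding; text becomes white (255) on a black background
        _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        return bw

    def chamfer_score(query_bw, ref_dist):
        # Average distance from each query text pixel to the nearest
        # reference text pixel (lower = more similar)
        ys, xs = np.nonzero(query_bw)
        if len(xs) == 0:
            return np.inf
        return ref_dist[ys, xs].mean()

    # Pre-compute per reference plaque (can be stored on the device):
    ref = cv2.imread("plaque_ref.png", cv2.IMREAD_GRAYSCALE)
    ref_bw = binarize(ref)
    # distanceTransform measures distance to the nearest zero pixel,
    # so invert: text pixels -> 0, background -> 255
    ref_dist = cv2.distanceTransform(255 - ref_bw, cv2.DIST_L2, 3)

    # At runtime, binarize the cropped camera image and compare:
    query = cv2.imread("camera_crop.png", cv2.IMREAD_GRAYSCALE)
    query_bw = cv2.resize(binarize(query), (ref_bw.shape[1], ref_bw.shape[0]))
    print(chamfer_score(query_bw, ref_dist))

You would compute this score against each of the (up to 15) stored references and pick the smallest.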
I managed to find a solution that is working quite well. It's not fully optimized yet, but I think it's just a matter of tweaking filters, as I'll explain later on.
Initially I tried to set up OpenCV, but it was very time consuming with a steep learning curve; it did give me an idea, though. The key to my problem is really detecting the characters within the image and ignoring the background, which was basically just noise. OCR was designed exactly for this purpose.
I found the free library Tesseract (https://github.com/ldiqual/tesseract-ios-lib) easy to use and with plenty of customizability. At first the results were very random, but applying sharpening, a monochromatic filter, and a color invert worked well to clean up the text. Next I marked out a target area on the UI and used that to cut out the rectangle of the image to process. The speed of processing is slow on large images, and this cut it dramatically. The OCR filter allowed me to restrict the allowable characters, and as the plaques follow a standard configuration, this improved the accuracy.
So far it's been successful with the grey background plaques, but I haven't found the correct filter for the red and white editions. My goal will be to add color detection and remove the need to feed in the data type.
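For what it's worth, a desktop prototype of that pipeline might look roughly like this in Python with OpenCV and pytesseract (the iOS library wraps the same Tesseract engine). The crop rectangle and whitelist are placeholders, and on some Tesseract versions the whitelist only applies to the legacy engine:

    import cv2
    import pytesseract

    img = cv2.imread("frame.png")
    x, y, w, h = 100, 400, 300, 80          # marked-out target area (placeholder)
    roi = cv2.cvtColor(img[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)

    # Sharpen (unsharp mask), then binarize so the text stands out clearly
    blur = cv2.GaussianBlur(roi, (0, 0), 3)
    sharp = cv2.addWeighted(roi, 1.5, blur, -0.5, 0)
    _, bw = cv2.threshold(sharp, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Single-line mode with a restricted character set
    text = pytesseract.image_to_string(
        bw, config="--psm 7 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    print(text.strip())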
So, I'm trying to stitch images taken by a microscope of a microchip, but it's very hard to have all the features aligned. I already have a 50% overlap between two adjacent images, but even with that, it's not always a good fit.
I'm using SURF with OpenCV to extract the keypoints and find the homographic matrix. But still, it's far from being an acceptable result.
My objective is to be able to stitch 2x2 images perfectly, so that I can repeat the process recursively until I have the final image.
Do you have any suggestions? A nice algorithm to approach this problem, or maybe a way to transform the images to be able to extract better keypoints from them? Should I play with the threshold (a smaller one to get more keypoints, or a larger one)?
Right now, my approach is to first stitch two 2x1 images and then stitch those two together. It's close to what we want, but still not acceptable. Also, the problem might be that the image used as the "source" (while the second image is transformed with the matrix to overlap it) is itself a bit misaligned, or that there's a small angle in that image that affects the whole result.
Any help or suggestion is appreciated. Especially any solution that would allow me to keep using OpenCV and SURF (not that I'm totally against other libraries... it's just that most of the project has been developed with those).
Thanks!
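For reference, the keypoint-plus-homography step described above looks roughly like this in Python. This is a sketch only: SURF lives in opencv-contrib and may require a build with the non-free modules enabled, the file names are placeholders, and ORB would be a drop-in fallback.

    import cv2
    import numpy as np

    img1 = cv2.imread("tile_left.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("tile_right.png", cv2.IMREAD_GRAYSCALE)

    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=300)  # lower -> more keypoints
    kp1, des1 = surf.detectAndCompute(img1, None)
    kp2, des2 = surf.detectAndCompute(img2, None)

    # Ratio-test matching, then a RANSAC-estimated homography
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = [m for m, n in matcher.knnMatch(des1, des2, k=2)
               if m.distance < 0.75 * n.distance]
    src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # Warp the second tile into the first tile's frame and overlay
    canvas = cv2.warpPerspective(img2, H, (img1.shape[1] * 2, img1.shape[0]))
    canvas[:, :img1.shape[1]] = img1
    cv2.imwrite("stitched_2x1.png", canvas)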
I found using TurboReg during image registration development to be a helpful comparison tool. It is a free ImageJ plugin, and has many different fitting types.
Have you taken a look at the new OpenCV stitching samples: stitching.cpp and stitching_detailed.cpp?
EDIT : I forgot this was cutting edge OpenCV because I'm using the trunk at home :) To get access to these new samples, you'll need to check out the OpenCV trunk from SVN like this:
svn co https://code.ros.org/svn/opencv/trunk/opencv opencv-trunk
Unfortunately, you'll need to recompile it, but you should be able to use the new stitching code :) If you haven't built OpenCV from source before, here is a good little tutorial to get you started. I will mention that OpenCV has a lot more options that can be enabled/disabled than are mentioned in the tutorial, so you might want to use the cmake-gui to get a look at all of the options. You can apt-get it with this command:
> sudo apt-get install cmake-qt-gui
Also, if you're more concerned with quality, and you don't mind slower performance; you might consider using the Lucas-Kanade method for image registration. Here is a lecture, and here is a paper on the topic that might be helpful to you.
Fiji's stitching plugin handles this situation of alignment error propagation with 2D mosaicing. We use it daily for microscope stitching, and I must say it is perfect.
I'm thinking of starting a project for school where I'll use genetic algorithms to optimize digital sharpening of images. I've been playing around with unsharp masking (USM) techniques in Photoshop. Basically, I want to create software that optimizes the parameters (i.e. blur radius, type of blur, blending of the image) to create the "best-fit" set of filters.
I'm sort of quickly planning this project before starting it, and I can't think of a good fitness function for the 'selection' part. How would I determine the 'quality' of the filter sets, or measure how sharp the image is?
Also, I will be programming using python (with the Python Imaging Library) since it's the only language I'm proficient with. Should I learn a low-level language instead?
Any advice/tips on anything is greatly appreciated. Thanks in advance!
tl;dr How do I measure how 'sharp' an image is?
If it's for tuning parameters, you could take a known image and apply a known blurring/low-pass filter, then sharpen this with your GA+USM algorithm. Calculate your fitness function making use of the original image; maybe something as simple as the mean absolute error. You may need to create different datasets, e.g. landscape images (mostly sharp, in focus, with large depth of field) and portrait images (which could have large areas deliberately out of focus and "soft"), along with low-noise and noisy images. Sharpening noisy images is actually quite a challenge.
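A minimal sketch of that fitness idea with Pillow (PIL) and NumPy, where the reference file name and the USM parameters are placeholders:

    import numpy as np
    from PIL import Image, ImageFilter

    # Degrade a known-sharp reference with a known blur
    original = Image.open("sharp_reference.png").convert("L")
    degraded = original.filter(ImageFilter.GaussianBlur(radius=2))

    def fitness(radius, percent, threshold):
        # Candidate USM parameters proposed by the GA; score by how close
        # the restored image gets to the original (negated MAE, higher = fitter)
        restored = degraded.filter(
            ImageFilter.UnsharpMask(radius=radius, percent=percent, threshold=threshold))
        a = np.asarray(original, dtype=np.float64)
        b = np.asarray(restored, dtype=np.float64)
        return -np.abs(a - b).mean()

    print(fitness(radius=2, percent=150, threshold=3))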
It would definitely be worth taking a look at Bruce Frasier's work on sharpening techniques for Photoshop etc.
Also, it might be worth checking out Imatest (www.imatest.com) to see if there is anything regarding sharpness/resolution. Finally, you might also consider resolution charts.
And finally, I seriously doubt one set of ideal parameters exists for USM; the optimum parameters will be image dependent and indeed a personal preference (that's why I suggest starting from a known sharp image and blurring it). Understanding the type of image is probably just as important, and is in itself a very interesting and challenging problem, although perhaps basic heuristics like image variance and an edge histogram would reveal suitable clues.
Anyway, just a thought; hopefully some of the above is useful.