I have a QR code scanned from a document. When I try to decode it with an online decoder such as http://zxing.org/w/decode.jspx, the QR code is not found, but when I decode it with the camera on a smartphone, the correct text comes out. I think this is because of small noise in the image. How can I clean it up with ImageMagick?
There may be other QR codes as well.
You can try a median filter like this, but you would probably be better off extracting the image from the PDF the way Kurt suggested in your previous question in order to retain more quality:
convert qr.png -median 3 result.png
I don't have the exact ImageMagick command available, but what you usually do to remove noise is to:
blur the image by 1 or 2 pixels
lighten it up a bit
add contrast again
ImageMagick should be able to do all of this, for example with something like the command sketched below.
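A minimal sketch of that recipe as a single ImageMagick command, assuming the input is called qr.png; the blur sigma and the brightness/contrast percentages are guesses you would need to tune:
convert qr.png -blur 0x1 -brightness-contrast 5x20 result.png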
Related
I previously posted how to auto-fix a skewed image. Is there a way to simply detect that an image is skewed with ImageMagick? I.e. is there a command that I could run on two images, one skewed and one not, and use its output to determine whether an image is skewed?
Thanks for any help,
Kevin
Correction to my comment above: there is a way to determine the skew angle in ImageMagick if you have regular lines of text.
Input:
convert img.jpg -deskew 60% -format "%[deskew:angle]" info:
2.18111
See https://imagemagick.org/script/escape.php
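To turn that into a yes/no skew check along the lines of the original question, one option (a rough sketch, not a built-in ImageMagick feature) is to capture the reported angle and compare it against a small tolerance; the 1-degree cutoff here is a guess:
angle=$(convert img.jpg -deskew 60% -format "%[deskew:angle]" info:)
echo "$angle" | awk '{ if ($1 > 1 || $1 < -1) print "skewed"; else print "straight" }'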
For example:
magick -density 100 apple.jpg -resize 100x100 apple-edited.jpg
Density here is an option that has to be provided before the input file name.
Can someone please explain when such prior options are needed, and also provide a few more examples?
PS: I tried looking this up, but I am not sure if there is a specific term for such options that would help narrow down the search results.
ImageMagick is, exclusively, a raster image processor, which means that it processes bitmap images made up of pixels laid out on a rectangular grid (or raster), rather than a vector image processor like Adobe Illustrator or Inkscape which deal with shapes, lines, rectangles and bezier curves described by their vertices or inflection points - not pixels.
When you load a vector image (e.g. an SVG image, or a PDF) into ImageMagick, the very first thing it does is rasterise your image onto a rectangular grid before it can work with the pixels. So, if your image is an SVG or a PDF, you need to set the density, or number of lines in the grid before you load the image.
I don't know of a reason to do that for a JPEG, like your example. I can't think of any other settings you absolutely need to make prior to loading an image.
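As a sketch of the difference (input.pdf is just a placeholder name): the density has to be set before the vector file is read, because it controls the rasterisation itself.
magick -density 300 input.pdf page.png
magick input.pdf -density 300 page.png
The first command gives you a sharp 300 dpi rasterisation; the second rasterises at ImageMagick's default (around 72 dpi) and merely tags the already-rasterised output as 300 dpi.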
I am trying to binarize images similar to the following image:
Basically, I want everything that is not white to become black, but thresholding in OpenCV gives fringing (JPEG artifacts). I even tried Otsu thresholding, but some parts of the colours still don't come out well.
Is there any simple way of doing this binarization properly?
Convert to greyscale, apply a 5x5 blur filter, and binarize? The blur will smooth out the ringing artifacts.
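A minimal OpenCV sketch of that suggestion; input.jpg and the threshold value of 200 are placeholder assumptions to tune for your images:
import cv2

# read the image and drop the colour information
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# 5x5 box blur smooths the JPEG ringing around the edges
blurred = cv2.blur(img, (5, 5))

# pixels brighter than the threshold stay white, everything else turns black
_, binary = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY)

cv2.imwrite("binary.png", binary)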
After quite some trial and error, it turns out that morphological closing before thresholding with a large value is the most suitable for the next stage of what I am working on, although it does cause some loss of shape information.
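For reference, a sketch of that step in OpenCV; the 15x15 structuring element stands in for the "large value" and is only a guess:
import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# closing (dilation followed by erosion) with a large kernel suppresses
# small dark artifacts before thresholding, at the cost of some shape detail
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

_, binary = cv2.threshold(closed, 200, 255, cv2.THRESH_BINARY)
cv2.imwrite("binary.png", binary)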
Given that you have to use JPEG for this project, the one thing you can do is use all-ones quantization tables. That is usually done through the "quality" setting. You want an encoder that allows you to do effectively no quantization.
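In practice, the closest most encoders get to that is maximum quality with chroma subsampling disabled; whether the tables really end up as all ones depends on the encoder. A sketch with ImageMagick:
convert input.png -quality 100 -sampling-factor 4:4:4 output.jpg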
I have to OCR a table from a PDF document. I wrote a simple Python+opencv script to extract the individual cells, but then a new problem arose: the text is antialiased and of poor quality.
The recognition rate of tesseract is very low. I've tried to preprocess the images with adaptive thresholding, but the results weren't much better.
I've tried the trial version of ABBYY FineReader and it does indeed give fine output, but I don't want to use non-free software.
I wonder whether some preprocessing would solve the issue, or whether it is necessary to write and train a different OCR system.
If you look closely at your antialiased text samples, you'll notice that the edges contain a lot of red and blue:
This suggests that the antialiasing is taking place inside your computer, which has used subpixel rendering to optimise the results for your LCD monitor.
If so, it should be quite easy to extract the text at a higher resolution. For example, you can use ImageMagick to extract images from PDF files at 300 dpi by using a command line like the following:
convert -density 300 source.pdf output.png
You could even try loading the PDF in your favourite viewer and copying the text directly to the clipboard.
Addendum:
I tried converting your sample text back into its original pixels and applying the scaling technique mentioned in the comments. Here are the results:
Original image:
After scaling 300% and applying simple threshold:
After smart scaling and thresholding:
As you can see, some of the letters are still a bit malformed, but I think there's a better chance of reading this with Tesseract.
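If you want to try the simple variant of that yourself, a rough ImageMagick sketch (cell.png is a placeholder for one extracted cell, and the 50% threshold is a guess to tune):
convert cell.png -resize 300% -threshold 50% cell-big.png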
I'm looking for a way to convert raster images to vector data using OpenCV. I found the function cv::findContours(), which seems to be a bit primitive (or, more likely, I did not understand it fully):
It seems to work on black-and-white images only (no greyscale and no coloured images), and it does not seem to accept any filtering/error-suppression parameters that could be helpful with noisy images, e.g. to avoid very short vector lines, or uneven polylines where one single straight line would be the better result.
So my question: is there an OpenCV way to vectorise coloured raster images, where the colour information is assigned to the resulting polylines afterwards? And how can I apply noise reduction and error suppression to such an algorithm?
Thanks!
If you want to vectorise the image by colour, then I recommend clustering the image into a small set of colours (i.e. quantizing it), and after that extracting the contours of each colour and converting them to the format you need. There are no ready-made vectorizing methods in OpenCV.
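A sketch of that pipeline in Python/OpenCV, with k-means doing the colour clustering and cv2.approxPolyDP acting as the error suppression; the file name, K=4 and the epsilon value are placeholder assumptions:
import cv2
import numpy as np

img = cv2.imread("input.png")

# k-means clusters every pixel into one of K representative colours
K = 4
pixels = img.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 5, cv2.KMEANS_RANDOM_CENTERS)
labels = labels.reshape(img.shape[:2])

polylines = []
for k in range(K):
    # binary mask of the pixels belonging to this colour cluster
    mask = (labels == k).astype(np.uint8) * 255
    # OpenCV 4.x return signature; 3.x returns an extra image first
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        # Douglas-Peucker simplification replaces wobbly contours with fewer,
        # straighter segments; a larger epsilon means stronger smoothing
        simplified = cv2.approxPolyDP(c, 2.0, True)
        polylines.append((tuple(int(v) for v in centers[k]), simplified))
Each entry in polylines then carries the cluster colour together with its simplified contour, which you can write out in whatever vector format you need.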