I am trying to do OCR using Tesseract, and the overall results seem acceptable. The images are very long receipts scanned with a scanner, so the quality is good. The only issue is that in some receipts a few characters are joined between two lines.
Please see the attached sample image. You can see that the character 'p' in the first line and the character 'M' in the second line are joined. This causes problems for the OCR.
So the real question is: can we add a white line or square between every text line?
You can do that for this image in ImageMagick by trimming the image to remove the surrounding white and replacing it with the same amount of black. Then average that image down to one column and look for the brightest row. I start and stop 4 pixels from the top and bottom to avoid any really bright rows in those regions. Once I find the brightest row, I splice in 4 rows of white between the top and bottom regions divided by that row. This is not the most elegant way, but it shows the potential. One could likely pipe the list of row values to AWK and search for the max value more efficiently than saving them to an array and using a for loop. Unix syntax with ImageMagick.
Input:
max=0
row=0
arr=()
arr=(`convert text.png -fuzz 50% -trim -background black -flatten -colorspace gray -scale 1x! -depth 8 txt:- | tail -n +2 | sed -n 's/^.*gray[(]\(.*\)[)]$/\1/p'`)
num=${#arr[*]}
#echo "${arr[*]}"
for ((i=4; i<num-4; i++)); do
val="${arr[$i]}"
max=`convert xc: -format "%[fx:$val>$max?$val:$max]" info:`
row=`convert xc: -format "%[fx:$val==$max?$i:$row]" info:`
#echo "$i $val $max $row"
done
convert text.png -gravity north -splice 0x4+0+$row text2.png
If you want less space, you can change to -splice 0x1+0+$row, but it won't change much. It is not writing over your image, but inserting white between the existing rows.
But by doing the processing above, your OCR still may not recognize the p or M, since the bottom of the p is cut off and appended to the M.
If you have more than two lines of text, you will have to search the column for approximately evenly spaced maxima.
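The AWK idea mentioned above could look roughly like the sketch below. It assumes the gray values have already been extracted one per line, as the sed step above does. The function names brightest_row and gap_rows and the default margin and threshold are made up for illustration. Note that gap_rows is not a real search for evenly spaced maxima: it substitutes a simpler threshold test and reports the middle of every run of bright rows.

```shell
# brightest_row [MARGIN]: read one gray value per line on stdin and print the
# 0-based index of the brightest row, ignoring MARGIN rows at each end.
# Ties go to the later row, as in the for loop above.
brightest_row() {
  awk -v m="${1:-4}" '
    { v[NR - 1] = $1 + 0 }
    END {
      best = -1
      for (i = m; i < NR - m; i++)
        if (best < 0 || v[i] >= v[best]) best = i
      if (best >= 0) print best
    }'
}

# gap_rows [THRESHOLD]: print the middle row of every run of consecutive rows
# whose value is at least THRESHOLD (hypothetical default: 200).
gap_rows() {
  awk -v t="${1:-200}" '
    function emit() { if (start >= 0) print int((start + last) / 2); start = -1 }
    BEGIN { start = -1 }
    {
      if ($1 + 0 >= t) { if (start < 0) start = NR - 1; last = NR - 1 }
      else emit()
    }
    END { emit() }'
}
```

Either function can be fed the output of the convert, tail and sed pipeline above in place of filling the array. If you splice at several rows, working from the bottom row upwards keeps the earlier offsets valid.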
I don't see any rhyme or reason to the use of + vs -, and I've never seen a Unix command other than ImageMagick's suite (convert, etc.) which expects some flags to have a +. Win32 commands (and ports of them to Linux) sometimes use /, but never +.
I don't know the background behind this but personally find it quite rational. In general, the normal form is like other Linux commands preceded with a dash, or hyphen:
magick INPUT -something OUTPUT
The + form is used when there is a sense of:
negation, or
opposite direction, or
of resetting, disabling or clearing or
as a shorthand form.
There may be some overlap in these concepts, and maybe other additional ones exist.
So, in terms of "negation":
magick INPUT -fill red -opaque blue RESULT
will turn all blue pixels into red, whereas this command:
magick INPUT -fill red +opaque blue RESULT
will turn all non-blue pixels into red.
Similarly, -adjoin will clump multiple images together into a single output file if possible, whereas +adjoin will force multiple, separate output files even when it may have been possible to make, say, a multipage TIFF or animated GIF.
Another example is -level 10%,90% which will increase contrast so that the top and bottom 10% of the brightness range are discarded and the remaining 80% are stretched across the full, permissible brightness range. On the other hand, +level 10%,90% will decrease contrast by compressing the entire possible brightness range into the central 80% of the possible brightness range.
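The arithmetic behind those two forms can be sketched as follows, working in percent of the brightness range and assuming the default gamma of 1.0. The function names level_minus and level_plus are invented for this illustration and are not ImageMagick commands.

```shell
# level_minus BLACK WHITE VALUE: what -level BLACK%,WHITE% does to VALUE (all in percent).
# The range BLACK..WHITE is stretched to 0..100 and anything outside is clipped.
level_minus() {
  awk -v b="$1" -v w="$2" -v v="$3" 'BEGIN {
    out = (v - b) / (w - b) * 100
    if (out < 0) out = 0
    if (out > 100) out = 100
    printf "%.2f\n", out
  }'
}

# level_plus BLACK WHITE VALUE: what +level BLACK%,WHITE% does to VALUE (all in percent).
# The full range 0..100 is compressed into BLACK..WHITE.
level_plus() {
  awk -v b="$1" -v w="$2" -v v="$3" 'BEGIN {
    printf "%.2f\n", b + v * (w - b) / 100
  }'
}
```

For example, level_minus 10 90 25 gives 18.75 (darker than before, so more contrast), while level_plus 10 90 25 gives 30.00 (pulled towards mid-grey, so less contrast).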
In terms of "opposite direction", this command will append images vertically below the first:
magick INPUT INPUT INPUT -append TALL_RESULT
whereas the following positive form will append images horizontally to the right:
magick INPUT INPUT INPUT +append WIDE_RESULT
In terms of "resetting, disabling or clearing", this command will use Riemersma dithering:
magick INPUT -dither Riemersma ... RESULT
whereas the following positive form will disable dithering:
magick INPUT +dither ... RESULT
If you select a couple of channels to apply a filter or threshold to, you can reset back to the default channels afterwards:
magick INPUT -channel alpha -threshold 50% +channel RESULT
If you set a fuzz for some operation, you can reset it back to zero afterwards:
magick INPUT -fill red -fuzz 10% -opaque blue +fuzz -opaque yellow RESULT
which will set the fill-colour to red, then turn all pixels within 10% of blue into that red and also all perfectly yellow pixels into that fill-colour of red.
In terms of "shorthand", -swap 0,2 will swap the first and third image in a sequence, whereas +swap will swap the last two in the sequence regardless of how many there are. This is a common operation and the plus form is succinct compared to the conventional alternative -swap -1,-2
Likewise, -clone 2 will clone the third image in a sequence, whereas +clone will clone the last... again, a very common operation. Compare +clone with the more conventional-looking, but IMHO uglier alternative -clone -1
Likewise, +delete will delete the last image in a sequence.
I would like to ignore or filter out white and colours close to white when using this type of command:
convert $file -colors 10 -format "%c" histogram:info:
I have some images that are predominantly white, and I'd like to focus on the other colours.
I thought I could use something like -fuzz 20% -fill "#0000ff" -opaque white to change all white and whitish pixels into a plain blue, which I could then grep out of the histogram output, but no luck.
Can anyone point me in the right direction or offer an example, please?
Let's say, for example, I have a photo that contains a very overexposed sky, i.e. lots of white. I'd like the histogram output to ignore that part of the photo and instead output the top 10 other colours in the photo.
Thanks
fLo
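One possible direction, offered only as a sketch: instead of recolouring white first, filter the histogram text afterwards. The function below assumes every histogram line contains an (r,g,b) tuple on a 0-255 scale, which may require forcing 8-bit output; the name drop_near_white and the cutoff of 230 are arbitrary choices for illustration.

```shell
# drop_near_white [MIN]: read histogram lines on stdin and drop every line
# whose first three channel values are all at least MIN (assumed 0-255 scale).
drop_near_white() {
  awk -v min="${1:-230}" '
    {
      s = $0
      sub(/^[^(]*\(/, "", s)   # remove everything up to the first "("
      sub(/\).*$/, "", s)      # remove everything from the first ")" on
      n = split(s, c, ",")
      if (n >= 3 && c[1] + 0 >= min && c[2] + 0 >= min && c[3] + 0 >= min) next
      print
    }'
}
```

The histogram output would be piped through it, for example by appending `| drop_near_white 230 | sort -rn` to the command above. Because -colors 10 reduces the palette before the filter sees it, white still takes up some of the ten slots, so a larger -colors value may be needed to end up with ten non-white entries.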
Please help. Is there an easy way to take the largest layer of a tiff and zip compress it back as a single layer tiff again with imagemagick or similar?
Just a slightly easier version of Fred's answer. You can generate a list of the area (in pixels) of each layer in a TIF followed by the layer/scene number like this:
magick identify -format "%[fx:w*h] %s\n" image.tif
Sample Output
240000 0
560000 1
200000 2
So, if we do that again, sort it reverse numerically and take the second field of the first result, we will get the number of the layer with the largest area:
layer=$(magick identify -format "%[fx:w*h] %s\n" image.tif | sort -rn | awk 'NR==1{print $2}')
So, the complete solution would look like:
#!/bin/bash
# Get layer number of layer with largest area
layer=$(magick identify -format "%[fx:w*h] %s\n" image.tif | sort -rn | awk 'NR==1{print $2}')
# Extract that layer and recompress as single layer
magick image.tif[$layer] -compress lzw result.tif
If you are using ImageMagick v6 or older:
magick identify ... becomes identify ...
magick image.tif ... becomes convert image.tif ...
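The sort and awk steps can also be folded into a single awk pass. This is only a sketch: the function name largest_layer is made up, and it assumes input in the "area scene" format produced by the identify command above.

```shell
# largest_layer: read "AREA SCENE" lines on stdin and print the scene number
# with the largest area. On a tie the first such layer is kept.
largest_layer() {
  awk 'NR == 1 || $1 + 0 > best { best = $1 + 0; layer = $2 }
       END { if (NR) print layer }'
}
```

With the sample output above it prints 1, so it can replace the `sort -rn | awk 'NR==1{print $2}'` part of the pipeline.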
In concept, using ImageMagick this can be done in a single command. Here's an example...
magick input.tif -background none -virtual-pixel none ^
( -clone 0--1 +repage -layers merge ) ^
-distort affine "0,0 0,%[fx:s.w==u[-1].w&&s.h==u[-1].h?0:h]" ^
-delete -1 -layers merge output.tif
That starts by reading in the original TIF and setting the background and virtual-pixel settings to "none".
Then inside the parentheses it clones all the layers of the TIF, repages them, and merges them into a single image with the dimensions of the largest layer. That will become a gauge to measure with.
Next it uses "-distort affine" to slide each image out of the viewport and leave it transparent unless the image matches the width and height of that gauge. So after that distort, the largest image will remain unchanged, and all the others will be transparent.
Finish by deleting that gauge image and merging the rest. All the layers are transparent except the largest one, so merging them leaves just that visible one as a single layer.
The command is in Windows syntax using IM7. If you're using ImageMagick v6, use "convert" instead of "magick". To make it work in *nix, change the continued line carets "^" to backslashes "\" and escape the parentheses with backslashes "\(...\)". There may be other issues I've overlooked.
Obviously if there are two or more layers matching the largest dimensions, the output result will only be the first one from the original TIF.
Edited to add: This method will only work if both the greatest width and greatest height are on the same image.
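Whether that last condition holds for a given file can be checked beforehand. The sketch below assumes one "width height" line per layer on stdin, for example from `magick identify -format "%w %h\n" input.tif`; the function name dominant_layer is hypothetical.

```shell
# dominant_layer: read "WIDTH HEIGHT" lines on stdin. If one layer has both the
# greatest width and the greatest height, print its 0-based index and return 0;
# otherwise print nothing and return 1.
dominant_layer() {
  awk '
    {
      w[NR - 1] = $1 + 0; h[NR - 1] = $2 + 0
      if ($1 + 0 > mw) mw = $1 + 0
      if ($2 + 0 > mh) mh = $2 + 0
    }
    END {
      found = 0
      for (i = 0; i < NR; i++)
        if (w[i] == mw && h[i] == mh) { print i; found = 1; break }
      if (!found) exit 1
    }'
}
```

If it returns non-zero, the widest layer and the tallest layer are different images and the single-command approach above should not be relied on.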
How do you define largest? Width, Height, File size? If the largest dimension from width and height is used, then in Unix, you can do the following on a 3 layer tif file. Get the max dimension of each layer. Then find which layer is the largest. Then use just that layer when reading and writing the file.
Arr=(`identify -format "%[fx:max(w,h)]\n" img.tif`)
echo "${Arr[*]}"
500 1024 770
num=${#Arr[*]}
dim=0
for ((i=0; i<num; i++)); do
if [ "${Arr[$i]}" -gt "$dim" ]; then   # -gt compares numerically; a bare > inside [ ] is a redirection
dim=${Arr[$i]}
index=$i
fi
done
echo "$index"
1
convert img.tif[$index] -compress zip newimg.tif
identify newimg.tif
I cannot think of any direct and simple method to find the largest layer and extract it in the same command line.
I know some ways to compare two images with ImageMagick or OpenCV:
Using Objective-C, is there any way to compare two images and get a % difference value returned?
http://www.imagemagick.org/Usage/compare/#methods
How can I quantify difference between two images?
Image comparison - fast algorithm.
But in my case, the same character can also appear at a different position.
Image1:
Image2:
or
or
So, what should I do now to find the % difference value between Image1 and Image2?
This actually answers your question - which doesn't in fact ask anything about images 3 and 4 - but I fear it will not help you much.
As @GPPK suggests, you need to trim the extraneous material off around your kanji characters, which you can do with the -trim option in ImageMagick. I have added a thin red border so you can see where the edges are:
convert kanji2.png -trim kanji2-t.png
If you want to do that to images 1 and 2, and then compare them, you can do it all in one go like this:
convert -metric ae kanji1.png kanji2.png -trim -compare -format "%[distortion]" info:
0
which shows there are zero pixels different in the resulting images if you trim kanji1 and kanji2.
If you compare the trimmed kanji1 and kanji3 like this, you get:
convert -metric AE kanji1.png kanji3.png -trim -compare -format "%[distortion]" info:
893184
which indicates that roughly 900,000 pixels out of 5,000,000 are different.
Likewise, if you compare kanji1 and kanji4:
convert -metric AE kanji1.png kanji4.png -trim -compare -format "%[distortion]" info:
1.14526e+06
or 1.1 million of 5 million.
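Since the question asks for a percentage, the counts above can be converted with a little arithmetic. This is a minimal sketch: the function name pct_diff is made up, and the total of 5,000,000 is just the rounded figure used above, whereas the exact total would be the width times the height of the trimmed image.

```shell
# pct_diff DIFFERENT TOTAL: print the share of differing pixels as a percentage.
# DIFFERENT may be in exponent notation, as ImageMagick sometimes prints it.
pct_diff() {
  awk -v d="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * d / t }'
}
```

For the two comparisons above, pct_diff 893184 5000000 gives 17.86 and pct_diff 1.14526e+06 5000000 gives 22.91.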
But this doesn't help when your images are a different size (scale), or rotated.
You could scale your images to a normalised size before comparing, and I guess that might help you become a bit more "scale invariant":
convert -metric AE kanji1.png kanji4.png -trim -scale 1000x1000! -compare -format "%[distortion]" info:
You could also rotate your images using a little iterative procedure that rotates the images through say +/- 20 degrees and chooses the one with the smallest trimmed bounding box to become a little more "orientation invariant". But then you will still have a problem if the characters are sheared, or fatter, or thinner, or brighter, or darker, or contrastier... I think you need to look into "Template Matching".
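The selection step of that iterative rotation could be sketched as below. It assumes each input line holds an angle followed by a bounding-box geometry of the form WxH+X+Y; the function name best_angle is invented, and using the box area as the measure of "smallest" is one possible choice among several.

```shell
# best_angle: read "ANGLE WxH+X+Y" lines on stdin and print the angle whose
# bounding box has the smallest area (the first one wins on a tie).
best_angle() {
  awk '
    {
      split($2, p, /[x+-]/)          # p[1] = width, p[2] = height
      area = p[1] * p[2]
      if (NR == 1 || area < best) { best = area; angle = $1 }
    }
    END { if (NR) print angle }'
}

# Possible way to produce the input lines (untested assumption: the %@ escape
# reports the trim bounding box of the rotated image):
#   for a in $(seq -20 2 20); do
#     printf '%s %s\n' "$a" "$(convert kanji4.png -rotate "$a" -format '%@' info:)"
#   done | best_angle
```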
I'm generating CAPTCHAs for training data and I have a pretty good ImageMagick script going already.
However, one thing I really want is for individual letters of the word to be slightly rotated, see for example this reCAPTCHA:
Is there an easy (or hard) way to accomplish this effect?
I think you need this:
#!/bin/bash
word="theId"
for (( i=0 ; i<${#word} ; i++ )) ; do
rotation=$(((RANDOM%10)*4)) # Generate random rotation for each letter
convert -background none -virtual-pixel none -pointsize 72 label:"${word:i:1}" +distort SRT $rotation miff:-
done | convert -background none - +append result.png
Basically I am creating and rotating one letter at a time and writing them to a MIFF stream, one after the other, and at the end, I am using +append to join together everything I see on the input stream.
If you want to scrunch the letters closer together (TM) you can add -trim +repage just before miff:-
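The rotation line above only ever produces non-negative angles (0 to 36 degrees in steps of 4). If you want letters to tilt in both directions, a small helper along these lines could replace it. The name random_tilt and the default limit of 20 degrees are assumptions made for this sketch, and it relies on bash's RANDOM like the original script.

```shell
# random_tilt [MAX]: print a random whole number of degrees between -MAX and +MAX.
random_tilt() {
  max="${1:-20}"
  echo $(( RANDOM % (2 * max + 1) - max ))
}
```

In the loop you would then write rotation=$(random_tilt 20) instead of the existing rotation line.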