How to find paragraph bounding box coordinates in a scanned document? - opencv

I'd like to get the coordinates of all areas containing any text in scans of documents like the one shown below (in reduced quality; the original files are of high resolution):
I'm looking for something similar to these (GIMP'ed-up!) bounding boxes. It's important to me that the paragraphs be recognized as such; it would be fine, though, if the two big blocks (the top box on the left page and the center block on the right page) each got two bounding boxes:
The bounding box coordinates could come from some kind of API (scripting languages preferred over compiled ones) or from a command-line tool; I don't care which. What's important is that I get the coordinates themselves, not just a modified version of the image in which they are visible, because I need to calculate the area of each box and then cut out a piece at the center of the largest one.
What I've already tried, so far without success:
ImageMagick - it's just not meant for such a task
OpenCV - either the learning curve is too high or my google-fu is too weak
Tesseract - from what I've been able to gather, it's the odd one out among OCR programs: for historical reasons it doesn't do Page Layout Analysis before attempting character shape recognition
OCRopus/OCRopy - should be able to do it, but I can't figure out how to tell it that I'm interested in paragraphs as opposed to words or characters
Kraken ibn OCRopus - a fork of OCRopus with some rough edges; still fighting with it
Statistics, specifically a clustering algorithm (OPTICS seems to be the most appropriate for this task) applied after binarization of the image - both my maths and my coding skills are insufficient for it
I've seen images around the internet of document scans segmented into parts containing text, photos, and other elements, so this problem seems to have already been solved academically. But how do I get to the goodies?

In ImageMagick, you can threshold the image to keep from picking up too much noise, then blur it and threshold again to connect large regions of black. Then use -connected-components to filter out small regions, especially white ones, and find the bounding boxes of the black regions. (Unix bash syntax)
convert image.png -threshold 95% \
-shave 5x5 -bordercolor white -border 5 \
-blur 0x2.5 -threshold 99% -type bilevel \
-define connected-components:verbose=true \
-define connected-components:area-threshold=20 \
-define connected-components:mean-color=true \
-connected-components 4 \
+write tmp.png null: | grep "gray(0)" | sed 's/^[ ]*//' | cut -d' ' -f2
This is the tmp.png image that was created. Note that I have discarded regions smaller than 20 pixels in area. Adjust as desired. Also adjust the blur as desired. You can make it larger to get bigger connected regions or smaller to get closer to individual lines of text. I shaved 5 pixels all around to remove spot noise at the top of your image and then padded with a border of 5 pixels white.
Here is the resulting list of bounding boxes:
267x223+477+123
267x216+136+43
48x522+413+0
266x86+136+317
266x43+136+410
266x66+477+404
123x62+479+346
137x43+142+259
117x43+486+65
53x20+478+46
31x20+606+347
29x19+608+48
26x18+716+347
26x17+256+480
25x17+597+481
27x18+716+47
21x17+381+240
7x7+160+409
We can go one step further and draw boxes around the regions:
boxes=""
bboxArr=(`convert image.png -threshold 95% \
-shave 5x5 -bordercolor white -border 5 \
-blur 0x2.5 -threshold 99% -type bilevel \
-define connected-components:verbose=true \
-define connected-components:area-threshold=20 \
-define connected-components:mean-color=true \
-connected-components 4 \
+write tmp.png null: | grep "gray(0)" | sed 's/^[ ]*//' | cut -d' ' -f2`)
num="${#bboxArr[*]}"
for ((i=0; i<num; i++)); do
    WxH=`echo "${bboxArr[$i]}" | cut -d+ -f1`
    xo=`echo "${bboxArr[$i]}" | cut -d+ -f2`
    yo=`echo "${bboxArr[$i]}" | cut -d+ -f3`
    ww=`echo "$WxH" | cut -dx -f1`
    hh=`echo "$WxH" | cut -dx -f2`
    x1=$xo
    y1=$yo
    x2=$((xo+ww-1))
    y2=$((yo+hh-1))
    boxes="$boxes rectangle $x1,$y1 $x2,$y2"
done
convert image.png -fill none -strokewidth 2 -stroke red -draw "$boxes" -alpha off image_boxes.png
Increase the area-threshold a little above 20 and you can get rid of the tiny box on the lower left around a round dot, which I think is noise.
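Since the original goal was to compute the area of each box and then cut out a piece at the center of the largest one, here is a rough sketch of that final step, reusing the bboxArr array filled above (the 100x100 crop size and the output name center_piece.png are arbitrary choices of mine):
largest=""
maxarea=0
for bbox in "${bboxArr[@]}"; do
    ww=`echo "$bbox" | cut -dx -f1`
    hh=`echo "$bbox" | cut -dx -f2 | cut -d+ -f1`
    area=$((ww*hh))
    # keep the box with the largest area
    if [ $area -gt $maxarea ]; then
        maxarea=$area
        largest=$bbox
    fi
done
xo=`echo "$largest" | cut -d+ -f2`
yo=`echo "$largest" | cut -d+ -f3`
ww=`echo "$largest" | cut -dx -f1`
hh=`echo "$largest" | cut -dx -f2 | cut -d+ -f1`
# crop a 100x100 piece centered inside the largest box
convert image.png -crop 100x100+$((xo+ww/2-50))+$((yo+hh/2-50)) +repage center_piece.png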

Related

How can I separate an image file by color region into multiple image files?

Take this image of a circle and a square on a transparent background (png).
I want to split this image by region. The regions are defined by connected color. I have a circle region. And a square region.
The output would be one image file with the circle on it and a second image file with the square on it.
Can ImageMagick do this? Can another tool do this? How can I do this?
Here is how to do that in ImageMagick. I note that your posted image has a white background, not a transparent one. However, I have changed the white to transparent. I first get the bounding boxes of the objects using connected components. Then I loop over each bounding box and crop it out and put it back into a transparent image using the virtual canvas information stored in the cropped image.
Input:
bboxArr=(`convert circle_rectangle_transp.png -type bilevel \
-define connected-components:verbose=true \
-define connected-components:exclude-header=true \
-define connected-components:mean-color=true \
-connected-components 8 null: | grep "gray(0)" | awk '{print $2}'`)
echo "${bboxArr[*]}"
i=0
for bbox in ${bboxArr[*]}; do
    echo $bbox
    convert circle_rectangle_transp.png -crop "$bbox" -background none -flatten circle_rectangle_transp_$i.png
    i=$((i+1))
done
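As a side note, you can see the virtual-canvas bookkeeping that makes the -background none -flatten step work: if you take a crop without flattening, a plain identify shows the crop size together with the page geometry, i.e. the full canvas size plus the +X+Y offset the piece came from (crop0.png is just an illustrative name):
convert circle_rectangle_transp.png -crop "${bboxArr[0]}" crop0.png
identify crop0.png
# prints something like: crop0.png PNG WxH FULLWxFULLH+X+Y 8-bit sRGB ...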

Extract sub images from bigger image with imagemagick

I have an image:
and I want to crop this image, extracting the sub-images separated by the horizontal black (gray) lines, to get something like this (a list of sub-images):
1.
2.
and so on...
How can I do it with ImageMagick? Thanks for help.
Here is one way to do that in ImageMagick with Unix bash scripting. Threshold the image. Add a black border all around. Then use connected-components processing to find all the large white rectangles and their bounding boxes. Put those in an array, sort the array by Y value (the top of each bounding box), and then loop over each bounding box and crop the input image.
Input:
bboxArr=(`convert math.png -threshold 75% -bordercolor black -border 1 -type bilevel \
-define connected-components:exclude-header=true \
-define connected-components:area-threshold=100 \
-define connected-components:mean-color=true \
-define connected-components:verbose=true \
-connected-components 8 null: | grep "gray(255)" | awk '{print $2}'`)
num="${#bboxArr[*]}"
sortedArr=(`echo ${bboxArr[*]} | tr " " "\n" | sort -t "+" -n -k3,3 | tr "\n" " "`)
for ((i=0; i<num; i++)); do
    cropval=${sortedArr[$i]}
    convert math.png -crop $cropval +repage math_$i.png
done
Output (showing first 4 out of 11):
ImageMagick automatically lists them largest-area first, so I had to sort them by Y.

ImageMagick - Trim / Crop to contiguous objects

How do you do the equivalent of this Photoshop step?
https://gyazo.com/180a507c0f3c9b342fe33ce218cd512e
Suppose there are two contiguous objects in an image and you want to create exact-sized crops around each one and output them as two files. (Generalize to N files.)
You can do that with "Connected Component Analysis" to find the contiguous blobs.
Start Image
convert shapes.png -colorspace gray -negate -threshold 10% \
-define connected-components:verbose=true \
-connected-components 8 -normalize output.png
Sample Output
Objects (id: bounding-box centroid area mean-color):
0: 416x310+0+0 212.3,145.2 76702 srgb(0,0,0)
1: 141x215+20+31 90.0,146.2 26129 srgb(255,255,255)
2: 141x215+241+75 311.0,190.2 26129 srgb(255,255,255)
Notice how each blob, or contiguous object, is "labelled" or identified with its own unique colour (shade of grey).
So there is a header line telling you what the fields are, followed by 3 blobs, i.e. one per line of output. The first line is the entire image and not much use. The second one is 141 px wide and 215 px tall, starting at +20+31 from the top-left corner. The third one is the same size (because I copied the shape) and starts at +241+75 from the top-left corner.
Now stroke red around the final indicated rectangle - bearing in mind that rectangle takes top-left and bottom-right corners rather than top-left corner plus width and height.
convert shapes.png -stroke red -fill none -draw "rectangle 241,75 382,290" z.png
And crop it:
convert shapes.png -crop 141x215+241+75 z.png
And here is the extracted part:
If you want to generalise, you can just pipe the ImageMagick output into awk and pick out the geometry field:
convert shapes.png -colorspace gray -negate -threshold 10% -define connected-components:verbose=true -connected-components 8 -normalize output.png | awk 'NR>2{print $2}'
Sample Output
141x215+20+31
141x215+241+75
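To finish the generalisation to N files, each geometry can be fed straight into -crop, one output file per blob. A sketch under the same assumptions as above (blob_$i.png is my own naming):
i=0
convert shapes.png -colorspace gray -negate -threshold 10% \
-define connected-components:verbose=true \
-connected-components 8 null: | awk 'NR>2{print $2}' |
while read bbox; do
    # crop each blob to its own exact-sized file
    convert shapes.png -crop "$bbox" +repage blob_$i.png
    i=$((i+1))
done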

Adding white line between text lines

I am trying to do OCR using Tesseract and the overall results seem acceptable. The images are very, very long receipts, and since we are scanning them with a scanner the quality is good. The only issue is that in the receipts a few characters are joined across two lines.
Please see the attached sample image. You can see that the character 'p' in the first line and the character 'M' in the second line are joined. This is causing problems in the OCR.
So the real question is: can we add a white line or gap between every pair of text lines?
You can do that for this image in ImageMagick by trimming the image to remove the surrounding white and adding the same amount of black. Then average that image down to one column and look for the brightest row. I start and stop 4 pixels from the top and bottom to avoid any really bright rows in those regions. Once I find the brightest row, I splice in 4 rows of white between the top and bottom regions divided by that row. This is not the most elegant way, but it shows the potential. One could likely pipe the list of row values to AWK and search for the max value more efficiently than saving to an array and using a for loop (see the sketch at the end of this answer). Unix syntax with ImageMagick.
Input:
max=0
row=0
arr=()
arr=(`convert text.png -fuzz 50% -trim -background black -flatten -colorspace gray -scale 1x! -depth 8 txt:- | tail -n +2 | sed -n 's/^.*gray[(]\(.*\)[)]$/\1/p'`)
num=${#arr[*]}
#echo "${arr[*]}"
for ((i=4; i<num-4; i++)); do
    val="${arr[$i]}"
    max=`convert xc: -format "%[fx:$val>$max?$val:$max]" info:`
    row=`convert xc: -format "%[fx:$val==$max?$i:$row]" info:`
    #echo "$i $val $max $row"
done
convert text.png -gravity north -splice 0x4+0+$row text2.png
If you want less space, you can change to -splice 0x1+0+$row, but it won't change much. It is not writing over your image, but inserting white between the existing rows.
But by doing the processing above, your OCR still may not recognize the p or M, since the bottom of the p is cut off and appended to the M.
If you have more than two lines of text, you will have to search the column for approximately evenly spaced maxima.
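For completeness, here is a sketch of the AWK alternative mentioned above for this two-line case: it stores the column of row values and finds the brightest row in one pass (still skipping 4 rows at the top and bottom), instead of calling convert inside a shell loop. The pipeline and file names match the script above; the awk bookkeeping is my own:
row=`convert text.png -fuzz 50% -trim -background black -flatten \
-colorspace gray -scale 1x! -depth 8 txt:- | \
sed -n 's/^.*gray[(]\(.*\)[)]$/\1/p' | \
awk '{v[NR]=$1} END {max=0; row=0; for (i=5; i<=NR-4; i++) if (v[i]+0>max) {max=v[i]; row=i-1}; print row}'`
convert text.png -gravity north -splice 0x4+0+$row text2.png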

Skin probability of a given pixel of a image [closed]

Given the RGB values of all pixels of an image, how can we find the probability that a given pixel is of skin color, and what percentage of the image is of skin color?
Noodling around on Google tells me that Caucasian skin tones often, or maybe generally, or maybe sometimes conform to the following sort of rule:
Blue channel: 140-180
Green channel: Blue * 1.15
Red channel: Blue * 1.5
So, with that in mind, I made some colour swatches that correspond to that with ImageMagick, using this command line:
#!/bin/bash
for b in $(seq 140 5 180); do
    g=$(echo "$b * 1.15/1" | bc)
    r=$(echo "$b * 1.5/1" | bc)
    convert -label "R:$r,G:$g,B:$b" -size 200x200 xc:"rgb($r,$g,$b)" miff:-
done | montage - -frame 5 -tile 3x swatches.png
And got this:
Ok, those look kind of reasonable; now I try to use them to detect skin tones, again with ImageMagick. For the moment, and just so you can see it, I am going to colour lime-green everything I detect as a skin tone, using this colour, which is right in the middle of the tonal range identified above:
convert -fuzz 5% face1.jpg -fill lime -opaque "rgb(240,184,160)" out.jpg
Mmmm, not very good. Increase the fuzziness maybe?
Mmmm, still pretty rubbish - picking up only part of the skin and some of the white shirt collar. Different face maybe?
Ok, not bad at detecting him, although notice that it completely fails to detect the right side of his face. There are still a few problems, however, as we can see from the pink Cadillac:
and Miss Piggy below...
Maybe we can be a bit more targeted in our search, and, though I can't draw it in 3-D, I can explain it in 2-D. Instead of targeting a single large circle (actually a sphere in 3-D colour space) in the middle of our range, maybe we could target some smaller circles spread along our range and thereby include fewer extraneous colours... the magenta represents the degree of fuzz. So rather than this:
we could do this:
using this command:
convert -fuzz 13% face1.jpg -fill lime \
-opaque "rgb(219,168,146)" \
-opaque "rgb(219,168,146)" \
-opaque "rgb(255,198,172)" out.jpg
So, you can see it is pretty hard to find skin-tones just by using RGB values and I haven't even started to address different races, different lighting etc.
Another approach may be to use a different colourspace, such as HSL - Hue, Saturation and Lightness. We are not so interested in Lightness because that is just a function of exposure, so we look for hues that match those of skin, with some degree of saturation to avoid washed-out colours. You can do that with ImageMagick like this:
#!/bin/bash
convert face1.jpg -colorspace hsl -separate \
\( -clone 0 -threshold 7% -negate +write h.png \) \
\( -clone 1 -threshold 30% +write s.png \) \
-delete 0-2 -evaluate-sequence min out.png
That says this... take the image face1.jpg and convert it to HSL colorspace, then separate the channels so we now have 3 images in our stack. Image 0 is the Hue, image 1 is the Saturation and image 2 is the Lightness. Next line. Take the Hue layer and threshold it at 7%, which means pinky-reds, invert it and save it (just so you can see it) as h.png. Next line. Take the Saturation layer, and say "any saturation over 30% is good enough for me", then save as file s.png. Next line. Delete the 3 original channels (H, S & L) from the stack, leaving just the thresholded Hue and thresholded Saturation layers. Now put these on top of each other and choose whichever is the minimum and save that. The point is that either the Hue or the Saturation layer can be used to gate which pixels are selected.
Here are the files, first the Hue (h.png):
next the Saturation (s.png):
and now the combined output file.
Once you have got your algorithm sorted out for deciding which pixels are skin coloured, you will need to count them to work out the percentages you seek. That is pretty easy... all we do is change everything that is not lime-green to black (so it counts for zero in the averaging) and then resize the image to a single pixel and get its colour as text:
convert -fuzz 13% face1.jpg -fill lime \
-opaque "rgb(219,168,146)" \
-opaque "rgb(219,168,146)" \
-opaque "rgb(255,198,172)" \
-fill black +opaque lime -resize 1x1! txt:
# ImageMagick pixel enumeration: 1,1,255,srgb
0,0: (0,92,0) #005C00 srgb(0,92,0)
We can see there is, not surprisingly, no red and no blue, and the average value of the green channel is 92/255, so 36% of the pixels match our description of skin tones.
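If you would rather not read the 1x1 txt: output by eye, an fx escape can print the percentage directly. A sketch using the same lime-masking command as above (the mean of the green channel, scaled to a percentage, equals the fraction of lime pixels):
convert -fuzz 13% face1.jpg -fill lime \
-opaque "rgb(219,168,146)" \
-opaque "rgb(255,198,172)" \
-fill black +opaque lime \
-format "%[fx:100*mean.g]" info:
# prints the percentage of matching pixels (roughly 36 for the face above)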
If you want to get more sophisticated you may have to look at shapes, textures and contexts, or train a skin classifier and write a whole bunch of stuff in OpenCV or somesuch...
