I need to detect the size of a square on a grid like this one (all squares are supposed to be equal):
http://imgur.com/VZAimWS
I can think of many different strategies to find the length of a side of a square, but I would expect there is a particular technique that is more robust, or considered better in terms of providing a more accurate answer. Any tips? One of the things I was exploring is to identify each parallel line and find the average (or maybe median) distance between the lines.
In case it's relevant, I am planning to use MATLAB or OpenCV for this.
Thank you
I see you want to do this in MATLAB, but you might get some inspiration from my attack on it with ImageMagick, which is installed on most Linux distros, and available for OS X and Windows for free from the ImageMagick website.
Here are the bones of how I proceed - it's just one command in the Terminal - no compilers, no OpenCV, no MATLAB:
convert grid.jpg \
-threshold 80% -negate \
-morphology Thinning:-1 Skeleton \
\( +clone \
-background none \
-fill red -stroke red -strokewidth 2 \
-hough-lines 9x9+150 -write lines.mvg \
\) \
-composite hough.png
That lovely command does these steps:
thresholds the image to black and white at 80%
inverts it
thins it to a skeleton
copies the whole lot and on the copy performs a Hough Line detection colouring the lines in red
overlays the detected lines back onto the original image
The output image is like this:
And the file lines.mvg contains the line coordinates on which the maths is needed...
# Hough line transform: 9x9+150
viewbox 0 0 1777 1449
line 177.944,0 102.005,1449 # 191 <-- shown in yellow below
line 171.848,0 121.248,1449 # 332
line 0,118.401 1777,149.419 # 453
line 0,143 1777,143 # 181
line 0,283.426 1777,314.444 # 431
line 504.586,0 479.293,1449 # 252
line 0,454.452 1777,485.47 # 403
line 0,481 1777,481 # 164
line 0,627.479 1777,658.496 # 309
line 0,649 1777,649 # 233
line 842.637,0 817.345,1449 # 299
line 0,801.505 1777,832.523 # 558
line 0,844.525 1777,813.507 # 167
line 0,973.531 1777,1004.55 # 291
line 0,1013.55 1777,982.533 # 158
line 1180.69,0 1155.4,1449 # 495
line 0,1146.56 1777,1177.58 # 396
line 0,1182.58 1777,1151.56 # 350
line 0,1331 1777,1331 # 320
line 1510.74,0 1485.45,1449 # 539
line 0,1352.6 1777,1321.58 # 277
line 1504,0 1504,1449 # 201
I'll draw the first line from the list above on the image in yellow so you can see how the coordinates work.
convert hough.png -stroke yellow -draw "line 177.944,0 102.005,1449" out.jpg
Now, about that maths... it is going to be hard to measure correctly on a distorted image, because it's... well, distorted. I would think about 2 strategies. Either take the top line and solve for its intersection with the leftmost line, and do the same in the other 3 corners; you can then calculate the two diagonals of the image, estimate the distortion, and amortize it across lots of squares, which may work well. Or you could solve for the intersections of all the lines and apply some type of clustering to eradicate multiply-defined corners... or something simpler altogether.
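If you go the intersection route, the maths is ordinary line-line intersection. Here is a minimal Python sketch (my addition, not part of the original answer) that intersects two lines given in the two-endpoint form that lines.mvg uses:

def intersect(p1, p2, p3, p4):
    # Intersection of line p1-p2 with line p3-p4. Each point is an
    # (x, y) tuple taken straight from a 'line x1,y1 x2,y2' entry
    # in lines.mvg. Returns None for (near-)parallel lines.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None  # parallel, no single intersection
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / d
    y = (a * (y3 - y4) - (y1 - y2) * b) / d
    return (x, y)

# first vertical and first horizontal line from lines.mvg above
print(intersect((177.944, 0), (102.005, 1449),
                (0, 118.401), (1777, 149.419)))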
Let's look at just the vertical lines...
grep -v "line 0" lines.mvg
line 177.944,0 102.005,1449 # 191
line 171.848,0 121.248,1449 # 332
line 504.586,0 479.293,1449 # 252
line 842.637,0 817.345,1449 # 299
line 1180.69,0 1155.4,1449 # 495
line 1510.74,0 1485.45,1449 # 539
line 1504,0 1504,1449 # 201
The first two are the same line, so let's average 177 and 171 to give 174. The last two are also the same line, so if we average 1510 and 1504 we get 1507. So, now the grid spacings are
330px = 504-174
338px = 842-504
338px = 1180-842
327px = 1507-1180
So, I'm going for 338px... ish :-)
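If you want to script that last bit instead of eyeballing it, here is a rough Python sketch (my addition; the 20 px merge tolerance is a guess you may need to tune) that pulls the near-vertical lines out of lines.mvg, merges duplicate detections, and reports the gaps and their median:

import re
import statistics

# x positions (at y=0) of the near-vertical lines in lines.mvg
xs = []
for row in open("lines.mvg"):
    m = re.match(r"line ([\d.]+),0 [\d.]+,\d+", row)
    if m:
        xs.append(float(m.group(1)))

# merge detections closer than 20 px: duplicate Hough hits of one grid line
xs.sort()
groups = [[xs[0]]]
for x in xs[1:]:
    if x - groups[-1][-1] < 20:
        groups[-1].append(x)
    else:
        groups.append([x])
centres = [sum(g) / len(g) for g in groups]

gaps = [b - a for a, b in zip(centres, centres[1:])]
print(centres)
print(gaps, "median:", statistics.median(gaps))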
Well, what I would try is:
Apply a threshold and you will get cleaner edges of the squares.
Then run the HoughLines algorithm.
You will get lines; tune the configuration until all the grid lines are detected.
Calculate the points where each line crosses another line and you will have the vertices of each square (see the sketch after this list).
Use a little math! :)
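Since the question mentions OpenCV, here is a hedged Python/OpenCV sketch of those steps (an untested outline; the Hough vote threshold of 200 and the other parameters are guesses to tune for your image):

import cv2
import numpy as np

img = cv2.imread("grid.jpg", cv2.IMREAD_GRAYSCALE)

# 1. threshold (Otsu picks the level) to get clean square edges
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 2. Hough line detection; the vote threshold of 200 is a guess,
#    tune it until every grid line is found (lines is None if none are)
lines = cv2.HoughLines(bw, 1, np.pi / 180, 200)

# 3. split into near-vertical and near-horizontal by angle, then
#    intersect every horizontal with every vertical to get the vertices
horiz, vert = [], []
for rho, theta in lines[:, 0]:
    (vert if abs(np.sin(theta)) < 0.5 else horiz).append((rho, theta))

def intersection(l1, l2):
    # solve [cos t, sin t] . [x, y] = rho for the two lines at once
    a = np.array([[np.cos(l1[1]), np.sin(l1[1])],
                  [np.cos(l2[1]), np.sin(l2[1])]])
    b = np.array([l1[0], l2[0]])
    return np.linalg.solve(a, b)

vertices = [intersection(h, v) for h in horiz for v in vert]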
Related
I'm trying to identify the coordinates of the rectangles that are fully formed. I mean, those that have 4 sides with a white border line making a full box.
This is the input image I have.
In the below image I show, in yellow, the rectangles for which I'd like to get the coordinates.
In this input image there are 3 black rectangles with a white border line and 1 rectangle that is all white.
My current convert code gives coordinates of all areas, including those white areas that generate noise for my purpose.
convert input.png \
-define connected-components:verbose=true \
-define connected-components:area-threshold=100 \
-connected-components 8 -auto-level null: | grep "255,255,255"
7602: 233x81+295+192 411.0,232.0 18873 srgb(255,255,255)
31: 356x70+365+28 542.4,57.2 4602 srgb(255,255,255)
7604: 538x510+45+273 163.1,529.1 4394 srgb(255,255,255)
7605: 292x470+627+273 809.5,494.2 2116 srgb(255,255,255)
1393: 149x45+785+40 860.8,60.5 2040 srgb(255,255,255)
8449: 513x125+70+658 326.0,708.6 761 srgb(255,255,255)
7015: 43x27+291+110 312.5,122.1 620 srgb(255,255,255)
7599: 84x43+676+148 717.5,169.0 250 srgb(255,255,255)
So, my question is: is there a way to identify from the output given by convert command, which coordinates belong to rectangles fully formed? Thanks
A couple of ideas spring to mind. I haven't developed them into full solutions but may do so if time permits later.
You could maybe choose the centre of each connected component in your list as the seed point for a flood-fill with, say, yellow, and then make everything that is not yellow black (with -fill black +opaque yellow) and run connected components again to see if you get a filled shape of the correct area. So, for example, choosing your 4th output line:
7604: 538x510+45+273 163.1,529.1 4394 srgb(255,255,255)
And flood filling from the centre:
magick outlines.png -fill yellow -floodfill +314+478 black result.png
Or maybe go a little further:
magick outlines.png -fill yellow -floodfill +314+478 black -fill black +opaque yellow result.png
Then run another connected components analysis and see if you get a fully yellow-filled shape detected.
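If you want to run that check over every blob, you could drive ImageMagick from a small script. A speculative Python sketch (it assumes the IM7 magick binary used above is on your PATH; the file names are just placeholders, and the area check itself is left as an exercise):

import subprocess

def flood_check(image, cx, cy):
    # Flood-fill yellow from (cx, cy), blacken everything that is not
    # yellow, then re-run connected components so the output can be
    # searched for a solid yellow blob of roughly the expected area.
    subprocess.run(["magick", image,
                    "-fill", "yellow", "-floodfill", f"+{cx}+{cy}", "black",
                    "-fill", "black", "+opaque", "yellow",
                    "filled.png"], check=True)  # placeholder file name
    result = subprocess.run(["magick", "filled.png",
                             "-define", "connected-components:verbose=true",
                             "-connected-components", "4", "null:"],
                            capture_output=True, text=True, check=True)
    return result.stdout

print(flood_check("outlines.png", 314, 478))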
You could maybe run a Hit-or-Miss morphology, looking for line ends and follow them back to T-junctions and erase them to get rid of the "overshoot" lines that stick out beyond the ends of your rectangles.
By the way, if you are looking specifically for rectangles, you will probably be better off checking for 4-connected components rather than 8-connected as at present.
How do you do the equivalent of this step in Photoshop?
https://gyazo.com/180a507c0f3c9b342fe33ce218cd512e
Suppose there are two contiguous objects in an image, and you want to create exact-sized crops around each one and output them as two files. (Generalize to N files.)
You can do that with "Connected Component Analysis" to find the contiguous blobs.
Start Image
convert shapes.png -colorspace gray -negate -threshold 10% \
-define connected-components:verbose=true \
-connected-components 8 -normalize output.png
Sample Output
Objects (id: bounding-box centroid area mean-color):
0: 416x310+0+0 212.3,145.2 76702 srgb(0,0,0)
1: 141x215+20+31 90.0,146.2 26129 srgb(255,255,255)
2: 141x215+241+75 311.0,190.2 26129 srgb(255,255,255)
Notice how each blob, or contiguous object, is "labelled" or identified with its own unique colour (shade of grey).
So there is a header line telling you what the fields are, followed by 3 blobs, i.e. one per line of output. The first line is the entire image and not much use. The second one is 141 px wide and 215 px tall, starting at +20+31 from the top-left corner. The third one is the same size (because I copied the shape) and starts at +241+75 from the top-left corner.
Now stroke red around the final indicated rectangle - bearing in mind that rectangle takes top-left and bottom-right corners rather than top-left corner plus width and height.
convert shapes.png -stroke red -fill none -draw "rectangle 241,75 382,290" z.png
And crop it:
convert shapes.png -crop 141x215+241+75 z.png
And here is the extracted part:
If you want to generalise, you can just pipe the ImageMagick output into awk and pick out the geometry field:
convert shapes.png -colorspace gray -negate -threshold 10% -define connected-components:verbose=true -connected-components 8 -normalize output.png | awk 'NR>2{print $2}'
Sample Output
141x215+20+31
141x215+241+75
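And if you want to go all the way to N cropped files, the same listing can feed a loop. A minimal Python sketch of that idea (my addition; note that some ImageMagick versions print the verbose listing on stderr rather than stdout):

import subprocess

# run the connected-components pass and capture the verbose listing
cca = subprocess.run(
    ["convert", "shapes.png", "-colorspace", "gray", "-negate",
     "-threshold", "10%",
     "-define", "connected-components:verbose=true",
     "-connected-components", "8", "-normalize", "output.png"],
    capture_output=True, text=True, check=True)

# skip the header line and the whole-image blob (the awk 'NR>2' above);
# read cca.stderr instead if your ImageMagick logs the listing there
for i, line in enumerate(cca.stdout.splitlines()[2:]):
    geometry = line.split()[1]  # e.g. '141x215+20+31'
    subprocess.run(["convert", "shapes.png",
                    "-crop", geometry, f"blob_{i}.png"], check=True)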
The task is to take an image of a document, and leverage straight lines surrounding different 'sections' in order to split up the image into different documents for further parsing. Size of the different 'sections' is completely variable from page to page (we're dealing with several thousand pages). Here is an image of what one of these images looks like:
Example of how the documents are laid out:
Image analysis/manipulation is completely new to me. So far I've attempted to use scikit-image edge detection algorithms to find the 'boxes', with hopes of using those 'coordinates' to cut the image. However, the two algorithms I've tried (Canny, Hough) pick up lines of text as 'edges' at high sensitivity, and don't pick up the lines I want at low sensitivity. I could write something custom and low-level to detect the boxes myself, but I have to assume this is a solved problem.
Is my approach headed in the right direction? Thank you!
You don't seem to be getting any OpenCV answers, so I had a try with ImageMagick, just in the Terminal at the command-line. ImageMagick is installed on most Linux distros and is available for macOS and Windows for free. The technique is pretty readily adaptable to OpenCV so you can port it across if it works for you.
My first step was to do a 5x5 box filter and threshold at 80% to get rid of noise and scanning artefacts, and then invert (probably because I was planning on using morphology, but didn't in the end).
convert news.jpg -depth 16 -statistic mean 5x5 -threshold 80% -negate z.png
I then ran that through "Connected Components Analysis" and discarded all blobs with too small an area (under 2000 pixels):
convert news.jpg -depth 16 -statistic mean 5x5 -threshold 80% -negate \
-define connected-components:verbose=true \
-define connected-components:area-threshold=2000 \
-connected-components 4 -auto-level output.png
Output
Objects (id: bounding-box centroid area mean-color):
110: 1254x723+59+174 686.3,536.0 901824 srgb(0,0,0)
2328: 935x723+59+910 526.0,1271.0 676005 srgb(0,0,0)
0: 1370x1692+0+0 685.2,712.7 399651 srgb(0,0,0)
2329: 303x722+1007+911 1158.0,1271.5 218766 srgb(0,0,0)
25: 1262x40+54+121 685.2,140.5 49820 srgb(255,255,255)
109: 1265x735+54+168 708.3,535.0 20601 srgb(255,255,255)
1: 1274x64+48+48 675.9,54.5 16825 srgb(255,255,255)
2326: 945x733+54+905 526.0,1271.0 16660 srgb(255,255,255)
2327: 312x732+1003+906 1169.9,1271.5 9606 srgb(255,255,255) <--- THIS ONE
421: 403x15+328+342 528.6,350.1 4816 srgb(255,255,255)
7: 141x23+614+74 685.5,85.2 2831 srgb(255,255,255)
The fields are labelled in the first line, but the interesting ones are the second (blob geometry) and the fourth (blob area). As you can see, there are 11 lines, so it has found 11 blobs in the image. The second field, AxB+C+D, means a rectangle A pixels wide by B pixels tall, with its top-left corner C pixels from the left edge of the image and D pixels down from the top.
Let's look at the one I have marked with an arrow, which starts 2327: 312x732+1003+906 and draw a rectangle over that one:
convert news.jpg -fill "rgba(255,0,0,0.5)" -draw "rectangle 1003,906 1315,1638" oneArticle.png
If you want to crop that article out into a new image:
convert news.jpg -crop 312x732+1003+906 article.jpg
If we draw in all the other boxes, we get:
Preview:
I have done a Hough line detection using the code below:
convert image.jpg -threshold 90% -canny 0x1+10%+30% \
\( +clone -background none \
-fill red -stroke red -strokewidth 2 \
-hough-lines 5x5+80 -write lines.mvg \
\) -composite hough.png
And I wrote the details of the lines to a .mvg file. The .mvg file contents are as shown below:
# Hough line transform: 5x5+80
viewbox 0 0 640 360
line 448.256,0 473.43,360 # 104
line 0,74.5652 640,29.8121 # 158
line 0,289.088 640,244.335 # 156
line 0,292.095 640,247.342 # 133
line 154.541,0 179.714,360 # 125
line 151.533,0 176.707,360 # 145
And check the output hough.png file here.
Problem:
What do #104, #158, #156... stand for? I guess they are line numbers; if so, why are they numbered in such a way?
Also, I would like to know how the coordinates have been assigned.
It would be really helpful if I could get an explanation of the contents of the .mvg file.
The # <number> is the maxima value: effectively the accumulator count for that line (set by line_count in the source), which is in turn influenced by the threshold you specified. The number will decrease if the matrix element count is greater than the previous height/width iteration. So... if you give it a threshold of -hough-lines 5x5+80, then line 448.256,0 473.43,360 # 104 was found about 24 counts past the threshold. Once the maxima drops below the 80 threshold, we stop comparing the matrix elements.
Also, I would like to know how the coordinates have been assigned.
I can only answer this by pseudo-quoting the source code, but it's basic trigonometry.
if ((x >= 45) && (x <= 135)) {
    y = (r - x*cos(t)) / sin(t);
} else {
    x = (r - y*sin(t)) / cos(t);
}
where r is the distance, derived from the accumulator's row index (y minus half the matrix height)
where t is the angle in degrees, derived from the accumulator's column index (x)
Find out more in the HoughLineImage method located in feature.c
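To make the geometry concrete, here is a small Python sketch (my reconstruction from the pseudocode above, not a quote of feature.c) that converts a Hough accumulator cell, i.e. a distance r and an angle t in degrees, into the two-endpoint 'line x1,y1 x2,y2' form written to the .mvg file:

import math

def hough_cell_to_line(r, t_degrees, width, height):
    # Convert a Hough (distance, angle) pair into the endpoint form
    # 'line x1,y1 x2,y2' used in the .mvg file (viewbox width x height).
    t = math.radians(t_degrees)
    if 45 <= t_degrees <= 135:
        # mostly-horizontal line: anchor the endpoints at x=0 and x=width
        y1 = (r - 0 * math.cos(t)) / math.sin(t)
        y2 = (r - width * math.cos(t)) / math.sin(t)
        return (0, y1), (width, y2)
    # mostly-vertical line: anchor the endpoints at y=0 and y=height
    x1 = (r - 0 * math.sin(t)) / math.cos(t)
    x2 = (r - height * math.sin(t)) / math.cos(t)
    return (x1, 0), (x2, height)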
For a project I am trying to create a perspective distortion of an image to match a DVD case front template. So I want to automate this using ImageMagick (CLI) but I have a hard time understanding the mathematical aspects of this transformation.
convert \
-verbose mw2.png \
-alpha set \
-virtual-pixel transparent \
-distort Perspective-Projection '0,0 0,0 0,0 0,0' \
box.png
This code has an empty set of coordinates. I have read the documentation thoroughly, but I can't seem to understand which parameter represents which point. The documentation gives me variables and names where I have no clue what they actually mean (more useful for a mathematical mastermind, maybe). So could someone explain this subject to me (visually preferred, or via a link to useful information), because I have no clue what I am doing. Just playing around with the parameters won't do for this job; I need to calculate these points.
Here you will find an easy image of what I am trying to achieve (with CLI tools):
Update:
convert \
-virtual-pixel transparent \
-size 159x92 \
-verbose \
cd_empty.png \
\(mw2.png -distort Perspective '7,40 4,30 4,124 4,123 85,122 100,123 85,2 100,30'\) \
-geometry +3+20 \
-composite cover-after.png
Gives me as output:
cd_empty.png PNG 92x159 92x159+0+0 8-bit sRGB 16.1KB 0.000u 0:00.000
convert: unable to open image `(mw2.png': No such file or directory # error/blob.c/OpenBlob/2641.
convert: unable to open file `(mw2.png' # error/png.c/ReadPNGImage/3741.
convert: invalid argument for option Perspective : 'require at least 4 CPs' # error/distort.c/GenerateCoefficients/807.
convert: no images defined `cover-after.png' # error/convert.c/ConvertImageCommand/3044.
Correction by Kurt Pfeifle:
The command has a syntax error, because it does not surround the \( and \) delimiters by (at least one) blank on each side as required by ImageMagick!
Since there are no links to the source images provided, I cannot test the outcome of this corrected command:
convert \
-virtual-pixel transparent \
-size 159x92 \
-verbose \
cd_empty.png \
\( \
mw2.png -distort Perspective '7,40 4,30 4,124 4,123 85,122 100,123 85,2 100,30' \
\) \
-geometry +3+20 \
-composite \
cover-after.png
Did you see this very detailed explanation of ImageMagick's distortion algorithms? It comes with quite a few illustrations as well.
From looking at your example image, my guess is that you'll get there using a Four Point Distortion Method.
Of course, the example you gave with the 0,0 0,0 0,0 0,0 parameter does not do what you want.
Many of the distortion methods available in ImageMagick work like this:
The method uses a set of pairs of control points.
The values are numbers (may be floating point, not only integer).
Each pair of values represents a pixel coordinate.
Each set of four values represents a source image coordinate, followed immediately by the corresponding destination image coordinate.
The coordinates of each source image control point are transferred exactly to the respective destination image control point, as given by the respective parameters.
All the other pixels' coordinates are transferred according to the distortion method given.
Example:
Sx1,Sy1 Dx1,Dy1
Sx2,Sy2 Dx2,Dy2
Sx3,Sy3 Dx3,Dy3
...
Sxn,Syn Dxn,Dyn
x is used to represent an X coordinate.
y is used to represent a Y coordinate.
1, 2, 3, ... n is used to represent the 1st, 2nd, 3rd, ... nth pixel.
S is used here for the source pixel.
D is used here for the destination pixel.
First: method -distort perspective
The distortion method perspective will make sure that straight lines in the source image will remain straight lines in the destination image. Other methods, like barrel or bilinearforward do not: they will distort straight lines into curves.
The -distort perspective requires a set of at least 4 pre-calculated pairs of pixel coordinates (where the last one may be zero). More than 4 pairs of pixel coordinates provide for more accurate distortions. So if you used for example:
-distort perspective '1,2 3,4 5,6 7,8 9,10 11,12 13,14 15,16'
(with more {optional} blanks between the mapping pairs than required, for readability), it would mean:
From the source image take pixel at coordinate (1,2) and paint it at coordinate (3,4) in the destination image.
From the source image take pixel at coordinate (5,6) and paint it at coordinate (7,8) in the destination image.
From the source image take pixel at coordinate (9,10) and paint it at coordinate (11,12) in the destination image.
From the source image take pixel at coordinate (13,14) and paint it at coordinate (15,16) in the destination image.
You may have seen photo images where the vertical lines (like the corners of building walls) do not look vertical at all (due to some tilting of the camera when taking the snap). The method -distort perspective can rectify this.
It can even achieve things like this, 'straightening' or 'rectifying' one face of a building that appears in the 'correct' perspective of the original photo:
==>
The control points used for this distortion are indicated by the corners of the red (source controls) and blue rectangles (destination controls) drawn over the original image:
==>
This particular distortion used
-distort perspective '7,40 4,30 4,124 4,123 85,122 100,123 85,2 100,30'
Complete command for your copy'n'paste pleasure:
convert \
-verbose \
http://i.stack.imgur.com/SN7sm.jpg \
-matte \
-virtual-pixel transparent \
-distort perspective '7,40 4,30 4,124 4,123 85,122 100,123 85,2 100,30' \
output.png
Second: method -distort perspective-projection
The method -distort perspective-projection is derived from the more easily understood perspective method. It achieves exactly the same distortion result as -distort perspective does, but instead of (at least) 4 pairs of coordinate values (at least 16 numbers) it takes 8 floating point coefficients as its parameter.
It uses...
A set of exactly 8 pre-calculated coefficients;
Each of these coefficients is a floating point value;
These 8 values represent a matrix of the form
sx ry tx
rx sy ty
px py 1
which is used to calculate the destination pixels from the source pixels according to this formula:
X-of-destination = (sx*xs + ry*ys + tx) / (px*xs + py*ys + 1)
Y-of-destination = (rx*xs + sy*ys + ty) / (px*xs + py*ys + 1)
To avoid the (more difficult) calculation of the 8 required coefficients for a re-usable -distort perspective-projection method, you can...
FIRST, (more easily) calculate the coordinates for a -distort perspective ,
SECOND, run this -distort perspective with a -verbose parameter added,
LAST, read the 8 coefficients from the output printed to stderr.
The (above quoted) complete command example would spit out this info:
Perspective Projection:
-distort PerspectiveProjection \
'1.945622, 0.071451, -12.187838, 0.799032,
1.276214, -24.470275, 0.006258, 0.000715'
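If you would rather compute those 8 coefficients yourself than scrape them from the -verbose output, the four control-point pairs define an 8x8 linear system. A sketch using Python/numpy (my addition; it assumes the sx, ry, tx, rx, sy, ty, px, py ordering shown above, and with exactly 4 pairs the solution is exact):

import numpy as np

# the four source -> destination control-point pairs used above
pairs = [((7, 40), (4, 30)), ((4, 124), (4, 123)),
         ((85, 122), (100, 123)), ((85, 2), (100, 30))]

# each pair gives two equations, obtained by multiplying out
#   xd = (sx*xs + ry*ys + tx) / (px*xs + py*ys + 1)
#   yd = (rx*xs + sy*ys + ty) / (px*xs + py*ys + 1)
A, b = [], []
for (xs, ys), (xd, yd) in pairs:
    A.append([xs, ys, 1, 0, 0, 0, -xs * xd, -ys * xd]); b.append(xd)
    A.append([0, 0, 0, xs, ys, 1, -xs * yd, -ys * yd]); b.append(yd)

sx, ry, tx, rx, sy, ty, px, py = np.linalg.solve(A, b)
print(sx, ry, tx, rx, sy, ty, px, py)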
Thanks to the ImageMagick Distorting Images documentation, I ended up with this clean, understandable code:
$points = array(
0,0, # Source Top Left
0,0, # Destination Top Left
0,490, # Source Bottom Left
2.2,512, # Destination Bottom Left
490,838, # Source Bottom Right
490,768, # Destination Bottom Right
838,0, # Source Top Right
838,50 # Destination Top Right
);
$imagick->distortImage(Imagick::DISTORTION_PERSPECTIVE, $points, false);
Please keep in mind that each set of coordinates is separated into two parts. The first is the X axis and the second is the Y axis, so when we say 838,0 for the Source Top Right, we mean the X axis of the Source Top Right corner is 838 and its Y axis is zero (0).