I'd like to figure out a method for finding bounding boxes of words or pairs of words in a binary image. The image itself looks like this (the bounding boxes I need are marked by blue rectangles):
The image is free of any other objects. I'm thinking about some form of connected component analysis: detecting single letters first, then "drawing" their bounding boxes onto another Mat so that the boxes of neighbouring letters connect. There is a useful piece of information I'd like to exploit: a word or a pair of words forms a horizontal line, which could be used to separate "Hello there" from "abcdf" - I just don't know how to do it.
Contour the image.
Pick contours with a suitable area and width/height to be letters - get the coordinates of their centers.
From the list of centers, decide how far apart two centers can be while still belonging to adjacent letters rather than to a gap between words.
Group these contours into a word and take their bounding box.
OpenCV has clustering, contour-area and bounding-box functions if you don't want to do it yourself.
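A minimal Python/OpenCV sketch of this idea (max_gap and min_area are illustrative thresholds you would tune to your letter size; assumes letters are white on black and OpenCV 4's findContours signature):
import cv2

def word_boxes(binary, max_gap=12, min_area=10):
    # binary: 0/255 image, letters white on black
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
    boxes.sort(key=lambda b: (b[1], b[0]))  # rough reading order
    words = []
    for x, y, w, h in boxes:
        merged = False
        for i, (wx, wy, ww, wh) in enumerate(words):
            # letters of one word sit on the same horizontal line...
            same_line = abs((y + h / 2.0) - (wy + wh / 2.0)) < max(h, wh) / 2.0
            # ...and their boxes are horizontally close to each other
            close = x - (wx + ww) < max_gap and wx - (x + w) < max_gap
            if same_line and close:
                nx, ny = min(x, wx), min(y, wy)
                words[i] = (nx, ny, max(x + w, wx + ww) - nx,
                            max(y + h, wy + wh) - ny)
                merged = True
                break
        if not merged:
            words.append((x, y, w, h))
    return words
The same_line test is what encodes the horizontal-line property mentioned in the question.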
Do a dilation along the OX (horizontal) axis with a window of size N, where N is approximately 1-2 letter widths; each word then becomes a solid filled "box".
Find contours (see http://docs.opencv.org/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html).
Find the bounding rectangles and correct their widths (subtract approximately one letter width) to compensate for the horizontal enlargement caused by the dilation.
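A rough Python sketch of the above (N, the filename and the threshold value are placeholders to adjust):
import cv2

img = cv2.imread('words.png', cv2.IMREAD_GRAYSCALE)  # placeholder filename
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

N = 15  # approx. 1-2 letter widths
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (N, 1))  # 1 px tall: horizontal dilation only
dilated = cv2.dilate(binary, kernel)

out = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    # undo the horizontal growth the dilation introduced
    x, w = x + N // 2, max(1, w - (N - 1))
    cv2.rectangle(out, (x, y), (x + w - 1, y + h - 1), (255, 0, 0), 1)
cv2.imwrite('word_boxes.png', out)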
Let's say I have a 16x16 black & white bitmap image
Here white pixels indicate empty space and black pixels indicate filled space.
I want to extract all of its contour lines that surround black pixels, including holes and nested contour lines (see the second image).
Let's define a coordinate space for pixels:
top-left pixel -> index (0,0)
top-right pixel -> index (15,0)
bottom-left pixel -> index (0,15)
bottom-right pixel -> index (15,15)
Contour lines also have their own coordinate space:
top-left corner of top-left pixel -> index (0,0)
top-right corner of top-right pixel -> index (16,0)
bottom-left corner of bottom-left pixel -> index (0,16)
bottom-right corner of bottom-right pixel -> index (16,16)
Finally, contour lines are defined as a sequence of points in that coordinate space.
On the second image I marked 3 contours to demonstrate what the desired output should look like.
Path1 (RED): 1(1,0) 2(2,0) 3(2, 3) 4(3,3) 5(0,3) ... 23(4,4) 24(1, 4)
Hole1 of Path1 (BLUE): 1(7,5) 2(7,6) 3(6,6) ... 13(11,6) 14(11,5)
Path2 (RED again): 1(8,6) 2(10,6) 3(10,8) 4(8,8)
...
Note that the order of points in a contour is important. The winding direction of holes is not that important, but we should somehow indicate the "hole" property of such a contour.
I solved this problem using ClipperLib, but it feels like a brute-force approach to me, if we ignore what happens inside ClipperLib.
Here's a brief description of the algorithm.
First, define a 16x16 subject polygon from which we will subtract all white pixels
Scan the image matrix row by row
On each row, extract every contiguous run of white pixels as a rectangular clipping polygon
Perform the polygon clipping by subtracting all collected white rectangles from the initial 16x16 subject polygon
Extract the path data (including holes) from ClipperLib's PolyTree solution
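For reference, that pipeline reconstructed in Python with the pyclipper bindings might look roughly like this (a sketch of the description above, not the original code; img is assumed to be a 2D numpy array with 1 = black/filled):
import numpy as np
import pyclipper

def clip_contours(img):
    h, w = img.shape
    pc = pyclipper.Pyclipper()
    # the full-bitmap subject rectangle, in corner coordinates
    pc.AddPath([(0, 0), (w, 0), (w, h), (0, h)], pyclipper.PT_SUBJECT, True)
    clips = []
    for y in range(h):            # scan row by row
        x = 0
        while x < w:
            if img[y, x] == 0:    # start of a contiguous run of white pixels
                x0 = x
                while x < w and img[y, x] == 0:
                    x += 1
                clips.append([(x0, y), (x, y), (x, y + 1), (x0, y + 1)])
            else:
                x += 1
    if clips:
        pc.AddPaths(clips, pyclipper.PT_CLIP, True)
    # Execute2 returns a PolyTree; each node carries .Contour and .IsHole
    return pc.Execute2(pyclipper.CT_DIFFERENCE,
                       pyclipper.PFT_EVENODD, pyclipper.PFT_EVENODD)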
I'm wondering if there is a better way to solve this problem?
Using ClipperLib seems overkill here, as it addresses general polygons by means of complex intersection detection and topological reconstruction algorithms, whereas your problem is more "predictable".
You can proceed in two steps:
use a standard contouring algorithm, such as used by cv.findContours. (It is an implementation of "Satoshi Suzuki and others. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing, 30(1):32–46, 1985.")
from the contours, which link pixel centers to pixel centers, derive the contours that follow the pixel edges. This can probably be achieved by studying the different configurations of sequences of three pixels along the outline.
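Alternatively, you can skip findContours and build the edge-space contours directly (a different technique than the three-pixel analysis suggested above): emit the four directed edge segments of every black pixel in clockwise order; an edge shared by two black pixels appears once in each direction and cancels, and the surviving edges stitch into closed loops, outer boundaries wound clockwise and holes counter-clockwise. A pure-Python sketch (img is a 2D 0/1 array, 1 = black):
def pixel_edge_contours(img):
    h, w = len(img), len(img[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            # the pixel's four corner-space edges, in clockwise order
            for a, b in (((x, y), (x + 1, y)),
                         ((x + 1, y), (x + 1, y + 1)),
                         ((x + 1, y + 1), (x, y + 1)),
                         ((x, y + 1), (x, y))):
                if (b, a) in edges:
                    edges.discard((b, a))  # interior edge shared by two pixels: cancels
                else:
                    edges.add((a, b))
    # stitch the surviving directed edges into closed loops
    outgoing = {}
    for a, b in edges:
        outgoing.setdefault(a, []).append(b)
    contours = []
    while outgoing:
        start = next(iter(outgoing))
        loop, cur = [start], start
        while True:
            nxt = outgoing[cur].pop()  # at a touching corner this choice is arbitrary
            if not outgoing[cur]:
                del outgoing[cur]
            if nxt == start:
                break
            loop.append(nxt)
            cur = nxt
        contours.append(loop)
    return contours
Holes come out counter-clockwise, which provides the "hole" indication asked for. Note that at a corner where a contour touches itself (like vertices 7 and 23 of the red path) the arbitrary pop above may merge two loops into one self-touching loop.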
You can use boundary tracing algorithms for this. I personally use Moore-Neighbor tracing, because it's intuitive and straightforward to implement. You first find the boundary contours, and then come up with a hole-searching algorithm (you may need to combine it with parts of a scanline fill algorithm). Once you find a hole, you can apply the same boundary tracing algorithm, but in the opposite direction.
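For illustration, a minimal sketch of Moore-Neighbor tracing with Jacob's stopping criterion (this walks boundary pixel centers; mapping the result into the corner coordinate space and the hole search are the separate steps mentioned above):
import numpy as np

# Moore neighborhood in clockwise order (image coords, y down): N, NE, E, SE, S, SW, W, NW
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def moore_trace(img):
    # img: 2D 0/1 numpy array, 1 = filled; returns the outer boundary as (y, x) pixels
    h, w = img.shape
    ys, xs = np.nonzero(img)
    if len(ys) == 0:
        return []
    p = (int(ys[0]), int(xs[0]))  # first foreground pixel in raster order
    b = (p[0], p[1] - 1)          # backtrack: the background cell we "entered" from
    start_state = (p, b)
    contour = [p]
    while True:
        i = OFFSETS.index((b[0] - p[0], b[1] - p[1]))
        for k in range(1, 9):     # scan clockwise, starting just after the backtrack
            dy, dx = OFFSETS[(i + k) % 8]
            c = (p[0] + dy, p[1] + dx)
            if 0 <= c[0] < h and 0 <= c[1] < w and img[c]:
                break
            b = c                 # remember the last background cell examined
        else:
            return contour        # isolated single pixel
        p = c
        if (p, b) == start_state: # Jacob's criterion: start re-entered the same way
            return contour
        contour.append(p)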
You can definitely use libraries like OpenCV to find the contours, but in my experience it may produce degenerate output that is incompatible with other libraries, such as poly2tri, which is used to decompose polygons into triangles.
If we take your input sample image, the red path could be considered self-intersecting (vertices 7 and 23 are touching), which may make polygon decomposition fail. You may need to find a way to detect such cases and treat the parts as separate objects, if that is a problem for you. However, the newest Clipper2 is going to have a triangulation unit that can handle such degenerate input, if you ever need to solve this problem down the road.
I am doing object detection in order to count penguins on a UAV georeferenced dataset, so for practical reasons let's say they appear as dots on the images. After running the object detection model, it returns inferred images with the corresponding bounding boxes for each penguin detected.
I need to extract the coordinates of the center of each bounding box (something like x,y); as the image is georeferenced, I can then convert the bounding-box center coordinates into GPS coordinates.
This picture is a good example. Here, the authors are counting banana plants; after detecting the plants of the same regions in 3 differently-treated pictures of the same area, they see that up to three boxes appear around some of the plants (left). So, in order to count each plant as one despite some of them having up to 3 bboxes, this is what they do (quoted from the original article):
Collect bounding boxes of detection from each ROI tiles.
Calculate centroid of each bounding box.
Add the tile number information on x and y-value of centroids to overlay them on original ROI image.
And this is exactly what I am looking for - step number 3: how to calculate the centroid of each bbox and obtain its x,y coordinates, so that I can transform those coordinates into real ones (the image is georeferenced) and then display each real coordinate on a mosaic.
Thank you very much in advance.
You could use the Intersection over Union (IoU) algorithm to select one of the overlapping boxes, and then use the coordinates of the selected box to plot the output circle or box over each detected object.
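For example, in plain Python (boxes as (x1, y1, x2, y2) corner pairs; the 0.5 overlap threshold is an arbitrary choice):
def iou(a, b):
    # Intersection over Union of two (x1, y1, x2, y2) boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def center(box):
    # centroid of a box: the midpoint of its corners
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def dedupe(boxes, thresh=0.5):
    # keep one box per group of boxes that overlap above `thresh`
    kept = []
    for b in boxes:
        if all(iou(b, k) < thresh for k in kept):
            kept.append(b)
    return kept

# e.g. two detections of one penguin collapse to a single centroid
boxes = [(10, 10, 50, 60), (12, 8, 52, 58), (200, 40, 240, 90)]
print([center(b) for b in dedupe(boxes)])
The resulting image-space centers can then be pushed through your georeferencing transform to get GPS coordinates.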
I have an array of data from a grayscale image, from which I have segmented sets of contiguous points of a certain intensity value.
Currently I am doing a naive bounding box routine where I find the minimum and maximum (x,y) [row, col] points. This obviously does not provide the smallest possible box containing the set of points, as is easily seen by rotating a rectangle so that its longest axis is no longer aligned with a principal axis.
What I wish to do is find the minimum-area oriented bounding box. This seems to be possible using an algorithm known as rotating calipers; however, implementations of this algorithm seem to rely on having a set of vertices to begin with. Some details on this algorithm: https://www.geometrictools.com/Documentation/MinimumAreaRectangle.pdf
My main issue is finding the vertices within the data that I currently have. I believe I need to at least find candidate vertices in order to reduce the number of iterations I am performing, since the number of points is relatively large, and treating interior points as vertices is unnecessary if I can figure out a way to exclude them.
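(As an aside: the convex hull gives exactly those candidate vertices, and OpenCV's minAreaRect already runs rotating calipers over them, so if OpenCV is available the whole step collapses to a few calls; sample points below are for illustration only.)
import cv2
import numpy as np

# (x, y) coordinates of one segmented group (sample values)
pts = np.array([[10, 10], [60, 20], [55, 45], [12, 35], [30, 25]], dtype=np.int32)

hull = cv2.convexHull(pts)    # interior points drop out, leaving only candidate vertices
rect = cv2.minAreaRect(hull)  # rotating calipers: ((cx, cy), (w, h), angle)
box = cv2.boxPoints(rect)     # the four corners of the oriented bounding box
print(rect, box)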
Here is some example data that I am working with:
Here's the segmented scene using the naive algorithm, where it segments out the central objects relatively well due to the objects mostly being aligned with the image axes:
In red, you can see the current bounding boxes that I am drawing, defined by 2 vertices: the top-left and bottom-right corners of each group of points I have found.
The rotation case is where my current approach fails: since I define the bounding box with only two points, anything rotated away from the image axes gets a box that occupies much more area than necessary to encapsulate the points.
Here's an example with rotated objects in the scene:
Here's the current naive segmentation's performance on that scene, which is drawing larger than necessary boxes around the rotated objects:
Ideally the result would be bounding boxes aligned with the longest axis of the points that are being segmented, which is what I am having trouble implementing.
Here's an image roughly showing what I am really looking to accomplish:
You can also notice unnecessary segmentation in the image around the borders, as well as some small segments, which should be removed with some further heuristics that I have yet to develop. I would also be open to suggestions for alternative segmentation algorithms that detect the objects I am interested in more robustly.
I am not sure if this question will be completely clear, therefore I will try my best to clarify if it is not obvious what I am asking.
It's late, but this might still help. Here is what you need to do:
expand the pixels so that small segments connect to larger bodies
find connected bodies
select a sample of pixels from each body
find the MBR ([oriented] minimum bounding rectangle) for each selected set
For the first step you can perform dilation - it's somewhat like DBSCAN clustering. For step 3, you can simply select random pixels from a uniform distribution; obviously, the more pixels you keep, the more accurate the MBR will be. I tested this in MATLAB:
% import image as a matrix of 0s and 1s
oI = ~im2bw(rgb2gray(imread('vSb2r.png'))); % original image
% expand pixels
dI = imdilate(oI,strel('disk',4)); % dilated
% find connected bodies of pixels
CC = bwconncomp(dI);
L = labelmatrix(CC) .* uint8(oI); % labeled
% mark some random pixels
rI = rand(size(oI))<0.3;
sI = L.* uint8(rI) .* uint8(oI); % sampled
% find MBR for a set of connected pixels
for i = 1:CC.NumObjects
    [Y,X] = find(sI == i);
    mbr(i) = getMBR(X, Y); % getMBR: user-supplied helper that returns the oriented MBR
end
You can also remove some unneeded pixels using some more processing and morphological operations:
remove holes
find boundaries
find skeleton
In MATLAB:
I = imfill(I, 'holes');
I = bwmorph(I, 'remove');
I = bwmorph(I, 'skel', Inf); % iterate until convergence to get the full skeleton
I would like to measure the horizontal lengths of multiple ROI. I tried Feret's diameter, but it only gives the longest distance between any two points along the selection boundary. I tried bounding rectangle, but I suppose the rectangles are tilted to obtain the minimum bounding rectangle.
Does anyone have another idea? Clearly, the selection boundaries fit the ROI nicely - so how could I extract that information, i.e. the xy-coordinates of the fits? Thanks in advance.
PS: I did not write ROIs because 'Region of Interests' makes no sense
Actually, I want five external bounding boxes for the "white" pixels in the following binary image. The desired zones are highlighted in red.
To get the 5th bounding box I would dilate or blur the image. However, dilation would merge zone 3 with zones 1 and 2, so I would get a bounding box covering almost the entire image. (If I don't dilate or blur at all, then cv::findContours + cv::boundingRect produce a large number of small rectangles.)
In other words, I want only "big enough" bounding boxes.
It's just a sample pattern. Positions of the zones may vary. Is there a way to solve the problem in a general way?
Dilation is done on a per-pixel basis, without regard for the size of the component to which a pixel belongs.
If you want to apply dilation only to small blobs, then you need to remove big blobs before applying the dilation.
So: extract all contours with findContours, store the contours that are "big enough" in a list, and paint them black in your source image. Then dilate the modified source and extract the remaining contours.
Note that to get the correct size of the bounding box, what you probably want is a morphological closing (dilation followed by the same amount of erosion) instead of dilation only.
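A Python/OpenCV sketch of that recipe ('zones.png', MIN_AREA and the kernel size are placeholders to tune):
import cv2

img = cv2.imread('zones.png', cv2.IMREAD_GRAYSCALE)  # placeholder filename
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

MIN_AREA = 500  # defines "big enough"
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
small = binary.copy()
boxes = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h >= MIN_AREA:
        boxes.append((x, y, w, h))                       # keep the big blob's box...
        cv2.drawContours(small, [c], -1, 0, cv2.FILLED)  # ...and erase it from the image

# closing (dilate + erode) merges the leftover small blobs without inflating their boxes
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
closed = cv2.morphologyEx(small, cv2.MORPH_CLOSE, kernel)
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h >= MIN_AREA:   # ignore leftovers that are still too small
        boxes.append((x, y, w, h))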