Coordinates of bounding box in an image - opencv

I am doing object detection in order to count penguins on a UAV georeferenced dataset, so for practical reasons let's say they appear as dots on the images. After running the object detection model, it returns inferred images with the corresponding bounding boxes for each penguin detected.
I need to extract the coordinates of the center of each bounding box (something like x, y) so that, since the image is georeferenced, I can convert the bounding-box center coordinates into GPS coordinates.
This picture is a good example. Here, the authors are counting banana plants, and after detecting the plants of the same regions in 3 differently-treated pictures of the same area, they see that up to three boxes appear around some of the plants (left). So in order to count each plant as one, despite having some of them up to 3 bboxes, this is what they do (quoted from the original article):
1. Collect bounding boxes of detection from each ROI tiles.
2. Calculate centroid of each bounding box.
3. Add the tile number information on x and y-value of centroids to overlay them on original ROI image.
And this is exactly what I am looking for, step number 3: how to calculate the centroid of each bbox and obtain its x, y coordinates, so that I can then transform those coordinates into real-world ones (the image is georeferenced) and display each real-world coordinate on a mosaic.
Thank you very much in advance.

You could use the Intersection over Union algorithm to select one of the boxes and then use the coordinates of the selected box to plot the output circle or box over detected objects.
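A minimal sketch of that idea in plain Python, assuming the detector returns boxes as (x_min, y_min, x_max, y_max) in pixel coordinates together with a confidence score. The function names, the 0.5 threshold and the geotransform values are made up for illustration; the greedy selection is essentially non-maximum suppression, and the pixel-to-world step assumes a six-coefficient affine geotransform in the GDAL ordering.

```python
def bbox_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def deduplicate(boxes, scores, iou_threshold=0.5):
    """Greedy selection: keep the best-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


def pixel_to_world(x, y, gt):
    """Apply a six-coefficient affine geotransform (GDAL ordering assumed)."""
    world_x = gt[0] + x * gt[1] + y * gt[2]
    world_y = gt[3] + x * gt[4] + y * gt[5]
    return world_x, world_y


# Two overlapping detections of the same object plus one separate detection.
boxes = [(10, 10, 20, 20), (11, 11, 21, 21), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
unique = deduplicate(boxes, scores)
centers = [bbox_center(b) for b in unique]

# Hypothetical geotransform: origin (500000, 4000000), 0.5 m pixels, north-up.
geotransform = (500000.0, 0.5, 0.0, 4000000.0, 0.0, -0.5)
world = [pixel_to_world(cx, cy, geotransform) for cx, cy in centers]
```

If the detections come from tiles rather than the full image, the tile's pixel offset has to be added to each center before applying the geotransform.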

Related

Easiest/most robust to detect shape for OpenCV for Intersection over Union of two objects

I am trying to measure the precision of my marker tracking algorithm via post-processing a video.
My algorithm is: find a printed planar marker in a video stream and place a virtual marker at that position. I am working with AR.
Here are two frames of such a video:
Virtual Marker on top of detected marker
Virtual Marker with offset to actual marker
I want to calculate the Intersection over Union / Jaccard Index of the actual marker and the virtual marker. For the first picture it would give me ~98% and for the second ~1/5th %. This will tell me the quality of my algorithm, i.e. how precisely and how well it works.
I want to get the position and rotation of both markers in each frame with OpenCV and calculate the Jaccard Index. As you can see though, if I directly place a virtual marker on top of the paper marker, I will make it difficult for myself (with OpenCV) to detect them.
My idea is to not place a white marker on top of the actual marker, but place an easily detectable "thing" with a specific color or shape with an offset to the marker, let's say 10cm to the right maybe. Then I subtract the offset. So now, at the best case scenario, the position and rotation of the actual marker and the "thing" with the offset subtracted will be the same.
But what should I use as the easily detectable "thing"? I don't have enough experience with OpenCV to know what (colored?) shape I should use. The augmentation can go in front, behind, left, right... of the actual marker anytime during the video and it should do two things:
Not hinder the detection of the actual marker, like currently shown in the pictures
Be easily detectable itself
Help would be much appreciated!
Assuming you have enough white background around the visual marker:
You could use colored circles, for example in red, green, blue and black.
Use opencv blob detection [1] to detect all blobs and filter for circular ones:
Look-up average color values for detected blobs and filter for the colors of the circles.
Alternatively you could filter the whole image for each color and do blob detection on the filtered images. But this is slower.
Find the centroids (~ center point) of each blob using moments of the blob contours. [2] "Center of multiple blobs in an Image".
Now you have the four pixel positions of your circles. If you know the world coordinates of your light projected circles you can use solvePnP to get a pose from this.
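For the centroid step, with OpenCV you would typically call cv2.moments on each contour and take m10/m00 and m01/m00. The sketch below computes the same area-based centroid in pure Python for a contour given as a list of (x, y) vertices, so it can be tried without OpenCV installed. The function name and the sample square are only illustrative.

```python
def contour_centroid(points):
    """Centroid of a closed polygon [(x, y), ...] from its area moments."""
    area2 = 0.0   # twice the signed area (2 * m00)
    sum_x = 0.0   # six times the first moment in x (6 * m10)
    sum_y = 0.0   # six times the first moment in y (6 * m01)
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate contour with zero area")
    # m10 / m00 = (sum_x / 6) / (area2 / 2) = sum_x / (3 * area2)
    return (sum_x / (3.0 * area2), sum_y / (3.0 * area2))


# A 4x4 square whose center is at (4, 4).
square = [(2, 2), (6, 2), (6, 6), (2, 6)]
cx, cy = contour_centroid(square)
```

The result does not depend on whether the contour is listed clockwise or counter-clockwise, because the sign of the area cancels out.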
Knowing the correct world coordinates is tricky in your case because you project the circle with light on a surface. This involves some 3D geometry. You need to know the transformation from camera coordinate system to pattern projector coordinate system and the projection parameters of your projector.
I guess you send the projected pattern as an image to the projector. I think you can then model the projector as a camera with a certain camera matrix (basically field of view & center point). Naturally you know the pixel coordinates of the projected circles. From this you can compute rays in 3D space (in projector coordinate system). As a starting point see [3]. Intersecting [4] them with the correct surface plane (in projector coordinate system) gives you the 3D coordinates of the projected circle pattern in projector coordinate system. Transform these to camera coordinate system using your known transformation. Now use opencv solvePnP to determine pose of projected light marker.
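The ray computation and the ray/plane intersection can be sketched as follows, assuming an ideal pinhole model without lens distortion. The intrinsics (fx, fy, cx, cy) and the plane used here are made-up numbers, and the function names are not from any library.

```python
def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Direction of the ray through pixel (u, v) for a pinhole projector/camera."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def intersect_ray_plane(origin, direction, plane_point, plane_normal):
    """Point where the ray origin + t * direction (t >= 0) meets the plane.

    Returns None if the ray is parallel to the plane or points away from it.
    """
    denom = dot(plane_normal, direction)
    if abs(denom) < 1e-12:
        return None
    diff = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(plane_normal, diff) / denom
    if t < 0:
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical projector intrinsics and a surface plane 2 units in front of it.
ray = pixel_to_ray(400, 300, fx=800.0, fy=800.0, cx=320.0, cy=240.0)
hit = intersect_ray_plane(
    origin=(0.0, 0.0, 0.0),
    direction=ray,
    plane_point=(0.0, 0.0, 2.0),
    plane_normal=(0.0, 0.0, 1.0),
)
```

The resulting points are still in the projector coordinate system; they need the projector-to-camera transformation applied before being passed to solvePnP.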
How to get surface plane?
If your setup is static you could run visual marker detection on all recorded images and use the mean or median of the marker pose as the surface plane. Not sure what this implies for your evaluation, though.
[1] https://www.learnopencv.com/blob-detection-using-opencv-python-c/
[2] https://www.learnopencv.com/find-center-of-blob-centroid-using-opencv-cpp-python/
[3] https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
[4] https://www.cs.princeton.edu/courses/archive/fall00/cs426/lectures/raycast/sld017.htm

How to get back the co-ordinate points corresponding to the intensity points obtained from a faster r-cnn object detection process?

As a result of the Faster R-CNN method of object detection, I have obtained a set of boxes of intensity values corresponding to the regions containing the objects. Each bounding box can be thought of as a 3D matrix with a depth of 3 for the RGB intensities, a width and a height, which can then be converted into a 2D matrix by taking the gray scale. What I want to do is obtain the corresponding coordinate points in the original image for each cell of intensity inside the bounding box. Any ideas how to do so?
From what I understand, you got an R-CNN model that outputs cropped pieces of the input image and you now want to trace those output crops back to their coordinates in the original image.
What you can do is simply use a patch-similarity-measure to find the original position.
Since the output crop should look exactly like itself in the original image, just use Pixel-based distance:
Find the place in the image with the smallest distance (should be zero) and from that you can find your desired coordinates.
In python:
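import numpy as np

# org_image and crop are arrays with the same number of channels
d_min = np.inf
coord = None
crop_h, crop_w = crop.shape[:2]
crop_i = crop.astype(np.int64)  # avoid uint8 wrap-around when subtracting
for x in range(org_image.shape[0] - crop_h + 1):
    for y in range(org_image.shape[1] - crop_w + 1):
        patch = org_image[x:x + crop_h, y:y + crop_w].astype(np.int64)
        d = np.sum(np.abs(patch - crop_i))  # sum of absolute differences
        if d < d_min:
            d_min = d
            coord = [x, y]  # [row, column] of the crop's top-left corner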
However, your model should have that info available in it (after all, it crops the output based on some coordinates). Maybe if you add some info on your implementation.
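If the box coordinates are available from the model, no search is needed: every cell of the crop is just the box's top-left corner plus the cell's offset. A small sketch, assuming the box is given in integer pixels as (x_min, y_min, x_max, y_max) with exclusive maxima, as used for array slicing (detector outputs are often floats and would need rounding first). The function names are hypothetical.

```python
def crop_cell_to_image(box, row, col):
    """Map cell (row, col) of a crop to (x, y) in the original image."""
    x_min, y_min, x_max, y_max = box
    width, height = x_max - x_min, y_max - y_min
    if not (0 <= row < height and 0 <= col < width):
        raise IndexError("cell lies outside the crop")
    return (x_min + col, y_min + row)


def all_cell_coordinates(box):
    """Grid with the same shape as the crop, holding original (x, y) per cell."""
    x_min, y_min, x_max, y_max = box
    return [[(x, y) for x in range(x_min, x_max)] for y in range(y_min, y_max)]


# A crop that is 3 pixels wide and 2 pixels high, cut out at x=100, y=40.
box = (100, 40, 103, 42)
point = crop_cell_to_image(box, row=1, col=2)
grid = all_cell_coordinates(box)
```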

Calculation of center point for the localization of robot in 3D data

I am trying to find a reliable method to calculate the corner points of a container. The idea is to calculate the center point of the container from these corner points for the localization of a robot, meaning that the calculated center point will be the robot's destination when it picks up the container. I am looking for suggestions on how to calculate the corner points, or, if possible, how to calculate the center point directly. Up to this point the PCL library (C/C++) has been used for processing the 3D data.
The image below is the screenshot of the container.
thanks in advance.
afterApplyingPassthrough
I did the following things:
I binarized the image (black pixels = 0, green pixels = 1),
inverted the image (black pixels = 1, green pixels = 0),
eroded the image with 3x3 kernel N-times and dilated it with same kernel M-times.
Left: N=2, M=1; Right: N=6, M=6
After that:
I computed contours of all non-zero areas and
removed the contour that surrounded entire image.
These are the contours that remained:
I do not know what a "typical" input image looks like in your case. Since I only have access to one sample image, I would rather not speculate about a "general solution" that would be suitable for you. But to solve this particular case, you could analyze every contour in the following way:
compute the rotated rectangle that fits best around your contour (you need something similar to minAreaRect from OpenCV)
compute areas of rectangle and contour interior
if the difference between contour area and the area of the rotated bounding rectangle is small, the contour has approximately rectangular shape
find the contour that is both rectangular and satisfies some other condition (for example: typical area of the container). Assume that this belongs to container and compute its center.
I am not claiming that this is a solution that will work well in real world scenarios. It is also not fast. You should view it as a "sketch" that shows how to extract some useful information.
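The rectangularity test described above can be sketched in pure Python. This is not OpenCV's minAreaRect itself but a simple stand-in: it builds the convex hull and tries a bounding rectangle aligned with each hull edge, which is enough to compare the two areas. All function names are made up for illustration, and contour points are expected as (x, y) tuples.

```python
import math


def polygon_area(points):
    """Area enclosed by a closed polygon (shoelace formula)."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect_area(points):
    """Area of the smallest rotated rectangle that encloses the points."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    best = float("inf")
    for i in range(len(hull)):
        x0, y0 = hull[i]
        x1, y1 = hull[(i + 1) % len(hull)]
        length = math.hypot(x1 - x0, y1 - y0)
        ux, uy = (x1 - x0) / length, (y1 - y0) / length
        # Project the hull onto the edge direction and its normal.
        along = [px * ux + py * uy for px, py in hull]
        across = [-px * uy + py * ux for px, py in hull]
        area = (max(along) - min(along)) * (max(across) - min(across))
        best = min(best, area)
    return best


def rectangularity(contour):
    """Contour area divided by rotated-rectangle area; 1.0 means a rectangle."""
    rect_area = min_area_rect_area(contour)
    return polygon_area(contour) / rect_area if rect_area > 0 else 0.0


rotated_square = [(0, 2), (2, 0), (4, 2), (2, 4)]
triangle = [(0, 0), (4, 0), (0, 4)]
square_score = rectangularity(rotated_square)
triangle_score = rectangularity(triangle)
```

A contour would then be accepted as "approximately rectangular" when the score is above some threshold close to 1 (the exact value has to be tuned on real data), in addition to the area condition mentioned above.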
I assume the wheels keep the cart at a known offset from the floor and that you can identify the floor. Filter out all points which are too close to the floor (this will remove the wheels and everything but the cart, which will help limit the data and simplify later steps).
If you isolate the cart, you could apply a simple average point (centroid), alternately, if that is not precise, you could try finding the bounding box of the isolated cart (min max in primary directions) and then take the centroid of that bounding box (this should be more accurate, but will still need a slight vertical offset due to the top handles).
If you cannot isolate the cart, or the other methods are not working well, you could try PCL sample consensus, specifically SACMODEL_LINE. This is a more involved strategy, but it should give very solid results: find the best line, subtract its members from the cloud, and repeat to find the next best line. After you have your 4 primary cart lines, use their parameters to find your centroid. This would also be robust against random items being in or on the cart, as well as against carts of various sizes (assuming they always have straight, perpendicular walls).
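The first two suggestions (floor filtering, then average point versus bounding-box center) can be illustrated on a toy point list. This is plain Python rather than PCL, and it assumes the z axis is vertical and the floor is a horizontal plane at a known height; the function names, clearance value and sample points are invented for the example.

```python
def remove_floor(points, floor_z, clearance):
    """Keep only points more than `clearance` above the floor height."""
    return [p for p in points if p[2] > floor_z + clearance]


def mean_centroid(points):
    """Simple average of all points; biased towards densely sampled regions."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def bbox_center_3d(points):
    """Center of the axis-aligned bounding box (min/max along each axis)."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    return tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))


cloud = [
    (0.0, 0.0, 0.0), (0.5, 0.5, 0.02),   # floor points
    (1.0, 1.0, 0.5), (3.0, 1.0, 0.5),    # cart corners
    (3.0, 2.0, 0.5), (1.0, 2.0, 0.5),
    (1.1, 1.0, 0.5),                     # extra sample near one corner
]
cart = remove_floor(cloud, floor_z=0.0, clearance=0.1)
average = mean_centroid(cart)
center = bbox_center_3d(cart)
```

Here the extra sample pulls the average point away from the true middle, while the bounding-box center stays at (2.0, 1.5, 0.5), which matches the remark that the bounding-box variant should be more accurate. In PCL the analogous calls would probably be pcl::compute3DCentroid and pcl::getMinMax3D.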

Locate words in binary image

I'd like to figure out a method for finding bounding boxes of words or a pair of words in binary image. The image itself looks like this: (bounding boxes I need are marked by blue rectangles).
The image is free of any other objects. I'm thinking about some form of connected component analysis, like detecting single letters first, then "drawing" their bounding boxes on another Mat object in such a way that neighbouring letters connect. There is one useful piece of information I'd like to utilize: a word or a pair of words forms a horizontal line, which could be used to separate "Hello there" and "abcdf". I just don't know how to do it.
Contour the image.
Pick contours with a suitable area and width/height to be letters - get coords of centers.
From list of centers decide how far apart 2 centers can be to be adjacent letters rather than a gap.
Group these contours into a word and take their bounding box
Opencv has clustering, contour area and bounding box funcs if you don't want to do it yourself
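The grouping step could look roughly like this, assuming the letter bounding boxes have already been extracted as (x, y, w, h) tuples. Instead of distances between centers it uses the horizontal gap between box edges plus a vertical-overlap check to keep lines apart; the function name and the max_gap value are illustrative and need tuning for the actual font size.

```python
def group_letters_into_words(letter_boxes, max_gap):
    """Merge letter boxes (x, y, w, h) on the same line whose gap is small."""
    words = []  # each word is [x_min, y_min, x_max, y_max]
    for x, y, w, h in sorted(letter_boxes):
        for word in words:
            same_line = y < word[3] and y + h > word[1]  # vertical overlap
            close = x - word[2] <= max_gap               # small horizontal gap
            if same_line and close:
                word[0] = min(word[0], x)
                word[1] = min(word[1], y)
                word[2] = max(word[2], x + w)
                word[3] = max(word[3], y + h)
                break
        else:
            words.append([x, y, x + w, y + h])
    return [(x0, y0, x1 - x0, y1 - y0) for x0, y0, x1, y1 in words]


letters = [
    (0, 10, 8, 10), (10, 10, 8, 10), (20, 10, 8, 10),  # first word, line 1
    (40, 10, 8, 10), (50, 10, 8, 10),                  # second word, line 1
    (0, 40, 8, 10), (10, 40, 8, 10),                   # word on line 2
]
word_boxes = group_letters_into_words(letters, max_gap=5)
```

Raising max_gap above the space between two words makes a pair such as "Hello there" come out as a single box.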
Do a dilation along the X axis using a window of size N, where N is approximately 1 to 2 times the letter width; the letters of a word then merge into filled "boxes".
Find contours ( see http://docs.opencv.org/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html ).
Find the bounding rectangles and correct their width (minus approximately one letter width) to compensate for the enlargement caused by the dilation.
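To show the effect of the horizontal dilation and the width correction, here is a pure-Python sketch on a single image row, assuming text pixels are 1 and background is 0. With OpenCV you would more likely use cv2.dilate with a 1-pixel-high, N-pixel-wide rectangular kernel; the helper names below are made up.

```python
def dilate_horizontally(image, n):
    """Dilate a binary image (list of rows of 0/1) with a 1 x n window."""
    half = n // 2
    out = []
    for row in image:
        width = len(row)
        out.append([
            1 if any(row[max(0, x - half):min(width, x + half + 1)]) else 0
            for x in range(width)
        ])
    return out


def row_runs(row):
    """Runs of consecutive 1s in a row as (start, end_exclusive) pairs."""
    runs, start = [], None
    for x, value in enumerate(row):
        if value and start is None:
            start = x
        elif not value and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def undo_growth(run, n):
    """Shrink a run by the n // 2 pixels the dilation added on each side."""
    half = n // 2
    return (run[0] + half, run[1] - half)


# Two "letters" one pixel apart (a word) and a third letter far away.
row = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
dilated = dilate_horizontally([row], 3)[0]
merged = row_runs(dilated)
corrected = [undo_growth(r, 3) for r in merged]
```

The correction here ignores clipping at the image border, so runs that touch the edge would need separate handling.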

How to merge 2 CvRects with minimal distance that are result of cvContour

In my project I use cvFindContours to detect objects.
With the result(s), I want to mark the ROI of the input image (if the distance between the detected blobs is large, I want to repeat the tagging of the ROI).
My problem is that a few rects from the found blobs overlap or are part of a bigger blob.
Is there a fast solution to remove inner blobs and merge blobs with minimal distance?
For example:
You can check if rectangles are overlapping using operator& of cv::Rect:
cv::Rect a(x1,y1,w1,h1);
cv::Rect b(x2,y2,w2,h2);
cv::Rect intersect = a&b; // if intersect is not empty (intersect.area() > 0), the rects overlap
As for your "minimal distance", there is no way to do that using standard OpenCV functions. You have to decide what the "distance" between the rectangles is: the distance between their centers (not recommended)? The distance between their borders? Also remember that you have 2 dimensions. You can do it, but you have to code it yourself.
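One way to code it yourself, shown here in Python with rects as (x, y, w, h) tuples. It takes "distance" to mean the gap between the rectangle borders (0 if they touch, overlap, or one contains the other) and repeatedly merges any pair whose gap is at most max_gap into their common bounding rect. The function names and the threshold are assumptions for illustration, and the pairwise loop is simple rather than fast.

```python
import math


def intersection(a, b):
    """Overlapping part of two (x, y, w, h) rects, or None if they are disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1 = min(a[0] + a[2], b[0] + b[2])
    y1 = min(a[1] + a[3], b[1] + b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


def border_gap(a, b):
    """Euclidean gap between the borders; 0 for touching or overlapping rects."""
    dx = max(0, max(a[0], b[0]) - min(a[0] + a[2], b[0] + b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[1] + a[3], b[1] + b[3]))
    return math.hypot(dx, dy)


def union(a, b):
    """Smallest rect that contains both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_close_rects(rects, max_gap):
    """Merge rects until no two remaining rects are within max_gap of each other."""
    rects = list(rects)
    merged = True
    while merged:
        merged = False
        for i in range(len(rects)):
            for j in range(i + 1, len(rects)):
                if border_gap(rects[i], rects[j]) <= max_gap:
                    rects[i] = union(rects[i], rects[j])
                    del rects[j]
                    merged = True
                    break
            if merged:
                break
    return rects


# An inner blob, a nearby blob and a distant blob around one big blob.
rects = [(0, 0, 10, 10), (2, 2, 3, 3), (12, 0, 5, 5), (100, 100, 10, 10)]
result = merge_close_rects(rects, max_gap=3)
```

With max_gap=0 only overlapping, touching and nested rects are merged, which covers the "remove inner blobs" part of the question on its own.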
