OpenCV and object detection models represent a bounding box as 4 numbers, e.g. x,y,width,height or x1,y1,x2,y2.
These numbers seem ill-defined, but that is fine when the resolution is high.
It makes me think, though: when the image has very low resolution, e.g. 8x8, a one-pixel error can make things go very wrong.
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
Specifically, I want to clear up these confusions:
Does the bounding box border occupy the 0th pixel or is it surrounding 0th pixel (its border is at x=-1)?
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
If you want to represent a bounding box that occupies the entire image, what should its values be?
So I think the right question is: how do I think about bounding boxes intuitively so that these are not confusing for me?
OK. After many days of working with bounding boxes, I now have my own intuition for how to think about bounding box coordinates.
I divide coordinates into 2 categories: continuous and discrete. The mental problems usually arise when you try to convert between them.
Suppose the image has width=100, height=100; then a continuous point x,y can have any real value in the range [0,100].
It means that points like (0,0), (0.5,7.1), (39.83,99.9999) are valid points.
Now you can convert a continuous point to a discrete point on the image by taking the floor of each coordinate. E.g. (5.5, 8.9) gets mapped to pixel number (5,8) on the image. It's very important to understand that you should not use the ceiling or rounding operation to convert it to the discrete version. Suppose you have a continuous point (0.9,0.9): this point lies inside the (0,0) pixel, so it belongs to the (0,0) pixel, not the (1,1) pixel.
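As a minimal sketch of the floor rule described above (the helper name continuous_to_pixel is made up for illustration, it is not an OpenCV function):

```python
import math

def continuous_to_pixel(x, y):
    """Map a continuous point to the index of the pixel that contains it.

    Hypothetical helper: uses floor, never ceiling or rounding.
    """
    return math.floor(x), math.floor(y)

print(continuous_to_pixel(5.5, 8.9))  # (5, 8)
print(continuous_to_pixel(0.9, 0.9))  # (0, 0), not (1, 1)
```

Rounding would send (0.9, 0.9) to pixel (1, 1), which is the mistake the paragraph warns about.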
From this foundation, let's try to answer my question:
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
It means that continuous point 1 has x value = 0 and continuous point 2 has x value = 100. A continuous point has zero size. It's not a pixel.
Does the bounding box border occupy the 0th pixel or is it surrounding 0th pixel (its border is at x=-1)?
In continuous space, the bounding box border occupies zero space. The border is infinitesimally thin. But when we want to draw it onto an image, the border will be at least 1 pixel thick. So if we have a continuous point (0,0), it will occupy the 0th pixel of the image. But theoretically, it represents a thin border at the left side and top side of the 0th pixel.
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
The biggest x,y value you can have is 7.999... but when converted to the discrete version you will be left with 7, which represents the last pixel.
If you want to represent a bounding box that occupies the entire image, what should its values be?
You should represent bounding box coordinates in continuous space instead of discrete space because of the precision that you have. It means the largest bounding box starts at (0,0) and ends at (100,100). But if you want to draw this box, you need to convert it to the discrete version and draw the bounding box from (0,0) to (99,99).
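Here is a small sketch of that conversion from a continuous box to the first and last pixel it touches. The helper name box_to_pixel_range is hypothetical, and I am assuming that x2 > x1, y2 > y1 and that a pixel counts as covered even if the box only partially overlaps it:

```python
import math

def box_to_pixel_range(x1, y1, x2, y2):
    """Return ((first_x, first_y), (last_x, last_y)) as inclusive pixel indices.

    Assumption: the continuous box is non-empty (x2 > x1 and y2 > y1).
    The end is exclusive in continuous space, so the last covered pixel
    is ceil(end) - 1, which is end - 1 for an integer end.
    """
    first = (math.floor(x1), math.floor(y1))
    last = (math.ceil(x2) - 1, math.ceil(y2) - 1)
    return first, last

# The full 100x100 image in continuous coordinates:
print(box_to_pixel_range(0, 0, 100, 100))  # ((0, 0), (99, 99))
```

The same idea matches Python slicing, where a range like 0:100 also covers indices 0 to 99.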
In OpenCV the bounding rectangle can be defined in many ways. One way is by its top-left and bottom-right corners: the constructor Rect(Point pt1, Point pt2) takes those two points, while Rect(int x, int y, int width, int height) takes the top-left corner and a size. The rectangle starts exactly on that pixel and coordinate. For subpixel rectangles there are also variants holding floating-point coordinates.
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
That means the top-left corner's x-coordinate is at 0 and the bottom-right corner's x-coordinate is at 100.
Does the bounding box border occupy the 0th pixel or is it surrounding 0th pixel (its border is at x=-1)?
The border starts exactly on the 0th pixel, meaning that a rectangle with a width and height of 1px, when drawn, is just a single dot (1px).
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
The end would be at 7, see below.
If you want to represent a bounding box that occupies the entire image, what should its values be?
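Let's take an image of size 100,100. Defined by starting point and size, the rectangle around the whole image is Rect(0, 0, 100, 100). Its corner pixels are (0,0) and (99,99); note that OpenCV's two-point constructor treats the second point as exclusive, so the equivalent two-point form is Rect(Point(0,0), Point(100,100)).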
The basic thing to know is that an image of size X,Y has its minimum pixel coordinate at the top-left (0,0) and its maximum at the bottom-right (X-1,Y-1).
I want to be able to interpolate any point in an image grid with uniform pixel distances. So far, I have succeeded in interpolating inner points surrounded by 4 existing pixel points. However, I am now lost at interpolating points that are at the boundaries of an image.
In the image example below, the dark red dots mark the positions of the pixel values (the positions from which to calculate the X and Y factors/weights for the point we need to interpolate). So image[0] contains the actual value (not position) of the dark red dot in the upper-left pixel, image[1] has the value of the dark red dot in the upper-right pixel, image[2] has the value of the dark red dot in the bottom-left pixel, and so on. The grid is divided in such a way that vertex (0,0) represents the upper-left corner of the image, (1,1) is the bottom-right corner of the image, and so on. All positions, from the point that we need to interpolate to the vertices and the red dot centers of the pixels, are represented by 2D coordinates where X and Y are always between 0 and 1. Each pixel has 4 vertices, which are colored in yellow with their corresponding coordinates.
Now, I know how to calculate point p3 since it has 4 pixel centers around it. But how can we interpolate the points p1 and p2 if there are not 4 pixel centers to use in the bilinear interpolation formula?
Example of 2x2 image:
I know that the mean shift algorithm calculates the mean of a pixel density and checks whether the center of the ROI is equal to this point. If not, it moves the ROI center to the mean and checks again. Like in this picture:
For density, it is clear how to find the mean point. But it can't simply calculate the mean of a histogram and get the new position from that. How can this algorithm work using a color histogram?
The feature space in your image is 2D.
Say you have an intensity image (so it's 1D) then you would just have a line (e.g. from 0 to 255) on which the points are located. The circles shown above would just be line segments on that [0,255] line. Depending on their means, these line segments would then shift, just like the circles do in 2D.
You talked about color histograms, so I assume you are talking about RGB.
In that case your feature space is 3D, so you have a sphere instead of a line segment or circle. Your axes are R,G,B and pixels from your image are points in that 3D feature space. You then still look where the mean of a sphere is, to then shift the center towards that mean.
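To make the feature-space picture concrete, here is a minimal sketch of one mean shift window moving through an N-dimensional feature space. The function name mean_shift_mode, the flat (unweighted) window and the stopping tolerance are my own assumptions for illustration; this is not OpenCV's implementation.

```python
import math

def mean_shift_mode(points, start, radius, max_iter=100, tol=1e-6):
    """Shift a window center toward the mean of the points inside the window.

    points: list of equal-length tuples (1D intensities, 3D RGB values, ...)
    start:  initial window center, same dimension as the points
    radius: window radius (line segment in 1D, circle in 2D, sphere in 3D)
    """
    center = tuple(start)
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, center) <= radius]
        if not inside:
            break  # empty window: nothing to shift toward
        mean = tuple(sum(c) / len(inside) for c in zip(*inside))
        if math.dist(mean, center) < tol:
            break  # the center already sits on the mean
        center = mean
    return center

# 1D feature space: intensities on the [0, 255] line
intensities = [(10,), (12,), (14,), (200,), (202,)]
print(mean_shift_mode(intensities, start=(20,), radius=15))  # (12.0,)

# 3D feature space: RGB values, the window is a sphere
colors = [(250, 10, 10), (240, 20, 15), (245, 15, 5), (10, 10, 250)]
print(mean_shift_mode(colors, start=(200, 50, 50), radius=80))  # (245.0, 15.0, 10.0)
```

The code is identical for 1D, 2D and 3D; only the length of the tuples changes.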
So in Scilab I ran analyzeblobs on my image and got a feature called BoundingBox, which gives the rectangle around my object.
Now when I read this BoundingBox I get 4 numbers, which I suppose are related to the corners of the rectangle.
What I don't know is what these numbers represent. Are they pixel indices, or something else?
Basically I want to calculate the width of the rectangle of my bounding box, so I need the coordinates of those four corners, but I don't know how to get it.
So I got the answer:
the four elements are in order (x,y, width, height).
x,y are the coordinates of the top left corner
and the next two are the width and height of the rectangle.
So my second question has also been answered.
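For anyone who still needs the four corners, a small sketch of the conversion (written in Python rather than Scilab, with a hypothetical helper name, and assuming the right and bottom edges sit at x + width and y + height):

```python
def xywh_to_corners(x, y, width, height):
    """Return the corners of an (x, y, width, height) box, clockwise from top-left.

    Assumption: edges are treated as continuous coordinates, so the right edge
    is at x + width. If you need the index of the last pixel inside the box,
    use x + width - 1 and y + height - 1 instead.
    """
    return [
        (x, y),                   # top-left
        (x + width, y),           # top-right
        (x + width, y + height),  # bottom-right
        (x, y + height),          # bottom-left
    ]

print(xywh_to_corners(10, 20, 30, 40))
# [(10, 20), (40, 20), (40, 60), (10, 60)]
```

The width itself is just the third element, so no corner arithmetic is needed for that.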
I'm trying to calculate the size of a bounding box after rotating a square. I've attached an image that hopefully describes what I'm looking to do.
After rotating by x degrees, the bounds become bigger. Is there a way to calculate this new size, given the angle and the dimensions of the original square? Thank you.
This can be solved through 2 applications of right-triangle trigonometry.
Each side of your larger square is split into two by a corner of your small blue square. Let's call the larger of these 2 sections length a and the smaller length b (although if x > 45 degrees then b will be larger), with side length l for the blue square and L for the black square.
We can calculate the first as: Cos(x) = a/l.
And the 2nd as Sin(x) = b/l
Thus we have L = (Sin(x)+Cos(x))*l.
Edit: Area is of course side length squared in both cases.
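A quick sketch of the formula L = (Sin(x)+Cos(x))*l in code. The function name is made up, and taking absolute values is my own addition so that angles outside 0 to 90 degrees also give a positive size:

```python
import math

def rotated_square_bounds(side, angle_degrees):
    """Side length of the axis-aligned bounding square of a rotated square."""
    theta = math.radians(angle_degrees)
    # abs() is an assumption added here to handle angles outside [0, 90]
    return side * (abs(math.sin(theta)) + abs(math.cos(theta)))

print(rotated_square_bounds(2, 0))  # 2.0, no rotation means no growth
bound = rotated_square_bounds(1, 45)
print(math.isclose(bound, math.sqrt(2)))  # True, the largest bound is at 45 degrees
print(math.isclose(bound ** 2, 2.0))      # True, area is the side length squared
```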
This works only if you have the coordinates. If you can get the coordinates of the four vertices, it is easy. Let the point at the top-left corner of the bound be A, and let the top two vertices of the square be sq_a and sq_b. The coordinates of A would be (sq_a.x, sq_b.y). By symmetry, the four small triangles formed between the bound and the square all have the same area. Calculate the area of the triangle formed by A, sq_a and sq_b (which should be easy: 1/2 * breadth * height). Multiply by 4 and add the area of the square, and you get the total area of the bound. Sorry, I couldn't post detailed pictures.
I'm trying to find the most rectangle-like shapes among a set of shapes. The first image is the original image, with shapes that could be rectangles but are not. The green rectangles in the second image are what I want. So is there a way to do this with OpenCV? I've tried Hough lines but the result is not good.
The source image:
And what I want is to find the most rectangle-like shapes, like the rectangles in green.
What I want:
A very simple approach is, once you have a rectangular bounding box around your shape, to count the percentage of pixels inside the box which are white.
The higher the percentage of white pixels, the closer to a rectangle it is.
To get the bounding boxes you should take a look at either findContours from OpenCV or some blob extraction algorithm; you will find plenty of questions regarding those.
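A minimal sketch of the white-pixel percentage idea, in plain Python on a nested-list binary mask so it runs without OpenCV. The helper names are made up, and I am assuming one shape per mask; with OpenCV you would typically get the box from findContours and boundingRect instead.

```python
def bounding_box(mask):
    """Return (x, y, width, height) of the non-zero pixels in a 2D mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask has no white pixels")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x, y = cols[0], rows[0]
    return x, y, cols[-1] - x + 1, rows[-1] - y + 1

def fill_ratio(mask):
    """Fraction of pixels inside the bounding box that are white (non-zero)."""
    x, y, w, h = bounding_box(mask)
    white = sum(1 for row in mask[y:y + h] for v in row[x:x + w] if v)
    return white / (w * h)

rectangle = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0]]
l_shape = [[1, 0],
           [1, 1]]
print(fill_ratio(rectangle))  # 1.0
print(fill_ratio(l_shape))    # 0.75
```

A perfect axis-aligned rectangle scores 1.0, and the score drops as the shape departs from its box.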
Edit:
Maybe you should first get the Minimum bounding rectangles of the shapes and then do this kind of heuristic:
Shrink the rectangle dimensions until the white-pixel percentage inside the rectangle reaches some threshold defined by you (like 90% of white pixels inside the rectangle).
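One possible way to sketch this shrinking heuristic in plain Python is below. The suggestion above does not say which side to shrink, so the greedy rule here (at each step drop the edge row or column that leaves the highest white ratio) is purely my assumption, as are the function names.

```python
def white_ratio(mask, x, y, w, h):
    """Fraction of white (non-zero) pixels inside the box (x, y, w, h)."""
    white = sum(1 for row in mask[y:y + h] for v in row[x:x + w] if v)
    return white / (w * h)

def shrink_until_filled(mask, box, threshold=0.9):
    """Greedily shrink box until its white ratio reaches threshold."""
    x, y, w, h = box
    while white_ratio(mask, x, y, w, h) < threshold:
        candidates = [
            (x + 1, y, w - 1, h),  # drop the left column
            (x, y, w - 1, h),      # drop the right column
            (x, y + 1, w, h - 1),  # drop the top row
            (x, y, w, h - 1),      # drop the bottom row
        ]
        # keep only boxes that still contain at least one pixel
        candidates = [b for b in candidates if b[2] >= 1 and b[3] >= 1]
        if not candidates:
            break  # cannot shrink a 1x1 box any further
        x, y, w, h = max(candidates, key=lambda b: white_ratio(mask, *b))
    return x, y, w, h

mask = [[1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 1]]
print(shrink_until_filled(mask, (0, 0, 4, 3)))  # (0, 0, 3, 3)
```

In the example, the full box is only 10/12 white, and dropping the right column gives a fully white 3x3 box.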
To get the Minimum bounding rectangle (the smallest rectangle which contains the whole shape), you might check this tutorial:
http://docs.opencv.org/doc/tutorials/imgproc/shapedescriptors/bounding_rects_circles/bounding_rects_circles.html
One thing that might also help is comparing the minimum bounding rectangle with the maximum inner rectangle (the biggest rectangle you can fit inside the white shape). The smaller the difference between those rectangles' properties (width, height, area, center coordinates), the closer the shape is to a rectangle.