Consider an iOS device with a screen resolution of 320x480 points. What is the point value of the top-left pixel? Is it (0, 0) or (1, 1)?
Technically, the top-left pixel is actually a rect that stretches from {0, 0} to {1, 1}, with the center point at {0.5, 0.5}.
When expressing rects, to include the top-left pixel you want to start at {0, 0}. But, for example, if you want to draw a line that's centered on the top-left pixel, then your line needs to pass through {0.5, 0.5}.
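A minimal sketch of that convention, written in Python purely for illustration. The helper names pixel_rect and pixel_center are hypothetical and not part of any iOS API, and the sketch assumes one pixel per point:

```python
def pixel_rect(col, row):
    """Return (x_min, y_min, x_max, y_max) of the unit square covered by a pixel."""
    return (float(col), float(row), float(col + 1), float(row + 1))


def pixel_center(col, row):
    """Return the center of that unit square, i.e. where a centered line must pass."""
    x_min, y_min, x_max, y_max = pixel_rect(col, row)
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


top_left_rect = pixel_rect(0, 0)      # the rect from {0, 0} to {1, 1}
top_left_center = pixel_center(0, 0)  # the point {0.5, 0.5}
```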
(0, 0). In the computing world, numbering always starts at zero. That's how you can tell programmers from other people: ask them to count to ten.
I would like to draw some arrows like this. What I'm doing is drawing from the circle's 45-degree point vertically to the square. But the distance between the two arrows changes if the circle's size changes, so is there any way to draw the arrow shifted left or right by a specified distance?
If I use xshift or a coordinate calculation, the arrow won't look like it starts from the circle, leaving a gap in between:
I am trying to figure out how OpenCV's resize() function computes linear interpolation when we set fx=2 and fy=1. I have written the following minimal working example:
import cv2
import numpy as np
pattern_img = np.zeros((6, 6), np.uint8)
pattern_img[:, 0::2] = 255
pattern_img_x2 = cv2.resize(pattern_img, None, fx=2, fy=1, interpolation=cv2.INTER_LINEAR)
If we look at the first row of pattern_img and pattern_img_x2, we have:
pattern_img[0, :]
> array([255, 0, 255, 0, 255, 0], dtype=uint8)
pattern_img_x2[0, :]
> array([255, 191, 64, 64, 191, 191, 64, 64, 191, 191, 64, 0], dtype=uint8)
I cannot figure out how the numbers 191 and 64 are calculated. I know that resize() implements a bilinear algorithm, but since we have set fy=1 here, it should reduce to simple linear interpolation along the x-axis. Could anybody help me understand the algorithm behind it?
This has to do with pixel "grids".
Is 0,0 the center of the first pixel, or the top left corner of it? Where are the corners of a pixel? A common question in computer graphics.
Interpolation adds another complication. Does a pixel define its whole square area? Then you get nearest neighbor interpolation. Or does it merely define the center point? Then, anything in between is undefined, technically, and interpolation gets to decide how to fill the space.
In OpenCV generally, pixel centers are at integer coordinates. That means the first pixel's top left corner sits at (-0.5, -0.5), so that's where the picture's top left corner starts.
Now, if you were to sample with fx=1, i.e. an identity transformation, you'd start at -0.5, which should be the left edge of a pixel, and the output pixel has a width of 1, so the first output pixel spans -0.5 to +0.5, and its center is at 0.0.
Since you want fx=2, your output pixels are 0.5 wide. You still start at -0.5, and your output pixels span... -0.5 to 0.0, 0.0 to +0.5, 0.5 to 1.0, 1.0 to 1.5...
And their centers sit at -0.25, +0.25, +0.75, +1.25, ...
And that is how you get those 1/4 and 3/4 values. 64 is one quarter of 255, 191 is three quarters of 255. And that's also why the first output pixel is 255. It sits to the left of the first input pixel, so that is its only support and determines 100% of its value.
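The sampling positions above can be checked with a short sketch in plain Python. This is only a model of the arithmetic described here, not OpenCV's actual implementation (which works in fixed point internally and may differ by one in other cases). The function name resize_row_linear is hypothetical, and clamping at the edges is my way of expressing "the first input pixel is its only support":

```python
import math


def resize_row_linear(row, fx):
    """Linearly resample one image row, with pixel centers at integer coordinates."""
    n_in = len(row)
    n_out = int(round(n_in * fx))
    out = []
    for dst_x in range(n_out):
        # Center of the output pixel, expressed in input coordinates:
        # -0.25, +0.25, +0.75, +1.25, ... for fx=2.
        src_x = (dst_x + 0.5) / fx - 0.5
        # Outside the outermost input centers there is only one supporting pixel.
        src_x = min(max(src_x, 0.0), n_in - 1.0)
        left = int(math.floor(src_x))
        right = min(left + 1, n_in - 1)
        w = src_x - left  # weight of the right neighbour (0.25 or 0.75 for fx=2)
        value = (1.0 - w) * row[left] + w * row[right]
        out.append(int(round(value)))
    return out


first_row = [255, 0, 255, 0, 255, 0]
resampled = resize_row_linear(first_row, 2)
# resampled == [255, 191, 64, 64, 191, 191, 64, 64, 191, 191, 64, 0]
```

For this input the sketch reproduces the row from the question: 191.25 rounds to 191 and 63.75 rounds to 64.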
You could "index-shift" this all so it is a little easier to visualize. Then the picture's top left pixel's top left corner is at (0,0), and the pixel extends to (1,1), with the center at (0.5,0.5). The output pixel grid lies accordingly, top left pixel going from 0 to 0.5 with center at 0.25, its neighbor to the right spanning 0.5 to 1.0, center at 0.75, and so on.
If you want to have full control over this madness, construct your own affine transformation (I'd recommend working with 3x3 matrices, which are easy to compose by matrix multiplication) and then use warpAffine. It takes integer coordinates for the output, transforms them using your matrix (it implicitly inverts it), and looks the resulting coordinates up in the source image, including interpolation in the source image space.
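As a rough sketch of that idea, here is how a resize-like matrix could be composed from 3x3 pieces in plain Python. The helper names (matmul3, translation, scaling, resize_like_matrix, dst_to_src_x) are hypothetical. The composition "shift by +0.5, scale, shift back by -0.5" is my reading of the pixel-center convention described above, and I have not verified that it matches resize() bit for bit:

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def resize_like_matrix(fx, fy):
    """Forward (source -> destination) map: dst = f * (src + 0.5) - 0.5."""
    return matmul3(translation(-0.5, -0.5),
                   matmul3(scaling(fx, fy), translation(0.5, 0.5)))


def dst_to_src_x(m, dst_x):
    """Invert the x part of the map for a matrix without rotation or shear."""
    return (dst_x - m[0][2]) / m[0][0]


M = resize_like_matrix(2.0, 1.0)
# M == [[2.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sample_x = dst_to_src_x(M, 1)  # 0.25: a quarter of the way between input centers
```

The top two rows of M form the 2x3 matrix that cv2.warpAffine expects (as a floating-point NumPy array), together with the output size and an interpolation flag such as cv2.INTER_LINEAR.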
I made a little graphic. Black squares are input pixels, black dots their centers. Red squares and dots are the output pixels and their centers. As you can see, if you sample at the red dot positions, you sit at one or three quarters of the way between input pixel centers.
In OpenCV and in object detection models, a bounding box is represented by 4 numbers, e.g. x, y, width, height or x1, y1, x2, y2.
These numbers seem ill-defined, but that doesn't matter much when the resolution is high.
It becomes a concern when the image has a very low resolution, e.g. 8x8, where a one-pixel error can make things go very wrong.
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
Specifically, I want to clear up the following points of confusion:
Does the bounding box border occupy the 0th pixel, or does it surround the 0th pixel (its border is at x=-1)?
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
If you want to represent a bounding box that occupies the entire image, what should its values be?
So I think the right question is: how should I think about bounding boxes intuitively so that these points are no longer confusing?
OK. After many days working with bounding boxes, I have my own intuition on how to think about bounding box coordinates now.
I divide coordinates in 2 categories: continuous and discrete. The mental problems usually arise when you try to convert between them.
Suppose the image has width=100 and height=100. Then a continuous point (x, y) can take any real value in the range [0, 100] for each coordinate.
That means points like (0, 0), (0.5, 7.1) and (39.83, 99.9999) are all valid.
Now you can convert a continuous point to a discrete point on the image by taking the floor of each coordinate. E.g. (5.5, 8.9) gets mapped to pixel (5, 8) on the image. It's very important to understand that you should not use the ceiling or rounding operation for this conversion. Suppose you have the continuous point (0.9, 0.9): this point lies inside the (0, 0) pixel, so it belongs to pixel (0, 0), not pixel (1, 1).
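The floor rule can be sketched in a few lines of Python (the function name to_pixel is hypothetical):

```python
import math


def to_pixel(x, y):
    """Map a continuous point to the index of the pixel that contains it."""
    return (math.floor(x), math.floor(y))


inside_first_pixel = to_pixel(0.9, 0.9)  # (0, 0), whereas rounding would give (1, 1)
example = to_pixel(5.5, 8.9)             # (5, 8)
```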
From this foundation, let's try to answer my question:
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
It means that continuous point 1 has x value 0 and continuous point 2 has x value 100. A continuous point has zero size. It's not a pixel.
Does the bounding box border occupy the 0th pixel, or does it surround the 0th pixel (its border is at x=-1)?
In continuous space, the bounding box border occupies zero space. The border is infinitesimally thin. But when we want to draw it onto an image, the border will be at least 1 pixel thick. So if we have the continuous point (0, 0), it will occupy the 0th pixel of the image when drawn. Theoretically, though, it represents a thin border along the left and top sides of the 0th pixel.
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
The biggest x, y value that still lies inside a pixel is 7.999..., and when converted to the discrete version it becomes 7, which is the last pixel.
If you want to represent a bounding box that occupies the entire image, what should its values be?
You should represent bounding box coordinates in continuous space rather than discrete space, because of the extra precision. That means the largest bounding box starts at (0, 0) and ends at (100, 100). But if you want to draw this box, you need to convert it to the discrete version and draw it from (0, 0) to (99, 99).
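One possible way to turn a continuous box into the inclusive pixel range to draw, as a Python sketch. The function name box_to_pixel_span is hypothetical, and using ceil(x2) - 1 for the last pixel is my own assumption about how to handle the right and bottom edges; the answer above only states the end result for the full-image case:

```python
import math


def box_to_pixel_span(x1, y1, x2, y2):
    """Return (first_col, first_row, last_col, last_row), all inclusive."""
    first_col, first_row = math.floor(x1), math.floor(y1)
    # A box edge at exactly x2 touches no part of pixel x2 itself,
    # so the last covered pixel is ceil(x2) - 1 (assumption, see lead-in).
    last_col, last_row = math.ceil(x2) - 1, math.ceil(y2) - 1
    return (first_col, first_row, last_col, last_row)


full_image = box_to_pixel_span(0, 0, 100, 100)  # (0, 0, 99, 99) for a 100x100 image
tiny_image = box_to_pixel_span(0, 0, 8, 8)      # (0, 0, 7, 7) for an 8x8 image
```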
In OpenCV, a bounding rectangle can be defined in several ways. One way is by its top-left and bottom-right corners: the constructor Rect(Point pt1, Point pt2) takes those two points. The rectangle starts exactly on that pixel and coordinate. For subpixel rectangles there are also variants that hold floating-point coordinates.
So I want to know, what exactly does it mean when you say that a bounding box has x1=0, x2=100?
That means the top-left corner's x-coordinate is at 0 and the bottom-right corner's x-coordinate is at 100.
Does the bounding box border occupy the 0th pixel, or does it surround the 0th pixel (its border is at x=-1)?
The border starts exactly on the 0th pixel. This means that a rectangle with a width and height of 1px, when drawn, is just a single dot (1px).
Where is the exact end of the bounding box? If the image has shape=(8,8), would the end be at 7 or 8?
The end would be at 7, see below.
If you want to represent a bounding box that occupies the entire image, what should its values be?
Let's take an image of size 100x100. The rectangle around the whole image would be Rect(Point(0,0), Point(99,99)) when defined by two points, or Rect(0, 0, 100, 100) when defined by starting point and size.
The basic thing to know is that an image of size X,Y has its minimum coordinate at the top-left, (0,0), and its maximum at the bottom-right, (X-1,Y-1).
Please correct me if I am wrong: if we have x=10, y=20 and apply a transform to these coordinates (let's say scaling x and y by 10), the new coordinates will be x=100 and y=200.
So, if we scale x by -1, we get x=-10, y=20. But why does this cause the view to be mirrored? Shouldn't the view just be redrawn at its new coordinates?
What am I missing here?
Don't think about a single coordinate, think about a range of coordinates.
If you take the coords (x-value only here) of... 0, 1, 2, 3, 4 and scale them by 10 then they will map to 0, 10, 20, 30, 40 respectively. This will stretch out the x axis and so the view will look 10 times bigger than it did originally.
If you take those same x coords and scale them by -1 then they will map to 0, -1, -2, -3, -4 respectively.
That is, the pixel that is furthest away from the origin (4) is still furthest away from the origin but now at -4.
Each pixel is mirrored through the origin.
That's how scaling works in iOS, Android and general mathematics.
If you just want to slide the view around without changing the size of it at all then you can use a translation instead.
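The difference between the three cases is easy to see on a handful of x coordinates. A small Python sketch (the helper names scale_x and translate_x are hypothetical, and this is plain arithmetic rather than any iOS or Android API):

```python
def scale_x(xs, factor):
    """Scale every x coordinate about the origin."""
    return [x * factor for x in xs]


def translate_x(xs, offset):
    """Slide every x coordinate by the same amount."""
    return [x + offset for x in xs]


xs = [0, 1, 2, 3, 4]
stretched = scale_x(xs, 10)     # [0, 10, 20, 30, 40]: same order, 10 times wider
mirrored = scale_x(xs, -1)      # [0, -1, -2, -3, -4]: left-to-right order is reversed
shifted = translate_x(xs, -20)  # [-20, -19, -18, -17, -16]: same order, same size
```

Only the scale by -1 reverses the order of the coordinates, which is what makes the view appear mirrored.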
I'm trying to calculate the size of a bounding box after rotating a square. I've attached an image that hopefully describes what I'm looking to do.
After rotating by x degrees, the bounds become bigger. Is there a way to calculate this new size, given the angle and the dimensions of the original square? Thank you.
This can be solved with two applications of basic trigonometry.
Each side of your larger square is split into two by a corner of your small blue square. Let's call the larger of these 2 sections length a and the smaller length b (although if x > 45 degrees then b will be larger), with side length l for the blue square and L for the side length of the black square.
We can calculate the first as: Cos(x) = a/l.
And the 2nd as Sin(x) = b/l
Since L = a + b, we have L = (Sin(x)+Cos(x))*l.
Edit: Area is of course side length squared in both cases.
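A short Python sketch of the formula above. The function name rotated_square_bound is hypothetical, and taking absolute values of the sine and cosine is my own addition so that angles outside 0 to 90 degrees also work:

```python
import math


def rotated_square_bound(side, angle_deg):
    """Side length L of the axis-aligned bound of a square rotated by angle_deg."""
    x = math.radians(angle_deg)
    a = side * abs(math.cos(x))  # from Cos(x) = a/l
    b = side * abs(math.sin(x))  # from Sin(x) = b/l
    return a + b


L_45 = rotated_square_bound(10, 45)  # about 14.14, i.e. 10 * sqrt(2)
area_45 = L_45 ** 2                  # about 200, twice the area of the original square
```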
This works only if you have the coordinates. If you can get the coordinates of the four vertices, it is easy. Let's call the point at the top-left corner of the bound A, and the top two vertices of the square sq_a and sq_b. The coordinates of A would be (sq_a.x, sq_b.y). By symmetry, the four small triangles formed between the bound and the square all have the same area. Calculate the area of the triangle formed by A, sq_a and sq_b (which should be easy: 1/2 * breadth * height). Multiply by 4 to get the total area of the triangles, and add the square's own area to get the area of the bound. Sorry, I couldn't post detailed pictures.
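If the vertices are available, the bound can also be read off directly from their minimum and maximum coordinates. The Python sketch below does that and cross-checks it against the "square plus four triangles" area. All function names are hypothetical, and the square is assumed to be rotated about its own center:

```python
import math


def rotated_square_vertices(side, angle_deg):
    """Vertices of a square centered at the origin, rotated by angle_deg."""
    x = math.radians(angle_deg)
    c, s = math.cos(x), math.sin(x)
    h = side / 2
    corners = [(-h, -h), (h, -h), (h, h), (-h, h)]
    return [(cx * c - cy * s, cx * s + cy * c) for cx, cy in corners]


def bounding_box(points):
    """Axis-aligned bound as (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def bound_area_from_triangles(side, angle_deg):
    """Area of the bound as the square's area plus four corner triangles."""
    x = math.radians(angle_deg)
    breadth = side * abs(math.cos(x))
    height = side * abs(math.sin(x))
    return side ** 2 + 4 * (0.5 * breadth * height)


x_min, y_min, x_max, y_max = bounding_box(rotated_square_vertices(10, 30))
area_from_vertices = (x_max - x_min) * (y_max - y_min)
area_from_triangles = bound_area_from_triangles(10, 30)
```

Both routes give the same area, which also agrees with squaring L = (Sin(x)+Cos(x))*l.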