How do I convert pixel/screen coordinates to Cartesian coordinates? - image-processing

How do I convert pixel/screen coordinates to Cartesian coordinates (x, y)?
The information I have about the pictures is (see image below):
vFov in degrees, hFov in degrees, pixel width, pixel height.
Basically, I want to take any pixel on the image and calculate that pixel's position relative to the image center, in degrees.

For my answer I assume that your image represents a projection onto a planar surface.
Then a virtual camera can be constructed such that it sees the width/height of the image at exactly the right field of view. To get the distance d between the image plane and the camera (in pixels), a right triangle can be constructed:
tan(hFov/2) = (width/2) / d → d = width / (2 tan(hFov/2))
The same equation holds for the height with vFov.
In a similar way the angle of a pixel can be calculated (taking the center of the image as (0, 0)):
tan(angleX) = x/d → angleX = arctan(x/d) = arctan(2x tan(hFov/2) / width)
tan(angleY) = y/d → angleY = arctan(y/d) = arctan(2y tan(vFov/2) / height)
If the image is distorted (e.g. non-square pixels), the d values for the horizontal and vertical directions can differ, and therefore you should not precompute a single d.
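Putting it together, a minimal Python sketch of these formulas (the image size and FOV values in the example call are placeholders):

```python
import math

def pixel_to_angles(px, py, width, height, hFov_deg, vFov_deg):
    """Angular offset (degrees) of a pixel from the image center,
    assuming an ideal pinhole projection onto a planar image."""
    # pixel offsets from the image center (x right, y down)
    x = px - width / 2.0
    y = py - height / 2.0
    # distance from the virtual camera to the image plane, in pixels
    d_h = (width / 2.0) / math.tan(math.radians(hFov_deg) / 2.0)
    d_v = (height / 2.0) / math.tan(math.radians(vFov_deg) / 2.0)
    angle_x = math.degrees(math.atan(x / d_h))
    angle_y = math.degrees(math.atan(y / d_v))
    return angle_x, angle_y

# example: center pixel of a 640x480 image with 60/45 degree FOV -> (0.0, 0.0)
print(pixel_to_angles(320, 240, 640, 480, 60, 45))
```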

Related

ArUco markers, pose estimation - exactly for which point are the translation and rotation given?

I detected the ArUco marker and estimated the pose. See the image below. However, the Xt (X translation) I get is a positive value. According to the drawAxis function, the positive direction is away from the image center, so I thought it was supposed to be negative. Why am I getting a positive value instead?
My camera is about 120 mm away from the imaging surface, but I am getting Zt (Z translation) in the range of 650 mm. Is pose estimation giving the pose of the marker with respect to the physical camera or the image plane center? I don't understand why Zt is so high.
I kept measuring the pose while changing Z and recorded roll, pitch, and yaw. I noticed that the roll (rotation w.r.t. the camera X axis) keeps flipping sign back and forth while its magnitude varies between 166 and 178, but the sign of Xt did not change with the sign change in roll. Any thoughts on why it behaves like that?
Any suggestions for getting more consistent data?
import math
import cv2 as cv
import numpy as np
from scipy.spatial.transform import Rotation as R  # assumed source of R.from_matrix below

# fname, mtx (camera matrix), dst (distortion coefficients), aruco_marker_side_length
# and euler_from_quaternion() are assumed to be defined elsewhere.
image = cv.imread(fname)
arucoDict = cv.aruco.Dictionary_get(cv.aruco.DICT_4X4_1000)
arucoParams = cv.aruco.DetectorParameters_create()
(corners, ids, rejected) = cv.aruco.detectMarkers(image, arucoDict, parameters=arucoParams)
print(corners, ids, rejected)

if len(corners) > 0:
    # flatten the ArUco IDs list
    ids = ids.flatten()
    # loop over the detected ArUco corners
    #for (markerCorner, markerID) in zip(corners, ids):
    #(markerCorner, markerID) = (corners, ids)
    # extract the marker corners (which are always returned in
    # top-left, top-right, bottom-right, and bottom-left order)
    #corners = corners.reshape((4, 2))
    (topLeft, topRight, bottomRight, bottomLeft) = corners[0][0][0], corners[0][0][1], corners[0][0][2], corners[0][0][3]
    # convert each of the (x, y)-coordinate pairs to integers
    topRight = (int(topRight[0]), int(topRight[1]))
    bottomRight = (int(bottomRight[0]), int(bottomRight[1]))
    bottomLeft = (int(bottomLeft[0]), int(bottomLeft[1]))
    topLeft = (int(topLeft[0]), int(topLeft[1]))
    # draw the bounding box of the ArUco detection
    cv.line(image, topLeft, topRight, (0, 255, 0), 2)
    cv.line(image, topRight, bottomRight, (0, 255, 0), 2)
    cv.line(image, bottomRight, bottomLeft, (0, 255, 0), 2)
    cv.line(image, bottomLeft, topLeft, (0, 255, 0), 2)
    # compute and draw the center (x, y)-coordinates of the ArUco marker
    cX = int((topLeft[0] + bottomRight[0]) / 2.0)
    cY = int((topLeft[1] + bottomRight[1]) / 2.0)
    cv.circle(image, (cX, cY), 4, (0, 0, 255), -1)
    # estimate the in-plane rotation from the left and top marker edges
    if topLeft[1] != topRight[1] or topLeft[0] != bottomLeft[0]:
        rot1 = np.degrees(np.arctan((topLeft[0] - bottomLeft[0]) / (bottomLeft[1] - topLeft[1])))
        rot2 = np.degrees(np.arctan((topRight[1] - topLeft[1]) / (topRight[0] - topLeft[0])))
        rot = (np.round(rot1, 3) + np.round(rot2, 3)) / 2
        print(rot1, rot2, rot)
    else:
        rot = 0
    # draw the marker position and rotation on the image
    rotS = ",rotation:" + str(np.round(rot, 3))
    cv.putText(image, ("position: " + str(cX) + "," + str(cY)),
               (100, topLeft[1] - 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 80), 2)
    cv.putText(image, rotS,
               (400, topLeft[1] - 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 80), 2)
    print("[INFO] ArUco marker ID: {}".format(ids))
    # average length of the two marker diagonals, in pixels
    d = np.round((math.dist(topLeft, bottomRight) + math.dist(topRight, bottomLeft)) / 2, 3)
    # Get the rotation and translation vectors
    rvecs, tvecs, obj_points = cv.aruco.estimatePoseSingleMarkers(
        corners, aruco_marker_side_length, mtx, dst)
    # Print the pose for the ArUco marker
    # The pose of the marker is with respect to the camera lens frame.
    # Imagine you are looking through the camera viewfinder,
    # the camera lens frame's:
    #   x-axis points to the right
    #   y-axis points straight down towards your toes
    #   z-axis points straight ahead away from your eye, out of the camera
    #for i, marker_id in enumerate(marker_ids):
    # Store the translation (i.e. position) information
    transform_translation_x = tvecs[0][0][0]
    transform_translation_y = tvecs[0][0][1]
    transform_translation_z = tvecs[0][0][2]
    # Store the rotation information
    rotation_matrix = np.eye(4)
    rotation_matrix[0:3, 0:3] = cv.Rodrigues(np.array(rvecs[0]))[0]
    r = R.from_matrix(rotation_matrix[0:3, 0:3])
    quat = r.as_quat()
    # Quaternion format
    transform_rotation_x = quat[0]
    transform_rotation_y = quat[1]
    transform_rotation_z = quat[2]
    transform_rotation_w = quat[3]
    # Euler angle format in radians
    roll_x, pitch_y, yaw_z = euler_from_quaternion(transform_rotation_x, transform_rotation_y,
                                                   transform_rotation_z, transform_rotation_w)
    roll_x = math.degrees(roll_x)
    pitch_y = math.degrees(pitch_y)
    yaw_z = math.degrees(yaw_z)
Disclaimer: this goes for OpenCV v4.5.5 and corresponding aruco module (contrib repo). They redid a lot of aruco stuff for v4.6.0 and v4.7.0, so best check everything I say here.
Without checking all the code (looks roughly okay), a few basics about OpenCV and aruco:
Both use right-handed coordinate systems. Thumb X, index Y, middle Z.
OpenCV uses X right, Y down, Z far, for screen/camera frames. Origin for screens and pictures is the top left corner. For cameras, the origin is the center of the pinhole model, which would be the center of the aperture. I can't comment on lenses or lens systems. Assume the lens center is the origin. That's probably close enough.
Aruco uses X right, Y far, Z up, if the marker is lying flat on a table. Origin is in the center of the marker. The top left corner of the marker is considered the "first" corner.
The marker can be considered to have its own coordinate system/frame.
The pose given by rvec and tvec is the pose of the marker in the camera frame. That means np.linalg.norm(tvec) gives you the direct distance from the camera to the marker's center. tvec's Z is just the component parallel to the optical axis.
If the marker is in the right half of the picture ("half" defined by camera matrix's cx,cy), you'd expect tvec's X to grow. Lower half, Y positive/growing.
Conversely, that transformation transforms marker-local coordinates to camera-local. Try transforming some marker-local points, such as origin or points on the axes. I believe that cv::transform can help with that. Using OpenCV's projectPoints to map 3D space points to 2D image points, you can then draw the marker's axes, or a cube on top of it, or anything you like.
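For example, a rough sketch of that projectPoints approach (reusing mtx, dst, rvecs, tvecs, aruco_marker_side_length, and image from the question's snippet, which are assumed to be defined) could draw the marker's axes like this:

```python
import numpy as np
import cv2 as cv

L = aruco_marker_side_length
# Marker-local 3D points: the marker's origin and the tips of its X, Y, Z axes.
axis_points = np.float32([[0, 0, 0], [L, 0, 0], [0, L, 0], [0, 0, L]])

# Project the marker-local points into the image using the estimated pose.
img_pts, _ = cv.projectPoints(axis_points, rvecs[0], tvecs[0], mtx, dst)
pts = img_pts.reshape(-1, 2).astype(int).tolist()

origin = tuple(pts[0])
cv.line(image, origin, tuple(pts[1]), (0, 0, 255), 2)  # marker X axis
cv.line(image, origin, tuple(pts[2]), (0, 255, 0), 2)  # marker Y axis
cv.line(image, origin, tuple(pts[3]), (255, 0, 0), 2)  # marker Z axis
```

Recent OpenCV versions also ship cv.drawFrameAxes, which does essentially the same thing in one call.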
Say the marker sits upright and faces the camera dead-on. When you consider the frame triads of the marker and the camera in space ("world" space), both would be X "right", but one's Y and Z are opposite the other's Y and Z, so you'd expect to see a rotation around the X axis by half a turn (rotating Z and Y).
You could imagine the transformation to happen like this:
initially the camera looks through the marker, from the marker's back out into the world. The camera would be "upside down". The camera sees marker-space.
the pose's rotation component rotates the whole marker-local world around the camera's origin. Seen from the world frame (point of reference), the camera rotates, into an attitude you'd find natural.
the pose's translation moves the marker's world out in front of the camera (Z being positive), or equivalently, the camera backs away from the marker.
If you get implausible values, check aruco_marker_side_length and camera matrix. f would be around 500-3000 for typical resolutions (VGA-4k) and fields of view (60-80 degrees).
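As a quick sanity check on those numbers (a sketch; the resolution and FOV values below are placeholders, and mtx/tvecs are the variables from the question's code):

```python
import math
import numpy as np

width, hFov_deg = 1280, 70  # placeholder horizontal resolution and horizontal FOV
expected_f = (width / 2.0) / math.tan(math.radians(hFov_deg) / 2.0)
print("fx from camera matrix:", mtx[0, 0], "vs. rough expectation:", expected_f)

# Direct camera-to-marker distance versus the Z component alone:
print("distance to marker:", np.linalg.norm(tvecs[0]), "tvec Z:", tvecs[0][0][2])
```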

How to rotate an image with multiple points and angles?

I want to straighten the inclined text in the image. I was able to get the contours and an angle of rotation for each contour. I have formed a box around each contour and also have the x-y coordinates of the four corners of each box. I am trying to figure out how to rotate each contour's box according to the point and angle associated with it. I have been trying to adapt the code below to my task, but couldn't get it to work.
import cv2

# image_path and angle are assumed to be defined
new_image = cv2.imread(image_path)
(h, w) = new_image.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, 1.0)
new_image = cv2.warpAffine(new_image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
cv2.imshow("rotated", new_image)
cv2.waitKey(0)
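The snippet above always pivots about the image center. As a minimal sketch (not a full solution to the multi-box case), the same two cv2 calls can pivot about any point, for example the center of one of your boxes, by changing the first argument of getRotationMatrix2D:

```python
import cv2

def rotate_about_point(img, cx, cy, angle):
    # Rotate img by `angle` degrees around the pivot (cx, cy), e.g. a box center.
    (h, w) = img.shape[:2]
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    return cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
```

Calling this once per contour with that contour's own center and angle is one way to straighten each box independently.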

How to perform image transformation operations (rotate, scale, translate) in Python with a transformation matrix

I want to perform matrix transformation operations like rotation, scaling, and translation on an image without calling any external library (PIL, cv2) functions.
I am trying the basic matrix transformation approach, but I am facing some issues: after rotation there are holes, and the image gets cropped.
I want to perform all the above operations about the center of the image. I need some guidance here.
Rotation Logic:
newImage = np.zeros((2*row, 2*column, 3), dtype=np.uint8)
radAngle = math.radians(int(input("angle: ")))
c, s = math.cos(radAngle), math.sin(radAngle)
mid_coords = np.floor(0.5 * np.array(image.shape))
for i in range(0, row-1):
    for j in range(0, column-1):
        x2 = mid_coords[0] + (i - mid_coords[0])*c - (j - mid_coords[1])*s
        y2 = mid_coords[1] + (i - mid_coords[0])*s + (j - mid_coords[1])*c
        newImage[int(math.ceil(x2)), int(math.ceil(y2))] = image[i, j]
The reason for the holes in your rotated image is that you are assigning each pixel of the original image to its computed location in the rotated image. Because of this, some pixels in the rotated image are never assigned a value.
You should do it the other way around:
for each pixel in the rotated image, find its value from the original image.
Please refer to this answer.
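For illustration, a minimal pure-numpy sketch of that inverse mapping (rotation about the image center, nearest-neighbour sampling, output the same size as the input):

```python
import math
import numpy as np

def rotate_inverse(image, angle_deg):
    """For every pixel of the output, rotate its coordinates *backwards* to find
    the source pixel, so no destination pixel is left unassigned (no holes)."""
    rows, cols = image.shape[:2]
    cx, cy = rows / 2.0, cols / 2.0
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    out = np.zeros_like(image)
    for i in range(rows):
        for j in range(cols):
            # inverse rotation of the destination coordinate (i, j)
            x = cx + (i - cx) * c + (j - cy) * s
            y = cy - (i - cx) * s + (j - cy) * c
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < rows and 0 <= yi < cols:
                out[i, j] = image[xi, yi]
    return out
```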

Flutter - How to rotate an image around the center with canvas

I am trying to implement a custom painter that can draw an image (a scaled-down version) on the canvas, where the drawn image can be rotated and scaled.
I learned that to scale the image I have to scale the canvas using the scale method.
Now the question is how to rotate the scaled image around its center (or any other point). The rotate method of Canvas only allows rotation around the top-left corner.
Here is my implementation that can be extended
Had the same problem. The solution was simply writing your own rotation method in three lines:
void rotate(Canvas canvas, double cx, double cy, double angle) {
  canvas.translate(cx, cy);
  canvas.rotate(angle);
  canvas.translate(-cx, -cy);
}
We first move the canvas to the point you want to pivot around. We then rotate around the top-left corner (the default in Flutter), which in the shifted coordinate space is exactly the pivot you want, and then move the canvas back to the desired position with the rotation applied. The method is very efficient, requiring only four additions for the translations, and the rotation cost is identical to the original one.
This can be achieved by shifting the coordinate space as illustrated in figure 1.
The translation is the difference in coordinates between C1 and C2, which is exactly the same as the difference between A and B in figure 2.
With some geometry formulas, we can calculate the desired translation and produce the rotated image as in the method below:
ui.Image rotatedImage({ui.Image image, double angle}) {
  var pictureRecorder = ui.PictureRecorder();
  Canvas canvas = Canvas(pictureRecorder);

  final double r = sqrt(image.width * image.width + image.height * image.height) / 2;
  final alpha = atan(image.height / image.width);
  final beta = alpha + angle;
  final shiftY = r * sin(beta);
  final shiftX = r * cos(beta);
  final translateX = image.width / 2 - shiftX;
  final translateY = image.height / 2 - shiftY;

  canvas.translate(translateX, translateY);
  canvas.rotate(angle);
  canvas.drawImage(image, Offset.zero, Paint());

  return pictureRecorder.endRecording().toImage(image.width, image.height);
}
alpha, beta, and angle are all in radians.
Here is the repo of the demo app
If you don't want to rotate the image around the center of the image, you can do it this way. You won't have to worry about what the offset of the canvas should be in relation to the image rotation, because the canvas is moved back to its original position after the image is drawn.
void rotate(Canvas c, Image image, Offset focalPoint, Size screenSize, double angle) {
  c.save();
  c.translate(screenSize.width / 2, screenSize.height / 2);
  c.rotate(angle);
  // To rotate around the center of the image, the focal point is the
  // image width and height divided by 2
  c.drawImage(image, focalPoint * -1, Paint());
  c.translate(-screenSize.width / 2, -screenSize.height / 2);
  c.restore();
}

OpenGL: How to lathe a 2D shape into 3D?

I have an OpenGL program (written in Delphi) that lets the user draw a polygon. I want to automatically revolve (lathe) it around an axis (say, the Y axis) to get a 3D shape.
How can I do this?
For simplicity, you could force at least one point to lie on the axis of rotation. You can do this easily by adding/subtracting the same value to all the x values, and the same value to all the y values, of the points in the polygon. It will retain the original shape.
The rest isn't really that hard. Pick an angle that is fairly small, say one or two degrees, and work out the coordinates of the polygon vertices as it spins around the axis. Then just join up the points with triangle fans and triangle strips.
Rotating a point around an axis is just basic trigonometry. At 0 degrees of rotation you have the points at their 2D coordinates with a value of 0 in the third dimension.
Let's assume the points are in X and Y and we are rotating around Y. The original X coordinate represents the hypotenuse. At 1 degree of rotation, we have:
sin(1) = z/hypotenuse
cos(1) = x/hypotenuse
(assuming degree-based trig functions)
To rotate a point (x, y) by angle T around the Y axis to produce a 3d point (x', y', z'):
y' = y
x' = x * cos(T)
z' = x * sin(T)
So for each point on the edge of your polygon you produce a circle of 360 points centered on the axis of rotation.
Now make a 3d shape like so:
create a GL 'triangle fan' by using your center point and the first array of rotated points
for each successive array, create a triangle strip using the points in the array and the points in the previous array
finish by creating another triangle fan centered on the center point and using the points in the last array
One thing to note is that the kinds of trig functions I've used usually measure angles in radians, while OpenGL (glRotatef) uses degrees. To convert degrees to radians, the formula is:
radians = degrees / 180 * pi
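To make the sweep concrete, here is a small sketch (written in Python only for brevity; the profile points are placeholders) that turns each 2D vertex into a ring of 3D points using exactly the y' = y, x' = x * cos(T), z' = x * sin(T) relations above:

```python
import math

def lathe(points_2d, steps=360):
    """Sweep a 2D profile (list of (x, y) pairs) around the Y axis."""
    rings = []
    for step in range(steps):
        t = math.radians(step * 360.0 / steps)
        # each profile vertex becomes one point on a circle around the Y axis
        rings.append([(x * math.cos(t), y, x * math.sin(t)) for (x, y) in points_2d])
    return rings  # join adjacent rings with triangle strips, cap the ends with fans

# example profile with one edge on the rotation axis (x = 0)
profile = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 2.0)]
rings = lathe(profile, steps=36)
```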
Essentially the strategy is to sweep the profile given by the user around the given axis and generate a series of triangle strips connecting adjacent slices.
Assume that the user has drawn the polygon in the XZ plane. Further, assume that the user intends to sweep around the Z axis (i.e. the line X = 0) to generate the solid of revolution, and that one edge of the polygon lies on that axis (you can generalize later once you have this simplified case working).
For simple enough geometry, you can treat the perimeter of the polygon as a function x = f(z), that is, assume there is a unique X value for every Z value. When we go to 3D, this function becomes r = f(z), that is, the radius is unique over the length of the object.
Now, suppose we want to approximate the solid with M "slices" each spanning 2 * Pi / M radians. We'll use N "stacks" (samples in the Z dimension) as well. For each such slice, we can build a triangle strip connecting the points on one slice (i) with the points on slice (i+1). Here's some pseudo-ish code describing the process:
double dTheta = 2.0 * pi / M;
double dZ = (zMax - zMin) / N;
// Iterate over "slices"
for (int i = 0; i < M; ++i) {
    double theta = i * dTheta;
    double theta_next = (i + 1) * dTheta;
    // Iterate over "stacks":
    for (int j = 0; j <= N; ++j) {
        double z = zMin + j * dZ;
        // Get cross-sectional radius at this Z location from your 2D model (was the
        // X coordinate in the 2D polygon):
        double r = f(z); // See above definition
        // Convert 2D to 3D by sweeping by the angle represented by this slice:
        double x = r * cos(theta);
        double y = r * sin(theta);
        // Get coordinates of the next slice over so we can join them with a triangle strip:
        double xNext = r * cos(theta_next);
        double yNext = r * sin(theta_next);
        // Add these two points to your triangle strip (heavy pseudocode):
        strip.AddPoint(x, y, z);
        strip.AddPoint(xNext, yNext, z);
    }
}
That's the basic idea. As sje697 said, you'll possibly need to add end caps to keep the geometry closed (i.e. a solid object, rather than a shell). But this should give you enough to get you going. This can easily be generalized to toroidal shapes as well (though you won't have a one-to-one r = f(z) function in that case).
If you just want it to rotate, then:
glRotatef(angle,0,1,0);
will rotate it around the Y-axis. If you want a lathe, then this is far more complex.
