What does an orthogonal matrix have to do with image processing?


I am trying to map mathematics to image processing, and I am very much a beginner in math. I read what an orthogonal matrix is at this link: http://people.revoledu.com/kardi/tutorial/LinearAlgebra/MatrixOrthogonal.html
How can I relate orthogonal matrices to image processing, or to any other application?

I wouldn't spend too much time thinking about it. Nonetheless.
The main place where we see orthogonal matrices in graphics/image-processing is rotation matrices. For example, if I want to rotate an image by t radians, I would (conceptually) transform that image by the matrix
[ cos(t) , -sin(t) ]
[ sin(t) , cos(t) ]
Meaning I would take a point [ x, y ] in my image, and transform it by that matrix to obtain [ x', y' ], the new location of that point. Were I to do this for every pixel in the image and place the new points on a blank canvas (that could fit them, obviously), I would see (roughly) the image rotated by t radians.
This is an example of an orthogonal matrix. To 'undo' the operation, I transform with the inverse of the matrix. But since this matrix is orthogonal, the inverse is just the transpose:
[ cos(t) , sin(t) ]
[ -sin(t) , cos(t) ]
If I applied that transform to my rotated image, I'd obtain the original (I'm glossing over details like filtering). That matrix represents the inverse operation of rotating by t radians. It is, in fact, a matrix for rotating by -t radians.
In this case it's actually very easy to see that: if you plug -t into the first matrix, since sin is an odd function and cos an even, you'll get exactly the transpose.
This is conceptually very simple compared to a general matrix inversion. We like orthogonal matrices because of that inverse-is-transpose property.
Just one practical example.
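The inverse-is-transpose property described above can be checked numerically. This is a minimal sketch using numpy; the helper name rotation_matrix, the angle and the sample point are made up for illustration.

```python
import numpy as np

def rotation_matrix(t):
    """2x2 matrix that rotates a point counter-clockwise by t radians."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s],
                     [s,  c]])

t = np.deg2rad(30.0)           # arbitrary example angle
R = rotation_matrix(t)
point = np.array([2.0, 1.0])   # an [x, y] location in the image (made up)

rotated = R @ point            # forward rotation
restored = R.T @ rotated       # undo it with the transpose, no general inverse needed

# Orthogonality: R^T R is the identity.
is_orthogonal = np.allclose(R.T @ R, np.eye(2))
# The transpose is the same matrix as a rotation by -t.
transpose_is_negative_rotation = np.allclose(R.T, rotation_matrix(-t))
```

Transposing only rearranges entries, whereas a general matrix inverse has to be solved for, which is why the property is convenient.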

Related

OpenCV: get perspective matrix from translation & rotation

I'm trying to verify my camera calibration, so I'd like to rectify the calibration images. I expect that this will involve using a call to warpPerspective but I do not see an obvious function that takes the camera matrix, and the rotation and translation vectors to generate the perspective matrix for this call.
Essentially I want to do the process described here (see especially the images towards the end) but starting with a known camera model and pose.
Is there a straightforward function call that takes the camera intrinsic and extrinsic parameters and computes the perspective matrix for use in warpPerspective?
I'll be calling warpPerspective after having called undistort on the image.
In principle, I could derive the solution by solving the system of equations defined at the top of the opencv camera calibration documentation after specifying the constraint Z=0, but I figure that there must be a canned routine that will allow me to orthorectify my test images.
In my searches, I'm finding it hard to wade through all of the stereo calibration results -- I only have one camera, but want to rectify the image under the constraint that I'm only looking at a planar test pattern.

Actually there is no need to involve an orthographic camera. Here is how you can get the appropriate perspective transform.
If you calibrated the camera using cv::calibrateCamera, you obtained a camera matrix K, a vector of lens distortion coefficients D for your camera and, for each image that you used, a rotation vector rvec (which you can convert to a 3x3 matrix R using cv::Rodrigues) and a translation vector T. Consider one of these images and the associated R and T. After you call cv::undistort using the distortion coefficients, the image will look as if it had been acquired by a camera with projection matrix K * [ R | T ].
Basically (as @DavidNilosek intuited), you want to cancel the rotation and get the image as if it had been acquired with a projection matrix of the form K * [ I | -C ], where C=-R.inv()*T is the camera position. For that, you have to apply the following transformation:
Hr = K * R.inv() * K.inv()
The only potential problem is that the warped image might go outside the visible part of the image plane. Hence, you can use an additional translation to solve that issue, as follows:
     [ 1  0  |         ]
Ht = [ 0  1  | -K*C/Cz ]
     [ 0  0  |         ]
where Cz is the component of C along the Oz axis.
Finally, with the definitions above, H = Ht * Hr is a rectifying perspective transform for the considered image.
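The construction above can be sketched in numpy as follows. The function name rectifying_homography and the example values for K, R and T are assumptions made up for illustration; in practice they would come from cv::calibrateCamera and cv::Rodrigues.

```python
import numpy as np

def rectifying_homography(K, R, T):
    """Build H = Ht @ Hr from the intrinsics K and one view's pose (R, T)."""
    T = np.asarray(T, dtype=float).reshape(3)
    C = -R.T @ T                       # camera position; R.inv() equals R.T for a rotation
    Hr = K @ R.T @ np.linalg.inv(K)    # cancels the rotation
    Ht = np.eye(3)
    Ht[:, 2] = -(K @ C) / C[2]         # third column is -K*C/Cz
    return Ht @ Hr

# Made-up example values, not taken from a real calibration.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
a = 0.3                                # tilt about the x axis, in radians
R = np.array([[1.0,       0.0,        0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a),  np.cos(a)]])
T = np.array([0.1, -0.2, 5.0])

H = rectifying_homography(K, R, T)
```

With these definitions, a point (X, Y, 0) on the calibration plane should land at (fx*X/Cz, fy*Y/Cz) after warping, so the result is a scaled copy of the plane. The matrix could then be passed to cv2.warpPerspective(undistorted, H, (width, height)); an extra scale or offset may still be needed to fit the result on the canvas.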

This is a sketch of what I mean by "solving the system of equations" (in Python):
import cv2
import numpy as np  # replaces the scipy top-level aliases (scipy.sqrt, scipy.dot, ...) used originally

# rvec = the rotation vector
# tvec = the translation vector
# A = the camera intrinsic matrix

def unit_vector(v):
    return v / np.sqrt(np.sum(v * v))

(fx, fy) = (A[0, 0], A[1, 1])
Ainv = np.array([[1.0 / fx, 0.0, -A[0, 2] / fx],
                 [0.0, 1.0 / fy, -A[1, 2] / fy],
                 [0.0, 0.0, 1.0]], dtype=np.float32)
R, _ = cv2.Rodrigues(rvec)  # Rodrigues returns (rotation matrix, jacobian)
Rinv = np.transpose(R)
# displacement between camera and world coordinate origin, in world coordinates
u = np.dot(Rinv, np.reshape(tvec, 3))  # flatten tvec so the arithmetic below stays 1-D
# corners of the image, for here hard coded
pixel_corners = [np.array(c, dtype=np.float32) for c in [(0 + 0.5, 0 + 0.5, 1), (0 + 0.5, 640 - 0.5, 1), (480 - 0.5, 640 - 0.5, 1), (480 - 0.5, 0 + 0.5, 1)]]
scene_corners = []
for c in pixel_corners:
    lhat = np.dot(Rinv, np.dot(Ainv, c))  # direction of the ray that the corner images, in world coordinates
    s = u[2] / lhat[2]
    # now we have the case that (s*lhat-u)[2]==0,
    # i.e. s is how far along the line of sight that we need
    # to move to get to the Z==0 plane.
    g = s * lhat - u
    scene_corners.append((g[0], g[1]))
# now we have: 4 pixel_corners (image coordinates), and 4 corresponding scene_coordinates
# can call cv2.getPerspectiveTransform on them and so on..

finding the real world coordinates of an image point

I have been searching many resources on the internet for days, but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since it is on a plane, all I need is the x and y position (not z). For this purpose I have chosen to go with image processing. The position and orientation of the camera (single view, not stereo) are fixed with respect to a reference coordinate system on the plane and are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert those coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site and others, a homographic transformation is given as:
p = C[R|T]P, where P is the real-world coordinate and p is the pixel coordinate (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation vector. I followed a tutorial on calibrating the camera with OpenCV (using the cameraCalibration source file); I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translation and rotation parameters of each image.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixels) and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 block and T is the right 3x1 column. According to the formula p = C[R|T]P, I assume that multiplying these parameter matrices by P (world) gives p (pixel). But what I need is to project the p (pixel) coordinates to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we can manipulate a transformation as P = [R|T]^-1 * C^-1 * p. However, this is in a Euclidean coordinate system, and I don't know whether such a thing is possible in homogeneous coordinates. Moreover, the 3x4 matrix [R|T] is not invertible, and I don't know whether this is the correct way to go.
The intrinsic and extrinsic parameters are known. All I need is the real-world coordinate projected on the ground plane. Since the point is on a plane, the coordinates will be 2-dimensional (depth is not important, which answers the usual objection to single-view geometry). The camera is fixed (position, orientation). How should I find the real-world coordinate of a point in an image captured by a camera (single view)?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, in the Calibration -> Homography section, it says: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], and s is an "up to" scale factor that I assume is related to the homography concept, though I don't understand it clearly. q is the pixel coordinate and Q is the real-world coordinate. To get the real-world coordinate (on the chessboard plane) of an object detected on the image plane, it says to set Z=0, so the third column of W is not needed either (the rotation about that axis, I assume); trimming these unnecessary parts, W becomes a 3x3 matrix. H=MW is then a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply q by it to get Q=[X Y 1], where the Z coordinate was trimmed.
I applied the algorithm described above, and I got results that cannot lie between the image corners (the image plane was parallel to the camera plane, about 30 cm in front of the camera, and I got results like 3000). The chessboard square sizes were entered in millimeters, so I assume the output real-world coordinates are also in millimeters. Anyway, I am still trying things. By the way, the raw results are very large at first, but I divide all values in Q by the third component of Q to get (X, Y, 1).
FINAL EDIT
I could not get the camera calibration methods to work. Anyway, I should have started with the perspective projection and transform. This way I got very good estimates with a perspective transform between the image plane and the physical plane (having generated the transform from 4 pairs of corresponding coplanar points on both planes). I then simply applied the transform to the image pixel points.

You said "I have the intrinsic camera matrix, and translational and rotational params of each of the image", but these are the translation and rotation from your camera to your chessboard. They have nothing to do with your circle. However, if you really have translation and rotation matrices, then getting the 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C^-1 * [u, v, 1]^T, where u=col-w/2 and v=h/2-row, col and row are the image column and row, and w and h are the image width and height. As a result you will obtain a 3D point in so-called camera normalized coordinates, p = [x, y, z]^T. All you need to do now is subtract the translation and apply the transposed rotation: P = R^T (p - T). The order of operations is the reverse of the original, which was rotate and then translate; note that the transposed rotation performs the inverse of the original rotation but is much faster to calculate than R^-1.
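For a point known to lie on the Z=0 plane of the world frame, the back-projection can be sketched with a plane-to-image homography, as in the Z=0 approach quoted in the question. This is a minimal numpy sketch under stated assumptions: pixel_to_plane is a hypothetical helper, K is assumed to be an OpenCV-style intrinsic matrix that already contains the principal point (so raw pixel coordinates are used directly rather than centered ones), and all numeric values are made up.

```python
import numpy as np

def pixel_to_plane(K, R, T, u, v):
    """Map pixel (u, v) to (X, Y) on the world plane Z=0, given p ~ K [R|T] P."""
    T = np.asarray(T, dtype=float).reshape(3)
    # With Z=0 the third column of R drops out, leaving a 3x3 homography.
    H = K @ np.column_stack((R[:, 0], R[:, 1], T))    # plane -> pixel
    Q = np.linalg.solve(H, np.array([u, v, 1.0]))     # pixel -> plane, up to scale
    return Q[:2] / Q[2]                               # divide out the scale factor

# Made-up camera and pose for a round-trip check.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
a = 0.3
R = np.array([[1.0,       0.0,        0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a),  np.cos(a)]])
T = np.array([0.1, -0.2, 5.0])

world = np.array([0.4, -0.3, 0.0])       # a point on the plane
uvw = K @ (R @ world + T)                # forward projection
u, v = uvw[:2] / uvw[2]
recovered = pixel_to_plane(K, R, T, u, v)
```

The result is in the same units as T (for example millimeters if the chessboard squares were entered in millimeters), and it is only meaningful if R and T describe the plane the object actually lies on.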

Warping Perspective using arbitrary rotation angle

I have an image of a chessboard taken at some angle. Now I want to warp the perspective so that the chessboard looks as if it had been taken directly from above.
I know that I can try to use 'findHomography' between matched points, but I wanted to avoid it and use e.g. rotation data from mobile sensors to build the homography matrix on my own. I calibrated my camera to get the intrinsic parameters. Then, let's say the following image has been taken at an angle of ~60 degrees around the x-axis. I thought that all I have to do is multiply the camera matrix by the rotation matrix to obtain the homography matrix. I tried to use the following code, but it looks like I'm not understanding something correctly because it doesn't work as expected (the result image is completely black or white).
import cv2
import numpy as np
import math
camera_matrix = np.array([[5.7415988502105745e+02, 0., 2.3986181527877352e+02],
                          [0., 5.7473682183375217e+02, 3.1723734404756237e+02],
                          [0., 0., 1.]])
distortion_coefficients = np.array([1.8662919398453856e-01, -7.9649812697463640e-01,
                                    1.8178068172317731e-03, -2.4296638847737923e-03,
                                    7.0519002388825025e-01])
theta = math.radians(60)
rotx = np.array([[1, 0, 0],
                 [0, math.cos(theta), -math.sin(theta)],
                 [0, math.sin(theta), math.cos(theta)]])
homography = np.dot(camera_matrix, rotx)
im = cv2.imread('data/chess1.jpg')
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
im_warped = cv2.warpPerspective(gray, homography, (480, 640), flags=cv2.WARP_INVERSE_MAP)
cv2.imshow('image', im_warped)
cv2.waitKey()
pass
I also have distortion_coefficients after calibration. How can those be incorporated into the code to improve results?

This answer is awfully late by several years, but here it is ...
(Disclaimer: my use of terminology in this answer may be imprecise or incorrect. Please do look up on this topic from other more credible sources.)
Remember:
Because you only have one image (view), you can only compute 2D homography (perspective correspondence between one 2D view and another 2D view), not the full 3D homography.
Because of that, the nice intuitive understanding of the 3D homography (rotation matrix, translation matrix, focal distance, etc.) is not available to you.
What we say is that with 2D homography you cannot factorize the 3x3 matrix into those nice intuitive components like 3D homography does.
You have one matrix - (which is the product of several matrices unknown to you) - and that is it.
However,
OpenCV provides a getPerspectiveTransform function which solves for the 3x3 perspective matrix (using a homogeneous coordinate system) of a 2D homography between two planar quadrilaterals.
To use this function,
Find the four corners of the chessboard on the image. These will be your source coordinates.
Supply four rectangle corners of your choice. These will be your destination coordinates.
Pass the source coordinates and destination coordinates into the getPerspectiveTransform to generate a 3x3 matrix that is able to dewarp your chessboard to an upright rectangle.
Notes to remember:
Mind the ordering of the four corners.
If the source coordinates are picked in clockwise order, the destination also needs to be picked in clockwise order.
Likewise, if counter-clockwise order is used, do it consistently.
Likewise, if z-order (top left, top right, bottom left, bottom right) is used, do it consistently.
Failure to order the corners consistently will generate a matrix that executes the point-to-point correspondence exactly (mathematically speaking), but will not generate a usable output image.
The aspect ratio of the destination rectangle can be chosen arbitrarily. In fact, it is not possible to deduce the "original aspect ratio" of the object in world coordinates, because "this is 2D homography, not 3D".
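To make the four-corner procedure concrete, here is a numpy-only sketch of the kind of linear system such a four-point solve involves. It is not OpenCV's implementation; the helper names and corner coordinates are made up, and in practice one would call cv2.getPerspectiveTransform(src, dst) with float32 arrays instead.

```python
import numpy as np

def perspective_from_4_points(src, dst):
    """Solve for the 3x3 matrix H (with H[2,2] = 1) mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 8 unknowns.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pt):
    """Apply H to a 2D point and divide out the homogeneous scale."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

# Made-up chessboard corners in the image, clockwise from the top-left ...
src = [(120, 80), (430, 110), (470, 390), (90, 350)]
# ... and the destination rectangle, listed in the same clockwise order.
dst = [(0, 0), (400, 0), (400, 400), (0, 400)]

H = perspective_from_4_points(src, dst)
```

Listing dst in a different order than src would still give an exact point-to-point solution but a twisted, unusable warp, which is the ordering pitfall noted above.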

One problem is that to multiply by a camera matrix you need some concept of a z coordinate. You should start by getting basic image warping given Euler angles to work before you think about distortion coefficients. Have a look at this answer for a slightly more detailed explanation and try to duplicate my result. The idea of moving your image down the z axis and then projecting it with your camera matrix can be confusing; let me know if any part of it does not make sense.
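One common way to turn a rotation into a pixel-to-pixel warp, assuming the camera rotates about its own centre without translating, is H = K * R * K^-1. This is a sketch of that standard construction, not necessarily what the linked answer does; rotation_homography is a hypothetical helper, and the camera matrix and angle are the ones from the question.

```python
import math
import numpy as np

def rotation_homography(K, R):
    """Homography induced by a pure camera rotation R: pixels -> rays -> rotate -> pixels."""
    return K @ R @ np.linalg.inv(K)

camera_matrix = np.array([[5.7415988502105745e+02, 0., 2.3986181527877352e+02],
                          [0., 5.7473682183375217e+02, 3.1723734404756237e+02],
                          [0., 0., 1.]])
theta = math.radians(60)
rotx = np.array([[1, 0, 0],
                 [0, math.cos(theta), -math.sin(theta)],
                 [0, math.sin(theta), math.cos(theta)]])

H = rotation_homography(camera_matrix, rotx)

# Where does the principal point end up? A quick way to see how far the content moves.
cx, cy = camera_matrix[0, 2], camera_matrix[1, 2]
p = H @ np.array([cx, cy, 1.0])
moved_principal_point = p[:2] / p[2]
```

With a 60 degree rotation the principal point moves by roughly a thousand pixels, so most of the warped content may fall outside a 480x640 canvas unless a translation or scale is composed with H. Whether H or its inverse should be passed to cv2.warpPerspective depends on the rotation direction and on the WARP_INVERSE_MAP flag.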

You do not need to calibrate the camera nor estimate the camera orientation (the latter, however, in this case would be very easy: just find the vanishing points of those orthogonal bundles of lines, and take their cross product to find the normal to the plane, see Hartley & Zisserman's bible for details).
The only thing you need to do is estimate the homography that maps the checkers to squares, then apply it to the image.

How can I transform an image using matrices R and T (extrinsic parameters matrices) in opencv?

I have a rotation-translation matrix [R T] (3x4).
Is there a function in opencv that performs the rotation-translation described by [R T]?

A lot of solutions to this question I think make hidden assumptions. I will try to give you a quick summary of how I think about this problem (I have had to think about it a lot in the past). Warping between two images is a 2 dimensional process accomplished by a 3x3 matrix called a homography. What you have is a 3x4 matrix which defines a transform in 3 dimensions. You can convert between the two by treating your image as a flat plane in 3 dimensional space. The trick then is to decide on the initial position in world space of your image plane. You can then transform its position and project it onto a new image plane with your camera intrinsics matrix.
The first step is to decide where your initial image lies in world space. Note that this does not have to be the same as what your initial R and T matrices specify. Those are in world coordinates; here we are talking about the image created by that world, in which all the objects have been flattened into a plane. The simplest decision is to set the image at a fixed displacement on the z axis with no rotation. From this point on I will assume no rotation. If you would like to see the general case I can provide it, but it is slightly more complicated.
Next you define the transform between your two images in 3d space. Since you have both transforms with respect to the same origin, the transform from [A] to [B] is the same as the transform from [A] to your origin, followed by the transform from the origin to [B]. You can get that by
transform = [B]*inverse([A])
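Since a 3x4 pose matrix cannot be inverted directly, the product above is usually formed after padding each pose to 4x4 with a [0 0 0 1] row. A minimal numpy sketch, with made-up poses and hypothetical helper names:

```python
import numpy as np

def to_4x4(pose_3x4):
    """Pad a 3x4 [R|T] pose with the row [0, 0, 0, 1] so it can be inverted."""
    M = np.eye(4)
    M[:3, :] = pose_3x4
    return M

def relative_transform(pose_a, pose_b):
    """transform = [B] * inverse([A]), returned as a 4x4 matrix."""
    return to_4x4(pose_b) @ np.linalg.inv(to_4x4(pose_a))

# Made-up poses: A is a pure translation, B adds a small rotation about z.
pose_a = np.hstack((np.eye(3), [[0.0], [0.0], [2.0]]))
g = 0.2
Rz = np.array([[np.cos(g), -np.sin(g), 0.0],
               [np.sin(g),  np.cos(g), 0.0],
               [0.0,        0.0,       1.0]])
pose_b = np.hstack((Rz, [[0.5], [0.0], [2.0]]))

transform = relative_transform(pose_a, pose_b)
```

The top three rows of the result form a 3x4 matrix again, which is the shape the conversion function expects for its pose argument.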
Now conceptually what you need to do is to take your first image, project its pixels onto the geometric interpretation of your image in 3d space, then transform those pixels in 3d space by the transform above, then project them back onto a new 2d image with your camera matrix. Those steps need to be combined into a single 3x3 matrix.
cv::Matx33f convert_3x4_to_3x3(cv::Matx34f pose, cv::Matx33f camera_mat, float zpos)
{
    //converted condenses the 3x4 matrix which transforms a point in world space
    //to a 3x3 matrix which transforms a point in world space. Instead of
    //multiplying pose by a 4x1 3d homogeneous vector, by specifying that the
    //incoming 3d vectors will ALWAYS have a z coordinate of zpos, one can instead
    //multiply converted by a homogeneous 2d vector and get the same output for x and y.
    cv::Matx33f converted(pose(0,0),pose(0,1),pose(0,2)*zpos+pose(0,3),
                          pose(1,0),pose(1,1),pose(1,2)*zpos+pose(1,3),
                          pose(2,0),pose(2,1),pose(2,2)*zpos+pose(2,3));
    //This matrix will take a homogeneous 2d coordinate and "projects" it onto a
    //flat plane at zpos. The x and y components of the incoming homogeneous 2d
    //coordinate will be correct, the z component is dropped.
    cv::Matx33f projected(1,0,0,
                          0,1,0,
                          0,0,zpos);
    projected = projected*camera_mat.inv();
    //now we have the pieces. A matrix which can take an incoming 2d point, and
    //convert it into a pseudo 3d point (x and y correspond to 3d, z is unused)
    //and a matrix which can take our pseudo 3d point and transform it correctly.
    //Now we just need to turn our transformed pseudo 3d point back into a 2d point
    //in our new image, to do that simply multiply by the camera matrix.
    return camera_mat*converted*projected;
}
This is probably a more complicated answer than you were looking for, but I hope it gives you an idea of what you are asking. This can be very confusing and I glossed over some parts of it quickly, so feel free to ask for clarification. If you need the solution to work without the assumption that the initial image appears without rotation, let me know; I just didn't want to make it more complicated than it needed to be.

cvRemap to replace cvWarpPerspective

The transformation below is what I want to do.
For each tile in source image, I know the coordinate of each corner, and I know the coordinate of each corresponding corner in the output image, so I can call cvWarpPerspective to warp each tile and then connect the quadrangles together to get the final output image.
Can cvRemap do this in one transformation? If yes, how do I construct the map (mapx, and mapy) from the coordinate that I have so to pass to the cvRemap function? I've searched the EmguCV documentation but could not find a cvRemap example.

I have no experience with emgu (actually I have a low esteem for it) but I can explain about remap and warpPerspective. It may help you find the corresponding functions in emgu.
remap takes an input picture and a coordinate relocation map, and applies the map to produce the output. The map stores, for each output pixel, where to fetch its value from in the input: for example, output pixel (3, 5) takes its value from input pixel (1, 4). Output pixels whose source falls outside the input are filled with 0 or another value, depending on the extra parameters. Note that it also uses some interpolation for smooth results.
warpPerspective takes a geometric perspective transform, calculates internally the map for the transform, and then calls remap() to apply it to the input. Actually, many functions in OpenCV use remap, or can use it. warpAffine, lens correction, and others build their custom maps, then call remap to apply them.
The perspective transform is defined by a 3x3 matrix H. Each coordinate in the input image is shifted according to H, up to a scale factor w that is divided out afterwards:
[ x'w ]   [ h11 h12 h13 ]   [ x ]
[ y'w ] = [ h21 h22 h23 ] * [ y ]
[  w  ]   [ h31 h32 h33 ]   [ 1 ]
warpPerspective applies the inverse transform for each point in the destination image, to find out where in the source image is the pixel that should be moved in, and stores that info in the map.
You can take that code and make a custom function in your app, but I do not know how easy it is to do in C#. C++ would have been a piece of cake.
Final note: I have used the term map, although there are two map parameters in the remap() function. And to be more confusing, they can have completely different meanings.
The first valid combination is where mapx contains the coordinate transforms of the x coordinate, in a width-by-height, one-channel image, and mapy is the corresponding map for the y dimension. The coordinates are floating-point values, reflecting the fact that coordinate transforms from one image to another do not map exactly to integer values. For example, pixel (2, 5) may map to (3.456, 7.293).
The second possibility is to store the integer coordinates for both x and y in mapx, in two channels, and keep an interpolation weights table in the second parameter, mapy. It is usually far easier to generate the first format; however, the second is faster to process. To understand the second format you should read the OpenCV sources, because it's not documented anywhere.
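The first (floating-point, two separate maps) format can be generated for a perspective transform with a few lines of numpy. This is a sketch under stated assumptions: perspective_maps is a hypothetical helper, H maps source pixels to destination pixels, and the example H is a made-up pure translation.

```python
import numpy as np

def perspective_maps(H, width, height):
    """For every destination pixel, compute the source coordinate to fetch it from."""
    H_inv = np.linalg.inv(H)                       # destination -> source
    xs, ys = np.meshgrid(np.arange(width, dtype=np.float32),
                         np.arange(height, dtype=np.float32))
    dst = np.stack((xs, ys, np.ones_like(xs)), axis=0).reshape(3, -1)
    src = H_inv @ dst                              # homogeneous source coordinates
    map_x = (src[0] / src[2]).reshape(height, width).astype(np.float32)
    map_y = (src[1] / src[2]).reshape(height, width).astype(np.float32)
    return map_x, map_y

# Made-up example: shift the image 10 pixels right and 5 pixels down.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0,  5.0],
              [0.0, 0.0,  1.0]])
map_x, map_y = perspective_maps(H, width=8, height=6)
```

The two arrays have the shape and float32 type that cv2.remap(src, map_x, map_y, cv2.INTER_LINEAR) accepts. For the tiled case in the question, one option might be to fill each tile's region of map_x and map_y from that tile's own H and then call remap once for the whole image.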
