SVD not giving Rotation and Translation matrix from stereo essential matrix - opencv

I am trying to extract the rotation and translation from the fundamental matrix. I used the intrinsic matrices to compute the essential matrix, but the SVD is not giving the expected results. So I composed an essential matrix from known rotation and translation matrices and ran my SVD code on it to recover them, and found that the SVD code is wrong.
I created the essential matrix using these rotation and translation matrices:
R = np.array([[ 0.99965657,  0.02563432, -0.00544263],
              [-0.02596087,  0.99704732, -0.07226806],
              [ 0.00357402,  0.07238453,  0.9973704 ]])
T = [-0.1679611706725666, 0.1475313058767286, -0.9746915198833979]
tx = np.array([[0, -T[2], T[1]], [T[2], 0, -T[0]], [-T[1], T[0], 0]])
E = R.dot(tx)
# E output:
# [[-0.02418259,  0.97527093,  0.15178621],
#  [-0.96115177, -0.01316561,  0.16363519],
#  [-0.21769595, -0.16403593,  0.01268507]]
Now I try to recover the rotation and translation using SVD:
U,S,V = np.linalg.svd(E)
diag_110 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])
newE = U.dot(diag_110).dot(V.T)
U,S,V = np.linalg.svd(newE)
W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
Z = np.array([[0, 1, 0],[-1, 0, 0],[0, 0, 0]])
R1 = U.dot(W).dot(V.T)
R2 = U.dot(W.T).dot(V.T)
T = U.dot(Z).dot(U.T)
T = [T[1, 0], -T[2, 0], T[2, 1]]
'''
Output:
R1 : [[-0.99965657, -0.00593909,  0.02552386],
      [ 0.02596087, -0.35727319,  0.93363906],
      [-0.00357402, -0.93398105, -0.35730468]]
R2 : [[-0.90837444, -0.20840016, -0.3625262 ],
      [ 0.26284261,  0.38971602, -0.8826297 ],
      [-0.32522244,  0.89704559,  0.29923163]]
T  : [-0.1679611706725666, 0.1475313058767286, -0.9746915198833979]
'''
What is wrong with the SVD code? I referred to the code here and here.

Your R1 output is a left-handed and axis-permuted version of your initial (ground-truth) rotation matrix: notice that the first column is opposite to the ground-truth, and the second and third are swapped, and that the determinant of R1 is ~= -1 (i.e. it's a left-handed frame).
This happens because the SVD returns unitary matrices U and V with no guarantee on their parity: their determinants may be +1 or -1. In addition, you multiplied by an axis-permutation matrix W. It is up to you to flip or permute the axes so that the recovered rotation has the correct handedness. You do that by enforcing constraints from the images and the scene, and from the known order of the cameras (i.e. knowing which camera is the left one).
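For reference, here is a minimal sketch of the flip described above, in the style of OpenCV's decomposeEssentialMat: negate U or Vt whenever its determinant is negative, so both factors are proper rotations before forming the candidates. Note that np.linalg.svd already returns V transposed, so no extra .T is needed. The function name is my own.

```python
import numpy as np

def decompose_essential(E):
    """Split an essential matrix into its two rotation candidates and a
    translation direction, forcing right-handed (det = +1) rotations."""
    U, S, Vt = np.linalg.svd(E)      # Vt is already V transposed in numpy
    # SVD guarantees nothing about parity: flip any factor with det = -1.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    R1 = U @ W @ Vt                  # the two rotation candidates
    R2 = U @ W.T @ Vt
    t = U[:, 2]                      # translation direction (sign ambiguous)
    return R1, R2, t
```

The true pose is one of the four combinations (R1, ±t), (R2, ±t); it is selected exactly by the scene constraints mentioned above, i.e. by requiring that triangulated points lie in front of both cameras.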

Related

Find vertical 1px lines with OpenCV

I have an image something like the image below (on the left):
I want to extract only the pixels in red on the right: the pixels that belong to a 1px vertical line, but not to any thicker line or other region with more than 1 adjacent black pixel. The image is bitonal.
I have so far tried a morphology OPEN with a vertical kernel (10 px, which is fine for my purposes) and a horizontal kernel, and taken the difference, but this needs an awkward shift and leaves some "speckles":
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 10))
vertical_mask1 = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel,
                                  iterations=1)
horz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 1))
horz_mask = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horz_kernel,
                             iterations=1)
M = np.float32([[1, 0, -1], [0, 1, 1]])
rows, cols = horz_mask.shape
vertical_mask = cv2.warpAffine(horz_mask, M, (cols, rows))
result = cv2.bitwise_and(thresh, cv2.bitwise_not(horz_mask))
What is the correct way to isolate the 1px lines (and only the 1px lines)?
In the general case, for other kernels, this question is: how do I find all pixels in the image that are in regions that the kernel "fits inside" (and then a subtraction to get my desired result)?
That's basically (binary) template matching. You need to derive proper templates from your "kernels". For larger "kernels", that might involve using masks for these templates, too, cf. cv2.matchTemplate.
What's the most important feature for a single pixel vertical line? The left and right neighbour of the current pixel must be 0. So, the template to match is [0, 1, 0]. By using the TemplateMatchMode cv2.TM_SQDIFF_NORMED, perfect matches will lead to close to 0 values in the result array.
You can mask those locations, and dilate according to the size of your template. Then, you use bitwise_and to extract the actual pixels that belong to your template.
Here's some code with a few templates ("kernels"):
import cv2
import numpy as np

img = cv2.imread('AapJk.png', cv2.IMREAD_GRAYSCALE)[:, :50]

vert_line = np.array([[0, 1, 0]], np.uint8)
cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], np.uint8)
corner = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 1]], np.uint8)

for i_k, k in enumerate([vert_line, cross, corner]):
    m, n = k.shape
    img_tmp = 1 - img // 255
    mask = cv2.matchTemplate(img_tmp, k, cv2.TM_SQDIFF_NORMED) < 10e-6
    mask = cv2.dilate(mask.astype(np.uint8), np.ones((m, n)), anchor=(n-1, m-1))
    m, n = mask.shape
    mask = cv2.bitwise_and(img_tmp[:m, :n], mask)
    out = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    roi = out[:m, :n]
    roi[mask.astype(bool), :] = [0, 0, 255]
    cv2.imwrite('{}.png'.format(i_k), out)
Vertical line:
Cross:
Bottom right corner 3 x 3:
Larger templates ("kernels") most likely will require additional masks, depending on how many or which neighbouring pixels should be considered or not.
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.3
NumPy: 1.20.3
OpenCV: 4.5.2
----------------------------------------

How to create a 2D perspective transform matrix from individual components?

I am trying to create a 2D perspective transform matrix from individual components like translation, rotation, scale, and shear, but in the end the matrix does not produce a true perspective effect like the image below. I think I am missing some component in the code that I wrote to create the matrix. Could someone help me add the missing components and their formulation in the function below? I have used the OpenCV library for my code.
cv::Mat getPerspMatrix2D(double rz, double s, double tx, double ty, double shx, double shy)
{
    cv::Mat R = (cv::Mat_<double>(3,3) <<
                 cos(rz), -sin(rz), 0,
                 sin(rz),  cos(rz), 0,
                       0,        0, 1);

    cv::Mat S = (cv::Mat_<double>(3,3) <<
                 s, 0, 0,
                 0, s, 0,
                 0, 0, 1);

    cv::Mat Sh = (cv::Mat_<double>(3,3) <<
                   1, shx, 0,
                 shy,   1, 0,
                   0,   0, 1);

    cv::Mat T = (cv::Mat_<double>(3,3) <<
                 1, 0, tx,
                 0, 1, ty,
                 0, 0, 1);

    return T * Sh * S * R;
}
The keywords are homography and 8 DOF. As shown in 1 and 2, there exist two extra coefficients for perspective transformation, but a second step is needed to apply them. I'm not familiar with OpenCV, but I hope to answer your question, a bit late, in a basic way ;-)
Step 1
You can imagine lx as describing a vanishing point on the x axis. The image shows a31 = lx = 1; lx = 100 gives less transformation, and lx = 0 puts the vanishing point infinitely far away, which means no perspective transform (the identity matrix).
     [ 1  0  0]
PL = [ 0  1  0]
     [lx ly  1]
lx and ly are the perspective foreshortening parameters.
Step 2
When you apply the matrix to a point, P x [u; v; 1], you will notice that the last value of the result is sometimes different from 1. For an affine transformation it is always 1; for a perspective projection it is not. In this second step the result is scaled to make the last coefficient 1. This division is part of the effect.
Your Example Image
Image' = P4 x P3 x P2 x P1 x Image
1. Translate the centre of the blue rectangle to the origin: tx = -w/2 and ty = -h/2 (= P1).
2. Apply the perspective projection with ly = h (to put both sides at an angle).
3. Optionally translate back so that all points lie in one quadrant.
4. Optionally scale to the desired size.
The division from Step 2 above can be done after 2.) or at the end.
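As a small numpy illustration of both steps (the lx/ly values here are made up): build PL, multiply a pixel written as [u, v, 1], and divide by the last coordinate.

```python
import numpy as np

def apply_homography(P, u, v):
    # Step 2: divide by the last coordinate so the result is a pixel again.
    x, y, w = P @ np.array([u, v, 1.0])
    return x / w, y / w

# Step 1: a perspective-only homography with lx, ly in the bottom row.
lx, ly = 0.0, 0.01                  # assumed example foreshortening values
PL = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [lx,  ly,  1.0]])
```

With ly > 0, points further down the image are divided by a larger w and pulled toward the origin, which is exactly the foreshortening toward a vanishing point described above.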

Inverse Perspective Transform?

I am trying to find the bird's eye image from a given image. I also have the rotations and translations (and the intrinsic matrix) required to convert it into the bird's eye plane. My aim is to find an inverse homography matrix (3x3).
rotation_x = np.asarray([[1, 0, 0, 0],
                         [0, np.cos(R_x), -np.sin(R_x), 0],
                         [0, np.sin(R_x),  np.cos(R_x), 0],
                         [0, 0, 0, 1]], np.float32)

translation = np.asarray([[1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 1, -t_y / (dp_y * np.sin(R_x))],
                          [0, 0, 0, 1]], np.float32)

intrinsic = np.asarray([[s_x * f / dp_x, 0, 0, 0],
                        [0, 1 * f / dp_y, 0, 0],
                        [0, 0, 1, 0]], np.float32)

# The projection matrix to lift image coordinates into the 3-D domain,
# from (x, y, 1) to (x, y, 0, 1); not sure if this is the right approach
projection = np.asarray([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 0],
                         [0, 0, 1]], np.float32)

homography_matrix = intrinsic @ translation @ rotation_x @ projection
inv = cv2.warpPerspective(source_image, homography_matrix, (w, h),
                          flags=cv2.INTER_CUBIC | cv2.WARP_INVERSE_MAP)
My question is: is this the right approach? I can manually set a suitable t_y and R_x, but it does not work for the (t_y, R_x) that is actually provided.
First premise: your bird's eye view will be correct only for one specific plane in the image, since a homography can only map planes (including the plane at infinity, corresponding to a pure camera rotation).
Second premise: if you can identify a quadrangle in the first image that is the projection of a rectangle in the world, you can directly compute the homography that maps the quad into the rectangle (i.e. the "bird's eye view" of the quad), and warp the image with it, setting the scale so the image warps to a desired size. No need to use the camera intrinsics. Example: you have the image of a building with rectangular windows, and you know the width/height ratio of these windows in the world.
Sometimes you can't find rectangles, but your camera is calibrated, and thus the problem you describe comes into play. Let's do the math. Assume the plane you are observing in the given image is Z=0 in world coordinates. Let K be the 3x3 intrinsic camera matrix and [R, t] the 3x4 matrix representing the camera pose in XYZ world frame, so that if Pc and Pw represent the same 3D point respectively in camera and world coordinates, it is Pc = R*Pw + t = [R, t] * [Pw.T, 1].T, where .T means transposed. Then you can write the camera projection as:
s * p = K * [R, t] * [Pw.T, 1].T
where s is an arbitrary scale factor and p is the pixel that Pw projects onto. But if Pw=[X, Y, Z].T is on the Z=0 plane, the 3rd column of R only multiplies zeros, so we can ignore it. If we then denote with r1 and r2 the first two columns of R, we can rewrite the above equation as:
s * p = K * [r1, r2, t] * [X, Y, 1].T
But K * [r1, r2, t] is a 3x3 matrix that transforms points on a 3D plane to points on the camera plane, so it is a homography.
If the plane is not Z=0, you can repeat the same argument replacing [R, t] with [R, t] * inv([Rp, tp]), where [Rp, tp] is the coordinate transform that maps a frame on the plane, with the plane normal being the Z axis, to the world frame.
Finally, to obtain the bird's eye view, you select a rotation R whose third column (the components of the world's Z axis in camera frame) is opposite to the plane's normal.
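The K * [r1, r2, t] construction above can be written down directly. Here is a small numpy sketch (the K, R, t values in the demo are arbitrary, chosen only for illustration); the resulting 3x3 matrix can then be passed to cv2.warpPerspective, inverted as needed, to synthesize the view.

```python
import numpy as np

def plane_homography(K, R, t):
    """Homography mapping (X, Y) on the world plane Z=0 to pixels:
    s * p = K @ [r1 | r2 | t] @ [X, Y, 1]^T"""
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    return H / H[2, 2]              # the overall scale s is arbitrary

# Sanity check: project a plane point both ways (arbitrary example pose).
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
R = np.eye(3)                       # camera axes aligned with the world
t = np.array([0., 0., 5.])          # plane 5 units in front of the camera
H = plane_homography(K, R, t)
```

Mapping a plane point (X, Y) through H and through the full projection K(R*Pw + t) with Pw = (X, Y, 0) gives the same pixel, which is the claim the derivation above makes.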

Rotate an image about its centre on a GPU

Assume an image I of dimension (2, 2). Graphical coordinates C are given as:
C = [[0, 0], [1, 0],
[0, 1], [1, 1]]
Objective: rotate I by 90 degrees about the centre (NOT the origin).
Transformation Matrix:
TRotate(90) = [[0, 1], [-1, 0]]
(Assume each coordinate pair can be transformed in lockstep (i.e. on a GPU).)
Method:
Convert graphical coordinates to mathematical coordinates with the origin as the centre of the image.
Apply the transformation matrix.
Convert back to graphical coordinates.
E.g.:
Convert to mathematical coordinates:
tx' = tx - width / 2
ty' = ty - height / 2
C' = [[-1, -1], [0, -1],
      [-1,  0], [0,  0]]
Apply the transformation matrix:
C" = [[-1, 1], [-1, 0],
      [ 0, 1], [ 0, 0]]
Convert back:
C" = [[0, 2], [0, 1],
      [1, 2], [1, 1]]
the conversion back is out of bounds...
I'm really battling to get a proper rotation about a 'centre of gravity' working. I think that my conversion to 'mathematical coordinates' is wrong.
I had better luck by rather converting the coordinates into the following:
C' =[[-1, -1], [1, -1],
[-1, 1], [1, 1]]
I achieved this transformation by observing that if the origin existed in between the four pixels, with the +ve y-axis pointing down, and the +ve x-axis to the right, then the point (0,0) would be (-1, -1) and so on for the rest. (The resultant rotation and conversion give the desired result).
However, I can't find the right kind of transform to apply to the coordinates to place the origin at the centre. I've tried a transformation matrix using homogenous coordinates but this does not work.
Edit
For Malcolm's advice:
position vector:
[0]
[0]
[1]
Translate by subtracting width/2 == 1:
[-1]
[-1]
[ 0]
Rotate by multiplying with the transformation matrix:
| 0  1  0|   |-1|   |-1|
|-1  0  0| x |-1| = | 1|
| 0  0  1|   | 0|   | 0|
You need an extra row and column in your matrix to carry the translation by x and the translation by y, and an extra component in your position vector, call it w, which is hard-coded to 1. This is a trick to ensure that the translation can be performed with standard matrix multiplication.
Since you need a translation followed by a rotation, you set up the translation matrix, then multiply it by the rotation matrix (make them both 3x3s, padding the rotation out with the identity, if you're shaky on matrix multiplication). That way the translations and rotations are interwoven with each other.
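Putting that advice together, here is a small numpy sketch of the pipeline for the 2x2 example, using 3x3 homogeneous matrices. One assumption worth flagging: the centre of an N-pixel axis is taken as (N - 1) / 2, not N / 2, which is what kept the earlier results out of bounds. The rotation convention here embeds [[0, -1], [1, 0]]; transpose it for the opposite direction.

```python
import numpy as np

def rotate_coords_90_about_centre(coords, width, height):
    """Rotate pixel coordinates 90 degrees about the image centre:
    translate the centre to the origin, rotate, translate back."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0  # centre of the pixel grid
    T_in  = np.array([[1., 0., -cx], [0., 1., -cy], [0., 0., 1.]])
    R90   = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    T_out = np.array([[1., 0.,  cx], [0., 1.,  cy], [0., 0., 1.]])
    M = T_out @ R90 @ T_in          # one matrix, applied to all pairs in lockstep
    # Append w = 1 to every coordinate pair, multiply, then drop w again.
    homo = np.hstack([np.asarray(coords, float), np.ones((len(coords), 1))])
    return (M @ homo.T).T[:, :2]
```

For the 2x2 image all four rotated coordinates land back inside the 0..1 range, with no out-of-bounds results.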

Compute homography for a virtual camera with opencv

I have an image of a planar surface, and I want to compute an image warping that gives me a synthetic view of the same planar surface seen from a virtual camera located at another point in the 3d space.
So, given an image I1 I want to compute an image I2 that represents the image I1 seen from a virtual camera.
In theory, there exists an homography that relates these two images.
How do I compute this homography given the camera pose of the virtual camera, as well as its matrix of internal parameters?
I'm using opencv's warpPerspective() function to apply this homography and generate the image warped.
Thanks in advance.
Ok, found this post (Opencv virtually camera rotating/translating for bird's eye view), where I found some code doing what I needed.
However, I noticed that the rotation in Y had a sign error (-sin instead of sin). Here's my solution adapted to Python. I'm new to Python, sorry if I'm doing something ugly.
import cv2
import numpy as np

rotXdeg = 90
rotYdeg = 90
rotZdeg = 90
f = 500
dist = 500

def onRotXChange(val):
    global rotXdeg
    rotXdeg = val

def onRotYChange(val):
    global rotYdeg
    rotYdeg = val

def onRotZChange(val):
    global rotZdeg
    rotZdeg = val

def onFchange(val):
    global f
    f = val

def onDistChange(val):
    global dist
    dist = val

if __name__ == '__main__':

    # Read input image, and create output image
    src = cv2.imread('/home/miquel/image.jpeg')
    dst = np.ndarray(shape=src.shape, dtype=src.dtype)

    # Create a user interface with trackbars that allow modifying
    # the parameters of the transformation
    wndname1 = "Source:"
    wndname2 = "WarpPerspective: "
    cv2.namedWindow(wndname1, 1)
    cv2.namedWindow(wndname2, 1)
    cv2.createTrackbar("Rotation X", wndname2, rotXdeg, 180, onRotXChange)
    cv2.createTrackbar("Rotation Y", wndname2, rotYdeg, 180, onRotYChange)
    cv2.createTrackbar("Rotation Z", wndname2, rotZdeg, 180, onRotZChange)
    cv2.createTrackbar("f", wndname2, f, 2000, onFchange)
    cv2.createTrackbar("Distance", wndname2, dist, 2000, onDistChange)

    # Show the original image
    cv2.imshow(wndname1, src)

    h, w = src.shape[:2]

    while True:
        rotX = (rotXdeg - 90) * np.pi / 180
        rotY = (rotYdeg - 90) * np.pi / 180
        rotZ = (rotZdeg - 90) * np.pi / 180

        # Projection 2D -> 3D matrix
        A1 = np.matrix([[1, 0, -w/2],
                        [0, 1, -h/2],
                        [0, 0, 0],
                        [0, 0, 1]])

        # Rotation matrices around the X, Y, Z axes
        RX = np.matrix([[1, 0, 0, 0],
                        [0, np.cos(rotX), -np.sin(rotX), 0],
                        [0, np.sin(rotX),  np.cos(rotX), 0],
                        [0, 0, 0, 1]])

        RY = np.matrix([[ np.cos(rotY), 0, np.sin(rotY), 0],
                        [ 0, 1, 0, 0],
                        [-np.sin(rotY), 0, np.cos(rotY), 0],
                        [ 0, 0, 0, 1]])

        RZ = np.matrix([[np.cos(rotZ), -np.sin(rotZ), 0, 0],
                        [np.sin(rotZ),  np.cos(rotZ), 0, 0],
                        [0, 0, 1, 0],
                        [0, 0, 0, 1]])

        # Composed rotation matrix (RX, RY, RZ)
        R = RX * RY * RZ

        # Translation matrix on the Z axis; changing dist changes the height
        T = np.matrix([[1, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 0, 1, dist],
                       [0, 0, 0, 1]])

        # Camera intrinsics matrix 3D -> 2D
        A2 = np.matrix([[f, 0, w/2, 0],
                        [0, f, h/2, 0],
                        [0, 0, 1, 0]])

        # Final overall transformation matrix
        H = A2 * (T * (R * A1))

        # Apply the matrix transformation
        cv2.warpPerspective(src, H, (w, h), dst, cv2.INTER_CUBIC)

        # Show the warped image
        cv2.imshow(wndname2, dst)
        cv2.waitKey(1)
