About the lookat() function - webgl

This is the function:
lookat(vec3 eye, vec3 at, vec3 up)
As we all know, the up vector can be (0.0, 1.0, 0.0). It cannot be parallel to the observation vector n (the viewing direction between eye and at), or the cross products inside lookat() collapse to zero and the result is wrong.
My question is how to calculate an up vector that is anchored at the point eye and is perpendicular to the n vector.
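One possible way to get such a vector, sketched in plain Python rather than in the WebGL helper library: take any up hint that is not parallel to n and subtract its component along n (a single Gram-Schmidt step). The helper names below (`orthogonal_up`, `sub`, `dot`, ...) are made up for this illustration, and n is assumed to point from eye to at; flipping its sign does not change the result.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def scale(a, s):
    return [x * s for x in a]

def normalize(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]

def orthogonal_up(eye, at, up_hint):
    """Return a unit up vector perpendicular to the viewing direction n."""
    n = normalize(sub(at, eye))  # viewing direction (assumed: eye -> at)
    # Remove the part of the hint that lies along n (Gram-Schmidt step).
    up = sub(up_hint, scale(n, dot(up_hint, n)))
    if dot(up, up) < 1e-12:
        raise ValueError("up hint is parallel to the viewing direction")
    return normalize(up)

eye, at = [0.0, 2.0, 5.0], [0.0, 0.0, 0.0]
up = orthogonal_up(eye, at, [0.0, 1.0, 0.0])
# The point eye + up then lies on the "up" axis that starts at eye.
```

A vector is only a direction, so "passing through eye" just means drawing it from eye. Many lookAt implementations perform an equivalent re-orthogonalization internally with two cross products, so in practice the hint usually only has to be non-parallel to n.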

Related

Leveraging webGL to find bounding boxes

Given some vertices with xyz coordinates it is easy to obtain an xyz aligned bounding box, just take the min/max xyz values from the vertices. Ideally, these values only have to be found once and can be found before any rendering takes place.
My question is: if rotating, scaling, or translating the object, what's the best way to calculate the new xyz values which bound the object? Do I have to go through all the vertices after each transform and find new min/max xyz values?
Given this GLSL code:
in vec4 a_position;
in vec4 a_color;
// transformation matrix
uniform mat4 u_matrix;
out vec4 v_color;
void main() {
gl_Position = u_matrix * a_position;
v_color = a_color;
}
My idea: would adding new out variables for bounding box coordinates work? Or is there a better way?
in vec4 a_position;
in vec4 a_color;
// transformation matrix
uniform mat4 u_matrix;
out vec4 v_color;
out vec3 max_bounds;
out vec3 min_bounds;
void main() {
vec4 position = u_matrix * a_position;
if(position.x > max_bounds.x){
max_bounds.x = position.x;
}
if(position.y > max_bounds.y){
max_bounds.y = position.y;
}
if(position.z > max_bounds.z){
max_bounds.z = position.z;
}
// ...
gl_Position = position;
v_color = a_color;
}
You can't, since your vertex shader code (and all other shader code) is executed in parallel and the outputs only go to the next stage (fragment shader in your case).
An exception is transform feedback, where the outputs of a vertex shader can be written to buffers; however, you can only use that to map data, not to gather or reduce it. A significant chunk of the performance advantage of GPUs comes from executing code in parallel. The ability to share data among those parallel threads is very limited and not accessible via WebGL to begin with.
On top of all that, your task (finding the min/max extents in a vertex array) is a reduction rather than a per-vertex map, as it requires shared data (the running min and max values) to be available and current to all threads.
Since AABBs are inherently rather loose-fitting, one (if not the) common approach is to transform the 8 corner vertices of the AABB (of the untransformed mesh) and gather the new AABB from those.
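The corner-based approach could look roughly like this on the CPU side. This is a minimal sketch in plain Python, assuming a row-major 4x4 affine matrix (no perspective division); the function names are invented for the example.

```python
from itertools import product

def transform_point(m, p):
    """Apply a row-major 4x4 affine matrix m to the 3D point p."""
    x, y, z = p
    return [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3)]

def transformed_aabb(m, box_min, box_max):
    """AABB enclosing the 8 transformed corners of (box_min, box_max)."""
    # zip pairs up (min, max) per axis; product yields all 8 corners.
    corners = product(*zip(box_min, box_max))
    moved = [transform_point(m, c) for c in corners]
    new_min = [min(p[i] for p in moved) for i in range(3)]
    new_max = [max(p[i] for p in moved) for i in range(3)]
    return new_min, new_max

# Rotate 90 degrees about z, then translate by (10, 0, 0).
m = [[0.0, -1.0, 0.0, 10.0],
     [1.0,  0.0, 0.0,  0.0],
     [0.0,  0.0, 1.0,  0.0],
     [0.0,  0.0, 0.0,  1.0]]
new_min, new_max = transformed_aabb(m, [0.0, 0.0, 0.0], [2.0, 1.0, 3.0])
# new_min == [9.0, 0.0, 0.0], new_max == [10.0, 2.0, 3.0]
```

This costs 8 point transforms per object regardless of vertex count. The price is that the box can only grow looser with each rotation, so it is best to always start from the AABB of the untransformed mesh, not from the previous frame's result.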
Theoretically speaking, you could store the vertex positions in a floating-point texture, transform those with a fragment (instead of vertex) shader, and write the result back to a texture. You would then do a series of gather passes, each taking the min/max values over chunks of X by X size (e.g. 64x64) and writing them to increasingly smaller textures, until you reach a 1x1 pixel texture from which you read your result using readPixels. That said, this is simply not worth the effort (and probably slower for meshes with lower vertex counts) just to get a slightly better fitting AABB. If you really need that, you'd rather create a compound volume composed of better fitting bounding shapes and then gather a combined AABB from those.

pytorch affine_grid: what is the theta input?

When trying to use torch.nn.functional.affine_grid, it requires a theta affine matrix of size (N x 3 x 4) according to the documentation. I thought a general affine matrix is (N x 4 x 4). What is the supposed affine matrix format in pytorch?
An example of 3D rotation affine input would be ideal. Appreciate your help.
The dimensions you mention apply to 3D inputs, that is, when you wish to apply 3D geometric transforms to an input tensor x of shape b x c x d x h x w.
A transformation of points in 3D (represented as 4-vectors in homogeneous coordinates, (x, y, z, 1)) is, in the general case, a 4x4 matrix, as you noted.
However, since we restrict ourselves to affine transformations, which keep the fourth homogeneous coordinate equal to 1, the 4th row of the matrix must be (0, 0, 0, 1).
Therefore, there's no need to explicitly encode this last row.
To conclude, a 3D transformation composed of a 3x3 rotation R and a 3D translation t is simply the 3x4 matrix:
theta = [R t]
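Since the question asked for a 3D rotation example, here is a minimal sketch that builds such a theta for a rotation about the z axis, using plain Python lists so it runs without PyTorch. The function name `rotation_theta_z` is made up for this illustration; the PyTorch call in the trailing comment is the assumed way to feed it to affine_grid and is not executed here.

```python
import math

def rotation_theta_z(angle_rad, t=(0.0, 0.0, 0.0)):
    """Return the 3x4 matrix [R | t] for a rotation about z plus translation t."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    R = [[c,   -s,  0.0],
         [s,    c,  0.0],
         [0.0, 0.0, 1.0]]
    # Append the translation component as a fourth column to each row of R.
    return [row + [t[i]] for i, row in enumerate(R)]

theta = rotation_theta_z(math.pi / 2)  # 3 rows, 4 columns

# Assumed PyTorch usage (not executed in this sketch):
#   theta_t = torch.tensor(theta).unsqueeze(0)            # shape (1, 3, 4)
#   grid = torch.nn.functional.affine_grid(
#       theta_t, size=(1, C, D, H, W), align_corners=False)
```

Two things to keep in mind when trying this: affine_grid works in normalized coordinates in [-1, 1], so t is expressed in those units rather than in voxels, and the resulting grid tells grid_sample where to read from in the input, so the visible effect on the volume is the inverse of the transform written in theta.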

Reverse of OpenCV projectPoints

I have a camera facing the equivalent of a chessboard. I know the world 3d location of the points as well as the 2d location of the corresponding projected points on the camera image. All the world points belong to the same plane. I use solvePnP:
Matx33d camMat;
Matx41d distCoeffs;
Matx31d rvec;
Matx31d tvec;
std::vector<Point3f> objPoints;
std::vector<Point2f> imgPoints;
solvePnP(objPoints, imgPoints, camMat, distCoeffs, rvec, tvec);
I can then go from the 3d world points to the 2d image points with projectPoints:
std::vector<Point2f> projPoints;
projectPoints(objPoints, rvec, tvec, camMat, distCoeffs, projPoints);
projPoints are very close to imgPoints.
How can I do the reverse for a screen point that corresponds to a 3d world point belonging to the same plane? I know that from a single view it's not possible to reconstruct the 3d location in general, but here the point lies in a known plane, so it's really a 2d problem. I can calculate the reverse rotation matrix as well as the reverse translation vector, but then how can I proceed?
Matx33d rot;
Rodrigues(rvec, rot);
Matx33d camera_rotation_vector;
Rodrigues(rot.t(), camera_rotation_vector);
Matx31d camera_translation_vector = -rot.t() * tvec;
Suppose you calibrate your camera with objpoints/imgpoints pairs. The first holds the real-world 3D coordinates of the feature points on the calibration board; the second holds the 2D pixel locations of those feature points in each image. Both should be lists with one element per calibration-board image. After the following line of Python code, you will have the calibration matrix mtx, each calibration board's rotation in rvecs, and its translation in tvecs.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, np.zeros(5,'float32'),flags=cv2.CALIB_USE_INTRINSIC_GUESS )
Now we can find any pixel's 3D coordinate, under one assumption: we need to define a reference plane. Let's take the 0th (first) calibration board as our reference, with its pivot point at (0,0), the long axis of the board as the x-axis, the short one as the y-axis, and the surface of the board as the Z=0 plane. Here is how we can create a projection matrix.
# projection matrix
Lcam=mtx.dot(np.hstack((cv2.Rodrigues(rvecs[0])[0],tvecs[0])))
Now we can pick any pixel location and a desired Z value. Since I want to project the pixel location (100,100) onto the reference calibration board, I set Z=0.
px=100
py=100
Z=0
X=np.linalg.inv(np.hstack((Lcam[:,0:2],np.array([[-1*px],[-1*py],[-1]])))).dot((-Z*Lcam[:,2]-Lcam[:,3]))
Now we have the X and Y coordinates of the pixel (px,py): they are X[0] and X[1].
The last element of X is the scale factor lambda. As a result, the pixel at location (px,py) falls on the coordinate (X[0], X[1]) on the surface of the 0th calibration board.
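For readers wondering where the inverse in the line computing X comes from, it can be derived from the projection equation. Writing the columns of Lcam as l_1 ... l_4 (notation introduced here only for the derivation) and lambda for the unknown scale factor:

```latex
\lambda \begin{pmatrix} p_x \\ p_y \\ 1 \end{pmatrix}
  = L_{cam} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
  = X\,l_1 + Y\,l_2 + Z\,l_3 + l_4
\;\Longrightarrow\;
X\,l_1 + Y\,l_2 - \lambda \begin{pmatrix} p_x \\ p_y \\ 1 \end{pmatrix}
  = -Z\,l_3 - l_4
\;\Longrightarrow\;
\begin{pmatrix} X \\ Y \\ \lambda \end{pmatrix}
  = \begin{pmatrix} l_1 & l_2 & -\begin{pmatrix} p_x \\ p_y \\ 1 \end{pmatrix} \end{pmatrix}^{-1}
    \left( -Z\,l_3 - l_4 \right)
```

With Z known, this is a 3x3 linear system in the three unknowns X, Y and lambda, which is why a single view is enough once the point is constrained to a plane.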
This question seems to be a duplicate of another Stack Overflow question in which the asker nicely provides the solution: Computing x,y coordinate (3D) from image point

Exact definition of the matrices in OpenCv StereoRectify

Normally the definition of a projection matrix P is the 3x4 matrix which projects point from world coordinates to image/pixel coordinates. The projection matrix can be split up into:
K: a 3x4 camera matrix K with the intrinsic parameters
T: a 4x4 transformation matrix with the extrinsic parameters
The projection matrix is then P = K * T.
What are the clear definitions of the following input to OpenCV's stereoRectify:
cameraMatrix1 – First camera matrix (I assume it is the intrinsic K part of the projection matrix, correct?).
R – Rotation matrix between the coordinate systems of the first and the second cameras. (What does 'between' mean? Is it the rotation from cam1 to cam2 or from cam2 to cam1?)
T – Translation vector between coordinate systems of the cameras. (Same as above: is the translation from cam1 to cam2 or from cam2 to cam1?)
R1 – Output 3x3 rectification transform (rotation matrix) for the first camera. (Is this the rotation after rectification, so that the new extrinsic part of the projection matrix becomes T1new = R1*T1old?)
P1 – Output 3x4 projection matrix in the new (rectified) coordinate systems for the first camera. (What is meant by 'projection matrix in the new coordinate system'? It seems that this projection matrix depends on the rotation matrix R1 to project points from world coordinates to image/pixel coordinates, so by the above definition it is neither the 'projection matrix' nor the 'camera matrix' but some kind of mixture of the two.)
cameraMatrix1 is the intrinsic K matrix as computed by the stereoCalibrate() function in OpenCV. You got it right!
R is the rotation matrix of the cam2 frame w.r.t. the cam1 frame. Similarly, T is the translation vector of the cam2 origin w.r.t. the cam1 origin.
If you look at p. 434 of the O'Reilly book "Learning OpenCV", you'll understand what R1 (Rl) and R2 (Rr) are.
Rl = [Rrect][rl]; Rr = [Rrect][rr];
Let the cameras' image planes be plane1 and plane2. Before stereo rectification, plane1 and plane2 are generally not parallel, and the epilines are not parallel to the stereo camera baseline. What Rl does is transform the left image plane so that it is parallel to the right image plane (which is transformed by Rr) and so that the epilines in both images are parallel.
P1 and P2 are the new projection matrices after stereo rectification. Remember, the camera matrix (K) projects a point in 3d space onto the 2d image plane, whereas P1 and P2 project a point in 3d space onto the rectified 2d image planes.
If you have calibrated a stereo camera rig before and compared the P1 and K1 values, you'll find that they are very similar if your stereo rig is already almost in a rectified configuration (as far as that can be achieved by hand, obviously).

OpenCV Camera calibration use of rotation matrix

http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#calibratecamera
I used the cv::calibrateCamera method with a 9*6 chessboard pattern.
Now I am getting rvecs and tvecs corresponding to each pattern.
Can somebody explain the format of rvecs and tvecs?
As far as I have figured out, each one is a 3*1 matrix,
and the OpenCV documentation suggests looking at the Rodrigues function.
http://en.wikipedia.org/wiki/Rodrigues'_rotation_formula
As far as Rodrigues is concerned, it is a way to rotate a vector
around a given axis by an angle theta.
But for this we need four values, the unit vector (ux,uy,uz) and the angle, while OpenCV seems to use only 3 values.
The OpenCV Rodrigues documentation, at the link below, http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#void Rodrigues(InputArray src, OutputArray dst, OutputArray jacobian)
says that it will convert a 3*1 matrix to a 3*3 rotation matrix.
Is this matrix the same as the one we use in 3D graphics?
Can I convert it to a 4*4 matrix and use it for transformations like below?
M4X4 [
x x x 0
x x x 0
x x x 0
0 0 0 1
]
x: the values from the 3x3 output matrix of the Rodrigues function.
Is the following relationship valid
Vout = M4X4 * Vin;
using the matrix above?
The 3x1 rotation vector encodes a rotation in axis-angle form: the direction in which the vector points defines the axis of rotation, and the magnitude of the vector defines the angle (in radians). That is why three values are enough. Using the OpenCV function Rodrigues(InputArray src, OutputArray dst) you can obtain a 3x3 rotation matrix that fits the use you describe.
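To make the axis-angle encoding concrete, here is a hand-written sketch of the Rodrigues formula in plain Python that embeds the result in the 4x4 layout from the question. It is only an illustration of what the conversion computes, not OpenCV's implementation, and the function names are invented for the example.

```python
import math

def rodrigues_to_matrix4(rvec):
    """Build a 4x4 homogeneous rotation from a 3-element rotation vector."""
    theta = math.sqrt(sum(v * v for v in rvec))  # rotation angle in radians
    if theta < 1e-12:
        kx = ky = kz = 0.0                       # no rotation: identity
    else:
        kx, ky, kz = (v / theta for v in rvec)   # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    # R = cos(theta) I + (1 - cos(theta)) k k^T + sin(theta) [k]_x
    return [
        [c + t * kx * kx,      t * kx * ky - s * kz, t * kx * kz + s * ky, 0.0],
        [t * kx * ky + s * kz, c + t * ky * ky,      t * ky * kz - s * kx, 0.0],
        [t * kx * kz - s * ky, t * ky * kz + s * kx, c + t * kz * kz,      0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(m, v):
    """Vout = M4X4 * Vin for a homogeneous column vector v."""
    return [sum(m[r][j] * v[j] for j in range(4)) for r in range(4)]

m = rodrigues_to_matrix4([0.0, 0.0, math.pi / 2])
v_out = apply(m, [1.0, 0.0, 0.0, 1.0])  # x axis rotated 90 degrees about z
```

So Vout = M4X4 * Vin holds for column vectors. When handing such a matrix to a graphics API, check whether it expects column-major storage (a transpose of the flat array may be needed) and whether its camera axes match OpenCV's, where y points down and z points forward. The tvec would go into the first three entries of the last column.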
