Setting up an Orthographic Top View Projection in WebGL

I've studied the example here:
The matrix library in use is:
So the example is in clip space, and I understand that. I would like to set up an orthographic projection using this example as my base. The orthographic projection should have its center at 0,0,0, with an eye location somewhere at +Z looking down on 0,0,0. When I enter coordinates for my 3D faces in the buffers, I would like to be able to enter them in model space units. For example, for my mapping exhibit I have a cube 10,000 units on a side (x from -5000 to +5000, y from -5000 to +5000, and z from -5000 to +5000) that is to be projected onto the canvas, which is 500 x 500. So my 3D faces will have coordinates somewhere within that cube, and the 500 x 500 canvas will display all of it.
This is the same projection that a CAD program would use to start a scratch drawing. Does anyone know how to do this in WebGL with the glMatrix library? I am new to WebGL and I really could use some guidance on this topic.

For the sake of simplicity, you should first separate the camera transform matrix from its projection matrix. You can then multiply them to get the "view-projection matrix", which transforms world-space coordinates to clip space.
var cam = mat4.create();
var proj = mat4.create();
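Start by placing your camera (cam matrix):
mat4.translate( cam, cam, position );
mat4.rotate( ... );
// note: mat4.lookAt( .. ) builds a view matrix (world to camera) directly,
// so if you use it, skip the mat4.invert step further down
mat4.lookAt( .. );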
Set up the orthographic projection (proj matrix). You can think of the ortho projection as a box aligned with the camera; you can expand each of its sides with the six parameters. Everything inside the box will show on screen.
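var ratio = screenWidth / screenHeight;
var halfWorldWidth = 5000.0;
// the near/far will depend on your camera position;
// near and far are placeholders here, choose them so the box encloses your scene
mat4.ortho( proj,
-halfWorldWidth,          // left
halfWorldWidth,           // right
-halfWorldWidth / ratio,  // bottom
halfWorldWidth / ratio,   // top
near, far );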
Finally, get the view-projection:
var view = mat4.create();
var viewProj = mat4.create();
// the view matrix is the inverse of the camera matrix
mat4.invert( view, cam );
// viewProj = proj * view
mat4.multiply( viewProj, proj, view );
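To make the numbers concrete, here is a minimal sketch of the same setup in plain Python rather than glMatrix, so the arithmetic can be checked by hand. It assumes a square 500 x 500 canvas (ratio = 1), an eye at z = +10000 looking down -Z, and near/far values of 1 and 20000. All of these values, and the helper names ortho, translation, mat_mul, transform and to_pixel, are illustrative assumptions, not part of the answer above. The matrices follow the usual OpenGL-style layout for column vectors.

```python
def ortho(left, right, bottom, top, near, far):
    """OpenGL-style orthographic projection, stored as rows, applied to column vectors."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def translation(tx, ty, tz):
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, point):
    """Apply a 4x4 matrix to a 3D point (w = 1) and return (x, y, z, w)."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))

def to_pixel(clip, canvas=500):
    """Map clip-space x, y to canvas pixels; y is flipped because rows grow downward."""
    return ((clip[0] + 1.0) * 0.5 * canvas, (1.0 - clip[1]) * 0.5 * canvas)

HALF_WORLD = 5000.0   # half the width of the 10,000-unit cube
EYE_Z = 10000.0       # assumed eye height above the origin

proj = ortho(-HALF_WORLD, HALF_WORLD, -HALF_WORLD, HALF_WORLD, 1.0, 20000.0)
view = translation(0.0, 0.0, -EYE_Z)   # inverse of a camera translated to z = +EYE_Z
view_proj = mat_mul(proj, view)        # same order as proj * view above

corner = transform(view_proj, (5000.0, -5000.0, 0.0))
center = transform(view_proj, (0.0, 0.0, 0.0))
```

With these assumed values the corner of the cube lands on the edge of clip space and the origin lands in the middle of the canvas. The near/far pair is chosen so that z from -5000 to +5000 stays inside the box when viewed from z = +10000.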


DirectX and DirectXTK translation limits

I use the DirectX Tool Kit to display a 3D model, following the 'Rendering a model' tutorial, and my pyramid is displayed:
When trying to transform the object, scaling and rotation work well, but I'm not sure how to move (translate) the object around. Basically, I'm looking for an algorithm that, given the current camera position, focus, viewport, and the rendered model (for which the DirectX Tool Kit gives me the bounding box, and therefore its "size"), determines the minimum and maximum values for XYZ translation such that the object remains visible.
The bounding box is always the same, no matter the viewport size, so how do I compare its size against my viewport?
Please excuse my newbiness, I'm not a 3D developer, at least yet.
The "Simple Rendering" example which draws a triangle:
Matrix proj = Matrix::CreateScale( 2.f/float(backBufferWidth),
-2.f/float(backBufferHeight), 1.f)
* Matrix::CreateTranslation( -1.f, 1.f, 0.f );
says that the normalized triangle size is [1,1,1] but here normalized values do not work.
TL;DR: To move your model around the world, create a matrix for the translation and set it as the world matrix with SetWorld.
Matrix world = Matrix::CreateTranslation( 2.f, 1.f, 3.f);
// Also be sure you have called SetView and SetProjection for the 3D camera setup
//covered in the 3D shapes / Rendering a model tutorial
You should start with a basic review of 3D transformations, in particular the world -> view -> projection transformation pipeline.
The world transformation performs the affine transformation to get the model you are rendering into its 'world' position (a.k.a. 'local coordinates to world coordinates transformation').
The view transformation performs the transformation to get world positions into the camera's point of view (i.e. position and direction) (a.k.a. 'world coordinates to view coordinates transformation').
The projection transformation performs the transformation to get the view positions into the canonical "-1 to 1" range that the actual hardware uses, including any perspective projection (a.k.a. 'view coordinates to clip coordinates transformation').
The hardware itself performs the final step of converting the "-1 to 1" to pixel locations in the render target based on the Direct3D SetViewport information (a.k.a. 'clip' coordinates to pixel coordinates transformation).
This Direct3D 9 era article is a bit dated, but it covers the overall idea well.
In the DirectX Tool Kit BasicEffect system, there are distinct methods for each of these matrices: SetWorld, SetView, and SetProjection. There is also a helper if you want to set all three at once: SetMatrices.
The simple rendering tutorial is concerned with the simplest form of rendering, 2D rendering, where you want the coordinates you provide to be in natural 'pixel coordinates':
Matrix proj = Matrix::CreateScale( 2.f/float(backBufferWidth),
-2.f/float(backBufferHeight), 1.f)
* Matrix::CreateTranslation( -1.f, 1.f, 0.f );
The purpose of this matrix is to basically 'undo' what the SetViewport will do so that you can think in simple pixel coordinates. It's not suitable for 3D models.
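As a quick check of what that matrix does, here is a small sketch in plain Python that applies the same scale and translation to a pixel coordinate. The function name pixel_to_ndc and the 800 x 600 back buffer are assumptions chosen for illustration only.

```python
def pixel_to_ndc(x, y, back_buffer_width, back_buffer_height):
    """Apply the scale, then the translation, of the 2D 'pixel coordinates' matrix."""
    # Scale: 2/width in x, -2/height in y (the minus sign flips y so it points up)
    sx = 2.0 / float(back_buffer_width)
    sy = -2.0 / float(back_buffer_height)
    # Translation: (-1, +1) moves the origin to the top-left corner
    return (x * sx - 1.0, y * sy + 1.0)

top_left = pixel_to_ndc(0, 0, 800, 600)
bottom_right = pixel_to_ndc(800, 600, 800, 600)
middle = pixel_to_ndc(400, 300, 800, 600)
```

The top-left pixel maps to (-1, 1), the bottom-right to (1, -1), and the middle of the back buffer to (0, 0), which is exactly the range the viewport transform expects as input.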
In the 3D shapes tutorial I cover the basic camera model, but I leave the world matrix as the identity so the shape is sitting at the world origin.
m_view = Matrix::CreateLookAt(Vector3(2.f, 2.f, 2.f),
Vector3::Zero, Vector3::UnitY);
m_proj = Matrix::CreatePerspectiveFieldOfView(XM_PI / 4.f,
float(backBufferWidth) / float(backBufferHeight), 0.1f, 10.f);
In the Rendering a model tutorial, I also leave the world matrix as identity. I get into the basics of this in the Basic game math tutorial.
One of the nice properties of affine transformations is that you can perform them all at once by transforming by the concatenation of the individual transforms. Point p transformed by matrix W, then transformed by matrix V, then transformed by matrix P is the same as point p transformed by matrix W * V * P.
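The concatenation property can be checked numerically. The sketch below uses plain Python with the row-vector convention (point on the left, translation in the bottom row), which matches the order W * V * P written above. The specific matrices are stand-ins chosen for illustration: W reuses the (2, 1, 3) translation from the TL;DR, while V and P are arbitrary simple matrices, not a real camera or projection.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def vec_mul(v, m):
    """Row vector times matrix: v * m."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]

def translation(tx, ty, tz):
    # Row-vector convention: the translation lives in the bottom row
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [tx, ty, tz, 1.0]]

def scale(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

W = translation(2.0, 1.0, 3.0)    # world: move the model
V = translation(0.0, 0.0, -5.0)   # stand-in for a view matrix
P = scale(0.5, 0.5, 0.1)          # stand-in for a projection matrix

p = [1.0, 1.0, 1.0, 1.0]
step_by_step = vec_mul(vec_mul(vec_mul(p, W), V), P)
all_at_once = vec_mul(p, mat_mul(mat_mul(W, V), P))
```

Both paths give the same point, so the three matrices can be multiplied once per object and the result applied to every vertex.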

Find camera orientation && translation from 2d image

The Goal ::
I intend to have an uploaded image as a static background, and render 3d objects on a designated plane in that image.
I need to get the orientation of the camera, in relation to the plane. So then I can properly render the 3D models on said plane.
The user will specify the length and width of the plane, and will also outline the plane, yielding its 4 corners (A, B, C, D on the 2D image).
What I've tried ::
I've looked at using webassembly ported OpenCV, particularly solvePnP, but while testing I was getting the error Uncaught TypeError: Cannot read property '$$' of undefined at RegisteredPointer.nonConstNoSmartPtrRawPointerToWireType
Code I was using below:
// 3D world coords
var vv = cv.matFromArray( 4,3,cv.CV_32SC1,[
// 2D img coords
var imageP = cv.matFromArray( 4,2,cv.CV_8S,[
// camera internal params
var cm = new cv.Mat(3,3,cv.CV_32FC1,new cv.Scalar())
var rvec
var tvec
cv.solvePnP( vv, imageP, cm, new cv.Mat(), rvec, tvec, false, cv.SOLVEPNP_P3P )
With the known variables, is it possible to glean any information about the camera's orientation / position / FOV?
The answer is a bit more complex.
First, you have to calibrate the camera to get cameraMatrix. You should also remove distortion along the way.
var cm = new cv.Mat(3,3,cv.CV_32FC1,new cv.Scalar())
In your code, you only declare it and never put any content into it, but the math requires it.
Then you need to know the physical size of the object in order to do that.
The easiest way to start is to use the chessboard calibration pattern sample, for which you can pre-enter the size.
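For reference, the camera matrix mentioned above is a 3x3 matrix holding the focal lengths in pixels (fx, fy) and the principal point (cx, cy). The sketch below, in plain Python, builds a rough guess for it from an assumed horizontal field of view, with square pixels and the principal point at the image center. This is only an approximation for experimenting and not a substitute for calibration. The function name, the 640 x 480 image size and the 90 degree FOV are assumptions made for illustration.

```python
import math

def camera_matrix_from_fov(width, height, horizontal_fov_deg):
    """Approximate pinhole intrinsics from image size and horizontal FOV.

    Assumes square pixels, no skew, and a principal point at the image center.
    """
    fx = (width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    fy = fx                      # square pixels assumed
    cx = width / 2.0
    cy = height / 2.0
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]

K = camera_matrix_from_fov(640, 480, 90.0)
```

These nine values are what the 3x3 cm matrix in the question would need to contain in place of the empty scalar.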
You can follow this sample to find how to find the camera orientation and relative position to pre-determined objects.
The source code to achieve what you want is here
Start from line 1400.
The sample result with AR can be found in this link

How to get the transformation matrix of a 3d model to object in a 2d image

Given an object's 3D mesh file and an image that contains the object, what are some techniques to get the orientation/pose parameters of the 3d object in the image?
I tried searching for some techniques, but most seem to require texture information of the object or at least some additional information. Is there a way to get the pose parameters using just an image and a 3d mesh file (wavefront .obj)?
Here's an example of a 2D image that can be expected.
FOV of camera
The camera's field of view is the absolute minimum you need to know to even start with this (how can you decide where to place the object when you have no idea how it would affect the scene?). Basically, you need a transform matrix that maps from the world GCS (global coordinate system) to camera/screen space and back. If you do not have a clue what I am writing about, then perhaps you should not try any of this before you learn the math.
For an unknown camera you can do some calibration based on markers or etalons (reference objects of known size and shape) in the view. But it is much better to use real camera values (like FOV angles in the x,y directions, focal length, etc.).
The goal of this is to create a function that maps world GCS (x,y,z) into screen LCS (x,y).
For more info read:
transform matrix anatomy
3D graphic pipeline
Perspective projection
Silhouette matching
In order to compare rendered and real image similarity you need some kind of measure. As you need to match geometry I think silhouette matching is the way (ignoring textures, shadows and stuff).
So first you need to obtain silhouettes. Use image segmentation for that and create an ROI mask of your object. For the rendered image this is easy, as you can render the object with a single color without any lighting directly into the ROI mask.
Then you need to construct a function that computes the difference between silhouettes. You can use any kind of measure, but I think you should start with the pixel count of non-overlapping areas (it is easy to compute).
Basically you count pixels that are present only in one ROI (region of interest) mask.
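The non-overlapping pixel count can be sketched in a few lines of plain Python. Here the two ROI masks are assumed to be equally sized lists of rows containing 0 or 1; the function name silhouette_difference and the tiny example masks are made up for illustration.

```python
def silhouette_difference(mask_a, mask_b):
    """Count pixels that are set in exactly one of the two ROI masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same height")
    count = 0
    for row_a, row_b in zip(mask_a, mask_b):
        if len(row_a) != len(row_b):
            raise ValueError("masks must have the same width")
        for a, b in zip(row_a, row_b):
            if bool(a) != bool(b):   # present in only one mask
                count += 1
    return count

real = [[0, 1, 1],
        [0, 1, 0]]
rendered = [[0, 1, 0],
            [1, 1, 0]]
difference = silhouette_difference(real, rendered)
```

A difference of 0 means the silhouettes coincide exactly; larger values mean a worse match.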
estimate position
As you have the mesh, you know its size, so place it in the GCS so that the rendered image has a bounding box very close to that of the real image. If you do not have the FOV parameters, then you need to rescale and translate each rendered image so it matches the real image's bounding box (and as a result you obtain only the orientation, not the position, of the object, of course). Cameras have perspective, so the farther from the camera you place your object, the smaller it will be.
fit orientation
Render a few fixed orientations covering all orientations with some step, for example 8^3 orientations. For each, compute the silhouette difference and choose the orientation with the smallest difference.
Then fit the orientation angles around it to minimize the difference. If you do not know how optimization or fitting works, see this:
How approximation search works
Beware: too few initial orientations can cause false positives or missed solutions, while too many will be slow.
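The coarse-then-fine search described above can be sketched as follows in plain Python. The cost function here is a toy stand-in: in a real setup it would render the mesh at the given three angles and return the silhouette difference, whereas this one just measures the angular distance to a made-up target so the sketch can run on its own. The refinement is a simple step-halving search, used here as an assumption and not necessarily the method behind the link above.

```python
import itertools
import math

def coarse_search(cost, steps=8):
    """Try steps^3 evenly spaced orientations and return the cheapest one."""
    angles = [2.0 * math.pi * i / steps for i in range(steps)]
    return min(itertools.product(angles, repeat=3), key=cost)

def refine(cost, start, step, iterations=40):
    """Nudge one angle at a time; halve the step when nothing improves."""
    best = tuple(start)
    best_cost = cost(best)
    for _ in range(iterations):
        improved = False
        for axis in range(3):
            for delta in (-step, step):
                candidate = list(best)
                candidate[axis] += delta
                candidate = tuple(candidate)
                c = cost(candidate)
                if c < best_cost:
                    best, best_cost, improved = candidate, c, True
        if not improved:
            step *= 0.5
    return best

TARGET = (0.3, 1.0, 2.0)   # hypothetical "true" orientation for the toy cost

def toy_cost(angles):
    """Stand-in for 'render at these angles and compare silhouettes'."""
    total = 0.0
    for a, t in zip(angles, TARGET):
        d = (a - t + math.pi) % (2.0 * math.pi) - math.pi   # wrapped difference
        total += d * d
    return total

coarse = coarse_search(toy_cost)                     # 8^3 = 512 evaluations
fitted = refine(toy_cost, coarse, step=math.pi / 8.0)
```

With a real silhouette cost the landscape is much less smooth than this toy one, which is why the density of the initial grid matters.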
That was the basics in a nutshell. As your mesh is not very simple, you may need to tweak this, for example by using contours instead of silhouettes and the distance between contours (which is really hard to compute) instead of the non-overlapping pixel count... You should start with simpler meshes like a die, a coin, etc., and once you have grasped all of this, move on to more complex shapes.
[Edit1] algebraic approach
If you know some points in the image that correspond to known 3D points (in your mesh), then, along with the FOV of the camera used, you can compute the transform matrix placing your object...
if the transform matrix is M (OpenGL style):
M = xx,yx,zx,ox
0, 0, 0, 1
Then any point from your mesh (x,y,z) is transformed to global world (x',y',z') like this:
(x',y',z') = M * (x,y,z)
The pixel position (x'',y'') is done by camera FOV perspective projection like this:
y''=FOVy*(z'+focus)*y' + ys2;
x''=FOVx*(z'+focus)*x' + xs2;
where camera is at (0,0,-focus), projection plane is at z=0 and viewing direction is +z so for any focal length focus and screen resolution (xs,ys):
When you put all this together you obtain:
xi'' = ( xx*xi + yx*yi + zx*zi + ox ) * ( xz*xi + yz*yi + zz*zi + oz + focus ) * FOVx + xs2
yi'' = ( xy*xi + yy*yi + zy*zi + oy ) * ( xz*xi + yz*yi + zz*zi + oz + focus ) * FOVy + ys2
where (xi,yi,zi) is the i-th known point's 3D position in mesh local coordinates and (xi'',yi'') is the corresponding known 2D pixel position. So the unknowns are the M values:
{ xx,xy,xz,yx,yy,yz,zx,zy,zz,ox,oy,oz }
So we get 2 equations per known point and 12 unknowns in total, so you need to know 6 points. Solve the system of equations and construct your matrix M.
Also you can exploit that M is a uniform orthogonal/orthonormal matrix so vectors
X = (xx,xy,xz)
Y = (yx,yy,yz)
Z = (zx,zy,zz)
Are perpendicular to each other so:
(X.Y) = (Y.Z) = (Z.X) = 0.0
Which can lower the number of needed points by introducing these to your system. You can also exploit the cross product, so if you know 2 vectors the third can be computed:
Z = (X x Y)*scale
So instead of 3 variables you need just a single scale (which is 1 for an orthonormal matrix). If I assume an orthonormal matrix then:
|X| = |Y| = |Z| = 1
so we get 6 additional equations (3 for dot and 3 for cross) without any additional unknowns, so 3 points are indeed enough.
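The constraints above are easy to verify numerically. The sketch below, in plain Python, takes two perpendicular unit vectors X and Y (here a made-up example: the basis rotated by 30 degrees around the z axis), builds Z with the cross product, and checks the dot-product and unit-length conditions. The helper names dot, cross and length are introduced only for this example.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def length(a):
    return math.sqrt(dot(a, a))

angle = math.radians(30.0)                       # arbitrary example rotation
X = (math.cos(angle), math.sin(angle), 0.0)
Y = (-math.sin(angle), math.cos(angle), 0.0)
Z = cross(X, Y)                                  # scale = 1 for an orthonormal basis

perpendicular = [dot(X, Y), dot(Y, Z), dot(Z, X)]
lengths = [length(X), length(Y), length(Z)]
```

For an orthonormal basis all three dot products come out as 0 and all three lengths as 1, so only X and Y need to be solved for and Z follows from them.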

camera frame world coordinates relative to fiducial

I am trying to determine camera position in world coordinates, relative to a fiducial position based on fiducial marker found in a scene.
My methodology for determining the viewMatrix is described here:
Determine camera pose?
I have the rotation and translation, [R|t], from the trained marker to the scene image. Given camera calibration training, and thus the camera intrinsic results, I should be able to discern the camera's position in world coordinates based on the perspective and orientation of the marker found in the scene image.
Can anybody direct me to a discussion or example similar to this? I'd like to know my camera's position based on the fiducial marker, and I'm sure that something similar to this has been done before; I'm just not searching for the correct keywords.
Appreciate your guidance.
What do you mean by world coordinates? If you mean object coordinates, then you should use the inverse transformation of solvePnP's result.
Given a view matrix [R|t], we have that inv([R|t]) = [R'|-R'*t], where R' is the transpose of R. In OpenCV:
cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints, intrinsics, distortion, rvec, tvec);
cv::Mat R;
cv::Rodrigues(rvec, R); // convert the rotation vector into a 3x3 rotation matrix
R = R.t(); // inverse rotation
tvec = -R * tvec; // translation of inverse
// camPose is a 4x4 matrix with the pose of the camera in the object frame
cv::Mat camPose = cv::Mat::eye(4, 4, R.type());
R.copyTo(camPose.rowRange(0, 3).colRange(0, 3)); // copies R into camPose
tvec.copyTo(camPose.rowRange(0, 3).colRange(3, 4)); // copies tvec into camPose
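The inversion can be checked by hand with a small example. The sketch below, in plain Python, uses a made-up pose (a 90 degree rotation around z and t = (1, 2, 3)) and the convention that a point maps from object to camera coordinates as x_cam = R * x_obj + t. It computes the camera position in the object frame as -R' * t and confirms that this position maps back to the camera origin. The helper names are introduced only for this example.

```python
def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def camera_position(R, t):
    """Camera center in the object frame: C = -R' * t."""
    Rt = transpose(R)
    return [-c for c in mat_vec(Rt, t)]

# Hypothetical pose: rotate 90 degrees around z, then translate by (1, 2, 3)
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]

C = camera_position(R, t)
# Mapping the camera position into camera coordinates must give the origin
back_in_camera = [a + b for a, b in zip(mat_vec(R, C), t)]
```

Here C comes out as (-2, 1, -3), and transforming it with [R|t] lands on (0, 0, 0), the camera's own origin.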
Update #1:
Result of solvePnP
solvePnP estimates the object pose given a set of object points (model coordinates), their corresponding image projections (image coordinates), as well as the camera matrix and the distortion coefficients.
The object pose is given by two vectors, rvec and tvec. rvec is a compact representation of a rotation matrix for the pattern view seen on the image. That is, rvec together with the corresponding tvec brings the fiducial pattern from the model coordinate space (in which object points are specified) to the camera coordinate space.
That is, we are in the camera coordinate space, it moves with the camera, and the camera is always at the origin. The camera axes have the same directions as image axes, so
x-axis is pointing in the right side from the camera,
y-axis is pointing down,
and z-axis is pointing to the direction of camera view
The same would apply to the model coordinate space, so if you specified the origin in upper right corner of the fiducial pattern, then
x-axis is pointing to the right (e.g. along the longer side of your pattern),
y-axis is pointing to the other side (e.g. along the shorter one),
and z-axis is pointing to the ground.
You can specify the world origin as the first of the object points, that is, the first object point is set to (0, 0, 0) and all other points have z=0 (in the case of planar patterns). Then tvec (combined with rvec) points to the origin of the world coordinate space in which you placed the fiducial pattern. solvePnP's output has the same units as the object points.
Take a look at the following: 6dof positional tracking. I think this is very similar to what you need.

How can I get the camera space position on the near plane from the object space position

As titled, I have an item with a specific position in object space defined by a single vector.
I would like to retrieve the coordinates in camera space of the projection of this vector on the near clipping plane.
In other words, I need the intersection in camera space between this vector and the plane defined by the z coordinate equal to -1 (my near plane).
I need it to move an object linearly with the mouse under perspective projection.
Edit: Right now I go from the object space down to the window space, then from there up to the camera space by setting the window depth window.z equal to 0, that is the near plane.
Note that to get the camera space from the unProject I just pass in as modelview matrix an identity matrix new Mat4(1f):
public Vec3 getCameraSpacePositionOnNearPlane(Vec2i mousePoint) {
int[] viewport = new int[]{0, 0, glViewer.getGlWindow().getWidth(), glViewer.getGlWindow().getHeight()};
Vec3 window = new Vec3();
window.x = mousePoint.x;
window.y = viewport[3] - mousePoint.y - 1;
window.z = 0;
return Jglm.unProject(window, new Mat4(1f), glViewer.getVehicleCameraToClipMatrix(), new Vec4(viewport));
}
Is there a better (more efficient) way to get it without going down to window space and coming back to camera space?
The most direct approach I could think of would be to simply transform your object space position (let this be called vector x in the following) into eye space, construct a ray from the origin to that eye-space coordinates and calculate the intersection between that ray and the near plane z_eye=-near.
Another approach would be to fully transform into clip space. Since the near plane is z_clip = -w_clip there, you can just set the z coordinate of the result to -w, and project that back to eye space by using the inverse projection matrix.
In both cases, the result will be meaningless if the point lies behind the camera, or at the camera plane z_eye = 0.
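The first approach (the ray from the eye-space origin through the point, intersected with z_eye = -near) reduces to a single scale factor. Here is a minimal sketch in plain Python, assuming the point has already been transformed to eye space and that the camera looks down -Z with near > 0. The function name and the example point are made up for illustration.

```python
def project_to_near_plane(eye_pos, near):
    """Intersect the ray origin -> eye_pos with the plane z_eye = -near."""
    x, y, z = eye_pos
    if z >= 0.0:
        # Point is at or behind the camera plane: no meaningful intersection
        raise ValueError("point must be in front of the camera (z_eye < 0)")
    scale = -near / z          # ray parameter at which z reaches -near
    return (x * scale, y * scale, -near)

on_near_plane = project_to_near_plane((2.0, 4.0, -4.0), 1.0)
```

A point at depth 4 with near = 1 is scaled by 0.25, giving (0.5, 1.0, -1.0). The explicit check mirrors the caveat above about points behind the camera or on the plane z_eye = 0.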
