map the kinect joint position to 2d point - xna

I am new to programming using official Kinect SDK 1.5 in XNA. How do I map the Skeleton joint's position to the XNA 2D screen to match the image stream?

This is quite easy, since the Kinect SDK provides some mapping helper methods.
MapSkeletonPointToColor gives you the position of a SkeletonPoint in the 2D color frame. You just have to pass two arguments: your skeleton point and the target color frame format.
foreach (Joint joint in skeleton.Joints)
{
    // Transforms a SkeletonPoint to a ColorImagePoint
    var colorPoint = Kinect.MapSkeletonPointToColor(joint.Position, Kinect.ColorStream.Format);

    // colorPoint has two properties, X and Y, which give you the position in the 2D color frame.
    // TODO : Do something with the colorPoint value.
}
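If your back buffer is not 640x480, you still have to scale the resulting ColorImagePoint to your XNA screen. Here is a minimal sketch, assuming the RgbResolution640x480Fps30 color format and that you draw the color stream stretched over the full viewport (the helper name is illustrative):

// Hypothetical helper: scale a ColorImagePoint (640x480 color space) to XNA screen space,
// assuming the color stream is drawn stretched over the whole viewport.
Vector2 ColorPointToScreen(ColorImagePoint colorPoint, Viewport viewport)
{
    const float colorWidth = 640f;   // RgbResolution640x480Fps30
    const float colorHeight = 480f;
    return new Vector2(
        colorPoint.X * (viewport.Width / colorWidth),
        colorPoint.Y * (viewport.Height / colorHeight));
}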

Related

DirectX and DirectXTK translation limits

I use the DirectX Tool Kit to display a 3D model, following the 'Rendering a model' tutorial, and my pyramid is displayed.
When trying to transform the object, the scaling and rotation work well, but I'm not sure how to move (translate) the object around. Basically I'm looking for an algorithm that determines, given the current camera position, focus, viewport and the rendered model (for which the DirectX Tool Kit gives me the bounding box, hence its "size"), the minimum and maximum XYZ translation values for which the object remains visible.
The bounding box is always the same, no matter the viewport size, so how do I compare its size against my viewport?
Please excuse my newbieness, I'm not a 3D developer, at least not yet.
The "Simple Rendering" example which draws a triangle:
Matrix proj = Matrix::CreateScale( 2.f/float(backBufferWidth),
-2.f/float(backBufferHeight), 1.f)
* Matrix::CreateTranslation( -1.f, 1.f, 0.f );
m_effect->SetProjection(proj);
says that the normalized triangle size is [1,1,1] but here normalized values do not work.
TL;DR: To move your model around the world, create a matrix for the translation and set it as the world matrix with SetWorld.
Matrix world = Matrix::CreateTranslation( 2.f, 1.f, 3.f);
m_effect->SetWorld(world);
// Also be sure you have called SetView and SetProjection for the 3D camera setup
// covered in the 3D shapes / Rendering a model tutorials.
You should start with a basic review of 3D transformations, in particular the world -> view -> projection transformation pipeline.
The world transformation performs the affine transformation that gets the model you are rendering into its 'world' position (a.k.a. the 'local coordinates to world coordinates' transformation).
The view transformation transforms world positions into the camera's point of view, i.e. its position and direction (a.k.a. the 'world coordinates to view coordinates' transformation).
The projection transformation transforms view positions into the canonical "-1 to 1" range that the actual hardware uses, including any perspective projection (a.k.a. the 'view coordinates to clip coordinates' transformation).
The hardware itself performs the final step of converting the "-1 to 1" range to pixel locations in the render target, based on the Direct3D SetViewport information (a.k.a. the 'clip coordinates to pixel coordinates' transformation).
This Direct3D 9 era article is a bit dated, but it covers the overall idea well.
In the DirectX Tool Kit BasicEffect system, there are distinct methods for each of these matrices: SetWorld, SetView, and SetProjection. There is also a helper, SetMatrices, if you want to set all three at once.
The simple rendering tutorial is concerned with the simplest form of rendering, 2D rendering, where you want the coordinates you provide to be in natural 'pixel coordinates':
Matrix proj = Matrix::CreateScale( 2.f/float(backBufferWidth),
    -2.f/float(backBufferHeight), 1.f)
    * Matrix::CreateTranslation( -1.f, 1.f, 0.f );
m_effect->SetProjection(proj);
The purpose of this matrix is basically to 'undo' what SetViewport will do so that you can think in simple pixel coordinates. It's not suitable for 3D models.
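To make that concrete, the matrix above maps a pixel coordinate $(x, y)$ to clip space as
$(x, y) \mapsto \left( \frac{2x}{W} - 1,\; 1 - \frac{2y}{H} \right)$
so for an 800 x 600 back buffer, pixel (400, 300) lands exactly at the clip-space origin (0, 0).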
In the 3D shapes tutorial I cover the basic camera model, but I leave the world matrix as the identity so the shape is sitting at the world origin.
m_view = Matrix::CreateLookAt(Vector3(2.f, 2.f, 2.f),
    Vector3::Zero, Vector3::UnitY);
m_proj = Matrix::CreatePerspectiveFieldOfView(XM_PI / 4.f,
    float(backBufferWidth) / float(backBufferHeight), 0.1f, 10.f);
In the Rendering a model tutorial, I also leave the world matrix as the identity. I get into the basics of this in the Basic game math tutorial.
One of the nice properties of affine transformations is that you can perform them all at once by transforming by the concatenation of the individual transforms. Point p transformed by matrix W, then transformed by matrix V, then transformed by matrix P is the same as point p transformed by matrix W * V * P.
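In the row-vector convention used by XNA math and the DirectX Tool Kit's SimpleMath (vector times matrix), that reads
$p_{clip} = ((p \, W) \, V) \, P = p \, (W \, V \, P)$
which holds by associativity of matrix multiplication, so you can bake $W V P$ into a single matrix up front.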

Find camera orientation && translation from 2d image

The Goal ::
I intend to have an uploaded image as a static background, and render 3d objects on a designated plane in that image.
I need to get the orientation of the camera, in relation to the plane. So then I can properly render the 3D models on said plane.
The user will specify the length & width of the plane, as well as outline the plane, resulting in the plane's 4 corners (A, B, C, D on the 2D image).
What I've tried ::
I've looked at using the WebAssembly-ported OpenCV, particularly solvePnP, but while testing I was getting the error: Uncaught TypeError: Cannot read property '$$' of undefined at RegisteredPointer.nonConstNoSmartPtrRawPointerToWireType
The code I was using is below:
// 3D world coords
var vv = cv.matFromArray( 4,3,cv.CV_32SC1,[
0,0,0,
0,4,0,
8,4,0,
8,0,0,
])
// 2D img coords
var imageP = cv.matFromArray( 4,2,cv.CV_8S,[
292,272,
72,379,
487,530,
701,470,
])
// camera internal params
var cm = new cv.Mat(3,3,cv.CV_32FC1,new cv.Scalar())
var rvec
var tvec
cv.solvePnP( vv, imageP, cm, new cv.Mat(), rvec, tvec, false, cv.SOLVEPNP_P3P )
With the known variables, is it possible to glean any information about the camera's orientation / position / FOV?
The answer is a bit more complex.
First, you have to calibrate the camera to get the cameraMatrix. You should also remove lens distortion along the way.
var cm = new cv.Mat(3,3,cv.CV_32FC1,new cv.Scalar())
In your code, you just declare the camera matrix and never put any content into it, yet the math requires real intrinsic values, as shown below.
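For reference, this is the standard pinhole projection model that solvePnP solves, and it needs a populated intrinsic matrix $K$:
$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [R \mid t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$
where $(u, v)$ is the image point, $(X, Y, Z)$ the object point, and $f_x$, $f_y$, $c_x$, $c_y$ are the focal lengths and principal point that calibration gives you; a matrix of zeros cannot stand in for $K$.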
You also need to know the physical size of the object in order to do that.
The easiest way to start is to use the chessboard calibration pattern sample, since you can enter the pattern's size in advance.
You can follow this sample to see how to find the camera orientation and position relative to pre-determined objects.
https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
The source code to achieve what you want is here
https://github.com/opencv/opencv/blob/master/modules/calib3d/src/calibration.cpp
Start from line 1400.
The sample result with AR can be found in this link
https://www.youtube.com/watch?v=2hek-DmiGEw

Mapping infrared images to color images in the RealSense Library

I currently use an Intel D435 camera.
I want to align the left-infrared camera with the color camera.
The align function provided by the RealSense library can only align depth and color.
I heard that on the RealSense camera the left-infrared camera and the depth camera are already aligned.
However, I cannot map the infrared image to the color image with this information alone. The depth image can be re-mapped onto the color image through the align function, but I don't see how to match the color image with the left-infrared image, which stays aligned to the original (un-aligned) depth view.
----------------------------------------
[Realsense Customer Engineering Team Comment]
#Panepo
The align class used in the librealsense demos maps between depth and some other stream, and vice versa. We do not offer other forms of stream alignment.
But here is one suggestion for you to try. Basically, the mapping is a triangulation technique where we go through a pixel's intersection point in 3D space to find its origin in another frame; this method works properly when the source data is depth (Z16 format). One possible way to map between two non-depth streams is to play three streams (Depth + IR + RGB), then calculate the UV map for depth to color, and then use this UV map to remap the IR frame (remember that depth and left IR are aligned by design).
Hope the suggestion gives you some ideas.
----------------------------------------
This is the method suggested by Intel Corporation.
Can you explain what it means to solve the problem by creating a UV map using the depth and color images? And does the RealSense SDK 2.0 have a UV map function?
I would really appreciate an answer.
Yes, Intel RealSense SDK 2.0 provides the PointCloud class.
So you:
- configure the sensors
- start streaming
- obtain color & depth frames
- get the UV map as follows (C#):
var pointCloud = new PointCloud();
pointCloud.MapTexture(colorFrame);
var points = pointCloud.Calculate(depthFrame);

// one vertex and one texture coordinate per depth pixel
var vertices = new Points.Vertex[depthFrame.Width * depthFrame.Height];
var uvMap = new Points.TextureCoordinate[depthFrame.Width * depthFrame.Height];
points.CopyTo(vertices);
points.CopyTo(uvMap);
The uvMap you get is a normalized depth-to-color mapping.
NOTE: if depth is aligned to color, the size of the vertices and UV map arrays is calculated from the color frame width and height instead.
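Following the Intel suggestion above, here is a minimal sketch (not from the original answer) of using that UV map to scatter the left-IR image into color space. It assumes Y8 infrared data already copied into a byte array; irData, colorWidth and the other names are illustrative, and the u/v field names on Points.TextureCoordinate may differ by SDK version.

// Scatter left-IR pixels into a color-sized image using the depth-to-color UV map.
// Depth and left IR are aligned by design, so index i addresses both frames.
byte[] MapIrToColor(byte[] irData, Points.TextureCoordinate[] uvMap,
                    int depthWidth, int depthHeight,
                    int colorWidth, int colorHeight)
{
    var irInColor = new byte[colorWidth * colorHeight];
    for (int i = 0; i < depthWidth * depthHeight; i++)
    {
        // normalized color coordinates for this depth/IR pixel
        int cx = (int)(uvMap[i].u * colorWidth);
        int cy = (int)(uvMap[i].v * colorHeight);
        if (cx >= 0 && cx < colorWidth && cy >= 0 && cy < colorHeight)
            irInColor[cy * colorWidth + cx] = irData[i]; // holes remain where depth is missing
    }
    return irInColor;
}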

Setting up an Orthographic Top View Projection in WebGL

I've studied the example here:
http://learningwebgl.com/lessons/lesson01/index.html
The matrix library in use is:
http://glmatrix.net/
So the example is in clip space and I understand that. I would like to set up an orthographic projection using this example as my base. The orthographic projection should have its center at (0,0,0), with the eye located somewhere at +Z and looking down at (0,0,0). When I enter coordinates for my 3D faces in the buffers, I would like to be able to enter those coordinates in model-space units. For example, for my mapping exhibit I have a volume of 10,000 units on each side (x from -5000 to +5000, y from -5000 to +5000, z from -5000 to +5000) that is to be projected onto the canvas, which is 500 x 500. So my 3D faces will have coordinates somewhere within that volume, and the 500 x 500 canvas will display all of it.
This is the same projection that a CAD program would use to start a scratch drawing. Does anyone know how to do this in WebGL with the glMatrix library? I am new to WebGL and I could really use some guidance on this topic.
For the sake of simplicity, you should first separate the camera transform matrix from its projection matrix. You can then multiply them to get the "view-projection matrix", which transforms world-space coordinates to screen space.
var cam = mat4.create();
var proj = mat4.create();
Start by placing your camera (the cam matrix):
mat4.translate( cam, cam, position )
mat4.rotate( ... )
mat4.lookAt( .. )
//...
Set up the orthographic projection (the proj matrix). You can think of the ortho projection as a box aligned with the camera; you can expand each of its sides with the six parameters. Everything inside the box will show on screen.
var ratio = screenWidth/screenHeight;
var halfWorldWidth = 5000.0;
// the near/far will depend on your camera position
mat4.ortho( proj,
    -halfWorldWidth,
    halfWorldWidth,
    -halfWorldWidth / ratio,
    halfWorldWidth / ratio,
    -50,
    50
)
Finally, get the view-projection matrix:
var view = mat4.create()
var viewProj = mat4.create()
// the view matrix is the inverse of the camera transform
mat4.invert( view, cam );
mat4.multiply( viewProj, proj, view );

Spritebatch.Begin() Transform Matrix

I have been wondering for a while about how the transform matrix in spriteBatch is implemented. I've created a 2D camera, and the transform matrix is as follows:
if (needUpdate)
{
    transformMatrix =
        Matrix.CreateTranslation(-Position.X, -Position.Y, 0) *
        Matrix.CreateScale(curZoom, curZoom, 1);
    needUpdate = false;
}
The camera works as well as I want, but I just want to know how the transformation is applied: does the transformation affect only the axes of the sprites, or the screen coordinates too?
Thanks in advance!
I see you've answered your own question, but to provide complete information - SpriteBatch provides a similar interface to the traditional world-view-projection system of transformations.
The SpriteBatch class has an implicit projection matrix that takes coordinates in the "client space" of the viewport ((0,0) at the top left, one unit per pixel) and puts them on screen.
The Begin call has an overload that accepts a transformation matrix, which is the equivalent of a view matrix used for moving the camera around.
And the Draw call, while not actually using a matrix, allows you to specify position, rotation, scale, etc - equivalent to a world matrix used for positioning a model in the scene (model space to world space).
So you start with your "model" equivalent - which for SpriteBatch is a quad (sprite) of the size of the texture (or source rectangle). When drawn, that quad is transformed to its world coordinates, then that is transformed to its view coordinates, and then finally that is transformed to its projection coordinates.
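To make that concrete, here is a minimal sketch (the texture and the numeric values are placeholders) showing where each level appears when the camera matrix from the question is used:

// "View": the camera's transformMatrix is passed to Begin.
spriteBatch.Begin(SpriteSortMode.Deferred, BlendState.AlphaBlend,
    null, null, null, null, transformMatrix);

// "World": position, rotation and scale are given per sprite in Draw.
spriteBatch.Draw(texture, new Vector2(100f, 200f), null, Color.White,
    MathHelper.PiOver4, Vector2.Zero, 1.5f, SpriteEffects.None, 0f);

spriteBatch.End();

// The implicit "projection" from client space (pixels) to the screen
// is applied by SpriteBatch itself.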
