Normalize the Kinect Skeleton Joints - image-processing

I'm working on a project to recognize people by their skeleton using Microsoft Kinect SDK.
The problem is that the skeleton size grows as the person moves towards the Kinect sensor. I need to build my system in a way to be independent of the position of the person. I don't know how to solve this problem.
Some related works say " the skeleton should be normalized in the time domain." , I don't know what that means!
Any advice would be appreciated.
Thank you

I believe the normalization methods depends on the reason you need it. Maybe you can provide more information on why you would need a normalized skeleton.
Here are some methods that I can think of:
Compute a 3D bounding box (AABB) for the skeleton joints. Find the axis where the AABB is the largest and calculate a scale factor so that size would be 1.0. You can then apply this scaling factor on all other axes.
Maybe you want skeleton joints independent of their position. In this case, you can use a joint (e.g. spine) and consider that the origin of your skeleton. Then, compute every other joint position as:
newjointPosition = oldjointPosition - spinePosition;
If you need a more specific answer, please tell us more about your app and requirements.

Related

Finding displacement between two camera frames

I'm currently working on a visual odometry project. Currently I've implemented up to Essential Matrix decomposition stage. But the resulting translation vector is normalized and cannot be able to plot the movement.
Now how can I compute the displacement in some scale? I have seen some suggestions to use planner homography to compute the absolute translation. I didn't got the idea of doing it as, the outdoor environment is not simply planner. At least, by considering the ground as planner, how to obtain, the translation of it. I've seen a suggestion here. Is it possible to use this approach to get the displacement between two frames?
What you are referring to is called registration. This is a vast field. There are methods for linear transformation across the entire image, and per pixel methods ( the two ends of the spectrum). Naturally per pixel methods are far slower typically and have many local errors.
Typically two frames have very little transformation between them and simple Homography will do to find the general scaling between them. Especially if you are talking about aerial photos. If your case is very far from planar then you may want to use something closer to pixel-wise. For example using spline fitting: https://www.mathworks.com/matlabcentral/fileexchange/20057-b-spline-grid--image-and-point-based-registration
You cannot recover scale, generally speaking, unless you can recognize in the scene 1 or more objects of known physical size.

How to merge two 3D point clouds where cameras are at fixed position.

I have four cameras which are at fixed position. So I can measure the distance (even rotation) among them using a ruler (physically). Camera one and two gives me a point cloud and camera three and four gives me another point cloud. I need to merge these point clouds.
As far I have understood that ICP and other such algorithms do a rigid transformation of one point cloud to match the other. My question is how can I use the extra knowledge (distance between the cameras in centimeter) to do this transformation.
I am quite new to such work so, please correct me if I misunderstood something. And thanks in advance.
First, what kind of accuracy are you looking for, and over what volume of space? Achieving 0.1 mm registration accuracy over a 0.5 m tabletop scene is a completely different problem (in terms of mechanical design and constraints) than a few mm over a floor tens of meters wide.
Generally speaking, with a well reconstructed and unambiguous object shape, ICP will always give you a better solution than manual measurements.
If the cameras are static, then what you have is really a calibration problem, and you need to calibrate your 4-camera rig only at setup and when its configuration changes for whatever reason.
I suggest using a calibration object of precisely known size and geometry, e.g. a machined polyhedron. You can generate and ICP-register point clouds for it, then fit the merged cloud to the known geometry, thus obtaining position and orientation of every individual point cloud with respect to the fixed object. From these you can work out the poses of the cameras w.r.t. each other.

How align(register) and merge point clouds to get full 3d model?

I want to get 3d model of some real word object.
I have two web cams and using openCV and SBM for stereo correspondence I get point cloud of the scene, and filtering through z I can get point cloud only of object.
I know that ICP is good for this purprose, but it needs point clouds to be initally good aligned, so it is combined with SAC to achieve better results.
But my SAC fitness score it too big smth like 70 or 40, also ICP doesn't give good results.
My questions are:
Is it ok for ICP if I just rotate the object infront of cameras for obtaining point clouds? What angle of rotation must be to achieve good results? Or maybe there are better way of taking pictures of the object for getting 3d model? Is it ok if my point clouds will have some holes? What is maximal acceptable fitness score of SAC for good ICP, and what is maximal fitness score of good ICP?
Example of my point cloud files:
https://drive.google.com/file/d/0B1VdSoFbwNShcmo4ZUhPWjZHWG8/view?usp=sharing
My advice and experience is that you already have rgb images or grey. ICP is an good application for optimising the point cloud, but has some troubles aligning them.
First start with rgb odometry (through feature points aligning the point cloud (rotated from each other)) then use and learn how ICP works with the already mentioned point cloud library. Let rgb features giving you a prediction and then use ICP to optimize that when possible.
When this application works think about good fitness score calculation. If that all works use the trunk version of ICP and optimize the parameter. After this all been done You have code that is not only fast, but also with the a low error of going wrong.
The following post is explain what went wrong.
Using ICP, we refine this transformation using only geometric information. However, here ICP decreases the precision. What happens is that ICP tries to match as many corresponding points as it can. Here the background behind the screen has more points that the screen itself on the two scans. ICP will then align the clouds to maximize the correspondences on the background. The screen is then misaligned
https://github.com/introlab/rtabmap/wiki/ICP

Extrinsic Camera Calibration Using OpenCV's solvePnP Function

I'm currently working on an augmented reality application using a medical imaging program called 3DSlicer. My application runs as a module within the Slicer environment and is meant to provide the tools necessary to use an external tracking system to augment a camera feed displayed within Slicer.
Currently, everything is configured properly so that all that I have left to do is automate the calculation of the camera's extrinsic matrix, which I decided to do using OpenCV's solvePnP() function. Unfortunately this has been giving me some difficulty as I am not acquiring the correct results.
My tracking system is configured as follows:
The optical tracker is mounted in such a way that the entire scene can be viewed.
Tracked markers are rigidly attached to a pointer tool, the camera, and a model that we have acquired a virtual representation for.
The pointer tool's tip was registered using a pivot calibration. This means that any values recorded using the pointer indicate the position of the pointer's tip.
Both the model and the pointer have 3D virtual representations that augment a live video feed as seen below.
The pointer and camera (Referred to as C from hereon) markers each return a homogeneous transform that describes their position relative to the marker attached to the model (Referred to as M from hereon). The model's marker, being the origin, does not return any transformation.
I obtained two sets of points, one 2D and one 3D. The 2D points are the coordinates of a chessboard's corners in pixel coordinates while the 3D points are the corresponding world coordinates of those same corners relative to M. These were recorded using openCV's detectChessboardCorners() function for the 2 dimensional points and the pointer for the 3 dimensional. I then transformed the 3D points from M space to C space by multiplying them by C inverse. This was done as the solvePnP() function requires that 3D points be described relative to the world coordinate system of the camera, which in this case is C, not M.
Once all of this was done, I passed in the point sets into solvePnp(). The transformation I got was completely incorrect, though. I am honestly at a loss for what I did wrong. Adding to my confusion is the fact that OpenCV uses a different coordinate format from OpenGL, which is what 3DSlicer is based on. If anyone can provide some assistance in this matter I would be exceptionally grateful.
Also if anything is unclear, please don't hesitate to ask. This is a pretty big project so it was hard for me to distill everything to just the issue at hand. I'm wholly expecting that things might get a little confusing for anyone reading this.
Thank you!
UPDATE #1: It turns out I'm a giant idiot. I recorded colinear points only because I was too impatient to record the entire checkerboard. Of course this meant that there were nearly infinite solutions to the least squares regression as I only locked the solution to 2 dimensions! My values are much closer to my ground truth now, and in fact the rotational columns seem correct except that they're all completely out of order. I'm not sure what could cause that, but it seems that my rotation matrix was mirrored across the center column. In addition to that, my translation components are negative when they should be positive, although their magnitudes seem to be correct. So now I've basically got all the right values in all the wrong order.
Mirror/rotational ambiguity.
You basically need to reorient your coordinate frames by imposing the constraints that (1) the scene is in front of the camera and (2) the checkerboard axes are oriented as you expect them to be. This boils down to multiplying your calibrated transform for an appropriate ("hand-built") rotation and/or mirroring.
The basic problems is that the calibration target you are using - even when all the corners are seen, has at least a 180^ deg rotational ambiguity unless color information is used. If some corners are missed things can get even weirder.
You can often use prior info about the camera orientation w.r.t. the scene to resolve this kind of ambiguities, as I was suggesting above. However, in more dynamical situation, of if a further degree of automation is needed in situations in which the target may be only partially visible, you'd be much better off using a target in which each small chunk of corners can be individually identified. My favorite is Matsunaga and Kanatani's "2D barcode" one, which uses sequences of square lengths with unique crossratios. See the paper here.

Determine movement/motion (in pixels) between two frames

First of all I'm a total newbie in image processing, so please don't be too harsh on me.
That being said, I'm developing an application to analyse changes in blood flow in extremities using thermal images obtained by a camera. The user is able to define a region of interest by placing a shape (circle,rectangle,etc.) on the current image. The user should then be able to see how the average temperature changes from frame to frame inside the specified ROI.
The problem is that some of the images are not steady, due to (small) movement by the test subject. My question is how can I determine the movement between the frames, so that I can relocate the ROI accordingly?
I'm using the Emgu OpenCV .Net wrapper for image processing.
What I've tried so far is calculating the center of gravity using GetMoments() on the biggest contour found and calculating the direction vector between this and the previous center of gravity. The ROI is then translated using this vector but the results are not that promising yet.
Is this the right way to do it or am I totally barking up the wrong tree?
------Edit------
Here are two sample images showing slight movement downwards to the right:
http://postimg.org/image/wznf2r27n/
Comparison between the contours:
http://postimg.org/image/4ldez2di1/
As you can see the shape of the contour is pretty much the same, although there are some small differences near the toes.
Seems like I was finally able to find a solution for my problem using optical flow based on the Lukas-Kanade method.
Just in case anyone else is wondering how to implement it in Emgu/C#, here's the link to a Emgu examples project, where they use Lukas-Kanade and Farneback's algorithms:
http://sourceforge.net/projects/emguexample/files/Image/BuildBackgroundImage.zip/download
You may need to adapt a few things, e.g. the parameters for the corner detection (the frame.GoodFeaturesToTrack(..) method) , but it's definetly something to start with.
Thanks for all the ideas!

Resources