I have code for head pose estimation which returns yaw, pitch and roll as angles. I need to test whether this code works correctly.
1) With these three angles as input, how can I draw a 3D line using OpenCV as below? If so, can someone provide a code snippet to draw a 3D line like this in my camera window?
2) Also, I need to test the accuracy of my pose code. Are there any datasets available for testing pose estimation? If so, can someone provide the links? Or are there any other ways to test the head pose accuracy?
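For the first question, here is a minimal sketch (not from the original post) of one way to draw yaw/pitch/roll as three projected axes on the camera frame. The Euler-angle order (Z-Y-X), the simple orthographic projection and the axis origin are assumptions that have to be matched to whatever convention the pose code uses.

```python
import cv2
import numpy as np

def draw_pose_axes(frame, yaw, pitch, roll, origin=(320, 240), length=100.0):
    """Draw X (red), Y (green), Z (blue) axes rotated by yaw/pitch/roll (in degrees)."""
    y, p, r = np.radians([yaw, pitch, roll])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p),  np.cos(p)]])
    Ry = np.array([[ np.cos(y), 0, np.sin(y)],
                   [0, 1, 0],
                   [-np.sin(y), 0, np.cos(y)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r),  np.cos(r), 0],
                   [0, 0, 1]])
    R = Rz @ Ry @ Rx                                      # assumed Z-Y-X Euler order
    axes = np.float32([[length, 0, 0], [0, length, 0], [0, 0, length]])
    colors = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]      # BGR: X red, Y green, Z blue
    for axis, color in zip(axes, colors):
        x3, y3, _ = R @ axis
        end = (int(origin[0] + x3), int(origin[1] - y3))  # simple orthographic projection
        cv2.line(frame, origin, end, color, 2)
    return frame
```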
Check out the PRIMA head pose estimation database. It is freely available and consists of 2790 face images of 15 people, with yaw and pitch varying from -90 to +90 degrees. Some people in the dataset wear glasses and skin colors vary. The background is neutral and the face has visible contrast against it, although the image resolution is quite small: 384 x 288.
In the case of PRIMA, each image is labeled with yaw and pitch angles, but not roll. You can read those two values by parsing the image file name, e.g. personne01157+15-30.jpg has a face oriented such that pitch = +15 degrees and yaw = -30 degrees. Here you have an example of how to parse the database file names (although it uses the old OpenCV API, you can reuse just the part which extracts and parses the file names).
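For illustration, a minimal sketch (my own, not the linked example) that pulls pitch and yaw out of a PRIMA-style file name:

```python
# Extract pitch and yaw from a PRIMA file name such as "personne01157+15-30.jpg".
import re

def parse_prima_name(filename):
    """Return (pitch, yaw) in degrees parsed from a PRIMA image file name."""
    m = re.search(r'([+-]\d+)([+-]\d+)\.jpg$', filename)
    if m is None:
        raise ValueError("Not a PRIMA-style file name: %s" % filename)
    return int(m.group(1)), int(m.group(2))

print(parse_prima_name("personne01157+15-30.jpg"))  # (15, -30)
```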
I have used this database for research purposes in my master's thesis; you only have to cite their paper in your work if you would like to use it.
Check out the following:
http://www.vision.ee.ethz.ch/~gfanelli/head_pose/head_forest.html#db
and
http://gi4e.unavarra.es/databases/hpdb/#!prettyPhoto
There are a number of calibration tutorials to calibrate camera images of chessboards in EMGU (OpenCV). They all end up calibrating and then undistorting an image for display. That's cool and all but I need to do machine vision where I am taking an image, identifying the location of a corner or blob or feature in the image and then translating the location of that feature in pixels into real world X, Y coordinates.
Pixel -> mm.
Is this possible with EMGU? If so, how? I'd hate to spend a bunch of time learning EMGU and then not be able to do this crucial function.
Yes, it's certainly possible; this is the "bread and butter" of OpenCV.
The calibration you are describing, in terms of removing distortions, is a prerequisite to this process. After which, the following applies:
The intrinsic calibration, or "camera matrix", is the first of two required matrices. The second is the extrinsic calibration of the camera, which is essentially the 6 DoF transform that describes the physical location of the sensor center relative to a coordinate reference frame.
All of the distortion coefficients, intrinsic and extrinsic calibrations are available from a single function in Emgu.CV: CvInvoke.CalibrateCamera. This process is best explained, I'm sure, by one of the many tutorials you have described.
After that it's as simple as CvInvoke.ProjectPoints to apply the transforms above and relate 3D coordinates to 2D pixel locations (strictly speaking, it projects 3D points into pixels; going from a pixel back to millimetres means intersecting the back-projected ray with a known plane, such as the calibration target plane).
The key to doing this successfully is providing comprehensive IInputArray objectPoints and IInputArray imagePoints to CvInvoke.CalibrateCamera. Be sure to cause "excitation" by using many images from many different perspectives.
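As a rough illustration of the whole pipeline, here is a sketch using the Python OpenCV API (the equivalent calls are exposed through CvInvoke in Emgu.CV). The chessboard layout (9x6 inner corners), the 25 mm squares and the file names are assumptions for illustration only; pixel-to-mm here is done on the plane of the calibration target.

```python
import cv2
import glob
import numpy as np

pattern = (9, 6)
square_mm = 25.0
board = np.zeros((pattern[0] * pattern[1], 3), np.float32)
board[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points = [], []
for path in glob.glob("calib_*.png"):                  # hypothetical calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(board)
        img_points.append(corners)

# Distortion coefficients, intrinsics and per-view extrinsics from one call.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Pixel -> mm on the chessboard plane for one view: undistort, then apply a
# homography from undistorted pixels to the board's (X, Y) coordinates in mm.
corners_ud = cv2.undistortPoints(img_points[0], K, dist, P=K)
H, _ = cv2.findHomography(corners_ud, board[:, :2])
feature_px = np.float32([[[640.0, 360.0]]])            # some detected feature (pixels)
feature_ud = cv2.undistortPoints(feature_px, K, dist, P=K)
world_xy_mm = cv2.perspectiveTransform(feature_ud, H)  # (X, Y) in mm on the board plane
print(world_xy_mm)
```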
A computed tomography device has a roentgen matrix of 20x500 dots with a resolution of 2 mm in each direction. This matrix rotates around a belt which transports the items to be analysed. A special reconstruction algorithm produces a 3D model of the items from many matrices captured from all 360 perspectives (one image per 1° angle).
The problem is that the reconstruction algorithm is very sensitive to the belt speed/position. Measuring the belt position requires quite complicated and expensive positioning sensors and very fine mechanics.
I wonder if it is possible to calculate the belt velocity from the roentgen image itself. It has a width of 40 mm, which should be sufficient for capturing the movement. The problem is that the movement is always in two directions - rotation and X (the belt). For those working in the CT area, are you aware of any applications/publications about such direct measurement of the belt/table velocity?
P.S.: It is not for a medical application.
Hmm, interesting idea.
Are you doing a full 180 degrees for the reconstruction? I'd go with the 0 and 180 degree cone beam images. They should be approximately the same, minus some non-linear stuff, artifacts, Poisson noise, and differences in 'shadows' and scattering due to perspective.
You could translate the 180° image along the negative x-axis, in the direction opposite to the movement. You could then subtract the images at suitable intervals along this axis. When the absolute value of the sum hits a minimum, the translation should be approximately the distance the object has moved between 0 and 180, as the mirror images partially cancel each other out.
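A rough sketch of that idea (my own, under the assumptions above): mirror the 180° view, slide it along the belt axis, and keep the shift where the absolute difference over the overlap is smallest. Whether the 180° view really is a horizontal mirror of the 0° view depends on the actual scanner geometry, so treat that as an assumption.

```python
import numpy as np

def estimate_belt_shift(img0, img180, max_shift=200):
    """Return the x-shift (in pixels) that best cancels img0 against the mirrored img180."""
    mirrored = img180[:, ::-1].astype(np.float32)   # assumes a horizontal-mirror geometry
    ref = img0.astype(np.float32)
    best_shift, best_cost = 0, np.inf
    for s in range(max_shift):
        a = ref[:, s:]                              # translate along the belt (x) axis
        b = mirrored[:, :mirrored.shape[1] - s]
        cost = np.mean(np.abs(a - b))               # mean absolute difference of the overlap
        if cost < best_cost:
            best_cost, best_shift = cost, s
    return best_shift                               # pixels moved between the 0 and 180 views
```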
This could obviously be ruined by artifacts and wonkily shaped heavy objects. Still, worth a try. I'm assuming your voltage is pretty high if you are doing industrial stuff.
EDIT: "A special reconstruction algorithm produces 3D model of the items from many-many matixes captured from all 360 perspectives ( one image per 1° angle)."
So yes, you are using 180+ degrees. You could then perhaps use multiple opposite images for a more robust routine. How do you get the full circle? Are you shooting through the belt?
I am a beginner when it comes to computer vision, so I apologize in advance. Basically, the idea I am trying to code is this: given two cameras that can simulate a multiple-baseline stereo system, I am trying to estimate the pose of one camera given the other.
Looking at the same scene, I would incorporate some noise in the pose of the second camera, and given the clean image from camera 1 and the slightly distorted/skewed image from camera 2, I would like to estimate the pose of camera 2 from this data as well as the known baseline between the cameras. I have been reading up about homography matrices and the related implementation in OpenCV, but I am just trying to get some suggestions about possible approaches. Most of the applications of the homography matrix that I have seen talk about stitching or overlaying images, but here I am looking to recover a six-degrees-of-freedom attitude of the camera from it.
It'd be great if someone can shed some light on these questions too: Can an approach used for this be extended to more than two cameras? And is it also possible for both the cameras to have some 'noise' in their pose, and yet recover the 6dof attitude at every instant?
Let's clear up your question first. I guess you are looking for the pose of the camera relative to another camera location. This is described by a homography only for pure camera rotations. For general motion that includes translation, it is described by rotation and translation matrices. If the fields of view of the cameras overlap, the task can be solved with structure from motion, which still estimates only 5 DoF; this means the translation is estimated only up to scale. If there is a chessboard with known dimensions in the cameras' field of view, you can easily solve for 6 DoF by running a PnP algorithm; of course, the cameras should be calibrated first. Finally, in 2008 Marc Pollefeys came up with an idea of how to estimate 6 DoF from two moving cameras with non-overlapping fields of view without using any chessboards. To give you more detail, please tell us a bit about the intended application.
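For the PnP route, here is a minimal sketch assuming a 9x6 chessboard with 25 mm squares is visible to camera 2; the intrinsics K and the distortion coefficients below are placeholder values that would normally come from a prior calibration.

```python
import cv2
import numpy as np

pattern = (9, 6)
square = 0.025                                              # square size in metres (assumed)
board = np.zeros((pattern[0] * pattern[1], 3), np.float32)
board[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

K = np.array([[800., 0., 640.], [0., 800., 360.], [0., 0., 1.]])   # assumed intrinsics
dist = np.zeros(5)                                                  # assumed no distortion

img = cv2.imread("camera2_frame.png", cv2.IMREAD_GRAYSCALE)        # hypothetical frame
found, corners = cv2.findChessboardCorners(img, pattern)
if found:
    ok, rvec, tvec = cv2.solvePnP(board, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)           # board pose in camera 2's frame
    cam_R = R.T                          # rotation of camera 2 in the board's frame
    cam_t = -R.T @ tvec                  # position of camera 2 in the board's frame
    print(cam_R, cam_t)                  # full 6 DoF of camera 2
```

Doing the same for camera 1 and composing the two board-relative poses gives the relative 6 DoF between the cameras, which also extends naturally to more than two cameras.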
This is the setup: A fairly large room with 4 fish-eye cameras mounted on the ceiling. There are no blind spots. Each camera coverage overlaps a little with the other.
The idea is to track people across these cameras. As of now a blob-extraction algorithm is in place, which detects people as blobs. It's a fairly decently working algorithm which detects individual people pretty well. I am using the OpenCV API for all of this.
What I mean by tracking people is this: say camera 1 identifies two people, Person A and Person B. Now, as these two people move from the coverage of camera 1 into the overlapping area of coverage of cam1 and cam2, and then into the area covered only by cam2, cam2 should be able to identify them as the same people A and B that cam1 identified.
This is what I thought I'd do -
1) The cameras render images at 15 fps and I think the frame dimensions are 1920x1920.
2) Identify blobs individually in each camera and give each blob a unique label.
3) Now, as for the overlaps - compute an affine transformation matrix which maps pixels in one camera's frame onto another camera's frame. This needn't be done for every frame - it can be done before the whole process starts, as a pre-processing step. Then in real time, whenever I detect a blob in the overlapping area, all I have to do is apply the transformation matrix to the pixels in cam1, see if there is a corresponding blob in cam2, and give them the same label (a sketch of this step is below).
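A minimal sketch of that pre-computed mapping step, with placeholder point correspondences picked offline from the overlap region. An affine transform needs three point pairs; a homography from four or more pairs would be applied the same way.

```python
import cv2
import numpy as np

# Three corresponding points in the overlap region, picked offline (placeholder values).
pts_cam1 = np.float32([[1500, 200], [1700, 400], [1550, 900]])
pts_cam2 = np.float32([[100, 180], [320, 390], [160, 880]])
M = cv2.getAffineTransform(pts_cam1, pts_cam2)      # 2x3 affine matrix, computed once

def map_blob_to_cam2(centroid_cam1):
    """Map a blob centroid (x, y) seen in cam1's overlap area into cam2 pixel coordinates."""
    pt = np.float32([[centroid_cam1]])               # shape (1, 1, 2) for cv2.transform
    return cv2.transform(pt, M)[0, 0]

print(map_blob_to_cam2((1600, 500)))                 # then match to the nearest cam2 blob
```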
So, questions:
1) Would this approach give me a decently working system which tracks people?
2) For the affine transform, do I have to convert the fish-eye images to rectilinear images? (My answer is yes, but I am not too sure.)
Please feel free to point out possible errors and why certain things might not work in the process I've described. Also alternate suggestions are welcome! TIA
1- Blob extraction is not enough to track a specific object. For the people case I suggest HOG, or at least background subtraction before blob extraction, since all of the cameras look at static scenes.
2- OpenCV <= 2.4.9 uses the pinhole model for stereo vision, so before any calibration with OpenCV methods your fisheye images must be converted to rectilinear images first. You might also try calibrating them yourself using other approaches.
Release 3.0.0 will have support for a fisheye model. It is in alpha stage, but you can still download it and give it a try.
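For illustration, a minimal sketch of fisheye-to-rectilinear conversion with the fisheye model added in OpenCV 3.0. K and D below are placeholders; in practice they come from cv2.fisheye.calibrate run on checkerboard views of each ceiling camera.

```python
import cv2
import numpy as np

K = np.array([[600., 0., 960.], [0., 600., 960.], [0., 0., 1.]])   # assumed intrinsics
D = np.array([[0.05], [-0.01], [0.0], [0.0]])                      # assumed k1..k4

frame = cv2.imread("cam1_frame.png")                 # hypothetical 1920x1920 frame
size = (frame.shape[1], frame.shape[0])
map1, map2 = cv2.fisheye.initUndistortRectifyMap(K, D, np.eye(3), K, size, cv2.CV_16SC2)
rectilinear = cv2.remap(frame, map1, map2, cv2.INTER_LINEAR)
```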
I have a stereo pair,
photo 1: http://savepic.org/1671682.jpg
photo 2: http://savepic.org/1667586.jpg
There is a coordinate system in each image. How can I find the coordinates of point A in this system using the OpenCV library? It would be nice to see sample code.
I've looked at opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html but haven't found it (or haven't understood it :) ).
Your 'stereo' images are fine. What you have already done is solve the correspondence problem: in both images you have indicated point 'A'. This means that you know which pixels correspond to each other, both labeling point 'A'.
What you want to do is triangulate the 3D position of point 'A'. You can only do this by first calibrating your cameras. This is already available inside OpenCV.
http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
This gives you the exact ray of light for each pixel, and the optical center of your cameras through which the ray passes. Moreover, you need stereo calibration. This establishes the orientation and position of each camera with respect to the other.
From that point on, your triangulation is simple, knowing the pixel location in both images of point 'A'. You have
Location and orientation of camera 1 and camera 2
Optical ray vector (from the pixel location) from each camera towards label 'A'.
So you have 2 locations in space, and 2 rays from these locations. The intersection of these rays is your 3D answer.
Note that in practice these rays will never exactly intersect (two lines in 3D rarely do), so you need to approximate. Use the OpenCV function triangulatePoints(), with the stereo calibration results and the pixel locations of label 'A' as input.
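For illustration, a minimal sketch of that triangulation with triangulatePoints(). The intrinsics, the relative pose and the pixel coordinates below are placeholder values that would come from your stereo calibration and from the labeled point 'A'.

```python
import cv2
import numpy as np

K = np.array([[1000., 0., 640.], [0., 1000., 480.], [0., 0., 1.]])  # assumed intrinsics
R = np.eye(3)                               # assumed relative rotation (from stereoCalibrate)
T = np.array([[0.1], [0.0], [0.0]])         # assumed 10 cm baseline along x

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # projection matrix of camera 1
P2 = K @ np.hstack([R, T])                           # projection matrix of camera 2

# Pixel coordinates of point 'A' in each image (2xN arrays, here N = 1).
ptsA_1 = np.array([[700.0], [500.0]])
ptsA_2 = np.array([[650.0], [505.0]])

X_h = cv2.triangulatePoints(P1, P2, ptsA_1, ptsA_2)  # 4x1 homogeneous coordinates
X = (X_h[:3] / X_h[3]).ravel()                        # 3D position of 'A' in camera 1's frame
print(X)
```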
First of all, this is not truly a stereo pair. A good stereo pair usually needs 60%-80% overlap and small rotation differences between the images. Even if this pair had the necessary base to be a good stereo pair, due to the extreme kappa rotation the resulting epipolar image would be useless.
Secondly, among other things, you should take a look at camera calibration and the collinearity equations, both supported by OpenCV:
http://en.wikipedia.org/wiki/Camera_resectioning
http://en.wikipedia.org/wiki/Collinearity_equation
You need to understand the maths.
If the page isn't enough, then you should look at the OpenCV book - it devotes a couple of chapters to this. There are also a number of textbooks that cover it in more detail.