I'm trying to implement a head pose estimation algorithm and I'm using a Time-of-Flight camera. I need to detect the nose tip in the point cloud data I get from the camera.
Once I know where the nose tip is, I would sample the N nearest-neighbour points around it and do least-squares plane fitting on that part of the point cloud to retrieve the yaw and pitch angles.
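For reference, the plane-fitting step I describe can be sketched like this (a minimal sketch assuming the neighbourhood points are in an N×3 NumPy array; function and variable names are mine):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit: returns the unit normal of the plane
    minimising the squared distances to the points (N x 3 array)."""
    centroid = points.mean(axis=0)
    # The singular vector for the smallest singular value of the
    # centred cloud is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    return vt[-1]

def yaw_pitch_from_normal(n):
    """Yaw/pitch (radians) of a plane normal, camera looking down +z."""
    nx, ny, nz = n / np.linalg.norm(n)
    yaw = np.arctan2(nx, nz)
    pitch = np.arctan2(ny, nz)
    return yaw, pitch
```

Note the SVD-based fit minimises orthogonal distance to the plane, which behaves better for steep plane orientations than fitting z = ax + by + c.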
The nose detection should work for different head poses, not just for a full-frontal one.
I have implemented the plane fitting and that works fine, but I don't know how to detect the nose tip in the 3D data.
Any advice on how this could be done would be much appreciated.
Regards,
V.
I used to work with Kinect images, which have a limit on depth of z > 0.5 m; see the link below. I hope you don't have this restriction with your ToF camera. The nose is not a very pronounced object, but it can probably be detected using connected components on the depth image: you have to find it as a blob on an otherwise flat face. You can further confirm that it is a nose by comparing the face depth with the nose depth, and by checking the nose position relative to the face. This of course doesn't apply to non-frontal poses, where the nose has to be found differently.
I also suggest inverting your chain of processing (find the nose, then find the face): start by looking for the head first, as a larger object with possibly better depth contrast, and then for the nose. The head is well defined by its size and shape in 3D, and a 2D face detection can help too; you can then fit a raw head model into your 3D point cloud using a 3D similarity transform.
link to Kinect depth map
In such tasks, I tend to use Mediapipe or Dlib to detect the facial landmarks and get the specific coordinates I'm interested in working with.
But in the case of a human face taken from a profile view, Dlib can't detect anything, and Mediapipe shows me a standard 3D face mesh superimposed on top of the 2D image, which gives false coordinates.
I was wondering if anyone with computer vision (image processing) knowledge can guide me on how to detect the coordinates of points A & B in this image.
PS: The color of the background changes & also the face location is not standard.
Thanks in advance
Your question seems a little unclear. If you just want (x, y) screen coordinates, you can use this answer to convert the (x, y, z) that Mediapipe gives you into just (x, y). If that doesn't work for you, I would recommend this repo or this one; both work with only 68 facial landmarks, but that should be sufficient for your use case.
If all of this fails, I would recommend retraining HRNet on a dataset with profile views. I believe either the 300-W or the 300-VW dataset provides some data with heads at extreme angles.
If you wish to get the 3D coordinates (X, Y, Z) in the camera frame, you're going to have to use solvePnP. This will require the camera calibration info and the model mesh points for whatever facial landmark detector you use. You can find some examples of this here.
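For the first option above, converting Mediapipe's normalised landmarks to pixel coordinates is just a scale by the image size (a sketch; the function name is mine, the [0, 1] normalisation is Mediapipe's convention):

```python
def landmark_to_pixels(landmark_x, landmark_y, image_width, image_height):
    """Mediapipe landmark x/y are normalised to [0, 1]; scale by the
    image size to get integer pixel coordinates."""
    return int(landmark_x * image_width), int(landmark_y * image_height)

# e.g. a landmark at (0.5, 0.25) in a 640x480 image maps to (320, 120)
```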
I am tracking a moving vehicle with a stereo camera system. In both images I use background segmentation to get only the moving parts in the pictures, then put a rectangle around the biggest object.
Now I want to get the 3D coordinates of the center of the rectangle. The identified centers in the two 2D pictures are almost corresponding points (I know, not exactly). I did a stereo calibration with MATLAB, so I have the intrinsic parameters of both cameras and the extrinsic parameters of the stereo system.
OpenCV doesn't provide any function for doing this as far as I know and to be honest reading Zisserman didn't really help me, but maybe I am just blind to the obvious.
This should work:
1. For both cameras, compute a ray from the camera origin through the rectangle's center.
2. Convert the rays to world coordinates.
3. Compute the intersection of the two rays (or the closest point between them, in case they do not exactly intersect).
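Step 3 has a closed-form solution. A sketch in NumPy, assuming each ray is given as an origin plus a direction in world coordinates (names are mine):

```python
import numpy as np

def closest_point_between_rays(o1, d1, o2, d2):
    """Midpoint of the shortest segment between rays o1 + t1*d1 and
    o2 + t2*d2 (all 3-vectors); equals the intersection if they meet."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Solve [d1 -d2] [t1 t2]^T = o2 - o1 in the least-squares sense.
    A = np.column_stack([d1, -d2])
    t1, t2 = np.linalg.lstsq(A, o2 - o1, rcond=None)[0]
    p1 = o1 + t1 * d1
    p2 = o2 + t2 * d2
    return (p1 + p2) / 2
```

Taking the midpoint of the least-squares solution handles the realistic case where the two rays are skew and never actually intersect.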
I'm playing with eye gaze estimation using an IR camera. So far I have detected the two pupil center points as follows:
Detect the face using the Haar face cascade and set the ROI to the face.
Detect the eyes using the Haar eye cascade and label them as left and right eye respectively.
Detect the pupil center by thresholding the eye region.
So far I've tried to find the gaze direction using the Haar eye boundary region, but this Haar eye rect does not always show the eye corner points, so the results were poor.
Then I tried to detect the eye corner points using GFTT, Harris corners and FAST, but since I'm using an NIR camera the eye corner points are not clearly visible, so I can't get the exact corner positions. So I'm stuck here!
What else is a good feature that can be tracked easily on the face? I've heard about flandmark, but I think that also won't work on IR-captured images.
Is there any feature that can be extracted easily from the face images? Here I've attached my sample output image.
I would suggest flandmark, even if your intuition is the opposite - I used it in my master's thesis (which was about head pose estimation, a related topic). And if the question is whether it will work with the example image you've provided, I think it might detect the features properly, even on a grayscale image - flandmark probably converts the image to grayscale before applying the detector anyway (as the Haar detector does). Moreover, it surprisingly works with low-resolution images, which is an advantage too (especially since you say the eye corners are not clearly visible). flandmark can detect both eye corners, both mouth corners and the nose tip (actually, I would not rely on the last one: from my experience, detecting the nose tip on a single image is quite noisy, although it works fine on an image sequence with some filtering, e.g. averaging or a Kalman filter). If you decide to use this technique and it works, please let us know!
I have an application which extracts the basic keypoints of the face. This includes the corners of the eyes, the corners of the mouth, the nose, and the face border.
Is there any existing application or algorithm I can apply to detect the face angle (how much it is oriented to the left or to the right)?
You will have to detect the face plane in a 3D coordinate system. This involves finding the extrinsic parameters (rotation and translation) of the face plane with respect to the camera. Based on how the face plane is related to the image sensor plane, you can determine its orientation towards the right or the left.
There are numerous posts about extrinsic parameters in stackoverflow. A quick search would help you in finding the extrinsic parameters.
P.S.: You will have to calibrate your camera before you can find the extrinsic parameters.
I have a set of 3-d points and some images with the projections of these points. I also have the focal length of the camera and the principal point of the images with the projections (resulting from previously done camera calibration).
Is there any way to, given these parameters, find the automatic correspondence between the 3-d points and the image projections? I've looked through some OpenCV documentation but I didn't find anything suitable until now. I'm looking for a method that does the automatic labelling of the projections and thus the correspondence between them and the 3-d points.
The question is not very clear, but I think you mean to say that you have the intrinsic calibration of the camera, but not its location and attitude with respect to the scene (the "extrinsic" part of the calibration).
This problem does not have a unique solution for a general 3d point cloud if all you have is one image: just notice that the image does not change if you move the 3d points anywhere along the rays projecting them into the camera.
If you have one or more images, know everything about the 3D cloud of points (e.g. the points belong to an object of known shape and size and lie at known locations on it), and have matched them to their images, then it is a standard "camera resectioning" problem: you just solve for the camera extrinsic parameters that make the 3D points project onto their images.
If you have multiple images and you know that the scene is static while the camera is moving, and you can match "enough" 3d points to their images in each camera position, you can solve for the camera poses up to scale. You may want to start from David Nister's and/or Henrik Stewenius's papers on solvers for calibrated cameras, and then look into "bundle adjustment".
If you really want to learn about this (vast) subject, Zisserman and Hartley's book is as good as any. For code, look into libmv, vxl, and the ceres bundle adjuster.