Detect nose coordinates on a face profile view - opencv

In such tasks, I tend to use Mediapipe or Dlib to detect the landmarks of the face and get the specific coordinates I'm interested in working with.
But for a human face taken from a profile view, Dlib can't detect anything, and Mediapipe superimposes a standard 3D face mesh on top of the 2D image, which yields false coordinates.
I was wondering if anyone with Computer Vision (Image Processing) knowledge can guide me on how to detect the coordinates of points A & B from this image.
PS: the background color changes, and the face location is not standard.
Thanks in advance.

Your question seems a little unclear. If you just want (x, y) screen coordinates, you can use this answer to convert the (x, y, z) that Mediapipe gives you into just (x, y). If this doesn't work for you, I would recommend this repo or this one, both of which only work with 68 facial landmarks, but that should be sufficient for your use case.
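For the conversion itself, here is a minimal sketch assuming MediaPipe's Python face-mesh API; "face.jpg" is a placeholder path:

```python
import cv2
import mediapipe as mp

image = cv2.imread("face.jpg")  # placeholder path
h, w = image.shape[:2]

with mp.solutions.face_mesh.FaceMesh(static_image_mode=True) as face_mesh:
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    for lm in results.multi_face_landmarks[0].landmark:
        # x and y are normalized to [0, 1]; multiplying by the image size
        # gives pixel coordinates. z is relative depth and can simply be
        # dropped when only screen coordinates are needed.
        px, py = int(lm.x * w), int(lm.y * h)
```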
If all of this fails, I would recommend retraining HRNet on a dataset with profile views. I believe either the 300-W dataset or the 300-VW dataset provides some data with heads at extreme angles.
If you wish to get the 3D coordinates in camera coordinates (X, Y, Z), you're going to have to use solvePnP. This will require getting calibration info and model mesh points for whatever facial landmark detector you use. You can find some examples of this here.
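As a hedged sketch of the solvePnP step: the 3D model points below are the generic six-point head model that shows up in many head-pose tutorials, and the intrinsics are approximated from the image size; both are assumptions you must replace with your detector's model points and your real calibration.

```python
import numpy as np
import cv2

# Generic 3D head-model points (nose tip, chin, eye corners, mouth
# corners) as used in common head-pose tutorials; an assumption,
# not the model of any specific detector.
model_points = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye, left corner
    (225.0, 170.0, -135.0),    # right eye, right corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
])

# 2D detections of the same six landmarks (placeholder values).
image_points = np.array([
    (359.0, 391.0), (399.0, 561.0), (337.0, 297.0),
    (513.0, 301.0), (345.0, 465.0), (453.0, 469.0),
])

# Intrinsics roughly approximated from the image size, in lieu of a
# real calibration.
w, h = 640, 480
camera_matrix = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros((4, 1))  # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
# rvec and tvec map model coordinates into camera coordinates (X, Y, Z).
```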

Related

Estimating pose of one camera given another with known baseline

I am a beginner when it comes to computer vision, so I apologize in advance. Basically, the idea I am trying to code is that, given two cameras that simulate a multiple-baseline stereo system, I am trying to estimate the pose of one camera given the other.
Looking at the same scene, I would incorporate some noise into the pose of the second camera; given the clean image from camera 1, the slightly distorted/skewed image from camera 2, and the known baseline between the cameras, I would like to estimate the pose of camera 2. I have been reading up on homography matrices and the related implementation in OpenCV, but I am just trying to get some suggestions about possible approaches. Most of the applications of the homography matrix that I have seen involve stitching or overlaying images, but here I am looking for a six-degrees-of-freedom attitude of the camera.
It'd be great if someone could shed some light on these questions too: can an approach used for this be extended to more than two cameras? And is it also possible for both cameras to have some 'noise' in their pose, and yet recover the 6-dof attitude at every instant?
Let's clear up your question first. I guess you are looking for the pose of the camera relative to another camera location. This is described by a homography only for pure camera rotations; for general motion that includes translation, it is described by rotation and translation matrices. If the fields of view of the cameras overlap, the task can be solved with structure from motion, which still estimates only 5 dof: the translation is recovered only up to scale. If there is a chessboard with known dimensions in the cameras' field of view, you can easily solve for 6 dof by running a PnP algorithm. Of course, the cameras should be calibrated first. Finally, in 2008 Marc Pollefeys came up with an idea for estimating 6 dof from two moving cameras with non-overlapping fields of view, without using any chessboards. To give you more detail, please tell us a bit about the intended application.
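To make the up-to-scale point concrete, here is a self-contained sketch of the 5-dof case on synthetic data; the pinhole intrinsic matrix K and all poses are assumed values:

```python
import numpy as np
import cv2

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics

rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))  # 3D points in camera-1 frame

R_true, _ = cv2.Rodrigues(np.array([0.0, 0.2, 0.0]))   # camera-2 rotation
t_true = np.array([[1.0], [0.0], [0.0]])               # camera-2 translation

def project(P, K):
    """Project 3D points P (N, 3) to pixel coordinates (N, 2)."""
    q = (K @ P.T).T
    return q[:, :2] / q[:, 2:]

pts1 = project(X, K)
pts2 = project((R_true @ X.T + t_true).T, K)

E, _ = cv2.findEssentialMat(pts1, pts2, K)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
# R matches R_true, but t comes back as a unit vector: the true baseline
# length (here 1.0) cannot be recovered without extra information.
```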

Why are shape-indexed features so effective in face alignment?

I am implementing some face alignment algorithm recently. I have read the following papers:
Supervised descent method and its applications to face alignment
Face alignment by explicit shape regression
Face alignment at 3000 fps via regressing local binary features
All of these papers mention an important keyword: shape-indexed feature (or pose-indexed feature). This feature plays a key role in the face alignment process. I did not get the key point of this feature. Why is it so important?
A shape-indexed feature is a feature whose index gives some clue about the hierarchical structure of the shape it came from. So in face alignment, facial landmarks are extremely important, since they are the things that will be useful in successfully aligning the faces. But just taking facial landmarks into account throws away some of the structure inherent to a face. You know that the pupil is inside the iris, which is inside the eye. So a shape-indexed feature would do more than tell you that you are looking at a facial landmark; it would tell you that you are looking at a facial landmark inside another landmark inside another landmark. Because there are only a few features that are 3-nested like that, you can be more confident about aligning those correctly.
Here is a much older paper that explains some of this with simpler language (especially in the introduction): http://www.cs.ubc.ca/~lowe/papers/cvpr97.pdf
If you want to get shape-indexed features, you should first apply a similarity transform to the face landmarks in each image. The aim is to transform the original landmarks to a specific reference location, which could be the mean landmark configuration over all images, so that the landmarks of every image are at the same position.
Then you can extract local features around the relocated landmarks; these are shape-indexed features, because now the landmarks of every image form a fixed shape.
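A hedged sketch of that idea, in the spirit of the ESR/LBF papers; the function and all offsets are illustrative, not any paper's exact formulation:

```python
import numpy as np
import cv2

def shape_indexed_features(gray, landmarks, mean_shape, offsets):
    """gray: HxW uint8 image; landmarks, mean_shape: (N, 2) float32;
    offsets: (K, 2) sampling offsets defined in mean-shape coordinates."""
    # Similarity transform that maps the mean shape onto the current
    # landmark estimate (rotation + uniform scale + translation).
    M, _ = cv2.estimateAffinePartial2D(mean_shape, landmarks)
    feats = []
    for (x, y) in landmarks:
        for (dx, dy) in offsets:
            # Map each offset through the rotation/scale part of M, so
            # the sampling pattern rotates and scales with the face:
            # that is what makes the feature "shape-indexed".
            ox, oy = M[:, :2] @ np.array([dx, dy])
            u = int(np.clip(x + ox, 0, gray.shape[1] - 1))
            v = int(np.clip(y + oy, 0, gray.shape[0] - 1))
            feats.append(gray[v, u])
    return np.array(feats)
```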
I searched for hours to get the answer above, found a graduation thesis, and translated it, but I am not sure whether it's the right answer. In my opinion, it makes sense.

Best Facial Landmark that can be easily extracted from an NIR Image?

I'm playing with eye-gaze estimation using an IR camera. So far I have detected the two pupil center points as follows:
Detect the face using the Haar face cascade and set the ROI to the face.
Detect the eyes using the Haar eye cascade and label them as the left and right eye respectively.
Detect the pupil center by thresholding the eye region.
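For reference, a rough sketch of those three steps; the cascade files are the ones shipped with OpenCV, while the image path and threshold value are placeholders that would need tuning for NIR imagery:

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

gray = cv2.imread("nir_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
    face = gray[fy:fy + fh, fx:fx + fw]  # step 1: ROI = face
    for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face):  # step 2
        eye = face[ey:ey + eh, ex:ex + ew]
        # Step 3: the pupil is the darkest blob under NIR illumination;
        # 40 is a placeholder threshold.
        _, mask = cv2.threshold(eye, 40, 255, cv2.THRESH_BINARY_INV)
        m = cv2.moments(mask)
        if m["m00"] > 0:
            cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # pupil center
```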
So far I've tried to find the gaze direction using the Haar eye boundary region, but the Haar eye rect does not always contain the eye corner points, so the results were poor.
Then I tried to detect the eye corner points using GFTT, Harris corners, and FAST, but since I'm using an NIR camera the eye corner points are not clearly visible, so I can't get the exact corner positions. So I'm stuck here!
What else is the best feature that can be tracked easily from the face? I've heard about flandmark, but I think that will also not work on IR-captured images.
Is there any feature that can be extracted easily from face images? I've attached my sample output image.
I would suggest flandmark, even if your intuition is the opposite; I used it in my master's thesis (which was about head pose estimation, a related topic). And if the question is whether it will work with the example image you've provided, I think it might detect features properly, even on a grayscale image. I think flandmark probably converts the image to grayscale before applying a detector (as the Haar detector does). Moreover, it surprisingly works with low-resolution images, which is an advantage too (especially when you're saying the eye corners are not clearly visible). And flandmark can detect both eye corners, the mouth corners, and the nose tip (actually, I would not rely on the last one; from my experience, detecting the nose tip in a single image is quite noisy, but it works fine on an image sequence with some filtering, e.g. averaging or Kalman). If you decide to use that technique and it works, please let us know!

Correspondence between a set of 3D model points and their image projections

I have a set of 3-d points and some images with the projections of these points. I also have the focal length of the camera and the principal point of the images with the projections (resulting from previously done camera calibration).
Is there any way, given these parameters, to find the correspondence between the 3D points and the image projections automatically? I've looked through some OpenCV documentation, but so far I haven't found anything suitable. I'm looking for a method that does automatic labelling of the projections, and thus establishes the correspondence between them and the 3D points.
The question is not very clear, but I think you mean to say that you have the intrinsic calibration of the camera, but not its location and attitude with respect to the scene (the "extrinsic" part of the calibration).
This problem does not have a unique solution for a general 3d point cloud if all you have is one image: just notice that the image does not change if you move the 3d points anywhere along the rays projecting them into the camera.
If you have one or more images, you know everything about the 3D cloud of points (e.g. the points belong to an object of known shape and size, and are at known locations on it), and you have matched them to their images, then it is a standard "camera resectioning" problem: you just solve for the camera extrinsic parameters that make the 3D points project onto their images.
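A minimal synthetic check of that resectioning step, with an assumed intrinsic matrix: project known 3D points through a known pose, then recover the pose with OpenCV's solvePnP.

```python
import numpy as np
import cv2

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics
dist = np.zeros(4)  # assume no lens distortion

rng = np.random.default_rng(1)
pts3d = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))  # known 3D points

rvec_true = np.array([0.1, -0.2, 0.05])  # "unknown" extrinsics to recover
tvec_true = np.array([0.3, -0.1, 0.5])
pts2d, _ = cv2.projectPoints(pts3d, rvec_true, tvec_true, K, dist)

ok, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, dist)
# With correct correspondences, rvec/tvec reproduce rvec_true/tvec_true.
```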
If you have multiple images, you know that the scene is static while the camera is moving, and you can match "enough" 3D points to their images in each camera position, then you can solve for the camera poses up to scale. You may want to start from David Nister's and/or Henrik Stewenius's papers on solvers for calibrated cameras, and then look into "bundle adjustment".
If you really want to learn about this (vast) subject, Zisserman and Hartley's book is as good as any. For code, look into libmv, vxl, and the ceres bundle adjuster.

Find the position of a pattern/marker inside a photograph

I need to find a marker like the ones used in Augmented Reality.
Like this:
I have a solid background in algebra and calculus, but no experience whatsoever with image processing. My thing is PHP, SQL and stuff.
I just want this to work; I've read the theory behind it, but for me it's extremely hard to translate into code.
The main idea is to do this as a batch process, so no interactivity is needed. What do you suggest?
Input: the sample image.
Output: the coordinates and normal vector of the marker in 3D.
The use for this will be linking images that have the same marker in order to spatialize them; a primitive version of photosync, we could say. Just a carousel of pinned images, with the marker acting as the pin.
The reps given allowed me to post images, thanks.
You can always look at open-source libraries such as ARToolKit and see how they work, but generally, in order to get the 3D coordinates of the marker, you would need to:
Do the camera calibration.
Find the marker in the image, using local features for example.
Using the calibrated camera parameters and the 2D coordinates of the marker, approximate its 3D coordinates.
I've never implemented something similar myself, but I think this is the general concept you should apply; a sketch along these lines follows.
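As a concrete starting point, here is a hedged sketch using OpenCV's ArUco module (the classic 4.x interface; some names changed in 4.7+). The image path, marker side length, and calibration values are all placeholders:

```python
import numpy as np
import cv2

image = cv2.imread("photo.jpg")  # placeholder path
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
corners, ids, _ = cv2.aruco.detectMarkers(image, dictionary)

camera_matrix = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # placeholder
dist_coeffs = np.zeros((4, 1))
side = 0.05  # marker side length in meters (assumed)

# Marker corners in the marker's own plane (z = 0), matching the
# detector's corner order.
obj = np.array([[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]],
               dtype=np.float64) * side / 2

if ids is not None:
    img_pts = corners[0].reshape(4, 2).astype(np.float64)
    ok, rvec, tvec = cv2.solvePnP(obj, img_pts, camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)
    position = tvec.ravel()  # marker origin in camera coordinates
    normal = R[:, 2]         # marker plane normal in camera coordinates
```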
Your problem can be solved by perspective-n-point (PnP) camera pose estimation. When you can reasonably assume that all correspondences are correct, a linear algorithm should do.
Since the marker is planar, you can also recover the displacement from the homography between the model plane and the image plane (link). As usual, the best results are obtained with iterative algorithms (link).
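A sketch of that homography route; model_pts, image_pts, and K are placeholder values:

```python
import numpy as np
import cv2

model_pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float64)  # marker plane
image_pts = np.array([[320, 200], [420, 210], [415, 310], [315, 300]],
                     dtype=np.float64)  # detected corners (placeholder)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics

H, _ = cv2.findHomography(model_pts, image_pts)
num, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
# Up to four (R, t, n) candidates come back; keep the one that places the
# reconstructed points in front of the camera.
```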
