I'm experimenting with eye gaze estimation using an IR camera. So far I have detected the two pupil centers as follows:
Detect the face using the Haar face cascade and set the ROI to the face.
Detect the eyes using the Haar eye cascade and label them as the left and right eye respectively.
Detect the pupil center by thresholding the eye region and locating the center of the dark pupil blob.
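The last step can be sketched as follows. This is only a minimal illustration, assuming a dark-pupil image (the pupil is the darkest blob in the eye region) that is already available as a 2D list of grayscale values. The function name `pupil_center` and the threshold of 50 are hypothetical choices, not taken from the question.

```python
def pupil_center(gray, threshold=50):
    """Return the (x, y) centroid of pixels darker than `threshold`, or None.

    `gray` is a 2D list of 0-255 intensities covering the eye region only.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            if value < threshold:  # dark pixel, assumed to belong to the pupil
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # nothing darker than the threshold was found
    return (sum_x / count, sum_y / count)


# Synthetic 7x7 eye patch: bright background with a dark 3x3 "pupil".
eye = [[200] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(3, 6):
        eye[y][x] = 10

center = pupil_center(eye)
print(center)
```

On real frames the threshold usually has to be tuned per camera and illumination, and eyelashes or shadows can bias the centroid.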
I first tried to find the gaze direction using the Haar eye bounding rectangle, but that rectangle does not always line up with the eye corner points, so the results were poor.
Then I tried to detect the eye corner points using GFTT, Harris corners and FAST, but since I'm using an NIR camera the eye corners are not clearly visible, so I can't get the exact corner positions. So I'm stuck here!
What other feature can be tracked easily on the face? I have heard about flandmark, but I think it will not work on IR images either.
Is there any feature that can be extracted easily from face images? I've attached my sample output image.
I would suggest flandmark, even if your intuition says otherwise. I used it in my master's thesis (which was about head pose estimation, a related topic). As for whether it will work with the example image you've provided, I think it might detect the features properly, even on a grayscale image. I think flandmark probably converts the image to grayscale before applying its detector anyway (as the Haar detector does). Moreover, it works surprisingly well on low-resolution images, which is an advantage too (especially since you say the eye corners are not clearly visible). flandmark can detect both eye corners, the mouth corners and the nose tip (I would not rely on the last one: in my experience, nose tip detection on a single image is quite noisy, although it works fine on an image sequence with some filtering, e.g. averaging or a Kalman filter). If you decide to use this technique and it works, please let us know!
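The filtering mentioned for noisy landmarks can be as simple as a moving average over recent frames. The sketch below assumes each frame yields one (x, y) detection for the landmark. The class name `LandmarkSmoother` and the window size are illustrative and not part of flandmark.

```python
from collections import deque


class LandmarkSmoother:
    """Moving average over the last `window` (x, y) detections of one landmark."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)  # old detections drop out automatically

    def update(self, point):
        """Add the newest detection and return the smoothed position."""
        self.history.append(point)
        n = len(self.history)
        return (sum(p[0] for p in self.history) / n,
                sum(p[1] for p in self.history) / n)


smoother = LandmarkSmoother(window=3)
noisy_nose_tip = [(100, 50), (104, 48), (96, 52), (130, 50)]  # last frame jumps
smoothed = [smoother.update(p) for p in noisy_nose_tip]
print(smoothed)
```

A larger window gives a steadier point but adds lag. A Kalman filter is the usual next step when the landmark moves quickly.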
I've been using Vision to identify Facial Landmarks, using VNDetectFaceLandmarksRequest.
It seems that whenever a face is detected, the resulting VNFaceObservation will always contain all the possible landmarks, and have positions for all of them. It also seems that positions returned for the occluded landmarks are 'guessed' by the framework.
I have tested this using a photo where the subject's face is turned to the left, and the left eye thus isn't visible. Vision returns a left eye landmark, along with a position.
The same thing happens with the mouth and nose of a subject wearing an N95 face mask, or the eyes of someone wearing opaque sunglasses.
While this can be a useful feature for other use cases, is there a way, using Vision or CIDetector, to figure out which face landmarks are actually visible in a photo?
I also tried using CIDetector, but it appears to be able to detect mouths and smiles through N95 masks, so it doesn't seem to be a reliable alternative.
After confirmation from Apple, it appears it simply cannot be done.
If Vision detects a face, it will guess some occluded landmarks' positions, and there is no way to differentiate actually detected landmarks from guesses.
For those facing the same issue, a partial workaround is to compare the positions of a landmark's points with those of the median line and nose crest points.
While this can help determine if a facial landmark is occluded by the face itself, it won't help with facial landmarks occluded by opaque sunglasses or face masks.
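One way to read that workaround is sketched below, in Python for brevity rather than Swift. It assumes the points of the `leftEye`, `rightEye` and `medianLine` regions have already been exported as (x, y) tuples. The helper names and the `min_ratio` threshold are hypothetical and would need tuning on real data.

```python
def mean_x(points):
    """Average horizontal position of a list of (x, y) points."""
    return sum(x for x, _ in points) / len(points)


def likely_self_occluded(eye_points, other_eye_points, median_points, min_ratio=0.35):
    """Guess whether an eye is hidden by the face itself.

    Compares each eye's horizontal distance from the median line. An eye that
    sits much closer to the median line than the other eye is assumed to be
    on the side of the face that is turned away from the camera.
    `min_ratio` is an illustrative threshold, not a value provided by Vision.
    """
    mid = mean_x(median_points)
    eye_dist = abs(mean_x(eye_points) - mid)
    other_dist = abs(mean_x(other_eye_points) - mid)
    if other_dist == 0:
        return False
    return eye_dist / other_dist < min_ratio


# Frontal face: both eyes are about equally far from the median line.
frontal = likely_self_occluded([(0.33, 0.6), (0.37, 0.6)],
                               [(0.63, 0.6), (0.67, 0.6)],
                               [(0.5, 0.2), (0.5, 0.8)])

# Turned face: the far eye collapses towards the median line.
turned = likely_self_occluded([(0.26, 0.6), (0.28, 0.6)],
                              [(0.53, 0.6), (0.57, 0.6)],
                              [(0.3, 0.2), (0.3, 0.8)])
print(frontal, turned)
```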
In such tasks, I tend to use MediaPipe or Dlib to detect the landmarks of the face and get the specific coordinates I'm interested in.
But for a human face taken from a profile view, Dlib can't detect anything, and MediaPipe shows me a standard 3D face mesh superimposed on top of the 2D image, which gives false coordinates.
I was wondering if anyone with computer vision (image processing) knowledge could guide me on how to detect the coordinates of points A and B in this image.
PS: The color of the background changes and the face location is not fixed.
Thanks in advance
Your question seems a little unclear. If you just want (x, y) screen coordinates, you can use this answer to convert the (x, y, z) that MediaPipe gives you to just (x, y). If this doesn't work for you, I would recommend this repo or this one, which both only work with 68 facial landmarks, but this should be sufficient for your use case.
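The linked answer is not reproduced here. A common approach is sketched below, assuming the landmarks come with x and y normalized to [0, 1] by image width and height (as MediaPipe's face landmarks are): scale by the image size and drop z. The function name `to_pixel_coords` is illustrative.

```python
def to_pixel_coords(landmarks, image_width, image_height):
    """Convert normalized (x, y, z) landmarks to integer (x, y) pixel coordinates."""
    pixels = []
    for x, y, _z in landmarks:
        # Clamp so that landmarks predicted slightly outside the frame stay valid.
        px = max(0, min(int(x * image_width), image_width - 1))
        py = max(0, min(int(y * image_height), image_height - 1))
        pixels.append((px, py))
    return pixels


points = to_pixel_coords([(0.5, 0.25, -0.02), (1.0, 1.0, 0.0)], 640, 480)
print(points)
```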
If all of this fails, I would recommend retraining HRNet on a dataset with profile views. I believe either the 300-W or the 300-VW dataset provides some data with heads at extreme angles.
If you wish to get 3D coordinates in the camera frame (X, Y, Z), you're going to have to use solvePnP. This requires the camera calibration info and the model mesh points for whatever facial landmark detector you use. You can find some examples for some of this here
I have been asked to implement calibration for an eye-tracking algorithm. However, I still don't really understand how calibration helps make gaze estimation more accurate, or how calibration in eye tracking actually works. I have read https://www.tobiidynavox.com/support-training/eye-tracker-calibration/, as well as https://developer.tobii.com/community/forums/topic/explain-calibration/, but I still don't fully understand it. I would appreciate it if somebody could explain it to me.
Thank you
In the answer below, I assume that you are referring to standard pupil-centre corneal-reflection video-oculography rather than any other form of eye tracking technology.
In eye tracking, calibration is the process that transforms the coordinates of features located in a two dimensional still video frame of the eye into gaze coordinates (i.e. coordinates that are related to the world being observed). For example, let's say your eye tracker produces a 400 × 400 pixel image of the eye, and the subject is looking at a screen that is 1024 × 768 pixels in size, some distance in front of them. The calibration process needs to relate coordinates in the eye image to where the person is looking (i.e. gazing) at on the display screen. This process is not trivial: just because the pupil is centred in the eye image does not mean that the person is looking at the centre of the display in the world, for example. And the position of the pupil centre could move within the eye image even though the direction of gaze is held constant in the world. This is why we track the centre of the pupil and the corneal reflection, as the vector linking the two is robust to translation of the eye within the image that occurs in the absence of a gaze rotation.
A standard way to do this mapping is via relatively straightforward 2D non-linear regression: you move a target to a series of known coordinates on the display and ask the participant to fixate steadily on each one, while recording the location of the pupil centre and corneal reflection in the eye image. The calibration process then maps the vector linking the pupil centre and the corneal reflection to the corresponding known gaze coordinates. This produces a regression solution that allows you to map intermediate locations to their interpolated gaze coordinates.
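As a rough sketch of such a regression, the code below fits a second-order polynomial from the pupil-minus-reflection vector to screen coordinates by least squares. The choice of polynomial terms, the function names and the synthetic nine-point data are all assumptions for illustration. Real trackers differ in the terms they use and in how they handle noisy fixations.

```python
def features(vx, vy):
    """Second-order polynomial terms of the pupil-minus-reflection vector."""
    return [1.0, vx, vy, vx * vy, vx * vx, vy * vy]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_axis(vectors, targets):
    """Least-squares fit of one screen axis via the normal equations."""
    rows = [features(vx, vy) for vx, vy in vectors]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)


def calibrate(vectors, screen_points):
    """Fit separate coefficient sets for the screen x and y coordinates."""
    return (fit_axis(vectors, [p[0] for p in screen_points]),
            fit_axis(vectors, [p[1] for p in screen_points]))


def gaze(model, vx, vy):
    """Map a new eye-image vector to estimated screen coordinates."""
    f = features(vx, vy)
    return tuple(sum(w * t for w, t in zip(weights, f)) for weights in model)


# Synthetic nine-point calibration: vectors on a 3x3 grid and the screen
# positions the participant was (hypothetically) fixating for each one.
vectors = [(vx, vy) for vy in (-10, 0, 10) for vx in (-10, 0, 10)]
targets = [(512 + 40 * vx + 0.5 * vx * vx, 384 + 30 * vy + 0.2 * vx * vy)
           for vx, vy in vectors]

model = calibrate(vectors, targets)
estimate = gaze(model, 5, -5)  # a vector that was not in the calibration set
print(estimate)
```

The final call shows the interpolation step: once the coefficients are fitted, any intermediate vector can be mapped to a gaze position on the display.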
(An alternative, or supplementary, approach is model-based rather than regression-based, but let's not go there right now.)
So in essence, calibration doesn't improve gaze estimation, it provides gaze estimation. Without first doing a calibration, all you are doing is tracking the movements of features (the pupil and corneal reflection) within a relatively arbitrary image of the eye. Until calibration is carried out, you have no idea at that stage where that eye is actually pointing in the world.
Having said all that, this is not at all a coding-based question (or answer), so I'm not actually sure that StackOverflow is the ideal venue for asking it.
I am able to detect the eyes, nose and mouth in a given face using MATLAB. Now I want four more points, i.e. the corners of the eyes and nose. How do I get these points?
This is the image showing the corner points of the nose.
The red point marks the point I'm looking for (it is only there for illustration; there is no such point in the original image).
An Active Appearance Model (AAM) could be useful in your case.
An AAM is normally used for matching a statistical model of object shape and appearance to a new image, and is widely used for extracting facial features and for head pose estimation.
I believe this could be helpful for you to start with.
You can try using the corner detectors included in the Computer Vision System Toolbox, such as detectHarrisFeatures, detectMinEigenFeatures, or detectFASTFeatures. However, they may give you more points than you want, so you will have to do some parameter tweaking.
I am working on face detection using Core Image. I'm facing a problem placing points on the face boundary. Specifically, I have to place points along the face boundary and make those points movable.
Please share your ideas with me.
Thanks in advance.
I will suggest four approaches. I have not done them on iOS; I implemented them in Java. For face boundary detection you can use the following methods:
Skin color thresholding followed by chin curve estimation and a shrinking and growing algorithm.
Canny edge detection followed by dividing the image into four regions and taking the longest face edge map.
Adaptive active contour model (Snake algorithm).
Hair region detection followed by chin curve estimation and removing every region except hair and chin. (Doesn't work for bald subjects.)
If there are shadows on the face, use the Snake algorithm. If the faces in the images are clear, go for skin color thresholding. Canny edge detection can be very slow.
For further clarification, read this: FACE REGION DETECTION
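To give an idea of the skin color thresholding step, here is a minimal sketch using a fixed RGB rule that is commonly cited for faces under uniform daylight. It is not necessarily the rule used in the Java implementation mentioned above. The function names are illustrative and the image is assumed to be a 2D list of (r, g, b) tuples.

```python
def is_skin(r, g, b):
    """Fixed RGB rule sometimes used as a first-pass skin classifier in daylight."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_mask(image):
    """Return a 2D list of booleans for a 2D list of (r, g, b) pixels."""
    return [[is_skin(*pixel) for pixel in row] for row in image]


image = [[(220, 170, 140), (30, 30, 30)],
         [(255, 255, 255), (200, 120, 90)]]
mask = skin_mask(image)
print(mask)
```

The resulting mask would then feed the chin curve estimation. Fixed thresholds like these are sensitive to lighting and camera white balance, so they are only a starting point.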