I am developing an iOS app where I want to get the size of user's real face so that I can suggest him/her suitable (matched) sized glasses.
I have detected user's face using OpenCV and got various dimensions of eyes, nose, face, etc.
But I want the real size from that dimensions (i.e in millimetres that I am getting in pixels).
I have searched a lot but could not get any solution matching with my requirement.
Has anyone idea how to calculate real size (i.e in millimetres) of the someone's face?
Thank you.
I think there are two ways of doing it.
You have an object of known size in the image that you can use to compare with. That object must also be at the same (or known) distance from the camera as the face.
If the camera supports depth, you can get the distance to the face from the camera, and using that to calculate the actual size of the face. This option is currently only available on iPhone X. The accuracy of the depth data can vary, so I am not sure how well it might work for you.
Read more about capturing depth data here
Read more on depth data accuracy here
If you have no reference point for size in the image, i guess there is really no way to tell the exact size. You would need at least one length that correlates to your picture to get some sort of a result.
That said, this would only work a 100% accurately on images of plain objects, because objects further away seem to be smaller in an image (like, e.g. here).
You would need multiple pictures from different sides (all with a size reference) and there would be a horrendous amount of calculations to do.
The focal length of the camera will distort your image as well, making accurate measurement even harder (see comparision of different focal lengths with different distances to the face).
Assume that you have a square shaped and red coloured paper with a length of 5 centimetres.
I can detect it's (good enough) size (bounding box) in pixels on the image that camera takes.
I know the physical size.
On the other hand,
I do not know how to use the camera image's pixel size in the formula with the actual physical size of the paper. To do this, I would perhaps need a constant that will come from the specifications of the camera.
I believe mobile each iOS model has different calibration (and probably even the same models may be different than each other slightly?).
I think, if somehow I can get that information from the camera, I can map certain things and use the ratio to find the distance to a specific object.
Do you see any problems in the above ideas?
What would be the best way to get that constant from the camera? iOS Camera API? Announced specifications of the lens? Individually measuring and comparing size of the box on the image at the same camera distance on different iOS device models and saving those values per model type?
If all of these do not make any sense, what would you recommend me to look into? I appreciate the time taken by you to read this question and comment on it.
I have a photocamera mounted vertically under water in a tank, looking downwards.
There is a flat grid on the bottom of the tank (approx 2m away from the camera).
I want to be able to place markers on the bottom, and use computer vision to know their real life exact position.
So, I need to map from pixels to mm.
If I am not mistaken, cv::calibrateCamera(...) does just this, but is dependent on moving a pattern in front of the camera.
I have just static pictures of the scene, and the camera never moves in relation to the grid. Thus, I have only a "single" image to find the parameters.
How can I do this using the grid?
Thank you.
Interesting problem! The "cute" part is the effect on the intrinsic parameters of the refraction at the water-glass interface, namely to increase the focal length (or, conversely, to reduce the field of view) compared to the same lens in air. In theory, you could calibrate in air and then correct for the difference in refraction index, but calibrating directly in water is likely to give you more accurate results.
Do know your accuracy requirements? And have you verified that your lens/sensor combination is adequate to meet them (with an adequate margin)? To answer the question you need to estimate (either by calculation from the lens and sensor specifications, or experimentally using a resolution chart) whether you can resolve in an image the minimal distances required by your application.
From the wording of your question I think that you are interested only in measurements on a single plane. So you only need to (a) remove the nonlinear (barrel or pincushion) lens distortion and (b) estimate the homography between the plane of interest and the image. Once you have the latter, you can directly convert from undistorted image coordinates to world ones by matrix multiplication. Additionally if (as I imagine) the plane of interest is roughly parallel to the image plane, you should not have any problem keeping the entire field-of-view in focus.
Of course, for all of this to work as expected, you should make sure that the tank bottom is really flat, within the measurement tolerances of your application. Otherwise you are really dealing with a 3D problem, and need to modify your procedures accordingly.
The actual procedure depends a lot on the size of the tank, which you don't indicate clearly. If it's small enough that it is practical to manufacture a chessboard-like movable calibration target, by all means go for it. You may want to take a look at this other answer for suggestions. In the following I'll discuss the more interesting case in which your tank is large, e.g. the size of a swimming pool.
I'd proceed by sticking calibration markers in a regular grid at the pool bottom. I'd probably choose checker-like markers like these, maybe printing them myself with a good laser printer on plastic with an adhesive backing (assuming you can leave them in place forever). You should plan on having quite a few of them, say, an 8x8 or 10x10 grid, covering as much as possible of the field of view of the camera in its operating position and pose. To help with lining up the grid nicely you might use a laser line projector of suitable fan angle, or a laser pointer attached to a rotating support. Note carefully that it is not necessary that they be affixed in a precise X-Y grid (which may be complicated, depending on the size of your pool), only that their positions with respect any arbitrarily chosen (but fixed) three of them be known. In other words, you can attach them to the bottom approximately in a grid, then measure the distances of three extreme corners from each other as accurately as you can, thus building a base triangle, then measure the distances of all the other corners from the vertices of the triangle, and finally reconstruct their true positions with a bit of trigonometry. It's basically a surveying problem and, depending on your accuracy requirements and budget, you may want to enroll a local friendly professional surveyor (and their tools) to get it done as precisely as necessary.
Once you have your grid, you can fill the pool, get your camera, focus and f-stop the lens as needed for the application. From now on you may not touch the focus and f-stop ever again, under penalty of miscalibrating - exposure can only be controlled by the exposure time, so make sure to have enough light. Disable any and all auto-focus and auto-iris functions, if any. If the camera has a non-rigid lens mount (e.g. a DLSR), you'll need some kind of mechanical rig to ensure that the lens-body pair stay rigid. F-stop as close as you can, given the available lighting and sensor, so to have a fair bit of depth of field available. Then take several photos (~ 10) of the grid, moving and rotating the camera, and going a bit closer and farther away than your expected operating distance from the plane. You'll want to "see" in some images some significant perspective foreshortening of the grid - this is needed to accurately calibrate the focal length. Avoid JPG and any other lossy compression format when storing the images - use lossless PNG or TIFF.
Once you have the images, you can manually mark and identify the checker markers in the images. For a once-off project like this I would not bother with automatic identification, just do it manually (e.g. in Matlab, or even in Photoshop or Gimp). To help identify the markers, you could, e.g. print a number next to them. Once you have the manual marks, you can refine them automatically to subpixel accuracy, e.g. using cv::findCornerSubpix.
You're almost done. Feed the "reference" measured position of the real corners, and the observed ones in all images, to your favorite camera calibration routine, e.g. cv::calibrateCamera. You use the nominal focal length of the camera (converted to pixels) for an initial estimate, along with null distortion. If all goes well, you will obtain the camera intrinsic parameters, which you will keep, and the camera poses at all images, which you'll throw away.
Now you can mount the camera in your final setup, as needed by your application, and take one further image of the grid. Mark and refine the corner positions as before. Undistort their image positions using the distortion parameters returned by the calibration. Finally compute the homography between the reference positions of the real markers (in meters) and their undistorted positions, and you're done.
HTH
To calibrate the camera you do need multiple images of the checkerboard (or one of the other patterns found here). What you can do, is calibrate the camera outside of the water or do a calibration sequence once.
Once you have that information (focal length, center of lens, distortion, etc). You can use the solvePNP function to estimate the orientation of a single board. This estimation provides you with a distance from the camera to the board.
A completely different alternative could be to find what kind of lens the camera uses and manually fill in the data. I've not tried this, so I'm uncertain how well this would work.
I uploaded my 2D FFT magnitude image here:
If you take a look at it, for high frequencies[right, left, top and bottom], only at around x and y-axis, there are some points with high power[yellow color]. These points shouldn't be in the resultant FFT2, since I know the original height image is isotropic and therefore the 2D FFT must look something like the example below(just note high frequencies):
Now, the question is, what could be the possible reasons for such behavior at high frequencies?
added:
Here is the magnitude power spectrum before windowing:
https://dl.dropboxusercontent.com/u/82779497/nowin.png
here is the original image, which is a height profile recorded by a profilometer:
https://dl.dropboxusercontent.com/u/82779497/asph5.jpg
By the way, I export data as a .txt file from profilometer software to Matlab.
The profilometer we use for capturing the surface image, uses fringe projection method which produces some artifacts along the projected stripes on the surface. So, the problem lies on the device we are capturing images with.
Thanks for comments Eddy.
I have to make a mobile app that calculates the real life size of an object in an image.
I have done some research on it and found helpful [question]: How would you find the height of objects given an image?
The relation of the distance of the camera and real life size of the object isn't actually that complex, the ratio of the size of the object on the sensor and the size of the object in real life is the same as the ratio between the focal length and distance to the object.
distance to object (mm) = focal length (mm) * real height of the object (mm) * image height (pixels)
---------------------------------------------------------------------------
object height (pixels) * sensor height (mm)
But how to get the value of real height of the object if distance is not known ?
Do the tools that create 3d models from images have real life dimensions?
The simple answer is you can't.
Incidentally, this is why humans have two eyes. If you want to judge size without a known distance, you'll need at least two reference points. This allows you to triangulate the position of the object, get a distance to it, and use your known focal distance to calculate the size.
The more complex answer is there are ways around this for example:
Cheat by using a known reference:
For example, if you have an object of known size, you can infer the distance. This is similar to what NASA does to calibrate its cameras, for example.
You can make safe assumptions if you're dealing with common objects, such as the height of one storey when analysing the image of a building.
Move your camera around:
This allows you to get more than one reference point with the same camera.
I suppose you could use the accelerometer to accurately measure the positional relation between the image captured at point T1 in time and point T2. This would give you two images of the same subject with a known distance between them. This then allows you to triangulate as if you had two eyes.
Whether normal hand-held camera jitters will be sufficient for triangulation, or whether the accelerometer will be accurate enough to inertially position the phone, I don't know.
Assume a distance:
If your app is designed to compare something on the scale of a human hand (or other bit of human anatomy), you can probably safely assume a distance based on what people will naturally do. The focus limits of the camera itself will also give an upper and lower range on how far an object can be and still be in focus. This will probably be within a tolerable margin of error.
As you mention in your question, there is an entire subfield dedicated to this question, and it is an active research area.