How would you find the height of objects given an image? - image-processing

This isn't exactly a programming question. I just want to know what your approach would be to a common problem in digital image processing.
Let's say you have an image of a few trees in, say, JPG format. How would you go about finding the height of each of these trees? The photo is the only input you have.
I want to know your approaches, not code, so it doesn't matter if your answers are vague or non-DIP-ish.
Small correction:
The height need not be the actual height of the tree. It can be in any scale, as long as the scale is consistent across all objects in the picture.

Yes, it is possible. What you are describing has an entire industry around it, called photogrammetry.

There is a fair amount of computer vision research in this area. Assuming you don't know the camera constraints, you'll have to make assumptions about the scene and camera to determine the heights up to a scale factor. Note that without camera constraints or a reference height in the image it is impossible to tell the difference between a tall tree photographed from a distance or a short tree photographed up close. A great start is the Single View Metrology work by Criminisi.

Finding the size of an object from an image is straightforward with photogrammetry, the science of making measurements from photographs.
For this we need to know two things:
the distance from the camera to the object;
the focal length (in mm, together with the pixels per mm on the sensor) or the physical size of the image sensor.
The steps are as follows:
Calibrate the camera
Use OpenCV to calibrate the camera. You can use the OpenCV calibrate.py tool and the chessboard pattern PNG provided in the source code to generate a calibration matrix. Calibration is how the camera parameters are found. I took about a dozen photos of the chessboard from as many angles as I could with my webcam (to calibrate my webcam). For more details check OpenCV camera calibration.
The calibration matrix gives us f_x, f_y, c_x and c_y.
From the details of the photos you took, you can find their native resolution (height x width), and in their EXIF headers you can find the focal length (f). These values vary depending on your camera.
Pixels per millimeter
We need to know the pixels per millimeter (px/mm) on the image sensor. The calibrated focal lengths f_x and f_y are in pixels and relate to the focal length f in mm as follows:
f_x = f * m_x
f_y = f * m_y
Since we know two of the three variables in each formula, we can solve for m_x and m_y. I just averaged f_x and f_y to get f_xy:
m = f_xy / focal_length_of_camera
Load the image
Load the image in which you want to measure the actual size of the object. You should know the distance between the object and the camera. Find the dimensions of the image (height1 x width1).
Find the object size in pixels
Determine the size of the object in pixels. I simply use the distance formula to find the length of a selected line; you can use any other method.
Convert px/mm to the lower resolution
pxpermm_in_lower_resolution = (width1*m)/width
Size of the object on the image sensor
size_of_object_in_image_sensor = object_size_in_pixels/(pxpermm_in_lower_resolution)
Actual size of the object
The actual size of the object then follows from the values above:
real_size = (dist*size_of_object_in_image_sensor)/focal_length
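As a rough sketch, the arithmetic in the steps above can be collected into one Python function. The function and parameter names are made up for illustration, and the numbers in the example are invented rather than taken from a real calibration. It assumes the calibration photos and the measured image have the same aspect ratio, and that the distance is given in mm.

```python
import math


def pixel_length(p1, p2):
    """Length in pixels of the line between two (x, y) image points."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def real_object_size_mm(fx_px, fy_px, focal_length_mm,
                        calib_width_px, image_width_px,
                        object_size_px, distance_mm):
    """Follow the steps of the answer to estimate the real object size in mm."""
    # Average the calibrated focal lengths (in pixels) to get f_xy.
    f_xy = (fx_px + fy_px) / 2.0
    # Pixels per mm on the sensor at the calibration resolution.
    m = f_xy / focal_length_mm
    # Rescale px/mm to the resolution of the image being measured.
    px_per_mm = (image_width_px * m) / calib_width_px
    # Size of the object as projected on the sensor, in mm.
    size_on_sensor_mm = object_size_px / px_per_mm
    # Similar triangles: real size / distance = sensor size / focal length.
    return distance_mm * size_on_sensor_mm / focal_length_mm


if __name__ == "__main__":
    # Invented values: 4 mm lens, f_x = f_y = 1000 px at 1000 px width,
    # measured image 500 px wide, object 2 m from the camera.
    length_px = pixel_length((100, 50), (100, 300))
    size = real_object_size_mm(1000.0, 1000.0, 4.0, 1000, 500,
                               length_px, 2000.0)
    print(size)  # 1000.0 (mm) for these made-up numbers
```

Averaging f_x and f_y mirrors what the answer does; if the two differ noticeably, using the value that matches the measurement direction may be the better choice.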

Assuming they're all the same distance away, all to scale, you'd want to find a single unit of measurement you can guarantee. For example, if there's a person in the photo, again, same scale, and you know they're exactly 6 feet tall, you use that as your measure. You then take that, and count how many stacked make the tree. For example, if you need 3.5 of this person, then:
3.5 * 6 = 21
gives you a 21 foot tall tree.
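The same ratio can be written as a short Python sketch. The helper name and the pixel heights are invented for illustration, and it relies on the assumption stated above that the reference and the trees are at the same distance from the camera.

```python
def height_from_reference(object_px, reference_px, reference_height):
    """Scale a known reference height by the ratio of pixel heights."""
    if reference_px <= 0:
        raise ValueError("reference_px must be positive")
    return object_px / reference_px * reference_height


if __name__ == "__main__":
    # Hypothetical measurements: the 6-foot person spans 200 px and the
    # tree spans 700 px, so the tree is 3.5 "persons" tall.
    print(height_from_reference(700, 200, 6))  # 21.0
```

If no real-world reference is available, passing `reference_height=1` still gives heights in a consistent relative scale, which is all the question asks for.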
Without a single point of reference for everything, or if they're all on different scales, you would need a lot more information than you could easily get without having been there.

I would rely on an object of known dimensions to be present in the picture. For instance, a man.
Or perhaps we could use the EXIF data to reverse engineer the size of the object from the camera's sensor dimensions, the lens and the focal length used. This again depends on the angle: we should get the most accurate results when the camera is held perpendicular to the subject.

If your image is 3x3 and you want to find the size of the image, multiply the dimensions (3 x 3 = 9). That gives 9 pixels, indexed from 0 to 8. So 9/8 = (___) KB.
If you want the size of the image in MB, continue the example above: (9/8)/1024 = (___) MB.
So you will get the result in MB.

Related

Get real size of someone's face

I am developing an iOS app in which I want to measure the real size of the user's face so that I can suggest suitably sized glasses.
I have detected the user's face using OpenCV and obtained various dimensions of the eyes, nose, face, etc.
But those dimensions are in pixels, and I want the real size in millimetres.
I have searched a lot but could not find a solution matching my requirement.
Does anyone have an idea how to calculate the real size (i.e. in millimetres) of someone's face?
Thank you.
I think there are two ways of doing it.
You have an object of known size in the image that you can use to compare with. That object must also be at the same (or known) distance from the camera as the face.
If the camera supports depth, you can get the distance from the camera to the face and use that to calculate the actual size of the face. This option is currently only available on iPhone X. The accuracy of the depth data can vary, so I am not sure how well it might work for you.
Read more about capturing depth data here
Read more on depth data accuracy here
If you have no reference point for size in the image, I guess there is really no way to tell the exact size. You would need at least one known length in the picture to get some sort of result.
That said, this would only be fully accurate on images of flat objects, because objects further away appear smaller in an image (like, e.g. here).
You would need multiple pictures from different sides (all with a size reference), and there would be a horrendous amount of calculation to do.
The focal length of the camera will distort your image as well, making accurate measurement even harder (see comparison of different focal lengths with different distances to the face).

How to find the distance of the camera to an object with known physical size on iOS?

Assume that you have a square shaped and red coloured paper with a length of 5 centimetres.
I can detect its size (a good enough bounding box) in pixels in the image that the camera takes.
I know the physical size.
On the other hand,
I do not know how to use the camera image's pixel size in the formula with the actual physical size of the paper. To do this, I would perhaps need a constant that will come from the specifications of the camera.
I believe each iOS device model has a different calibration (and probably even devices of the same model differ slightly from each other?).
I think, if somehow I can get that information from the camera, I can map certain things and use the ratio to find the distance to a specific object.
Do you see any problems in the above ideas?
What would be the best way to get that constant from the camera? iOS Camera API? Announced specifications of the lens? Individually measuring and comparing size of the box on the image at the same camera distance on different iOS device models and saving those values per model type?
If all of these do not make any sense, what would you recommend me to look into? I appreciate the time taken by you to read this question and comment on it.

OpenCV - calibrate camera using static images in water

I have a photocamera mounted vertically under water in a tank, looking downwards.
There is a flat grid on the bottom of the tank (approx 2m away from the camera).
I want to be able to place markers on the bottom, and use computer vision to know their real life exact position.
So, I need to map from pixels to mm.
If I am not mistaken, cv::calibrateCamera(...) does just this, but is dependent on moving a pattern in front of the camera.
I have just static pictures of the scene, and the camera never moves in relation to the grid. Thus, I have only a "single" image to find the parameters.
How can I do this using the grid?
Thank you.
Interesting problem! The "cute" part is the effect on the intrinsic parameters of the refraction at the water-glass interface, namely to increase the focal length (or, conversely, to reduce the field of view) compared to the same lens in air. In theory, you could calibrate in air and then correct for the difference in refraction index, but calibrating directly in water is likely to give you more accurate results.
Do you know your accuracy requirements? And have you verified that your lens/sensor combination is adequate to meet them (with an adequate margin)? To answer the question you need to estimate (either by calculation from the lens and sensor specifications, or experimentally using a resolution chart) whether you can resolve in an image the minimal distances required by your application.
From the wording of your question I think that you are interested only in measurements on a single plane. So you only need to (a) remove the nonlinear (barrel or pincushion) lens distortion and (b) estimate the homography between the plane of interest and the image. Once you have the latter, you can directly convert from undistorted image coordinates to world ones by matrix multiplication. Additionally if (as I imagine) the plane of interest is roughly parallel to the image plane, you should not have any problem keeping the entire field-of-view in focus.
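To make the "matrix multiplication" step concrete, here is a minimal pure-Python sketch of mapping an undistorted pixel to plane coordinates with a 3x3 homography. The matrix values are invented for illustration; in practice H would be estimated from marker correspondences (for example with OpenCV's cv::findHomography), and the helper name is hypothetical.

```python
def apply_homography(H, x, y):
    """Map an undistorted image point (x, y) to plane coordinates using H.

    H is a 3x3 matrix given as nested lists. The point is lifted to
    homogeneous coordinates (x, y, 1), multiplied by H, and divided by
    the third component (the perspective divide).
    """
    X = H[0][0] * x + H[0][1] * y + H[0][2]
    Y = H[1][0] * x + H[1][1] * y + H[1][2]
    W = H[2][0] * x + H[2][1] * y + H[2][2]
    if W == 0:
        raise ValueError("point maps to infinity under this homography")
    return X / W, Y / W


if __name__ == "__main__":
    # Made-up homography: 2 mm per pixel plus an offset of (10, 20) mm.
    H = [[2.0, 0.0, 10.0],
         [0.0, 2.0, 20.0],
         [0.0, 0.0, 1.0]]
    print(apply_homography(H, 3.0, 4.0))  # (16.0, 28.0)
```

When the last row of H is not (0, 0, 1), the divide by W is what accounts for perspective foreshortening, so it should not be skipped even if the camera looks almost straight down.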
Of course, for all of this to work as expected, you should make sure that the tank bottom is really flat, within the measurement tolerances of your application. Otherwise you are really dealing with a 3D problem, and need to modify your procedures accordingly.
The actual procedure depends a lot on the size of the tank, which you don't indicate clearly. If it's small enough that it is practical to manufacture a chessboard-like movable calibration target, by all means go for it. You may want to take a look at this other answer for suggestions. In the following I'll discuss the more interesting case in which your tank is large, e.g. the size of a swimming pool.
I'd proceed by sticking calibration markers in a regular grid at the pool bottom. I'd probably choose checker-like markers like these, maybe printing them myself with a good laser printer on plastic with an adhesive backing (assuming you can leave them in place forever). You should plan on having quite a few of them, say, an 8x8 or 10x10 grid, covering as much as possible of the field of view of the camera in its operating position and pose. To help with lining up the grid nicely you might use a laser line projector of suitable fan angle, or a laser pointer attached to a rotating support. Note carefully that it is not necessary that they be affixed in a precise X-Y grid (which may be complicated, depending on the size of your pool), only that their positions with respect to any arbitrarily chosen (but fixed) three of them be known. In other words, you can attach them to the bottom approximately in a grid, then measure the distances of three extreme corners from each other as accurately as you can, thus building a base triangle, then measure the distances of all the other corners from the vertices of the triangle, and finally reconstruct their true positions with a bit of trigonometry. It's basically a surveying problem and, depending on your accuracy requirements and budget, you may want to enlist a friendly local professional surveyor (and their tools) to get it done as precisely as necessary.
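The "bit of trigonometry" could look like the following Python sketch. It is only one possible formulation, with hypothetical helper names and invented distances: the base triangle is placed with its first vertex at the origin and its second vertex on the x-axis, and each marker is located from its measured distances to the three vertices. It assumes noise-free tape measurements; with real data a least-squares fit over all distances would be more robust.

```python
import math


def locate_base_triangle(d_ab, d_ac, d_bc):
    """Place the base triangle: A at the origin, B on the +x axis, C at y >= 0."""
    cx = (d_ac ** 2 + d_ab ** 2 - d_bc ** 2) / (2.0 * d_ab)
    cy = math.sqrt(max(d_ac ** 2 - cx ** 2, 0.0))
    return (0.0, 0.0), (d_ab, 0.0), (cx, cy)


def locate_marker(base, r_a, r_b, r_c):
    """Find a marker from its measured distances to the vertices A, B and C."""
    _, b, c = base
    d_ab = b[0]
    # Intersecting the circles around A and B gives x and |y|.
    x = (r_a ** 2 - r_b ** 2 + d_ab ** 2) / (2.0 * d_ab)
    y_abs = math.sqrt(max(r_a ** 2 - x ** 2, 0.0))

    def error_to_c(p):
        # How badly a candidate disagrees with the measured distance to C.
        return abs(math.hypot(p[0] - c[0], p[1] - c[1]) - r_c)

    # The distance to C resolves the sign ambiguity of y.
    return min([(x, y_abs), (x, -y_abs)], key=error_to_c)


if __name__ == "__main__":
    # Invented measurements in metres: a 3-4-5 base triangle.
    base = locate_base_triangle(4.0, 3.0, 5.0)
    print(base)
    print(locate_marker(base, 5.0, 3.0, 4.0))
```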
Once you have your grid, you can fill the pool, get your camera, focus and f-stop the lens as needed for the application. From now on you may not touch the focus and f-stop ever again, under penalty of miscalibrating - exposure can only be controlled by the exposure time, so make sure to have enough light. Disable any and all auto-focus and auto-iris functions, if any. If the camera has a non-rigid lens mount (e.g. a DSLR), you'll need some kind of mechanical rig to ensure that the lens-body pair stays rigid. Close the aperture as far as you can, given the available lighting and sensor, so as to have a fair bit of depth of field available. Then take several photos (~ 10) of the grid, moving and rotating the camera, and going a bit closer and farther away than your expected operating distance from the plane. You'll want to "see" in some images some significant perspective foreshortening of the grid - this is needed to accurately calibrate the focal length. Avoid JPG and any other lossy compression format when storing the images - use lossless PNG or TIFF.
Once you have the images, you can manually mark and identify the checker markers in the images. For a one-off project like this I would not bother with automatic identification, just do it manually (e.g. in Matlab, or even in Photoshop or Gimp). To help identify the markers, you could, e.g. print a number next to them. Once you have the manual marks, you can refine them automatically to subpixel accuracy, e.g. using cv::cornerSubPix.
You're almost done. Feed the "reference" measured positions of the real corners, and the observed ones in all images, to your favorite camera calibration routine, e.g. cv::calibrateCamera. Use the nominal focal length of the camera (converted to pixels) as an initial estimate, along with null distortion. If all goes well, you will obtain the camera intrinsic parameters, which you will keep, and the camera poses at all images, which you'll throw away.
Now you can mount the camera in your final setup, as needed by your application, and take one further image of the grid. Mark and refine the corner positions as before. Undistort their image positions using the distortion parameters returned by the calibration. Finally compute the homography between the reference positions of the real markers (in meters) and their undistorted positions, and you're done.
HTH
To calibrate the camera you do need multiple images of the checkerboard (or one of the other patterns found here). What you can do, is calibrate the camera outside of the water or do a calibration sequence once.
Once you have that information (focal length, center of lens, distortion, etc.), you can use the solvePnP function to estimate the pose of a single board. This estimate provides you with the distance from the camera to the board.
A completely different alternative could be to find what kind of lens the camera uses and manually fill in the data. I've not tried this, so I'm uncertain how well this would work.

2D FFT of an image shows some invalid values for high frequencies

I uploaded my 2D FFT magnitude image here:
If you take a look at it, at high frequencies (right, left, top and bottom), only near the x- and y-axes, there are some points with high power (yellow). These points shouldn't be in the resulting FFT2, since I know the original height image is isotropic, and therefore the 2D FFT must look something like the example below (note just the high frequencies):
Now, the question is, what could be the possible reasons for such behavior at high frequencies?
added:
Here is the magnitude power spectrum before windowing:
https://dl.dropboxusercontent.com/u/82779497/nowin.png
here is the original image, which is a height profile recorded by a profilometer:
https://dl.dropboxusercontent.com/u/82779497/asph5.jpg
By the way, I export data as a .txt file from profilometer software to Matlab.
The profilometer we use for capturing the surface image, uses fringe projection method which produces some artifacts along the projected stripes on the surface. So, the problem lies on the device we are capturing images with.
Thanks for comments Eddy.

How to get the real life size of an object from an image, when not knowing the distance between object and the camera?

I have to make a mobile app that calculates the real life size of an object in an image.
I have done some research on it and found a helpful question: "How would you find the height of objects given an image?"
The relation between the camera distance and the real-life size of the object isn't actually that complex: the ratio of the size of the object on the sensor to its size in real life is the same as the ratio of the focal length to the distance to the object.
distance to object (mm) = (focal length (mm) * real height of the object (mm) * image height (pixels)) / (object height (pixels) * sensor height (mm))
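For illustration, here is the same relation as a small Python sketch, together with its rearrangement for the real height. The function names and the numbers in the example are invented, and the sketch assumes a simple pinhole model with the object roughly parallel to the sensor. Each direction needs the other quantity as input, which is exactly the difficulty raised next.

```python
import math


def distance_to_object_mm(focal_mm, real_height_mm, image_height_px,
                          object_height_px, sensor_height_mm):
    """Distance to the object when its real height is known."""
    return (focal_mm * real_height_mm * image_height_px) / (
        object_height_px * sensor_height_mm)


def real_height_mm(focal_mm, distance_mm, image_height_px,
                   object_height_px, sensor_height_mm):
    """The same relation solved for the real height; the distance must be known."""
    return (distance_mm * object_height_px * sensor_height_mm) / (
        focal_mm * image_height_px)


if __name__ == "__main__":
    # Invented values: 4 mm lens, 3.6 mm sensor height, 3000 px image height,
    # and a 1800 mm tall person spanning 600 px.
    d = distance_to_object_mm(4.0, 1800.0, 3000, 600, 3.6)
    print(d)
    # Feeding the distance back in recovers the height we started from.
    h = real_height_mm(4.0, d, 3000, 600, 3.6)
    print(math.isclose(h, 1800.0))
```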
But how do I get the real height of the object if the distance is not known?
Do the tools that create 3d models from images have real life dimensions?
The simple answer is you can't.
Incidentally, this is why humans have two eyes. If you want to judge size without a known distance, you'll need at least two reference points. This allows you to triangulate the position of the object, get a distance to it, and use your known focal distance to calculate the size.
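A minimal sketch of the two-viewpoint idea, under assumptions that are not in the answer itself: two pinhole views with parallel optical axes (a rectified stereo pair), a known baseline between them, and a focal length expressed in pixels. Under those assumptions the depth is focal length times baseline divided by disparity. The function names and numbers are invented.

```python
import math


def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth of a point seen in two rectified views (same unit as baseline)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline / disparity_px


def size_from_depth(size_px, depth, focal_px):
    """Real size of an object spanning size_px pixels at the given depth."""
    return size_px * depth / focal_px


if __name__ == "__main__":
    # Invented values: 1000 px focal length, views 0.1 m apart, and the
    # object shifted by 20 px between the two images.
    z = depth_from_disparity(1000.0, 0.1, 20.0)
    print(z)                                 # depth in metres
    print(size_from_depth(200.0, z, 1000.0)) # size of a 200 px object in metres
    print(math.isclose(z, 5.0))
```

The smaller the disparity relative to the pixel measurement error, the less reliable the depth, which is why a wider baseline generally helps for distant objects.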
The more complex answer is that there are ways around this, for example:
Cheat by using a known reference:
For example, if you have an object of known size, you can infer the distance. This is similar to what NASA does to calibrate its cameras, for example.
You can make safe assumptions if you're dealing with common objects, such as the height of one storey when analysing the image of a building.
Move your camera around:
This allows you to get more than one reference point with the same camera.
I suppose you could use the accelerometer to accurately measure the positional relation between the image captured at point T1 in time and point T2. This would give you two images of the same subject with a known distance between them. This then allows you to triangulate as if you had two eyes.
Whether normal hand-held camera jitters will be sufficient for triangulation, or whether the accelerometer will be accurate enough to inertially position the phone, I don't know.
Assume a distance:
If your app is designed to compare something on the scale of a human hand (or other bit of human anatomy), you can probably safely assume a distance based on what people will naturally do. The focus limits of the camera itself will also give an upper and lower range on how far an object can be and still be in focus. This will probably be within a tolerable margin of error.
As you mention in your question, there is an entire subfield dedicated to this question, and it is an active research area.

Resources