the position of image plane in optical imaging - opencv

In the book Learning OpenCV, there is one figure that show the image plane always is focal image.
The description of the figure from the book is as follow.
We begin by looking at the simplest model of a camera, the pinhole camera model. In this simple model, light is envisioned as entering from the scene or a distant object, but only a single ray enters from any particular point. In a physical pinhole camera, this point is then “projected” onto an imaging surface. As a result, the image on this image plane(also called the projective plane) is always in focus, and the size of the image relative to the distant object is given by a single parameter of the camera: its focal length. For our idealized pinhole camera, the distance from the pinhole aperture to the screen is precisely the focal length. Th is is shown in Figure 11-1, where fis the focal length of the camera, Zis the distance from the camera to the object, Xis the length of the object, and xis the object’s image on the imaging plane. In the figure, we can see by similar triangles that –x/f = X/Z,
From this link https://en.wikipedia.org/wiki/Lens_(optics)
There is thin lens equation. Only object in infinity can be imaged in focal image. Which one is correct?

Both are correct but they're stating different subjects. The mentioned Wikipedia's page is not talking about pinhole camera model. In pinhole camera you can put the image plane in any distance that you want (limited to the size of the image plane) and the image would be formed on that anyway. However in the second model that you brought from Wikipedia, the image plane has to be somewhere that is matched with the formula because of the existence of the lens, and that's the reason you have blur image when you take a picture out of the camera's focus.
In other word with a specific lens, there is a predefined focal length, so you have to set your image plane in a distance that gives you the clear image (Of course in digital camera, this distance is constant but there are multiple lenses and the camera changes the focal length by moving the lenses). In pinhole model there is no lens so there is no predefined focal length and the image will be caught correctly wherever you put the catcher plane. (There are many things that we have to consider but I came up with the simplest sketch.)

Related

view mapping between two images taken from same cameras of same scene using homography in open CV, except the camera positions are not parallel

I am trying to understand mapping points between two images of same scene except the camera positions are different. say like this apologies for the rough sketch and the hand-writing. Sample image taken from cam1 and Sample image taken from cam2 . Trying to map between these two images. since the two cameras used are same(logitech camera). I assume camera calibration isn't required. So with the help of SIFT descriptors and feature matching, using the good matches from the images as inputs to Homography with RANSAC. I get 3*3 matrix. To verify the view mapping. I select few objects(say bins in the image) in cam1 image and try to map the same object in cam2 image using 3 * 3 matrix by using warp_perspective, but the outputs aren't good. say something like this had selected top left and bottom right of the objects in cam1 image(i.e. bins) and trying to draw a bounding box for the desired object in cam2 image.
But as visible in the view map output image the bounding boxes aren't proper to the bins.
Wanted to understand, where am i going wrong. Is it the camera positions affecting, and this shouldn't be used for homography or have to use multiple homographies or have to get to know the translation between the camera positions. very confused. Thank you.
Homography transforms plane into a plane. It can only be used if all of the matches lay on a plane in real world (e.g. on the planar wall) or the feature points are located far from both cameras so the transformation between the cameras might be expressed as pure rotation. See this link for further explanation.
In your case the objects are located at different depths so you need to perform stereo calibration of cameras and then compute the depth map to be able to map pixels from one camera into another.

how to obtain the world coordinates of an image

After to calibrated a camera using Jean- Yves Bouget's Camera Calibration Toolbox and checkerboard-patterns printed on cardboard, I´ve obtained extrinsic and intrinsic parameters, I can use the informations to find camera coordinates:
Pc = R * Pw + T
After that, how to obtain the world coordinates of an image using the Pc and calibration parametesr?
thanks in advance.
EDIT
The goal is to use the calibrated camera parameters to measure planar objects with a calibrated Camera). To perform this task i dont know to use the camera parameters. in other words i have to convert the pixels coordinates of the image to world coordinates using the calibrated parameters. I already have the parameters and the new image. How can i do this convertion?
thanks in advance.
I was thinking about problem, and came to the result:
You can't find the object size. The problem is by a single shot, when you have no idea how far the Object is from your camera you can't say something about the size of the object. The calibration just say how far is the image plane from the camera (focal length) and the open angles of the lense. When the focal length changes the calbriation changes too.
But there are some possibiltys:
How to get the real life size of an object from an image, when not knowing the distance between object and the camera?
So how I understand you can approximate the size of the objects.
Your problem can be solved if (and only if) you can express the plane of your object in calibrated camera coordinates.
The calibration procedure outputs, along with the camera intrinsic parameters K, a coordinate transform matrix for every calibration image Qwc_i = [Rwc_i |Twc_i] matrix, that expresses the location and pose of a particular scene coordinate frame in the camera coordinates at that calibration image. IIRC, in Jean-Yves toolbox this is the frame attached to the top-left corner of the calibration checkerboard.
So, if your planar object is on the same plane as the checkerboard in one of the calibration images, all you have to do in order to find its location in space is intersect the checkerboard plane with camera rays cast from the camera center (0,0,0) to the pixels into which the object is imaged.
If your object is NOT in one of those planes, all you can do is infer the object's own plane from additional information, if available, e.g. from a feature of known size and shape.

Consistency of projecting points onto an undistorted image

I want to project a point in 3D space into 2D image coordinates. I have the calibrated intrinsics and extrinsics of the camera I'm using. I have the camera matrix K and distortion coefficients D. However, I want the projected image coordinates to be of the undistorted image.
From my research, I found two ways to do this.
Use opencv's getOptimalNewCameraMatrix function to compute a new undistorted image's camera matrix K'. Then use this K' in opencv's projectPoints function, with the distortion parameters set to 0, to get the projected point.
Use projectPoints function using the raw camera matrix K, along with the distortion coefficients D in this function and get the projected point.
Should the output of both methods match?
I think that there is something missing in your thought.
Camera matrix K and dist. coefficent D are the parameters for make the undistortion (if your lens is distorting the image like in a fisheye). They are what is called intrinsic camera parameters.
If we change terms from computer vision to computer graphics, those parameters are the one you use for defining the frustum of the view, and, for example, they are used for getting the focal length of the camera.
But they are not enough to do the projection stuff.
For the projection, if you think in a computer graphics term (like opengl, for instance) you need to have the model-view-projection matrix. The model matrix is the matrix that specifies the position of the object in the world. The view matrix specifies the position of the camera, and the projection matrix specify the frustum (focal angle, perspective distortion, etc).
If you want to know how to transform the points of the model from 3d to 2d (or viceversa) you need the projection and the view matrixes (you have the model matrix because you have the 3d points from which you want to start). And in computer vision the view matrix is called estrinsic parameters.
So, you need the estrinsic parameters too, that are the position of the camera in the world. That is, for instance, those parameters are the rvec and tvec that cv:: projectPoints needs.
If you want to compute them, they are exactly the output of cv::solvePnP that do the opposite of what you want to do: from some known 3d points coupled with the known 2d projection on them on the camera screen, this function gives you the estrinsic parameters (from which you can get the view matrix for some opengl-opencv-augmented-reality-whatever application via cv::Rodrigues).
Last note: while the instrinsic parameters are fixed in all the pictures you shoot with a camera (while you don't change the focal length of course), the estrinisc parameters changes every time you move the camera for take a new picture from a different view point (that is: this changes the perspective point of view, so the 3D-2D projection you want to find)
Hope could help!

Finding distance from camera to object of known size

I am trying to write a program using opencv to calculate the distance from a webcam to a one inch white sphere. I feel like this should be pretty easy, but for whatever reason I'm drawing a blank. Thanks for the help ahead of time.
You can use triangle similarity to calibrate the camera angle and find the distance.
You know your ball's size: D units (e.g. cm). Place it at a known distance Z, say 1 meter = 100cm, in front of the camera and measure its apparent width in pixels. Call this width d.
The focal length of the camera f (which is slightly different from camera to camera) is then f=d*Z/D.
When you see this ball again with this camera, and its apparent width is d' pixels, then by triangle similarity, you know that f/d'=Z'/D and thus: Z'=D*f/d' where Z' is the ball's current distance from the camera.
To my mind you will need a camera model = a calibration model if you want to measure distance or other things (int the real-world).
The pinhole camera model is simple, linear and gives good results (but won't correct distortions, (whether they are radial or tangential).
If you don't use that, then you'll be able to compute disparity-depth map, (for instance if you use stereo vision) but it is relative and doesn't give you an absolute measurement, only what is behind and what is in front of another object....
Therefore, i think the answer is : you will need to calibrate it somehow, maybe you could ask the user to approach the sphere to the camera till all the image plane is perfectly filled with the ball, and with a prior known of the ball measurement, you'll be able to then compute the distance....
Julien,

Distance to the object using stereo camera

Is there a way to calculate the distance to specific object using stereo camera?
Is there an equation or something to get distance using disparity or angle?
NOTE: Everything described here can be found in the Learning OpenCV book in the chapters on camera calibration and stereo vision. You should read these chapters to get a better understanding of the steps below.
One approach that do not require you to measure all the camera intrinsics and extrinsics yourself is to use openCVs calibration functions. Camera intrinsics (lens distortion/skew etc) can be calculated with cv::calibrateCamera, while the extrinsics (relation between left and right camera) can be calculated with cv::stereoCalibrate. These functions take a number of points in pixel coordinates and tries to map them to real world object coordinates. CV has a neat way to get such points, print out a black-and-white chessboard and use the cv::findChessboardCorners/cv::cornerSubPix functions to extract them. Around 10-15 image pairs of chessboards should do.
The matrices calculated by the calibration functions can be saved to disc so you don't have to repeat this process every time you start your application. You get some neat matrices here that allow you to create a rectification map (cv::stereoRectify/cv::initUndistortRectifyMap) that can later be applied to your images using cv::remap. You also get a neat matrix called Q, which is a disparity-to-depth matrix.
The reason to rectify your images is that once the process is complete for a pair of images (assuming your calibration is correct), every pixel/object in one image can be found on the same row in the other image.
There are a few ways you can go from here, depending on what kind of features you are looking for in the image. One way is to use CVs stereo correspondence functions, such as Stereo Block Matching or Semi Global Block Matching. This will give you a disparity map for the entire image which can be transformed to 3D points using the Q matrix (cv::reprojectImageTo3D).
The downfall of this is that unless there is much texture information in the image, CV isn't really very good at building a dense disparity map (you will get gaps in it where it couldn't find the correct disparity for a given pixel), so another approach is to find the points you want to match yourself. Say you find the feature/object in x=40,y=110 in the left image and x=22 in the right image (since the images are rectified, they should have the same y-value). The disparity is calculated as d = 40 - 22 = 18.
Construct a cv::Point3f(x,y,d), in our case (40,110,18). Find other interesting points the same way, then send all of the points to cv::perspectiveTransform (with the Q matrix as the transformation matrix, essentially this function is cv::reprojectImageTo3D but for sparse disparity maps) and the output will be points in an XYZ-coordinate system with the left camera at the center.
I am still working on it, so I will not post entire source code yet. But I will give you a conceptual solution.
You will need the following data as input (for both cameras):
camera position
camera point of interest (point at which camera is looking)
camera resolution (horizontal and vertical)
camera field of view angles (horizontal and vertical)
You can measure the last one yourself, by placing the camera on a piece of paper and drawing two lines and measuring an angle between these lines.
Cameras do not have to be aligned in any way, you only need to be able to see your object in both cameras.
Now calculate a vector from each camera to your object. You have (X,Y) pixel coordinates of the object from each camera, and you need to calculate a vector (X,Y,Z). Note that in the simple case, where the object is seen right in the middle of the camera, the solution would simply be (camera.PointOfInterest - camera.Position).
Once you have both vectors pointing at your target, lines defined by these vectors should cross in one point in ideal world. In real world they would not because of small measurement errors and limited resolution of cameras. So use the link below to calculate the distance vector between two lines.
Distance between two lines
In that link: P0 is your first cam position, Q0 is your second cam position and u and v are vectors starting at camera position and pointing at your target.
You are not interested in the actual distance, they want to calculate. You need the vector Wc - we can assume that the object is in the middle of Wc. Once you have the position of your object in 3D space you also get whatever distance you like.
I will post the entire source code soon.
I have the source code for detecting human face and returns not only depth but also real world coordinates with left camera (or right camera, I couldn't remember) being origin. It is adapted from source code from "Learning OpenCV" and refer to some websites to get it working. The result is generally quite accurate.

Resources