I am working on a project that needs to measure the volume, or H x W x T, of an object (for example, a box). I have some understanding of applying filters and finding contours in OpenCV, and I have managed to calibrate an Intel RealSense camera and get depth from an image.
So I have a few questions.
How do I find the edges and coordinates of the box only within a virtually designated area? I have tried Canny and Hough, but I can't work out how to get only the lines of the box. Should I apply masking, or should I put down a reference plane (say, a sheet of plywood bigger than the object to be measured)?
Will viewing from this angle and distance work? Would cv2 Affinity work with this method?
Please let me know if you need more information.
Related
I am trying to find the coordinates of an object detected from a single camera using OpenCV. The camera will be mounted on a drone, looking straight down at the surface.
I have:
-Camera's coordinates from GPS sensor on the drone.
-Camera's height.
-Camera's intrinsic parameters.
3D Reconstruction formula
According to this formula, I need the extrinsic parameters to find the real-world coordinates. I suppose I should use OpenCV's solvePnP method to find them. As far as I know, the extrinsic parameters describe the camera's location, but my camera will be on a drone and its location will change. Are the extrinsic parameters constant, just like the intrinsic parameters?
Is there any other way to do this calculation?
What you're trying to do is what is called monoplotting. In order to estimate XY real world coordinates from a single image you need to know the following:
X0, Y0, Z0 real world coordinates of your camera
Exterior orientation of your camera (Pitch, Roll and Yaw)
Interior orientation of your camera (focal length, principal point in x and y direction, radial and tangential distortion parameters)
The XYZ of your camera and the exterior orientation are typically stored in the EXIF data of your image. This does, however, depend on the drone. There are some good Python modules for extracting this information from the images. I use exifread.
The interior orientation can be more difficult to get, as you need to perform a camera calibration, which you can do with OpenCV. You should be able to use the tutorial Yunus linked in his comment. There is a shortcut, however: if you use photogrammetry software from pix4D, you can use their camera database, which stores the interior orientations of many different drones. They are not perfect but should be alright for many use cases, see link.
When you have all of these parameters, you need to do the following:
1. Undistort your images.
2. Create image coordinates of the points whose real-world coordinates you want to know, using the undistorted image.
3. Create rotation matrices for the rotations about the X, Y and Z axes and multiply them together (RX RY RZ).
4. Apply the collinearity equations.
Regarding 1, you can use cv2.undistort for this; the link Yunus provided has a tutorial for this as well.
Regarding 3, I'm sure OpenCV can provide this matrix for you, but writing the function yourself is quite easy and good for understanding what is going on. See Wikipedia: link. It can be a little confusing which of the pitch, roll and yaw angles to use for which matrix. This all depends on how your camera's exterior coordinate system is defined. The typical convention is that pitch is around x, roll is around y and yaw is around z.
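As a rough illustration of step 3, here is a minimal NumPy sketch that builds the three elementary rotations and multiplies them in the RX RY RZ order mentioned above. The function name `rotation_matrix` and the example angles are my own placeholders, and it assumes the "typical" convention (pitch about x, roll about y, yaw about z, angles in radians). Your drone may well use a different axis order or sign convention, so treat it as a starting point only.

```python
import numpy as np

def rotation_matrix(pitch, roll, yaw):
    """Return RX(pitch) @ RY(roll) @ RZ(yaw); all angles in radians.

    Assumed convention: pitch about x, roll about y, yaw about z.
    """
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, cp, -sp],
                   [0.0, sp, cp]])
    ry = np.array([[cr, 0.0, sr],
                   [0.0, 1.0, 0.0],
                   [-sr, 0.0, cr]])
    rz = np.array([[cy, -sy, 0.0],
                   [sy, cy, 0.0],
                   [0.0, 0.0, 1.0]])
    return rx @ ry @ rz

# Example angles (made up): small pitch and roll, 30 degrees of yaw.
R = rotation_matrix(np.radians(2.0), np.radians(-1.5), np.radians(30.0))
```

A quick sanity check is that `R @ R.T` is the identity and `det(R)` is 1; if not, one of the elementary matrices has a sign error.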
Regarding 4, the collinearity equations depend on the rotation matrices you created, see: link. The term Z-Z0 is just the negative relative height of the camera above the object you are trying to find coordinates for. So if you do not have the relative height of your drone, you need to know the height of your object and subtract the drone's height from the object's height (you'll get a negative number).
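To make step 4 concrete, here is a hedged sketch of the inverse collinearity equations in NumPy. The assumptions are mine, not from the question: image coordinates `x, y` are measured from the principal point with y pointing up, the camera looks along its negative z axis (the usual photogrammetric convention), `R` rotates world axes into camera axes, and `f` is in the same units as `x, y` (pixels here). The function name `image_to_ground` and the numbers are hypothetical.

```python
import numpy as np

def image_to_ground(x, y, f, R, camera_xyz, ground_z):
    """Intersect the image ray through (x, y) with the plane Z = ground_z.

    x, y       : image coordinates relative to the principal point (y up)
    f          : focal length, same units as x and y
    R          : 3x3 rotation, world axes -> camera axes (assumed convention)
    camera_xyz : (X0, Y0, Z0) of the camera in world coordinates
    """
    X0, Y0, Z0 = camera_xyz
    # Direction of the image ray expressed in world axes.
    ray = R.T @ np.array([x, y, -f], dtype=float)
    # (Z - Z0) is negative when the camera is above the ground.
    scale = (ground_z - Z0) / ray[2]
    return X0 + scale * ray[0], Y0 + scale * ray[1]

# Nadir-looking camera (R = identity) 50 m above flat ground at Z = 0.
X, Y = image_to_ground(100.0, -40.0, 1000.0, np.eye(3),
                       (500.0, 200.0, 50.0), 0.0)
```

With these numbers the scale is 50/1000 = 0.05 m per pixel, so the point lands 5 m east and 2 m south of the camera position. If your pixel coordinates have y pointing down (as OpenCV's do), flip the sign of y first.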
I hope this helps and points you in the right direction.
I am trying to understand how to map points between two images of the same scene taken from different camera positions, like this (apologies for the rough sketch and the handwriting). Sample image taken from cam1 and sample image taken from cam2. I am trying to map between these two images. Since the two cameras are the same model (a Logitech camera), I assumed camera calibration isn't required. Using SIFT descriptors and feature matching, I feed the good matches into a homography estimate with RANSAC and get a 3x3 matrix. To verify the mapping, I select a few objects in the cam1 image (say, the bins) and try to map them into the cam2 image with the 3x3 matrix using warp_perspective, but the output isn't good. For example, I selected the top-left and bottom-right corners of the objects (the bins) in the cam1 image and tried to draw a bounding box for each object in the cam2 image.
But as the view-map output image shows, the bounding boxes don't line up with the bins.
I want to understand where I am going wrong. Is it the camera positions, meaning a homography shouldn't be used here? Do I have to use multiple homographies, or do I need to know the translation between the camera positions? I'm very confused. Thank you.
A homography maps a plane to a plane. It can only be used if all of the matches lie on a plane in the real world (e.g. on a planar wall), or if the feature points are far enough from both cameras that the transformation between the cameras can be treated as a pure rotation. See this link for further explanation.
In your case the objects are located at different depths so you need to perform stereo calibration of cameras and then compute the depth map to be able to map pixels from one camera into another.
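To see why a single homography fails for objects at different depths, here is a small synthetic NumPy sketch (no real images involved). It builds the homography induced by one plane, H = K (R + t n^T / d) K^-1, for two identical cameras, then transfers one point that lies on the plane and one that lies in front of it. The intrinsics `K`, the relative pose `R`, `t` and the plane are all made-up values for illustration.

```python
import numpy as np

# Assumed shared intrinsics for both cameras (made-up values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Assumed pose of camera 2 relative to camera 1: X2 = R @ X1 + t.
a = np.radians(5.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-0.5, 0.0, 0.0])

# A wall 4 m in front of camera 1: points X on it satisfy n . X = d.
n = np.array([0.0, 0.0, 1.0])
d = 4.0

# Homography induced by that plane.
H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def project(X):
    """Pinhole projection of a 3D point given in that camera's frame."""
    p = K @ X
    return p[:2] / p[2]

def transfer_error(X1):
    """Pixel distance between the H-mapped point and its true cam2 position."""
    p1 = project(X1)
    p2_true = project(R @ X1 + t)
    q = H @ np.array([p1[0], p1[1], 1.0])
    return np.linalg.norm(q[:2] / q[2] - p2_true)

err_on_plane = transfer_error(np.array([0.3, -0.2, 4.0]))   # on the wall
err_off_plane = transfer_error(np.array([0.3, -0.2, 2.0]))  # 2 m in front of it
```

The on-plane error is zero up to rounding, while the off-plane point lands roughly a hundred pixels away from where it really appears. That is the same kind of misplaced bounding box described in the question.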
I need to find the intrinsic parameters of a CCTV camera using a set of historic footage images (that is all I have; I have no control over the environment, so no chessboard calibration).
The good news is that I have access to some ground-truth real-world coordinates, visible in most of the images.
Just wondering if there is any solid approach to come up with the camera intrinsic parameters.
P.S. I already found the homography matrix using cv2.findHomography in Python.
P.S. I have already tested QTcalib on two machines, but it is unable to visualize the images in the first place. Not sure what is wrong with it.
Thanks in advance.
Intrinsic parameters comprise fx, fy, cx, cy and skew, plus additional distortion parameters (k1-k5, r1-r2).
Assume you have no distortion and that cx and cy are exactly at the image center, with the image origin at the top left as usual. You say you know some ground-truth 3D points. Suppose their coordinates are expressed relative to the camera's optical axis (that is, in the camera frame). A 3D point P then projects onto the image plane at a point p. P, p and the camera optical center O, together with the optical axis, form similar triangles, which gives:
fx / (p_x-cx) = P_z / P_x
fx = (p_x-cx) * P_z / P_x
The same goes for fy, and usually fx and fy are the same.
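As a quick numeric check of the formula, here is a tiny Python sketch. It assumes no distortion, a known cx, and ground-truth points already expressed in the camera frame. Taking the median over several points is my own addition to damp measurement noise; the function name and the sample values are made up.

```python
import numpy as np

def estimate_fx(p_x, c_x, P_x, P_z):
    """fx = (p_x - cx) * P_z / P_x, evaluated per point, then the median.

    p_x      : pixel x-coordinates of the projected points
    c_x      : principal point x (assumed known, e.g. image width / 2)
    P_x, P_z : camera-frame coordinates of the ground-truth 3D points
    Points with P_x close to zero carry no information and should be dropped.
    """
    p_x, P_x, P_z = (np.asarray(v, dtype=float) for v in (p_x, P_x, P_z))
    return float(np.median((p_x - c_x) * P_z / P_x))

# Synthetic example: these pixels were generated with fx = 900, cx = 640.
fx = estimate_fx(p_x=[790.0, 415.0], c_x=640.0,
                 P_x=[0.5, -1.0], P_z=[3.0, 4.0])
```

Here both points give (790 - 640) * 3 / 0.5 = 900 and (415 - 640) * 4 / (-1) = 900, so the estimate is 900 pixels.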
This holds under the ideal assumption that your camera has no distortion. If there is distortion, you need enough sample points spread all over the image to characterize it, as shown below. One or two points won't give you the whole picture.
Some papers use tricks such as sea-horizon vanishing lines (see ref; it is a series of works) or vanishing points from well-defined 3D buildings to detect the distortion. These work from the extrinsics back to the intrinsics and can eventually reach a good guess after some trials. But this is still very much research and cannot be applied to general cases.
Ref: Han Wang, Wei Mou, Xiaozheng Mou, Shenghai Yuan, Soner Ulun, Shuai Yang and Bok-Suk Shin, An Automatic Self-Calibration Approach for Wide Baseline Stereo Cameras Using Sea Surface Images, unmanned system
If all you have is a video and a few 3D points, your best bet is probably to matchmove it, that is, do a manually assisted bundle adjustment using a 3D computer graphics environment, e.g. Blender. There are a lot of tutorials online on how to do it (example). To add the 3D points as constraints, you build some shapes representing them in the virtual world (e.g. some small spheres) and place them so that their relative positions match the ground truth you have, then add them to the tracker solution.
I am currently working with a RealSense 3D camera to detect one or more boxes and calculate their dimensions.
I am new to computer vision. I first worked on object detection, with and without color, to get a basic understanding. Using C++ and OpenCV, I managed to get the corners (and their x, y, z pixel coordinates) of a square using smoothing (noise removal), edge detection (the Canny function), line detection (Hough transform) and line intersection (mathematical calculation) on a simplified picture (uniform background).
Now my question: do you have any direction, recommendation, advice or literature on calculating the dimensions of a box? https://www.youtube.com/watch?v=l-i2E7aZY6A
I am using C++ and OpenCV with an Intel RealSense 3D camera.
Thanks in advance ((-_-))
Once you have the colour image pixel coordinates of the box, you can obtain the real-world coordinates (also known as vertices or camera coordinates in the documentation) using methods in the projection interface, then use simple Pythagoras to calculate the distance between the points in mm.
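To illustrate the idea without the SDK, here is a sketch that uses a plain pinhole model instead of the RealSense projection interface (whose exact calls I am not reproducing here). The intrinsics `fx, fy, cx, cy` and the pixel and depth values are made-up placeholders; in practice they come from the camera's calibration and the depth frame.

```python
import numpy as np

def deproject(u, v, depth_mm, fx, fy, cx, cy):
    """Pixel (u, v) plus depth -> 3D point in camera coordinates (mm).

    Plain pinhole model with no distortion; a stand-in for the SDK's
    projection methods, not the SDK itself.
    """
    x = (u - cx) * depth_mm / fx
    y = (v - cy) * depth_mm / fy
    return np.array([x, y, depth_mm], dtype=float)

# Two corners of a box edge, both 600 mm from the camera (assumed values).
fx = fy = 600.0
cx, cy = 320.0, 240.0
corner_a = deproject(220, 240, 600.0, fx, fy, cx, cy)
corner_b = deproject(420, 240, 600.0, fx, fy, cx, cy)

# "Simple Pythagoras": Euclidean distance between the two 3D points.
edge_mm = float(np.linalg.norm(corner_b - corner_a))
```

With these numbers the corners sit at x = -100 mm and x = +100 mm, so the edge length comes out as 200 mm.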
If you have no experience with RealSense I'd recommend reading the documentation and looking through the sample apps included with the SDK.
With PCL (Point Cloud Library) you can find planes (or spheres and other surfaces), then refine the result with 2D image processing (e.g. edge detection).
http://pointclouds.org/
https://www.youtube.com/watch?v=VD044WAHEe4
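For a feel of what plane finding does, here is a small RANSAC plane fit written in plain NumPy. This is not PCL's API, just a stand-in for the same idea, and the threshold, iteration count and synthetic point cloud are assumptions chosen for the example.

```python
import numpy as np

def fit_plane_ransac(points, threshold=0.01, iterations=200, seed=0):
    """Return ((normal, offset), inlier_mask) for the plane normal . p + offset = 0."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iterations):
        # Three random points define a candidate plane.
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(b - a, c - a)
        length = np.linalg.norm(normal)
        if length < 1e-12:  # degenerate (collinear) sample, skip it
            continue
        normal = normal / length
        offset = -normal @ a
        # Points closer to the plane than the threshold count as inliers.
        inliers = np.abs(points @ normal + offset) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, offset)
    return best_plane, best_inliers

# Synthetic cloud: a noisy table top at z = 1 plus scattered clutter.
rng = np.random.default_rng(1)
table = np.column_stack([rng.uniform(-1, 1, 300),
                         rng.uniform(-1, 1, 300),
                         1.0 + rng.normal(0.0, 0.002, 300)])
clutter = rng.uniform([-1, -1, 0], [1, 1, 2], size=(60, 3))
cloud = np.vstack([table, clutter])

(normal, offset), inliers = fit_plane_ransac(cloud)
```

Once the supporting plane (floor or table) is found, the points that are not inliers are candidates for the box itself, and their extent above the plane gives the box height.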
Is there a way to calculate the distance to specific object using stereo camera?
Is there an equation or something to get distance using disparity or angle?
NOTE: Everything described here can be found in the Learning OpenCV book in the chapters on camera calibration and stereo vision. You should read these chapters to get a better understanding of the steps below.
One approach that does not require you to measure all the camera intrinsics and extrinsics yourself is to use OpenCV's calibration functions. Camera intrinsics (lens distortion, skew, etc.) can be calculated with cv::calibrateCamera, while the extrinsics (the relation between the left and right camera) can be calculated with cv::stereoCalibrate. These functions take a number of points in pixel coordinates and try to map them to real-world object coordinates. OpenCV has a neat way to get such points: print out a black-and-white chessboard and use the cv::findChessboardCorners/cv::cornerSubPix functions to extract them. Around 10-15 image pairs of chessboards should do.
The matrices calculated by the calibration functions can be saved to disc so you don't have to repeat this process every time you start your application. You get some neat matrices here that allow you to create a rectification map (cv::stereoRectify/cv::initUndistortRectifyMap) that can later be applied to your images using cv::remap. You also get a neat matrix called Q, which is a disparity-to-depth matrix.
The reason to rectify your images is that once the process is complete for a pair of images (assuming your calibration is correct), every pixel/object in one image can be found on the same row in the other image.
There are a few ways you can go from here, depending on what kind of features you are looking for in the image. One way is to use OpenCV's stereo correspondence functions, such as Stereo Block Matching or Semi Global Block Matching. This will give you a disparity map for the entire image, which can be transformed to 3D points using the Q matrix (cv::reprojectImageTo3D).
The downside of this is that unless there is a lot of texture information in the image, OpenCV isn't really very good at building a dense disparity map (you will get gaps where it couldn't find the correct disparity for a given pixel), so another approach is to find the points you want to match yourself. Say you find the feature/object at x=40, y=110 in the left image and at x=22 in the right image (since the images are rectified, they should have the same y-value). The disparity is calculated as d = 40 - 22 = 18.
Construct a cv::Point3f(x,y,d), in our case (40,110,18). Find other interesting points the same way, then send all of the points to cv::perspectiveTransform (with the Q matrix as the transformation matrix, essentially this function is cv::reprojectImageTo3D but for sparse disparity maps) and the output will be points in an XYZ-coordinate system with the left camera at the center.
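The arithmetic behind that transform can be sketched in NumPy, so you can see what multiplying (x, y, d, 1) by Q does. The Q below is hand-built from assumed values (focal length 700 px, principal point (320, 240), baseline 0.06 m, identical principal points in both cameras) with the sign chosen so that positive disparity gives positive depth. A real Q should come from cv::stereoRectify.

```python
import numpy as np

# Assumed rectified-stereo parameters (placeholders, not from the answer).
f, cx, cy, baseline = 700.0, 320.0, 240.0, 0.06

# Disparity-to-depth matrix for this simplified setup.
Q = np.array([[1.0, 0.0, 0.0, -cx],
              [0.0, 1.0, 0.0, -cy],
              [0.0, 0.0, 0.0, f],
              [0.0, 0.0, 1.0 / baseline, 0.0]])

def sparse_disparity_to_3d(points_xyd, Q):
    """Apply Q to (x, y, d) triples and dehomogenize, as a perspective
    transform with a 4x4 matrix does for sparse points."""
    pts = np.asarray(points_xyd, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    out = homogeneous @ Q.T
    return out[:, :3] / out[:, 3:4]

# The example point from above: x=40, y=110, disparity 18.
xyz = sparse_disparity_to_3d([(40.0, 110.0, 18.0)], Q)
```

For this point the depth is Z = f * baseline / d = 700 * 0.06 / 18, about 2.33 m, which is the familiar "depth is inversely proportional to disparity" relation.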
I am still working on it, so I will not post entire source code yet. But I will give you a conceptual solution.
You will need the following data as input (for both cameras):
camera position
camera point of interest (point at which camera is looking)
camera resolution (horizontal and vertical)
camera field of view angles (horizontal and vertical)
You can measure the last one yourself, by placing the camera on a piece of paper and drawing two lines and measuring an angle between these lines.
Cameras do not have to be aligned in any way, you only need to be able to see your object in both cameras.
Now calculate a vector from each camera to your object. You have (X,Y) pixel coordinates of the object from each camera, and you need to calculate a vector (X,Y,Z). Note that in the simple case, where the object is seen right in the middle of the camera, the solution would simply be (camera.PointOfInterest - camera.Position).
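One possible way to turn a pixel into such a vector, using only the four inputs listed above, is sketched below. It additionally assumes a world "up" direction of +Z with no camera roll, an ideal pinhole camera, and pixel coordinates with the origin at the top left. None of that is stated in the answer, and the function and parameter names are hypothetical.

```python
import numpy as np

def pixel_ray(position, point_of_interest, resolution, fov_deg, pixel,
              world_up=(0.0, 0.0, 1.0)):
    """Unit vector from the camera through the given pixel.

    resolution : (width, height) in pixels
    fov_deg    : (horizontal, vertical) field of view in degrees
    pixel      : (u, v), origin at the top-left corner, v pointing down
    Assumes the camera has no roll and is not looking straight along world_up.
    """
    position = np.asarray(position, dtype=float)
    forward = np.asarray(point_of_interest, dtype=float) - position
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(world_up, dtype=float))
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)

    width, height = resolution
    # Normalized offsets from the image center, in the range [-1, 1].
    nx = (pixel[0] - width / 2.0) / (width / 2.0)
    ny = (pixel[1] - height / 2.0) / (height / 2.0)
    half_h = np.tan(np.radians(fov_deg[0]) / 2.0)
    half_v = np.tan(np.radians(fov_deg[1]) / 2.0)

    ray = forward + nx * half_h * right - ny * half_v * up
    return ray / np.linalg.norm(ray)

# Camera at the origin looking along +X, 640x480, 60 x 45 degree field of view.
center_ray = pixel_ray((0, 0, 0), (5, 0, 0), (640, 480), (60, 45), (320, 240))
edge_ray = pixel_ray((0, 0, 0), (5, 0, 0), (640, 480), (60, 45), (640, 240))
```

The center pixel reproduces the simple case (the direction of PointOfInterest - Position), and a pixel on the right edge comes out at half the horizontal field of view away from it.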
Once you have both vectors pointing at your target, the lines defined by these vectors should cross at one point in an ideal world. In the real world they will not, because of small measurement errors and the limited resolution of the cameras. So use the link below to calculate the distance vector between the two lines.
Distance between two lines
In that link: P0 is your first cam position, Q0 is your second cam position and u and v are vectors starting at camera position and pointing at your target.
You are not interested in the actual distance that they calculate. You need the vector Wc: we can assume that the object is in the middle of Wc. Once you have the position of your object in 3D space, you can also get whatever distance you like.
I will post the entire source code soon.
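In the meantime, the midpoint construction described above can be sketched in NumPy as follows. It uses the standard closest-points formula for two 3D lines with the same symbols (P0, Q0, u, v, Wc); the function name and the example camera positions are made up, and this is not the original author's code.

```python
import numpy as np

def triangulate_midpoint(P0, u, Q0, v):
    """Midpoint of the shortest segment Wc between lines P0 + s*u and Q0 + t*v."""
    P0, u, Q0, v = (np.asarray(a, dtype=float) for a in (P0, u, Q0, v))
    w0 = P0 - Q0
    a, b, c = u @ u, u @ v, v @ v
    d, e = u @ w0, v @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique closest point")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    closest_on_first = P0 + s * u
    closest_on_second = Q0 + t * v
    Wc = closest_on_first - closest_on_second  # shortest connecting vector
    midpoint = (closest_on_first + closest_on_second) / 2.0
    return midpoint, Wc

# Two cameras 1 m apart looking at a target; the second ray is slightly off.
target = np.array([0.5, 0.2, 3.0])
cam1 = np.array([0.0, 0.0, 0.0])
cam2 = np.array([1.0, 0.0, 0.0])
ray1 = target - cam1
ray2 = target - cam2 + np.array([0.0, 0.01, 0.0])  # simulated measurement error

position, Wc = triangulate_midpoint(cam1, ray1, cam2, ray2)
distance_from_cam1 = float(np.linalg.norm(position - cam1))
```

The length of Wc is a handy quality check: if it is large compared with the distance to the object, the two rays disagree and the inputs (positions, angles or pixel picks) are probably off.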
I have source code that detects a human face and returns not only the depth but also real-world coordinates, with the left camera (or the right camera, I can't remember) as the origin. It is adapted from the source code in "Learning OpenCV", and I referred to some websites to get it working. The result is generally quite accurate.