I want to rotate my vehicle images (by 90, 180, or 270 degrees) back to the correct orientation.
My approach is: rotate the image by each of the 4 angles (0, 90, 180, 270), give each version to an object-detection neural net, and take the angle that gives the greatest confidence score for the car object as the angle I have to use to rotate my image.
The problem is that some images rotated by 180 degrees get a better score than the original, because the network mistakenly detects the road together with the two wheels as the car. For example:
The original image
The rotated image, which gets a better score
How can I prevent that from happening?
My model: ssd_resnet_50_fpn_coco from model zoo
Here are some other failure cases:
Search for wheels in the picture and determine if their centers are above or below the car box.
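As a rough sketch of that heuristic (the box format and the wheel detector are assumptions here - the COCO model in the question has no wheel class, so you would need a separately trained detector for the wheels):

# Hypothetical detection format: (label, score, x_min, y_min, x_max, y_max), y grows downwards.
def is_upside_down(detections):
    cars = [d for d in detections if d[0] == "car"]
    wheels = [d for d in detections if d[0] == "wheel"]
    if not cars or not wheels:
        return None  # cannot decide from this image
    best_car = max(cars, key=lambda d: d[1])
    car_center_y = (best_car[3] + best_car[5]) / 2.0
    wheel_center_y = sum((w[3] + w[5]) / 2.0 for w in wheels) / len(wheels)
    # In a correctly oriented image the wheel centers sit below the car box center.
    return wheel_center_y < car_center_y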
I found a solution that is a bit tricky.
First, I created a model (SSD, for example) that detects 4 classes corresponding to the rotation angle of the image, i.e. class 1 - 0 degrees, class 2 - 90 degrees, etc.
The model then predicts 2 of the 4 labels reliably: 0 and 180. If the prediction comes out as 90 or 270 (which are sometimes confused with each other), I rotate the image by another 90 degrees and let the model decide between 0 and 180, then subtract 90 degrees from the result to get the right label (a rough sketch of this logic is below).
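A minimal sketch of that logic, assuming a predict_angle(image) helper that wraps the 4-class model and returns one of 0, 90, 180, 270 (the rotation direction has to match the convention used when labelling the classes):

import cv2

def estimate_rotation(image, predict_angle):
    angle = predict_angle(image)
    if angle in (90, 270):
        # 90 and 270 are sometimes confused with each other, so rotate by another
        # 90 degrees, let the model decide between the reliable classes 0 and 180,
        # and subtract the extra 90 degrees again.
        rotated = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
        angle = (predict_angle(rotated) - 90) % 360
    return angle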
I want to estimate vehicle speed. For calibration (converting pixels to meters) I only have two photos from the camera.
In the first photo the vehicle is at the top of the image (Photo 1).
In the second photo the same vehicle has moved 12 meters (Photo 2).
The information that I have in meters includes:
Movement Distance in meters (Purple line)
License plate dimensions (X, Y -> its width and height in meters)
H: height of the license plate from the ground in meters
I want to know how I can write a function, using this information, that takes an arbitrary license plate L (i.e. its width, height and location on the image in pixels) and returns its distance from the first photo's license plate (or any arbitrary location) in meters (the orange line marked "Question?").
If the camera were directly above the road we could simply convert pixels to meters, but in a perspective view I do not know how to do this, or whether it is possible at all.
The problem is that when the vehicle is far away and small, 5 pixels of movement (top -> bottom) correspond to about 0.5 meters, while when the vehicle is near the camera and larger, 5 pixels correspond to about 0.2 meters (the numbers are just an example).
Update:
Using a third point between Photo 1 and Photo 2, I fitted a second-degree polynomial (a + bx + cx^2) and now I can calculate the distance. I measured the speed and the accuracy is about ±4 km/h.
In the following picture, X is the starting height of the license plate in pixels and Y is the distance in meters.
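A minimal sketch of that fit with NumPy (the sample points are placeholders for the actual measurements):

import numpy as np

# Placeholder measurements: plate's vertical pixel position in Photo 1, the
# intermediate photo and Photo 2, with the corresponding distances in meters.
y_pixels = np.array([850.0, 520.0, 310.0])
dist_m = np.array([0.0, 6.0, 12.0])

coeffs = np.polyfit(y_pixels, dist_m, 2)   # highest power first: [c, b, a]
pixels_to_meters = np.poly1d(coeffs)

def speed_kmh(y1, y2, dt_seconds):
    # speed estimate from the plate's pixel position in two frames dt_seconds apart
    return abs(pixels_to_meters(y2) - pixels_to_meters(y1)) / dt_seconds * 3.6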
I also posted this topic in the Q&A forum at opencv.org but I don't know how many experts from here are reading this forum - so forgive me that I'm also trying it here.
I'm currently learning OpenCV and my current task is to measure the distance between two balls which are lying on a plate. My next step is to compare several cameras and resolutions to get a feeling for how much resolution, noise, distortion, etc. affect the accuracy. If the community is interested in the results, I'm happy to share them when they are ready! The camera is placed above the plate using a wide-angle lens. The width and height of the plate (1500 x 700 mm) and the radius of the balls (40 mm) are known.
My steps so far:
camera calibration
undistorting the image (the distortion is high due to the wide-angle lens)
findHomography: I use the corner points of the plate as input (4 points in pixels in the undistorted image) and the corner points in millimeters (starting with 0,0 in the lower left corner, up to 1500,700 in the upper right corner)
using HoughCircles to find the balls in the undistorted image
applying perspectiveTransform on the circle center points => circle center points now exist in millimeters
calculating the distance between the two center points: d = sqrt((x1-x2)^2+(y1-y2)^2) (a rough code sketch of these steps follows the list)
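A rough sketch of these steps in Python (the HoughCircles parameters and the corner ordering are placeholders to be adapted):

import cv2
import numpy as np

def ball_distance_mm(img, camera_matrix, dist_coeffs, plate_corners_px):
    # undistort the image
    undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)
    # homography from pixel corners (undistorted image) to plate coordinates in mm;
    # plate_corners_mm must be in the same order as plate_corners_px
    plate_corners_mm = np.float32([[0, 0], [1500, 0], [1500, 700], [0, 700]])
    H, _ = cv2.findHomography(np.float32(plate_corners_px), plate_corners_mm)
    # find the balls
    gray = cv2.cvtColor(undistorted, cv2.COLOR_BGR2GRAY)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=100,
                               param1=100, param2=30, minRadius=20, maxRadius=80)
    centers_px = circles[0, :2, :2].reshape(-1, 1, 2).astype(np.float32)  # first two circles
    # transform the centers to mm and measure
    centers_mm = cv2.perspectiveTransform(centers_px, H).reshape(-1, 2)
    return float(np.linalg.norm(centers_mm[0] - centers_mm[1]))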
The results: an error of around 4 mm at a distance of 300 mm and an error of around 25 mm at a distance of 1000 mm. But if I measure a rectangle which is printed on the plate, the error is smaller than 0.2 mm, so I guess the calibration and undistortion are working well.
I thought about this and figured out three possible reasons:
findHomography was applied to points lying directly on the plate, whereas the center points of the balls should be measured at their equatorial height => how can I change the result of findHomography to account for this, i.e. to "move" the plane? The radius in mm is known.
the error increases with the distance of the ball from the optical center, because the camera does not see the ball from directly above, so the center point in the 2D projection of the image is not the same as in the 3D world - it will be projected further towards the borders of the image. => are there any geometrical operations which I can apply to the found center to correct the value?
during undistortion there is probably a loss of information, because I produce a new undistorted image and go back to pixel accuracy, although I have many floating-point values in the distortion coefficients. Should I search for the balls in the distorted image and transform only the center points with the distortion model? But I don't know what the code for this task looks like.
I hope someone can help me to improve this and I hope this topic is interesting for other OpenCV-starters.
Thanks and best regards!
Here are some thoughts to help you along... By no means "the answer", though.
First a simple one. If you have calibrated your image in mm at a particular plane that is distance D away, then points that are r closer will appear larger than they are. To get from measured coordinates to actual coordinates, you use
Actual = measured * (D-r)/D
So since the centers of the spheres are radius r above the plane, the above formula should answer part 1 of your question.
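For a sense of scale: with your 40 mm ball radius and an assumed (purely illustrative) lens-to-plate distance of D = 1000 mm, a center measured 300 mm from the point directly below the lens would actually be at 300 * (1000 - 40) / 1000 = 288 mm. The correction grows linearly with the measured distance, which at least matches the tendency of your error to grow with distance.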
Regarding the second question: if you think about it, the center of the sphere that you see should be in the right place "in the plane of the center of the sphere", even though you look at it from an angle. Draw yourself a picture to convince yourself this is so.
Third question: if you find the coordinates of the spheres in the distorted image, you should be able to map them into the corrected image by undistorting just those points (undistortPoints) and then applying your homography with perspectiveTransform. This may improve accuracy a little bit - but I am surprised at the size of the errors you see. How large is a single pixel at the largest distance (1000 mm)?
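A minimal sketch of that idea (camera_matrix, dist_coeffs and the homography H are assumed to come from your existing calibration steps):

import cv2
import numpy as np

def distorted_centers_to_mm(centers_px, camera_matrix, dist_coeffs, H):
    # centers_px: circle centers found by HoughCircles in the *distorted* image
    pts = np.float32(centers_px).reshape(-1, 1, 2)
    # P=camera_matrix keeps the points in pixel coordinates of the undistorted image,
    # so the homography computed from the undistorted plate corners still applies
    undistorted = cv2.undistortPoints(pts, camera_matrix, dist_coeffs, P=camera_matrix)
    return cv2.perspectiveTransform(undistorted, H).reshape(-1, 2)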
EDIT
You asked about elliptical projections etc. Basically, if you think of the optical center of the camera as a light source, and look at the shadow of the ball onto the plane as your "2D image", you can draw a picture of the rays that just hit the sides of the ball, and determine the different angles:
It is easy to see that P (the mid point of A and B) is not the same as C (the projection of the center of the sphere). A bit more trig will show you that the error C - (A+B)/2 increases with x and decreases with D. If you know A and B you can calculate the correct position of C (given D) from:
C = D * tan( (atan(B/D) + atan(A/D)) / 2 )
The error becomes larger as D is smaller and/or x is larger. Note D is the perpendicular (shortest) distance from the lens to the object plane.
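A small helper implementing that correction, with purely illustrative numbers:

import math

def corrected_center(A, B, D):
    # A, B: projected edge positions in the plane, D: perpendicular lens-to-plane
    # distance, all in the same units (e.g. mm)
    return D * math.tan((math.atan(A / D) + math.atan(B / D)) / 2.0)

# Example: edges seen at 460 mm and 540 mm with the lens 1000 mm above the plane.
naive = (460 + 540) / 2.0                    # 500.0 mm
better = corrected_center(460, 540, 1000.0)  # about 500.2 mm, slightly further out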
This only works if the camera is acting like a "true lens" - in other words, there is no pincushion distortion, and a rectangle in the scene maps into a rectangle in the image. The above, combined with your own idea to detect in the uncorrected ('pixel') space and then transform only the centers you find (undistortPoints, then perspectiveTransform), ought to get you all the way there.
See what you can do with that!
I really hope this isn't a waste of anyone's time but I've run into a small problem. I am able to construct the transformation matrix using the following:
M =
s*cos(theta) -s*sin(theta) t_x
s*sin(theta) s*cos(theta) t_y
0 0 1
This works if I give the correct values for theta, s (scale) and t_x/t_y and then use this matrix as one of the arguments for cv::warpPerspective. The problem is that this matrix rotates about the (0,0) pixel, whereas I would like it to rotate about the centre pixel (cols/2, rows/2). How can I incorporate the centre-point rotation into this matrix?
Two possibilities. The first is to use the function getRotationMatrix2D which takes the center of rotation as an argument, and gives you a 2x3 matrix. Add the third row and you're done.
A second possibility is to construct an additional matrix that translates the picture before and after the rotation:
T =
1 0 -cols/2
0 1 -rows/2
0 0 1
Multiply your rotation matrix M with this one to get the total transform T^-1 * M * T (e.g. with the function gemm) and apply the result with warpPerspective (a sketch of both options is below).
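A sketch of both options in Python (an image img and example values for theta, s, t_x, t_y are assumed; note that getRotationMatrix2D treats a positive angle as counter-clockwise, which is the opposite sign convention to the matrix written in the question):

import cv2
import numpy as np

theta, s, tx, ty = 30.0, 1.0, 0.0, 0.0     # example values
rows, cols = img.shape[:2]
center = (cols / 2.0, rows / 2.0)

# Option 1: getRotationMatrix2D rotates about the given center; append the third row.
M1 = np.vstack([cv2.getRotationMatrix2D(center, theta, s), [0, 0, 1]])
M1[0, 2] += tx
M1[1, 2] += ty

# Option 2: translate the center to the origin, rotate/scale, translate back.
c, si = s * np.cos(np.radians(theta)), s * np.sin(np.radians(theta))
M = np.array([[c, -si, tx], [si, c, ty], [0, 0, 1]])
T = np.array([[1, 0, -cols / 2.0], [0, 1, -rows / 2.0], [0, 0, 1]])
M2 = np.linalg.inv(T) @ M @ T

warped = cv2.warpPerspective(img, M2, (cols, rows))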
I'm trying to estimate the relative camera pose using OpenCV. The cameras in my case are calibrated (I know the intrinsic parameters of the camera).
Given the images captured at the two positions, I need to find the relative rotation and translation between the two cameras. The typical translation is about 5 to 15 meters and the yaw rotation between the cameras ranges between 0 and 20 degrees.
To achieve this, the following steps are adopted (a rough code sketch of these steps follows the list):
a. Finding point correspondences using SIFT/SURF
b. Fundamental matrix estimation
c. Estimation of the essential matrix by E = K^T F K and modifying E to enforce the singularity constraint
d. Decomposition of the essential matrix to get the rotation, R = U W V^T or R = U W^T V^T (U and V^T obtained from the SVD of E)
e. Obtaining the actual rotation angles from the rotation matrix
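Roughly, in OpenCV's Python API the steps look like this (SIFT needs OpenCV >= 4.4; the ratio-test threshold is a placeholder, and recoverPose is used for steps d/e because it also picks the physically valid solution among the four candidates):

import cv2
import numpy as np

# img1, img2: the two grayscale images; K: the 3x3 intrinsic matrix from calibration.

# a. point correspondences with SIFT and a ratio test
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# b. fundamental matrix with RANSAC
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

# c. essential matrix E = K^T F K, then enforce the (1, 1, 0) singular values
E = K.T @ F @ K
U, _, Vt = np.linalg.svd(E)
E = U @ np.diag([1.0, 1.0, 0.0]) @ Vt

# d./e. recoverPose decomposes E, keeps the solution that passes the cheirality
# test, and returns R plus a unit-norm translation t
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
euler_deg, _, _, _, _, _ = cv2.RQDecomp3x3(R)  # Euler angles in degrees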
Experiment 1: Real Data
For the real-data experiment, I captured images with a camera mounted on a tripod. I captured images at Position 1, then moved to another aligned position, changed the yaw angle in steps of 5 degrees, and captured images at Position 2.
Problems/Issues:
The sign of the estimated yaw angles does not match the ground-truth yaw angles. Sometimes 5 deg is estimated as 5 deg, but 10 deg as -10 deg, and then 15 deg as 15 deg again.
In the experiment only the yaw angle is changed; however, the estimated roll and pitch angles have nonzero values close to 180/-180 degrees.
Precision is very poor; in some cases the error between the estimated and ground-truth angles is around 2-5 degrees.
How do I find the scale factor to get the translation in real-world units?
The behavior is the same on simulated data as well.
Has anybody experienced similar problems? Any clue on how to resolve them?
Any help from anybody would be highly appreciated.
(I know there are already many posts on similar problems; going through all of them has not helped me. Hence I'm posting one more time.)
In chapter 9.6 of Hartley and Zisserman, they point out that, for a particular essential matrix, if one camera is held in the canonical position/orientation, there are four possible solutions for the second camera matrix: [UWV' | u3], [UWV' | -u3], [UW'V' | u3], and [UW'V' | -u3].
The difference between the first and third (and second and fourth) solutions is that the orientation is rotated by 180 degrees about the line joining the two cameras, called a "twisted pair", which sounds like what you are describing.
The book says that in order to choose the correct combination of translation and orientation from the four options, you need to test a point in the scene and make sure that the point is in front of both cameras.
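If you are using OpenCV, this selection is already implemented: cv2.recoverPose decomposes E, triangulates the matched points for each of the four [R | t] candidates, and returns the combination that places the points in front of both cameras (pts1, pts2 and K as in the question's pipeline):

import cv2

n_good, R, t, mask = cv2.recoverPose(E, pts1, pts2, K)
# n_good is the number of inlier points that pass the cheirality check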
For problems 1 and 2,
Look for "Euler angles" in wikipedia or any good math site like Wolfram Mathworld. You would find out the different possibilities of Euler angles. I am sure you can figure out why you are getting sign changes in your results based on literature reading.
For problem 3,
It most likely has to do with the accuracy of your individual camera calibration.
For problem 4,
Not sure. How about measuring the distance of a point from the camera with a tape measure and comparing it with the translation norm to get the scale factor?
Possible reasons for bad accuracy:
1) There is a difference between getting reasonable and precise accuracy in camera calibration. See this thread.
2) The accuracy with which you are moving the tripod. How are you ensuring that there is no rotation of the tripod around an axis perpendicular to the surface while changing position?
I did not understand your simulation setup, but I would suggest the test below.
Take images without moving the camera or the object. If you then calculate the relative camera pose, the rotation should be the identity matrix and the translation should be the null vector. Due to numerical inaccuracies and noise, you might see a rotation deviation of a few arc minutes.
I was wondering how I would figure out the actual size of an object using the Kinect depth values.
For example, if the Kinect sees a round object in front of it, the round object takes up 100 pixels in the image, and the depth value the Kinect gives is x, how would I know the actual size of the round object?
I don't need it in units like meters or anything; I am just trying to find a formula for the size of the object that is independent of how far the object is from the Kinect.
I am using OpenCV and the Kinect SDK; if anything there is useful, please let me know.
Thanks in advance.
To find the size in 3d, given a size in 2d, you just do:
3d_rad = 2d_rad * depth
So if the ball appears on the screen as 10 pixels wide and is 1 metre away, it really is 10 "units" wide. Play around a little to find out what units are returned; I'm unsure what they will be.
Suppose you have a ball with a 20-pixel radius on screen and the depth is returned as 30; the real size of the ball is 20*30 = 600 units. Again, I'm unsure exactly which unit; it depends on the camera, but it is a constant, so play around with it. Put a 1-metre ball in front of the camera, far enough away that it appears 100 pixels wide. The reciprocal of that depth reading is then the conversion factor that turns the units you have into centimetres, and it can be used as a constant. For example:
3d_rad_in_cm = conversion * 2d_rad * depth
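A tiny sketch of that calibration-and-conversion idea (the numbers are placeholders; calibrate once with an object of known size, then reuse the constant):

def calibrate_conversion(known_radius_cm, pixel_radius, depth):
    # constant that turns (pixels * depth units) into centimetres
    return known_radius_cm / (pixel_radius * depth)

def real_radius_cm(pixel_radius, depth, conversion):
    return conversion * pixel_radius * depth

# Example: a ball of known 50 cm radius shows a 100 px radius at a reported depth of 2.0.
conversion = calibrate_conversion(50.0, 100.0, 2.0)   # 0.25
print(real_radius_cm(80.0, 3.0, conversion))          # 80 px at depth 3.0 -> 60.0 cm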