Fisheye correction formula, need explanation - image-processing

After extensive search, I have not been able to find a decently explained formula for WebGL fisheye image correction. A Shadertoy at fisheye/antifisheye shows a formula
uv = m + normalize(d) * atan(r * -power * 10.0) * bind / atan(-power * bind * 10.0);
which is literally described as "weird formula". It somehow follows Paul Bourke's work on lens distortion correction here, but I do not see the connection. In my application, the formula boils down to (values are hand-tuned to my lens and webcam):
uv = centerPoint + normalize(d) * atan(r * pi) * 1/3 / atan(pi/3)
Where r is the distance of a pixel from the center of the image, d is the vector from the center to that pixel (so normalize(d) is the unit vector in that direction), and centerPoint is, well, the center of the image. I don't understand how the arctangent can be tied directly to the coordinates; can anyone help me get it? I do get that the arctangent part of the formula computes the pixel's distance from the image center; what I do not understand is how it is computed.
Thanks!

Ok, so after some searching and asking around, I understood what happens in the formula. A diagram and basic formulas are in the image below. Since the xy' position is known (it's the target position on the texture), we calculate xy, the source pixel on the fisheye image. R is the fisheye image radius. We can substitute a desired value of d depending on the maximum view angle we want to achieve (since the tangent function goes to infinity at 90 deg and tan(B) = 1/d, it's nice to take a reasonable value).
After transformations, we get to:
xy = atan(xy'/d) * 2R/pi
which is the theoretically correct formula for the equidistant projection, which we assume the lens performs. The formula I referenced in the original post had something else instead of 2R/pi, and it still worked because of imperfections in the lens: the lens most probably implements some unknown projection function, and the hand-tuned formula worked as an approximation of it.
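As a rough illustration of the formula above, here is a minimal Python sketch that maps a target (corrected) pixel back to its source position on the fisheye image. The function names, the `center` argument and the choice of units are assumptions made for this example only; the radial mapping itself is the xy = atan(xy'/d) * 2R/pi relation from the text, applied along the direction from the center.

```python
import math

def fisheye_source_radius(r_dst, d, R):
    """Radius on the fisheye image for a pixel at radius r_dst in the corrected image.

    Implements r_src = atan(r_dst / d) * 2R / pi (equidistant projection).
    """
    return math.atan(r_dst / d) * 2.0 * R / math.pi

def fisheye_source_xy(x_dst, y_dst, center, d, R):
    """Source position on the fisheye image for the target position (x_dst, y_dst)."""
    cx, cy = center
    dx, dy = x_dst - cx, y_dst - cy
    r_dst = math.hypot(dx, dy)
    if r_dst == 0.0:
        return (cx, cy)  # the center maps to itself
    # Keep the direction from the center, only rescale the radius.
    scale = fisheye_source_radius(r_dst, d, R) / r_dst
    return (cx + dx * scale, cy + dy * scale)
```

With d = 1 and R = 1, a target pixel at radius 1 is read from radius 0.5 on the fisheye image, and very distant target pixels approach (but never exceed) radius R.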
There, I hope it was understandable, in case of any questions, I'll be happy to answer them :)

Related

Determine a "tangential quadrilateral" from 4 points in OpenCv

What I'm trying to do is get a tangential quadrilateral from 4 points using OpenCV.
I tried an approach where I just take the center of the four points and add a circle, but this is not always correct. Furthermore, it's very hard to determine the radius of the circle.
In short: a tangential quadrilateral is a quadrilateral with an inscribed circle that touches all four sides, e.g.:
Source: https://commons.wikimedia.org/wiki/File:Tangentenviereck.svg CC BY-SA 4.0
Is there a way in OpenCv for this?
If you have 4 points A, B, C, D, you definitely already have a quadrilateral (four-sided polygon).
It is not guaranteed that this quadrilateral is tangential; that is true only if the sums of opposite side lengths are equal.
If you really have the vertices of a tangential quadrilateral, find the side lengths a, b, c, d and the diagonals p, q, and get the incircle radius as
r = Sqrt(4*p^2*q^2 - (a^2-b^2+c^2-d^2)^2) / (2*(a+b+c+d))
There are a lot of formulas for the incircle center on the wiki page, but I'd use a trigonometric approach: get the bisector vector of angle A as the sum of the normalized AB and AD vectors, normalize it, multiply it by the length |AM| = r/sin(A/2), and add the resulting vector to A.
Note that OpenCV is a library for image processing, not for geometric calculations.
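The recipe above can be sketched in plain Python (no OpenCV needed). The function name `incircle` and the tolerance are assumptions for illustration. The sketch takes the vertices in order A, B, C, D, checks that the sums of opposite sides match, and places the center on the bisector of angle A at distance r / sin(A/2) from A (the distance from a vertex to the incenter; r / tan(A/2) would instead be the distance from A to the tangent point on a side).

```python
import math

def incircle(A, B, C, D, rel_tol=1e-9):
    """Return (center, radius) of the incircle of the tangential quadrilateral ABCD."""
    a, b = math.dist(A, B), math.dist(B, C)
    c, d = math.dist(C, D), math.dist(D, A)
    perimeter = a + b + c + d
    # A quadrilateral is tangential only if opposite sides have equal sums.
    if abs((a + c) - (b + d)) > rel_tol * perimeter:
        raise ValueError("quadrilateral is not tangential")
    p, q = math.dist(A, C), math.dist(B, D)  # diagonals
    r = math.sqrt(4 * p**2 * q**2 - (a**2 - b**2 + c**2 - d**2) ** 2) / (2 * perimeter)

    # Unit vectors along AB and AD; their sum points along the bisector of angle A.
    u = ((B[0] - A[0]) / a, (B[1] - A[1]) / a)
    v = ((D[0] - A[0]) / d, (D[1] - A[1]) / d)
    bx, by = u[0] + v[0], u[1] + v[1]
    norm = math.hypot(bx, by)
    cos_a = max(-1.0, min(1.0, u[0] * v[0] + u[1] * v[1]))
    half_angle = math.acos(cos_a) / 2.0
    am = r / math.sin(half_angle)  # distance from A to the incenter
    center = (A[0] + bx / norm * am, A[1] + by / norm * am)
    return center, r
```

For the square (0,0), (2,0), (2,2), (0,2) this gives center (1, 1) and radius 1; for a non-square rectangle it raises, because no incircle exists.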

OpenCV: solvePnP detection problems

I've got a problem with precise detection of markers using OpenCV.
I've recorded a video presenting that issue: http://youtu.be/IeSSW4MdyfU
As you can see, the markers I'm detecting are slightly shifted at some camera angles. I've read on the web that this may be a camera calibration problem, so I'll tell you how I'm calibrating the camera, and maybe you'll be able to tell me what I am doing wrong.
At the beginning I'm collecting data from various images and storing calibration corners in the _imagePoints vector like this:
std::vector<cv::Point2f> corners;
_imageSize = cvSize(image->size().width, image->size().height);
bool found = cv::findChessboardCorners(*image, _patternSize, corners);
if (found) {
    cv::Mat *gray_image = new cv::Mat(image->size().height, image->size().width, CV_8UC1);
    cv::cvtColor(*image, *gray_image, CV_RGB2GRAY);
    cv::cornerSubPix(*gray_image, corners, cvSize(11, 11), cvSize(-1, -1), cvTermCriteria(CV_TERMCRIT_EPS+ CV_TERMCRIT_ITER, 30, 0.1));
    cv::drawChessboardCorners(*image, _patternSize, corners, found);
}
_imagePoints->push_back(_corners);
Then, after collecting enough data, I'm calculating the camera matrix and distortion coefficients with this code:
std::vector< std::vector<cv::Point3f> > *objectPoints = new std::vector< std::vector< cv::Point3f> >();
for (unsigned long i = 0; i < _imagePoints->size(); i++) {
    std::vector<cv::Point2f> currentImagePoints = _imagePoints->at(i);
    std::vector<cv::Point3f> currentObjectPoints;
    for (int j = 0; j < currentImagePoints.size(); j++) {
        cv::Point3f newPoint = cv::Point3f(j % _patternSize.width, j / _patternSize.width, 0);
        currentObjectPoints.push_back(newPoint);
    }
    objectPoints->push_back(currentObjectPoints);
}
std::vector<cv::Mat> rvecs, tvecs;
static CGSize size = CGSizeMake(_imageSize.width, _imageSize.height);
cv::Mat cameraMatrix = [_userDefaultsManager cameraMatrixwithCurrentResolution:size]; // previously detected matrix
cv::Mat coeffs = _userDefaultsManager.distCoeffs; // previously detected coeffs
cv::calibrateCamera(*objectPoints, *_imagePoints, _imageSize, cameraMatrix, coeffs, rvecs, tvecs);
The results are what you've seen in the video.
What am I doing wrong? Is that an issue in the code? How many images should I use to perform calibration (right now I'm trying to obtain 20-30 images before the end of calibration)?
Should I use images that contain wrongly detected chessboard corners, like this:
or should I use only properly detected chessboards like these:
I've been experimenting with a circles grid instead of chessboards, but the results were much worse than now.
In case of questions about how I'm detecting the marker: I'm using the solvePnP function:
solvePnP(modelPoints, imagePoints, [_arEngine currentCameraMatrix], _userDefaultsManager.distCoeffs, rvec, tvec);
with modelPoints specified like this:
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, -kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
markerPoints3D.push_back(cv::Point3d(-kMarkerRealSize / 2.0f, kMarkerRealSize / 2.0f, 0));
and imagePoints are the coordinates of the marker corners in the processed image (I'm using a custom algorithm to find them).
In order to properly debug your problem I would need all the code :-)
I assume you are following the approach suggested in the tutorials (calibration and pose) cited by @kobejohn in his comment, so that your code follows these steps:
1) collect various images of the chessboard target
2) find the chessboard corners in the images of point 1)
3) calibrate the camera (with cv::calibrateCamera) and so obtain as a result the intrinsic camera parameters (let's call them intrinsic) and the lens distortion parameters (let's call them distortion)
4) collect an image of your own custom target (the target is seen at 0:57 in your video and is shown in the following figure) and find some relevant points in it (let's call the points you found in the image image_custom_target_vertices and the corresponding 3D points world_custom_target_vertices)
5) estimate the rotation matrix (let's call it R) and the translation vector (let's call it t) of the camera from the image of your own custom target you got in point 4), with a call to cv::solvePnP like this one: cv::solvePnP(world_custom_target_vertices,image_custom_target_vertices,intrinsic,distortion,R,t)
6) given the 8 corners of the cube in 3D (let's call them world_cube_vertices), get the 8 2D image points (let's call them image_cube_vertices) by means of a call to cv::projectPoints like this one: cv::projectPoints(world_cube_vertices,R,t,intrinsic,distortion,image_cube_vertices)
7) draw the cube with your own draw function.
Now, the final result of the draw procedure depends on all the previous computed data and we have to find where the problem lies:
Calibration: as you observed in your answer, in 3) you should discard the images where the corners are not properly detected. You need a threshold for the reprojection error in order to discard "bad" chessboard target images. Quoting from the calibration tutorial:
Re-projection Error
Re-projection error gives a good estimation of just how exact is the
found parameters. This should be as close to zero as possible. Given
the intrinsic, distortion, rotation and translation matrices, we first
transform the object point to image point using cv2.projectPoints().
Then we calculate the absolute norm between what we got with our
transformation and the corner finding algorithm. To find the average
error we calculate the arithmetical mean of the errors calculate for
all the calibration images.
Usually you will find a suitable threshold with some experiments. With this extra step you will get better values for intrinsic and distortion.
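To make the thresholding idea concrete, here is a small pure-Python sketch of a per-view mean reprojection error and a filter on it. It is an illustration under simplifying assumptions, not the tutorial's code: lens distortion is ignored, the pose is given as a 3x3 rotation matrix R and a translation t rather than an rvec, and the names `project`, `mean_reprojection_error` and `keep_good_views` are made up for this example. In a real pipeline the projection would come from cv::projectPoints with the estimated distortion, and the camera would be recalibrated on the views that survive.

```python
import math

def project(point3d, K, R, t):
    """Pinhole projection without distortion: pixel = K * (R * X + t)."""
    X = [sum(R[i][j] * point3d[j] for j in range(3)) + t[i] for i in range(3)]
    u = K[0][0] * X[0] / X[2] + K[0][2]
    v = K[1][1] * X[1] / X[2] + K[1][2]
    return (u, v)

def mean_reprojection_error(object_points, image_points, K, R, t):
    """Mean pixel distance between detected corners and reprojected corners."""
    errors = [math.dist(project(P, K, R, t), p)
              for P, p in zip(object_points, image_points)]
    return sum(errors) / len(errors)

def keep_good_views(views, K, max_error_px):
    """Keep views (object_points, image_points, R, t) whose error is below the threshold."""
    return [view for view in views
            if mean_reprojection_error(view[0], view[1], K, view[2], view[3]) <= max_error_px]
```

The threshold `max_error_px` is the value you would tune experimentally, as described above.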
Finding your own custom target: it does not seem to me that you explain how you find your own custom target in the step I labeled as point 4). Do you get the expected image_custom_target_vertices? Do you discard images where the results are "bad"?
Pose of the camera: I think that in 5) you use the intrinsic found in 3); are you sure nothing has changed in the camera in the meantime? Referring to Callari's Second Rule of Camera Calibration:
Second Rule of Camera Calibration: "Thou shalt not touch the lens
after calibration". In particular, you may not refocus nor change the
f-stop, because both focusing and iris affect the nonlinear lens
distortion and (albeit less so, depending on the lens) the field of
view. Of course, you are completely free to change the exposure time,
as it does not affect the lens geometry at all.
And then there may be some problems in the draw function.
So, I've experimented a lot with my code, and I still haven't fixed the main issue (shifted objects), but I've managed to answer some of the calibration questions I asked.
First of all: in order to obtain good calibration results you have to use images with properly detected grid elements/circle positions! Using all captured images in the calibration process (even those that aren't properly detected) will result in a bad calibration.
I've experimented with various calibration patterns:
Asymmetric circles pattern (CALIB_CB_ASYMMETRIC_GRID): gives much worse results than any other pattern. By worse results I mean that it produces a lot of wrongly detected corners like these:
I've experimented with CALIB_CB_CLUSTERING and it hasn't helped much; in some cases (different lighting) it got better, but not by much.
Symmetric circles pattern (CALIB_CB_SYMMETRIC_GRID): better results than the asymmetric grid, but still much worse than the standard grid (chessboard). It often produces errors like these:
Chessboard (found using the findChessboardCorners function): this method produces the best results. It doesn't produce misaligned corners very often, and almost every calibration produces results similar to the best results from the symmetric circles grid.
For every calibration I've been using 20-30 images taken from different angles. I've even tried with 100+ images, but it didn't produce a noticeable change in the calibration results compared with a smaller number of images. It's worth noting that a larger number of test images increases the time needed to compute the camera parameters non-linearly (100 test images at 480x360 resolution take 25 minutes to compute on an iPad 4, compared with 4 minutes for ~50 images).
I've also experimented with the solvePnP parameters, but that also hasn't given me any acceptable results: I've tried all 3 detection methods (ITERATIVE, EPNP and P3P), but I haven't seen any noticeable change.
Also, I've tried with useExtrinsicGuess set to true, using rvec and tvec from the previous detection, but this resulted in the complete disappearance of the detected cube.
I've run out of ideas. What else could be causing these shifting problems?
For those still interested:
This is an old question, but I think your problem is not a bad calibration.
I developed an AR app for iOS using OpenCV and SceneKit, and I had the same issue.
I think your problem is the wrong render position of the cube:
OpenCV's solvePnP returns the X, Y, Z coordinates of the marker center, but you want to render the cube on top of the marker, at a specific distance along the Z axis of the marker, exactly one half of the cube's side length. So you need to increase the Z coordinate of the marker translation vector by this distance.
In fact, when you look at your cube from the top, the cube is rendered properly.
I made an image to explain the problem, but my reputation prevents me from posting it.
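One way to read this suggestion is sketched below in plain Python: take the pose from solvePnP (rvec, tvec), build the rotation matrix with the Rodrigues formula, and move the cube center half a side length along the marker's own Z axis, expressed in camera coordinates. The helper names and the `sign` parameter are assumptions for illustration; whether the cube should go along +Z or -Z depends on how the marker's model points and the renderer's axes are defined.

```python
import math

def rodrigues(rvec):
    """Rotation matrix for an axis-angle vector (same convention as cv::Rodrigues)."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def cube_center_in_camera(rvec, tvec, cube_side, sign=1.0):
    """Camera-space center of a cube sitting on the marker.

    The third column of R is the marker's Z axis seen from the camera;
    the center is the marker origin shifted by half a side along it.
    """
    R = rodrigues(rvec)
    half = sign * cube_side / 2.0
    return tuple(tvec[i] + R[i][2] * half for i in range(3))
```

When the marker faces the camera squarely (rvec close to zero) this reduces to adding half the side length to the Z component of tvec, which matches the description above.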

Open CV - Several Methods for SfM

I got a task:
We have a working system in which a camera travels in a half circle around a human head. We know the camera matrix and the rotation/translation of every frame. (Also distortion and more, but I want to work without these parameters first.)
My task is this: I have only the camera matrix, which is constant over this movement, and the images (more than 100). Now I have to get the translation and rotation frame by frame and compare them with the real-world rotation and translation (which I have from the system, but only for comparison; I have to prove it!).
First steps I did so far:
use the robustMatcher from the OpenCV Cookbook - works fine - 40-70 matches per frame - visually it looks very good!
I get the fundamental matrix with getFundamental(). I use the robust points from robustMatcher and RANSAC.
Once I have F, I can get the essential matrix E with my camera matrix K like this:
cv::Mat E = K.t() * F * K; //Found at the Bible HZ Chapter 9.12
Now we need to extract R and t out of E with SVD. By the way camera1 position is just zero because we have to start somewhere.
cv::SVD svd(E);
cv::Matx33d W(0,-1,0, //HZ 9.13
1,0,0,
0,0,1);
cv::Matx33d Wt(0,1,0, // W^T
-1,0,0,
0,0,1);
cv::Mat R1 = svd.u * cv::Mat(W) * svd.vt; //HZ 9.19
cv::Mat R2 = svd.u * cv::Mat(Wt) * svd.vt; //HZ 9.19
//R1 or R2???
R = R1; //R2
//t=+u3 or t=-u3?
t = svd.u.col(2); //=u3
This is my current status!
My plans are:
triangulate all points to get 3D points
Join frame i with frame i++
Visualize my 3D points somehow!
Now my Questions are:
Is this robust matcher dated? Is there another method?
Is it wrong to use these points as described in my second step? Must they be corrected for distortion or something?
Which R and t do I extract here? Is it the rotation and translation between camera1 and camera2 from the point of view of camera1?
When I read the bible, papers or other sources, I find that there are 4 possibilities for R and t!
P′ = [UWV^T | +u3] or [UWV^T | −u3] or [UW^TV^T | +u3] or [UW^TV^T | −u3]
P′ is the projection matrix of the second image.
That means t could be - or + and R could be totally different?!
I found out that I should triangulate one point into 3D and check whether this point is in front of both cameras; then I have found the correct matrix!
I found some of this code on the internet, and its author just uses the following with no further calculation:
cv::Mat R1 = svd.u * cv::Mat(W) * svd.vt
and
t = svd.u.col(2); //=u3
Why is this correct? If it isn't, how would I do this triangulation in OpenCV?
I compared this translation to the translation which is given to me. (First I had to express the translation and rotation relative to camera1, but I have that now!) But it's not the same. The values from my program keep jumping from plus to minus, while they should be more constant because the camera is moving in a constant circle.
I am sure that some axes may be switched. I know that the translation only ranges from -1 to 1, but I thought I could extract a scale factor between my results and my comparison values, and then they should be similar.
Has somebody done something like this before?
Many people do camera calibration using a chessboard, but I can't use this method to get the extrinsic parameters.
I know that VisualSFM can do this somehow. (On YouTube there is a video where someone walks around a tree and gets a reconstruction of the tree from these pictures using VisualSFM.)
This is pretty much the same as what I have to do.
Last question:
Does somebody know an easy way to visualize my 3D points? I prefer MeshLab. Any experience with that?
Many people do camera calibration using a chessboard, but I can't use this method to get the extrinsic parameters.
A chess board or checker board is used to find the internal/intrinsic matrix/parameters, not the extrinsic parameters. You're saying you have got the internal matrix already, I suppose that's what you meant by
We know the camera matrix and ...
Those videos you have seen on youtube have done the same, the camera is already calibrated, that is the internal matrix is known.
Is this robust matcher dated? Is there another method?
I don't have that book, so I can't see the code and answer this.
Is it wrong to use these points as described in my second step? Must they be corrected for distortion or something?
You need to cancel the radial distortion first, see undistortPoints.
Which R and t do I extract here? Is it the rotation and translation between camera1 and camera2 from the point of view of camera1?
R is the orientation of the second camera in the first camera's coordinate system, and T is the position of the second camera in that coordinate system. These have several uses.
When I read the bible, papers or other sources, I find that there are 4 possibilities for ....
Read the relevant section of the bible; this is very well explained there. Triangulation is the naive method; a better approach is explained there.
Does somebody know an easy way to visualize my 3D Points?
To see them in Meshlab a very easy way is to save the coordinate of the 3D points in a PLY file, this is an extremely simple format and supported by Meshlab and almost all other 3D model viewers.
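As a minimal sketch of that suggestion, the following Python function serializes a list of (x, y, z) tuples as an ASCII PLY point cloud. The function name `points_to_ply` and the output file name in the usage note are assumptions for illustration; the header layout is the standard ASCII PLY vertex-only header.

```python
def points_to_ply(points):
    """Return the text of an ASCII PLY file containing only vertices."""
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    # One vertex per line, coordinates separated by spaces.
    lines += [f"{x} {y} {z}" for x, y, z in points]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file such as cloud.ply (for example with `open("cloud.ply", "w").write(points_to_ply(pts))`) should give a file that MeshLab can import as a point cloud.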
In the article "An Efficient Solution to the Five-Point Relative Pose Problem", Nistér explains a very good method to determine which of the four configurations is the correct one (talking about R and T).
I've tried the robust matcher and I think it is quite good. The problem with this matcher is that it is really slow because it uses SURF; maybe you should try other detectors and extractors to improve the speed. I also believe that the OpenCV function that calculates the fundamental matrix does not need the RANSAC parameter, because the ratio and symmetry tests do a great job of removing the outliers; you should try the 8-point parameter.
OpenCV has the function triangulatePoints; it only needs two projection matrices and the corresponding points in the first and the second image. Check the calib3d module.
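The "point in front of both cameras" test mentioned in the question can be sketched in plain Python as follows. This is the simple triangulation-based check, not Nistér's method, and it rests on stated assumptions: camera 1 is [I | 0], camera 2 maps points as X2 = R * X1 + t, and the image points are normalized homogeneous coordinates (u, v, 1), that is, undistorted pixels multiplied by the inverse of K. The names `depths` and `pick_pose` are made up for this example.

```python
def _matvec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def depths(R, t, x1, x2):
    """Least-squares depths (z1, z2) that solve z2 * x2 = z1 * R * x1 + t."""
    a = _matvec(R, x1)
    b = [-c for c in x2]
    rhs = [-c for c in t]
    aa, ab, bb = _dot(a, a), _dot(a, b), _dot(b, b)
    ar, br = _dot(a, rhs), _dot(b, rhs)
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    # Cramer's rule on the 2x2 normal equations.
    return (ar * bb - ab * br) / det, (aa * br - ab * ar) / det

def pick_pose(candidates, matches):
    """Return the (R, t) candidate with the most matches in front of both cameras."""
    def score(pose):
        R, t = pose
        count = 0
        for x1, x2 in matches:
            try:
                z1, z2 = depths(R, t, x1, x2)
            except ValueError:
                continue  # degenerate rays give no vote
            if z1 > 0 and z2 > 0:
                count += 1
        return count
    return max(candidates, key=score)
```

The four candidates would be (R1, +u3), (R1, -u3), (R2, +u3) and (R2, -u3) from the SVD; voting over many matches rather than a single point makes the choice less sensitive to outliers.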

opencv update homography matrix to fit on an image double the size

I'm doing video stabilization using optical flow. To make calcOpticalFlowPyrLK work faster I'm downscaling the original image 2x and running the function on that.
How can I modify the homography matrix (retrieved via findHomography) so that I can warpPerspective the original, larger image?
This is a little late and the answer you have works fine but I have one thing to add. I don't like taking functions like getPerspectiveTransform for granted. In this case it is easy to just make the matrix yourself. Image reductions that are powers of 2 are easy. Suppose you have a point and you want to move it to an image with twice the resolution.
float newx = (oldx+.5)*2 - .5;
float newy = (oldy+.5)*2 - .5;
conversely, to go to an image of half the resolution...
float newx = (oldx+.5)/2 - .5;
float newy = (oldy+.5)/2 - .5;
Draw yourself a diagram if you need to and convince yourself it works, remember 0 indexing. Instead of thinking about making your transformation work on other resolutions, think about moving every point to the resolution of your transform, then using your transform, then moving it back. Fortunately you can do all of this in 1 matrix, we just need to build that matrix! First build a matrix for each of the three steps
//move point to an image of half resolution, note it is equivalent to the above equation
project_down=(.5,0,-.25,
0,.5,-.25,
0, 0, 1)
//move point to an image of twice resolution, these are inverses of one another
project_up=(2,0,.5,
0,2,.5,
0, 0,1)
To make your final transformation just combine them
final_transform = [project_up][your_homography][project_down];
The nice thing is you only have to do this once for any given homography. This should work the same as getPerspectiveTransform (and probably run faster). Hopefully understanding this will help you deal with other questions you may run into regarding image resolution changes.
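The three-matrix construction above can be checked with a few lines of plain Python. The helper names (`matmul3`, `scale_homography`, `apply_homography`) and the generalization to an arbitrary scale factor are assumptions for illustration; for a factor of 2 the two projection matrices are exactly the project_up and project_down matrices written out above.

```python
def matmul3(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def scale_homography(H_small, factor=2.0):
    """Lift a homography estimated on a downscaled image to the full-size image."""
    s = float(factor)
    # new = (old + 0.5) * s - 0.5  ->  s * old + (s - 1) / 2
    project_up = [[s, 0.0, (s - 1.0) / 2.0], [0.0, s, (s - 1.0) / 2.0], [0.0, 0.0, 1.0]]
    # new = (old + 0.5) / s - 0.5  ->  old / s + (1 / s - 1) / 2
    project_down = [[1.0 / s, 0.0, (1.0 / s - 1.0) / 2.0],
                    [0.0, 1.0 / s, (1.0 / s - 1.0) / 2.0],
                    [0.0, 0.0, 1.0]]
    return matmul3(matmul3(project_up, H_small), project_down)

def apply_homography(H, x, y):
    """Map a pixel through H, including the homogeneous divide."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

As a sanity check, a pure translation by (3, -1) pixels on the half-size image becomes a translation by (6, -2) pixels on the full-size image.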
Let B be the transformation you have computed. You can multiply B by another homography, A, to get AB = C, where C is a homography that does both transformations; this is equivalent to applying B first and then A. To find A you can use getPerspectiveTransform.
Edit: by AB I meant matrix multiplication, not element-wise multiplication.
Edit 2: to get A you pass the four corners of the two images in the same order to getPerspectiveTransform such that the corners of the downsampled image are the source points and the corners of the original image are the destination points.

XNA 4.0 Camera Question

I'm having trouble understanding how the camera works in my test application. I've been able to piece together a working camera - now I am trying to make sure I understand how it all works. My camera is encapsulated in its own class. Here is the update method that gets called from my Game.Update() method:
public void Update(float dt)
{
    Yaw += (200 - Game.MouseState.X) * dt * .12f;
    Pitch += (200 - Game.MouseState.Y) * dt * .12f;
    Mouse.SetPosition(200, 200);
    _worldMatrix = Matrix.CreateFromAxisAngle(Vector3.Right, Pitch) * Matrix.CreateFromAxisAngle(Vector3.Up, Yaw);
    float distance = _speed * dt;
    if (_game.KeyboardState.IsKeyDown(Keys.E))
        MoveForward(distance);
    if (_game.KeyboardState.IsKeyDown(Keys.D))
        MoveForward(-distance);
    if (_game.KeyboardState.IsKeyDown(Keys.S))
        MoveRight(-distance);
    if (_game.KeyboardState.IsKeyDown(Keys.F))
        MoveRight(distance);
    if (_game.KeyboardState.IsKeyDown(Keys.A))
        MoveUp(distance);
    if (_game.KeyboardState.IsKeyDown(Keys.Z))
        MoveUp(-distance);
    _worldMatrix *= Matrix.CreateTranslation(_position);
    _viewMatrix = Matrix.Invert(_worldMatrix); // What's going on here???
}
First of all, I understand everything in this method other than the very last part where the matrices are being manipulated. I think the terminology is getting in my way as well. For example, my _worldMatrix is really a Rotation Matrix. What really baffles me is the part where the _viewMatrix is calculated by inverting the _worldMatrix. I just don't understand what this is all about.
In prior testing, I always used Matrix.CreateLookAt() to create a view matrix, so I'm a bit confused. I'm hoping someone can explain in simple terms what is going on.
Thanks,
-Scott
One operation the view matrix does for the graphics pipeline is that it converts a 3d point from world space (the x, y, z we all know & love) into view (or camera) space, a space where the camera is considered to be the center of the world (0,0,0) and all points/objects are relative to it. So while a point may be at 1,1,1 relative to the world, what are its coordinates relative to the camera location? Well, as it turns out, to find out, you can transform that point by the inverse of a matrix representing the camera's world space position/rotation.
It kinda makes sense if you think about it... let's say the camera position is 2,2,2. An arbitrary point is at 3,3,3. We know that the point is 1,1,1 away from the camera, right? So what transformation would you apply to the point 3,3,3 in order for it to become 1,1,1 (its location relative to the camera)? You would transform 3,3,3 by -2,-2,-2 to result in 1,1,1. -2,-2,-2 is also the camera's inverted position. That example was for translation because it is relatively easy to grok, but basically the same happens for rotation. But don't expect to be able to simply negate all basis vectors to invert a matrix... there is a little more going on with that for rotation.
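The same idea can be written out in a few lines of plain Python. This is only a sketch under stated assumptions: it uses column vectors and a separate (R, t) pair instead of XNA's 4x4 row-vector matrices, and the helper names are made up for this example. For a rigid camera pose (rotation R, position t), the inverse is rotation R^T with translation -R^T * t, which is the "little more going on" for rotation.

```python
import math

def camera_world(yaw, position):
    """Camera pose as (R, t): rotation about the Y (up) axis, then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return R, list(position)

def invert_rigid(R, t):
    """Inverse of a rigid transform: rotation R^T, translation -R^T * t."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return Rt, t_inv

def transform(R, t, p):
    """Apply p -> R * p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
```

With the camera at (2, 2, 2) and no rotation, the view transform maps the world point (3, 3, 3) to (1, 1, 1), matching the example above; with a rotated camera, a point placed in camera space and moved to world space comes back unchanged after the view transform.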
The Matrix.CreateLookAt() method automatically returns the inverted matrix so you don't really notice it happening unless you reflect its code.
Taking that one step further, the Projection matrix then takes that point in view space and projects it onto a flat surface and that point that started out in 3d space is now in 2d space.
