I am attempting to map a fisheye image to a 360 degree view using a sky sphere in Unity. The scene is inside the sphere. I am very close but I am seeing some slight distortion. I am calculating UV coordinates as follows:
Vector3 v = currentVertice; // unit vector pointing at the current sphere vertex (components in -1..1)
float r = Mathf.Atan2(Mathf.Sqrt(v.x * v.x + v.y * v.y), v.z) / (Mathf.PI * 2.0f); // angle from the +z axis, scaled into a radius in UV space
float phi = Mathf.Atan2(v.y, v.x); // azimuth around the +z axis
textureCoordinates.x = (r * Mathf.Cos(phi)) + 0.5f;
textureCoordinates.y = (r * Mathf.Sin(phi)) + 0.5f;
Here is the distortion and triangles:
The rest of the entire sphere looks great, it's just at this one spot that I get the distortion.
Here is the source fish eye image:
And here is the same sphere with a UV test texture applied, showing the same distortion area. The full UV test texture is shown on the right; it is actually square, but it has been stretched into a rectangle for the purposes of the screenshot.
The distortion above uses sphere mapping rather than fisheye mapping. Here is the UV test texture using fisheye mapping:
Math isn't my strong point. Am I doing anything wrong here, or is this kind of mapping simply not 100% possible?
The spot you are seeing is the case where r gets very close to 1. As you can see in the source image, this is the border area between the very distorted image data and the black.
This area is very distorted; however, that's not the main problem. Looking at the result, you can see that there are problems with the UV orientation.
I've added a few lines to your source image to demonstrate what I mean. Where r is small (yellow lines), the UV coordinates can be interpolated between the corners of your quad (assuming quads instead of tris). However, where r is big (red corners), interpolating the UV coordinates makes them travel through areas of your source image whose r is much smaller than 1 (red lines), causing distortions in UV space. Actually, those red lines should not be straight; they should travel along the border of your source image data.
You can improve this by having a higher polycount in the area of your skysphere where r gets close to 1, but it will never be perfect as long as your UVs are interpolated in a linear way.
I also found another problem. If you look closely at the spot, you'll find that the complete source image is present there in miniature. This is because your UV coordinates wrap around at that point. As rendering passes around the viewer, the UV coordinates travel from 0 towards 1. At the spot they are at 1, but the neighboring vertex is at 0.001 or so, causing the whole source image to be rendered in between. To fix that, you'll need two separate vertices at the seam of your skysphere: one where the surface of the sphere starts, and one where it ends. In object space they are identical, but in UV space one is at 0 and the other at 1.
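A generic sketch of that duplicated-seam idea (plain C++ for illustration; the same indexing applies when filling Unity's Mesh.vertices and Mesh.uv arrays). It uses a simple lat/long UV layout to keep the example short, but with your fisheye UVs the principle is identical: the seam column exists twice, with the same positions but different UVs.

```cpp
#include <cmath>
#include <vector>

struct Vertex { float x, y, z, u, v; };

std::vector<Vertex> buildSphereWithSeam(int lonSegments, int latSegments)
{
    std::vector<Vertex> verts;
    const float PI = 3.14159265f;

    for (int lat = 0; lat <= latSegments; ++lat)
    {
        float theta = PI * lat / latSegments;             // 0..pi from pole to pole
        for (int lon = 0; lon <= lonSegments; ++lon)      // note <=: the seam column is emitted twice
        {
            float phi = 2.0f * PI * lon / lonSegments;    // 0..2*pi around the sphere
            Vertex vert;
            vert.x = std::sin(theta) * std::cos(phi);
            vert.y = std::cos(theta);
            vert.z = std::sin(theta) * std::sin(phi);
            vert.u = (float)lon / lonSegments;            // 0 on the first copy, 1 on the duplicate
            vert.v = (float)lat / latSegments;
            verts.push_back(vert);
        }
    }
    return verts;   // triangles then index columns lon and lon+1, never wrapping back to 0
}
```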
I am currently trying to map textures onto two different triangles using image labels (because I'm using right-angle wedges, I need two of them to make a scalene triangle). The problem is that I can only set position, size, and rotation, so I need to figure out how to use this information to correctly map the texture onto the triangle.
The position is relative to the top-left corner and the size of the triangle (the <0,0> corner is at the top left and the <1,1> corner is at the bottom right), the size is also relative to the triangle's size (<1,1> is the same size as the triangle and <0,0> is infinitely tiny), and the rotation is about the center.
I have the UV coordinates (given as 0-1) and the face vertices, all from an .obj file. The triangles in 3D are made up of 2 wedges, which are split at a right angle from the longest side and the opposite corner.
I don't quite understand this, but it may help to change the canvas properties on the Surface GUI.
Given an object's 3D mesh file and an image that contains the object, what are some techniques to get the orientation/pose parameters of the 3d object in the image?
I tried searching for some techniques, but most seem to require texture information of the object or at least some additional information. Is there a way to get the pose parameters using just an image and a 3d mesh file (wavefront .obj)?
Here's an example of a 2D image that can be expected.
FOV of camera
The camera's field of view is the absolute minimum you need to know to even start with this (how can you decide where to place the object when you have no idea how it would affect the scene?). Basically, you need the transform matrix that maps from the world GCS (global coordinate system) to camera/screen space and back. If you have no clue what I am writing about, then perhaps you should not try any of this before you learn the math.
For an unknown camera you can do some calibration based on markers or etalons (objects of known size and shape) in the view. But it is much better to use the real camera values (FOV angles in the x and y directions, focal length, etc.).
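If you go the calibration route, OpenCV's chessboard calibration is a common way to recover those values. A minimal sketch, assuming several photos of a chessboard of known square size (the file names, board dimensions and photo count below are placeholders):

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>
#include <vector>

int main()
{
    cv::Size board(9, 6);                     // inner corners of the chessboard
    float square = 0.025f;                    // square size in metres

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;

    // 3D layout of the board corners (z = 0 plane)
    std::vector<cv::Point3f> corners3d;
    for (int y = 0; y < board.height; ++y)
        for (int x = 0; x < board.width; ++x)
            corners3d.push_back(cv::Point3f(x * square, y * square, 0));

    cv::Size imageSize;
    for (int i = 0; i < 10; ++i)              // ten calibration photos (placeholder names)
    {
        cv::Mat img = cv::imread("calib_" + std::to_string(i) + ".jpg", cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners2d;
        if (cv::findChessboardCorners(img, board, corners2d))
        {
            imagePoints.push_back(corners2d);
            objectPoints.push_back(corners3d);
        }
    }

    cv::Mat K, dist;
    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, imagePoints, imageSize, K, dist, rvecs, tvecs);
    // K now holds the focal lengths and principal point; dist holds the lens distortion.
    return 0;
}
```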
The goal is to create a function that maps world GCS (x,y,z) into screen LCS (x,y).
For more info read:
transform matrix anatomy
3D graphic pipeline
Perspective projection
Silhouette matching
To compare the similarity of the rendered and real images, you need some kind of measure. As you need to match geometry, I think silhouette matching is the way to go (ignoring textures, shadows and such).
So first you need to obtain the silhouettes. Use image segmentation for that and create an ROI mask of your object. For the rendered image this is easy, as you can render the object in a single color without any lighting directly into the ROI mask.
Then you need to construct a function that computes the difference between the silhouettes. You can use any kind of measure, but I think you should start with the non-overlapping-area pixel count (it is easy to compute).
Basically you count pixels that are present only in one ROI (region of interest) mask.
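A minimal sketch of that measure with OpenCV, assuming both silhouettes are already binary masks of the same size:

```cpp
#include <opencv2/core.hpp>

// Count pixels that are set in exactly one of the two binary ROI masks
// (the "non-overlapping areas pixel count" described above).
int silhouetteDifference(const cv::Mat& maskA, const cv::Mat& maskB)
{
    cv::Mat onlyOne;
    cv::bitwise_xor(maskA, maskB, onlyOne);   // set where the two masks disagree
    return cv::countNonZero(onlyOne);
}
```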
estimate position
As you have the mesh, you know its size, so place it in the GCS so that the rendered image has a bounding box very close to that of the real image. If you do not have the FOV parameters, then you need to rescale and translate each rendered image so it matches the image's bounding box (and as a result you of course obtain only the orientation of the object, not its position). Cameras have perspective, so the farther from the camera you place your object, the smaller it will appear.
fit orientation
Render a few fixed orientations covering all orientations with some step (say 8^3 orientations). For each, compute the silhouette difference and choose the orientation with the smallest difference.
Then fit the orientation angles around it to minimize the difference. If you do not know how optimization or fitting works, see this:
How approximation search works
Beware: too few initial orientations can cause false positives or missed solutions, while too many will be slow.
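A minimal sketch of this coarse search. renderSilhouette() is a hypothetical helper that renders the mesh with the given Euler angles into a binary mask; silhouetteDifference() is the XOR pixel count sketched earlier.

```cpp
#include <opencv2/core.hpp>
#include <climits>

cv::Mat renderSilhouette(const cv::Vec3f& eulerAngles);        // hypothetical: renders the mesh as a binary mask
int silhouetteDifference(const cv::Mat& a, const cv::Mat& b);  // the XOR pixel count from the sketch above

cv::Vec3f coarseOrientationSearch(const cv::Mat& realMask, int steps = 8)
{
    const float twoPi = 6.283185307f;
    cv::Vec3f best(0, 0, 0);
    int bestDiff = INT_MAX;

    // try steps^3 fixed orientations and keep the one with the smallest silhouette difference
    for (int a = 0; a < steps; ++a)
    for (int b = 0; b < steps; ++b)
    for (int c = 0; c < steps; ++c)
    {
        cv::Vec3f ang(twoPi * a / steps, twoPi * b / steps, twoPi * c / steps);
        cv::Mat rendered = renderSilhouette(ang);
        int diff = silhouetteDifference(realMask, rendered);
        if (diff < bestDiff) { bestDiff = diff; best = ang; }
    }
    return best;   // refine around this result with smaller angle steps afterwards
}
```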
Now, that was the basics in a nutshell. As your mesh is not very simple, you may need to tweak this, for example by using contours instead of silhouettes and the distance between contours instead of the non-overlapping pixel count (which is much harder to compute) ... You should start with simpler meshes like a die or a coin, and once you grasp all of this, move on to more complex shapes ...
[Edit1] algebraic approach
If you know some points in the image that correspond to known 3D points in your mesh, then along with the FOV of the camera used you can compute the transform matrix placing your object ...
if the transform matrix is M (OpenGL style):
M = xx,yx,zx,ox
xy,yy,zy,oy
xz,yz,zz,oz
0, 0, 0, 1
Then any point from your mesh (x,y,z) is transformed to global world (x',y',z') like this:
(x',y',z') = M * (x,y,z)
The pixel position (x'',y'') is then obtained by the camera's FOV perspective projection (dividing by the depth z'+focus) like this:
x'' = FOVx*focus*x'/(z'+focus) + xs2;
y'' = FOVy*focus*y'/(z'+focus) + ys2;
where camera is at (0,0,-focus), projection plane is at z=0 and viewing direction is +z so for any focal length focus and screen resolution (xs,ys):
xs2=xs*0.5;
ys2=ys*0.5;
FOVx=xs2/focus;
FOVy=ys2/focus;
When you put all this together you obtain this:
xi'' = FOVx*focus*( xx*xi + yx*yi + zx*zi + ox ) / ( xz*xi + yz*yi + zz*zi + oz + focus ) + xs2
yi'' = FOVy*focus*( xy*xi + yy*yi + zy*zi + oy ) / ( xz*xi + yz*yi + zz*zi + oz + focus ) + ys2
where (xi,yi,zi) is the i-th known point's 3D position in mesh local coordinates and (xi'',yi'') is the corresponding known 2D pixel position. So the unknowns are the M values:
{ xx,xy,xz, yx,yy,yz, zx,zy,zz, ox,oy,oz }
So we get 2 equations per known point and 12 unknowns in total, which means you need to know 6 points. Multiplying each equation by its depth term ( xz*xi + yz*yi + zz*zi + oz + focus ) keeps it linear in the unknowns, so you can solve the resulting system of equations and construct your matrix M.
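A hedged sketch of that linear solve with OpenCV. It assumes the projection model above in its cross-multiplied form (two linear equations per correspondence); obj and img hold the 6 known correspondences, and cv::solve does the work.

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Solve for the 12 entries of M from 6 correspondences (Xi,Yi,Zi) <-> (xi'',yi'').
// Unknown vector m = [xx,yx,zx,ox, xy,yy,zy,oy, xz,yz,zz,oz].
cv::Mat solvePoseLinear(const std::vector<cv::Point3f>& obj,   // 6 mesh points (local coordinates)
                        const std::vector<cv::Point2f>& img,   // their known pixel positions
                        float focus, float xs, float ys)
{
    const float xs2 = 0.5f * xs, ys2 = 0.5f * ys;
    const float FOVx = xs2 / focus, FOVy = ys2 / focus;

    cv::Mat A(12, 12, CV_32F, cv::Scalar(0));
    cv::Mat b(12, 1, CV_32F);

    for (int i = 0; i < 6; ++i)
    {
        const float X = obj[i].x, Y = obj[i].y, Z = obj[i].z;
        const float u = img[i].x - xs2;          // pixel coordinates relative to the screen centre
        const float v = img[i].y - ys2;

        // x-equation: FOVx*focus*(xx*X+yx*Y+zx*Z+ox) - u*(xz*X+yz*Y+zz*Z+oz) = u*focus
        float* rx = A.ptr<float>(2 * i);
        rx[0] = FOVx * focus * X; rx[1] = FOVx * focus * Y;
        rx[2] = FOVx * focus * Z; rx[3] = FOVx * focus;
        rx[8] = -u * X; rx[9] = -u * Y; rx[10] = -u * Z; rx[11] = -u;
        b.at<float>(2 * i) = u * focus;

        // y-equation: FOVy*focus*(xy*X+yy*Y+zy*Z+oy) - v*(xz*X+yz*Y+zz*Z+oz) = v*focus
        float* ry = A.ptr<float>(2 * i + 1);
        ry[4] = FOVy * focus * X; ry[5] = FOVy * focus * Y;
        ry[6] = FOVy * focus * Z; ry[7] = FOVy * focus;
        ry[8] = -v * X; ry[9] = -v * Y; ry[10] = -v * Z; ry[11] = -v;
        b.at<float>(2 * i + 1) = v * focus;
    }

    cv::Mat m;
    cv::solve(A, b, m, cv::DECOMP_SVD);          // least-squares, robust to noisy correspondences
    return m;                                     // reshape into M in the order listed above
}
```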
Also you can exploit that M is a uniform orthogonal/orthonormal matrix so vectors
X = (xx,xy,xz)
Y = (yx,yy,yz)
Z = (zx,zy,zz)
Are perpendicular to each other so:
(X.Y) = (Y.Z) = (Z.X) = 0.0
Introducing these into your system can lower the number of points needed. You can also exploit the cross product: if you know 2 of the vectors, the third can be computed as
Z = (X x Y)*scale
So instead of 3 variables you need just a single scale (which is 1 for an orthonormal matrix). If I assume an orthonormal matrix, then:
|X| = |Y| = |Z| = 1
so we get 6 additional equations (3 from the dot products and 3 from the cross product) without any additional unknowns, so 3 points are indeed enough.
I think the easiest way is to explain the problem with an image:
I have two cubes (the same size) lying on a table. One of their sides is marked with green (for easy tracking). I want to calculate the relative position (x,y) of the left cube with respect to the right cube (the red line in the picture), in units of the cube size.
Is it even possible? I know the problem would be simple if the two green sides shared a common plane - like the top side of the cube - but I can't use that for tracking. I would just calculate the homography for one square and multiply by the other cube's corner.
Should I 'rotate' the homography matrix by multiplying it with a 90-degree rotation matrix to get a 'ground' homography? I plan to do the processing on a smartphone, so maybe the gyroscope and the camera's intrinsic parameters could be of some value.
This is possible.
Let's assume (or state) that the table is the z=0 plane and that your first box sits at the origin of this plane. This means that the green corners of the left box have the (table) coordinates (0,0,0), (1,0,0), (0,0,1) and (1,0,1). (Your box has size 1.)
You also have the pixel coordinates of these points. If you give these 2D and 3D values (as well as the intrinsics and distortion of the camera) to cv::solvePnP, you get the relative pose of the camera to your box (and the plane).
In the next step, you have to intersect the table plane with the ray that goes from your camera's center through the lower-right corner pixel of the second green box. This intersection will look like (x,y,0), and [x-1, y] will be the translation between the right corners of your boxes.
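A minimal sketch of that approach with OpenCV. The camera matrix, distortion and pixel values below are placeholders for your own measurements; the plane and corner layout follow the answer above.

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <cstdio>
#include <vector>

int main()
{
    // 3D corners of the first box's green face; the table is the z=0 plane
    std::vector<cv::Point3f> objectPoints = { {0,0,0}, {1,0,0}, {0,0,1}, {1,0,1} };

    // their pixel positions in the photo (placeholder values)
    std::vector<cv::Point2f> imagePoints = { {150,300}, {330,235}, {145,180}, {325,120} };

    // camera intrinsics (placeholder values; ideally from a calibration)
    cv::Mat K = (cv::Mat_<double>(3,3) << 800,0,320, 0,800,240, 0,0,1);
    cv::Mat dist;                                   // empty: assume an already undistorted image

    // pose of the table/box coordinate system relative to the camera
    cv::Mat rvec, tvec;
    cv::solvePnP(objectPoints, imagePoints, K, dist, rvec, tvec);

    cv::Mat R;
    cv::Rodrigues(rvec, R);
    cv::Mat C = -R.t() * tvec;                      // camera centre in table coordinates

    // ray through the lower-right corner pixel of the second green face (placeholder pixel)
    cv::Point2d p(741, 200);
    cv::Mat pixel = (cv::Mat_<double>(3,1) << p.x, p.y, 1.0);
    cv::Mat dirCam = K.inv() * pixel;
    cv::Mat dirWorld = R.t() * dirCam;              // ray direction in table coordinates

    // intersect the ray with the table plane z = 0:  C.z + s*dir.z == 0
    double s = -C.at<double>(2) / dirWorld.at<double>(2);
    double x = C.at<double>(0) + s * dirWorld.at<double>(0);
    double y = C.at<double>(1) + s * dirWorld.at<double>(1);

    std::printf("corner hits the table at (%.3f, %.3f); offset to the first box: (%.3f, %.3f)\n",
                x, y, x - 1.0, y);
    return 0;
}
```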
If you have all the information (camera intrinsics) you can do it the way FooBar answered.
But you can use the information that the points lie on a plane even more directly with a homography (no need to calculate rays etc):
Compute the homography between the image plane and the ground plane.
Unfortunately you need 4 point correspondences, but there are only 3 cube-points visible in the image, touching the ground plane.
Instead you can use the top-plane of the cubes, where the same distance can be measured.
first the code:
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    // calibrate plane distance for boxes
    cv::Mat input = cv::imread("../inputData/BoxPlane.jpg");

    // if we had 4 known points on the ground plane, we could use the ground plane,
    // but here we use the top plane instead.
    // points on the real-world plane at height = 1: measured on the "top plane" of the cube,
    // not on the ground plane
    std::vector<cv::Point2f> objectPoints;
    objectPoints.push_back(cv::Point2f(0,0)); // top front
    objectPoints.push_back(cv::Point2f(1,0)); // top right
    objectPoints.push_back(cv::Point2f(0,1)); // top left
    objectPoints.push_back(cv::Point2f(1,1)); // top back

    // image points:
    std::vector<cv::Point2f> imagePoints;
    imagePoints.push_back(cv::Point2f(141,302));// top front
    imagePoints.push_back(cv::Point2f(334,232));// top right
    imagePoints.push_back(cv::Point2f(42,231)); // top left
    imagePoints.push_back(cv::Point2f(223,177));// top back

    cv::Point2f pointToMeasureInImage(741,200); // bottom right of second box

    // for the transform we need the point(s) to be in a vector
    std::vector<cv::Point2f> sourcePoints;
    sourcePoints.push_back(pointToMeasureInImage);
    //sourcePoints.push_back(pointToMeasureInImage);
    sourcePoints.push_back(cv::Point2f(718,141));
    sourcePoints.push_back(imagePoints[0]);

    // list with points that correspond to sourcePoints. This is not needed but used to create some output
    std::vector<int> distMeasureIndices;
    distMeasureIndices.push_back(1);
    //distMeasureIndices.push_back(0);
    distMeasureIndices.push_back(3);
    distMeasureIndices.push_back(2);

    // draw points for visualization
    for(unsigned int i=0; i<imagePoints.size(); ++i)
    {
        cv::circle(input, imagePoints[i], 5, cv::Scalar(0,255,255));
    }
    //cv::circle(input, pointToMeasureInImage, 5, cv::Scalar(0,255,255));
    //cv::line(input, imagePoints[1], pointToMeasureInImage, cv::Scalar(0,255,255), 2);

    // compute the relation between the image plane and the real world top plane of the cubes
    cv::Mat homography = cv::findHomography(imagePoints, objectPoints);

    std::vector<cv::Point2f> destinationPoints;
    cv::perspectiveTransform(sourcePoints, destinationPoints, homography);

    // compute the distance between some defined points (here I use the input points but could be something else)
    for(unsigned int i=0; i<sourcePoints.size(); ++i)
    {
        std::cout << "distance: " << cv::norm(destinationPoints[i] - objectPoints[distMeasureIndices[i]]) << std::endl;

        cv::circle(input, sourcePoints[i], 5, cv::Scalar(0,255,255));
        // draw the line which was measured
        cv::line(input, imagePoints[distMeasureIndices[i]], sourcePoints[i], cv::Scalar(0,255,255), 2);
    }

    // just for fun, measure distances on the 2nd box:
    float distOn2ndBox = cv::norm(destinationPoints[0]-destinationPoints[1]);
    std::cout << "distance on 2nd box: " << distOn2ndBox << " which should be near 1.0" << std::endl;
    cv::line(input, sourcePoints[0], sourcePoints[1], cv::Scalar(255,0,255), 2);

    cv::imshow("input", input);
    cv::waitKey(0);
    return 0;
}
Here's the output which I want to explain:
distance: 2.04674
distance: 2.82184
distance: 1
distance on 2nd box: 0.882265 which should be near 1.0
those distances are:
1. the yellow bottom one from one box to the other
2. the yellow top one
3. the yellow one on the first box
4. the pink one
So the red line (the one you asked for) should have a length of almost exactly 2 x the cube side length, but as you can see we have some error.
The better/more correct your pixel positions are before the homography computation, the more exact your results will be.
You need a pinhole camera model, so undistort your camera images (in a real-world application; see the sketch below).
Keep in mind, too, that you could compute the distances on the ground plane directly if you had 4 points visible there (that don't all lie on the same line)!
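For the undistortion note above, a minimal sketch (cameraMatrix and distCoeffs would come from a prior cv::calibrateCamera run; pick pixel positions and run cv::findHomography on the undistorted image afterwards):

```cpp
#include <opencv2/opencv.hpp>

// Undistort the input photo before measuring pixel positions and computing the homography.
cv::Mat undistortInput(const cv::Mat& input, const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs)
{
    cv::Mat undistorted;
    cv::undistort(input, undistorted, cameraMatrix, distCoeffs);
    return undistorted;
}
```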
I have a very simple terrain map, 256x256 tiles for example; it's divided into tiles (identical squares...). Every tile has a height, slope...
Something like the figure below. My default view will be an iso view. (Each tile can be divided into smaller tiles for smoothing, which I call tessellation.)
D3DXMatrixOrthoLH(&matProj, videoWidth , videoHeight , -100000, 100000);
float xPitch=0;
float yPitch=PI/3.0; //rotate yPitch 60 degree
float zPitch=PI/4.0; //rotate zPitch 45 degree
Now I need to select a single tile on screen with the mouse (where to move to, or where to build something...). I have the mouse position Mx,My and I need to know which tile it is. If the map were flat it would be very easy, but with height it's difficult. I'm planning to make the map quite static (it won't rotate often)... only translation. So I can store all the vertex coordinates (x,y) on screen by using D3DXVec3Project, and then search for the triangle that contains the mouse position -> the tile we need! However, with this approach we may need to search 5-10 or 20 triangles. Do you know any better, more optimized or elegant way? Thanks!
I read about something like raycasting for detection; maybe it can be used in my case. Because there is no eye position in my view setup, the view vector is constant!
3D Screenspace Raycasting/Picking DirectX9
D3DXVec3Unproject also looks quite promising!
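A minimal sketch of that unproject-and-raycast idea with D3DX9. getTileTriangles() is a hypothetical helper that returns the two triangles of a tile from the heightmap; viewport, matView and matProj are whatever you already render with. In practice you would only test the handful of tiles around the unprojected point rather than the whole map.

```cpp
#include <d3dx9.h>
#include <cfloat>

// hypothetical helper: fills out[0..5] with the world-space corners of the tile's two triangles
void getTileTriangles(int tx, int ty, D3DXVECTOR3 out[6]);

bool pickTile(int mouseX, int mouseY,
              const D3DVIEWPORT9& viewport,
              const D3DXMATRIX& matView, const D3DXMATRIX& matProj,
              int mapSize, int& outTileX, int& outTileY)
{
    D3DXMATRIX identity;
    D3DXMatrixIdentity(&identity);

    // unproject the mouse position at the near and far planes to build a ray
    D3DXVECTOR3 screenNear((float)mouseX, (float)mouseY, 0.0f);
    D3DXVECTOR3 screenFar ((float)mouseX, (float)mouseY, 1.0f);
    D3DXVECTOR3 nearPt, farPt;
    D3DXVec3Unproject(&nearPt, &screenNear, &viewport, &matProj, &matView, &identity);
    D3DXVec3Unproject(&farPt,  &screenFar,  &viewport, &matProj, &matView, &identity);

    D3DXVECTOR3 rayDir = farPt - nearPt;
    D3DXVec3Normalize(&rayDir, &rayDir);

    float bestDist = FLT_MAX;
    bool  hit = false;

    for (int ty = 0; ty < mapSize; ++ty)
        for (int tx = 0; tx < mapSize; ++tx)
        {
            D3DXVECTOR3 tri[6];                 // two triangles per tile
            getTileTriangles(tx, ty, tri);

            for (int t = 0; t < 2; ++t)
            {
                float u, v, dist;
                if (D3DXIntersectTri(&tri[t*3+0], &tri[t*3+1], &tri[t*3+2],
                                     &nearPt, &rayDir, &u, &v, &dist) && dist < bestDist)
                {
                    bestDist = dist;
                    outTileX = tx;
                    outTileY = ty;
                    hit = true;
                }
            }
        }
    return hit;
}
```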
Image : http://i1335.photobucket.com/albums/w666/greenpig83/sc4_zpsaaa61249.png
Let's say we have a texture (in this case 8x8 pixels) we want to use as a sprite sheet. One of the sub-images (sprite) is a subregion of 4x3 inside the texture, like in this image:
(Normalized texture coordinates of the four corners are shown)
Now, there are basically two ways to assign texture coordinates to a 4px x 3px quad so that it effectively becomes the sprite we are looking for. The first and most straightforward is to sample the texture at the corners of the subregion:
// Texture coordinates
GLfloat sMin = (xIndex0 ) / imageWidth;
GLfloat sMax = (xIndex0 + subregionWidth ) / imageWidth;
GLfloat tMin = (yIndex0 ) / imageHeight;
GLfloat tMax = (yIndex0 + subregionHeight) / imageHeight;
However, when first implementing this method, ca. 2010, I realized the sprites looked slightly 'distorted'. After a bit of searching, I came across a post in the cocos2d forums explaining that the 'right way' to sample a texture when rendering a sprite is this:
// Texture coordinates
GLfloat sMin = (xIndex0 + 0.5) / imageWidth;
GLfloat sMax = (xIndex0 + subregionWidth - 0.5) / imageWidth;
GLfloat tMin = (yIndex0 + 0.5) / imageHeight;
GLfloat tMax = (yIndex0 + subregionHeight - 0.5) / imageHeight;
...and after fixing my code, I was happy for a while. But somewhere along the way, I believe around the introduction of iOS 5, I started feeling that my sprites weren't looking good. After some testing, I switched back to the 'blue' method (second image) and now they seem to look good, but not always.
Am I going crazy, or something changed with iOS 5 related to GL ES texture mapping? Perhaps I am doing something else wrong? (e.g., the vertex position coordinates are slightly off? Wrong texture setup parameters?) But my code base didn't change, so perhaps I am doing something wrong from the beginning...?
I mean, at least with my code, it feels as if the "red" method used to be correct but now the "blue" method gives better results.
Right now, my game looks OK, but I feel there is something half-wrong that I must fix sooner or later...
Any ideas / experiences / opinions?
ADDENDUM
To render the sprite above, I would draw a quad measuring 4x3 in orthographic projection, with each vertex assigned the texture coords implied in the code mentioned before, like this:
// Top-Left Vertex
{ sMin, tMin };
// Bottom-Left Vertex
{ sMin, tMax };
// Top-Right Vertex
{ sMax, tMin };
// Bottom-right Vertex
{ sMax, tMax };
The original quad is created from (-0.5, -0.5) to (+0.5, +0.5); i.e., it is a unit square at the center of the screen, which is then scaled to the size of the subregion (in this case, 4x3) and positioned with its center at integer (x,y) coordinates. I suspect this has something to do with it too, especially when the width, the height, or both are not even.
ADDENDUM 2
I also found this article, but I'm still trying to put it together (it's 4:00 AM here)
http://www.mindcontrol.org/~hplus/graphics/opengl-pixel-perfect.html
There's slightly more to this picture than meets the eye: the texture coordinates are not the only factor in where the texture gets sampled. In your case, I believe the blue is probably what you want to have.
What you ultimately want is to sample each texel at its center. You don't want to take samples on the boundary between two texels, because that either combines them with linear sampling, or arbitrarily chooses one or the other with nearest sampling, depending on which way the floating-point calculations round.
Having said that, you might think that you don't want your texcoords at (0,0), (1,1) and the other corners, because those are on texel boundaries. However, an important thing to note is that OpenGL samples textures at the center of a fragment.
For a super simple example, consider a 2 by 2 pixel monitor, with a 2 by 2 pixel texture.
If you draw a quad from (0,0) to (2,2), this will cover 4 pixels. If you texture map this quad, it will need to take 4 samples from the texture.
If your texture coordinates go from 0 to 1, then OpenGL will interpolate this and sample from the center of each pixel, with the lower-left texcoord starting at the bottom-left corner of the bottom-left pixel. This will ultimately generate texcoord pairs of (0.25, 0.25), (0.75, 0.75), (0.25, 0.75), and (0.75, 0.25), which puts the samples right in the middle of each texel, which is what you want.
If you offset your texcoords by half a pixel, as in the red example, then the interpolation will be off, and you'll end up sampling the texture off-center from the texels.
So long story short, you want to make sure that your pixels line up correctly with your texels (don't draw sprites at non-integer pixel locations), and don't scale sprites by arbitrary amounts.
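A hedged sketch of that pixel-aligned setup under OpenGL ES 1.x (the function name and all parameters are placeholders): a 1:1 orthographic projection, a quad placed at integer pixel coordinates and drawn unscaled, and the 'blue' texcoords taken at the subregion borders.

```cpp
#include <OpenGLES/ES1/gl.h>

void drawSpritePixelAligned(GLuint texture,
                            int screenWidth, int screenHeight,
                            int spriteX, int spriteY,               // integer pixel position (bottom-left)
                            int subX, int subY, int subW, int subH, // subregion inside the sheet, in texels
                            int sheetW, int sheetH)
{
    // one unit == one pixel
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrthof(0, (GLfloat)screenWidth, 0, (GLfloat)screenHeight, -1, 1);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // 'blue' texcoords: sample exactly at the subregion borders
    GLfloat sMin = (GLfloat)subX / sheetW;
    GLfloat sMax = (GLfloat)(subX + subW) / sheetW;
    GLfloat tMin = (GLfloat)subY / sheetH;
    GLfloat tMax = (GLfloat)(subY + subH) / sheetH;

    // quad drawn 1:1 (no scaling) at integer pixel coordinates
    GLfloat vertices[] = {
        (GLfloat)spriteX,          (GLfloat)spriteY,
        (GLfloat)(spriteX + subW), (GLfloat)spriteY,
        (GLfloat)spriteX,          (GLfloat)(spriteY + subH),
        (GLfloat)(spriteX + subW), (GLfloat)(spriteY + subH) };
    GLfloat texCoords[] = { sMin,tMin, sMax,tMin, sMin,tMax, sMax,tMax };

    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texture);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, vertices);
    glTexCoordPointer(2, GL_FLOAT, 0, texCoords);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
```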
If the blue square is giving you bad results, can you give an example image, or describe how you're drawing it?
Picture says 1000 words: