I'm using floating-point textures in my app to store data, and I've noticed that in the ANGLE WebGL implementation, when I store, for example, a Float32Array([ a, b, c, d ]), the data occupies the 4 nearest quad texels and each of those texels contains the values divided by 4:
r = a / 4
g = b / 4
b = c / 4
a = d / 4
In the OpenGL WebGL implementations (Linux and Windows) everything behaves as expected: one texel contains the RGBA values I provide.
Has anybody faced this? Why is it happening?
This happens with anisotropic filtering enabled, in case anyone is interested =) ANGLE is OK!
I'm filming a scene with 6 RGB cameras that I want to reconstruct in 3D, kind of like in the following picture. I forgot to bring a calibration chessboard, so I used a blank rectangular board instead and filmed it as I would film a regular chessboard.
First step, calibration --> OK.
I obviously couldn't use cv2.findChessboardCorners, so I made a small program that lets me click and store the location of each of the 4 corners. I calibrated from these 4 points on about 10-15 frames as a test.
Tl;Dr: It seemed to work great.
Next step, triangulation. --> NOT OK
I use direct linear transform (DLT) to triangulate my points from all 6 cameras.
Tl;Dr: It's not working so well.
Image and world coordinates are connected this way: x ~ P Q (up to a scale factor), where x is the homogeneous image point, Q the homogeneous world point and P the 3x4 projection matrix of each camera. Eliminating the scale gives two equations per camera, which can be written as a homogeneous linear system A Q = 0. A singular value decomposition (SVD) of A gives Q as the right singular vector associated with the smallest singular value.
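For context, here is a minimal numpy sketch of that DLT step (a simplified illustration, not my exact code):

import numpy as np

def triangulate_dlt(projection_matrices, image_points):
    # Each view contributes two rows of A: u * P[2] - P[0] and v * P[2] - P[1].
    A = []
    for P, (u, v) in zip(projection_matrices, image_points):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    # The solution Q of A Q = 0 is the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    Q = Vt[-1]
    return Q[:3] / Q[3]   # back to inhomogeneous 3D coordinates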
3 of the 4 points are correctly triangulated, but the blue one, which should lie at the origin, has a wrong x coordinate.
WHY?
Why only one point, and why only the x coordinate?
Does it have anything to do with the fact that I calibrate from a 4 points board?
If so, can you explain; and if not, what else could it be?
Update: I tried another frame where the board is somewhere else, and the triangulation is fine.
So here is the mystery: some points are randomly triangulated wrong (or at least the one at the origin), while most of the others are fine. Again, why?
My guess is that it comes from the triangulation rather than from the calibration, and that there is no connection with my sloppy calibration process.
One common issue I came across is the ambiguity in the solutions found by DLT. Indeed, solving A Q = 0 or solving (A C)(C⁻¹ Q) = 0 gives the same solution. See page 46 here. But I don't know what to do about it.
I'm now fairly sure this is not a calibration issue but I don't want to delete this part of my post.
I used ret, K, D, R, T = cv2.calibrateCamera(objpoints, imgpoints, imSize, None, None). It worked seamlessly, and the points were perfectly reprojected onto my original image with cv2.projectPoints(objpoints, R, T, K, D).
I computed my projection matrix P as P = K [R | T], with R, _ = cv2.Rodrigues(R) to turn the rotation vector into a rotation matrix.
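For one view that gives something like the following (illustrative values; in my code K, rvec and tvec come from cv2.calibrateCamera):

import numpy as np
import cv2

# Illustrative values: in practice K, rvec and tvec come from cv2.calibrateCamera for one view.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
rvec = np.array([[0.0], [0.0], [0.1]])   # rotation vector (Rodrigues form)
tvec = np.array([[0.0], [0.0], [2.0]])   # translation vector

R, _ = cv2.Rodrigues(rvec)               # 3x3 rotation matrix from the rotation vector
P = K @ np.hstack((R, tvec))             # 3x4 projection matrix P = K [R | T]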
How is it that I get a solution while I have only 4 points per image? Wouldn't I need at least 6 of them? We have x = P Q. We can solve for P by SVD under the form A p = 0, where p stacks the entries of P. This gives 2 equations per point, for 11 independent unknown parameters of P. So 4 points make 8 equations, which shouldn't be enough, and yet cv2.calibrateCamera still gives a solution. It must be using another method? I came across Perspective-n-Point (PnP); is that what OpenCV uses? In which case, is it directly optimizing K, R, and T and thus needs fewer points? I could artificially add a few points to get more than the 4 corner points of my board (for example, the centers of the edges, or the center of the rectangle). But is that really the issue?
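As an aside, here is a minimal sketch of what I mean by PnP (made-up intrinsics and board coordinates, using cv2.solvePnP; I am not sure this is what cv2.calibrateCamera does internally):

import numpy as np
import cv2

# Hypothetical intrinsics and a 1 m x 1 m planar board with 4 corners (z = 0).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
board_3d = np.array([[0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [1.0, 0.0, 0.0]], dtype=np.float32)
corners_2d = np.array([[320.0, 240.0],
                       [320.0, 400.0],
                       [480.0, 400.0],
                       [480.0, 240.0]], dtype=np.float32)

# With K known, only R and T (6 degrees of freedom) remain, so 4 points (8 equations) are enough.
ok, rvec, tvec = cv2.solvePnP(board_3d, corners_2d, K, None)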
When calibrating, one needs to decompose the projection matrix into intrinsic and extrinsic matrices. But this decomposition is not unique and has 4 solutions. See the section 'I'm seeing double' there, and Chapter 21 of Hartley & Zisserman about cheirality, for more information. It is not my issue, since my camera points are correctly reprojected to the image plane and my cameras are correctly placed in my 3D scene.
I did not quite understand what you are asking; it is rather vague. However, I think you are miscalculating your projection matrix.
If I'm not mistaken, you will define the 4 3D points representing your rectangle in real-world space like this, for example:
pt_3D = [[ 0 0 0]
[ 0 1 0]
[ 1 1 0]
[ 1 0 0]]
You will then retrieve the corresponding 2D points (in order) from each image and generate two lists as follows:
objpoints = [pt_3D, pt_3D, ....] # N times
imgpoints = [pt_2D_img1, pt_2D_img2, ....] # N times ( N images )
You can then calibrate your camera and recover the camera poses as well as the projection matrices as follows:
import numpy as np
import cv2

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, imSize, None, None)
reproj, _ = cv2.projectPoints(pt_3D, rvecs[0], tvecs[0], mtx, dist)   # reprojection check for the first view

for rvec, tvec in zip(rvecs, tvecs):
    Rt, _ = cv2.Rodrigues(rvec)                                    # world-to-camera rotation matrix
    R = Rt.T                                                       # camera-to-world rotation
    T = -R @ tvec                                                  # camera position in world coordinates
    pose_Matrix = np.vstack((np.hstack((R, T)), [0, 0, 0, 1]))     # transformation matrix == camera pose
    Projection_Matrix = mtx @ np.linalg.inv(pose_Matrix)[:3, :4]   # P = K [R | t] (world to image)
You don't have to apply the DLT or the triangulation yourself (it is all done in the cv2.calibrateCamera() function), and the 3D points remain what you defined yourself.
I'm working on a Swift app that should measure physically correct z-values of a person standing in front of a statically mounted iPhone 7+. For this, I am using AVDepthData objects that contain the depth maps coming from the Dual Camera system.
However, the resulting point clouds indicate that the depth maps do not have subpixel accuracy since the point clouds consist of slices along the z-direction and the distances of adjacent slices increase with increasing depth. This seems to be caused by the integer discretization.
Here are two files that visualize the problem:
Captured Depthmap, cropped Z-Values after 4.0m: DepthMap with Z-values in Legend
Textured Pointcloud, view from the side (90°): Pointcloud rendered from iPhone
Following Apple's documentation, I've already deactivated temporal filtering and undistorted the images using the distortion coefficients from the lookup table, in order to get correct world coordinates.
Filtering depth data makes it more useful for applying visual effects to a companion image, but alters the data such that it may no longer be suitable for computer vision tasks. (In an unfiltered depth map, missing values are represented as NaN.)
Is there any way to retrieve depth maps that have subpixel accuracy in order to perform good measurements of a person standing in front of the camera?
Below you can find the Python code I wrote to create the point clouds offline; the method calculate_rectified_point was provided by Apple to remove lens distortion from the images.
for v in range(height):
    for u in range(width):
        r, g, b = rgb_texture[v, u]
        z = depth_map[v, u]
        if z <= 0:
            continue

        # Step 1: invert the intrinsic parameters
        x = (u - center[0]) / focal_lengths[0]
        y = (v - center[1]) / focal_lengths[1]

        # Step 2: remove the radial and tangential distortion
        x_un, y_un = calculate_rectified_point((x, y), dist_coefficients, optical_center, (width, height))

        # Step 3: invert the extrinsic parameters
        x, y, z = extrinsic_matrix_inv.dot(np.array([x_un * z, y_un * z, z]))
I'm trying to achieve terrain texturing using a 3D texture that consists of several material layers, and to get smooth blending between the materials.
Maybe my illustration will explain it better:
Just imagine that each color is a cool terrain texture, like grass, stone, etc.
I want to get them properly blended, but with the current approach I get all the textures between the requested ones in addition to the textures I actually want to appear (which seems logical because, as I've read, a 3D texture is treated as a three-dimensional array rather than as independent texture layers).
The current (and obviously naive) approach is as simple as pie (the 'current' result is rendered using point interpolation; the desired result is hand-painted):
Vertices:
Vertex 1: Position = Vector3.Zero, UVW = Vector3.Zero
Vertex 2: Position = Vector3(0, 1, 0), UVW = Vector3(0, 1, 0.75f)
Vertex 3: Position = Vector3(0, 0, 1), UVW = Vector3(1, 0, 1)
As you can see, the first vertex of the triangle uses the first material (the red one), the second vertex uses the third material (the blue one), and the third vertex uses the last, fourth material (the yellow one).
This is how it's done in the pixel shader (UVW is passed through unchanged):
float3 texColor = tex3D(ColorTextureSampler, input.UVW);
return float4(texColor, 1);
The reason for this choice is my terrain structure. The terrain is generated from voxels (each voxel holds a material ID) using marching cubes. Each vertex is 'welded', because the meshes are pretty big and I don't want to make every triangle individual (but I can still do that if there is no way to solve my problem with shared vertices).
I recently had the idea of storing, in each vertex, the material IDs of the other two vertices of the triangle and their blend factors (I would have a float2 UV pair, a float3 for the material IDs and a float3 for the blend factor of each material ID), but I don't see any way to accomplish this without breaking my mesh into individual triangles.
Any help would be greatly appreciated. I'm targeting for SlimDX with C# and Direct3D 9 API. Thanks for reading.
P.S.: I'm sorry if I made some mistakes in this text, English is not my native language.
Your ColorTextureSampler is probably using point filtering (D3DTEXF_POINT). Use either D3DTEXF_LINEAR or D3DTEXF_ANISOTROPIC to achieve the desired interpolation effect.
I'm not very familiar with SlimDX 9, but you get the idea.
BTW, nice illustration =)
Update 1
The result in your comment below seems consistent with your code.
It looks like to get the desired effect you must change the overall approach.
It is not a complete solution for you, but here is how we do it for plain 3D terrains:
Every vertex has one pair (u, v) of texture coordinates.
You have n textures to sample from (T1, T2, T3, ..., Tn) that represent different terrain layers: sand, grass, rock, etc.
You have mask texture(s), n channels in total, that store the blending coefficients for each texture T in their channels: the R channel holds the alpha for T1, the G channel for T2, B for T3, etc.
In the pixel shader you sample your layer textures as usual and get color values float4 val1, val2, val3, ...
Then you sample the mask texture(s) for the corresponding blend coefficients and get float blend1, blend2, blend3, ...
Then you apply some kind of blending algorithm, for example simple linear interpolation:
float4 terrainColor = lerp( val1, val2, blend1 );
terrainColor = lerp( terrainColor, val3, blend2);
terrainColor = lerp( terrainColor, ..., blendN );
For example, if your T1 is grass and you have a big grass field in the middle of your map, you will have a big red region in the middle of your mask.
This algorithm is a bit slow because of the many texture samples, but it is simple to implement, gives good visual results and is the most flexible. You can use not only a mask as blend coefficients, but any values: for example height (more snow on mountain peaks, rock in the mountains, dirt on low ground), slope (rock on steep slopes, grass on flat ground), even fixed values, etc. Or mix all of that. Also, you can vary the blending: use the built-in lerp or something more complicated (warning! this example is silly):
float4 terrainColor = val1 * val2 * blend1 + val2 * val3 * blend2;
terrainColor = saturate(terrainColor);
Playing with the blend algorithm is the most interesting part of this approach, and you can find many techniques on Google.
Not sure, but hope it helps!
Happy coding! =)
In my application I am getting a depth frame similar to the depth frame retrieved from the Depth Basics sample. What I don't understand is why there are discrete levels in the image; I don't know what to call these sudden changes in depth values. Clearly half of my right hand is all black, and my left hand seems divided into 3 such levels. What is this and how do I remove it?
When I run the KinectExplorer Sample app I get the depth as follows. This is the depth image I want to generate from the raw depth data.
I am using Microsoft Kinect SDK's (v1.6) NuiApi along with OpenCV. I have the following code:
BYTE *pBuffer = (BYTE*)depthLockedRect.pBits;  // pointer to data having 8-bit jump
USHORT *depthBuffer = (USHORT*)pBuffer;        // pointer to data having 16-bit jump

int cn = 4;
this->depthFinal = cv::Mat::zeros(depthHeight, depthWidth, CV_8UC4); // 8-bit, 4 channels

for (int i = 0; i < this->depthFinal.rows; i++) {
    for (int j = 0; j < this->depthFinal.cols; j++) {
        USHORT realdepth = ((*depthBuffer) & 0x0fff);        // taking the 12 LSBs for depth
        BYTE intensity = (BYTE)((255 * realdepth) / 0x0fff); // scaling to a 255-level grayscale
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 0] = intensity;
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 1] = intensity;
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 2] = intensity;
        depthBuffer++;
    }
}
The stripes that you see are due to the wrapping of depth values caused by the %256 operation. Instead of applying the modulo operation (%256), which causes the bands to show up, remap the depth values along the entire range, e.g.:
BYTE intensity = depth == 0 || depth > 4095 ? 0 : 255 - (BYTE)(((float)depth / 4095.0f) * 255.0f);
In case your max depth is 2048, replace the 4095 with 2047.
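The same idea in a small numpy sketch (synthetic depth values, assuming a 0-4095 range), just to contrast the two mappings:

import numpy as np

# Hypothetical raw depth values in the sensor's 0-4095 range.
depth = np.linspace(0, 4095, 8).astype(np.uint16)

# Wrapping with %256 restarts at 0 every 256 units, which shows up as bands.
banded = (depth % 256).astype(np.uint8)

# Remapping over the whole range keeps the mapping monotonic, so no bands appear.
remapped = np.where((depth == 0) | (depth > 4095), 0,
                    255 - (depth.astype(np.float32) / 4095.0 * 255.0)).astype(np.uint8)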
More pointers:
the Kinect presumably returns an 11-bit value (0-2047), but you only use 8 bits (0-255).
newer Kinect versions seem to return a 12-bit value (0-4095)
in the Kinect Explorer source code, there's a file called DepthColorizer.cs where most of the magic seems to happen. I believe that this code is what makes the depth values look so smooth in Kinect Explorer - but I might be wrong.
I faced the same problem while working on a project that involved visualization of a depth map. However, I used the OpenNI SDK with OpenCV instead of the Kinect SDK libraries. The problem was the same, and hence the solution should work for you as it did for me.
As mentioned in previous answers to your question, the Kinect depth map is 11-bit (0-2047), while the examples use 8-bit data types.
What I did in my code to get around this was to acquire the depth map into a 16-bit Mat and then convert it to an 8-bit uchar Mat by using the scaling option of the Mat convertTo function.
First I initialize a Mat for acquiring the depth data:
Mat depthMat16UC1(XN_VGA_Y_RES, XN_VGA_X_RES, CV_16UC1);
Here XN_VGA_Y_RES and XN_VGA_X_RES define the resolution of the acquired depth map.
The code where I do this is as follows:
Mat depthMat8UC1;
depthMat16UC1.data = ((uchar*)depthMD.Data());
depthMat16UC1.convertTo(depthMat8UC1, CV_8U, 0.05f); // scale the 16-bit depth values into the 8-bit range
imshow("Depth Image", depthMat8UC1);
depthMD is the metadata object containing the data retrieved from the Kinect sensor.
I hope this helps you in some way.
The visualization of the depth image data has discrete levels that are coarse (0 to 255 in your code example), but the actual depth image data are numbers between 0 and 2047. Still discrete, of course, but not in such coarse units as the colors chosen to depict them.
The Kinect v2 can see up to 8 meters of depth (but accuracy decreases beyond 4.5 m). It starts at around 0.4 meters.
So one needs to map a number up to 8000 to a color.
One way to do this is to use RGB colors just as numbers: then you could potentially store a number up to 255x255x255 in a pixel. (If you had a different color format, the maximum would be different.)
Storing 8000 within that 255x255x255 maximum results in a certain combination of R+G+B, and that gives this banding effect.
But you could of course divide by 8000, or subtract a number, or drop values beyond a certain threshold.
I've received two conflicting answers about how to multiply matrices in Direct3D to achieve results. Tutorials state that you multiply from left to right, and that's fine, but it's not how I would visualize it.
Here's an example:
OpenGL (reading from top to bottom):
glRotatef(90.0f, 0.0f, 0.0f, 1.0f);
glTranslatef(20.0f, 0.0f, 0.0f);
So you visualize the world axes rotating 90 degrees about the z-axis. Then you translate by 20.0 along the now-rotated x-axis, so it looks like you are going up along the world y-axis.
In Direct3D, doing:
wm = rotatem * translatem;
is different. It looks like the object was just rotated at the origin and translated on the world's x-axis so it goes to the right and not up. It only works once I reverse the order and read from right to left.
Also, for example, in Frank Luna's book on DX10 he explains how to do mirror reflections. I get all of that, but when he does, for example:
reflection_matrix = world_m * reflection_m;
around the xy plane, do I interpret this as first doing the world positioning and then the reflection, or the opposite?
The problem is that the order in which you are multiplying the matrices to get the composite transform matrix is reversed from what it should be. You are doing wm = rotatem * translatem, which follows the order of operations you use for OpenGL, but for DirectX the matrix should be wm = translatem * rotatem.
The fundamental difference between OpenGL and DirectX arises from the fact that OpenGL treats matrices in column-major order, while DirectX treats matrices in row-major order.
To go from column-major to row-major you take the transpose (swap the rows and the columns) of the OpenGL matrix.
So, if you write wm = rotatem * translatem in OpenGL, then you want the transpose of that for DirectX, which is:
wm^T = (rotatem * translatem)^T = translatem^T * rotatem^T
which explains why the order of the matrix multiply has to be reversed in DirectX.
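As a quick sanity check, a minimal numpy sketch (with made-up 2D homogeneous rotate/translate matrices) confirms that transposing the product reverses the order of the factors:

import numpy as np

theta = np.radians(90.0)
# Column-vector (OpenGL-style) rotation about the origin and a translation along x.
rotatem = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
translatem = np.array([[1.0, 0.0, 20.0],
                       [0.0, 1.0,  0.0],
                       [0.0, 0.0,  1.0]])

# (A B)^T == B^T A^T, which is why the multiplication order flips between conventions.
assert np.allclose((rotatem @ translatem).T, translatem.T @ rotatem.T)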
See this answer. In OpenGL, each subsequent operation is a pre-multiplication of all the operations before it, not a post-multiplication. You can see a matrix multiplication of a vector as a function evaluation.
If what you want is to first rotate a vector and then translate the rotated vector, which in OpenGL you would solve by first calling glRotatef and then calling glTranslatef, you could express that using function calls as
myNewVector = translate(rotate(myOldVector))
The rotate function does this
rotate(anyVector) = rotationMatrix * anyVector
and the translate function does this
translate(anyOtherVector) = translationMatrix * anyOtherVector
so your equivalent expression using matrix multiplications would look like
myNewVector = translationMatrix * rotationMatrix * myOldVector
That is, your combined transformation matrix would be translationMatrix * rotationMatrix.
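To make the composition concrete, here is a small numpy sketch (hypothetical 2D homogeneous matrices; the names mirror the prose above) showing that applying the combined matrix is the same as rotating first and then translating:

import numpy as np

theta = np.radians(90.0)
rotationMatrix = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                           [np.sin(theta),  np.cos(theta), 0.0],
                           [0.0,            0.0,           1.0]])
translationMatrix = np.array([[1.0, 0.0, 20.0],
                              [0.0, 1.0,  0.0],
                              [0.0, 0.0,  1.0]])

myOldVector = np.array([1.0, 0.0, 1.0])   # a point in homogeneous coordinates

# Rotate first, then translate the rotated vector...
step_by_step = translationMatrix @ (rotationMatrix @ myOldVector)
# ...which is the same as applying the combined matrix translationMatrix * rotationMatrix.
combined = (translationMatrix @ rotationMatrix) @ myOldVector

assert np.allclose(step_by_step, combined)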