XNA/DirectX Handedness (orientation)?

I'm using XNA (which uses DirectX) for some graphical programming. I had a box rotating around a point, but the rotations are a bit odd.
Everything seems like someone took a compass and rotated it 180 degrees, so that N is 180, W is 90, etc.
I can't quite seem to find a source that states the orientation, so I'm probably just not using the right keywords.
Can someone tell me what XNA/DirectX's orientation is, and point me to a page that states it?

DirectX uses a left-handed coordinate system.
XNA uses a right-handed coordinate system:
Forward is -Z, backward is +Z. Forward points into the screen.
Right is +X, left is -X. Right points to the right-side of the screen.
Up is +Y, down is -Y. Up points to the top of the screen.
Matrix layout is as follows (using an identity matrix in this example). XNA uses a row-major layout for its matrices. The first three rows represent orientation. The first three columns of the last row ([4, 1], [4, 2], and [4, 3]) represent translation/position. Here is documentation on XNA's Matrix Structure.
In the case of a world/transform matrix (position and rotation combined):
Right    1  0  0  0
Up       0  1  0  0
Forward  0  0 -1  0
Pos      0  0  0  1
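For illustration, a minimal C++ sketch (not XNA code; the Mat4 struct and its accessors are hypothetical, mirroring the row-major layout of the table above) of where the basis vectors and the translation live:

#include <array>
#include <cstdio>

// Row-major 4x4 matrix laid out like the table above: rows 1-3 hold the
// Right/Up/Forward basis vectors, row 4 holds the translation/position.
struct Mat4 {
    std::array<float, 16> m; // m[row * 4 + col]

    std::array<float, 3> right()       const { return { m[0],  m[1],  m[2]  }; }
    std::array<float, 3> up()          const { return { m[4],  m[5],  m[6]  }; }
    std::array<float, 3> forward()     const { return { m[8],  m[9],  m[10] }; }
    std::array<float, 3> translation() const { return { m[12], m[13], m[14] }; }
};

int main() {
    // The matrix from the table, translated to (5, 2, 0): forward is -Z.
    Mat4 world = {{ 1, 0,  0, 0,
                    0, 1,  0, 0,
                    0, 0, -1, 0,
                    5, 2,  0, 1 }};

    std::array<float, 3> f = world.forward();
    std::array<float, 3> p = world.translation();
    std::printf("forward = (%g, %g, %g), position = (%g, %g, %g)\n",
                f[0], f[1], f[2], p[0], p[1], p[2]);
}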

Related

OpenCV to OpenGL coordinate system transform

I have two right handed coordinate systems.
OpenCV
In OpenCV, the camera looks down the positive Z axis.
OpenGL
In OpenGL, the camera looks down the -Z axis. I want to transform a 3D point in front of the camera in the OpenCV coordinate system to the corresponding 3D point in front of the camera in the OpenGL coordinate system.
I'm trying to represent this as a 4x4 matrix that concatenates R and T, with [0 0 0 1] as the bottom row.
So far, I've tried this
1  0  0  0
0 -1  0  0
0  0 -1  0
0  0  0  1
but it doesn't seem to work; nothing shows up in the OpenGL coordinate system.
The camera coordinates of OpenCV go X right, Y down, Z forward (into the scene), while the camera coordinates of OpenGL go X right, Y up, Z backward (out of the screen).
Take solvePnP, one of the most commonly used functions, as an example.
You get a 3x3 rotation matrix R and a 3x1 translation vector T, and create a 4x4 view matrix M from R and T. Simply negate the 2nd and 3rd rows of M and you will get a view matrix for OpenGL rendering.
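A minimal C++/OpenCV sketch of that recipe (my own function and variable names; objectPoints, imagePoints and cameraMatrix are assumed to come from your own correspondences and calibration):

#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

// Build an OpenGL-style view matrix from an OpenCV solvePnP pose.
cv::Mat viewMatrixGL(const std::vector<cv::Point3f>& objectPoints,
                     const std::vector<cv::Point2f>& imagePoints,
                     const cv::Mat& cameraMatrix,
                     const cv::Mat& distCoeffs)
{
    cv::Mat rvec, tvec;
    cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);

    cv::Mat R;
    cv::Rodrigues(rvec, R);                  // 3x3 rotation from the rotation vector

    cv::Mat M = cv::Mat::eye(4, 4, CV_64F);  // [R | t] with 0 0 0 1 at the bottom
    R.copyTo(M(cv::Rect(0, 0, 3, 3)));
    tvec.copyTo(M(cv::Rect(3, 0, 1, 3)));

    // OpenCV camera: X right, Y down, Z forward.  OpenGL camera: X right, Y up,
    // Z backward.  Negating the 2nd and 3rd rows flips Y and Z.
    for (int c = 0; c < 4; ++c) {
        M.at<double>(1, c) *= -1.0;
        M.at<double>(2, c) *= -1.0;
    }
    return M;   // row-major; transpose it if your OpenGL code expects column-major
}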

Derive Sobel's operators?

The two operators for detecting vertical and horizontal edges (while smoothing in the perpendicular direction) are shown below:
[-1 0 1]
[-2 0 2]
[-1 0 1]
and
[-1 -2 -1]
[ 0 0 0]
[ 1 2 1]
But after much Googling, I still have no idea where these operators come from. I would appreciate it if someone can show me how they are derived.
The formulation was proposed by Irwin Sobel a long time ago (in 1968). There is a great page on the subject here.
The main advantage of convolving the 3x3 neighbourhood of the pixel at which the gradient is to be estimated is that this simple operator is really fast and can be done with shifts and adds in low-cost hardware.
They are not the greatest edge detectors in the world - Google "Canny edge detector" for something better - but they are fast and suitable for a lot of simple applications.
So spatial filters, like the Sobel kernels, are applied by "sliding" the kernel over the image (this is called convolution). If we take this kernel:
[-1 0 1]
[-2 0 2]
[-1 0 1]
After applying the Sobel operator, each result pixel gets a:
high (positive) value if the pixels on the right side are bright and pixels on the left are dark
low (negative) value if the pixels on the right side are dark and pixels on the left are bright.
This is because in discrete 2D convolution, the result is the sum of each kernel value multiplied by the corresponding image pixel. Thus a vertical edge produces a large negative or positive result, depending on the direction of the edge gradient. We can then take the absolute value and scale to the interval [0, 1] if we want to display the edges as white and don't care about the edge direction.
This works identically for the other kernel, except it finds horizontal edges.
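As a concrete illustration, a minimal C++/OpenCV sketch of the convolution just described (the input filename is hypothetical; note that filter2D technically computes correlation, i.e. it does not flip the kernel, which for this anti-symmetric kernel only flips the sign and is removed by the absolute value anyway):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>

int main() {
    cv::Mat gray = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    if (gray.empty()) return 1;

    // The horizontal-gradient (vertical-edge) Sobel kernel from the question.
    cv::Mat kx = (cv::Mat_<float>(3, 3) << -1, 0, 1,
                                           -2, 0, 2,
                                           -1, 0, 1);

    // Slide the kernel over the image; use a float destination so that
    // negative responses are kept.
    cv::Mat gx;
    cv::filter2D(gray, gx, CV_32F, kx);

    // Edge strength only: absolute value, scaled to [0, 1] for display.
    cv::Mat mag = cv::abs(gx);
    cv::Mat edges;
    cv::normalize(mag, edges, 0.0, 1.0, cv::NORM_MINMAX);

    cv::imshow("vertical edges", edges);
    cv::waitKey(0);
}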

Convolution kernel

The following link, http://homepages.inf.ed.ac.uk/rbf/HIPR2/linedet.htm, says that for detecting lines we need to specify the width and angle of the line: "to detect the presence of lines of a particular width n, at a particular orientation theta". The example convolution kernels are given for orientations of 0, 45, 90 and 135 degrees and a line width of a single pixel.
What I don't understand is how the convolution kernel changes if I want thicker lines, i.e. a width of 3, 5 or 7 pixels at 0, 45, 90 or 135 degrees. And if I also want to change the angle, how do I change the convolution kernel?
I am new to image processing, so a tutorial or any other help would be appreciated.
For thicker lines, you need a larger kernel in the conventions of your link: more rows of 2's to match the width of the line you are looking for. For a 3-pixel-wide horizontal line, you will need the following kernel:
-1 -1 -1 -1 -1
 2  2  2  2  2
 2  2  2  2  2
 2  2  2  2  2
-1 -1 -1 -1 -1
and so on, depending on angles and widths.
If you want a kernel for orientations other than 0, 45, 90 and 135 degrees, it is more complicated than the kernels for those four orientations. There are other methods you can use instead, for example the Hough transform: http://en.wikipedia.org/wiki/Hough_transform.
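If it helps, a minimal C++/OpenCV sketch of applying the 5x5 kernel from this answer (my own code; the input filename is hypothetical):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>

int main() {
    cv::Mat gray = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    if (gray.empty()) return 1;

    // 5x5 kernel for roughly 3-pixel-wide horizontal lines, as suggested above.
    cv::Mat k = (cv::Mat_<float>(5, 5) <<
        -1, -1, -1, -1, -1,
         2,  2,  2,  2,  2,
         2,  2,  2,  2,  2,
         2,  2,  2,  2,  2,
        -1, -1, -1, -1, -1);

    // Strong positive responses mark bright 3-pixel-wide horizontal lines.
    cv::Mat response;
    cv::filter2D(gray, response, CV_32F, k);

    cv::Mat shown;
    cv::normalize(response, shown, 0.0, 1.0, cv::NORM_MINMAX);
    cv::imshow("horizontal line response", shown);
    cv::waitKey(0);
}

For arbitrary orientations, cv::HoughLinesP is the usual OpenCV entry point to the Hough transform mentioned above.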

Rotate, Scale and Translate around image centre in OpenCV

I really hope this isn't a waste of anyone's time but I've run into a small problem. I am able to construct the transformation matrix using the following:
M =
s*cos(theta)  -s*sin(theta)  t_x
s*sin(theta)   s*cos(theta)  t_y
0              0              1
This works if I give the correct values for theta, s (scale) and tx/ty and then use this matrix as one of the arguments to cv::warpPerspective. The problem is that this matrix rotates about the (0,0) pixel, whereas I would like it to rotate about the centre pixel (cols/2, rows/2). How can I incorporate the centre-point rotation into this matrix?
Two possibilities. The first is to use the function getRotationMatrix2D, which takes the centre of rotation as an argument and gives you a 2x3 matrix. Add the third row and you're done.
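A minimal C++/OpenCV sketch of this first option (function and variable names are mine):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Rotate by thetaDeg (degrees) and scale by s about the image centre,
// then translate by (tx, ty), using getRotationMatrix2D.
cv::Mat centreWarpA(const cv::Mat& img, double thetaDeg, double s,
                    double tx, double ty)
{
    cv::Point2f centre(img.cols / 2.0f, img.rows / 2.0f);

    cv::Mat M = cv::getRotationMatrix2D(centre, thetaDeg, s);  // 2x3, CV_64F
    M.at<double>(0, 2) += tx;                                  // add the translation
    M.at<double>(1, 2) += ty;

    cv::Mat H = cv::Mat::eye(3, 3, CV_64F);                    // append the row [0 0 1]
    M.copyTo(H(cv::Rect(0, 0, 3, 2)));

    cv::Mat out;
    cv::warpPerspective(img, out, H, img.size());
    return out;
}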
A second possibility is to construct an additional matrix that translates the picture before and after the rotation:
T =
1 0 -cols/2
0 1 -rows/2
0 0 1
Multiply your rotation matrix M with this one to get the total transform T^-1 * M * T, where T^-1 translates back by (+cols/2, +rows/2) (e.g. with the function gemm), and apply the result with warpPerspective.
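And a sketch of the second option, composing the translations around M (again my own names; theta in radians):

#include <cmath>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// total = Tinv * M * T: translate the centre to the origin, apply M,
// then translate back.
cv::Mat centreWarpB(const cv::Mat& img, double theta, double s,
                    double tx, double ty)
{
    const double cx = img.cols / 2.0, cy = img.rows / 2.0;

    cv::Mat M = (cv::Mat_<double>(3, 3) <<
        s * std::cos(theta), -s * std::sin(theta), tx,
        s * std::sin(theta),  s * std::cos(theta), ty,
        0,                    0,                   1);

    cv::Mat T = (cv::Mat_<double>(3, 3) <<
        1, 0, -cx,
        0, 1, -cy,
        0, 0,  1);

    cv::Mat Tinv = (cv::Mat_<double>(3, 3) <<
        1, 0, cx,
        0, 1, cy,
        0, 0, 1);

    cv::Mat total = Tinv * M * T;

    cv::Mat out;
    cv::warpPerspective(img, out, total, img.size());
    return out;
}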

Where to center the kernel when using FFTW for image convolution?

I am trying to use FFTW for image convolution.
At first, just to test that the system was working properly, I performed the FFT and then the inverse FFT, and got the exact same image back.
Then, as a small step forward, I used the identity kernel (i.e., kernel[0][0] = 1 and all other components equal 0). I took the component-wise product of the image and the kernel (both in the frequency domain), then did the inverse FFT. Theoretically I should get the identical image back, but the result I got is not even close to the original image. I suspect this has something to do with where I center my kernel before I FFT it into the frequency domain (since I put the "1" at kernel[0][0], I basically centered the positive part at the top left). Could anyone enlighten me about what goes wrong here?
For each dimension, the indexes of samples should be from -n/2 ... 0 ... n/2 -1, so if the dimension is odd, center around the middle. If the dimension is even, center so that before the new 0 you have one sample more than after the new 0.
E.g. -4, -3, -2, -1, 0, 1, 2, 3 for a width/height of 8 or -3, -2, -1, 0, 1, 2, 3 for a width/height of 7.
The FFT is relative to the middle; in its scale there are negative points.
In memory the points are 0...n-1, but the FFT treats them as -floor(n/2)...ceil(n/2)-1, where memory location 0 corresponds to -floor(n/2) and n-1 corresponds to ceil(n/2)-1.
The identity kernel is a matrix of zeros with a 1 at the (0,0) location (the center, according to the above numbering), in the spatial domain.
In the frequency domain the identity kernel should be a constant (all real values 1, or 1/(N*M) depending on normalization, and all imaginary values 0).
If you do not get this result, then the identity kernel might need to be padded differently (to the left and down instead of around all sides) - this may depend on the FFT implementation.
Center each dimension separately (this is an index centering, no change in actual memory).
You will probably need to pad the image (after centering) to a whole power of 2 in each dimension (2^n * 2^m where n doesn't have to equal m).
Pad relative to FFT's 0,0 location (to center, not corner) by copying existing pixels into a new larger image, using center-based-indexes in both source and destination images (e.g. (0,0) to (0,0), (0,1) to (0,1), (1,-2) to (1,-2))
Assuming your FFT uses regular floating-point cells and not complex cells, the complex image has to be of size 2*ceil(n/2) * 2*ceil(m/2) even if you don't need a whole power of 2 (since it has half the samples, but the samples are complex).
If your image has more than one color channel, you will first have to reshape it so that the channels are the most significant part of the sub-pixel ordering instead of the least significant. You can reshape and pad in one go to save time and space.
Don't forget the FFTSHIFT after the IFFT. (To swap the quadrants.)
The result of the IFFT is in memory order 0...n-1. You have to take pixels ceil(n/2)...n-1 and move them before pixels 0...ceil(n/2)-1.
This is done by copying pixels to a new image: copy pixel ceil(n/2) to memory location 0, ceil(n/2)+1 to memory location 1, ..., n-1 to memory location floor(n/2)-1, then 0 to memory location floor(n/2), 1 to memory location floor(n/2)+1, ..., ceil(n/2)-1 to memory location n-1.
When you multiply in the frequency domain, remember that the samples are complex (one cell real then one cell imaginary) so you have to use a complex multiplication.
The result might need dividing by N^2*M^2, where N is the padded size of dimension n (and likewise M for m). You can tell by (a) looking at the frequency-domain values of the identity kernel and (b) comparing the result to the input.
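Putting the frequency-domain part of these steps together, here is a bare-bones FFTW sketch (single channel; image and kernel assumed already padded to the same W x H; it stores the kernel wrapped around the (0,0) corner, which is the usual alternative to centring it and doing the FFTSHIFT afterwards; with FFTW's unnormalised transforms the scale factor works out to W*H for this pipeline - check it against the identity kernel as suggested above):

#include <vector>
#include <fftw3.h>

// Circular convolution of two real W x H images via FFTW (row-major storage).
std::vector<double> fftConvolve(const std::vector<double>& img,
                                const std::vector<double>& ker,
                                int W, int H)
{
    const int NC = H * (W / 2 + 1);              // r2c output size (complex cells)
    std::vector<double> in(img), kin(ker), out(W * H);

    fftw_complex* F = fftw_alloc_complex(NC);
    fftw_complex* G = fftw_alloc_complex(NC);

    fftw_plan pf = fftw_plan_dft_r2c_2d(H, W, in.data(),  F, FFTW_ESTIMATE);
    fftw_plan pg = fftw_plan_dft_r2c_2d(H, W, kin.data(), G, FFTW_ESTIMATE);
    fftw_plan pi = fftw_plan_dft_c2r_2d(H, W, F, out.data(), FFTW_ESTIMATE);

    fftw_execute(pf);
    fftw_execute(pg);

    // Point-wise COMPLEX multiplication: (a+bi)(c+di) = (ac-bd) + (ad+bc)i.
    for (int i = 0; i < NC; ++i) {
        const double a = F[i][0], b = F[i][1];
        const double c = G[i][0], d = G[i][1];
        F[i][0] = a * c - b * d;
        F[i][1] = a * d + b * c;
    }

    fftw_execute(pi);

    // FFTW is unnormalised: divide by W*H to recover the convolution.
    for (double& v : out) v /= double(W) * double(H);

    fftw_destroy_plan(pf); fftw_destroy_plan(pg); fftw_destroy_plan(pi);
    fftw_free(F); fftw_free(G);
    return out;
}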
I think that your understanding of the identity kernel may be off. An identity kernel should have the 1 at the center of the 2D kernel, not at the (0,0) position.
For example, for a 3x3 kernel, you have yours set up as follows:
1, 0, 0
0, 0, 0
0, 0, 0
It should be
0, 0, 0
0, 1, 0
0, 0, 0
Check this out also:
What is the "do-nothing" convolution kernel
Also look here, at the bottom of page 3:
http://www.fmwconcepts.com/imagemagick/digital_image_filtering.pdf
"I took the component-wise product between the image and kernel in frequency domain, then did the inverse fft. Theoretically I should be able to get the identical image back."
I don't think that doing a forward transform with a non-fft kernel, and then an inverse fft transform should lead to any expectation of getting the original image back, but perhaps I'm just misunderstanding what you were trying to say there...
