Performing shear with altered origin in OpenCV

I want to shear an image along the y-axis, but keeping a given x value in the same height.
If I use a shear matrix as:
[1, 0, 0]
[s, 1, 0]
with s being the shear factor, then with the function warpAffine I do get my sheared image, but the origin is always at (0, 0).
I need to have my origin at another x (and maybe even y).
So I think I should combine translation matrices: translate the image to the left, perform the shear, and translate the image back to the right. But I am unsure how to combine these matrices, since:
[1, 0, dx]   [1, 0, 0]   [1, 0, -dx]
[0, 1, dy] * [s, 1, 0] * [0, 1, -dy]
obviously won't work, since 2x3 matrices cannot be multiplied together like that.
How can I perform a shearing operation on a chosen origin other than 0,0?

You are correct that you should combine those transforms. To combine them, convert each one to a 3x3 matrix by adding [0, 0, 1] as the third row. Then you can multiply all three matrices. When you finish, the bottom row will still be [0, 0, 1]; remove it and use the resulting 2x3 matrix as your combined transform.
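For what it's worth, here is a minimal numpy/OpenCV sketch of that composition (the filename, shear factor, and pivot point are placeholders):
import cv2
import numpy as np

def shear_about(s, cx, cy):
    # Translate the pivot (cx, cy) to the origin, shear along y, translate back.
    # All three are 3x3 matrices; the bottom row is dropped at the end.
    T_fwd  = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=np.float64)
    S      = np.array([[1, 0, 0],   [s, 1, 0],   [0, 0, 1]], dtype=np.float64)
    T_back = np.array([[1, 0, cx],  [0, 1, cy],  [0, 0, 1]], dtype=np.float64)
    M = T_back @ S @ T_fwd      # the rightmost factor is applied first
    return M[:2, :]             # warpAffine expects a 2x3 matrix

img = cv2.imread("input.png")
M = shear_about(s=0.3, cx=120, cy=0)   # keeps the column x = 120 at the same height
sheared = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))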

Related

How to construct a filter to detect diagonal edges

I am trying to construct a filter that detects diagonal edges in a given image matrix, in the direction of the vector (1, 1). Can someone point me to how to do that?
You could use a diagonal variant of the Sobel filter, of the form:
[[ 2,  1,  0],
 [ 1,  0, -1],
 [ 0, -1, -2]]
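If it helps, a short sketch applying that kernel with OpenCV's filter2D (the filename is a placeholder):
import cv2
import numpy as np

# Diagonal Sobel-like kernel for edges along the (1, 1) direction.
kernel = np.array([[ 2,  1,  0],
                   [ 1,  0, -1],
                   [ 0, -1, -2]], dtype=np.float32)
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.filter2D(img, cv2.CV_32F, kernel)   # 32-bit output preserves negative responses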

Keras ImageDataGenerator rescale values to [-0.5, 0.5]

The ImageDataGenerator class from Keras has a rescale parameter, which can change pixel values from [0, 255] to [0, 1]. Is it possible to change the range to [-0.5, 0.5]?
Thank you for your answers!
As it is specified in the documentation:
rescale: rescaling factor. Defaults to None. If None or 0, no rescaling is applied, otherwise we multiply the data by the value provided (after applying all other transformations).
So it is not possible to achieve [-0.5, 0.5] with this parameter alone, but you can achieve the [0, 1] range with a factor of 1/255.
For rescaling the images to the [-0.5, 0.5] range, you could apply np.interp to each image after data augmentation:
np.interp(image, (0, 255), (-0.5, 0.5))
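If you want the generator itself to emit that range, one option (a sketch, assuming the standalone keras package) is to pass a preprocessing_function instead of rescale:
from keras.preprocessing.image import ImageDataGenerator

def to_half_range(image):
    # Equivalent to np.interp(image, (0, 255), (-0.5, 0.5)).
    return image / 255.0 - 0.5

# preprocessing_function is applied to each image after augmentation.
datagen = ImageDataGenerator(preprocessing_function=to_half_range)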

How to calculate partial derivatives of error function with respect to values in matrix

I am building a project that is a basic neural network that takes in a 2x2 image, with the goal of classifying the image as either a forward-slash (1-class) or back-slash (0-class) shape. The input data is a flat numpy array, where 1 represents a black pixel and 0 represents a white pixel.
0-class: [1, 0, 0, 1]
1-class: [0, 1, 1, 0]
If I start my filter as a random 4x1 matrix, how can I use gradient descent to arrive at either of the perfect matrices, [1, -1, -1, 1] or [-1, 1, 1, -1], to classify the datapoints?
Side note: even when multiplied with the "perfect" answer matrix and then summed, the outputs would be -2 and 2. Would my data labels need to be -2 and 2? What if I want my classes labeled as 0 and 1?
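One way to see why 0 and 1 labels can still work: squash the dot product through a sigmoid so the raw scores (-2 and 2 in the perfect case) land in (0, 1). A hedged sketch of logistic-regression-style gradient descent on those two datapoints:
import numpy as np

X = np.array([[1, 0, 0, 1],    # 0-class: back slash
              [0, 1, 1, 0]])   # 1-class: forward slash
y = np.array([0.0, 1.0])
rng = np.random.default_rng(0)
w = rng.normal(size=4)          # random 4x1 filter

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    p = sigmoid(X @ w)                 # predictions in (0, 1)
    w -= 0.5 * X.T @ (p - y) / len(y)  # cross-entropy gradient step
print(w)   # approaches a scaled version of [-1, 1, 1, -1]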

composition of 3D rigid transformations

This question belongs to the topic - 'Structure from Motion'.
Suppose there are 3 images. There are point correspondences between images 1-2 and images 2-3, but no common point between images 1 and 3. I got the RT (rotation and translation matrix) of image 2, RT12, with respect to image 1 (considering image 1's RT as [I|0], that is, rotation is identity and translation is zero). Let's split RT12 into R12 and T12.
Similarly, I got RT23, considering image 2's RT as [I|0]. So now I have R23 and T23, which are relative to image 2, not image 1. Now I want to find R13 and T13.
For a synthetic dataset, the equation R13 = R23 * R12 gives the correct R (verified, as I actually have R13 precalculated). Similarly, T13 should be T12 + T23, but the translation computed this way is bad. Since I have the actual results, I could verify that the rotation was estimated nicely, but not the translation. Any ideas?
This is a simple matrix block-multiplication problem, but you have to remember that you are actually considering 4x4 matrices (associated to rigid transformations in the 3D homogeneous world) and not 3x4 matrices.
Your 3x4 RT matrices actually correspond to the first three rows of 4x4 matrices A, whose last rows are [0, 0, 0, 1]:
RT23 -> A23=[R23, T23; 0, 0, 0, 1]
RT12 -> A12=[R12, T12; 0, 0, 0, 1]
Then, if you do the matrix block-multiplication on the 4x4 matrices (A13 = A23 * A12), you'll quickly find out that:
R13 = R23 * R12
T13 = R23 * T12 + T23
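As a sanity check, a small numpy sketch of that block multiplication (the rotation and translations are placeholders):
import numpy as np

def compose(R12, T12, R23, T23):
    # Lift each [R|T] into a 4x4 homogeneous matrix with last row [0, 0, 0, 1].
    A12 = np.eye(4); A12[:3, :3] = R12; A12[:3, 3] = np.ravel(T12)
    A23 = np.eye(4); A23[:3, :3] = R23; A23[:3, 3] = np.ravel(T23)
    A13 = A23 @ A12
    return A13[:3, :3], A13[:3, 3]

c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # rotation about z
T12, T23 = np.array([1., 2., 3.]), np.array([4., 5., 6.])
R13, T13 = compose(R, T12, R, T23)
assert np.allclose(R13, R @ R) and np.allclose(T13, R @ T12 + T23)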

Where to center the kernel when using FFTW for image convolution?

I am trying to use FFTW for image convolution.
At first, just to test whether the system was working properly, I performed the FFT and then the inverse FFT, and got the exact same image back.
Then, as a small step forward, I used the identity kernel (i.e., kernel[0][0] = 1 and all other components equal 0). I took the component-wise product of the image and the kernel (both in the frequency domain), then did the inverse FFT. Theoretically I should get the identical image back, but the result I got is not even close to the original image. I suspect this has something to do with where I center my kernel before I FFT it into the frequency domain (since I put the "1" at kernel[0][0], it basically means that I centered the positive part at the top left). Could anyone enlighten me about what is going wrong here?
For each dimension, the indexes of samples should be from -n/2 ... 0 ... n/2 -1, so if the dimension is odd, center around the middle. If the dimension is even, center so that before the new 0 you have one sample more than after the new 0.
E.g. -4, -3, -2, -1, 0, 1, 2, 3 for a width/height of 8 or -3, -2, -1, 0, 1, 2, 3 for a width/height of 7.
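In numpy terms this centering is what ifftshift undoes; a tiny sketch of the index mapping:
import numpy as np
idx = np.arange(-4, 4)          # centered indexes for n = 8
print(np.fft.ifftshift(idx))    # [ 0  1  2  3 -4 -3 -2 -1]: the new 0 lands at memory 0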
The FFT is relative to the middle; on its scale there are negative points.
In memory the points are 0...n-1, but the FFT treats them as -floor(n/2)...ceil(n/2)-1, where memory location 0 corresponds to -floor(n/2) and n-1 corresponds to ceil(n/2)-1.
The identity matrix is a matrix of zeros with a 1 at the 0,0 location (the center, according to the above numbering). (In the spatial domain.)
In the frequency domain, the identity matrix should be a constant (all real values 1, or 1/(N*M), and all imaginary values 0).
If you do not get this result, then the identity matrix might need different padding (to the left and down instead of around all sides) - this may depend on the FFT implementation.
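You can check the frequency-domain claim quickly with numpy's FFT (which uses the same unnormalized forward convention as FFTW):
import numpy as np
kernel = np.zeros((8, 8))
kernel[0, 0] = 1.0                             # delta at the FFT's origin
print(np.allclose(np.fft.fft2(kernel), 1.0))   # True: constant 1+0j everywhere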
Center each dimension separately (this is an index centering, no change in actual memory).
You will probably need to pad the image (after centering) to a whole power of 2 in each dimension (2^n * 2^m where n doesn't have to equal m).
Pad relative to FFT's 0,0 location (to center, not corner) by copying existing pixels into a new larger image, using center-based-indexes in both source and destination images (e.g. (0,0) to (0,0), (0,1) to (0,1), (1,-2) to (1,-2))
Assuming your FFT uses regular floating-point cells and not complex cells, the complex image has to be of size 2*ceil(n/2) * 2*ceil(m/2), even if you don't need a whole power of 2 (since it has half the samples, but the samples are complex).
If your image has more than one color channel, you will first have to reshape it so that the channel is the most significant dimension in the sub-pixel ordering (planar rather than interleaved), instead of the least significant. You can reshape and pad in one go to save time and space.
Don't forget the FFTSHIFT after the IFFT. (To swap the quadrants.)
The result of the IFFT is 0...n-1. You have to take pixels floor(n/2)+1..n-1 and move them before 0...floor(n/2).
This is done by copying pixels to a new image, copying floor(n/2)+1 to memory-location 0, floor(n/2)+2 to memory-location 1, ..., n-1 to memory-location floor(n/2), then 0 to memory-location ceil(n/2), 1 to memory-location ceil(n/2)+1, ..., floor(n/2) to memory-location n-1.
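That pixel shuffle is a circular shift; in numpy it is np.fft.fftshift, or equivalently np.roll by floor(n/2) per axis:
import numpy as np
src = np.arange(7)             # memory locations 0..n-1 for n = 7
print(np.roll(src, 7 // 2))    # [4 5 6 0 1 2 3]
print(np.fft.fftshift(src))    # identical result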
When you multiply in the frequency domain, remember that the samples are complex (one cell real then one cell imaginary) so you have to use a complex multiplication.
The result might need dividing by N^2 * M^2, where N is the size of n after padding (and likewise for M and m). You can tell by (a) looking at the frequency-domain values of the identity matrix, or (b) comparing the result to the input.
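Putting the pieces together, a minimal numpy round trip with the identity kernel (note numpy's ifft2 already divides by N*M, whereas with FFTW's unnormalized inverse you would do that division yourself):
import numpy as np
img = np.random.rand(8, 8)
kernel = np.zeros((8, 8))
kernel[0, 0] = 1.0             # identity kernel at the FFT's origin
out = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)).real
print(np.allclose(out, img))   # True: the image comes back unchanged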
I think that your understanding of the identity kernel may be off. An identity kernel should have the 1 at the center of the 2D kernel, not at the 0, 0 position.
For example, for a 3 x 3 kernel, you have yours set up as follows:
1, 0, 0
0, 0, 0
0, 0, 0
It should be
0, 0, 0
0, 1, 0
0, 0, 0
Also check this out: What is the "do-nothing" convolution kernel
And look here, at the bottom of page 3:
http://www.fmwconcepts.com/imagemagick/digital_image_filtering.pdf
I took the component-wise product between the image and kernel in frequency domain, then did the inverse fft. Theoretically I should be able to get the identical image back.
I don't think that doing a forward transform with a non-FFT kernel, and then an inverse FFT, should lead to any expectation of getting the original image back - but perhaps I'm just misunderstanding what you were trying to say there...
