I am trying to filter a signal using OpenCV's dft function. The way I try to do this is by taking the signal in the time domain:
x = [0.0201920000000000 -0.0514940000000000 0.0222140000000000 0.0142460000000000 -0.00313500000000000 0.00270600000000000 0.0111770000000000 0.0233470000000000 -0.00162700000000000 -0.0306280000000000 0.0239410000000000 -0.0225840000000000 0.0281410000000000 0.0265510000000000 -0.0272180000000000 0.0223850000000000 -0.0366850000000000 0.000515000000000000 0.0213440000000000 -0.0107180000000000 -0.0222150000000000 -0.0888300000000000 -0.178814000000000 -0.0279280000000000 -0.144982000000000 -0.199606000000000 -0.225617000000000 -0.188347000000000 0.00196200000000000 0.0830530000000000 0.0716730000000000 0.0723950000000000]
Convert it to the Fourier domain using:
cv::dft(x, x_fft, cv::DFT_COMPLEX_OUTPUT, 0);
Eliminate the unwanted frequencies:
for (int k = 0; k < 32; k++) {
    if (k == 0 || k > 6) {
        x_fft.ptr<float>(0)[2*k + 0] = 0;
        x_fft.ptr<float>(0)[2*k + 1] = 0;
    }
}
Convert it back to the time domain:
cv::dft(x_fft, x_filt, cv::DFT_INVERSE, 0);
In order to check my results I compared them to Matlab. I took the same signal x and converted it to the Fourier domain using x_mfft = fft(x);. The results are similar to the ones I get from OpenCV, except that in OpenCV I only get the left half of the spectrum, while in Matlab I get the symmetric values too.
After this I set the values of x_mfft(1) and x_mfft(8:32) to 0 in Matlab, and now the signals look exactly the same, except that in Matlab they are in complex form, while in OpenCV they are separated, real part in one channel, imaginary part in the other.
The problem is that when I perform the inverse transform in Matlab using x_mfilt = ifft(x_mfft), the results are completely different from what I get using OpenCV.
Matlab:
0.0126024108604191 + 0.0100628178150509i 0.00278762121814893 - 0.00615997579216921i 0.0116716145588075 - 0.0150834711251450i 0.0204808089882897 - 0.00937680194210788i 0.0187164132302469 - 0.000843687942567208i 0.0132322795522116 - 0.000108642129381095i 0.0140282455278201 - 0.00325620843335947i 0.0190436542174946 - 0.000556561558544529i 0.0182379867325824 + 0.00764390022568001i 0.00964801276734883 + 0.0107158342431018i 0.00405220362962359 + 0.00339496875258604i 0.0108096973356501 - 0.00476499376334313i 0.0236507440224628 - 0.000415067678294738i 0.0266197220512826 + 0.0154626911663024i 0.0142805873081583 + 0.0267004219364679i 0.000314527358302778 + 0.0215255889620223i 0.00173512964620177 + 0.00865151513638104i 0.0169666351363477 + 0.00836162056544561i 0.0255915540012784 + 0.0277878383595920i 0.0118710562486680 + 0.0506446948330055i -0.0160165379892836 + 0.0553846122152651i -0.0354343989166415 + 0.0406080858067314i -0.0370261047451452 + 0.0261077990289579i -0.0365120038155127 + 0.0268311542287801i -0.0541841640123775 + 0.0312446266697320i -0.0854132555297956 + 0.0125342802025550i -0.0989182320365535 - 0.0377079727602073i -0.0686133217915410 - 0.0925138855355046i -0.00474198249025186 - 0.111728716441247i 0.0515933837210975 - 0.0814138940625859i 0.0663201317560107 - 0.0279433757588921i 0.0426055814586485 + 0.00821080477569232i
OpenCV after cv::dft(x_fft, x_filt, cv::DFT_INVERSE, 0);
Channel 1:
0.322008 -0.197121 -0.482671 -0.300055 -0.026996 -0.003475 -0.104199 -0.017810 0.244606 0.342909 0.108642 -0.152477 -0.013281 0.494806 0.854412 0.688818 0.276848 0.267571 0.889207 1.620622 1.772298 1.299452 0.835450 0.858602 0.999833 0.401098 -1.206658 -2.960446 -3.575316 -2.605239 -0.894184 0.262747
Channel 2:
0.403275 0.089205 0.373494 0.655387 0.598925 0.423432 0.448903 0.609397 0.583616 0.308737 0.129670 0.345907 0.756820 0.851827 0.456976 0.010063 0.055522 0.542928 0.818924 0.379870 -0.512527 -1.133893 -1.184826 -1.168379 -1.733893 -2.733226 -3.165383 -2.195622 -0.151738 1.650990 2.122242 1.363375
What am I missing? Shouldn't the results be similar? How can I check if the inverse transform in OpenCV is done correctly?
Later EDIT:
After struggling with the problem for a few hours I decided to plot the results from Matlab and OpenCV, and to my surprise they were very similar.
(Plots of the imaginary and real parts omitted.)
So obviously it's something about a scale factor. After dividing them element by element, this factor turns out to be 32, the length of the signal. Can someone explain why this happens?
The obvious solution is to use cv::dft(x_fft, x_filt, cv::DFT_INVERSE + cv::DFT_SCALE, 0);, so I guess this topic is answered, but I'm still interested in why it is this way.
There is no standard scale factor used by all FFT libraries. Some use none, some include a factor of 1/N, some 1/sqrt(N). You have to test or look in the documentation for each particular library. In OpenCV's case, neither the forward nor the inverse transform scales by default, so a forward transform followed by an unscaled inverse returns the signal multiplied by N; passing cv::DFT_SCALE to the inverse divides by N and restores the original amplitude.
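For reference, here is a minimal sketch (not taken from the question) illustrating that convention with a tiny 4-sample signal:

// Minimal sketch: an unscaled forward + inverse DFT in OpenCV returns the
// signal multiplied by N, while adding cv::DFT_SCALE to the inverse restores
// the original values.
#include <opencv2/core.hpp>
#include <iostream>

int main()
{
    cv::Mat x = (cv::Mat_<float>(1, 4) << 1.f, 2.f, 3.f, 4.f);

    cv::Mat X, unscaled, scaled;
    cv::dft(x, X, cv::DFT_COMPLEX_OUTPUT);

    // Inverse without DFT_SCALE: result is N * x (here N = 4).
    cv::dft(X, unscaled, cv::DFT_INVERSE | cv::DFT_REAL_OUTPUT);

    // Inverse with DFT_SCALE: result is x again.
    cv::dft(X, scaled, cv::DFT_INVERSE | cv::DFT_REAL_OUTPUT | cv::DFT_SCALE);

    std::cout << "unscaled: " << unscaled << std::endl; // [4, 8, 12, 16]
    std::cout << "scaled:   " << scaled   << std::endl; // [1, 2, 3, 4]
    return 0;
}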
Related
I was writing a program that would scramble the colors of each pixel in an image so that the resulting image would be very noisy and hard to recognize, but would appear normal in grayscale.
As soon as I wrote the program, I noticed a problem. The algorithm I used assumed that the way to calculate grayscale was something like:
Grayscale(Pixel p) {
    average := (p.red + p.green + p.blue) / 3
    p.red := p.green := p.blue := average;
}
I later realized that the numbers have weights attached to them:
Grayscale(Pixel p) {
    average := (c1 * p.red^gamma) + (c2 * p.green^gamma) + (c3 * p.blue^gamma)
    p.red := p.green := p.blue := average
}
and thus my program did not work.
I was specifically targeting the iOS grayscale filter, although the algorithm should be the same for any system.
(side note: I don't know the weights of the iOS values, so if anyone could give me those in the algorithm as well, that'd be pretty cool. I found this thread discussing a similar problem but it doesn't answer my question: https://www.reddit.com/r/compsci/comments/a1yrf1/what_algorithm_do_ios_devices_use_to_grayscale/ )
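For reference, a commonly used weighting (I cannot confirm it is what iOS actually uses, and it ignores gamma) is the ITU-R BT.601 luma formula. A minimal sketch under that assumption:

// Hedged sketch: grayscale using the ITU-R BT.601 luma weights
// (0.299, 0.587, 0.114). Whether iOS uses these, the BT.709 weights
// (0.2126, 0.7152, 0.0722), or applies gamma first is an assumption
// I cannot confirm.
struct Pixel { double red, green, blue; };

void grayscaleBT601(Pixel &p)
{
    double luma = 0.299 * p.red + 0.587 * p.green + 0.114 * p.blue;
    p.red = p.green = p.blue = luma;
}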
Assuming that the grayscale conversion uses some sort of weights, what algorithm could I use to change the RGB values of a given pixel while keeping its grayscale value the same?
The lens model in OpenCV is a sort of correction model which maps a real (distorted) position to the corresponding ideal (undistorted) position:
x_corrected = x_distorted ( 1 + k_1 * r^2 + k_2 * r^4 + ...),
y_corrected = y_distorted ( 1 + k_1 * r^2 + k_2 * r^4 + ...),
where r^2 = x_distorted^2 + y_distorted^2 in the normalized image coordinate (the tangential distortion is omitted for simplicity). This is also found in Z. Zhang: "A Flexible New Technique for Camera Calibration," TPAMI 2000, and also in "Camera Calibration Toolbox for Matlab" by Bouguet.
On the other hand, Bradski and Kaehler: "Learning OpenCV" introduces on p. 376 the lens model as a distortion model which maps an ideal position to the corresponding distorted position:
x_distorted = x_corrected ( 1 + k'_1 * r'^2 + k'_2 * r'^4 + ...),
y_distorted = y_corrected ( 1 + k'_1 * r'^2 + k'_2 * r'^4 + ...),
where r'^2 = x_corrected^2 + y_corrected^2 in the normalized image coordinate.
Hartley and Zisserman: "Multiple View Geometry in Computer Vision" also describes this model.
I understand that both the correction model and the distortion model have advantages and disadvantages in practice. For example, the former makes correcting detected feature point locations easy, while the latter makes undistorting the entire image straightforward.
My question is: why do they share the same polynomial expression, when they are supposed to be inverses of each other? I could find this document evaluating the invertibility, but its theoretical background is not clear to me.
Thank you for your help.
I think the short answer is: they are just different models, so they're not supposed to be each other's inverse. Like you already wrote, each has its own advantages and disadvantages.
As for invertibility, this depends on the order of the polynomial. A 2nd-order (quadratic) polynomial is easily inverted. A 4th-order polynomial requires some more work, but can still be inverted analytically. But as soon as you add a 6th-order term, you'll probably have to resort to numeric methods, because a 5th-order or higher polynomial is not analytically invertible in the general case.
According to the Taylor expansion, every sufficiently smooth function can be written as c0 + c1*x + c2*x^2 + c3*x^3 + c4*x^4 + ...
The goal is just to discover the constants.
In our particular case the expression must be symmetric in x and -x (an even function), so the constants of x, x^3, x^5, x^7, ... are equal to zero.
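To make that concrete, here is a short worked sketch (my own derivation, not from either book) of why the inverse of a radial series of this form is again a series of the same form, at least to leading order:

% Invert r_d = r_u (1 + k_1 r_u^2 + k_2 r_u^4 + ...) as a series in r_d.
\begin{align*}
r_d &= r_u \left(1 + k_1 r_u^2 + k_2 r_u^4 + \dots\right) && \text{(distortion series)} \\
r_u &= r_d \left(1 + k_1' r_d^2 + k_2' r_d^4 + \dots\right) && \text{(ansatz: same even-powered form)}
\end{align*}
Substituting the ansatz into the first line and matching powers of $r_d$ gives
\begin{align*}
k_1' = -k_1, \qquad k_2' = 3k_1^2 - k_2, \qquad \dots
\end{align*}
so the inverse mapping has the same polynomial shape with different coefficients; truncating that series is only an approximation, which is why exact inversion of higher-order models is done numerically.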
I am working on replicating a neural network. I'm trying to get an understanding of how the standard layer types work. In particular, I'm having trouble finding a description anywhere of how cross-channel normalisation layers behave on the backward-pass.
Since the normalization layer has no parameters, I could guess two possible options:
The error gradients from the next (i.e. later) layer are passed backwards without doing anything to them.
The error gradients are normalized in the same way the activations are normalized across channels in the forward pass.
I can't think of a reason why you'd do one over the other based on any intuition, which is why I'd like some help on this.
EDIT1:
The layer is a standard layer in caffe, as described here http://caffe.berkeleyvision.org/tutorial/layers.html (see 'Local Response Normalization (LRN)').
The layer's implementation in the forward pass is described in section 3.3 of the alexNet paper: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
EDIT2:
I believe the forward and backward pass algorithms are described in both the Torch library here: https://github.com/soumith/cudnn.torch/blob/master/SpatialCrossMapLRN.lua
and in the Caffe library here: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lrn_layer.cpp
Could anyone who is familiar with either or both of these please translate the method for the backward pass stage into plain English?
It uses the chain rule to propagate the gradient backwards through the local response normalization layer. In this sense it is somewhat similar to a nonlinearity layer (which also has no trainable parameters of its own, but does affect the gradients going backwards).
From the code in Caffe that you linked to, I see that they take the error in each neuron as a parameter, and compute the error for the previous layer by doing the following:
First, on the forward pass they cache a so-called scale, which is computed (in terms of the AlexNet paper; see the formula in section 3.3) as:
scale_i = k + alpha / n * sum(a_j ^ 2)
Here and below, sum is the sum indexed by j, running from max(0, i - n/2) to min(N - 1, i + n/2).
(Note that in the paper they do not normalize by n, so I assume this is something that Caffe does differently from AlexNet.) The forward pass is then computed as b_i = a_i * scale_i ^ -beta.
To backpropagate the error, let's say that the error coming from the next layer is be_i, and the error that we need to compute is ae_i. Then ae_i is computed as:
ae_i = scale_i ^ -beta * be_i - (2 * alpha * beta / n) * a_i * sum(be_j * b_j / scale_j)
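For clarity, here is a short chain-rule derivation of that expression (my own working, using the notation above, with b_i = a_i * scale_i^-beta):

\begin{align*}
\frac{\partial\,\mathrm{scale}_j}{\partial a_i}
  &= \frac{2\alpha}{n}\,a_i \quad \text{(when $i$ lies in the window of $j$, else $0$)} \\
\frac{\partial b_j}{\partial a_i}
  &= \delta_{ij}\,\mathrm{scale}_i^{-\beta}
     - \beta\,a_j\,\mathrm{scale}_j^{-\beta-1}\,\frac{\partial\,\mathrm{scale}_j}{\partial a_i}
   = \delta_{ij}\,\mathrm{scale}_i^{-\beta}
     - \frac{2\alpha\beta}{n}\,a_i\,\frac{b_j}{\mathrm{scale}_j} \\
ae_i &= \sum_j be_j\,\frac{\partial b_j}{\partial a_i}
   = \mathrm{scale}_i^{-\beta}\,be_i
     - \frac{2\alpha\beta}{n}\,a_i\sum_j \frac{be_j\,b_j}{\mathrm{scale}_j}
\end{align*}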
Since you are planning to implement it manually, I will also share two tricks that Caffe uses in their code that make the implementation simpler:
When you compute the addends for the sum, allocate an array of size N + n - 1, and pad it with n/2 zeros on each end. This way you can compute the sum from i - n/2 to i + n/2, without caring about going below zero and beyond N.
You don't need to recompute the sum on each iteration. Instead, compute the addends in advance (a_j^2 for the forward pass, be_j * b_j / scale_j for the backward pass), then compute the sum for i = 0, and for each consecutive i just add addend[i + n/2] and subtract addend[i - n/2 - 1]; this gives you the value of the sum for the new i in constant time.
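A minimal sketch of those two tricks together (padding plus a running window sum), with my own variable names rather than Caffe's:

// Sketch of the two tricks above: pad the per-channel addends with n/2 zeros
// on each side, then maintain a running window sum instead of re-summing.
// addend[i] is a_i^2 on the forward pass or be_i * b_i / scale_i on the
// backward pass; N is the number of channels, n the (odd) local window size.
#include <vector>

std::vector<double> windowSums(const std::vector<double>& addend, int n)
{
    const int N = static_cast<int>(addend.size());
    const int half = n / 2;

    // Padded copy of size N + n - 1, so indices i - n/2 .. i + n/2 never
    // fall outside the array.
    std::vector<double> padded(N + n - 1, 0.0);
    for (int i = 0; i < N; ++i)
        padded[i + half] = addend[i];

    std::vector<double> sums(N, 0.0);

    // Full sum for i = 0.
    double s = 0.0;
    for (int j = 0; j < n; ++j)
        s += padded[j];
    sums[0] = s;

    // Slide the window: add the element that enters, subtract the one that leaves.
    for (int i = 1; i < N; ++i) {
        s += padded[i + n - 1] - padded[i - 1];
        sums[i] = s;
    }
    return sums;
}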
Of course, you can also print the variables to observe how they change, or run in debug mode to see how the errors change as they pass through the net.
I have an alternative formulation of the backward pass and I don't know if it is equivalent to Caffe's.
Caffe's is:
ae_i = scale_i ^ -beta * be_i - (2 * alpha * beta / n) * a_i * sum(be_j * b_j / scale_j)
By differentiating the original expression
b_i = a_i * scale_i ^ -beta
I get
ae_i = scale_i ^ -beta * be_i - (2 * alpha * beta / n) * a_i * be_i * sum(ae_j) / scale_i ^ (-beta - 1)
The line in question is pretty contained:
w00 * ptr[0] + w01 * ptr[stride] + w10 * ptr[1] + w11 * ptr[stride+1]
Considering these variables are double (but I can downgrade to float), I think I can pass one value per register? Would it be more efficient to use the 2x2 matrix W directly?
EDIT 1:
This line is inside a loop that is fired hundreds of times per second and has real-time requirements. Instruments says this line takes 60% of the time of the loop.
EDIT 2:
This is the loop(s) I'm talking about:
for (int x = startingX; x < endingX; ++x)
{
    for (int y = startingY; y < endingY; ++y)
    {
        Matx21d position(x, y);

        // warp patch
        uint8_t *data;
        [self backwardWarpPatchWithWarpingMatrix:warpingMatrix withWarpData:&data withReferenceImage:_initialView withCenter:position];

        // check that the backward patch was successful
        if (!data)
            continue;

        // calculate zero mean (on the patch) sum of squared differences
        int ssd = [self computeZMSSDScoreWithX:x withY:y withCurrentTargetPatch:data];
        if (fabs(ssd) < bestSSD)
        {
            bestPosition = position;
            bestSSD = ssd;
        }
    }
}
backwardWarpPatchWithWarpingMatrix:
Matx22d warpingMatrixInverse = warpingMatrix.inv();
double wmi0 = warpingMatrixInverse(0,0), wmi1 = warpingMatrixInverse(0,1), wmi2 = warpingMatrixInverse(1,0), wmi3 = warpingMatrixInverse(1,1);
if (isnan(wmi0))
{
    warpingMatrixInverse = Matx22d::eye();
}

// Perform the warp on a larger patch.
int LEVEL_REF = 0, halfPatchSize = PATCH_SIZE/2;
Matx21d centerInLevel = center * (1.0 / (1 << LEVEL_REF));

__block Mat warped(PATCH_SIZE, PATCH_SIZE, CV_8UC1);
dispatch_apply(PATCH_SIZE, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t y)
{
    for (int x = 0; x < PATCH_SIZE; ++x)
    {
        double pp0 = x - halfPatchSize, pp1 = (double)y - halfPatchSize;
        Matx21d multiplication(wmi0 * pp0 + wmi1 * pp1, wmi2 * pp0 + wmi3 * pp1);
        Matx21d px(multiplication(0) + centerInLevel(0), multiplication(1) + centerInLevel(1));
        double warpedPixel = [self interpolatePointInImage:referenceImage withU:px(0) withV:px(1)];
        warped.at<uchar>(y,x) = (uint8_t)warpedPixel;
    }
});
computeReferencePatchScores:
// Integer pixel coordinates and sub-pixel offsets of (u, v).
int x = (int)u;
int y = (int)v;
float subpixX = u - x,
      subpixY = v - y,
      oneMinusSubpixX = 1.0 - subpixX,
      oneMinusSubpixY = 1.0 - subpixY;

// Bilinear interpolation weights of the four neighbouring pixels.
float w00 = oneMinusSubpixX * oneMinusSubpixY,
      w01 = oneMinusSubpixX * subpixY,
      w10 = subpixX * oneMinusSubpixY,
      w11 = 1.0f - w00 - w01 - w10;

// Weighted sum of the 2x2 neighbourhood around (x, y).
const int stride = (int)image.step.p[0];
uchar* ptr = image.data + y * stride + x;
return w00 * ptr[0] + w01 * ptr[stride] + w10 * ptr[1] + w11 * ptr[stride+1];
You typically don't translate a single line of code into assembly. For it to be worth writing in assembly, you have to first assume that you can generate better assembly than the compiler will. Sometimes that's true for vectorized code on NEON, but it's usually because you have special knowledge about a complex loop. You're unlikely to beat the compiler significantly on a single line of code (and will likely lose). Is this line part of a loop that you've profiled and identified as a major bottleneck? Have you already tried Accelerate? Have you analyzed the assembly the compiler is generating and found mistakes that it's making?
Trying to do this in ObjC++ is very inefficient. ObjC++ is a glue language for tying together C++ and ObjC; doing both in the same file imposes several performance costs, especially with ARC. Calling an ObjC method inside of a performance-critical inner-loop is very expensive in any case (even if there weren't mixed-in C++). You should never do any kind of function call (least of all an ObjC method dispatch) inside of a tight inner-loop. It's not clear where you're actually calling computeReferencePatchScores. The use of GCD here is probably hurting you more than helping (since it prevents the compiler from applying certain vector optimizations).
This is all to say: how a particular line of code is being compiled into assembly is by far the least of your problems in this code. Its structure is fighting clang's optimizer.
Step one is to step back and ask what computation you want to execute, and then read through the Core Image Programming Guide and the vImage Programming Guide and verify that it isn't already available. You might also look over OpenGL ES, but OpenGL is often a whole approach to drawing (so it's a bit more of a commitment). It looks like you're already using OpenCV, so make sure it doesn't have available functions to do what you want. (Most of what I see in there looks like stuff built into both OpenCV and vImage.)
The simplest way to improve performance without moving to more powerful frameworks is to move the entire loop into a single C++ function. Then the optimizer can see all the code and apply vector operations on its own. But the next step is to make use of the high-level high-performance frameworks already available.
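As a rough illustration of that first step (a sketch with made-up names, not a drop-in replacement for the methods above), the per-pixel sampling can be a plain inline C++ function that the compiler is free to inline and vectorize inside the loop:

// Sketch: plain C++ bilinear sampling with no ObjC dispatch, so the compiler
// can inline it into the surrounding loop. The name and the cv::Mat parameter
// are illustrative, not the original method signatures.
#include <opencv2/core.hpp>

static inline float bilinearSample(const cv::Mat& image, float u, float v)
{
    const int x = (int)u;
    const int y = (int)v;
    const float subpixX = u - x;
    const float subpixY = v - y;

    // Bilinear weights of the four neighbouring pixels.
    const float w00 = (1.0f - subpixX) * (1.0f - subpixY);
    const float w01 = (1.0f - subpixX) * subpixY;
    const float w10 = subpixX * (1.0f - subpixY);
    const float w11 = 1.0f - w00 - w01 - w10;

    const int stride = (int)image.step.p[0];
    const uchar* ptr = image.data + y * stride + x;
    return w00 * ptr[0] + w01 * ptr[stride] + w10 * ptr[1] + w11 * ptr[stride + 1];
}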
In any case, you'll want to sit down and carefully work through exactly the calculations you need to perform (I usually do this by hand on paper). Make sure you're not duplicating anything, that you need every calculation you're performing, and that each change you make still generates the same result.
This looks to be a 2x2 convolution. If the data set is large, then vImageConvolve_PlanarF with a 3x3 kernel with some zero padding in it will do the job. It tries to skip work on kernel elements that are 0. You would need to convert the data set to single precision.
If the data set is small, then you are probably stuck with scalar code performance. Inline the function if you can. Perhaps you can figure out how to aggregate a bunch of these together to take advantage of a heavier duty high performance routine.
However, if the weights change from pixel to pixel, then a convolution isn't going to work. You may look instead at the N-dimensional lookup table feature in vImage/Transform.h, if your data set is not huge.
I am a bit skeptical that the time is really spent just in that line. It is best to look at the assembly view in Instruments to see where the samples really land.
I am writing an image processing program using OpenCV.
I need to transform the image using several perspective transformations.
A perspective transformation is defined by a matrix. I know that we can get a complex affine transform by multiplying the simple transform matrices (rotation, translation, etc.).
But when I tried to multiply two perspective transformation matrices, I didn't get the transformation matrix that corresponds to applying the first and then the second transform.
So, how can I get the matrix of several consecutive perspective transformations?
Suppose you have two perspective matrices C: (x,y) -> (u,v) and D: (u,v) -> (r,g), and you want to find M: (x,y) -> (r,g).
You should substitute ui and vi from (1), (2) into equations (3), (4).
ui = (c00*xi + c01*yi + c02) / (c20*xi + c21*yi + c22) (1)
vi = (c10*xi + c11*yi + c12) / (c20*xi + c21*yi + c22) (2)
ri = (d00*ui + d01*vi + d02) / (d20*ui + d21*vi + d22) (3)
gi = (d10*ui + d11*vi + d12) / (d20*ui + d21*vi + d22) (4)
After expanding and collecting terms you can see that M = D*C (note the order: C, the transformation applied first, appears on the right).
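A quick numerical check of this (a sketch with arbitrary example matrices, not from the question), using OpenCV:

// Sketch: compose two perspective transforms by matrix multiplication and
// check that M = D * C maps points the same way as applying C, then D.
#include <opencv2/core.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Matx33d C(1.0,  0.1,  5.0,
                  0.0,  1.2,  3.0,
                  1e-3, 0.0,  1.0);   // (x, y) -> (u, v)
    cv::Matx33d D(0.9,  0.0, -2.0,
                  0.2,  1.0,  4.0,
                  0.0,  2e-3, 1.0);   // (u, v) -> (r, g)

    cv::Matx33d M = D * C;            // (x, y) -> (r, g), note the order

    std::vector<cv::Point2d> p{ cv::Point2d(10.0, 20.0) }, uv, twoSteps, oneStep;
    cv::perspectiveTransform(p, uv, C);         // (x, y) -> (u, v)
    cv::perspectiveTransform(uv, twoSteps, D);  // (u, v) -> (r, g)
    cv::perspectiveTransform(p, oneStep, M);    // directly with the product

    std::cout << twoSteps[0] << " vs " << oneStep[0] << std::endl; // should match
    return 0;
}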