Reconciling convolution, fft and manual DFT results

In Octave, I am playing with signal processing primitives, attempting to verify the convolution theorem in multiple ways: convolution in the time domain should be equivalent to point-wise multiplication in the frequency domain.
I consider three routes to reconstruct the original signal:
The fft and ifft functions
The conv function
A manually constructed DFT matrix.
I am attaching my working code and its output; I would be glad for pointers on where the bugs are.
N = 512; % number of points
t = 0:N-1; % [0,1,2,...,N-1]
h = exp(-t); % filter impulse response
H = fft(h); % filter frequency response
x = (1+t) .* sin(sqrt(1+t)); % (input signal of our choice)
y1 = conv(x,h,"same"); % Direct convolution
y2 = ifft(fft(x) .* H); % FFT convolution
T = transpose(t) * t;
W = exp(j * 2*pi/N * T); % DFT matrix
y3 = (x * W .* H) * W/N; % "Manual" convolution
lw = 2
plot(t,
     x, ";orig;", "linewidth", lw+1,
     y1, ";conv;", "linestyle", "--", "linewidth", lw,
     real(y2), ";fft;", "linestyle", ":", "linewidth", lw,
     real(y3), ";manual;", "linestyle", "-.", "linewidth", lw)
set(gca, "fontsize", 20, "linewidth", lw)
In the first case (yellow), I am able to reconstruct the signal, but the scaling is not right (in fact the scaling is wrong in every case).
In the second case (red), it looks like the result is shifted, and half of the signal is lost.
In the third case (purple), I get something equivalent to the fft result but flipped horizontally.

Issues:
Your definition of the kernel is not right. conv expects the origin of the kernel to be in the middle of the array (at floor(N/2) + 1), so your t, for the purposes of building the kernel, should be t-floor(N/2) (this puts the 0 at the right location). Also, the kernel should be normalized to avoid changing the signal strength with the convolution. Just divide h by its sum.
But now, H = fft(h) will be wrong, because fft (and the DFT) expects the origin to be on the first element of the array. Use ifftshift to circularly shift the kernel array to put its origin on the first element: H = fft(ifftshift(h)).
For the manual DFT, you use the same matrix for the forward and the inverse transforms. You need to conjugate transpose the matrix to compute the inverse transform: y3 = (x * W .* H) * W'/N.
This is my corrected code (also some changes in the plotting to make it compatible with MATLAB):
N = 512; % number of points
t = 0:N-1; % [0,1,2,...,N-1]
h = exp(-abs(t-floor(N/2))); % filter impulse response (this definition is symmetric around the origin; for a causal filter, set the left half of h to zero)
h = h / sum(h); % normalize
H = fft(ifftshift(h)); % filter frequency response
x = (1+t) .* sin(sqrt(1+t)); % (input signal of our choice)
y1 = conv(x,h,"same"); % Direct convolution
y2 = ifft(fft(x) .* H); % FFT convolution
T = transpose(t) * t;
W = exp(2j*pi / N * T); % DFT matrix
y3 = (x * W .* H) * W'/N; % "Manual" convolution
lw = 2;
clf
hold on
plot(t,x, "displayname", "orig", "linewidth", lw+1)
plot(t,y1, "displayname", "conv", "linestyle", "--", "linewidth", lw)
plot(t,real(y2), "displayname", "fft", "linestyle", ":", "linewidth", lw)
plot(t,real(y3), "displayname", "manual", "linestyle", "-.", "linewidth", lw)
legend
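To verify that the three routes now agree, here is a quick check that can be appended to the corrected script (my addition, not part of the original answer). y2 and y3 are both circular convolutions and should match to floating-point precision; y1 is a truncated linear convolution, so it matches them away from the array boundaries, where circular wrap-around cannot reach:
% Sanity check, run after the corrected script above:
printf("max|y2 - y3| = %g\n", max(abs(y2 - y3)));             % ~0 (fp noise)
k = 60 : N-60;                     % interior samples, away from the edges
printf("max|y1 - y2| = %g\n", max(abs(y1(k) - real(y2(k))))); % ~0 (fp noise)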

(just transforming my comment into an answer)
There are several flaws:
you are comparing the convolution of x with h, to x: why do you expect the same amplitudes?
the "same" option doesn't do what you think it does. y1 = conv(x,h); y1 = y1(1:N); gives the expected result.
one of the two W has to be transposed-conjugated: y3 = (x * W' .* H) * W/N;
Corrected code:
N = 512; % number of points
t = 0:N-1; % [0,1,2,...,N-1]
h = exp(-t); % filter impulse response
H = fft(h); % filter frequency response
x = (1+t) .* sin(sqrt(1+t)); % (input signal of our choice)
y1 = conv(x,h); y1 = y1(1:N); % Direct convolution
y2 = ifft(fft(x) .* H); % FFT convolution
T = transpose(t) * t;
W = exp(j * 2*pi/N * T); % DFT matrix
y3 = (x * W' .* H) * W/N; % "Manual" convolution
lw = 2
plot(t,
     x, ";orig;", "linewidth", lw+1,
     y1, ";conv;", "linestyle", "--", "linewidth", lw,
     real(y2), ";fft;", "linestyle", ":", "linewidth", lw,
     real(y3), ";manual;", "linestyle", "-.", "linewidth", lw)


Gradient function not able to find optimal theta but normal equation does

I tried implementing my own linear regression model in Octave with some sample data, but the theta I get does not seem to be correct: it does not match the theta provided by the normal equation, which gives the correct values. Running my model (with different alpha and iterations) on the data from Andrew Ng's machine learning course does, however, give the proper theta for the hypothesis. I have tweaked alpha and the iterations so that the cost function decreases. This is the plot of the cost function against iterations. As you can see, the cost decreases and plateaus, but not to a low enough value. Can somebody help me understand why this is happening and what I can do to fix it?
Here is the data (The first column is the x values, and the second column is the y values):
20,48
40,55.1
60,56.3
80,61.2
100,68
Here is the graph of the data and the fits plotted by gradient descent (GD) and by the normal equation (NE).
Code for the main script:
clear; close all; clc;
%loading the data
data = load("data1.txt");
X = data(:,1);
y = data(:,2);
%Plotting the data
figure
plot(X,y, 'xr', 'markersize', 7);
xlabel("Mass in kg");
ylabel("Length in cm");
X = [ones(length(y),1), X];
theta = ones(2, 1);
alpha = 0.000001; num_iter = 4000;
%Running gradientDescent
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);
%Running Normal equation
opt_theta_norm = pinv(X' * X) * X' * y;
%Plotting the hypothesis for GD and NE
hold on
plot(X(:,2), X * opt_theta);
plot(X(:,2), X * opt_theta_norm, 'g-.', "markersize",10);
legend("Data", "GD", "NE");
hold off
%Plotting values of previous J with each iteration
figure
plot(1:numel(J_history), J_history);
xlabel("iterations"); ylabel("J");
Function for finding gradientDescent:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iter)
  m = length(y);
  J_history = zeros(num_iter, 1);
  for iter = 1:num_iter
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = computeCost(X, y, theta);
  endfor
endfunction
Function for computing cost:
function J = computeCost (X, y, theta)
  J = 0;
  m = length(y);
  errors = X * theta - y;
  J = sum(errors .^ 2) / (2 * m);
endfunction
Try alpha = 0.0001 and num_iter = 400000. This will solve your problem!
The problem with your code is that the learning rate is far too small, which slows down convergence. On top of that, you are not giving it enough time to converge: 4000 iterations is very few at that learning rate.
Summarising, the problem is: too small a learning rate + too few iterations.
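A minimal way to confirm this, reusing the variables and the gradientDescent function from the script above (my sketch, not the asker's code):
% With a larger learning rate and many more iterations, gradient descent
% should land (nearly) on the normal-equation solution:
alpha = 0.0001; num_iter = 400000;
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);
disp([opt_theta, opt_theta_norm])  % the two columns should now agree closely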

linear regression with one variable Gradient descent

I want to ask how this equation (the gradient descent update)
theta_j := theta_j - alpha * (1/m) * sum_{i=1..m} (h(x^(i)) - y^(i)) * x_j^(i)
can be written in Octave this way:
predictions = X * theta;
delta = (1/m) * X' * (predictions - y);
theta = theta - alpha * delta;
I don't understand where the transpose comes from, and how the equation was converted to this form.
The scalar product X.Y is mathematically sum(x_i * y_i) and can be written as X' * Y in Octave when X and Y are column vectors.
There are other ways to write a scalar product in Octave; see
https://octave.sourceforge.io/octave/function/dot.html
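A tiny illustration of the point (my sketch, not from the original answer):
% For column vectors, X' * Y computes the scalar product sum(x_i * y_i):
X = [1; 2; 3];
Y = [4; 5; 6];
X' * Y        % ans = 32
sum(X .* Y)   % ans = 32, element-wise product, then summed
dot(X, Y)     % ans = 32, via the built-in dot() function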
The question seems to be, given an example where:
X = randn(m, k); % m 'input' horizontal-vectors of dimensionality k
y = randn(m, n); % m 'target' horizontal-vectors of dimensionality n
theta = randn(k, n); % a (right) transformation from k to n dimensional
% horizontal-vectors
h = X * theta; % creates m rows of n-dimensional horizontal vectors
how is it that the following code
delta = zeros(k, n);
for j = 1 : k        % iterating over all dimensions of the input
  for l = 1 : n      % iterating over all dimensions of the output
    for i = 1 : m    % iterating over all observations for that (j, l) pair
      delta(j, l) += (1/m) * (h(i, l) - y(i, l)) * X(i, j);  % note: X, not x
    end
    theta(j, l) = theta(j, l) - alpha * delta(j, l);
  end
end
can be vectorised as:
h = X * theta ;
delta = (1/ m) * X' * (h - y);
theta = theta - alpha * delta;
To confirm such a vectorised formulation makes sense, it always helps to note (e.g. below each line) the dimensions of the objects involved in the matrix / vectorised operations:
h = X * theta ;
% [m, n] [m, k] [k, n]
delta = (1/ m) * X' * (h - y);
% [k, n] [1, 1] [k, m] [m, n]
theta = theta - alpha * delta;
% [k, n] [k,n] [1, 1] [k, n]
Hopefully now it will become more obvious that they are equivalent.
With respect to the X' * D calculation (where D = predictions - y), you can see that:
multiplying the 1st row of X' by the 1st column of D is the same as summing over all m observations for j = 1 and l = 1, and placing that result at position [1, 1] of the output. Moving along the columns of D while still multiplying by the 1st row of X', you simply move along the n output dimensions of D, placing each result accordingly in the output. Similarly, moving along the rows of X', you move along the k input dimensions, performing the same process for every column of D, until the matrix multiplication has covered all rows of X' and all columns of D.
If you follow the logic above, you will see that the summations involved are exactly the same as in the for loop formulation, but we managed to avoid using a for loop and use matrix operations instead.
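To see this numerically, here is a small self-contained check (my sketch; the sizes are arbitrary):
% Compare the triple-loop gradient with the vectorised one:
m = 7; k = 3; n = 2;
X = randn(m, k); y = randn(m, n); theta = randn(k, n);
h = X * theta;
delta_loop = zeros(k, n);
for j = 1:k
  for l = 1:n
    for i = 1:m
      delta_loop(j, l) += (1/m) * (h(i, l) - y(i, l)) * X(i, j);
    end
  end
end
delta_vec = (1/m) * X' * (h - y);
disp(max(abs(delta_loop(:) - delta_vec(:))))  % ~1e-16: they agree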

How to convert bounding box (x1, y1, x2, y2) to YOLO Style (X, Y, W, H)

I'm training a YOLO model, and I have the bounding boxes in this format:
x1, y1, x2, y2 => ex (100, 100, 200, 200)
I need to convert them to the YOLO format, which looks like:
X, Y, W, H => 0.436262 0.474010 0.383663 0.178218
I have already calculated the center point (X, Y), the height H, and the width W. But I still need a way to convert them to the floating-point numbers shown above.
for those looking for the reverse of the question (yolo format to normal bbox format)
def yolobbox2bbox(x, y, w, h):
    x1, y1 = x - w/2, y - h/2
    x2, y2 = x + w/2, y + h/2
    return x1, y1, x2, y2
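For example, yolobbox2bbox(0.5, 0.5, 0.2, 0.2) returns (0.4, 0.4, 0.6, 0.6), still in normalized units; multiply by the image width and height to get pixel coordinates.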
Here's a code snippet in Python to convert (x, y) coordinates to the YOLO format:
from PIL import Image

def convert(size, box):
    # size is (width, height); box is (xmin, xmax, ymin, ymax)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0   # box center, x
    y = (box[2] + box[3]) / 2.0   # box center, y
    w = box[1] - box[0]           # box width
    h = box[3] - box[2]           # box height
    x = x * dw                    # normalize everything to [0, 1]
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)

im = Image.open(img_path)
w = int(im.size[0])
h = int(im.size[1])
print(xmin, xmax, ymin, ymax)  # define your x, y coordinates
b = (xmin, xmax, ymin, ymax)
bb = convert((w, h), b)
Check my sample program to convert from LabelMe annotation tool format to Yolo format https://github.com/ivder/LabelMeYoloConverter
There is a more straightforward way to do this with pybboxes. Install it with:
pip install pybboxes
Use it as below:
import pybboxes as pbx
voc_bbox = (100, 100, 200, 200)
W, H = 1000, 1000 # WxH of the image
pbx.convert_bbox(voc_bbox, from_type="voc", to_type="yolo", image_size=(W,H))
>>> (0.15, 0.15, 0.1, 0.1)
Note that, converting to YOLO format requires the image width and height for scaling.
YOLO normalises the image space to run from 0 to 1 in both the x and y directions. To convert between your (x, y) coordinates and YOLO (u, v) coordinates, you need to transform your data as u = x / XMAX and v = y / YMAX, where XMAX and YMAX are the maximum coordinates for the image array you are using.
This all depends on the image arrays being oriented the same way.
Here is a C function to perform the conversion
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <math.h>

struct yolo {
    float u;
    float v;
};

struct yolo
convert (unsigned int x, unsigned int y, unsigned int XMAX, unsigned int YMAX)
{
    struct yolo point;
    if (XMAX && YMAX && (x <= XMAX) && (y <= YMAX))
    {
        point.u = (float)x / (float)XMAX;
        point.v = (float)y / (float)YMAX;
    }
    else
    {
        point.u = INFINITY;
        point.v = INFINITY;
        errno = ERANGE;
    }
    return point;
}/* convert */

int main()
{
    struct yolo P;
    P = convert (99, 201, 255, 324);
    printf ("Yolo coordinate = <%f, %f>\n", P.u, P.v);
    exit (EXIT_SUCCESS);
}/* main */
There are two potential solutions. First of all you have to understand if your first bounding box is in the format of Coco or Pascal_VOC. Otherwise you can't do the right math.
Here is the formatting;
Coco Format: [x_min, y_min, width, height]
Pascal_VOC Format: [x_min, y_min, x_max, y_max]
Here are some Python Code how you can do the conversion:
Converting Coco to Yolo
# Convert Coco bb to Yolo
def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    return [((2*x1 + w) / (2*image_w)), ((2*y1 + h) / (2*image_h)), w / image_w, h / image_h]
Converting Pascal_voc to Yolo
# Convert Pascal_Voc bb to Yolo
def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
    return [((x2 + x1) / (2*image_w)), ((y2 + y1) / (2*image_h)), (x2 - x1) / image_w, (y2 - y1) / image_h]
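As a quick check, pascal_voc_to_yolo(100, 100, 200, 200, 1000, 1000) returns [0.15, 0.15, 0.1, 0.1], matching the pybboxes output shown earlier for the same box and image size.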
If need additional conversions you can check my article at Medium: https://christianbernecker.medium.com/convert-bounding-boxes-from-coco-to-pascal-voc-to-yolo-and-back-660dc6178742
For YOLO format to (x1, y1, x2, y2) format (here dw and dh must be the image width and height in pixels; the original snippet left them undefined, so they are passed in explicitly):
def yolobbox2bbox(x, y, w, h, dw, dh):
    # dw, dh: image width and height in pixels
    x1 = int((x - w / 2) * dw)
    x2 = int((x + w / 2) * dw)
    y1 = int((y - h / 2) * dh)
    y2 = int((y + h / 2) * dh)
    # clamp to the image boundaries
    if x1 < 0:
        x1 = 0
    if x2 > dw - 1:
        x2 = dw - 1
    if y1 < 0:
        y1 = 0
    if y2 > dh - 1:
        y2 = dh - 1
    return x1, y1, x2, y2
There are two things you need to do:
Divide the coordinates by the image size to normalize them to [0..1] range.
Convert (x1, y1, x2, y2) coordinates to (center_x, center_y, width, height).
If you're using PyTorch, Torchvision provides a function that you can use for the conversion:
from torch import tensor
from torchvision.ops import box_convert
image_size = tensor([608, 608])
boxes = tensor([[100, 100, 200, 200], [300, 300, 400, 400]], dtype=float)
boxes[:, :2] /= image_size
boxes[:, 2:] /= image_size
boxes = box_convert(boxes, "xyxy", "cxcywh")
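For example, the first box (100, 100, 200, 200) in a 608 x 608 image becomes roughly (0.2467, 0.2467, 0.1645, 0.1645): center (150/608, 150/608) and size (100/608, 100/608).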
Just reading the answers, I was also looking for this, but I found the following more informative about what is happening at the backend.
From here: Source
Assuming x/ymin and x/ymax are your bounding corners, top left and bottom right respectively. Then:
x = xmin
y = ymin
w = xmax - xmin
h = ymax - ymin
You then need to normalize these, which means giving them as a proportion of the whole image, so simply divide each value by its respective size from the values above:
x = xmin / width
y = ymin / height
w = (xmax - xmin) / width
h = (ymax - ymin) / height
This assumes a top-left origin, you will have to apply a shift factor if this is not the case.

Gradient Descent Octave Code

I need help completing this function. I am getting an error while trying to compute derJ:
error: X(0,_): subscripts must be either integers 1 to (2^63)-1 or logicals
My code:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iters)
  m = length (y);  % number of training examples
  J_history = zeros (num_iters, 1);
  for iter = 1 : num_iters
    predictions = X * theta;  % hypothesis
    % derivative term for cost function
    derJ = (1 / m) * sum ( (predictions - y) * X(iter-1, 2) );
    % updating theta values
    theta = theta - (alpha * derJ);
    J_history(iter) = computeCost (X, y, theta);
  end
end
Your code states X(iter - 1, 2), but in your for loop iter starts from 1.
Therefore, in the very first iteration, X(iter - 1, 2) evaluates to X(0, 2), and 0 is not a valid index in MATLAB/Octave: array indexing starts at 1.
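Beyond the indexing error, the derivative term should not depend on the loop counter at all. A sketch of the usual vectorised update, in the same form used in the earlier gradient-descent question on this page:
for iter = 1 : num_iters
  predictions = X * theta;                    % hypothesis
  % gradient of the cost over all m examples at once
  derJ = (1 / m) * (X' * (predictions - y));
  theta = theta - alpha * derJ;               % update theta
  J_history(iter) = computeCost (X, y, theta);
end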

In case of logistic regression, how should I interpret this learning curve between cost and number of examples?

I have obtained the following learning curve when plotting the error cost against the number of training examples (in hundreds in the graph) for the training and cross-validation sets. Can someone please tell me if this learning curve is even possible? I am under the impression that the cross-validation error should decrease as the number of training examples increases.
Learning Curve. Note that the x axis denotes the number of training examples in 100s.
EDIT :
This is the code which I use to calculate the 9 values for plotting the learning curves.
X is the 2D matrix of the training set examples. It is of dimensions m x (n+1). y is of dimensions m x 1, and each element has value 1 or 0.
for j = 1:9
  disp(j)
  [theta, J] = trainClassifier(X(1:(j*100),:), y(1:(j*100)), lambda);
  [error_train(j), grad] = costprediciton_train(theta, X(1:(j*100),:), y(1:(j*100)));
  [error_cv(j), grad] = costfunction_test2(theta, Xcv(1:(j*100),:), ycv(1:(j*100)));
end
The code I use for finding the optimal value of Theta from the training set.
% Train the classifier. Return theta
function [optTheta, J] = trainClassifier(X, y, lambda)
  [m, n] = size(X);
  initialTheta = zeros(n, 1);
  options = optimset('GradObj', 'on', 'MaxIter', 100);
  % note: the handle must use @, not # (# starts a comment in Octave)
  [optTheta, J, Exit_flag] = fminunc(@(t)(regularizedCostFunction(t, X, y, lambda)), initialTheta, options);
end
% regularized cost
function [J, grad] = regularizedCostFunction(theta, X, y, lambda)
  [m, n] = size(X);
  h = sigmoid(X * theta);
  temp1 = -1 * (y .* log(h));
  temp2 = (1 - y) .* log(1 - h);
  thetaT = theta;
  thetaT(1) = 0;
  correction = sum(thetaT .^ 2) * (lambda / (2 * m));
  J = sum(temp1 - temp2) / m + correction;
  grad = (X' * (h - y)) * (1/m) + thetaT * (lambda / m);
end
The code I use to calculate the error cost when predicting on the training set (the code for the CV set error is similar):
Theta has dimensions (n+1) x 1 and consists of the coefficients of the features in the hypothesis function.
function [J, grad] = costprediciton_train(theta, X, y)
  [m, n] = size(X);
  h = sigmoid(X * theta);
  temp1 = y .* log(h);
  temp2 = (1 - y) .* log(1 - h);
  J = -sum(temp1 + temp2) / m;
  t = h - y;
  grad = (X' * t) * (1/m);
end
function [J, grad] = costfunction_test2(theta, X, y)
  m = length(y);
  h = sigmoid(X * theta);
  temp1 = y .* log(h);
  temp2 = (1 - y) .* log(1 - h);
  J = -sum(temp1 + temp2) / m;
  grad = (X' * (h - y)) * (1/m);
end
The Sigmoid function:
function g = sigmoid(z)
  g = zeros(size(z));
  den = 1 + exp(-1 * z);
  g = 1 ./ den;
end
