PyTorch: Broadcasting a weighted sum of tensors

PyTorch: Broadcasting a weighted sum of tensors - machine-learning

I am wondering whether I could broadcast the operation below without using a for loop.
X1 = torch.rand(B,d)
X2 = torch.rand(B,d)
X3 = torch.rand(B,d)
X = [X1,X2,X3]
w = torch.rand(3)
res = 0
for i in range(3):
res += X[i] * w[i]:
An important thing to note here is that w is a parameter that I need to optimize. So I need to track its gradient.
Any help would be appreciated.
I just don't know how to speed up this operation without ruining the gradient map.

Related

Gradient function not able to find optimal theta but normal equation does

I tried implementing my own linear regression model in octave with some sample data but the theta does not seem to be correct and does not match the one provided by the normal equation which gives the correct values of theta. But running my model(with different alpha and iterations) on the data from Andrew Ng's machine learning course gives the proper theta for the hypothesis. I have tweaked alpha and iterations so that the cost function decreases. This is the image of cost function against iterations.. As you can see the cost decreases and plateaus but not to a low enough cost. Can somebody help me understand why this is happening and what I can do to fix it?
Here is the data (The first column is the x values, and the second column is the y values):
20,48
40,55.1
60,56.3
80,61.2
100,68
Here is the graph of the data and the equations plotted by gradient descent(GD) and by the normal equation(NE).
Code for the main script:
clear , close all, clc;
%loading the data
data = load("data1.txt");
X = data(:,1);
y = data(:,2);
%Plotting the data
figure
plot(X,y, 'xr', 'markersize', 7);
xlabel("Mass in kg");
ylabel("Length in cm");
X = [ones(length(y),1), X];
theta = ones(2, 1);
alpha = 0.000001; num_iter = 4000;
%Running gradientDescent
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);
%Running Normal equation
opt_theta_norm = pinv(X' * X) * X' * y;
%Plotting the hypothesis for GD and NE
hold on
plot(X(:,2), X * opt_theta);
plot(X(:,2), X * opt_theta_norm, 'g-.', "markersize",10);
legend("Data", "GD", "NE");
hold off
%Plotting values of previous J with each iteration
figure
plot(1:numel(J_history), J_history);
xlabel("iterations"); ylabel("J");
Function for finding gradientDescent:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iter)
m = length(y);
J_history = zeros(num_iter,1);
for iter = 1:num_iter
theta = theta - (alpha / m) * (X' * (X * theta - y));
J_history(iter) = computeCost(X, y, theta);
endfor
endfunction
Function for computing cost:
function J = computeCost (X, y, theta)
J = 0;
m = length(y);
errors = X * theta - y;
J = sum(errors .^ 2) / (2 * m);
endfunction

Try alpha = 0.0001 and num_iter = 400000. This will solve your problem!
Now, the problem with your code is that the learning rate is way too less which is slowing down the convergence. Also, you are not giving it enough time to converge by limiting the training iterations to 4000 only which is very less given the learning rate.
Summarising, the problem is: less learning rate + less iterations.

What does CWT amplitude mean?

I've a big issue with continuous wavelet transform. I've created this signal
t = 0:1/2000:1-1/2000;
dt = 1/2000;
x1 = sin(50*pi*t).*exp(-50*pi*(t-0.2).^2);
x2 = sin(50*pi*t).*exp(-100*pi*(t-0.5).^2);
x3 = 2*cos(140*pi*t).*exp(-50*pi*(t-0.2).^2);
x4 = 2*sin(140*pi*t).*exp(-80*pi*(t-0.8).^2);
x = x1+x2+x3+x4;
And its rappresentation on time is
Than I computed its Fourier Transform in the classic way
Ts =1/Fs;
N = length(x);
t = 0:Ts:Ts*N-Ts;
FTx = fft(x,N);
S = (abs(FTx).^2)/N; %amplitude
f_FT = (0:Fs/N:Fs-Fs/N);
S = S(1:N/2); % i reject half signal
f_FT = f_FT(1:N/2);
And its rappresentation is
I computed the continuous wavelet transform using the function cwtft
s0 = 2;
a0 = 2^(1/32);
scales = (s0*a0.^(32:7*32)).*dt;
cwtx = cwtft({x,dt},'Scales',scales,'Wavelet',{'bump',[4 0.9]});
figure;
contour(t,cwtx.frequencies,abs(cwtx.cfs))
xlabel('Seconds'), ylabel('Hz');
grid on;
title('Analytic CWT using Bump Wavelet')
hcol = colorbar;
hcol.Label.String = 'Magnitude';
I don't know what magnitude rappresent, and specially why it differs so much with the values obtained with the classic FFT. Is there a way to convert it?
Thank you so much

There are two dimensions present here: Time and Frequency (Scale). The third dimension, which is magnitude or amplitude, shows the intensity of your frequencies (scales, which in wavelets scales are the inverse of frequencies) in those local times.
I guess it is much simpler than what I just explained. You are basically looking at a similar image to these three dimensions from the top in your last image in the question:
Source
Machine Learning with Signal Processing Techniques

Linear Regression - Implementing Feature Scaling

I was trying to implement Linear Regression in Octave 5.1.0 on a data set relating the GRE score to the probability of Admission.
The data set is of the sort,
337 0.92
324 0.76
316 0.72
322 0.8
. . .
My main Program.m file looks like,
% read the data
data = load('Admission_Predict.txt');
% initiate variables
x = data(:,1);
y = data(:,2);
m = length(y);
theta = zeros(2,1);
alpha = 0.01;
iters = 1500;
J_hist = zeros(iters,1);
% plot data
subplot(1,2,1);
plot(x,y,'rx','MarkerSize', 10);
title('training data');
% compute cost function
x = [ones(m,1), (data(:,1) ./ 300)]; % feature scaling
J = computeCost(x,y,theta);
% run gradient descent
[theta, J_hist] = gradientDescent(x,y,theta,alpha,iters);
hold on;
subplot(1,2,1);
plot((x(:,2) .* 300), (x*theta),'-');
xlabel('GRE score');
ylabel('Probability');
hold off;
subplot (1,2,2);
plot(1:iters, J_hist, '-b');
xlabel('no: of iteration');
ylabel('Cost function');
computeCost.m looks like,
function J = computeCost(x,y,theta)
m = length(y);
h = x * theta;
J = (1/(2*m))*sum((h-y) .^ 2);
endfunction
and gradientDescent.m looks like,
function [theta, J_hist] = gradientDescent(x,y,theta,alpha,iters)
m = length(y);
J_hist = zeros(iters,1);
for i=1:iters
diff = (x*theta - y);
theta = theta - (alpha * (1/(m))) * (x' * diff);
J_hist(i) = computeCost(x,y,theta);
endfor
endfunction
The graphs plotted then looks like this,
which you can see, doesn't feel right even though my Cost function seems to be minimized.
Can someone please tell me if this is right? If not, what am I doing wrong?

The easiest way to check whether your implementation is correct is to compare with a validated implementation of linear regression. I suggest using an alternative implementation approach like the one suggested here, and then comparing your results. If the fits match, then this is the best linear fit to your data and if they don't match, then there may be something wrong in your implementation.

Correlation (offset detection) issues - Signal power concentrated at edge of domain

I'm in a bit of a bind - I am in too deep to quickly apply another technique, so here goes nothing...
I'm doing line tracking by correlating each row of a matrix with the row below and taking the max of the correlation to compute the offset. It works extremely well EXCEPT when the signals are up against the edge of the domain. It simply gives a 0. I suspect this is is because it is advantageous to simply add in place rather than shift in 0's to the edge. Here are some example signals that cause the issue. These signals aren't zero-mean, but they are when I correlate (I subtract the mean). I get the correct offset for the third image, but not for the first two.
Here is my correlation code
x0 -= mean(x0)
x1 -= mean(x1)
x0 /= max(x0)
x1 /= max(x1)
c = signal.correlate(x1, x0, mode='full')
m = interp_peak_offset(c)
foffset =(m - len(x0) + 1) * (f[2] - f[1])
I have tried clipping one of the signals by 20 samples on each side, correlating the gradient of the signal, and some other wonky methods with no success...
Any help is greatly appreciated! Thanks so much!

Instead of looking for the maximum amplitude, you should look for phase difference.
This can be achieved using the PHAT ( Phase Transform) method:
def PHAT(x, y, fs, nperseg=50):
f, pxy = csd(x, y, fs=1.0, nperseg=nperseg, return_onesided=False)
pxy_phase = np.divide(pxy, np.abs(pxy))
gcc_fun = np.real(ifft(pxy_phase)) # generelized cross correlation.
TDOA = np.argmax(gcc_fun) / float(fs)
return TDOA

I ended up minimizing the average absolute difference between the two vectors. For each time shift, I computed the absolute difference/number of points of overlap. Here is my function that does so
def offset_using_diff(x0, x1, f):
#Finds the offset of x0 from x1 such that x0(f) ~ x1(f - foffset). Does so by
#minimizing the average absolute difference between the two signals, with one signal
#shifted.
#In other words, we minimize |x0 - x1|/N where N is the number of points overlapping
#between x1 and the shifted version of x0
#Args:
# x0,x1 (vector): data
# f (vector): frequency vector
#Returns:
# foffset (float): frequency offset
OMAX = min(len(x0) // 2, 100) # max offset in samples
dvec = zeros((2 * OMAX,))
offsetvec = arange(-OMAX + 1, OMAX + 1)
y0 = x0.copy()
y1 = x1.copy()
y0 -= min(y0)
y1 -= min(y1)
y0 = pad(y0, (100, 100), 'constant', constant_values=(0, 0))
y1 = pad(y1, (100, 100), 'constant', constant_values=(0, 0))
for i, offset in enumerate(offsetvec):
d0 = roll(y0, offset)
d1 = y1
iinds1 = d0 != 0
iinds2 = d1 != 0
iinds = logical_and(iinds1, iinds2)
d0 = d0[iinds]
d1 = d1[iinds]
diff = d0 - d1
dvec[i] = sum(abs(diff))/len(d0)
m = interp_peak_offset(-1*dvec)
foffset = (m - OMAX + 1)*(f[2]-f[1])
return foffset

Obtain sigma of gaussian blur between two images

Suppose I have an image A, I applied Gaussian Blur on it with Sigam=3 So I got another Image B. Is there a way to know the applied sigma if A,B is given?
Further clarification:
Image A:
Image B:
I want to write a function that take A,B and return Sigma:
double get_sigma(cv::Mat const& A,cv::Mat const& B);
Any suggestions?

EDIT1: The suggested approach doesn't work in practice in its original form(i.e. using only 9 equations for a 3 x 3 kernel), and I realized this later. See EDIT1 below for an explanation and EDIT2 for a method that works.
EDIT2: As suggested by Humam, I used the Least Squares Estimate (LSE) to find the coefficients.
I think you can estimate the filter kernel by solving a linear system of equations in this case. A linear filter weighs the pixels in a window by its coefficients, then take their sum and assign this value to the center pixel of the window in the result image. So, for a 3 x 3 filter like
the resulting pixel value in the filtered image
result_pix_value = h11 * a(y, x) + h12 * a(y, x+1) + h13 * a(y, x+2) +
h21 * a(y+1, x) + h22 * a(y+1, x+1) + h23 * a(y+1, x+2) +
h31 * a(y+2, x) + h32 * a(y+2, x+1) + h33 * a(y+2, x+2)
where a's are the pixel values within the window in the original image. Here, for the 3 x 3 filter you have 9 unknowns, so you need 9 equations. You can obtain those 9 equations using 9 pixels in the resulting image. Then you can form an Ax = b system and solve for x to obtain the filter coefficients. With the coefficients available, I think you can find the sigma.
In the following example I'm using non-overlapping windows as shown to obtain the equations.
You don't have to know the size of the filter. If you use a larger size, the coefficients that are not relevant will be close to zero.
Your result image size is different than the input image, so i didn't use that image for following calculation. I use your input image and apply my own filter.
I tested this in Octave. You can quickly run it if you have Octave/Matlab. For Octave, you need to load the image package.
I'm using the following kernel to blur the image:
h =
0.10963 0.11184 0.10963
0.11184 0.11410 0.11184
0.10963 0.11184 0.10963
When I estimate it using a window size 5, I get the following. As I said, the coefficients that are not relevant are close to zero.
g =
9.5787e-015 -3.1508e-014 1.2974e-015 -3.4897e-015 1.2739e-014
-3.7248e-014 1.0963e-001 1.1184e-001 1.0963e-001 1.8418e-015
4.1825e-014 1.1184e-001 1.1410e-001 1.1184e-001 -7.3554e-014
-2.4861e-014 1.0963e-001 1.1184e-001 1.0963e-001 9.7664e-014
1.3692e-014 4.6182e-016 -2.9215e-014 3.1305e-014 -4.4875e-014
EDIT1:
First of all, my apologies.
This approach doesn't really work in the practice. I've used the filt = conv2(a, h, 'same'); in the code. The resulting image data type in this case is double, whereas in the actual image the data type is usually uint8, so there's loss of information, which we can think of as noise. I simulated this with the minor modification filt = floor(conv2(a, h, 'same'));, and then I don't get the expected results.
The sampling approach is not ideal, because it's possible that it results in a degenerated system. Better approach is to use random sampling, avoiding the borders and making sure the entries in the b vector are unique. In the ideal case, as in my code, we are making sure the system Ax = b has a unique solution this way.
One approach would be to reformulate this as Mv = 0 system and try to minimize the squared norm of Mv under the constraint squared-norm v = 1, which we can solve using SVD. I could be wrong here, and I haven't tried this.
Another approach is to use the symmetry of the Gaussian kernel. Then a 3x3 kernel will have only 3 unknowns instead of 9. I think, this way we impose additional constraints on v of the above paragraph.
I'll try these out and post the results, even if I don't get the expected results.
EDIT2:
Using the LSE, we can find the filter coefficients as pinv(A'A)A'b. For completion, I'm adding a simple (and slow) LSE code.
Initial Octave Code:
clear all
im = double(imread('I2vxD.png'));
k = 5;
r = floor(k/2);
a = im(:, :, 1); % take the red channel
h = fspecial('gaussian', [3 3], 5); % filter with a 3x3 gaussian
filt = conv2(a, h, 'same');
% use non-overlapping windows to for the Ax = b syatem
% NOTE: boundry error checking isn't performed in the code below
s = floor(size(a)/2);
y = s(1);
x = s(2);
w = k*k;
y1 = s(1)-floor(w/2) + r;
y2 = s(1)+floor(w/2);
x1 = s(2)-floor(w/2) + r;
x2 = s(2)+floor(w/2);
b = [];
A = [];
for y = y1:k:y2
for x = x1:k:x2
b = [b; filt(y, x)];
f = a(y-r:y+r, x-r:x+r);
A = [A; f(:)'];
end
end
% estimated filter kernel
g = reshape(A\b, k, k)
LSE method:
clear all
im = double(imread('I2vxD.png'));
k = 5;
r = floor(k/2);
a = im(:, :, 1); % take the red channel
h = fspecial('gaussian', [3 3], 5); % filter with a 3x3 gaussian
filt = floor(conv2(a, h, 'same'));
s = size(a);
y1 = r+2; y2 = s(1)-r-2;
x1 = r+2; x2 = s(2)-r-2;
b = [];
A = [];
for y = y1:2:y2
for x = x1:2:x2
b = [b; filt(y, x)];
f = a(y-r:y+r, x-r:x+r);
f = f(:)';
A = [A; f];
end
end
g = reshape(A\b, k, k) % A\b returns the least squares solution
%g = reshape(pinv(A'*A)*A'*b, k, k)

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

PyTorch: Broadcasting a weighted sum of tensors - machine-learning

Related

Gradient function not able to find optimal theta but normal equation does

What does CWT amplitude mean?

Linear Regression - Implementing Feature Scaling

Correlation (offset detection) issues - Signal power concentrated at edge of domain

Obtain sigma of gaussian blur between two images

Categories

Resources