Vectorization issue - machine-learning

Say you have two column vectors v and w, each with 7 elements (i.e., they have dimensions 7x1). Consider the following code:
z = 0;
for i = 1:7
  z = z + v(i) * w(i);
end
Which of the following vectorizations compute z correctly?
A) z = sum (v .* w);
B) z = w' * v;
C) z = v * w;
D) z = w * v;
According to the solutions, answers (A) and (B) are correct. Can someone please help me understand why?
Why is z = v * w', which is similar to answer (B) with only the order of the operands changed, wrong? Since we want a result with a single element, wouldn't we need a product of this shape: 1x7 * 7x1 = 1x1? So why is z = v' * w wrong? It gives the same dimensions as answer (B).

z = v'*w is correct and is equal to w'*v.
Both produce a 1x1 matrix, which Octave treats as a scalar.
See this:
octave:5> v = rand(7, 1);
octave:6> w = rand(7, 1);
octave:7> v'*w
ans = 1.3110
octave:8> w'*v
ans = 1.3110
octave:9> sum(v.*w)
ans = 1.3110

Answers A and B both perform a dot product of the two vectors, which yields the same result as the code provided. Answer A first performs the element-wise product (.*) of the two column vectors, then sums those intermediate values. Answer B performs the same mathematical operation but does so via a dot product (i.e., matrix multiplication).
Answer C is incorrect because it would be performing a matrix multiplication on misaligned matrices (7x1 and 7x1). The same is true for D.
z = v * w', which was not one of the options, is incorrect because it would yield a 7x7 matrix (instead of the 1x1 scalar value desired). The point is that order matters when performing matrix multiplication. (1xN)X(Nx1) -> (1x1), whereas (Nx1)X(1xN) -> (NxN).
z = v' * w is actually a correct solution but was simply not provided as one of the options.
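To make the shape argument concrete, here is a small sketch in plain Python rather than Octave. The helper names dot and outer and the sample vectors are made up for illustration; dot plays the role of sum(v .* w) or w' * v, and outer plays the role of v * w'.

```python
def dot(v, w):
    """Sum of element-wise products, like sum(v .* w) or w' * v."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(v, w))


def outer(v, w):
    """N x N table of pairwise products, like v * w' for column vectors."""
    return [[a * b for b in w] for a in v]


v = [1, 2, 3, 4, 5, 6, 7]
w = [7, 6, 5, 4, 3, 2, 1]

# The explicit loop from the question.
z = 0
for i in range(7):
    z = z + v[i] * w[i]

print(z, dot(v, w), dot(w, v))       # all three agree
table = outer(v, w)
print(len(table), len(table[0]))     # 7 7: a matrix, not a scalar
```

The order of the operands does not matter for the dot product, but it decides whether you get one number or a 7x7 table.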

Related

A working function but having trouble with a particular float value

This function takes a float then spits out the two integers for the decimal value. At least that was the intention
let flr (x:float) = float(int(x))
let f x =
    let r y = let x = x * y in x = flr(x)
    let rec f y =
        if r(y)
        then x*y,y
        else f(y+1.0)
    f 1.0
f 0.2;;
val it: float * float = (1.0, 5.0)
f 3.14;;
val it: float * float = (157.0, 50.0)
Here is an example where the results (floats that will eventually become integers) have not been "simplified":
f 0.14;;
val it: float * float = (35.0, 250.0)
Checking the fractional part to be less than .01, as opposed to equaling exactly zero, got around this issue but I don't really like that solution. So I set it back to what you see in the code above. I am using the function below for some of the values that do not simplify though:
let g (x,y) =
    let rec f k =
        if x/k = flr(x/k)
        then g(k)
        else f(k-1.0)
    and g k =
        if y/k = flr(y/k)
        then x/k,y/k
        else f(k-1.0)
    if x < y then f x else f y
Anyway, the main issue is this value:
3.142857143
It just keeps grinding without stack errors and I'm not sure what I've run into here. Any clarity would be awesome! Thanks y'all.

Your algorithm is trying to find a rational number to represent a decimal number (represented as a floating point number).
For any input x, you are looking for a number represented as p/q such that x=p/q and you do this by incrementing q, starting from 1 and checking if you can find an integer p to make this work.
This works fine for numbers that have a nice rational representation like 0.2, but it does not work great for numbers like 3.142857 that do not have a simpler rational representation. For 3.142857, you will just keep iterating until you reach 3142857/1000000 (which is technically correct, but not very helpful).
As mentioned in the comments, there are issues caused by the fact that floating-point numbers cannot be precisely compared, but also, iterating like this for 3.142857143 might just take too long.
You can look up better algorithms for finding a rational number for a given decimal. You could also see if you can accept some margin of error. If you do not need a completely precise solution, you could for example change your r test function to something like:
let r y =
    let x = x * y
    // true when x is within 0.0001 of the nearest integer
    abs (x - round x) < 0.0001
This will not give you exactly the same number, but it will likely find a solution that is good enough.
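As one example of a "better algorithm", here is a sketch in Python (not F#) that leans on the standard library's fractions module. Fraction.limit_denominator returns the closest fraction whose denominator does not exceed a bound. The function name to_rational and the bound of 1000 are assumptions chosen for illustration, not something from the question.

```python
from fractions import Fraction


def to_rational(x, max_denominator=1000):
    """Return (p, q) with p/q the closest fraction to x whose q <= max_denominator."""
    frac = Fraction(x).limit_denominator(max_denominator)
    return frac.numerator, frac.denominator


for value in (0.2, 3.14, 0.14, 3.142857143):
    print(value, to_rational(value))
```

With this bound, 0.14 comes out already simplified as 7/50, and 3.142857143 resolves to 22/7 instead of iterating up to a denominator of 1000000000.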

linear regression with one variable Gradient descent

I want to ask how this equation
can be written in Octave this way:
predictions = X * theta;
delta = (1/m) * X' * (predictions - y);
theta = theta - alpha * delta;
I don't understand where the transpose comes from, or how the equation is converted to vectorized form this way.

The scalar product X.Y is mathematically sum (xi * yi) and can be written as X' * Y in Octave when X and Y are column vectors.
There are other ways to write a scalar product in Octave, see
https://octave.sourceforge.io/octave/function/dot.html

The question seems to be, given an example where:
X = randn(m, k); % m 'input' horizontal-vectors of dimensionality k
y = randn(m, n); % m 'target' horizontal-vectors of dimensionality n
theta = randn(k, n); % a (right) transformation from k to n dimensional
% horizontal-vectors
h = X * theta; % creates m rows of n-dimensional horizontal vectors
how is it that the following code
delta = zeros(k,n);
for j = 1 : k % iterating over all dimensions of the input
  for l = 1 : n % iterating over all dimensions of the output
    for i = 1 : m % iterating over all observations for that j,l pair
      delta(j, l) += (1/m) * (h(i, l) - y(i, l)) * X(i,j);
    end
    theta(j, l) = theta(j, l) - alpha * delta(j, l);
  end
end
can be vectorised as:
h = X * theta ;
delta = (1/ m) * X' * (h - y);
theta = theta - alpha * delta;
To confirm such a vectorised formulation makes sense, it always helps to note (e.g. below each line) the dimensions of the objects involved in the matrix / vectorised operations:
h      =  X      *  theta;
% [m, n]  [m, k]    [k, n]
delta  =  (1/m)  *  X'      *  (h - y);
% [k, n]  [1, 1]    [k, m]     [m, n]
theta  =  theta  -  alpha   *  delta;
% [k, n]  [k, n]    [1, 1]     [k, n]
Hopefully now it will become more obvious that they are equivalent.
W.r.t the X' * D calculation (where D = predictions - y) you can see that:
Multiplying the 1st row of X' by the 1st column of D is the same as summing over all m observations for k=1 and n=1, and placing that result at position [k=1, n=1] in the output matrix. Moving along the columns of D while still multiplying by the 1st row of X', we simply move along the n dimensions of D and place each result accordingly in the output. Similarly, moving along the rows of X', you move along the k dimensions of X', repeating the same process for every n in D and placing the results accordingly, until you have multiplied every row of X' with every column of D.
If you follow the logic above, you will see that the summations involved are exactly the same as in the for loop formulation, but we managed to avoid using a for loop and use matrix operations instead.
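The equivalence can also be checked numerically. The sketch below uses plain Python lists rather than Octave so that every sum is visible; the helpers matmul and transpose and the tiny data set are made up for illustration.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]


# Tiny made-up problem: m = 3 observations, k = 2 inputs, n = 2 outputs.
X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]
y = [[1.0, 0.0], [2.0, 1.0], [4.0, 3.0]]
theta = [[0.5, -1.0], [0.25, 2.0]]
m, k, n = len(X), len(X[0]), len(y[0])

h = matmul(X, theta)                                        # [m, n]
D = [[h[i][l] - y[i][l] for l in range(n)] for i in range(m)]

# Loop form: one sum over the m observations for every (j, l) pair.
delta_loop = [[0.0] * n for _ in range(k)]
for j in range(k):
    for l in range(n):
        for i in range(m):
            delta_loop[j][l] += D[i][l] * X[i][j] / m

# Matrix form: delta = (1/m) * X' * (h - y).
delta_mat = [[v / m for v in row] for row in matmul(transpose(X), D)]

print(delta_loop)
print(delta_mat)
```

The two results agree up to floating-point rounding, because entry [j, l] of X' * D is exactly the sum over i of X(i, j) * D(i, l).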

Obtain sigma of gaussian blur between two images

Suppose I have an image A and I applied a Gaussian blur to it with sigma=3, giving another image B. Is there a way to recover the applied sigma if A and B are given?
Further clarification:
Image A:
Image B:
I want to write a function that takes A and B and returns sigma:
double get_sigma(cv::Mat const& A,cv::Mat const& B);
Any suggestions?

EDIT1: The suggested approach doesn't work in practice in its original form(i.e. using only 9 equations for a 3 x 3 kernel), and I realized this later. See EDIT1 below for an explanation and EDIT2 for a method that works.
EDIT2: As suggested by Humam, I used the Least Squares Estimate (LSE) to find the coefficients.
I think you can estimate the filter kernel by solving a linear system of equations in this case. A linear filter weighs the pixels in a window by its coefficients, then takes their sum and assigns this value to the center pixel of the window in the result image. So, for a 3 x 3 filter like
h11 h12 h13
h21 h22 h23
h31 h32 h33
the resulting pixel value in the filtered image is
result_pix_value = h11 * a(y, x)   + h12 * a(y, x+1)   + h13 * a(y, x+2) +
                   h21 * a(y+1, x) + h22 * a(y+1, x+1) + h23 * a(y+1, x+2) +
                   h31 * a(y+2, x) + h32 * a(y+2, x+1) + h33 * a(y+2, x+2)
where a's are the pixel values within the window in the original image. Here, for the 3 x 3 filter you have 9 unknowns, so you need 9 equations. You can obtain those 9 equations using 9 pixels in the resulting image. Then you can form an Ax = b system and solve for x to obtain the filter coefficients. With the coefficients available, I think you can find the sigma.
In the following example I'm using non-overlapping windows as shown to obtain the equations.
You don't have to know the size of the filter. If you use a larger size, the coefficients that are not relevant will be close to zero.
Your result image has a different size than the input image, so I didn't use it for the following calculation. I use your input image and apply my own filter.
I tested this in Octave. You can quickly run it if you have Octave/Matlab. For Octave, you need to load the image package.
I'm using the following kernel to blur the image:
h =
0.10963 0.11184 0.10963
0.11184 0.11410 0.11184
0.10963 0.11184 0.10963
When I estimate it using a window size 5, I get the following. As I said, the coefficients that are not relevant are close to zero.
g =
9.5787e-015 -3.1508e-014 1.2974e-015 -3.4897e-015 1.2739e-014
-3.7248e-014 1.0963e-001 1.1184e-001 1.0963e-001 1.8418e-015
4.1825e-014 1.1184e-001 1.1410e-001 1.1184e-001 -7.3554e-014
-2.4861e-014 1.0963e-001 1.1184e-001 1.0963e-001 9.7664e-014
1.3692e-014 4.6182e-016 -2.9215e-014 3.1305e-014 -4.4875e-014
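Once the coefficients are available, one way to get sigma back is from the ratio of two of them. This is a sketch in Python, assuming the kernel really is a sampled Gaussian with weights proportional to exp(-(dx^2 + dy^2) / (2 sigma^2)), so that the normalization constant cancels in the ratio. The function name sigma_from_kernel is hypothetical.

```python
import math


def sigma_from_kernel(center, edge_neighbour):
    """Estimate sigma from a Gaussian kernel's centre tap and a tap one pixel away.

    For weights proportional to exp(-(dx^2 + dy^2) / (2 sigma^2)),
    centre / edge_neighbour = exp(1 / (2 sigma^2)).
    """
    ratio = center / edge_neighbour
    if ratio <= 1.0:
        raise ValueError("centre tap must be larger than its neighbour")
    return math.sqrt(1.0 / (2.0 * math.log(ratio)))


# Values copied from the printed kernel h above (rounded to 5 digits).
print(sigma_from_kernel(0.11410, 0.11184))   # close to 5
```

The printed coefficients are rounded, so the estimate is only approximate. With noisy coefficients it would probably be more robust to fit sigma to all taps instead of a single ratio.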
EDIT1:
First of all, my apologies.
This approach doesn't really work in practice. I've used filt = conv2(a, h, 'same'); in the code. The resulting image data type in this case is double, whereas in an actual image the data type is usually uint8, so there's a loss of information, which we can think of as noise. I simulated this with the minor modification filt = floor(conv2(a, h, 'same'));, and then I don't get the expected results.
The sampling approach is not ideal, because it can result in a degenerate system. A better approach is to use random sampling, avoiding the borders and making sure the entries in the b vector are unique. In the ideal case, as in my code, this ensures that the system Ax = b has a unique solution.
One approach would be to reformulate this as Mv = 0 system and try to minimize the squared norm of Mv under the constraint squared-norm v = 1, which we can solve using SVD. I could be wrong here, and I haven't tried this.
Another approach is to use the symmetry of the Gaussian kernel. Then a 3x3 kernel will have only 3 unknowns instead of 9. I think, this way we impose additional constraints on v of the above paragraph.
I'll try these out and post the results, even if I don't get the expected results.
EDIT2:
Using the LSE, we can find the filter coefficients as pinv(A'A)A'b. For completeness, I'm adding a simple (and slow) LSE code.
Initial Octave Code:
clear all
im = double(imread('I2vxD.png'));
k = 5;
r = floor(k/2);
a = im(:, :, 1); % take the red channel
h = fspecial('gaussian', [3 3], 5); % filter with a 3x3 gaussian
filt = conv2(a, h, 'same');
% use non-overlapping windows to form the Ax = b system
% NOTE: boundary error checking isn't performed in the code below
s = floor(size(a)/2);
y = s(1);
x = s(2);
w = k*k;
y1 = s(1)-floor(w/2) + r;
y2 = s(1)+floor(w/2);
x1 = s(2)-floor(w/2) + r;
x2 = s(2)+floor(w/2);
b = [];
A = [];
for y = y1:k:y2
  for x = x1:k:x2
    b = [b; filt(y, x)];
    f = a(y-r:y+r, x-r:x+r);
    A = [A; f(:)'];
  end
end
% estimated filter kernel
g = reshape(A\b, k, k)
LSE method:
clear all
im = double(imread('I2vxD.png'));
k = 5;
r = floor(k/2);
a = im(:, :, 1); % take the red channel
h = fspecial('gaussian', [3 3], 5); % filter with a 3x3 gaussian
filt = floor(conv2(a, h, 'same'));
s = size(a);
y1 = r+2; y2 = s(1)-r-2;
x1 = r+2; x2 = s(2)-r-2;
b = [];
A = [];
for y = y1:2:y2
  for x = x1:2:x2
    b = [b; filt(y, x)];
    f = a(y-r:y+r, x-r:x+r);
    f = f(:)';
    A = [A; f];
  end
end
g = reshape(A\b, k, k) % A\b returns the least squares solution
%g = reshape(pinv(A'*A)*A'*b, k, k)

Cost Function, Linear Regression, trying to avoid hard coding theta. Octave.

I'm in the second week of Professor Andrew Ng's Machine Learning course through Coursera. We're working on linear regression and right now I'm dealing with coding the cost function.
The code I've written solves the problem correctly but does not pass the submission process and fails the unit test because I have hard coded the values of theta and not allowed for more than two values for theta.
Here's the code I've got so far
function J = computeCost(X, y, theta)
  m = length(y);
  J = 0;
  for i = 1:m,
    h = theta(1) + theta(2) * X(i);
    a = h - y(i);
    b = a^2;
    J = J + b;
  end;
  J = J * (1 / (2 * m));
end
the unit test is
computeCost( [1 2 3; 1 3 4; 1 4 5; 1 5 6], [7;6;5;4], [0.1;0.2;0.3])
and should produce ans = 7.0175
So I need to add another for loop to iterate over theta, therefore allowing for any number of values for theta, but I'll be damned if I can wrap my head around how/where.
Can anyone suggest a way I can allow for any number of values for theta within this function?
If you need more information to understand what I'm trying to ask, I will try my best to provide it.

You can vectorize operations in Octave/Matlab.
Iterating over an entire vector is a really bad idea if your programming language lets you vectorize operations.
R, Octave, Matlab, and Python (numpy) all allow this.
For example, if theta = (t0, t1, t2, t3) and X = (x0, x1, x2, x3), you can get the scalar product as follows:
theta * X' = (t0, t1, t2, t3) * (x0, x1, x2, x3)' = t0*x0 + t1*x1 + t2*x2 + t3*x3
The result will be a scalar.
For example, you can vectorize h in your code as follows:
H = (theta'*X')';
S = sum((H - y) .^ 2);
J = S / (2*m);

Above answer is perfect but you can also do
H = (X*theta);
S = sum((H - y) .^ 2);
J = S / (2*m);
Rather than computing
(theta' * X')'
and then taking the transpose you can directly calculate
(X * theta)
It works perfectly.
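For anyone who wants to check the unit test from the question by hand, here is a sketch in plain Python (no numpy) that handles any number of theta values. The function name compute_cost is made up; the data is the unit test quoted in the question.

```python
def compute_cost(X, y, theta):
    """Squared-error cost J = sum((X*theta - y)^2) / (2*m) for any number of thetas."""
    m = len(y)
    total = 0.0
    for row, target in zip(X, y):
        h = sum(t * x for t, x in zip(theta, row))   # one row of X * theta
        total += (h - target) ** 2
    return total / (2 * m)


X = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]]
y = [7, 6, 5, 4]
theta = [0.1, 0.2, 0.3]
print(round(compute_cost(X, y, theta), 4))   # 7.0175
```

The inner sum is what X * theta does for each row, which is why the matrix form needs no separate loop over theta.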

The line below returns the required cost value of 32.07 when we run computeCost once with θ initialized to zeros:
J = (1/(2*m)) * (sum(((X * theta) - y).^2));
It follows the original formula directly.

It can also be done in one line, where m is the number of training examples:
J=(1/(2*m)) * ((((X * theta) - y).^2)'* ones(m,1));

J = sum(((X*theta)-y).^2)/(2*m);
ans = 32.073
The answer above is perfect. I thought about the problem deeply for a day and am still unfamiliar with Octave, so let's just study together!

If you want to use only matrix operations:
temp = (X * theta - y); % h(x) - y
J = ((temp')*temp)/(2 * m);
clear temp;

This would work just fine for you -
J = sum((X*theta - y).^2)*(1/(2*m))
This directly follows from the Cost Function Equation

Python code for the same:
import numpy as np

def computeCost(X, y, theta):
    m = y.size                  # number of training examples
    H = X.dot(theta)            # predictions X * theta
    S = np.sum((H - y) ** 2)    # sum of squared errors
    return S / (2 * m)

function J = computeCost(X, y, theta)
  m = length(y);
  J = 0;
  % Hypothesis h(x)
  h = X * theta;
  % Error function (h(x) - y) ^ 2
  squaredError = (h-y).^2;
  % Cost function
  J = sum(squaredError)/(2*m);
end

I think a more general solution needs to iterate rather than evaluate the cost once. Also, the 32.07 shown in the PDF may not be the answer the grader is looking for, since it is only one case out of many possible training sets.
I think it should loop like this:
for i = 1:iterations
  theta = theta - (alpha/m) * X' * (X*theta - y);
  J = (1/(2*m)) * sum((X*theta - y).^2);
end

gradient descent seems to fail

I implemented a gradient descent algorithm to minimize a cost function in order to obtain a hypothesis for determining whether an image has good quality. I did that in Octave. The idea is loosely based on the algorithm from the machine learning class by Andrew Ng.
I have 880 values in "y" ranging from 0.5 to ~12, and 880 values from 50 to 300 in "X" that should predict the image's quality.
Sadly the algorithm seems to fail: after some iterations the value for theta is so small that theta0 and theta1 become "NaN". And my linear regression curve has strange values...
Here is the code for the gradient descent algorithm:
(theta = zeros(2, 1);, alpha= 0.01, iterations=1500)
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y); % number of training examples
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    tmp_j1=0;
    for i=1:m,
      tmp_j1 = tmp_j1+ ((theta (1,1) + theta (2,1)*X(i,2)) - y(i));
    end
    tmp_j2=0;
    for i=1:m,
      tmp_j2 = tmp_j2+ (((theta (1,1) + theta (2,1)*X(i,2)) - y(i)) *X(i,2));
    end
    tmp1= theta(1,1) - (alpha * ((1/m) * tmp_j1))
    tmp2= theta(2,1) - (alpha * ((1/m) * tmp_j2))
    theta(1,1)=tmp1
    theta(2,1)=tmp2
    % ============================================================
    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
  end
end
And here is the computation for the costfunction:
function J = computeCost(X, y, theta)
  m = length(y); % number of training examples
  J = 0;
  tmp=0;
  for i=1:m,
    tmp = tmp+ (theta (1,1) + theta (2,1)*X(i,2) - y(i))^2; % squared difference
  end
  J= (1/(2*m)) * tmp
end

If you are wondering how the seemingly complex-looking for loop can be vectorized and crammed into a single one-line expression, then please read on. The vectorized form is:
theta = theta - (alpha/m) * (X' * (X * theta - y))
Given below is a detailed explanation for how we arrive at this vectorized expression using gradient descent algorithm:
This is the gradient descent algorithm to fine tune the value of θ:
Assume that the following values of X, y and θ are given:
m = number of training examples
n = number of features + 1
Here
m = 5 (training examples)
n = 4 (features+1)
X = m x n matrix
y = m x 1 vector matrix
θ = n x 1 vector matrix
xi is the ith training example
xj is the jth feature in a given training example
Further,
h(x) = ([X] * [θ]) (m x 1 matrix of predicted values for our training set)
h(x)-y = ([X] * [θ] - [y]) (m x 1 matrix of Errors in our predictions)
The whole objective of machine learning is to minimize errors in predictions. Based on the above, our errors matrix E is an m x 1 vector, as follows:
To calculate new value of θj, we have to get a summation of all errors (m rows) multiplied by jth feature value of the training set X. That is, take all the values in E, individually multiply them with jth feature of the corresponding training example, and add them all together. This will help us in getting the new (and hopefully better) value of θj. Repeat this process for all j or the number of features. In matrix form, this can be written as:
This can be simplified as:
[E]' x [X] will give us a row vector matrix, since E' is 1 x m matrix and X is m x n matrix. But we are interested in getting a column matrix, hence we transpose the resultant matrix.
More succinctly, it can be written as:
Since (A * B)' = (B' * A'), and A'' = A, we can also write the above as
This is the original expression we started out with:
theta = theta - (alpha/m) * (X' * (X * theta - y))
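The formula images from this answer did not survive on this page. As a hedged reconstruction from the surrounding text (not the author's original figures), the steps can be written out as follows, with E = Xθ − y for the error vector and x_j^{(i)} for the jth feature of the ith training example:

```latex
\begin{align*}
\theta_j &:= \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m}
            \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}
            && \text{(update for each } j\text{)} \\
E &= X\theta - y
            && (m \times 1 \text{ vector of errors}) \\
\theta &:= \theta - \frac{\alpha}{m}\,\bigl(E^{\top} X\bigr)^{\top}
            && (1 \times n \text{ row, transposed to } n \times 1) \\
       &= \theta - \frac{\alpha}{m}\, X^{\top} E
            && \text{since } (AB)^{\top} = B^{\top} A^{\top} \text{ and } (A^{\top})^{\top} = A \\
       &= \theta - \frac{\alpha}{m}\, X^{\top} (X\theta - y)
\end{align*}
```

The last line is the same expression as the Octave one-liner above.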

I vectorized the theta update.
It may help somebody:
theta = theta - (alpha/m * (X * theta-y)' * X)';

I think that your computeCost function is wrong.
I attended Ng's class last year and I have the following implementation (vectorized):
m = length(y);
J = 0;
predictions = X * theta;
sqrErrors = (predictions-y).^2;
J = 1/(2*m) * sum(sqrErrors);
The rest of the implementation seems fine to me, although you could also vectorize them.
theta_1 = theta(1) - alpha * (1/m) * sum((X*theta-y).*X(:,1));
theta_2 = theta(2) - alpha * (1/m) * sum((X*theta-y).*X(:,2));
Afterwards you are setting the temporary thetas (here called theta_1 and theta_2) correctly back to the "real" theta.
Generally it is better to vectorize than to use loops; the code is less annoying to read and to debug.

If you are OK with using a least-squares cost function, then you could try using the normal equation instead of gradient descent. It's much simpler -- only one line -- and computationally faster.
Here is the normal equation:
http://mathworld.wolfram.com/NormalEquation.html
And in octave form:
theta = (pinv(X' * X )) * X' * y
Here is a tutorial that explains how to use the normal equation: http://www.lauradhamilton.com/tutorial-linear-regression-with-octave
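For the one-variable case the normal equation can be worked out by hand, which makes a quick sanity check possible. The sketch below is plain Python, not Octave. It solves (X'X) theta = X'y for X = [1, x] using the explicit 2x2 inverse instead of pinv, so unlike pinv it simply refuses a singular X'X. The function name and the data are made up.

```python
def normal_equation_1d(xs, ys):
    """Solve (X'X) theta = X'y for X = [1, x], using the 2x2 inverse directly."""
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx          # determinant of X'X
    if det == 0:
        raise ValueError("all x values are identical; X'X is singular")
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (m * sxy - sx * sy) / det
    return theta0, theta1


xs = [50.0, 100.0, 150.0, 200.0, 300.0]
ys = [2.0 + 0.03 * x for x in xs]    # made-up data on an exact line
print(normal_equation_1d(xs, ys))
```

Because the data lies exactly on a line, the result should match the intercept 2 and slope 0.03 up to rounding, with no learning rate to tune.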

While not as scalable as a vectorized version, a loop-based gradient descent should generate the same results. In the example above, the most probable cause of gradient descent failing to compute the correct theta is the value of alpha.
With a verified set of cost and gradient descent functions and a set of data similar with the one described in the question, theta ends up with NaN values just after a few iterations if alpha = 0.01. However, when set as alpha = 0.000001, the gradient descent works as expected, even after 100 iterations.
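The effect is easy to reproduce. Here is a sketch in plain Python with made-up data in roughly the ranges described in the question (x from 50 to 300, y up to about 12). It is not the asker's data set, and the function name gradient_descent is only for illustration.

```python
import math


def gradient_descent(xs, ys, alpha, num_iters):
    """Batch gradient descent for y ~ theta0 + theta1 * x, updating both thetas together."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Made-up data: 20 evenly spaced x values in 50..300, y on a line through them.
xs = [50.0 + 250.0 * i / 19 for i in range(20)]
ys = [0.04 * x for x in xs]

too_big = gradient_descent(xs, ys, alpha=0.01, num_iters=1500)
small = gradient_descent(xs, ys, alpha=0.000001, num_iters=100)
print(too_big)   # no longer finite
print(small)
```

With x values this large, each step with alpha = 0.01 overshoots by a growing amount until the numbers overflow. Scaling the feature down (for example dividing x by its maximum) is another common remedy besides shrinking alpha.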

Using only vectors here is the compact implementation of LR with Gradient Descent in Mathematica:
Theta = {0, 0}
alpha = 0.0001;
iteration = 1500;
Jhist = Table[0, {i, iteration}];
Table[
Theta = Theta -
alpha * Dot[Transpose[X], (Dot[X, Theta] - Y)]/m;
Jhist[[k]] =
Total[ (Dot[X, Theta] - Y[[All]])^2]/(2*m); Theta, {k, iteration}]
Note: Of course this assumes that X is an n x 2 matrix, with X[[All, 1]] containing only 1s.

This should work:
err = X*theta - y; % compute once so both updates use the same theta
theta(1,1) = theta(1,1) - (alpha*(1/m))*(err' * X(:,1));
theta(2,1) = theta(2,1) - (alpha*(1/m))*(err' * X(:,2));

It's cleaner this way, and also vectorized:
predictions = X * theta;
errorsVector = predictions - y;
theta = theta - (alpha/m) * (X' * errorsVector);

If you remember the first PDF file for gradient descent from the machine learning course, you would pay attention to the learning rate. Here is the note from that PDF:
Implementation Note: If your learning rate is too large, J(theta) can diverge and 'blow up', resulting in values which are too large for computer calculations. In these situations, Octave/MATLAB will tend to return NaNs. NaN stands for 'not a number' and is often caused by undefined operations that involve -infinity and +infinity.
