I've been stuck for hours trying to figure out why my gradient descent code for linear regression won't converge. I've tried really small alphas and really huge iteration counts, and it still won't work.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
ST0 = 0;
ST1 = 0;
for iter = 1:num_iters
    for i = 1:m
        ST0 = ST0 + ((theta(1) + theta(2)*X(i)) - y(i));
        ST1 = ST1 + ((theta(1) + theta(2)*X(i)) - y(i))*X(i,2);
    end
    ST0 = ST0/m;
    ST0 = ST0*alpha;
    ST1 = ST1/m;
    ST1 = ST1*alpha;
    theta(1) = theta(1) - ST0;
    theta(2) = theta(2) - ST1;
    J = computeCost(X, y, theta);
    J_history(iter) = J;
end
end
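For what it's worth, two things stand out in this version: ST0 and ST1 are zeroed only once, before the outer loop, so each iteration's sums pile on top of the previous, already alpha-scaled values; and the hypothesis uses X(i), which indexes the first (all-ones) column, where it presumably means the feature column X(i,2). A minimal sketch of the loop with both changed, keeping the same variable names:

for iter = 1:num_iters
    ST0 = 0;  % reset the accumulators on every iteration
    ST1 = 0;
    for i = 1:m
        err = (theta(1) + theta(2)*X(i,2)) - y(i);  % use the feature column
        ST0 = ST0 + err;
        ST1 = ST1 + err*X(i,2);
    end
    theta(1) = theta(1) - alpha*ST0/m;
    theta(2) = theta(2) - alpha*ST1/m;
    J_history(iter) = computeCost(X, y, theta);
end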
So I am trying to solve the first programming exercise from Andrew Ng's ML Coursera course, and I'm having a bit of trouble implementing gradient descent for linear regression in Octave. The code below shows what I am trying to implement, per the equation posted in the picture, but I am getting a different value from the expected one. I'm not sure what I am missing and hope someone can spot it.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
theta0 = theta(1);
theta1 = theta(2);
temp0 = 0;
temp1 = 0;
errFunc = 0;
for iter = 1:num_iters
    h = X * theta;
    errFunc = h - y;
    temp0 = temp0 + (alpha/m) .* sum(errFunc' * X(:, 1));
    temp1 = temp1 + (alpha/m) .* sum(errFunc' * X(:, 2));
    theta0 = theta0 - temp0;
    theta1 = theta1 - temp1;
    theta = [theta0; theta1];
    % ============================================================
    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
end
end
My expected results are [-3.6303; 1.1664], but I am getting [-1.361798; 0.931592]. This is the update equation I am working with.
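The likely issue is the temp0 = temp0 + ... pattern: temp0 and temp1 are never reset inside the loop, so every iteration subtracts the running total of all past gradient steps instead of just the current step. A minimal sketch that computes a fresh step on each pass, keeping the same structure:

for iter = 1:num_iters
    h = X * theta;                              % hypothesis for the current theta
    errFunc = h - y;
    step0 = (alpha/m) * (errFunc' * X(:, 1));   % fresh step, not accumulated
    step1 = (alpha/m) * (errFunc' * X(:, 2));
    theta = [theta(1) - step0; theta(2) - step1];
    J_history(iter) = computeCost(X, y, theta);
end

With that change, a run on the standard ex1 data should land on the expected [-3.6303; 1.1664].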
I am currently taking Andrew Ng's Coursera course on machine learning. For the week 2 assignments, I have to create a function that performs gradient descent. However, every time I call the function from the command line, it doesn't work and returns "error: 'num_iters' undefined near line 1 column 37". I have attached the code below (it is in a function file).
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    for iter = 1:num_iters
        h = X * theta;
        theta = theta - (alpha/m) * ((h - y)' * X)';
        % ============================================================
        % Save the cost J in every iteration
        J_history(iter) = computeCost(X, y, theta);
    end
end
You have used the for loop twice inside the function, which is unnecessary.
Try removing one of them (perhaps the one above the "YOUR CODE HERE" comment).
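As for the 'num_iters' undefined error itself: that message typically shows up when the function file is executed directly (for example with the editor's Run button), so gradientDescent gets invoked with no arguments and num_iters on line 1 is undefined. The function needs to be called with all five arguments from the Octave prompt or a script. A minimal sketch, assuming the exercise's ex1data1.txt data file:

data = load('ex1data1.txt');      % the course's single-feature data set
m = size(data, 1);
X = [ones(m, 1), data(:, 1)];     % add the x0 = 1 column
y = data(:, 2);
theta = zeros(2, 1);              % initial parameters
alpha = 0.01;                     % learning rate used in the exercise
num_iters = 1500;                 % number of gradient steps
[theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters);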
I am enrolled in Andrew Ng's Machine Learning course. There is an assignment, which I did complete of course, but the code I was using at first wasn't working, and in the end I had to look up the right answer on the Internet.
By "wasn't working" I mean that it worked for that particular question, but the graders check your code against a different training set, so the code has to work with all training sets to be considered correct. The problem was that my code wasn't working with other training sets.
To make it a bit more clear: all the other training sets were also supposed to have only one feature, that is, we only needed to implement linear regression with one variable.
I did everything right up to the part where we calculate the cost function. But for gradient descent, I had to write just the code for the gradient descent steps, and that's where it went wrong.
So here's my code -
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    % First step
    Prediction = X * theta;
    A = Prediction - y;
    theta(1) = theta(1) - ((alpha * (1/m)) * sum(A' * X(:,1)));
    % Second step
    Prediction = X * theta;
    A = Prediction - y;
    theta(2) = theta(2) - ((alpha * (1/m)) * sum(A' * X(:,2)));
    J_history(iter) = computeCost(X, y, theta);
end
end
Here is the code that works, which I found on the Internet, in one of the Stack Overflow threads:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    Prediction = X * theta;
    A = Prediction - y;
    theta = theta - ((alpha * (1/m)) * (A' * X)');
    J_history(iter) = computeCost(X, y, theta);
end
end
From what I can see, my code should work properly, but I don't know why it isn't working with other training sets. No matter how many training examples there are, y is always going to be a column vector, and X an m x 2 matrix once we add x0. So I don't see how my code could fail.
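The difference between the two versions is the update order. In the first one, theta(1) is updated and Prediction is then recomputed with the half-updated theta before theta(2) is touched, so the second update uses a different hypothesis than the first. Gradient descent calls for a simultaneous update: both gradients evaluated at the same theta, then both parameters changed together. How much the extra drift matters depends on the data and on alpha, which is why the code can pass one training set and fail another. A minimal sketch using temporaries:

for iter = 1:num_iters
    Prediction = X * theta;                            % one hypothesis for both updates
    A = Prediction - y;
    temp1 = theta(1) - ((alpha * (1/m)) * (A' * X(:,1)));
    temp2 = theta(2) - ((alpha * (1/m)) * (A' * X(:,2)));
    theta(1) = temp1;                                  % apply both updates together
    theta(2) = temp2;
    J_history(iter) = computeCost(X, y, theta);
end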
I've been trying to implement gradient descent in Octave. This is the code I have so far:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    % Debug output
    theta
    X
    y
    theta' .* X
    for inner = 1:length(theta)
        hypothesis = (X * theta - y)';
        % Updating the parameters
        temp0 = theta(1) - (alpha * (1/m) * hypothesis * X(:, 1));
        temp1 = theta(2) - (alpha * (1/m) * hypothesis * X(:, 2));
        theta(1) = temp0;
        theta(2) = temp1;
        J_history(iter) = computeCost(X, y, theta);
    end
end
I can't really tell what's going wrong with this code; it compiles and runs, but it's being auto-graded and it fails every time.
EDIT: Sorry, I wasn't specific. I was supposed to implement a single step of GD, not the whole loop.
EDIT 2: Here's the full thing. Only the stuff inside the for loop is relevant, imo.
EDIT 3: Both test cases fail, so there's something wrong with my calculations.
I think my problem is that I had an extra for loop in there for some reason.
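That matches the symptom: the loop body already updates both parameters, so the for inner = 1:length(theta) wrapper applies the gradient step twice per outer iteration, and a grader checking a single step sees the wrong theta. After removing the inner loop (and the debug prints), one quick sanity check, in the spirit of the hint in the template, is to confirm the recorded cost never increases. A minimal sketch, assuming X and y are already loaded and alpha is small enough:

[theta, J_history] = gradientDescent(X, y, zeros(2, 1), 0.01, 1500);
all(diff(J_history) <= 0)   % should print ans = 1 for a working implementation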
My question is based on the data from the Coursera course https://www.coursera.org/learn/machine-learning/, but after a search it appears to be a common problem.
Gradient descent works perfectly on normalized data (pic. 1), but goes in the wrong direction on the original data (pic. 2), with J (the cost function) growing very fast toward infinity. The difference between the parameter values is about 10^3.
I thought that normalization was only required for better execution speed, and I really can't see a reason for this growth in the cost function, even after a lot of searching. Decreasing alpha, e.g. to 0.001 or 0.0001, doesn't help either.
Please post if you have any ideas!
P.S. (I manually provided the matrices to the functions, where X_buf is the normalized version and X_basic the original; Y is the vector of all examples, Q the theta vector, and alpha the learning rate.)
function [theta, J_history] = gradientDescentMulti(X, Y, theta, alpha, num_iters)
m = length(Y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - (alpha/m) * X' * (X*theta - Y);
    J_history(iter) = computeCostMulti(X, Y, theta);
end
end
And the second function:
function J = computeCostMulti(X, Y, theta)
m = length(Y); % number of training examples
J = (1/(2*m)) * (X*theta - Y)' * (X*theta - Y);
end