I've been trying to implement gradient descent in Octave. This is the code I have so far:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

    % Initialize some useful values
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);

    for iter = 1:num_iters
        % ====================== YOUR CODE HERE ======================
        % Instructions: Perform a single gradient step on the parameter vector
        %               theta.
        %
        % Hint: While debugging, it can be useful to print out the values
        %       of the cost function (computeCost) and gradient here.
        %

        % Debug output (each line prints its current value to the console)
        theta
        X
        y
        theta' .* X

        for inner = 1:length(theta)
            hypothesis = (X * theta - y)';
            % Updating the parameters
            temp0 = theta(1) - (alpha * (1/m) * hypothesis * X(:, 1));
            temp1 = theta(2) - (alpha * (1/m) * hypothesis * X(:, 2));
            theta(1) = temp0;
            theta(2) = temp1;
            J_history(iter) = computeCost(X, y, theta);
        end
    end
I can't really tell what's going wrong with this code; it runs without errors, but it's being auto-graded and fails every time.
EDIT: Sorry, I wasn't specific. I was supposed to implement a single step of gradient descent, not the whole loop.
EDIT 2: Here's the full function. Only the code inside the for loop is relevant, IMO.
EDIT 3: Both test cases fail, so there's something wrong with my calculations.
I think my problem is that I had an extra for loop in there for some reason.
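For reference, here is a minimal sketch of the loop body with the extra inner loop removed, using a single vectorized gradient step per iteration (this assumes computeCost is the course-provided cost function):

for iter = 1:num_iters
    % One batch gradient step: theta := theta - (alpha/m) * X' * (X*theta - y)
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    % Record the cost after each step
    J_history(iter) = computeCost(X, y, theta);
end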
So I am trying to solve the first programming exercise from Andrew Ng's ML Coursera course. I am having a little trouble implementing linear gradient descent in Octave. The code below shows what I am trying to implement, per the equation posted in the picture, but I am getting a different value from the expected one. I'm not sure what I am missing; I'm hoping someone can look through this.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

    % Initialize some useful values
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    theta0 = theta(1);
    theta1 = theta(2);
    temp0 = 0;
    temp1 = 0;
    errFunc = 0;

    for iter = 1:num_iters
        h = X * theta;
        errFunc = h - y;
        temp0 = temp0 + (alpha/m) .* sum(errFunc' * X(:, 1));
        temp1 = temp1 + (alpha/m) .* sum(errFunc' * X(:, 2));
        theta0 = theta0 - temp0;
        theta1 = theta1 - temp1;
        theta = [theta0; theta1];
        % ============================================================
        % Save the cost J in every iteration
        J_history(iter) = computeCost(X, y, theta);
    end
end
My expected results are [-3.6303; 1.1664], but I am getting [-1.361798; 0.931592]. This is the equation I am working with (the original image is not reproduced here):
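For reference, the standard batch gradient-descent update from the course, which this code is meant to implement, is:

$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}, \qquad h_\theta(x) = \theta_0 + \theta_1 x.$$

Note that the step is recomputed from the current theta on every iteration; it is not accumulated across iterations.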
I tried implementing my own linear regression model in Octave with some sample data, but the theta I get does not seem to be correct, and it does not match the theta given by the normal equation, which produces the correct values. Running my model (with different alpha and iterations) on the data from Andrew Ng's machine learning course does give the proper theta for the hypothesis. I have tweaked alpha and the number of iterations so that the cost function decreases. Here is a plot of the cost function against iterations: as you can see, the cost decreases and plateaus, but not to a low enough value. Can somebody help me understand why this is happening and what I can do to fix it?
Here is the data (the first column is the x values, the second column is the y values):
20,48
40,55.1
60,56.3
80,61.2
100,68
Here is the graph of the data and the lines fitted by gradient descent (GD) and by the normal equation (NE).
Code for the main script:
clear; close all; clc;
%loading the data
data = load("data1.txt");
X = data(:,1);
y = data(:,2);
%Plotting the data
figure
plot(X,y, 'xr', 'markersize', 7);
xlabel("Mass in kg");
ylabel("Length in cm");
X = [ones(length(y),1), X];
theta = ones(2, 1);
alpha = 0.000001; num_iter = 4000;
%Running gradientDescent
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);
%Running Normal equation
opt_theta_norm = pinv(X' * X) * X' * y;
%Plotting the hypothesis for GD and NE
hold on
plot(X(:,2), X * opt_theta);
plot(X(:,2), X * opt_theta_norm, 'g-.', "markersize",10);
legend("Data", "GD", "NE");
hold off
%Plotting the value of J at each iteration
figure
plot(1:numel(J_history), J_history);
xlabel("iterations"); ylabel("J");
The gradientDescent function:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iter)
    m = length(y);
    J_history = zeros(num_iter, 1);
    for iter = 1:num_iter
        % vectorized batch gradient step
        theta = theta - (alpha / m) * (X' * (X * theta - y));
        J_history(iter) = computeCost(X, y, theta);
    endfor
endfunction
Function for computing cost:
function J = computeCost (X, y, theta)
    m = length(y);
    errors = X * theta - y;
    % squared-error cost: J = (1/(2m)) * sum(errors.^2)
    J = sum(errors .^ 2) / (2 * m);
endfunction
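For reference, this computes the standard squared-error cost from the course:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2.$$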
Try alpha = 0.0001 and num_iter = 400000. This will solve your problem!
The problem with your code is that the learning rate is far too small, which slows convergence to a crawl. You are also not giving it enough time to converge: 4,000 training iterations is far too few at that learning rate.
In short: the learning rate is too small, and there are too few iterations.
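A minimal sketch of the suggested change in the main script above (everything else unchanged):

alpha = 0.0001; num_iter = 400000;  % larger step size, many more iterations
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);

With the original alpha = 0.000001, each step barely moves theta, so 4,000 iterations stop long before convergence.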
I am currently taking Andrew Ng's Coursera course on machine learning. For the week 2 assignment, I have to create a function that performs gradient descent. However, every time I call the function from the command line, it doesn't work and returns "error: 'num_iters' undefined near line 1 column 37". I have attached the code below (it is in a function file).
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

    % Initialize some useful values
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);

    for iter = 1:num_iters
        % ====================== YOUR CODE HERE ======================
        % Instructions: Perform a single gradient step on the parameter vector
        %               theta.
        %
        % Hint: While debugging, it can be useful to print out the values
        %       of the cost function (computeCost) and gradient here.
        %
        for iter = 1:num_iters
            h = X * theta;
            theta = theta - (alpha/m) * ((h - y)' * X)';
            % ============================================================
            % Save the cost J in every iteration
            J_history(iter) = computeCost(X, y, theta);
        end
    end
You have used a for loop twice inside the function, which is unnecessary.
Try removing one of them (maybe the one above the "YOUR CODE HERE" comment).
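For reference, a minimal sketch of the function with the duplicate loop removed, keeping the inner vectorized update (computeCost as provided by the course):

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);                   % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        h = X * theta;                                % hypothesis
        theta = theta - (alpha/m) * ((h - y)' * X)';  % one gradient step
        J_history(iter) = computeCost(X, y, theta);   % record cost
    end
end

With a single loop, the gradient step runs num_iters times instead of num_iters squared times, and J_history records one cost per iteration.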
I am enrolled in Andrew Ng's Machine Learning course. There is an assignment, which I did complete, of course... but the code I was using at first wasn't working, and in the end I had to look up the right answer on the Internet.
By "wasn't working" I mean that it worked for that particular question; the thing is, when my code is checked, it is run against a different training set, so it has to work with all training sets to be considered correct. The problem was that mine didn't work with other training sets.
To make it a bit clearer: all the other training sets also have only one feature; that is, we only needed to implement linear regression with one variable.
I did everything right up to the part where we calculate the cost function. But for the gradient-descent function, I only had to write the code for the gradient-descent steps themselves.
So here's my code:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        % First Step
        Prediction = X * theta;
        A = (Prediction - y);
        theta(1) = theta(1) - ((alpha * (1/m)) * sum(A' * X(:,1)));

        % Second Step
        Prediction = X * theta;
        A = (Prediction - y);
        theta(2) = theta(2) - ((alpha * (1/m)) * sum(A' * X(:,2)));

        J_history(iter) = computeCost(X, y, theta);
    end
end
Here is the code that works, which I found on the Internet, in one of the threads here on Stack Overflow:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        Prediction = X * theta;
        A = (Prediction - y);
        theta = theta - ((alpha * (1/m)) * (A' * X)');
        J_history(iter) = computeCost(X, y, theta);
    end
end
From what I can see, my code should work properly, but I don't know why it fails on other training sets. No matter how many training examples there are, y is always going to be a column vector, and X is m*2 once we add the x0 column. So I don't see how my code could fail.
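One concrete difference between the two versions, visible in the code itself: the first version recomputes Prediction between the two updates, so theta(2) is computed from a theta(1) that has already moved, while the vectorized version computes the error once and updates both components from the same error term. A minimal sketch of a per-parameter loop that keeps the update simultaneous (my illustration, not from the original thread):

for iter = 1:num_iters
    Prediction = X * theta;
    A = Prediction - y;                                     % one error term for both updates
    temp1 = theta(1) - (alpha * (1/m)) * sum(A .* X(:,1));
    temp2 = theta(2) - (alpha * (1/m)) * sum(A .* X(:,2));
    theta = [temp1; temp2];                                 % update both components together
    J_history(iter) = computeCost(X, y, theta);
end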
I need help completing this function. I'm getting an error while trying to compute derJ:
error: X(0,_): subscripts must be either integers 1 to (2^63)-1 or logicals
My code:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        predictions = X * theta; % hypothesis
        % derivative term for cost function
        derJ = (1 / m) * sum((predictions - y) * X(iter-1, 2));
        % updating theta values
        theta = theta - (alpha * derJ);
        J_history(iter) = computeCost(X, y, theta);
    end
end
Your code states X(iter - 1, 2), but in your for loop iter starts from 1.
Therefore, in the very first iteration, X(iter - 1, 2) evaluates to X(0, 2), and 0 is not a valid index in MATLAB or Octave (array indexing starts at 1).
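For what it's worth, here is a minimal sketch of the vectorized loop body (the same form used elsewhere in this thread): the batch gradient sums over all m training examples rather than indexing a single row with the iteration counter.

for iter = 1:num_iters
    predictions = X * theta;                  % hypothesis, m x 1
    derJ = (1/m) * (X' * (predictions - y));  % gradient for all parameters, 2 x 1
    theta = theta - alpha * derJ;             % simultaneous update
    J_history(iter) = computeCost(X, y, theta);
end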