Gradient Descent Octave Code - machine-learning

I need help completing this function. I'm getting an error while trying to compute derJ:
error: X(0,_): subscripts must be either integers 1 to (2^63)-1 or logicals
My code:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iters)
  m = length (y);  % number of training examples
  J_history = zeros (num_iters, 1);
  for iter = 1 : num_iters
    predictions = X * theta;  % hypothesis
    % derivative term for cost function
    derJ = (1 / m) * sum ( (predictions - y) * X(iter-1, 2) );
    % updating theta values
    theta = theta - (alpha * derJ);
    J_history(iter) = computeCost (X, y, theta);
  end
end

Your code states X(iter - 1, 2), but in your for loop iter starts from 1.
Therefore, in the very first iteration, X(iter - 1, 2) evaluates to X(0, 2), and 0 is not a valid index in Octave (or MATLAB), where array indexing starts at 1.
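Beyond the indexing error, note that the gradient should not depend on the loop counter at all. A minimal vectorized sketch of the whole function (assuming the usual computeCost from the course exercise, as defined in the thread below) would look like this:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iters)
  % Vectorized batch gradient descent for linear regression.
  m = length (y);  % number of training examples
  J_history = zeros (num_iters, 1);
  for iter = 1 : num_iters
    predictions = X * theta;  % m x 1 vector of hypothesis values
    grad = (1 / m) * (X' * (predictions - y));  % gradient of the cost w.r.t. theta
    theta = theta - alpha * grad;  % simultaneous update of all parameters
    J_history(iter) = computeCost (X, y, theta);
  end
end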

Octave / Gradient Descent code: GD works fine, but it won't save the output from the cost function

The Gradient Descent part of this code works fine, but can anyone tell me why it's not initialising (or populating) the vector 'J_history'?
Here's the principal code:
data = load('ex1data1.txt'); % 2 columns of data - a single x variable and a single y
y = data(:, 2);
m = length(y); % number of training examples (must be defined before it is used below)
X = [ones(m, 1), data(:,1)]; % adds a column of 1s to allow for an intercept term
theta = zeros(2, 1); % initialising the vector of coefficient estimates at [0; 0]
iterations = 1500; % how many gradient descent iterations to run
alpha = 0.01; % learning rate (adjustment speed)
theta = gradientDescent(X, y, theta, alpha, iterations); % call the GD function
The last line of the principal code calls on function gradientDescent:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1); % I don't understand why this doesn't initialise (or generate an error)!
  for iter = 1:num_iters
    theta = theta - ((alpha/m)*(X*theta-y)'*X)'; % adjusting the coefficients at each iteration
    J_history(iter) = computeCost(X, y, theta); % storing the cost at each iteration - again, I can't figure out why this doesn't work
  end
end
And the 'J_history' line in the code above calls on the function computeCost:
function J = computeCost(X, y, theta)
  m = length(y);
  J = 0;
  predictions = X*theta;
  sqrErrors = (predictions-y).^2;
  J = 1/(2*m)*sum(sqrErrors);
end
Thanks in advance for your help
You are calling your gradient function with only one output argument:
theta = gradientDescent(X, y, theta, alpha, iterations); %call the GD function
Your gradient function was defined to output two arguments:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
In order to get the second argument out, you need to actually call this function with both output arguments:
[ theta, J_history ] = gradientDescent( X, y, theta, alpha, iterations );
In a statement like [A,B,...] = funcname, the [A,B,...] isn't an array; it is special syntax that tells Octave how many output arguments to collect. If you only specify one output argument, as you have, all other output arguments are discarded.
See: https://docs.octave.org/latest/Assignment-Ops.html for details.
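As a quick sanity check (a minimal sketch; the plot styling is arbitrary), collect both outputs and confirm that J_history really is populated and decreasing:
[theta, J_history] = gradientDescent(X, y, theta, alpha, iterations);
disp(J_history(1:5)); % first few cost values - should be finite and decreasing
plot(1:numel(J_history), J_history); % the cost curve should fall and then flatten out
xlabel('iteration'); ylabel('J');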

Gradient function not able to find optimal theta but normal equation does

I tried implementing my own linear regression model in Octave with some sample data, but the theta it produces does not match the correct theta given by the normal equation. Running my model (with different alpha and iterations) on the data from Andrew Ng's machine learning course does give the proper theta for the hypothesis. I have tweaked alpha and the number of iterations so that the cost function decreases. Here is the plot of the cost against iterations: as you can see, the cost decreases and plateaus, but not to a low enough value. Can somebody help me understand why this is happening and what I can do to fix it?
Here is the data (The first column is the x values, and the second column is the y values):
20,48
40,55.1
60,56.3
80,61.2
100,68
Here is the graph of the data together with the lines fitted by gradient descent (GD) and by the normal equation (NE).
Code for the main script:
clear; close all; clc;
%loading the data
data = load("data1.txt");
X = data(:,1);
y = data(:,2);
%Plotting the data
figure
plot(X,y, 'xr', 'markersize', 7);
xlabel("Mass in kg");
ylabel("Length in cm");
X = [ones(length(y),1), X];
theta = ones(2, 1);
alpha = 0.000001; num_iter = 4000;
%Running gradientDescent
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);
%Running Normal equation
opt_theta_norm = pinv(X' * X) * X' * y;
%Plotting the hypothesis for GD and NE
hold on
plot(X(:,2), X * opt_theta);
plot(X(:,2), X * opt_theta_norm, 'g-.', "markersize",10);
legend("Data", "GD", "NE");
hold off
%Plotting values of previous J with each iteration
figure
plot(1:numel(J_history), J_history);
xlabel("iterations"); ylabel("J");
The gradientDescent function:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iter)
  m = length(y);
  J_history = zeros(num_iter, 1);
  for iter = 1:num_iter
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = computeCost(X, y, theta);
  endfor
endfunction
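For reference, the update line above implements the standard batch gradient descent rule, with the gradient written in matrix form:
\theta := \theta - \frac{\alpha}{m} X^{\top} (X\theta - y)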
Function for computing cost:
function J = computeCost (X, y, theta)
  J = 0;
  m = length(y);
  errors = X * theta - y;
  J = sum(errors .^ 2) / (2 * m);
endfunction
Try alpha = 0.0001 and num_iter = 400000. This will solve your problem!
The problem with your code is that the learning rate is far too small, which slows convergence, and you are also not giving the algorithm enough time to converge: 4000 iterations is far too few for such a small learning rate.
In short, the problem is: too small a learning rate combined with too few iterations.
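As a quick check (a minimal sketch reusing the question's own gradientDescent function and data), the suggested settings should bring the GD solution into line with the normal-equation solution:
alpha = 0.0001; num_iter = 400000;
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);
opt_theta_norm = pinv(X' * X) * X' * y;
disp([opt_theta, opt_theta_norm]); % the two columns should now be close
disp(J_history(end)); % final cost should be near its minimum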

Trouble Implementing Gradient Descent in Octave

I've been trying to implement gradient descent in Octave. This is the code I have so far:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  % ====================== YOUR CODE HERE ======================
  % Instructions: Perform a single gradient step on the parameter vector
  %               theta.
  %
  % Hint: While debugging, it can be useful to print out the values
  %       of the cost function (computeCost) and gradient here.
  %
  theta
  X
  y
  theta' .* X
  for inner = 1:length(theta)
    hypothesis = (X * theta - y)';
    % Updating the parameters
    temp0 = theta(1) - (alpha * (1/m) * hypothesis * X(:, 1));
    temp1 = theta(2) - (alpha * (1/m) * hypothesis * X(:, 2));
    theta(1) = temp0;
    theta(2) = temp1;
    J_history(iter) = computeCost(X, y, theta);
  end
end
I can't really tell what's going wrong with this code; it runs, but it's being auto-graded and it fails every time.
EDIT: Sorry, I wasn't specific. I was supposed to implement a single step of GD, not the whole loop.
EDIT 2: Here's the full thing. Only the code inside the for loop is relevant, in my opinion.
EDIT 3: Both test cases fail, so there's something wrong with my calculations.
I think my problem was the extra inner for loop: it applies the gradient step length(theta) times per outer iteration, so theta takes twice as many steps as intended and no longer matches the grader's expected values.
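A minimal sketch of the corrected loop body (assuming a two-parameter theta, as in the question) simply drops the inner loop so that each outer iteration performs exactly one simultaneous update:
for iter = 1:num_iters
  hypothesis = (X * theta - y)'; % 1 x m row vector of residuals
  temp0 = theta(1) - (alpha * (1/m) * hypothesis * X(:, 1));
  temp1 = theta(2) - (alpha * (1/m) * hypothesis * X(:, 2));
  theta(1) = temp0; % assign after computing both temps so the update is simultaneous
  theta(2) = temp1;
  J_history(iter) = computeCost(X, y, theta);
end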

In case of logistic regression, how should I interpret this learning curve between cost and number of examples?

I have obtained the following learning curve by plotting the error cost for the training and cross-validation sets against the number of training examples (in 100s in the graph). Can someone please tell me whether this learning curve is even possible? I am under the impression that the cross-validation error should decrease as the number of training examples increases.
Learning Curve. Note that the x axis denotes the number of training examples in 100s.
EDIT: This is the code I use to calculate the 9 values for plotting the learning curves.
X is the 2D matrix of the training set examples. It is of dimensions m x (n+1). y is of dimensions m x 1, and each element has value 1 or 0.
for j = 1:9
  disp(j)
  [theta, J] = trainClassifier(X(1:(j*100),:), y(1:(j*100)), lambda);
  [error_train(j), grad] = costprediciton_train(theta, X(1:(j*100),:), y(1:(j*100)));
  [error_cv(j), grad] = costfunction_test2(theta, Xcv(1:(j*100),:), ycv(1:(j*100)));
end
The code I use for finding the optimal value of Theta from the training set.
% Train the classifier. Return theta.
function [optTheta, J] = trainClassifier(X, y, lambda)
  [m, n] = size(X);
  initialTheta = zeros(n, 1);
  options = optimset('GradObj', 'on', 'MaxIter', 100);
  [optTheta, J, Exit_flag] = fminunc(@(t)(regularizedCostFunction(t, X, y, lambda)), initialTheta, options);
end
% Regularized cost
function [J, grad] = regularizedCostFunction(theta, X, y, lambda)
  [m, n] = size(X);
  h = sigmoid(X * theta);
  temp1 = -1 * (y .* log(h));
  temp2 = (1 - y) .* log(1 - h);
  thetaT = theta;
  thetaT(1) = 0; % exclude the intercept term from regularization
  correction = sum(thetaT .^ 2) * (lambda / (2 * m));
  J = sum(temp1 - temp2) / m + correction;
  grad = (X' * (h - y)) * (1/m) + thetaT * (lambda / m);
end
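For reference, this function computes the standard regularized logistic regression cost, with the intercept coefficient excluded from the penalty term:
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2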
The code I use for calculating the error cost for predictions on the training set (the code for the error cost of the CV set is similar):
Theta is of dimensions (n+1) x 1 and consists of the coefficients of the features in the hypothesis function.
function [J, grad] = costprediciton_train(theta, X, y)
  [m, n] = size(X);
  h = sigmoid(X * theta);
  temp1 = y .* log(h);
  temp2 = (1 - y) .* log(1 - h);
  J = -sum(temp1 + temp2) / m;
  t = h - y;
  grad = (X' * t) * (1/m);
end
function [J, grad] = costfunction_test2(theta, X, y)
  m = length(y);
  h = sigmoid(X * theta);
  temp1 = y .* log(h);
  temp2 = (1 - y) .* log(1 - h);
  J = -sum(temp1 + temp2) / m;
  grad = (X' * (h - y)) * (1/m);
end
The Sigmoid function:
function g = sigmoid(z)
  g = zeros(size(z));
  den = 1 + exp(-1 * z);
  g = 1 ./ den;
end

Why simple logistic regression requires millions of iterations to converge?

The problem is extremely simple; there are just 5 samples.
But gradient descent converges extremely slowly, taking a couple of million iterations.
Why? Is there a mistake in my algorithm?
P.S. The Julia code below:
X = [
  1.0 34.6237 78.0247;
  1.0 30.2867 43.895;
  1.0 35.8474 72.9022;
  1.0 60.1826 86.3086;
  1.0 79.0327 75.3444
]
Y = [0 0 0 1 1]'
sigmoid(z) = 1 / (1 + e ^ -z)

# Cost function.
function costJ(Theta, X, Y)
  m = length(Y)
  H = map(z -> sigmoid(z), (Theta' * X')')
  sum((-Y)' * log(H) - (1 - Y)' * log(1 - H)) / m
end

# Gradient.
function gradient(Theta, X, Y)
  m = length(Y)
  H = map(z -> sigmoid(z), (Theta' * X')')
  (((X' * H - X' * Y)') / m)'
end

# Gradient Descent.
function gradientDescent(X, Y, Theta, alpha, nIterations)
  m = length(Y)
  jHistory = Array(Float64, nIterations)
  for i = 1:nIterations
    jHistory[i] = costJ(Theta, X, Y)
    Theta = Theta - alpha * gradient(Theta, X, Y)
  end
  Theta, jHistory
end

gradientDescent(X, Y, [0 0 0]', 0.0001, 1000)
I think @colinefang's comment may be the right diagnosis. Try plotting jHistory - does it always decrease?
Another thing you can do is add a simple linesearch on each iteration to make sure the cost always decreases, something like:
function linesearch(g, X, Y, Theta; alpha=1.0)
  init_cost = costJ(Theta, X, Y)
  while costJ(Theta - alpha*g, X, Y) > init_cost
    alpha = alpha / 2.0 # or divide by some other constant > 1
  end
  return alpha
end
Then modify the gradient descent function slightly to search over alpha on each iteration:
for i = 1:nIterations
  g = gradient(Theta, X, Y)
  alpha = linesearch(g, X, Y, Theta)
  Theta = Theta - alpha * g
end
There are various performance enhancements you can make to the above code. I just wanted to show you the flavor.
