Different answers for the same dataset on applying Linear Regression! Why? - machine-learning

I have written a code for linear regression as follows:
The file data.csv has two columns: X and Y
Here is my code.
import numpy as np
def gradientDescent(x, y, theta, alpha, m, numIterations):
xTrans = x.T
for i in range(numIterations):
print 'Iteration : ',i+1
hypo = np.dot(x, theta)
cost = np.sum((hypo-y)**2) / (2*m)
print 'Cost : ', cost
gradient = np.dot(xTrans, (hypo-y)) / m
theta = theta - alpha * gradient
print 'theta : ', theta
return theta
data = np.loadtxt('/Users/Nikesh/Downloads/linear_regression_live-master/data.csv', delimiter=',')
x = data[:, 0:1]
y = data[:, 1:2]
a = np.ones((100,1))
x = np.append(a, x, axis=1)
m, n = np.shape(x)
numIterations= 1000
alpha = 0.0005
theta = np.ones(n)
theta = theta[:, np.newaxis]
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
The final output after 1000 iterations:
Iteration : 1000 Cost : 56.014846105 theta : [[ 1.1395461 ] [
1.45709467]]
Here, I assume that y = 1.139 + 1.457X
The next code on the same dataset:
import numpy as np
from sklearn import linear_model
data = np.loadtxt('/Users/Nikesh/Downloads/linear_regression_live-master/data.csv', delimiter=',')
regr = linear_model.LinearRegression()
regr.fit(data[:, 0:1], data[:, 1:2])
print 'Co-efficients : ', regr.coef_
print 'Intercept : ', regr.intercept_
print 'Regression line : ',regr.intercept_,'+',regr.coef_,' X'
The Output is :
Co-efficients : [[ 1.32243102]] Intercept : [ 7.99102099]
Regression line : [ 7.99102099] + [[ 1.32243102]] X
And the final one which I found online.
On the same dataset, on applying Linear Regression (Gradient Descent Alg)
A different answer(take a look at this image)
Could someone help me where I am going wrong?
Here is the link for the dataset :
https://github.com/llSourcell/linear_regression_live/blob/master/data.csv

Related

Octave Logistic regression cost function error

I am doing Andrew Ng's ML course on Coursera. Week3 logistic regression cost function using Octave is giving me some errors. I think it's because of incorrect matrix multiplication. Can someone point out my mistakes please?
Data file for training data is located in File ex2Data1.txt which is available from here https://upscfever.com/upsc-fever/en/data/images/ex2.zip
data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);
[m, n] = size(X);
% Add intercept term to x and X_test
X = [ones(m, 1) X];
% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);
% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);
Code for my costFunction is as follows;
function [J, grad] = costFunction(theta, X, y)
% Initialize some useful values
m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));
%calculate hofX --> sigmoid theta'*X
hfX=sigmoid(theta'*X');
%cost --> bring '-' outside
J=-(1/m)*(y'*(log(hfX))')+(1-y)'*(log(1-hfX))';
%gradiant
fifth=(hfX-y)';
grad=(1/m)*(X'*fifth);
end
Code for the sigmoid function is as follows;
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
g = zeros(size(z));
g = (1./(1+e.^(-1*z)));
end

How do you multiply an openCV image by a 3 x 3 matrix?

An openCV h x w RGB image is an array of shape (h, w, 3). What numpy matrix operation will multiply each [B G R] pixel value by a 3 x 3 matrix M?
The desired result S with image A and matrix M is given by
S = np.empty_like(A)
h, w, c = A.shape
for i in range(h):
for j in range(w):
BGR = A[i, j]
for k in range(c):
S[i, j, k] = M[k][0] * BGR[0] + M[k][1] * BGR[1] + M[k][2] * BGR[2]
Is this what you are looking for?
import numpy as np
A = np.random.rand(10, 5, 3)
M = np.random.rand(3, 3)
S = np.einsum("ijk,ka->ija", A, M)
print(S.shape)
Gives (10, 5, 3)
#Ananda was very close. The desired result is given by
S = np.einsum("ijk,ak->ija", A, M)

Gradient function not able to find optimal theta but normal equation does

I tried implementing my own linear regression model in octave with some sample data but the theta does not seem to be correct and does not match the one provided by the normal equation which gives the correct values of theta. But running my model(with different alpha and iterations) on the data from Andrew Ng's machine learning course gives the proper theta for the hypothesis. I have tweaked alpha and iterations so that the cost function decreases. This is the image of cost function against iterations.. As you can see the cost decreases and plateaus but not to a low enough cost. Can somebody help me understand why this is happening and what I can do to fix it?
Here is the data (The first column is the x values, and the second column is the y values):
20,48
40,55.1
60,56.3
80,61.2
100,68
Here is the graph of the data and the equations plotted by gradient descent(GD) and by the normal equation(NE).
Code for the main script:
clear , close all, clc;
%loading the data
data = load("data1.txt");
X = data(:,1);
y = data(:,2);
%Plotting the data
figure
plot(X,y, 'xr', 'markersize', 7);
xlabel("Mass in kg");
ylabel("Length in cm");
X = [ones(length(y),1), X];
theta = ones(2, 1);
alpha = 0.000001; num_iter = 4000;
%Running gradientDescent
[opt_theta, J_history] = gradientDescent(X, y, theta, alpha, num_iter);
%Running Normal equation
opt_theta_norm = pinv(X' * X) * X' * y;
%Plotting the hypothesis for GD and NE
hold on
plot(X(:,2), X * opt_theta);
plot(X(:,2), X * opt_theta_norm, 'g-.', "markersize",10);
legend("Data", "GD", "NE");
hold off
%Plotting values of previous J with each iteration
figure
plot(1:numel(J_history), J_history);
xlabel("iterations"); ylabel("J");
Function for finding gradientDescent:
function [theta, J_history] = gradientDescent (X, y, theta, alpha, num_iter)
m = length(y);
J_history = zeros(num_iter,1);
for iter = 1:num_iter
theta = theta - (alpha / m) * (X' * (X * theta - y));
J_history(iter) = computeCost(X, y, theta);
endfor
endfunction
Function for computing cost:
function J = computeCost (X, y, theta)
J = 0;
m = length(y);
errors = X * theta - y;
J = sum(errors .^ 2) / (2 * m);
endfunction
Try alpha = 0.0001 and num_iter = 400000. This will solve your problem!
Now, the problem with your code is that the learning rate is way too less which is slowing down the convergence. Also, you are not giving it enough time to converge by limiting the training iterations to 4000 only which is very less given the learning rate.
Summarising, the problem is: less learning rate + less iterations.

In case of logistic regression, how should I interpret this learning curve between cost and number of examples?

I have obtained the following learning curve on plotting the learning curves for training and cross validation sets between the error cost, and number of training examples (in 100s in the graph). Can someone please tell me if this learning curve is ever possible? Because I am of the impression that the Cross validation error should decrease as the number of training examples increase.
Learning Curve. Note that the x axis denotes the number of training examples in 100s.
EDIT :
This is the code which I use to calculate the 9 values for plotting the learning curves.
X is the 2D matrix of the training set examples. It is of dimensions m x (n+1). y is of dimensions m x 1, and each element has value 1 or 0.
for j=1:9
disp(j)
[theta,J] = trainClassifier(X(1:(j*100),:),y(1:(j*100)),lambda);
[error_train(j), grad] = costprediciton_train(theta , X(1:(j*100),:), y(1:(j*100)));
[error_cv(j), grad] = costfunction_test2(theta , Xcv(1:(j*100),:),ycv(1:(j*100)));
end
The code I use for finding the optimal value of Theta from the training set.
% Train the classifer. Return theta
function [optTheta, J] = trainClassifier(X,y,lambda)
[m,n]=size(X);
initialTheta = zeros(n, 1);
options=optimset('GradObj','on','MaxIter',100);
[optTheta, J, Exit_flag ] = fminunc(#(t)(regularizedCostFunction(t, X, y, lambda)), initialTheta, options);
end
%regularized cost
function [J, grad] = regularizedCostFunction(theta, X, y,lambda)
[m,n]=size(X);
h=sigmoid( X * theta);
temp1 = -1 * (y .* log(h));
temp2 = (1 - y) .* log(1 - h);
thetaT = theta;
thetaT(1) = 0;
correction = sum(thetaT .^ 2) * (lambda / (2 * m));
J = sum(temp1 - temp2) / m + correction;
grad = (X' * (h - y)) * (1/m) + thetaT * (lambda / m);
end
The code I use for calculating the error cost for prediction of results for training set: (similar is the code for error cost of CV set)
Theta is of dimensions (n+1) x 1 and consists of the coefficients of the features in the hypothesis function.
function [J,grad] = costprediciton_train(theta , X, y)
[m,n]=size(X);
h=sigmoid(X * theta);
temp1 = y .* log(h);
temp2 = (1-y) .* log(1- h);
J = -sum (temp1 + temp2)/m;
t=h-y;
grad=(X'*t)*(1/m);
end
function [J,grad] = costfunction_test2(theta , X, y)
m= length(y);
h=sigmoid(X*theta);
temp1 = y .* log(h);
temp2 = (1-y) .* log(1- h);
J = -sum (temp1 + temp2)/m ;
grad = (X' * (h - y)) * (1/m) ;
end
The Sigmoid function:
function g = sigmoid(z)
g= zeros(size(z));
den=1 + exp(-1*z);
g = 1 ./ den;
end

Represent Linear Regression features in Gradient Descent numerically

The following piece of python code works well for finding gradient descent:
def gradientDescent(x, y, theta, alpha, m, numIterations):
xTrans = x.transpose()
for i in range(0, numIterations):
hypothesis = np.dot(x, theta)
loss = hypothesis - y
cost = np.sum(loss ** 2) / (2 * m)
print("Iteration %d | Cost: %f" % (i, cost))
gradient = np.dot(xTrans, loss) / m
theta = theta - alpha * gradient
return theta
Here, x = m*n (m = no. of sample data and n = total features) feature matrix.
However, if my features are non-numerical (say, director and genre) of '2' movies then my feature matrix may look like:
['Peter Jackson', 'Action'
Sergio Leone', 'Comedy']
In such a case, how can I map these features to numerical values and apply gradient descent ?
You can map your features to numerical value of your choice and then apply gradient descent the usual way.
In python you can use panda to do that easily:
import pandas as pd
df = pd.DataFrame(X, ['director', 'genre'])
df.director = df.director.map({'Peter Jackson': 0, 'Sergio Leone': 1})
df.genre = df.genre.map({'Action': 0, 'Comedy': 1})
As you can see, this way can become pretty complicated and it might be better to write a piece of code doing that dynamically.

Resources