Logistic Regression not able to find value of theta

I have a hundred entries in a CSV file:
Physics,Maths,Status_class0or1
30,40,0
90,70,1
Using the above data, I am trying to build a logistic (binary) classifier.
Please advise me where I am going wrong. Why am I getting theta as a 3x3 matrix (9 values, whereas it should be only 3)?
Here is the code:
# Importing the libraries
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Reading data from the CSV file
df = pd.read_csv("LogisticRegressionFirstBinaryClassifier.csv", header=None)
df.columns = ["Maths", "Physics", "AdmissionStatus"]
X = np.array(df[["Maths", "Physics"]])
y = np.array(df[["AdmissionStatus"]])
X = preprocessing.normalize(X)
X = np.c_[np.ones(X.shape[0]), X]
theta = np.ones((X.shape[1], 1))

print(X.shape)      # (100, 3)
print(y.shape)      # (100, 1)
print(theta.shape)  # (3, 1)

# calc_z computes the dot product of X and theta
def calc_z(X, theta):
    return np.dot(X, theta)

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Cost function
def cost_function(X, y, theta):
    z = calc_z(X, theta)
    h = sigmoid(z)
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

print("cost_function =", cost_function(X, y, theta))

def derivativeofcostfunction(X, y, theta):
    z = calc_z(X, theta)
    h = sigmoid(z)
    calculation = np.dot((h - y).T, X)
    return calculation

print("derivativeofcostfunction =", derivativeofcostfunction(X, y, theta))

def grad_desc(X, y, theta, lr=.001, converge_change=.001):
    cost = cost_function(X, y, theta)
    change_cost = 1
    num_iter = 1
    while change_cost > converge_change:
        old_cost = cost
        print(theta)
        print(derivativeofcostfunction(X, y, theta))
        theta = theta - lr * derivativeofcostfunction(X, y, theta)
        cost = cost_function(X, y, theta)
        change_cost = old_cost - cost
        num_iter += 1
    return theta, num_iter
Here is the output:
[[ 0.4185146 -0.56877556 0.63999433]
[15.39722864 9.73995197 11.07882445]
[12.77277463 7.93485324 9.24909626]]
[[0.33944777 0.58199037 0.52493407]
[0.02106587 0.36300629 0.30297278]
[0.07040604 0.3969297 0.33737757]]
[[-0.05856159 -0.89826735 0.30849185]
[15.18035041 9.59004868 10.92827046]
[12.4804775 7.73302024 9.04599788]]
[[0.33950634 0.58288863 0.52462558]
[0.00588552 0.35341624 0.29204451]
[0.05792556 0.38919668 0.32833157]]
[[-5.17526527e-01 -1.21534937e+00 -1.03387571e-02]
[ 1.49729502e+01 9.44663458e+00 1.07843504e+01]
[ 1.21978140e+01 7.53778010e+00 8.84964495e+00]]
(array([[ 0.34002386, 0.58410398, 0.52463592],
[-0.00908743, 0.34396961, 0.28126016],
[ 0.04572775, 0.3816589 , 0.31948193]]), 46)

I changed this code by just adding a transpose to the returned matrix, and it fixed my issue. Without the transpose, np.dot((h - y).T, X) returns a (1, 3) row vector, and theta - lr * (...) broadcasts the (3, 1) theta against that row into a (3, 3) matrix, which is where the 9 values came from.

def derivativeofcostfunction(X, y, theta):
    z = calc_z(X, theta)
    h = sigmoid(z)
    calculation = np.dot((h - y).T, X)
    return calculation.T
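
For reference, a minimal sketch of the shape problem (my addition; the values below are placeholders, not data from the question):

import numpy as np

theta = np.ones((3, 1))     # column vector of parameters
grad_row = np.ones((1, 3))  # gradient returned as a row vector, as in the original code

# NumPy broadcasts (3, 1) minus (1, 3) into a (3, 3) matrix: the 9 values seen above.
print((theta - 0.001 * grad_row).shape)    # (3, 3)
# Transposing the gradient keeps theta a (3, 1) column vector.
print((theta - 0.001 * grad_row.T).shape)  # (3, 1)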

Related

The neural network's sigmoid values become too large after several training epochs and it does not learn

I'm implementing a fully connected neural network for MNIST (not convolutional!) and I'm having a problem. When I make multiple forward and backward passes, the exponents get abnormally large and Python is unable to compute them. It seems to me that I implemented backward_pass incorrectly. Could you help me with this? Here are the network settings:
w_1 = np.random.uniform(-0.5, 0.5, (128, 784))
b_1 = np.random.uniform(-0.5, 0.5, (128, 1))
w_2 = np.random.uniform(-0.5, 0.5, (10, 128))
b_2 = np.random.uniform(-0.5, 0.5, (10, 1))
X_train shape: (784, 31500)
y_train shape: (31500,)
X_test shape: (784, 10500)
y_test shape: (10500,)
def sigmoid(x, alpha):
    return 1 / (1 + np.exp(-alpha * x))

def dx_sigmoid(x, alpha):
    exp_neg_x = np.exp(-alpha * x)
    return alpha * exp_neg_x / ((1 + exp_neg_x)**2)

def ReLU(x):
    return np.maximum(0, x)

def dx_ReLU(x):
    return np.where(x > 0, 1, 0)

def one_hot(y):
    one_hot_y = np.zeros((y.size, y.max() + 1))
    one_hot_y[np.arange(y.size), y] = 1
    one_hot_y = one_hot_y.T
    return one_hot_y

def forward_pass(X, w_1, b_1, w_2, b_2):
    layer_1 = np.dot(w_1, X) + b_1
    layer_1_act = ReLU(layer_1)
    layer_2 = np.dot(w_2, layer_1_act) + b_2
    layer_2_act = sigmoid(layer_2, 0.01)
    return layer_1, layer_1_act, layer_2, layer_2_act

def backward_pass(layer_1, layer_1_act, layer_2, layer_2_act, X, y, w_2):
    one_hot_y = one_hot(y)
    n_samples = one_hot_y.shape[1]
    d_loss_by_layer_2_act = (2 / n_samples) * np.sum(one_hot_y - layer_2_act, axis=1).reshape(-1, 1)
    d_layer_2_act_by_layer_2 = dx_sigmoid(layer_2, 0.01)
    d_loss_by_layer_2 = d_loss_by_layer_2_act * d_layer_2_act_by_layer_2
    d_layer_2_by_w_2 = layer_1_act.T
    d_loss_by_w_2 = np.dot(d_loss_by_layer_2, d_layer_2_by_w_2)
    d_loss_by_b_2 = np.sum(d_loss_by_layer_2, axis=1).reshape(-1, 1)
    d_layer_2_by_layer_1_act = w_2.T
    d_loss_by_layer_1_act = np.dot(d_layer_2_by_layer_1_act, d_loss_by_layer_2)
    d_layer_1_act_by_layer_1 = dx_ReLU(layer_1)
    d_loss_by_layer_1 = d_loss_by_layer_1_act * d_layer_1_act_by_layer_1
    d_layer_1_by_w_1 = X.T
    d_loss_by_w_1 = np.dot(d_loss_by_layer_1, d_layer_1_by_w_1)
    d_loss_by_b_1 = np.sum(d_loss_by_layer_1, axis=1).reshape(-1, 1)
    return d_loss_by_w_1, d_loss_by_b_1, d_loss_by_w_2, d_loss_by_b_2

for epoch in range(epochs):
    layer_1, layer_1_act, layer_2, layer_2_act = forward_pass(X_train, w_1, b_1, w_2, b_2)
    d_loss_by_w_1, d_loss_by_b_1, d_loss_by_w_2, d_loss_by_b_2 = backward_pass(layer_1, layer_1_act,
                                                                               layer_2, layer_2_act,
                                                                               X_train, y_train,
                                                                               w_2)
    w_1 -= learning_rate * d_loss_by_w_1
    b_1 -= learning_rate * d_loss_by_b_1
    w_2 -= learning_rate * d_loss_by_w_2
    b_2 -= learning_rate * d_loss_by_b_2
    _, _, _, predictions = forward_pass(X_train, w_1, b_1, w_2, b_2)
    predictions = predictions.argmax(axis=0)
    accuracy = accuracy_score(predictions, y_train)
    print(f"epoch: {epoch} / accuracy: {accuracy}")
My loss is MSE: (1 / n_samples) * np.sum((one_hot_y - layer_2_act)**2, axis=0)
These are my hand-written derivative calculations (attached as two images).
I tried decreasing the learning rate, adding an alpha coefficient to the exponent (e^(-alpha * x) in the sigmoid), and dividing my entire sample by 255, and still the program cannot learn because the numbers become too large.
To start with, the uniform initialization you are using has a relatively large standard deviation. For a linear layer it should be about 1/sqrt(fan_in), which for the first layer gives:

1 / np.sqrt(128)
0.08838834764831843

which means:

w_1 = np.random.uniform(-0.08, 0.08, (128, 784))
...

Also, I did not check your forward and backward passes; assuming they are correct and you still see very big values in your activations, you could normalize them as well (for example with an implementation of batch norm or layer norm) to force them to be centred around zero with unit std.

P.S.: I also noticed you are doing multi-class classification, so MSE is not a good choice; use softmax or log-softmax (easier implementation). The loss not moving fast enough could also be linked to a poor learning rate. And are your inputs normalized? You could plot the distributions of the layers and see if they look good.
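
As a rough sketch of that last suggestion (my addition, assuming the question's layout with classes along axis 0 and samples along axis 1), a numerically stable log-softmax plus cross-entropy could look like this:

import numpy as np

def log_softmax(z):
    # Subtract the per-sample max before exponentiating, for numerical stability.
    z_shifted = z - z.max(axis=0, keepdims=True)
    return z_shifted - np.log(np.sum(np.exp(z_shifted), axis=0, keepdims=True))

def cross_entropy(log_probs, one_hot_y):
    # Mean negative log-likelihood over the batch (samples along axis 1).
    return -np.sum(one_hot_y * log_probs) / one_hot_y.shape[1]

# A convenient property: with softmax plus cross-entropy, the gradient with
# respect to the pre-activation simplifies to (softmax(z) - one_hot_y) / n_samples.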

Implementing linear regression from scratch in Python

I'm trying to implement linear regression in Python using the following gradient descent update formulas (note that these are the result of taking the partial derivatives of the cost):
slope:       w := w - a * (1/m) * Σ_i (ŷ_i - y_i) * x_i
y_intercept: b := b - a * (1/m) * Σ_i (ŷ_i - y_i)
but the code keeps giving me weird results; I think (I'm not sure) that the error is in the gradient_descent function.
import numpy as np

class LinearRegression:
    def __init__(self, x: np.ndarray, y: np.ndarray):
        self.x = x
        self.m = len(x)
        self.y = y

    def calculate_predictions(self, slope: int, y_intercept: int) -> np.ndarray:  # Calculate y hat.
        predictions = []
        for x in self.x:
            predictions.append(slope * x + y_intercept)
        return predictions

    def calculate_error_cost(self, y_hat: np.ndarray) -> int:
        error_valuse = []
        for i in range(self.m):
            error_valuse.append((y_hat[i] - self.y[i]) ** 2)
        error = (1 / (2 * self.m)) * sum(error_valuse)
        return error

    def gradient_descent(self):
        costs = []
        # initialization values
        temp_w = 0
        temp_b = 0
        a = 0.001  # Learning rate
        while True:
            y_hat = self.calculate_predictions(slope=temp_w, y_intercept=temp_b)
            sum_w = 0
            sum_b = 0
            for i in range(len(self.x)):
                sum_w += (y_hat[i] - self.y[i]) * self.x[i]
                sum_b += (y_hat[i] - self.y[i])
            w = temp_w - a * ((1 / self.m) * sum_w)
            b = temp_b - a * ((1 / self.m) * sum_b)
            temp_w = w
            temp_b = b
            costs.append(self.calculate_error_cost(y_hat))
            try:
                if costs[-1] > costs[-2]:  # If global minimum reached
                    return [w, b]
            except IndexError:
                pass
I used this dataset:
https://www.kaggle.com/datasets/tanuprabhu/linear-regression-dataset?resource=download
After downloading it, I load it like this:

import pandas
p = pandas.read_csv('linear_regression_dataset.csv')
l = LinearRegression(x=p['X'], y=p['Y'])
print(l.gradient_descent())

But it gives me [-568.1905905426412, -2.833321633515304], which is clearly not accurate.
I want to implement the algorithm myself, without external modules like scikit-learn, for learning purposes.
I tested the calculate_error_cost function and it works as expected, and I don't think there is an error in the calculate_predictions function.
One small problem you have is that you are returning the last values of w and b, when you should be returning the second-to-last parameters (because they yield a lower cost). This should not really matter that much... unless your learning rate is too high and you immediately get a higher value of the cost function on the second iteration. This, I believe, is your real problem, judging from the dataset you shared.
The algorithm does work on the dataset, but you need to change the learning rate. I ran it in the example below and it gave the result shown in the image. One caveat is that I added a limit on the iterations to keep the algorithm from taking too long (and only marginally improving the result).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

class LinearRegression:
    def __init__(self, x: np.ndarray, y: np.ndarray):
        self.x = x
        self.m = len(x)
        self.y = y

    def calculate_predictions(self, slope: int, y_intercept: int) -> np.ndarray:  # Calculate y hat.
        predictions = []
        for x in self.x:
            predictions.append(slope * x + y_intercept)
        return predictions

    def calculate_error_cost(self, y_hat: np.ndarray) -> int:
        error_valuse = []
        for i in range(self.m):
            error_valuse.append((y_hat[i] - self.y[i]) ** 2)
        error = (1 / (2 * self.m)) * sum(error_valuse)
        return error

    def gradient_descent(self):
        costs = []
        # initialization values
        temp_w = 0
        temp_b = 0
        iteration = 0
        a = 0.00001  # Learning rate
        while iteration < 1000:
            y_hat = self.calculate_predictions(slope=temp_w, y_intercept=temp_b)
            sum_w = 0
            sum_b = 0
            for i in range(len(self.x)):
                sum_w += (y_hat[i] - self.y[i]) * self.x[i]
                sum_b += (y_hat[i] - self.y[i])
            w = temp_w - a * ((1 / self.m) * sum_w)
            b = temp_b - a * ((1 / self.m) * sum_b)
            costs.append(self.calculate_error_cost(y_hat))
            try:
                if costs[-1] > costs[-2]:  # If global minimum reached
                    print(costs)
                    return [temp_w, temp_b]
            except IndexError:
                pass
            temp_w = w
            temp_b = b
            iteration += 1
        print(iteration)
        return [temp_w, temp_b]

p = pd.read_csv('linear_regression_dataset.csv')
x_data = p['X']
y_data = p['Y']
lin_reg = LinearRegression(x_data, y_data)
y_hat = lin_reg.calculate_predictions(*lin_reg.gradient_descent())

fig = plt.figure()
plt.plot(x_data, y_data, 'r.', label='Data')
plt.plot(x_data, y_hat, 'b-', label='Linear Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
p = pd.read_csv('linear_regression_dataset.csv')
x_data = p['X']
y_data = p['Y']
lin_reg = LinearRegression(x_data, y_data)
y_hat = lin_reg.calculate_predictions(*lin_reg.gradient_descent())
fig = plt.figure()
plt.plot(x_data, y_data, 'r.', label='Data')
plt.plot(x_data, y_hat, 'b-', label='Linear Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
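
As a sanity check (my addition, not part of the original answer), the fitted slope and intercept can be compared against the closed-form least-squares solution:

import numpy as np
import pandas as pd

p = pd.read_csv('linear_regression_dataset.csv')
x = p['X'].to_numpy()
y = p['Y'].to_numpy()

# Ordinary least squares in closed form:
# w = cov(x, y) / var(x), b = mean(y) - w * mean(x)
w = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b = y.mean() - w * x.mean()
print(w, b)  # gradient descent should converge toward these values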

PyTorch: slicing a tensor causes RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I wrote an RNN with an LSTM cell in PyCharm. The peculiarity of this network is that the output of the RNN is fed into an integration operation, computed with Runge-Kutta.
The integration takes some input and propagates it one step ahead in time. In order to do so, I need to slice the feature tensor X along the batch dimension and pass the slice to the Runge-Kutta routine.
class MyLSTM(torch.nn.Module):
    def __init__(self, ni, no, sampling_interval, nh=10, nlayers=1):
        super(MyLSTM, self).__init__()
        self.device = torch.device("cpu")
        self.dtype = torch.float
        self.ni = ni
        self.no = no
        self.nh = nh
        self.nlayers = nlayers
        self.lstms = torch.nn.ModuleList(
            [torch.nn.LSTMCell(self.ni, self.nh)] + [torch.nn.LSTMCell(self.nh, self.nh) for i in range(nlayers - 1)])
        self.out = torch.nn.Linear(self.nh, self.no)
        self.do = torch.nn.Dropout(p=0.2)
        self.actfn = torch.nn.Sigmoid()
        self.sampling_interval = sampling_interval
        self.scaler_states = None

    # Options
    # description of the whole block
    def forward(self, x, h0, train=False, integrate_ode=True):
        x0 = x.clone().requires_grad_(True)
        hs = x  # initiate hidden state
        if h0 is None:
            h = torch.zeros(hs.shape[0], self.nh, device=self.device)
            c = torch.zeros(hs.shape[0], self.nh, device=self.device)
        else:
            (h, c) = h0

        # LSTM cells
        for i in range(self.nlayers):
            h, c = self.lstms[i](hs, (h, c))
            if train:
                hs = self.do(h)
            else:
                hs = h

        # Output layer
        # y = self.actfn(self.out(hs))
        y = self.out(hs)
        if integrate_ode:
            p = y
            y = self.integrate(x0, p)
        return y, (h, c)

    def integrate(self, x0, p):
        # RK4 steps per interval
        M = 4
        DT = self.sampling_interval / M
        X = x0
        # X = self.scaler_features.inverse_transform(x0)
        for b in range(X.shape[0]):
            xx = X[b, :]
            for j in range(M):
                k1 = self.ode(xx, p[b, :])
                k2 = self.ode(xx + DT / 2 * k1, p[b, :])
                k3 = self.ode(xx + DT / 2 * k2, p[b, :])
                k4 = self.ode(xx + DT * k3, p[b, :])
                xx = xx + DT / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            X_all[b, :] = xx  # X_all is allocated elsewhere (not shown in the snippet)
        return X_all

    def ode(self, x0, y):
        # Here I have a dynamic model
        pass
I get this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 64; expected version 63 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
The problem is in the operations xx = X[b, :] and p[b, :]. I know that because when I choose a batch dimension of 1, I can replace the two slices with xx = X and p, and then it works. How can I split the tensor without losing the gradient?
I had the same question, and after a lot of searching, I added a .detach() call after "h" and "c" in the RNN cell.
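
For what it's worth, a common pattern for this particular error (my sketch, not from the original answer) is to avoid the in-place write into the preallocated X_all entirely: collect the per-sample results in a Python list and torch.stack them, so autograd never sees an in-place assignment:

import torch

def integrate(x0, p, ode, sampling_interval):
    # RK4 with M sub-steps; x0 and p are assumed to have shape (batch, features).
    M = 4
    DT = sampling_interval / M
    outputs = []
    for b in range(x0.shape[0]):
        xx = x0[b, :]
        for _ in range(M):
            k1 = ode(xx, p[b, :])
            k2 = ode(xx + DT / 2 * k1, p[b, :])
            k3 = ode(xx + DT / 2 * k2, p[b, :])
            k4 = ode(xx + DT * k3, p[b, :])
            xx = xx + DT / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        outputs.append(xx)
    # torch.stack builds a fresh tensor from the list, so no in-place writes occur.
    return torch.stack(outputs, dim=0)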

How to solve logistic regression using gradient descent in Octave?

I am taking the Machine Learning course on Coursera by Andrew Ng. I have written code for logistic regression in Octave, but it is not working. Can someone help me?
I have taken the dataset from the following link:
Titanic survivors
Here is my code:
pkg load io;
[An, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A2:H1000");

# As per the CSV file we are reading columns 1 to 7. The 8th column is Survived, which is what we are going to predict.
X = [An(:, [1:7])];
Y = [An(:, 8)];
X = horzcat(ones(size(X,1), 1), X);

# Initializing theta values as zero for all
#theta = zeros(size(X,2),1);
theta = [-3;1;1;-3;1;1;1;1];
learningRate = -0.00021;
#learningRate = -0.00011;

# Step 1: Calculate hypothesis
function g_z = estimateHypothesis(X, theta)
  z = theta' * X';
  z = z';
  e_z = -1 * power(2.72, z);
  denominator = 1.+e_z;
  g_z = 1./denominator;
endfunction

# Step 2: Calculate cost function
function cost = estimateCostFunction(hypothesis, Y)
  log_1 = log(hypothesis);
  log_2 = log(1.-hypothesis);
  y1 = Y;
  term_1 = y1.*log_1;
  y2 = 1.-Y;
  term_2 = y2.*log_2;
  cost = term_1 + term_2;
  cost = sum(cost);
  # no. of rows
  m = size(Y, 1);
  cost = -1 * (cost/m);
endfunction

# Step 3: Update theta values using gradient descent
function updatedTheta = updateThetaValues(_X, _Y, _theta, _hypothesis, learningRate)
  #s1 = _X * _theta;
  #s2 = s1 - _Y;
  #s3 = _X' * s2;
  ## no. of rows
  #m = size(_Y, 1);
  #s4 = (learningRate * s3)/m;
  #updatedTheta = _theta - s4;
  s1 = _hypothesis - _Y;
  s2 = s1 .* _X;
  s3 = sum(s2);
  # no. of rows
  m = size(_Y, 1);
  s4 = (learningRate * s3)/m;
  updatedTheta = _theta .- s4';
endfunction

costVector = [];
iterationVector = [];
for i = 1:1000
  # Step 1
  hypothesis = estimateHypothesis(X, theta);
  #disp("hypothesis");
  #disp(hypothesis);

  # Step 2
  cost = estimateCostFunction(hypothesis, Y);
  costVector = vertcat(costVector, cost);
  #disp("Cost");
  #disp(cost);

  # Step 3 - Updating theta values
  theta = updateThetaValues(X, Y, theta, hypothesis, learningRate);
  iterationVector = vertcat(iterationVector, i);
endfor

function plotGraph(iterationVector, costVector)
  plot(iterationVector, costVector);
  ylabel('Cost Function');
  xlabel('Iteration');
endfunction

plotGraph(iterationVector, costVector);
This is the graph I get when plotting the cost function against the number of iterations.
I am tired of adjusting theta values and the learning rate. Can someone help me solve this problem?
Thanks.
I had made a mathematical error. I should have used either power(2.72, -z) or exp(-z); instead I used -1 * power(2.72, z), which computes -e^z rather than e^(-z), so the "sigmoid" became 1/(1 - e^z) instead of 1/(1 + e^(-z)). Now I'm getting a proper curve.
Thanks.
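
A quick numeric illustration of the difference (a Python sketch of the same arithmetic, since the bug is language-independent):

import numpy as np

z = np.array([-2.0, 0.5, 2.0])
buggy = 1 / (1 - np.exp(z))     # what 1 ./ (1 .+ (-1 * power(2.72, z))) effectively computes
correct = 1 / (1 + np.exp(-z))  # the intended sigmoid

print(buggy)    # values fall outside (0, 1), so log(1 - h) in the cost blows up
print(correct)  # values stay inside (0, 1)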

How to plot the hypothesis function on a graph by substituting the values of theta 0 and theta 1

This is the hypothesis function: h(x) = theta_0 + theta_1 * x.
After setting theta_0 to 0 and theta_1 to 0.5, how do I plot it on a graph?
It is the same way that we graph linear equations. Assume h(x) is y, the thetas are constants, and x is x. So we basically have a linear expression like y = m + p * x (m and p are constants). To simplify it further, take the function y = 2 + 4x. To plot this, we just take values of x from a range, say (0, 5); for each value of x we then get the corresponding value of y, so our (x, y) set looks like ([0, 1, 2, 3, 4], [2, 6, 10, 14, 18]). Now the graph can be plotted, as we know both the x and y coordinates, as in the sketch below.
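
A minimal sketch of that worked example (my addition, using matplotlib):

import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [2 + 4 * xi for xi in x]  # y = 2 + 4x from the example above
plt.plot(x, y, marker='o')    # plot the (x, y) pairs and join them with a line
plt.show()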
You simply plot the line equation y = 0 + 0.5 * x, so you get something like this plot.
Here's how I did it with Python:

import matplotlib.pyplot as plt
import numpy as np

theta_0 = 0
theta_1 = 0.5

def h(x):
    return theta_0 + theta_1 * x

x = range(-100, 100)
y = list(map(h, x))  # wrap map in list() so matplotlib receives a sequence (Python 3)
plt.plot(x, y)
plt.ylabel(r'$h_\theta(x)$')
plt.xlabel(r'$x$')
plt.title(r'Plot of $h_\theta(x) = \theta_0 + \theta_1 \cdot x$')
plt.text(60, .025, r'$\theta_0=0,\ \theta_1=0.5$')
plt.show()
