Cost value doesn't converge - machine-learning

I'm trying code a logistic regression but I'm in trouble getting a convergent COST, can anyone help me? Below are my codes. Thank you!
#input:
m = 3, n = 4
# we have 3 training examples and each of them has 4 features (Sorry, I know it looks weired here). Y is a label matrix.
X = np.array([[1,2,1],[1,1,0],[1,2,1],[1,0,2]])
Y = np.array([[0,1,0]])
h = 100000 #iterations
alpha = 0.05 #learning rate
b = 0 #scalar bias
W = np.zeros(n).reshape(1,n) #weights
J = np.zeros(h).reshape(1,h) #a vector for holing cost value
Yhat = np.zeros(m).reshape(1,m) #predicted value
def activation(yhat):
return 1/(1+np.exp(-yhat))
W=W.T
for g in range(h):
m = X.T.shape[0]
Y_hat = activation(X.dot(W)+b)
cost = -1/m * np.sum(Y*np.log(Y_hat)+(1-Y)*np.log(1-Y_hat))
current_error = Y.T - Y_hat
dW = 1/m * np.dot(X.T, current_error)
db = 1/m * np.sum(current_error)
W = W + alpha * dW
b = b + alpha * db
J[0][g] = cost

Related

Implementing linear regression from scratch in python

I'm trying to Implement linear regression in python using the following gradient decent formulas (Notice that these formulas are after partial derive)
slope
y_intercept
but the code keeps giving me wearied results ,I think (I'm not sure) that the error is in the gradient_descent function
import numpy as np
class LinearRegression:
def __init__(self , x:np.ndarray ,y:np.ndarray):
self.x = x
self.m = len(x)
self.y = y
def calculate_predictions(self ,slope:int , y_intercept:int) -> np.ndarray: # Calculate y hat.
predictions = []
for x in self.x:
predictions.append(slope * x + y_intercept)
return predictions
def calculate_error_cost(self , y_hat:np.ndarray) -> int:
error_valuse = []
for i in range(self.m):
error_valuse.append((y_hat[i] - self.y[i] )** 2)
error = (1/(2*self.m)) * sum(error_valuse)
return error
def gradient_descent(self):
costs = []
# initialization values
temp_w = 0
temp_b = 0
a = 0.001 # Learning rate
while True:
y_hat = self.calculate_predictions(slope=temp_w , y_intercept= temp_b)
sum_w = 0
sum_b = 0
for i in range(len(self.x)):
sum_w += (y_hat[i] - self.y[i] ) * self.x[i]
sum_b += (y_hat[i] - self.y[i] )
w = temp_w - a * ((1/self.m) *sum_w)
b = temp_b - a * ((1/self.m) *sum_b)
temp_w = w
temp_b = b
costs.append(self.calculate_error_cost(y_hat))
try:
if costs[-1] > costs[-2]: # If global minimum reached
return [w,b]
except IndexError:
pass
I Used this dataset:-
https://www.kaggle.com/datasets/tanuprabhu/linear-regression-dataset?resource=download
after downloading it like this:
import pandas
p = pandas.read_csv('linear_regression_dataset.csv')
l = LinearRegression(x= p['X'] , y= p['Y'])
print(l.gradient_descent())
But It's giving me [-568.1905905426412, -2.833321633515304] Which is decently not accurate.
I want to implement the algorithm not using external modules like scikit-learn for learning purposes.
I tested the calculate_error_cost function and it worked as expected and I don't think that there is an error in the calculate_predictions function
One small problem you have is that you are returning the last values of w and b, when you should be returning the second-to-last parameters (because they yield a lower cost). This should not really matter that much... unless your learning rate is too high and you are immediately getting a higher value for the cost function on the second iteration. This I believe is your real problem, judging from the dataset you shared.
The algorithm does work on the dataset, but you need to change the learning rate. I ran it in the example below and it gave the result shown in the image. One caveat is that I added a limit to the iterations to avoid the algorithm from taking too long (and only marginally improving the result).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
class LinearRegression:
def __init__(self , x:np.ndarray ,y:np.ndarray):
self.x = x
self.m = len(x)
self.y = y
def calculate_predictions(self ,slope:int , y_intercept:int) -> np.ndarray: # Calculate y hat.
predictions = []
for x in self.x:
predictions.append(slope * x + y_intercept)
return predictions
def calculate_error_cost(self , y_hat:np.ndarray) -> int:
error_valuse = []
for i in range(self.m):
error_valuse.append((y_hat[i] - self.y[i] )** 2)
error = (1/(2*self.m)) * sum(error_valuse)
return error
def gradient_descent(self):
costs = []
# initialization values
temp_w = 0
temp_b = 0
iteration = 0
a = 0.00001 # Learning rate
while iteration < 1000:
y_hat = self.calculate_predictions(slope=temp_w , y_intercept= temp_b)
sum_w = 0
sum_b = 0
for i in range(len(self.x)):
sum_w += (y_hat[i] - self.y[i] ) * self.x[i]
sum_b += (y_hat[i] - self.y[i] )
w = temp_w - a * ((1/self.m) *sum_w)
b = temp_b - a * ((1/self.m) *sum_b)
costs.append(self.calculate_error_cost(y_hat))
try:
if costs[-1] > costs[-2]: # If global minimum reached
print(costs)
return [temp_w,temp_b]
except IndexError:
pass
temp_w = w
temp_b = b
iteration += 1
print(iteration)
return [temp_w,temp_b]
p = pd.read_csv('linear_regression_dataset.csv')
x_data = p['X']
y_data = p['Y']
lin_reg = LinearRegression(x_data, y_data)
y_hat = lin_reg.calculate_predictions(*lin_reg.gradient_descent())
fig = plt.figure()
plt.plot(x_data, y_data, 'r.', label='Data')
plt.plot(x_data, y_hat, 'b-', label='Linear Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

Pytorch, slicing tensor causes RuntimeError:: one of the variables needed for gradient computation has been modified by an inplace operation:

I wrote a RNN with LSTM cell with Pycharm. The peculiarity of this network is that the output of the RNN is fed into a integration opeartion, computed with Runge-kutta.
The integration takes some input and propagate that in time one step ahead. In order to do so I need to slice the feature tensor X along the batch dimension, and pass this to the Runge-kutta.
class MyLSTM(torch.nn.Module):
def __init__(self, ni, no, sampling_interval, nh=10, nlayers=1):
super(MyLSTM, self).__init__()
self.device = torch.device("cpu")
self.dtype = torch.float
self.ni = ni
self.no = no
self.nh = nh
self.nlayers = nlayers
self.lstms = torch.nn.ModuleList(
[torch.nn.LSTMCell(self.ni, self.nh)] + [torch.nn.LSTMCell(self.nh, self.nh) for i in range(nlayers - 1)])
self.out = torch.nn.Linear(self.nh, self.no)
self.do = torch.nn.Dropout(p=0.2)
self.actfn = torch.nn.Sigmoid()
self.sampling_interval = sampling_interval
self.scaler_states = None
# Options
# description of the whole block
def forward(self, x, h0, train=False, integrate_ode=True):
x0 = x.clone().requires_grad_(True)
hs = x # initiate hidden state
if h0 is None:
h = torch.zeros(hs.shape[0], self.nh, device=self.device)
c = torch.zeros(hs.shape[0], self.nh, device=self.device)
else:
(h, c) = h0
# LSTM cells
for i in range(self.nlayers):
h, c = self.lstms[i](hs, (h, c))
if train:
hs = self.do(h)
else:
hs = h
# Output layer
# y = self.actfn(self.out(hs))
y = self.out(hs)
if integrate_ode:
p = y
y = self.integrate(x0, p)
return y, (h, c)
def integrate(self, x0, p):
# RK4 steps per interval
M = 4
DT = self.sampling_interval / M
X = x0
# X = self.scaler_features.inverse_transform(x0)
for b in range(X.shape[0]):
xx = X[b, :]
for j in range(M):
k1 = self.ode(xx, p[b, :])
k2 = self.ode(xx + DT / 2 * k1, p[b, :])
k3 = self.ode(xx + DT / 2 * k2, p[b, :])
k4 = self.ode(xx + DT * k3, p[b, :])
xx = xx + DT / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
X_all[b, :] = xx
return X_all
def ode(self, x0, y):
# Here I a dynamic model
I get this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 64; expected version 63 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
the problem is in the operations xx = X[b, :] and p[b,:]. I know that because I choose batch dimension of 1, then I can replace the previous two equations with xx=X and p, and this works. How can split the tensor without loosing the gradient?
I had the same question, and after a lot of searching, I added .detach() function after "h" and "c" in the RNN cell.

How to solve logistic regression using gradient descent in octave?

I am learning Machine Learning course from coursera from Andrews Ng. I have written a code for logistic regression in octave. But, it is not working. Can someone help me?
I have taken the dataset from the following link:
Titanic survivors
Here is my code:
pkg load io;
[An, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A2:H1000");
# As per CSV file we are reading columns from 1 to 7. 8-th column is Survived, which is what we are going to predict
X = [An(:, [1:7])];
Y = [An(:, 8)];
X = horzcat(ones(size(X,1), 1), X);
# Initializing theta values as zero for all
#theta = zeros(size(X,2),1);
theta = [-3;1;1;-3;1;1;1;1];
learningRate = -0.00021;
#learningRate = -0.00011;
# Step 1: Calculate Hypothesis
function g_z = estimateHypothesis(X, theta)
z = theta' * X';
z = z';
e_z = -1 * power(2.72, z);
denominator = 1.+e_z;
g_z = 1./denominator;
endfunction
# Step 2: Calculate Cost function
function cost = estimateCostFunction(hypothesis, Y)
log_1 = log(hypothesis);
log_2 = log(1.-hypothesis);
y1 = Y;
term_1 = y1.*log_1;
y2 = 1.-Y;
term_2 = y2.*log_2;
cost = term_1 + term_2;
cost = sum(cost);
# no.of.rows
m = size(Y, 1);
cost = -1 * (cost/m);
endfunction
# Step 3: Using gradient descent I am updating theta values
function updatedTheta = updateThetaValues(_X, _Y, _theta, _hypothesis, learningRate)
#s1 = _X * _theta;
#s2 = s1 - _Y;
#s3 = _X' * s2;
# no.of.rows
#m = size(_Y, 1);
#s4 = (learningRate * s3)/m;
#updatedTheta = _theta - s4;
s1 = _hypothesis - _Y;
s2 = s1 .* _X;
s3 = sum(s2);
# no.of.rows
m = size(_Y, 1);
s4 = (learningRate * s3)/m;
updatedTheta = _theta .- s4';
endfunction
costVector = [];
iterationVector = [];
for i = 1:1000
# Step 1
hypothesis = estimateHypothesis(X, theta);
#disp("hypothesis");
#disp(hypothesis);
# Step 2
cost = estimateCostFunction(hypothesis, Y);
costVector = vertcat(costVector, cost);
#disp("Cost");
#disp(cost);
# Step 3 - Updating theta values
theta = updateThetaValues(X, Y, theta, hypothesis, learningRate);
iterationVector = vertcat(iterationVector, i);
endfor
function plotGraph(iterationVector, costVector)
plot(iterationVector, costVector);
ylabel('Cost Function');
xlabel('Iteration');
endfunction
plotGraph(iterationVector, costVector);
This is the graph I am getting when I am plotting against no.of.iterations and cost function.
I am tired by adjusting theta values and learning rate. Can someone help me to solve this problem.
Thanks.
I have done a mathematical error. I should have used either power(2.72, -z) or exp(-z). Instead I have used as -1 * power(2.72, z). Now, I'm getting a proper curve.
Thanks.

Gradient Descent cost function explosion

I am writing this code for linear regression and trying Gradient Descent to minimize the RSS. The cost function seems to explode to infinity within 12 iterations. I know this is not supposed to happen. Maybe, I have used the wrong gradient function for RSS (can be seen in the function "grad()")?
NumberObservations=100
minVal=1
maxVal=20
X = np.random.uniform(minVal,maxVal,(NumberObservations,1))
e = np.random.normal(0, 1, (NumberObservations,1))
Y= 10 + 5*X + e
B = np.array([[0], [0]])
sum_y = sum(Y)
sum_x = sum(X)
sum_xy = sum(np.multiply(X, Y))
sum_x2 = sum(X*X)
alpha = 0.00001
iterations = 15
def cost_fun(X, Y, B):
b0 = B[0]
b1 = B[1]
s = (Y - (b0 + (b1*X)))**2
rss = sum(s)
return rss
def grad(X, Y, B):
print("B = " + str(B))
b0 = B[0]
b1 = B[1]
g0 = -2*(Y - b0 - (b1*X))
g1 = -2*((X*Y) - (b0*X) - (b1*X**2))
grad = np.concatenate((g0, g1), axis = 1)
return grad
def gradient_descent(X, Y, B, alpha, iterations):
cost_history = [0] * iterations
m = len(Y)
x0 = np.array(np.ones(m))
x0 = x0.reshape((100, 1))
X1 = np.concatenate((x0, X), axis = 1)
for iteration in range(iterations):
h = np.dot(X1, B)
h = h.reshape((100, 1))
loss = h - Y
g = grad(X, Y, B)
gradient = (np.dot(g.T, loss) / m)
B = B - alpha * gradient
cost = cost_fun(X, Y, B)
cost_history[iteration] = cost
print("Iteration %d | Cost: %f" % (iteration, cost))
print("-----------------------------------------------------------------------")
return B, cost_history
newB, cost_history = gradient_descent(X, Y, B, alpha, iterations)
# New Values of B
print(newB)
Please help.

How to derive an objective function for a multi-class logistic regression classifier using 1-of-k encoding?

I get what this wiki page says(http://en.wikipedia.org/wiki/Multinomial_logistic_regression), but I don't know how to get the update rules for stochastic gradient descent. Sorry to ask this here(this is really just about machine learning theories instead of actual implementation). Could someone provide a solution with explanation? Thanks in advance!
I happened to write code to implent softmax, I refer most to the page http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression
this is the code I wrote in matlab ,hope it will help
function y = sigmoid_multi(weight,x,class_index)
%% weight feature_dim * class_num
%% x feature_dim * 1
%% class_index scalar
sum = eps;
class_num = size(weight,2);
for i = 1:class_num
sum = sum + exp(weight(:,i)'*x);
end
y = exp(weight(:,class_index)'*x)/sum;
end
function g = gradient(train_patterns,train_labels,weight)
m = size(train_patterns,2);
class_num = size(weight,2);
g = zeros(size(weight));
for j = 1:class_num
for i = 1:m
if(train_labels(i) == j)
g(:,j) = g(:,j) + (1 - log( sigmoid_multi(weight,train_patterns(:,i),j) + eps))*train_patterns(:,i);
end
end
end
g = -(g/m);
end
function J = object_function(train_patterns,train_labels,weight)
m = size(train_patterns,2);
J = 0;
for i = 1:m
J = J + log( sigmoid_multi(weight,train_patterns(:,i),train_labels(i)) + eps);
end
J = -(J/m);
end
function weight = multi_logistic_train(train_patterns,train_labels,alpha)
%% weight feature_dim * class_num
%% train_patterns featur_dim * sample_num
%% train_labels 1 * sample_num
%% alpha scalar
class_num = length(unique(train_labels));
m = size(train_patterns,2); %% sample_number;
n = size(train_patterns,1); % feature_dim;
weight = rand(n,class_num);
for i = 1:40
J = object_function(train_patterns,train_labels,weight);
fprintf('objec function value : %f\n',J);
weight = weight - alpha*gradient(train_patterns,train_labels,weight);
end
end

Resources