Gradient Descent cost function explosion - machine-learning

I am writing this code for linear regression, trying gradient descent to minimize the RSS. The cost function explodes to infinity within 12 iterations, which is not supposed to happen. Maybe I have used the wrong gradient function for the RSS (see the function grad() below)?
import numpy as np

NumberObservations = 100
minVal = 1
maxVal = 20
X = np.random.uniform(minVal, maxVal, (NumberObservations, 1))
e = np.random.normal(0, 1, (NumberObservations, 1))
Y = 10 + 5*X + e
B = np.array([[0], [0]])
sum_y = sum(Y)
sum_x = sum(X)
sum_xy = sum(np.multiply(X, Y))
sum_x2 = sum(X*X)
alpha = 0.00001
iterations = 15

def cost_fun(X, Y, B):
    b0 = B[0]
    b1 = B[1]
    s = (Y - (b0 + (b1*X)))**2
    rss = sum(s)
    return rss

def grad(X, Y, B):
    print("B = " + str(B))
    b0 = B[0]
    b1 = B[1]
    g0 = -2*(Y - b0 - (b1*X))
    g1 = -2*((X*Y) - (b0*X) - (b1*X**2))
    grad = np.concatenate((g0, g1), axis=1)  # (100, 2): one row of gradient terms per observation
    return grad

def gradient_descent(X, Y, B, alpha, iterations):
    cost_history = [0] * iterations
    m = len(Y)
    x0 = np.array(np.ones(m))
    x0 = x0.reshape((100, 1))
    X1 = np.concatenate((x0, X), axis=1)  # design matrix with a column of ones
    for iteration in range(iterations):
        h = np.dot(X1, B)
        h = h.reshape((100, 1))
        loss = h - Y
        g = grad(X, Y, B)
        gradient = np.dot(g.T, loss) / m  # (2, 1)
        B = B - alpha * gradient
        cost = cost_fun(X, Y, B)
        cost_history[iteration] = cost
        print("Iteration %d | Cost: %f" % (iteration, cost))
        print("-----------------------------------------------------------------------")
    return B, cost_history

newB, cost_history = gradient_descent(X, Y, B, alpha, iterations)
# New Values of B
print(newB)
Please help.
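For reference, differentiating RSS = sum((y - b0 - b1*x)^2) gives dRSS/db0 = -2*sum(y - b0 - b1*x) and dRSS/db1 = -2*sum(x*(y - b0 - b1*x)). The g0 and g1 in grad() are the per-observation terms before summation, so np.dot(g.T, loss) multiplies each term by the residual a second time; that product is no longer the RSS gradient, which would explain the blow-up. A minimal sketch of an update that sums the terms directly (reusing X, Y, alpha, and iterations from the code above) could look like this:

import numpy as np

def rss_grad(X, Y, B):
    # residuals for the current parameters: y - (b0 + b1*x), shape (100, 1)
    r = Y - (B[0] + B[1] * X)
    g0 = -2 * np.sum(r)      # dRSS/db0
    g1 = -2 * np.sum(X * r)  # dRSS/db1
    return np.array([[g0], [g1]])

B_hat = np.array([[0.0], [0.0]])
for _ in range(iterations):
    B_hat = B_hat - alpha * rss_grad(X, Y, B_hat)
print(B_hat)  # should drift toward [[10], [5]] instead of exploding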

Related

How does Roblox's math.noise() deal with negative inputs?

While messing around with noise outside of Roblox, I realized that Perlin/simplex noise does not like negative inputs. Remembering that Roblox has a noise function, I tried it there and found that negative numbers do work nicely with Roblox's math.noise(). Does anybody know how they made this work, or how to get negative numbers to work for Perlin/simplex noise in general?
The simplex noise I am using (copied from here, but changed to use a bitwise AND operation):
local function bit_and(a, b) -- bitwise AND operation
    local p, c = 1, 0
    while a > 0 and b > 0 do
        local ra, rb = a%2, b%2
        if (ra + rb) > 1 then
            c = c + p
        end
        a = (a - ra) / 2
        b = (b - rb) / 2
        p = p * 2
    end
    return c
end
-- 2D simplex noise
local grad3 = {
{1,1,0},{-1,1,0},{1,-1,0},{-1,-1,0},
{1,0,1},{-1,0,1},{1,0,-1},{-1,0,-1},
{0,1,1},{0,-1,1},{0,1,-1},{0,-1,-1}
}
local p = {151,160,137,91,90,15,
131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,
190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,
102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,
223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,
129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,
138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180}
local perm = {}
for i=0,511 do
    perm[i+1] = p[bit_and(i, 255) + 1]
end

local function dot(g, ...)
    local v = {...}
    local sum = 0
    for i=1,#v do
        sum = sum + v[i] * g[i]
    end
    return sum
end
local noise = {}

function noise.produce(xin, yin)
    local n0, n1, n2 -- Noise contributions from the three corners
    -- Skew the input space to determine which simplex cell we're in
    local F2 = 0.5*(math.sqrt(3.0)-1.0)
    local s = (xin+yin)*F2; -- Hairy factor for 2D
    local i = math.floor(xin+s)
    local j = math.floor(yin+s)
    local G2 = (3.0-math.sqrt(3.0))/6.0
    local t = (i+j)*G2
    local X0 = i-t -- Unskew the cell origin back to (x,y) space
    local Y0 = j-t
    local x0 = xin-X0 -- The x,y distances from the cell origin
    local y0 = yin-Y0
    -- For the 2D case, the simplex shape is an equilateral triangle.
    -- Determine which simplex we are in.
    local i1, j1 -- Offsets for second (middle) corner of simplex in (i,j) coords
    if x0 > y0 then
        i1 = 1
        j1 = 0 -- lower triangle, XY order: (0,0)->(1,0)->(1,1)
    else
        i1 = 0
        j1 = 1 -- upper triangle, YX order: (0,0)->(0,1)->(1,1)
    end
    -- A step of (1,0) in (i,j) means a step of (1-c,-c) in (x,y), and
    -- a step of (0,1) in (i,j) means a step of (-c,1-c) in (x,y), where
    -- c = (3-sqrt(3))/6
    local x1 = x0 - i1 + G2 -- Offsets for middle corner in (x,y) unskewed coords
    local y1 = y0 - j1 + G2
    local x2 = x0 - 1 + 2 * G2 -- Offsets for last corner in (x,y) unskewed coords
    local y2 = y0 - 1 + 2 * G2
    -- Work out the hashed gradient indices of the three simplex corners
    local ii = bit_and(i, 255)
    local jj = bit_and(j, 255)
    local gi0 = perm[ii + perm[jj+1]+1] % 12
    local gi1 = perm[ii + i1 + perm[jj + j1+1]+1] % 12
    local gi2 = perm[ii + 1 + perm[jj + 1+1]+1] % 12
    -- Calculate the contribution from the three corners
    local t0 = 0.5 - x0 * x0 - y0 * y0
    if t0 < 0 then
        n0 = 0.0
    else
        t0 = t0 * t0
        n0 = t0 * t0 * dot(grad3[gi0+1], x0, y0) -- (x,y) of grad3 used for 2D gradient
    end
    local t1 = 0.5 - x1 * x1 - y1 * y1
    if t1 < 0 then
        n1 = 0.0
    else
        t1 = t1 * t1
        n1 = t1 * t1 * dot(grad3[gi1+1], x1, y1)
    end
    local t2 = 0.5 - x2 * x2 - y2 * y2
    if t2 < 0 then
        n2 = 0.0
    else
        t2 = t2 * t2
        n2 = t2 * t2 * dot(grad3[gi2+1], x2, y2)
    end
    -- Add contributions from each corner to get the final noise value.
    -- The result is scaled to return values in the interval [-1,1].
    return 70.0 * (n0 + n1 + n2)
end

return noise
The version of the Lua programming language that Roblox uses, Luau, has actually been open-source since November 2021. You can find it here. The math library can be found in the file lmathlib.cpp, and it contains the math.noise function along with the internal functions used to calculate it: perlin (the main function), grad, lerp, and fade. It's quite complicated and I can't explain it myself, but I have converted it into Lua here.
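A side note on the port above: the bit_and helper only works for non-negative integers (its while loop requires a > 0 and b > 0, so any negative coordinate hashes to 0), which is one reason conversions like this misbehave for negative inputs while Roblox's built-in math.noise() does not. A common fix is to wrap lattice coordinates with a floored modulo before indexing the permutation table. A minimal Python sketch of the idea (illustrative, not Roblox's actual implementation):

import math

PERM_SIZE = 256  # size of the classic Perlin/simplex permutation table

def wrap_index(coord):
    # math.floor rounds toward negative infinity (-1.3 -> -2), and % with a
    # positive modulus always returns a non-negative result in Python,
    # so negative lattice coordinates map cleanly into the table.
    return int(math.floor(coord)) % PERM_SIZE

print(wrap_index(-1.3))   # 254
print(wrap_index(300.7))  # 44

In Lua the equivalent is simply i % 256, since Lua's % operator is also floored; replacing bit_and(i, 255) with i % 256 in the code above is a common way to support negative coordinates.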

PyTorch: slicing a tensor causes "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation"

I wrote an RNN with an LSTM cell in PyCharm. The peculiarity of this network is that the output of the RNN is fed into an integration operation, computed with Runge-Kutta.
The integration takes some input and propagates it one step ahead in time. To do so I need to slice the feature tensor X along the batch dimension and pass the slice to the Runge-Kutta routine.
import torch

class MyLSTM(torch.nn.Module):
    def __init__(self, ni, no, sampling_interval, nh=10, nlayers=1):
        super(MyLSTM, self).__init__()
        self.device = torch.device("cpu")
        self.dtype = torch.float
        self.ni = ni
        self.no = no
        self.nh = nh
        self.nlayers = nlayers
        self.lstms = torch.nn.ModuleList(
            [torch.nn.LSTMCell(self.ni, self.nh)] +
            [torch.nn.LSTMCell(self.nh, self.nh) for i in range(nlayers - 1)])
        self.out = torch.nn.Linear(self.nh, self.no)
        self.do = torch.nn.Dropout(p=0.2)
        self.actfn = torch.nn.Sigmoid()
        self.sampling_interval = sampling_interval
        self.scaler_states = None
        # Options

    # description of the whole block
    def forward(self, x, h0, train=False, integrate_ode=True):
        x0 = x.clone().requires_grad_(True)
        hs = x  # initiate hidden state
        if h0 is None:
            h = torch.zeros(hs.shape[0], self.nh, device=self.device)
            c = torch.zeros(hs.shape[0], self.nh, device=self.device)
        else:
            (h, c) = h0
        # LSTM cells
        for i in range(self.nlayers):
            h, c = self.lstms[i](hs, (h, c))
            if train:
                hs = self.do(h)
            else:
                hs = h
        # Output layer
        # y = self.actfn(self.out(hs))
        y = self.out(hs)
        if integrate_ode:
            p = y
            y = self.integrate(x0, p)
        return y, (h, c)

    def integrate(self, x0, p):
        # RK4 steps per interval
        M = 4
        DT = self.sampling_interval / M
        X = x0
        # X = self.scaler_features.inverse_transform(x0)
        X_all = torch.zeros_like(X)  # holds the propagated states (not defined in the original snippet)
        for b in range(X.shape[0]):
            xx = X[b, :]
            for j in range(M):
                k1 = self.ode(xx, p[b, :])
                k2 = self.ode(xx + DT / 2 * k1, p[b, :])
                k3 = self.ode(xx + DT / 2 * k2, p[b, :])
                k4 = self.ode(xx + DT * k3, p[b, :])
                xx = xx + DT / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            X_all[b, :] = xx
        return X_all

    def ode(self, x0, y):
        # Here I have a dynamic model
        ...
I get this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 64; expected version 63 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
The problem is in the operations xx = X[b, :] and p[b, :]. I know that because when I choose a batch dimension of 1, I can replace those two expressions with xx = X and p, and it works. How can I split the tensor without losing the gradient?
I had the same question, and after a lot of searching, I added a .detach() call after "h" and "c" in the RNN cell.
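Note that detaching h and c also stops gradients from flowing back through the recurrent state, which may not be what you want. If the failing in-place operation is the write X_all[b, :] = xx into a tensor autograd still tracks, an alternative that keeps the graph intact is to collect the per-sample results in a Python list and stack them at the end. A minimal sketch of integrate rewritten that way (same structure as above):

def integrate(self, x0, p):
    M = 4  # RK4 steps per interval
    DT = self.sampling_interval / M
    outputs = []
    for b in range(x0.shape[0]):
        xx = x0[b, :]
        for _ in range(M):
            k1 = self.ode(xx, p[b, :])
            k2 = self.ode(xx + DT / 2 * k1, p[b, :])
            k3 = self.ode(xx + DT / 2 * k2, p[b, :])
            k4 = self.ode(xx + DT * k3, p[b, :])
            xx = xx + DT / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        outputs.append(xx)
    # torch.stack builds a fresh tensor from the list, so nothing that
    # autograd still needs is modified in place
    return torch.stack(outputs, dim=0)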

Logistic Regression not able to find value of theta

I have a hundred entries in a CSV file:
Physics,Maths,Status_class0or1
30,40,0
90,70,1
Using the above data I am trying to build a logistic (binary) classifier.
Please advise me where I am going wrong. Why am I getting the answer as a 3×3 matrix (9 values of theta, whereas there should be only 3)?
Here is the code:
Importing the libraries:
import numpy as np
import pandas as pd
from sklearn import preprocessing
Reading data from the CSV file:
df = pd.read_csv("LogisticRegressionFirstBinaryClassifier.csv", header=None)
df.columns = ["Maths", "Physics", "AdmissionStatus"]
X = np.array(df[["Maths", "Physics"]])
y = np.array(df[["AdmissionStatus"]])
X = preprocessing.normalize(X)
X = np.c_[np.ones(X.shape[0]), X]
theta = np.ones((X.shape[1], 1))
print(X.shape) # (100, 3)
print(y.shape) # (100, 1)
print(theta.shape) # (3, 1)
calc_z to calculate the dot product of X and theta:
def calc_z(X, theta):
    return np.dot(X, theta)
Sigmoid function:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
Cost function:
def cost_function(X, y, theta):
    z = calc_z(X, theta)
    h = sigmoid(z)
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
print("cost_function =", cost_function(X, y, theta))
def derivativeofcostfunction(X, y, theta):
    z = calc_z(X, theta)
    h = sigmoid(z)
    calculation = np.dot((h - y).T, X)
    return calculation
print("derivativeofcostfunction=", derivativeofcostfunction(X, y, theta))
def grad_desc(X, y, theta, lr=.001, converge_change=.001):
    cost = cost_function(X, y, theta)
    change_cost = 1
    num_iter = 1
    while change_cost > converge_change:
        old_cost = cost
        print(theta)
        print(derivativeofcostfunction(X, y, theta))
        theta = theta - lr*(derivativeofcostfunction(X, y, theta))
        cost = cost_function(X, y, theta)
        change_cost = old_cost - cost
        num_iter += 1
    return theta, num_iter
Here is the output:
[[ 0.4185146 -0.56877556 0.63999433]
[15.39722864 9.73995197 11.07882445]
[12.77277463 7.93485324 9.24909626]]
[[0.33944777 0.58199037 0.52493407]
[0.02106587 0.36300629 0.30297278]
[0.07040604 0.3969297 0.33737757]]
[[-0.05856159 -0.89826735 0.30849185]
[15.18035041 9.59004868 10.92827046]
[12.4804775 7.73302024 9.04599788]]
[[0.33950634 0.58288863 0.52462558]
[0.00588552 0.35341624 0.29204451]
[0.05792556 0.38919668 0.32833157]]
[[-5.17526527e-01 -1.21534937e+00 -1.03387571e-02]
[ 1.49729502e+01 9.44663458e+00 1.07843504e+01]
[ 1.21978140e+01 7.53778010e+00 8.84964495e+00]]
(array([[ 0.34002386, 0.58410398, 0.52463592],
[-0.00908743, 0.34396961, 0.28126016],
[ 0.04572775, 0.3816589 , 0.31948193]]), 46)
I changed the code, just adding a transpose when returning the matrix, and it fixed my issue.
def derivativeofcostfunction(X, y, theta):
    z = calc_z(X, theta)
    h = sigmoid(z)
    calculation = np.dot((h - y).T, X)
    return calculation.T
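The shapes explain the 3×3 result: np.dot((h - y).T, X) has shape (1, 3) while theta has shape (3, 1), so theta - lr * derivative broadcasts to a (3, 3) matrix instead of performing an elementwise update; transposing the derivative restores matching shapes. A standalone NumPy check of that broadcasting (illustrative values):

import numpy as np

theta = np.ones((3, 1))  # column vector, shape (3, 1)
deriv = np.ones((1, 3))  # row vector, shape (1, 3)
print((theta - 0.001 * deriv).shape)    # (3, 3): broadcast, not an update
print((theta - 0.001 * deriv.T).shape)  # (3, 1): the intended elementwise update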

How to solve logistic regression using gradient descent in octave?

I am taking the Machine Learning course on Coursera by Andrew Ng. I have written code for logistic regression in Octave, but it is not working. Can someone help me?
I have taken the dataset from the following link:
Titanic survivors
Here is my code:
pkg load io;
[An, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A2:H1000");
# As per the CSV file we are reading columns 1 to 7. The 8th column is Survived, which is what we are going to predict
X = [An(:, [1:7])];
Y = [An(:, 8)];
X = horzcat(ones(size(X,1), 1), X);
# Initializing theta values
#theta = zeros(size(X,2),1);
theta = [-3;1;1;-3;1;1;1;1];
learningRate = -0.00021;
#learningRate = -0.00011;

# Step 1: Calculate the hypothesis
function g_z = estimateHypothesis(X, theta)
  z = theta' * X';
  z = z';
  e_z = -1 * power(2.72, z);
  denominator = 1.+e_z;
  g_z = 1./denominator;
endfunction

# Step 2: Calculate the cost function
function cost = estimateCostFunction(hypothesis, Y)
  log_1 = log(hypothesis);
  log_2 = log(1.-hypothesis);
  y1 = Y;
  term_1 = y1.*log_1;
  y2 = 1.-Y;
  term_2 = y2.*log_2;
  cost = term_1 + term_2;
  cost = sum(cost);
  # number of rows
  m = size(Y, 1);
  cost = -1 * (cost/m);
endfunction

# Step 3: Using gradient descent, update the theta values
function updatedTheta = updateThetaValues(_X, _Y, _theta, _hypothesis, learningRate)
  #s1 = _X * _theta;
  #s2 = s1 - _Y;
  #s3 = _X' * s2;
  # number of rows
  #m = size(_Y, 1);
  #s4 = (learningRate * s3)/m;
  #updatedTheta = _theta - s4;
  s1 = _hypothesis - _Y;
  s2 = s1 .* _X;
  s3 = sum(s2);
  # number of rows
  m = size(_Y, 1);
  s4 = (learningRate * s3)/m;
  updatedTheta = _theta .- s4';
endfunction

costVector = [];
iterationVector = [];
for i = 1:1000
  # Step 1
  hypothesis = estimateHypothesis(X, theta);
  #disp("hypothesis");
  #disp(hypothesis);
  # Step 2
  cost = estimateCostFunction(hypothesis, Y);
  costVector = vertcat(costVector, cost);
  #disp("Cost");
  #disp(cost);
  # Step 3 - Updating theta values
  theta = updateThetaValues(X, Y, theta, hypothesis, learningRate);
  iterationVector = vertcat(iterationVector, i);
endfor

function plotGraph(iterationVector, costVector)
  plot(iterationVector, costVector);
  ylabel('Cost Function');
  xlabel('Iteration');
endfunction

plotGraph(iterationVector, costVector);
This is the graph I get when plotting the cost function against the number of iterations.
I am tired of adjusting the theta values and learning rate. Can someone help me solve this problem?
Thanks.
I made a mathematical error: I should have used either power(2.72, -z) or exp(-z), but instead I used -1 * power(2.72, z). Now I'm getting a proper curve.
Thanks.
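In other words, the logistic hypothesis should be g(z) = 1 / (1 + e^(-z)); with -1 * power(2.72, z) the denominator becomes 1 - e^z, which is not a sigmoid at all (and 2.72 is only a rough approximation of e, so exp(-z) is the better choice anyway). A quick illustrative check of the difference, in Python rather than Octave:

import numpy as np

z = np.array([-2.0, -0.5, 0.5, 2.0])
good = 1 / (1 + np.exp(-z))  # proper sigmoid: always in (0, 1)
bad = 1 / (1 - np.exp(z))    # what the -1 * power(2.72, z) version computes
print(good)  # [0.119 0.378 0.622 0.881]
print(bad)   # [ 1.157  2.541 -1.541 -0.157], outside (0, 1), so log(1 - h) breaks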

Cost value doesn't converge

I'm trying to code a logistic regression, but I'm having trouble getting the cost to converge. Can anyone help me? Below is my code. Thank you!
import numpy as np

# input: we have 3 training examples and each of them has 4 features
# (sorry, I know it looks weird here; X is laid out features x examples).
# Y is a label matrix.
m, n = 3, 4
X = np.array([[1,2,1],[1,1,0],[1,2,1],[1,0,2]])
Y = np.array([[0,1,0]])
h = 100000  # iterations
alpha = 0.05  # learning rate
b = 0  # scalar bias
W = np.zeros(n).reshape(1, n)  # weights
J = np.zeros(h).reshape(1, h)  # a vector for holding cost values
Yhat = np.zeros(m).reshape(1, m)  # predicted values

def activation(yhat):
    return 1 / (1 + np.exp(-yhat))

W = W.T
for g in range(h):
    m = X.T.shape[0]
    Y_hat = activation(X.dot(W) + b)
    cost = -1/m * np.sum(Y*np.log(Y_hat) + (1-Y)*np.log(1-Y_hat))
    current_error = Y.T - Y_hat
    dW = 1/m * np.dot(X.T, current_error)
    db = 1/m * np.sum(current_error)
    W = W + alpha * dW
    b = b + alpha * db
    J[0][g] = cost
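As posted, the shapes don't line up: X is (4, 3) (features by examples) while W after the transpose is (4, 1), so X.dot(W) raises a shape error, and Y (1, 3) is compared against Y_hat along the wrong axis in the cost. For comparison, a minimal logistic-regression loop with a consistent rows-are-examples layout might look like this (a sketch using the question's data transposed):

import numpy as np

X = np.array([[1, 1, 1, 1],
              [2, 1, 2, 0],
              [1, 0, 1, 2]], dtype=float)  # (3, 4): 3 examples, 4 features
Y = np.array([[0.0], [1.0], [0.0]])        # (3, 1) labels
m, n = X.shape
W = np.zeros((n, 1))  # (4, 1) weights
b = 0.0
alpha = 0.05

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for i in range(10000):
    Y_hat = sigmoid(X.dot(W) + b)  # (3, 1) predictions
    cost = -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    dW = X.T.dot(Y_hat - Y) / m    # (4, 1) gradient of the cost w.r.t. W
    db = np.mean(Y_hat - Y)
    W -= alpha * dW                # descend, so subtract the gradient
    b -= alpha * db

With every shape consistent, the cost now decreases steadily on this toy data.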
