Multivariate Linear Regression using gradient descent - machine-learning
I am learning Multivariate Linear Regression using gradient descent, and I have written the Python code below:
import pandas as pd
import numpy as np
x1 = np.array([1,2,3,4,5,6,7,8,9,10],dtype='float64')
x2 = np.array([5,10,20,40,80,160,320,640,1280,2560],dtype='float64')
y = np.array([350,700,1300,2400,4500,8600,16700,32800,64900,129000],dtype='float64')
def multivar_gradient_descent(x1,x2,y):
    w1=w2=w0=0
    iteration=500
    n=len(x1)
    learning_rate=0.02
    for i in range(iteration):
        y_predicted = w1 * x1 + w2 * x2 +w0
        cost = (1*(2/n))*float(sum((y_predicted-y)**2)) # cost function
        x1d = sum(x1*(y_predicted-y))/n # derivative for feature x1
        x2d = sum(x2*(y_predicted-y))/n # derivative for feature x2
        cd = sum(1*(y-y_predicted))/n # derivative for bias
        w1 = w1 - learning_rate * x1d
        w2 = w2 - learning_rate * x2d
        w0 = w0 - learning_rate * cd
        print(f"Iteration {i}: a= {w1}, b = {w2}, c = {w0}, cost = {cost} ")
    return w1,w2, w0
w1,w2,w0 = multivar_gradient_descent(x1,x2,y)
w1,w2,w0
However, the cost kept getting higher and higher with each iteration until it became inf (output shown below). I have spent hours checking the formulas for the derivatives and the cost function, but I couldn't identify where the mistake is.
I feel so frustrated, and I hope someone can help me with this. Thank you.
Iteration 0: a= 4685.5, b = 883029.5, c = -522.5, cost = 4462002500.0
Iteration 1: a= -81383008.375, b = -15430704757.735, c = 9032851.74, cost = 1.3626144151911089e+18
Iteration 2: a= 1422228350500.3176, b = 269662832866446.66, c = -157855848816.2755, cost = 4.161440004246925e+26
Iteration 3: a= -2.4854478828631716e+16, b = -4.712554891970221e+18, c = 2758646212375989.0, cost = 1.2709085355243152e+35
Iteration 4: a= 4.343501644116814e+20, b = 8.235533749226551e+22, c = -4.820935671838988e+19, cost = 3.881369199171854e+43
Iteration 5: a= -7.590586253095058e+24, b = -1.4392196523846473e+27, c = 8.424937075201089e+23, cost = 1.1853745914189544e+52
Iteration 6: a= 1.326510368511469e+29, b = 2.5151414235959125e+31, c = -1.472319266480111e+28, cost = 3.620147555871397e+60
Iteration 7: a= -2.3181737208386835e+33, b = -4.3953932745475034e+35, c = 2.5729854159139745e+32, cost = 1.105597202871857e+69
Iteration 8: a= 4.051177832870898e+37, b = 7.681270666011396e+39, c = -4.496479874458965e+36, cost = 3.37650649906685e+77
Iteration 9: a= -7.079729049644685e+41, b = -1.3423581317783506e+44, c = 7.857926879944079e+40, cost = 1.0311889455424087e+86
Iteration 10: a= 1.2372343423113349e+46, b = 2.3458688442326932e+48, c = -1.3732300949746233e+45, cost = 3.1492628303921182e+94
Iteration 11: a= -2.1621573467862958e+50, b = -4.099577083092681e+52, c = 2.3998198539580117e+49, cost = 9.617884692967256e+102
Iteration 12: a= 3.7785278280657085e+54, b = 7.164310273158479e+56, c = -4.193860411686855e+53, cost = 2.937312982406619e+111
Iteration 13: a= -6.603253259383672e+58, b = -1.2520155286691985e+61, c = 7.32907727374022e+57, cost = 8.970587433766233e+119
Iteration 14: a= 1.1539667190934036e+63, b = 2.187988549158328e+65, c = -1.280809765026251e+62, cost = 2.739627659321216e+128
Iteration 15: a= -2.0166410956339498e+67, b = -3.823669740212017e+69, c = 2.238308579532037e+66, cost = 8.366854196711946e+136
Iteration 16: a= 3.524227554668779e+71, b = 6.682142046784112e+73, c = -3.9116076672823015e+70, cost = 2.5552468384109146e+145
Iteration 17: a= -6.158844964518726e+75, b = -1.1677531106785476e+78, c = 6.835819994909099e+74, cost = 7.80375306142527e+153
Iteration 18: a= 1.0763031248287995e+80, b = 2.0407338215081817e+82, c = -1.194609454154816e+79, cost = 2.3832751078395456e+162
Iteration 19: a= -1.8809182942418207e+84, b = -3.5663313522046286e+86, c = 2.0876672425822773e+83, cost = 7.278549429920333e+170
Iteration 20: a= 3.287042049772272e+88, b = 6.232424424816986e+90, c = -3.648350932258958e+87, cost = 2.2228773182554595e+179
Iteration 21: a= -5.744345977200645e+92, b = -1.0891616727381027e+95, c = 6.375759629418162e+91, cost = 6.788692746528022e+187
Iteration 22: a= 1.0038664004334024e+97, b = 1.9033895455483145e+99, c = -1.1142105462686083e+96, cost = 2.0732745270409844e+196
Iteration 23: a= -1.7543298295730705e+101, b = -3.326312202113057e+103, c = 1.9471642809242535e+100, cost = 6.331804111587467e+204
Iteration 24: a= 3.065819465220816e+105, b = 5.812973435628952e+107, c = -3.402811748286256e+104, cost = 1.9337402155196325e+213
Iteration 25: a= -5.357743358678581e+109, b = -1.0158595498601174e+112, c = 5.946661977991267e+108, cost = 5.905664728753603e+221
Iteration 26: a= 9.363047701635277e+113, b = 1.7752887338463183e+116, c = -1.0392225987316703e+113, cost = 1.8035967607506306e+230
Iteration 27: a= -1.6362609478315793e+118, b = -3.102446680700735e+120, c = 1.816117367544431e+117, cost = 5.508205129817299e+238
Iteration 28: a= 2.8594854738709632e+122, b = 5.421752091975047e+124, c = -3.1737976990896245e+121, cost = 1.6822121447766637e+247
Iteration 29: a= -4.997159643830032e+126, b = -9.474907636509772e+128, c = 5.546443206127292e+125, cost = 5.13749512471037e+255
Iteration 30: a= 8.732901332811723e+130, b = 1.655809288168471e+133, c = -9.692814462503292e+129, cost = 1.5689968853439082e+264
Iteration 31: a= -1.5261382690222234e+135, b = -2.8936476258832726e+137, c = 1.6938900970034892e+134, cost = 4.791734427889445e+272
Iteration 32: a= 2.667038052317318e+139, b = 5.056860498736353e+141, c = -2.960196619698286e+138, cost = 1.46340117318896e+281
Iteration 33: a= -4.660843723593812e+143, b = -8.837232935670386e+145, c = 5.173159724337836e+142, cost = 4.4692439155775235e+289
Iteration 34: a= 8.145164706926056e+147, b = 1.5443709783730996e+150, c = -9.040474323708519e+146, cost = 1.364912201990395e+298
Iteration 35: a= -1.4234270024354842e+152, b = -2.698901043124031e+154, c = 1.5798888948493553e+151, cost = 4.168457471405497e+306
Iteration 36: a= 2.487542614748579e+156, b = 4.716526626425798e+158, c = -2.760971195418877e+155, cost = inf
Iteration 37: a= -4.347162341028204e+160, b = -8.24247464517401e+162, c = 4.824998749459281e+159, cost = inf
Iteration 38: a= 7.596983588224419e+164, b = 1.4404326246286964e+167, c = -8.432037599998082e+163, cost = inf
Iteration 39: a= -1.3276283495338805e+169, b = -2.517261181154549e+171, c = 1.473560135031107e+168, cost = inf
Iteration 40: a= 2.32012747430196e+173, b = 4.399097705650062e+175, c = -2.5751539243057795e+172, cost = inf
The issue here is that you initialized the weights to 0, as indicated by w1=w2=w0=0.
If all the weights are initialized to 0, the derivative of the loss function with respect to each w in W[l] is the same, so all the weights take the same value in every subsequent iteration.
Because of that, you have to initialize the weights to random values.
Weight initialization with large random values:
When the weights are initialized with very large values, the term np.dot(W,X)+b becomes significantly larger, and if an activation function like sigmoid() is applied, it maps its output close to 1, where the slope of the gradient changes slowly and learning takes a long time.
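To illustrate that saturation effect, here is a small standalone snippet (my addition, not part of the original answer) showing that for large inputs the sigmoid output approaches 1 and its derivative approaches 0:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in (1.0, 10.0, 100.0):
    s = sigmoid(z)
    # The derivative of the sigmoid is s * (1 - s); it shrinks towards 0 as z grows,
    # which is why learning slows down when the weights (and hence z) are large.
    print(z, s, s * (1 - s))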
There are many ways in which you could initialize the weights. For example, in Keras, the Dense, LSTM and CNN layers are all initialized by default with glorot_uniform, otherwise known as Xavier initialization.
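For illustration, a minimal Keras sketch (my addition; the layer size and activation are arbitrary) that spells out the initializer explicitly, even though it is already the default:

from tensorflow import keras

# glorot_uniform (Xavier) is the default kernel initializer for Dense layers,
# shown here explicitly for clarity.
layer = keras.layers.Dense(32, activation="relu",
                           kernel_initializer="glorot_uniform")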
For your purposes, you can use the following formula to randomly initialize the weights with numpy's random.randn, where l is a particular layer. This draws the initial weights from a standard normal distribution (mean 0, standard deviation 1):
# Specify the random seed value for reproducibility.
np.random.seed(3)
# Shape: (number of units in layer l, number of units in layer l-1).
W[l] = np.random.randn(l, l-1)
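Applied to the two weights and the bias in the question's code, a hypothetical version of that idea (my sketch, not part of the original answer) simply replaces the zero starting points with small random ones:

import numpy as np

np.random.seed(3)                       # reproducible starting point
w1, w2, w0 = 0.01 * np.random.randn(3)  # small random values instead of w1=w2=w0=0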
Another thing you should do is feature normalization as a preprocessing step: return a normalized version of the data in which each feature has mean 0 and standard deviation 1. This is often a good step when working with learning algorithms.
def featureNormalize(X):
    """
    X : The dataset of shape (m x n)
    """
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    X_norm = (X - mu) / sigma
    return X_norm
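As a rough usage sketch (my assumption of how this would be wired into the question's code, not part of the original answer), the two feature vectors can be stacked into a matrix, normalized column by column, and fed back into the gradient-descent function; scaling y the same way is optional but keeps the gradients in a comparable range:

# Stack the question's features into an (m x n) matrix and normalize each column.
X = np.column_stack((x1, x2))
X_norm = featureNormalize(X)
x1_n, x2_n = X_norm[:, 0], X_norm[:, 1]

# Optionally scale the target as well so the gradient magnitudes stay small.
y_n = (y - np.mean(y)) / np.std(y)

# With normalized inputs, learning_rate=0.02 should no longer make the cost blow up.
w1, w2, w0 = multivar_gradient_descent(x1_n, x2_n, y_n)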