Why is my cost function giving the wrong answer? - machine-learning

I have written code for the cost function, but it is giving an incorrect answer.
I have read through the code many times and I cannot find the mistake.
Here is my code:
function J = computeCost(X, y, theta)
    m = length(y); % number of training examples
    s = 0;
    h = 0;
    sq = 0;
    J = 0;
    for i = 1:m
        h = theta' * X(i, :)';
        sq = (h - y(i))^2;
        s = s + sq;
    end
    J = (1/2*m) * s;
end
Example:
computeCost( [1 2; 1 3; 1 4; 1 5], [7;6;5;4], [0.1;0.2] )
ans = 11.9450
The answer should be 11.9450, but my code is giving me this:
ans = 191.12
I have checked the matrix multiplication and the code is calculating it correctly.

It seems you misunderstood the operator evaluation order. In fact,
1/2*m ~= 1/(2*m)
because 1/2*m is evaluated left to right as (1/2)*m. With this in mind, note that you're essentially computing an average. Instead of reinventing the wheel, it is usually a good idea to use the built-in functions, which results in a much clearer (and less error-prone) implementation:
function J = computeCost(X, y, theta)
    h = X * theta;
    sq = (h - y).^2;
    J = 1/2 * mean(sq);
end
computeCost( [1,2;1,3;1,4;1,5], [7;6;5;4], [0.1;0.2] )
% ans = 11.9450
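If you would rather keep your loop-based version, the minimal fix is to parenthesize the denominator so the division happens after the multiplication; a one-line sketch of that correction:
J = s / (2*m); % equivalent to (1/(2*m)) * s, unlike the original (1/2*m) * s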

Related

Finding the closest centroids in k-means clustering

With
X = [1.8421 4.6076;
5.6586 4.8;
6.3526 3.2909;
2.904 4.6122;
3.232 4.9399;
1.2479 4.9327]
And
centroids = [3 3;
6 2;
8 5]
I'm trying to find the nearest centroid to each point in X.
I'm coding in Octave, and here's my code:
K = size(centroids, 1);
idx = zeros(size(X,1), 1); % idx is the vector storing the index of the closest centroid
for e = 1:size(X,1)
    difference(1, :) = X(e,:) - centroids(1,:);
    min_distance = sum(difference(1,:).^2);
    for j = 2:K
        difference(j, :) = X(e,:) - centroids(j,:);
        distance = sum(difference.^2);
        if distance < min_distance
            min_distance = distance;
            idx(e) = centroids(j);
        endif
    endfor
endfor
The code runs, but I only get
idx = 0 0 0
for the first three entries of X.
You need to be consistent in the size of your operands when calculating the distance, and you need to set idx properly:
K = size(centroids, 1);
difference = zeros(size(centroids));
idx = zeros(size(X,1), 1); % idx is the vector storing the index of the closest centroid
for e = 1:size(X,1)
    difference(1, :) = X(e,:) - centroids(1,:);
    min_distance = sum(difference(1,:).^2);
    idx(e) = 1;
    for j = 2:K
        difference(j, :) = X(e,:) - centroids(j,:);
        distance = sum(difference(j,:).^2);
        if (distance < min_distance)
            min_distance = distance;
            idx(e) = j;
        endif
    endfor
endfor
idx
With this change, the output is:
idx =
1
3
2
1
1
1
You may find the min function useful when using Octave ;)
[minimum_values, minimum_index] = min(k);
where k is the vector of squared distances from one point to each centroid. You would use the minimum_index value for idx, for example:
idx(i) = minimum_index;
You could therefore reduce this further to:
[minimum_values, idx(i)] = min(k);
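Putting that together, a minimal sketch of the whole assignment step (variable names follow the question; this assumes a reasonably recent Octave with automatic broadcasting):
for e = 1:size(X, 1)
    d = sum((centroids - X(e, :)).^2, 2); % Kx1 squared distances to every centroid
    [~, idx(e)] = min(d);                 % index of the nearest centroid
endfor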

My gradient descent is not giving the exact value

I have written a gradient descent algorithm in Octave, but it is not giving me the exact answer. The answer differs from the expected one by one to two digits.
Here is my code:
function theta = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    s = 0;
    temp = theta;
    for iter = 1:num_iters
        for j = 1:size(theta, 1)
            for i = 1:m
                h = theta' * X(i, :)';
                s = s + (h - y(i))*X(i, j);
            end
            s = s/m;
            temp(j) = temp(j) - alpha * s;
        end
        theta = temp;
    end
end
For:
theta = gradientDescent([1 5; 1 2; 1 4; 1 5],[1 6 4 2]',[0 0]',0.01,1000);
My gradient descent gives this:
4.93708
-0.50549
But it is expected to give this:
5.2148
-0.5733
Minor fixes:
Your variable s (the accumulated gradient, i.e. the delta) is initialised in the wrong place: it must be reset to zero for every parameter j, otherwise stale sums carry over between updates.
The same applies to the temp variable (the new theta): each update must start from the current theta so that all parameters are updated simultaneously.
Together, these mean the delta is calculated incorrectly.
Try the changes below.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        temp = theta;
        for j = 1:size(theta, 1)
            s = 0; % reset the gradient sum for each parameter
            for i = 1:m
                s = s + (X(i, :) * theta - y(i)) * X(i, j);
            end
            temp(j) = theta(j) - alpha * s / m;
        end
        theta = temp; % simultaneous update of all parameters
        J_history(iter) = computeCost(X, y, theta);
    end
end
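With these changes, the call from the question,
theta = gradientDescent([1 5; 1 2; 1 4; 1 5], [1 6 4 2]', [0 0]', 0.01, 1000);
should converge to the expected values above (this assumes a correct computeCost, as in the first question, is on the path).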

Linear regression implementation in Octave

I recently tried implementing linear regression in Octave and couldn't get past the online judge. Here's the code:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        for i = 1:m
            temp1 = theta(1) - (alpha/m)*(X(i,:)*theta - y(i,:));
            temp2 = theta(2) - (alpha/m)*(X(i,:)*theta - y(i,:))*X(i,2);
            theta = [temp1; temp2];
        endfor
        J_history(iter) = computeCost(X, y, theta);
    end
end
I am aware of the vectorized implementation but just wanted to try the iterative method. Any help would be appreciated.
You don't need the inner for loop; you can use the sum function instead.
In code:
for iter = 1:num_iters
    j = 1:m;
    temp1 = sum((theta(1) + theta(2) .* X(j,2)) - y(j));
    temp2 = sum(((theta(1) + theta(2) .* X(j,2)) - y(j)) .* X(j,2));
    theta(1) = theta(1) - (alpha/m) * temp1;
    theta(2) = theta(2) - (alpha/m) * temp2;
    J_history(iter) = computeCost(X, y, theta);
end
It would be a good exercise to implement the vectorized solution as well and compare the two, to see how much vectorization improves efficiency in practice.
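For reference, a fully vectorized sketch of the update (equivalent to the summed version above, and independent of the number of features):
for iter = 1:num_iters
    theta = theta - (alpha/m) * X' * (X * theta - y); % update all parameters at once
    J_history(iter) = computeCost(X, y, theta);
end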

Gradient descent values not correct

I'm attempting to implement gradient descent using the code from:
Gradient Descent implementation in octave
I've amended the code to the following:
X = [1; 1; 1;]
y = [1; 0; 1;]
m = length(y);
X = [ones(m, 1), data(:,1)];
theta = zeros(2, 1);
iterations = 2000;
alpha = 0.001;
for iter = 1:iterations
    theta = theta - ((1/m) * ((X * theta) - y)' * X)' * alpha;
end
theta
This gives the following output:
X =
1
1
1
y =
1
0
1
theta =
0.32725
0.32725
theta is a 1x2 matrix, but shouldn't it be 1x3, as the output (y) is 3x1?
So I should be able to multiply theta by a training example to make a prediction, but I cannot multiply x by theta, as x is 1x3 and theta is 1x2?
Update:
%X = [1 1; 1 1; 1 1;]
%y = [1 1; 0 1; 1 1;]
X = [1 1 1; 1 1 1; 0 0 0;]
y = [1 1 1; 0 0 0; 1 1 1;]
m = length(y);
X = [ones(m, 1), X];
theta = zeros(4, 1);
theta
iterations = 2000;
alpha = 0.001;
for iter = 1:iterations
    theta = theta - ((1/m) * ((X * theta) - y)' * X)' * alpha;
end
% to make a prediction
m = size(X, 1); % number of training examples
p = zeros(m, 1);
htheta = sigmoid(X * theta);
p = htheta >= 0.5;
You are misinterpreting the dimensions here. Your data consists of 3 points, each having a single dimension. Furthermore, you add a dummy dimension of 1s:
X = [ones(m, 1), data(:,1)];
thus
octave:1> data = [1;2;3]
data =
1
2
3
octave:2> [ones(m, 1), data(:,1)]
ans =
1 1
1 2
1 3
and theta is your parametrization, which you apply through (this is math notation, not code)
h(x) = x1 * theta1 + theta0
so your theta should have two dimensions: one weight for the dummy dimension (the so-called bias) and one for the actual X dimension. In general, if your X has K dimensions, theta will have K+1. Thus, after adding the dummy dimension, the matrices have the following shapes:
X is 3x2
y is 3x1
theta is 2x1
so
X * theta is 3x1
the same as y
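A quick shape check in Octave makes this concrete (illustrative values, not your data):
X = [1 1; 1 2; 1 3];  % 3x2 after prepending the column of ones
theta = zeros(2, 1);  % 2x1: one bias weight plus one feature weight
size(X * theta)       % ans = 3 1, the same shape as y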

Fast bilinear interpolation on old iOS devices

I've got the following code to do a bilinear interpolation from a matrix of 2D vectors; each cell holds the x and y values of a vector, and the function receives the k and l indices of the nearest bottom-left position in the matrix:
// p[1] returns the interpolated values
// fieldLinePointsVerts is the raw data array of fieldNumHorizontalPoints x fieldNumVerticalPoints
// only fieldNumHorizontalPoints matters for determining the index into the raw data
// k and l are the horizontal and vertical indices of the point just below p[0] in the raw data
void interpolate( vertex2d* p, vertex2d* fieldLinePointsVerts, int fieldNumHorizontalPoints, int k, int l ) {
    int index = (l * fieldNumHorizontalPoints + k) * 2;
    vertex2d p11;
    p11.x = fieldLinePointsVerts[index].x;
    p11.y = fieldLinePointsVerts[index].y;
    vertex2d q11;
    q11.x = fieldLinePointsVerts[index+1].x;
    q11.y = fieldLinePointsVerts[index+1].y;
    index = (l * fieldNumHorizontalPoints + k + 1) * 2;
    vertex2d q21;
    q21.x = fieldLinePointsVerts[index+1].x;
    q21.y = fieldLinePointsVerts[index+1].y;
    index = ( (l + 1) * fieldNumHorizontalPoints + k) * 2;
    vertex2d q12;
    q12.x = fieldLinePointsVerts[index+1].x;
    q12.y = fieldLinePointsVerts[index+1].y;
    index = ( (l + 1) * fieldNumHorizontalPoints + k + 1 ) * 2;
    vertex2d p22;
    p22.x = fieldLinePointsVerts[index].x;
    p22.y = fieldLinePointsVerts[index].y;
    vertex2d q22;
    q22.x = fieldLinePointsVerts[index+1].x;
    q22.y = fieldLinePointsVerts[index+1].y;
    float fx = 1.0 / (p22.x - p11.x);
    float fx1 = (p22.x - p[0].x) * fx;
    float fx2 = (p[0].x - p11.x) * fx;
    vertex2d r1;
    r1.x = fx1 * q11.x + fx2 * q21.x;
    r1.y = fx1 * q11.y + fx2 * q21.y;
    vertex2d r2;
    r2.x = fx1 * q12.x + fx2 * q22.x;
    r2.y = fx1 * q12.y + fx2 * q22.y;
    float fy = 1.0 / (p22.y - p11.y);
    float fy1 = (p22.y - p[0].y) * fy;
    float fy2 = (p[0].y - p11.y) * fy;
    p[1].x = fy1 * r1.x + fy2 * r2.x;
    p[1].y = fy1 * r1.y + fy2 * r2.y;
}
Currently this code needs to run every single frame on old iOS devices, say devices with ARMv6 processors.
I've taken the numeric sub-indices from Wikipedia's equations: http://en.wikipedia.org/wiki/Bilinear_interpolation
I'd appreciate any comments on optimizing for performance, even plain asm code.
This code should not be causing your slowdown if it's only run once per frame. However, if it's run multiple times per frame, it easily could be.
I'd run your app under a profiler to see where the true performance problem lies.
There is some room for optimization here: a) certain index calculations could be factored out and reused in subsequent calculations; b) you could dereference your fieldLinePointsVerts array to a pointer once and reuse it, instead of indexing it twice per index.
But in general those things won't help a great deal unless this function is being called many, many times per frame, in which case every little bit will help.
