Implementing a linear regression using gradient descent - machine-learning

I'm trying to implement a linear regression with gradient descent as explained in this article (https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931).
I've followed the implementation to the letter, yet my results overflow after a few iterations.
The result I'm trying to get is approximately y = -0.02x + 8499.6.
The code:
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

const (
	iterations   = 1000
	learningRate = 0.0001
)

func computePrice(m, x, c float64) float64 {
	return m*x + c
}

func computeThetas(data [][]float64, m, c float64) (float64, float64) {
	N := float64(len(data))
	dm, dc := 0.0, 0.0
	for _, dataField := range data {
		x := dataField[0]
		y := dataField[1]
		yPred := computePrice(m, x, c)
		dm += (y - yPred) * x
		dc += y - yPred
	}
	dm *= -2 / N
	dc *= -2 / N
	return m - learningRate*dm, c - learningRate*dc
}

func main() {
	data := readXY()
	m, c := 0.0, 0.0
	for k := 0; k < iterations; k++ {
		m, c = computeThetas(data, m, c)
	}
	fmt.Printf("%.4fx + %.4f\n", m, c)
}

func readXY() [][]float64 {
	file := strings.NewReader(data)
	reader := csv.NewReader(file)
	records, err := reader.ReadAll()
	if err != nil {
		panic(err)
	}
	records = records[1:]
	size := len(records)
	data := make([][]float64, size)
	for i, v := range records {
		val1, err := strconv.ParseFloat(v[0], 64)
		if err != nil {
			panic(err)
		}
		val2, err := strconv.ParseFloat(v[1], 64)
		if err != nil {
			panic(err)
		}
		data[i] = []float64{val1, val2}
	}
	return data
}

var data = `km,price
240000,3650
139800,3800
150500,4400
185530,4450
176000,5250
114800,5350
166800,5800
89000,5990
144500,5999
84000,6200
82029,6390
63060,6390
74000,6600
97500,6800
67000,6800
76025,6900
48235,6900
93000,6990
60949,7490
65674,7555
54000,7990
68500,7990
22899,7990
61789,8290`
It can also be run in the Go playground:
https://play.golang.org/p/2CdNbk9_WeY
What do I need to fix to get the correct result?

Why would a formula work on one data set and not another one?
In addition to sascha's remarks, here's another way to look at the problems with this application of gradient descent: the algorithm offers no guarantee that an iteration yields a better result than the previous one, so it doesn't necessarily converge, because:
The gradients dm and dc along the axes m and c are handled independently of each other; m is updated in the descending direction according to dm, and c is updated at the same time in the descending direction according to dc. With certain curved surfaces z = f(m, c), however, the gradient in a direction between the axes m and c can have the opposite sign compared to m and c on their own, so while updating either m or c alone would converge, updating both moves away from the optimum.
However, the more likely reason for the failure in this case of linear regression on a point cloud is the entirely arbitrary magnitude of the update to m and c, determined by the product of an obscure learning rate and the gradient. It is quite possible that such an update oversteps a minimum of the target function, and even that this is repeated with higher amplitude in each iteration.
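To make the overstepping concrete: the km values are on the order of 1e5, so the gradient with respect to m is enormous, and for a quadratic loss like this one a fixed-step update only stays stable when the learning rate is below roughly 2 divided by the largest curvature, which here is far smaller than 0.0001. One common remedy, which is not taken from the linked article, is to standardize x before running the same update and to map the coefficients back afterwards. Below is a minimal sketch in Python; fit_line, closed_form, and the learning rate of 0.1 are illustrative choices, not part of the question.

```python
# (km, price) pairs copied from the question.
DATA = [
    (240000, 3650), (139800, 3800), (150500, 4400), (185530, 4450),
    (176000, 5250), (114800, 5350), (166800, 5800), (89000, 5990),
    (144500, 5999), (84000, 6200), (82029, 6390), (63060, 6390),
    (74000, 6600), (97500, 6800), (67000, 6800), (76025, 6900),
    (48235, 6900), (93000, 6990), (60949, 7490), (65674, 7555),
    (54000, 7990), (68500, 7990), (22899, 7990), (61789, 8290),
]


def fit_line(data, learning_rate=0.1, iterations=1000):
    """Gradient descent on standardized x; returns (m, c) for the raw x."""
    n = len(data)
    xs = [x for x, _ in data]
    mean_x = sum(xs) / n
    std_x = (sum((x - mean_x) ** 2 for x in xs) / n) ** 0.5
    scaled = [((x - mean_x) / std_x, y) for x, y in data]

    a, b = 0.0, 0.0  # slope and intercept in the scaled coordinates
    for _ in range(iterations):
        da, db = 0.0, 0.0
        for x_scaled, y in scaled:
            residual = y - (a * x_scaled + b)
            da += residual * x_scaled
            db += residual
        # Same update rule as computeThetas in the question.
        a -= learning_rate * (-2 / n) * da
        b -= learning_rate * (-2 / n) * db

    # Undo the scaling: y = a * (x - mean_x) / std_x + b.
    return a / std_x, b - a * mean_x / std_x


def closed_form(data):
    """Ordinary least squares, used only as a reference value."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    m = cov / var
    return m, mean_y - m * mean_x


if __name__ == "__main__":
    m, c = fit_line(DATA)
    print(f"{m:.4f}x + {c:.4f}")
```

With the scaled feature both coefficients see a curvature of about the same size, so a single learning rate suits them both, and the result should agree with the least-squares line that the question is aiming for.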

Related

Vectorization issue

Say you have two column vectors v and w, each with 7 elements (i.e., they have dimensions 7x1). Consider the following code, and decide which of the options below is a correct vectorization of it:
z = 0;
for i = 1:7
  z = z + v(i) * w(i)
end
A) z = sum (v .* w);
B) z = w' * v;
C) z = v * w;
D) z = w * v;
According to the solutions, (A) and (B) are the right answers. Can someone please help me understand why?
Why is z = v * w' false? It is similar to answer (B), with only the order of the operands changed. Since we want a 1x1 result, wouldn't we need a product of this shape: 1x7 * 7x1 = 1x1? And why is z = v' * w false? It has the same dimensions as answer (B).
z = v'*w is correct and is equal to w'*v.
Both produce a 1x1 matrix, which Octave treats as a plain number.
See this:
octave:5> v = rand(7, 1);
octave:6> w = rand(7, 1);
octave:7> v'*w
ans = 1.3110
octave:8> w'*v
ans = 1.3110
octave:9> sum(v.*w)
ans = 1.3110
Answers A and B both perform a dot product of the two vectors, which yields the same result as the code provided. Answer A first performs the element-wise product (.*) of the two column vectors, then sums those intermediate values. Answer B performs the same mathematical operation but does so via a dot product (i.e., matrix multiplication).
Answer C is incorrect because it would be performing a matrix multiplication on misaligned matrices (7x1 and 7x1). The same is true for D.
z = v * w', which was not one of the options, is incorrect because it would yield a 7x7 matrix (instead of the 1x1 scalar value desired). The point is that order matters when performing matrix multiplication. (1xN)X(Nx1) -> (1x1), whereas (Nx1)X(1xN) -> (NxN).
z = v' * w is actually a correct solution but was simply not provided as one of the options.
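The shape argument above can be checked without any matrix library. The following is a small sketch in plain Python, where inner and outer are hypothetical helper names standing in for w' * v and v * w' respectively:

```python
def inner(v, w):
    """Row-times-column product: (1xN) * (Nx1) -> a single number."""
    return sum(a * b for a, b in zip(v, w))


def outer(v, w):
    """Column-times-row product: (Nx1) * (1xN) -> an NxN table."""
    return [[a * b for b in w] for a in v]


v = [1, 2, 3, 4, 5, 6, 7]
w = [7, 6, 5, 4, 3, 2, 1]

# The loop from the question.
z = 0
for i in range(7):
    z = z + v[i] * w[i]

same_as_loop = inner(v, w) == z and inner(w, v) == z
table = outer(v, w)
print(same_as_loop, len(table), len(table[0]))
```

Swapping the operands of the inner product leaves the value unchanged, while the outer product has 7 rows and 7 columns and therefore cannot equal the scalar z.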

OpenCV equivalent of np.where()

When using the gocv package it is possible, for example, to perform template matching of a pattern within an image. The package also provides the MinMaxLoc function to retrieve the locations of the minimum and maximum within the matrix.
However, in the Python example below, the writer uses numpy.where to threshold the matrix and get the locations of multiple maxima. The Python zip function is used to glue the values together so they are like a slice [][2]int, the inner slice being the x and y of each match found.
The syntax loc[::-1] reverses the array.
The star operator in zip(*loc..) is used to unpack the slices given to zip.
https://docs.opencv.org/master/d4/dc6/tutorial_py_template_matching.html
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img_rgb = cv.imread('mario.png')
img_gray = cv.cvtColor(img_rgb, cv.COLOR_BGR2GRAY)
template = cv.imread('mario_coin.png',0)
w, h = template.shape[::-1]
res = cv.matchTemplate(img_gray,template,cv.TM_CCOEFF_NORMED)
threshold = 0.8
loc = np.where( res >= threshold)
for pt in zip(*loc[::-1]):
    cv.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0,0,255), 2)
cv.imwrite('res.png',img_rgb)
How do I implement the same np.where algorithm in Go to get the multiple locations after the threshold is applied?
OpenCV has a built-in (semi-)equivalent function to np.where(), which is findNonZero(). As implied by the name, it finds the non-zero elements in an image, which is what np.where() does when called with a single argument, as the numpy docs state.
And this is available in the golang bindings as well. From the gocv docs on FindNonZero:
func FindNonZero(src Mat, idx *Mat)
FindNonZero returns the list of locations of non-zero pixels.
For further details, please see: https://docs.opencv.org/master/d2/de8/group__core__array.html#gaed7df59a3539b4cc0fe5c9c8d7586190
Note: np.where() returns indexes in array order, that is, (row, col) or (i, j) which is opposite to typical image indexing (x, y). That is why loc is reversed in Python. When using findNonZero() you won't need to do that, since OpenCV always uses (x, y) for points.
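The index-order point can be illustrated without numpy or OpenCV. The sketch below uses a hypothetical helper, where_at_least, that mimics what np.where(res >= threshold) returns for a 2-D input (here as plain lists rather than arrays), followed by the same zip(*loc[::-1]) idiom as the tutorial:

```python
def where_at_least(matrix, threshold):
    """Return (rows, cols) of every entry >= threshold, in row-major order."""
    rows, cols = [], []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value >= threshold:
                rows.append(i)
                cols.append(j)
    return rows, cols


# Made-up match scores: 2 rows (y) by 3 columns (x).
res = [
    [0.10, 0.90, 0.20],
    [0.30, 0.40, 0.85],
]

loc = where_at_least(res, 0.8)   # (rows, cols), i.e. (y values, x values)
points = list(zip(*loc[::-1]))   # reversed to (x, y) pairs for drawing
print(loc, points)
```

Reversing loc before zipping is what turns the (row, col) output into (x, y) points; a function that already reports points as (x, y) does not need that step.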
For anyone coming across this I hope a full example keeps you from spending days hitting your head against the wall and reading the same google results over and over until something clicks.
package main

import (
	"fmt"
	"image"
	"image/color"
	"os"

	"gocv.io/x/gocv"
)

func OpenImage(path string) (image.Image, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	img, _, err := image.Decode(f)
	return img, err
}

func main() {
	src := gocv.IMRead("haystack.png", gocv.IMReadGrayScale)
	tgt := gocv.IMRead("needle.png", gocv.IMReadGrayScale)
	if src.Empty() {
		fmt.Printf("failed to read image")
		os.Exit(1)
	}
	if tgt.Empty() {
		fmt.Printf("failed to read image")
		os.Exit(1)
	}
	// Get image size
	tgtImg, _ := tgt.ToImage()
	iX, iY := tgtImg.Bounds().Size().X, tgtImg.Bounds().Size().Y
	// Perform a match template operation
	res := gocv.NewMat()
	gocv.MatchTemplate(src, tgt, &res, gocv.TmSqdiffNormed, gocv.NewMat())
	// Set a threshold. With `gocv.TmSqdiffNormed` the confidence levels are
	// reversed, meaning the lowest value is actually the greatest confidence.
	// So here I perform an inverse binary threshold, setting all values
	// at or below 0.16 to 1 and everything above it to 0.
	thresh := gocv.NewMat()
	gocv.Threshold(res, &thresh, 0.16, 1.0, gocv.ThresholdBinaryInv)
	// Collect the locations of all the non-zero values.
	gocv.FindNonZero(thresh, &res)
	// FindNonZero returns a list or vector of locations in the form of a gocv.Mat when using gocv.
	// There may be a better way to do this, but I iterate through each found location, getting the int vector
	// at each row. I have to convert the returned int32 values into ints. Then I draw a rectangle around each point.
	//
	// The result of res.GetVeciAt(i, 0) is just a slice of x, y integers, so each value can be accessed
	// using slice/array syntax.
	for i := 0; i < res.Rows(); i++ {
		x, y := res.GetVeciAt(i, 0)[0], res.GetVeciAt(i, 0)[1]
		xi, yi := int(x), int(y)
		gocv.Rectangle(&src, image.Rect(xi, yi, xi+iX, yi+iY), color.RGBA{0, 0, 0, 1}, 2)
	}
	w := gocv.NewWindow("Test")
	w.IMShow(src)
	if w.WaitKey(0) > 1 {
		os.Exit(0)
	}
}

Not getting accurate result when using normalized data for gradient descent

I am currently on week 2 of Andrew Ng's Machine Learning course on Coursera, and I came across an issue that I cannot sort out.
The data set has the house size in the first column, the number of bedrooms in the second, and the price in the third. I need to normalize the data and then use linear regression with gradient descent to predict new house prices.
However, I am getting a gigantic number for my prediction and I cannot find where the error in my calculations is.
I am using the following:
alpha = 0.03;
num_iters = 400;
Code to normalize the features (X is the data set matrix):
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
for i = 1:size(X, 2);
  mu(1, i) = mean(X(:, i)), % Getting the mean of each column.
  sigma(1, i) = std(X(:, i)), % Getting the standard deviation of each column.
  for j = 1:size(X, 1);
    X_norm(j, i) = (X(j, i) .- mu(1, i)) ./ sigma(1, i);
  end;
end;
Code to calculate current cost:
m = length(y);
J = 0;
predictions = X * theta;
sqErrors = (predictions - y).^2;
J = (1/(2*m)) * sum(sqErrors);
Code to calculate gradient descent:
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  % Getting the predictions for the current theta values.
  predictions = X * theta;
  % Getting the difference between the hypothesis (h(x)) and the real results (y).
  diff = predictions - y;
  % Getting the number of features.
  features_num = size(X, 2);
  % Applying gradient descent for each feature.
  for i = 1:features_num;
    theta(i, 1) = theta(i, 1) - (alpha / m) * sum(diff .* X(:, i))
  end;
  % Saving the cost J in every iteration
  J_history(iter) = computeCostMulti(X, y, theta);
end
The resulting price I am getting when predicting a house with 1650 square feet and 3 bedrooms:
182329818.366117
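The prediction step itself is not shown above, so the cause cannot be confirmed from the code. One thing worth checking, though, is whether the query (1650, 3) is normalized with the same mu and sigma before it is multiplied by theta, because theta was learned on normalized features. The sketch below reproduces that effect in Python under loud assumptions: the data is made up (a single feature with price = 100 * size + 50000), and alpha = 0.1 with 1000 iterations are illustrative values, not the ones from the question.

```python
from statistics import mean, stdev

# Hypothetical single-feature data, not the course data set.
sizes = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
prices = [100.0 * s + 50000.0 for s in sizes]

# stdev is the sample standard deviation (divides by N - 1).
mu, sigma = mean(sizes), stdev(sizes)
x_norm = [(s - mu) / sigma for s in sizes]


def gradient_descent(xs, ys, alpha=0.1, num_iters=1000):
    """Batch gradient descent for y = theta0 + theta1 * x."""
    m = len(ys)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


theta0, theta1 = gradient_descent(x_norm, prices)

query = 1650.0
raw_prediction = theta0 + theta1 * query                    # query left unnormalized
scaled_prediction = theta0 + theta1 * (query - mu) / sigma  # query normalized like the training data
print(raw_prediction, scaled_prediction)
```

In this toy setup the unnormalized query yields a value in the hundreds of millions, while the normalized query recovers the price on the underlying line.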

Cost Function, Linear Regression, trying to avoid hard coding theta. Octave.

I'm in the second week of Professor Andrew Ng's Machine Learning course through Coursera. We're working on linear regression and right now I'm dealing with coding the cost function.
The code I've written solves the problem correctly but does not pass the submission process and fails the unit test because I have hard coded the values of theta and not allowed for more than two values for theta.
Here's the code I've got so far
function J = computeCost(X, y, theta)
  m = length(y);
  J = 0;
  for i = 1:m,
    h = theta(1) + theta(2) * X(i)
    a = h - y(i);
    b = a^2;
    J = J + b;
  end;
  J = J * (1 / (2 * m));
end
The unit test is
computeCost( [1 2 3; 1 3 4; 1 4 5; 1 5 6], [7;6;5;4], [0.1;0.2;0.3])
and should produce ans = 7.0175
So I need to add another for loop to iterate over theta, therefore allowing for any number of values for theta, but I'll be damned if I can wrap my head around how/where.
Can anyone suggest a way I can allow for any number of values for theta within this function?
If you need more information to understand what I'm trying to ask, I will try my best to provide it.
You can vectorize the operations in Octave/Matlab.
Iterating over an entire vector is a really bad idea if your programming language lets you vectorize operations.
R, Octave, Matlab, and Python (numpy) all allow this.
For example, you can get the scalar product of theta = (t0, t1, t2, t3) and X = (x0, x1, x2, x3) as follows:
theta * X' = (t0, t1, t2, t3) * (x0, x1, x2, x3)' = t0*x0 + t1*x1 + t2*x2 + t3*x3
The result will be a scalar.
For example, you can vectorize h in your code as follows:
H = (theta'*X')';
S = sum((H - y) .^ 2);
J = S / (2*m);
Above answer is perfect but you can also do
H = (X*theta);
S = sum((H - y) .^ 2);
J = S / (2*m);
Rather than computing
(theta' * X')'
and then taking the transpose you can directly calculate
(X * theta)
It works perfectly.
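As a quick sanity check of the X * theta form, the unit test from the question can be worked through by hand: the four predictions are 1.4, 1.9, 2.4 and 2.9, the squared errors sum to 56.14, and 56.14 / (2 * 4) = 7.0175. The same computation as a dependency-free Python sketch, where compute_cost is an illustrative name and plain lists stand in for the matrices:

```python
def compute_cost(X, y, theta):
    """Squared-error cost J = sum((X*theta - y)^2) / (2*m) for any number of thetas."""
    m = len(y)
    total = 0.0
    for row, target in zip(X, y):
        # One row of X times theta: works for any number of columns.
        h = sum(t * x for t, x in zip(theta, row))
        total += (h - target) ** 2
    return total / (2 * m)


X = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]]
y = [7, 6, 5, 4]
theta = [0.1, 0.2, 0.3]

J = compute_cost(X, y, theta)
print(round(J, 4))
```

Because the inner sum runs over whatever length theta has, nothing is hard-coded to two parameters.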
The line below returns the required cost value of 32.07 when computeCost is run once with θ initialized to zeros:
J = (1/(2*m)) * (sum(((X * theta) - y).^2));
and it closely follows the original formula.
It can also be done in one line, where m is the number of training examples:
J=(1/(2*m)) * ((((X * theta) - y).^2)'* ones(m,1));
J = sum(((X*theta)-y).^2)/(2*m);
ans = 32.073
The above answer is perfect. I thought about the problem deeply for a day and am still unfamiliar with Octave, so let's just study together!
If you want to use only matrix operations:
temp = (X * theta - y); % h(x) - y
J = ((temp')*temp)/(2 * m);
clear temp;
This would work just fine for you -
J = sum((X*theta - y).^2)*(1/(2*m))
This directly follows from the Cost Function Equation
Python code for the same:
def computeCost(X, y, theta):
    m = y.size  # number of training examples
    J = 0
    H = X.dot(theta)
    S = sum((H - y)**2)
    J = S / (2*m)
    return J
function J = computeCost(X, y, theta)
  m = length(y);
  J = 0;
  % Hypothesis h(x)
  h = X * theta;
  % Error function (h(x) - y) ^ 2
  squaredError = (h-y).^2;
  % Cost function
  J = sum(squaredError)/(2*m);
end
I think we need an iterative solution for the cost in general rather than a single evaluation. Also, the 32.07 shown in the PDF may not be the answer the grader is looking for, since it is just one case out of many possible training sets.
I think it should loop like this:
for i = 1:iterations
  theta = theta - (alpha/m) * X' * (X*theta - y);
  J = (1/(2*m)) * sum((X*theta - y).^2);
end

Get a fraction from its decimal number

I am developing a program that solves a system of equations. When it gives me the results, it is like: "x1= 1,36842". I'd like to get the fraction of that "1,36842", so I wrote this code.
procedure TForm1.Button1Click(Sender: TObject);
var numero,s:string;
    a,intpart,fracpart,frazfatta:double;
    y,i,mcd,x,nume,denomin,R:integer;
begin
  a:=StrToFloat(Edit1.Text); //get the value of a
  IntPart := Trunc(a); // here I get the numerator and the denominator
  FracPart := a-Trunc(a);
  Edit2.Text:=FloatToStr(FracPart);
  numero:='1';
  for i:= 1 to (length(Edit2.Text)-2) do
  begin
    numero:=numero+'0';
  end; //this loop creates a string that has as many 0s as the length of the denominator
  Edit3.text:=FloatToStr(IntPart);
  y:=StrToInt(numero);
  x:=StrToInt(Edit3.Text);
  while y <> 0 do
  begin
    R:= x mod y;
    x:=y;
    y:=R;
  end;
  mcd:=x; //at the end of this loop I have the greatest common divisor
  nume:= StrToInt(Edit3.Text) div mcd;
  denomin:= StrToInt(numero) div mcd;
  Memo1.Lines.Add('fraction: '+IntToStr(nume)+'/'+IntToStr(denomin));
end;
It doesn't work correctly because the fraction that it gives to me is wrong. Could anyone help me please?
Your code cannot work because you are using binary floating point. And binary floating point types cannot represent the decimal numbers that you are trying to represent. Representable binary floating point numbers are of the form s × 2^e, where s is the significand and e is the exponent. So, for example, you cannot represent 0.1 as a binary floating point value.
The most obvious solution is to perform the calculation using integer arithmetic. Don't call StrToFloat at all. Don't touch floating point arithmetic. Parse the input string yourself. Locate the decimal point. Use the number of digits that follow to work out the decimal scale. Strip off any leading or trailing zeros. And do the rest using integer arithmetic.
As an example, suppose the input is '2.79'. Convert that, by processing the text, into numerator and denominator variables
Numerator := 279;
Denominator := 100;
Obviously you'd have to code string parsing routines rather than use integer literals, but that is routine.
Finally, complete the problem by finding the gcd of these two integers.
The bottom line is that to represent and operate on decimal data you need a decimal algorithm. And that excludes binary floating point.
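The text-based approach described above can be sketched as follows. This is an illustration in Python rather than Delphi, and decimal_string_to_fraction is a hypothetical name. Accepting a comma as the decimal separator is an assumption made here because the question shows "1,36842".

```python
from math import gcd


def decimal_string_to_fraction(text):
    """Turn a decimal string into a reduced (numerator, denominator) pair.

    Only integer arithmetic is used; the string is never converted to a float.
    """
    text = text.strip().replace(",", ".")
    negative = text.startswith("-")
    text = text.lstrip("+-")

    whole, _, frac = text.partition(".")
    frac = frac.rstrip("0")            # trailing zeros carry no information

    numerator = int((whole or "0") + frac)
    denominator = 10 ** len(frac)      # one factor of 10 per decimal digit

    divisor = gcd(numerator, denominator)
    numerator //= divisor
    denominator //= divisor
    return (-numerator if negative else numerator, denominator)


print(decimal_string_to_fraction("2.79"))
print(decimal_string_to_fraction("1,36842"))
```

For '2.79' this gives 279/100, matching the example above. The value from the question reduces only as far as 68421/50000, because this method is exact about the digits that were typed and does not guess at a nearby simpler fraction.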
I recommend first defining a GreaterCommonDivisor function (wiki reference).
This is going to be Java/C-like code since I'm not familiar with Delphi.
Let
float x = inputnum; // where inputnum is a float
// e.g. x = 123.56
Then multiply by 10 until no decimal part is left:
int n = 1;
float decimalpart = x % 1;
while (decimalpart != 0) { // or cast to int and check for equality -> (int)x == x
    x = x * 10;
    decimalpart = x % 1;
    // or use a function that returns the decimal part if the cast doesn't work
    n *= 10;
}
// running e.g. x = 123.56, now x = 12356
// n = 100
At this point you should have (float)x/n == inputnum, e.g. (12356/100 == 123.56).
This means you have a fraction that may not be simplified yet. All you do now is implement and use the GCD function:
int gcd = GreaterCommonDivisor(x, n);
// GreaterCommonDivisor(12356, 100) returns 4
// therefore, for a correct implementation, gcd = 4
x /= gcd; // 12356 / 4 = 3089
n /= gcd; // 100 / 4 = 25
This should be quick and simple to implement, but:
Major pitfalls:
The float must be terminating. For example, 0.333333333333333333 won't be rounded to the expected 1/3.
Float * n <= max_int_value must hold, otherwise there will be an overflow. There are workarounds for this, but there may be other solutions better suited to such large numbers.
Continued fractions can be used to find good rational approximations to real numbers. Here's an implementation in JavaScript, I'm sure it's trivial to port to Delphi:
function float2rat(x) {
    var tolerance = 1.0E-6;
    var h1=1; var h2=0;
    var k1=0; var k2=1;
    var b = x;
    do {
        var a = Math.floor(b);
        var aux = h1; h1 = a*h1+h2; h2 = aux;
        aux = k1; k1 = a*k1+k2; k2 = aux;
        b = 1/(b-a);
    } while (Math.abs(x-h1/k1) > x*tolerance);
    return h1+"/"+k1;
}
For example, 1.36842 is converted into 26/19.
You can find a live demo and more information about this algorithm on my blog.
@Joni
I tried 1/2 and the result was a "division by zero" error.
I fixed the loop by adding:
if b - a = 0 then BREAK;
to avoid the division in
b:= 1 / (b - a);
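The guard described above matters in any language where dividing by zero raises an error. As a hedged sketch, here is a Python port of the continued-fraction routine with that check folded in. The name float_to_fraction is illustrative, and returning a tuple instead of a string is a choice made here, not part of the original answer.

```python
import math


def float_to_fraction(x, tolerance=1.0e-6):
    """Approximate x by a fraction h/k using continued-fraction convergents."""
    h1, h2 = 1, 0
    k1, k2 = 0, 1
    b = x
    while True:
        a = math.floor(b)
        # Standard recurrence for the next convergent h1/k1.
        h1, h2 = a * h1 + h2, h1
        k1, k2 = a * k1 + k2, k1
        close_enough = abs(x - h1 / k1) <= abs(x) * tolerance
        # Stop before dividing when the remainder is exactly zero (e.g. x = 0.5).
        if close_enough or b == a:
            break
        b = 1 / (b - a)
    return h1, k1


print(float_to_fraction(1.36842))
print(float_to_fraction(0.5))
```

The convergence test runs before the reciprocal is taken, so an exact remainder of zero ends the loop instead of triggering a division by zero.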

Resources