I am using Weka for classification, and I would like to understand what it means to change the value of epsilon in the SVM classifier.
The tooltip says:
epsilon -- The epsilon for round-off error (shouldn't be changed).
Take a look at
https://svn.cms.waikato.ac.nz/svn/weka/trunk/weka/src/main/java/weka/classifiers/functions/SMO.java
m_eps is used in a method called takeStep(). Here is the definition:
/**
* Method solving for the Lagrange multipliers for
* two instances.
*
* @param i1 index of the first instance
* @param i2 index of the second instance
* @param F2
* @return true if multipliers could be found
* @throws Exception if something goes wrong
*/
protected boolean takeStep(int i1, int i2, double F2) throws Exception {
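Inside takeStep(), m_eps acts as a pure numerical tolerance. In Platt's SMO pseudo-code, which this implementation follows, a step is abandoned when the change in a Lagrange multiplier is too small to be distinguished from round-off. A sketch of that check (variable names follow Platt's paper, not necessarily SMO.java):
// Minimum-progress check from Platt's SMO pseudo-code: if the new
// multiplier a2 barely differs from the old one (alph2), treat the
// step as numerical noise and abandon it.
if (Math.abs(a2 - alph2) < eps * (a2 + alph2 + eps)) {
    return false; // no meaningful progress was made
}
So the tooltip's advice makes sense: epsilon only defines when a step counts as real numerical progress; it does not control how tightly the SVM fits the data (the complexity parameter C and the tolerance parameter do that).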
I'm currently taking Andrew Ng's machine learning course, and I try to implement the material as I learn it so I don't forget it. I just finished regularization (chapter 7). I know that theta_0 is updated normally, separately from the other parameters; however, I am not sure which of these is the correct implementation.
Implementation 1: in my gradient function, after computing the regularization vector, set its theta_0 entry to 0, so that when it is added to the total it is as if theta_0 was never regularized.
Implementation 2: store theta in a temp variable _theta and update it with a reg_step of 0 (so it's as if there's no regularization); store the new theta_0 in a temp variable t1; then update the original theta with my desired reg_step and replace its theta_0 with t1 (the value from the non-regularized update).
Below is my code for the first implementation; it's not meant to be advanced, I'm just practicing.
I'm using Octave, which is 1-indexed, so theta(1) is theta(0):
function ret = gradient(X, Y, theta, reg_step)
  H = theta' * X;                 % hypothesis for all examples (one example per column of X)
  dif = H - Y;                    % prediction errors
  mul = dif .* X;                 % errors weighted by each feature
  total = sum(mul, 2);            % sum over the examples
  m = size(Y, 2);                 % number of training examples (Y is a 1 x m row vector)
  regular = (reg_step/m) * theta; % regularization vector
  regular(1) = 0;                 % theta(1), i.e. theta_0, is not regularized
  ret = (total/m) + regular;
endfunction
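For reference, a minimal (hypothetical) way I plug this into a descent loop, assuming a learning rate alpha:
% Hypothetical usage: one gradient-descent update with learning rate alpha
alpha = 0.01;
theta = theta - alpha * gradient(X, Y, theta, reg_step);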
Thanks in advance.
A slight tweak to the first implementation worked for me.
First, calculate the regularization term for every theta. Then perform the gradient step, and afterwards manually overwrite the first entry of the gradient vector to ignore regularization for theta_0.
% Calculate regularization
regularization = (reg_step / m) * theta;
% Gradient Step
gradients = (1 / m) * (X' * (predictions - y)) + regularization;
% Ignore regularization in theta_0
gradients(1) = (1 / m) * (X(:, 1)' * (predictions - y));
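A variation on the same idea, if you prefer to avoid the manual overwrite: build the regularization vector with a zero in the theta_0 slot from the start (a sketch using the same variable names):
% Zero out the theta_0 entry before adding the regularization term
regularization = (reg_step / m) * [0; theta(2:end)];
gradients = (1 / m) * (X' * (predictions - y)) + regularization;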
I'm doing Andrew Ng's course on Machine Learning, and I'm trying to wrap my head around the vectorised implementation of gradient descent for multiple variables, which is an optional exercise in the course.
This is the algorithm in question (taken from here; the image shows the standard multivariate update rule):
theta_j := theta_j - alpha/m * SUM_i (h_theta(x^(i)) - y^(i)) * x_j^(i)    (simultaneously for all j)
I just cannot do this in Octave using sum, though; I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by all the variables xj(i). I tried different iterations of the following code, but to no avail (either the dimensions are not right or the answer is wrong):
theta = theta - alpha/m * sum(X * theta - y) * X;
The correct answer, however, is entirely non-obvious (to a linear algebra beginner like me, anyway; from here):
theta = theta - (alpha/m * (X * theta-y)' * X)';
Is there a rule of thumb for cases where sum is involved that governs transformations like the above?
And if so, is there an opposite version of the above (i.e. going from a sum-based solution to a general multiplication one)? I was able to come up with a correct implementation using sum for gradient descent for a single variable (albeit not a very elegant one):
temp0 = theta(1) - (alpha/m * sum(X * theta - y));
temp1 = theta(2) - (alpha/m * sum((X * theta - y)' * X(:, 2)));
theta(1) = temp0;
theta(2) = temp1;
Please note that this only concerns vectorised implementations, and although there are several questions on SO about how this is done, my question is primarily concerned with implementing the algorithm in Octave using sum.
The general "rule of thumb" is as follows: if you encounter something of the form
SUM_i f(x_i, y_i, ...) g(a_i, b_i, ...)
then you can easily vectorize it (and this is what is done in the above) through
f(x, y, ...)' * g(a, b, ...)
As this is just a typical dot product, which in mathematics (in Euclidean space of finite dimension) looks like
<A, B> = SUM_i A_i B_i = A'B
thus
(X * theta - y)' * X
is just
<X * theta - y, X> = <H_theta(X) - y, X> = SUM_i (H_theta(X_i) - y_i) X_i
As you can see, this works both ways, as this is just the mathematical definition of the dot product.
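You can convince yourself of this in Octave with made-up data (a quick sanity check only):
% Made-up data: 5 examples, 2 features
X = rand(5, 2); y = rand(5, 1); theta = rand(2, 1);
% Sum form: grad_j = SUM_i (h_theta(x_i) - y_i) * x_ij
grad_sum = zeros(2, 1);
for j = 1:2
  grad_sum(j) = sum((X * theta - y) .* X(:, j));
end
% Dot-product form from above
grad_dot = ((X * theta - y)' * X)';
% The two agree up to floating-point round-off
disp(max(abs(grad_sum - grad_dot)))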
Referring to this part of your question specifically - "I'm not sure how to multiply the sum of the hypothesis of x(i) - y(i) by the all variables xj(i)."
In Octave you can operate on all elements of a vector at once using the element-wise "." operators, so this can be written without an explicit loop. For example, the cost works out as:
m = size(X, 1);                    % number of training examples
predictions = X * theta;           % hypothesis h_theta(x) for all examples at once
sqrErrors = (predictions - y).^2;  % element-wise squared errors
J = 1 / (2*m) * sum(sqrErrors);    % cost function
The vector multiplication automatically includes calculating the sum of the products, so you don't have to call sum() yourself. Using sum() where it isn't needed collapses a vector into a scalar, which is not what you want.
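The gradient the question actually asks about follows the same pattern; written with the same variable names, no sum() is needed because the matrix product already sums over the examples (a sketch, assuming X has one example per row):
% Vectorized gradient: the product X' * errors sums over all examples implicitly
gradient = (1 / m) * (X' * (predictions - y));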
You actually don't want to use summation here, because what you are trying to calculate are the individual values for all the thetas, not the overall cost J. Since you do this in one line of code, if you sum it up you end up with a single value (the sum of all the theta updates).
Summation was correct, though unnecessary, when you computed the values of theta one by one in the previous exercise. This works just the same:
temp0 = theta(1) - (alpha/m * (X * theta - y)' * X(:, 1));
temp1 = theta(2) - (alpha/m * (X * theta - y)' * X(:, 2));
theta(1) = temp0;
theta(2) = temp1;
Please help me understand the "unit" concept in neural networks. From the book I understood that a unit in the input layer represents an attribute of a training tuple. However, it is left unclear how exactly it does so.
Here is the diagram:
There are two "thinking paths" about the input units. The first could be that X1 stands for attr1, X2 stands for attr2, and so on. Otherwise, it could be that X1, X2, and X3 all represent attr1, where X1 stands for Value.VALUE_ONE, ..., X3 stands for Value.VALUE_THREE. So in the latter case, if attr1 = Value.VALUE_TWO, then that unit is weighted and fed simultaneously to the second layer.
public class Tuple
{
    private Value attr1;
    private Value attr2;
    private Value attr3;
}
public enum Value
{
    VALUE_ONE,
    VALUE_TWO,
    VALUE_THREE
}
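To make the second reading concrete, I imagine each enum constant getting its own input unit, so attr1 = Value.VALUE_TWO would activate only X2 (a hypothetical sketch, not from the book):
// Hypothetical one-hot encoding under the second reading:
// one input unit per enum constant.
double[] encode(Value v)
{
    double[] x = new double[Value.values().length]; // {X1, X2, X3}
    x[v.ordinal()] = 1.0;                           // VALUE_TWO -> {0, 1, 0}
    return x;
}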
The second question is about the hidden layer units: how is it decided how many units the hidden layer should have, and what do they represent in the model?
The "units" are just floating point values.
All computations happening there are vector multiplications, and thus can be parallelized well using matrix multiplications and GPU hardware.
The general computation looks like this:
double phi(double[] x, double[] w, double theta) {
    double sum = theta;                 // start from the bias term
    for (int i = 0; i < x.length; i++)
        sum += x[i] * w[i];             // weighted sum of the inputs
    return tanh(sum);                   // non-linear activation
}
except that you don't want to do this in Java code yourself. You want to do this on a GPU in a parallelized way, because this will be 100x faster.
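To get a feel for the numbers, a hypothetical call with a one-hot input (all values made up):
// Hypothetical example: 3 inputs, bias theta = 0.2
double[] x = {1.0, 0.0, 0.0};   // e.g. an attribute one-hot encoded as its first value
double[] w = {0.5, -0.3, 0.1};  // learned weights
double out = phi(x, w, 0.2);    // tanh(0.2 + 0.5) ≈ 0.604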
According to the TryF#.org site, the function below returns quadruple the number entered.
let quadruple x =
let double x = x * 2
double(double(x))
Can anyone explain why? As I interpret it (shown below), quadruple doesn't perform any mutation or multiple calls:
function quadruple(x)
return function double(x)
return x * 2
or C#
int a(int x) { return b(x); }
int b(int x) { return x * 2; }
I think this is just confused indentation. The function should probably look like this:
let quadruple x = 
    let double x = x * 2
    double(double(x))
This should hopefully make more sense - the quadruple function defines a function double and then calls it on the input x (multiplying it by 2) and then applies double to the result, multiplying it by 2 again, so the result is (x * 2) * 2.
Using the indentation in your sample, the code would not compile, because it is not syntactically valid (a function body cannot end with a let line - it needs to end with an expression representing some result to be returned).
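Tracing the fixed version on a concrete input makes the double application visible:
quadruple 3  // double (double 3) = double 6 = 12, i.e. 3 * 4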
I have a simple function that takes two tuples, but I'm getting a compiler error on the type:
module test
open System.IO
open System
let side (x1,y1) (x2,y2) : float = 
    Math.Sqrt((x2 - x1)*(x2 - x1) + (y2 - y1)*(y2 - y1))
let a = side ( 2s, 3s ) ( 1s, 2s )
Error 2 The type 'float' does not match the type 'int16'
I'm not sure where it goes wrong. Can anyone help?
Thanks!
Math.Sqrt expects a float argument, but you are passing int16 values. F# doesn't perform such implicit conversions:
let side (x1,y1) (x2,y2) : float = 
    (x2 - x1)*(x2 - x1) + (y2 - y1)*(y2 - y1)
    |> float
    |> Math.Sqrt
or you can pass floats from the very beginning:
let side (x1,y1) (x2,y2) : float = Math.Sqrt((x2 - x1)*(x2 - x1) + (y2 - y1)*(y2 - y1))
let a = side ( 2.0, 3.0 ) ( 1.0, 2.0 )
As others already pointed out, the F# compiler doesn't automatically insert any conversions between numeric types. This means that if you're writing a function that works with floats, you need to pass it floats as arguments.
The function in your example can work with various types, because Math.Sqrt and numeric operators are overloaded. If you write it without any type annotations, you'll get a function working with floats (because Math.Sqrt only works with floats):
> let side (x1,y1) (x2,y2) =
      Math.Sqrt((x2 - x1)*(x2 - x1) + (y2 - y1)*(y2 - y1));;
val side : float * float -> float * float -> float
This can be called only with floats as arguments, so you need to call it as Joel suggests. If you want a function that takes another numeric type as a parameter, you'll need to add a type annotation and a conversion. I would write it like this:
> let side (x1:int16,y1) (x2,y2) =
      let n = (x2 - x1)*(x2 - x1) + (y2 - y1)*(y2 - y1)
      Math.Sqrt(float(n));;
val side : int16 * int16 -> int16 * int16 -> float
We need only a single type annotation (the compiler then figures out that y1, x2, ... also have to be of type int16, because we're multiplying/adding them and that's only allowed on two values of the same type). So, now you can write:
side ( 2s, 3s ) ( 1s, 2s )
Note that the version by desco is a bit tricky: it adds a conversion (using the float function), but it doesn't have a type annotation to specify the type of the parameters. In this case, the compiler will pick the default type, which is int, so if you use his function you'll have to call it as side (2,3) (1,2).
The signature of your function is float * float -> float * float -> float but you're passing in int16 values (that's what the s suffix means).
One way to get it to compile would be to do this:
let a = side ( 2.0, 3.0 ) ( 1.0, 2.0 )