I would like to test different descriptors (like SIFT, SURF, ORB, LATCH etc.) in terms of precision-recall and computation time for my image dataset in order to understand which one is more suitable.
Is there any pre-built tester in OpenCV for this purpose? Is there any other alternative or guideline?
You can use the code at the following link to compute recall vs. precision curves:
http://www.robots.ox.ac.uk/~vgg/research/affine/desc_evaluation.html#code
In order to plot them, you need to detect keypoints and extract descriptors in each image in the dataset. Next, you write the descriptors for each image to a file in the following format:
descriptor_size
nbr_of_regions
x1 y1 a1 b1 c1 d1 d2 d3 ...
x2 y2 a2 b2 c2 d1 d2 d3 ...
....
....
x, y - center coordinates
a, b, c - ellipse parameters ax^2+2bxy+cy^2=1
d1 d2 d3 ... - descriptor values, binary values in case of ORB and LATCH
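As a rough illustration of the layout above, here is a minimal Python sketch that builds the text of such a file. The function name `format_descriptor_file`, the `(x, y, radius, descriptor)` tuple layout, and the choice to treat every keypoint as a circular region are assumptions made for this example; they are not taken from the evaluation code. How binary descriptors should be unpacked (bits or bytes) is also left to the caller.

```python
def format_descriptor_file(regions):
    """Return the text of a descriptor file in the layout described above.

    `regions` is a list of (x, y, radius, descriptor) tuples. The tuple layout
    and the circular-region assumption are illustrative only.
    """
    if not regions:
        raise ValueError("need at least one region")
    descriptor_size = len(regions[0][3])
    # Header: descriptor_size on the first line, nbr_of_regions on the second.
    lines = [str(descriptor_size), str(len(regions))]
    for x, y, radius, descriptor in regions:
        if len(descriptor) != descriptor_size:
            raise ValueError("all descriptors must have the same length")
        # A circle of radius r is the ellipse a*x^2 + 2*b*x*y + c*y^2 = 1
        # with a = c = 1 / r^2 and b = 0.
        a = c = 1.0 / (radius * radius)
        b = 0.0
        fields = [x, y, a, b, c] + list(descriptor)
        lines.append(" ".join(format(v, "g") for v in fields))
    return "\n".join(lines) + "\n"


# Two made-up regions with 4-element descriptors.
example = format_descriptor_file([
    (10.0, 20.0, 2.0, [1, 0, 1, 1]),
    (30.5, 40.0, 4.0, [0, 0, 1, 0]),
])
print(example)
```

If your detector provides an affine shape instead of a scale, you would replace the circular a, b, c values with the actual ellipse parameters.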
Where does the graph of the loss function in machine learning come from?
I am studying machine learning. I sometimes don't understand models that have been optimized using regularization terms.
In the explanation of regularization, the following figure may appear.
Here is an example of the L1 regularization term. I have assumed that the model has two weight parameters w1, w2. That is, the equation of model y is expressed by the following equation.
y = w1x1 + w2x2
For simplicity, I ignored the bias term.
The red squares represent the regularization term, and the blue ellipses represent the loss function without the regularization term.
The regularization term is given by
| w1 | ^ q + | w2 | ^ q = r ^ q (r is const.)
Therefore, the equation of the graph at w1> 0 and w2> 0 is expressed as follows.
w2 = (r ^ q-| w1 | ^ q) ^ (1 / q)
By substituting values of w1 into this equation (q = 1 for Lasso), you can draw a graph of the regularization term.
On the other hand, I could not draw a graph of the loss function. Perhaps you need more than one piece of data to draw this graph. For simplicity, I have assumed that I have only two pieces of data. I define them as (x11, x12, t1), (x21, x22, t2). When the loss function is MSE, it is expressed by the following equation.
Ed = 1/2 * {(t1 - w1*x11 - w2*x12)^2 + (t2 - w1*x21 - w2*x22)^2}
If I simplify this, it is expressed as
Ed = a*w1^2 + b*w1 + c*w2^2 + d*w2 + e*w1*w2 + f
Here, a, b, c, d, e, and f are functions of all or part of x11, x12, x21, x22, t1, and t2. After finding a, b, c, d, e, and f, I thought that if I substituted values of w1 into this equation, I could draw a graph of the loss function. However, I could not draw it properly.
Is the above understanding correct? Thank you.
To visualize the loss function Ed, which is a function of w1 and w2, we should draw it as a 3-dimensional plot. For example, you can use GeoGebra to draw a 3-dimensional surface plot.
Here is an example, where a=3, b=-1, c=1, d =-1 , e=2.
The 2D plot that you see is called a contour plot. This link enables you to draw it online.
To draw a contour plot manually, fix the value of Ed, which gives you a quadratic equation in w1 and w2. Then, as you vary w1, solve for w2. For each w1 you can obtain up to two values of w2, because the equation is quadratic.
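The manual procedure above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, the coefficient f is not given in the example above and is assumed to be 0, and the contour level 1.0 is an arbitrary choice.

```python
import math


def loss(w1, w2, a, b, c, d, e, f=0.0):
    """Ed = a*w1^2 + b*w1 + c*w2^2 + d*w2 + e*w1*w2 + f."""
    return a * w1 * w1 + b * w1 + c * w2 * w2 + d * w2 + e * w1 * w2 + f


def contour_w2(w1, level, a, b, c, d, e, f=0.0):
    """Solve Ed(w1, w2) = level for w2 at a fixed w1 (assumes c != 0).

    For fixed w1 the equation is the quadratic
    c*w2^2 + (d + e*w1)*w2 + (a*w1^2 + b*w1 + f - level) = 0.
    Returns a list with 0, 1 or 2 real roots.
    """
    q_b = d + e * w1
    q_c = a * w1 * w1 + b * w1 + f - level
    disc = q_b * q_b - 4.0 * c * q_c
    if disc < 0:
        return []  # this w1 lies outside the contour
    if disc == 0:
        return [-q_b / (2.0 * c)]
    root = math.sqrt(disc)
    return [(-q_b - root) / (2.0 * c), (-q_b + root) / (2.0 * c)]


# Coefficients from the example above; f = 0 is an assumption.
coeffs = dict(a=3.0, b=-1.0, c=1.0, d=-1.0, e=2.0)
level = 1.0
points = []
for i in range(-20, 21):
    w1 = i / 10.0
    for w2 in contour_w2(w1, level, **coeffs):
        points.append((w1, w2))

print(len(points), "points on the contour Ed =", level)
```

Repeating this for several values of `level` and plotting each set of points gives the nested ellipses seen in the usual regularization figure.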
Remark: If you are looking for closed form expression in terms of arbitrary q, that could be more challenging.
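Returning to the coefficients a to f from the question: assuming the loss is the sum of squared errors, Ed = 1/2 * sum_i (t_i - w1*x_i1 - w2*x_i2)^2, they can be read off by expanding the squares. The sketch below does that and checks the expansion against the direct formula. The function names and the sample values are made up for illustration.

```python
def quadratic_coefficients(samples):
    """Coefficients of Ed = a*w1^2 + b*w1 + c*w2^2 + d*w2 + e*w1*w2 + f.

    `samples` is a list of (x1, x2, t) tuples and Ed is assumed to be
    1/2 * sum over samples of (t - w1*x1 - w2*x2)^2.
    """
    a = 0.5 * sum(x1 * x1 for x1, _, _ in samples)
    b = -sum(t * x1 for x1, _, t in samples)
    c = 0.5 * sum(x2 * x2 for _, x2, _ in samples)
    d = -sum(t * x2 for _, x2, t in samples)
    e = sum(x1 * x2 for x1, x2, _ in samples)
    f = 0.5 * sum(t * t for _, _, t in samples)
    return a, b, c, d, e, f


def direct_loss(samples, w1, w2):
    """Evaluate the sum-of-squares loss without expanding it."""
    return 0.5 * sum((t - w1 * x1 - w2 * x2) ** 2 for x1, x2, t in samples)


# Two made-up samples (x_i1, x_i2, t_i).
samples = [(1.0, 2.0, 3.0), (2.0, -1.0, 0.5)]
a, b, c, d, e, f = quadratic_coefficients(samples)

w1, w2 = 0.7, -0.3
expanded = a * w1**2 + b * w1 + c * w2**2 + d * w2 + e * w1 * w2 + f
print(expanded, direct_loss(samples, w1, w2))
```

Note that b, d and f depend on the targets t_i as well as on the inputs.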
In Andrew Ng's ML MOOC, the theory says that theta'*X gives us the hypothesis, but in the coursework we use theta*X. Why is that?
theta'*X is used to calculate the hypothesis for a single training example, when X is a vector. In that case you have to transpose theta to match the h(x) definition.
In practice, since you have more than one training example, X is a matrix (your training set) with dimensions "m x n", where m is the number of training examples and n is the number of features.
Now, you want to calculate h(x) for all your training examples with your theta parameter in just one move, right?
Here is the trick: theta has to be an n x 1 vector. Then, when you do the matrix-vector multiplication (X*theta), you obtain an m x 1 vector with h(x) for every training example in your training set (the X matrix). Matrix multiplication builds the vector h(x) row by row, and each entry equals the h(x) definition for the corresponding training example.
You can do the math by hand. I did it and now it is clear. Hope I can help someone. :)
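To check this without doing it by hand, here is a small Python sketch using plain lists. The helper names and the numbers are made up for illustration; in Octave the same thing is simply `X * theta`.

```python
def hypothesis_single(theta, x):
    """theta' * x for one training example (both given as flat lists)."""
    return sum(t_j * x_j for t_j, x_j in zip(theta, x))


def matvec(X, theta):
    """X * theta, where X is a list of rows (one row per training example)."""
    return [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]


theta = [1.0, 2.0, 3.0]          # n x 1 parameter vector, n = 3
X = [                            # m x n training set, m = 3
    [1.0, 0.5, 2.0],
    [1.0, 1.5, -1.0],
    [1.0, 0.0, 4.0],
]

h_all = matvec(X, theta)                                      # one move
h_one_by_one = [hypothesis_single(theta, row) for row in X]   # row by row

print(h_all)          # [8.0, 1.0, 13.0]
print(h_one_by_one)   # [8.0, 1.0, 13.0]
```

Each entry of `X * theta` is exactly `theta' * x` for the corresponding row of X.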
In mathematics, a 'vector' is conventionally written as a vertically stacked array (a column) and signifies, for example, a single point in 3-dimensional space.
A 'horizontal' vector typically signifies an array of observations, e.g. a tuple of 3 scalar observations.
Equally, a matrix can be thought of as a collection of vectors. E.g., a 3x4 matrix is a collection of four 3-dimensional vectors.
A scalar can be thought of as a matrix of size 1x1, and therefore its transpose is the same as the original.
More generally, an n-by-m matrix W can also be thought of as a transformation from an m-dimensional vector x to an n-dimensional vector y, since multiplying that matrix with an m-dimensional vector will yield a new n-dimensional one. If your 'matrix' W is '1xn', then this denotes a transformation from an n-dimensional vector to a scalar.
Therefore, notationally, it is customary to introduce the problem from the mathematical notation point of view, e.g. y = Wx.
However, for computational reasons, sometimes it makes more sense to perform the calculation as a "vector times a matrix" rather than "matrix times a vector". Since (Wx)' === x'W', sometimes we solve the problem like that, and treat x' as a horizontal vector. Also, if W is not a matrix, but a scalar, then Wx denotes scalar multiplication, and therefore in this case Wx === xW.
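The identity (Wx)' = x'W' is easy to verify numerically. The following Python sketch uses small hand-written helpers and made-up integer values, purely as an illustration.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A*B for matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


W = [[1, 2, 3],
     [4, 5, 6]]          # 2x3 matrix
x = [[1], [0], [-1]]     # 3x1 column vector

left = transpose(matmul(W, x))               # (Wx)', a 1x2 row vector
right = matmul(transpose(x), transpose(W))   # x'W', also 1x2

print(left, right)   # [[-2, -2]] [[-2, -2]]
```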
I don't know the exercises you speak of, but my assumption would be that in the course he introduced theta as a proper, vertical vector, but then transposed it to perform proper calculations, i.e. a transformation from a vector of n-dimensions to a scalar (which is your prediction).
Then in the exercises, presumably you were either dealing with a scalar 'theta', so there was no point transposing it and it was left as theta for convenience, or theta was defined as a horizontal (i.e. transposed) vector to begin with for some reason (e.g. printing convenience), and was then left in that state when performing the necessary transformation.
I don't know what the dimensions of your theta and X are (you haven't provided anything), but it all depends on the dimensions of X, theta and the hypothesis. Let's say m is the number of features and n is the number of examples. Then, if theta is an mx1 vector and X is an nxm matrix, X*theta is an nx1 hypothesis vector.
You will get the same values, as a 1xn row vector, from theta'*X if X is instead stored as an mxn matrix. You can also get the same result with theta*X if theta is 1xm and X is mxn.
Edit:
As @Tasos Papastylianou pointed out, if X is mxn, then (theta.'*X).' or X.'*theta gives the same result. If the hypothesis should be a 1xn vector, then theta.'*X is the answer. If theta is 1xm, X is mxn and the hypothesis is 1xn, then theta*X is also a correct answer.
I had the same problem (ML course, linear regression).
After spending some time on it, here is how I see it: there is a confusion between the x(i) vector and the X matrix.
About the hypothesis h(xi) for a vector xi (xi belongs to R3x1), where theta also belongs to R3x1:
theta = [t0; t1; t2] #R(3x1)
theta' = [t0 t1 t2] #R(1x3)
xi = [1; xi,1; xi,2] #R(3x1)
theta' * xi => t0 + t1.xi,1 + t2.xi,2
= h(xi) (which is R1x1 => a real number)
So theta'*xi works here.
About the vectorization equation
In this case X is not the same thing as x (the vector). It is a matrix with m rows and n+1 columns (m = number of examples, n = number of features, plus one column for the t0 term).
Therefore, continuing the previous example with n = 2,
the matrix X is an m x 3 matrix:
X = [1 x1,1 x1,2 ; 1 x2,1 x2,2 ; .... ; 1 xi,1 xi,2 ; ...; 1 xm,1 xm,2]
If you want to vectorize the equation for the algorithm, you need h(xi) (a real number) for each row i,
so you need to implement X * theta.
That will give you, for each row i,
[1 xi,1 xi,2] * [t0 ; t1 ; t2] = t0 + t1.xi,1 + t2.xi,2
Hope it helps
I have used octave notation and syntax for writing matrices: 'comma' for separating column items, 'semicolon' for separating row items and 'single quote' for Transpose.
In the course theory under discussion, theta = [theta0; theta1; theta2; theta3; .... thetaf].
'theta' is therefore a column vector or '(f+1) x 1' matrix. Here 'f' is the number of features. theta0 is the intercept term.
With just one training example, x is a '(f+1) x 1' matrix or a column vector. Specifically x = [x0; x1; x2; x3; .... xf]
x0 is always '1'.
In this special case the '1 x (f+1)' matrix theta' and the '(f+1) x 1' matrix x can be multiplied to give the correct '1x1' hypothesis matrix, i.e. a real number.
h = theta' * x is a valid expression.
But the coursework deals with multiple training examples. If there are 'm' training examples, X is a 'm x (f+1)' matrix.
To simplify, let there be two training examples each with 'f' features.
X = [ x1; x2].
(Please note 1 and 2 inside the brackets are not exponential terms but indexes for the training examples).
Here, x1 = [ x01, x11, x21, x31, .... xf1 ]
and
x2 = [ x02, x12, x22, x32, .... xf2].
So X is a '2 x (f+1)' matrix.
Now to answer the question, theta' is a '1 x (f+1)' matrix and X is a '2 x (f+1)' matrix. With this, the following expressions are not valid.
theta' * X
theta * X
The expected hypothesis matrix, 'h', should have two predicted values (two real numbers), one for each of the two training examples. 'h' is a '2 x 1' matrix or column vector.
The hypothesis can be obtained only by using the expression X * theta, which is valid and algebraically correct. Multiplying a '2 x (f+1)' matrix by a '(f+1) x 1' matrix results in a '2 x 1' hypothesis matrix.
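The dimension argument above can be checked mechanically. The sketch below only compares shapes, with f = 3 chosen arbitrarily; `product_shape` is a made-up helper, not an Octave function.

```python
def product_shape(shape_a, shape_b):
    """Return the shape of A*B, or None when the inner dimensions differ."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    if cols_a != rows_b:
        return None
    return (rows_a, cols_b)


f = 3                      # number of features (arbitrary for this example)
theta = (f + 1, 1)         # column vector
theta_t = (1, f + 1)       # theta'
X = (2, f + 1)             # two training examples

results = {
    "theta' * X": product_shape(theta_t, X),
    "theta * X": product_shape(theta, X),
    "X * theta": product_shape(X, theta),
}
for expression, shape in results.items():
    print(expression, "->", shape if shape else "not valid")
```

One caveat: with f = 1 the product theta' * X happens to be dimensionally valid (1x2 times 2x2), but it mixes values from different training examples and does not compute the hypothesis.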
When Andrew Ng first introduced x in the cost function J(theta), x is a column vector
aka
[x0; x1; ... ; xn]
i.e.
x0;
x1;
...;
xn
However, in the first programming assignment, we are given X, which is an (m * n) matrix (number of training examples * features per training example). The discrepancy comes from the fact that in the data file the individual x vectors (training samples) are stored as horizontal row vectors rather than vertical column vectors!
This means the X matrix you see is actually an X' (X transpose) matrix!
Since we have X', we need to make our code work, given that our equation is looking for h(theta) = theta' * X (when the vectors in matrix X are column vectors).
we have the linear algebra identity for matrix and vector multiplication:
(A*B)' == (B') * (A') as shown here Properties of Transposes
let t = theta,
given, h(t) = t' * X
h(t)' = (t' * X)'
= X' * t
Now we have our variables in the format they were actually given to us. What I mean is that our input file really holds X' and theta is a normal column vector, so multiplying them in the order specified above gives an output that is practically equivalent to the one he taught us to use, which was theta' * X. Since we are summing all the elements of h(t)' at the end, it doesn't matter that it is transposed for the final calculation. However, if you wanted h(t), not h(t)', you could always take your computed result and transpose it, because
(A')' == A
However, for the coursera machine learning programming assignment 1, this is unnecessary.
This is because the computer has the coordinate (0,0) positioned on the top left, while geometry has the coordinate (0,0) positioned on the bottom left.
I was reading about non-parametric kernel density estimation.
http://en.wikipedia.org/wiki/Kernel_density_estimation
For the univariate case, where d = 1, we can write:
For multivariate kernel density estimation (KDE), more precisely for d = 3 and X = (x, y, z), can we write:
Is this technically correct? Can anyone help with this?
This is very difficult to do on your own, and you really should do this through some package. Nevertheless, the definition is:
f_H(x) = \frac{1}{n} \sum_{i=1}^{n} K_H(x - x_i), where
x = (x_1, x_2, \ldots, x_d)^T and x_i = (x_{i1}, x_{i2}, \ldots, x_{id})^T, i = 1, 2, \ldots, n, are d-vectors;
H is the d \times d bandwidth (or smoothing) matrix, which is symmetric and positive definite;
K is the kernel function, which is a symmetric multivariate density;
K_H(x) = |H|^{-1/2} K(H^{-1/2} x).
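To make the definition concrete, here is a minimal Python sketch for one special case: a standard Gaussian kernel and a diagonal bandwidth matrix H = diag(h_1^2, ..., h_d^2). Both choices are assumptions made for illustration; with a diagonal H, |H|^{-1/2} is 1 / (h_1 * ... * h_d) and H^{-1/2} x simply divides each coordinate by its h_j. A full (non-diagonal) H would need a matrix square root, which is one reason a package is the better route in practice.

```python
import math


def gaussian_kde(x, data, bandwidths):
    """Evaluate f_H(x) for a Gaussian kernel and diagonal H.

    x          : point of dimension d (sequence of floats)
    data       : list of n sample points, each of dimension d
    bandwidths : per-dimension bandwidths h_1..h_d, i.e. H = diag(h_j^2)
    """
    d = len(x)
    n = len(data)
    det_sqrt = math.prod(bandwidths)           # |H|^(1/2) for diagonal H
    norm = (2.0 * math.pi) ** (-d / 2.0)       # Gaussian normalising constant
    total = 0.0
    for xi in data:
        # Squared length of H^(-1/2) * (x - x_i).
        u_sq = sum(((xj - xij) / h) ** 2
                   for xj, xij, h in zip(x, xi, bandwidths))
        total += norm * math.exp(-0.5 * u_sq)
    return total / (n * det_sqrt)


# Made-up 3-D samples (x, y, z) and bandwidths.
data = [(0.0, 0.0, 0.0), (1.0, 0.5, -0.5), (-0.5, 1.0, 0.25)]
density = gaussian_kde((0.2, 0.3, 0.0), data, bandwidths=(0.8, 0.8, 0.8))
print(density)
```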
I have a regression related question, but I am not sure how to proceed. Consider the following dataset, with A, B, C, and D as the attributes (features) and a decision variable Dec for each row:
A B C D Dec
a1 b1 c1 d1 Y
a1 b2 c2 d2 N
a2 b2 c3 d2 N
a2 b1 c3 d1 N
a1 b3 c2 d3 Y
a1 b1 c1 d2 N
a1 b1 c4 d1 Y
Given such data, I want to figure out the most compact rules for which Dec evaluates to Y.
For example, A=a1 AND B=b1 AND D=d1 => Y.
I would prefer specifying the thresholds for the Precision of these rules, so that I can filter them out as per my requirement. For example, I would like to see all rules which provide at least 90% precision. This can provide me better compaction of the rules. The above mentioned rule provides 100% precision, whereas B=b1 AND D=d1 => Y has 66% precision (it errs on the 4th row).
Vaguely, I can see that this is similar to building a decision tree and finding out paths which end in Y. If I understand correctly, building a regression model would provide me the attributes which matter the most, but I need combinations of actual values from the attributes which lead to Y.
The attribute values are multi-valued, but that is not a hard constraint. I can even assume them to be boolean.
Is there any library in existing tools such as Weka or R that can help me?
Regards
I don't think this is a regression problem. This seems like a classification problem where you are trying to classify rows as Y or N. You could build ensemble learners like AdaBoost and see how the decisions vary from tree to tree, or you could do something like elastic net logistic regression and see what the final weights are.
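For a table as small as the one in the question, the candidate rules can also be enumerated by brute force and filtered by precision. Note that this is a different technique from the ensemble or elastic-net approaches suggested above, and it will not scale to many attributes. The names `rule_stats` and `mine_rules` are made up for this sketch.

```python
from itertools import combinations

COLUMNS = ("A", "B", "C", "D")
ROWS = [  # the table from the question; last field is Dec
    ("a1", "b1", "c1", "d1", "Y"),
    ("a1", "b2", "c2", "d2", "N"),
    ("a2", "b2", "c3", "d2", "N"),
    ("a2", "b1", "c3", "d1", "N"),
    ("a1", "b3", "c2", "d3", "Y"),
    ("a1", "b1", "c1", "d2", "N"),
    ("a1", "b1", "c4", "d1", "Y"),
]


def rule_stats(rule):
    """Return (precision, support) of the rule `column=value AND ... => Y`."""
    matched = [row for row in ROWS
               if all(row[COLUMNS.index(col)] == val for col, val in rule.items())]
    if not matched:
        return None, 0
    hits = sum(1 for row in matched if row[-1] == "Y")
    return hits / len(matched), len(matched)


def mine_rules(min_precision):
    """List all conjunctive rules seen in the data, shortest first."""
    found = []
    for size in range(1, len(COLUMNS) + 1):
        for cols in combinations(range(len(COLUMNS)), size):
            seen = set()
            for row in ROWS:
                values = tuple(row[i] for i in cols)
                if values in seen:
                    continue
                seen.add(values)
                rule = {COLUMNS[i]: v for i, v in zip(cols, values)}
                precision, support = rule_stats(rule)
                if precision >= min_precision:
                    found.append((rule, precision, support))
    return found


for rule, precision, support in mine_rules(0.9):
    print(rule, f"precision={precision:.2f}", f"support={support}")
```

Reporting the support next to the precision matters here, because rules that match a single row trivially reach 100% precision.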
I am trying to find an appropriate neural network structure to learn a function of the following form: F(x1,x2,x3,x4,x5)= a*x1+b*(x2-x4)/(x3-x4) + c*x5.
I am using MATLAB's Neural Network Toolbox to create a feedforwardnet, but without any luck.
Is it even possible to learn this kind of function using a neural network?
If yes, what can be an appropriate structure?
If no, are there any other models that can learn this kind of function?
Thanks.
I suggest that you start by preparing a training dataset in which you have the following:
1- Dataset
x1, x6, x5; x6 = (x2 - x4) / (x3 - x4)
2- Target label Y
Y = f(x1, x6, x5); you may assume some values of a,b,c
So, you have 3 input variables or features with one target variable Y.
Then, define the ANN to have only a single layer (a single-layer perceptron) and make sure that the output function is linear.
Finally, train the ANN, give it new values of x1, x5 and x6, and compare the output with the actual function.
If I understand correctly, you are trying to estimate the values of a, b, and c. Although the function is not linear with respect to its input, it is linear with respect to a, b, and c. So you should be able to solve your problem with linear regression.
More precisely, if you define x6 = (x2 - x4) / (x3 - x4), then you get F(x1, x5, x6) = a * x1 + b * x6 + c * x5, which is linear.
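A minimal Python sketch of this idea follows, assuming noise-free data and using ordinary least squares via the normal equations with a small hand-written solver. The function names, the "true" values of a, b, c and the sample inputs are all made up for illustration; with real data you would normally use a library routine instead.

```python
def solve_linear_system(A, y):
    """Solve A*z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y_i] for row, y_i in zip(A, y)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(M[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (M[r][n] - s) / M[r][r]
    return solution


def fit_abc(samples):
    """Least-squares estimate of (a, b, c) in F = a*x1 + b*x6 + c*x5.

    `samples` is a list of ((x1, x2, x3, x4, x5), F) pairs with x3 != x4.
    """
    features, targets = [], []
    for (x1, x2, x3, x4, x5), target in samples:
        x6 = (x2 - x4) / (x3 - x4)                   # the derived feature
        features.append((x1, x6, x5))
        targets.append(target)
    gram = [[sum(row[i] * row[j] for row in features) for j in range(3)]
            for i in range(3)]
    rhs = [sum(row[i] * t for row, t in zip(features, targets))
           for i in range(3)]
    return solve_linear_system(gram, rhs)


true_a, true_b, true_c = 2.0, -1.5, 0.5              # made-up ground truth
inputs = [
    (1.0, 2.0, 3.0, 1.0, 0.0),
    (0.0, 4.0, 2.0, 1.0, 1.0),
    (2.0, 1.0, 5.0, 3.0, -1.0),
    (-1.0, 0.0, 4.0, 2.0, 2.0),
]
samples = [
    (x, true_a * x[0] + true_b * (x[1] - x[3]) / (x[2] - x[3]) + true_c * x[4])
    for x in inputs
]
a, b, c = fit_abc(samples)
print(a, b, c)
```

The same derived feature x6 is what makes the single linear layer suggested above sufficient.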