Please help me understand the notion of a "unit" in neural networks. From the book I understood that a unit in the input layer represents an attribute of a training tuple. However, it is left unclear how exactly it does so.
Here is the diagram:
There are two possible readings of the input units. The first is that X1 stands for attr1, X2 stands for attr2, and so on. The other is that X1, X2, and X3 all represent attr1, with X1 standing for Value.VALUE_ONE, ..., X3 standing for Value.VALUE_THREE. In the latter case, if attr1 = Value.VALUE_TWO, then that value is weighted and fed simultaneously to the second layer.
public class Tuple
{
    private Value attr1;
    private Value attr2;
    private Value attr3;
}

public enum Value
{
    VALUE_ONE,
    VALUE_TWO,
    VALUE_THREE
}
The second question is about hidden layer units. How is it decided how many units there shall be in the hidden layer, and what do they represent in the model?
The "units" are just floating point values.
All computations happening there are vector multiplications, and thus can be parallelized well using matrix multiplications and GPU hardware.
The general computation looks like this:
double phi(double[] x, double[] w, double theta) {
    // Weighted sum of the inputs plus the bias theta...
    double sum = theta;
    for (int i = 0; i < x.length; i++)
        sum += x[i] * w[i];
    // ...squashed through the activation function.
    return Math.tanh(sum);
}
except that you don't want to do this in Java code yourself. You want to do this on a GPU in a parallelized way, because that can easily be 100x faster.
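To make the matrix view concrete: stacking each unit's weight vector as a row of a matrix W turns a whole layer into y = tanh(W·x + θ), one matrix-vector product per layer. Here is a minimal CPU-only sketch in C++ (illustrative only; the names are mine, and a real implementation would hand this off to a BLAS or GPU library):

#include <cmath>
#include <vector>

// One layer: y[j] = tanh(theta[j] + dot(W[j], x)),
// i.e. y = tanh(W x + theta) with unit j's weights in row j of W.
std::vector<double> layer(const std::vector<std::vector<double>>& W,
                          const std::vector<double>& x,
                          const std::vector<double>& theta) {
    std::vector<double> y(W.size());
    for (std::size_t j = 0; j < W.size(); ++j) {
        double sum = theta[j];
        for (std::size_t i = 0; i < x.size(); ++i)
            sum += W[j][i] * x[i];
        y[j] = std::tanh(sum);
    }
    return y;
}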
I have 6 samples of 1-dimensional data as an example, and I'm trying to train VLFeat's SVM on it:
data:
[188.00000000;
168.00000000;
191.00000000;
150.00000000;
154.00000000;
124.00000000]
The first 3 samples are positive and the last 3 samples are negative.
I get these weights (including the bias):
w: -0.6220197226 -0.0002974511
The problem is that all samples get predicted as negative, even though they are clearly linearly separable.
For learning I use solver type VlSvmSolverSgd and lambda 0.01.
I'm using the C API, if it matters.
Minimum working example:
#include <iostream>
#include <vl/svm.h> // VLFeat SVM C API header (per the VLFeat docs)

using namespace std;

void vlfeat_svm_test()
{
    vl_size const numData = 6;
    vl_size const dimension = 1;
    //double x[dimension * numData] = {188.0, 168.0, 191.0, 150.0, 154.0, 124.0};
    double x[dimension * numData] = {188.0/255, 168.0/255, 191.0/255, 150.0/255, 154.0/255, 124.0/255};
    double y[numData] = {1, 1, 1, -1, -1, -1};
    double lambda = 0.01;

    VlSvm *svm = vl_svm_new(VlSvmSolverSgd, x, dimension, numData, y, lambda);
    vl_svm_train(svm);

    double const *w = vl_svm_get_model(svm);
    double bias = vl_svm_get_bias(svm);

    for (int k = 0; k < numData; ++k)
    {
        double res = 0.0;
        for (int i = 0; i < dimension; ++i)
        {
            res += x[k*dimension + i] * w[i];
        }
        int pred = ((res + bias) > 0) ? 1 : -1;
        cout << pred << endl;
    }

    cout << "w: ";
    for (int i = 0; i < dimension; ++i)
        cout << w[i] << " ";
    cout << bias << endl;

    vl_svm_delete(svm);
}
Update:
I also tried to scale the input data by dividing by 255; it has no effect.
Update 2:
An extremely low lambda = 0.000001 seems to solve the problem.
This happens because the SVM solvers in VLFeat do not estimate the model and bias directly, but use the workaround of adding a constant component to the data (as mentioned in http://www.vlfeat.org/api/svm-fundamentals.html) and return the corresponding model weight as the bias.
The bias term is thus a part of the regularizer and models with higher bias are "penalized" in terms of energy. This effect is especially strong in your case, since your data are extremely low dimensional :)
Therefore you need to choose a small value of the regularization parameter LAMBDA to lower the importance of the regularizer.
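If you do not want to weaken the regularizer globally, another knob is the size of that appended constant component. A hedged sketch (this assumes the vl_svm_set_bias_multiplier accessor from the VLFeat C API; check your version's vl/svm.h):

// Enlarging the constant component B appended to each sample means a
// smaller bias weight yields the same offset, so the regularizer
// penalizes the bias less. (vl_svm_set_bias_multiplier is assumed to
// exist in your VLFeat version; the default multiplier is 1.)
VlSvm *svm = vl_svm_new(VlSvmSolverSgd, x, dimension, numData, y, lambda);
vl_svm_set_bias_multiplier(svm, 10.0);
vl_svm_train(svm);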
I searched around, and it turns out the answer to this is surprisingly hard to find. There are algorithms out there that can generate a random orientation in quaternion form, but they involve sqrt and trig functions. I don't really need a uniformly distributed orientation; I just need to generate (many) quaternions such that their randomness in orientation is "good enough." I can't specify what "good enough" is, except that I need to be able to do the generation quickly.
Quoted from http://planning.cs.uiuc.edu/node198.html:
Choose three points u, v, w ∈ [0,1] uniformly at random. A uniform, random quaternion is given by the simple expression:
h = ( sqrt(1-u) sin(2πv), sqrt(1-u) cos(2πv), sqrt(u) sin(2πw), sqrt(u) cos(2πw))
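For reference, a direct transcription of that expression as a minimal C++ sketch (the Quaternion constructor and a random(a, b) uniform helper are assumed here, matching the conventions of the code in the answer below):

#include <cmath>

// Uniform random quaternion from three independent uniforms u, v, w in [0, 1].
Quaternion uniform_random_quaternion() {
    double u = random(0, 1), v = random(0, 1), w = random(0, 1);
    const double TWO_PI = 6.283185307179586;
    return Quaternion(std::sqrt(1 - u) * std::sin(TWO_PI * v),
                      std::sqrt(1 - u) * std::cos(TWO_PI * v),
                      std::sqrt(u)     * std::sin(TWO_PI * w),
                      std::sqrt(u)     * std::cos(TWO_PI * w));
}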
From Choosing a Point from the Surface of a Sphere by George Marsaglia:
Generate independent x, y uniformly in (-1..1) until z = x²+y² < 1.
Generate independent u, v uniformly in (-1..1) until w = u²+v² < 1.
Compute s = √((1-z) / w).
Return the quaternion (x, y, su, sv). It's already normalized.
This will generate a uniform random rotation because 4D spheres, unit quaternions and 3D rotations have equivalent measures.
The algorithm uses one square root, one division, and 16/π ≈ 5.09 random numbers on average. C++ code:
Quaternion random_quaternion() {
    // random(a, b) is assumed to return an independent uniform double in [a, b].
    double x, y, z, u, v, w, s;
    do { x = random(-1, 1); y = random(-1, 1); z = x*x + y*y; } while (z > 1);
    do { u = random(-1, 1); v = random(-1, 1); w = u*u + v*v; } while (w > 1);
    s = sqrt((1 - z) / w);
    return Quaternion(x, y, s*u, s*v);
}
The simplest way to generate one: just generate 4 random floats and normalize the quaternion if required. If you want to produce rotation matrices later, then normalization can be skipped, and the conversion procedure should handle non-unit quaternions.
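For illustration, a minimal sketch of that approach (same assumed Quaternion type and random(a, b) helper as above):

#include <cmath>

Quaternion random_quaternion_fast() {
    // Four independent uniforms in [-1, 1], then normalize. Sampling from
    // the cube and normalizing is slightly biased toward the cube's
    // diagonals, i.e. not exactly uniform, but "good enough" here.
    double a = random(-1, 1), b = random(-1, 1);
    double c = random(-1, 1), d = random(-1, 1);
    double n = std::sqrt(a*a + b*b + c*c + d*d); // zero only with negligible probability
    return Quaternion(a / n, b / n, c / n, d / n);
}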
The code below plots, but the result of conv is a vector of a new length, so t is useless to include in the plot; plot(t, z1) doesn't work!
t = [-5:.1:10];
unit = @(t) 1.*(t>=0);
h1 = @(t) (3*t + 2).*exp(-3*t).*unit(t);
z1 = conv(unit(t), h1(t));
plot(z1);
I want a plot of the convolved signal as a function of time.
You need to add the shape argument. Here's the spec:
-- Function File: conv (a, b)
-- Function File: conv (a, b, shape)

Convolve two vectors a and b.

The output convolution is a vector with length equal to length(a) + length(b) - 1. When a and b are the coefficient vectors of two polynomials, the convolution represents the coefficient vector of the product polynomial.

The optional shape argument may be:

shape = "full"
    Return the full convolution. (default)
shape = "same"
    Return the central part of the convolution with the same size as a.
so convolve like this:
z1 = conv(unit(t), h1(t), "same");
And you'll get a result the same length as the original t, so plot(t, z1) now works.
I am fairly new to OpenCV and am understanding it bit by bit. I know that the matrix operators in the cv::Mat class have been overloaded so you can write A.mul(B), A+B, A-B, A/B, etc.
I have two vectors which are projections of the rows and columns of an image. I have two images (S and T), so each of them has two projection vectors (rowProjectionS, columnProjectionS, rowProjectionT, columnProjectionT). I also have the means of the images (meanS, meanT). I need to do a "SUM OF PRODUCT" related calculation, which in MATLAB is as follows:
numeratorLambdaRo = sum((rowProjectionT - meanT).*(rowProjectionS - meanS));
denominatorLambdaRo = sqrt(sum((rowProjectionT - meanT).^2)*sum((rowProjectionS - meanS).^2));
LambdaRo = numeratorLambdaRo/denominatorLambdaRo;
I am not entirely sure about the capability of matrix operators in the context of cv::Mat objects.
Declare meanT and meanS as double or cv::Scalar, and you can just subtract them from your matrices. You can then split your operations:
rowProjectionT -= meanT;
rowProjectionS -= meanS;
// Transpose one of the vectors so that the matrix product is equivalent to a dot product.
double numeratorLambdaRo = cv::sum(rowProjectionT * rowProjectionS.t())[0];
cv::Mat rowProjTSquare = rowProjectionT * rowProjectionT.t();
cv::Mat rowProjSSquare = rowProjectionS * rowProjectionS.t();
// Both squares are 1x1 matrices holding the sums of squares.
double denominatorLambdaRo = sqrt(cv::sum(rowProjTSquare * rowProjSSquare)[0]);
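Equivalently, since the projections are vectors, cv::Mat::dot returns the sum of products directly as a double. A minimal sketch continuing the snippet above, under the same assumption that both projections are 1xN row vectors of the same type:

// After the mean subtraction above:
double numeratorLambdaRo   = rowProjectionT.dot(rowProjectionS);
double denominatorLambdaRo = sqrt(rowProjectionT.dot(rowProjectionT)
                                * rowProjectionS.dot(rowProjectionS));
double lambdaRo = numeratorLambdaRo / denominatorLambdaRo;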
This is a formula for LoG filtering:
LoG(x, y) = -(1/(πσ⁴)) · (1 − (x² + y²)/(2σ²)) · e^(−(x² + y²)/(2σ²))
(source: ed.ac.uk)
Also, in applications with LoG filtering, I see that the function is called with only one parameter: sigma (σ).
I want to try LoG filtering using that formula (a previous attempt was a Gaussian filter followed by a Laplacian filter with some filter-window size).
But looking at that formula I can't understand how the size of the filter is connected with it. Does this mean that the filter size is fixed?
Can you explain how to use it?
As you've probably figured out by now from the other answers and links, the LoG filter detects edges and lines in the image. What is still missing is an explanation of what σ is.
σ is the scale of the filter. Is a one-pixel-wide line a line or noise? Is a line 6 pixels wide a line or an object with two distinct parallel edges? Is a gradient that changes from black to white across 6 or 8 pixels an edge or just a gradient? It's something you have to decide, and the value of σ reflects your decision: the larger σ is, the wider the lines, the smoother the edges, and the more noise is ignored.
Do not get confused between the scale of the filter (σ) and the size of the discrete approximation (usually called the stencil). In Paul's link σ = 1.4 and the stencil size is 9 (which matches the usual rule of thumb of 4σ to 6σ: 6 × 1.4 ≈ 8.4, rounded up to an odd size). While it is usually reasonable to use a stencil size of 4σ to 6σ, these two quantities are quite independent. A larger stencil provides a better approximation of the filter, but in most cases you don't need a very good approximation.
This was something that confused me too, and it wasn't until I had to do the same as you for a uni project that I understood what you were supposed to do with the formula!
You can use this formula to generate a discrete LoG filter. If you write a bit of code to implement it, you can then generate a filter for use in image convolution. To generate, say, a 5x5 template, simply call the code with x and y ranging from -2 to +2.
This will generate the values to use in a LoG template. If you graph the values this produces you should see the "mexican hat" shape typical of this filter, like so:
(image: surface plot of the Mexican hat shape; source: ed.ac.uk)
You can fine-tune the template by changing how wide it is (the size) and the sigma value (how broad the peak is). The wider and broader the template, the less the result will be affected by noise, because it will operate over a wider area.
Once you have the filter, you can apply it to the image by convolving the template with the image. If you've not done this before, check out these few tutorials.
There are Java applet tutorials as well as some that are more mathsy.
Essentially, at each pixel location, you "place" your convolution template, centred at that pixel. You then multiply the surrounding pixel values by the corresponding "pixel" in the template and add up the result. This is then the new pixel value at that location (typically you also have to normalise (scale) the output to bring it back into the correct value range).
The code below gives a rough idea of how you might implement this. Please forgive any mistakes / typos etc. as it hasn't been tested.
I hope this helps.
private float LoG(float x, float y, float sigma)
{
    // LoG(x, y) = -(1/(pi*sigma^4)) * (1 - (x^2 + y^2)/(2*sigma^2)) * exp(-(x^2 + y^2)/(2*sigma^2))
    float r2 = x * x + y * y;
    float s2 = sigma * sigma;
    return (float) ((-1.0 / (Math.PI * s2 * s2))
            * (1.0 - r2 / (2.0 * s2))
            * Math.exp(-r2 / (2.0 * s2)));
}
private void GenerateTemplate(int templateSize, float sigma)
{
    // Make sure it's an odd number for convenience
    if (templateSize % 2 == 1)
    {
        // Create the data array
        float[][] template = new float[templateSize][templateSize];

        // Work out the min and max values. LoG is centred around (0, 0),
        // so for a size 5 template (say) we want to get the values from
        // -2 to +2, ie: -2, -1, 0, +1, +2, and feed those into the formula.
        int min = -(templateSize / 2);
        int max = templateSize / 2;

        // We also need counters to index into the data array...
        int xCount = 0;
        for (int x = min; x <= max; ++x)
        {
            int yCount = 0; // reset for each column
            for (int y = min; y <= max; ++y)
            {
                // Get the LoG value for this (x, y) pair
                template[xCount][yCount] = LoG(x, y, sigma);
                ++yCount;
            }
            ++xCount;
        }
    }
}
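To make the "place, multiply, add" step above concrete, here is a minimal companion sketch (in C++ rather than Java; assumptions: the image is a 2D float array, the template side is odd, border pixels are simply skipped, and no kernel flip is needed because the LoG template is symmetric):

#include <vector>

// Apply a square template to a grayscale image.
// Border pixels where the template does not fit are left at zero.
std::vector<std::vector<float>> apply_template(
        const std::vector<std::vector<float>>& image,
        const std::vector<std::vector<float>>& tmpl) {
    int h = (int) image.size(), w = (int) image[0].size();
    int r = (int) tmpl.size() / 2; // template "radius"
    std::vector<std::vector<float>> out(h, std::vector<float>(w, 0.0f));
    for (int y = r; y < h - r; ++y)
        for (int x = r; x < w - r; ++x) {
            float sum = 0.0f;
            // Centre the template at (x, y), multiply and accumulate.
            for (int j = -r; j <= r; ++j)
                for (int i = -r; i <= r; ++i)
                    sum += image[y + j][x + i] * tmpl[j + r][i + r];
            out[y][x] = sum; // typically rescaled afterwards to the output range
        }
    return out;
}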
Just for visualization purposes, here is a simple Matlab 3D colored plot of the Laplacian of Gaussian (Mexican Hat) wavelet. You can change the sigma (σ) parameter and see its effect on the shape of the graph:
sigmaSq = 0.5 % Square of σ parameter
[x y] = meshgrid(linspace(-3,3), linspace(-3,3));
z = (-1/(pi*(sigmaSq^2))) .* (1-((x.^2+y.^2)/(2*sigmaSq))) .*exp(-(x.^2+y.^2)/(2*sigmaSq));
surf(x,y,z)
You could also compare the effects of the sigma parameter on the Mexican hat by doing the following:
t = -5:0.01:5;
sigma = 0.5;
mexhat05 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
sigma = 1;
mexhat1 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
sigma = 2;
mexhat2 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
plot(t, mexhat05, 'r', ...
t, mexhat1, 'b', ...
t, mexhat2, 'g');
Or simply use the Wavelet toolbox provided by Matlab as follows:
lb = -5; ub = 5; n = 1000;
[psi,x] = mexihat(lb,ub,n);
plot(x,psi), title('Mexican hat wavelet')
I found this useful when implementing this for edge detection in computer vision. Although not the exact answer, hope this helps.
It appears to be a continuous circular filter whose radius is sqrt(2) * sigma. If you want to implement this for image processing you'll need to approximate it.
There's an example for sigma = 1.4 here: http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm