How could I reverse the effect of kernel fisher? - machine-learning

I have used Kernel fisher's discriminant analysis in my project and it worked just great. but my problem arises from the fact that when I mapped my data set using kernel functions, all data and also all eigenvalues and eigenvectors are in that space and for testing new samples I face some problems. let me explain it with an example. when I have for example 50 samples with 10 features for describing each sample, my data matrix is 50 by 10 and mapping this function will result in a 50 by 50 matrix in the new feature space. so the eigenvectors (W in FDA) are also in 50D space. now for testing a new sample that is a vector with 10 elements as its features, the mapped data matrix will be 10 by 10 and it is not in the 50D space, so I can't project it into W to obtain which class does it belong to... pleas help me, what could I do?

You are not supposed to map testing points against themselves but against training set. This is why kernel methods (especially not sparse) in general do not scale well - you have to keep the old training set all the time. Thus you will obtain your projection through K(TEST_SAMPLES, TRAINING_SET) which is 10x50 and can be used in your 50 dimensional space.

Related

How do I decide or count number of hidden/tunable parameters in my design?

For my deep learning assignment I need to design a image classification network. There this constraint in the assignment I can have 500,000 number of hidden/tunable parameters at most in this design.
How can I count or observe the number of these hidden parameters especially if I am using this tensor flow tutorial as initial code/design.
Thanks in advance
How can I count or observe the number of these hidden parameters especially if I am using this tensor flow tutorial as initial code/design.
Instead of me doing the work for you I'll show you how to count free parameters
Glancing quickly it looks like the code at cifar10 uses layers of max pooling, convolution, bias, fully connected weights. Let's review how many free parameters each of these layers adds to your architecture.
max pooling : FREE! That's right, there are no "free parameters" from max pooling.
conv : Convolutions are defined using parameters like [1,3,3,1] where the numbers correspond to your tensor like so [batch_size, CONV_SIZE, CONV_SIZE, FEATURE_DEPTH]. Multiply all the dimension sizes together to find the total size of your free parameters. In the case of [1,3,3,1], the total is 1x3x3x1 = 9.
bias : A Bias is similar to convolutions in that it is defined by a shape like [10] or [1,342,342,3]. Same thing, just multiply all dimension sizes together to get the total free parameters. Sometimes a bias is just a single number, which means a size of 1.
fully connected : A fully connected layer usually has a 2d shape like [1024,32]. This means that it is a 2d matrix, and you calculate the total free parameters just like the convolution. In this example [1024,32] has 1024x32 = 32,768 free parameters.
Finally you add up all the free parameters from all the layers and that is your total number of free parameters.
500 000 parmeters? You use an R, G and B value of each pixel? If yes there is some problems
1. too much data (long calculating time)
2. in image clasification companys always use some other image analysis technique(preprocesing) befor throwing data into NN. if you have to identical images. Second is moved by one piksel. For the network they can be very diffrend.
Imagine other neural network. Use two parameters maybe weight and height. If you swap this parametrs what will happend.
Yes during learning of your image network can decrease this effect but when I made experiments with 5x5 binary images that was very hard to network. I start using 4 layers but this help only a little.
The image used to lerning can be good clasified, after destoring also but mooving for one pixel and you have a problem.
If no make eksperiments or use genetic algoritm to find it.
After laerning you should use some algoritm to find dates with network recognize as "no important"(big differnce beetwen weight of this input and the rest, If this input weight are too close to 0 network "think" it is no important)

SVM machine learning - How to define the target in the training set?

I am working on a project where I have to implement SVM machine learning algorithm. I am trying to predict the forearm movement intention. I am using accelometer (attached to my forearm) for measuring the angle change for x,y,z axes. I have never used machine before. The problem I am having is I do not exactly know how to structure the training set. I know the angle changes for each of the axis and I know i.e if x=45 degrees, y = 65 degrees, z=30 degrees gesture performed i performed is flexion. I would like to implement 3 gestures.So the data I am having is :
x y z Target
20 60 90 flexion
100 63 23 internal rotation
89 23 74 twist
.
.
.
.
I have a file with around 2000 entries. I know, I have to normalize the training set so the data are scaled. I would like to scale it so they are in range [0.9, 0.1]. The problem is that I do not know how to represent the target in my training set. Can I just use random numbers as 1 for flexion, 2 for internal rotation, 3 for twist??
Also once the training is completed, can I do the predictions based on values for x,y,z only?? without having to supply the target value. Is my understanding correct??
First of all, I suggest that you not scale or code your data. Leave it in human-readable form. Rather, write front-end routines to perform these tasks, and back-end routines to reverse the process. Also have internal routines that can display the data in the internal forms. Doing these up front will greatly enhance your debugging later on.
Yes, you will likely want to code your classifications as 1, 2, 3. Another possibility is to have a "one-hot" ordered triple: (1,0,0) or (0,1,0) or (0,0,1). However, most SVM algorithms are set up for scalar output. Also, note that the typical treatment for a multi-class algorithm is to run three separate SVM calculations, "one against all". For each class, you take that class as "plus" data and all the others as "minus" data.
Scaling data is important for regression convergence. If you're building your SVM via complete and direct computation of the support vectors, you don't need to scale numbers that are in compatible ranges, such as these. If you're doing it by some sort of iterative approximation, you still won't need it for this data -- but keep it in mind for the future.
Yes, prediction gives only the inputs: x, y, z. It will return the target classification. That's the purpose of supervised learning: summarize experience to classify the future.

how to handle large number of features machine learning

I developed a image processing program that identifies what a number is given an image of numbers. Each image was 27x27 pixels = 729 pixels. I take each R, G and B value which means I have 2187 variables from each image (+1 for the intercept = total of 2188).
I used the below gradient descent formula:
Repeat {
θj = θj−α/m∑(hθ(x)−y)xj
}
Where θj is the coefficient on variable j; α is the learning rate; hθ(x) is the hypothesis; y is real value and xj is the value of variable j. m is the number of training sets. hθ(x), y are for each training set (i.e. that's what the summation sign is for). Further the hypothesis is defined as:
hθ(x) = 1/(1+ e^-z)
z= θo + θ1X1+θ2X2 +θ3X3...θnXn
With this, and 3000 training images, I was able to train my program in just over an hour and when tested on a cross validation set, it was able to identify the correct image ~ 67% of the time.
I wanted to improve that so I decided to attempt a polynomial of degree 2.
However the number of variables jumps from 2188 to 2,394,766 per image! It takes me an hour just to do 1 step of gradient descent.
So my question is, how is this vast number of variables handled in machine learning? On the one hand, I don't have enough space to even hold that many variables for each training set. On the other hand, I am currently storing 2188 variables per training sample, but I have to perform O(n^2) just to get the values of each variable multiplied by another variable (i.e. the polynomial to degree 2 values).
So any suggestions / advice is greatly appreciated.
try to use some dimensionality reduction first (PCA, kernel PCA, or LDA if you are classifying the images)
vectorize your gradient descent - with most math libraries or in matlab etc. it will run much faster
parallelize the algorithm and then run in on multiple CPUs (but maybe your library for multiplying vectors already supports parallel computations)
Along with Jirka-x1's answer, I would first say that this is one of the key differences in working with image data than say text data for ML: high dimensionality.
Second... this is a duplicate, see How to approach machine learning problems with high dimensional input space?

OpenCV 2.4.3 PCA class - when number of samples is less than number of dimensions

I'm trying to use the PCA class in OpenCv to perform the principal component analysis operation in my C++ application . I'm new to OpenCV and I'm having a problem So I wish if someone could help.
I'm trying a demo Example on both Matlab and the PCA class to check the answers
when I'm using 2*10 data array, and the parameter (CV_PCA_DATA_AS_COL), here I'm having two dimensions so I'm expecting to have 2 Eigenvectors each has 2 elements, and this worked fine as expected with the same results as Matlab.
But while using 10*2 data array (generally when number of samples is less than number of dimension), I get (2*10) array of eiegnvectors. I.e: 10 eigenvectors with 2 elements each. This is not expected and it's not the result given by Matlab (Matlab give 10*10 matrix of eigenvectors).
I don't know why I'm having those results and due this I can't project the Data on principal components in my application, any help?
P.S : The code I used :
Mat Mean ;
Mat H(10, 2, CV_32F); // then the matrix is filled by data
PCA pca(H,Mean,CV_PCA_DATA_AS_COL,0) ;
pca.operator()(H,Mean,CV_PCA_DATA_AS_COL,0) ;
cout<<pca.eigenvectors.rows // gives 2 instead of 10
cout<<pca.eigenvectors.cols // gives 10
I'd state it as follows:
If the number of samples is less than the data dimension then the number of retained components will be clamped at the number of samples.
We did 3x3 PCA for mechanics subject at uni, also some non-linear control algorithms used similar approaches - my memory is foggy, but it may have something to do with assumptions regarding psuedo-inverses and non-square matrices...
Once you delve into the theory - websearch 'pca with less samples than dimensions' - it gets messy fast!

OpenCV + HOG +SVM: help needed with SVM single feature vector

I try to implement a people detecting system based on SVM and HOG using OpenCV2.3. But I got stucked.
I came this far:
I can compute HOG values from an image database and then I calculate with LIBSVM the SVM vectors, so I get e.g. 1419 SVM vectors with 3780 values each.
OpenCV just wants one feature vector in the method hog.setSVMDetector(). Therefore I have to calculate one feature vector from my 1419 SVM vectors, that LIBSVM has calculated.
I found one hint, how to calculate this single feature vector: link
“The detecting feature vector at component i (where i is in the range e.g. 0-3779) is built out of the sum of the support vectors at i * the alpha value of that support vector, e.g.
det[i] = sum_j (sv_j[i] * alpha[j]) , where j is the number of the support vector, i
is the number of the components of the support vector.”
According to this, my routine works this way:
I take the first element of my first SVM vector, multiply it with the alpha value and add it with the first element of the second SVM vector that has been multiplied with alpha value, …
But after summing up all 1419 elements I get quite high values:
16.0657, -0.351117, 2.73681, 17.5677, -8.10134,
11.0206, -13.4837, -2.84614, 16.796, 15.0564,
8.19778, -0.7101, 5.25691, -9.53694, 23.9357,
If you compare them, to the default vector in the OpenCV sample peopledetect.cpp (and hog.cpp in the OpenCV source)
0.05359386f, -0.14721455f, -0.05532170f, 0.05077307f,
0.11547081f, -0.04268804f, 0.04635834f, -0.05468199f, 0.08232084f,
0.10424068f, -0.02294518f, 0.01108519f, 0.01378693f, 0.11193510f,
0.01268418f, 0.08528346f, -0.06309239f, 0.13054633f, 0.08100729f,
-0.05209739f, -0.04315529f, 0.09341384f, 0.11035026f, -0.07596218f,
-0.05517511f, -0.04465296f, 0.02947334f, 0.04555536f,
you see, that the default vector values are in the boundaries between –1 and +1, but my values exceed them far.
I think, my single feature vector routine needs some adjustment, any ideas?
Regards,
Christoph
The aggregated vector's values do look high.
I used the loadSVMfromModelFile() located in http://lnx.mangaitalia.net/trainer/main.cpp
I had to remove svinstr.sync(); from the code since it caused losing parts of the lines and getting wrong results.
I don't know much about the rest of the file, I only used this function.

Resources