I'm working with Principal Component Analysis (PCA) in OpenCV. The constructor inputs for the case I'm interested in are:
PCA(InputArray data, InputArray mean, int flags, double retainedVariance);
Regarding the InputArray 'data', the documentation states the appropriate flags are:
CV_PCA_DATA_AS_ROW indicates that the input samples are stored as matrix rows.
CV_PCA_DATA_AS_COL indicates that the input samples are stored as matrix columns.
My question pertains to the use of the term 'samples': I'm not sure what a sample is in this context.
For example, let's say I have 4 sets of data, and for the sake of illustration let's label them A-D. Each set A through D has 8 elements. They are then set up in the Mat variable I'll use as the InputArray with one set per column, i.e. an 8x4 matrix laid out roughly like this:

A1 B1 C1 D1
A2 B2 C2 D2
...
A8 B8 C8 D8
The question is, which is it:
My sets are samples?
My data elements are samples?
Another way of asking:
Do I have 4 samples (CV_PCA_DATA_AS_COL)?
Or do I have 4 sets of 8 samples (CV_PCA_DATA_AS_ROW)?
As a guess, I'd choose CV_PCA_DATA_AS_COL (i.e. I have 4 samples) - but that's just where my head is at... Until I learn the correct terminology, it seems the word 'sample' could apply to either reading.
Ugh...
So the answer was found by reversing the logic behind the documentation for the PCA::project step...
Mat PCA::project(InputArray vec)
vec – input vector(s); must have the same dimensionality and the same layout as the input data used at the PCA phase, that is, if CV_PCA_DATA_AS_ROW was specified, then vec.cols==data.cols (vector dimensionality)
i.e. 'sample' is equivalent to 'set', and the elements are the 'dimensions'.
(and my guess was correct :)
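As a sanity check, here's a minimal sketch (random placeholder data) of the 8x4, samples-as-columns case; cv::PCA::DATA_AS_COL is the C++ enum name corresponding to CV_PCA_DATA_AS_COL:

#include <opencv2/core/core.hpp>
#include <iostream>

int main()
{
    // 4 samples (the sets A-D) stored as columns, each with 8 elements.
    cv::Mat data(8, 4, CV_32F);
    cv::randu(data, 0.0f, 1.0f); // placeholder values

    // Samples are columns, so pass DATA_AS_COL.
    cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_COL, 0);

    // Each eigenvector has the dimensionality of one sample (8), and at
    // most as many components as there are samples (4) are returned.
    std::cout << pca.eigenvectors.rows << " x "
              << pca.eigenvectors.cols << std::endl; // 4 x 8
}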
I'm trying to implement DeepMind's DNC (the Nature paper) with PyTorch 0.4.0.
When implementing the variant of LSTM they used, I ran into some trouble with dimensions.
To simplify, suppose BATCH=1.
The equations they list in the paper are these (written single-layer here, with σ the logistic sigmoid):

i_t = σ(W_i [x_t; h_{t-1}] + b_i)
f_t = σ(W_f [x_t; h_{t-1}] + b_f)
s_t = f_t s_{t-1} + i_t tanh(W_s [x_t; h_{t-1}] + b_s)
o_t = σ(W_o [x_t; h_{t-1}] + b_o)
h_t = o_t tanh(s_t)

where [x;h] means a concatenation of x and h into one single vector, and i, f and o are column vectors.
My question is about how the state s_t is computed.
The second addend is obtained by multiplying i with a column vector, so the result is either a scalar (transpose i first, then take the scalar product) or ill-defined (two column vectors multiplied).
So the state results in a single scalar...
By the same reasoning the hidden state h_t is a scalar too, but it has to be a column vector.
Obviously I'm wrong somewhere, but I can't figure out where.
By looking at the Wikipedia LSTM article I think I figured it out.
This is the formal definition of the standard LSTM found in the article:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t ∘ c_{t-1} + i_t ∘ tanh(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t ∘ tanh(c_t)

The circle (∘) represents the element-by-element product.
By using this element-wise product in the corresponding parts of the DNC equations (those for s_t and h_t), the dimensions work out.
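As a small illustration (plain C++ rather than PyTorch, with invented names), the element-wise product is exactly what keeps the gated state a length-n vector instead of collapsing it to a scalar:

#include <cassert>
#include <vector>

using Vec = std::vector<float>;

// Element-wise (Hadamard) product: inputs and result all have length n.
Vec hadamard(const Vec& a, const Vec& b)
{
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t k = 0; k < a.size(); ++k)
        out[k] = a[k] * b[k];
    return out;
}

// Cell-state update s_t = f ∘ s_{t-1} + i ∘ g, where every operand is a
// length-n column vector; the result stays a length-n vector.
Vec cell_update(const Vec& f, const Vec& s_prev, const Vec& i, const Vec& g)
{
    Vec s = hadamard(f, s_prev);
    Vec ig = hadamard(i, g);
    for (std::size_t k = 0; k < s.size(); ++k)
        s[k] += ig[k];
    return s;
}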
I am currently developing my own kernel to use for classification and want to include it in libsvm, replacing the standard kernels that libsvm offers.
However, I am not 100% sure how to do this, and obviously do not want to make any mistakes. Be warned that my C++ is not very good. I found the following on the libsvm FAQ page:
Q: I would like to use my own kernel. Any example? In svm.cpp, there
are two subroutines for kernel evaluations: k_function() and
kernel_function(). Which one should I modify? An example is "LIBSVM
for string data" in LIBSVM Tools.
The reason why we have two functions is as follows. For the RBF kernel
exp(-g |xi - xj|^2), if we calculate xi - xj first and then the norm
square, there are 3n operations. Thus we consider exp(-g (|xi|^2 -
2dot(xi,xj) +|xj|^2)) and by calculating all |xi|^2 in the beginning,
the number of operations is reduced to 2n. This is for the training.
For prediction we cannot do this so a regular subroutine using that 3n
operations is needed. The easiest way to have your own kernel is to
put the same code in these two subroutines by replacing any kernel.
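To make the trick the FAQ describes concrete, here's a small illustrative sketch (not actual libsvm code; the names are invented) of evaluating the RBF kernel from squared norms cached up front:

#include <cmath>
#include <vector>

// exp(-g |xi - xj|^2) rewritten as exp(-g (|xi|^2 - 2 dot(xi,xj) + |xj|^2)).
// sq_i and sq_j are |xi|^2 and |xj|^2, computed once for the whole data set,
// so each kernel evaluation costs only the dot product.
double rbf(const std::vector<double>& xi, const std::vector<double>& xj,
           double sq_i, double sq_j, double gamma)
{
    double dot = 0.0;
    for (std::size_t k = 0; k < xi.size(); ++k)
        dot += xi[k] * xj[k];
    return std::exp(-gamma * (sq_i - 2.0 * dot + sq_j));
}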
Hence, I was trying to find the two subroutines k_function() and kernel_function(). The former I found with the following signature in svm.cpp:
double Kernel::k_function(const svm_node *x, const svm_node *y,
const svm_parameter& param)
Am I correct that x and y each store one observation (= one row) of my feature matrix in an array, and that I need to return the kernel value k(x,y)?
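Assuming that reading is right, I imagine a replacement body would look roughly like this sketch (the squared-polynomial kernel is only a placeholder; svm_node arrays in libsvm are sparse and terminated by index == -1):

// Sketch only: a custom kernel dropped into Kernel::k_function in svm.cpp.
double Kernel::k_function(const svm_node *x, const svm_node *y,
                          const svm_parameter& param)
{
    (void)param; // unused by this placeholder kernel
    double dot = 0.0;
    // Walk both sparse vectors in step; index == -1 marks the end.
    while (x->index != -1 && y->index != -1)
    {
        if (x->index == y->index)
        {
            dot += x->value * y->value;
            ++x;
            ++y;
        }
        else if (x->index < y->index)
            ++x;
        else
            ++y;
    }
    return (dot + 1.0) * (dot + 1.0); // placeholder: (x.y + 1)^2
}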
The function kernel_function(), on the other hand, I was not able to find at all. There is a member pointer in the Kernel class with that name, with the following declaration:
double (Kernel::*kernel_function)(int i, int j) const;
which is set in the Kernel constructor. What are i and j in that case? I suppose I need to set this pointer as well?
Once I have replaced Kernel::k_function and the function assigned to kernel_function, am I done, and will libsvm use my kernel to compare two observations?
Thank you!
You don't have to dig into the code of LIBSVM to use your own kernel; you can use the pre-computed kernel option (i.e., -t 4 training_set_file).
Thus, you can compute the kernel matrix externally as it suits you, store the values in a file, and load the pre-computed kernel into LIBSVM. There's an explanation, accompanied by an example of how to do this, in the README file of the LIBSVM tarball (see the Precomputed Kernels section, line 236).
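As a rough sketch of that workflow (toy data and a linear placeholder kernel; all names invented), you can compute the kernel matrix yourself and write it in the precomputed-kernel format the README describes, where each line is <label> 0:<serial number> 1:K(xi,x1) ... n:K(xi,xn):

#include <cstdio>
#include <vector>

// Placeholder kernel; substitute your own.
double my_kernel(const std::vector<double>& a, const std::vector<double>& b)
{
    double dot = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k)
        dot += a[k] * b[k];
    return dot;
}

int main()
{
    // Toy training set: 3 samples with 2 features each, plus labels.
    std::vector<std::vector<double>> x = {{1, 2}, {3, 4}, {5, 6}};
    std::vector<int> y = {+1, -1, +1};

    std::FILE* f = std::fopen("train.kernel", "w");
    if (!f) return 1;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        std::fprintf(f, "%d 0:%zu", y[i], i + 1); // label and serial number
        for (std::size_t j = 0; j < x.size(); ++j)
            std::fprintf(f, " %zu:%g", j + 1, my_kernel(x[i], x[j]));
        std::fprintf(f, "\n");
    }
    std::fclose(f);
    return 0;
}

Training then becomes: svm-train -t 4 train.kernel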
I'm trying to use the PCA class in OpenCV to perform principal component analysis in my C++ application. I'm new to OpenCV and I'm having a problem, so I'd be glad if someone could help.
I'm running a demo example in both Matlab and the PCA class to check the answers.
When I use a 2*10 data array with the parameter CV_PCA_DATA_AS_COL, I have two dimensions, so I expect to get 2 eigenvectors, each with 2 elements. This worked fine as expected, with the same results as Matlab.
But when using a 10*2 data array (generally, whenever the number of samples is less than the number of dimensions), I get a 2*10 array of eigenvectors, i.e. 2 eigenvectors with 10 elements each. This is not what I expected, and it's not the result Matlab gives (Matlab gives a 10*10 matrix of eigenvectors).
I don't know why I'm getting these results, and because of this I can't project the data onto the principal components in my application. Any help?
P.S.: The code I used:

Mat Mean;
Mat H(10, 2, CV_32F); // the matrix is then filled with data
PCA pca(H, Mean, CV_PCA_DATA_AS_COL, 0);
pca.operator()(H, Mean, CV_PCA_DATA_AS_COL, 0);
cout << pca.eigenvectors.rows; // gives 2 instead of 10
cout << pca.eigenvectors.cols; // gives 10
I'd state it as follows:
If the number of samples is less than the data dimension, then the number of retained components will be clamped to the number of samples.
We did 3x3 PCA for a mechanics subject at uni, and some non-linear control algorithms used similar approaches. My memory is foggy, but it may have something to do with assumptions regarding pseudo-inverses and non-square matrices...
Once you delve into the theory (web-search 'PCA with fewer samples than dimensions') it gets messy fast!
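A quick way to see the clamping (a minimal sketch with random placeholder data):

#include <opencv2/core/core.hpp>
#include <iostream>

int main()
{
    // A 10x2 matrix read column-wise: 2 samples, each 10-dimensional.
    cv::Mat H(10, 2, CV_32F);
    cv::randu(H, 0.0f, 1.0f); // placeholder data

    cv::PCA pca(H, cv::Mat(), cv::PCA::DATA_AS_COL, 0);

    // With only 2 samples, at most 2 components are retained:
    std::cout << pca.eigenvectors.rows << " x "
              << pca.eigenvectors.cols << std::endl; // prints 2 x 10
}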
When going through any of the machine learning functions explained here, I see they all follow the format of CvStatModel.
For example the train function of NormalBayes is achieved by:
CvNormalBayesClassifier::train(const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), bool update=false )
The documentation tells you to check out CvStatModel for details on the parameters.
What I don't understand is what responses is supposed to take. I know that trainData is the data we used for training the system using bag of words, but what do I place in responses?
In an example on bag of words, the responses element was handled as follows:
float label=atof(entryPath.filename().c_str());
labels.push_back(label);
NormalBayesClassifier classifier;
classifier.train(trainingData, labels);
So here the filenames of the images were converted to numbers and used as the responses element.
I don't understand this and am confused by it. Can someone please explain what the responses element is supposed to take, and why atof is used in the above example?
Those models are supervised machine learning techniques; training them requires not only the training data (i.e. the vectors of measurements) but also the labels (or continuous values) associated with each sample. For example, if you are trying to detect images containing cats, you have a training set of, say, 500 images not containing cats and 500 containing cats. You compute your descriptors for all 1000 images and assign a number to each category (by convention, -1 for "non-cats" and 1 for "cats"). Then responses will be a 1000x1 matrix of integers, the first 500 values being -1 and the remaining being 1.
In your example, atof is used to convert a directory name to a unique number representing the category, because the training examples are probably sorted into folders (cats, dogs, bicycles, etc.).
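For instance, a minimal sketch (made-up random descriptors and labels) of training and using CvNormalBayesClassifier:

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>
#include <iostream>

int main()
{
    // 6 training samples with 3-dimensional descriptors (random placeholders).
    cv::Mat trainData(6, 3, CV_32F);
    cv::randu(trainData, 0.0f, 1.0f);

    // One label per training row: -1 = "non-cat", 1 = "cat".
    cv::Mat responses = (cv::Mat_<float>(6, 1) << -1, -1, -1, 1, 1, 1);

    CvNormalBayesClassifier classifier;
    classifier.train(trainData, responses);

    // Classify a new sample; the return value is the predicted label.
    cv::Mat sample(1, 3, CV_32F);
    cv::randu(sample, 0.0f, 1.0f);
    std::cout << "predicted label: " << classifier.predict(sample) << std::endl;
}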
I just want to clarify something about PCA in OpenCV. Suppose, I have two rows of data (A, B).
A 3 8 7
B 2 4 5
If I wanted to create a PCA model in OpenCV, what must I do to the data? Do I have to subtract the means (e.g. subtract the mean of A from its data points), or does the PCA function do this?
Someone said that OpenCV PCA expects the data to be normalised (between 0 and 1). If so, how do I normalise it?
Hope someone can clarify this for me, as PCA in OpenCV is very badly documented on the net.
Cheers...
The data for PCA in OpenCV does not need to be normalized. But if you already have the mean (from some previous calculation), you can pass it to the PCACompute() function to speed it up.
OpenCV refman:
PCACompute(data[, mean[, eigenvectors[, maxComponents]]]) → mean, eigenvectors
Parameters
data – Input samples stored as the matrix rows or as the matrix columns.
mean – Optional mean value. If the matrix is empty (noArray()), the mean is computed from the data.
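For illustration, here's a small sketch using the C++ overload of PCACompute with the A/B rows from the question; the precomputed mean shown is just the column-wise average (2.5, 6, 6):

#include <opencv2/core/core.hpp>

int main()
{
    // The two 3-dimensional samples A and B, stored as matrix rows.
    cv::Mat data = (cv::Mat_<float>(2, 3) << 3, 8, 7,
                                             2, 4, 5);

    // Empty mean: PCACompute subtracts the per-dimension mean itself.
    cv::Mat mean, eigenvectors;
    cv::PCACompute(data, mean, eigenvectors);

    // If the mean is already known, pass it in so it isn't recomputed:
    cv::Mat knownMean = (cv::Mat_<float>(1, 3) << 2.5f, 6.0f, 6.0f);
    cv::PCACompute(data, knownMean, eigenvectors);
    return 0;
}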
There is a good article on data normalization on Wikipedia.
For complete documentation, check out the opencv.pdf file that should be in the doc/ folder of your installation. In some versions it is named opencv2refman.pdf.
Also try to find the book "Learning OpenCV" by Gary Bradski; it's more than well explained.