How does LAPACK's ?SPEVX calculate eigenvectors? - gfortran

I'm trying to use ?SPEVX to calculate eigenvalues and eigenvectors. Does anyone have an explanation of how ?SPEVX calculates its eigenvectors? Whenever I try to compare them to analytical results, they don't match. My understanding is that it ALWAYS returns real-valued eigenvectors; the question is HOW?
My understanding is that a linear combination of two eigenvectors corresponding to the same eigenvalue is again a valid eigenvector, but how does this work? Which linear combination does it pick?
Please help.
Thank you
I compared the eigenvectors to analytical eigenfunctions such as exp(ix); the eigenvectors didn't match.
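A minimal sketch of the degeneracy issue, using a periodic second-difference matrix as an assumed test case (numpy.linalg.eigh calls a related LAPACK symmetric driver, not ?SPEVX itself): a real symmetric eigensolver is free to return any real orthonormal basis of a degenerate eigenspace, so exp(ix)-type columns won't appear one-for-one, but they do lie in the span of the returned vectors.
```python
import numpy as np

n = 16
# Symmetric periodic second-difference (circulant) matrix: its analytical
# eigenvectors are exp(2*pi*i*k*j/n), and the eigenvalues for k and n-k
# coincide, so most eigenspaces are two-dimensional.
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
A[0, -1] = A[-1, 0] = 1.0

# Like ?SPEVX, eigh returns real eigenvalues and real orthonormal eigenvectors.
w, V = np.linalg.eigh(A)

k = 3
lam = 2.0 * (np.cos(2.0 * np.pi * k / n) - 1.0)    # analytical eigenvalue
z = np.exp(2j * np.pi * k * np.arange(n) / n)      # analytical complex eigenvector
z /= np.linalg.norm(z)

# Real orthonormal basis of the eigenspace belonging to lam (two columns).
B = V[:, np.isclose(w, lam)]

# z is a linear combination of those real columns: projecting z onto their
# span reproduces z, even though no single returned column equals exp(ix).
coeffs = B.T @ z                   # B is real, so B.T is its conjugate transpose
residual = np.linalg.norm(z - B @ coeffs)
print("eigenspace dimension:", B.shape[1])   # 2
print("projection residual:", residual)      # ~1e-15
```
So the comparison to make is not column-by-column equality, but whether the analytical eigenvector is reproduced by its projection onto the computed eigenspace.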

Related

Should I convert ordered numerical data into a categorical feature for encoding, or normalize it?

I'm working on a dataset that has both numerical and categorical columns. One of the numerical columns is the fare rate ($), which has just 4 distinct values (200, 400, 600 and 800). I have done feature scaling on the other numerical features, but I'm stuck deciding whether I should apply normalization here or make it categorical and encode it. I want to use neural networks; if I treat it as a numerical feature, the weights for this feature will affect the output. If anyone has any leads, please help me in this regard.
Thanks
I'm trying to find a perfect solution.
Use it as a numerical feature; it should give you a better result.
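A minimal sketch of the two options, assuming a hypothetical `fare` column with those four values; whichever you choose, keep the transform consistent between training and inference.
```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical "fare" column with the four observed values.
fare = np.array([[200], [400], [600], [800], [400], [200]], dtype=float)

# Option 1: keep it numerical and scale it like the other numeric features,
# so its weights start on the same footing as the rest of the inputs.
fare_scaled = StandardScaler().fit_transform(fare)

# Option 2: treat it as categorical and one-hot encode the four levels
# (this drops the ordering but removes the 200-vs-800 magnitude gap).
fare_onehot = OneHotEncoder().fit_transform(fare).toarray()

print(fare_scaled.ravel())
print(fare_onehot)
```
With only four ordered levels, either choice is workable: the scaled numerical version keeps the ordering information, while the one-hot version adds three extra inputs and lets the network learn an arbitrary effect per level.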

How to set a suitable activation function for an ANN having negative input values

I am creating an ANN which has 3 input neurons that take inputs from the device's accelerometer in the form of x, y, z values. These values are positive as well as negative, depending on the acceleration. I am not able to find a suitable activation function to normalize these values. Also, I am not getting the desired predictions. Any help will be valuable. :-)
You should normalize your data first. I would advise using the standard score, which works as follows:
collect training sets for each of the input variables
calculate the mean (m) and standard deviation (std) for each of the sets
normalize as (x - m)/std
If you are working on a regression problem, don't forget to normalize target values as well.
You can also use other normalization techniques if you think they would work better for your case. Some of them you can see here.
The choice of activation function should not matter much in this case; you can just try different types and see which one gives better performance.
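A minimal numpy sketch of the standard score described above (the sample values are illustrative only):
```python
import numpy as np

# Accelerometer training samples: columns are x, y, z (values are illustrative).
X_train = np.array([[ 0.2, -9.8,  1.1],
                    [-0.5,  9.6, -2.3],
                    [ 1.3, -0.4,  0.7]])

m = X_train.mean(axis=0)           # per-feature mean
std = X_train.std(axis=0)          # per-feature standard deviation

X_norm = (X_train - m) / std       # standard score, computed column-wise

# New data must be normalized with the *training* m and std.
x_new = np.array([0.4, -3.0, 0.9])
x_new_norm = (x_new - m) / std
```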

Aggregating labels in GradientBoostingRegression

I am trying to understand Scikit-Learn's gradient boosting regression algorithm. I followed their source code and have a good understanding of the iterative construction of trees based on a chosen loss function. What I couldn't figure out is how they take the average of the labels from all underlying estimators when I invoke predict().
I followed that function call down to this line. Here, scale holds the learning_rate, which defaults to 0.1 if not provided. So, if I were to use 500 trees, I don't understand why they would be multiplying each of the 500 different labels, for a given sample, by 0.1.
If someone could direct me to a published paper that explains this in depth, it would be much appreciated.
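For what it's worth, a sketch (assuming the default squared-error loss) that checks the decomposition the question describes: predict() is not an average over the 500 trees, it is the initial prediction plus each tree's output scaled by learning_rate.
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
gbr = GradientBoostingRegressor(n_estimators=500, learning_rate=0.1,
                                random_state=0).fit(X, y)

# Start from the initial estimator (the mean of y for squared error) ...
manual = gbr.init_.predict(X).ravel().astype(float)
# ... then add every tree's prediction scaled by the learning rate.
for tree in gbr.estimators_[:, 0]:
    manual += gbr.learning_rate * tree.predict(X)

print(np.allclose(manual, gbr.predict(X)))   # True: a shrunken sum, not an average
```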

Distance measure for categorical attributes for k-Nearest Neighbor

For my class project, I am working on the Kaggle competition - Don't get kicked
The project is to classify the cars in the test data as a good or bad buy. There are 34 features and the data is highly skewed. I made the following choices:
The data is highly skewed: out of 73,000 instances, 64,000 are bad buys and only 9,000 are good buys. Since building a decision tree would overfit the data, I chose to use kNN (k-nearest neighbors).
After trying kNN, I plan to try out Perceptron and SVM techniques if kNN doesn't yield good results. Is my understanding of overfitting correct?
Since some features are numeric, I can directly use Euclidean distance as a measure, but there are other attributes which are categorical. To use these features properly, I need to come up with my own distance measure. I read about Hamming distance, but I am still unclear on how to merge the two distance measures so that each feature gets equal weight.
Is there a way to find a good approximate value of k? I understand that this depends a lot on the use case and varies per problem. But if I am taking a simple vote from each neighbor, what value should I set k to? I'm currently trying out various values, such as 2, 3, 10, etc.
I researched around and found these links, but these are not specifically helpful -
a) Metric for nearest neighbor, which says that finding your own distance measure is equivalent to 'kernelizing', but I couldn't make much sense of it.
b) Distance independent approximation of kNN talks about R-trees, M-trees etc. which I believe don't apply to my case.
c) Finding nearest neighbors using Jaccard coeff
Please let me know if you need more information.
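One way to give every feature equal weight, sketched below with made-up feature indices, is a Gower-style distance: min-max scale the numeric columns to [0, 1], score each categorical column as a 0/1 mismatch, and average over all features.
```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

num_idx = [0, 1]     # numeric columns, already min-max scaled to [0, 1]
cat_idx = [2, 3]     # categorical columns, integer-coded

def mixed_distance(a, b):
    d_num = np.abs(a[num_idx] - b[num_idx])            # |difference| of scaled numerics
    d_cat = (a[cat_idx] != b[cat_idx]).astype(float)   # Hamming-style 0/1 mismatch
    return np.concatenate([d_num, d_cat]).mean()       # equal weight per feature

# A callable metric forces the brute-force search, which is slow on 73,000
# rows; it is fine for prototyping, or precompute a distance matrix instead.
X = np.array([[0.10, 0.90, 2, 0],
              [0.20, 0.80, 2, 1],
              [0.90, 0.10, 0, 1],
              [0.85, 0.20, 1, 0]], dtype=float)
y = np.array([1, 1, 0, 0])

knn = KNeighborsClassifier(n_neighbors=3, metric=mixed_distance, algorithm="brute")
knn.fit(X, y)
print(knn.predict([[0.15, 0.85, 2, 0]]))   # expect class 1
```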
Since the data is unbalanced, you should either sample an equal number of good/bad (losing lots of "bad" records), or use an algorithm that can account for this. I think there's an SVM implementation in RapidMiner that does this.
You should use Cross-Validation to avoid overfitting. You might be using the term overfitting incorrectly here though.
You should normalize distances so that they have the same weight. By normalize I mean force to be between 0 and 1. To normalize something, subtract the minimum and divide by the range.
The way to find the optimal value of K is to try all possible values of K (while cross-validating) and choose the value of K with the highest accuracy. If a merely "good" value of K is fine, you can use a genetic algorithm or similar to find it. Or you could try K in steps of, say, 5 or 10, see which K leads to good accuracy (say it's 55), then try steps of 1 near that "good value" (i.e. 50, 51, 52, ...), although this may not be optimal.
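A sketch of both points with scikit-learn (the data below is a placeholder; swap in the real feature matrix and good/bad labels): min-max scale inside a pipeline so every feature contributes equally to the distance, then grid-search k under cross-validation.
```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Placeholder data; use the real feature matrix and good/bad labels here.
rng = np.random.default_rng(0)
X = rng.random((300, 6))
y = rng.integers(0, 2, 300)

pipe = Pipeline([("scale", MinMaxScaler()),      # (x - min) / range, per feature
                 ("knn", KNeighborsClassifier())])

search = GridSearchCV(pipe,
                      {"knn__n_neighbors": list(range(1, 52, 5))},
                      cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```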
I'm looking at the exact same problem.
Regarding the choice of k, it is recommended to use an odd value to avoid "tie votes".
I hope to expand this answer in the future.

Estimating parameters in multivariate classification

Newbie here typesetting my question, so excuse me if this doesn't work.
I am trying to build a Bayesian classifier for a multivariate classification problem where the input is assumed to have a multivariate normal distribution. I chose to use a discriminant function defined as log(likelihood * prior).
However, from the distribution,
$$f(x_1,\dots,x_N \mid \mu,\Sigma) = (2\pi)^{-Nd/2}\det(\Sigma)^{-N/2}\exp\!\Big[-\tfrac{1}{2}\sum_{n=1}^{N}(x_n-\mu)'\Sigma^{-1}(x_n-\mu)\Big]$$
I encounter a term -log(det($S_i$)), where $S_i$ is my sample covariance matrix for a specific class i. Since my input actually represents square image data, my $S_i$ shows quite a lot of correlation, and det($S_i$) ends up being zero. My discriminant functions then all turn to Inf, which is disastrous for me.
I know there must be a lot of things going wrong here; is anyone willing to help me out?
UPDATE: Anyone can help how to get the formula working?
I won't analyze the overall concept, as it is not very clear to me what you are trying to accomplish here and I don't know the dataset, but regarding the problem with the covariance matrix:
The most obvious solution when you need a covariance matrix and its determinant but cannot compute them reliably for numerical reasons is to use some kind of dimensionality reduction technique, in order to capture the most informative dimensions and simply discard the rest. One such method is Principal Component Analysis (PCA), which, applied to your data and truncated after, for example, 5-20 dimensions, would yield a reduced covariance matrix with a non-zero determinant.
PS. It may be a good idea to post this question on Cross Validated
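A sketch of that suggestion with scikit-learn (the image size and component count are placeholders): project onto the leading principal components, then estimate the class covariance in the reduced space, where its log-determinant is finite.
```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for one class: 200 flattened 32x32 images (1024 pixels each).
rng = np.random.default_rng(0)
X_i = rng.random((200, 1024))

pca = PCA(n_components=20).fit(X_i)
Z_i = pca.transform(X_i)                 # 200 x 20, most informative directions kept

S_i = np.cov(Z_i, rowvar=False)          # 20 x 20 covariance in the reduced space
sign, logdet = np.linalg.slogdet(S_i)    # use slogdet for the -log(det(S_i)) term
print(sign, logdet)                      # non-singular, finite log-determinant
```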
You probably do not have enough data to infer the parameters in a space of dimension d. Typically, the way to get around this is to use a MAP estimate instead of the ML estimate.
For the multivariate normal, the conjugate prior is a normal-inverse-Wishart distribution. The MAP estimate adds the scale matrix parameter of the inverse-Wishart distribution to the ML covariance matrix estimate and, if chosen correctly, will get rid of the singularity problem.
If you are actually trying to build a classifier for normally distributed data, and not just running an experiment, then a better way to do this would be with a discriminative method. The decision boundary between multivariate normal classes is quadratic, so just use a quadratic kernel in conjunction with an SVM.
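A sketch of the MAP idea above, with a simple λI standing in for the inverse-Wishart scale matrix (the dimensions and λ are placeholders): adding it to the ML sample covariance makes the estimate full rank, so -log(det($S_i$)) stays finite in the discriminant.
```python
import numpy as np

rng = np.random.default_rng(0)
# Class i: fewer samples (50) than dimensions (100), so the ML covariance is singular.
X_i = rng.standard_normal((50, 100))

S_ml = np.cov(X_i, rowvar=False)                 # rank <= 49, det(S_ml) = 0
lam = 1e-2                                       # prior strength (to be tuned)
S_map = S_ml + lam * np.eye(S_ml.shape[0])       # MAP-style regularized estimate

print(np.linalg.slogdet(S_ml))    # degenerate: log-determinant -inf or hugely negative
print(np.linalg.slogdet(S_map))   # finite, safe to use in -log(det(S_i))
```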
