I'm applying PCA for dimensionality reduction of multivariate time-series data with >100 dimensions. I was wondering if there are any shortcomings to this approach when working with time-series data. If so, what are some other approaches that can be applied to carry out dimensionality reduction?
I am trying to solve a regression problem by predicting a continuous value using machine learning. I have a dataset composed of 6 float columns.
The data come from low-cost sensors, so we are very likely to have values that can be considered out of the ordinary. To deal with this, and before predicting my continuous target, I will detect data anomalies and use that as a data filter. But the data I have is not labeled, which means I have an unsupervised anomaly detection problem.
The algorithms used for this task are Local Outlier Factor, One-Class SVM, Isolation Forest, Elliptic Envelope and DBSCAN.
After fitting those algorithms, it is necessary to evaluate them to choose the best one.
Does anyone have an idea of how to evaluate an unsupervised algorithm for anomaly detection?
The only way is to generate synthetic anomalies, which means introducing outliers yourself using knowledge of what a typical outlier looks like.
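A minimal sketch of that idea: inject labeled synthetic anomalies into the data, score everything with two of the detectors listed in the question, and compare ranking quality via ROC AUC. The data and the injection scheme here are assumptions standing in for the real sensor columns and for your own notion of a typical outlier.

```python
# Sketch: evaluate unsupervised detectors by injecting synthetic anomalies.
# Both the "real" data and the injection scheme (uniform points far outside
# the normal range) are assumptions -- replace them with your sensor data
# and your own knowledge of what a typical outlier looks like.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_real = rng.normal(size=(500, 6))          # stand-in for the 6 sensor columns
X_fake = rng.uniform(-8, 8, size=(25, 6))   # injected synthetic anomalies
X = np.vstack([X_real, X_fake])
y = np.r_[np.zeros(len(X_real)), np.ones(len(X_fake))]  # 1 = anomaly

# Both detectors return scores where *lower* means more anomalous, so negate
# them to get "higher = more anomalous" scores for roc_auc_score.
scores_if = -IsolationForest(random_state=0).fit(X).score_samples(X)
scores_lof = -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_

print("IsolationForest AUC:", roc_auc_score(y, scores_if))
print("LOF AUC:            ", roc_auc_score(y, scores_lof))
```

The detector that ranks the injected anomalies highest (largest AUC) best matches your assumed outlier model; the ranking is only as trustworthy as that assumption.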
Do we need to scale the data (by normalization or by standardization) when building decision trees or random forests? We know that we need to scale the data for KNN, k-means clustering and PCA, since these algorithms are based on distance calculations. What about scaling in linear regression, logistic regression, Naive Bayes, decision trees and random forests?
We do data scaling when we are looking for relations between data points. In ANNs and other data-mining approaches we need to normalize the inputs, otherwise the network will be ill-conditioned. We do the scaling to reach a more linear, more robust relationship. Moreover, data scaling can also help you overcome outliers in the data. In short, data scaling is highly recommended for most types of machine learning algorithms.
You can do normalization or standardization in order to scale your data.
[Note: do not confuse normalization with standardization (e.g. Z-score).]
Hope that helps.
Do we need to scale the data (by normalization or by standardization) when building decision trees or random forests?
A: Decision trees and random forests are insensitive to feature magnitude, so scaling is not required.
We know that we need to scale the data for KNN, k-means clustering and PCA, since these algorithms are based on distance calculations. What about scaling in linear regression, logistic regression, Naive Bayes, decision trees and random forests?
A: In general, scaling is not an absolute requirement; it is a recommendation, primarily for similarity-based algorithms. For many algorithms, you may need to consider data transformation prior to normalization. There are also various normalization techniques you can try out, and no single one fits all problems best. The main reason for normalization in error-based algorithms such as linear regression, logistic regression and neural networks is faster convergence to the global minimum due to better initialization of the weights. Information-based algorithms (decision trees, random forests) and probability-based algorithms (Naive Bayes, Bayesian networks) don't require normalization either.
Scaling is generally better to do, because if all the features are on the same scale, gradient descent converges faster to the global minimum or a good local minimum.
We can speed up gradient descent by keeping each of our input values in roughly the same range. This is because the model parameters descend quickly on small ranges and slowly on large ranges, and so oscillate inefficiently down to the optimum when the variables are very uneven.
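A toy illustration of that effect, using plain batch gradient descent on a least-squares problem (the data, ranges and learning rates are illustrative assumptions): on raw features with very uneven ranges the learning rate must be tiny to stay stable, so convergence within a fixed iteration budget is slow; after standardization the same loop converges in far fewer iterations.

```python
# Batch gradient descent on least squares, raw vs standardized features,
# counting iterations until the gradient norm falls below a tolerance.
import numpy as np

rng = np.random.default_rng(0)
X_raw = np.c_[rng.uniform(0, 1, 100), rng.uniform(0, 100, 100)]  # uneven ranges
w_true = np.array([2.0, 0.05])
y = X_raw @ w_true + rng.normal(scale=0.01, size=100)

def gd_iters(X, y, lr, max_iter=100_000, tol=1e-6):
    """Iterations until the least-squares gradient norm drops below tol."""
    w = np.zeros(X.shape[1])
    for i in range(1, max_iter + 1):
        grad = X.T @ (X @ w - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter  # did not converge within the budget

# Standardize features; center the target so no intercept term is needed.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
y_c = y - y.mean()

# On the raw features the learning rate must shrink to stay stable.
iters_raw = gd_iters(X_raw, y, lr=1e-4)
iters_scaled = gd_iters(X_std, y_c, lr=0.1)
print("iterations, raw:   ", iters_raw)
print("iterations, scaled:", iters_scaled)
```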
In principal component analysis (PCA), why do we need to calculate the eigenfaces to identify an unknown image? Why don't we just use similarity measures to find the best match between an unknown image and the images in the training data set?
I strongly suggest that you study PCA formally. It is not a difficult algorithm to understand.
PCA is a dimension reduction tool, not a classifier. In Scikit-Learn, all classifiers and estimators have a predict method which PCA does not. You need to fit a classifier on the PCA-transformed data. Scikit-Learn has many classifiers.
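A minimal sketch of that workflow (the dataset, number of components and classifier are illustrative choices): PCA reduces the dimension inside a pipeline, and the classifier fitted on top is what supplies `predict`.

```python
# PCA is only a transformer; a classifier must be fit on its output.
# Here a pipeline chains PCA with logistic regression on the digits data.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = make_pipeline(PCA(n_components=30),
                    LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

# PCA alone has no predict method; the pipeline's classifier does.
print("test accuracy:", clf.score(X_te, y_te))
```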
I am working on LDA (linear discriminant analysis), and you can refer to http://www.ccs.neu.edu/home/vip/teach/MLcourse/5_features_dimensions/lecture_notes/LDA/LDA.pdf .
My idea about semi-supervised LDA: I can use labeled data $X\in R^{d\times N}$ to compute all terms in $S_w$ and $S_b$. Now, I also have unlabeled data $Y\in R^{d\times M}$, and this data can additionally be used to estimate the covariance matrix $XX^T$ in $S_w$ by $\frac{N}{N+M}(XX^T+YY^T)$, which intuitively gives a better covariance estimate.
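For concreteness, that pooled estimate can be sketched in a few lines of NumPy (random data as a stand-in for the labeled and unlabeled samples, columns assumed already centered):

```python
# Pooled covariance term (N/(N+M)) * (X X^T + Y Y^T) from the formula above,
# with samples stored as columns and assumed centered. Shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, N, M = 5, 40, 160
X = rng.normal(size=(d, N))   # labeled samples, one per column
Y = rng.normal(size=(d, M))   # unlabeled samples, one per column

C_labeled = X @ X.T                             # term from labeled data alone
C_pooled = (N / (N + M)) * (X @ X.T + Y @ Y.T)  # semi-supervised estimate
print(C_pooled.shape)  # (5, 5)
```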
Implementation of the different LDAs: I also add a scaled identity matrix to $S_w$ for all compared methods; the scaling parameter is tuned separately for each method. I divide the training data into two parts: labeled $X\in R^{d\times N}$ and unlabeled $Y\in R^{d\times M}$, with $N/M$ ranging from $0.5$ to $0.05$. I run my semi-supervised LDA on three kinds of real datasets.
How to do classification: The eigenvectors of $S_w^{-1}S_b$ are used as the transformation matrix $\Phi$, and classification is then carried out in the projected space.
Experiment results: 1) On the testing data, the classification accuracy of my semi-supervised LDA trained on $X$ and $Y$ is always a bit worse than that of the standard LDA trained only on $X$. 2) Also, on one real dataset, the optimal scaling parameter can be very different for the two methods to achieve their best classification accuracy.
Could you tell me the reason and give me suggestions to make my semi-supervised LDA work? My code has been checked. Many thanks.
Could somebody give me an example showing how Platt scaling is used along with k-fold cross-validation in multiclass SVM classification with libsvm?
I have divided the whole dataset into two parts: training and testing. For cross-validation I partition the training data such that one partition is held out for validation and the rest is used for training the multiclass SVM classifier.
Platt scaling has nothing to do with your partitioning or the multiclass setting. Platt scaling is an internal technique of each individual binary SVM, which uses only the training data. It is essentially just fitting a logistic regression on top of your learned SVM projections.
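For instance, in scikit-learn (which wraps libsvm), setting `probability=True` switches on Platt scaling inside each binary SVM, while the one-vs-one multiclass handling and the k-fold cross-validation sit on top unchanged. The dataset and settings below are illustrative:

```python
# Platt scaling in a libsvm-backed SVC: probability=True calibrates each
# internal binary SVM; cross-validation and multiclass handling are separate.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", probability=True, random_state=0)

cv_acc = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold CV accuracy
print("5-fold CV accuracy:", cv_acc)

proba = clf.fit(X, y).predict_proba(X[:3])  # Platt-calibrated class probabilities
print(proba.sum(axis=1))                    # each row sums to 1
```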