Could somebody give me an example showing how Platt scaling is used along with k-fold cross-validation in multiclass SVM classification in libsvm?
I have divided the whole dataset into two parts: training and testing. For cross-validation I partition the training data so that one fold is held out for testing and the rest is used to train the multiclass SVM classifier.
Platt scaling has nothing to do with your partitioning or the multiclass setting. Platt scaling is an internal technique of each individual binary SVM, and it uses only the training data. It amounts to fitting a logistic regression on top of your learned SVM projections.
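For what it's worth, here is a minimal sketch of the usual recipe, using scikit-learn's wrapper around libsvm rather than the raw libsvm command line (the dataset and hyperparameters are just placeholders). CalibratedClassifierCV with method="sigmoid" is Platt scaling, and cv=5 makes it fit each sigmoid with 5-fold cross-validation on the training portion only:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 3 classes, 10 features.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Platt scaling (method="sigmoid") fitted with 5-fold CV on the training set;
# in the multiclass case each underlying binary problem gets its own sigmoid.
platt_svm = CalibratedClassifierCV(SVC(kernel="rbf"), method="sigmoid", cv=5)
platt_svm.fit(X_train, y_train)
print(platt_svm.predict_proba(X_test[:3]))

# SVC(probability=True) does essentially the same thing internally via libsvm.
```

Note that the held-out test set never enters the calibration; Platt scaling only re-uses (folds of) the training data.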
Related
Can we use KNN and a linear SVM classifier to train a model on data that has 4 features and 6 classes? My impression is that linear SVM and KNN are only used to linearly separate data that has two features and a binary class label.
This is possible; you just need to use a one-vs-all wrapper, like this one: https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html
Essentially you will train 6 classifiers, one per class, each of which seeks to separate that class from all the rest.
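A minimal sketch of that (scikit-learn, with synthetic data standing in for your 4-feature, 6-class problem):

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Stand-in dataset: 4 features, 6 classes.
X, y = make_classification(n_samples=600, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           random_state=0)

# The wrapper trains one binary linear SVM per class ("this class vs the rest").
ovr_svm = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(len(ovr_svm.estimators_), "binary classifiers trained")
print(ovr_svm.predict(X[:5]))
```

(KNN does not even need the wrapper; KNeighborsClassifier handles multiple classes natively.)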
Which metric is better for multi-label classification in Keras: accuracy or categorical_accuracy? Obviously the last activation function is sigmoid and the loss function is binary_crossentropy in this case.
I would not use Accuracy for classification tasks with unbalanced classes.
Especially in multi-label tasks, most of your labels are probably False. That is, each data point has only a small set of labels compared to the cardinality of all possible labels.
For that reason accuracy is not a good metric: if your model predicts all False (sigmoid activation output < 0.5), you will still measure a very high accuracy.
I would analyze either the AUC or recall/precision at each epoch.
Alternatively, a multi-label task can be seen as a ranking task (as in recommender systems), and you could evaluate precision@k or recall@k, where k is the number of top predicted labels.
If your Keras back-end is TensorFlow, check out the full list of supported metrics here: https://www.tensorflow.org/api_docs/python/tf/keras/metrics.
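As a hedged illustration (the model shape and metric choices are arbitrary, not something from your code), this is how you could track AUC, precision and recall instead of plain accuracy in a sigmoid/binary_crossentropy multi-label setup:

```python
import tensorflow as tf

# Toy multi-label model: 20 input features, 10 independent labels.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        tf.keras.metrics.AUC(name="auc", multi_label=True),  # per-label AUC, averaged
        tf.keras.metrics.Precision(name="precision"),        # thresholded at 0.5
        tf.keras.metrics.Recall(name="recall"),
    ],
)
```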
Actually, there is no metric named accuracy in Keras. When you set metrics=['accuracy'] in Keras, the correct accuracy metric is inferred automatically based on the loss function used. As a result, since you have used binary_crossentropy as the loss function, binary_accuracy will be chosen as the metric.
Now, you should definitely choose binary_accuracy over categorical_accuracy in a multi-label classification task since classes are independent from each other and the prediction for each class should be considered independently of the predictions for other classes.
I am working on LDA (linear discriminant analysis); for reference, see http://www.ccs.neu.edu/home/vip/teach/MLcourse/5_features_dimensions/lecture_notes/LDA/LDA.pdf.
My idea about semi-supervised LDA: I can use the labeled data $X\in R^{d\times N}$ to compute all terms in $S_w$ and $S_b$. I also have unlabeled data $Y\in R^{d\times M}$, which can additionally be used to estimate the covariance matrix $XX^T$ in $S_w$ by $\frac{N}{N+M}(XX^T+YY^T)$, which intuitively gives a better covariance estimate.
Implementation of the different LDA variants: I also add a scaled identity matrix to $S_w$ for all compared methods; the scaling parameter is tuned separately for each method. I divide the training data into two parts: labeled $X\in R^{d\times N}$ and unlabeled $Y\in R^{d\times M}$, with $N/M$ ranging from $0.5$ to $0.05$. I run my semi-supervised LDA on three kinds of real datasets.
How to do classification: The eigenvectors of $S_w^{-1}S_b$ are used as the transformation matrix $\Phi$, and classification is then carried out on the projected data.
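In case it helps make the setup concrete, here is a minimal NumPy sketch of the two pieces described above (the function names and the regularization constant alpha are mine, not from your code; columns are samples):

```python
import numpy as np

def pooled_second_moment(X, Y_unlab):
    # Replace the X X^T term inside S_w by (N/(N+M)) * (X X^T + Y Y^T),
    # using the unlabeled data as described above.
    N, M = X.shape[1], Y_unlab.shape[1]
    return (N / (N + M)) * (X @ X.T + Y_unlab @ Y_unlab.T)

def lda_transform(S_w, S_b, n_components, alpha=1e-3):
    # Transformation matrix Phi: leading eigenvectors of (S_w + alpha*I)^{-1} S_b,
    # with the scaled identity added to S_w as in the question.
    d = S_w.shape[0]
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w + alpha * np.eye(d), S_b))
    order = np.argsort(-evals.real)
    return evecs[:, order[:n_components]].real
```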
Experiment results: 1) On the testing data, the classification accuracy of my semi-supervised LDA trained on $X$ and $Y$ is always a bit worse than that of the standard LDA trained only on $X$. 2) Also, on one real dataset, the optimal scaling parameter can be very different between the two methods for achieving the best classification accuracy.
Could you tell me the reason and give me suggestions to make my semi-supervised LDA work? My code has been checked. Many thanks.
I'm using the Naive Bayes classifier in Weka on a dataset of 7000 instances with 15 attributes. My baseline accuracy is 87.5% using ZeroR. As part of data preprocessing I normalized the dataset to zero mean and unit variance and applied a filter to randomize it. I've used training (70%) and testing (30%) sets, as well as 10-fold cross-validation on the entire dataset, together with supervised discretization and attribute selection, and the best accuracy I got is 93.43%. Is this a small improvement with respect to the baseline accuracy?
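Since the setup is in Weka, the following is only a rough scikit-learn analogue of the workflow described above (most-frequent-class baseline as a ZeroR stand-in, standardization, shuffled 70/30 split, 10-fold CV); the synthetic dataset and its class balance are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the 7000-instance, 15-attribute dataset (~87.5% majority class).
X, y = make_classification(n_samples=7000, n_features=15, weights=[0.875],
                           random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=True,
                                          random_state=0)

zero_r = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)  # ZeroR analogue
nb = make_pipeline(StandardScaler(), GaussianNB()).fit(X_tr, y_tr)

print("baseline (ZeroR analogue):", zero_r.score(X_te, y_te))
print("naive Bayes, hold-out    :", nb.score(X_te, y_te))
print("naive Bayes, 10-fold CV  :", cross_val_score(nb, X, y, cv=10).mean())
```

Weka's supervised discretization has no direct one-liner here; GaussianNB on standardized features is used purely as an approximation of that step.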
What is the exact difference between a Support Vector Machine classifier and a Support Vector Machine regression machine?
The one-sentence answer is that an SVM classifier performs binary classification and SVM regression performs regression.
While performing very different tasks, they are both characterized by the following points:
usage of kernels
absence of local minima
sparseness of the solution
capacity control obtained by acting on the margin
number of support vectors, etc.
For SVM classification the hinge loss is used; for SVM regression the epsilon-insensitive loss function is used.
SVM classification is more widely used and in my opinion better understood than SVM regression.
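To make the contrast concrete, here is a minimal scikit-learn sketch (synthetic data, arbitrary hyperparameters): SVC optimizes a hinge-loss classification objective, SVR an epsilon-insensitive regression objective, and both share the kernel machinery and expose their support vectors.

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Classification: binary labels, hinge-loss-based SVC.
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel="rbf", C=1.0).fit(X, y_class)
print("predicted classes:", clf.predict(X[:3]))

# Regression: continuous targets, epsilon-insensitive SVR (tube width = epsilon).
y_reg = 2.0 * X[:, 0] + np.sin(X[:, 1])
reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y_reg)
print("predicted values:", reg.predict(X[:3]))

# Both solutions are sparse: only the support vectors determine the model.
print("support vectors:", clf.support_vectors_.shape[0], reg.support_vectors_.shape[0])
```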