How to classify images with Variational Autoencoder - machine-learning

I have trained an autoencoder on both labeled images (1200) and unlabeled images (4000), and I have both models saved separately (vae_fake_img and vae_real_img). So I was wondering what to do next. I know Variational Autoencoders are not useful for a classification task, but feature extraction seems worth a try. Here are my attempts and ideas:
I labeled my unlabeled data using k-means clustering on the latent space of the labeled images.
My supervisor suggested training the VAE on the unlabeled images, then visualizing the latent space with t-SNE, then applying K-means clustering, and finally using an MLP for the final prediction.
I want to train a Conditional VAE to create more labeled samples, retrain the VAE, and then feed the reconstruction output (64,64,3) into the last three fully connected (FC) layers of the VGGNet16 architecture for final classification, as done in this paper: Encoder as feature extraction paper.
I have tried many methods for my thesis and I really need to achieve high accuracy if I want to get a job at my current internship, so any suggestion or guidance is highly appreciated. I've read many autoencoder papers, but the architecture for classification is not fully explained (or I'm not understanding it properly). I want to know which part of the VAE holds more information for multi-class classification: I believe the latent space of the encoder has more useful information than the decoder reconstruction. In short, which part of the autoencoder gives better features for a final classifier?

With autoencoders you don't need labels to reconstruct the input data, so I think these approaches might bring slight improvements:
Use a VAE (Variational Autoencoder) instead of a plain AE.
Use a Conditional VAE (CVAE): combine all the data and train the network on all of it.
Treat the batch (labeled vs. unlabeled) as the condition, and use the one-hot encoding of each sample's batch as its condition.
Inject the condition into both the encoder and the decoder.
Then the latent space should be free of batch effects, and you can use KNN to give each unlabeled sample the label of its nearest labeled neighbours.
Alternatively, you can train a simple MLP to classify every sample in the latent space. (In this approach you should train the MLP only on the labeled data and then apply it to the unlabeled data.)
Don't forget batch normalization and dropout layers.
P.S. The most meaningful layer of an AE is the latent space.
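A minimal sketch of the KNN step, assuming the latent vectors have already been produced by the trained encoder. The arrays `z_labeled`, `y_labeled` and `z_unlabeled` are stand-ins (random Gaussian blobs in a 2-D latent space, not real encoder output), and `knn_predict` is a hypothetical helper written for this illustration, not a library function.

```python
import numpy as np

def knn_predict(z_train, y_train, z_query, k=5):
    """Label each query vector by majority vote among its k nearest labeled neighbours."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    d2 = ((z_query[:, None, :] - z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]       # indices of the k nearest labeled points
    votes = y_train[nearest]                      # their labels, shape (n_query, k)
    n_classes = int(y_train.max()) + 1
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)

# Stand-in latent codes (assumption): three well-separated classes
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
y_labeled = np.repeat(np.arange(3), 20)
z_labeled = centers[y_labeled] + rng.normal(size=(60, 2))
y_true = np.repeat(np.arange(3), 50)              # known here only to check the sketch
z_unlabeled = centers[y_true] + rng.normal(size=(150, 2))

pseudo_labels = knn_predict(z_labeled, y_labeled, z_unlabeled, k=5)
accuracy = (pseudo_labels == y_true).mean()
print(f"pseudo-label accuracy on the synthetic data: {accuracy:.2f}")
```

With real data, `z_labeled` and `z_unlabeled` would be the encoder's latent means for the two image sets, and there would be no `y_true` to compare against.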

Related

LDA as the dimension reduction before or after partitioning

I am doing a classification and I have this question about using LDA just for dimension reduction:
Should LDA be applied to the whole feature matrix, including train and test data, with the reduced data then partitioned into train and test sets for classification? Is that correct?
If instead we need to partition the data before applying LDA, how can the test data be classified using MATLAB's built-in classifiers such as kNN and SVM?
You should fit the LDA on the train set and afterwards apply it to the test set as well.
The reason is that you want to check how your entire processing chain performs on unseen data. If you fit the LDA model on train and test together, the projection has already seen the test data, so the result no longer tells you how the chain handles unseen data.
If you also need to determine the number of dimensions, go for a train/test/validation split: determine the optimal number of dimensions on train/test, then build LDA + model on train and test merged, and evaluate on validation.
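A rough sketch of "fit on train, apply to test", written in Python with NumPy rather than MATLAB. `fit_lda` and `transform_lda` are hypothetical helpers, the data is synthetic, and a nearest-centroid rule stands in for kNN/SVM; the point is only that the projection is learned from the training rows and then reused unchanged on the test rows.

```python
import numpy as np

def fit_lda(X, y, n_components):
    """Learn the LDA projection from training data only. Returns (mean, W)."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                     # within-class scatter
    Sb = np.zeros((d, d))                     # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of Sw^{-1} Sb; a small ridge keeps Sw invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1][:n_components]
    return mean, evecs.real[:, order]

def transform_lda(X, mean, W):
    """Apply a previously fitted projection to any data (train or test)."""
    return (X - mean) @ W

# Synthetic stand-in data: 3 classes, 5 features
rng = np.random.default_rng(0)
centers = np.zeros((3, 5))
centers[1, 0] = 4.0
centers[2, 1] = 4.0
y = np.repeat(np.arange(3), 60)
X = centers[y] + rng.normal(size=(180, 5))

perm = rng.permutation(180)                   # partition first
train, test = perm[:120], perm[120:]
mean, W = fit_lda(X[train], y[train], n_components=2)   # fit on train only
Z_train = transform_lda(X[train], mean, W)
Z_test = transform_lda(X[test], mean, W)      # reuse the same projection

# Nearest-centroid classifier in the reduced space (stand-in for kNN/SVM)
centroids = np.stack([Z_train[y[train] == c].mean(axis=0) for c in range(3)])
pred = ((Z_test[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
test_accuracy = (pred == y[test]).mean()
print(f"test accuracy: {test_accuracy:.2f}")
```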

When should you use pretrained weights when training deep learning models?

I am interested in training a range of image and object detection models and I am wondering what the general rule of when to use pretrained weights of a network like VGG16 is.
For example, it seems obvious that fine-tuning pretrained VGG16 ImageNet weights is helpful if you are looking at a subset of its classes, e.g. cats and dogs.
However it seems less clear to me whether using these pretrained weights is a good idea if you are training an image classifier with 300 classes with only some of them being subsets of the classes in the pretrained model.
What is the intuition around this?
Lower layers learn features that are not necessarily specific to your application or dataset: corners, edges, simple shapes, etc. So it does not matter whether your classes are strictly a subset of the categories that the original network can predict.
Depending on how much data you have available for training, and how similar the data is to the one used in the pretrained network, you can decide to freeze the lower layers and learn only the higher ones, or simply train a classifier on top of your pretrained network.
Check here for a more detailed answer

My semi-supervised linear discriminant analysis does not work at all

I am working on LDA (linear discriminant analysis), and you can refer to http://www.ccs.neu.edu/home/vip/teach/MLcourse/5_features_dimensions/lecture_notes/LDA/LDA.pdf .
My idea about semi-supervised LDA: I can use the labeled data $X\in R^{d\times N}$ to compute all terms in $S_w$ and $S_b$. I also have unlabeled data $Y\in R^{d\times M}$, which can additionally be used to estimate the covariance matrix $XX^T$ in $S_w$ by $\frac{N}{N+M}(XX^T+YY^T)$, which intuitively gives a better covariance estimate.
Implementation of different LDA: I also add a scaled identity matrix to $S_w$ for all compared methods; the scaling parameter is tuned separately for each method. I divide the training data into two parts, labeled $X\in R^{d\times N}$ and unlabeled $Y\in R^{d\times M}$, with $N/M$ ranging from $0.5$ to $0.05$. I run my semi-supervised LDA on three kinds of real datasets.
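For concreteness, one possible reading of this setup as a NumPy sketch. It is not the code used in the experiments: it assumes columns are samples, that the data is centered with the pooled mean of $X$ and $Y$, and that $S_w$ is obtained as total scatter minus $S_b$; `semi_supervised_scatter` and `reg` (the identity scaling parameter) are names made up for this illustration.

```python
import numpy as np

def semi_supervised_scatter(X, labels, Y, reg=1e-2):
    """X: (d, N) labeled, Y: (d, M) unlabeled, columns are samples.
    Returns (S_w, S_b); the total scatter uses both X and Y."""
    d, N = X.shape
    M = Y.shape[1]
    mu = np.hstack([X, Y]).mean(axis=1, keepdims=True)   # pooled mean (assumption)
    Xc, Yc = X - mu, Y - mu
    # Total scatter rescaled to the labeled sample size: N/(N+M) (XX^T + YY^T)
    S_t = N / (N + M) * (Xc @ Xc.T + Yc @ Yc.T)
    # Between-class scatter from the labeled data only
    S_b = np.zeros((d, d))
    for c in np.unique(labels):
        Xk = Xc[:, labels == c]
        m = Xk.mean(axis=1, keepdims=True)
        S_b += Xk.shape[1] * (m @ m.T)
    # Assumed decomposition S_w = S_t - S_b, plus the scaled identity
    S_w = S_t - S_b + reg * np.eye(d)
    return S_w, S_b

# Synthetic stand-in data: d = 4, N = 30 labeled, M = 300 unlabeled, 2 classes
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 15)
X = rng.normal(size=(4, 30)) + 2.0 * labels
Y = rng.normal(size=(4, 300)) + 2.0 * rng.integers(0, 2, size=300)

S_w, S_b = semi_supervised_scatter(X, labels, Y, reg=1e-2)
# Leading eigenvector of S_w^{-1} S_b as the transformation matrix
evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
Phi = evecs.real[:, np.argsort(evals.real)[::-1][:1]]
```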
How to do classification: The eigenvectors of $S_w^{-1}S_b$ are used as the transformation matrix $\Phi$, then
Experiment results: 1) On the test data, the classification accuracy of my semi-supervised LDA trained on $X$ and $Y$ is always a bit worse than that of the standard LDA trained only on $X$. 2) Also, on one of the real datasets, the optimal scaling parameter for the best classification accuracy can be very different between the two methods.
Could you tell me the reason and give me suggestions to make my semi-supervised LDA work? My code has been checked. Many thanks.

How to use stacked autoencoders for pretraining

Let's say I wish to use stacked autoencoders as a pretraining step.
Let's say my full autoencoder is 40-30-10-30-40.
My steps are:
Train a 40-30-40 using the original 40 features data set in both input and output layers.
Using the trained encoder part only of the above i.e. 40-30 encoder, derive a new 30 feature representation of the original 40 features.
Train a 30-10-30 using the new 30 features data set (derived in step 2) in both input and output layers.
Take the trained encoder from step 1 (40-30) and feed its output into the encoder from step 3 (30-10), giving a 40-30-10 encoder.
Take the 40-30-10 encoder from step 4 and use it as the input to the NN.
a) Is that correct?
b) Do I freeze the weights in the 40-30-10 encoder when training the NN? That would be the same as pregenerating the 10-feature representation from the original 40-feature data set and training on the new 10-feature data set.
PS. I already have a question out asking about whether I need to tie the weights of the encoder and decoder
a) Is that correct?
This is one of the typical approaches. You could also try to fit the autoencoder directly, since a "raw" autoencoder with that many layers should be possible to fit right away. As an alternative, you might consider fitting stacked denoising autoencoders instead, which might benefit more from "stacked" training.
b) Do I freeze the weights in the 40-30-10 encoder when training the NN which would be the same as pregenerating the 10 feature representation from the original 40 feature data set and training on the new 10 feature representation data set.
When you train whole NN you do not freeze anything. Pretraining is only a kind of preconditioning for the optimization process - you show your method where to start, but you do not want to limit the fitting procedure of actual supervised learning.
PS. I already have a question out asking about whether I need to tie the weights of the encoder and decoder
No, you do not have to tie the weights, especially since you throw away your decoder anyway. Tying the weights is important for some more probabilistic models in order to make the minimization procedure possible (as in the case of RBMs), but for an autoencoder there is no point.
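The greedy layer-wise procedure from the question can be sketched in plain NumPy as follows. This is an illustration under assumptions, not a reference implementation: the activation (tanh encoder, linear decoder), full-batch gradient descent, the learning rate and the synthetic 40-feature data are all arbitrary choices, and `train_autoencoder` and `encode` are hypothetical helpers.

```python
import numpy as np

def train_autoencoder(X, n_hidden, epochs=300, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder, untied
    weights) with full-batch gradient descent on mean squared error.
    Returns the encoder parameters (W1, b1) and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # encoder
        R = H @ W2 + b2                       # decoder (reconstruction)
        err = R - X
        losses.append(float((err ** 2).mean()))
        gR = 2 * err / err.size               # gradient of the MSE w.r.t. R
        gW2 = H.T @ gR; gb2 = gR.sum(axis=0)
        gZ = (gR @ W2.T) * (1 - H ** 2)       # back through tanh
        gW1 = X.T @ gZ; gb1 = gZ.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return (W1, b1), losses                   # the decoder is discarded

def encode(X, params):
    W, b = params
    return np.tanh(X @ W + b)

# Synthetic stand-in for the 40-feature data set (5 underlying factors plus noise)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ (rng.normal(size=(5, 40)) / np.sqrt(5))
X += 0.1 * rng.normal(size=X.shape)

enc1, loss1 = train_autoencoder(X, 30)        # step 1: 40-30-40
H1 = encode(X, enc1)                          # step 2: 30-feature representation
enc2, loss2 = train_autoencoder(H1, 10)       # step 3: 30-10-30
H2 = encode(encode(X, enc1), enc2)            # step 4: stacked 40-30-10 encoder
print(H2.shape, loss1[0], loss1[-1], loss2[0], loss2[-1])
```

For step 5, `enc1` and `enc2` would serve as the initial weights of the first two layers of the supervised network and, as the answer says, would keep being updated during supervised training rather than frozen.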

What's the difference between ANN, SVM and KNN classifiers?

I am doing remote sensing image classification. I am using the object-oriented method: first I segmented the image to different regions, then I extract the features from regions such as color, shape and texture. The number of all features in a region may be 30 and commonly there are 2000 regions in all, and I will choose 5 classes with 15 samples for every class.
In summary:
Sample data 1530
Test data 197530
How do I choose the proper classifier? If there are 3 classifiers (ANN, SVM, and KNN), which should I choose for better classification?
KNN is the most basic machine learning algorithm to parameterise and implement but, as alluded to by @etov, it would likely be outperformed by SVM due to the small training data size. ANNs have also been observed to be limited by insufficient training data. However, KNN makes the fewest assumptions about your data, other than that accurate training data should form relatively discrete clusters. ANN and SVM are notoriously difficult to parameterise, especially if you wish to repeat the process on multiple datasets, and they rely on certain assumptions, such as that your data is linearly separable (SVM).
I would also recommend the Random Forests algorithm, as it is easy to implement and relatively insensitive to training data size, but I would advise against using very small training data sizes.
The scikit-learn module contains these algorithms and is able to cope with large training data sizes, so you could increase the number of training samples. The best way to know for sure would be to investigate them yourself, as suggested by @etov.
If your "sample data" is the train set, it seems very small. I'd first suggest using more than 15 examples per class.
As said in the comments, it's best to match the algorithm to the problem, so you can simply test to see which algorithm works better. But to start with, I'd suggest SVM: it works better than KNN with small train sets, and it is generally easier to train than an ANN, as there are fewer choices to make.
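One way to "simply test" the candidates is cross-validation with scikit-learn, sketched below. The data is a synthetic stand-in generated with `make_classification` (75 samples, 30 features, 5 classes, mimicking 15 samples per class), and all hyperparameters are arbitrary starting points rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the region features (assumption, not the real data)
X, y = make_classification(n_samples=75, n_features=30, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# Scale-sensitive models get a StandardScaler in front of them
candidates = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                       random_state=0)),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Mean accuracy over 5 stratified folds for each candidate
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

With only 15 samples per class the fold-to-fold variance is large, so small differences between the mean scores should not be over-interpreted.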
Have a look at the mind map below.
KNN: KNN performs well when the sample size is below 100K records, for non-textual data. If accuracy is not high, move on to SVC (the Support Vector Classifier of SVM).
SVM: When the sample size is above 100K records, go for a linear SVM trained with SGDClassifier.
ANN: ANNs have evolved over time and they are powerful. You can use both ANN and SVM in combination to classify images.
More details are available at semanticscholar.org

Resources