I'm working on a machine learning classification problem where I need to classify input data into 8 classes plus "none of the classes". Should I treat it as a classification problem with 9 classes?
It depends on what you want to do with this data.
If the 'none' class consists of interesting outliers, or is mainly noise, you can try running a binary classification of the other 8 classes vs. the 'none' class, or use an outlier detection algorithm.
On the other hand, if the 'none' class is not different from the other ones, then I don't see the point of treating it differently.
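A minimal sketch of the "8 classes vs. 'none'" option, assuming labels are plain strings and the catch-all class is literally named "none" (both are assumptions made for illustration). The helper `split_for_two_stage` is hypothetical; it derives the binary target for a first-stage model and the subset a second-stage 8-class model would be trained on.

```python
def split_for_two_stage(samples, labels, none_label="none"):
    """Build the training sets for a two-stage classifier.

    Stage 1 separates "any of the real classes" (1) from "none" (0).
    Stage 2 only sees the samples that belong to a real class.
    """
    stage1_targets = [0 if y == none_label else 1 for y in labels]
    stage2 = [(x, y) for x, y in zip(samples, labels) if y != none_label]
    stage2_samples = [x for x, _ in stage2]
    stage2_labels = [y for _, y in stage2]
    return stage1_targets, stage2_samples, stage2_labels


# Toy data: one feature per sample, two real classes plus "none".
samples = [[0.1], [0.9], [0.5], [0.3]]
labels = ["cat", "none", "dog", "none"]
stage1_targets, stage2_samples, stage2_labels = split_for_two_stage(samples, labels)
```

If the 'none' class turns out to behave like any other class, the same data can simply be fed to a single 9-class model without this split.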
I hope everyone is doing well.
I need some help with generative models.
I'm working on a project where the main task is to build a binary classification model. The dataset contains 300,000 samples and 100 features, and it is imbalanced: the majority class is far larger than the minority class.
To handle this problem, I'm using a VAE (variational autoencoder).
I train the VAE on the minority class, then use the decoder part of the VAE to generate new (fake) samples that are similar to the minority class, and concatenate this new data with the training set in order to get a balanced training set.
My question is: is there any way to evaluate generative models like a VAE? That is, is there a way to know whether the generated data is similar to the real data?
I have read that there are metrics for evaluating generated data, such as the Inception Score and the Fréchet Inception Distance, but I have only seen them used on image data.
I would like to know whether I can use them on my dataset too.
Thanks in advance
I assume your data is not image data, since you say there are 100 features. You can check the similarity between the synthesised features and the original features (the ones belonging to the minority class), and keep only the synthesised samples that reach a certain similarity. Cosine similarity would be useful for this problem.
It would also be helpful to look at a scatter plot of the synthesised features together with the original ones to see whether they are close to each other. t-SNE would be useful at this point.
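The filtering idea can be sketched as follows. This is only an illustration under stated assumptions: each sample is a plain list of floats, "similar enough" means the best cosine similarity against any real minority sample reaches a threshold, and both the helper names and the 0.9 threshold are hypothetical choices rather than recommended values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def filter_synthetic(synthetic, real, threshold=0.9):
    """Keep synthetic rows whose best match among the real rows is >= threshold."""
    kept = []
    for s in synthetic:
        best = max(cosine_similarity(s, r) for r in real)
        if best >= threshold:
            kept.append(s)
    return kept


# Toy data with 3 features instead of 100.
real_minority = [[1.0, 0.0, 1.0], [0.9, 0.1, 1.1]]
generated = [[1.0, 0.05, 0.95], [-1.0, 2.0, 0.0]]
kept = filter_synthetic(generated, real_minority, threshold=0.9)
```

Note that cosine similarity ignores vector length, so a rescaled copy of a real sample still scores 1.0; if feature magnitudes matter, a distance-based check may be worth adding alongside it.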
We are attempting to implement multi-label classification using a CNN in PyTorch. We have 8 labels and around 260 images, with a 90/10 split into train/validation sets.
The classes are highly imbalanced, with the most frequent class occurring in over 140 images. On the other hand, the least frequent class occurs in fewer than 5 images.
We initially tried the BCEWithLogitsLoss function, which led to the model predicting the same label for all images.
We then implemented a focal loss approach to handle class imbalance as follows:
import torch.nn as nn
import torch

class FocalLoss(nn.Module):
    def __init__(self, alpha=1, gamma=2):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, outputs, targets):
        bce_criterion = nn.BCEWithLogitsLoss()
        bce_loss = bce_criterion(outputs, targets)
        pt = torch.exp(-bce_loss)
        focal_loss = self.alpha * (1 - pt) ** self.gamma * bce_loss
        return focal_loss
This resulted in the model predicting empty sets (no labels) for every image, since it could not reach a confidence greater than 0.5 for any class.
Is there an approach in PyTorch to help address this situation?
There are basically three ways of dealing with this:
1. Discard data from the more common class
2. Weight minority class loss values more heavily
3. Oversample the minority class
Option 1 is implemented by selecting the files you include in your Dataset.
Option 2 is implemented with the pos_weight parameter of BCEWithLogitsLoss.
Option 3 is implemented with a custom Sampler passed to your DataLoader.
For deep learning, oversampling typically works best.
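As a rough sketch of options 2 and 3, the weights can be derived from the multi-hot label matrix of the training split. The helper names below are hypothetical, the "negatives / positives" ratio is one common choice for pos_weight, and weighting each image by its rarest label is just one heuristic for multi-label oversampling, not the only one.

```python
def pos_weights(label_matrix):
    """Per-label ratio of negative to positive examples.

    label_matrix is a list of multi-hot rows, one row per image.
    """
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        negatives = n_samples - positives
        # Guard against a label that never occurs in the training split.
        weights.append(negatives / positives if positives else 1.0)
    return weights


def sample_weights(label_matrix):
    """Weight each image by the inverse frequency of its rarest label."""
    n_labels = len(label_matrix[0])
    counts = [sum(row[j] for row in label_matrix) for j in range(n_labels)]
    weights = []
    for row in label_matrix:
        present = [counts[j] for j in range(n_labels) if row[j]]
        # Images without any label get a small fallback weight.
        weights.append(1.0 / min(present) if present else 1.0 / len(label_matrix))
    return weights


# Toy data: 3 labels, 6 images; label 0 is common, label 2 is rare.
labels = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
]
pw = pos_weights(labels)
sw = sample_weights(labels)
```

In PyTorch, `pw` would then be passed as `nn.BCEWithLogitsLoss(pos_weight=torch.tensor(pw))`, and `sw` could be given to the built-in `torch.utils.data.WeightedRandomSampler(sw, num_samples=len(sw), replacement=True)`, which is handed to the DataLoader through its `sampler` argument.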
I am trying to understand how a GAN is trained. I believe I understand the adversarial training process. What I can't seem to find information on is this: do GANs use class labels in the training process? My current understanding says no, because the discriminator is simply trying to discriminate between real and fake images, while the generator is trying to create realistic images (but not images of any specific class).
If this is the case, then how do researchers propose to use the discriminator network for classification tasks? The network would only be able to perform two-way classification between real and fake images. The generator network would also be difficult to use, seeing as we don't know what setting of the input vector 'Z' will result in the required generated image.
It completely depends on the network you are trying to build. If you are talking specifically about the basic GAN, then you are correct. Class labels are not needed, as the discriminator network is only classifying real/fake images. There is a conditional variant of the GAN (cGAN) where you do make use of the class labels in both the generator and the discriminator. This allows you to produce examples for a specific class with the generator and classify them with the discriminator (along with the real/fake classification).
From the reading that I have done, the discriminator network is just used as a tool for training the generator, and the generator is the main network of concern. Why would you use the discriminator that you used to train the GAN for classification when you could just use a ResNet or VGG net for your classification tasks? These networks would work better anyway. You are right, however, that using the original GAN could cause difficulty because of mode collapse, where the generator keeps producing the same image. That is why the conditional variant was introduced.
Hope this clears things up!
Do GANs use class labels in the training process?
The author suspects that GANs don't require labels. This is correct. The discriminator is trained to classify real and fake images. Since we know which images are real and which are generated by the generator, we do not need labels to train the discriminator. The generator is trained to fool the discriminator, which also doesn't require labels.
This is one of the most attractive benefits of GANs [1]. Usually, we refer to methods that do not require labels as unsupervised learning. That said, if we had labels, maybe we could train a GAN that uses the labels to improve performance. This idea underlies the follow-up work by [2] who introduced the conditional GAN.
If this is the case, then how do researchers propose to use the discriminator network for classification tasks?
There seems to be a misunderstanding here. The purpose of the discriminator is NOT to act as a classifier on real data. The purpose of the discriminator is to "tell the generator how to improve its fakes". This is done by using the discriminator as a loss function, which we can backpropagate gradients through if it is a neural network. After training, we usually discard the discriminator.
The generator network would also be difficult to use, seeing as we don't know what setting of the input vector 'Z' will result in the required generated image.
It seems the underlying reason for posting the question lies here. The input vector 'Z' is chosen such that it follows some distribution, typically a normal distribution. But then what happens if we take 'Z', a random vector with normally distributed entries, and compute 'G(Z)'? We get a new vector which follows a very complicated distribution that depends on G. The entire idea of GANs is to change G such that this new complicated distribution is close to the distribution of our data. This idea is formalized with f-divergences in [3].
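A toy illustration of how G reshapes the distribution of 'Z'. Here the "generator" is just a fixed exponential function standing in for a trained network (an assumption made purely for illustration), so the output is log-normal rather than anything learned from data.

```python
import math
import random


def generator(z):
    """Stand-in for G: any fixed nonlinear map reshapes the distribution."""
    return math.exp(z)


rng = random.Random(0)  # fixed seed so the run is repeatable
z_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
x_samples = [generator(z) for z in z_samples]

# Z is symmetric around 0, but G(Z) is strictly positive and skewed,
# with a theoretical mean of exp(0.5) for this particular G.
z_mean = sum(z_samples) / len(z_samples)
x_mean = sum(x_samples) / len(x_samples)
x_min = min(x_samples)
```

In a real GAN the map G has trainable parameters, and training adjusts them until the distribution of G(Z) is hard to tell apart from the data distribution; no particular value of 'Z' has to be chosen by hand.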
[1] https://arxiv.org/abs/1406.2661
[2] https://arxiv.org/abs/1411.1784
[3] https://arxiv.org/abs/1606.00709
I have an SVM model with 6 classes and 19 features. It works well, with 95% accuracy.
I'm evaluating how to get the last 5%. My idea is to create other models with other features, train instances.
Another idea is to rearrange the existing model from 6 classes to 6 models each with 2 classes, where one class is positive and the other 5 classes are negative. The features will remain the same. Will it bring any new classification results, or is it just a redundant model?
Thank you!
My idea is to create other models with other features, train
instances.
Yes, it's a good idea. Check performance of other models on your data.
Another idea is to rearrange the existing model from 6 classes to 6
models each with 2 classes, where one class is positive and the other
5 classes are negative.
An SVM is a binary classifier, so a multiclass SVM classifier internally uses either one-vs-all or one-vs-one. What you are suggesting is one-vs-all, whereas libsvm uses the one-vs-one technique. You can use one-vs-all, but this usually doesn't increase accuracy, as one-vs-one uses a larger number of classifiers (k(k-1)/2 pairwise models for k classes, i.e. 15 for your 6 classes, versus k models for one-vs-all).
SVM is only actually capable of doing binary classification. The multi-class adaptation uses several models and votes on what the class should be in a one-vs-one scheme.
Quick example:
class1 vs class2
class2 vs class3
class1 vs class3
would all be used in a 3-class SVM, then the models would vote on what class an observation should be. One-vs-all is another popular way to use SVM in a multi-class scenario. To answer your question, that's already kind of what is going on behind the scenes. It is possible that building even more models could help improve your accuracy by a small margin, so it's worth a shot if you're bored and want to see if it helps or not.
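The voting scheme described above can be sketched like this. The pairwise "models" are hypothetical stand-ins (nearest class centre on a single feature) rather than trained SVMs; only the pair enumeration and the majority vote are the point.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs: k * (k - 1) / 2 binary problems."""
    return list(combinations(classes, 2))


def make_pairwise_model(a, b, centres):
    """Hypothetical binary model: picks whichever class centre is nearer."""
    def model(x):
        return a if abs(x - centres[a]) <= abs(x - centres[b]) else b
    return model


def one_vs_one_predict(x, pairwise_models):
    """Each pairwise model votes for one of its two classes; majority wins."""
    votes = Counter(model(x) for model in pairwise_models.values())
    return votes.most_common(1)[0][0]


# With 6 classes, one-vs-one needs 15 models; one-vs-all would need 6.
six_classes = ["c1", "c2", "c3", "c4", "c5", "c6"]
pairs = one_vs_one_pairs(six_classes)

# The 3-class example: class1 vs class2, class1 vs class3, class2 vs class3.
centres = {"class1": 0.0, "class2": 5.0, "class3": 10.0}
models = {
    (a, b): make_pairwise_model(a, b, centres)
    for a, b in one_vs_one_pairs(list(centres))
}
prediction = one_vs_one_predict(4.2, models)
```

Ties are broken here by whichever class was counted first; real implementations typically have their own tie-breaking rule.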
I am a deep-learning newbie and working on creating a vehicle classifier for images using Caffe and have a 3-part question:
1. Are there any best practices in organizing classes for training a CNN, i.e. number of classes and number of samples for each class? For example, would I be better off this way:
(a) Vehicles - Car-Sedans/Car-Hatchback/Car-SUV/Truck-18-wheeler/.... (note this could mean several thousand classes), or
(b) have a higher-level model that classifies between car/truck/2-wheeler and so on, and if it is a car, then query the Car Model to get the car type (sedan/hatchback etc.)
2. How many training images per class is a typical best practice? I know there are several other variables that affect the accuracy of the CNN, but what rough number is good to shoot for in each class? Should it be a function of the number of classes in the model? For example, if I have many classes in my model, should I provide more samples per class?
3. How do we ensure we are not overfitting to a class? Is there a way to measure heterogeneity in training samples for a class?
Thanks in advance.
Well, the first choice that you mentioned corresponds to a very challenging task in the computer vision community: fine-grained image classification, where you want to classify the subordinates of a base class, say Car! To get more info on this, you may see this paper.
According to the literature on image classification, classifying high-level classes such as cars/trucks would be much simpler for CNNs to learn, since there may exist more discriminative features. I suggest following the second approach, that is, classifying cars (of all types) vs. trucks and so on.
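The second approach from the question (a coarse model first, then a per-category model for the subtype) could be wired together roughly as below. The two model functions are hypothetical rule-based stand-ins for trained CNNs, and the dictionary "images" with `wheels` and `length_m` fields are invented purely so the sketch runs.

```python
def classify_vehicle(image, coarse_model, fine_models):
    """Two-stage prediction: coarse category first, then an optional subtype."""
    category = coarse_model(image)
    fine_model = fine_models.get(category)
    if fine_model is None:
        return category, None  # no subtype model for this category
    return category, fine_model(image)


# Hypothetical stand-ins; real models would be trained CNNs.
def coarse_model(image):
    return "car" if image["wheels"] == 4 else "2-wheeler"


def car_model(image):
    return "hatchback" if image["length_m"] < 4.2 else "sedan"


fine_models = {"car": car_model}
result_car = classify_vehicle({"wheels": 4, "length_m": 3.9}, coarse_model, fine_models)
result_bike = classify_vehicle({"wheels": 2, "length_m": 2.0}, coarse_model, fine_models)
```

One consequence of this design is that an error in the coarse stage cannot be recovered by the fine stage, so the coarse model's accuracy bounds the accuracy of the whole pipeline.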
The number of training samples needed grows mainly with the number of parameters; that is, if you want to train a shallow model, far fewer samples are required. It also depends on whether you decide to fine-tune a pre-trained model or train a network from scratch. When sufficient samples are not available, you have to fine-tune a pre-trained model on your task.
Wrestling with over-fitting has always been a problematic issue in machine learning, and even CNNs are not free of it. The literature offers some practical suggestions for reducing over-fitting, such as dropout layers and data-augmentation procedures.
This may not be part of your questions, but it seems that you should follow the fine-tuning procedure, that is, initializing the network with the pre-trained weights of a model from another task (say ILSVRC 201X) and adapting the weights to your new task. This procedure is known as transfer learning (and sometimes domain adaptation) in the community.