Is the class imbalance problem inherent to GANs? In a GAN there are two networks working against each other: one is a classifier, and the adversary tries to fool the classifier by generating fake images. All of the generated images from the GAN will be fakes, so if the algorithm is run for long enough, there has to be a class imbalance, right?
Not right, but you have some basic concepts correct.
The classifier trains with real images as well. Its goal is to accurately discriminate between these real images and the fakes from the generator.
The adversary's goal is to generate images that will fool the classifier.
The model builder (i.e. you) chooses the balance between real and fake images in each iteration. This supports experiments to determine the most effective ratio.
True, the real images have a fixed population, and the generated images are effectively infinite. However, the idea of "class imbalance" doesn't apply as well here: after each iteration, the old fake images are replaced by new ones. The old images were useful for earlier training, but are not used after that single exposure to the classifier.
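To make that concrete, here is a minimal sketch of a single discriminator update (Keras, with toy placeholder architectures and random stand-in data; the generator's own update step is omitted). It shows that the real/fake ratio per iteration is an explicit choice, and that each batch of fakes is generated fresh for that step and then discarded:

    # Minimal sketch, not a tuned GAN: a single discriminator update where the
    # builder chooses the real/fake ratio and the fakes are used only once.
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers

    latent_dim = 32
    img_shape = (28, 28, 1)

    # Toy generator and discriminator; these architectures are placeholders.
    generator = tf.keras.Sequential([
        layers.Dense(int(np.prod(img_shape)), activation="tanh", input_shape=(latent_dim,)),
        layers.Reshape(img_shape),
    ])
    discriminator = tf.keras.Sequential([
        layers.Flatten(input_shape=img_shape),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    discriminator.compile(optimizer="adam", loss="binary_crossentropy")

    real_images = np.random.rand(1000, *img_shape).astype("float32")  # stand-in for the real dataset

    n_real, n_fake = 64, 64  # the model builder picks this ratio explicitly
    for step in range(100):
        reals = real_images[np.random.randint(0, len(real_images), n_real)]
        noise = np.random.normal(size=(n_fake, latent_dim))
        fakes = generator.predict(noise, verbose=0)  # used for this step only, then replaced

        x = np.concatenate([reals, fakes])
        y = np.concatenate([np.ones((n_real, 1)), np.zeros((n_fake, 1))])
        discriminator.train_on_batch(x, y)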
I've been working with some face detection in OpenCV. I have a couple of projects I've done: one does face detection using a pre-built model. Some others do different things where I collect my own images and train my own models. When I do the latter, it's generally with much smaller datasets than what you'd use for face training.
On my face recognizer, many of the common faces I work with do not get detected properly (due to odd properties like masks, hats, goggles, glasses, etc.). So I want to re-train my own model, but grabbing the gigantic "stock" datasets and adding my images to them may take a VERY long time.
So the question is: is there a way to start with an existing model (XML file) and run the trainer in a way that would just add my images to it?
This is called "transfer learning". TensorFlow (Keras) has a lot of support for this. It basically consists of taking a pre-existing model with pre-existing weights, "freezing" the weights on certain layers, adding new layers on top of or below existing ones, and then retraining only the un-frozen layers.
It can't readily be used to just "continue" learning, but it can be used to add additional things into the training for newer aspects (like, potentially, adding masked people to a model already trained on unmasked people, as in my original question).
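As a rough illustration of that workflow, here is a minimal Keras sketch. The choice of MobileNetV2 as the base and the two-class head are my own assumptions for illustration; note that OpenCV's cascade XML files are a different format and can't be extended this way, so this shows the Keras route the answer describes:

    # Transfer-learning sketch: freeze a pretrained base, add a new head,
    # and retrain only the new layers. MobileNetV2 is just an example base.
    import tensorflow as tf
    from tensorflow.keras import layers

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False  # "freeze" the pre-existing weights

    model = tf.keras.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),   # new layers added on top
        layers.Dense(2, activation="softmax"),  # e.g. masked vs. unmasked face (hypothetical classes)
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # model.fit(my_small_dataset, epochs=5)  # train only the un-frozen head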
I have a data science problem with around 70k images already labelled across 20 different categories. Some categories have many images whereas some others have fewer. This results in an imbalanced dataset and poor results (currently at 68% accuracy). After some research I found out that I need to sample the images (image mining?) instead of selecting all of them. One such approach could be stratified sampling. The question is: how do I select images to optimise the training of the model? Is there any command line tool or open source code that I could use on 70k images?
You have imbalanced data, so to deal with that you can simply use a library called imbalanced-learn.
This library originally focused on implementing SMOTE, but it later added under-sampling and over-sampling techniques as well.
It is also compatible with scikit-learn.
Using this approach will resample the data so that every class has nearly equal numbers of instances.
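A hedged sketch of that route (shapes and class counts below are made up; SMOTE and the random over-sampler work on feature vectors, so the images are flattened or replaced by extracted CNN features first):

    # Oversample minority classes so every class ends up with roughly equal counts.
    import numpy as np
    from imblearn.over_sampling import RandomOverSampler, SMOTE

    # X: (n_images, height*width*channels) flattened images or extracted features
    # y: (n_images,) integer labels for the 20 categories -- random stand-ins here
    X = np.random.rand(200, 64 * 64 * 3)
    y = np.random.randint(0, 20, size=200)

    sampler = RandomOverSampler(random_state=0)  # or SMOTE(random_state=0)
    X_res, y_res = sampler.fit_resample(X, y)

    print(np.bincount(y))      # imbalanced counts
    print(np.bincount(y_res))  # nearly equal counts per class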
Second option:
You can simply pick an equal number of images for each class and form the training data from those. This may not boost your accuracy, due to the reduced amount of training data, but your model will surely become more robust and generalized.
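A small sketch of this option, assuming the 70k images live in one sub-directory per category (the directory names and per-class count are placeholders):

    # Copy an equal number of randomly chosen images per class into a new folder.
    import random
    import shutil
    from pathlib import Path

    src = Path("dataset")           # assumed layout: dataset/<category>/<image files>
    dst = Path("balanced_dataset")
    per_class = 500                 # e.g. the size of your smallest usable class

    for class_dir in src.iterdir():
        if not class_dir.is_dir():
            continue
        images = list(class_dir.glob("*.jpg"))
        sample = random.sample(images, min(per_class, len(images)))
        out_dir = dst / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for img in sample:
            shutil.copy(img, out_dir / img.name)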
I have recently been looking into incorporating Apple's machine learning release for iOS developers into my app. Since this is my first time ever using anything ML related, I was very lost when I started reading the different model descriptions that Apple has made available. They have the same purpose/description; the only difference is the actual file size. What is the difference between these models, and how would you know which one is the best fit?
The models Apple makes available are just for simple demo purposes. Most of the time, these models are not sufficient for use in your own app.
The models on Apple's download page are trained for a very specific purpose: image classification on the ImageNet dataset. This means they can take an image and tell you what the "main" object is in the image, but only if it's one of the 1,000 categories from the ImageNet dataset.
Usually, this is not what you want to do in your own apps. If your app wants to do image classification, typically you want to train a model on your own categories (like food or cars or whatever). In that case you can take something like Inception-v3 (the original, not the Core ML version) and re-train it on your own data. That gives you a new model, which you then need to convert to Core ML again.
If your app wants to do something other than image classification, you can use these pretrained models as "feature extractors" in a larger neural network structure. But again this involves training your own model (usually from scratch) and then converting the result to Core ML.
So only in a very specific use case -- image classification using the 1,000 ImageNet categories -- are these Apple-provided models useful to your app.
If you do want to use any of these models, the difference between them is speed vs. accuracy. The smaller models are fastest but also least accurate. (In my opinion, VGG16 shouldn't be used on mobile. It's just too big and it's no more accurate than Inception or even MobileNet.)
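On the "convert to Core ML again" step mentioned above, here is a hedged sketch using coremltools' unified converter (exact options vary by coremltools version, and the model path and input shape are assumptions):

    # Convert a retrained Keras classifier to a Core ML model.
    import coremltools as ct
    import tensorflow as tf

    retrained_model = tf.keras.models.load_model("my_retrained_classifier.h5")  # hypothetical path

    mlmodel = ct.convert(
        retrained_model,
        inputs=[ct.ImageType(shape=(1, 224, 224, 3))],  # image input for classification
    )
    mlmodel.save("MyClassifier.mlmodel")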
SqueezeNets are fully convolutional and use Fire modules, which have a squeeze layer of 1x1 convolutions that vastly decreases the number of parameters by restricting the number of input channels to each layer. This, together with the fact that they don't have dense layers, makes SqueezeNets extremely low latency.
MobileNets utilise depth-wise separable convolutions, very similar to the towers in Inception. These also reduce the number of parameters and hence latency. MobileNets also have useful model-shrinking parameters that you can set before training to get exactly the size you want. The Keras implementation can use ImageNet pre-trained weights too.
The other models are very deep, large models. There, the reduced number of parameters / style of convolution is used not for low latency but essentially for the ability to train very deep models. ResNet introduced residual connections between layers, which were originally believed to be key to training very deep models. These aren't seen in the previously mentioned low-latency models.
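As an example of the model-shrinking knob in the Keras MobileNet implementation (alpha is the width multiplier; the values below are just one possible configuration):

    import tensorflow as tf

    # Half-width MobileNet with ImageNet weights; pre-trained weights exist
    # for alpha in {0.25, 0.50, 0.75, 1.0} at the standard input sizes.
    small_mobilenet = tf.keras.applications.MobileNet(
        input_shape=(224, 224, 3),
        alpha=0.5,             # width multiplier: fewer channels, fewer parameters
        weights="imagenet",
        include_top=True,
    )
    small_mobilenet.summary()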
There are different pictures of the same object. The pictures are taken from different angles, so while the object in the pictures is the same, the pictures themselves can be quite different.
Is there an example or ready to use deep learning model that will produce similar/close vectors for different pictures of the same object? (seems like face detection works in a kinda similar way...)
What you are looking for is a Siamese network, in which you pass two images through the same network and try to maximize the distance between dissimilar images and minimize it between similar ones. Another variant uses three images instead of two, with one acting as an anchor: one of the other two belongs to the same class as the anchor and the other to a different class, and you try to minimize and maximize their distances from the anchor respectively. The loss function that achieves this for pairs is the contrastive loss. Look here for an implementation of contrastive loss. You can use any standard architecture in such a setting; I have personally found VGG-16 easy to tune and simple.
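A minimal sketch of such a Siamese setup with a contrastive loss, using a small placeholder CNN tower (VGG-16 from tf.keras.applications could be swapped in as suggested; the input shape and margin are assumptions):

    import tensorflow as tf
    from tensorflow.keras import layers

    def make_tower(embedding_dim=128):
        # Placeholder embedding network; replace with VGG16 etc. if desired.
        return tf.keras.Sequential([
            layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.GlobalAveragePooling2D(),
            layers.Dense(embedding_dim),
        ])

    def contrastive_loss(y_true, distance, margin=1.0):
        # y_true = 1 for pictures of the same object, 0 for different objects.
        y_true = tf.cast(y_true, distance.dtype)
        return tf.reduce_mean(
            y_true * tf.square(distance) +
            (1.0 - y_true) * tf.square(tf.maximum(margin - distance, 0.0)))

    tower = make_tower()  # shared weights: the same network embeds both images
    img_a = layers.Input(shape=(128, 128, 3))
    img_b = layers.Input(shape=(128, 128, 3))
    emb_a, emb_b = tower(img_a), tower(img_b)

    # Euclidean distance between the two embeddings.
    distance = layers.Lambda(
        lambda t: tf.sqrt(tf.reduce_sum(tf.square(t[0] - t[1]), axis=1, keepdims=True) + 1e-9)
    )([emb_a, emb_b])

    siamese = tf.keras.Model([img_a, img_b], distance)
    siamese.compile(optimizer="adam", loss=contrastive_loss)
    # siamese.fit([pairs_a, pairs_b], pair_labels, epochs=10)  # pairs are hypothetical here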
Here are some papers you should look at to understand the math and theory behind these approaches:
Learning visual similarity for product design
Learning a Similarity Metric Discriminatively, with Application to Face Verification
Is there a way to do object detection by retraining the Inception model provided by Google in TensorFlow? The goal is to predict whether an image contains a defined category of objects (e.g. balls) or not. I can think of it as one-class classification, or multi-class classification with only two categories (ball and not-ball images). However, in the latter case I think it would be very difficult to create a good training set (how many and which kinds of not-ball images would I need?).
Yes, there is a way to tell if something is a ball. However, it is better to use Google's TensorFlow Object Detection API. Instead of saying "ball / no ball," it will tell you it thinks something is a ball with a confidence score (e.g. XX%).
To answer your other questions: with object detection, you don't need non-ball images for training. You should gather about 400-500 ball images (more is almost always better), split them into a training and an eval group, and label them with this. Then you should convert your labels and images into a .record file according to this. After that, you should set up Tensorflow and train.
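For the train/eval split step, a tiny helper sketch (the 80/20 ratio and directory names are my own assumptions):

    # Shuffle the labelled ball images and copy them into train/ and eval/ folders.
    import random
    import shutil
    from pathlib import Path

    images = list(Path("ball_images").glob("*.jpg"))  # hypothetical source folder
    random.shuffle(images)
    split = int(0.8 * len(images))

    for subset, files in (("train", images[:split]), ("eval", images[split:])):
        out = Path(subset)
        out.mkdir(exist_ok=True)
        for f in files:
            shutil.copy(f, out / f.name)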
This entire process is not easy. It took me a good couple of weeks with an iOS background to successfully train a single object detector. But it is worth it in the end, because now I can rapidly switch out images to train a different object detector whenever an app needs it.
Bonus: use this to convert your new TF model into a .mlmodel usable by iOS (Core ML).