I'm currently coding a Generative Adversarial Network (GAN) from scratch with my own neural network library to generate MNIST handwritten digits. The discriminator seems to work fine, but the generator doesn't really learn anything over time. Maybe my training approach is wrong.
So my question is whether I can actually train my generator this way.
First I train my discriminator on real examples with target output 1, and then on fake examples generated by the generator with target output 0. This works fine.
Next I train the generator by running the discriminator on fake examples, but with target output 1 (the generator wants the discriminator to classify its generated images as real). I backpropagate the error all the way back to the input layer of the discriminator, without updating the discriminator's weights. I then backpropagate this input-layer error through the generator and update the generator based on it.
Can I actually do that and backpropagate the discriminator's error through the generator? The generator essentially provides the input to the discriminator, right? Or is there a better way to do it?
Any help is appreciated.
From your question, I assume you are proposing an approach like this: while training the discriminator, you want to backpropagate all the way back through the generator (to the point where the noise is provided) instead of detaching at the first layer of the discriminator?
If this is the case, then you are updating the generator's parameters with respect to the discriminator's loss. The discriminator's job is to update its own parameters so that it can distinguish real from fake. If you don't stop the backpropagation and let it flow into the generator, the generator's parameters get updated with respect to the discriminator's loss, which pushes the generator towards producing images that are easily distinguished by the discriminator. This creates a mess: you are training the generator to fool the discriminator and, at the same time, your generator is being fooled by the discriminator.
The standard approach is simply:
Generate an image with the generator.
Pass the real image to the discriminator with target 1.
Pass the fake image to the discriminator with target 0 (or vice versa).
Perform backprop and make sure to detach the fake image (fake.detach() in PyTorch), so that backpropagation stops there and does not update the generator's parameters.
Then train the generator by passing the fake image through the discriminator with target 1 (or 0 if you took the vice-versa case above).
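A minimal PyTorch sketch of this loop, only to illustrate where the detach goes; the names (generator, discriminator, opt_g, opt_d, noise_dim) are placeholders, not from your code:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # assumes the discriminator ends in a sigmoid

def train_step(generator, discriminator, opt_g, opt_d, real, noise_dim):
    batch_size = real.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: real -> 1, detached fake -> 0
    opt_d.zero_grad()
    fake = generator(torch.randn(batch_size, noise_dim))
    loss_d = criterion(discriminator(real), real_labels) + \
             criterion(discriminator(fake.detach()), fake_labels)  # detach stops grads into G
    loss_d.backward()
    opt_d.step()

    # Generator step: same fake batch, but target 1 ("fool the discriminator")
    opt_g.zero_grad()
    loss_g = criterion(discriminator(fake), real_labels)  # no detach: grads flow through D into G
    loss_g.backward()
    opt_g.step()  # only the generator's parameters are updated here
    return loss_d.item(), loss_g.item()
```

Note that in the generator step the gradients do flow through the discriminator, but because only opt_g is stepped, the discriminator's weights stay untouched.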
GANs do take a lot of time to train. For the best training results, follow these hacks:
https://github.com/soumith/ganhacks
I have trained a neural network and an XGBoost model for the same problem, and now I am confused about how I should stack them. Should I just pass the output of the neural network as a feature to the XGBoost model, or should I weight their results separately? Which would be better?
This question cannot be answered definitively. I would suggest trying both possibilities and choosing the one that works best.
Using the output of one model as input to the other model
I assume you know what you have to do to use the output of the NN as input to XGBoost. You should just take some time to think about how you handle the train and test data (see below). Use the "probabilities" rather than the binary labels for that. Of course, you could also try it the other way around, so that the NN gets the output of the XGBoost model as an additional input.
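A rough sketch of that stacking setup, assuming nn_clf is your already fitted network exposing a predict_proba-like method (all names here are placeholders):

```python
import numpy as np
from xgboost import XGBClassifier

# nn_clf is assumed to be an already fitted classifier exposing predict_proba
# (e.g. sklearn's MLPClassifier or a wrapped neural network).
nn_train_probs = nn_clf.predict_proba(X_train)[:, 1]  # probabilities, not hard labels
nn_test_probs = nn_clf.predict_proba(X_test)[:, 1]

# Append the NN output as an extra feature column for XGBoost.
X_train_stacked = np.column_stack([X_train, nn_train_probs])
X_test_stacked = np.column_stack([X_test, nn_test_probs])

xgb = XGBClassifier()
xgb.fit(X_train_stacked, y_train)
print(xgb.score(X_test_stacked, y_test))
```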
Using a VotingClassifier
The other possibility is to use a VotingClassifier with soft voting. You can use VotingClassifier(voting='soft') for that (to be precise, sklearn.ensemble.VotingClassifier). You could also play around with the weights here.
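For example (sklearn's MLPClassifier is used here only as a stand-in for your neural network):

```python
from sklearn.ensemble import VotingClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Soft voting averages the predicted class probabilities of both models;
# the optional weights argument biases that average towards one of them.
ensemble = VotingClassifier(
    estimators=[("nn", MLPClassifier(max_iter=500)), ("xgb", XGBClassifier())],
    voting="soft",
    weights=[1, 2],
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```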
Difference
The big difference is that with the first possibility the XGBoost model might learn in which areas the NN is weak and in which it is strong, while with the VotingClassifier the outputs of both models are weighted the same way for all samples. The latter relies on the assumption that the models output a "probability" not too close to 0/1 when they are not confident about the prediction for a specific input record. But this assumption might not always hold.
Handling the train/test data
In both cases, you need to think about how you handle the train/test data. The data should ideally be split the same way for both models; otherwise you might introduce some kind of data-leakage problem.
For the VotingClassifier this is no problem, because it can be used as a regular sklearn model class. For the first method (the output of model 1 is one feature of model 2), you should make sure you do the train-test split (or the cross-validation) with exactly the same records. If you don't, you run the risk of validating your second model on a record that was in the training set of model 1 (apart from the additional feature, of course), and that could cause a data-leakage problem which results in a score that appears better than how the model would actually perform on unseen production data.
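One simple way to keep the split identical for both models is to split once with a fixed seed and reuse the resulting arrays everywhere, e.g.:

```python
from sklearn.model_selection import train_test_split

# Split once with a fixed seed and reuse these exact arrays for both models,
# so model 2 is never validated on a record that model 1 was trained on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Fit the NN on (X_train, y_train), build the stacked features for X_train and
# X_test from it, and fit/evaluate XGBoost on the same split.
```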
I am working on implementing a Generative Adversarial Network (GAN) in PyTorch 1.5.0.
For computing the loss of the generator, I compute both the negative probabilities that the discriminator mis-classifies an all-real minibatch and an all-(generator-generated-)fake minibatch. Then, I back-propagate both parts sequentially and finally apply the step function.
Calculating and back-propagating the part of the loss which is a function of the mis-classifications of the generated fake data seems straightforward, since during back-propagation of that loss term, the backward path leads through the generator, which produced the fake data in the first place.
However, classification of all-real-data minibatches does not involve passing data through the generator. Therefore, I was wondering whether the following code snippet would still calculate gradients for the generator, or whether it would not calculate any gradients at all (since the backward path does not lead through the generator and the discriminator is in eval mode while updating the generator)?
# Update generator #
net.generator.train()
net.discriminator.eval()
net.generator.zero_grad()

# All-real minibatch
x_real = get_all_real_minibatch()
y_true = torch.full((batch_size,), label_fake).long()  # pretend the true targets were fake
y_pred = net.discriminator(x_real)  # softmax probability distribution over (0=label_fake, 1=label_real)
loss_real = nn.NLLLoss()(torch.log(y_pred), y_true)
loss_real.backward()
optimizer_generator.step()
If this doesn’t work as intended, how could I make it work? Thanks in advance!
No gradients are propagated to the generator, as no calculation was performed with any of the generator's parameters. The discriminator being in eval mode would not prevent the gradients from propagating to the generator, albeit they would be slightly different if you are using layers that behave differently in eval mode compared to train mode, such as dropout.
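You can verify this directly: after backpropagating the all-real loss, the generator's parameters have received no gradients. A small check, reusing the names from your snippet:

```python
net.generator.zero_grad()

y_pred = net.discriminator(x_real)                   # no generator involvement at all
loss_real = nn.NLLLoss()(torch.log(y_pred), y_true)
loss_real.backward()

# Every generator parameter still has grad None (or an all-zero tensor if
# gradients had been created by an earlier backward pass).
print(all(p.grad is None or p.grad.abs().sum() == 0
          for p in net.generator.parameters()))
```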
The misclassification of real images is not part of training the generator, because it doesn't gain anything from this information. Conceptually, what should the generator learn from the fact that the discriminator failed to correctly classify a real image? The sole task of the generator is to create a fake image such that the discriminator thinks it's real, therefore the only relevant information for the generator is whether the discriminator was able to identify the fake image. If the discriminator was indeed able to identify the fake image, the generator needs to adjust itself to create a more convincing fake.
Of course it's not a binary case, but the generator always tries to improve the fake image such that the discriminator is even more convinced that it was a real image. The generator's goal is not to make the discriminator be doubtful (probability of 0.5 that it's real or fake), but that the discriminator is fully convinced that it's real, even though it's fake. That's why they are adversarial, not cooperative.
I implemented the GAN model proposed in the paper Edge-Connect (https://github.com/knazeri/edge-connect) in Keras and did some training runs on the KITTI dataset. Now I am trying to figure out what's going on inside my model, and therefore I have a few questions.
1. Initial Training (100 Epochs, 500 batches/epoch, 10 Samples/Batch)
At first I trained the model as proposed in the paper (including the style, perceptual, L1 and adversarial losses).
At first sight, the model converges to nice results:
This is the output of the generator (left) for the masked input (right):
Most of the graphs from the tensorboard look quite good as well:
(These are all values from the GAN model: the total loss of the generator (GENERATOR_Loss), the different losses based on the generated image (L1, perc, style), as well as the adversarial loss (DISCRIMINATOR_loss).)
When looking closely at the discriminator, things look different. The adversarial loss of the discriminator for the generated images steadily increases.
The loss while training the discriminator (50/50 fake/real examples) doesn't change at all:
![](https://i.stack.imgur.com/o5jCA.png)
And when looking at the histogram of activations of the output of the discriminator it always outputs values around 0.5.
Coming to my questions/conclusions where I would appreciate your feedback:
So I assume now that my model learned a lot, but nothing from the discriminator, right? The results are all based on the losses other than the adversarial loss?
It seems that the discriminator could not keep up with the generator producing better images. I think the discriminator's activations should early on move to two peaks at around 0 (fake labels) and 1 (real labels) and stay there?
I know that my final goal is that the discriminator outputs 0.5 probability for real as well as fake... but what does it mean when this happens right from the beginning and doesn't change during training?
Did I stop training too early? Could the discriminator catch up (since the output of the generator doesn't change much anymore) and eliminate the last tiny faults of the generator?
2. Thus I started a second training, this time only using the adversarial loss in the generator! (~16 Epochs, 500 batches/epoch, 10 Samples/Batch)
This time the discriminator seems to be able to differentiate between real and fake after a while.
(prob_real is the mean probability assigned to real images and vice versa)
The histogram of activations looks good as well:
But somehow after around 4k Samples things start to change and at around 7k it diverges...
Also all samples from the generator look like this:
Coming to my second part of questions/conclusions:
Should I pretrain the discriminator so it gets a head start? I guess it needs to be able to differentiate between real and fake (outputting large probabilities for real and vice versa) so that the generator can learn something useful from it? Should I train the discriminator for multiple steps per generator step for the same reason?
What happened in the second training? Was the learning rate for the discriminator too high? (Opt: ADAM, lr=1.0E-3)
Many hints on the internet for training GANs aim at increasing the difficulty of the discriminator's job (label noise/label flipping, instance noise, label smoothing etc.). Here I think the discriminator rather needs to be boosted? (-> I also trained the discriminator without changing the generator and it converges nicely.)
If the discriminator outputs a probability of 0.5 right from the beginning of training, it means that the weights of the discriminator are effectively not being updated and it plays no role in training, which further indicates that it is not able to differentiate between real images and the fakes coming from the generator. To solve this issue, try adding Gaussian noise to the discriminator's input or using label smoothing; both are very simple and effective techniques.
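A minimal PyTorch-style sketch of both tricks (the values 0.9 and 0.05 are arbitrary choices, and the same idea carries over directly to Keras):

```python
import torch

def smooth_and_noisy(images, real_label=0.9, noise_std=0.05):
    # One-sided label smoothing: use 0.9 instead of 1.0 as the "real" target,
    # which keeps the discriminator from becoming over-confident.
    labels = torch.full((images.size(0), 1), real_label)

    # Instance noise: add small Gaussian noise to the discriminator's inputs
    # (apply it to both the real and the fake batches).
    noisy_images = images + noise_std * torch.randn_like(images)
    return noisy_images, labels
```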
Regarding your question of whether the results are all based on the losses other than the adversarial loss: a trick that can be used is to first train the network on all the losses except the adversarial loss, and then fine-tune with the adversarial loss. Hope it helps.
For the second part of your questions, the generated images seem to suffer from mode collapse: the generator tends to learn the color and degradation of one image and reproduces the same thing for the other images. Try to solve this by either decreasing the batch size or using unrolled GANs.
I have a simple PyTorch neural net that I copied from OpenAI, and I modified it to some extent (mostly the input).
When I run my code, the output of the network remains the same on every episode, as if no training occurs.
I want to see if any training happens, or if some other reason causes the results to be the same.
How can I make sure any movement happens to the weights?
Thanks
Depends on what you are doing, but the easiest would be to check the weights of your model.
You can do this (and compare them with the ones from the previous iteration) using the following code:
for parameter in model.parameters():
    print(parameter.data)
If the weights are changing, the neural network is being optimized (which doesn't necessarily mean it learns anything useful in particular).
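If you want an explicit check rather than eyeballing printed tensors, you can snapshot the parameters before an optimization step and compare afterwards; a sketch (compute_loss, batch and optimizer are placeholders for your own training code):

```python
import torch

# Snapshot the parameters, run one optimization step, then check if anything moved.
before = [p.detach().clone() for p in model.parameters()]

loss = compute_loss(model, batch)   # placeholder for your actual forward pass / loss
optimizer.zero_grad()
loss.backward()
optimizer.step()

changed = any(not torch.equal(b, p.detach())
              for b, p in zip(before, model.parameters()))
print("weights changed:", changed)
```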
I am trying to understand how a GAN is trained. I believe I understand the adversarial training process. What I can't seem to find information on is this: do GANs use class labels in the training process? My current understanding says no, because the discriminator is simply trying to discriminate between real and fake images, while the generator is trying to create realistic images (but not images of any specific class).
If this is the case, then how do researchers propose to use the discriminator network for classification tasks? The network would only be able to perform two-way classification between real and fake images. The generator network would also be difficult to use, seeing as we don't know what setting of the input vector 'Z' will result in the required generated image.
It completely depends on the network you are trying to build. If you are talking specifically about the basic GAN, then you are correct. Class labels are not needed, as the discriminator network is only classifying real/fake images. There is a conditional variant of the GAN (cGAN) where you do make use of the class labels in both the generator and the discriminator. This allows you to produce examples of a specific class with the generator and classify them with the discriminator (along with the real/fake classification).
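To make the conditional idea concrete, here is a rough sketch of how the label is typically fed into the generator: an embedding of the class label concatenated with the noise vector (layer sizes here are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, noise_dim=100, n_classes=10, embed_dim=50, img_dim=28 * 28):
        super().__init__()
        self.label_embedding = nn.Embedding(n_classes, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Condition on the class by concatenating a label embedding to the noise.
        x = torch.cat([z, self.label_embedding(labels)], dim=1)
        return self.net(x)

# e.g. gen = ConditionalGenerator(); fake = gen(torch.randn(64, 100), torch.randint(0, 10, (64,)))
```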
From the reading that I have done, the discriminator network is just used as a tool for training the generator, and the generator is the main network of concern. Why would you use the discriminator that you used to train the GAN for classification when you could just use a ResNet or a VGG net for your classification tasks? These networks would work better anyway. You are right, however, that using the original GAN could cause difficulty because of mode collapse and the generator constantly producing the same image. That is why the conditional variant was introduced.
Hope this clears things up!
Do GANs use class labels in the training process?
The author suspected that GANs don't require labels. This is correct. The discriminator is trained to classify real and fake images. Since we know which images are real and which are generated by the generator, we do not need labels to train the discriminator. The generator is trained to fool the discriminator, which also doesn't require labels.
This is one of the most attractive benefits of GANs [1]. Usually, we refer to methods that do not require labels as unsupervised learning. That said, if we had labels, maybe we could train a GAN that uses the labels to improve performance. This idea underlies the follow-up work by [2] who introduced the conditional GAN.
If this is the case, then how do researchers propose to use the discriminator network for classification tasks?
There seems to be a misunderstanding here. The purpose of the discriminator is NOT to act as a classifier on real data. The purpose of the discriminator is to "tell the generator how to improve its fakes". This is done by using the discriminator as a loss function, which we can backpropagate gradients through if it is a neural network. After training, we usually discard the discriminator.
The generator network would also be difficult to use, seeing as we don't know what setting of the input vector 'Z' will result in the required generated image.
It seems the underlying reason for posting the question lies here. The input vector 'Z' is chosen such that it follows some distribution, typically a normal distribution. But then what happens if we take 'Z', a random vector with normally distributed entries, and compute 'G(Z)'? We get a new vector which follows a very complicated distribution that depends on G. The entire idea of GANs is to change G such that this new complicated distribution is close to the distribution of our data. This idea is formalized with f-divergences in [3].
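For reference, the objective from [1] that this describes is the minimax game

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],$$

and [3] generalizes the divergence being minimized between the distribution of G(Z) and the data distribution to arbitrary f-divergences.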
[1] https://arxiv.org/abs/1406.2661
[2] https://arxiv.org/abs/1411.1784
[3] https://arxiv.org/abs/1606.00709