I am currently training a GAN using Pytorch to produce histopathology data for my research. I am using BCE criterion for both Generator and Discriminator. The network is able to produce good quality images but the loss curves are bit mysterious for me.
The generator and discriminator loss curves look like exact mirror images. See the attached tensor-board snip. Can someone tell me why this is happening?
Edit 1: Both generator and discriminator loss curves should show convergence, right?
Thanks a lot in advance!
The training curve you produce is somehow standard when training a GAN. The Generator and Discriminator are going to converge. If you plot the Gen Loss and Dis Loss together, you'll find out adversarial property. In fact, most of the time, validating the model by looking at the generated image is an efficient way.
Here are some of my works for your reference.
Related
I implemented the proposed GAN Model from the Paper Edge-Connect (https://github.com/knazeri/edge-connect) in Keras and did some trainings on the KITTI dataset. Now I am trying to figure out what's going on inside my model and therefore I have a few questions.
1. Initial Training (100 Epochs, 500 batches/epoch, 10 Samples/Batch)
At first I trained the model as proposed in the paper (incuding style-, perceptual-, L1- and adversarial loss)
At first sight, the model converges to nice results:
This is the output of the generator(left) for the masked input(right)
Most of the graphs from the tensorboard look quite good as well:
(These are all values from the GAN-Model, containing the total loss of the generator(GENERATOR_Loss), different losses based on the generated image (L1, perc, style) as well as the adversarial loss (DISCRIMINATOR_loss)
When closely looking at the discriminator, things look different. The adversarial loss of the discriminiator for the generated images steadly increases.
The loss while training the discriminator (50/50 fake/real examples) doesn't change at all:
![] (https://i.stack.imgur.com/o5jCA.png)
And when looking at the histogram of activations of the output of the discriminator it always outputs values around 0.5.
Coming to my questions/conclusions where I would appreciate your feedback:
So I assume now, that my model learned a lot but nothing from the discriminator, right? The results are all based on the losses other
than the adversarial loss?
It seems that the Discriminator could not keep up with the generator generating better images. I think the discriminators activations should somehow early move to two peaks at around 0 (fake labels) and 1 (real lables) and stay there?
I know that my final goal is that the discriminator outputs 0.5 probability for real as well as fake... but what does it mean when this happens right from the beginning and doesn't change during training?
Did I stop training too early? Could the discriminator catch up (since the output of the generator doesn't change much anymore) and eliminate the last tiny faults of the generator?
2. Thus I started a second training, this time only using the adversarial loss in the generator! (~16 Epochs, 500 batches/epoch, 10 Samples/Batch)
This time the discriminator seems to be able to differentiate between real and fake after a while.
(prob_real is the mean probability assigned to real images and vice versa)
The histogram of activations looks good as well:
But somehow after around 4k Samples things start to change and at around 7k it diverges...
Also all samples from the generator look like this:
Coming to my second part of questions/conclusions:
Should I pretrain the discriminator so it gets a head start? I guess it needs to somehow be able to differentiate between real and fake (outputting large probabilites for real and vice versa) so the generator can learn usefull things from it? Should I train the discriminator multiple times while training the generator one step for the same reason?
What happend in the second training? Was the learn rate for the discriminator too high? (Opt: ADAM, lr=1.0E-3)
Many hints on the internet for training GANs aim for increasing the difficulty of the discriminators job (Label noise/label flipping, instance noise, label smoothing etc). Here I think the discriminator rather needs to be boosted? (-> I also trained the Disc without changing the generator and it converges nicely)
If discriminator outputs 0.5 probability directly in the beginning of the network it means that the weights of the discriminator are not being updated and it plays no role in training, which further indicates it is not able to differentiate between real and fake image coming from the generator. To solve this issue try to add Gaussian noise as an input to the discriminator or do label smoothing which are very simple and effective techniques.
In answer to your this question, that The results are all based on the losses other than the adversarial loss , the trick that can be used is try to train the network first on all the losses except the adversarial loss and then fine tune on the adversarial losses, hope it helps.
For the second part of your questions, the generated images seem to face the problem of mode collpase where they tend to learn color, degradation from 1 image and pass the same to the other images , try to solve it out by either decreasing the batch size or using unrolled gans,
Any idea about why our training loss is smooth and our validation loss is that noisy (see the link) across epochs? We are implementing a deep learning model for diabetic retinopathy detection (binary classification) using the data set of fundus photographs provided by this Kaggle competition. We are using Keras 2.0 with Tensorflow backend.
As the data set is too big to fit in memory, we are using fit_generator, with ImageDataGenerator randomly taking images from training and validation folders:
# TRAIN THE MODEL
model.fit_generator(
train_generator,
steps_per_epoch= train_generator.samples // training_batch_size,
epochs=int(config['training']['epochs']),
validation_data=validation_generator,
validation_steps= validation_generator.samples // validation_batch_size,
class_weight=None)
Our CNN architecture is VGG16 with dropout = 0.5 in the last two fully connected layers, batch normalization only before the first fully connected layer, and data augmentation (consisting on flipping the images horizontally and vertically). Our training and validation samples are normalized using the training set mean and standard deviation. Batch size is 32. Our activation is a sigmoid and the loss function is the binary_crossentropy. You can find our implementation in Github
It definitely has nothing to do with overfitting, as we tried with a highly regularized model and the behavior was quite the same. Is it related with the sampling from the validation set? Has any of you had a similar problem before?
Thanks!!
I would look, in that order:
bug in validation_generator implementation (incl. steps - does it go through all pics reserved for validation?)
in validation_generator, do not use augmentation (reason: an augmentation might be bad, not learnable, and at train, it does achieve a good score only by hard-coding relationships which are not generalizable)
change train/val split to 50/50
calculate, via a custom callback, the validation loss at the end of the epoch (use the same function, but calling it with a callback produces different (more accurate, at certain, non-standard models) results)
If nothing of the above gives a more smooth validation loss curve, then my next assumption would be that this is the way it is, and I might need to work on the model architecture
I'm suffering some problems in my model. The model seems not learning because the accuracy remains the same value.
I've been trying to solve this training problem for several days and I haven't figure out what's wrong with my model, so I try to take advantage of tensorboard.
The result shown in the tensorboard:
NOTE: Only fc6, fc7, fc8 layers are to be trained, since the previous conv layers are pretrained.
graph
accuracy
cross_entropy
distribution of fc6 layer
histogram of three trained fc layers
Can I know from the graph diagram that the backprop is conducted properly? Or the backprop is to be done by tensorflow and won't be presented here?
From the histogram, I suppose the parameters are, indeed, trained in the first epoch, but I don't know why it seems not to be trained in the following epochs.
I wonder how I can interpret the diagrams and extract useful information so that I can get to know where might be the incorrect part of my code. (I've searched how to interpret tensorboard diagrams and I can't get any inspiration on how can I interpret my own.)
I can give the code if it's needed, but it's a bit long so I'm not presenting it now.
I have a set of images of a particular object. I want to find if some of these has anomalies with a machine learning algorithm. For example if I have many photos of glasses I want to find if one of these is broken or has something anomalous. Something like this:
GOOD!!
BAD!!
(Obviously I will use the same kind of glasses...)
The problem is that I don't know every negative situation, so, for training, I have only positive images.
In other words I want an algorithm that recognize if an image has something different from the dataset. Do you have any suggestion?
In particular is there a way to use convolutional neural network?
What you are looking for is usually called anomaly, outlier, or novelty detection. You have lots of examples of what your data should look like, and you want to know when something doesn't look like your data.
A good approach for this problem, since you are using images, you can get a feature vectorized version using a pre-trained CNN on image net. Then you can use an anomaly detector on that feature set. The isolation forest should be an easier one to get working.
This is a typical Classification problem. I do not understand why you need CNN for this ......
My suggestion would be to build/train a classification model
comprising of only GOOD images of glass. Here you would possibly
have all kinds of glasses that are intact with a regular shape.
If the model encounters anything other than GOOD images, it will
classify those as BAD images. This so called BAD images may
include cracked/broken glasses having an irregular shape.
Another option that might work is to use an autoencoder.
Autoencoders are unsupervised neural networks with bottleneck architecture that try to reconstruct its own input.
We could train a deep convolutional autoencoder with examples of good glasses so that it gets specialized in reconstructing those type of images. You don't need to train autoencoder with bad glasses.
Therefore I would expect the trained autoencoder to produce low error rate for good glasses and high error rate for bad glasses. Error rate could be measured with MSE based on the difference between the reconstructed and original values (pixels).
From the trained autoencoder you can plot the MSEs for good vs bad glasses to help you define the right threshold. Or you can also try statistic thresholds such as: avg + 2*std, median + 2*MAD, etc.
Autoencoder details:
http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/
Deep autoencoder for images:
https://cds.cern.ch/record/2209085/files/Outlier%20detection%20using%20autoencoders.%20Olga%20Lyudchick%20(NMS).pdf
I'm working on binary classification problem using Apache Mahout. The algorithm I use is OnlineLogisticRegression and the model which I currently have strongly tends to produce predictions which are either 1 or 0 without any middle values.
Please suggest a way to tune or tweak the algorithm to make it produce more intermediate values in predictions.
Thanks in advance!
What is the test error rate of the classifier? If it's near zero then being confident is a feature, not a bug.
If the test error rate is high (or at least not low), then the classifier might be overfitting the training set: measure the difference between of the training error and the test error. In that case, increasing regularization as rrenaud suggested might help.
If your classifier is not overfitting, then there might be an issue with the probability calibration. Logistic Regression models (e.g. using the logit link function) should yield good enough probability calibrations (if the problem is approximately linearly separable and the label not too noisy). You can check the calibration of the probabilities with a plot as explained in this paper. If this is really a calibration issue, then implementing a custom calibration based on Platt scaling or isotonic regression might help fix the issue.
From reading the Mahout AbstractOnlineLogisticRegression docs, it looks like you can control the regularization parameter lambda. Increasing lambda should mean your weights are closer to 0, and hence your predictions are more hedged.