I've set up a GAN that works for generating mnist digits, and I am now trying to get it working with cars from cifar10. However, the outputs for cars from cifar10 are usually monochrome and periodically switch to different colors as the generator loss decreases and the discriminator loss increases.
Here are the generator and discriminator loss graphs when training on mnist:
In contrast, the losses for cifar10 are flat for long periods before suddenly spiking up:
For cifar10 the GAN generates nearly monochrome images of varying shades for example:
After about 10 epochs it starts outputting interesting textures that don't bear any resemblance to the images:
Letting it run for 50 epochs doesn't improve image quality. The pattern in the loss becomes less apparent, but you can still see the regular spikes at the start of training:
I've tried messing with the learning rates, activation functions, number of layers, convolutional kernel size, noisy labels, flipped labels, noisy discriminator input and so forth, but nothing seems to work. It seems more likely that there is a larger problem than just incorrect hyperparameters.
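(For concreteness, the noisy / flipped labels I tried look roughly like the sketch below; it is a minimal NumPy snippet rather than my actual training code, and the smoothing amount and flip probability are just example values.)

```python
import numpy as np

def discriminator_targets(batch_size, real=True, smooth=0.1, flip_prob=0.05):
    """Noisy ("smoothed") and occasionally flipped labels for the
    discriminator. Sketch only; smooth and flip_prob are example values."""
    if real:
        # one-sided label smoothing: real targets land in [1 - smooth, 1.0]
        targets = 1.0 - smooth * np.random.rand(batch_size)
    else:
        # fake targets land in [0.0, smooth]
        targets = smooth * np.random.rand(batch_size)
    # randomly flip a small fraction of the labels
    flip_mask = np.random.rand(batch_size) < flip_prob
    targets[flip_mask] = 1.0 - targets[flip_mask]
    return targets
```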
Does anyone recognize this abnormal pattern in the loss?
Related
I am a little bit confused about how to normalize/standardize image pixel values before training a convolutional autoencoder. The goal is to use the autoencoder for denoising, meaning that my training data consists of noisy images, with the original non-noisy images used as ground truth.
To my knowledge there are two options to pre-process the images:
- normalization
- standardization (z-score)
When normalizing with the min-max approach (scaling between 0 and 1) the network works fine, but my question here is:
- When using the min/max values of the training set for scaling, should I use the min/max of the noisy images or of the ground-truth images?
The second thing I observed when training my autoencoder:
- Using z-score standardization, the loss decreases for the first two epochs, but after that it stops at about 0.030 and stays there (it gets stuck). Why is that? With normalization the loss decreases much further.
Thanks in advance,
cheers,
Mike
[Note: This answer is a compilation of the comments above, for the record]
Min-max scaling is really sensitive to outliers and to some types of noise, so it shouldn't be used in a denoising application. You can use the 5% and 95% quantiles instead, or use the z-score (for which ready-made implementations are more common).
For more realistic training, the normalization statistics should be computed on the noisy images, since those are what the network will actually see.
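A minimal sketch of that quantile-based alternative, with the statistics taken from the noisy images (the 5%/95% percentiles and the clipping are one reasonable choice, not the only one):

```python
import numpy as np

def robust_scale(noisy_images, low_pct=5, high_pct=95):
    """Scale using the 5%/95% quantiles of the noisy training images
    instead of their min/max, so isolated noise spikes do not dominate."""
    lo, hi = np.percentile(noisy_images, [low_pct, high_pct])
    scaled = np.clip((noisy_images - lo) / (hi - lo), 0.0, 1.0)
    return scaled, (lo, hi)

# The same (lo, hi) pair is then reused to scale the ground-truth images
# and any new data at inference time.
```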
Because the last layer uses a sigmoid activation (info from your comments), the network's outputs are forced to lie between 0 and 1. That makes it unsuitable for an autoencoder trained on z-score-transformed images, whose target intensities can take arbitrary positive or negative values. The identity activation (called linear in Keras) is the right choice in this case.
Note however that this remark on activation only concerns the output layer; any activation function can be used in the hidden layers. Rationale: negative values in the output can still be obtained through negative weights multiplying the ReLU outputs of the hidden layers.
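For the z-score case, the only change that matters here is that last layer. A minimal Keras sketch, where the input shape and layer sizes are placeholders rather than anything taken from your model:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Toy denoising autoencoder; the 64x64x1 input and the filter counts are
# placeholder values, not taken from the question.
inputs = keras.Input(shape=(64, 64, 1))
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = layers.UpSampling2D(2)(x)
# 'linear' (identity) output: reconstructions are not squashed into [0, 1],
# so z-score-transformed targets can be matched.
outputs = layers.Conv2D(1, 3, padding="same", activation="linear")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
```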
I'm fine-tuning AlexNet for face detection, following this: link
The only difference from the link is that I am using another dataset (FaceScrub, plus some images from ImageNet as negative examples).
I noticed the accuracy increasing too fast: in 50 iterations it goes from 0.308 to 0.967. When it reaches about 0.999 I stop the training and run the model with the same Python script as in the link above.
For testing I use an image from the dataset, and the result is nowhere near good (test image result). As you can see, the box around the faces is too big (even though the dataset images are tightly cropped), not to mention that the box doesn't actually contain a face.
My solver and train_val files are exactly the same; the only differences are the batch sizes and the max iteration count.
The reason was that my dataset had way more face examples than non-face examples. I tried the same setup with an equal number of positive and negative examples, and now the accuracy increases more slowly.
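(For the record, the balancing can be as simple as the sketch below; it assumes the samples are plain (image_path, label) lists, which is a simplification rather than my actual data-preparation code.)

```python
import random

def balance(face_samples, nonface_samples, seed=0):
    """Undersample the larger class so both classes contribute the same
    number of examples. 'Samples' are assumed to be (image_path, label)
    pairs; this is an illustration, not the actual Caffe data pipeline."""
    random.seed(seed)
    n = min(len(face_samples), len(nonface_samples))
    balanced = random.sample(face_samples, n) + random.sample(nonface_samples, n)
    random.shuffle(balanced)
    return balanced
```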
I am testing printed digits (0-9) on a convolutional neural network. It gives 99+% accuracy on the MNIST dataset, but when I tried it with fonts installed on my computer (Arial, Calibri, Cambria, Cambria Math, Times New Roman) and trained on the images generated from those fonts (104 images per font; 25 fonts in total, 4 images per font with small differences), the training error rate does not go below 80%, i.e. 20% accuracy. Why?
Here is "2" number Images sample -
I resized every image 28 x 28.
Here is more detail :-
Training data size = 28 x 28 images.
Network parameters - As LeNet5
Architecture of Network -
Input Layer - 28x28
| Convolutional Layer - (ReLU activation)
| Pooling Layer - (Tanh activation)
| Convolutional Layer - (ReLU activation)
| Local Layer (120 neurons) - (ReLU)
| Fully Connected Layer - (Softmax activation, 10 outputs)
This works, giving 99+% accuracy on MNIST. Why is it so bad with computer-generated fonts? A CNN can handle a lot of variance in data.
I see two likely problems:
Preprocessing: MNIST images are not just 28px x 28px; they are also preprocessed:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
Source: MNIST website
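If you want your font images to look like MNIST inputs, you can roughly reproduce that preprocessing yourself. Below is a sketch (it assumes white digits on a black background, uses PIL and SciPy, and is an approximation, not the exact NIST pipeline):

```python
import numpy as np
from PIL import Image
from scipy import ndimage

def mnist_style_preprocess(img):
    """Roughly reproduce the MNIST preprocessing for a grayscale PIL image
    with white ink on a black background (invert your images first if they
    are black-on-white). Sketch only, not the exact NIST pipeline."""
    arr = np.asarray(img, dtype=np.float32)
    # crop to the bounding box of the ink
    ys, xs = np.nonzero(arr)
    arr = arr[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # resize so the longer side is 20 px, preserving the aspect ratio
    # (bilinear resampling produces the anti-aliased grey levels)
    h, w = arr.shape
    scale = 20.0 / max(h, w)
    new_size = (max(1, int(round(w * scale))), max(1, int(round(h * scale))))
    arr = np.asarray(Image.fromarray(arr).resize(new_size, Image.BILINEAR))
    # paste into a 28x28 canvas, then shift so the center of mass is central
    canvas = np.zeros((28, 28), dtype=np.float32)
    top = (28 - arr.shape[0]) // 2
    left = (28 - arr.shape[1]) // 2
    canvas[top:top + arr.shape[0], left:left + arr.shape[1]] = arr
    cy, cx = ndimage.center_of_mass(canvas)
    return ndimage.shift(canvas, (round(14 - cy), round(14 - cx)))
```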
Overfitting:
MNIST has 60,000 training examples and 10,000 test examples. How many do you have?
Did you try dropout (see paper)?
Did you try dataset augmentation techniques? (e.g. slightly shifting the image, perhaps changing the aspect ratio a bit; you could also add noise, though I don't think that will help.) A sketch combining dropout and shift augmentation follows this list.
Did you try smaller networks? (And how big are your filters / how many filters do you have?)
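A sketch of what the dropout / augmentation / smaller-network suggestions could look like in Keras; the layer sizes, shift range and other values are arbitrary examples, not tuned recommendations:

```python
from tensorflow import keras
from tensorflow.keras import layers

# A smaller LeNet-style network with dropout before the classifier.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(8, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
# labels assumed one-hot encoded
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Simple augmentation: random shifts of up to 2 pixels plus a small zoom.
augment = keras.preprocessing.image.ImageDataGenerator(
    width_shift_range=2, height_shift_range=2, zoom_range=0.1)
# model.fit(augment.flow(x_train, y_train, batch_size=32), epochs=...)
```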
Remarks
Interesting idea! Did you try simply applying the trained MNIST network on your data? What are the results?
It may be an overfitting problem. That can happen when your network is too complex for the problem it is trying to solve.
Check this article: http://es.mathworks.com/help/nnet/ug/improve-neural-network-generalization-and-avoid-overfitting.html
It definitely looks like an issue of overfitting. I see that you have two convolutional layers, two max-pooling layers and two fully connected layers. But how many weights in total? You only have 96 examples per class, which is certainly fewer than the number of weights in your CNN. Remember that you want at least 5 times more instances in your training set than weights in your CNN.
You have two solutions to improve your CNN:
Jitter each instance in the training set: shift each number by about 1 pixel in each direction. That alone multiplies your training set by 9 (see the sketch after this list).
Use a transformer layer. It will add an elastic deformation to each number at each epoch. This strengthens the learning a lot by artificially increasing your training set, and it also makes the network much better at predicting other fonts.
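A minimal sketch of the 1-pixel jitter from point 1 (np.roll wraps pixels around the border, which is harmless when the digit is surrounded by background):

```python
import numpy as np

def shifted_copies(image):
    """Return the 9 one-pixel shifts (including the original) of a 28x28
    digit image, multiplying the training set by 9."""
    shifts = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            shifts.append(np.roll(np.roll(image, dy, axis=0), dx, axis=1))
    return np.stack(shifts)
```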
I am attempting to train a tanh neural network with 2 hidden layers on the MNIST data set using the ADADELTA algorithm.
Here are the parameters of my setup:
Tanh activation function
2 Hidden layers with 784 units (same as the number of input units)
I am using softmax with cross entropy loss on the output layer
I randomly initialized the weights with a fan-in of ~15, with Gaussian-distributed weights of standard deviation 1/sqrt(15)
I am using a minibatch size of 10 with 50% dropout.
I am using the default parameters of ADADELTA (rho=0.95, epsilon=1e-6); the update rule I mean is sketched after this list
I have checked my derivatives vs automatic differentiation
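(To be explicit about what "default parameters" means, the update I am using follows the standard ADADELTA form below; this is a sketch for reference, not my actual training loop.)

```python
import numpy as np

def adadelta_step(w, grad, Eg2, Edx2, rho=0.95, eps=1e-6):
    """One standard ADADELTA update (Zeiler 2012) for a weight array w,
    given its gradient and the two running averages."""
    Eg2 = rho * Eg2 + (1 - rho) * grad ** 2           # running avg of g^2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2           # running avg of dx^2
    return w + dx, Eg2, Edx2
```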
When I run ADADELTA, at first it makes gains in the error, and I can see that the first layer is learning to identify the shapes of digits. It does a decent job of classifying the digits. However, when I run ADADELTA for a long time (30,000 iterations), it's clear that something is going wrong: while the objective function stops improving after a few hundred iterations (and the internal ADADELTA variables stop changing), the first-layer weights still have the same sparse noise they were initialized with (despite real features being learned on top of that noise).
To illustrate what I mean, here is the example output from the visualization of the network.
Notice the pixel noise in the weights of the first layer, despite them having structure. This is the same noise that they were initialized with.
None of the training examples have discontinuous values like this noise, but for some reason the ADADELTA algorithm never reduces these outlier weights to be in line with their neighbors.
What is going on?
I'm training my neural network to classify some things in an image. I crop 40x40 pixel patches and classify each one as containing some object or not. So the network has 1600 input neurons, 3 hidden layers (500, 200, 30) and 1 output neuron that must output 1 or 0. I use the Flood library.
I cannot train it with QuasiNewtonMethod, because that algorithm uses a big matrix that does not fit in my memory. So I use GradientDescent, and the ObjectiveFunctional is NormalizedSquaredError.
The problem is that during training the weights overflow, and the output of the neural network becomes INF or NaN for every input.
Also, my dataset is too big (about 800 MB as CSV) to load fully, so I split it into many InputTargetDataSets of 1000 instances each, saved as XML (the default format for Flood), and train for one epoch on each dataset in random order. But even when I train on just one bigger dataset (10000 instances) it overflows.
Why is this happening and how can I prevent that?
I would recommend normalizing the inputs. You should also keep in mind that with 1600 input neurons, the weighted sums feeding the next layer can become very large (especially with sigmoid neurons), and that can cause many problems.
It is quite useful to print out some intermediate steps, for example to see at which step it overflows.
As for the initial weights, I would recommend very small values (< 0.01). If you could give more info about the network and the ranges of the inputs, weights, etc., I could give you some other ideas.
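A sketch of the normalization and small-weight suggestions (plain NumPy, not Flood-specific; the 0-255 pixel range and the 0.01 scale are assumptions):

```python
import numpy as np

def normalize_inputs(pixels):
    """Scale 40x40 patches to [0, 1]; assumes 0-255 grayscale values."""
    return pixels.astype(np.float32) / 255.0

def small_random_weights(n_in, n_out, scale=0.01):
    """Very small uniform initial weights, as recommended above."""
    return np.random.uniform(-scale, scale, size=(n_in, n_out))
```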
And by the way, I think it has been mathematically proven that two hidden layers should be enough, so there is no need for three unless you are using some specialized architecture that simulates the human eye.