In theory, as we increase the number of layers of a neural network, the training error should decrease. But in practice it decreases up to a point and then increases. Why does this happen? (As mentioned by Andrew Ng in a course.)
Because the network becomes big enough to memorize the training data and loses its generalization capability.
I have a dataset of about 100 numeric values. I have set the learning rate of my neural network to 0.0001 and have successfully trained it on the dataset for over 1 million iterations. My question is: what is the effect of very low learning rates in neural networks?
A low learning rate mainly implies slow convergence: you're moving down the loss function with smaller steps (the step size is the learning rate).
If your function is convex this is not a problem: you will wait longer, but you'll still reach a good solution.
If, as in the case of deep neural networks, your function is not convex, then a low learning rate can lead you to a "good" optimum that is not the best one (you get stuck in a local minimum, without ever making steps big enough to jump out of it).
That's why there are adaptive optimization algorithms: algorithms such as Adam and RMSProp maintain a different learning rate for each weight in the network (every single learning rate starts from the same value). This way the optimizer can work on every single parameter independently, with the aim of finding a better solution and making the choice of the initial learning rate less critical.
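As a toy illustration of the slow-convergence point, here is a small sketch of my own (not taken from any library) that runs plain gradient descent on the one-dimensional convex function f(w) = (w - 3)^2 and counts the steps needed to get close to the minimum with two different learning rates; the tolerance and the function itself are arbitrary choices:

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def steps_to_converge(lr, tol=1e-3, max_steps=10_000_000):
    w = 0.0
    for step in range(1, max_steps + 1):
        w -= lr * 2 * (w - 3)          # plain gradient descent update
        if abs(w - 3) < tol:
            return step
    return max_steps

print(steps_to_converge(0.1))      # converges in a few dozen steps
print(steps_to_converge(0.0001))   # same optimum, but tens of thousands of steps
```

Both runs reach the same minimum; the low learning rate simply needs far more iterations to get there.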
I was trying to train an emotion recognition model on the fer2013 dataset using the architecture proposed in this paper.
The paper uses a different dataset than mine, so I made some modifications to the stride and filter size.
After a couple of hours of training, accuracy on both the training and test sets suddenly drops.
After that, the accuracy just stays around 0.1-0.2 for both sets and never improves again.
Does anybody know about this phenomenon?
In any neural network training, if both accuracies, i.e. training and validation, improve at first and then start decreasing, it is a sign that your network is failing to converge; more precisely, your optimizer has started overshooting.
The most likely reason for this is a high learning rate. Reduce your learning rate and then check your example again. Also, at least at first glance, I couldn't see a learning rate mentioned in your linked paper. Since your data is different from the paper's, the same learning rate might not work as well.
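If it helps, here is a hedged sketch of one way to do that. It assumes a Keras setup (the question doesn't say which framework is used), and `model`, `x_train`, `y_train`, `x_val` and `y_val` are placeholders for your own model and data:

```python
from keras.callbacks import ReduceLROnPlateau
from keras.optimizers import Adam

# Start from a smaller learning rate than before...
model.compile(optimizer=Adam(lr=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# ...and shrink it further whenever validation accuracy stops improving.
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.5,
                              patience=3, min_lr=1e-6)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=[reduce_lr])
```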
When stacking Boltzmann machines to generatively pre-train a deep neural net, how accurate do the reconstructions need to be? If they are too accurate, can overfitting be a concern? Or is excessively high accuracy only a red flag when doing discriminative fine-tuning?
What is a concern is not burning in the Markov chains enough to suppress high-energy areas of the training set that are far from the initial values. This is typical when using CD(1) or any low-order contrastive divergence. That said, these methods will still typically initialise weights far from the local optima that non-pre-trained nets would get stuck in.
RBMs are also trained with simulated annealing, so they are more likely to explore more of the parameter space.
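For reference, here is a minimal NumPy sketch of a CD(1) update for a binary RBM; it is an illustrative toy (random data, arbitrary sizes and learning rate), not the procedure from any particular paper, but it shows where the reconstruction error fits into training:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v, W, b_vis, b_hid, lr=0.01):
    # Positive phase: hidden activations driven by the data.
    h_prob = sigmoid(v @ W + b_hid)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: one Gibbs step back to a reconstruction.
    v_recon = sigmoid(h_sample @ W.T + b_vis)
    h_recon = sigmoid(v_recon @ W + b_hid)

    # Contrastive divergence gradient estimate (updates arrays in place).
    batch = v.shape[0]
    W     += lr * (v.T @ h_prob - v_recon.T @ h_recon) / batch
    b_vis += lr * (v - v_recon).mean(axis=0)
    b_hid += lr * (h_prob - h_recon).mean(axis=0)

    return np.mean((v - v_recon) ** 2)   # reconstruction error

# Tiny usage example on random binary data.
n_visible, n_hidden = 16, 8
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
data = (rng.random((32, n_visible)) < 0.5).astype(float)
for epoch in range(10):
    err = cd1_update(data, W, b_vis, b_hid)
print(err)
```

The reconstruction error is worth watching as a progress signal, but driving it to zero is not the objective of generative pre-training.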
I also recommend you read the paper Understanding deep learning requires rethinking generalization by Zhang et al. It basically shows how these networks can practically memorise the probability distributions completely and still generalise.
I'm trying to classify hotel image data using a convolutional neural network.
Below are some highlights:
Image preprocessing:
converting to gray-scale
resizing all images to same resolution
normalizing image data
finding pca components
Convolutional neural network:
Input- 32*32
convolution- 16 filters, 3*3 filter size
pooling- 2*2 filter size
dropout- dropping with 0.5 probability
fully connected- 256 units
dropout- dropping with 0.5 probability
output- 8 classes
Libraries used:
Lasagne
nolearn
But I'm getting low accuracy on the test data, only around 28%.
Any possible reasons for such low accuracy? Any suggested improvements?
Thanks in advance.
There are several possible reasons for low accuracy on test data, so without more information and a healthy amount of experimentation, it will be impossible to provide a concrete answer. Having said that, there are a few points worth mentioning:
As @lejlot mentioned in the comments, the PCA pre-processing step is suspicious. The fundamental CNN architecture is designed to require minimal pre-processing, and it's crucial that the basic structure of the image remains intact. This is because CNNs need to be able to find useful, spatially-local features.
For detecting complex objects from image data, it's likely that you'll benefit from more convolutional layers. Chances are, given the simple architecture you've described, that it simply doesn't possess the necessary expressiveness to handle the classification task.
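To make that concrete, here is a rough sketch in Lasagne (since that's the library in use) of what a somewhat deeper stack might look like; the filter counts and layer placement are illustrative guesses on my part, not values taken from the question or any paper:

```python
from lasagne.layers import (InputLayer, Conv2DLayer, MaxPool2DLayer,
                            DenseLayer, DropoutLayer)
from lasagne.nonlinearities import rectify, softmax

# 32x32 grayscale input and 8 output classes, as described in the question.
net = InputLayer(shape=(None, 1, 32, 32))
net = Conv2DLayer(net, num_filters=32, filter_size=(3, 3), nonlinearity=rectify)
net = Conv2DLayer(net, num_filters=32, filter_size=(3, 3), nonlinearity=rectify)
net = MaxPool2DLayer(net, pool_size=(2, 2))
net = Conv2DLayer(net, num_filters=64, filter_size=(3, 3), nonlinearity=rectify)
net = MaxPool2DLayer(net, pool_size=(2, 2))
net = DenseLayer(net, num_units=256, nonlinearity=rectify)
net = DropoutLayer(net, p=0.5)          # dropout kept on the dense layer only
net = DenseLayer(net, num_units=8, nonlinearity=softmax)
```

This keeps the overall shape of your network but adds a second convolution stage and more filters; treat it only as a starting point for experimentation.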
Also, you mention you apply dropout after the convolutional layer. In general, the research I've seen indicates that dropout is not particularly effective on convolutional layers. I personally would recommend removing it to see if it has any impact. If you do wind up needing regularization on your convolutional layers, (which in my experience is often unnecessary since the shared kernels often already act as a powerful regularizer), you might consider stochastic pooling.
Among the most important tips I can give is to build a solid mechanism for measuring the quality of the model and then experiment. Try modifying the architecture and then tuning hyper-parameters to see what yields the best results. In particular, make sure to monitor training loss vs. validation loss so that you can identify when the model begins overfitting.
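For that last point, even a very small plotting script is enough; the lists below are placeholders you would fill from your own training loop, and the numbers are made up purely to show the shape of an overfitting run:

```python
import matplotlib.pyplot as plt

# One averaged value per epoch, recorded inside your training loop.
train_losses = [1.9, 1.5, 1.2, 1.0, 0.8, 0.6, 0.5, 0.4]   # keeps falling
val_losses   = [1.9, 1.6, 1.4, 1.3, 1.3, 1.4, 1.5, 1.7]   # turns back upward

plt.plot(train_losses, label='training loss')
plt.plot(val_losses, label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
# The point where the validation curve turns upward while the training curve
# keeps falling is where the model starts overfitting.
```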
Since the 2012 ImageNet competition, the convolutional neural networks that perform well (state of the art) all add more convolutional layers; they even use zero padding so that more convolutional layers can be stacked.
Increase the number of convolutional layers.
Some say that dropout is not that effective on CNNs. It is not bad to use, but you should lower the dropout value; try maybe 0.2.
Your data should be analysed. If there is too little of it, you should use data augmentation techniques (a minimal example follows below).
If you have much more data for one of the labels, you are stuck with the imbalanced data problem, but you should not worry about that for now.
You could also consider fine-tuning from VGG-Net or some other pre-trained CNN.
Also, don't convert to grayscale; after the image-to-array transformation, you should just divide by 255.
I suspect you learned CNNs from some tutorial (MNIST) and concluded that you should convert the images to grayscale.
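Here is a minimal NumPy sketch of the rescaling and a simple flip-based augmentation mentioned above; the dummy `images` array is a placeholder for your own data (kept in colour, channels last):

```python
import numpy as np

def preprocess(images):
    # Keep the colour channels; just rescale pixel values from 0-255 to [0, 1].
    return images.astype(np.float32) / 255.0

def augment(images):
    # Horizontal flips are a cheap way to double the training data.
    flipped = images[:, :, ::-1, :]
    return np.concatenate([images, flipped], axis=0)

# Dummy batch standing in for real hotel images: (n, height, width, channels).
images = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
x = augment(preprocess(images))
print(x.shape, x.min(), x.max())
```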
I am writing my own neural net library with backpropagation using GPU computing.
I want to make it universal, so that I don't have to check whether the training set fits into GPU memory.
How do you train a neural net when the training set is too large to fit in GPU memory?
I assume that it fits in the RAM of the host.
Must I do the training iteration on the first piece, then deallocate it on the device, send the second piece to the device and train on that, and so on ...
And then sum up the gradient results?
Isn't that too slow, when I must push all the data through the PCIe bus?
Do you have a better idea?
Use minibatch gradient descent: in a loop,
send a batch of samples to the GPU
compute error, backprop gradient
adjust parameters.
Repeat this loop several times until the network converges.
This is not exactly equivalent to the naive batch learning algorithm (batch gradient descent); in fact, it usually converges faster than batch learning. It helps if you randomly shuffle the samples before each pass over the data. You still have the memory transfers, but you don't need as many iterations and the algorithm will run faster.
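As a concrete sketch of that loop, assuming CuPy as the GPU array library (the linear model and the hyper-parameters are just stand-ins for your own network and settings):

```python
import numpy as np
import cupy as cp   # assumes a CUDA GPU with CuPy installed

def train_minibatch(x_host, y_host, batch_size=256, epochs=10, lr=0.01):
    n, d = x_host.shape
    w = cp.zeros(d)                               # parameters stay on the GPU
    for epoch in range(epochs):
        order = np.random.permutation(n)          # shuffle the host-side data
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x = cp.asarray(x_host[idx])           # copy only this batch over PCIe
            y = cp.asarray(y_host[idx])
            grad = 2 * x.T @ (x @ w - y) / len(idx)   # squared-error gradient
            w -= lr * grad                        # update before the next batch
    return w

# Dummy host-side data that would not all need to fit on the device at once.
x_host = np.random.randn(100_000, 32).astype(np.float32)
y_host = (x_host @ np.arange(32, dtype=np.float32)
          + 0.1 * np.random.randn(100_000).astype(np.float32))
w = train_minibatch(x_host, y_host)
print(w[:5])
```

The key point is that only one minibatch is resident on the device at a time, while the parameters stay on the GPU between batches.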