I have a deep conv network that recognizes 64x96 single-channel images.
I had forgotten to add normalization to the images: (image - image.mean()) / stddev.
The network converged very fast to 85% or so, but gave a HUGE loss.
I found my error and added normalization like this:
image = (image - image.mean()) / np.std(image)
For some reason it stopped converging at all after that.
I tried increasing the learning rate, but it did not help.
Could anyone please help me understand what is actually happening?
UPDATE: Changed
np.std(image, axis = 0)
to
np.std(image)
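For reference, a minimal sketch of the per-image normalization, with a small epsilon guard (my addition, not in the original code) to avoid dividing by zero on constant images:

import numpy as np

def normalize(image, eps=1e-8):
    # Zero-mean, unit-variance scaling computed over the whole image.
    image = image.astype(np.float32)
    return (image - image.mean()) / (np.std(image) + eps)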
I think the problem you are facing is a very high learning rate. High learning rates cause training to produce huge loss values at the start and then fail to converge. Use a lower learning rate from the start and see if the network converges. You can also follow a strategy of reducing the learning rate as the network trains. This link will be very helpful for training your network.
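As an illustration of that decay strategy, a sketch in Keras; the toy model, the number of classes, and the schedule values are all made up for the example:

import tensorflow as tf

# Toy model just to demonstrate the callback; replace with your own network.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 96, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

def schedule(epoch, lr):
    # Halve the learning rate every 10 epochs (illustrative values).
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)
# model.fit(x_train, y_train, epochs=50, callbacks=[lr_callback])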
Related
I was trying to replicate the results of Striving for Simplicity: The All Convolutional Net, but I can't seem to. I've copied all the hyperparameters from the implementation provided by StefOe and I'm not sure what else to try. I get ~0.2 error instead of ~0.1. Has anyone been able to replicate this result in PyTorch?
The training looks fine, so I'm not sure what is going wrong.
Other related links:
https://discuss.pytorch.org/t/pytorch-net-from-striving-for-simplicity-the-all-convolutional-net/19297/2
Code:
https://github.com/StefOe/all-conv-pytorch/blob/master/allconv.py
https://github.com/StefOe/all-conv-pytorch/blob/master/cifar10.ipynb
You can get down to at least 13% error when using SGD with lr 0.01, momentum 0.9, and batch size 32. I use the data augmentation as given in the implementation. The paper also uses a weight decay of 0.001; for me that led to 14%.
So the implementation uses a different learning rate, and no whitening or contrast normalization is applied. Maybe those last two are the key to getting to 7.25%.
Anyway, the results are pretty weird, since without data augmentation the network should already reach 9.08%.
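For what it's worth, the optimizer settings mentioned above look roughly like this in PyTorch; the tiny placeholder network is only there so the snippet runs, so substitute the AllConvNet from the linked repository:

import torch.nn as nn
import torch.optim as optim

# Placeholder model; swap in the AllConvNet from the linked allconv.py.
model = nn.Sequential(nn.Conv2d(3, 96, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(96, 10))

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                      weight_decay=0.001)  # weight decay as described in the paper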
I'm a newcomer to machine learning and Stack Overflow.
Recently, I have been trying to create a machine learning algorithm that estimates the direction of a light source based on the reflection of an object.
I know this may be a complicated subject, and that's why, as a first step, I tried to simplify it as much as possible.
I first turned my regression problem into a classification problem with only two outputs: the light source is on the left side of the object, or the light source is on the right side of the object.
I am also varying only one angle for my dataset.
Short version of my question:
Do you think it is possible to do such a thing with machine learning? (My experience is too limited to really be sure.)
If yes, which model would be best suited: a CNN? R-CNN? LSTM? SVM?
What would the pipeline be to complete this task?
I am currently using the Unity engine, with a directional light that takes a random X angle in [10,60] / [120,170] and a sphere with metallic reflection, to create and label a dataset. Here is an example:
https://imgur.com/a/FxNew Label : 0 (Left side)
https://imgur.com/a/9KFhi Label : 1 (Right side)
For the pre-processing :
Images are resized to a 64x64 image
Transformed from RGB to grayscale format.
For the machine learning, I'm currently using TensorFlow and a convolutional neural network with (a rough code sketch follows this list):
10000 balanced, labeled 64x64 grayscale pictures as input and 0/1 as label
3 convolutional layers with [16, 32, 64] filters of size [5,5], ReLU
3 pooling layers with size [2,2] and stride [2,2]
1 dense layer with 1024 hidden neurons and dropout (rate = 0.4), ReLU
1 dense layer with 2 output neurons (1 per class), softmax
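For concreteness, here is roughly that architecture written out in Keras; this is a sketch rather than my exact code, and the "same" padding and Adam optimizer are assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (5, 5), padding="same", activation="relu",
                           input_shape=(64, 64, 1)),
    tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2)),
    tf.keras.layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2)),
    tf.keras.layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])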
As for the issue: my network is simply not learning. The loss hardly goes down, and the accuracy shows that the results are no better than random, whatever the data, the number of layers, the optimizer, the learning rate, ... My output just averages between the two classes: [0.5, 0.5].
My guess is that the problem is more complicated than I first thought, that my data doesn't give a good hint of what my prediction should be, and that I should instead train a network that detects the reflection dot on the object and then use the orientation between the center of the object and the dot. Am I right?
Another guess is that the convolutional layers don't take position into account, so for the convolution part all the images look the same, since the sphere is always the same, as is the lighting pattern. The network will always detect the same features and won't take into account that the light region has moved. Do you have any advice on which network I could use to resolve this issue?
I'm really looking for advice and warnings on how to tackle this kind of task.
Please remember that I am still pretty new to machine learning and still learning more than my machines hehe...
Thank you.
Do you think that it is possible to do such thing with machine learning ?
Absolutely. And you've correctly chosen a CNN model - it's the best suited for this task.
My guess is that the problem is more complicated than I first thought, that my data doesn't give a good hint of what my prediction should be, and that I should instead train a network that detects the reflection dot on the object and then use the orientation between the center of the object and the dot. Am I right?
No, CNNs have proven to classify pretty well from raw pixels. The network should figure out by itself what to pay attention to.
Do you have any advice on which network I could use to resolve this issue ?
It would be great if you provided your full code. There are many possible reasons for not learning: image pre-processing bugs, data mislabeling, a poor choice of hyperparameters (learning rate, initialization, ...), the wrong loss function, etc. There can simply be bugs.
What I suggest right away, based on the described CNN architecture:
5x5 filter size is probably too large, since you don't have that many filters. Try 3x3 and increase the number of filters a bit, e.g. 32 - 64 - 64.
I assume that you use CONV - POOL - CONV - POOL - CONV - POOL, not CONV - CONV - CONV - POOL - POOL - POOL. Just to make sure.
You probably don't need so many neurons in your FC layer. You have just two classes and pretty similar images! Reduce 1024 to say 256.
You don't experience any overfitting at the moment, so disable the dropout for now: keep_probability=1.0.
Pay attention to initialization and the learning rate. Try different values on a log scale, e.g. learning_rate = 0.1, 0.01, 0.001, and check if the learning pattern ever changes.
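For example, such a log-scale sweep can be as simple as the loop below; build_and_train is a hypothetical helper that trains your model and returns validation accuracy:

# Sketch of a log-scale learning-rate sweep.
# `build_and_train` is a hypothetical helper, not part of any library.
for lr in [0.1, 0.01, 0.001, 0.0001]:
    val_acc = build_and_train(learning_rate=lr)
    print("lr=%g -> validation accuracy=%.3f" % (lr, val_acc))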
Thank you to @Maxim for his answer. It was very helpful and helped me solve my problem, as well as refine my network.
He pointed me to the problem: data mislabeling.
I was pretty sure about my data labeling but verified anyway.
The problem was there...
I'm writing the answer here so it can maybe help other unaware TensorFlow users:
When you use tf.string_input_producer without specifying otherwise, the default is shuffle=True, which shuffles your filename queue.
Since I use a .csv file for the labels and a folder of .png files for the images, the labels were read in order from 1 to 10000, whereas the .png files were read in random order.
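A minimal sketch of the fix; the file paths are made up, and in the TF 1.x API the producer lives under tf.train:

import tensorflow as tf

image_files = ["images/%05d.png" % i for i in range(1, 10001)]  # illustrative paths

# Either disable shuffling so the files stay aligned with the CSV order...
filename_queue = tf.train.string_input_producer(image_files, shuffle=False)

# ...or, if you do want shuffling, give every related queue the same seed
# so filenames and labels stay in sync:
# filename_queue = tf.train.string_input_producer(image_files, shuffle=True, seed=42)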
I feel very dumb about it, but that's how you learn hehe.
I'm doing some programming with neural network backpropagation.
I have about 90 data points and train with all of them (90 samples), then test on the same 90 samples. Using an iteration threshold of about 2 iterations, it gave me quite a big error (about 60% MAPE, mean absolute percentage error).
I'm afraid I've got the algorithm wrong, since the only way to get the training error below the 10% threshold is to use an iteration threshold of around 3000k iterations, and that training takes quite a long time (I'm not using momentum, just a plain backpropagation neural network). But the test accuracy is around 95-99% after training under those conditions.
Is this normal? Or my program is work as it shouldn't be?
Of course, it will depend on the data set used, but I wouldn't be surprised if you got an error below 1% even for highly nonlinear data (I've seen this, for example, with sales data). As long as you separate the training and test data sets, the error is expected to rise, but with the same set it should drop to zero if there are enough hidden units. The capacity of an ANN to fit nonlinear data is huge (and, of course, the more fitted, the less general).
So, I would look for some program bug instead.
You say 3000k iterations, but I assume you mean 3k, i.e. 3000. The other answer says there might be a bug in your code, but 3000 iterations for a problem with 90 samples is definitely normal.
You cannot expect a neural network to fit a training set with just 2 iterations, especially with a low learning rate.
TL;DR - you have nothing to worry. 3000 iterations is fine.
I'm kind of new to the subject and built a convolutional neural network based on Google's TensorFlow. I wanted to classify a test data set of pictures belonging to 10 categories. My CNN setup is aligned with the TensorFlow tutorial, with some amendments to fit my images' size.
I ran the training step repeatedly 20 times over a random sample of 500 images, and then repeated that for 50 different samples of size 500. I used a sample of 200 as a validation data set (kept fixed for all runs). As a result I got an accuracy of about 35%, which isn't too bad in my eyes, since I didn't do any optimization and the images are kind of hard to assign to a single category, even for humans.
So here are my questions:
Does it really make sense to run a step 20 times over the same batch? (I did this because that's about what fits in RAM, and loading a new batch took quite a while, so I could get more runs in less time.)
In the training accuracy diagram (see below) there's a jump at some point around step 120-130. From there on the accuracy goes up close to 100% for each 20-run of the same random batch. What does that jump mean in terms of network structure / learning?
Your spikes are likely due to the network overfitting on the batch that you are repeatedly showing it, while not really learning anything that is useful in general. This also answers your first question: in this case, it doesn't make sense.
One of the most popular questions regarding neural networks seems to be:
Help!! My Neural Network is not converging!!
See here, here, here, here and here.
So, after eliminating any errors in the implementation of the network, what are the most common things one should try?
I know that the things to try would vary widely depending on the network architecture.
But by tweaking which parameters (learning rate, momentum, initial weights, etc.) and implementing which new features (windowed momentum?) were you able to overcome similar problems while building your own neural net?
Please give answers which are language-agnostic if possible. This question is intended to give some pointers to people stuck with neural nets which are not converging.
If you are using ReLU activations, you may have a "dying ReLU" problem. In short, under certain conditions, any neuron with a ReLU activation can be subject to a (bias) adjustment that leads to it never being activated ever again. It can be fixed with a "Leaky ReLU" activation, well explained in that article.
For example, I produced a simple MLP (3-layer) network with ReLU output which failed. I provided data it could not possibly fail on, and it still failed. I turned the learning rate way down, and it failed more slowly. It always converged to predicting each class with equal probability. It was all fixed by using a Leaky ReLU instead of standard ReLU.
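For illustration, the difference between the two activations is just a small negative slope; a framework-agnostic sketch in NumPy:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # The small negative slope keeps a gradient alive for x < 0,
    # so a unit can recover instead of "dying".
    return np.where(x > 0, x, alpha * x)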
If we are talking about classification tasks, then you should shuffle the examples before training your net. I mean, don't feed your net thousands of examples of class #1 followed by thousands of examples of class #2, and so on. If you do that, your net most probably won't converge, but will tend to predict the last trained class.
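A minimal sketch of what I mean, with toy data; the important part is shuffling inputs and labels with the same permutation:

import numpy as np

# Toy data: 6 examples of class 0 followed by 6 of class 1.
X = np.arange(12, dtype=np.float32).reshape(12, 1)
y = np.array([0] * 6 + [1] * 6)

# Shuffle features and labels together before (and ideally between) epochs.
perm = np.random.permutation(len(X))
X, y = X[perm], y[perm]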
I had faced this problem while implementing my own back prop neural network. I tried the following:
Implemented momentum (and kept the value at 0.5)
Kept the learning rate at 0.1
Charted the error, weights, inputs, and outputs of each and every neuron. Seeing the data as a graph is more helpful in figuring out what is going wrong.
Tried out different activation functions (all sigmoid). But this did not help me much.
Initialized all weights to random values between -0.5 and 0.5 (My network's output was in the range -1 and 1)
I did not try this, but gradient checking can be helpful as well.
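For anyone unfamiliar with it, gradient checking just compares the backprop gradient against a finite-difference estimate; a small sketch on a toy loss:

import numpy as np

def numerical_gradient(f, w, eps=1e-5):
    # Central-difference estimate of df/dw, to compare against backprop.
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        w.flat[i] = old + eps
        f_plus = f(w)
        w.flat[i] = old - eps
        f_minus = f(w)
        w.flat[i] = old
        grad.flat[i] = (f_plus - f_minus) / (2 * eps)
    return grad

w = np.array([0.3, -1.2])
loss = lambda v: np.sum(v ** 2)
analytic = 2 * w                           # the hand-derived gradient of this loss
numeric = numerical_gradient(loss, w)
print(np.max(np.abs(analytic - numeric)))  # should be tiny, e.g. ~1e-10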
If the problem is only convergence (not the actual "well-trained network", which is way too broad a problem for SO), then the only thing that can be wrong once the code is OK is the training method's parameters. If one uses naive backpropagation, these parameters are the learning rate and momentum. Nothing else matters, because for any initialization and any architecture, a correctly implemented neural network should converge for a good choice of these two parameters (in fact, for momentum = 0 it should converge to some solution too, for a small enough learning rate).
In particular, there is a good heuristic approach called "resilient backpropagation" (Rprop), which is in fact a parameterless approach and should (almost) always converge (assuming a correct implementation).
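If you'd rather not implement it yourself, some libraries ship it; for example, PyTorch has an Rprop optimizer. A sketch with an arbitrary toy model and data:

import torch
import torch.nn as nn
import torch.optim as optim

# Arbitrary toy model and data, just to show the optimizer in use.
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
optimizer = optim.Rprop(model.parameters())  # defaults usually work; Rprop adapts per-weight step sizes
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 4), torch.randn(32, 1)
for _ in range(100):                 # Rprop is intended for full-batch training
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()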
After you've tried different meta-parameters (optimization / architecture), the most probable place to look is THE DATA.
As for myself, to minimize fiddling with meta-parameters, I keep my optimizer automated: Adam is my optimizer of choice.
There are some rules of thumb regarding application vs. architecture, but it's really best to work those out on your own.
To the point:
In my experience, after you've debugged the net (the easy debugging) and it still doesn't converge, or gets stuck in an undesired local minimum, the usual suspect is the data.
Whether you have contradictory samples or just incorrect ones (outliers), a small amount can make the difference between, say, 0.6 accuracy and (after cleaning) 0.9 accuracy.
A smaller but golden (clean) dataset is much better than a big, slightly dirty one.
With augmentation you can tweak results even further.