I tried to use a neural network to predict some data. I used the
MATLAB neural network fitting toolbox and I could predict some tests.
But the problem is the accuracy is not good enough for my results.
I tried changing the number of neurons to improve the accuracy, but it did not help.
I wanted to change the training function, but I didn't find a way to do it.
For example, I want to tell the toolbox to keep training until the error is less than 0.1.
What should I do?
You can set net.trainParam.goal to define the performance goal (training stops once the error drops below it), or increase net.trainParam.epochs to raise the maximum number of training epochs.
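The interplay of the two stopping criteria can be illustrated with a minimal sketch. It is written in plain Python rather than MATLAB, and it is not the toolbox's implementation: the function name train_until, the toy data, and the single-weight model are all assumptions made for illustration. The parameters goal and max_epochs only play the roles that net.trainParam.goal and net.trainParam.epochs play in the answer above.

```python
def train_until(goal, max_epochs, lr=0.05):
    """Fit y = w * x by gradient descent on the mean squared error.

    Training stops as soon as the error reaches `goal` or after
    `max_epochs` epochs, whichever comes first (hypothetical helper).
    """
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated by y = 2 * x

    def mse(w):
        return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    w = 0.0
    error = mse(w)
    epoch = 0
    while error > goal and epoch < max_epochs:
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        error = mse(w)
        epoch += 1
    return w, error, epoch


if __name__ == "__main__":
    print(train_until(goal=0.1, max_epochs=1000))  # stops because the goal is met
    print(train_until(goal=0.1, max_epochs=1))     # stops because epochs run out
```

If the goal is never reached, the epoch limit is what ends training, which is why raising the limit and lowering the goal usually go together.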
I was using a Keras CNN to classify the MNIST dataset. I found that different batch sizes gave different accuracies. Why is that?
Using Batch-size 1000 (Acc = 0.97600)
Using Batch-size 10 (Acc = 0.97599)
Although the difference is very small, why is there a difference at all?
EDIT - I have found that the difference is only because of precision issues and they are in fact equal.
That is because of the effect of mini-batch gradient descent during the training process. You can find a good explanation here; some notes from that link:
Batch size is a slider on the learning process.
Small values give a learning process that converges quickly at the
cost of noise in the training process.
Large values give a learning
process that converges slowly with accurate estimates of the error
gradient.
One more important note from that link:
The presented results confirm that using small batch sizes achieves the best training stability and generalization performance, for a
given computational cost, across a wide range of experiments. In all
cases the best results have been obtained with batch sizes m = 32 or
smaller
This is the conclusion of this paper.
EDIT
I should mention two more points here:
Because of the inherent randomness in machine learning algorithms, you should generally not expect them (deep learning algorithms included) to give the same results on different runs. You can find more details here.
On the other hand, your two results are so close that they are effectively equal. So in your case, based on the reported results, the batch size has no visible effect on your network's accuracy.
This is not specific to Keras. The batch size, together with the learning rate, is a critical hyper-parameter for training neural networks with mini-batch stochastic gradient descent (SGD); both strongly affect the learning dynamics and thus the accuracy, the learning speed, etc.
In a nutshell, SGD optimizes the weights of a neural network by iteratively updating them in the direction of the negative gradient of the loss. In mini-batch SGD, the gradient is estimated at each iteration on a subset of the training data. This estimate is noisy, and the noise helps regularize the model, so the size of the batch matters a lot. Besides, the learning rate determines how much the weights are updated at each iteration. Finally, although this may not be obvious, the learning rate and the batch size are related to each other. [paper]
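The update rule described above can be sketched in a few lines of plain Python. This is only an illustration on a toy linear model, not what Keras does internally; the names minibatch_sgd and make_toy_data, the data, and the default hyper-parameters are all assumptions.

```python
import random


def minibatch_sgd(data, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit y = w * x + b with mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not reordered
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient estimated on the current mini-batch only (noisy).
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient, scaled by the learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


def make_toy_data(n=200, seed=42):
    """Noisy samples of y = 3 * x + 1 (made-up data for illustration)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    for batch_size in (10, 200):
        print(batch_size, minibatch_sgd(data, batch_size))
```

With the same number of epochs, a smaller batch size means more (and noisier) weight updates per epoch, so the two runs generally end at slightly different weights.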
I want to add two points:
1) With special treatment, it is possible to achieve similar performance with a very large batch size while speeding up training tremendously. For example,
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
2) Regarding your MNIST example, I suggest not reading too much into these numbers. The difference is so small that it could be caused by noise. I bet that if you evaluate models saved at a different epoch, you will see a different result.
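As an illustration of the "special treatments" in point 1, the paper cited there relies on two ideas: scaling the learning rate linearly with the batch size, and ramping the learning rate up gradually during the first epochs. A minimal sketch of both follows; the function names and the example numbers are assumptions, not code from the paper.

```python
def scaled_lr(base_lr, base_batch, batch_size):
    """Linear scaling rule: multiply the learning rate by the same
    factor by which the batch size grows (hypothetical helper)."""
    return base_lr * batch_size / base_batch


def warmup_lr(start_lr, target_lr, step, warmup_steps):
    """Gradual warmup: move linearly from start_lr to target_lr over
    the first warmup_steps steps, then stay at target_lr."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


if __name__ == "__main__":
    target = scaled_lr(base_lr=0.1, base_batch=256, batch_size=8192)
    for step in (0, 50, 100, 200):
        print(step, warmup_lr(0.1, target, step, warmup_steps=100))
```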
I am trying to implement a neural network for classification with 5 hidden layers and softmax cross-entropy in the output layer. The implementation is in Java.
For optimization, I have used mini-batch gradient descent (batch size = 100, learning rate = 0.01).
However, after a couple of iterations, the weights become "NaN" and the predicted values turn out to be the same for every test case.
I am unable to find the source of this error.
Here is the GitHub link to the code (with the test/training files):
https://github.com/ahana204/NeuralNetworks
In my case, I forgot to normalize the training data (by subtracting the mean). This was causing the denominator of my softmax equation to be 0. Hope this helps.
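A zero (or infinite) softmax denominator can also be avoided inside the softmax itself by subtracting the largest logit before exponentiating, which leaves the result unchanged mathematically. The sketch below is in Python rather than Java and is not taken from the linked repository; the function names are assumptions.

```python
import math


def stable_softmax(logits):
    """Softmax that subtracts the maximum logit first, so the largest
    exponent is exp(0) = 1 and the denominator is always at least 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stable_cross_entropy(logits, target_index):
    """Cross-entropy computed with the log-sum-exp trick, avoiding
    log(0) when the target probability underflows."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


if __name__ == "__main__":
    print(stable_softmax([1000.0, 1000.0, 990.0]))
    print(stable_cross_entropy([-1000.0, 0.0], 0))
```

Normalizing the inputs, as suggested above, is still worthwhile; this only makes the output layer robust to large activations.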
Assuming the code you implemented is correct, one possible reason is a learning rate that is too large. If the learning rate is too large, the weights may fail to converge and grow so large in magnitude that they overflow and show up as NaN. Try lowering the learning rate to see if anything changes.
I am trying to design a neural network that makes a custom binary prediction.
Normally to do binary prediction, I would use a softmax as my last layer, and then my loss could be the difference between the prediction I made and the true binary value.
However, what if I don't want to use a softmax layer? Instead, I output a real-valued number and check whether some condition on this number is true. In a really simple case, I check whether this number is positive. If it is, I predict 1; otherwise I predict 0. Let's say I want all the numbers to be positive, so the true predictions should all be 1, and I want to train this network so that it outputs only positive numbers. I am confused as to how to formulate a loss function for this problem, so that I am able to backpropagate and train the network.
Does anyone have an idea how to create this kind of network?
I am confused as to how to formulate a loss function for this problem, so
that I am able to backpropagate and train the network.
Here's how you should approach it. Effectively, you need to transform the labels into positive and negative target values (say +1 and -1) and solve a regression problem. The loss function can be a simple L1 or L2 loss. The network will learn to output a prediction close to the training target, and afterwards you can interpret the output by checking which target it is closer to, i.e. whether it is positive or negative. You can even make some targets larger (e.g. +2 or +10) to emphasize that these examples are very important. Example code: linear regression in tensorflow.
However, I have to warn you that this approach has serious drawbacks; see for instance this question. A single outlier in the training data can easily skew your predictions. Classification with softmax + cross-entropy loss is more stable, which is why it is almost always the better choice.
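The regression-on-signed-targets idea from this answer can be sketched with a one-feature linear model and an L2 loss. This is a minimal illustration in plain Python, not the TensorFlow example linked above; the function names and the toy samples are assumptions.

```python
def train_sign_regressor(samples, lr=0.1, epochs=200):
    """Fit w * x + b to targets +1 / -1 by gradient descent on the L2 loss.

    `samples` is a list of (x, label) pairs with label in {0, 1}.
    """
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, label in samples:
            target = 1.0 if label == 1 else -1.0  # map {0, 1} to {-1, +1}
            err = (w * x + b) - target
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predict 1 if the real-valued output is positive, else 0."""
    return 1 if w * x + b > 0 else 0


if __name__ == "__main__":
    samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
    w, b = train_sign_regressor(samples)
    print([predict(w, b, x) for x, _ in samples])
```

Adding a sample far from the others, such as (50.0, 1), pulls the fitted line toward it, which is the outlier sensitivity the answer warns about.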
Is it possible that the MSE increases during training?
I'm currently calculating the MSE of the validation set per epoch, and at a certain point the MSE starts to increase instead of decreasing. Does anyone have an explanation for this behavior?
Answering your question: Yes, it is possible.
If you are using regularization or stochastic training, some ups and downs in the MSE during training are normal.
Some possible reasons for the problem:
You are using a learning rate that is too high, which leads to overshooting the local minima of the cost function.
The neural network is overfitting: it has been trained for too long and is losing its ability to generalize.
What you can try:
When this starts to happen, reduce your learning rate.
Apply some kind of regularization on your network, like dropout, to avoid overfitting.
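The first suggestion, reducing the learning rate once the validation MSE starts to rise, can be sketched as a small helper that is called after every epoch. The function name, the factor, and the patience value are assumptions chosen for illustration, not settings from any particular library.

```python
def reduce_lr_on_rise(val_mse_history, lr, factor=0.5, patience=2):
    """Return a reduced learning rate if the validation MSE has not
    improved on its earlier best for the last `patience` epochs;
    otherwise return the learning rate unchanged."""
    if len(val_mse_history) <= patience:
        return lr  # not enough history yet
    best_before = min(val_mse_history[:-patience])
    recent = val_mse_history[-patience:]
    if all(m >= best_before for m in recent):
        return lr * factor
    return lr


if __name__ == "__main__":
    history = []
    lr = 0.1
    # Made-up validation MSE values: improving first, then rising.
    for val_mse in [1.0, 0.8, 0.6, 0.7, 0.9]:
        history.append(val_mse)
        lr = reduce_lr_on_rise(history, lr)
        print(val_mse, lr)
```

Requiring several non-improving epochs before reacting avoids cutting the learning rate on the ordinary ups and downs mentioned above.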
I'm using Pybrain to train a recurrent neural network. However, the average of the weights keeps climbing and after several iterations the train and test accuracy become lower. Now the highest performance on train data is about 55% and on test data is about 50%.
I think the RNN may have training problems because of its large weights. How can I solve this? Thank you in advance.
The usual way to restrict the network parameters is to use a constrained error functional that penalizes the absolute magnitude of the parameters. This is what "weight decay" does: you add a norm of the weights, ||w||, to your sum-of-squares error. Usually this is the Euclidean norm, but sometimes the 1-norm is used instead, in which case it is called "Lasso". Note that weight decay with the (squared) Euclidean norm is also known as ridge regression or Tikhonov regularization.
In PyBrain, according to this page in the documentation, a Lasso version of weight decay is available, which is controlled by the parameter wDecay.
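To make the penalty concrete, here is a minimal sketch of a sum-of-squares error with an added weight penalty and of the extra gradient term the penalty contributes. It is plain Python, not PyBrain code; the function names and the decay argument are assumptions, and the Euclidean variant uses the squared norm, as ridge regression does.

```python
def penalized_loss(errors, weights, decay, norm="l2"):
    """Sum-of-squares error plus decay times a penalty on the weights."""
    sse = sum(e * e for e in errors)
    if norm == "l2":
        penalty = sum(w * w for w in weights)   # squared Euclidean norm
    elif norm == "l1":
        penalty = sum(abs(w) for w in weights)  # 1-norm ("Lasso")
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    return sse + decay * penalty


def penalty_gradient(weights, decay, norm="l2"):
    """Gradient of the penalty term alone; add it to the error gradient."""
    if norm == "l2":
        return [2 * decay * w for w in weights]
    if norm == "l1":
        # Subgradient of |w|: the sign of w, taken as 0 at w == 0.
        return [decay * ((w > 0) - (w < 0)) for w in weights]
    raise ValueError("norm must be 'l1' or 'l2'")


if __name__ == "__main__":
    weights = [3.0, -4.0]
    print(penalized_loss([1.0, 2.0], weights, decay=0.5, norm="l2"))
    print(penalty_gradient(weights, decay=0.5, norm="l1"))
```

The L2 term shrinks each weight in proportion to its size, while the L1 term pushes every nonzero weight toward zero by a constant amount, which tends to produce sparse weights.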