Pretraining error increases in each epoch in Deep Belief Network - machine-learning

I am using this implementation of DBN.
http://deeplearning.net/tutorial/code/DBN.py
I am using ECG data to train the model; each row contains 100 float values (in millivolts).
When I run this implementation, the pretraining cost keeps increasing, and I don't understand why.
I am attaching sample input data files and the DBN code, in which I have modified the number of input and output units and the batch size. I have also modified the 'load_data' code in logistic_sgd.py, so I am attaching that file too.
Here is the scenario:
Why is this happening? Where am I going wrong?
Link to code and data files:
https://drive.google.com/open?id=0B02Uz-muAJWWVktyaDFOekU5Ulk

Related

Torch, how to get a tensor of loss values during batch optimization

I am training a network with batch optimization over my training set, and I would like to get a loss vector containing the loss of each of my training examples.
More specifically I am using images (of size 3x64x64) in a batch of size 64. Therefore my input is a tensor of size 64x3x64x64.
During training when I write
output = net:forward(input)
loss = criterion:forward(output, target)
loss is a number, but I would like to get a tensor (of size 64) with one entry per image in my batch, corresponding to the loss value of this precise image.
Is there a way to do that without looping on the first dimension of my input tensor?
The forward method calls another method, updateOutput, which can be overridden.
For example, in the case of MSECriterion(), you can change that method by commenting out the call to the THNN library and writing your own version of the criterion: do an element-wise subtraction, square the result (again element-wise), divide by the number of data points per example, and return the output as a tensor instead of a single number.
After changing this you will also need to recompile the nn package: navigate to the nn folder and run luarocks make rocks/[the scm file in the folder].
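The arithmetic described above can be sketched outside of Torch. The following is a minimal NumPy illustration, not Torch or THNN code; the function name per_sample_mse and the 64x10 shapes are assumptions chosen for the example. It reduces the squared error over every axis except the batch axis, so the result has one entry per example and no explicit loop over the batch is needed.

```python
import numpy as np


def per_sample_mse(output, target):
    """Return one MSE value per batch element (axis 0 is the batch axis)."""
    squared_error = (output - target) ** 2
    # Flatten everything except the batch axis, then average per example.
    return squared_error.reshape(squared_error.shape[0], -1).mean(axis=1)


# Hypothetical batch of 64 examples with 10 outputs each.
rng = np.random.default_rng(0)
output = rng.normal(size=(64, 10))
target = rng.normal(size=(64, 10))

losses = per_sample_mse(output, target)  # shape (64,)
scalar_loss = ((output - target) ** 2).mean()  # the usual single number
```

Because every example has the same number of elements, the mean of losses equals the usual scalar loss, which is a convenient sanity check.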

How to save feature values of all batch data from pretrained torch networks?

I am using the fb torch library from GitHub (fb torch resnet).
This is my first time using Torch and Lua, so I am running into some problems.
My goal is to save the feature vector of a specific layer (the last average pooling of the ResNet) into a file, together with the class of the input image. All input images are from the CIFAR-10 db.
The file format that I want is as follows:
image1.txt := class index of image and feature vector of image 1 of cifar-10
image2.txt := class index of image and feature vector of image 2 of cifar-10
// and so on through all images of cifar-10
I have seen some sample code in that GitHub repository, extract-features.lua.
Because this is my first time using Lua, I find it hard to understand this code and to modify it the way I want. I also don't want my data saved in the t7 file format.
How can I access only one specific layer of a network in Torch via Lua? (last average pooling)
How can I access the values of the layer and the classification result index?
How can I read all the images from the CIFAR-10 db file (t7 batch)?
Sorry for so many questions, but I find Torch hard to use because there are few community threads and posts about it. Please bear with me.
How can I access only one specific layer of a network in Torch via Lua? (last average pooling)
To access a layer, you just have to load the model and get the layer by its integer position. If you run print(model), you will be able to see at which position the last average pooling is.
model = torch.load(path_to_model):cuda()
avg_pooling_layer = model:get(position_of_the_avg_pooling_layer)
How can I access the values of the layer and the classification result index?
I do not quite understand what you mean by this. If you want to see the output or the weights of a specific layer (following the code above), you need to get these elements from the layer table. Again, to see which elements are available, use print(avg_pooling_layer). Note that a pooling layer has no learnable weights, so weight will be nil for it.
weights = avg_pooling_layer.weight -- get the weights of the layer
output = avg_pooling_layer.output -- get the output of the layer
How can I read all the images from the CIFAR-10 db file (t7 batch)?
To read the images from a t7 file, use the Torch function torch.load (used above to load the model).
cifar_10 = torch.load("path_to_cifar-10.t7")
Once loaded, the training and test sets may be stored in subtables or exposed through functions. Again, print the table to see which values you need to get.
Hope this helps!

How does Caffe determine test set accuracy?

Using the BVLC reference AlexNet file, I have been training a CNN against a training set I created.  In order to measure the progress of training, I have been using a rough method to approximate the accuracy against the training data.  My batch size on the test net is 256.  I have ~4500 images.  I perform 17 calls to solver.test_nets[0].forward() and record the value of solver.test_nets[0].blobs['accuracy'].data (the accuracy of that forward pass).  I take the average across these.  My thought was that I was taking 17 random samples of 256 from my validation set and getting the accuracy of these random samplings.  I would expect this to closely approximate the true accuracy against the entire set.  However, I later went back and wrote a script to go through each item in my LMDB so that I could generate a confusion matrix for my entire test set.  I discovered that the true accuracy of my model was significantly lower than the estimated accuracy.  For example, my expected accuracy of ~75% dropped to ~50% true accuracy.  This is a far worse result than I was expecting.
My assumptions match the answer given here.
Have I made an incorrect assumption somewhere?  What could account for the difference?  I had assumed that the forward() function gathered a random sample, but I'm not so sure that was the case.  blobs['accuracy'].data returned a different result (though usually within a small range) every time, which is why I assumed this.
I had assumed that the forward() function gathered a random sample, but I'm not so sure that was the case. blobs['accuracy'].data returned a different result (though usually within a small range) every time, which is why I assumed this.
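The forward() function in Caffe does not perform any random sampling; it only fetches the next batch according to your DataLayer. In your case, for example, each call to forward() passes the next 256 images through your network. Performing this 17 times passes 17x256=4352 images sequentially, so with ~4500 images part of the set is never evaluated, and the accuracy changes between calls only because each batch contains different images.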
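The batch arithmetic can be checked with a short sketch. This is plain Python, not Caffe code; the function name coverage is hypothetical, the count of exactly 4500 images is an assumption standing in for the "~4500" in the question, and the wrap-around to the start of the database once it is exhausted is also an assumption about the data layer.

```python
import math


def coverage(num_images, batch_size, num_forward_calls):
    """Summarise how sequential batches cover a dataset.

    Returns (unique images seen, images seen a second time after the
    assumed wrap-around, number of calls needed to see every image).
    """
    seen = num_forward_calls * batch_size
    unique = min(seen, num_images)
    repeated = max(0, seen - num_images)
    calls_needed = math.ceil(num_images / batch_size)
    return unique, repeated, calls_needed


# 17 calls with batch size 256 on an assumed 4500-image set.
unique_17, repeated_17, calls_needed = coverage(4500, 256, 17)

# 18 calls cover everything but revisit some images from the start.
unique_18, repeated_18, _ = coverage(4500, 256, 18)
```

Under these assumptions, 17 calls leave 148 images unevaluated, while 18 calls cover the whole set but count 108 images twice, so a plain average of per-batch accuracies is only an approximation in both cases.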
Have I made an incorrect assumption somewhere? What could account for the difference?
Check that the script that goes through your whole LMDB performs the same data pre-processing as during training.

Perceptron learns to reproduce just one pattern all the time

This is rather a weird problem.
I have backpropagation code which works perfectly, like this:
Now, when I do batch learning I get wrong results even if it concerns just a simple scalar function approximation.
After training the network produces almost the same output for all input patterns.
So far I have tried the following:
Introduced bias weights
Tried with and without updating of input weights
Shuffled the patterns in batch learning
Tried to update after each pattern and accumulating
Initialized weights in different possible ways
Double-checked the code 10 times
Normalized accumulated updates by the number of patterns
Tried different layer, neuron numbers
Tried different activation functions
Tried different learning rates
Tried different number of epochs from 50 to 10000
Tried to normalize the data
I noticed that after a number of backpropagation passes for just one pattern, the network produces almost the same output for a large variety of inputs.
When I try to approximate a function, I always get just a line (or almost a line). Like this:
Related question: Neural Network Always Produces Same/Similar Outputs for Any Input
And the suggestion to add bias neurons didn't solve my problem.
I found a post like:
When ANNs have trouble learning they often just learn to output the
average output values, regardless of the inputs. I don't know if this
is the case or why it would be happening with such a simple NN.
which describes my situation closely enough. But how to deal with it?
I am coming to the conclusion that the situation I am encountering is a legitimate outcome. For any net configuration, one may effectively "cut" all the connections up to the output layer. This is possible, for example, by setting all hidden weights to near zero, or by setting the biases to extreme values so that the hidden layer saturates and the output becomes independent of the input. After that, we are free to adjust the output layer so that it reproduces the same output regardless of the input. In batch learning, what happens is that the gradients get averaged and the net reproduces just the mean of the targets. The inputs do not play ANY role.
My answer cannot be fully precise because you have not posted the contents of the functions perceptron(...) and backpropagation(...).
But my guess is that, in a loop of the form for data in training_data, you train your network many times on ONE pattern, then many times on the next ONE, and so on, with the result that your network only remembers the last one. Instead, try training your network on every pattern once, then repeat that many times (swap the order of your nested loops).
In other words, the for I = 1:number of patterns loop should be inside the backpropagation(...) function's loop, so this function should contain two loops.
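The effect of the two loop orders can be seen on a toy problem. The following is a minimal Python sketch, not the asker's code: it fits a single linear unit y = w*x + b with plain per-pattern gradient steps, and the data, learning rate, repeat count, and helper names (sgd_step, mse) are all assumptions made for illustration.

```python
def sgd_step(w, b, x, y, lr):
    """One gradient step on the squared error of a single pattern."""
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err


def mse(w, b, xs, ys):
    """Mean squared error of the unit over all patterns."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data lying exactly on y = 2x + 1.
xs = [0.0, 0.5, 1.0]
ys = [1.0, 2.0, 3.0]
lr, repeats = 0.1, 500

# Pattern-major order: finish all repeats on one pattern before the next.
w, b = 0.0, 0.0
for x, y in zip(xs, ys):
    for _ in range(repeats):
        w, b = sgd_step(w, b, x, y, lr)
mse_pattern_major = mse(w, b, xs, ys)

# Epoch-major order: visit every pattern once per epoch, for many epochs.
w, b = 0.0, 0.0
for _ in range(repeats):
    for x, y in zip(xs, ys):
        w, b = sgd_step(w, b, x, y, lr)
mse_epoch_major = mse(w, b, xs, ys)
```

In this sketch the pattern-major run ends up fitting only the last pattern well and keeps a large error on the earlier ones, while the epoch-major run drives the error on all three patterns close to zero.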
EXAMPLE (in C#):
Here are some parts of a backpropagation function, I simplified it here. At each update of the weights and biases, the entire network is "propagated". The following code can be found at this URL: https://visualstudiomagazine.com/articles/2015/04/01/back-propagation-using-c.aspx
public double[] Train(double[][] trainData, int maxEpochs, double learnRate, double momentum)
{
    //...
    Shuffle(sequence); // visit each training data in random order
    for (int ii = 0; ii < trainData.Length; ++ii)
    {
        //...
        ComputeOutputs(xValues); // copy xValues in, compute outputs
        //...
        // Find new weights and biases
        // Update weights and biases
        //...
    } // each training item
}
Maybe all that is missing is a second for loop enclosing everything after this comment (in Batch learn, for example), so that learning runs for multiple epochs:
%--------------------------------------------------------------------------
%% Get all updates

What is the meaning of the value returned by the predict function in OpenCV?

I use the predict function in OpenCV to classify my gestures.
svm.load("train.xml");
float ret = svm.predict(mat);//mat is my feature vector
I defined 5 labels (1.0,2.0,3.0,4.0,5.0), but in fact the values of ret are (0.521220207,-0.247173533,-0.127723947······)
So I am confused about it. According to the official OpenCV documentation, the function returns a class label (classification) in my case.
Update: I still don't know why this result appeared. But I chose new features to train the model, and the return value of the predict function is now what I defined during the training phase (e.g. 1, 2, 3, etc.).
During the training of an SVM you assign a label to each class of training data.
When you classify a sample the returned result will match up with one of these labels telling you which class the sample is predicted to fall into.
There's some more documentation here which might help:
http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
With Support Vector Machines (SVM) you have a training function and a prediction function. The training function trains on your data and saves the resulting information in an XML file (this makes prediction easier when you use a large amount of training data or need to run the prediction in another project).
Example: with 20 images per class you have, in your case, 20*5=100 training images. Each image is associated with the label of its class, and all this information is stored in train.xml.
The prediction function tells you which label to assign to your test image according to your training data (all the work you did in the training process). Your prediction results may be good or bad; I think it all depends on your training data.
If you want, you can calculate the error rate of your classifier to see how good or bad its results are.
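An error rate of this kind is simply the fraction of test samples whose predicted label differs from the true label. The following is a minimal Python sketch rather than OpenCV code; the function name error_rate and the label lists are made up for illustration, and in practice the predicted labels would come from calling predict on each held-out test sample.

```python
def error_rate(predicted, expected):
    """Fraction of samples whose predicted label differs from the true one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    wrong = sum(1 for p, e in zip(predicted, expected) if p != e)
    return wrong / len(expected)


# Hypothetical true labels and classifier outputs for 8 test samples.
expected = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 2.0, 1.0]

rate = error_rate(predicted, expected)  # 2 of 8 samples are misclassified
```

The test samples should not have been used for training, otherwise the error rate will look better than it really is.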
