Static / Constant Loss for Convolutional Neural Network in Torch - machine-learning

I have a 1-D time series classification problem, and I have imported the data into Torch. I have written two different networks to learn the data. Each row is to be labelled as either a 1 or a 0.
The problem is that the loss of the Convolutional Network does not fall after the first iteration. It stays at exactly the same value, after iteration one. This is not true for the other network - a Logistic Regression. The loss of that network does fall over time.
Below is the ConvNet:
model = nn.Sequential()
for i = 1, iteration do
model:add(nn.TemporalConvolution(1,1,3,1))
model:add(nn.BatchNormalization(1))
model:add(nn.ReLU())
model:add(nn.TemporalMaxPooling(3,2))
if i == iteration then
model:add(nn.Sigmoid())
end
end
Since the LogReg's loss does fall, I assume the problem is to do with the ConvNet itself, rather than anything else in the code.
Any advice would be much appreciated. I am happy to post more code if required.

Usually if there is no improvement when minimizing the loss function the model is already at a local(global) minimum.
This can have several reasons, e.g. learning-rate, regularization, data does not fit to the model in some way ...
It's hard to say just based on the model.
Did you use the exact same code for training in LogReg?
You can check out this tutorial for some information about temporalConvolution:
http://supercomputingblog.com/machinelearning/an-intro-to-convolutional-networks-in-torch/

Related

Neural Network Custom Binary Prediction

I am trying to design a neural network that makes a custom binary prediction.
Normally to do binary prediction, I would use a softmax as my last layer, and then my loss could be the difference between the prediction I made and the true binary value.
However, what if I don't want to use a softmax layer. Instead, I output a real valued number, and check if some condition on this number is true. In a really simple case, I check if this number is positive. If it is, I predict 1, else I predict 0. Let's say I want all the numbers to be positive, so the true predictions should be all 1, and then I want to train this network such that it outputs all positive numbers. I am confused as how to formulate a loss function for this problem, so that I am able to back propagate and train the network.
Does anyone have an idea how to create this kind of network?
I am confused as how to formulate a loss function for this problem, so
that I am able to back propagate and train the network.
Here's how you should approach it. Effectively, you need to transform the labels to positive and negative target values (say +1 and -1) and solve the regression problem. The loss function can be a simple L1 or L2 loss. The network will try to learn to output a prediction close to the training target, which you can afterwards interpret if it's closer to one target or another, i.e. positive or negative. You can even go ahead and make some targets larger (e.g. +2 or +10) to emphasize that these examples are very important. Example code: linear regression in tensorflow.
However, I simply have to warn you that your approach has serious drawbacks, see for instance this question. One outlier in training data can easily skew your predictions. Classification with softmax + cross-entropy loss is more stable, that's why almost always a better choice.

Do trained weights depend on the order in which trained data has been input?

Suppose one makes a neural network using Keras. Do the trained weights depend on the order in which the training data has been fed into the system ? Is it ok to feed data belonging to one category first and then data belonging to another category or should they be random?
As the training will be done in batches, which means optimizing the weights on data chunk by chunk, the main assumption is that the batches of data are somewhat representative of the dataset. To make it representative it is thus better to randomly sample the data.
Bottomline : It will theoritically learn better if you feed randomly the neural network. I strongly advise yo to shuffle your dataset when you feed it in training mode (and there is an option in the .fit() function).
In inference mode, if you only want to make a forward pass on the neural net, then the order doesn't matter at all since you don't change the weights.
I hope this clarifies things a bit for you :-)
Nassim answer is believed to be True for small networks and datasets but recent articles (or e.g. this one) makes us believe that for deeper networks (with more than 4 layers) - not shuffling your data set might be considered as some kind of regularization - as poor minima are expected to be deep but small and good minima are expected to be wide and hard to leave.
In case of inference time - the only way where this might harm your inference process is when you are using a training distribution of your data in a highly coupled manner - e.g. using BatchNormalization or Dropout like in a training phase (this is sometimes used for some kinds of Bayesian Deep Learning).

Reusing Inception V3 Conv Neural Net (tensorflow) with 0% accuracy

EDIT1: My code is the same as here, https://github.com/tensorflow/models/blob/master/inception/inception. The only difference is that I pack my files into TFRecords and feed it bactch wise. Also, the ratio of Class 0 : Class 1 is 70:30.
I'm currently working on a project in which I'm making use of inception-V3 CNN model to train a classifier. Currently, I am working on a binary classifier (either predict 1 or 0) but, my model only predicts class 0 for everything. While troubleshooting I've found that the probability of prediction is 100% for class 0 all the time. I have verified everything from the input queuing system to the eval and testing, everything seems to be working well too.
Strangely, the loss value reduces in a perfect semi-parabolic fashion which makes me think that the loss has converged to a local minima. Upon testing the script only churns out class 0(with 100% probability) each time. Another thing I've noticed is that the activation across various Conv layers are always constant which could imply that the neurons are just not firing at all.
My question is,
1. Is my model working ? The loss seems to converge but the activation across various layers seems to be stagnant.
2. I am using the training code available from the models section of the tensorflow (https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py)
I am reusing the train, eval and supporting code to train my model with a custom input pipeline created by me (which is also working). Can someone help guide me in the right direction on this?
Thanks.
Ik I am a little late in answering this question 😅
First of all your model didn't learn anything at all. All it did (cleverly 😂) was to predict class 0 for all cases so that it achieves a baseline accuracy of 70% without any effort. (Probably the model was lazy 😪😋) JK. This is a very well known problem in machine learning. This is called as class imbalance problem. Refer this http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/.
Apart from the techniques mentioned there. The one technique that works wonders is using class weights. That is, basically telling the network to be biased towards the weaker class. In your case class weights will be class0:class1 = 3:7. This is a hyperparameter too! But this is a good point to start.
Moreover, you didn't give any info about your dataset size. Whether you are fine tuning or training from scratch. Without them it's hard to speculate. By default I would suggest fine tuning.
Moreover, by loss you mean the training loss or validation loss? Because training loss has literally no info regarding the performance of the model. Moreover, in my opinion both training loss, and validation losses have very little info to derive meaningful insights about the model's performance. Use other metrics like confusion matrix,f1 score, recall, precision etc.
Finally, there is absolutely no single answer to your question. The only way is the hard way - you will learn along with the model 😉. Because I consider training a NN especially a CNN an art. In which, intuition plays a very crucial role coz, most of the times, the least expected changes would give the best results. Anyway that's the fun part of training a NN.
Happy training 💪
P.S: Try using the visualisation tools like gradcam to know whether the model is looking at the correct part of the image for classification. This is very important!

Having some troubles with the PyBrain Neural Network regression function

I would appreciate a some insights into the workings of the PyBrain's neural network. I have a dataset of different household features that correspond to a certain household income. The task is to create a regression based on neural networks to be able to predict the income for given features.
I've tried the simple constructor
pybrain.tools.shortcuts.buildNetwork(feature_count, 12, 1, recurrent=False)
and it kinda works. But if i change the hiddenlayer to use GaussianLayer or LinearLayer i am getting the NaNs as output during the training phase.
Is there maybe something else that needs to be taken care of when using these layers (I am guessing maybe feature selection, when they correlate)?
Thanks
I solved a neural network regression problem using pybrain where I had to forecast the load on a power station using weather features. This appears to be the same problem as yours, except in application. I followed the guide here: http://fastml.com/pybrain-a-simple-neural-networks-library-in-python/ which brought me 90% of the way towards the final solution. I had 8 inputs and one outputs.
One "gotcha" I found was that I had to normalise my input values to 0 -> 1. The MSE value would not decrease on each EPOCH otherwise. Also, if any of my input vaues were NaN, I got continuous Nan values out.
I hope this helps.

Echo state neural network?

Is anyone here who is familiar with echo state networks? I created an echo state network in c#. The aim was just to classify inputs into GOOD and NOT GOOD ones. The input is an array of double numbers. I know that maybe for this classification echo state network isn't the best choice, but i have to do it with this method.
My problem is, that after training the network, it cannot generalize. When i run the network with foreign data (not the teaching input), i get only around 50-60% good result.
More details: My echo state network must work like a function approximator. The input of the function is an array of 17 double values, and the output is 0 or 1 (i have to classify the input into bad or good input).
So i have created a network. It contains an input layer with 17 neurons, a reservoir layer, which neron number is adjustable, and output layer containing 1 neuron for the output needed 0 or 1. In a simpler example, no output feedback is used (i tried to use output feedback as well, but nothing changed).
The inner matrix of the reservoir layer is adjustable too. I generate weights between two double values (min, max) with an adjustable sparseness ratio. IF the values are too big, it normlites the matrix to have a spectral radius lower then 1. The reservoir layer can have sigmoid and tanh activaton functions.
The input layer is fully connected to the reservoir layer with random values. So in the training state i run calculate the inner X(n) reservor activations with training data, collecting them into a matrix rowvise. Using the desired output data matrix (which is now a vector with 1 ot 0 values), i calculate the output weigths (from reservoir to output). Reservoir is fully connected to the output. If someone used echo state networks nows what im talking about. I ise pseudo inverse method for this.
The question is, how can i adjust the network so it would generalize better? To hit more than 50-60% of the desired outputs with a foreign dataset (not the training one). If i run the network again with the training dataset, it gives very good reults, 80-90%, but that i want is to generalize better.
I hope someone had this issue too with echo state networks.
If I understand correctly, you have a set of known, classified data that you train on, then you have some unknown data which you subsequently classify. You find that after training, you can reclassify your known data well, but can't do well on the unknown data. This is, I believe, called overfitting - you might want to think about being less stringent with your network, reducing node number, and/or training based on a hidden dataset.
The way people do it is, they have a training set A, a validation set B, and a test set C. You know the correct classification of A and B but not C (because you split up your known data into A and B, and C are the values you want the network to find for you). When training, you only show the network A, but at each iteration, to calculate success you use both A and B. So while training, the network tries to understand a relationship present in both A and B, by looking only at A. Because it can't see the actual input and output values in B, but only knows if its current state describes B accurately or not, this helps reduce overfitting.
Usually people seem to split 4/5 of data into A and 1/5 of it into B, but of course you can try different ratios.
In the end, you finish training, and see what the network will say about your unknown set C.
Sorry for the very general and basic answer, but perhaps it will help describe the problem better.
If your network doesn't generalize that means it's overfitting.
To reduce overfitting on a neural network, there are two ways:
get more training data
decrease the number of neurons
You also might think about the features you are feeding the network. For example, if it is a time series that repeats every week, then one feature is something like the 'day of the week' or the 'hour of the week' or the 'minute of the week'.
Neural networks need lots of data. Lots and lots of examples. Thousands. If you don't have thousands, you should choose a network with just a handful of neurons, or else use something else, like regression, that has fewer parameters, and is therefore less prone to overfitting.
Like the other answers here have suggested, this is a classic case of overfitting: your model performs well on your training data, but it does not generalize well to new test data.
Hugh's answer has a good suggestion, which is to reduce the number of parameters in your model (i.e., by shrinking the size of the reservoir), but I'm not sure whether it would be effective for an ESN, because the problem complexity that an ESN can solve grows proportional to the logarithm of the size of the reservoir. Reducing the size of your model might actually make the model not work as well, though this might be necessary to avoid overfitting for this type of model.
Superbest's solution is to use a validation set to stop training as soon as performance on the validation set stops improving, a technique called early stopping. But, as you noted, because you use offline regression to compute the output weights of your ESN, you cannot use a validation set to determine when to stop updating your model parameters---early stopping only works for online training algorithms.
However, you can use a validation set in another way: to regularize the coefficients of your regression! Here's how it works:
Split your training data into a "training" part (usually 80-90% of the data you have available) and a "validation" part (the remaining 10-20%).
When you compute your regression, instead of using vanilla linear regression, use a regularized technique like ridge regression, lasso regression, or elastic net regression. Use only the "training" part of your dataset for computing the regression.
All of these regularized regression techniques have one or more "hyperparameters" that balance the model fit against its complexity. The "validation" dataset is used to set these parameter values: you can do this using grid search, evolutionary methods, or any other hyperparameter optimization technique. Generally speaking, these methods work by choosing values for the hyperparameters, fitting the model using the "training" dataset, and measuring the fitted model's performance on the "validation" dataset. Repeat N times and choose the model that performs best on the "validation" set.
You can learn more about regularization and regression at http://en.wikipedia.org/wiki/Least_squares#Regularized_versions, or by looking it up in a machine learning or statistics textbook.
Also, read more about cross-validation techniques at http://en.wikipedia.org/wiki/Cross-validation_(statistics).

Resources