Having some trouble with PyBrain's neural network regression - machine-learning

I would appreciate some insight into the workings of PyBrain's neural network. I have a dataset of different household features that correspond to a certain household income. The task is to create a regression model based on neural networks to predict the income for given features.
I've tried the simple constructor
pybrain.tools.shortcuts.buildNetwork(feature_count, 12, 1, recurrent=False)
and it kind of works. But if I change the hidden layer to use GaussianLayer or LinearLayer, I get NaNs as output during the training phase.
Is there something else that needs to be taken care of when using these layers (I am guessing maybe feature selection, when features correlate)?
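For reference, a minimal sketch of how the hidden layer type is swapped (feature_count is a placeholder for the number of household features; hiddenclass is the relevant buildNetwork keyword, and the exact import path for GaussianLayer is my assumption):

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.structure.modules import LinearLayer, GaussianLayer

    feature_count = 10  # placeholder: number of household features

    # The default hidden layer type is SigmoidLayer; hiddenclass swaps it.
    net_sigmoid = buildNetwork(feature_count, 12, 1, recurrent=False)
    net_linear = buildNetwork(feature_count, 12, 1, hiddenclass=LinearLayer)
    net_gauss = buildNetwork(feature_count, 12, 1, hiddenclass=GaussianLayer)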
Thanks

I solved a neural network regression problem using PyBrain where I had to forecast the load on a power station using weather features. This appears to be essentially the same problem as yours, just in a different application. I followed the guide here: http://fastml.com/pybrain-a-simple-neural-networks-library-in-python/ which brought me 90% of the way towards the final solution. I had 8 inputs and one output.
One "gotcha" I found was that I had to normalise my input values to the range 0 -> 1; otherwise the MSE would not decrease on each epoch. Also, if any of my input values were NaN, I got continuous NaN values out.
I hope this helps.

Related

Is it possible to train a binary classification neural network by feeding it input from only one class?

I'm basically trying to create a neural network that should tell me whether a given input is valid or not. The problem is that I only have valid input to train it with.
Right now I am trying to come up with a working dense model that validates only MNIST digits between 0 and 4; all other digits should be seen as invalid. My first attempt was to train it with digits between 0 and 4 as valid and images of random pixels as invalid (with the same percentage of black pixels as a normal image), but unfortunately it doesn't work: when I test it with digits between 5 and 9, they are seen as valid.
So I'm starting to wonder whether it's even possible to train a neural network this way.
I also realize there might be better ways to do this, maybe with an autoencoder or a different kind of network, but right now I want to try it with only dense layers.
Thank you.
What you are looking for is one-class classification, also known as unary classification or class-modelling.
A quick Google search suggests training an autoencoder and declaring an input to be in your class if the reconstruction error is below a specific threshold.
But before building something like that, I would suggest you try something like One-Class k-Nearest Neighbors or a One-Class SVM first, to see if you get acceptable results. If so, you can then improve on them with the "extremely more complicated to develop" solution using autoencoders.
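A minimal sketch of the One-Class SVM route with scikit-learn (using sklearn's small 8x8 digits dataset as a stand-in for MNIST; the nu value is a guess you would tune):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.svm import OneClassSVM

    digits = load_digits()
    X, y = digits.data / 16.0, digits.target  # scale pixels to [0, 1]

    X_valid = X[y <= 4]   # "valid" digits 0-4: the only training data
    X_unseen = X[y >= 5]  # digits 5-9, never shown during training

    # nu bounds the fraction of training points treated as outliers.
    clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_valid)

    # predict() returns +1 for in-class samples and -1 for outliers.
    print("valid flagged as valid: ", (clf.predict(X_valid) == 1).mean())
    print("unseen flagged as valid:", (clf.predict(X_unseen) == 1).mean())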

Mapping a Plant with Machine Learning

There is a dataset from a plant that produces certain numeric outputs based on numeric inputs. The dataset contains the input values and the output value, recorded every 15 minutes over several years.
Since it would be too expensive to model the physical properties of the system in software, I would like to create a model with machine learning that behaves like the system: when given inputs, the model should produce the output.
As a solution I have tested a feedforward neural network. The results are OK, but in some cases too inaccurate.
What other methods would be available for this problem?
If it's a time series task you could use the NARX architecture of a neural network or an LSTM network. The latter is, like NARX, a recurrent neural network. Matlab offers an implementation of the former.
https://en.m.wikipedia.org/wiki/Nonlinear_autoregressive_exogenous_model
https://en.m.wikipedia.org/wiki/Long_short-term_memory
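If you would rather try the LSTM route in Python than in Matlab, a minimal Keras sketch might look like this (all shapes and data here are stand-ins; the idea is to regress the next output from a sliding window of past samples, e.g. one day at 15-minute resolution):

    import numpy as np
    from tensorflow import keras

    n_inputs, window = 4, 96                    # 96 = one day of 15-min samples
    X = np.random.rand(1000, window, n_inputs)  # stand-in for real sensor data
    y = np.random.rand(1000, 1)                 # stand-in for the plant output

    model = keras.Sequential([
        keras.layers.Input(shape=(window, n_inputs)),
        keras.layers.LSTM(32),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)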
If you "simply" want to fit a polynomial to your data you could use basic linear regression with polynomials of different degree to see which one works best.
Note: It's not called linear because it's only able to fit linear models.
https://en.m.wikipedia.org/wiki/Linear_regression
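A minimal scikit-learn sketch of that degree comparison (the data is a random stand-in for the plant measurements):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    X = np.random.rand(500, 4)  # stand-in for the plant inputs
    y = np.random.rand(500)     # stand-in for the plant output

    # Compare polynomial degrees by cross-validated mean squared error.
    for degree in (1, 2, 3):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        mse = -cross_val_score(model, X, y,
                               scoring="neg_mean_squared_error").mean()
        print(degree, mse)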
Some other possibilities are kernel methods such as kernel ridge regression or SVR. The latter is based on support vector machines, which usually perform quite well (at least for classification, in my personal experience).
If you want to try SVR you can use a small but great library called libSVM. Matlab also offers this.
The following link shows a comparison of these algorithms:
http://scikit-learn.org/stable/auto_examples/plot_kernel_ridge_regression.html
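A minimal sketch of trying both with scikit-learn (stand-in data again; the hyperparameter grids are guesses you would widen):

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVR

    X = np.random.rand(500, 4)
    y = np.random.rand(500)

    # Both models need their kernel hyperparameters tuned.
    krr = GridSearchCV(KernelRidge(kernel="rbf"),
                       {"alpha": [1e-2, 1e-1, 1], "gamma": [0.1, 1, 10]})
    svr = GridSearchCV(SVR(kernel="rbf"),
                       {"C": [1, 10, 100], "gamma": [0.1, 1, 10]})

    for name, model in [("KRR", krr), ("SVR", svr)]:
        model.fit(X, y)
        print(name, model.best_params_, model.best_score_)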
Edit: If I understand this correctly, it's a time series task if you want to predict the outputs at a future time t+1 from a given time t. If so, try the NARX model or an LSTM net.

Reusing Inception V3 Conv Neural Net (tensorflow) with 0% accuracy

EDIT1: My code is the same as here: https://github.com/tensorflow/models/blob/master/inception/inception. The only difference is that I pack my files into TFRecords and feed them batch-wise. Also, the ratio of class 0 to class 1 is 70:30.
I'm currently working on a project in which I'm using the Inception V3 CNN model to train a classifier. Currently, I am working on a binary classifier (predicting either 1 or 0), but my model only predicts class 0 for everything. While troubleshooting I've found that the predicted probability is 100% for class 0 all the time. I have verified everything from the input queuing system to the eval and testing; everything seems to be working well.
Strangely, the loss value decreases in a perfect semi-parabolic fashion, which makes me think the loss has converged to a local minimum. Upon testing, the script only churns out class 0 (with 100% probability) each time. Another thing I've noticed is that the activations across various conv layers are always constant, which could imply that the neurons are just not firing at all.
My questions are:
1. Is my model working? The loss seems to converge, but the activations across the various layers seem stagnant.
2. I am using the training code available from the models section of TensorFlow (https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py), reusing the train, eval and supporting code to train my model with a custom input pipeline I created (which is also working). Can someone guide me in the right direction on this?
Thanks.
I know I am a little late in answering this question 😅
First of all, your model didn't learn anything at all. All it did (cleverly 😂) was predict class 0 for every case, so that it achieves a baseline accuracy of 70% without any effort. (Probably the model was lazy 😪😋) Just kidding. This is a very well-known problem in machine learning, called the class imbalance problem. See http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/.
Apart from the techniques mentioned there, the one technique that works wonders is class weights: basically telling the network to be biased towards the weaker class. In your case the class weights would be class0:class1 = 3:7. This is a hyperparameter too, but it's a good starting point; a sketch of the mechanism follows below.
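As an illustration only (this is a plain Keras model on made-up data, not the Inception training script; the class_weight mapping is the part that matters):

    import numpy as np
    from tensorflow import keras

    # Stand-in data with a 70:30 class split, like the one described above.
    X = np.random.rand(1000, 20)
    y = (np.random.rand(1000) < 0.3).astype(int)

    model = keras.Sequential([
        keras.layers.Input(shape=(20,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Bias the loss towards the weaker class: class0:class1 = 3:7.
    model.fit(X, y, epochs=5, class_weight={0: 3.0, 1: 7.0}, verbose=0)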
Moreover, you didn't give any info about your dataset size, or about whether you are fine-tuning or training from scratch. Without that it's hard to speculate. By default I would suggest fine-tuning.
Also, by loss do you mean the training loss or the validation loss? Training loss by itself says very little about the performance of the model, and in my opinion even training and validation losses together give you little to derive meaningful insights from. Use other metrics like a confusion matrix, F1 score, recall, precision, etc.
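For example, with scikit-learn (the labels here are made up to mimic an always-class-0 model like yours):

    from sklearn.metrics import confusion_matrix, classification_report

    # y_true / y_pred would come from running the model on a held-out set.
    y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # always predicts class 0

    print(confusion_matrix(y_true, y_pred))
    # Precision, recall and F1 per class; accuracy alone hides the imbalance.
    print(classification_report(y_true, y_pred))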
Finally, there is absolutely no single answer to your question. The only way is the hard way: you will learn along with the model 😉. I consider training a NN, especially a CNN, an art in which intuition plays a crucial role, because most of the time the least expected changes give the best results. Anyway, that's the fun part of training a NN.
Happy training 💪
P.S.: Try using visualisation tools like Grad-CAM to check whether the model is looking at the correct part of the image for classification. This is very important!

Advantages of RNN over DNN in prediction

I am going to work on a problem that needs to be addressed with either an RNN or a deep neural net. In general, the problem is predicting financial values. Because I am given a sequence of financial data as input, I thought an RNN would be better. On the other hand, I think that if I can fit the data into some fixed structure, I can train a DNN much more easily, because the training phase is easier for a DNN than for an RNN. For example, I could take the last month's info, keep 30 inputs, and predict the 31st day with a DNN.
I don't understand the advantage of RNNs over DNNs from this perspective. My first question is about the proper usage of an RNN or a DNN for this problem.
My second question is somewhat basic. While training an RNN, isn't it possible for the network to get "confused"? Consider the input 10101111, split into the 2-step sequences (1-0, 1-0, 1-1, 1-1). In the first pairs, a 1 is followed by a 0 several times, and then at the end a 1 is followed by a 1. Wouldn't this become a major problem during training? Why doesn't the system get confused while training on this sequence?
I think your question is phrased a bit problematically. First, DNNs are a class of architectures: a convolutional neural network differs greatly from a deep belief network or a simple deep MLP. There are feedforward architectures (e.g. the TDNN) fit for time series prediction, but it depends on whether you're more interested in research or just in solving your problem.
Second, RNNs are as "deep" as it gets. Consider the most basic RNN, the Elman network: during training with backpropagation through time (BPTT) it is unfolded in time, backpropagating over T timesteps. Since this backpropagation is done not only vertically, as in a standard DNN, but also horizontally over T-1 context layers, the past activations of the hidden layer from up to T-1 timesteps before the present actually contribute to the activation at the current timestep. (The original answer included an illustration of an unfolded net here, which makes this easier to picture.)
This is what makes RNNs so powerful for time series prediction (and should answer both of your questions). If you have more questions, read about Elman networks; LSTMs etc. will only confuse you at this point. Understanding Elman networks and BPTT is the foundation needed to understand any other RNN.
And one last thing you'll need to look out for: the vanishing gradient problem. While it's tempting to say "let's make T = infinity and give our RNN as much memory as possible", it doesn't work. There are many ways of working around this problem; LSTMs are quite popular at the moment, and there are even some proper LSTM implementations around nowadays. But it's important to know that a basic Elman network can really struggle with T = 30.
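A minimal NumPy sketch of the unrolled Elman forward pass (the dimensions are arbitrary; the point is that the hidden state h acts as the context layer carried from one timestep to the next):

    import numpy as np

    T, n_in, n_hidden = 5, 3, 4
    W_xh = 0.1 * np.random.randn(n_in, n_hidden)      # input   -> hidden
    W_hh = 0.1 * np.random.randn(n_hidden, n_hidden)  # context -> hidden
    W_hy = 0.1 * np.random.randn(n_hidden, 1)         # hidden  -> output

    x = np.random.randn(T, n_in)  # one input vector per timestep
    h = np.zeros(n_hidden)        # initial context

    for t in range(T):
        h = np.tanh(x[t] @ W_xh + h @ W_hh)  # depends on the past via W_hh
        y_t = h @ W_hy                       # output at timestep t
    print(y_t)

    # BPTT backpropagates through all T copies of this loop body; the
    # repeated multiplication by W_hh is what makes gradients vanish
    # (or explode) as T grows.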
As you noted yourself, RNNs are for sequences. If the data has a sequential nature (time series), then it is preferable to use such a model over a DNN or other "static" models. The main reason is that an RNN can model the process responsible for each sequence. So, for example, given the sequences
0011100
0111000
0001110
an RNN will be able to build a model that says "after seeing a '1' I will see two more" and correctly build a prediction when seeing
0000001**** -> 0000001110
Meanwhile, for a DNN (and other non-sequential models) there is no relation between these three sequences; in fact, the only thing they have in common is that "there is a 1 in the fourth position, so I guess it is always like that".
Regarding the second question: why won't it get confused? Because it models sequences, because it has memory. It makes its decisions based on everything that was observed before, and assuming your signal has any kind of regularity, there is always some event in the past that differentiates between two possible continuations of the signal. Once again, such phenomena are much better addressed by RNNs than by non-recurrent models. See, for example, natural language and the enormous progress made by LSTM-based models in recent years.
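As a toy sketch of this (assuming Keras; the three sequences above are framed as next-bit prediction, and with so little data this only illustrates the setup, not a benchmark):

    import numpy as np
    from tensorflow import keras

    seqs = ["0011100", "0111000", "0001110"]
    # Input is each bit, target is the next bit in the same sequence.
    X = np.array([[int(c) for c in s[:-1]] for s in seqs], float)[..., None]
    y = np.array([[int(c) for c in s[1:]] for s in seqs], float)[..., None]

    model = keras.Sequential([
        keras.layers.Input(shape=(6, 1)),
        keras.layers.SimpleRNN(8, return_sequences=True),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X, y, epochs=200, verbose=0)

    # The recurrent state lets the prediction after a '1' depend on how
    # many '1's were already seen, which a fixed-window DNN cannot infer
    # from position alone.
    print(model.predict(X, verbose=0).round(2))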

Different weights for different classes in neural networks and how to use them after learning

I trained a neural network using the Backpropagation algorithm. I ran the network 30 times manually, each time changing the inputs and the desired output. The outcome is that of a traditional classifier.
I tried it out with 3 different classifications. Since I ran the network 30 times, with 10 inputs for each class, I ended up with 3 distinct sets of weights; runs for the same classification produced very similar weights with a very small amount of error. The network has therefore proven itself to have learned successfully.
My question is: now that the learning is complete and I have 3 distinct sets of weights (1 for each classification), how could I use these in a regular feedforward network so that it can classify input automatically? I searched around to check whether you can somehow average out the weights, but it looks like this is not possible. Some people mentioned bootstrapping the data.
Have I done something wrong during the backpropagation learning process? Or is there an extra step that needs to be done after the learning process with these different weights for different classes?
One way I am imagining this is by implementing a regular feedforward network which holds all 3 sets of weights. There would be 3 outputs, and for any given input, one of the output neurons would fire, meaning the given input is mapped to that particular class.
The network architecture is as follows:
3 inputs, 2 hidden neurons, 1 output neuron
Thanks in advance
It does not make sense to train the network on only one class at a time: the hidden layer learns weight combinations that distinguish which class the input data belongs to. Training each class separately makes the three sets of weights independent of one another, so the network won't know which learned weights to use when a new test input is given.
Use a vector as the output to represent the three different classes, and train on all the data together.
EDIT
P.S. I don't think the post you linked is relevant to your case. The question in that post arises from different (random) weight initialization in neural network training. People sometimes set a seed to make the weight learning reproducible and avoid such a problem.
In addition to the response by nikie, another possibility is to represent the output as one (unique) output unit with continuous values. For example, the ANN classifies into the first class if the output is in the [0, 1) interval, the second if it is in the [1, 2) interval, and the third if it is in [2, 3). This architecture is reported in the literature (and verified in my experience) to be less efficient than the discrete representation with 3 neurons.
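A minimal sketch of the recommended one-hot output encoding (the class names and output values here are made up):

    import numpy as np

    # One network trained on all the data, with one-hot target vectors:
    targets = {
        "class A": np.array([1, 0, 0]),
        "class B": np.array([0, 1, 0]),
        "class C": np.array([0, 0, 1]),
    }

    # At prediction time the strongest output neuron decides the class.
    output = np.array([0.12, 0.81, 0.07])  # hypothetical network output
    predicted = list(targets)[int(np.argmax(output))]
    print(predicted)  # -> class B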
