neural networks for missing features - machine-learning

I have a dataset with features A...F for training. Now my prediction dataset, from which I want to predict the key feature, has no observations for 3 of the features used in the training set. So I have only a subset of features for prediction, whereas the neural network was trained on the broader set of features.
How can I handle such a problem? Can I use a neural network for the missing features? The following came to mind: first, I use a neural network on the training set, but this time trained to predict the missing features, so I can fill in the 3 missing features of the prediction dataset. Then I use the original neural network on this completed prediction dataset.

Have you tried running the neural network on your dataset even though features are missing? A neural network does not need all features to be present.
You can simply set all missing feature values to 0 for the neural network, as neural networks don't see a difference between 0 and a missing feature. Why? If you set an input value to 0, all the connections from that input node will carry a value of 0 as well, adding nothing to the hidden neurons connected to that input node.
But before you do that, try any of the approaches listed in the linked source, as the first one seems to be the case for you!
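If you go the zero-filling route, a minimal sketch of the idea (using scikit-learn's MLPRegressor and made-up column indices, not the asker's actual data) could look like this:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    # Toy training data: 6 features (A..F) and one target.
    X_train = rng.normal(size=(500, 6))
    y_train = X_train @ np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0])

    model = MLPRegressor(hidden_layer_sizes=(12,), max_iter=2000, random_state=0)
    model.fit(X_train, y_train)

    # At prediction time only features A..C are observed (columns 0..2 here);
    # columns 3..5 stand in for the unobserved features.
    X_pred_partial = rng.normal(size=(10, 3))
    X_pred_full = np.zeros((X_pred_partial.shape[0], 6))
    X_pred_full[:, :3] = X_pred_partial        # missing columns stay at 0
    print(model.predict(X_pred_full))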

Related

Adding more output neurons to a neural network after training?

Is there a problem with adding more output neurons after finishing training my neural network?
For example, I teach my neural network to see oranges and apples and say which one is an apple and which is an orange, with shade, shape, and texture as inputs and orange and apple as outputs, so there are 3 inputs and 2 outputs.
What if, after training, I want to add two more outputs, let's say banana and strawberry? If I do that, does my neural network's previous learning fail? Am I doing something wrong here, or is it safe to do?
You will most likely need to re-train the network from scratch, incorporating the old and new data and four classes instead of two. If you try to add new classes to the existing network, you are liable to run into what is called catastrophic forgetting. However, you may be fine with only re-training the final classifier, or fine-tuning from the previously learned weights.
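If you want to try the second option (re-training only the final classifier while keeping the learned hidden layer), a rough Keras-style sketch might look like this; the layer sizes and names are made up for illustration:

    from tensorflow import keras

    # Original network: 3 inputs (shade, shape, texture), 2 output classes (apple, orange).
    hidden = keras.layers.Dense(8, activation="relu")
    old_model = keras.Sequential([
        keras.Input(shape=(3,)),
        hidden,
        keras.layers.Dense(2, activation="softmax"),
    ])
    # ... assume old_model has been trained here ...

    # Keep the learned hidden layer, replace the classifier head with 4 outputs
    # (apple, orange, banana, strawberry), then re-train on old + new data.
    hidden.trainable = False   # or True, to fine-tune from the previous weights
    new_model = keras.Sequential([
        keras.Input(shape=(3,)),
        hidden,
        keras.layers.Dense(4, activation="softmax"),
    ])
    new_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # new_model.fit(X_four_classes, y_four_classes, epochs=...)  # old + new data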

Can a neural network have integer inputs?

I built a neural network whose input is a mixture of integers and booleans, but it did not converge. I have seen many examples on the internet, and every one of them has inputs in boolean form. So is it possible to build a neural network with a mixture of inputs, or with integer inputs?
Indeed, it is. What you probably need to do is normalize your inputs. For example, you could divide each feature's value by the maximum value you expect to see for that feature, so that everything lies in the range (-1, 1).
Some links to understand normalization of inputs:
Why do we have to normalize the input for an artificial neural network?
https://www.researchgate.net/post/How_can_I_normalize_input_and_output_data_in_training_neural_networks
Another, more recent way to keep values normalized inside the network is batch normalization.
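For instance, a minimal sketch of simple min-max scaling of mixed integer/boolean inputs (plain NumPy, toy values):

    import numpy as np

    # Mixed integer/boolean features, one row per sample.
    X = np.array([
        [150, 3, 1],
        [ 90, 7, 0],
        [210, 1, 1],
    ], dtype=float)

    # Scale each column to [0, 1] using the column's min and max.
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    X_scaled = (X - col_min) / np.where(col_max > col_min, col_max - col_min, 1.0)
    print(X_scaled)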

How to propagate uncertainty into the prediction of a neural network?

I have inputs x_1, ..., x_n that have known 1-sigma uncertainties e_1, ..., e_n. I am using them to predict outputs y_1, ..., y_m on a trained neural network. How can I obtain 1-sigma uncertainties on my predictions?
My idea is to randomly perturb each input x_i with normal noise having mean 0 and standard deviation e_i a large number of times (say, 10000), and then take the median and standard deviation of each prediction y_i. Does this work?
I fear that this only takes into account the "random" error (from the measurements) and not the "systematic" error (from the network), i.e., each prediction inherently has some error to it that is not being considered in this approach. How can I properly obtain 1-sigma error bars on my predictions?
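For reference, the perturbation scheme described above might look roughly like this; the model here is a stand-in function, not a real trained network:

    import numpy as np

    rng = np.random.default_rng(0)

    def model_predict(x):
        # Stand-in for the trained network; returns m = 2 outputs.
        return np.array([x.sum(), (x ** 2).sum()])

    x = np.array([1.0, 2.0, 3.0])      # measured inputs x_1..x_n
    e = np.array([0.1, 0.05, 0.2])     # their 1-sigma uncertainties e_1..e_n

    # Monte Carlo propagation of the input uncertainty only
    # (this does not capture the network's own uncertainty).
    samples = np.stack([model_predict(x + rng.normal(0.0, e)) for _ in range(10000)])
    y_median = np.median(samples, axis=0)
    y_sigma = samples.std(axis=0)
    print(y_median, y_sigma)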
You can get a general analysis of what "jittering" (generation of random samples) brings to the neural network optimization here http://wojciechczarnecki.com/pdfs/preprint-ml-with-unc.pdf
In short, jittering acts as a regularization on the network's weights.
For error bars as such, you should refer to the works of Will Penny:
http://www.fil.ion.ucl.ac.uk/~wpenny/publications/error_bars.ps
http://www.fil.ion.ucl.ac.uk/~wpenny/publications/nnerrors.ps
You're right. That method only takes the data uncertainty into account (assuming you don't fit the neural net while applying the noise). As a side note, when fitting the data with a neural net you may alternatively apply mixture density networks (see one of the many tutorials).
More importantly, in order to account for model uncertainty you should apply Bayesian neural nets. You could start, e.g., with Monte Carlo dropout. Also very interesting is this work on performing sampling-free inference when using Monte Carlo dropout:
https://arxiv.org/abs/1908.00598
This work explicitly uses error propagation through neural networks and should be very interesting for you!
Best
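As a rough sketch of the Monte Carlo dropout idea in Keras (keeping dropout active at prediction time by calling the model with training=True; the architecture is made up for illustration):

    import numpy as np
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(5,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(1),
    ])
    # ... assume the model has been trained here ...

    x = np.random.normal(size=(1, 5)).astype("float32")

    # Keep dropout active at inference time and sample repeatedly.
    preds = np.stack([model(x, training=True).numpy() for _ in range(1000)])
    print(preds.mean(axis=0), preds.std(axis=0))   # predictive mean and spread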

Having some troubles with the PyBrain Neural Network regression function

I would appreciate some insight into the workings of PyBrain's neural network. I have a dataset of different household features that correspond to a certain household income. The task is to create a regression based on neural networks to be able to predict the income for given features.
I've tried the simple constructor
pybrain.tools.shortcuts.buildNetwork(feature_count, 12, 1, recurrent=False)
and it kind of works. But if I change the hidden layer to use GaussianLayer or LinearLayer, I get NaNs as output during the training phase.
Is there maybe something else that needs to be taken care of when using these layers (I am guessing maybe feature selection, when features correlate)?
Thanks
I solved a neural network regression problem using pybrain where I had to forecast the load on a power station using weather features. This appears to be the same problem as yours, except in application. I followed the guide here: http://fastml.com/pybrain-a-simple-neural-networks-library-in-python/ which brought me 90% of the way towards the final solution. I had 8 inputs and one output.
One "gotcha" I found was that I had to normalise my input values to the range 0 -> 1. The MSE value would not decrease on each epoch otherwise. Also, if any of my input values were NaN, I got continuous NaN values out.
I hope this helps.
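For completeness, a minimal PyBrain sketch along those lines (written from memory of the PyBrain API; the hidden-layer class, hyperparameters, and toy data are illustrative, not the original poster's setup):

    import random
    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer
    from pybrain.structure import TanhLayer

    random.seed(0)
    feature_count = 8

    # Toy data: features already scaled to [0, 1]; a single NaN anywhere poisons training.
    ds = SupervisedDataSet(feature_count, 1)
    for _ in range(200):
        row = [random.random() for _ in range(feature_count)]
        ds.addSample(row, (sum(row) / feature_count,))

    net = buildNetwork(feature_count, 12, 1, hiddenclass=TanhLayer, bias=True)
    trainer = BackpropTrainer(net, ds, learningrate=0.01, momentum=0.1)
    for epoch in range(50):
        print(epoch, trainer.train())   # train() returns the epoch's mean error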

Echo state neural network?

Is anyone here familiar with echo state networks? I created an echo state network in C#. The aim is just to classify inputs into GOOD and NOT GOOD ones. The input is an array of double values. I know that an echo state network may not be the best choice for this classification, but I have to do it with this method.
My problem is that, after training, the network cannot generalize. When I run the network on foreign data (not the teaching input), I get only around 50-60% correct results.
More details: my echo state network must work like a function approximator. The input of the function is an array of 17 double values, and the output is 0 or 1 (I have to classify the input as bad or good).
So I have created a network. It contains an input layer with 17 neurons, a reservoir layer whose neuron count is adjustable, and an output layer containing 1 neuron for the required 0 or 1 output. In the simpler setup, no output feedback is used (I tried output feedback as well, but nothing changed).
The inner matrix of the reservoir layer is adjustable too. I generate weights between two double values (min, max) with an adjustable sparseness ratio. If the values are too big, the matrix is rescaled to have a spectral radius lower than 1. The reservoir layer can have sigmoid or tanh activation functions.
The input layer is fully connected to the reservoir layer with random values. In the training stage I compute the inner reservoir activations X(n) on the training data, collecting them into a matrix row-wise. Using the desired output data matrix (which here is a vector of 1 or 0 values), I calculate the output weights (from reservoir to output); the reservoir is fully connected to the output. Anyone who has used echo state networks knows what I am talking about. I use the pseudo-inverse method for this.
The question is, how can I adjust the network so it generalizes better and hits more than 50-60% of the desired outputs on a foreign dataset (not the training one)? If I run the network again on the training dataset, it gives very good results, 80-90%, but what I want is better generalization.
I hope someone has had this issue with echo state networks too.
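For readers unfamiliar with the procedure described in the question, here is a minimal NumPy sketch of the state collection and pseudo-inverse readout on toy data (no input scaling, washout, or output feedback; purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_res, n_samples = 17, 200, 500

    # Toy data: 17 input values per sample, binary target.
    U = rng.normal(size=(n_samples, n_in))
    y = (U.sum(axis=1) > 0).astype(float)

    # Random input and reservoir weights; rescale reservoir to spectral radius < 1.
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()

    # Collect the reservoir states row-wise.
    X = np.zeros((n_samples, n_res))
    x = np.zeros(n_res)
    for t in range(n_samples):
        x = np.tanh(W_in @ U[t] + W @ x)
        X[t] = x

    # Output weights via the pseudo-inverse (plain least squares, no regularization).
    W_out = np.linalg.pinv(X) @ y

    # Classify by thresholding the linear readout at 0.5.
    pred = (X @ W_out > 0.5).astype(float)
    print("training accuracy:", (pred == y).mean())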
If I understand correctly, you have a set of known, classified data that you train on, then you have some unknown data which you subsequently classify. You find that after training, you can reclassify your known data well, but can't do well on the unknown data. This is, I believe, called overfitting - you might want to think about being less stringent with your network, reducing node number, and/or training based on a hidden dataset.
The way people do it is, they have a training set A, a validation set B, and a test set C. You know the correct classification of A and B but not C (because you split up your known data into A and B, and C are the values you want the network to find for you). When training, you only show the network A, but at each iteration, to calculate success you use both A and B. So while training, the network tries to understand a relationship present in both A and B, by looking only at A. Because it can't see the actual input and output values in B, but only knows if its current state describes B accurately or not, this helps reduce overfitting.
Usually people seem to split 4/5 of data into A and 1/5 of it into B, but of course you can try different ratios.
In the end, you finish training, and see what the network will say about your unknown set C.
Sorry for the very general and basic answer, but perhaps it will help describe the problem better.
If your network doesn't generalize, that means it's overfitting.
To reduce overfitting on a neural network, there are two ways:
get more training data
decrease the number of neurons
You also might think about the features you are feeding the network. For example, if it is a time series that repeats every week, then one feature is something like the 'day of the week' or the 'hour of the week' or the 'minute of the week'.
Neural networks need lots of data. Lots and lots of examples. Thousands. If you don't have thousands, you should choose a network with just a handful of neurons, or else use something else, like regression, that has fewer parameters, and is therefore less prone to overfitting.
Like the other answers here have suggested, this is a classic case of overfitting: your model performs well on your training data, but it does not generalize well to new test data.
Hugh's answer has a good suggestion, which is to reduce the number of parameters in your model (i.e., by shrinking the size of the reservoir), but I'm not sure whether it would be effective for an ESN, because the problem complexity that an ESN can solve grows proportional to the logarithm of the size of the reservoir. Reducing the size of your model might actually make the model not work as well, though this might be necessary to avoid overfitting for this type of model.
Superbest's solution is to use a validation set to stop training as soon as performance on the validation set stops improving, a technique called early stopping. But, as you noted, because you use offline regression to compute the output weights of your ESN, you cannot use a validation set to determine when to stop updating your model parameters---early stopping only works for online training algorithms.
However, you can use a validation set in another way: to regularize the coefficients of your regression! Here's how it works:
Split your training data into a "training" part (usually 80-90% of the data you have available) and a "validation" part (the remaining 10-20%).
When you compute your regression, instead of using vanilla linear regression, use a regularized technique like ridge regression, lasso regression, or elastic net regression. Use only the "training" part of your dataset for computing the regression.
All of these regularized regression techniques have one or more "hyperparameters" that balance the model fit against its complexity. The "validation" dataset is used to set these parameter values: you can do this using grid search, evolutionary methods, or any other hyperparameter optimization technique. Generally speaking, these methods work by choosing values for the hyperparameters, fitting the model using the "training" dataset, and measuring the fitted model's performance on the "validation" dataset. Repeat N times and choose the model that performs best on the "validation" set.
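A minimal sketch of those steps with a ridge-regression readout on collected reservoir states (scikit-learn, toy arrays; the candidate alpha values are arbitrary):

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X_states = rng.normal(size=(500, 200))    # collected reservoir activations, one row per sample
    y = (X_states[:, 0] > 0).astype(float)    # toy 0/1 targets

    # Step 1: split into a "training" and a "validation" part.
    X_tr, X_val, y_tr, y_val = train_test_split(X_states, y, test_size=0.2, random_state=0)

    # Steps 2-3: fit a regularized readout for each candidate hyperparameter value
    # and keep the one that performs best on the validation part.
    best_alpha, best_score = None, -np.inf
    for alpha in [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]:
        readout = Ridge(alpha=alpha).fit(X_tr, y_tr)
        score = readout.score(X_val, y_val)   # R^2 on the validation part
        if score > best_score:
            best_alpha, best_score = alpha, score

    print("chosen alpha:", best_alpha)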
You can learn more about regularization and regression at http://en.wikipedia.org/wiki/Least_squares#Regularized_versions, or by looking it up in a machine learning or statistics textbook.
Also, read more about cross-validation techniques at http://en.wikipedia.org/wiki/Cross-validation_(statistics).

Resources