Neural Networks normalizing output data - machine-learning

I have training data for a NN along with expected outputs. Each input is a 10-dimensional vector and has 1 expected output. I have normalised the training data using Gaussian normalisation, but I don't know how to normalise the outputs since they have only a single dimension. Any ideas?
Example:
Raw Input Vector: -128.91, 71.076, -100.75, 4.2475, -98.811, 77.219, 4.4096, -15.382, -6.1477, -361.18
Normalised Input Vector: -0.6049, 1.0412, -0.3731, 0.4912, -0.3571, 1.0918, 0.4925, 0.3296, 0.4056, -2.5168
The raw expected output for the above input is 1183.6 but I don't know how to normalise that. Should I normalise the expected output as part of the input vector?

From the looks of your problem, you are trying to implement some sort of regression algorithm. For regression problems you don't normally normalize the outputs. For the training data you provide for a regression system, the expected output should be within the range you're expecting, or simply whatever data you have for the expected outputs.
Therefore, you can normalize the training inputs to allow the training to go faster, but you typically don't normalize the target outputs. When it comes to testing time or providing new inputs, make sure you normalize the data in the same way that you did during training. Specifically, use exactly the same normalization parameters that you used during training for any test inputs into the network.

One important remark is that you normalized the elements of a single input vector. With a one-dimensional output space, you cannot normalize the output that way.
The correct way is to take a complete batch of training data, say N input (and output) vectors, and normalize each dimension (variable) individually, using the N samples. Thus, for a one-dimensional output, you will have N samples for normalization. In this way, the vector space of your input will not be distorted.
Normalization of the output dimension is usually required when the scales of the output variables are significantly different. After training, you should use the same set of normalization parameters (e.g., for z-score these are "mean" and "std") that you obtained from the training data. In this way, you will put new (unseen) data into the same scale space that you used in training.
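As a rough illustration of the per-dimension approach described above, here is a minimal NumPy sketch. The toy batch is hypothetical (only the target value 1183.6 comes from the question), and `denormalize_output` is a name introduced here for illustration.

```python
import numpy as np

# Hypothetical toy batch: N = 5 samples, 3 input dimensions, 1 output each.
X_train = np.array([
    [-128.9, 71.1, -100.8],
    [-90.2, 65.4, -80.1],
    [-150.3, 80.0, -120.5],
    [-110.7, 60.2, -95.3],
    [-100.0, 75.5, -105.0],
])
y_train = np.array([1183.6, 950.2, 1400.1, 1020.7, 1100.4])

# Statistics are computed per column, across the N samples (axis=0),
# not across the elements of a single vector.
x_mean, x_std = X_train.mean(axis=0), X_train.std(axis=0)
y_mean, y_std = y_train.mean(), y_train.std()

X_norm = (X_train - x_mean) / x_std
y_norm = (y_train - y_mean) / y_std


def denormalize_output(y_scaled):
    """Map a value on the normalized target scale back to the original scale."""
    return y_scaled * y_std + y_mean


y_restored = denormalize_output(y_norm)
```

If the targets are normalized like this, predictions come out on the normalized scale, so they would need to go through something like `denormalize_output` before being compared with raw values.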

Related

Predicting inputs in neural network

Is it possible to predict inputs in "Keras neural network" for a particular output?
For example, I have a dataset with 28 inputs and 3 outputs. I have trained a model in Keras, and it works fine. Now I want to specify particular output values and predict which inputs would produce them.
I'm not 100% sure I understand the question correctly, but if you're trying to build a model that takes outputs and predicts inputs, then you will need to train a second model in which the roles are swapped: the original outputs become the inputs, and the original inputs become the targets. Although this might be annoying, you might have to build a separate network to predict each of your input variables.
To get around this problem, you can consider autoencoders if you're okay with getting a close approximation of the input. An autoencoder is an unsupervised artificial neural network that learns how to efficiently compress and encode data then learns how to reconstruct the data back from the reduced encoded representation to a representation that is as close to the original input as possible (you can read more here: https://towardsdatascience.com/auto-encoder-what-is-it-and-what-is-it-used-for-part-1-3e5c6f017726).
Yes it is definitely possible to predict inputs from the output. In fact, what you're describing is essentially an autoencoder.
Let's say you have a NN trained on MNIST. If you then use the outputs of the classification layer to train the decoder of an autoencoder, you will get a rough indication of the input.
However, this is not the best way to do it. The best way is to simply treat the latent space as the "output", then feed this output into:
a) a one-layer classifier, to give you the predicted output, and
b) the decoder.
This will give you the predicted output and a reconstruction of the original image.

Why do we want to scale outputs when using dropout?

From the dropout paper:
"The idea is to use a single neural net at test time without dropout.
The weights of this network are scaled-down versions of the trained
weights. If a unit is retained with probability p during training, the
outgoing weights of that unit are multiplied by p at test time as
shown in Figure 2. This ensures that for any hidden unit the expected
output (under the distribution used to drop units at training time) is
the same as the actual output at test time."
Why do we want to preserve the expected output? If we use ReLU activations, linear scaling of weights or activations results in linear scaling of network outputs and does not have any effect on the classification accuracy.
What am I missing?
To be precise, we want to preserve not the "expected output" but the expected value of the output, that is, we want to make up for the difference in training (when we don't pass values of some nodes) and testing phases by preserving mean (expected) values of outputs.
In the case of ReLU activations, this scaling indeed leads to linear scaling of outputs (when they are positive), but why do you think it doesn't affect the final accuracy of a classification model? At least at the end, we usually apply either softmax or sigmoid, which are non-linear and depend on this scaling.
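To make the "expected value of the output" concrete, here is a small NumPy sketch for a single output unit. All numbers (`p`, `h`, `w`, `logits`) are made up for illustration. It computes the exact expectation over every dropout mask and compares it with the test-time rule of multiplying the weights by p, then shows how a softmax reacts to rescaled inputs.

```python
import itertools

import numpy as np

p = 0.5  # probability that a hidden unit is retained (assumed value)
h = np.array([1.0, 2.0, 0.5, 3.0])   # hidden activations (hypothetical)
w = np.array([0.2, -0.4, 1.0, 0.3])  # outgoing weights to one output unit

# Exact expected pre-activation under dropout: sum over all 2^4 masks,
# each weighted by its probability.
expected_train = 0.0
for bits in itertools.product([0, 1], repeat=len(h)):
    mask = np.array(bits)
    kept = mask.sum()
    prob = p**kept * (1 - p) ** (len(h) - kept)
    expected_train += prob * np.dot(mask * h, w)

# Test time: no units are dropped, and the weights are multiplied by p.
test_output = np.dot(h, p * w)


def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


logits = np.array([2.0, 1.0, 0.0])
probs_unscaled = softmax(logits)
probs_scaled = softmax(p * logits)
```

Here `expected_train` and `test_output` agree, which is the property the paper's quote describes. In this toy case the rescaled logits keep the same largest entry but yield different softmax probabilities, so anything that depends on the probabilities themselves (the loss, confidence thresholds) does see the scaling.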

Neural Network training data normalisation vs. runtime input data

I'm starting to learn about neural networks and I came across data normalisation. I understand the need for it but I don't quite know what to do with my data once my model is trained and in the field.
Let's say I take my input data, subtract its mean and divide by the standard deviation. Then I use that as input and train my neural network.
Once in the field, what do I do with my input sample on which I want a prediction?
Do I need to keep my training data mean and standard deviation and use that to normalise?
Correct. The mean and standard deviation that you use to normalize the training data are the same ones that you use to normalize the testing data (i.e., don't compute a separate mean and standard deviation for the test data).
Hopefully this link will give you more helpful info: http://cs231n.github.io/neural-networks-2/
An important point to make about the preprocessing is that any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. Instead, the mean must be computed only over the training data and then subtracted equally from all splits (train/val/test).
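A minimal sketch of that rule, assuming synthetic data and a simple 60/20/20 split (both chosen here purely for illustration): the statistics come from the training split only and are then reused for every other split and for samples seen in the field.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50.0, scale=10.0, size=(100, 4))  # synthetic data

# Split first, then compute statistics on the training split only.
train, val, test = data[:60], data[60:80], data[80:]

mean = train.mean(axis=0)
std = train.std(axis=0)


def standardize(x):
    """Apply the stored training-set statistics to any split or new sample."""
    return (x - mean) / std


train_n = standardize(train)
val_n = standardize(val)
test_n = standardize(test)

# A single sample arriving at prediction time is handled the same way.
new_sample = np.array([48.0, 55.0, 61.0, 39.0])
new_sample_n = standardize(new_sample)
```

Only `mean` and `std` need to be kept alongside the trained model; the training data itself is not needed at prediction time.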

Is there a need to normalise input vector for prediction in SVM?

For input data on different scales, I understand that the values used to train the classifier (an SVM) have to be normalized for correct classification.
So does the input vector for prediction also need to be normalized?
The scenario that I have is that the training data is normalized, serialized and saved in the database. When a prediction has to be made, the serialized data is deserialized to get the normalized numpy array, the classifier is fit on that array, and the input vector is then passed to the classifier for prediction. So does this input vector also need to be normalized? If so, how do I do it, since at prediction time I don't have the original training data to normalize against?
Also, I am normalizing along axis=0, i.e. along the column.
My code for normalizing is:
preprocessing.normalize(data, norm='l2', axis=0)
Is there a way to serialize preprocessing.normalize?
For SVMs, using a scaler is recommended for several reasons.
Many optimization methods work better when all features are on the same scale.
Many kernel functions internally use a Euclidean distance to compare two samples (in the Gaussian kernel, the Euclidean distance appears in the exponential term). If every feature has a different scale, the Euclidean distance is dominated by the features with the largest scale.
To put the features on the same scale, subtract the mean and divide by the standard deviation:
$$x_i \rightarrow \frac{x_i - m_i}{\sigma_i}$$
You must store the mean and standard deviation of every feature in the training set so that you can apply the same operations to future data.
In Python you have functions to do that for you:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
To obtain the means and standard deviations:
scaler = preprocessing.StandardScaler().fit(X)
Then, to normalize the training set (X is a matrix where every row is a sample and every column a feature):
X = scaler.transform(X)
After training, you must normalize future data before classification:
newData = scaler.transform(newData)
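On the serialization part of the question: `preprocessing.normalize` is a plain function and keeps no fitted parameters to reuse later, whereas a fitted `StandardScaler` is an ordinary Python object. One possible approach, sketched below with made-up data, is to pickle the fitted scaler and store the resulting bytes next to the model. Using `pickle` here is an assumption about what fits your storage setup, not the only option.

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training matrix: rows are samples, columns are features.
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])

scaler = StandardScaler().fit(X)  # learns per-feature mean and std
X_scaled = scaler.transform(X)

# Serialize the fitted scaler; these bytes could be saved in a database.
blob = pickle.dumps(scaler)

# Later, at prediction time: restore the scaler and reuse its parameters.
restored = pickle.loads(blob)
new_data = np.array([[2.5, 500.0]])
new_scaled = restored.transform(new_data)
```

With this, the raw training data is not needed at prediction time; the restored scaler carries the training means and standard deviations in `mean_` and `scale_`.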

Qualitative Classification in Neural Network on Weka

I have a training set where the input vectors are speed, acceleration and turn angle change. The output is a crisp class: an activity state from the given set {rest, walk, run}. E.g., for input vectors [3.1 1.2 2] --> run; [2.1 1 1] --> walk, and so on.
I am using Weka to develop a Neural Network model. I am defining the outputs as crisp ones (or rather qualitative ones in words, i.e. categorical values). After training, the model classifies the test data fairly well.
I was wondering how the internal process (the mapping function) works. Are the qualitative output states assigned some nominal value inside the model and, after processing, converted back to categorical data? As far as I understand, a NN model cannot map float input values to categorical data through hidden neurons, so what is actually happening, given that the model works fine?
If the model converts the categorical outputs into numerical ones before processing, on what basis does it assign these arbitrary numerical values to the categories?
Yes, categorical values are usually converted to numbers, and the network learns to associate input data with these numbers. However, these numbers are often further encoded so that the output does not rely on a single neuron. The most common way to do this, for unordered labels, is to add a dedicated dummy output neuron for each category and use 1-of-C encoding, with 0.1 and 0.9 as target values. The output is interpreted using the winner-take-all paradigm.
Using only one neuron and encoding categories with different numbers for unordered labels often leads to problems - as the network will treat middle categories as "averages" of the boundary categories. This however may sometimes be desired, if you have ordered categorical data.
You can find very good explanation of this issue in this part of the online Neural Network FAQ.
The neural net's computations all take place on continuous values. To do multiclass classification with discrete output, its final layer produces a vector of such values, one for each class. To make a discrete class prediction, take the index of the maximum element in that vector.
So if the final layer in a classification network for four classes predicts [0 -1 2 1], then the third element of the vector is the largest and the third class is selected. Often, these values are also constrained to form a probability distribution by means of a softmax activation function.
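The encoding and decoding steps described in these answers can be sketched in NumPy as follows. This is only an illustration, not what Weka does internally: `one_hot` and `softmax` are helper names introduced here, and the targets use plain 0/1 rather than the 0.1/0.9 values mentioned above.

```python
import numpy as np

classes = ["rest", "walk", "run"]


def one_hot(label):
    """1-of-C encoding: one output position per category."""
    vec = np.zeros(len(classes))
    vec[classes.index(label)] = 1.0
    return vec


def softmax(z):
    """Turn raw final-layer values into a probability distribution."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


# Training target for the class "run".
target_run = one_hot("run")

# Decoding: the four-class example from the answer above.
final_layer = np.array([0.0, -1.0, 2.0, 1.0])
predicted_index = int(np.argmax(final_layer))  # index 2, i.e. the third class
probabilities = softmax(final_layer)
```

Because softmax preserves the ordering of its inputs, taking the argmax of `probabilities` selects the same class as taking the argmax of `final_layer`.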
