Data normalization for new inputs into a trained neural network

I have a backpropagation neural network that I have created and coded in Q with a kdb+ database.
I pre-process the input data by normalizing it into the range [0.1, 0.9]. The network is trained to predict future moving averages on a large data set, split 60:20:20 into training, validation, and test sets.
Normalization formula:
processed = (0.8 * (VALn - MINn) / (MAXn - MINn)) + 0.1
VALn = unprocessed data value
MAXn = max of the data set
MINn = min of the data set
How do I go about normalizing new data into the final trained network?
Would I run new inputs through the above formula keeping the MIN and MAX values from the training set?
Thanks

You should keep the same MAXn and MINn, since changing them at test time would change how raw data is mapped to processed data. For a quick check, try preprocessing with a different MAXn and MINn and then predict the training cases: you will get lower performance, since the normalized data no longer look the way they did during training.
Note that if you happen to have data in the test set that is higher/lower than MAXn/MINn, then those data will not be in the range [0,1] after normalization. This is generally okay if there are not too many such cases; it simply means the neural net is seeing data slightly outside the previously seen range.
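As a quick sketch of this (in Python rather than q, with made-up numbers), reusing the training-set MINn and MAXn for new data looks like:

```python
def normalize(val, min_n, max_n):
    """Map val into [0.1, 0.9] using the training-set min/max."""
    return 0.8 * (val - min_n) / (max_n - min_n) + 0.1

# Compute MINn and MAXn once, from the training data only.
train = [10.0, 12.0, 15.0, 20.0]
min_n, max_n = min(train), max(train)

# New (test-time) values reuse the SAME min_n/max_n.
print(normalize(14.0, min_n, max_n))  # ~0.42

# A value outside the training range falls outside [0.1, 0.9],
# which is generally fine if such cases are rare.
print(normalize(22.0, min_n, max_n))  # ~1.06
```

The same idea translates directly to q: store the training min/max alongside the model and apply them to every incoming value.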

Yes, for prediction you should use the same normalization formula as you used for normalizing the training data.

Related

Machine Learning Predictions and Normalization

I am using z-score to normalize my data before training my model. When I do predictions on a daily basis, I tend to have very few observations each day, perhaps just a dozen or so. My question is, can I normalize the test data just by itself, or should I attach it to the entire training set to normalize it?
The reason I am asking is, the normalization is based on mean and std_dev, which obviously might look very different if my dataset consists only of a few observations.
You need to have all of your data in the same units. Among other things, this means that you need to use the same normalization transformation for all of your input. You don't need to include the new data in the training per se -- however, keep the parameters of the normalization (the m and b of y = mx + b) and apply those to the test data as you receive them.
It's certainly not a good idea to predict on a test set using a model trained with a very different data distribution. I would use the same mean and std of your training data to normalize your test set.
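As a minimal sketch (plain Python, made-up numbers) of what "use the same mean and std" means for a small daily batch:

```python
from statistics import mean, pstdev

# Compute mean and std ONCE, from the full training data.
train = [4.0, 8.0, 6.0, 2.0, 10.0]
mu, sigma = mean(train), pstdev(train)

def zscore(x):
    """Normalize a new observation with the stored training statistics."""
    return (x - mu) / sigma

# A dozen-or-fewer daily observations are transformed with the SAME
# mu/sigma -- never with statistics computed from the tiny batch itself.
daily_batch = [5.0, 9.0]
normalized = [zscore(x) for x in daily_batch]
```

This sidesteps the problem entirely: the batch size no longer matters, because the batch never contributes to the normalization parameters.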

Data Science Scaling/Normalization real case

When doing data pre-processing, it is suggested to do either scaling or normalization. That is easy when you have all the data on hand and can do it right away. But after the model is built and running, does the first data point that comes in need to be scaled or normalized? If so, and it is only a single row, how do we scale or normalize it? How do we know the min/max/mean/stdev of each feature, and where does the incoming data fall relative to the min/max/mean of each feature?
Please advise
First of all, you should know when to use scaling and when to use normalization.
Scaling - scaling is nothing but transforming your features to comparable magnitudes. Say you have a feature like a person's income, and you notice that some values are of order 10^3 and some are of order 10^6. If you model your problem with these features, then algorithms like KNN and Ridge Regression will give higher weight to the higher-magnitude attributes. To prevent this you need to scale your features first. Min-max scaling is one of the most used scalers.
Mean Normalisation - If, after examining the distribution of a feature, you find that it is not centered around zero, then for algorithms like SVM, whose objective function already assumes zero mean and same-order variance, you could have a problem in modeling. In that case you should do mean normalisation.
Standardization - For algorithms like SVM, neural networks, and logistic regression, it is necessary for the features' variances to be of the same order, so why not make them one? In standardization, we transform each feature's distribution to zero mean and unit variance.
Now let's try to answer your question in terms of the training and testing sets.
Let's say you are training your model on a 50k dataset and testing on a 10k dataset.
For the above three transformations, the standard approach says that you should fit any normalizer or scaler on the training dataset only, and use only transform on the testing dataset.
In our case, if we want to use standardization, we first fit our standardizer on the 50k training dataset and then use it to transform both the 50k training dataset and the testing dataset.
Note - We shouldn't fit our standardizer to the test dataset; instead we use the already-fitted standardizer to transform the testing dataset.
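The fit-on-train / transform-both workflow can be sketched like this (pure Python with made-up numbers; scikit-learn's StandardScaler follows the same fit/transform pattern):

```python
from statistics import mean, pstdev

class Standardizer:
    """Minimal fit/transform standardizer (zero mean, unit variance)."""
    def fit(self, xs):
        self.mu = mean(xs)
        self.sigma = pstdev(xs)
        return self

    def transform(self, xs):
        return [(x - self.mu) / self.sigma for x in xs]

train = [1.0, 2.0, 3.0, 4.0, 5.0]  # stands in for the 50k training set
test = [2.5, 6.0]                  # stands in for the 10k testing set

scaler = Standardizer().fit(train)   # fit on the TRAINING set only
train_std = scaler.transform(train)  # transform the training set
test_std = scaler.transform(test)    # transform the test set, same params
```

Note that fit is never called on test: the test set is pushed through whatever parameters the training set produced.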
Yes, you need to apply normalization to the input data, or else the model will predict nonsense.
You also have to save the normalization coefficients that were computed from the training data, and then apply those same coefficients to incoming data.
For example if you use min-max normalization:
f_n = (f - min(f)) / (max(f) - min(f))
Then you need to save the min(f) and max(f) in order to perform normalization for new data.
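One simple way to save those coefficients (a sketch; the file name is made up) is to write them out next to the trained model and reload them at prediction time:

```python
import json

train = [3.0, 7.0, 5.0]
coeffs = {"min": min(train), "max": max(train)}

# Save the coefficients alongside the trained model...
with open("norm_coeffs.json", "w") as fh:
    json.dump(coeffs, fh)

# ...then at prediction time, reload and apply them to incoming data.
with open("norm_coeffs.json") as fh:
    c = json.load(fh)

def min_max(f):
    """f_n = (f - min(f)) / (max(f) - min(f)), with the saved min/max."""
    return (f - c["min"]) / (c["max"] - c["min"])

print(min_max(6.0))  # 0.75
```

Any persistence mechanism works (a database row, a pickled object next to the model weights); the point is that min(f) and max(f) travel with the model.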

Neural Network training data normalisation vs. runtime input data

I'm starting to learn about neural networks and I came across data normalisation. I understand the need for it but I don't quite know what to do with my data once my model is trained and in the field.
Let's say I take my input data, subtract its mean, and divide by the standard deviation. I then take that as input and train my neural network.
Once in the field, what do I do with my input sample on which I want a prediction?
Do I need to keep my training data mean and standard deviation and use that to normalise?
Correct. The mean and standard deviation that you use to normalize the training data will be the same that you use to normalize the testing data (i.e., don't compute a mean and standard deviation for the test data).
Hopefully this link will give you more helpful info: http://cs231n.github.io/neural-networks-2/
An important point to make about the preprocessing is that any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. Instead, the mean must be computed only over the training data and then subtracted equally from all splits (train/val/test).
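A small illustration of the mistake the quote warns about, versus the correct approach (made-up numbers):

```python
from statistics import mean

data = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
train, test = data[:3], data[3:]

wrong_mu = mean(data)   # MISTAKE: the statistic leaks test information
right_mu = mean(train)  # correct: computed on the training split only

# The same training mean is then subtracted from ALL splits:
train_centered = [x - right_mu for x in train]
test_centered = [x - right_mu for x in test]
```

Here the two means differ a lot (11.0 vs 2.0), which shows how much the test split can distort a statistic that should describe only the training data.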

Overfitting and Data splitting

Let's say that I have a data file like:
Index,product_buying_date,col1,col2
0,2013-01-16,34,Jack
1,2013-01-12,43,Molly
2,2013-01-21,21,Adam
3,2014-01-09,54,Peirce
4,2014-01-17,38,Goldberg
5,2015-01-05,72,Chandler
..
..
2000000,2015-01-27,32,Mike
with some more data and I have a target variable y. Assume something as per your convenience.
Now I am aware that we divide the data into 2 parts, i.e. Train and Test. We then divide Train 70:30, build the model with the 70%, and validate it with the 30%. We tune the parameters so that the model does not overfit, and then predict with the Test data. For example: I divide 2,000,000 into two equal parts. 1,000,000 is Train, from which I take a validation set, i.e. 30% of 1,000,000, which is 300,000, and build the model on the remaining 70%, i.e. 700,000.
QUESTION: Does the above logic depend on how the original data is split?
Generally we shuffle the data and then break it into train, validate and test (train + validate = Train; please don't confuse the two).
But what if the split is alternating? Say, when I divide the data into Train and Test first, I give even rows to Test and odd rows to Train. (Here the data is initially sorted on the 'product_buying_date' column, so when I split it into odd and even rows it gets uniformly split.)
And when I build the model with Train, I overfit it so that I get the maximum AUC on the Test data.
QUESTION: Isn't overfitting helping in this case?
QUESTION: Does the above logic depend on how the original data is split?
If the dataset is large (hundreds of thousands of points), you can randomly split the data and you should not have any problem; but if the dataset is small, you can adopt different approaches, like cross-validation, to generate the datasets. Cross-validation means that you make n training/validation splits out of your Training set.
Suppose you have 2000 data points; you split them like:
1000 - training dataset
1000 - testing dataset
5-fold cross-validation would then mean that you make five 800/200 training/validation splits.
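The five 800/200 splits described above can be generated with a sketch like this (simple contiguous folds, no shuffling; real libraries such as scikit-learn's KFold also handle shuffling and uneven remainders):

```python
def kfold_splits(n, k):
    """Yield (train_indices, val_indices) for k-fold cross-validation
    over n data points, using simple contiguous folds."""
    fold = n // k
    for i in range(k):
        val = set(range(i * fold, (i + 1) * fold))
        train = [j for j in range(n) if j not in val]
        yield train, sorted(val)

# 1000 training points, 5 folds -> five 800/200 train/validation splits.
splits = list(kfold_splits(1000, 5))
```

Each point appears in exactly one validation fold, and the test set never enters any of these splits.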
QUESTION: Isn't overfitting helping in this case?
The number one rule of machine learning is that you don't touch the test data set. It's a holy data set that should not be touched.
If you overfit to the test data to get the maximum AUC score, then there is no point to the validation dataset. The foremost aim of any ML algorithm is to reduce the generalization error, i.e. the algorithm should perform well on unseen data; if you tune your algorithm on the testing data, you won't meet this criterion. In cross-validation, too, you do not touch your testing set: you select your algorithm, tune its parameters with the validation dataset, and once you are done with that, apply your algorithm to the test dataset, which gives your final score.

How do you normalize data to feed into a neural network that lies outside the range of the data it was trained on?

I have an input into a neural network used for classification that was trained on a data set where the values were from 1-5, for example. I then normalized all of this training data so that it was from 0-1. What would I feed into the network if I wanted to classify something where that input was outside of the 1-5 range? For example, how could a value of 5.3 be normalized?
There are a number of ways that the value could be handled depending on the conditions of your Neural Network. Some include:
1/. The input may be clipped to a maximum value of 1
2/. The normalized value may exceed 1, depending on the normalisation algorithm applied and whether the Neural Network was designed to allow it (typically, if all data was normalised, these values should remain between 0 and 1)
3/. (Classification only) - If the inputs are categorical, rather than a quantitative value between 1 and 5, I'm not sure a value of 5.3 would make sense. Perhaps adding another neuron for an 'unknown' state may help, depending on your problem, but I have a gut feeling that this is overkill.
I am assuming that such a case has arisen as a result of unforeseen future cases being used for estimation after training has been completed. Generally, handling really comes down to (i) the programming of the Neural Network, and (ii) the calculation of the normalised input.
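Option 1/. above (saturating out-of-range inputs) can be sketched like this, assuming the 1-5 training range:

```python
def normalize_clipped(val, min_n=1.0, max_n=5.0):
    """Min-max normalize to [0, 1], clipping values that fall
    outside the training range (they saturate at 0 or 1)."""
    x = (val - min_n) / (max_n - min_n)
    return min(1.0, max(0.0, x))

print(normalize_clipped(3.0))  # 0.5
print(normalize_clipped(5.3))  # 1.0 (clipped from 1.075)
```

Whether clipping or letting values exceed 1 (option 2/.) is better depends on how the network was trained and how far out of range the inputs can drift.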
