Why we have to reshape the array values when we are going to predict results using the predict() method of linear regression - machine-learning

While we are predicting it is mandatory to give the input to predict() as the 2-D array. why it is necessary ?

predict requires the argument as 2-D array because it transforms the rows and columns in the features and observation vecotr

Related

Linear Regression problem in predicting values

I am trying to predict values using linear regression . when I use Here
reg.predict(60)
it pops like this error
How to reshape this?
The prediction function takes a 2-D array as an input, so you can use reg.predict([[60]]) instead

What is the difference between Keras model.evaluate() and model.predict()?

I used Keras biomedical image segmentation to segment brain neurons. I used model.evaluate() it gave me Dice coefficient: 0.916. However, when I used model.predict(), then loop through the predicted images by calculating the Dice coefficient, the Dice coefficient is 0.82. Why are these two values different?
The model.evaluate function predicts the output for the given input and then computes the metrics function specified in the model.compile and based on y_true and y_pred and returns the computed metric value as the output.
The model.predict just returns back the y_pred
So if you use model.predict and then compute the metrics yourself, the computed metric value should turn out to be the same as model.evaluate
For example, one would use model.predict instead of model.evaluate in evaluating an RNN/ LSTM based models where the output needs to be fed as input in next time step
The problem lies in the fact that every metric in Keras is evaluated in a following manner:
For each batch a metric value is evaluated.
A current value of loss (after k batches is equal to a mean value of your metric across computed k batches).
The final result is obtained as a mean of all losses computed for all batches.
Most of the most popular metrics (like mse, categorical_crossentropy, mae) etc. - as a mean of loss value of each example - have a property that such evaluation ends up with a proper result. But in case of Dice Coefficient - a mean of its value across all of the batches is not equal to actual value computed on a whole dataset and as model.evaluate() uses such way of computations - this is the direct cause of your problem.
The keras.evaluate() function will give you the loss value for every batch. The keras.predict() function will give you the actual predictions for all samples in a batch, for all batches. So even if you use the same data, the differences will be there because the value of a loss function will be almost always different than the predicted values. These are two different things.
It is about regularization. model.predict() returns the final output of the model, i.e. answer. While model.evaluate() returns the loss. The loss is used to train the model (via backpropagation) and it is not the answer.
This video of ML Tokyo should help to understand the difference between model.evaluate() and model.predict().

Neural Networks normalizing output data

I have a training data for NN along with expected outputs. Each input is 10 dimensional vector and has 1 expected output.I have normalised the training data using Gaussian but I don't know how to normalise the outputs since it only has single dimension. Any ideas?
Example:
Raw Input Vector:-128.91, 71.076, -100.75,4.2475, -98.811, 77.219, 4.4096, -15.382, -6.1477, -361.18
Normalised Input Vector: -0.6049, 1.0412, -0.3731, 0.4912, -0.3571, 1.0918, 0.4925, 0.3296, 0.4056, -2.5168
The raw expected output for the above input is 1183.6 but I don't know how to normalise that. Should I normalise the expected output as part of the input vector?
From the looks of your problem, you are trying to implement some sort of regression algorithm. For regression problems you don't normally normalize the outputs. For the training data you provide for a regression system, the expected output should be within the range you're expecting, or simply whatever data you have for the expected outputs.
Therefore, you can normalize the training
inputs to allow the training to go faster, but you typically don't normalize the target outputs. When it comes to testing time or providing new inputs, make sure you normalize the data in the same way that you did during training. Specifically, use exactly the same parameters for normalization during training for any test inputs into the network.
One important remark is that you normalized elements of a single input vector. Having one-dimensional output space, you could not normalize the output.
The correct way is, indeed, to take a complete batch of training data, say N input (and output) vectors, and normalize each dimension (variable) individually (using N samples). Thus, for one-dimensional output, you will have N samples for normalization. In this way, the vector space of your input will not be distorted.
The normalization of the output dimension is usually required when the scale-space of output variables significantly different. After training, you should use the same set normalization parameters (e.g., for zscore it is "mean" and "std") as you obtain from the training data. In this case, you will put new (unseen) data into the same scale space as you in training.

Is there a need to normalise input vector for prediction in SVM?

For input data of different scale I understand that the values used to train the classifier has to be normalized for correct classification(SVM).
So does the input vector for prediction also needs to be normalized?
The scenario that I have is that the training data is normalized and serialized and saved in the database, when a prediction has to be done the serialized data is deserialized to get the normalized numpy array, and the numpy array is then fit on the classifier and the input vector for prediction is applied for prediction. So does this input vector also needs to be normalized? If so how to do it, since at the time of prediction I don't have the actual input training data to normalize?
Also I am normalizing along axis=0 , i.e. along the column.
my code for normalizing is :
preprocessing.normalize(data, norm='l2',axis=0)
is there a way to serialize preprocessing.normalize
In SVMs it is recommended a scaler for several reasons.
It is better to have the same scale in many optimization methods.
Many kernel functions use internally an euclidean distance to compare two different samples (in the gaussian kernel the euclidean distance is in the exponential term), if every feature has a different scale, the euclidean distance only take into account the features with highest scale.
When you put the features in the same scale you must remove the mean and divide by the standard deviation.
xi - mi
xi -> ------------
sigmai
You must storage the mean and standard deviation of every feature in the training set to use the same operations in future data.
In python you have functions to do that for you:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
To obtain means and standar deviations:
scaler = preprocessing.StandardScaler().fit(X)
To normalize then the training set (X is a matrix where every row is a data and every column a feature):
X = scaler.transform(X)
After the training, you must normalize of future data before the classification:
newData = scaler.transform(newData)

Qualitative Classification in Neural Network on Weka

I have a training set where the input vectors are speed, acceleration and turn angle change. Output is a crisp class- an activity state from the given set {rest, walk, run}. e.g- say for input vectors [3.1 1.2 2]-->run ; [2.1 1 1]-->walk and so on.
I am using weka to develop a Neural Network model. The output I am defining as crisp ones (or rather qualitative ones in words- categorical values). After training the model, the model can fairly classify on test data.
I was wondering how the internal process (mapping function) is taking place? Is the qualitative output states are getting some nominal value inside the model and after processing it is again getting converted to the categorical data? because a NN model cannot map float input values to a categorical data through hidden neurons, so what is actually happening, although the model is working fine.
If the model converts the categorical outputs into nominal ones and then start processing then on what basis it converts the categorical value into some arbitrary numerical values?
Yes, categorical values are usually being converted to numbers, and the networks learn to associate input data with these numbers. However these numbers are often further encoded, not to use only single output neuron. The most common way to do it, for unordered labels, is to add dummy output neurons dedicated to each category and use 1-of-C encoding, with 0.1 and 0.9 as target values. Output is interpreted using the Winner-take-all paradigm.
Using only one neuron and encoding categories with different numbers for unordered labels often leads to problems - as the network will treat middle categories as "averages" of the boundary categories. This however may sometimes be desired, if you have ordered categorical data.
You can find very good explanation of this issue in this part of the online Neural Network FAQ.
The neural net's computations all take place on continuous values. To do multiclass classification with discrete output, its final layer produces a vector of such values, one for each class. To make a discrete class prediction, take the index of the maximum element in that vector.
So if the final layer in a classification network for four classes predicts [0 -1 2 1], then the third element of the vector is the largest and the third class is selected. Often, these values are also constrained to form a probability distribution by means of a softmax activation function.

Resources