Accuracy of neural networks when predicting a continuous variable - machine-learning

Is there a way to calculate accuracy instead of error metrics for neural networks when doing regression (prediction of a continuous variable), the same way we do when classifying categorical variables?

The concept of accuracy belongs to classification, but you can still print the predicted values and compare them with the dependent variable.

The problem with a continuous variable is that the probability of reproducing a given value exactly is (practically) zero. For instance, if your neural network produces 2.000001 and the actual value is 2, this counts as a wrong prediction because the two values differ (although they are very close). Error metrics like the root mean squared error therefore measure the average (squared) difference.
However, depending on your application, you could introduce a threshold value ϵ and consider a given output of your neural network correct if the absolute difference between the observed value and the output is smaller than ϵ, and then compute the percentage of correct predictions.
In practice such a metric is not minimized directly, because it is difficult to compute its gradient, but it is still a useful quantity to report.
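For illustration, a minimal NumPy sketch of such an ε-accuracy (the function name and the choice of threshold are placeholders, not a library feature):

```python
import numpy as np

def epsilon_accuracy(y_true, y_pred, eps=0.1):
    """Fraction of predictions within eps of the true value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) < eps)

# Example: 2.000001 counts as correct for eps = 0.1, but 2.5 does not.
print(epsilon_accuracy([2.0, 3.0], [2.000001, 2.5]))  # 0.5
```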

Related

Permutation importance in h2o random forest

The CRAN implementation of random forests offers both variable importance measures: the Gini importance as well as the widely used permutation importance, defined as:
For classification, it is the increase in percent of times a case is OOB and misclassified when the variable is permuted. For regression, it is the average increase in squared OOB residuals when the variable is permuted.
By default h2o.varimp() computes only the former. Is there really no option in h2o to get the alternative measure out of a random forest model?
Thanks!
ML
H2O does not calculate permutation importance. Please see the documentation for the explanation of how variable importance is calculated.
For your convenience I'll paste it as well below:
How is variable importance calculated for DRF?
Variable importance is determined by calculating the relative influence of each variable: whether that variable was selected during splitting in the tree building process and how much the squared error (over all trees) improved as a result.
A feature request has previously been filed for this; you can follow it here (though note it is currently open).
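In the meantime, permutation importance can be computed by hand outside of h2o by re-scoring the model with each feature column shuffled. A generic sketch with scikit-learn and NumPy (not an h2o API; the model, data, and metric are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when a single feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = mean_squared_error(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # break the association between feature j and y
            scores.append(mean_squared_error(y, model.predict(X_perm)))
        importances[j] = np.mean(scores) - baseline
    return importances

# Usage sketch: fit on training data, measure importance on held-out data.
X, y = np.random.rand(200, 4), np.random.rand(200)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(permutation_importance(model, X, y))
```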

Neural Network Custom Binary Prediction

I am trying to design a neural network that makes a custom binary prediction.
Normally to do binary prediction, I would use a softmax as my last layer, and then my loss could be the difference between the prediction I made and the true binary value.
However, what if I don't want to use a softmax layer? Instead, I output a real-valued number and check whether some condition on this number is true. In a really simple case, I check if this number is positive: if it is, I predict 1, else I predict 0. Let's say I want all the numbers to be positive, so the true predictions should all be 1, and I want to train this network such that it outputs all positive numbers. I am confused about how to formulate a loss function for this problem so that I am able to backpropagate and train the network.
Does anyone have an idea how to create this kind of network?
I am confused about how to formulate a loss function for this problem so that I am able to backpropagate and train the network.
Here's how you should approach it. Effectively, you need to transform the labels into positive and negative target values (say +1 and -1) and solve the regression problem. The loss function can be a simple L1 or L2 loss. The network will try to learn to output a prediction close to the training target, which you can afterwards interpret by checking whether it is closer to one target or the other, i.e. positive or negative. You can even make some targets larger (e.g. +2 or +10) to emphasize that those examples are very important. Example code: linear regression in tensorflow.
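A minimal Keras sketch of this regression-to-±1-targets idea (the toy data, architecture, and thresholding at zero are illustrative assumptions, not the linked example):

```python
import numpy as np
import tensorflow as tf

# Toy data: 2-feature inputs; label +1 if the feature sum is positive, else -1.
X = np.random.randn(1000, 2).astype("float32")
y = np.where(X.sum(axis=1) > 0, 1.0, -1.0).astype("float32")

# Single real-valued output, trained as a regression problem with an L2 (MSE) loss.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1)            # no softmax, raw real-valued output
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, verbose=0)

# Interpret the sign of the output as the binary prediction.
pred = np.sign(model.predict(X, verbose=0).ravel())
print("accuracy:", (pred == y).mean())
```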
However, I have to warn you that this approach has serious drawbacks; see for instance this question. One outlier in the training data can easily skew your predictions. Classification with softmax + cross-entropy loss is more stable, which is why it is almost always the better choice.

Should you normalize outputs of a neural network for regression tasks?

I've made a CNN that takes a signal as input and outputs the parameters used in a simulation to create that signal. I've heard that for regression tasks you don't normally normalize the outputs of a neural network. But the variables the model is trying to predict have very different standard deviations: one variable is always in the range [1x10^-24, 1x10^-20] while another is almost always in the range [8, 16]. Since all loss functions first take the difference between the target and the actual output values, and this difference naturally scales with the standard deviation of that output variable, wouldn't the loss of the network depend mostly on the accuracy of the output variables with large standard deviations and not the ones with small standard deviations? Also, wouldn't unnormalized outputs hinder the training process, since the network can get a low loss for an output variable with very low standard deviation just by guessing values close to its mean?
If this is the case, why can't I find much on the internet talking about or suggesting normalizing the outputs? It seems really important for getting reliable loss values.
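For readers wondering what output normalization would look like in practice, a minimal sketch using scikit-learn's StandardScaler (the data shapes and values are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# y_train has columns on wildly different scales, e.g. ~1e-22 and ~12.
y_train = np.column_stack([
    np.random.uniform(1e-24, 1e-20, size=500),
    np.random.uniform(8, 16, size=500),
])

scaler = StandardScaler()
y_scaled = scaler.fit_transform(y_train)   # each output now has mean 0, std 1

# ... train the network against y_scaled so both outputs contribute
# comparably to the loss ...

# At inference time, map the network's outputs back to physical units.
y_pred_scaled = y_scaled[:5]               # stand-in for model predictions
y_pred = scaler.inverse_transform(y_pred_scaled)
```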

What is a loss function in simple words?

Can anyone please explain in simple words and possibly with some examples what is a loss function in the field of machine learning/neural networks?
This came out while I was following a Tensorflow tutorial:
https://www.tensorflow.org/get_started/get_started
It describes how far off the result your network produced is from the expected result - it indicates the magnitude of error your model made on its prediction.
You can then take that error and 'backpropagate' it through your model, adjusting its weights and making it get closer to the truth the next time around.
The loss function is how you're penalizing your output.
The following example is for a supervised setting, i.e. when you know what the correct result should be, although loss functions can be applied even in unsupervised settings.
Suppose you have a model that always predicts 1. Just the scalar value 1.
You can have many loss functions applied to this model. L2 is the Euclidean distance.
If I pass in some value, say 2, and I want my model to learn the x**2 function, then the result should be 4 (because 2*2 = 4). If we apply the L2 loss, then it's computed as ||4 - 1||^2 = 9.
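The same arithmetic as a tiny Python sketch (the constant model and the x**2 target are just the toy example above):

```python
def constant_model(x):
    return 1.0                      # the model that always predicts 1

def l2_loss(prediction, target):
    return (target - prediction) ** 2

x = 2.0
target = x ** 2                     # we want the model to learn x**2, so 4
print(l2_loss(constant_model(x), target))   # (4 - 1)^2 = 9
```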
We can also make up our own loss function. We can say the loss function is always 10. So no matter what our model outputs the loss will be constant.
Why do we care about loss functions? Well, they measure how poorly the model did, and in the context of backpropagation and neural networks, they also determine the gradients propagated back from the final layer so the model can learn.
As other comments have suggested I think you should start with basic material. Here's a good link to start off with http://neuralnetworksanddeeplearning.com/
It is worth noting that we can speak of different kinds of loss functions:
Regression loss functions and classification loss functions.
A regression loss function describes the difference between the values that a model predicts and the actual values of the labels.
So the loss function has meaning on labeled data, where we compare the prediction to the label at a single point in time.
This loss function is often called the error function or the error formula.
Typical error functions we use for regression models are the L1 and L2 losses, the Huber loss, the quantile loss, and the log-cosh loss.
Note: The L1 loss is also known as the Mean Absolute Error. The L2 loss is also known as the Mean Squared Error or quadratic loss.
Loss functions for classification represent the price paid for inaccuracy of predictions in classification problems (problems of identifying which category a particular observation belongs to).
To name a few: log loss, focal loss, exponential loss, hinge loss, relative entropy loss, and others.
Note: While more commonly used in regression, the square loss function can be re-written and utilized for classification.
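For concreteness, a minimal NumPy sketch of a few of the regression losses named above (the delta for the Huber loss is an illustrative default):

```python
import numpy as np

def l1_loss(y_true, y_pred):                 # Mean Absolute Error
    return np.mean(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):                 # Mean Squared Error
    return np.mean((y_true - y_pred) ** 2)

def huber_loss(y_true, y_pred, delta=1.0):   # quadratic near 0, linear in the tails
    err = np.abs(y_true - y_pred)
    quad = 0.5 * err ** 2
    lin = delta * (err - 0.5 * delta)
    return np.mean(np.where(err <= delta, quad, lin))

y_true = np.array([4.0, 2.0, 7.0])
y_pred = np.array([1.0, 2.5, 9.0])
print(l1_loss(y_true, y_pred), l2_loss(y_true, y_pred), huber_loss(y_true, y_pred))
```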

How to train a neural network with probabilistic input

Hello and thanks for helping,
My question is a long time problem that I try to tackle :
How do we train a neural network if the input is a probability rather than a value?
To make it more intuitive :
Let's say we have 6 features, and the value each may take is 1 or -1.
Their values are determined probabilistically; for example, feature 1 can be 1 with 60% probability or -1 with 40% probability.
How do we train the network if, in each trial, we may get an input value in accordance with the probability distribution of each feature?
Actually the answer is more straightforward than you might expect, as many existing neural networks are trained exactly in this manner. You have to do ... nothing. Simply sample your batch in each iteration according to your distribution, and that's all. A neural network does not require a finite training set, so you can efficiently train it on a "potentially infinite" one (a generator of samples). This is exactly what is done in image processing with image augmentation - each batch consists of random subsamples (patches) of the images, which are sampled from very basic probability distributions.
@Nagabuhushan suggests solving a different problem - one where you know a priori the probability of each sample, which, according to the question, is not the case:
we may get an input value in accordance with the probability distribution of each feature
Plus, even if that were the case, NNs are not good at multiplying, so one might need additional tweaking of the architecture (log-transforms).
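A minimal NumPy sketch of the sampling approach described above (the per-feature probabilities, batch size, and the commented-out training step are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Probability that each of the 6 features takes the value +1 (else -1).
p_plus = np.array([0.6, 0.5, 0.8, 0.3, 0.7, 0.4])

def sample_batch(batch_size):
    """Draw a fresh batch of +1/-1 inputs according to the feature distributions."""
    u = rng.random((batch_size, p_plus.size))
    return np.where(u < p_plus, 1.0, -1.0)

for step in range(1000):
    x_batch = sample_batch(32)
    # y_batch = ...  labels for this batch, however they are defined
    # loss = train_step(model, x_batch, y_batch)  # one gradient update per sampled batch
```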
For the values you feed into the net, you should use the probabilities of each feature taking on the value 1. You could use the probabilities of them taking on -1 instead, but be consistent. Also, fix some ordering of the features and order their probabilities consistently.
Edit: I think I may have misunderstood the question. Do your inputs consist of probabilities, or 1's and -1's? If the latter, then a well-architected network should learn the distributions on its own. Just be sure to train it against the same input space that you'll be evaluating it against.

Resources