I know that the ROC curve is calculated from the true positive rate and the false positive rate.
But the ROC curve has infinitely many points on it, right? How is each point calculated? Can someone explain this to me? Where does each point come from?
Thanks in advance.
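Each point of the curve corresponds to one value of the classifier's decision threshold; the curve is obtained by computing the two rates below for all values of the threshold.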
On the x axis, you have the "false positive rate" for the given threshold: FPR = FP / (TN + FP), where:
FP is the number of false positives (the elements predicted positive that are actually negative);
TN is the number of true negatives (the elements predicted negative that are actually negative).
On the y axis, you have the "true positive rate" for the given threshold: TPR = TP / (TP + FN), where:
TP is the number of true positives (predicted positive and actually positive);
FN is the number of false negatives (predicted negative but actually positive).
In practice you do not have an infinite number of points: you are limited by the number of samples in the dataset, because the rates do not change over any range of thresholds that lies between two consecutive scores.
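To make the threshold sweep concrete, here is a minimal sketch in plain Python. It assumes binary labels (1 = positive, 0 = negative) and a classifier that outputs one score per sample, with a sample predicted positive when its score is >= the threshold. The function name `roc_points` and the toy data are hypothetical. Only the distinct scores need to be tried as thresholds, which is why the number of points is bounded by the dataset size.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold.

    A sample is predicted positive when its score >= threshold.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    # Lowering the threshold past each distinct score adds one point.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

With 6 samples and 6 distinct scores this yields 7 points, from (0, 0) to (1, 1). Libraries such as scikit-learn (`sklearn.metrics.roc_curve`) return the same kind of FPR/TPR/threshold arrays.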
When we want to assess the quality of the positive predictions made by the model, we use precision, which is the number of true positives divided by the total number of positive predictions.
Also, recall shows the model's ability to detect positive samples (a low recall reveals a bias towards negative predictions); it is the ratio of the number of true positives to the total number of positive samples.
Is there any meaning in the ratio between true negatives and all negative predictions?
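The ratio in that last question, TN / (TN + FN), is commonly called the negative predictive value (NPV): it measures the quality of the negative predictions, just as precision measures the quality of the positive ones. A small sketch in plain Python computing all three from hypothetical toy labels (1 = positive, 0 = negative):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)  # share of positive predictions that are correct
recall = tp / (tp + fn)     # share of actual positives that were detected
npv = tn / (tn + fn)        # share of negative predictions that are correct
print(precision, recall, npv)
```

Here TP = 2, FP = 1, TN = 3 and FN = 2, so precision is 2/3, recall is 0.5 and NPV is 0.6.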
The false positive rate is the x-axis, spanning from 0 to 1, and the true positive rate is the y-axis, spanning from 0 to 1. The graphs show data points like (0.8, 0.8). But if the TPR is 0.8 and the FPR is 0.8, they add up to 1.6...
Typically the axes are normalised by the total number of negatives and positives in the test/validation set (i.e. the FP and TP counts reached at the lowest threshold). Otherwise the curve would not end at (1, 1). I personally prefer to label the axes with the number of instances.
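A short worked example of why a point like (0.8, 0.8) is not a contradiction. The counts below are hypothetical: the two rates are divided by different totals, so nothing forces them to sum to 1.

```python
# Hypothetical test set: 10 positives and 10 negatives.
n_pos, n_neg = 10, 10

# At some threshold the classifier flags 8 positives and 8 negatives as positive.
tp, fp = 8, 8

tpr = tp / n_pos  # normalised by the number of positives
fpr = fp / n_neg  # normalised by the number of negatives
print((fpr, tpr))

# At the lowest threshold everything is flagged: tp == n_pos and fp == n_neg,
# which is why the normalised curve ends at (1, 1).
```

This prints `(0.8, 0.8)`: the classifier catches 80% of the positives while also flagging 80% of the negatives, i.e. it is no better than chance at this threshold.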
Why not normalise by the total number? In real applications it gets rather complicated, as you often do not have labels for all examples. The typical example for ROC curves is mass mailing: to normalise the curve correctly, you would need to spam the entire world.
In the course CS231n, while going through the notes on activation functions, I ran into a problem with the sigmoid function. Here is the screenshot:
[Screenshot: pros and cons of the sigmoid function]
As I understand it, the gradient is dw = x.T.dot(dout). Even though x is now all positive, why would dw be all positive or all negative after the matrix multiplication? The only way that could happen is if dout were all positive or all negative, but why would that be?
Can someone help me?
If you read the exact sentence, in its entirety it says (slightly paraphrased):
If the data coming into a neuron is always positive, then the gradient on the weights during backpropagation becomes either all positive or all negative (depending on the gradient of the whole expression f).
Assume $f = w^\top x + b$. Then the gradient of the loss with respect to the weights is $\nabla_w L = \frac{\partial L}{\partial f} \frac{\partial f}{\partial w}$. Since $\frac{\partial L}{\partial f}$ is a scalar, it is either positive or negative (or zero, but that is unlikely). On the other hand, $\frac{\partial f}{\partial w} = x$. So clearly, if $x$ is all positive or all negative, then $\frac{\partial f}{\partial w}$ is also all positive or all negative. But this means that $\nabla_w L$ must also be all positive or all negative, because multiplying by the single scalar $\frac{\partial L}{\partial f}$ cannot change the signs of individual elements of $\frac{\partial f}{\partial w}$ differently. Thus the sign of the gradient is homogeneous.
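A minimal NumPy sketch of this argument, assuming a single neuron, a batch of one example with all-positive inputs, and a randomly drawn upstream gradient `dout` standing in for dL/df (all values are hypothetical). With one example, `dout` is a 1x1 array, so `x.T.dot(dout)` just scales the positive vector x by one scalar.

```python
import numpy as np

rng = np.random.default_rng(0)

# One example (N = 1) with D = 5 features, all positive
# (e.g. sigmoid outputs coming from a previous layer).
x = rng.uniform(0.1, 1.0, size=(1, 5))  # shape (N, D)

# Upstream gradient dL/df for f = x.dot(w) + b: a single scalar for one example.
dout = rng.normal(size=(1, 1))          # shape (N, 1)

# Same formula as in the question: dw = x.T.dot(dout), shape (D, 1).
dw = x.T.dot(dout)

# Every entry of dw carries the sign of the one scalar in dout.
print(np.sign(dout).item(), np.sign(dw).ravel())
```

With a batch of N > 1 examples, `dout` has N entries that can differ in sign, and `x.T.dot(dout)` sums N such scaled vectors, so the summed gradient can end up with mixed signs; the homogeneous-sign statement applies to the gradient contributed by each single example.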
How does categorical accuracy work? By definition:
categorical_accuracy checks to see if the index of the maximal true
value is equal to the index of the maximal predicted value.
and
Calculates the mean accuracy rate across all predictions for
multiclass classification problems
What does it mean in practice? Let's say I am predicting the bounding box of an object, which has (xmin, ymin, xmax, ymax). Does it check whether the predicted xmin is equal to the real xmin? So if xmin and xmax were the same in the predicted and real values, and ymin and ymax were different, would I get 50%?
Please help me understand this concept.
Traditionally, for multiclass classification, your labels will be integers (or, equivalently, categories); for example:
labels = [0, 1, 2]
The output of a multiclass classification prediction will typically be a probability distribution of confidences; for example:
preds = [0.25, 0.5, 0.25]
Normally, the index of the most likely class is taken as the predicted label. In this case, argmax(preds) is 1, which maps to label 1.
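A plain NumPy re-implementation of the quoted definition (a sketch, not Keras's actual source), with hypothetical toy data. It also shows why the metric does not do what the bounding-box question assumes: it compares the position of the largest value in each row, not the values themselves.

```python
import numpy as np


def categorical_accuracy(y_true, y_pred):
    """Mean over rows of: argmax(true row) == argmax(predicted row)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    matches = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
    return float(np.mean(matches))


# Three samples, three classes: one-hot true labels, predicted probabilities.
y_true = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[0.25, 0.5, 0.25], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
print(categorical_accuracy(y_true, y_pred))  # 2 of 3 rows match

# Bounding boxes (xmin, ymin, xmax, ymax): only the index of the largest
# coordinate is compared, so no element-wise equality is checked.
box_true = [[10, 20, 50, 60]]
box_pred = [[12, 18, 48, 75]]
print(categorical_accuracy(box_true, box_pred))
```

The box example scores 1.0 even though none of the four coordinates match, because in both rows the largest value sits at index 3. For bounding-box regression, a metric that compares coordinates or box overlap is more appropriate.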
You can see the total accuracy of your predictions à la confusion matrices, where one axis is the "true" value and the other axis is the "predicted" value. Each cell CM[y_true][y_pred] holds the number of instances with that true value and that predicted value. The accuracy is the sum of the main diagonal of the matrix (y_true = y_pred) divided by the total number of instances.
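A small NumPy sketch of that computation, using hypothetical integer labels for three classes. The helper name `confusion_matrix` is illustrative here (scikit-learn has a function of the same name in `sklearn.metrics`, but this is a standalone version).

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t, p] counts instances with true label t and predicted label p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)

# Main diagonal = correct predictions; divide by the total number of instances.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

Four of the six predictions lie on the diagonal, so the accuracy is 4/6.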
I have two binary images: one is the ground-truth image and the other is the experimental/test image. I want to calculate the true positives, false positives and false negatives, where my regions of interest are the blobs (i.e. circles and ellipses) present in the images.
For the true positives, I computed the intersection of the images using a bitwise 'or' operation and counted the total number of black pixels in the intersected image as the 'Total number of True Positive pixels', i.e. TP.
For the false positives, I considered the pixels with value 255 in the ground-truth image and took the total number of white pixels as the 'Total number of False Positive', i.e. FP.
For the false negatives, I considered the pixels with value 255 in the experimental image and took the total number of white pixels as the 'Total number of False Negative', i.e. FN.
Precision and recall are calculated as:
Precision as TP / (TP + FP)
Recall as TP / (TP + FN)
The values seem to be calculated wrongly, as I got a precision of 19%.
Please guide me on this.
Thanks in advance.
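For comparison, a minimal pixel-wise sketch in NumPy. It assumes both images are single-channel arrays of the same shape in which blob pixels are 255 and background pixels are 0 (invert them first if the blobs are black). In this version the intersection is a bitwise AND, and FP and FN each count pixels that are foreground in one image but not in the other. The function name and the toy arrays are hypothetical.

```python
import numpy as np


def pixel_counts(ground_truth, prediction, foreground=255):
    """Pixel-wise TP, FP, FN for two binary images of the same shape.

    Assumes blob pixels equal `foreground` and background pixels are 0.
    """
    gt = np.asarray(ground_truth) == foreground
    pr = np.asarray(prediction) == foreground
    tp = int(np.count_nonzero(gt & pr))   # blob in both images (intersection)
    fp = int(np.count_nonzero(~gt & pr))  # blob only in the experimental image
    fn = int(np.count_nonzero(gt & ~pr))  # blob only in the ground truth
    return tp, fp, fn


gt = np.array([[0, 255, 255],
               [0, 255, 0],
               [0, 0, 0]], dtype=np.uint8)
pred = np.array([[0, 255, 0],
                 [255, 255, 0],
                 [0, 0, 0]], dtype=np.uint8)

tp, fp, fn = pixel_counts(gt, pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, precision, recall)
```

In this toy pair, two blob pixels overlap, one appears only in the experimental image and one only in the ground truth, giving TP = 2, FP = 1, FN = 1.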