A naive Bayes classifier has been trained on a given training set. The table below shows the predicted positive-class probabilities and the actual class labels. I want to build the confusion matrix, but I could not figure out how to do it knowing only the probabilities.
ID  Actual class label  Predicted positive class probability
1   +                   0.6
2   +                   0.8
3   -                   0.2
4   +                   0.3
5   -                   0.4
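First, you need discrete class labels to compute a confusion matrix. Choose a threshold on the predicted positive-class probability (0.5 is a common default) and label an instance positive when its probability reaches that threshold; this gives the predicted class labels (y_pred).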
You can then use actual class labels (y_actual) and y_pred to compute the confusion matrix.
from sklearn.metrics import confusion_matrix
confusion_matrix(y_actual, y_pred)
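As a minimal sketch of those two steps on the table from the question, the snippet below thresholds the probabilities and counts the four cells by hand. The 0.5 threshold and the variable names are assumptions made for illustration; a different threshold gives a different matrix.

```python
# Data from the table in the question; the 0.5 threshold is an assumption.
y_actual = ["+", "+", "-", "+", "-"]
p_positive = [0.6, 0.8, 0.2, 0.3, 0.4]
threshold = 0.5

# Turn probabilities into discrete predictions.
y_pred = ["+" if p >= threshold else "-" for p in p_positive]

# Count the four cells of the binary confusion matrix.
pairs = list(zip(y_actual, y_pred))
tp = pairs.count(("+", "+"))  # actual +, predicted +
fn = pairs.count(("+", "-"))  # actual +, predicted -
fp = pairs.count(("-", "+"))  # actual -, predicted +
tn = pairs.count(("-", "-"))  # actual -, predicted -

print(y_pred)
print("TP, FN, FP, TN =", tp, fn, fp, tn)
```

Passing the same y_actual and y_pred to sklearn's confusion_matrix should give the same counts, arranged with actual classes as rows and predicted classes as columns.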
Related
What would be an answer to this?
Consider a dataset D that contains only two observations 𝐱1=(1,1) and 𝐱2=(−1,−1). Suppose that the class of the first observation is 𝑦1=0 and that the class of the second observation is 𝑦2=1. How would a 1-nearest neighbour classifier based on the Euclidean distance classify the observation 𝐱=(2,3)? What are the distances between this new observation and each observation in the dataset? [0.5 marks out of 5]
Since x is closer to x_1 than to x_2 (I'll let you compute the distances yourself), the classifier would assign x the class of x_1, i.e. 0.
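For anyone who wants to check the distances, here is a small sketch of the 1-nearest-neighbour rule on this dataset. The variable names are illustrative assumptions; the points and classes come from the question.

```python
import math

# Dataset from the question: two observations and their classes.
points = [(1, 1), (-1, -1)]
classes = [0, 1]
x_new = (2, 3)

# Euclidean distance from the new observation to each stored observation.
distances = [math.dist(x_new, p) for p in points]

# 1-NN: take the class of the closest stored observation.
nearest = min(range(len(points)), key=lambda i: distances[i])
predicted_class = classes[nearest]

print(distances, predicted_class)
```

The distances come out as sqrt(5) (about 2.24) to x_1 and 5 to x_2, so the predicted class is 0.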
What does pytorch SGD do if I feed the whole data and do not specify the batch size? I don't see any "stochastic" or "randomness" in the case.
For example, in the following simple code, I feed the whole data (x,y) into a model.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(5):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Suppose there are 100 data pairs (x,y), i.e. x_data and y_data each have 100 elements.
Question: It seems to me that all 100 gradients are calculated before one update of the parameters. The size of the "mini-batch" is 100, not 1. So there is no randomness, am I right? At first, I thought SGD meant randomly choosing 1 data point and calculating its gradient, which is then used as an approximation of the true gradient over all the data.
The SGD optimizer in PyTorch just applies the gradient descent update to whatever gradients it is given. The stochastic part comes from how you usually pass a random subset of your data (a mini-batch) through the network at a time. The code you posted passes the entire dataset through in each epoch before doing backprop and stepping the optimizer, so you're really just doing regular (full-batch) gradient descent.
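To make the distinction concrete without depending on PyTorch, here is a plain-Python sketch in which the update rule is identical and only the way batches are drawn changes. The toy data (y = 3x), the one-parameter model, and the helper names grad and train are all assumptions made for illustration. In PyTorch the shuffled mini-batches would typically come from a torch.utils.data.DataLoader created with shuffle=True.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical toy data: 100 pairs following y = 3x exactly.
xs = [i / 100 for i in range(100)]
ys = [3.0 * x for x in xs]

def grad(w, idx):
    """Gradient of the mean squared error over the samples in idx."""
    return sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in idx) / len(idx)

def train(batch_size, epochs=200, lr=0.1):
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Shuffling has no real effect when one batch covers all the data.
        random.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            w -= lr * grad(w, batch)  # same update rule in both cases
    return w

w_full = train(batch_size=100)  # one exact-gradient update per epoch
w_mini = train(batch_size=10)   # ten updates per epoch on random subsets
print(w_full, w_mini)
```

Both runs should approach w = 3; the difference is that the mini-batch run follows a gradient estimated from a random subset at each step, which is where the "stochastic" in SGD comes from.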
I understand that the ROC curve for a model is constructed by varying the threshold (that affects TPR, FPR).
Thus my initial understanding is that, to calculate the AUROC, you need to run the model many times with different thresholds to get that curve and then calculate the area under it.
But it seems like you just need some probability estimate of the positive class, as in the code example in sklearn's roc_auc_score below, to calculate AUROC.
>>> import numpy as np
>>> from sklearn.metrics import roc_auc_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> roc_auc_score(y_true, y_scores)
0.75
How does that work? Any recommended read?
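One way to see why a single vector of scores is enough: the model only has to be run once, and the threshold can then be swept over the scores it produced. The sketch below does this by hand for the example above and integrates the resulting (FPR, TPR) points with the trapezoidal rule. It is an illustration of the idea under that assumption, not a description of how sklearn implements roc_auc_score internally.

```python
# Labels and scores from the roc_auc_score example above.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

n_pos = sum(y_true)
n_neg = len(y_true) - n_pos

# Each distinct score is a candidate threshold; sweep from high to low.
roc_points = [(0.0, 0.0)]
for t in sorted(set(y_scores), reverse=True):
    tp = sum(1 for y, s in zip(y_true, y_scores) if s >= t and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_scores) if s >= t and y == 0)
    roc_points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR)

# Trapezoidal area under the (FPR, TPR) points.
auc = sum((x1 - x0) * (y0 + y1) / 2
          for (x0, y0), (x1, y1) in zip(roc_points, roc_points[1:]))

print(roc_points)
print(auc)
```

The area comes out to 0.75, matching the value in the example. Only the ordering of the scores matters here, which is why any monotonic score works in place of a calibrated probability.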
How does categorical accuracy work? By definition:
categorical_accuracy checks to see if the index of the maximal true
value is equal to the index of the maximal predicted value.
and
Calculates the mean accuracy rate across all predictions for
multiclass classification problems
What does it mean in practice? Let's say I am predicting the bounding box of an object.
It has (xmin, ymin, xmax, ymax). Does it check whether the predicted xmin is equal to the real xmin? So if xmin and xmax were the same in the predicted and real values, and ymin and ymax were different, would I get 50%?
Please help me understand this concept.
Traditionally for multiclass classification, each instance has an integer (or, equivalently, categorical) label drawn from a fixed set; for example:
labels = [0, 1, 2]
The output of a multiclass classification prediction will typically be a probability distribution of confidences; for example:
preds = [0.25, 0.5, 0.25]
Normally the index associated with the most likely event will be the index of the label. In this case, the argmax(preds) is 1, which maps to label 1.
You can see the total accuracy of your predictions a la confusion matrices, where one axis is the "true" value and the other axis is the "predicted" value. Each cell CM[y_true][y_pred] holds the number of instances with that true label that were predicted as that label. The accuracy is the sum of the main diagonal of the matrix (y_true = y_pred) divided by the total number of instances.
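The definition quoted in the question can be sketched in a few lines: take the argmax of each true row and each predicted row, and average the matches. The four samples and the argmax helper below are hypothetical, chosen only to illustrate the computation.

```python
# Hypothetical one-hot true labels and predicted distributions for 4 samples.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.25, 0.5, 0.25], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]

def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=lambda i: row[i])

# A sample counts as correct when the two argmax indices agree.
matches = [argmax(t) == argmax(p) for t, p in zip(y_true, y_pred)]
accuracy = sum(matches) / len(matches)

print(matches, accuracy)
```

Three of the four samples match, so the accuracy is 0.75. Applied to a bounding box (xmin, ymin, xmax, ymax), the same computation would only compare which coordinate is largest, not whether the coordinates are equal, so the metric says little about a regression target like that.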
I have a classical y_train composed of 0 (negative) and 1 (positive) in a one-dimensional shape. I want to train a TensorFlow model, but I have to initialize the y placeholder with the number of classes I want. In this text classification case, I want the model to predict a negative or positive value, so 2 classes? But how do I convert my y_train to fit the output that I'm looking for? Thanks for your time!
"ValueError: Cannot feed value of shape (25000, 1) for Tensor u'Placeholder_5:0', which has shape (Dimension(None), Dimension(2))"
It appears your y_train contains the label values themselves, whereas the model expects one row of label probabilities per example (a one-hot vector with one column per class). In your case, since there are only two labels, you can convert the labels to that form as follows:
y_train = tf.concat([1 - y_train, y_train], axis=1)  # TensorFlow >= 1.0; older releases used tf.concat(1, [1 - y_train, y_train])
If you have more than two labels, have a look at sparse_to_dense (or tf.one_hot) to convert them to one-hot label vectors.
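The same conversion can be sketched in plain Python to show what the target shape looks like. The small y_train below is a hypothetical stand-in for the (25000, 1) array in the question, and one_hot is an illustrative helper, not a TensorFlow function.

```python
# Hypothetical labels shaped like the (N, 1) y_train in the question.
y_train = [[0], [1], [1], [0]]
num_classes = 2

def one_hot(label, num_classes):
    """Return a row with 1.0 at the label's index and 0.0 elsewhere."""
    row = [0.0] * num_classes
    row[label] = 1.0
    return row

# Shape goes from (N, 1) to (N, num_classes).
y_one_hot = [one_hot(row[0], num_classes) for row in y_train]

print(y_one_hot)
```

Column 0 ends up equal to 1 - y and column 1 equal to y, which is the same layout the concat expression above produces for two classes. The helper also works unchanged for more than two classes.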