Precision and recall scores of POS tags - named-entity-recognition

I am training a NER model using a CRF. The results I got were:
B-NP:
Precision (0.98)
Recall (1.00)
F1-score (0.99)
Accuracy (0.99)
What do these numbers represent in relation to the POS tags? Does this mean the model recognizes B-NP with the scores above?
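A minimal sketch of how such per-label scores are typically produced, e.g. with scikit-learn's classification_report on the flattened tag sequences (the tag lists below are invented for illustration, not the asker's data):

from sklearn.metrics import classification_report

# Hypothetical gold and predicted tag sequences, flattened into flat lists.
# In a CRF setup these would come from the test sentences.
y_true = ["B-NP", "I-NP", "O", "B-NP", "O", "B-NP", "I-NP", "O"]
y_pred = ["B-NP", "I-NP", "O", "B-NP", "O", "O",    "I-NP", "O"]

# For each label, precision is the fraction of tokens predicted with that label
# that truly carry it, recall is the fraction of tokens truly carrying it that
# were predicted with it, and F1 is their harmonic mean.
print(classification_report(y_true, y_pred, digits=2))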

Related

Precision of a model in training (balanced dataset) versus production (imbalanced dataset)

I have a balanced dataset used for model training purposes. There are two classes. My model has a precision of 50%, meaning that for 100 samples it predicts that 50 are positive, and of those 50 only 25 are actually positive. The model is basically as good as flipping a coin.
Now in production, the data is highly imbalanced, say only 4 out of 100 samples are positive. Will my model still have the same precision?
The way I understand it, my coin-flip model would then label 50 samples as positive, of which only 2 would actually be positive, so precision would be 4% (2/50) in production.
Is it true that a model that was trained on a balanced dataset would have a different precision in production?
That depends: of those 50 samples classified as positive, does the model capture all of the actual positives (i.e., how sensitive is it)?
If your model correctly predicts every positive sample as positive but also labels negative samples as positive (high sensitivity, low specificity), your precision would be around 8% (4 true positives out of 50 predicted positives). Nevertheless, you should revisit your training, since for 50% precision you don't need an ML model but rather a one-liner generating a random variable between 0 and 1.
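A small arithmetic sketch of the numbers used above (the 100-sample counts are the illustrative ones from the question, not real data):

# Training (balanced): 100 samples, 50 flagged positive, 25 of them correct.
train_precision = 25 / 50            # 0.50

# Production (imbalanced): 100 samples, only 4 actual positives.
# Coin-flip view from the question: 50 flagged, ~2 of them correct.
prod_precision_coin_flip = 2 / 50    # 0.04

# Answer's scenario: the model catches all 4 positives (100% sensitivity)
# but still flags 50 samples overall.
prod_precision_high_recall = 4 / 50  # 0.08

print(train_precision, prod_precision_coin_flip, prod_precision_high_recall)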

Average precision score too high looking at the confusion matrix

I am developing a machine learning scikit-learn model on an imbalanced dataset (binary classification). Looking at the confusion matrix and the F1 score, I expect a lower average precision score but I almost get a perfect score and I can't figure out why. This is the output I am getting:
Confusion matrix on the test set:
[[6792  199]
 [   0  173]]
F1 score:
0.63
Test AVG precision score:
0.99
I am passing predicted probabilities to scikit-learn's average precision score function, which is what the package says to use. I was wondering where the problem could be.
The confusion matrix and f1 score are based on a hard prediction, which in sklearn is produced by cutting predictions at a probability threshold of 0.5 (for binary classification, and assuming the classifier is really probabilistic to begin with [so not SVM e.g.]). The average precision in contrast is computed using all possible probability thresholds; it can be read as the area under the precision-recall curve.
So a high average_precision_score together with a low f1_score suggests that your model does extremely well at some threshold other than 0.5.
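A sketch of that effect on synthetic imbalanced data (the dataset and the logistic model are illustrative assumptions, not the asker's setup):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Illustrative imbalanced binary data.
X, y = make_classification(n_samples=20000, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# F1 at the default 0.5 cut-off (what confusion_matrix / f1_score see).
print("F1 @ 0.5:", f1_score(y_te, proba >= 0.5))

# Average precision uses every threshold: the area under the precision-recall curve.
print("Average precision:", average_precision_score(y_te, proba))

# Scan the thresholds to see whether F1 peaks somewhere other than 0.5.
prec, rec, thr = precision_recall_curve(y_te, proba)
f1s = 2 * prec[:-1] * rec[:-1] / (prec[:-1] + rec[:-1] + 1e-12)
print("Best F1:", f1s.max(), "at threshold", thr[f1s.argmax()])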

In Classification, what is the difference between the test accuracy and the AUC score?

I am working on a classification-based project, and I am evaluating different ML models based on their training accuracy, testing accuracy, confusion matrix, and AUC score. I am stuck on the difference between the score I get by calculating the accuracy of a model on the test set (X_test) and the AUC score.
If I am correct, both metrics measure how well a model predicts the correct class for previously unseen data. I also understand that for both, the higher the number the better, as long as the model is not over-fit or under-fit.
Assuming a model is neither over-fit nor under-fit, what is the difference between the test accuracy score and the AUC score?
I don't have a background in math and stats, and I pivoted towards data science from a business background, so I would appreciate an explanation a business person can understand.
Both terms quantify the quality of a classification model. However, accuracy summarizes a single outcome of the classifier, which means it describes a single confusion matrix. The AUC (area under the curve) represents the trade-off between the true-positive rate (tpr) and the false-positive rate (fpr) across the multiple confusion matrices that are generated for different fpr values of the same classifier.
A confusion matrix is of the form:

                    predicted positive    predicted negative
actual positive            tp                     fn
actual negative            fp                     tn

1) The accuracy is a measure for a single confusion matrix and is defined as:

accuracy = (tp + tn) / (tp + tn + fp + fn)

where tp = true positives, tn = true negatives, fp = false positives and fn = false negatives (the count of each).
2) The AUC measures the area under the ROC (receiver operating characteristic), which is the trade-off curve between the true-positive rate and the false-positive rate. For each value of the false-positive rate (fpr), the corresponding true-positive rate (tpr) is determined. I.e., for a given classifier an fpr of 0, 0.1, 0.2 and so forth is accepted, and for each fpr the corresponding tpr is evaluated. You therefore get a function tpr(fpr) that maps the interval [0,1] onto the same interval, because both rates are defined on that interval. The area under this curve is the AUC, which lies between 0 and 1, whereby a random classifier is expected to yield an AUC of 0.5.
The AUC, as it is the area under the curve, is defined as:

AUC = integral of tpr(fpr) d(fpr) over fpr from 0 to 1

However, in real (and finite) applications the ROC is a step function, and the AUC is determined by a weighted sum over these steps.
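A short, self-contained sketch of both quantities on made-up data (the synthetic dataset and the logistic model are illustrative assumptions, not material from the lecture cited below):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, auc, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Illustrative data and model.
X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# 1) Accuracy: one confusion matrix at the default 0.5 threshold.
y_hat = clf.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn), accuracy_score(y_te, y_hat))

# 2) AUC: sweep all thresholds, get the tpr(fpr) step function, integrate it.
proba = clf.predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, proba)
print("AUC (area under the ROC steps):", auc(fpr, tpr))
print("AUC (sklearn shortcut):        ", roc_auc_score(y_te, proba))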
Graphics are from Borgelt's Intelligent Data Mining Lecture.

What is the meaning of high precision and very low recall in a recommender system?

I don't have much knowledge about precision and recall. I have designed a recommender system. It gives me
precision value = 0.409
and recall value = 0.067
We know that precision and recall are inversely related, though I am not sure about that. So what about my system?
Is it OK if I increase the precision value and decrease the recall value?
Precision is the percentage of correct decisions among the cases the model predicts as positive; it depends only on the model's positive predictions. Recall, on the other hand, measures the percentage of correct decisions within the actual positive class, i.e., out of all truly positive cases, the fraction that the model correctly identifies.
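A minimal sketch of the two definitions in a recommender setting (the item sets are invented for illustration):

# Hypothetical example: items the system recommended vs. items the user
# actually found relevant.
recommended = {"A", "B", "C", "D", "E"}
relevant = {"B", "E", "F", "G", "H", "I", "J", "K", "L", "M"}

hits = recommended & relevant             # recommended items that were relevant

precision = len(hits) / len(recommended)  # correctness among what the model picked
recall = len(hits) / len(relevant)        # coverage of everything that was relevant

# 0.4 and 0.2 here: with few recommendations, precision can be decent
# while recall stays low, similar to the scores in the question.
print(precision, recall)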

Why does k-NN have low accuracy but high precision?

I classified the 20NG dataset with k-NN, using 200 instances in each category and an 80-20 train-test split, and I got the following results.
Here the accuracy is quite low, but how can precision be high when accuracy is that low? Isn't the precision formula TP/(TP + FP)? If so, a highly accurate classifier needs to generate many true positives, which results in high precision, so how is k-NN producing high precision with such a low true positive rate?
Recall is equivalent to the true positive rate. Text classification tasks (especially Information Retrieval, but Text Categorization as well) show a trade-off between recall and precision. When precision is very high, recall tends to be low, and the opposite. This is because you can tune the classifier to classify more or fewer instances as positive. The fewer instances you classify as positive, the higher the precision and the lower the recall.
To ensure that the effectiveness measure correlates with accuracy, you should focus on the F-measure, which averages recall and precision (F-measure = 2*r*p / (r+p)).
Non-lazy classifiers follow a training process in which they try to optimize accuracy or error. k-NN, being lazy, does not have a training process and, in consequence, does not try to optimize any effectiveness measure. You can play with different values of K; intuitively, the bigger the K, the higher the recall and the lower the precision, and the opposite, as in the sketch below.
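A rough sketch of that experiment, assuming scikit-learn's bundled fetch of 20 Newsgroups and restricting it to a handful of categories so it runs quickly (the category list and TF-IDF features are illustrative choices, not the asker's exact setup):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# A few categories only, to keep the sketch fast; the question used all of 20NG.
cats = ["sci.space", "rec.autos", "comp.graphics", "talk.politics.misc"]
data = fetch_20newsgroups(subset="all", categories=cats,
                          remove=("headers", "footers", "quotes"))

X = TfidfVectorizer(stop_words="english").fit_transform(data.data)
X_tr, X_te, y_tr, y_te = train_test_split(X, data.target, test_size=0.2, random_state=0)

for k in (1, 5, 15, 45):
    y_hat = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).predict(X_te)
    print(f"k={k:2d}  "
          f"accuracy={accuracy_score(y_te, y_hat):.3f}  "
          f"macro precision={precision_score(y_te, y_hat, average='macro'):.3f}  "
          f"macro recall={recall_score(y_te, y_hat, average='macro'):.3f}")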
