I am doing a project to find out the disease associated genes using text mining. I am using 1000 articles for this. I got around 129 gene names. The actual dataset contains around 1000 entries. Now I would like to calculate the precision and recall of my method. When i did the comparison, out of the 129 genes, 72 were found to be correct. So the
precision = 72/129.
Is it correct?
Now how can I calculate the recall? Please help
The Wikipedia Article on Precision and Recall might help.
The definitions are:
Precision: tp / (tp+fp)
Recall: tp / (tp + fn)
Where tp are the true positives (genes which are associated with disease and you found them), fp are the false positives (genes you found but they actually aren't associated with the disease) and fn are the false negatives (genes which are actually associated with the disease but you didn't find them).
I am not quite sure what the numbers you have posted represent. Do you know the genes which are truly associated with the disease?
You have most likely computed the accuracy:
Accuracy = (tp + fp) / (Total Number)
The main issue was that the articles that i am considering might not contain all the originally listed gene names since its a small dataset. So while calculating the recall, instead of considering the denominator as 1000, I can compare the original database of genes with the articles to find out how many of the originally associated genes are present in the literature. i.e., if there are 1000 associated genes, I will check out of 1000 how many are there in the dataset I am considering. If it is 300, i will set the denominator as 300 instead of 1000. That will give recall.
Related
Consider the below scenario:
I have batches of data whose features and labels have similar distribution.
Say something like 4000000 negative labels and 25000 positive labels
As its a highly imbalanced set, I have undersampled the negative labels so that my training set (taken from one of the batch) now contains 25000 positive labels and 500000 negative labels.
Now I am trying to measure the precision and recall from a test set after training (generated from a different batch)
I am using XGBoost with 30 estimators.
Now if I use all of 40000000 negative labels, I get a (0.1 precsion and 0.1 recall at 0.7 threshold) worser precision-recall score than if I use a subset say just 500000 negative labels(0.4 precision with 0.1 recall at 0.3 threshold)..
What could be a potential reason that this could happen?
Few of the thoughts that I had:
The features of the 500000 negative labels are vastly different from the rest in the overall 40000000 negative labels.
But when I plot the individual features, their central tendencies closely match with the subset.
Are there any other ways to identify why I get a lower and a worser presicion recall, when the number of negative labels increase so much?
Are there any ways to compare the distributions?
Is my undersampled training a cause for this?
To understand this, we first need to understand how precision and recall are calculated. For this I will use the following variables:
P - total number of positives
N - total number of negatives
TP - number of true positives
TN - number of true negatives
FP - number of false positives
FN - number of false negatives
It is important to note that:
P = TP + FN
N = TN + FP
Now, precision is TP/(TP + FP)
recall is TP/(TP + FN), therefore TP/P.
Accuracy is TP/(TP + FN) + TN/(TN + FP), hence (TP + TN)/(P + N)
In your case where the the data is imbalanced, we have that N>>P.
Now imagine some random model. We can usually say that for such a model accuracy is around 50%, but that is only if the data is balanced. In your case, there will tend to be more FP's and TN's than TP's and FN's because a random selection of the data has more liklihood of returning a negative sample.
So we can establish that the more % of negative samples N/(T+N), the more FP and TN we get. That is, whenever your model is not able to select the correct label, it will pick a random label out of P and N and it is mostly going to be N.
Recall that FP is a denominator in precision? This means that precision also decreases with increasing N/(T+N).
For recall, we have neither FP nor TN in its derivation, so will likely not to change much with increasing N/(T+N) . As can be seen in your example, it clearly stays the same.
Therefore, I would try to make the data balanced to get better result. A ratio of 1:1.5 should do.
You can also use a different metric like the F1 score that combines precision and recall to get a better understanding of the performance.
Also check some of the other points made here on how to combat imbalance data
So I've got the following results from Naïves Bayes classification on my data set:
I am stuck however on understanding how to interpret the data. I am wanting to find and compare the accuracy of each class (a-g).
I know accuracy is found using this formula:
However, lets take the class a. If I take the number of correctly classified instances - 313 - and divide it by the total number of 'a' (4953) from the row a, this gives ~6.32%. Would this be the accuracy?
EDIT: if we use the column instead of the row, we get 313/1199 which gives ~26.1% which seems a more reasonable number.
EDIT 2: I have done a calculation of the accuracy of a in excel which gives me 84% as the accuracy, using the accuracy calculation shown above:
This doesn't seem right, as the overall accuracy of classification successfully is ~24%
No -- all you've calculated is tp/(tp+fn), the total correct identifications of class a, divided by the total of actual a examples. This is recall, not accuracy. You need to include the other two figures.
fp is the rest of the a column; tn is all of the other figures in the non-a rows and columns, the 6x6 sub-matrix. This will reduce all 35K+ trials to a 2x2 matrix with labels a and not a, the 2x2 confusion matrix with which you're already familiar.
Yes, you get to repeat that reduction for each of the seven features. I recommend doing it programmatically.
RESPONSE TO OP UPDATE
Your accuracy is that high: you have a huge quantity of true negatives, not-a samples that were properly classified as not-a.
Perhaps it doesn't feel right because our experience focuses more on the class in question. There are [other statistics that handle that focus.
Recall is tp / (tp+fn) -- of all items actually in class a, what percentage did we properly identify? This is the 6.32% figure.
Precision is tp / (tp + fp) -- of all items identified as class a, what percentage were actually in that class. This is the 26.1% figure you calculated.
I didn't find an answer for this question anywhere, so I hope someone here could help me and also other people with the same problem.
Suppose that I have 1000 Positive samples and 1500 Negative samples.
Now, suppose that there are 950 True Positives (positive samples correctly classified as positive) and 100 False Positives (negative samples incorrectly classified as positive).
Should I use these raw numbers to compute the Precision, or should I consider the different group sizes?
In other words, should my precision be:
TruePositive / (TruePositive + FalsePositive) = 950 / (950 + 100) = 90.476%
OR should it be:
(TruePositive / 1000) / [(TruePositive / 1000) + (FalsePositive / 1500)] = 0.95 / (0.95 + 0.067) = 93.44%
In the first calculation, I took the raw numbers without any consideration to the amount of samples in each group, while in the second calculation, I used the proportions of each measure to its corresponding group, to remove the bias caused by the groups' different size
Answering the stated question: by definition, precision is computed by the first formula: TP/(TP+FP).
However, it doesn't mean that you have to use this formula, i.e. precision measure. There are many other measures, look at the table on this wiki page and choose the one most suited for your task.
For example, positive likelihood ratio seems to be the most similar to your second formula.
I am currently learning Information retrieval and i am rather stuck with an example of recall and precision
A searcher uses a search engine to look for information. There are 10 documents on the first screen of results and 10 on the second.
Assuming there is known to be 10 relevant documents in the search engines index.
Soo... there is 20 searches all together of which 10 are relevant.
Can anyone help me make sense of this?
Thanks
Recall and precision measure the quality of your result. To understand them let's first define the types of results. A document in your returned list can either be
classified correctly
a true positive (TP): a document which is relevant (positive) that was indeed returned (true)
a true negative (TN): a document which is not relevant (negative) that was indeed NOT returned (true)
misclassified
a false positive (FP): a document which is not relevant but was returned positive
a false negative (FN): a document which is relevant but was not returned negative
the precision is then:
|TP| / (|TP| + |FP|)
i.e. the fraction of retrieved documents which are indeed relevant
the recall is then:
|TP| / (|TP| + |FN|)
i.e. the fraction of relevant documents which are in your result set
So, in your example 10 out of 20 results are relevant. This gives you a precision of 0.5. If there are no more than these 10 relevant documents, you have got a recall of 1.
(When measuring the performance of an Information Retrieval system it only makes sense to consider both precision and recall. You can easily get a precision of 100% by returning no result at all (i.e. no spurious returned instance => no FP) or a recall of 100% by returning every instance (i.e. no relevant document was missed => no FN). )
Well, this is an extension of my answer on recall at: https://stackoverflow.com/a/63120204/6907424. First read about precision here and than go to read recall. Here I am only explaining Precision using the same example:
ExampleNo Ground-truth Model's Prediction
0 Cat Cat
1 Cat Dog
2 Cat Cat
3 Dog Cat
4 Dog Dog
For now I am calculating precision for Cat. So Cat is our Positive Class and the rest of the classes (Here Dog only) are the Negative Classes. Precision means what the percentage of positive detection was actually positive. So here for Cat there are 3 detection by the model. But are all of them correct? No! Out of them only 2 are correct (in example 0 and 2) and another is wrong (in example 3). So the percentage of correct detection is 2 out of 3 which is (2 / 3) * 100 % = 66.67%.
Now coming to the formulation, here:
TP (True positive): Predicting something positive when it is actually positive. If cat is our positive example then predicting something a cat when it is actually a cat.
FP (False positive): Predicting something as positive but which is not actually positive, i.e, saying something positive "falsely".
Now the number of correct detection of a certain class is the number of TP of that class. But apart from them the model also predicted some other examples as positives but which were not actually positives and so these are the false positives (FP). So irrespective of correct or wrong the total number of positive class detected by the model is TP + FP. So the percentage of correct detection of positive class among all detection of that class will be: TP / (TP + FP) which is the precision of the detection of that class.
Like recall we can also generalize this formula for any number of classes. Just take one class at a time and consider it as the positive class and the rest of the classes as negative classes and continue the same process for all of the classes to calculate precision for each of them.
You can calculate precision and recall in another way (basically the other way of thinking the same formulae). Say for Cat, first count how many examples at the same time have Cat in both Ground-truth and Model's prediction (i.e, count the number of TP). Therefore if you are calculating precision then divide this count by the number of "Cat"s in the Model's Prediction. Otherwise for recall divide by the number of "Cat"s in the Ground-truth. This works as the same as the formulae of precision and recall. If you can't understand why then you should think for a while and review what actually TP, FP, TN and FN means.
If you have difficulty understanding precision and recall, consider reading this
https://medium.com/seek-product-management/8-out-of-10-brown-cats-6e39a22b65dc
What's the meaning of recall of a classifier, e.g. bayes classifier? please give an example.
for example, the Precision = correct/correct+wrong docs for test data. how to understand recall?
Recall literally is how many of the true positives were recalled (found), i.e. how many of the correct hits were also found.
Precision (your formula is incorrect) is how many of the returned hits were true positive i.e. how many of the found were correct hits.
I found the explanation of Precision and Recall from Wikipedia very useful:
Suppose a computer program for recognizing dogs in photographs identifies 8 dogs in a picture containing 12 dogs and some cats. Of the 8 dogs identified, 5 actually are dogs (true positives), while the rest are cats (false positives). The program's precision is 5/8 while its recall is 5/12. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3.
So, in this case, precision is "how useful the search results are", and recall is "how complete the results are".
Precision in ML is the same as in Information Retrieval.
recall = TP / (TP + FN)
precision = TP / (TP + FP)
(Where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative).
It makes sense to use these notations for binary classifier, usually the "positive" is the less common classification. Note that the precision/recall metrics is actually the specific form where #classes=2 for the more general confusion matrix.
Also, your notation of "precision" is actually accuracy, and is (TP+TN)/ ALL
Giving you an example. Imagine we have a machine learning model which can detect cat vs dog. The actual label which is provided by human is called the ground-truth.
Again the output of your model is called the prediction. Now look at the following table:
ExampleNo Ground-truth Model's Prediction
0 Cat Cat
1 Cat Dog
2 Cat Cat
3 Dog Cat
4 Dog Dog
Say we want to find recall for the class cat. By definition recall means the percentage of a certain class correctly identified (from all of the given examples of that class). So for the class cat the model correctly identified it for 2 times (in example 0 and 2). But does it mean actually there are only 2 cats? No! In reality there are 3 cats in the ground truth (human labeled). So what is the percentage of correct identification of this certain class? 2 out of 3 that is (2/3) * 100 % = 66.67% or 0.667 if you normalize it within 1. Here is another prediction of cat in example 3 but it is not a correct prediction and hence, we are not considering it.
Now coming to mathematical formulation. First understand two terms:
TP (True positive): Predicting something positive when it is actually positive. If cat is our positive example then predicting something a cat when it is actually a cat.
FN (False negative): Predicting something negative when it is not actually negative.
Now for a certain class this classifier's output can be of two types: Cat or Dog (Not Cat). So the number correct identification is the number of True positive (TP). Again total number of examples of that class in ground-truth will be TP + FN. Because out of all cats the model either detected them correctly (TP) or didn't detect them correctly (FN i.e, the model falsely said Negative (Non Cat) when it was actually positive (Cat)). So For a certain class TP + FN denotes the total number of examples available in the ground truth of that class. So the formula is:
Recall = TP / (TP + FN)
Similarly recall can be calculated for Dog as well. At that time think the Dog as the positive class and the Cat as negative classes.
So for any number of classes to find recall of a certain class take the class as the positive class and take the rest of the classes as the negative classes and use the formula to find recall. Continue the process for each of the classes to find recall for all of them.
If you want to learn about precision as well then go here: https://stackoverflow.com/a/63121274/6907424
In very simple language: For example, in a series of photos showing politicians, how many times was the photo of politician XY was correctly recognised as showing A. Merkel and not some other politician?
precision is the ratio of how many times ANOTHER person was recognized (false positives) : (Correct hits) / (Correct hits) + (false positives)
recall is the ratio of how many times the name of the person shown in the photos was incorrectly recognized ('recalled'): (Correct calls) / (Correct calls) + (false calls)