When do micro- and macro-averages differ a lot? [closed] - machine-learning

I am learning machine learning theory. I have a confusion matrix from a multi-class logistic regression prediction.
Now I have calculated the micro and macro averages (precision & recall).
The values are quite different, and I wonder which factors influence this. Under which conditions do the micro and macro averages differ a lot?
What I noticed is that the prediction accuracy differs between the classes. Is this the reason, or what other factors can cause this?
The sample confusion matrix:
And my calculated micro and macro averages:
precision-micro = ~0.7329
recall-micro = ~0.7329
precision-macro = ~0.5910
recall-macro = ~0.6795

The difference between micro and macro averages becomes apparent in imbalanced datasets.
The micro average is a global strategy that essentially ignores the distinction between classes: it is calculated by counting the total true positives, false negatives and false positives over all classes.
In classification tasks where the underlying problem is not multilabel (each sample receives exactly one predicted label), the micro average actually equals the accuracy score. Notice that your micro precision and recall are equal; compute the accuracy score and compare, and you will see no difference.
In the case of the macro average, precision and recall are calculated for each label separately and reported as their unweighted mean. Depending on how your classifier performs on each class, this can heavily influence the result.
You can also refer to this answer of mine, where it has been addressed in a bit more detail.
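As a small illustration (the labels below are made up, not taken from your confusion matrix), here is how the two averaging strategies compare in sklearn; note that the micro values coincide with the accuracy:

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical imbalanced 3-class labels, purely for illustration
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1]

accuracy_score(y_true, y_pred)                    # 0.80
precision_score(y_true, y_pred, average='micro')  # 0.80, same as accuracy
recall_score(y_true, y_pred, average='micro')     # 0.80
precision_score(y_true, y_pred, average='macro')  # ~0.67
recall_score(y_true, y_pred, average='macro')     # ~0.69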

Related

Why is micro precision/recall better suited for class imbalance? [closed]

I have three classes. Suppose the first class has 30 elements, the second 30, and the third 1000.
Some algorithm made predictions and the following error matrix was obtained (rows are predictions, columns are true labels):
[[ 1 0 10]
[ 29 2 10]
[ 0 28 980]]
From this matrix it can be seen that the third class is classified well, while the other two classes are almost always classified incorrectly.
The result is the following precision and recall:
Precision: micro 0.927, macro 0.371
Recall: micro 0.927, macro 0.360
The official documentation and many articles and questions (for example, from here) say that it is better to use micro averaging when the classes are imbalanced. Intuitively, though, it seems that in this case micro shows overly good metric values, even though two of the classes are almost never classified correctly.
The micro-precision/recall are not "better" for imbalanced classes.
In fact, if you look at the results, it is clear that the macro precision/recall take very small values when you have bad predictions on an imbalanced dataset (poor results on the less well represented labels).
Micro-precision, however, is computed from counts pooled over all samples, so the classes with more elements dominate the result.
From sklearn's micro and macro f1-score for example (same for precision and recall):
'micro':
Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro':
Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
So macro actually penalises you when you have poor results in a label which is not well represented.
The micro average, on the other hand, does not do that, as it computes the metrics globally.
For example, this means that if you have many samples in class 0 and most of its predictions are correct, while class 1 has few samples and many bad predictions, the micro precision/recall can still yield a high number, whereas a macro metric (precision/recall/f1-score) will penalise the poor results on that specific label and yield a small number.
It really depends on what you are interested in. If you want globally good results and you do not care about the distribution of labels, a micro metric may be suitable.
However, we usually care about the results on the less well represented classes in our datasets, hence the utility of a macro metric rather than a micro one.
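As a rough sketch, the numbers above can be reproduced directly from the confusion matrix in the question (remember that its rows are predictions and its columns are true labels):

import numpy as np

# Confusion matrix from the question: rows are predictions, columns are true labels
cm = np.array([[  1,   0,  10],
               [ 29,   2,  10],
               [  0,  28, 980]])

tp = np.diag(cm)            # correctly classified samples per class
predicted = cm.sum(axis=1)  # row sums: how often each class was predicted
actual = cm.sum(axis=0)     # column sums: how many samples each class really has

# Micro: pool the counts over all classes first, then divide
micro_precision = tp.sum() / predicted.sum()  # ~0.927
micro_recall = tp.sum() / actual.sum()        # ~0.927

# Macro: compute per class, then take the unweighted mean
macro_precision = (tp / predicted).mean()     # ~0.371
macro_recall = (tp / actual).mean()           # ~0.360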

What are the performance metrics for Clustering Algorithms? [closed]

I'm working on k-means clustering, but unlike supervised learning I cannot figure out the performance metrics for clustering algorithms. How do I measure accuracy after training on the data?
For k-means you can look at its inertia_, which gives you an idea of how well the algorithm has worked.

from sklearn.cluster import KMeans

kmeans = KMeans(...)  # choose your hyperparameters
# Assuming you have already fitted it on your data.
kmeans.inertia_  # lower is better

Alternatively, you can call the score() function, which gives you the same value but with a negative sign. By convention a bigger score means better, whereas for k-means a lower inertia_ is better, so an extra negation is applied to keep the two consistent.
# Call score with data X
kmeans.score(X) # greater is better
This is the most basic way of analyzing the performance of k-means. In reality, if you set the number of clusters too high, score() will increase accordingly (in other words, inertia_ will decrease), because inertia_ is simply the sum of squared distances from each point to the centroid of the cluster it is assigned to. If you increase the number of clusters too much, this sum keeps shrinking, since every point gets a centroid very close to it, even though the quality of the clustering is poor in that case. So for a better analysis you should compute the silhouette score, or better yet, use a silhouette diagram, as in the sketch below.
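A minimal sketch of the silhouette score on toy data (the dataset and cluster count here are made up, purely to show the call):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical toy data, purely for illustration
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
kmeans = KMeans(n_clusters=4, random_state=42).fit(X)
silhouette_score(X, kmeans.labels_)  # ranges from -1 to 1; closer to 1 is better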
You will find all of the implementations in this notebook: 09_unsupervised_learning.ipynb
The book corresponding to this repository is: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition. It is a great book to learn all of these details.

Which is given more importance, precision or recall, in a classification report for obtaining a better prediction model? [closed]

For a classification problem in machine learning, which of precision and recall in the classification report should be given more importance to get a better model?
It actually depends on your classification problem.
First, you need to understand the difference between precision and recall.
Wikipedia may be a good start, but I would suggest this resource by developers.google.
Now imagine you're trying to track covid cases with a classifier.
The classifier tells you if a patient is carrying covid or not.
Are you more interested in:
A) Identifying all possible covid cases?
B) Being sure that when you identify a covid case, it really is a covid case?
If A) is more important you should focus on recall. On the other hand, if you're more interested in B), then precision is probably what you're looking for.
Be aware though:
Let's say you're testing 1000 possible cases, and let's say 500 of them are positive; we just don't know it yet. You use the classifier and it tells you all 1000 people are positive.
So you have:
true_positives = 500
false_negatives = 0
recall = true_positives / (true_positives + false_negatives)
recall = 500 / (500 + 0) = 1
So here you have perfect recall, but you're neither precise nor accurate: the precision is only 500 / 1000 = 0.5.
What I'm trying to express is that one shouldn't focus on one metric over another, but always keep a broad view on the problem.
However, if you want to focus on just one metric to sum up both precision and recall, that's what the F score was made for.
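To make the example above concrete, here is a small sketch with hypothetical labels (500 of the 1000 cases are truly positive, and the classifier flags everyone as positive):

from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels matching the scenario above
y_true = [1] * 500 + [0] * 500
y_pred = [1] * 1000

recall_score(y_true, y_pred)     # 1.0, perfect recall
precision_score(y_true, y_pred)  # 0.5, half of the flagged cases are wrong
f1_score(y_true, y_pred)         # ~0.67, the F score summarises both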

How do you calculate accuracy metric for a regression problem with multiple outputs? [closed]

My CNN (Conv1D) in PyTorch has 20 inputs and 6 outputs. The predicted output is said to be "accurate" only if all 6 of them match, right? So, unless all my predicted values are accurate to the 8th decimal place, will I ever be able to get decent accuracy?
The standard accuracy metric is used for classification tasks. In order to use accuracy, you have to say whether an output is one of the following: true positive (TP), true negative (TN), false positive (FP), or false negative (FN).
These classification metrics can be used to a certain extent in regression tasks, when you can apply these labels (TP, TN, FP, FN) to the outputs, for example via a simple threshold. This heavily depends on the kind of problem you are dealing with and may or may not be possible or useful.
As Andrey said, in general you want to use metrics like the mean absolute error (MAE) or the mean squared error (MSE). But these metrics can be hard to interpret. I would suggest looking into papers that deal with a problem similar to yours and seeing which metrics they use to evaluate their results and compare themselves to other work.
Accuracy isn't a suitable metric for regression tasks. For regression tasks you should use such metrics as MAE, RMSE and so on.
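A minimal sketch of those metrics for a model with 6 outputs (the arrays below are made up; by default sklearn averages the error over all outputs):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical targets and predictions with 6 outputs per sample
y_true = np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                   [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]])
y_pred = y_true + 0.05  # pretend the model is off by 0.05 everywhere

mean_absolute_error(y_true, y_pred)  # 0.05
mean_squared_error(y_true, y_pred)   # 0.0025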

What is the f1-score and what does its value indicate? [closed]

There is an evaluation metric in sklearn, the f1-score (an f-beta score also exists).
I know how to use it, but I could not quite understand what it stands for.
What does it indicate when it is big or small?
If we put the formula aside, what should I understand from an f-score value?
The F-score is a simple formula that combines the precision and recall scores. Imagine you want to predict labels for a binary classification task (positive or negative). You have 4 types of predictions:
true positive: correctly assigned as positive.
true negative: correctly assigned as negative.
false positive: wrongly assigned as positive.
false negative: wrongly assigned as negative.
Precision is the proportion of true positives among all positive predictions. A precision of 1 means that you have no false positives, which is good because you never say that an element is positive when it is not.
Recall is the proportion of true positives among all actually positive elements. A recall of 1 means that you have no false negatives, which is good because you never say an element belongs to the opposite class when it actually belongs to your class.
If you want to know whether your predictions are good, you need both measures. You can have a precision of 1 (so when you say it's positive, it actually is positive) but still have a very low recall (you predicted 3 positives correctly but missed 15 others). Or you can have a good recall and a bad precision.
This is why you might check f1-score, but also any other type of f-score. If one of these two values decreases dramatically, the f-score also does. But be aware that in many problems, we prefer giving more weight to precision or to recall (in web security, it is better to wrongly block some good requests than to let go some bad ones).
The f1-score is one of the most popular performance metrics; from what I recall, it is the one available in sklearn.
In essence, the f1-score is the harmonic mean of precision and recall. Since creating a classifier always involves a compromise between recall and precision, it is hard to compare a model with high recall and low precision against a model with high precision but low recall. The f1-score is a measure we can use to compare two such models.
This is not to say that a model with higher f1 score is always better - this could depend on your specific case.
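A short sketch of the harmonic-mean behaviour and of the f-beta variant mentioned above (the labels are made up, purely for illustration):

from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

# Hypothetical binary labels, purely for illustration
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision_score(y_true, y_pred)        # ~0.67
recall_score(y_true, y_pred)           # 0.50
f1_score(y_true, y_pred)               # ~0.57, harmonic mean of the two
fbeta_score(y_true, y_pred, beta=2.0)  # ~0.53, beta > 1 weights recall more than precision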
