How to get each class's results separately in a multiclass confusion matrix - machine-learning

I have the actual classes and predicted classes here - https://extendsclass.com/csv-editor.html#46eaa9e
I want to calculate the sensitivity, specificity, and positive predictivity for each of the classes A, N, O. Here is my code:
from sklearn.metrics import multilabel_confusion_matrix, classification_report
import numpy as np
mcm = multilabel_confusion_matrix(act_class, pred_class)
tps = mcm[:, 1, 1]
tns = mcm[:, 0, 0]
recall = tps / (tps + mcm[:, 1, 0]) # Sensitivity
specificity = tns / (tns + mcm[:, 0, 1]) # Specificity
precision = tps / (tps + mcm[:, 0, 1]) # PPV
print(recall)
print(specificity)
print(precision)
print(classification_report(act_class, pred_class))
This gives me results like the following:
[0.31818182 0.96186441 nan nan]
[0.99576271 0.86363636 0.86092715 0.99337748]
[0.95454545 0.96186441 0. 0. ]
precision recall f1-score support
A 0.95 0.32 0.48 66
N 0.96 0.96 0.96 236
O 0.00 0.00 0.00 0
~ 0.00 0.00 0.00 0
accuracy 0.82 302
macro avg 0.48 0.32 0.36 302
weighted avg 0.96 0.82 0.86 302
The problem is that I cannot clearly deduce which sensitivity, specificity, and positive predictivity values belong to each of the classes A, N, O.

This might be quicker to explain with a small example.
By default the labels occur in sorted order (for your problem: A, N, O, ~).
If you want a different order, you can specify one with the labels= parameter. The following example has two classes and orders them as [3, 2]:
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.metrics import classification_report
y_true = [2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3]
y_pred = [2, 2, 2, 3, 3, 2, 2, 2, 3, 3, 3, 3]
mcm = multilabel_confusion_matrix(y_true, y_pred, labels=[3, 2])
tps = mcm[:, 1, 1]
precision = tps / (tps + mcm[:, 0, 1])
print(precision)
print(f"Precision class 3: {precision[0]}. Precision class 2: {precision[1]}")
print(classification_report(y_true, y_pred, labels=[3, 2]))
Output:
[0.66666667 0.5 ]
Precision class 3: 0.6666666666666666. Precision class 2: 0.5
precision recall f1-score support
3 0.67 0.57 0.62 7
2 0.50 0.60 0.55 5
accuracy 0.58 12
macro avg 0.58 0.59 0.58 12
weighted avg 0.60 0.58 0.59 12
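Applying the same pattern to the four labels in your data is then just a matter of passing them explicitly. A minimal sketch, reusing act_class and pred_class from the question (the labels= order below is an assumption; adjust it to whatever order you prefer):
from sklearn.metrics import multilabel_confusion_matrix
labels = ['A', 'N', 'O', '~']   # fixes the row order of mcm
mcm = multilabel_confusion_matrix(act_class, pred_class, labels=labels)
tps, fns = mcm[:, 1, 1], mcm[:, 1, 0]
tns, fps = mcm[:, 0, 0], mcm[:, 0, 1]
for label, tp, fn, tn, fp in zip(labels, tps, fns, tns, fps):
    sensitivity = tp / (tp + fn) if (tp + fn) else float('nan')   # recall
    specificity = tn / (tn + fp) if (tn + fp) else float('nan')
    ppv = tp / (tp + fp) if (tp + fp) else float('nan')           # positive predictivity
    print(f"{label}: sensitivity={sensitivity:.3f}, specificity={specificity:.3f}, PPV={ppv:.3f}")
Each printed line is now unambiguously tied to one class, because row i of mcm always corresponds to labels[i].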


Isolation Forest getting accuracy score 0.0 [closed]

Edit: please share comments, as I'm learning to post good questions.
I'm trying to train a model on this dataset with IsolationForest(). I need to train on this dataset and then use the model on another dataset with altered qualities to predict the quality values and fetch all wines with quality 8 and 9.
However, I'm having some problems with it, because the accuracy score from the classification report is 0.0:
print(classification_report(y_test, prediction))
precision recall f1-score support
-1 0.00 0.00 0.00 0.0
1 0.00 0.00 0.00 0.0
3 0.00 0.00 0.00 866.0
4 0.00 0.00 0.00 829.0
5 0.00 0.00 0.00 841.0
6 0.00 0.00 0.00 861.0
7 0.00 0.00 0.00 822.0
8 0.00 0.00 0.00 886.0
9 0.00 0.00 0.00 851.0
accuracy 0.00 5956.0
macro avg 0.00 0.00 0.00 5956.0
weighted avg 0.00 0.00 0.00 5956.0
I don't know whether it's a hyperparameter issue, whether I'm cleaning the wrong data, or whether I'm passing the wrong parameters. I have already tried it with SMOTE and without SMOTE, and I want to reach an accuracy of at least 90%.
I'll leave the shared drive link public for dataset verification:
https://drive.google.com/drive/folders/18_sOSIZZw9DCW7ftEKuOG4aIzGXoasFe?usp=sharing
Here's my code:
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import IsolationForest
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from sklearn.metrics import classification_report,confusion_matrix
df = pd.read_csv('wines.csv')
df.head(5)
ordinalEncoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-99).fit(df[['color']])
df[['color']] = ordinalEncoder.transform(df[['color']])
df.info()
df['color'] = df['color'].astype(int)
df.head(3)
stm = SMOTE(k_neighbors=4)
x_smote = df.drop('quality',axis=1)
y_smote = df['quality']
x_smote,y_smote = stm.fit_resample(x_smote,y_smote)
print(x_smote.shape,y_smote.shape)
x_smote.columns
scaler = StandardScaler()
X = scaler.fit_transform(x_smote)
y = y_smote
X.shape, y.shape
x_train, x_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
from sklearn.ensemble import IsolationForest
from sklearn.metrics import hamming_loss
iforest = IsolationForest(n_estimators=200, max_samples=0.1, contamination=0.10,
                          max_features=1.0, bootstrap=False, n_jobs=-1,
                          random_state=None, verbose=0, warm_start=False)
iforest_fit = iforest.fit(x_train,y_train)
prediction = iforest_fit.predict(x_test)
print (prediction.shape, y_test.shape)
y.value_counts()
prediction
print(confusion_matrix(y_test, prediction))
hamming_loss(y_test, prediction)
from sklearn.metrics import classification_report
print(classification_report(y_test, prediction))
May I know why you chose Isolation Forest as your model? This article says that Isolation Forest is an unsupervised learning algorithm for anomaly detection.
When I print some samples of the predictions (from Isolation Forest) next to samples of the actual truth, I get the following results, which show why the accuracy score is 0.0: Isolation Forest only predicts 1 (inlier) or -1 (outlier), never the wine quality labels.
print(list(prediction[0:15]))
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(list(y_test[0:15]))
[9, 4, 4, 7, 9, 3, 6, 7, 4, 8, 8, 7, 3, 8, 5]
The wines.csv dataset and your code are both pointing towards a multi-class classification problem. I have chosen RandomForestClassifier() to continue with the second part of your code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import hamming_loss
model = RandomForestClassifier()
model.fit(x_train,y_train)
prediction = model.predict(x_test)
print(prediction[0:15]) #see 15 samples of prediction
[3, 9, 5, 5, 7, 9, 7, 6, 9, 8, 5, 9, 8, 3, 3]
print(list(y_test[0:15])) #see 15 samples of actual truth
[3, 9, 5, 6, 6, 9, 7, 5, 9, 8, 5, 9, 8, 3, 3]
print(confusion_matrix(y_test, prediction))
[[842 0 0 0 0 0 0]
[ 2 815 17 8 1 1 0]
[ 8 50 690 130 26 2 0]
[ 2 28 152 531 128 16 0]
[ 4 1 15 66 716 32 3]
[ 0 1 0 4 12 833 0]
[ 0 0 0 0 0 0 820]]
print('hamming_loss =', hamming_loss(y_test, prediction))
hamming_loss = 0.11903962390866353
print(classification_report(y_test, prediction))
precision recall f1-score support
3 0.98 1.00 0.99 842
4 0.91 0.97 0.94 844
5 0.79 0.76 0.78 906
6 0.72 0.62 0.67 857
7 0.81 0.86 0.83 837
8 0.94 0.98 0.96 850
9 1.00 1.00 1.00 820
accuracy 0.88 5956
macro avg 0.88 0.88 0.88 5956
weighted avg 0.88 0.88 0.88 5956
The accuracy is already 0.88 even before tuning hyperparameters.
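If you want to push the accuracy higher, a hyperparameter search is the usual next step. A minimal sketch with GridSearchCV, reusing x_train and y_train from above (the parameter grid here is hypothetical, not tuned for this dataset):
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
param_grid = {                      # hypothetical grid, adjust to taste
    'n_estimators': [100, 300],
    'max_depth': [None, 20, 40],
    'min_samples_split': [2, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=3, scoring='accuracy', n_jobs=-1)
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)
The best estimator is then available as search.best_estimator_ and can be evaluated on x_test exactly like the model above.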

how to build a generator and discriminator for a DCGAN with images of size 256x256

I have the following generator and discriminator for a DCGAN with images of size 128x128, and it works excellently.
However, I would like to use the same code to generate images of size 256x256, but I cannot figure out how to build the generator and discriminator for that resolution.
# path to the training data directory
dataroot = "./dataset 128x128"
# Number of workers for dataloader
workers = 6
# Batch size during training
batch_size = 1
# Spatial size of training images. All images will be resized to this
# size using a transformer.
image_size = 128
# Number of channels in the training images. For color images this is 3
nc = 3
# Size of z latent vector (i.e. size of generator input)
nz = 100
# Size of feature maps in generator
ngf = 32
# Size of feature maps in discriminator
ndf = 32
# Number of training epochs
num_epochs = 20
# Learning rate for optimizers
lr = 0.0002
# Beta1 hyperparam for Adam optimizers
beta1 = 0.5
# Number of GPUs available. Use 0 for CPU mode.
ngpu = 2
print("Dataset done")
# Generator Code
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 16, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 16),
            nn.ReLU(True),
            # state size. (ngf*16) x 4 x 4
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 64 x 64
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 128 x 128
        )

    def forward(self, input):
        return self.main(input)

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 128 x 128
            nn.Conv2d(nc, ndf, 4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 64 x 64
            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 32 x 32
            nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 16 x 16
            nn.Conv2d(ndf * 4, ndf * 8, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 8 x 8
            nn.Conv2d(ndf * 8, ndf * 16, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 16),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*16) x 4 x 4
            nn.Conv2d(ndf * 16, 1, 4, stride=1, padding=0, bias=False),
            nn.Sigmoid()
            # state size. 1
        )

    def forward(self, input):
        return self.main(input)
# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0
print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch, accumulated (summed) with previous gradients
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Compute error of D as sum over the fake and the real batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()

        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1
How can I modify the generator and discriminator so they work with images of size 256x256?
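One way to do it (a sketch, not from the original post) is to keep the DCGAN pattern but add one more stride-2 stage to each network, so the generator upsamples 4 -> 8 -> 16 -> 32 -> 64 -> 128 -> 256 and the discriminator downsamples in reverse. Assuming the same nz, ngf, ndf, nc and torch.nn imported as nn, and with the hypothetical class names Generator256 / Discriminator256:
class Generator256(nn.Module):
    def __init__(self, ngpu):
        super(Generator256, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 32, 4, 1, 0, bias=False),        # -> 4 x 4
            nn.BatchNorm2d(ngf * 32), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 32, ngf * 16, 4, 2, 1, bias=False),  # -> 8 x 8
            nn.BatchNorm2d(ngf * 16), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),   # -> 16 x 16
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),    # -> 32 x 32
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),    # -> 64 x 64
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),        # -> 128 x 128
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),             # -> 256 x 256
            nn.Tanh()
        )
    def forward(self, input):
        return self.main(input)

class Discriminator256(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator256, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),                      # 256 -> 128
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),                 # -> 64
            nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),             # -> 32
            nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),             # -> 16
            nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, ndf * 16, 4, 2, 1, bias=False),            # -> 8
            nn.BatchNorm2d(ndf * 16), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 16, ndf * 32, 4, 2, 1, bias=False),           # -> 4
            nn.BatchNorm2d(ndf * 32), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 32, 1, 4, 1, 0, bias=False),                  # -> 1 x 1
            nn.Sigmoid()
        )
    def forward(self, input):
        return self.main(input)
Remember to also set image_size = 256 so the dataloader resizes images to match; memory use grows roughly 4x per image, so you may need to reduce ngf/ndf or the batch size.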

Get accuracy from true and predicted values

I have predicted_y and real_y.
Is there a faster way to get accuracy than:
import keras
from keras import backend as K
accuracy_array = K.eval(keras.metrics.categorical_accuracy(real_y, predicted_y))
print(sum(accuracy_array) / len(accuracy_array))
I would suggest using scikit-learn for your purpose, as I mentioned in my comment.
Example 1:
from sklearn import metrics
results = metrics.accuracy_score(real_y, predicted_y)
You can also get the classification report, including precision, recall, and f1-scores.
Example 2:
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
precision recall f1-score support
class 0 0.50 1.00 0.67 1
class 1 0.00 0.00 0.00 1
class 2 1.00 0.67 0.80 3
avg / total 0.70 0.60 0.61 5
Finally, for the confusion matrix use this:
Example 3:
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
confusion_matrix(y_true, y_pred)
array([[1, 0, 0],
[1, 0, 0],
[0, 1, 2]])
Try accuracy_score from scikit-learn.
import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)                   # 0.5 -> fraction of correct predictions
accuracy_score(y_true, y_pred, normalize=False)  # 2   -> number of correct predictions
I wrote a Python library for confusion matrix analysis; you can use it for your purpose.
>>> from pycm import *
>>> y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2] # or y_actu = numpy.array([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
>>> y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2] # or y_pred = numpy.array([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])
>>> cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred) # Create CM From Data
>>> cm.classes
[0, 1, 2]
>>> cm.table
{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
>>> print(cm)
Predict 0 1 2
Actual
0 3 0 0
1 0 1 2
2 2 1 3
Overall Statistics :
95% CI (0.30439,0.86228)
Bennett_S 0.375
Chi-Squared 6.6
Chi-Squared DF 4
Conditional Entropy 0.95915
Cramer_V 0.5244
Cross Entropy 1.59352
Gwet_AC1 0.38931
Joint Entropy 2.45915
KL Divergence 0.09352
Kappa 0.35484
Kappa 95% CI (-0.07708,0.78675)
Kappa No Prevalence 0.16667
Kappa Standard Error 0.22036
Kappa Unbiased 0.34426
Lambda A 0.16667
Lambda B 0.42857
Mutual Information 0.52421
Overall_ACC 0.58333
Overall_RACC 0.35417
Overall_RACCU 0.36458
PPV_Macro 0.56667
PPV_Micro 0.58333
Phi-Squared 0.55
Reference Entropy 1.5
Response Entropy 1.48336
Scott_PI 0.34426
Standard Error 0.14232
Strength_Of_Agreement(Altman) Fair
Strength_Of_Agreement(Cicchetti) Poor
Strength_Of_Agreement(Fleiss) Poor
Strength_Of_Agreement(Landis and Koch) Fair
TPR_Macro 0.61111
TPR_Micro 0.58333
Class Statistics :
Classes 0 1 2
ACC(Accuracy) 0.83333 0.75 0.58333
BM(Informedness or bookmaker informedness) 0.77778 0.22222 0.16667
DOR(Diagnostic odds ratio) None 4.0 2.0
ERR(Error rate) 0.16667 0.25 0.41667
F0.5(F0.5 score) 0.65217 0.45455 0.57692
F1(F1 score - harmonic mean of precision and sensitivity) 0.75 0.4 0.54545
F2(F2 score) 0.88235 0.35714 0.51724
FDR(False discovery rate) 0.4 0.5 0.4
FN(False negative/miss/type 2 error) 0 2 3
FNR(Miss rate or false negative rate) 0.0 0.66667 0.5
FOR(False omission rate) 0.0 0.2 0.42857
FP(False positive/type 1 error/false alarm) 2 1 2
FPR(Fall-out or false positive rate) 0.22222 0.11111 0.33333
G(G-measure geometric mean of precision and sensitivity) 0.7746 0.40825 0.54772
LR+(Positive likelihood ratio) 4.5 3.0 1.5
LR-(Negative likelihood ratio) 0.0 0.75 0.75
MCC(Matthews correlation coefficient) 0.68313 0.2582 0.16903
MK(Markedness) 0.6 0.3 0.17143
N(Condition negative) 9 9 6
NPV(Negative predictive value) 1.0 0.8 0.57143
P(Condition positive) 3 3 6
POP(Population) 12 12 12
PPV(Precision or positive predictive value) 0.6 0.5 0.6
PRE(Prevalence) 0.25 0.25 0.5
RACC(Random accuracy) 0.10417 0.04167 0.20833
RACCU(Random accuracy unbiased) 0.11111 0.0434 0.21007
TN(True negative/correct rejection) 7 8 4
TNR(Specificity or true negative rate) 0.77778 0.88889 0.66667
TON(Test outcome negative) 7 10 7
TOP(Test outcome positive) 5 2 5
TP(True positive/hit) 3 1 3
TPR(Sensitivity, recall, hit rate, or true positive rate) 1.0 0.33333 0.5
>>> cm.matrix()
Predict 0 1 2
Actual
0 3 0 0
1 0 1 2
2 2 1 3
>>> cm.normalized_matrix()
Predict 0 1 2
Actual
0 1.0 0.0 0.0
1 0.0 0.33333 0.66667
2 0.33333 0.16667 0.5
Link : PyCM
Thanks to seralouk, I've found:
from sklearn import metrics
# argmax(axis=1) turns one-hot / probability rows into class indices before scoring
metrics.accuracy_score(real_y.argmax(axis=1), predicted_y.argmax(axis=1))

how to compute the classification report for sentiment analysis with scikit-learn

How can I get the classification report measures (precision, recall, accuracy, and support) for 3-class classification, where the classes are "positive", "negative", and "neutral"? Below is the code:
vec_clf = Pipeline([('vectorizer', vec), ('pac', svm_clf)])
print vec_clf.fit(X_train.values.astype('U'),y_train.values.astype('U'))
y_pred = vec_clf.predict(X_test.values.astype('U'))
print "SVM Accuracy-",metrics.accuracy_score(y_test, y_pred)
print "confuson metrics :\n", metrics.confusion_matrix(y_test, y_pred, labels=["positive","negative","neutral"])
print(metrics.classification_report(y_test, y_pred))
It gives the following output and error:
SVM Accuracy- 0.850318471338
confuson metrics :
[[206 9 67]
[ 4 373 122]
[ 9 21 756]]
Traceback (most recent call last):
File "<ipython-input-62-e6ab3066790e>", line 1, in <module>
runfile('C:/Users/HP/abc16.py', wdir='C:/Users/HP')
File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/HP/abc16.py", line 133, in <module>
print(metrics.classification_report(y_test, y_pred))
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 1391, in classification_report
labels = unique_labels(y_true, y_pred)
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\utils\multiclass.py", line 104, in unique_labels
raise ValueError("Mix of label input types (string and number)")
ValueError: Mix of label input types (string and number)
Please guide me on where I am going wrong.
EDIT 1: this is how y_true and y_pred look:
print "y_true :" ,y_test
print "y_pred :",y_pred
y_true : 5985 neutral
899 positive
2403 neutral
3963 neutral
3457 neutral
5345 neutral
3779 neutral
299 neutral
5712 neutral
5511 neutral
234 neutral
1684 negative
3701 negative
2886 neutral
.
.
.
2623 positive
3549 neutral
4574 neutral
4972 positive
Name: sentiment, Length: 1570, dtype: object
y_pred : [u'neutral' u'positive' u'neutral' ..., u'neutral' u'neutral' u'negative']
EDIT 2: output for type(y_true) and type(y_pred)
type(y_true): <class 'pandas.core.series.Series'>
type(y_pred): <type 'numpy.ndarray'>
Cannot reproduce your error:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# toy data, similar to yours:
data = {'id':[5985,899,2403, 1684], 'sentiment':['neutral', 'positive', 'neutral', 'negative']}
y_true = pd.Series(data['sentiment'], index=data['id'], name='sentiment')
y_true
# 5985 neutral
# 899 positive
# 2403 neutral
# 1684 negative
# Name: sentiment, dtype: object
type(y_true)
# pandas.core.series.Series
y_pred = np.array(['neutral', 'positive', 'negative', 'neutral'])
# all metrics working fine:
accuracy_score(y_true, y_pred)
# 0.5
confusion_matrix(y_true, y_pred)
# array([[0, 1, 0],
# [1, 1, 0],
# [0, 0, 1]], dtype=int64)
classification_report(y_true, y_pred)
# result:
precision recall f1-score support
negative 0.00 0.00 0.00 1
neutral 0.50 0.50 0.50 2
positive 1.00 1.00 1.00 1
total 0.50 0.50 0.50 4
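Since the traceback complains about "Mix of label input types (string and number)", one thing worth checking on your own data (a hedged diagnostic sketch, not part of the answer above) is whether y_test or y_pred contains non-string entries such as NaN floats mixed in with the sentiment strings:
# hypothetical check: list the distinct Python types present in each label vector
print(set(type(v) for v in y_test))
print(set(type(v) for v in y_pred))
# if numbers (e.g. NaN) appear alongside strings, casting both to str before
# calling classification_report is one way to make the label types consistent
y_test_str = y_test.astype(str)
y_pred_str = y_pred.astype(str)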

Is F1 micro the same as Accuracy?

I have tried many examples with F1 micro and accuracy in scikit-learn, and in all of them I see that F1 micro is the same as accuracy. Is this always true?
Script
from sklearn import svm
from sklearn import metrics
from sklearn.cross_validation import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, accuracy_score
# prepare dataset
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# svm classification
clf = svm.SVC(kernel='rbf', gamma=0.7, C = 1.0).fit(X_train, y_train)
y_predicted = clf.predict(X_test)
# performance
print "Classification report for %s" % clf
print metrics.classification_report(y_test, y_predicted)
print("F1 micro: %1.4f\n" % f1_score(y_test, y_predicted, average='micro'))
print("F1 macro: %1.4f\n" % f1_score(y_test, y_predicted, average='macro'))
print("F1 weighted: %1.4f\n" % f1_score(y_test, y_predicted, average='weighted'))
print("Accuracy: %1.4f" % (accuracy_score(y_test, y_predicted)))
Output
Classification report for SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma=0.7, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
precision recall f1-score support
0 1.00 0.90 0.95 10
1 0.50 0.88 0.64 8
2 0.86 0.50 0.63 12
avg / total 0.81 0.73 0.74 30
F1 micro: 0.7333
F1 macro: 0.7384
F1 weighted: 0.7381
Accuracy: 0.7333
F1 micro = Accuracy
In classification tasks where every test case is guaranteed to be assigned to exactly one class, micro-F1 is equivalent to accuracy. This is not the case in multi-label classification.
The reason is that in multiclass (single-label) classification every test sample belongs to exactly one class, so every false negative for one class is simultaneously a false positive for another class, and the micro-averaged totals coincide with plain accuracy.
Formula-wise, the F1 score is 2 * precision * recall / (precision + recall).
Micro-averaged precision, recall, F1, and accuracy are all equal for cases in which every instance must be classified into one (and only one) class. A simple way to see this is by looking at the formulas precision = TP/(TP+FP) and recall = TP/(TP+FN). The numerators are the same, and every FN for one class is another class's FP, which makes the denominators the same as well. If precision = recall, then F1 will also be equal to them.
For any single-label inputs you should be able to show that:
from sklearn.metrics import accuracy_score as acc
from sklearn.metrics import f1_score as f1
f1(y_true, y_pred, average='micro') == acc(y_true, y_pred)
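A quick runnable check of that claim (a sketch with made-up single-label multiclass data, not from the original answer):
from sklearn.metrics import accuracy_score, f1_score
y_true = [0, 1, 2, 2, 1, 0, 2, 1]   # arbitrary single-label multiclass labels
y_pred = [0, 2, 2, 2, 0, 0, 1, 1]
print(accuracy_score(y_true, y_pred))              # 0.625
print(f1_score(y_true, y_pred, average='micro'))   # 0.625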
I had the same issue, so I investigated and came up with this:
Just thinking about the theory, it is impossible for accuracy and the (binary) F1-score to be the very same for every single dataset. The reason for this is that the F1-score is independent of the true negatives, while accuracy is not.
By taking a dataset where f1 = acc and adding true negatives to it, you get f1 != acc.
>>> from sklearn.metrics import accuracy_score as acc
>>> from sklearn.metrics import f1_score as f1
>>> y_pred = [0, 1, 1, 0, 1, 0]
>>> y_true = [0, 1, 1, 0, 0, 1]
>>> acc(y_true, y_pred)
0.6666666666666666
>>> f1(y_true,y_pred)
0.6666666666666666
>>> y_true = [0, 1, 1, 0, 1, 0, 0, 0, 0]
>>> y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 0]
>>> acc(y_true, y_pred)
0.7777777777777778
>>> f1(y_true,y_pred)
0.6666666666666666
