I have the actual classes and the predicted (result) classes here: https://extendsclass.com/csv-editor.html#46eaa9e
I want to calculate the sensitivity, specificity, and positive predictive value for each of the classes A, N, and O. Here is my code:
from sklearn.metrics import multilabel_confusion_matrix, classification_report
import numpy as np
mcm = multilabel_confusion_matrix(act_class, pred_class)
tps = mcm[:, 1, 1]
tns = mcm[:, 0, 0]
recall = tps / (tps + mcm[:, 1, 0]) # Sensitivity
specificity = tns / (tns + mcm[:, 0, 1]) # Specificity
precision = tps / (tps + mcm[:, 0, 1]) # PPV
print(recall)
print(specificity)
print(precision)
print(classification_report(act_class, pred_class))
This gives me the following results:
[0.31818182 0.96186441 nan nan]
[0.99576271 0.86363636 0.86092715 0.99337748]
[0.95454545 0.96186441 0. 0. ]
              precision    recall  f1-score   support
           A       0.95      0.32      0.48        66
           N       0.96      0.96      0.96       236
           O       0.00      0.00      0.00         0
           ~       0.00      0.00      0.00         0
    accuracy                           0.82       302
   macro avg       0.48      0.32      0.36       302
weighted avg       0.96      0.82      0.86       302
The problem is that I cannot clearly tell which values are the sensitivity, specificity, and positive predictive value for each of the classes A, N, and O.
This might be quicker to explain visually:
By default, the labels appear in sorted order (for your problem: A, N, O, ~), so the four values in each of your printed arrays belong to A, N, O, and ~, in that order.
If you want a different order, specify one with the labels= parameter. The following example has two classes and orders them as [3, 2]:
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.metrics import classification_report
y_true = [2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3]
y_pred = [2, 2, 2, 3, 3, 2, 2, 2, 3, 3, 3, 3]
mcm = multilabel_confusion_matrix(y_true, y_pred, labels=[3, 2])
tps = mcm[:, 1, 1]
precision = tps / (tps + mcm[:, 0, 1])
print(precision)
print(f"Precision class 3: {precision[0]}. Precision class 2: {precision[1]}")
print(classification_report(y_true, y_pred, labels=[3, 2]))
Output:
[0.66666667 0.5 ]
Precision class 3: 0.6666666666666666. Precision class 2: 0.5
              precision    recall  f1-score   support
           3       0.67      0.57      0.62         7
           2       0.50      0.60      0.55         5
    accuracy                           0.58        12
   macro avg       0.58      0.59      0.58        12
weighted avg       0.60      0.58      0.59        12
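Applied to your four labels, here is a minimal sketch that pairs each metric with its class name. The labels below are hypothetical toy data standing in for your act_class / pred_class from the CSV. The nan values show up when a class never occurs in the actual labels: its sensitivity is then 0/0. np.errstate only silences the resulting warnings.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Hypothetical toy labels standing in for act_class / pred_class from the CSV
act_class = ["A", "N", "N", "A", "N", "A"]
pred_class = ["A", "N", "O", "N", "N", "A"]

labels = ["A", "N", "O", "~"]  # rows of mcm follow exactly this order
mcm = multilabel_confusion_matrix(act_class, pred_class, labels=labels)
tn, fp = mcm[:, 0, 0], mcm[:, 0, 1]
fn, tp = mcm[:, 1, 0], mcm[:, 1, 1]

# Silence divide-by-zero warnings; 0/0 becomes nan for classes that never occur
with np.errstate(divide="ignore", invalid="ignore"):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)

for name, se, sp, pp in zip(labels, sensitivity, specificity, ppv):
    print(f"{name}: sensitivity={se:.3f} specificity={sp:.3f} PPV={pp:.3f}")
```

Passing labels= explicitly means you never have to guess which array position belongs to which class.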
Edit: please leave comments, as I'm still learning how to write good questions.
I'm trying to train a model on this dataset with IsolationForest(). I then need to apply it to another dataset with altered qualities, predict the quality values, and fetch all wines with quality 8 and 9.
However, I'm having a problem: the classification report shows an accuracy of 0.0:
print(classification_report(y_test, prediction))
precision recall f1-score support
-1 0.00 0.00 0.00 0.0
1 0.00 0.00 0.00 0.0
3 0.00 0.00 0.00 866.0
4 0.00 0.00 0.00 829.0
5 0.00 0.00 0.00 841.0
6 0.00 0.00 0.00 861.0
7 0.00 0.00 0.00 822.0
8 0.00 0.00 0.00 886.0
9 0.00 0.00 0.00 851.0
accuracy 0.00 5956.0
macro avg 0.00 0.00 0.00 5956.0
weighted avg 0.00 0.00 0.00 5956.0
I don't know whether it's a hyperparameter issue, or whether I'm cleaning the wrong data or passing the wrong parameters. I have already tried both with and without SMOTE. I want to reach an accuracy of at least 90%.
I'm leaving the shared-drive link public so the dataset can be checked:
https://drive.google.com/drive/folders/18_sOSIZZw9DCW7ftEKuOG4aIzGXoasFe?usp=sharing
Here's my code:
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import IsolationForest
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from sklearn.metrics import classification_report,confusion_matrix
df = pd.read_csv('wines.csv')
df.head(5)
ordinalEncoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-99).fit(df[['color']])
df[['color']] = ordinalEncoder.transform(df[['color']])
df.info()
df['color'] = df['color'].astype(int)
df.head(3)
stm = SMOTE(k_neighbors=4)
x_smote = df.drop('quality',axis=1)
y_smote = df['quality']
x_smote,y_smote = stm.fit_resample(x_smote,y_smote)
print(x_smote.shape,y_smote.shape)
x_smote.columns
scaler = StandardScaler()
X = scaler.fit_transform(x_smote)
y = y_smote
X.shape, y.shape
x_train, x_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
from sklearn.metrics import hamming_loss
iforest = IsolationForest(n_estimators=200, max_samples=0.1, contamination=0.10, max_features=1.0, bootstrap=False, n_jobs=-1,
                          random_state=None, verbose=0, warm_start=False)
iforest_fit = iforest.fit(x_train,y_train)
prediction = iforest_fit.predict(x_test)
print(prediction.shape, y_test.shape)
y.value_counts()
prediction
print(confusion_matrix(y_test, prediction))
hamming_loss(y_test, prediction)
print(classification_report(y_test, prediction))
May I ask why you chose Isolation Forest as your model? This article says that Isolation Forest is an unsupervised learning algorithm for anomaly detection: its predict() returns 1 for inliers and -1 for outliers, never a quality value.
When I print some samples of the Isolation Forest predictions and of the ground truth, it becomes clear why the accuracy score is 0.0:
print(list(prediction[0:15]))
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(list(y_test[0:15]))
[9, 4, 4, 7, 9, 3, 6, 7, 4, 8, 8, 7, 3, 8, 5]
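To make this concrete, here is a small sketch on hypothetical random data (not the wine dataset). Whatever labels you pass to fit(), IsolationForest ignores them, and predict() can only return -1 or 1. It can therefore never match a quality label from 3 to 9.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))      # hypothetical features
y = rng.integers(3, 10, size=200)  # hypothetical quality labels 3..9

# fit() accepts y only for API consistency and ignores it (unsupervised)
iforest = IsolationForest(contamination=0.10, random_state=0).fit(X, y)
pred = iforest.predict(X)

print(np.unique(pred))     # values come only from {-1, 1}
print((pred == y).mean())  # no prediction can equal a quality label
```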
The wines.csv dataset and your code both point to a multi-class classification problem, so I chose RandomForestClassifier() to continue with the second part of your code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import hamming_loss
model = RandomForestClassifier()
model.fit(x_train,y_train)
prediction = model.predict(x_test)
print(list(prediction[0:15]))  # see 15 samples of prediction
[3, 9, 5, 5, 7, 9, 7, 6, 9, 8, 5, 9, 8, 3, 3]
print(list(y_test[0:15]))  # see 15 samples of actual truth
[3, 9, 5, 6, 6, 9, 7, 5, 9, 8, 5, 9, 8, 3, 3]
print(confusion_matrix(y_test, prediction))
[[842 0 0 0 0 0 0]
[ 2 815 17 8 1 1 0]
[ 8 50 690 130 26 2 0]
[ 2 28 152 531 128 16 0]
[ 4 1 15 66 716 32 3]
[ 0 1 0 4 12 833 0]
[ 0 0 0 0 0 0 820]]
print('hamming_loss =', hamming_loss(y_test, prediction))
hamming_loss = 0.11903962390866353
print(classification_report(y_test, prediction))
              precision    recall  f1-score   support
           3       0.98      1.00      0.99       842
           4       0.91      0.97      0.94       844
           5       0.79      0.76      0.78       906
           6       0.72      0.62      0.67       857
           7       0.81      0.86      0.83       837
           8       0.94      0.98      0.96       850
           9       1.00      1.00      1.00       820
    accuracy                           0.88      5956
   macro avg       0.88      0.88      0.88      5956
weighted avg       0.88      0.88      0.88      5956
The accuracy is already 0.88 even before tuning hyperparameters.
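For the second half of your goal (fetching all wines with predicted quality 8 or 9), the sketch below is one way to filter a new dataset by predicted class. It uses hypothetical random features and column names. With your real data, the new dataset must first go through the same ordinal encoding and scaler.transform (not fit_transform) as the training data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
cols = ["f1", "f2", "f3"]  # hypothetical feature names

# Hypothetical training data with quality labels 3..9
X_train = pd.DataFrame(rng.normal(size=(300, 3)), columns=cols)
y_train = rng.integers(3, 10, size=300)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Hypothetical "new" dataset, already preprocessed like the training data
new_wines = pd.DataFrame(rng.normal(size=(50, 3)), columns=cols)
new_wines["predicted_quality"] = model.predict(new_wines[cols])

# Keep only the rows predicted as quality 8 or 9
top_wines = new_wines[new_wines["predicted_quality"].isin([8, 9])]
print(top_wines)
```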
I have the following generator and discriminator for a DCGAN with 128x128 images, and they work very well.
However, I would like to use the same code to generate 256x256 images, but I can't figure out how to build the generator and discriminator for that size.
import torch
import torch.nn as nn
import torchvision.utils as vutils
# Path to the training directory
dataroot = "./dataset 128x128"
# Number of workers for dataloader
workers = 6
# Batch size during training
batch_size = 1
# Spatial size of training images. All images will be resized to this
# size using a transformer.
image_size = 128
# Number of channels in the training images. For color images this is 3
nc = 3
# Size of z latent vector (i.e. size of generator input)
nz = 100
# Size of feature maps in generator
ngf = 32
# Size of feature maps in discriminator
ndf = 32
# Number of training epochs
num_epochs = 20
# Learning rate for optimizers
lr = 0.0002
# Beta1 hyperparam for Adam optimizers
beta1 = 0.5
# Number of GPUs available. Use 0 for CPU mode.
ngpu = 2
print("Dataset done")
# Generator Code
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 16, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 16),
            nn.ReLU(True),
            # state size. (ngf*16) x 4 x 4
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 64 x 64
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 128 x 128
        )
    def forward(self, input):
        return self.main(input)
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 128 x 128
            nn.Conv2d(nc, ndf, 4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 64 x 64
            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 32 x 32
            nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 16 x 16
            nn.Conv2d(ndf * 4, ndf * 8, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 8 x 8
            nn.Conv2d(ndf * 8, ndf * 16, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 16),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*16) x 4 x 4
            nn.Conv2d(ndf * 16, 1, 4, stride=1, padding=0, bias=False),
            nn.Sigmoid()
            # state size. 1
        )
    def forward(self, input):
        return self.main(input)
# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0
print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()
        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch, accumulated (summed) with previous gradients
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Compute error of D as sum over the fake and the real batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()
        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()
        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())
        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
        iters += 1
How can I modify this generator and discriminator to work with 256x256 images?
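In the 128x128 networks above, every stride-2 layer doubles (generator) or halves (discriminator) the spatial size. One way to reach 256x256, sketched below, is to add one more such stage on each side and start or end at ngf*32 / ndf*32 channels instead of *16. This is an assumption about the design rather than a tested recipe. The class names Generator256 and Discriminator256 are hypothetical, and you would also set image_size = 256 so the dataset transforms produce 256x256 images.

```python
import torch
import torch.nn as nn

nz, ngf, ndf, nc = 100, 32, 32, 3  # same values as in the 128x128 setup


class Generator256(nn.Module):
    def __init__(self, ngpu=1):
        super().__init__()
        self.ngpu = ngpu
        # Z (nz x 1 x 1) -> (ngf*32) x 4 x 4
        layers = [nn.ConvTranspose2d(nz, ngf * 32, 4, 1, 0, bias=False),
                  nn.BatchNorm2d(ngf * 32), nn.ReLU(True)]
        ch = ngf * 32
        # Five stride-2 blocks: 4 -> 8 -> 16 -> 32 -> 64 -> 128, halving channels
        for _ in range(5):
            layers += [nn.ConvTranspose2d(ch, ch // 2, 4, 2, 1, bias=False),
                       nn.BatchNorm2d(ch // 2), nn.ReLU(True)]
            ch //= 2
        # (ngf) x 128 x 128 -> (nc) x 256 x 256
        layers += [nn.ConvTranspose2d(ch, nc, 4, 2, 1, bias=False), nn.Tanh()]
        self.main = nn.Sequential(*layers)

    def forward(self, input):
        return self.main(input)


class Discriminator256(nn.Module):
    def __init__(self, ngpu=1):
        super().__init__()
        self.ngpu = ngpu
        # (nc) x 256 x 256 -> (ndf) x 128 x 128, no BatchNorm on the first layer
        layers = [nn.Conv2d(nc, ndf, 4, stride=2, padding=1, bias=False),
                  nn.LeakyReLU(0.2, inplace=True)]
        ch = ndf
        # Five stride-2 blocks: 128 -> 64 -> 32 -> 16 -> 8 -> 4, doubling channels
        for _ in range(5):
            layers += [nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1, bias=False),
                       nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, inplace=True)]
            ch *= 2
        # (ndf*32) x 4 x 4 -> 1 x 1 x 1
        layers += [nn.Conv2d(ch, 1, 4, stride=1, padding=0, bias=False), nn.Sigmoid()]
        self.main = nn.Sequential(*layers)

    def forward(self, input):
        return self.main(input)


netG, netD = Generator256(), Discriminator256()
fake = netG(torch.randn(2, nz, 1, 1))
print(fake.shape, netD(fake).shape)
```

The training loop can stay the same because the discriminator still outputs one value per image.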
I have predicted_y and real_y.
Is there a faster way to get accuracy than:
import keras
from keras import backend as K
accuracy_array = K.eval(keras.metrics.categorical_accuracy(real_y, predicted_y))
print(sum(accuracy_array)/len(accuracy_array))
I would suggest using scikit-learn for this, as I mentioned in my comment.
Example 1:
from sklearn import metrics
results = metrics.accuracy_score(real_y, predicted_y)
You can also get a classification report that includes precision, recall, and F1 scores.
Example 2:
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
             precision    recall  f1-score   support
    class 0       0.50      1.00      0.67         1
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.67      0.80         3
avg / total       0.70      0.60      0.61         5
Finally, for the confusion matrix use this:
Example 3:
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
confusion_matrix(y_true, y_pred)
array([[1, 0, 0],
[1, 0, 0],
[0, 1, 2]])
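The confusion matrix also contains the per-class numbers from Example 2. Rows are actual classes and columns are predicted classes. The sketch below divides the diagonal by the column sums to get precision, and by the row sums to get recall. Note that a class that is never predicted would give 0/0 (nan) for its precision.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted

tp = np.diag(cm)                  # correct predictions per class
precision = tp / cm.sum(axis=0)   # divide by predicted counts (column sums)
recall = tp / cm.sum(axis=1)      # divide by actual counts (row sums)
print(precision, recall)
```

These match the precision and recall columns of the report in Example 2.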
Try accuracy_score from scikit-learn.
import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)  # fraction of correct predictions: 0.5
accuracy_score(y_true, y_pred, normalize=False)  # number of correct predictions: 2
I wrote a Python library for confusion matrix analysis; you can use it for this.
>>> from pycm import *
>>> y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2] # or y_actu = numpy.array([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
>>> y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2] # or y_pred = numpy.array([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])
>>> cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred) # Create CM From Data
>>> cm.classes
[0, 1, 2]
>>> cm.table
{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
>>> print(cm)
Predict 0 1 2
Actual
0 3 0 0
1 0 1 2
2 2 1 3
Overall Statistics :
95% CI (0.30439,0.86228)
Bennett_S 0.375
Chi-Squared 6.6
Chi-Squared DF 4
Conditional Entropy 0.95915
Cramer_V 0.5244
Cross Entropy 1.59352
Gwet_AC1 0.38931
Joint Entropy 2.45915
KL Divergence 0.09352
Kappa 0.35484
Kappa 95% CI (-0.07708,0.78675)
Kappa No Prevalence 0.16667
Kappa Standard Error 0.22036
Kappa Unbiased 0.34426
Lambda A 0.16667
Lambda B 0.42857
Mutual Information 0.52421
Overall_ACC 0.58333
Overall_RACC 0.35417
Overall_RACCU 0.36458
PPV_Macro 0.56667
PPV_Micro 0.58333
Phi-Squared 0.55
Reference Entropy 1.5
Response Entropy 1.48336
Scott_PI 0.34426
Standard Error 0.14232
Strength_Of_Agreement(Altman) Fair
Strength_Of_Agreement(Cicchetti) Poor
Strength_Of_Agreement(Fleiss) Poor
Strength_Of_Agreement(Landis and Koch) Fair
TPR_Macro 0.61111
TPR_Micro 0.58333
Class Statistics :
Classes 0 1 2
ACC(Accuracy) 0.83333 0.75 0.58333
BM(Informedness or bookmaker informedness) 0.77778 0.22222 0.16667
DOR(Diagnostic odds ratio) None 4.0 2.0
ERR(Error rate) 0.16667 0.25 0.41667
F0.5(F0.5 score) 0.65217 0.45455 0.57692
F1(F1 score - harmonic mean of precision and sensitivity) 0.75 0.4 0.54545
F2(F2 score) 0.88235 0.35714 0.51724
FDR(False discovery rate) 0.4 0.5 0.4
FN(False negative/miss/type 2 error) 0 2 3
FNR(Miss rate or false negative rate) 0.0 0.66667 0.5
FOR(False omission rate) 0.0 0.2 0.42857
FP(False positive/type 1 error/false alarm) 2 1 2
FPR(Fall-out or false positive rate) 0.22222 0.11111 0.33333
G(G-measure geometric mean of precision and sensitivity) 0.7746 0.40825 0.54772
LR+(Positive likelihood ratio) 4.5 3.0 1.5
LR-(Negative likelihood ratio) 0.0 0.75 0.75
MCC(Matthews correlation coefficient) 0.68313 0.2582 0.16903
MK(Markedness) 0.6 0.3 0.17143
N(Condition negative) 9 9 6
NPV(Negative predictive value) 1.0 0.8 0.57143
P(Condition positive) 3 3 6
POP(Population) 12 12 12
PPV(Precision or positive predictive value) 0.6 0.5 0.6
PRE(Prevalence) 0.25 0.25 0.5
RACC(Random accuracy) 0.10417 0.04167 0.20833
RACCU(Random accuracy unbiased) 0.11111 0.0434 0.21007
TN(True negative/correct rejection) 7 8 4
TNR(Specificity or true negative rate) 0.77778 0.88889 0.66667
TON(Test outcome negative) 7 10 7
TOP(Test outcome positive) 5 2 5
TP(True positive/hit) 3 1 3
TPR(Sensitivity, recall, hit rate, or true positive rate) 1.0 0.33333 0.5
>>> cm.matrix()
Predict 0 1 2
Actual
0 3 0 0
1 0 1 2
2 2 1 3
>>> cm.normalized_matrix()
Predict 0 1 2
Actual
0 1.0 0.0 0.0
1 0.0 0.33333 0.66667
2 0.33333 0.16667 0.5
Link: PyCM
Thanks to seralouk, I've found:
from sklearn import metrics
metrics.accuracy_score(real_y.argmax(axis=1), predicted_y.argmax(axis=1))
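If real_y is one-hot and predicted_y holds per-class scores (an assumption based on the argmax above), the same number can be computed with plain NumPy and no extra library call. The sketch below uses hypothetical toy arrays.

```python
import numpy as np

# Hypothetical one-hot ground truth and predicted scores for 3 classes
real_y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
predicted_y = np.array([[0.8, 0.1, 0.1],
                        [0.3, 0.6, 0.1],
                        [0.5, 0.2, 0.3],
                        [0.2, 0.7, 0.1]])

# Fraction of rows whose highest-scoring class equals the true class
accuracy = np.mean(real_y.argmax(axis=1) == predicted_y.argmax(axis=1))
print(accuracy)
```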
How can I get the classification report measures (precision, recall, accuracy, and support) for a 3-class classification problem where the classes are "positive", "negative", and "neutral"? Below is the code:
vec_clf = Pipeline([('vectorizer', vec), ('pac', svm_clf)])
print vec_clf.fit(X_train.values.astype('U'),y_train.values.astype('U'))
y_pred = vec_clf.predict(X_test.values.astype('U'))
print "SVM Accuracy-",metrics.accuracy_score(y_test, y_pred)
print "confuson metrics :\n", metrics.confusion_matrix(y_test, y_pred, labels=["positive","negative","neutral"])
print(metrics.classification_report(y_test, y_pred))
and it gives the following error:
SVM Accuracy- 0.850318471338
confuson metrics :
[[206 9 67]
[ 4 373 122]
[ 9 21 756]]
Traceback (most recent call last):
File "<ipython-input-62-e6ab3066790e>", line 1, in <module>
runfile('C:/Users/HP/abc16.py', wdir='C:/Users/HP')
File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/HP/abc16.py", line 133, in <module>
print(metrics.classification_report(y_test, y_pred))
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 1391, in classification_report
labels = unique_labels(y_true, y_pred)
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\utils\multiclass.py", line 104, in unique_labels
raise ValueError("Mix of label input types (string and number)")
ValueError: Mix of label input types (string and number)
Please help me figure out where I am going wrong.
EDIT 1: this is what y_true and y_pred look like:
print "y_true :" ,y_test
print "y_pred :",y_pred
y_true : 5985 neutral
899 positive
2403 neutral
3963 neutral
3457 neutral
5345 neutral
3779 neutral
299 neutral
5712 neutral
5511 neutral
234 neutral
1684 negative
3701 negative
2886 neutral
.
.
.
2623 positive
3549 neutral
4574 neutral
4972 positive
Name: sentiment, Length: 1570, dtype: object
y_pred : [u'neutral' u'positive' u'neutral' ..., u'neutral' u'neutral' u'negative']
EDIT 2: output of type(y_true) and type(y_pred):
type(y_true): <class 'pandas.core.series.Series'>
type(y_pred): <type 'numpy.ndarray'>
I cannot reproduce your error:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# toy data, similar to yours:
data = {'id':[5985,899,2403, 1684], 'sentiment':['neutral', 'positive', 'neutral', 'negative']}
y_true = pd.Series(data['sentiment'], index=data['id'], name='sentiment')
y_true
# 5985 neutral
# 899 positive
# 2403 neutral
# 1684 negative
# Name: sentiment, dtype: object
type(y_true)
# pandas.core.series.Series
y_pred = np.array(['neutral', 'positive', 'negative', 'neutral'])
# all metrics working fine:
accuracy_score(y_true, y_pred)
# 0.5
confusion_matrix(y_true, y_pred)
# array([[0, 1, 0],
# [1, 1, 0],
# [0, 0, 1]], dtype=int64)
classification_report(y_true, y_pred)
# result:
             precision    recall  f1-score   support
   negative       0.00      0.00      0.00         1
    neutral       0.50      0.50      0.50         2
   positive       1.00      1.00      1.00         1
      total       0.50      0.50      0.50         4
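The error message means that some labels are strings and others are not. The question doesn't show which non-string value is in your y_test. One common culprit, assumed here, is a missing value, which pandas stores as a float NaN. Your model was fitted on .astype('U') labels, but y_test is passed in unconverted. The sketch below uses hypothetical toy data and a hypothetical helper, label_types. The helper counts label types the same way scikit-learn's check does (str versus anything else). Missing ground-truth rows are then dropped before scoring.

```python
from collections import Counter

import numpy as np
import pandas as pd
from sklearn.metrics import classification_report


def label_types(labels):
    """Count label types, treating any str subclass (e.g. numpy.str_) as 'str'."""
    return Counter("str" if isinstance(v, str) else type(v).__name__ for v in labels)


# Hypothetical y_test with one missing value, stored by pandas as float NaN
y_test = pd.Series(["neutral", "positive", np.nan, "negative"], name="sentiment")
y_pred = np.array(["neutral", "positive", "neutral", "negative"])

print(label_types(y_test))  # reveals a float hiding among the strings
print(label_types(y_pred))

# Evaluate only the rows that have a ground-truth label
mask = y_test.notna().to_numpy()
y_true, y_pred_kept = y_test[mask], y_pred[mask]
print(classification_report(y_true, y_pred_kept))
```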
I have tried many examples with F1 micro and Accuracy in scikit-learn and in all of them, I see that F1 micro is the same as Accuracy. Is this always true?
Script
from sklearn import svm
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, accuracy_score
# prepare dataset
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# svm classification
clf = svm.SVC(kernel='rbf', gamma=0.7, C = 1.0).fit(X_train, y_train)
y_predicted = clf.predict(X_test)
# performance
print("Classification report for %s" % clf)
print(metrics.classification_report(y_test, y_predicted))
print("F1 micro: %1.4f\n" % f1_score(y_test, y_predicted, average='micro'))
print("F1 macro: %1.4f\n" % f1_score(y_test, y_predicted, average='macro'))
print("F1 weighted: %1.4f\n" % f1_score(y_test, y_predicted, average='weighted'))
print("Accuracy: %1.4f" % (accuracy_score(y_test, y_predicted)))
Output
Classification report for SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma=0.7, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
             precision    recall  f1-score   support
          0       1.00      0.90      0.95        10
          1       0.50      0.88      0.64         8
          2       0.86      0.50      0.63        12
avg / total       0.81      0.73      0.74        30
F1 micro: 0.7333
F1 macro: 0.7384
F1 weighted: 0.7381
Accuracy: 0.7333
F1 micro = Accuracy
In classification tasks where every test case is guaranteed to be assigned to exactly one class, micro-F1 is equivalent to accuracy. This is not necessarily the case in multi-label classification.
This is because we are dealing with multi-class classification, where every test sample belongs to exactly one class (not multi-label). In that case the true negatives need no separate treatment: a sample correctly classified as class k is a true positive for k (and a true negative for every other class), so the pooled true positives are exactly the correct predictions.
Formula-wise, the F1 score is 2 * precision * recall / (precision + recall).
Micro-averaged precision, recall, F1, and accuracy are all equal whenever every instance must be classified into one (and only one) class. A simple way to see this is to look at the formulas precision = TP/(TP+FP) and recall = TP/(TP+FN), with the counts pooled over all classes. The numerators are the same, and every FN for one class is another class's FP, which makes the denominators the same as well. Since the pooled TP is the number of correct predictions and TP + FP is the number of samples, this common value is the accuracy. If precision = recall, then F1 equals both.
For any such inputs, you should be able to show that:
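import numpy as np
from sklearn.metrics import accuracy_score as acc
from sklearn.metrics import f1_score as f1
# y_true, y_pred: any single-label (multi-class) label sequences
assert np.isclose(f1(y_true, y_pred, average='micro'), acc(y_true, y_pred))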
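The pooling argument can be checked directly on a confusion matrix. In the sketch below (hypothetical toy labels), the pooled FP count (off-diagonal column sums) and the pooled FN count (off-diagonal row sums) are both the total number of mistakes. That makes micro precision and micro recall equal to accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 1, 1, 0, 2, 0]
cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted

diag = np.diag(cm)
tp = diag.sum()                      # pooled TP = number of correct predictions
fp = (cm.sum(axis=0) - diag).sum()   # each mistake is an FP for the predicted class
fn = (cm.sum(axis=1) - diag).sum()   # ...and an FN for the true class

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
print(micro_precision, micro_recall, accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred, average="micro"))
```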
I had the same question, so I investigated and came up with this:
Just thinking about the theory, accuracy and the F1 score cannot be identical for every dataset. The reason is that the (binary) F1 score is independent of the true negatives, while accuracy is not.
By taking a dataset where f1 = acc and adding true negatives to it, you get f1 != acc.
>>> from sklearn.metrics import accuracy_score as acc
>>> from sklearn.metrics import f1_score as f1
>>> y_pred = [0, 1, 1, 0, 1, 0]
>>> y_true = [0, 1, 1, 0, 0, 1]
>>> acc(y_true, y_pred)
0.6666666666666666
>>> f1(y_true,y_pred)
0.6666666666666666
>>> y_true = [0, 1, 1, 0, 1, 0, 0, 0, 0]
>>> y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 0]
>>> acc(y_true, y_pred)
0.7777777777777778
>>> f1(y_true,y_pred)
0.6666666666666666
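The two answers are consistent once you look at the average= setting. f1_score defaults to average='binary', which scores only the positive class 1 and ignores true negatives. With average='micro', the correct predictions of class 0 count as true positives for class 0, and the score equals accuracy again. A quick check on the second example above:

```python
from sklearn.metrics import accuracy_score as acc
from sklearn.metrics import f1_score as f1

y_true = [0, 1, 1, 0, 1, 0, 0, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 0]

print(f1(y_true, y_pred))                   # binary F1 for class 1 (TNs ignored)
print(f1(y_true, y_pred, average="micro"))  # micro F1 pooled over both classes
print(acc(y_true, y_pred))
```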