Problem:
I am building a model that will predict housing prices, so I first decided to build a linear regression model in TensorFlow. But when I start training, I see that my accuracy is always 1.
I am new to machine learning. Can someone please tell me what's going wrong? I can't figure it out; I searched Google but didn't find any answer that solves my problem.
Here's my code:
df_train = df_train.loc[:, ['OverallQual', 'GrLivArea', 'GarageArea', 'SalePrice']]
df_X = df_train.loc[:, ['OverallQual', 'GrLivArea', 'GarageArea']]
df_Y = df_train.loc[:, ['SalePrice']]
df_yy = get_dummies(df_Y)
print("Shape of df_X: ", df_X.shape)
X_train, X_test, y_train, y_test = train_test_split(df_X, df_yy, test_size=0.15)
X_train = np.asarray(X_train).astype(np.float32)
X_test = np.asarray(X_test).astype(np.float32)
y_train = np.asarray(y_train).astype(np.float32)
y_test = np.asarray(y_test).astype(np.float32)
X = tf.placeholder(tf.float32, [None, num_of_features])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([num_of_features, 1]))
b = tf.Variable(tf.zeros([1]))
prediction = tf.add(tf.matmul(X, W), b)
num_epochs = 20000
# calculating loss
cost = tf.reduce_mean(tf.losses.softmax_cross_entropy(onehot_labels=y, logits=prediction))
optimizer = tf.train.GradientDescentOptimizer(0.00001).minimize(cost)
correct_prediction = tf.equal(tf.argmax(prediction, axis=1), tf.argmax(y, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        if epoch % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={X: X_train, y: y_train})
            print('step %d, training accuracy %g' % (epoch, train_accuracy))
        optimizer.run(feed_dict={X: X_train, y: y_train})
    print('test accuracy %g' % accuracy.eval(feed_dict={
        X: X_test, y: y_test}))
Output is:
step 0, training accuracy 1
step 100, training accuracy 1
step 200, training accuracy 1
step 300, training accuracy 1
step 400, training accuracy 1
step 500, training accuracy 1
step 600, training accuracy 1
step 700, training accuracy 1
............................
............................
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 1
EDIT:
I changed my cost function to this:
cost = tf.reduce_sum(tf.pow(prediction-y, 2))/(2*1241)
But still my output is always 1.
EDIT 2:
In response to lejlot's comment:
Thanks lejlot. I changed my accuracy code to this:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    merged_summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter("/tmp/hpp1")
    writer.add_graph(sess.graph)
    for epoch in range(num_epochs):
        if epoch % 5:
            s = sess.run(merged_summary, feed_dict={X: X_train, y: y_train})
            writer.add_summary(s, epoch)
        sess.run(optimizer, feed_dict={X: X_train, y: y_train})
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: X_train, y: y_train})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                  "W=", sess.run(W), "b=", sess.run(b))
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: X_train, y: y_train})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
But the output is all nan
Output:
....................................
Epoch: 19900 cost= nan W= nan b= nan
Epoch: 19950 cost= nan W= nan b= nan
Epoch: 20000 cost= nan W= nan b= nan
Optimization Finished!
Training cost= nan W= nan b= nan
You want to use linear regression, but you actually use logistic regression. Take a look at tf.losses.softmax_cross_entropy: it outputs a probability distribution, i.e. a vector of numbers that sum up to 1. In your case, the vector has size=1, hence it always outputs [1].
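A quick way to see it (toy numbers of my own, not from the question):
import numpy as np

# With a single output column, softmax always yields 1.0 ...
z = np.array([[3.2], [-1.7]])
softmax = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
print(softmax)           # [[1.], [1.]]

# ... and argmax over a length-1 axis is always 0, so the "predicted class"
# always equals the "label class" and accuracy is stuck at 1.
print(z.argmax(axis=1))  # [0 0]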
Here are two examples that will help you see the difference: linear regression and logistic regression.
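For reference, here is a minimal sketch of what the linear-regression version of your graph could look like (TF1-style; it reuses num_of_features and the float32 arrays from above, assumes y_train holds the raw SalePrice column with the get_dummies step dropped, and adds standardization, which is my addition: with raw dollar values, squared-error gradients overflow, which would also explain the nans in EDIT 2):
# Standardize inputs and target so plain gradient descent behaves.
X_mean, X_std = X_train.mean(axis=0), X_train.std(axis=0)
y_mean, y_std = y_train.mean(), y_train.std()
Xs = (X_train - X_mean) / X_std
ys = (y_train - y_mean) / y_std

X = tf.placeholder(tf.float32, [None, num_of_features])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([num_of_features, 1]))
b = tf.Variable(tf.zeros([1]))
prediction = tf.matmul(X, W) + b

cost = tf.reduce_mean(tf.square(prediction - y))  # MSE: a regression loss, not cross-entropy
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(1000):
        _, c = sess.run([train_op, cost], feed_dict={X: Xs, y: ys})
        if epoch % 100 == 0:
            print('epoch %d, cost %g' % (epoch, c))
Track the cost (or RMSE) rather than classification accuracy; accuracy is meaningless for a regression target.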
Related
I am trying to train a CNN model on the CelebA (RGB images) dataset. But when I train the model and check its accuracy, it is 0% or close to 0%. I think the issue is in the ConvNeuralNet class or the hyperparameters, but due to my limited knowledge I'm not sure what I'm missing here. Can someone please help? Thanks
# Creating a simple network
class ConvNeuralNet(torch.nn.Module):
    def __init__(self, num_classes=10178):
        super(ConvNeuralNet, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size = 2, stride = 2)
        self.fc1 = nn.Linear(13312, 128)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        out = self.conv_layer1(x)
        out = self.conv_layer2(out)
        out = self.max_pool1(out)
        out = self.conv_layer3(out)
        out = self.conv_layer4(out)
        out = self.max_pool2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        return F.log_softmax(out, dim=-1)
def trainTorch(torch_model, train_loader, test_loader,
               nb_epochs=NB_EPOCHS, batch_size=BATCH_SIZE, train_end=-1, test_end=-1, learning_rate=LEARNING_RATE, optimizer=None):
    train_loss = []
    total = 0
    correct = 0
    step = 0
    for _epoch in range(nb_epochs):
        for xs, ys in train_loader:
            xs, ys = Variable(xs), Variable(ys)
            if torch.cuda.is_available():
                xs, ys = xs.cuda(), ys.cuda()
            optimizer.zero_grad()
            preds = torch_model(xs)
            preds = F.log_softmax(preds, dim=1)
            loss = F.cross_entropy(preds, ys)
            loss.backward()
            train_loss.append(loss.data.item())
            optimizer.step()  # update gradients
            preds_np = preds.cpu().detach().numpy()
            correct += (np.argmax(preds_np, axis=1) == ys.cpu().detach().numpy()).sum()
            total += train_loader.batch_size
            step += 1
            if total % 1000 == 0:
                acc = float(correct) / total
                print('[%s] Training accuracy: %.2f%%' % (step, acc * 100))
                total = 0
                correct = 0
nb_epochs = 8
image_size = 64
batch_size = 64
num_classes = 10178
learning_rate = 0.001
num_epochs = 8
# Device will determine whether to run the training on GPU or CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
trans = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='train', target_type='identity', transform=trans, download="True"),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='test', target_type='identity', transform=trans),
    batch_size=batch_size)
#Training the model
print("Training Model")
# Set up the optimizer
optimizer = torch.optim.SGD(model1.parameters(), lr=learning_rate, weight_decay = 0.005, momentum = 0.9)
total_step = len(train_loader)
trainTorch(model1, train_loader, test_loader, nb_epochs, batch_size, train_end, test_end, learning_rate, optimizer = optimizer)
Update: I ran the code for a bit to see if it would start converging. One thing to note is that there are over 10,000 classes; with a batch size of 64, it takes more than 150 mini-batches (10178 / 64 ≈ 159) before your model has even seen every class in the dataset once. You certainly shouldn't expect the model to start achieving accurate predictions within a few hundred steps.
When I printed the loss value I noticed it was decreasing very slowly. I changed the learning rate to 0.01 and it started decreasing faster.
Also, your model is very shallow for a face recognition model. You're better off using something like a resnet variant (e.g. resnet-50 or resnet-101 from torchvision) rather than rolling your own model.
Primary changes include:
- Increased the learning rate
- Fixed the loss function
- Removed log_softmax from the output of the model
- Added activations to the conv layers
IMO the comments about softmax are a bit misleading since you don't need to softmax the output of your model if you are using cross_entropy. You also don't need softmax to get the argmax of the prediction since both softmax and log_softmax don't change the relative ordering of the predictions (i.e. both softmax and log are strictly increasing functions).
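A quick runnable check of that ordering claim (toy logits of my own):
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
# softmax and log_softmax are monotonic, so the argmax never changes
print(torch.equal(logits.argmax(dim=1), F.softmax(logits, dim=1).argmax(dim=1)))      # True
print(torch.equal(logits.argmax(dim=1), F.log_softmax(logits, dim=1).argmax(dim=1)))  # True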
IMO the comment about using average pooling to reduce the input size of the first fc layer is a good one and may improve performance, but you'll need to experiment with that one to find good parameters for it so I left it out of this answer.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torchvision import datasets, transforms
# Creating a simple network
class ConvNeuralNet(torch.nn.Module):
    def __init__(self, num_classes=10178):
        super(ConvNeuralNet, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(13312, 128)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        # note the relu activations on the conv layers
        out = F.relu(self.conv_layer1(x))
        out = F.relu(self.conv_layer2(out))
        out = self.max_pool1(out)
        out = F.relu(self.conv_layer3(out))
        out = F.relu(self.conv_layer4(out))
        out = self.max_pool2(out)
        # you may want an adaptive average pool 2d here to reduce the size of the feature map further
        out = out.reshape(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        # return raw logits, not log-softmax output
        return out
def trainTorch(torch_model, train_loader, test_loader, nb_epochs, batch_size, learning_rate, optimizer):
    train_loss = []
    total = 0
    correct = 0
    step = 0
    for _epoch in range(nb_epochs):
        for xs, ys in train_loader:
            # the Variable interface has been deprecated for years, it is effectively a no-op in modern pytorch
            # see: https://pytorch.org/docs/stable/autograd.html#variable-deprecated
            if torch.cuda.is_available():
                xs, ys = xs.cuda(), ys.cuda()
            optimizer.zero_grad()
            logits = torch_model(xs)
            # don't softmax or log-softmax the inputs to cross_entropy
            loss = F.cross_entropy(logits, ys)
            # The following is equivalent but less numerically stable
            # loss = F.nll_loss(F.log_softmax(logits), ys)
            loss.backward()
            train_loss.append(loss.item())
            optimizer.step()  # update parameters
            logits_np = logits.cpu().detach().numpy()
            correct += (np.argmax(logits_np, axis=1) == ys.cpu().detach().numpy()).sum()
            total += train_loader.batch_size
            step += 1
            if step % 200 == 0:
                acc = float(correct) / total
                avg_loss = sum(train_loss) / len(train_loss)
                print(f'[{step}] Training accuracy: {acc*100:.2f}% Training loss: {avg_loss:.4f}')
                total = 0
                correct = 0
                train_loss = []
nb_epochs = 8
image_size = 64
batch_size = 64
num_classes = 10178
# increased learning rate to 0.01
learning_rate = 0.01
num_epochs = 8
# Device will determine whether to run the training on GPU or CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
trans = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='train', target_type='identity', transform=trans, download=True),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='test', target_type='identity', transform=trans),
    batch_size=batch_size)
model = ConvNeuralNet(num_classes)
if torch.cuda.is_available():
    model.cuda()
#Training the model
print("Training Model")
# Set up the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=0.005, momentum=0.9)
total_step = len(train_loader)
trainTorch(model, train_loader, test_loader, nb_epochs, batch_size, learning_rate, optimizer=optimizer)
Output
Training Model
[200] Training accuracy: 0.00% Training loss: 9.2286
[400] Training accuracy: 0.02% Training loss: 9.2286
[600] Training accuracy: 0.04% Training loss: 9.2265
[800] Training accuracy: 0.00% Training loss: 9.2253
[1000] Training accuracy: 0.00% Training loss: 9.2222
[1200] Training accuracy: 0.00% Training loss: 9.2105
[1400] Training accuracy: 0.02% Training loss: 9.1776
[1600] Training accuracy: 0.03% Training loss: 9.1329
[1800] Training accuracy: 0.02% Training loss: 9.1013
[2000] Training accuracy: 0.02% Training loss: 9.0830
[2200] Training accuracy: 0.02% Training loss: 9.0715
[2400] Training accuracy: 0.01% Training loss: 9.0622
[2600] Training accuracy: 0.02% Training loss: 9.0456
[2800] Training accuracy: 0.00% Training loss: 9.0301
[3000] Training accuracy: 0.00% Training loss: 9.0357
[3200] Training accuracy: 0.02% Training loss: 9.0402
[3400] Training accuracy: 0.02% Training loss: 9.0321
[3600] Training accuracy: 0.02% Training loss: 9.0217
[3800] Training accuracy: 0.02% Training loss: 8.9757
[4000] Training accuracy: 0.09% Training loss: 8.9059
[4200] Training accuracy: 0.09% Training loss: 8.8331
[4400] Training accuracy: 0.09% Training loss: 8.7601
[4600] Training accuracy: 0.09% Training loss: 8.7356
[4800] Training accuracy: 0.10% Training loss: 8.6717
[5000] Training accuracy: 0.12% Training loss: 8.6311
[5200] Training accuracy: 0.16% Training loss: 8.5515
[5400] Training accuracy: 0.16% Training loss: 8.4943
[5600] Training accuracy: 0.14% Training loss: 8.4345
[5800] Training accuracy: 0.14% Training loss: 8.4107
[6000] Training accuracy: 0.18% Training loss: 8.3317
[6200] Training accuracy: 0.22% Training loss: 8.2716
[6400] Training accuracy: 0.31% Training loss: 8.1934
[6600] Training accuracy: 0.30% Training loss: 8.1500
[6800] Training accuracy: 0.35% Training loss: 8.0979
[7000] Training accuracy: 0.21% Training loss: 8.0739
[7200] Training accuracy: 0.44% Training loss: 8.0220
[7400] Training accuracy: 0.29% Training loss: 7.9819
From the output we see the loss is decreasing and the accuracy is starting to increase. It's hard to predict how well this will work and when it will converge, but this is a good start. You'll probably need to use a better model and a learning rate scheduler to get better performance.
For example, just switching to a resnet-50:
import torchvision  # needed for the models module
model = torchvision.models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)
The model starts converging much faster:
Training Model
[200] Training accuracy: 0.05% Training loss: 9.1942
[400] Training accuracy: 0.05% Training loss: 8.9244
[600] Training accuracy: 0.15% Training loss: 8.5936
[800] Training accuracy: 0.30% Training loss: 8.3147
[1000] Training accuracy: 0.39% Training loss: 8.0745
[1200] Training accuracy: 0.43% Training loss: 7.9146
[1400] Training accuracy: 0.45% Training loss: 7.7706
[1600] Training accuracy: 0.64% Training loss: 7.6551
[1800] Training accuracy: 0.68% Training loss: 7.5784
[2000] Training accuracy: 0.74% Training loss: 7.5327
[2200] Training accuracy: 0.72% Training loss: 7.4689
[2400] Training accuracy: 0.63% Training loss: 7.4378
[2600] Training accuracy: 0.83% Training loss: 7.3789
[2800] Training accuracy: 0.90% Training loss: 7.2812
[3000] Training accuracy: 0.84% Training loss: 7.2771
[3200] Training accuracy: 0.96% Training loss: 7.2536
[3400] Training accuracy: 1.00% Training loss: 7.2538
I'm using an MLPClassifier for classification of heart diseases. I used imblearn.SMOTE to balance the objects of each class. I was getting very good results (85% balanced accuracy), but I was advised not to use SMOTE on the test data, only on the train data. After I made this change, the performance of my classifier dropped sharply (~35% balanced accuracy) and I don't know what could be wrong.
Here is a simple benchmark with training data balanced but test data unbalanced:
And this is the code:
def makeOverSamplesSMOTE(X, y):
    from imblearn.over_sampling import SMOTE
    sm = SMOTE(sampling_strategy='all')
    X, y = sm.fit_sample(X, y)
    return X, y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)
## Normalize data
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.fit_transform(X_test)
## SMOTE only on training data
X_train, y_train = makeOverSamplesSMOTE(X_train, y_train)
clf = MLPClassifier(hidden_layer_sizes=(20), verbose=10,
                    learning_rate_init=0.5, max_iter=2000,
                    activation='logistic', solver='sgd', shuffle=True, random_state=30)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
I'd like to know what I'm doing wrong, since this seems to be the proper way of preparing the data.
The first mistake in your code is in how you standardize the data: you only need to fit the StandardScaler once, on X_train, and then use it to transform X_test. You shouldn't refit it on X_test. So the correct code is:
def makeOverSamplesSMOTE(X, y):
    from imblearn.over_sampling import SMOTE
    sm = SMOTE(sampling_strategy='all')
    X, y = sm.fit_sample(X, y)
    return X, y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)
## Normalize data
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
## SMOTE only on training data
X_train, y_train = makeOverSamplesSMOTE(X_train, y_train)
clf = MLPClassifier(hidden_layer_sizes=(20), verbose=10,
                    learning_rate_init=0.5, max_iter=2000,
                    activation='logistic', solver='sgd', shuffle=True, random_state=30)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
For the machine learning model, try reducing the learning rate; it is too high (the default learning rate in sklearn's MLPClassifier is 0.001). Try changing the activation function and the number of layers. Also, not every ML model works on every dataset, so you might need to look at your data and choose the model accordingly.
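A hedged sketch of those suggestions applied to the classifier above (the layer sizes are illustrative, and relu/adam are just one common alternative to logistic/sgd, not the only valid choice):
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(hidden_layer_sizes=(50, 20),  # illustrative sizes, not tuned
                    learning_rate_init=0.001,     # sklearn's default, instead of 0.5
                    activation='relu', solver='adam',
                    max_iter=2000, random_state=30)
clf.fit(X_train, y_train)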
Hope you have already got a better result for your model. I tried changing a few parameters and got an accuracy of 65%; when I changed to a 90:10 split I got an accuracy of 70%.
But accuracy can mislead, so I also calculated the F1 score, which gives you a better picture of the predictions.
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(hidden_layer_sizes=(1,), verbose=False,
                    learning_rate_init=0.001,
                    max_iter=2000,
                    activation='logistic', solver='sgd', shuffle=True, random_state=50)
clf.fit(X_train_res, y_train_res)
y_pred = clf.predict(X_test)
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
score = accuracy_score(y_test, y_pred)
print(score)
cr = classification_report(y_test, clf.predict(X_test))
print(cr)
Accuracy = 0.65
Classification report:
              precision    recall  f1-score   support

           0       0.82      0.97      0.89        33
           1       0.67      0.31      0.42        13
           2       0.00      0.00      0.00         6
           3       0.00      0.00      0.00         4
           4       0.29      0.80      0.42         5

   micro avg       0.66      0.66      0.66        61
   macro avg       0.35      0.42      0.35        61
weighted avg       0.61      0.66      0.61        61
confusion_matrix:
array([[32,  0,  0,  0,  1],
       [ 4,  4,  2,  0,  3],
       [ 1,  1,  0,  0,  4],
       [ 1,  1,  0,  0,  2],
       [ 1,  0,  0,  0,  4]], dtype=int64)
I'm new to data mining. I have implemented my linear SVM as follows.
X_train, X_test, y_train, y_test, = train_test_split(X, y, test_size=0.1, random_state = 0)
#print X_train.shape, y_train.shape
#print X_test.shape, y_test.shape
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
print clf.score(X_test, y_test)
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, X, y, cv=10)
print scores
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std()*2 ))
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
scores = ['precision', 'recall']
svr = svm.SVC(C=1)
for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    clf = GridSearchCV(svr, tuned_parameters, cv=10, scoring='%s_macro' % score)
    clf.fit(X_train, y_train)
    print("best parameters %s" % clf.best_params_)
Here, my data is very large, so what should I do to make my linear SVM run faster?
Do parameter tuning only on a sample.
Once you have found good parameters, then use the entire data set.
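A sketch of that approach, reusing X, y and tuned_parameters from the question (train_test_split is used here only to draw a random 10% sample for the tuning step; the 10% figure is arbitrary):
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import svm

# Draw a small random sample for the expensive grid search.
X_sample, _, y_sample, _ = train_test_split(X, y, train_size=0.1, random_state=0)

search = GridSearchCV(svm.SVC(), tuned_parameters, cv=10)
search.fit(X_sample, y_sample)        # tune on the sample only

clf = svm.SVC(**search.best_params_)  # then fit the best parameters on the full data
clf.fit(X_train, y_train)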
I have trained a linear classifier on the MNIST dataset with 92% accuracy. Then I fixed the weights and optimized the input image so that the softmax probability for 8 was maximized. But the softmax loss doesn't decrease below 2.302 (-log(1/10)), which means that my training has been useless. What am I doing wrong?
Code for training the weights:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, \
                     mnist.test.images, mnist.test.labels
X = tf.placeholder("float", [None, 784])
Y = tf.placeholder("float", [None, 10])
w = tf.Variable(tf.random_normal([784, 10], stddev=0.01))
b = tf.Variable(tf.zeros([10]))
o = tf.nn.sigmoid(tf.matmul(X, w)+b)
cost= tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=o, labels=Y))
train_op = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost)
predict_op = tf.argmax(o, 1)
sess=tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(100):
    for start, end in zip(range(0, len(trX), 256), range(256, len(trX)+1, 256)):
        sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})
    print(i, np.mean(np.argmax(teY, axis=1) == sess.run(predict_op, feed_dict={X: teX})))
Code for training the image for fixed weights:
#Copy trained weights into W,B and pass them as placeholders to new model
W=sess.run(w)
B=sess.run(b)
X=tf.Variable(tf.random_normal([1, 784], stddev=0.01))
Y=tf.constant([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])
w=tf.placeholder("float")
b=tf.placeholder("float")
o = tf.nn.sigmoid(tf.matmul(X, w)+b)
cost= tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=o, labels=Y))
train_op = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost)
predict_op = tf.argmax(o, 1)
sess.run(tf.global_variables_initializer())
for i in range(1000):
    sess.run(train_op, feed_dict={w:W, b:B})
    if i%50==0:
        sess.run(cost, feed_dict={w:W, b:B})
        print(i, sess.run(predict_op, feed_dict={w:W, b:B}))
You shouldn't call tf.sigmoid on the output of your net. softmax_cross_entropy_with_logits assumes your inputs are logits, i.e. unconstrained real numbers. Using
o = tf.matmul(X, w)+b
increases your accuracy to 92.8%.
With this modification, your second training works. The cost reaches 0 although the resulting image is anything but appealing.
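For completeness, the same change in the second script looks like this (a sketch; the rest of the graph stays as in the question):
# Feed raw logits here as well: with sigmoid outputs squeezed into [0, 1],
# the softmax stays near uniform and the loss cannot drop much below
# -log(1/10) = 2.302, which is exactly where it was stuck.
o = tf.matmul(X, w) + b
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=o, labels=Y))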
I have tried many examples with F1 micro and accuracy in scikit-learn, and in all of them I see that F1 micro is the same as accuracy. Is this always true?
Script
from sklearn import svm
from sklearn import metrics
from sklearn.cross_validation import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, accuracy_score
# prepare dataset
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# svm classification
clf = svm.SVC(kernel='rbf', gamma=0.7, C = 1.0).fit(X_train, y_train)
y_predicted = clf.predict(X_test)
# performance
print "Classification report for %s" % clf
print metrics.classification_report(y_test, y_predicted)
print("F1 micro: %1.4f\n" % f1_score(y_test, y_predicted, average='micro'))
print("F1 macro: %1.4f\n" % f1_score(y_test, y_predicted, average='macro'))
print("F1 weighted: %1.4f\n" % f1_score(y_test, y_predicted, average='weighted'))
print("Accuracy: %1.4f" % (accuracy_score(y_test, y_predicted)))
Output
Classification report for SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma=0.7, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
             precision    recall  f1-score   support

          0       1.00      0.90      0.95        10
          1       0.50      0.88      0.64         8
          2       0.86      0.50      0.63        12

avg / total       0.81      0.73      0.74        30
F1 micro: 0.7333
F1 macro: 0.7384
F1 weighted: 0.7381
Accuracy: 0.7333
F1 micro = Accuracy
In classification tasks for which every test case is guaranteed to be assigned to exactly one class, micro-F is equivalent to accuracy. It won't be the case in multi-label classification.
This is because we are dealing with multi-class classification, where every test sample must belong to exactly one class (not multi-label). In that setting, every false negative for one class is simultaneously a false positive for another class, so the micro-averaged counts line up with plain accuracy.
Formula-wise: F1 = 2 * precision * recall / (precision + recall).
Micro-averaged precision, recall, F1, and accuracy are all equal for cases in which every instance must be classified into one (and only one) class. A simple way to see this is by looking at the formulas: precision = TP/(TP+FP) and recall = TP/(TP+FN). The numerators are the same, and every FN for one class is another class's FP, which makes the denominators the same as well. If precision = recall, then F1 will also be equal.
For any inputs, you should be able to show that:
from sklearn.metrics import accuracy_score as acc
from sklearn.metrics import f1_score as f1
f1(y_true, y_pred, average='micro') == acc(y_true, y_pred)
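A quick runnable check on toy multiclass labels of my own (every sample has exactly one class):
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 2, 0, 0, 1]

print(f1_score(y_true, y_pred, average='micro'))  # 0.5714...
print(accuracy_score(y_true, y_pred))             # 0.5714...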
I had the same issue, so I investigated and came up with this:
Just thinking about the theory, it is impossible that accuracy and the F1 score are the very same for every single dataset. The reason is that the F1 score is independent of the true negatives, while accuracy is not. (Note that f1_score without an average argument defaults to binary F1 on the positive class, which is what diverges from accuracy below; micro-averaged F1 would still equal accuracy.)
By taking a dataset where f1 = acc and adding true negatives to it, you get f1 != acc.
>>> from sklearn.metrics import accuracy_score as acc
>>> from sklearn.metrics import f1_score as f1
>>> y_pred = [0, 1, 1, 0, 1, 0]
>>> y_true = [0, 1, 1, 0, 0, 1]
>>> acc(y_true, y_pred)
0.6666666666666666
>>> f1(y_true,y_pred)
0.6666666666666666
>>> y_true = [0, 1, 1, 0, 1, 0, 0, 0, 0]
>>> y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 0]
>>> acc(y_true, y_pred)
0.7777777777777778
>>> f1(y_true,y_pred)
0.6666666666666666