Poor quality classifier in Tflearn? - machine-learning

I am new to machine learning and trying out TFlearn because it is simple.
I am trying to make a basic classifier which I find interesting.
My objective is to train the system to predict in which direction a point lies.
For example, if I feed two 2D coordinates (50,50) and (51,51), the system has to predict that the direction is NE (north-east).
If I feed (50,50) and (49,49), the system must predict that the direction is SW (south-west).
Input: X1,Y1,X2,Y2,Label
Output: a label from 0 to 7, one for each of the 8 directions.
So here is the small code I wrote,
from __future__ import print_function
import numpy as np
import tflearn
import tensorflow as tf
import time
from tflearn.data_utils import load_csv
#Sample input 50,50,51,51,5
data, labels = load_csv(filename, target_column=4,
                        categorical_labels=True, n_classes=8)
my_optimizer = tflearn.SGD(learning_rate=0.1)
net = tflearn.input_data(shape=[None, 4])
net = tflearn.fully_connected(net, 32) #input 4, output 32
net = tflearn.fully_connected(net, 32) #input 32, output 32
net = tflearn.fully_connected(net, 8, activation='softmax')
net = tflearn.regression(net,optimizer=my_optimizer)
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=100, batch_size=100000, show_metric=True)
model.save("direction-classifier.tfl")
The problem I am facing is that even after passing in around 40 million input samples, the system's accuracy is as low as 20%.
I restricted the inputs to 40 ≤ x ≤ 60 and 40 ≤ y ≤ 60.
I cannot tell whether I over-fitted the samples, because the accuracy was never high at any point during training on the full 40 million inputs.
Why is the accuracy so low for this simple example?
EDIT:
I have reduced the learning rate and made the batch size small. However, the results are still the same with very poor accuracy.
I have included the output of the first 25 epochs below.
--
Training Step: 100000 | total loss: 6.33983 | time: 163.327s
| SGD | epoch: 001 | loss: 6.33983 - acc: 0.0663 -- iter: 999999/999999
--
Training Step: 200000 | total loss: 6.84055 | time: 161.981ss
| SGD | epoch: 002 | loss: 6.84055 - acc: 0.1568 -- iter: 999999/999999
--
Training Step: 300000 | total loss: 5.90203 | time: 158.853ss
| SGD | epoch: 003 | loss: 5.90203 - acc: 0.1426 -- iter: 999999/999999
--
Training Step: 400000 | total loss: 5.97782 | time: 157.607ss
| SGD | epoch: 004 | loss: 5.97782 - acc: 0.1465 -- iter: 999999/999999
--
Training Step: 500000 | total loss: 5.97215 | time: 155.929ss
| SGD | epoch: 005 | loss: 5.97215 - acc: 0.1234 -- iter: 999999/999999
--
Training Step: 600000 | total loss: 6.86967 | time: 157.299ss
| SGD | epoch: 006 | loss: 6.86967 - acc: 0.1230 -- iter: 999999/999999
--
Training Step: 700000 | total loss: 6.10330 | time: 158.137ss
| SGD | epoch: 007 | loss: 6.10330 - acc: 0.1242 -- iter: 999999/999999
--
Training Step: 800000 | total loss: 5.81901 | time: 157.464ss
| SGD | epoch: 008 | loss: 5.81901 - acc: 0.1464 -- iter: 999999/999999
--
Training Step: 900000 | total loss: 7.09744 | time: 157.486ss
| SGD | epoch: 009 | loss: 7.09744 - acc: 0.1359 -- iter: 999999/999999
--
Training Step: 1000000 | total loss: 7.19259 | time: 158.369s
| SGD | epoch: 010 | loss: 7.19259 - acc: 0.1248 -- iter: 999999/999999
--
Training Step: 1100000 | total loss: 5.60177 | time: 157.221ss
| SGD | epoch: 011 | loss: 5.60177 - acc: 0.1378 -- iter: 999999/999999
--
Training Step: 1200000 | total loss: 7.16676 | time: 158.607ss
| SGD | epoch: 012 | loss: 7.16676 - acc: 0.1210 -- iter: 999999/999999
--
Training Step: 1300000 | total loss: 6.19163 | time: 163.711ss
| SGD | epoch: 013 | loss: 6.19163 - acc: 0.1635 -- iter: 999999/999999
--
Training Step: 1400000 | total loss: 7.46101 | time: 162.091ss
| SGD | epoch: 014 | loss: 7.46101 - acc: 0.1216 -- iter: 999999/999999
--
Training Step: 1500000 | total loss: 7.78055 | time: 158.468ss
| SGD | epoch: 015 | loss: 7.78055 - acc: 0.1122 -- iter: 999999/999999
--
Training Step: 1600000 | total loss: 6.03101 | time: 158.251ss
| SGD | epoch: 016 | loss: 6.03101 - acc: 0.1103 -- iter: 999999/999999
--
Training Step: 1700000 | total loss: 5.59769 | time: 158.083ss
| SGD | epoch: 017 | loss: 5.59769 - acc: 0.1182 -- iter: 999999/999999
--
Training Step: 1800000 | total loss: 5.45591 | time: 158.088ss
| SGD | epoch: 018 | loss: 5.45591 - acc: 0.0868 -- iter: 999999/999999
--
Training Step: 1900000 | total loss: 6.54951 | time: 157.755ss
| SGD | epoch: 019 | loss: 6.54951 - acc: 0.1353 -- iter: 999999/999999
--
Training Step: 2000000 | total loss: 6.18566 | time: 157.408ss
| SGD | epoch: 020 | loss: 6.18566 - acc: 0.0551 -- iter: 999999/999999
--
Training Step: 2100000 | total loss: 4.95146 | time: 157.572ss
| SGD | epoch: 021 | loss: 4.95146 - acc: 0.1114 -- iter: 999999/999999
--
Training Step: 2200000 | total loss: 5.97208 | time: 157.279ss
| SGD | epoch: 022 | loss: 5.97208 - acc: 0.1277 -- iter: 999999/999999
--
Training Step: 2300000 | total loss: 6.75645 | time: 157.201ss
| SGD | epoch: 023 | loss: 6.75645 - acc: 0.1507 -- iter: 999999/999999
--
Training Step: 2400000 | total loss: 7.04119 | time: 157.346ss
| SGD | epoch: 024 | loss: 7.04119 - acc: 0.1512 -- iter: 999999/999999
--
Training Step: 2500000 | total loss: 5.95451 | time: 157.722ss
| SGD | epoch: 025 | loss: 5.95451 - acc: 0.1421 -- iter: 999999/999999

As discussed in my comment above, here is code that trains a multi-layer perceptron classifier using an MLP helper class I created. The class is implemented using TensorFlow and follows the scikit-learn fit, predict, score interface.
The basic idea is to generate a random start and end point, then use a dictionary to create labels based on the direction between them. I used np.unique to find the number of class labels in the generated data, since it can vary (some directions may be missing). I also included an empty string label for when the start and end points are the same.
Code
Using the code below I was able to achieve 100% cross-validation accuracy on some runs.
import numpy as np
from sklearn.model_selection import ShuffleSplit
from TFANN import MLPC
#Dictionary to look up the direction label from the sign of (dx, dy)
DM = {(-1, -1):'SW', (-1, 0):'W', (-1, 1):'NW', (0, 1):'N',
      ( 1, 1):'NE', ( 1, 0):'E', ( 1, -1):'SE', (0, -1):'S',
      ( 0, 0):''}
NR = 4096 #Number of rows in sample matrix
A1 = np.random.randint(40, 61, size = (NR, 2)) #Random starting point
A2 = np.random.randint(40, 61, size = (NR, 2)) #Random ending point
A = np.hstack([A1, A2]) #Concat start and end point as feature vector
#Create label from direction vector
Y = np.array([DM[(x, y)] for x, y in (A2 - A1).clip(-1, 1)])
NC = len(np.unique(Y)) #Number of classes
ss = ShuffleSplit(n_splits = 1)
trn, tst = next(ss.split(A)) #Make a train/test split for cross-validation
#%% Create and train Multi-Layer Perceptron for Classification (MLPC)
l = [4, 6, 6, NC] #Neuron counts in each layer
mlpc = MLPC(l, batchSize = 64, maxIter = 128, verbose = True)
mlpc.fit(A[trn], Y[trn])
s1 = mlpc.score(A[trn], Y[trn]) #Training accuracy
s2 = mlpc.score(A[tst], Y[tst]) #Testing accuracy
s3 = mlpc.score(A, Y) #Total accuracy
print('Trn: {:05f}\tTst: {:05f}\tAll: {:05f}'.format(s1, s2, s3))
Results
This is a sample run of the above code on my machine:
Iter 1 2.59423236 (Batch Size: 64)
Iter 2 2.25392553 (Batch Size: 64)
Iter 3 2.02569708 (Batch Size: 64)
...
Iter 12 1.53575111 (Batch Size: 64)
Iter 13 1.47963311 (Batch Size: 64)
Iter 14 1.42776408 (Batch Size: 64)
...
Iter 83 0.23911642 (Batch Size: 64)
Iter 84 0.22893350 (Batch Size: 64)
Iter 85 0.23644384 (Batch Size: 64)
...
Iter 94 0.21170238 (Batch Size: 64)
Iter 95 0.20718799 (Batch Size: 64)
Iter 96 0.21230888 (Batch Size: 64)
...
Iter 126 0.17334313 (Batch Size: 64)
Iter 127 0.16970796 (Batch Size: 64)
Iter 128 0.15931854 (Batch Size: 64)
Trn: 0.995659 Tst: 1.000000 All: 0.996094

It turns out the optimizer was causing all the problems. When the custom optimizer was removed, the loss began falling correctly and the accuracy increased to 99%.
The following two lines must be modified.
my_optimizer = tflearn.SGD(learning_rate=0.1)
net = tflearn.regression(net,optimizer=my_optimizer)
Replacing them with
net = tflearn.regression(net)
yielded perfect results.
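For reference, tflearn.regression falls back to its documented defaults when no optimizer is supplied: the Adam optimizer with a learning rate of 0.001 and categorical cross-entropy loss. The working line above should therefore be roughly equivalent to spelling those defaults out explicitly; here is a sketch (verify the exact defaults against your TFLearn version):
# Roughly equivalent to tflearn.regression(net) with no arguments,
# assuming TFLearn's documented defaults (Adam, lr=0.001, categorical cross-entropy)
my_optimizer = tflearn.Adam(learning_rate=0.001)
net = tflearn.regression(net, optimizer=my_optimizer, loss='categorical_crossentropy')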

Related

CNN model for RGB images giving 0% accuracy

I am trying to train a CNN model on the CelebA (RGB images) dataset. But when I train the model and check its accuracy, it is 0% or close to 0%. I think the issue is in the ConvNeuralNet class or the hyperparameters, but due to my limited knowledge I'm not sure what I'm missing here. Can someone please help? Thanks.
# Creating a simple network
class ConvNeuralNet(torch.nn.Module):
    def __init__(self, num_classes=10178):
        super(ConvNeuralNet, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(13312, 128)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        out = self.conv_layer1(x)
        out = self.conv_layer2(out)
        out = self.max_pool1(out)
        out = self.conv_layer3(out)
        out = self.conv_layer4(out)
        out = self.max_pool2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        return F.log_softmax(out, dim=-1)

def trainTorch(torch_model, train_loader, test_loader,
               nb_epochs=NB_EPOCHS, batch_size=BATCH_SIZE, train_end=-1, test_end=-1, learning_rate=LEARNING_RATE, optimizer=None):
    train_loss = []
    total = 0
    correct = 0
    step = 0
    for _epoch in range(nb_epochs):
        for xs, ys in train_loader:
            xs, ys = Variable(xs), Variable(ys)
            if torch.cuda.is_available():
                xs, ys = xs.cuda(), ys.cuda()
            optimizer.zero_grad()
            preds = torch_model(xs)
            preds = F.log_softmax(preds, dim=1)
            loss = F.cross_entropy(preds, ys)
            loss.backward()
            train_loss.append(loss.data.item())
            optimizer.step()  # update gradients
            preds_np = preds.cpu().detach().numpy()
            correct += (np.argmax(preds_np, axis=1) == ys.cpu().detach().numpy()).sum()
            total += train_loader.batch_size
            step += 1
            if total % 1000 == 0:
                acc = float(correct) / total
                print('[%s] Training accuracy: %.2f%%' % (step, acc * 100))
                total = 0
                correct = 0

nb_epochs = 8
image_size = 64
batch_size = 64
num_classes = 10178
learning_rate = 0.001
num_epochs = 8
# Device will determine whether to run the training on GPU or CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
trans = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='train', target_type='identity', transform=trans, download="True"),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='test', target_type='identity', transform=trans),
    batch_size=batch_size)
#Training the model
print("Training Model")
# Set optimizer with optimizer
optimizer = torch.optim.SGD(model1.parameters(), lr=learning_rate, weight_decay=0.005, momentum=0.9)
total_step = len(train_loader)
trainTorch(model1, train_loader, test_loader, nb_epochs, batch_size, train_end, test_end, learning_rate, optimizer=optimizer)
Update: I ran the code for a bit to see if it would start converging. One thing to note is that there are over 10,000 classes. With a batch size of 64, it will take more than 150 mini-batches (10,178 / 64 ≈ 159) before your model has even seen every class in the dataset, so you certainly shouldn't expect accurate predictions within a few hundred steps.
When I printed the loss value, I noticed it was decreasing very slowly. I changed the learning rate to 0.01 and it started decreasing faster.
Also, your model is very shallow for a face recognition task. You're better off using something like a ResNet variant (e.g. resnet-50 or resnet-101 from torchvision) rather than rolling your own model.
The primary changes include:
Increasing the learning rate
Fixing the loss function
Removing log_softmax from the output of the model
Adding activations to the conv layers
IMO the comments about softmax are a bit misleading, since you don't need to apply softmax to the output of your model if you are using cross_entropy. You also don't need softmax to get the argmax of the prediction, since neither softmax nor log_softmax changes the relative ordering of the predictions (both softmax and log are strictly increasing functions).
IMO the comment about using average pooling to reduce the input size of the first fc layer is a good one and may improve performance, but you'd need to experiment to find good parameters for it, so I left it out of this answer.
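For reference, a rough sketch of what that pooling idea could look like, kept separate from the full listing below; the 4x4 pooled size and the smaller fc1 are illustrative guesses, not tuned values:
import torch.nn as nn
import torch.nn.functional as F

class ConvNeuralNetAvgPool(nn.Module):
    # Hypothetical variant of the model with adaptive average pooling before the classifier head
    def __init__(self, num_classes=10178):
        super().__init__()
        self.conv_layer1 = nn.Conv2d(3, 32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(32, 32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv_layer3 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(64, 64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.avg_pool = nn.AdaptiveAvgPool2d((4, 4))  # shrinks any feature map to 4x4
        self.fc1 = nn.Linear(64 * 4 * 4, 128)         # much smaller than the original 13312
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        out = self.max_pool1(F.relu(self.conv_layer2(F.relu(self.conv_layer1(x)))))
        out = self.max_pool2(F.relu(self.conv_layer4(F.relu(self.conv_layer3(out)))))
        out = self.avg_pool(out)
        out = out.reshape(out.size(0), -1)
        return self.fc2(F.relu(self.fc1(out)))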
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torchvision import datasets, transforms

# Creating a simple network
class ConvNeuralNet(torch.nn.Module):
    def __init__(self, num_classes=10178):
        super(ConvNeuralNet, self).__init__()
        self.conv_layer1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.conv_layer2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv_layer3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.conv_layer4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(13312, 128)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        # note the relu activations on the conv layers
        out = F.relu(self.conv_layer1(x))
        out = F.relu(self.conv_layer2(out))
        out = self.max_pool1(out)
        out = F.relu(self.conv_layer3(out))
        out = F.relu(self.conv_layer4(out))
        out = self.max_pool2(out)
        # you may want an adaptive average pool 2d here to reduce size of feature map further
        out = out.reshape(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        # return raw logits, not log-softmax output
        return out

def trainTorch(torch_model, train_loader, test_loader, nb_epochs, batch_size, learning_rate, optimizer):
    train_loss = []
    total = 0
    correct = 0
    step = 0
    for _epoch in range(nb_epochs):
        for xs, ys in train_loader:
            # the Variable interface has been deprecated for years, it is effectively a no-op in modern pytorch
            # see: https://pytorch.org/docs/stable/autograd.html#variable-deprecated
            if torch.cuda.is_available():
                xs, ys = xs.cuda(), ys.cuda()
            optimizer.zero_grad()
            logits = torch_model(xs)
            # don't softmax or log-softmax the inputs to cross_entropy
            loss = F.cross_entropy(logits, ys)
            # The following is equivalent but less numerically stable
            # loss = F.nll_loss(F.log_softmax(logits), ys)
            loss.backward()
            train_loss.append(loss.item())
            optimizer.step()  # update gradients
            logits_np = logits.cpu().detach().numpy()
            correct += (np.argmax(logits_np, axis=1) == ys.cpu().detach().numpy()).sum()
            total += train_loader.batch_size
            step += 1
            if step % 200 == 0:
                acc = float(correct) / total
                avg_loss = sum(train_loss) / len(train_loss)
                print(f'[{step}] Training accuracy: {acc*100:.2f}% Training loss: {avg_loss:.4f}')
                total = 0
                correct = 0
                train_loss = []

nb_epochs = 8
image_size = 64
batch_size = 64
num_classes = 10178
# increased learning rate to 0.01
learning_rate = 0.01
num_epochs = 8
# Device will determine whether to run the training on GPU or CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
trans = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='train', target_type='identity', transform=trans, download=True),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.CelebA('data', split='test', target_type='identity', transform=trans),
    batch_size=batch_size)
model = ConvNeuralNet(num_classes)
if torch.cuda.is_available():
    model.cuda()
#Training the model
print("Training Model")
# Set optimizer with optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=0.005, momentum=0.9)
total_step = len(train_loader)
trainTorch(model, train_loader, test_loader, nb_epochs, batch_size, learning_rate, optimizer=optimizer)
Output
Training Model
[200] Training accuracy: 0.00% Training loss: 9.2286
[400] Training accuracy: 0.02% Training loss: 9.2286
[600] Training accuracy: 0.04% Training loss: 9.2265
[800] Training accuracy: 0.00% Training loss: 9.2253
[1000] Training accuracy: 0.00% Training loss: 9.2222
[1200] Training accuracy: 0.00% Training loss: 9.2105
[1400] Training accuracy: 0.02% Training loss: 9.1776
[1600] Training accuracy: 0.03% Training loss: 9.1329
[1800] Training accuracy: 0.02% Training loss: 9.1013
[2000] Training accuracy: 0.02% Training loss: 9.0830
[2200] Training accuracy: 0.02% Training loss: 9.0715
[2400] Training accuracy: 0.01% Training loss: 9.0622
[2600] Training accuracy: 0.02% Training loss: 9.0456
[2800] Training accuracy: 0.00% Training loss: 9.0301
[3000] Training accuracy: 0.00% Training loss: 9.0357
[3200] Training accuracy: 0.02% Training loss: 9.0402
[3400] Training accuracy: 0.02% Training loss: 9.0321
[3600] Training accuracy: 0.02% Training loss: 9.0217
[3800] Training accuracy: 0.02% Training loss: 8.9757
[4000] Training accuracy: 0.09% Training loss: 8.9059
[4200] Training accuracy: 0.09% Training loss: 8.8331
[4400] Training accuracy: 0.09% Training loss: 8.7601
[4600] Training accuracy: 0.09% Training loss: 8.7356
[4800] Training accuracy: 0.10% Training loss: 8.6717
[5000] Training accuracy: 0.12% Training loss: 8.6311
[5200] Training accuracy: 0.16% Training loss: 8.5515
[5400] Training accuracy: 0.16% Training loss: 8.4943
[5600] Training accuracy: 0.14% Training loss: 8.4345
[5800] Training accuracy: 0.14% Training loss: 8.4107
[6000] Training accuracy: 0.18% Training loss: 8.3317
[6200] Training accuracy: 0.22% Training loss: 8.2716
[6400] Training accuracy: 0.31% Training loss: 8.1934
[6600] Training accuracy: 0.30% Training loss: 8.1500
[6800] Training accuracy: 0.35% Training loss: 8.0979
[7000] Training accuracy: 0.21% Training loss: 8.0739
[7200] Training accuracy: 0.44% Training loss: 8.0220
[7400] Training accuracy: 0.29% Training loss: 7.9819
From the output we see the loss is decreasing and the accuracy is starting to increase. It's hard to predict how well this will work and when it will converge, but this is a good start. You'll probably need a better model and a learning rate scheduler to get better performance.
For example, just switching to a resnet-50
model = torchvision.models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)
the model starts converging much faster:
Training Model
[200] Training accuracy: 0.05% Training loss: 9.1942
[400] Training accuracy: 0.05% Training loss: 8.9244
[600] Training accuracy: 0.15% Training loss: 8.5936
[800] Training accuracy: 0.30% Training loss: 8.3147
[1000] Training accuracy: 0.39% Training loss: 8.0745
[1200] Training accuracy: 0.43% Training loss: 7.9146
[1400] Training accuracy: 0.45% Training loss: 7.7706
[1600] Training accuracy: 0.64% Training loss: 7.6551
[1800] Training accuracy: 0.68% Training loss: 7.5784
[2000] Training accuracy: 0.74% Training loss: 7.5327
[2200] Training accuracy: 0.72% Training loss: 7.4689
[2400] Training accuracy: 0.63% Training loss: 7.4378
[2600] Training accuracy: 0.83% Training loss: 7.3789
[2800] Training accuracy: 0.90% Training loss: 7.2812
[3000] Training accuracy: 0.84% Training loss: 7.2771
[3200] Training accuracy: 0.96% Training loss: 7.2536
[3400] Training accuracy: 1.00% Training loss: 7.2538
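As for the learning-rate scheduler mentioned above, a minimal sketch of one option is StepLR wired around the existing loop; the step_size and gamma values here are placeholders, not tuned for CelebA:
from torch.optim.lr_scheduler import StepLR

# Hypothetical schedule: multiply the learning rate by 0.1 every 2 epochs
scheduler = StepLR(optimizer, step_size=2, gamma=0.1)

for epoch in range(nb_epochs):
    for xs, ys in train_loader:
        ...  # same per-batch training step as in trainTorch above
    scheduler.step()  # advance the schedule once per epoch
    print(f'epoch {epoch}: lr = {scheduler.get_last_lr()}')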

Getting different results on each run using Long Short-Term Memory (LSTM) for text classification

I am using an LSTM deep-learning technique to classify my text. First I split the data into texts and labels using the pandas library and tokenize them, then I divide them into training and test data sets. Whenever I run the code, I get different results, varying from 80 to 100 percent.
Here is my code:
tokenizer = Tokenizer(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?#[\]^_`{|}~',
                      lower=True)
tokenizer.fit_on_texts(trainDF['texts'])
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
X = tokenizer.texts_to_sequences(trainDF['texts'])
X = pad_sequences(X, maxlen=MAX_SEQUENCE_LENGTH)
print('Shape of data tensor:', X.shape)
Y = pd.get_dummies(trainDF['label'])
print('Shape of label tensor:', Y.shape)
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.10, random_state = 42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
variables_for_classification=6 #change it as per your number of categories
model.add(Dense(variables_for_classification, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
epochs = 5
batch_size = 64
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_split=0.1,
                    callbacks=[EarlyStopping(monitor='val_loss', patience=3, min_delta=0.0001)])
accr = model.evaluate(X_test,Y_test)
print('Test set\n Loss: {:0.3f}\n Accuracy: {:0.3f}'.format(accr[0],accr[1]))
Train on 794 samples, validate on 89 samples
Epoch 1/5
794/794 [==============================] - 19s 24ms/step - loss: 1.6401 - accuracy: 0.6297 - val_loss: 0.9098 - val_accuracy: 0.5843
Epoch 2/5
794/794 [==============================] - 16s 20ms/step - loss: 0.8365 - accuracy: 0.7166 - val_loss: 0.7487 - val_accuracy: 0.7753
Epoch 3/5
794/794 [==============================] - 16s 20ms/step - loss: 0.7093 - accuracy: 0.8401 - val_loss: 0.6519 - val_accuracy: 0.8652
Epoch 4/5
794/794 [==============================] - 16s 20ms/step - loss: 0.5857 - accuracy: 0.8829 - val_loss: 0.4935 - val_accuracy: 1.0000
Epoch 5/5
794/794 [==============================] - 16s 20ms/step - loss: 0.4248 - accuracy: 0.9345 - val_loss: 0.3512 - val_accuracy: 0.8652
99/99 [==============================] - 0s 2ms/step
Test set
Loss: 0.348
Accuracy: 0.869
In the last run, accuracy was 100 percent.

How to apply classification on series data in Keras?

The structure of my input data is:
print(df.col)
0 [262, 330, 392, 522, 784, 0, 0]
1 [262, 290, 330, 392, 522, 784, 0]
2 [262, 330, 392, 522, 784, 0, 0]
3 [250, 262, 330, 392, 522, 784, 0]
4 [262, 290, 306, 330, 392, 784, 0]
.
.
.
I had variable-sized data, so I added zero-padding at the end to fix the input data shape.
The output column is:
print(df.predict)
array([[0., 0., 0., 1.],
[1., 0., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 1., 0.],
[0., 0., 1., 0.],
[0., 1., 0., 0.],...])
The output is one-hot encoded.
Following is my model:
model = Sequential()
model.add(Dense(7, activation='relu', input_dim = 7))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(4))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, epochs=500, batch_size=10, verbose=2)
The accuracy and loss become constant after 2-3 epochs.
Epoch 1/500
0s - loss: 5.8413 - acc: 0.1754
Epoch 2/500
0s - loss: 5.7398 - acc: 0.1754
Epoch 3/500
0s - loss: 5.7190 - acc: 0.1754
Epoch 4/500
0s - loss: 5.6885 - acc: 0.1754
Epoch 5/500
0s - loss: 5.6650 - acc: 0.1754
Epoch 6/500
0s - loss: 5.6403 - acc: 0.1754
Epoch 7/500
0s - loss: 5.6164 - acc: 0.2456
Epoch 8/500
0s - loss: 5.5900 - acc: 0.2456
Epoch 9/500
0s - loss: 5.5730 - acc: 0.2456
...
0s - loss: 5.3727 - acc: 0.1754
Epoch 499/500
0s - loss: 5.3727 - acc: 0.1754
Epoch 500/500
0s - loss: 5.3727 - acc: 0.1754
I have 72 data points and 4 classes (about 18 samples for each class)
The data is fairly simple. Why is the accuracy so low?
Is the model designed right?
I'm new to ML and Keras. Any help is appreciated.
Try model.add(Dense(4, activation='softmax')) as your last layer.
If you have more than 2 classes, you need a softmax activation on the final layer. Softmax outputs a probability for each of the 4 classes (they sum to 1), and the class with the highest probability is the prediction. With this change your network will be able to learn all 4 classes instead of only two.
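Applied to the model from the question, the change would look something like this; a sketch that keeps the rest of the architecture as posted and assumes the same X_train and y_train:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(7, activation='relu', input_dim=7))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(4, activation='softmax'))  # softmax turns the 4 class scores into probabilities summing to 1
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=500, batch_size=10, verbose=2)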

Keras accuracy never exceeds 19%

I am taking the images from SVHN (the Street View House Numbers dataset from Stanford) and I could really use some help figuring out why my accuracy does not increase past 19%... This is essentially an MNIST tutorial with more difficult images (digits can be off-center, blurred, shadowed, etc.).
I essentially take each image, subtract that image's mean, and then normalize to 0-1 (divide by 255).
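A minimal sketch of that preprocessing, assuming the images are already loaded into a NumPy array X of shape (num_images, 32, 32, 3); the array name and shape are assumptions, so adapt them to however the SVHN data is loaded:
# X is assumed to be a NumPy array of images, shape (num_images, 32, 32, 3)
X = X.astype('float32')
X -= X.mean(axis=(1, 2, 3), keepdims=True)  # subtract each image's own mean
X /= 255.0                                  # then scale down by 255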
The pipeline is simple enough:
2 Convolution 2d Layers (32 filters, 3x3)
MaxPool (2x2)
Dropout (.25)
2 Convolution 2d layers (64 filters, 3x3)
Max Pool (2x2)
Dropout(.25)
Flatten
Dense Relu
Dropout(.5)
Dense Softmax (10)
1792/73257 [..............................] - ETA: 3:17 - loss: 2.3241 - acc: 0.1602
1920/73257 [..............................] - ETA: 3:16 - loss: 2.3203 - acc: 0.1625
2048/73257 [..............................] - ETA: 3:14 - loss: 2.3177 - acc: 0.1621
2176/73257 [..............................] - ETA: 3:13 - loss: 2.3104 - acc: 0.1682
...
...
...
53376/73257 [====================>.........] - ETA: 51s - loss: 2.2439 - acc: 0.1879
53504/73257 [====================>.........] - ETA: 51s - loss: 2.2439 - acc: 0.1879
53632/73257 [====================>.........] - ETA: 50s - loss: 2.2439 - acc: 0.1878
53760/73257 [=====================>........] - ETA: 50s - loss: 2.2439 - acc: 0.1879
Can anyone help me figure out what I'm doing wrong? Are there any tips for figuring out why it increases normally at the beginning and then tapers off so quickly?
I am using categorical cross-entropy with an RMSprop optimizer.
epochs: 20
batch_size: 128
image_size: 32x32
model = Sequential()
model.add(Convolution2D(32, (3, 3),
                        strides=1,
                        activation='relu',
                        padding='same',
                        input_shape=input_shape,
                        data_format='channels_last'))
model.add(Convolution2D(32, (3, 3), padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_last'))
model.add(Dropout(0.25))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(model.output_shape[1], activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# METHOD1
# print('compiling model...')
# model.compile(loss='mean_squared_error',
#               optimizer='sgd',
#               metrics=['accuracy'])
# print('fitting model...')
#
# model.fit(X_train, y_train, batch_size=64, epochs=1, verbose=1)

# METHOD2
sgd = SGD(lr=0.05)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
model.fit(X_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(X_test, y_test, batch_size=128)

LSTM labeling all samples as the same class

I'm trying to design an LSTM network using Keras to combine word embeddings and other features in a binary classification setting. My test set contains 250 samples per class.
When I run my model using only the word embedding layers (the "model" branch in the code), I get an average F1 of around 0.67. When I create a new branch with the other fixed-size features that I compute separately ("branch2") and merge it with the word embeddings using "concat", the predictions all collapse to a single class (giving perfect recall for that class) and the average F1 drops to 0.33.
Am I adding in the features and training/testing incorrectly?
def create_model(embedding_index, sequence_features, optimizer='rmsprop'):
    # Branch 1: word embeddings
    model = Sequential()
    embedding_layer = create_embedding_matrix(embedding_index, word_index)
    model.add(embedding_layer)
    model.add(Convolution1D(nb_filter=32, filter_length=3, border_mode='same', activation='tanh'))
    model.add(MaxPooling1D(pool_length=2))
    model.add(Bidirectional(LSTM(100)))
    model.add(Dropout(0.2))
    model.add(Dense(2, activation='sigmoid'))

    # Branch 2: other features
    branch2 = Sequential()
    dim = sequence_features.shape[1]
    branch2.add(Dense(15, input_dim=dim, init='normal', activation='tanh'))
    branch2.add(BatchNormalization())

    # Merging branches to create final model
    final_model = Sequential()
    final_model.add(Merge([model, branch2], mode='concat'))
    final_model.add(Dense(2, init='normal', activation='sigmoid'))
    final_model.compile(loss='categorical_crossentropy', optimizer=optimizer,
                        metrics=['accuracy', 'precision', 'recall', 'fbeta_score', 'fmeasure'])
    return final_model

def run(input_train, input_dev, input_test, text_col, label_col, resfile, embedding_index):
    # Processing text and features
    data_train, labels_train, data_test, labels_test = vectorize_text(input_train, input_test, text_col, label_col)
    x_train, y_train = data_train, labels_train
    x_test, y_test = data_test, labels_test
    seq_train = get_sequence_features(input_train).as_matrix()
    seq_test = get_sequence_features(input_test).as_matrix()

    # Generating model
    filepath = lstm_config.WEIGHTS_PATH
    checkpoint = ModelCheckpoint(filepath, monitor='val_fmeasure', verbose=1, save_best_only=True, mode='max')
    callbacks_list = [checkpoint]
    model = create_model(embedding_index, seq_train)
    model.fit([x_train, seq_train], y_train, validation_split=0.33, nb_epoch=3, batch_size=100, callbacks=callbacks_list, verbose=1)

    # Evaluating
    scores = model.evaluate([x_test, seq_test], y_test, verbose=1)
    time.sleep(0.2)
    preds = model.predict_classes([x_test, seq_test])
    preds = to_categorical(preds)
    print(metrics.f1_score(y_true=y_test, y_pred=preds, average="micro"))
    print(metrics.f1_score(y_true=y_test, y_pred=preds, average="macro"))
    print(metrics.classification_report(y_test, preds))
Output:
Using Theano backend.
Found 2999999 word vectors.
Processing text dataset. Found 7165 unique tokens.
Shape of data tensor: (1996, 50)
Shape of label tensor: (1996, 2)
1996 train 500 test
Train on 1337 samples, validate on 659 samples
Epoch 1/3
1337/1337 [==============================] - 10s - loss: 0.6772 - acc: 0.6672 - precision: 0.5551 - recall: 0.6806 - fbeta_score: 0.6113 - fmeasure: 0.6113 - val_loss: 0.7442 - val_acc: 0.0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00
Epoch 2/3
1337/1337 [==============================] - 9s - loss: 0.6634 - acc: 0.7263 - precision: 0.5830 - recall: 0.7300 - fbeta_score: 0.6472 - fmeasure: 0.6472 - val_loss: 0.7616 - val_acc: 0.0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00
Epoch 3/3
1337/1337 [==============================] - 8s - loss: 0.6545 - acc: 0.7337 - precision: 0.5866 - recall: 0.7307 - fbeta_score: 0.6500 - fmeasure: 0.6500 - val_loss: 0.7801 - val_acc: 0.0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00
500/500 [==============================] - 0s
500/500 [==============================] - 1s
0.5
/usr/local/lib/python3.4/dist-packages/sklearn/metrics/classification.py:1074: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples. 'precision', 'predicted', average, warn_for)
0.333333333333
/usr/local/lib/python3.4/dist-packages/sklearn/metrics/classification.py:1074: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
             precision    recall  f1-score   support

          0       0.00      0.00      0.00       250
          1       0.50      1.00      0.67       250

avg / total       0.25      0.50      0.33       500
