Keras accuracy never exceeds 19% - machine-learning

I am working with images from SVHN (the Street View House Numbers dataset from Stanford), and I could really use some help figuring out why my accuracy does not increase past 19%. This is essentially an MNIST tutorial with more difficult images (other digits may be visible off center, and there are blurs, shadows, etc.).
For preprocessing, I take each image, subtract that image's mean, and then normalize to 0-1 (divide by 255).
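A quick way to sanity-check the preprocessing described above is to look at the value range it actually produces. The sketch below uses a random stand-in batch (an assumption, not the real SVHN arrays) and a hypothetical helper named `preprocess`. It suggests that subtracting the per-image mean before dividing by 255 centres each image around zero, so the values land roughly in [-1, 1] rather than in 0-1.

```python
import numpy as np

def preprocess(images):
    """Subtract each image's own mean, then divide by 255 (as described above)."""
    images = images.astype(np.float32)
    # One mean per image, computed over height, width and channels.
    per_image_mean = images.mean(axis=(1, 2, 3), keepdims=True)
    return (images - per_image_mean) / 255.0

# Hypothetical stand-in for a batch of 32x32 RGB crops (not real SVHN data).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

out = preprocess(batch)
print("min:", out.min(), "max:", out.max())
print("per-image means:", out.mean(axis=(1, 2, 3)))
```

This is not necessarily a problem for training, but it is worth knowing that the inputs are zero-centred rather than strictly non-negative.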
The pipeline is simple enough:
2 Convolution 2d Layers (32 filters, 3x3)
MaxPool (2x2)
Dropout (.25)
2 Convolution 2d layers (64 filters, 3x3)
Max Pool (2x2)
Dropout(.25)
Flatten
Dense Relu
Dropout(.5)
Dense Softmax (10)
1792/73257 [..............................] - ETA: 3:17 - loss: 2.3241 - acc: 0.1602
1920/73257 [..............................] - ETA: 3:16 - loss: 2.3203 - acc: 0.1625
2048/73257 [..............................] - ETA: 3:14 - loss: 2.3177 - acc: 0.1621
2176/73257 [..............................] - ETA: 3:13 - loss: 2.3104 - acc: 0.1682
...
...
...
53376/73257 [====================>.........] - ETA: 51s - loss: 2.2439 - acc: 0.1879
53504/73257 [====================>.........] - ETA: 51s - loss: 2.2439 - acc: 0.1879
53632/73257 [====================>.........] - ETA: 50s - loss: 2.2439 - acc: 0.1878
53760/73257 [=====================>........] - ETA: 50s - loss: 2.2439 - acc: 0.1879
Can anyone help me figure out what I'm doing wrong? Are there any tips for working out why the accuracy would increase at the beginning as normal and then taper off so quickly?
I am using categorical cross-entropy with an rmsprop optimizer.
epochs: 20
batch_size: 128
image_size: 32x32
model = Sequential()
model.add(Convolution2D(32, (3, 3),
                        strides=1,
                        activation='relu',
                        padding='same',
                        input_shape=input_shape,
                        data_format='channels_last'))
model.add(Convolution2D(32, (3, 3), padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_last'))
model.add(Dropout(0.25))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(model.output_shape[1], activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
#METHOD1
# print('compiling model...')
# model.compile(loss='mean_squared_error',
# optimizer='sgd',
# metrics=['accuracy'])
# print('fitting model...')
#
# model.fit(X_train, y_train, batch_size=64, epochs=1, verbose=1)
# METHOD2
sgd = SGD(lr=0.05)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
model.fit(X_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(X_test, y_test, batch_size=128)
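One thing worth ruling out, assuming the labels were loaded from the cropped-digit .mat files: in that format the digit 0 is stored with label 10, so the raw labels run from 1 to 10 and need remapping before they are one-hot encoded into 10 classes. Below is a minimal sketch with made-up labels. The array `raw` is hypothetical, and `remap_svhn_labels` and `one_hot` are small NumPy helpers written for illustration, not Keras functions.

```python
import numpy as np

def remap_svhn_labels(raw_labels):
    """Flatten the (N, 1) label column and map label 10 (digit zero) to 0."""
    labels = np.asarray(raw_labels).reshape(-1).astype(np.int64)  # astype copies
    labels[labels == 10] = 0
    return labels

def one_hot(labels, num_classes=10):
    """One-hot encode integer class ids in the range [0, num_classes)."""
    return np.eye(num_classes, dtype=np.float32)[labels]

# Hypothetical raw labels in the 1..10 convention.
raw = np.array([[1], [10], [9], [10], [3]])
labels = remap_svhn_labels(raw)
y = one_hot(labels)

print(labels)   # [1 0 9 0 3]
print(y.shape)  # (5, 10)
```

If the labels were already remapped elsewhere in the pipeline, this step is unnecessary; printing `np.unique` of the raw labels is a cheap way to find out.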

Related

Getting Different results on Each Iteration using Long Short Term Memory[LSTM] for text classification

I am using an LSTM deep-learning model to classify my text. First I split the data into texts and labels using the pandas library, tokenize the texts, and then divide them into training and test sets. Whenever I run the code, I get different results, with accuracy varying from 80 to 100 percent.
Here is my code:
tokenizer = Tokenizer(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?#[\]^_`{|}~',
lower=True)
tokenizer.fit_on_texts(trainDF['texts'])
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
X = tokenizer.texts_to_sequences(trainDF['texts'])
X = pad_sequences(X, maxlen=MAX_SEQUENCE_LENGTH)
print('Shape of data tensor:', X.shape)
Y = pd.get_dummies(trainDF['label'])
print('Shape of label tensor:', Y.shape)
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.10, random_state = 42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
variables_for_classification=6 #change it as per your number of categories
model.add(Dense(variables_for_classification, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
epochs = 5
batch_size = 64
history = model.fit(X_train, Y_train, epochs=epochs,
                    batch_size=batch_size, validation_split=0.1,
                    callbacks=[EarlyStopping(monitor='val_loss', patience=3,
                                             min_delta=0.0001)])
accr = model.evaluate(X_test,Y_test)
print('Test set\n Loss: {:0.3f}\n Accuracy: {:0.3f}'.format(accr[0],accr[1]))
Train on 794 samples, validate on 89 samples
Epoch 1/5
794/794 [==============================] - 19s 24ms/step - loss: 1.6401 - accuracy: 0.6297 - val_loss: 0.9098 - val_accuracy: 0.5843
Epoch 2/5
794/794 [==============================] - 16s 20ms/step - loss: 0.8365 - accuracy: 0.7166 - val_loss: 0.7487 - val_accuracy: 0.7753
Epoch 3/5
794/794 [==============================] - 16s 20ms/step - loss: 0.7093 - accuracy: 0.8401 - val_loss: 0.6519 - val_accuracy: 0.8652
Epoch 4/5
794/794 [==============================] - 16s 20ms/step - loss: 0.5857 - accuracy: 0.8829 - val_loss: 0.4935 - val_accuracy: 1.0000
Epoch 5/5
794/794 [==============================] - 16s 20ms/step - loss: 0.4248 - accuracy: 0.9345 - val_loss: 0.3512 - val_accuracy: 0.8652
99/99 [==============================] - 0s 2ms/step
Test set
Loss: 0.348
Accuracy: 0.869
In the last run, the accuracy was 100 percent.
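Two things plausibly contribute to the run-to-run variation described above: unseeded random number generators (weight initialisation, dropout, shuffling) and very small evaluation sets. The sketch below is a hedged illustration, not a guaranteed fix. `seed_everything` and `accuracy_standard_error` are hypothetical helper names, and the deep-learning backend has its own seed call that is deliberately left out here so the snippet runs with NumPy alone.

```python
import math
import random

import numpy as np

def seed_everything(seed):
    """Seed the Python and NumPy generators.

    Assumption: the backend (TensorFlow, Theano, ...) keeps its own
    generator and must be seeded separately with its own API.
    """
    random.seed(seed)
    np.random.seed(seed)

def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy measured on n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)

# Re-seeding reproduces the same random draws.
seed_everything(42)
first = np.random.rand(3)
seed_everything(42)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# Test accuracy 0.869 on 99 samples, as in the log above.
se = accuracy_standard_error(0.869, 99)
print(round(se, 3))  # 0.034
```

With a standard error of about 3.4 percentage points on 99 test samples, swings of several points between runs are expected even before counting the randomness of training itself.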

Model loss remains unchanged

I would like to understand what could be responsible for this loss behaviour. When training a CNN with 6 hidden layers, the loss shoots up from around 1.8 to above 12 after the first epoch and remains constant for the remaining 99 epochs.
724504/724504 [==============================] - 358s 494us/step - loss: 1.8143 - acc: 0.7557 - val_loss: 16.1181 - val_acc: 0.0000e+00
Epoch 2/100
724504/724504 [==============================] - 355s 490us/step - loss: 12.0886 - acc: 0.2500 - val_loss: 16.1181 - val_acc: 0.0000e+00
Epoch 3/100
724504/724504 [==============================] - 354s 489us/step - loss: 12.0886 - acc: 0.2500 - val_loss: 16.1181 - val_acc: 0.0000e+00
Epoch 4/100
724504/724504 [==============================] - 348s 481us/step - loss: 12.0886 - acc: 0.2500 - val_loss: 16.1181 - val_acc: 0.0000e+00
Epoch 5/100
724504/724504 [==============================] - 355s 490us/step - loss: 12.0886 - acc: 0.2500 - val_loss: 16.1181 - val_acc: 0.0000e+00
I do not believe this has to do with the dataset I work with, because when I tried this with a different, publicly available dataset, the performance was exactly the same (in fact, the exact same figures for loss/accuracy).
I also tested this with a somewhat shallower network having 2 hidden layers; see the performance below:
724504/724504 [==============================] - 41s 56us/step - loss: 0.4974 - acc: 0.8236 - val_loss: 15.5007 - val_acc: 0.0330
Epoch 2/100
724504/724504 [==============================] - 40s 56us/step - loss: 0.5204 - acc: 0.8408 - val_loss: 15.5543 - val_acc: 0.0330
Epoch 3/100
724504/724504 [==============================] - 41s 56us/step - loss: 0.6646 - acc: 0.8439 - val_loss: 15.3904 - val_acc: 0.0330
Epoch 4/100
724504/724504 [==============================] - 41s 57us/step - loss: 8.8982 - acc: 0.4342 - val_loss: 15.5867 - val_acc: 0.0330
Epoch 5/100
724504/724504 [==============================] - 41s 57us/step - loss: 0.5627 - acc: 0.8444 - val_loss: 15.5449 - val_acc: 0.0330
Can someone point out the probable cause of this behaviour? What parameter or configuration needs to be adjusted?
EDIT
Model creation
model = Sequential()
activ = 'relu'
model.add(Conv2D(32, (1, 3), strides=(1, 1), padding='same', activation=activ, input_shape=(1, n_points, 4)))
model.add(Conv2D(32, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(MaxPooling2D(pool_size=(1, 2)))
#model.add(Dropout(.5))
model.add(Conv2D(64, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(Conv2D(64, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(MaxPooling2D(pool_size=(1, 2)))
#model.add(Dropout(.5))
model.add(Conv2D(128, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(Conv2D(128, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(MaxPooling2D(pool_size=(1, 2)))
model.add(Dropout(.5))
model.add(Flatten())
A = model.output_shape
model.add(Dense(int(A[1] * 1/4.), activation=activ))
model.add(Dropout(.5))
model.add(Dense(NoClass, activation='softmax'))
optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_reample, Y_resample, epochs=100, batch_size=64, shuffle=False,
validation_data=(Test_X, Test_Y))
Changing the learning rate to lr=0.0001 here's the result after 100 epochs.
72090/72090 [==============================] - 29s 397us/step - loss: 0.5040 - acc: 0.8347 - val_loss: 4.3529 - val_acc: 0.2072
Epoch 99/100
72090/72090 [==============================] - 28s 395us/step - loss: 0.4958 - acc: 0.8382 - val_loss: 6.3422 - val_acc: 0.1806
Epoch 100/100
72090/72090 [==============================] - 28s 393us/step - loss: 0.5084 - acc: 0.8342 - val_loss: 4.3781 - val_acc: 0.1925
the optimal epoch size: 97, the value of high accuracy 0.20716827656581954
EDIT 2
Apparently, SMOTE isn't good for oversampling all but the majority class in a multi-class problem; see the train/test plot below:
Could you also try BatchNormalization, placed just after your pooling layers? It is good to include it.
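The constant figures in the logs above match what categorical cross-entropy yields when the predicted probability of the true class is clipped at a small epsilon. The arithmetic sketch below assumes the default Keras epsilon of 1e-7; `clipped_cross_entropy` is a hypothetical helper written for illustration, not a Keras function.

```python
import math

EPSILON = 1e-7  # assumed clipping value (the default Keras fuzz factor)

def clipped_cross_entropy(p_true, eps=EPSILON):
    """Cross-entropy of one sample given the probability assigned to its true class."""
    p = min(max(p_true, eps), 1.0 - eps)
    return -math.log(p)

# Loss when the true class receives (almost) zero probability.
worst = clipped_cross_entropy(0.0)
print(round(worst, 4))         # 16.1181, the val_loss in the first log

# Fully confident predictions that are wrong on 75% of the samples.
print(round(0.75 * worst, 4))  # 12.0886, the training loss with acc 0.25
```

Read this way, a val_loss of 16.1181 with val_acc 0 would mean the true class gets essentially zero probability on every validation sample, and a training loss of 12.0886 with acc 0.25 is consistent with the network having collapsed to one class predicted with full confidence. That points towards checking the class distribution of the validation set and the ordering of the resampled training data (note shuffle=False) before tuning anything else.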

Accuracy doesn't change over all epochs with multi-class classification

I am trying to train a model to solve a multi-class classification problem.
My problem is that the training accuracy and validation accuracy don't change over the epochs, like this:
Train on 4642 samples, validate on 516 samples
Epoch 1/100
- 1s - loss: 1.7986 - acc: 0.4649 - val_loss: 1.7664 - val_acc: 0.4942
Epoch 2/100
- 1s - loss: 1.6998 - acc: 0.5017 - val_loss: 1.7035 - val_acc: 0.4942
Epoch 3/100
- 1s - loss: 1.6956 - acc: 0.5022 - val_loss: 1.7000 - val_acc: 0.4942
Epoch 4/100
- 1s - loss: 1.6900 - acc: 0.5022 - val_loss: 1.6954 - val_acc: 0.4942
Epoch 5/100
- 1s - loss: 1.6931 - acc: 0.5017 - val_loss: 1.7058 - val_acc: 0.4942
...
Epoch 98/100
- 1s - loss: 1.6842 - acc: 0.5022 - val_loss: 1.6995 - val_acc: 0.4942
Epoch 99/100
- 1s - loss: 1.6844 - acc: 0.5022 - val_loss: 1.6977 - val_acc: 0.4942
Epoch 100/100
- 1s - loss: 1.6838 - acc: 0.5022 - val_loss: 1.6934 - val_acc: 0.4942
My code with keras:
y_train = to_categorical(y_train, num_classes=11)
X_train, X_test, Y_train, Y_test = train_test_split(x_train, y_train,
test_size=0.1, random_state=42)
model = Sequential()
model.add(Dense(64, init='normal', activation='relu', input_dim=160))
model.add(Dropout(0.3))
model.add(Dense(32, init='normal', activation='relu'))
model.add(BatchNormalization())
model.add(Dense(11, init='normal', activation='softmax'))
model.summary()
print("[INFO] compiling model...")
model.compile(optimizer=keras.optimizers.Adam(lr=0.01, beta_1=0.9,
beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False),
loss='categorical_crossentropy',
metrics=['accuracy'])
print("[INFO] training network...")
model.fit(X_train, Y_train, epochs=100, batch_size=32, verbose=2, validation_data = (X_test, Y_test))
Please help me. Thank you!
I had a similar problem once. For me, three things helped: making sure I didn't have too many missing values in x_train (filling them with a value representing unknown or with the median), dropping columns that really didn't help (all rows had the same value), and normalizing the x_train data.
An example from my data/model:
# load data
x_main = pd.read_csv("glioma DB X.csv")
y_main = pd.read_csv("glioma DB Y.csv")
# fill with median (will have to improve later, not done yet)
fill_median =['Surgery_SBRT','df','Dose','Ki67','KPS','BMI','tumor_size']
x_main[fill_median] = x_main[fill_median].fillna(x_main[fill_median].median())
x_main['Neurofc'] = x_main['Neurofc'].fillna(2)
x_main['comorbid'] = x_main['comorbid'].fillna(int(x_main['comorbid'].median()))
# drop surgery
x_main = x_main.drop(['Surgery'], axis=1)
# normalize all x
x_main_normalized = x_main.apply(lambda x: (x-np.mean(x))/(np.std(x)+1e-10))
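An accuracy that sits at one value for 100 epochs is often the share of the most frequent class, meaning the network predicts that class for every input. A minimal check, assuming one-hot labels like those produced by to_categorical, might look like the sketch below. The label array is made up, and `majority_baseline` is a hypothetical helper name.

```python
import numpy as np

def majority_baseline(one_hot_labels):
    """Return (most frequent class id, accuracy of always predicting it)."""
    class_ids = np.argmax(one_hot_labels, axis=1)
    counts = np.bincount(class_ids, minlength=one_hot_labels.shape[1])
    return int(np.argmax(counts)), counts.max() / counts.sum()

# Hypothetical labels: class 0 makes up half of an 11-class dataset.
class_ids = np.array([0] * 50 + list(range(1, 11)) * 5)
y = np.eye(11)[class_ids]

majority_class, baseline = majority_baseline(y)
print(majority_class, baseline)  # 0 0.5
```

If the baseline computed on the real Y_train and Y_test comes out close to 0.5022 and 0.4942, the model has learned nothing beyond the class prior, and input scaling like the normalization shown above is a reasonable first thing to try.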

CNN learning stagnation

I have created a simulation of the CNN I am trying to use on a video data set.
I set the test data so that every frame is one single image for positive examples and all zeros for negative examples. I thought the network would learn this very quickly, but the loss does not move at all.
Using current versions of Keras & Tensorflow on Windows 10 64bit.
First question, is my logic wrong? Should I expect the learning of this test data to quickly reach high accuracy?
Is there something wrong with my model or parameters? I have been trying a number of changes but still get the same problem.
Is the sample size (56) too small?
# testing feature extraction model.
import time
import numpy as np, cv2
import sys
import os
import keras
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv3D, MaxPooling3D
from keras.optimizers import SGD,rmsprop, adam
from keras import regularizers
from keras.initializers import Constant
from keras.models import Model
#set gpu options
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=.99, allocator_type = 'BFC')
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
config = tf.ConfigProto()
batch_size = 5
num_classes = 1
epochs = 50
nvideos = 56
nframes = 55
nchan = 3
nrows = 480
ncols = 640
#load any single image, resize if needed
img = cv2.imread('C:\\Users\\david\\Documents\\AutonomousSS\\single frame.jpg',cv2.IMREAD_COLOR)
img = cv2.resize(img,(640,480))
x_learn = np.random.randint(0,255,(nvideos,nframes,nrows,ncols,nchan),dtype=np.uint8)
y_learn = np.array([[1],[1],[1],[0],[1],[0],[1],[0],[1],[0],
[1],[0],[0],[1],[0],[0],[1],[0],[1],[0],
[1],[0],[1],[1],[0],[1],[0],[0],[1],[1],
[1],[0],[1],[0],[1],[0],[1],[0],[1],[0],
[0],[1],[0],[0],[1],[0],[1],[0],[1],[0],
[1],[1],[0],[1],[0],[0]],np.uint8)
#each sample, each frame is either the single image for postive examples or 0 for negative examples.
for i in range(nvideos):
    if y_learn[i] == 0:
        x_learn[i] = 0
    else:
        x_learn[i, :nframes] = img
#build model
m_loss = 'mean_squared_error'
m_opt = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
m_met = 'acc'
model = Sequential()
# 1st layer group
model.add(Conv3D(32, (3, 3,3), activation='relu',padding="same", name="conv1a", strides=(3, 3, 3),
kernel_initializer = 'glorot_normal',
trainable=False,
input_shape=(nframes,nrows,ncols,nchan)))
#model.add(BatchNormalization(axis=1))
model.add(Conv3D(32, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv1b", activation="relu"))
#model.add(BatchNormalization(axis=1))
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool1", strides=(2, 2, 2)))
# 2nd layer group
model.add(Conv3D(128, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv2a", activation="relu"))
model.add(Conv3D(128, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv2b", activation="relu"))
#model.add(BatchNormalization(axis=1))
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool2", strides=(2, 2, 2)))
# 3rd layer group
model.add(Conv3D(256, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv3a", activation="relu"))
model.add(Conv3D(256, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv3b", activation="relu"))
#model.add(BatchNormalization(axis=1))
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool3", strides=(2, 2, 2)))
# 4th layer group
model.add(Conv3D(512, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv4a", activation="relu"))
model.add(Conv3D(512, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv4b", activation="relu"))
#model.add(BatchNormalization(axis=1))
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool4", strides=(2, 2, 2)))
model.add(Flatten(name='flatten',trainable=False))
model.add(Dense(512,activation='relu', trainable=True,name='den0'))
model.add(Dense(num_classes,activation='softmax',name='den1'))
print (model.summary())
#compile model
model.compile(loss=m_loss,
optimizer=m_opt,
metrics=[m_met])
print ('compiled')
#set callbacks
from keras import backend as K
K.set_learning_phase(0) #set learning phase
tb = keras.callbacks.TensorBoard(log_dir=sample_root_path+'logs', histogram_freq=0,
write_graph=True, write_images=False)
tb.set_model(model)
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.2,verbose=1,
patience=2, min_lr=0.000001)
reduce_lr.set_model(model)
ear_stop = keras.callbacks.EarlyStopping(monitor='loss', min_delta=0, patience=4, verbose=1, mode='auto')
ear_stop.set_model(model)
#fit
history = model.fit(x_learn, y_learn,
                    batch_size=batch_size,
                    callbacks=[reduce_lr, tb, ear_stop],
                    verbose=1,
                    validation_split=0.1,
                    shuffle=True,
                    epochs=epochs)
score = model.evaluate(x_learn, y_learn, batch_size=batch_size)
print(str(model.metrics_names) + ": " + str(score))
As usual, thanks for any and all help.
added output...
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1a (Conv3D) (None, 19, 160, 214, 32) 2624
_________________________________________________________________
conv1b (Conv3D) (None, 19, 160, 214, 32) 27680
_________________________________________________________________
pool1 (MaxPooling3D) (None, 10, 78, 105, 32) 0
_________________________________________________________________
conv2a (Conv3D) (None, 10, 78, 105, 128) 110720
_________________________________________________________________
conv2b (Conv3D) (None, 10, 78, 105, 128) 442496
_________________________________________________________________
pool2 (MaxPooling3D) (None, 5, 37, 51, 128) 0
_________________________________________________________________
conv3a (Conv3D) (None, 5, 37, 51, 256) 884992
_________________________________________________________________
conv3b (Conv3D) (None, 5, 37, 51, 256) 1769728
_________________________________________________________________
pool3 (MaxPooling3D) (None, 3, 17, 24, 256) 0
_________________________________________________________________
conv4a (Conv3D) (None, 3, 17, 24, 512) 3539456
_________________________________________________________________
conv4b (Conv3D) (None, 3, 17, 24, 512) 7078400
_________________________________________________________________
pool4 (MaxPooling3D) (None, 2, 7, 10, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 71680) 0
_________________________________________________________________
den0 (Dense) (None, 512) 36700672
_________________________________________________________________
den1 (Dense) (None, 1) 513
=================================================================
Total params: 50,557,281
Trainable params: 36,701,185
Non-trainable params: 13,856,096
_________________________________________________________________
None
compiled
Train on 50 samples, validate on 6 samples
Epoch 1/50
50/50 [==============================] - 20s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 2/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 3/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 4/50
45/50 [==========================>...] - ETA: 1s - loss: 0.5111 - acc: 0.4889
Epoch 00003: reducing learning rate to 0.00020000000949949026.
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 5/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 6/50
45/50 [==========================>...] - ETA: 1s - loss: 0.5111 - acc: 0.4889
Epoch 00005: reducing learning rate to 4.0000001899898055e-05.
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 7/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 8/50
45/50 [==========================>...] - ETA: 1s - loss: 0.4889 - acc: 0.5111
Epoch 00007: reducing learning rate to 8.000000525498762e-06.
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 9/50
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000
Epoch 00008: early stopping
56/56 [==============================] - 12s
['loss', 'acc']: [0.50000001516725334, 0.5000000127724239]
Your layers are set to trainable=False (apart from the dense layers), so the convolutional part of your CNN cannot learn. In addition, you won't be able to train on just a single sample.
If you run into performance issues on your GPU, switch to CPU or AWS, or reduce your image size.
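A further point worth checking in the model above is the output layer: softmax normalises across the units of a layer, so with a single unit (num_classes = 1) the output is 1.0 whatever the logit is. The NumPy sketch below illustrates this with made-up logits and a balanced set of made-up labels; `softmax` here is a hand-written helper, not the Keras activation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Four samples, one output unit each, with very different logits.
logits = np.array([[-3.0], [0.0], [7.5], [120.0]])
probs = softmax(logits)
print(probs.ravel())  # [1. 1. 1. 1.]

# With half the labels equal to 0, the mean squared error is stuck at 0.5.
y_true = np.array([[1.0], [0.0], [1.0], [0.0]])
mse = np.mean((y_true - probs) ** 2)
print(mse)  # 0.5
```

This matches the constant loss of 0.5000 and accuracy of 0.5000 in the log. For a single output unit, a sigmoid activation is the usual choice, since it can produce values between 0 and 1.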

LSTM labeling all samples as the same class

I'm trying to design an LSTM network using Keras to combine word embeddings and other features in a binary classification setting. My test set contains 250 samples per class.
When I run my model using only the word embedding layers (the "model" layer in the code), I get an average F1 of around 0.67. When I create a new branch with the other features of fixed size that I compute separately ("branch2") and merge these with the word embeddings using "concat", the predictions all revert to a single class (giving perfect recall for that class), and average F1 drops to 0.33.
Am I adding in the features and training/testing incorrectly?
def create_model(embedding_index, sequence_features, optimizer='rmsprop'):
    # Branch 1: word embeddings
    model = Sequential()
    embedding_layer = create_embedding_matrix(embedding_index, word_index)
    model.add(embedding_layer)
    model.add(Convolution1D(nb_filter=32, filter_length=3, border_mode='same', activation='tanh'))
    model.add(MaxPooling1D(pool_length=2))
    model.add(Bidirectional(LSTM(100)))
    model.add(Dropout(0.2))
    model.add(Dense(2, activation='sigmoid'))
    # Branch 2: other features
    branch2 = Sequential()
    dim = sequence_features.shape[1]
    branch2.add(Dense(15, input_dim=dim, init='normal', activation='tanh'))
    branch2.add(BatchNormalization())
    # Merging branches to create final model
    final_model = Sequential()
    final_model.add(Merge([model, branch2], mode='concat'))
    final_model.add(Dense(2, init='normal', activation='sigmoid'))
    final_model.compile(loss='categorical_crossentropy', optimizer=optimizer,
                        metrics=['accuracy', 'precision', 'recall', 'fbeta_score', 'fmeasure'])
    return final_model
def run(input_train, input_dev, input_test, text_col, label_col, resfile, embedding_index):
    # Processing text and features
    data_train, labels_train, data_test, labels_test = vectorize_text(input_train, input_test, text_col, label_col)
    x_train, y_train = data_train, labels_train
    x_test, y_test = data_test, labels_test
    seq_train = get_sequence_features(input_train).as_matrix()
    seq_test = get_sequence_features(input_test).as_matrix()
    # Generating model
    filepath = lstm_config.WEIGHTS_PATH
    checkpoint = ModelCheckpoint(filepath, monitor='val_fmeasure', verbose=1, save_best_only=True, mode='max')
    callbacks_list = [checkpoint]
    model = create_model(embedding_index, seq_train)
    model.fit([x_train, seq_train], y_train, validation_split=0.33, nb_epoch=3, batch_size=100, callbacks=callbacks_list, verbose=1)
    # Evaluating
    scores = model.evaluate([x_test, seq_test], y_test, verbose=1)
    time.sleep(0.2)
    preds = model.predict_classes([x_test, seq_test])
    preds = to_categorical(preds)
    print(metrics.f1_score(y_true=y_test, y_pred=preds, average="micro"))
    print(metrics.f1_score(y_true=y_test, y_pred=preds, average="macro"))
    print(metrics.classification_report(y_test, preds))
Output:
Using Theano backend. Found 2999999 word vectors.
Processing text dataset Found 7165 unique tokens.
Shape of data tensor: (1996, 50)
Shape of label tensor: (1996, 2)
1996 train 500 test
Train on 1337 samples, validate on 659 samples
Epoch 1/3 1300/1337
[============================>.] - ETA: 0s - loss: 0.6767 - acc:
0.6669 - precision: 0.5557 - recall: 0.6815 - fbeta_score: 0.6120 - fmeasure: 0.6120Epoch 00000: val_fmeasure im1337/1337
[==============================] - 10s - loss: 0.6772 - acc: 0.6672 -
precision: 0.5551 - recall: 0.6806 - fbeta_score: 0.6113 - fmeasure:
0.6113 - val_loss: 0.7442 - val_acc: 0 .0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00
Epoch 2/3 1300/1337
[============================>.] - ETA: 0s - loss: 0.6634 - acc:
0.7269 - precision: 0.5819 - recall: 0.7292 - fbeta_score: 0.6462 - fmeasure: 0.6462Epoch 00001: val_fmeasure di1337/1337
[==============================] - 9s - loss: 0.6634 - acc: 0.7263 -
precision: 0.5830 - recall: 0.7300 - fbeta_score: 0.6472 - fmeasure:
0.6472 - val_loss: 0.7616 - val_acc: 0. 0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00
Epoch 3/3 1300/1337
[============================>.] - ETA: 0s - loss: 0.6542 - acc:
0.7354 - precision: 0.5879 - recall: 0.7308 - fbeta_score: 0.6508 - fmeasure: 0.6508Epoch 00002: val_fmeasure di1337/1337
[==============================] - 8s - loss: 0.6545 - acc: 0.7337 -
precision: 0.5866 - recall: 0.7307 - fbeta_score: 0.6500 - fmeasure:
0.6500 - val_loss: 0.7801 - val_acc: 0. 0000e+00 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_fbeta_score: 0.0000e+00 - val_fmeasure: 0.0000e+00 500/500 [==============================] - 0s
500/500 [==============================] - 1s
0.5 /usr/local/lib/python3.4/dist-packages/sklearn/metrics/classification.py:1074:
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in
labels with no predicted samples. 'precision', 'predicted', average,
warn_for)
0.333333333333 /usr/local/lib/python3.4/dist-packages/sklearn/metrics/classification.py:1074:
UndefinedMetricWarning: Precision and F-score are ill-defined and
being set to 0.0 in labels with no predicted samples.
             precision    recall  f1-score   support
          0       0.00      0.00      0.00       250
          1       0.50      1.00      0.67       250
avg / total       0.25      0.50      0.33       500
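A val_acc of exactly 0 next to a training accuracy around 0.73 is what one would expect if the validation slice contained only one class. Keras takes the validation_split fraction from the end of the arrays before any shuffling, so if the training file is sorted by label, the last 33% could be entirely the class the model never predicts. A hedged sketch of shuffling all inputs and labels with one shared permutation before calling fit follows. The arrays are made-up stand-ins, and `shuffle_in_unison` is a hypothetical helper.

```python
import numpy as np

def shuffle_in_unison(arrays, seed=0):
    """Apply one random permutation to every array so rows stay aligned."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of rows")
    perm = np.random.default_rng(seed).permutation(n)
    return [a[perm] for a in arrays]

# Hypothetical stand-ins for x_train, seq_train and y_train, sorted by label.
x_text = np.arange(10).reshape(10, 1)
x_feat = np.arange(10).reshape(10, 1) * 10
y = np.array([[1, 0]] * 5 + [[0, 1]] * 5)

x_text_s, x_feat_s, y_s = shuffle_in_unison([x_text, x_feat, y])
print(y_s[-3:])  # the tail that a validation split would take
```

After shuffling, the tail of the arrays is a random sample rather than a block of one class, so the validation metrics become meaningful. Whether label ordering is really the cause here is an assumption that can be checked by printing the class counts of the last third of y_train.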
