I've built a recurrent neural network to predict time series data. I seemed to be getting reasonable results: it would start at 33% accuracy for three classes, as expected, and improve up to a point, also as expected. To make sure the network is actually working, I created a basic input/output set as follows:
in1 in2 in3 out
1 2 3 0
5 6 7 1
9 10 11 2
1 2 3 0
5 6 7 1
9 10 11 2
I copied this pattern through a million lines in a CSV. I would have assumed the neural network could easily identify this pattern, as it's always the same. I've tried learning rates of .1, .01, .001, and .0001, but it always stays at about 33% (34% for the .0001 lr). I'll post my code below; should the neural net easily be able to identify this, or is there something very wrong with my setup?
import os
import pandas as pd
from sklearn import preprocessing
from collections import deque
import random
import numpy as np
import time
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, CuDNNLSTM, BatchNormalization
from tensorflow.keras.models import load_model
df = pd.read_csv('/home/drew/Desktop/numbers.csv')
# hold out the last 10% of rows for validation
times = sorted(df.index.values)
last_5pct = times[-int(0.1*len(times))]
validation_df = df[(df.index >= last_5pct)]
main_df = df[(df.index < last_5pct)]
epochs = 10
batch_size = 64
train_x = main_df[['in1','in2','in3']]
validation_x = validation_df[['in1','in2','in3']]
train_y = main_df[['out']]
validation_y = validation_df[['out']]
train_x = train_x.values
validation_x = validation_x.values
# reshape inputs to (samples, timesteps=1, features=3) for the LSTM
train_x = train_x.reshape(train_x.shape[0],1,3)
validation_x = validation_x.reshape(validation_x.shape[0],1,3)
model = Sequential()
model.add(LSTM(128, input_shape=(train_x.shape[1:]), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(LSTM(128, input_shape=(train_x.shape[1:]), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(32, activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(3, activation="softmax"))
opt = tf.keras.optimizers.Adam(lr=0.0001, decay=1e-6)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
model.fit(train_x, train_y,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(validation_x, validation_y))
result:
Train on 943718 samples, validate on 104857 samples
Epoch 1/10
943718/943718 [==============================] - 125s 132us/sample - loss: 0.0040 - acc: 0.3436 - val_loss: 2.3842e-07 - val_acc: 0.3439
Epoch 2/10
943718/943718 [==============================] - 111s 118us/sample - loss: 2.1557e-06 - acc: 0.3437 - val_loss: 2.3842e-07 - val_acc: 0.3435
...
Epoch 6/10
719104/943718 [=====================>........] - ETA: 25s - loss: 2.4936e-07 - acc: 0.3436
It seems that you are not predicting a sequence as output, so there is no need to return sequences from the second LSTM layer.
If you change the second LSTM to:
model.add(LSTM(128, input_shape=(train_x.shape[1:])))
then you should get high accuracy.
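For reference, a minimal sketch of the full corrected model under that change (same layers as in the question; only the second LSTM is altered):
model = Sequential()
model.add(LSTM(128, input_shape=(train_x.shape[1:]), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(LSTM(128, input_shape=(train_x.shape[1:])))  # no return_sequences: output a single vector per sample
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(32, activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(3, activation="softmax"))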
Overview:
During transfer-learning training with a ResNet, using the test data as the validation data, the VALIDATION performance metrics for the last epoch (15) are: binary accuracy of 0.85, f1 of 0.84, precision of 0.84, and recall of 0.85. However, after training and fine-tuning, the model's predictions on the test set yield a poor confusion matrix:
[[223, 277]
[233, 267]]
Training Data:
Perfectly balanced with 2000 positive and 2000 negative samples of retinal fundus images.
Validation/Testing Data:
Perfectly balanced with 500 positive and 500 negative samples of retinal fundus images.
Data Splitting:
train: 2-column dataframe with 4000 retinal fundus image paths and 4000 labels (the types column)
test: 2-column dataframe with 1000 retinal fundus image paths and 1000 labels (the types column)
Generator Code:
from tensorflow.keras.applications import ResNet50V2
from tensorflow.keras.applications.resnet_v2 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator
target = 512
trainDataGen = ImageDataGenerator(preprocessing_function=preprocess_input, rotation_range=30, horizontal_flip=True, vertical_flip=False,shear_range = 0.2,zoom_range = 0.2,brightness_range=(0.8, 1.2))
trainGen = trainDataGen.flow_from_dataframe(dataframe=train, batch_size = 16, shuffle=True, x_col="fundus", y_col="types", class_mode="binary", validate_filenames='True', target_size=(target, target), directory=None, color_mode='rgb')
testDataGen = ImageDataGenerator(preprocessing_function=preprocess_input)
testGen = testDataGen.flow_from_dataframe(dataframe=test, x_col="fundus", y_col="types", class_mode="binary", validate_filenames='True', target_size=(target, target), directory=None, color_mode='rgb')
Example Model:
from keras.layers import Dropout, BatchNormalization, GlobalAveragePooling2D, Input, Flatten, Dense
from keras.models import Model
base_model = ResNet50V2(weights='imagenet', include_top=False, input_shape=(target,target,3))
for layer in base_model.layers[:-4]:
    layer.trainable = False
for layer in base_model.layers[-4:]:
    layer.trainable = True
flatten = Flatten() (base_model.output)
flatten = Dropout(0.75) (flatten)
flatten = Dense(512, activation='relu') (flatten)
predictions = Dense(1, activation='sigmoid') (flatten)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['binary_accuracy',f1_m,precision_m, recall_m])
history = model.fit_generator(trainGen, class_weight = {0:1 , 1:1}, epochs=15, validation_freq=1, validation_data=testGen)
Training Log Sample (prior to fine tuning):
Epoch 1/5
250/250 [==============================] - 643s 3s/step - loss: 1.6509 - binary_accuracy: 0.7312 - f1_m: 0.7095 - precision_m: 0.7429 - recall_m: 0.7319 - val_loss: 0.3843 - val_binary_accuracy: 0.8030 - val_f1_m: 0.7933 - val_precision_m: 0.8162 - val_recall_m: 0.7771
Epoch 2/5
250/250 [==============================] - 607s 2s/step - loss: 0.4188 - binary_accuracy: 0.8090 - f1_m: 0.7974 - precision_m: 0.8190 - recall_m: 0.8027 - val_loss: 0.3680 - val_binary_accuracy: 0.8160 - val_f1_m: 0.8118 - val_precision_m: 0.7705 - val_recall_m: 0.8718
Epoch 3/5
250/250 [==============================] - 606s 2s/step - loss: 0.3929 - binary_accuracy: 0.8278 - f1_m: 0.8113 - precision_m: 0.8542 - recall_m: 0.7999 - val_loss: 0.5049 - val_binary_accuracy: 0.7720 - val_f1_m: 0.8026 - val_precision_m: 0.7053 - val_recall_m: 0.9474
Epoch 4/5
250/250 [==============================] - 602s 2s/step - loss: 0.3491 - binary_accuracy: 0.8465 - f1_m: 0.8342 - precision_m: 0.8836 - recall_m: 0.8129 - val_loss: 0.3410 - val_binary_accuracy: 0.8350 - val_f1_m: 0.8425 - val_precision_m: 0.8038 - val_recall_m: 0.8948
Epoch 5/5
250/250 [==============================] - 617s 2s/step - loss: 0.3321 - binary_accuracy: 0.8480 - f1_m: 0.8335 - precision_m: 0.8705 - recall_m: 0.8187 - val_loss: 0.3538 - val_binary_accuracy: 0.8530 - val_f1_m: 0.8440 - val_precision_m: 0.9173 - val_recall_m: 0.7881
Model Evaluation:
from sklearn.metrics import confusion_matrix
import numpy as np
y_true = np.asarray(testGen.classes)
prediction = model.predict(testGen, verbose=1)
confusion = confusion_matrix(y_true, np.rint(prediction))
Summary:
Since the validation and test data were the same, I expected similar results. However, the large performance difference and poor confusion matrix are confusing :). Assuming the code is error-free, should this be expected when using the same data for validation and testing (despite both being unseen)?
The default behavior of flow_from_dataframe is shuffle=True, which should not be used for validation or testing data generators. If you specify shuffle=False when creating testGen, the confusion matrix will accurately show the performance results.
Please see related thread: Should I shuffle the test image dataset in ImageDataGenerator? I have different results with False and True
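For example, a minimal sketch of the adjusted test generator, reusing the arguments from the question and only adding shuffle=False:
testDataGen = ImageDataGenerator(preprocessing_function=preprocess_input)
testGen = testDataGen.flow_from_dataframe(dataframe=test, x_col="fundus", y_col="types", class_mode="binary", shuffle=False, validate_filenames='True', target_size=(target, target), directory=None, color_mode='rgb')
With shuffle=False, the order of testGen.classes matches the order of the predictions returned by model.predict(testGen), so the confusion matrix is computed against the correct labels.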
I'm training a model over several iterations (training, saving, and training again). On the second iteration my val_loss reached into the millions for some reason. Is something wrong with how I'm importing the model?
This is how I saved my initial model after my first run:
model.save('/content/drive/My Drive/Colab Notebooks/path/to/save/locaiton',save_format='tf')
and this is how I'm importing and overwriting it:
def retrainmodel(model_path, tr_path, v_path):
    image_size = 224
    BATCH_SIZE_TRAINING = 10
    BATCH_SIZE_VALIDATION = 10
    BATCH_SIZE_TESTING = 1
    EARLY_STOP_PATIENCE = 6
    STEPS_PER_EPOCH_TRAINING = 10
    STEPS_PER_EPOCH_VALIDATION = 10
    NUM_EPOCHS = 20
    model = tf.keras.models.load_model(model_path)
    data_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
    train_generator = data_generator.flow_from_directory(tr_path,
                                                         target_size=(image_size, image_size),
                                                         batch_size=BATCH_SIZE_TRAINING,
                                                         class_mode='categorical')
    validation_generator = data_generator.flow_from_directory(v_path,
                                                              target_size=(image_size, image_size),
                                                              batch_size=BATCH_SIZE_VALIDATION,
                                                              class_mode='categorical')
    cb_early_stopper = EarlyStopping(monitor='val_loss', patience=EARLY_STOP_PATIENCE)
    cb_checkpointer = ModelCheckpoint(filepath='path/to/checkpoint/folder', monitor='val_loss', save_best_only=True, mode='auto')
    fit_history = model.fit(
        train_generator,
        steps_per_epoch=STEPS_PER_EPOCH_TRAINING,
        epochs=NUM_EPOCHS,
        validation_data=validation_generator,
        validation_steps=STEPS_PER_EPOCH_VALIDATION,
        callbacks=[cb_checkpointer, cb_early_stopper]
    )
    model.save('/content/drive/My Drive/Colab Notebooks/path/to/save/locaiton', save_format='tf')
This is my output after passing my directories to this function:
Found 1421 images belonging to 5 classes.
Found 305 images belonging to 5 classes.
Epoch 1/20
10/10 [==============================] - 233s 23s/step - loss: 2.3330 - acc: 0.7200 - val_loss: 4.6237 - val_acc: 0.4400
Epoch 2/20
10/10 [==============================] - 171s 17s/step - loss: 2.7988 - acc: 0.5900 - val_loss: 56996.6289 - val_acc: 0.6800
Epoch 3/20
10/10 [==============================] - 159s 16s/step - loss: 1.2776 - acc: 0.6800 - val_loss: 8396707.0000 - val_acc: 0.6500
Epoch 4/20
10/10 [==============================] - 144s 14s/step - loss: 1.4562 - acc: 0.6600 - val_loss: 2099639.7500 - val_acc: 0.7200
Epoch 5/20
10/10 [==============================] - 126s 13s/step - loss: 1.0970 - acc: 0.7033 - val_loss: 50811.5781 - val_acc: 0.7300
Epoch 6/20
10/10 [==============================] - 127s 13s/step - loss: 0.7326 - acc: 0.8000 - val_loss: 84781.5703 - val_acc: 0.7000
Epoch 7/20
10/10 [==============================] - 110s 11s/step - loss: 1.2356 - acc: 0.7100 - val_loss: 1000.2982 - val_acc: 0.7300
Here is my optimizer:
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['acc'])
Where do you think I'm going wrong?
I am training my model in batches because I'm working on Google Colab with 22K images in total, so these results are after feeding the network 2800 training images. Do you think it will sort itself out if I feed it more images, or is something seriously wrong?
It's not good to have a loss like this. It's normal to see a somewhat higher loss for the first few epochs after loading and retraining a model, but the loss should not shoot to the stars as in your case. If the loss was roughly 0.5 at the time of saving, then when you load the same model for retraining it should not be more than about 10x that value, so a value of 5 ± 1 would be expected. [NOTE: this is purely based on experience; there is no general method to know the loss beforehand.]
If your loss is that high, the following causes are plausible:
A varying dataset: changing the composition of the training data between runs could push the model into this behavior.
The model save may have altered the weights.
Solutions Suggested:
Try using save_weights instead of the save method on the model:
model.save_weights('path/to/filename.h5')
Also, use load_weights instead of load_model:
model = call_cnn_function_to_build_model()
model.compile(... your args ...)
model.load_weights('path/to/filename.h5')  # load_weights restores weights in place; don't reassign model
Since you have checkpoints, try using the checkpointed models: rather than the final model, load a checkpoint saved near your last epochs.
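For instance, a minimal sketch of reloading the checkpointed model (the path is the placeholder used in the question; point it at your actual checkpoint location):
# load the best model written by the ModelCheckpoint callback instead of the final saved model
best_model = tf.keras.models.load_model('path/to/checkpoint/folder')
Since the checkpoint was saved with save_best_only=True, this gives you the weights from the epoch with the lowest val_loss.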
PS: Corrections gratefully accepted.
I've been thinking about 0-padding of word sequences and how that padding is then passed to the Embedding layer. At first glance, one would think that you want to keep the corresponding embeddings at 0.0 as well. However, the Embedding layer in Keras generates random values for any input token, and there is no way to force it to generate 0.0s. Note that mask_zero does something different; I've already checked.
One might ask, why worry about this? The code seems to work even when the embeddings are not 0.0, as long as they are the same. So I came up with an example, albeit somewhat contrived, where setting the embeddings of the 0-padded token to 0.0 makes a difference.
I used the 20 Newsgroups data set (from sklearn.datasets import fetch_20newsgroups). I do some minimal preprocessing: removal of punctuation, stopwords, and numbers. I use from keras.preprocessing.sequence import pad_sequences for the 0-padding. I split the ~18K posts into training and validation sets with a training/validation ratio of 4/1.
I create a simple network with one dense hidden layer, where the input is the flattened sequence of embeddings:
EMBEDDING_DIM = 300
MAX_SEQUENCE_LENGTH = 1100
layer_size = 25
dropout = 0.3
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='dnn_input')
embedding_layer = Embedding(len(word_index) + 1, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH, name = 'embedding_dnn')
embedded_sequences = embedding_layer(sequence_input)
x = Flatten(name='flatten_dnn')(embedded_sequences)
x = Dense(layer_size, activation='relu', name ='hidden_dense_dnn')(x)
x = Dropout(dropout, name='dropout')(x)
preds = Dense(num_labels, activation='softmax', name = 'output_dnn')(x)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
The model has about 14M trainable parameters (this example is a bit contrived, as I've already mentioned).
When I train it
earlystop = EarlyStopping(monitor='val_loss', patience=5)
history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=30, batch_size=BATCH_SIZE, callbacks=[earlystop])
it looks like for 4 epochs the algorithm is struggling to find its way out of the 'randomness':
Train on 15048 samples, validate on 3798 samples
Epoch 1/30
15048/15048 [==============================] - 58s 4ms/step - loss: 3.1118 - acc: 0.0519 - val_loss: 2.9894 - val_acc: 0.0534
Epoch 2/30
15048/15048 [==============================] - 56s 4ms/step - loss: 2.9820 - acc: 0.0556 - val_loss: 2.9827 - val_acc: 0.0527
Epoch 3/30
15048/15048 [==============================] - 55s 4ms/step - loss: 2.9712 - acc: 0.0626 - val_loss: 2.9718 - val_acc: 0.0579
Epoch 4/30
15048/15048 [==============================] - 55s 4ms/step - loss: 2.9259 - acc: 0.0756 - val_loss: 2.8363 - val_acc: 0.0874
Epoch 5/30
15048/15048 [==============================] - 56s 4ms/step - loss: 2.7092 - acc: 0.1390 - val_loss: 2.3251 - val_acc: 0.2796
...
Epoch 13/30
15048/15048 [==============================] - 56s 4ms/step - loss: 0.0698 - acc: 0.9807 - val_loss: 0.5010 - val_acc: 0.8736
It ends up with an accuracy of ~0.87:
print ('Best validation accuracy is ', max(history.history['val_acc']))
Best validation accuracy is 0.874934175379845
However, when I explicitly set the embeddings for the padded 0's to 0.0
def myMask(x):
    mask = K.greater(x, 0)  # will return boolean values
    mask = K.cast(mask, dtype=K.floatx())
    return mask
layer_size = 25
dropout = 0.3
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='dnn_input')
embedding_layer = Embedding(len(word_index) + 1, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH, name = 'embedding_dnn')
embedded_sequences = embedding_layer(sequence_input)
y = Lambda(myMask, output_shape=(MAX_SEQUENCE_LENGTH,))(sequence_input)
y = Reshape(target_shape=(MAX_SEQUENCE_LENGTH,1))(y)
merge_layer = Multiply(name = 'masked_embedding_dnn')([embedded_sequences,y])
x = Flatten(name='flatten_dnn')(merge_layer)
x = Dense(layer_size, activation='relu', name ='hidden_dense_dnn')(x)
x = Dropout(dropout, name='dropout')(x)
preds = Dense(num_labels, activation='softmax', name = 'output_dnn')(x)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
the model with the same number of parameters immediately finds its way out of the 'randomness':
Train on 15048 samples, validate on 3798 samples
Epoch 1/30
15048/15048 [==============================] - 64s 4ms/step - loss: 2.4356 - acc: 0.3060 - val_loss: 1.2424 - val_acc: 0.7754
Epoch 2/30
15048/15048 [==============================] - 61s 4ms/step - loss: 0.6973 - acc: 0.8267 - val_loss: 0.5240 - val_acc: 0.8797
...
Epoch 10/30
15048/15048 [==============================] - 61s 4ms/step - loss: 0.0496 - acc: 0.9881 - val_loss: 0.4176 - val_acc: 0.8944
and ends up with a better accuracy of ~0.9.
Again, this is a somewhat contrived example, but it still shows that keeping those 'padded' embeddings at 0.0 can be beneficial.
Am I missing something here? And if I'm not missing anything, then, what is the reason Keras doesn't provide this functionality out-of-the-box?
UPDATE
@DanielMöller, I tried your suggestion:
layer_size = 25
dropout = 0.3
init = RandomUniform(minval=0.0001, maxval=0.05, seed=None)
constr = NonNeg()
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='dnn_input')
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            input_length=MAX_SEQUENCE_LENGTH,
                            name='embedding_dnn',
                            embeddings_initializer=init,
                            embeddings_constraint=constr)
embedded_sequences = embedding_layer(sequence_input)
y = Lambda(myMask, output_shape=(MAX_SEQUENCE_LENGTH,))(sequence_input)
y = Reshape(target_shape=(MAX_SEQUENCE_LENGTH,1))(y)
merge_layer = Multiply(name = 'masked_embedding_dnn')([embedded_sequences,y])
x = Flatten(name='flatten_dnn')(merge_layer)
x = Dense(layer_size, activation='relu', name ='hidden_dense_dnn')(x)
x = Dropout(dropout, name='dropout')(x)
preds = Dense(num_labels, activation='softmax', name = 'output_dnn')(x)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
Unfortunately, the network was stuck in the 'randomness':
Train on 15197 samples, validate on 3649 samples
Epoch 1/30
15197/15197 [==============================] - 60s 4ms/step - loss: 3.1354 - acc: 0.0505 - val_loss: 2.9943 - val_acc: 0.0496
....
Epoch 24/30
15197/15197 [==============================] - 60s 4ms/step - loss: 2.9905 - acc: 0.0538 - val_loss: 2.9907 - val_acc: 0.0496
I also tried without the NonNeg() constraint, with the same result.
Well, you're eliminating the computation of gradients for the weights related to the padded steps.
If you have too many padded steps, the embedding weights for the padding value participate in a lot of calculations and significantly compete with the other weights. But training these weights is a waste of computation, and it will certainly interfere with the other words.
Consider also that some of the padding weights might end up with values between those of meaningful words, so nudging them up or down could make the padding look similar to another word when it isn't.
These extra calculations and extra contributions to the loss and gradients create more computational work and more obstacles. It's like having a lot of garbage in the middle of the data.
Notice also that these zeros go directly into the dense layer, which also eliminates the gradients for many of the dense weights. This might overfit longer sequences, though, if there are few of them compared to shorter sequences.
Out of curiosity, what will happen if you do this?
from keras.initializers import RandomUniform
from keras.constraints import NonNeg
init = RandomUniform(minval=0.0001, maxval=0.05, seed=None)
constr = NonNeg()
......
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            input_length=MAX_SEQUENCE_LENGTH,
                            name='embedding_dnn',
                            embeddings_initializer=init,
                            embeddings_constraint=constr)
..........
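A further sketch in the same spirit, assuming one wants the padding row kept at zero without the Lambda/Multiply mask: a custom embeddings_constraint that re-zeros row 0 after every weight update (the class name ZeroPadRow is a hypothetical helper, not from the thread):
from keras import backend as K
from keras.constraints import Constraint

class ZeroPadRow(Constraint):
    # pin the embedding vector of index 0 (the padding token) back to zero after each update
    def __call__(self, w):
        zero_row = K.zeros_like(w[:1, :])
        return K.concatenate([zero_row, w[1:, :]], axis=0)

embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            input_length=MAX_SEQUENCE_LENGTH,
                            name='embedding_dnn',
                            embeddings_constraint=ZeroPadRow())
Note that row 0 only becomes exactly zero after the first update unless the initializer also zeroes it, so the Multiply mask remains the more explicit option.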
I am trying to train an LSTM model for text classification on the Amazon Fine Food Reviews problem. I am using the same dataset as provided on Kaggle and a tokenizer to convert the text data into tokens, but while training I get the same accuracy for all epochs, like this:
Epoch 1/5
55440/55440 [==============================] - 161s 3ms/step - loss: 2.3666 - acc: 0.8516 - val_loss: 2.3741 - val_acc: 0.8511
Epoch 2/5
55440/55440 [==============================] - 159s 3ms/step - loss: 2.3666 - acc: 0.8516 - val_loss: 2.3741 - val_acc: 0.8511
Epoch 3/5
55440/55440 [==============================] - 160s 3ms/step - loss: 2.3666 - acc: 0.8516 - val_loss: 2.3741 - val_acc: 0.8511
Epoch 4/5
55440/55440 [==============================] - 160s 3ms/step - loss: 2.3666 - acc: 0.8516 - val_loss: 2.3741 - val_acc: 0.8511
Moreover, when I plot my confusion matrix, none of the negative samples are predicted; only the positive class is predicted.
I think I am mostly going wrong when converting the labels, i.e. 'Positive' and 'Negative', to a numerical representation for classification.
Please see my code for more details.
I have tried increasing the number of LSTM units and epochs, and also tried increasing the sequence length, but none of it worked. Please note that I have done all the required preprocessing on the reviews.
# train/test split; x is a dataframe consisting of the Amazon Fine Food Reviews dataset
x = df
x1 = pd.DataFrame(x)
y = df['Score']
x1.head()
import math
train_pct_index = int(0.70 * len(df)) #train data size = 70%
X_train, X_test = x1[:train_pct_index], x1[train_pct_index:]
y_train, y_test = y[:train_pct_index], y[train_pct_index:]
#y_test.value_counts()
x1_df = pd.DataFrame(X_train)
x2_df = pd.DataFrame(X_test)
from sklearn import preprocessing
encoder = preprocessing.LabelEncoder()
y_train=encoder.fit_transform(y_train)
y_test=encoder.fit_transform(y_test)
# tokenizing reviews
tokenizer = Tokenizer(num_words = 5000 )
tokenizer.fit_on_texts(x1_df['CleanedText'])
sequences = tokenizer.texts_to_sequences(x1_df['CleanedText'])
test_sequences = tokenizer.texts_to_sequences(x2_df['CleanedText'])
train_data = pad_sequences(sequences, maxlen=500)
test_data = pad_sequences(test_sequences, maxlen=500)
nb_words = (np.max(train_data) + 1)
# building lstm model and compiling it
from keras.layers.recurrent import LSTM, GRU
model = Sequential()
model.add(Embedding(nb_words,50,input_length=500))
model.add(LSTM(20))
model.add(Dropout(0.5))
model.add(Dense(1, activation='softmax'))
model.summary()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics = ['accuracy'])
I would like my LSTM model to generalise well and also predict negative reviews, which are the minority class in this case.
I am trying to feed a huge sparse matrix to a Keras model. As the dataset doesn't fit into RAM, the way around it is to train the model on data generated batch-by-batch by a generator.
To test this approach and make sure my solution works, I slightly modified Keras's simple MLP on the Reuters newswire topic classification task. So, the idea is to compare the original and edited models. I just convert the numpy.ndarray into a scipy.sparse.csr_matrix and feed it to the model.
But my model crashes at some point and I need a hand to figure out the reason.
Here is the original model, with my additions below.
from __future__ import print_function
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.datasets import reuters
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.utils import np_utils
from keras.preprocessing.text import Tokenizer
max_words = 1000
batch_size = 32
nb_epoch = 5
print('Loading data...')
(X_train, y_train), (X_test, y_test) = reuters.load_data(nb_words=max_words, test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
nb_classes = np.max(y_train)+1
print(nb_classes, 'classes')
print('Vectorizing sequence data...')
tokenizer = Tokenizer(nb_words=max_words)
X_train = tokenizer.sequences_to_matrix(X_train, mode='binary')
X_test = tokenizer.sequences_to_matrix(X_test, mode='binary')
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
print('Convert class vector to binary class matrix (for use with categorical_crossentropy)')
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
print('Y_train shape:', Y_train.shape)
print('Y_test shape:', Y_test.shape)
print('Building model...')
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(X_train, Y_train,
                    nb_epoch=nb_epoch, batch_size=batch_size,
                    verbose=1)#, validation_split=0.1)
#score = model.evaluate(X_test, Y_test,
# batch_size=batch_size, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])
It outputs:
Loading data...
8982 train sequences
2246 test sequences
46 classes
Vectorizing sequence data...
X_train shape: (8982, 1000)
X_test shape: (2246, 1000)
Convert class vector to binary class matrix (for use with categorical_crossentropy)
Y_train shape: (8982, 46)
Y_test shape: (2246, 46)
Building model...
Epoch 1/5
8982/8982 [==============================] - 5s - loss: 1.3932 - acc: 0.6906
Epoch 2/5
8982/8982 [==============================] - 4s - loss: 0.7522 - acc: 0.8234
Epoch 3/5
8982/8982 [==============================] - 5s - loss: 0.5407 - acc: 0.8681
Epoch 4/5
8982/8982 [==============================] - 5s - loss: 0.4160 - acc: 0.8980
Epoch 5/5
8982/8982 [==============================] - 5s - loss: 0.3338 - acc: 0.9136
Test score: 1.01453569163
Test accuracy: 0.797417631398
Finally, here is my part:
from scipy import sparse  # needed for sparse.csr_matrix below

X_train_sparse = sparse.csr_matrix(X_train)

def batch_generator(X, y, batch_size):
    n_batches_for_epoch = X.shape[0]//batch_size
    for i in range(n_batches_for_epoch):
        index_batch = range(X.shape[0])[batch_size*i:batch_size*(i+1)]
        X_batch = X[index_batch,:].todense()
        y_batch = y[index_batch,:]
        yield (np.array(X_batch), y_batch)

model.fit_generator(generator=batch_generator(X_train_sparse, Y_train, batch_size),
                    nb_epoch=nb_epoch,
                    samples_per_epoch=X_train_sparse.shape[0])
The crash:
Exception Traceback (most recent call last)
<ipython-input-120-6722a4f77425> in <module>()
1 model.fit_generator(generator=batch_generator(X_trainSparse, Y_train, batch_size),
2 nb_epoch=nb_epoch,
----> 3 samples_per_epoch=X_trainSparse.shape[0])
/home/kk/miniconda2/envs/tensorflow/lib/python2.7/site-packages/keras/models.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, **kwargs)
648 nb_val_samples=nb_val_samples,
649 class_weight=class_weight,
--> 650 max_q_size=max_q_size)
651
652 def evaluate_generator(self, generator, val_samples, max_q_size=10, **kwargs):
/home/kk/miniconda2/envs/tensorflow/lib/python2.7/site-packages/keras/engine/training.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size)
1356 raise Exception('output of generator should be a tuple '
1357 '(x, y, sample_weight) '
-> 1358 'or (x, y). Found: ' + str(generator_output))
1359 if len(generator_output) == 2:
1360 x, y = generator_output
Exception: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
I believe the problem is due to a wrong setup of samples_per_epoch. I'd truly appreciate it if someone could comment on this.
Here is my solution.
def batch_generator(X, y, batch_size):
    number_of_batches = X.shape[0]//batch_size
    counter = 0
    shuffle_index = np.arange(np.shape(y)[0])
    np.random.shuffle(shuffle_index)
    X = X[shuffle_index, :]
    y = y[shuffle_index]
    while 1:
        index_batch = shuffle_index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X[index_batch,:].todense()
        y_batch = y[index_batch]
        counter += 1
        yield (np.array(X_batch), y_batch)
        if counter >= number_of_batches:
            # a full pass over the data is done: reshuffle and start the next epoch
            np.random.shuffle(shuffle_index)
            counter = 0
In my case, X is a sparse matrix and y is an array.
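A minimal usage sketch with this generator, mirroring the fit_generator call from the question (same Keras 1.x argument names):
history = model.fit_generator(generator=batch_generator(X_train_sparse, Y_train, batch_size),
                              samples_per_epoch=X_train_sparse.shape[0],
                              nb_epoch=nb_epoch,
                              verbose=1)
Because this generator loops forever (while 1), Keras can draw samples_per_epoch samples each epoch without the generator ever being exhausted, which is what caused the original "output of generator should be a tuple ... Found: None" exception.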
If you can use Lasagne instead of Keras, I've written a small MLP class with the following features:
supports both dense and sparse matrices
supports dropout and hidden layers
supports a complete probability distribution instead of one-hot labels, so it supports multilabel training
supports a scikit-learn-like API (fit, predict, accuracy, etc.)
is very easy to configure and modify