How do I retrain BERT model with new data

I have already trained a BERT model and saved it in the .pb format. I now want to retrain it on new datasets that I built myself, without losing what it learned in the previous training. How do I continue training on the new data so that the model updates itself?
Are there any recommended approaches?
This is my training code:
optimizer = Adam(lr=1e-5, decay=1e-6)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
history = model.fit(
    x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    # x={'input_ids': x['input_ids']},
    y={'outputs': train_y},
    validation_split=0.1,
    batch_size=32,
    epochs=1)

Related

Combining Classification and Regression in a sequential way using MLP

I am looking for a way to do classification and regression sequentially.
For example, assume each sample has 3 input values and 1 output value. The model should first classify using the 3 input values and then perform the regression task using the classification output as well (i.e. classification takes the 3 input values from the original samples, and regression takes 4 input values: the 3 original values plus the classification output).
Below is the architecture I drafted. However, I am not sure about the part where the second input layer occurs. Could someone give advice or a working example for this application?
input1_classification = Input(shape=(3,))
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(input1_classification)
# classification
out_classification = Dense(2, activation='softmax')(hidden1)
# regression input
input1_regression = Input(shape=(5,))
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(out_classification)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras model
model.compile(loss=['sparse_categorical_crossentropy', 'mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class, y_train_reg], epochs=150, batch_size=32, verbose=2)

All you need to do is concatenate your original input with the classification output and apply the regression layers to the result; you do not need to specify an "extra" input.
So it becomes something along the lines of:
from keras.layers import Input, Dense, Concatenate
from keras.models import Model

input1_classification = Input(shape=(3,))
# classification
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(input1_classification)
out_classification = Dense(2, activation='softmax')(hidden1)
# regression input: the 3 original features plus the 2 class probabilities
new_input = Concatenate(axis=1)([input1_classification, out_classification])
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(new_input)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras model
model.compile(loss=['sparse_categorical_crossentropy', 'mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class, y_train_reg], epochs=150, batch_size=32, verbose=2)

What is the learning rate status when applying keras model fit() iteratively?

I am applying Keras model fitting iteratively (within a for loop) because the dataset is large. My goal is to split the dataset into 100 parts, read one part at a time, and apply the fit() method to it.
My question: in each iteration, does the fit() method begin from the initial learning rate (lr=0.1) that I set during model compilation? Or does it remember the last updated learning rate and apply it directly on a new call of fit()?
My code sample is as follows:
# Define model
model = my_model()
# Set the optimizer
sgd = SGD(lr=0.1, decay=1e-08, momentum=0.9, nesterov=False)
# Compile model
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Fit model and train
for j in range(100):
    print('Data extracting from big matrix ...')
    X_train = HDF5Matrix(path_train, 'X', start=st, end=ed)
    Y_train = HDF5Matrix(path_train, 'y', start=st, end=ed)
    print('Fitting model ...')
    model.fit(X_train, Y_train, batch_size=100, shuffle='batch', nb_epoch=1,
              validation_data=(X_test, Y_test))

The updated learning rate is remembered in the optimizer object model.optimizer, which is just the sgd variable in your example.
In callbacks such as LearningRateScheduler, the learning rate variable model.optimizer.lr is updated (some lines are removed for clarity).
def on_epoch_begin(self, epoch, logs=None):
    lr = self.schedule(epoch)
    K.set_value(self.model.optimizer.lr, lr)
However, when decay is used (as in your example), the learning rate variable is not directly updated, but the variable model.optimizer.iterations is updated. This variable records how many batches have been used in model fitting, and the learning rate with decay is computed in SGD.get_updates() by:
lr = self.lr
if self.initial_decay > 0:
    lr *= (1. / (1. + self.decay * K.cast(self.iterations,
                                          K.dtype(self.decay))))
So in either case, as long as the model is not re-compiled, it will use the updated learning rate in the new fit() calls.
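To make the decay behaviour concrete, here is a minimal plain-Python sketch of the formula quoted above. It does not use Keras: `ToyOptimizer` and `fit_one_chunk` are hypothetical names invented for this illustration, and the decay value is deliberately much larger than in the question so that the effect is visible.

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: lr / (1 + decay * iterations), as in the quoted formula."""
    return initial_lr / (1.0 + decay * iterations)


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer that counts processed batches."""

    def __init__(self, lr, decay):
        self.lr = lr
        self.decay = decay
        self.iterations = 0  # lives as long as the object, not per fit() call

    def fit_one_chunk(self, n_batches):
        """Simulate one fit() call and return the rate used for each batch."""
        rates = []
        for _ in range(n_batches):
            rates.append(decayed_lr(self.lr, self.decay, self.iterations))
            self.iterations += 1
        return rates


opt = ToyOptimizer(lr=0.1, decay=0.01)  # assumed values, chosen for visibility
first_call = opt.fit_one_chunk(100)
second_call = opt.fit_one_chunk(100)

print(first_call[0])   # rate at the start of the first simulated fit()
print(second_call[0])  # rate at the start of the second simulated fit()
```

Because the batch counter belongs to the optimizer object, the second simulated call starts from the already decayed rate (half the initial one with these numbers) rather than from 0.1 again. Creating a new optimizer, which is what re-compiling does, would reset the counter.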

Proper way to save Transfer Learning model in Keras

I have trained a convolutional net using transfer learning from ResNet50 in Keras, as given below.
base_model = applications.ResNet50(weights='imagenet', include_top=False, input_shape=(333, 333, 3))
## set model architecture
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
x = Dense(256, activation='relu')(x)
predictions = Dense(y_train.shape[1], activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(loss='categorical_crossentropy', optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
model.summary()
After training the model as given below I want to save the model.
history = model.fit_generator(
    train_datagen.flow(x_train, y_train, batch_size=batch_size),
    steps_per_epoch=600,
    epochs=epochs,
    callbacks=callbacks_list
)
I can't use the save_model() function from keras.models because the model here is of type Model, so I used the save() function to save it. But when I later loaded the model and validated it, it behaved like an untrained model. I think the weights were not saved. What went wrong? How do I save this model properly?

As per the official Keras docs:
If you only need to save the architecture of a model, you can use
model_json = model.to_json()
with open("model_arch.json", "w") as json_file:
    json_file.write(model_json)
To save the weights:
model.save_weights("my_model_weights.h5")
You can later read the JSON file back and rebuild the model:
from keras.models import model_from_json
with open("model_arch.json") as json_file:
    json_string = json_file.read()
model = model_from_json(json_string)
And similarly, for the weights you can use
model.load_weights('my_model_weights.h5')
I am using the same approach and it works perfectly well.

I don't know what happens with my models, but I've never been able to use save_model() and load_model(); there is always some error. But these functions do exist.
What I usually do is save and load the weights. That is enough for using the model, but it may cause a small problem for further training because the optimizer state is not saved. It was never a big problem for me, though; a new optimizer soon finds its way.
model.save_weights(fileName)
model.load_weights(fileName)
Another option is using numpy for saving; this one never failed:
# the weight arrays have different shapes, so np.savez (one entry per array) is used
# instead of np.save; fileName should end in '.npz'
np.savez(fileName, *model.get_weights())
data = np.load(fileName)
model.set_weights([data['arr_%d' % i] for i in range(len(data.files))])
For this to work, just create your model again (keep the code you use to create it) and set its weights.

Proper way to make prediction with Keras model trained with ImageDataGenerator

I have trained a model applying some image augmentations by using ImageDataGenerator in Keras as follows:
train_datagen = ImageDataGenerator(
    rotation_range=60,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)
train_datagen.fit(x_train)
history = model.fit_generator(
    train_datagen.flow(x_train, y_train, batch_size=7),
    steps_per_epoch=600,
    epochs=epochs,
    callbacks=callbacks_list
)
How should I make predictions with this model? By using model.predict() as shown below?
predictions = model.predict(x_test)
Or should I use model.predict_generator(), with an ImageDataGenerator applied to x_test, where x_test is unlabelled?
If I use predict_generator(), how do I do that?
What is the difference between the two methods?

predict_generator() is a convenience function that makes it easier to load the images and apply the same preprocessing you used for your training samples. I recommend using it rather than model.predict().
In your case, simply do:
test_gen = ImageDataGenerator()
predictions = model.predict_generator(test_gen.flow(...))  # ... your params here ...

How to avoid overfitting on a simple feed forward network

Using the Pima Indians diabetes dataset, I'm trying to build an accurate model using Keras. I've written the following code:
# Visualize training history
from keras import callbacks
from keras.layers import Dropout
tb = callbacks.TensorBoard(log_dir='/.logs', histogram_freq=10, batch_size=32,
                           write_graph=True, write_grads=True, write_images=False,
                           embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)
# Visualize training history
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu', name='first_input'))
model.add(Dense(500, activation='tanh', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
# Compile model
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb])
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
After several tries, I've added dropout layers in order to avoid overfitting, but with no luck. The following graph shows that the validation loss and the training loss diverge at some point.
What else could I do to optimize this network?
UPDATE:
Based on the comments I got, I've tweaked the code like so:
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01), activation='relu',
                name='first_input'))  # added regularizers
model.add(Dense(8, activation='relu', name='first_hidden')) # reduced to 8 neurons
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(5, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
Here are the graphs for 500 epochs

The first example gave a validation accuracy above 75% and the second one gave an accuracy below 65%. If you compare the losses for epochs below 100, the loss is below 0.5 for the first one and above 0.6 for the second. So how is the second case better?
To me, the second one is a case of under-fitting: the model doesn't have enough capacity to learn. The first case has a problem of over-fitting because its training was not stopped when overfitting started (early stopping). If training had been stopped at, say, epoch 100, it would be a far better model than the second one.
The goal should be to obtain a small prediction error on unseen data, and for that you increase the capacity of the network up to the point beyond which overfitting starts to happen.
So how do you avoid over-fitting in this particular case? Adopt early stopping.
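The patience rule behind early stopping can be sketched in plain Python. This is a simplified illustration and not the Keras implementation: the function name `stopping_epoch` and the loss values are made up for the example, and it only mirrors the `patience` and `min_delta` ideas that the callback exposes.

```python
def stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has failed to improve on the
    best value seen so far (by more than min_delta) for `patience`
    consecutive epochs.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: improving at first, then drifting upwards
losses = [0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.50]
print(stopping_epoch(losses, patience=3))
```

With these made-up numbers the best loss is reached at epoch index 2, and the three following epochs bring no improvement, so training stops before the later value of 0.50 is ever seen. A larger `patience` trades extra epochs for a lower risk of stopping on a temporary plateau.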
CODE CHANGES: added early stopping and input scaling.
from sklearn.preprocessing import StandardScaler
from keras.callbacks import EarlyStopping
# input scaling
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Early stopping
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1, mode='auto')
# create model - almost the same code
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu', name='first_input'))
model.add(Dense(500, activation='relu', name='first_hidden'))
model.add(Dropout(0.5, name='dropout_1'))
model.add(Dense(8, activation='relu', name='second_hidden'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))
# compile as in the question
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
history = model.fit(X, Y, validation_split=0.33, epochs=1000, batch_size=10, verbose=0, callbacks=[tb, early_stop])
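For readers wondering what the input scaling step does, here is a rough plain-Python sketch of per-column standardization: subtract the column mean and divide by the population standard deviation. The function name `standardize_columns` and the sample rows are invented for illustration; in practice the scikit-learn scaler used above is the better choice.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale every column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead to avoid a
    # division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, stds)]
            for row in rows]


# Hypothetical samples whose two features live on very different scales
rows = [[1.0, 100.0],
        [2.0, 300.0],
        [3.0, 500.0]]
scaled = standardize_columns(rows)
for row in scaled:
    print(row)
```

After scaling, both columns cover the same range, so no single feature dominates the gradient updates merely because of its units.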
The Accuracy and loss graphs:

First, try adding some regularization (https://keras.io/regularizers/), for example:
from keras import regularizers
model.add(Dense(12, input_dim=8,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
Also, make sure to decrease your network size: you don't need a hidden layer of 500 neurons. Try taking it out to reduce the representational power, and maybe remove another layer if the model is still overfitting. Also, only use relu activations. You could also try increasing your dropout rate to something like 0.75 (although it's already high). You probably also don't need to run training for so many epochs; the model will just begin to overfit after long enough.
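As a rough illustration of what these regularizers add to the loss, the sketch below computes an L2 penalty on a weight matrix and an L1 penalty on layer outputs in plain Python. The kernel, the activations and the data loss are made-up numbers, the helper names are hypothetical, and any batch-size normalization the library may apply to activity penalties is ignored.

```python
def l2_penalty(weights, factor=0.01):
    """factor * sum of squared entries, the form of an L2 weight penalty."""
    return factor * sum(w * w for row in weights for w in row)


def l1_penalty(values, factor=0.01):
    """factor * sum of absolute entries, the form of an L1 penalty."""
    return factor * sum(abs(v) for row in values for v in row)


# Hypothetical 2x3 kernel and one batch of layer outputs
kernel = [[0.5, -1.0, 2.0],
          [0.0, 1.5, -0.5]]
activations = [[0.2, 0.0, 1.3]]

data_loss = 0.40  # made-up stand-in for the cross-entropy term
total_loss = data_loss + l2_penalty(kernel) + l1_penalty(activations)
print(total_loss)
```

Large weights and large activations both raise the total loss, so the optimizer is pushed towards smaller values, which limits how closely the network can fit noise in the training set.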

For a dataset like the diabetes one, you can use a much simpler network. Try to reduce the number of neurons in your second layer. (Is there a specific reason why you chose tanh as the activation there?)
In addition, you can simply add an EarlyStopping callback to your training: https://keras.io/callbacks/
