How to check the learning rate with train_on_batch [Keras] - machine-learning

I am using Keras on Python 2.
Does anyone know how to check and modify the learning rate for the Adam optimizer, please? Here is my neural network, with my own optimizer defined. When training on batches with model.train_on_batch(...) I have no way to track the learning rate. Thanks for your help.
def CNN_model():
    # Create model
    model = Sequential()
    model.add(Conv2D(12, (5, 5), input_shape=(1, 256, 256), activation='elu'))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Conv2D(12, (5, 5), activation='elu'))
    model.add(MaxPooling2D(pool_size=(4, 4)))
    model.add(Conv2D(12, (3, 3), activation='elu'))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Flatten())
    model.add(Dropout(0.3))
    model.add(Dense(128, activation='elu'))
    model.add(Dropout(0.3))
    model.add(Dense(32, activation='elu'))
    model.add(Dense(2, activation='softmax'))
    # Compile model
    my_optimizer = Adam(lr=0.001, decay=0.05)
    model.compile(loss='categorical_crossentropy', optimizer=my_optimizer, metrics=['accuracy'])
    return model

You can do this in several ways. The simplest, to my mind, is to do it through a callback:
from keras.callbacks import Callback
from keras import backend as K

class showLR(Callback):
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.get_value(self.model.optimizer.lr))
        print(" epoch={:02d}, lr={:.5f}".format(epoch, lr))

You can also use the ReduceLROnPlateau callback: add it to your callbacks list and then pass that list to your training call.
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

callbacks = [ReduceLROnPlateau(monitor='val_acc',
                               patience=5,
                               verbose=1,
                               factor=0.5,
                               min_lr=0.00001)]
model = CNN_model()
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_valid, y_valid),
          callbacks=callbacks)

Related

Negative weights and biases in siamese keras model

I am trying to train a siamese model in Keras. I use a really simple encoder with only convnets to encode a 32x32 RGB picture into a feature vector. The encoder encodes two pictures A and B. Then an MLP compares the two vectors and computes a score between 0 and 1 which should be high if A and B are of the same class and low if they are not.
I used relu as the activation function on all layers, but the model only learned to encode everything into a 0-vector. I switched to 'tanh' and saw that a lot of the weights and biases, and also the entries in the feature vector, are negative. So I now understand why everything was zero with relu. But how come I get negative values? The input is positive, the output as well, and the y-values are 0 or 1. I think there is something wrong with my model.
It doesn't perform very well either. It gets to around 60% accuracy.
Here is my model:
def model():
    initializer = keras.initializers.random_uniform(minval=0.0001, maxval=0.001)
    enc = Sequential()
    enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(32, (3, 3), padding='same', strides=(2, 2), activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(16, (3, 3), padding='same', strides=(2, 2), activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(4, (3, 3), padding='same', strides=(2, 2), activation='tanh', kernel_initializer=initializer))
    enc.add(Flatten())
    input1 = Input((32, 32, 3))
    # enc.build((1, 32, 32, 3))
    # enc.summary()
    input2 = Input((32, 32, 3))
    enc1 = enc(input1)
    enc2 = enc(input2)
    twin = concatenate([enc1, enc2])
    twin = Dense(64, activation='tanh', kernel_initializer=initializer)(twin)
    twin = Dense(32, activation='tanh', kernel_initializer=initializer)(twin)
    twin = Dense(1, activation='sigmoid', kernel_initializer=initializer)(twin)
    twin = Model(inputs=[input1, input2], outputs=twin)
    twin.summary()
    twin.compile(optimizer=adam(0.0001), loss='binary_crossentropy', metrics=["acc"])
    return twin
Edit: I found out it was all fine; just my data was bad. I had only about 1/10 as many samples of one class as of the others. Oversampling didn't help, so I removed that class from the dataset for now, and it's working. I might add the class back in with augmented copies as additional samples and see how it goes.
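For reference, a quick way to check the class balance before training (a sketch; y_train here is a hypothetical 1-D array of integer class labels):

import numpy as np

classes, counts = np.unique(y_train, return_counts=True)
for c, n in zip(classes, counts):
    print("class %d: %d samples" % (c, n))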

How to use custom metrics in Keras model while using Grid Search CV?

I want to use R2 (coefficient of determination) as a metric in my Keras model. For that, I have already defined a function (coeff_determination). This function works well as a metric without GridSearchCV, but with GridSearchCV it gives an error like "The model is not configured to compute the accuracy. You should pass metrics=["accuracy"] to the model.compile() method". The code is given below.
def create_model():
    # CNN Architecture - Model 7
    model = Sequential()
    model.add(Convolution1D(filters=10, kernel_size=12, activation="relu", kernel_initializer="glorot_uniform", input_shape=(X_train.shape[1], 1)))
    model.add(MaxPooling1D(pool_size=4, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=16, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=3, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=22, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=3, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=28, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=4, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=34, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=3, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=40, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=3, strides=2))
    model.add(BatchNormalization())
    model.add(Flatten())
    #model.add(Dropout(0.35))
    model.add(Dense(130, activation='relu'))
    #model.add(Dropout(0.35))
    model.add(Dense(130, activation='relu'))
    model.add(Dense(1, activation='linear'))
    history = History()
    model.compile(loss='mean_squared_error', optimizer=Adam(lr=0.0001), metrics=[coeff_determination])
    #model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=400, batch_size=30, callbacks=[history])
    return model

def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - SS_res / (SS_tot + K.epsilon())
# to reproduce the same results next time
seed = 7
np.random.seed(seed)
# Creating Keras model with the scikit-learn wrapper
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [20, 30, 40, 80]
epochs = [100, 200, 300, 400]
# Using make_scorer to convert the metric r2_score to a scorer
my_scorer = make_scorer(r2_score, greater_is_better=True)
# passing dictionaries of parameters to the GridSearchCV
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, scoring=my_scorer, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train)
# summarizing the results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
I think you need to feed your custom scoring function as the input for the scoring param of GridSearchCV; otherwise it will fall back to the estimator's default scoring method, which is accuracy.
From Documentation:
scoring: str, callable, list/tuple or dict, default=None
A single str (see The scoring parameter: defining model evaluation rules) or a callable (see Defining your scoring strategy from metric functions) to evaluate the predictions on the test set.
For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.
NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each.
See Specifying multiple metrics for evaluation for an example.
If None, the estimator’s score method is used.
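One more thing worth checking, offered here as an assumption since it isn't confirmed above: the model is a regressor (linear output, MSE loss), while KerasClassifier's score method computes classification accuracy, which matches the error message. A sketch using the regression wrapper instead, keeping the custom scorer:

from keras.wrappers.scikit_learn import KerasRegressor

# KerasRegressor never tries to compute accuracy, so the error should go away;
# the custom R2 scorer is still what GridSearchCV uses for model selection.
model = KerasRegressor(build_fn=create_model, verbose=0)
grid = GridSearchCV(estimator=model, scoring=my_scorer,
                    param_grid=param_grid, n_jobs=1, cv=3)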

I have this error "input 0 is incompatible with layer lstm expected ndim=3 found ndim=5"

I am very new to this field. I searched on the internet but could not find a solution. I would appreciate the help of people who are interested in this field.
My model
def load_VGG16_model():
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(256, 256, 3))
    print("Model loaded..!")
    return base_model
Summary of the model
load_VGG16_model().summary()
Adding Layers
def action_model(shape=(30, 256, 256, 3), nbout=len(classes)):
    convnet = load_VGG16_model()
    model = Sequential()
    model.add(TimeDistributed(convnet, input_shape=shape))
    model.add(LSTM(30, return_sequences=True, input_shape=(30, 512)))  # the error shows this line
    top_model.add(Dense(4096, activation='relu', W_regularizer=l2(0.1)))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(4096, activation='relu', W_regularizer=l2(0.1)))
    top_model.add(Dropout(0.5))
    model.add(Dense(nbout, activation='softmax'))
    return model
The error points to this line: model.add(LSTM(30, return_sequences=True, input_shape=(30, 512)))
Your problem is similar to this one: "Building CNN + LSTM in Keras for a regression problem. What are proper shapes?"
Using a Reshape layer before the LSTM should work fine for you:
def action_model(shape=(256, 256, 3), nbout=len(classes)):
    convnet = load_VGG16_model()
    model = Sequential()
    model.add(convnet)
    model.add(tf.keras.layers.Reshape((8*8, 512)))  # shape comes from the last output of the convnet
    model.add(LSTM(30, return_sequences=True, input_shape=(8*8, 512)))
    model.add(Dense(4096, activation='relu', W_regularizer=l2(0.1)))  # note: model.add, not top_model.add
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu', W_regularizer=l2(0.1)))
    model.add(Dropout(0.5))
    model.add(Dense(nbout, activation='softmax'))
    return model
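Note that this Reshape treats the 8x8 spatial grid of the convnet output as a length-64 sequence and drops the 30-frame time dimension from the original shape. If the 30 frames are meant to be the LSTM's sequence, an alternative sketch (hedged; GlobalAveragePooling2D is just one way to reduce each frame to a vector) keeps TimeDistributed:

from keras.models import Sequential
from keras.layers import TimeDistributed, GlobalAveragePooling2D, LSTM, Dense

def action_model_td(shape=(30, 256, 256, 3), nbout=len(classes)):
    convnet = load_VGG16_model()
    model = Sequential()
    model.add(TimeDistributed(convnet, input_shape=shape))  # -> (30, 8, 8, 512)
    model.add(TimeDistributed(GlobalAveragePooling2D()))    # -> (30, 512)
    model.add(LSTM(30))                                     # one vector per clip
    model.add(Dense(nbout, activation='softmax'))
    return model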

View y_true of batch in Keras Callback during training

I am attempting to implement a custom loss function in Keras. It requires that I compute the sum of the inverse class frequencies for each y in B.
It is the 1/epsilon(...) portion of the function below.
The function is from this paper (page 7).
Note: I most definitely could be misinterpreting what the paper describes. Please let me know if I am.
I am currently trying to use a Keras Callback and the on_batch_start/end methods to determine the class frequency of the input batch (which means accessing the y_true of the batch input), but am having little luck.
Thank you in advance for any help you can offer.
Edit: By "little luck" I mean I cannot find a way to access the y_true of an individual batch during training. Example: batch_size = 64, train_features.shape == (50000, 120, 20). I can access the Keras model from on_batch_start/end (self.model), but I cannot find a way to access the actual y_true of the batch, of size 64.
from tensorflow.python.keras.callbacks import Callback

class FreqReWeight(Callback):
    """
    Update learning rate by batch label frequency distribution -- for use with LDAM loss
    """
    def __init__(self, C):
        self.C = C

    def on_train_begin(self, logs={}):
        self.model.custom_val = 0

    def on_batch_end(self, batch, logs=None):
        print('batch index', batch)
        print('Model being trained', self.model)
        # how can one access the y_true of the batch?
[LDAM loss function formula from the paper, where z_j is the j-th output of the model for the j-th class]
Edit 2: A dummy loss function, for testing when the loss is called:
def LDAM(C):
    def loss(y_true, y_pred):
        print('shape', y_true.shape)  # only prints each epoch, not each batch
        return K.mean(y_pred) + C  # NOT LDAM, just a dummy for testing purposes
    return loss
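A plain print inside a Keras loss only fires when the loss graph is traced, not per batch. One way (a sketch) to see per-batch values is keras.backend.print_tensor, which injects a print op into the graph and prints every time the tensor is evaluated:

from keras import backend as K

def LDAM(C):
    def loss(y_true, y_pred):
        # print_tensor returns a tensor that prints its value on each evaluation
        y_true = K.print_tensor(y_true, message='y_true of batch = ')
        return K.mean(y_pred) + C  # still a dummy loss, as above
    return loss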
Preparing Data, Compiling Model & Training
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
m = 64  # batch_size
model = keras.Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss=LDAM(1), optimizer='sgd', metrics=['accuracy'])
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
model.fit(x_train, y_train,
          batch_size=m,
          validation_data=(x_test, y_test),
          callbacks=[FreqReWeight(1)])
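Since callbacks are not handed the batch labels, one workaround (a sketch with hypothetical names; note that fit may prefetch batches, so last_y is only approximately "the batch just trained on") is a keras.utils.Sequence that remembers the labels of the last batch it served:

import numpy as np
from tensorflow.keras.utils import Sequence

class RememberingSequence(Sequence):
    """Serves (x, y) batches and keeps a reference to the last y served."""
    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size
        self.last_y = None

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        self.last_y = self.y[sl]  # a callback holding a reference to this object can read it in on_batch_end
        return self.x[sl], self.y[sl]

# usage sketch:
# seq = RememberingSequence(x_train, y_train, m)
# model.fit(seq, callbacks=[FreqReWeight(1)])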
Solution
I ended up asking a more specific question regarding this; the answer to both can be found here.

keras stuck during optimization

After trying the Keras example on CIFAR10, I decided to go for something bigger: a VGG-like net on the Tiny ImageNet dataset. This is a subset of the ImageNet dataset with 200 classes (instead of 1000) and 100K images downscaled to 64x64.
I got the VGG-like model from the file vgg_like_convnet.py here. Unfortunately, things are going pretty much like here, except that this time changing the learning rate or swapping TH for TF does not help. Nor does changing the optimizer (see code below).
Accuracy is basically stuck at 0.005 which, as was pointed out, is what you would expect for completely random answers with 200 classes. Worse, if, by a fluke of weight initialization, it starts at, say, 0.007, it quickly converges to 0.005 and firmly stays there for every subsequent epoch.
The Keras code (TH version) is below:
from __future__ import print_function
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.regularizers import l2, activity_l2, l1, activity_l1
from keras.optimizers import SGD, Adam, Adagrad, Adadelta
from keras.utils import np_utils
import numpy as np
import cPickle as pickle
# seed = 7
# np.random.seed(seed)
batch_size = 64
nb_classes = 200
nb_epoch = 30
# input image dimensions
img_rows, img_cols = 64, 64
# the tiny image net images are RGB
img_channels = 3
# Load the train dataset for TH
print('Load training data')
X_train=pickle.load(open('xtrain_shu_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_shu_th.p','rb')) # np.zeros((100000,1)).astype('uint8')
# Load the test dataset for TH
print('Load validation data')
X_test=pickle.load(open('xtest_th.p','rb')) # np.zeros((10000,3,64,64)).astype('uint8')
y_test=pickle.load(open('ytest_th.p','rb')) # np.zeros((10000,1)).astype('uint8')
# the data, shuffled and split between train and test sets
# (X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(3,64,64)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu',))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_6'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_8'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_11'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_13'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_15'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_18'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_20'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_22'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(200, activation='softmax'))
# let's train the model using SGD + momentum (how original).
opt = SGD(lr=0.0001, decay=1e-6, momentum=0.7, nesterov=True)
# opt= Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
# opt = Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)
# opt = Adagrad(lr=0.01, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('Optimization....')
model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True)
# Save the resulting model
model.save('model.h5')
The Tiny ImageNet dataset consists of JPEG images that I converted to PPM with djpeg. I then created a large binary file containing, for each image, the class label (1 byte) followed by the pixel data (64x64x3 bytes).
Reading this file from Keras was excruciatingly slow. So (I'm very new to Python; it might sound dumb to you) I decided to initialize a 4D NumPy array of shape (100000, 3, 64, 64) (for TH; (100000, 64, 64, 3) for TF) with the dataset and pickle it. It now takes ~40s to load the dataset into the array when I run the code above.
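If loading time is the concern, one possible alternative to pickle (a sketch) is NumPy's own binary format, which loads fast and can even be memory-mapped so the array isn't read into RAM all at once:

import numpy as np

# one-time conversion
np.save('xtrain_th.npy', X_train)
# later: fast load; mmap_mode='r' maps the file instead of reading it fully
X_train = np.load('xtrain_th.npy', mmap_mode='r')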
I even checked that the pickled array contained the data in the right order with the code below:
import numpy as np
import cPickle as pickle
print("Reading data")
pix=pickle.load(open('xtrain_th.p','rb'))
print("Done")
img=67857
f=open('img'+str(img)+'.ppm','wb')
f.write('P6\n64 64\n255\n')
for y in range(0, 64):
    for x in range(0, 64):
        f.write(chr(pix[img][0][y][x]))
        f.write(chr(pix[img][1][y][x]))
        f.write(chr(pix[img][2][y][x]))
f.close()
This extracts PPM images back from the dataset.
Finally, I noticed that the training dataset was too ordered (i.e. the first 500 images all belonged to class 0, the second 500 to class 1, and so on).
So I shuffled them with the code below:
# Dataset preparation for Theano backend
import cPickle as pickle
import numpy as np
import random as rnd
n=100000
print('Load training data')
X_train=pickle.load(open('xtrain_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_th.p','rb')) # np.zeros((100000,1)).astype('uint8')
tmpa = np.zeros((3, 64, 64)).astype('uint8')
# Shuffle the data
print('Shuffling training data')
for _ in range(0, n):
    i = rnd.randrange(n)
    j = rnd.randrange(n)
    tmpa = X_train[i].copy()  # .copy() matters: X_train[i] alone is a view, so the swap would duplicate one image and lose the other
    X_train[i] = X_train[j]
    X_train[j] = tmpa
    tmp = y_train[i][0]
    y_train[i][0] = y_train[j][0]
    y_train[j][0] = tmp
print 'Pickle dump'
pickle.dump(X_train,open('xtrain_shu_th.p','wb'))
pickle.dump(y_train,open('ytrain_shu_th.p','wb'))
Nothing helped. I wasn't expecting 99% accuracy on the first attempt, but at least some movement and then a plateau.
I wanted to try TFLearn, but it had a pending bug when I looked a few days ago.
Any ideas? Thanks in advance.
You can use the built-in shuffle of the Keras model API (https://keras.io/models/model/#fit). Just set the shuffle parameter to true. You can do both batch shuffle and global shuffle; the default is global shuffle.
One thing to note, though, is that the validation split in fit is done before the shuffling takes place. Therefore, in case you want to shuffle your validation data too, I would advise you to use sklearn.utils.shuffle (http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html); see the sketch after the excerpt below.
From github:
if shuffle == 'batch':
    index_array = batch_shuffle(index_array, batch_size)
elif shuffle:
    random.shuffle(index_array)
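A sketch of the sklearn.utils.shuffle suggestion above, which shuffles images and labels with the same permutation before fit splits off any validation data:

from sklearn.utils import shuffle

# both arrays are permuted in unison; random_state makes it reproducible
X_train, Y_train = shuffle(X_train, Y_train, random_state=0)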
