View y_true of batch in Keras Callback during training - machine-learning

I am attempting to implement a custom loss function in Keras. It requires that I compute the sum of the inverse class frequencies for each y in the batch B.
It is the 1/epsilon(...) portion of the function below.
The function is from this paper - Page 7.
Note: I could most definitely be misinterpreting what the paper describes. Please let me know if I am.
I am currently trying to use a Keras Callback and the on_batch_start/end methods to determine the class frequency of the input batch (which means accessing y_true of the batch input), but am having little luck.
Thank you in advance for any help you can offer.
Edit: By "little luck" I mean I cannot find a way to access the y_true of an individual batch during training. Example: batch_size = 64, train_features.shape == (50000, 120, 20). I can access the Keras model from on_batch_start/end (self.model), but I cannot find a way to access the actual y_true of the batch, of size 64.
from tensorflow.python.keras.callbacks import Callback

class FreqReWeight(Callback):
    """
    Update learning rate by batch label frequency distribution -- for use with LDAM loss
    """

    def __init__(self, C):
        self.C = C

    def on_train_begin(self, logs=None):
        self.model.custom_val = 0

    def on_batch_end(self, batch, logs=None):
        print('batch index', batch)
        print('Model being trained', self.model)
        # how can one access the y_true of the batch?
LDAM Loss Function (shown as a formula image in the paper), where z_j = "the j-th output of the model for the j-th class".
EDIT 2
Loss function, used for testing when the loss is called:
from tensorflow.keras import backend as K

def LDAM(C):
    def loss(y_true, y_pred):
        print('shape', y_true.shape)  # only prints each epoch, not each batch
        return K.mean(y_pred) + C     # NOT LDAM, just a dummy for testing purposes
    return loss
Preparing Data, Compiling Model & Training
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

m = 64  # batch_size

model = keras.Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss=LDAM(1), optimizer='sgd', metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=m,
          validation_data=(x_test, y_test),
          callbacks=[FreqReWeight(1)])

Solution
I ended up asking a more specific question regarding this; the answer to both can be found here.
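For reference, since y_true of the current batch is passed straight into the loss function, one common workaround is to compute the per-batch class frequencies there instead of in a callback. Below is a minimal sketch under that assumption; the inverse-frequency re-weighting shown is only illustrative and is not the full LDAM loss from the paper.

from tensorflow.keras import backend as K

def inverse_freq_weighted(C):
    # Illustrative loss: weight each sample by the inverse frequency of its class
    # within the current batch. Not the LDAM formula, just a sketch of the mechanics.
    def loss(y_true, y_pred):
        # y_true is one-hot with shape (batch_size, num_classes)
        class_counts = K.sum(y_true, axis=0)                   # per-class count in this batch
        inv_freq = 1.0 / K.maximum(class_counts, 1.0)          # avoid division by zero
        sample_weight = K.sum(y_true * inv_freq, axis=-1)      # weight of each sample's class
        per_sample_loss = K.categorical_crossentropy(y_true, y_pred)
        return K.mean(sample_weight * per_sample_loss) * C
    return loss

This can be passed to model.compile(loss=inverse_freq_weighted(1), ...) in place of the dummy LDAM(1) above.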

Related

Why is my CNN model still overfitting after L2 regularization?

x_train1, x_test, y_train1, y_test = train_test_split(images, labels,test_size=0.2,random_state=42)
x_train2, x_val,y_train2,y_val = train_test_split(x_train1, y_train1,test_size=0.05,random_state=42)
Layers
model = Sequential()
model.add(Conv2D(32, (3, 3), activation = 'relu', input_shape=(128,128,1), kernel_regularizer=keras.regularizers.l2(0.005), padding ='same', name='Conv_1'))
model.add(MaxPooling2D((2,2),name='MaxPool_1'))
model.add(Conv2D(64, (3, 3), activation = 'relu',padding ='same', kernel_regularizer=keras.regularizers.l2(0.005), name='Conv_2'))
model.add(MaxPooling2D((2,2),name='MaxPool_2'))
model.add(Flatten(name='Flatten'))
model.add(Dropout(0.5,name='Dropout'))
model.add(Dense(64, kernel_initializer='normal', activation='relu', name='Dense_1'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid', name='Dense_2'))
model.summary()
Model compile
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train2, y_train2, validation_data=(x_test, y_test), batch_size=32, epochs=100)
Results
Train: accuracy = 0.939577 ; loss = 0.134506
Test: accuracy = 0.767908 ; loss = 0.8002433
Regularization is not a magical option that will simply close the gap between train and test at any "weight". One way of thinking about it: take the regularization strength, i.e. the coefficient alpha (in your case 0.005), and express the gap between train and test as a function of it, say f(alpha) (in your case f(0.005) = 0.94 - 0.76 = 0.18). The only thing we know is that f(inf) = 0. In other words, as you increase the regularization strength, the gap disappears, at the cost of the training score going down. There is no one magical form of regularization, and there is no guarantee that L2 is a good fit for your problem. You can make the gap disappear just by making the coefficient higher, but that might drive both train and test accuracy very low.
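To make that concrete, here is a minimal sketch that sweeps the L2 coefficient and prints the resulting train/test gap f(alpha). It assumes the x_train2, y_train2, x_test, y_test splits defined above, uses a shortened epoch count, and keeps only a simplified version of the architecture.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(alpha):
    # Simplified version of the architecture above, parameterized by the L2 coefficient.
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                      kernel_regularizer=regularizers.l2(alpha),
                      input_shape=(128, 128, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                      kernel_regularizer=regularizers.l2(alpha)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Trace the train/test gap f(alpha) for a few regularization strengths.
for alpha in [0.0, 0.005, 0.05, 0.5]:
    model = build_model(alpha)
    model.fit(x_train2, y_train2, epochs=20, batch_size=32, verbose=0)
    _, train_acc = model.evaluate(x_train2, y_train2, verbose=0)
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print("alpha=%.3f  train=%.3f  test=%.3f  gap=%.3f" % (alpha, train_acc, test_acc, train_acc - test_acc))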

NLP CNN with Embedding, predicting 5 values from Twitter text

I have a bunch of Twitter texts (around 70K), each around 10K words; some are shorter and some are longer. I have created a Keras architecture to predict 5 values for each text and I have trained it on those 70K. However, the accuracy (defined as follows: a prediction matches the ground truth when all 5 respective values differ by no more than 10) is 21%, i.e. 21% of the test data satisfies that condition. I am not sure that the architecture, the tokenizer and the parameters are appropriate for this problem, so I will provide the code and ask for help. I would appreciate it if someone could help me figure out why the accuracy is so low. Here is the model:
class NeuralNetMulti(Regressor):
    def __init__(self):
        self.name = 'keras-sequential'
        self.model = Sequential()
        self.num_words = 35000
        self.tokenizer = Tokenizer(num_words=self.num_words, lower=True)
        # self.earlystopping = callbacks.EarlyStopping(monitor="mae",
        #                                              mode="min", patience=5,
        #                                              restore_best_weights=True)

    def fit(self, X, y):
        print('Fitting into the neural net...')
        # n_inputs = X.shape[1]
        n_outputs = y.shape[1]
        self.tokenizer.fit_on_texts(X)
        encoded_docs = self.tokenizer.texts_to_sequences(X)
        max_length = max([len(s.split()) for s in X])
        self.max_length = max_length
        X_train = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
        vocab_size = len(self.tokenizer.word_index) + 1
        print(max_length)
        self.model.add(Embedding(self.num_words, 512, input_length=max_length))
        self.model.add(Conv1D(filters=32, kernel_size=8, activation='relu'))
        self.model.add(MaxPooling1D(pool_size=2))
        self.model.add(Conv1D(filters=16, kernel_size=4, activation='relu'))
        self.model.add(MaxPooling1D(pool_size=2))
        self.model.add(Conv1D(filters=8, kernel_size=4, activation='relu'))
        self.model.add(MaxPooling1D(pool_size=2))
        self.model.add(Flatten())
        self.model.add(Dense(1024, activation='relu'))
        self.model.add(Dense(512, activation='relu'))
        self.model.add(Dense(256, activation='relu'))
        self.model.add(Dense(n_outputs))
        self.model.summary()
        self.model.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae'])
        history = self.model.fit(X_train, y, verbose=1, epochs=5, validation_split=0.1, batch_size=16)

    def predict(self, X):
        print('Predicting...')
        encoded_docs = self.tokenizer.texts_to_sequences(X)
        X_test = pad_sequences(encoded_docs, maxlen=self.max_length, padding='post')
        predictions = self.model.predict(X_test, verbose=1)
        print('Predicted!')
        return predictions
X in this case is just an array of strings (the texts). Some are around 1000 words, but most are around 10K words. y is an array of arrays containing the 5 values I mentioned; each value is between 0 and 100. This model achieves 21% accuracy, but previously I used TF-IDF + PCA with a basic Dense network and achieved 62% accuracy on the same data. I would appreciate professional advice from anyone with experience in this field. Thank you in advance!
One option is to try pre-trained word embeddings instead of random embeddings, since you are dealing with a lot of text here. You can start with static embeddings like GloVe and then try out contextualized embeddings like BERT. Please refer to this Keras documentation to see how you can add pre-trained word vectors - link.
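As a rough illustration of that suggestion, the sketch below builds an embedding matrix from a GloVe file and swaps it in for the randomly initialized Embedding layer. The file path and dimension are assumptions, and tokenizer, num_words and max_length refer to the objects built in fit() above.

import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Embedding

embedding_dim = 100  # must match the GloVe file used
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:  # path is an assumption
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# tokenizer, num_words and max_length come from the fit() method above.
vocab_size = min(num_words, len(tokenizer.word_index) + 1)
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in tokenizer.word_index.items():
    if i < vocab_size and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]

# Use this in place of the randomly initialized Embedding layer.
embedding_layer = Embedding(vocab_size, embedding_dim,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=max_length,
                            trainable=False)

Keeping trainable=False preserves the pre-trained vectors; it can be set to True later for fine-tuning once the rest of the network has converged.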

How to use custom metrics in Keras model while using Grid Search CV?

I want to use R2 (coefficient of determination) as a metric in my Keras model. For that, I have already defined a function (coeff_determination). This function works well as a metric without GridSearchCV, but with GridSearchCV it gives an error like "The model is not configured to compute the accuracy. You should pass metrics=["accuracy"] to the model.compile() method". The code is given below.
def create_model():
    # CNN Architecture - Model 7
    model = Sequential()
    model.add(Convolution1D(filters=10, kernel_size=12, activation="relu", kernel_initializer="glorot_uniform", input_shape=(X_train.shape[1], 1)))
    model.add(MaxPooling1D(pool_size=4, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=16, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=3, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=22, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=3, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=28, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=4, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=34, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=3, strides=2))
    model.add(BatchNormalization())
    model.add(Convolution1D(filters=40, kernel_size=12, activation='relu', kernel_initializer="glorot_uniform"))
    model.add(MaxPooling1D(pool_size=3, strides=2))
    model.add(BatchNormalization())
    model.add(Flatten())
    # model.add(Dropout(0.35))
    model.add(Dense(130, activation='relu'))
    # model.add(Dropout(0.35))
    model.add(Dense(130, activation='relu'))
    model.add(Dense(1, activation='linear'))
    history = History()
    model.compile(loss='mean_squared_error', optimizer=Adam(lr=0.0001), metrics=[coeff_determination])
    # model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=400, batch_size=30, callbacks=[history])
    return model

def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return (1 - SS_res / (SS_tot + K.epsilon()))

# to reproduce the same results next time
seed = 7
np.random.seed(seed)
# Creating Keras model with the scikit-learn wrapper
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [20, 30, 40, 80]
epochs = [100, 200, 300, 400]
# Using make_scorer to convert the metric r_2 to a scorer
my_scorer = make_scorer(r2_score, greater_is_better=True)
# passing dictionaries of parameters to the GridSearchCV
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, scoring=my_scorer, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train)
# summarizing the results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
I think you need to feed your custom scoring function as the scoring parameter to GridSearchCV; otherwise it will fall back to the estimator's default scoring method, which is accuracy.
From Documentation:
scoring: str, callable, list/tuple or dict, default=None
A single str (see The scoring parameter: defining model evaluation rules) or a callable (see Defining your scoring strategy from metric functions) to evaluate the predictions on the test set.
For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.
NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each.
See Specifying multiple metrics for evaluation for an example.
If None, the estimator’s score method is used.
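For completeness, here is a minimal sketch of that pattern. Note that it wraps create_model in KerasRegressor rather than KerasClassifier, an assumption that matches the continuous target and avoids the accuracy-based default score; the wrapper import path depends on your Keras/TensorFlow version.

from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

# Wrap the model-building function in a regressor and hand the R2 scorer to GridSearchCV.
model = KerasRegressor(build_fn=create_model, verbose=0)
my_scorer = make_scorer(r2_score, greater_is_better=True)

param_grid = {'batch_size': [20, 30], 'epochs': [100, 200]}
grid = GridSearchCV(estimator=model, param_grid=param_grid,
                    scoring=my_scorer, cv=3, n_jobs=1)
grid_result = grid.fit(X_train, y_train)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))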

What is the best Neural Network architecture for mapping one input image to two outputs?

I have generated a data set using EMNIST that has either one character per image or two characters per image. Each image is sized at 28x56 (h x w).
I basically want to predict the one or two characters in a given image. I am not sure which architecture to follow to implement this. There are 62 character classes.
Example images: single character, two characters.
For a single character, y = [23]
For two characters, y = [35, 11]
I tried the following:
1. I tried implementing this through a CTC, but I got stuck in an infinite loss that I couldn't fix.
2. I padded the single-character ground truths with 62 to denote a blank character and trained a CNN with the following layers.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_train = sequence.pad_sequences(y_train, padding='post', value=62)
y_test = sequence.pad_sequences(y_test, padding='post', value=62)
X_train = X_train / 255.0
X_test = X_test / 255.0

input_shape = (28, 56, 1)
model = Sequential()
model.add(Conv2D(filters=72, kernel_size=(11, 11), padding='same', activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(filters=144, kernel_size=(7, 7), padding='same', activation='relu'))
model.add(Conv2D(filters=144, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(512, activation='relu'))
model.add(Dense(units=2, activation='relu'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.summary()

batch_size = 128
steps = math.ceil(X_train.shape[0] / batch_size)
datagen = ImageDataGenerator(
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,   # divide each input by its std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=0,                     # randomly rotate images in the range (degrees, 0 to 180)
    zoom_range=0.2,                       # randomly zoom image
    width_shift_range=0.2,                # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
    horizontal_flip=False,                # randomly flip images
    vertical_flip=False)

history = model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size),
                              epochs=6, validation_data=(X_test, y_test),
                              verbose=1, steps_per_epoch=steps)
I was able to reach an accuracy of around 90% for the validation set. However, when I feed in a generated image to see its prediction, it is a few characters off from the correct classification. Is there something wrong with the way I have created the model or pre-processed the data?
I have recognized my error: I tried to tackle this as a regression problem, whereas it is actually a classification problem.
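As a follow-up to that realization, here is a sketch of one classification formulation (the trunk layers are illustrative and not tuned): a shared CNN with two softmax heads, each predicting one of the 62 characters or the blank/padding class (63 classes total), trained on the padded integer labels from above.

from tensorflow import keras
from tensorflow.keras import layers

num_classes = 63  # 62 characters + 1 blank/padding class
inputs = keras.Input(shape=(28, 56, 1))
x = layers.Conv2D(72, (11, 11), padding='same', activation='relu')(inputs)
x = layers.MaxPooling2D((2, 2), strides=2)(x)
x = layers.Conv2D(144, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(512, activation='relu')(x)

# One softmax head per character position.
char1 = layers.Dense(num_classes, activation='softmax', name='char1')(x)
char2 = layers.Dense(num_classes, activation='softmax', name='char2')(x)

model = keras.Model(inputs, [char1, char2])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer labels per head
              metrics=['accuracy'])
# model.fit(X_train, [y_train[:, 0], y_train[:, 1]], validation_data=(X_test, [y_test[:, 0], y_test[:, 1]]), ...)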

How to check the learning rate with train_on_batch [Keras]

I am using Keras on Python 2.
Does anyone know how to check and modify the learning rate for the Adam optimizer? Here is my neural network, and I defined my own optimizer. When training on batches with model.train_on_batch(...) I have no way to track the learning rate. Thanks for your help.
def CNN_model():
    # Create model
    model = Sequential()
    model.add(Conv2D(12, (5, 5), input_shape=(1, 256, 256), activation='elu'))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Conv2D(12, (5, 5), activation='elu'))
    model.add(MaxPooling2D(pool_size=(4, 4)))
    model.add(Conv2D(12, (3, 3), activation='elu'))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Flatten())
    model.add(Dropout(0.3))
    model.add(Dense(128, activation='elu'))
    model.add(Dropout(0.3))
    model.add(Dense(32, activation='elu'))
    model.add(Dense(2, activation='softmax'))
    # Compile model
    my_optimizer = Adam(lr=0.001, decay=0.05)
    model.compile(loss='categorical_crossentropy', optimizer=my_optimizer, metrics=['accuracy'])
    return model
You can do it in several ways. The simplest, in my mind, is to do it through a callback:
from keras.callbacks import Callback
from keras import backend as K

class showLR(Callback):
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.get_value(self.model.optimizer.lr))
        print("epoch={:02d}, lr={:.5f}".format(epoch, lr))
You can use the ReduceLROnPlateau callback: add it to your callbacks list and pass that list to your training call.
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

callbacks = [ReduceLROnPlateau(monitor='val_acc',
                               patience=5,
                               verbose=1,
                               factor=0.5,
                               min_lr=0.00001)]

model = CNN_model()
model.fit(x_train, y_train, batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_valid, y_valid),
          callbacks=callbacks)
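Since the question specifically uses model.train_on_batch, below is a minimal sketch of reading (and, if desired, modifying) the learning rate between batch updates. Here `batches` stands in for your own batch iterator, and the effective-rate formula assumes the legacy Keras Adam `decay` behaviour, where optimizer.lr itself stays fixed.

from keras import backend as K

model = CNN_model()
for step, (x_batch, y_batch) in enumerate(batches):  # `batches` is your own batch iterator
    loss, acc = model.train_on_batch(x_batch, y_batch)
    base_lr = K.get_value(model.optimizer.lr)
    iterations = K.get_value(model.optimizer.iterations)
    decay = K.get_value(model.optimizer.decay)
    # With `decay`, the stored lr does not change; the effective rate shrinks each update.
    effective_lr = base_lr / (1.0 + decay * iterations)
    print("step={:d} lr={:.6f} effective_lr={:.6f}".format(step, base_lr, effective_lr))
    # To modify the base learning rate on the fly:
    # K.set_value(model.optimizer.lr, 0.0005)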
