Why CNN model after regularizer L2 overfitting? - machine-learning

x_train1, x_test, y_train1, y_test = train_test_split(images, labels,test_size=0.2,random_state=42)
x_train2, x_val,y_train2,y_val = train_test_split(x_train1, y_train1,test_size=0.05,random_state=42)
Layers
model = Sequential()
model.add(Conv2D(32, (3, 3), activation = 'relu', input_shape=(128,128,1), kernel_regularizer=keras.regularizers.l2(0.005), padding ='same', name='Conv_1'))
model.add(MaxPooling2D((2,2),name='MaxPool_1'))
model.add(Conv2D(64, (3, 3), activation = 'relu',padding ='same', kernel_regularizer=keras.regularizers.l2(0.005), name='Conv_2'))
model.add(MaxPooling2D((2,2),name='MaxPool_2'))
model.add(Flatten(name='Flatten'))
model.add(Dropout(0.5,name='Dropout'))
model.add(Dense(64, kernel_initializer='normal', activation='relu', name='Dense_1'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid', name='Dense_2'))
model.summary()
Model compile
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train2, y_train2,validation_data=(x_test, y_test),batch_size=32, epochs=100 )
**
Results
**
Train: accuracy = 0.939577 ; loss = 0.134506
Test: accuracy = 0.767908 ; loss = 0.8002433

Regularization is not a magical option that will just close the gap between train and test at any "weight". One way of thinking about this is that when you take the strength of regularistaion, so a cofficiant alpha (in your case =0.005) and then express the gap between train and test as a function of it, say f(x) (in your case f(0.005) = 0.94-0.76 = 0.18), then the only thing we know is that f(inf) = 0. In other words, as you increase regularization strength, the gap disappears (at the cost of trainin score going down). There is no one magical form of regularistaion, and there is no guarantee L2 is good for your problem. You can make the gap disappear by just making the weight higher, but it might lead to bot trian and test going very low.

Related

NLP CNN with Embedding, predicting 5 values from Twitter text

I have a bunch of twitter texts (around 70K) that are around 10K words. Some are less and some are more. I have created a Keras architecture to predict 5 values for each Twitter texts and I have trained on those 70K. However, the accuracy (which is defined as follows: a match of pred1 and pred2 happens when all respective 5 values are with no more than 10 difference) is 21% (21% of the test data comply to the mentioned condition). I am not sure that the architecture, the tokenizer and the parameters are appropriate for this problem, but I will provide the code and ask for help. I would appreciate if someone could help me figure out why the accuracy is so low. Here is the model:
class NeuralNetMulti(Regressor):
def __init__(self):
self.name = 'keras-sequential'
self.model = Sequential()
self.num_words = 35000
self.tokenizer = Tokenizer(num_words=self.num_words, lower=True)
# self.earlystopping = callbacks.EarlyStopping(monitor="mae",
# mode="min", patience=5,
# restore_best_weights=True)
def fit(self, X, y):
print('Fitting into the neural net...')
#n_inputs = X.shape[1]
n_outputs = y.shape[1]
self.tokenizer.fit_on_texts(X)
encoded_docs = self.tokenizer.texts_to_sequences(X)
max_length = max([len(s.split()) for s in X])
self.max_length = max_length
X_train = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
vocab_size = len(self.tokenizer.word_index) + 1
print(max_length)
self.model.add(Embedding(self.num_words, 512, input_length=max_length))
self.model.add(Conv1D(filters=32, kernel_size=8, activation='relu'))
self.model.add(MaxPooling1D(pool_size=2))
self.model.add(Conv1D(filters=16, kernel_size=4, activation='relu'))
self.model.add(MaxPooling1D(pool_size=2))
self.model.add(Conv1D(filters=8, kernel_size=4, activation='relu'))
self.model.add(MaxPooling1D(pool_size=2))
self.model.add(Flatten())
self.model.add(Dense(1024, activation='relu'))
self.model.add(Dense(512, activation='relu'))
self.model.add(Dense(256, activation='relu'))
self.model.add(Dense(n_outputs))
self.model.summary()
self.model.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae'])
history = self.model.fit(X_train, y, verbose=1, epochs=5, validation_split=0.1, batch_size=16)
def predict(self, X):
print('Predicting...')
encoded_docs = self.tokenizer.texts_to_sequences(X)
X_test = pad_sequences(encoded_docs, maxlen=self.max_length, padding='post')
predictions = self.model.predict(X_test, verbose=1)
print('Predicted!')
return predictions
X in this case is just an array of strings (the texts). They could be 1000 words, but most of them are around 10K words. y is array of arrays with 5 values that I mentioned. Each of them is between 0 and 100. This model achieves 21% accuracy, but previously I used a TfIdf + PCA and basic Dense network and I achieved 62% accuracy on the same data. I would appreciate anyone experience in this field to give a professional advice. Thank you in advance!
One option is to try using pre-trained word embeddings instead of random embeddings since you are dealing with lots of text here. You can start with static embeddings like GloVe and then try out contextualized embeddings like BERT. Please refer to this Keras documentation to see how you can add pre-trained word vectors - link.

1D Convolutions CNN Keras

I'm very new to Keras and I'm tying to implement a CNN using 1D convolutions for binary classification on the raw time series data. Each training example has 160 time steps and I have 120 training examples. The training data is of shape (120,160). Here is the code:
X_input = Input((160,1))
X = Conv1D(6, 5, strides=1, name='conv1', kernel_initializer=glorot_uniform(seed=0))(X_input)
X = Activation('relu')(X)
X = MaxPooling1D(2, strides=2)(X)
X = Conv1D(16, 5, strides=1, name='conv2', kernel_initializer=glorot_uniform(seed=0))(X)
X = Activation('relu')(X)
X = MaxPooling1D(2, strides=2)(X)
X = Flatten()(X)
X = Dense(120, activation='relu', name='fc1', kernel_initializer=glorot_uniform(seed=0))(X)
X = Dense(84, activation='relu', name='fc2', kernel_initializer=glorot_uniform(seed=0))(X)
X = Dense(2, activation='sigmoid', name='fc3', kernel_initializer=glorot_uniform(seed=0))(X)
model = Model(inputs=X_input, outputs=X, name='model')
X_train = X_train.reshape(-1,160,1) # shape (120,160,1)
t_train = y_train.reshape(-1,1,1) # shape (120,1,1)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train)
The error that I get is expected fc3 to have 2 dimensions, but got array with shape (120, 1, 1).
I tried removing each layer and just leaving the 'conv1' component but I get the error expected conv1 to have shape (156, 6) but got array with shape (1, 1). It seems like my input shape is wrong; however, looking at other examples it seems that this worked for other people.
I think the issue is not your inputs, but rather your targets.
The output of the model is 2 dimensions, but when it checks against the targets, it realizes that the targets are in an array with shape (120, 1, 1).
You can try changing the y_train reshape line as follows (fyi, it also seems that you accidentally typed t_train instead of y_train):
y_train = y_train.reshape(-1,1)
Also, it seems that you probably want to use 1 instead of 2 for the last Dense layer (see Difference between Dense(2) and Dense(1) as the final layer of a binary classification CNN?)

Negative weights and biases in siamese keras model

I try to train a siamese model in keras. I use a really simple encoder with only covnets to encode a 32x32 RGB picture into a feature vector. The encoder encodes two pictures A and B. Then a MLP compares the two vectors and computes a score between 0 and 1 which is should be high if A and B are of the same class and low if they are not.
I used relu as the activation function on all layers but the model only learned to encode everything into a 0-vector. I switched to 'tanh' and saw that a lot of weight and biases, and also the entries in the feature-vector are negative. So i now understand why with relu everything was zero. But how come i get negative values? The input is positive, output as well and y-values are 0 or 1. I think there is something wrong with my model.
It doesn't perform very well either. It gets to around 60% accuracy.
Here is my model:
def model():
initializer = keras.initializers.random_uniform(minval=0.0001, maxval=0.001)
enc = Sequential()
enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh',kernel_initializer=initializer))
enc.add(Conv2D(32, (3, 3), padding='same', strides=(2, 2), activation='tanh',kernel_initializer=initializer))
enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh',kernel_initializer=initializer))
enc.add(Conv2D(16, (3, 3), padding='same', strides=(2, 2), activation='tanh',kernel_initializer=initializer))
enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh',kernel_initializer=initializer))
enc.add(Conv2D(4, (3, 3), padding='same', strides=(2, 2), activation='tanh',kernel_initializer=initializer))
enc.add(Flatten())
input1 = Input((32, 32, 3))
# enc.build((1,32,32,3))
# enc.summary()
input2 = Input((32, 32, 3))
enc1 = enc(input1)
enc2 = enc(input2)
twin = concatenate([enc1, enc2])
twin = Dense(64, activation='tanh',kernel_initializer=initializer)(twin)
twin = Dense(32, activation='tanh',kernel_initializer=initializer)(twin)
twin = Dense(1, activation='sigmoid',kernel_initializer=initializer)(twin)
twin = Model(inputs=[input1, input2], outputs=twin)
twin.summary()
twin.compile(optimizer=adam(0.0001), loss='binary_crossentropy', metrics=["acc"])
return twin
Edit: I found out it was all good. Just my data was bad. I had only 1/10's samples of one class compared to the others. Oversampling didn't help. I removed the class from the dataset for now. Its working. I might add the class back in with augmented copies as additional samples and see how it goes.

View y_true of batch in Keras Callback during training

I am attempting to implement a custom loss functoin in Keras. It requires that I compute the sum of the inverse class frequencies for each y in B
It is the 1/epsilon(...) portion of the below function
The functoin is from this paper - Page 7
Note: I most definitely could be misinterpreting what the paper is describing to do. Please let me know if I am
I am currently trying to use a Keras Callback and the on_batch_start/end methods to try and determine the class frequency of the input batch (which means accessing y_true of the batch input), but am having little luck.
Thank you in advance for any help you can offer.
Edit: By "little luck" I mean I cannot find a way to access the y_true of an individual batch during training. Example: batch_size = 64, train_features.shape == (50000, 120, 20), I cannot find a way to access the y_true of an individual batch during training. I can access the keras model from on_batch_start/end (self.model), but I cannot find a way to access the actual y_true of the batch, size 64.
from tensorflow.python.keras.callbacks import Callback
class FreqReWeight(Callback):
"""
Update learning rate by batch label frequency distribution -- for use with LDAM loss
"""
def __init__(self, C):
self.C = C
def on_train_begin(self, logs={}):
self.model.custom_val = 0
def on_batch_end(self, batch, logs=None):
print('batch index', batch)
print('Model being trained', self.model)
# how can one access the y_true of the batch?
LDAM Loss Function
zj = "the j-th output of the model for the j-th class"
EDIT2
Loss Function - for testing when loss is called
def LDAM(C):
def loss(y_true, y_pred):
print('shape', y_true.shape) # only prints each epoch, not each batch
return K.mean(y_pred) + C # NOT LDAM, just dummy for testing purposes
return loss
Preparing Data, Compiling Model & Training
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
m = 64 # batch_size
model = keras.Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss=LDAM(1), optimizer='sgd', metrics=['accuracy'])
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
model.fit(x_train, y_train,
batch_size=m,
validation_data=(x_test, y_test),
callbacks=[FreqReWeight(1)])
Solution
Ended up asking a more specific question regarding this.
Answer to both can be found here

What is the best Neural Network architecture for mapping one input image to two outputs?

I have generated a data set using EMNIST that has one character per image or two characters per image.The image is sized at 28x56(hxw)
I basically want to predict the one or two characters in a given image. I am not sure on which architecture to follow to implement this. There are 62 character classes.
ex:-single character two characters
For single character y= [23]
For two characters y= [35,11]
I tried the following.
I tried implementing this thorough a CTC but I got stuck in a infinite loss that I couldn't fix.
Padded the single character ground truths with 62 to note a blank character and trained a CNN with following layers.
print()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_train = sequence.pad_sequences(y_train, padding='post', value = 62)
y_test = sequence.pad_sequences(y_test,padding='post', value = 62)
X_train = X_train/255.0
X_test = X_test/255.0
input_shape = (28, 56, 1)
model = Sequential()
model.add(Conv2D(filters=72, kernel_size=(11,11), padding = 'same', activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2,2),strides=2))
model.add(Conv2D(filters=144, kernel_size=(7,7) , padding = 'same', activation='relu'))
model.add(Conv2D(filters=144, kernel_size=(3,3) , padding = 'same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(512, activation='relu'))
model.add(Dense(units=2, activation='relu'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.summary()
batch_size = 128
steps = math.ceil(X_train.shape[0]/batch_size)
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
rotation_range=0, # randomly rotate images in the range (degrees, 0 to 180)
zoom_range = 0.2, # Randomly zoom image
width_shift_range=0.2, # randomly shift images horizontally (fraction of total width)
height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
horizontal_flip=False, # randomly flip images
vertical_flip=False)
history = model.fit_generator(datagen.flow(X_train,y_train, batch_size=batch_size),
epochs = 6, validation_data = (X_test, y_test),
verbose = 1,steps_per_epoch=steps)
I was able to reach an accuracy of around 90% for validation set. However when I feed a generated image to see it's prediction it's a few characters off from the correct classification. Is there something wrong in the way I have created the model or pre-processed the data?
I have recognized my error. I have tried to tackle the problem using regression method wheres the problem is a classification problem.

Resources