Negative weights and biases in siamese keras model - machine-learning

I'm trying to train a siamese model in Keras. I use a really simple encoder with only convnets to encode a 32x32 RGB picture into a feature vector. The encoder encodes two pictures A and B. Then an MLP compares the two vectors and computes a score between 0 and 1, which should be high if A and B are of the same class and low if they are not.
I used ReLU as the activation function on all layers, but the model only learned to encode everything into a zero vector. I switched to 'tanh' and saw that a lot of the weights and biases, and also the entries in the feature vector, are negative. So I now understand why everything was zero with ReLU. But how come I get negative values? The input is positive, the output as well, and the y-values are 0 or 1. I think there is something wrong with my model.
It doesn't perform very well either; it only gets to around 60% accuracy.
Here is my model:
import keras
from keras.models import Sequential, Model
from keras.layers import Input, Conv2D, Flatten, Dense, concatenate
from keras.optimizers import Adam

def model():
    initializer = keras.initializers.random_uniform(minval=0.0001, maxval=0.001)

    # Shared encoder: convolutions only, flattened into a feature vector
    enc = Sequential()
    enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(32, (3, 3), padding='same', strides=(2, 2), activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(16, (3, 3), padding='same', strides=(2, 2), activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(32, (3, 3), padding='same', activation='tanh', kernel_initializer=initializer))
    enc.add(Conv2D(4, (3, 3), padding='same', strides=(2, 2), activation='tanh', kernel_initializer=initializer))
    enc.add(Flatten())
    # enc.build((1, 32, 32, 3))
    # enc.summary()

    # Both inputs go through the same encoder instance (shared weights)
    input1 = Input((32, 32, 3))
    input2 = Input((32, 32, 3))
    enc1 = enc(input1)
    enc2 = enc(input2)

    # MLP head that compares the two embeddings
    twin = concatenate([enc1, enc2])
    twin = Dense(64, activation='tanh', kernel_initializer=initializer)(twin)
    twin = Dense(32, activation='tanh', kernel_initializer=initializer)(twin)
    twin = Dense(1, activation='sigmoid', kernel_initializer=initializer)(twin)

    twin = Model(inputs=[input1, input2], outputs=twin)
    twin.summary()
    twin.compile(optimizer=Adam(0.0001), loss='binary_crossentropy', metrics=['acc'])
    return twin
Edit: I found out it was all good; just my data was bad. I had only about a tenth as many samples of one class compared to the others, and oversampling didn't help. I removed that class from the dataset for now, and it's working. I might add the class back in with augmented copies as additional samples and see how it goes.
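For the "augmented copies" idea, here is a minimal sketch of how the rare class could be padded out before pairs are formed. It assumes the data sits in an image array X of shape (N, 32, 32, 3) with a flat array y of integer class labels; the helper name augment_minority is made up for illustration:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def augment_minority(X, y, minority_class, target_count):
    # Pad out a rare class with randomly augmented copies of its own images.
    # X: (N, 32, 32, 3) images, y: (N,) integer class labels.
    gen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
    X_min = X[y == minority_class]
    needed = target_count - len(X_min)
    if needed <= 0:
        return X, y
    flow = gen.flow(X_min, batch_size=len(X_min), shuffle=True)
    copies, total = [], 0
    while total < needed:
        batch = next(flow)           # one batch of augmented copies
        copies.append(batch)
        total += len(batch)
    X_new = np.concatenate(copies)[:needed]
    y_new = np.full(len(X_new), minority_class)
    return np.concatenate([X, X_new]), np.concatenate([y, y_new])

The augmented set can then be used to sample same-class / different-class pairs as before.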

Related

Why is my CNN model still overfitting after L2 regularization?

from sklearn.model_selection import train_test_split

x_train1, x_test, y_train1, y_test = train_test_split(images, labels, test_size=0.2, random_state=42)
x_train2, x_val, y_train2, y_val = train_test_split(x_train1, y_train1, test_size=0.05, random_state=42)
Layers
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1),
                 kernel_regularizer=keras.regularizers.l2(0.005), padding='same', name='Conv_1'))
model.add(MaxPooling2D((2, 2), name='MaxPool_1'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                 kernel_regularizer=keras.regularizers.l2(0.005), name='Conv_2'))
model.add(MaxPooling2D((2, 2), name='MaxPool_2'))
model.add(Flatten(name='Flatten'))
model.add(Dropout(0.5, name='Dropout'))
model.add(Dense(64, kernel_initializer='normal', activation='relu', name='Dense_1'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid', name='Dense_2'))
model.summary()
Model compile
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train2, y_train2, validation_data=(x_test, y_test),  # note: validates on the test split
                    batch_size=32, epochs=100)
Results
Train: accuracy = 0.939577 ; loss = 0.134506
Test: accuracy = 0.767908 ; loss = 0.8002433
Regularization is not a magical option that will just close the gap between train and test at any "weight". One way of thinking about it: take the regularization strength, a coefficient alpha (in your case 0.005), and express the gap between train and test as a function of it, say f(alpha) (in your case f(0.005) = 0.94 - 0.76 = 0.18). The only thing we know is that f(inf) = 0. In other words, as you increase the regularization strength, the gap eventually disappears (at the cost of the training score going down). There is no single magical form of regularization, and there is no guarantee that L2 is a good fit for your problem. You can make the gap disappear just by making the weight higher, but that might drive both the train and test scores very low. A sketch of such a sweep is shown below.
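To make that concrete, here is a rough sketch of sweeping the coefficient and watching the gap f(alpha). The build_model helper is introduced here only for illustration (it parameterizes the architecture from the question by the L2 coefficient); the alpha values and the reduced epoch count are arbitrary, and x_train2/y_train2/x_test/y_test come from the split above:

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

def build_model(l2_coef):
    # Same architecture as in the question, with the L2 coefficient as a parameter
    reg = keras.regularizers.l2(l2_coef)
    m = Sequential()
    m.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1),
                 kernel_regularizer=reg, padding='same'))
    m.add(MaxPooling2D((2, 2)))
    m.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_regularizer=reg))
    m.add(MaxPooling2D((2, 2)))
    m.add(Flatten())
    m.add(Dropout(0.5))
    m.add(Dense(64, activation='relu'))
    m.add(Dense(1, activation='sigmoid'))
    m.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return m

# Sweep the coefficient and record the train/test gap
for alpha in [0.0005, 0.005, 0.05, 0.5]:
    m = build_model(alpha)
    m.fit(x_train2, y_train2, batch_size=32, epochs=20, verbose=0)  # fewer epochs to keep the sweep quick
    train_acc = m.evaluate(x_train2, y_train2, verbose=0)[1]
    test_acc = m.evaluate(x_test, y_test, verbose=0)[1]
    print('alpha=%g  train=%.3f  test=%.3f  gap=%.3f'
          % (alpha, train_acc, test_acc, train_acc - test_acc))

Typically both the gap and the training accuracy shrink as alpha grows, which is the trade-off described above.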

1D Convolutions CNN Keras

I'm very new to Keras and I'm trying to implement a CNN using 1D convolutions for binary classification on raw time-series data. Each training example has 160 time steps and I have 120 training examples, so the training data has shape (120, 160). Here is the code:
from keras.models import Model
from keras.layers import Input, Conv1D, Activation, MaxPooling1D, Flatten, Dense
from keras.initializers import glorot_uniform

X_input = Input((160, 1))
X = Conv1D(6, 5, strides=1, name='conv1', kernel_initializer=glorot_uniform(seed=0))(X_input)
X = Activation('relu')(X)
X = MaxPooling1D(2, strides=2)(X)
X = Conv1D(16, 5, strides=1, name='conv2', kernel_initializer=glorot_uniform(seed=0))(X)
X = Activation('relu')(X)
X = MaxPooling1D(2, strides=2)(X)
X = Flatten()(X)
X = Dense(120, activation='relu', name='fc1', kernel_initializer=glorot_uniform(seed=0))(X)
X = Dense(84, activation='relu', name='fc2', kernel_initializer=glorot_uniform(seed=0))(X)
X = Dense(2, activation='sigmoid', name='fc3', kernel_initializer=glorot_uniform(seed=0))(X)

model = Model(inputs=X_input, outputs=X, name='model')

X_train = X_train.reshape(-1, 160, 1)  # shape (120, 160, 1)
t_train = y_train.reshape(-1, 1, 1)    # shape (120, 1, 1)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train)
The error that I get is expected fc3 to have 2 dimensions, but got array with shape (120, 1, 1).
I tried removing each layer and just leaving the 'conv1' component but I get the error expected conv1 to have shape (156, 6) but got array with shape (1, 1). It seems like my input shape is wrong; however, looking at other examples it seems that this worked for other people.
I think the issue is not your inputs, but rather your targets.
The output of the model is 2 dimensions, but when it checks against the targets, it realizes that the targets are in an array with shape (120, 1, 1).
You can try changing the y_train reshape line as follows (fyi, it also seems that you accidentally typed t_train instead of y_train):
y_train = y_train.reshape(-1,1)
Also, it seems that you probably want to use 1 instead of 2 for the last Dense layer (see Difference between Dense(2) and Dense(1) as the final layer of a binary classification CNN?)
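Putting both suggestions together, the tail end of the script might look like the sketch below. This is not the original poster's final code; it assumes the layers above fc3 are unchanged and switches to a single sigmoid unit with binary cross-entropy:

# Single output unit for binary classification
X = Dense(1, activation='sigmoid', name='fc3', kernel_initializer=glorot_uniform(seed=0))(X)
model = Model(inputs=X_input, outputs=X, name='model')

X_train = X_train.reshape(-1, 160, 1)  # (120, 160, 1)
y_train = y_train.reshape(-1, 1)       # (120, 1), matching the single output unit

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train)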

I have this error "input 0 is incompatible with layer lstm expected ndim=3 found ndim=5"

I am very new to this field. I searched on the internet but could not find a solution, so I would appreciate any help from people who work in this area.
My model
def load_VGG16_model():
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(256, 256, 3))
    print("Model loaded..!")
    return base_model
Summary of the model
load_VGG16_model().summary()
Adding Layers
def action_model(shape=(30, 256, 256, 3), nbout=len(classes)):
    convnet = load_VGG16_model()
    model = Sequential()
    model.add(TimeDistributed(convnet, input_shape=shape))
    model.add(LSTM(30, return_sequences=True, input_shape=(30, 512)))  # the error points to this line
    top_model.add(Dense(4096, activation='relu', W_regularizer=l2(0.1)))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(4096, activation='relu', W_regularizer=l2(0.1)))
    top_model.add(Dropout(0.5))
    model.add(Dense(nbout, activation='softmax'))
    return model
model.add(LSTM(30,return_sequences=True,input_shape=(30,512))) ==> the error shows this line.
Your problem is similar to this one: Building CNN + LSTM in Keras for a regression problem. What are proper shapes?
Using a Reshape layer before the LSTM should work fine for you:
def action_model(shape=(256, 256, 3), nbout=len(classes)):
    convnet = load_VGG16_model()
    model = Sequential()
    model.add(convnet)
    model.add(tf.keras.layers.Reshape((8 * 8, 512)))  # shape comes from the last output of the convnet
    model.add(LSTM(30, return_sequences=True, input_shape=(8 * 8, 512)))
    model.add(Dense(4096, activation='relu', kernel_regularizer=l2(0.1)))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu', kernel_regularizer=l2(0.1)))
    model.add(Dropout(0.5))
    model.add(Dense(nbout, activation='softmax'))
    return model
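A quick sanity check on the resulting shapes (an illustration only; it assumes classes is defined as in the post, that the TensorFlow-bundled Keras is used throughout, and nbout=5 is an arbitrary value):

m = action_model(nbout=5)
m.summary()
# Expected per-layer output shapes: (None, 8, 8, 512) from VGG16,
# (None, 64, 512) after the Reshape, (None, 64, 30) after the LSTM,
# and (None, 64, 5) at the softmax, i.e. one prediction per "time step".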

View y_true of batch in Keras Callback during training

I am attempting to implement a custom loss function in Keras. It requires that I compute the sum of the inverse class frequencies for each y in B.
It is the 1/epsilon(...) portion of the below function
The function is from this paper - Page 7
Note: I most definitely could be misinterpreting what the paper describes; please let me know if I am.
I am currently trying to use a Keras Callback and the on_batch_begin/on_batch_end methods to determine the class frequency of the input batch (which means accessing y_true of the batch input), but am having little luck.
Thank you in advance for any help you can offer.
Edit: By "little luck" I mean I cannot find a way to access the y_true of an individual batch during training. Example: batch_size = 64, train_features.shape == (50000, 120, 20), I cannot find a way to access the y_true of an individual batch during training. I can access the keras model from on_batch_start/end (self.model), but I cannot find a way to access the actual y_true of the batch, size 64.
from tensorflow.python.keras.callbacks import Callback

class FreqReWeight(Callback):
    """
    Update learning rate by batch label frequency distribution -- for use with LDAM loss
    """

    def __init__(self, C):
        self.C = C

    def on_train_begin(self, logs={}):
        self.model.custom_val = 0

    def on_batch_end(self, batch, logs=None):
        print('batch index', batch)
        print('Model being trained', self.model)
        # how can one access the y_true of the batch?
LDAM Loss Function
zj = "the j-th output of the model for the j-th class"
EDIT2
Loss Function - for testing when loss is called
def LDAM(C):
    def loss(y_true, y_pred):
        print('shape', y_true.shape)  # only prints each epoch, not each batch
        return K.mean(y_pred) + C     # NOT LDAM, just a dummy for testing purposes
    return loss
Preparing Data, Compiling Model & Training
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

m = 64  # batch_size

model = keras.Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss=LDAM(1), optimizer='sgd', metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=m,
          validation_data=(x_test, y_test),
          callbacks=[FreqReWeight(1)])
Solution
Ended up asking a more specific question regarding this.
Answer to both can be found here
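For readers landing here: one way to get at the per-batch label distribution without a callback is to compute it from y_true inside the loss itself, since the loss receives the whole batch. The sketch below is not the linked answer and not the full LDAM loss, just an illustration of inverse-frequency weighting; C is kept only to mirror the signature used above:

import tensorflow as tf
import tensorflow.keras.backend as K

def freq_weighted_loss(C):
    def loss(y_true, y_pred):
        # y_true is one-hot with shape (batch_size, num_classes)
        counts = tf.reduce_sum(y_true, axis=0)              # per-class count in this batch
        freqs = counts / tf.reduce_sum(counts)              # per-class frequency in this batch
        inv = 1.0 / tf.maximum(freqs, K.epsilon())          # inverse frequency, guarded against 0
        per_sample_w = tf.reduce_sum(y_true * inv, axis=-1) # weight of each sample's true class
        ce = K.categorical_crossentropy(y_true, y_pred)
        return K.mean(per_sample_w * ce) * C
    return loss

It would be plugged in the same way as the dummy loss above, e.g. model.compile(loss=freq_weighted_loss(1), optimizer='sgd', metrics=['accuracy']).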

ValueError: strides should be of length 1, 1 or 3 but was 2

train input shape : (13974, 100, 6, 5)
train output shape : (13974, 1,1)
test input shape : (3494, 100, 6, 5)
test output shape : (3494, 1, 1)
I am developing the following 2D CNN + LSTM model.
model = Sequential()
model.add(TimeDistributed(Conv2D(1, (1, 1), activation='relu', input_shape=(6, 5, 1))))
model.add(TimeDistributed(MaxPooling2D(pool_size=(6, 5))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=False, input_shape=(100, 1)))
model.add(Dense(1))
When I try to fit it as follows:
model.fit(train_input,train_output,epochs=50,batch_size=60)
it gives me an error:
ValueError: strides should be of length 1, 1 or 3 but was 2
Please correct my model. I am converting the 6x5 image into a single unit and predicting the 101st time stamp from the 100 time stamps.
Your question is quite unclear, but I believe you have a sequence of 100 images of size 6 x 5. It is better to use Conv3D in your use case, and there is no need to wrap everything in TimeDistributed. This is just an illustration for your use case; you may have to add more Conv and MaxPool layers and experiment with other hyper-parameters to get a good fit. A compile/fit sketch follows the code below.
# Add the channel dimension to the input
train_input = np.expand_dims(train_input, -1)
# Remove the extra dimension from the output
train_output = np.reshape(train_output, (-1, 1))

model = Sequential()
model.add(Conv3D(1, (1, 1, 1), activation='relu', input_shape=(100, 6, 5, 1)))
model.add(MaxPooling3D(pool_size=(6, 5, 1)))
model.add(Reshape((16, 5)))
model.add(LSTM(units=300, return_sequences=False))
model.add(Dense(1))
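For completeness, a minimal compile/fit sketch for this regression setup; the mean-squared-error loss and Adam optimizer are assumptions, not part of the original answer:

model.compile(optimizer='adam', loss='mse')
model.fit(train_input, train_output, epochs=50, batch_size=60)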
