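I built an MLP in Keras using the code below.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import RandomNormal
input_dim = 784  # number of input features, implied by the 200,960 params of the first layer
output_dim = 10  # number of classes
model_relu = Sequential()
model_relu.add(Dense(256, activation='relu', input_shape=(input_dim,), kernel_initializer=RandomNormal(mean=0.0, stddev=0.062, seed=None)))
model_relu.add(Dense(128, activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.125, seed=None)))
model_relu.add(Dense(64, activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.07, seed=None)))
model_relu.add(Dense(output_dim, activation='softmax'))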
The summary is:
Model: "sequential_19"
Layer (type)          Output Shape    Param #
dense_49 (Dense)      (None, 256)     200960
dense_50 (Dense)      (None, 128)     32896
dense_51 (Dense)      (None, 64)      8256
dense_52 (Dense)      (None, 10)      650
I want to know how many hidden layers this MLP has. Should we say it has 3 hidden layers or 4?
Is the total number of layers 5 (input + 3 hidden + 1 output (softmax))?
You have 1 input layer with 256 neurons, 2 hidden layers with 128 and 64 neurons and finally you have 1 output layer with 10 neurons.
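The Param # column can be checked by hand, and doing so recovers input_dim: a Dense layer has (inputs + 1) × units parameters (one weight per input per unit, plus one bias per unit), so 200,960 / 256 = 785 means 784 input features. A minimal plain-Python sketch; dense_params is a helper name made up here, not a Keras function:

```python
def dense_params(n_inputs, units, use_bias=True):
    """Parameters of a Dense layer: an n_inputs x units kernel plus one bias per unit."""
    return n_inputs * units + (units if use_bias else 0)

# Widths along the network: 784 input features (implied by 200,960 = 785 * 256),
# then the 256-, 128- and 64-unit ReLU layers and the 10-unit softmax layer.
widths = [784, 256, 128, 64, 10]
counts = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(counts)       # [200960, 32896, 8256, 650]
print(sum(counts))  # 242762
```

Only the four Dense layers carry parameters; the 784 input features have none of their own.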
I am trying to copy a model architecture. In the original architecture, the last Dense layer has output shape (None, 3) with 300 params, as shown below:
Layer (type)          Output Shape    Param #
dense_Dense1 (Dense)  (None, 100)     128100
dense_Dense2 (Dense)  (None, 3)       300
But when I apply the Dense layer, I get output shape (None, 3) with 303 params, as shown below:
Layer (type)          Output Shape    Param #
dense_35 (Dense)      (None, 100)     128100
dense_36 (Dense)      (None, 3)       303
This is the code I wrote for this part:
x = GlobalAveragePooling2D()(x)
x = Dense(100, activation="relu")(x)
prediction = Dense(3, activation='softmax')(x)
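Is it possible that the architecture you're trying to copy doesn't use bias? A Dense(3) layer on 100 inputs has 100 × 3 = 300 weights plus 3 biases, which accounts for your 303. Try not using bias:
Dense(3, activation='softmax', use_bias=False)
For example, on a toy input with 8 channels, the 100-unit layer keeps its bias (8 × 100 + 100 = 900 params) while the bias-free output layer has exactly 300: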
Model: "sequential_5"
Layer (type)                                         Output Shape    Param #
global_average_pooling2d_3 (GlobalAveragePooling2D)  (None, 8)       0
dense_9 (Dense)                                      (None, 100)     900
dense_10 (Dense)                                     (None, 3)       300
Total params: 1,200
Trainable params: 1,200
Non-trainable params: 0
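The same arithmetic as a small plain-Python sketch. dense_params is a made-up helper, and the 1280 pooled features feeding the first Dense layer are an assumption inferred from its 128,100 params, not something stated in the question:

```python
def dense_params(n_inputs, units, use_bias=True):
    """Parameters of a Dense layer: an n_inputs x units kernel plus an optional bias vector."""
    return n_inputs * units + (units if use_bias else 0)

# First Dense layer: 128,100 params fits 1280 pooled features with a bias (1280 is inferred).
first_layer = dense_params(1280, 100)                # 128100
# Output layer: 100 inputs -> 3 units.
with_bias = dense_params(100, 3)                     # 303, the Keras default (use_bias=True)
without_bias = dense_params(100, 3, use_bias=False)  # 300, matches the original model
# Toy model above: 8 channels -> Dense(100) with bias -> Dense(3) without bias.
toy_total = dense_params(8, 100) + dense_params(100, 3, use_bias=False)  # 1200
print(first_layer, with_bias, without_bias, toy_total)
```

So the original model appears to keep the bias in its hidden layer and drop it only in the output layer.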
I'm trying to use the Google model from the Teachable Machine application https://teachablemachine.withgoogle.com/ by adding a few more layers before the output layer.
When I retrain the model, it always returns this error:
ValueError: Input 0 of layer dense_25 is incompatible with the layer: expected axis -1 of input shape to have value 5 but received input with shape [20, 512]
Here's my approach:
When I retrain the model, it returns the error:
If I retrain the model without adding new layers, it works fine.
Can anybody advise what the issue is?
If you want to add layers between two layers of a pre-trained model, it is not as straightforward as adding layers with the add method. Doing so can result in unexpected behavior.
Analysis of the error:
If you build the model as you specified:
model.layers[-1].add(Dense(512, activation="relu"))
model.add(Dense(128, activation="relu"))
Output of model summary:
Model: "sequential_12"
Layer (type)                Output Shape    Param #
sequential_9 (Sequential)   (None, 1280)    410208
sequential_11 (Sequential)  (None, 512)     131672
dense_12 (Dense)            (None, 128)     768
dense_13 (Dense)            (None, 32)      4128
dense_14 (Dense)            (None, 5)       165
Total params: 546,941
Trainable params: 532,861
Non-trainable params: 14,080
Everything looks fine here, but on closer inspection:
for l in model.layers:
    print("layer : ", l.name, ", expects input of shape : ", l.input_shape)
Output:
layer : sequential_9 , expects input of shape : (None, 224, 224, 3)
layer : sequential_11 , expects input of shape : (None, 1280)
layer : dense_12 , expects input of shape : (None, 5) <-- **PROBLEM**
layer : dense_13 , expects input of shape : (None, 128)
layer : dense_14 , expects input of shape : (None, 32)
The PROBLEM here is that dense_12 expects an input of shape (None, 5), but it should expect (None, 512), since we added Dense(512) to sequential_11. A likely reason is that adding layers this way does not update some attributes, such as the output shape of sequential_11, so during the forward pass there is a mismatch between the output of sequential_11 and the input of dense_12 (dense_25 in your case). The 768 params of dense_12 (5 × 128 + 128) confirm it was built for a 5-wide input.
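The check done by eye above can be written as a small sketch: walk consecutive layers and compare each layer's expected input width with the previous layer's output width. The tuples are transcribed by hand from the printouts, no Keras is involved, and find_mismatches is a hypothetical helper:

```python
# (layer name, expected input width, output width), transcribed by hand from the
# printouts above; None marks the image input of sequential_9, which is not a flat width.
chain = [
    ("sequential_9", None, 1280),
    ("sequential_11", 1280, 512),
    ("dense_12", 5, 128),
    ("dense_13", 128, 32),
    ("dense_14", 32, 5),
]

def find_mismatches(chain):
    """Return (producer, consumer, produced width, expected width) for every broken link."""
    problems = []
    for (prev_name, _, prev_out), (name, expected_in, _) in zip(chain, chain[1:]):
        if expected_in is not None and expected_in != prev_out:
            problems.append((prev_name, name, prev_out, expected_in))
    return problems

print(find_mismatches(chain))  # [('sequential_11', 'dense_12', 512, 5)]
```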
A possible workaround:
As for your question about adding layers between sequential_9 and sequential_11: you can add as many layers as you want between them, but always make sure that the output shape of the last added layer matches the input shape expected by sequential_11, which here is 1280.
Code:
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
sequential_1 = model.layers[0]  # re-using pre-trained model (sequential_9)
sequential_2 = model.layers[1]  # (sequential_11)
inp_sequential_1 = Input(sequential_1.layers[0].input_shape[1:])
out_sequential_1 = sequential_1(inp_sequential_1)
# adding layers in between sequential_9 and sequential_11
out_intermediate = Dense(512, activation="relu")(out_sequential_1)
out_intermediate = Dense(128, activation="relu")(out_intermediate)
out_intermediate = Dense(32, activation="relu")(out_intermediate)
# always make sure to include a layer whose output shape matches the input shape of sequential_11, in this case 1280
out_intermediate = Dense(1280, activation="relu")(out_intermediate)
output = sequential_2(out_intermediate)  # output of the intermediate layers is fed to sequential_11
final_model = Model(inputs=inp_sequential_1, outputs=output)
Output of model summary:
Model: "functional_3"
Layer (type)                Output Shape            Param #
input_5 (InputLayer)        [(None, 224, 224, 3)]   0
sequential_9 (Sequential)   (None, 1280)            410208
dense_15 (Dense)            (None, 512)             655872
dense_16 (Dense)            (None, 128)             65664
dense_17 (Dense)            (None, 32)              4128
dense_18 (Dense)            (None, 1280)            42240
sequential_11 (Sequential)  (None, 5)               128600
Total params: 1,306,712
Trainable params: 1,292,632
Non-trainable params: 14,080
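The Param # values of the new layers, and the requirement that the last one output 1280 features for sequential_11, can be double-checked by hand. A plain-Python sketch; dense_params is a made-up helper, and the sizes of the two pre-trained blocks are copied from the summary:

```python
def dense_params(n_inputs, units):
    """Dense layer with bias: n_inputs * units weights plus units biases."""
    return n_inputs * units + units

# Widths along the new path: sequential_9 output (1280) -> 512 -> 128 -> 32 -> 1280,
# where the final 1280 must equal the input width expected by sequential_11.
widths = [1280, 512, 128, 32, 1280]
new_params = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(new_params)  # [655872, 65664, 4128, 42240]

# Pre-trained blocks, taken from the summary: sequential_9 and sequential_11.
total = 410208 + sum(new_params) + 128600
print(total)  # 1306712
```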
I am attempting a single-label classification problem with num_classes = 73.
Here's my simplified Keras model:
from keras.models import Sequential
from keras.layers import Conv1D, Dense
from keras.optimizers import SGD
num_classes = 73
batch_size = 4
train_data_list = [training_file_names list here..]
validation_data_list = [ validation_file_names list here..]
training_generator = DataGenerator(train_data_list, batch_size, num_classes)
validation_generator = DataGenerator(validation_data_list, batch_size, num_classes)
model = Sequential()
model.add(Conv1D(32, 3, strides=1, input_shape=(15, 120), activation="relu"))
model.add(Conv1D(16, 3, strides=1, activation="relu"))
model.add(Dense(num_classes, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.fit_generator(generator=training_generator, epochs=100,
Here's my DataGenerator's __getitem__ method:
def __getitem__(self, index):
    X = np.zeros((self.batch_size, 15, 120))
    y = np.zeros((self.batch_size, 1, self.n_classes))
    for i in range(self.batch_size):
        X_row = some_method_that_gives_X_of_15x120_dim()
        target = some_method_that_gives_target()
        one_hot = keras.utils.to_categorical(target, num_classes=self.n_classes)
        X[i] = X_row
        y[i] = one_hot
    return X, y
Since my X values are correctly returned with dimension (batch_size, 15, 120), I am not showing it here. My issue is with the y value returned.
The y returned by this generator method has shape (batch_size, 1, 73): one-hot encoded labels for the 73 classes, which I think is the correct shape to return.
However, Keras gives the following error for the last layer:
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (4, 1, 73)
Since the batch size is 4, I think the target batch should also be 3-dimensional (4, 1, 73). Why, then, is Keras expecting the last layer to have 2 dimensions?
Your model's summary shows that the output layer has only 2 dimensions, (None, 73):
Layer (type)          Output Shape    Param #
conv1d_7 (Conv1D)     (None, 13, 32)  11552
conv1d_8 (Conv1D)     (None, 11, 16)  1552
flatten_5 (Flatten)   (None, 176)     0
dense_4 (Dense)       (None, 73)      12921
Total params: 26,025
Trainable params: 26,025
Non-trainable params: 0
Since your target has shape (batch_size, 1, 73), you can just change it to (batch_size, 73) for your model to run.
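Inside the generator this amounts to allocating y with np.zeros((self.batch_size, self.n_classes)) instead of the 3-D shape, or dropping the middle axis before returning. A NumPy sketch of both routes; the label values are made up for illustration:

```python
import numpy as np

batch_size, num_classes = 4, 73
targets = np.array([5, 0, 72, 13])  # made-up integer class labels for one batch

# Option 1: allocate y as 2-D and set one entry per row (one-hot encoding).
y = np.zeros((batch_size, num_classes))
y[np.arange(batch_size), targets] = 1.0
print(y.shape)  # (4, 73)

# Option 2: if a (batch_size, 1, num_classes) array already exists, drop the middle axis.
y_3d = y[:, np.newaxis, :]       # (4, 1, 73), the shape Keras rejected
y_2d = np.squeeze(y_3d, axis=1)  # (4, 73)
print(y_2d.shape, np.array_equal(y_2d, y))  # (4, 73) True
```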
I am finetuning a 3D-CNN called C3D which was originally trained to classify sports from video clips.
I am freezing the convolution (feature extraction) layers and training the fully connected layers using gifs from GIPHY to classify the gifs for sentiment analysis (positive or negative).
Weights are preloaded for all layers except the final fully connected layer.
I am using 5000 images (2500 positive, 2500 negative) for training with a 70/30 training/testing split using Keras. I am using the Adam optimizer with a learning rate of 0.0001.
The training accuracy increases and the training loss decreases during training, but very early on the validation accuracy and loss stop improving as the model starts to overfit.
I believe I have enough training data, and I am using a dropout of 0.5 on both fully connected layers, so how can I combat this overfitting?
The model architecture, training code and visualisations of training performance from Keras can be found below.
from training.c3d_model import create_c3d_sentiment_model
from ImageSentiment import load_gif_data
import numpy as np
import pathlib
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
def image_generator(files, batch_size):
    """
    Generate batches of images for training instead of loading all images into memory
    :param files:
    :param batch_size:
    """
    while True:
        # Select files (paths/indices) for the batch
        batch_paths = np.random.choice(a=files,
        batch_input = []
        batch_output = []
        # Read in each input, perform preprocessing and get labels
        for input_path in batch_paths:
            input = load_gif_data(input_path)
            if "pos" in input_path:  # if file name contains pos
                output = np.array([1, 0])  # label
            elif "neg" in input_path:  # if file name contains neg
                output = np.array([0, 1])  # label
            batch_input += [input]
            batch_output += [output]
        # Return a tuple of (input, output) to feed the network
        batch_x = np.array(batch_input)
        batch_y = np.array(batch_output)
        yield (batch_x, batch_y)
model = create_c3d_sentiment_model()
model.load_weights('models/C3D_Sport1M_weights_keras_2.2.4.h5', by_name=True)
for layer in model.layers[:14]:  # freeze the first 14 layers (conv1 to pool5) as a feature extractor
    layer.trainable = False
for layer in model.layers[14:]:  # fine tune final layers
    layer.trainable = True
train_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_train').glob('**/*')]
val_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_validation').glob('**/*')]
batch_size = 8
train_generator = image_generator(train_files, batch_size)
validation_generator = image_generator(val_files, batch_size)
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', verbose=1)
history = model.fit_generator(train_generator, validation_data=validation_generator,
steps_per_epoch=int(np.ceil(len(train_files) / batch_size)),
validation_steps=int(np.ceil(len(val_files) / batch_size)), epochs=5, shuffle=True,
def load_gif_data(file_path):
    """
    Load and process gif for input into Keras model
    :param file_path:
    :return: Mean normalised image in BGR format as numpy array
    for more info see -> http://cs231n.github.io/neural-networks-2/
    """
    im = Img(fp=file_path)
    im.load(limit=16,  # Keras image model only requires 16 frames
    print("Error loading image: " + file_path)
    im.resize(size=(112, 112))
    np_frames = []
    frame_index = 0
    for i in range(16):  # if image is less than 16 frames, repeat the frames until there are 16
        frame = im.frames[frame_index]
        rgb = np.array(frame)
        bgr = rgb[..., ::-1]
        mean = np.mean(bgr, axis=0)
        np_frames.append(bgr - mean)  # C3D model was originally trained on BGR, mean normalised images
        # it is important that unseen images are in the same format
        frame_index = (frame_index + 1) % len(im.frames)  # wrap back to frame 0 after the last frame
    return np.array(np_frames)
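The comment in load_gif_data says short GIFs should have their frames repeated until there are 16; that cycling rule is plain modular indexing. A sketch of just the index sequence (frame_indices is a hypothetical helper, separate from the GIF loading itself):

```python
def frame_indices(n_frames, n_out=16):
    """Indices of the GIF frames to use, cycling back to frame 0 when the GIF is short."""
    return [i % n_frames for i in range(n_out)]

print(frame_indices(5))   # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0]
print(frame_indices(20) == list(range(16)))  # True: a long GIF just uses its first 16 frames
```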
Model architecture:
Layer (type)              Output Shape                Param #
conv1 (Conv3D)            (None, 16, 112, 112, 64)    5248
pool1 (MaxPooling3D)      (None, 16, 56, 56, 64)      0
conv2 (Conv3D)            (None, 16, 56, 56, 128)     221312
pool2 (MaxPooling3D)      (None, 8, 28, 28, 128)      0
conv3a (Conv3D)           (None, 8, 28, 28, 256)      884992
conv3b (Conv3D)           (None, 8, 28, 28, 256)      1769728
pool3 (MaxPooling3D)      (None, 4, 14, 14, 256)      0
conv4a (Conv3D)           (None, 4, 14, 14, 512)      3539456
conv4b (Conv3D)           (None, 4, 14, 14, 512)      7078400
pool4 (MaxPooling3D)      (None, 2, 7, 7, 512)        0
conv5a (Conv3D)           (None, 2, 7, 7, 512)        7078400
conv5b (Conv3D)           (None, 2, 7, 7, 512)        7078400
zeropad5 (ZeroPadding3D)  (None, 2, 8, 8, 512)        0
pool5 (MaxPooling3D)      (None, 1, 4, 4, 512)        0
flatten_1 (Flatten)       (None, 8192)                0
fc6 (Dense)               (None, 4096)                33558528
dropout_1 (Dropout)       (None, 4096)                0
fc7 (Dense)               (None, 4096)                16781312
dropout_2 (Dropout)       (None, 4096)                0
nfc8 (Dense)              (None, 2)                   8194
Total params: 78,003,970
Trainable params: 78,003,970
Non-trainable params: 0
[Figure: training visualisations]
I think the error is in the loss function and in the last Dense layer. As shown in the model summary, the last Dense layer is:
nfc8 (Dense) (None, 2)
The output shape is (None, 2), meaning that the layer has 2 units. As you said earlier, you need to classify GIFs as positive or negative.
Classifying GIFs can be framed either as a binary classification problem or as a multiclass classification problem (with two classes).
A binary classifier has only 1 unit in the last Dense layer, with a sigmoid activation function. Here, however, the model has 2 units in the last Dense layer.
Hence, the model is a multiclass classifier, but you have used binary_crossentropy as the loss, which is meant for binary classifiers (with a single unit in the last layer).
So, replacing the loss with categorical_crossentropy should work. Alternatively, change the last Dense layer to a single unit with a sigmoid activation (and use 0/1 labels instead of one-hot pairs).
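The two consistent pairings of output layer and loss can be compared numerically. In this NumPy sketch with made-up logits, a 2-unit softmax scored with categorical cross-entropy gives the same per-sample loss as a 1-unit sigmoid scored with binary cross-entropy, when the single logit equals the difference of the two. The point is that the last layer and the loss are chosen as a pair:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract the row max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up logits for three GIFs from a 2-unit output layer.
logits = np.array([[2.0, -1.0], [0.5, 0.3], [-1.5, 1.0]])
labels_onehot = np.array([[1, 0], [0, 1], [0, 1]])  # [1, 0] = pos, [0, 1] = neg, as in image_generator

# Pairing 1: 2 units + softmax + categorical cross-entropy.
p2 = softmax(logits)
cce = -np.sum(labels_onehot * np.log(p2), axis=-1)

# Pairing 2: 1 unit + sigmoid + binary cross-entropy, with label 1 = pos.
p1 = sigmoid(logits[:, 0] - logits[:, 1])
y = labels_onehot[:, 0]
bce = -(y * np.log(p1) + (1 - y) * np.log(1 - p1))

print(np.allclose(cce, bce))  # True
```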
Hope this helps.
I use Keras 2.2.4. I train a model that I want to fine-tune every 30 epochs with new data content (image classification).
Every day I add more images to the classes to feed the model. Every 30 epochs the model is re-trained.
I use two conditions: the first applies if no model has been trained yet, and the second when a model is already trained and I want to fine-tune it with new content/classes.
import os
import keras
from keras import optimizers
from keras.layers import BatchNormalization, Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model, load_model
model_base = keras.applications.vgg19.VGG19(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
output = GlobalAveragePooling2D()(model_base.output)
# If we resume a pretrained model load it
if os.path.isfile(os.path.join(MODEL_PATH, 'weights.h5')):
    print('Using existing weights...')
    base_lr = 0.0001
    model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
    output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers[:-2]:
        layer.trainable = False
else:
    # No previous model: build a new head on top of VGG19
    base_lr = 0.001
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(2048, activation='relu', name='d1')(output)
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers[:-5]:
        layer.trainable = False
opt = optimizers.Adam(lr=base_lr, decay=base_lr / epochs)
Model summary first time:
block5_conv4 (Conv2D)                                (None, 14, 14, 512)  2359808
block5_pool (MaxPooling2D)                           (None, 7, 7, 512)    0
global_average_pooling2d_1 (GlobalAveragePooling2D)  (None, 512)          0
batch_normalization_1 (BatchNormalization)           (None, 512)          2048
dropout_1 (Dropout)                                  (None, 512)          0
d1 (Dense)                                           (None, 2048)         1050624
batch_normalization_2 (BatchNormalization)           (None, 2048)         8192
dropout_2 (Dropout)                                  (None, 2048)         0
d2 (Dense)                                           (None, 19)           38931
Total params: 21,124,179
Trainable params: 10,533,907
Non-trainable params: 10,590,272
Model summary second time:
block5_conv4 (Conv2D)                                (None, 14, 14, 512)  2359808
block5_pool (MaxPooling2D)                           (None, 7, 7, 512)    0
global_average_pooling2d_1 (GlobalAveragePooling2D)  (None, 512)          0
d2 (Dense)                                           (None, 19)           9747
Total params: 20,034,131
Trainable params: 2,369,555
Non-trainable params: 17,664,576
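Reproducing the Trainable params lines shows what the two summaries contain: the second model is VGG19 plus a single d2 layer on the 512 pooled features, with nothing from the first run's head. A plain-Python sketch; the helper names are made up here, and the layer sizes come from the summaries and the layers[:-5] / layers[:-2] freezing in the code:

```python
def dense_params(n_inputs, units):
    return n_inputs * units + units  # weights + biases

# VGG19 block5 convolution: 3x3 kernel, 512 -> 512 channels, with bias.
conv_block5 = 3 * 3 * 512 * 512 + 512  # 2,359,808, as listed for block5_conv4

# BatchNormalization keeps 4 values per channel; only gamma and beta (2 per channel) are trainable.
def bn_trainable(channels):
    return 2 * channels

# First run: layers[:-5] frozen, so block5_conv1..block5_conv4 train, plus the whole head.
first_trainable = (4 * conv_block5 + bn_trainable(512) + dense_params(512, 2048)
                   + bn_trainable(2048) + dense_params(2048, 19))
# Second run: layers[:-2] frozen, so only block5_conv4 trains, plus a lone d2 on 512 features.
second_trainable = conv_block5 + dense_params(512, 19)
print(first_trainable, second_trainable)  # 10533907 2369555
```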
Problem: when a model exists and is loaded for fine-tuning, it seems to have lost all the additional layers added the first time (Dense 2048, Dropout, etc.).
Do I need to add these layers again? That doesn't seem to make sense, as it would lose the training information from the first pass.
Note: I may not need to set base_lr, since saving a model should also save the learning rate at the point where training stopped, but I will check this later.
Please note that once you load the model:
model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
You don't use it; you just overwrite it:
model = Model(model_base.input, output)
Here, output is also built on top of model_base, not on the loaded model.
It seems to me that you just want to delete the lines after load_model.
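The control flow the answer suggests (keep the loaded model instead of rebuilding a head) can be sketched without Keras. build_fn and load_fn are hypothetical stand-ins: in the question's code, load_fn would be load_model and build_fn the branch that builds VGG19 plus the new head:

```python
import os

def get_model(weights_path, build_fn, load_fn):
    """Return (model, base_lr): the saved model if one exists, otherwise a freshly built one.

    build_fn and load_fn are hypothetical stand-ins for the question's Keras code.
    """
    if os.path.isfile(weights_path):
        # Use the loaded model as-is: its d1, BatchNormalization, Dropout and d2
        # layers come back with their trained weights, so no new head is attached.
        return load_fn(weights_path), 0.0001
    # No saved model yet: build the full architecture from scratch.
    return build_fn(), 0.001
```

The design choice is simply that the loading branch returns early, so nothing after it can replace the loaded model.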