To generate Google Deep Dream-like images, I am trying to modify input images by running gradient ascent on an InceptionV3 network.
Desired effect:
(for more info on this, refer to [
For this purpose, I fine-tuned an Inception network using transfer learning and saved the resulting model as inceptionv3-ft.model.
model.summary() prints the following architecture (shortened here due to space limitations):
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, None, None, 3 0
conv2d_1 (Conv2D) (None, None, None, 3 864 input_1[0][0]
batch_normalization_1 (BatchNor (None, None, None, 3 96 conv2d_1[0][0]
activation_1 (Activation) (None, None, None, 3 0 batch_normalization_1[0][0]
conv2d_2 (Conv2D) (None, None, None, 3 9216 activation_1[0][0]
batch_normalization_2 (BatchNor (None, None, None, 3 96 conv2d_2[0][0]
activation_2 (Activation) (None, None, None, 3 0 batch_normalization_2[0][0]
conv2d_3 (Conv2D) (None, None, None, 6 18432 activation_2[0][0]
batch_normalization_3 (BatchNor (None, None, None, 6 192 conv2d_3[0][0]
activation_3 (Activation) (None, None, None, 6 0 batch_normalization_3[0][0]
max_pooling2d_1 (MaxPooling2D) (None, None, None, 6 0 activation_3[0][0]
conv2d_4 (Conv2D) (None, None, None, 8 5120 max_pooling2d_1[0][0]
batch_normalization_4 (BatchNor (None, None, None, 8 240 conv2d_4[0][0]
activation_4 (Activation) (None, None, None, 8 0 batch_normalization_4[0][0]
conv2d_5 (Conv2D) (None, None, None, 1 138240 activation_4[0][0]
batch_normalization_5 (BatchNor (None, None, None, 1 576 conv2d_5[0][0]
activation_5 (Activation) (None, None, None, 1 0 batch_normalization_5[0][0]
max_pooling2d_2 (MaxPooling2D) (None, None, None, 1 0 activation_5[0][0]
conv2d_9 (Conv2D) (None, None, None, 6 12288 max_pooling2d_2[0][0]
batch_normalization_9 (BatchNor (None, None, None, 6 192 conv2d_9[0][0]
activation_9 (Activation) (None, None, None, 6 0 batch_normalization_9[0][0]
conv2d_7 (Conv2D) (None, None, None, 4 9216 max_pooling2d_2[0][0]
conv2d_10 (Conv2D) (None, None, None, 9 55296 activation_9[0][0]
batch_normalization_7 (BatchNor (None, None, None, 4 144 conv2d_7[0][0]
batch_normalization_10 (BatchNo (None, None, None, 9 288 conv2d_10[0][0]
activation_7 (Activation) (None, None, None, 4 0 batch_normalization_7[0][0]
activation_10 (Activation) (None, None, None, 9 0 batch_normalization_10[0][0]
average_pooling2d_1 (AveragePoo (None, None, None, 1 0 max_pooling2d_2[0][0]
conv2d_6 (Conv2D) (None, None, None, 6 12288 max_pooling2d_2[0][0]
mixed9_1 (Concatenate) (None, None, None, 7 0 activation_88[0][0]
concatenate_2 (Concatenate) (None, None, None, 7 0 activation_92[0][0]
activation_94 (Activation) (None, None, None, 1 0 batch_normalization_94[0][0]
mixed10 (Concatenate) (None, None, None, 2 0 activation_86[0][0]
global_average_pooling2d_1 (Glo (None, 2048) 0 mixed10[0][0]
dense_1 (Dense) (None, 1024) 2098176 global_average_pooling2d_1[0][0]
dense_2 (Dense) (None, 1) 1025 dense_1[0][0]
Total params: 23,901,985
Trainable params: 18,315,137
Non-trainable params: 5,586,848
I am now using the following settings and code to try to activate specific high-level layers so that full objects emerge in the input image:
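import numpy as np
import scipy.ndimage
from keras import backend as K
from keras.models import load_model

settings = {
    'features': {
        'mixed2': 0.,
        'mixed3': 0.,
        'mixed4': 0.,
        'mixed10': 0.,  # highest
    },
}
model = load_model('inceptionv3-ft.model')
dream = model.input  # input tensor that gradient ascent will modify

# Get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers])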
# Define the loss.
loss = K.variable(0.)
for layer_name in settings['features']:
    # Add the L2 norm of the features of a layer to the loss.
    assert layer_name in layer_dict.keys(), 'Layer ' + layer_name + ' not found in model.'
    coeff = settings['features'][layer_name]
    x = layer_dict[layer_name].output
    print(x)
    # We avoid border artifacts by only involving non-border pixels in the loss.
    scaling = K.prod(K.cast(K.shape(x), 'float32'))
    if K.image_data_format() == 'channels_first':
        loss += coeff * K.sum(K.square(x[:, :, 2: -2, 2: -2])) / scaling
    else:
        loss += coeff * K.sum(K.square(x[:, 2: -2, 2: -2, :])) / scaling
# Compute the gradients of the dream wrt the loss.
grads = K.gradients(loss, dream)[0]
# Normalize gradients.
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())
# Set up function to retrieve the value
# of the loss and gradients given an input image.
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)
def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values

def resize_img(img, size):
    img = np.copy(img)
    if K.image_data_format() == 'channels_first':
        factors = (1, 1,
                   float(size[0]) / img.shape[2],
                   float(size[1]) / img.shape[3])
    else:
        factors = (1,
                   float(size[0]) / img.shape[1],
                   float(size[1]) / img.shape[2],
                   1)
    return scipy.ndimage.zoom(img, factors, order=1)

def gradient_ascent(x, iterations, step, max_loss=None):
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break  # stop once the loss exceeds the cap
        print('..Loss value at', i, ':', loss_value)
        x += step * grad_values
    return x

def save_img(img, fname):
    pil_img = deprocess_image(np.copy(img))
    scipy.misc.imsave(fname, pil_img)
- Load the original image.
- Define a number of processing scales (i.e. image shapes),
from smallest to largest.
- Resize the original image to the smallest scale.
- For every scale, starting with the smallest (i.e. current one):
- Run gradient ascent
- Upscale image to the next scale
- Reinject the detail that was lost at upscaling time
- Stop when we are back to the original size.
To obtain the detail lost during upscaling, we simply
take the original image, shrink it down, upscale it,
and compare the result to the (resized) original image.
# Playing with these hyperparameters will also allow you to achieve new effects
step = 0.01  # Gradient ascent step size
num_octave = 3  # Number of scales at which to run gradient ascent
octave_scale = 1.4  # Size ratio between scales
iterations = 20  # Number of ascent steps per scale
max_loss = 10.

img = preprocess_image(base_image_path)
if K.image_data_format() == 'channels_first':
    original_shape = img.shape[2:]
else:
    original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])

for shape in successive_shapes:
    print('Processing image shape', shape)
    img = resize_img(img, shape)
    img = gradient_ascent(img,
                          iterations=iterations,
                          step=step,
                          max_loss=max_loss)
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img
    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)

save_img(img, fname=result_prefix + '.png')
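The scale schedule used by the script can be checked in isolation. The following is a minimal sketch in plain Python, assuming a hypothetical 400x600 input; the helper name `octave_shapes` is introduced here for illustration and is not part of the original script.

```python
def octave_shapes(original_shape, num_octave, octave_scale):
    """Return the processing shapes ordered from smallest to largest."""
    shapes = [tuple(original_shape)]
    for i in range(1, num_octave):
        # Octave i shrinks every dimension by octave_scale ** i.
        shapes.append(tuple(int(dim / (octave_scale ** i)) for dim in original_shape))
    return shapes[::-1]


# Hypothetical input size; the hyperparameters match the ones in the script.
shapes = octave_shapes((400, 600), num_octave=3, octave_scale=1.4)
print(shapes)
```

Gradient ascent would run on each of these shapes in turn, with the last entry being the original size.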
But no matter how I tweak the settings, I only seem to activate low-level features such as edges and curves or, at best, mixed features.
Ideally, the settings should be able to address individual layers down to channels and units (e.g., Layer4c, Unit 0), but I haven't found any method in the Keras documentation to achieve that.
see this:
I have learned that the Caffe framework gives you more flexibility, but installing it system-wide is dependency hell.
So, how do I activate individual classes on this network within the Keras framework, or any other framework other than Caffe?
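One way to think about targeting a single channel is to restrict the objective to one slice of a layer's activations. The sketch below illustrates the idea with NumPy on a made-up activation array. The function `channel_objective` and the array are hypothetical, and in a Keras script the same slicing would have to be applied to the symbolic layer output rather than to a NumPy array.

```python
import numpy as np


def channel_objective(activations, channel, border=2):
    """Mean activation of one channel, ignoring a border (channels_last layout)."""
    # activations has shape (batch, height, width, channels).
    inner = activations[:, border:-border, border:-border, channel]
    return float(inner.mean())


# Made-up activations: channel 3 is strongly active, all other channels are zero.
acts = np.zeros((1, 8, 8, 16), dtype="float32")
acts[..., 3] = 2.0

print(channel_objective(acts, channel=3))
print(channel_objective(acts, channel=0))
```

Whether maximizing such a per-channel value produces full objects on this particular fine-tuned model is an assumption, not something the thread confirms.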
What worked for me was the following:
To avoid installing Caffe and all of its dependencies on my machine, I pulled this Docker image with all deep learning frameworks in it.
Within minutes I had Caffe (as well as Keras, TensorFlow, CUDA, Theano, Lasagne, Torch, and OpenCV) installed in a container with a folder shared with my host machine.
I then ran this Caffe script -->
Deep Dream, and voilà.
Models generated by Caffe are more resourceful and allow classes, as stated above, to be 'printed' on input images or generated from noise.
I have a dataset with more than 4000 images and 3 classes. I am reusing code for a capsule neural network that was written for 10 classes, which I modified to use 3 classes. When I run the model, the following error occurs at the last step of the first epoch (44/45):
Epoch 1/16
44/45 [============================>.] - ETA: 28s - loss: 0.2304 - capsnet_loss: 0.2303 - decoder_loss: 0.2104 - capsnet_accuracy: 0.6598 - decoder_accuracy: 0.5781
InvalidArgumentError: Incompatible shapes: [15,3] vs. [100,3]
[[node gradient_tape/margin_loss/mul/Mul (defined at <ipython-input-22-9d913bd0e1fd>:11) ]] [Op:__inference_train_function_6157]
Function call stack:
Training code:
m = 100
epochs = 16
# Using EarlyStopping, end training when val_accuracy is not improved for 10 consecutive times
early_stopping = keras.callbacks.EarlyStopping(monitor='val_capsnet_accuracy',mode='max',
# Using ReduceLROnPlateau, the learning rate is reduced by half when val_accuracy is not improved for 5 consecutive times
lr_scheduler = keras.callbacks.ReduceLROnPlateau(monitor='val_capsnet_accuracy',mode='max',factor=0.5,patience=4)
train_model.compile(optimizer=keras.optimizers.Adam(lr=0.001),loss=[margin_loss,'mse'],loss_weights = [1. ,0.0005],metrics=['accuracy'])
train_model.fit([x_train, y_train],[y_train,x_train], batch_size = m, epochs = epochs, validation_data = ([x_test, y_test],[y_test,x_test]),callbacks=[early_stopping,lr_scheduler])
The model is:
Model: "model"
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) [(100, 28, 28, 1)] 0
conv2d (Conv2D) (100, 27, 27, 256) 1280 input_1[0][0]
max_pooling2d (MaxPooling2D) (100, 27, 27, 256) 0 conv2d[0][0]
conv2d_1 (Conv2D) (100, 19, 19, 128) 2654336 max_pooling2d[0][0]
conv2d_2 (Conv2D) (100, 6, 6, 128) 1327232 conv2d_1[0][0]
reshape (Reshape) (100, 576, 8) 0 conv2d_2[0][0]
lambda (Lambda) (100, 576, 8) 0 reshape[0][0]
digitcaps (CapsuleLayer) (100, 3, 16) 221184 lambda[0][0]
input_2 (InputLayer) [(None, 3)] 0
mask (Mask) (100, 48) 0 digitcaps[0][0]
capsnet (Length) (100, 3) 0 digitcaps[0][0]
decoder (Sequential) (None, 28, 28, 1) 1354000 mask[0][0]
Total params: 5,558,032
Trainable params: 5,558,032
Non-trainable params: 0
Input layer, convolutional layers and primary capsule
# Adding the first conv1 layer
# Adding Maxpooling layer
# Adding second convolutional layer
# Adding primary cap layer
# Adding the squash activation
code source
x_train.shape --> (4415, 28, 28, 1)
y_train.shape --> (4415, 3)
x_test.shape --> (1104, 28, 28, 1)
y_test.shape --> (1104, 3)
My code here
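Try sizing the training set so that it divides evenly into batches. I think the last batch is left with only 15 samples after the rest of the data has been consumed (4415 = 44 × 100 + 15), while the model expects a fixed batch of 100.
For example, make the number of samples a multiple of 100.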
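The arithmetic behind this suggestion can be sketched in plain Python. The helper name `trim_to_full_batches` is hypothetical; the sample counts and batch size are the ones reported in the question.

```python
def trim_to_full_batches(num_samples, batch_size):
    """Return (usable_samples, leftover) so that only full batches remain."""
    leftover = num_samples % batch_size
    return num_samples - leftover, leftover


# Sizes taken from the question: 4415 training and 1104 test samples, batch size 100.
train_usable, train_leftover = trim_to_full_batches(4415, 100)
test_usable, test_leftover = trim_to_full_batches(1104, 100)
print(train_usable, train_leftover)
print(test_usable, test_leftover)
```

Under this assumption the arrays could be sliced to the usable length (for example `x_train[:4400]`) before training. Note that the test set also leaves a remainder, so it would likely need the same treatment.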
I followed the blog post "Where CNN is looking?" to understand and visualize the class activations behind a prediction. The given example works very well.
I have developed a custom model using autoencoders for image similarity. The model accepts 2 images and predicts the score for similarity. The model has the following layers:
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, 256, 256, 3) 0
input_2 (InputLayer) (None, 256, 256, 3) 0
encoder (Sequential) (None, 7, 7, 256) 3752704 input_1[0][0]
Merged_feature_map (Concatenate (None, 7, 7, 512) 0 encoder[1][0]
mnet_conv1 (Conv2D) (None, 7, 7, 1024) 2098176 Merged_feature_map[0][0]
batch_normalization_1 (BatchNor (None, 7, 7, 1024) 4096 mnet_conv1[0][0]
activation_1 (Activation) (None, 7, 7, 1024) 0 batch_normalization_1[0][0]
mnet_pool1 (MaxPooling2D) (None, 3, 3, 1024) 0 activation_1[0][0]
mnet_conv2 (Conv2D) (None, 3, 3, 2048) 8390656 mnet_pool1[0][0]
batch_normalization_2 (BatchNor (None, 3, 3, 2048) 8192 mnet_conv2[0][0]
activation_2 (Activation) (None, 3, 3, 2048) 0 batch_normalization_2[0][0]
mnet_pool2 (MaxPooling2D) (None, 1, 1, 2048) 0 activation_2[0][0]
reshape_1 (Reshape) (None, 1, 2048) 0 mnet_pool2[0][0]
fc1 (Dense) (None, 1, 256) 524544 reshape_1[0][0]
batch_normalization_3 (BatchNor (None, 1, 256) 1024 fc1[0][0]
activation_3 (Activation) (None, 1, 256) 0 batch_normalization_3[0][0]
dropout_1 (Dropout) (None, 1, 256) 0 activation_3[0][0]
fc2 (Dense) (None, 1, 128) 32896 dropout_1[0][0]
batch_normalization_4 (BatchNor (None, 1, 128) 512 fc2[0][0]
activation_4 (Activation) (None, 1, 128) 0 batch_normalization_4[0][0]
dropout_2 (Dropout) (None, 1, 128) 0 activation_4[0][0]
fc3 (Dense) (None, 1, 64) 8256 dropout_2[0][0]
batch_normalization_5 (BatchNor (None, 1, 64) 256 fc3[0][0]
activation_5 (Activation) (None, 1, 64) 0 batch_normalization_5[0][0]
dropout_3 (Dropout) (None, 1, 64) 0 activation_5[0][0]
fc4 (Dense) (None, 1, 1) 65 dropout_3[0][0]
batch_normalization_6 (BatchNor (None, 1, 1) 4 fc4[0][0]
activation_6 (Activation) (None, 1, 1) 0 batch_normalization_6[0][0]
dropout_4 (Dropout) (None, 1, 1) 0 activation_6[0][0]
reshape_2 (Reshape) (None, 1) 0 dropout_4[0][0]
The encoder layer consists of the following layers:
I want to change my custom network to accept one input instead of two, using only the encoder part, and generate heatmaps to understand what the encoder has learned.
The idea is that, if the network predicts 'not similar', I can generate the heatmaps of the images one by one and compare them.
What I have done is the following:
I have passed the two images to the network and got the prediction as described in the blog:
preds = model.predict([x, y])
class_idx = np.argmax(preds[0])
class_output = model.output[:, class_idx]
Set the last convolutional layer and compute the gradient of the class output value with respect to the feature map.
last_conv_layer = model.get_layer('encoder')
grads = K.gradients(class_output, last_conv_layer.get_output_at(-1))[0]
The output of grads:
Tensor("gradients/Merged_feature_map/concat_grad/Slice_1:0", shape=(?, 7, 7, 256), dtype=float32)
Then I pooled the gradients as described in the blog:
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([input_img], [pooled_grads, last_conv_layer.get_output_at(-1)[0]])
At this moment when I checked the inputs and outputs it shows the following:
[<tf.Tensor 'input_1:0' shape=(?, 256, 256, 3) dtype=float32>]
[<tf.Tensor 'Mean:0' shape=(256,) dtype=float32>, <tf.Tensor 'strided_slice_1:0' shape=(7, 7, 256) dtype=float32>]
But I am now getting the error on the following code line:
pooled_grads_value, conv_layer_output_value = iterate([x])
The error is:
You must feed a value for placeholder tensor 'input_2' with dtype float and shape [?,256,256,3]
[[{{node input_2}}]]
It seems to be asking for the second image input, but as seen above, 'iterate.inputs' contains only one image.
Where have I made a mistake? How can I limit it to accept only one image? Or is there a better way to achieve this task?
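For reference, the pooling-and-weighting step that the blog describes can be written out with NumPy once the gradient and feature-map values are available for a single image. This is a minimal sketch under that assumption; `grad_cam_heatmap` is a hypothetical helper, and the arrays below are made-up stand-ins for the (7, 7, 256) encoder output, not values from the actual model.

```python
import numpy as np


def grad_cam_heatmap(conv_output, grads):
    """Weight each feature-map channel by its mean gradient and collapse to a 2-D map."""
    # conv_output and grads both have shape (height, width, channels).
    pooled_grads = grads.mean(axis=(0, 1))       # one weight per channel
    weighted = conv_output * pooled_grads        # broadcast over height and width
    heatmap = np.maximum(weighted.mean(axis=-1), 0)  # keep positive evidence only
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap


# Made-up stand-ins with the same spatial and channel sizes as the encoder output.
conv_output = np.ones((7, 7, 256), dtype="float32")
grads = np.ones((7, 7, 256), dtype="float32")
heatmap = grad_cam_heatmap(conv_output, grads)
print(heatmap.shape)
```

This only covers the post-processing; it does not resolve how the values are fetched from a two-input graph.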
The following network architecture is designed in order to find the similarity between two images.
Initially, I took VGGNet16 and removed the classification head:
vgg_model = VGG16(weights="imagenet", include_top=False,
input_tensor=Input(shape=(img_width, img_height, channels)))
Afterward, I set the parameter layer.trainable = False, so that the network will work as a feature extractor.
I passed two different images to the network:
encoded_left = vgg_model(input_left)
encoded_right = vgg_model(input_right)
This will produce two feature vectors. Then for the classification (whether they are similar or not), I used a metric network that consists of 2 convolution layers followed by pooling and 4 fully connected layers.
merge(encoded_left, encoded_right) -> conv-pool -> conv-pool -> reshape -> dense * 4 -> output
Hence, the model looks like:
model = Model(inputs=[left_image, right_image], outputs=output)
After training only the metric network, I unfroze the last convolution block for fine-tuning. In the second training phase, the last convolution block is therefore trained along with the metric network.
Now I want to use this fine-tuned network for another purpose. Here is the network summary:
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, 224, 224, 3) 0
input_2 (InputLayer) (None, 224, 224, 3) 0
vgg16 (Model) (None, 7, 7, 512) 14714688 input_1[0][0]
Merged_feature_map (Concatenate (None, 7, 7, 1024) 0 vgg16[1][0]
mnet_conv1 (Conv2D) (None, 7, 7, 1024) 4195328 Merged_feature_map[0][0]
batch_normalization_1 (BatchNor (None, 7, 7, 1024) 4096 mnet_conv1[0][0]
activation_1 (Activation) (None, 7, 7, 1024) 0 batch_normalization_1[0][0]
mnet_pool1 (MaxPooling2D) (None, 3, 3, 1024) 0 activation_1[0][0]
mnet_conv2 (Conv2D) (None, 3, 3, 2048) 8390656 mnet_pool1[0][0]
batch_normalization_2 (BatchNor (None, 3, 3, 2048) 8192 mnet_conv2[0][0]
activation_2 (Activation) (None, 3, 3, 2048) 0 batch_normalization_2[0][0]
mnet_pool2 (MaxPooling2D) (None, 1, 1, 2048) 0 activation_2[0][0]
reshape_1 (Reshape) (None, 1, 2048) 0 mnet_pool2[0][0]
fc1 (Dense) (None, 1, 256) 524544 reshape_1[0][0]
batch_normalization_3 (BatchNor (None, 1, 256) 1024 fc1[0][0]
activation_3 (Activation) (None, 1, 256) 0 batch_normalization_3[0][0]
fc2 (Dense) (None, 1, 128) 32896 activation_3[0][0]
batch_normalization_4 (BatchNor (None, 1, 128) 512 fc2[0][0]
activation_4 (Activation) (None, 1, 128) 0 batch_normalization_4[0][0]
fc3 (Dense) (None, 1, 64) 8256 activation_4[0][0]
batch_normalization_5 (BatchNor (None, 1, 64) 256 fc3[0][0]
activation_5 (Activation) (None, 1, 64) 0 batch_normalization_5[0][0]
fc4 (Dense) (None, 1, 1) 65 activation_5[0][0]
batch_normalization_6 (BatchNor (None, 1, 1) 4 fc4[0][0]
activation_6 (Activation) (None, 1, 1) 0 batch_normalization_6[0][0]
reshape_2 (Reshape) (None, 1) 0 activation_6[0][0]
Total params: 27,880,517
Trainable params: 13,158,787
Non-trainable params: 14,721,730
As the last convolution block of VGGNet is already trained on the custom dataset I want to cut the network at layer:
vgg16 (Model) (None, 7, 7, 512) 14714688 input_1[0][0]
and use this as a powerful feature extractor. For this task, I loaded the fine-tuned model:
model = load_model('model.h5')
then tried to create the new model as:
new_model = Model(Input(shape=(img_width, img_height, channels)), model.layers[2].output)
This results in the following error:
`AttributeError: Layer vgg16 has multiple inbound nodes, hence the notion of "layer output" is ill-defined. Use `get_output_at(node_index)` instead.`
Please advise me on what I am doing wrong.
I tried several approaches, and the following method works perfectly. Instead of creating the new model as:
model = load_model('model.h5')
new_model = Model(Input(shape=(img_width, img_height, channels)), model.layers[2].output)
I used the following way:
model = load_model('model.h5')
sub_model = Sequential()
for layer in model.get_layer('vgg16').layers:
    sub_model.add(layer)
I hope this will help others.
I am finetuning a 3D-CNN called C3D which was originally trained to classify sports from video clips.
I am freezing the convolution (feature extraction) layers and training the fully connected layers using gifs from GIPHY to classify the gifs for sentiment analysis (positive or negative).
Weights are pre loaded for all layers except the final fully connected layer.
I am using 5000 images (2500 positive, 2500 negative) for training with a 70/30 training/testing split using Keras. I am using the Adam optimizer with a learning rate of 0.0001.
The training accuracy increases and the training loss decreases during training, but very early on the validation accuracy and loss stop improving as the model starts to overfit.
I believe I have enough training data, and I am using a dropout of 0.5 on both of the fully connected layers, so how can I combat this overfitting?
The model architecture, training code and visualisations of training performance from Keras can be found below.
from training.c3d_model import create_c3d_sentiment_model
from ImageSentiment import load_gif_data
import numpy as np
import pathlib
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
def image_generator(files, batch_size):
    """
    Generate batches of images for training instead of loading all images into memory
    :param files:
    :param batch_size:
    """
    while True:
        # Select files (paths/indices) for the batch
        batch_paths = np.random.choice(a=files,
                                       size=batch_size)
        batch_input = []
        batch_output = []
        # Read in each input, perform preprocessing and get labels
        for input_path in batch_paths:
            input = load_gif_data(input_path)
            if "pos" in input_path:  # if file name contains pos
                output = np.array([1, 0])  # label
            elif "neg" in input_path:  # if file name contains neg
                output = np.array([0, 1])  # label
            batch_input += [input]
            batch_output += [output]
        # Return a tuple of (input,output) to feed the network
        batch_x = np.array(batch_input)
        batch_y = np.array(batch_output)
        yield (batch_x, batch_y)
model = create_c3d_sentiment_model()
model.load_weights('models/C3D_Sport1M_weights_keras_2.2.4.h5', by_name=True)
for layer in model.layers[:14]:  # freeze top layers as feature extractor
    layer.trainable = False
for layer in model.layers[14:]:  # fine tune final layers
    layer.trainable = True
train_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_train').glob('**/*')]
val_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_validation').glob('**/*')]
batch_size = 8
train_generator = image_generator(train_files, batch_size)
validation_generator = image_generator(val_files, batch_size)
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', verbose=1)
history = model.fit_generator(train_generator, validation_data=validation_generator,
steps_per_epoch=int(np.ceil(len(train_files) / batch_size)),
validation_steps=int(np.ceil(len(val_files) / batch_size)), epochs=5, shuffle=True,
def load_gif_data(file_path):
Load and process gif for input into Keras model
:param file_path:
:return: Mean normalised image in BGR format as numpy array
for more info see ->
im = Img(fp=file_path)
im.load(limit=16, # Keras image model only requires 16 frames
print("Error loading image: " + file_path)
im.resize(size=(112, 112))
np_frames = []
frame_index = 0
for i in range(16):  # if image is less than 16 frames, repeat the frames until there are 16
    frame = im.frames[frame_index]
    rgb = np.array(frame)
    bgr = rgb[..., ::-1]
    mean = np.mean(bgr, axis=0)
    np_frames.append(bgr - mean)  # C3D model was originally trained on BGR, mean normalised images
    # it is important that unseen images are in the same format
    if frame_index == (len(im.frames) - 1):
        frame_index = 0
    else:
        frame_index = frame_index + 1
return np.array(np_frames)
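The frame-repetition logic in the loader can be isolated and checked without any image library. This is a minimal sketch using modular indexing instead of a manually reset counter; `cycle_to_length` is a hypothetical helper and the string "frames" are placeholders for decoded images.

```python
def cycle_to_length(frames, target_length=16):
    """Repeat frames in order until exactly target_length frames are collected."""
    if not frames:
        raise ValueError("at least one frame is required")
    # i % len(frames) wraps back to the first frame after the last one.
    return [frames[i % len(frames)] for i in range(target_length)]


# Placeholder frames standing in for decoded GIF frames.
short_clip = ["f0", "f1", "f2"]
padded = cycle_to_length(short_clip, target_length=7)
print(padded)
```

Clips longer than the target are simply truncated to the first `target_length` frames by the same expression.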
model architecture
Layer (type) Output Shape Param #
conv1 (Conv3D) (None, 16, 112, 112, 64) 5248
pool1 (MaxPooling3D) (None, 16, 56, 56, 64) 0
conv2 (Conv3D) (None, 16, 56, 56, 128) 221312
pool2 (MaxPooling3D) (None, 8, 28, 28, 128) 0
conv3a (Conv3D) (None, 8, 28, 28, 256) 884992
conv3b (Conv3D) (None, 8, 28, 28, 256) 1769728
pool3 (MaxPooling3D) (None, 4, 14, 14, 256) 0
conv4a (Conv3D) (None, 4, 14, 14, 512) 3539456
conv4b (Conv3D) (None, 4, 14, 14, 512) 7078400
pool4 (MaxPooling3D) (None, 2, 7, 7, 512) 0
conv5a (Conv3D) (None, 2, 7, 7, 512) 7078400
conv5b (Conv3D) (None, 2, 7, 7, 512) 7078400
zeropad5 (ZeroPadding3D) (None, 2, 8, 8, 512) 0
pool5 (MaxPooling3D) (None, 1, 4, 4, 512) 0
flatten_1 (Flatten) (None, 8192) 0
fc6 (Dense) (None, 4096) 33558528
dropout_1 (Dropout) (None, 4096) 0
fc7 (Dense) (None, 4096) 16781312
dropout_2 (Dropout) (None, 4096) 0
nfc8 (Dense) (None, 2) 8194
Total params: 78,003,970
Trainable params: 78,003,970
Non-trainable params: 0
training visualisations
I think that the error is in the loss function and in the last Dense layer. As provided in the model summary, the last Dense layer is,
nfc8 (Dense) (None, 2)
The output shape is (None, 2), meaning that the layer has 2 units. As you said earlier, you need to classify GIFs as positive or negative.
Classifying GIFs could be treated as a binary classification problem or as a multiclass classification problem (with two classes).
A binary classifier has only 1 unit in the last Dense layer, with a sigmoid activation function. Here, however, the model has 2 units in the last Dense layer.
Hence, the model is a multiclass classifier, but you have given it a loss function of binary_crossentropy, which is meant for binary classifiers (with a single unit in the last layer).
So, replacing the loss with categorical_crossentropy should work. Alternatively, edit the last Dense layer and change its number of units and activation function.
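To make the two formulations concrete, here is a small sketch in plain Python (no Keras involved) with made-up logits. It compares a 2-unit softmax scored with categorical cross-entropy against a 1-unit sigmoid scored with binary cross-entropy. All function names are illustrative and are not Keras APIs.

```python
import math


def softmax(logits):
    shifted = [v - max(logits) for v in logits]  # shift for numerical stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def binary_crossentropy(target, prob):
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


# Made-up logits for one "positive" example with one-hot label [1, 0].
logits = [2.0, -1.0]
cce = categorical_crossentropy([1, 0], softmax(logits))

# Equivalent single-unit view: a sigmoid over the difference of the two logits.
sigmoid_prob = 1.0 / (1.0 + math.exp(-(logits[0] - logits[1])))
bce = binary_crossentropy(1, sigmoid_prob)
print(cce, bce)
```

The point of the comparison is that the loss should match the shape of the output layer: two softmax units pair with the categorical loss, one sigmoid unit pairs with the binary loss.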
Hope this helps.
I use Keras 2.2.4. I train a model that I want to fine-tune every 30 epochs with new data content (image classification).
Every day I add more images to the classes to feed the model. Every 30 epochs the model is re-trained.
I use two conditions: the first applies when no model has been trained yet, and the second applies when a trained model already exists and I want to fine-tune it with new content/classes.
model_base = keras.applications.vgg19.VGG19(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
output = GlobalAveragePooling2D()(model_base.output)
# If we resume a pretrained model load it
if os.path.isfile(os.path.join(MODEL_PATH, 'weights.h5')):
    print('Using existing weights...')
    base_lr = 0.0001
    model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
    output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers[:-2]:
        layer.trainable = False
else:
    base_lr = 0.001
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(2048, activation='relu', name='d1')(output)
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers[:-5]:
        layer.trainable = False
opt = optimizers.Adam(lr=base_lr, decay=base_lr / epochs)
Model summary first time:
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
global_average_pooling2d_1 ( (None, 512) 0
batch_normalization_1 (Batch (None, 512) 2048
dropout_1 (Dropout) (None, 512) 0
d1 (Dense) (None, 2048) 1050624
batch_normalization_2 (Batch (None, 2048) 8192
dropout_2 (Dropout) (None, 2048) 0
d2 (Dense) (None, 19) 38931
Total params: 21,124,179
Trainable params: 10,533,907
Non-trainable params: 10,590,272
Model summary second time:
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
global_average_pooling2d_1 ( (None, 512) 0
d2 (Dense) (None, 19) 9747
Total params: 20,034,131
Trainable params: 2,369,555
Non-trainable params: 17,664,576
Problem: When a model exists and is loaded for fine-tuning, it seems to have lost all the additional layers added the first time (Dense 2048, Dropout, etc.).
Do I need to add these layers again? That does not seem to make sense, as it would lose the training done in the first pass.
Note: I may need to not set the base_lr as saving a model should save also the learning rate at the state where it stopped before, but I will check this later.
Please note that once you load the model:
model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
You don't use it. You just overwrite it again:
model = Model(model_base.input, output)
Here, output is also defined as an operation on model_base, so the loaded model is discarded.
It seems to me that you just want to delete the lines after load_model.
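The resulting control flow can be sketched without Keras: load the saved model if the file exists, build a fresh one otherwise, and never rebuild after loading. The function `load_or_build` and the stand-in callables below are hypothetical; in the real script, `load_fn` would be Keras's `load_model` and `build_fn` would wrap the layer-stacking code from the question.

```python
import os
import tempfile


def load_or_build(weights_path, load_fn, build_fn):
    """Return (model, resumed): load from disk if possible, else build from scratch."""
    if os.path.isfile(weights_path):
        # Keep the loaded model as-is so the trained head is not discarded.
        return load_fn(weights_path), True
    return build_fn(), False


# Stand-ins for the real loading and building code (assumptions for illustration).
def fake_load(path):
    return "loaded:" + os.path.basename(path)


def fake_build():
    return "fresh"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.h5")
    first = load_or_build(path, fake_load, fake_build)   # no file yet
    open(path, "w").close()                              # simulate a saved model
    second = load_or_build(path, fake_load, fake_build)  # file now exists

print(first, second)
```

The returned flag could then be used to pick the learning rate for the resumed versus the fresh case.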