I'm trying to design a convolutional network in Keras to estimate the depth of images.
I have RGB input images of shape 3x120x160 and grayscale output depth maps of shape 1x120x160.
I tried a VGG-like architecture where the depth of each layer grows, but I get stuck when designing the final layers. Using a Dense layer is too expensive, and I tried UpSampling, which proved inefficient.
I want to use Deconvolution2D but I can't get it to work; the only architecture I end up with is something like this:
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, ZeroPadding2D, Deconvolution2D, Cropping2D

model = Sequential()
model.add(Convolution2D(64, 5, 5, activation='relu', input_shape=(3, 120, 160)))
model.add(Convolution2D(64, 5, 5, activation='relu'))
model.add(MaxPooling2D())
model.add(Dropout(0.5))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D())
model.add(Dropout(0.5))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(Dropout(0.5))
model.add(ZeroPadding2D())
model.add(Deconvolution2D(512, 3, 3, (None, 512, 41, 61), subsample=(2, 2), activation='relu'))
model.add(Deconvolution2D(512, 3, 3, (None, 512, 123, 183), subsample=(3, 3), activation='relu'))
model.add(Cropping2D(cropping=((1, 2), (11, 12))))
model.add(Convolution2D(1, 1, 1, activation='sigmoid', border_mode='same'))
The model summary is like this:
Layer (type) Output Shape Param # Connected to
====================================================================================================
convolution2d_1 (Convolution2D) (None, 64, 116, 156) 4864 convolution2d_input_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D) (None, 64, 112, 152) 102464 convolution2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D) (None, 64, 56, 76) 0 convolution2d_2[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 64, 56, 76) 0 maxpooling2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D) (None, 128, 54, 74) 73856 dropout_1[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D) (None, 128, 52, 72) 147584 convolution2d_3[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D) (None, 128, 26, 36) 0 convolution2d_4[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 128, 26, 36) 0 maxpooling2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D) (None, 256, 24, 34) 295168 dropout_2[0][0]
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D) (None, 256, 22, 32) 590080 convolution2d_5[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (None, 256, 22, 32) 0 convolution2d_6[0][0]
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D) (None, 512, 20, 30) 1180160 dropout_3[0][0]
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D) (None, 512, 18, 28) 2359808 convolution2d_7[0][0]
____________________________________________________________________________________________________
dropout_4 (Dropout) (None, 512, 18, 28) 0 convolution2d_8[0][0]
____________________________________________________________________________________________________
zeropadding2d_1 (ZeroPadding2D) (None, 512, 20, 30) 0 dropout_4[0][0]
____________________________________________________________________________________________________
deconvolution2d_1 (Deconvolution2(None, 512, 41, 61) 2359808 zeropadding2d_1[0][0]
____________________________________________________________________________________________________
deconvolution2d_2 (Deconvolution2(None, 512, 123, 183) 2359808 deconvolution2d_1[0][0]
____________________________________________________________________________________________________
cropping2d_1 (Cropping2D) (None, 512, 120, 160) 0 deconvolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_9 (Convolution2D) (None, 1, 120, 160) 513 cropping2d_1[0][0]
====================================================================================================
Total params: 9474113
I couldn't reduce the number of filters in the Deconvolution2D layers from 512, as doing so results in shape-related errors, and it seems each Deconvolution2D layer has to have as many filters as the previous layer.
I also had to add a final Convolution2D layer to be able to run the network.
The above architecture learns, but really slowly and (I think) inefficiently. I'm sure I'm doing something wrong and the design shouldn't be like this. Can you help me design a better network?
I also tried to build a network like the one in this repository, but it seems Keras doesn't behave the way this Lasagne example does. I'd really appreciate it if someone could show me how to design something like that network in Keras. Its architecture is like this:
Thanks
I'd suggest a U-Net (see figure 1). In the first half of a U-Net, the spatial resolution is reduced as the number of channels increases (like VGG, as you mentioned). In the second half, the opposite happens: the number of channels is reduced while the resolution increases. "Skip" connections between corresponding layers allow the network to efficiently produce high-resolution output.
You should be able to find an appropriate Keras implementation (maybe this one).
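For illustration, here is a minimal sketch of that idea in Keras functional-API syntax. The filter counts, the channels_last ordering, and the mean-squared-error loss are illustrative choices of mine, not a tuned design:
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate
from keras.models import Model

inp = Input(shape=(120, 160, 3))                       # RGB input, channels_last

# contracting path: resolution down, channels up
c1 = Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
p1 = MaxPooling2D((2, 2))(c1)                          # 60x80
c2 = Conv2D(64, (3, 3), activation='relu', padding='same')(p1)
p2 = MaxPooling2D((2, 2))(c2)                          # 30x40
c3 = Conv2D(128, (3, 3), activation='relu', padding='same')(p2)

# expanding path: resolution up, channels down, with skip connections
u2 = concatenate([UpSampling2D((2, 2))(c3), c2])       # 60x80, skip from c2
c4 = Conv2D(64, (3, 3), activation='relu', padding='same')(u2)
u1 = concatenate([UpSampling2D((2, 2))(c4), c1])       # 120x160, skip from c1
c5 = Conv2D(32, (3, 3), activation='relu', padding='same')(u1)

out = Conv2D(1, (1, 1), activation='sigmoid')(c5)      # 1-channel depth map
model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')
The concatenate calls are the skip connections: they let the decoder reuse high-resolution encoder features instead of having to reconstruct them from scratch.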
Related
I followed the blog Where CNN is looking? to understand and visualize the class activations in order to predict something. The given example works very well.
I have developed a custom model using autoencoders for image similarity. The model accepts 2 images and predicts the score for similarity. The model has the following layers:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 256, 256, 3) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, 256, 256, 3) 0
__________________________________________________________________________________________________
encoder (Sequential) (None, 7, 7, 256) 3752704 input_1[0][0]
input_2[0][0]
__________________________________________________________________________________________________
Merged_feature_map (Concatenate (None, 7, 7, 512) 0 encoder[1][0]
encoder[2][0]
__________________________________________________________________________________________________
mnet_conv1 (Conv2D) (None, 7, 7, 1024) 2098176 Merged_feature_map[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 7, 7, 1024) 4096 mnet_conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 7, 7, 1024) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
mnet_pool1 (MaxPooling2D) (None, 3, 3, 1024) 0 activation_1[0][0]
__________________________________________________________________________________________________
mnet_conv2 (Conv2D) (None, 3, 3, 2048) 8390656 mnet_pool1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 3, 3, 2048) 8192 mnet_conv2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 3, 3, 2048) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
mnet_pool2 (MaxPooling2D) (None, 1, 1, 2048) 0 activation_2[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) (None, 1, 2048) 0 mnet_pool2[0][0]
__________________________________________________________________________________________________
fc1 (Dense) (None, 1, 256) 524544 reshape_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 1, 256) 1024 fc1[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 1, 256) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 1, 256) 0 activation_3[0][0]
__________________________________________________________________________________________________
fc2 (Dense) (None, 1, 128) 32896 dropout_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 1, 128) 512 fc2[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 1, 128) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 1, 128) 0 activation_4[0][0]
__________________________________________________________________________________________________
fc3 (Dense) (None, 1, 64) 8256 dropout_2[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 1, 64) 256 fc3[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 1, 64) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) (None, 1, 64) 0 activation_5[0][0]
__________________________________________________________________________________________________
fc4 (Dense) (None, 1, 1) 65 dropout_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 1, 1) 4 fc4[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 1, 1) 0 batch_normalization_6[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout) (None, 1, 1) 0 activation_6[0][0]
__________________________________________________________________________________________________
reshape_2 (Reshape) (None, 1) 0 dropout_4[0][0]
==================================================================================================
The encoder layer consists of the following layers:
conv2d_1
batch_normalization_1
activation_1
max_pooling2d_1
conv2d_2
batch_normalization_2
activation_2
max_pooling2d_2
conv2d_3
batch_normalization_3
activation_3
conv2d_4
batch_normalization_4
activation_4
conv2d_5
batch_normalization_5
activation_5
max_pooling2d_3
I want to change my custom network to accept one input instead of two, using only the encoder part, and generate heatmaps to understand what the encoder has learned.
The idea is that, in case the network predicts 'not similar', I can generate the heatmaps of the images one by one and compare them.
What I have done is the following:
I have passed the two images to the network and got the prediction as described in the blog:
preds = model.predict([x, y])
class_idx = np.argmax(preds[0])
class_output = model.output[:, class_idx]
Then I got the last convolutional layer and computed the gradient of the class output value with respect to the feature map:
last_conv_layer = model.get_layer('encoder')
grads = K.gradients(class_output, last_conv_layer.get_output_at(-1))[0]
The output of grads:
Tensor("gradients/Merged_feature_map/concat_grad/Slice_1:0", shape=(?, 7, 7, 256), dtype=float32)
Then I pooled the gradients as described in the blog:
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([input_img], [pooled_grads, last_conv_layer.get_output_at(-1)[0]])
At this point, checking the inputs and outputs shows the following:
iterate.inputs
[<tf.Tensor 'input_1:0' shape=(?, 256, 256, 3) dtype=float32>]
iterate.outputs
[<tf.Tensor 'Mean:0' shape=(256,) dtype=float32>, <tf.Tensor 'strided_slice_1:0' shape=(7, 7, 256) dtype=float32>]
But I am now getting the error on the following code line:
pooled_grads_value, conv_layer_output_value = iterate([x])
The error is:
You must feed a value for placeholder tensor 'input_2' with dtype float and shape [?,256,256,3]
[[{{node input_2}}]]
It seems to be asking for the second image input, but as shown above, 'iterate.inputs' contains only one image.
Where have I made a mistake? How can I limit it to accept only one image? Or is there a better way to achieve this?
I built an autoencoder using my own data set of about 32k images. I did a 75/25 split for training/testing, and I was able to get results I'm happy with.
Now I want to extract the feature space and map it to every image in my dataset, as well as to new data that wasn't tested. I couldn't find a tutorial online that delved into using the encoder as a feature extractor; all I could find was how to build the full network.
My code:
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Cropping2D
from keras.models import Model

input_img = Input(shape=(200, 200, 1))
# encoder part of the model (filter count increases after each layer)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# decoder part of the model (went backwards from the encoder)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
decoded = Cropping2D(cropping=((8,0), (8,0)), data_format=None)(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.summary()
Here's my net setup if anybody is interested:
Model: "model_22"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_23 (InputLayer) (None, 200, 200, 1) 0
_________________________________________________________________
conv2d_186 (Conv2D) (None, 200, 200, 16) 160
_________________________________________________________________
max_pooling2d_83 (MaxPooling (None, 100, 100, 16) 0
_________________________________________________________________
conv2d_187 (Conv2D) (None, 100, 100, 32) 4640
_________________________________________________________________
max_pooling2d_84 (MaxPooling (None, 50, 50, 32) 0
_________________________________________________________________
conv2d_188 (Conv2D) (None, 50, 50, 64) 18496
_________________________________________________________________
max_pooling2d_85 (MaxPooling (None, 25, 25, 64) 0
_________________________________________________________________
conv2d_189 (Conv2D) (None, 25, 25, 128) 73856
_________________________________________________________________
max_pooling2d_86 (MaxPooling (None, 13, 13, 128) 0
_________________________________________________________________
conv2d_190 (Conv2D) (None, 13, 13, 128) 147584
_________________________________________________________________
up_sampling2d_82 (UpSampling (None, 26, 26, 128) 0
_________________________________________________________________
conv2d_191 (Conv2D) (None, 26, 26, 64) 73792
_________________________________________________________________
up_sampling2d_83 (UpSampling (None, 52, 52, 64) 0
_________________________________________________________________
conv2d_192 (Conv2D) (None, 52, 52, 32) 18464
_________________________________________________________________
up_sampling2d_84 (UpSampling (None, 104, 104, 32) 0
_________________________________________________________________
conv2d_193 (Conv2D) (None, 104, 104, 16) 4624
_________________________________________________________________
up_sampling2d_85 (UpSampling (None, 208, 208, 16) 0
_________________________________________________________________
conv2d_194 (Conv2D) (None, 208, 208, 1) 145
_________________________________________________________________
cropping2d_2 (Cropping2D) (None, 200, 200, 1) 0
=================================================================
Total params: 341,761
Trainable params: 341,761
Non-trainable params: 0
Then my training:
autoencoder.fit(train, train,
                epochs=3,
                batch_size=128,
                shuffle=True,
                validation_data=(test, test))
My results:
Train on 23412 samples, validate on 7805 samples
Epoch 1/3
23412/23412 [==============================] - 773s 33ms/step - loss: 0.0620 - val_loss: 0.0398
Epoch 2/3
23412/23412 [==============================] - 715s 31ms/step - loss: 0.0349 - val_loss: 0.0349
Epoch 3/3
23412/23412 [==============================] - 753s 32ms/step - loss: 0.0314 - val_loss: 0.0319
I'd rather not share the images, but they look well reconstructed.
Thank you for all and any help!
I'm not sure I fully understand your question, but it sounds like you want the resulting feature space for every image you trained on, as well as for new images. Why not just do this?
Name the encoded layer in your autoencoder architecture 'embedding'. Then create the encoder as follows:
embedding_layer = autoencoder.get_layer(name='embedding').output
encoder = Model(input_img,embedding_layer)
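Applied to your architecture, that could look like the following. The only additions are the name='embedding' argument and the final predict call; images stands in for whatever array of shape (n, 200, 200, 1) you want to encode:
# when building the autoencoder, name the bottleneck layer
encoded = MaxPooling2D((2, 2), padding='same', name='embedding')(x)
# ... decoder, compile and fit exactly as before ...

# after training, build the encoder and extract features
embedding_layer = autoencoder.get_layer(name='embedding').output
encoder = Model(input_img, embedding_layer)
features = encoder.predict(images)   # shape (n, 13, 13, 128) per your summary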
The following network architecture is designed in order to find the similarity between two images.
Initially, I took VGG16 and removed the classification head:
vgg_model = VGG16(weights="imagenet", include_top=False,
                  input_tensor=Input(shape=(img_width, img_height, channels)))
Afterward, I set the parameter layer.trainable = False, so that the network will work as a feature extractor.
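In essence, the freezing step looks like this (a simplified sketch, not necessarily my exact code):
for layer in vgg_model.layers:
    layer.trainable = False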
I passed two different images to the network:
encoded_left = vgg_model(input_left)
encoded_right = vgg_model(input_right)
This will produce two feature vectors. Then for the classification (whether they are similar or not), I used a metric network that consists of 2 convolution layers followed by pooling and 4 fully connected layers.
merge(encoded_left, encoded_right) -> conv-pool -> conv-pool -> reshape -> dense * 4 -> output
Hence, the model looks like:
model = Model(inputs=[left_image, right_image], outputs=output)
After training only the metric network, I set the last convolutional block of VGG16 as trainable in order to fine-tune it. Therefore, in the second training phase, the last convolution block is trained along with the metric network.
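Roughly, the unfreezing step looks like this (a sketch; the block5 prefix matches the stock Keras VGG16 layer names):
# unfreeze only the last convolutional block for fine-tuning
for layer in vgg_model.layers:
    layer.trainable = layer.name.startswith('block5')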
Now I want to use this fine-tuned network for another purpose. Here is the network summary:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, 224, 224, 3) 0
__________________________________________________________________________________________________
vgg16 (Model) (None, 7, 7, 512) 14714688 input_1[0][0]
input_2[0][0]
__________________________________________________________________________________________________
Merged_feature_map (Concatenate (None, 7, 7, 1024) 0 vgg16[1][0]
vgg16[2][0]
__________________________________________________________________________________________________
mnet_conv1 (Conv2D) (None, 7, 7, 1024) 4195328 Merged_feature_map[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 7, 7, 1024) 4096 mnet_conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 7, 7, 1024) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
mnet_pool1 (MaxPooling2D) (None, 3, 3, 1024) 0 activation_1[0][0]
__________________________________________________________________________________________________
mnet_conv2 (Conv2D) (None, 3, 3, 2048) 8390656 mnet_pool1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 3, 3, 2048) 8192 mnet_conv2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 3, 3, 2048) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
mnet_pool2 (MaxPooling2D) (None, 1, 1, 2048) 0 activation_2[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) (None, 1, 2048) 0 mnet_pool2[0][0]
__________________________________________________________________________________________________
fc1 (Dense) (None, 1, 256) 524544 reshape_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 1, 256) 1024 fc1[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 1, 256) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
fc2 (Dense) (None, 1, 128) 32896 activation_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 1, 128) 512 fc2[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 1, 128) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
fc3 (Dense) (None, 1, 64) 8256 activation_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 1, 64) 256 fc3[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 1, 64) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
fc4 (Dense) (None, 1, 1) 65 activation_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 1, 1) 4 fc4[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 1, 1) 0 batch_normalization_6[0][0]
__________________________________________________________________________________________________
reshape_2 (Reshape) (None, 1) 0 activation_6[0][0]
==================================================================================================
Total params: 27,880,517
Trainable params: 13,158,787
Non-trainable params: 14,721,730
As the last convolution block of VGG16 is already trained on the custom dataset, I want to cut the network at the layer:
__________________________________________________________________________________________________
vgg16 (Model) (None, 7, 7, 512) 14714688 input_1[0][0]
input_2[0][0]
__________________________________________________________________________________________________
and use this as a powerful feature extractor. For this task, I loaded the fine-tuned model:
model = load_model('model.h5')
then tried to create the new model as:
new_model = Model(Input(shape=(img_width, img_height, channels)), model.layers[2].output)
This results in the following error:
`AttributeError: Layer vgg16 has multiple inbound nodes, hence the notion of "layer output" is ill-defined. Use `get_output_at(node_index)` instead.`
Please advise me where I am going wrong.
I have tried several ways, but the following method works perfectly. Instead of creating the new model as:
model = load_model('model.h5')
new_model = Model(Input(shape=(img_width, img_height, channels)), model.layers[2].output)
I used the following way:
model = load_model('model.h5')
sub_model = Sequential()
for layer in model.get_layer('vgg16').layers:
    sub_model.add(layer)
I hope this will help others.
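Once the sub-model is built this way, it can be used directly as a feature extractor, for example (images is just a placeholder name for an input array):
features = sub_model.predict(images)   # (n, 224, 224, 3) in -> (n, 7, 7, 512) out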
I am new to Keras and I have a problem: given an image, I have to build a convolutional neural network that outputs another image based on it.
All the examples I have seen on the internet are classification problems where each image is given a one-hot-encoded label. I want to give an image as the label.
A series of progressive convolutions can be followed by a series of resizing interpolations, e.g. as implemented in a layer such as this:
from keras import backend as K   # note: K.tf requires the older Keras + TensorFlow 1.x backend
from keras.layers import Layer

class Interpolation(Layer):
    def __init__(self, output_dim, num_channels, **kwargs):
        self.num_channels = num_channels
        self.output_dim = output_dim
        super(Interpolation, self).__init__(**kwargs)

    def build(self, input_shape):
        super(Interpolation, self).build(input_shape)

    def call(self, x):
        # bilinear resize to the target spatial size
        return K.tf.image.resize_bilinear(x, self.output_dim)

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[1] * 2, input_shape[2] * 2, self.num_channels
Then you can apply a series of transformations that result in an output shape matching your input shape. Below is example code showcasing the use of this layer:
from keras.layers import Input, Conv2D, MaxPooling2D, BatchNormalization, Add
from keras.models import Model

# grayscale in
uncolored = Input(shape=(200, 200, 1,))
# first block 200x200x3
conv0 = Conv2D(3, (3,3), padding='same', activation='relu', data_format='channels_last', name='0', kernel_regularizer='l2')(uncolored)
bn0 = BatchNormalization()(conv0)
# second block 200x200x64
conv1 = Conv2D(64, (3,3), padding='same', activation='relu', data_format='channels_last', kernel_regularizer='l2')(conv0)
bn1 = BatchNormalization()(conv1)  # 200x200x64
pool0 = MaxPooling2D(pool_size=2, padding='same')(conv0)  # 100x100x3
# third block 100x100x128
conv2 = Conv2D(128, (3,3), padding='same', activation='relu', data_format='channels_last', kernel_regularizer='l2')(pool0)
bn2 = BatchNormalization()(conv2)  # 100x100x128
pool1 = MaxPooling2D(pool_size=2, padding='same')(conv2)  # 50x50x128
# fourth block 50x50x256
conv3 = Conv2D(256, (3,3), padding='same', activation='relu', data_format='channels_last', name='2', kernel_regularizer='l2')(pool1)
bn3 = BatchNormalization()(conv3)  # 50x50x256
pool2 = MaxPooling2D(pool_size=2, padding='same')(conv3)  # 25x25x256
# fifth block 25x25x512
conv4 = Conv2D(512, (3,3), padding='same', activation='relu', data_format='channels_last', kernel_regularizer='l2')(pool2)
bn4 = BatchNormalization()(conv4)
rconv0 = Conv2D(256, (1,1), padding='same', activation='sigmoid', data_format='channels_last', kernel_regularizer='l2')(conv4)
# first upscale
interp_layer0 = Interpolation(output_dim=(50, 50),
                              num_channels=256)(rconv0)
# first addition
intermediate_0 = Add()([interp_layer0, bn3])
rconv1 = Conv2D(128, (3,3), padding='same',
                activation='sigmoid', data_format='channels_last')(intermediate_0)
# second upscale
interp_layer1 = Interpolation(output_dim=(100, 100),
                              num_channels=128)(rconv1)
# second addition
intermediate_1 = Add()([interp_layer1, bn2])
rconv2 = Conv2D(64, (3,3), padding='same',
                activation='sigmoid', data_format='channels_last')(intermediate_1)
# third upscale
interp_layer2 = Interpolation(output_dim=(200, 200),
                              num_channels=64)(rconv2)
# third addition
intermediate_2 = Add()([interp_layer2, bn1])
rconv3 = Conv2D(3, (3,3), padding='same',
                activation='sigmoid', data_format='channels_last')(intermediate_2)
# fourth addition
intermediate_3 = Add()([rconv3, bn0])
rconv4 = Conv2D(3, (3,3), padding='same', activation='sigmoid', data_format='channels_last')(intermediate_3)
model = Model(inputs=[uncolored], outputs=[rconv4])
And the model summary:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_5 (InputLayer) (None, 200, 200, 1) 0
__________________________________________________________________________________________________
0 (Conv2D) (None, 200, 200, 3) 30 input_5[0][0]
__________________________________________________________________________________________________
max_pooling2d_13 (MaxPooling2D) (None, 100, 100, 3) 0 0[0][0]
__________________________________________________________________________________________________
conv2d_34 (Conv2D) (None, 100, 100, 128 3584 max_pooling2d_13[0][0]
__________________________________________________________________________________________________
max_pooling2d_14 (MaxPooling2D) (None, 50, 50, 128) 0 conv2d_34[0][0]
__________________________________________________________________________________________________
2 (Conv2D) (None, 50, 50, 256) 295168 max_pooling2d_14[0][0]
__________________________________________________________________________________________________
max_pooling2d_15 (MaxPooling2D) (None, 25, 25, 256) 0 2[0][0]
__________________________________________________________________________________________________
conv2d_35 (Conv2D) (None, 25, 25, 512) 1180160 max_pooling2d_15[0][0]
__________________________________________________________________________________________________
conv2d_36 (Conv2D) (None, 25, 25, 256) 131328 conv2d_35[0][0]
__________________________________________________________________________________________________
interpolation_13 (Interpolation (None, 50, 50, 256) 0 conv2d_36[0][0]
__________________________________________________________________________________________________
batch_normalization_24 (BatchNo (None, 50, 50, 256) 1024 2[0][0]
__________________________________________________________________________________________________
add_17 (Add) (None, 50, 50, 256) 0 interpolation_13[0][0]
batch_normalization_24[0][0]
__________________________________________________________________________________________________
conv2d_37 (Conv2D) (None, 50, 50, 128) 295040 add_17[0][0]
__________________________________________________________________________________________________
interpolation_14 (Interpolation (None, 100, 100, 128 0 conv2d_37[0][0]
__________________________________________________________________________________________________
batch_normalization_23 (BatchNo (None, 100, 100, 128 512 conv2d_34[0][0]
__________________________________________________________________________________________________
add_18 (Add) (None, 100, 100, 128 0 interpolation_14[0][0]
batch_normalization_23[0][0]
__________________________________________________________________________________________________
conv2d_38 (Conv2D) (None, 100, 100, 64) 73792 add_18[0][0]
__________________________________________________________________________________________________
conv2d_33 (Conv2D) (None, 200, 200, 64) 1792 0[0][0]
__________________________________________________________________________________________________
interpolation_15 (Interpolation (None, 200, 200, 64) 0 conv2d_38[0][0]
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 200, 200, 64) 256 conv2d_33[0][0]
__________________________________________________________________________________________________
add_19 (Add) (None, 200, 200, 64) 0 interpolation_15[0][0]
batch_normalization_22[0][0]
__________________________________________________________________________________________________
conv2d_39 (Conv2D) (None, 200, 200, 3) 195 add_19[0][0]
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 200, 200, 3) 12 0[0][0]
__________________________________________________________________________________________________
add_20 (Add) (None, 200, 200, 3) 0 conv2d_39[0][0]
batch_normalization_21[0][0]
__________________________________________________________________________________________________
conv2d_40 (Conv2D) (None, 200, 200, 1) 28 add_20[0][0]
==================================================================================================
Total params: 1,982,921
Trainable params: 1,982,019
Non-trainable params: 902
_____________________________
In the example, we move from a single-channel image to a multi-channel image. The same idea can be replicated for images of any size and number of channels. The exact network architecture, of course, depends on your desired functionality.
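If it helps, training such an image-to-image model could then look roughly like this (the loss choice and data names are illustrative, not taken from the question):
model.compile(optimizer='adam', loss='mse')
# x_gray: (n, 200, 200, 1) grayscale inputs; y_img: target images with the model's output shape
model.fit(x_gray, y_img, batch_size=32, epochs=10, validation_split=0.1)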
I tried to run some deep learning code in Keras but keep getting the following error message. I've searched all around and spent a lot of time but still failed to fix it. I'm a complete beginner; any help will be appreciated!
runfile('E:/dilation-keras/predict.py', wdir='E:/dilation-keras')
Using Theano backend.
Using gpu device 0: GeForce GT 635M (CNMeM is enabled with initial size: 90.0% of memory, cuDNN not available)
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_3 (InputLayer) (None, 3, 900, 900) 0
____________________________________________________________________________________________________
conv1_1 (Convolution2D) (None, 64, 898, 898) 1792 input_3[0][0]
____________________________________________________________________________________________________
conv1_2 (Convolution2D) (None, 64, 896, 896) 36928 conv1_1[0][0]
____________________________________________________________________________________________________
pool1 (MaxPooling2D) (None, 64, 448, 448) 0 conv1_2[0][0]
____________________________________________________________________________________________________
conv2_1 (Convolution2D) (None, 128, 446, 446) 73856 pool1[0][0]
____________________________________________________________________________________________________
conv2_2 (Convolution2D) (None, 128, 444, 444) 147584 conv2_1[0][0]
____________________________________________________________________________________________________
pool2 (MaxPooling2D) (None, 128, 222, 222) 0 conv2_2[0][0]
____________________________________________________________________________________________________
conv3_1 (Convolution2D) (None, 256, 220, 220) 295168 pool2[0][0]
____________________________________________________________________________________________________
conv3_2 (Convolution2D) (None, 256, 218, 218) 590080 conv3_1[0][0]
____________________________________________________________________________________________________
conv3_3 (Convolution2D) (None, 256, 216, 216) 590080 conv3_2[0][0]
____________________________________________________________________________________________________
pool3 (MaxPooling2D) (None, 256, 108, 108) 0 conv3_3[0][0]
____________________________________________________________________________________________________
conv4_1 (Convolution2D) (None, 512, 106, 106) 1180160 pool3[0][0]
____________________________________________________________________________________________________
conv4_2 (Convolution2D) (None, 512, 104, 104) 2359808 conv4_1[0][0]
____________________________________________________________________________________________________
conv4_3 (Convolution2D) (None, 512, 102, 102) 2359808 conv4_2[0][0]
____________________________________________________________________________________________________
conv5_1 (AtrousConvolution2D) (None, 512, 98, 98) 2359808 conv4_3[0][0]
____________________________________________________________________________________________________
conv5_2 (AtrousConvolution2D) (None, 512, 94, 94) 2359808 conv5_1[0][0]
____________________________________________________________________________________________________
conv5_3 (AtrousConvolution2D) (None, 512, 90, 90) 2359808 conv5_2[0][0]
____________________________________________________________________________________________________
fc6 (AtrousConvolution2D) (None, 4096, 66, 66) 102764544 conv5_3[0][0]
____________________________________________________________________________________________________
drop6 (Dropout) (None, 4096, 66, 66) 0 fc6[0][0]
____________________________________________________________________________________________________
fc7 (Convolution2D) (None, 4096, 66, 66) 16781312 drop6[0][0]
____________________________________________________________________________________________________
drop7 (Dropout) (None, 4096, 66, 66) 0 fc7[0][0]
____________________________________________________________________________________________________
fc-final (Convolution2D) (None, 21, 66, 66) 86037 drop7[0][0]
____________________________________________________________________________________________________
zeropadding2d_3 (ZeroPadding2D) (None, 21, 132, 132) 0 fc-final[0][0]
____________________________________________________________________________________________________
ct_conv1_1 (Convolution2D) (None, 42, 130, 130) 7980 zeropadding2d_3[0][0]
____________________________________________________________________________________________________
ct_conv1_2 (Convolution2D) (None, 42, 128, 128) 15918 ct_conv1_1[0][0]
____________________________________________________________________________________________________
ct_conv2_1 (AtrousConvolution2D) (None, 84, 124, 124) 31836 ct_conv1_2[0][0]
____________________________________________________________________________________________________
ct_conv3_1 (AtrousConvolution2D) (None, 168, 116, 116) 127176 ct_conv2_1[0][0]
____________________________________________________________________________________________________
ct_conv4_1 (AtrousConvolution2D) (None, 336, 100, 100) 508368 ct_conv3_1[0][0]
____________________________________________________________________________________________________
ct_conv5_1 (AtrousConvolution2D) (None, 672, 68, 68) 2032800 ct_conv4_1[0][0]
____________________________________________________________________________________________________
ct_fc1 (Convolution2D) (None, 672, 66, 66) 4064928 ct_conv5_1[0][0]
____________________________________________________________________________________________________
ct_final (Convolution2D) (None, 21, 66, 66) 14133 ct_fc1[0][0]
____________________________________________________________________________________________________
permute_5 (Permute) (None, 66, 66, 21) 0 ct_final[0][0]
____________________________________________________________________________________________________
reshape_5 (Reshape) (None, 4356, 21) 0 permute_5[0][0]
____________________________________________________________________________________________________
activation_3 (Activation) (None, 4356, 21) 0 reshape_5[0][0]
____________________________________________________________________________________________________
reshape_6 (Reshape) (None, 66, 66, 21) 0 activation_3[0][0]
____________________________________________________________________________________________________
permute_6 (Permute) (None, 21, 66, 66) 0 reshape_6[0][0]
====================================================================================================
Total params: 141,149,720
Trainable params: 141,149,720
Non-trainable params: 0
____________________________________________________________________________________________________
batch_size is: 1
Traceback (most recent call last):
File "<ipython-input-3-641fac717a39>", line 1, in <module>
runfile('E:/dilation-keras/predict.py', wdir='E:/dilation-keras')
File "c:\users\lenovo\anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "c:\users\lenovo\anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "E:/dilation-keras/predict.py", line 74, in <module>
y_img = predict(im, model, ds)
File "E:/dilation-keras/predict.py", line 46, in predict
prob = model.predict(model_in,batch_size=batch_size)[0]
File "c:\users\lenovo\anaconda2\lib\site-packages\keras\engine\training.py", line 1272, in predict
batch_size=batch_size, verbose=verbose)
File "c:\users\lenovo\anaconda2\lib\site-packages\keras\engine\training.py", line 945, in _predict_loop
batch_outs = f(ins_batch)
File "c:\users\lenovo\anaconda2\lib\site-packages\keras\backend\theano_backend.py", line 959, in __call__
return self.function(*inputs)
File "c:\users\lenovo\anaconda2\lib\site-packages\theano\compile\function_module.py", line 886, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "c:\users\lenovo\anaconda2\lib\site-packages\theano\gof\link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "c:\users\lenovo\anaconda2\lib\site-packages\theano\compile\function_module.py", line 873, in __call__
self.fn() if output_subset is None else\
RuntimeError: GpuCorrMM failed to allocate working memory of 576 x 802816
Apply node that caused the error: GpuCorrMM{valid, (1, 1), (1, 1)}(GpuContiguous.0, GpuContiguous.0)
Toposort index: 95
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D)]
Inputs shapes: [(1, 64, 898, 898), (64, 64, 3, 3)]
Inputs strides: [(0, 806404, 898, 1), (576, 9, 3, 1)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuElemwise{Composite{(i0 * ((i1 + i2) + Abs((i1 + i2))))}}[(0, 1)](CudaNdarrayConstant{[[[[ 0.5]]]]}, GpuCorrMM{valid, (1, 1), (1, 1)}.0, GpuDimShuffle{x,0,x,x}.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
The problem lies in memory. Your largest convolution layers each produce roughly 200 MB of activations (for example, conv1_1 outputs 64 x 898 x 898 float32 values), and the intermediate outputs of the network add up to well over a gigabyte. On top of that, the convolution itself needs a large working buffer, which is exactly the allocation that fails here. This simply does not fit into your card's memory.
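As a rough back-of-the-envelope check (float32, 4 bytes per value), you can estimate these sizes yourself:
# output of conv1_1: 64 feature maps of 898 x 898 float32 values
print(64 * 898 * 898 * 4 / 1024**2)    # ~197 MB
# the working buffer the error complains about: 576 x 802816 float32 values
print(576 * 802816 * 4 / 1024**3)      # ~1.7 GB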