Output of vgg16 layer doesn't make sense

I have a vgg16 network without the last max pooling, fully connected, and softmax layers. The network summary says the last layer's output will have a size of (batchsize, 512, 14, 14), but putting an image into the network gives me an output of (batchsize, 512, 15, 15). How do I fix this?
import torch
import torch.nn as nn
from torchsummary import summary
vgg16 = torch.hub.load('pytorch/vision:v0.10.0', 'vgg16', pretrained=True)
vgg16withoutLastFewLayers = nn.Sequential(*list(vgg16.children())[:-2][0][0:30]).cuda()
image = torch.zeros((1,3,244,244)).cuda()
output = vgg16withoutLastFewLayers(image)
summary(vgg16withoutLastFewLayers, (3,224,224))
print(output.shape)
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590,080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1,180,160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2,359,808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2,359,808
ReLU-30 [-1, 512, 14, 14] 0
================================================================
torch.Size([1, 512, 15, 15])

The output shape should be [512, 14, 14], assuming the input image is [3, 224, 224]. Your input image size is [3, 244, 244]. For example,
image = torch.zeros((1,3,224,224)).cuda()
output = vgg16withoutLastFewLayers(image)
print(output.shape)  # torch.Size([1, 512, 14, 14])
Therefore, increasing the image size also increases the spatial size [H, W] of the output tensor.

Your input shapes are not the same:
image = torch.zeros((1,3,244,244)).cuda()
output = vgg16withoutLastFewLayers(image)
summary(vgg16withoutLastFewLayers, (3,224,224))
print(output.shape)
Difference: 244 vs 224.
Because those VGG layers are all convolutional (and pooling) layers, increasing the size of the input image also increases the spatial size of the output. This would cause issues if a classification head (with no global pooling, etc.) were applied directly on top, since fully connected layers have fixed-size inputs. You're not doing this, but it's something to keep in mind.
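To see the scaling concretely, here is a small sketch (a toy conv/pool stack with random weights, not the pretrained VGG16, but with the same four 2x2 max pools in front of the output): 224 is halved four times down to 14, while 244 goes 122 -> 61 -> 30 -> 15 because each pool floors odd sizes.
import torch
import torch.nn as nn

blocks = []
in_ch = 3
for _ in range(4):
    # the conv keeps the spatial size (padding=1); the 2x2 max pool halves it
    blocks += [nn.Conv2d(in_ch, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)]
    in_ch = 8
net = nn.Sequential(*blocks)

for size in (224, 244):
    out = net(torch.zeros(1, 3, size, size))
    print(size, '->', tuple(out.shape[2:]))  # 224 -> (14, 14); 244 -> (15, 15)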

Related

Model Training Freezing StyleGAN2 in Google CoLab

I want to train a model using this dataset: https://decode.mit.edu/projects/biked/
I followed this tutorial to do so: https://github.com/jeffheaton/present/blob/master/youtube/gan/colab_gan_train.ipynb
The problem is that once I start the training command, the tick freezes at 0.
Is that normal? Should I keep it running? It's taking forever. I tried changing the number of workers to 2, but the problem is still the same. I'm using Colab Free, so I don't know. I even tried my own dataset, and the problem is always the same.
Training options:
{
  "num_gpus": 1,
  "image_snapshot_ticks": 10,
  "network_snapshot_ticks": 10,
  "metrics": [
    "fid50k_full"
  ],
  "random_seed": 0,
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "/content/drive/MyDrive/SquareImages.zip",
    "use_labels": false,
    "max_size": 42799,
    "xflip": false,
    "resolution": 1024
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "num_workers": 3,
    "prefetch_factor": 2
  },
  "G_kwargs": {
    "class_name": "training.networks.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": {
      "num_layers": 2
    },
    "synthesis_kwargs": {
      "channel_base": 32768,
      "channel_max": 512,
      "num_fp16_res": 4,
      "conv_clamp": 256
    }
  },
  "D_kwargs": {
    "class_name": "training.networks.Discriminator",
    "block_kwargs": {},
    "mapping_kwargs": {},
    "epilogue_kwargs": {
      "mbstd_group_size": 4
    },
    "channel_base": 32768,
    "channel_max": 512,
    "num_fp16_res": 4,
    "conv_clamp": 256
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.002,
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.002,
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08
  },
  "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 52.4288
  },
  "total_kimg": 25000,
  "batch_size": 4,
  "batch_gpu": 4,
  "ema_kimg": 1.25,
  "ema_rampup": 0.05,
  "ada_target": 0.6,
  "augment_kwargs": {
    "class_name": "training.augment.AugmentPipe",
    "xflip": 1,
    "rotate90": 1,
    "xint": 1,
    "scale": 1,
    "rotate": 1,
    "aniso": 1,
    "xfrac": 1,
    "brightness": 1,
    "contrast": 1,
    "lumaflip": 1,
    "hue": 1,
    "saturation": 1
  },
  "run_dir": "/content/drive/MyDrive/exp/00003-SquareImages-auto1"
}
Output directory: /content/drive/MyDrive/exp/00003-SquareImages-auto1
Training data: /content/drive/MyDrive/SquareImages.zip
Training duration: 25000 kimg
Number of GPUs: 1
Number of images: 42799
Image resolution: 1024
Conditional model: False
Dataset x-flips: False
Creating output directory...
Launching processes...
Loading training set...
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:474: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Num images: 42799
Image shape: [3, 1024, 1024]
Label shape: [0]
Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Generator Parameters Buffers Output shape Datatype
--- --- --- --- ---
mapping.fc0 262656 - [4, 512] float32
mapping.fc1 262656 - [4, 512] float32
mapping - 512 [4, 18, 512] float32
synthesis.b4.conv1 2622465 32 [4, 512, 4, 4] float32
synthesis.b4.torgb 264195 - [4, 3, 4, 4] float32
synthesis.b4:0 8192 16 [4, 512, 4, 4] float32
synthesis.b4:1 - - [4, 512, 4, 4] float32
synthesis.b8.conv0 2622465 80 [4, 512, 8, 8] float32
synthesis.b8.conv1 2622465 80 [4, 512, 8, 8] float32
synthesis.b8.torgb 264195 - [4, 3, 8, 8] float32
synthesis.b8:0 - 16 [4, 512, 8, 8] float32
synthesis.b8:1 - - [4, 512, 8, 8] float32
synthesis.b16.conv0 2622465 272 [4, 512, 16, 16] float32
synthesis.b16.conv1 2622465 272 [4, 512, 16, 16] float32
synthesis.b16.torgb 264195 - [4, 3, 16, 16] float32
synthesis.b16:0 - 16 [4, 512, 16, 16] float32
synthesis.b16:1 - - [4, 512, 16, 16] float32
synthesis.b32.conv0 2622465 1040 [4, 512, 32, 32] float32
synthesis.b32.conv1 2622465 1040 [4, 512, 32, 32] float32
synthesis.b32.torgb 264195 - [4, 3, 32, 32] float32
synthesis.b32:0 - 16 [4, 512, 32, 32] float32
synthesis.b32:1 - - [4, 512, 32, 32] float32
synthesis.b64.conv0 2622465 4112 [4, 512, 64, 64] float32
synthesis.b64.conv1 2622465 4112 [4, 512, 64, 64] float32
synthesis.b64.torgb 264195 - [4, 3, 64, 64] float32
synthesis.b64:0 - 16 [4, 512, 64, 64] float32
synthesis.b64:1 - - [4, 512, 64, 64] float32
synthesis.b128.conv0 1442561 16400 [4, 256, 128, 128] float16
synthesis.b128.conv1 721409 16400 [4, 256, 128, 128] float16
synthesis.b128.torgb 132099 - [4, 3, 128, 128] float16
synthesis.b128:0 - 16 [4, 256, 128, 128] float16
synthesis.b128:1 - - [4, 256, 128, 128] float32
synthesis.b256.conv0 426369 65552 [4, 128, 256, 256] float16
synthesis.b256.conv1 213249 65552 [4, 128, 256, 256] float16
synthesis.b256.torgb 66051 - [4, 3, 256, 256] float16
synthesis.b256:0 - 16 [4, 128, 256, 256] float16
synthesis.b256:1 - - [4, 128, 256, 256] float32
synthesis.b512.conv0 139457 262160 [4, 64, 512, 512] float16
synthesis.b512.conv1 69761 262160 [4, 64, 512, 512] float16
synthesis.b512.torgb 33027 - [4, 3, 512, 512] float16
synthesis.b512:0 - 16 [4, 64, 512, 512] float16
synthesis.b512:1 - - [4, 64, 512, 512] float32
synthesis.b1024.conv0 51297 1048592 [4, 32, 1024, 1024] float16
synthesis.b1024.conv1 25665 1048592 [4, 32, 1024, 1024] float16
synthesis.b1024.torgb 16515 - [4, 3, 1024, 1024] float16
synthesis.b1024:0 - 16 [4, 32, 1024, 1024] float16
synthesis.b1024:1 - - [4, 32, 1024, 1024] float32
--- --- --- --- ---
Total 28794124 2797104 - -
Discriminator Parameters Buffers Output shape Datatype
--- --- --- --- ---
b1024.fromrgb 128 16 [4, 32, 1024, 1024] float16
b1024.skip 2048 16 [4, 64, 512, 512] float16
b1024.conv0 9248 16 [4, 32, 1024, 1024] float16
b1024.conv1 18496 16 [4, 64, 512, 512] float16
b1024 - 16 [4, 64, 512, 512] float16
b512.skip 8192 16 [4, 128, 256, 256] float16
b512.conv0 36928 16 [4, 64, 512, 512] float16
b512.conv1 73856 16 [4, 128, 256, 256] float16
b512 - 16 [4, 128, 256, 256] float16
b256.skip 32768 16 [4, 256, 128, 128] float16
b256.conv0 147584 16 [4, 128, 256, 256] float16
b256.conv1 295168 16 [4, 256, 128, 128] float16
b256 - 16 [4, 256, 128, 128] float16
b128.skip 131072 16 [4, 512, 64, 64] float16
b128.conv0 590080 16 [4, 256, 128, 128] float16
b128.conv1 1180160 16 [4, 512, 64, 64] float16
b128 - 16 [4, 512, 64, 64] float16
b64.skip 262144 16 [4, 512, 32, 32] float32
b64.conv0 2359808 16 [4, 512, 64, 64] float32
b64.conv1 2359808 16 [4, 512, 32, 32] float32
b64 - 16 [4, 512, 32, 32] float32
b32.skip 262144 16 [4, 512, 16, 16] float32
b32.conv0 2359808 16 [4, 512, 32, 32] float32
b32.conv1 2359808 16 [4, 512, 16, 16] float32
b32 - 16 [4, 512, 16, 16] float32
b16.skip 262144 16 [4, 512, 8, 8] float32
b16.conv0 2359808 16 [4, 512, 16, 16] float32
b16.conv1 2359808 16 [4, 512, 8, 8] float32
b16 - 16 [4, 512, 8, 8] float32
b8.skip 262144 16 [4, 512, 4, 4] float32
b8.conv0 2359808 16 [4, 512, 8, 8] float32
b8.conv1 2359808 16 [4, 512, 4, 4] float32
b8 - 16 [4, 512, 4, 4] float32
b4.mbstd - - [4, 513, 4, 4] float32
b4.conv 2364416 16 [4, 512, 4, 4] float32
b4.fc 4194816 - [4, 512] float32
b4.out 513 - [4, 1] float32
--- --- --- --- ---
Total 29012513 544 - -
Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Exporting sample images...
Initializing logs...
Training for 25000 kimg...
tick 0 kimg 0.0 time 1m 31s sec/tick 14.4 sec/kimg 3595.75 maintenance 76.8 cpumem 4.82 gpumem 11.32 augment 0.000
Evaluating metrics...
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:474: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
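The warning above is the main lead: this Colab instance only suggests 2 DataLoader workers while the config requests 3, and the first "Evaluating metrics..." pass (fid50k_full over ~43k images at 1024x1024) is itself very slow on a single free GPU, so the run can look frozen at tick 0. A sketch of a relaunch that addresses both, assuming the flags of NVlabs/stylegan2-ada-pytorch's train.py (paths taken from the post):
# Colab cell: fewer DataLoader workers, and skip FID evaluation at snapshots
!python train.py \
    --outdir=/content/drive/MyDrive/exp \
    --data=/content/drive/MyDrive/SquareImages.zip \
    --gpus=1 \
    --workers=2 \
    --metrics=none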

How to prevent my model from outputting zero-vectors while training using one-hot encoded vectors?

I have been training a model for a study on one-shot learning.
It has 19280 examples in the training dataset (basically the popular Omniglot dataset), and a 300-length vector for each data sample.
The model consists of the following architecture:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 32, 50, 50] 1,600
BatchNorm2d-2 [-1, 32, 50, 50] 64
ReLU-3 [-1, 32, 50, 50] 0
Conv2d-4 [-1, 32, 50, 50] 9,248
BatchNorm2d-5 [-1, 32, 50, 50] 64
ReLU-6 [-1, 32, 50, 50] 0
Conv2d-7 [-1, 32, 50, 50] 9,248
BatchNorm2d-8 [-1, 32, 50, 50] 64
ReLU-9 [-1, 32, 50, 50] 0
Conv2d-10 [-1, 64, 24, 24] 18,496
BatchNorm2d-11 [-1, 64, 24, 24] 128
ReLU-12 [-1, 64, 24, 24] 0
Conv2d-13 [-1, 64, 24, 24] 36,928
BatchNorm2d-14 [-1, 64, 24, 24] 128
ReLU-15 [-1, 64, 24, 24] 0
Conv2d-16 [-1, 256, 11, 11] 147,712
BatchNorm2d-17 [-1, 256, 11, 11] 512
ReLU-18 [-1, 256, 11, 11] 0
Conv2d-19 [-1, 512, 5, 5] 1,180,160
BatchNorm2d-20 [-1, 512, 5, 5] 1,024
ReLU-21 [-1, 512, 5, 5] 0
Conv2d-22 [-1, 1024, 2, 2] 4,719,616
BatchNorm2d-23 [-1, 1024, 2, 2] 2,048
ReLU-24 [-1, 1024, 2, 2] 0
Linear-25 [-1, 300] 1,229,100
================================================================
Total params: 7,356,140
Trainable params: 7,356,140
Non-trainable params: 0
----------------------------------------------------------------
Basically, the input is a 105x105 single-channel (grayscale) image, and I have applied a sigmoid to the output.
While training the model, I use simple mean-squared error and the Adam optimizer with the learning rate set to $10^{-5}$, to help tuning.
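A minimal runnable sketch of that setup (PyTorch, matching the torchsummary-style table above; the stand-in model and tensor names are mine, not from the post):
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(105 * 105, 300))  # placeholder for the CNN above
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

x = torch.randn(8, 1, 105, 105)                  # batch of grayscale 105x105 images
target = torch.zeros(8, 300)                     # 300-length one-hot-style targets
target[torch.arange(8), torch.randint(0, 300, (8,))] = 1.0

pred = torch.sigmoid(model(x))                   # sigmoid applied to the output
loss = criterion(pred, target)                   # plain mean-squared error
optimizer.zero_grad()
loss.backward()
optimizer.step()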
The model gets stuck at the same constant loss after 2 or 3 epochs.
On further investigation, I found that the model degenerates and outputs a zero vector in every case. I am assuming it is stuck in a local minimum, but how do I go about training my model successfully?
Also, the model architecture was chosen somewhat arbitrarily (no particular logic behind the dimensionality retained at the end of each layer), so please advise me if you see any irregularity in the layers whose correction would improve training.
I would love to hear some tips :) .

How can I determine the input dimensions for a caffe blob

I'm trying to print out some diagnostics for a caffe net, but although I can find the shape of the data output by a blob, I cannot directly find the shape of the expected input data. For example:
nb = self.net.blobs  # nb is an OrderedDict of the blob objects that make up a VGG16 net
for ctr, name in enumerate(nb):
    print ctr, name, nb[name].data.shape
0 data (10, 3, 224, 224)
1 conv1_1 (10, 64, 224, 224)
2 conv1_2 (10, 64, 224, 224)
3 pool1 (10, 64, 112, 112)
4 conv2_1 (10, 128, 112, 112)
5 conv2_2 (10, 128, 112, 112)
6 pool2 (10, 128, 56, 56)
7 conv3_1 (10, 256, 56, 56)
8 conv3_2 (10, 256, 56, 56)
9 conv3_3 (10, 256, 56, 56)
10 pool3 (10, 256, 28, 28)
11 conv4_1 (10, 512, 28, 28)
12 conv4_2 (10, 512, 28, 28)
13 conv4_3 (10, 512, 28, 28)
14 pool4 (10, 512, 14, 14)
15 conv5_1 (10, 512, 14, 14)
16 conv5_2 (10, 512, 14, 14)
17 conv5_3 (10, 512, 14, 14)
18 pool5 (10, 512, 7, 7)
19 fc6 (10, 4096)
20 fc7 (10, 4096)
21 fc8a (10, 365)
22 prob (10, 365)
How can I change this code so that the output is of the form:
layer_number layer_name input_shape output_shape
without directly querying the parent layer to see what output it gives?
You can modify the code in this answer to iterate the net layer by layer:
def dont_forget_to_thank_me_later(net):
    for li in xrange(len(net.layers)):  # for each layer in the net
        print "{}\t{}\t".format(li, net._layer_names[li]),
        # for each input to the layer (aka "bottom") print its name and shape
        for bi in list(net._bottom_ids(li)):
            print "{} ({}) ".format(net._blob_names[bi], net.blobs[net._blob_names[bi]].data.shape),
        print "\t",
        # for each output of the layer (aka "top") print its name and shape
        for bi in list(net._top_ids(li)):
            print "{} ({}) ".format(net._blob_names[bi], net.blobs[net._blob_names[bi]].data.shape),
        print ""  # end of line
Note that a layer may have more than one input, or more than one output...
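The snippet above is Python 2. A Python 3 equivalent using the same pycaffe internals (a sketch, with a function name of my choosing) could be:
def print_layer_io(net):
    # per layer: index, name, input (bottom) shapes, output (top) shapes
    for li in range(len(net.layers)):
        bottoms = ['{} ({})'.format(net._blob_names[bi], net.blobs[net._blob_names[bi]].data.shape)
                   for bi in net._bottom_ids(li)]
        tops = ['{} ({})'.format(net._blob_names[bi], net.blobs[net._blob_names[bi]].data.shape)
                for bi in net._top_ids(li)]
        print('{}\t{}\t{}\t{}'.format(li, net._layer_names[li], ' '.join(bottoms), ' '.join(tops)))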

Keras RuntimeError: GpuCorrMM failed to allocate working memory of 576 x 802816

I tried to run some deep learning code in Keras but got the following error message every time. I've searched all around and spent a lot of time on it but still failed to fix it. I'm a complete beginner; any help will be appreciated!
runfile('E:/dilation-keras/predict.py', wdir='E:/dilation-keras')
Using Theano backend.
Using gpu device 0: GeForce GT 635M (CNMeM is enabled with initial size: 90.0% of memory, cuDNN not available)
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_3 (InputLayer) (None, 3, 900, 900) 0
____________________________________________________________________________________________________
conv1_1 (Convolution2D) (None, 64, 898, 898) 1792 input_3[0][0]
____________________________________________________________________________________________________
conv1_2 (Convolution2D) (None, 64, 896, 896) 36928 conv1_1[0][0]
____________________________________________________________________________________________________
pool1 (MaxPooling2D) (None, 64, 448, 448) 0 conv1_2[0][0]
____________________________________________________________________________________________________
conv2_1 (Convolution2D) (None, 128, 446, 446) 73856 pool1[0][0]
____________________________________________________________________________________________________
conv2_2 (Convolution2D) (None, 128, 444, 444) 147584 conv2_1[0][0]
____________________________________________________________________________________________________
pool2 (MaxPooling2D) (None, 128, 222, 222) 0 conv2_2[0][0]
____________________________________________________________________________________________________
conv3_1 (Convolution2D) (None, 256, 220, 220) 295168 pool2[0][0]
____________________________________________________________________________________________________
conv3_2 (Convolution2D) (None, 256, 218, 218) 590080 conv3_1[0][0]
____________________________________________________________________________________________________
conv3_3 (Convolution2D) (None, 256, 216, 216) 590080 conv3_2[0][0]
____________________________________________________________________________________________________
pool3 (MaxPooling2D) (None, 256, 108, 108) 0 conv3_3[0][0]
____________________________________________________________________________________________________
conv4_1 (Convolution2D) (None, 512, 106, 106) 1180160 pool3[0][0]
____________________________________________________________________________________________________
conv4_2 (Convolution2D) (None, 512, 104, 104) 2359808 conv4_1[0][0]
____________________________________________________________________________________________________
conv4_3 (Convolution2D) (None, 512, 102, 102) 2359808 conv4_2[0][0]
____________________________________________________________________________________________________
conv5_1 (AtrousConvolution2D) (None, 512, 98, 98) 2359808 conv4_3[0][0]
____________________________________________________________________________________________________
conv5_2 (AtrousConvolution2D) (None, 512, 94, 94) 2359808 conv5_1[0][0]
____________________________________________________________________________________________________
conv5_3 (AtrousConvolution2D) (None, 512, 90, 90) 2359808 conv5_2[0][0]
____________________________________________________________________________________________________
fc6 (AtrousConvolution2D) (None, 4096, 66, 66) 102764544 conv5_3[0][0]
____________________________________________________________________________________________________
drop6 (Dropout) (None, 4096, 66, 66) 0 fc6[0][0]
____________________________________________________________________________________________________
fc7 (Convolution2D) (None, 4096, 66, 66) 16781312 drop6[0][0]
____________________________________________________________________________________________________
drop7 (Dropout) (None, 4096, 66, 66) 0 fc7[0][0]
____________________________________________________________________________________________________
fc-final (Convolution2D) (None, 21, 66, 66) 86037 drop7[0][0]
____________________________________________________________________________________________________
zeropadding2d_3 (ZeroPadding2D) (None, 21, 132, 132) 0 fc-final[0][0]
____________________________________________________________________________________________________
ct_conv1_1 (Convolution2D) (None, 42, 130, 130) 7980 zeropadding2d_3[0][0]
____________________________________________________________________________________________________
ct_conv1_2 (Convolution2D) (None, 42, 128, 128) 15918 ct_conv1_1[0][0]
____________________________________________________________________________________________________
ct_conv2_1 (AtrousConvolution2D) (None, 84, 124, 124) 31836 ct_conv1_2[0][0]
____________________________________________________________________________________________________
ct_conv3_1 (AtrousConvolution2D) (None, 168, 116, 116) 127176 ct_conv2_1[0][0]
____________________________________________________________________________________________________
ct_conv4_1 (AtrousConvolution2D) (None, 336, 100, 100) 508368 ct_conv3_1[0][0]
____________________________________________________________________________________________________
ct_conv5_1 (AtrousConvolution2D) (None, 672, 68, 68) 2032800 ct_conv4_1[0][0]
____________________________________________________________________________________________________
ct_fc1 (Convolution2D) (None, 672, 66, 66) 4064928 ct_conv5_1[0][0]
____________________________________________________________________________________________________
ct_final (Convolution2D) (None, 21, 66, 66) 14133 ct_fc1[0][0]
____________________________________________________________________________________________________
permute_5 (Permute) (None, 66, 66, 21) 0 ct_final[0][0]
____________________________________________________________________________________________________
reshape_5 (Reshape) (None, 4356, 21) 0 permute_5[0][0]
____________________________________________________________________________________________________
activation_3 (Activation) (None, 4356, 21) 0 reshape_5[0][0]
____________________________________________________________________________________________________
reshape_6 (Reshape) (None, 66, 66, 21) 0 activation_3[0][0]
____________________________________________________________________________________________________
permute_6 (Permute) (None, 21, 66, 66) 0 reshape_6[0][0]
====================================================================================================
Total params: 141,149,720
Trainable params: 141,149,720
Non-trainable params: 0
____________________________________________________________________________________________________
batch_size is: 1
Traceback (most recent call last):
File "<ipython-input-3-641fac717a39>", line 1, in <module>
runfile('E:/dilation-keras/predict.py', wdir='E:/dilation-keras')
File "c:\users\lenovo\anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "c:\users\lenovo\anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "E:/dilation-keras/predict.py", line 74, in <module>
y_img = predict(im, model, ds)
File "E:/dilation-keras/predict.py", line 46, in predict
prob = model.predict(model_in,batch_size=batch_size)[0]
File "c:\users\lenovo\anaconda2\lib\site-packages\keras\engine\training.py", line 1272, in predict
batch_size=batch_size, verbose=verbose)
File "c:\users\lenovo\anaconda2\lib\site-packages\keras\engine\training.py", line 945, in _predict_loop
batch_outs = f(ins_batch)
File "c:\users\lenovo\anaconda2\lib\site-packages\keras\backend\theano_backend.py", line 959, in __call__
return self.function(*inputs)
File "c:\users\lenovo\anaconda2\lib\site-packages\theano\compile\function_module.py", line 886, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "c:\users\lenovo\anaconda2\lib\site-packages\theano\gof\link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "c:\users\lenovo\anaconda2\lib\site-packages\theano\compile\function_module.py", line 873, in __call__
self.fn() if output_subset is None else\
RuntimeError: GpuCorrMM failed to allocate working memory of 576 x 802816
Apply node that caused the error: GpuCorrMM{valid, (1, 1), (1, 1)}(GpuContiguous.0, GpuContiguous.0)
Toposort index: 95
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D)]
Inputs shapes: [(1, 64, 898, 898), (64, 64, 3, 3)]
Inputs strides: [(0, 806404, 898, 1), (576, 9, 3, 1)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuElemwise{Composite{(i0 * ((i1 + i2) + Abs((i1 + i2))))}}[(0, 1)](CudaNdarrayConstant{[[[[ 0.5]]]]}, GpuCorrMM{valid, (1, 1), (1, 1)}.0, GpuDimShuffle{x,0,x,x}.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
The problem lies in memory. Your first ten Conv2D layers each need approximately 200 MB to store their output, which means roughly 2 GB just for the outputs of those first 10 layers. That will certainly not fit into your card's memory.
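As a quick back-of-the-envelope check (my arithmetic, not from the original answer): the first conv layer's output is (1, 64, 898, 898) in float32, i.e.
bytes_per_layer = 64 * 898 * 898 * 4   # channels * H * W * sizeof(float32)
print(bytes_per_layer / 2**20)         # ~196.9 MiB for this one layer's output
and the other early layers are of the same order, which is where the ~2 GB comes from.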

Depth Estimation using Keras

I'm trying to design a convolutional net to estimate the depth of images using Keras.
I have RGB input images with shape 3x120x160 and grayscale output depth maps with shape 1x120x160.
I tried using a VGG-like architecture where the depth of each layer grows, but when it comes to designing the final layers, I get stuck. Using a Dense layer is too expensive, and I tried Upsampling, which proved inefficient.
I want to use Deconvolution2D but I can't get it to work. The only architecture I end up with is something like this:
from keras.models import Sequential
from keras.layers import (Convolution2D, MaxPooling2D, Dropout,
                          ZeroPadding2D, Deconvolution2D, Cropping2D)

model = Sequential()
model.add(Convolution2D(64, 5, 5, activation='relu', input_shape=(3, 120, 160)))
model.add(Convolution2D(64, 5, 5, activation='relu'))
model.add(MaxPooling2D())
model.add(Dropout(0.5))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D())
model.add(Dropout(0.5))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(Dropout(0.5))
model.add(ZeroPadding2D())
model.add(Deconvolution2D(512, 3, 3, (None, 512, 41, 61), subsample=(2, 2), activation='relu'))
model.add(Deconvolution2D(512, 3, 3, (None, 512, 123, 183), subsample=(3, 3), activation='relu'))
model.add(Cropping2D(cropping=((1, 2), (11, 12))))
model.add(Convolution2D(1, 1, 1, activation='sigmoid', border_mode='same'))
The model summary is like this:
Layer (type) Output Shape Param # Connected to
====================================================================================================
convolution2d_1 (Convolution2D) (None, 64, 116, 156) 4864 convolution2d_input_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D) (None, 64, 112, 152) 102464 convolution2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D) (None, 64, 56, 76) 0 convolution2d_2[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 64, 56, 76) 0 maxpooling2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D) (None, 128, 54, 74) 73856 dropout_1[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D) (None, 128, 52, 72) 147584 convolution2d_3[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D) (None, 128, 26, 36) 0 convolution2d_4[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 128, 26, 36) 0 maxpooling2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D) (None, 256, 24, 34) 295168 dropout_2[0][0]
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D) (None, 256, 22, 32) 590080 convolution2d_5[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (None, 256, 22, 32) 0 convolution2d_6[0][0]
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D) (None, 512, 20, 30) 1180160 dropout_3[0][0]
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D) (None, 512, 18, 28) 2359808 convolution2d_7[0][0]
____________________________________________________________________________________________________
dropout_4 (Dropout) (None, 512, 18, 28) 0 convolution2d_8[0][0]
____________________________________________________________________________________________________
zeropadding2d_1 (ZeroPadding2D) (None, 512, 20, 30) 0 dropout_4[0][0]
____________________________________________________________________________________________________
deconvolution2d_1 (Deconvolution2(None, 512, 41, 61) 2359808 zeropadding2d_1[0][0]
____________________________________________________________________________________________________
deconvolution2d_2 (Deconvolution2(None, 512, 123, 183) 2359808 deconvolution2d_1[0][0]
____________________________________________________________________________________________________
cropping2d_1 (Cropping2D) (None, 512, 120, 160) 0 deconvolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_9 (Convolution2D) (None, 1, 120, 160) 513 cropping2d_1[0][0]
====================================================================================================
Total params: 9474113
I couldn't reduce the number of filters of the Deconvolution2D layers below 512, as doing so results in shape-related errors; it seems I have to use as many filters as in the previous layer.
I also had to add a final Convolution2D layer to be able to run the network.
The above architecture learns, but really slowly and (I think) inefficiently. I'm sure I'm doing something wrong and the design shouldn't be like this. Can you help me design a better network?
I also tried to build a network like the one mentioned in this repository, but it seems Keras doesn't work the way this Lasagne example does. I'd really appreciate it if someone could show me how to design something like that network in Keras. Its architecture is like this:
Thanks
I'd suggest a U-Net (see figure 1). In the first half of a U-Net, the spatial resolution is reduced while the number of channels increases (like the VGG you mentioned). In the second half, the opposite happens (the number of channels is reduced while the resolution increases). "Skip" connections between corresponding layers allow the network to efficiently produce high-resolution output.
You should be able to find an appropriate Keras implementation (maybe this one).
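For reference, a minimal sketch of the U-Net shape (my illustration, written with the modern tf.keras functional API rather than the Keras 1 API used above; channels-last, sized for the 120x160 depth maps):
from tensorflow.keras import Input, Model, layers

def tiny_unet(input_shape=(120, 160, 3)):
    inp = Input(shape=input_shape)
    # encoder: resolution down, channels up
    c1 = layers.Conv2D(32, 3, padding='same', activation='relu')(inp)
    p1 = layers.MaxPooling2D()(c1)                        # 60 x 80
    c2 = layers.Conv2D(64, 3, padding='same', activation='relu')(p1)
    p2 = layers.MaxPooling2D()(c2)                        # 30 x 40
    b = layers.Conv2D(128, 3, padding='same', activation='relu')(p2)
    # decoder: resolution up, channels down, with skip connections
    u2 = layers.Concatenate()([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(64, 3, padding='same', activation='relu')(u2)
    u1 = layers.Concatenate()([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(32, 3, padding='same', activation='relu')(u1)
    out = layers.Conv2D(1, 1, activation='sigmoid')(c4)   # 1-channel depth map
    return Model(inp, out)

model = tiny_unet()
model.compile(optimizer='adam', loss='mse')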
