How to prevent my model from outputting zero-vectors while training using one-hot encoded vectors? - machine-learning

I have been training a model for a study on one-shot learning.
The training dataset has 19,280 examples (essentially the popular Omniglot dataset), with a 300-dimensional one-hot target vector for each sample.
The model consists of the following architecture-
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 32, 50, 50] 1,600
BatchNorm2d-2 [-1, 32, 50, 50] 64
ReLU-3 [-1, 32, 50, 50] 0
Conv2d-4 [-1, 32, 50, 50] 9,248
BatchNorm2d-5 [-1, 32, 50, 50] 64
ReLU-6 [-1, 32, 50, 50] 0
Conv2d-7 [-1, 32, 50, 50] 9,248
BatchNorm2d-8 [-1, 32, 50, 50] 64
ReLU-9 [-1, 32, 50, 50] 0
Conv2d-10 [-1, 64, 24, 24] 18,496
BatchNorm2d-11 [-1, 64, 24, 24] 128
ReLU-12 [-1, 64, 24, 24] 0
Conv2d-13 [-1, 64, 24, 24] 36,928
BatchNorm2d-14 [-1, 64, 24, 24] 128
ReLU-15 [-1, 64, 24, 24] 0
Conv2d-16 [-1, 256, 11, 11] 147,712
BatchNorm2d-17 [-1, 256, 11, 11] 512
ReLU-18 [-1, 256, 11, 11] 0
Conv2d-19 [-1, 512, 5, 5] 1,180,160
BatchNorm2d-20 [-1, 512, 5, 5] 1,024
ReLU-21 [-1, 512, 5, 5] 0
Conv2d-22 [-1, 1024, 2, 2] 4,719,616
BatchNorm2d-23 [-1, 1024, 2, 2] 2,048
ReLU-24 [-1, 1024, 2, 2] 0
Linear-25 [-1, 300] 1,229,100
================================================================
Total params: 7,356,140
Trainable params: 7,356,140
Non-trainable params: 0
----------------------------------------------------------------
Basically, the input is a 105x105 single-channel (grayscale) image, and I apply a sigmoid to the 300-dimensional output.
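For reference, here is one possible PyTorch reconstruction that matches the summary above; the kernel sizes, strides, and padding are inferred from the parameter counts and output shapes, so they are an assumption rather than the exact original code:

import torch
import torch.nn as nn

def block(c_in, c_out, **kw):
    # Conv -> BatchNorm -> ReLU, as in the summary above
    return [nn.Conv2d(c_in, c_out, **kw), nn.BatchNorm2d(c_out), nn.ReLU()]

# Inferred layout: 1 x 105 x 105 in, 300-dim vector out (7,356,140 parameters in total)
model = nn.Sequential(
    *block(1, 32, kernel_size=7, stride=2),        # -> 32 x 50 x 50
    *block(32, 32, kernel_size=3, padding=1),      # -> 32 x 50 x 50
    *block(32, 32, kernel_size=3, padding=1),      # -> 32 x 50 x 50
    *block(32, 64, kernel_size=3, stride=2),       # -> 64 x 24 x 24
    *block(64, 64, kernel_size=3, padding=1),      # -> 64 x 24 x 24
    *block(64, 256, kernel_size=3, stride=2),      # -> 256 x 11 x 11
    *block(256, 512, kernel_size=3, stride=2),     # -> 512 x 5 x 5
    *block(512, 1024, kernel_size=3, stride=2),    # -> 1024 x 2 x 2
    nn.Flatten(),
    nn.Linear(1024 * 2 * 2, 300),                  # -> 300
)

print(model(torch.randn(1, 1, 105, 105)).shape)    # torch.Size([1, 300])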
For training I use a simple mean-squared error loss and the Adam optimizer, with the learning rate set to $10^{-5}$ to help with tuning; a minimal sketch of this setup is shown below.
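Concretely, one training step looks roughly like this (a sketch only, reusing `model` and the imports from the reconstruction above; the batch and targets here are dummy placeholders):

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

images = torch.randn(8, 1, 105, 105)                          # dummy batch of grayscale images
targets = torch.zeros(8, 300)                                 # 300-dim one-hot targets
targets[torch.arange(8), torch.randint(0, 300, (8,))] = 1.0

optimizer.zero_grad()
outputs = torch.sigmoid(model(images))                        # sigmoid on the final linear layer
loss = criterion(outputs, targets)                            # plain MSE against the one-hot vectors
loss.backward()
optimizer.step()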
The model gets stuck at the same constant loss after 2 or 3 epochs.
On further investigation, I found that the model collapses and outputs a zero vector for every input. I am assuming it is stuck in a local minimum, but how do I go about training my model successfully?
Also, the architecture was chosen more or less arbitrarily (not literally at random, but with no particular logic behind the dimensionality kept at the end of each layer), so please point out any irregularity in the layers that, once fixed, would improve training.
I would love to hear some tips :)

Related

Model Training Freezing StyleGAN2 in Google CoLab

I want to train a model using this dataset: https://decode.mit.edu/projects/biked/
and I followed this tutorial to do so: https://github.com/jeffheaton/present/blob/master/youtube/gan/colab_gan_train.ipynb
The problem is that once I run the training command, the tick counter freezes at 0.
Is that normal? Should I keep it running? It is taking forever. I tried changing the number of workers to 2, but the problem stays the same. I am using the free tier of Colab, so I am not sure whether that is the cause; I even tried my own dataset, and I get the same problem every time.
Training options:
{
  "num_gpus": 1,
  "image_snapshot_ticks": 10,
  "network_snapshot_ticks": 10,
  "metrics": [
    "fid50k_full"
  ],
  "random_seed": 0,
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "/content/drive/MyDrive/SquareImages.zip",
    "use_labels": false,
    "max_size": 42799,
    "xflip": false,
    "resolution": 1024
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "num_workers": 3,
    "prefetch_factor": 2
  },
  "G_kwargs": {
    "class_name": "training.networks.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": {
      "num_layers": 2
    },
    "synthesis_kwargs": {
      "channel_base": 32768,
      "channel_max": 512,
      "num_fp16_res": 4,
      "conv_clamp": 256
    }
  },
  "D_kwargs": {
    "class_name": "training.networks.Discriminator",
    "block_kwargs": {},
    "mapping_kwargs": {},
    "epilogue_kwargs": {
      "mbstd_group_size": 4
    },
    "channel_base": 32768,
    "channel_max": 512,
    "num_fp16_res": 4,
    "conv_clamp": 256
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.002,
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.002,
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08
  },
  "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 52.4288
  },
  "total_kimg": 25000,
  "batch_size": 4,
  "batch_gpu": 4,
  "ema_kimg": 1.25,
  "ema_rampup": 0.05,
  "ada_target": 0.6,
  "augment_kwargs": {
    "class_name": "training.augment.AugmentPipe",
    "xflip": 1,
    "rotate90": 1,
    "xint": 1,
    "scale": 1,
    "rotate": 1,
    "aniso": 1,
    "xfrac": 1,
    "brightness": 1,
    "contrast": 1,
    "lumaflip": 1,
    "hue": 1,
    "saturation": 1
  },
  "run_dir": "/content/drive/MyDrive/exp/00003-SquareImages-auto1"
}
Output directory: /content/drive/MyDrive/exp/00003-SquareImages-auto1
Training data: /content/drive/MyDrive/SquareImages.zip
Training duration: 25000 kimg
Number of GPUs: 1
Number of images: 42799
Image resolution: 1024
Conditional model: False
Dataset x-flips: False
Creating output directory...
Launching processes...
Loading training set...
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:474: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Num images: 42799
Image shape: [3, 1024, 1024]
Label shape: [0]
Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Generator Parameters Buffers Output shape Datatype
--- --- --- --- ---
mapping.fc0 262656 - [4, 512] float32
mapping.fc1 262656 - [4, 512] float32
mapping - 512 [4, 18, 512] float32
synthesis.b4.conv1 2622465 32 [4, 512, 4, 4] float32
synthesis.b4.torgb 264195 - [4, 3, 4, 4] float32
synthesis.b4:0 8192 16 [4, 512, 4, 4] float32
synthesis.b4:1 - - [4, 512, 4, 4] float32
synthesis.b8.conv0 2622465 80 [4, 512, 8, 8] float32
synthesis.b8.conv1 2622465 80 [4, 512, 8, 8] float32
synthesis.b8.torgb 264195 - [4, 3, 8, 8] float32
synthesis.b8:0 - 16 [4, 512, 8, 8] float32
synthesis.b8:1 - - [4, 512, 8, 8] float32
synthesis.b16.conv0 2622465 272 [4, 512, 16, 16] float32
synthesis.b16.conv1 2622465 272 [4, 512, 16, 16] float32
synthesis.b16.torgb 264195 - [4, 3, 16, 16] float32
synthesis.b16:0 - 16 [4, 512, 16, 16] float32
synthesis.b16:1 - - [4, 512, 16, 16] float32
synthesis.b32.conv0 2622465 1040 [4, 512, 32, 32] float32
synthesis.b32.conv1 2622465 1040 [4, 512, 32, 32] float32
synthesis.b32.torgb 264195 - [4, 3, 32, 32] float32
synthesis.b32:0 - 16 [4, 512, 32, 32] float32
synthesis.b32:1 - - [4, 512, 32, 32] float32
synthesis.b64.conv0 2622465 4112 [4, 512, 64, 64] float32
synthesis.b64.conv1 2622465 4112 [4, 512, 64, 64] float32
synthesis.b64.torgb 264195 - [4, 3, 64, 64] float32
synthesis.b64:0 - 16 [4, 512, 64, 64] float32
synthesis.b64:1 - - [4, 512, 64, 64] float32
synthesis.b128.conv0 1442561 16400 [4, 256, 128, 128] float16
synthesis.b128.conv1 721409 16400 [4, 256, 128, 128] float16
synthesis.b128.torgb 132099 - [4, 3, 128, 128] float16
synthesis.b128:0 - 16 [4, 256, 128, 128] float16
synthesis.b128:1 - - [4, 256, 128, 128] float32
synthesis.b256.conv0 426369 65552 [4, 128, 256, 256] float16
synthesis.b256.conv1 213249 65552 [4, 128, 256, 256] float16
synthesis.b256.torgb 66051 - [4, 3, 256, 256] float16
synthesis.b256:0 - 16 [4, 128, 256, 256] float16
synthesis.b256:1 - - [4, 128, 256, 256] float32
synthesis.b512.conv0 139457 262160 [4, 64, 512, 512] float16
synthesis.b512.conv1 69761 262160 [4, 64, 512, 512] float16
synthesis.b512.torgb 33027 - [4, 3, 512, 512] float16
synthesis.b512:0 - 16 [4, 64, 512, 512] float16
synthesis.b512:1 - - [4, 64, 512, 512] float32
synthesis.b1024.conv0 51297 1048592 [4, 32, 1024, 1024] float16
synthesis.b1024.conv1 25665 1048592 [4, 32, 1024, 1024] float16
synthesis.b1024.torgb 16515 - [4, 3, 1024, 1024] float16
synthesis.b1024:0 - 16 [4, 32, 1024, 1024] float16
synthesis.b1024:1 - - [4, 32, 1024, 1024] float32
--- --- --- --- ---
Total 28794124 2797104 - -
Discriminator Parameters Buffers Output shape Datatype
--- --- --- --- ---
b1024.fromrgb 128 16 [4, 32, 1024, 1024] float16
b1024.skip 2048 16 [4, 64, 512, 512] float16
b1024.conv0 9248 16 [4, 32, 1024, 1024] float16
b1024.conv1 18496 16 [4, 64, 512, 512] float16
b1024 - 16 [4, 64, 512, 512] float16
b512.skip 8192 16 [4, 128, 256, 256] float16
b512.conv0 36928 16 [4, 64, 512, 512] float16
b512.conv1 73856 16 [4, 128, 256, 256] float16
b512 - 16 [4, 128, 256, 256] float16
b256.skip 32768 16 [4, 256, 128, 128] float16
b256.conv0 147584 16 [4, 128, 256, 256] float16
b256.conv1 295168 16 [4, 256, 128, 128] float16
b256 - 16 [4, 256, 128, 128] float16
b128.skip 131072 16 [4, 512, 64, 64] float16
b128.conv0 590080 16 [4, 256, 128, 128] float16
b128.conv1 1180160 16 [4, 512, 64, 64] float16
b128 - 16 [4, 512, 64, 64] float16
b64.skip 262144 16 [4, 512, 32, 32] float32
b64.conv0 2359808 16 [4, 512, 64, 64] float32
b64.conv1 2359808 16 [4, 512, 32, 32] float32
b64 - 16 [4, 512, 32, 32] float32
b32.skip 262144 16 [4, 512, 16, 16] float32
b32.conv0 2359808 16 [4, 512, 32, 32] float32
b32.conv1 2359808 16 [4, 512, 16, 16] float32
b32 - 16 [4, 512, 16, 16] float32
b16.skip 262144 16 [4, 512, 8, 8] float32
b16.conv0 2359808 16 [4, 512, 16, 16] float32
b16.conv1 2359808 16 [4, 512, 8, 8] float32
b16 - 16 [4, 512, 8, 8] float32
b8.skip 262144 16 [4, 512, 4, 4] float32
b8.conv0 2359808 16 [4, 512, 8, 8] float32
b8.conv1 2359808 16 [4, 512, 4, 4] float32
b8 - 16 [4, 512, 4, 4] float32
b4.mbstd - - [4, 513, 4, 4] float32
b4.conv 2364416 16 [4, 512, 4, 4] float32
b4.fc 4194816 - [4, 512] float32
b4.out 513 - [4, 1] float32
--- --- --- --- ---
Total 29012513 544 - -
Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Exporting sample images...
Initializing logs...
Training for 25000 kimg...
tick 0 kimg 0.0 time 1m 31s sec/tick 14.4 sec/kimg 3595.75 maintenance 76.8 cpumem 4.82 gpumem 11.32 augment 0.000
Evaluating metrics...
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:474: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
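For context, the warning above is about the `num_workers` value in `data_loader_kwargs`; it maps to the standard PyTorch `DataLoader` argument, roughly like this (the dataset here is a dummy placeholder, not the StyleGAN2 code):

import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(64, 3, 64, 64))     # dummy stand-in for ImageFolderDataset

# Free Colab instances usually expose only 2 CPU cores, which is where the
# "suggested max number of worker ... is 2" warning comes from. Keeping
# num_workers <= 2 avoids oversubscribing the CPU (prefetch_factor requires num_workers > 0).
loader = DataLoader(data, batch_size=4, num_workers=2,
                    pin_memory=True, prefetch_factor=2)

for (batch,) in loader:
    pass                                             # iterate once to exercise the workers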

Output of vgg16 layer doesn't make sense

I have a vgg16 network without the last max pooling, fully connected and softmax layers. The network summary says that the last layer's output is going to have a size of (batchsize, 512, 14, 14). Putting an image into the network gives me an output of (batchsize, 512, 15, 15). How do I fix this?
import torch
import torch.nn as nn
from torchsummary import summary
vgg16 = torch.hub.load('pytorch/vision:v0.10.0', 'vgg16', pretrained=True)
vgg16withoutLastFewLayers = nn.Sequential(*list(vgg16.children())[:-2][0][0:30]).cuda()
image = torch.zeros((1,3,244,244)).cuda()
output = vgg16withoutLastFewLayers(image)
summary(vgg16withoutLastFewLayers, (3,224,224))
print(output.shape)
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590,080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1,180,160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2,359,808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2,359,808
ReLU-30 [-1, 512, 14, 14] 0
================================================================
torch.Size([1, 512, 15, 15])
The output shape should be [512, 14, 14], assuming that the input image is [3, 224, 224]. Your input image size is [3, 244, 244]. For example,
image = torch.zeros((1,3,224,224))
output = vgg16withoutLastFewLayers(image)
# output.shape -> torch.Size([1, 512, 14, 14])
Therefore, increasing the input image size also increases the spatial size [W, H] of the output tensor.
Your two inputs are not the same size:
image = torch.zeros((1,3,244,244)).cuda()
output = vgg16withoutLastFewLayers(image)
summary(vgg16withoutLastFewLayers, (3,224,224))
print(output.shape)
Difference: 244 vs 224.
Because those VGG layers are only convolutional layers, increasing the size of the input image also increases the size of the output. This would cause issues if a classification head (with no global pooling, etc.) were applied directly on top, since such heads expect fixed-size inputs. You're not doing that here, but it's something to keep in mind.
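To make the arithmetic concrete, here is a small sketch tracing the spatial size through the four MaxPool2d stages kept in the truncated VGG16 (the 3x3 convolutions are 'same'-padded, so only the pools change the size):

def vgg_spatial_size(side, num_pools=4):
    # each 2x2 max-pool halves the side length (floor division)
    for _ in range(num_pools):
        side //= 2
    return side

print(vgg_spatial_size(224))  # 14 -> matches the torchsummary output
print(vgg_spatial_size(244))  # 15 -> matches the shape actually observed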

InvalidArgumentError: Incompatible shapes: [15,3] vs. [100,3]

I have a dataset with more than 4000 images and 3 classes. I'm reusing code for a capsule neural network that was written for 10 classes, modified for 3 classes. When I run the model, the following error occurs at the last step of the first epoch (44/45):
Epoch 1/16
44/45 [============================>.] - ETA: 28s - loss: 0.2304 - capsnet_loss: 0.2303 - decoder_loss: 0.2104 - capsnet_accuracy: 0.6598 - decoder_accuracy: 0.5781
InvalidArgumentError: Incompatible shapes: [15,3] vs. [100,3]
[[node gradient_tape/margin_loss/mul/Mul (defined at <ipython-input-22-9d913bd0e1fd>:11) ]] [Op:__inference_train_function_6157]
Function call stack:
train_function
Training code:
m = 100
epochs = 16
# Using EarlyStopping: stop training when val_capsnet_accuracy has not improved for 2 consecutive epochs
early_stopping = keras.callbacks.EarlyStopping(monitor='val_capsnet_accuracy', mode='max',
                                               patience=2, restore_best_weights=True)
# Using ReduceLROnPlateau: halve the learning rate when val_capsnet_accuracy has not improved for 4 consecutive epochs
lr_scheduler = keras.callbacks.ReduceLROnPlateau(monitor='val_capsnet_accuracy', mode='max', factor=0.5, patience=4)
train_model.compile(optimizer=keras.optimizers.Adam(lr=0.001),loss=[margin_loss,'mse'],loss_weights = [1. ,0.0005],metrics=['accuracy'])
train_model.fit([x_train, y_train],[y_train,x_train], batch_size = m, epochs = epochs, validation_data = ([x_test, y_test],[y_test,x_test]),callbacks=[early_stopping,lr_scheduler])
The model is:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(100, 28, 28, 1)] 0
__________________________________________________________________________________________________
conv2d (Conv2D) (100, 27, 27, 256) 1280 input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (100, 27, 27, 256) 0 conv2d[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (100, 19, 19, 128) 2654336 max_pooling2d[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (100, 6, 6, 128) 1327232 conv2d_1[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (100, 576, 8) 0 conv2d_2[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (100, 576, 8) 0 reshape[0][0]
__________________________________________________________________________________________________
digitcaps (CapsuleLayer) (100, 3, 16) 221184 lambda[0][0]
__________________________________________________________________________________________________
input_2 (InputLayer) [(None, 3)] 0
__________________________________________________________________________________________________
mask (Mask) (100, 48) 0 digitcaps[0][0]
input_2[0][0]
__________________________________________________________________________________________________
capsnet (Length) (100, 3) 0 digitcaps[0][0]
__________________________________________________________________________________________________
decoder (Sequential) (None, 28, 28, 1) 1354000 mask[0][0]
==================================================================================================
Total params: 5,558,032
Trainable params: 5,558,032
Non-trainable params: 0
Input layer, convolutional layers, and primary capsule:
img_shape=(28,28,1)
inp=L.Input(img_shape,100)
# Adding the first conv1 layer
conv1=L.Conv2D(filters=256,kernel_size=(2,2),activation='relu',padding='valid')(inp)
# Adding Maxpooling layer
maxpool1=L.MaxPooling2D(pool_size=(1,1))(conv1)
# Adding second convolutional layer
conv2=L.Conv2D(filters=128,kernel_size=(9,9),activation='relu',padding='valid')(maxpool1)
# Adding primary cap layer
conv2=L.Conv2D(filters=8*16,kernel_size=(9,9),strides=2,padding='valid',activation=None)(conv2)
# Adding the squash activation
reshape2=L.Reshape([-1,8])(conv2)
squashed_output=L.Lambda(squash)(reshape2)
code source
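For completeness, `squash` above is the standard capsule squashing nonlinearity, $v = \frac{\lVert s\rVert^2}{1+\lVert s\rVert^2}\frac{s}{\lVert s\rVert}$; a typical Keras-backend sketch (not necessarily the exact function from the linked source):

from tensorflow.keras import backend as K

def squash(vectors, axis=-1):
    # scale each vector so its norm lies in [0, 1) while preserving its direction
    s_squared_norm = K.sum(K.square(vectors), axis, keepdims=True)
    scale = s_squared_norm / (1 + s_squared_norm) / K.sqrt(s_squared_norm + K.epsilon())
    return scale * vectors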
x_train.shape --> (4415, 28, 28, 1)
y_train.shape --> (4415, 3)
x_test.shape --> (1104, 28, 28, 1)
y_test.shape --> (1104, 3)
My code here
Try to make the training set size a perfect multiple of the batch size; the error suggests the last batch contains only the 15 left-over samples (4415 % 100 = 15), which produces the [15,3] vs [100,3] mismatch.
For example, trim the data so its length is a multiple of 100.
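A minimal sketch of that suggestion, using placeholder arrays with the shapes listed in the question:

import numpy as np

# placeholder arrays with the shapes from the question
x_train, y_train = np.zeros((4415, 28, 28, 1)), np.zeros((4415, 3))

m = 100                                   # batch size baked into the model's (100, ...) input shape
n = (len(x_train) // m) * m               # 4415 -> 4400, dropping the 15-sample remainder
x_train, y_train = x_train[:n], y_train[:n]
print(x_train.shape)                      # (4400, 28, 28, 1)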

Depth Estimation using Keras

I'm trying to design a convolutional net to estimate the depth of images using Keras.
I have RGB input images with the shape 3x120x160 and grayscale output depth maps with the shape 1x120x160.
I tried using a VGG-like architecture where the depth of each layer grows, but when it comes to designing the final layers I get stuck. Using a Dense layer is too expensive, and I tried Upsampling, which proved inefficient.
I want to use Deconvolution2D but I can't get it to work; the only architecture I end up with is something like this:
model = Sequential()
model.add(Convolution2D(64, 5, 5, activation='relu', input_shape=(3, 120, 160)))
model.add(Convolution2D(64, 5, 5, activation='relu'))
model.add(MaxPooling2D())
model.add(Dropout(0.5))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D())
model.add(Dropout(0.5))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(Dropout(0.5))
model.add(ZeroPadding2D())
model.add(Deconvolution2D(512, 3, 3, (None, 512, 41, 61), subsample=(2, 2), activation='relu'))
model.add(Deconvolution2D(512, 3, 3, (None, 512, 123, 183), subsample=(3, 3), activation='relu'))
model.add(cropping.Cropping2D(cropping=((1, 2), (11, 12))))
model.add(Convolution2D(1, 1, 1, activation='sigmoid', border_mode='same'))
The Model summary is like this :
Layer (type) Output Shape Param # Connected to
====================================================================================================
convolution2d_1 (Convolution2D) (None, 64, 116, 156) 4864 convolution2d_input_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D) (None, 64, 112, 152) 102464 convolution2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D) (None, 64, 56, 76) 0 convolution2d_2[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 64, 56, 76) 0 maxpooling2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D) (None, 128, 54, 74) 73856 dropout_1[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D) (None, 128, 52, 72) 147584 convolution2d_3[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D) (None, 128, 26, 36) 0 convolution2d_4[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 128, 26, 36) 0 maxpooling2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D) (None, 256, 24, 34) 295168 dropout_2[0][0]
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D) (None, 256, 22, 32) 590080 convolution2d_5[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (None, 256, 22, 32) 0 convolution2d_6[0][0]
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D) (None, 512, 20, 30) 1180160 dropout_3[0][0]
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D) (None, 512, 18, 28) 2359808 convolution2d_7[0][0]
____________________________________________________________________________________________________
dropout_4 (Dropout) (None, 512, 18, 28) 0 convolution2d_8[0][0]
____________________________________________________________________________________________________
zeropadding2d_1 (ZeroPadding2D) (None, 512, 20, 30) 0 dropout_4[0][0]
____________________________________________________________________________________________________
deconvolution2d_1 (Deconvolution2(None, 512, 41, 61) 2359808 zeropadding2d_1[0][0]
____________________________________________________________________________________________________
deconvolution2d_2 (Deconvolution2(None, 512, 123, 183) 2359808 deconvolution2d_1[0][0]
____________________________________________________________________________________________________
cropping2d_1 (Cropping2D) (None, 512, 120, 160) 0 deconvolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_9 (Convolution2D) (None, 1, 120, 160) 513 cropping2d_1[0][0]
====================================================================================================
Total params: 9474113
I couldn't reduce the number of filters in the Deconvolution2D layers below 512, as doing so results in shape-related errors; it seems I have to keep as many filters as the previous layer has.
I also had to add a final Convolution2D layer to be able to run the network.
The above architecture learns, but really slowly and (I think) inefficiently. I'm sure I'm doing something wrong and the design shouldn't be like this. Can you help me design a better network?
I also tried to build a network like the one mentioned in this repository, but it seems Keras doesn't work the way this Lasagne example does. I'd really appreciate it if someone could show me how to design something like that network in Keras. Its architecture is like this:
Thanks
I'd suggest a U-Net (see figure 1). In the first half of a U-Net, the spatial resolution is reduced as the number of channels increases (like VGG, as you mentioned). In the second half, the opposite happens: the number of channels is reduced while the resolution increases. "Skip" connections between corresponding layers allow the network to efficiently produce high-resolution output.
You should be able to find an appropriate Keras implementation (maybe this one).
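As a starting point, here is a minimal U-Net-style sketch in Keras (assuming TensorFlow 2.x with channels-last data, i.e. 120x160x3 inputs and 120x160x1 depth maps; the layer widths are arbitrary and only meant to illustrate the encoder/decoder structure with skip connections):

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_unet(input_shape=(120, 160, 3)):
    inputs = layers.Input(shape=input_shape)

    # Encoder: resolution halves while the number of channels grows
    c1 = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    c1 = layers.Conv2D(32, 3, padding='same', activation='relu')(c1)
    p1 = layers.MaxPooling2D()(c1)                                     # 60 x 80

    c2 = layers.Conv2D(64, 3, padding='same', activation='relu')(p1)
    c2 = layers.Conv2D(64, 3, padding='same', activation='relu')(c2)
    p2 = layers.MaxPooling2D()(c2)                                     # 30 x 40

    # Bottleneck
    b = layers.Conv2D(128, 3, padding='same', activation='relu')(p2)

    # Decoder: resolution doubles, skip connections restore spatial detail
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(b)   # 60 x 80
    u2 = layers.Concatenate()([u2, c2])
    u2 = layers.Conv2D(64, 3, padding='same', activation='relu')(u2)

    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(u2)  # 120 x 160
    u1 = layers.Concatenate()([u1, c1])
    u1 = layers.Conv2D(32, 3, padding='same', activation='relu')(u1)

    # One-channel depth map; sigmoid assumes depth is normalised to [0, 1]
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(u1)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer='adam', loss='mse')
model.summary()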

Tensorflow conv2d_transpose size error "Number of rows of out_backprop doesn't match computed"

I am creating a convolutional autoencoder in TensorFlow. I got this exact error:
tensorflow.python.framework.errors.InvalidArgumentError: Conv2DBackpropInput: Number of rows of out_backprop doesn't match computed: actual = 8, computed = 12
[[Node: conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](conv2d_transpose/output_shape, Variable_1/read, MaxPool_1)]]
Relevant code:
l1d = tf.nn.relu(tf.nn.conv2d_transpose(l1da, w2, [10, 12, 12, 32], strides=[1, 1, 1, 1], padding='SAME'))
where
w2 = tf.Variable(tf.random_normal([5, 5, 32, 64], stddev=0.01))
I checked the shape of the input to conv2d_transpose, i.e. l1da, and it is correct (10x8x8x64). The batch size is 10, the input to this layer has the form 8x8x64, and the output is supposed to be 12x12x32.
What am I missing?
Found the error. Padding should be "Valid", not "Same".
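That fix makes the shapes consistent: with stride 1, 'SAME' padding keeps the spatial size (8 stays 8), while 'VALID' gives (H - 1) * stride + kernel = (8 - 1) * 1 + 5 = 12, matching the requested output_shape. A small sketch of the corrected call (written against the TF 2.x tf.nn API for illustration; the original code is TF 1.x):

import tensorflow as tf

x = tf.zeros([10, 8, 8, 64])        # batch of 8x8x64 feature maps
w2 = tf.zeros([5, 5, 32, 64])       # filter: [height, width, out_channels, in_channels]

l1d = tf.nn.relu(tf.nn.conv2d_transpose(x, w2, output_shape=[10, 12, 12, 32],
                                        strides=[1, 1, 1, 1], padding='VALID'))
print(l1d.shape)                    # (10, 12, 12, 32)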
