Relation between kernel size and input size in CNN - machine-learning

I have a Conv1D layer in Keras with a kernel size of 3 and a stride of 1.
I get the following error when I try to handle an input of size 5, but everything works with an input of size 6.
InvalidArgumentError (see above for traceback): Computed output size would be negative:
-1 [input_size: 0, effective_filter_size: 3, stride: 1]
I thought that a kernel of size 3 only needs an input of size at least 3.
EDIT: Here is the model. The input size is variable; the problem occurs with an input of size 5.
model = Sequential()
model.add(Conv1D(
    input_shape=(None, 4),
    filters=64,
    kernel_size=3,
    activation='relu'))
model.add(Conv1D(
    filters=32,
    kernel_size=3,
    activation='relu'))
model.add(Conv1D(
    filters=16,
    kernel_size=2,
    activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(number_of_classes))
model.add(Softmax(axis=-1))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

With the default padding='valid', each Conv1D layer shrinks the sequence length by kernel_size - 1, so your stack of kernels 3, 3 and 2 removes 2 + 2 + 1 = 5 time steps and therefore needs an input of length at least 6; this is exactly why length 6 works and length 5 fails. To keep the size of your output feature maps the same as your input feature maps, pad the input using 'same' padding.
model.add(Conv1D(
    input_shape=(None, 4),
    filters=64,
    kernel_size=3,
    activation='relu',
    padding='same'))
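For reference, here is a minimal sketch of the full model with 'same' padding on every Conv1D layer, so that a length-5 input is no longer shrunk away (the imports and the placeholder number_of_classes are assumptions; use your own values):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D, Dense, Softmax

number_of_classes = 10  # placeholder value

model = Sequential()
# 'same' padding keeps the temporal length unchanged in every convolution
model.add(Conv1D(input_shape=(None, 4), filters=64, kernel_size=3,
                 activation='relu', padding='same'))
model.add(Conv1D(filters=32, kernel_size=3, activation='relu', padding='same'))
model.add(Conv1D(filters=16, kernel_size=2, activation='relu', padding='same'))
model.add(GlobalMaxPooling1D())
model.add(Dense(number_of_classes))
model.add(Softmax(axis=-1))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])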

Related

How to reverse max pooling layer in autoencoder to return the original shape in decoder?

I am building an autoencoder to compress images. My input is the MNIST dataset, which contains (28, 28, 1) images, and I want my latent space (the encoded image) to have the shape (10, 10, 1) for a high compression ratio. In the encoder part I don't have any problem, but in the decoder part I can't return the image to the original shape (28, 28, 1).
My code:
#Encoder
input_img = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((3, 3), padding='same')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(x)
encoded = layers.Conv2D(1, (3, 3), activation='relu', padding='same')(x)
Encoded shape: (10, 10, 1)
#Decoder
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(encoded)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2), interpolation="bilinear")(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
decoded = layers.Conv2D(1, (3, 3), activation='relu', padding='same')(x)
Decoded shape: (20, 20, 1)
How can I return the image to the original shape?
There are multiple ways to upscale a 2D tensor, or alternatively, to project a smaller vector into a larger one.
Here's a non-exhaustive list:
Apply one or a couple of upsampling layers followed by a flatten layer, followed by a linear layer. Upsampling basically applies standard image upscaling algorithms to increase the size of your image. Then flatten it so that a linear layer can be applied and you can reach the precise shape you require.
Skip the upscaling altogether and just apply a flatten followed by a projection layer. For MNIST this will suffice. For more complex datasets, you want to use the previous suggestion, interspersed with convolutional blocks, to help improve your model's capacity and reconstruction ability.
I can see that you have already attempted the UpSampling + Conv direction. What you want to do next is apply a flatten layer, followed by a projection layer with 784 (= 28*28*1) output units, before reshaping into (batch, 28, 28, 1) again to get what you need.
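A minimal sketch of that last suggestion, reusing input_img, encoded and the keras/layers imports from the snippets above (the extra conv/upsampling block, the sigmoid output and the binary cross-entropy loss are typical choices for MNIST, not requirements):

# Decoder: conv + upsampling blocks, then flatten and project to exactly 28*28*1 units
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2), interpolation="bilinear")(x)  # (10, 10) -> (20, 20)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.Flatten()(x)
x = layers.Dense(28 * 28 * 1, activation='sigmoid')(x)        # project to 784 units
decoded = layers.Reshape((28, 28, 1))(x)                      # back to the original shape

autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')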

How to train a neural network with an array as a label, i.e. [1,0], in TensorFlow 2.0

I am training an image classifier in which I take an image and convert it into an array of shape 50*50*1, and the label of that image is [0,1]. It is a horse-or-human classifier, i.e. for horses the label is [0,1] and for humans it is [1,0]. I tried using this CNN but failed.
model_new = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(50, 50, 1)),
    MaxPooling2D(),
    Dropout(0.2),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Dropout(0.2),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1)
])
model_new.compile(optimizer='adam',
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  metrics=['accuracy'])
model_new.fit(X, Y, epochs=10, validation_data=(test_x, test_y))
X is an array of images of shape (50, 50, 1) and Y holds labels like [1, 0]. Is there anything wrong with the code? If so, how should I change it? Thanks.
Change Dense(1) to Dense(2). Since your labels are one-hot vectors like [1,0], use tf.keras.losses.CategoricalCrossentropy(from_logits=True); alternatively, convert the labels to integer class indices and use tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True).
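A minimal sketch of those two changes applied to the model above, keeping the one-hot labels as they are (the imports are assumptions matching the snippet):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model_new = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(50, 50, 1)),
    MaxPooling2D(),
    Dropout(0.2),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Dropout(0.2),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(2)  # two logits, one per class (horse, human)
])
# CategoricalCrossentropy matches one-hot labels such as [1, 0]
model_new.compile(optimizer='adam',
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])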

Use pretrained model with different input shape and class model

I am working on a classification problem using a CNN where my input image size is 64x64, and I want to use a pretrained model such as VGG16, COCO or any other. But the problem is that the input image size of the pretrained model is 224x224. How do I solve this issue? Is there any data augmentation approach for the input image size?
If I resize my input image to 224x224 then there is a very high chance that the image will get blurred, and that may impact the training. Please correct me if I am wrong.
Another question is related to the pretrained model. If I am using transfer learning, generally how many layers do I have to freeze from the pretrained model, considering that my classes are very different from the pretrained model's classes? I guess we can freeze the first few layers to capture the edges, curves, etc. of the images, which are common to all images.
But the problem is that the input image size of the pretrained model is 224x224.
I assume you work with Keras/TensorFlow (it's the same for other DL frameworks). According to the Keras Applications docs:
input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3, 224, 224) (with 'channels_first' data format)). It should have exactly 3 input channels, and width and height should be no smaller than 48. E.g. (200, 200, 3) would be one valid value.
So there are two options to solve your issue:
Resize your input image to 224*224 with an existing library and use the VGG classifier [include_top=True].
Train your own classifier on top of the VGG base. As mentioned in the Keras documentation above, if your image size is different from 224*224, you should train your own classifier [include_top=False]. You can do this easily with:
from tensorflow import keras
from tensorflow.keras.applications import VGG19

inp = keras.layers.Input(shape=(64, 64, 3), name='image_input')
vgg_model = VGG19(weights='imagenet', include_top=False)
vgg_model.trainable = False
x = vgg_model(inp)  # run the frozen VGG base on the 64x64 input
x = keras.layers.Flatten(name='flatten')(x)
x = keras.layers.Dense(512, activation='relu', name='fc1')(x)
x = keras.layers.Dense(512, activation='relu', name='fc2')(x)
x = keras.layers.Dense(10, activation='softmax', name='predictions')(x)
new_model = keras.models.Model(inputs=inp, outputs=x)
new_model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
If I am using transfer learning then generally how many layers do I have to freeze from the pretrained model
It really depends on your new task, how many training examples you have, what your pretrained model is, and lots of other things. If I were you, I would first throw away only the pretrained model's classifier. Then, if that does not work, remove some further convolution layers, step by step, until you get good performance.
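A minimal sketch of that step-by-step idea, assuming the vgg_model base and new_model from the snippet above (unfreezing the last 4 layers and the learning rate are only illustrative choices):

# Start from a fully frozen base, then progressively unfreeze the top convolution layers
vgg_model.trainable = True
for layer in vgg_model.layers[:-4]:  # keep everything except the last 4 layers frozen
    layer.trainable = False

# Re-compile after changing trainability; a small learning rate helps when fine-tuning
new_model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])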
The following code works for me for image size 128*128*3:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Flatten

vgg_model = VGG16(include_top=False, weights='imagenet')
print(vgg_model.summary())

# Get the config dictionary for VGG16 and change its input shape
vgg_config = vgg_model.get_config()
vgg_config["layers"][0]["config"]["batch_input_shape"] = (None, 128, 128, 3)
vgg_updated = Model.from_config(vgg_config)
vgg_updated.trainable = False

model = Sequential()
# Add the VGG convolutional base model
model.add(vgg_updated)
# A Flatten layer must be added before any Dense classification head
model.add(Flatten())

vgg_updated.summary()
model.summary()
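One caveat: Model.from_config rebuilds the architecture with freshly initialized weights, so to actually keep the pretrained ImageNet weights you most likely also want to copy them across, for example:

# Conv kernel shapes do not depend on the spatial input size,
# so the original VGG16 weights fit the rebuilt 128x128 model directly
vgg_updated.set_weights(vgg_model.get_weights())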

Masking layer with ConvLSTM2D Keras

I am trying to use the Masking layer with ConvLSTM2D in Keras. However, I keep getting the error:
ValueError: Shape must be rank 4 but is rank 2 for 'conv_lst_m2d_1/while/Tile' (op: 'Tile') with input shapes: [?,128,128,1], [2].
My input shape is (None, 128, 128, 1)

Mistake again? Verifying ZFNet layers' input-output dimensions

As mentioned in one of the cs231n lectures, there were some calculation errors in the AlexNet architecture: the initial size of the image has to be 227x227 instead of the 224x224 mentioned in the paper. I wanted to know whether there is a similar problem in the ZFNet paper as well.
In the given figure (from the ZFNet paper) the initial size of the image is again 224x224, so if we use a 2D convolution layer with 96 filters of size (7x7) and stride (2,2), the size of the result should be (224-7)/2 + 1 = 109.5; but if we take the initial image size to be 225x225, we get exactly 110. Moreover, I feel there is a similar problem after the first layer: the input to the max-pool layer is 110x110x96 and the pooling size is (3x3) with stride 2, so the size of the output should be (110-3)/2 + 1 = 54.5, which is again not an integer. Am I doing the calculations right, or is there a problem with the values given in the paper?
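For reference, a small helper (not from the paper, just the standard output-size formula that frameworks apply with flooring) reproducing the arithmetic above:

def conv_out(n, kernel, stride, padding=0):
    """Output size of a convolution or pooling along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1  # the division is floored

print(conv_out(224, 7, 2))  # 109 (224 gives the non-integer 109.5 before flooring)
print(conv_out(225, 7, 2))  # 110 (225 makes the division come out exactly)
print(conv_out(110, 3, 2))  # 54  (the 3x3/2 max-pool on 110x110; 54.5 floored)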
The PyTorch built-in implementation suggests that you need to use padding:
# ZFNet convolution layers: nn.Conv2d(in_channels, out_channels, kernel_size, ...)
self.conv1 = nn.Conv2d(3, 96, 7, stride=2, padding=2)
self.conv2 = nn.Conv2d(96, 256, 5, padding=2)
self.conv3 = nn.Conv2d(256, 384, 3, padding=1)
self.conv4 = nn.Conv2d(384, 384, 3, padding=1)
self.conv5 = nn.Conv2d(384, 256, 3, padding=1)
Hi, you are using the ZFNet architecture diagram. ZFNet is similar to AlexNet but has a smaller filter size of 7x7 with a stride of 2. There is no calculation error; the values are just rounded up.
