Keras convolutional neural network - output shape - machine-learning

Please forgive my ignorance as I am really new to the area. I am trying to get the correct output shape from my neural network, which has 3 Conv2D layers followed by 2 Dense layers. My input shape is (140, 140, 4), i.e. 4 stacked grayscale images. When I feed in 1 input, I expect an output of shape (1, 4), but I am getting (14, 14, 4). What am I doing wrong here? Thank you very much for your help in advance!
meta_layers = [Conv2D, Conv2D, Conv2D, Dense, Dense]
meta_inits = ['lecun_uniform'] * 5
meta_nodes = [32, 64, 64, 512, 4]
meta_filters = [(8,8), (4,4), (3,3), None, None]
meta_strides = [(4,4), (2,2), (1,1), None, None]
meta_activations = ['relu'] * 5
meta_loss = "mean_squared_error"
meta_optimizer=RMSprop(lr=0.00025, rho=0.9, epsilon=1e-06)
meta_n_samples = 1000
meta_epsilon = 1.0
meta = Sequential()
meta.add(self.meta_layers[0](self.meta_nodes[0], init=self.meta_inits[0], input_shape=(140, 140, 4), kernel_size=self.meta_filters[0], strides=self.meta_strides[0]))
meta.add(Activation(self.meta_activations[0]))
for layer, init, node, activation, kernel, stride in list(zip(self.meta_layers, self.meta_inits, self.meta_nodes, self.meta_activations, self.meta_filters, self.meta_strides))[1:]:
    if layer == Conv2D:
        meta.add(layer(node, init=init, kernel_size=kernel, strides=stride))
        meta.add(Activation(activation))
    elif layer == Dense:
        meta.add(layer(node, init=init))
        meta.add(Activation(activation))
    print("meta node: " + str(node))
meta.compile(loss=self.meta_loss, optimizer=self.meta_optimizer)

Your problem lies in the fact that in Keras version >= 2.0, a Dense layer is applied along the last axis of its input (you may read about it here). So if you apply:
Dense(512)
to the output of a Conv2D layer with shape (14, 14, 64), you'll get an output with shape (14, 14, 512), and Dense(4) applied to that will give you an output with shape (14, 14, 4). You can call the model.summary() method to confirm this.
In order to solve this, you need to apply one of the following layers to the output of the last convolutional layer: GlobalMaxPooling2D, GlobalAveragePooling2D or Flatten. Each of them squashes the output to only 2 dimensions (with shape (batch_size, features)).
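To see where (14, 14, 4) comes from, the shapes can be traced by hand. The sketch below is plain Python rather than Keras. It assumes the default 'valid' padding, for which each spatial size becomes floor((n - kernel) / stride) + 1, and the helper names conv_out, dense_out and flatten_out are made up for illustration.

```python
def conv_out(shape, filters, kernel, stride):
    """Output shape of a 'valid' Conv2D on a (height, width, channels) input."""
    h, w, _ = shape
    return ((h - kernel) // stride + 1, (w - kernel) // stride + 1, filters)

def dense_out(shape, units):
    """Dense only replaces the last axis; leading axes pass through."""
    return shape[:-1] + (units,)

def flatten_out(shape):
    """Flatten collapses all non-batch axes into one."""
    n = 1
    for dim in shape:
        n *= dim
    return (n,)

shape = (140, 140, 4)
for filters, kernel, stride in [(32, 8, 4), (64, 4, 2), (64, 3, 1)]:
    shape = conv_out(shape, filters, kernel, stride)
conv_shape = shape

# Dense layers stacked directly on the convolutional output
without_flatten = dense_out(dense_out(conv_shape, 512), 4)
# Same Dense layers, but with a Flatten in between
with_flatten = dense_out(dense_out(flatten_out(conv_shape), 512), 4)
print(conv_shape, without_flatten, with_flatten)
```

With the batch axis added, the last result corresponds to the (1, 4) the question expects.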

Related

Pytorch Unfold and Fold: How do I put this image tensor back together again?

I am trying to filter a single channel 2D image of size 256x256 using unfold to create 16x16 blocks with an overlap of 8. This is shown below:
# I = [256, 256] image
kernel_size = 16
stride = kernel_size // 2  # half-block overlap: 8
patches = I.unfold(1, kernel_size, stride).unfold(0, kernel_size, stride)  # size = [31, 31, 16, 16]
I have started to attempt to put the image back together with fold but I’m not quite there yet. I’ve tried to use view to get the image to ‘fit’ the way it’s supposed to but I don’t see how this would preserve the original image. Perhaps I’m overthinking this.
# patches.shape = [31, 31, 16, 16]
patches = filt_data_block.contiguous().view(-1, kernel_size*kernel_size) # [961, 256]
patches = patches.permute(1, 0) # size = [256, 961]
Any help would be greatly appreciated. Thanks very much.
I believe you will benefit from using torch.nn.functional.fold and torch.nn.functional.unfold in this case, as these functions are built specifically for images (or any 4D tensors, that is with shape B X C X H X W).
Let's start with unfolding the image:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
from sklearn.datasets import load_sample_image #Used to load a sample image
dtype = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
#Load a flower image from sklearn.datasets, crop it to shape 1 X 3 X 256 X 256:
I = torch.from_numpy(load_sample_image('flower.jpg')).permute(2,0,1).unsqueeze(0).type(dtype)[...,128:128+256,256:256+256]
kernel_size = 16
stride = kernel_size//2
I_unf = F.unfold(I, kernel_size, stride=stride)
Here we obtain all the 16x16 image patches with a stride of 8 by using the F.unfold function. This results in a 3D tensor with shape torch.Size([1, 768, 961]), i.e. 961 patches with 768 = 16 X 16 X 3 values in each.
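The numbers in that shape can be checked by hand. This is a small arithmetic sketch, assuming no padding and a dilation of 1 (the defaults of F.unfold); num_patches is a hypothetical helper name.

```python
def num_patches(size, kernel_size, stride):
    """Number of sliding-window positions along one axis (no padding, dilation 1)."""
    return (size - kernel_size) // stride + 1

channels, height, width = 3, 256, 256
kernel_size, stride = 16, 8

per_axis = num_patches(height, kernel_size, stride)       # positions along each axis
total = per_axis * num_patches(width, kernel_size, stride)
values_per_patch = channels * kernel_size * kernel_size
print(per_axis, total, values_per_patch)                  # 31 961 768
```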
Now, say we wish to fold it back to I:
I_f = F.fold(I_unf,I.shape[-2:],kernel_size,stride=stride)
norm_map = F.fold(F.unfold(torch.ones(I.shape).type(dtype),kernel_size,stride=stride),I.shape[-2:],kernel_size,stride=stride)
I_f /= norm_map
We use F.fold where we tell it the original shape of I, the kernel_size we used to unfold and the stride used. After folding I_unf we will obtain a summation with overlaps. This means that the resulting image will appear saturated. As a result, we need to compute a normalization map which will normalize multiple summation of pixels due to overlaps. A way to do this efficiently is to take a ones tensor and use unfold followed by fold - to mimic the summation with overlaps. This gives us the normalization map by which we normalize I_f to recover I.
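The same idea can be seen in one dimension without PyTorch. The following is a pure-Python sketch of summation with overlaps followed by division by a coverage count; unfold_1d and fold_1d are illustrative stand-ins, not the torch functions.

```python
def unfold_1d(signal, kernel_size, stride):
    """Cut a list into overlapping windows."""
    last_start = len(signal) - kernel_size
    return [signal[s:s + kernel_size] for s in range(0, last_start + 1, stride)]

def fold_1d(windows, length, stride):
    """Sum windows back into place; overlapping samples are added together."""
    out = [0.0] * length
    for i, window in enumerate(windows):
        for j, value in enumerate(window):
            out[i * stride + j] += value
    return out

signal = [float(v) for v in range(1, 17)]   # 16 samples
k, s = 4, 2

summed = fold_1d(unfold_1d(signal, k, s), len(signal), s)
# Folding a signal of ones counts how many windows cover each sample.
norm_map = fold_1d(unfold_1d([1.0] * len(signal), k, s), len(signal), s)
restored = [a / n for a, n in zip(summed, norm_map)]
print(norm_map)
print(restored == signal)
```

Interior samples are covered by two windows and the two samples at each end by one, so dividing by the count undoes the doubling.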
Now, we wish to plot I_f and I to prove content is preserved:
#Plot I:
plt.figure()
plt.imshow(I[0,...].permute(1,2,0).cpu()/255)
#Plot I_f:
plt.figure()
plt.imshow(I_f[0,...].permute(1,2,0).cpu()/255)
plt.show()
This whole process also works for single-channel images. One thing to note is that if (size - kernel_size) is not divisible by the stride along a spatial dimension, norm_map will contain zeros at the edges, because some pixels are not reached by any patch. This case is easy to handle as well, for example by padding or cropping the image first.
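Which pixels are left uncovered can be worked out from the sizes alone. A minimal sketch in plain Python, assuming no padding; uncovered_tail and pad_needed are hypothetical helpers, not part of PyTorch.

```python
def uncovered_tail(size, kernel_size, stride):
    """How many trailing pixels along one axis no window reaches (no padding)."""
    n_windows = (size - kernel_size) // stride + 1
    covered = (n_windows - 1) * stride + kernel_size
    return size - covered

def pad_needed(size, kernel_size, stride):
    """Extra pixels to append so that the windows tile the axis exactly."""
    return (-(size - kernel_size)) % stride

print(uncovered_tail(256, 16, 8), pad_needed(256, 16, 8))   # 0 0
print(uncovered_tail(260, 16, 8), pad_needed(260, 16, 8))   # 4 4
```

For a 260-pixel axis the last 4 pixels would get a zero in norm_map; padding to 264 (or cropping to 256) avoids the division by zero.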
A slightly less elegant solution than that proposed by Gil:
I took inspiration from this post on the Pytorch forums, formatting my image tensor to be of standard shape B x C x H x W (1 x 1 x 256 x 256). Unfolding:
# CREATE THE UNFOLDED IMAGE SLICES
I = image # shape [256, 256]
kernel_size = bx # block size, 16
stride = int(bx/2) # half-block overlap, 8
I2 = I.unsqueeze(0).unsqueeze(0) #shape [1, 1, 256, 256]
patches2 = I2.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
#shape [1, 1, 31, 31, 16, 16]
Following this, I do some transforms and filtering to my tensor stack. Before doing this I apply a cosine window and normalise:
# NORMALISE AND WINDOW
Pvv = torch.mean(torch.pow(win, 2))*torch.numel(win)*(noise_std**2)
Pvv = Pvv.double()
mean_patches = torch.mean(patches2, (4, 5), keepdim=True)
mean_patches = mean_patches.repeat(1, 1, 1, 1, 16, 16)
window_patches = win.unsqueeze(0).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(1, 1, 31, 31, 1, 1)
zero_mean = patches2 - mean_patches
windowed_patches = zero_mean * window_patches
#SOME FILTERING ....
#ADD MEAN AND WINDOW BEFORE FOLDING BACK TOGETHER.
filt_data_block = (filt_data_block + mean_patches*window_patches) * window_patches
The above code works for me, but a mask would be simpler. Next, I prepare my tensor of shape [1, 1, 31, 31, 16, 16] to be transformed back into the original [1, 1, 256, 256]:
# REASSEMBLE THE IMAGE USING FOLD
patches = filt_data_block.contiguous().view(1, 1, -1, kernel_size*kernel_size)
patches = patches.permute(0, 1, 3, 2)
patches = patches.contiguous().view(1, kernel_size*kernel_size, -1)
IR = F.fold(patches, output_size=(256, 256), kernel_size=kernel_size, stride=stride)
IR = IR.squeeze()
This allowed me to create an overlapping sliding window and seamlessly stitch the image back together. Cutting out the filtering makes for an identical image.

How to prepare input with different input sizes for neural network training (keras)?

I'm building a fully convolutional neural network that takes an image as input and outputs an image. I want my images to be of different sizes, and resizing or adding padding doesn't suit me.
As it was said here: Can Keras deal with input images with different size?, I can build such a model specifying input_shape = (1, None, None), but how should I prepare a dataset that I feed to my network?
I have this function for loading images for fixed image size:
def load_images(path):
    all_images = []
    for image_path in sorted(os.listdir(path)):
        img = imread(path + image_path, as_gray=True)
        all_images.append(img)
    return np.array(all_images).reshape(len(all_images), img_size, img_size, 1)
How should I change it so that 2 dimensions of the output numpy array are not fixed? np.reshape allows only one dimension to be unknown.
I'm not sure whether this will solve your problem entirely.
Here's my approach. Unfortunately, it depends on being able to create batches in which all images have the same (height, width); the height and width can change between batches, though. This is what image_gen() does.
Then you can directly create a dataset the following way and train your model.
import numpy as np
import tensorflow as tf
def image_gen():
    for _ in range(100):
        rand = np.random.choice([0,1,2])
        res = [(np.random.normal(size=(5, 256, 256, 3)), np.random.normal(size=(5, 256, 256, 3))),
               (np.random.normal(size=(5, 128, 128, 3)), np.random.normal(size=(5, 128, 128, 3))),
               (np.random.normal(size=(5, 64, 64, 3)), np.random.normal(size=(5, 64, 64, 3)))
               ]
        yield res[rand]
dataset = tf.data.Dataset.from_generator(image_gen, output_types=(tf.float32, tf.float32), output_shapes=([5, None, None, 3],[5, None, None, 3]))
it = dataset.make_initializable_iterator()
with tf.Session() as sess:
    sess.run(it.initializer)
    model.fit(dataset)
And the model
from tensorflow.keras import layers, models
inp = layers.Input(shape=(None, None, 3))
out = layers.Conv2D(32, (3,3), strides=(2,2), padding='same')(inp)
out = layers.Conv2D(64, (3,3), strides=(2,2), padding='same')(out)
out = layers.Conv2DTranspose(32, (3,3), strides=(2,2), padding='same')(out)
out = layers.Conv2DTranspose(3, (3,3), strides=(2,2), padding='same')(out)
model = models.Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
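The constraint mentioned in this answer (one height and width per batch, free to change between batches) can be met with real data by grouping images by size before batching. The sketch below is plain Python and independent of TensorFlow. It assumes each image is a nested list of rows, and batches_by_shape and blank are hypothetical helper names; with NumPy arrays, img.shape[:2] would serve as the grouping key.

```python
from collections import defaultdict

def batches_by_shape(images, batch_size):
    """Group images by (height, width) and yield batches of uniform size."""
    buckets = defaultdict(list)
    for img in images:
        key = (len(img), len(img[0]))          # (height, width)
        buckets[key].append(img)
    for key, group in buckets.items():
        for start in range(0, len(group), batch_size):
            yield key, group[start:start + batch_size]

def blank(height, width):
    """Toy grayscale image filled with zeros."""
    return [[0.0] * width for _ in range(height)]

images = [blank(64, 64), blank(128, 96), blank(64, 64), blank(64, 64), blank(128, 96)]
batches = list(batches_by_shape(images, batch_size=2))
for shape, batch in batches:
    print(shape, len(batch))
```

Each yielded batch can then be stacked into a single array, since all of its images share one shape.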

Why implementation of Resnet50 in Keras forbids images smaller than 32x32x3?

I am trying to understand why the implementation of ResNet50 in Keras forbids images smaller than 32x32x3.
Based on their implementation: https://github.com/keras-team/keras-applications/blob/master/keras_applications/resnet50.py
The function that catches that is _obtain_input_shape
To overcome this problem, I made my own implementation based on their code and I removed the code that forbids minimal size. In my implementation I also add the possibility to work with pre-trained model with more than three channels by replicating the RGB weights for the first conv1 layer.
def ResNet50(load_weights=True,
             input_shape=None,
             pooling=None,
             classes=1000):
    img_input = Input(shape=input_shape, name='tuned_input')
    x = ZeroPadding2D(padding=(3, 3), name='conv1_pad')(img_input)
    # Stage 1 (conv1_x)
    x = Conv2D(64, (7, 7),
               strides=(2, 2),
               padding='valid',
               kernel_initializer=KERNEL_INIT,
               name='tuned_conv1')(x)
    x = BatchNormalization(axis=CHANNEL_AXIS, name='bn_conv1')(x)
    x = Activation('relu')(x)
    x = ZeroPadding2D(padding=(1, 1), name='pool1_pad')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)
    # Stage 2 (conv2_x)
    x = _convolution_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    for block in ['b', 'c']:
        x = _identity_block(x, 3, [64, 64, 256], stage=2, block=block)
    # Stage 3 (conv3_x)
    x = _convolution_block(x, 3, [128, 128, 512], stage=3, block='a')
    for block in ['b', 'c', 'd']:
        x = _identity_block(x, 3, [128, 128, 512], stage=3, block=block)
    # Stage 4 (conv4_x)
    x = _convolution_block(x, 3, [256, 256, 1024], stage=4, block='a')
    for block in ['b', 'c', 'd', 'e', 'f']:
        x = _identity_block(x, 3, [256, 256, 1024], stage=4, block=block)
    # Stage 5 (conv5_x)
    x = _convolution_block(x, 3, [512, 512, 2048], stage=5, block='a')
    for block in ['b', 'c']:
        x = _identity_block(x, 3, [512, 512, 2048], stage=5, block=block)
    # Condition on the last layer
    if pooling == 'avg':
        x = layers.GlobalAveragePooling2D()(x)
    elif pooling == 'max':
        x = layers.GlobalMaxPooling2D()(x)
    inputs = img_input
    # Create model.
    model = models.Model(inputs, x, name='resnet50')
    if load_weights:
        weights_path = get_file(
            'resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5',
            WEIGHTS_PATH_NO_TOP,
            cache_subdir='models',
            md5_hash='a268eb855778b3df3c7506639542a6af')
        model.load_weights(weights_path, by_name=True)
        f = h5py.File(weights_path, 'r')
        d = f['conv1']
        # Used to work with more than 3 channels with pre-trained model
        if input_shape[2] % 3 == 0:
            # Integer division: the repeat count must be an int
            model.get_layer('tuned_conv1').set_weights(
                [d['conv1_W_1:0'][:].repeat(input_shape[2] // 3, axis=2),
                 d['conv1_b_1:0']])
        else:
            m = (3 * int(input_shape[2] / 3)) + 3
            model.get_layer('tuned_conv1').set_weights(
                [d['conv1_W_1:0'][:].repeat(m, axis=2)[:, :, 0:input_shape[2], :],
                 d['conv1_b_1:0']])
    return model
I ran my implementation on 10x10x3 images and it seems to work. Thus I do not understand why they set this lower bound.
They do not provide any information about this choice. I also checked the original paper and did not find any restriction on the minimal input shape. I suppose there is a reason for this bound, but I do not know it.
Thus I would like to know why this restriction was imposed in the ResNet implementation.
ResNet50 has 5 stages of downsampling, split between one max-pooling layer with a stride of 2 and strided convolutions with a stride of 2 px in each direction. This means that the minimum input size is 2^5 = 32: at that size the final feature map is reduced to a single 1x1 position.
There is not much point in using images smaller than 32x32, since some of the downsampling steps then have nothing left to reduce, and this changes the behavior of the network. For such small images it is better to use another network with less downsampling (like DenseNet) or with less depth.
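The effect of the five stride-2 steps can be traced with simple arithmetic. The sketch assumes each step maps a spatial size n to ceil(n / 2), which is what the padded 7x7 convolution and padded 3x3 max pooling in the question's code work out to; that _convolution_block downsamples the same way (a stride-2 1x1 convolution, as in keras-applications) is an assumption, since its body is not shown. spatial_sizes is a hypothetical helper.

```python
def spatial_sizes(size, n_steps=5):
    """Spatial size after each of n_steps stride-2 steps, each mapping n to ceil(n / 2)."""
    sizes = []
    for _ in range(n_steps):
        size = (size + 1) // 2
        sizes.append(size)
    return sizes

for side in (224, 32, 10):
    print(side, spatial_sizes(side))
```

For a 10x10 input the feature map is already 1x1 after the fourth step, so the last stage has nothing left to downsample, although the network still runs.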

How to decrease a 3D matrix to a 2D matrix using Keras?

I have built a Keras ConvLSTM neural network, and I want to predict one frame ahead based on a sequence of 10 time steps:
from keras.models import Sequential
from keras.layers.convolutional import Conv3D
from keras.layers.convolutional_recurrent import ConvLSTM2D
from keras.layers.normalization import BatchNormalization
import numpy as np
import pylab as plt
from keras import layers
# We create a layer which take as input movies of shape
# (n_frames, width, height, channels) and returns a movie
# of identical shape.
model = Sequential()
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
input_shape=(None, 64, 64, 1),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(Conv3D(filters=1, kernel_size=(3, 3, 3),
activation='sigmoid',
padding='same', data_format='channels_last'))
model.compile(loss='binary_crossentropy', optimizer='adadelta')
training:
data_train_x = data_4[0:20, 0:10, :, :, :]
data_train_y = data_4[0:20, 10:11, :, :, :]
model.fit(data_train_x, data_train_y, batch_size=10, epochs=1,
validation_split=0.05)
and I test the model:
test_x = np.reshape(data_test_x[2,:,:,:,:], [1,10,64,64,1])
next_frame = model.predict(test_x,batch_size=1, verbose=1, steps=None)
but the problem is that 'next_frame' shape is: (1, 10, 64, 64, 1) but I wanted it to be of shape (1, 1, 64, 64, 1)
And this is the results of 'model.summary()':
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D)  (None, None, 64, 64, 40)  59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 64, 64, 40)  160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D)  (None, None, 64, 64, 40)  115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 64, 64, 40)  160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D)  (None, None, 64, 64, 40)  115360
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 64, 64, 40)  160
_________________________________________________________________
conv_lst_m2d_4 (ConvLSTM2D)  (None, None, 64, 64, 40)  115360
_________________________________________________________________
batch_normalization_4 (Batch (None, None, 64, 64, 40)  160
_________________________________________________________________
conv3d_1 (Conv3D)            (None, None, 64, 64, 1)   1081
=================================================================
Total params: 407,001
Trainable params: 406,681
Non-trainable params: 320
So which layer should I add to reduce the output to 1 frame instead of 10 frames?
This is expected given the 3D convolution in the final layer. For example, a Conv2D with 1 filter applied to a 3-dimensional tensor (height, width, channels) with padding='same' produces a 2D output of the same height and width, because the filter implicitly spans the whole depth (channel) axis.
The same holds for a Conv3D applied to a 4-dimensional tensor: the filter implicitly spans the channel axis, so the result is a 3D tensor with the same (sequence index, height, width) as the input.
It sounds like what you want to do is add a pooling step of some kind after your Conv3D layer, such that it flattens across the sequence dimension, such as with AveragePooling3D with a pooling tuple of (10, 1, 1) to average across the first non-batch dimension (or modified according to your specific network needs).
Alternatively, suppose you want to "pool" along the sequence dimension by taking only the final sequence element (i.e. instead of averaging or max-pooling across the sequence). You could then set return_sequences=False on the final ConvLSTM2D layer and follow it with a 2D convolution in the final step, but this means your final convolution won't benefit from aggregating across a sequence of predicted frames. Whether this is a good idea is probably application-specific.
Just to confirm the first approach, I added:
model.add(layers.AveragePooling3D(pool_size=(10, 1, 1), padding='same'))
just after the Conv3D layer, and then made toy data:
x = np.random.rand(1, 10, 64, 64, 1)
and then:
In [22]: z = model.predict(x)
In [23]: z.shape
Out[23]: (1, 1, 64, 64, 1)
You would need to ensure the pooling size in the first non-batch dimension is set to the maximum possible sequence length to ensure you always get (1, 1, ...) in the final output shape.
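The effect of the pool size on the time axis can be checked with the usual 'same'-padding rule, output length = ceil(input length / stride), where the stride defaults to the pool size. A small arithmetic sketch; pooled_length is a hypothetical helper, not a Keras function.

```python
import math

def pooled_length(n_frames, pool_size):
    """Length of the time axis after 'same'-padded pooling with stride == pool_size."""
    return math.ceil(n_frames / pool_size)

for n_frames in (5, 10, 15):
    print(n_frames, pooled_length(n_frames, pool_size=10))
```

Sequences of up to 10 frames collapse to a single frame, while a 15-frame sequence would leave 2, which is why the pool size has to cover the longest sequence.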
As an alternative to ely's Conv2D and AveragePooling3D solutions, you can keep the last ConvLSTM2D layer's return_sequences parameter set to True, change the padding of the Conv3D layer to 'valid', and set its kernel_size parameter to (n_observations - k_steps_to_predict + 1, 1, 1). This lets you alter the time dimension (number of frames) of the output. You can apply this to any direct k-step-ahead prediction, assuming that the number of observations is fixed.
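The kernel size in this approach follows from the 'valid' convolution rule, output length = input length - kernel size + 1 (for a stride of 1). A short arithmetic sketch; the helper names are illustrative only.

```python
def time_kernel(n_observations, k_steps_to_predict):
    """Kernel length on the time axis that leaves exactly k output frames."""
    return n_observations - k_steps_to_predict + 1

def valid_output_length(n_frames, kernel):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return n_frames - kernel + 1

for k in (1, 3):
    kernel = time_kernel(10, k)
    print(k, kernel, valid_output_length(10, kernel))
```

For the question's case (10 observed frames, 1 predicted frame) the time kernel is 10 and the output has a single frame.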

Keras model predict train/test shape

I am training a CNN with Keras but with 30x30 patches from an image. I want to test the network with a full image but I get the following error:
ValueError: GpuElemwise. Input dimension mis-match. Input 2 (indices
start at 0) has shape[1] == 30, but the output's size on that axis is
100. Apply node that caused the error: GpuElemwise{Composite{((i0 + i1) - i2)}}[(0, 0)](GpuDimShuffle{0,2,3,1}.0, GpuReshape{4}.0,
GpuFromHost.0) Toposort index: 79 Inputs types:
[CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, (True, True,
True, False)), CudaNdarrayType(float32, 4D)] Inputs shapes: [(10, 100,
100, 3), (1, 1, 1, 3), (10, 30, 30, 3)] Inputs strides: [(30000, 100,
1, 10000), (0, 0, 0, 1), (2700, 90, 3, 1)] Inputs values: ['not
shown', CudaNdarray([[[[ 0.01060364 0.00988821 0.00741314]]]]), 'not
shown'] Outputs clients:
[[GpuCAReduce{pre=sqr,red=add}{0,1,1,1}(GpuElemwise{Composite{((i0 +
i1) - i2)}}[(0, 0)].0)]]
This is my model.predict:
predict_image = model.predict(np.array([test_images[1]]), batch_size=1)[0]
It seems like the issue is that the input size cannot be anything other than 30x30, but the input shape for the first layer of my network is (None, None, 3).
model.add(Convolution2D(n1, f1, f1, border_mode='same', input_shape=(None, None, 3), activation='relu'))
Is it simply not possible to test an image with different dimensions to the ones I trained with?
As fchollet himself described here, you should be able to define the input as so:
input_shape=(1, None, None)
However this will fail if you have layers that use the Flatten operation.
This suggests that you should be able to accomplish your goal with a fully convolutional NN.
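Why Flatten gets in the way can be seen from parameter counts: a convolution's weights do not depend on the image size, while a Dense layer after Flatten needs one weight per flattened feature. A rough sketch in plain Python; the layer sizes (16 filters, 3x3 kernel, 10 units) are arbitrary examples, not taken from the question.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D convolution; independent of image size."""
    return kernel * kernel * in_channels * filters + filters

def dense_after_flatten_params(height, width, channels, units):
    """Weights plus biases of a Dense layer fed by Flatten; tied to image size."""
    return height * width * channels * units + units

print(conv_params(3, 3, 16))                           # 448
print(dense_after_flatten_params(30, 30, 16, 10))      # 144010
print(dense_after_flatten_params(100, 100, 16, 10))    # 1600010
```

A Dense layer trained on 30x30 patches therefore has no weights for the extra features of a 100x100 image, whereas the convolution applies unchanged.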

Resources