I am training a CNN in Keras on 30x30 patches from an image. I want to test the network on a full image, but I get the following error:
ValueError: GpuElemwise. Input dimension mis-match. Input 2 (indices start at 0) has shape[1] == 30, but the output's size on that axis is 100.
Apply node that caused the error: GpuElemwise{Composite{((i0 + i1) - i2)}}[(0, 0)](GpuDimShuffle{0,2,3,1}.0, GpuReshape{4}.0, GpuFromHost.0)
Toposort index: 79
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, (True, True, True, False)), CudaNdarrayType(float32, 4D)]
Inputs shapes: [(10, 100, 100, 3), (1, 1, 1, 3), (10, 30, 30, 3)]
Inputs strides: [(30000, 100, 1, 10000), (0, 0, 0, 1), (2700, 90, 3, 1)]
Inputs values: ['not shown', CudaNdarray([[[[ 0.01060364  0.00988821  0.00741314]]]]), 'not shown']
Outputs clients: [[GpuCAReduce{pre=sqr,red=add}{0,1,1,1}(GpuElemwise{Composite{((i0 + i1) - i2)}}[(0, 0)].0)]]
This is my model.predict:
predict_image = model.predict(np.array([test_images[1]]), batch_size=1)[0]
It seems like the issue is that the input size cannot be anything other than 30x30, even though the input shape for the first layer of my network is (None, None, 3).
model.add(Convolution2D(n1, f1, f1, border_mode='same', input_shape=(None, None, 3), activation='relu'))
Is it simply not possible to test an image with different dimensions to the ones I trained with?
As fchollet himself described here, you should be able to define the input like so:
input_shape=(1, None, None)
However this will fail if you have layers that use the Flatten operation.
This suggests that you should be able to accomplish your goal with a fully convolutional NN.
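For example, a minimal fully convolutional sketch (hypothetical filter counts, written with the current Conv2D/padding syntax rather than the older Convolution2D call in the question, and keeping the channels-last (None, None, 3) input) could look like this:

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D

# Every layer is convolutional with 'same' padding, so the spatial
# dimensions can stay undefined.
model = Sequential()
model.add(Conv2D(64, (9, 9), padding='same', activation='relu',
                 input_shape=(None, None, 3)))
model.add(Conv2D(32, (1, 1), padding='same', activation='relu'))
model.add(Conv2D(3, (5, 5), padding='same'))
model.compile(optimizer='adam', loss='mse')

# Train on 30x30 patches...
patches = np.random.rand(16, 30, 30, 3).astype('float32')
model.fit(patches, patches, epochs=1, batch_size=4)

# ...then predict on a full-size image with the same model.
full_image = np.random.rand(1, 100, 100, 3).astype('float32')
print(model.predict(full_image).shape)   # (1, 100, 100, 3)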
Related
Hello, I'm using RandomizedSearchCV for hyperparameter tuning of my LSTM. The code works fine with stateful=False, but I also want to try this with stateful=True and I'm not sure how to.
I arranged my data into sliding windows of shape (211845 data points, window size of 4, 16 features).
The following function creates the model architecture:
from keras.models import Sequential
from keras.layers import InputLayer, LSTM, Dropout, Dense
from keras.optimizers import Adam

def create_lstm(dropout_rate=0.0, neurons=32, lr=1e-3):
    lstm = Sequential()
    lstm.add(InputLayer((4, 16)))
    lstm.add(LSTM(neurons, return_sequences=True))
    lstm.add(Dropout(dropout_rate))
    lstm.add(LSTM(neurons))
    lstm.add(Dropout(dropout_rate))
    lstm.add(Dense(neurons // 4, activation='relu'))
    lstm.add(Dense(1))
    lstm.compile(loss='mse',
                 optimizer=Adam(learning_rate=lr),
                 metrics=['mean_squared_error'])
    return lstm
I pass the function to the wrapper:
from keras.wrappers.scikit_learn import KerasRegressor

lstm_estimator = KerasRegressor(build_fn=create_lstm, verbose=1)
The following code is for my param grid and RandomSearch
lstm_param_grid = {
'dropout_rate': [0, 0.2, 0.4],
'neurons': [32, 64, 128],
'batch_size': [100, 200, 400],
'epochs': [50, 100, 150],
'lr': [1e-3, 1e-4, 1e-5]
}
from sklearn.model_selection import RandomizedSearchCV

lstm_RandomGrid = RandomizedSearchCV(estimator=lstm_estimator,
                                     param_distributions=lstm_param_grid,
                                     n_iter=10,
                                     verbose=10,
                                     n_jobs=-1,
                                     cv=5)
In my create_lstm function the input shape of my data is given by the window size and the number of features. After the random search I pass the epochs and batch_size arguments when I fit the model. However, with stateful=True you have to define batch_input_shape=(x, y, z).
I'm really not sure what exactly to do now. How can I change my code so that the random search still tests multiple batch sizes? And what exactly are (x, y, z) in my example? I tried (batch_size=100, window size, number of features), but that didn't work out.
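For reference, with stateful=True the first recurrent layer needs batch_input_shape=(batch size, window size, number of features), and the batch size becomes part of the model itself, so the sample count has to be divisible by it. A minimal sketch (not a full RandomizedSearchCV setup) of what that means here, with batch_size promoted to an argument of the build function and the same layer sizes as above, might look like:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import Adam

def create_stateful_lstm(batch_size=100, dropout_rate=0.0, neurons=32, lr=1e-3):
    lstm = Sequential()
    # batch_input_shape = (batch size, window size, number of features)
    lstm.add(LSTM(neurons, stateful=True, return_sequences=True,
                  batch_input_shape=(batch_size, 4, 16)))
    lstm.add(Dropout(dropout_rate))
    lstm.add(LSTM(neurons, stateful=True))
    lstm.add(Dropout(dropout_rate))
    lstm.add(Dense(neurons // 4, activation='relu'))
    lstm.add(Dense(1))
    lstm.compile(loss='mse',
                 optimizer=Adam(learning_rate=lr),
                 metrics=['mean_squared_error'])
    return lstm

Note that a stateful model also expects batches of this exact fixed size at prediction time, so predicting single samples usually means rebuilding the model with batch_size=1 and copying the trained weights over.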
train input shape : (13974, 100, 6, 5)
train output shape : (13974, 1, 6, 5)
test input shape : (3494, 100, 6, 5)
test output shape : (3494, 1, 6, 5)
model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), padding='same'),
                          input_shape=(100, 6, 5, 1)))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Conv2D(32, (3, 3))))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(Dense(512)))
model.add(TimeDistributed(Dense(35, name="first_dense_flow")))
model.add(LSTM(20, return_sequences=True, name="lstm_layer_flow"))
model.add(TimeDistributed(Dense(101), name="time_distr_dense_one_flow"))
model.add(GlobalAveragePooling1D(name="global_avg_flow"))
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
model.fit(train_input, train_output, epochs=50, batch_size=60)
I am trying to build a CNN-LSTM model capable of predicting the future. The input is 13974 sequences, each consisting of 100 time steps, each of which contains 6 locations and 5 features (variables), so the input is (13974, 100, 6, 5) and the output is (13974, 1, 6, 5).
How can I change my model so that this spatio-temporal prediction can be done?
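As a side note on the shapes: the TimeDistributed(Conv2D) stack above declares input_shape=(100, 6, 5, 1), i.e. it expects an explicit channel axis on each 6x5 "image", so the (13974, 100, 6, 5) arrays would need an extra trailing dimension before being passed to fit, for example:

import numpy as np

# Add a channel axis so each time step becomes a 6x5x1 image,
# matching input_shape=(100, 6, 5, 1) in the model above.
train_input = np.expand_dims(train_input, axis=-1)    # (13974, 100, 6, 5, 1)
train_output = np.expand_dims(train_output, axis=-1)  # (13974, 1, 6, 5, 1)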
I am trying to understand why the implementation of ResNet50 in Keras forbids images smaller than 32x32x3.
Based on their implementation: https://github.com/keras-team/keras-applications/blob/master/keras_applications/resnet50.py
The function that enforces this is _obtain_input_shape.
To get around this, I made my own implementation based on their code and removed the check that forbids the minimal size. My implementation also adds the possibility of working with a pre-trained model that has more than three channels, by replicating the RGB weights for the first conv1 layer.
def ResNet50(load_weights=True,
             input_shape=None,
             pooling=None,
             classes=1000):

    img_input = Input(shape=input_shape, name='tuned_input')
    x = ZeroPadding2D(padding=(3, 3), name='conv1_pad')(img_input)

    # Stage 1 (conv1_x)
    x = Conv2D(64, (7, 7),
               strides=(2, 2),
               padding='valid',
               kernel_initializer=KERNEL_INIT,
               name='tuned_conv1')(x)
    x = BatchNormalization(axis=CHANNEL_AXIS, name='bn_conv1')(x)
    x = Activation('relu')(x)
    x = ZeroPadding2D(padding=(1, 1), name='pool1_pad')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    # Stage 2 (conv2_x)
    x = _convolution_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    for block in ['b', 'c']:
        x = _identity_block(x, 3, [64, 64, 256], stage=2, block=block)

    # Stage 3 (conv3_x)
    x = _convolution_block(x, 3, [128, 128, 512], stage=3, block='a')
    for block in ['b', 'c', 'd']:
        x = _identity_block(x, 3, [128, 128, 512], stage=3, block=block)

    # Stage 4 (conv4_x)
    x = _convolution_block(x, 3, [256, 256, 1024], stage=4, block='a')
    for block in ['b', 'c', 'd', 'e', 'f']:
        x = _identity_block(x, 3, [256, 256, 1024], stage=4, block=block)

    # Stage 5 (conv5_x)
    x = _convolution_block(x, 3, [512, 512, 2048], stage=5, block='a')
    for block in ['b', 'c']:
        x = _identity_block(x, 3, [512, 512, 2048], stage=5, block=block)

    # Condition on the last layer
    if pooling == 'avg':
        x = layers.GlobalAveragePooling2D()(x)
    elif pooling == 'max':
        x = layers.GlobalMaxPooling2D()(x)

    inputs = img_input
    # Create model.
    model = models.Model(inputs, x, name='resnet50')

    if load_weights:
        weights_path = get_file(
            'resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5',
            WEIGHTS_PATH_NO_TOP,
            cache_subdir='models',
            md5_hash='a268eb855778b3df3c7506639542a6af')
        model.load_weights(weights_path, by_name=True)

        f = h5py.File(weights_path, 'r')
        d = f['conv1']

        # Used to work with more than 3 channels with pre-trained model
        if input_shape[2] % 3 == 0:
            model.get_layer('tuned_conv1').set_weights(
                [d['conv1_W_1:0'][:].repeat(input_shape[2] / 3, axis=2),
                 d['conv1_b_1:0']])
        else:
            m = (3 * int(input_shape[2] / 3)) + 3
            model.get_layer('tuned_conv1').set_weights(
                [d['conv1_W_1:0'][:].repeat(m, axis=2)[:, :, 0:input_shape[2], :],
                 d['conv1_b_1:0']])

    return model
I ran my implementation with 10x10x3 images and it seems to work, so I do not understand why they put this minimal bound.
They do not provide any information about this choice. I also checked the original paper and did not find any mention of a restriction on the minimal input shape. I suppose there is a reason for this bound, but I do not know it.
So I would like to know why this restriction was put on the ResNet implementation.
ResNet50 performs 5 stages of downsampling, split between 2x2 MaxPooling and strided convolutions with a stride of 2 px in each direction. This means that the minimum input size is 2^5 = 32, and this value is also the size of the receptive field.
There is not much point in using images smaller than 32x32, since then the downsampling does nothing useful and the behavior of the network changes. For such small images it is better to use another network with less downsampling (like DenseNet) or with less depth.
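You can see the factor-of-2^5 shrinkage directly with the stock keras.applications model (sketched here with random weights so nothing needs to be downloaded): a 32x32 input already collapses to a 1x1 feature map at the end of the convolutional part.

from keras.applications.resnet50 import ResNet50

# Smallest allowed input: 5 downsampling stages turn 32x32 into 1x1.
model = ResNet50(include_top=False, weights=None, input_shape=(32, 32, 3))
print(model.output_shape)   # (None, 1, 1, 2048)

Anything smaller and the later stages end up pooling or striding over a 1x1 map, which is the "downsampling is not doing anything" situation described above.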
I have built a Keras ConvLSTM neural network, and I want to predict one frame ahead based on a sequence of 10 time steps:
from keras.models import Sequential
from keras.layers.convolutional import Conv3D
from keras.layers.convolutional_recurrent import ConvLSTM2D
from keras.layers.normalization import BatchNormalization
import numpy as np
import pylab as plt
from keras import layers
# We create a layer which take as input movies of shape
# (n_frames, width, height, channels) and returns a movie
# of identical shape.
model = Sequential()
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
input_shape=(None, 64, 64, 1),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(Conv3D(filters=1, kernel_size=(3, 3, 3),
activation='sigmoid',
padding='same', data_format='channels_last'))
model.compile(loss='binary_crossentropy', optimizer='adadelta')
training:
data_train_x = data_4[0:20, 0:10, :, :, :]
data_train_y = data_4[0:20, 10:11, :, :, :]
model.fit(data_train_x, data_train_y, batch_size=10, epochs=1,
validation_split=0.05)
and I test the model:
test_x = np.reshape(data_test_x[2,:,:,:,:], [1,10,64,64,1])
next_frame = model.predict(test_x,batch_size=1, verbose=1, steps=None)
The problem is that next_frame has shape (1, 10, 64, 64, 1), but I wanted it to have shape (1, 1, 64, 64, 1).
And this is the result of model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 64, 64, 40) 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 64, 64, 40) 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 64, 64, 40) 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 64, 64, 40) 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, None, 64, 64, 40) 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 64, 64, 40) 160
_________________________________________________________________
conv_lst_m2d_4 (ConvLSTM2D) (None, None, 64, 64, 40) 115360
_________________________________________________________________
batch_normalization_4 (Batch (None, None, 64, 64, 40) 160
_________________________________________________________________
conv3d_1 (Conv3D) (None, None, 64, 64, 1) 1081
=================================================================
Total params: 407,001
Trainable params: 406,681
Non-trainable params: 320
So what layer should I add to reduce the output to 1 frame instead of 10 frames?
This is expected given the 3D convolution in the final layer. For example, if you apply a Conv2D with 1 filter and padding='same' to a 3-dimensional tensor, it will produce a 2D output with the same height and width (i.e. the filter implicitly also spans the depth axis).
The same is true for a Conv3D applied to a 4-dimensional tensor: it implicitly spans the channel axis, resulting in a 3D tensor with the same (sequence index, height, width) as the input.
It sounds like what you want to do is add a pooling step of some kind after your Conv3D layer, such that it flattens across the sequence dimension, such as with AveragePooling3D with a pooling tuple of (10, 1, 1) to average across the first non-batch dimension (or modified according to your specific network needs).
Alternatively, suppose you want to "pool" along the sequence dimension by taking only the final sequence element (instead of averaging or max-pooling across the sequence). You could then make the final ConvLSTM2D layer have return_sequences=False, followed by a 2D convolution in the final step, but this means your final convolution won't benefit from aggregating across a sequence of predicted frames. Whether this is a good idea is probably application-specific.
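A minimal sketch of that return_sequences=False variant (shortened to two ConvLSTM2D layers just to keep it small) would be something like the following; note that the output then has no time dimension at all:

import numpy as np
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional_recurrent import ConvLSTM2D
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                     input_shape=(None, 64, 64, 1),
                     padding='same', return_sequences=True))
model.add(BatchNormalization())
# Only the final state of the sequence is returned here.
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                     padding='same', return_sequences=False))
model.add(Conv2D(filters=1, kernel_size=(3, 3),
                 activation='sigmoid', padding='same'))

x = np.random.rand(1, 10, 64, 64, 1)
print(model.predict(x).shape)   # (1, 64, 64, 1)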
Just to confirm the first approach, I added:
model.add(layers.AveragePooling3D(pool_size=(10, 1, 1), padding='same'))
just after the Conv3D layer, and then made toy data:
x = np.random.rand(1, 10, 64, 64, 1)
and then:
In [22]: z = model.predict(x)
In [23]: z.shape
Out[23]: (1, 1, 64, 64, 1)
You would need to ensure the pooling size in the first non-batch dimension is set to the maximum possible sequence length to ensure you always get (1, 1, ...) in the final output shape.
As an alternative to ely's Conv2D and AveragePooling3D solutions, you can keep return_sequences=True on the last ConvLSTM2D layer but change the padding of the Conv3D layer to 'valid' and set its kernel_size to (n_observations - k_steps_to_predict + 1, 1, 1). With this, you are able to alter the time dimension (number of frames) of the output. You can apply this to any direct k-step-ahead prediction, assuming that the number of observations is fixed.
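For example, with 10 input frames and a 1-step-ahead prediction, kernel_size = (10 - 1 + 1, 1, 1) = (10, 1, 1) with padding='valid' collapses the time dimension from 10 to 1 (only a single ConvLSTM2D layer is used here to keep the sketch short):

import numpy as np
from keras.models import Sequential
from keras.layers.convolutional import Conv3D
from keras.layers.convolutional_recurrent import ConvLSTM2D

model = Sequential()
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                     input_shape=(10, 64, 64, 1),
                     padding='same', return_sequences=True))
# 'valid' padding with a kernel spanning all 10 frames leaves 1 frame.
model.add(Conv3D(filters=1, kernel_size=(10, 1, 1),
                 activation='sigmoid', padding='valid'))
model.compile(loss='binary_crossentropy', optimizer='adadelta')

x = np.random.rand(1, 10, 64, 64, 1)
print(model.predict(x).shape)   # (1, 1, 64, 64, 1)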
Please forgive my ignorance as I am really new to the area. I am trying to get the correct output shape from my neural network, which has 3 Conv2D layers followed by 2 Dense layers. My input shape is (140, 140, 4), which is 4 grayscale images. When I feed in 1 input, I expect an output of shape (1, 4), but I get (14, 14, 4) instead. What am I doing wrong? Thank you very much for your help in advance!
meta_layers = [Conv2D, Conv2D, Conv2D, Dense, Dense]
meta_inits = ['lecun_uniform'] * 5
meta_nodes = [32, 64, 64, 512, 4]
meta_filters = [(8,8), (4,4), (3,3), None, None]
meta_strides = [(4,4), (2,2), (1,1), None, None]
meta_activations = ['relu'] * 5
meta_loss = "mean_squared_error"
meta_optimizer=RMSprop(lr=0.00025, rho=0.9, epsilon=1e-06)
meta_n_samples = 1000
meta_epsilon = 1.0

meta = Sequential()
meta.add(meta_layers[0](meta_nodes[0], init=meta_inits[0], input_shape=(140, 140, 4),
                        kernel_size=meta_filters[0], strides=meta_strides[0]))
meta.add(Activation(meta_activations[0]))

for layer, init, node, activation, kernel, stride in list(zip(
        meta_layers, meta_inits, meta_nodes, meta_activations,
        meta_filters, meta_strides))[1:]:
    if layer == Conv2D:
        meta.add(layer(node, init=init, kernel_size=kernel, strides=stride))
        meta.add(Activation(activation))
    elif layer == Dense:
        meta.add(layer(node, init=init))
        meta.add(Activation(activation))
        print("meta node: " + str(node))

meta.compile(loss=meta_loss, optimizer=meta_optimizer)
Your problem lies in the fact that in Keras with version >= 2.0, a Dense layer is applied to the last axis of its input (you may read about it here). So if you apply:
Dense(512)
to the output of a Conv2D layer with shape (14, 14, 64), you'll get an output with shape (14, 14, 512), and Dense(4) applied to that will give you an output with shape (14, 14, 4). You can call the model.summary() method to confirm this.
In order to solve this, you need to apply one of the following layers to the output of the last convolutional layer: GlobalMaxPooling2D, GlobalAveragePooling2D or Flatten. This squashes the output down to 2 dimensions, with shape (batch_size, features).
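For example, a stripped-down sketch of the fix (using the same filter sizes and strides as in the question, but plain layer calls instead of the meta_* lists) could look like this:

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

meta = Sequential()
meta.add(Conv2D(32, kernel_size=(8, 8), strides=(4, 4), activation='relu',
                input_shape=(140, 140, 4)))
meta.add(Conv2D(64, kernel_size=(4, 4), strides=(2, 2), activation='relu'))
meta.add(Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'))
meta.add(Flatten())                      # squashes (14, 14, 64) down to (12544,)
meta.add(Dense(512, activation='relu'))
meta.add(Dense(4))
meta.compile(loss='mean_squared_error', optimizer='rmsprop')

print(meta.predict(np.random.rand(1, 140, 140, 4)).shape)   # (1, 4)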