I am trying to convert a model that uses Flatten/Linear as the final layers to one that uses global pooling with AdaptiveAvgPool1d/Linear. The output dimensions of the Linear layer after the global pooling break the training loop; I get the following error:
ValueError: operands could not be broadcast together with shapes (64,4) (64,)
Model with Flatten-->Linear (works)
conv1d --> relu --> maxpool1d --> Flatten --> Linear:
model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=32, kernel_size=128, stride=16, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.LazyLinear(n_classes)
)
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
Sequential -- --
├─Conv1d: 1-1 [64, 32, 505] 4,128
├─ReLU: 1-2 [64, 32, 505] --
├─MaxPool1d: 1-3 [64, 32, 252] --
├─Flatten: 1-4 [64, 8064] --
├─Linear: 1-5 [64, 4] 32,260
==========================================================================================
Model with AdaptiveAvgPool1d-->Linear (output dimension wrong)
I want the output of this implementation to match that of the previous one, where the output shape coming out of the Linear layer is [64, 4].
model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=32, kernel_size=128, stride=16, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.AdaptiveAvgPool1d(1),
    nn.LazyLinear(n_classes)
)
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
Sequential -- --
├─Conv1d: 1-1 [64, 32, 505] 4,128
├─ReLU: 1-2 [64, 32, 505] --
├─MaxPool1d: 1-3 [64, 32, 252] --
├─AdaptiveAvgPool1d: 1-4 [64, 32, 1] --
├─Linear: 1-5 [64, 32, 4] 8
==========================================================================================
You can't simply replace nn.Flatten with nn.AdaptiveAvgPool1d, because they don't do the same thing. AdaptiveAvgPool1d averages over the length dimension but keeps it as a trailing dimension of size 1, so the Linear layer receives a 3-D tensor [64, 32, 1] and maps only the last dimension, producing [64, 32, 4]. You still need to add nn.Flatten() after nn.AdaptiveAvgPool1d to get the same output shape.
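For reference, here is a minimal sketch of the pooled variant with the extra Flatten. The input length of 8192 is an assumption chosen to reproduce the [64, 32, 505] shape from the summaries above:

import torch
import torch.nn as nn

n_classes = 4

model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=32, kernel_size=128, stride=16, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.AdaptiveAvgPool1d(1),   # [64, 32, 252] -> [64, 32, 1]
    nn.Flatten(),              # [64, 32, 1]   -> [64, 32]
    nn.LazyLinear(n_classes),  # [64, 32]      -> [64, 4]
)

x = torch.randn(64, 1, 8192)
print(model(x).shape)  # torch.Size([64, 4])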
train input shape : (13974, 100, 6, 5)
train output shape : (13974, 1,1)
test input shape : (3494, 100, 6, 5)
test output shape : (3494, 1, 1)
I am developing the following 2D CNN LSTM model.
model = Sequential()
model.add(TimeDistributed(Conv2D(1, (1, 1), activation='relu',
                                 input_shape=(6, 5, 1))))
model.add(TimeDistributed(MaxPooling2D(pool_size=(6, 5))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=False, input_shape=(100, 1)))
model.add(Dense(1))
When I try to fit it as follows:
model.fit(train_input,train_output,epochs=50,batch_size=60)
it gives me an error:
ValueError: strides should be of length 1, 1 or 3 but was 2
Please correct my model. I am converting each 6 x 5 image to a single unit and predicting the 101st time stamp from the previous 100 time stamps.
Your question is quite unclear, but I believe you have a sequence of 100 images of size 6 x 5. It is better to use Conv3D for your use case, and there is no need to have TimeDistributed everywhere. This is just an illustration for your use case; you may have to add more Conv and MaxPool layers and experiment with other hyper-parameters to get a good fit.
import numpy as np
from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, Reshape, LSTM, Dense

# Add the channel dimension to the input
train_input = np.expand_dims(train_input, -1)
# Remove the extra dimension from the output
train_output = np.reshape(train_output, (-1, 1))

model = Sequential()
model.add(Conv3D(1, (1, 1, 1), activation='relu', input_shape=(100, 6, 5, 1)))
model.add(MaxPooling3D(pool_size=(6, 5, 1)))
model.add(Reshape((16, 5)))  # collapse (16, 1, 5, 1) into a sequence of 16 steps with 5 features
model.add(LSTM(units=300, return_sequences=False))
model.add(Dense(1))
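Compiling and fitting then looks something like this; the loss and optimizer here are assumptions, so pick whatever suits your problem:

model.compile(loss='mse', optimizer='adam')  # assumed regression setup
model.summary()
model.fit(train_input, train_output, epochs=50, batch_size=60, verbose=1)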
I am here to ask some more general questions about PyTorch and convolutional autoencoders.
If I only use convolutional layers (FCN), do I even have to care about the input shape? And how do I best choose the number of feature maps?
Does a ConvTranspose2d Layer automatically unpool?
Can you spot any errors or unconventional code in my example?
By the way, I want to make a symmetrical Convolutional Autoencoder to colorize black and white images with different image sizes.
self.encoder = nn.Sequential(
    # conv 1
    nn.Conv2d(in_channels=3, out_channels=512, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 1/2
    nn.BatchNorm2d(512),
    # conv 2
    nn.Conv2d(in_channels=512, out_channels=256, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 1/4
    nn.BatchNorm2d(256),
    # conv 3
    nn.Conv2d(in_channels=256, out_channels=128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 1/8
    nn.BatchNorm2d(128),
    # conv 4
    nn.Conv2d(in_channels=128, out_channels=64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 1/16
    nn.BatchNorm2d(64)
)
self.decoder = nn.Sequential(
    # conv 5
    nn.ConvTranspose2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(128),
    # conv 6
    nn.ConvTranspose2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(256),
    # conv 7
    nn.ConvTranspose2d(in_channels=256, out_channels=512, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(512),
    # conv 8
    nn.ConvTranspose2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
    nn.Softmax()
)
def forward(self, x):
    h = x
    h = self.encoder(h)
    h = self.decoder(h)
    return h
No, you don't need to care about input width and height with a fully convolutional model. But you should probably ensure that each downsampling operation in the encoder is matched by a corresponding upsampling operation in the decoder.
I'm not sure what you mean by unpooling. If you mean upsampling (increasing the spatial dimensions), then this is what the stride parameter is for. In PyTorch, a transpose convolution with stride=2 will double the spatial dimensions. Note, however, that instead of a transpose convolution, many practitioners prefer to use bilinear upsampling followed by a regular convolution. This is one reason why.
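For illustration, here is a minimal sketch of both options; the channel counts and input size are made up:

import torch
import torch.nn as nn

# Option 1: transpose convolution; stride=2 doubles the spatial size
deconv = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)

# Option 2: bilinear upsampling followed by a regular convolution
up_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    nn.Conv2d(128, 64, kernel_size=3, padding=1),
)

x = torch.randn(1, 128, 16, 16)
print(deconv(x).shape)   # torch.Size([1, 64, 32, 32])
print(up_conv(x).shape)  # torch.Size([1, 64, 32, 32])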
If, on the other hand, you mean actual unpooling, then you should look at the documentation of torch.nn.MaxUnpool2d. You need to collect the maximal value indices from the MaxPool2d operation (with return_indices=True) and feed them into MaxUnpool2d.
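A minimal sketch of that pairing, with made-up shapes:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)
pooled, indices = pool(x)           # [1, 64, 16, 16], plus the argmax indices
restored = unpool(pooled, indices)  # [1, 64, 32, 32], zero everywhere except at the maxima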
The general consensus seems to be that you should increase the number of feature maps as you downsample; your code appears to do the reverse. Consecutive powers of 2 seem like a good place to start; it's hard to suggest a better rule of thumb, so you will probably need to experiment a little.
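As a sketch of that convention (the channel counts are purely illustrative, not a recommendation for your exact problem):

import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 1/2, 64 maps
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 1/4, 128 maps
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 1/8, 256 maps
)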
On another note, I'm not sure why you apply a softmax to the decoder output.
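If the decoder is supposed to emit pixel intensities in [0, 1], one common choice (a suggestion, not something from your code) would be a sigmoid over the right number of colour channels instead:

import torch.nn as nn

# hypothetical final decoder block: map back to 3 colour channels
final = nn.Sequential(
    nn.ConvTranspose2d(in_channels=512, out_channels=3, kernel_size=3, stride=1, padding=1),
    nn.Sigmoid(),  # squashes each output pixel into [0, 1]
)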
I am not able to understand the basic structure of an LSTM model.
Here is my model:
def build_model(train, n_input):
    train_x, train_y = to_supervised(train, n_input)
    verbose, epochs, batch_size = 1, 60, 20
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
Here is my model.summary()
Layer (type) Output Shape Param #
=================================================================
lstm_5 (LSTM) (None, 200) 172000
_________________________________________________________________
repeat_vector_3 (RepeatVecto (None, 7, 200) 0
_________________________________________________________________
lstm_6 (LSTM) (None, 7, 200) 320800
_________________________________________________________________
time_distributed_5 (TimeDist (None, 7, 100) 20100
_________________________________________________________________
time_distributed_6 (TimeDist (None, 7, 1) 101
=================================================================
Total params: 513,001
Trainable params: 513,001
Non-trainable params: 0
_________________________________________________________________
From the above summary, I do not understand what lstm_5 or lstm_6 is. It also doesn't tell me the number of hidden layers in the network.
Could someone please help me understand how many hidden layers there are in the above model, and how many neurons each has?
I am basically confused by add(LSTM(200, ...)) and add(TimeDistributed(Dense(100, ...))).
I think 200 and 100 are the numbers of neurons in the hidden layers, and that there are 4 hidden layers, one for each .add().
Please correct me and clarify my doubts, with a diagram if possible.
[Image: pictorial representation of the model architecture, showing how the outputs of each layer feed into the next layer in the sequence.]
The picture is self-explanatory and matches your model summary. Note that the batch size is None in the model summary because it is determined dynamically. Also note that in an LSTM, the size of the hidden state is the same as the size of the LSTM's output.
Here you define an LSTM layer with 200 neurons. The 200-dim vector basically represents the sequence as an internal embedding:
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
Therefore you get an output of shape (None, 200).
Here you repeat the vector 7 times:
model.add(RepeatVector(n_outputs))
You get a tensor of shape (None, 7, 200).
You feed this tensor back in as a sequence and return the state of the 200 neurons at every timestep; it's unclear why:
model.add(LSTM(200, activation='relu', return_sequences=True))
You get a tensor of shape (None, 7, 200).
You apply a weight-sharing Dense layer with 100 neurons to every time step. I actually don't know why the weights are shared here; it seems weird:
model.add(TimeDistributed(Dense(100, activation='relu')))
You get a tensor of shape (None, 7, 100).
Finally, you apply a last neuron to each of these 7 timesteps, again with shared weights, turning each 100-dim vector into a single value. The result is a vector of 7 outputs, one for each timestep:
model.add(TimeDistributed(Dense(1)))
I was following the Keras Seq2Seq tutorial, and it works fine. However, this is a character-level model, and I would like to adapt it to a word-level model. The authors even include a paragraph with the required changes, but all my current attempts result in an error regarding wrong dimensions.
If you follow the character-level model, the input data has 3 dims: #sequences, #max_seq_len, #num_char, since each character is one-hot encoded. When I print the summary for the model as used in the tutorial, I get:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, None, 71) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, None, 94) 0
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, 256), (None, 335872 input_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) [(None, None, 256), 359424 input_2[0][0]
lstm_1[0][1]
lstm_1[0][2]
__________________________________________________________________________________________________
dense_1 (Dense) (None, None, 94) 24158 lstm_2[0][0]
==================================================================================================
This compiles and trains just fine.
Now this tutorial has a section "What if I want to use a word-level model with integer sequences?", and I've tried to follow those changes. First, I encode all sequences using a word index. As such, the input and target data are now 2-dimensional: #sequences, #max_seq_len, since I no longer one-hot encode but now use Embedding layers.
encoder_input_data_train.shape => (90000, 9)
decoder_input_data_train.shape => (90000, 16)
decoder_target_data_train.shape => (90000, 16)
For example, a sequence might look like this:
[ 826. 288. 2961. 3127. 1260. 2108. 0. 0. 0.]
When I use the listed code:
# encoder
encoder_inputs = Input(shape=(None, ))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]
# decoder
decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
the model compiles and looks like this:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_35 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
input_36 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
embedding_32 (Embedding) (None, None, 256) 914432 input_35[0][0]
__________________________________________________________________________________________________
embedding_33 (Embedding) (None, None, 256) 914432 input_36[0][0]
__________________________________________________________________________________________________
lstm_32 (LSTM) [(None, 256), (None, 525312 embedding_32[0][0]
__________________________________________________________________________________________________
lstm_33 (LSTM) (None, None, 256) 525312 embedding_33[0][0]
lstm_32[0][1]
lstm_32[0][2]
__________________________________________________________________________________________________
dense_21 (Dense) (None, None, 3572) 918004 lstm_33[0][0]
While compile works, training
model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=32, epochs=1, validation_split=0.2)
fails with the following error: ValueError: Error when checking target: expected dense_21 to have 3 dimensions, but got array with shape (90000, 16), the latter being the shape of the decoder target data. Why does the Dense layer receive an array with the shape of the decoder target data?
Things I've tried:
I find it a bit strange that the decoder LSTM has return_sequences=True, since I thought I cannot give sequences to a Dense layer (and the decoder of the original character-level model does not state this). However, simply removing it or setting return_sequences=False did not help; of course, the Dense layer then has an output shape of (None, 3572).
I don't quite get the need for the Input layers. I've set them to shape=(max_input_seq_len, ) and shape=(max_target_seq_len, ) respectively, so that the summary shows the respective values, e.g., (None, 16), instead of (None, None). No change.
In the Keras docs I've read that an Embedding layer should be used with input_length, as otherwise a downstream Dense layer cannot compute its outputs. But again, I still get errors when I set input_length accordingly.
I'm a bit at a dead end right now. Am I even on the right track, or am I missing something more fundamental? Is the shape of my data wrong? Why does the last Dense layer get an array with shape (90000, 16)? That seems rather off.
UPDATE: I figured out that the problem seems to be decoder_target_data, which currently has the shape (#samples, max_seq_len), e.g., (90000, 16). But I assume I need to one-hot encode the target output with respect to the vocabulary: (#samples, max_seq_len, vocab_size), e.g., (90000, 16, 3572).
Unfortunately, this throws a MemoryError. However, when I assume a vocabulary size of 10 for debugging purposes:
decoder_target_data = np.zeros((len(input_sequences), max_target_seq_len, 10), dtype='float32')
and later in the decoder model:
x = Dense(10, activation='softmax')(x)
then the model trains without error. In case that's indeed my issue, I would have to train the model with manually generated batches so I can keep the vocabulary size but reduce the number of samples, e.g., to 90 batches each of shape (1000, 16, 3572). Am I on the right track here?
I was also facing this problem recently. There is no other solution than creating small batches, say batch_size=64, in a generator, and then using model.fit_generator instead of model.fit. I have attached my generate_batch code below:
def generate_batch(X, y, batch_size=64):
    '''Generate a batch of data.'''
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_encoder_seq_length), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_decoder_seq_length + 2), dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_decoder_seq_length + 2, num_decoder_tokens), dtype='float32')
            for i, (input_text_seq, target_text_seq) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word_index in enumerate(input_text_seq):
                    encoder_input_data[i, t] = word_index  # encoder input seq
                for t, word_index in enumerate(target_text_seq):
                    decoder_input_data[i, t] = word_index
                    if (t > 0) & (word_index <= num_decoder_tokens):
                        # one-hot target, shifted one step ahead of the decoder input
                        decoder_target_data[i, t - 1, word_index - 1] = 1.
            yield [encoder_input_data, decoder_input_data], decoder_target_data
And then training like this:
import math

batch_size = 64
epochs = 2

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(
    generator=generate_batch(X=X_train_sequences, y=y_train_sequences, batch_size=batch_size),
    steps_per_epoch=math.ceil(len(X_train_sequences) / batch_size),
    epochs=epochs,
    verbose=1,
    validation_data=generate_batch(X=X_val_sequences, y=y_val_sequences, batch_size=batch_size),
    validation_steps=math.ceil(len(X_val_sequences) / batch_size),
    workers=1,
)
X_train_sequences is a list of lists, like [[23, 34, 56], [2, 33544, 6, 10]]; the other variables are similar.
I also took help from this blog: word-level-english-to-marathi-nmt.
I built my Keras model in the following way (this is of course not the final production-ready model):
self.model = Sequential()
self.model.add(Conv2D(32, (3, 3), input_shape=(674, 514, 1), padding='same',
                      activation='relu'))
self.model.compile(loss='mean_squared_error', optimizer='adam',
                   metrics=['accuracy'])
Model summary is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 674, 514, 32) 320
=================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
I try to fit it in the following way:
self.model.fit(self.input_images, self.output_images, batch_size=32,
               epochs=10, verbose=1, shuffle=True)
Shapes of both training input and output (self.input_images, self.output_images) are both (100, 674, 514, 1).
And when I try to train my model, I get the following exception:
ValueError: Error when checking target: expected conv2d_1 to have shape (674, 514, 32) but got array with shape (674, 514, 1)
Any help is much appreciated.
The mismatch is with your output_images. The output of the convolutional layer is (None, 674, 514, 32), because it has 32 filters. The mean_squared_error loss tells Keras to expect a label shape compatible with that output, which the supplied output_images does not have.
The model isn't finished, and normally a CNN has many convolutional and downsampling layers, so the output shape is going to be different. But if you want, you can make this model work by either changing the number of filters to 1...
Conv2D(1, ...)
... or by making output_images a tensor of shape (100, 674, 514, 32).
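A minimal sketch of the first option; input_images and output_images stand in for your self.* attributes:

from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(1, (3, 3), input_shape=(674, 514, 1), padding='same', activation='relu'))
model.compile(loss='mean_squared_error', optimizer='adam')

# The output shape is now (None, 674, 514, 1), matching targets of shape (100, 674, 514, 1).
model.fit(input_images, output_images, batch_size=32, epochs=10, verbose=1, shuffle=True)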