Keras - shape mismatch using convolutional nets

I built my Keras model in the following way (this is of course not the final production-ready model):
self.model = Sequential()
self.model.add(Conv2D(32, (3, 3), input_shape=(674, 514, 1), padding='same', activation='relu'))
self.model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
Model summary is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 674, 514, 32) 320
=================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
I try to fit it in the following way:
self.model.fit(self.input_images, self.output_images, batch_size=32, epochs=10, verbose=1, shuffle=True)
The shapes of the training input and output (self.input_images and self.output_images) are both (100, 674, 514, 1).
When I try to train the model, I get the following exception:
ValueError: Error when checking target: expected conv2d_1 to have shape (674, 514, 32) but got array with shape (674, 514, 1)
Any help is much appreciated.

The mismatch is with your output_images. The output of the convolutional layer is (None, 674, 514, 32), because it has 32 filters. The mean_squared_error loss tells Keras to expect labels with a compatible shape (which the supplied output_images do not have).
The model isn't finished, and a CNN normally has many convolutional and downsampling layers, so the final output shape is going to be different anyway. But if you want, you can make this model work either by changing the number of filters to 1...
Conv2D(1, ...)
... or by making output_images a tensor of shape (100, 674, 514, 32).
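For illustration, here is a minimal sketch of the first option; it keeps the layer arguments from the question and only changes the filter count, so the output shape matches the (100, 674, 514, 1) targets (the fit call at the end is an assumption based on the question's code):
from keras.models import Sequential
from keras.layers import Conv2D

# A single 3x3 filter with 'same' padding keeps the spatial size,
# so the output is (None, 674, 514, 1) and matches output_images.
model = Sequential()
model.add(Conv2D(1, (3, 3), input_shape=(674, 514, 1), padding='same', activation='relu'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(input_images, output_images, batch_size=32, epochs=10, shuffle=True)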

Related

perform LSTM model with more than 3D image representation

I work on image classification with 10 classes.
Each image is represented as a set of 75 sequences, and each sequence is a set of 42 visual words. Each word is encoded according to a visual vocabulary of size 200,
so each image is represented as a tensor of shape (75, 42, 200).
I want to use an LSTM network to model this image representation, using this code:
model = Sequential()
model.add(LSTM(128, activation='relu', input_shape=(75, 42, 200)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
I get this error message:
ValueError Traceback (most recent call last)
<ipython-input-30-da9ec53d6d59> in <module>
1 model = Sequential()
----> 2 model.add(LSTM(128, activation='relu', input_shape=(75,42,200))) #number_of_hidden_units=128
3 model.add(Dense(10, activation='softmax')) #since number of output classes is 10
4 model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
5 model.summary()
2 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/input_spec.py in assert_input_compatibility(input_spec, inputs, layer_name)
212 ndim = shape.rank
213 if ndim != spec.ndim:
--> 214 raise ValueError(f'Input {input_index} of layer "{layer_name}" '
215 'is incompatible with the layer: '
216 f'expected ndim={spec.ndim}, found ndim={ndim}. '
ValueError: Input 0 of layer "lstm_1" is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 75, 42, 200)
What is wrong? Please help.
Thank you.
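(No answer was posted in this thread. As an illustration only, not from the original post: the traceback means the LSTM layer expects 3-D input of shape (batch, timesteps, features), while a batch of (75, 42, 200) tensors is 4-D. A minimal sketch of one common workaround, under the assumption that the 42 visual-word encodings of each sequence can simply be concatenated into one feature vector:)
from keras.models import Sequential
from keras.layers import Reshape, LSTM, Dense

model = Sequential()
# Merge the last two axes: (75, 42, 200) -> (75, 8400),
# i.e. 75 timesteps of 8400 features, which is 3-D once the batch axis is added.
model.add(Reshape((75, 42 * 200), input_shape=(75, 42, 200)))
model.add(LSTM(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()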

ValueError: strides should be of length 1, 1 or 3 but was 2

train input shape : (13974, 100, 6, 5)
train output shape : (13974, 1,1)
test input shape : (3494, 100, 6, 5)
test output shape : (3494, 1, 1)
I am developing the following 2D CNN-LSTM model.
model = Sequential()
model.add(TimeDistributed(Conv2D(1, (1,1), activation='relu', input_shape=(6,5,1))))
model.add(TimeDistributed(MaxPooling2D(pool_size=(6, 5))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences= False, input_shape=(100,1)))
model.add(Dense(1))
When I try to fit it as follows:
model.fit(train_input, train_output, epochs=50, batch_size=60)
it gives me an error:
ValueError: strides should be of length 1, 1 or 3 but was 2
Please correct my model. I am converting each 6x5 image to a single unit and predicting the 101st time stamp from the previous 100 time stamps.
Your question is quite unclear, but I believe you have sequences of 100 images of size 6 x 5. It is better to use Conv3D for your use case, and there is no need to wrap everything in TimeDistributed. The following is just an illustration; you may have to add more Conv and MaxPool layers and experiment with other hyper-parameters to get a good fit.
import numpy as np
from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, Reshape, LSTM, Dense

# Add the channel dimension to the input: (13974, 100, 6, 5) -> (13974, 100, 6, 5, 1)
train_input = np.expand_dims(train_input, -1)
# Remove the extra dimension in the output: (13974, 1, 1) -> (13974, 1)
train_output = np.reshape(train_output, (-1, 1))

model = Sequential()
model.add(Conv3D(1, (1, 1, 1), activation='relu', input_shape=(100, 6, 5, 1)))
model.add(MaxPooling3D(pool_size=(6, 5, 1)))  # (100, 6, 5, 1) -> (16, 1, 5, 1)
model.add(Reshape((16, 5)))                   # 16 timesteps of 5 features
model.add(LSTM(units=300, return_sequences=False))
model.add(Dense(1))
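A minimal compile-and-fit sketch for the model above, using the shapes from the question (the mean-squared-error loss is an assumption, since the 101st time stamp is a single continuous value):
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
model.fit(train_input, train_output, epochs=50, batch_size=60)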

Understanding how many hidden layers there are in a given LSTM model

I am not able to understand the basic structure of the LSTM model.
Here is my model:
def build_model(train, n_input):
    train_x, train_y = to_supervised(train, n_input)
    verbose, epochs, batch_size = 1, 60, 20
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
Here is my model.summary()
Layer (type) Output Shape Param #
=================================================================
lstm_5 (LSTM) (None, 200) 172000
_________________________________________________________________
repeat_vector_3 (RepeatVecto (None, 7, 200) 0
_________________________________________________________________
lstm_6 (LSTM) (None, 7, 200) 320800
_________________________________________________________________
time_distributed_5 (TimeDist (None, 7, 100) 20100
_________________________________________________________________
time_distributed_6 (TimeDist (None, 7, 1) 101
=================================================================
Total params: 513,001
Trainable params: 513,001
Non-trainable params: 0
_________________________________________________________________
None
From the above summary, I do not understand what lstm_5 or lstm_6 is. The summary also doesn't tell me the number of hidden layers in the network.
Can someone please help me understand how many hidden layers there are in the above model, and with how many neurons?
I am basically confused by add(LSTM(200, ...)) and add(TimeDistributed(Dense(100, ...))).
I think 200 and 100 are the numbers of neurons in the hidden layers and that there are 4 hidden layers, one per .add().
Please correct me and clarify my doubts. If possible, explain it with a diagram.
[Image: pictorial representation of the model architecture, showing how the outputs of each layer feed into the next layer in the sequence.]
The picture is self-explanatory and matches your model summary. Also note that Batch_Size is None in the model summary because it is determined dynamically, and that in an LSTM the size of the hidden state is the same as the size of the LSTM's output.
Here you define an LSTM layer with 200 neurons. The 200-dim vector basically represents the sequence as an internal embedding:
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
Therefore you get an output of (None, 200).
Here you repeat the vector 7 times:
model.add(RepeatVector(n_outputs))
You get a tensor of shape (None, 7, 200).
You use this tensor again as a sequence and return the state of the 200 neurons at every timestep (it is unclear why):
model.add(LSTM(200, activation='relu', return_sequences=True))
You get a tensor of shape (None, 7, 200).
You apply a weight-sharing Dense layer with 100 neurons to every time step. I actually don't know why the weights are shared here; it seems weird:
model.add(TimeDistributed(Dense(100, activation='relu')))
You get a tensor of shape (None, 7, 100).
Finally, you apply one last neuron to each of these 7 time steps, again with shared weights, turning each 100-dim vector into a single value. The result is a vector of 7 outputs, one for each output time step:
model.add(TimeDistributed(Dense(1)))
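To tie the walkthrough back to the Param # column of the summary, here is a quick sanity check of the parameter counts; the only assumption is n_features = 14, which is what the 172,000 figure for the first LSTM implies:
n_features = 14  # assumption: implied by lstm_5's 172,000 params, not stated in the question
units = 200

# LSTM params = 4 * units * (input_dim + units + 1): four gates, each with
# input weights, recurrent weights, and a bias.
lstm_5 = 4 * units * (n_features + units + 1)  # 172,000
lstm_6 = 4 * units * (units + units + 1)       # 320,800
# Dense params = (input_dim + 1) * units; TimeDistributed shares them across the 7 steps.
td_dense_100 = (200 + 1) * 100                 # 20,100
td_dense_1 = (100 + 1) * 1                     # 101

print(lstm_5 + lstm_6 + td_dense_100 + td_dense_1)  # 513,001, matching the summary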

OneClass SVM model using pretrained ResNet50 network

I'm trying to build a one-class classifier for image recognition. I found this article, but because I don't have the full source code I don't exactly understand what I am doing.
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=42)
# X_train (2250, 200, 200, 3)
resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
features_array = resnet_model.predict(X_train)
# features_array (2250, 7, 7, 2048)
pca = PCA(svd_solver='randomized', n_components=450, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)
param_grid = {'svc__C': [1, 5, 10, 50], 'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)
grid.fit(X_train, y_train)
I have 2250 images (food and not food) of size 200x200 px, and I pass this data to the predict method of the ResNet50 model. The result is a (2250, 7, 7, 2048) tensor; does anyone know what this dimensionality means?
When I try to run the grid.fit method I get an error:
ValueError: Found array with dim 4. Estimator expected <= 2.
These are the findings I could come up with.
You are getting the output tensor from just above the global average pooling layer. (See resnet_model.summary() to see how the input dimensions are transformed into the output dimensions.)
For a simple fix, add an AveragePooling2D layer on top of resnet_model,
so that the output shape becomes (2250, 1, 1, 2048):
from keras.layers import AveragePooling2D
from keras.models import Model

resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
resnet_op = AveragePooling2D((7, 7), name='avg_pool_app')(resnet_model.output)
resnet_model = Model(resnet_model.input, resnet_op, name="ResNet")
This pooling layer is generally present in the source code of ResNet50 itself. Basically we are appending an AveragePooling2D layer to the ResNet50 model; the last line combines that layer (2nd line) and the base model into a new Model object.
Now the output (features_array) will have shape (2250, 1, 1, 2048) (because of the added average pooling layer).
To avoid the ValueError you ought to reshape this features_array to (2250, 2048):
features_array = np.reshape(features_array, (-1, 2048))
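As an aside (this alternative is not in the original answer, but it is part of the standard keras.applications API): ResNet50 also accepts a pooling argument, so global average pooling can be requested directly and the extracted features come out 2-D without any reshaping:
from keras.applications.resnet50 import ResNet50

# pooling='avg' appends global average pooling to the convolutional base,
# so predict() returns features of shape (2250, 2048) directly.
resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet',
                        include_top=False, pooling='avg')
features_array = resnet_model.predict(X_train)  # (2250, 2048)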
In the last line of the program in the question,
grid.fit(X_train, y_train)
you fit with X_train (which in this case holds the raw images). The correct variable here is features_array (which can be considered a summary of each image). Using this line instead will fix the error:
grid.fit(features_array, y_train)
For more fine-tuning in this fashion by extracting feature vectors, do look here (training with neural nets instead of using PCA and SVM).
Hope this helps!!

Word-level Seq2Seq with Keras

I was following the Keras Seq2Seq tutorial, and it works fine. However, this is a character-level model, and I would like to adapt it to a word-level model. The authors even include a paragraph with the required changes, but all my current attempts result in an error regarding wrong dimensions.
If you follow the character-level model, the input data has 3 dims: #sequences, #max_seq_len, #num_char, since each character is one-hot encoded. When I print the summary for the model as used in the tutorial, I get:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, None, 71) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, None, 94) 0
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, 256), (None, 335872 input_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) [(None, None, 256), 359424 input_2[0][0]
lstm_1[0][1]
lstm_1[0][2]
__________________________________________________________________________________________________
dense_1 (Dense) (None, None, 94) 24158 lstm_2[0][0]
==================================================================================================
This compiles and trains just fine.
Now this tutorial has a section "What if I want to use a word-level model with integer sequences?", and I've tried to follow those changes. Firstly, I encode all sequences using a word index. As such, the input and target data are now 2-dim: #sequences, #max_seq_len, since I no longer one-hot encode but now use Embedding layers.
encoder_input_data_train.shape => (90000, 9)
decoder_input_data_train.shape => (90000, 16)
decoder_target_data_train.shape => (90000, 16)
For example, a sequence might look like this:
[ 826. 288. 2961. 3127. 1260. 2108. 0. 0. 0.]
When I use the listed code:
# encoder
encoder_inputs = Input(shape=(None, ))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]
# decoder
decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
the model compiles and looks like this:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_35 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
input_36 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
embedding_32 (Embedding) (None, None, 256) 914432 input_35[0][0]
__________________________________________________________________________________________________
embedding_33 (Embedding) (None, None, 256) 914432 input_36[0][0]
__________________________________________________________________________________________________
lstm_32 (LSTM) [(None, 256), (None, 525312 embedding_32[0][0]
__________________________________________________________________________________________________
lstm_33 (LSTM) (None, None, 256) 525312 embedding_33[0][0]
lstm_32[0][1]
lstm_32[0][2]
__________________________________________________________________________________________________
dense_21 (Dense) (None, None, 3572) 918004 lstm_33[0][0]
While compile works, training
model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=32, epochs=1, validation_split=0.2)
fails with the following error: ValueError: Error when checking target: expected dense_21 to have 3 dimensions, but got array with shape (90000, 16), with the latter being the shape of the decoder input/target. Why does the Dense layer expect an array with the shape of the decoder input data?
Things I've tried:
I find it a bit strange that the decoder LSTM has return_sequences=True, since I thought I could not feed a sequence to a Dense layer (and the decoder of the original character-level model does not set this). However, simply removing it or setting return_sequences=False did not help. Of course, the Dense layer then has an output shape of (None, 3572).
I don't quite get the need for the Input layers. I've set them to shape=(max_input_seq_len, ) and shape=(max_target_seq_len, ) respectively, so that the summary doesn't show (None, None) but the respective values, e.g., (None, 16). No change.
In the Keras docs I've read that an Embedding layer should be used with input_length, otherwise a Dense layer upstream cannot compute its outputs. But again, I still get errors when I set input_length accordingly.
I'm a bit at a dead end here. Am I even on the right track, or am I missing something more fundamental? Is the shape of my data wrong? Why does the last Dense layer get an array with shape (90000, 16)? That seems rather off.
UPDATE: I figured out that the problem seems to be decoder_target_data, which currently has the shape (#samples, max_seq_len), e.g., (90000, 16). But I assume I need to one-hot encode the target output with respect to the vocabulary: (#samples, max_seq_len, vocab_size), e.g., (90000, 16, 3572).
Unfortunately, this throws a MemoryError. However, when for debugging purposes I assume a vocabulary size of 10:
decoder_target_data = np.zeros((len(input_sequences), max_target_seq_len, 10), dtype='float32')
and later in the decoder model:
x = Dense(10, activation='softmax')(x)
then the model trains without error. In case that's indeed my issue, I would have to train the model with manually generated batches so I can keep the vocabulary size but reduce the number of samples, e.g., to 90 batches each of shape (1000, 16, 3572). Am I on the right track here?
Recently I was also facing this problem. There is no other solution than creating small batches, say batch_size=64, in a generator and then using model.fit_generator instead of model.fit. I have attached my generate_batch code below:
def generate_batch(X, y, batch_size=64):
    '''Generate a batch of data'''
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_encoder_seq_length), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_decoder_seq_length + 2), dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_decoder_seq_length + 2, num_decoder_tokens), dtype='float32')
            for i, (input_text_seq, target_text_seq) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word_index in enumerate(input_text_seq):
                    encoder_input_data[i, t] = word_index  # encoder input seq
                for t, word_index in enumerate(target_text_seq):
                    decoder_input_data[i, t] = word_index
                    if (t > 0) & (word_index <= num_decoder_tokens):
                        decoder_target_data[i, t - 1, word_index - 1] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)
And then training like this:
batch_size = 64
epochs = 2

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(
    generator=generate_batch(X=X_train_sequences, y=y_train_sequences, batch_size=batch_size),
    steps_per_epoch=math.ceil(len(X_train_sequences) / batch_size),
    epochs=epochs,
    verbose=1,
    validation_data=generate_batch(X=X_val_sequences, y=y_val_sequences, batch_size=batch_size),
    validation_steps=math.ceil(len(X_val_sequences) / batch_size),
    workers=1,
)
X_train_sequences is a list of lists like [[23, 34, 56], [2, 33544, 6, 10]], and the other variables are built similarly.
I also took help from this blog: word-level-english-to-marathi-nmt
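For context, a quick back-of-the-envelope check (using the shapes from the question, and assuming float32 targets and a decoder length of roughly 18, i.e. max_decoder_seq_length + 2) shows why the generator is needed: the fully materialized one-hot target array cannot fit in memory, while a single batch easily does.
# Full one-hot decoder_target_data, as attempted in the question:
full_bytes = 90000 * 16 * 3572 * 4        # samples * seq_len * vocab_size * bytes per float32
print(full_bytes / 1024 ** 3)             # ~19.2 GiB -> MemoryError

# One generator batch of 64 samples (assumed decoder length of 18):
batch_bytes = 64 * 18 * 3572 * 4
print(batch_bytes / 1024 ** 2)            # ~15.7 MiB per batch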
