Understanding how many hidden layers are in a given LSTM model - machine-learning

I am not able to understand the basic structure of an LSTM model.
Here is my model:
def build_model(train, n_input):
    train_x, train_y = to_supervised(train, n_input)
    verbose, epochs, batch_size = 1, 60, 20
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
Here is my model.summary()
Layer (type) Output Shape Param #
=================================================================
lstm_5 (LSTM) (None, 200) 172000
_________________________________________________________________
repeat_vector_3 (RepeatVecto (None, 7, 200) 0
_________________________________________________________________
lstm_6 (LSTM) (None, 7, 200) 320800
_________________________________________________________________
time_distributed_5 (TimeDist (None, 7, 100) 20100
_________________________________________________________________
time_distributed_6 (TimeDist (None, 7, 1) 101
=================================================================
Total params: 513,001
Trainable params: 513,001
Non-trainable params: 0
_________________________________________________________________
None
From the above summary, I don't understand what lstm_5 or lstm_6 is. It also doesn't tell me the number of hidden layers in the network.
Could someone please help me understand how many hidden layers there are in the above model, and how many neurons each one has?
I am mainly confused by add(LSTM(200, ...)) and add(TimeDistributed(Dense(100, ...))).
I think 200 and 100 are the numbers of neurons in the hidden layers, and that there are 4 hidden layers, one for each .add().
Please correct me and clarify my doubts. If possible, explain with a diagram.

[Figure: pictorial representation of the model architecture, showing how the outputs of each layer are attached to the next layer in the sequence.]
The picture is self-explanatory and matches your model summary. Also note that Batch_Size is None in the model summary because it is determined dynamically. Also note that in an LSTM, the size of the hidden state is the same as the size of the LSTM's output.
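Since the diagram is not reproduced here, a minimal sketch for inspecting the layer structure yourself (it assumes the build_model() call and data from the question):
model = build_model(train, n_input)
for i, layer in enumerate(model.layers):
    # each layer's output shape is the input shape of the next layer
    print(i, layer.name, layer.output_shape)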

Here you define an LSTM layer with 200 neurons. The 200-dim vector basically represents the sequence as an internal embedding:
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
Therefore you get an output of shape (None, 200).
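As a sanity check on the summary, the parameter counts follow the standard LSTM formula 4 * units * (units + input_dim + 1). A small sketch of the arithmetic; the input_dim = 14 value is inferred from the 172,000 figure in the summary, not stated in the question:
units = 200
n_features = 14  # inferred from the 172,000 figure; not stated in the question
params = 4 * units * (units + n_features + 1)  # 4 gates, each with kernel, recurrent kernel, and bias
print(params)                     # 172000, matching lstm_5
print(4 * 200 * (200 + 200 + 1))  # 320800, matching lstm_6 (its input is the repeated 200-dim vector)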
Here you repeat the vector 7 times:
model.add(RepeatVector(n_outputs))
You get a tensor of shape (None, 7, 200).
You use this tensor as a sequence again and return the state of the 200 neurons at every timestep; it is unclear to me why:
model.add(LSTM(200, activation='relu', return_sequences=True))
You get a tensor of shape (None, 7, 200).
You apply a weight-sharing Dense layer with 100 neurons to every timestep. I actually don't know why the weights are shared here; it seems odd:
model.add(TimeDistributed(Dense(100, activation='relu')))
You get a tensor of shape (None, 7, 100).
Finally, you apply one last neuron to each of these 7 timesteps, again with shared weights, turning the 100-dim vector into a single value. The result is a vector of 7 outputs, one per output timestep:
model.add(TimeDistributed(Dense(1)))
This gives the final output shape of (None, 7, 1), matching time_distributed_6 in the summary.

Related

How to handle very large 3D data in Deep Learning

I have a large input feature as a 3D array of size 500x500x500, and 10000 such samples. The labels are of size 500x500x500x500.
I created a model with an input shape of 500x500x500 using only one Conv3D layer at the input and a Dense layer at the output (I have my own reasons for the Dense layer at the output); the output shape of the network is 500x500x500x500.
Below is the bare-minimum model I used:
ip = Input(shape=(500, 500, 500, 1))
x = Conv3D(100, 3, activation="relu", padding='same')(ip)
x = Dense(500, activation="softmax")(x)
nn = Model(inputs=ip, outputs=x)
Below is the summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) (None, 500, 500, 500, 1) 0
_________________________________________________________________
conv3d_4 (Conv3D) (None, 500, 500, 500, 100 2800
_________________________________________________________________
dense_4 (Dense) (None, 500, 500, 500, 500 50500
=================================================================
Total params: 53,300
Trainable params: 53,300
Non-trainable params: 0
_________________________________________________________________
When I run the model I get a memory error, even though I have 64 GB of RAM and a Quadro P5000 NVIDIA GPU.
Another way to make it work was to split the input into 100 chunks of size 5x500x500, making the network input of size 5x500x500. Now I have 10000x100 = 1000000 samples of size 5x500x500. Below is the modified network:
ip = Input(shape=(5,500,500,1))
x = Conv3D(100,3,activation="relu",padding='same')(ip)
x = Dense(500,activation="softmax")(x)
nn = Model(inputs=ip, outputs=x)
Below is the summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) (None, 5, 500, 500, 1) 0
_________________________________________________________________
conv3d_5 (Conv3D) (None, 5, 500, 500, 100) 2800
_________________________________________________________________
dense_5 (Dense) (None, 5, 500, 500, 500) 50500
=================================================================
Total params: 53,300
Trainable params: 53,300
Non-trainable params: 0
_________________________________________________________________
Clearly the total number of parameters is the same, but now I am able to train the network because I can load the data into RAM. However, the network is not able to learn, because it cannot see all the information at once; it only sees 5 of those slices at a time. The information is distributed over the whole 500x500x500 array, so the network cannot figure out anything by looking at only one 5x500x500 chunk.
Please suggest how I can get around this. I want my network to use all the information for prediction, not just one chunk.
You could resize your inputs before feeding them to the CNN. For example, I would go for 100x100x100 or even 50x50x50. Another option is to add a Max/Mean/Average Pooling layer before the convolution layer as a form of dimensionality reduction.
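A minimal sketch of the pooling idea, assuming tf.keras; the pooling factor is an illustrative choice, and the 500x500x500x500 labels would need to be downsampled (or the output upsampled) to match the smaller output shape:
from tensorflow.keras.layers import Input, AveragePooling3D, Conv3D, Dense
from tensorflow.keras.models import Model

ip = Input(shape=(500, 500, 500, 1))
x = AveragePooling3D(pool_size=(5, 5, 5))(ip)              # downsample each axis 5x: (100, 100, 100, 1)
x = Conv3D(100, 3, activation="relu", padding='same')(x)
x = Dense(500, activation="softmax")(x)                    # output (100, 100, 100, 500)
nn = Model(inputs=ip, outputs=x)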
I had the same problem, and I found that if I ran the whole process using only the CPU it worked (but it took ages). I am not sure if this adds any value to your concerns.

ValueError: strides should be of length 1, 1 or 3 but was 2

train input shape : (13974, 100, 6, 5)
train output shape : (13974, 1,1)
test input shape : (3494, 100, 6, 5)
test output shape : (3494, 1, 1)
I am developing the following 2D CNN + LSTM model.
model = Sequential()
model.add(TimeDistributed(Conv2D(1, (1, 1), activation='relu', input_shape=(6, 5, 1))))
model.add(TimeDistributed(MaxPooling2D(pool_size=(6, 5))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=False, input_shape=(100, 1)))
model.add(Dense(1))
When I try to fit it as follows:
model.fit(train_input, train_output, epochs=50, batch_size=60)
it gives me an error:
ValueError: strides should be of length 1, 1 or 3 but was 2
Please correct my model. I am converting each 6x5 image to a single unit and predicting the 101st timestep from the previous 100 timesteps.
Your question is quite unclear, but I believe you have a sequence of 100 images of size 6 x 5. It is better to use Conv3D in your use case, and there is no need to have TimeDistributed everywhere. This is just an illustration for your use case; you may have to add more Conv and MaxPool layers and experiment with other hyperparameters to get a good fit.
# Add the channel dimension to the input
train_input = np.expand_dims(train_input, -1)
# Remove the extra dimension from the output
train_output = np.reshape(train_output, (-1, 1))

model = Sequential()
# Treat the 100 timesteps as the first spatial dimension of a 3D volume
model.add(Conv3D(1, (1, 1, 1), activation='relu', input_shape=(100, 6, 5, 1)))
# pool_size=(6, 5, 1) reduces (100, 6, 5) to (16, 1, 5); output is (None, 16, 1, 5, 1)
model.add(MaxPooling3D(pool_size=(6, 5, 1)))
# Flatten the pooled volume into a (16, 5) sequence for the LSTM
model.add(Reshape((16, 5)))
model.add(LSTM(units=300, return_sequences=False))
model.add(Dense(1))
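A hedged compile-and-fit sketch to go with this model; the loss and optimizer are assumptions (the answer does not specify them), while the epochs and batch size follow the question:
model.compile(optimizer='adam', loss='mse')  # assumed regression setup; not stated in the answer
model.fit(train_input, train_output, epochs=50, batch_size=60)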

OneClass SVM model using pretrained ResNet50 network

I'm trying to build a one-class classifier for image recognition. I found this article, but since I don't have the full source code I don't exactly understand what I am doing.
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=42)
# X_train (2250, 200, 200, 3)
resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
features_array = resnet_model.predict(X_train)
# features_array (2250, 7, 7, 2048)
pca = PCA(svd_solver='randomized', n_components=450, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)
param_grid = {'svc__C': [1, 5, 10, 50], 'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)
grid.fit(X_train, y_train)
I have 2250 images (food and not food) of size 200x200 px, and I pass this data to the predict method of the ResNet50 model. The result is a (2250, 7, 7, 2048) tensor; does anyone know what this dimensionality means?
When I try to run the grid.fit method I get an error:
ValueError: Found array with dim 4. Estimator expected <= 2.
These are the findings I could make.
You are getting the output tensor from just above the global average pooling layer. (See resnet_model.summary() to understand how the input dimension changes into the output dimension.)
For a simple fix, add an AveragePooling2D layer on top of resnet_model, so that the output shape becomes (2250, 1, 1, 2048):
resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
resnet_op = AveragePooling2D((7, 7), name='avg_pool_app')(resnet_model.output)
resnet_model = Model(resnet_model.input, resnet_op, name="ResNet")
This is essentially what the source code of ResNet50 itself does: we append an AveragePooling2D layer to the ResNet50 model. The last line combines the new layer (2nd line) and the base model into a single Model object.
Now the output (features_array) will have shape (2250, 1, 1, 2048) (because of the added average pooling layer).
To avoid the ValueError, you need to reshape this features_array to (2250, 2048):
features_array = np.reshape(features_array, (-1, 2048))
In the last line of the program in the question,
grid.fit(X_train, y_train)
you fit with X_train (which here are the raw images). The correct variable is features_array (which can be considered a summary of each image). Using this line instead will fix the error:
grid.fit(features_array, y_train)
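Putting the answer's steps together, a sketch of the corrected feature-extraction-and-fit flow (variable names follow the question, and np is assumed to be NumPy):
features_array = resnet_model.predict(X_train)            # (2250, 1, 1, 2048) with the added pooling layer
features_array = np.reshape(features_array, (-1, 2048))   # flatten to (2250, 2048)
grid.fit(features_array, y_train)                         # fit the PCA + SVC pipeline on the feature vectors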
For more fine-tuning in this fashion by extracting feature vectors, have a look here (training with neural nets instead of using PCA and SVM).
Hope this helps!!

Word-level Seq2Seq with Keras

I was following the Keras Seq2Seq tutorial, and it works fine. However, this is a character-level model, and I would like to adapt it to a word-level model. The authors even include a paragraph with the required changes, but all my current attempts result in errors regarding wrong dimensions.
If you follow the character-level model, the input data has 3 dimensions: #sequences, #max_seq_len, #num_char, since each character is one-hot encoded. When I plot the summary for the model as used in the tutorial, I get:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, None, 71) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, None, 94) 0
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, 256), (None, 335872 input_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) [(None, None, 256), 359424 input_2[0][0]
lstm_1[0][1]
lstm_1[0][2]
__________________________________________________________________________________________________
dense_1 (Dense) (None, None, 94) 24158 lstm_2[0][0]
==================================================================================================
This compiles and trains just fine.
Now this tutorial has a section "What if I want to use a word-level model with integer sequences?", and I've tried to follow those changes. First, I encode all sequences using a word index. As such, the input and target data is now 2-dimensional: #sequences, #max_seq_len, since I no longer one-hot encode but use Embedding layers instead.
encoder_input_data_train.shape => (90000, 9)
decoder_input_data_train.shape => (90000, 16)
decoder_target_data_train.shape => (90000, 16)
For example, a sequence might look like this:
[ 826. 288. 2961. 3127. 1260. 2108. 0. 0. 0.]
When I use the listed code:
# encoder
encoder_inputs = Input(shape=(None, ))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]
# decoder
decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
the model compiles and looks like this:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_35 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
input_36 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
embedding_32 (Embedding) (None, None, 256) 914432 input_35[0][0]
__________________________________________________________________________________________________
embedding_33 (Embedding) (None, None, 256) 914432 input_36[0][0]
__________________________________________________________________________________________________
lstm_32 (LSTM) [(None, 256), (None, 525312 embedding_32[0][0]
__________________________________________________________________________________________________
lstm_33 (LSTM) (None, None, 256) 525312 embedding_33[0][0]
lstm_32[0][1]
lstm_32[0][2]
__________________________________________________________________________________________________
dense_21 (Dense) (None, None, 3572) 918004 lstm_33[0][0]
While compile works, training
model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=32, epochs=1, validation_split=0.2)
fails with the following error: ValueError: Error when checking target: expected dense_21 to have 3 dimensions, but got array with shape (90000, 16), the latter being the shape of the decoder input/target. Why does the Dense layer get an array with the shape of the decoder input data?
Things I've tried:
I find it a bit strange that the decoder LSTM has return_sequences=True, since I thought I could not feed a sequence into a Dense layer (and the decoder of the original character-level model does not set this). However, simply removing it or setting return_sequences=False did not help. Of course, the Dense layer then has an output shape of (None, 3572).
I don't quite get the need for the Input layers. I've set them to shape=(max_input_seq_len, ) and shape=(max_target_seq_len, ) respectively, so that the summary shows the respective values, e.g., (None, 16), instead of (None, None). No change.
In the Keras docs I've read that an Embedding layer should be used with input_length, otherwise a Dense layer downstream cannot compute its outputs. But again, I still get errors when I set input_length accordingly.
I'm a bit at a dead end here. Am I even on the right track, or am I missing something more fundamental? Is the shape of my data wrong? Why does the last Dense layer get an array with shape (90000, 16)? That seems rather off.
UPDATE: I figured out that the problem seems to be decoder_target_data, which currently has the shape (#samples, max_seq_len), e.g., (90000, 16). I assume I need to one-hot encode the target output with respect to the vocabulary: (#samples, max_seq_len, vocab_size), e.g., (90000, 16, 3572).
Unfortunately, this throws a memory error. However, when, for debugging purposes, I assume a vocabulary size of 10:
decoder_target_data = np.zeros((len(input_sequences), max_target_seq_len, 10), dtype='float32')
and later in the decoder model:
x = Dense(10, activation='softmax')(x)
then the model trains without error. If that is indeed my issue, I would have to train the model with manually generated batches so I can keep the vocabulary size but reduce the number of samples, e.g., to 90 batches each of shape (1000, 16, 3572). Am I on the right track here?
Recently I was also facing this problem. There is no solution other than creating small batches, say batch_size=64, in a generator and then using model.fit_generator instead of model.fit. I have attached my generate_batch code below:
def generate_batch(X, y, batch_size=64):
    ''' Generate a batch of data '''
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_encoder_seq_length), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_decoder_seq_length + 2), dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_decoder_seq_length + 2, num_decoder_tokens), dtype='float32')
            for i, (input_text_seq, target_text_seq) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word_index in enumerate(input_text_seq):
                    encoder_input_data[i, t] = word_index  # encoder input seq
                for t, word_index in enumerate(target_text_seq):
                    decoder_input_data[i, t] = word_index
                    if (t > 0) & (word_index <= num_decoder_tokens):
                        decoder_target_data[i, t - 1, word_index - 1] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)
And then training like this:
batch_size = 64
epochs = 2
# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(
    generator=generate_batch(X=X_train_sequences, y=y_train_sequences, batch_size=batch_size),
    steps_per_epoch=math.ceil(len(X_train_sequences) / batch_size),
    epochs=epochs,
    verbose=1,
    validation_data=generate_batch(X=X_val_sequences, y=y_val_sequences, batch_size=batch_size),
    validation_steps=math.ceil(len(X_val_sequences) / batch_size),
    workers=1,
)
X_train_sequences is a list of lists like [[23, 34, 56], [2, 33544, 6, 10]], and similarly for the others.
Also took help from this blog - word-level-english-to-marathi-nmt

Keras - shapes mismatch using convolutional nets

I built my Keras model in the following way (this is of course not the final production-ready model):
self.model = Sequential()
self.model.add(Conv2D(32, (3, 3), input_shape=(674, 514, 1), padding='same', activation='relu'))
self.model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
Model summary is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 674, 514, 32) 320
=================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________
I try to fit it in the following way:
self.model.fit(self.input_images, self.output_images, batch_size=32, epochs=10, verbose=1, shuffle=True)
The shapes of the training input and output (self.input_images, self.output_images) are both (100, 674, 514, 1).
When I try to train my model I get the following exception:
ValueError: Error when checking target: expected conv2d_1 to have shape
(674, 514, 32) but got array with shape (674, 514, 1)
Any help is much appreciated.
The mismatch is with your output_images. The result of the convolutional layer is (None, 674, 514, 32) because it has 32 filters. The mean_squared_error loss tells Keras to expect a label shape compatible with this output (which the supplied output_images is not).
The model isn't finished; normally a CNN has many convolutional and downsampling layers, so the output shape will be different. But if you want, you can make this model work either by changing the number of filters to 1...
Conv2D(1, ...)
... or by making output_images a tensor of shape (100, 674, 514, 32).
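A minimal sketch of the first option, keeping everything else from the question's code, so that the network output matches the (100, 674, 514, 1) targets:
self.model = Sequential()
self.model.add(Conv2D(1, (3, 3), input_shape=(674, 514, 1), padding='same', activation='relu'))
self.model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
self.model.fit(self.input_images, self.output_images, batch_size=32, epochs=10, verbose=1, shuffle=True)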
