Keras autoencoder model to detect anomalies in text - machine-learning

I am trying to create an autoencoder that is capable of finding anomalies in text sequences:
X_train_pada_seq.shape
(28840, 999)
I want to use an Embedding layer. Here is my model:
encoder_inputs = Input(shape=(max_len_str, ))
encoder_emb = Embedding(input_dim=len(word_index)+1, output_dim=20, input_length=laenge_pads)(encoder_inputs)
encoder_LSTM_1 = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(encoder_emb)
encoder_drop = Dropout(0.2)(encoder_LSTM_1)
encoder_LSTM_2 = Bidirectional(GRU(200, activation='relu', return_sequences=False, name = 'bottleneck'))(encoder_drop)
decoder_repeated = RepeatVector(200)(encoder_LSTM_2)
decoder_LSTM = Bidirectional(LSTM(400, activation='relu', return_sequences=True))(decoder_repeated)
decoder_drop = Dropout(0.2)(decoder_LSTM)
decoder_output = TimeDistributed(Dense(999, activation='softmax'))(decoder_drop)
autoencoder = Model(encoder_inputs, decoder_output)
autoencoder.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
autoencoder.summary()
Model: "model_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 999)] 0
_________________________________________________________________
embedding_19 (Embedding) (None, 999, 20) 159660
_________________________________________________________________
bidirectional (Bidirectional (None, 999, 800) 1347200
_________________________________________________________________
dropout (Dropout) (None, 999, 800) 0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 400) 1202400
_________________________________________________________________
repeat_vector (RepeatVector) (None, 200, 400) 0
_________________________________________________________________
bidirectional_2 (Bidirection (None, 200, 800) 2563200
_________________________________________________________________
dropout_1 (Dropout) (None, 200, 800) 0
_________________________________________________________________
time_distributed_6 (TimeDist (None, 200, 999) 800199
=================================================================
Total params: 6,072,659
Trainable params: 6,072,659
Non-trainable params: 0
But when training the model:
history = autoencoder.fit(X_train_pada_seq, X_train_pada_seq, epochs=10, batch_size=64,
validation_data=(X_test_pada_seq, X_test_pada_seq))
I get an error:
ValueError: Shapes (None, 999) and (None, 200, 999) are incompatible
How can I change the model to fix the error?

Looking at your code snippet, your model's output needs to match your target shape, which is (None, 999), but your output shape is (None, 200, 999).
You need to make the model's output shape match the target shape.
Try using tf.math.reduce_mean with axis=1, which averages over the whole sequence (time) axis:
decoder_drop = Dropout(0.2)(decoder_LSTM)
decoder_time = TimeDistributed(Dense(999, activation='softmax'))(decoder_drop)
decoder_output = tf.math.reduce_mean(decoder_time, axis=1)
This should let you fit the model.
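To see what reduce_mean over axis=1 does to the shapes, here is a small NumPy sketch. NumPy stands in for the TensorFlow op, and random softmax values stand in for the TimeDistributed(Dense(999, activation='softmax')) output; the batch size of 2 is arbitrary.

```python
import numpy as np

batch, timesteps, vocab = 2, 200, 999
rng = np.random.default_rng(0)

# Stand-in for the TimeDistributed(Dense(999, softmax)) output:
# one probability distribution over 999 classes at each of the 200 timesteps.
logits = rng.normal(size=(batch, timesteps, vocab))
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
decoder_time = exp / exp.sum(axis=-1, keepdims=True)

# NumPy equivalent of tf.math.reduce_mean(decoder_time, axis=1):
# average over the time axis, which removes that axis.
decoder_output = decoder_time.mean(axis=1)

print(decoder_time.shape)    # (2, 200, 999)
print(decoder_output.shape)  # (2, 999)
# An average of probability distributions is still a distribution.
print(np.allclose(decoder_output.sum(axis=-1), 1.0))  # True
```

The pooled output has the (None, 999) shape the loss expects and each row still sums to 1, but the per-timestep detail is averaged away.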

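Your last (output) layer should have this shape:
batch_size x 999 x vocab_size  # 999 words (one prediction per input position), vocab_size = len(word_index) + 1 classes for each word
Currently the output of your model is
batch_size x 200 x 999
which is incorrect.
Use sparse categorical cross-entropy as the loss function (the targets can then stay as integer token ids of shape batch_size x 999); then it will work.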
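A minimal sketch of a model whose output lines up with integer targets of shape (batch_size, max_len) under sparse categorical cross-entropy. It assumes the inputs are integer token ids padded to max_len and that vocab_size = len(word_index) + 1; build_text_autoencoder is a hypothetical helper written for tf.keras (TensorFlow 2.x), the layer sizes are shrunk so it runs quickly, and the recurrent layers keep their default tanh activation instead of relu. The two changes that matter are RepeatVector(max_len) instead of RepeatVector(200), and a final Dense with one unit per vocabulary entry.

```python
import numpy as np
from tensorflow import keras


def build_text_autoencoder(max_len, vocab_size, emb_dim=20, units=64, latent=32):
    """Hypothetical helper: sequence autoencoder over integer token ids."""
    inputs = keras.Input(shape=(max_len,))
    x = keras.layers.Embedding(input_dim=vocab_size, output_dim=emb_dim)(inputs)
    x = keras.layers.Bidirectional(keras.layers.LSTM(units, return_sequences=True))(x)
    x = keras.layers.Dropout(0.2)(x)
    # Bottleneck: a single vector per sequence, shape (batch, 2 * latent)
    code = keras.layers.Bidirectional(keras.layers.GRU(latent))(x)
    # Repeat the code once per input position: max_len steps, not 200
    x = keras.layers.RepeatVector(max_len)(code)
    x = keras.layers.Bidirectional(keras.layers.LSTM(units, return_sequences=True))(x)
    x = keras.layers.Dropout(0.2)(x)
    # One softmax over the vocabulary at every position: (batch, max_len, vocab_size)
    outputs = keras.layers.TimeDistributed(
        keras.layers.Dense(vocab_size, activation="softmax"))(x)
    model = keras.Model(inputs, outputs)
    # Integer targets of shape (batch, max_len) pair with sparse categorical cross-entropy
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy")
    return model


# Smoke test on random token ids (sizes are illustrative only)
demo = build_text_autoencoder(max_len=12, vocab_size=50)
x_demo = np.random.randint(0, 50, size=(8, 12))
history = demo.fit(x_demo, x_demo, epochs=1, batch_size=4, verbose=0)
```

For the question's data this would be build_text_autoencoder(max_len=999, vocab_size=len(word_index) + 1), fitted with X_train_pada_seq as both input and target. Keep in mind that a full softmax over the vocabulary at all 999 positions makes the output tensor large, so the batch size may need to be reduced.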

Related

Dense layer does not give the expected output shape

I am trying to copy a model architecture. In the original architecture, the last Dense layer has output shape (None, 3) with 300 params, as shown:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_Dense1 (Dense) (None, 100) 128100
dense_Dense2 (Dense) (None, 3) 300
But when I apply the Dense layer, I get output shape (None, 3) with 303 params, as shown below:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_35 (Dense) (None, 100) 128100
dense_36 (Dense) (None, 3) 303
This is the code I wrote for this part:
x = GlobalAveragePooling2D()(x)
x = Dense(100, activation="relu")(x)
prediction = Dense(3, activation='softmax')(x)
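Is it possible that the architecture you're trying to copy doesn't use a bias? A Dense(3) layer on 100 inputs has 100 × 3 = 300 kernel weights plus 3 bias terms, which gives your 303; without the bias it has exactly 300. Try not using a bias: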
Dense(3, activation='softmax', use_bias=False)
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
global_average_pooling2d_3 ( (None, 8) 0
_________________________________________________________________
dense_9 (Dense) (None, 100) 900
_________________________________________________________________
dense_10 (Dense) (None, 3) 300
=================================================================
Total params: 1,200
Trainable params: 1,200
Non-trainable params: 0
_________________________________________________________________
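The parameter counts in these summaries follow from simple arithmetic: a Dense layer has an in_features × units kernel plus, by default, one bias per unit. A small sketch (dense_params is a hypothetical helper, and the 1280 input features are inferred from 128100 = 1280 × 100 + 100):

```python
def dense_params(in_features: int, units: int, use_bias: bool = True) -> int:
    """Trainable parameters of a Dense layer: kernel plus optional bias."""
    kernel = in_features * units          # one weight per (input, unit) pair
    bias = units if use_bias else 0       # one bias per unit when enabled
    return kernel + bias


print(dense_params(1280, 100))               # 128100 (first Dense, with bias)
print(dense_params(100, 3))                  # 303 (with bias, the Keras default)
print(dense_params(100, 3, use_bias=False))  # 300 (matches the original architecture)
print(dense_params(8, 100))                  # 900 (dense_9 in the example above)
```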

How do I add more layers to an existing model (e.g. a Teachable Machine model)?

I'm trying to use the Google model from the Teachable Machine application https://teachablemachine.withgoogle.com/ and add a few more layers before the output layer.
When I retrain the model, it always returns this error:
ValueError: Input 0 of layer dense_25 is incompatible with the layer: expected axis -1 of input shape to have value 5 but received input with shape [20, 512]
Here's my approach:
When I retrain the model, it returns an error:
If I retrain the model without adding new layers, it works fine.
Can anybody advise what the issue is?
UPDATED ANSWER
If you want to add layers between two layers of a pre-trained model, it is not as straightforward as adding layers with the add method; doing so results in unexpected behavior.
Analysis of the error:
If you compile the model as below (as you specified):
model.layers[-1].add(Dense(512, activation="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(32))
model.add(Dense(5))
Output of the model summary:
Model: "sequential_12"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential_9 (Sequential) (None, 1280) 410208
_________________________________________________________________
sequential_11 (Sequential) (None, 512) 131672
_________________________________________________________________
dense_12 (Dense) (None, 128) 768
_________________________________________________________________
dense_13 (Dense) (None, 32) 4128
_________________________________________________________________
dense_14 (Dense) (None, 5) 165
=================================================================
Total params: 546,941
Trainable params: 532,861
Non-trainable params: 14,080
_________________________________________________________________
Everything looks good here, but on closer inspection:
for l in model.layers:
print("layer : ", l.name, ", expects input of shape : ",l.input_shape)
Output:
layer : sequential_9 , expects input of shape : (None, 224, 224, 3)
layer : sequential_11 , expects input of shape : (None, 1280)
layer : dense_12 , expects input of shape : (None, 5) <-- **PROBLEM**
layer : dense_13 , expects input of shape : (None, 128)
layer : dense_14 , expects input of shape : (None, 32)
The PROBLEM here is that dense_12 expects an input of shape (None, 5), but it should expect (None, 512), since we added Dense(512) to sequential_11. A possible reason is that adding layers this way does not update some attributes, such as the output shape of sequential_11, so during the forward pass there is a mismatch between the output of sequential_11 and the input of dense_12 (dense_25 in your case).
A possible workaround:
Regarding your question about adding layers between sequential_9 and sequential_11: you can add as many layers as you want between them, but always make sure that the output shape of the last added layer matches the input shape expected by sequential_11, which in this case is 1280.
Code:
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
sequential_1 = model.layers[0]  # re-using the pre-trained model (sequential_9)
sequential_2 = model.layers[1]  # (sequential_11)
inp_sequential_1 = Input(sequential_1.layers[0].input_shape[1:])
out_sequential_1 = sequential_1(inp_sequential_1)
# adding layers between sequential_9 and sequential_11
out_intermediate = Dense(512, activation="relu")(out_sequential_1)
out_intermediate = Dense(128, activation="relu")(out_intermediate)
out_intermediate = Dense(32, activation="relu")(out_intermediate)
# always include a final layer whose output shape matches the input shape of sequential_11, in this case 1280
out_intermediate = Dense(1280, activation="relu")(out_intermediate)
output = sequential_2(out_intermediate)  # the output of the intermediate layers is fed to sequential_11
final_model = Model(inputs=inp_sequential_1, outputs=output)
Output of the model summary:
Model: "functional_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
sequential_9 (Sequential) (None, 1280) 410208
_________________________________________________________________
dense_15 (Dense) (None, 512) 655872
_________________________________________________________________
dense_16 (Dense) (None, 128) 65664
_________________________________________________________________
dense_17 (Dense) (None, 32) 4128
_________________________________________________________________
dense_18 (Dense) (None, 1280) 42240
_________________________________________________________________
sequential_11 (Sequential) (None, 5) 128600
=================================================================
Total params: 1,306,712
Trainable params: 1,292,632
Non-trainable params: 14,080
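The Dense parameter counts in both summaries can be checked by hand, which also supports the diagnosis above. In the sketch below (dense_chain_params is a hypothetical helper; all layers are assumed to use a bias, as Keras does by default), each layer's count is in_features × units + units and its width becomes the next layer's input. The inserted layers reproduce the fixed summary and end at width 1280, while dense_12 to dense_14 in the broken summary only add up if dense_12 was built for 5 input features.

```python
def dense_chain_params(in_features, widths):
    """Parameter counts (kernel + bias) of Dense layers applied one after another."""
    counts = []
    for units in widths:
        counts.append(in_features * units + units)
        in_features = units  # the next Dense consumes this layer's output
    return counts, in_features


# Layers inserted between sequential_9 (1280 outputs) and sequential_11 in the fixed model
counts, out_width = dense_chain_params(1280, [512, 128, 32, 1280])
print(counts)     # [655872, 65664, 4128, 42240]
print(out_width)  # 1280, the input width sequential_11 expects

# dense_12..dense_14 from the broken summary: consistent with a 5-feature input
print(dense_chain_params(5, [128, 32, 5])[0])    # [768, 4128, 165]
# With the intended 512-feature input, dense_12 would have had 65664 params
print(dense_chain_params(512, [128])[0])         # [65664]
```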

What is the effect of using TimeDistributed layer wrapper?

Consider the following two models:
from tensorflow.python.keras.layers import Input, GRU, Dense, TimeDistributed
from tensorflow.python.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
dense = Dense(200, activation='softmax')
decoder_pred = TimeDistributed(dense)(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, None, 100) 0
_________________________________________________________________
gru (GRU) (None, None, 32) 12768
_________________________________________________________________
time_distributed (TimeDistri (None, None, 200) 6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
And the second model:
from tensorflow.python.keras.layers import Input, GRU, Dense
from tensorflow.python.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
decoder_pred = Dense(200, activation='softmax')(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, None, 100) 0
_________________________________________________________________
gru_1 (GRU) (None, None, 32) 12768
_________________________________________________________________
dense_1 (Dense) (None, None, 200) 6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
My question is: is the TimeDistributed wrapper doing anything in the first model? Are the two models different in any respect, given that their total numbers of parameters are identical?
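One way to look at this is to write out the arithmetic. The NumPy sketch below assumes the behaviour documented for Dense in TF 2.x Keras: on an input of rank greater than 2 it takes the dot product along the last axis, with one shared kernel. Under that assumption, a plain Dense on the GRU output and a Dense applied separately at each timestep compute the same values with the same 32 × 200 + 200 = 6600 parameters. Random weights are used, and the softmax is omitted because it acts on the last axis per timestep in both cases.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, features, units = 2, 5, 32, 200
x = rng.normal(size=(batch, timesteps, features))  # like gru_out
W = rng.normal(size=(features, units))             # one shared kernel
b = rng.normal(size=(units,))                      # one shared bias

# Dense on a 3-D tensor: contract over the last axis only
dense_out = x @ W + b

# TimeDistributed(Dense): apply the same kernel separately at every timestep
td_out = np.stack([x[:, t, :] @ W + b for t in range(timesteps)], axis=1)

print(dense_out.shape)                 # (2, 5, 200)
print(np.allclose(dense_out, td_out))  # True
print(W.size + b.size)                 # 6600
```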

Keras target dimensions mismatch

I'm attempting a single-label classification problem with num_classes = 73.
Here's my simplified Keras model:
num_classes = 73
batch_size = 4
train_data_list = [training_file_names list here..]
validation_data_list = [ validation_file_names list here..]
training_generator = DataGenerator(train_data_list, batch_size, num_classes)
validation_generator = DataGenerator(validation_data_list, batch_size, num_classes)
model = Sequential()
model.add(Conv1D(32, 3, strides=1, input_shape=(15,120), activation="relu"))
model.add(Conv1D(16, 3, strides=1, activation="relu"))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy",optimizer=sgd,metrics=['accuracy'])
model.fit_generator(generator=training_generator, epochs=100,
validation_data=validation_generator)
Here's my DataGenerator's __get_item__ method:
def __get_item__(self):
X = np.zeros((self.batch_size,15,120))
y = np.zeros((self.batch_size, 1 ,self.n_classes))
for i in range(self.batch_size):
X_row = some_method_that_gives_X_of_15x120_dim()
target = some_method_that_gives_target()
one_hot = keras.utils.to_categorical(target, num_classes=self.n_classes)
X[i] = X_row
y[i] = one_hot
return X, y
Since my X values are correctly returned with dimension (batch_size, 15, 120), I am not showing it here. My issue is with the y value returned.
The y returned from this generator method has shape (batch_size, 1, 73), a one-hot encoded label over the 73 classes, which I think is the correct shape to return.
However, Keras gives the following error for the last layer:
ValueError: Error when checking target: expected dense_1 to have 2
dimensions, but got array with shape (4, 1, 73)
Since the batch size is 4, I think the target batch should also be 3-dimensional: (4, 1, 73). Why, then, is Keras expecting the last layer to have 2 dimensions?
Your model's summary shows that the output layer has only 2 dimensions, (None, 73):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_7 (Conv1D) (None, 13, 32) 11552
_________________________________________________________________
conv1d_8 (Conv1D) (None, 11, 16) 1552
_________________________________________________________________
flatten_5 (Flatten) (None, 176) 0
_________________________________________________________________
dense_4 (Dense) (None, 73) 12921
=================================================================
Total params: 26,025
Trainable params: 26,025
Non-trainable params: 0
_________________________________________________________________
Since your target has shape (batch_size, 1, 73), you can simply change it to (batch_size, 73) for your model to run.
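A NumPy sketch of the two target layouts, with hypothetical labels for one batch of 4. In the generator this would mean allocating y with np.zeros((self.batch_size, self.n_classes)) (assuming each one_hot row has shape (73,)), or squeezing the extra axis with np.squeeze(y, axis=1) before returning.

```python
import numpy as np

batch_size, n_classes = 4, 73
labels = np.array([3, 10, 0, 72])  # hypothetical integer class ids for one batch

# Layout from the question's generator: an extra axis of length 1
y_3d = np.zeros((batch_size, 1, n_classes))
y_3d[np.arange(batch_size), 0, labels] = 1.0

# Layout matching the Dense(73, softmax) output (None, 73)
y_2d = np.zeros((batch_size, n_classes))
y_2d[np.arange(batch_size), labels] = 1.0

# Dropping the length-1 axis turns one layout into the other
print(np.squeeze(y_3d, axis=1).shape)                  # (4, 73)
print(np.array_equal(np.squeeze(y_3d, axis=1), y_2d))  # True
```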

Fine-tuning a model deletes previously added layers

I use Keras 2.2.4. I train a model that I want to fine-tune every 30 epochs with new data content (image classification).
Every day I add more images to the classes to feed the model. Every 30 epochs the model is retrained.
I use two conditions: the first for when no model has been trained yet, and the second for when a model is already trained and I want to fine-tune it with new content/classes.
model_base = keras.applications.vgg19.VGG19(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
output = GlobalAveragePooling2D()(model_base.output)
# If we resume a pretrained model load it
if os.path.isfile(os.path.join(MODEL_PATH, 'weights.h5')):
print('Using existing weights...')
base_lr = 0.0001
model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
model = Model(model_base.input, output)
for layer in model_base.layers[:-2]:
layer.trainable = False
else:
base_lr = 0.001
output = BatchNormalization()(output)
output = Dropout(0.5)(output)
output = Dense(2048, activation='relu', name='d1')(output)
output = BatchNormalization()(output)
output = Dropout(0.5)(output)
output = Dense(len(all_character_names), activation='softmax', name='d2')(output)
model = Model(model_base.input, output)
for layer in model_base.layers[:-5]:
layer.trainable = False
opt = optimizers.Adam(lr=base_lr, decay=base_lr / epochs)
model.compile(optimizer=opt,
loss='categorical_crossentropy',
metrics=['accuracy'])
Model summary the first time:
...
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 512) 2048
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
d1 (Dense) (None, 2048) 1050624
_________________________________________________________________
batch_normalization_2 (Batch (None, 2048) 8192
_________________________________________________________________
dropout_2 (Dropout) (None, 2048) 0
_________________________________________________________________
d2 (Dense) (None, 19) 38931
=================================================================
Total params: 21,124,179
Trainable params: 10,533,907
Non-trainable params: 10,590,272
Model summary the second time:
...
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
d2 (Dense) (None, 19) 9747
=================================================================
Total params: 20,034,131
Trainable params: 2,369,555
Non-trainable params: 17,664,576
Problem: when a model exists and is loaded for fine-tuning, it seems to have lost all the additional layers added the first time (Dense 2048, Dropout, etc.).
Do I need to add these layers again? That makes no sense, as it would lose the training done in the first pass.
Note: I may not need to set base_lr, as saving a model should also save the learning rate at the state where it stopped, but I will check this later.
Please note that once you load the model:
model = load_model(os.path.join(MODEL_PATH, 'weights.h5'))
You don't use it; you just overwrite it:
model = Model(model_base.input, output)
where output is also defined as an operation on model_base.
It seems to me that you just want to delete the lines after load_model.
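A minimal sketch of the resume/build logic with those lines removed. It uses a tiny hypothetical model instead of VGG19 so it runs without downloading weights, and it is written for tf.keras (TensorFlow 2.x) rather than the Keras 2.2.4 used in the question; build_new_model, get_model and the layer sizes are illustrative. The point is only that the loaded model is compiled and returned as-is instead of being replaced by a new Model(model_base.input, output).

```python
import os
import tempfile

from tensorflow import keras


def build_new_model(n_classes):
    """Hypothetical tiny stand-in for VGG19 + GlobalAveragePooling2D + the custom head."""
    inputs = keras.Input(shape=(8,))
    x = keras.layers.Dense(16, activation="relu", name="base")(inputs)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.Dropout(0.5)(x)
    x = keras.layers.Dense(32, activation="relu", name="d1")(x)
    outputs = keras.layers.Dense(n_classes, activation="softmax", name="d2")(x)
    return keras.Model(inputs, outputs)


def get_model(model_path, n_classes):
    if os.path.isfile(model_path):
        # Resume: load_model restores the whole saved architecture (d1, BatchNormalization,
        # Dropout, ...) with its weights. No new Model is built on top of the base
        # afterwards, since that would throw the loaded model away.
        model = keras.models.load_model(model_path, compile=False)
        base_lr = 1e-4
    else:
        model = build_new_model(n_classes)
        base_lr = 1e-3
    # compile=False above because the model is (re)compiled here in both branches
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=base_lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.h5")
    first = get_model(path, n_classes=3)   # no file yet: builds the full head
    first.save(path)
    second = get_model(path, n_classes=3)  # file exists: reloads it unchanged
```

Loading with compile=False is a choice made here because the model is recompiled right after. If the optimizer state (including the learning rate) should carry over, as the note in the question suggests, loading with compile=True and skipping the recompile would be the direction to explore.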
