Vector representation of time series in Keras stateful LSTM autoencoder - machine-learning

I am implementing a LSTM autoencoder in Keras to get a vector representation of my time series data.
The series I have are very long and so I am using stateful LSTMs.
I create non-overlapping windows of each series and input them to the autoencoder.
See code below.
I am unclear about how to get the vector representation of a time series:
What is the vector representation of the series? Is it the encoder hidden state or the encoder output?
Each sequence is broken into windows, and when I call predict I get an [encoder_outputs, state_h, state_c] per window.
Which window contains the vector representation of the entire sequence? Is it the last window? The first?
# Building the model.
# Stateful layers need a fixed batch size, hence batch_shape.
inputs = Input(batch_shape=(batch_size, window_size, input_dim))
encoded = LSTM(latent_dim, stateful=True)(inputs)
decoded = RepeatVector(window_size)(encoded)
decoded = LSTM(input_dim, return_sequences=True, stateful=True)(decoded)
# Map back to the original input dimension so the reconstruction matches the input shape.
decoded = TimeDistributed(Dense(input_dim, activation='linear'))(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

# Predicting using the encoder
encoded_out = encoder.predict(X, batch_size=batch_size)

# For each sequence in X, we take the output of the last window as the
# vector representing the entire sequence.
# Is this correct?
seqVector = encoded_out[-batch_size:]
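For reference, one way to make this concrete with a stateful encoder: because the LSTM state is carried across successive predict calls, the encoder output obtained after the final window of a sequence has "seen" the whole series, so it is the natural candidate for the sequence vector. A minimal sketch of that idea (X_windows and the batch_size = 1 assumption are mine, not from the post):
# Sketch: encode one long sequence window by window with the stateful encoder above.
# Assumes batch_size = 1 and X_windows holds one sequence's non-overlapping windows
# in chronological order, shape (n_windows, window_size, input_dim).
encoder.reset_states()                        # start from a clean state for this sequence
for w in range(X_windows.shape[0]):
    window = X_windows[w:w + 1]               # keep the batch axis: (1, window_size, input_dim)
    z = encoder.predict(window, batch_size=1)
# The output for the last window summarizes the whole sequence via the carried-over state.
seq_vector = z[0]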

Related

Applying encoding to train and test data separately

I have a feature in my dataset State, so after splitting I apply encoding to train set like this
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(drop='first', handle_unknown='ignore'), ['State'])], remainder='passthrough')
encoded_X_train = ct.fit_transform(X_train)
and train model like this
regressor = LinearRegression()
regressor.fit(encoded_X_train, y_train)
then encode and predict like this
encoded_X_test = ct.fit_transform(X_test)
y_pred = regressor.predict(encoded_X_test)
Is this the right process of doing so, or am I doing something wrong?
No. You should fit the encoder using the training data only.
fit_transform first fits the transformer to the data it is given and then transforms it, so calling it on the test set refits the encoder on the test data (leaking information and possibly producing columns inconsistent with the training-time encoding).
You should use the following code instead.
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(drop='first', handle_unknown='ignore'), ['State'])], remainder='passthrough')
encoded_X_train = ct.fit_transform(X_train)
encoded_X_test = ct.transform(X_test)
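Putting the whole corrected pipeline together (a sketch; X_train, X_test, and y_train are the question's variables):
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression

ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(drop='first', handle_unknown='ignore'), ['State'])],
    remainder='passthrough')

encoded_X_train = ct.fit_transform(X_train)   # fit on the training data only
encoded_X_test = ct.transform(X_test)         # reuse the fitted encoder, no refitting

regressor = LinearRegression()
regressor.fit(encoded_X_train, y_train)
y_pred = regressor.predict(encoded_X_test)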

sequence encoding by MultiHeadAttention

I am trying to encode a sequence of image embeddings into one bigger embedding using MultiHeadAttention (order doesn't matter).
I want an operation similar to passing a sequence of shape (batch_size, sequence_length, embedding_dim) to an LSTM layer and taking the last hidden state as an embedding that holds all the important information about the sequence, i.e. a sequence embedding.
I want to implement that with attention to get rid of the recurrent behavior.
# embeddings: shape (batch_size, sequence_length, embedding_dim)
# batch_first=True so the module accepts (batch, seq, dim) inputs
multihead_attn = nn.MultiheadAttention(embedding_dim, num_heads, batch_first=True)
attn_output, attn_output_weights = multihead_attn(embeddings, embeddings, embeddings)
but in this case the attention output will have a shape of (batch_size, sequence_length, embedding_dim) as well.
Should I be doing
attn_output = attn_output.mean(1)
or what if I pass the query as embeddings.mean(1)? Will that give the intended behavior, i.e. will the output act as a sequence embedding?
# embeddings: shape (batch_size, sequence_length, embedding_dim)
multihead_attn = nn.MultiheadAttention(embedding_dim, num_heads, batch_first=True)
# keepdim=True keeps the query 3-D: (batch_size, 1, embedding_dim)
seq_emb, attn_output_weights = multihead_attn(embeddings.mean(1, keepdim=True), embeddings, embeddings)
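One common alternative (my suggestion, not from the question) is to attend with a single learned query vector, CLS-token style, so the model learns how to pool the sequence; attention over the keys/values is permutation-invariant, so order still doesn't matter. A minimal PyTorch sketch:
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Pools (batch, seq_len, dim) into (batch, dim) with a learned query."""
    def __init__(self, embedding_dim, num_heads):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, embedding_dim))
        self.attn = nn.MultiheadAttention(embedding_dim, num_heads, batch_first=True)

    def forward(self, embeddings):
        # Expand the single learned query across the batch: (batch, 1, dim)
        q = self.query.expand(embeddings.size(0), -1, -1)
        pooled, _ = self.attn(q, embeddings, embeddings)
        return pooled.squeeze(1)       # (batch, embedding_dim)

# usage sketch
# pool = AttentionPooling(embedding_dim=256, num_heads=4)
# seq_emb = pool(embeddings)           # (batch_size, 256)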

Keras Denoising Autoencoder (tabular data)

I have a project where I am doing a regression with Gradient Boosted Trees using tabular data. I want to see if using a denoising autoencoder on my data can find a better representation of my original data and improve my original GBT scores. Inspiration is taken from the popular Kaggle winner here.
AFAIK I have two main choices for extracting the activations of the DAE: creating a bottleneck structure and taking the single middle layer's activations, or concatenating every layer's activations as the representation.
Let's assume I want all layer activations from the 3x 512 node layers below:
inputs = Input(shape=(31,))
encoded = Dense(512, activation='relu')(inputs)
encoded = Dense(512, activation='relu')(encoded)
decoded = Dense(512, activation='relu')(encoded)
decoded = Dense(31, activation='linear')(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='Adam', loss='mse')
history = autoencoder.fit(x_train_noisy, x_train_clean,
                          epochs=100,
                          batch_size=128,
                          shuffle=True,
                          validation_data=(x_test_noisy, x_test_clean),
                          callbacks=[reduce_lr])
My questions are:
Taking the activations of the above will give me a new representation of x_train, right? Should I repeat this process for x_test? I need both to train my GBT model.
How can I do inference? Each new data point will need to be "converted" into this new representation format. How can I do that with Keras?
Do I actually need to provide validation_data= to .fit in this situation?
Taking the activations of the above will give me a new representation of x_train, right? Should I repeat this process for x_test? I need both to train my GBT model.
Of course. You need the denoised representation for both the training and testing data, because the GBT model you train afterwards only accepts the denoised features.
How can I do inference? Each new data point will need to be "converted" into this new representation format. How can I do that with Keras?
If you want to use the denoised/reconstructed features, you can directly use autoencoder.predict(X_feat) to extract them. If you want to use the middle layer, you need to build a new model, encoder_only = Model(inputs, encoded), first and use it for feature extraction (see the sketch after the remarks below).
Do I actually need to provide validation_data= to .fit in this situation?
You had better hold out some training data for validation to watch for overfitting. However, you can always train multiple models, e.g. in a leave-one-out fashion, and ensemble them to make full use of the data.
Additional remarks:
512 hidden neurons seems to be too many for your task
consider using Dropout
be careful with tabular data, especially when data in different columns have different dynamic ranges (i.e. MSE does not fairly weight the reconstruction errors of different columns).
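A minimal sketch of that encoder-only extraction on the model from the question (y_train and the GradientBoostingRegressor are illustrative assumptions; swap in your own GBT library):
# 'encoded' is the 512-unit layer from the question's model above.
encoder_only = Model(inputs, encoded)

# Apply the same transformation to train, test, and any future data points.
x_train_repr = encoder_only.predict(x_train_clean)
x_test_repr = encoder_only.predict(x_test_clean)

from sklearn.ensemble import GradientBoostingRegressor
gbt = GradientBoostingRegressor()
gbt.fit(x_train_repr, y_train)
y_pred = gbt.predict(x_test_repr)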
A denoising autoencoder is a model that learns to remove noise from data: it is trained on the noisy data with the corresponding clean data as the target.
The model you are describing above is not a typical (denoising) autoencoder. In the encoder part, the number of units should gradually decrease from layer to layer, and in the decoder part it should gradually increase again, so that the network has a bottleneck.
A simple autoencoder model should look like this:
input = Input(shape=(31,))
encoded = Dense(128, activation='relu')(input)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(decoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(31, activation='sigmoid')(decoded)
autoencoder = Model(input, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x_train_noisy, x_train_clean,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test_noisy, x_test_clean))

Keras: model with one input and two outputs, trained jointly on different data (semi-supervised learning)

I would like to code with Keras a neural network that acts both as an autoencoder AND a classifier for semi-supervised learning. Take for example this dataset where there is a few labeled images and a lot of unlabeled images: https://cs.stanford.edu/~acoates/stl10/
Some papers listed here achieved that, or very similar things, successfully.
To sum up: the model would share the same input and the same "encoding" convolutional layers, but would then split into two heads (fork-style), a classification head and a decoding head, so that the unsupervised autoencoding task contributes to good features for the classification head.
With TensorFlow there would be no problem doing that as we have full control over the computational graph.
But with Keras, things are more high-level and I feel that all the calls to ".fit" must always provide all the data at once (so it would force me to tie together the classification head and the autoencoding head into one time-step).
One way in Keras to almost do that would be something like this:
input = Input(shape=(32, 32, 3))
cnn_feature_map = sequential_cnn_trunk(input)
classification_predictions = Dense(10, activation='sigmoid')(cnn_feature_map)
autoencoded_predictions = decode_cnn_head_sequential(cnn_feature_map)
model = Model(inputs=[input], outputs=[classification_predictions, autoencoded_predictions])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([images], [labels, images], epochs=10)
However, I think (and fear) that if I try to fit the two heads alternately like this, it will fail and ask for the missing target:
for epoch in range(10):
    # classification step
    model.fit([images], [labels, None], epochs=1)
    # "semi-unsupervised" autoencoding step
    model.fit([images], [None, images], epochs=1)
    # note: ".train_on_batch" could probably be used rather than ".fit" to avoid doing a whole epoch each time.
How should one implement that behavior with Keras? And could the training be done jointly without having to split the two calls to the ".fit" function?
Sometimes, when you don't have a label, you can pass a zero vector instead of a one-hot encoded vector. This should not change your result, because a zero vector produces no error signal with the categorical cross-entropy loss.
My custom to_categorical function looks like this:
def tricky_to_categorical(y, translator_dict):
    encoded = np.zeros((y.shape[0], len(translator_dict)))
    for i in range(y.shape[0]):
        if y[i] in translator_dict:
            encoded[i][translator_dict[y[i]]] = 1
    return encoded
Here y contains the labels, and translator_dict is a Python dictionary which maps labels to their unique indices, like this:
{'unisex': 2, 'female': 1, 'male': 0}
If a label (e.g. an unknown/UNK label) can't be found in this dictionary, its encoding will be a zero vector.
If you use this trick, you also have to modify your accuracy function to see real accuracy numbers: you have to filter out all zero vectors from the metric.
import tensorflow as tf
from keras import backend as K

def tricky_accuracy(y_true, y_pred):
    mask = K.not_equal(K.sum(y_true, axis=-1), K.constant(0))  # mask out zero-vector (unlabeled) rows
    y_true = tf.boolean_mask(y_true, mask)
    y_pred = tf.boolean_mask(y_pred, mask)
    return K.cast(K.equal(K.argmax(y_true, axis=-1), K.argmax(y_pred, axis=-1)), K.floatx())
Note: you have to use larger batches (e.g. 32) in order to avoid updates where the whole label matrix is zero; that can make your accuracy metric go crazy, I'm not sure why.
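To connect this to the two-headed model from the question, here is a hedged sketch of how the zero-vector labels and the masked accuracy could be wired in at compile/fit time; labels_with_zeros (built with tricky_to_categorical above) and the 'clf' layer name are my assumptions, not from the original post:
# Classification head uses softmax so all-zero label rows contribute zero categorical cross-entropy.
classification_predictions = Dense(10, activation='softmax', name='clf')(cnn_feature_map)
autoencoded_predictions = decode_cnn_head_sequential(cnn_feature_map)
model = Model(inputs=[input], outputs=[classification_predictions, autoencoded_predictions])
model.compile(optimizer='rmsprop',
              loss=['categorical_crossentropy', 'mse'],
              metrics={'clf': tricky_accuracy})
model.fit([images], [labels_with_zeros, images], epochs=10, batch_size=32)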
Alternative solution
Use Pseudo Labeling :)
You can train jointly; you have to pass an array of targets instead of a single label.
I used fit_generator, e.g.
model.fit_generator(
    batch_generator(),
    steps_per_epoch=len(dataset) // batch_size,
    epochs=epochs)

def batch_generator():
    batch_x = np.empty((batch_size, img_height, img_width, 3))
    gender_label_batch = np.empty((batch_size, len(gender_dict)))
    category_label_batch = np.empty((batch_size, len(category_dict)))
    while True:
        i = 0
        for idx in np.random.choice(len(dataset), batch_size):
            image_id = dataset[idx][0]
            batch_x[i] = load_and_convert_image(image_id)
            gender_label_batch[i] = gender_labels[idx]
            category_label_batch[i] = category_labels[idx]
            i += 1
        yield batch_x, [gender_label_batch, category_label_batch]
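Adapted to the question's classifier + autoencoder setup, the generator would simply yield the input images again as the second target. A minimal sketch (images as a numpy array and labels_with_zeros are assumed names following the question and the previous answer):
def semi_supervised_generator():
    # Yields (inputs, [classification_targets, reconstruction_targets]).
    while True:
        idx = np.random.choice(len(images), batch_size)
        batch_x = images[idx]
        batch_labels = labels_with_zeros[idx]   # zero vectors for unlabeled images
        yield batch_x, [batch_labels, batch_x]  # the autoencoder head reconstructs its own input

model.fit_generator(semi_supervised_generator(),
                    steps_per_epoch=len(images) // batch_size,
                    epochs=10)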

How to do softmax when LSTM returns sequence in keras?

I am trying to make a sequence to sequence encoder decoder model and need to softmax the last layer to use categorical cross entropy.
I've tried setting activation of the last LSTM layer to 'softmax' but that doesn't seem to do the trick. Adding another dense layer and setting the activation to softmax doesn't help either. What is the correct way to do a softmax when your last LSTM outputs a sequence?
inputs = Input(batch_shape=(batch_size, timesteps, input_dim), name='hella')
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(inputs)
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(encoded)
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(encoded)
encoded = LSTM(latent_dim, return_sequences=False)(encoded)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
# do softmax here
sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(loss='categorical_crossentropy', optimizer='adam')
Figured it out:
As of Keras 2, you can simply add:
TimeDistributed(Dense(input_dim, activation='softmax'))
TimeDistributed applies the Dense layer to every time step independently. Documentation can be found here: https://keras.io/layers/wrappers/
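Placed in the question's model, the decoder end looks roughly like this (a sketch; only the TimeDistributed line is new compared to the original code):
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
# softmax over the input_dim classes at every time step
decoded = TimeDistributed(Dense(input_dim, activation='softmax'))(decoded)
sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(loss='categorical_crossentropy', optimizer='adam')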
