Multiple instance model in Keras - machine-learning

I have a multiple instance dataset for which I want to predict the instance category as well as a (derived) bag label using Keras' Functional API. Simple instance prediction works and getting a bag label from that also works. But since the bag label is outside of the model the results seem to be suboptimal.
My thinking is as follows:
For each instance in a bag, start up a separate branch of the model.
After running each instance through its branch, concatenate the results.
After concatenation, predict the bag label based on probabilities
What I have written so far - here, n_instances is the number of instances per bag, n_feat the number of features per instance, and n_classes the number of possible categories an instance/bag can belong to.
from keras.layers import *
inputs = []
instance_layer = [None] * n_instances
for i in range(n_instances):
inp = Input(shape=n_feat)
inputs.append(inp)
instance_layer[i] = Dense(units=256, activation='ReLU')(inp)
instance_layer[i] = Dense(units=128, activation='ReLU')(instance_layer[i])
instance_layer[i] = Dense(units=64, activation='ReLU')(instance_layer[i])
instance_layer[i] = Dense(units=n_classes + 1, activation='sigmoid')(instance_layer[i]) # output to be converted to one-hot vector
output_tensor = Concatenate()(instance_layer)
"""
Code to go from concatenated tensor to a single bag prediction
"""
model = tf.keras.models.Model(inputs, output_tensor)
Issues:
It seems to me like each instance sees a separate model while I want to keep the models identical
Concatenate() produces a tensor of length n_instances*n_classes, whereas I'm interested in a tensor of shape (n_instances, n_classes). I would prefer to use CategoricalCrossEntropy as a loss function.
Any pointers on how to go from this tensor of instance predictions to a bag prediction?

For posterity:
instance_model = tf.keras.models.Sequential([
Dense(units=256, name='fc_256', activation='ReLU', input_dim=n_feat),
Dense(units=128, name='fc_128', activation='ReLU'),
Dense(units=64, name='fc_64', activation='ReLU'),
Dense(units=n_classes+1, name='label_predictions', activation='sigmoid')
])
This is then wrapped in a TimeDistributed layer which returns a tensor with n_instances rows and n_classes+1 columns, for an input tensor of n_instances rows and n_feat columns. n_instances is variable here, hence the None in the input shape:
inputs = Input(shape=(None, n_feat), name="input")
instance_output = TimeDistributed(instance_model)(inputs)
# Condense into bag prediction
bag_output = GlobalAveragePooling1D(name="pooling")(instance_output)
model = tf.keras.models.Model(inputs, bag_output)

Related

LSTM time series forecast for multiple time series' ( single model vs individual model performance)

I would like to use deep(stacked) lstm model to forecast the disk utilisation for all my clusters. But what I am experiencing is my individual model representing a single clusters- time series is giving more accurate performance, when compared to a single model representing all the time series'. Single model is a kind of averaging out the forecast though I am including the cluster-id as part of a feature passed to the model.
My training sequence logic goes as given below, where my time series
has [ 'utilisation','clusterID'] as two features.
for i in range(len(sequences)):
# find the end of this pattern
end_ix = i + n_steps_in
out_end_ix = end_ix + n_steps_out-1
# check if we are beyond the dataset
if out_end_ix > len(sequences):
break
# gather input and output parts of the pattern
seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix-1:out_end_ix, 0]
X.append(seq_x)
y.append(seq_y)
return array(X), array(y)```
- **Model**
trainX = trainX.reshape((trainX.shape[0], trainX.shape[1],features))
trainY = trainY.reshape((trainY.shape[0], trainY.shape[1]))
input = Input(batch_input_shape=(batch_size,look_back,features), name='input', dtype='float32')
optimizer = keras.optimizers.RMSprop(learning_rate=0.0001, rho=0.9, epsilon=None, decay=0.0)
model = Sequential()
model.add(Bidirectional(LSTM(800, return_sequences=True,return_state=False,input_shape=(look_back, features),stateful=stateful)))
model.add((LSTM(800, return_sequences=True,return_state=False,stateful=stateful)))
model.add(attention(return_sequences=False))
model.add(Dense(units=out_num))
model.compile(metrics=[rmse],run_eagerly=True,loss='mean_squared_error', optimizer='adam')
history = model.fit(trainX, trainY, batch_size=5000, epochs=100, verbose=2, validation_split=0.1)
- **and prediction logic as follows**
testX, testY = create_dataset(predictList.values,look_back,out_num)
reshaped_test=np.reshape(testX[0],(1,look_back,2))
futureStepPredict = model.predict(reshaped_test)
I was expecting an LSTM model can be trained on multiple timeseries at one time itself and can give individual forecast with almost the same accuracy as if it was considered as a single model based on the cluster-id provided as input, is this a wrong expectation?

PyTorch: Predicting future values with LSTM

I'm currently working on building an LSTM model to forecast time-series data using PyTorch. I used lag features to pass the previous n steps as inputs to train the network. I split the data into three sets, i.e., train-validation-test split, and used the first two to train the model. My validation function takes the data from the validation data set and calculates the predicted valued by passing it to the LSTM model using DataLoaders and TensorDataset classes. Initially, I've got pretty good results with R2 values in the region of 0.85-0.95.
However, I have an uneasy feeling about whether this validation function is also suitable for testing my model's performance. Because the function now takes the actual X values, i.e., time-lag features, from the DataLoader to predict y^ values, i.e., predicted target values, instead of using the predicted y^ values as features in the next prediction. This situation seems far from reality where the model has no clue of the real values of the previous time steps, especially if you forecast time-series data for longer time periods, say 3-6 months.
I'm currently a bit puzzled about tackling this issue and defining a function to predict future values relying on the model's values rather than the actual values in the test set. I have the following function predict, which makes a one-step prediction, but I haven't really figured out how to predict the whole test dataset using DataLoader.
def predict(self, x):
# convert row to data
x = x.to(device)
# make prediction
yhat = self.model(x)
# retrieve numpy array
yhat = yhat.to(device).detach().numpy()
return yhat
You can find how I split and load my datasets, my constructor for the LSTM model, and the validation function below. If you need more information, please do not hesitate to reach out to me.
Splitting and Loading Datasets
def create_tensor_datasets(X_train_arr, X_val_arr, X_test_arr, y_train_arr, y_val_arr, y_test_arr):
train_features = torch.Tensor(X_train_arr)
train_targets = torch.Tensor(y_train_arr)
val_features = torch.Tensor(X_val_arr)
val_targets = torch.Tensor(y_val_arr)
test_features = torch.Tensor(X_test_arr)
test_targets = torch.Tensor(y_test_arr)
train = TensorDataset(train_features, train_targets)
val = TensorDataset(val_features, val_targets)
test = TensorDataset(test_features, test_targets)
return train, val, test
def load_tensor_datasets(train, val, test, batch_size=64, shuffle=False, drop_last=True):
train_loader = DataLoader(train, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
val_loader = DataLoader(val, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
test_loader = DataLoader(test, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
return train_loader, val_loader, test_loader
Class LSTM
class LSTMModel(nn.Module):
def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
super(LSTMModel, self).__init__()
self.hidden_dim = hidden_dim
self.layer_dim = layer_dim
self.lstm = nn.LSTM(
input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
)
self.fc = nn.Linear(hidden_dim, output_dim)
def forward(self, x, future=False):
h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
out = out[:, -1, :]
out = self.fc(out)
return out
Validation (defined within a trainer class)
def validation(self, val_loader, batch_size, n_features):
with torch.no_grad():
predictions = []
values = []
for x_val, y_val in val_loader:
x_val = x_val.view([batch_size, -1, n_features]).to(device)
y_val = y_val.to(device)
self.model.eval()
yhat = self.model(x_val)
predictions.append(yhat.cpu().detach().numpy())
values.append(y_val.cpu().detach().numpy())
return predictions, values
I've finally found a way to forecast values based on predicted values from the earlier observations. As expected, the predictions were rather accurate in the short-term, slightly becoming worse in the long term. It is not so surprising that the future predictions digress over time, as they no longer depend on the actual values. Reflecting on my results and the discussions I had on the topic, here are my take-aways:
In real-life cases, the real values can be retrieved and fed into the model at each step of the prediction -be it weekly, daily, or hourly- so that the next step can be predicted with the actual values from the previous step. So, testing the performance based on the actual values from the test set may somewhat reflect the real performance of the model that is maintained regularly.
However, for predicting future values in the long term, forecasting, if you will, you need to make either multiple one-step predictions or multi-step predictions that span over the time period you wish to forecast.
Making multiple one-step predictions based on the values predicted the model yields plausible results in the short term. As the forecasting period increases, the predictions become less accurate and therefore less fit for the purpose of forecasting.
To make multiple one-step predictions and update the input after each prediction, we have to work our way through the dataset one by one, as if we are going through a for-loop over the test set. Not surprisingly, this makes us lose all the computational advantages that matrix operations and mini-batch training provide us.
An alternative could be predicting sequences of values, instead of predicting the next value only, say using RNNs with multi-dimensional output with many-to-many or seq-to-seq structure. They are likely to be more difficult to train and less flexible to make predictions for different time periods. An encoder-decoder structure may prove useful for solving this, though I have not implemented it by myself.
You can find the code for my function that forecasts the next n_steps based on the last row of the dataset X (time-lag features) and y (target value). To iterate over each row in my dataset, I would set batch_size to 1 and n_features to the number of lagged observations.
def forecast(self, X, y, batch_size=1, n_features=1, n_steps=100):
predictions = []
X = torch.roll(X, shifts=1, dims=2)
X[..., -1, 0] = y.item(0)
with torch.no_grad():
self.model.eval()
for _ in range(n_steps):
X = X.view([batch_size, -1, n_features]).to(device)
yhat = self.model(X)
yhat = yhat.to(device).detach().numpy()
X = torch.roll(X, shifts=1, dims=2)
X[..., -1, 0] = yhat.item(0)
predictions.append(yhat)
return predictions
The following line shifts values in the second dimension of the tensor by one so that a tensor [[[x1, x2, x3, ... , xn ]]] becomes [[[xn, x1, x2, ... , x(n-1)]]].
X = torch.roll(X, shifts=1, dims=2)
And, the line below selects the first element from the last dimension of the 3d tensor and sets that item to the predicted value stored in the NumPy ndarray (yhat), [[xn+1]]. Then, the new input tensor becomes [[[x(n+1), x1, x2, ... , x(n-1)]]]
X[..., -1, 0] = yhat.item(0)
Recently, I've decided to put together the things I had learned and the things I would have liked to know earlier. If you'd like to have a look, you can find the links down below. I hope you'll find it useful. Feel free to comment or reach out to me if you agree or disagree with any of the remarks I made above.
Building RNN, LSTM, and GRU for time series using PyTorch
Predicting future values with RNN, LSTM, and GRU using PyTorch

Using keras.layers.Concatenate correctly [duplicate]

I am trying to merge output from two models and give them as input to the third model using keras sequential model.
Model1 :
inputs1 = Input(shape=(750,))
x = Dense(500, activation='relu')(inputs1)
x = Dense(100, activation='relu')(x)
Model1 :
inputs2 = Input(shape=(750,))
y = Dense(500, activation='relu')(inputs2)
y = Dense(100, activation='relu')(y)
Model3 :
merged = Concatenate([x, y])
final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(100, activation='relu'))
final_model.add(Dense(3, activation='softmax'))
Till here, my understanding is that, output from two models as x and y are merged and given as input to the third model. But when I fit this all like,
module3.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
module3.fit([in1, in2], np_res_array)
in1 and in2 are two numpy ndarray of dimention 10000*750 which contains my training data and np_res_array is the corresponding target. This gives me error as 'list' object has no attribute 'shape' As far as know, this is how we give multiple inputs to a model, but what is this error? How do I resolve it?
You can't do this using Sequential API. That's because of two reasons:
Sequential models, as their name suggests, are a sequence of layers where each layer is connected directly to its previous layer and therefore they cannot have branches (e.g. merge layers, multiple input/output layers, skip connections, etc.).
The add() method of Sequential API accepts a Layer instance as its argument and not a Tensor instance. In your example merged is a Tensor (i.e. concatenation layer's output).
Further, the correct way of using Concatenate layer is like this:
merged = Concatenate()([x, y])
However, you can also use concatenate (note the lowercase "c"), its equivalent functional interface, like this:
merged = concatenate([x, y])
Finally, to be able to construct that third model you also need to use the functional API.

Using different loss functions for different outputs simultaneously Keras?

I'm trying to make a network that outputs a depth map, and semantic segmentation data separately.
In order to train the network, I'd like to use categorical cross entropy for the segmentation branch, and mean squared error for the branch that outputs the depth map.
I couldn't find any info on implementing the two loss functions for each branches in the Keras documentation for the Functional API.
Is it possible for me to use these loss functions simultaneously during training, or would it be better for me to train the different branches separately?
From the documentation of Model.compile:
loss: String (name of objective function) or objective function. See
losses. If the model has multiple outputs, you can use a different
loss on each output by passing a dictionary or a list of losses. The
loss value that will be minimized by the model will then be the sum of
all individual losses.
If your output is named, you can use a dictionary mapping the names to the corresponding losses:
x = Input((10,))
out1 = Dense(10, activation='softmax', name='segmentation')(x)
out2 = Dense(10, name='depth')(x)
model = Model(x, [out1, out2])
model.compile(loss={'segmentation': 'categorical_crossentropy', 'depth': 'mse'},
optimizer='adam')
Otherwise, use a list of losses (in the same order as the corresponding model outputs).
x = Input((10,))
out1 = Dense(10, activation='softmax')(x)
out2 = Dense(10)(x)
model = Model(x, [out1, out2])
model.compile(loss=['categorical_crossentropy', 'mse'], optimizer='adam')

Keras: model with one input and two outputs, trained jointly on different data (semi-supervised learning)

I would like to code with Keras a neural network that acts both as an autoencoder AND a classifier for semi-supervised learning. Take for example this dataset where there is a few labeled images and a lot of unlabeled images: https://cs.stanford.edu/~acoates/stl10/
Some papers listed here achieved that, or very similar things, successfully.
To sum up: if the model would have the same input data shape and the same "encoding" convolutional layers, but would split into two heads (fork-style), so there is a classification head and a decoding head, in a way that the unsupervised autoencoder will contribute to a good learning for the classification head.
With TensorFlow there would be no problem doing that as we have full control over the computational graph.
But with Keras, things are more high-level and I feel that all the calls to ".fit" must always provide all the data at once (so it would force me to tie together the classification head and the autoencoding head into one time-step).
One way in keras to almost do that would be with something that goes like this:
input = Input(shape=(32, 32, 3))
cnn_feature_map = sequential_cnn_trunk(input)
classification_predictions = Dense(10, activation='sigmoid')(cnn_feature_map)
autoencoded_predictions = decode_cnn_head_sequential(cnn_feature_map)
model = Model(inputs=[input], outputs=[classification_predictions, ])
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit([images], [labels, images], epochs=10)
However, I think and I fear that if I just want to fit things in that way it will fail and ask for the missing head:
for epoch in range(10):
# classifications step
model.fit([images], [labels, None], epochs=1)
# "semi-unsupervised" autoencoding step
model.fit([images], [None, images], epochs=1)
# note: ".train_on_batch" could probably be used rather than ".fit" to avoid doing a whole epoch each time.
How should one implement that behavior with Keras? And could the training be done jointly without having to split the two calls to the ".fit" function?
Sometimes when you don't have a label you can pass zero vector instead of one hot encoded vector. It should not change your result because zero vector doesn't have any error signal with categorical cross entropy loss.
My custom to_categorical function looks like this:
def tricky_to_categorical(y, translator_dict):
encoded = np.zeros((y.shape[0], len(translator_dict)))
for i in range(y.shape[0]):
if y[i] in translator_dict:
encoded[i][translator_dict[y[i]]] = 1
return encoded
When y contains labels, and translator_dict is a python dictionary witch contains labels and its unique keys like this:
{'unisex':2, 'female': 1, 'male': 0}
If an UNK label can't be found in this dictinary then its encoded label will be a zero vector
If you use this trick you also have to modify your accuracy function to see real accuracy numbers. you have to filter out all zero vectors from our metrics
def tricky_accuracy(y_true, y_pred):
mask = K.not_equal(K.sum(y_true, axis=-1), K.constant(0)) # zero vector mask
y_true = tf.boolean_mask(y_true, mask)
y_pred = tf.boolean_mask(y_pred, mask)
return K.cast(K.equal(K.argmax(y_true, axis=-1), K.argmax(y_pred, axis=-1)), K.floatx())
note: You have to use larger batches (e.g. 32) in order to prevent zero matrix update, because It can make your accuracy metrics crazy, I don't know why
Alternative solution
Use Pseudo Labeling :)
you can train jointly, you have to pass an array insted of single label.
I used fit_generator, e.g.
model.fit_generator(
batch_generator(),
steps_per_epoch=len(dataset) / batch_size,
epochs=epochs)
def batch_generator():
batch_x = np.empty((batch_size, img_height, img_width, 3))
gender_label_batch = np.empty((batch_size, len(gender_dict)))
category_label_batch = np.empty((batch_size, len(category_dict)))
while True:
i = 0
for idx in np.random.choice(len(dataset), batch_size):
image_id = dataset[idx][0]
batch_x[i] = load_and_convert_image(image_id)
gender_label_batch[i] = gender_labels[idx]
category_label_batch[i] = category_labels[idx]
i += 1
yield batch_x, [gender_label_batch, category_label_batch]

Resources