How to join two LSTMs with a dense layer as given in the figures? - machine-learning

How can I make a network architecture like this (sketch) in Keras?

OK, since your sketch and your description of the problem are very vague, all I can give you is an equally vague answer. I can only guess specific network details such as activation functions or input/output dimensions, so adjust them to your problem accordingly.
So, generally speaking, what you can do in Keras is use the Functional API:
# imports needed for the snippet below
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# adjust dimensions and shapes accordingly
left_input = Input(shape=(100,), name='left_input')
left_x = Embedding(output_dim=512, input_dim=10000, input_length=100)(left_input)
left_lstm_out = LSTM(32)(left_x)
left_output = Dense(16, activation='sigmoid', name='left_output')(left_lstm_out)

# adjust dimensions and shapes accordingly
right_input = Input(shape=(100,), name='right_input')
right_x = Embedding(output_dim=512, input_dim=10000, input_length=100)(right_input)
right_lstm_out = LSTM(32)(right_x)
right_output = Dense(16, activation='sigmoid', name='right_output')(right_lstm_out)

# concatenate or add the branch outputs, you decide
combined_input = keras.layers.concatenate([left_output, right_output])
output = Dense(1, activation='sigmoid')(combined_input)

model = Model(inputs=[left_input, right_input], outputs=output)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.fit([left_train_data, right_train_data], train_labels, epochs=50, batch_size=32)
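As the comment above says, you could also sum the two branch outputs element-wise instead of concatenating them. A minimal sketch of that variant (same hypothetical dimensions as above; both branches must then have the same output size):
# element-wise sum instead of concatenation (branch outputs must match in size)
combined_input = keras.layers.add([left_output, right_output])
output = Dense(1, activation='sigmoid')(combined_input)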
For more details regarding the Functional API, have a look at the Keras documentation. Feel free to update your post with more details about your project and we can give you a more precise answer.

Related

when setting .eval() my model performs worse than when I set .train()

During the training phase, I select the model parameters with the best performance metric.
if performance_metric.item() > max_performance:
    max_performance = performance_metric.item()
    torch.save(neural_net.state_dict(), PATH + '/best_model.pt')
This is the neural network model used:
class Neural_Net(nn.Module):
    def __init__(self, M, shape_input, batch_size):
        super(Neural_Net, self).__init__()
        self.lstm = nn.LSTM(shape_input, M)
        #self.dense1 = nn.Linear(shape_input, M)
        self.dense1 = nn.Linear(M, M)  # Used with the LSTM
        torch.nn.init.xavier_uniform_(self.dense1.weight)
        self.dense2 = nn.Linear(M, M)
        torch.nn.init.xavier_uniform_(self.dense2.weight)
        self.dense3 = nn.Linear(M, 1)
        torch.nn.init.xavier_uniform_(self.dense3.weight)
        self.drop = nn.Dropout(0.7)
        self.bachnorm1 = nn.BatchNorm1d(M)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        self.hidden_cell = (torch.zeros(1, batch_size, M), torch.zeros(1, batch_size, M))

    def forward(self, x):
        lstm_out, self.hidden_cell = self.lstm(x.view(1, len(x), -1), self.hidden_cell)
        x = self.drop(self.relu(self.dense1(self.bachnorm1(lstm_out.view(len(x), -1)))))
        x = self.drop(self.relu(self.dense2(x)))
        x = self.relu(self.dense3(x))
        return x
After that I load the model with the best parameters and set the evaluation mode:
neural_net.load_state_dict(torch.load(PATH+'/best_model.pt'))
neural_net.eval()
The results are completely random. When I set train() the performance is similar to that of the selected best model parameters.
Is there an important aspect of eval() that I am forgetting? Is batch normalization being used correctly? I am using a batch of the same size as in the training phase for the test phase.
Without knowing your batch size, your training/test dataset sizes, or the discrepancies between the training and test datasets, it is hard to be specific, but this issue has been discussed on the PyTorch forums before, here.
In my experience, it sounds very much like the latent representation of your training data inside the model is significantly different from the representation of your validation data. The main advice I can give is to try reducing the momentum of your batchnorm layer. It might also be worth substituting a layernorm layer (which doesn't track a running mean/standard deviation), or setting track_running_stats=False in BatchNorm1d, and seeing if the problem persists, as sketched below.
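For illustration only, a minimal PyTorch sketch of those options (M is a hypothetical hidden size, not taken from your code):
import torch.nn as nn

M = 64  # hypothetical hidden size

# option 1: lower the momentum so the running statistics move more slowly
bn_slow = nn.BatchNorm1d(M, momentum=0.01)  # default momentum is 0.1

# option 2: use batch statistics at eval time too (no running stats are tracked)
bn_no_stats = nn.BatchNorm1d(M, track_running_stats=False)

# option 3: replace batchnorm with layernorm, which normalizes each sample on its own
ln = nn.LayerNorm(M)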

Can we use Keras model's accuracy metric for Image Captioning model?

Kindly consider the following line of code.
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])
Am I allowed to use metrics=['accuracy'] for my Image Captioning model? My model has been defined as follows:
from keras.layers import Input, Dropout, Dense, Embedding, LSTM, BatchNormalization, add
from keras.models import Model

inputs1 = Input(shape=(2048,))
fe1 = Dropout(0.2)(inputs1)
fe1 = BatchNormalization()(fe1)
fe2 = Dense(256, activation='relu')(fe1)

inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocabsize, embedding_dim, mask_zero=True)(inputs2)
se2 = Dropout(0.2)(se1)
se2 = BatchNormalization()(se2)
se3 = LSTM(256)(se2)

decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation='relu')(decoder1)
outputs = Dense(vocabsize, activation='softmax')(decoder2)
model = Model(inputs=[inputs1, inputs2], outputs=outputs)
Training this model reports an accuracy value during training. My questions are:
1. Can I use this accuracy metric to evaluate my Image Captioning model?
2. If yes, do the built-in calculations consider the semantic meaning of the predicted captions?
3. If the answer to question 1 is yes, then what is the use of the BLEU score and other evaluation metrics?
4. My model gives decent captions for a given new image. Is it necessary for this accuracy metric value to be greater than 0.5?
To answer all of the questions:
For language models, it's common to use the BLEU (bilingual evaluation understudy) score, since it gives you a better overview of your model's performance.
Keras's accuracy metric is OK, but it is really intended for categorical models or models with a deterministic output; language models are not like that (e.g. "I am ok", "I am good" and "I'm ok" have roughly the same meaning, but Keras accuracy treats them as different). I suggest checking out the Keras implementation: https://github.com/keras-team/keras/blob/master/keras/metrics.py#L439
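For instance, a corpus-level BLEU score can be computed with NLTK. A minimal sketch with made-up tokenized captions (your real references and hypotheses would come from your test set):
from nltk.translate.bleu_score import corpus_bleu

# each hypothesis may be scored against several reference captions
references = [[['a', 'dog', 'runs', 'on', 'the', 'beach'],
               ['a', 'dog', 'running', 'along', 'the', 'beach']]]
hypotheses = [['a', 'dog', 'is', 'running', 'on', 'the', 'beach']]

# standard BLEU-4 with uniform n-gram weights
print(corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25)))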

How to apply zca on a huge image dataset with limited memory?

What Google told me is:
For Keras, ImageDataGenerator seems to have a zca_whitening option that can be used out of the box. But if this option is set, it requires calling ImageDataGenerator.fit on the whole dataset X, so this is not an option.
For sklearn, IncrementalPCA seems to work with a huge dataset, but I don't know how to turn the PCA into a ZCA in a generator style.
Thanks for the help!
I have defined a function, following the ZCA transformation, that might be helpful:
import numpy as np

def ZCAtransform(X, IPCA_model):
    # get the eigenvectors and eigenvalues
    U = IPCA_model.components_.transpose()
    S = np.sqrt(IPCA_model.explained_variance_)
    Xdemeaned = (X - np.mean(X, 0)).transpose()
    # get the transformed data
    # Xproj' = U * diag(1/(S + I*epsilon)) * U' * X_data
    return (U.dot(np.diag(1 / (S + IPCA_model.noise_variance_))).dot(U.transpose()).dot(Xdemeaned)).transpose()

Xproj = ZCAtransform(X, ipca)
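The ipca model used above can be fitted batch by batch, so the full dataset never has to sit in memory. A minimal sketch, assuming a hypothetical batch_iterator() generator that yields flattened image batches:
import numpy as np
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=256)  # number of components is a guess

# first pass: fit the PCA incrementally, one batch at a time
for X_batch in batch_iterator():  # batch_iterator() is assumed, not shown here
    ipca.partial_fit(X_batch)

# second pass: apply the ZCA transform batch-wise with the function above
Xproj = np.concatenate([ZCAtransform(X_batch, ipca) for X_batch in batch_iterator()], axis=0)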
Following the example given in the scikit-learn documentation, I was able to generate the ZCA of the Iris dataset as shown below:
[Figure: ZCA-whitened PCA of the Iris dataset]

How to tie word embedding and softmax weights in keras?

It's commonplace for various neural network architectures in NLP and vision-language problems to tie the weights of an initial word embedding layer to those of an output softmax. Usually this produces a boost to sentence generation quality (see an example here).
In Keras it's typical to implement word embeddings using the Embedding class; however, there seems to be no easy way to tie the weights of this layer to the output softmax. Would anyone happen to know how this could be implemented?
Be aware that Press and Wolf don't propose to freeze the weights to some pretrained values, but to tie them. That means ensuring that the input and output weights are always the same during training (in the sense of being synchronized).
In a typical NLP model (e.g. language modelling/translation), you have an input dimension (vocabulary) of size V and a hidden representation size H. You start with an Embedding layer, which is a V x H matrix. The output layer is (probably) something like Dense(V, activation='softmax'), which is an H2 x V matrix. When tying the weights, we want those matrices to be the same (therefore, H == H2).
For doing this in Keras, I think the way to go is via shared layers:
In your model, you need to instantiate a shared embedding layer (of dimension V x H) and apply it to both your input and your output. But you need to transpose it to get the desired output dimensions (H x V). So we declare a TiedEmbeddingsTransposed layer, which transposes the embedding matrix of a given layer (and optionally applies an activation function):
from keras import activations
from keras import backend as K
from keras.layers import Layer

class TiedEmbeddingsTransposed(Layer):
    """Layer for tying embeddings in an output layer.
    A regular embedding layer has the shape V x H (V: size of the vocabulary, H: size of the projected space).
    In this layer, we'll go H x V, with the same weights as the regular embedding.
    In addition, it may have an activation.
    # References
    - [Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)
    """
    def __init__(self, tied_to=None, activation=None, **kwargs):
        super(TiedEmbeddingsTransposed, self).__init__(**kwargs)
        self.tied_to = tied_to
        self.activation = activations.get(activation)

    def build(self, input_shape):
        # reuse the (transposed) weights of the layer we are tied to
        self.transposed_weights = K.transpose(self.tied_to.weights[0])
        self.built = True

    def compute_mask(self, inputs, mask=None):
        return mask

    def compute_output_shape(self, input_shape):
        return input_shape[0], K.int_shape(self.tied_to.weights[0])[0]

    def call(self, inputs, mask=None):
        output = K.dot(inputs, self.transposed_weights)
        if self.activation is not None:
            output = self.activation(output)
        return output

    def get_config(self):
        config = {'activation': activations.serialize(self.activation)}
        base_config = super(TiedEmbeddingsTransposed, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
The usage of this layer is:
# Declare the shared embedding layer
shared_embedding_layer = Embedding(V, H)
# Obtain word embeddings
word_embedding = shared_embedding_layer(input)
# Do stuff with your model
# Compute output (e.g. a vocabulary-size probability vector) with the shared layer:
output = TimeDistributed(TiedEmbeddingsTransposed(tied_to=shared_embedding_layer, activation='softmax'))(intermediate_rep)
I have tested this in NMT-Keras and it trains properly. But when I try to load a trained model, I get an error related to the way Keras loads models: it doesn't load the weights from the tied_to layer. I've found several questions regarding this (1, 2, 3), but I haven't managed to solve the issue. If someone has any ideas on the next steps to take, I'd be very glad to hear them :)
As you may read here, you should simply set the trainable flag to False. E.g.
aux_output = Embedding(..., trainable=False)(input)
....
output = Dense(nb_of_classes, ..., activation='softmax', trainable=False)

Variational Autoencoder for Feature Extraction

I would like to ask whether it would be possible (or rather, whether it would make any sense) to use a variational autoencoder for feature extraction. I ask because for the encoding part we sample from a distribution, which means that the same sample can get a different encoding (due to the stochastic nature of the sampling process). Thanks!
Yes, the feature extraction goal is the same for VAEs and sparse autoencoders.
Once you have an encoder, plug a classifier in on the extracted features.
Best regards,
Yes, the output of the encoder network can be used as your features.
Just think about it this way: using the output of the encoder network as input, the decoder network can generate an image quite like your original image. Therefore the output of the encoder network has covered most of the information in your original image. In other words, it contains the most important features of your original image, the ones that distinguish it from other images.
The only thing you need to pay attention to is that a variational autoencoder is a stochastic feature extractor, while usually a feature extractor is deterministic. You can either use the mean and variance as your extracted features, or use a Monte Carlo method, drawing from the Gaussian distribution defined by the mean and variance, to obtain "sampled extracted features".
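As a concrete (hypothetical) sketch: assuming you already have a trained Keras VAE whose encoder model returns [z_mean, z_log_var], you could use the posterior mean as deterministic features, or sample around it:
# 'encoder' is assumed to be the trained encoder part of your VAE
import numpy as np

z_mean, z_log_var = encoder.predict(x_train)

# deterministic choice: use the posterior mean as the extracted features
features = z_mean

# stochastic alternative: draw Monte Carlo samples from the learned posterior
eps = np.random.normal(size=z_mean.shape)
sampled_features = z_mean + np.exp(0.5 * z_log_var) * eps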
Yes, you can.
I used the code below (in R, with h2o) to extract the important features from my dataset.
prostate_df <- read.csv('your_data')
prostate_df <- prostate_df[,-1] # drop the first column
train_df<-prostate_df
outcome_name <- 'subtype' # my label column
feature_names <- setdiff(names(prostate_df), outcome_name)
library(h2o)
localH2O = h2o.init()
prostate.hex<-as.h2o(train_df, destination_frame="train.hex")
prostate.dl = h2o.deeplearning(x = feature_names,
                               #y = "subtype",
                               training_frame = prostate.hex,
                               model_id = "AE100",
                               #input_dropout_ratio = 0.3,  # quite high
                               #l2 = 1e-5,                  # quite high
                               autoencoder = TRUE,
                               #validation_frame = prostate.hex,
                               #reproducible = T, seed = 1,
                               hidden = c(1), epochs = 700,
                               #activation = "Tanh",
                               #activation = "TanhWithDropout",
                               activation = "Rectifier",
                               #activation = "RectifierWithDropout",
                               standardize = TRUE,
                               #regression_stop = -1,
                               #stopping_metric = "MSE",
                               train_samples_per_iteration = 0,
                               variable_importances = TRUE
                               )
label1<-ncol(train_df)
train_supervised_features2 = h2o.deepfeatures(prostate.dl, prostate.hex, layer=1)
plotdata = as.data.frame(train_supervised_features2)
plotdata$label = as.character(as.vector(train_df[,label1]))
library(ggplot2)
qplot(DF.L1.C1, DF.L1.C2, data = plotdata, color = label, main = "Cancer Normal Pathway data ")
prostate.anon = h2o.anomaly(prostate.dl, prostate.hex, per_feature=FALSE)
head(prostate.anon)
err <- as.data.frame(prostate.anon)
h2o.scoreHistory(prostate.dl)
head(h2o.varimp(prostate.dl),10)
h2o.varimp_plot(prostate.dl)
