Pre-training BERT - `cls_token_idx`

I am trying to pre-train BERT on a domain-specific dataset using the official TensorFlow code, and I am wondering what the `cls_token_idx` refers to in this `config.yaml` file:
model:
  cls_heads: [{activation: tanh, cls_token_idx: 0, dropout_rate: 0.1, inner_dim: 768, name: next_sentence, num_classes: 2}]
  encoder:
    type: bert
    bert:
      ...
Searching a little bit in the source code, I found this:
...
cls_token_idx: The index inside the sequence to pool.
...
if not self.inner_dim:
    x = features
else:
    x = features[:, self.cls_token_idx, :]  # take <CLS> token.
    x = self.dense(x)
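Here is a minimal sketch of what that slice does (shapes assumed, not taken from the official code); the encoder output has shape [batch, seq_len, hidden], and the index selects a position along the sequence axis:
import tensorflow as tf

features = tf.random.normal([2, 8, 768])  # assumed [batch, seq_len, hidden] encoder output
cls_token_idx = 0
pooled = features[:, cls_token_idx, :]    # selects the token at sequence position cls_token_idx
print(pooled.shape)                       # (2, 768)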
Is cls_token_idx referring to the index of the [CLS] token in the vocabulary, or to its index in the sequence?
For example, let's say we have the following input:
tokens: [CLS] ... [SEP] ... [SEP]
input_ids: 2 ... 3 ... 3
Should I set cls_token_idx (when configuring the .yaml file) to 0 (its index in the sequence) or 2 (its index in the vocabulary)?
Thank you!

Related

What is the input type of GNN?

From reading articles and papers, I understood that GNNs are used for:
node-level prediction,
link prediction, and
graph-level prediction,
but I am very confused about a GNN's input type. I have a list of questions:
What is the input type of a GNN: graphs or numerical data?
If a GNN takes graphs as input, then how are they generated?
Second edit:
By reading another paper, I found that a GNN takes graphs as input.
Now I have only one question: how is the graph generated from the input?
Reference:
Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun, "Graph neural networks: A review of methods and applications", AI Open.
The input of a GNN includes objects of different dimensions, e.g. the node-feature (properties) matrix has dimension [n_nodes, n_node_features] and the adjacency matrix has dimension [n_nodes, n_nodes], depending on the type of graph neural network.
Spektral is a nice library with good examples of different types of GNNs; examples of how to load the data are also provided.
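To make those shapes concrete, here is a hand-built toy graph (hypothetical data, not tied to any Spektral loader):
import numpy as np

# a 3-node graph with 2 features per node
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])  # node features, shape [n_nodes, n_node_features]
a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])   # adjacency matrix, shape [n_nodes, n_nodes]
In other words, the graph is "generated" by encoding your raw records as node features and their pairwise relationships as the adjacency matrix.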
Here is an example of a GNN model created using TensorFlow and Spektral:
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from spektral.layers import GINConv, GlobalAvgPool

class GIN0(Model):
    def __init__(self, channels, n_layers):
        super().__init__()
        self.conv1 = GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
        self.convs = []
        for _ in range(1, n_layers):
            self.convs.append(
                GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
            )
        self.pool = GlobalAvgPool()
        self.dense1 = Dense(channels, activation="relu")
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(channels, activation="relu")

    def call(self, inputs):
        x, a, i = inputs
        x = self.conv1([x, a])
        for conv in self.convs:
            x = conv([x, a])
        x = self.pool([x, i])
        x = self.dense1(x)
        x = self.dropout(x)
        return self.dense2(x)
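A minimal usage sketch (assuming Spektral's disjoint mode, where i assigns each node to its graph; the toy tensors below are illustrative, not real data):
import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(5, 4), dtype=tf.float32)             # 5 nodes, 4 features each
a = tf.sparse.from_dense(tf.constant(np.eye(5), dtype=tf.float32))  # toy (sparse) adjacency
i = tf.constant([0, 0, 0, 1, 1])                                    # node-to-graph assignment

model = GIN0(channels=32, n_layers=3)
out = model([x, a, i])  # one pooled vector per graph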
You can also look at this question for a more complete example of GNN application.

Overcoming compatibility issues with using iml from h2o models

I am unable to reproduce the only example I can find of using h2o with iml (https://www.r-bloggers.com/2018/08/iml-and-h2o-machine-learning-model-interpretability-and-feature-explanation/) as detailed here (Error when extracting variable importance with FeatureImp$new and H2O). Can anyone point to a workaround or other examples of using iml with h2o?
Reproducible example:
library(rsample)   # data splitting
library(ggplot2)   # allows extension of visualizations
library(dplyr)     # basic data transformation
library(h2o)       # machine learning modeling
library(iml)       # ML interpretation
library(modeldata) # attrition data

# initialize h2o session
h2o.no_progress()
h2o.init()

# classification data
data("attrition", package = "modeldata")
df <- rsample::attrition %>%
  mutate_if(is.ordered, factor, ordered = FALSE) %>%
  mutate(Attrition = recode(Attrition, "Yes" = "1", "No" = "0") %>% factor(levels = c("1", "0")))

# convert to h2o object
df.h2o <- as.h2o(df)

# create train, validation, and test splits
set.seed(123)
splits <- h2o.splitFrame(df.h2o, ratios = c(.7, .15),
                         destination_frames = c("train", "valid", "test"))
names(splits) <- c("train", "valid", "test")

# variable names for response & features
y <- "Attrition"
x <- setdiff(names(df), y)

# elastic net model
glm <- h2o.glm(
  x = x,
  y = y,
  training_frame = splits$train,
  validation_frame = splits$valid,
  family = "binomial",
  seed = 123
)

# 1. create a data frame with just the features
features <- as.data.frame(splits$valid) %>% select(-Attrition)

# 2. create a vector with the actual responses
response <- as.numeric(as.vector(splits$valid$Attrition))

# 3. create a custom predict function that returns the predicted values as a
#    vector (probability of purchasing in our example)
pred <- function(model, newdata) {
  results <- as.data.frame(h2o.predict(model, as.h2o(newdata)))
  return(results[[3L]])
}

# create predictor object to pass to explainer functions
predictor.glm <- Predictor$new(
  model = glm,
  data = features,
  y = response,
  predict.fun = pred,
  class = "classification"
)

imp.glm <- FeatureImp$new(predictor.glm, loss = "mse")
Error obtained:
Error in `[.data.frame`(prediction, , self$class, drop = FALSE): undefined columns selected
traceback()
1. FeatureImp$new(predictor.glm, loss = "mse")
2. .subset2(public_bind_env, "initialize")(...)
3. private$run.prediction(private$sampler$X)
4. self$predictor$predict(data.frame(dataDesign))
5. prediction[, self$class, drop = FALSE]
6. `[.data.frame`(prediction, , self$class, drop = FALSE)
7. stop("undefined columns selected")
In the iml package documentation, the class argument is described as "The class column to be returned." When you set class = "classification", iml looks for a column called "classification", which is not found. At least on GitHub, it looks like the iml package has gone through a fair amount of development since that blog post, so I imagine some functionality may no longer be backwards compatible.
After reading through the package documentation, I think you might want to try something like:
predictor.glm <- Predictor$new(
  model = glm,
  data = features,
  y = "Attrition",
  predict.function = pred,
  type = "prob"
)
# check ability to predict first
check <- predictor.glm$predict(features)
print(check)
Even better might be to leverage H2O's extensive functionality around machine learning interpretability:
h2o.varimp(glm) gives the variable importance for each feature.
h2o.varimp_plot(glm, 10) renders a graphic showing the relative importance of each feature.
h2o.explain(glm, as.h2o(features)) is a wrapper for the explainability interface; by default it provides the confusion matrix (in this case) as well as variable importance and partial dependence plots for each feature.
For certain algorithms (e.g., tree-based methods), h2o.shap_explain_row_plot() and h2o.shap_summary_plot() provide the SHAP contributions.
The h2o-3 docs might be useful for exploring further.

What does model.output.op do in Keras?

This is part of the code of Grad-CAM:
def generate_grad_cam(img_tensor, model, class_index, activation_layer):
    inp = model.input
    y_c = model.output.op.inputs[0][0, class_index]
    A_k = model.get_layer(activation_layer).output
What does model.output.op.inputs[0][0, class_index] do? What is model.output.op? In this picture, which one is model.output.op.inputs[0][0, class_index]?
I did a bit of exploring in the TF/Keras code, and based on that investigation I believe model.output.op provides the mathematical "operation" at the output layer of the model. Its inputs attribute provides the list of input tensors to that operation (in your case, inputs[0] is the first input to whatever the actual op is). The remaining part slices into that tensor to extract a single element.
After running this code for the VGG16 example model:
import tensorflow as tf
tf.keras.backend.clear_session() # For easy reset of notebook state.
myNewModel = tf.keras.applications.VGG16()
print('myNewModel.output.op:')
print(myNewModel.output.op)
print('myNewModel.output.op.inputs[0]:')
print(myNewModel.output.op.inputs[0])
print('myNewModel.output.op.inputs[0][0,3]:')
print(myNewModel.output.op.inputs[0][0,3])
I get this output (note that I used 3 for the class_index just as an example):
myNewModel.output.op:
name: "predictions/Softmax"
op: "Softmax"
input: "predictions/BiasAdd"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
myNewModel.output.op.inputs[0]:
Tensor("predictions/BiasAdd:0", shape=(?, 1000), dtype=float32)
myNewModel.output.op.inputs[0][0,3]:
Tensor("strided_slice:0", shape=(), dtype=float32)
I hope this helps.

Make a custom DataLoader with PyTorch

I want to create a custom dataloader with a specific format for use with the PyTorch library. Does someone have an idea how I can do it? I have followed the PyTorch tutorial, but I haven't found my answer!
I need a DataLoader that yields the tuples of the following format:
(Bx3xHxW FloatTensor x, BxHxW LongTensor y, BxN LongTensor y_cls), where
x - batch of input images,
y - batch of ground-truth segmentation maps,
y_cls - batch of 1D tensors of dimensionality N (N = total number of classes),
y_cls[i, T] = 1 if class T is present in image i, 0 otherwise.
I hope that someone can unlock the problem. :) Thanks!
You simply need a dataset class derived from torch.utils.data.Dataset, whose __getitem__(index) returns a tuple (x, y, y_cls) of the types you want; pytorch will take care of everything else.
import torch
from torch.utils import data

class MyTupleDataset(data.Dataset):
    def __init__(self):
        super(MyTupleDataset, self).__init__()
        # init your dataset here...

    def __getitem__(self, index):
        x = torch.Tensor(3, H, W)              # batch dim is handled by the data loader
        y = torch.Tensor(H, W).to(torch.long)
        y_cls = torch.Tensor(N).to(torch.long)
        return x, y, y_cls

    def __len__(self):
        return num_samples  # the number of examples in your dataset
That's it. Provide pytorch's torch.utils.data.DataLoader with MyTupleDataset and you are done.
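For completeness, a minimal usage sketch (hypothetical sizes; assumes the placeholders H, W, N above have been filled in with real values):
from torch.utils.data import DataLoader

loader = DataLoader(MyTupleDataset(), batch_size=4, shuffle=True)
for x, y, y_cls in loader:
    # x: B x 3 x H x W, y: B x H x W, y_cls: B x N
    break

# if y_cls is not stored explicitly, it can be derived from a single seg map y_i,
# assuming class labels 0..N-1:
# y_cls_i = torch.zeros(N, dtype=torch.long)
# y_cls_i[torch.unique(y_i)] = 1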

Keras weighted merge

I am trying to compute a weighted output from multiple parallel models using Keras' Merge layer. I'm using the Theano backend.
I have L parallel models (Ci). Each of their output layers is a k-sized softmax.
There is one model (N), its output is a L-sized softmax.
Here is what I have so far:
Parallel models (Ci) each with k dimension in the output layer:
model.add(Dense(K, activation='softmax', W_regularizer=l2(0.001),init='normal'))
The weighting model (N), output layer:
model.add(Dense(L, activation='softmax', W_regularizer=l2(0.001), init='normal'))
The merge is as follows:
model.add(Merge(layers=model_group,
                mode=lambda model_group: self.merge_fun(model_group, L),
                output_shape=(None, k)))
where "model_group" is a (L+1)-length list [N, C1, C2, ..., CL], and merge_fun's signature is:
def merge_fun(self, model_group, L):
Mathematically, I would like the output of the merged layer to be a weighted sum:
out = N[1]·[C11, C12, ..., C1k] + N[2]·[C21, C22, ..., C2k] + ... + N[L]·[CL1, CL2, ..., CLk],
where out is a vector of size k.
How can I use the Merge layer to achieve this ?
I know that the magic would probably have to happen in merge_fun, but I am not sure how to perform matrix algebra in Keras. The tensor parameters don't have a "shape" attribute; they have a keras_shape of (None, K or L), but I am not sure how to combine the parallel models' outputs into a matrix.
I tried using a local evaluation of the following expressions:
K.concatenate([model_group[1], model_group[2]], axis=0)*model_group[0]
and
model_group[0] * K.concatenate([model_group[1], model_group[2]], axis=0)
both of which didn't throw an error, so I can't use this as a guide. After the multiplication, the result returned did not have the keras_shape variable, so I'm not sure what the shape of the result is.
Any suggestions ?
What I advise is to use the functional API in the following manner:
Define the L output models:
softmax_1 = Dense(K, activation='softmax', ...)(input_to_softmax_1)
softmax_2 = Dense(K, activation='softmax', ...)(input_to_softmax_2)
...
softmax_L = Dense(K, activation='softmax', ...)(input_to_softmax_L)
Define the merge softmax:
merge_softmax = Dense(L, activation='softmax', ...)(input_to_merge_softmax)
merge_softmax = Reshape((1, L))(merge_softmax)
Merge and reshape the bag of L models:
bag_of_models = merge([softmax_1, ..., softmax_L], mode = 'concat')
bag_of_models = Reshape((L, K))(bag_of_models)
Compute the final merged softmax:
final_result = merge([bag_of_models, merge_softmax], mode = 'dot', dot_axes = [1, 2])
final_result = Reshape((K, ))(final_result)
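To see why the dot merge over dot_axes = [1, 2] produces the weighted sum from the question, here is a tiny NumPy check (hypothetical values, batch dimension dropped):
import numpy as np

L, K = 3, 4
weights = np.array([0.2, 0.3, 0.5])  # output of the weighting model N, shape (L,)
C = np.random.rand(L, K)             # stacked softmax outputs of the L models, shape (L, K)
out = weights @ C                    # weighted sum over the L models, shape (K,)
assert np.allclose(out, sum(weights[l] * C[l] for l in range(L)))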
Of course, depending on your topology, different tensors might be the same (e.g. the inputs to different softmaxes). I tested this on my machine, but due to extensive refactoring I might have made a mistake, so if you find one, please let me know.
The solution with Sequential is much less clear and a little bit cumbersome, but if you want one, please write in the comments and I will update my answer.
