Concatenating two pre-trained BERT models

max_length = 50
tokenizer = RobertaTokenizer.from_pretrained('roberta-large', do_lower_case=True)
encodings = tokenizer.batch_encode_plus(comments,max_length=max_length,pad_to_max_length=True, truncation=True) # tokenizer's encoding method
train_inputs = encodings['input_ids']
train_masks = encodings['attention_mask']
train_inputs = torch.tensor(train_inputs)
train_labels = torch.tensor(train_labels)
train_masks = torch.tensor(train_masks)
batch_size = 48
train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
model = RobertaForSequenceClassification.from_pretrained('roberta-large', num_labels=num_labels)
model.cuda()
Hi, I'm using the HuggingFace library for classification, and I want to concatenate two types of BERT together. This is not the entire code; I just want to show how I've used the tokenizer and encodings. Now I have two questions:
1: How can I see the created vectors, i.e. their dimensions and the vectors themselves? 2: At which step should I concatenate the two BERTs: their vectors, or maybe their outputs (logits)?
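A minimal sketch of one way to address both questions. For question 1, the encoded tensors can be inspected directly; for question 2, one common option is to run the two encoders separately and concatenate their pooled representations before a single classification head (DualEncoderClassifier and the encoder attribute names are hypothetical, not part of the question's code):
# Question 1: the created vectors can be inspected directly.
print(train_inputs.shape)   # (number of comments, max_length)
print(train_inputs[0])      # token ids of the first comment
print(train_masks.shape)    # same shape as train_inputs

# Question 2: one option is to concatenate the encoders' representations,
# not their logits, and learn a single classifier on top.
import torch
import torch.nn as nn
from transformers import RobertaModel

class DualEncoderClassifier(nn.Module):
    def __init__(self, num_labels):
        super().__init__()
        self.encoder_a = RobertaModel.from_pretrained('roberta-large')
        self.encoder_b = RobertaModel.from_pretrained('roberta-large')
        hidden = self.encoder_a.config.hidden_size
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        out_a = self.encoder_a(input_ids=input_ids, attention_mask=attention_mask)
        out_b = self.encoder_b(input_ids=input_ids, attention_mask=attention_mask)
        # Take the hidden state of the first (<s>) token from each encoder.
        cls_a = out_a[0][:, 0, :]
        cls_b = out_b[0][:, 0, :]
        return self.classifier(torch.cat([cls_a, cls_b], dim=-1))
Concatenating at the representation level keeps a single loss and lets both encoders be fine-tuned jointly; combining the logits instead is also possible, but gives the classifier less information to work with.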

Related

ValueError: Data cardinality is ambiguous. Make sure all arrays contain the same number of samples?

I am trying to use a MobileNet model but am facing the above-mentioned issue. I don't know whether it is occurring due to train_test_split or something else. The architecture is shown below.
Can I use model.fit instead of model.fit_generator here?
mobilenet = MobileNet(input_shape=(224,224,3) , weights='imagenet', include_top=False)
# don't train existing weights
for layer in mobilenet.layers:
layer.trainable = False
folders = glob('/content/drive/MyDrive/AllClasses/*')
print("Total number of classes are",len(folders))
x = Flatten()(mobilenet.output)
prediction = Dense(len(folders), activation='softmax')(x)
model = Model(inputs=mobilenet.input, outputs=prediction)
model.summary()
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
dataset = ImageDataGenerator(rescale=1./255)
dataset = dataset.flow_from_directory('/content/drive/MyDrive/AllClasses',target_size=(224, 224),batch_size=32,class_mode='categorical',color_mode='grayscale')
train_data, test_data = train_test_split(dataset,random_state=42, test_size=0.20,shuffle=True)
r = model.fit(train_data,validation_data=(test_data),epochs=5)
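A minimal sketch of one way around the error, assuming the directory layout from the question: train_test_split cannot split the DirectoryIterator returned by flow_from_directory, but ImageDataGenerator can do the split itself via validation_split and the subset argument.
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.20)

train_data = datagen.flow_from_directory(
    '/content/drive/MyDrive/AllClasses',
    target_size=(224, 224), batch_size=32,
    class_mode='categorical', subset='training')

test_data = datagen.flow_from_directory(
    '/content/drive/MyDrive/AllClasses',
    target_size=(224, 224), batch_size=32,
    class_mode='categorical', subset='validation')

# Recent Keras versions accept generators in model.fit, so fit_generator is not needed.
r = model.fit(train_data, validation_data=test_data, epochs=5)
Note also that color_mode='grayscale' yields single-channel images, which would conflict with MobileNet's (224, 224, 3) input_shape, so it is dropped here (RGB is the default).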

Keras LSTM from for loop, using functional API with custom number of layers

I am trying to build a network through the Keras functional API, feeding it two lists that contain the number of units of the LSTM layers and of the FC (Dense) layers. I want to analyse 20 consecutive segments (batches), each containing fs time steps and 2 values (2 features per time step). This is my code:
Rec = [4,4,4]
FC = [8,4,2,1]
def keras_LSTM(Rec, FC, fs, n_witness, lr=0.04, optimizer='Adam'):
    model_LSTM = Input(batch_shape=(20, fs, n_witness))
    return_state_bool = True
    for i in range(shape(Rec)[0]):
        nRec = Rec[i]
        if i == shape(Rec)[0]-1:
            return_state_bool = False
        model_LSTM = LSTM(nRec, return_sequences=True, return_state=return_state_bool,
                          stateful=True, input_shape=(None, n_witness),
                          name='LSTM'+str(i))(model_LSTM)
    for j in range(shape(FC)[0]):
        nFC = FC[j]
        model_LSTM = Dense(nFC)(model_LSTM)
        model_LSTM = LeakyReLU(alpha=0.01)(model_LSTM)
    nFC_final = 1
    model_LSTM = Dense(nFC_final)(model_LSTM)
    predictions = LeakyReLU(alpha=0.01)(model_LSTM)
    full_model_LSTM = Model(inputs=model_LSTM, outputs=predictions)
    model_LSTM.compile(optimizer=keras.optimizers.Adam(lr=lr, beta_1=0.9, beta_2=0.999,
                       epsilon=1e-8, decay=0.066667, amsgrad=False), loss='mean_squared_error')
    return full_model_LSTM
model_new = keras_LSTM(Rec, FC, fs=fs, n_witness=n_wit)
model_new.summary()
When compiling I get the following error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(20, 2048, 2), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
I don't quite understand this error, but I suspect it may have something to do with the inputs.
I solved the issue by modifying line 4 of the code as in the following:
x = model_LSTM = Input(batch_shape=(20,fs,n_witness))
along with line 21, as in the following:
full_model_LSTM = Model(inputs=x, outputs=predictions)
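Putting the two changes together, a minimal sketch of the working pattern (keras_LSTM_fixed is a hypothetical name, return_state is dropped for brevity, and the imports assume standalone Keras as in the question):
from keras.layers import Input, Dense, LSTM, LeakyReLU
from keras.models import Model

def keras_LSTM_fixed(Rec, FC, fs, n_witness):
    inputs = Input(batch_shape=(20, fs, n_witness))  # keep a handle on the Input tensor
    x = inputs
    for i, nRec in enumerate(Rec):
        x = LSTM(nRec, return_sequences=True, stateful=True,
                 name='LSTM' + str(i))(x)
    for nFC in FC:
        x = Dense(nFC)(x)
        x = LeakyReLU(alpha=0.01)(x)
    predictions = LeakyReLU(alpha=0.01)(Dense(1)(x))
    # The Model must be built from the original Input tensor, not from a layer output.
    return Model(inputs=inputs, outputs=predictions)
The key point is that Model needs the original Input tensor, which the question's code had overwritten by reusing model_LSTM for every layer output.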

I want to calculate cross entropy at all outputs of LSTM

I am writing a classification program using an LSTM.
However, I do not know how to calculate the cross entropy over all of the LSTM's outputs.
Here is a part of my program.
cell_fw = tf.nn.rnn_cell.LSTMCell(num_hidden)
cell_bw = tf.nn.rnn_cell.LSTMCell(num_hidden)
outputs, _ = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs=inputs3, dtype=tf.float32, sequence_length=seq_len)
outputs = tf.concat(outputs, axis=2)
# outputs: [batch_size, max_timestep, num_hidden*2]
outputs = tf.reshape(outputs, [-1, num_hidden*2])
W = tf.Variable(tf.truncated_normal([num_hidden*2, num_classes], stddev=0.1))
b = tf.Variable(tf.constant(0., shape=[num_classes]))
logits = tf.matmul(outputs, W) + b
How can I apply a cross-entropy error to this?
Should I create, for each batch, a label vector that repeats the same class max_timestep times and calculate the error with that?
Have you looked at the cross-entropy documentation (https://www.tensorflow.org/api_docs/python/tf/losses/softmax_cross_entropy)?
The dimension of onehot_labels should answer your question.
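For reference, a rough sketch of one way to apply the loss over every time step, using the sparse variant rather than the one-hot API linked above. It assumes a labels tensor of shape [batch_size, max_timestep] holding one class id per time step; labels and max_timestep are assumed names, while logits and seq_len come from the question.
# logits already has shape [batch_size*max_timestep, num_classes] after the reshape.
labels_flat = tf.reshape(labels, [-1])                      # [batch_size*max_timestep]
step_losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels_flat, logits=logits)                      # one loss value per time step
# Mask out padded time steps so they do not contribute to the loss.
mask = tf.reshape(tf.sequence_mask(seq_len, max_timestep, dtype=tf.float32), [-1])
loss = tf.reduce_sum(step_losses * mask) / tf.reduce_sum(mask)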

How to use NN LookupTable in Torch

How do I use nn.LookupTable to convert my vector of vocab indices to a tensor of embedding vectors?
(https://github.com/torch/nn/blob/master/LookupTable.lua)
The network is set up as:
self.llstm = LSTM
self.rlstm = LSTM
local modules = nn.Parallel()
:add(nn.LookupTable(self.vocab_size, self.emb_size))
:add(nn.Collapse(2))
:add(self.llstm)
:add(self.my_module)
self.params, self.grad_params = modules:getParameters()
In the train step:
input = dataset[i]
emb_input = ?
self.llstm.forward(emb_input, reverse)

Stacked RNN model setup in TensorFlow

I'm kind of lost in building up a stacked LSTM model for text classification in TensorFlow.
My input data was something like:
x_train = [[1.,1.,1.],[2.,2.,2.],[3.,3.,3.],...,[0.,0.,0.],[0.,0.,0.],
...... #I trained the network in batch with batch size set to 32.
]
y_train = [[1.,0.],[1.,0.],[0.,1.],...,[1.,0.],[0.,1.]]
# binary classification
The skeleton of my code looks like:
self._input = tf.placeholder(tf.float32, [self.batch_size, self.max_seq_length, self.vocab_dim], name='input')
self._target = tf.placeholder(tf.float32, [self.batch_size, 2], name='target')
lstm_cell = rnn_cell.BasicLSTMCell(self.vocab_dim, forget_bias=1.)
lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_ratio)
self.cells = rnn_cell.MultiRNNCell([lstm_cell] * self.num_layers)
self._initial_state = self.cells.zero_state(self.batch_size, tf.float32)
inputs = tf.nn.dropout(self._input, self.dropout_ratio)
inputs = [tf.reshape(input_, (self.batch_size, self.vocab_dim)) for input_ in
          tf.split(1, self.max_seq_length, inputs)]
outputs, states = rnn.rnn(self.cells, inputs, initial_state=self._initial_state)
# We only care about the output of the last RNN cell...
y_pred = tf.nn.xw_plus_b(outputs[-1], tf.get_variable("softmax_w", [self.vocab_dim, 2]), tf.get_variable("softmax_b", [2]))
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_pred, self._target))
correct_pred = tf.equal(tf.argmax(y_pred, 1), tf.argmax(self._target, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)
init = tf.initialize_all_variables()
with tf.Session() as sess:
    initializer = tf.random_uniform_initializer(-0.04, 0.04)
    with tf.variable_scope("model", reuse=True, initializer=initializer):
        sess.run(init)
        # generate batches here (omitted for clarity)
        print sess.run([train_op, loss, accuracy], feed_dict={self._input: batch_x, self._target: batch_y})
The problem is that no matter how large the dataset is, the loss and accuracy show no sign of improvement (they look completely stochastic). Am I doing anything wrong?
Update:
# First, load Word2Vec model in Gensim.
model = Doc2Vec.load(word2vec_path)
# Second, build the dictionary.
gensim_dict = Dictionary()
gensim_dict.doc2bow(model.vocab.keys(), allow_update=True)
w2indx = {v: k + 1 for k, v in gensim_dict.items()}
w2vec = {word: model[word] for word in w2indx.keys()}
# Third, read data from a text file.
for fname in fnames:
    i = 0
    with codecs.open(fname, 'r', encoding='utf8') as fr:
        for line in fr:
            tmp = []
            for t in line.split():
                tmp.append(t)
            X_train.append(tmp)
            i += 1
            if i == samples_count:
                break
# Fourth, convert words into vectors, and pad each sentence with ZERO arrays to a fixed length.
result = np.zeros((len(data), self.max_seq_length, self.vocab_dim), dtype=np.float32)
for rowNo in xrange(len(data)):
    rowLen = len(data[rowNo])
    for colNo in xrange(rowLen):
        word = data[rowNo][colNo]
        if word in w2vec:
            result[rowNo][colNo] = w2vec[word]
        else:
            result[rowNo][colNo] = [0] * self.vocab_dim
    for colPadding in xrange(rowLen, self.max_seq_length):
        result[rowNo][colPadding] = [0] * self.vocab_dim
return result
# Fifth, generate batches and feed them to the model.
... (the rest is trivial and omitted) ...
Here are a few reasons it may not be training, and suggestions to try:
You are not allowing the word vectors to be updated, and the space of pre-learned vectors may not be working properly.
RNNs really need gradient clipping when trained; you can try adding something like the sketch below.
Unit-scale initialization seems to work better, as it accounts for the size of the layer and allows the gradient to be scaled properly as it goes deeper.
You should try removing the dropout and the second layer, just to check whether your data passing is correct and your loss is going down at all.
I can also recommend trying this example with your data: https://github.com/tensorflow/skflow/blob/master/examples/text_classification.py
It trains word vectors from scratch, already has gradient clipping, and uses GRUCells, which are usually easier to train. You can also see nice visualizations for the loss and other things by running tensorboard --logdir=/tmp/tf_examples/word_rnn.
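For the gradient-clipping suggestion, a minimal TF1-style sketch that would replace the plain AdamOptimizer(...).minimize(loss) call from the question (the clip_norm value of 5.0 is an arbitrary choice):
# Clip gradients by global norm before applying them.
optimizer = tf.train.AdamOptimizer(self.lr)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped_grads, variables))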
