I am having trouble understanding how back-propagation works in the encoder of a seq2seq model. There are no labels for the encoder itself, so it seems impossible to compute an error to back-propagate, yet the weights of its LSTM layer are somehow updated.
from keras.layers import Input, LSTM, Dense
from keras.models import Model

# batch_size, embedding_size and encoding_size are defined earlier in my script

# Encoder: returns only its final hidden and cell state
l_enc_input = Input(batch_shape=(batch_size, None, embedding_size))
l_enc_lstm = LSTM(encoding_size, return_sequences=False, return_state=True, stateful=True, dropout=0.2)

# Decoder: initialised with the encoder state
l_dec_input = Input(batch_shape=(batch_size, None, embedding_size))
l_dec_lstm = LSTM(encoding_size, return_sequences=False, stateful=True, dropout=0.2)
l_dec_dense = Dense(embedding_size, activation="softmax")

t_enc_out = l_enc_lstm(l_enc_input)
state = t_enc_out[1:]  # [h, c] of the encoder
t_dec_out = l_dec_dense(l_dec_lstm(l_dec_input, initial_state=state))

model_train = Model(inputs=[l_enc_input, l_dec_input], outputs=[t_dec_out])
model_train.compile(optimizer="adam", loss="categorical_crossentropy")
A seq2seq/autoencoder consists of an encoder that processes the input and a decoder that generates the output. During training, the input is fed to the encoder and the encoder's output (its state) is fed to the decoder. The goal is for the decoder's output to be close to the input, so the loss is computed between the decoder's output and the input. Because the decoder's output depends on the encoder's output, the gradient of that loss flows back through the decoder into the encoder, which is how the encoder's weights get updated even though there are no separate labels for it.
In high level pseudo-code:
Let x be the input.
x' = decoder(encoder(x))
loss = f(x', x)
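To make the gradient flow concrete, here is a minimal, self-contained sketch (not your original model; the layer sizes and random data are made up) showing that the encoder's weights receive non-None gradients from a loss computed only on the decoder's output:

import numpy as np
import tensorflow as tf

embedding_size, encoding_size, batch_size = 8, 16, 4

# Tiny encoder-decoder wired into a single graph
enc_in = tf.keras.Input(shape=(None, embedding_size))
_, h, c = tf.keras.layers.LSTM(encoding_size, return_state=True)(enc_in)
dec_in = tf.keras.Input(shape=(None, embedding_size))
dec_out = tf.keras.layers.LSTM(encoding_size)(dec_in, initial_state=[h, c])
probs = tf.keras.layers.Dense(embedding_size, activation="softmax")(dec_out)
model = tf.keras.Model([enc_in, dec_in], probs)

# Random stand-in data
x_enc = np.random.rand(batch_size, 5, embedding_size).astype("float32")
x_dec = np.random.rand(batch_size, 5, embedding_size).astype("float32")
y = np.random.rand(batch_size, embedding_size).astype("float32")

with tf.GradientTape() as tape:
    pred = model([x_enc, x_dec], training=True)
    loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y, pred))

grads = tape.gradient(loss, model.trainable_variables)
print([g is not None for g in grads])  # True for every weight, including the encoder LSTM's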
Hope that helps!
There is a great explanation here.
Also the wikipedia page is very detailed.
Related
I have been trying to fine-tune a BERT model to give response sentences like a character, based on input sentences, but I am getting a rather odd error every time. The code is below.
Here source_texts is a list of sentences that give the context and target_texts is a list of sentences that give the response to the context statements.
import torch
from transformers import AutoModel, AutoTokenizer, BertForMaskedLM, AdamW

device = torch.device("cuda")

model = AutoModel.from_pretrained("bert-base-cased").to(device)
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Tokenize the context sentences and their target responses
input_ids = []
output_ids = []
for i in range(0, len(source_texts)):
    input_ids.append(tokenizer.encode(source_texts[i], return_tensors="pt"))
    output_ids.append(tokenizer.encode(target_texts[i], return_tensors="pt"))

model = BertForMaskedLM.from_pretrained("bert-base-cased")
optimizer = AdamW(model.parameters(), lr=1e-5)
loss_fn = torch.nn.CrossEntropyLoss()

def train(input_id, output_id):
    input_id = input_id.to(device)
    output_id = output_id.to(device)
    model.zero_grad()
    logits, _ = model(input_id, labels=output_id)
    # Compute the loss
    loss = loss_fn(logits.view(-1, logits.size(-1)), output_id.view(-1))
    loss.backward()
    optimizer.step()
    return loss.item()

for epoch in range(50):
    # Train the model on the training dataset
    train_loss = 0.0
    for input_sequences, output_sequences in zip(input_ids, output_ids):
        input_sequences = input_sequences.to(device)
        output_sequences = output_sequences.to(device)
        train_loss += train(input_sequences, output_sequences)
This is the error that I am getting:
Any help would be really appreciated. Please help!
Hi, I saw your code, but you didn't move your model to the GPU, only the inputs. PyTorch keeps everything on the CPU by default:
import torch
device = torch.device('cuda')
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.to(device)
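As a quick sketch of what the forward call then looks like (reusing the names from your code), both the model's parameters and the tensors you feed it must end up on the same device:

# Model parameters and input tensors now both live on the GPU,
# so the forward pass avoids a CPU/GPU device-mismatch error.
input_id = input_id.to(device)
output_id = output_id.to(device)
outputs = model(input_id, labels=output_id)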
I would like to know what the best approach is for using librosa.feature.mfcc feature extraction with a Random Forest classifier.
The two cases are as follows:
Case 1:
I have 1000 audio files and use the librosa mfcc feature extraction as is:
import numpy as np
import librosa

def extract_features(file_name):
    try:
        durationSeconds = 1
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        trimmed = librosa.util.fix_length(audio, size=int(sample_rate * durationSeconds))
        mfccs = librosa.feature.mfcc(y=trimmed, sr=sample_rate, n_mfcc=40)
        pad_width = max_pad_len - mfccs.shape[1]  # max_pad_len is defined elsewhere in my script
        mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None
    return mfccs
I would then flatten the 3D array generated by this feature extraction via:
X_train_flat = np.array([features_2d.flatten() for features_2d in X_train])
and then send this to the Random Forest classifier.
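To make the feature count concrete: with 40 coefficients and (in my case) 173 padded time frames per file, flattening gives 40 * 173 = 6920 values per file, e.g.:

import numpy as np

# Illustrative shape only: 40 MFCC coefficients x 173 padded time frames
features_2d = np.zeros((40, 173))
print(features_2d.flatten().shape)  # (6920,)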
Case 2:
For the same 1000 audio files, I use:
import pandas as pd

def extract_features(file_name):
    try:
        durationSeconds = 1
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        trimmed = librosa.util.fix_length(audio, size=int(sample_rate * durationSeconds))
        mfccs = librosa.feature.mfcc(y=trimmed, sr=sample_rate, n_mfcc=40)
        # returns MFCC features summarised by their mean and standard deviation along time
        return pd.Series(np.hstack((np.mean(mfccs, axis=1), np.std(mfccs, axis=1))))
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return pd.Series([0] * 80)  # fallback (40 means + 40 standard deviations) so every file still yields a row
I would then pass this X_train data to the Random Forest Classifier.
rfc.fit(X_train, y_train)
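For reference, this is roughly how I assemble X_train for case 2 and fit the classifier; audio_files and labels are placeholders for my actual file list and class labels, and I assume every file parses successfully:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# audio_files and labels stand in for my real dataset
X_train = pd.DataFrame([extract_features(f) for f in audio_files])  # one row of summary MFCC stats per file
y_train = np.array(labels)

rfc = RandomForestClassifier(n_estimators=100, random_state=0)
rfc.fit(X_train, y_train)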
Note that in the first case, after flattening, X_train has size 1000 x 6920, so I am effectively passing 6920 features per file to the Random Forest classifier, compared to the 80 summary statistics (the mean and standard deviation of each of the 40 MFCCs) in the second case.
Can you tell me which approach is correct?
Thanks!
I have just started learning mlr3 and have read the mlr3 book (parameter optimization).
In the book, they provide an example of nested resampling for hyperparameter tuning, but I do not know how to produce the final prediction, i.e. predict(model, test_data). The following code sets up the learner, task, inner resampling (holdout), outer resampling (3-fold CV), and a grid search for tuning. My questions are:
(1) Don't we need to train the optimized model, i.e. the AutoTuner at in this case, with something like train(at, task)?
(2) After training, how do I predict on test data, given that I do not see any split into train and test data?
The code, taken from the mlr3 book (https://mlr3book.mlr-org.com/nested-resampling.html), is as follows:
library("mlr3tuning")
task = tsk("iris")
learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measure = msr("classif.ce")
param_set = paradox::ParamSet$new(
params = list(paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1)))
terminator = trm("evals", n_evals = 5)
tuner = tnr("grid_search", resolution = 10)
at = AutoTuner$new(learner, resampling, measure = measure,
param_set, terminator, tuner = tuner)
rr = resample(task = task, learner = at, resampling = resampling_outer)
See The "Cross-Validation - Train/Predict" misunderstanding.
As the model is not learning, I wanted to make sure I have coded the lines below correctly.
These lines show the architecture of the SRGAN model:
generated_image = generator(input_low_res_img_layer)
discriminator_output = discriminator(generated_image)
fake_features = vgg_model(generated_image)
GAN_model = tf.keras.models.Model(inputs = input_low_res_img_layer, outputs = [discriminator_output,fake_features])
The lines below are used to compile the models:
my_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
vgg_model.compile(loss='mse', optimizer=my_optimizer, metrics=['accuracy'])
discriminator.compile(loss='binary_crossentropy', optimizer= my_optimizer, metrics=['accuracy'])
GAN_model.compile(loss=['binary_crossentropy','mse'], loss_weights=[1e-3,1],metrics=['accuracy'])
The lines below fit the discriminator and the GAN:
discriminator.trainable = True
discriminator.fit(real_and_fake_images,real_and_fake_labels)
discriminator.trainable = False
GAN_model.fit(low_res_img_batch, [np.ones((batch_size,1)),vgg_model(high_res_img_batch)])
The image shows the result of the training; the first metric line of each mini-batch is for the discriminator and the second one is for the GAN network.
In case it is needed, the full code is here.
Right now I am going through the TensorFlow example on LSTMs, which uses the PTB dataset to build an LSTM network that predicts the next word. I have spent a lot of time trying to understand the code and have a good grasp of most of it; however, there is one function that I don't fully understand:
def run_epoch(session, model, eval_op=None, verbose=False):
  """Runs the model on the given data."""
  costs = 0.0
  iters = 0
  state = session.run(model.initial_state)

  fetches = {
      "cost": model.cost,
      "final_state": model.final_state,
  }
  if eval_op is not None:
    fetches["eval_op"] = eval_op

  for step in range(model.input.epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
      feed_dict[c] = state[i].c
      feed_dict[h] = state[i].h

    vals = session.run(fetches, feed_dict)
    cost = vals["cost"]
    state = vals["final_state"]

    costs += cost
    iters += model.input.num_steps

  return np.exp(costs / iters)
My confusion is this: on each pass through the outer loop I believe we process batch_size * num_steps words, do the forward propagation, and do the backward propagation. But how, in the next iteration, do we know to start at the 36th word of each batch row if num_steps = 35? I suspect some attribute of the model class changes on each iteration, but I cannot figure out which. Thanks for your help.
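To make my mental model concrete, here is a toy sketch (not the actual reader code; the data and sizes are made up) of how I picture the input advancing by num_steps words per batch row on each step:

import numpy as np

batch_size, num_steps = 2, 35
data = np.arange(batch_size * 10 * num_steps).reshape(batch_size, -1)  # fake word ids

epoch_size = data.shape[1] // num_steps
for step in range(epoch_size):
    # window `step` starts at word step * num_steps in every batch row,
    # so step 1 would begin at the 36th word when num_steps = 35
    x = data[:, step * num_steps:(step + 1) * num_steps]
    print(step, x[:, 0])  # first word id of each row's window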