I have just started learning mlr3 and have read the mlr3 book (the chapter on parameter optimization).
In the book, an example of nested resampling for hyperparameter tuning is provided, but I do not see how to obtain the final prediction, i.e. predict(model, test_data). The following code provides the learner, task, inner resampling (holdout), outer resampling (3-fold CV), and grid search for tuning. My questions are:
(1) Don't we need to train the optimized model, i.e. at in this case, with something like train(at, task)?
(2) After training, how do I predict with test data? I am not seeing any split into train and test data.
The code, taken from the mlr3 book (https://mlr3book.mlr-org.com/nested-resampling.html), is as follows:
library("mlr3tuning")
task = tsk("iris")
learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measure = msr("classif.ce")
param_set = paradox::ParamSet$new(
params = list(paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1)))
terminator = trm("evals", n_evals = 5)
tuner = tnr("grid_search", resolution = 10)
at = AutoTuner$new(learner, resampling, measure = measure,
param_set, terminator, tuner = tuner)
resampling_outer = rsmp("cv", folds = 3)
rr = resample(task = task, learner = at, resampling = resampling_outer)
See The "Cross-Validation - Train/Predict" misunderstanding.
final_poly_converter = PolynomialFeatures(degree=3,include_bias=False)
final_poly_features = final_poly_converter.fit_transform(X)
final_scaler = StandardScaler()
scaled_X = final_scaler.fit_transform(final_poly_features)
from sklearn.linear_model import Lasso
final_model = Lasso(alpha=0.004943070909225827,max_iter=1000000)
final_model.fit(scaled_X,y)
from joblib import dump,load
dump(final_model,'lasso_model.joblib')
dump(final_poly_converter,'lasso_poly_coverter.joblib')
dump(final_scaler,'scaler.joblib')
loaded_converter = load('lasso_poly_coverter.joblib')
loaded_model = load('lasso_model.joblib')
loaded_scaler = load('scaler.joblib')
campaign = [[149,22,12]]
transformed_data = loaded_converter.fit_transform(campaign)
scaled_data = loaded_scaler.transform(transformed_data)# fit_transform or only transform
loaded_model.predict(scaled_data)
The output values change depending on whether I use fit_transform() or transform().
You should always use fit_transform on your training data and only transform on test data and further predictions. If you refit your scaler on the test pool, you would have a different feature distribution in your test set vs. your train set, which is something you don't want. Think of the scaler parameters that you fit as part of the model parameters: naturally, you fit all the parameters on the training set and then you don't change them for test evaluation/prediction.
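For example, a minimal sketch of that pattern (toy data, an illustrative alpha, and variable names chosen here for clarity; not the original notebook's code):
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
import numpy as np

# toy data just to make the example self-contained
X = np.random.rand(200, 3) * 100
y = X @ np.array([0.05, 0.1, 0.02]) + np.random.randn(200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

poly = PolynomialFeatures(degree=3, include_bias=False)
scaler = StandardScaler()

# fit_transform only on the training data: this is where the feature means/scales are learned
X_train_prep = scaler.fit_transform(poly.fit_transform(X_train))
model = Lasso(alpha=0.005, max_iter=1_000_000)
model.fit(X_train_prep, y_train)

# transform (no refitting) on the test set and on any new campaign
X_test_prep = scaler.transform(poly.transform(X_test))
test_pred = model.predict(X_test_prep)

campaign = [[149, 22, 12]]
campaign_pred = model.predict(scaler.transform(poly.transform(campaign)))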
I have been trying to fine-tune a BERT model to give response sentences like a character, based on input sentences, but I am getting a rather odd error every time. The code is below.
Here source_texts is a list of sentences that give the context and target_texts is a list of sentences that give the response to the context statements.
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("bert-base-cased").to(device)
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
input_ids = []
output_ids = []
for i in range(0, len(source_texts)):
    input_ids.append(tokenizer.encode(source_texts[i], return_tensors="pt"))
    output_ids.append(tokenizer.encode(target_texts[i], return_tensors="pt"))
import torch
device = torch.device("cuda")
from transformers import BertForMaskedLM, AdamW
model = BertForMaskedLM.from_pretrained("bert-base-cased")
optimizer = AdamW(model.parameters(), lr=1e-5)
loss_fn = torch.nn.CrossEntropyLoss()
def train(input_id, output_id):
    input_id = input_id.to(device)
    output_id = output_id.to(device)
    model.zero_grad()
    logits, _ = model(input_id, labels=output_id)
    # Compute the loss
    loss = loss_fn(logits.view(-1, logits.size(-1)), output_id.view(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
for epoch in range(50):
    # Train the model on the training dataset
    train_loss = 0.0
    for input_sequences, output_sequences in zip(input_ids, output_ids):
        input_sequences = input_sequences.to(device)
        output_sequences = output_sequences.to(device)
        train_loss += train(input_sequences, output_sequences)
This is the error that I am getting.
Any help would be really appreciated. Please help!
Hi, I saw your code: you didn't move your model to the GPU, only the inputs. PyTorch keeps tensors and models on the CPU by default.
import torch
device = torch.device('cuda')
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.to(device)
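As a small sanity check (not part of the original answer), you can confirm where the parameters ended up before starting the training loop:
# after model.to(device), every parameter should report device cuda:0
print(next(model.parameters()).device)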
I'm using mlr3 to build a machine learning workflow with an SVM classifier. When I try to tune the parameter:
library(mlr3)
library(mlr3learners)
library(paradox)
library(mlr3tuning)
task = tsk("pima")
learner = lrn("classif.svm")
learner$param_set
tune_ps = ParamSet$new(list(
ParamDbl$new("cost", lower = 0.001, upper = 0.1)
))
tune_ps
hout = rsmp("holdout")
measure = msr("classif.ce")
evals20 = term("evals", n_evals = 20)
instance = TuningInstance$new(
task = task,
learner = learner,
resampling = hout,
measures = measure,
param_set = tune_ps,
terminator = evals20
)
tuner = tnr("grid_search", resolution = 10)
result <- tuner$tune(instance)
It outputs the error
Error in (function (xs) :
Assertion on 'xs' failed: Condition for 'cost' not ok: type equal C-classification; instead: type=
I can't figure out what is happening there.
We decided to solve this with a more descriptive error message, but parameters with dependencies still have to be set explicitly in the ParamSet (here, type = "C-classification", on which cost depends) rather than falling back to the ParamSet defaults.
See https://github.com/mlr-org/paradox/pull/262 and related issues/PRs for more information.
I have trouble understanding how back-propagation works for the encoder in a seq2seq model. There are no labels for the encoder, so it seems impossible to calculate the error that is back-propagated, yet the weights of its LSTM layer are somehow updated.
l_enc_input = Input(batch_shape=(batch_size, None, embedding_size))
l_enc_lstm = LSTM(encoding_size, return_sequences=False, return_state=True, stateful=True, dropout=0.2)
l_dec_input = Input(batch_shape=(batch_size, None, embedding_size))
l_dec_lstm = LSTM(encoding_size, return_sequences=False, stateful=True, dropout=0.2)
l_dec_dense = Dense(embedding_size, activation="softmax")
t_enc_out = l_enc_lstm(l_enc_input)
state = t_enc_out[1:]
t_dec_out = l_dec_dense(l_dec_lstm(l_dec_input, initial_state=state))
model_train = Model(inputs=[l_enc_input, l_dec_input], outputs=[t_dec_out])
model_train.compile(optimizer="adam", loss="categorical_crossentropy")
A seq2seq/autoencoder consists of an encoder that processes the input and a decoder that generates the output. During training, the input is provided to the encoder, and the output of the encoder (its final state) is provided to the decoder as its initial state. The goal is for the output of the decoder to be close to the input, so this is how the loss is computed: between the output of the decoder and the input. Because the encoder and decoder form one differentiable graph, the gradient of this loss flows back through the decoder into the encoder, which is how the encoder's LSTM weights get updated even though the encoder has no labels of its own.
In high level pseudo-code:
Let x be the input.
x' = decoder(encoder(x))
loss = f(x', x)
Hope that helps!
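To make that concrete, here is a minimal, self-contained Keras sketch of the same idea (toy shapes and random data, not the code from the question), showing the encoder being updated purely through the loss on the decoder output:
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

timesteps, embedding_size, encoding_size = 5, 8, 16  # toy sizes

enc_input = Input(shape=(timesteps, embedding_size))
_, state_h, state_c = LSTM(encoding_size, return_state=True)(enc_input)  # encoder

dec_input = Input(shape=(timesteps, embedding_size))
dec_hidden = LSTM(encoding_size, return_sequences=True)(dec_input, initial_state=[state_h, state_c])
dec_output = Dense(embedding_size, activation="softmax")(dec_hidden)  # decoder

model = Model([enc_input, dec_input], dec_output)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# the "label" is simply the target sequence (here the input itself, autoencoder style);
# the gradient of the loss flows back through the decoder, through initial_state,
# and into the encoder's LSTM weights, so the encoder trains without labels of its own
x = np.random.rand(32, timesteps, embedding_size).astype("float32")
model.fit([x, x], x, epochs=1, verbose=0)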
There is a great explanation here.
Also the wikipedia page is very detailed.
Right now I am going through the TensorFlow example on LSTMs, where the PTB dataset is used to create an LSTM network capable of predicting the next word. I've spent a lot of time trying to understand the code and have a good understanding of most of it, but there is one function which I don't fully grasp:
def run_epoch(session, model, eval_op=None, verbose=False):
    """Runs the model on the given data."""
    costs = 0.0
    iters = 0
    state = session.run(model.initial_state)

    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op

    for step in range(model.input.epoch_size):
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h

        vals = session.run(fetches, feed_dict)
        cost = vals["cost"]
        state = vals["final_state"]

        costs += cost
        iters += model.input.num_steps

    return np.exp(costs / iters)
My confusion is this: each time through the outer loop I believe we have processed batch_size * num_steps words and done the forward and backward propagation. But how, in the next iteration, do we know to start with the 36th word of each batch if num_steps = 35? I suspect it is some change in an attribute of the model class on each iteration, but I cannot figure it out. Thanks for your help.
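For intuition, one common way such a pipeline is organised (a rough pure-Python sketch of the slicing idea only, not the tutorial's actual reader code) is to cut each of the batch_size parallel word streams into consecutive num_steps windows, so the window used in step k + 1 starts exactly where the window in step k ended:

def epoch_batches(data, batch_size, num_steps):
    # data: a flat list of word ids, split into batch_size parallel streams
    batch_len = len(data) // batch_size
    streams = [data[i * batch_len:(i + 1) * batch_len] for i in range(batch_size)]
    epoch_size = (batch_len - 1) // num_steps
    for step in range(epoch_size):
        lo, hi = step * num_steps, (step + 1) * num_steps
        x = [s[lo:hi] for s in streams]          # words lo .. hi-1 of every stream
        y = [s[lo + 1:hi + 1] for s in streams]  # next-word targets, shifted by one
        yield x, y

Combined with feeding the previous final_state back in as the initial_state (the feed_dict lines in run_epoch), the network therefore continues at word 36 of each stream with the hidden state it had after word 35.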