Error on tuning parameters using classif.svm in mlr3

I'm using mlr3 to build a machine learning workflow with an SVM classifier. When I try to tune the cost parameter:
library(mlr3)
library(mlr3learners)
library(paradox)
library(mlr3tuning)
task = tsk("pima")
learner = lrn("classif.svm")
learner$param_set
tune_ps = ParamSet$new(list(
  ParamDbl$new("cost", lower = 0.001, upper = 0.1)
))
tune_ps
hout = rsmp("holdout")
measure = msr("classif.ce")
evals20 = term("evals", n_evals = 20)
instance = TuningInstance$new(
  task = task,
  learner = learner,
  resampling = hout,
  measures = measure,
  param_set = tune_ps,
  terminator = evals20
)
tuner = tnr("grid_search", resolution = 10)
result <- tuner$tune(instance)
It outputs the following error:
Error in (function (xs) :
Assertion on 'xs' failed: Condition for 'cost' not ok: type equal C-classification; instead: type=
I can't figure out what is happening there.

We decided to solve this with a more descriptive error message, while still requiring parameters with dependencies to be set explicitly in the ParamSet rather than falling back to the ParamSet defaults.
See https://github.com/mlr-org/paradox/pull/262 and related issues/PRs for more information.
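In practice this means the tuning ParamSet must also contain the parameter that cost depends on. A minimal sketch of a corrected ParamSet, assuming the default C-classification type is what you want:
tune_ps = ParamSet$new(list(
  # include the parameter `cost` depends on, so the dependency check passes
  ParamFct$new("type", levels = "C-classification"),
  ParamDbl$new("cost", lower = 0.001, upper = 0.1)
))
Grid search then proposes type = "C-classification" together with every cost value, and the assertion no longer fails.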

Related

Training a random forest (ranger) using caret with a custom F4 metric in R: after the full run, an "undefined columns selected" error appears

library(MLmetrics)
library(caret)
library(doSNOW)
library(ranger)
The data is the "bank-additional-full" dataset (the link in the original post is broken), and the following code generates data1:
library(VIM)
data1 <- hotdeck(data,
                 variable = c('job', 'marital', 'education', 'default', 'housing', 'loan'),
                 domain_var = "y", imp_var = FALSE)
#converting the categorical variables to factors as they should be
library(magrittr)
library(dplyr)  # mutate_at() comes from dplyr, not magrittr
data1 %<>%
  mutate_at(colnames(data1)[grepl('factor|logical|character', sapply(data1, class))], factor)
Now, splitting the data:
library(caret)
# splitting data into train/test 70/30
set.seed(1234)
trainIndex <- createDataPartition(data1$y, p = 0.7, times = 1, list = FALSE)
train <- data1[trainIndex, -11]
test <- data1[-trainIndex, -11]
levels(train$y)
train$y = as.factor(train$y)
# train$y = factor(train$y,levels = c("yes","no"))
# train$y = relevel(train$y,ref="yes")
Here, I got the idea of how to create an F1 metric from "Training Model in Caret Using F1 Metric", and using the F-beta score formula I created f1_val. Now I can't understand what lev, obs, and pred indicate: my train dataset only has the column y, and there is no data$obs or data$pred. Is the following error due to this, and how do I rectify it?
f1 <- function(data, lev = NULL, model = NULL) {
  precision <- precision(data$obs, data$pred)
  recall <- sensitivity(data$obs, data$pred)
  # F-beta score with beta = 4: (1 + 4^2) * P * R / (4^2 * P + R)
  f1_val <- (17 * precision * recall) / (16 * precision + recall)
  names(f1_val) <- c("F1")
  f1_val
}
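For reference, caret constructs the data argument itself for every resample, so obs and pred are not columns of your training set: data$obs holds the observed outcomes on the held-out fold, data$pred the corresponding predictions, and with classProbs = TRUE there is one extra probability column per class; lev carries the outcome factor levels. A sketch of the same F4 metric against that interface, with arguments in the order caret::precision() and caret::sensitivity() expect (predictions first, observed reference second); f4_summary is a hypothetical name:
f4_summary <- function(data, lev = NULL, model = NULL) {
  prec <- caret::precision(data$pred, reference = data$obs)
  rec  <- caret::sensitivity(data$pred, reference = data$obs)
  out  <- (17 * prec * rec) / (16 * prec + rec)  # F-beta with beta = 4
  names(out) <- "F1"  # must match metric = "F1" in train()
  out
}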
tgrid <- expand.grid(
  .mtry = 1:5,
  .splitrule = "gini",
  .min.node.size = seq(1, 500, 75)
)
model_caret <- train(train$y ~ ., data = train,
                     method = "ranger",
                     trControl = trainControl(method = "cv",
                                              number = 2,
                                              verboseIter = TRUE,
                                              classProbs = TRUE,
                                              summaryFunction = f1),
                     tuneGrid = tgrid,
                     num.trees = 500,
                     importance = "impurity",
                     metric = "F1")
After running for 3-4 minutes, we get the following:
Aggregating results
Selecting tuning parameters
Fitting mtry = 5, splitrule = gini, min.node.size = 1 on full training set
but then this error:
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :
undefined columns selected
Also, when calling model_caret we get:
Error: object 'model_caret' not found
Kindly help. Thanks in advance.
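Judging from the error message, the formula is the likely culprit: with train$y ~ ., caret's formula interface calls all.vars() on the formula and then tries to select columns named train and y from data, and train is not a column, hence "undefined columns selected"; the later "object 'model_caret' not found" simply follows because train() aborted before the assignment. A plausible fix (a sketch, not tested on this data) is to reference the outcome by its bare column name:
model_caret <- train(y ~ ., data = train,
                     method = "ranger",
                     trControl = trainControl(method = "cv", number = 2,
                                              verboseIter = TRUE, classProbs = TRUE,
                                              summaryFunction = f1),
                     tuneGrid = tgrid,
                     num.trees = 500,
                     importance = "impurity",
                     metric = "F1")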

Problem with Tuning & Benchmark "surv.svm"

I get different error messages when I try to tune/benchmark "surv.svm".
For tuning, I get the following error:
Error in kernelMatrix(Xtrain = sv, kernel_type = kernel_type, kernel_pars = kernel_pars, : additiv kernel can not be applied on constant column
For benchmarking, I get the following error when poly_kernel is listed:
Error in tcrossprod(K, Dc) : non-conformable arguments
When poly_kernel is removed, I get a different error message.
What is the problem and how can I solve it?
task = tsk("actg")
learner = as_learner(ppl("distrcompositor",
  learner = lrn("surv.svm", type = "regression",
    kernel = to_tune(c("lin_kernel", "add_kernel", "rbf_kernel")),
    gamma.mu = to_tune(p_dbl(-3, 1, trafo = function(x) 10^x))),
  estimator = "kaplan", form = "ph"))
set.seed(82721)
inner_cv = rsmp("cv", folds = 2)
at_learner = AutoTuner$new(learner = learner,
  resampling = inner_cv,
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 96),
  tuner = tnr("irace"))
at_learner$train(task)
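The first error message points at constant feature columns. A hypothetical workaround, assuming the add_kernel failure comes from features that are constant (possibly only within a resampling fold): drop constant columns in the pipeline before the SVM with po("removeconstants") from mlr3pipelines:
graph = po("removeconstants") %>>%
  ppl("distrcompositor",
      learner = lrn("surv.svm", type = "regression",
                    kernel = to_tune(c("lin_kernel", "add_kernel", "rbf_kernel")),
                    gamma.mu = to_tune(p_dbl(-3, 1, trafo = function(x) 10^x))),
      estimator = "kaplan", form = "ph")
learner = as_learner(graph)
The poly_kernel tcrossprod() failure is a separate shape mismatch and would need its own investigation.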

How to construct the FSelectInstanceSingleCrit in mlr3fselect?

I copied the code from the mlr3 book:
library(mlr3verse)
task = tsk("pima")
print(task)
learner = lrn("classif.rpart")
hout = rsmp("holdout")
measure = msr("classif.ce")
evals20 = trm("evals", n_evals = 20)
instance = FSelectInstanceSingleCrit$new(
  task = task,
  learner = learner,
  resampling = hout,
  measure = measure,
  terminator = evals20
)
But I always get this error:
Error in initialize(...) : unused argument (store_x_domain = FALSE)
Is there anything wrong with this code? Could someone give some suggestions? Thank you.
Update your packages with update.packages(); you are using an old version of mlr3fselect.
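For example:
packageVersion("mlr3fselect")    # check the installed version
install.packages("mlr3fselect")  # reinstall the current CRAN release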

How to test our model in mlr3 with nested hyperparameter optimization

I have just started learning mlr3 and have read the mlr3 book (the chapter on parameter optimization).
In the book, they provide an example of nested hyperparameter tuning, but I do not know how to produce the final prediction, i.e. predict(model, test_data). The following code sets up the learner, task, inner resampling (holdout), outer resampling (3-fold CV), and grid search for tuning. My questions are:
(1) Don't we need to train the optimized model, i.e. at in this case, like train(at, task)?
(2) After training, how do we predict on test data, given that there is no visible train/test split?
The code, taken from the mlr3 book (https://mlr3book.mlr-org.com/nested-resampling.html), is as follows:
library("mlr3tuning")
task = tsk("iris")
learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measure = msr("classif.ce")
param_set = paradox::ParamSet$new(
params = list(paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1)))
terminator = trm("evals", n_evals = 5)
tuner = tnr("grid_search", resolution = 10)
at = AutoTuner$new(learner, resampling, measure = measure,
param_set, terminator, tuner = tuner)
rr = resample(task = task, learner = at, resampling = resampling_outer)
See The "Cross-Validation - Train/Predict" misunderstanding.

Find the best pipeline model using CrossValidator and ParamGridBuilder

I have an acceptable model, but I would like to improve it by tuning its parameters in a Spark ML Pipeline with CrossValidator and ParamGridBuilder.
As the Estimator I will use the existing pipeline.
For estimatorParamMaps I do not know what to put; I do not understand it.
As the Evaluator I will use the RegressionEvaluator created previously.
I am going to use 5 folds, with a list of 10 different depth values for the tree.
How can I select and show the best model for the lowest RMSE?
Current code:
from pyspark.ml import Pipeline
from pyspark.ml.regression import DecisionTreeRegressor
from pyspark.ml.feature import VectorIndexer
from pyspark.ml.evaluation import RegressionEvaluator
dt = DecisionTreeRegressor()
dt.setPredictionCol("Predicted_PE")
dt.setMaxBins(100)
dt.setFeaturesCol("features")
dt.setLabelCol("PE")
dt.setMaxDepth(8)
pipeline = Pipeline(stages=[vectorizer, dt])
model = pipeline.fit(trainingSetDF)
regEval = RegressionEvaluator(predictionCol="Predicted_PE", labelCol="PE", metricName="rmse")
rmse = regEval.evaluate(predictions)
print("Root Mean Squared Error: %.2f" % rmse)
Output:
Root Mean Squared Error: 3.60
What I need:
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
dt2 = DecisionTreeRegressor()
dt2.setPredictionCol("Predicted_PE")
dt2.setMaxBins(100)
dt2.setFeaturesCol("features")
dt2.setLabelCol("PE")
dt2.setMaxDepth(10)
pipeline2 = Pipeline(stages=[vectorizer, dt2])
model2 = pipeline2.fit(trainingSetDF)
regEval2 = RegressionEvaluator(predictionCol="Predicted_PE", labelCol="PE", metricName="rmse")
paramGrid = ParamGridBuilder().build()  # ??????
crossval = CrossValidator(estimator=pipeline2, estimatorParamMaps=paramGrid,
                          evaluator=regEval2, numFolds=5)  # ?????
rmse2 = regEval2.evaluate(predictions)
# bestPipeline = ????
# bestLRModel = ????
# bestParams = ????
print("Root Mean Squared Error: %.2f" % rmse2)
Output:
Root Mean Squared Error: 3.60  # the same?
You need to call .fit() with your training data on the crossval object to create the CV model; that runs the cross-validation. Then you can get the best model (according to your evaluator metric) from it, e.g.:
cvModel = crossval.fit(trainingData)
myBestModel = cvModel.bestModel
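To fill in the remaining placeholders, a sketch in the same vein, assuming the names from the question plus a held-out testSetDF (not defined in the original post) for the final evaluation:
# grid over 10 maxDepth values, as described in the question
paramGrid = (ParamGridBuilder()
             .addGrid(dt2.maxDepth, list(range(1, 11)))
             .build())
crossval = CrossValidator(estimator=pipeline2, estimatorParamMaps=paramGrid,
                          evaluator=regEval2, numFolds=5)
cvModel = crossval.fit(trainingSetDF)   # runs the 5-fold cross-validation
bestPipeline = cvModel.bestModel        # fitted PipelineModel with the best params
bestDTModel = bestPipeline.stages[-1]   # the fitted DecisionTreeRegressionModel
bestDepth = bestDTModel.getMaxDepth()   # winning depth (recent PySpark versions)
rmse2 = regEval2.evaluate(bestPipeline.transform(testSetDF))
print("Best maxDepth: %d, RMSE: %.2f" % (bestDepth, rmse2))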
