How do I tune a random forest with OOB error? - mlr3

Instead of doing CV and training the random forest multiple times, I would like to use the OOB error as an unbiased estimate of the generalization error.
Also, with only a few data points (in the low thousands), does it make sense to use the OOB error instead of CV, since it might be that only a few data points end up out of bag?
So far I could only find something about this in the mlr issue thread https://github.com/mlr-org/mlr/issues/338. I think the suggestion there is to use a holdout split with almost only training data.
I found the insample resampling method https://mlr3.mlr-org.com/reference/mlr_resamplings_insample.html which uses the same data for training and testing.
This is my code:
learner = as_learner(
  po("select", selector = selector_name(selection)) %>>%
    po("learner", learner = lrn("regr.ranger"))
)
sp = ps(
  regr.ranger.mtry.ratio = p_dbl(0, 1),
  regr.ranger.replace = p_fct(c(TRUE, FALSE)),
  regr.ranger.sample.fraction = p_dbl(0.1, 1),
  regr.ranger.num.trees = p_int(1, 2000)
)
at = auto_tuner(
  resampling = rsmp("insample"),
  method = "random_search",
  learner = learner,
  measure = msr("oob_error"),
  term_evals = 5,
  search_space = sp
)
learners = c(at)
resamplings = rsmp("cv", folds = 5)
design = benchmark_grid(task, learners, resamplings)
bmr = benchmark(design)
But when running the code above, I get the error:
Error in learner$oob_error() : attempt to apply non-function

The problem is that the resulting GraphLearner no longer has the method oob_error(). This is similar to the issue here:
https://github.com/mlr-org/mlr3pipelines/issues/291
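You can verify this directly (a quick check, reusing the setup from the question):
library(mlr3verse)
selection = c("mpg", "cyl")
glrn = as_learner(
  po("select", selector = selector_name(selection)) %>>%
    po("learner", learner = lrn("regr.ranger"))
)
"oob_error" %in% names(glrn)               # FALSE -- the method is gone
"oob_error" %in% names(lrn("regr.ranger")) # TRUE  -- the base learner has it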
Edit: Add workaround.
This suggestion should be seen as a workaround.
The idea is that it is possible to write a custom measure, as mentioned in the comments. A tutorial on that can be found in the mlr3 book.
This custom measure only works in this specific case because it is tailored to the specific structure of the GraphLearner. For a different learner, the measure would have to be adjusted.
library(mlr3verse)
#> Loading required package: mlr3
task = tsk("mtcars")
selection = c("mpg", "cyl")
learner = as_learner(
  po("select", selector = selector_name(selection)) %>>%
    po("learner", learner = lrn("regr.ranger"))
)
sp = ps(
  regr.ranger.mtry.ratio = p_dbl(0, 1),
  regr.ranger.replace = p_fct(c(TRUE, FALSE)),
  regr.ranger.sample.fraction = p_dbl(0.1, 1),
  regr.ranger.num.trees = p_int(1, 2000)
)
MyMeasure = R6::R6Class(
  "MyMeasure",
  inherit = MeasureRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "MyMeasure",
        range = c(-Inf, Inf),
        minimize = TRUE,
        predict_type = "response",
        properties = "requires_learner"
      )
    }
  ),
  private = list(
    .score = function(prediction, learner, ...) {
      # Reach into the GraphLearner state to get the ranger model and
      # return its OOB prediction error; requires store_models = TRUE.
      model = learner$state$model$regr.ranger
      if (is.null(model)) stop("Set store_models = TRUE.")
      model$model$prediction.error
    }
  )
)
at = auto_tuner(
  resampling = rsmp("insample"),
  method = "random_search",
  learner = learner,
  measure = MyMeasure$new(),
  term_evals = 1,
  search_space = sp,
  store_models = TRUE
)
learners = c(at)
resamplings = rsmp("cv", folds = 5)
design = benchmark_grid(task, learners, resamplings)
lgr::get_logger("mlr3")$set_threshold(NULL)
lgr::get_logger("mlr3tuning")$set_threshold(NULL)
bmr = benchmark(design)
#> INFO [23:28:45.638] [mlr3] Running benchmark with 5 resampling iterations
#> INFO [23:28:45.740] [mlr3] Applying learner 'select.regr.ranger.tuned' on task 'mtcars' (iter 1/5)
#> INFO [23:28:47.112] [bbotk] Starting to optimize 4 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=1, k=0]'
#> INFO [23:28:47.158] [bbotk] Evaluating 1 configuration(s)
#> INFO [23:28:47.201] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [23:28:47.209] [mlr3] Applying learner 'select.regr.ranger' on task 'mtcars' (iter 1/1)
#> INFO [23:28:47.346] [mlr3] Finished benchmark
#> INFO [23:28:47.419] [bbotk] Result of batch 1:
#> INFO [23:28:47.424] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:47.424] [bbotk] 0.5708216 TRUE 0.4830289
#> INFO [23:28:47.424] [bbotk] regr.ranger.num.trees MyMeasure warnings errors runtime_learners
#> INFO [23:28:47.424] [bbotk] 1209 11.39842 0 0 0.124
#> INFO [23:28:47.424] [bbotk] uhash
#> INFO [23:28:47.424] [bbotk] abfcaa2f-8b01-4821-8e8b-1d209fbe2229
#> INFO [23:28:47.444] [bbotk] Finished optimizing after 1 evaluation(s)
#> INFO [23:28:47.445] [bbotk] Result:
#> INFO [23:28:47.447] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:47.447] [bbotk] 0.5708216 TRUE 0.4830289
#> INFO [23:28:47.447] [bbotk] regr.ranger.num.trees learner_param_vals x_domain MyMeasure
#> INFO [23:28:47.447] [bbotk] 1209 <list[6]> <list[4]> 11.39842
#> INFO [23:28:47.616] [mlr3] Applying learner 'select.regr.ranger.tuned' on task 'mtcars' (iter 2/5)
#> INFO [23:28:47.733] [bbotk] Starting to optimize 4 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=1, k=0]'
#> INFO [23:28:47.758] [bbotk] Evaluating 1 configuration(s)
#> INFO [23:28:47.799] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [23:28:47.807] [mlr3] Applying learner 'select.regr.ranger' on task 'mtcars' (iter 1/1)
#> INFO [23:28:47.900] [mlr3] Finished benchmark
#> INFO [23:28:47.969] [bbotk] Result of batch 1:
#> INFO [23:28:47.971] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:47.971] [bbotk] 0.9683787 FALSE 0.4303312
#> INFO [23:28:47.971] [bbotk] regr.ranger.num.trees MyMeasure warnings errors runtime_learners
#> INFO [23:28:47.971] [bbotk] 112 9.594568 0 0 0.084
#> INFO [23:28:47.971] [bbotk] uhash
#> INFO [23:28:47.971] [bbotk] 4bb2742b-49e2-4b02-adc4-ffaa70aef8d4
#> INFO [23:28:47.984] [bbotk] Finished optimizing after 1 evaluation(s)
#> INFO [23:28:47.984] [bbotk] Result:
#> INFO [23:28:47.986] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:47.986] [bbotk] 0.9683787 FALSE 0.4303312
#> INFO [23:28:47.986] [bbotk] regr.ranger.num.trees learner_param_vals x_domain MyMeasure
#> INFO [23:28:47.986] [bbotk] 112 <list[6]> <list[4]> 9.594568
#> INFO [23:28:48.116] [mlr3] Applying learner 'select.regr.ranger.tuned' on task 'mtcars' (iter 3/5)
#> INFO [23:28:48.241] [bbotk] Starting to optimize 4 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=1, k=0]'
#> INFO [23:28:48.266] [bbotk] Evaluating 1 configuration(s)
#> INFO [23:28:48.308] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [23:28:48.316] [mlr3] Applying learner 'select.regr.ranger' on task 'mtcars' (iter 1/1)
#> INFO [23:28:48.413] [mlr3] Finished benchmark
#> INFO [23:28:48.480] [bbotk] Result of batch 1:
#> INFO [23:28:48.483] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:48.483] [bbotk] 0.4089994 TRUE 0.1780138
#> INFO [23:28:48.483] [bbotk] regr.ranger.num.trees MyMeasure warnings errors runtime_learners
#> INFO [23:28:48.483] [bbotk] 620 38.86261 0 0 0.089
#> INFO [23:28:48.483] [bbotk] uhash
#> INFO [23:28:48.483] [bbotk] 9b47bdb0-15dc-421d-9091-db2e6c41cbee
#> INFO [23:28:48.495] [bbotk] Finished optimizing after 1 evaluation(s)
#> INFO [23:28:48.496] [bbotk] Result:
#> INFO [23:28:48.498] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:48.498] [bbotk] 0.4089994 TRUE 0.1780138
#> INFO [23:28:48.498] [bbotk] regr.ranger.num.trees learner_param_vals x_domain MyMeasure
#> INFO [23:28:48.498] [bbotk] 620 <list[6]> <list[4]> 38.86261
#> INFO [23:28:48.646] [mlr3] Applying learner 'select.regr.ranger.tuned' on task 'mtcars' (iter 4/5)
#> INFO [23:28:48.763] [bbotk] Starting to optimize 4 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=1, k=0]'
#> INFO [23:28:48.788] [bbotk] Evaluating 1 configuration(s)
#> INFO [23:28:48.829] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [23:28:48.837] [mlr3] Applying learner 'select.regr.ranger' on task 'mtcars' (iter 1/1)
#> INFO [23:28:48.959] [mlr3] Finished benchmark
#> INFO [23:28:49.027] [bbotk] Result of batch 1:
#> INFO [23:28:49.030] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:49.030] [bbotk] 0.3449179 FALSE 0.344375
#> INFO [23:28:49.030] [bbotk] regr.ranger.num.trees MyMeasure warnings errors runtime_learners
#> INFO [23:28:49.030] [bbotk] 1004 11.96155 0 0 0.112
#> INFO [23:28:49.030] [bbotk] uhash
#> INFO [23:28:49.030] [bbotk] d14754c3-ab73-4777-84bd-10daa10318f0
#> INFO [23:28:49.043] [bbotk] Finished optimizing after 1 evaluation(s)
#> INFO [23:28:49.044] [bbotk] Result:
#> INFO [23:28:49.046] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:49.046] [bbotk] 0.3449179 FALSE 0.344375
#> INFO [23:28:49.046] [bbotk] regr.ranger.num.trees learner_param_vals x_domain MyMeasure
#> INFO [23:28:49.046] [bbotk] 1004 <list[6]> <list[4]> 11.96155
#> INFO [23:28:49.203] [mlr3] Applying learner 'select.regr.ranger.tuned' on task 'mtcars' (iter 5/5)
#> INFO [23:28:49.327] [bbotk] Starting to optimize 4 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=1, k=0]'
#> INFO [23:28:49.352] [bbotk] Evaluating 1 configuration(s)
#> INFO [23:28:49.393] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [23:28:49.401] [mlr3] Applying learner 'select.regr.ranger' on task 'mtcars' (iter 1/1)
#> INFO [23:28:49.537] [mlr3] Finished benchmark
#> INFO [23:28:49.614] [bbotk] Result of batch 1:
#> INFO [23:28:49.616] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:49.616] [bbotk] 0.4485645 FALSE 0.4184389
#> INFO [23:28:49.616] [bbotk] regr.ranger.num.trees MyMeasure warnings errors runtime_learners
#> INFO [23:28:49.616] [bbotk] 1931 12.59067 0 0 0.127
#> INFO [23:28:49.616] [bbotk] uhash
#> INFO [23:28:49.616] [bbotk] 295d1dc0-810d-4351-9bb4-7255fca38be3
#> INFO [23:28:49.629] [bbotk] Finished optimizing after 1 evaluation(s)
#> INFO [23:28:49.630] [bbotk] Result:
#> INFO [23:28:49.631] [bbotk] regr.ranger.mtry.ratio regr.ranger.replace regr.ranger.sample.fraction
#> INFO [23:28:49.631] [bbotk] 0.4485645 FALSE 0.4184389
#> INFO [23:28:49.631] [bbotk] regr.ranger.num.trees learner_param_vals x_domain MyMeasure
#> INFO [23:28:49.631] [bbotk] 1931 <list[6]> <list[4]> 12.59067
#> INFO [23:28:49.806] [mlr3] Finished benchmark
Created on 2023-01-30 with reprex v2.0.2

Related

Setting `early_stopping_rounds` in xgboost learner using mlr3

I want to tune an xgboost learner and set the parameter early_stopping_rounds to 10% of the parameter nrounds (whatever value is generated each time).
This should be a simple thing to do in general (i.e. tuning one parameter relative to another), but I can't make it work; see the example below:
library(mlr3verse)
#> Loading required package: mlr3
learner = lrn('surv.xgboost', nrounds = to_tune(50, 5000),
  early_stopping_rounds = to_tune(ps(
    a = p_int(10, 5000), # had to put something in here, `early_stopping_rounds` also doesn't work
    .extra_trafo = function(x, param_set) {
      list(early_stopping_rounds = ceiling(0.1 * x$nrounds))
    }, .allow_dangling_dependencies = TRUE)))
#> Error in self$assert(xs): Assertion on 'xs' failed: early_stopping_rounds: tune token invalid: to_tune(ps(a = p_int(10, 5000), .extra_trafo = function(x, param_set) { list(early_stopping_rounds = ceiling(0.1 * x$nrounds)) }, .allow_dangling_dependencies = TRUE)) generates points that are not compatible with param early_stopping_rounds.
#> Bad value:
#> numeric(0)
#> Parameter:
#> id class lower upper levels default
#> 1: early_stopping_rounds ParamInt 1 Inf .
# this works though:
pam = ps(z = p_int(-3, 3), x = p_int(0, 10),
  .extra_trafo = function(x, param_set) {
    x$z = 2 * (x$x) # overwrite z as 2*x
    x
  })
dplyr::bind_rows(generate_design_random(pam, 5)$transpose())
#> # A tibble: 5 × 2
#> z x
#> <dbl> <int>
#> 1 2 1
#> 2 14 7
#> 3 8 4
#> 4 12 6
#> 5 20 10
Created on 2022-08-29 by the reprex package (v2.0.1)
The reason your solution is not working is that you are referencing x$nrounds inside a param set in which it does not exist.
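To see why: the ps() inside to_tune() is a self-contained search space, so its .extra_trafo only receives the parameters defined within it. x$nrounds is therefore NULL, and ceiling(0.1 * x$nrounds) evaluates to numeric(0), which is the "Bad value" in the error message. A minimal illustration (using only paradox):
library(paradox)
inner = ps(
  a = p_int(10, 5000),
  .extra_trafo = function(x, param_set) {
    print(names(x)) # only "a" is in scope here, not "nrounds"
    x
  }
)
generate_design_random(inner, 1)$transpose()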
You can use this as a workaround.
library(mlr3verse)
#> Loading required package: mlr3
search_space = ps(
  nrounds = p_int(lower = 50, upper = 5000),
  .extra_trafo = function(x, param_set) {
    x$early_stopping_rounds = as.integer(ceiling(0.1 * x$nrounds))
    x
  }
)
task = tsk("iris")
learner = lrn("classif.xgboost")
terminator = trm("evals", n_evals = 10)
tuner = tnr("random_search")
at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = terminator,
  tuner = tuner
)
at$train(task)
#> INFO [13:12:50.316] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> INFO [13:12:50.351] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:12:50.406] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:12:50.441] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:12:51.837] [mlr3] Finished benchmark
#> INFO [13:12:51.865] [bbotk] Result of batch 1:
#> INFO [13:12:51.867] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:12:51.867] [bbotk] 3497 0 0 0 1.387
#> INFO [13:12:51.867] [bbotk] uhash
#> INFO [13:12:51.867] [bbotk] 8a8e7d03-3166-4c03-8e06-78fe9f4e8a35
#> INFO [13:12:51.870] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:12:51.918] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:12:51.926] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:12:53.650] [mlr3] Finished benchmark
#> INFO [13:12:53.680] [bbotk] Result of batch 2:
#> INFO [13:12:53.681] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:12:53.681] [bbotk] 4197 0 0 0 1.718
#> INFO [13:12:53.681] [bbotk] uhash
#> INFO [13:12:53.681] [bbotk] 85c94228-4419-4e7e-8f4b-6e289a2d2900
#> INFO [13:12:53.684] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:12:53.725] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:12:53.730] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:12:54.648] [mlr3] Finished benchmark
#> INFO [13:12:54.683] [bbotk] Result of batch 3:
#> INFO [13:12:54.685] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:12:54.685] [bbotk] 2199 0 0 0 0.911
#> INFO [13:12:54.685] [bbotk] uhash
#> INFO [13:12:54.685] [bbotk] cd33357f-13bf-4851-8da3-f3c1b58755a6
#> INFO [13:12:54.687] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:12:54.727] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:12:54.732] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:12:56.651] [mlr3] Finished benchmark
#> INFO [13:12:56.679] [bbotk] Result of batch 4:
#> INFO [13:12:56.681] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:12:56.681] [bbotk] 4679 0 0 0 1.909
#> INFO [13:12:56.681] [bbotk] uhash
#> INFO [13:12:56.681] [bbotk] 4efe832d-9163-4447-9e4c-5a41190de74c
#> INFO [13:12:56.684] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:12:56.722] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:12:56.727] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:12:57.850] [mlr3] Finished benchmark
#> INFO [13:12:57.875] [bbotk] Result of batch 5:
#> INFO [13:12:57.877] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:12:57.877] [bbotk] 2422 0 0 0 1.116
#> INFO [13:12:57.877] [bbotk] uhash
#> INFO [13:12:57.877] [bbotk] 8db417a2-0b6e-4844-9c07-4c83e899964e
#> INFO [13:12:57.880] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:12:57.915] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:12:57.920] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:12:59.769] [mlr3] Finished benchmark
#> INFO [13:12:59.794] [bbotk] Result of batch 6:
#> INFO [13:12:59.795] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:12:59.795] [bbotk] 4721 0 0 0 1.843
#> INFO [13:12:59.795] [bbotk] uhash
#> INFO [13:12:59.795] [bbotk] d37d1ec0-bd89-408b-9c29-ecf657a9bbb5
#> INFO [13:12:59.798] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:12:59.833] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:12:59.838] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:13:00.336] [mlr3] Finished benchmark
#> INFO [13:13:00.369] [bbotk] Result of batch 7:
#> INFO [13:13:00.371] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:13:00.371] [bbotk] 1323 0 0 0 0.491
#> INFO [13:13:00.371] [bbotk] uhash
#> INFO [13:13:00.371] [bbotk] 89f100b9-2f9e-4c47-8734-9165dc215277
#> INFO [13:13:00.374] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:13:00.412] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:13:00.417] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:13:01.706] [mlr3] Finished benchmark
#> INFO [13:13:01.736] [bbotk] Result of batch 8:
#> INFO [13:13:01.737] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:13:01.737] [bbotk] 3424 0 0 0 1.282
#> INFO [13:13:01.737] [bbotk] uhash
#> INFO [13:13:01.737] [bbotk] 9f754641-fa5f-420a-b09a-32fe7512bb9b
#> INFO [13:13:01.740] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:13:01.784] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:13:01.789] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:13:03.160] [mlr3] Finished benchmark
#> INFO [13:13:03.189] [bbotk] Result of batch 9:
#> INFO [13:13:03.191] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:13:03.191] [bbotk] 3432 0 0 0 1.365
#> INFO [13:13:03.191] [bbotk] uhash
#> INFO [13:13:03.191] [bbotk] 47cfe02f-fd4e-4382-9343-b4c4ac274d91
#> INFO [13:13:03.194] [bbotk] Evaluating 1 configuration(s)
#> INFO [13:13:03.232] [mlr3] Running benchmark with 1 resampling iterations
#> INFO [13:13:03.237] [mlr3] Applying learner 'classif.xgboost' on task 'iris' (iter 1/1)
#> INFO [13:13:04.387] [mlr3] Finished benchmark
#> INFO [13:13:04.413] [bbotk] Result of batch 10:
#> INFO [13:13:04.415] [bbotk] nrounds classif.ce warnings errors runtime_learners
#> INFO [13:13:04.415] [bbotk] 2991 0 0 0 1.142
#> INFO [13:13:04.415] [bbotk] uhash
#> INFO [13:13:04.415] [bbotk] a1b9d503-0dae-4c5d-ba50-ffd27a754032
#> INFO [13:13:04.421] [bbotk] Finished optimizing after 10 evaluation(s)
#> INFO [13:13:04.422] [bbotk] Result:
#> INFO [13:13:04.423] [bbotk] nrounds learner_param_vals x_domain classif.ce
#> INFO [13:13:04.423] [bbotk] 3497 <list[4]> <list[2]> 0
Created on 2022-08-29 by the reprex package (v2.0.1)

How to capture fleeting bazel console output

During a bazel build, there's a bunch of text flying by that's temporarily displayed and then deleted from the screen. This happens all across the build. I've tried a couple of redirection techniques, such as redirecting stderr to standard output, to no avail. I've also experimented with bazel's verbose flags.
Question: is there any way to capture this fleeting console output that bazel generates? I'd like to at least study what information is being presented before it's taken away, more as a learning exercise and to gain familiarity.
These options should allow you to expand all the log messages generated by actions/tasks and redirect them to a file.
# .bazelrc
common --color=no
common --curses=yes
build --show_progress_rate_limit=0
build --show_task_finish
build --show_timestamps
build --worker_verbose
Setting color=no and show_progress_rate_limit=0 results in the progress messages being expanded (and kept) in the terminal.
curses=yes affects redirection (at least on my machine). The other flags just add more information to the log.
Example output (bash, bazel 1.0.0)
$> bazel build :my_project >& /tmp/bazel_build.log
$> cat /tmp/bazel_build.log
(11:22:46) INFO: Writing tracer profile to '.../command.profile.gz'
(11:22:46) INFO: Current date is 2019-11-01
(11:22:46) Loading: loading...
(11:22:46) Loading:
(11:22:46) Loading: 0 packages loaded
(11:22:46) Loading: 0 packages loaded
Fetching @bazel_tools; fetching
(11:22:46) Loading: 0 packages loaded
Fetching @bazel_tools; fetching
(11:22:46) Loading: 0 packages loaded
currently loading: path/to/my/project
(11:22:46) Analyzing: target //path/to/my/project:my_project (1 packages loaded)
[...]
(11:22:46) INFO: Analyzed target //path/to/my/project:my_project (14 packages loaded, 670 targets configured).
(11:22:46)
(11:22:46) INFO: Found 1 target...
(11:22:46)
(11:22:46) [0 / 1] [Prepa] BazelWorkspaceStatusAction stable-status.txt
(11:22:46) [1 / 13] [Prepa] //path/to/my/project:my_project
(11:22:46) [5 / 12] 3 actions, 0 running
[Prepa] @deps//:my_dependency
(11:22:46) [10 / 12] [Scann] Compiling path/to/my/project/main.cc
(11:22:46) [10 / 12] [Prepa] Compiling path/to/my/project/main.cc
(11:22:46) [10 / 12] .../project:my_project; 0s processwrapper-sandbox
(11:22:46) [11 / 12] [Prepa] Linking path/to/my/project/my_project
Target //path/to/my/project:my_project up-to-date:
(11:22:46) [12 / 12] checking cached actions
bazel-bin/path/to/my/project/my_project
(11:22:46) [12 / 12] checking cached actions
(11:22:46) INFO: Elapsed time: 0.493s, Critical Path: 0.29s
(11:22:46) [12 / 12] checking cached actions
(11:22:46) INFO: 2 processes: 2 processwrapper-sandbox.
(11:22:46) [12 / 12] checking cached actions
(11:22:46) INFO: Build completed successfully, 12 total actions
(11:22:46) INFO: Build completed successfully, 12 total actions
Hope this helps.
bazel build //... &> log.txt
&> does the job
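Note that &> is a bash-ism; in other shells (or scripts run with sh), use the portable form that redirects stderr explicitly:
bazel build //... > log.txt 2>&1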
On top of @dms's excellent suggestions, the --subcommands flag can be used to persist the exact command line Bazel invokes for each action it executes.
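For example, combined with the redirection above:
# --subcommands (or -s) prints the full command line of every executed action
bazel build --subcommands //path/to/my/project:my_project > log.txt 2>&1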

How do I disable the error logger in EUnit test cases?

When running an EUnit test that tests an application, or starts and stops (or tests killing of) gen_server or supervisor processes, the error logger outputs crash reports and other messages by default:
$ rebar3 eunit
===> Verifying dependencies...
===> Compiling my_app
===> Performing EUnit tests...
...........=INFO REPORT==== 5-Sep-2019::16:32:18.760457 ===
application: ranch
exited: stopped
type: temporary
=INFO REPORT==== 5-Sep-2019::16:32:18.760545 ===
application: xmerl
exited: stopped
type: temporary
=INFO REPORT==== 5-Sep-2019::16:32:18.763882 ===
application: my_app
exited: stopped
type: temporary
......=ERROR REPORT==== 5-Sep-2019::16:32:18.814431 ===
** Generic server my_app_sup terminating
** Last message in was {'EXIT',<0.279.0>,test_kill}
** When Server state == {state,
{local,my_app_sup},
simple_one_for_one,
{[undefined],
#{undefined =>
{child,undefined,undefined,
{my_app_server,start_link,[]},
transient,5000,worker,
[my_app_server]}}},
{maps,#{<0.355.0> => [my_app_test]}},
1,5,[],0,my_app_sup,[]}
** Reason for termination ==
** test_kill
=CRASH REPORT==== 5-Sep-2019::16:32:18.814598 ===
crasher:
initial call: supervisor:my_app_sup/1
pid: <0.354.0>
registered_name: my_app_sup
exception exit: test_kill
in function gen_server:decode_msg/9 (gen_server.erl, line 432)
ancestors: [<0.279.0>]
message_queue_len: 0
messages: []
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 6463
neighbours:
...........
Finished in 0.457 seconds
28 tests, 0 failures
How can I avoid these expected messages during testing?
These can be avoided by temporarily disabling TTY reports in the error logger. Surround the code which produces the reports with this:
my_test() ->
    error_logger:tty(false),
    try
        % code that produces error logger reports
    after
        error_logger:tty(true)
    end.
If you use this many times in the tests, this wrapper can be useful:
without_error_logger(Fun) ->
    error_logger:tty(false),
    try
        Fun()
    after
        error_logger:tty(true)
    end.
Which is used like so:
without_error_logger(fun() ->
    % code that produces error logger reports
end)
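For example, a test that kills a supervisor (as in the output above) could look like this; the module and exit reason are hypothetical, mirroring the question:
kill_sup_test() ->
    without_error_logger(fun() ->
        {ok, Pid} = my_app_sup:start_link(),
        exit(Pid, test_kill),
        timer:sleep(100) % let the crash report be emitted while tty is off
    end).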

Julia: fail to compile ImageView.jl from JuliaPro 1.1.1.1 (although it runs without problem in Julia 1.1)

I'm running macOS 10.14.5, and I'm trying to use the package ImageView.jl on a freshly installed (today: 26.07.2019) JuliaPro 1.1.1.1. After installing ImageView without an error message, I get the following error when trying to use the package. The problem doesn't appear when using the package on Julia 1.1 (meaning: I can use the package in Julia 1.1 without problems). I guess the problem is linked to Atom or Juno. The following issues on GitHub are also related: JuliaImages/ImageView.jl#146 and JuliaGraphics/Gtk.jl#363
using ImageView
[ Info: Precompiling ImageView [86fae568-95e7-573e-a6b2-d8a6b900c9ef]
ERROR: LoadError: LoadError: error compiling top-level scope: could not load library "libgobject-2.0"
dlopen(libgobject-2.0.dylib, 1): image not found
Stacktrace:
[1] include_relative(::Module, ::String) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
[2] include at ./sysimg.jl:29 [inlined]
[3] include(::String) at /Users/mymac/.juliapro/JuliaPro_v1.1.1.1/packages/Gtk/aP55V/src/Gtk.jl:2
[4] top-level scope at none:0
[5] include_relative(::Module, ::String) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
[6] include(::Module, ::String) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
[7] top-level scope at none:2
[8] eval at ./boot.jl:328 [inlined]
[9] eval(::Expr) at ./client.jl:404
[10] top-level scope at ./none:3
in expression starting at /Users/mymac/.juliapro/JuliaPro_v1.1.1.1/packages/Gtk/aP55V/src/GLib/GLib.jl:49
in expression starting at /Users/mymac/.juliapro/JuliaPro_v1.1.1.1/packages/Gtk/aP55V/src/Gtk.jl:7
ERROR: LoadError: Failed to precompile Gtk [4c0ca9eb-093a-5379-98c5-f87ac0bbbf44] to /Users/mymac/.juliapro/JuliaPro_v1.1.1.1/compiled/v1.1/Gtk/Vjnq0.ji.
Stacktrace:
[1] error(::String) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
[2] compilecache(::Base.PkgId, ::String) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
[3] _require(::Base.PkgId) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
[4] require(::Base.PkgId) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
[5] include_relative(::Module, ::String) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
[6] include(::Module, ::String) at /Applications/JuliaPro-1.1.1.1.app/Contents/Resources/julia/Contents/Resources/julia/lib/julia/sys.dylib:?
[7] top-level scope at none:2
[8] eval at ./boot.jl:328 [inlined]
[9] eval(::Expr) at ./client.jl:404
[10] top-level scope at ./none:3
in expression starting at /Users/mymac/.juliapro/JuliaPro_v1.1.1.1/packages/ImageView/1uiRS/src/ImageView.jl:5
ERROR: Failed to precompile ImageView [86fae568-95e7-573e-a6b2-d8a6b900c9ef] to /Users/mymac/.juliapro/JuliaPro_v1.1.1.1/compiled/v1.1/ImageView/4mtgY.ji.
Stacktrace:
[1] compilecache(::Base.PkgId, ::String) at ./loading.jl:1197
[2] _require(::Base.PkgId) at ./loading.jl:960
[3] require(::Base.PkgId) at ./loading.jl:858
[4] require(::Module, ::Symbol) at ./loading.jl:853
Most likely the answer is to pin glib at 2.58.3; see https://github.com/JuliaGraphics/Cairo.jl/issues/271#issuecomment-476827465 (most users seem to report that it fixes the problem for them).
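A sketch of the pinning step, assuming glib comes from Homebrew on your machine (the exact downgrade mechanism depends on your setup; see the linked issue for details):
# after installing/downgrading glib to 2.58.3, keep brew from upgrading it
brew pin glib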

scikit-learn GridSearchCV does not work properly with random forest

I have a grid search implementation for random forest models.
train_X, test_X, train_y, test_y = train_test_split(features, target, test_size=.10, random_state=0)
# A bit of performance gain can be obtained from standardization
train_X, test_X = standarize(train_X, test_X)
tuned_parameters = [{
'n_estimators': [5],
'criterion': ['mse', 'mae'],
'random_state': [0]
}]
scores = ['neg_mean_squared_error', 'neg_mean_absolute_error']
for n_fold in [5]:
    for score in scores:
        print("# Tuning hyper-parameters for %s with %d-fold" % (score, n_fold))
        start_time = time.time()
        print()
        # TODO: RandomForestRegressor
        clf = GridSearchCV(RandomForestRegressor(verbose=2), tuned_parameters, cv=n_fold,
                           scoring=score, verbose=2, n_jobs=-1)
        clf.fit(train_X, train_y)
... Rest omitted
Before using it for this grid search, I have used the exact same dataset for many other tasks, so there should not be any problem with the data. In addition, as a test, I first used LinearRegression to see if the entire pipeline goes smoothly; it works. Then I switched to RandomForestRegressor and set a very small number of estimators to test it again. A very strange thing happens then; I'll attach the verbose information. There is a very significant decrease in performance and I don't know what happened. There is no reason to spend 30+ minutes running one small grid search.
Fitting 5 folds for each of 2 candidates, totalling 10 fits
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.0s remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.0s remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.1s remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.1s remaining: 0.0s
building tree 2 of 5
building tree 3 of 5
building tree 3 of 5
building tree 3 of 5
building tree 3 of 5
building tree 4 of 5
building tree 4 of 5
building tree 4 of 5
building tree 4 of 5
building tree 5 of 5
building tree 5 of 5
building tree 5 of 5
building tree 5 of 5
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.0s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.0s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.0s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.2s finished
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.0s finished
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.3s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.3s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.2s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.8s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.3s
[CV] criterion=mse, n_estimators=5, random_state=0 ...................
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.8s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.3s
building tree 1 of 5
[CV] criterion=mae, n_estimators=5, random_state=0 ...................
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.9s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.3s
building tree 1 of 5
[CV] criterion=mae, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.9s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.3s
[CV] criterion=mae, n_estimators=5, random_state=0 ...................
building tree 1 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.0s remaining: 0.0s
building tree 2 of 5
building tree 3 of 5
building tree 4 of 5
building tree 5 of 5
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.3s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.2s finished
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.5s finished
[CV] .... criterion=mse, n_estimators=5, random_state=0, total= 5.6s
[CV] criterion=mae, n_estimators=5, random_state=0 ...................
building tree 1 of 5
The above log is printed in a few seconds, then things seem to get stuck starting here...
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.4min remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.5min remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.5min remaining: 0.0s
building tree 2 of 5
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.8min remaining: 0.0s
building tree 2 of 5
building tree 3 of 5
building tree 3 of 5
building tree 3 of 5
building tree 3 of 5
building tree 4 of 5
building tree 4 of 5
building tree 4 of 5
building tree 4 of 5
building tree 5 of 5
building tree 5 of 5
building tree 5 of 5
These lines took more than 20 minutes.
BTW, for each GridSearchCV run, linear regression takes less than 1 sec.
Do you have any idea why the performance decreases that much?
Any suggestions and comments are appreciated. Thank you.
Try setting max_depth for the RandomForestRegressor. This should reduce fitting time. By default max_depth=None.
For example:
tuned_parameters = [{
'n_estimators': [5],
'criterion': ['mse', 'mae'],
'random_state': [0],
'max_depth': [4],
}]
Edit: Also, by default RandomForestRegressor has n_jobs=1. It will build one tree at a time with this setting. Try setting n_jobs=-1.
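For example, a quick sketch with the estimator from the question (note that combining this with n_jobs=-1 in GridSearchCV itself can oversubscribe your cores):
clf = GridSearchCV(RandomForestRegressor(verbose=2, n_jobs=-1),
                   tuned_parameters, cv=n_fold,
                   scoring=score, verbose=2)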
In addition, instead of looping over the scoring parameters to GridSearchCV, you can specify multiple metrics. When doing so, you must also specify the metric you want GridSearchCV to select on as the value of refit. Then you can access all scores in the cv_results_ dictionary after the fit.
clf = GridSearchCV(RandomForestRegressor(verbose=2), tuned_parameters,
                   cv=n_fold, scoring=scores, refit='neg_mean_squared_error',
                   verbose=2, n_jobs=-1)
clf.fit(train_X, train_y)
results = clf.cv_results_
print(np.mean(results['mean_test_neg_mean_squared_error']))
print(np.mean(results['mean_test_neg_mean_absolute_error']))
http://scikit-learn.org/stable/auto_examples/model_selection/plot_multi_metric_evaluation.html#sphx-glr-auto-examples-model-selection-plot-multi-metric-evaluation-py
