How to refer to previous targets in drake? - drake-r-package

I would like to use the wildcard to generate a bunch of targets, and then have another set of targets that refers to those original targets. I think this example represents my idea:
plan <- drake_plan(
sub_task = runif(1000, min = mean__, max = 50),
full_task = sub_task * 2
)
step <- 1:4
full_plan <- evaluate_plan(
plan,
rules = list(
mean__ = step
)
)
So what I get now is 5 targets: 4 sub_tasks and a single full_task. What I'm looking for is 8 targets: the 4 sub_tasks (which are good) and 4 more, each based on one of those sub_tasks.

This question comes up regularly, and I like how you phrased it.
More about the problem
For onlookers, I will print out the plan and the graph of the current (problematic) workflow.
library(drake)
plan <- drake_plan(
sub_task = runif(1000, min = mean__, max = 50),
full_task = sub_task * 2
)
step <- 1:4
full_plan <- evaluate_plan(
plan,
rules = list(
mean__ = step
)
)
full_plan
#> # A tibble: 5 x 2
#> target command
#> <chr> <chr>
#> 1 sub_task_1 runif(1000, min = 1, max = 50)
#> 2 sub_task_2 runif(1000, min = 2, max = 50)
#> 3 sub_task_3 runif(1000, min = 3, max = 50)
#> 4 sub_task_4 runif(1000, min = 4, max = 50)
#> 5 full_task sub_task * 2
config <- drake_config(full_plan)
vis_drake_graph(config)
Created on 2018-12-18 by the reprex package (v0.2.1)
Solution
As you say, we want full_task_* targets that depend on their corresponding sub_task_* targets. To accomplish this, we need to use the mean__ wildcard in the full_task_* commands as well. Wildcards are an early-days interface based on text replacement, so they do not need to be independent variable names in their own right: sub_task_mean__ simply becomes sub_task_1, sub_task_2, and so on.
library(drake)
plan <- drake_plan(
sub_task = runif(1000, min = mean__, max = 50),
full_task = sub_task_mean__ * 2
)
step <- 1:4
full_plan <- evaluate_plan(
plan,
rules = list(
mean__ = step
)
)
full_plan
#> # A tibble: 8 x 2
#> target command
#> <chr> <chr>
#> 1 sub_task_1 runif(1000, min = 1, max = 50)
#> 2 sub_task_2 runif(1000, min = 2, max = 50)
#> 3 sub_task_3 runif(1000, min = 3, max = 50)
#> 4 sub_task_4 runif(1000, min = 4, max = 50)
#> 5 full_task_1 sub_task_1 * 2
#> 6 full_task_2 sub_task_2 * 2
#> 7 full_task_3 sub_task_3 * 2
#> 8 full_task_4 sub_task_4 * 2
config <- drake_config(full_plan)
vis_drake_graph(config)
Created on 2018-12-18 by the reprex package (v0.2.1)
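Because evaluate_plan() works by text replacement, its effect can be modelled in a few lines. The Python sketch below is a simplified, hypothetical model (expand_wildcard is a made-up name, and drake's real implementation handles many more cases). It assumes that every row whose command contains the wildcard is expanded once per value, with "_<value>" appended to the target name, which is the behaviour visible in the two tibbles above.

```python
def expand_wildcard(plan, wildcard, values):
    """Toy model of wildcard expansion by plain text substitution.

    plan is a list of (target, command) pairs.
    """
    expanded = []
    for target, command in plan:
        if wildcard in command:
            # One new row per value: suffix the target name, substitute the text.
            for value in values:
                expanded.append(
                    (f"{target}_{value}", command.replace(wildcard, str(value)))
                )
        else:
            # Rows that never mention the wildcard are copied unchanged.
            expanded.append((target, command))
    return expanded


plan = [
    ("sub_task", "runif(1000, min = mean__, max = 50)"),
    ("full_task", "sub_task_mean__ * 2"),
]
for target, command in expand_wildcard(plan, "mean__", [1, 2, 3, 4]):
    print(target, command)
```

With "sub_task * 2" as the command, the full_task row contains no wildcard and is copied once, which is exactly the 5-target result from the question.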

Related

How to set the graph learner id in mlr3pipelines?

I constructed a benchmark with 4 graph learners on 1 dataset. The learner_id values in the benchmark result are very long because of my preprocessing steps. How can I set the learner id so that it is shorter?
Here's my code:
# step 1 the task
all_plays <- readRDS("../000files/all_plays.rds")
pbp_task <- as_task_classif(all_plays, target="play_type")
split_task <- partition(pbp_task, ratio=0.75)
task_train <- pbp_task$clone()$filter(split_task$train)
task_test <- pbp_task$clone()$filter(split_task$test)
# step 2 the preprocess
pbp_prep <- po("select",
selector = selector_invert(
selector_name(c("half_seconds_remaining","yards_gained","game_id")))
) %>>%
po("colapply",
affect_columns = selector_name(c("posteam","defteam")),
applicator = as.factor) %>>%
po("filter",
filter = mlr3filters::flt("find_correlation"), filter.cutoff=0.3) %>>%
po("scale", scale = F) %>>%
po("removeconstants")
# step 3 learners
rf_glr <- as_learner(pbp_prep %>>% lrn("classif.ranger", predict_type="prob"))
log_glr <- as_learner(pbp_prep %>>% lrn("classif.log_reg", predict_type="prob"))
tree_glr <- as_learner(pbp_prep %>>% lrn("classif.rpart", predict_type="prob"))
kknn_glr <- as_learner(pbp_prep %>>% lrn("classif.kknn", predict_type="prob"))
# step 4 benchmark grid
set.seed(0520)
cv <- rsmp("cv",folds=10)
design <- benchmark_grid(
tasks = task_train,
learners = list(rf_glr,log_glr,tree_glr,kknn_glr),
resampling = cv
)
# step 5 benchmark
bmr <- benchmark(design,store_models = T)
bmr
# learner_id toooo long...
<BenchmarkResult> of 40 rows with 4 resampling runs
nr  task_id    learner_id                                                           resampling_id  iters  warnings  errors
 1  all_plays  select.colapply.find_correlation.scale.removeconstants.randomForest  cv             10     0         0
 2  all_plays  select.colapply.find_correlation.scale.removeconstants.logistic      cv             10     0         0
 3  all_plays  select.colapply.find_correlation.scale.removeconstants.decisionTree  cv             10     0         0
 4  all_plays  select.colapply.find_correlation.scale.removeconstants.kknn          cv             10     0         0
The learner_id values in this result are too long, which also makes the autoplot(bmr) output hard to read. How can I set the learner_id to make it short?
Thank you very much.
You can assign a new id to the GraphLearner after creating it:
library(mlr3verse)
#> Loading required package: mlr3
learner = as_learner(po("pca") %>>% po("learner", lrn("regr.rpart")))
learner$id = "my_id"
print(learner)
#> <GraphLearner:my_id>
#> * Model: -
#> * Parameters: regr.rpart.xval=0
#> * Packages: mlr3, mlr3pipelines, rpart
#> * Predict Types: [response], se, distr
#> * Feature Types: logical, integer, numeric, character, factor, ordered,
#> POSIXct
#> * Properties: featureless, hotstart_backward, hotstart_forward,
#> importance, loglik, missings, oob_error, selected_features, weights
Created on 2022-07-22 by the reprex package (v2.0.1)

Forecasting using multiple seasonal STL and ARIMA

I am attempting to forecast half-hourly electricity data. My method is to decompose the electricity consumption data using mstl() from Rob Hyndman's forecast package and then forecast the seasonally adjusted data using ARIMA.
df <- IntervalData %>% select(CONSUMPTION_MW)
length_test_set = 17520
h = 17520
# create msts object with daily, weekly and monthly seasonality
data_msts <- msts(df, seasonal.periods=c(48,48*7,365/12*48))
train_msts = msts(df[1:(nrow(df)-length_test_set),],seasonal.periods=c(48,48*7,365/12*48))
test_msts = msts(df[((nrow(df)-length_test_set)+1):(nrow(df)),],seasonal.periods=c(48,48*7,365/12*48))
fit_mstl = mstl(train_msts, iterate = 4, s.window = 19, robust = TRUE)
fcast_arima=forecast(fit_mstl,method='arima',h=h)
How do I specify the order of my ARIMA model, e.g. ARIMA(2,1,6)?
You will need to write your own forecast function and pass it through the forecastfunction argument, like this (using simulated data so the example is reproducible).
library(forecast)
df <- data.frame(y=rnorm(50000))
length_test_set <- 17520
h <- 17520
# create msts object with daily, weekly and monthly seasonality
data_msts <- msts(df, seasonal.periods = c(48, 48*7, 365/12*48))
train_msts <- msts(df[1:(nrow(df) - length_test_set), ], seasonal.periods = c(48, 48 * 7, 365 / 12 * 48))
test_msts <- msts(df[((nrow(df) - length_test_set) + 1):(nrow(df)), ], seasonal.periods = c(48, 48 * 7, 365 / 12 * 48))
fit_mstl <- mstl(train_msts, iterate = 4, s.window = 19, robust = TRUE)
# Function to fit specific ARIMA model and return forecasts
arima_forecast <- function(x, h, level, order, ...) {
fit <- Arima(x, order=order, seasonal = c(0,0,0), ...)
return(forecast(fit, h = h, level = level))
}
# Example using an ARIMA(3,0,0) model
fcast_arima <- forecast(fit_mstl, forecastfunction=arima_forecast, h = h, order=c(3,0,0))
Created on 2020-07-25 by the reprex package (v0.3.0)
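The key mechanism here is that forecast() hands the seasonally adjusted series to whatever forecastfunction you supply, and, judging from the call above, extra arguments such as order are passed through to it. A rough Python analogue of that pattern is sketched below; it is not the forecast package's code. The names forecast_decomposed and ar1_forecast are hypothetical, a fixed-coefficient AR(1) stands in for Arima(), and, as a simplifying assumption, the seasonal component (a single seasonal period) is added back by repeating its last full cycle.

```python
from statistics import mean


def ar1_forecast(x, h, phi):
    """Hypothetical fixed-specification model: AR(1) around the sample mean."""
    mu = mean(x)
    return [mu + phi ** step * (x[-1] - mu) for step in range(1, h + 1)]


def forecast_decomposed(seasonal_adjusted, seasonal, period, h, forecastfunction, **kwargs):
    """Forecast the adjusted series with a user-supplied function, then reseasonalise.

    Extra keyword arguments (e.g. phi) are forwarded untouched to forecastfunction,
    much like order=c(3,0,0) reaches arima_forecast in the R answer.
    """
    adjusted_fc = forecastfunction(seasonal_adjusted, h, **kwargs)
    last_cycle = seasonal[-period:]  # assumption: repeat the last seasonal cycle
    return [adjusted_fc[i] + last_cycle[i % period] for i in range(h)]


print(forecast_decomposed([1, 2, 3], [10, -10, 10, -10], 2, 2, ar1_forecast, phi=0.5))
```

Keeping the model choice in a separate function is what lets you fix the specification (the ARIMA order in R) without touching the decomposition step.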

Create groups of targets

Let's say that I have the following plan:
test_plan = drake_plan(
foo = target(x + 1, transform = map(x = c(5, 10))),
bar = 42
)
Now I want to create a new target that contains the two subtargets foo_5 and foo_10 and the target bar. How can I do this? I feel it must be simple, but I haven't managed to find a solution.
Thanks!
Yes, it is both possible and simple. The built-in solution is to use tags: https://books.ropensci.org/drake/static.html#tags. Example:
library(drake)
drake_plan(
foo = target(
x + 1,
transform = map(x = c(5, 10), .tag_out = group)
),
bar = target(
42,
# You need a transform to use a tag, even for 1 target.
transform = map(tmp = 1, .tag_out = group)
),
baz_map = target(group, transform = map(group)),
baz_combine = target(c(group), transform = combine(group))
)
#> # A tibble: 7 x 2
#> target command
#> <chr> <expr>
#> 1 foo_5 5 + 1
#> 2 foo_10 10 + 1
#> 3 bar_1 42
#> 4 baz_map_foo_5 foo_5
#> 5 baz_map_foo_10 foo_10
#> 6 baz_map_bar_1 bar_1
#> 7 baz_combine c(foo_5, foo_10, bar_1)
Created on 2019-11-16 by the reprex package (v0.3.0)
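To make the naming in that output concrete, here is a toy Python model of what the map() and combine() transforms do with a tag. It is a hypothetical sketch (map_transform and map_and_combine are made-up names, and it ignores nearly everything else drake's transforms support); it only reproduces the target names and commands in the tibble above.

```python
def map_transform(name, command_template, var, values):
    """One target per value: name_<value>, with {var} filled into the command."""
    return [(f"{name}_{v}", command_template.format(**{var: v})) for v in values]


def map_and_combine(tagged, map_name, combine_name):
    """One map target per tagged target, plus a single combine target over all of them."""
    rows = [(f"{map_name}_{t}", t) for t in tagged]
    rows.append((combine_name, "c(" + ", ".join(tagged) + ")"))
    return rows


foo = map_transform("foo", "{x} + 1", "x", [5, 10])
bar = map_transform("bar", "42", "tmp", [1])  # dummy transform so bar can carry the tag
group = [target for target, _ in foo + bar]   # every target tagged with .tag_out = group
plan = foo + bar + map_and_combine(group, "baz_map", "baz_combine")
for target, command in plan:
    print(target, command)
```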

Substitute variable with value, but don't evaluate

Suppose I have the following expressions:
(%i1) (8*x)*(log(x) / log(10));
(%i2) x^2;
Now, because I want to find out which constant I can pick to make the statement "%i1 is O(%i2)" true, I evaluate both expressions in a loop like so:
for a:1 thru 10 do print(%i1, "=", ev(%i1, numer, x=a), %i2, "=", ev(%i2, numer, x=a));
The output is:
8 x log(x) 2
---------- = 0.0 , x = 1
log(10)
8 x log(x) 2
---------- = 4.816479930623698 , x = 4
log(10)
8 x log(x) 2
---------- = 11.45091011327189 , x = 9
log(10)
8 x log(x) 2
---------- = 19.26591972249479 , x = 16
log(10)
8 x log(x) 2
---------- = 27.95880017344075 , x = 25
log(10)
8 x log(x) 2
---------- = 37.35126001841489 , x = 36
log(10)
8 x log(x) 2
---------- = 47.32549024079837 , x = 49
log(10)
8 x log(x) 2
---------- = 57.79775916748438 , x = 64
log(10)
8 x log(x) 2
---------- = 68.70546067963139 , x = 81
log(10)
8 x log(x) 2
---------- = 80.0 , x = 100
log(10)
I want to make the output easier to eyeball, something like:
8 1 log(1)          2
---------- = 0.0 , 1  = 1
 log(10)
8 2 log(2)                        2
---------- = 4.816479930623698 , 2  = 4
 log(10)
8 3 log(3)                        2
---------- = 11.45091011327189 , 3  = 9
 log(10)
[snip]
8 10 log(10)            2
------------ = 80.0 , 10  = 100
  log(10)
How can I tell Maxima to substitute the value of a for x in every iteration of the loop without evaluating the expression?
I've searched the manual, but I didn't find anything seemingly relevant.
A lot of operations in Maxima are carried out by a process called "simplification", which means applying identities to make a "simpler" expression. E.g. 1 + 1 simplifies to 2, sin(0) simplifies to 0, etc.
In order to get the effect you want, we must disable simplification in general, so that expressions are evaluated but not simplified. But to get the numerical values, we need to enable simplification just for those results.
Here's something to do that.
(%i16) simp : false $
(%i17) for x in [1,2,3,4,5]
do print (ev(%i1) = ev(%i1, simp, numer), ev(%i2) = ev(%i2, simp));
       log(1)          2
(8 1) (-------) = 0.0 1  = 1
       log(10)
       log(2)                        2
(8 2) (-------) = 4.816479930623698 2  = 4
       log(10)
       log(3)                       2
(8 3) (-------) = 11.4509101132719 3  = 9
       log(10)
       log(4)                        2
(8 4) (-------) = 19.26591972249479 4  = 16
       log(10)
       log(5)                        2
(8 5) (-------) = 27.95880017344075 5  = 25
       log(10)
(%o17) done
Note that I wrote for x in [1, 2, 3, 4, 5] ... instead of for x:1 thru 5 .... That's because the latter steps x using arithmetic, which requires simplification. Try it both ways; I think you'll find the difference very enlightening.
N.B. I've used the same %i1 and %i2 as you.
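The same idea (substitute the value now, evaluate only on request) can be sketched in Python with the standard ast module. This is an analogy, not Maxima code: the helper names SubstituteName and substitute_and_evaluate are hypothetical, log is mapped to math.log for the numeric part, and ast.unparse needs Python 3.9 or later.

```python
import ast
import math


class SubstituteName(ast.NodeTransformer):
    """Replace every occurrence of one variable with a constant, leaving the tree unevaluated."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def visit_Name(self, node):
        if node.id == self.name:
            return ast.copy_location(ast.Constant(value=self.value), node)
        return node  # other names (e.g. log) are left alone


def substitute_and_evaluate(expr_text, name, value):
    """Return (substituted text, numeric value) for expr_text with name := value."""
    tree = ast.parse(expr_text, mode="eval")
    tree = ast.fix_missing_locations(SubstituteName(name, value).visit(tree))
    text = ast.unparse(tree)  # substituted, but nothing simplified or computed
    number = eval(compile(tree, "<expr>", "eval"), {"log": math.log})
    return text, number


for a in range(1, 6):
    lhs, lhs_value = substitute_and_evaluate("8*x*log(x)/log(10)", "x", a)
    rhs, rhs_value = substitute_and_evaluate("x**2", "x", a)
    print(f"{lhs} = {lhs_value} , {rhs} = {rhs_value}")
```

As in the Maxima answer, the printed form and the numeric value come from two separate steps, so the substituted expression is shown exactly as written.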
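Another approach is to substitute an "empty" function call for x: declare "" as a prefix operator and substitute ""(a). Since ""(a) is not a number, Maxima cannot simplify the arithmetic, but it still prints almost like a: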
(%i1) display2d: false $
(%i2) prefix("") $
(%i3) almost_subst(a, x, e):= subst(""(a), x, e) $
(%i4) almost_subst(10, x, 8*x*log(x)/log(10));
(%o4) (8* 10*log( 10))/log(10)

Calculating nCr mod m (n choose r) for large values of n (10^9)

Now that CodeSprint 3 is over, I've been wondering how to solve this problem. We need to calculate nCr mod 142857 for large values of r and n (0<=n<=10^9 ; 0<=r<=n). I used a recursive method that goes through min(r, n-r) iterations to calculate the combination. It turns out this wasn't efficient enough. I've tried a few different methods, but none of them seems efficient enough. Any suggestions?
For a non-prime modulus, factor it (142857 = 3^3 * 11 * 13 * 37), compute C(n,k) mod p^q for each prime-power factor p^q using the generalized Lucas theorem, and combine the results with the Chinese remainder theorem.
For example, C(234, 44) mod 142857 = 6084. Modulo each prime-power factor:
C(234, 44) mod 3^3 = 9
C(234, 44) mod 11 = 1
C(234, 44) mod 13 = 0
C(234, 44) mod 37 = 16
The Chinese remainder theorem then asks for the x in [0, 142857) such that
x = 9 mod 3^3
x = 1 mod 11
x = 0 mod 13
x = 16 mod 37
The result is x = 6084.
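A minimal Python sketch of that combination step, assuming pairwise coprime moduli (true for 27, 11, 13 and 37). The function name crt is hypothetical, and pow(m, -1, m_i) for the modular inverse needs Python 3.8+.

```python
def crt(residues, moduli):
    """Combine x = r_i (mod m_i), for pairwise coprime m_i, into one x mod prod(m_i)."""
    x, m = 0, 1
    for r_i, m_i in zip(residues, moduli):
        # Find t with x + m*t = r_i (mod m_i); m is invertible mod m_i because the moduli are coprime.
        t = (r_i - x) * pow(m, -1, m_i) % m_i
        x += m * t
        m *= m_i
    return x % m


print(crt([9, 1, 0, 16], [27, 11, 13, 37]))
```

Each step keeps x correct for all moduli seen so far, so only one modular inverse per factor is needed.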
Example
C(234, 44) mod 3^3
First convert n, k, and n-k to base p
n = 234_10 = 22200_3
k = 44_10 = 1122_3
r = n-k = 190_10 = 21001_3
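Next, count the carries when adding k and r in base p (by Kummer's theorem, the total is the exponent of p in C(n, k)).
e[i] = number of carries at or beyond digit i
digit          4  3  2  1  0
carry out      0  0  0  1  1
r              2  1  0  0  1
k              0  1  1  2  2
n              2  2  2  0  0
e[i]           0  0  0  1  2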
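Now define the factorial-like function needed for the generalized Lucas theorem: f(n, p) is the product of the integers from 1 to n that are not divisible by p.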
def f(n, p):
    # Product of 1..n with every multiple of p left out.
    r = 1
    for i in range(1, n + 1):
        if i % p != 0:
            r *= i
    return r
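Since q = 3, consider three base-p digits at a time: for j = 0, 1, 2, ... the window holds floor(n/p^j) mod p^3 (likewise for r and k), sliding one digit to the left each time until the windows have covered every digit of n.
So
f(200_3, 3)/[f(001_3, 3) * f(122_3, 3)] *
f(220_3, 3)/[f(100_3, 3) * f(112_3, 3)] *
f(222_3, 3)/[f(210_3, 3) * f(011_3, 3)] *
f(022_3, 3)/[f(021_3, 3) * f(001_3, 3)] *
f(002_3, 3)/[f(002_3, 3) * f(000_3, 3)] = (6719344775 / 7) * 8 * 1 = 53754758200 / 7
The first three factors (j = 0, 1, 2) multiply to 6719344775 / 7; the j = 3 factor is 8 and the j = 4 factor is 1.
Now
s = 1 if p = 2 and q >= 3 else -1
and s is raised to the power e[q-1], the number of carries at or beyond digit q-1. Then
p^e[0] * s^e[q-1] * 53754758200 / 7 mod 3^3
e[0] = 2
e[q-1] = e[2] = 0
p^e[0] = 3^2 = 9
s^e[2] = (-1)^0 = 1
p^e[0] * s^e[2] * 53754758200 = 483792823800
Now you have
483792823800 / 7 mod 3^3
This is a linear congruence and can be solved with
ModularInverse(7, 3^3) = 4
4 * 483792823800 mod 27 = 9
Hence C(234, 44) mod 3^3 = 9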
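Putting these steps together, here is a Python sketch of C(n, k) mod p^q following the generalized Lucas theorem in the form used in the example (Granville's generalization). The name binom_mod_prime_power is hypothetical; the f values are tabulated and reduced mod p^q as they are built so the numbers stay small, and the result can be checked against math.comb for small inputs. For 142857 you would call it for (3, 3), (11, 1), (13, 1) and (37, 1) and combine the four residues with the Chinese remainder theorem as described above.

```python
def binom_mod_prime_power(n, k, p, q):
    """C(n, k) mod p**q via the generalized Lucas theorem."""
    if k < 0 or k > n:
        return 0
    pq = p ** q
    r = n - k

    # carries[j] is 1 when adding k and r in base p carries out of digit j.
    carries = []
    a, b, carry = k, r, 0
    while a or b or carry:
        carry = 1 if a % p + b % p + carry >= p else 0
        carries.append(carry)
        a //= p
        b //= p
    e0 = sum(carries)              # exponent of p dividing C(n, k)
    if e0 >= q:
        return 0
    e_last = sum(carries[q - 1:])  # e[q-1]: carries at or beyond digit q-1

    # fact[x] = f(x, p) mod p**q: product of 1..x with multiples of p left out.
    fact = [1] * pq
    for x in range(1, pq):
        fact[x] = fact[x - 1] * (x if x % p else 1) % pq

    num = den = 1
    while n:
        # Window j holds floor(n / p^j) mod p^q (and likewise for k and r).
        num = num * fact[n % pq] % pq
        den = den * fact[k % pq] * fact[r % pq] % pq
        n, k, r = n // p, k // p, r // p

    s = 1 if (p == 2 and q >= 3) else -1
    return p ** e0 * s ** e_last * num * pow(den, -1, pq) % pq


print(binom_mod_prime_power(234, 44, 3, 3))
```

Each call costs about p^q steps for the table plus one step per base-p digit of n, so n up to 10^9 is cheap.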
