Hi, I am having issues with a foreach loop in which every iteration estimates a regression on a subset of the data, with a different list of controls, on several outcomes. The problem is that for some outcomes in some countries I only have missing values, so the regression function returns an error. I would like to be able to run the loop and get output with NAs, or a string saying "Error" for example, in place of the coefficient table. I tried several things, but they don't quite work with the .combine = rbind option, and if I use .combine = c I get very messy output. Thanks in advance for any help.
reg <- function(y, d, c){
  if (missing(c))
    feols(as.formula(paste0(y, "~ 0 + treatment")), data = d)
  else {
    feols(as.formula(paste0(y, "~ 0 + treatment + ", c)), data = d)
  }
}
# Here we set up the parallelization to run the code on the server
n.cores <- 9 #parallel::detectCores() - 1
#create the cluster
my.cluster <- parallel::makeCluster(
n.cores,
type = "PSOCK"
)
# print(my.cluster)
#register it to be used by %dopar%
doParallel::registerDoParallel(cl = my.cluster)
# #check if it is registered (optional)
# foreach::getDoParRegistered()
# #how many workers are available? (optional)
# foreach::getDoParWorkers()
# Here is the cycle to parallel regress each outcome on the global treatment
# variable for each RCT with strata control
tables <- foreach(
  n = 1:9, .combine = rbind, .packages = c('data.table', 'fixest'),
  .errorhandling = "pass"
) %dopar% {
  dt_target <- dt[country == n]
  c <- controls[n]
  est <- lapply(outcomes, function(x) reg(y = x, d = dt_target, c))
  table <- etable(est, drop = "!treatment", cluster = "uid", fitstat = "n")
  table
}
Using SWI-Prolog I have made this simple predicate that relates a time string in hh:mm format to a time term.
time_string(time(H,M), String) :-
    number_string(H,Hour),
    number_string(M,Min),
    string_concat(Hour,":",S),
    string_concat(S,Min,String).
The predicate though can only work in one direction.
time_string(time(10,30),String).
String = "10:30". % This is perfect.
Unfortunately this query fails.
time_string(Time,"10:30").
ERROR: Arguments are not sufficiently instantiated
ERROR: In:
ERROR: [11] number_string(_8690,_8692)
ERROR: [10] time_string(time(_8722,_8724),"10:30") at /tmp/prolcompDJBcEE.pl:74
ERROR: [9] toplevel_call(user:user: ...) at /usr/local/logic/lib/swipl/boot/toplevel.pl:1107
It would be really nice if I didn't have to write a whole new predicate to answer this query. Is there a way I could do this?
Well, going from the structured term time(H,M) to the string String is easier than going from the unstructured String to the term time(H,M).
Your predicate works in the "generation" direction.
For the other direction, you want to parse the String. In this case, this is computationally easy and can be done without search/backtracking, which is nice!
Use Prolog's "Definite Clause Grammar" (DCG) syntax, which is "just" a nice way to write predicates that process a "list of stuff". In this case the list of stuff is a list of characters (atoms of length 1). (For details, see the DCG section of the SWI-Prolog manual.)
With some luck, the DCG code can run backwards/forwards, but this is generally not the case. Real code meeting some demands of efficiency or causality may force it so that under the hood of a single predicate, you first branch by "processing direction", and then run through rather different code structures to deliver the goods.
So here. The code immediately "decays" into the parse and generate branches. Prolog does not yet manage to behave fully constraint-based. You just have to do some things before others.
Anyway, let's do this:
:- use_module(library(dcg/basics)).
% ---
% "Generate" direction; note that String may be bound to something
% in which case this clause also verifies whether generating "HH:MM"
% from time(H,M) indeed yields (whatever is denoted by) String.
% ---
process_time(time(H,M),String) :-
   integer(H),                       % Demand that H,M are valid integers inside limits
   integer(M),
   between(0,23,H),
   between(0,59,M),
   !,                                % Guard passed, commit to this code branch
   phrase(time_g(H,M),Chars,[]),     % Build Chars from the time/2 term
   string_chars(String,Chars).       % Merge Chars into a string, unify with String
% ---
% "Parse" direction.
% ---
process_time(time(H,M),String) :-
   string(String),                   % Demand that String be a valid string; no demands on H,M
   !,                                % Guard passed, commit to this code branch
   string_chars(String,Chars),       % Explode String into characters
   phrase(time_p(H,M),Chars,[]).     % Parse Chars into H and M
% ---
% "Generate" DCG
% ---
time_g(H,M) --> hour_g(H), [':'], minute_g(M).
hour_g(H) --> { divmod(H,10,V1,V2), digit_int(D1,V1), digit_int(D2,V2) }, digit(D1), digit(D2).
minute_g(M) --> { divmod(M,10,V1,V2), digit_int(D1,V1), digit_int(D2,V2) }, digit(D1), digit(D2).
% ---
% "Parse" DCG
% ---
time_p(H,M) --> hour_p(H), [':'], minute_p(M).
hour_p(H) --> digit(D1), digit(D2), { digit_int(D1,V1), digit_int(D2,V2), H is V1*10+V2, between(0,23,H) }.
minute_p(M) --> digit(D1), digit(D2), { digit_int(D1,V1), digit_int(D2,V2), M is V1*10+V2, between(0,59,M) }.
% ---
% Do I really have to code this? Oh well!
% ---
digit_int('0',0).
digit_int('1',1).
digit_int('2',2).
digit_int('3',3).
digit_int('4',4).
digit_int('5',5).
digit_int('6',6).
digit_int('7',7).
digit_int('8',8).
digit_int('9',9).
% ---
% Let's add plunit tests!
% ---
:- begin_tests(hhmm).
test("parse 1", true(T == time(0,0))) :- process_time(T,"00:00").
test("parse 2", true(T == time(12,13))) :- process_time(T,"12:13").
test("parse 3", true(T == time(23,59))) :- process_time(T,"23:59").
test("generate", true(S == "12:13")) :- process_time(time(12,13),S).
test("verify", true) :- process_time(time(12,13),"12:13").
test("complete", true(H == 12)) :- process_time(time(H,13),"12:13").
test("bad parse", fail) :- process_time(_,"66:66").
test("bad generate", fail) :- process_time(time(66,66),_).
:- end_tests(hhmm).
That's a lot of code.
Does it work?
?- run_tests.
% PL-Unit: hhmm ........ done
% All 8 tests passed
true.
Given the simplicity of the pattern, a DCG could be deemed overkill, but it actually gives us easy access to the atomic ingredients that we can feed into some declarative arithmetic library. For instance:
:- module(hh_mm_bi,
[hh_mm_bi/2
,hh_mm_bi//1
]).
:- use_module(library(dcg/basics)).
:- use_module(library(clpfd)).
hh_mm_bi(T,S) :- phrase(hh_mm_bi(T),S).
hh_mm_bi(time(H,M)) --> n2(H,23),":",n2(M,59).
n2(V,U) --> d(A),d(B), {V#=A*10+B,V#>=0,V#=<U}.
d(V) --> digit(D), {V#=D-0'0}.
Some tests
?- hh_mm_bi(T,`23:30`).
T = time(23, 30).
?- hh_mm_bi(T,`24:30`).
false.
?- phrase(hh_mm_bi(T),S).
T = time(0, 0),
S = [48, 48, 58, 48, 48] ;
T = time(0, 1),
S = [48, 48, 58, 48, 49] ;
...
edit
library(clpfd) is not the only choice we have for declarative arithmetic. Here is another shot, using library(clpBNR), which requires that you first install the appropriate pack with ?- pack_install(clpBNR). Once this is done, another solution, functionally equivalent to the one above, could be
:- module(hh_mm_bnr,
[hh_mm_bnr/2
,hh_mm_bnr//1
]).
:- use_module(library(dcg/basics)).
:- use_module(library(clpBNR)).
hh_mm_bnr(T,S) :- phrase(hh_mm_bnr(T),S).
hh_mm_bnr(time(H,M)) --> n2(H,23),":",n2(M,59).
n2(V,U) --> d(A),d(B), {V::integer(0,U),{V==A*10+B}}.
d(V) --> digit(D), {{V==D-0'0}}.
edit
The comment (now removed) by @DavidTonhofer has made me think that a far simpler approach is available, moving the 'generation power' into d//1:
:- module(hh_mm,
[hh_mm/2
,hh_mm//1
]).
hh_mm(T,S) :- phrase(hh_mm(T),S).
hh_mm(time(H,M)) --> n2(H,23),":",n2(M,59).
n2(V,U) --> d(A),d(B), { V is A*10+B, V>=0, V=<U }.
d(V) --> [C], { member(V,[0,1,2,3,4,5,6,7,8,9]), C is V+0'0 }.
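A few plunit tests, in the style of the first answer, can check that this last version really runs in both directions. This is only a sketch: the definitions are repeated without the module header so that the file loads on its own, and the test unit name hh_mm_both_ways is an arbitrary choice.

```prolog
:- use_module(library(plunit)).

% Definitions repeated from the answer above (module header omitted).
hh_mm(T,S) :- phrase(hh_mm(T),S).
hh_mm(time(H,M)) --> n2(H,23),":",n2(M,59).
n2(V,U) --> d(A),d(B), { V is A*10+B, V>=0, V=<U }.
d(V) --> [C], { member(V,[0,1,2,3,4,5,6,7,8,9]), C is V+0'0 }.

:- begin_tests(hh_mm_both_ways).

% Parse direction: code list in, time/2 term out.
test(parse, [nondet, true(T == time(23,30))]) :-
    hh_mm(T, `23:30`).

% Generate direction: time/2 term in, zero-padded code list out.
test(generate, [nondet, true(A == '09:05')]) :-
    hh_mm(time(9,5), S),
    atom_codes(A, S).

% Hours above 23 are rejected by the range check in n2//2.
test(out_of_range, [fail]) :-
    hh_mm(_, `24:30`).

:- end_tests(hh_mm_both_ways).
```

The nondet option is there because member/2 leaves choice points behind even when only one solution exists.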
time_string(time(H,M),String)
:-
hour(H) ,
minute(M) ,
number_string(H,Hs) ,
number_string(M,Ms) ,
string_concat(Hs,":",S) ,
string_concat(S,Ms,String)
.
hour(H) :- between(0,23,H) .
minute(M) :- between(0,59,M) .
/*
?- time_string(time(10,30),B).
B = "10:30".
?- time_string(time(H,M),"10:30").
H = 10,
M = 30 ;
false.
?- time_string(time(H,M),S).
H = M, M = 0,
S = "0:0" ;
H = 0,
M = 1,
S = "0:1" ;
H = 0,
M = 2,
S = "0:2" ;
H = 0,
M = 3,
S = "0:3" %etc.
*/
Yet another answer, avoiding DCGs as overkill for this task. Or rather, the two separate tasks involved here: Not every relation can be expressed in a single Prolog predicate, especially not every relation on something as extra-logical as SWI-Prolog's strings.
So here is the solution for one of the tasks, computing strings from times (this is your code renamed):
time_string_(time(H,M), String) :-
number_string(H,Hour),
number_string(M,Min),
string_concat(Hour,":",S),
string_concat(S,Min,String).
For example:
?- time_string_(time(11, 59), String).
String = "11:59".
Here is a simple implementation of the opposite transformation:
string_time_(String, time(H, M)) :-
split_string(String, ":", "", [Hour, Minute]),
number_string(H, Hour),
number_string(M, Minute).
For example:
?- string_time_("11:59", Time).
Time = time(11, 59).
And here is a predicate that chooses which of these transformations to use, depending on which arguments are known. The exact condition will depend on the cases that can occur in your application, but it seems reasonable to say that if the string is indeed a string, we want to try to parse it:
time_string(Time, String) :-
( string(String)
-> % Try to parse the existing string.
string_time_(String, Time)
; % Hope that Time is a valid time term.
time_string_(Time, String) ).
This will translate both ways:
?- time_string(time(11, 59), String).
String = "11:59".
?- time_string(Time, "11:59").
Time = time(11, 59).
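The version above accepts strings such as "99:99" and would print time(9, 5) as "9:5". A possible refinement, sketched here under the assumption that the format is strictly two digits on each side of the colon, validates the ranges and zero-pads. The names hhmm_string/2, parse_hhmm/2, format_hhmm/2, two_digits/2 and padded/2 are hypothetical and not part of the answer.

```prolog
% Sketch only: all predicate names here are made up for illustration.
hhmm_string(Time, String) :-
    (   string(String)
    ->  parse_hhmm(String, Time)      % String is known: parse it
    ;   format_hhmm(Time, String)     % otherwise Time must be a full time/2 term
    ).

% Parse direction: exactly two digits on each side of the colon.
parse_hhmm(String, time(H, M)) :-
    split_string(String, ":", "", [Hs, Ms]),
    two_digits(Hs, H), between(0, 23, H),
    two_digits(Ms, M), between(0, 59, M).

% Generate direction: fails (rather than enumerating) if H or M is unbound.
format_hhmm(time(H, M), String) :-
    integer(H), between(0, 23, H),
    integer(M), between(0, 59, M),
    padded(H, Hs),
    padded(M, Ms),
    format(string(String), "~w:~w", [Hs, Ms]).

% A two-character string of decimal digits and its integer value.
two_digits(S, N) :-
    string_codes(S, [C1, C2]),
    code_type(C1, digit(W1)),
    code_type(C2, digit(W2)),
    N is W1*10 + W2.

% Zero-pad a non-negative integer below 100 to two characters.
padded(N, S) :-
    (   N < 10
    ->  format(string(S), "0~d", [N])
    ;   format(string(S), "~d", [N])
    ).
```

With this, hhmm_string(time(9, 5), S) gives S = "09:05", hhmm_string(T, "23:59") gives T = time(23, 59), and hhmm_string(T, "24:00") fails.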
I am new to Prolog
I am trying to write a Prolog rule that gives me a path from one node to another and also the total weight of that path.
I have succeeded in getting all the edges of the path, but I am not able to show its weight. I debugged it and saw that the variable S adds up to the whole weight of the path, but on the way back out of the recursion the accumulated value is lost. My idea is to add the total weight to P.
Code:
notIn(A,[]).
notIn(A,[H|T]):- A\==H,notIn(A,T).
path(X,X,_,[], S, P).
path(X,Y,[X|Cs], S, P) :-
path(X,Y,[X],Cs, S, P), P is S+W.
path(X,Y,Visited,[Z|Cs], S, P) :-
connection(X,Z,W),
notIn(Z,Visited),
path(Z,Y,[Z|Visited],Cs, S+W, P).
? path(ori, dest, X, 0, P).
Your predicate almost works. There are only two issues and some details I'd like to address. Firstly it would aid readability greatly to separate predicates with different arities. So let's put the one rule of path/5 in front of the two rules of path/6 like so:
path(X,Y,[X|Cs], S, P) :-
path(X,Y,[X],Cs, S, P),
P is S+W. % <-(1)
path(X,X,_,[], S, P).
path(X,Y,Visited,[Z|Cs], S, P) :-
connection(X,Z,W),
notIn(Z,Visited),
path(Z,Y,[Z|Visited],Cs, S+W, P). % <-(2)
Looking at your example query path/5 seems to be the predicate you want to call to find paths. In the second goal of its single rule (marked as % <-(1)) you are using the built-in is/2 with the expression S+W on the right hand side. The variable W appears here for the first time and thus is unbound. This leads to an instantiation error as illustrated by the following example:
?- X is 1+W.
ERROR!!
INSTANTIATION ERROR- in arithmetic: expected bound value
However, since you are only using path/5 to call path/6 there is no need for that goal. Secondly, in the second rule of path/6, in the last goal you are passing S+W as argument instead of evaluating it first. To see what happens, let's remove the goal marked % <-(1) from path/5 and add an example graph to your code:
connection(ori,a,2).
connection(a,b,5).
connection(b,a,4).
connection(b,dest,1).
Now consider your example query with an additional goal:
?- path(ori, dest, X, 0, P), Weight is P.
P = 0+2+5+1,
Weight = 8,
X = [ori,a,b,dest] ? ;
no
As you see, the argument S+W leads to the final weight being an expression rather than a value. Consider adding a goal S1 is S+W before the recursive goal and passing S1 as the argument.
Thirdly, you are using the built-in (\==)/2 in your predicate notIn/2. This comparison succeeds or fails without side effect or unification. That is fine as long as both arguments are bound to values, but it is problematic when used with unbound variables. Consider the following queries:
?- X=Y, X\==Y.
no
fails as expected but:
?- X\==Y, X=Y.
X = Y
succeeds, as X\==Y has no effect on the variables, so they can be unified in the next goal. It is a good idea to use dif/2 instead:
?- X=Y, dif(X,Y).
no
?- dif(X,Y), X=Y.
no
Lastly, two minor suggestions: First, since you are using the 4th argument of path/5 to pass 0 as a start-value for the weight, you might as well do that in the single goal of the rule, thereby simplifying the interface to path/4. Second, it would be nice to have a more descriptive name for the predicate that reflects its declarative nature, say start_end_path_weight/4. So your code would then look something like this:
notIn(A,[]).
notIn(A,[H|T]):-
dif(A,H),
notIn(A,T).
start_end_path_weight(X,Y,[X|Cs], P) :-
path(X,Y,[X],Cs, 0, P).
path(X,X,_,[], P, P).
path(X,Y,Visited,[Z|Cs], S, P) :-
connection(X,Z,W),
notIn(Z,Visited),
S1 is S+W,
path(Z,Y,[Z|Visited],Cs, S1, P).
With these modifications your example query looks like this:
?- start_end_path_weight(ori,dest,X,W).
W = 8,
X = [ori,a,b,dest] ? ;
no
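Once the weight comes back as a number, the paths can be compared. As a sketch that goes slightly beyond the question, the hypothetical helper lightest_path/4 below uses aggregate_all/3 to pick the lightest loop-free path. The edge connection(ori,b,9) is added here only so that there are two candidate paths. Enumerating every path like this is only reasonable for small graphs.

```prolog
:- use_module(library(aggregate)).   % aggregate_all/3

% Example graph from above, plus one extra edge (ori-b) for illustration.
connection(ori, a, 2).
connection(a, b, 5).
connection(b, a, 4).
connection(b, dest, 1).
connection(ori, b, 9).

notIn(_, []).
notIn(A, [H|T]) :-
    dif(A, H),
    notIn(A, T).

start_end_path_weight(X, Y, [X|Cs], P) :-
    path(X, Y, [X], Cs, 0, P).

path(X, X, _, [], P, P).
path(X, Y, Visited, [Z|Cs], S, P) :-
    connection(X, Z, W),
    notIn(Z, Visited),
    S1 is S+W,
    path(Z, Y, [Z|Visited], Cs, S1, P).

% Hypothetical helper: collect all paths and keep the one of minimal weight.
% Fails if there is no path from From to To.
lightest_path(From, To, Path, Weight) :-
    aggregate_all(min(W, P),
                  start_end_path_weight(From, To, P, W),
                  min(Weight, Path)).
```

Here the two candidates are [ori,a,b,dest] with weight 8 and [ori,b,dest] with weight 10, so lightest_path(ori, dest, Path, Weight) yields Path = [ori,a,b,dest] and Weight = 8.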
Here's how to improve upon @tas's answer by using clpfd for arithmetic instead of (is)/2:
:- use_module(library(clpfd)).
start_end_path_weight(X,Y,[X|Cs], P) :-
path(X,Y,[X],Cs, 0, P).
path(X,X,_,[], P, P).
path(X,Y,Visited,[Z|Cs], S, P) :-
connection(X,Z,W),
maplist(dif(Z),Visited), % replaces notIn(Z,Visited)
S1 #= S+W, S1 #=< P, % replaces S1 is S+W
path(Z,Y,[Z|Visited],Cs, S1, P).
Limiting the maximum costs? Piece of cake!
Consider the following InterRail subset ...
... translated to Prolog ...
connection(X,Y,D) :- to_fro_dt(X,Y,D).
connection(X,Y,D) :- to_fro_dt(Y,X,D).
to_fro_dt(aberdeen,edinburgh,140). to_fro_dt(amsterdam,berlin,370). to_fro_dt(amsterdam,brussels,113). to_fro_dt(amsterdam,cologne,158). to_fro_dt(amsterdam,copenhagen,675). to_fro_dt(ancona,igoumenitsa,900). to_fro_dt(athens,patras,215). to_fro_dt(athens,/* for consistency */piraeus,5). to_fro_dt(athens,thessaloniki,265). to_fro_dt(bar,belgrade,572). to_fro_dt(barcelona,madrid,170). to_fro_dt(barcelona,marseille,280). to_fro_dt(barcelona,sevilla,330). to_fro_dt(barcelona,valencia,175). to_fro_dt(bari,igoumenitsa,570). to_fro_dt(bari,rome,240). to_fro_dt(belfast,dublin,240). to_fro_dt(belgrade,bucharest,730). to_fro_dt(belgrade,budapest,450). to_fro_dt(belgrade,sarajevo,540). to_fro_dt(belgrade,skopje,525). to_fro_dt(belgrade,sofia,485). to_fro_dt(bergen,oslo,405). to_fro_dt(berlin,cologne,260). to_fro_dt(berlin,hamburg,95). to_fro_dt(berlin,munich,345). to_fro_dt(berlin,prague,275). to_fro_dt(berlin,warsaw,365). to_fro_dt(bern,frankfurt,235). to_fro_dt(bern,lyon,230). to_fro_dt(bern,milan,240). to_fro_dt(birmingham,edinburgh,265). to_fro_dt(birmingham,holyhead,245). to_fro_dt(birmingham,london,105). to_fro_dt(bologna,florence,37). to_fro_dt(bologna,milan,60). to_fro_dt(bordeaux,lyon,375). to_fro_dt(bordeaux,madrid,660). to_fro_dt(bordeaux,paris,180). to_fro_dt(bristol,london,105). to_fro_dt(brussels,cologne,107). to_fro_dt(brussels,frankfurt,190). to_fro_dt(brussels,london,140). to_fro_dt(brussels,paris,85). to_fro_dt(bucharest,budapest,830). to_fro_dt(bucharest,sofia,540). to_fro_dt(bucharest,zagreb,365). to_fro_dt(budapest,ljubljana,540). to_fro_dt(budapest,vienna,165). to_fro_dt(budapest,warsaw,680). to_fro_dt(budapest,zagreb,365). to_fro_dt(catania,naples,450). to_fro_dt(cologne,frankfurt,82). to_fro_dt(copenhagen,hamburg,270). to_fro_dt(copenhagen,oslo,520). to_fro_dt(copenhagen,stockholm,315). to_fro_dt(cork,dublin,165). to_fro_dt(dublin,holyhead,195). to_fro_dt(dublin,westport,210). to_fro_dt(edinburgh,glasgow,50). to_fro_dt(faro,lisbon,230). 
to_fro_dt(florence,rome,95). to_fro_dt(florence,venice,123). to_fro_dt(frankfurt,hamburg,220). to_fro_dt(frankfurt,munich,190). to_fro_dt(frankfurt,paris,235). to_fro_dt(hamburg,munich,350). to_fro_dt(helsinki,rovaniemi,570). to_fro_dt(helsinki,turku,110). to_fro_dt(heraklion,piraeus,390). to_fro_dt(igoumenitsa,patras,360). to_fro_dt(istanbul,sofia,775). to_fro_dt(istanbul,thessaloniki,720). to_fro_dt(kiruna,stockholm,960). to_fro_dt(lisbon,madrid,610). to_fro_dt(lisbon,porto,165). to_fro_dt(ljubljana,venice,540). to_fro_dt(ljubljana,zagreb,140). to_fro_dt(london,paris,135). to_fro_dt(london,penzance,305). to_fro_dt(lyon,marseille,100). to_fro_dt(lyon,paris,115). to_fro_dt(madrid,'málaga',165). to_fro_dt(madrid,pamplona,180). to_fro_dt(madrid,santander,270). to_fro_dt(madrid,santiago,425). to_fro_dt(madrid,sevilla,155). to_fro_dt(madrid,valencia,105). to_fro_dt(marseille,montpellier,140). to_fro_dt(marseille,nice,155). to_fro_dt(milan,munich,465). to_fro_dt(milan,nice,310). to_fro_dt(milan,venice,155). to_fro_dt(munich,prague,365). to_fro_dt(munich,venice,425). to_fro_dt(munich,vienna,250). to_fro_dt(naples,rome,70). to_fro_dt(oslo,stockholm,380). to_fro_dt(paris,rennes,120). to_fro_dt(piraeus,rhodes,710). to_fro_dt(prague,vienna,270). to_fro_dt(prague,warsaw,520). to_fro_dt(sarajevo,zagreb,550). to_fro_dt(skopje,sofia,540). to_fro_dt(skopje,thessaloniki,240). to_fro_dt(sofia,thessaloniki,400). to_fro_dt(split,zagreb,335). to_fro_dt(stockholm,/* added by hand */turku,725). to_fro_dt(stockholm,'östersund',420). to_fro_dt(trondheim,'östersund',230). to_fro_dt(venice,vienna,440). to_fro_dt(vienna,warsaw,450).
... let's find paths that
start in Vienna
include at least 2 other cities
and have a cumulative travel time of 10 hours (or less)!
?- W #=< 600, Path = [_,_,_|_], start_end_path_weight(vienna, _, Path, W).
W = 530, Path = [vienna,budapest,zagreb] ;
W = 595, Path = [vienna,munich,berlin] ;
W = 440, Path = [vienna,munich,frankfurt] ;
W = 522, Path = [vienna,munich,frankfurt,cologne] ;
W = 600, Path = [vienna,munich,hamburg] ;
W = 545, Path = [vienna,prague,berlin] ;
W = 563, Path = [vienna,venice,florence] ;
W = 600, Path = [vienna,venice,florence,bologna] ;
W = 595, Path = [vienna,venice,milan] ;
false. % terminates universally fast
I am trying to set the seeds inside the caret's gafsControl(), but I am getting this error:
Error in { : task 1 failed - "supplied seed is not a valid integer"
I understand that seeds for trainControl() is a list whose length is the number of resamples plus one, where each of the resample entries holds as many seeds as there are combinations of the model's tuning parameters (in my case 36: SVM with 6 sigma and 6 cost values). However, I couldn't figure out what I should use for gafsControl(). I've tried iters*popSize (100*10), iters (100), and popSize (10), but none has worked.
Thanks in advance.
here is my code (with simulated data):
library(caret)
library(doMC)
library(kernlab)
registerDoMC(cores=32)
set.seed(1234)
train.set <- twoClassSim(300, noiseVars = 100, corrVar = 100, corrValue = 0.75)
mylogGA <- caretGA
mylogGA$fitness_extern <- mnLogLoss
#Index for gafsControl
set.seed(1045481)
ga_index <- createFolds(train.set$Class, k=3)
#Seed for the gafsControl()
set.seed(1056)
ga_seeds <- vector(mode = "list", length = 4)
for(i in 1:3) ga_seeds[[i]] <- sample.int(1500, 1000)
## For the last model:
ga_seeds[[4]] <- sample.int(1000, 1)
#Index for the trainControl()
set.seed(1045481)
tr_index <- createFolds(train.set$Class, k=5)
#Seeds for the trainControl()
set.seed(1056)
tr_seeds <- vector(mode = "list", length = 6)
for(i in 1:5) tr_seeds[[i]] <- sample.int(1000, 36)#
## For the last model:
tr_seeds[[6]] <- sample.int(1000, 1)
gaCtrl <- gafsControl(functions = mylogGA,
method = "cv",
number = 3,
metric = c(internal = "logLoss",
external = "logLoss"),
verbose = TRUE,
maximize = c(internal = FALSE,
external = FALSE),
index = ga_index,
seeds = ga_seeds,
allowParallel = TRUE)
tCtrl = trainControl(method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = mnLogLoss,
index = tr_index,
seeds = tr_seeds,
allowParallel = FALSE)
svmGrid <- expand.grid(sigma= 2^c(-25, -20, -15,-10, -5, 0), C= 2^c(0:5))
t1 <- Sys.time()
set.seed(1234235)
svmFuser.gafs <- gafs(x = train.set[, names(train.set) != "Class"],
y = train.set$Class,
gafsControl = gaCtrl,
trControl = tCtrl,
popSize = 10,
iters = 100,
method = "svmRadial",
preProc = c("center", "scale"),
tuneGrid = svmGrid,
metric="logLoss",
maximize = FALSE)
t2<- Sys.time()
svmFuser.gafs.time<-difftime(t2,t1)
save(svmFuser.gafs, file ="svmFuser.gafs.rda")
save(svmFuser.gafs.time, file ="svmFuser.gafs.time.rda")
Session Info:
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8
[4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] kernlab_0.9-22 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2 caret_6.0-52 ggplot2_1.0.1 lattice_0.20-33
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 magrittr_1.5 splines_3.2.2 MASS_7.3-43 munsell_0.4.2
[6] colorspace_1.2-6 foreach_1.4.2 minqa_1.2.4 car_2.0-26 stringr_1.0.0
[11] plyr_1.8.3 tools_3.2.2 parallel_3.2.2 pbkrtest_0.4-2 nnet_7.3-10
[16] grid_3.2.2 gtable_0.1.2 nlme_3.1-122 mgcv_1.8-7 quantreg_5.18
[21] MatrixModels_0.4-1 iterators_1.0.7 gtools_3.5.0 lme4_1.1-9 digest_0.6.8
[26] Matrix_1.2-2 nloptr_1.0.4 reshape2_1.4.1 codetools_0.2-11 stringi_0.5-5
[31] compiler_3.2.2 BradleyTerry2_1.0-6 scales_0.3.0 stats4_3.2.2 SparseM_1.7
[36] brglm_0.5-9 proto_0.3-10
>
I am not so familiar with the gafsControl() function that you mention, but I encountered a very similar issue when setting parallel seeds using trainControl(). In the instructions, it describes how to create a list (length = number of resamples + 1), where each item is a list (length = number of parameter combinations to test). I find that doing that does not work (see topepo/caret issue #248 for info). However, if you then turn each item into a vector, e.g.
seeds <- lapply(seeds, as.vector)
then the seeds seem to work (i.e. models and predictions are entirely reproducible). I should clarify that this is using doMC as the backend. It may be different for other parallel backends.
Hope this helps
I was able to figure out my mistake by inspecting gafs.default. The seeds argument of gafsControl() takes a vector of length (n_repeats*nresampling)+1 and not a list (as in trainControl$seeds). It is actually stated in the documentation, ?gafsControl, that seeds is a vector of integers that can be used to set the seed during each search, and that the number of seeds must be equal to the number of resamples plus one. I figured it out the hard way; this is a reminder to carefully read the documentation :D.
if (!is.null(gafsControl$seeds)) {
  if (length(gafsControl$seeds) < length(gafsControl$index) + 1)
    stop(paste("There must be at least", length(gafsControl$index) + 1,
               "random number seeds passed to gafsControl"))
} else {
  gafsControl$seeds <- sample.int(1e+05, length(gafsControl$index) + 1)
}
So, the proper way to set my ga_seeds is:
#Index for gafsControl
set.seed(1045481)
ga_index <- createFolds(train.set$Class, k=3)
#Seed for the gafsControl()
set.seed(1056)
ga_seeds <- sample.int(1500, 4)
If the seeds are set that way, can you ensure that the same feature subset is selected on every run? I am asking because of the randomness of the GA.
Going through the second part of Nimrod's tutorial, I've reached the part where macros are explained. The documentation says they run at compile time, so I thought I could do some parsing of strings to create a domain-specific language for myself. However, there are no examples of how to do this; the debug macro example doesn't show how one deals with a string parameter.
I want to convert code like:
instantiate("""
height,f,132.4
weight,f,75.0
age,i,25
""")
…into something which by hand I would write like:
var height: float = 132.4
var weight: float = 75.0
var age: int = 25
Obviously this example is not very useful, but I want to look at something simple (multiline/comma splitting, then transformation) which could help me implement something more complex.
My issue here is how the macro obtains the input string and parses it (at compile time!), and what kind of code can run at compile time (is it just a subset of the language? Can I use macros/code from other imported modules?).
EDIT: Based on the answer here's a possible code solution to the question:
import macros, strutils
# Helper proc, macro inline lambdas don't seem to compile.
proc cleaner(x: var string) = x = x.strip()
macro declare(s: string): stmt =
  # First split all the input into separate lines.
  var
    rawLines = split(s.strVal, {char(0x0A), char(0x0D)})
    buf = ""
  for rawLine in rawLines:
    # Split the input line into three columns, stripped, and parse.
    var chunks = split(rawLine, ',')
    map(chunks, cleaner)
    if chunks.len != 3:
      error("Declare macro syntax is 3 comma separated values:\n" &
        "Got: '" & rawLine & "'")
    # Add the statement, prepending a block if the buffer is empty.
    if buf.len < 1: buf = "var\n"
    buf &= " " & chunks[0] & ": "
    # Parse the input type, which is an abbreviation.
    case chunks[1]
    of "i": buf &= "int = "
    of "f": buf &= "float = "
    else: error("Unexpected type '" & chunks[1] & "'")
    buf &= chunks[2] & "\n"
  # Finally, check if we did add any variable!
  if buf.len > 0:
    result = parseStmt(buf)
  else:
    error("Didn't find any input values!")
declare("""
x, i, 314
y, f, 3.14
""")
echo x
echo y
Macros can, by and large, utilize all pure Nimrod code that a procedure in the same place could see, too. E.g., you can import strutils or peg to parse your string, then construct output from that. Example:
import macros, strutils
macro declare(s: string): stmt =
  var parts = split(s.strVal, {' ', ','})
  if len(parts) != 3:
    error("declare macro requires three parts")
  result = parseStmt("var $1: $2 = $3" % parts)
declare("x, int, 314")
echo x
"Calling" a macro will basically evaluate it at compile time as though it were a procedure (with the caveat that the macro arguments will actually be ASTs, hence the need to use s.strVal above instead of s), then insert the AST that it returns at the position of the macro call.
The macro code is evaluated by the compiler's internal virtual machine.