Modelling full sequence with LSTM in Flux-Julia - machine-learning

I am trying to train an LSTM to model a full sequence y from a sequence x (not just the last item, and not a classifier). With the following code, training fails even though the loss function itself evaluates fine. It seems that the dot (broadcast) syntax does not work with train!. Any ideas how I could do this? In Keras it's so simple...
Thanks in advance,
Markus
using Flux
# Create synthetic data first
### Function to generate x consisting of three variables and a sequence length of 200
function generateX()
    x1 = Array{Float32, 1}(randn(200))
    x2 = Array{Float32, 1}(randn(200))
    x3 = Array{Float32, 1}(sin.((0:199) / 12*2*pi))
    xdata=[x1 x2 x3]'
    return(xdata)
end
### Generate 50 of these sequences of x
xdata = [generateX() for i in 1:50]
### Function to generate sequence of y from x sequence
function yfromx(x)
    y=Array{Float32, 1}(0.2*cumsum(x[1,:].*x[2,:].*exp.(x[1,:])) .+x[3,:])
    return(y')
end
ydata = map(yfromx, xdata);
### Now rearrange such that there is a sequence of 200 X inputs, i.e. an array of x vectors (and 50 of those sequences)
xdata=Flux.batch(xdata)
xdata2 = [xdata[:,s,c] for s in 1:200, c in 1:50]
xdata= [xdata2[:,c] for c in 1:50]
### Same for y
ydata=Flux.batch(ydata)
ydata2 = [ydata[:,s,c] for s in 1:200, c in 1:50]
ydata= [ydata2[:,c] for c in 1:50]
### Define model and loss function. "model." returns sequence of y from sequence of x
import Base.Iterators: flatten
model=Chain(LSTM(3, 26), Dense(26,1))
loss(x,y) = Flux.mse(collect(flatten(model.(x))),collect(flatten(y)))
model.(xdata[1]) # works fine
loss(xdata[2],ydata[2]) # also works fine
Flux.train!(loss, params(model), zip(xdata, ydata), ADAM(0.005)) ## Does not work, see error below. How to work around?
Error message
Mutating arrays is not supported
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] (::getfield(Zygote, Symbol("##992#993")))(::Nothing) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/lib/array.jl:44
[3] (::getfield(Zygote, Symbol("##2633#back#994")){getfield(Zygote, Symbol("##992#993"))})(::Nothing) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[4] copyto! at ./abstractarray.jl:725 [inlined]
[5] (::typeof(∂(copyto!)))(::Array{Float32,1}) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[6] _collect at ./array.jl:550 [inlined]
[7] (::typeof(∂(_collect)))(::Array{Float32,1}) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[8] collect at ./array.jl:544 [inlined]
[9] (::typeof(∂(collect)))(::Array{Float32,1}) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[10] loss at ./In[20]:4 [inlined]
[11] (::typeof(∂(loss)))(::Float32) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[12] #153 at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/lib/lib.jl:142 [inlined]
[13] #283#back at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/ZygoteRules/6nssF/src/adjoint.jl:49 [inlined]
[14] #15 at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Flux/oX9Pi/src/optimise/train.jl:69 [inlined]
[15] (::typeof(∂(λ)))(::Float32) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[16] (::getfield(Zygote, Symbol("##38#39")){Zygote.Params,Zygote.Context,typeof(∂(λ))})(::Float32) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface.jl:101
[17] gradient(::Function, ::Zygote.Params) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface.jl:47
[18] macro expansion at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Flux/oX9Pi/src/optimise/train.jl:68 [inlined]
[19] macro expansion at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Juno/oLB1d/src/progress.jl:134 [inlined]
[20] #train!#12(::getfield(Flux.Optimise, Symbol("##16#22")), ::typeof(Flux.Optimise.train!), ::Function, ::Zygote.Params, ::Base.Iterators.Zip{Tuple{Array{Array{Array{Float32,1},1},1},Array{LinearAlgebra.Adjoint{Float32,Array{Float32,1}},1}}}, ::ADAM) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Flux/oX9Pi/src/optimise/train.jl:66
[21] train!(::Function, ::Zygote.Params, ::Base.Iterators.Zip{Tuple{Array{Array{Array{Float32,1},1},1},Array{LinearAlgebra.Adjoint{Float32,Array{Float32,1}},1}}}, ::ADAM) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Flux/oX9Pi/src/optimise/train.jl:64
[22] top-level scope at In[24]:1
loss(xdata[2],ydata[2])

Well, following Frederik's path, the following loss seems to work, but frankly I don't quite like it, so I still wonder if there are more elegant/idiomatic/efficient(?) solutions...
function loss(x,y)
    yhat=model.(x)   # one prediction per time step
    s=0f0            # Float32 accumulator, matching the model output type
    for i in 1:length(yhat)
        s+=(yhat[i][1] - y[i][1])^2
    end
    s/=length(yhat)
    s
end

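Please take a look at Zygote.Buffer for the mutating array. Zygote cannot differentiate code that mutates arrays, and collect does so internally (see copyto! in the stack trace above); a Buffer is meant for exactly this case.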

Related

For each element, loop over all previous elements

I have a 2D JAX array containing an image.
For each pixel P[y, x] of the image, I would like to loop over all pixels P[y, x-i] to the left of that pixel and reduce those to a single value. The exact reduction computation involves finding a particular maximum over a weighted sum involving those pixels' values, as well as i and x. Therefore, the result (or any intermediate results) for P[y, x] can't be reused for P[y, x+1] either; this is an O(x²y) operation overall.
Can I accomplish this somewhat efficiently in JAX? If so, how?
JAX does not provide any native tool to do this sort of operation for an arbitrary function. It can be done via lax.scan or perhaps jnp.cumsum for functions where each successive value can be computed from the last, but it sounds like that is not the case here.
I believe the best you can do is to combine vmap with Python for-loops to achieve what you want: just be aware that during JIT compilation JAX will flatten all for loops, so if your image size is very large, the compilation time will be long. Here's a short example:
import jax.numpy as jnp
from jax import vmap
def reduction(x):
    # some 1D reduction
    assert x.ndim == 1
    return len(x) + jnp.sum(x)
def cumulative_apply(row, reduction=reduction):
    return jnp.array([reduction(row[:i]) for i in range(1, len(row) + 1)])
P = jnp.arange(20).reshape(4, 5)
result = vmap(cumulative_apply)(P)
print(result)
# [[ 1 3 6 10 15]
# [ 6 13 21 30 40]
# [11 23 36 50 65]
# [16 33 51 70 90]]
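If the reduction can be written as a weighted value followed by a max, one alternative to the unrolled Python loop is to build all (x, i) pairs at once with broadcasting and mask out the offsets that fall outside the image. The sketch below is only an illustration under assumptions: `weight` is a hypothetical stand-in for the real weighting, the max-of-weighted-values reduction is a guess at the shape of the actual computation, and it costs O(H·W²) memory. It is written with NumPy so it runs anywhere; the same indexing, `where`, and `max` operations are available in `jax.numpy`.

```python
import numpy as np

def weight(i, x):
    # Hypothetical weight depending on the offset i and the target column x.
    return 1.0 / (1.0 + i) + 0.01 * x

def left_max_reduce(P):
    """For each pixel P[y, x], max over i = 0..x of weight(i, x) * P[y, x - i]."""
    H, W = P.shape
    x = np.arange(W)[:, None]               # target column, shape (W, 1)
    i = np.arange(W)[None, :]               # offset to the left, shape (1, W)
    src = x - i                             # source column x - i, shape (W, W)
    valid = src >= 0                        # offsets that stay inside the row
    gathered = P[:, np.clip(src, 0, None)]  # P[y, x - i], shape (H, W, W)
    values = weight(i, x) * gathered        # weights broadcast over rows
    masked = np.where(valid, values, -np.inf)
    return masked.max(axis=-1)              # reduce over the offset axis -> (H, W)

P = np.arange(20, dtype=float).reshape(4, 5)
result = left_max_reduce(P)
print(result.shape)
```

Because there is no Python loop over columns, nothing is unrolled at trace time; the trade-off is the (H, W, W) intermediate array.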

Logging loss while training in Flux using callbacks

I'm trying to write a callback for the train! function in Flux.
My code is:
cb_loss = x -> push!(x, loss(x_train, y_train))
loss_vector = Vector{Float32}()
Flux.train!(loss, ps, train_data, opt, cb=cb_loss(loss_vector))
It gives me this error:
MethodError: objects of type Float32 are not callable
Stacktrace:
[1] call(::Float32) at C:\Users\arjur\.julia\packages\Flux\Fj3bt\src\optimise\train.jl:36
[2] foreach at .\abstractarray.jl:1920 [inlined]
[3] #10 at C:\Users\arjur\.julia\packages\Flux\Fj3bt\src\optimise\train.jl:38 [inlined]
[4] macro expansion at C:\Users\arjur\.julia\packages\Flux\Fj3bt\src\optimise\train.jl:93 [inlined]
[5] macro expansion at C:\Users\arjur\.julia\packages\Juno\oLB1d\src\progress.jl:134 [inlined]
[6] #train!#12(::Array{Float32,1}, ::typeof(Flux.Optimise.train!), ::typeof(loss), ::Zygote.Params, ::DataLoader, ::Descent) at C:\Users\arjur\.julia\packages\Flux\Fj3bt\src\optimise\train.jl:81
[7] (::Flux.Optimise.var"#kw##train!")(::NamedTuple{(:cb,),Tuple{Array{Float32,1}}}, ::typeof(Flux.Optimise.train!), ::Function, ::Zygote.Params, ::DataLoader, ::Descent) at .\none:0
[8] top-level scope at In[108]:1
Interestingly it properly adds the first value to the vector and then crashes so I guess the error message is related to that.
I checked the function outside the train! function and it works so how should I rewrite this function to log the loss in a vector?
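The cb keyword expects a function that takes no arguments, passed like this: cb=callback. Writing cb=cb_loss(loss_vector) calls cb_loss immediately and passes its return value, the vector itself, as the callback. train! then tries to call each Float32 element of that vector, which is why the first value is stored before the MethodError appears. So it can be done either using global variables or defining the callback like this: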
loss_vector = Vector{Float32}()
callback() = push!(loss_vector, loss(x_train, y_train))
Flux.train!(loss, ps, train_data, opt, cb=callback)
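The difference between passing a function and passing the result of calling it is not specific to Julia. Here is a small Python analogy; `train`, `current_loss`, and `log_loss` are hypothetical stand-ins for illustration and are not Flux's API.

```python
def train(data, cb):
    """Hypothetical training loop: calls cb() with no arguments after each step."""
    for _ in data:
        cb()

loss_history = []

def current_loss():
    # Placeholder for evaluating the loss on the training set.
    return 0.5

def log_loss():
    # Zero-argument callback that records the current loss.
    loss_history.append(current_loss())

# Correct: pass the function object itself.
train(range(3), cb=log_loss)

# Incorrect (left commented out): cb=log_loss() would run log_loss once up front
# and hand its return value to train, which then fails when it tries to call it.
# train(range(3), cb=log_loss())
```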

Recursive Feature Elimination (RFE) SKLearn

I created a table to test my understanding
F1 F2 Outcome
0 2 5 1
1 4 8 2
2 6 0 3
3 9 8 4
4 10 6 5
From F1 and F2 I tried to predict Outcome
As you can see, F1 has a strong correlation with Outcome, while F2 is random noise
I tested
pca = PCA(n_components=2)
fit = pca.fit(X)
print("Explained Variance")
print(fit.explained_variance_ratio_)
Explained Variance
[ 0.57554896 0.42445104]
Which is what I expected and shows that F1 is more important
However when I do RFE (Recursive Feature Elimination)
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=1)
fit = rfe.fit(X, Y)
print(fit.n_features_)
print(fit.support_)
print(fit.ranking_)
1
[False True]
[2 1]
It asked me to keep F2 instead? It should ask me to keep F1, since F1 is a strong predictor while F2 is random noise... Why F2?
Thanks
It is advisable to run Recursive Feature Elimination with Cross-Validation (RFECV) before running Recursive Feature Elimination (RFE).
Here is an example.
Suppose the data has these columns:
df.columns = ['age', 'id', 'sex', 'height', 'gender', 'marital status', 'income', 'race']
Use RFECV to identify the optimal number of features needed.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
rfc = RandomForestClassifier(random_state = 32) # Instantiate the estimator
rfecv = RFECV(estimator=rfc, step=1, cv=StratifiedKFold(2), scoring="accuracy") # Instantiate the RFECV and its parameters
fit = rfecv.fit(features, target) # features is X, target is y
print("Optimal number of features : %d" % rfecv.n_features_)
>>>> Optimal number of features : 4
Now that the optimal number of features is known, we can use Recursive Feature Elimination to identify the optimal features
from sklearn.feature_selection import RFE
rfc = RandomForestClassifier()
rfe = RFE(estimator=rfc, n_features_to_select=4, step=1)
fittings1 = rfe.fit(features, target)
for i in range(features.shape[1]):
    print('Column: %d, Selected %s, Rank: %.3f' % (i, rfe.support_[i], rfe.ranking_[i]))
The output will be something like:
>>> Column: 0, Selected True, Rank: 1.000
>>> Column: 1, Selected False, Rank: 4.000
>>> Column: 2, Selected False, Rank: 7.000
>>> Column: 3, Selected False, Rank: 10.000
>>> Column: 4, Selected True, Rank: 1.000
>>> Column: 5, Selected False, Rank: 3.000
Now display the features to remove, based on the recursive feature elimination done above
import numpy as np
columns_to_remove = features.columns.values[np.logical_not(rfe.support_)]
columns_to_remove
The output will be something like:
>>> array(['age', 'id', 'race'], dtype=object)
Now create your new dataset by dropping the unneeded features and keeping the needed ones
new_df = df.drop(['age', 'id', 'race'], axis = 1)
Then you can use cross-validation to check how well the newly selected features (new_df) predict the target column.
# Check how well the features predict the target variable using cross-validation
from sklearn.model_selection import ShuffleSplit, cross_val_score
cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
scores = cross_val_score(RandomForestClassifier(), new_df, target, cv= cv)
print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()))
>>> 0.84 accuracy with a standard deviation of 0.01
Don't forget you can also read up on the best cross-validation (CV) parameters to use in the documentation:
Recursive Feature Elimination (RFE) documentation, to learn more and understand it better
Recursive Feature Elimination with Cross-Validation (RFECV) documentation
You are using a LogisticRegression model. This is a classifier, not a regressor, so your outcome is treated as class labels (not numbers). For good training and prediction, a classifier needs multiple samples of each class, but in your data only a single row is present for each class. Hence the results are garbage and should not be taken seriously.
Try replacing it with any regression model and you will see the outcome you expected.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
rfe = RFE(model, n_features_to_select=1)
fit = rfe.fit(X, y)
print(fit.n_features_)
print(fit.support_)
print(fit.ranking_)
# Output
1
[ True False]
[1 2]
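Both points made in this thread can be checked directly on the five-row table from the question. This is a quick NumPy sketch, not part of either answer: it computes the Pearson correlation of each feature with Outcome, and counts how many rows each "class" has when Outcome is read as a label. The variable names are chosen here for illustration.

```python
import numpy as np

# The table from the question.
F1 = np.array([2, 4, 6, 9, 10], dtype=float)
F2 = np.array([5, 8, 0, 8, 6], dtype=float)
outcome = np.array([1, 2, 3, 4, 5])

# Pearson correlation of each feature with the outcome.
corr_f1 = np.corrcoef(F1, outcome)[0, 1]
corr_f2 = np.corrcoef(F2, outcome)[0, 1]
print("corr(F1, Outcome) = %.3f" % corr_f1)
print("corr(F2, Outcome) = %.3f" % corr_f2)

# If Outcome is treated as a class label, count the rows per class.
labels, counts = np.unique(outcome, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

F1 correlates strongly with Outcome and F2 barely at all, while every label occurs exactly once, which is the situation a classifier cannot learn from.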

Bayesian vs OLS

I found this question online. Can someone please explain in detail why using OLS is better? Is it only because the number of samples is not enough? Also, why not use all 1000 samples to estimate the prior distribution?
We have 1000 randomly sampled data points. The goal is to try to build
a regression model with one response variable from k regressor
variables. Which is better? 1. (Bayesian Regression) Using the first
500 samples to estimate the parameters of an assumed prior
distribution and then use the last 500 samples to update the prior to
a posterior distribution with posterior estimates to be used in the
final regression model. 2. (OLS Regression) Use a simple ordinary
least squares regression model with all 1000 regressor variables
"Better" is always a matter of opinion, and it greatly depends on context.
Advantages to a frequentist OLS approach: Simpler, faster, more accessible to a wider audience (and therefore less to explain). A wise professor of mine used to say "You don't need to build an atom smasher when a flyswatter will do the trick."
Advantages to an equivalent Bayesian approach: More flexible to further model development, can directly model posteriors of derived/calculated quantities (there are more, but these have been my motivations for going Bayesian with a given analysis). Note the word "equivalent" - there are things you can do in a Bayesian framework that you can't do within a frequentist approach.
And hey, here's an exploration in R, first simulating data, then using a typical OLS approach.
N <- 1000
x <- 1:N
epsilon <- rnorm(N, 0, 1)
y <- x + epsilon
summary(lm(y ~ x))
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9053 -0.6723 0.0116 0.6937 3.7880
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0573955 0.0641910 0.894 0.371
## x 0.9999997 0.0001111 9000.996 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 1.014 on 998 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 8.102e+07 on 1 and 998 DF, p-value: < 2.2e-16
...and here's an equivalent Bayesian regression, using non-informative priors on the regression parameters and all 1000 data points.
library(R2jags)
cat('model {
  for (i in 1:N){
    y[i] ~ dnorm(y.hat[i], tau)
    y.hat[i] <- a + b * x[i]
  }
  a ~ dnorm(0, .0001)
  b ~ dnorm(0, .0001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}', file="test.jags")
test.data <- list(x=x,y=y,N=1000)
test.jags.out <- jags(model.file="test.jags", data=test.data,
parameters.to.save=c("a","b","tau","sigma"), n.chains=3, n.iter=10000)
test.jags.out$BUGSoutput$mean$a
## [1] 0.05842661
test.jags.out$BUGSoutput$sd$a
## [1] 0.06606705
test.jags.out$BUGSoutput$mean$b
## [1] 0.9999976
test.jags.out$BUGSoutput$sd$b
## [1] 0.0001122533
Note that the parameter estimates and standard errors/standard deviations are essentially equivalent!
Now here's another Bayesian regression, using the first 500 data points to estimate the priors and then the last 500 to estimate posteriors.
test.data <- list(x=x[1:500],y=y[1:500],N=500)
test.jags.out <- jags(model.file="test.jags", data=test.data,
parameters.to.save=c("a","b","tau","sigma"), n.chains=3, n.iter=10000)
cat('model {
  for (i in 1:N){
    y[i] ~ dnorm(y.hat[i], tau)
    y.hat[i] <- a + b * x[i]
  }
  a ~ dnorm(a_mn, a_prec)
  b ~ dnorm(b_mn, b_prec)
  a_prec <- pow(a_sd, -2)
  b_prec <- pow(b_sd, -2)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}', file="test.jags1")
test.data1 <- list(x=x[501:1000],y=y[501:1000],N=500,
a_mn=test.jags.out$BUGSoutput$mean$a,a_sd=test.jags.out$BUGSoutput$sd$a,
b_mn=test.jags.out$BUGSoutput$mean$b,b_sd=test.jags.out$BUGSoutput$sd$b)
test.jags.out1 <- jags(model.file="test.jags1", data=test.data1,
parameters.to.save=c("a","b","tau","sigma"), n.chains=3, n.iter=10000)
test.jags.out1$BUGSoutput$mean$a
## [1] 0.01491162
test.jags.out1$BUGSoutput$sd$a
## [1] 0.08513474
test.jags.out1$BUGSoutput$mean$b
## [1] 1.000054
test.jags.out1$BUGSoutput$sd$b
## [1] 0.0001201778
Interestingly, the inferences are similar to the OLS results, but the agreement is not nearly as close as with the 1000-point Bayesian fit. This leads me to suspect that the 500 data points used to train the prior are not carrying as much weight in the analysis as the last 500, and the prior is effectively getting washed out, though I'm not sure on this point.
Regardless, I can't think of a reason not to use all 1000 data points (and non-informative priors) either, particularly since I suspect the 500+500 is using the first 500 and last 500 differently.
So perhaps, the answer to all of this is: I trust the OLS and 1000-point Bayesian results more than the 500+500, and OLS is simpler.
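One way to probe the suspicion about the 500+500 split is a simplified conjugate setting, sketched here in Python with NumPy. The assumptions are mine and not part of the R analysis above: the noise variance is treated as known (sigma² = 1), the prior on (a, b) is Gaussian, and `update` is a helper written for this illustration. Under those assumptions, updating on the first 500 points and then on the last 500 gives the same posterior as using all 1000 at once, provided the full joint posterior is carried forward. The JAGS code above forwards only the marginal means and SDs of a and b, which drops their correlation; the last lines mimic that.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = np.arange(1, N + 1, dtype=float)
y = x + rng.normal(0.0, 1.0, N)
X = np.column_stack([np.ones(N), x])   # design matrix: intercept a, slope b
sigma2 = 1.0                           # assumption: noise variance is known

def update(prior_mean, prior_prec, X, y, sigma2):
    """Gaussian conjugate update for regression coefficients (known noise variance)."""
    post_prec = prior_prec + X.T @ X / sigma2
    post_mean = np.linalg.solve(post_prec, prior_prec @ prior_mean + X.T @ y / sigma2)
    return post_mean, post_prec

m0 = np.zeros(2)
P0 = np.eye(2) * 1e-4                  # vague prior, precision as in dnorm(0, .0001)

# All 1000 points at once.
m_all, P_all = update(m0, P0, X, y, sigma2)

# First 500, then last 500, carrying the full joint posterior forward.
m_half, P_half = update(m0, P0, X[:500], y[:500], sigma2)
m_seq, P_seq = update(m_half, P_half, X[500:], y[500:], sigma2)

# Same split, but keeping only marginal variances (no a-b correlation).
cov_half = np.linalg.inv(P_half)
P_diag = np.diag(1.0 / np.diag(cov_half))
m_marg, P_marg = update(m_half, P_diag, X[500:], y[500:], sigma2)

print("full vs sequential means agree:", np.allclose(m_all, m_seq))
print("marginal-only precision matches full:", np.allclose(P_marg, P_all))
```

In this simplified model the split itself loses nothing; the loss comes from summarising the intermediate posterior by independent marginals. Whether that fully explains the JAGS numbers above is not something this sketch establishes, since there sigma is also re-estimated from a fresh prior.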
In my opinion it is not a matter of which is better but of which inference approach you're comfortable with.
You must remember that OLS comes from the frequentist school of inference, where estimation is done through maximum likelihood (ML), which for this particular problem coincides with a geometric argument of distance minimization (in my personal opinion this is very odd, as supposedly we are dealing with a random phenomenon).
On the other hand, in the Bayesian approach, inference is done through the posterior distribution, which is proportional to the product of the prior (representing the decision maker's previous information about the phenomenon) and the likelihood.
Again, the question is a matter of which inference approach you're comfortable with.

Caret doesn't run in parallel

Whether caret actually runs in parallel depends on the R, caret and doMC packages, as described at Parallelizing Caret code.
Is anyone working with an environment similar to mine? What is the latest R version on which caret parallelization works correctly?
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] caret_6.0-52 ggplot2_1.0.1 lattice_0.20-31 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2 RStudioAMI_0.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 magrittr_1.5 splines_3.2.1 MASS_7.3-41 munsell_0.4.2 colorspace_1.2-6
[7] minqa_1.2.4 car_2.1-0 stringr_1.0.0 plyr_1.8.3 tools_3.2.1 pbkrtest_0.4-2
[13] nnet_7.3-9 grid_3.2.1 gtable_0.1.2 nlme_3.1-120 mgcv_1.8-6 quantreg_5.19
[19] MatrixModels_0.4-1 gtools_3.5.0 lme4_1.1-9 digest_0.6.8 Matrix_1.2-0 nloptr_1.0.4
[25] reshape2_1.4.1 codetools_0.2-11 stringi_0.5-5 BradleyTerry2_1.0-6 scales_0.3.0 stats4_3.2.1
[31] SparseM_1.7 brglm_0.5-9 proto_0.3-10
Update 1 :
My code follows :
library(doMC) ; registerDoMC(cores=4)
library(caret)
classification_formula <- as.formula(paste("target" ,"~",
paste(names(m_input_data)[!names(m_input_data)=='target'],collapse="+")))
CVfolds <- 2
CVreps <- 5
ma_control <- trainControl(method = "repeatedcv",
number = CVfolds,
repeats = CVreps ,
returnResamp = "final" ,
classProbs = T,
summaryFunction = twoClassSummary,
allowParallel = TRUE,verboseIter = TRUE)
rf_tuneGrid = expand.grid(mtry = seq(2,32, length.out = 6))
rf <- train(classification_formula , data = m_input_data , method = "rf", metric="ROC" ,trControl = ma_control, tuneGrid = rf_tuneGrid , ntree = 101)
Update 2 :
When I run the script from the command line, only one core is working.
When I run the script from RStudio, parallelization works, since I see 4
processes via top. But a second later this error occurs:
Error in names(resamples) <- gsub("^\\.", "", names(resamples)) :
attempt to set an attribute on NULL
Update 4 :
Hi, it seems the problem was an R session that had been terminated. Each time I started the AWS instance, I ran the R code without refreshing the R engine. Now each time I refresh the RStudio browser tab I do Session -> Restart R. It seems to run.
I am now checking whether the same holds when running the script from the Ubuntu command line.
In general it runs but does not finish. Caret parallelizes at the data level, meaning it can process each resample in a different process. But if each sample is still big (100,000 / 2 rows (number of folds = 2) x 2,000 features), this can be hard for each processor to finish. Am I right?
I think the parallelism must be at the algorithm level, meaning a single run of the algorithm should use several cores. Is such an algorithm implementation available in caret?
I have the latest release for Linux platforms, R version 3.2.2 (2015-08-14, Fire Safety), and parallelization works fine. Can you provide the code that does not run in parallel?
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8 LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] kernlab_0.9-22 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2 caret_6.0-52 ggplot2_1.0.1 lattice_0.20-33
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 compiler_3.2.2 nloptr_1.0.4 plyr_1.8.3 tools_3.2.2 digest_0.6.8
[7] lme4_1.1-9 nlme_3.1-122 gtable_0.1.2 mgcv_1.8-7 Matrix_1.2-2 brglm_0.5-9
[13] SparseM_1.7 proto_0.3-10 BradleyTerry2_1.0-6 stringr_1.0.0 gtools_3.5.0 MatrixModels_0.4-1
[19] stats4_3.2.2 grid_3.2.2 nnet_7.3-10 minqa_1.2.4 reshape2_1.4.1 car_2.0-26
[25] magrittr_1.5 scales_0.3.0 codetools_0.2-11 MASS_7.3-43 splines_3.2.2 pbkrtest_0.4-2
[31] colorspace_1.2-6 quantreg_5.18 stringi_0.5-5 munsell_0.4.2
I've used your code for the BreastCancer dataset on my local machine and it worked in parallel without any problem. I am using RStudio Version 0.98.1103.
library(caret)
library(mlbench)
data(BreastCancer)
library(doMC)
registerDoMC(cores=2)
classification_formula <- as.formula(paste("Class" ,"~",
paste(names(BreastCancer)[!names(BreastCancer)=='Class'],collapse="+")))
CVfolds <- 2
CVreps <- 5
ma_control <- trainControl(method = "repeatedcv",
number = CVfolds,
repeats = CVreps ,
returnResamp = "final" ,
classProbs = T,
summaryFunction = twoClassSummary,
allowParallel = TRUE,verboseIter = TRUE)
rf_tuneGrid = expand.grid(mtry = seq(2,32, length.out = 6))
#Notice, it might be easier just to use Class~.
#instead of classification_formula
rf <- train(classification_formula ,
data = BreastCancer ,
method = "rf",
metric="ROC" ,
trControl = ma_control,
tuneGrid = rf_tuneGrid ,
ntree = 101)
> rf
Random Forest
699 samples
10 predictors
2 classes: 'benign', 'malignant'
No pre-processing
Resampling: Cross-Validated (2 fold, repeated 5 times)
Summary of sample sizes: 341, 342, 342, 341, 342, 341, ...
Resampling results across tuning parameters:
mtry ROC Sens Spec ROC SD Sens SD Spec SD
2 0.9867820 1.0000000 0.0000000 0.005007691 0.000000000 0.000000000
8 0.9899107 0.9549550 0.9640196 0.002243649 0.006714919 0.017247716
14 0.9907072 0.9558559 0.9631933 0.003028258 0.012345228 0.008019979
20 0.9909514 0.9635135 0.9556513 0.003268291 0.006864342 0.010471005
26 0.9911480 0.9630631 0.9539706 0.003384987 0.005113930 0.010628533
32 0.9911485 0.9657658 0.9522969 0.002973508 0.004842197 0.004090206
ROC was used to select the optimal model using the largest value.
The final value used for the model was mtry = 32.
>
