Logging loss while training in Flux using callbacks

I'm trying to write a callback for the train! function in Flux.
My code is:
cb_loss = x -> push!(x, loss(x_train, y_train))
loss_vector = Vector{Float32}()
Flux.train!(loss, ps, train_data, opt, cb=cb_loss(loss_vector))
It gives me this error:
MethodError: objects of type Float32 are not callable
Stacktrace:
[1] call(::Float32) at C:\Users\arjur\.julia\packages\Flux\Fj3bt\src\optimise\train.jl:36
[2] foreach at .\abstractarray.jl:1920 [inlined]
[3] #10 at C:\Users\arjur\.julia\packages\Flux\Fj3bt\src\optimise\train.jl:38 [inlined]
[4] macro expansion at C:\Users\arjur\.julia\packages\Flux\Fj3bt\src\optimise\train.jl:93 [inlined]
[5] macro expansion at C:\Users\arjur\.julia\packages\Juno\oLB1d\src\progress.jl:134 [inlined]
[6] #train!#12(::Array{Float32,1}, ::typeof(Flux.Optimise.train!), ::typeof(loss), ::Zygote.Params, ::DataLoader, ::Descent) at C:\Users\arjur\.julia\packages\Flux\Fj3bt\src\optimise\train.jl:81
[7] (::Flux.Optimise.var"#kw##train!")(::NamedTuple{(:cb,),Tuple{Array{Float32,1}}}, ::typeof(Flux.Optimise.train!), ::Function, ::Zygote.Params, ::DataLoader, ::Descent) at .\none:0
[8] top-level scope at In[108]:1
Interestingly, it properly adds the first value to the vector and then crashes, so I guess the error message is related to that.
I checked the function outside train! and it works, so how should I rewrite it to log the loss in a vector?

It seems that you need to pass the callback itself, like this: cb=callback. In your code, cb_loss(loss_vector) is evaluated before train! is even called: the push! runs once (which is why the first value lands in the vector) and returns the vector itself, which train! then treats as a collection of callbacks and tries to call each Float32 element. So it can be done either using global variables or by defining the callback like this:
loss_vector = Vector{Float32}()
callback() = push!(loss_vector, loss(x_train, y_train))
Flux.train!(loss, ps, train_data, opt, cb=callback)
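If evaluating the loss on the whole training set after every batch is too expensive, Flux.throttle can limit how often the callback fires. A minimal sketch with the same names as above (the 5-second interval is just an example):
loss_vector = Vector{Float32}()
callback() = push!(loss_vector, loss(x_train, y_train))
# Fire the callback at most once every 5 seconds:
Flux.train!(loss, ps, train_data, opt, cb=Flux.throttle(callback, 5))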

Related

Scikit-Learn issues error for RandomForestClassifier for multilabel classification - Jagged arrays

Scikit-Learn RandomForestClassifier throws an error for a multilabel classification problem.
This code fits a multilabel RandomForestClassifier, given predictors C and multilabels out, with no error:
import numpy as np
from sklearn.ensemble import RandomForestClassifier

C = np.array([[2,4,6],[4,2,1],[8,3,1]])
out = np.array([[0,1],[0,1],[1,0]])
rf = RandomForestClassifier(n_estimators=100, oob_score=True)
rf.fit(C, out)
If I modify the multilabels so that all the elements at a certain index are the same, say (where all the first components of the multilabels equal zero)
out = np.array([[0,1],[0,1],[0,0]])
I get a warning and then a traceback:
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a
list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated.
If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  y_pred = np.array(y_pred, copy=False)
...
raise ValueError(
    "The type of target cannot be used to compute OOB "
    f"estimates. Got {y_type} while only the following are "
    "supported: continuous, continuous-multioutput, binary, "
    "multiclass, multilabel-indicator."
)
ValueError: could not broadcast input array from shape (2,1) into shape (2,)
Not requesting OOB predictions does not result in an error:
rf_err = RandomForestClassifier(n_estimators=100, oob_score=False)
I cannot figure out why keeping the OOB predictions would trigger such an error when all the n-th components of the multilabels are equal.
In your setup out = np.array([[0,1],[0,1],[0,0]]) you do not have any examples where the first label is 1, so that output contains only one class.
That means that there is no 'class label' dimension for it and it can be omitted. That's why you see the (2,) shape.
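You can see this directly from the per-output probabilities (a quick check, assuming the same C and the modified out from above; the shapes in the comment are what I would expect):
import numpy as np
from sklearn.ensemble import RandomForestClassifier

C = np.array([[2,4,6],[4,2,1],[8,3,1]])
out = np.array([[0,1],[0,1],[0,0]])
rf = RandomForestClassifier(n_estimators=100, oob_score=False).fit(C, out)
# One probability array per output; the degenerate first output has a
# single class and therefore a single column:
print([p.shape for p in rf.predict_proba(C)])  # expected: [(3, 1), (3, 2)]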
Please describe your initial intent: why would you need to set a particular position in the labels to 0? If you want N-1 classes instead of N classes, I suggest removing the position itself and the elements of that class from the dataset, not putting all zeros:
out=[[1,0,0],[0,1,0],[0,1,0],[0,0,1],[1,0,0]] # 3 classes
# remove the second class:
out=[[1,0],[0,1],[1,0]] # 2 classes

Modelling full sequence with LSTM in Flux-Julia

I am trying to train an LSTM to model a full sequence y based on a sequence of x (not just the last item or a classifier). With the following code, the training does not work, although the loss function does. It seems that the broadcasting dot syntax does not work with train!? Any ideas how I could do it? In Keras it's so simple....
Thanks in advance,
Markus
using Flux
# Create synthetic data first
### Function to generate x consisting of three variables and a sequence length of 200
function generateX()
    x1 = Array{Float32, 1}(randn(200))
    x2 = Array{Float32, 1}(randn(200))
    x3 = Array{Float32, 1}(sin.((0:199) / 12 * 2 * pi))
    xdata = [x1 x2 x3]'
    return xdata
end
### Generate 50 of these sequences of x
xdata = [generateX() for i in 1:50]
### Function to generate sequence of y from x sequence
function yfromx(x)
    y = Array{Float32, 1}(0.2 * cumsum(x[1,:] .* x[2,:] .* exp.(x[1,:])) .+ x[3,:])
    return y'
end
ydata = map(yfromx, xdata);
### Now rearrange such that there is a sequence of 200 X inputs, i.e. an array of x vectors (and 50 of those sequences)
xdata=Flux.batch(xdata)
xdata2 = [xdata[:,s,c] for s in 1:200, c in 1:50]
xdata= [xdata2[:,c] for c in 1:50]
### Same for y
ydata=Flux.batch(ydata)
ydata2 = [ydata[:,s,c] for s in 1:200, c in 1:50]
ydata= [ydata2[:,c] for c in 1:50]
### Define model and loss function. "model." returns sequence of y from sequence of x
import Base.Iterators: flatten
model=Chain(LSTM(3, 26), Dense(26,1))
loss(x,y) = Flux.mse(collect(flatten(model.(x))),collect(flatten(y)))
model.(xdata[1]) # works fine
loss(xdata[2],ydata[2]) # also works fine
Flux.train!(loss, params(model), zip(xdata, ydata), ADAM(0.005)) ## Does not work, see error below. How to work around?
Error message
Mutating arrays is not supported
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] (::getfield(Zygote, Symbol("##992#993")))(::Nothing) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/lib/array.jl:44
[3] (::getfield(Zygote, Symbol("##2633#back#994")){getfield(Zygote, Symbol("##992#993"))})(::Nothing) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[4] copyto! at ./abstractarray.jl:725 [inlined]
[5] (::typeof(∂(copyto!)))(::Array{Float32,1}) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[6] _collect at ./array.jl:550 [inlined]
[7] (::typeof(∂(_collect)))(::Array{Float32,1}) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[8] collect at ./array.jl:544 [inlined]
[9] (::typeof(∂(collect)))(::Array{Float32,1}) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[10] loss at ./In[20]:4 [inlined]
[11] (::typeof(∂(loss)))(::Float32) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[12] #153 at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/lib/lib.jl:142 [inlined]
[13] #283#back at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/ZygoteRules/6nssF/src/adjoint.jl:49 [inlined]
[14] #15 at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Flux/oX9Pi/src/optimise/train.jl:69 [inlined]
[15] (::typeof(∂(λ)))(::Float32) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface2.jl:0
[16] (::getfield(Zygote, Symbol("##38#39")){Zygote.Params,Zygote.Context,typeof(∂(λ))})(::Float32) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface.jl:101
[17] gradient(::Function, ::Zygote.Params) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Zygote/fw4Oc/src/compiler/interface.jl:47
[18] macro expansion at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Flux/oX9Pi/src/optimise/train.jl:68 [inlined]
[19] macro expansion at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Juno/oLB1d/src/progress.jl:134 [inlined]
[20] #train!#12(::getfield(Flux.Optimise, Symbol("##16#22")), ::typeof(Flux.Optimise.train!), ::Function, ::Zygote.Params, ::Base.Iterators.Zip{Tuple{Array{Array{Array{Float32,1},1},1},Array{LinearAlgebra.Adjoint{Float32,Array{Float32,1}},1}}}, ::ADAM) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Flux/oX9Pi/src/optimise/train.jl:66
[21] train!(::Function, ::Zygote.Params, ::Base.Iterators.Zip{Tuple{Array{Array{Array{Float32,1},1},1},Array{LinearAlgebra.Adjoint{Float32,Array{Float32,1}},1}}}, ::ADAM) at /Net/Groups/BGI/scratch/mreichstein/julia_atacama_depots/packages/Flux/oX9Pi/src/optimise/train.jl:64
[22] top-level scope at In[24]:1
Well, following Frederik's path (it is the collect(flatten(...)) calls that Zygote flags as array mutation, as the copyto! and collect frames in the stacktrace show), the following loss seems to work, but frankly I don't quite like it, so I still wonder if there are more elegant/idiomatic/efficient(?) solutions...
function loss(x, y)
    yhat = model.(x)
    s = 0
    for i in 1:length(yhat)
        s += (yhat[i][1] - y[i][1])^2
    end
    s /= length(yhat)
    s
end
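An arguably more idiomatic variant avoids the explicit loop by summing per-timestep errors; since nothing is mutated, Zygote should differentiate it directly (a sketch under the same data layout as above, untested here):
loss(x, y) = sum(Flux.mse(model(xi), yi) for (xi, yi) in zip(x, y)) / length(x)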
Please take a look at Zygote.Buffer for the mutating-array problem.
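A minimal sketch of what that could look like here (same model and data layout as above; an illustration, not tested against this setup):
using Zygote
function loss(x, y)
    buf = Zygote.Buffer(zeros(Float32, length(x)))  # writable inside differentiated code
    for i in 1:length(x)
        buf[i] = (model(x[i])[1] - y[i][1])^2
    end
    return sum(copy(buf)) / length(x)  # copy(buf) turns the Buffer back into an array
end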

How to get loss function history using tf.contrib.opt.ScipyOptimizerInterface

I need to get the loss history over time to plot it in graph.
Here is my skeleton of code:
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    loss, method='L-BFGS-B',
    options={'maxiter': args.max_iterations, 'disp': print_iterations})
optimizer.minimize(sess, loss_callback=append_loss_history)
With append_loss_history defined as:
def append_loss_history(**kwargs):
    global step
    if step % 50 == 0:
        loss_history.append(loss.eval())
    step += 1
When I look at the verbose output of ScipyOptimizerInterface, the loss actually decreases over time. But when I print loss_history, the losses are nearly the same over time.
Referring to the doc:
"Variables subject to optimization are updated in-place AT THE END OF OPTIMIZATION"
https://www.tensorflow.org/api_docs/python/tf/contrib/opt/ScipyOptimizerInterface. Is that the reason the loss appears unchanged?
I think you have the problem down; the variables themselves are not modified until the end of the optimization (instead being fed to session.run calls), and evaluating a "back channel" Tensor gets the un-modified variables. Instead, use the fetches argument to optimizer.minimize to piggyback on the session.run calls which have the feeds specified:
import tensorflow as tf

def print_loss(loss_evaled, vector_evaled):
    print(loss_evaled, vector_evaled)

vector = tf.Variable([7., 7.], 'vector')
loss = tf.reduce_sum(tf.square(vector))
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    loss, method='L-BFGS-B',
    options={'maxiter': 100})

with tf.Session() as session:
    tf.global_variables_initializer().run()
    optimizer.minimize(session,
                       loss_callback=print_loss,
                       fetches=[loss, vector])
    print(vector.eval())
(Modified from the example in the documentation). This prints Tensors with the updated values:
98.0 [ 7. 7.]
79.201 [ 6.29289341 6.29289341]
7.14396e-12 [ -1.88996808e-06 -1.88996808e-06]
[ -1.88996808e-06 -1.88996808e-06]
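Applied to the original skeleton, this would mean fetching the loss and letting the callback receive the evaluated value as a positional argument, instead of calling loss.eval() (a sketch reusing the asker's step and loss_history globals):
def append_loss_history(loss_evaled):
    global step
    if step % 50 == 0:
        loss_history.append(loss_evaled)  # already evaluated; no .eval() needed
    step += 1

optimizer.minimize(sess, loss_callback=append_loss_history, fetches=[loss])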

CNTK: ValueError unbound Placeholder found in the function

I am working with CNTK and got the following error:
ValueError: 2 unbound Placeholder(s) 'Placeholder('keep', [#, *], [939]), Placeholder('keep', [#, *], [939])' found in the Function. All Placeholders of a Function must be bound (to a variable) before performing a Forward computation.
for i in range(10000):
    a1, a2, tar = get_sample(minibatch_size, start)
    start = start + int(minibatch_size)
    if start >= int(0.8 * float(len(lab))) - minibatch_size:
        start = 0
    trainer.train_minibatch({P1: a1, P2: a2, target: tar})
P1 and P2 are defined as C.layers.Input(939)
I was able to figure out the problem in my case. I had to pass the model output instead of the model itself as a parameter to the trainer constructor.
model = cntk.layers.Sequential([l1,l2])
model_output = model(predictor)
Error:
trainer = cntk.train.trainer.Trainer(model,(loss,meas),[learner])
No Error:
trainer = cntk.train.trainer.Trainer(model_output,(loss,meas),[learner])

Caret doesn't run in parallel

Whether caret actually parallelizes depends on the R, caret, and doMC versions, as described at Parallelizing Caret code.
Does anyone work with a similar environment to mine? What is the latest R version where caret parallelization works correctly?
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] caret_6.0-52 ggplot2_1.0.1 lattice_0.20-31 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2 RStudioAMI_0.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 magrittr_1.5 splines_3.2.1 MASS_7.3-41 munsell_0.4.2 colorspace_1.2-6
[7] minqa_1.2.4 car_2.1-0 stringr_1.0.0 plyr_1.8.3 tools_3.2.1 pbkrtest_0.4-2
[13] nnet_7.3-9 grid_3.2.1 gtable_0.1.2 nlme_3.1-120 mgcv_1.8-6 quantreg_5.19
[19] MatrixModels_0.4-1 gtools_3.5.0 lme4_1.1-9 digest_0.6.8 Matrix_1.2-0 nloptr_1.0.4
[25] reshape2_1.4.1 codetools_0.2-11 stringi_0.5-5 BradleyTerry2_1.0-6 scales_0.3.0 stats4_3.2.1
[31] SparseM_1.7 brglm_0.5-9 proto_0.3-10
Update 1:
My code follows:
library(doMC) ; registerDoMC(cores=4)
library(caret)
classification_formula <- as.formula(paste("target" ,"~",
paste(names(m_input_data)[!names(m_input_data)=='target'],collapse="+")))
CVfolds <- 2
CVreps <- 5
ma_control <- trainControl(method = "repeatedcv",
number = CVfolds,
repeats = CVreps ,
returnResamp = "final" ,
classProbs = T,
summaryFunction = twoClassSummary,
allowParallel = TRUE,verboseIter = TRUE)
rf_tuneGrid = expand.grid(mtry = seq(2,32, length.out = 6))
rf <- train(classification_formula , data = m_input_data , method = "rf", metric="ROC" ,trControl = ma_control, tuneGrid = rf_tuneGrid , ntree = 101)
Update 2:
When I run from the command line, only one core is working.
When I run the script from RStudio, the parallelization works, since I see 4 processes via top. But a second after this, the error happens:
Error in names(resamples) <- gsub("^\\.", "", names(resamples)) :
attempt to set an attribute on NULL
Update 4:
Hi, it seems the problem was an R session that had been terminated. Each time I started the AWS instance, I ran the R code without refreshing the R engine. Now each time I refresh the RStudio browser I do Session -> Restart R, and it seems to run.
I am checking now whether the same holds when running the script from the Ubuntu command line.
Generally it runs but does not finish. Caret parallelizes at the data level: it can process each resample on a different process. But if each resample is still big ((100,000 rows / 2 folds) x 2,000 features), this can be hard to finish for each processor unit. Am I right?
I think the parallelism would have to be at the algorithm level, i.e. each individual model fit running on several cores. Is such an algorithm implementation available in caret?
I have the latest release for Linux platforms, R version 3.2.2 (2015-08-14, Fire Safety), and parallelization works fine. Can you provide the code that does not work in parallel?
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8 LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] kernlab_0.9-22 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2 caret_6.0-52 ggplot2_1.0.1 lattice_0.20-33
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 compiler_3.2.2 nloptr_1.0.4 plyr_1.8.3 tools_3.2.2 digest_0.6.8
[7] lme4_1.1-9 nlme_3.1-122 gtable_0.1.2 mgcv_1.8-7 Matrix_1.2-2 brglm_0.5-9
[13] SparseM_1.7 proto_0.3-10 BradleyTerry2_1.0-6 stringr_1.0.0 gtools_3.5.0 MatrixModels_0.4-1
[19] stats4_3.2.2 grid_3.2.2 nnet_7.3-10 minqa_1.2.4 reshape2_1.4.1 car_2.0-26
[25] magrittr_1.5 scales_0.3.0 codetools_0.2-11 MASS_7.3-43 splines_3.2.2 pbkrtest_0.4-2
[31] colorspace_1.2-6 quantreg_5.18 stringi_0.5-5 munsell_0.4.2
I've used your code for the BreastCancer dataset on my local machine and it worked in parallel without any problem. I am using RStudio Version 0.98.1103.
library(caret)
library(mlbench)
data(BreastCancer)
library(doMC)
registerDoMC(cores=2)
classification_formula <- as.formula(paste("Class" ,"~",
paste(names(BreastCancer)[!names(BreastCancer)=='Class'],collapse="+")))
CVfolds <- 2
CVreps <- 5
ma_control <- trainControl(method = "repeatedcv",
number = CVfolds,
repeats = CVreps ,
returnResamp = "final" ,
classProbs = T,
summaryFunction = twoClassSummary,
allowParallel = TRUE,verboseIter = TRUE)
rf_tuneGrid = expand.grid(mtry = seq(2,32, length.out = 6))
#Notice, it might be easier just to use Class~.
#instead of classification_formula
rf <- train(classification_formula ,
data = BreastCancer ,
method = "rf",
metric="ROC" ,
trControl = ma_control,
tuneGrid = rf_tuneGrid ,
ntree = 101)
> rf
Random Forest
699 samples
10 predictors
2 classes: 'benign', 'malignant'
No pre-processing
Resampling: Cross-Validated (2 fold, repeated 5 times)
Summary of sample sizes: 341, 342, 342, 341, 342, 341, ...
Resampling results across tuning parameters:
mtry ROC Sens Spec ROC SD Sens SD Spec SD
2 0.9867820 1.0000000 0.0000000 0.005007691 0.000000000 0.000000000
8 0.9899107 0.9549550 0.9640196 0.002243649 0.006714919 0.017247716
14 0.9907072 0.9558559 0.9631933 0.003028258 0.012345228 0.008019979
20 0.9909514 0.9635135 0.9556513 0.003268291 0.006864342 0.010471005
26 0.9911480 0.9630631 0.9539706 0.003384987 0.005113930 0.010628533
32 0.9911485 0.9657658 0.9522969 0.002973508 0.004842197 0.004090206
ROC was used to select the optimal model using the largest value.
The final value used for the model was mtry = 32.
>
