Default resampling method in mlr3

If I do not specify a resampling method in my mlr3 code, what happens in the background? Is a default resampling method used, and if so, which one? k-fold CV?
Warm regards

You have two options:
Read the documentation at https://mlr3book.mlr-org.com
Run the code yourself and inspect the resampling object
(Asking such a question suggests you have not invested even a few minutes in your own research.)
Please also see https://stackoverflow.com/help/how-to-ask before asking further questions.

Related

CV or train/predict in mlr3

In the post "The 'Cross-Validation - Train/Predict' misunderstanding" by Patrick Schratz (https://mlr-org.com/docs/cv-vs-predict/), it is mentioned that:
(a) CV is done to get an estimate of a model’s performance.
(b) Train/predict is done to create the final predictions (which your boss might use to make some decisions on).
Does this mean that in mlr3, if we are in academia and need to publish papers, we should use CV, since we intend to compare the performance of different algorithms? And in industry, if the plan is to train a model and then use it again and again on industry data to make predictions, we should use the train/predict methods provided by mlr3?
Or have I understood this completely wrong?
Thank you
You always need CV if you want to make a statement about a model's performance.
If you want to use the model to make predictions on unseen data, do a single fit and then predict.
So in practice, you need both: CV + "train + predict".
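To make the distinction concrete, here is a minimal sketch in Python with scikit-learn (mlr3 itself is an R package, so this is only an illustrative analogue; the dataset and model are arbitrary stand-ins):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# (a) CV: estimate the model's generalization performance
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("5-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# (b) Train/predict: fit once on all available data, then predict
model.fit(X, y)
predictions = model.predict(X[:10])  # stand-in for genuinely new data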
PS: Your post does not really fit Stack Overflow, since it is not related to a coding problem. For statistical questions, please see https://stats.stackexchange.com/.
PS2: If you talk about a post, please include the link. I am the author of the post in this case but most other people might not know what you are talking about ;)

How can I re-train my logistic model using pymc3?

I have a binary classification problem with around 15 features, which I have chosen using another model. Now I want to perform Bayesian logistic regression on these features. My target classes are highly imbalanced (the minority class is 0.001%) and I have around 6 million records. I want to build a model which can be re-trained nightly or on weekends using Bayesian logistic regression.
Currently, I have divided the data into 15 parts. I train my model on the first part and test on the last part, then I update my priors using the Interpolated method of pymc3 and re-run the model on the 2nd set of data. I check the accuracy and other metrics (ROC, F1-score) after each run.
Problems:
My score is not improving.
Am I using the right approach?
This process is taking too much time.
If someone can guide me with the right approach and code snippets, it would be very helpful.
You can use variational inference. It is faster than sampling and produces broadly similar results. pymc3 itself provides methods for VI; you can explore those.
I can only answer this part of the question. If you elaborate on your problem a bit further, maybe I can help more.
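A minimal sketch of a Bayesian logistic regression fitted with ADVI in pymc3; the data arrays, priors, and iteration counts are placeholder assumptions, not taken from the question:

import numpy as np
import pymc3 as pm

X_train = np.random.randn(1000, 15)                  # placeholder for your 15 features
y_train = (np.random.rand(1000) < 0.01).astype(int)  # placeholder imbalanced target

with pm.Model():
    beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=X_train.shape[1])
    intercept = pm.Normal("intercept", mu=0.0, sigma=1.0)
    p = pm.math.sigmoid(pm.math.dot(X_train, beta) + intercept)
    pm.Bernoulli("obs", p=p, observed=y_train)

    # ADVI is typically much faster than MCMC sampling on millions of records
    approx = pm.fit(n=20000, method="advi")
    trace = approx.sample(1000)  # posterior samples from the fitted approximation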

What is OOF approach in machine learning?

I have seen people in many Kaggle notebooks talk about the OOF approach when they do machine learning with k-fold validation. What is OOF, and is it related to k-fold validation? Also, can you suggest some useful resources to understand the concept in detail?
Thanks for helping!
OOF simply stands for "out-of-fold" and refers to a step in the learning process when using k-fold validation, in which the predictions from each set of folds are grouped together into one set of predictions, one per training example. These predictions are now "out of the folds", and error can be calculated on them to get a good measure of how good your model is.
In terms of learning more about it, there is really not much more to it than that; it certainly isn't a learning technique of its own. If you have a small follow-up question, please leave a comment and I will try to update my answer to cover it.
EDIT: While ambling around the inter-webs I stumbled upon this relatively similar question from Cross-Validated (with a slightly more detailed answer), perhaps it will add some intuition if you are still confused.
I found this article from Machine Learning Mastery that explains out-of-fold predictions in some depth.
Below an extract from the article explaining what out of fold (OOF) prediction is:
"An out-of-fold prediction is a prediction by the model during the k-fold cross-validation procedure.
That is, out-of-fold predictions are those predictions made on the holdout datasets during the resampling procedure. If performed correctly, there will be one prediction for each example in the training dataset."
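A minimal sketch of collecting OOF predictions in Python with scikit-learn; the synthetic dataset and model choice are arbitrary stand-ins:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each prediction is made by a fold's model that never saw that example during
# training, so oof_pred holds exactly one out-of-fold prediction per example.
oof_pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=5)
print("OOF accuracy:", accuracy_score(y, oof_pred))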

Regression. Optimize median instead of mean for skewed distribution

Let's say I am doing a DNN regression task on some skewed data distribution, and I am currently using mean absolute error as the loss function.
Typical approaches in machine learning minimize the mean loss, but for a skewed distribution that is inappropriate; from a practical point of view it is better to minimize the median loss. I think one way is to penalize big losses with some coefficient, so that the mean moves closer to the median. But how do I calculate that coefficient for an unknown distribution type? Are there other approaches? What can you advise?
(I am using tensorflow/keras)
Just use the mean absolute error loss function in Keras instead of the mean squared error.
Minimizing the mean absolute error drives predictions toward the conditional median (whereas mean squared error targets the conditional mean), and it is in any case more robust to outliers and skewed data. You should have a look at all of the possible Keras losses:
https://keras.io/losses/
and obviously you can create your own too.
But for most data sets it just empirically turns out that mean squared error gets you better accuracy, so I would recommend at least trying both methods before settling on the mean absolute one.
If you have skewed error distributions, you can use tfp.stats.percentile as your Keras loss function, with something like:
import tensorflow as tf
import tensorflow_probability as tfp

def loss_fn(y_true, y_pred):
    # median (q=50) of the absolute errors in the batch
    return tfp.stats.percentile(tf.abs(y_true - y_pred), q=50)

model.compile(loss=loss_fn)
It provides gradients, so it works with Keras, although it isn't as fast as MAE/MSE.
https://www.tensorflow.org/probability/api_docs/python/tfp/stats/percentile
Customizing loss (/objective) functions is tough. Keras does theoretically allow you to do this, though they seem to have removed the documentation specifically describing it in their 2.0 release.
You can check their docs on loss functions for ideas, and then head over to the source code to see what kind of an API you should implement.
However, there are a number of issues filed by people who are having trouble with this, and the fact that they've removed the documentation on it is not inspiring.
Just remember that you have to use Keras' own backend to compute your loss function. If you get it working, please write a blog post or update with an answer here, because this is something quite a few other people have struggled or are struggling with!
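A minimal sketch of a custom loss written against the Keras backend; the name custom_mae is arbitrary, and model is assumed to be an already-defined Keras model:

from tensorflow.keras import backend as K

def custom_mae(y_true, y_pred):
    # must be expressed in backend ops so Keras can differentiate it
    return K.mean(K.abs(y_true - y_pred), axis=-1)

model.compile(optimizer="adam", loss=custom_mae)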

WEKA's MultilayerPerceptron: training then training again

I am trying to do the following with weka's MultilayerPerceptron:
Train with a small subset of the training Instances for a portion of the epochs,
Train with the whole set of Instances for the remaining epochs.
However, when I do the following in my code, the network seems to reset itself to start with a clean slate the second time.
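// First pass: train on the small subset for some epochs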
mlp.setTrainingTime(smallTrainingSetEpochs);
mlp.buildClassifier(smallTrainingSet);
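// Second pass: intended to continue training on the full set, but buildClassifier() starts over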
mlp.setTrainingTime(wholeTrainingSetEpochs);
mlp.buildClassifier(wholeTrainingSet);
Am I doing something wrong, or is this the way that the algorithm is supposed to work in weka?
If you need more information to answer this question, please let me know. I am kind of new to programming with weka and am unsure as to what information would be helpful.
This thread on the weka mailing list is a question very similar to yours.
It seems that this is how weka's MultilayerPerceptron is supposed to work: it is designed as a 'batch' learner, but you are trying to use it incrementally. Only classifiers that implement weka.classifiers.UpdateableClassifier can be trained incrementally.
