Getting the best parameters using sklearn nested cross-validation - machine-learning

Is my understanding correct that "greed_search.fit(X,Y)" has nothing to do with nested CV in the following code? That would mean there is no way to get the best parameters using nested CV in sklearn.
# inner cross_validation
greed_search = GridSearchCV(estimator=estimator, param_grid=parameters, cv=inner_cv, scoring=optimized_for)
greed_search.fit(X, optimization_label)
# Nested CV with parameter optimization
nested_score = cross_val_score(greed_search, X=X, y=Y, cv=outer_cv)

You are right: the greed_search.fit(X, optimization_label) call in your code runs on its own and is not nested inside the subsequent cross-validation.
To answer your second question, let me ask you another one: what should the best parameters of a grid search nested inside a cross-validation be? The ones from the first fold? The most common ones?
At each step of the outer cross-validation, the inner grid search selects the best parameters according to the training data of that step. Hence the parameters can change from fold to fold. If you run the outer cross-validation yourself you can record the best parameters of each step, but I don't think you really need them.
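If you do want to inspect the per-fold parameters, here is a minimal sketch of one way to do it. It assumes scikit-learn's cross_validate with return_estimator=True, which keeps the fitted GridSearchCV of each outer fold. The iris data, the SVC estimator and the small grid are illustrative assumptions, not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.svm import SVC

# Toy data and estimator, chosen only for illustration.
X, y = load_iris(return_X_y=True)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

grid_search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1]},
    cv=inner_cv,
)

# return_estimator=True keeps the GridSearchCV fitted on each outer training split.
results = cross_validate(grid_search, X, y, cv=outer_cv, return_estimator=True)

# One set of "best" parameters per outer fold; they are allowed to differ.
best_params_per_fold = [est.best_params_ for est in results["estimator"]]
for fold, (params, score) in enumerate(zip(best_params_per_fold, results["test_score"])):
    print(f"outer fold {fold}: best params {params}, test score {score:.3f}")
```

The outer scores estimate how well the whole "grid search plus refit" procedure generalizes. To obtain one final set of parameters, the usual approach is to fit the grid search once on all the data.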

Related

tfx.components.StatisticsGen displays train and eval in two different figures; is it possible to have them in a single figure as tfdv does?

a superimposed display for train/val splits using StatisticsGen
Hi,
I'm currently using a TFX pipeline inside Kubeflow. I am struggling to get StatisticsGen to show a single graph with the train and validation split curves superimposed, which would allow a better comparison of the distributions. This is exactly how tfdv.visualize_statistics(lhs_statistics=train_stats, rhs_statistics=eval_stats, lhs_name='train', rhs_name='eval') behaves (see illustration 1), and I would like StatisticsGen to also provide a superimposed splits graph.
Thanks for any reference or help so that I can move forward.
Regards
You can use something like
# Compare evaluation data with training data
tfdv.visualize_statistics(lhs_statistics=eval_stats, rhs_statistics=train_stats,
                          lhs_name='EVAL_DATASET', rhs_name='TRAIN_DATASET')
From the tensorflow data validation tutorial

Feature selection with GridsearchCV

I am trying to use GridSearchCV to optimize a pipeline that does feature selection in the beginning and classification using KNN at the end. I have fitted the model using my data set but when I see the best parameters found by GridSearchCV, it only gives the best parameters for SelectKBest. I have no idea why it doesn't show the best parameters for KNN.
Here is my code.
Addition of KNN and SelectKbest
classifier = KNeighborsClassifier()
parameters = {"classify__n_neighbors": list(range(5, 15)),
              "classify__p": [1, 2]}
sel = SelectKBest(f_classif)
param={'kbest__k': [10, 20 ,30 ,40 ,50]}
GridsearchCV with pipeline and parameter grid
model = GridSearchCV(Pipeline([('kbest', sel), ('classify', classifier)]),
                     param_grid=[param, parameters], cv=10)
fitting the model
model.fit(X_new, y)
the result
print(model.best_params_)
{'kbest__k': 40}
I believe that is an incorrect way of merging dicts. Try
param_grid={**param,**parameters}
or (Python 3.9+)
param_grid=param|parameters
When param_grid is a list, the disjoint union of the grids generated by each dictionary in the list is explored. So your search is over (1) the default k=10 selected features with every combination of classifier parameters, and separately (2) the default classifier parameters with each value of k. That the best parameters show only k=40 means that having more features, even with the default classifier, performed best. You can check your cv_results_ to verify.
As dx2-66 answers, merging the dictionaries will generate the full grid you probably are after. You could also just define a single dictionary from the start.
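The difference between the two forms can be checked without fitting anything. The sketch below uses scikit-learn's ParameterGrid, which enumerates candidates the same way GridSearchCV does, with the two dictionaries from the question.

```python
from sklearn.model_selection import ParameterGrid

param = {"kbest__k": [10, 20, 30, 40, 50]}
parameters = {"classify__n_neighbors": list(range(5, 15)),
              "classify__p": [1, 2]}

# A list of dicts: each dict is expanded on its own, then the results are concatenated.
as_list = ParameterGrid([param, parameters])
# A single merged dict: the full Cartesian product over all three parameters.
merged = ParameterGrid({**param, **parameters})

print(len(as_list))  # 5 + 10 * 2 = 25 candidates
print(len(merged))   # 5 * 10 * 2 = 100 candidates

# With the list form, no candidate sets both a kbest and a classify parameter.
mixed = [c for c in as_list
         if "kbest__k" in c and any(k.startswith("classify__") for k in c)]
print(len(mixed))    # 0
```

With the list form, a candidate sets either k or the KNN parameters, never both, which is why best_params_ in the question contains only one of the two groups.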

Some xgboost options do not differentiate predictions

I want to customize my model with a few parameters and choose the best one. I want to customize the objective property, eval_metric and possibly feval. My problem is that the properties eval_metric and feval do not affect the prediction at all. I tried to specify the disable_default_eval_metric:1 property but it didn't help.
What is the reason?
y1 = xgb.train({'objective': 'reg:squaredlogerror'}, dtrain=dtrain).predict(dtest)
y2 = xgb.train(
    {'objective': 'reg:squaredlogerror', 'disable_default_eval_metric': 1, 'eval_metric': 'rmsle'},
    dtrain=dtrain
).predict(dtest)
Prediction:
y1 = [3.9530325, 4.1704693, 4.18354, 4.1704693, 3.9317188]
y2 = [3.9530325, 4.1704693, 4.18354, 4.1704693, 3.9317188]
As the name implies, eval_metric is only used for evaluation, and it does not affect in any way the model training; it only reports back the value of the chosen metric(s). During training, the model only tries to optimize the objective, and it does not bother at all with any eval_metric (save for reporting it back, and possibly using it for early stopping, if such an option has been selected).
This is the reason why you can use multiple functions in eval_metric; from the docs:
User can add multiple evaluation metrics.
This would not be the case if eval_metric was directly used for model optimization during training, as it would raise the issue of which single one to optimize. Notice that, in contrast to eval_metric, you cannot have multiple objective functions.
Given that, what you report is absolutely expected, i.e. your models are actually the same in both cases.

How do you actually apply a trained model?

I've been slowly going through the tensorflow tutorials, and I assume I will have to again. I don't have a background in ML but am slowly pushing my way up.
Anyway, after reading through the RNN tutorial and running the training code, I am confused.
How does one actually apply the trained model so that it can be used to make language predictions?
I know this is a terribly noobish and simple question, but I believe it will be of use to others, as I have seen it asked and not answered in a satisfactory way.
In general, when you train a model, you first do a forward pass, and then a backward pass. The forward pass makes a prediction based on your input data, and the backward pass adjusts your model based on how correct your prediction was.
So when you want to apply your model, you just do a forward pass with your new data as input.
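As a framework-free illustration of that idea, here is a minimal NumPy sketch. It is a toy linear model, not the RNN from the tutorial, and all names and data are made up for the example. Training alternates forward and backward passes; applying the model reuses only the forward function.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 1))
y_train = 3.0 * X_train[:, 0] + 1.0   # toy target: y = 3x + 1

w, b = 0.0, 0.0                       # model parameters


def forward(X, w, b):
    """Forward pass: turn inputs into predictions."""
    return X[:, 0] * w + b


# Training: forward pass, then backward pass (gradient step on the squared error).
for _ in range(500):
    pred = forward(X_train, w, b)
    err = pred - y_train
    grad_w = 2.0 * np.mean(err * X_train[:, 0])
    grad_b = 2.0 * np.mean(err)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

# Applying the trained model: forward pass only, on new data.
X_new = np.array([[2.0]])
y_new = forward(X_new, w, b)
print(y_new)  # close to 3 * 2 + 1 = 7
```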
In your particular example, using this code, you can see how it's done by looking at how they run the test set, starting at line 286.
# They instantiate the model with is_training=False
mtest = PTBModel(is_training=False, config=eval_config)
# Then they can do a forward pass
test_perplexity = run_epoch(session, mtest, test_data, tf.no_op())
print("Test Perplexity: %.3f" % test_perplexity)
And if you want the actual prediction and not the perplexity, it is the state in the run_epoch function:
cost, state, _ = session.run([m.cost, m.final_state, eval_op],
                             {m.input_data: x,
                              m.targets: y,
                              m.initial_state: state})

Applying Multi-label Transformation in Rapidminer?

I am working on text categorization in RapidMiner and need a problem transformation method that converts a multi-label data set into a single-label one (e.g. Label Powerset), but I couldn't find one in RapidMiner. I am sure I am missing something, or maybe RapidMiner provides them under another name?
1) I searched and found the "Polynomial By Binomial" operator for RapidMiner, which I think uses Binary Relevance internally for problem transformation, but how can I apply others, e.g. Label Powerset or Classifier Chains?
2) The SVM (learner) inside the "Polynomial By Binomial" operator is applied K times (K = number of classes) and the K models are combined into a single model, but it would still classify a multi-label example as a single-label example. How can I get the multiple labels associated with an example?
3) Do I have to store each model generated inside "Polynomial By Binomial" and then apply each one to the test data to find the multiple labels associated with an example?
I am new to RapidMiner, so please excuse my mistakes.
Thanks in advance.
Polynomial by Binomial is not the way you want to go.
This operator performs something like one-vs-all classification. It lets you solve multiclass problems with a learner that is only capable of binomial classification.
For your problem:
Would it work to transform your table like this?
before:
ID  Label
1   A|B|C
2   B|C
to:
ID  Label
1   A
2   B
3   C
4   B
5   C
The tricky part is how to calculate the performance. But I think once this is clear, a combination of the Recall, Remember, Remove Duplicates and Join operators will do it.
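Outside RapidMiner, the table transformation itself can be sketched in a few lines of Python. This is only an illustration of the reshaping step. The function name and the extra column holding the original ID are assumptions, added so that the single-label rows can later be grouped back per example.

```python
# Input rows as (ID, pipe-separated labels), matching the "before" table.
rows = [(1, "A|B|C"), (2, "B|C")]


def explode_labels(rows, sep="|"):
    """Turn each multi-label row into one row per label.

    Returns (new_id, original_id, label) tuples; keeping original_id is an
    assumption of this sketch, useful for regrouping predictions afterwards.
    """
    out = []
    for original_id, labels in rows:
        for label in labels.split(sep):
            out.append((len(out) + 1, original_id, label))
    return out


single_label_rows = explode_labels(rows)
for row in single_label_rows:
    print(row)
```

Dropping the middle column gives the "to" table above.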
