GridSearchCV auto fill of params in hyperparameter tuning - machine-learning

Is there a way to do hyperparameter tuning with GridSearchCV without defining each parameter of a classifier/regressor, something like an automatic hyperparameter tuning command? In the documentation I found ParameterGrid, but I did not fully understand what it is for.

In scikit-learn, you need to define both:
which hyperparameters you want to tune
which values or distributions you want to test for each hyperparameter
This is defined with a dictionary like param_grid = {'C': [1, 10], 'kernel': ['linear', 'rbf']}, where the keys are the hyperparameters to be tuned and the values are lists of values to be tested.
When you give this dictionary to GridSearchCV, it automatically creates a grid of all possible hyperparameter combinations, using ParameterGrid. For example:
from sklearn.model_selection import ParameterGrid
param_grid = {'C': [1, 10], 'kernel': ['linear', 'rbf']}
list(ParameterGrid(param_grid)) == (
[{'C': 1, 'kernel': 'linear'}, {'C': 1, 'kernel': 'rbf'},
{'C': 10, 'kernel': 'linear'}, {'C': 10, 'kernel': 'rbf'}])
This is the list of combinations of hyperparameters that are tested in the grid search.
See also this example about how to use GridSearchCV, or the Automatic parameter searches section of the excellent Getting-started guide.
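As a minimal sketch of how such a grid is consumed, the snippet below passes the same param_grid to GridSearchCV. The choice of SVC, the iris dataset and cv=5 are assumptions made only for illustration; they are not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same grid as in the ParameterGrid example: 2 values of C x 2 kernels = 4 candidates.
param_grid = {'C': [1, 10], 'kernel': ['linear', 'rbf']}

# Each candidate is scored with 5-fold cross-validation (cv=5 is an assumption).
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)                 # best combination found on this data
print(len(search.cv_results_['params']))   # number of candidates evaluated
```

After fitting, search.best_estimator_ holds a model refit on the full data with the best combination, so it can be used directly for predictions.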
If you don't want to define yourself which hyperparameters to tune or which values to test, you need an external definition of reasonable hyperparameters to tune that would work for any dataset. For example, you can take a look at "auto-ML" packages:
auto-sklearn
An automated machine learning toolkit and a drop-in replacement for a
scikit-learn estimator.
autoviml
Automatically Build Multiple Machine Learning Models with a Single Line of Code.
Designed as a faster way to use scikit-learn models without having to preprocess data.
TPOT
An automated machine learning toolkit that optimizes a series of scikit-learn
operators to design a machine learning pipeline, including data and feature
preprocessors as well as the estimators. Works as a drop-in replacement for a scikit-learn estimator.
Featuretools
A framework to perform automated feature engineering. It can be used for
transforming temporal and relational datasets into feature matrices for
machine learning.
Neuraxle
A library for building neat pipelines, providing the right abstractions to
both ease research, development, and deployment of machine learning
applications. Compatible with deep learning frameworks and scikit-learn API,
it can stream minibatches, use data checkpoints, build funky pipelines, and
serialize models with custom per-step savers.
EvalML
EvalML is an AutoML library which builds, optimizes, and evaluates
machine learning pipelines using domain-specific objective functions.
It incorporates multiple modeling libraries under one API, and
the objects that EvalML creates use an sklearn-compatible API.

Related

Strategies to assign specific weights to training instances

I am working on a machine learning classification model in which the user can provide labeled instances that should help improve the model.
More relevance needs to be given to the latest instances provided by the user than to the instances that were previously available for training.
In particular, I am developing my machine learning models in Python using the Sklearn libraries.
So far I've only found the strategy of oversampling particular instances as a possible solution to the problem. With this strategy I would create multiple copies of the instances to which I want to give higher relevance.
Another strategy that I've found, but which does not seem to help under these conditions, is:
Strategies that focus on giving weights to each class. This strategy is widely used in multiple libraries like Sklearn by default. However, this generalizes the idea to the class level and doesn't help me put the focus on particular instances.
I've looked for multiple strategies that might help provide specific weights for individual instances, but most have focused on class-level instead of instance-level weights.
I read some suggestions to multiply the loss function by some factors for instances in TensorFlow models, but this seems to be mostly applicable to neural network models in TensorFlow.
I wonder if anyone has information about other approaches that might help with this problem.
"I've looked for multiple strategies that might help provide specific weights for individual instances but most have focused on class level instead of instance level weights."
This is not accurate; most scikit-learn classifiers provide a sample_weight argument in their fit methods, which does exactly that. For example, here is the documentation reference for Logistic Regression:
sample_weight : array-like, shape (n_samples,) optional
Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
Similar arguments exist for most scikit-learn classifiers, e.g. decision trees, random forests etc, even for linear regression (not a classifier). Be sure to check the SVM: Weighted samples example in the docs.
The situation is roughly similar for other frameworks; see for example my own answer to Is there in PySpark a parameter equivalent to scikit-learn's sample_weight?
What's more, scikit-learn also provides a utility function to compute sample_weight in cases of imbalanced datasets: sklearn.utils.class_weight.compute_sample_weight
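A minimal sketch of instance-level weighting for the scenario in the question, assuming toy random data and a hypothetical scheme in which the most recent instances count five times as much as older ones (both the data and the factor 5 are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: 100 "old" instances followed by 20 "recent" user-provided ones.
X = rng.normal(size=(120, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hypothetical weighting scheme: recent instances count 5x as much.
sample_weight = np.ones(len(y))
sample_weight[100:] = 5.0

# sample_weight is passed to fit, one weight per training instance.
clf = LogisticRegression()
clf.fit(X, y, sample_weight=sample_weight)
```

This achieves roughly what oversampling the recent instances would, without duplicating rows in the training set.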

Is there a way of using cosine similarity instead of dot product in python (sklearn/keras)

I just started using Sklearn (MLPRegressor) and Keras (Sequential, with Dense layers).
Today I read this paper describing how using cosine similarity instead of the dot product improves the performance. This basically says that if we replace f(w^Tx) with f((w^Tx)/(|x||w|)), i.e. we don't just feed the dot product to the activation function but we normalize it, we get a better and quicker performance.
Is there a way of doing this in Python, specifically in MLPRegressor in SKlearn (or another), or in Keras? (maybe TensorFlow?)
Sklearn uses prebuilt networks, so no. I also don't think it's possible in Keras, as it has prebuilt layers.
It sure can be implemented in Tensorflow though. Note that in TF you can explicitly define layers.
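For example, in this snippet you'd need to add normalization in line 25: rather than applying tf.nn.sigmoid to tf.matmul(X, w_1) directly, divide the product by the norms of the input rows and of the weight columns before the activation (equivalently, normalize X and w_1 first, which you can do using tf.nn.l2_normalize, with dim=1 for the rows of X).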
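To make the formula f((w^Tx)/(|x||w|)) concrete, here is a framework-independent sketch in NumPy rather than TensorFlow. The function name cosine_dense, the sigmoid activation, the eps guard and the toy shapes are all assumptions chosen for illustration, not the paper's reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cosine_dense(X, W, eps=1e-12):
    """Dense layer whose pre-activation is cos(x, w) rather than w^T x."""
    # Scale every input row and every weight column to unit L2 norm.
    X_unit = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    W_unit = W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)
    # The product of unit vectors is the cosine similarity, in [-1, 1].
    return sigmoid(X_unit @ W_unit)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # 4 samples, 3 input features
W = rng.normal(size=(3, 2))   # 3 inputs -> 2 hidden units

out = cosine_dense(X, W)
print(out.shape)  # (4, 2)
```

Because of the normalization, rescaling the inputs or the weights leaves the layer output unchanged, which is the property the question is after.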

What kind of feature extractor is used in vowpal wabbit?

In sklearn, when we pass sentences to algorithms, we can use text feature extractors like CountVectorizer, TfidfVectorizer, etc., and we get an array of floats.
But what do we get when we pass vowpal wabbit an input file like this one:
-1 |Words The sun is blue
1 |Words The sun is yellow
What does vowpal wabbit use internally? How is this text transformed?
There are two separate questions here:
Q1: Why can't you (and shouldn't you) use transformations like tf-idf when using vowpal wabbit?
A1: vowpal wabbit is not a batch learning system, it is an online-learning system. In order to compute measures like tf-idf (term frequency in each document vs the whole corpus) you need to see all the data (corpus) first, and sometimes do multiple passes over the data. vowpal wabbit, as an online/incremental learning system, is designed to also work on problems where you don't have the full data ahead of time. See this answer for a lot more details.
Q2: How does vowpal wabbit "transform" the features it sees?
A2: It doesn't. It simply maps each word feature on-the-fly to its hashed location in memory. The online learning step is driven by a repetitive optimization loop (SGD or BFGS) example by example, to minimize the modeling error. You may select the loss function to optimize for.
However, if you already have the full data you want to train on, nothing prevents you from transforming it (using any other tool) before feeding the transformed values to vowpal wabbit. It's your choice. Depending on the particular data, you may get better or worse results using a transformation pre-pass than by running multiple passes with vowpal wabbit itself without preliminary transformations (check out the vw --passes option).
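The idea of mapping each word to a hashed location can be sketched in a few lines of Python. This is only an illustration of the hashing trick: zlib.crc32 is a stand-in and not the hash function vw uses, the 18-bit table size is an assumption, and the sketch ignores namespaces such as Words.

```python
import zlib

def hash_features(words, num_bits=18):
    """Map each word to a slot in a table of 2**num_bits weights."""
    mask = (1 << num_bits) - 1
    # zlib.crc32 is only a stand-in here; it is not the hash vw uses.
    return [zlib.crc32(w.encode("utf-8")) & mask for w in words]

# Each word becomes an index into the weight table; no vocabulary is stored.
indices = hash_features("The sun is blue".split())
print(indices)
```

The same word always lands in the same slot, so the learner can update its weight example by example without ever seeing the full corpus.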
To complete the answer, let's add another related question:
Q3: Can I use pre-transformed (e.g. tf-idf) data with vowpal wabbit?
A3: Yes, you can. Just use the following (post-transformation) form. Instead of words, use integers as feature IDs, and since any feature can have an optional explicit weight, use the tf-idf floating point values as weights, following the : separator as in the typical SVMlight format:
-1 | 1:0.534 15:0.123 3:0.27 29:0.066 ...
1 | 3:0.1 102:0.004 24:0.0304 ...
This works because vw has a nice feature of distinguishing between string and integer features. It doesn't hash feature names that look like integers (unless you use the --hash_all option explicitly). Integer feature numbers are used directly, as if they were the hash result of the feature.
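A small helper for producing lines in this form might look as follows. The function name to_vw_line and the dictionary input are hypothetical; the tf-idf values themselves would come from whatever tool you used for the pre-pass.

```python
def to_vw_line(label, features):
    """Format one example as '<label> | id:weight id:weight ...'."""
    # features maps integer feature IDs to their (e.g. tf-idf) weights.
    pairs = " ".join(f"{idx}:{weight:g}" for idx, weight in sorted(features.items()))
    return f"{label} | {pairs}"

line = to_vw_line(-1, {1: 0.534, 15: 0.123, 3: 0.27, 29: 0.066})
print(line)  # -1 | 1:0.534 3:0.27 15:0.123 29:0.066
```

Sorting by feature ID is only for readability in this sketch.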

What is the difference between Hashing vectorizer and Count vectorizer, when each to be used?

I am trying various SVM variants in scikit-learn along with CountVectorizer and HashingVectorizer. Different examples use fit or fit_transform, and I am confused about which should be used when.
Any clarification would be much appreciated.
They serve a similar purpose. The documentation provides some pros and cons for the HashingVectorizer:
This strategy has several advantages:
it is very low memory scalable to large datasets as there is no need to store a vocabulary dictionary in memory
it is fast to pickle and un-pickle as it holds no state besides the constructor parameters
it can be used in a streaming (partial fit) or parallel pipeline as there is no state computed during fit.
There are also a couple of cons (vs using a CountVectorizer with an in-memory vocabulary):
there is no way to compute the inverse transform (from feature indices to string feature names) which can be a problem when trying to introspect which features are most important to a model.
there can be collisions: distinct tokens can be mapped to the same feature index. However in practice this is rarely an issue if n_features is large enough (e.g. 2 ** 18 for text classification problems).
no IDF weighting as this would render the transformer stateful.
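Regarding fit versus fit_transform, the difference in statefulness can be seen in a short sketch. The two toy documents are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

docs = ["The sun is blue", "The sun is yellow"]

# CountVectorizer is stateful: fit learns the vocabulary, transform applies it.
# fit_transform does both in one call on the training documents.
count_vec = CountVectorizer()
X_count = count_vec.fit_transform(docs)
print(sorted(count_vec.vocabulary_))  # ['blue', 'is', 'sun', 'the', 'yellow']

# HashingVectorizer is stateless: transform works without a prior fit.
hash_vec = HashingVectorizer(n_features=2 ** 18)
X_hash = hash_vec.transform(docs)
print(X_count.shape, X_hash.shape)    # (2, 5) (2, 262144)
```

With CountVectorizer, call fit_transform on the training data and only transform on new data, so that new documents are mapped onto the vocabulary learned during training.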

10 fold cross validation with weka api

How can I build a classification model with 10-fold cross-validation using the Weka API?
Should I cross-validate the model first, e.g.
evaluation.crossValidateModel(classifier, trainingSet, 10, new Random(1))
and then build a new classifier on this training set, e.g.
NaiveBayes nb2 = new NaiveBayes();
nb2.buildClassifier(train);
and then save and use this model (nb2)?
You are mixing concepts. Cross-validation is used to estimate the performance of learning techniques on a dataset. The common procedure is to perform CV on the whole dataset, usually with 10 folds. You can then see which learning technique obtains the best performance, and use that technique to learn a model on the whole dataset for future predictions.
http://en.wikipedia.org/wiki/Cross-validation_(statistics)
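The two-step procedure can be sketched as follows. Note that this uses scikit-learn in Python rather than the Weka API from the question, with GaussianNB and the iris dataset as stand-ins chosen only for illustration; the Weka calls in the question play the same roles.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Step 1: estimate how well the technique performs with 10-fold CV.
# The models trained inside the folds are discarded afterwards.
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(scores.mean())

# Step 2: train the model you will actually save and use on the whole dataset.
final_model = GaussianNB().fit(X, y)
```

The cross-validation score describes the expected performance of the technique; the final model is a separate fit on all available data.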
