How to do grid search on a fastai Learner? - machine-learning

How can I run a grid search on a fastai deep learning model (I use an LSTM) to tune the hyperparameters of the Learner (https://docs.fast.ai/basic_train.html#Learner)? My code is:
from fastai.train import Learner
from fastai.train import DataBunch

# build the model and wrap it in a fastai Learner with a custom loss
model = NeuralNet(embedding_matrix, y_aux_train.shape[-1])
learn = Learner(databunch, model, loss_func=custom_loss)
I wonder how I can tune the hyperparameters the way I would for a standard sklearn classifier with grid search (https://scikit-learn.org/stable/modules/grid_search.html).

You can use skorch (https://github.com/skorch-dev/skorch), which lets you run sklearn's grid search on a PyTorch model; the tuned model can then be wrapped in a fastai Learner.
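A minimal sketch of the skorch route, assuming a toy feed-forward module in place of the asker's LSTM (SimpleModule, the random data, and the parameter grid below are all placeholders):

import numpy as np
import torch.nn as nn
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV

class SimpleModule(nn.Module):
    # placeholder module; the same pattern applies to an LSTM
    def __init__(self, num_units=10):
        super().__init__()
        self.dense = nn.Linear(20, num_units)
        self.relu = nn.ReLU()
        self.output = nn.Linear(num_units, 2)

    def forward(self, X):
        return self.output(self.relu(self.dense(X)))

# toy data
X = np.random.randn(100, 20).astype(np.float32)
y = np.random.randint(0, 2, 100).astype(np.int64)

# skorch wraps the module into an sklearn-compatible estimator
net = NeuralNetClassifier(SimpleModule, criterion=nn.CrossEntropyLoss,
                          max_epochs=5, lr=0.1, verbose=0)

# module__<name> reaches into the module's __init__ arguments
params = {'lr': [0.01, 0.1], 'module__num_units': [10, 20]}
gs = GridSearchCV(net, params, cv=3, scoring='accuracy', refit=False)
gs.fit(X, y)
print(gs.best_params_, gs.best_score_)

Once the grid search has found good hyperparameters, you can rebuild the PyTorch module with them and pass it to Learner as in the original code.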

Related

Linear regression with SGD using pyspark.ml.regression

I'm using the LinearRegression model in Spark ML for prediction.
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

featureassembler = VectorAssembler(
    inputCols=['Year', 'Present_Price', 'Kms_Driven', 'Owner'],
    outputCol='features')
output = featureassembler.transform(df)
data = output.select('features', 'Selling_Price')

# Initializing a Linear Regression model
ss = LinearRegression(featuresCol='features', labelCol='Selling_Price')
I want to test linear regression with SGD (stochastic gradient descent), but pyspark.ml does not provide a LinearRegressionWithSGD the way mllib does. Also, when I looked at mllib's LinearRegressionWithSGD, I found that it has been deprecated since version 2.0.0.
How can I use ml for linear regression with SGD? Is there any parameter I can use for that?
Instead of ml you can use mllib:
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel
Here is the documentation:
https://spark.apache.org/docs/1.6.1/mllib-linear-methods.html
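A minimal sketch of the RDD-based mllib API with toy data (note that, as the question says, LinearRegressionWithSGD has been deprecated since Spark 2.0.0, so expect a deprecation warning):

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

sc = SparkContext.getOrCreate()

# toy training data: LabeledPoint(label, features)
points = [
    LabeledPoint(1.0, [1.0, 0.0]),
    LabeledPoint(2.0, [2.0, 1.0]),
    LabeledPoint(3.0, [3.0, 2.0]),
]
rdd = sc.parallelize(points)

# train with stochastic gradient descent; iterations and step
# (the learning rate) are the main knobs
model = LinearRegressionWithSGD.train(rdd, iterations=100, step=0.01)
print(model.predict([4.0, 3.0]))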

The workflow of using cross_val_score to predict

I want to confirm the workflow of using cross_val_score in sklearn to predict.
1. Initialize the estimator, e.g. a logistic regression model:
model = LogisticRegression(solver="lbfgs", max_iter=1000)
2. Call cross_val_score with the model, selected_X, selected_Y, and cv:
scores = cross_val_score(model, selected_X, selected_Y, cv=10)
3. Check the scores' mean and std to see if they are acceptable; if not, adjust the model parameters in step 1. If yes, fit the model by calling:
model.fit(selected_X, selected_Y)
4. Finally, use the model to predict new data:
predict_Y = model.predict(predict_X)
Please let me know if my understanding is correct.
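For reference, here is the described workflow as one runnable sketch, using the iris dataset as a stand-in for selected_X/selected_Y and reusing a few rows as a stand-in for predict_X:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

selected_X, selected_Y = load_iris(return_X_y=True)

# step 1: initialize the estimator
model = LogisticRegression(solver="lbfgs", max_iter=1000)

# step 2: cross-validate
scores = cross_val_score(model, selected_X, selected_Y, cv=10)

# step 3: inspect mean/std, then fit on all the data if acceptable
print(scores.mean(), scores.std())
model.fit(selected_X, selected_Y)

# step 4: predict (first five rows stand in for predict_X)
predict_Y = model.predict(selected_X[:5])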

Does Scikit-learn support transfer learning?

Does Scikit-learn support transfer learning? Please check the following code.
Model clf is obtained by fit(X, y).
Can model clf2 learn on the basis of clf, i.e. transfer-learn by fit(X2, y2)?
>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> X, y = ...
>>> clf.fit(X, y)
SVC()
>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.fit(X2,y2)
>>> clf2.predict(X[0:1])
In the context of scikit-learn there is no transfer learning as such; there is incremental learning (also called continuous or online learning).
Looking at your code, whatever you're intending to do won't work the way you think. From the scikit-learn documentation:
Calling fit() more than once will overwrite what was learned by any
previous fit()
This means that calling fit() more than once on the same model simply overwrites all the previously fitted coefficients, weights, intercept (bias), etc.
However, if you want to fit a portion of your data set and then improve your model by fitting new data, look for estimators that implement the partial_fit API:
If we call partial_fit() multiple times, the framework will update the
existing weights instead of re-initialising them.
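A minimal sketch of partial_fit with SGDClassifier (one of the estimators that implements it), on random stand-in data:

import numpy as np
from sklearn.linear_model import SGDClassifier

# two batches of stand-in data
X1, y1 = np.random.randn(100, 4), np.random.randint(0, 2, 100)
X2, y2 = np.random.randn(100, 4), np.random.randint(0, 2, 100)

clf = SGDClassifier()
# the first call must be told all the classes up front
clf.partial_fit(X1, y1, classes=np.array([0, 1]))
# later calls update the existing weights instead of re-initialising them
clf.partial_fit(X2, y2)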
Another way to do incremental learning with scikit-learn is to look for algorithms that support the warm_start parameter. From the docs:
warm_start: bool, default=False
When set to True, reuse the solution of
the previous call to fit() as initialization, otherwise, just erase the
previous solution. Useless for liblinear solver.
Another estimator that supports it is RandomForestRegressor.
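A sketch of warm_start with RandomForestRegressor: with warm_start=True, raising n_estimators and calling fit() again grows new trees on top of the existing ones instead of starting over:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# stand-in data
X, y = np.random.randn(200, 5), np.random.randn(200)

reg = RandomForestRegressor(n_estimators=50, warm_start=True)
reg.fit(X, y)                      # fits the first 50 trees
reg.set_params(n_estimators=100)
reg.fit(X, y)                      # adds 50 more trees, keeping the old ones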

Can I use PCA for dimensionality reduction and then use its output for a one-class SVM classifier in Python?

I want to use PCA for dimensionality reduction and then use its output for a one-class SVM classifier in Python. My training data set is of size 16000x60. Also, how can I map the principal components back to the original columns to use them in the SVM, or can I use the principal components directly?
It is unclear what the problem is and what you have tried already. Of course you can. You can either add the PCA output to your original feature set or use the output on its own. I encourage you to use sklearn pipelines.
Simple example:
from sklearn import decomposition, datasets
from sklearn.pipeline import Pipeline
from sklearn import svm

# chain PCA and the SVM into a single estimator
svc = svm.SVC()
pca = decomposition.PCA()
pipe = Pipeline(steps=[('pca', pca), ('svc', svc)])

digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target

pipe.fit(X_digits, y_digits)
print(pipe.score(X_digits, y_digits))
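Since the question asks about a one-class SVM specifically, the same pipeline pattern works with OneClassSVM; a sketch (the n_components and nu values here are arbitrary), noting that one-class SVM is unsupervised, so fit takes no labels:

from sklearn import decomposition, datasets, svm
from sklearn.pipeline import Pipeline

digits = datasets.load_digits()

pipe = Pipeline(steps=[('pca', decomposition.PCA(n_components=10)),
                       ('ocsvm', svm.OneClassSVM(nu=0.1))])
pipe.fit(digits.data)              # one-class: no labels
preds = pipe.predict(digits.data)  # +1 for inliers, -1 for outliers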

Using Scorer Object for Classifier Score Method

I have written a custom scorer object, which is necessary for my problem, called p_value_scoring_object.
The function sklearn.cross_validation.cross_val_score has a scoring parameter, which allows me to use this scorer object.
However, this option is not available for the score method of a classifier. Is sklearn just lacking that feature, or is there a way around it?
from sklearn.datasets import load_iris
from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in newer versions
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
cross_val_score(clf, iris.data, iris.target, cv=10, scoring=p_value_scoring_object)
This works. However, this doesn't:
clf.fit(iris.data, iris.target)
clf.score(iris.data, iris.target, scoring=p_value_scoring_object)
sklearn is just lacking that feature. score is internally bound to a fixed metric for each type of estimator: classifiers are bound to classification accuracy and regressors to r2_score.
You can look at these bindings in sklearn.base; every mixin (for example ClassifierMixin) provides this score method.
Instead of this you can just call the scorer directly, since an sklearn scorer is a callable taking (estimator, X, y):
p_value_scoring_object(clf, iris.data, iris.target)
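A runnable sketch of that pattern, using make_scorer around f1_score as a stand-in for the custom p_value_scoring_object:

from sklearn.datasets import load_iris
from sklearn.metrics import make_scorer, f1_score
from sklearn.tree import DecisionTreeClassifier

# stand-in for the custom scorer; any scorer is a callable (estimator, X, y)
scoring_object = make_scorer(f1_score, average='macro')

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
print(scoring_object(clf, iris.data, iris.target))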
