I am confused about how and where to set the num_class parameter for multi-class classification using the XGBoost scikit-learn API.
On the docs page, there is no such parameter:
xgboost.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100,
silent=True, objective='binary:logistic', nthread=-1, gamma=0,
min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1,
colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
base_score=0.5, seed=0, missing=None)
The XGBClassifier wrapper does not require you to specify the number of classes beforehand. The labels passed to fit() determine how many (and which) classes you will be working with.
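For illustration, here is a minimal sketch (using sklearn's iris data as a stand-in, not from the original question); the three classes are picked up from the labels passed to fit():
from sklearn.datasets import load_iris
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)  # y contains three classes: 0, 1, 2

clf = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
clf.fit(X, y)  # the wrapper infers the number of classes from y here

print(clf.predict_proba(X[:5]).shape)  # (5, 3): one probability column per class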
I am using Google Colab to learn some CNNs.
I am using model.compile() to set my loss and optimizer functions.
Where do I alter the learning rate in the following code?
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Instead of passing a string, you can pass an optimizer instance to the compile method and set the learning rate on that optimizer, as shown below:
from keras import optimizers

optm = optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=optm,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
How can I run a grid search for deep learning with fastai (I use an LSTM) to tune the hyper-parameters of a Learner (https://docs.fast.ai/basic_train.html#Learner)? My code is:
from fastai.train import Learner
from fastai.train import DataBunch
model = NeuralNet(embedding_matrix, y_aux_train.shape[-1])
learn = Learner(databunch, model, loss_func=custom_loss)
I wonder how I can tune the hyper-parameters the way I would for a standard scikit-learn classifier, as described at https://scikit-learn.org/stable/modules/grid_search.html.
You can use skorch (https://github.com/skorch-dev/skorch), which allows you to run scikit-learn's grid search on a PyTorch model; you can then wrap the model in a fastai Learner.
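A rough sketch of that approach (the toy module, layer sizes, and parameter grid below are placeholders, not your LSTM):
import torch
import torch.nn as nn
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder module standing in for your model
class SimpleNet(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(20, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 2)

    def forward(self, X):
        return self.fc2(torch.relu(self.fc1(X)))

# skorch wraps the PyTorch module in a scikit-learn compatible estimator
net = NeuralNetClassifier(SimpleNet, criterion=nn.CrossEntropyLoss,
                          max_epochs=10, lr=0.01, verbose=0)

# Hyper-parameters of the module itself are addressed with the "module__" prefix
params = {
    'lr': [0.01, 0.001],
    'max_epochs': [10, 20],
    'module__hidden_dim': [32, 64],
}
gs = GridSearchCV(net, params, cv=3, scoring='accuracy')
# gs.fit(X, y)  # X: float32 features, y: int64 labels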
fsel = ske.ExtraTreesClassifier().fit(X, y)
model = SelectFromModel(fsel, prefit=True)
I am trying to train a dataset with ExtraTreesClassifier. How does SelectFromModel() decide the importance threshold, and what does it return?
As noted in the documentation for SelectFromModel:
threshold : string, float, optional default None
The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g, Lasso), the threshold used is 1e-5. Otherwise, “mean” is used by default.
In your case threshold is the default value, None, and the mean of the feature_importances_ in your ExtraTreesClassifier will be used as the threshold.
Example
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
iris = load_iris()
X, y = iris.data, iris.target
clf = ExtraTreesClassifier()
model = SelectFromModel(clf)
SelectFromModel(estimator=ExtraTreesClassifier(bootstrap=False,
class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
oob_score=False, random_state=None, verbose=0, warm_start=False),
norm_order=1, prefit=False, threshold=None)
model.fit(X, y)
print(model.threshold_)
#0.25
print(model.estimator_.feature_importances_)
#array([0.09790258, 0.02597852, 0.35586554, 0.52025336])
print(model.estimator_.feature_importances_.mean())
#0.25
As you can see the fitted model is an instance of SelectFromModel with ExtraTreesClassifier() as the estimator. The threshold is 0.25, which is also the mean of the feature importances of the fitted estimator. Based on the feature importances and threshold the model would keep only the 3rd and 4th features of the input data (those with an importance greater than the threshold). You can use the transform method of the fitted SelectFromModel() class to select these features from the input data.
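Continuing the example above, transform() keeps just those columns:
X_selected = model.transform(X)
print(X_selected.shape)
#(150, 2) given the importances shown above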
I'm working on a multi-class classification problem, where I need to get a pre-defined number of classes in the prediction set. In the code below, I'm using sklearn's OneVsRestClassifier for Logistic Regression.
classifier = Pipeline([
    ('vectorizer', CountVectorizer(stop_words='english')),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LogisticRegression()))])
classifier.fit(X_train, Y_train)
predicted = classifier.predict(X_test)
The code above works fine, and it returns a variable number of classes for each test case. I was wondering how I can make it return exactly N predicted classes for each test case.
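One possible way to do this (a sketch, assuming the classifier above has been fitted; N = 3 is an arbitrary choice): OneVsRestClassifier exposes predict_proba, so you can rank the per-class probabilities yourself and keep the N best labels per sample.
import numpy as np

N = 3  # arbitrary number of labels to keep per sample

probas = classifier.predict_proba(X_test)                 # shape (n_samples, n_classes)
top_n_idx = np.argsort(probas, axis=1)[:, -N:][:, ::-1]   # best N columns, highest first
top_n_labels = classifier.classes_[top_n_idx]             # map columns back to class labels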
I can see the predictions using sklearn's AdaBoostClassifier from the ensemble module with code like this:
from sklearn.ensemble import AdaBoostClassifier

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(y_pred)
Now I want to see the predictions of all the base estimators (i.e. the predictions of each of the 100 individual base estimators). Is that possible in sklearn, and how would I do it? Thanks in advance.
for estimator in clf.estimators_:
    print(estimator.predict(X_test))
You can also get the weight and classification error of each estimator; see the documentation.
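For example, with the fitted clf from above:
# Weight and classification error of each boosting stage
for weight, error in zip(clf.estimator_weights_, clf.estimator_errors_):
    print(weight, error)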