Weak Learners of Gradient Boosting Tree for Classification/Multiclass Classification - machine-learning

I am a beginner in the machine learning field and I want to learn how to do multiclass classification with Gradient Boosting Tree (GBT). I have read some articles about GBT, but only for regression problems, and I couldn't find the right explanation of GBT for multiclass classification. I also checked the GBT implementation in the scikit-learn machine learning library: GradientBoostingClassifier, which uses regression trees as the weak learners for multiclass classification.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.
Source: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier
The thing is, why do we use regression trees as the weak learners for GBT instead of classification trees? It would be very helpful if someone could explain why regression trees are used rather than classification trees, and how a regression tree can do classification. Thank you.

You are interpreting 'regression' too literally here (as numeric prediction), which is not the case; remember, classification is handled with logistic regression. See, for example, the entry for loss in the documentation page you have linked:
loss : {‘deviance’, ‘exponential’}, optional (default=’deviance’)
loss function to be optimized. ‘deviance’ refers to deviance (= logistic regression) for classification with probabilistic outputs. For loss ‘exponential’ gradient boosting recovers the AdaBoost algorithm.
So, a 'classification tree' is just a regression tree with loss='deviance'...
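To see this concretely, here is a minimal sketch (the iris dataset and the number of estimators are arbitrary assumptions for illustration) showing that the fitted base learners of GradientBoostingClassifier are regression trees, one per class per boosting stage; the default loss is the deviance quoted above (renamed 'log_loss' in newer scikit-learn versions):

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)  # a 3-class problem

clf = GradientBoostingClassifier(n_estimators=10).fit(X, y)

# estimators_ has shape (n_estimators, n_classes): one regression tree
# per class per boosting stage, each fit to the negative gradient of
# the multinomial deviance.
print(clf.estimators_.shape)        # (10, 3)
print(type(clf.estimators_[0, 0]))  # DecisionTreeRegressor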

Related

How does regression work in XGBoost?

How does XGBoost perform regression tasks?
For example, we know that for a classification problem in boosting, the misclassified points are penalized and given more weight in the next stump.
How is this weighting done in the case of regression?
1. Let's go through a simple regression example, using decision trees as the base predictors (of course, Gradient Boosting also works great with regression tasks). This is called Gradient Tree Boosting, or Gradient Boosted Regression Trees (GBRT).
2. First, we fit a DecisionTreeRegressor to the training set (the output is a fit to noisy quadratic data).
3. Next, we train a second regression tree on the residual errors made by the first regression tree.
4. Then we train a third regressor on the residual errors made by the second predictor.
5. Now we have an ensemble containing three trees. It can make predictions on a new instance simply by adding up the predictions of all the trees, as in the sketch below:
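Here is a minimal sketch of those five steps; the noisy quadratic dataset and max_depth=2 are assumptions chosen purely for illustration:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)  # noisy quadratic target

# Step 2: fit the first regression tree to the raw targets.
tree_reg1 = DecisionTreeRegressor(max_depth=2).fit(X, y)

# Step 3: fit a second tree to the residual errors of the first.
y2 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2).fit(X, y2)

# Step 4: fit a third tree to the residual errors of the second.
y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2).fit(X, y3)

# Step 5: the ensemble predicts by summing the trees' predictions.
X_new = np.array([[0.2]])
y_pred = sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))
print(y_pred)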

Should I use multinomial logistic regression or linear discriminant analysis?

In a classification problem, I cannot use a simple logit model if my data label (i.e., the dependent variable) has more than two categories. That leaves me with multinomial regression, Linear Discriminant Analysis (LDA), and the like. Why is multinomial logit not as popular as LDA in machine learning? What particular advantage does LDA offer?

Classification LDA vs. TFIDF

I was running multi-label classification on text data and noticed that TFIDF outperformed LDA by a large margin: TFIDF accuracy was around 50% and LDA was around 29%.
Is this expected or should LDA do better than this?
LDA is normally used for unsupervised learning, not for classification. It provides a generative model, not a discriminative one (see: What is the difference between a Generative and Discriminative Algorithm?), which makes it less suitable for classification. LDA can also be sensitive to data preprocessing and model parameters.
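As an illustration of the two setups being compared, here is a hedged sketch using a plain multi-class task rather than the poster's multi-label one; the 20 newsgroups subset, the number of topics, and the downstream classifier are all assumptions:

from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

data = fetch_20newsgroups(subset='train', categories=['sci.space', 'rec.autos'])

# Discriminative route: TF-IDF weights fed straight to the classifier.
tfidf_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Generative route: unsupervised LDA topic proportions as features.
lda_clf = make_pipeline(CountVectorizer(stop_words='english'),
                        LatentDirichletAllocation(n_components=20, random_state=0),
                        LogisticRegression(max_iter=1000))

print(cross_val_score(tfidf_clf, data.data, data.target).mean())
print(cross_val_score(lda_clf, data.data, data.target).mean())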

Difference between Generalized linear modelling and regular logistic regression

I am trying to perform logistic regression on my data and came across glm. What is the actual difference between glm and regular logistic regression?
What are its pros and cons?
Logistic regression is a special case of generalized linear models. GLMs are a class of models, parametrized by a link function; if you choose the logit link function, you get logistic regression.
The main benefit of GLMs over logistic regression is overfitting avoidance. GLMs usually try to extract a linear relationship between the input variables and the response, and thereby avoid overfitting your model. Overfitting means very good performance on training data and poor performance on test data.
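To make the "special case" relationship concrete, here is a minimal sketch using statsmodels (the toy data is an assumption for illustration): a GLM with a Binomial family and its default logit link recovers the same coefficients as a plain logistic regression.

import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
X = sm.add_constant(rng.randn(200, 2))  # intercept column + 2 features
y = (X @ np.array([0.5, 1.0, -1.0]) + rng.randn(200) > 0).astype(float)

# A GLM with the Binomial family defaults to the logit link...
glm_fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# ...and agrees with a plain logistic regression fit.
logit_fit = sm.Logit(y, X).fit(disp=0)
print(np.allclose(glm_fit.params, logit_fit.params, atol=1e-4))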

Difference between classification and regression, with SVMs

What is the exact difference between a Support Vector Machine classifier and Support Vector Machine regression?
The one-sentence answer is that an SVM classifier performs binary classification and SVM regression performs regression.
While they perform very different tasks, both are characterized by the following points:
usage of kernels
absence of local minima
sparseness of the solution
capacity control obtained by acting on the margin
number of support vectors, etc.
For SVM classification, the hinge loss is used; for SVM regression, the epsilon-insensitive loss function is used.
SVM classification is more widely used and, in my opinion, better understood than SVM regression.
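For a side-by-side feel of the two estimators, here is a minimal sketch; the synthetic data and the rbf kernel are assumptions for illustration:

import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.RandomState(0)
X = rng.randn(100, 2)

# SVC: binary classification, trained with the hinge loss.
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel='rbf').fit(X, y_class)

# SVR: regression, trained with the epsilon-insensitive loss
# (epsilon sets the width of the penalty-free tube around the fit).
y_reg = X[:, 0] ** 2 + 0.1 * rng.randn(100)
reg = SVR(kernel='rbf', epsilon=0.1).fit(X, y_reg)

# Both expose their support vectors (the sparse part of the solution).
print(clf.support_vectors_.shape, reg.support_vectors_.shape)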
