How does regression work in XGBoost? - machine-learning

How does XGBoost perform regression tasks?
E.g., we know that for a classification problem in boosting, misclassified points are penalized and given more weight in the next stump.
How are the weights assigned in the case of regression?

1. Let's go through a simple regression example using decision trees as the base predictors (gradient boosting also works well for regression tasks). This is called Gradient Tree Boosting, or Gradient Boosted Regression Trees (GBRT).
2. First, fit a DecisionTreeRegressor to the training set (here, a noisy quadratic dataset).
3. Next, train a second regression tree on the residual errors made by the first tree. So instead of reweighting points, each new tree is fit to the errors that remain.
4. Then train a third regressor on the residual errors made by the second predictor.
5. Now we have an ensemble containing three trees. It can make predictions on a new instance simply by adding up the predictions of all the trees.
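The five steps above can be sketched in plain Python. This is only an illustration under assumptions: `fit_stump` is a hypothetical depth-1 regression tree written for this example (not scikit-learn's DecisionTreeRegressor), the noisy quadratic data is made up, and XGBoost itself adds regularisation and second-order gradient information that is not shown here.

```python
import random

def fit_stump(xs, ys):
    """Hypothetical depth-1 regression tree: one threshold, one mean per side."""
    best = None
    for t in sorted(set(xs))[1:]:  # candidate thresholds (both sides stay non-empty)
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

random.seed(0)
X = [i / 50 - 0.5 for i in range(50)]                # 50 points in [-0.5, 0.5)
y = [3 * x ** 2 + random.gauss(0, 0.05) for x in X]  # made-up noisy quadratic

trees, residual = [], list(y)
for _ in range(3):
    tree = fit_stump(X, residual)                    # fit to what is still unexplained
    trees.append(tree)
    residual = [r - tree(x) for x, r in zip(X, residual)]

def predict(x):
    """Ensemble prediction: add up the predictions of all the trees."""
    return sum(tree(x) for tree in trees)

mse_first = sum((yi - trees[0](xi)) ** 2 for xi, yi in zip(X, y)) / len(X)
mse_ensemble = sum((yi - predict(xi)) ** 2 for xi, yi in zip(X, y)) / len(X)
print(mse_first, mse_ensemble)
```

For squared error the residual is exactly the negative gradient of the loss, which is why "fit the next tree to the residuals" and "gradient boosting" describe the same procedure in the regression case.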

Related

How do Gradient Boosted Trees calculate errors in classification?

I understand how gradient boosting works for regression, where we build the next model on the residual error of the previous model. If we use linear regression, for example, the residual error becomes the target of the next model, and all the models are summed at the end to get a strong learner.
But how is this done in gradient boosted classification trees? Let's say we have a binary classification model with outcome 0/1. What is the residual error for the next model to be trained on? And how is it calculated, because it will not be y minus y predicted as is the case in linear regression.
I am really stuck on this one! The error of one binary classification tree is the points it misclassifies, so is the target for the next model the misclassified points only?
Binary classification can be posed as a regression problem of predicting a probability, such as P(y=1 | x), where y is the class label. For this to work, use log-loss (logistic loss) instead of squared loss. The "residual" each new tree is trained on is then the negative gradient of that loss, which for log-loss is y minus the predicted probability.
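As a small sketch of what that target looks like: with log-loss and a model that outputs log-odds F(x), the negative gradient with respect to F is y - p, where p = sigmoid(F). The labels below are made up, and starting from the overall log-odds is an assumption of this example (a common initialisation, not something stated in the answer).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

y = [1, 0, 1, 1, 0]                        # made-up binary labels

# Initial model: a constant equal to the log-odds of the positive class.
p0 = sum(y) / len(y)
F = [math.log(p0 / (1 - p0))] * len(y)

# Pseudo-residuals = negative gradient of log-loss w.r.t. F = y - p.
p = [sigmoid(f) for f in F]
residuals = [yi - pi for yi, pi in zip(y, p)]
print(residuals)
```

The next regression tree is fit to `residuals` for every point, not only to the misclassified ones; points the model already gets confidently right simply have residuals close to zero.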

What does the KNN algorithm do in the training phase?

Unlike other algorithms such as linear regression, KNN doesn't seem to perform any calculation in the training phase. Linear regression, for instance, finds its coefficients in the training phase. But what about KNN?
During the training phase, KNN arranges the data (a sort of indexing process) so that the closest neighbors can be found efficiently during inference. Otherwise, it would have to compare each new case with the whole dataset at inference time, making it quite inefficient.
You can read more about it at: https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms
KNN belongs to the group of lazy learners. As opposed to eager learners such as logistic regression, SVMs, and neural nets, lazy learners just store the training data in memory. Then, during inference, they find the K nearest neighbours from the training data in order to classify the new instance.
KNN is an instance-based method that relies completely on the training examples; in other words, it memorizes all of them. For classification, whenever a new example appears, it computes the Euclidean distance between the input and all the training examples and returns the majority label among the K closest ones (for K = 1, the label of the single closest example).
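A minimal brute-force sketch of that behaviour, assuming Euclidean distance and majority voting. `TinyKNN` and the toy points are hypothetical names and data for this example; scikit-learn's KNeighborsClassifier can additionally build tree indexes, which is not shown.

```python
import math
from collections import Counter

class TinyKNN:
    """Hypothetical brute-force KNN classifier for illustration only."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorizes the data; nothing is computed here.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # All the work happens at prediction time: distance to every stored point.
        dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))
        votes = Counter(label for _, label in dists[:self.k])
        return votes.most_common(1)[0][0]

knn = TinyKNN(k=3).fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["a", "a", "a", "b", "b", "b"],
)
print(knn.predict_one((0.2, 0.1)), knn.predict_one((5.5, 5.5)))
```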
KNN is a lazy learner. This means that, unlike other algorithms that learn in their training phase (linear regression, etc.), KNN does not learn anything during training. It just stores the data points in memory at training time.
"Like in case of linear regressions it finds the coefficients in the training phase. But what about KNN?" --> KNN has no coefficients to fit. Its hyperparameters (the value of K, the distance metric, etc.) are chosen by evaluating predictions on held-out data.
Unlike other algorithms, which learn in the training phase and are tested in the testing phase, KNN does its work at prediction time, and K and the distance metric are typically tuned with K-fold cross-validation.
Distance calculation->https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms
KNN python docs->https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

Weak Learners of Gradient Boosting Tree for Classification/ Multiclass Classification

I am a beginner in the machine learning field and I want to learn how to do multiclass classification with Gradient Boosting Trees (GBT). I have read some articles about GBT, but they covered regression problems, and I couldn't find a good explanation of GBT for multiclass classification. I also checked GBT in the scikit-learn library. The implementation is GradientBoostingClassifier, which uses regression trees as the weak learners for multiclass classification.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.
Source : http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier
The thing is, why do we use regression trees as the learners for GBT instead of classification trees? It would be very helpful if someone could explain why regression trees are used rather than classification trees, and how a regression tree can do classification. Thank you.
You are interpreting 'regression' too literally here (as numeric prediction), which is not the case; remember, classification is handled with logistic regression. See, for example, the entry for loss in the documentation page you have linked:
loss : {‘deviance’, ‘exponential’}, optional (default=’deviance’)
loss function to be optimized. ‘deviance’ refers to deviance (= logistic regression) for classification with probabilistic outputs. For loss ‘exponential’ gradient boosting recovers the AdaBoost algorithm.
So, a 'classification tree' is just a regression tree with loss='deviance'...
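To make the "n_classes_ regression trees are fit on the negative gradient" part concrete, here is a sketch of the targets those trees would see in the first stage. It assumes the multinomial case with softmax probabilities and raw scores initialised to zero; the labels and the zero initialisation are assumptions of this example, not taken from the scikit-learn source.

```python
import math

def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

n_classes = 3
labels = [0, 2, 1, 2]                        # made-up class labels
F = [[0.0] * n_classes for _ in labels]      # raw score per sample per class

# One list of regression targets per class: indicator(label == k) - p_k.
targets_per_class = []
for k in range(n_classes):
    targets_per_class.append([
        (1.0 if label == k else 0.0) - softmax(scores)[k]
        for label, scores in zip(labels, F)
    ])

for k, targets in enumerate(targets_per_class):
    print(k, targets)
```

The targets are real numbers such as 0.667 or -0.333 rather than class labels, which is why the weak learners have to be regression trees even though the overall task is classification.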

Transfer Learning and linear classifier

In cs231n handout here, it says
New dataset is small and similar to original dataset. Since the data
is small, it is not a good idea to fine-tune the ConvNet due to
overfitting concerns... Hence, the best idea might be to train a
linear classifier on the CNN codes.
I'm not sure what linear classifier means. Does the linear classifier refer to the last fully connected layer? (For example, in AlexNet, there are three fully connected layers. Is the linear classifier the last fully connected layer?)
Usually when people say "linear classifier" they refer to a linear SVM (support vector machine). A linear classifier learns a weight vector w and a threshold (aka "bias") b such that for each example x the sign of
<w, x> + b
is positive for the "positive" class and negative for the "negative" class.
The last (usually fully connected) layer of a neural-net can be considered as a form of a linear classifier.
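As a sketch, the decision rule above is just a dot product plus a bias. The values of `w` and `b` below are hypothetical stand-ins for learned parameters, and `x` would be the CNN code (feature vector) of an image.

```python
def linear_classify(w, b, x):
    """Return +1 or -1 from the sign of <w, x> + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

w, b = [2.0, -1.0], -0.5      # hypothetical learned weight vector and bias

print(linear_classify(w, b, [1.0, 0.0]))   # score = 1.5
print(linear_classify(w, b, [0.0, 1.0]))   # score = -1.5
```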

What is the difference between linear regression and logistic regression? [closed]

When we have to predict the value of a categorical (or discrete) outcome we use logistic regression. I believe we use linear regression to also predict the value of an outcome given the input values.
Then, what is the difference between the two methodologies?
Linear regression output as probabilities
It's tempting to use the linear regression output as probabilities but it's a mistake because the output can be negative, and greater than 1 whereas probability can not. As regression might actually
produce probabilities that could be less than 0, or even bigger than
1, logistic regression was introduced.
Source: http://gerardnico.com/wiki/data_mining/simple_logistic_regression
Outcome
In linear regression, the outcome (dependent variable) is continuous.
It can have any one of an infinite number of possible values.
In logistic regression, the outcome (dependent variable) has only a limited number of possible values.
The dependent variable
Logistic regression is used when the response variable is categorical in nature. For instance, yes/no, true/false, red/green/blue,
1st/2nd/3rd/4th, etc.
Linear regression is used when your response variable is continuous. For instance, weight, height, number of hours, etc.
Equation
Linear regression gives an equation of the form Y = mX + C,
i.e. an equation of degree 1.
However, logistic regression gives an equation of the form
Y = e^X / (1 + e^X), equivalently Y = 1 / (1 + e^-X)
Coefficient interpretation
In linear regression, the interpretation of the coefficients of the independent variables is quite straightforward (i.e. holding all other variables constant, with a unit increase in this variable, the dependent variable is expected to increase/decrease by xxx).
However, in logistic regression and other generalized linear models, the interpretation depends on the family (binomial, Poisson,
etc.) and link (log, logit, inverse-log, etc.) you use. With the logit link, a coefficient is the change in the log-odds per unit increase in the variable.
Error minimization technique
Linear regression uses ordinary least squares method to minimise the
errors and arrive at a best possible fit, while logistic regression
uses maximum likelihood method to arrive at the solution.
Linear regression is usually solved by minimizing the least squares error of the model to the data, therefore large errors are penalized quadratically.
Logistic regression is just the opposite. Using the logistic loss function causes large errors to be penalized to an asymptotic constant.
Consider linear regression on categorical {0, 1} outcomes to see why this is a problem. If your model predicts the outcome is 38 when the truth is 1, you've lost nothing. Linear regression would try to reduce that 38; logistic wouldn't (as much).
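The 38-versus-1 example can be checked numerically. In this sketch the 38 is treated as the direct prediction for squared loss and as the raw score (before the sigmoid) for logistic loss; that reading is an assumption made for the example.

```python
import math

def squared_loss(y, z):
    return (y - z) ** 2

def logistic_loss(y, z):
    """Log-loss for label y in {0, 1} and raw score z, computed stably."""
    softplus_neg_z = math.log1p(math.exp(-abs(z))) + max(-z, 0.0)  # log(1 + e^-z)
    return softplus_neg_z if y == 1 else softplus_neg_z + z

print(squared_loss(1, 38))    # (1 - 38)^2: a large penalty for being "too right"
print(logistic_loss(1, 38))   # essentially zero: a confident, correct prediction
print(logistic_loss(0, 38))   # roughly 38: a confident, wrong prediction
```

So least squares keeps pulling a far-but-correct prediction back toward 1, while the logistic loss leaves it alone and only penalizes scores on the wrong side.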
In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values.
For instance, if X contains the area in square feet of houses, and Y contains the corresponding sale price of those houses, you could use linear regression to predict selling price as a function of house size. While the selling price cannot literally take any value, there are so many possible values that a linear regression model would be chosen.
If, instead, you wanted to predict, based on size, whether a house would sell for more than $200K, you would use logistic regression. The possible outputs are either Yes, the house will sell for more than $200K, or No, the house will not.
Just to add on the previous answers.
Linear regression
Is meant to solve the problem of predicting/estimating the output value for a given element X (say f(x)). The result of the prediction is a continuous function whose values may be positive or negative. In this case you normally have an input dataset with lots of examples and the output value for each one of them. The goal is to fit a model to this dataset so that you can predict the output for new, never-seen elements. The classical example is fitting a line to a set of points, but in general linear regression can be used to fit more complex models (using higher polynomial degrees).
Resolving the problem
Linear regression can be solved in two different ways:
Normal equation (direct way to solve the problem)
Gradient descent (Iterative approach)
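Both routes can be compared on a tiny single-feature problem. This is a sketch with made-up data: the closed form below is what the normal equation reduces to for one feature plus an intercept, and the learning rate and iteration count are hand-picked assumptions for this dataset.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]      # made-up data, roughly y = 2x + 1
n = len(xs)

# 1) Direct solution (normal equation specialised to y = m*x + c).
x_mean, y_mean = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

# 2) Gradient descent on the mean squared error.
m, c, lr = 0.0, 0.0, 0.02
for _ in range(20000):
    grad_m = sum(2 * (m * x + c - y) * x for x, y in zip(xs, ys)) / n
    grad_c = sum(2 * (m * x + c - y) for x, y in zip(xs, ys)) / n
    m -= lr * grad_m
    c -= lr * grad_c

print(slope, intercept)
print(m, c)
```

The two approaches land on the same line; the direct solution is exact in one step, while gradient descent scales better when there are many features.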
Logistic regression
Is meant to solve classification problems where, given an element, you have to classify it into one of N categories. Typical examples are classifying a mail as spam or not, or finding which category a vehicle belongs to (car, truck, van, etc.). Basically, the output is a finite set of discrete values.
Resolving the problem
Logistic regression has no closed-form solution, so it is solved iteratively, for example with gradient descent. The formulation is very similar to linear regression; the only difference is the hypothesis function. In linear regression the hypothesis has the form:
h(x) = theta_0 + theta_1*x_1 + theta_2*x_2 ..
where theta is the model we are trying to fit and [1, x_1, x_2, ..] is the input vector. In logistic regression the hypothesis passes that linear combination through a different function:
g(x) = 1 / (1 + e^-x)
This function has a nice property: it maps any value to the range (0, 1), which is appropriate for handling probabilities during classification. For example, in binary classification g(x) can be interpreted as the probability of belonging to the positive class. The classes are then separated by a decision boundary, which is basically a curve that decides the separation between the different classes.
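A short sketch of that hypothesis: the linear part h(x) is computed first, then squashed by g, and a threshold turns the probability into a class. The `theta` values and the 0.5 threshold are hypothetical choices for this example.

```python
import math

def g(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    # theta[0] is the intercept; the rest pair up with the features in x.
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return g(z)

def predict_class(theta, x, threshold=0.5):
    return 1 if predict_proba(theta, x) >= threshold else 0

theta = [-4.0, 1.0]   # hypothetical fitted parameters; boundary at x_1 = 4

print(predict_proba(theta, [6.0]), predict_class(theta, [6.0]))
print(predict_proba(theta, [2.0]), predict_class(theta, [2.0]))
```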
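You can also use the code below to fit a linear regression and inspect it:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# details_df (features) and q3_rechange_delay_df (target) are assumed to
# already exist in your session.
q_df = details_df
# q_df = pd.get_dummies(q_df)
# One-hot encode the categorical columns "1".."9" (as floats, for statsmodels).
q_df = pd.get_dummies(
    q_df, columns=["1", "2", "3", "4", "5", "6", "7", "8", "9"], dtype=float)
# Note: get_dummies replaces each listed column with "<col>_<value>" indicator
# columns, so the names used below must match the columns actually present.
q_1_df = q_df["1"]
q_df = q_df.drop(["2", "3", "4", "5"], axis=1)

# Add an intercept column, then split into train and test sets.
x = sm.add_constant(q_df)
train_x, test_x, train_y, test_y = train_test_split(
    x, q3_rechange_delay_df, test_size=0.2, random_state=123)

# Fit ordinary least squares and inspect the results.
lmod = sm.OLS(train_y, train_x).fit()
print(lmod.summary())
print(lmod.predict()[:10])  # fitted values for the first 10 training rows
print(lmod.get_prediction().summary_frame()[:10])

# Q-Q plot of the residuals.
sm.qqplot(lmod.resid, line="q")
plt.title("Q-Q plot of Standardized Residuals")
plt.show()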
Simply put, linear regression is a regression algorithm, which outputs a continuous value from an unbounded range; logistic regression is a binary classifier, which outputs the 'probability' of the input belonging to a label (0 or 1).
The basic difference :
Linear regression is basically a regression model, which means it gives a non-discrete (continuous) output of a function. So this approach gives the value. For example: given x, what is f(x)?
For example, given a training set of different factors and property prices, after training we can provide the required factors to determine what the property price will be.
Logistic regression is basically a binary classification algorithm, which means the function has a discrete-valued output. For example: for a given x, if f(x) > threshold classify it as 1, else classify it as 0.
For example, given a set of brain tumour sizes as training data, we can use the size as input to determine whether it's a benign or malignant tumour. Therefore the output here is discrete, either 0 or 1.
*here the function is basically the hypothesis function
They are both quite similar in solving for the solution, but as others have said, one (Logistic Regression) is for predicting a category "fit" (Y/N or 1/0), and the other (Linear Regression) is for predicting a value.
So if you want to predict if you have cancer Y/N (or a probability) - use logistic. If you want to know how many years you will live to - use Linear Regression !
Regression means predicting a continuous variable; linear means there is a linear relation between y and x.
Ex: You are trying to predict salary from number of years of experience. Here salary is the dependent variable (y) and years of experience is the independent variable (x).
y = b0 + b1*x1
We are trying to find the optimum values of the constants b0 and b1 that give the best-fitting line for the observed data.
It is the equation of a line, which gives a continuous value for x from 0 to very large values.
This line is called the linear regression model.
Logistic regression is a type of classification technique. Don't be misled by the term regression. Here we predict whether y = 0 or 1.
Here we first need to find p(y=1) (the probability that y = 1) given x. In the standard logistic model this is p = 1 / (1 + e^-(b0 + b1*x1)).
The probability p is related to the linear term by ln(p / (1 - p)) = b0 + b1*x1.
Ex: we can classify a tumour with more than a 50% chance of being cancerous as 1 and a tumour with less than a 50% chance as 0.
Here the red point will be predicted as 0, whereas the green point will be predicted as 1.
Cannot agree more with the above comments.
Above that, there are some more differences like
In Linear Regression, residuals are assumed to be normally distributed.
In Logistic Regression, residuals need to be independent but not normally distributed.
Linear Regression assumes that a constant change in the value of the explanatory variable results in constant change in the response variable.
This assumption does not hold if the value of the response variable represents a probability (in Logistic Regression)
GLMs (generalized linear models) do not assume a linear relationship between the dependent and independent variables. However, in the logit model they assume a linear relationship between the link function of the response and the independent variables.
| Basis | Linear | Logistic |
|-----------------------------------------------------------------|--------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| Basic | The data is modelled using a straight line. | The probability of some obtained event is represented as a linear function of a combination of predictor variables. |
| Linear relationship between dependent and independent variables | Is required | Not required |
| The independent variable | Could be correlated with each other. (Specially in multiple linear regression) | Should not be correlated with each other (no multicollinearity exist). |
In short:
Linear Regression gives continuous output. i.e. any value between a range of values.
Logistic Regression gives discrete output. i.e. Yes/No, 0/1 kind of outputs.
To put it simply, if a linear regression model is used for classification and more cases arrive that lie far away from the threshold (say 0.5) separating y=1 from y=0, the hypothesis will change and become worse. Therefore the linear regression model is not used for classification problems.
Another problem is that if the classes are y=0 and y=1, h(x) can be > 1 or < 0. So we use logistic regression, where 0 <= h(x) <= 1.
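The out-of-range problem is easy to reproduce. In this sketch a least-squares line is fit to made-up 0/1 labels and then evaluated away from the training range.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]               # made-up binary labels
n = len(xs)

# Least-squares line through the 0/1 labels.
x_mean, y_mean = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

def h(x):
    """Linear-regression hypothesis used (incorrectly) as a probability."""
    return intercept + slope * x

print(h(0.0))    # below 0
print(h(10.0))   # above 1
```

Neither value can be read as a probability, which is the gap the logistic function closes by squashing the linear output into the 0 to 1 range.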
Logistic Regression is used in predicting categorical outputs like Yes/No, Low/Medium/High etc. You have basically 2 types of logistic regression Binary Logistic Regression (Yes/No, Approved/Disapproved) or Multi-class Logistic regression (Low/Medium/High, digits from 0-9 etc)
On the other hand, linear regression is if your dependent variable (y) is continuous.
y = mx + c is a simple linear regression equation (m = slope and c is the y-intercept). Multilinear regression has more than 1 independent variable (x1,x2,x3 ... etc)
In linear regression the outcome is continuous whereas in logistic regression, the outcome has only a limited number of possible values(discrete).
example:
In a scenario where the given value x is the size of a plot in square feet, predicting y, i.e. the price of the plot, is a linear regression problem.
If, instead, you wanted to predict, based on size, whether the plot would sell for more than 300000 Rs, you would use logistic regression. The possible outputs are either Yes, the plot will sell for more than 300000 Rs, or No.
In case of Linear Regression the outcome is continuous while in case of Logistic Regression outcome is discrete (not continuous)
To perform Linear regression we require a linear relationship between the dependent and independent variables. But to perform Logistic regression we do not require a linear relationship between the dependent and independent variables.
Linear Regression is all about fitting a straight line in the data while Logistic Regression is about fitting a curve to the data.
Linear Regression is a regression algorithm for Machine Learning while Logistic Regression is a classification Algorithm for machine learning.
Linear regression assumes gaussian (or normal) distribution of dependent variable. Logistic regression assumes binomial distribution of dependent variable.
The basic difference between Linear Regression and Logistic Regression is :
Linear Regression is used to predict a continuous or numerical value, but when we want to predict a value that is categorical, Logistic Regression comes into the picture.
Logistic Regression is used for binary classification.

Resources