
Main differences between each tree in a RF and the best tree from CART?

The main differences between the trees in a Random Forest and the best tree from CART are:
The training set of each tree in the RF is a bootstrap sample of the main training set (drawn with replacement).
At each node of an RF tree, only a random subset of the features is considered when choosing the best split.
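For illustration, here is a minimal sketch of how these two differences show up when building the models with scikit-learn; the synthetic dataset and the parameter values are placeholders, not part of the original question:

    # Sketch: a single CART-style tree vs. a Random Forest (scikit-learn).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # CART: one tree, trained on the full training set, with all features
    # considered at every split.
    cart = DecisionTreeClassifier(random_state=0).fit(X, y)

    # Random Forest: each tree is trained on a bootstrap sample of the rows,
    # and only a random subset of features is considered at each split.
    rf = RandomForestClassifier(
        n_estimators=100,
        bootstrap=True,        # difference 1: bootstrap sample per tree
        max_features="sqrt",   # difference 2: random feature subset per split
        random_state=0,
    ).fit(X, y)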

Related

Manually categorize vs Decision Tree categorize in Classification

I am encountering a problem with a classification task in a Kaggle competition.
If I try to categorize numeric or categorical variables into other categories manually (by plotting and observing their distributions), and the Decision Tree graph does not seem to do as good a job of categorizing them, should I categorize them manually to get better classification accuracy?
Basically, the question is: can Decision Trees, Random Forests, and Gradient Boosted Decision Trees do this job well enough that my manual re-categorization has no merit, so there is no need to manually categorize and feature engineer in this way?
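One way to check this is a quick experiment: fit the same tree on the raw feature and on a manually binned version and compare cross-validated scores. The sketch below assumes a hypothetical numeric column "age" and made-up bin edges; your own columns and bins would go in their place:

    # Sketch: tree on a raw numeric feature vs. a manually binned version.
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    df = pd.DataFrame({"age": rng.uniform(18, 80, 1000)})
    y = (df["age"] > 45).astype(int)          # toy target, for illustration only

    # Manual categorization: bin "age" into hand-chosen groups.
    df["age_binned"] = pd.cut(df["age"], bins=[0, 30, 45, 60, 100], labels=False)

    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    raw_score = cross_val_score(tree, df[["age"]], y, cv=5).mean()
    binned_score = cross_val_score(tree, df[["age_binned"]], y, cv=5).mean()
    print(raw_score, binned_score)   # similar scores suggest manual binning adds little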

Regarding prediction of Decision Tree

How does a Decision Tree predict the outcome for a new data set? Let's say that with the hyperparameters I allowed my decision tree to grow only to a certain extent to avoid overfitting. Now a new data point is passed to this trained model, so the new data point reaches one of the leaf nodes. But how does that leaf node predict whether the data point is a 1 or a 0? (I am talking about classification here.)
Well, you pretty much answered your own question. But just to extend it: which label (0 or 1) the data finally gets depends heavily on the algorithm used. For example, ID3 uses the mode (most frequent class) in the leaf to predict; similarly, C4.5, C5.0 and CART have their own criteria based on information gain, the Gini index, and so on.
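For example, the Gini impurity used by CART and the majority-vote (mode) prediction in a leaf can be sketched in a few lines; the labels below are made up for illustration:

    # Sketch: Gini impurity of a node and the majority-class prediction of a leaf.
    import numpy as np

    leaf_labels = np.array([0, 1, 1, 1, 0, 1])        # hypothetical labels in a leaf

    classes, counts = np.unique(leaf_labels, return_counts=True)
    p = counts / counts.sum()
    gini = 1.0 - np.sum(p ** 2)                       # Gini index: 1 - sum_k p_k^2
    prediction = classes[np.argmax(counts)]           # the mode, i.e. the majority class

    print(gini, prediction)                           # ~0.444, predicts class 1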
In simplified terms, the process of training a decision tree and predicting the target feature of query instances is as follows:
Present a dataset containing a number of training instances characterized by a number of descriptive features and a target feature
Train the decision tree model by repeatedly splitting the dataset on the values of the descriptive features, using a measure such as information gain to choose each split
Grow the tree until a stopping criterion is reached --> the leaf nodes created represent the predictions we want to make for new query instances
Show query instances to the tree and run them down the tree until they arrive at leaf nodes
DONE - Congratulations, you have found the answers to your questions
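The same flow expressed in code, as a minimal sketch using scikit-learn; the synthetic dataset and the max_depth stopping criterion are just placeholders:

    # Sketch of the steps above: train a tree on labelled data, then run new
    # query instances down the tree to get predictions from the leaf nodes.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # 1. A dataset of training instances with descriptive features and a target.
    X, y = make_classification(n_samples=300, n_features=5, random_state=0)
    X_train, X_query, y_train, y_query = train_test_split(X, y, random_state=0)

    # 2.-3. Grow the tree, using information gain (entropy) and a stopping criterion.
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
    tree.fit(X_train, y_train)

    # 4. Run query instances down the tree; each one lands in a leaf whose
    # majority class becomes its predicted label.
    predictions = tree.predict(X_query)
    leaf_ids = tree.apply(X_query)    # which leaf each query instance ended up in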
Here is a link I suggest, which explains decision trees in detail, from scratch. Give it a good read:
https://www.python-course.eu/Decision_Trees.php

Decision Tree Performance, ML

If we don't impose any constraints such as max_depth or a minimum number of samples per node, can a decision tree always reach 0 training error, or does it depend on the dataset? What about the dataset shown?
Edit: it is possible to have a split which results in lower accuracy than the parent node, right? According to the theory of decision trees it should stop splitting there, even if the end result after several further splits could be good. Am I correct?
A decision tree will keep splitting as long as it can find a split that improves its splitting criterion (impurity/score) on the training data.
For example, I have built a decision tree on data similar to yours.
A decision tree can get to 100% accuracy on any data set where there are no 2 samples with the same feature values but different labels.
This is one reason why decision trees tend to overfit, especially with many features or with categorical data that has many possible values.
Indeed, we sometimes prevent a split in a node if the improvement created by the split is not high enough. This is problematic because some relationships, like y = x1 XOR x2, cannot be expressed by trees under this limitation.
So, typically, a tree does not stop splitting because it is unable to improve further on the training data.
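Both points can be seen in a small sketch on balanced XOR data (synthetic, for illustration only): an unconstrained scikit-learn tree reproduces the training labels perfectly, while a tree that requires a minimum impurity decrease per split never gets past the root, because the first split on its own improves nothing:

    # Sketch: balanced XOR data, y = x1 XOR x2.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    X = np.tile([[0, 0], [0, 1], [1, 0], [1, 1]], (25, 1))   # 100 balanced points
    y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)

    full = DecisionTreeClassifier(random_state=0).fit(X, y)
    limited = DecisionTreeClassifier(min_impurity_decrease=0.01,
                                     random_state=0).fit(X, y)

    print(full.score(X, y))      # 1.0 -> memorizes the training data
    print(limited.score(X, y))   # 0.5 -> the first split yields no improvement, so it never splits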
The reason you don't usually see trees with 100% accuracy is that we use techniques to reduce overfitting, such as:
Tree pruning. This basically means that you build the entire tree, then go back and prune nodes that did not contribute enough to the model's performance.
Using a gain ratio instead of the raw gain for the splits. Basically this is a way to express the fact that we expect less improvement from a 50%-50% split than from a 10%-90% split.
Setting hyperparameters, such as max_depth and min_samples_leaf, to prevent the tree from splitting too much (see the sketch after this list).
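For instance, here is a sketch of pre-pruning via hyperparameters and post-pruning via cost-complexity pruning in scikit-learn; the parameter values are arbitrary and would normally be tuned:

    # Sketch: limiting tree growth (pre-pruning) and pruning after growth.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Pre-pruning: stop splitting early via hyperparameters.
    pre = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10,
                                 random_state=0).fit(X_train, y_train)

    # Post-pruning: grow the tree, then prune it back with cost-complexity pruning.
    post = DecisionTreeClassifier(ccp_alpha=0.01,
                                  random_state=0).fit(X_train, y_train)

    print(pre.score(X_test, y_test), post.score(X_test, y_test))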

Number of Trees in Random Forest Regression

I am learning the Random Forest Regression model. I know that it builds many trees (models) and that we can predict the target variable by averaging the results of all the trees. I also have a decent understanding of the Decision Tree Regression algorithm. How do we choose the best number of trees?
For example, I have a dataset where I am predicting a person's salary and I have only two input variables, 'Years of Experience' and 'Performance Score'. How many random trees can I build from such a dataset? Does the number of trees in a Random Forest depend on the number of input variables? A good example would be highly appreciated.
Thanks in advance.
A decision tree trains on the entire dataset, and only one model is created. In a random forest, multiple decision trees are created and each one is trained on a subset of the data, by limiting both the rows and the features it sees. In your case you only have two features, so each tree will still be trained on a different subset of the rows.
You can create any number of trees for your data. Usually, in a random forest, more trees give better performance but also longer computation time. Experiment with your data and see how performance changes with different numbers of trees; if performance stays the same, use fewer trees for faster computation. You can use a grid search for this, as sketched below.
You can also experiment with other ML models, such as linear regression, which might perform well in your case.
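As a sketch of that experiment, assuming scikit-learn and a synthetic stand-in for the salary data (your two real features would replace it), a grid search over n_estimators looks like this:

    # Sketch: tune the number of trees (n_estimators) with a grid search.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV

    # Synthetic data with two features, standing in for
    # 'Years of Experience' and 'Performance Score'.
    X, y = make_regression(n_samples=500, n_features=2, noise=10, random_state=0)

    grid = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid={"n_estimators": [10, 50, 100, 200, 500]},
        cv=5,
    )
    grid.fit(X, y)
    print(grid.best_params_)   # if the scores plateau, prefer fewer trees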

How many features does the RandomForest algorithm select?

I'm working with random forests and I'd like to know how the feature selection works.
I have a set of 423 features and I understand that they are randomly selected using log2(F) + 1. This way I get a subset of 12/13 features. But what I cannot understand is how random the selection is, and whether those subsets are supposed to be different for each tree, or whether the subset is the same for all the trees and only the combinations differ.
If I have a model with 10 trees, is the feature selection supposed to vary from tree to tree? Thanks for your help.
Each tree in the forest sees a different random sample of features. Decision tree learning is otherwise deterministic, so if every tree were given exactly the same data and the same features, they would all learn the same tree, which defeats the purpose; you want them to be trained on different subsets of features.
If the algorithm selects a subset of about 12 of the 423 features, that subset is drawn randomly (without replacement within a draw), and in the standard algorithm, and in implementations such as scikit-learn, a fresh subset is drawn at every split of every tree rather than once per tree. So yes, the features used vary from tree to tree, and even from node to node within a tree.
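A small sketch of this, assuming a scikit-learn RandomForestClassifier with max_features="log2": after fitting, you can inspect which features each tree actually split on and see that they differ from tree to tree (the dataset below is synthetic):

    # Sketch: each split considers only a random subset of features, so the
    # features each tree ends up using differ across the forest.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=300, n_features=50, random_state=0)

    rf = RandomForestClassifier(n_estimators=10, max_features="log2",
                                random_state=0).fit(X, y)

    for i, est in enumerate(rf.estimators_):
        used = np.unique(est.tree_.feature[est.tree_.feature >= 0])  # -2 marks leaves
        print(f"tree {i}: splits on features {used}")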
