Number of Trees in Random Forest Regression - machine-learning

I am learning the Random Forest Regression model. I know that it builds many trees (models) and that we can then predict our target variable by averaging the results of all the trees. I also have a decent understanding of the Decision Tree Regression algorithm. How can we choose the best number of trees?
For example, I have a dataset where I am predicting a person's salary and I have only two input variables, 'Years of Experience' and 'Performance Score'. How many random trees can I form using such a dataset? Does the number of trees in a Random Forest depend upon the number of input variables? Any good example will be highly appreciated.
Thanks in Advance

A decision tree is trained on the entire dataset and only one model is created. In a random forest, multiple decision trees are created, and each tree is trained on a subset of the data: the rows are drawn as a bootstrap sample, and only a random subset of the features is considered at each split. In your case you only have two features, so each tree will be trained on a bootstrap sample of the rows and on a random choice among those two features at each split.
You can create any number of trees for your data; it does not depend on the number of input variables. Usually in a random forest, more trees give better performance but also more computation time. Experiment with your data and see how performance changes between different numbers of trees. If performance stays the same, use fewer trees for faster computation. You can use a grid search for this.
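A minimal sketch of such a grid search, assuming scikit-learn and a pandas DataFrame named df with hypothetical columns 'YearsExperience', 'PerformanceScore' and 'Salary':

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical feature/target columns from the question
X = df[['YearsExperience', 'PerformanceScore']]
y = df['Salary']

# Try several forest sizes; pick the smallest one whose CV score stops improving.
param_grid = {'n_estimators': [10, 50, 100, 200, 500]}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5, scoring='neg_mean_squared_error')
search.fit(X, y)
print(search.best_params_, search.best_score_)
```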
You can also experiment with other ML models, such as linear regression, which might perform well in your case.

Related

A random forest with one tree performs worse than a single decision tree?

I am analyzing medical data for a hospital study. If I use a random forest with only one tree, the cross-validation scores are quite bad (indicating overfitting), whereas if I use a decision tree the scores are actually quite good. Both classifiers have the same depth parameter. How can this behaviour be explained?
The construction procedure for decision trees usually includes pruning, a step applied a posteriori to reduce the depth and avoid overfitting. Random Forest does not use this method, as it actually takes advantage of the high variance of the overfitted decision trees by averaging them.
Moreover, the plain decision tree is built by training on the full dataset, while the "random forest" tree is built on a bootstrap sample of the training dataset, which can translate into poorer performance since it will be biased towards records that were included multiple times in the sampling. Again, Random Forest turns this into an advantage by averaging over many such trees, but with a single tree it is a disadvantage.
All in all, the difference in performance is not surprising.
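As a rough illustration, here is a minimal sketch assuming scikit-learn and its built-in breast-cancer dataset; note that a one-tree forest bootstraps the rows and, by default, considers only a random subset of features at each split, which is exactly the effect described above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=5, random_state=0)
one_tree_forest = RandomForestClassifier(n_estimators=1, max_depth=5,
                                         bootstrap=True, random_state=0)

# Same depth, but the single decision tree sees all rows and all features per split,
# while the one-tree "forest" sees a bootstrap sample and a feature subset per split.
print(cross_val_score(tree, X, y, cv=5).mean())
print(cross_val_score(one_tree_forest, X, y, cv=5).mean())
```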

Why can't random forests or decision trees give 100% precision? And how do we handle huge noise in the middle?

If a decision tree determines splits based on the highest amount of data belonging to the same class, why can't it keep splitting this particular data until each leaf has only 1 element in it, which would lead to 100% precision?
Having only 1 data point per terminal node can cause overfitting on the training dataset. To avoid this, test the constructed model with a summary statistic (e.g. RMSE) against both the training and validation datasets. In a Random Forest, the 'out-of-bag' sample can be used as a validation set: it is the proportion of the data (roughly 37%) that is not used in the construction of each tree. The RMSE should be relatively similar between the training and validation sets.
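A minimal sketch of checking this with the out-of-bag estimate, assuming scikit-learn and a synthetic regression dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Training RMSE vs. the out-of-bag RMSE built from the ~37% of rows each tree never sees.
train_rmse = np.sqrt(mean_squared_error(y, rf.predict(X)))
oob_rmse = np.sqrt(mean_squared_error(y, rf.oob_prediction_))
print(train_rmse, oob_rmse)  # a large gap suggests overfitting
```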

When is AdaBoost better than XGBoost on some data combinations?

My name is Eslam, a master's student in Egypt. My thesis is in the field of educational data mining. I used AdaBoost and XGBoost techniques in my predictive model to predict students' success rate based on the Open Learning Analytics dataset (OLAD).
The idea behind the analysis is trying various techniques (including ensemble and non-ensemble techniques) on different combinations of features; interesting results showed up.
Results: [accuracy results were shown as an image in the original post]
The question is: why do some techniques perform better than others on specific feature combinations, especially Random Forest, XGB and ADA?
ML models can achieve different results depending on what kind of space and what kind of function you want to approximate. You can expect SVM to achieve the highest score on data that is naturally embedded in a Hilbert space. On the other hand, if the data does not fit this kind of space (e.g. many categorical, unordered features), you can expect tree-boosting methods to outperform SVM.
However, if I understood correctly that 'Decision Tree Accuracy' in the picture refers to a single decision tree, then I believe your tests were done on small datasets, or your boosting and RF models were incorrectly parametrized.
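As a rough way to run such a comparison on an equal footing, here is a minimal sketch assuming scikit-learn and a synthetic dataset; sklearn's GradientBoostingClassifier stands in for XGBoost to keep the example self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)

# Evaluate each technique with the same cross-validation split so the scores are comparable.
models = {
    'AdaBoost': AdaBoostClassifier(random_state=0),
    'GradientBoosting': GradientBoostingClassifier(random_state=0),
    'RandomForest': RandomForestClassifier(n_estimators=200, random_state=0),
    'SVM': SVC(),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```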

Create negative examples in a dataset with only positive ones

Imagine we have a classification problem on a dataset where the examples are only positive (equivalently, negative). For instance, a problem where the winning class is determined by position (e.g. think of a tennis dataset where the first player is always the winner). How can we create negative examples in order to train a supervised learning algorithm on this dataset? One idea could be to generate negative examples by exchanging the positions of the features that are tied to each of the classes. Do you think this will give an unbiased dataset? Could we create negative duplicates of our original dataset and train a supervised learning algorithm on this doubled dataset?
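A rough sketch of the swapping idea, assuming pandas and hypothetical per-player columns such as 'p1_rank' and 'p2_rank' for a tennis-style dataset where player 1 always won:

```python
import pandas as pd

def add_negative_examples(df):
    # Positive examples: original rows, label 1 (player 1 wins).
    pos = df.copy()
    pos['label'] = 1
    # Negative examples: swap the player-1 and player-2 feature blocks, label 0.
    # (Hypothetical column names; rename applies all mappings at once, so the swap is safe.)
    neg = df.rename(columns={'p1_rank': 'p2_rank', 'p2_rank': 'p1_rank',
                             'p1_aces': 'p2_aces', 'p2_aces': 'p1_aces'})
    neg['label'] = 0
    return pd.concat([pos, neg], ignore_index=True)
```

Each original row then appears twice: once with label 1 and once, mirrored, with label 0.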

Is there any classifier which is able to make decisions very fast?

Most classification algorithms are developed to improve the training speed. However, is there any classifier or algorithm focusing on the decision-making speed (low computational complexity and a simple, realizable structure)? I can get enough training data and can endure a long training time.
There are many methods which classify fast; you could more or less sort models by classification speed in the following way (first = fastest, last = slowest):
Decision Tree (especially with limited depth)
Linear models (linear regression, logistic regression, linear svm, lda, ...) and Naive Bayes
Non-linear models based on explicit data transformation (Nystroem kernel approximation, RVFL, RBFNN, EEM), Kernel methods (such as kernel SVM) and shallow neural networks
Random Forest and other committees
Big Neural Networks (ie. CNN)
KNN with arbitrary distance
Obviously this list is not exhaustive, it just shows some general ideas.
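If decision speed is the main constraint, it is straightforward to measure prediction latency directly; a minimal sketch assuming scikit-learn and a synthetic dataset:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

for model in [DecisionTreeClassifier(max_depth=5),
              LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200),
              KNeighborsClassifier()]:
    model.fit(X, y)
    start = time.perf_counter()
    model.predict(X)  # time the prediction step only
    print(type(model).__name__, f'{time.perf_counter() - start:.4f}s')
```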
One way of obtaining such a model is to build a complex, slow model, then use it as a black-box label generator to train a simpler model (but on a potentially infinite training set), thus getting a fast classifier at the cost of very expensive training. There are many works showing that one can do this, for example by training a shallow neural network on the outputs of a deep NN.
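A rough sketch of that teacher/student idea, assuming scikit-learn, with a random forest as the slow black box and logistic regression as the fast model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Slow but accurate "teacher" model.
teacher = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Fast "student" model trained on the teacher's predictions rather than the true labels;
# with a black-box teacher you could label arbitrarily many unlabelled samples this way.
student = LogisticRegression(max_iter=1000).fit(X, teacher.predict(X))
```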
In general, classification speed should not be a problem. Some exceptions are algorithms whose prediction time depends on the number of training samples. One example is k-Nearest-Neighbors, which has no training time, but for classification it needs to check all points (if implemented in a naive way). Other examples are classifiers which work with kernels, since they compute the kernel between the current sample and all training samples.
Many classifiers work with a scalar product of the features and a learned coefficient vector. These should be fast enough in almost all cases. Examples are logistic regression, linear SVM, perceptrons and many more. See lejlot's answer for a nice list.
If these are still too slow you might try to reduce the dimension of your feature space first and then try again (this also speeds up training time).
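For example, a minimal sketch (assuming scikit-learn) that reduces the feature space with PCA before a fast linear classifier:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, n_features=100, random_state=0)

# Project onto 10 principal components, then classify in the reduced space.
fast_clf = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
fast_clf.fit(X, y)
print(fast_clf.predict(X[:5]))
```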
By the way, this question might not be well suited for StackOverflow as it is quite broad and recommendation-oriented rather than problem-oriented. Maybe try https://stats.stackexchange.com/ next time.
I have a decision tree which is represented in a compressed form and which is at least 4 times faster than the actual tree at classifying an unseen instance.
