What is the meaning of leaf node of J48 tree classifier - machine-learning

I am unable to understand the meaning of leaf node attribute of decision tree.
I am a new machine learner and after classifying the dataset by using J48 algo.I got a tree and now I'm unable to understand which attribute's value is related with tree's leaf node.
I'm simply perform a prediction by using dataset from Kaggle.

Majority class, and class frequencies.
These values are useful for estimating the certainty of the classifier.

Related

Regarding prediction of Decision Tree

How does Decision tree predict the out come on a new Data set. Lets say with the hyper parameters I allowed my decision tree to grow only to a certain extent to avoid over fitting. Now a new data point is passed to this trained model, so the new data point reaches to one of the leaf nodes. But how does that leaf node predict whether the data point is either 1 or 0? ( I am talking about Classification here).
Well, you pretty much answered your own question. But just to the extension, in the last the data is labelled to either 0 or 1 is hgihly dependent on the type of algorithm you used, for example, ID3 , uses the mode value to predict. similarly C4.5 and C5 or CART have there different criteria based on info gain, ginni index etc etc....
In simplified terms, the process of training a decision tree and predicting the target features of query instances is as follows:
Present a dataset containing of a number of training instances characterized by a number of descriptive features and a target feature
Train the decision tree model by continuously splitting the target feature along the values of the descriptive features using a measure of information gain during the training process
Grow the tree until we accomplish a stopping criteria --> create leaf nodes which represent the predictions we want to make for new query instances
Show query instances to the tree and run down the tree until we arrive at leaf nodes
DONE - Congratulations you have found the answers to your questions
here is a link I am suggesting which explain the decision tree in very detail with scratch. Give it a good read -
https://www.python-course.eu/Decision_Trees.php

Can decision tree based model predict future?

I am trying to build a model that predicts the shipping volume of each month, week, and day.
I found that the decision tree-based model works better than linear regression.
But I read some articles about machine learning and it says decision tree based model can't predict future which model didn't learn. (extrapolation issues)
So I think it means that if the data is spread between the dates that train data has, the model can predcit well, but if the date of data is out of the range, it can not.
I'd like to confirm if my understand is correct.
some posting shows prediction for datetime based data using random forest model, and it makes me confused.
Also please let me know if there is any way to overcome extrapolation issues on decision tree based model.
It depends on the data.
Decision tree predicts class value of any sample in range of [minimum of class value of training data, maximum of class value of training data]. For example, let there are five samples [(X1, Y1), (X2, Y2), ..., (X5, Y5)], and well trained tree has two decision node. The first node N1 includes (X1, Y1), (X2, Y2) and the other node N2 includes (X3, Y3), (X4, Y4), and (X5, Y5). Then the tree will predict a new sample as mean of Y1 and Y2 when the sample reaches N1, but it will predict a new sample as men of Y3, Y4, Y5 when the sample reaches N2.
With this reason, if the class value of new sample could be bigger than the maximum of class value of training data or could be smaller than the minimum of class value of training data, it is not recommend to use decision tree. Otherwise, tree-based model such as random forest shows good performance.
There can be different forms of extrapolation issues here.
As already mentioned a classical decision tree for classification can only predict values it has encountered in its training/creation process. In that sense you won't predict any previously unseen values.
This issue can be remedied if you have the classifier predict relative updates instead of absolute values. But you need to have some understanding of your data, to determine what works best for different cases.
Things are similar for a decision tree used for regression.
The next issue with "extrapolation" is that decision trees might perform badly if your training data has changing statistics over time. Again, I would propose to predict update relationships.
Otherwise, predictions based on training data from a more recent past might yield better predictions. Since individual decision trees can't be trained in an online manner, you would have to create a new decision tree every x time steps.
Going further than this I'd say you'll want to start thinking in state machines and trying to use your classifier for state predictions. But this a fairly uncharted domain of theory for decision trees from when I last checked. This will work better if you already have some for of model for your data relationships in mind.

How multi label classification works in scikit-learn decision tree?

I had a problem to classify inputs which have more than one label. So problem is multi-label classification. I used scikit-learn Decision Tree classifiers to do this and it gives pretty good results at initial stages. But, I am wondering how is it working under the hood and How the split is done in Decision Tree for multi-label classification? The important question is about how one model which is initialized once can be trained with two different class of labels at the same time? How the Decision Tree model will solve out the optimization task for both different sets of labels?
Under the hood, each node in your decision tree has the same labels as the root node, however, the probability of each label is different. When you run model.predict(), the model gives the prediction as the label with the highest probability. You can use model.predict_proba() to see the probability for each label separately. You can use this code to get the probabilities correctly:
all_probs=pd.DataFrame(model.predict_proba(X_test),columns=model.classes_)

Number of Trees in Random Forest Regression

I am learning the Random Forest Regression Model. I know that it forms many Trees(models) and then we can predict our target variables by averaging the result of all Trees. I also have a descent understanding of Decision Tree Regression Algorithm. How can we form the best number of Trees?
For example i have a dataset where i am predicting person salary and i have only two input variables that are 'Years of Experience', 'Performance Score ' then how many random Trees can i form using such dataset? Are Random Forest Trees dependent upon the number of input variables? Any Good Example will highly be appreciated..
Thanks in Advance
A decision tree trains the model on the entire dataset and only one model is created. In random forest, multiple decision trees are created and each decision tree is trained on a subset of data by limiting the number of rows and the features. In your case, you only have two features so the model will create and train data on subset of data.
You can create any number of random trees for your data. Usually in random forest, more trees result in better performance but also more computation time. Experiment with your data and see the performance changes between different number of trees. If performance remains same, then use less trees to have faster computation. You can use grid search for this.
Also you can experiment with other ml models like linear regression, which migh† perform well in your case.

Weka Decision Tree getting too big (out of memory)

For classification I used Weka's J48 decision tree to build a model on several nominal attributes. Now there is more data for classification (5 nonimal attributes) but each attribute has 3000 different values. I used J48 with pruning but it ran out of memory (associated 4GB). With a smaller dataset, I saw in the output, that J48 keeps all leaves with no instances associated with it. Why are they kept in the model? Should I switch to another classifcation algorithm?

Resources