OpenCV: How to get weak learners from AdaBoost

Is there a way to extract the features corresponding to the weak learners from the AdaBoost algorithm implemented in OpenCV?
I know that adaboost combines a set of weak learners based on a set of input features.
The same features are measured for each sample in the training set.
Usually AdaBoost uses decision stumps: it sets a threshold for each feature and chooses the stump with the minimum error. I want to find out which features generated the weak learners.
Thanks.

You simply have to save the model and extract the trees/stumps from the text file.
The save() API is quite simple to use. In the file you will find items like this:
"splits:
- { var:448, quality:5.0241161137819290e-002,
le:1.7250000000000000e+002 }"
The number next to "var" is the feature index, and "le" is the "less than" threshold value for this feature.
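As a rough sketch of how that could look with the Python bindings (the data and file name here are made-up stand-ins, not from the original question):
import cv2
import numpy as np
# Toy stand-in data: 100 samples with 5 features each.
samples = np.random.rand(100, 5).astype(np.float32)
labels = (samples[:, 2] > 0.5).astype(np.int32)  # binary labels; OpenCV's Boost is two-class
boost = cv2.ml.Boost_create()
boost.setWeakCount(50)  # number of weak learners
boost.setMaxDepth(1)    # depth-1 trees, i.e. decision stumps
boost.train(samples, cv2.ml.ROW_SAMPLE, labels)
# The saved YAML file contains one "splits" entry per stump, like the one quoted above.
boost.save("boost_model.yml")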

Related

Is there any meaning to having bootstrap=True when max_samples=1.0 in the case of bagging ensemble learning?

In the bagging ensemble technique, if I use n_estimators=500, max_samples=1.0, and bootstrap=True, isn't that equivalent to n_estimators=500 and bootstrap=False, since in both cases a single sample of 500 training instances will be given to each predictor? Assume the number of training instances is 500.
No, it is not equivalent.
When you specify bootstrap=False, you are basically saying that each of your weak estimators should be trained using every data point in your training set exactly once.
When you specify bootstrap=True, you are drawing with replacement, meaning that some data points might be used more than once and others not at all. max_samples just sets how large each drawn sample is, as a fraction of the training set.
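A minimal scikit-learn sketch of the difference (the iris data is just a stand-in):
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
X, y = load_iris(return_X_y=True)
# bootstrap=True: each of the 500 estimators is fit on a resample, drawn with
# replacement, of max_samples * n_samples points, so some points repeat and others
# are left out of any given resample.
with_bootstrap = BaggingClassifier(n_estimators=500, max_samples=1.0, bootstrap=True).fit(X, y)
# bootstrap=False: every estimator sees each training point exactly once, so all
# 500 estimators are trained on identical data.
without_bootstrap = BaggingClassifier(n_estimators=500, max_samples=1.0, bootstrap=False).fit(X, y)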

What is a weak learner?

I want to compare different error rates of different classifiers with the error rate from a weak learner (better than random guessing). So, my question is, what are a few choices for a simple, easy to process weak learner? Or, do I understand the concept incorrectly, and is a weak learner simply any benchmark that I choose (for example, a linear regression)?
better than random guessing
That is basically the only requirement for a weak learner. So long as you can consistently beat random guessing, any true boosting algorithm will be able to increase the accuracy of the final ensemble. Which weak learner you should choose is then a trade-off between three factors:
The bias of the model. A lower bias is almost always better, but you don't want to pick something that will overfit (yes, boosting can and does overfit)
The training time for the weak learner. Generally we want to be able to learn a weak learner quickly, as we are going to be building a few hundred (or thousand) of them.
The prediction time for our weak learner. If we use a model that has a slow prediction rate, our ensemble of them is going to be a few hundred times slower!
The classic weak learner is a decision tree. By changing the maximum depth of the tree, you can control all three factors. This makes them incredibly popular for boosting. What you should be using depends on your individual problem, but decision trees are a good starting point.
NOTE: So long as the algorithm supports weighted data instances, any algorithm can be used for boosting. A guest speaker at my University was boosting 5 layer deep neural networks for his work in computational biology.
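As a rough illustration of that note (logistic regression is used here only because its fit() accepts sample weights; it is not part of the original answer):
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
# Any estimator whose fit() accepts sample_weight can serve as the weak learner, not just a tree.
clf = AdaBoostClassifier(LogisticRegression(max_iter=1000), n_estimators=25).fit(X, y)
print(clf.score(X, y))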
Weak learners are basically thresholds on individual features. One simple example is a one-level decision tree, called a decision stump, used in bagging or boosting. It just chooses a threshold for one feature and splits the data on that threshold (for example, to determine whether an iris flower is Iris versicolor or Iris virginica based on its petal width). Within bagging or AdaBoost, each such stump is then trained on one specific feature.
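A quick sketch of such a stump on the iris example (in scikit-learn's iris data, feature index 3 is petal width, and classes 1 and 2 are versicolor and virginica):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
iris = load_iris()
mask = iris.target > 0                 # keep only versicolor and virginica
X = iris.data[mask][:, [3]]            # petal width (cm) as the single feature
y = iris.target[mask]
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print("threshold:", stump.tree_.threshold[0], "accuracy:", stump.score(X, y))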

Selecting Best Features in a feature vector using Adaboost

I've read some documentation on how Adaboost works but have some questions regarding it.
I've also read that, apart from weighting weak classifiers, AdaBoost also picks the best features from the data and uses them in the testing phase to perform classification efficiently.
How does AdaBoost pick the best features from the data?
Correct me if my understanding of Adaboost is wrong!
In some cases the weak classifiers in Adaboost are (almost) equal to features. In other words, using a single feature to classify can result in slightly better than random performance, so it can be used as a weak classifier. Adaboost will find the set of best weak classifiers given the training data, so if the weak classifiers are equal to features then you will have an indication of the most useful features.
An example of weak classifiers resembling features are decision stumps.
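For instance, with scikit-learn's AdaBoost over decision stumps (a sketch, not the OpenCV setup from the original question), you can list the feature each stump split on:
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
X, y = load_iris(return_X_y=True)
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=10).fit(X, y)
# Each weak classifier split on exactly one feature, so this listing shows which
# features AdaBoost found useful.
for i, stump in enumerate(clf.estimators_):
    print(i, "feature", stump.tree_.feature[0], "threshold", stump.tree_.threshold[0])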
AdaBoost selects features through its base learner, usually a tree. For a single tree, there are several ways to estimate how much a single feature contributes to the tree, often called its relative importance. For AdaBoost, an ensemble method containing several such trees, the relative importance of each feature to the final model can be calculated by measuring the importance of each feature in each tree and then averaging.
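In scikit-learn terms, that averaged relative importance is exposed directly as feature_importances_ (a sketch with a placeholder dataset):
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
X, y = load_iris(return_X_y=True)
clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
# Relative importance of each feature, averaged over the boosted trees and weighted
# by each tree's contribution to the ensemble.
print(clf.feature_importances_)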
Hope this can help you.

Basic understanding of the Adaboost algorithm

I'm a machine learning newbie trying to understand how Adaboost works.
I've read many articles explaining how AdaBoost makes use of a set of weak *classifiers* to create a strong classifier.
However, I seem to have trouble understanding the statement that "Adaboost creates a Strong Classifier".
When I looked at implementations of Adaboost, I realized that it doesn't "actually" create a Strong Classifier, but somehow in the TESTING PHASE figures out "how to use a set of Weak Classifiers to get more accurate results", which in turn acts like a strong classifier "collectively".
So technically there is NO SINGLE STRONG CLASSIFIER created (but a set of weak classifiers collectively acts as a strong classifier).
Please correct me if I'm wrong. It would be nice if someone can throw in some comments regarding this.
A classifier is a black box that receives an input (feature vectors) and returns an output (labeled vectors). So to call something a classifier, you only care about what it does, and not how it does it. AdaBoost's classifier can be seen as such black box, so it's indeed a single classifier, even if it uses internally several weak classifiers to produce such output.
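A small sketch of that black-box view using scikit-learn's implementation (the dataset is just a placeholder):
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
X, y = load_breast_cancer(return_X_y=True)
clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
# From the outside it behaves as one classifier: feature vectors in, labels out.
print(clf.predict(X[:5]))
# Internally it keeps the weak classifiers and the weight each one gets in the final vote.
print(len(clf.estimators_), clf.estimator_weights_[:5])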

Weak hypotheses in boosting method

What is the weak hypotheses generated during boosting?
I'm guessing that you mean the weak classifiers that are combined in boosting? Often these are decision trees only a few layers deep. They are trained, one after another, on the dataset weighted such that data points the last classifier got wrong are given more weight.
Check these notes from a UPenn machine learning class for more information:
http://alliance.seas.upenn.edu/~cis520/wiki/index.php?n=Lectures.Boosting
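A minimal from-scratch sketch of that loop, with decision stumps as the weak hypotheses and made-up toy data (this is the standard binary AdaBoost update, not code from the notes linked above):
import numpy as np
from sklearn.tree import DecisionTreeClassifier
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)      # labels in {-1, +1}
w = np.full(len(y), 1.0 / len(y))               # start with uniform weights
hypotheses, alphas = [], []
for t in range(10):
    h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = h.predict(X)
    err = np.sum(w[pred != y])                  # weighted error of this weak hypothesis
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
    w *= np.exp(-alpha * y * pred)              # upweight the points it got wrong
    w /= w.sum()
    hypotheses.append(h)
    alphas.append(alpha)
# The final prediction is a weighted vote over the weak hypotheses.
H = np.sign(sum(a * h.predict(X) for a, h in zip(alphas, hypotheses)))
print("training accuracy:", np.mean(H == y))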
