Opencv haartraining - opencv

What is the difference between real adaboost Logit boost discrete adaboost and gentle adaboost in train cascade parameter..
-bt <{DAB, RAB, LB, GAB(default)}>

from the docs
boost_type – Type of the boosting algorithm.
Possible values are:
CvBoost::DISCRETE Discrete AdaBoost.
CvBoost::REAL Real AdaBoost. It is a technique that utilizes confidence-rated predictions and works well with categorical data.
CvBoost::LOGIT LogitBoost. It can produce good regression fits.
CvBoost::GENTLE Gentle AdaBoost. It puts less weight on outlier data points and for that reason is often good with regression data.
Gentle AdaBoost and Real AdaBoost are often the preferable choices.

Related

using random forest as base classifier with adaboost

Can I use AdaBoost with random forest as a base classifier? I searched on the internet and I didn't find anyone who does it.
Like in the following code; I try to run it but it takes a lot of time:
estimators = Pipeline([('vectorizer', CountVectorizer()),
('transformer', TfidfTransformer()),
('classifier', AdaBoostClassifier(learning_rate=1))])
RF=RandomForestClassifier(criterion='entropy',n_estimators=100,max_depth=500,min_samples_split=100,max_leaf_nodes=None,
max_features='log2')
param_grid={
'vectorizer__ngram_range': [(1,2),(1,3)],
'vectorizer__min_df': [5],
'vectorizer__max_df': [0.7],
'vectorizer__max_features': [1500],
'transformer__use_idf': [True , False],
'transformer__norm': ('l1','l2'),
'transformer__smooth_idf': [True , False],
'transformer__sublinear_tf': [True , False],
'classifier__base_estimator':[RF],
'classifier__algorithm': ("SAMME.R","SAMME"),
'classifier__n_estimators':[4,7,11,13,16,19,22,25,28,31,34,43,50]
}
I tried with the GridSearchCV, I added the RF classifier into the AdaBoost parameters.
if I use it would the accuracy increase?
No wonder you have not actually seen anyone doing it - it is an absurd and bad idea.
You are trying to build an ensemble (Adaboost) which in itself consists of ensemble base classifiers (RFs) - essentially an "ensemble-squared"; so, no wonder about the high computation time.
But even if it was practical, there are good theoretical reasons not to do it; quoting from my own answer in Execution time of AdaBoost with SVM base classifier:
Adaboost (and similar ensemble methods) were conceived using decision trees as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is good reason why still today, if you don't specify explicitly the base_classifier argument, it assumes a value of DecisionTreeClassifier(max_depth=1). DTs are suitable for such ensembling because they are essentially unstable classifiers, which is not the case with SVMs, hence the latter are not expected to offer much when used as base classifiers.
On top of this, SVMs are computationally much more expensive than decision trees (let alone decision stumps), which is the reason for the long processing times you have observed.
The argument holds for RFs, too - they are not unstable classifiers, hence there is not any reason to actually expect performance improvements when using them as base classifiers for boosting algorithms, like Adaboost.
Short answer:
It's not impossible.
I don't know if there's anything wrong with doing so in theory, but I tried this once and the accuracy increased.
Long answer:
I tried it on a typical dataset with n rows of p real-valued features, and a label list of length n. In case it matters, they are embeddings of nodes in a graph obtained by the DeepWalk algorithm, and the nodes are categorized into two classes. I trained a few classification models on this data using 5-fold cross validation, and measured common evaluation metrics for them (precision, recall, AUC etc.). The models I have used are SVM, logistic regression, random Forest, 2-layer perceptron and Adaboost with random forest classifiers. The last model, Adaboost with random forest classifiers, yielded the best results (95% AUC compared to multilayer perceptron's 89% and random forest's 88%). Sure, now the runtime has increased by a factor of, let's say, 100, but it's still about 20 mins, so it's not a constraint to me.
Here's what I thought: Firstly, I'm using cross validation, so there's probably no overfitting flying under the radar. Secondly, both are ensemble learning methods, but random forest is a bagging method, wheras Adaboost is a boosting technique. Perhaps they're still different enough for their combination to make sense?

Gradient Boosting vs Random forest

According to my understanding, RF selects features randomly and hence is hard to overfit. But, in sklearn Gradient boosting also offers the option of max_features which can help to prevent overfitting. So, why would anyone use Random forest?
Can anyone explain when to use Gradient boosting vs Random forest based on the given data?
Any help is highly appreciated.
According to my personal experience, Random Forest could be a better choice when..
You train a model on small data set.
Your data set has few features to learn.
Your data set has low Y flag count or you try to predict a situation that has low chance to occur or rarely occurs.
In these situations, Gradient Boosting algorithms like XGBoost and Light GBM can overfit (though their parameters are tuned) while simple algorithms like Random Forest or even Logistic Regression may perform better. To illustrate, for XGboost and Ligh GBM, ROC AUC from test set may be higher in comparison with Random Forest but shows too high difference with ROC AUC from train set.
Despite the sharp prediction form Gradient Boosting algorithms, in some cases, Random Forest take advantage of model stability from begging methodology (selecting randomly) and outperform XGBoost and Light GBM. However, Gradient Boosting algorithms perform better in general situations.
Similar question asked on Quora:
https://www.quora.com/How-do-random-forests-and-boosted-decision-trees-compare
I agree with the author at the link that random forests are more robust -- they don't require much problem-specific tuning to get good results. Besides that, a couple other items based on my own experience:
Random forests can perform better on small data sets; gradient boosted trees are data hungry
Random forests are easier to explain and understand. This perhaps seems silly but can lead to better adoption of a model if needed to be used by less technical people
I think that's also true. I have also read on this page How Random Forest Works
There explains the advantages of random forest. like this :
For applications in classification problems, Random Forest algorithm
will avoid the overfitting problem
For both classification and
regression task, the same random forest algorithm can be used
The Random Forest algorithm can be used for identifying the most
important features from the training dataset, in other words,
feature engineering.

Text classification algorithms which are not Naive?

Naive Bayes Algorithm assumes independence among features. What are some text classification algorithms which are not Naive i.e. do not assume independence among it's features.
The answer will be very straight forward, since nearly every classifier (besides Naive Bayes) is not naive. Features independence is very rare assumption, and is not taken by (among huge list of others):
logistic regression (in NLP community known as maximum entropy model)
linear discriminant analysis (fischer linear discriminant)
kNN
support vector machines
decision trees / random forests
neural nets
...
You are asking about text classification, but there is nothing really special about text, and you can use any existing classifier for such data.

High bias or variance? - SVM and weired learning curves

I have never seen such learning curves. Am I right, that huge overfitting occurs? The model is fitting better and better to the training data, while it generalizes worse for the test data.
Usually when there is high variance, like here, more examples should help. In this case, they won't, I suspect. Why is that? Why such example of learning curves can't be found easily in literature/tutorials?
Learning curves. SVM, param1 is C, param2 is gamma
You have to remember that SVM is non parametric model, thus more samples does not have to reduce variance. Reduction in variance can be more or less guaranteed for parametric model (like neural net), but SVM is not one of them - more samples mean not only better training data but also more complex model. Your learning curves are typical example of SVM overfitting, which happens a lot with RBF kernel.

Difference between classification and regression, with SVMs

What is the exact difference between a Support Vector Machine classifier and a Support Vector Machine regresssion machine?
The one sentence answer is that SVM classifier performs binary classification and SVM regression performs regression.
While performing very different tasks, they are both characterized by following points.
usage of kernels
absence of local minima
sparseness of the solution
capacity control obtained by acting on the margin
number of support vectors, etc.
For SVM classification the hinge loss is used, for SVM regression the epsilon insensitive loss function is used.
SVM classification is more widely used and in my opinion better understood than SVM regression.

Resources