I'm trying to convert an implementation of several machine learning algorithms from scikit-learn to OpenCV.
First of all, do you know of any specific question/document where I can find the parameter equivalences?
If not, in the specific case of decision trees, is the max_categories of OpenCV the equivalent of max_features in scikit-learn?
It is not.
From the OpenCV docs:
maxCategories – Cluster possible values of a categorical variable into [...]
while scikit-learn does not even directly support categorical variables as predictors.
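There is no official parameter-equivalence table that I know of; the closest counterparts are best seen side by side. A minimal sketch, assuming the Python bindings of both libraries (parameter values are illustrative only):

```python
import cv2
from sklearn.tree import DecisionTreeClassifier

# OpenCV: maxCategories clusters the values of categorical predictors;
# it does NOT limit the number of features considered per split.
dtree = cv2.ml.DTrees_create()
dtree.setMaxCategories(10)
dtree.setMaxDepth(5)          # closest counterpart to sklearn's max_depth

# scikit-learn: max_features limits how many features are examined
# when searching for the best split - there is no OpenCV equivalent.
clf = DecisionTreeClassifier(max_depth=5, max_features="sqrt")
```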
I am trying to read the documentation of Vowpal Wabbit, and it doesn't specify how to select specific learning algorithms (not loss functions) like SVM, NN, decision trees, etc. How does one select a specific learning algorithm?
Or does it select the algorithm itself depending on the problem type (regression/classification), like an AutoML or low-code ML library?
There are some blogs showing how to use neural networks with the --nn option, but that isn't part of the documentation. Is this because it doesn't focus on specific algorithms, as noted above? If so, what is Vowpal Wabbit in essence?
Vowpal Wabbit is based on online learning (SGD-like updates, but there is also --bfgs if you really need batch optimization) and (machine learning) reductions. See some of the tutorials or papers to understand the idea of reductions. Many VW papers are also about Contextual Bandits, which are implemented as a reduction to cost-sensitive one-against-all (OAA) classification (which is further reduced to regression). See a simple intro to reductions or a simple example of how binary classification is reduced to regression.
As far as I know, Vowpal Wabbit supports neither decision trees nor ensembles, but see --boosting and --bootstrap. It does not support SVM, but see --loss_function hinge (the hinge loss is one of the two key concepts of SVM) and --ksvm. It does not support NN, but --nn (and related options) provides very limited support simulating a single hidden layer (feed-forward, with a tanh activation function), which can be added into the reduction stack.
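As an illustration, here is a sketch using the vowpalwabbit Python bindings (assuming version 9+, where the entry point is Workspace; older versions use pyvw.vw instead), showing how these reductions are enabled through command-line-style options:

```python
import vowpalwabbit

# hinge loss mimics the (linear) SVM objective
svm_like = vowpalwabbit.Workspace("--loss_function hinge --binary --quiet")

# --nn 5 pushes a single 5-unit tanh hidden layer onto the reduction stack
nn_like = vowpalwabbit.Workspace("--nn 5 --quiet")

svm_like.learn("1 | price:0.23 sqft:0.25")
print(svm_like.predict("| price:0.23 sqft:0.25"))
```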
I recently found a model that classifies the Iris flower based on the size of its leaves. There are 3 types of flowers as the target (dependent variable). As I understand it, categorical data should be encoded so that it can be used in machine learning. However, in this model the data is used directly, without an encoding step.
Can anyone help explain when to use encoding? Thank you in advance!
Relevant question - encoding of continuous feature variables.
The Iris data were originally published by Fisher when he introduced his linear discriminant classifier.
Generally, a distinction is made between:
Real-value classifiers
Discrete feature classifiers
Linear discriminant analysis and quadratic discriminant analysis are real-value classifiers. Trying to add discrete variables as extra inputs does not work. Special procedures for working with indicator variables (the name used in statistics) in discriminant analysis have been developed. The k-nearest neighbour classifier, too, really only works well with real-valued feature variables.
The naive Bayes classifier is most commonly used for classification problems with discrete features. When you don't want to assume conditional independence between the feature variables, the multinomial classifier can be applied to discrete features.
Neural networks and support vector machines can combine real-valued and discrete features. My advice is to use one separate input node for each discrete outcome - don't use one single input node fed with values like (0: small, 1: minor, 2: medium, 3: larger, 4: big). A one-input-node-per-outcome encoding will improve your training result and yield better test-set performance.
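The one-input-node-per-outcome scheme described above is what scikit-learn calls one-hot encoding; a minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

sizes = np.array([["small"], ["minor"], ["medium"], ["larger"], ["big"]])

# each distinct category becomes its own 0/1 column,
# instead of a single ordinal column with values 0..4
encoder = OneHotEncoder()
print(encoder.fit_transform(sizes).toarray())
```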
The random forest classifier also combines real-valued and discrete features seamlessly.
My final advice is to train and compare on a test set at least 4 different types of classifiers, as there is no such thing as a universally best type of classifier.
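For instance, a quick comparison on the Iris data itself (a sketch using scikit-learn defaults, not a tuned benchmark):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# four different classifier families, compared on the same held-out set
for clf in [GaussianNB(), KNeighborsClassifier(), SVC(), RandomForestClassifier()]:
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```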
What is the difference if we use a Decision Tree as the base estimator in the AdaBoost algorithm?
Is Random Forest a special case of AdaBoost?
Most certainly not; Random Forest is a case of bagging ensemble algorithm (short for bootstrap aggregating), which is different from boosting - check here for their differences.
What is the difference if we use a Decision Tree as the base estimator in the AdaBoost algorithm?
You don't get a Random Forest, but boosted trees: AdaBoost with decision trees is essentially a Gradient Tree Boosting Machine (gradient boosting with the exponential loss), available in several packages like xgboost (R/Python), gbm (R), scikit-learn (Python), etc.
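A sketch of the difference in scikit-learn (note that the base-estimator keyword is estimator from scikit-learn 1.2 on and base_estimator before; it is passed positionally below to sidestep that):

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# boosting: shallow trees fitted sequentially, each one reweighting
# the training errors of its predecessors
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=100)

# bagging: deep trees fitted independently on bootstrap samples,
# with random feature subsets at each split
forest = RandomForestClassifier(n_estimators=100)
```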
Check chapter 8 of the excellent (and freely available) book An Introduction to Statistical Learning for more, or The Elements of Statistical Learning (heavy in math & theory, not for the faint-hearted)...
I use scikit-learn for classification, mainly working with Naive Bayes, SVM, and neural networks. There are variants of each of them.
I see that for training, the algorithms create vectors. What do these vectors contain?
Do all algorithms consider word frequency as a feature? If yes, how do they differ?
For text classification you usually create a vector of word frequencies, or tf-idf weights, so that distances between two documents can be computed. You can use all kinds of methods to create these per-word weights.
The words (features) can be extracted by simply splitting the documents on separators, but you can use more complex methods like stemming (keeping only the root of each word).
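For example, with scikit-learn's built-in vectorizers (a minimal sketch; get_feature_names_out requires scikit-learn >= 1.0):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

# raw word counts, one column per word in the vocabulary
counts = CountVectorizer().fit_transform(docs)

# tf-idf reweights the counts so words shared by all documents matter less
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
print(tfidf.get_feature_names_out())
print(X.toarray())
```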
You will find lots of examples in the sklearn documentation. For instance:
http://scikit-learn.org/stable/auto_examples/text/document_classification_20newsgroups.html
This IPython Notebook could be a good start too.
NLopt is an optimization library that implements many different optimization algorithms and provides interfaces in several languages.
In order to use the LD_LBFGS algorithm in Julia, does the variable have to be a vector as opposed to a matrix?
If yes, when we need to optimize an objective that is a scalar function of a matrix variable, do we have to vectorize the matrix to be able to use this package?
Yes, NLopt only understands vectors of decision variables. If your code is more naturally expressed in terms of matrices, then you should convert the vector back into a matrix inside the function and derivative evaluation callbacks using reshape.
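The question is about Julia, but the pattern is the same across NLopt's bindings. A sketch with the Python interface, minimizing the Frobenius norm of a 3x2 matrix variable (the dimensions and objective are illustrative only):

```python
import nlopt
import numpy as np

m, n = 3, 2  # matrix variable dimensions (example values)

def objective(x, grad):
    X = x.reshape(m, n)            # view the flat vector as a matrix
    if grad.size > 0:
        grad[:] = (2 * X).ravel()  # gradient of ||X||_F^2, flattened back
    return float(np.sum(X**2))     # ||X||_F^2

opt = nlopt.opt(nlopt.LD_LBFGS, m * n)  # optimizer sees m*n flat variables
opt.set_min_objective(objective)
opt.set_xtol_rel(1e-8)
x_opt = opt.optimize(np.ones(m * n))
```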