If I have two very different datasets and two very different classification techniques, is there a good way to combine the two outputs? I understand an average may work, but is there a more principled way to do this? I've heard of several concepts like boosting and ensemble learning; would these be applicable?
There are two general ways to go about this problem. The first is weighted voting: each classifier votes on the prediction, with weights reflecting how much you trust it; this is the combination mechanism that boosting-style ensembles use. The main idea is to combine the strengths of both classifiers.
The second approach, called stacking, feeds the outputs of the two classifiers as features into another classifier (possibly alongside other features, e.g. the original ones) and uses the output of that final classifier as the prediction.
In the absence of further details, this is the best answer I can give.
See Bagging, boosting and stacking in machine learning on Stats.SE for more.
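To make the two approaches concrete, here is a minimal scikit-learn sketch. The base learners, the voting weights and the toy data are placeholders of my own choosing, and for simplicity both classifiers are trained on the same feature matrix rather than on two separate datasets:

```python
# Minimal sketch of voting vs. stacking with scikit-learn.
# Base learners, weights and toy data are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

base_learners = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]

# Approach 1: (weighted) voting on the predicted class probabilities.
voter = VotingClassifier(estimators=base_learners, voting="soft", weights=[1, 2])

# Approach 2: stacking -- the base classifiers' outputs become features
# for a final classifier.
stacker = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
)

for name, model in [("voting", voter), ("stacking", stacker)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```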
I have a dataset that contains around 30 features and I want to find out which features contribute the most to the outcome. I have 5 algorithms:
Neural Networks
Logistic Regression
Naive Bayes
Random Forest
Adaboost
I have read a lot about the Information Gain technique, and it seems to be independent of the machine learning algorithm used; it is more like a preprocessing technique.
My question is: is it best practice to compute feature importance separately for each algorithm, or to just use Information Gain? If the former, which techniques are used for each algorithm?
First of all, it's worth stressing that you have to perform the feature selection based on the training data only, even if it is a separate algorithm. During testing, you then select the same features from the test dataset.
Some approaches that spring to mind:
Mutual information based feature selection (eg here), independent of the classifier.
Backward or forward selection (see stackexchange question), applicable to any classifier but potentially costly since you need to train/test many models.
Regularisation techniques that are part of the classifier optimisation, eg Lasso or elastic net. The latter can be better in datasets with high collinearity.
Principal components analysis or any other dimensionality reduction technique that groups your features (example).
Some models compute latent variables which you can use for interpretation instead of the original features (e.g. Partial Least Squares or Canonical Correlation Analysis).
Specific classifiers can aid interpretability by providing extra information about the features/predictors, off the top of my head:
Logistic regression: you can obtain a p-value for every feature. In your interpretation you can focus on those that are 'significant' (eg p-value < 0.05). (The same applies to two-class Linear Discriminant Analysis.)
Random Forest: can return a variable importance index that ranks the variables from most to least important.
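As a rough sketch of two of the options above, classifier-independent mutual information (information gain) and Random Forest's built-in importance ranking; the toy data and hyperparameters are placeholders:

```python
# Sketch: classifier-independent mutual information vs. Random Forest's
# impurity-based importances.  Toy data and hyperparameters are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           random_state=0)
# As stressed above, rank features on the training split only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Option 1: mutual information (information gain), independent of the classifier.
mi = mutual_info_classif(X_train, y_train, random_state=0)
print("Top 5 features by mutual information:", np.argsort(mi)[::-1][:5])

# Option 2: Random Forest variable importance, specific to that model.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print("Top 5 features by RF importance:",
      np.argsort(rf.feature_importances_)[::-1][:5])
```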
I have a dataset that contains around 30 features and I want to find out which features contribute the most to the outcome.
This will depend on the algorithm. If you have 5 algorithms, you will likely get 5 slightly different answers, unless you perform the feature selection prior to classification (eg using mutual information). One reason is that Random Forests and neural networks would pick up nonlinear relationships while logistic regression wouldn't. Furthermore, Naive Bayes is blind to interactions.
So unless your research is explicitly about these 5 models, I would rather select one model and proceed with it.
Since your purpose is to get some intuition on what's going on, here is what you can do:
Let's start with Random Forest for simplicity, but you can do this with other algorithms too. First, you need to build a good model: good in the sense that you are satisfied with its performance, and robust, meaning that you evaluate it on a validation and/or test set. These points are very important because we will analyse how the model makes its decisions, so if the model is bad you will get bad intuitions.
After having built the model, you can analyse it at two levels: for the whole dataset (understanding your process) or for a given prediction. For this task I suggest you look at the SHAP library, which computes feature contributions (i.e. how much a feature influences the prediction of the classifier) and can be used for both purposes.
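A minimal sketch of that workflow, assuming a Random Forest and toy data of my own; the exact SHAP calls can differ between library versions:

```python
# Minimal SHAP sketch; the model, data and API details are assumptions,
# and the exact calls may differ between SHAP versions.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Build (and validate) the model first; explanations of a bad model are useless.
model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Dataset-level view: which features drive the predictions overall.
shap.summary_plot(shap_values, X_test)
```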
For detailed instructions about this process and more tools, you can look at fast.ai's excellent machine learning course, where lessons 2/3/4/5 cover this subject.
Hope it helps!
Why does the behaviour of different classifiers differ for different data?
Based on what parameters can we decide which classifier is good for a particular dataset?
For some datasets Naive Bayes gives better accuracy than an SVM classifier, and for other datasets the SVM performs better than Naive Bayes. Why is that? What is the reason?
Those are completely different classifiers. If one classifier were always better than the other, why would you need the "bad" one at all?
First Google hit about when SVMs are not the best choice:
https://www.quora.com/For-what-kind-of-classification-problems-is-SVM-a-bad-approach
There is no general answer to this question. To understand which classifier to use when, you need to understand the algorithm behind each classification procedure.
For instance, logistic regression models the log-odds of the (binary) outcome as a linear function of the features. It is generally useful when no single feature is a uniquely deciding factor but the combined, weighted contribution of the features makes the difference, for instance in text classification.
A decision tree, on the other hand, splits on the feature that gives the most information at each node. So if you have features that are individually highly predictive of the label, it makes more sense to use decision-tree-based classifiers.
SVMs work by identifying a separating hyperplane with the largest margin. They are generally useful when the data cannot be separated in the original feature space but become easy to separate after being projected (via a kernel) into a higher-dimensional space. This is a nice tutorial on SVMs: https://blog.statsbot.co/support-vector-machines-tutorial-c1618e635e93
In short, the only way to learn which classifier will be better in which situation is to understand how they work and then judge whether they fit your situation.
Another, cruder way is to try every classifier and pick the best one, but I don't think that is what you are after.
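For completeness, here is roughly what that crude approach looks like with cross-validation in scikit-learn; the models and the toy data are placeholders:

```python
# A quick (and admittedly crude) comparison of classifiers via cross-validation.
# Models and toy data are placeholders; which one wins depends on the dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "naive_bayes": GaussianNB(),
    "svm_rbf": SVC(kernel="rbf"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```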
I intend to build a yes/no classifier. The problem is that the data does not come from me, so I have to work with what I have been given. I have around 150 samples, and each sample contains 3 features, which are continuous numeric variables. I know the dataset is quite small. I would like to ask two questions:
A) What would be the best machine learning algorithm for this? An SVM? A neural network? Everything I have read seems to require a big dataset.
B) I could make the dataset a little bigger by adding some samples that do not contain all the features, only one or two. I have read that you can use sparse vectors in this case; is this possible with every machine learning algorithm? (I have seen them used with SVMs.)
Thanks a lot for your help!!!
My recommendation is to use a simple and straightforward algorithm, like a decision tree or logistic regression, although the ones you refer to should work equally well.
The dataset size shouldn't be a problem, given that you have far more samples than variables. But having more data always helps.
Naive Bayes is a good choice when there are few training examples. Compared to logistic regression, Ng and Jordan showed that Naive Bayes converges towards its optimal performance faster, i.e. with fewer training examples. (See section 4 of this book chapter.) Informally speaking, Naive Bayes models the joint probability distribution of the features and the label, and this generative approach tends to do better when training data are scarce.
Do not use a decision tree in this situation. Decision trees have a tendency to overfit, a problem that is exacerbated when you have little training data.
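A minimal sketch for a dataset of this size, with random placeholder data standing in for your 150 samples and 3 continuous features:

```python
# Gaussian Naive Bayes on a small dataset; the data are random placeholders
# standing in for the 150 samples with 3 continuous features described above.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))            # 150 samples, 3 continuous features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder yes/no labels

# With so few samples, cross-validation gives a more reliable estimate
# than a single train/test split.
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```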
I have two dependent continuous variables and I want to use their combined values to predict the value of a third, binary variable. How do I go about discretizing/categorizing the values? I am not looking for clustering algorithms; I am specifically interested in obtaining 'meaningful' discrete categories that I can subsequently use in a Bayesian classifier.
Pointers to papers, books, online courses, all very much appreciated!
That is the essence of machine learning, and one of its most studied problems.
Least-squares regression, logistic regression, SVMs and random forests are widely used for this type of problem, which is called binary classification.
If your goal is to pragmatically classify your data, several libraries are available, like scikit-learn in Python and Weka in Java. They have great documentation.
But if you want to understand the internals of machine learning, just search (here or on Google) for machine learning resources.
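For illustration, a minimal scikit-learn sketch of exactly this setup (two continuous predictors, one binary outcome), with placeholder data; note that classifiers like logistic regression take the continuous values directly, with no discretization required:

```python
# Binary classification from two continuous variables with scikit-learn;
# the data below are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))            # the two continuous variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder binary outcome

clf = LogisticRegression()
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```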
If you wanted to be a real nerd, you could generate a bunch of different possible discretizations, train a classifier on each, then characterize the discretizations by features, run a classifier on that, and see which sorts of discretizations work best!
In general, discretizing is more of an art, and it comes down to having a good understanding of what the input variable ranges mean.
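That said, if you do want explicit bins to feed a Naive Bayes classifier, one automatic (if artless) option is quantile binning. The bin count and data below are placeholder assumptions, and bin edges chosen from domain knowledge are usually more 'meaningful':

```python
# Sketch: equal-frequency (quantile) binning feeding a categorical Naive Bayes.
# Bin count and data are placeholder assumptions; domain-driven bin edges
# are usually preferable when you have them.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, KBinsDiscretizer

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))            # the two continuous predictors
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder binary outcome

pipe = make_pipeline(
    KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile"),
    FunctionTransformer(lambda Z: Z.astype(int)),  # CategoricalNB expects integer codes
    CategoricalNB(),
)
print("5-fold CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```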