How to merge different datasets for Classification Problem in deep learning - machine-learning

If I have two very different types of datasets and two very different classification techniques where output class labels of both data sources are same, is there a good way to combine the two outputs? I've heard of several concepts like boosting and ensemble learning, would these be applicable?
I have heard of ensemble learning and multimodal approaches.

Related

feature extraction, selection, and classification concepts

I know that support vector machine, random tree forest and logistic regression are famous machine learning (ML)algorithms for classification.
I'm confused the terminology between a feature extraction, selection and classification.
Does the above ML algorithms are used for extracting features not part of selecting?
Does the ML algorithms include both process of feature extraction and classification?
Does the result of training the ML algorithm (accuracy, specificity, sensitivity..) tell us the result of classifying a disease after the feature extraction?
Regarding your confusion about the 3 terminologies,
Feature extraction: When you want to create new features out of raw data (say you have the transaction_day column but you are only interested in the month, so you create a new column "transaction_month" out of "transaction_day")
Feature selection: You have many features but want to select only the important ones (how many of them is another topic to be studied). This could speed up the process of learning and with the right strategy, you would not sacrifice accuracy in many applications.
Classification: Is a family of supervised (labeled) machine learning that your goal is to assign observations to known classes (for example emails to spam or normal class)
Note: Some of machine learning algorithms like "Lasso" have build-in feature selection but for others, large coefficient of the feature after training usually shows the importance of the feature (read more about recursive feature elimination (rfe))
you may also find a good discussion in this post.

Why do we use metric learning when we can classify

So far, I have read some highly cited metric learning papers. The general idea of such papers is to learn a mapping such that mapped data points with same label lie close to each other and far from samples of other classes. To evaluate such techniques they report the accuracy of the KNN classifier on the generated embedding. So my question is if we have a labelled dataset and we are interested in increasing the accuracy of classification task, why do not we learn a classifier on the original datapoints. I mean instead of finding a new embedding which suites KNN classifier, we can learn a classifier that fits the (not embedded) datapoints. Based on what I have read so far the classification accuracy of such classifiers is much better than metric learning approaches. Is there a study that shows metric learning+KNN performs better than fitting a (good) classifier at least on some datasets?
Metric learning models CAN BE classifiers. So I will answer the question that why do we need metric learning for classification.
Let me give you an example. When you have a dataset of millions of classes and some classes have only limited examples, let's say less than 5. If you use classifiers such as SVMs or normal CNNs, you will find it impossible to train because those classifiers (discriminative models) will totally ignore the classes of few examples.
But for the metric learning models, it is not a problem since they are based on generative models.
By the way, the large number of classes is a challenge for discriminative models itself.
The real-life challenge inspires us to explore more better models.
As #Tengerye mentioned, you can use models trained using metric learning for classification. KNN is the simplest approach but you can take the embeddings of your data and train another classifier, be it KNN, SVM, Neural Network, etc. The use of metric learning, in this case, would be to change the original input space to another one which would be easier for a classifier to handle.
Apart from discriminative models being hard to train when data is unbalanced, or even worse, have very few examples per class, they cannot be easily extended for new classes.
Take for example facial recognition, if facial recognition models are trained as classification models, these models would only work for the faces it has seen and wouldn't work for any new face. Of course, you could add images for the faces you wish to add and retrain the model or fine-tune the model if possible, but this is highly impractical. On the other hand, facial recognition models trained using metric learning can generate embeddings for new faces, which can be easily added to the KNN and your system then can identify the new person given his/her image.

How to combine two machine learning algorithm outputs?

If I have two very different datasets and two very different classification techniques, is there a good way to combine the two outputs? I understand an average may work but is there a more relevant way to do this? I've heard of several concepts like boosting and ensemble learning, would these be applicable?
There are two general ways to go about this problem. The first, called boosting, uses weighted voting to decide on the prediction. The main idea is to combine advantages of both classifiers.
The second approach, called stacking, uses the outputs of the two classifiers as features into another classifier (possibly with other features, e.g. the original ones), and use the output of the final classifier for the prediction.
In the absence of further details, this is the best answer I can give.
See Bagging, boosting and stacking in machine learning on Stats.SE for more.

Is it possible to use a support vector machine in combination with agglomerative clusterer?

Is it possible to use support vector machine in combination with a clustering algorithm somehow? What is a sample use-case where both of them need to communicate with each other?
You can always use clustering to partition your data set and learn multiple classifiers, then use ensemble methods to combine the classification results.
If a class consists of multiple clusters, this can improve accuracy to learn both sub-classes and merge them after classification.

How to categorize continuous data?

I have two dependent continuous variables and i want to use their combined values to predict the value of a third binary variable. How do i go about discretizing/categorizing the values? I am not looking for clustering algorithms, i'm specifically interested in obtaining 'meaningful' discrete categories i can subsequently use in in a Bayesian classifier.
Pointers to papers, books, online courses, all very much appreciated!
That is the essence of machine learning and problem one of the most studied problem.
Least-square regression, logistic regression, SVM, random forest are widely used for this type of problem, which is called binary classification.
If your goal is to pragmatically classify your data, several libraries are available, like Scikits-learn in python and weka in java. They have a great documentation.
But if you want to understand what's the intrinsics of machine learning, just search (here or on google) for machine learning resources.
If you wanted to be a real nerd, generate a bunch of different possible discretizations and then train a classifier on it, and then characterize the discretizations by features and then run a classifier on that, and see what sort of discretizations are best!?
In general discretizing stuff is more of an art and having a good understanding of what the input variable ranges mean.

Resources