Multi-Class Classification in WEKA

I am trying to implement multiclass classification in WEKA.
I have a lot of rows of, say, bank transactions, each tagged as Food, Medicine, Rent, etc. I want to build a classifier that can be trained on the data I already have and predict the class of future transactions. If I understand correctly, this is multiclass and not multilabel, since each transaction can belong to only one class.
Below are a few algorithms I am considering:
Naive Bayes
Multinomial Logistic Regression
Multiclass SVM
Max Entropy
Neural Networks (if possible)
In my data the number of features is far smaller than the number of transactions, and hence I am thinking of one-vs-rest binary classifiers instead of one-vs-one.
Are there any other algorithms I should look into that would help with my goal?
Are any of the algorithms I listed unsuitable for my goal?
Also, I found claims that scikit-learn in Python is better than WEKA, but that scikit-learn can only run on one processor. Is this true?
Answers to any question would be helpful.
Thanks!

You can look at random forests, a well-known and quite efficient classifier.
In scikit-learn, some estimators can run over several cores, RandomForestClassifier among them. Its constructor takes an n_jobs parameter that sets the number of cores to use (or, with n_jobs=-1, uses every available core). Check the documentation: any estimator whose constructor accepts n_jobs can be parallelized this way.
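A minimal sketch of what that looks like (the data here is synthetic, just to keep the example self-contained):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the transaction features and category labels.
    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=10, n_classes=4,
                               random_state=0)

    # n_jobs=-1 tells scikit-learn to use every available core;
    # any positive integer pins it to that many cores instead.
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
    clf.fit(X, y)
    print(clf.predict(X[:5]))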

Related

Why do we use metric learning when we can classify

So far I have read some highly cited metric learning papers. The general idea of such papers is to learn a mapping such that mapped data points with the same label lie close to each other and far from samples of other classes. To evaluate these techniques, they report the accuracy of a KNN classifier on the learned embedding. So my question is: if we have a labelled dataset and we are interested in increasing classification accuracy, why don't we learn a classifier on the original data points? Instead of finding a new embedding that suits a KNN classifier, we could learn a classifier that fits the (non-embedded) data points. Based on what I have read so far, the classification accuracy of such classifiers is much better than that of metric learning approaches. Is there a study showing that metric learning + KNN performs better than fitting a (good) classifier, at least on some datasets?
Metric learning models can themselves be classifiers, so I will answer the question of why we need metric learning for classification.
Consider an example. Suppose you have a dataset with millions of classes and some classes have only a handful of examples, say fewer than 5. If you use classifiers such as SVMs or ordinary CNNs, you will find them impossible to train, because such (discriminative) classifiers will essentially ignore the classes with few examples.
For metric learning models this is not a problem, since they are based on generative modelling.
The large number of classes is, by itself, also a challenge for discriminative models.
Such real-life challenges inspire us to explore better models.
As @Tengerye mentioned, you can use models trained with metric learning for classification. KNN is the simplest approach, but you can also take the embeddings of your data and train another classifier on them, be it KNN, SVM, a neural network, etc. The use of metric learning here is to transform the original input space into one that is easier for a classifier to handle.
Apart from discriminative models being hard to train when data is unbalanced, or worse, has very few examples per class, they also cannot easily be extended to new classes.
Take facial recognition, for example: if a facial recognition model is trained as a classification model, it will only work for the faces it has seen and won't work for any new face. Of course, you could add images of the new faces and retrain or fine-tune the model, but this is highly impractical. A facial recognition model trained with metric learning, on the other hand, can generate embeddings for new faces, which can simply be added to the KNN index, and your system can then identify the new person from his or her image.
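As an illustrative sketch of the embed-then-classify pattern (the embed function below is a hypothetical placeholder; a real system would use the output of a trained metric-learning model, e.g. a network trained with a triplet loss):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def embed(X):
        # Placeholder: identity mapping. A real metric-learning model
        # would map raw inputs into the learned embedding space here.
        return np.asarray(X)

    # Random stand-in data: 100 reference samples over 5 classes.
    X_train = np.random.rand(100, 16)
    y_train = np.random.randint(0, 5, 100)
    X_new = np.random.rand(3, 16)

    # Classify in the embedding space: known samples become reference
    # points, and a new sample is labelled by its nearest embedded
    # neighbours. Adding a new class is just adding new reference points.
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(embed(X_train), y_train)
    print(knn.predict(embed(X_new)))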

When to use supervised or unsupervised learning?

What are the fundamental criteria for choosing between supervised and unsupervised learning?
When is one better than the other?
Are there specific cases in which you can only use one of them?
Thanks
If you have a labeled dataset you can use both. If you have no labels, you can only use unsupervised learning.
It's not a question of "better"; it's a question of what you want to achieve. E.g. clustering data is usually unsupervised – you want the algorithm to tell you how your data is structured. Categorizing is supervised, since you need to teach your algorithm what is what in order to make predictions on unseen data.
See point 1.
On a side note: These are very broad questions. I suggest you familiarize yourself with some ML foundations.
Good podcast for example here: http://ocdevel.com/podcasts/machine-learning
Very good book / notebooks by Jake VanderPlas: http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb
It depends on your needs. If you have a set of existing data that includes the target values you wish to predict (labels), then you probably need supervised learning (e.g. is something true or false; does this data represent a fish, a cat, or a dog? Simply put, you already have examples of right answers and you are just telling the algorithm what to predict). You also need to distinguish whether you need classification or regression. Classification is when you need to sort the predicted values into given classes (e.g. is it likely that this person develops diabetes – yes or no? In other words, discrete values), and regression is when you need to predict continuous values (1, 2, 4.56, 12.99, 23, etc.). There are many supervised learning algorithms to choose from (k-nearest neighbors, naive Bayes, SVM, ridge, ...).
On the contrary, use unsupervised learning if you don't have the labels (or target values). You're simply trying to identify clusters in the data as it comes (e.g. k-means, DBSCAN, spectral clustering, ...).
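A small side-by-side sketch of the two settings, assuming scikit-learn and synthetic data:

    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    X, y = make_blobs(n_samples=300, centers=3, random_state=0)

    # Supervised: the labels y are available, so we teach a classifier.
    clf = LogisticRegression().fit(X, y)

    # Unsupervised: pretend y does not exist and let the algorithm
    # discover structure on its own.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    print(clf.predict(X[:5]))   # predicted labels
    print(km.labels_[:5])       # discovered cluster assignments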
So it depends and there's no exact answer, but generally speaking you need to:
Collect and inspect your data. You need to know your data, and only then decide which way to go or which algorithm will best suit your needs.
Train your algorithm. Be sure to have clean, good data. In the case of unsupervised learning you can skip this step, as you don't have target values; you test your algorithm right away.
Test your algorithm. Run it and see how it behaves. In the case of supervised learning you can hold out some of your labeled data to evaluate how well your algorithm is doing.
There are many books online about machine learning and many online lectures on the topic as well.
It depends on the dataset you have.
If you have the target feature at hand, then you should go for supervised learning. If you don't, it is an unsupervised problem.
Supervised learning is like teaching the model with examples. Unsupervised learning is mainly used to group similar data, and it plays a major role in feature engineering.
Thank you.

Collecting Machine learning training data

I am very new to machine learning and need a couple of things clarified. I am trying to predict the probability of someone liking an activity based on their Facebook likes. I am using the Naive Bayes classifier, but am unsure about a couple of things:
1. What would my labels/inputs be?
2. What info do I need to collect for training data? My guess is to create a survey with questions on whether the person would enjoy an activity (on a scale from 1-10).
In supervised classification, every classifier needs to be trained with known, labeled data; this data is known as the training data. Each row of your data should be a vector of features followed by a special attribute called the class – in your problem, whether the person enjoyed the activity or not.
Once you have trained the classifier, you should test its behavior on another dataset so the evaluation is not biased. This dataset must have the same class attribute as the training data. If you train and test on the same dataset, your classifier's predictions may look really nice, but the estimate is unfair.
I suggest you take a look at evaluation techniques like k-fold cross-validation.
Another thing you should know is that the common Naive Bayes classifier is used to predict binary data, so your class should be 0 or 1, meaning that the surveyed person either enjoyed the activity or did not. It is implemented in packages like Weka (Java) and scikit-learn (Python).
If you are really interested in Bayesian classifiers, I should say that plain Naive Bayes is in fact not the best choice for binary classification, because Minsky showed in 1961 that its decision boundaries are hyperplanes. Its Brier score is also rather bad, and it is said that this classifier is not well calibrated. Still, it makes good predictions after all.
Hope it helps.
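For instance, a minimal scikit-learn sketch of a Naive Bayes classifier evaluated with 5-fold cross-validation (the data is synthetic, standing in for your survey features and the enjoyed/not-enjoyed class):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import BernoulliNB

    # Synthetic stand-in: 200 people, 10 features, binary class (0 or 1).
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    # 5-fold cross-validation: train on 4 folds, test on the held-out
    # fold, repeated so every fold serves as the test set once.
    scores = cross_val_score(BernoulliNB(), X, y, cv=5)
    print(scores.mean())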
This may be fairly difficult with Naive Bayes. You'll need to collect (or calculate) samples of whether or not a person likes activity X, and also details on their Facebook likes (organized in some consistent way).
Basically, for Naive Bayes, your training data should be the same data type as your testing data.
The survey approach may work, if you have access to each person's Facebook like history.

Gaussian process multi-class classification

Is it possible to use GPML (http://www.gaussianprocess.org/gpml/) toolbox for multi-class classification? (The dataset that I am using has about 7000 training samples from 3 different classes).
You can always use a binary classifier for multi-class classification. Notably, making all-pair comparisons and taking a simple majority vote gives a very reliable answer. It does, however, need O(n²) applications of the binary classifier for a problem with n classes – but that's not an issue for 3 classes.
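GPML itself is a MATLAB toolbox, but as an illustration of the all-pairs voting scheme, here is a minimal scikit-learn sketch using a generic binary classifier (an SVM) in place of a GP classifier:

    from sklearn.datasets import make_classification
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.svm import SVC

    # 3 classes -> n*(n-1)/2 = 3 pairwise binary classifiers.
    X, y = make_classification(n_samples=300, n_features=10,
                               n_informative=6, n_classes=3,
                               random_state=0)

    # OneVsOneClassifier trains one binary classifier per pair of classes
    # and predicts by majority vote over all pairwise decisions.
    ovo = OneVsOneClassifier(SVC(kernel='rbf'))
    ovo.fit(X, y)
    print(len(ovo.estimators_))  # 3 pairwise classifiers
    print(ovo.predict(X[:5]))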

How to categorize continuous data?

I have two dependent continuous variables and I want to use their combined values to predict the value of a third, binary variable. How do I go about discretizing/categorizing the values? I am not looking for clustering algorithms; I'm specifically interested in obtaining 'meaningful' discrete categories I can subsequently use in a Bayesian classifier.
Pointers to papers, books, online courses, all very much appreciated!
That is the essence of machine learning, and one of its most studied problems.
Least-squares regression, logistic regression, SVMs, and random forests are widely used for this type of problem, which is called binary classification.
If your goal is simply to classify your data in practice, several libraries are available, like scikit-learn in Python and WEKA in Java. Both have great documentation.
But if you want to understand the internals of machine learning, just search (here or on Google) for machine learning resources.
If you wanted to be a real nerd, you could generate a bunch of different candidate discretizations, train a classifier on each, then characterize the discretizations by features, run a classifier on those, and see which kinds of discretization work best.
In general, discretizing is more of an art, relying on a good understanding of what the input variable ranges mean.
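As one hypothetical starting point, here is a sketch that bins two continuous variables and feeds the resulting categories to a Naive Bayes classifier, assuming scikit-learn and synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import CategoricalNB
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import KBinsDiscretizer

    # Two continuous predictors and a binary target, as in the question.
    X, y = make_classification(n_samples=300, n_features=2,
                               n_informative=2, n_redundant=0,
                               random_state=0)

    # Bin each variable into 4 ordinal categories ('quantile' puts equal
    # numbers of samples in each bin; 'uniform' uses equal-width bins).
    pipe = make_pipeline(
        KBinsDiscretizer(n_bins=4, encode='ordinal', strategy='quantile'),
        CategoricalNB(),
    )
    pipe.fit(X, y)
    print(pipe.score(X, y))

Whether such automatic bins are 'meaningful' in your domain is exactly the art referred to above; domain-motivated cut points often beat purely data-driven ones.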
