I am using LIBLINEAR and need to know whether multi-label prediction is possible on Windows. I searched Google but had no luck.
I want the output to be produced in the following way:
I train on some 10 documents with three class labels (1, 2, 3). When I feed a test document to the classifier and the document belongs to labels 1 and 2, it should output 1,2 or something similar that shows the document belongs to both class labels.
I would like an example on Windows.
Thanks
By default, neither LIBSVM nor LIBLINEAR supports multi-label classification.
You need to use a one-against-all approach: train one binary classifier per label and report every label whose classifier fires.
You can find help on the LIBSVM page, which provides different tools for multi-label classification that are based on LIBSVM or LIBLINEAR.
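To make the one-against-all idea concrete, here is a minimal sketch in plain Python. It does not use LIBLINEAR: a tiny perceptron stands in for the binary classifier, and the toy documents, the label numbers, and every function name are assumptions made up for illustration.

```python
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def train_binary_perceptron(docs, targets, epochs=20):
    """Train one binary perceptron; targets are +1 or -1."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for doc, target in zip(docs, targets):
            words = tokenize(doc)
            score = bias + sum(weights[w] for w in words)
            if target * score <= 0:  # wrong (or undecided): move toward the target
                for w in words:
                    weights[w] += target
                bias += target
    return weights, bias

def train_one_vs_rest(docs, label_sets):
    """One binary model per label: 'has this label' vs. 'does not'."""
    models = {}
    for label in sorted(set().union(*label_sets)):
        targets = [1 if label in labels else -1 for labels in label_sets]
        models[label] = train_binary_perceptron(docs, targets)
    return models

def predict_labels(models, doc):
    """Return every label whose binary model gives a positive score."""
    words = tokenize(doc)
    return [label for label, (weights, bias) in models.items()
            if bias + sum(weights.get(w, 0.0) for w in words) > 0]

# Hypothetical toy data: 1 = sports, 2 = politics, 3 = finance.
docs = ["goal match team", "vote election party",
        "stock market price", "goal match vote election"]
label_sets = [{1}, {2}, {3}, {1, 2}]

models = train_one_vs_rest(docs, label_sets)
print(predict_labels(models, "team goal party vote"))  # [1, 2]
print(predict_labels(models, "stock price"))           # [3]
```

With LIBLINEAR itself the same scheme would mean training one binary model per label and merging the predictions, which is roughly what the multi-label tools on the LIBSVM page automate.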
Related
Is there any way to predict a single-label output using multi-label features?
I am now working with a document type prediction model.
Each document has at least one label and 7 different labels are used in labelling the data.
Given a series of documents, I am trying to predict the label for the current document based on labels of the previous documents.
I'd say this problem is multi-class classification with multi-label features, as I'm trying to make the machine give only one possible label for an unknown input.
I've tried both multi-class and multi-label classification in scikit-learn. My impression is that we can only perform multi-label classification with multi-labelled data. Are there any scikit-learn classifiers that can do multi-label --> single-label predictions? If not, are there any other ways to do so?
You should try Simple Transformers. Here is a link where you can explore the different models for multi-class and multi-label classification:
https://simpletransformers.ai/docs/usage/#configuring-a-simple-transformers-model
I am new to machine learning. I am working on a project where machine learning needs to be applied.
Problem statement:
I have a large number (say 3000) of keywords. These need to be classified into seven fixed categories. Each category has training data (sample keywords). I need to come up with an algorithm that, when a new keyword is passed to it, predicts which category the keyword belongs to.
I am not aware of which text classification technique should be applied for this. Are there any tools that can be used?
Please help.
Thanks in advance.
This falls under linear classification. You can use a naive Bayes classifier for this. Most ML frameworks have an implementation of naive Bayes, e.g. Mahout.
Yes, I would also suggest using naive Bayes, which is more or less the baseline classification algorithm here. On the other hand, there are obviously many other algorithms. Random forests and support vector machines come to mind. See http://machinelearningmastery.com/use-random-forest-testing-179-classifiers-121-datasets/ If you use a standard toolkit, such as Weka, RapidMiner, etc., these algorithms should be available. There is also OpenNLP for Java, which comes with a maximum entropy classifier.
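For a feel of what such a baseline does, here is a rough sketch of a multinomial naive Bayes classifier with add-one smoothing, written in plain Python rather than with any of the toolkits named above. The sample keywords, the category names, and the function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (keyword text, category) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in samples:
        class_counts[category] += 1
        for word in text.lower().split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def predict_category(model, text):
    """Return the category with the highest log posterior."""
    class_counts, word_counts, vocabulary = model
    total_samples = sum(class_counts.values())
    best_category, best_log_prob = None, -math.inf
    for category, count in class_counts.items():
        log_prob = math.log(count / total_samples)  # class prior
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                # add-one (Laplace) smoothing avoids zero probabilities
                log_prob += math.log((word_counts[category][word] + 1) / denominator)
        if log_prob > best_log_prob:
            best_category, best_log_prob = category, log_prob
    return best_category

# Hypothetical training keywords for three of the categories.
samples = [
    ("cheap flight", "travel"), ("hotel booking", "travel"),
    ("gaming laptop", "electronics"), ("wireless headphones", "electronics"),
    ("chicken recipe", "food"), ("vegan dessert", "food"),
]
model = train_naive_bayes(samples)
print(predict_category(model, "cheap hotel"))        # travel
print(predict_category(model, "laptop headphones"))  # electronics
```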
You could use the Word2Vec cosine distance between the descriptions of each of your categories and the keywords in the dataset, and then simply match each keyword to the category with the closest distance.
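The matching step could be sketched like this. The three-dimensional vectors below are made-up stand-ins for real Word2Vec embeddings (which would come from a trained model and have hundreds of dimensions), and the category and keyword names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumed toy embeddings; in practice these come from a Word2Vec model.
category_vectors = {
    "sports":  [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.1],
    "weather": [0.0, 0.2, 0.9],
}
keyword_vectors = {
    "goalkeeper": [0.8, 0.2, 0.1],
    "dividend":   [0.2, 0.7, 0.0],
    "rainfall":   [0.1, 0.1, 0.8],
}

def closest_category(vector):
    """Pick the category with the highest cosine similarity (smallest distance)."""
    return max(category_vectors,
               key=lambda name: cosine_similarity(vector, category_vectors[name]))

for keyword, vector in keyword_vectors.items():
    print(keyword, "->", closest_category(vector))
```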
Alternatively, you could create a training dataset from keywords that are already matched to a category and use any ML classifier, for example one based on artificial neural networks, using the vector of each keyword's cosine distances to every category as the input to your model. But it could require a large quantity of training data to reach good accuracy. For example, the MNIST dataset contains 70000 samples, and it allowed me to reach 99.62% cross-validation accuracy with a simple CNN; for another dataset with only 2000 samples I was able to reach only about 90% accuracy.
There are many classification algorithms. Your example looks like a text classification problem; some good classifiers to try would be SVM and naive Bayes. For SVM, the LIBLINEAR and LibShortText classifiers are good options (and have been used in many industrial applications):
liblinear: https://www.csie.ntu.edu.tw/~cjlin/liblinear/
libshorttext: https://www.csie.ntu.edu.tw/~cjlin/libshorttext/
They are also included with ML tools such as scikit-learn and WEKA.
With classifiers, it still takes some work to build and validate a practically useful classifier. One of the challenges is to mix discrete (boolean and enumerable) and continuous ('numbers') predictive variables seamlessly. Some algorithmic preprocessing is generally necessary.
Neural networks do offer the possibility of using both types of variables. However, they require skilled data scientists to yield good results. A straightforward option is to use an online classifier web service like Insight Classifiers to build and validate a classifier in one go. It uses N-fold cross-validation.
You can represent the presence or absence of each word in a separate column. The outcome variable is the desired category.
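As a small illustration of that encoding, the sketch below turns each keyword into a row of 0/1 columns, one column per word in the vocabulary. The training keywords and category names are hypothetical.

```python
def build_vocabulary(keywords):
    """Sorted list of every distinct word; each word becomes one column."""
    return sorted({word for keyword in keywords for word in keyword.lower().split()})

def to_presence_row(keyword, vocabulary):
    """1 if the word occurs in the keyword, 0 otherwise."""
    words = set(keyword.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

# Hypothetical training data: (keyword, desired category).
training = [
    ("cheap flight tickets", "travel"),
    ("flight delay", "travel"),
    ("cheap laptop", "electronics"),
]

vocabulary = build_vocabulary(text for text, _ in training)
print(vocabulary)  # ['cheap', 'delay', 'flight', 'laptop', 'tickets']
for text, category in training:
    print(to_presence_row(text, vocabulary), category)
```

The resulting rows plus the category column are the kind of table that most classifiers accept as input.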
I am a beginner with SVM and have successfully implemented one-class classification. Now I want to learn about multi-class classification, which I am very confused about.
I went through "How to do multi class classification using Support Vector Machines (SVM)", and I want exactly the same output, but that link does not have a specific example on Windows. Can anyone help me out with a Windows example for both the "one-against-one" and "one-against-all" methods of multi-class classification?
Thanks
With LIBLINEAR you will not be able to get similar output from its SVM solvers, because it only produces probability estimates for its logistic regression solvers. You should use LIBSVM for that.
LibLinear does not support multi-class classification by default, but you can download this tool from the official site and it can do the job.
If you want multi-class probability estimates, you can take a look at this tool.
I think the other answer is talking about multi-label classification or something, because LIBLINEAR does support multi-class classification by default (choose -s to be between 0 and 7; there are 8 different modes: https://github.com/cjlin1/liblinear/blob/master/README).
For the input, you can use the same format as for binary classification, but set the label to the index of the class (between 0 and (# of classes - 1)) instead of ±1.
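As a rough sketch of what such input looks like, the snippet below formats a few samples in the sparse `label index:value` layout that LIBLINEAR and LIBSVM read, with class indices as labels. The sample values and the helper name are made up for illustration.

```python
def to_liblinear_line(label, features):
    """Format one sample as '<label> <index>:<value> ...'.

    features maps 1-based feature indices to values; indices are written
    in ascending order and zero values are left out (sparse format).
    """
    pairs = " ".join(f"{index}:{value:g}"
                     for index, value in sorted(features.items())
                     if value != 0)
    return f"{label} {pairs}".rstrip()

# Hypothetical three-class data; labels are class indices 0, 1, 2.
samples = [
    (0, {1: 0.5, 3: 1.0}),
    (1, {2: 2.0}),
    (2, {1: 1.0, 4: 0.25}),
]

lines = [to_liblinear_line(label, features) for label, features in samples]
print("\n".join(lines))
# 0 1:0.5 3:1
# 1 2:2
# 2 1:1 4:0.25
```

Saved to a file such as train.txt, this could then be passed to the command-line tools, for example `train -s 4 train.txt` followed by `predict test.txt train.txt.model output.txt`. On Windows the same commands should apply to the train and predict executables; check the README for where the Windows binaries are located.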
I am using Weka and applying J48 to build my classifier. I have 40 features with 2000 instances (700 class a and 1300 class b).
The J48 decision tree is using just 2 features out of 40! Is there any way to make J48 use all features, or is there any other algorithm that uses all features?
Thanks in advance.
Maybe it is because J48 does not need more attributes.
You can check the features' correlation in the Select attributes tab by running the selector with Ranker as the search method and Principal Components as the evaluator. It will show you the relations between each feature and each class, and it will also tell you which features best describe your classes.
Not all 40 features are necessarily needed for the classification, because some features might be redundant (e.g. correlated) or might not contain discriminatory information.
You can run feature selection beforehand from the Select attributes tab in Weka Explorer and see which features are important.
You can also test classifiers such as SVM (libSVM or SMO), neural networks (MultilayerPerceptron) and/or Random Forest, as they tend to give the best classification results in general (problem dependent).
I am a newbie in NLP, just doing it for the first time.
I am trying to solve a problem.
My problem is I have some documents which are manually tagged like:
doc1 - categoryA, categoryB
doc2 - categoryA, categoryC
doc3 - categoryE, categoryF, categoryG
...
docN - categoryX
Here I have a fixed set of categories and any document can have any number of tags associated with it.
I want to train the classifier using this input, so that this tagging process can be automated.
Thanks
What you are trying to do is called multi-way supervised text categorization (or classification). Knowing the right question to ask is half the problem.
As for how this can be done, here are two references:
RCV1: A New Benchmark Collection for Text Categorization Research
Improved Nearest Neighbor Methods For Text Classification With Language Modeling and Harmonic Functions
Most classifiers work on the bag-of-words model. There are several things to try to get the expected result.
Try the most general multinomial naive Bayes classifier with different input parameters and check the results.
Try the naive Bayes variants (http://scikit-learn.org/0.11/modules/naive_bayes.html).
You can try a sentence classifier that also considers sentence structure. Using n-grams, you can try 2-, 3-, 4-, and 5-gram models and check how the results vary. CountVectorizer allows n-grams; see this link for an example: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
Depending on the dataset's features, no single classifier is guaranteed to be best for your scenario; you have to try different options and see which fits best.
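To show what the n-gram features look like, here is a minimal plain-Python sketch that extracts word n-grams from a text. It is not the scikit-learn vectorizer, only an illustration of the idea; the function name and parameters are made up.

```python
def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams with n_min <= n <= n_max, shortest first."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams

print(word_ngrams("machine learning is fun"))
# ['machine', 'learning', 'is', 'fun', 'machine learning', 'learning is', 'is fun']
```

Raising n_max to 3, 4 or 5 gives the longer n-gram models; the feature count grows quickly, so it is worth checking whether accuracy actually improves.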
The simplest initial approach is to get started with a simple classifier using scikit-learn:
Treat each category as a training class and train the classifier on these classes.
For any input docX, classify it with the trained model.
You will get a probability for each category.
Now apply a threshold, such as the probability difference among the three highest-scoring categories; the categories that fall within the threshold are taken as the result for that input document.
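The thresholding step could be sketched as follows. The probabilities are hard-coded stand-ins for whatever the trained classifier returns, and the gap of 0.15 and the function name are arbitrary assumptions that would need tuning on real data.

```python
def select_labels(probabilities, max_gap=0.15, top_k=3):
    """Keep the top_k categories whose probability is within max_gap of the best one."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)[:top_k]
    best_score = ranked[0][1]
    return [label for label, score in ranked if best_score - score <= max_gap]

# Hypothetical classifier output for one document.
probabilities = {"categoryA": 0.42, "categoryB": 0.35, "categoryC": 0.15, "categoryD": 0.08}
print(select_labels(probabilities))  # ['categoryA', 'categoryB']
```

A document with one clearly dominant category then gets a single tag, while a document with several close scores gets several.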
It's not clear what you have tried or what programming language you are using, but as most have suggested, try text classification with document vectors or bag of words (as long as there are words in the documents that can help with classification).
Here are some simple tools that can help get you started:
Weka http://www.cs.waikato.ac.nz/ml/weka/ (GUI & Java)
NLTK http://www.nltk.org (Python)
Mallet http://mallet.cs.umass.edu/ (command line & Java)
NUML http://numl.net/ (C#)