The instances in my dataset have multiple numeric attributes and a binary class. In Weka is there a way to use a clusterer and pass the result to a classifier (say SMO) to improve the results of classification?
One way that you could add cluster information to your data is the following method (in Weka Explorer):
Load your Favourite Dataset
Choose your Cluster Model (In my case, I used SimpleKMeans)
Modify the Parameters of the Clusterer as Required
Use the Training Set for the Cluster Model
Start the Clustering Process
Once the Clusters have been generated, Right-Click on the Result List and select 'Visualize Cluster Assignments'
Select Y to be the Cluster, then hit the Save button.
Save the Data to a nominated location.
You should then be able to load this file and use the cluster information in your classifier just like any other attribute. Just make sure that the Class is set to the right attribute and you should be right to go.
NOTE: When I ran these tests I used J48 to evaluate the class, and it seemed that J48 used only the values of the clusters to estimate the class. The accuracy of the model was also surprisingly high, so either the dataset was too simple or I may have missed a step somewhere in the clustering process.
Hope this Helps!
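For readers who prefer to script this outside the GUI, here is a minimal sketch of the same idea in scikit-learn (used purely as a stand-in: KMeans plays the role of SimpleKMeans, SVC the role of SMO, and the dataset is synthetic). Note that, like the GUI recipe above, clustering the full dataset before cross-validation leaks a little information into the evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for "your favourite dataset".
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Cluster the instances and append the cluster id as a new attribute,
# mirroring the "Visualize Cluster Assignments -> Save" step above.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])

# Classify with and without the cluster attribute and compare.
base = cross_val_score(SVC(), X, y, cv=5).mean()
aug = cross_val_score(SVC(), X_aug, y, cv=5).mean()
print(f"without cluster attribute: {base:.3f}, with: {aug:.3f}")
```

Whether the extra attribute actually helps depends on how well the cluster structure lines up with the class boundary.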
In Weka Explorer, after loading your dataset:
choose the Preprocess tab,
click the "Choose..." button,
add the unsupervised attribute filter "AddCluster",
click the text field next to the button to open the filter's configuration, choose a clusterer,
configure/parameterize the clusterer,
close all modal dialog boxes,
Click the "Apply" button to apply the filter. It will add another attribute called "cluster" as the rightmost one in your attribute list.
Then continue with your classification experiments.
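The AddCluster filter is essentially a transformation step that runs before the classifier. A hedged sketch of that pattern in scikit-learn (again only an analogue, not Weka itself): a custom transformer that appends the cluster assignment as the rightmost column, wired into a pipeline so the clusterer is re-fit on each training fold and no test data leaks into it.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

class AddClusterFeature(BaseEstimator, TransformerMixin):
    """Append the cluster assignment as a new rightmost column,
    loosely analogous to Weka's AddCluster filter."""
    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters

    def fit(self, X, y=None):
        # Fit the clusterer on the (training) data only.
        self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10,
                              random_state=0).fit(X)
        return self

    def transform(self, X):
        return np.column_stack([X, self.kmeans_.predict(X)])

X, y = make_classification(n_samples=150, n_features=4, random_state=0)
model = make_pipeline(AddClusterFeature(n_clusters=3), SVC())
score = cross_val_score(model, X, y, cv=5).mean()
print(f"cross-validated accuracy with cluster attribute: {score:.3f}")
```

Putting the filter inside the pipeline is the scripted counterpart of Weka's FilteredClassifier idea: the transformation is learned fold by fold rather than once on the whole dataset.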
I want to implement ML for the menu in an ERP application: the menu order should change based on the user's behaviour.
I have data in below format:
Sn  Role   Time  MenuID
1   admin  1830  menu1
2   admin  1900  menu2
...
What I want is: based on the current user role and time, the model should predict which MenuID to put first.
Should I treat it as supervised learning, since the data can be labelled? And is it a regression problem, because I expect the output to be a number (the menu order)?
I read a lot of tutorials but I can't decide where to start. I don't need any code, I just want a starting point.
I believe you could use a basic regression approach (in practice, multinomial logistic regression) where your input features would be the user role and the time, and the target variable would be the menu they are most likely to use at that time. You'd need to convert the categorical data into one-hot encoded data. If you then apply a softmax function to the outputs, you get individual probabilities for each menu and can arrange them accordingly.
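As a concrete starting point, here is a small sketch of that suggestion in scikit-learn. The roles, hours, and menu IDs are invented for illustration; `predict_proba` on a logistic-regression model plays the role of the softmax, and sorting by probability gives the menu order.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the Sn/Role/Time/MenuID log described above.
data = pd.DataFrame({
    "role": ["admin", "admin", "clerk", "admin", "clerk", "clerk"],
    "hour": [18, 19, 9, 18, 9, 10],
    "menu": ["menu1", "menu2", "menu1", "menu1", "menu3", "menu3"],
})

pre = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["role"]),  # one-hot the role
    remainder="passthrough",                             # keep hour numeric
)
model = make_pipeline(pre, LogisticRegression(max_iter=1000))
model.fit(data[["role", "hour"]], data["menu"])

# Probability of each menu for a given role/time; sort to get menu order.
probs = model.predict_proba(pd.DataFrame({"role": ["admin"], "hour": [18]}))[0]
ranking = [m for _, m in sorted(zip(probs, model.classes_), reverse=True)]
print(ranking)
```

Framing it as multiclass classification over MenuID (rather than regressing on a position number) is what lets you rank every menu by probability instead of predicting a single slot.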
Is there any way, after training an SVM model, to extract the value the SVM has assigned to each instance when classifying it into the positive or negative class? I am looking for a way to get all these SVM-assigned values for each instance in the WEKA tool.
I have been using the LibSVM and LibLinear classifiers under SVM. I need those values for ranking.
Click Preprocess ... Filter ... "Choose" button,
then select the filter weka / filters / supervised / attribute /
"AddClassification".
In its configuration Dialog, set "OutputClassification" to "True"
Click on the "LibSVM" label to invoke the second dialog box. Configure the Classifier.
Click apply.
A new column "Classification" will be added to your dataset - but this won't perform cross-validation on your dataset. It will use the entire dataset as training dataset and thus will lead to overfitting.
Alternative (for getting predictions on cross-validated output): you can also go to the "Classify" tab, click the "More Options..." button, set "Output predictions" to "PlainText", and the predictions will show up in the big "Classifier Output" text panel.
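If you are willing to script this, the same per-instance values can be obtained outside Weka; a hedged sketch in scikit-learn (SVC with a linear kernel standing in for LibSVM/LibLinear): `decision_function` returns each instance's signed distance from the separating hyperplane, and `cross_val_predict` gives those values on held-out folds, matching the cross-validated alternative above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

# Synthetic binary-class data standing in for the real dataset.
X, y = make_classification(n_samples=100, random_state=0)

# Cross-validated decision values: each instance is scored by a model
# that was trained without it.
scores = cross_val_predict(SVC(kernel="linear"), X, y, cv=5,
                           method="decision_function")

# Rank instances by how confidently they fall in the positive class.
order = np.argsort(-scores)
print(scores[:5])
print(order[:5])
```

The sign of each value gives the predicted class and the magnitude gives the confidence, which is exactly what you need for ranking.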
I am using GUI version of WEKA and I am classifying using the Random Forest. I'm trying to find out which instances are misclassified.
I know that earlier versions of WEKA had an option "Output additional attributes" where I could add an instance ID and get around this problem, but now with WEKA 3.8 I can't see this option.
Answering my own question: on the Preprocess tab you need to use the filter AddID, or you can add your own attribute as a string. Then use the meta-classifier "FilteredClassifier", click on it, set its filter to "Remove" specifying the index of the attribute that holds the ID, and then start the classification.
To see the misclassified instances, right-click on the result in the Result List, choose Visualize classifier errors, then choose save as ARFF. The saved file will contain all the instances together with their predictions.
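The same workflow (keep an ID outside the feature set, predict under cross-validation, then list the IDs of the errors) can be sketched in scikit-learn; this is only an analogue of the Weka recipe, with a random forest on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=100, random_state=0)
ids = np.arange(len(y))  # the "AddID" attribute, deliberately kept out of X

# Cross-validated predictions, so every instance is predicted by a model
# that never saw it during training.
pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=10)

misclassified = ids[pred != y]
print(f"{len(misclassified)} misclassified instance ids:", misclassified)
```

Keeping the ID out of the feature matrix is the scripted equivalent of wrapping the classifier in FilteredClassifier with a Remove filter: the model never trains on the ID, but you can still trace every error back to its instance.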
I am dealing with text classification in RapidMiner. I have separate test and training splits. I applied Information Gain to a dataset using n-fold cross-validation, but I am confused about how to apply it to the separate test set.
In my process I have connected the word list output from the first "Process Documents From Files" operator (used for training) to the second "Process Documents From Files" operator (used for testing). But I want to apply the reduced feature set to the second "Process Documents From Files", which presumably should come from the "Select By Weights" (reduced dimensions) operator; however, that operator returns weights, which I cannot provide to the second "Process Documents From Files". I searched a lot but didn't manage to find anything that satisfies my need.
Is it really possible in RapidMiner to have separate test/train splits and apply feature selection?
Is there any way to convert these weights into a word list? Please don't say "write to the repository" (I can't do this).
In such a scenario, when I have different test/train splits and need to apply feature selection, how do I make sure that the test/train splits have the same dimension vectors?
I am really stuck on this, kindly help.
Immediately after the lower Process Documents operator, insert a new Select By Weights operator before the Apply Model operator. Use a Multiply operator to copy the weights from the Weight By Information Gain operator and connect them to the input of the new Select By Weights operator.
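The fix above boils down to a general principle: learn the feature weights/selection on the training split only, then apply that same selection to the test split. A hedged sketch of the principle in scikit-learn (mutual information used as an information-gain stand-in, synthetic data in place of the document vectors):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the selector on the TRAINING split only (the Weight By Information
# Gain step), then reuse it on both splits (the two Select By Weights).
selector = SelectKBest(mutual_info_classif, k=5).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)  # reduced training split
X_test_sel = selector.transform(X_test)    # the SAME columns on the test split

print(X_train_sel.shape, X_test_sel.shape)
```

Because both splits go through the one fitted selector, they are guaranteed to end up with identical dimension vectors, which is exactly what the duplicated Select By Weights operator achieves in the RapidMiner process.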
For Weka Explorer (GUI), when we do a 10-fold CV for any given ARFF file, then what Weka Explorer provides (as far as I can see) is the average result for all the 10 folds.
Q. Is there any way to get the results of each fold? For instance, I need the error rates (incorrectly identified instances) for each fold.
Help appreciated.
I think this is possible using Weka's GUI. You need to use the Experimenter though instead of the Explorer. Here are the steps:
Open the Experimenter from the GUI Chooser
Create a new experiment (New button at the top-right)
[optional] Enter a filename and location in the Results Destination to save the results to
Set the Number of (cross-validation) folds to your liking (start experimenting with 2 folds for easy results)
Add your dataset (if your dataset needs preprocessing then you should do this in the Explorer first and then save the preprocessed dataset)
Set the Number of repetitions (I recommend 1 to start off with)
Add the algorithm(s) you want to test (again start easy, start with one algorithm)
Go to the Run tab and Start the experiment and wait till it finishes
Go to the Analyse tab and import the experiment results by clicking Experiment (top-right)
For Row select: Fold
For Column select: Percent_incorrect or Number_incorrect (or any other measure you want to see)
You now see the specified results for each fold
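If you are comfortable scripting, the programmatic analogue of those Experimenter steps is a short loop; a sketch in scikit-learn (decision tree standing in for whatever classifier you are testing, iris as a placeholder dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Iterate over the 10 folds explicitly, so each fold's error is visible,
# instead of only the averaged result.
errors = []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train, test) in enumerate(skf.split(X, y)):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train], y[train])
    n_wrong = int((clf.predict(X[test]) != y[test]).sum())
    errors.append(n_wrong)
    print(f"fold {fold}: {n_wrong} incorrect of {len(test)}")
```

Each line corresponds to one row of the Fold-by-measure table the Analyse tab produces.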
While Weka Explorer does not have an option to give the results for individual folds when using the cross-validation option, there are some workarounds. If you explicitly don't want to change any code, you need to do some manual fiddling, but I think this gives more or less what you want:
Instead of Cross-validation, select Percentage split and set it to 90%
Start classifier
Click More options... and change the Random seed for XVal / % Split value to something you haven't used before.
Repeat ten times.
This is not exactly equivalent to 10-fold cross-validation though, since the pseudo-folds you make this way might overlap.
An alternative that is equivalent to cross-validation, but more cumbersome, would be to make 10 folds manually by using the unsupervised instance filter RemoveFolds or RemoveRange.
Generate and save 10 training sets and 10 test sets. Then for every fold, load the training set, select Supplied test set in the classify tab, and select the appropriate test fold.
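What the RemoveFolds approach buys you, compared with repeated percentage splits, is disjoint test folds that together cover the whole dataset. A minimal sketch of that property (KFold here is only an illustration of what the manually generated Weka folds look like):

```python
import numpy as np
from sklearn.model_selection import KFold

# A tiny placeholder dataset of 25 instances.
X = np.arange(50).reshape(25, 2)

# Generate the 10 train/test index splits up front, like saving
# 10 training sets and 10 test sets with RemoveFolds.
folds = list(KFold(n_splits=10, shuffle=True, random_state=1).split(X))

# The test folds are disjoint and jointly cover every instance,
# which repeated percentage splits do not guarantee.
all_test = np.concatenate([test for _, test in folds])
print(len(folds), "folds;", len(set(all_test.tolist())), "distinct test instances")
```

Loading each saved training set and supplying the matching test fold in the Classify tab then reproduces exact 10-fold cross-validation by hand.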