Train and Test with 'one class classifier' using Weka - machine-learning

Suppose I have the following train-set:
f1,f2,f3, label
1,2,3, 0
1.2,2.3,3.3, 0
1.25,2.25,3.25, 0
and I want to get the classification for the following test-set:
f1,f2,f3, label
6,7,8, ?
1.1,2.1,3.1, ?
9,10,11, ?
When I'm using Weka with the OneClassClassifier, I first load the train set and classify using the 'Use training set' option in the test options. After that I choose the 'Supplied test set' option and load the above test set. The problem is that I get the same classification for all the test-set instances, and I get a warning that the train and test set are not compatible ("do you want to wrap with InputMappedClassifier?"). The above is just a simple example; I ran into the same problems with a huge anomaly-injected dataset.
What am I doing wrong?

Since you are performing one-class classification, I think your test data should look like this (the assumption here is that none of the test rows are outliers):
f1,f2,f3, label
6,7,8, 0
1.1,2.1,3.1, 0
9,10,11, 0
and if you enable predictions on test data, you may get:
=== Predictions on test set ===
inst# actual predicted error prediction
1 1:true 1:true 1
2 1:true ? ?
3 1:true 1:true 1
which means in test data:
a) Instance 1 is not an outlier
b) Instance 2 is an outlier
c) Instance 3 is not an outlier
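Preparing that relabelled test file can be scripted in a few lines. A minimal Python sketch (the row data is just the example from the question; Weka itself isn't involved here):

```python
# Replace unknown labels ("?") with the target class value ("0") so the
# test set matches what the one-class classifier expects.
def relabel(rows, target="0"):
    out = []
    for row in rows:
        cols = row.split(",")
        cols[-1] = target  # overwrite the "?" label with the target class
        out.append(",".join(cols))
    return out

test_rows = ["6,7,8,?", "1.1,2.1,3.1,?", "9,10,11,?"]
print(relabel(test_rows))  # ['6,7,8,0', '1.1,2.1,3.1,0', '9,10,11,0']
```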

Related

what's the error using supply test set for prediction

I am trying to analyze the Titanic dataset and build a predictive model. I have preprocessed the datasets, but when I try to predict using the test set, it doesn't show any result and I don't know why.
Titanic_test.arff
Titanic_train.arff
If you open the two files (training and test set) you will notice a difference: in the training set the last column has value 0 or 1, whereas in the test set it has ? (undefined).
This means that your test set doesn't contain the answers, therefore Weka cannot do any evaluation. It could do predictions though.
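A quick way to tell whether a file can be evaluated on (as opposed to only predicted on) is to check whether its last column actually contains labels. A minimal sketch with hypothetical row data, not Weka code:

```python
# Evaluation needs actual class labels; prediction does not.
# Returns True only if no row has "?" in the last (class) column.
def has_labels(rows):
    return all(row.split(",")[-1].strip() != "?" for row in rows)

train = ["22,male,1", "38,female,0"]
test = ["26,male,?", "35,female,?"]
print(has_labels(train))  # True  -> Weka can evaluate on this file
print(has_labels(test))   # False -> Weka can only make predictions
```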

consistent prediction results in Weka despite different seeds value

I am using the Weka 3.8.3 multilayer perceptron on the Iris dataset. I have 75 training instances and 75 test instances. The thing is, no matter how I change the 'seed' parameter, it barely affects the accuracy; I almost always get the stats below. Is the seed used to randomly initialize the weights? Could someone please explain why it behaves this way? Many thanks.
=== Summary ===
Correctly Classified Instances 70 93.3333 %
Incorrectly Classified Instances 5 6.6667 %
I tried the same thing (Training and test 50% percentage split using the radio button) and got 72 and 3 with a random seed for XVal / % Split of 1.
When I change the random seed to 777 (or 666 or 54321) I get 73 and 2, which is a different result, so I can't replicate what you are seeing.
With a random seed of 0 I get 71 and 4.
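For context, the XVal / % Split seed only controls how the instances are shuffled before the split. A rough illustration of that mechanism (this mimics the idea in plain Python; it is not Weka's actual implementation):

```python
import random

# Sketch of a seeded percentage split: the seed determines the shuffle
# order, then the first train_fraction of instances becomes the train set.
def split(instances, seed, train_fraction=0.5):
    order = list(instances)
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_fraction)
    return order[:cut], order[cut:]

data = list(range(10))
train_a, _ = split(data, seed=1)
train_b, _ = split(data, seed=1)
print(train_a == train_b)  # True: the same seed reproduces the same split
print(split(data, seed=777)[0])  # a different seed usually gives a different split
```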

error Evaluating classifier Train and test dataset are not compatible

I am getting an error while running an SMO model on a test dataset in Weka:
Problem Evaluating classifier Train and test dataset are not
compatible. Class index differ: 3 != 0
Training dataset format
mean,variance,label
54.3333333333,1205.55555556,five
3.0,0.0,five
31739.0,0.0,five
3205.5,4475340.25,one
Test dataset format
mean,variance
3.0,0.0
257.0,0.0
216.0,14884.0
736.0,0.0
I trained a model on the training dataset and want to get labels for the test dataset. Why am I getting this error?
The test dataset should have identical structure to the training data. In your case you should add a column to the end called "label". Then, you need to assign some value to the label. This could be simply a question mark "?" to indicate the true label is unknown.
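Adding that column can be scripted before loading the file into Weka. A small sketch that appends a "label" header and a "?" to each data row (plain Python; the rows are the test data from the question):

```python
# Append a "label" column filled with "?" so the test file has the same
# structure as the training file: same columns, unknown class value.
def add_label_column(lines):
    header, *rows = lines
    fixed = [header + ",label"]
    fixed += [row + ",?" for row in rows]
    return fixed

test_file = ["mean,variance", "3.0,0.0", "257.0,0.0"]
print(add_label_column(test_file))
# ['mean,variance,label', '3.0,0.0,?', '257.0,0.0,?']
```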

Which deep learning model can classify categories which are not mutually exclusive

Example: I have a sentence from a job description: "Java senior engineer in UK".
I want a deep learning model to predict two categories for it: English and IT jobs. A traditional classification model can only predict one label, because of the softmax function in the last layer. I could train one neural network per category to predict "Yes"/"No", but with more categories that becomes too expensive. Is there a deep learning or machine learning model that can predict two or more categories at the same time?
Edit: With 3 labels, the traditional approach encodes a sentence as [1,0,0], but in my case it should be encoded as [1,1,0] or [1,1,1].
Example: if we have 3 labels and a sentence may fit all of them, and the softmax output is [0.45, 0.35, 0.2], should we classify it into 3 labels, 2 labels, or maybe just one?
The main problem is: what is a good threshold for classifying into 1, 2, or 3 labels?
If you have n different categories which can be true at the same time, have n outputs in your output layer with a sigmoid activation function. This will give each output a value between 0 and 1 independently.
Your loss function should be the mean of the negative log likelihood of the outputs. In tensorflow, this is:
linear_output = ...  # the output layer before applying the activation function
# correct_outputs is the multi-hot target vector, e.g. [1, 1, 0]
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=linear_output, labels=correct_outputs))
output = tf.sigmoid(linear_output)  # a value between 0 and 1 for each category
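For the threshold question above: with sigmoid outputs each label is decided independently, so a fixed per-label threshold (0.5 is the usual default) works regardless of how many labels end up positive. A small framework-free sketch with hypothetical label names:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each label is accepted on its own if its probability clears the
# threshold; there is no softmax-style "pick exactly one" step.
def predict_labels(logits, names, threshold=0.5):
    return [n for x, n in zip(logits, names) if sigmoid(x) >= threshold]

names = ["English", "IT", "Finance"]
print(predict_labels([2.0, 1.5, -3.0], names))  # ['English', 'IT']
```

The threshold can also be tuned per label on a validation set (e.g. to maximize F1), instead of using 0.5 everywhere.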

Test single instance in weka which has no class label

This question has already been asked, but I didn't understand the answer, so I am posting it again. Please do reply.
I have trained a Weka model (e.g. J48) on my dataset, and now I have to test the model with a single instance for which it should return the class label. How do I do that?
I have tried these ways:
1) When I give my test instance as a,b,c,? (with ? for the class), it shows "Problem evaluating classifier: train and test are not compatible".
2) When I list all the class labels and put ? as the class label for the test instance, like this:
@attribute class {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27}
@data
1,2,............,?
I don't get any predictions; instead the output looks like this:
=== Evaluation on test set ===
=== Summary ===
Total Number of Instances 0
Ignored Class Unknown Instances 1
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0 0 0 0 0 ? 1
0 0 0 0 0 ? 2
0 0 0 0 0 ? 3
Weighted Avg. NaN NaN NaN NaN NaN NaN
confusion matrix is null
What to do?
Given the incomplete information from the OP, here is what probably happened:
You used
the Weka GUI Chooser
selected the Weka Explorer
loaded your training data on the Preprocess tab
selected the Classify tab
selected the J48 classifier
selected Supplied test set under test options and supplied your aforementioned test set
clicked on Start
Now to your problem:
"Evaluation on test set" should have given it away: you are evaluating the classifier, or rather the trained model. But for evaluation, Weka needs to compare the predicted class with the actual class, which you didn't supply. Hence, the instance with the missing class label is ignored.
Since you don't have any other test instances WITH a class label, the confusion matrix is empty. There simply is not enough information available to build one. (And as a side note: a confusion matrix for only one instance is kind of worthless anyway.)
To see the actual prediction
You have to go to More options ..., click Choose next to Output predictions, select an output format such as PlainText, and you will see something like:
inst# actual predicted error prediction
1 1:? 1:0 0.757
2 1:? 1:0 0.824
3 1:? 1:0 0.807
4 1:? 1:0 0.807
5 1:? 1:0 0.79
6 1:? 2:1 0.661
This output lists the classified instances in the order they occur in the test file. This example was taken from the Weka site about "Making predictions" with the following explanation.
In this case, taken directly from a test dataset where all class
attributes were marked by "?", the "actual" column, which can be
ignored, simply states that each class belongs to an unknown class.
The "predicted" column shows that instances 1 through 5 are predicted
to be of class 1, whose value is 0, and instance 6 is predicted to be
of class 2, whose value is 1. The error field is empty; if predictions
were being performed on a labeled test set, each instance where the
prediction failed to match the label would contain a "+". The
probability that instance 1 actually belongs to class 0 is estimated
at 0.757.
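If you need those predictions programmatically, the PlainText rows are easy to parse. A sketch assuming the column layout shown above (empty error column, probability last):

```python
# Parse one row of Weka's PlainText prediction output.
# Columns: inst#, actual, predicted, (error,) prediction probability.
def parse_prediction(line):
    parts = line.split()
    inst = int(parts[0])
    predicted = parts[2].split(":")[1]  # "1:0" -> class value "0"
    prob = float(parts[-1])
    return inst, predicted, prob

print(parse_prediction("1 1:? 1:0 0.757"))  # (1, '0', 0.757)
print(parse_prediction("6 1:? 2:1 0.661"))  # (6, '1', 0.661)
```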
