unable to classify using Weka from command line - machine-learning

I am using weka 3.6.13 and trying to use a model to classify data:
java -cp weka-stable-3.6.13.jar weka.classifiers.Evaluation weka.classifiers.trees.RandomForest -l Parking.model -t Data_features_class_ques-2.arff
java.lang.Exception: training and test set are not compatible
though the model works when we use the GUI, through Explorer->Claasify ->Supplied test set and load the arff file->right click on result list and load model-> again right click -> re-evaluate model on current data set...
Any pointers please help.

If your data contains "String" features then first use StringToWordVector in batch mode i.e. for both data set in single command (command 1) then use command 2 and command 3.
Command 1.
java weka.filters.unsupervised.attribute.StringToWordVector -b -R first-last -i training.arff -o training_s2w.arff -r test.arff -s test_s2w.arff
Command 2.
java weka.classifiers.trees.RandomForest -t training_s2w.arff -d model.model
Command 3.
java weka.classifiers.trees.RandomForest -T test_s2w.arff -l model.model -p 0 > result.txt
PS: add path for weka.jar accordingly.

Related

How visualize j48 tree in weka

I have an output tree in weka but can't view it (right click ...). Is there a tool to generate the resulting tree in an understandable way from the copy of the log (figures)?
The above textual representation cannot be converted into other formats, unless you write your own parser.
However, if you use the -g option on the command-line, the tree will get output on stdout in dot-notation. You can then take this output and convert it into other formats, like PNG or PDF using the GraphViz software.
You can run Weka from the command line if you have java installed.
On my Windows machine from the Weka-3-9-5 directory:
C:\Weka-3-9-5> java -cp weka.jar weka.classifiers.trees.J48 -C 0.25 -M 2 -t .\data\iris.arff
This gives you the output that you current have with the trees. However:
C:\Weka-3-9-5> java -cp weka.jar weka.classifiers.trees.J48 -C 0.25 -M 2 -t .\data\iris.arff -g
gives you a different format:
digraph J48Tree {
N0 [label="petalwidth" ]
...
}
and you can feed this to GraphViz to get a nice printed tree.
I put the digraph output into a tree.txt file and then generated a png image file through GraphViz:
C:\GraphViz> dot -Tpng tree.txt > tree.png

Tool for edit lvm.conf file

is there any lvm.conf editor?
I'm trying to set global_filter, use_lvmtad and some other options, currently using sed:
sed -i /etc/lvm/lvm.conf \
-e "s/use_lvmetad = 1/use_lvmetad = 0/" \
-e "/^ *[^#] *global_filter/d" \
-e "/^devices {/a\ global_filter = [ \"r|/dev/drbd.*|\", \"r|/dev/dm-.*|\", \"r|/dev/zd.*|\" ]"
but I don't like this too much, is there any better way?
I found only lvmconfig tool, but it can only display certain configuration sections, and can't edit them.
If you using Ubuntu variant then you can use the LVM GUI to configure and manage the LVM. Refer this link
It seems that augtool is exactly what I was looking for.
These two packages should be enough to proper processing lvm.conf file:
apt install augeas-tools augeas-lenses
Example usage:
augtool print /files/etc/lvm/lvm.conf
And you should get the whole parse tree on stdout.
If the parser fails you won’t get any output, print the error message using:
augtool print /files/etc/lvm/lvm.conf/error
The augtool equivalent for the sed command from the original question:
augtool -s <<EOT
set /files/etc/lvm/lvm.conf/global/dict/use_lvmetad/int "0"
rm /files/etc/lvm/lvm.conf/devices/dict/global_filter
set /files/etc/lvm/lvm.conf/devices/dict/global_filter/list/0/str "r|^/dev/drbd.*|"
set /files/etc/lvm/lvm.conf/devices/dict/global_filter/list/1/str "r|/dev/dm-.*|"
set /files/etc/lvm/lvm.conf/devices/dict/global_filter/list/2/str "r|/dev/zd.*|"
EOT

wire shark log file conversion to text file through cli (in windows7)

For some automation purpose I have below requirements for the Wireshark log file(.pcap).
1-Conversion of Wireshark logs(.pcap file ) to text file with detail of packets.
2-Conversion of Wireshark logs (.pcap file) to text file with some filter (eg: bssgp.pdu_type == 0x00) with detail of packets.
I know how to convert the wireshark files to text file through GUI,
But I need the cli commands for the same to automate the procedure.
Thanks in advance
To convert a .pcap file to text output, you can run:
tshark -V -r file.pcap > file.txt
If you only want to convert certain packets that match a Wireshark display filter, then using your filter, you can run:
tshark -Y "bssgp.pdu_type == 0x00" -V -r file.pcap > file.txt
If the -V option provides too much detail, you can limit the detail to specific protocol(s) by using the -O option instead. For example, to provide details for bssgp only and a summary for all other protocols, try:
tshark -Y "bssgp.pdu_type == 0x00" -O bssgp -r file.pcap > file.txt
Refer to the tshark man page for more details about these options.

What is the truly correct usage of -S parameter on weka classifier A1DE?

So I'm using weka 3.7.11 in a Windows machine (and runnings bash scripts with cygwin), and I found an inconsistency regarding the AODE classifier (which in this version of weka, comes from an add-on package).
Using Averaged N-Dependencies Estimators from the GUI, I get the following configuration (from an example that worked alright in the Weka Explorer):
weka.classifiers.meta.FilteredClassifier -F "weka.filters.unsupervised.attribute.Discretize -F -B 10 -M -1.0 -R first-last" -W weka.classifiers.bayes.AveragedNDependenceEstimators.A1DE -- -F 1 -M 1.0 -S
So I modified this to get the following command in my bash script:
java -Xmx60G -cp "C:\work\weka-3.7.jar;C:\Users\Oracle\wekafiles\packages\AnDE\AnDE.jar" weka.classifiers.meta.FilteredClassifier \
-t train_2.arff -T train_1.arff \
-classifications "weka.classifiers.evaluation.output.prediction.CSV -distribution -p 1 -file predictions_final_multi.csv -suppress" \
-threshold-file umbral_multi.csv \
-F "weka.filters.unsupervised.attribute.Discretize -F -B 10 -M -1.0 -R first-last" \
-W weka.classifiers.bayes.AveragedNDependenceEstimators.A1DE -- -F 1 -M 1.0 -S
But this gives me the error:
Weka exception: No value given for -S option.
Which is weird, since this was not a problem with the GUI. In the GUI, the Information box says that -S it's just a flag ("Subsumption Resolution can be achieved by using -S option"), so it shouldn't expect any number at all, which is consistent with what I got using the Explorer.
So then, what's the deal with the -S option when using the command line? Looking at the error text given by weka, I found this:
Options specific to classifier weka.classifiers.bayes.AveragedNDependenceEstimators.A1DE:
-D
Output debugging information
-F <int>
Impose a frequency limit for superParents (default is 1)
-M <double>
Specify a weight to use with m-estimate (default is 1)
-S <int>
Specify a critical value for specialization-generalilzation SR (default is 100)
-W
Specify if to use weighted AODE
So it seems that this class works in two different ways, depending on which method I use (GUI vs. Command Line).
The solution I found, at least for the meantime, was to write -S 100 on my script. Is this really the same as just putting -S in the GUI?
Thanks in advance.
JM
I've had a play with this Classifier, and can confirm that what you are experiencing on your end is consistent with what I have here. From the GUI, the -S Option (subsumption Resolution) requires no parameters while the Command Prompt does (specialization-generalization SR).
They don't sound like the same parameter, so you may need to raise this issue with the developer of the third party package if you would like to know more information on these parameters. You can find this information from the Tools -> Package Manager -> AnDE, which will point you to the contacts for the library.

Error while creating mahout model

I am training mahout classifier for my data,
Following commands i issued to create mahout model
./bin/mahout seqdirectory -i /tmp/mahout-work-root/MyData-all -o /tmp/mahout-work-root/MyData-seq
./bin/mahout seq2sparse -i /tmp/mahout-work-root/MyData-seq -o /tmp/mahout-work-root/MyData-vectors -lnorm -nv -wt tfidf
./bin/mahout split -i /tmp/mahout-work-root/MyData-vectors/tfidf-vectors --trainingOutput /tmp/mahout-work-root/MyData-train-vectors --testOutput /tmp/mahout-work-root/MyData-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
./bin/mahout trainnb -i /tmp/mahout-work-root/Mydata-train-vectors -el -o /tmp/mahout-work-root/model -li /tmp/mahout-work-root/labelindex -ow
When i try to create the model using trainnb command i am getting the following Exception :
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.mahout.classifier.naivebayes.BayesUtils.writeLabelIndex(BayesUtils.java:119) at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.createLabelIndex(TrainNaiveBayesJob.java:152)
What could be the problem here?
Note: Original Example mentioned here works fine.
I think it might be the problem of how you put your training files.
The files should be organized as following:
MyData-All
\classA
-file1
-file2
-...
\classB
-filex
....

Resources