How to visualize a J48 tree in Weka - machine-learning

I have an output tree in Weka but can't view it (right-click ...). Is there a tool that can render the resulting tree in an understandable way from a copy of the log output (the figures)?

The textual representation above cannot be converted into other formats unless you write your own parser.
However, if you use the -g option on the command line, the tree is output to stdout in dot notation. You can then take this output and convert it into other formats, like PNG or PDF, using the GraphViz software.

You can run Weka from the command line if you have java installed.
On my Windows machine from the Weka-3-9-5 directory:
C:\Weka-3-9-5> java -cp weka.jar weka.classifiers.trees.J48 -C 0.25 -M 2 -t .\data\iris.arff
This gives you the output that you currently have, with the textual trees. However:
C:\Weka-3-9-5> java -cp weka.jar weka.classifiers.trees.J48 -C 0.25 -M 2 -t .\data\iris.arff -g
gives you a different format:
digraph J48Tree {
N0 [label="petalwidth" ]
...
}
and you can feed this to GraphViz to get a nice printed tree.
I put the digraph output into a tree.txt file and then generated a PNG image file through GraphViz:
C:\GraphViz> dot -Tpng tree.txt > tree.png
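GraphViz can also emit other formats directly; for example, producing a PDF from the same tree.txt is a minor variation on the command above:
C:\GraphViz> dot -Tpdf tree.txt -o tree.pdf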

Related

Interpreting Fortify results file (.fpr) through command line

As part of automating the process of running secure code analysis, I have a Jenkins job which uses the sourceanalyzer command-line tool to generate an .fpr results file. At the moment I'm opening this results file in the Audit Workbench application to view the results, check whether there are any newly introduced issues, etc., and generating a report from there in PDF/XML format.
Does anyone know if it is possible to invoke Audit Workbench through the command line and generate a report on the issues, which we could then leverage through a Jenkins script and also mail out the results? Looking online, the command-line usage seems to stop at the FPR generation stage.
Thanks in advance!
There is a command-line utility to generate a report from the FPR file.
Currently there are two report generators: Legacy and BIRT. The BIRT report engine was introduced into Audit Workbench with version 4.40.
Here is an example using the BIRT report engine to generate a DISA STIG report:
BIRTReportGenerator -template "DISA STIG" -source HelloWorld_second.fpr
-output BirtReport.pdf -format PDF -showSuppressed --Version "DISA STIG 3.9"
-UseFortifyPriorityOrder
Using the legacy one is a little more involved. The command is:
ReportGenerator -format pdf -f LegacyReport.pdf -source HelloWorld_second.fpr
-template DisaStig3.10.xml -showSuppressed -showHidden
You can either use one of the predefined report templates located in the <SCA Install Dir>/Core/config/reports directory, or generate one using the Report Wizard and save the template, which gets stored in the C:\Users\<USER>\AppData\Local\Fortify\config\AWB-XX.XX\reports\ directory on Windows.
On Linux/Mac, look at the configuration file <SCA Install Dir>/Core/config/fortify.properties for the com.fortify.WorkingDirectory property; this is where the reports will be stored.
#SBurris,
If you don't want to show suppressed/hidden issues, is it just -hideSuppressed and -hideHidden?
Also, is there a way to add custom filters to hide things like "<none>" results from the STIG/SANS/OWASP categories, like the filters you can create in the AWB GUI?
Basically, I need a command(s) to merge two FPRs and then compare them based on what is found new on the scanned code vs. the old FPR.
Merge should be:
FPRUtility -merge -project <newest_scan.fpr> -source <previous_scan.fpr> -f <BUILDXX_MergedWith_BUILDXY.fpr>
The custom filter I need after the merge is:
"[OWASP Top 10 2013]:!<none> OR [SANS Top 25 2011]:!<none> OR [STIG 3.9]:!<none> AND [Detected On]:!/^/"
Where the Detected On field is a custom tag that I need to carry through from the previous FPR file into the newly merged one.
AND THEN output the report from that newly merged fpr in pdf and xml format to a location/filename I specify. Something along the lines of:
~AWB_Installation_Dir/bin/ReportGenerator -format pdf -f [BUILDXX_MergedWith_BUILDXY].pdf -source output.fpr
-template DisaStig3.10.xml -hideSuppressed -hideHidden
Obviously this can be a multitude of commands as long as we can get it back to Bamboo. Any help would be greatly appreciated. Thanks.
FPRUtility interprets the space-separated conditions in the -information -search -query ... parameter by applying the boolean AND operator. To obtain a union of 2 conditions A || B, I figured I could intersect negations of other conditions that complement the former: !C && !D (where A || B || C || D always holds true). I.e., to find all high and critical issues, I use
FORTIFY_ROOT\jre\bin\java -d64 -Xmx4096M -jar FORTIFY_ROOT\Core\lib\exe\fpr-utility-exe.jar -project APP_VER_DATE.fpr -information -search -query "[OWASP Top 10 2017]:A [fortify priority order]:!low [fortify priority order]:!medium" -categoryIssueCounts -listIssues > issues.txt
In the case of an audit, I figured I needed the older report generation utility to include suppressed issues (and their comments):
sed -e 's/\(IssueListing limit=\)"[^"]\+"/\1"-1"/' -i "FORTIFY_ROOT/Core/config/reports/DeveloperWorkbook.xml"
cmd /c call ReportGenerator -template DeveloperWorkbookAll.xml -format pdf -source APP_VER_DATE.fpr -showSuppressed -f "APP_VER_DATE_with_suppressed.pdf"

Unable to classify using Weka from the command line

I am using weka 3.6.13 and trying to use a model to classify data:
java -cp weka-stable-3.6.13.jar weka.classifiers.Evaluation weka.classifiers.trees.RandomForest -l Parking.model -t Data_features_class_ques-2.arff
java.lang.Exception: training and test set are not compatible
though the model works when we use the GUI, through Explorer -> Classify -> Supplied test set (load the ARFF file) -> right-click on the result list and load the model -> right-click again -> re-evaluate model on current data set...
Any pointers would be appreciated.
If your data contains "String" features, first use StringToWordVector in batch mode, i.e. on both data sets in a single command (command 1), then use command 2 and command 3.
Command 1.
java weka.filters.unsupervised.attribute.StringToWordVector -b -R first-last -i training.arff -o training_s2w.arff -r test.arff -s test_s2w.arff
Command 2.
java weka.classifiers.trees.RandomForest -t training_s2w.arff -d model.model
Command 3.
java weka.classifiers.trees.RandomForest -T test_s2w.arff -l model.model -p 0 > result.txt
PS: add the classpath for weka.jar accordingly.
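For example, command 1 with the classpath spelled out might look like this (jar name taken from the question; adjust the path to your Weka install):
java -cp weka-stable-3.6.13.jar weka.filters.unsupervised.attribute.StringToWordVector -b -R first-last -i training.arff -o training_s2w.arff -r test.arff -s test_s2w.arff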

Fortify, how to start analysis through the command line

How can we generate a Fortify report using the command line on Linux?
On the command line, how can we include only certain folders or files for analysis, and how can we specify the location to store the report, etc.?
Please help.
Thanks,
Karthik
1. Step#1 (clean cache)
You need to plan the scan structure before starting:
scanid = 9999 (can be anything you like)
ProjectRoot = /local/proj/9999/
WorkingDirectory = /local/proj/9999/working
(this dir gets huge; you need to run "rm -rf ./working && mkdir ./working" before every scan, or byte code piles up underneath this dir and consumes your hard disk fast)
log = /local/proj/9999/working/sca.log
source='/local/proj/9999/source/src/**.*'
classpath='/local/proj/9999/source/WEB-INF/lib/*.jar; /local/proj/9999/source/jars/**.*; /local/proj/9999/source/classes/**.*'
./sourceanalyzer -b 9999 -Dcom.fortify.sca.ProjectRoot=/local/proj/9999/ -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/9999/working/sca.log -clean
It is important to specify ProjectRoot; if you do not override this system default, everything is put under your home directory (/home/<user>/.fortify).
The sca.log location is very important; if Fortify does not find this file, it cannot find the byte code to scan.
You can alter the ProjectRoot and WorkingDirectory once and for all if you are the only user (in FORTIFY_HOME/Core/config/fortify_sca.properties).
In that case, your command line would be ./sourceanalyzer -b 9999 -clean
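A minimal sketch of what those overrides might look like in fortify_sca.properties (property names taken from the -D flags above; the paths are just this example's):
com.fortify.sca.ProjectRoot=/local/proj/9999/
com.fortify.WorkingDirectory=/local/proj/9999/working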
2. Step#2 (translate source code to byte code)
nohup ./sourceanalyzer -b 9999 -verbose -64 -Xmx8000M -Xss24M -XX:MaxPermSize=128M -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+UseParallelGC -Dcom.fortify.sca.ProjectRoot=/local/proj/9999/ -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/9999/sca.log -source 1.5 -classpath '/local/proj/9999/source/WEB-INF/lib/*.jar:/local/proj/9999/source/jars/**/*.jar:/local/proj/9999/source/classes/**/*.class' -extdirs '/local/proj/9999/source/wars/*.war' '/local/proj/9999/source/src/**/*' &
Always run this as a Unix background job (&); in case your session to the server times out, it will keep working.
-classpath: put all your known classpath entries here for Fortify to resolve the function calls. If a function is not found, Fortify will skip the source code translation for it, so that part will not be scanned later. You will get poor scan quality but the FPR looks good (few issues reported). It is important to have all dependency jars in place.
-extdirs: put all directories/files you don't want to be scanned here.
The last section, the files between ' ', are your source.
-64 is to use 64-bit Java; if not specified, 32-bit will be used and the max heap should be < 1.3 GB (-Xmx1200M is safe).
The -XX: options have the same meaning as when launching an application server; only use these to control the class heap and garbage collection. This is to tweak performance.
-source is the Java version (1.5 to 1.8).
3. Step#3 (scan with rulepack, custom rules, filters, etc)
nohup ./sourceanalyzer -b 9999 -64 -Xmx8000M -Dcom.fortify.sca.ProjectRoot=/local/proj/9999 -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/9999/working/sca.log -scan -filter '/local/other/filter.txt' -rules '/local/other/custom/*.xml' -f '/local/proj/9999.fpr' &
-filter: the file name must be filter.txt; any rule GUID in this file will not be reported.
-rules: these are the custom rules you wrote. The HP rulepacks are in the FORTIFY_HOME/Core/config/rules directory.
-scan: the keyword that tells the Fortify engine to scan the existing scan id. You can skip step #2 and only do step #3 if you did not change the code and just want to play with different filters/custom rules.
4. Step#4 Generate PDF from the FPR file (if required)
./ReportGenerator -format pdf -f '/local/proj/9999.pdf' -source '/local/proj/9999.fpr'

What is the truly correct usage of the -S parameter on the Weka classifier A1DE?

So I'm using Weka 3.7.11 on a Windows machine (and running bash scripts with Cygwin), and I found an inconsistency regarding the AODE classifier (which, in this version of Weka, comes from an add-on package).
Using Averaged N-Dependencies Estimators from the GUI, I get the following configuration (from an example that worked alright in the Weka Explorer):
weka.classifiers.meta.FilteredClassifier -F "weka.filters.unsupervised.attribute.Discretize -F -B 10 -M -1.0 -R first-last" -W weka.classifiers.bayes.AveragedNDependenceEstimators.A1DE -- -F 1 -M 1.0 -S
So I modified this to get the following command in my bash script:
java -Xmx60G -cp "C:\work\weka-3.7.jar;C:\Users\Oracle\wekafiles\packages\AnDE\AnDE.jar" weka.classifiers.meta.FilteredClassifier \
-t train_2.arff -T train_1.arff \
-classifications "weka.classifiers.evaluation.output.prediction.CSV -distribution -p 1 -file predictions_final_multi.csv -suppress" \
-threshold-file umbral_multi.csv \
-F "weka.filters.unsupervised.attribute.Discretize -F -B 10 -M -1.0 -R first-last" \
-W weka.classifiers.bayes.AveragedNDependenceEstimators.A1DE -- -F 1 -M 1.0 -S
But this gives me the error:
Weka exception: No value given for -S option.
Which is weird, since this was not a problem with the GUI. In the GUI, the information box says that -S is just a flag ("Subsumption Resolution can be achieved by using -S option"), so it shouldn't expect any number at all, which is consistent with what I got using the Explorer.
So then, what's the deal with the -S option when using the command line? Looking at the error text given by weka, I found this:
Options specific to classifier weka.classifiers.bayes.AveragedNDependenceEstimators.A1DE:
-D
Output debugging information
-F <int>
Impose a frequency limit for superParents (default is 1)
-M <double>
Specify a weight to use with m-estimate (default is 1)
-S <int>
Specify a critical value for specialization-generalilzation SR (default is 100)
-W
Specify if to use weighted AODE
So it seems that this class works in two different ways, depending on which method I use (GUI vs. Command Line).
The solution I found, at least for the time being, was to write -S 100 in my script. Is this really the same as just putting -S in the GUI?
Thanks in advance.
JM
I've had a play with this classifier, and I can confirm that what you are experiencing on your end is consistent with what I see here. From the GUI, the -S option (Subsumption Resolution) requires no parameter, while the command line expects one (the specialization-generalization SR critical value).
They don't sound like the same parameter, so you may need to raise this issue with the developer of the third-party package if you would like to know more about these parameters. You can find the contact information via Tools -> Package Manager -> AnDE, which will point you to the contacts for the library.

HDF5 integration with ROOT framework

I've worked extensively with ROOT, which has its own format for data files, but for various reasons we would like to switch to HDF5 files. Unfortunately we'd still require some way of translating files between the formats. Does anyone know of any existing libraries which do this?
You might check out rootpy, which has a facility for converting ROOT files into HDF5 via PyTables: http://www.rootpy.org/commands/root2hdf5.html
If this issue is still of interest to you, recently there have been large improvements to rootpy's root2hdf5 script and the root_numpy package (which root2hdf5 uses to convert TTrees into NumPy structured arrays):
root2hdf5 -h
usage: root2hdf5 [-h] [-n ENTRIES] [-f] [--ext EXT] [-c {0,1,2,3,4,5,6,7,8,9}]
[-l {zlib,lzo,bzip2,blosc}] [--script SCRIPT] [-q]
files [files ...]
positional arguments:
files
optional arguments:
-h, --help show this help message and exit
-n ENTRIES, --entries ENTRIES
number of entries to read at once (default: 100000.0)
-f, --force overwrite existing output files (default: False)
--ext EXT output file extension (default: h5)
-c {0,1,2,3,4,5,6,7,8,9}, --complevel {0,1,2,3,4,5,6,7,8,9}
compression level (default: 5)
-l {zlib,lzo,bzip2,blosc}, --complib {zlib,lzo,bzip2,blosc}
compression algorithm (default: zlib)
--script SCRIPT Python script containing a function with the same name
that will be called on each tree and must return a tree or
list of trees that will be converted instead of the
original tree (default: None)
-q, --quiet suppress all warnings (default: False)
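As an example, going by the help text above, a conversion with blosc compression might look like this (the file name is just a placeholder):
root2hdf5 -f -l blosc -c 5 myfile.root
This should write myfile.h5 next to the input, since the default --ext is h5.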
As of when I last checked (a few months ago), root2hdf5 had a limitation that it could not handle TBranches which were arrays. For this reason I wrote a bash script: root2hdf (sorry for the non-creative name).
It takes a ROOT file and the path to the TTree in the file as input arguments, generates source code, and compiles it into an executable which can be run on ROOT files, converting them into HDF5 datasets.
It also has a limitation in that it cannot handle compound TBranch types, but I don't know that root2hdf5 does either.
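A hypothetical invocation, going only by that description (the argument names and order here are my assumption, not taken from the script itself):
./root2hdf myfile.root treename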
