How do I get the raw predictions (-r) from Vowpal Wabbit when running in daemon mode?

Using the below, I'm able to get both the raw predictions and the final predictions as a file:
cat train.vw.txt | vw -c -k --passes 30 --ngram 5 -b 28 --l1 0.00000001 --l2 0.0000001 --loss_function=logistic -f model.vw --compressed --oaa 3
cat test.vw.txt | vw -t -i model.vw --link=logistic -r raw.txt -p predictions.txt
However, I'm unable to get the raw predictions when I run VW as a daemon:
vw -t -i model.vw --daemon --port 26542 --link=logistic
Do I have to pass a specific argument or parameter to get the raw predictions? I would prefer the raw predictions, not the final predictions. Thanks

On systems supporting /dev/stdout (and /dev/stderr), you may try this:
vw -t -i model.vw --daemon --port 26542 --link=logistic -r /dev/stdout
The daemon will write the raw predictions to standard output, which in this case ends up in the same place as the responses served on localhost port 26542.
The relative order of lines is guaranteed because the code dealing with the different prints within each example (e.g. non-raw vs. raw) is always serial.

Since November 2015, the easiest way to obtain probabilities is to use --oaa=N --loss_function=logistic --probabilities -p probs.txt. (Or, if you need label-dependent features: --csoaa_ldf=mc --loss_function=logistic --probabilities -p probs.txt.)
--probabilities works with --daemon as well, so there should be no further need for --raw_predictions.
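With --oaa 3 and --probabilities, each line of probs.txt holds (as far as I understand the format) one label:probability pair per class. A small awk sketch can pick the most probable label per example; the printf lines below are made-up stand-ins for real probs.txt content:

```shell
# Pick the argmax label from each "label:prob label:prob ..." line.
# The two printf lines are fabricated sample data, not real VW output.
printf '1:0.2 2:0.7 3:0.1\n2:0.9 1:0.05 3:0.05\n' |
awk '{
  best = ""; bestp = -1
  for (i = 1; i <= NF; i++) {
    split($i, kv, ":")
    if (kv[2] + 0 > bestp) { bestp = kv[2] + 0; best = kv[1] }
  }
  print best, bestp
}'
# → 2 0.7
# → 2 0.9
```

To run it on real output, replace the printf with cat probs.txt.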

--raw_predictions is a kind of hack (its semantics depend on the reductions used) and it is not supported in --daemon mode. (Something like --output_probabilities would be useful, not difficult to implement, and would work in daemon mode, but so far no one has had time to implement it.)
As a workaround, you can run VW in a pipe, so it reads stdin and writes the probabilities to stdout:
cat test.data | vw -t -i model.vw --link=logistic -r /dev/stdout | script.sh
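Here script.sh is just a placeholder for whatever consumes the raw scores. Assuming the --oaa raw line is a list of label:score pairs (an assumption; --raw_predictions semantics vary by reduction), a hypothetical consumer could map each raw score through the logistic function 1/(1+e^-x) itself:

```shell
# Hypothetical consumer of -r output: apply the logistic link to each
# "label:score" pair. The printf input is a made-up sample raw line.
printf '1:0 2:2 3:-2\n' |
awk '{
  out = ""
  for (i = 1; i <= NF; i++) {
    split($i, kv, ":")
    p = 1 / (1 + exp(-kv[2]))
    out = out kv[1] ":" sprintf("%.4f", p) (i < NF ? " " : "")
  }
  print out
}'
# → 1:0.5000 2:0.8808 3:0.1192
```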

According to https://github.com/VowpalWabbit/vowpal_wabbit/issues/1118 you can try adding the --scores option on the command line:
vw --scores -t -i model.vw --daemon --port 26542
It helped me with my oaa model.

Related

Exporting encrypted SNMPv3 traps to JSON with TShark

I have a pcap file with recordings of encrypted SNMPv3 traps from Wireshark (version 3.2.2). For analyzing the traps, I want to export the protocol data to JSON using tshark.
tshark.exe -T ek -Y "snmp" -P -V -x -r input.pcap > output.json
Currently, I supply the information needed to decrypt the packets via the "snmp_users" file in C:\Users\developer\AppData\Roaming\Wireshark.
# This file is automatically generated, DO NOT MODIFY.
,"snmp_user","SHA1","xxxxxx","AES","yyyyyyy"
Is it possible to supply the options via the command line?
I have tried:
tshark.exe -T ek -Y "snmp" -P -V -x -o "snmp.users_table.username:snmp_user" ...
But that causes an error:
tshark: -o flag "snmp.users_table.username:snmp_user" specifies unknown preference
Update 16.09.2020:
Option -Y is used instead of -J:
-Y|--display-filter
Cause the specified filter (which uses the syntax of read/display
filters, rather than that of capture filters) to be applied before
printing a decoded form of packets or writing packets to a file.
You need to specify the option as a User Access Table or uat, with the specific table being the name of the file, namely snmp_users. So, for example:
On Windows:
tshark.exe -o "uat:snmp_users:\"\",\"snmp_user\",\"SHA1\",\"xxxxxx\",\"AES\",\"yyyyyyy\"" -T ek -J "snmp" -P -V -x -r input.pcap > output.json
And on *nix:
tshark -o 'uat:snmp_users:"","snmp_user","SHA1","xxxxxx","AES","yyyyyyy"' -T ek -J "snmp" -P -V -x -r input.pcap > output.json
Unfortunately, the Wireshark documentation is currently lacking in describing the uat option. There is a Google Summer of Code project underway in which Wireshark is participating, so perhaps the documentation will be improved.

How to colorize logs for docker container

I have a container that sometimes writes an important keyword to its logs, and I want to highlight this word in color in my terminal, while still seeing the whole log content in real time (--follow). I tried the command
docker logs -f my_app --tail=100 | grep --color -E '^myWord'
but it is not working.
Is there some way to do this?
I use ccze. As @aimless said, grc is a great utility as well. ccze is easy to install with sudo apt install ccze on Debian/Ubuntu-like OSes.
But if you want to colorize stderr, you need to redirect stderr output to stdout. For example:
docker logs -f my-app 2>&1 | ccze -m ansi
The -m ansi argument helps if you want to scroll the output normally.
UPD:
ccze can be very slow. If you encounter this, try running ccze with the nolookups option: ccze -o nolookups.
originally answered - https://unix.stackexchange.com/a/461390/83391
Try this.
docker logs -f my_app --tail=100 | grep --color=always -E '^myWord'
Note the "--color=always" argument.
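One more detail: grep also drops every line that does not match, while the question asked to keep the whole log stream visible. A common trick is to add an always-matching empty alternative such as |$, so every line passes through and only the keyword gets colored; the printf below is a made-up stand-in for the docker logs stream:

```shell
# 'myWord|$' matches every line ($ matches the empty end-of-line),
# so nothing is filtered out; only occurrences of myWord are highlighted.
printf 'plain line\nmyWord here\n' | grep --color=always -E 'myWord|$'
```

Applied to the original command: docker logs -f my_app --tail=100 | grep --color=always -E 'myWord|$'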
Another option would be to use something like https://github.com/jlinoff/colorize. I wrote it specifically to address situations like this. For example, it has the ability to specify different colors for each pattern (see the help for details).
Here is an example of how to use it for your case.
$ curl -L https://github.com/jlinoff/colorize/releases/download/v0.8.1/colorize-linux-amd64 --out colorize
$ chmod a+x colorize
$ ./colorize -h
$ docker logs -f my_app --tail=100 | ./colorize '^myWord'
$ # really make it standout.
$ docker logs -f my_app --tail=100 | ./colorize -c red+greenB+bold '^myWord'
Try grc. Follow the instructions to install it, then just pipe the log output:
docker logs -f my_app | grc

vowpal wabbit for binary text classification setup

I am using vw-8.20170116 for a binary text classification problem. The text strings are concatenated from several short (5-20 words) strings. The input looks like
-1 1.0 |aa .... ... ..... |bb ... ... .... .. |cc ....... .. ...
1 5.0 |aa .... ... ..... |bb ..... .. .... . |cc .... .. ...
The command that I am using for training is
./vw-8.20170116 -d train_feat.txt -k -c -f model.vw --ngram 2 --skips 2 --nn 10 --loss_function logistic --passes 100 --l2 1e-8 --holdout_off --threads --ignore bc
and for test
./vw-8.20170116 -d test_feat.txt -t --loss_function logistic --link logistic -i model.vw -p test_pred.txt
Question: How can I get vw to run (train) in parallel on my 8-core machine? I thought --threads should help but I am not seeing any speedups. And how do I control the number of cores used?
Using this link for reference.

Delete certain line while using iperf

I run iperf command like this :
iperf -c 10.0.0.1 -t 2 -f m -w 1K | grep -Po '[0-9.]*(?= Mbits/sec)'
I want to display only the throughput, such as 0.32, but because I use 1K here, there is a warning and the output becomes
WARNING: TCP window size set to 1024 bytes. A small window size will give poor performance. See the Iperf documentation.
0.32
How can I suppress this warning so that I get "0.32" only?
Just redirect the warning message (which is written to stderr) to /dev/null; after that you get only the output.
So your command would be:
iperf -c 10.0.0.1 -t 2 -f m -w 1K 2> /dev/null | grep -Po '[0-9.]*(?= Mbits/sec)'
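The 2> /dev/null behavior is easy to verify without iperf by using a stand-in that, like iperf here, writes the warning to stderr and the result to stdout (fake_iperf and its output lines are fabricated for illustration):

```shell
# Stand-in for iperf: warning on stderr, result line on stdout.
fake_iperf() {
  echo 'WARNING: TCP window size set to 1024 bytes.' >&2
  echo '[  3]  0.0- 2.0 sec  0.08 MBytes  0.32 Mbits/sec'
}
# stderr is discarded; grep extracts the number before " Mbits/sec".
fake_iperf 2> /dev/null | grep -Po '[0-9.]*(?= Mbits/sec)'
# → 0.32
```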

How do I use tshark to print request-response pairs from a pcap file?

Given a pcap file, I'm able to extract a lot of information from the reconstructed HTTP request and responses using the neat filters provided by Wireshark. I've also been able to split the pcap file into each TCP stream.
The trouble I'm running into now is that, of all the cool filters I'm able to use with tshark, I can't find one that will let me print out the full request/response bodies. I'm calling something like this:
tshark -r dump.pcap -R "tcp.stream==123 and http.request" -T fields -e http.request.uri
Is there some filter name I can pass to -e to get the request/response body? The closest I've come is to use the -V flag, but it also prints out a bunch of information I don't necessarily want and would like to avoid having to kludge out with a "dumb" filter.
If you are willing to switch to another tool, tcptrace can do this with the -e option. It also has an HTTP analysis extension (xHTTP option) that generates the HTTP request/response pairs for each TCP stream.
Here is a usage example:
tcptrace --csv -xHTTP -f'port=80' -lten capturefile.pcap
--csv to format output as comma-separated values
-xHTTP for HTTP request/response pairs written to 'http.times'; this also switches on -e to dump the TCP stream payloads, so you don't really need -e as well
-f'port=80' to filter out non-web traffic
-l for long output form
-t to give progress indication
-n to turn off hostname resolution (much faster with this)
If you captured a pcap file, you can do the following to show all requests+responses.
filename="capture_file.pcap"
for stream in `tshark -r "$filename" -2 -R "tcp and (http.request or http.response)" -T fields -e tcp.stream | sort -n | uniq`; do
echo "==========BEGIN REQUEST=========="
tshark -q -r "$filename" -z follow,tcp,ascii,$stream;
echo "==========END REQUEST=========="
done;
I just made diyism's answer a bit easier to understand (you don't need sudo, and a multiline script is, in my opinion, easier to read).
This probably wasn't an option when the question was asked but newer versions of tshark can "follow" conversations.
tshark -nr dump.pcap -qz follow,tcp,ascii,123
I know this is a super old question. I'm just adding this for anyone that ends up here looking for a current solution.
I use this line to show the request and response bodies from the last 10 seconds (https://gist.github.com/diyism/eaa7297cbf2caff7b851):
sudo tshark -a duration:10 -w /tmp/input.pcap;for stream in `sudo tshark -r /tmp/input.pcap -R "tcp and (http.request or http.response) and !(ip.addr==192.168.0.241)" -T fields -e tcp.stream | sort -n | uniq`; do sudo tshark -q -r /tmp/input.pcap -z follow,tcp,ascii,$stream; done;sudo rm /tmp/input.pcap
