print certain words that begin with x from one line - parsing

I want to print only the words that start with, for example, srcip and srcintf from this line in /var/log/syslog:
Jul 21 13:13:35 some-name date=2020-07-21 time=13:13:34 devname="devicename" devid="deviceid" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" eventtime=1595330014 srcip=1.2.3.4 srcport=57324 srcintf="someinterface" srcintfrole="wan" dstip=5.6.7.8 dstport=80 dstintf="anotherinterface" dstintfrole="lan" sessionid=supersecretid proto=6 action="deny" policyid=0 policytype="policy" service="HTTP" dstcountry="Sweden" srccountry="Sweden" trandisp="noop" duration=0 sentbyte=0 rcvdbyte=0 sentpkt=0 appcat="unscanned" crscore=30 craction=131072 crlevel="high"
and turn it into something that looks like this:
date=2020-07-21 time=13:13:34 devname="devicename" action="deny" policyid=0 srcintf="someinterface" dstintf="anotherinterface" srcip=1.2.3.4 srcport=57324 -----> dstip=5.6.7.8 dstport=80
Currently I'm using awk to do it, but it scales pretty badly, for obvious reasons:
cat /var/log/syslog | awk '{print $5,$6,$7,$25,$26,$17,$21,$15,$16,"-----> "$19,$20}'
Also, not all lines have srcip in the same field, so some lines come out really skewed.
Or would a syslog message rewriter be better for this purpose? How would you go about solving this? Thanks in advance!

$ cat tst.awk
{
delete f
for (i=5; i<=NF; i++) {
split($i,tmp,/=/)
f[tmp[1]] = $i
}
print f["date"], f["time"], f["devname"], f["action"], f["policyid"], f["srcintf"], \
f["dstintf"], f["srcip"], f["srcport"], "----->", f["dstip"], f["dstport"]
}
$ awk -f tst.awk file
date=2020-07-21 time=13:13:34 devname="devicename" action="deny" policyid=0 srcintf="someinterface" dstintf="anotherinterface" srcip=1.2.3.4 srcport=57324 -----> dstip=5.6.7.8 dstport=80
The above assumes your quoted strings do not contain spaces as shown in your sample input.

Here is an awk answer that is flexible and, instead of a simple one-liner, takes a somewhat more programmatic approach. In general, the lines in your log file look like:
key1=value1 key2=value2 key3=value3 ...
The idea is to break each line down into an associative awk array, so that the elements can be accessed as:
a[key1]=>value1 a[key2]=>value2 ... a[key2,"full"]=>key2=value2 ...
Using a helper function, str2map (explained in a separate answer), you can write:
awk '
function str2map(str,fs1,fs2,map, n,tmp) {
n=split(str,map,fs1)
for (;n>0;n--) {
split(map[n],tmp,fs2);
map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
delete map[n]
}
}
{ str2map($0," ","=",a) }
{ print a["date","full"],a["time","full"],a["devname","full"],a["action","full"] }
' file
This method is very flexible, and it does not depend on the order of the fields in the line.
Note: the above method does not handle quoting, so a space inside a quoted string may break it.

If you have filter.awk:
BEGIN{
split(filter,a,",");
for (i in a){
f[a[i]]=1;
}
}
{
for (i=1; i<=NF; i++) {
split($i,b,"=");
if (b[1] in f){
printf("%s ", $i);
}
}
printf("\n");
}
you can do:
awk -v filter="srcip,srcintf" -f filter.awk /var/log/syslog
In filter you specify the keywords to find, comma-separated. The matching fields are printed in the order they appear in the line.
Note: this script also assumes the file is of the form key1=value key2=value and that there are no spaces in the values.
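All three awk answers split on whitespace, so, as their notes point out, a quoted value containing a space breaks them. As a minimal sketch of one way around that (in Python rather than awk, and assuming each value is either double-quoted without escaped quotes inside, or contains no spaces), a single regex can pick out every key=value pair while keeping quoted values whole. The key lists come from the desired output in the question; missing keys are simply skipped.

```python
import re

# A key, "=", then either a double-quoted value (spaces allowed, no escaped
# quotes inside) or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S*)')

# Output layout from the question: these keys, an arrow, then the dst keys.
BEFORE_ARROW = ["date", "time", "devname", "action", "policyid",
                "srcintf", "dstintf", "srcip", "srcport"]
AFTER_ARROW = ["dstip", "dstport"]


def parse_pairs(line):
    """Map each key to its full 'key=value' text (quotes kept)."""
    return {m.group(1): m.group(0) for m in PAIR_RE.finditer(line)}


def reformat(line):
    """Rebuild one log line in the requested field order."""
    pairs = parse_pairs(line)
    before = [pairs[k] for k in BEFORE_ARROW if k in pairs]
    after = [pairs[k] for k in AFTER_ARROW if k in pairs]
    return " ".join(before + ["----->"] + after)
```

To use it, loop over the lines of /var/log/syslog and print reformat(line) for each one. Because every lookup is by key, the position of srcip in the line no longer matters.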

Related

Bookmarks parsing issue

I have a LARGE number of bookmarks and wanted to export them and share them with a group I work with. The issue is that when I export them, there are ADD_DATE and LAST_MODIFIED fields added by the browser (Firefox). I was hoping to just use cut or awk to pull the fields I want but the lack of a space before the >(website_name) is making that difficult. And my regex skills are weak.
How do I add a single space before the second to last > at the end of the line so that I can use cut or awk to pull out the fields I want into a new file?
Ex: 123456">SecurityTrails would become 123456 >SecurityTrails
Please see below for examples of what I'm working with. Any help is greatly appreciated!
<DT>SecurityTrails
I use Firefox myself. It frequently also embeds favicons in the exported bookmarks.html file via base64 encoding. So, to account for scenarios beyond the one the OP mentioned, maybe something like:
{mawk/mawk2/gawk} 'BEGIN { FS = "\042" } $1 = $1'
Then do whatever cutting you want. That assumes the OP wants to keep everything and simply replace the double quotes with spaces.
Now, if the goal is just to extract the URL and the name:
{mawk/mawk2/gawk} 'BEGIN { DBLQT="\042"; FS = "(<A HREF=" DBLQT "|>)" } /<A HREF=/ {
url = $(NF-2);    # text after <A HREF=" ; counted from the end because "<DT>" leaves an empty field before it
url = substr(url, 1, index(url, DBLQT) - 1);
sitename = $(NF-1);
sub(/<\/A$/, "", sitename) ;
print url " > " sitename ; }' # or whatever way you want the output to be
I spelled it out verbosely to show what \042 means: the ASCII octal code for a double quote.
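For comparison, a Python sketch of the same URL + name extraction. It assumes each bookmark sits on one line in the Netscape bookmark format that Firefox exports (the example line in the question was cut off, so the shape in the comment is an assumption), and it uses one regex instead of field splitting, so the position of the link on the line does not matter:

```python
import html
import re

# Assumed bookmark line shape (Netscape bookmark format, as Firefox exports it):
#   <DT><A HREF="https://example.com/" ADD_DATE="..." ICON="...">Name</A>
LINK_RE = re.compile(r'<A HREF="([^"]*)"[^>]*>(.*?)</A>', re.IGNORECASE)


def extract_links(lines):
    """Yield (url, name) pairs; lines without a link are skipped."""
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            # Attribute values and titles may be HTML-escaped (e.g. &amp;),
            # so decode both before returning them.
            yield html.unescape(m.group(1)), html.unescape(m.group(2))
```

Usage: for url, name in extract_links(open("bookmarks.html", encoding="utf-8")): print(url, ">", name)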

consolidating foreach command

foreach old_cellname $old_cell_full_name {
echo $old_cellname >> origin.txt}
foreach origin $cell_origin {
echo $origin >> origin.txt}
foreach new_cellname $new_cell_full_name {
echo $new_cellname >> origin.txt}
Using the above code, origin.txt contains all the old cell names, followed by all their origin numbers, followed by all the new cell names. But I want the output in rows, i.e. each old cell name, its origin and its new cell name on one line. Is it possible to make these changes? Any help is highly appreciated.
If the lists are of the same length and match up sensibly, yes, of course. Just use a multi-list foreach:
foreach old $old_cell_full_name origin $cell_origin new $new_cell_full_name {
# echo isn't a standard Tcl command, but I guess this ought to work
echo "$old\t$origin\t$new" >> origin.txt
}
I assume tab-separated will do. It's pretty convenient since it lets you import the data into a spreadsheet easily. If you prefer commas, use , instead of \t.
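One detail worth knowing: when the lists differ in length, Tcl's multi-list foreach keeps going until every list is used up, substituting empty values for the missing elements. For comparison only, here is a Python sketch of the same row building, where itertools.zip_longest with an empty fill value mimics that behaviour (the function name is hypothetical):

```python
from itertools import zip_longest


def format_rows(old_names, origins, new_names):
    """Build one tab-separated line per cell: old name, origin, new name.

    fillvalue="" mirrors Tcl's multi-list foreach, which uses empty
    values once a shorter list runs out.
    """
    return "".join(
        "\t".join(row) + "\n"
        for row in zip_longest(old_names, origins, new_names, fillvalue="")
    )
```

Appending the result to origin.txt (opened in "a" mode) gives the same layout as the Tcl loop.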

Parse text with powershell and display only what you were searching for

I'm trying to parse a text file with this format:
\\fileshare40\abccheck\logons\ABC64ZXZ.txt:5398:UserID: abcusernamehere Logged: 09:18:36 2014/03/13
\\fileshare40\abccheck\logons\ABC63BZB.txt:5403:UserID: abcusernamehere Logged: 01:21:31 2014/03/14
\\fileshare40\abccheck\logons\ABC61ZSF.txt:5408:UserID: abcusernamehere Logged: 08:22:31 2014/03/17
\\fileshare40\abccheck\logons\ABC62ETB.txt:5413:UserID: abcusernamehere Logged: 07:58:52 2014/03/18
\\fileshare40\abccheck\logons\ABC60BBB.txt:5418:UserID: abcusernamehere Logged: 13:11:36 2014/03/19
The only thing I need from this is the machine name (ABC*****). Later on I'll put the names into an array to see if there are duplicates, but an answer here will get me started on that path.
I've tried this:
$abc = select-string -path c:\users\abcusernamehere\desktop\findusermachines.txt -pattern "TCWS....." -allmatches
But doing so displays the whole line of text in that file. How can I break up the line to JUST find and display what I'm searching for?
For that, you can use a regex:
(get-content c:\users\abcusernamehere\desktop\findusermachines.txt) -replace '.+\\(.+?)\.txt:.+','$1'
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB
There's really no point in using Select-String to search for the lines that have the server names if they all have one. Just use Get-Content, and run the -replace operator against all the lines.
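The question also mentions checking for duplicates later. As a sketch of that whole step in Python (a swapped-in language, not PowerShell), the regex below follows the same idea as the -replace pattern, taking the file name before .txt: after the last backslash, and collections.Counter then makes the duplicate check short:

```python
import re
from collections import Counter

# The file name (without .txt) that follows the last backslash before ".txt:".
MACHINE_RE = re.compile(r'\\([^\\]+?)\.txt:')


def machine_names(lines):
    """Return the machine name from every line that contains one."""
    names = []
    for line in lines:
        m = MACHINE_RE.search(line)
        if m:
            names.append(m.group(1))
    return names


def duplicates(names):
    """Return names that occur more than once, in first-seen order."""
    return [name for name, count in Counter(names).items() if count > 1]
```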
You can do that with a regex match.
[regex]::Matches((gc c:\users\abcusernamehere\desktop\findusermachines.txt),"ABC.....")|select -ExpandProperty value
That spits back:
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB
Yet another answer :-)
${c:\users\abcusernamehere\desktop\findusermachines.txt} | ? { $_ -cmatch "\b(?<MACHINE>ABC.+)\b\.txt" } | % { $Matches['MACHINE'] }
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB

How do I know whether I'm looking at a newline or carriage return etc.?

For example, say I wanted to determine whether this form was storing newlines as carriage returns or newlines or whatever characters. I'm often in situations where I'm writing code and am not sure what type of new-line character a file/form/whatever I'm parsing is using.
How could I determine this? Is there a way to determine this without actually doing a check inside of code? (It seems like I should be able to right-click and "show all characters" or something like that).
Note: I realize I could write code such as
if (c == '\r') cout << "Carriage";
and so on, but I have a feeling there's a simpler solution.
Maybe :list is what you are looking for (from the Vim help):
:[range]l[ist] [count] [flags]
Same as :print, but display unprintable characters
with '^' and put $ after the line. This can be
changed with the 'listchars' option.
See ex-flags for [flags].
You can switch modes with:
:set list
and
:set nolist
Additionally, you can use the 'listchars' option to control how these characters are displayed.
You could, for example, check your document for occurrences of "Carriage Return" or "New Line"/"Line Feed".
e.g. (php):
if( strpos( $yourstring , "\r\n" ) !== false ){ // You have Carriage Return + Line Feed (Windows style)
// Do something
}
elseif( strpos( $yourstring , "\r" ) !== false ){ // You have Carriage Return
// Do something
}
elseif( strpos( $yourstring , "\n" ) !== false ){ // You have New Line/Line Feed
// Do something
}
else{
// You cannot determine which one is used, because the string is a single line
}
I hope this is what you're looking for.
Note: on Windows, "\r\n" is used to mark new lines.
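Because a Windows "\r\n" pair contains both characters, any check has to look for the pair before the lone characters. A minimal Python sketch that counts each kind of line ending in raw bytes (read the file in binary mode so Python does not translate the line endings first):

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone CR and lone LF line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF contributes one \r and one \n; subtract them to get lone ones.
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }
```

Usage: with open(path, "rb") as f: print(count_line_endings(f.read()))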

sed extract multiple possible(?) values from a file

I have a file that has multiple lines like the following:
"<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)"
"<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)"
In the first line imsi exists; in the second it does not.
For every line that starts with the word sender (there are other lines in the file), I want to extract both the msisdn value and the imsi value. If the imsi value is not there, I would like to print imsi: Unknown.
I tried the following but it does not work:
/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; /imsi/! {s/.*/imsi: Unknown/}; p};
What am I missing?
Your match for "msisdn" is stripping out the "imsi" so the negative match is always taken. Simply copy your line into hold space, do your "msisdn" processing, swap the hold space back into pattern space, then do your "imsi" processing:
/sender / {h; /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p;x; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; /imsi/! {s/.*/imsi: Unknown/};p}
This can be done using the following sed script:
s/^.*sender .*msisdn="\([^"]*\)" .* imsi="\([^"]*\)".*$/msisdn: \1, imsi: \2/
t
s/^.*sender .*msisdn="\([^"]*\)".*$/msisdn: \1, imsi: Unknown/
t
d
The first s command rewrites all sender lines that contain the imsi field.
The first t command branches to the end of the script if that substitution succeeded, so the line is printed and sed continues with the next line.
The second s command rewrites all sender lines without the imsi field.
The second t command does the same for that substitution.
The d command deletes all other lines.
In order to run this script, just copy it to a file and run it using sed -f script.
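For comparison, the same rule (msisdn is required, imsi is optional and defaults to Unknown) as a Python sketch. It produces the same "msisdn: ..., imsi: ..." text as the sed script, but looks for the two attributes independently, so their order on the line does not matter:

```python
import re

MSISDN_RE = re.compile(r'\bmsisdn="([^"]*)"')
IMSI_RE = re.compile(r'\bimsi="([^"]*)"')


def sender_fields(line):
    """Return 'msisdn: X, imsi: Y' for a sender line, or None for other lines."""
    if "sender " not in line:
        return None
    msisdn = MSISDN_RE.search(line)
    if not msisdn:
        return None
    imsi = IMSI_RE.search(line)
    return f"msisdn: {msisdn.group(1)}, imsi: {imsi.group(1) if imsi else 'Unknown'}"
```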
To explain why I am using sed for this particular problem: the following is the multi-line sed script I use to create a data structure to pass into awk:
cat xmlEventLog_2010-03-23T* |
sed -nr "/<event eventTimestamp/,/<\/event>/ {
/event /{/uniqueId/ {s/.*uniqueId=\"([^\"]+)\".*/\nuniqueId: \1/g}; /uniqueId/! {s/.*/\nuniqueId: Unknown/}; p};
/payloadType / {/type/ {s/.*type=\"([^\"]+)\".*/payload: \1/g}; /type/! {s/.*protocol=\"([^\"]+)\".*/payload: \1/g}; p};
***/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; p; /imsi/! {s/.*/imsi: Unknown/}; p};
/result /{s/.*value=\"([^\"]+)\".*/result: \1/g; p}; /filter code/{s/.*type=\"([^\"]+)\".*/type: \1/g; p}}" |
awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"} $4~/payload: SMS-MT-FSM-INFO|SMS-MT-FSM|SMS-MT-FSM-DEL-REP|SMS-MT-FSM-DEL-REP-INFO|SMS-MT-FSM-DEL-REP/ && $2~/result: Blocked|Modified/ && $3~/msisdn: +919844000011/ {$1=$1 ""; print}'
This parses out files that are filled with events like so:
<event eventTimestamp="2010-03-23T00:00:00.074" originalReceivedMessageSize="28" uniqueId="1280361600.74815_PFS_1_2130328364" deliveryReport="true">
<result value="Allowed"/>
<source name="MFE" host="PFS_1"/>
<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
<profile code=""/>
<mvno code=""/>
</sender>
<recipients>
<recipient code="+919844000039" imsi="892000000" SccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
</recipient>
</recipients>
<payload>
<payloadType protocol="SMS" type="SMS-MT-FSM-DEL-REP"/>
<message signature="70004b7c9267f348321cde977c96a7a3">
<MailFrom value=""/>
<rcptToList>
</rcptToList>
<pduList>
<pdu type="SMS_SS_REQUEST_IND" time="2010-07-29T00:00:00.074" source="SMSPROBE" dest="PCF"/>
<pdu type="SMS_SS_REQ_HANDLING_STOP" time="2010-07-29T00:00:00.074" source="PCF" dest=""/>
</pduList>
<numberOfImages>0</numberOfImages>
<attachments numberOf="1">
<attachment index="0" size="28" contentType="text/plain"/>
</attachments>
<emailSmtpDeliveryStatus value="" time="" reason=""/>
<pepId value="989350000109.989350000209.0.0"/>
</message>
</payload>
<filters>
</filters>
</event>
There could be up to 10000 events like the one above in each file, and there will be hundreds of files. The structures output for awk should look like this:
uniqueId: 1280361600.208152_PFS_1_1509661383
result: Allowed
msisdn: +919892000000
imsi: 892000000
payload: SMS-MT-FSM-DEL-REP
filter:
That is why I need to extract two values from the sender line and different values from the other lines. The filter above extracts everything correctly except when the sender line is found (marked *** in the filter). I just want to extract the two items from the sender line for the structure, but multiple attempts have failed.
I used Perl to solve your problem.
cat file | perl -n -e 'if (/sender.*msisdn="([^"]*)"(.*imsi="([^"]*)")?/) { print $1, " ", $3 || "unknown", "\n"; }'
