sed extract multiple possible(?) values from a file - parsing

I have a file that has multiple lines like the following:
"<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)"
"<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)"
In the first line imsi exists and in the second line it does not.
For every line that starts with the word sender (there are other lines in the file) I want to extract both the msisdn value and the imsi value. If the imsi value is not there I would like to print out imsi: Unknown.
I tried the following but it does not work:
/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; /imsi/! {s/.*/imsi: Unknown/}; p};
What am I missing?

Your match for "msisdn" is stripping out the "imsi" so the negative match is always taken. Simply copy your line into hold space, do your "msisdn" processing, swap the hold space back into pattern space, then do your "imsi" processing:
/sender / {h; /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p;x; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; /imsi/! {s/.*/imsi: Unknown/};p}
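For example, here is a quick sketch of that hold-space approach against trimmed versions of the two sample lines (GNU sed; -E is used so the grouping parentheses work unescaped, and the file name is arbitrary):

```shell
# Hypothetical sample file with trimmed versions of the two sender lines.
cat > sample.txt <<'EOF'
<sender from="+919892000000" msisdn="+919892000000" imsi="892000000" country="IN"
<sender from="+919892000000" msisdn="+919892000000" country="IN"
EOF

# Hold-space trick: stash the line (h), do the msisdn substitution, print,
# swap the original back (x), then do the imsi substitution on a fresh copy.
sed -nE '/sender /{
  h
  /msisdn/ s/.*msisdn="([^"]*)".*/msisdn: \1/
  p
  x
  /imsi/ s/.*imsi="([^"]*)".*/imsi: \1/
  /imsi/! s/.*/imsi: Unknown/
  p
}' sample.txt
```

This prints an msisdn/imsi pair for each sender line, with `imsi: Unknown` for the second one.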

This can be done using the following sed script:
s/^.*sender .*msisdn="\([^"]*\)" .* imsi="\([^"]*\)".*$/msisdn: \1, imsi: \2/
t
s/^.*sender .*msisdn="\([^"]*\)".*$/msisdn: \1, imsi: Unknown/
t
d
The first s command rewrites all sender lines containing the imsi field.
The first t command continues with the next line if the previous s command succeeded.
The second s command rewrites all sender lines without the imsi field.
The second t command continues with the next line if the previous s command succeeded.
The d command removes all other lines.
In order to run this script, just copy it to a file and run it using sed -f script.
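For instance (a sketch against trimmed sample lines; the file names are arbitrary):

```shell
# Save the five-line script from the answer as extract.sed.
cat > extract.sed <<'EOF'
s/^.*sender .*msisdn="\([^"]*\)" .* imsi="\([^"]*\)".*$/msisdn: \1, imsi: \2/
t
s/^.*sender .*msisdn="\([^"]*\)".*$/msisdn: \1, imsi: Unknown/
t
d
EOF

# Trimmed sender lines (with and without imsi) plus one unrelated line.
cat > senders.txt <<'EOF'
<sender from="+919892000000" msisdn="+919892000000" pcfIp="" imsi="892000000" country="IN"
<sender from="+919892000000" msisdn="+919892000000" pcfIp="" country="IN"
<result value="Allowed"/>
EOF

sed -f extract.sed senders.txt
```

Note that no -n is needed: matched lines are auto-printed after the t branch, and d discards everything else.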

Just to add why I am using sed for this particular problem. The following is the multi-line sed that I am using to create a data structure to pass into awk:
cat xmlEventLog_2010-03-23T* |
sed -nr "/<event eventTimestamp/,/<\/event>/ {
/event /{/uniqueId/ {s/.*uniqueId=\"([^\"]+)\".*/\nuniqueId: \1/g}; /uniqueId/! {s/.*/\nuniqueId: Unknown/}; p};
/payloadType / {/type/ {s/.*type=\"([^\"]+)\".*/payload: \1/g}; /type/! {s/.*protocol=\"([^\"]+)\".*/payload: \1/g}; p};
***/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; p; /imsi/! {s/.*/imsi: Unknown/}; p};
/result /{s/.*value=\"([^\"]+)\".*/result: \1/g; p}; /filter code/{s/.*type=\"([^\"]+)\".*/type: \1/g; p}}"
| awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"} $4~/payload: SMS-MT-FSM-INFO|SMS-MT-FSM|SMS-MT-FSM-DEL-REP|SMS-MT-FSM-DEL-REP-INFO|SMS-MT-FSM-DEL-REP/ && $2~/result: Blocked|Modified/ && $3~/msisdn: +919844000011/ {$1=$1 ""; print}'
This parses out files that are filled with events like so:
<event eventTimestamp="2010-03-23T00:00:00.074" originalReceivedMessageSize="28" uniqueId="1280361600.74815_PFS_1_2130328364" deliveryReport="true">
<result value="Allowed"/>
<source name="MFE" host="PFS_1"/>
<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
<profile code=""/>
<mvno code=""/>
</sender>
<recipients>
<recipient code="+919844000039" imsi="892000000" SccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
</recipient>
</recipients>
<payload>
<payloadType protocol="SMS" type="SMS-MT-FSM-DEL-REP"/>
<message signature="70004b7c9267f348321cde977c96a7a3">
<MailFrom value=""/>
<rcptToList>
</rcptToList>
<pduList>
<pdu type="SMS_SS_REQUEST_IND" time="2010-07-29T00:00:00.074" source="SMSPROBE" dest="PCF"/>
<pdu type="SMS_SS_REQ_HANDLING_STOP" time="2010-07-29T00:00:00.074" source="PCF" dest=""/>
</pduList>
<numberOfImages>0</numberOfImages>
<attachments numberOf="1">
<attachment index="0" size="28" contentType="text/plain"/>
</attachments>
<emailSmtpDeliveryStatus value="" time="" reason=""/>
<pepId value="989350000109.989350000209.0.0"/>
</message>
</payload>
<filters>
</filters>
</event>
There could be up to 10000 events like the one above in each file, and there will be hundreds of files. The structures output for awk should be of the type:
uniqueId: 1280361600.208152_PFS_1_1509661383
result: Allowed
msisdn: +919892000000
imsi: 892000000
payload: SMS-MT-FSM-DEL-REP
filter:
So for this reason I need to extract 2 values from the sender line and different values from the other lines. The filter above extracts everything correctly except for the part where the sender line is matched (marked *** in the filter). So I just want to extract the 2 items from the sender line for the structure. Multiple attempts have failed.

I used Perl to solve your problem.
cat file | perl -n -e 'if (/sender.*msisdn="([^"]*)"(.*imsi="([^"]*)")?/) { print $1, " ", $3 || "unknown", "\n"; }'

Related

print certain words that begin with x from one line

I want to somehow print the words that start with, for example, srcip or srcintf, from this line in /var/log/syslog:
Jul 21 13:13:35 some-name date=2020-07-21 time=13:13:34 devname="devicename" devid="deviceid" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" eventtime=1595330014 srcip=1.2.3.4 srcport=57324 srcintf="someinterface" srcintfrole="wan" dstip=5.6.7.8 dstport=80 dstintf="anotherinterface" dstintfrole="lan" sessionid=supersecretid proto=6 action="deny" policyid=0 policytype="policy" service="HTTP" dstcountry="Sweden" srccountry="Sweden" trandisp="noop" duration=0 sentbyte=0 rcvdbyte=0 sentpkt=0 appcat="unscanned" crscore=30 craction=131072 crlevel="high"
to something that looks like this
date=2020-07-21 time=13:13:34 devname="devicename" action="deny" policyid=0 srcintf="someinterface" dstintf="anotherinterface" srcip=1.2.3.4 srcport=57324 -----> dstip=5.6.7.8 dstport=80
Currently I'm using awk to do it; the scalability of that is pretty bad for obvious reasons:
cat /var/log/syslog | awk '{print $5,$6,$7,$25,$26,$17,$21,$15,$16,"-----> "$19,$20}'
Also, not all the lines have srcip in the same "field", so some lines are really skewed.
Or would a syslog message rewriter be better for this purpose? How would you go about solving this? Thanks in advance!
$ cat tst.awk
{
    delete f
    for (i=5; i<=NF; i++) {
        split($i,tmp,/=/)
        f[tmp[1]] = $i
    }
    print f["date"], f["time"], f["devname"], f["action"], f["policyid"], f["srcintf"], \
          f["dstintf"], f["srcip"], f["srcport"], "----->", f["dstip"], f["dstport"]
}
$ awk -f tst.awk file
date=2020-07-21 time=13:13:34 devname="devicename" action="deny" policyid=0 srcintf="someinterface" dstintf="anotherinterface" srcip=1.2.3.4 srcport=57324 -----> dstip=5.6.7.8 dstport=80
The above assumes your quoted strings do not contain spaces as shown in your sample input.
I present you an awk answer which is flexible and, instead of a simple one-liner, a bit more programmatic. Your log file has lines that look in general like:
key1=value1 key2=value2 key3=value3 ...
The idea in this awk is to break the line down into an associative array, so that the elements can be accessed as:
a[key1]=>value1 a[key2]=>value2 ... a[key2,"full"]=>key2=value2 ...
Using a function which is explained in this answer, you can write:
awk '
function str2map(str,fs1,fs2,map,  n,tmp) {
    n=split(str,map,fs1)
    for (;n>0;n--) {
        split(map[n],tmp,fs2)
        map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
        delete map[n]
    }
}
{ str2map($0," ","=",a) }
{ print a["date","full"],a["time","full"],a["devname","full"],a["action","full"] }
' file
This method is very flexible. It also has no dependency on the order of the fields in the line.
Note: the above method does not take care of quoting, so if a space appears within a quoted string, it might mess things up.
If you have filter.awk:
BEGIN {
    split(filter, a, ",")
    for (i in a) {
        f[a[i]] = 1
    }
}
{
    for (i=1; i<=NF; i++) {
        split($i, b, "=")
        if (b[1] in f) {
            printf("%s ", $i)
        }
    }
    printf("\n")
}
you can do:
awk -v filter="srcip,srcintf" -f filter.awk /var/log/syslog
In the filter variable you specify, comma-separated, the keywords it has to find.
Note: this script also assumes the file is of the form key1=value key2=value ... and that there are no spaces in the values.
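For example, against a made-up one-line log (note the trailing space the script emits after the last matched field):

```shell
# The filter.awk script from the answer, saved to a file.
cat > filter.awk <<'EOF'
BEGIN {
    split(filter, a, ",")
    for (i in a) {
        f[a[i]] = 1
    }
}
{
    for (i=1; i<=NF; i++) {
        split($i, b, "=")
        if (b[1] in f) {
            printf("%s ", $i)
        }
    }
    printf("\n")
}
EOF

echo 'srcip=1.2.3.4 srcport=57324 srcintf="wan1" dstip=5.6.7.8' |
awk -v filter="srcip,srcintf" -f filter.awk
```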

abas-ERP: How to create a dataset without using the .make-command

I need to create a dataset in a file that is not covered by the .make-command. How can I achieve this?
I tried it using the file-identifier you use in the .select-command and the correct group-identifier (e.g. group3). When run, it prompted "wrong group".
You can also use EDP or EPI.
A short example of how to create a customer using EDP:
.type text xtedp xterr xtres xtsys
..
..: file containing the EDP commands
.file -TEMPNAME U|xtedp
..: file containing the error output of the edp command
.file -TEMPNAME U|xterr
..: file containing the id of the new customer
.file -TEMPNAME U|xtres
..
..: Create the edp command file containing two new customers
.input DATEI.F
.output new 'U|xtedp'
# here you can write a comment for the edp file
#!database=0
#!group=1
#!action=new
#!password=yourpassword
#!charset=EKS
#!report=NUM
#!DONTCHANGE=-
#!TRANSACTION=1
# now we list all fields which we want to write
such#name#ans#plz#nort#str#
DOW#John Dow Ldt;#John Dow Ltd#12345#Someplace#Somestreet#
Max#Max Ldt;#Max Ltd#22345#Someplace2#Somestreet2#
..
..: close edp file
.output TERMINAL
..
..: Execute the edp command
.formula U|xtsys = "edpimport.sh " + " -t# -I " + 'U|xtedp' + " > " + 'U|xtres' + " 2> " + 'U|xterr'
.system 'U|xtsys' background
..
..: G|mehr or G|success is "true" when the command could be executed successfully (on some abas-ERP versions G|success does not work, use G|mehr)
.continue ERROR ? _G|success
.continue SHOW
!ERROR
..: Do something here!!
..
!SHOW
New customer(s) created:
.input -TEXT 'U|xtres'
.continue
The great benefit of using EDP is that you can use transactions: if one operation fails, all operations are rolled back.
There is a workaround via .command
When the invisible-Property is set to 1, the mask doesn't become visible and the dataset is saved immediately.
You can use it as follows:
.formula xtCmd = "<File-Identifier> <new>, Group-Identifier ? param1=value1|param2=value2|[invisible]=1"
.command -WAIT -ID maskID 'U|xtCmd'

parse multilines from a file and replace

I need to read a file where the content is like below :
Computer Location = afp.local/EANG
Description = RED_TXT
Device Name = EANG04W
Domain Name = afp.local
Full Name = Admintech
Hardware Monitoring Type = ASIC2
Last Blocked Application Scan Date = 1420558125
Last Custom Definition Scan Date = 1348087114
Last Hardware Scan Date = 1420533869
Last Policy Sync Date = 1420533623
Last Software Scan Date = 1420533924
Last Update Scan Date = 1420558125
Last Vulnerability Scan Date = 1420558125
LDAP Location = **CN=EANG04W**,OU=EANG,DC=afp,DC=local
Login Name = ADMINTECH
Main Board OEM Name = Dell Inc.
Number of Files = 384091
Primary Owner = **CN= LOUHICHI anoir**,OU=EANG,DC=afp,DC=localenter code here
I need to replace CN=$value by CN=Compagny, where $value is what is retrieved after CN= and before the comma.
Ok, so you really should have updated your question and not posted the code in a comment, because it's really hard to read. Here's what I think you intended:
$file = 'D:\sources\scripts\2.txt'
$content = Get-Content $file | foreach ($line in $content) {
if ($line.Contains('CN=')) {
$variable = $line.Split(',').Split('=')[2]
$variable1 = $variable -replace $variable, "Compagny"
} Set-Content -path $file
}
That definitely has some syntax errors. The first line is great: you define the path. Then things go wrong... Your call to Get-Content is fine; that will get the contents of the file and send them down the pipe.
You pipe that directly into a ForEach loop, but it's the wrong kind. What you really want there is a ForEach-Object loop (which can be confusing, because it can be shortened to just ForEach when used in a pipeline like this). The ForEach-Object loop does not declare an internal variable (such as ($line in $content)) and instead the scriptblock uses the automatic variable $_. So your loop needs to become something like:
Get-Content $file | ForEach { <do stuff> } | Set-Content
Next let's look inside that loop. You use an If statement to see if the line contains "CN=", understandable, and functional. If it does, you then split the line on commas, and then again on equals, selecting the third record ([2]). Hm, you create an array of strings any time you split one, and you have split a string twice, but only specify which record of the array you want to work with for the second split. That could be a problem. Anyway, you assign that substring to $variable, and proceed to replace that whole thing with "company" and store that output in $variable1. So there's a couple of issues here. Once you split the string on the commas you have the following array of strings:
"LDAP Location = **CN=EANG04W**"
"OU=EANG"
"DC=afp"
"DC=local"
That's an array with 4 string objects. So then you try to split at least one of those (because you don't specify which one) on the equals sign. You now have an array with 4 array objects, where each of those has 2 string objects:
("LDAP Location", "**CN", "EANG04W**")
("OU", "EANG")
("DC","afp")
("DC","local")
You do specify the third record at this point (arrays in PowerShell start at record 0, so [2] specifies the third record). But you didn't specify which record in the first array so it's just going to throw errors. Let's say that you actually selected what you really wanted though, and I'm guessing that would be "EANG04W". (by the way, that would be $_.Split(",")[0].Split("=")[1]). You then assign that to $Variable, and proceed to replace all of it with "Company", so after PowerShell expands the variable it would look like this:
$variable1 = "EANG04W" -replace "EANG04W", "company"
Ok, you just successfully assigned "company" to a variable. And your If statement ends there. You never output anything from inside your If statement, so Set-Content has nothing to set. Also, it would set that nothing for each and every line that is piped to the ForEach statement, re-writing the file each time, but fortunately for you the script didn't work so it didn't erase your file. Plus, since you were trying to pipe to Set-Content, there was no output at the end of the pipeline, you have assigned absolutely nothing to $content.
So let's try and fix it, shall we? First line? Works great! No change. Now, we aren't saving anything in a variable, we just want to update a file's content, so there's no need to have $Content = there. We'll just move on then, shall we? We pipe the Get-Content into a ForEach loop, just like you tried to do. Once inside the ForEach loop, we're going to do things a bit differently though. The -replace method performs a RegEx match. We can use that to our advantage here. We will replace the text you are interested in for each line, and if it's not found, no replacement will be made, and pass each line on down the pipeline. That will look something like this for the inside of the ForEach:
$_ -replace "(?<=CN\=).*?(?=,)", "Company"
The breakdown of that RegEx match can be seen here: https://regex101.com/r/gH6hP2/1
But, let's just say that it looks for text that has 'CN=' immediately before it, and goes up to the first comma following it. In your example, that includes the two trailing asterisks, but it doesn't touch the leading ones. Is that what you intended? That would make the last line of your example file:
Primary Owner = **CN=Company,OU=EANG,DC=afp,DC=localenter code here
Well, if that is as intended, then we have a winner. Now we close out the ForEach loop, and pipe the output to Set-Content and we're all set! Personally, I would highly suggest outputting to a new file, in case you need to reference the original file for some reason later, so that's what I'm going to do.
$file = 'D:\sources\scripts\2.txt'
$newfile = Join-Path (split-path $file) -ChildPath ('Updated-'+(split-path $file -Leaf))
Get-Content $file | ForEach{$_ -replace "(?<=CN\=).*?(?=,)", "Company"} | Set-Content $newfile
Ok, that's it. That code will produce D:\sources\scripts\Updated-2.txt with the following content:
Computer Location = afp.local/EANG
Description = RED_TXT
Device Name = EANG04W
Domain Name = afp.local
Full Name = Admintech
Hardware Monitoring Type = ASIC2
Last Blocked Application Scan Date = 1420558125
Last Custom Definition Scan Date = 1348087114
Last Hardware Scan Date = 1420533869
Last Policy Sync Date = 1420533623
Last Software Scan Date = 1420533924
Last Update Scan Date = 1420558125
Last Vulnerability Scan Date = 1420558125
LDAP Location = **CN=Company,OU=EANG,DC=afp,DC=local
Login Name = ADMINTECH
Main Board OEM Name = Dell Inc.
Number of Files = 384091
Primary Owner = **CN=Company,OU=EANG,DC=afp,DC=localenter code here

Parse text with powershell and display only what you were searching for

I'm trying to parse a text file with this format:
\\fileshare40\abccheck\logons\ABC64ZXZ.txt:5398:UserID: abcusernamehere Logged: 09:18:36 2014/03/13
\\fileshare40\abccheck\logons\ABC63BZB.txt:5403:UserID: abcusernamehere Logged: 01:21:31 2014/03/14
\\fileshare40\abccheck\logons\ABC61ZSF.txt:5408:UserID: abcusernamehere Logged: 08:22:31 2014/03/17
\\fileshare40\abccheck\logons\ABC62ETB.txt:5413:UserID: abcusernamehere Logged: 07:58:52 2014/03/18
\\fileshare40\abccheck\logons\ABC60BBB.txt:5418:UserID: abcusernamehere Logged: 13:11:36 2014/03/19
The only thing I need out of here is the machine name (ABC*****). Later on I'll put it into an array to see if there are duplicates, but the answer here will get me started on that path.
I've tried this:
$abc = select-string -path c:\users\abcusernamehere\desktop\findusermachines.txt -pattern "TCWS....." -allmatches
But doing so displays the whole line of text in that file. How can I break up the line to JUST find and display what I'm searching for?
For that, you can use a regex:
(get-content c:\users\abcusernamehere\desktop\findusermachines.txt) -replace '.+\\(.+?)\.txt:.+','$1'
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB
There's really no point in using Select-String to search for the lines that have the server names if they all have one. Just use Get-Content, and run the -replace operator against all the lines.
You can do that with a regex match.
[regex]::Matches((gc c:\users\abcusernamehere\desktop\findusermachines.txt),"ABC.....")|select -ExpandProperty value
That spits back:
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB
Yet another answer :-)
${c:\users\abcusernamehere\desktop\findusermachines.txt} | ? { $_ -cmatch "\b(?<MACHINE>ABC.+)\b.txt" } | % { $Matches['MACHINE'] }
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB
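Outside PowerShell, the same extraction is also a one-liner with grep; a sketch assuming the machine names are always "ABC" followed by uppercase alphanumerics:

```shell
# -o prints only the matched part of each line; the trailing ".txt" stops
# the match because "." is not in the character class.
printf '%s\n' \
  '\\fileshare40\abccheck\logons\ABC64ZXZ.txt:5398:UserID: abcusernamehere Logged: 09:18:36 2014/03/13' \
  '\\fileshare40\abccheck\logons\ABC63BZB.txt:5403:UserID: abcusernamehere Logged: 01:21:31 2014/03/14' |
grep -oE 'ABC[A-Z0-9]+'
# prints:
# ABC64ZXZ
# ABC63BZB
```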

Parse a file to pull out 2 sets of information

I have a log file that keeps a record of user inputs. Each line in the log is unique and I need to pull out 2 specific items - a userId and a URL. I can't just use awk '{print $1, $6}' < file because the items are not always in the same position in each line.
Example text:
userId='1' managed:no address:123street phone:1234567890 http:/someurl.com
newuser:yes userId='2' managed:yes address:123street http:/someurl.com
userId='3' address:123 street phone:1234567890 http:/someurl.com
userId='4' managed:no address:123street phone:1234567890 http:/someurl.com
I need to parse the userId and URL address to a file, but these are not always in the same position in each line. Any suggestions would be greatly appreciated.
$ awk '{for(i=1;$i!~/userId/;i++); print $i, $NF}' file
userId='1' http:/someurl.com
userId='2' http:/someurl.com
userId='3' http:/someurl.com
userId='4' http:/someurl.com
Try the following gawk code :
gawk '{
    for (i=1; i<=NF; i++)
        if ($i ~ "^userId=") id=gensub(/userId=\047([0-9]+)\047/, "\\1", "", $i)
        else if ($i ~ "^http") url=$i
    print "In line "NR", the id is "id" and the url is "url
}' file.txt
Sample input :
userId='1' managed:no address:123street phone:1234567890 http:/someurl1.com
newuser:yes userId='2' managed:yes address:123street http:/someurl2.com
userId='3' address:123 street phone:1234567890 http:/someurl3.com
userId='4' managed:no address:123street phone:1234567890 http:/someurl4.com
Sample output :
In line 1, the id is 1 and the url is http:/someurl1.com
In line 2, the id is 2 and the url is http:/someurl2.com
In line 3, the id is 3 and the url is http:/someurl3.com
In line 4, the id is 4 and the url is http:/someurl4.com
This solution has the advantage that the id and http items can be anywhere in the line.
With awk:
awk '{for(c=1;c<NF;c++){if(match($c,/userId/)){print $c,$NF; break}}}' your.file
Output:
userId='1' http:/someurl.com
userId='2' http:/someurl.com
userId='3' http:/someurl.com
userId='4' http:/someurl.com
