I have a log file that keeps a record of user inputs. Each line in the log is unique, and I need to pull out two specific items: a userId and a URL. I can't just use awk '{print $1, $6}' file because the items are not always in the same position on each line.
Example text:
userId='1' managed:no address:123street phone:1234567890 http:/someurl.com
newuser:yes userId='2' managed:yes address:123street http:/someurl.com
userId='3' address:123 street phone:1234567890 http:/someurl.com
userId='4' managed:no address:123street phone:1234567890 http:/someurl.com
I need to extract the userId and the URL into a file, but these are not always in the same position on each line. Any suggestions would be greatly appreciated.
Walk across the fields until you hit the one containing userId, then print it together with the last field (the URL):
$ awk '{for(i=1;$i!~/userId/;i++); print $i, $NF}' file
userId='1' http:/someurl.com
userId='2' http:/someurl.com
userId='3' http:/someurl.com
userId='4' http:/someurl.com
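Since you want the result in a file, just redirect the output (the output file name here is only an example):
awk '{for(i=1;$i!~/userId/;i++); print $i, $NF}' file > ids_and_urls.txt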
Try the following gawk code:
gawk '{
    for (i=1; i<=NF; i++)
        # \047 is the octal escape for a single quote, so this strips userId=\047...\047 down to the bare number
        if ($i ~ "^userId=") id = gensub(/userId=\047([0-9]+)\047/, "\\1", "g", $i)
        else if ($i ~ "^http") url = $i
    print "In line " NR ", the id is " id " and the url is " url
}' file.txt
Sample input :
userId='1' managed:no address:123street phone:1234567890 http:/someurl1.com
newuser:yes userId='2' managed:yes address:123street http:/someurl2.com
userId='3' address:123 street phone:1234567890 http:/someurl3.com
userId='4' managed:no address:123street phone:1234567890 http:/someurl4.com
Sample output :
In line 1, the id is 1 and the url is http:/someurl1.com
In line 2, the id is 2 and the url is http:/someurl2.com
In line 3, the id is 3 and the url is http:/someurl3.com
In line 4, the id is 4 and the url is http:/someurl4.com
This solution has the advantage that the userId and http items can appear anywhere in the line.
With awk:
awk '{for(c=1;c<NF;c++){if(match($c,/userId/)){print $c,$NF; break}}}' your.file
Output:
userId='1' http:/someurl.com
userId='2' http:/someurl.com
userId='3' http:/someurl.com
userId='4' http:/someurl.com
I want to print the words that start with, for example, srcip and srcintf from this line in /var/log/syslog:
Jul 21 13:13:35 some-name date=2020-07-21 time=13:13:34 devname="devicename" devid="deviceid" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" eventtime=1595330014 srcip=1.2.3.4 srcport=57324 srcintf="someinterface" srcintfrole="wan" dstip=5.6.7.8 dstport=80 dstintf="anotherinterface" dstintfrole="lan" sessionid=supersecretid proto=6 action="deny" policyid=0 policytype="policy" service="HTTP" dstcountry="Sweden" srccountry="Sweden" trandisp="noop" duration=0 sentbyte=0 rcvdbyte=0 sentpkt=0 appcat="unscanned" crscore=30 craction=131072 crlevel="high"
to something that looks like this
date=2020-07-21 time=13:13:34 devname="devicename" action="deny" policyid=0 srcintf="someinterface" dstintf="anotherinterface" srcip=1.2.3.4 srcport=57324 -----> dstip=5.6.7.8 dstport=80
Currently I'm using awk to do it; the scalability of it is pretty bad for obvious reasons:
cat /var/log/syslog | awk '{print $5,$6,$7,$25,$26,$17,$21,$15,$16,"-----> "$19,$20}'
Also, not all the lines have srcip in the same "field", so some lines come out really skewed.
Or would a syslog message rewriter be better for this purpose? How would you go about solving this? Thanks in advance!
$ cat tst.awk
{
    delete f
    # index every field from the 5th on (skipping the syslog timestamp/hostname prefix) by the key name before its "="
    for (i=5; i<=NF; i++) {
        split($i,tmp,/=/)
        f[tmp[1]] = $i
    }
    print f["date"], f["time"], f["devname"], f["action"], f["policyid"], f["srcintf"], \
          f["dstintf"], f["srcip"], f["srcport"], "----->", f["dstip"], f["dstport"]
}
$ awk -f tst.awk file
date=2020-07-21 time=13:13:34 devname="devicename" action="deny" policyid=0 srcintf="someinterface" dstintf="anotherinterface" srcip=1.2.3.4 srcport=57324 -----> dstip=5.6.7.8 dstport=80
The above assumes your quoted strings do not contain spaces as shown in your sample input.
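If your real logs do contain quoted strings with spaces, one way around that (a sketch only, not tested against your actual data) is to pull the key=value pairs off with match() so a double-quoted value is consumed as a single unit:
awk '{
    delete f
    s = $0
    # repeatedly grab the next key=value pair; the value may be bare or double-quoted (and a quoted value may contain spaces)
    while (match(s, /[[:alnum:]_]+=("[^"]*"|[^ ]*)/)) {
        kv = substr(s, RSTART, RLENGTH)
        split(kv, tmp, "=")
        f[tmp[1]] = kv
        s = substr(s, RSTART + RLENGTH)
    }
    print f["date"], f["time"], f["devname"], f["action"], f["policyid"], f["srcintf"], \
          f["dstintf"], f["srcip"], f["srcport"], "----->", f["dstip"], f["dstport"]
}' file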
Here is an awk answer which is flexible and, instead of a simple one-liner, a bit more programmatic. Your log file has lines that in general look like:
key1=value1 key2=value2 key3=value3 ...
The idea in this awk is to break each line down into an associative array, so that the elements can be accessed as:
a[key1]=>value1 a[key2]=>value2 ... a[key2,"full"]=>key2=value2 ...
Using a function which is explained in this answer, you can write:
awk '
    # str2map: split str on fs1 into pairs, then split each pair on fs2,
    # storing map[key]=value and map[key,"full"]="key=value"
    function str2map(str,fs1,fs2,map,    n,tmp) {
        n=split(str,map,fs1)
        for (;n>0;n--) {
            split(map[n],tmp,fs2);
            map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
            delete map[n]
        }
    }
    { str2map($0," ","=",a) }
    { print a["date","full"],a["time","full"],a["devname","full"],a["action","full"] }
' file
This method is very flexible: there is no dependency on the order of the fields in the line.
Note: the above method does not take care of quoting, so if a space appears within a quoted string, it might mess things up.
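For example, to build the srcip/dstip line you asked for, only the last print block of the program above has to change; the key names below are taken straight from your sample line:
{ print a["date","full"], a["time","full"], a["devname","full"], a["action","full"], a["policyid","full"], \
        a["srcintf","full"], a["dstintf","full"], a["srcip","full"], a["srcport","full"], \
        "----->", a["dstip","full"], a["dstport","full"] }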
If you have filter.awk:
BEGIN{
    split(filter,a,",");
    for (i in a){
        f[a[i]]=1;
    }
}
{
    for (i=1; i<=NF; i++) {
        split($i,b,"=");
        if (b[1] in f){
            printf("%s ", $i);
        }
    }
    printf("\n");
}
you can do:
awk -v filter="srcip,srcintf" -f filter.awk /var/log/syslog
In filter you specify, comma separated, the keywords it has to find.
Note: this script also assumes the file is of the form key1=value key2=value ... and that there are no spaces in the values.
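With the sample syslog line above, that invocation should print something like:
srcip=1.2.3.4 srcintf="someinterface"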
I need to create a dataset in a file that is not covered by the .make-command. How can I achieve this?
I tried it using the file-identifier you use in the .select-command and the correct group-identifier (e.g. group3). When run, it prompted "wrong group".
You can also use EDP or EPI.
A short example of how to create a customer using EDP:
.type text xtedp xterr xtres xtsys
..
..: file containing the EDP commands
.file -TEMPNAME U|xtedp
..: file containing the error output of the edp command
.file -TEMPNAME U|xterr
..: file containing the id of the new customer
.file -TEMPNAME U|xtres
..
..: Create the edp command file containing two new customers
.input DATEI.F
.output new 'U|xtedp'
# here you can write a comment for the edp file
#!database=0
#!group=1
#!action=new
#!password=yourpassword
#!charset=EKS
#!report=NUM
#!DONTCHANGE=-
#!TRANSACTION=1
# now we list all fields which we want to write
such#name#ans#plz#nort#str#
DOW#John Dow Ldt;#John Dow Ltd#12345#Someplace#Somestreet#
Max#Max Ldt;#Max Ltd#22345#Someplace2#Somestreet2#
..
..: close edp file
.output TERMINAL
..
..: Execute the edp command
.formula U|xtsys = "edpimport.sh " + " -t# -I " + 'U|xtedp' + " > " + 'U|xtres' + " 2> " + 'U|xterr'
.system 'U|xtsys' background
..
..: G|mehr or G|success is "true" when the command could be executed successfully (on some abas-ERP versions G|success does not work, use G|mehr)
.continue ERROR ? _G|success
.continue SHOW
!ERROR
..: Do something here!!
..
!SHOW
New customer(s) created:
.input -TEXT 'U|xtres'
.continue
The great benefit of using EDP is that you can use transactions: if one operation fails, all changes are rolled back.
There is a workaround via .command. When the invisible property is set to 1, the mask does not become visible and the dataset is saved immediately.
You can use it as follows:
.formula xtCmd = "<File-Identifier> <new>, Group-Identifier ? param1=value1|param2=value2|[invisible]=1"
.command -WAIT -ID maskID 'U|xtCmd'
#GET TEXT FILE WITH LIST OF "SAMACCOUNTNAME" TO LIST VARIABLE
$list = Get-Content C:\PSSCripts\listofusers.txt
#PULL INFORMATION FROM ACTIVE DIRECTORY TO USERRESULTS VARIABLE
$UserResults = Get-AdUser -filter * -searchbase "OU=THISOU,DC=THISDOMAIN,DC=int" -Properties displayname
#DETERMINE IF USER IS IN THE TXT LIST
foreach ($user in $UserResults)
{
if ($user.SamAccountName -in $list.SamAccountName)
{
#ECHO THEIR NAME TO VERIFY
write-host $user.displayName
}
}
#VERIFY USER TO BE OFFBOARDED VIA Y/N PROMPT - VISUALLY INSPECT LIST
$choice = ""
while ($choice -notmatch "[y|n]"){
$choice = read-host "The following user profiles have been loaded for offboarding. Do you want to continue? Please Verify the users before continuing. (Y/N)"
}
if ($choice -eq "y"){
# LOOP THROUGH USERS AND APPLY CHANGES
foreach ($user in $UserResults)
{
#DETERMINE IF USER IS IN TXT FILE
if ($user.SamAccountName -in $list.SamAccountName)
{
# DISABLE ACCOUNT
Disable-ADAccount -Identity $user
# CHANGE DISPLAYNAME AND DESCRIPTION TO DISPLAY TERMINATED - $USER
$newname = "Terminated - " + $user.displayName
Get-ADUser -Identity $user | Set-ADObject -Description $newname -DisplayName $newname
# CHANGE USER PASSWORD TO "Password1"
$password = "Password1" | ConvertTo-SecureString -AsPlainText -Force
Set-ADAccountPassword -NewPassword $password -Identity $user -Reset
# MOVE USER TO DIFFERENT LOCATION, Disabled Users organizational unit
Move-ADObject -Identity $user -TargetPath "OU=DisabledUsers,DC=THATDOMAIN,DC=int" -Confirm:$false
}
}
}
else {write-host "Script aborted!"}
Getting the following error:
You must provide a value expression on the right-hand side of the '-' operator. At :11 char:29
if ($user.SamAccountName - <<<< in $list.SamAccountName)
Category Info : ParserError (:) [], ParseException
FullyQualifiedErrorID : ExpectedValueExpression
I have a list of users in a text file with the header SAMACCOUNTNAME. These users are being checked against the list of users in a particular OU. Powershell will echo the list of users in my text list to me (after having checked it against all the users in that OU in AD - to verify nothing is being offboarded / changed in error), prompt to verify (y|n) before moving forward and executing a script I wrote with the help of some redditors from /r/powershell earlier.
I'm not understanding why I'm getting this error. Is
-in $list.SamAccountName
not correct?
Thanks for the help, stackoverflow! First time posting, looking forward to getting better with Powershell and giving back to the community.
You should use "-eq" or "-contains" (I am not sure what is a scalar value and what is an array in your program). Note that the -in operator only exists in PowerShell 3.0 and later; the <<<< marker in your error output suggests you are running PowerShell 2.0, which is why the parser rejects it.
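A minimal sketch of the check using -contains (untested here; it assumes listofusers.txt simply holds one account name per line, so a header line would just never match anything):
# read the list of account names; Get-Content returns an array of strings
$list = Get-Content C:\PSSCripts\listofusers.txt

foreach ($user in $UserResults)
{
    # array -contains scalar also works on PowerShell 2.0, unlike scalar -in array
    if ($list -contains $user.SamAccountName)
    {
        Write-Host $user.DisplayName
    }
}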
I'm trying to parse a text file with this format:
\\fileshare40\abccheck\logons\ABC64ZXZ.txt:5398:UserID: abcusernamehere Logged: 09:18:36 2014/03/13
\\fileshare40\abccheck\logons\ABC63BZB.txt:5403:UserID: abcusernamehere Logged: 01:21:31 2014/03/14
\\fileshare40\abccheck\logons\ABC61ZSF.txt:5408:UserID: abcusernamehere Logged: 08:22:31 2014/03/17
\\fileshare40\abccheck\logons\ABC62ETB.txt:5413:UserID: abcusernamehere Logged: 07:58:52 2014/03/18
\\fileshare40\abccheck\logons\ABC60BBB.txt:5418:UserID: abcusernamehere Logged: 13:11:36 2014/03/19
The only thing I need out of here is the machine name (ABC*****). Later on I'll put it into an array to see if there are duplicates, but the answer here will get me started on that path.
I've tried this:
$abc = select-string -path c:\users\abcusernamehere\desktop\findusermachines.txt -pattern "TCWS....." -allmatches
But doing so displays the whole line of text in that file. How can I break up the line to JUST find and display what I'm searching for?
For that, you can use a regex:
(get-content c:\users\abcusernamehere\desktop\findusermachines.txt) -replace '.+\\(.+?)\.txt:.+','$1'
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB
There's really no point in using Select-String to search for the lines that have the server names if they all have one. Just use Get-Content, and run the -replace operator against all the lines.
You can do that with a regex match.
[regex]::Matches((gc c:\users\abcusernamehere\desktop\findusermachines.txt),"ABC.....")|select -ExpandProperty value
That spits back:
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB
Yet another answer :-)
${c:\users\abcusernamehere\desktop\findusermachines.txt} | ? { $_ -cmatch "\b(?<MACHINE>ABC.+)\b\.txt" } | % { $Matches['MACHINE'] }
ABC64ZXZ
ABC63BZB
ABC61ZSF
ABC62ETB
ABC60BBB
I have a file that has multiple lines like the following:
"<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)"
"<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)"
In the first line imsi exists and in the second line it does not.
For every line that starts with the word sender (there are other lines in the file) I want to extract both the msisdn value and the imsi value. If the imsi value is not there, I would like to print out imsi: Unknown.
I tried the following but it does not work:
/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; /imsi/! {s/.*/imsi: Unknown/}; p};
What am I missing?
Your match for "msisdn" is stripping out the "imsi" so the negative match is always taken. Simply copy your line into hold space, do your "msisdn" processing, swap the hold space back into pattern space, then do your "imsi" processing:
/sender / {h; /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p;x; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; /imsi/! {s/.*/imsi: Unknown/};p}
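Run under GNU sed with -n and -r (as in your original pipeline), your two sample lines should produce:
msisdn: +919892000000
imsi: 892000000
msisdn: +919892000000
imsi: Unknown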
This can be done using the following sed script:
s/^.*sender .*msisdn="\([^"]*\)" .* imsi="\([^"]*\)".*$/msisdn: \1, imsi: \2/
t
s/^.*sender .*msisdn="\([^"]*\)".*$/msisdn: \1, imsi: Unknown/
t
d
The first s command rewrites all sender lines containing the imsi field.
The first t command will continue with the next line (auto-printing the rewritten result) if the previous s command succeeded.
The second s command rewrites all sender lines without the imsi field.
The second t command will continue with the next line if the previous s command succeeded.
The d command will remove all other lines.
In order to run this script, just copy it to a file and run it using sed -f script file.
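For the two sample sender lines above, that should print (note that the script relies on sed's default auto-print, so do not add -n):
msisdn: +919892000000, imsi: 892000000
msisdn: +919892000000, imsi: Unknown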
Just to add why I am using sed for this particular problem: the following is the multi-line sed that I am using to create a data structure to pass into awk:
cat xmlEventLog_2010-03-23T* |
sed -nr "/<event eventTimestamp/,/<\/event>/ {
/event /{/uniqueId/ {s/.*uniqueId=\"([^\"]+)\".*/\nuniqueId: \1/g}; /uniqueId/! {s/.*/\nuniqueId: Unknown/}; p};
/payloadType / {/type/ {s/.*type=\"([^\"]+)\".*/payload: \1/g}; /type/! {s/.*protocol=\"([^\"]+)\".*/payload: \1/g}; p};
***/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; p; /imsi/! {s/.*/imsi: Unknown/}; p};
/result /{s/.*value=\"([^\"]+)\".*/result: \1/g; p}; /filter code/{s/.*type=\"([^\"]+)\".*/type: \1/g; p}}" |
awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"} $4~/payload: SMS-MT-FSM-INFO|SMS-MT-FSM|SMS-MT-FSM-DEL-REP|SMS-MT-FSM-DEL-REP-INFO|SMS-MT-FSM-DEL-REP/ && $2~/result: Blocked|Modified/ && $3~/msisdn: +919844000011/ {$1=$1 ""; print}'
This parses out files that are filled with events like so:
<event eventTimestamp="2010-03-23T00:00:00.074" originalReceivedMessageSize="28" uniqueId="1280361600.74815_PFS_1_2130328364" deliveryReport="true">
<result value="Allowed"/>
<source name="MFE" host="PFS_1"/>
<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
<profile code=""/>
<mvno code=""/>
</sender>
<recipients>
<recipient code="+919844000039" imsi="892000000" SccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
</recipient>
</recipients>
<payload>
<payloadType protocol="SMS" type="SMS-MT-FSM-DEL-REP"/>
<message signature="70004b7c9267f348321cde977c96a7a3">
<MailFrom value=""/>
<rcptToList>
</rcptToList>
<pduList>
<pdu type="SMS_SS_REQUEST_IND" time="2010-07-29T00:00:00.074" source="SMSPROBE" dest="PCF"/>
<pdu type="SMS_SS_REQ_HANDLING_STOP" time="2010-07-29T00:00:00.074" source="PCF" dest=""/>
</pduList>
<numberOfImages>0</numberOfImages>
<attachments numberOf="1">
<attachment index="0" size="28" contentType="text/plain"/>
</attachments>
<emailSmtpDeliveryStatus value="" time="" reason=""/>
<pepId value="989350000109.989350000209.0.0"/>
</message>
</payload>
<filters>
</filters>
</event>
There could be up to 10000 events like the one above in each file, and there will be hundreds of files. The structures output for awk should be of the form:
uniqueId: 1280361600.208152_PFS_1_1509661383
result: Allowed
msisdn: +919892000000
imsi: 892000000
payload: SMS-MT-FSM-DEL-REP
filter:
So for this reason I need to extract two values from the sender line and different values from the other lines. The above filter extracts everything correctly except for the part where the sender line is handled (marked *** in the filter). So I just want to extract the two items from the sender line for the structure. Multiple attempts have failed.
I used Perl to solve your problem.
cat file | perl -n -e 'if (/sender.*msisdn="([^"]*)"(.*imsi="([^"]*)")?/) { print $1, " ", $3 || "unknown", "\n"; }'
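For the two sample sender lines at the top of the question, this prints:
+919892000000 892000000
+919892000000 unknown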