I have a file whose fields are separated by more than one kind of character. For example:
abc sometext def;ghi=123;
abc sometext def;ghi=123;
abc sometext def;ghi=123;
Now I want to parse the file with AWK to extract the fields. For example, to get all the values of 'ghi':
awk '{print $3}' inputFile.txt | awk 'BEGIN {FS = "="} { print $NF }'
Is there any way to parse the file in one shot instead of using multiple pipes and AWK commands?
Yes, you can use the split function in awk
awk '{split($3,a,"=");print a[2]}'
123;
123;
123;
This splits field 3 on = into the array a, then prints the second element, a[2].
If field 3 may contain a varying number of = separated parts and you want the last one, do this:
awk '{n=split($3,a,"=");print a[n]}'
123;
123;
123;
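As a small illustration of split's return value, the sketch below feeds awk two made-up lines (not from the question) whose third field has a different number of = separated parts. n is the count returned by split, so a[n] is always the last part.

```shell
# Hypothetical sample lines: field 3 has one "=" in the first, two in the second.
out=$(printf '%s\n' 'abc sometext ghi=123' 'abc sometext ghi=x=456' |
  awk '{ n = split($3, a, "="); print n, a[n] }')
printf '%s\n' "$out"
```

The first column of the output is the number of parts, the second is the last part.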
In your case, this will do too:
awk -F= '{print $NF}'
This can also be accomplished using multiple field separators in awk:
$ awk -F"[=;]" '{print $3}' file
123
123
123
This tells awk to use field separators = or ;. Based on that, the numbers you want are in the 3rd position.
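To see how the bracket expression splits a record, this sketch prints every field with its index for one line of the sample input. It is only an illustration of the field layout, not part of the answer above.

```shell
# Dump each field of one sample line, using "=" or ";" as the separator.
out=$(printf '%s\n' 'abc sometext def;ghi=123;' |
  awk -F'[=;]' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')
printf '%s\n' "$out"
```

The trailing ; produces an empty fourth field, which is why $NF would not give the number here while $3 does.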
If the key name (ghi here) may vary and you need to match on it, you can also use GNU grep with a look-behind (this needs the -P option):
$ grep -Po '(?<=ghi=)\d+' file
123
123
123
This will print all digits after ghi=.
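Where grep -P is not available, a similar lookup by key name can be sketched in awk alone. The function name lookup and the variable key are made up for this illustration, and it assumes the key=value pairs are delimited by ; as in the sample.

```shell
# lookup KEY: hypothetical helper. Reads lines on stdin and prints the value
# of every KEY=value pair found between ";" separators.
lookup() {
  awk -v key="$1" -F';' '{
    for (i = 1; i <= NF; i++) {
      n = split($i, kv, "=")            # kv[1] = name, kv[2] = value
      if (n == 2 && kv[1] == key) print kv[2]
    }
  }'
}

out=$(printf '%s\n' 'abc sometext def;ghi=123;' | lookup ghi)
printf '%s\n' "$out"
```

Unlike the look-behind with \d+, this prints whatever follows the = sign, digits or not.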
I have this text file:
# cat letter.txt
this
is
just
a
test
to
check
if
grep
works
The letter "e" appears in 3 words.
# grep e letter.txt
test
check
grep
Is there any way to return the letter printed to the left of the selected character?
expected.txt
t
h
r
Based on the samples shown, you could try the following awk command:
awk '/e/{print substr($0,index($0,"e")-1,1)}' Input_file
Explanation of the command above:
awk '                ## Start of the awk program.
/e/{                 ## If the current line contains an e, do the following.
  print substr($0,index($0,"e")-1,1)
                     ## Print 1 character, starting one position before the first e.
}
' Input_file         ## Name of the input file.
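To make the index/substr arithmetic concrete, this sketch prints the position of the first e and the character just before it for three of the sample words. It assumes, as in the samples, that e is never the first character of the line; pos is a variable introduced for the illustration.

```shell
out=$(printf '%s\n' test check grep |
  awk '/e/ {
    pos = index($0, "e")                 # 1-based position of the first e
    printf "%s: index=%d char=%s\n", $0, pos, substr($0, pos - 1, 1)
  }')
printf '%s\n' "$out"
```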
You can use positive lookahead to match a character that is followed by an e, without making the e part of the match.
grep -oP '.(?=e)' letter.txt
With sed:
sed -nE 's/.*(.)e.*/\1/p' letter.txt
Assuming you have this input file:
cat file
this
is
just
a
test
to
check
if
grep
works
egg
element
You may use this grep + sed solution to find the letter, or an empty string, before each e:
grep -oE '(^|.)e' file | sed 's/.$//'
t
h
r
l
m
Or alternatively this single awk command should also work:
awk -F 'e' 'NF > 1 {
for (i=1; i<NF; i++) print substr($i, length($i), 1)
}' file
This might work for you (GNU sed):
sed -nE '/(.)e/{s//\n\1\n/;s/^[^\n]*\n//;P;D}' file
Turn off implicit printing and enable extended regexp -nE.
Focus only on lines that meet the requirements i.e. contain a character before e.
Surround the required character by newlines.
Remove any characters before and including the first newline.
Print the first line (up to the second newline).
Delete the first line (including the newline).
Repeat.
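The same loop can be sketched in plain awk, which may be easier to follow than the newline bookkeeping. This is a swapped-in equivalent using match() and RSTART, not the sed script itself; s is a scratch variable introduced here, and the three input words are taken from the sample file.

```shell
out=$(printf '%s\n' test element egg |
  awk '{
    s = $0
    while (match(s, /.e/)) {             # leftmost character followed by e
      print substr(s, RSTART, 1)         # the character before that e
      s = substr(s, RSTART + 2)          # drop everything up to and including the e
    }
  }')
printf '%s\n' "$out"
```

Like the sed version, this only reports an e that has a character in front of it, so a leading e (as in egg) produces no output.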
N.B. The solution prints each such character on a separate line.
To print all such characters from one input line together on a single output line, use:
sed -nE '/(.e)/{s//\n\1/g;s/^/e/;s/e[^\n]*\n?//g;s/\B/ /g;p}' file
N.B. Remove the s/\B/ /g if space separation is not needed.
With GNU awk you can use empty string as FS to split the input as individual characters:
awk -v FS= '/[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file
t
h
r
The for loop starts at i=2, so an "e" at the beginning of the line is skipped.
To print an empty string when e is the first character in the word, add a rule for that case.
For example, with this input:
cat file2
grep
erroneously
egg
Wednesday
effectively
awk -v FS= '/^[e]/ {print ""} /[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file2
r
n
W
n
f
v
I have some files, and I want grep to return the lines that contain at least one Position:"Engineer" AND at least one Position that is not "Engineer".
So for the file below, only the first line should be returned:
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"
I could write something like
grep 'Position:"Engineer"' filename | grep 'Position:"Accountant"'
And this works fine (I get only the first line), but I don't know all the possible values of Position, so the grep needs to be generic, something like
grep 'Position:"Engineer"' filename | grep -v 'Position:"Engineer"'
But this doesn't return anything (as the two greps contradict each other).
Do you have any idea how this can be done?
This line works:
grep "^Position:\"Engineer\"" filename | grep -v " Position:\"Engineer\""
The first expression, anchored with "^", matches Position:"Engineer" only at the beginning of the line; the second, with its leading space, then removes lines in which a later Position is also "Engineer". Note that this only finds lines where "Engineer" is the first Position.
You can avoid the pipe and additional subshell by using awk if that is allowed, e.g.
awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
The first rule checks whether the first field contains Engineer; if so, it checks whether field 3 also contains Engineer and, if it does, skips the record, otherwise prints it. The second rule swaps the order of the tests. The net result is that a line is printed only when Engineer appears in exactly one of the two fields (either the first or the third, but not both).
Example Use/Output
With your sample input in file, you would have:
$ awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
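If the Position fields are not always fields 1 and 3, a more general check can be sketched by scanning every field. This assumes fields are whitespace-separated and values contain no spaces, as in the sample; eng and other are counters introduced for the illustration.

```shell
out=$(printf '%s\n' \
  'Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"' \
  'Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"' |
  awk '{
    eng = other = 0
    for (i = 1; i <= NF; i++) {
      if ($i == "Position:\"Engineer\"") eng++      # an Engineer position
      else if ($i ~ /^Position:/) other++           # any other position
    }
    if (eng && other) print                         # need at least one of each
  }')
printf '%s\n' "$out"
```

Only the first sample line has both an Engineer and a non-Engineer position, so only that line is printed.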
Use a negative lookahead to require a Position other than Engineer after the first match:
grep 'Position:"Engineer"' filename | grep -P 'Position:"(?!Engineer)'
With two greps in a pipe:
grep -F 'Position:"Engineer"' file | grep -Ev '(Position:"[^"]*").*\1'
or, perhaps more robustly
grep -F 'Position:"Engineer"' file | grep -v 'Position:"Engineer".*Position:"Engineer"'
In the general case, if you want to print the lines with unique Position fields,
grep -Ev '(Position:"[^"]*").*\1' file
should do the job, assuming all the lines have the format specified. This will work also when there are more than two Position fields in the line.
I have a program whose output is a summary file with a header and a few columns of results.
I want to show only two values, the file name and the best period prediction, and I use this command:
program input_file | gawk 'NR==2 {print $3}; NR==4 {print $2}'
As a result I get the two values in one column on two lines. What do I have to do to get them on one line, in two columns?
You could use:
program input_file | gawk 'NR==2 {heading = $3}; NR==4 {print heading " = " $2}'
This saves the value in $3 on line 2 in variable heading and prints the heading and the value from column 2 when it reads line 4.
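A self-contained sketch of the same idea follows, with a made-up four-line summary standing in for the real program output (the actual layout is not shown in the question) and printf used so the two values come out as tab-separated columns. The names fake_program and name are hypothetical. Nothing gawk-specific is used, so plain awk is enough.

```shell
# Hypothetical stand-in for the real program; its true output format is unknown.
fake_program() {
  printf '%s\n' \
    '# summary' \
    'Input file: star01.dat' \
    '# period power' \
    'best 2.345 0.98'
}

out=$(fake_program |
  awk '
    NR == 2 { name = $3 }                      # remember the file name from line 2
    NR == 4 { printf "%s\t%s\n", name, $2 }    # print both values on one line
  ')
printf '%s\n' "$out"
```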
I do the following to find every WORD in a file, except in lines that start with "//":
grep -v "//" file | grep WORD
Can I get some other elegant suggestion to find all occurrences of WORD in the file except lines that begin with //?
Remark: "//" does not necessarily exist at the beginning of the line; there could be some spaces before "//".
For example
// WORD
AA WORD
// ss WORD
grep -v "//" file | grep WORD
This will also exclude any lines with "//" after WORD, such as:
WORD // This line here
A better approach with GNU Grep would be:
grep -v '^[[:space:]]*//' file | grep 'WORD'
...which would first filter out any lines beginning with zero-or-more spaces and a comment string.
Trying to put these two conditions into a single regular expression is probably not more elegant.
awk '!/^[ \t]*\/\// && /WORD/{m=gsub("WORD","");total+=m}END{print total}' file
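The awk command above counts occurrences of WORD rather than printing lines. A variant that prints the matching lines, run here on the sample lines from the question plus the WORD // case, might look like this:

```shell
out=$(printf '%s\n' '// WORD' 'AA WORD' '    // ss WORD' 'WORD // This line here' |
  awk '!/^[ \t]*\/\// && /WORD/')       # not a comment line, and contains WORD
printf '%s\n' "$out"
```

Lines that start with optional blanks followed by // are skipped, while a // after WORD does not exclude the line.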
I have lines in a file which look like the following
....... DisplayName="john" ..........
where .... represents variable number of other fields.
Using the following grep command, I am able to extract all the lines which have a valid 'DisplayName' field:
grep DisplayName="[0-9A-Za-z[:space:]]*" e:\test
However, I wish to extract just the name (ie "john") from each line instead of the whole line returned by grep. I tried piping the output into the cut command but it does not accept string delimiters.
This works for me:
awk -F "=" '/DisplayName/ {print $2}'
which returns "john". To remove the quotes for john use:
awk -F "=" '/DisplayName/ {gsub("\"","");print $2}'
Specifically:
sed 's/.*DisplayName="\([^"]*\)".*/\1/'
should do it. The sed syntax is s/pattern/replacement/, where "/" is the delimiter. The escaped parentheses capture part of the match, and \1 in the replacement refers back to it; [^"]* stops the capture at the closing quote. The expression keeps the value captured after DisplayName=" and throws away the rest of the line.
This can also work without first using grep, if you use:
sed -n 's/.*DisplayName="\([^"]*\)".*/\1/p'
The -n option and the p flag tell sed to print only the lines that were changed.
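One thing to watch for when the line has other quoted fields after DisplayName: a \(.*\) capture is greedy and runs to the last quote on the line, while \([^"]*\) stops at the first closing quote. A sketch on a made-up line (the Id and Age fields are assumptions, not from the question):

```shell
line='Id="7" DisplayName="john" Age="30"'

# Greedy capture: runs to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*DisplayName="\(.*\)".*/\1/p')
# Bounded capture: stops at the quote that closes the DisplayName value.
exact=$(printf '%s\n' "$line" | sed -n 's/.*DisplayName="\([^"]*\)".*/\1/p')

printf 'greedy: %s\nexact:  %s\n' "$greedy" "$exact"
```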
More in: http://www.grymoire.com/Unix/Sed.html