I am trying to extract bin names from from Cargo.toml using Bash, I enabled perl regular expression like this
First attempt
grep -Pzo '(?<=(^\[\[bin\]\]))\s*name\s*=\s*"(.*)"' ./Cargo.toml
The regular expression is tested at regex101
But got nothing
the Pzo options usage can be found here
Second attempt
grep -P (?<=(^[[bin]]))\n*\sname\s=\s*"(.*)" ./Cargo.toml
Still nothing
grep -Pzo '(?<=(^\[\[bin\]\]))\s*name\s*=\s*"(.*)"' ./Cargo.toml
Cargo.toml
[[bin]]
name = "acme1"
path = "bin/acme1.rs"
[[bin]]
name = "acme2"
path = "src/acme1.rs"
grep:
grep -A1 '^\[\[bin\]\]$' |
grep -Po '(?<=^name = ")[^"]*(?=".*)'
or if you can use awk, this is more robust
awk '
$1 ~ /^\[\[?[[:alnum:]]*\]\]?$/{
if ($1=="[[bin]]" || $1=="[bin]") {bin=1}
else {bin=0}
}
bin==1 &&
sub(/^[[:space:]]*name[[:space:]]*=[[:space:]]*/, "") {
sub(/^"/, ""); sub(/".*$/, "")
print
}' cargo.toml
Example:
$ cat cargo.toml
[[bin]]
name = "acme1"
path = "bin/acme1.rs"
[bin]
name="acme2"
[[foo]]
name = "nobin"
[bin]
not_name = "hello"
name="acme3"
path = "src/acme3.rs"
[[bin]]
path = "bin/acme4.rs"
name = "acme4" # a comment
$ sh solution
acme1
acme2
acme3
acme4
Obviously, these are no substitute for a real toml parser.
With your shown samples and attempts, please try following code with tac + awk combination, which will be easier to maintain and does the job with easiness, which will be difficult in grep.
tac Input_file |
awk '
/^name =/{
gsub(/"/,"",$NF)
value=$NF
next
}
/^path[[:space:]]+=[[:space:]]+"bin\//{
print value
value=""
}
' |
tac
Explanation: Adding detailed explanation for above code.
tac Input_file | ##Using tac command on Input_file to print it in bottom to top order.
awk ' ##passing tac output to awk as standard input.
/^name =/{ ##Checking if line starts from name = then do following.
gsub(/"/,"",$NF) ##Globally substituting " with NULL in last field.
value=$NF ##Setting value to last field value here.
next ##next will skip all further statements from here.
}
/^path[[:space:]]+=[[:space:]]+"bin\//{ ##Checking if line starts from path followed by space = followed by spaces followed by "bin/ here.
print value ##printing value here.
value="" ##Nullifying value here.
}
' | ##Passing awk program output as input to tac here.
tac ##Printing values in their actual order.
Related
Am not able to get the desired o/p when the data field has pipe in it.
If the i/p is
SAmple file is tst
hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst"
lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst"
I tried with this cmd but dont get the expected o/p - cut -f2,3 -d"|" tst
The expected o/p is
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Is there an easy way that we can crack this o/p...Dont want to go with sed bcoz the tool that am using doesnt allow the charecter (""- backslash). I mean am embedding this command in one of the tool
Also am using old version of gawk -
so this cmd doesnt give te desired o/p
gawk -v FPAT='[^|]*|("[^"]*")+' '{print $2, $3}' OFS="|"
Output of gawk --version
GNU Awk 3.1.7
Output of cat -vet tst
hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst"$
lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst"$
Upgrading your gawk version is by far the best approach as you're missing a few bug fixes and a ton of extremely useful functionality introduced since gawk 3.1.7 came out 10+ years ago (we're currently on gawk version 5.1!) but if you can't do that for some reason then - here's what you can do if you don't have FPAT using any awk in any shell on every UNIX box:
$ cat tst.awk
BEGIN { OFS="|" }
{
orig = $0
$0 = i = ""
while ( (orig != "") && match(orig,/[^|]*|("[^"]*")+/) ) {
$(++i) = substr(orig,RSTART,RLENGTH)
orig = substr(orig,RSTART+RLENGTH+1)
}
print $2, $3
}
.
$ awk -f tst.awk file
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Just to verify that it's identifying all of the fields correctly:
$ cat tst.awk
BEGIN { OFS="|" }
{
orig = $0
$0 = i = ""
while ( (orig != "") && match(orig,/[^|]*|("[^"]*")+/) ) {
$(++i) = substr(orig,RSTART,RLENGTH)
orig = substr(orig,RSTART+RLENGTH+1)
}
print NF " <" $0 ">"
for (i=1; i<=NF; i++) {
print "\t" i " <" $i ">"
}
}
.
$ awk -f tst.awk file
5 <hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst">
1 <hdr1>
2 <"hdr2|tst">
3 <"hdr3|tst|tst">
4 <hdr4>
5 <"hdr5|tst|tst">
5 <lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst">
1 <lbl1>
2 <"lbl2|tst">
3 <"lbl3|tst|tst">
4 <lbl4>
5 <"lbl5|tst|tst">
if you don't have embedded double quotes, you can substitute the quoted delimiter values with another unused character (I used ~) and after extraction switch back to the original values. Obviously it requires that the new delimiter is not used within text.
$ awk 'BEGIN{OFS=FS="\""} {for(i=2;i<NF;i+=2) gsub("\\|","~",$i)}1' file |
awk 'BEGIN{OFS=FS="|"} {print $2,$3}' |
sed 's/~/|/g'
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Not sure it's simpler than the single awk script though.
Main problem here is the document format design. Requires another patch if there are embedded double quotes, or escaped pipes etc.
New to scripting. I have only one line & one file. How do I capture summerfruit value (ie "mango") & pass it to another variable from the below line.
.. abc.dfe summer.fruit=mango summer.vegetable=potato projects.blah ...
If your grep supports Perl-compatible regular expressions (PCRE):
summerfruit=$(grep -Po 'summer\.fruit=\K[^ ]+' file)
The \K doesn't print the matched summer.fruit= and [^ ]+ matches one or more non-space characters after the =.
without PCRE:
summerfruit=$(grep -o 'summer\.fruit=[^ ]*' file | grep -o '[^=]*$')
With sed:
summerfruit=$(sed 's/.*summer\.fruit=\([^ ]*\).*/\1/' file)
With awk:
summerfruit=$(awk '{
for (i=1;i<=NF;i++)
if ($i ~ /^summer\.fruit=/){ sub(/^[^=]*=/,"",$i); print $i; exit }
}' file)
I know there have been similar questions posted but I'm still having a bit of trouble getting the output I want using awk FNR==NR...
I have 2 files as such
File 1:
123|this|is|good
456|this|is|better
...
File 2:
aaa|123
bbb|456
...
So I want to join on values from file 2/column2 to file 1/column1 and output file 1 (col 2,3,4) and file 2 (col 1).
Thanks in advance.
With awk you could do something like
awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } $1 in val { $(NF + 1) = val[$1]; print }' file2 file1
NF is the number of fields in a record (line by default), so $NF is the last field, and $(NF + 1) is the field after that. By assigning the saved value from the pass over file2 to it, a new field is appended to the record before it is printed.
One thing to note: This behaves like an inner join, i.e., only records are printed whose key appears in both files. To make this a right join, you can use
awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } { $(NF + 1) = val[$1]; print }' file2 file1
That is, you can drop the $1 in val condition on the append-and-print action. If $1 is not in val, val[$1] is empty, and an empty field will be appended to the record before printing.
But it's probably better to use join:
join -1 1 -2 2 -t \| file1 file2
If you don't want the key field to be part of the output, pipe the output of either of those commands through cut -d \| -f 2- to get rid of it, i.e.
join -1 1 -2 2 -t \| file1 file2 | cut -d \| -f 2-
If the files have the same number of lines in the same order, then
paste -d '|' file1 file2 | cut -d '|' -f 2-5
this|is|good|aaa
this|is|better|bbb
I see in a comment to Wintermute's answer that the files aren't sorted. With bash, process substitutions are handy to sort on the fly:
paste -d '|' <(sort -t '|' -k 1,1 file1) <(sort -t '|' -k 2,2 file2) |
cut -d '|' -f 2-5
To reiterate: this solution requires a one-to-one correspondence between the files
I need to parse and modify a each field from a CSV header line for a dynamic sqlite create table statement. Below is what works from the command line with the appropriate output:
echo ",header1,header2,header3"| awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}'
,header1 text ,header2 text ,header3 text
Well, it breaks when it is run from within a bash shell script. I got it to work by writing the output to a file like below:
echo $optionalHeaders | awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}' > optionalHeaders.txt
This sucks! There are a lot of examples that show how to parse/modify specific Nth fields. This issue requires each field to be modified. Is there a more concise and elegant Awk one liner that can store its contents to a variable rather than writing to a file?
sed is usually the right tool for simple substitutions on a single line. Take your pick:
$ echo ",header1,header2,header3" | sed 's/[^,][^,]*/& text/g'
,header1 text,header2 text,header3 text
$ echo ",header1,header2,header3" | sed -r 's/[^,]+/& text/g'
,header1 text,header2 text,header3 text
The last 1 above requires GNU sed to use EREs instead of BREs. You can do the same in awk using gsub() if you prefer:
$ echo ",header1,header2,header3" | awk '{gsub(/[^,]+/,"& text")}1'
,header1 text,header2 text,header3 text
I found the problem and it was me... I forgot to echo the contents of the variable to the Awk command. Brianadams comment was so simple that forced me to re-look at my code and find the problem! Thanks!
I am ok with resolving this but if anyone wants to propose a more concise and elegant Awk one liner - that would be cool.
You can try the following:
#! /bin/bash
header=",header1,header2,header3"
newhead=$(awk 'BEGIN {FS=OFS=","}; {for(i=2;i<=NF;i++) $i=$i" text"}1' <<<"$header")
echo "$newhead"
with output:
,header1 text,header2 text,header3 text
Instead of modifying fields one by one, another option is with a simple substitution:
echo ",header1,header2,header3" | awk '{gsub(/[^,]+/, "& text", $0); print}'
That is, replace a sequence of non-comma characters with text appended.
Another alternative would be replacing the commas, but due to the irregularities of your header line (first comma must be left alone, no comma at the end), that's a bit less easy:
echo ",header1,header2,header3" | awk '{gsub(/,/, " text,", $0); sub(/^ text,/, "", $0); print $0 " text"}'
Btw, the rough equivalent of the two commands in sed:
echo ",header1,header2,header3" | sed -e 's/[^,]\{1,\}/& text/g'
echo ",header1,header2,header3" | sed -e 's/\(.\),/\1 text,/g' -e 's/$/ text/'
I have a CSV file that shows the statistics for links on a half an hour basis. The link name only appears on the 00:00 line.
link1,0:00,0,0,0,0
,00:30,0,0,0,0
,01:00,0,0,0,0
,01:30,0,0,0,0
,02:00,0,0,0,0
,02:30,0,0,0,0
,03:00,0,0,0,0
,03:30,0,0,0,0
,23:30,0,0,0,0
....
....
link2,00:00,0,0,0,0
How do I copy the link name to every other line until the link name is different, using sed or awk?
With awk, just keep track of the last seen non-empty link name, and always use that.
awk -F, -v OFS=, '$1 != "" { link=$1 } { $1 = link; print $0 }'
Omitting the ellipses, this gives:
link1,0:00,0,0,0,0
link1,00:30,0,0,0,0
link1,01:00,0,0,0,0
link1,01:30,0,0,0,0
link1,02:00,0,0,0,0
link1,02:30,0,0,0,0
link1,03:00,0,0,0,0
link1,03:30,0,0,0,0
link1,23:30,0,0,0,0
link2,00:00,0,0,0,0
This is a simpler job with awk, but if you want to use sed:
sed -e '/^[^,]/{h;s/,.*//;x};/^,/{G;s/^\(.*\)\n\(.*\)/\2\1/}'
Bellow a commented version in sed script file format that can be run with sed -f script:
# For lines not beginning with a ',', saves what precedes a ',' in the hold space and print the original line.
/^[^,]/{
h
s/,.*//
x}
# For lines beginning with a ',', put what has been save in the hold space at the beginning of the pattern space and print.
/^,/{
G
s/^\(.*\)\n\(.*\)/\2\1/}
You can do that in pure bash shell without needing to start a new process, which should be faster than using awk or sed:
IFS=","
while read v1 v2; do
if [[ $v1 != "" ]]; then
link=$v1;
fi
printf "%s,%s\n" "$link" "$v2"
done < file