How to construct a grep pattern that matches the '\n' in strings

In Linux, I can use
$ grep -r " is not exist\!\!"
to match lines containing
printf(" %s is not exist!!\n", path);
But when I use
$ grep -r " is not exist\!\!\n"
$ grep -r " is not exist\!\!\\n"
nothing is returned.
How should I construct the pattern so that it matches the literal '\n' in a string such as the one in this printf?
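A minimal sketch of one way to do this (assuming GNU grep): in the source file the \n is two literal characters, a backslash followed by an n, so either use fixed-string matching or escape the backslash for the regex engine, and use single quotes so the shell does not touch the pattern:
$ grep -rF ' is not exist!!\n' .
$ grep -r ' is not exist!!\\n' .
With -F the pattern is taken literally; with a regular expression, \\n matches a backslash followed by an n, whereas the \n in the failing attempts does not.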

Extract bin name from Cargo.toml using Bash

I am trying to extract bin names from Cargo.toml using Bash, with grep's Perl regular expressions enabled.
First attempt
grep -Pzo '(?<=(^\[\[bin\]\]))\s*name\s*=\s*"(.*)"' ./Cargo.toml
The regular expression was tested at regex101, but I got nothing.
Usage of the -Pzo options can be found here.
Second attempt
grep -P (?<=(^[[bin]]))\n*\sname\s=\s*"(.*)" ./Cargo.toml
Still nothing
grep -Pzo '(?<=(^\[\[bin\]\]))\s*name\s*=\s*"(.*)"' ./Cargo.toml
Cargo.toml
[[bin]]
name = "acme1"
path = "bin/acme1.rs"
[[bin]]
name = "acme2"
path = "src/acme1.rs"
grep:
grep -A1 '^\[\[bin\]\]$' Cargo.toml |
grep -Po '(?<=^name = ")[^"]*(?=".*)'
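Assuming GNU grep (for -P and -A) and the Cargo.toml shown above, this should print:
acme1
acme2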
or if you can use awk, this is more robust
awk '
$1 ~ /^\[\[?[[:alnum:]]*\]\]?$/ {
    if ($1 == "[[bin]]" || $1 == "[bin]") { bin = 1 }
    else { bin = 0 }
}
bin == 1 &&
sub(/^[[:space:]]*name[[:space:]]*=[[:space:]]*/, "") {
    sub(/^"/, ""); sub(/".*$/, "")
    print
}' cargo.toml
Example:
$ cat cargo.toml
[[bin]]
name = "acme1"
path = "bin/acme1.rs"
[bin]
name="acme2"
[[foo]]
name = "nobin"
[bin]
not_name = "hello"
name="acme3"
path = "src/acme3.rs"
[[bin]]
path = "bin/acme4.rs"
name = "acme4" # a comment
$ sh solution
acme1
acme2
acme3
acme4
Obviously, these are no substitute for a real toml parser.
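If a real parser is acceptable, here is a minimal sketch, assuming Python 3.11+ is installed (tomllib has been in its standard library since 3.11):
python3 -c '
import tomllib                            # stdlib TOML parser (Python 3.11+)
with open("Cargo.toml", "rb") as f:       # tomllib requires a binary file handle
    manifest = tomllib.load(f)
for binary in manifest.get("bin", []):    # each [[bin]] table becomes a dict
    print(binary["name"])
'
For the Cargo.toml in the question this prints acme1 and acme2.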
Given your samples and attempts, please try the following tac + awk combination; it is easier to maintain and does the job more simply than grep.
tac Input_file |
awk '
/^name =/{
    gsub(/"/,"",$NF)
    value=$NF
    next
}
/^path[[:space:]]+=[[:space:]]+"bin\//{
    print value
    value=""
}
' |
tac
Explanation: a detailed explanation of the above code.
tac Input_file |        ##Use tac on Input_file to print it in bottom-to-top order.
awk '                   ##Pass the tac output to awk as standard input.
/^name =/{              ##If a line starts with name =, do the following.
    gsub(/"/,"",$NF)    ##Globally substitute " with nothing in the last field.
    value=$NF           ##Set value to the last field.
    next                ##next skips all further statements for this line.
}
/^path[[:space:]]+=[[:space:]]+"bin\//{  ##If a line starts with path, spaces, =, spaces, then "bin/, do the following.
    print value         ##Print value.
    value=""            ##Nullify value.
}
' |                     ##Pass the awk program output as input to tac.
tac                     ##Print the values in their original order.

Nullify fields in pipe delimited file

I am not able to get the desired output when a data field has a pipe in it.
The input (sample file tst) is:
hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst"
lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst"
I tried this command but don't get the expected output: cut -f2,3 -d"|" tst
The expected output is
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Is there an easy way to crack this output? I don't want to go with sed because the tool I am using doesn't allow the backslash character (\); I am embedding this command in one of the tools.
Also, I am using an old version of gawk, so this command doesn't give the desired output:
gawk -v FPAT='[^|]*|("[^"]*")+' '{print $2, $3}' OFS="|"
Output of gawk --version
GNU Awk 3.1.7
Output of cat -vet tst
hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst"$
lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst"$
Upgrading your gawk version is by far the best approach, as you're missing a few bug fixes and a ton of extremely useful functionality introduced since gawk 3.1.7 came out 10+ years ago (we're currently on gawk version 5.1!). But if you can't do that for some reason, here's what you can do without FPAT, using any awk in any shell on every UNIX box:
$ cat tst.awk
BEGIN { OFS="|" }
{
    orig = $0
    $0 = i = ""
    # peel fields off the front: either a run without "|" or a ("...")+ quoted field
    while ( (orig != "") && match(orig,/[^|]*|("[^"]*")+/) ) {
        $(++i) = substr(orig,RSTART,RLENGTH)   # the matched field
        orig = substr(orig,RSTART+RLENGTH+1)   # skip the field and the "|" after it
    }
    print $2, $3
}
$ awk -f tst.awk file
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Just to verify that it's identifying all of the fields correctly:
$ cat tst.awk
BEGIN { OFS="|" }
{
    orig = $0
    $0 = i = ""
    while ( (orig != "") && match(orig,/[^|]*|("[^"]*")+/) ) {
        $(++i) = substr(orig,RSTART,RLENGTH)
        orig = substr(orig,RSTART+RLENGTH+1)
    }
    print NF " <" $0 ">"
    for (i=1; i<=NF; i++) {
        print "\t" i " <" $i ">"
    }
}
$ awk -f tst.awk file
5 <hdr1|"hdr2|tst"|"hdr3|tst|tst"|hdr4|"hdr5|tst|tst">
        1 <hdr1>
        2 <"hdr2|tst">
        3 <"hdr3|tst|tst">
        4 <hdr4>
        5 <"hdr5|tst|tst">
5 <lbl1|"lbl2|tst"|"lbl3|tst|tst"|lbl4|"lbl5|tst|tst">
        1 <lbl1>
        2 <"lbl2|tst">
        3 <"lbl3|tst|tst">
        4 <lbl4>
        5 <"lbl5|tst|tst">
If you don't have embedded double quotes, you can substitute the pipes inside quoted fields with another, unused character (I used ~) and switch back to the original values after extraction. Obviously this requires that the replacement character does not appear in the text.
$ awk 'BEGIN{OFS=FS="\""} {for(i=2;i<NF;i+=2) gsub("\\|","~",$i)}1' file |
awk 'BEGIN{OFS=FS="|"} {print $2,$3}' |
sed 's/~/|/g'
"hdr2|tst"|"hdr3|tst|tst"
"lbl2|tst"|"lbl3|tst|tst"
Not sure it's simpler than the single awk script though.
The main problem here is the document format design. It requires another patch if there are embedded double quotes, escaped pipes, etc.

How do I capture (grep/awk/sed) a substring value from a string in shell?

I am new to scripting. I have only one line and one file. How do I capture the summer.fruit value (i.e. "mango") from the line below and pass it to another variable?
.. abc.dfe summer.fruit=mango summer.vegetable=potato projects.blah ...
If your grep supports Perl-compatible regular expressions (PCRE):
summerfruit=$(grep -Po 'summer\.fruit=\K[^ ]+' file)
The \K drops the already-matched summer.fruit= from the reported match, and [^ ]+ matches one or more non-space characters after the =.
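For example, assuming file contains the sample line shown above:
$ summerfruit=$(grep -Po 'summer\.fruit=\K[^ ]+' file)
$ echo "$summerfruit"
mango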
without PCRE:
summerfruit=$(grep -o 'summer\.fruit=[^ ]*' file | grep -o '[^=]*$')
With sed:
summerfruit=$(sed 's/.*summer\.fruit=\([^ ]*\).*/\1/' file)
With awk:
summerfruit=$(awk '{
    for (i=1;i<=NF;i++)
        if ($i ~ /^summer\.fruit=/){ sub(/^[^=]*=/,"",$i); print $i; exit }
}' file)

How can I send grep results into a different output file for each input file?

I have a folder that contains text files. I need to extract the lines that have 'BA' in them from these text files. I used the grep command to print the lines with BA. I would like to save the output to another folder with the same file names. How can I change the following code?
grep " BA " dir/*.txt
for i in dir/*.txt; do
    grep " BA " "$i" > "$newdir/$(basename "$i")"
done
Note the use of basename, which takes dir/a.txt (say) and returns a.txt
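$newdir is assumed to already point at the destination folder; a minimal sketch, using a hypothetical directory name:
newdir=outdir          # hypothetical destination folder
mkdir -p "$newdir"     # make sure it exists before redirecting into it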
Sounds like a job for GNU parallel:
parallel --dry-run grep '" BA "' '{} > otherdir/{/}' ::: dir/{a,b,c}.txt
Output:
grep " BA " dir/a.txt > otherdir/a.txt
grep " BA " dir/b.txt > otherdir/b.txt
grep " BA " dir/c.txt > otherdir/c.txt
Remove --dry-run when you're happy with what you see.
{} is replaced by the inputs after ::: (these can also come from stdin or a file), {/} is the basename of {}.

Parsing and Printing $PATH Using Unix

I've placed my PATH in a text file and would like to print each path on its own line using a simple command in UNIX.
I've found a long way to do it that goes like this...
cat Path.txt | awk -F\; '{print $1"\n", $2"\n", ... }'
This, however, seems inefficient, so I know there must be a way to quickly print each field on its own line without having to reference every delimiter-separated field by hand.
Yet another way:
echo $PATH | tr : '\n'
or:
tr : '\n' <Path.txt
The tr solution is the right one but if you were going to use awk then there'd be no need for a loop:
$ echo "$PATH"
/usr/local/bin:/usr/bin:/cygdrive/c/winnt/system32:/cygdrive/c/winnt
$ echo "$PATH" | awk -F: -v OFS="\n" '$1=$1'
/usr/local/bin
/usr/bin
/cygdrive/c/winnt/system32
/cygdrive/c/winnt
I have a Perl script that I use for this:
#!/usr/bin/env perl
#
# "@(#)$Id: echopath.pl,v 1.8 2011/08/22 22:15:53 jleffler Exp $"
#
# Print the components of a PATH variable one per line.
# If there are no colons in the arguments, assume that they are
# the names of environment variables.
use strict;
use warnings;
@ARGV = $ENV{PATH} unless @ARGV;
foreach my $arg (@ARGV)
{
    my $var = $arg;
    $var = $ENV{$arg} if $arg =~ /^[A-Za-z_][A-Za-z_0-9]*$/;
    $var = $arg unless $var;
    my @lst = split /:/, $var;
    foreach my $val (@lst)
    {
        print "$val\n";
    }
}
I invoke it like:
echopath $PATH
echopath PATH
echopath LD_LIBRARY_PATH
echopath CDPATH
echopath MANPATH
echopath $CLASSPATH
etc. You can specify the variable name, or the value of the variable; it works both ways.
With Perl for UNIX/UNIX-likes:
echo $PATH | perl -F: -ane '{print join "\n", @F}'
With any OS (tested on Windows XP, Linux, Minix, Solaris):
my $sep;
my $path;
if ($^O =~ /^MS/) {
    $sep = ";";
    $path = "Path";
}
else {
    $sep = ":";
    $path = "PATH";
}
print join "\n", split $sep, $ENV{$path} . "\n";
If using bash on Unix, try the following code:
printf '%s\n' ${PATH//:/ }
This uses bash parameter expansion.
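For example, with PATH=/usr/local/bin:/usr/bin:/bin, the expansion replaces every colon with a space and printf prints one argument per line:
$ printf '%s\n' ${PATH//:/ }
/usr/local/bin
/usr/bin
/bin
Note that this relies on word splitting, so it misbehaves if a path component contains spaces or glob characters.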
awk:
echo $PATH|awk -F: '{gsub(/:/,"\n");print}'
perl:
echo $PATH|perl -F: -lane 'foreach(@F){print $_}'
For awk, in addition to:
echo $PATH | awk -vFS=':' -vOFS='\n' '$1=$1'
You can:
echo $PATH | awk -vRS=':' '1'
