grep between pattern and exclude start/end pattern in output - grep

I have a file, file.txt, from which I need to grep the first occurrence of a pattern. How can I get only the part of the matched line between the colon-plus-whitespace and the end of the line?
I am trying the command below:
cat file.txt | grep -m1 -P "(:\s+).*ccas-apache$"
But it gives me
name: nginx-ccas-apache
and what I want is
nginx-ccas-apache
file.txt
pod: nginx-ccas-apache-0
name: nginx-ccas-apache
image: myregnapq//ccas_apache
name: nginx-filebeat
pod: nginx-ccas-apache-1
name: nginx-ccas-apache
image: myregnapq/ccas_apache
name: nginx-filebeat

Another approach using sed:
sed -En '/^[[:space:]]+name:[[:space:]](.*ccas-apache)$/{s//\1/p;q}' file.txt
Explanation
-En Use extended regexp with -E and prevent the default printing of a line by sed with -n
/^[[:space:]]+name:[[:space:]](.*ccas-apache)$/ The pattern that specifies what to match
If the previous pattern matched, run commands between the curly brackets
s//\1/p Use the last matched pattern with // and replace with group 1. Then print the pattern space with p
q exit sed
The regex matches:
^ Start of string
[[:space:]]+name:[[:space:]] Match name: with leading spaces and single space after
(.*ccas-apache) Capture group 1, match optional chars and ccas-apache
$ End of string
Output
nginx-ccas-apache
Note that you don't have to use cat
See an online demo.
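The sample as shown has no indentation in front of name:, while the expression above requires at least one leading whitespace character. A minimal sketch that tolerates both cases, written in POSIX basic regex syntax; the function name first_ccas_name is made up for illustration:

```shell
# Print the value of the first "name:" line whose value ends in ccas-apache.
# first_ccas_name is a hypothetical helper name, not part of the original answer.
first_ccas_name() {
  # [[:space:]]* allows zero or more leading blanks; \{1,\} is the BRE form of +.
  # s//\1/p reuses the address regex, keeps group 1 and prints it; q stops at the first hit.
  sed -n '/^[[:space:]]*name:[[:space:]]\{1,\}\(.*ccas-apache\)$/{s//\1/p;q;}' "$@"
}

printf '%s\n' 'pod: nginx-ccas-apache-0' '  name: nginx-ccas-apache' 'name: nginx-filebeat' |
  first_ccas_name
```

Called with a file name instead of standard input (first_ccas_name file.txt), it reads that file directly.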

Using grep
$ grep -Pom1 'name: \K.*$' file.txt
nginx-ccas-apache

You can use awk, too:
awk -F: '/:[[:space:]].*ccas-apache$/{sub(/^[[:space:]]+/, "", $2); print $2; exit}' file
Details:
-F: - a colon is used as a field separator
:[[:space:]].*ccas-apache$ - searches for a line with :, a whitespace, then any text, ccas-apache at the end of the string, and once found
sub(/^[[:space:]]+/, "", $2) - remove the initial whitespaces from Field 2
print $2 - then print the Field 2 value
exit - stop processing the file.
See the online demo:
#!/bin/bash
s='pod: nginx-ccas-apache-0
name: nginx-ccas-apache
image: myregnapq//ccas_apache
name: nginx-filebeat
pod: nginx-ccas-apache-1
name: nginx-ccas-apache
image: myregnapq/ccas_apache
name: nginx-filebeat'
awk -F: '/:[[:space:]].*ccas-apache$/{sub(/^[[:space:]]+/, "", $2); print $2; exit}' <<< "$s"
Output: nginx-ccas-apache
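Splitting on : with -F: truncates the value when it contains a colon itself (a registry host with a port, for example). A rough sketch that strips the key instead of splitting into fields; value_after_colon is a hypothetical name and the registry:5000 input is invented for the demonstration:

```shell
# value_after_colon is a hypothetical helper name used only for this sketch.
value_after_colon() {
  awk '
    # Strip "key:" plus the following blanks; sub() returns 1 when it replaced something.
    # The second test then runs against the already stripped line.
    sub(/^[^:]*:[ \t]+/, "") && /ccas-apache$/ { print; exit }
  ' "$@"
}

printf '%s\n' 'pod: nginx-ccas-apache-0' 'name: registry:5000/nginx-ccas-apache' |
  value_after_colon
```

With the -F: variant the second line would yield only registry, because $2 ends at the next colon.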

INPUT
pod: nginx-ccas-apache-0
name: nginx-ccas-apache
image: myregnapq//ccas_apache
name: nginx-filebeat
pod: nginx-ccas-apache-1
name: nginx-ccas-apache
image: myregnapq/ccas_apache
name: nginx-filebeat
CODE
Pass any properly escaped regexp as __, including the end-of-string anchor $.
Here are three ways of saying the same thing;
each of them works in gawk, mawk-1, mawk-2, or macOS nawk:
mawk '_{exit} _=$(NF=NF)~__' FS='^.*[ \t]' __='ccas-apache$' OFS=
or
gawk '_{exit} NF*=_=$(NF)~__' FS='^.*[ \t]' __='ccas-apache$' OFS=
or
nawk '_{exit} _=NF*=$NF~__' FS='^.*[ \t]' __='ccas-apache$' OFS=
OUTPUT
nginx-ccas-apache
GENERIC SOLUTION
not just at the tail
this time enter pattern at FS
CODE
{m,g}awk '_{exit} _=(!_<NF)*sub("[^ \t]*"(FS)"[^ \t]*","\4&\4")*\
gsub("^[^\4]*\4|\4[^\4]*$","")' FS='your_pattern_here'
OUTPUT
FS='image'
>>> `image:`
FS='myregnapq'
>>> `myregnapq//ccas_apache`

With your shown samples please try following awk code.
awk -F':[[:space:]]+' '
$1~/^[[:space:]]+name$/ && $2~/^[^-]*-ccas-apache$/{
print $2
exit
}
' Input_file
Explanation: The field separator is set to a colon followed by one or more whitespace characters. The main block checks whether the first field matches leading whitespace followed by name AND the second field matches the regex ^[^-]*-ccas-apache$; if so, it prints the second field of that line and exits the program.

Related

Is it possible to show all lines after match with grep/ripgrep? [duplicate]

Question: I'd like to print a single line directly following a line that contains a matching pattern.
My version of sed will not take the following syntax (it bombs out on +1p) which would seem like a simple solution:
sed -n '/ABC/,+1p' infile
I assume awk would be better to do multiline processing, but I am not sure how to do it.
Never use the word "pattern" in this context as it is ambiguous. Always use "string" or "regexp" (or in shell "globbing pattern"), whichever it is you really mean. See How do I find the text that matches a pattern? for more about that.
The specific answer you want is:
awk 'f{print;f=0} /regexp/{f=1}' file
or specializing the more general solution of the Nth record after a regexp (idiom "c" below):
awk 'c&&!--c; /regexp/{c=1}' file
The following idioms describe how to select a range of records given a specific regexp to match:
a) Print all records from some regexp:
awk '/regexp/{f=1}f' file
b) Print all records after some regexp:
awk 'f;/regexp/{f=1}' file
c) Print the Nth record after some regexp:
awk 'c&&!--c;/regexp/{c=N}' file
d) Print every record except the Nth record after some regexp:
awk 'c&&!--c{next}/regexp/{c=N}1' file
e) Print the N records after some regexp:
awk 'c&&c--;/regexp/{c=N}' file
f) Print every record except the N records after some regexp:
awk 'c&&c--{next}/regexp/{c=N}1' file
g) Print the N records from some regexp:
awk '/regexp/{c=N}c&&c--' file
I changed the variable name from "f" for "found" to "c" for "count" where appropriate as that's more expressive of what the variable actually IS.
f is short for found. It's a boolean flag that I set to 1 (true) when I find a string matching the regular expression regexp in the input (/regexp/{f=1}). Everywhere else that f appears on its own in a script, it is being tested as a condition, and when true it causes awk to execute its default action of printing the current record. So input records only get output after we see regexp and set f to 1/true.
c && c-- { foo } means "if c is non-zero then execute foo and decrement c" (c-- is a post-decrement, so the condition tests the value c had before the decrement). So if c starts at 3, foo is executed and c becomes 2; on the next input line foo is executed again and c becomes 1; on the next input line foo is executed a third time and c becomes 0; after that c is 0, which is a false condition, so foo is not executed and c is no longer decremented. We do c && c-- instead of just testing for c-- > 0 so we can't run into a case with a huge input file where c hits zero and continues getting decremented so often it wraps around and becomes positive again.
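To see the counter idioms in action, here is a small sketch of idioms (e) and (c) wrapped in shell functions. The function names n_after and nth_after, and passing the regexp through an awk variable called re, are choices made for this example only:

```shell
# n_after N REGEXP   : idiom (e), print the N records after each match.
# nth_after N REGEXP : idiom (c), print only the Nth record after each match.
# Both names are illustrative, not part of the original answer.
n_after()   { awk -v N="$1" -v re="$2" 'c&&c--;  $0 ~ re {c=N}'; }
nth_after() { awk -v N="$1" -v re="$2" 'c&&!--c; $0 ~ re {c=N}'; }

seq 1 10 | n_after 2 '^3$'     # prints 4 and 5
seq 1 10 | nth_after 2 '^3$'   # prints 5
```

In both functions the counter is tested before the match sets it, which is why the matching record itself is never printed.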
It's the line after that match that you're interested in, right? In sed, that could be accomplished like so:
sed -n '/ABC/{n;p}' infile
Alternatively, grep's -A option might be what you're looking for.
-A NUM, Print NUM lines of trailing context after matching lines.
For example, given the following input file:
foo
bar
baz
bash
bongo
You could use the following:
$ grep -A 1 "bar" file
bar
baz
$ sed -n '/bar/{n;p}' file
baz
I needed to print ALL lines after the pattern ( ok Ed, REGEX ), so I settled on this one:
sed -n '/pattern/,$p' # prints all lines after ( and including ) the pattern
But since I wanted to print all the lines AFTER ( and exclude the pattern )
sed -n '/pattern/,$p' | tail -n+2 # all lines after first occurrence of pattern
I suppose in your case you can add a head -1 at the end
sed -n '/pattern/,$p' | tail -n+2 | head -1 # prints line after pattern
And I really should include tlwhitec's comment in this answer (since their sed-strict approach is more elegant than my suggestions):
sed '0,/pattern/d'
The above script deletes every line starting with the first and stopping with (and including) the line that matches the pattern. All lines after that are printed.
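The 0,/pattern/ address form is a GNU sed extension, and the portable 1,/pattern/ form behaves differently when the very first line matches. Where GNU sed is not available, the same result can be sketched with awk; lines_after is a made-up helper name:

```shell
# lines_after REGEXP: print everything after the first matching line.
# The flag is tested before it is set, so the matching line itself is dropped.
# lines_after is a hypothetical name used only for this sketch.
lines_after() { awk -v re="$1" 'f; $0 ~ re {f=1}'; }

seq 1 6 | lines_after '^3$'              # prints 4, 5 and 6
printf 'x\na\nb\n' | lines_after '^x$'   # a match on line 1 works too: prints a and b
```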
awk Version:
awk '/regexp/ { getline; print $0; }' filetosearch
If the pattern matches, append the next line to the pattern space, delete everything up to and including the newline, then quit; the remaining line is printed as a side effect of quitting.
sed '/pattern/ { N; s/.*\n//; q }; d'
Actually sed -n '/pattern/{n;p}' filename will fail if the pattern matches consecutive lines:
$ seq 15 |sed -n '/1/{n;p}'
2
11
13
15
The expected answers should be:
2
11
12
13
14
15
My solution is:
$ sed -n -r 'x;/_/{x;p;x};x;/pattern/!s/.*//;/pattern/s/.*/_/;h' filename
For example:
$ seq 15 |sed -n -r 'x;/_/{x;p;x};x;/1/!s/.*//;/1/s/.*/_/;h'
2
11
12
13
14
15
Explanation:
x;: at the beginning of each input line, use the x command to exchange the contents of the pattern space and the hold space.
/_/{x;p;x};: if the pattern space (which at this point holds the former hold space) contains _ (an indicator that the previous line matched the pattern), use x to bring the current line back into the pattern space, p to print it, and x to undo the exchange.
x: restore the original contents of the pattern space and hold space.
/pattern/!s/.*//: if the current line does NOT match the pattern, the NEXT line must not be printed, so use s/.*// to empty the pattern space.
/pattern/s/.*/_/: if the current line matches the pattern, the NEXT line must be printed, so use s/.*/_/ to replace the pattern space with the indicator _ (the second command tests for it on the next cycle).
h: overwrite the hold space with the pattern space; the hold space then contains _ if the current line matched the pattern, or is empty if it did not.
The fourth and fifth steps can NOT be swapped: after s/.*/_/ the pattern space no longer matches /pattern/, so s/.*// would then run and erase the indicator.
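For comparison, the same "print the line after every match, even for consecutive matches" behaviour can be sketched in awk with a single flag. This is an alternative technique, not the hold-space approach described above, and after_each_match is a hypothetical name:

```shell
# after_each_match REGEXP: print each line whose PREVIOUS line matched REGEXP.
# after_each_match is a hypothetical helper name.
after_each_match() {
  # f holds whether the previous line matched: print first, then update the flag.
  awk -v re="$1" 'f; { f = ($0 ~ re) }'
}

seq 15 | after_each_match 1   # prints 2, 11, 12, 13, 14 and 15
```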
This might work for you (GNU sed):
sed -n ':a;/regexp/{n;h;p;x;ba}' file
Use sed's grep-like option -n, and if the current line contains the required regexp, replace the current line with the next, copy that line to the hold space (HS), print the line, swap the pattern space (PS) for the HS and repeat.
Piping some greps can do it (it runs in POSIX shell and under BusyBox):
cat my-file | grep -A1 my-regexp | grep -v -- '--' | grep -v my-regexp
-v will show non-matching lines
-- is printed by grep to separate each match, so we skip that too
If you just want the next line after a pattern, this sed command will work
sed -n -e '/pattern/{n;p;}'
-n suppresses output (quiet mode);
-e denotes a sed command (not required in this case);
/pattern/ is a regex search for lines containing the literal combination of the characters pattern (use /^pattern$/ for lines consisting only of “pattern”);
n replaces the pattern space with the next line;
p prints;
For example:
seq 10 | sed -n -e '/5/{n;p;}'
Note that the above command will print a single line after every line containing pattern. If you just want the first one use sed -n -e '/pattern/{n;p;q;}'. This is also more efficient as the whole file is not read.
This strictly sed command will print all lines after your pattern.
sed -n '/pattern/,${/pattern/!p;}'
Formatted as a sed script this would be:
/pattern/,${
/pattern/!p
}
Here’s a short example:
seq 10 | sed -n '/5/,${/5/!p;}'
/pattern/,$ will select all the lines from pattern to the end of the file.
{} groups the next set of commands (c-like block command)
/pattern/!p; prints the lines that don't match pattern. Note that the ; before } is required in early versions of sed and in some non-GNU implementations. This turns the instruction into an exclusive range; sed ranges are normally inclusive of both the start and the end of the range.
To exclude the end of range you could do something like this:
sed -n '/pattern/,/endpattern/{/pattern/!{/endpattern/d;p;}}'
/pattern/,/endpattern/{
/pattern/!{
/endpattern/d
p
}
}
/endpattern/d deletes the line matching endpattern from the “pattern space”, and the script restarts from the top with the next line, skipping the p command for that line.
Another pithy example:
seq 10 | sed -n '/5/,/8/{/5/!{/8/d;p}}'
If you have GNU sed you can add the debug switch:
seq 5 | sed -n --debug '/2/,/4/{/2/!{/4/d;p}}'
Output:
SED PROGRAM:
/2/,/4/ {
/2/! {
/4/ d
p
}
}
INPUT: 'STDIN' line 1
PATTERN: 1
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 2
PATTERN: 2
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 3
PATTERN: 3
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
COMMAND: p
3
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 4
PATTERN: 4
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
END-OF-CYCLE:
INPUT: 'STDIN' line 5
PATTERN: 5
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:
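The exclusive range shown above can also be sketched in awk with one flag, which some readers find easier to follow than nested sed blocks. The helper name between_exclusive and the variable names s and e are assumptions made for this example:

```shell
# between_exclusive START END: print the lines strictly between a START match
# and the following END match. The name is hypothetical.
between_exclusive() {
  awk -v s="$1" -v e="$2" '
    $0 ~ e { f = 0 }   # close the range before printing, so the end line is skipped
    f                  # print while inside the range
    $0 ~ s { f = 1 }   # open the range after printing, so the start line is skipped
  '
}

seq 10 | between_exclusive 5 8   # prints 6 and 7
```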

Regex for line containing one or more spaces or dashes

I have a .txt file with city names, each on a separate line. Some of them consist of several words separated by one or more spaces, or of words connected with '-'. I need to create a bash command that will echo those lines. Currently I'm using cat piped into grep, but I can't get both spaces and dashes into one search, and I had problems checking for multiple spaces.
print lines with dash:
cat file.txt | grep ".*-.*"
print lines with spaces:
cat file.txt | grep ".*\s.*"
But when I try:
cat file.txt | grep ".*\s+.*"
I get nothing.
Thanks for help
Something like that should work:
grep -E -- ' |-' file.txt
Explanation:
-E: to interpret patterns as extended regular expressions
--: to signify the end of command options
' |-': the line contains either a space or a dash (the dash needs no backslash outside a bracket expression)
This does not directly address your question, but is too much to put in a comment.
You don't need the .* in your patterns. .* at the beginning or end of a pattern is useless, because it means "0 or more of any character" and so will always match.
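These commands are all equivalent (-e marks the next argument as the pattern, so a leading - is not parsed as an option):
cat file.txt | grep -e ".*-.*"
cat file.txt | grep -e "-.*"
cat file.txt | grep -e "-"
Plus you don't need to cat and pipe:
grep -e "-" file.txt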
When grep pattern matches, the default action is to print the whole line, so .* in all your patterns are redundant, you may delete them. Also, you don't have to use cat file | as you may specify the file to grep directly after pattern, i.e. grep 'pattern' file.txt.
Here are some more details:
grep ".*-.*" = grep -- "-" - returns any lines having a - char (-- signals the end of options, the next thing is the pattern)
grep ".*\s.*" = grep "\s" - matches and returns lines containing a whitespace char (only GNU grep)
grep ".*\s+.*" = grep "\s+" - returns lines containing a whitespace followed by a literal + char (since you are using a POSIX BRE regex here, the unescaped + matches a literal plus symbol).
You want
grep "[[:space:]-]" file.txt
See the online demo:
#!/bin/bash
s='abc - def
ghi
jkl mno'
grep '[[:space:]-]' <<< "$s"
Output:
abc - def
jkl mno
The [[:space:]-] POSIX BRE and ERE (enabled with -E option) compliant pattern matches either any whitespace (with the [:space:] POSIX character class) or a hyphen.
Note that [\s-] won't work since \s inside a bracket expression is not treated as a regex escape sequence but as a mere \ or s.
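A short sketch of both cases from the question, one for "space or dash" and one for "multiple spaces", using only POSIX classes and an ERE interval. The city list here is invented sample data standing in for the asker's file:

```shell
# Made-up sample data; the real input would be the asker's file.txt.
cities='New York
Paris
Stratford-upon-Avon
Rio  de  Janeiro'

# Lines containing whitespace or a hyphen (valid in both BRE and ERE).
printf '%s\n' "$cities" | grep '[[:space:]-]'

# Lines containing two or more consecutive whitespace characters (ERE interval).
printf '%s\n' "$cities" | grep -E '[[:space:]]{2,}'
```

The first command prints every line except Paris; the second prints only the line with doubled spaces.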

Extract bin name from Cargo.toml using Bash

I am trying to extract bin names from Cargo.toml using Bash. I enabled Perl regular expressions like this:
First attempt
grep -Pzo '(?<=(^\[\[bin\]\]))\s*name\s*=\s*"(.*)"' ./Cargo.toml
The regular expression is tested at regex101
But got nothing
the Pzo options usage can be found here
Second attempt
grep -P (?<=(^[[bin]]))\n*\sname\s=\s*"(.*)" ./Cargo.toml
Still nothing
grep -Pzo '(?<=(^\[\[bin\]\]))\s*name\s*=\s*"(.*)"' ./Cargo.toml
Cargo.toml
[[bin]]
name = "acme1"
path = "bin/acme1.rs"
[[bin]]
name = "acme2"
path = "src/acme1.rs"
grep:
grep -A1 '^\[\[bin\]\]$' Cargo.toml |
grep -Po '(?<=^name = ")[^"]*(?=".*)'
or if you can use awk, this is more robust
awk '
$1 ~ /^\[\[?[[:alnum:]]*\]\]?$/{
if ($1=="[[bin]]" || $1=="[bin]") {bin=1}
else {bin=0}
}
bin==1 &&
sub(/^[[:space:]]*name[[:space:]]*=[[:space:]]*/, "") {
sub(/^"/, ""); sub(/".*$/, "")
print
}' cargo.toml
Example:
$ cat cargo.toml
[[bin]]
name = "acme1"
path = "bin/acme1.rs"
[bin]
name="acme2"
[[foo]]
name = "nobin"
[bin]
not_name = "hello"
name="acme3"
path = "src/acme3.rs"
[[bin]]
path = "bin/acme4.rs"
name = "acme4" # a comment
$ sh solution
acme1
acme2
acme3
acme4
Obviously, these are no substitute for a real TOML parser.
With your shown samples and attempts, please try the following tac + awk combination, which is easier to maintain and handles a job that is difficult to do with grep alone.
tac Input_file |
awk '
/^name =/{
gsub(/"/,"",$NF)
value=$NF
next
}
/^path[[:space:]]+=[[:space:]]+"bin\//{
print value
value=""
}
' |
tac
Explanation: Adding detailed explanation for above code.
tac Input_file | ##Using tac command on Input_file to print it in bottom to top order.
awk ' ##passing tac output to awk as standard input.
/^name =/{ ##Checking if line starts from name = then do following.
gsub(/"/,"",$NF) ##Globally substituting " with NULL in last field.
value=$NF ##Setting value to last field value here.
next ##next will skip all further statements from here.
}
/^path[[:space:]]+=[[:space:]]+"bin\//{ ##Checking if line starts from path followed by space = followed by spaces followed by "bin/ here.
print value ##printing value here.
value="" ##Nullifying value here.
}
' | ##Passing awk program output as input to tac here.
tac ##Printing values in their actual order.

Use awk to parse and modify every CSV field

I need to parse and modify each field of a CSV header line for a dynamic sqlite create table statement. Below is what works from the command line with the appropriate output:
echo ",header1,header2,header3"| awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}'
,header1 text ,header2 text ,header3 text
Well, it breaks when it is run from within a bash shell script. I got it to work by writing the output to a file like below:
echo $optionalHeaders | awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}' > optionalHeaders.txt
This sucks! There are a lot of examples that show how to parse/modify specific Nth fields. This issue requires each field to be modified. Is there a more concise and elegant Awk one liner that can store its contents to a variable rather than writing to a file?
sed is usually the right tool for simple substitutions on a single line. Take your pick:
$ echo ",header1,header2,header3" | sed 's/[^,][^,]*/& text/g'
,header1 text,header2 text,header3 text
$ echo ",header1,header2,header3" | sed -r 's/[^,]+/& text/g'
,header1 text,header2 text,header3 text
The last one above requires GNU sed (-r) to use EREs instead of BREs. You can do the same in awk using gsub() if you prefer:
$ echo ",header1,header2,header3" | awk '{gsub(/[^,]+/,"& text")}1'
,header1 text,header2 text,header3 text
I found the problem and it was me... I forgot to echo the contents of the variable to the Awk command. Brianadams comment was so simple that forced me to re-look at my code and find the problem! Thanks!
I am ok with resolving this but if anyone wants to propose a more concise and elegant Awk one liner - that would be cool.
You can try the following:
#! /bin/bash
header=",header1,header2,header3"
newhead=$(awk 'BEGIN {FS=OFS=","}; {for(i=2;i<=NF;i++) $i=$i" text"}1' <<<"$header")
echo "$newhead"
with output:
,header1 text,header2 text,header3 text
Instead of modifying fields one by one, another option is with a simple substitution:
echo ",header1,header2,header3" | awk '{gsub(/[^,]+/, "& text", $0); print}'
That is, replace a sequence of non-comma characters with text appended.
Another alternative would be replacing the commas, but due to the irregularities of your header line (first comma must be left alone, no comma at the end), that's a bit less easy:
echo ",header1,header2,header3" | awk '{gsub(/,/, " text,", $0); sub(/^ text,/, ",", $0); print $0 " text"}'
Btw, the rough equivalent of the two commands in sed:
echo ",header1,header2,header3" | sed -e 's/[^,]\{1,\}/& text/g'
echo ",header1,header2,header3" | sed -e 's/\(.\),/\1 text,/g' -e 's/$/ text/'
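Since the question asks for the result in a variable rather than a file, here is a minimal sketch using command substitution. The table name stats and the leading id integer column are placeholders invented for the example, not taken from the question:

```shell
header=",header1,header2,header3"

# Command substitution stores the rewritten header in a variable instead of a file.
columns=$(printf '%s\n' "$header" | sed 's/[^,][^,]*/& text/g')

# "stats" and the leading "id integer" column are placeholders for illustration only.
printf 'CREATE TABLE stats (id integer%s);\n' "$columns"
```

Quoting "$header" keeps the shell from word-splitting or globbing the value before it reaches sed.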

Inserting a matched string from previous line to the current line using sed or awk

I have a CSV file that shows the statistics for links on a half-hourly basis. The link name only appears on the 00:00 line.
link1,0:00,0,0,0,0
,00:30,0,0,0,0
,01:00,0,0,0,0
,01:30,0,0,0,0
,02:00,0,0,0,0
,02:30,0,0,0,0
,03:00,0,0,0,0
,03:30,0,0,0,0
,23:30,0,0,0,0
....
....
link2,00:00,0,0,0,0
How do I copy the link name to every other line until the link name is different, using sed or awk?
With awk, just keep track of the last seen non-empty link name, and always use that.
awk -F, -v OFS=, '$1 != "" { link=$1 } { $1 = link; print $0 }'
Omitting the ellipses, this gives:
link1,0:00,0,0,0,0
link1,00:30,0,0,0,0
link1,01:00,0,0,0,0
link1,01:30,0,0,0,0
link1,02:00,0,0,0,0
link1,02:30,0,0,0,0
link1,03:00,0,0,0,0
link1,03:30,0,0,0,0
link1,23:30,0,0,0,0
link2,00:00,0,0,0,0
This is a simpler job with awk, but if you want to use sed:
sed -e '/^[^,]/{h;s/,.*//;x};/^,/{G;s/^\(.*\)\n\(.*\)/\2\1/}'
Below is a commented version in sed script file format that can be run with sed -f script:
# For lines not beginning with a ',', save what precedes the first ',' in the hold space and print the original line.
/^[^,]/{
h
s/,.*//
x}
# For lines beginning with a ',', put what has been saved in the hold space at the beginning of the pattern space and print.
/^,/{
G
s/^\(.*\)\n\(.*\)/\2\1/}
You can do that in pure bash without starting a new process, although a shell read loop is usually slower than awk or sed on large files:
while IFS=, read -r v1 v2; do
if [[ $v1 != "" ]]; then
link=$v1
fi
printf "%s,%s\n" "$link" "$v2"
done < file

Resources