I have a text file that contains text blocks roughly formatted like this:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
Beginning of block
...
... etc.
The blocks can have any number of lines but always start and end with the two delimiters. What I'd like to do is match "some_pattern" and print the whole block to stdout. With the example above, I would get only this:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
I've tried with something like this but without success:
grep "Beginning of block\n.*some_pattern.*\n.*End of block"
Any idea how to do this with grep? (or maybe with some other tool)
I guess awk is better for this:
awk '/Beginning of block/ {p=1};
{if (p==1) {a[++n]=$0}};
/some_pattern/ {f=1};
/End of block/ {p=0; if (f==1) {for (i=1; i<=n; i++) print a[i]}; f=0; n=0; delete a}' file
Explanation
It buffers lines while the p flag is "active" and prints them only if some_pattern was matched:
When it finds Beginning of block, it sets p=1 and starts storing the lines in the array a[].
If it finds some_pattern, it sets the flag f to 1, so that we know the pattern has been found.
When it finds End of block it resets p=0. If some_pattern had been found since the last Beginning of block, all the lines that had been stored are printed. Finally a[] is cleared and f is reset; we will have a fresh start when we again encounter Beginning of block.
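The same logic can be wrapped in a small shell function so that the delimiters and the pattern become parameters. This is only a sketch: the function name print_matching_blocks and the variable names startre, endre and patre are made up for illustration. Lines are stored under a running counter so they come back out in input order.

```shell
# Hypothetical helper: print every block that contains a line matching PATTERN.
# Usage: print_matching_blocks START_REGEX END_REGEX PATTERN_REGEX [file...]
print_matching_blocks() {
    startre=$1 endre=$2 patre=$3
    shift 3
    awk -v startre="$startre" -v endre="$endre" -v patre="$patre" '
        $0 ~ startre { inblock = 1; n = 0; found = 0 }   # new block: reset state
        inblock      { buf[++n] = $0 }                   # remember lines in order
        inblock && $0 ~ patre { found = 1 }              # pattern seen in this block
        inblock && $0 ~ endre {
            if (found)
                for (i = 1; i <= n; i++) print buf[i]
            inblock = 0
        }
    ' "$@"
}
```

Called as print_matching_blocks 'Beginning of block' 'End of block' 'some_pattern' file, it should print the same block as the awk command above. Note that awk -v processes backslash escapes in the values, so regexes containing backslashes would need them doubled.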
Other test
$ cat a
Beginning of block
blabla
.........some_pattern.......
and here i am
hello
End of block
Beginning of block
...
... etc.
End of block
$ awk '/Beginning of block/ {p=1}; {if(p==1){a[++n]=$0}}; /some_pattern/ {f=1}; /End of block/ {p=0; if (f==1) {for (i=1; i<=n; i++) print a[i]}; delete a; n=0; f=0}' a
Beginning of block
blabla
.........some_pattern.......
and here i am
hello
End of block
The following might work for you:
sed -n '/Beginning of block/!b;:a;/End of block/!{$!{N;ba}};{/some_pattern/p}' filename
Not sure if I missed something but here is a simpler variation of one of the answers above:
awk '/Beginning of block/ {p=1};
/End of block/ {p=0; print $0};
{if (p==1) print $0}'
You need to print the input line in the End of Block case to get both delimiters.
I wanted a slight variation that doesn't print the delimiters. In the OP's question the delimiter pattern is simple and unique. Then the simplest is to pipe into | grep -v block. My case was more irregular, so I used the variation below. Notice the next statement so the opening block isn't printed by the third statement:
awk '/Beginning of block/ {p=1; next};
/End of block/ {p=0};
{if (p==1) print $0}'
Here's one way using awk:
awk '/Beginning of block/ { r=""; f=1 } f { r = (r ? r ORS : "") $0 } /End of block/ { if (f && r ~ /some_pattern/) print r; f=0 }' file
Results:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
sed -n '
/Beginning of block/,/End of block/ {
/Beginning of block/ {
h
d
}
H
/End of block/ {
x
/some_pattern/p
}
}'
sed handles this kind of task efficiently.
With grep alone, you would need to go through an intermediate file or array.
I have this current solution for CVS status management:
cvs -q status|awk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=1 a=9 s='(Locally Modified)|(Needs Patch)'
This gives me a display of Locally Modified files and files that need patching, which is great.
However, a better solution for me would be one that catches every status that is not equal to 'Up-to-date'.
I have tried s!= and s<> but it only seems to allow =.
A little whitespace will go a long way...
The opposite of $0 ~ s is $0 !~ s, so
cvs -q status | awk '
c-- > 0
$0 !~ s {
if (b)
for (c=b+1; c>1; c--)
print r[(NR-c+1)%b]
print
c=a
}
b {r[NR%b]=$0}
' b=1 a=9 s='Up-to-date'
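To see the negated match on its own, without the context-printing logic, here is a minimal sketch. The function name not_up_to_date is made up, and restricting the test to lines containing "Status:" is my assumption about which lines matter; adjust it to the real cvs output.

```shell
# Hypothetical helper: print lines that carry a "Status:" field
# whose text does not match the regex in s.
not_up_to_date() {
    awk -v s='Up-to-date' '/Status:/ && $0 !~ s' "$@"
}
```

Run as cvs -q status | not_up_to_date, this should list only the files whose status is something other than Up-to-date.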
So ... I know that I can reverse the order of lines in a file using tac or a few other tools, but how do I reorder in the other dimension, i.e. horizontally? I'm trying to do it with the following awk script:
{
out="";
for(i=length($0);i>0;i--) {
out=out substr($0,i,1)}
print out;
}
This seems to reverse the characters, but it's garbled, and I'm not seeing why. What am I missing?
I'm doing this in awk, but is there something better? sed, perhaps?
Here's an example. Input data looks like this:
$ cowsay <<<"hello"
 _______
< hello >
 -------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
And the output looks like this:
$ cowsay <<<"hello" | rev
_______
> olleh <
-------
^__^   \
_______\)oo(  \
\/\)       \)__(
| w----||
||     ||
Note that the output is identical whether I use rev or my own awk script. As you can see, things ARE reversed, but ... it's mangled.
rev is nice, but it doesn't pad input lines. It just reverses them.
The "mangling" you're seeing is because one line may be 20 characters long, and the next may be 15 characters long. In your input text they share a left-hand column. But in your output text, they need to share a right-hand column.
So you need padding. Oh, and asymmetric reversal, as Joachim said.
Here's my revawk:
#!/usr/bin/awk -f
#
length($0)>max {
max=length($0);
}
{
# Reverse the line...
for(i=length($0);i>0;i--) {
o[NR]=o[NR] substr($0,i,1);
}
}
END {
for(i=1;i<=NR;i++) {
# prepend the output with sufficient padding
fmt=sprintf("%%%ds%%s\n",max-length(o[i]));
printf(fmt,"",o[i]);
}
}
(I did this in gawk; I don't think I used any gawkisms, but if you're using a more classic awk variant, you may need to adjust this.)
Use this the same way you'd use rev.
ghoti@pc:~$ echo hello | cowsay | ./revawk | tr '[[]()<>/\\]' '[][)(><\\/]'
                    _______
                   < olleh >
                    -------
            ^__^   /
    _______/(oo)  /
/\/(       /(__)
   | w----||
   ||     ||
If you're moved to do so, you might even run the translate from within the awk script by adding it to the last printf line:
printf(fmt," ",o[i]) | "tr '[[]()<>/\\]' '[][)(><\\/]'";
But I don't recommend it, as it makes the revawk command less useful for other applications.
Your lines aren't the same length, so reversing the cow will break it. What you need to do is to "pad" the lines to be the same length, then reverse.
For example;
cowsay <<<"hello" | awk '{printf "%-40s\n", $0}' | rev
will pad it to 40 columns, and then reverse.
EDIT: @ghoti did a script that sure beats this simplistic reverse, have a look at his answer.
Here's one way using GNU awk and rev
Run like:
awk -f ./script.awk <(echo "hello" | cowsay){,} | rev
Contents of script.awk:
FNR==NR {
if (length > max) {
max = length
}
next
}
{
while (length < max) {
$0=$0 OFS
}
}1
Alternatively, here's the one-liner:
awk 'FNR==NR { if (length > max) max = length; next } { while (length < max) $0=$0 OFS }1' <(echo "hello" | cowsay){,} | rev
Results:
                    _______
                   > olleh <
                    -------
            ^__^   \
    _______\)oo(  \
\/\)       \)__(
   | w----||
   ||     ||
----------------------------------------------------------------------------------------------
Here's another way just using GNU awk:
Run like:
awk -f ./script.awk <(echo "hello" | cowsay){,}
Contents of script.awk:
BEGIN {
FS=""
}
FNR==NR {
if (length > max) {
max = length
}
next
}
{
while (length < max) {
$0=$0 OFS
}
for (i=NF; i>=1; i--) {
printf "%s", (i!=1) ? $i : $i ORS
}
}
Alternatively, here's the one-liner:
awk 'BEGIN { FS="" } FNR==NR { if (length > max) max = length; next } { while (length < max) $0=$0 OFS; for (i=NF; i>=1; i--) printf "%s", (i!=1) ? $i : $i ORS }' <(echo "hello" | cowsay){,}
Results:
                    _______
                   > olleh <
                    -------
            ^__^   \
    _______\)oo(  \
\/\)       \)__(
   | w----||
   ||     ||
----------------------------------------------------------------------------------------------
Explanation:
Here's an explanation of the second answer. I'm assuming a basic knowledge of awk:
FS="" # set the field separator to the empty string so that each
# character becomes its own field (gawk supports this; POSIX
# leaves it unspecified).
FNR==NR { ... } # this is true only while reading the first file in the
# argument list. Here, if the length of the line is greater than
# the variable 'max', then set 'max' to the length of the line.
# 'next' skips the remaining rules and moves on to the next line of input
while ... # So when we read the file for the second time, we loop
# through this file, adding OFS (output FS; which is simply
# a single space) to the end of each line until 'max' is
# reached. This pads the file nicely.
for ... # then loop through the characters on each line in reverse.
# The printf statement is short for ... if the character is
# not the first one, print it; else, print it and ORS.
# ORS is the output record separator and is a newline.
Some other things you may need to know:
The {,} suffix is bash brace expansion (not a wildcard); it is shorthand for repeating the input file name twice.
Unfortunately, it's not standard Bourne shell. However, you could instead use:
<(echo "hello" | cowsay) <(echo "hello" | cowsay)
Also, in the first example, { ... }1 is short for { ... print $0 }
HTH.
You could also do it with bash, coreutils and sed (to make it work with zsh the while loop needs to be wrapped in tr ' ' '\x01' | while ... | tr '\x01' ' ', not sure why yet):
say=hello
longest=$(cowsay "$say" | wc -L)
echo "$say" | rev | cowsay | sed 's/\\/\\\\/g' | rev |
while read; do printf "%*s\n" $longest "$REPLY"; done |
tr '[[]()<>/\\]' '[][)(><\\/]'
Output:
                    _______
                   < hello >
                    -------
            ^__^   /
    _______/(oo)  /
/\/(       /(__)
   | w----||
   ||     ||
This leaves a lot of excess spaces at the end; append | sed 's/ *$//' to remove them.
Explanation
The cowsay output needs to be quoted, especially the backslashes which sed takes care of by duplicating them. To get the correct line width printf '%*s' len str is used, which uses len as the string length parameter. Finally asymmetrical characters are replaced by their counterparts, as done in ghoti's answer.
I don't know if you can do this in AWK, but here are the needed steps:
Identify the length of the longest line in your original; you will need it to give proper spacing to the shorter lines.
            (__)\       )\/\
For the last char on each line, map out the need of start-of-line spaces based on what you acquired from the first step.
< hello >
//Needs ??? extra spaces, because it ends right after '>'.
//It does not have spaces after it, so it misses its correct position after the reverse.
            (__)\       )\/\
< hello >???????????????
For each line, apply the line's needed number of spaces, followed by the original chars in reverse order.
                    _______
                   > olleh <
                    -------
            ^__^   \
    _______\)oo(  \
\/\)       \)__(
   | w----||
   ||     ||
Finally, replace all characters that are not horizontally symmetrical with their horizontally-opposite chars. (< to >, [ to ], etc)
                    _______
                   < olleh >
                    -------
            ^__^   /
    _______/(oo)  /
/\/(       /(__)
   | w----||
   ||     ||
Two things to watch out for:
Text, as you can see, does not come out right when reversed.
Characters like $, % and & are not horizontally symmetrical,
but also might not have an opposite unless you use specialized
Unicode blocks.
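For what it's worth, the four steps above can be done in awk. The following is a rough sketch rather than a polished tool: the function name mirror_art is made up, only the pairs ( ), < > and / \ are swapped, and it assumes one character per column (no tabs, no multibyte characters). As noted above, any text in the art comes out reversed too.

```shell
# Hypothetical helper: mirror ASCII art read from stdin or from files.
mirror_art() {
    awk '
        # step 1: remember every line and the length of the longest one
        { line[NR] = $0; if (length($0) > max) max = length($0) }
        END {
            for (n = 1; n <= NR; n++) {
                out = ""
                # steps 2 and 3: right-pad to max, then walk the line backwards
                padded = sprintf("%-" max "s", line[n])
                for (i = max; i >= 1; i--) {
                    c = substr(padded, i, 1)
                    # step 4: swap characters that have a mirror image
                    if      (c == "(")  c = ")"
                    else if (c == ")")  c = "("
                    else if (c == "<")  c = ">"
                    else if (c == ">")  c = "<"
                    else if (c == "/")  c = "\\"
                    else if (c == "\\") c = "/"
                    out = out c
                }
                sub(/ +$/, "", out)   # drop blanks left over from the original indent
                print out
            }
        }
    ' "$@"
}
```

Something like cowsay hello | mirror_art should then give a cow facing the other way.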
I would say that each line needs to be padded to a fixed column width so that all lines are the same length. So if the first line is a single character followed by a LF, you'll need to pad it with white space before reversing.
I'd like to show all lines except those containing foo, unless they also contain bar. Logically !(foo and (!bar)) === (!foo) or bar, so I can use two separate expressions. Can I do this sort of match with a single grep or egrep? -v doesn't work, since it negates both expressions, and I probably can't use Perl regex.
The following works, but it would be much less work to convert the code if it could be done in egrep:
$ echo '
foo
bar
moofoo
foobar
barbar' | grep -Pv '^((?!bar).)*foo((?!bar).)*$'
bar
foobar
barbar
The issue at hand is speed (looking for patterns in gigabytes of data).
If using awk is fine then following gives desired output
awk 'BEGIN {FS=" "};
{
if ($0 ~ /(foo)/)
{
if ($0 ~ /(bar)/)
{
print $0
}
}
else
{
print $0
}
}' FileContainingText.txt
since this works per line and no pipes are involved this should be fast.
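The nested if/else above boils down to the condition from the question, (!foo) or bar, so a single awk pattern should behave the same. A minimal sketch with a made-up function name; I have not benchmarked it against the grep -P version:

```shell
# Keep a line when it has no "foo", or when it also contains "bar".
keep_unless_foo_without_bar() {
    awk '!/foo/ || /bar/' "$@"
}
```

With the sample input from the question (foo, bar, moofoo, foobar, barbar) this keeps bar, foobar and barbar.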
Trying to merge some data that I have. The input would look like so:
foo bar
foo baz boo
abc def
abc ghi
And I would like the output to look like:
foo bar baz boo
abc def ghi
I have some ideas using some arrays in a shell script, but I was looking for a more elegant or quicker solution.
How about join?
file="file"
join -a1 -a2 <(sort "$file" | sed -n 1~2p) <(sort "$file" | sed -n 2~2p)
The seds there are just splitting the file on odd and even lines
While pixelbeat's answer works, I can't say I'm very enthused about it. I think I'd use awk something like this:
{ for (i=2; i<=NF; i++) { lines[$1] = lines[$1] " " $i;} }
END { for (i in lines) printf("%s%s\n", i, lines[i]); }
This shouldn't require pre-sorting the data, and should work fine regardless of the number or length of the fields (short of overflowing memory, of course). Its only obvious shortcoming is that its output is in an arbitrary order. If you need it sorted, you'll need to pipe the output through sort (but getting back to the original order would be something else).
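If the original order of the keys matters, one option is to record each key the first time it appears and walk that list at the end. A sketch along those lines; the function name merge_by_first_field and the array names order and rest are my own:

```shell
# Hypothetical helper: merge lines that share a first field,
# keeping keys in the order in which they first appear.
merge_by_first_field() {
    awk '
        NF == 0 { next }                                   # ignore blank lines
        !($1 in rest) { order[++n] = $1; rest[$1] = "" }   # first time we see this key
        { for (i = 2; i <= NF; i++) rest[$1] = rest[$1] " " $i }
        END { for (k = 1; k <= n; k++) print order[k] rest[order[k]] }
    ' "$@"
}
```

Like the version above, it holds everything in memory until the end of the input.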
An awk solution
awk '
{key=$1; $1=""; x[key] = x[key] $0}
END {for (key in x) {print key x[key]}}
' filename
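If the length of the first field is fixed, you can use uniq with the -w option. Otherwise you might want to use awk (warning: untested code):
awk '
BEGIN { last = "" }
{
    if ($1 == last) {
        # same key as the previous line: append the remaining fields
        for (i = 2; i <= NF; i++) printf " %s", $i
    } else {
        # new key: finish the previous output line, then start a new one
        if (NR > 1) printf "\n"
        printf "%s", $0
        last = $1
    }
}
END { if (NR > 0) printf "\n" }'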
Pure Bash, for truly alternating lines:
infile="paste.dat"
toggle=0
while read -a line ; do
if [ $toggle -eq 0 ] ; then
echo -n "${line[@]}"
else
unset line[0] # remove first element
echo " ${line[@]}"
fi
((toggle=1-toggle))
done < "$infile"
Based on fgm's pure Bash snippet:
text='
foo bar
foo baz boo
abc def
abc ghi
'
count=0
oneline=""
firstword=""
while IFS=" " read -r -a line ; do
    [[ ${#line[@]} -eq 0 ]] && continue # skip blank lines
    let count++
    if [[ $count -eq 1 ]]; then
        firstword="${line[0]}"
        oneline="${line[@]}"
    else
        if [[ "$firstword" == "${line[0]}" ]]; then
            unset line[0] # remove first word of line
            oneline="${oneline} ${line[@]}"
        else
            printf "%s\n" "${oneline}"
            oneline="${line[@]}"
            firstword="${line[0]}"
        fi
    fi
done <<< "$text"
printf "%s\n" "${oneline}" # print the last merged line