awk/sed/shell to merge/concatenate data - join

Trying to merge some data that I have. The input would look like so:
foo bar
foo baz boo
abc def
abc ghi
And I would like the output to look like:
foo bar baz boo
abc def ghi
I have some ideas using some arrays in a shell script, but I was looking for a more elegant or quicker solution.

How about join?
file="file"
join -a1 -a2 <(sort "$file" | sed -n 1~2p) <(sort "$file" | sed -n 2~2p)
The two sed commands just split the sorted file into its odd and even lines.
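Note that the first~step address form (1~2p, 2~2p) is a GNU sed extension. A minimal sketch of a portable equivalent, assuming only a POSIX awk; the function names odd_lines and even_lines are made up for illustration and are not part of the answer above:

```shell
# Hypothetical helpers: print the odd- or even-numbered lines of the input.
odd_lines()  { awk 'NR % 2 == 1' "$@"; }
even_lines() { awk 'NR % 2 == 0' "$@"; }

# Example with the data from the question, sorted as in the join command.
printf '%s\n' 'foo bar' 'foo baz boo' 'abc def' 'abc ghi' | sort | odd_lines
```

Either helper could stand in for the corresponding sed -n call inside the process substitutions.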

While pixelbeat's answer works, I can't say I'm very enthused about it. I think I'd use awk something like this:
{ for (i=2; i<=NF; i++) { lines[$1] = lines[$1] " " $i;} }
END { for (i in lines) printf("%s%s\n", i, lines[i]); }
This shouldn't require pre-sorting the data, and should work fine regardless of the number or length of the fields (short of overflowing memory, of course). Its only obvious shortcoming is that its output is in an arbitrary order. If you need it sorted, you'll need to pipe the output through sort (but getting back to the original order would be something else).
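If the original order matters, one option is to remember the order in which keys first appear. This is only a sketch along the lines of the awk above; merge_by_key and the order array are names I made up for illustration:

```shell
# Hypothetical helper: merge lines that share a first field,
# keeping keys in the order they first appear in the input.
merge_by_key() {
    awk '
    !($1 in lines) { order[++n] = $1; lines[$1] = "" }   # remember first appearance
    { for (i = 2; i <= NF; i++) lines[$1] = lines[$1] " " $i }
    END { for (k = 1; k <= n; k++) printf "%s%s\n", order[k], lines[order[k]] }
    ' "$@"
}

printf '%s\n' 'foo bar' 'foo baz boo' 'abc def' 'abc ghi' | merge_by_key
```

Like the version above, it needs no pre-sorting and holds all keys in memory until the end of input.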

An awk solution
awk '
{key=$1; $1=""; x[key] = x[key] $0}
END {for (key in x) {print key x[key]}}
' filename

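If the length of the first field is fixed, you can use uniq with the -w option. Otherwise you might want to use awk (warning: untested code; it assumes lines with the same first field are adjacent):
awk '
BEGIN { last = "" }
{
    if ($1 == last) {
        # same key as the previous line: append the remaining fields
        for (i = 2; i <= NF; i++) printf " %s", $i
    } else {
        # new key: finish the previous output line and start a new one
        if (NR > 1) print ""
        printf "%s", $0
        last = $1
    }
}
END { if (NR > 0) print "" }' file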

Pure Bash, for truly alternating lines:
infile="paste.dat"
toggle=0
while read -a line ; do
if [ $toggle -eq 0 ] ; then
echo -n "${line[@]}"
else
unset line[0] # remove first element
echo " ${line[@]}"
fi
((toggle=1-toggle))
done < "$infile"

Based on fgm's pure Bash snippet:
text='
foo bar
foo baz boo
abc def
abc ghi
'
count=0
oneline=""
firstword=""
while IFS=" " read -r -a line ; do
    [[ ${#line[@]} -eq 0 ]] && continue # skip blank lines
    let count++
    if [[ $count -eq 1 ]]; then
        firstword="${line[0]}"
        oneline="${line[@]}"
    else
        if [[ "$firstword" == "${line[0]}" ]]; then
            unset line[0] # remove first word of line
            oneline="${oneline} ${line[@]}"
        else
            printf "%s\n" "${oneline}"
            oneline="${line[@]}"
            firstword="${line[0]}"
        fi
    fi
done <<< "$text"
[[ -n $oneline ]] && printf "%s\n" "${oneline}" # print the last merged line


Parsing simple string with awk or sed in linux

original string :
A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/
Depth of directories will vary, but /trunk part will always remain the same.
And a single character in front of /trunk is the indicator of that line.
desired output :
A /trunk/apple
B /trunk/apple
Z /trunk/orange
Q /trunk/melon/juice/venti/straw
*** edit
I'm sorry I made a mistake by adding a slash at the end of each path in the original string which made the output confusing. Original string didn't have the slash in front of the capital letter, but I'll leave it be.
my attempt :
echo $str1 | sed 's/\(.\/trunk\)/\n\1/g'
I feel like it should work but it doesn't.
With GNU awk for multi-char RS and RT:
$ awk -v RS='([^/]+/){2}[^/\n]+' 'RT{sub("/",OFS,RT); print RT}' file
A trunk/apple
B trunk/apple
Z trunk/orange
I'm setting RS to a regexp describing each string you want to match, i.e. 2 repetitions of non-/s followed by / and then a final string of non-/s (and non-newline for the last string on the input line). RT is automatically set to each of the matching strings, so then I just change the first / to a blank and print the result.
If each path isn't always 3 levels deep but does always start with something/trunk/, e.g.:
$ cat file
A/trunk/apple/banana/B/trunk/apple/Z/trunk/orange
then:
$ awk -v RS='[^/]+/trunk/' 'RT{if (NR>1) print pfx $0; pfx=gensub("/"," ",1,RT)} END{printf "%s%s", pfx, $0}' file
A trunk/apple/banana/
B trunk/apple/
Z trunk/orange
To handle more complex input, where any number of / separated values may follow trunk on a single line, please try the following.
awk '
{
gsub(/[^/]*\/trunk/,OFS"&")
sub(/^ /,"")
sub(/\//,OFS"&")
gsub(/ +[^/]*\/trunk\/[^[:space:]]+/,"\n&")
sub(/\n/,OFS)
gsub(/\n /,ORS)
gsub(/\/trunk/,OFS"&")
sub(/[[:space:]]+/,OFS)
}
1
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
gsub(/[^/]*\/trunk/,OFS"&") ##Globally substituting everything from / to till next / followed by trunk/ with space and matched value.
sub(/^ /,"") ##Substituting starting space with NULL here.
sub(/\//,OFS"&") ##Substituting first / with space / here.
gsub(/ +[^/]*\/trunk\/[^[:space:]]+/,"\n&") ##Globally substituting spaces followed by everything till / trunk till space comes with new line and matched values.
sub(/\n/,OFS) ##Substituting new line with space.
gsub(/\n /,ORS) ##Globally substituting new line space with ORS.
gsub(/\/trunk/,OFS"&") ##Globally substituting /trunk with OFS and matched value.
sub(/[[:space:]]+/,OFS) ##Substituting spaces with OFS here.
}
1 ##Printing edited/non-edited line here.
' Input_file ##Mentioning Input_file name here.
With your shown samples, please try following awk code.
awk '{gsub(/\/trunk/,OFS "&");gsub(/trunk\/[^/]*\//,"&\n")} 1' Input_file
In awk you can try this solution. It deals with the special requirement of removing forward slashes when the next character is upper case. Will not win a design award but works.
$ echo "A/trunk/apple/B/trunk/apple/Z/trunk/orange" |
awk -F '' '{ x=""; for(i=1;i<=NF;i++){
if($(i+1)~/[A-Z]/&&$i=="/"){$i=""};
if($i~/[A-Z]/){ printf x""$i" "}
else{ x="\n"; printf $i } }; print "" }'
A /trunk/apple
B /trunk/apple
Z /trunk/orange
Also works for n words. Actually works with anything that follows the given pattern.
$ echo "A/fruits/apple/mango/B/anything/apple/pear/banana/Z/ball/orange/anything" |
awk -F '' '{ x=""; for(i=1;i<=NF;i++){
if($(i+1)~/[A-Z]/&&$i=="/"){$i=""};
if($i~/[A-Z]/){ printf x""$i" "}
else{ x="\n"; printf $i } }; print "" }'
A /fruits/apple/mango
B /anything/apple/pear/banana
Z /ball/orange/anything
This might work for you (GNU sed):
sed 's/[^/]*/& /;s/\//\n/3;P;D' file
Separate the first word from the first / by a space.
Replace the third / by a newline.
Print/delete the first line and repeat.
If the first word has the property that it is only one character long:
sed 's/./& /;s#/\(./\)#\n\1#;P;D' file
Or if the first word has the property that it begins with an upper case character:
sed 's/[[:upper:]][^/]*/& /;s#/\([[:upper:]][^/]*/\)#\n\1#;P;D' file
Or if the first word has the property that it is followed by /trunk/:
sed -E 's#([^/]*)(/trunk/)#\n\1 \2#g;s/.//' file
With GNU sed:
$ str="A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/"
$ sed -E 's|/?(.)(/trunk/)|\n\1 \2|g;s|/$||' <<< "$str"
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw
Note the first empty output line. If it is undesirable we can separate the processing of the first output line:
$ sed -E 's|(.)|\1 |;s|/(.)(/trunk/)|\n\1 \2|g;s|/$||' <<< "$str"
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw
Using gnu awk you could use FPAT to set contents of each field using a pattern.
When looping over the fields, replace the first / with a space followed by / (" /").
str1="A/trunk/apple/B/trunk/apple/Z/trunk/orange"
echo $str1 | awk -v FPAT='[^/]+/trunk/[^/]+' '{
for(i=1;i<=NF;i++) {
sub("/", " /", $i)
print $i
}
}'
The pattern matches
[^/]+ Match one or more chars other than /
/trunk/[^/]+ Match /trunk/ and then one or more chars other than /
Output
A /trunk/apple
B /trunk/apple
Z /trunk/orange
Other patterns that can be used by FPAT after the updated question:
Matching a word boundary \\< and an uppercase char A-Z and after /trunk repeat / and lowercase chars
FPAT='\\<[A-Z]/trunk(/[a-z]+)*'
If the length of the strings for the directories after /trunk are at least 2 characters:
FPAT='\\<[A-Z]/trunk(/[^/]{2,})*'
If there can be no separate folders that consist of a single uppercase char A-Z
FPAT='\\<[A-Z]/trunk(/([^/A-Z][^/]*|[^/]{2,}))*'
Output
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw
Assuming your data will always be in the format provided as a single string, you can try this sed.
$ sed 's/$/\//;s|\([A-Z]\)\([a-z/]*\)/\([a-z]*\?\)|\1 \2\3\n|g' input_file
$ echo "A/trunk/apple/pine/skunk/B/trunk/runk/bunk/apple/Z/trunk/orange/T/fruits/apple/mango/P/anything/apple/pear/banana/L/ball/orange/anything/S/fruits/apple/mango/B/rupert/cream/travel/scout/H/tall/mountains/pottery/barnes" | sed 's/$/\//;s|\([A-Z]\)\([a-z/]*\)/\([a-z]*\?\)|\1 \2\3\n|g'
A /trunk/apple/pine/skunk
B /trunk/runk/bunk/apple
Z /trunk/orange
T /fruits/apple/mango
P /anything/apple/pear/banana
L /ball/orange/anything
S /fruits/apple/mango
B /rupert/cream/travel/scout
H /tall/mountains/pottery/barnes
Some fun with perl, where you can use a non-consuming regex to autosplit into the @F array, then just print however you want.
perl -lanF'/(?=.{1,2}trunk)/' -e 'print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2'
Step 1: Split
perl -lanF'/(?=.{1,2}trunk)/'
This will take the input stream, and split each line whenever the pattern .{1,2}trunk is encountered
Because we want to retain trunk and the preceding 1 or 2 chars, we wrap the split pattern in (?=) for a non-consuming lookahead
This splits things up this way:
$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e 'print join " ", @F'
A /trunk/apple/ B /trunk/apple/ Z /trunk/orange/citrus/ Q /trunk/melon/juice/venti/straw/
Step 2: Format output:
The @F array contains pairs that we want to print in order, so we'll iterate half of the array indices, and print 2 at a time:
print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2 --> Double the iterator, and print pairs
using perl -l means each print has an implicit \n at the end
The results:
$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e 'print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2'
A /trunk/apple/
B /trunk/apple/
Z /trunk/orange/citrus/
Q /trunk/melon/juice/venti/straw/
Endnote: Perl obfuscation that didn't work.
Any array in perl can be cast as a hash, of the format (key,val,key,val....)
So %F=@F; print "$_ $F{$_}" for keys %F seems like it would be really slick
But you lose order:
$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e '%F=@F; print "$_ $F{$_}" for keys %F'
Z /trunk/orange/citrus/
A /trunk/apple/
Q /trunk/melon/juice/venti/straw/
B /trunk/apple/
Update
With your new data file:
$ cat file
A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/
This GNU awk solution:
awk '
{
sub(/[/]$/,"")
gsub(/[[:upper:]]{1}/,"& ")
print gensub(/([/])([[:upper:]])/,"\n\\2","g")
}' file
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw
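Several of the answers above rely on gawk extensions (regex RS with RT, FPAT, gensub). As a rough sketch for other awks, you can split on / and start a new output line whenever the next field is trunk. This assumes trunk only ever appears directly after the indicator, and split_trunk is a name I made up:

```shell
# Hypothetical helper: one output line per "X/trunk/..." group.
split_trunk() {
    awk -F/ '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($(i + 1) == "trunk") {        # $i is the indicator before /trunk
                if (out != "") print out
                out = $i " "
            } else if ($i != "") {            # skip empty fields (e.g. trailing /)
                out = out "/" $i
            }
        }
        if (out != "") print out
    }' "$@"
}

echo 'A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/' | split_trunk
```

It does not care how deep each path is or whether the line ends with a slash.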

extract the adjacent character of selected letter

I have this text file:
# cat letter.txt
this
is
just
a
test
to
check
if
grep
works
The letter "e" appears in 3 words.
# grep e letter.txt
test
check
grep
Is there any way to return the letter printed on left of the selected character?
expected.txt
t
h
r
With shown samples in awk, could you please try following.
awk '/e/{print substr($0,index($0,"e")-1,1)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/e/{ ##Looking if current line has e in it then do following.
print substr($0,index($0,"e")-1,1)
##Printing sub string from starting value of index e-1 and print 1 character from there.
}
' Input_file ##Mentioning Input_file name here.
You can use positive lookahead to match a character that is followed by an e, without making the e part of the match.
cat letter.txt | grep -oP '.(?=e)'
With sed:
sed -nE 's/.*(.)e.*/\1/p' letter.txt
Assuming you have this input file:
cat file
this
is
just
a
test
to
check
if
grep
works
egg
element
You may use this grep + sed solution to find letter or empty string before e:
grep -oE '(^|.)e' file | sed 's/.$//'
t
h
r
l
m
Or alternatively this single awk command should also work:
awk -F 'e' 'NF > 1 {
for (i=1; i<NF; i++) print substr($i, length($i), 1)
}' file
This might work for you (GNU sed):
sed -nE '/(.)e/{s//\n\1\n/;s/^[^\n]*\n//;P;D}' file
Turn off implicit printing and enable extended regexp -nE.
Focus only on lines that meet the requirements i.e. contain a character before e.
Surround the required character by newlines.
Remove any characters before and including the first newline.
Print the first line (up to the second newline).
Delete the first line (including the newline).
Repeat.
N.B. The solution above prints each such character on a separate line.
To print all such characters from one input line together on a single line, use:
sed -nE '/(.e)/{s//\n\1/g;s/^/e/;s/e[^\n]*\n?//g;s/\B/ /g;p}' file
N.B. Remove the s/\B/ /g if space separation is not needed.
With GNU awk you can use empty string as FS to split the input as individual characters:
awk -v FS= '/[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file
t
h
r
The for loop starts at i=2, so an "e" at the beginning of a line is skipped.
Edit: the version below also prints an empty string if e is the first character in the word.
For example, this input:
cat file2
grep
erroneously
egg
Wednesday
effectively
awk -v FS= '/^[e]/ {print ""} /[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file2
r
n
W
n
f
v
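An empty FS is not something every awk handles the same way, so here is a sketch of the same scan using substr instead, which should work in any POSIX awk. before_e is a made-up name, and like the first command above it skips an "e" at the start of a line:

```shell
# Hypothetical helper: for every "e" that has a character to its left,
# print that character on its own line.
before_e() {
    awk '{
        for (i = 2; i <= length($0); i++)
            if (substr($0, i, 1) == "e")
                print substr($0, i - 1, 1)
    }' "$@"
}

printf '%s\n' this is just a test to check if grep works | before_e
```

Unlike the index()-based answer, this reports every occurrence on a line, not just the first.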

Word count and it output

I have the following lines:
123;123;#rss
123;123;#site #design #rss
123;123;#rss
123;123;#rss
123;123;#site #design
and need to count how many times each tag appears. I do the following:
grep -Eo '#[a-z].*' ./1.txt | tr "\ " "\n" | uniq -c
i.e. first select only the tags from the strings, and then break them down and count it.
output:
1 #rss
1 #site
1 #design
3 #rss
1 #site
1 #design
instead of the expected:
2 #site
4 #rss
2 #design
It seems that the problem is in the non-printable characters, which makes counting incorrect. Or is it something else? Can anyone suggest a correct solution?
uniq -c only merges adjacent duplicate lines, so it works only on sorted input.
Also, you can drop the tr by changing the regex to #[a-z]*.
grep -Eo '#[a-z]*' ./1.txt | sort | uniq -c
prints
2 #design
4 #rss
2 #site
as expected.
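To see the difference in isolation, here is a small sketch with three made-up tag lines; the awk at the end only strips the leading padding that uniq -c adds:

```shell
# uniq only collapses adjacent duplicates, so unsorted input is undercounted.
tags='#rss
#site
#rss'

unsorted=$(printf '%s\n' "$tags" | uniq -c | awk '{print $1, $2}')
sorted=$(printf '%s\n' "$tags" | sort | uniq -c | awk '{print $1, $2}')

printf 'without sort:\n%s\n\nwith sort:\n%s\n' "$unsorted" "$sorted"
```

Without sort, the two #rss lines are separated by #site and are counted once each.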
It can be done in a single gnu awk:
awk -v RS='#[a-zA-Z]+' 'RT {++freq[RT]} END {for (i in freq) print freq[i], i}' file
2 #site
2 #design
4 #rss
Or else a grep + awk solution:
grep -iEo '#[a-z]+' file |
awk '{++freq[$1]} END {for (i in freq) print freq[i], i}'
2 #site
2 #design
4 #rss
Using awk as an alternative:
awk -F '[ ;]' '{ for(i=3;i<=NF;i++) { map[$i]++ } } END { for (i in map) { print map[i]" "i} }' file
Set the field separator to a space or a ";". Then loop from the third field to the last field (NF), adding to an array map, with the field as the index and an incrementing counter as the value. At the end of the file processing, loop through the map array and print the indexes/values.
With your shown samples only, could you please try following. Written and tested in GNU awk.
awk '
{
while($0){
match($0,/#[^ ]*/)
count[substr($0,RSTART,RLENGTH)]++
$0=substr($0,RSTART+RLENGTH)
}
}
END{
for(key in count){
print count[key],key
}
}' Input_file
Output will be as follows.
2 #site
2 #design
4 #rss
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
while($0){ ##Running while till line value.
match($0,/#[^ ]*/) ##using match function to match regex #[^ ]* in current line.
count[substr($0,RSTART,RLENGTH)]++ ##Creating count array which has index as matched sub string and keep increasing its value with 1 here.
$0=substr($0,RSTART+RLENGTH) ##Putting rest of line after match into current line here.
}
}
END{ ##Starting END block of this program from here.
for(key in count){ ##using for loop to go through count here.
print count[key],key ##printing value of count which has index as key and key here.
}
}
' Input_file ##Mentioning Input_file name here.
$ cut -d';' -f3 file | tr ' ' '\n' | sort | uniq -c
2 #design
4 #rss
2 #site

grep a block of text delimited by two key lines

I have a text file that contains text blocks roughly formatted like this:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
Beginning of block
...
... etc.
The blocks can have any number of lines but always start and end with the two delimiters. What I'd like to do is match "some_pattern" and print the whole block to stdout. With the example above, I would get this only:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
I've tried with something like this but without success:
grep "Beginning of block\n.*some_pattern.*\n.*End of block"
Any idea how to do this with grep? (or maybe with some other tool)
I guess awk is better for this:
awk '/Beginning of block/ {p=1; n=0};
{if (p==1) {a[++n]=$0}};
/some_pattern/ {f=1};
/End of block/ {p=0; if (f==1) {for (i=1; i<=n; i++) print a[i]};f=0; n=0; delete a}' file
Explanation
It just prints when the p flag is "active" and some_pattern is matched:
When it finds Beginning of block, then makes variable p=1 and starts storing the lines in the array a[].
If it finds some_pattern, it sets the flag f to 1, so that we know the pattern has been found.
When it finds End of block it resets p=0. If some_pattern had been found since the last Beginning of block, all the lines that had been stored are printed. Finally a[] is cleared and f is reset; we will have a fresh start when we again encounter Beginning of block.
Other test
$ cat a
Beginning of block
blabla
.........some_pattern.......
and here i am
hello
End of block
Beginning of block
...
... etc.
End of block
$ awk '/Beginning of block/ {p=1; n=0}; {if(p==1){a[++n]=$0}}; /some_pattern/ {f=1}; /End of block/ {p=0; if (f==1) {for (i=1; i<=n; i++) print a[i]}; delete a; f=0; n=0}' a
Beginning of block
blabla
.........some_pattern.......
and here i am
hello
End of block
The following might work for you:
sed -n '/Beginning of block/!b;:a;/End of block/!{$!{N;ba}};{/some_pattern/p}' filename
Not sure if I missed something but here is a simpler variation of one of the answers above:
awk '/Beginning of block/ {p=1};
/End of block/ {p=0; print $0};
{if (p==1) print $0}'
You need to print the input line in the End of Block case to get both delimiters.
I wanted a slight variation that doesn't print the delimiters. In the OP's question the delimiter pattern is simple and unique. Then the simplest is to pipe into | grep -v block. My case was more irregular, so I used the variation below. Notice the next statement so the opening block isn't printed by the third statement:
awk '/Beginning of block/ {p=1; next};
/End of block/ {p=0};
{if (p==1) print $0}'
Here's one way using awk:
awk '/Beginning of block/ { r=""; f=1 } f { r = (r ? r ORS : "") $0 } /End of block/ { if (f && r ~ /some_pattern/) print r; f=0 }' file
Results:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
sed -n "
/Beginning of block/,/End of block/ {
N
/End of block/ {
s/some_pattern/&/p
}
}"
sed is efficient for this kind of processing.
With grep, you would have to go through an intermediate file or array.
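If you need this often, the buffering idea from the awk answers can be wrapped in a small function that takes the delimiters and the pattern as arguments. This is only a sketch; grep_block and its argument order are my own invention, and because the regexes are passed with -v, any backslashes in them would need doubling:

```shell
# Hypothetical helper.
# usage: grep_block START_RE END_RE PATTERN_RE [file...]
grep_block() {
    start=$1 end=$2 pat=$3
    shift 3
    awk -v start="$start" -v end="$end" -v pat="$pat" '
    $0 ~ start          { buf = ""; inblock = 1; hit = 0 }     # new block begins
    inblock             { buf = buf $0 ORS; if ($0 ~ pat) hit = 1 }
    inblock && $0 ~ end { if (hit) printf "%s", buf; inblock = 0 }
    ' "$@"
}

# Example: grep_block 'Beginning of block' 'End of block' 'some_pattern' file
```

Lines outside any block are ignored, and a block is printed only once its end delimiter has been seen.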

How to horizontally mirror ascii art?

So ... I know that I can reverse the order of lines in a file using tac or a few other tools, but how do I reorder in the other dimension, i.e. horizontally? I'm trying to do it with the following awk script:
{
out="";
for(i=length($0);i>0;i--) {
out=out substr($0,i,1)}
print out;
}
This seems to reverse the characters, but it's garbled, and I'm not seeing why. What am I missing?
I'm doing this in awk, but is there something better? sed, perhaps?
Here's an example. Input data looks like this:
$ cowsay <<<"hello"
 _______
< hello >
 -------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
And the output looks like this:
$ cowsay <<<"hello" | rev
_______
> olleh <
-------
^__^   \
_______\)oo(  \
\/\)       \)__(
| w----||
||     ||
Note that the output is identical whether I use rev or my own awk script. As you can see, things ARE reversed, but ... it's mangled.
rev is nice, but it doesn't pad input lines. It just reverses them.
The "mangling" you're seeing is because one line may be 20 characters long, and the next may be 15 characters long. In your input text they share a left-hand column. But in your output text, they need to share a right-hand column.
So you need padding. Oh, and asymmetric reversal, as Joachim said.
Here's my revawk:
#!/usr/bin/awk -f
#
length($0)>max {
max=length($0);
}
{
# Reverse the line...
for(i=length($0);i>0;i--) {
o[NR]=o[NR] substr($0,i,1);
}
}
END {
for(i=1;i<=NR;i++) {
# prepend the output with sufficient padding
fmt=sprintf("%%%ds%%s\n",max-length(o[i]));
printf(fmt,"",o[i]);
}
}
(I did this in gawk; I don't think I used any gawkisms, but if you're using a more classic awk variant, you may need to adjust this.)
Use this the same way you'd use rev.
ghoti@pc:~$ echo hello | cowsay | ./revawk | tr '[[]()<>/\\]' '[][)(><\\/]'
                    _______
                   < olleh >
                    -------
            ^__^   /
    _______/(oo)  /
/\/(       /(__)
   | w----||
   ||     ||
If you're moved to do so, you might even run the translate from within the awk script by adding it to the last printf line:
printf(fmt," ",o[i]) | "tr '[[]()<>/\\]' '[][)(><\\/]'";
But I don't recommend it, as it makes the revawk command less useful for other applications.
Your lines aren't the same length, so reversing the cow will break it. What you need to do is to "pad" the lines to be the same length, then reverse.
For example;
cowsay <<<"hello" | awk '{printf "%-40s\n", $0}' | rev
will pad it to 40 columns, and then reverse.
EDIT: @ghoti did a script that sure beats this simplistic reverse, have a look at his answer.
Here's one way using GNU awk and rev
Run like:
awk -f ./script.awk <(echo "hello" | cowsay){,} | rev
Contents of script.awk:
FNR==NR {
if (length > max) {
max = length
}
next
}
{
while (length < max) {
$0=$0 OFS
}
}1
Alternatively, here's the one-liner:
awk 'FNR==NR { if (length > max) max = length; next } { while (length < max) $0=$0 OFS }1' <(echo "hello" | cowsay){,} | rev
Results:
                    _______
                   > olleh <
                    -------
            ^__^   \
    _______\)oo(  \
\/\)       \)__(
   | w----||
   ||     ||
----------------------------------------------------------------------------------------------
Here's another way just using GNU awk:
Run like:
awk -f ./script.awk <(echo "hello" | cowsay){,}
Contents of script.awk:
BEGIN {
FS=""
}
FNR==NR {
if (length > max) {
max = length
}
next
}
{
while (length < max) {
$0=$0 OFS
}
for (i=NF; i>=1; i--) {
printf (i!=1) ? $i : $i ORS
}
}
Alternatively, here's the one-liner:
awk 'BEGIN { FS="" } FNR==NR { if (length > max) max = length; next } { while (length < max) $0=$0 OFS; for (i=NF; i>=1; i--) printf (i!=1) ? $i : $i ORS }' <(echo "hello" | cowsay){,}
Results:
                    _______
                   > olleh <
                    -------
            ^__^   \
    _______\)oo(  \
\/\)       \)__(
   | w----||
   ||     ||
----------------------------------------------------------------------------------------------
Explanation:
Here's an explanation of the second answer. I'm assuming a basic knowledge of awk:
FS="" # set the field separator to the empty string so that each
# character becomes its own field.
FNR==NR { ... } # this is true only for the first file in the argument
# list. Here, if the length of the line is greater than the
# variable 'max', then set 'max' to the length of the line.
# 'next' simply means skip to the next line of input.
while ... # So when we read the file for the second time, we loop
# through this file, adding OFS (output FS; which is simply
# a single space) to the end of each line until 'max' is
# reached. This pads the file nicely.
for ... # then loop through the characters on each line in reverse.
# The printf statement is short for ... if the character is
# not the first one, print it; else, print it and ORS.
# ORS is the output record separator and is a newline.
Some other things you may need to know:
The {,} suffix is brace expansion, used here as a shorthand for repeating the input file name twice.
Unfortunately, it's not standard Bourne shell. However, you could instead use:
<(echo "hello" | cowsay) <(echo "hello" | cowsay)
Also, in the first example, { ... }1 is short for { ... print $0 }
HTH.
You could also do it with bash, coreutils and sed (to make it work with zsh the while loop needs to be wrapped in tr ' ' '\x01' | while ... | tr '\x01' ' ', not sure why yet):
say=hello
longest=$(cowsay "$say" | wc -L)
echo "$say" | rev | cowsay | sed 's/\\/\\\\/g' | rev |
while read; do printf "%*s\n" $longest "$REPLY"; done |
tr '[[]()<>/\\]' '[][)(><\\/]'
Output:
                    _______
                   < hello >
                    -------
            ^__^   /
    _______/(oo)  /
/\/(       /(__)
   | w----||
   ||     ||
This leaves a lot of excess spaces at the end, append | sed 's/ *$//' to remove.
Explanation
The cowsay output needs to be quoted, especially the backslashes which sed takes care of by duplicating them. To get the correct line width printf '%*s' len str is used, which uses len as the string length parameter. Finally asymmetrical characters are replaced by their counterparts, as done in ghoti's answer.
I don't know if you can do this in AWK, but here are the needed steps:
Identify the length of your original's longest line; you will need it to give proper spacing to any shorter lines.
(__)\ )\/\
For the last char on each line, map out the need of start-of-line spaces based on what you acquired from the first step.
< hello >
//Needs ??? extra spaces, because it ends right after '>'.
//It does not have spaces after it, making it miss its correct position after reverse.
(__)\ )\/\
< hello >???????????????
For each line, apply the line's needed number of spaces, followed by the original chars in reverse order.
                    _______
                   > olleh <
                    -------
            ^__^   \
    _______\)oo(  \
\/\)       \)__(
   | w----||
   ||     ||
Finally, replace all characters that are not horizontally symmetrical with their horizontally-opposite chars. (< to >, [ to ], etc)
                    _______
                   < olleh >
                    -------
            ^__^   /
    _______/(oo)  /
/\/(       /(__)
   | w----||
   ||     ||
Two things to watch out for:
Text, as you can see, does not come out right when reversed.
Characters like $, % and & are not horizontally symmetrical,
but also might not have an opposite unless you use specialized
Unicode blocks.
I would say that you may need each line to be fixed column width so each line is the same length. So if the first line is a character followed by a LF, you'll need to pad the reverse with white space before reversing.
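Putting the steps from the answers above together (pad to the longest line, reverse, swap the asymmetric characters), here is a rough single-awk sketch. The function name mirror and the list of swapped pairs are my own choices; extend the pairs string if your art uses other asymmetric characters:

```shell
# Hypothetical helper: horizontally mirror ASCII art read from stdin or files.
mirror() {
    awk '
    { line[NR] = $0; if (length($0) > max) max = length($0) }
    END {
        pairs = "()[]{}<>/\\"             # characters swapped in pairs
        for (n = 1; n <= NR; n++) {
            out = ""
            for (i = max; i >= 1; i--) {
                # positions past the end of a short line count as padding
                c = (i <= length(line[n])) ? substr(line[n], i, 1) : " "
                p = index(pairs, c)
                if (p) c = substr(pairs, (p % 2) ? p + 1 : p - 1, 1)
                out = out c
            }
            sub(/ +$/, "", out)           # drop trailing padding
            print out
        }
    }' "$@"
}

printf '%s\n' '<ab>' '/x' | mirror
```

It would be used the same way as rev, e.g. cowsay hello | mirror. Any text in the art still comes out reversed.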
