grep in pipeline: why it does not work

I want to extract certain information from the output of a program, but my method does not work. I wrote a rather simple script:
#!/usr/bin/env python
print("first hello world.")
print("second")
After making the script executable, I type ./test | grep "first|second". I expect it to show the two sentences. But it does not show anything. Why?

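Escape the pipe. In grep's default basic regular expressions, a bare | is an ordinary character, so the pattern only matches the literal text first|second. GNU grep treats \| as alternation: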
$ ./test | grep "first\|second"
first hello world.
second
Also bear in mind that the shebang must be #!/usr/bin/env python, not just #/usr/bin/env python.

Use \| instead of |:
./test | grep "first\|second"
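A small sketch comparing the options side by side. `emit` is a hypothetical shell function standing in for ./test (it prints the same two lines as the Python script). \| inside a basic regex is a GNU extension, while -E (extended regexes) and repeated -e patterns are standard POSIX grep options.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./test: prints the same two lines.
emit() {
  printf '%s\n' "first hello world." "second"
}

# Basic regex (grep's default): a bare | is an ordinary character,
# so this looks for the literal text "first|second" and finds nothing.
emit | grep 'first|second' || echo "(no match)"

# GNU grep: \| means alternation inside a basic regex.
emit | grep 'first\|second'

# POSIX extended regex: | is alternation without a backslash.
emit | grep -E 'first|second'

# POSIX multiple patterns: a line is printed if it matches any -e pattern.
emit | grep -e first -e second
```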

Related

How can I make grep show a line ignoring the words I want?

I am trying to use grep with the pwd command.
So, if I enter pwd, it shows me something like:
/home/hrq/my-project/
But for a script I am writing, I need to combine it with grep so that it only prints what comes after hrq/. In other words, I always need to hide my home folder (the /home/hrq/ part) and show only what follows it (in this case, only my-project).
Is it possible?
I tried something like
pwd | grep -ov 'home', since I read that the -v flag acts like a NOT operator, and combined it with the -o (only matching) flag. But it didn't work.
Given:
$ pwd
/home/foo/tmp
$ echo "$PWD"
/home/foo/tmp
Depending on what you actually want to do, either of these is probably what you should be using rather than grep:
$ basename "$PWD"
tmp
$ echo "${PWD#/home/foo/}"
tmp
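If the prefix to hide is always your home directory, a sketch that avoids hardcoding /home/foo is to strip "$HOME"/ with the same ${var#pattern} expansion. `strip_home` is a hypothetical helper name; it accepts the home directory as an optional second argument only so it can be tried with the paths from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a path relative to a home directory.
# $1 = path (defaults to $PWD), $2 = home directory (defaults to $HOME).
strip_home() {
  local path=${1:-$PWD} home=${2:-$HOME}
  # ${path#prefix} removes the shortest leading match of prefix.
  # Quoting "$home" makes it match literally rather than as a glob.
  printf '%s\n' "${path#"$home"/}"
}

strip_home /home/hrq/my-project/ /home/hrq   # prints: my-project/
strip_home /home/hrq/a/b /home/hrq           # prints: a/b
basename /home/hrq/my-project/               # prints: my-project
```

A path outside the home directory is printed unchanged, because the prefix simply does not match.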
Use grep -Po 'hrq/\K.*' (this needs a grep with PCRE support, such as GNU grep), for example:
grep -Po 'hrq/\K.*' <<< '/home/hrq/my-project/'
my-project/
Here, grep uses the following options and regex features:
-P : Use Perl-compatible regexes.
-o : Print only the matches (one match per line), not the entire lines.
\K : Tell the regex engine to "keep" everything matched before the \K as a requirement, but to leave it out of the reported match. In other words, the part of the regex before \K must match, but it is not printed.
SEE ALSO:
grep manual
perlre - Perl regular expressions
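The \K trick depends on grep -P, which the stock BSD grep on macOS does not have. A sketch of two alternatives: a PCRE lookbehind, which is another common way to require text before the match without printing it, and a sed capture group for systems without -P. The function names are hypothetical. Note that the greedy .* in the sed version keeps the text after the last hrq/, while grep -o reports from the first one; for a path containing hrq/ only once, both agree.

```shell
#!/usr/bin/env bash
p='/home/hrq/my-project/'

# PCRE lookbehind: require "hrq/" before the match without including it.
# Needs grep -P (GNU grep built with PCRE).
after_hrq_pcre() {
  grep -Po '(?<=hrq/).*' <<< "$1"
}

# No -P available: capture everything after "hrq/" with sed (POSIX BRE).
# -n plus the p flag prints only lines where the substitution happened.
after_hrq_sed() {
  sed -n 's|.*hrq/\(.*\)|\1|p' <<< "$1"
}

after_hrq_sed "$p"   # prints: my-project/
```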

Use awk to parse and modify every CSV field

I need to parse and modify each field of a CSV header line to build a dynamic SQLite CREATE TABLE statement. Below is what works from the command line, with the expected output:
echo ",header1,header2,header3"| awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}'
,header1 text ,header2 text ,header3 text
However, it breaks when run from within a bash script. I got it to work by writing the output to a file, like this:
echo $optionalHeaders | awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}' > optionalHeaders.txt
This sucks! There are plenty of examples that show how to parse or modify a specific Nth field, but this problem requires every field to be modified. Is there a more concise and elegant awk one-liner that stores its output in a variable rather than writing to a file?
sed is usually the right tool for simple substitutions on a single line. Take your pick:
$ echo ",header1,header2,header3" | sed 's/[^,][^,]*/& text/g'
,header1 text,header2 text,header3 text
$ echo ",header1,header2,header3" | sed -r 's/[^,]+/& text/g'
,header1 text,header2 text,header3 text
The last one above uses GNU sed's -r option to get EREs instead of BREs (BSD/macOS sed spells it -E, which newer GNU sed also accepts). You can do the same in awk using gsub() if you prefer:
$ echo ",header1,header2,header3" | awk '{gsub(/[^,]+/,"& text")}1'
,header1 text,header2 text,header3 text
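Since the goal is a dynamic CREATE TABLE statement, here is a sketch of capturing the gsub() result in a variable and dropping it into the SQL. The table name demo and the leading id integer column are hypothetical, added only to show where the generated column list would go.

```shell
#!/usr/bin/env bash
# Hypothetical value; in the question it comes from $optionalHeaders.
optionalHeaders=",header1,header2,header3"

# $(...) captures awk's output in a variable, so no temporary file is needed.
# Quoting "$optionalHeaders" passes the value through unchanged.
columns=$(printf '%s\n' "$optionalHeaders" | awk '{gsub(/[^,]+/, "& text")}1')

# Hypothetical table and first column, only to show the result in context.
sql="CREATE TABLE demo (id integer${columns});"
printf '%s\n' "$sql"
# prints: CREATE TABLE demo (id integer,header1 text,header2 text,header3 text);
```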
I found the problem, and it was me... I forgot to echo the contents of the variable into the awk command. Brianadams' comment was so simple that it forced me to take another look at my code and find the problem! Thanks!
I'm OK with considering this resolved, but if anyone wants to propose a more concise and elegant awk one-liner, that would be cool.
You can try the following:
#! /bin/bash
header=",header1,header2,header3"
newhead=$(awk 'BEGIN {FS=OFS=","}; {for(i=2;i<=NF;i++) $i=$i" text"}1' <<<"$header")
echo "$newhead"
with output:
,header1 text,header2 text,header3 text
Instead of modifying fields one by one, another option is with a simple substitution:
echo ",header1,header2,header3" | awk '{gsub(/[^,]+/, "& text", $0); print}'
That is, replace each run of non-comma characters with itself followed by " text".
Another alternative would be replacing the commas, but due to the irregularities of your header line (the first comma must be left alone, and there is no comma at the end), that's a bit less easy:
echo ",header1,header2,header3" | awk '{gsub(/,/, " text,", $0); sub(/^ text/, "", $0); print $0 " text"}'
By the way, here are rough sed equivalents of the two commands:
echo ",header1,header2,header3" | sed -e 's/[^,]\{1,\}/& text/g'
echo ",header1,header2,header3" | sed -e 's/\(.\),/\1 text,/g' -e 's/$/ text/'
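The field loop and the gsub() substitution give the same result for this header, but they diverge if a header line ever contains an empty field. A small sketch with a hypothetical header to help choose; for generating SQL, neither result is a valid column for the empty field, so rejecting such input up front may be the safer design.

```shell
#!/usr/bin/env bash
# Hypothetical header with an empty field between header1 and header2.
h=",header1,,header2"

# Field loop: every field from $2 on gets " text", even an empty one.
loop=$(awk 'BEGIN {FS=OFS=","} {for(i=2;i<=NF;i++) $i=$i" text"}1' <<< "$h")

# Substitution: only non-empty runs of non-comma characters are touched.
subst=$(awk '{gsub(/[^,]+/, "& text")}1' <<< "$h")

printf '%s\n' "$loop"    # prints: ,header1 text, text,header2 text
printf '%s\n' "$subst"   # prints: ,header1 text,,header2 text
```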

Bash: grep pattern to parse command output

I'm trying to parse the output of a command-line tool. It writes XML directly to STDOUT and I want to parse it.
The tool outputs a full XML document like the following:
My goal is to parse that output and extract only the string inside the <date> tag. But since the document might contain other <date> tags, it must consider only the <date> that follows <key>SULastCheckTime</key> (which is messy because of the newlines/spaces in between).
Currently I'm solving this situation with the following command:
tool... | grep -A1 '<key>SULastCheckTime</key>' | grep 'string.$' | sed -e 's,.*<date>\([^<]*\)</date>.*,\1,g'
It works, but as you can see it's very messy, and I can't come up with anything better. Can you help me improve it?
Thank you!
PS: Since I'm doing this on OS X, I don't have the newer GNU grep options. By the way, my bash version is 3.2.48(1). And... I can't afford to install other tools to parse XML in a better way.
Maybe something like this?
$ cat foo.input
foo
foo
<key>some key</key>
<date>some date</date>
bar
bar
<key>SULastCheckTime</key>
<date>2013-08-10T00:27:40Z</date>
quux
quux
 
$ awk '/<key>SULastCheckTime<\/key>/ { toggle=1 } toggle && /<date>.*<\/date>/ { gsub(/<[^>]*>/, "", $1); print; exit }' foo.input
2013-08-10T00:27:40Z
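Since installing XML tools is ruled out, here is a sketch of the same extraction with a single sed, which OS X already ships. last_check_time is a hypothetical function name, and like the awk answer it assumes the <date> element sits on a single line (any indentation is fine).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the tool's XML on stdin and print the date
# that follows <key>SULastCheckTime</key>.
last_check_time() {
  # The address range runs from the key line to the next line with </date>;
  # inside it, keep only the text between <date> and </date> and print it.
  sed -n '/<key>SULastCheckTime<\/key>/,/<\/date>/ s/.*<date>\([^<]*\)<\/date>.*/\1/p'
}

# Usage, with the elided command from the question:
#   tool... | last_check_time
```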

grep is unable to find all matches of the pattern "\[\[\[\["

I am having problems with using grep along with a pipe. The scenario is as follows:
I am running a python script that prints debug messages to the screen (using print). I use ./prog | grep "\[\[\[\[" to catch the strings containing "[[[[". It returns some matching results but not others (another observation: the results found by grep come before the results not found by grep in the file). I have run ./prog without the pipe and grep, and it outputs all the strings containing the "[[[[" pattern.
The problem is that the left square bracket is a special character in regular expressions. "grep" is not just a string matcher. Regular expressions are an involved language that let you describe patterns of text. Grep is trying to interpret [[[[ as a regular expression, not just a string.
As your question subject suggests, you can usually escape special characters with a backslash. So the following might work:
./prog | grep '\[\[\[\['
You can also "escape" square brackets by putting them inside square brackets. Thus, [[][[][[][[], or [[]{4} with egrep/grep -E (in a basic regex, the interval is spelled [[]\{4\}).
You also need to determine whether your program, ./prog, is sending its output to "standard output" or "standard error". You can send stderr through the pipe along with stdout with:
./prog 2>&1 | egrep '[[]{4}'
UPDATE:
[ghoti@pc ~]$ printf '[[[[\n[[[\n[[[[\n[[[[[\n[[\n' | grep '\[\[\[\['
[[[[
[[[[
[[[[[
[ghoti@pc ~]$ printf '[[[[\n[[[\n[[[[\n[[[[[\n[[\n' | egrep '[[]{4}'
[[[[
[[[[
[[[[[
[ghoti@pc ~]$
Obviously, my results do not match yours. If you can provide more details as to the data you're processing, it will be helpful in trying to duplicate your results.
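For a literal string like [[[[, grep -F (fixed strings, a POSIX option) avoids regex escaping altogether. A sketch comparing it with the two regex spellings on the sample input from the update; each grep should print the three lines that contain at least four brackets.

```shell
#!/usr/bin/env bash
# The sample input from the update.
sample() {
  printf '[[[[\n[[[\n[[[[\n[[[[[\n[[\n'
}

# Escaped brackets in a basic regex.
sample | grep '\[\[\[\['
# Bracket expression plus an interval; \{4\} is the basic-regex spelling.
sample | grep '[[]\{4\}'
# Fixed-string search: -F turns off regex interpretation entirely.
sample | grep -F '[[[['
```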
Error messages are usually sent to stderr, not stdout; your pipe is filtering stdout. (Your "another observation" hints at this.) You can redirect stderr along with stdout to the pipe:
./prog 2>&1 | grep '\[\[\[\['
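A quick way to check whether the missing lines are going to stderr: a sketch with a hypothetical prog function that writes one matching line to each stream. In the first pipeline only the stdout line is filtered (the stderr line still shows up on the terminal, but straight from prog); in the second, both lines go through grep.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./prog: one debug line on stdout, one on stderr.
prog() {
  printf '[[[[ on stdout\n'
  printf '[[[[ on stderr\n' >&2
}

# Only stdout enters the pipe; the stderr line skips grep entirely.
prog | grep -F '[[[['

# 2>&1 points prog's stderr at wherever its stdout goes, here the pipe.
prog 2>&1 | grep -F '[[[['
```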

Can grep show only words that match search pattern?

Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like:
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?
Try grep -o:
grep -oh "\w*th\w*" *
Edit: pattern taken from Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
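A self-contained sketch that recreates the three files from the question in a temporary directory and runs the -oh search on them. It uses [[:alnum:]_], the POSIX class spelling of what \w matches, so it does not depend on \w support.

```shell
#!/usr/bin/env bash
# Recreate the three sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'the cat sat on the mat\n' > "$dir/some-text-file"
printf 'the quick brown fox\n' > "$dir/some-other-text-file"
printf 'i hope this explains it thoroughly\n' > "$dir/yet-another-text-file"

# -o prints each match on its own line, -h drops the "file:" prefix.
words=$(grep -oh '[[:alnum:]_]*th[[:alnum:]_]*' "$dir"/*)
printf '%s\n' "$words"   # the, the, the, this, thoroughly (one per line)

rm -r "$dir"
```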
Cross-distribution-safe answer (including Windows MinGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you're using an older version of grep (like 2.4.2) that does not include the -o option, use the above. Otherwise, use the simpler-to-maintain version below.
Linux cross-distribution-safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh outputs only the regular-expression matches in the file content (not the filename), just as you would expect a regular expression to work in vim etc. Which word or regular expression you search for is up to you, as long as you stick to POSIX rather than Perl syntax (see below).
More from the manual for grep
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by
`[[:<:]]' and `[[:>:]]';
The reason why the original answer does not work for everyone
The meaning of \w varies from platform to platform, as it's an extended "Perl" syntax. As such, grep installations limited to POSIX character classes need [[:alpha:]] (letters only) or [[:alnum:]_] (the closer counterpart of Perl's \w) instead. See the Wikipedia page on regular expressions for more.
Ultimately, the POSIX answer above is a lot more reliable for grep, regardless of platform.
As for grep versions without the -o option: the first grep outputs the relevant lines, tr turns the spaces into newlines, and the final grep keeps only the matching words.
(PS: I know most platforms support \w by now, but there are always some that lag behind.)
Credit for the "-o" workaround goes to @AdamRosenfield's answer.
It's simpler than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)
egrep -iwo 'th.[a-z]*' filename.txt ### (Case Insensitive)
Where,
egrep : grep with extended regular expressions (equivalent to grep -E).
-w : Match only whole words instead of substrings.
-o : Display only the matched part instead of the whole line.
-i : Ignore case.
You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th
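Splitting only on spaces keeps punctuation attached (e.g. "thoroughly!") and leaves tab-separated words joined. A sketch of a variant that treats every run of non-word characters as a separator, using tr's standard -c (complement) and -s (squeeze) options; words_with_th is a hypothetical name.

```shell
#!/usr/bin/env bash
# Replace every run of characters that are not letters, digits or "_"
# with a single newline, then keep the words that contain "th".
words_with_th() {
  tr -cs '[:alnum:]_' '\n' | grep th
}

printf 'the cat, this.\tthoroughly!\n' | words_with_th
# prints: the, this and thoroughly, one per line
# For the question's files: cat * | words_with_th
```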
Just awk; no combination of tools needed:
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
grep with -o (only matching) and -P (Perl regexes):
grep -o -P 'th.*? ' filename
I was unsatisfied with awk's hard-to-remember syntax, but I liked the idea of using one utility to do this.
It seems like ack (or ack-grep if you use Ubuntu) can do this easily:
# ack-grep -ho "\bth.*?\b" *
the
the
the
this
thoroughly
If you omit the -h flag you get:
# ack-grep -o "\bth.*?\b" *
some-other-text-file
1:the
some-text-file
1:the
the
yet-another-text-file
1:this
thoroughly
As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:
# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file
1, 5, 12/27/2010
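Without ack, the same capture-group output can be sketched with sed -E (supported by both GNU and BSD sed) and back-references. The pattern is a translation of the ack one, with \d spelled [0-9] because sed's regexes have no \d.

```shell
#!/usr/bin/env bash
line="bug: 1, id: 5, time: 12/27/2010"

# -n with the p flag prints only lines where the pattern matched;
# \1, \2, \3 play the role of ack's $1, $2, $3.
fields=$(printf '%s\n' "$line" |
  sed -nE 's/bug: ([0-9]*), id: ([0-9]*), time: (.*)/\1, \2, \3/p')
printf '%s\n' "$fields"   # prints: 1, 5, 12/27/2010
```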
cat *-text-file | grep -Eio "th[a-z]+"
You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.
From Wikipedia:
cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple
grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
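-w treats - as a word boundary, which is why apple-, apple-fruit and fruit-apple match. If whitespace is the only separator you care about, here is a sketch of two alternatives on the same fruit list: -x for whole-line matches, and an ERE that requires start/end of line or whitespace around the word.

```shell
#!/usr/bin/env bash
fruits() {
  printf '%s\n' apple apples pineapple apple- apple-fruit fruit-apple
}

# -x: the whole line must match the pattern.
fruits | grep -x apple
# The word delimited only by whitespace or line boundaries, anywhere in a line.
fruits | grep -E '(^|[[:space:]])apple([[:space:]]|$)'
```

Both commands print only apple for this list; the second also matches lines like "a green apple here".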
I had a similar problem: I wanted grep to output only the matched pattern.
In the end I used egrep with the -o option (the same regex with grep -e or -G didn't give me the same result as egrep).
So I think the solution could be something like this (I'm NOT a regex master):
egrep -o "the*|this{1}|thoroughly{1}" filename
To search for all words that start with "icon-", the following command works perfectly. I am using ack here, which is similar to grep but with better options and nice formatting.
ack -oh --type=html "\w*icon-\w*" | sort | uniq
You could pipe your grep output into Perl like this:
grep "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'
grep --color -o -E "Begin.{0,}?End" file.txt
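? : Intended to match as few characters as possible before End (non-greedy). POSIX EREs do not define non-greedy quantifiers, so whether this works depends on the grep implementation; GNU grep needs -P for that.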
Tested on macos terminal
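A sketch of the greedy vs. non-greedy difference on a line with two Begin...End pairs. The -P variant assumes a GNU grep built with PCRE, so it is guarded and skipped when -P is unavailable.

```shell
#!/usr/bin/env bash
line='Begin one End, Begin two End'

# Greedy: .* runs to the last End, so the whole span is one match.
printf '%s\n' "$line" | grep -oE 'Begin.*End'

# Non-greedy (PCRE, needs grep -P): each match stops at the first End.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  printf '%s\n' "$line" | grep -oP 'Begin.*?End'
fi
```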
$ grep -w
Excerpt from grep man page:
-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character.
ripgrep
Here is an example using ripgrep:
rg -o "(\w+)?th(\w+)?"
It matches all words containing th.
