Using globs in GNU grep's path argument


BSD (Mac) grep allows for this command:
grep -n "FIXME" **/*.rb
But GNU grep forces me to specify at least a folder to start from:
grep -n "FIXME" {lib,spec}/**/*.rb
Is there a way to get this to behave like it does in BSD grep?

Switch to ack. It searches recursively by default and comes with built-in file-type definitions for many languages, available as flags.
For instance, this command:
ack FIXME --ruby
searches the current directory recursively for anything that may be a Ruby file. It works the same on Mac and Linux.
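If you'd rather stay with GNU grep, note that **/*.rb is expanded by the shell before grep ever runs. The difference you saw is most likely the shell rather than BSD grep: zsh expands **/ recursively by default, while bash only does so when its globstar option is enabled (otherwise ** behaves like a single *). The sketch below shows two ways to get the recursive search with GNU grep and bash 4 or later; the function names are made up for illustration.

```shell
#!/usr/bin/env bash

# Option 1: let GNU grep recurse on its own. -r walks the directory tree
# and --include keeps only files whose names match the glob, so the
# shell never expands anything.
fixme_grep_recursive() {
  grep -rn --include='*.rb' 'FIXME' .
}

# Option 2: turn on bash's globstar so the shell expands **/*.rb through
# all subdirectories (as zsh does). The function body runs in a subshell
# ( ... ), so the shopt changes don't leak into the caller's shell.
fixme_globstar() (
  shopt -s globstar nullglob
  files=(**/*.rb)
  # nullglob: no matching files gives an empty array, not a literal pattern
  [ "${#files[@]}" -gt 0 ] || exit 1
  # -H prints the file name even if the glob matched only one file
  grep -Hn 'FIXME' "${files[@]}"
)
```

Run either function from the project root. Option 1 is usually the simpler choice: it needs no shell options, and it can't hit "Argument list too long" the way a glob over a very large tree can.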

Related

Grep with RegEx Inside a Docker Container?

I'm dipping my toes into Bash scripting for the first time (and I'm not the most experienced person with Linux either), and I'm trying to read the version from version.php inside a container at:
/config/www/nextcloud/version.php
To do so, I run:
docker exec -it 1c8c05daba19 grep -eo "(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?" /config/www/nextcloud/version.php
This uses a semantic versioning RegEx pattern (I know, a bit overkill, but it works for now) to read and extract the version from the line:
$OC_VersionString = '20.0.1';
However, when I run the command, it tells me No such file or directory (I've confirmed the file does exist at that path inside the container), and then proceeds to spit out the entire contents of the file it just said doesn't exist?
grep: (0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?: No such file or directory
/config/www/nextcloud/version.php:$OC_Version = array(20,0,1,1);
/config/www/nextcloud/version.php:$OC_VersionString = '20.0.1';
/config/www/nextcloud/version.php:$OC_Edition = '';
/config/www/nextcloud/version.php:$OC_VersionCanBeUpgradedFrom = array (
/config/www/nextcloud/version.php: 'nextcloud' =>
/config/www/nextcloud/version.php: 'owncloud' =>
/config/www/nextcloud/version.php:$vendor = 'nextcloud';
Anyone able to spot the problem?
Update 1:
For the sake of clarity, I'm trying to run this from a bash script. I just want to fetch the version number from that file, to use it in other areas of the script.
Update 2:
Responding to the comments: I tried logging in to the container first and then running the grep, and I still get the same result. When I cat that file, it shows its contents with no problem.

Many containers don't have the GNU versions of Unix tools and their various extensions. It's popular to base containers on Alpine Linux, which in turn uses a very lightweight single-binary tool called BusyBox to provide the base tools. Those tend to have the set of options required in the POSIX specs, and no more.
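POSIX grep(1) in particular doesn't have an -o option. On top of that, -eo is parsed as the option -e followed by its argument o, so the command you're running is effectively:
grep -e o "(regexps are write-only)" /config/www/nextcloud/version.php
Here -e o specifies "o" as the regexp to match, "(regexps are write-only)" (standing in for your long pattern) is taken as a filename, and /config/www/nextcloud/version.php is a second filename.
Notice that the grep output in the interactive shell only contains lines with the letter "o"; a line without one (for example, a line containing just array) is missing.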
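POSIX grep doesn't have an equivalent for GNU grep's -o option:
Print only the matched (non-empty) parts of matching lines, with each such part on a separate output line. Output lines use the same delimiters as input....
but it's easy to do that with sed(1) instead. Ask it to match some stuff, the regexp in question, and some more stuff, and replace the whole line with the matched group. Use -n together with the p flag so that only lines where the substitution succeeded are printed; otherwise sed also prints every non-matching line unchanged:
sed -n 's/.*\(any regexp here\).*/\1/p' input-file
(POSIX sed only accepts basic regular expressions, so you'll have to escape more of the parentheses, and PCRE syntax such as \d or (?:...) isn't available at all. Also keep in mind that the leading .* is greedy and will swallow as much of the line as it can before the group starts.)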
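Applied to the version line from the question, the sed approach might look like the sketch below. It assumes the line starts with $OC_VersionString and that the version is the first run of digits on it. For comparison it also shows a grep form with the options written separately (-o and -E rather than -eo), which only works where grep supports -o (GNU grep does). Both function names are hypothetical.

```shell
#!/usr/bin/env bash

# POSIX sed: -n suppresses the default printing of every line, and the
# p flag prints only lines where the substitution succeeded. [$] matches
# a literal dollar sign. [^0-9]* skips the " = '" part, stopping at the
# first digit, so nothing in front of the group can swallow the start of
# the version.
version_sed() {
  sed -n 's/^[$]OC_VersionString[^0-9]*\([0-9][0-9.]*\).*/\1/p' "$1"
}

# grep with -o (GNU grep, and other greps that support it): pick the
# line first, then print only the dotted version. -o and -E are plain
# flags; writing -eo instead would make "o" the pattern.
version_grep() {
  grep -E '^[$]OC_VersionString' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+'
}
```

In a script, the result could be captured straight from the container, for example:
version=$(docker exec 1c8c05daba19 sed -n 's/^[$]OC_VersionString[^0-9]*\([0-9][0-9.]*\).*/\1/p' /config/www/nextcloud/version.php)
Leaving out -it there is deliberate: with a pseudo-TTY allocated (-t), output lines can end in a carriage return, which would then end up in the variable.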

Well, for any future readers: I had no luck getting grep to do it (I'm sure it was my fault somehow and not grep's), but thanks to the help in this post I was able to use awk instead, like so:
docker exec -it 1c8c05daba19 awk '/^\$OC_VersionString/ && match($0,/\047[0-9]+\.[0-9]+\.[0-9]+\047/){print substr($0,RSTART+1,RLENGTH-2)}' /config/www/nextcloud/version.php
That ended up doing exactly what I needed:
It runs the command inside the Docker container.
Scans and returns just the version number from the line I am looking for at: /config/www/nextcloud/version.php inside the container.
Exits stage left from the container with just the info I needed.
I can get right back to eating my Hot Cheetos.

Wildcards in erlc's -I option?

Is it possible to use wildcards in the Erlang compiler's -I option?
For example, I want to do something like this:
erlc -I deps/*/include -I deps src/foo.erl
I know that other solutions exist (like using rebar or make) but in this case, I am looking explicitly at erlc.

On Linux (and other Unix-like systems), wildcards are never resolved by the invoked program.
The shell you use (e.g. bash) resolves all wildcards.
So erlc won't see the asterisk at all.
(If you read the documentation of find(1), you may find that my previous explanation is somewhat oversimplified.)
If you don't want to use an extra tool (I'd recommend looking at rebar or make, though), you could try:
erlc $(find deps -name include -exec echo '-I {}' ';') -I deps src/foo.erl
(Weak substitute, I know.)
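There is a second catch with the command from the question: even after the shell expands deps/*/include, the -I applies only to the first directory, and the remaining ones are passed as plain arguments (for example -I deps/a/include deps/b/include), which erlc will then treat as input files. The find one-liner above also relies on find replacing {} inside the larger argument '-I {}', which GNU find does but POSIX leaves implementation-defined, and it breaks on paths containing spaces. A bash sketch that avoids both problems builds one -I pair per directory in an array; run_with_dep_includes is a hypothetical helper, not part of erlc.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command with one "-I <dir>" pair inserted
# right after the command name for every deps/*/include directory.
# Quoting "$d" keeps directory names with spaces intact.
run_with_dep_includes() {
  local -a flags=()
  local d
  for d in deps/*/include; do
    # If nothing matches, the glob stays literal, so skip non-directories.
    [ -d "$d" ] || continue
    flags+=(-I "$d")
  done
  "$1" "${flags[@]}" "${@:2}"
}
```

For example, run_with_dep_includes erlc -I deps src/foo.erl runs erlc -I deps/a/include -I deps/b/include -I deps src/foo.erl when deps/a and deps/b both contain an include directory.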

grep match with string1 OR string2

I want to grep 2 patterns in a file on Solaris UNIX.
That is grep 'pattern1 OR pattern2' filename.
The following command does NOT work:
grep 'pattern1\|pattern2' filename
What is wrong with this command?
NOTE: I am on Solaris

What operating system are you on?
It will work on systems with GNU grep, but on BSD, Solaris, etc., \| is not supported.
Try egrep or grep -E, e.g.
egrep 'pattern1|pattern2'
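For a portable version, the sketch below shows two forms that POSIX grep specifies: an extended regular expression with | alternation (-E), and one -e option per pattern. Neither relies on the GNU \| extension. On Solaris, /usr/bin/grep may not accept these options, while the POSIX-conforming grep in /usr/xpg4/bin does. The function names are hypothetical.

```shell
#!/usr/bin/env bash

# Usage: match_either_ere PATTERN1 PATTERN2 FILE
# Joins the two patterns into one ERE with | (both are treated as EREs).
match_either_ere() {
  grep -E -- "$1|$2" "$3"
}

# Usage: match_either_multi_e PATTERN1 PATTERN2 FILE
# Each -e adds a pattern; a line matching any of them is printed.
match_either_multi_e() {
  grep -e "$1" -e "$2" "$3"
}
```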

If you want POSIX functionality (i.e. Linux-like behavior), you can put the POSIX.2-compatible binaries at the beginning of your PATH:
$ echo $PATH
/usr/xpg4/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:[...]
There is also /usr/xpg6/bin, which is POSIX.1-2001 compatible.
/usr/bin: SVID/XPG3
/usr/xpg4/bin: POSIX.2/POSIX.2a/SUS/SUSv2/XPG4
/usr/xpg6/bin: POSIX.1-2001/SUSv3

That command works fine for me. Please add additional information such as your platform and the exact regular expression and file contents you're using (minimized to the smallest example that still reproduces the issue). (I would add a comment to your post but don't have enough reputation.)

That should be correct. Make sure you add (or leave out) spaces as appropriate, i.e. "pattern1\|pattern2" vs "pattern1\| pattern2".
Are you sure you aren't just having problems with case or something? Try the -i switch.

That depends entirely on what pattern1 and pattern2 are. If they're just words, it should work, otherwise you'll need:
grep '\(pattern1\)\|\(pattern2\)'

An arcane method using fgrep (i.e. fixed strings) that works on Solaris 10...
Provide a pattern-list, with each pattern separated by a NEWLINE, yet quoted so as to be interpreted by the shell as one word:-
fgrep 'pattern1
pattern2' filename
This method also works for grep, fgrep and egrep in /usr/xpg4/bin, although the pipe-delimited ERE in any egrep is sometimes the least fussy.
You can insert arbitrary newlines in a string if your shell allows history-editing, eg: in bash issue C-v C-j in either emacs mode or vi-command mode.
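The same newline-separated pattern list can also be built without typing a literal newline, for instance with printf inside a command substitution (only trailing newlines are stripped, so the one in the middle is kept). A sketch, assuming grep -F is available (it is specified by POSIX); the function name is hypothetical.

```shell
#!/usr/bin/env bash

# Usage: match_either_fixed STRING1 STRING2 FILE
# grep treats a pattern list containing newlines as several patterns;
# -F makes each one a fixed string, which is what fgrep does.
match_either_fixed() {
  grep -F -- "$(printf '%s\n%s' "$1" "$2")" "$3"
}
```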

egrep -e "string1|string2" works for me in SunOS 5.9 (Solaris)

To understand the practical use of Grep's option -H in different situations

This question is based on this answer.
Why do you get the same output from the both commands?
Command A
$ sudo grep muel * /tmp
masi:muel
Command B
$ sudo grep -H muel * /tmp
masi:muel
Rob's comment suggests to me that Command A should not give me masi:, but only muel.
In short, what is the practical purpose of -H?

Grep will list the filenames by default if more than one filename is given. The -H option makes it do that even if only one filename is given. In both your examples, more than one filename is given.
Here's a better example:
$ grep Richie notes.txt
Richie wears glasses.
$ grep -H Richie notes.txt
notes.txt:Richie wears glasses.
It's more useful when you're giving it a wildcard for an unknown number of files, and you always want the filenames printed even if the wildcard only matches one file.

If you grep a single file, -H makes a difference:
$ grep muel masi
muel
$ grep -H muel masi
masi:muel
This could be significant in various scripting contexts. For example, a script (or a non-trivial piped series of commands) might not be aware of how many files it's actually dealing with: one, or many.
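For example, a script that later splits each result on the first colon needs the file name to be there whether a glob matched one file or many. The sketch below (function names are hypothetical) shows -H, the older trick of adding /dev/null as an extra file (it never matches, but it makes grep see more than one file, so file names are printed even by a grep without -H), and -h, which does the opposite.

```shell
#!/usr/bin/env bash

# Always prefix matches with the file name, however many files "$@" holds.
grep_with_names() {
  local pattern=$1; shift
  grep -H -- "$pattern" "$@"
}

# Same effect without -H: the extra /dev/null makes grep see at least two
# files, so it prints file names.
grep_with_names_portable() {
  local pattern=$1; shift
  grep -- "$pattern" "$@" /dev/null
}

# The opposite: never print file names, even when searching many files.
grep_without_names() {
  local pattern=$1; shift
  grep -h -- "$pattern" "$@"
}
```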

When you grep from multiple files, by default it shows the name of the file where the match was found. If you specify -H, the file name will always be shown, even if you grep from a single file. You can specify -h to never show the file name.

Emacs has a grep interface (M-x grep, M-x lgrep, M-x rgrep). If you ask Emacs to search for foo in the current directory, Emacs calls grep, processes the grep output, and presents you with the results as clickable links, just like Google.
What Emacs does is pass two options to grep, -n (show line numbers) and -H (show filenames even if only one file is searched; the point is consistency), and then turn the output into clickable links.
In general, consistency is what makes a good API, but consistency conflicts with DWIM ("do what I mean").
When you use grep directly, you want DWIM, so you don't pass -H.

An encoding-savvy grep replacement?

I am frustrated that grep fails to find a word like "hello" in my UTF-16 documents.
Can anyone recommend a version of grep that attempts to guess the file encoding and then properly handle it?

ack as perl-based grep replacement?
You'll definitely want to check out ack.
It supports Unicode encodings, and is basically grep, but better.
try a matching Unicode locale with grep
If you are under Linux, Unix, etc., you may want to change your LANG environment variable to an encoding that matches your documents.
Check your locale first. Here is what mine is set to by default on my MacBook Pro:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
say, under bash:
$ LANG="foo" grep 'gotta be found now' file.name
something a little more permanent (be careful with this):
$ export LANG="foo"
$ grep 'bar' mitz.vah

Perl has a way better regex syntax than grep (much more powerful), it has UTF8 and UTF16 support, but I'm not sure how good it is at guessing the encoding... if you tell it which encoding to use, though, it can read these files without any issues and run regexes over them. You'll have to write yourself a tiny Perl program for that (your own micro-grep implementation in Perl so to say), but that isn't too hard. Perl exists for all major operating systems.

I am frustrated that grep fails to find a word like "hello" in my UTF-16 documents.
Can anyone recommend a version of grep that attempts to guess the file encoding and then properly handle it?
ugrep, which is free, BSD-3-licensed open source, supports all UTF encodings and claims to be a true drop-in replacement for grep by supporting the GNU/BSD grep command-line options. Likewise, ripgrep, ack, and the silver searcher (ag) also support UTF encodings, but they are not drop-in replacements for grep since their behavior and options differ from grep's.
You could use the iconv filter utility in combination with grep to convert UTF-16 files to UTF-8, but you will have to explicitly specify the input and output encodings, something like:
iconv -f utf-16 -t utf8 < file.txt | grep PATTERN
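When several UTF-16 files are involved, the file names are lost once the text goes through a pipe. A sketch that converts each file separately and uses GNU grep's --label option (which sets the name printed for standard input) together with -H to keep them. It assumes GNU grep and that every file is UTF-16 that iconv can decode (ideally with a byte-order mark); grep_utf16 is a hypothetical name.

```shell
#!/usr/bin/env bash

# Hypothetical helper. Usage: grep_utf16 PATTERN FILE...
# Converts each UTF-16 file to UTF-8 and greps the result. --label (GNU
# grep) names standard input in the output; -H makes grep print that name.
grep_utf16() {
  local pattern=$1 f
  shift
  for f in "$@"; do
    iconv -f UTF-16 -t UTF-8 < "$f" | grep -H --label="$f" -- "$pattern"
  done
}
```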
