I am trying to read and understand the man page, yet every day I find more syntax that looks inconsistent, and I would like some clarification as to whether I am misunderstanding something.
Within the man page, it specifies the syntax for grep is grep [OPTIONS] [-e PATTERN]... [-f FILE]... [FILE...]
I got a working example that recursively searches all files within a directory for a keyword.
grep -rnw . -e 'memes'
Now this example works, but I find it very inconsistent with the man page. The directory (which the man page writes as [FILE...], also covering the case where FILE is a directory) appears last in the synopsis, yet in this example it is located after [OPTIONS] and before [-e PATTERN].... Why is this allowed? It does not follow the specified rule for using this command.
The lines in the SYNOPSIS section of a manpage are not to be understood as strict regular expressions, but as a brief description of the syntax of a utility's arguments.
Depending on the particular application, the parser might be more or less flexible in how it accepts its options. After all, each program can implement whatever grammar it likes for its arguments. Therefore, some might allow options at the beginning, at the end, or even in between files (typically with ways to handle the ambiguity that may arise, e.g. reading from the standard input with -, filenames starting with -...).
Now, of course, there are some ways to do it that are common. For instance, POSIX.1-2017 12.1 Utility Argument Syntax says:
This section describes the argument syntax of the standard utilities and introduces terminology used throughout POSIX.1-2017 for describing the arguments processed by the utilities.
In your particular case, your implementation of grep (probably GNU's grep) allows options to be passed in between the file list, as you have discovered.
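As a quick check of this, the two invocations below differ only in where the options sit relative to the directory operand; with GNU grep they behave identically. (The /tmp/grepdemo directory and its contents are made up for the demonstration.)

```shell
# Hypothetical demo: GNU grep accepts the same options before or
# after the FILE operand, so both commands below are equivalent.
mkdir -p /tmp/grepdemo && echo 'memes' > /tmp/grepdemo/a.txt

grep -rnw /tmp/grepdemo -e 'memes'   # options first, then directory
grep -rnw -e 'memes' /tmp/grepdemo   # strict SYNOPSIS order
```

Note that this flexibility is an extension: with `POSIXLY_CORRECT` set (or with a strictly POSIX grep), option processing stops at the first non-option argument.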
For more information, see:
https://unix.stackexchange.com/questions/17833/understand-synopsis-in-manpage
Are there standards for Linux command line switches and arguments?
https://www.gnu.org/software/libc/manual/html_node/Getopt-Long-Options.html
You can also leverage this flexibility:
grep 'string' * -lR
Some command options are with one dash e.g. ruby -c (check syntax) and ruby --copyright (print copyright). Is there any pattern to this?
These are known as short and long options. Which name/format a developer uses for the options of their program is entirely up to them.
However, there are some widespread conventions. Like -v/--version for printing version number, -h/--help for printing usage instructions, etc.
Sadly, most command-line tools on OS X seem not to conform to -v/-h.
Good CLI (command-line interface) design dictates that options of a program that are most useful should have two formats, short and long. You use short format in your everyday life (because it's faster to type).
ps aux | grep ruby
Long ones are for scripts that you write and rarely touch (they're easier to read and understand).
mongod --logpath /path/to/logs --dbpath /path/to/db --fork --smallfiles
Many less-used options may have only the long version (because, you know, there are only 26 letters in the Latin alphabet).
On many rails commands there is a pattern. One dash is an abbreviation for a two dash option, e.g. rspec -o FILE is a synonym for rspec --out FILE.
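The same short/long pattern holds for grep itself: -c and --count are synonyms, so the two commands below produce the same result. (The sample file is made up for the demonstration.)

```shell
# Sketch: a short option and its long synonym behave identically.
printf 'alpha\nbeta\n' > /tmp/opts_demo.txt
grep -c 'a' /tmp/opts_demo.txt        # short form -> 2
grep --count 'a' /tmp/opts_demo.txt   # long form  -> 2
```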
How can we generate important keywords for any random article? Does any existing algorithm or tool exist to get the important keywords from a given text?
If you are using Linux you can just use the grep command to get the lines that contain important keywords.
E.g.: $ grep key_word file_name.txt
The above command will display only the lines that contain the specified key_word.
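Going slightly beyond line filtering, a very rough frequency count can be assembled from standard tools; this is not real keyword extraction (no stop-word removal or weighting), and the input file here is made up:

```shell
# Rough word-frequency sketch: split the text into words, lowercase
# them, count duplicates, and list the most frequent ones first.
printf 'the cat sat on the mat the end\n' > /tmp/article.txt
tr -cs '[:alnum:]' '\n' < /tmp/article.txt \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn | head -5
```

Real keyword extraction would also need to discard common words ("the", "on", ...), which is exactly what this naive count surfaces first.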
Please specify more details, like what type of file (e.g. txt or doc files, etc.), and more information about which programming language and operating system you use, to get a proper answer.
I'm creating a command line parser and want to support option bundling. However, I'm not sure how to handle ambiguities and conflicts that can arise. Consider the three following cases:
1.
-I accepts a string
"-Iinclude" -> Would be parsed as "-I include"
2.
-I accepts a string
-n accepts an integer
"-Iincluden10" -> Would be parsed as "-I include -n 10" because the 'cluden10' after the first occurrence of 'n' cannot be parsed as an integer.
3.
-I accepts a string
-n accepts an integer
-c accepts a string
"-Iin10clude" -> ??? What now ???
How do I handle the last string? There are multiple ways of parsing it, so do I just throw an error informing the user about the ambiguity or do I choose to parse the string that yields the most, i.e. as "-I i -n 10 -c lude"?
I could not find any detailed conventions online, but personally, I'd flag this as an ambiguity error.
As far as I know, there is no standard for command-line parameter parsing, nor even a cross-platform consensus. So the best we can do is appeal to common sense and the principle of least astonishment.
The Posix standard suggests some guidelines for parsing command-line parameters. They are just guidelines; as the linked section indicates, some standard shell utilities don't conform. And while Gnu utilities are expected to conform to the Posix guidelines, they also typically deviate in some respects, including the use of "long" parameters.
In any event, what Posix says about grouping is:
One or more options without option-arguments, followed by at most one option that takes an option-argument, should be accepted when grouped behind one '-' delimiter.
Note that Posix options are all single character options. Note also that the guideline is clear that only the last option in an option group is permitted to be an option which might accept an argument.
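The grouping rule quoted above can be observed directly with the POSIX getopts shell builtin; in this sketch the option letters -a, -b and the argument-taking -o are made up:

```shell
# Sketch: POSIX getopts accepts grouped flags, with at most one
# argument-taking option at the end of the group ("-abo out").
parse() {
  OPTIND=1                       # reset between calls
  while getopts 'abo:' opt "$@"; do
    case $opt in
      a) echo "flag a" ;;
      b) echo "flag b" ;;
      o) echo "arg o=$OPTARG" ;;
    esac
  done
}
parse -abo out   # equivalent to: parse -a -b -o out
```

Because 'o' is declared with a trailing ':' and sits last in the group, getopts takes the following word as its argument; putting 'o' earlier in the group would instead consume the rest of the group as the argument.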
With respect to Gnu-style long options, I don't know of a standard other than the behaviour of the getopt_long utility. This utility implements Posix style for single-character options, including the above-mentioned grouped option syntax; it allows single-character options which take arguments either to be immediately followed by the argument, or to be at the end of a (possibly singular) option group with the argument as the following word.
For long options, grouping is not allowed, regardless of whether the option accepts arguments. If the option does accept arguments, two styles are allowed: either the option is immediately followed by an = and then the argument, or the argument is the following word.
In Gnu style, long options cannot be confused with single-character options, because the long options must be specified with two dashes (--).
By contrast, many TCL/Tk-based utilities (and some other command-line parsers) allow long options with a single -, but do not allow option grouping.
In all of these styles, options are divided into two disjoint sets: those that take arguments, and those that do not.
None of these systems are ambiguous, although a random mix of styles, as you seem to be proposing, would be. Even with formal disambiguation rules, ambiguity is dangerous, particularly in console applications where a command line can be irreversible. Furthermore, contextual disambiguation can (even silently) change meaning if the set of available options is extended in the future, which would be a source of hard-to-predict errors in scripts.
Consequently, I'd recommend sticking to a simple existing practice such as Gnu, and to not try too hard to interpret incorrect command lines which do not conform.
Is there an existing POSIX sh grammar available or do I have to figure it out from the specification directly?
Note I'm not so much interested in a pure sh; an extended but conformant sh is also more than fine for my purposes.
The POSIX standard defines the grammar for the POSIX shell. The definition includes an annotated Yacc grammar. As such, it can be converted to EBNF more or less mechanically.
If you want a 'real' grammar, then you have to look harder. Choose your 'real shell' and find the source and work out what the grammar is from that.
Note that EBNF is not widely used. It is of limited practical value, not least because essentially no tools support it. Therefore, you are unlikely to find an EBNF grammar (of almost anything) off the shelf.
I have done some more digging and found these resources:
An sh tutorial located here
A Bash book containing Bash 2.0's BNF grammar (gone from here) with the relevant appendix still here
I have looked through the sources of bash, pdksh, and posh but haven't found anything remotely at the level of abstraction I need.
I've made multiple attempts at writing my own full-blown Bash interpreter over the past year, and at some point I also reached the same book appendix referenced in the marked answer (#2), but it's not completely correct/up to date (for example, it doesn't define production rules using the 'coproc' reserved keyword, and it has a duplicate production-rule definition for a redirection using '<&'; there might be more problems, but those are the ones I've noticed).
The best way I've found was to go to http://ftp.gnu.org/gnu/bash/
Download the current bash version's sources
Open the parse.y file (the YACC file that contains all the parsing logic bash uses) and copy-paste the lines between the '%%' markers into your favorite text editor; those define the grammar's production rules.
Then, using a little bit of regex (which I'm terrible at, btw), we can delete the extra code logic in between '{...}' to make the grammar look more BNF-like.
The regex I used was:
(\{(\s+.*?)+\})\s+([;|])
It matches any run of characters non-greedily (.*?), including spaces and newlines (\s+), between curly braces, and specifically the last closing brace before a ; or | character. Then I just replaced the matched strings with \3 (i.e. the result of the third capturing group, being either ; or |).
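A simplified version of this cleanup can be done from the shell as well. The fragment below is a tiny made-up sample in the style of parse.y (a real run would use the %% section of bash's actual parse.y, where actions can also span several lines, so this line-based filter is only an approximation of the regex above):

```shell
# Sketch: strip one-line C action blocks ({ ... }) from a Yacc
# fragment, leaving BNF-like production rules.
cat > /tmp/sample.y <<'EOF'
simple_command: command_word
                        { $$ = make_cmd ($1); }
        |       simple_command WORD
                        { $$ = append_word ($1, $2); }
        ;
EOF
# Delete lines consisting solely of an action block.
grep -v '^[[:space:]]*{.*}[[:space:]]*$' /tmp/sample.y
```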
Here's the grammar definition that I managed to extract at the time of posting https://pastebin.com/qpsK4TF6
I'd expect that sh, csh, ash, and bash would contain parsers. GNU versions of these are open source; you might just go check there.
I am tasked with white labeling an application so that it contains no references to our company, website, etc. The problem I am running into is that I have many different patterns to look for and would like to guarantee that all patterns are removed. Since the application was not developed in-house (entirely) we cannot simply look for occurrences in messages.properties and be done. We must go through JSP's, Java code, and xml.
I am using grep to filter results like this:
grep SOME_PATTERN . -ir | grep -v import | grep -v '//' | grep -v '/\*' ...
The patterns are escaped when I'm using them on the command line; however, I don't feel this pattern matching is very robust. There could possibly be occurrences that have import in them (unlikely) or even /* (the beginning of a javadoc comment).
All of the text output to the screen must come from a string declaration somewhere or a constants file. So, I can assume I will find something like:
public static final String SOME_CONSTANT = "SOME_PATTERN is currently unavailable";
I would like to find that occurrence as well as:
public static final String SOME_CONSTANT = "
SOME_PATTERN blah blah blah";
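For the split-literal case, one option (assuming GNU grep; the file name and constant below are made up) is -z, which reads NUL-delimited records, so a file without NUL bytes is treated as a single record and a pattern can cross the line break:

```shell
# Sketch: match a string constant that a line break splits in two.
# With -z the whole file is one record, so [^"]* can cross the
# newline inside the quoted string (GNU grep only).
printf 'public static final String C = "\nSOME_PATTERN blah";\n' > /tmp/Code.java
grep -lz 'String C = "[^"]*SOME_PATTERN' /tmp/Code.java
```

pcregrep -M (multiline mode) is another common tool for the same job.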
Alternatively, if we had an internal crawler / automated tests, I could simply pull back the xhtml from each page and check the source to ensure it was clean.
To address your concern about missing some occurrences, why not filter progressively:

1. Create a text file with all possible matches as a starting point.
2. Use filter X (grep for '^import', for example) to dump probable false positives into a tmp file.
3. Use filter X again to remove those matches from your working file (a copy of [1]).
4. Do a quick visual pass of the tmp file and add any real matches back in.
5. Repeat [2]-[4] with other filters.
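The steps above could be sketched as shell commands; the directory layout, file names, and the 'import' filter here are illustrative only:

```shell
# Progressive-filter sketch: start from all matches, peel off probable
# false positives into a reviewable tmp file, keep the rest.
mkdir -p /tmp/wl/src
printf 'import SOME_PATTERN\nString s = "SOME_PATTERN msg";\n' > /tmp/wl/src/A.java
cd /tmp/wl
grep -rn 'SOME_PATTERN' src > all_matches.txt      # step 1: starting point
grep 'import' all_matches.txt > tmp_suspects.txt   # step 2: probable false positives
grep -v 'import' all_matches.txt > working.txt     # step 3: remove them from the copy
# steps 4-5: review tmp_suspects.txt by hand, add any real matches
# back to working.txt, then repeat with the next filter (e.g. '//').
```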
This might take some time, of course, but it doesn't sound like this is something you want to get wrong...
I would use sed, not grep!
Sed is used to perform basic text transformations on an input stream.
Try the s/regexp/replacement/ command with sed.
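A minimal sketch of that substitution, with a made-up file and company name ('OldCorp'); note that in-place editing with -i is a GNU sed extension (BSD/macOS sed needs -i ''):

```shell
# Sketch: replace every occurrence of a company name in a file.
printf 'Welcome to OldCorp support.\n' > /tmp/page.jsp
sed -i 's/OldCorp/ACME/g' /tmp/page.jsp
cat /tmp/page.jsp   # -> Welcome to ACME support.
```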
You can also try the awk command. It has an option, -F, for field separation; you can use it with ; to split the lines of your files on ;.
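For example (the sample record here is made up), -F sets the field separator so individual ;-delimited fields can be picked out:

```shell
# Sketch: split a line on ';' and print the second field.
printf 'name;email;phone\n' | awk -F';' '{ print $2 }'   # -> email
```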
The best solution will be however a simple script in Perl or in Python.