I have a file like below:
city-italy
good food
bad climate
-
city-india
bad food
normal climate
-
city-brussel
normal dressing
stylish cookings
good food
-
Question - I want to grep the city and its food line, but only where the "food" is "bad".
For example -
for the file above, I need a grep command that produces output like this:
city-india
bad food
Please help me understand how to grep pattern 1 and pattern 2 only when both succeed together:
I mean both patterns should match, with the second pattern matching on the line that follows the first.
You can do it with pipes:
grep -A1 city <filename> | grep -B1 "bad food"
or
cat filename | grep -A1 city | grep -B1 "bad food"
(or any other stream source for the pipe).
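For the sample file above, the pipe produces:
city-india
bad food
Note that this relies on the food line coming directly after the city line; in the city-brussel record the food line sits two lines below the city line, so a "bad food" entry placed there would be missed.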
If the city name is guaranteed to come before the food quality (any other info in between is allowed):
sed -n -e '/^city/h' -e '/bad food/{x;G;p}' input
This keeps the name of the most recent city in the hold buffer and prints it, together with the matching line, whenever a line matches bad food.
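For the sample input this prints:
city-india
bad food
and, unlike the -A1/-B1 pipe, it does not require the food line to sit directly under the city line.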
I know this is an old question, but here's a "robust" alternative (cuz I'm into that):
grep -x -e'city-.*' -e'good food' -e'bad food' -e'-' |
  tr \\n \| |
  sed -e's/|-|/\n/g' |
  grep -xe'[^|]\+|[^|]\+' |
  grep -e'|bad food$' |
  tr \| \\n
Explanation
grep -x -e'city-.*' -e'good food' -e'bad food' -e'-': only keep the lines that contain a "city line", a "food line" (either good or bad), or a "separator line" (the food line expression could be better, I know), the -x argument to grep will make it return a line only if the whole line matches the given expression (incidentally, this first stage makes the whole pipe not choke on differently-sized "registers"),
tr \\n \|: turn newlines into pipes (you can use any character that does not appear in the original file, pipe works, so does a colon, you get the idea),
sed -e's/|-|/\n/g': replace the |-| string by a newline (these are the places where we know a "register" ends; since we only kept the data we're interested in plus the separators, we now have each of our "registers" on a single line, with its fields separated by pipes),
grep -xe'[^|]\+|[^|]\+': only keep lines containing exactly two fields (ie. the city and food fields),
grep -e'|bad food$': keep only lines ending in |bad food,
tr \| \\n: turn pipes back into newlines (nb. this is just here so that the output conforms to the question's specification, it's not really needed, nor preferred in my opinion).
Partial outputs
After grep -x -e'city-.*' -e'good food' -e'bad food' -e'-':
city-italy
good food
-
city-india
bad food
-
city-brussel
good food
-
After tr \\n \|:
city-italy|good food|-|city-india|bad food|-|city-brussel|good food|-|
After sed -e's/|-|/\n/g':
city-italy|good food
city-india|bad food
city-brussel|good food
After grep -xe'[^|]\+|[^|]\+': idem, since we don't have a "city line" without a "food line" in the example given, nor a register containing two "city lines" and a "food line", nor a register containing a "city line" and two "food lines", nor... you get the picture,
After grep -e'|bad food$':
city-india|bad food
After tr \| \\n:
city-india
bad food
Why is this more "robust"?
The input file basically consists of different "registers", each containing a variable number of "fields", but instead of having them in a "horizontal" format, we find them in a "vertical" one, i.e. one field per line with a lone - separating whole registers.
The pipe above supports any amount of fields in each register, it only assumes that:
Registers are separated by a lone -,
The "city fields" are all of the form city-*,
The "food fields" are either good food or bad food,
If at all existent, "city" fields appear before "food" fields.
(this last one I find particularly hard to relax, at least in a "normal"-ish pipe like the one given).
It does not assume that:
Each register has a "city" and a "food" field,
Each register has only "city" and "food" fields.
Disclaimer
I'm not claiming this is in any way better than any of the other answers, it's just that I can't do sed or awk to save my own life, and often find pipes like this are helpful in understanding how the file gets filtered and transformed.
All in all, it's just a matter of taste.
If the order is guaranteed, you can use grep directly with an OR of patterns:
grep -e "city" -e "food" FILE_INPUT
Then, hopefully, each city will be followed by its food line in the output.
The result looks like:
city-italy
good food
city-india
bad food
city-brussel
good food
You can change your pattern to get a more filtered result.
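For example, narrowing the food pattern (a sketch; note that a plain OR of patterns still prints every city, even those without a bad food line):
grep -e "^city" -e "bad food" FILE_INPUT
city-italy
city-india
bad food
city-brussel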
To get the city with bad food using GNU awk (gawk is needed because RS here is more than one character):
awk '/bad food/ {print RS $1}' RS="city" file
city-india
Another awk one-liner:
kent$ awk 'BEGIN{FS=OFS="\n";RS="-"FS}/bad food/{print $1,$2}' file
city-india
bad food
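The same program spelled out with comments (a readable sketch; GNU awk is assumed for the multi-character RS, and it relies on the food line being the second line of each record, as in the sample):
awk '
BEGIN {
    FS = OFS = "\n"   # every line of a record is a field
    RS = "-" FS       # records end at a lone "-" line
}
/bad food/ {          # only records that contain a "bad food" line
    print $1, $2      # field 1 is the city line, field 2 the food line
}' file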
I have a text file using markup language (similar to wikipedia articles)
cat test.txt
This is a sample text having: colon in the text. and there is more [[in single or double: brackets]]. I need to select the first word only.
and second line with no [brackets] colon in it.
I need to select the word "having:" only because that is part of regular text. I tried
grep -v '[*:*]' test.txt
This will correctly avoid the tags, but does not select the expected word.
The square brackets specify a character class, so your regular expression looks for any occurrence of one of the characters * or : (or *, but we said that already, didn't we?)
grep has the option -o to only print the matching text, so something like
grep -ow '[^[:space:]]*:[^[:space:]]*' file.txt
would extract any text with a colon in it, surrounded by zero or more non-whitespace characters on each side. The -w option adds the condition that the match needs to be between word boundaries.
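For the sample input above, this prints both colon-containing tokens, including the one inside the brackets:
having:
double: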
However, if you want to restrict in which context you want to match the text, you will probably need to switch to a more capable tool than plain grep. For example, you could use sed to preprocess each line to remove any bracketed text, and then look for matches in the remaining text.
sed -e 's/\[.*]//g' -e 's/ [^: ]*$/ /' -e 's/[^: ]* //g' -e 's/ /\n/' file.txt
(This assumes that your sed recognizes \n in the replacement string as a literal newline. There are simple workarounds available if it doesn't, but let's not go there if it's not necessary.)
In brief, we first remove any text between square brackets. (This needs to be improved if your input could contain multiple sequences of square brackets on a line with normal text between them. Your example only shows nested square brackets, but my approach is probably too simple for either case.) Then, we remove any words which don't contain a colon, with a special provision for the last word on the line, and some subsequent cleanup. Finally, we replace any remaining spaces with newlines, and (implicitly) print whatever is left. (This still ends up printing one newline too many, but that is easy to fix up later.)
Alternatively, we could use sed to remove any bracketed expressions, then use grep on the remaining tokens.
sed -e :a -e 's/\[[^][]*\]//' -e ta file.txt |
grep -ow '[^[:space:]]*:[^[:space:]]*'
The :a creates a label a and ta says to jump back to that label and try again if the regex matched. This one also demonstrates how to handle nested and repeated brackets. (I suppose it could be refactored into the previous attempt, so we could avoid the pipe to grep. But outlining different solution models is also useful here, I suppose.)
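With the bracketed text stripped out first, the same grep now prints only:
having: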
If you wanted to ensure that there is at least one non-colon character adjacent to the colon, you could do something like
... file.txt |
grep -owE '[^:[:space:]]+:[^[:space:]]*|[^[:space:]]*:[^: [:space:]]+'
where the -E option selects a slightly more modern regex dialect which allows us to use | between alternatives and + for one or more repetitions. (The original grep did not have these features at all; much later, the POSIX standard grafted them on with a slightly wacky syntax which requires you to backslash them to remove the literal meaning and select the metacharacter behavior... but let's not go there.)
Notice also how [^:[:space:]] matches a single character which is not a colon or a whitespace character, where [:space:] is the (slightly arcane) special POSIX named character class which matches any whitespace character (regular space, horizontal tab, vertical tab, possibly Unicode whitespace characters, depending on locale).
Awk easily lets you iterate over the tokens on a line. The requirement to ignore matches within square brackets complicates matters somewhat; you could keep a separate variable to keep track of whether you are inside brackets or not.
awk '{ for(i=1; i<=NF; ++i) {
         if($i ~ /\]/) { brackets=0; next }
         if($i ~ /\[/) brackets=1;
         if(brackets) next;
         if($i ~ /:/) print $i } }' file.txt
This again hard-codes some perhaps incorrect assumptions about how the brackets can be placed. It will behave unexpectedly if a single token contains a closing square bracket followed by an opening one, and has an oversimplified treatment of nested brackets (the first closing bracket after a series of opening brackets will effectively assume we are no longer inside brackets).
A combined solution using sed and awk:
sed 's/ /\n/g' test.txt | gawk 'i==0 && $0~/:$/{ print $0 }/\[/{ i++} /\]/ {i--}'
sed will change all spaces to a newline
awk (or gawk) will output all lines matching $0~/:$/, as long as i equals zero
The last part of the awk stuff keeps a count of the opening and closing brackets.
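For the sample test.txt the output is just:
having:
(double: is skipped because the bracket counter is already non-zero when that token is seen).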
Another solution using sed and grep:
sed -r -e 's/\[.*\]+//g' -e 's/ /\n/g' test.txt | grep ':$'
's/\[.*\]+//g' will filter the stuff between brackets
's/ /\n/g' will replace a space with a newline
grep will only find lines ending with :
A third one using only awk:
gawk '{ for (t=1;t<=NF;t++){
if(i==0 && $t~/:$/) print $t;
i=i+gsub(/\[/,"",$t)-gsub(/\]/,"",$t) }}' test.txt
gsub returns the number of replacements.
The variable i is used to count the level of brackets. On every [ it is incremented by 1, and on every ] it is decremented by 1. This works because gsub(/\[/,"",$t) returns the number of characters it replaced. For a token like [[][ the count is increased by (3-1=) 2. When a token contains brackets AND a colon my code will fail, because the token is tested against /:$/ before the bracket count is updated for that token.
I want to extract a specific part out of the filenames to work with them.
Example:
ls -1
REZ-Name1,Surname1-02-04-2012.png
REZ-Name2,Surname2-07-08-2013.png
....
So I want to get only the part with the name.
How can this be achieved?
There are several ways to do this. Here's a loop:
for file in REZ-*-??-??-????.png
do
name=${file#*-}
name=${name%-??-??-????.png}
echo "($name)"
done
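To see what the two parameter expansions do to a single name (a quick sketch using one of the example filenames):
file='REZ-Name1,Surname1-02-04-2012.png'
name=${file#*-}               # strip the shortest prefix up to the first "-": Name1,Surname1-02-04-2012.png
name=${name%-??-??-????.png}  # strip the trailing date and extension:        Name1,Surname1
echo "$name"                  # -> Name1,Surname1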
Given a variety of filenames with all sorts of edge cases from spacing, additional hyphens and line feeds:
REZ-Anna-Maria,de-la-Cruz-12-32-2015.png
REZ-Bjørn,Dæhlie-01-01-2015.png
REZ-First,Last-12-32-2015.png
REZ-John Quincy,Adams-11-12-2014.png
REZ-Ridiculous example # this is one filename
is ridiculous,but fun-22-11-2000.png # spanning two lines
it outputs:
(Anna-Maria,de-la-Cruz)
(Bjørn,Dæhlie)
(First,Last)
(John Quincy,Adams)
(Ridiculous example
is ridiculous,but fun)
If you're less concerned with correctness, you can simplify it further:
$ ls | grep -o '[^-]*,[^-]*'
Maria,de
Bjørn,Dæhlie
First,Last
John Quincy,Adams
is ridiculous,but fun
In this case, cut makes more sense than grep:
ls -1 | cut -f2 -d-
cut the second field from the input, using '-' as the field delimiter. That other guy's answer will correctly handle some cases mine will not, but for one off uses, I generally find the semantics of cut to be much easier to remember.
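For the two example filenames this yields:
Name1,Surname1
Name2,Surname2
(As with the grep variant, names that themselves contain a -, such as Anna-Maria,de-la-Cruz, will be cut short at the first embedded hyphen.)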
I have data that looks like the following:
bark art|evt|evt|nat
barnburner evt|hum
bash evt|evt
battle act|act|act|evt|evt
bay anm|art|art|art|evt|nat|plt
beat act|act|atr|com|evt|evt|evt|hum|loc|tme
beating act|act|evt|evt
bread act|act|evt|evt|hum|nat
I want to be able to extract from it all lines that have any string in the first column, but a specific pattern of information in the second column.
More specifically, I want to extract those lines that have evt in the second column and at least one other value that I specify.
For instance, I want to extract all lines that have evt and at least hum or nat (or both hum and nat together with evt).
Thus, my desired result would be:
bark art|**evt**|**evt**|**nat**
barnburner **evt**|**hum**
bay anm|art|art|art|**evt**|**nat**|plt
beat act|act|atr|com|**evt**|**evt**|**evt**|**hum**|loc|tme
bread act|act|**evt**|**evt**|**hum**|**nat**
I have been trying to do this with grep, with no success.
The grep that I have been trying is:
$ grep 'evt\|(hum|nat)' file
Can anyone point me in a direction to what I am doing wrong?
Thanks!
grep:
default: BRE (Basic Regular Expressions); you have to escape some special characters, such as | and (, to give them their special meaning.
-E option: ERE (Extended Regular Expressions); you escape those characters to take the special meaning away.
So you used the default mode of grep, which is BRE, and evt\|(hum|nat) matches
"evt" or the literal string "(hum|nat)". With BRE, what you are looking for might be: evt|\(hum\|nat\), where \( and \| carry the special meaning.
Or use -E (ERE); then you could grep 'evt\|(hum|nat)', where the \| takes the special meaning away and makes it match a literal "|".
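To make the difference concrete, these two commands are equivalent (a sketch; the \| alternation in BRE is a GNU extension):
grep 'evt|\(hum\|nat\)' file      # BRE: plain | is a literal, \( \| \) are the operators
grep -E 'evt\|(hum|nat)' file     # ERE: ( | ) are the operators, \| is a literal |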
You are so close; just use the extended regex parameter -E.
$ grep -E 'evt\|(hum|nat)' file
bark art|evt|evt|nat
barnburner evt|hum
bay anm|art|art|art|evt|nat|plt
beat act|act|atr|com|evt|evt|evt|hum|loc|tme
bread act|act|evt|evt|hum|nat
(Tested with grep (GNU grep) 2.14.)
Hello,
I have a log file that I want to filter on a selected word. However, it tends to match more than I want. For example:
tail -f gateway-* | grep "P_SIP:N_iptB1T1"
This will also find words like this:
"P_SIP:N_iptB1T10"
"P_SIP:N_iptB1T11"
"P_SIP:N_iptB1T12"
etc
However, I don't want anything after the final 1 to be matched; grep is picking up 11, 12, 13, etc.
Many thanks for any suggestions,
You can restrict the word to end at 1:
tail -f gateway-* | grep "P_SIP:N_iptB1T1\>"
This will work, assuming that the case you want to match is exactly "P_SIP:N_iptB1T1".
But if you also want lines like P_SIP:N_iptB1T1x to count, while displaying only the matched text itself, then you can restrict the output to the matching part:
grep -o "P_SIP:N_iptB1T1"
-o, --only-matching show only the part of a line matching PATTERN
At least two approaches can be tried:
grep -w pattern matches for full words. Seems to work for this case too, even though the pattern has punctuation.
grep pattern -m 1 to restrict the output to first match. (Also doable with grep xxx | head -1)
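For example, the word-boundary variant applied to the original pipeline (a sketch):
tail -f gateway-* | grep -w "P_SIP:N_iptB1T1"
Because of -w, the character after the final 1 must be a non-word character, so ...T10, ...T11 and so on are not matched.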
If the lines contain the quotes as in your example, just use the -E option in grep and match the closing quote with \". For example:
grep -E "P_SIP:N_iptB1T1\"" file
If these quotes aren't in the text file, and there are blanks or line endings after the word, you can match those too:
# The word is followed by one or more blanks
grep -E "P_SIP:N_iptB1T1\s+" file
# Match lines ending with the interesting word
grep -E "P_SIP:N_iptB1T1$" file
I'm using the operating system's dictionary file to scan. I'm creating a Java program to allow a user to enter any concoction of letters to find words that contain those letters. How would I do this using grep commands?
To find words that contain only the given letters:
grep -v '[^aeiou]' wordlist
The above discards any line in wordlist that contains a character other than those listed, i.e. it keeps only lines made up entirely of the given letters. It's sort of using a double negative to get what you want. Another way to do this would be:
grep -E '^[aeiou]+$' wordlist
which searches the whole line for a sequence of one or more of the selected letters.
To find words that contain all of the given letters is a bit more lengthy, because there may be other letters in between the ones we want:
cat wordlist | grep a | grep e | grep i | grep o | grep u
(Yes, there is a useless use of cat above, but the symmetry is better this way.)
You can use a single grep to solve the last problem in Greg's answer, provided your grep supports PCRE. (Based on this excellent answer, boiled down a bit)
grep -P "(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)" wordlist
The positive lookaheads mean it will match any line with an "a" anywhere, and an "e" anywhere, and so on for each letter.
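For example, run against a system word list (a sketch; the path varies by OS, /usr/share/dict/words is just a common default, and -P needs a grep built with PCRE support):
grep -P "(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)" /usr/share/dict/words
This prints words containing all five vowels, such as sequoia or education.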