Any help would be greatly appreciated. I can read code and figure it out, but I have trouble writing from scratch.
I need help starting a ksh script that would search a file for multiple strings and write each line containing one of those strings to an output file.
If I use the following command:
$ grep "search pattern" file >> output file
...that does what I want it to. But I need to search multiple strings, and write the output in the order listed in the file.
Again... any help would be great! Thank you in advance!
Have a look at the regular expression manuals. You can specify multiple strings in the search expression such as grep "John|Bill"
Man grep will teach you a lot about regular expressions, but there are several online sites where you try them out, such as regex101 and (more colorful) regexr.
Sometimes you need egrep.
egrep "first substring|second substring" file
When you have a lot substrings you can put them in a variable first
findalot="first substring|second substring"
findalot="${findalot}|third substring"
findalot="${findalot}|find me too"
skipsome="notme"
skipsome="${skipsome}|dirty words"
egrep "${findalot}" file | egrep -v "${skipsome}"
Use "-f" in grep .
Write all the strings you want to match in a file ( lets say pattern_file , the list of strings should be one per line)
and use grep like below
grep -f pattern_file file > output_file
Related
I have a txt file that has only information about location (location.txt)
Another large txt file (all.txt) has a lot of information like id , and a location.txt is subset of all.txt ( some records common in both )
I want to search the location.txt in another file with grep (all.txt)
and print all common records ( but all information like all.txt )
I try to grep by :
grep -f location.txt all.txt
the problem grep just give me the last location not all locations
how can I print all location?
I'm assuming you mean to use one of the files as a set of patterns for grep. If this is the case, you seem to be looking for a way to print all lines in one file not found in the other and this is what you want:
grep -vFf file_with_patterns other_file
Explanation
-F means to interpret the pattern(s) literally, giving no particular meaning to regex metacharacters (like * and +, for example)
-f means read regex patterns from the file named as argument (file_with_patterns in this case).
I created a test file with the following:
<cert>
</cert>
I'm now trying to find this with grep and the following command, but it take forever to run.
How can I search quickly for files that contain adjacent lines like these?
tr -d '\n' | grep '<cert></cert>' test.test
So, from the comments, you're trying to get the filenames that contain an empty <cert>..</cert> element. You're using several tools wrong. As #iiSeymour pointed out, tr only reads from standard input-- so if you want to use it to select from lots of filenames, you'll need to use a loop. grep prints out matching lines, not filenames; though you could use grep -l to see the filenames instead.
But you're only joining lines because grep works one line at a time; so let's use a better tool. Here's how to search with awk:
awk '/<cert>/ { started=1; }
/<\/cert>/ { if (started) { print FILENAME; nextfile;} }
!/<cert>/ { started = 0; }' file1 file2 *.txt
It checks each line and keeps track of whether the previous line matched <cert>. (!/pattern/ sets the flag back to zero on lines not matching /pattern/.) Call it with all your files (or with a wildcard like *.txt).
And a friendly suggestion: Next time, try each command separately (you've been stuck on this for hours and you still don't know what grep does?). And have a quick look at the manual for the tools you want to use. Unix tools are usually too complex for simple trial and error.
This is a common problem I encounter when using grep. Say the pattern is 'chr1' in a third column of a file, when I do the following:
grep 'chr1' file
How can I avoid getting the results including chr10, chr11, chr13 etc as well?
Thanks!
It seems this works:
grep -w 'chr1' file
Since you're interested in values in specific columns, you're much better off using awk:
awk '$3 == "chr1"' file
I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or if it will consider the next "/" a new word or not. The directories listed could change, so I can't assume the same amount of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory list in it's output. Thanks for any help you can provide!
If your directory paths does not have spaces then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary character class: \b.
I know you said to use grep, but I can't help to mention that this is trivially done using awk:
awk '{ print $NF }' input.txt
This is assuming that a whitespace is the delimiter and that the path does not contain any whitespaces.
What is the way to use 'grep' to search for combinations of a pattern in a text file?
Say, for instance I am looking for "by the way" and possible other combinations like "way by the" and "the way by"
Thanks.
Awk is the tool for this, not grep. On one line:
awk '/by/ && /the/ && /way/' file
Across the whole file:
gawk -v RS='\0' '/by/ && /the/ && /way/' file
Note that this is searching for the 3 words, not searching for combinations of those 3 words with spaces between them. Is that what you want?
Provide more details including sample input and expected output if you want more help.
The simplest approach is probably by using regexps. But this is also slightly wrong:
egrep '([ ]*(by|the|way)\>){3}'
What this does is to match on the group of your three words, taking spaces in front of the words
with it (if any) and forcing it to be a complete word (hence the \> at the end) and matching the string if any of the words in the group occurs three times.
Example of running it:
$ echo -e "the the the\nby the\nby the way\nby the may\nthe way by\nby the thermo\nbypass the thermo" | egrep '([ ]*(by|the|way)\>){3}'
the the the
by the way
the way by
As already said, this procudes a 'false' positive for the the the but if you can live with that, I'd recommend doing it this way.