grep (print) the lines where the third character is 't' - grep

below are the file contents
# cat example.db
LNT_PROFILE
TMP_PROFILE
GMT_PROFILE
nyt_profile
LNX_PROFILE
LOF_PROFILE
ASA_PROFILE
ist_profile
I want to grep (print) the lines where the third character is 't'. I know we can achieve this with a script that extracts the third character, tests it in an if statement, and prints the line when the test succeeds, but I want to do it in grep using pattern matching.
Thanks
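A minimal sketch of the pattern-matching approach asked about, assuming the file is named example.db as shown above. The scratch directory and the variable names are only for illustration. The anchor ^ pins the match to the start of the line, each . stands for exactly one character, so ^..t requires 't' in the third position.

```shell
# Recreate the sample file from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/example.db" <<'EOF'
LNT_PROFILE
TMP_PROFILE
GMT_PROFILE
nyt_profile
LNX_PROFILE
LOF_PROFILE
ASA_PROFILE
ist_profile
EOF

# '^' = start of line, '..' = any two characters, 't' = required third character.
third_t=$(grep '^..t' "$workdir/example.db")
printf '%s\n' "$third_t"

# Adding -i ignores case, so an uppercase 'T' in third position also matches.
third_t_any_case=$(grep -i '^..t' "$workdir/example.db")
printf '%s\n' "$third_t_any_case"

rm -rf "$workdir"
```

The first command prints only nyt_profile and ist_profile; with -i the lines LNT_PROFILE and GMT_PROFILE are printed as well.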

Related

select only a word that is part of colon

I have a text file using markup language (similar to wikipedia articles)
cat test.txt
This is a sample text having: colon in the text. and there is more [[in single or double: brackets]]. I need to select the first word only.
and second line with no [brackets] colon in it.
I need to select the word "having:" only because that is part of regular text. I tried
grep -v '[*:*]' test.txt
This will correctly avoid the tags, but does not select the expected word.
The square brackets specify a character class, so your regular expression looks for any occurrence of one of the characters * or : (or *, but we said that already, didn't we?)
grep has the option -o to only print the matching text, so something like
grep -ow '[^[:space:]]*:[^[:space:]]*' file.txt
would extract any text with a colon in it, surrounded by zero or more non-whitespace characters on each side. The -w option adds the condition that the match needs to be between word boundaries.
However, if you want to restrict in which context you want to match the text, you will probably need to switch to a more capable tool than plain grep. For example, you could use sed to preprocess each line to remove any bracketed text, and then look for matches in the remaining text.
sed -e 's/\[.*]//g' -e 's/ [^: ]*$/ /' -e 's/[^: ]* //g' -e 's/ /\n/' file.txt
(This assumes that your sed recognizes \n in the replacement string as a literal newline. There are simple workarounds available if it doesn't, but let's not go there if it's not necessary.)
In brief, we first remove any text between square brackets. (This needs to be improved if your input could contain multiple sequences of square brackets on a line with normal text between them. Your example only shows nested square brackets, but my approach is probably too simple for either case.) Then, we remove any words which don't contain a colon, with a special provision for the last word on the line, and some subsequent cleanup. Finally, we replace any remaining spaces with newlines, and (implicitly) print whatever is left. (This still ends up printing one newline too many, but that is easy to fix up later.)
Alternatively, we could use sed to remove any bracketed expressions, then use grep on the remaining tokens.
sed -e :a -e 's/\[[^][]*\]//' -e ta file.txt |
grep -ow '[^[:space:]]*:[^[:space:]]*'
The :a creates a label a and ta says to jump back to that label and try again if the regex matched. This one also demonstrates how to handle nested and repeated brackets. (I suppose it could be refactored into the previous attempt, so we could avoid the pipe to grep. But outlining different solution models is also useful here, I suppose.)
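To see the loop at work, here is a small sketch that runs the same sed program on a made-up line (the input text and variable names are hypothetical, not from the question). Each pass removes one innermost bracket pair, and ta repeats until nothing is left to remove.

```shell
# Hypothetical input with one nested group and one separate group.
line='keep: [[nested: one]] middle [second: group] end'

# Pass 1 removes [nested: one], pass 2 removes the now-empty [],
# pass 3 removes [second: group]; pass 4 finds nothing, so the loop ends.
stripped=$(printf '%s\n' "$line" | sed -e :a -e 's/\[[^][]*\]//' -e ta)
printf '%s\n' "$stripped"
```

Note that the spaces on either side of each removed group are left in place, so the result contains doubled spaces where the brackets used to be.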
If you wanted to ensure that there is at least one non-colon character adjacent to the colon, you could do something like
... file.txt |
grep -owE '[^:[:space:]]+:[^[:space:]]*|[^[:space:]]*:[^: [:space:]]+'
where the -E option selects a slightly more modern regex dialect which allows us to use | between alternatives and + for one or more repetitions. (The original grep of the early 1970s did not have these features at all; much later, the POSIX standard grafted them on with a slightly wacky syntax which requires you to backslash them to remove the literal meaning and select the metacharacter behavior... but let's not go there.)
Notice also how [^:[:space:]] matches a single character which is not a colon or a whitespace character, where [:space:] is the (slightly arcane) special POSIX named character class which matches any whitespace character (regular space, horizontal tab, vertical tab, possibly Unicode whitespace characters, depending on locale).
Awk easily lets you iterate over the tokens on a line. The requirement to ignore matches within square brackets complicates matters somewhat; you could keep a separate variable to keep track of whether you are inside brackets or not.
awk '{ for(i=1; i<=NF; ++i) {
    # "continue" moves on to the next token; "next" would skip the rest of the line
    if($i ~ /\]/) { brackets=0; continue }
    if($i ~ /\[/) brackets=1;
    if(brackets) continue;
    if($i ~ /:/) print $i } }' file.txt
This again hard-codes some perhaps incorrect assumptions about how the brackets can be placed. It will behave unexpectedly if a single token contains a closing square bracket followed by an opening one, and has an oversimplified treatment of nested brackets (the first closing bracket after a series of opening brackets will effectively assume we are no longer inside brackets).
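One way to relax those assumptions, sketched here rather than offered as a definitive fix, is to track the nesting depth character by character instead of token by token. The function name colon_words_outside_brackets and the variable names are made up for this example.

```shell
# Print every whitespace-separated word that contains a colon and
# lies outside all square brackets. Reads the named files, or stdin.
colon_words_outside_brackets() {
  awk '{
    out = ""
    n = length($0)
    for (k = 1; k <= n; k++) {
      c = substr($0, k, 1)
      if (c == "[") depth++                         # entering a bracket level
      else if (c == "]") { if (depth > 0) depth-- } # leaving one; stray ] is ignored
      else if (depth == 0) out = out c              # keep text outside brackets only
    }
    # depth is not reset per line, so a group may span several lines
    m = split(out, words, " ")
    for (j = 1; j <= m; j++)
      if (words[j] ~ /:/) print words[j]
  }' "$@"
}

# Hypothetical input: nested brackets, plus "][" inside a single token.
sample_out=$(printf '%s\n' 'a: [b: [c:] d:] e:][f: x' | colon_words_outside_brackets)
printf '%s\n' "$sample_out"
```

Because bracketed text is dropped before the line is split into words, text on either side of a bracket group with no space between is joined into one word.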
A combined solution using sed and awk:
sed 's/ /\n/g' test.txt | gawk 'i==0 && $0~/:$/{ print $0 }/\[/{ i++} /\]/ {i--}'
sed will change all spaces to a newline
awk (or gawk) will output all lines matching $0~/:$/, as long as i equals zero
The last part of the awk stuff keeps a count of the opening and closing brackets.
Another solution using sed and grep:
sed -r -e 's/\[.*\]+//g' -e 's/ /\n/g' test.txt | grep ':$'
's/\[.*\]+//g' will filter the stuff between brackets
's/ /\n/g' will replace a space with a newline
grep will only find lines ending with :
A third one using only awk:
gawk '{ for (t=1;t<=NF;t++){
if(i==0 && $t~/:$/) print $t;
i=i+gsub(/\[/,"",$t)-gsub(/\]/,"",$t) }}' test.txt
gsub returns the number of replacements.
The variable i counts the bracket nesting level: every [ increments it by 1 and every ] decrements it by 1. This works because gsub(/\[/,"",$t) returns the number of replacements made. For a token like [[][ the count increases by (3-1=) 2. When a token has brackets AND a colon my code will fail, because the token is tested for a trailing : before its brackets are counted.

Match Lines From Two Lists With Wildcards In One List

I have two lists, one of which contains wildcards (in this case represented by *). I would like to compare the two lists and create an output of those that match, with each wildcard * representing a single character.
For example:
File 1
123456|Jane|Johnson|Pharmacist|janejohnson#gmail.com
09876579|Frank|Roberts|Butcher|frankie1#hotmail.com
092362936|Joe|Jordan|Joiner|joe#joesjoinery.com
928|Bob|Horton|Farmer|bhorton#farmernews.co.uk
File 2
1***6|Jane|Johnson|Pharmacist|janejohnson#gmail.com
09876579|Frank|Roberts|Butcher|f**1#hotmail.com
092362936|Joe|Jordan|J*****|joe#joesjoinery.com
928|Bob|Horton|Farmer|b*****n#f*********.co.uk
Output
092362936|Joe|Jordan|Joiner|joe#joesjoinery.com
928|Bob|Horton|Farmer|bhorton#farmernews.co.uk
Explanation
The first two lines are not considered matches because the number of *s is not equal to the number of characters shown in the first file. The latter two are, so they are added to output.
I have tried to reason out ways to do this in AWK and using Join, but I don't know any way to even start trying to achieve this. Any help would be greatly appreciated.
$ cat tst.awk
NR==FNR {
    file1[$0]
    next
}
{
    # Make every non-* char literal (see https://stackoverflow.com/a/29613573/1745001):
    gsub(/[^^*]/,"[&]")     # Convert every char X to [X] except ^ and *
    gsub(/\^/,"\\^")        # Convert every ^ to \^
    # Convert every * to .:
    gsub(/\*/,".")
    # Add line start/end anchors
    $0 = "^" $0 "$"
    # See if the current file2 line matches any line from file1
    # and if so print that line from file1:
    for ( line in file1 ) {
        if ( line ~ $0 ) {
            print line
        }
    }
}
$ awk -f tst.awk file1 file2
092362936|Joe|Jordan|Joiner|joe#joesjoinery.com
928|Bob|Horton|Farmer|bhorton#farmernews.co.uk
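To make the conversion step concrete, the sketch below applies the same three gsub calls to a single short pattern and prints the resulting regular expression. The function name to_regex and the sample input are assumptions for illustration only.

```shell
# Turn one wildcard line into the anchored regex that tst.awk builds.
to_regex() {
  awk '{
    gsub(/[^^*]/, "[&]")   # wrap every char except ^ and * in a bracket expression
    gsub(/\^/, "\\^")      # escape any ^, which cannot safely stand alone in [...]
    gsub(/\*/, ".")        # each * becomes "exactly one character"
    print "^" $0 "$"       # anchor so the whole line must match
  }'
}

regex=$(printf '%s\n' 'f**1#a.b' | to_regex)
printf '%s\n' "$regex"
# prints: ^[f]..[1][#][a][.][b]$
```

Wrapping each character in brackets means a literal . or | in the data is matched literally, without having to enumerate which characters are special.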
sed 's/\./\\./g; s/\*/./g' file2 | xargs -I{} grep {} file1
Explanation:
I'd take advantage of regular expression matching. To do that, we need to turn every asterisk * into a dot ., which represents any character in regular expressions. As a side effect of enabling regular expressions, we need to escape all special characters, particularly the ., in order for them to be taken literally. In a regular expression, we need to use \. to represent a dot (as opposed to any character).
The first step is to perform these substitutions with sed; the second is to pass every resulting line as a search pattern to grep and search file1 for that pattern. The glue that allows us to do this is xargs, where {} is a placeholder representing a single line from the results of the sed command.
Note:
This is not a general, safe solution you can simply copy and paste: you should watch out for any characters, in your file containing the asterisks, that are considered special in grep regular expressions.
Update:
jhnc extends the escaping to any of the following characters: .\^$[], thus accounting for almost all sorts of email addresses. He/she then avoids the use of xargs by employing -f - to pass the results of sed as search expressions to grep:
sed 's/[.\\^$[]/\\&/g; s/[*]/./g' file2 | grep -f - file1
This solution is both more general and more efficient, see comment below.
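The patterns produced this way are not anchored, so in general they can match in the middle of a line. A possible refinement, shown here as a sketch and not part of either answer, is to add grep's -x option so that each pattern has to match a whole line. The scratch directory and variable names are illustrative.

```shell
# Recreate the two sample files from the question.
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
123456|Jane|Johnson|Pharmacist|janejohnson#gmail.com
09876579|Frank|Roberts|Butcher|frankie1#hotmail.com
092362936|Joe|Jordan|Joiner|joe#joesjoinery.com
928|Bob|Horton|Farmer|bhorton#farmernews.co.uk
EOF
cat > "$workdir/file2" <<'EOF'
1***6|Jane|Johnson|Pharmacist|janejohnson#gmail.com
09876579|Frank|Roberts|Butcher|f**1#hotmail.com
092362936|Joe|Jordan|J*****|joe#joesjoinery.com
928|Bob|Horton|Farmer|b*****n#f*********.co.uk
EOF

# Escape regex-special characters, turn each * into ., then require
# whole-line matches with -x while reading the patterns from stdin.
matches=$(sed 's/[.\\^$[]/\\&/g; s/[*]/./g' "$workdir/file2" |
  grep -x -f - "$workdir/file1")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

On the sample data this prints the Joe Jordan and Bob Horton lines, the same two lines as the expected output in the question.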

How to use grep command to filter a log file for a specific keyword within particular timestamp?

So
grep "xyz" file.log
will print all the lines having xyz as a key word and
grep "01/APR/2014:16:3[5-9]" file.log
will print lines within that time range. How can I use both features together, i.e. a keyword filter within a time range?
Just pipe your two greps together:
grep "xyz" file.log | grep "01/APR/2014:16:3[5-9]"
The first grep will parse out all the lines with xyz, the second grep will winnow that list down by the date given. Depending on your data set, reversing the greps could be faster.
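If a single pass over the file is preferred, both conditions can also be combined in one awk command. This is an alternative sketch, not what the answer above proposes, and the log lines below are invented purely to have something to run it on.

```shell
# Hypothetical log excerpt; only the timestamp format follows the question.
workdir=$(mktemp -d)
cat > "$workdir/file.log" <<'EOF'
01/APR/2014:16:34:10 GET /xyz
01/APR/2014:16:36:02 GET /xyz
01/APR/2014:16:37:45 GET /abc
01/APR/2014:16:41:00 GET /xyz
EOF

# A line is printed only when it matches the keyword AND the time range.
hits=$(awk '/xyz/ && /01\/APR\/2014:16:3[5-9]/' "$workdir/file.log")
printf '%s\n' "$hits"
# prints: 01/APR/2014:16:36:02 GET /xyz

rm -rf "$workdir"
```

The order of the two conditions does not matter to awk, which sidesteps the question of which grep to run first.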

grep for argument of a latex command

I'm trying to sort out some broken references in a latex file. They are commands such as \cref{ps.1.1}. I would like to grep my file and get only the argument of the command as output, in this case ps.1.1. grep -Po \\\\cref{.*?} my.tex gives me only the command, not the rest of the line, but I'd like to also get rid of the \cref{ and } in the output, so that I could iterate over them.
Here is a Perl one-liner, printing out only the matches, including multiple ones on the same line. It puts out a line per match, even for those on the same line, prepended with their line numbers.
perl -nle 'print "$.: $1" while(/\\cref\{(.*?)\}/g)' file.tex
This may need to and can be modified, depending on the exact output you want.
For example, to print just once for multiple matches on the same line, drop the /g modifier (remove g after the regex). To match multiple patterns, add them to the regex (separated by | and grouped by ()) and add $2, $3 (...) to print. To see the whole line, change $1 to $_. Etc.
A simple script would offer far more flexibility and processing opportunities.
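For the case where only the bare arguments are wanted, a sketch that avoids Perl regular expressions is to let grep -o extract each command and then strip the wrapper with sed. The sample file contents are made up, and the approach assumes the argument itself contains no closing brace.

```shell
# Hypothetical LaTeX fragment to run the pipeline on.
workdir=$(mktemp -d)
cat > "$workdir/my.tex" <<'EOF'
See \cref{ps.1.1} and also \cref{ps.2.3}.
No reference on this line.
Finally \cref{eq.4}.
EOF

# grep -o prints each \cref{...} on its own line; sed then removes
# the leading \cref{ and the trailing }.
refs=$(grep -o '\\cref{[^}]*}' "$workdir/my.tex" |
  sed -e 's/^\\cref{//' -e 's/}$//')
printf '%s\n' "$refs"

rm -rf "$workdir"
```

This prints ps.1.1, ps.2.3 and eq.4, one per line, which is convenient for feeding into a loop.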

Filter a specific letter within a word using Grep

I've been trying to find a way to filter a specific letter within a word using a regular expression. For example, filtering the letter "a" in the word "latin". Filtering only a letter would be simple using something like:
grep "\ba\b"
but I can't find a way to get the "a" only in a certain word.
Thanks for your help!
You can pipe to another grep, like this:
grep "\ba\b" /path/to/input/file | grep -o "a"
The latter part of the pipe uses the -o flag, which outputs only the matched part. Alternatively, grep -o "a" on its own should return all a's.
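To restrict the letter to one particular word, the same piping idea can be applied with the word selected first. This is only a sketch of one reading of the question, and the sample text is invented.

```shell
# Hypothetical input containing "latin" plus other words with the letter a.
text='latin alphabet and a banana'

# First keep only whole-word occurrences of "latin", then pull out each "a".
letters=$(printf '%s\n' "$text" | grep -ow 'latin' | grep -o 'a')
printf '%s\n' "$letters"
# prints: a
```

The a's in the other words are never seen by the second grep, because the first one only lets the word latin through.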
