I have several large text files containing thousands of records. One line of each record holds a string such as "1 name userabc.db", "1 name xy040101.db", "1 name abcdfr.db" or "1 name efgh.db", and I want to grep for those lines.
The .db file name can contain uppercase letters anywhere in the name, for example USERABC.DB, Userabc.db or userAbc.db.
So I need to search for and identify this line in each record whenever it has an uppercase letter anywhere on it.
When I run grep '[1 name ][A-Z]' ./store.txt, I find:
"1 name USERxxx.db" and
"1 name Xy040101.DB" and
"1 name Abcdfr.db" and
"1 name EFGH.DB", but nothing when the first uppercase letter is at the second or a later position of the .db file name.
Bottom line: I need to find every line that has one or more uppercase letters anywhere in the .db file name, not just at the beginning or when the whole name is uppercase.
Can this be done? Would it be better done with sed or awk?
Thanks,
Bob Perez (bperez#novell.com)
All you need is:
grep "1 name .*[A-Z].*" ./store.txt
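.* matches any character, any number of times (including zero), so [A-Z] can sit anywhere after "1 name ". Your original [1 name ] is a bracket expression: it matches a single character from that set, not the literal text "1 name ".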
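If you want to check the pattern before running it on the real files, here is a minimal sketch. The sample records are made up (the real store.txt is not shown in the question), and I swapped [A-Z] for [[:upper:]] on the assumption that your locale might make the A-Z range match lowercase letters too; that swap is my choice, not part of the answer above.

```shell
# Hypothetical sample data standing in for store.txt.
tmp=$(mktemp -d)
printf '%s\n' \
  '1 name userabc.db' \
  '1 name userAbc.db' \
  '1 name xy040101.DB' \
  '1 name EFGH.DB' \
  '2 size 1024' > "$tmp/store.txt"

# The trailing .* is dropped here: grep matches anywhere in the line anyway.
matches=$(grep '1 name .*[[:upper:]]' "$tmp/store.txt")
printf '%s\n' "$matches"

rm -r "$tmp"
```

With this sample, only the three names that contain an uppercase letter are printed; the all-lowercase name and the unrelated line are skipped.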
The text file is:
bar
quux
kabe
Ass
sBo
CcdD
FGH
I would like to grep the words with only one capital letter in this example, but "grep [A-Z]" shows me every word that contains a capital letter.
Could anyone suggest a grep solution? My expected output is:
Ass
sBo
grep '\<[a-z]*[A-Z][a-z]*\>' my.txt
will match lines in the ASCII text file my.txt if they contain at least one word consisting entirely of ASCII letters, exactly one of which is upper case.
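As a quick sanity check, the sketch below rebuilds the sample file from the question in a temporary directory and runs the pattern on it. LC_ALL=C is my addition, on the assumption that you want [a-z] and [A-Z] to mean plain ASCII ranges; \< and \> are extensions that GNU grep supports.

```shell
# Recreate the sample file from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' bar quux kabe Ass sBo CcdD FGH > "$tmp/my.txt"

# Words made of ASCII letters with exactly one uppercase letter.
one_upper=$(LC_ALL=C grep '\<[a-z]*[A-Z][a-z]*\>' "$tmp/my.txt")
printf '%s\n' "$one_upper"

rm -r "$tmp"
```

This prints Ass and sBo, which is the expected output from the question.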
You seem to have a text file with each word on its own line.
You may use
grep '^[[:lower:]]*[[:upper:]][[:lower:]]*$' file
The ^ matches the start of the string (here the line, since grep operates line by line by default), the first [[:lower:]]* matches zero or more lowercase letters, [[:upper:]] matches exactly one uppercase letter, the second [[:lower:]]* matches zero or more lowercase letters, and $ asserts the position at the end of the string.
If you need to match a whole line with exactly one uppercase letter you may use
grep '^[^[:upper:]]*[[:upper:]][^[:upper:]]*$' file
The only difference from the pattern above is the [^[:upper:]] bracket expression, which matches any character except an uppercase letter.
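To make the difference between the two anchored patterns concrete, here is a small sketch. The line user_Abc.db is a hypothetical addition to the sample data, chosen because it contains characters that are neither uppercase nor lowercase letters.

```shell
# Sample words from the question plus one hypothetical line with "_" and ".".
tmp=$(mktemp -d)
printf '%s\n' Ass sBo CcdD FGH 'user_Abc.db' > "$tmp/file"

# Only lowercase letters are allowed around the single uppercase letter.
letters_only=$(grep '^[[:lower:]]*[[:upper:]][[:lower:]]*$' "$tmp/file")

# Anything except another uppercase letter is allowed around it.
any_chars=$(grep '^[^[:upper:]]*[[:upper:]][^[:upper:]]*$' "$tmp/file")

printf 'letters only:\n%s\n' "$letters_only"
printf 'any other chars:\n%s\n' "$any_chars"

rm -r "$tmp"
```

Both variants report Ass and sBo; only the second one also reports user_Abc.db, because the underscore and the dot are not lowercase letters.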
To extract words with a single capital letter inside them you may use word boundaries, as shown in mathguy's answer. With GNU grep, you may also use
grep -o '\b[^[:upper:]]*[[:upper:]][^[:upper:]]*\b' file
grep -o '\b[[:lower:]]*[[:upper:]][[:lower:]]*\b' file
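To see what -o prints when several words share a line, here is a minimal sketch using the second pattern. The sentence and the file name are made up for illustration, and \b assumes GNU grep. Note that the first pattern can also swallow spaces and punctuation, since [^[:upper:]] matches any character that is not an uppercase letter.

```shell
# Hypothetical one-line input with several words on the same line.
tmp=$(mktemp -d)
printf '%s\n' 'the Quick brown foX and ABC' > "$tmp/words.txt"

# -o prints each matching word on its own line instead of the whole line.
words=$(grep -o '\b[[:lower:]]*[[:upper:]][[:lower:]]*\b' "$tmp/words.txt")
printf '%s\n' "$words"

rm -r "$tmp"
```

Here Quick and foX are extracted, while ABC is skipped because it has more than one uppercase letter.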
I am using a GREP search in the find function of Sublime Text 3. I want to find all lines that are more than 12 characters, excluding spaces. As discussed in this Stack Overflow thread, the way to do this is with the following command:
^\h*(?:\S\h*){13,}$
But I also want the GREP search to exclude all lines that are written in all caps, when it is conducting the above search.
Example:
FAVORITE FOODS OF AMERICA
Mac and Cheese
Peanut Butter and Jelly Sandwich
In the above example, Mac and Cheese would not be found, because it's exactly 12 characters excluding spaces.
FAVORITE FOODS OF AMERICA would also not be found because although it's more than 12 characters, it is a line that is written in all caps.
Only Peanut Butter and Jelly Sandwich would be found, because it's more than 12 characters and although it has capital letters, the line is not written in all caps.
How would I do this? Googling around, I can find grep patterns that exclude lines containing at least one capital letter, but none that exclude only the lines written entirely in caps.
Note: I am using the Grep option in Sublime's Find function.
You could add a negative lookahead asserting that the line does not consist solely of uppercase letters and horizontal whitespace:
^(?![A-Z\h]+$)\h*(?:\S\h*){13,}$
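If you want to try the same regex outside Sublime, a rough equivalent on the command line is GNU grep with -P (Perl-compatible regexes). This is only a sketch: it assumes your grep was built with PCRE support, and the file name foods.txt is made up.

```shell
# The three example lines from the question, in a scratch file.
tmp=$(mktemp -d)
printf '%s\n' \
  'FAVORITE FOODS OF AMERICA' \
  'Mac and Cheese' \
  'Peanut Butter and Jelly Sandwich' > "$tmp/foods.txt"

# -P enables lookaheads and \h (horizontal whitespace).
found=$(LC_ALL=C grep -P '^(?![A-Z\h]+$)\h*(?:\S\h*){13,}$' "$tmp/foods.txt")
printf '%s\n' "$found"

rm -r "$tmp"
```

Only the Peanut Butter line comes back: the all-caps line is rejected by the lookahead, and Mac and Cheese has just 12 non-space characters.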
Suppose I have a file named test.txt with some text in it.
How do I find the lines containing words that start with the character "r" and end with the character "i"?
That would be something like:
grep '\b[Rr][A-Za-z]*[Ii]\b' test.txt
That is case-insensitive for the first and last letter; if you want to enforce a specific capitalisation, adjust the individual character classes in the expression.
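Since the question does not show the file content, here is a minimal sketch with made-up lines to illustrate which ones the pattern picks up. LC_ALL=C is my assumption, so that A-Z and a-z are treated as plain ASCII ranges; \b assumes GNU grep.

```shell
# Hypothetical content for test.txt.
tmp=$(mktemp -d)
printf '%s\n' \
  'ravioli and rice' \
  'Rossi scored' \
  'rain today' \
  'tri-state' > "$tmp/test.txt"

# Whole words that start with r/R and end with i/I.
ri_lines=$(LC_ALL=C grep '\b[Rr][A-Za-z]*[Ii]\b' "$tmp/test.txt")
printf '%s\n' "$ri_lines"

rm -r "$tmp"
```

The first two lines match because of ravioli and Rossi. The other two do not: rain does not end in "i", and in tri-state the "r" is not at the start of a word.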
I would like to know how to stop Notepad++ from outputting duplicate lines when I run Find All.
Here is a sample text file.
aaa aaa
bbb bbb
ccc ccc
I enter "aaa" as the search string and click "Find All in Current Document".
The output then looks like this. Line 1 appears twice because the string "aaa" matched twice in line 1.
Search "aaa" (2 hits in 1 file)
new 2 (2 hits)
Line 1: aaa aaa
Line 1: aaa aaa
Removing the duplicate lines afterwards is tedious. Is there a good way to prevent them?
Thanks in advance!
I have never heard of a way to make the "Find result" window remove "duplicate lines". They originate from the multiple hits on the same line.
To stop the dupes from appearing, make sure you only find the last occurrence of the string on each line.
To do this, you need to use a regex in the following form:
(\Q<YOUR_VALUE>\E)(?!.*\1)
where <YOUR_VALUE> is aaa. The \Q and \E operators are necessary to make the regex engine treat all chars between them as literal characters. The (...) will capture the search string into Group 1, and the (?!.*\1) negative lookahead will fail all matches that are followed by the same search string on the line (as .* matches any 0+ chars other than linebreak symbols).
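The same idea can be tried outside Notepad++ with GNU grep -P, which also understands \Q...\E, backreferences and lookaheads. This is only a sketch under that assumption (it is not how Notepad++ itself runs the search), and the file name sample.txt is made up.

```shell
# The sample text from the question.
tmp=$(mktemp -d)
printf '%s\n' 'aaa aaa' 'bbb bbb' 'ccc ccc' > "$tmp/sample.txt"

# -n prefixes the line number, -o prints each match separately.
# Only the last "aaa" on a line survives the negative lookahead.
hits=$(grep -noP '(\Qaaa\E)(?!.*\1)' "$tmp/sample.txt")
printf '%s\n' "$hits"

rm -r "$tmp"
```

Line 1 is reported once (1:aaa) instead of twice, because the first "aaa" is followed by another "aaa" and is therefore rejected.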
If I have the file foo:
read_from_buffer
read_from_buffer_and_file
write_to_buffer
some_other_function
then using
cat foo | grep 'read_from_buffer'
will list 2 lines:
read_from_buffer
read_from_buffer_and_file
But I want only exact matches. How do I tell grep that the match must not be followed by another one of the characters 0-9a-zA-Z_?
Use this:
grep -w 'read_from_buffer' foo
From man grep:
-w, --word-regexp: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
or
-x, --line-regexp: Select only those matches that exactly match the whole line. (-x is specified by POSIX.)
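The two options behave differently as soon as the name appears inside a longer line. Here is a minimal sketch; the last input line (a call with an argument) is a hypothetical addition to the file from the question, there to show that difference.

```shell
# File from the question plus one hypothetical line that uses the name in a call.
tmp=$(mktemp -d)
printf '%s\n' \
  'read_from_buffer' \
  'read_from_buffer_and_file' \
  'write_to_buffer' \
  'some_other_function' \
  'x = read_from_buffer(fd)' > "$tmp/foo"

# -w: the match must be a whole word, but the line may contain more text.
word_hits=$(grep -w 'read_from_buffer' "$tmp/foo")

# -x: the match must be the entire line.
line_hits=$(grep -x 'read_from_buffer' "$tmp/foo")

printf 'with -w:\n%s\n' "$word_hits"
printf 'with -x:\n%s\n' "$line_hits"

rm -r "$tmp"
```

With -w both the bare name and the call line are printed; with -x only the bare name is. Neither prints read_from_buffer_and_file, because the underscore after "buffer" counts as a word character.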