Match the same character appearing multiple times - grep

I want to use a regex to match the following requirement: the same character appears 3 times, with exactly one other character between each pair of occurrences (to simplify the answer, assume all chars are in [a-zA-Z]).
For example, popape and ccccAjAkA meet my requirement, but KKKccc and FFFsF (no 'other' char between two 'F's) do not qualify. How can I write this grep command?

Using Perl-compatible regular expressions (PCRE; experimental in grep):
grep -P '([a-zA-Z])(?!\1)(.)\1(?!\1)(.)\1'
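A quick way to check this pattern is to run it against the four sample words from the question. The backreference \1 repeats the captured letter, and each (?!\1) lookahead ensures the character in between is a different one. This is only a sketch: it assumes a grep built with PCRE support (-P), and the variable name matches is made up for illustration.

```shell
# Feed the question's sample words to grep, one per line.
# Assumes grep was built with PCRE support (-P).
matches=$(printf '%s\n' popape ccccAjAkA KKKccc FFFsF |
  grep -P '([a-zA-Z])(?!\1)(.)\1(?!\1)(.)\1')

# Print the words that satisfy the requirement.
printf '%s\n' "$matches"
```

Only popape and ccccAjAkA should be printed; KKKccc and FFFsF are rejected.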

Related

In Grep, how do I add a digit immediately after a backreference?

If I have a search: (\d\d):(\d\d) and I want to add an extra 0 to the numbers that I find (i.e., 12:30 would become 120:130), how do I prevent \1 and \2 followed by 0 from being interpreted as \10 and \20:
\10:\20
I tried escaping it with \ but that just made more backreferences. Is there another way to escape in grep?
In your original post, you didn't mention that you're using these backreferences in the replacement pattern, not the search pattern. You also didn't mention that you're using BBEdit. Solving your problem requires both of those facts.
From page 209 of the BBEdit manual:
\NNN+
If more than two decimal digits follow the backslash, only the first two are considered part of the backreference. Thus, “\111” would be interpreted as the 11th backreference, followed by a literal “1”. You may use a leading zero; for example, if in your replacement pattern you want the first backreference followed by a literal “1”, you can use “\011”. (If you use “\11”, you will get the 11th backreference, even if it is empty.)
Therefore you should try this replacement pattern:
\010:\020

How to type AND in regex word matching

I'm trying to do a word search with regex and wonder how to type AND for multiple criteria.
For example, how to type the following:
(Start with a) AND (Contains p) AND (Ends with e), such as the word apple?
Input
apple
pineapple
avocado
Code
grep -E "regex expression here" input.txt
Desired output
apple
What should the regex expression be?
In general you can't implement "and" in a single regexp (though you can implement "then" with .*), but you can in a multi-regexp condition using a tool that supports it.
To demonstrate the "and" case, a better example would be: starts with a and includes p and includes l and ends with e, with input that includes alpine. That is not trivial to express in a single regexp by just putting .*s between the characters, but it is trivial in a multi-regexp condition:
$ cat file
apple
pineapple
avocado
alpine
Using &&s will find both words regardless of the order of p and l as desired:
$ awk '/^a/ && /p/ && /l/ && /e$/' file
apple
alpine
but, as you can see, you can't just use .*s to implement and:
$ grep '^a.*p.*l.*e$' file
apple
If you had to use a single regexp then you'd have to do something like:
$ grep -E '^a.*(p.*l|l.*p).*e$' file
apple
alpine
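If awk is not available, another common way to get AND semantics is to chain several grep commands, each one filtering the output of the previous one. This is a minimal sketch that reuses the sample words above; the variable names are made up for illustration.

```shell
# Sample input: the same four words used in the answer above.
words=$(printf '%s\n' apple pineapple avocado alpine)

# Each grep keeps only the lines that satisfy one condition,
# so a line must pass all four filters to survive.
both=$(printf '%s\n' "$words" | grep '^a' | grep 'p' | grep 'l' | grep 'e$')

printf '%s\n' "$both"
```

This prints apple and alpine, the same result as the awk && version, at the cost of starting one process per condition.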
There are two other ways you can do it.
By De Morgan's law, a chain of "&&" is the same as negating a chain of "||" over the negated conditions, so you can write the reverse of what you want and negate the whole thing.
At the level of a single bit, AND is the same as multiplying the bits. So if you think a chain of && is overly verbose, you can "multiply" the patterns together directly:
awk '/^a/ * /p/ * /e$/'
Multiplying them performs all the logical ANDs at once. Unlike &&, multiplication does not short-circuit, so only use this shorthand if the inputs aren't too gigantic, or when the savings from early exit are known to be negligible.
Don't think of these as merely regex patterns. It's easier to think of anything outside an action block (what's typically referred to as the pattern) as any combination of items that can be evaluated to a boolean outcome of TRUE or FALSE in the end.
For example, POSIX-compliant expressions that work in that position include:
sprintf()
field assignments, etc.
(even decrementing NR, if there's such a need)
but not:
statements like next, print, printf,
delete array, etc., or any of the loop structures
Surprisingly, though, getline is directly usable in the pattern position (with some wrapper workaround).
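The equivalences described in this answer can be checked side by side. The sketch below runs the plain && form, a De Morgan rewrite, and the multiplication form over the sample words from the earlier answer; the variable names are illustrative only.

```shell
input=$(printf '%s\n' apple pineapple avocado alpine)

# Plain logical AND of three conditions.
and_out=$(printf '%s\n' "$input" | awk '/^a/ && /p/ && /e$/')

# De Morgan: negate the OR of the negated conditions.
demorgan_out=$(printf '%s\n' "$input" | awk '!(!/^a/ || !/p/ || !/e$/)')

# Each regex match evaluates to 1 or 0, so the product is 1 only if all match.
mult_out=$(printf '%s\n' "$input" | awk '/^a/ * /p/ * /e$/')

printf '%s\n' "$and_out" "$demorgan_out" "$mult_out"
```

All three forms select apple and alpine.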

Linux word search using grep

I want to search a file for alphanumeric words (i.e. words that combine both letters and digits).
I have tried different grep combinations but have not been able to get the result I want.
For example, if I have a file that contains multiple lines:
asbcd acblk54 lkasdfn
098213 102938 091283
aalk adsf adf
lkjas 0098324 0980 assdf
alkj30lkl 093adflkj 0lkdsf094
Since lines 1 and 5 contain alphanumeric words, only those two lines should be output. How can I achieve this using grep? (Line 2 contains numerals only, line 3 contains letters only, and line 4 contains words that are either letters or numerals but not a combination of both.)
What you are interested in is a grep that matches full words. So you need the -w option:
-w, --word-regexp: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore. This option has no effect if -x is also specified.
source: man grep
The regex you are after does indeed use [[:alnum:]], but you have to ensure that the word contains both a [[:alpha:]] and a [[:digit:]]. A word containing both must therefore contain the sequence [[:alpha:]][[:digit:]] or [[:digit:]][[:alpha:]] somewhere. The regex you are after is thus: [[:alnum:]]*([[:alpha:]][[:digit:]]|[[:digit:]][[:alpha:]])[[:alnum:]]*
The following grep will do the matches:
$ grep -w -E '[[:alnum:]]*([[:alpha:]][[:digit:]]|[[:digit:]][[:alpha:]])[[:alnum:]]*' file
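To see the command at work, it can be run against the five sample lines from the question. This is a minimal sketch that feeds the lines through a pipe instead of a file; the variable names sample and mixed are made up for illustration.

```shell
# The five sample lines from the question.
sample=$(printf '%s\n' \
  'asbcd acblk54 lkasdfn' \
  '098213 102938 091283' \
  'aalk adsf adf' \
  'lkjas 0098324 0980 assdf' \
  'alkj30lkl 093adflkj 0lkdsf094')

# -w requires the match to be a whole word; -E enables the ( | ) alternation.
mixed=$(printf '%s\n' "$sample" |
  grep -w -E '[[:alnum:]]*([[:alpha:]][[:digit:]]|[[:digit:]][[:alpha:]])[[:alnum:]]*')

printf '%s\n' "$mixed"
```

Only the first and fifth lines are printed, since they are the only ones with a word in which a letter sits next to a digit.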

Grep {n}: "The preceding item is matched exactly n times" is not clear to me in the case of "Hair", "Haair" and "Haaair"

Suppose there are three strings "Hair", "Haair" and "Haaair". When I use grep -E '^Ha{1}', it returns all three words. I was expecting only "Hair", since I asked for lines that start with H followed by the letter 'a' exactly once.
grep does not check that its input matches the given search expression. Grep finds substrings of the input that match the search.
See:
grep test <<< "This is a test."
The input does not exactly match test. Only part of the input (the substring test) matches,
This is a test.
but that is enough for grep to output the whole line.
Similarly, when you say
grep -E '^Ha{1}' <<< Haaair
The input does not exactly match the search, but a part of it (the leading Ha) does,
Haaair
and that is enough. Note that the {n,m} syntax is purely a convenience: Ha{1} is exactly equivalent to Ha, Ha{3,} is Haaa+, Ha{2,5} is Haa(a?){3}, which is Haaa?a?a?, etc. In other words, {1} does not mean "exactly once and no more after it", it just means "once".
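These equivalences can be made visible with grep -o, which prints only the matched substring rather than the whole line. The following is a small sketch; the test strings and variable names are made up for illustration.

```shell
# {1} matches a single a, even though more a's follow in the input.
once=$(printf '%s\n' Haaair | grep -oE '^Ha{1}')

# The interval form and its expanded form match the same substring.
interval=$(printf '%s\n' Haaaaaaair | grep -oE 'Ha{2,5}')
expanded=$(printf '%s\n' Haaaaaaair | grep -oE 'Haaa?a?a?')

printf '%s\n' "$once" "$interval" "$expanded"
```

The first command prints Ha, and the last two both print Haaaaa (at most five a's are consumed).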
What you want to do is match a Ha that is not followed by another a. You have two options:
If your grep supports PCRE, you can use a negative lookahead:
grep -P '^Ha(?!a)'
(?!a) is a zero-length assertion, like ^. It doesn't match any characters; it simply causes the match to fail if there is an a after the first one.
Or, you can keep it simple and use a negated bracket expression:
grep -E '^Ha([^a]|$)'
Where [^a] matches any single character that is not a, and the alternation with $ handles the case of no character at all.
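The difference between the original pattern and the bracket-expression fix can be checked on the three words from the question. In this sketch, Ha is an extra made-up input added to exercise the $ branch, and the variable names are illustrative.

```shell
# The question's three words, plus "Ha" to exercise the end-of-line branch.
words=$(printf '%s\n' Hair Haair Haaair Ha)

# Original pattern: matches every line that merely starts with Ha.
loose=$(printf '%s\n' "$words" | grep -E '^Ha{1}')

# Fixed pattern: after Ha there must be a non-a character or the end of line.
strict=$(printf '%s\n' "$words" | grep -E '^Ha([^a]|$)')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern returns all four lines, while the strict one returns only Hair and Ha.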

How to grep to find all instances of a Java method call using a reference?

I am trying the following query, but without success
grep -nr "[[:alnum:]]+\.[[:alnum:]]+\(\)" .
So, according to my logic, a method call would be one or more alphanumeric characters
[[:alnum:]]+
followed by a dot
\.
followed by one or more alphanumeric characters
[[:alnum:]]+
followed by parentheses (for void return type only)
\(\)
But this query isn't working. How to write such a query?
grep provides several types of regex syntax.
Your pattern is written in the extended syntax and works with -E.
extended-regexp has an easier/better syntax, and perl-regexp is, well, quite powerful.
-E, --extended-regexp
-F, --fixed-strings
-G, --basic-regexp (the default)
-P, --perl-regexp
grep -nrE "[[:alnum:]]+\.[[:alnum:]]+\(\)" .
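Without -E (that is, in the default basic syntax), you need to use "\+" instead of "+", otherwise it will match the literal character "+". The parentheses are reversed in the same way: in basic syntax "\(\)" is an empty group, while an unescaped "()" matches literal parentheses.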
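The two syntaxes can be compared on a few sample lines. This is a hedged sketch: the Java-like input lines are invented for illustration, input comes from a pipe rather than a recursive directory search, and the basic-syntax form relies on "\+", which is a GNU grep extension.

```shell
# Made-up sample lines; only two of them contain a no-argument method call.
java_lines=$(printf '%s\n' 'obj.run();' 'list.add(item);' 'int x = a + b;' 'foo.bar()')

# Extended syntax: + is a quantifier, \( and \) are literal parentheses.
ere=$(printf '%s\n' "$java_lines" | grep -nE '[[:alnum:]]+\.[[:alnum:]]+\(\)')

# Basic syntax (GNU grep): \+ is the quantifier, bare ( and ) are literal.
bre=$(printf '%s\n' "$java_lines" | grep -n '[[:alnum:]]\+\.[[:alnum:]]\+()')

printf '%s\n' "$ere"
```

Both commands report lines 1 and 4, the only ones containing a call with empty parentheses.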
