I want a regex that matches the strings RACE or RACE_1, but not RACE_2 or RACE_3. I've been on Rubular for a while now trying to figure it out, but I can't seem to meet all the conditions I need. Help is appreciated.
/^RACE(_1)?$/
RACE(_1)?\b
\b matches a word boundary. Because _ counts as a word character, there is no boundary between RACE and _2, and that prevents matching RACE in RACE_2.
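A quick way to check the word-boundary behaviour is to run the pattern over a few sample strings. This is only a minimal Ruby sketch; the sample list is made up for illustration.

```ruby
# Pattern from the answer above: RACE, an optional _1, then a word boundary.
pattern = /RACE(_1)?\b/

# Hypothetical sample inputs based on the strings named in the question.
samples = %w[RACE RACE_1 RACE_2 RACE_3]

# Keep only the strings in which the pattern finds a match.
matched = samples.select { |s| s.match?(pattern) }

puts matched.inspect
```

Note that there is no \b before RACE, so a longer word ending in RACE would still match.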
You can use:
(\bRACE(_[1])?\b)
It requires one copy of RACE, followed by zero or one occurrence of _[1] (the ? makes the group optional). In the square brackets you can include any digits you want. EXAMPLE:
(\bRACE(_[12345])?\b) will match up to RACE_5. You can then customize it to even skip numbers if you want [1245] for RACE_1, RACE_2, RACE_4, RACE_5 but not RACE_3.
/RACE(?!_)|RACE_1/
It's a bit of a hack, but it might fit your needs.
EDIT:
Here is a more specific one that might work better:
/RACE(?!_\d)|RACE_1/
In both cases, you use negative lookahead to enforce that RACE cannot be followed by _ and a number, but then specifically allow it with the or statement following.
Also, if you only want matches that are whole words, wrap the alternation in a group and put \b on both sides to mark word boundaries. Without the group, each \b applies to only one side of the |.
/\b(?:RACE(?!_\d)|RACE_1)\b/
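To see how the word boundaries interact with the alternation, here is a small Ruby sketch. It assumes whole-word matching is wanted for both RACE and RACE_1, so the alternation is grouped and both \b anchors apply to every branch; the sample strings are hypothetical.

```ruby
# Group the alternation so \b applies on both sides of every branch.
grouped = /\b(?:RACE(?!_\d)|RACE_1)\b/

# Hypothetical sample inputs.
samples = ["RACE", "RACE_1", "RACE_2", "RACE_12", "a RACE_1 b", "HORSERACE"]

# Collect the inputs in which the pattern matches somewhere.
hits = samples.select { |s| s.match?(grouped) }

puts hits.inspect
```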
So let's say I have this in Lua:
myvara = "Box red"
myvarb = "Box red 36"
How do I form an expression to see if both variables are the same if the number changes every time? i.e. I just want to check if both variables are red boxes but the number is not important.
I want to use pattern matching but I don't know how to do so efficiently and in an expression. I don't want to use string.find, it has to be pattern matching.
What I need to be able to do is:
if myvara == myvarb (ignoring box number) then...
... with pattern matching (not string.find or anything like that).
Oh, and there might be a different number of words sometimes and the number might be in a different place. That's why I need to use pattern matching.
Thank you.
You can remove all spaces and numbers from both strings before comparing them:
if (myvara:gsub("[%d ]","") == myvarb:gsub("[%d ]","")) then
....
I have a problem that requires me to write a regex that finds a line containing exactly 3 groups of characters (they could be words or numbers) and ending with another specific word. The way I had in mind was to find a pattern that ends in a space and look for it 3 times. Assuming this is the correct way to go about it, I do not know how to find a space, but I thought it would look like .*"find a space"{3} endword$. Is this the way it would be done? Even if it is not the way to do it, how do you find a space? Any suggestions?
Assuming by three groups of words you would accept any non-space character, you could write:
/^\s*(?:\S+\s+){3}endword$/
The initial caret is to make sure you have exactly 3 non-space groups on the line.
Of course you need to consider whether things like control characters could appear, and adjust accordingly.
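As a rough check, the pattern can be tried against lines with two, three, and four leading groups. This is a Ruby sketch; "endword" is the placeholder from the answer and the sample lines are invented.

```ruby
# Pattern from the answer; "endword" stands in for the specific final word.
three_groups = /^\s*(?:\S+\s+){3}endword$/

# Hypothetical sample lines.
lines = [
  "one two three endword",      # exactly three groups before endword
  "one two endword",            # only two groups
  "one two three four endword"  # four groups
]

# Print whether each line matches.
lines.each { |l| puts "#{l.inspect}: #{l.match?(three_groups)}" }
```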
Depending on your flavor, something like the below would do it:
\b+.+?\b+.+?\b+.+?\bendword$
This makes use of the word boundary assertion (\b) and non-greedy repetition (+?). Since \b is zero-width, the + after it adds nothing and some engines may reject it, so the pattern may need adjusting for your specific implementation, especially if you're using something old like grep.
I have a 2D array in which the second column holds the domain names of some emails; let us call the array myData[][]. I decided to use ArrayLib to search the second column for a specific domain.
ArrayLib.indexOf(myData, 1, domain)
Here is where I found an issue. In the myData array, one of the domains looks like this: "ewmining.com" (pay attention to the w).
While searching for "e.mining.com" (notice the first dot), the indexOf() function actually gave me the row containing "ewmining.com".
This is what is in the array: "ewmining.com"
This is what is in the search string: "e.mining.com"
It seems that ArrayLib treats the dot as meaning any character. Is this supposed to be the correct behavior? Is there a way to stop this behavior and search for an exact match?
I really need help on this issue.
Thanks in advance for your help.
The dot usually represents "any character" in regular expressions. I am not familiar with ArrayLib, but maybe you should look for a way to turn off regular expressions when searching. Otherwise you might have to escape the dot, for example search for e[.]mining[.]com
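I don't know ArrayLib's internals, so the following is only a general illustration in Ruby of why an unescaped dot finds "ewmining.com" and how escaping or plain string comparison avoids it. The domains array is hypothetical stand-in data.

```ruby
# Hypothetical data standing in for the second column of myData.
domains = ["example.com", "ewmining.com"]
needle  = "e.mining.com"

# Treated as a regex, the unescaped dot matches any character, including "w".
as_regex = domains.index { |d| d.match?(/\A#{needle}\z/) }

# Escaping the needle turns every dot into a literal dot.
escaped = domains.index { |d| d.match?(/\A#{Regexp.escape(needle)}\z/) }

# Plain string equality avoids regex semantics entirely.
exact = domains.index(needle)

puts as_regex.inspect, escaped.inspect, exact.inspect
```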
I'm trying to write a user name validation that has the following restrictions:
Must contain at least 1 letter (a-zA-Z)
May not contain anything other than digits, letters, or underscores
The following examples are valid: abc123, my_name, 12345a
The following examples are invalid: 123456, my_name!, _1235
I found something about using positive lookaheads for the letter constraint: (?=.*[a-zA-Z]), and it looks like there could be some sort of negative lookahead for the second constraint, but I'm not sure how to mix them together into one regex. (Note: I am not really clear on what the .* portion does inside the lookahead.)
Is it something like this: /(?=.*[a-zA-Z])(?!.*[^a-zA-Z0-9_])/
Edit:
Because the question asks for a regex, the answer I'm accepting is:
/^[a-zA-Z0-9_]*[a-zA-Z][a-zA-Z0-9_]*$/
However, the thing I'm actually going to implement is the suggestion by Bryan Oakley to split it into multiple smaller checks. This makes it easier to both read and extend in the future in case requirements change. Thanks all!
And because I tagged this with ruby-on-rails, I'll include the code I'm actually using:
validate :username_format

def username_format
  # At least one letter anywhere in the username.
  has_one_letter = username =~ /[a-zA-Z]/
  # \A and \z anchor the whole string; in Ruby, ^ and $ only anchor individual lines.
  all_valid_characters = username =~ /\A[a-zA-Z0-9_]+\z/
  unless has_one_letter && all_valid_characters
    errors.add(:username, "must have at least one letter and contain only letters, digits, or underscores")
  end
end
/^[a-zA-Z0-9_]*[a-zA-Z][a-zA-Z0-9_]*$/: 0 or more valid characters followed by one alphabetical followed by 0 or more valid characters, constrained to be both the beginning and the end of the line.
It's easy to check whether the pattern has any illegal characters, and it's easy to check whether there's at least one letter. Trying to do that all in one regular expression will make your code hard to understand.
My recommendation is to do two tests. Put the tests in functions to make your code absolutely dead-simple to understand:
if no_illegal_characters(string) && contains_one_alpha(string) {
...
}
For the former you can use the pattern ^[a-zA-Z0-9_]+$, and for the latter you can use [a-zA-Z].
If you don't like the extra functions that's ok, just don't try to solve the problem with one difficult-to-read regular expression. There are no bonus points awarded for cramming as much functionality into one expression as possible.
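Here is one way the two-test approach might look in plain Ruby. The function names follow the pseudocode above; valid_username? is a hypothetical wrapper added for illustration, and the sample names are the ones from the question.

```ruby
# Only letters, digits, and underscores are allowed.
def no_illegal_characters(string)
  # \A and \z anchor the whole string (Ruby's ^ and $ are per-line).
  string.match?(/\A[a-zA-Z0-9_]+\z/)
end

# At least one letter must appear somewhere.
def contains_one_alpha(string)
  string.match?(/[a-zA-Z]/)
end

# Hypothetical wrapper combining both checks.
def valid_username?(string)
  no_illegal_characters(string) && contains_one_alpha(string)
end

%w[abc123 my_name 12345a 123456 my_name! _1235].each do |name|
  puts "#{name}: #{valid_username?(name)}"
end
```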
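The simplest regex that comes close is the one below. Note that it only forbids a leading underscore and does not by itself require a letter, so 123456 would still match: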
/^[a-zA-Z0-9][a-zA-Z0-9_]*$/
I encourage you to try it out live on http://rubular.com/
Question
I would like to be able to use a single regex (if possible) to require that a string fits [A-Za-z0-9_] but doesn't allow:
Strings containing just numbers or/and symbols.
Strings starting or ending with symbols
Multiple symbols next to each other
Valid
test_0123
t0e1s2t3
0123_test
te0_s1t23
t_t
Invalid
t__t
____
01230123
_0123
_test
_test123
test_
test123_
Reasons for the Rules
The purpose of this is to filter usernames for a website I'm working on. I've arrived at the rules for specific reasons.
Usernames with only numbers and/or symbols could cause problems with routing and database lookups. The route for /users/#{id} allows id to be either the user's id or user's name. So names and ids shouldn't be able to collide.
_test looks weird and I don't believe it's a valid subdomain, i.e. _test.example.com
I don't like the look of t__t as a subdomain. i.e. t__t.example.com
This matches exactly what you want:
/\A(?!_)(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*(?<!_)\z/i
At least one alphabetic character (the [a-z] in the middle).
Does not begin or end with an underscore (the (?!_) and (?<!_) at the beginning and end).
May have any number of numbers, letters, or underscores before and after the alphabetic character, but every underscore must be separated by at least one number or letter (the rest).
Edit: In fact, you probably don't even need the lookahead/lookbehinds due to how the rest of the regex works - the first ?: parenthetical won't allow an underscore until after an alphanumeric, and the second ?: parenthetical won't allow an underscore unless it's before an alphanumeric:
/\A(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*\z/i
Should work fine.
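For what it's worth, the shorter pattern can be checked against the valid and invalid lists from the question with a few lines of Ruby. This is just a sketch; the variable names are arbitrary.

```ruby
# The lookaround-free pattern from the edit above.
username = /\A(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*\z/i

# Test cases copied from the question.
valid   = %w[test_0123 t0e1s2t3 0123_test te0_s1t23 t_t]
invalid = %w[t__t ____ 01230123 _0123 _test _test123 test_ test123_]

# Report anything that is classified the wrong way.
wrongly_rejected = valid.reject { |s| s.match?(username) }
wrongly_accepted = invalid.select { |s| s.match?(username) }

puts "wrongly rejected: #{wrongly_rejected.inspect}"
puts "wrongly accepted: #{wrongly_accepted.inspect}"
```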
I'm sure that you could put all this into one regular expression, but it won't be simple, and I'm not sure why you insist on it being one regex. Why not use multiple passes during validation? If the validation checks are done when users create a new account, there really isn't any reason to try to cram it into one regex. (That is, you will only be dealing with one item at a time, not hundreds or thousands or more. A few passes over a normal-sized username should take very little time, I would think.)
First reject the name if it contains anything other than letters, digits, or underscores; then reject it if it doesn't contain at least one letter; then check that the start and end are correct; etc. Each of those passes could be a simple-to-read and easy-to-maintain regular expression.
What about:
/^(?=[^_])([A-Za-z0-9]+_?)*[A-Za-z](_?[A-Za-z0-9]+)*$/
It doesn't use a back reference.
Edit:
It succeeds for all your test cases and is Ruby-compatible.
This doesn't block "__", but it does get the rest:
([A-Za-z]|[0-9][0-9_]*)([A-Za-z0-9]|_[A-Za-z0-9])*
And here's the longer form that gets all your rules:
([A-Za-z]|([0-9]+(_[0-9]+)*([A-Za-z]|_[A-Za-z])))([A-Za-z0-9]|_[A-Za-z0-9])*
Dang, that's ugly. I agree with Telemachus that you probably shouldn't do this with one regex, even though it's technically possible. Regexes are often a pain to maintain.
The question asks for a single regexp, and implies that it should be a regexp that matches, which is fine, and answered by others. For interest, though, I note that these rules are rather easier to state directly as a regexp that should not match. I.e.:
x !~ /[^A-Za-z0-9_]|^_|_$|__|^\d+$/
no other characters than letters, numbers and _
can't start with a _
can't end with a _
can't have two _s in a row
can't be all digits
You can't use it this way in a Rails validates_format_of, but you could put it in a validate method for the class, and I think you'd have much better chance of still being able to make sense of what you meant, a month or a year from now.
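As a sketch of the "should not match" approach outside of Rails, the pattern could sit in a small predicate. The method name acceptable_username? is hypothetical; the test names come from the question.

```ruby
# A name is acceptable when the reject pattern from above does NOT match.
def acceptable_username?(x)
  x !~ /[^A-Za-z0-9_]|^_|_$|__|^\d+$/
end

# Test cases copied from the question.
valid   = %w[test_0123 t0e1s2t3 0123_test te0_s1t23 t_t]
invalid = %w[t__t ____ 01230123 _0123 _test _test123 test_ test123_]

puts valid.all?    { |s| acceptable_username?(s) }
puts invalid.none? { |s| acceptable_username?(s) }
```

Inside a model, the same expression could be called from a custom validate method, as the answer suggests.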
Here you go:
^(([a-zA-Z]([^a-zA-Z0-9]?[a-zA-Z0-9])*)|([0-9]([^a-zA-Z0-9]?[a-zA-Z0-9])*[a-zA-Z]+([^a-zA-Z0-9]?[a-zA-Z0-9])*))$
If you want to restrict the symbols you accept, simply replace every [^a-zA-Z0-9] with a character class [] containing all allowed symbols.
(?=.*[a-zA-Z].*)^[A-Za-z0-9](_?[A-Za-z0-9]+)*$
This one works.
Look ahead to make sure there's at least one letter in the string, then start consuming input. Every time there is an underscore, there must be a number or a letter before the next underscore.
/^(?![\d_]+$)[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*$/
Your question is essentially the same as this one, with the added requirement that at least one of the characters has to be a letter. The negative lookahead - (?![\d_]+$) - takes care of that part, and is much easier (both to read and write) than incorporating it into the basic regex as some others have tried to do.
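To make the role of the lookahead concrete, this Ruby sketch compares the pattern with and without it. The three inputs are hypothetical: two contain no letters and one does.

```ruby
# The base pattern from the answer, without and with the negative lookahead.
base      = /^[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*$/
with_look = /^(?![\d_]+$)[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*$/

# Hypothetical inputs: the first two have no letters, the last one does.
inputs = %w[01230123 0_1 a_1]

inputs.each do |s|
  puts "#{s}: base=#{s.match?(base)} with_lookahead=#{s.match?(with_look)}"
end
```

The base pattern alone enforces the underscore rules; the lookahead only adds the "not just digits and underscores" condition.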
[A-Za-z][A-Za-z0-9_]*[A-Za-z]
That would work for your first two rules (since it requires a letter at the beginning and end for the second rule, it automatically requires letters).
I'm not sure the third rule is possible using regexes.