Premature end of char-class - ruby-on-rails

I have a regular expression for validating email addresses against the following rules:
The local-part of the e-mail address may use any of these ASCII characters:
Uppercase and lowercase English letters (a-z, A-Z)
Digits 0 to 9
Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
/^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/i
It works in JavaScript, but in Ruby (tested on http://rubular.com/) it gives the error "Premature end of char-class".
How can I resolve this?

Brackets are part of regex syntax. If you want to match a literal bracket (or any other special symbol, for that matter), escape it with a backslash.
This should work:
/^(([^<>()\[\]\\.,;:\s#\"]+(\.[^<>()\[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/i

You should escape opening square brackets as well as closing ones inside the character class:
# ⇓ ⇓
/^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)…/
This should be:
/^(([^<>()\[\]\\.,;:\s#\"]+(\.[^<>()\[\]\\.,;:\s#\"]+)*)…/
Hope it helps.

irb(main):016:0> /[[e]/
SyntaxError: (irb):16: premature end of char-class: /[[e]/
from /ms/dist/ruby/PROJ/core/2.0.0-p195/bin/irb:12:in `<main>'
In the JavaScript regular expression engine, you don't need to escape [ inside a character class []. In a Ruby regular expression, however, you have to write \[.
/^(([^<>()\[\]\\.,;:\s#\"]+(\.[^<>()\[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/i
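A minimal Ruby sketch of the difference described above. Because a literal such as /[[e]/ already fails when the file is parsed, the patterns are built from strings with Regexp.new so the error can be rescued. The helper name compiles? is made up for illustration.

```ruby
# Hypothetical helper: true if Ruby's regex engine accepts the pattern source.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

unescaped = '[[e]'   # "[" inside the class opens a nested class that is never closed
escaped   = '[\[e]'  # "\[" is a literal bracket inside the class

puts compiles?(unescaped)
puts compiles?(escaped)
puts Regexp.new(escaped).match?('[')
```

The same check can be run against the full email pattern before it goes into a model validation.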

Related

How to create a regular expression to match specific substrings inside brackets?

In my Ruby on Rails app I need a regex that accepts the following values:
{DD}
{MM}
{YY}
{NN}
{NNN}
{NNNN}
{NNNNN}
{NNNNNN}
upper and lowercase letters
the special characters -, _, . and #
I am still new to regular expressions and I came up with this:
/\A[a-zA-Z._}{#-]*\z/
This works pretty well already, however it also matches strings that should not be allowed such as:
}FOO or {YYY}
Can anybody help?
You may use
/\A(?:\{(?:DD|MM|YY|N{2,6})\}|[A-Za-z_.#-])*\z/
\A - start of string anchor
(?:\{(?:DD|MM|YY|N{2,6})\}|[A-Za-z_.#-])* - a non-capturing group ((?:...) only groups a sequence of atoms and does not create a submatch/capture), repeated zero or more times, matching either:
\{(?:DD|MM|YY|N{2,6})\} - a {, then DD, MM, YY, or 2 to 6 N characters, followed by a }
| - or
[A-Za-z_.#-] - 1 char from the set (ASCII letter, _, ., # or -)
\z - end of string.
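As a quick check, here is a small Ruby sketch that runs the pattern above against a few inputs. The sample strings other than }FOO and {YYY} are made up for illustration, and the constant name TEMPLATE is an assumption.

```ruby
# Pattern from the answer above, unchanged.
TEMPLATE = /\A(?:\{(?:DD|MM|YY|N{2,6})\}|[A-Za-z_.#-])*\z/

# Single-quoted strings avoid accidental "#{...}" interpolation.
samples = ['{DD}', '{NNNNNN}', 'INV-{YY}{MM}-{NNNN}', 'foo_bar.#', '}FOO', '{YYY}', '{N}']

samples.each do |s|
  puts "#{s.inspect}: #{TEMPLATE.match?(s)}"
end
```

Because the group is repeated with *, the empty string also matches; if that is not wanted, + can be used instead.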

Regex get from character to whitespace

I'm trying to pull the username from a post in Rails. I thought the best way to do this would be to use a regex and pull from the # to the next whitespace character, which would give me the username.
E.g. in the string:
'#stackoverflow is good for help'
I would be able to pull from the # to the next whitespace character giving me the string 'stackoverflow'
My regex skills are a little lacking so any help would be appreciated.
Thanks.
You can use \S to match any non-whitespace character, for example:
(?<=#)\S*
This matches any sequence of zero or more non-whitespace characters that appears immediately after a # character. The (?<=…) creates a lookbehind assertion, so the # is not included in the match.
Alternatively, you could use:
#(\S*)
This will match a #, followed by zero or more non-whitespace characters, captured in group 1.
How about this:
regex = /#(\S*)/
\S here matches any non-whitespace character.
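A short Ruby sketch of both forms, using the string from the question. The second post and the variable names are made up for illustration.

```ruby
post = '#stackoverflow is good for help'

# Capture-group form: String#[] with a regex and a group index returns group 1.
username = post[/#(\S*)/, 1]

# Lookbehind form: the whole match already excludes the "#".
same_username = post[/(?<=#)\S*/]

# scan collects every occurrence when a post mentions several users.
all_names = 'thanks #alice and #bob'.scan(/#(\S+)/).flatten

puts username
puts same_username
puts all_names.inspect
```

Using \S+ instead of \S* in the scan skips a bare # that is followed directly by whitespace.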

How to update this regex to make sure the string does not have an _ (underscore) at the beginning or end

This is the regular expression I have. I need to make sure that the string does not start or end with an underscore; an underscore may appear in between.
/^[a-zA-Z0-9_.-]+$/
I have tried
(?!_)
but it doesn't seem to work.
Allowed strings:
abcd
abcd_123
Not allowed strings:
abcd_
_abcd_123
Not too hard!
/^[^_].*[^_]$/
"Any character except an underscore at the start of the line (^[^_]), then any characters (.*), then any character except an underscore before the end of the line ([^_]$)."
This does require at least two characters to validate the string. If you want to allow one-character lines:
/^[^_](.*[^_]|)$/
"Anything except an underscore to start the line, and then either some characters plus a non-underscore character before end-of-line, or just an immediate end-of-line."
You could approach this the inverse way:
check for the strings that do start or end with an underscore, like this:
/^_|_$/
^_ #starts with underscore
| #OR
_$ #ends with underscore
And then eliminate those that match. The above regexp is much easier to read.
Check : http://www.rubular.com/r/H3Axvol13b
Or you can try the longer regex:
/^[a-zA-Z0-9.-][a-zA-Z0-9_.-]*[a-zA-Z0-9.-]$|^[a-zA-Z0-9.-]+$|^[a-zA-Z0-9.-][a-zA-Z0-9.-]$/
^[a-zA-Z0-9.-] #starts with a-z, or A-Z, or 0-9, or . -
[a-zA-Z0-9_.-]* #anything that can occur and the underscore
[a-zA-Z0-9.-]$ #ends with a-z, or A-Z, or 0-9, or . -
| #OR
^[a-zA-Z0-9.-]$ #for one-letter words
| #OR
^[a-zA-Z0-9.-][a-zA-Z0-9.-]$ #for two letter words
Check: http://www.rubular.com/r/FdtCqW6haG
/^[a-zA-Z0-9.-][a-zA-Z0-9_.-]+[a-zA-Z0-9.-]$/
Try this
Description:
In the first section, [a-zA-Z0-9.-], the regex allows only lowercase and uppercase letters, digits, dot and hyphen.
In the next section, [a-zA-Z0-9_.-]+, the regex looks for one or more characters that are lowercase or uppercase letters, digits, dot, hyphen or underscore.
The last part, [a-zA-Z0-9.-], is the same as the first part and prevents the input from ending with an underscore. Note that this pattern requires at least three characters.
Try this:
Recently had the same concern and this is how I did it.
// '"^[a-zA-Z0-9_.-]*$"' → Alphanumeric and 「.」「_」「-」
// "^[^_].*[^_]$" → Reject start and end of string if contains 「_」
// (?=) REGEX AND operator
SLUG_REGEX = '"(?=^[a-zA-Z0-9_.-]*$)(?=^[^_].*[^_]$)"';
I used this snippet for my Laravel validation, so you may need to adapt it (for example, changing " to /) to fit your code sample and the other answers' code.
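For Ruby specifically, here is a minimal sketch that combines the question's character class with the "inverse" check from the earlier answer. It uses \A and \z instead of ^ and $, because in Ruby ^ and $ match at line boundaries rather than at the ends of the string. The constant names and valid_name? are made up for illustration.

```ruby
ALLOWED_CHARS   = /\A[a-zA-Z0-9_.-]+\z/  # the original character set
EDGE_UNDERSCORE = /\A_|_\z/              # leading or trailing underscore

# Hypothetical helper: allowed characters only, and no underscore at either end.
def valid_name?(str)
  ALLOWED_CHARS.match?(str) && !EDGE_UNDERSCORE.match?(str)
end

%w[abcd abcd_123 abcd_ _abcd_123 a].each do |s|
  puts "#{s}: #{valid_name?(s)}"
end
```

Splitting the check in two keeps each pattern readable and also accepts one-character names.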

Escaping strings for gsub

I read a file:
local logfile = io.open("log.txt", "r")
data = logfile:read("*a")
print(data)
output:
...
"(\.)\n(\w)", r"\1 \2"
"\n[^\t]", "", x, re.S
...
Yes, the logfile looks awful, as it's full of various commands.
How can I call gsub to remove, e.g., the "(\.)\n(\w)", r"\1 \2" line from the data variable?
The snippet below does not work:
s='"(\.)\n(\w)", r"\1 \2"'
data=data:gsub(s, '')
I guess some escaping needs to be done. Any easy solution?
Update:
local data = [["(\.)\n(\w)", r"\1 \2"
"\n[^\t]", "", x, re.S]]
local s = [["(\.)\n(\w)", r"\1 \2"]]
local function esc(x)
  return (x:gsub('%%', '%%%%')
           :gsub('^%^', '%%^')
           :gsub('%$$', '%%$')
           :gsub('%(', '%%(')
           :gsub('%)', '%%)')
           :gsub('%.', '%%.')
           :gsub('%[', '%%[')
           :gsub('%]', '%%]')
           :gsub('%*', '%%*')
           :gsub('%+', '%%+')
           :gsub('%-', '%%-')
           :gsub('%?', '%%?'))
end
print(data:gsub(esc(s), ''))
This seems to work fine, except that I also need to escape the escape character % itself, as it won't work if % is in the matched string. I tried :gsub('%%', '%%%%') or :gsub('\%', '\%\%') but it doesn't work.
Update 2:
OK, % can be escaped this way if it is handled first in the above "table", which I just corrected.
:terrible experience:
Update 3:
Escaping of ^ and $
As stated in Lua manual (5.1, 5.2, 5.3)
A caret ^ at the beginning of a pattern anchors the match at the beginning of the subject string. A $ at the end of a pattern anchors the match at the end of the subject string. At other positions, ^ and $ have no special meaning and represent themselves.
So a better idea would be to escape ^ and $ only when they are found (respectively) at the beginning or the end of the string.
Lua 5.1 - 5.2+ incompatibilities
string.gsub now raises an error if the replacement string contains a % followed by a character other than the permitted % or digit.
There is no need to double every % in the replacement string. See lua-users.
According to Programming in Lua:
The character `%´ works as an escape for those magic characters. So, '%.' matches a dot; '%%' matches the character `%´ itself. You can use the escape `%´ not only for the magic characters, but also for all other non-alphanumeric characters. When in doubt, play safe and put an escape.
Doesn't this mean that you can simply put % in front of every non-alphanumeric character and be fine? This would also be future-proof (in case new special characters are introduced). Like this:
function escape_pattern(text)
  return text:gsub("([^%w])", "%%%1")
end
It worked for me on Lua 5.3.2 (only rudimentary testing was performed). Not sure if it will work with older versions.
Why not:
local quotepattern = '(['..("%^$().[]*+-?"):gsub("(.)", "%%%1")..'])'
string.quote = function(str)
  return str:gsub(quotepattern, "%%%1")
end
to escape and then gsub it away?
try
line = '"(\.)\n(\w)", r"\1 \2"'
rx = '\"%(%\.%)%\n%(%\w%)\", r\"%\1 %\2\"'
print(string.gsub(line, rx, ""))
escape special characters with %, and quotes with \
Try s=[["(\.)\n(\w)", r"\1 \2"]].
Use stringx.replace() from Penlight Lua Libraries instead.
Reference: https://stevedonovan.github.io/Penlight/api/libraries/pl.stringx.html#replace
Implementation (v1.12.0): https://github.com/lunarmodules/Penlight/blob/1.12.0/lua/pl/stringx.lua#L288
Based on their implementation:
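function escape(s)
  -- prefix every pattern magic character with %
  return (s:gsub('[%-%.%+%[%]%(%)%$%^%%%?%*]','%%%1'))
end
function replace(s,old,new,n)
  -- double any % in the replacement so gsub treats it literally
  return (string.gsub(s,escape(old),new:gsub('%%','%%%%'),n))
end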

regex validation - grails constraints

I'm pretty new to Grails and I'm having a problem with matches validation using a regex. I want my field to accept a combination of alphanumeric characters and specific special characters: period (.), comma (,) and dash (-). It may accept numbers only (099) or letters only (alpha), but it must not accept input that has only special characters (".-,"). Is it possible to filter this kind of input using a regex?
Please help. Thank you for sharing your knowledge.
^[0-9a-zA-Z,.-]*?[0-9a-zA-Z]+?[0-9a-zA-Z,.-]*$
meaning:
/
^ beginning of the string
[...]*? 0 or more characters from this class (lazy matching)
[...]+? 1 or more characters from this class (lazy matching)
[...]* 0 or more characters from this class
$ end of the string
/
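A quick way to sanity-check the pattern above is to run it against the inputs named in the question. The sketch below is written in Ruby only to keep the examples on this page in one language; it swaps ^ and $ for \A and \z (Ruby's whole-string anchors), and the constant name FIELD is made up.

```ruby
# First pattern from above, with \A and \z as whole-string anchors.
FIELD = /\A[0-9a-zA-Z,.-]*?[0-9a-zA-Z]+?[0-9a-zA-Z,.-]*\z/

['099', 'alpha', 'a.b-c,1', '.-,', ''].each do |input|
  puts "#{input.inspect}: #{FIELD.match?(input)}"
end
```

The middle class [0-9a-zA-Z]+? is what forces at least one letter or digit to be present.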
I think you could match that with a regular expression like this:
".*[0-9a-zA-Z.,-]+.*"
That means:
"." Any character
"*" Zero or more of the preceding (any) character
"[0-9a-zA-Z.,-]" A character in the range 0-9, a-z, A-Z, or . or , or -
"+" One or more of this kind of character (so it's mandatory to have one from this set)
"." Any character
"*" Zero or more of the preceding (any) character
This is working ok for me, hope it helps!

Resources