Ruby on Rails: Regex to include accented and special characters? - ruby-on-rails

In my Rails app I want to use a regex that accepts accented characters (é, ç, à, ...) and special characters (&, (), ", ', ...). Right now this is my validation:
validates_format_of :job_title,
:with => /[a-zA-Z0-9]/,
:message => "le titre de l'offre n'est pas valide",
:multiline => true
I also want the regex to reject non-Latin characters such as Arabic, Chinese, ...

Use [:alnum:] for alphanumeric characters:
validates_format_of :job_title,
:with => /[[:alnum:]]/,
:message => "le titre de l'offre n'est pas valide",
:multiline => true
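As a rough illustration of what this class covers (a sketch in plain Ruby, outside Rails; the helper name contains_alnum? is made up for this example): on UTF-8 strings [[:alnum:]] accepts accented letters, but it also accepts letters from non-Latin scripts, and without anchors a single matching character anywhere in the string is enough.

```ruby
# Hypothetical helper wrapping the pattern from the answer above.
def contains_alnum?(text)
  /[[:alnum:]]/.match?(text)
end

puts contains_alnum?("é")       # accented Latin letter
puts contains_alnum?("مهندس")   # Arabic letters
puts contains_alnum?("!!!")     # punctuation only
puts contains_alnum?("!!! a")   # punctuation plus one letter
```

So this pattern alone neither excludes Arabic or Chinese nor checks the whole title.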

For the Latin characters you could use the \p{Latin} script character property. You would have to make sure you normalize the input first, as decomposed strings won’t match (i.e. strings containing characters using combining characters). Also this wouldn’t match things like x́ (that’s x followed by COMBINING ACUTE ACCENT) since it won’t compose into a single character, but that’s probably okay as it’s not likely to be actually used by anyone.
For the “special characters” you really need to be more specific about what you want. You say you want to allow " and ' (so-called “straight” quotes), but what about “, ”, ‘ and ’ (“typographical” or “curly” quotes)? And since you are allowing European languages, what about «, », ‹, › and „? You could use the \p{Punct} class, which should match all these and more; you will need to decide whether it matches too much.
You probably also want to match spaces as well. Will just the space character be okay? What about tabs, non-breaking spaces, newlines etc.? \p{Space} should get them.
There may be other characters you need to match that these won’t pick up, e.g. currency symbols; you may need to add those too.
So a first attempt at your regex might look like this (I’ve added \A and \z to anchor the start and end, as well as * to match all characters – I think you will need them):
/\A[\p{Latin}\p{Punct}\p{Space}0-9]*\z/
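A minimal sketch of how that pattern might be combined with normalization, in plain Ruby outside Rails. The method names are hypothetical, and String#unicode_normalize assumes Ruby 2.2 or later with UTF-8 input:

```ruby
# Hypothetical helper: the pattern from the answer above, applied as-is.
def raw_title_match?(title)
  /\A[\p{Latin}\p{Punct}\p{Space}0-9]*\z/.match?(title)
end

# Hypothetical helper: normalize to NFC first, so that "e" followed by
# COMBINING ACUTE ACCENT becomes the single character "é" before matching.
def valid_job_title?(title)
  raw_title_match?(title.unicode_normalize(:nfc))
end

decomposed = "Inge\u0301nieur"   # "Ingénieur" typed with a combining accent

puts raw_title_match?(decomposed)                        # without normalization
puts valid_job_title?(decomposed)                        # with normalization
puts valid_job_title?("Chef de projet (H/F) & associés")
puts valid_job_title?("مهندس")                           # Arabic script
```

In a Rails model the normalization could happen in a before_validation callback, but that wiring is left out here.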

A simple option is to white-list all the characters you want to accept. For example:
/[a-zA-Z0-9áéíóúÁÉÍÓÚÑñ&*]/
Instead of a-zA-Z0-9 you can use \w. It represents any word character (letter, number, underscore).
/[\wáéíóúÁÉÍÓÚÑñ&*]/
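Note that a bare character class only asks for one allowed character somewhere in the string. The sketch below (plain Ruby, hypothetical method names) contrasts it with an anchored variant using \A, + and \z, which is an addition on my part and not part of the answer above:

```ruby
# The white-list exactly as written above: matches if ANY character is allowed.
def loose_whitelist?(text)
  /[a-zA-Z0-9áéíóúÁÉÍÓÚÑñ&*]/.match?(text)
end

# Assumed stricter variant: EVERY character must come from the white-list.
def strict_whitelist?(text)
  /\A[a-zA-Z0-9áéíóúÁÉÍÓÚÑñ&*]+\z/.match?(text)
end

puts loose_whitelist?("日本語a")    # one allowed character among others
puts strict_whitelist?("日本語a")
puts strict_whitelist?("Señor&Co")
```

The white-list above contains no space, so the strict variant rejects multi-word titles until a space is added to the class.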

Related

Validate: Only allow letters, numbers, spaces, hyphens, and apostrophes

I tried doing it like this:
validates :name, :format => { :with => /^[a-zA-Z][a-zA-Z0-9 -']+$/ }
However, it allows ! as well.
How could I change the code so it can only allow letters, numbers, spaces, hyphens, and apostrophes in the name and that it will start with a letter?
Thanks!
You didn't escape the - in the middle of your regex:
^[a-zA-Z][a-zA-Z0-9 \-']+$
                    ^ here
If you place - inside a character class [] between two characters, you must escape it with a backslash (\); otherwise it denotes a range. In your case it sat between the space and ', which means every character from space (ASCII 32) to ' (ASCII 39). Unfortunately ! (ASCII 33) falls in that range.
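The effect can be checked in plain Ruby. This is only a sketch: the method names are invented for the example, and \A and \z are used in place of ^ and $ so the whole string is tested.

```ruby
# List the characters covered by a range such as " -'" inside [].
def chars_in_range(first, last)
  (first.ord..last.ord).map(&:chr).join
end

# Original class: " -'" is read as a range from space to apostrophe.
def unescaped_name_ok?(name)
  /\A[a-zA-Z][a-zA-Z0-9 -']+\z/.match?(name)
end

# Fixed class: "\-" is a literal hyphen.
def escaped_name_ok?(name)
  /\A[a-zA-Z][a-zA-Z0-9 \-']+\z/.match?(name)
end

puts chars_in_range(" ", "'")
puts unescaped_name_ok?("Bob!")
puts escaped_name_ok?("Bob!")
puts escaped_name_ok?("Anne-Marie O'Neil")
```

A side effect of the unescaped version is that the hyphen itself is not in the class at all, so a name like Anne-Marie is rejected.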

Splitting strings using Ruby ignoring certain characters

I'm trying to split a string and count the number of words using Ruby, but I want to ignore special characters.
For example, in the string "Hello, my name is Hugo ..." I'm splitting by spaces, but the final ... shouldn't count because it isn't a word.
I'm using string.inner_text.split(' ').length. How can I specify that special characters (such as ... ? ! etc.), when separated from the text by spaces, are not counted?
Thank you to everyone,
Kind Regards,
Hugo
"Hello, my name is não ...".scan /[^*!#%\^\s\.]+/
# => ["Hello,", "my", "name", "is", "não"]
/[^*!#%\^\s\.]+/ matches any run of characters other than *, !, #, %, ^, whitespace and the period. You can add to this list any further characters that should not be matched.
This is part answer, part response to @Neo's answer: why not use the proper tools for the job?
http://www.ruby-doc.org/core-1.9.3/Regexp.html says:
POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.
/[[:alnum:]]/ - Alphabetic and numeric character
/[[:alpha:]]/ - Alphabetic character
...
Ruby also supports the following non-POSIX character classes:
/[[:word:]]/ - A character in one of the following Unicode general categories Letter, Mark, Number, Connector_Punctuation
If you want words, use str.scan /[[:word:]]+/
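Applied to the sentence from the question, that might look like the following sketch (the method names are hypothetical):

```ruby
# Extract runs of word characters (letters, marks, digits, underscore).
def words_in(text)
  text.scan(/[[:word:]]+/)
end

# Count the words found by words_in.
def count_words(text)
  words_in(text).length
end

p words_in("Hello, my name is não ...")
puts count_words("Hello, my name is Hugo ...")
```

Unlike the negated class, this also strips the comma from "Hello,". One thing to watch: an apostrophe is not a word character, so "don't" would be counted as two words.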

Regex to validate string having only characters (not special characters), blank spaces and numbers

I am using Ruby on Rails 3.0.9 and I would like to validate a string that can have only characters (not special characters - case insensitive), blank spaces and numbers.
In my validation code I have:
validates :name,
:presence => true,
:format => { :with => regex } # Here I should set the 'regex'
How should I state the regex?
There are a couple ways of doing this. If you only want to allow ASCII word characters (no accented characters like Ê or letters from other alphabets like Ӕ or ל), use this:
/^[a-zA-Z\d\s]*$/
If you want to allow only numbers and letters from other languages for Ruby 1.8.7, use this:
/^(?:[^\W_]|\s)*$/u
If you want to allow only numbers and letters from other languages for Ruby 1.9.x, use this:
/^[\p{Word}\w\s-]*$/
Also, if you are planning to use 1.9.x regex with unicode support in Ruby on Rails, add this line at the beginning of your .rb file:
# coding: utf-8
You're looking for:
[a-zA-Z0-9\s]+
The + says one or more so it'll not match empty string. If you need to match them as well, use * in place of +.
In addition to what has been said, assign one of these regular expressions to your regex variable, for instance:
regex = /^[a-zA-Z\d\s]*$/
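One caveat about the patterns in this thread: in Ruby, ^ and $ match at the start and end of each line, not of the whole string. The sketch below (plain Ruby, hypothetical method names) compares the ASCII pattern above with a variant anchored by \A and \z, which is my assumption about what a validation usually intends:

```ruby
# Pattern as given in the answers: anchored per line.
def line_anchored_ok?(text)
  /^[a-zA-Z\d\s]*$/.match?(text)
end

# Assumed stricter variant: anchored to the whole string.
def string_anchored_ok?(text)
  /\A[a-zA-Z\d\s]*\z/.match?(text)
end

multi_line = "ok\n<script>"

puts line_anchored_ok?("John Smith 3")
puts string_anchored_ok?("John Smith 3")
puts line_anchored_ok?(multi_line)     # the first line alone satisfies ^...$
puts string_anchored_ok?(multi_line)
```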

Parentheses messing up validation in rails

I have the following validation:
validates_format_of :title,
:with => /^[A-Z0-9 áàâäãçéèêëíìîïñóòôøöõúùûüý'-.]*$/i,
:message => "must contain only letters, numbers, dashes, periods, and single quotes"
This works most of the time, but when a title contains opening and closing parentheses, it passes. Does anyone know how to get around this, or is there something wrong with my validation regex?
At the end of your regular expression you have '-.
This means that you want to allow all characters between (and including) the apostrophe and the period just like you did at the beginning of the regular expression with A-Z and 0-9.
The expression /['-.]/ allows all these characters: '()*+,-.
Inside the [], you need to escape the - character. I think that this will work the way you are hoping:
/^[A-Z0-9 áàâäãçéèêëíìîïñóòôøöõúùûüý'\-.]*$/i
PS. You don't have to escape the . inside the square brackets []
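To see the difference outside Rails, here is a small sketch with made-up method names that runs both versions of the pattern against a title with parentheses:

```ruby
# List the characters covered by a range such as "'-." inside [].
def range_members(first, last)
  (first.ord..last.ord).map(&:chr).join
end

# Original pattern: "'-." is a range from apostrophe to period.
def original_title_ok?(title)
  /^[A-Z0-9 áàâäãçéèêëíìîïñóòôøöõúùûüý'-.]*$/i.match?(title)
end

# Corrected pattern: "\-" is a literal hyphen.
def fixed_title_ok?(title)
  /^[A-Z0-9 áàâäãçéèêëíìîïñóòôøöõúùûüý'\-.]*$/i.match?(title)
end

puts range_members("'", ".")
puts original_title_ok?("Title (draft)")
puts fixed_title_ok?("Title (draft)")
puts fixed_title_ok?("L'été-2.0")
```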

Regular Expression for Special Characters in Rails

I need a regex in Rails for European-language special characters, e.g. é, ä, ö, ü, ß. Kindly help me.
Regular expressions will work just fine with "special" characters. If you're wanting to match a set of special characters, you'll need to tell the expression exactly what those characters are. Your definition of "special" might not match the next guy's.
For instance, if you wanted to see if a string contains any of the characters you listed above, you can do this:
irb(main):001:0> word = "resumé"
=> "resum\303\251"
irb(main):002:0> word =~ /[éäöüß]/
=> 5
irb(main):003:0> word.gsub(/é/, 'e')
=> "resume"
I hope this helps!
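If the goal is to replace accented letters in general and not just é, one possible approach is to decompose the string and drop the combining marks. This is a sketch under assumptions: strip_accents is a hypothetical name, it swaps in a different technique from the gsub above, and String#unicode_normalize needs Ruby 2.2 or later with UTF-8 strings.

```ruby
# Decompose to NFD ("é" becomes "e" + combining accent), then remove
# every nonspacing mark. Letters without a decomposition stay as they are.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("resum\u00E9")   # "resumé"
puts strip_accents("\u00E4\u00F6\u00FC")   # "äöü"
puts strip_accents("Stra\u00DFe")   # "Straße": ß has no decomposition
```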
