Rails gsub for russian symbols - ruby-on-rails

In my Rails app I need to format a string so that it consists only of letters, with no symbols. The main trouble is that the string is in Russian, so how do I do it? For English letters and digits I do this:
ArtLookup.get_analog(@articles.ART_ARTICLE_NR.gsub(/[^0-9A-Za-z]/, ''))
But how do I do it for the Russian alphabet (first letter is А, last is Я)? I want only letters, with spaces deleted as well.

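Use \p{Cyrillic}, which matches any Cyrillic character.
Example (this removes the Cyrillic characters; negate the class, as in /[^\p{Cyrillic}]/, to remove everything else instead):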
1.9.3p194 :001 > s = "helloЯ"
=> "helloЯ"
1.9.3p194 :002 > s.gsub(/\p{Cyrillic}/, '')
=> "hello"
More info on special characters handling in Ruby: http://ruby-doc.org/core-1.9.3/Regexp.html
Edited Answer:
If you want only a subset of the Cyrillic alphabet, I'm afraid you have to build your own set.
For this, you can try a range: /[а-я]+/i. Note that ё lies outside the а-я range in Unicode, so add it if you need it (/[а-яё]+/i), or just specify your alphabet explicitly: /[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]+/i
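To tie this back to the question (keep only letters, drop spaces and symbols), here is a minimal sketch. The helper names only_cyrillic and only_alnum_and_cyrillic and the sample string are made up for illustration; only \p{Cyrillic} and the range come from the answer above.

```ruby
# Hypothetical helper: delete everything that is not a Cyrillic character
# (spaces, digits, punctuation, Latin letters).
def only_cyrillic(text)
  text.gsub(/[^\p{Cyrillic}]/, '')
end

# Hypothetical variant: extend the asker's original pattern so that
# ASCII letters and digits survive as well.
def only_alnum_and_cyrillic(text)
  text.gsub(/[^0-9A-Za-z\p{Cyrillic}]/, '')
end

sample = "Привет, мир! 123 abc"
puts only_cyrillic(sample)            # => Приветмир
puts only_alnum_and_cyrillic(sample)  # => Приветмир123abc

# The plain а-я range does not cover ё, so it has to be listed separately.
p "ё" =~ /[а-я]/i   # => nil
p "ё" =~ /[а-яё]/i  # => 0
```

Note that \p{Cyrillic} covers the whole Cyrillic script, not just the 33 Russian letters, so use the explicit set if you need to be strict.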

Related

Why is my regexp_replace statement returning \0001?

My postgresql statement is :
update #{table} set email = regexp_replace(email, '.*(\d+).*', 'email_\1#foo.com', 'g') where email like '%#none.com'
The result converts "placeholder_1#none.com" to "email_\u0001#foo.com",
where it should be email_1#foo.com.
Any ideas why it is returning what appears to be unicode?
You are not giving a lot of information here, but given the fact that you tagged this issue with ruby-on-rails, I'll assume that that's what you're using.
If that is the case and if the query you posted above is written in your ruby code in double quotes, then that is the reason:
2.6.5 :005 > s = "email_\1#foo.com"
=> "email_\u0001#foo.com"
Double-quoted strings allow escape sequences, including:
\nnn octal bit pattern, where nnn is 1-3 octal digits ([0-7])
See the Ruby string docs.
If you want to have the actual backslash in your query, you'll need to escape the backslash:
query = "update #{table} set email = regexp_replace(email, '.*(\\d+).*', 'email_\\1#foo.com', 'g') where email like '%#none.com'"
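A quick way to see the difference in irb is to compare the three quoting styles side by side. This is just a sketch; the table name "users" is a placeholder and the variable names are made up.

```ruby
double_quoted = "email_\1#foo.com"   # \1 is read as an octal escape (U+0001)
escaped       = "email_\\1#foo.com"  # escaped backslash, so a literal \1 survives
single_quoted = 'email_\1#foo.com'   # single quotes leave \1 untouched

p double_quoted             # => "email_\u0001#foo.com"
p escaped == single_quoted  # => true

table = "users" # hypothetical table name, only here so the interpolation works
query = "update #{table} set email = regexp_replace(email, '.*(\\d+).*', 'email_\\1#foo.com', 'g') where email like '%#none.com'"
puts query
```

The printed query contains \d and \1 with single backslashes, which is what PostgreSQL needs to receive.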

Testing for word characters in Ruby/Rails regular expressions for all languages

I know I can match a word character with \w in Ruby's regular expressions:
2.0.0p247 :003 > /[\w]+/.match('hi')
=> #<MatchData "hi">
However, as I understand, that only matches [a-zA-Z0-9_]. I'd like to also match characters that appear in standard words in other languages. Is there an easy way to do this?
UPDATE: Seems like I may have found my answer in the POSIX bracket expressions:
/[[:alnum:]]/ - Alphabetic and numeric character
/[[:alpha:]]/ - Alphabetic character
Is this what I'm looking for?
Yes, you're definitely on the right track with [[:alpha:]]. In Ruby the POSIX bracket expressions are Unicode-aware. Here's an example from (https://stackoverflow.com/a/3879835/499581):
/\A[[:alpha:]]+\Z/
For punctuation, also consider using:
/[[:punct:]]/
more here.
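Here is a small sketch of how the two behave on non-English words. The sample words and the constant names are made up for illustration, and the strings are assumed to be UTF-8.

```ruby
ALPHA_ONLY = /\A[[:alpha:]]+\z/  # POSIX bracket: any Unicode letter
WORD_ONLY  = /\A\w+\z/           # \w: ASCII letters, digits and underscore only

words = ["hello", "привет", "résumé", "abc123", "two words"]

words.each do |word|
  puts "#{word}: alpha=#{!!(word =~ ALPHA_ONLY)} word=#{!!(word =~ WORD_ONLY)}"
end
# hello: alpha=true word=true
# привет: alpha=true word=false
# résumé: alpha=true word=false
# abc123: alpha=false word=true
# two words: alpha=false word=false
```

If digits should be accepted too, [[:alnum:]] is the bracket expression to swap in.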

Ruby: Remove number from embedded string

In a Rails app I am reading a file with key/values. An index number is embedded in the key name, and I'd like to remove it, along with one of the spacing underscores.
So in the sample data below, I'd like to convert:
PRIMER_LEFT_1_END_STABILITY into PRIMER_LEFT_END_STABILITY
PRIMER_RIGHT_1_END_STABILITY into PRIMER_RIGHT_END_STABILITY
PRIMER_PAIR_1_COMPL_ANY_TH into PRIMER_PAIR_COMPL_ANY_TH
Sample Data
PRIMER_LEFT_1_END_STABILITY=7.2000
PRIMER_RIGHT_1_END_STABILITY=7.9000
PRIMER_PAIR_1_COMPL_ANY_TH=0.00
EDIT
Thanks to @tihom for the first answer. It's partially working, but I did not specify that the embedded integer can be of any value. When it is more than one digit long the regex fails:
1.9.3-p327 :003 > "PRIMER_LEFT_221_END_STABILITY".sub(/_\d/,"")
=> "PRIMER_LEFT21_END_STABILITY"
1.9.3-p327 :004 > "PRIMER_LEFT_21_END_STABILITY".sub(/_\d/,"")
=> "PRIMER_LEFT1_END_STABILITY"
To remove only the first occurrence use sub; to remove all occurrences use gsub:
"PRIMER_LEFT_1_END_STABILITY".sub(/_\d+/, "") # => "PRIMER_LEFT_END_STABILITY"
The "+" matches one or more of the preceding element, so this pattern matches a "_" followed by one or more digits.
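Applied to the key/value lines from the question, that could look like the sketch below. The index values 221 and 21 are varied on purpose to cover the multi-digit case from the edit; the variable names are made up, and the values are left as strings.

```ruby
lines = [
  "PRIMER_LEFT_1_END_STABILITY=7.2000",
  "PRIMER_RIGHT_221_END_STABILITY=7.9000",
  "PRIMER_PAIR_21_COMPL_ANY_TH=0.00"
]

# Split each line into key and value, then strip the first "_<digits>"
# group from the key only, so digits in the value are never touched.
parsed = lines.map do |line|
  key, value = line.split("=", 2)
  [key.sub(/_\d+/, ""), value]
end.to_h

parsed.each { |key, value| puts "#{key} -> #{value}" }
# PRIMER_LEFT_END_STABILITY -> 7.2000
# PRIMER_RIGHT_END_STABILITY -> 7.9000
# PRIMER_PAIR_COMPL_ANY_TH -> 0.00
```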
You can use String#tr and String#squeeze as below. Note that this removes every digit in the key, not just the index:
ar=['PRIMER_LEFT_1_END_STABILITY','PRIMER_RIGHT_1_END_STABILITY','PRIMER_PAIR_1_COMPL_ANY_TH']
p ar.map{|s| s.tr('0-9','').squeeze("_")}
# => ["PRIMER_LEFT_END_STABILITY", "PRIMER_RIGHT_END_STABILITY", "PRIMER_PAIR_COMPL_ANY_TH"]
ar=["PRIMER_LEFT_221_END_STABILITY","PRIMER_LEFT_21_END_STABILITY"]
p ar.map{|s| s.tr('0-9','').squeeze("_")}
# => ["PRIMER_LEFT_END_STABILITY", "PRIMER_LEFT_END_STABILITY"]

Regex to validate string having only characters (not special characters), blank spaces and numbers

I am using Ruby on Rails 3.0.9 and I would like to validate a string that can have only characters (not special characters - case insensitive), blank spaces and numbers.
In my validation code I have:
validates :name,
:presence => true,
:format => { :with => regex } # Here I should set the 'regex'
How should I state the regex?
There are a couple ways of doing this. If you only want to allow ASCII word characters (no accented characters like Ê or letters from other alphabets like Ӕ or ל), use this:
/^[a-zA-Z\d\s]*$/
If you also want to allow numbers and letters from other languages, on Ruby 1.8.7 use this:
/^(?:[^\W_]|\s)*$/u
If you also want to allow numbers and letters from other languages, on Ruby 1.9.x use this:
/^[\p{Word}\w\s-]*$/
Also, if you are planning to use 1.9.x regexes with Unicode support in Ruby on Rails, add this line at the beginning of your .rb file:
# coding: utf-8
You're looking for:
[a-zA-Z0-9\s]+
The + means one or more, so it will not match an empty string. If you need to match empty strings as well, use * in place of +.
In addition to what has been said, assign one of these regular expressions to the regex variable used in your validation, for instance:
regex = /^[a-zA-Z\d\s]*$/
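One thing worth checking before using these in a validation: in Ruby, ^ and $ match at line boundaries, not string boundaries, so multi-line input can slip through. The sketch below uses \A and \z instead and runs in plain Ruby without Rails. The constant names and the valid_name? helper are made up, and \p{L} / \p{N} are swapped in here as one possible way to say "any letter or number".

```ruby
ASCII_ONLY    = /\A[a-zA-Z\d\s]*\z/    # ASCII letters, digits, whitespace
ANY_LANGUAGE  = /\A[\p{L}\p{N}\s]*\z/  # letters and numbers from any script
LINE_ANCHORED = /^[a-zA-Z\d\s]*$/      # the ^...$ form, for comparison

# Hypothetical helper standing in for the :format validation.
def valid_name?(name, regex)
  !!(name =~ regex)
end

p valid_name?("John Smith 3", ASCII_ONLY)        # => true
p valid_name?("José 3", ASCII_ONLY)              # => false
p valid_name?("José 3", ANY_LANGUAGE)            # => true
p valid_name?("John\n<script>", ASCII_ONLY)      # => false
p valid_name?("John\n<script>", LINE_ANCHORED)   # => true, only the first line is checked
```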

Regular Expression for Special Characters in Rails

I need a regex in Rails for special characters from European languages, e.g. é, ä, ö, ü, ß. Kindly help me.
Regular expressions will work just fine with "special" characters. If you're wanting to match a set of special characters, you'll need to tell the expression exactly what those characters are. Your definition of "special" might not match the next guy's.
For instance, if you wanted to see if a string contains any of the characters you listed above, you can do this:
irb(main):001:0> word = "resumé"
=> "resum\303\251"
irb(main):002:0> word =~ /[éäöüß]/
=> 5
irb(main):003:0> word.gsub(/é/, 'e')
=> "resume"
I hope this helps!
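If the goal is to replace several of these characters in one pass, String#gsub also accepts a hash of replacements. A minimal sketch, assuming a UTF-8 source file; the REPLACEMENTS mapping and the ascii_fold helper are made up for illustration, so adjust the mapping to your own needs:

```ruby
# Hypothetical mapping from special characters to ASCII replacements.
REPLACEMENTS = {
  "é" => "e",
  "ä" => "ae",
  "ö" => "oe",
  "ü" => "ue",
  "ß" => "ss"
}

# Each character matched by the class is looked up in the hash.
def ascii_fold(text)
  text.gsub(/[éäöüß]/, REPLACEMENTS)
end

puts ascii_fold("resumé")           # => resume
puts ascii_fold("Straße für Käse")  # => Strasse fuer Kaese
```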
