Character encoding conversion - ruby-on-rails

I have a string which contains Swedish characters and want to convert it to basic English.
name = "LänödmåtnÖng ÅjädårbÄn"
These characters should be converted as follows:
Å use A
å use a
Ä use A
ä use a
Ö use O
ö use o
Is there a simple way to do it? If I try:
ascii_to_string = name.unpack("U*").map{|s|s.chr}.join
It returns L\xE4n\xF6dm\xE5tn\xD6ng \xC5j\xE4d\xE5rb\xC4n as ASCII, but I want to convert it to English.

Using OP's conversion table as input for the tr method:
#encoding: utf-8
name = "LänödmåtnÖng ÅjädårbÄn"
p name.tr("ÅåÄäÖö", "AaAaOo") #=> "LanodmatnOng AjadarbAn"
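If the table is likely to grow, the same idea can be sketched with a hash passed to gsub. This is only an illustration: the constant SWEDISH_TO_ASCII and the method to_basic_english are made-up names, and the table contains just the six characters from the question.

```ruby
# Conversion table from the question (hypothetical constant name).
SWEDISH_TO_ASCII = {
  "Å" => "A", "å" => "a",
  "Ä" => "A", "ä" => "a",
  "Ö" => "O", "ö" => "o"
}.freeze

# Hypothetical helper: replaces every character listed in the table.
def to_basic_english(text)
  # Build the pattern from the table keys so the two cannot drift apart.
  pattern = Regexp.union(SWEDISH_TO_ASCII.keys)
  # When gsub is given a hash, each match is replaced by its hash value.
  text.gsub(pattern, SWEDISH_TO_ASCII)
end

p to_basic_english("LänödmåtnÖng ÅjädårbÄn") #=> "LanodmatnOng AjadarbAn"
```

Unlike tr, the hash form does not depend on the two character lists staying aligned position by position.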

Try this:
string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
As found in this post.
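Outside Rails, or on Rails versions where mb_chars.normalize is no longer available (as far as I know it was deprecated in Rails 6 in favour of String#unicode_normalize), roughly the same thing can be sketched with plain Ruby. The method name ascii_fold is an assumption for illustration, and the downcase step from the one-liner above is left out so the original case survives.

```ruby
# Hypothetical helper: decompose accented characters, then drop everything
# that is not ASCII (which includes the combining accent marks).
def ascii_fold(text)
  text.unicode_normalize(:nfkd)      # "ä" becomes "a" + combining diaeresis
      .gsub(/[^\x00-\x7F]/, "")      # remove all non-ASCII code points
end

p ascii_fold("LänödmåtnÖng ÅjädårbÄn") #=> "LanodmatnOng AjadarbAn"
```

Characters that have no ASCII base letter after decomposition (for example "ø") are removed entirely rather than transliterated.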

You already got a decent answer; however, there is a way that is easier to remember (no magic regular expressions):
name.parameterize
It changes whitespace to dashes, so you need to handle that somehow, for example by processing each word separately:
name.split.map { |s| s.parameterize }.join ' '

Related

How to remove from string before __

I am building a Rails 5.2 app.
In this app I got outputs from different suppliers (I am building a webshop).
The name of the shipping provider is in this format:
dhl_freight__233433
It could also be in this format:
postal__US-320202
How can I remove everything before (and including) the __ so that only the part after the __ remains, for example 233433?
Perhaps some sort of RegEx.
A very simple approach would be to use String#split and then pick the last part (which is the second part in this example):
"dhl_freight__233433".split('__').last
#=> "233433"
"postal__US-320202".split('__').last
#=> "US-320202"
You can use a very simple Regexp and ask the resulting MatchData for the post_match part:
p "dhl_freight__233433".match(/__/).post_match
# another (magic) way to access the post_match part:
p $'
Postscript: Learnt something from this question myself: you don't even have to use a RegExp for this to work. Just "asddfg__qwer".match("__").post_match does the trick (it does the conversion to regexp for you)
r = /[^_]+\z/
"dhl_freight__233433"[r] #=> "233433"
"postal__US-320202"[r] #=> "US-320202"
The regular expression matches one or more characters other than an underscore, followed by the end of the string (\z). The ^ at the beginning of the character class reads, "other than any of the characters that follow".
See String#[].
This assumes that the last underscore is preceded by an underscore. If the last underscore is not preceded by an underscore, in which case there should be no match, add a positive lookbehind:
r = /(?<=__)[^_]+\z/
This requires the match to be preceded by two underscores.
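To make the difference between these approaches concrete, here is a small sketch. The method names id_after_separator and rest_after_separator are invented for this example, and the partition variant is an alternative not mentioned above, shown only for comparison.

```ruby
# Lookbehind version: the match must directly follow "__" and must not
# contain any further underscore.
AFTER_DOUBLE_UNDERSCORE = /(?<=__)[^_]+\z/

def id_after_separator(name)
  name[AFTER_DOUBLE_UNDERSCORE]   # nil when nothing matches
end

# String#partition version: everything after the first "__",
# even if that remainder contains underscores itself.
def rest_after_separator(name)
  _head, separator, tail = name.partition("__")
  separator.empty? ? nil : tail   # nil when "__" does not occur
end

p id_after_separator("dhl_freight__233433")   #=> "233433"
p id_after_separator("postal__US-320202")     #=> "US-320202"
p id_after_separator("postal_320202")         #=> nil
p rest_after_separator("dhl_freight__23_34")  #=> "23_34"
```

Which one fits depends on whether the part after the separator may itself contain underscores.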
There are many ways in Ruby to extract numbers from a string. I assume you're trying to fetch the numbers out of a string; here are some of the ways to do so.
Ref- http://www.ruby-forum.com/topic/125709
line.delete("^0-9")
line.scan(/\d/).join('')
line.tr("^0-9", '')
Of the above, delete is the fastest way to pull the digits out of a string.
All of the above extract the digits from the string and join them. If the string is "String-with-67829___numbers-09764", the output is "6782909764".
If you want the numbers kept separate, like ["67829", "09764"], use:
line.split(/[^\d]/).reject { |c| c.empty? }
Hope these answers help you! Happy coding :-)
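A short sketch that runs the variants side by side on the sample string from this answer. The scan(/\d+/) form is an addition of mine that yields the same grouped result as the split/reject version.

```ruby
line = "String-with-67829___numbers-09764"

# Keep only the digits, joined into one string.
all_digits = line.delete("^0-9")
p all_digits   #=> "6782909764"

# Keep each run of digits as its own element.
digit_runs = line.scan(/\d+/)
p digit_runs   #=> ["67829", "09764"]

# The split-based version from above gives the same grouping.
split_runs = line.split(/[^\d]/).reject { |c| c.empty? }
p split_runs   #=> ["67829", "09764"]
```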

how to replace unicode "&aring" with norwegian character å

I want to replace the unicode: &aring with Norwegian character å but the following code is not helping:
[unq1 stringByReplacingOccurrencesOfString:@"&aring" withString:@"å"];
It works perfectly on my side. Maybe the problem is that you have not saved the result of the stringByReplacingOccurrencesOfString method, which returns a new string instead of modifying the receiver. So try it like this:
unq1 = [unq1 stringByReplacingOccurrencesOfString:@"&aring" withString:@"å"];

Regex in Ruby not working

I have a string from which I want to extract a certain part:
Original String: /abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx
Result Desired: /abc/d7_t/g-12/jkl/
The number of characters can vary throughout the string. It contains letters, digits, underscores and hyphens. I basically want to cut the string after the 5th "/".
I tried a few regex, but it seems there is some mistake with the format.
If a non-regexp approach is acceptable, how about this:
s.split('/').take(n).join('/')+'/'
Where s is your string (in your case: /abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx) and n is the number of slashes to keep.
def cut_after(s, n)
s.split('/').take(n).join('/')+'/'
end
Then
cut_after("/abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx", 5)
should work. Not as compact as a regexp, but some people may find it clearer.
The regexp would be: %r(/(?:[^/]+/){4}). Note that it is a good idea in this case to use the %r literal version to avoid escaping slashes. Unescaped slashes are likely the cause of your format errors.
Match a '/' followed by one or more characters other than '/', four times, then a closing '/':
(\/[^\/]+){4}\/
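As a sketch of the regexp approach, the helper below builds the pattern for any slash count. The method name cut_after_slash and the \A anchor are my own additions, not part of the answers above.

```ruby
# Hypothetical helper: returns the prefix of path up to and including
# the n-th "/", or nil when the path has fewer than n slashes.
def cut_after_slash(path, n)
  # Leading "/" plus (n - 1) groups of "non-slash characters followed by /".
  pattern = %r(\A/(?:[^/]+/){#{n - 1}})
  path[pattern]
end

p cut_after_slash("/abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx", 5) #=> "/abc/d7_t/g-12/jkl/"
p cut_after_slash("/a/b", 5)                                #=> nil
```

Unlike the split/take version, this returns nil instead of a shortened string when the input does not contain enough slashes.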

Standardize a String for Filename, remove accents and special chars

I'm trying to find a way to normalize a string to pass it as a filename.
I have this so far:
my_string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n, '').downcase.gsub(/[^a-z]/, '_')
But the first problem is the - character, and I guess there are more problems with this method.
I don't control the name, the name string can have accents, white spaces and special chars. I want to remove all of them, replace the accents with the corresponding letter ('é' => 'e') and replace the rest with the '_' character.
The names are like:
"Prélèvements - Routine"
"Carnet de santé"
...
I want them to be like a filename with no space/special chars:
"prelevements_routine"
"carnet_de_sante"
...
Thanks for the help :)
Take a look at ActiveSupport::Inflector.transliterate; it's very useful for handling this kind of character problem. See the ActiveSupport::Inflector documentation.
Then, you could do something like:
ActiveSupport::Inflector.transliterate my_string.downcase.gsub(/\s/,"_")
Use ActiveStorage::Filename#sanitized, if spaces are okay.
If spaces are okay (and I would suggest keeping them if this is a user-provided and/or user-downloadable file), you can use the ActiveStorage::Filename#sanitized method, which is meant for exactly this situation.
It removes special characters that are not allowed in a file name, whilst keeping all of the nice characters that Users typically use to nicely organize and describe their files, like spaces and ampersands (&).
ActiveStorage::Filename.new( "Prélèvements - Routine" ).sanitized
#=> "Prélèvements - Routine"
ActiveStorage::Filename.new( "Carnet de santé" ).sanitized
#=> "Carnet de santé"
ActiveStorage::Filename.new( "Foo:Bar / Baz.jpg" ).sanitized
#=> "Foo-Bar - Baz.jpg"
Use String#parameterize, if you want to remove nearly everything.
And if you're really looking to remove everything, try String#parameterize:
"Prélèvements - Routine".parameterize
#=> "prelevements-routine"
"Carnet de santé".parameterize
#=> "carnet-de-sante"
"Foo:Bar / Baz.jpg".parameterize
#=> "foo-bar-baz-jpg"
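For the exact underscore format asked for in the question, and without depending on Rails, a plain-Ruby sketch could look like this. The method name filename_slug is hypothetical. In Rails, parameterize also accepts a separator option (for example parameterize(separator: '_')), which may be all you need.

```ruby
# Hypothetical helper: accents off, lowercase, everything else to "_".
def filename_slug(name)
  name.unicode_normalize(:nfkd)     # split "é" into "e" + combining accent
      .gsub(/\p{Mn}/, "")           # drop the combining accent marks
      .downcase
      .gsub(/[^a-z0-9]+/, "_")      # collapse each run of other chars into "_"
      .gsub(/\A_+|_+\z/, "")        # trim underscores at both ends
end

p filename_slug("Prélèvements - Routine") #=> "prelevements_routine"
p filename_slug("Carnet de santé")        #=> "carnet_de_sante"
p filename_slug("Foo:Bar / Baz.jpg")      #=> "foo_bar_baz_jpg"
```

Note that the dot before a file extension is replaced as well, so split off the extension first if it has to be preserved.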

How can I delete special characters?

I'm practicing with Ruby and regex to delete certain unwanted characters. For example:
input = input.gsub(/<\/?[^>]*>/, '')
and for special characters, example ☻ or ™:
input = input.gsub('&#', '')
This leaves only numbers, ok. But this only works if the user enters a special character as a code, like this:
&#8482;
My question:
How can I delete special characters if the user enters them directly, without a code, like this:
™ ☻
First of all, I think it might be easier to define what constitutes "correct input" and remove everything else. For example:
input = input.gsub(/[^0-9A-Za-z]/, '')
If that's not what you want (you want to support non-latin alphabets, etc.), then I think you should make a list of the glyphs you want to remove (like ™ or ☻), and remove them one-by-one, since it's hard to distinguish between a Chinese, Arabic, etc. character and a pictograph programmatically.
Finally, you might want to normalize your input by converting to or from HTML escape sequences.
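One possible middle ground, sketched here under the assumption that "correct input" means letters and digits of any script plus whitespace, is to use Ruby's Unicode property classes. The method name strip_symbols is made up for this example.

```ruby
# Hypothetical helper: keep letters (\p{L}), combining marks (\p{M}),
# digits (\p{N}) and whitespace; remove everything else, such as ™ or ☻.
def strip_symbols(input)
  input.gsub(/[^\p{L}\p{M}\p{N}\s]/, "")
       .squeeze(" ")   # collapse the double spaces left behind
       .strip
end

p strip_symbols("Hello ™ ☻ world")   #=> "Hello world"
p strip_symbols("日本語 ☻ åäö 42™")    #=> "日本語 åäö 42"
```

This also removes punctuation, so add the characters you want to allow to the character class if that is too strict.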
If you just wanted ASCII characters, then you can use:
original = "aøbauhrhræoeuacå"
cleaned = ""
original.each_byte { |x| cleaned << x unless x > 127 }
cleaned # => "abauhrhroeuac"
You can use parameterize:
'#!#$%^&*()111'.parameterize
=> "111"
You can match all the characters you want, and then join them together, like this:
original = "aøbæcå"
stripped = original.scan(/[a-zA-Z]/).join
puts stripped
which outputs "abc"
An easier way to do this, inspired by Can Berk Güder's answer, is:
In order to delete special characters:
input = input.gsub(/\W/, '')
In order to keep word characters:
input = input.scan(/\w/).join
Either way the resulting input is the same. Try it at http://rubular.com/
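A quick check of that claim on a made-up sample string. Note that scan returns an array, so it has to be joined to get a string back, and that \w counts the underscore as a word character.

```ruby
input = "a™b ☻c_1"

# Delete everything that is not a word character.
deleted = input.gsub(/\W/, "")

# Collect the word characters and join them back into a string.
kept = input.scan(/\w/).join

p deleted  #=> "abc_1"
p kept     #=> "abc_1"
```

In Ruby, \w only matches ASCII letters, digits and the underscore, so accented or non-Latin letters are removed by both variants.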
