Can not transliterate strings with CP850 encoding


For my blog app I use FriendlyId to generate slugs.
When I create a post in irb, the following error appears:
ArgumentError (Can not transliterate strings with CP850 encoding)
I found out that the error only appears when there is a space between words in the title, so it probably has something to do with friendly_id.
I develop on Windows 10.
Encoding.default_internal = #<Encoding:UTF-8>
Encoding.default_external = #<Encoding:UTF-8>
Encoding.locale_charmap = "CP850"
I would like to use FriendlyId but still be able to use spaces in post titles. Any ideas?
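The error message appears to come from ActiveSupport's transliteration, which FriendlyId's default slug normalization relies on (via parameterize) and which rejects strings whose encoding is not UTF-8 compatible. One possible workaround, sketched here under the assumption that the title was typed into the Windows console and is therefore tagged CP850, is to transcode it to UTF-8 before it reaches the slug code. The helper name ensure_utf8 is hypothetical.

```ruby
# Hypothetical helper: make sure a string is UTF-8 before slug generation.
# Assumption: the string's encoding tag (e.g. CP850 from the Windows console)
# correctly describes its bytes, so String#encode can convert it.
def ensure_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  str.encode(Encoding::UTF_8)
end

# Simulate a title typed into irb on a console whose locale charmap is CP850.
title = "Mein erster Beitrag über Ruby".encode("CP850")
puts title.encoding        # the console encoding, not UTF-8

utf8_title = ensure_utf8(title)
puts utf8_title.encoding   # now UTF-8
puts utf8_title            # same text, UTF-8 bytes
```

In a model, calling something like self.title = ensure_utf8(title) in a before_validation callback might be one place to apply this; that wiring is an assumption, not something confirmed above. Titles submitted through a browser form usually already arrive as UTF-8, so the problem may be limited to irb on a CP850 console.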

Related

Rails admin encoding error when I try to use 'windows-1250'

I get the error "incompatible character encodings: UTF-8 and Windows-1250"
when I try to show anything containing Polish characters, e.g. 'ąęźć'.
In my application.rb I have:
config.encoding = "windows-1250"
In database.yml:
encoding: windows-1250
How can I show params in Windows-1250 in the Rails admin panel?

I would suggest going with UTF-8 encoding (Ruby's default these days).
Your input 'ąęźć' is a valid UTF-8 string, so you would have no problem decoding it as UTF-8.
If you still want to hack around it, you can use:
'ąęźć'.mb_chars.tidy_bytes.to_s
which should also give you the desired output. Be aware that tidy_bytes reinterprets invalid bytes as CP1252/ISO-8859-1, not Windows-1250, so some Polish letters may still come out wrong.
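A plain-Ruby sketch of the error from the question and of the UTF-8 route suggested above. The strings are made-up examples, and converting at the point where Windows-1250 data enters the app is an assumption about where that data comes from.

```ruby
# UTF-8 literal (the default source encoding on Ruby 2.0+).
utf8   = "Zażółć: "
# The same kind of letters, stored as Windows-1250 bytes.
cp1250 = "ąęźć".encode("Windows-1250")

begin
  # Both sides contain non-ASCII bytes in different encodings.
  utf8 + cp1250
rescue Encoding::CompatibilityError => e
  puts "mixing fails: #{e.message}"
end

# String#encode converts the bytes (force_encoding would only relabel them).
fixed = utf8 + cp1250.encode("UTF-8")
puts fixed
```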

Sanitizing Unicode strings for URL slugs (Ruby/Rails)

I have UTF-8 encoded post titles which I'd rather show using the appropriate characters in slugs. An example is Amazon Japan's URL here.
How can any arbitrary string be converted to a safe URL slug such as this, with Ruby (or Rails)?
(There are some related PHP posts, but nothing I could find for Ruby.)

From reading here it seems like a solution is this:
require 'open-uri'
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
puts URI::encode(str)
Here is the documentation for open-uri, and here is some info on UTF-8 encoded URLs.
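On Ruby 3.0 and later, URI.escape and its alias URI.encode have been removed, so the snippets in this answer no longer run there. A sketch of a replacement for the snippet above, using ERB::Util.url_encode from the standard library, which percent-encodes every byte outside the unreserved set A-Z, a-z, 0-9, -, _, ., ~:

```ruby
require "erb"

# Same arbitrary bytes as above, as a binary (ASCII-8BIT) string.
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".b

# Printable unreserved bytes stay as-is; everything else becomes %XX.
puts ERB::Util.url_encode(str)
```

URI.encode_www_form_component is another standard option, but it turns spaces into "+", which suits form-encoded query strings rather than path segments.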
EDIT: Having looked into this more, I noticed that encode is just an alias for URI.escape, which is documented here. Example taken from the docs:
require 'uri'
enc_uri = URI.escape("http://example.com/?a=\11\15")
p enc_uri
# => "http://example.com/?a=%09%0D"
p URI.unescape(enc_uri)
# => "http://example.com/?a=\t\r"
p URI.escape("#?#!", "!?")
# => "#%3F#%21"
Let me know if this is what you were looking for.
EDIT #2: I was curious and kept looking; according to the comments, Ryan Bates' RailsCast on FriendlyId also seems to work with Chinese characters.
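For the slug itself, one alternative to escaping a whole string is to keep letters and digits from any script, replace everything else with hyphens, and let percent-encoding happen only when the URL is built. A sketch under those assumptions; unicode_slug is a hypothetical helper, not part of Rails or FriendlyId.

```ruby
require "erb"

# Hypothetical helper: keep letters (\p{L}) and digits (\p{N}) from any script,
# collapse every other run of characters into a single "-", trim edge hyphens.
def unicode_slug(title)
  title.unicode_normalize(:nfc)
       .downcase
       .gsub(/[^\p{L}\p{N}]+/, "-")
       .gsub(/\A-+|-+\z/, "")
end

slug = unicode_slug("Amazon.co.jp: 回形针 Paper Clips!")
puts slug                        # amazon-co-jp-回形针-paper-clips
puts ERB::Util.url_encode(slug)  # percent-encoded form for a raw URL
```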

RoR character class regex

I have the following line of code in my Ruby on Rails app, which checks whether the given string contains Korean characters or not:
isKorean = !/\p{Hangul}/.match(word).nil?
It works perfectly in the console, but raises a syntax error in the actual app:
invalid character property name {Hangul}: /\p{Hangul}/
What am I missing and how can I get it to work?

This is a character encoding issue, you need to add:
# encoding: utf-8
to the top of the Ruby file that contains the regex. You could probably use any other encoding that supports the character class instead of UTF-8. Since Ruby 2.0, UTF-8 is the default source encoding, so the comment is not needed on Ruby 2.0+.
This is known as a "magic comment". You can and should read more about encoding in Ruby 1.9. Note that encoding in Rails views is handled automatically by config.encoding (set to UTF-8 by default in config/application.rb).
It was likely working in the console because your terminal is set to use UTF-8 already.
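A minimal version of the check with the magic comment in place. The method name korean? is hypothetical, and Regexp#match? requires Ruby 2.4+, where the magic comment is redundant anyway; it is kept only to show where it goes.

```ruby
# encoding: utf-8
# (Only needed on Ruby 1.9; Ruby 2.0+ treats source files as UTF-8 by default.)

# Returns true if the word contains at least one Hangul character.
def korean?(word)
  /\p{Hangul}/.match?(word)
end

puts korean?("안녕하세요")  # true
puts korean?("hello")       # false
```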

Rails: encoding woes with serialized hashes despite UTF8

I've just updated from Ruby 1.9.2 to Ruby 1.9.3p0 (2011-10-30 revision 33570). My Rails application uses PostgreSQL as its database backend. The system locale is UTF8, as is the database encoding. The default encoding of the Rails application is also UTF8. I have Chinese users who input Chinese characters as well as English characters. The strings are stored as UTF8-encoded strings.
Rails version: 3.0.9
Since the update some of the existing Chinese strings in the database are no longer displayed correctly. This does not affect all strings, but only those that are part of a serialized hash. All other strings that are stored as plain strings still appear to be correct.
Example:
This is a serialized hash that is stored as a UTF8 string in the database:
broken = "--- !map:ActiveSupport::HashWithIndifferentAccess \ncheckbox: \"1\"\nchoice: \"Round Paper Clips \\xEF\\xBC\\x88\\xE5\\x9B\\x9E\\xE5\\xBD\\xA2\\xE9\\x92\\x88\\xEF\\xBC\\x89\\r\\n\"\ninfo: \"10\\xE7\\x9B\\x92\"\n"
In order to convert this string to a ruby hash, I deserialize it with YAML.load:
broken_hash = YAML.load(broken)
This returns a hash with garbled contents:
{"checkbox"=>"1", "choice"=>"Round Paper Clips ï¼\u0088å\u009B\u009Eå½¢é\u0092\u0088ï¼\u0089\r\n", "info"=>"10ç\u009B\u0092"}
The garbled stuff is supposed to be UTF8-encoded Chinese. broken_hash['info'].encoding tells me that ruby thinks this is #<Encoding:UTF-8>. I disagree.
Interestingly, all other strings that were not serialized before look fine, however. In the same record a different field contains Chinese characters that look just right---in the rails console, the psql console, and the browser. Every string---no matter if serialized hash or plain string---saved to the database since the update looks fine, too.
I tried to convert the garbled text from a possibly wrong encoding (such as GB2312 or ANSI) to UTF-8, despite Ruby's claim that it was already UTF-8, and of course it failed. This is the code I used:
require 'iconv'
Iconv.conv('UTF-8', 'GB2312', broken_hash['info'])
This fails because ruby doesn't know what to do with illegal sequences in the string.
I really just want to run a script to fix all the old, presumably broken serialized hash strings and be done with it. Is there a way to convert these broken strings to something resembling Chinese again?
I just played with the encoded UTF-8 string in the raw string (called "broken" in the above example). This is the Chinese string that is encoded in the serialized string:
chinese = "\\xEF\\xBC\\x88\\xE5\\x9B\\x9E\\xE5\\xBD\\xA2\\xE9\\x92\\x88\\xEF\\xBC\\x89\\r\\n"
I noticed that it is easy to convert this to a real UTF-8 encoded string by unescaping it (removing the escape backslashes).
chinese_ok = "\xEF\xBC\x88\xE5\x9B\x9E\xE5\xBD\xA2\xE9\x92\x88\xEF\xBC\x89\r\n"
This returns a proper UTF-8-encoded Chinese string: "（回形针）\r\n"
The thing only falls apart when I use YAML.load(...) to convert the string to a Ruby hash. Maybe I should process the raw string before it is fed to YAML.load. It just makes me wonder why this is so.
Interesting! This is likely due to the YAML engine "psych", which is now used by default in 1.9.3. I switched to the "syck" engine with YAML::ENGINE.yamler = 'syck' and the broken strings are parsed correctly.
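The idea of processing the raw string before YAML.load can be sketched like this. In a YAML double-quoted scalar, \xHH is an escape for the character U+00HH, which is why Psych produces Latin-1 mojibake; turning each \xHH back into a raw byte first lets Psych see real UTF-8. The helper name is hypothetical, and it assumes every \xHH in the stored text is such a byte escape (a literally escaped backslash followed by "xHH" would be converted too).

```ruby
require "yaml"

# Hypothetical helper: replace literal \xHH escapes with the bytes they stand for,
# then tag the result as UTF-8 so the YAML parser reads real characters.
def unescape_syck_bytes(yaml_text)
  yaml_text.b
           .gsub(/\\x(\h\h)/) { [$1].pack("H2") }
           .force_encoding(Encoding::UTF_8)
end

# Single quotes keep the backslashes, mimicking the text stored in the database.
stored = 'info: "10\xE7\x9B\x92"' + "\n"
fixed  = unescape_syck_bytes(stored)
puts YAML.load(fixed)["info"]   # 10盒
```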

This seems to have been caused by a difference in the behaviour of the two available YAML engines "syck" and "psych".
To set the YAML engine to syck:
YAML::ENGINE.yamler = 'syck'
To set the YAML engine back to psych:
YAML::ENGINE.yamler = 'psych'
The "syck" engine processes the strings as expected and converts them to hashes with proper Chinese strings. When the "psych" engine is used (default in ruby 1.9.3), the conversion results in garbled strings.
Adding the above line (the first of the two) to config/application.rb fixes this problem. The "syck" engine is no longer maintained, so I should probably only use this workaround to buy me some time to make the strings acceptable for "psych".

From the 1.9.3 NEWS file:
* yaml
* The default YAML engine is now Psych. You may downgrade to syck by setting
YAML::ENGINE.yamler = 'syck'.
Apparently the Syck and Psych YAML engines treat non-ASCII strings in different and incompatible ways.
Given a Hash like you have:
h = {
"checkbox" => "1",
"choice" => "Round Paper Clips （回形针）\r\n",
"info" => "10盒"
}
Using the old Syck engine:
>> YAML::ENGINE.yamler = 'syck'
>> h.to_yaml
=> "--- \ncheckbox: \"1\"\nchoice: \"Round Paper Clips \\xEF\\xBC\\x88\\xE5\\x9B\\x9E\\xE5\\xBD\\xA2\\xE9\\x92\\x88\\xEF\\xBC\\x89\\r\\n\"\ninfo: \"10\\xE7\\x9B\\x92\"\n"
we get the ugly double-backslash format that you currently have in your database. Switching to Psych:
>> YAML::ENGINE.yamler = 'psych'
=> "psych"
>> h.to_yaml
=> "---\ncheckbox: '1'\nchoice: ! \"Round Paper Clips （回形针）\\r\\n\"\ninfo: 10盒\n"
The strings stay in normal UTF-8 format. If we manually screw up the encoding to be Latin-1:
>> Iconv.conv('UTF-8', 'ISO-8859-1', "\xEF\xBC\x88\xE5\x9B\x9E\xE5\xBD\xA2\xE9\x92\x88\xEF\xBC\x89")
=> "ï¼\u0088å\u009B\u009Eå½¢é\u0092\u0088ï¼\u0089"
then we get the sort of nonsense that you're seeing.
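The Latin-1 step above can, in principle, be run backwards to repair strings that were already loaded through Psych. A minimal sketch, assuming every affected string was garbled exactly this way (UTF-8 bytes read as Latin-1); check that against a sample of your data before running it over a whole table. The helper name is hypothetical, and it uses String#encode because Iconv is no longer part of the standard library in Ruby 2.0+.

```ruby
# Hypothetical helper: undo "UTF-8 bytes decoded as Latin-1" mojibake.
# Each garbled character is a Latin-1 stand-in for one original byte, so
# encoding to ISO-8859-1 recovers the bytes, which are then re-tagged as UTF-8.
def repair_latin1_mojibake(str)
  repaired = str.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
  repaired.valid_encoding? ? repaired : str
rescue Encoding::UndefinedConversionError
  # Proper UTF-8 text (e.g. containing 盒) has no Latin-1 form: leave it alone.
  str
end

puts repair_latin1_mojibake("10\u00E7\u009B\u0092")  # 10盒
puts repair_latin1_mojibake("10盒")                  # unchanged
```

For a whole hash, something like broken_hash.transform_values { |v| v.is_a?(String) ? repair_latin1_mojibake(v) : v } could apply it to each value (Hash#transform_values needs Ruby 2.4+).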
The YAML documentation is rather thin so I don't know if you can force Psych to understand the old Syck format. I think you have three options:
Use the old, unsupported, and deprecated Syck engine; you'd need to set YAML::ENGINE.yamler = 'syck' before you YAML anything.
Load and decode all your YAML using Syck and then re-encode and save it using Psych.
Stop using serialize in favor of manually serializing/deserializing using JSON (or some other stable, predictable, and portable text format) or use an association table so that you're not storing serialized data at all.
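A sketch of the last option, storing the hash as JSON text instead of YAML, using the example hash from this answer. In Rails versions whose serialize accepts a coder object, passing JSON as that coder may be an option; the exact serialize signature has changed across Rails versions, so check the documentation for yours.

```ruby
require "json"

h = {
  "checkbox" => "1",
  "choice"   => "Round Paper Clips （回形针）\r\n",
  "info"     => "10盒"
}

# JSON.generate writes non-ASCII characters as plain UTF-8 by default,
# so there is no engine-specific byte escaping to disagree about later.
json = JSON.generate(h)
puts json

restored = JSON.parse(json)
puts restored == h   # true
```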

Ruby on Rails 3, incompatible character encodings: UTF-8 and ASCII-8BIT with i18n

I'm having some trouble with the combination of Rails 3.0.1, Ruby 1.9.2, and my website's localization.
The problem is quite simple. I have something like this in a view:
f.input :zip_code, :label => I18n.t('labels.zip_code')
and an es.yml file:
es:
  labels:
    zip_code: "Este código postal no es valido."
There are no problems with the en.yml file (it's pure ASCII), but when the website's locale is set to 'es' (I18n.locale == 'es') I get this error:
incompatible character encodings: UTF-8 and ASCII-8BIT
I have been looking around for quite a while but didn't find a way to use my UTF-8 translation files.
Does anyone know how to make it work?
Thanks for your help.

OK, problem solved after some hours of googling.
There were actually two bugs in my code. The first was a file encoding error and the second was a problem with the MySQL database configuration.
First, to solve the error caused by MySQL I used these two articles:
http://www.dotkam.com/2008/09/14/configure-rails-and-mysql-to-support-utf-8/
http://www.rorra.com.ar/2010/07/30/rails-3-mysql-and-utf-8/
Second, to solve the file encoding problem I added these 2 lines in my config/environment.rb
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
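One plausible reason these two lines help: Encoding.default_external decides which encoding Ruby attaches to text read from files, such as a locale file. A small sketch with a temporary file standing in for es.yml; the file contents are only an example.

```ruby
require "tempfile"

Encoding.default_external = Encoding::UTF_8

# Tempfile.create returns the block's value, so text is available afterwards.
text = Tempfile.create(["es", ".yml"]) do |f|
  f.write(%(es:\n  labels:\n    zip_code: "Este código postal no es valido."\n))
  f.flush
  File.read(f.path)   # tagged with Encoding.default_external
end

puts text.encoding          # UTF-8
puts text.valid_encoding?   # true
```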
Hopefully this will help someone :)

I solved most of the problems by combining several solutions:
Make sure application.rb has this line: config.encoding = "utf-8".
Make sure you are using the 'mysql2' gem.
Put # encoding: utf-8 at the top of any file containing UTF-8 characters.
Add the following two lines above the <App Name>::Application.initialize! line in environment.rb:
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
http://rorguide.blogspot.com/2011/06/incompatible-character-encodings-ascii.html

Are you sure your es.yml file was saved as UTF-8?
If you're on Windows, use http://notepad-plus-plus.org/ to make sure.

Using this unpack/pack round trip finally helped me sort this out; try it if you get the "can't convert" error message:
myString.unpack('U*').pack('U*')
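A sketch of why the round trip works: unpack('U*') decodes the bytes as UTF-8 code points regardless of the string's encoding tag, and pack('U*') builds a new string tagged UTF-8. The example string is made up; force_encoding plus a validity check is shown as an equivalent that avoids the round trip.

```ruby
# A string that holds UTF-8 bytes but is tagged binary (ASCII-8BIT).
binary = "Este código".b

begin
  # Mixing it with a non-ASCII UTF-8 string raises the error from the question.
  "Código postal: " + binary
rescue Encoding::CompatibilityError => e
  puts "error: #{e.message}"
end

# Decode as UTF-8 code points, then rebuild a UTF-8-tagged string.
fixed = binary.unpack("U*").pack("U*")
puts fixed.encoding   # UTF-8

# Equivalent without the round trip: relabel the bytes and check validity.
retagged = binary.dup.force_encoding(Encoding::UTF_8)
puts retagged.valid_encoding?   # true
puts fixed == retagged          # true
```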

Make sure you have config.encoding = "utf-8" in your config/application.rb. Also, your example translation file doesn't match the key you're searching for (com_name and first_name) but I suppose that could just be a typo.
