Cyrillic character escaping problem in Rails JSON + ExtJS

I have a problem with the escaping of Cyrillic characters in the Rails JSON output:
{"success":true,"total":"2","offices":[{"address":"addr","created_at":"2011-06-03T11:55:09Z","description":"desc","id":1,"name":"Office 1","published":true,"updated_at":"2011-06-05T13:48:35Z"},{"address":"\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd","created_at":"2011-06-03T12:32:19Z","description":"\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd","id":2,"name":"\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd 2","published":null,"updated_at":"2011-06-05T13:49:51Z"}]}
They are not properly decoded in ExtJS, and the grid shows �������� 2
The page encoding is UTF-8. The MySQL and Rails configs are set to UTF-8.
Any ideas?

Any code?
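All I see is that your characters are already distorted in the JSON output (they're all the same, if you take a closer look: "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd"). U+FFFD is the Unicode replacement character, so the original text was lost before the JSON reached ExtJS.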
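One way such output can arise is sketched below in Ruby. It assumes (the question does not confirm this) that the text reached Rails as Windows-1251 bytes that were treated as UTF-8; the sample word is made up, and String#scrub needs Ruby 2.1 or later.

```ruby
# Bytes for the Cyrillic word "Офис" in Windows-1251 (assumed source encoding),
# wrongly tagged as UTF-8.
broken = "\xCE\xF4\xE8\xF1".b.force_encoding(Encoding::UTF_8)

# The bytes do not form valid UTF-8 ...
puts broken.valid_encoding?

# ... so replacing the invalid bytes leaves only U+FFFD characters.
scrubbed = broken.scrub
puts scrubbed.chars.uniq.inspect

# Declaring the real source encoding before transcoding recovers the text.
recovered = broken.dup.force_encoding(Encoding::Windows_1251).encode(Encoding::UTF_8)
puts recovered
```

If valid_encoding? returns false for strings coming out of the database, the connection or column encoding is the place to look, not the JSON layer.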

Related

Convert Unicode to UTF-8 before sending json

My Rails app gets certain data in its database from another application. That data is stored as text and may contain some Unicode characters. My Rails app has UTF-8 set as the default in the config, but when that data is sent as JSON to the Backbone front end, those characters are not converted properly, and the front end displays ? or smart quotes instead of the proper character. How do I force the Rails backend to convert these characters to UTF-8 in the JSON?
Call .encode('UTF-8') on each field.
That is not ideal, so alternatively you can write your own JSON serializer, where you can encode to any encoding you want
http://matthewrobertson.org/blog/2013/08/06/active-record-serializers-from-scratch/
or patch the system one
http://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html
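A minimal sketch of the per-field approach in plain Ruby. It assumes the other application wrote Windows-1252 bytes (a common source of stray smart quotes, but only a guess here); to_utf8 and the sample record are hypothetical, and Hash#transform_values needs Ruby 2.4 or later.

```ruby
require 'json'

# Hypothetical helper: treat the raw bytes as `source` and transcode to UTF-8.
# Non-string values pass through unchanged.
def to_utf8(value, source: Encoding::Windows_1252)
  return value unless value.is_a?(String)
  value.dup.force_encoding(source)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Smart quotes as Windows-1252 bytes, the way they might sit in the text column.
record = { "id" => 1, "title" => "\x93quoted\x94".b }

clean = record.transform_values { |v| to_utf8(v) }
puts JSON.generate(clean)
```

Note that calling .encode('UTF-8') on a string Ruby already believes is UTF-8 does nothing, which is why the sketch first declares the real source encoding with force_encoding.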

How do I prevent French accent character truncation with Ruby 1.9, Rails 3.2, and MySQL?

I am running into an issue where a controller receives a string that is then assigned to an attribute of one of my models, which I then save to the database. A log message with an inspect call shows the model holds the full string right up until the #save call. The problem seems to be that, without any errors being thrown, if the string contains a French character, everything from that character to the end of the string is truncated.
Further investigation seems to show that the string gets truncated when being written to the MySQL database. I also came across this article: Stale Rails Issue
If I am reading that right, it looks like characters that are not in the ASCII character encoding but are in the ISO Latin-1 character encoding are subject to this bug. I actually upgraded my project from Rails 3.0 to Rails 3.2 and from Ruby 1.8 to Ruby 1.9 so I could easily use the mysql2 adapter with Rails which some other articles seemed to suggest might solve the issue. However it didn't.
So how do I prevent the string truncation from happening?
Edit1: If I enter the query SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; I get:
Variable Name, Value
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'latin1'
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8_unicode_ci'
'collation_server', 'latin1_swedish_ci'
Also, I noticed that if I enter the French character via the MySQL Query Browser and then refresh the Rails app in my browser so it pulls the new data from the database, it displays correctly. It just seems to drop it when saving the model data.
Edit2: I just changed some config parameters to try to fix the problem, but it still exists. These are the values I changed them to:
Variable Name, Value
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'utf8'
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8_unicode_ci'
'collation_server', 'utf8_unicode_ci'
You are already using utf8, but the utf8_unicode_ci collation may serve you better. The other option, utf8_general_ci, performs better but can have problems with German; if that matters, use utf8_unicode_ci. That covers the database; for more information on MySQL character sets, check out MySQL's charset-unicode-sets.
On the Rails and Ruby side, you should check out these questions: French accents in ruby, and also Rails messages in french.
As a last resort, you could HTML-encode the data before inserting it into the database. This can mess up searches, but if you also encode the search terms before querying the database, everything should be fine. For more information, check the French characters in rails page. I hope this helps; if you keep getting errors, please tell me so I can look into other ways to help you out.
Also, the comment by @Ahmed Ali could help you out; it looks like the encodings get changed:
Fetching data from any database (Mysql, Postgresql, Sqlite2 & 3), all configured to have UTF-8 as its character set, returns the data with ASCII-8BIT in ruby 1.9.1 and rails 2.3.2.1.
See the link Ahmed posted for the complete answer and the link to the page the quote was taken from (ASCII-8BIT encoding of query results in rails 2.3.2 and ruby 1.9.1).
Sorry for all the trouble. I'll just put down the answer. It turned out that the database was correctly set up for utf8, but a user was inputting strings encoded in ISO-Latin-1, and I wasn't checking the encoding of user input because I assumed all input would be UTF-8 compatible. It turns out that French accented characters encoded in ISO-Latin-1 are invalid byte sequences in UTF-8. The database seems to handle this by just raising a warning and truncating the string at the illegal character, keeping everything before it.
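A minimal sketch of the kind of input check described above, in plain Ruby. normalize_input is a hypothetical helper, and falling back to ISO-8859-1 is an assumption that happens to fit this case; it is not a general way to detect encodings.

```ruby
# Hypothetical helper: keep the input if its bytes are valid UTF-8,
# otherwise assume it was sent as ISO-8859-1 and transcode it.
def normalize_input(str, fallback: Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  str.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

# "café au lait" as ISO-8859-1 bytes: the lone 0xE9 byte is not valid UTF-8.
latin1_input = "caf\xE9 au lait".b
fixed = normalize_input(latin1_input)
puts fixed

# Input that is already valid UTF-8 passes through unchanged.
already_ok = normalize_input("d\u00E9j\u00E0 vu")
puts already_ok
```

Running the check before assigning the attribute means the database never sees the invalid byte, so nothing is cut off at save time.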

Upgraded to Rails 3 and Ruby 1.9 and Unicode data in Postgres database now returning as ASCII (potential bug?)

I'm running into a really strange phenomenon after upgrading from Rails 2.3/Ruby 1.8 to Rails 3/Ruby 1.9. As I mentioned in the title, I'm using Postgres, along with the pg gem 0.10.0.
When I make a call to a model's string or text fields that contain Unicode, it works correctly, and they are returned with an encoding of UTF-8.
However, I also make use of serialized Hashes in a number of models, and whenever I make a call to read their contents (which worked perfectly prior to the upgrade), I get the following puzzling behavior:
If the contents contain Unicode data, they are returned as ASCII and displayed as escaped characters.
If the contents contain only ASCII data, they are returned as UTF-8 (correctly) and displayed properly.
I can simply re-encode the Unicode-returned-as-ASCII strings back to UTF-8, and everything will work fine. However, that is definitely a hack, and doesn't strike me as a good approach.
Is there a way to make serialized UTF-8 fields display correctly? If this is a bug somewhere, any idea where, and if it's known already?
Does this answer it? Why are all strings ASCII-8BIT after I upgraded to Rails 3?
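For reference, the re-encoding workaround mentioned in the question could look roughly like this in plain Ruby. deep_utf8 is a hypothetical helper and the sample hash is made up; the sketch assumes the bytes really are UTF-8 and were only tagged as ASCII-8BIT.

```ruby
# Hypothetical helper: walk a deserialized structure and re-tag strings
# that came back as ASCII-8BIT (binary) as UTF-8, without changing any bytes.
def deep_utf8(obj)
  case obj
  when String
    obj.encoding == Encoding::ASCII_8BIT ? obj.dup.force_encoding(Encoding::UTF_8) : obj
  when Hash
    obj.each_with_object({}) { |(k, v), h| h[deep_utf8(k)] = deep_utf8(v) }
  when Array
    obj.map { |v| deep_utf8(v) }
  else
    obj
  end
end

# Simulates a serialized Hash whose strings were loaded as binary.
settings = { "city".b => ["Москва".b, 42] }
fixed = deep_utf8(settings)
puts fixed["city"].first
puts fixed["city"].first.encoding
```

force_encoding only changes the label on the string, which is why this counts as a workaround: it hides the mislabeling instead of fixing where it happens.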

Problem with cyrillic characters in Ruby on Rails

In my Rails app I work a lot with Cyrillic characters. That's no problem: I store them in the db and I can display them in HTML.
But I have a problem exporting them to a plain txt file. A string like "элиас" becomes "—ç–ª–∏–∞—Å" if I let Rails put it in a txt file and download it. What's wrong here? What has to be done?
Regards,
Elias
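Obviously, there's a problem with your encoding. Make sure your text is in Unicode before writing it to the text file. You may use something like this:
require 'iconv'  # standard library in Ruby 1.8/1.9; a separate gem since Ruby 2.0
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
# The trailing space and [0..-2] guard against an invalid byte at the very end of the input
your_unicode_text = ic.iconv(your_text + ' ')[0..-2]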
Also, double check that your database encoding is UTF-8. Cyrillic characters can display fine in DB and in html with non-unicode encoding, e.g. KOI8-RU, but you're guaranteed to have problems with them elsewhere.
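The garbled sample in the question is what the UTF-8 bytes of "элиас" look like when a viewer decodes them as MacRoman. That suggests (the thread does not confirm it) that the exported file may already be valid UTF-8 and the program opening it is guessing the wrong encoding. A small Ruby sketch reproducing the effect; the file name is made up:

```ruby
require 'tmpdir'

text = "элиас"

# Reinterpret the UTF-8 bytes as MacRoman, which is what a misconfigured
# viewer would do, then show the result as UTF-8.
seen_as_macroman = text.dup.force_encoding(Encoding::MacRoman).encode(Encoding::UTF_8)
puts seen_as_macroman

# Writing and reading with an explicit encoding keeps the text intact.
path = File.join(Dir.tmpdir, "export_example.txt")
File.write(path, text, encoding: "UTF-8")
round_trip = File.read(path, encoding: "UTF-8")
puts round_trip
```

If the round trip is clean, the remaining step is telling the consumer which encoding the file uses, for example with a charset in the download's Content-Type header.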

Multibyte characters in URL are not rendering

I have a nagging problem. For a website I made, search-engine-friendly URLs are generated. The only problem is that there are ß characters in the URLs too. Characters like ö, ï, ä, ü etc. are rendered correctly, but in place of the ß character there is a diamond icon with a question mark in it. -> �
I thought it had to do with the charset being used, but I've tried both UTF-8 and ISO-8859-1, both without luck.
I need to have the correct character in the url for the readability of deeplinks.
Does the character U+00DF in UTF-8 work for you?
I tried it in Firefox and the URL was translated into ss.
In URL encoding, U+00DF should be translated to %DF when the URL uses ISO-8859-1; in UTF-8 it becomes %C3%9F.
Thanks for your answers, both + 1. I've solved the problem by using the iconv function, which is installed by default.
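The thread does not say which language the site uses, so the following is only a Ruby illustration of how the percent-encoded form of ß depends on the byte encoding of the string:

```ruby
require 'uri'

eszett = "ß"  # U+00DF

# Percent-encoding works on bytes, so the result depends on the encoding.
utf8_form   = URI.encode_www_form_component(eszett)
latin1_form = URI.encode_www_form_component(eszett.encode(Encoding::ISO_8859_1))

puts utf8_form
puts latin1_form
```

A page that emits the ISO-8859-1 byte while declaring UTF-8 (or the reverse) would produce the replacement-character diamond described in the question.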
