Convert Unicode to UTF-8 before sending JSON - ruby-on-rails

My Rails app gets certain data into its database from another application. That data is stored as text and it may have some Unicode characters in it. My Rails app does have UTF-8 set as the default in the config, but when that data is sent as JSON to the Backbone front-end, those Unicode characters are not converted properly and the front-end displays ? or smart quotes instead of the proper character. How do I force the Rails backend to do the encoding and convert the Unicode characters to UTF-8 in the JSON?

You can call .encode('UTF-8') on each field, which is not that good a solution. Alternatively, you can write your own JSON serializer, where you can encode to any encoding you want:
http://matthewrobertson.org/blog/2013/08/06/active-record-serializers-from-scratch/
or patch the built-in one:
http://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html
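If the stored bytes are really in some other encoding, a minimal sketch of the per-field approach (the model, column, and source encoding here are assumptions, not part of the question):

class Article < ActiveRecord::Base
  SOURCE_ENCODING = "ISO-8859-1" # assumption: whatever the other application actually writes

  # Re-tag the stored bytes with their real encoding, then transcode to UTF-8.
  def body_utf8
    body.dup.force_encoding(SOURCE_ENCODING).encode("UTF-8")
  end

  # Expose the cleaned value whenever the record is rendered as JSON.
  def as_json(options = {})
    super(options.merge(except: :body)).merge("body" => body_utf8)
  end
end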

Related

Character encoding, how do I tell the difference?

Characters coming out of my database are encoded differently than the same characters written directly in the source. For example, the word Permissões gives a different result when the string is written directly in the HTML than when the string is output from a db record.
# From the source
Addressable::URI.encode("Permissões.pdf") #=> "Permiss%C3%B5es.pdf"
# From the db
Addressable::URI.encode("Permissões.pdf") #=> "Permisso%CC%83es.pdf"
The encodings are different. But my database is set to UTF-8, and I am using HTML5. What could be causing this?
I am unable to download files I upload to S3 because of this issue. I tried to force the encoding with attachment.path.encode("UTF-8") but that makes no difference.
To solve this, since I am using Rails, I used ActiveSupport::Multibyte::Unicode to normalize any unicode characters before they get inserted into the database.
before_save do
  self.path = ActiveSupport::Multibyte::Unicode.normalize(path)
end
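To see why normalization matters here, a small illustration (assuming Ruby 2.2+ for String#unicode_normalize; ActiveSupport::Multibyte::Unicode.normalize covers older versions): the string in the source is precomposed (NFC), while the database value is decomposed (NFD), and the two URI-encode differently even though they render identically.

composed   = "Permiss\u00F5es.pdf"   # "õ" as a single codepoint (NFC), as typed in the source
decomposed = "Permisso\u0303es.pdf"  # "o" plus a combining tilde (NFD), as returned by the db

composed == decomposed                          #=> false
composed == decomposed.unicode_normalize(:nfc)  #=> true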

Isn't user data that comes in from a form in Rails going to be UTF-8 encoded?

A Rails 3.2 app I'm contributing to has a method that coerces user input to UTF-8.
require "iconv"
def normalize(user_input_text)
Iconv.new('UTF-8//IGNORE', 'UTF-8').iconv(user_input_text.dup)
end
It basically encodes the string in UTF-8 and ignores characters that can't be transcoded.
But isn't all user data that's entering Rails through a form going to be UTF-8 encoded?
In other words, isn't this code specious and unnecessary?
These resources suggest that you are indeed right.
Now that the vast majority of web input is UTF-8, we set the inbound parameters to UTF-8. This will eliminate many cases of incompatible encodings between ASCII-8BIT and UTF-8.
https://github.com/rails/rails/commit/25215d7285db10e2c04d903f251b791342e4dd6a
Rails 3 solves this very nicely by doing a number of things including interpreting params as UTF-8 and adding workarounds for Internet Explorer
http://jasoncodes.com/posts/ruby19-rails2-encodings
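For completeness: if you still wanted a scrubbing step without Iconv (deprecated in Ruby 1.9.3 and removed from the standard library in 2.0), a hedged sketch of a plain-Ruby equivalent:

def normalize(user_input_text)
  # Round-trip through UTF-16 so invalid bytes are actually dropped; a plain
  # encode("UTF-8") on a string already tagged as UTF-8 can be a no-op.
  user_input_text.encode("UTF-16", invalid: :replace, replace: "").encode("UTF-8")
end

# On Ruby 2.1+, String#scrub("") is an even shorter way to drop invalid bytes.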

Rails 3 dealing with special characters

I want to provide users with the ability to fill in an input field with special characters (e.g. ¥ and others).
The user input could be saved in an XML file and later fetched and rendered back into the form input.
What is the best practice for saving special symbols to XML (maybe using HTML entities or hexadecimal form)?
Thanks in advance.
I'd say that if you save the file in UTF-8 you will have no problems.
If some controller/view has problems with encoding, you have to place this on its first line:
# encoding: utf-8
There's nothing special about them and you don't need to encode them. Let your XML library deal with that; XML has supported Unicode from the start, and what you call "special symbols" are just Unicode characters.
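A minimal sketch of that advice (assuming the nokogiri gem; Rails' bundled Builder behaves the same way): write the characters as-is into a UTF-8 document and read them back without any entity encoding.

require "nokogiri"

builder = Nokogiri::XML::Builder.new(encoding: "UTF-8") do |xml|
  xml.preferences do
    xml.currency "¥"   # stored as a plain UTF-8 character, not an entity
  end
end
File.write("preferences.xml", builder.to_xml)

Nokogiri::XML(File.read("preferences.xml")).at("currency").text #=> "¥"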

cyrillic character escaping problem on RAILS JSON + EXTJS

I have a problem with the escaping of the cyrillic characters in the rails JSON output:
{"success":true,"total":"2","offices":[{"address":"addr","created_at":"2011-06-03T11:55:09Z","description":"desc","id":1,"name":"Office 1","published":true,"updated_at":"2011-06-05T13:48:35Z"},{"address":"\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd","created_at":"2011-06-03T12:32:19Z","description":"\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd","id":2,"name":"\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd 2","published":null,"updated_at":"2011-06-05T13:49:51Z"}]}
They are not properly decoded in ExtJS, and the result in the grid is �������� 2.
The page encoding is UTF-8. MySQL and Rails configs are set to UTF-8.
Any ideas ?
Any code?
All I can see is that your characters are already distorted in the JSON output (they are all the same if you take a closer look: "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd" is the Unicode replacement character), so the corruption happens before JSON serialization.
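A hedged way to confirm where the corruption happens (the model and attribute names are assumptions): inspect the strings in the Rails console before they ever reach to_json.

office = Office.find(2)
office.address.encoding          # expect #<Encoding:UTF-8>
office.address.valid_encoding?   # false here points at the stored bytes or the
                                 # MySQL connection encoding, not the JSON serializer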

Rails - Saving Mail Attachment in a Postgres DB, results in PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xa0

Has anyone seen this error before?
PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xa0
I'm trying to save incoming mail attachments, which can be of any file type, to the database for processing.
Any ideas?
What type of column are you saving your data to? If the attachment could be of any type, you need a bytea column to ensure that the data is simply passed through as a blob (binary "large" object). As mentioned in other answers, that error indicates that some data sent to PostgreSQL was tagged as UTF-8 text but was not valid UTF-8.
I'd recommend you store email attachments as binary along with their MIME content-type header. The Content-Type header should include the character encoding needed to convert the binary content to text for attachments where that makes sense: e.g. "text/plain; charset=iso-8859-1".
If you want the decoded text available in the database, you can have the application decode it and store the textual content, maybe having an extra column for the decoded version. That's useful if you want to use PostgreSQL's full-text indexing on email attachments, for example. However, if you just want to store them in the database for later retrieval as-is, just store them as binary and leave worrying about text encoding to the application.
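A hedged sketch of that layout (table and column names here are made up): a binary column maps to bytea in PostgreSQL, so the attachment bytes are never validated as UTF-8 text, and the MIME type is kept alongside for later decoding.

class CreateMailAttachments < ActiveRecord::Migration
  def change
    create_table :mail_attachments do |t|
      t.binary :body           # bytea in PostgreSQL: raw bytes, no encoding check
      t.string :content_type   # e.g. "text/plain; charset=iso-8859-1"
      t.string :filename
      t.timestamps
    end
  end
end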
The 0xa0 is a non-breaking space, possibly latin1 encoding. In Python I'd use str.decode() and str.encode() to change it from its current encoding to the target encoding, here 'utf8'. But I don't know how you'd go about it in Rails.
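In Ruby the equivalent is force_encoding plus encode; a hedged sketch, assuming the bytes really are Latin-1:

raw = "caf\xA0".force_encoding("ISO-8859-1")  # 0xA0 is a non-breaking space in Latin-1
utf8 = raw.encode("UTF-8")                    # transcode to the target encoding
utf8.valid_encoding?                          #=> true, safe to send to a UTF-8 PostgreSQL database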
I do not know about Rails, but when PostgreSQL gives this error message it means that:
the connection between PostgreSQL and your Rails client is correctly configured to use UTF-8 encoding, meaning that all text data going between the client and PostgreSQL must be encoded in UTF-8;
and your Rails client erroneously sent some data encoded in another encoding (most probably Latin-1 / ISO-8859), so PostgreSQL rejects it.
You must look into your client code where the data is inserted into the database; you are probably trying to insert a non-UTF-8 string, or some improper transcoding is taking place.
