Ruby On Rails and UTF-8 - ruby-on-rails

I have a Rails application with a SayController, a hello action, and a view template say/hello.html.erb. When I add a Cyrillic character like "ю", I get an error:
ArgumentError in SayController#hello
invalid byte sequence in UTF-8
Headers:
{"Cache-Control"=>"no-cache",
"X-Runtime"=>"11",
"Content-Type"=>"text/html; charset=utf-8"}
If I try to write this letter with embedded Ruby,
<%= "ю" %>
I don't get any error, but it displays a question mark in a black square (�) instead of the letter.
I use Windows 7 x64, Ruby 1.9.1p378, Rails 2.3.5, WEBrick server.

A likely cause of this error is that the file containing the Cyrillic letters is not encoded in UTF-8, but in some other Russian encoding such as KOI8-R or Windows-1251. Bytes written in such an encoding generally do not form valid UTF-8 sequences, so Ruby is right to reject them.
So double-check that your file is actually saved as UTF-8.
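One way to check this from Ruby 1.9 or later is sketched below. The helper names valid_utf8_bytes? and utf8_file? are made up for illustration. Note that passing the check only means the bytes are well-formed UTF-8 (a pure-ASCII file passes too); it does not prove the editor saved what you intended.

```ruby
# Hypothetical helper: true if the given bytes form a valid UTF-8 sequence.
# dup is used so the caller's string is not relabelled in place.
def valid_utf8_bytes?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: read a file as raw bytes and run the same check,
# e.g. on a view template.
def utf8_file?(path)
  valid_utf8_bytes?(File.binread(path))
end
```

For example, utf8_file?("app/views/say/hello.html.erb") should return false if the template was saved as KOI8-R with a "ю" in it, since that letter is the single byte 0xC0 there.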

Create an initializer file (e.g. encoding_fix.rb) under your_app/config/initializers with the following content:
Encoding.default_internal = Encoding::UTF_8 if RUBY_VERSION > "1.9"
Encoding.default_external = Encoding::UTF_8 if RUBY_VERSION > "1.9"
This sets Ruby's default internal and external encodings to UTF-8.

Related

How to change html encoded character to ascii character

I have a french character that is encoded as follows:
"Jos\xE9e"
I need to convert it to regular character because it produces this error on my server:
invalid byte sequence in UTF-8
What can I do to fix this error?
Rails 3 Ruby 1.9.2
That looks like "Josée" encoded in ISO 8859-1 (AKA Latin-1). You can use Iconv to convert it to UTF-8:
require 'iconv'
utf_string = Iconv.conv('UTF-8', 'ISO-8859-1', "Jos\xE9e")
Use an editor that supports UTF-8, and add a coding line at the top of every source file:
# coding: utf-8
If some input string is not utf-8, convert it to utf-8 first before processing:
input_str = "Jos\xE9e"
utf_input = input_str.force_encoding('iso-8859-1').encode('utf-8')
All of the above requires Ruby 1.9 or later. For more information, see the book Ruby Best Practices.
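As a rough sketch of the "convert it first" step using only String methods (Iconv was removed from the standard library in Ruby 2.0): the helper name to_utf8 is hypothetical, and ISO-8859-1 as the fallback is an assumption about where the input comes from, not something Ruby can detect for you.

```ruby
# Hypothetical helper: return a UTF-8 copy of str.
# If the bytes are already valid UTF-8 they are kept as-is; otherwise they
# are assumed (!) to be in assumed_source and transcoded from there.
def to_utf8(str, assumed_source = "ISO-8859-1")
  utf8 = str.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  str.dup.force_encoding(assumed_source).encode("UTF-8")
end

puts to_utf8("Jos\xE9e")
```

Strings that are already valid UTF-8 pass through unchanged, so the helper can be applied to mixed input without double-encoding it.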
You should use UTF-8 in all your source code. How about saving your file in UTF-8 encoding?

Ruby on Rails 3 => truncate method with special characters throws Encoding Incompatibility error

I need some help with the following. I got a string here which contains special characters e.g. ë, é etc. I can display them correctly in my view but once I call the truncate method, it throws the following error:
incompatible character encodings: ASCII-8BIT and UTF-8
The weird thing is that, when I inspect the encoding of the truncated string, it does give me UTF-8, which is what I need (and UTF-8 is used for my database).
my_string_with_special_characters.truncate(35).encoding.inspect
=> UTF-8
But when I call:
<%= my_string_with_special_characters.truncate(35) %>
=> incompatible character encodings: ASCII-8BIT and UTF-8
I have also tried the magic_encoding gem which prepends the magic comment
"encoding : utf-8" in all of my controller files, but I still got the incompatible character encoding error.
If anyone knows how to solve this, let me know. Much appreciated.
Alex
Try putting this line at the beginning of your file (for *.rb files):
# -*- encoding: utf-8 -*-

ActionView::Template::Error (incompatible character encodings: UTF-8 and ASCII-8BIT)

I am using Ruby 1.9.2, Rails 3.0.4/3.0.5 and Phusion Passenger 3.0.3/3.0.4. My templates are written in HAML and I am using the MySQL2 gem. I have a controller action that when passed a parameter that has a special character, like an umlaut, gives me the following error:
ActionView::Template::Error (incompatible character encodings: UTF-8 and ASCII-8BIT)
The error points to the first line of my HAML template, which has the following code on it:
<!DOCTYPE html>
My understanding is that this is caused because I have a UTF-8 string that is being concatenated with an ASCII-8BIT string, but I can't for the life of me figure out what that ASCII-8BIT string is. I have checked that the params in the action are encoded using UTF-8 and I have added an encoding: UTF-8 declaration to the top of the HAML template and the ruby files and I still get this error. My application.rb file has a config.encoding = "UTF-8" declaration in it as well and the following all result in UTF-8:
ENV['LANG']
__ENCODING__
Encoding.default_internal
Encoding.default_external
Here's the kicker: I cannot reproduce this result locally on Mac OS X using standalone Passenger or Mongrel in either development or production. I can only reproduce it on a production server running nginx + Passenger on Linux. I have verified in the production server's console that the commands listed above all result in UTF-8 as well.
Have you experienced this same error and how did you solve it?
After doing some debugging I found out the issue occurs when using the ActionDispatch::Request object which happens to have strings that are all coded in ASCII-8BIT, regardless of whether my app is coded in UTF-8 or not. I do not know why this only happens when using a production server on Linux, but I'm going to assume it's some quirk in Ruby or Rails since I was unable to reproduce this error locally. The error occurred specifically because of a line like this:
@current_path = request.env['PATH_INFO']
When this instance variable was printed in the HAML template it caused an error because the string was encoded in ASCII-8BIT instead of UTF-8. To solve this I did the following:
@current_path = request.env['PATH_INFO'].dup.force_encoding(Encoding::UTF_8)
This makes @current_path a duplicate of the string that is labelled with the proper UTF-8 encoding. The same error can also occur with other request-related data, such as request.headers.
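The mechanism can be reproduced outside Rails. The following is a minimal sketch in plain Ruby; as_utf8 and concat_error are hypothetical helpers, and raw_path merely stands in for a value like request.env['PATH_INFO'].

```ruby
# Hypothetical helper: relabel a binary string as UTF-8 without changing
# its bytes. dup keeps the original object untouched.
def as_utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: returns the error class raised by concatenating
# a and b, or nil if the concatenation works.
def concat_error(a, b)
  a + b
  nil
rescue Encoding::CompatibilityError => e
  e.class
end

# UTF-8 text with a non-ASCII character, like template output.
title = "\u00FCber uns "
# Non-ASCII bytes labelled ASCII-8BIT, like the request data described above.
raw_path = "/caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)

puts concat_error(title, raw_path).inspect          # the two strings clash
puts concat_error(title, as_utf8(raw_path)).inspect # relabelled copy is fine
```

The clash only happens when both strings contain non-ASCII bytes, which would explain why the error shows up only for parameters with umlauts and similar characters.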
MySQL could be the source of the troublesome ASCII-8BIT strings. Try putting the following in an initializer to at least eliminate this possibility:
require 'mysql'
class Mysql::Result
  def encode(value, encoding = "utf-8")
    String === value ? value.force_encoding(encoding) : value
  end
  def each_utf8(&block)
    each_orig do |row|
      yield row.map {|col| encode(col) }
    end
  end
  alias each_orig each
  alias each each_utf8
  def each_hash_utf8(&block)
    each_hash_orig do |row|
      row.each {|k, v| row[k] = encode(v) }
      yield(row)
    end
  end
  alias each_hash_orig each_hash
  alias each_hash each_hash_utf8
end
Edit: This may not be applicable to the mysql2 gem. It works for the mysql gem, however.

Character Encoding issue in Rails v3/Ruby 1.9.2

I get this error sometimes "invalid byte sequence in UTF-8" when I read contents from a file. Note - this only happens when there are some special characters in the string. I have tried opening the file without "r:UTF-8", but still get the same error.
open(file, "r:UTF-8").each_line { |line| puts line.strip(",") } # line.strip generates the error
Contents of the file:
# encoding: UTF-8
290919,"SE","26","Sk‰l","",59.4500,17.9500,, # this errors out
290956,"CZ","45","HornÌ Bradlo","",49.8000,15.7500,, # this errors out
290958,"NO","02","Svaland","",58.4000,8.0500,, # this works
This is the CSV file I got from outside and I am trying to import it into my DB, it did not come with "# encoding: UTF-8" at the top, but I added this since I read somewhere it will fix this problem, but it did not. :(
Environment:
Rails v3.0.3
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.5.0]
Ruby has a notion of an external encoding and an internal encoding for each file. This allows you to work with a file in UTF-8 in your source, even when the file is stored in a more esoteric format. If your default external encoding is UTF-8 (which it is if you're on Mac OS X), all of your file I/O is going to be in UTF-8 as well. You can check this using File.open('file').external_encoding. When you open your file and pass "r:UTF-8", you are forcing the same external encoding that Ruby is already using by default.
Chances are, your source document isn't in UTF-8 and those non-ASCII characters aren't mapping cleanly to UTF-8 (if they mapped correctly, you would get the correct characters and no error; if they mapped incorrectly but still formed valid sequences, you would get incorrect characters and no error). What you should do is determine the encoding of the source document, then have Ruby transcode the document on read, like so:
File.open(file, "r:windows-1251:utf-8").each_line { |line| puts line.strip(",") }
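To illustrate the "r:EXTERNAL:INTERNAL" mode string in isolation, here is a small sketch. The helper name read_lines_as_utf8 is made up, and windows-1251 is only a placeholder: you still have to find out which encoding your file really uses.

```ruby
# Hypothetical helper: read a file stored in source_encoding and return its
# lines as UTF-8 strings. The mode string "r:EXTERNAL:INTERNAL" tells Ruby
# the encoding on disk and the encoding to transcode to while reading.
def read_lines_as_utf8(path, source_encoding)
  File.open(path, "r:#{source_encoding}:UTF-8") do |file|
    file.each_line.map(&:chomp)
  end
end
```

Each returned line reports Encoding::UTF_8 from line.encoding, so later string operations no longer hit raw bytes from the source encoding. If the guessed source encoding is wrong, you may get mojibake or a conversion error instead.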
If you need help determining the encoding of the source, give this Python library a whirl. It's based on the automatic charset detection fallback that was in Seamonkey/Mozilla (and is possibly still in Firefox).
If you want to change your file's encoding, you can use the charlock_holmes gem:
https://github.com/brianmario/charlock_holmes
require 'charlock_holmes/string'
content = File.read('test2.txt')
if !content.is_utf8?
  detection = CharlockHolmes::EncodingDetector.detect(content)
  utf8_encoded_content = CharlockHolmes::Converter.convert content, detection[:encoding], 'UTF-8'
end
Then you can save your new content in a temp file and overwrite your original file.
Hope this helps.

rails 2.3.5 with ruby 1.9.1p429 : incompatible character encodings: ASCII-8BIT and UTF-8

I tried the ruby hacks for utf8 (from : http://gist.github.com/273741) ... and I'm still getting the following error:
ActionView::TemplateError (incompatible character encodings: ASCII-8BIT and UTF-8)
What is bizarre to me is that the same content displays fine when it is retrieved with a POST action (searching the app with an HTML form), but with GET (using an HTML link) it reports a character incompatibility!
Do you have any idea where this comes from? Are there Rails/Ruby patches for this issue?
Thanks,
I think your problem comes from the template being encoded in UTF-8 rather than in ASCII, as expected.
In Rails 3, there is a new configuration option for that:
# Configure the default encoding used in templates for Ruby 1.9.
config.encoding = "utf-8"
