Using Rails 2.3 with Ruby 1.8.7.
I am working with a SQL Server database on a Windows server with collation
SQL_Latin1_General_CP1_CI_AS
When I go to the Rails console on the Linux server that hosts the app and query the problem record, I get
=> "Rodríguez, César"
To isolate the problem, in my controller I tried just render :text => with the record's problem field, but in the browser I see
Rodr?guez, C?sar
I believe this is an encoding issue, but I don't know how to resolve it (and my Google and Stack Overflow skills are failing me). Given that the source data can't be changed, what do I need to do on the Rails side to get the text to render properly?
In Chrome I have tried to change the encoding manually, and no matter which one I select I can't get the text to render correctly.
Also, why would it render correctly in the console?
Firefox and Chrome both default to Unicode for character encoding, so check whether you have tried both.
You need to check and confirm a few things:
--Meta tags in the HTML page. Check the charset in the page source; change it to utf-8 in the layout and try again.
--Database encoding.
--Select a character set that contains mappings for all the characters that the application and its users will want to see.
There may be better solutions, but you could also try the Inkscape command-line tool to convert the text to image files and display those instead.
Encoding is handled here with no issues currently.
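One way to narrow this down is to look at the raw bytes the adapter returns rather than at how a terminal or browser renders them. The sketch below is an illustration, not a diagnosis of this particular setup: byte_dump is a hypothetical helper, and the two sample strings are hand-written stand-ins for what the problem field might contain. In Latin-1 (and Windows-1252) "í" is the single byte ED, while in UTF-8 it is the pair C3 AD.

```ruby
# Hypothetical helper: show a string as space-separated hex bytes.
# String#unpack("C*") behaves the same on Ruby 1.8.7 and on current Rubies.
def byte_dump(str)
  str.unpack("C*").map { |byte| "%02X" % byte }.join(" ")
end

# Hand-written samples (assumptions, not data from the real database).
latin1_sample = "Rodr\xEDguez"      # "í" as one Latin-1 byte
utf8_sample   = "Rodr\xC3\xADguez"  # "í" as two UTF-8 bytes

puts byte_dump(latin1_sample)  # 52 6F 64 72 ED 67 75 65 7A
puts byte_dump(utf8_sample)    # 52 6F 64 72 C3 AD 67 75 65 7A
```

If a dump of the real field shows single bytes such as ED or E9, the data is arriving as Latin-1 and would have to be converted, or the response charset changed, before a UTF-8 page can display it.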
I'm using Ruby 2.0 and Rails 3.2.14. My view is littered with several UTF-8 characters, mainly currency symbols like บาท and د.إ. I noticed some
(ActionView::Template::Error) "incompatible character encodings: ASCII-8BIT and UTF-8"
errors in production and promptly tried visiting the page URL in my browser, without any issues. On digging in, I realised the error was actually caused by BingBot and a few other spiders. When I tried to curl the same URL, I was able to reproduce the issue. So, if I try
curl http://localhost:3000/?x=✓
I get the error wherever UTF-8 symbols are used in the view code. I also realised that if I use HTML-encoded strings in place of the symbols, this does not occur. However, I prefer using the actual symbols.
I have already tried setting Encoding.default_external = Encoding::UTF_8 in environment.rb and adding the # encoding: utf-8 magic comment to the top of the file, and neither helps.
So, why does this error occur? What is the difference between hitting this URL in the browser and with curl, besides cookies? And how do I go about fixing this issue so that BingBot can index our site? Thanks.
The culprit leaking non-UTF-8 characters into my template was an innocuous meta tag for Facebook Open Graph:
%meta{property: "og:url", content: request.url}
When the request is non-standard, this causes the encoding issue. Changing it to
%meta{property: "og:url", content: request.url.force_encoding('UTF-8')}
did the trick.
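A slightly more defensive variant of the same idea, as a sketch: force_encoding only relabels the bytes, so a request URL containing bytes that are not valid UTF-8 would still be invalid afterwards. The helper name utf8_safe is hypothetical, and String#scrub assumes Ruby 2.1 or later.

```ruby
# Hypothetical helper: tag a string as UTF-8 and replace any bytes that do
# not form valid UTF-8 with "?". Works on a copy, so the input is untouched.
def utf8_safe(value)
  text = value.to_s.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub("?")
end

# Bytes as they might arrive from a raw request, tagged ASCII-8BIT.
raw_url = "http://localhost:3000/?x=\xE2\x9C\x93".b

safe = utf8_safe(raw_url)
puts safe.encoding         # UTF-8
puts safe.valid_encoding?  # true
```

In the template above this would read content: utf8_safe(request.url).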
That error message usually occurs when you try to concatenate strings with different character encodings.
Is your database set to use UTF-8 as well?
If not, you could have a problem when you try to insert the non-UTF8 values into your UTF-8 template.
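The concatenation failure can be reproduced outside Rails. This is a minimal sketch: concat_error is a hypothetical helper, and the two strings stand in for view text and for a raw request parameter.

```ruby
# Hypothetical helper: returns the exception class raised when
# concatenating a and b, or nil when the concatenation succeeds.
def concat_error(a, b)
  a + b
  nil
rescue Encoding::CompatibilityError => e
  e.class
end

view_text = "\u0E1A\u0E32\u0E17"  # "บาท", a UTF-8 string as in the view
raw_param = "\xE2\x9C\x93".b      # the bytes of "✓", tagged ASCII-8BIT

p concat_error(view_text, raw_param)
# Encoding::CompatibilityError
p concat_error(view_text, raw_param.dup.force_encoding(Encoding::UTF_8))
# nil
p concat_error(view_text, "x=1".b)
# nil (ASCII-only bytes are compatible with UTF-8)
```

The last call shows why the error only appears for some requests: binary strings are harmless until they contain a non-ASCII byte.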
I am running into an issue where a controller receives a string that is then assigned to an attribute of one of my models, which I then save to the database. A log message with an inspect call shows that the model holds the full string right up until the #save call. The problem is that, without any error being raised, if the string contains a French character, everything from that character to the end of the string is truncated.
Further investigation seems to show that the string gets truncated when being written to the MySQL database. I also came across this article: Stale Rails Issue
If I am reading that right, it looks like characters that are not in the ASCII character encoding but are in the ISO Latin-1 character encoding are subject to this bug. I actually upgraded my project from Rails 3.0 to Rails 3.2 and from Ruby 1.8 to Ruby 1.9 so I could easily use the mysql2 adapter with Rails which some other articles seemed to suggest might solve the issue. However it didn't.
So how do I prevent the string truncation from happening?
Edit1: If I enter the query SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; I get:
Variable Name, Value
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'latin1'
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8_unicode_ci'
'collation_server', 'latin1_swedish_ci'
Also, I noticed that if I insert the French character via the MySQL Query Browser and then refresh the Rails app in my browser so that it pulls the new data from the database, it displays correctly. The character only seems to be dropped when saving the model data.
Edit2: I changed some config parameters to try to fix the problem, but it still exists. These are the values I changed them to:
Variable Name, Value
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'utf8'
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8_unicode_ci'
'collation_server', 'utf8_unicode_ci'
You are using utf8, but the collation matters too. utf8_general_ci performs better but can cause problems with German; if that matters to you, use utf8_unicode_ci. That covers the database; for more information on MySQL character sets, see MySQL's charset-unicode-sets documentation.
On the Rails and Ruby side, check out these questions: French accents in ruby, and Rails messages in french.
As a last resort you could HTML-encode the data before inserting it into the database. This can break searches, but if you also encode the search terms before querying the database, everything should be fine. For more information, see the French characters in rails page. I hope this helps; if you keep getting errors, please tell me so I can look for other ways to help.
Also, the comment by @Ahmed Ali could help you out; it looks like the encodings get changed:
Fetching data from any database (Mysql, Postgresql, Sqlite2 & 3), all configured to have UTF-8 as their character set, returns the data with ASCII-8BIT in ruby 1.9.1 and rails 2.3.2.1.
See the link Ahmed posted for the complete answer and the link to the page from where the quote was taken, (ASCII-8BIT encoding of query results in rails 2.3.2 and ruby 1.9.1).
Sorry for all the trouble. I'll just put down the answer. It turned out that the database was correctly set up for utf8, but a user was inputting strings encoded in ISO Latin-1, and I wasn't checking the encoding of user input because I assumed all input would be UTF-8 compatible. It turns out that French accented characters encoded in ISO Latin-1 are invalid byte sequences in UTF-8. The database seems to handle this by raising a warning and truncating the string at the invalid character while keeping everything before it.
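A check of the kind described could look like the sketch below. It assumes Ruby 1.9 or later, and it assumes, as in this thread, that any input which is not valid UTF-8 is ISO Latin-1; in general the encoding of arbitrary bytes cannot be detected reliably. The method name normalize_to_utf8 is hypothetical.

```ruby
# Hypothetical helper: return the input as valid UTF-8.
# Bytes that already form valid UTF-8 are kept; anything else is
# ASSUMED to be ISO-8859-1 and transcoded from that encoding.
def normalize_to_utf8(input)
  text = input.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  input.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1_input = "fran\xE7ais".b      # "ç" as the single Latin-1 byte E7
utf8_input   = "fran\xC3\xA7ais".b  # "ç" as the UTF-8 pair C3 A7

puts normalize_to_utf8(latin1_input)  # français
puts normalize_to_utf8(utf8_input)    # français
```

Running user input through such a step before assigning it to the model means the database never sees the invalid byte that triggers the truncation.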
I've got a Grails (2.0.4) application, all set up to handle UTF-8 encoding (meta tag in the layout, MySQL database tables). Unfortunately, something strange happens.
For example, if in a form (to create a domain instance) I type any text containing non-UK characters, like this:
más que nada
the POST contains the exact text (with the "á" character as is) but the params variable in the controller contains the wrong text:
mÃ¡s que nada
There's nothing between the view and the controller, how can this happen?
I also tried, without good results, to set in Config.groovy:
grails.views.default.codec = "html"
Is there something else I'm missing to set up?
Thanks in advance to everyone who will take the time to have a look at this issue.
How about these values in your Config.groovy:
grails.views.default.codec = "none"
grails.views.gsp.encoding = "UTF-8"
grails.converters.encoding = "UTF-8"
Are those properly configured?
On prod I have configured my tomcat 6 in server.xml as
<Connector port="14080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="14443" URIEncoding="UTF-8"/>
The most important line is URIEncoding="UTF-8"
What's the default charset of your MySQL database? Is it ok?
This is how I create my MySQL databases:
create database [dbname] DEFAULT CHARACTER SET = utf8 DEFAULT COLLATE utf8_swedish_ci;
see http://dev.mysql.com/doc/refman/5.5/en/create-database.html for full syntax of CREATE DATABASE
Collation affects sorting. You can get a list with "show collation" sql statement in mysql. http://dev.mysql.com/doc/refman/5.1/en/show-collation.html
Changing an existing table's encoding is done with this command:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name [COLLATE collation_name];
You can check the encoding of an existing table with the "show create table tbl_name" command. Changing the default encoding of the database doesn't change the encoding of existing tables (or tables imported from a mysql dump).
Did you already try with
${myHtmlContent.encodeAsHtml()}
in your view?
This post is a few months old and the OP may have found a better solution by now. An alternative that has worked for me is to explicitly re-decode the parameter in question.
For instance, params.paramsname = new String(params.unicodeInput.getBytes("8859_1"), "UTF8");
This forces paramsname to be decoded correctly as Unicode.
I just ran into this problem myself; keep in mind that this is only a workaround. I'm still looking for a better solution too. Cheers!
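Why that round trip works can be shown with a small sketch. It is written in Ruby only to match the other examples in this collection (the workaround itself is Groovy/Java), and both method names are hypothetical. The bug is simulated by decoding UTF-8 bytes as ISO-8859-1; the repair reverses exactly that step.

```ruby
# Simulate the bug: UTF-8 bytes decoded as if they were ISO-8859-1.
def misread_as_latin1(text)
  text.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Undo it: turn the characters back into Latin-1 bytes, then read
# those bytes as UTF-8 (the Ruby analogue of getBytes + new String).
def repair_mojibake(text)
  text.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
end

original = "m\u00E1s que nada"             # "más que nada"
garbled  = misread_as_latin1(original)

puts garbled                               # mÃ¡s que nada
puts repair_mojibake(garbled) == original  # true
```

The repair only makes sense for text that really was misread this way; applied to a string containing characters outside Latin-1, the encode step raises Encoding::UndefinedConversionError.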
I'm sorry, I figured out what the problem was days ago but I haven't had time to answer my own question until now.
Unfortunately I forgot to mention a key part of the problem, because I didn't think it was related. I got the encoding problem only on AJAX calls, and I didn't mention it because all saves in my application are done through AJAX.
So, the encoding problem was related to the content type configured for the jQuery post, which (to work properly with UTF-8) has to be like this:
contentType: "application/x-www-form-urlencoded;charset=UTF-8"
Given a Rails model column that contains "Something & Something Else", when outputting with to_xml
Rails will escape the ampersand like so:
<MyElement>Something &amp; Something Else</MyElement>
Our client software is all UTF aware, and it would be better if we could just leave the column content raw in our XML output.
There was an old solution that worked by setting $KCODE="UTF8" in an environment file, but this trick no longer works, and it was always an all-or-nothing solution.
Any recommendations on how to disable this, ideally on a case-by-case basis?
It does not matter if the client software is UTF-8-aware. An ampersand cannot be used unescaped in XML. If the software is supposed to also be XML-aware, then any content that includes ampersands is not allowed to be kept "raw".
This is nothing to do with Unicode (or "UTF"). Ampersands in XML must be escaped, otherwise it isn't XML, and no XML software will accept it. If you're saying you want the escaping disabled, then you're saying you don't want the output to be XML.
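To make the point concrete, here is a minimal sketch of the escaping itself. It uses Ruby's core String#encode with the xml: :text option rather than the Builder code path behind to_xml, so it illustrates the rule, not Rails' implementation; xml_text is a hypothetical helper name.

```ruby
# Hypothetical helper: escape a value for use as XML text content.
# String#encode(xml: :text) replaces &, < and > with entity references.
def xml_text(value)
  value.encode(xml: :text)
end

raw = "Something & Something Else"

puts "<MyElement>#{xml_text(raw)}</MyElement>"
# <MyElement>Something &amp; Something Else</MyElement>
```

Any conforming XML parser on the client side turns &amp; back into & when it reads the element, so the application still sees the raw column content.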
In Ruby on Rails 3 (currently using Beta 4), I see that when using the form_tag or form_for helpers there is a hidden field named _snowman with the value of ☃ (Unicode U+2603, decimal 9731) showing up.
So, what is this for?
This parameter was added to forms in order to force Internet Explorer (5, 6, 7 and 8) to encode its parameters as unicode.
Specifically, this bug can be triggered if the user switches the browser's encoding to Latin-1. To understand why a user would decide to do something seemingly so crazy, check out this google search. Once the user has put the web-site into Latin-1 mode, if they use characters that can be understood as both Latin-1 and Unicode (for instance, é or ç, common in names), Internet Explorer will encode them in Latin-1.
This means that if a user searches for "Ché Guevara", it will come through incorrectly on the server-side. In Ruby 1.9, this will result in an encoding error when the text inevitably makes its way into the regular expression engine. In Ruby 1.8, it will result in broken results for the user.
By creating a parameter that can only be understood by IE as a unicode character, we are forcing IE to look at the accept-charset attribute, which then tells it to encode all of the characters as UTF-8, even ones that can be encoded in Latin-1.
Keep in mind that in Ruby 1.8, it is extremely trivial to get Latin-1 data into your UTF-8 database (since nothing in the entire stack checks that the bytes that the user sent at any point are valid UTF-8 characters). As a result, it's extremely common for Ruby applications (and PHP applications, etc. etc.) to exhibit this user-facing bug, and therefore extremely common for users to try to change the encoding as a palliative measure.
All that said, when I wrote this patch, I didn't realize that the name of the parameter would ever appear in a user-facing place (it does with forms that use the GET action, such as search forms). Since it does, we will rename this parameter to _e, and use a more innocuous-looking unicode character.
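The property the hidden field relies on can be checked directly: the snowman has no Latin-1 representation, while a name like "Ché Guevara" does. This is a sketch with a hypothetical helper name, and it says nothing about how Internet Explorer implements its check.

```ruby
# Hypothetical helper: true when every character of text can be
# represented in ISO-8859-1 (Latin-1).
def latin1_encodable?(text)
  text.encode(Encoding::ISO_8859_1)
  true
rescue Encoding::UndefinedConversionError
  false
end

snowman = "\u2603"            # ☃, code point 0x2603 (9731 in decimal)
name    = "Ch\u00E9 Guevara"  # "Ché Guevara"

puts snowman.ord                 # 9731
puts latin1_encodable?(name)     # true
puts latin1_encodable?(snowman)  # false

# The same name occupies a different number of bytes in each encoding.
puts name.encode(Encoding::ISO_8859_1).bytes.length  # 11
puts name.bytes.length                               # 12
```

Because the name alone fits in Latin-1, a browser switched to Latin-1 has no reason to fall back to accept-charset; the snowman removes that option.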
This is here to support Internet Explorer 5 and encourage it to use UTF-8 for its forms.
The commit message seen here details it as follows:
Fix several known web encoding issues:
Specify accept-charset on all forms. All recent browsers, as well as IE5+, will use the encoding specified for form parameters
Unfortunately, IE5+ will not look at accept-charset unless at least one character in the form's values is not in the page's charset. Since the user can override the default charset (which Rails sets to UTF-8), we provide a hidden input containing a unicode character, forcing IE to look at the accept-charset.
Now that the vast majority of web input is UTF-8, we set the inbound parameters to UTF-8. This will eliminate many cases of incompatible encodings between ASCII-8BIT and UTF-8.
You can safely ignore params[:_snowman]
In short, you can safely ignore this parameter.
Still, I am not sure why we're supporting old technologies like Internet Explorer 5. It seems like a very non-Ruby on Rails decision if you ask me.