Ruby on Rails and charsets

I'm having a problem dealing with charsets in a Ruby on Rails app, specifically in my templates. Text that comes from my database works fine, but characters like ç ~ that are located in my views are not displayed correctly. I added the following code to my controller:
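before_filter :configure_charsets

# Configure the charset as UTF-8
def configure_charsets
  headers["Content-Type"] = "text/html; charset=UTF-8"
end
That still doesn't work: I have ç ~ characters in my application.rhtml that still display incorrectly.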
I also added an HTML meta http-equiv tag setting the charset to UTF-8, and the .htaccess directive AddDefaultCharset UTF-8.
It's still not working. Any other tips?

Put this piece of code in your config (environment.rb):
Rails::Initializer.run do |config|
  config.action_controller.default_charset = "iso-8859-1"
end
This will do it.
Also, remove the default charset line, if there is one, from layouts/application.html.

Is the text editor you're using to put the special characters into the file (either source or views) treating those characters as UTF-8? For example, if you're using TextMate, you can deliberately save a file as UTF-8. If for some reason you used a different encoding earlier (a default, perhaps), those UTF-8 characters might be getting transcoded at the editing stage, so even if the rendering process uses UTF-8 throughout, it still won't work.
Further, if you're editing from a shell, in vi or similar, is your terminal set up to use UTF-8 by default? If it's set to ISO-8859-1 or something else, you'd get the same issue.
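A quick way to see the effect described here is to take the UTF-8 bytes of a character like ç and decode them as ISO-8859-1, which is roughly what happens when the bytes on disk and the encoding some tool assumes disagree. This is a plain-Ruby sketch with no Rails involved; `misread_as_latin1` is a made-up name for illustration.

```ruby
# Hypothetical helper: decode the UTF-8 bytes of a string as if they were
# ISO-8859-1, then convert to UTF-8 so the (garbled) result can be printed.
def misread_as_latin1(utf8_text)
  raw = utf8_text.b                          # copy of the bytes, tagged ASCII-8BIT
  latin1 = raw.force_encoding("ISO-8859-1")  # reinterpret each byte as one Latin-1 character
  latin1.encode("UTF-8")                     # transcode the misread text for display
end

puts misread_as_latin1("ç")    # Ã§
puts misread_as_latin1("ção")  # Ã§Ã£o
```

Each two-byte UTF-8 sequence turns into two Latin-1 characters, so output like "Ã§" in a page is a hint that UTF-8 bytes are being read as Latin-1 somewhere along the way.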

Is your application.rhtml file written in the correct character set? Make sure it's UTF-8, and not ISO-8859-1.
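One way to check this from Ruby, rather than trusting the editor's status bar, is to read the file's raw bytes and ask whether they form valid UTF-8. The sketch below uses temporary files in place of a real template; `valid_utf8_file?` and `scratch_file` are hypothetical helper names. A passing result doesn't prove the file was saved as UTF-8 (pure ASCII passes too), but a failing one is a strong sign it wasn't.

```ruby
require "tempfile"

# Hypothetical helper: true if every byte sequence in the file is valid UTF-8.
def valid_utf8_file?(path)
  File.binread(path).force_encoding("UTF-8").valid_encoding?
end

# Write raw bytes to a throwaway file (stand-in for a template like application.rhtml).
def scratch_file(content)
  file = Tempfile.new("template")
  file.binmode            # binary mode: bytes are written exactly as given
  file.write(content)
  file.close
  file
end

utf8   = scratch_file("Olá, ação".encode("UTF-8"))
latin1 = scratch_file("Olá, ação".encode("ISO-8859-1"))

puts valid_utf8_file?(utf8.path)    # true
puts valid_utf8_file?(latin1.path)  # false
```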

So if the contents of your file are UTF-8, and the output is being interpreted as UTF-8, something in between is changing the data. Can you give us the hex representation of the input bytes for one of your special characters (anything non-ASCII will be at least two bytes in UTF-8), and the hex representation of the output byte or bytes? Perhaps we can figure out what the change is and work back from there.
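Getting those hex values is straightforward in Ruby. A minimal sketch, with `hex_bytes` as a hypothetical helper name; the same character gives different bytes depending on the encoding it was saved in, which is exactly the comparison being asked for.

```ruby
# Hypothetical helper: show a string's raw bytes as space-separated hex pairs.
def hex_bytes(str)
  str.bytes.map { |byte| format("%02X", byte) }.join(" ")
end

puts hex_bytes("ç")                       # C3 A7 (two bytes in UTF-8)
puts hex_bytes("ç".encode("ISO-8859-1"))  # E7 (one byte in ISO-8859-1)
# For a file, read it in binary mode first (example path):
# puts hex_bytes(File.binread("app/views/layouts/application.rhtml"))
```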

Related

"—-" added to the HTML when converting a Markdown file to HTML using Jekyll

I used the Jekyll tool to convert a Markdown file to HTML. The conversion succeeded, but the following encoded punctuation characters were added at the top of the HTML, because the file is encoded as UTF-8:
"—-"
After changing the same Markdown file's encoding to ANSI in Notepad++ (Encoding option in the menu bar), the punctuation characters were no longer included in the generated HTML.
So we have to manually convert the Markdown file to ANSI before generating the HTML with Jekyll.
Is there any solution for this?
ï»¿ is the UTF-8 BOM, so that's probably what you are seeing, assuming you are looking at it using CP1252; and the — is something out of the General Punctuation block.
Proper diagnostics are not possible without an indication of which character encoding you are using instead of UTF-8 to view the file, and/or what exact bytes you have in the file, probably as a hex dump. The first few bytes (the BOM) would be EF BB BF. See also the character-encoding tag wiki for troubleshooting tips.
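To act on the hex-dump suggestion, Ruby can check the first bytes of the file and show how a viewer assuming CP1252 would render them. A small sketch, assuming the Markdown source or template is available as raw bytes; `starts_with_utf8_bom?` and `as_cp1252` are hypothetical names.

```ruby
# The UTF-8 byte order mark, as raw bytes.
UTF8_BOM = "\xEF\xBB\xBF".b

# True if the data begins with the UTF-8 BOM (EF BB BF).
def starts_with_utf8_bom?(data)
  data.b.start_with?(UTF8_BOM)
end

# Decode raw bytes the way a viewer assuming Windows-1252 (CP1252) would.
def as_cp1252(data)
  data.b.force_encoding("Windows-1252").encode("UTF-8")
end

sample = "\xEF\xBB\xBF# Heading\n".b
puts starts_with_utf8_bom?(sample)  # true
puts as_cp1252(UTF8_BOM)            # ï»¿
# For a real file (example name): starts_with_utf8_bom?(File.binread("index.md"))
```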
Quick googling indicates that Jekyll is highly allergic to UTF-8 BOM in its input, so it seems unlikely that it generates spurious BOM characters on output. I could speculate that the template file you are using has a BOM and that it is being faithfully included in the output, but I'm not really familiar enough with Jekyll to actually help troubleshoot any further.
Of course, as per the big ugly warnings all over the Jekyll site, I assume you have already made sure that your Markdown input doesn't have a BOM character. Many Windows editors are notorious for putting one in when you save as UTF-8; make sure you use "UTF-8 without a BOM" as the "Save As..." format -- and switch to an editor which offers this option if yours doesn't have it.
Try using charset=utf-8,
or
check whether your content has any straight double quotes (" ") or straight single quotes (' ') and remove them:
http://practicaltypography.com/straight-and-curly-quotes.html
This is an encoding format issue. Save the Markdown file as UTF-8 without BOM.
This will remove the punctuation characters from the HTML.
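If re-saving in an editor isn't convenient, the BOM can also be stripped by a script. This is a hypothetical helper (not part of Jekyll) that rewrites a file without its first three bytes when they are EF BB BF; it uses binary reads and writes so nothing else in the file, such as line endings, changes.

```ruby
require "tempfile"

# Hypothetical helper: remove a leading UTF-8 BOM (EF BB BF) from a file in
# place. Returns true if a BOM was removed, false if the file had none.
def strip_utf8_bom!(path)
  data = File.binread(path)
  return false unless data.start_with?("\xEF\xBB\xBF".b)

  File.binwrite(path, data.byteslice(3..-1))  # binary write: no newline or encoding conversion
  true
end

# Demo on a throwaway file standing in for a Markdown post.
post = Tempfile.new(["post", ".md"])
post.binmode
post.write("\xEF\xBB\xBF# Hello\n".b)
post.close

puts strip_utf8_bom!(post.path)  # true
puts strip_utf8_bom!(post.path)  # false (already stripped)
```

When Ruby code only needs to read such a file, `File.read(path, mode: "r:BOM|UTF-8")` skips a leading BOM during the read without modifying the file.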

Ruby on Rails locales utf-8 issues

I already had problems with UTF-8 encoding in my RoR app...
Some are fixed now, but some are still left.
I now force UTF-8 in my layout.
But I still have problems with German special characters (ä, ö, ü). My /config/locales/de.yml has lots of them. In the file they look fine :) (tested with RubyMine and nano).
But when I start the app, it crashes. The .yml file is encoded in UTF-8.
I've also tried this:
f\xC3\xBCr --> should be für
but I always got this:
incompatible character encodings: UTF-8 and ASCII-8BIT
Does anyone have some hints for me?
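The error itself can be reproduced in plain Ruby: it appears when a UTF-8 string containing non-ASCII characters is combined with a string tagged ASCII-8BIT (binary) that also contains non-ASCII bytes. Where the binary string comes from in this particular app is not known from the question; the variable names below are illustrative only.

```ruby
utf8_text = "Schlüssel"         # UTF-8 literal containing a non-ASCII character
binary    = "f\xC3\xBCr".b      # the UTF-8 bytes of "für", but tagged ASCII-8BIT

begin
  utf8_text + " " + binary
rescue Encoding::CompatibilityError => e
  puts e.message  # "incompatible character encodings: UTF-8 and ..." (wording varies by Ruby version)
end

# Re-tag the bytes with the encoding they are really in; the join then works.
fixed = binary.dup.force_encoding("UTF-8")
puts utf8_text + " " + fixed    # Schlüssel für
```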
It seems to me that the encoding of the app is set to UTF-8.
Are you sure that RubyMine saves your file as UTF-8?
You can add
# encoding: UTF-8
to the top of your files to make sure it is set. (Not sure whether this works in .yml files.)
Edit:
If you have pasted any text into the file, it may still contain the wrong encoding.
Move de.yml out of the project.
Create a new file de.yml:
de:
  first_translation: Ich möchte ein bisschen Müsli
If this works, then you need to retype everything from the old file. No copying!
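Before retyping the whole file, it may help to find which lines are actually broken. A minimal sketch, assuming the suspect text is pasted Latin-1 (or similar) bytes; `invalid_utf8_lines` is a hypothetical helper, and the YAML is parsed with Ruby's standard `yaml` library rather than through Rails' I18n.

```ruby
require "yaml"

# Hypothetical helper: return the (1-based) numbers of lines that are not
# valid UTF-8, e.g. text pasted in from a Latin-1 source.
def invalid_utf8_lines(data)
  bad = []
  data.b.each_line.with_index(1) do |line, lineno|
    bad << lineno unless line.dup.force_encoding("UTF-8").valid_encoding?
  end
  bad
end

good = "de:\n  first_translation: Ich möchte ein bisschen Müsli\n"
bad  = good.b + "  pasted: M\xFCsli\n".b   # 0xFC is "ü" in Latin-1, invalid in UTF-8

p invalid_utf8_lines(good)  # []
p invalid_utf8_lines(bad)   # [3]
puts YAML.safe_load(good)["de"]["first_translation"]  # Ich möchte ein bisschen Müsli
```

In the app itself this would be called as `invalid_utf8_lines(File.binread("config/locales/de.yml"))`.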

What encoding does the Windows 'hosts' file use?

What encoding does the Windows 'hosts' file use? Is it UTF-8, or ASCII plus the system code page? How should IDN entries (internationalized domain names with umlauts, etc.) be added, and can they be added at all?
It should be ANSI or UTF-8 without BOM. I just dealt with a server that had the hosts file encoding set to UCS-2 Little Endian, and that led to the file being ignored.
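To find out which of these encodings a given hosts file is in, checking its byte order mark is a reasonable first step. A sketch using a hypothetical `bom_kind` helper; it only looks at the BOM, so a file without one could be either ANSI or UTF-8 without BOM.

```ruby
# Hypothetical helper: report which Unicode byte order mark (if any) a file
# starts with. nil means no BOM: plain ASCII/ANSI, or UTF-8 without BOM.
# (A UTF-32LE BOM also starts with FF FE; that case is ignored here.)
def bom_kind(data)
  bytes = data.b
  if bytes.start_with?("\xEF\xBB\xBF".b)
    "UTF-8 with BOM"
  elsif bytes.start_with?("\xFF\xFE".b)
    "UTF-16LE (UCS-2 LE)"
  elsif bytes.start_with?("\xFE\xFF".b)
    "UTF-16BE (UCS-2 BE)"
  end
end

ucs2_hosts = "\xFF\xFE".b + "127.0.0.1 localhost\r\n".encode("UTF-16LE").b
p bom_kind(ucs2_hosts)                 # "UTF-16LE (UCS-2 LE)"
p bom_kind("127.0.0.1 localhost\r\n")  # nil
# On Windows: p bom_kind(File.binread('C:\Windows\System32\drivers\etc\hosts'))
```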
There is a wealth of information here:
https://serverfault.com/questions/452268/hosts-file-ignored-how-to-troubleshoot
The simple answer is
ANSI or UTF-8 WITH BOM.
(UTF-8 without BOM is NOT valid).
Details:
As far as I have tried, the encoding of the hosts file on Windows should be
ANSI or UTF-8 with BOM.
I know this question is many years old, but a colleague made the mistake of looking at this post and the ServerFault post, so I decided to add an answer.
1. Simple case: ASCII only
Works.
Without any multi-byte characters, this is equivalent to ANSI, and also to UTF-8 without BOM.
2. ANSI (with Japanese ANSI multi-byte characters)
Works.
Note: There are Japanese characters, but this is valid ANSI encoding in Windows.
In Japanese editions of Windows, code page 932 (cp932) is what "ANSI" refers to:
https://en.wikipedia.org/wiki/Code_page_932_(Microsoft_Windows)
3. UTF-8 with BOM
Works.
Note: BOM 付き means "with BOM".
4. UTF-8 without BOM
DOES NOT work.
5. Additional test cases
If you use emoji instead of Japanese, the result is the same.
Using emoji and saving as UTF-8 without BOM does not work (although other lines that don't include emoji may still work correctly).
Using emoji and saving as UTF-8 with BOM resolves the host correctly.
Note: If you use Notepad to check this yourself, be sure to put double quotes around the file name when you save it, or Notepad will create hosts.txt.
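Following the result of these tests (UTF-8 with BOM works), a script that generates hosts entries with non-ASCII names could write the BOM explicitly. This is only a sketch: `write_utf8_with_bom` is a hypothetical helper, `müller.test` is an illustrative name, and the CRLF line endings are the usual Windows convention rather than something the tests checked. It writes to a scratch file, since editing the real hosts file needs administrator rights.

```ruby
require "tempfile"

# Hypothetical helper: write hosts-style lines as UTF-8 with a BOM (EF BB BF),
# using CRLF line endings; binary write so no further conversion happens.
def write_utf8_with_bom(path, lines)
  body = lines.map { |line| "#{line}\r\n" }.join.encode("UTF-8")
  File.binwrite(path, "\xEF\xBB\xBF".b + body.b)
end

# Demo on a scratch file, not the real hosts file.
scratch = Tempfile.new("hosts")
scratch.close
write_utf8_with_bom(scratch.path, ["127.0.0.1 müller.test"])
p File.binread(scratch.path).byteslice(0, 3).bytes  # [239, 187, 191]
```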
Addendum (asked in a comment):
The hosts file supports inline comments.

Dealing with a non-ascii character in Rspec Testing

I'm using the DocSplit gem for Ruby 1.9.3 to create Unicode UTF-8 versions of Word documents. Today, to my surprise, while running a test on a particular piece of one of these documents, I started running into character encoding inconsistencies.
I have tried a number of different methods to resolve the issue, which I list below, but the best success I've had so far is removing all non-ASCII characters. This is far from ideal, as I don't think the characters are really going to be all that problematic in the DB.
gsub(/[^[:ascii:]]/, "")
This is a sample of what my output looks like vs. what I'm expecting:
My CODES'S APOSTROPHE
My CODES’S APOSTROPHE
The second apostrophe should look curly. If you paste it into irb, you get the following: \U+FFE2
I tried a regex specifically for this character, and it appears to work in Rubular. As soon as I put it in my model, however, I got a syntax error:
syntax error, unexpected $end, expecting ')'
raw_title = raw_title.gsub(/’/, "")
I also tried forcing the encoding to UTF-8, but everything is already UTF-8, so this has no effect. I tried forcing the output to US-ASCII, but I get a byte sequence error.
I also tried a few of the encoding options in the Ruby library. These basically did the same thing as the regex.
This all comes down to the fact that I'm trying to match output for testing purposes. Should I even be concerned about these special characters? Is there a better way to match these characters without blindly removing them?
Try adding:
# encoding: utf-8
at the top of the failing RSpec file. This should make lines like the following work in your spec:
raw_title = raw_title.gsub(/’/, "")
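On the question of matching these characters rather than deleting them: the curly apostrophe in the sample is normally U+2019 (RIGHT SINGLE QUOTATION MARK). A sketch that maps typographic quotes to their ASCII counterparts, so expected and actual strings can be compared, might look like this. Writing the characters as `\u` escapes keeps the Ruby source ASCII-only, which avoids depending on the source file's encoding. `straighten_quotes` is a hypothetical helper, not part of RSpec or DocSplit.

```ruby
# Map typographic quotes to ASCII equivalents instead of deleting them.
# The \u escapes keep this source file pure ASCII.
SMART_QUOTES = {
  "\u2018" => "'",  # left single quotation mark
  "\u2019" => "'",  # right single quotation mark (curly apostrophe)
  "\u201C" => '"',  # left double quotation mark
  "\u201D" => '"'   # right double quotation mark
}.freeze

def straighten_quotes(text)
  # gsub with a Hash looks up each matched character to find its replacement
  text.gsub(/[\u2018\u2019\u201C\u201D]/, SMART_QUOTES)
end

puts straighten_quotes("My CODES\u2019S APOSTROPHE")  # My CODES'S APOSTROPHE
```

In a spec, comparing `straighten_quotes(actual)` with a plain-ASCII expected string keeps the test readable while leaving the stored text untouched.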
I tried using the above example, but even after that it kept failing. So I used Iconv to convert that specific character. This is what I used:
Iconv.conv('ASCII//IGNORE', 'UTF8', text_to_be_converted)
I tried what was given in the following link: How to get rid of non-ascii characters in ruby
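Iconv was removed from Ruby's standard library in Ruby 2.0, so on newer Rubies a similar effect can be had with `String#encode`, telling it to replace characters that have no ASCII equivalent with an empty string. `to_ascii_ignoring` is a hypothetical wrapper name; the behaviour is similar in spirit to `ASCII//IGNORE`, not guaranteed identical.

```ruby
# Drop every character that cannot be represented in US-ASCII,
# in the spirit of Iconv's 'ASCII//IGNORE'.
def to_ascii_ignoring(text)
  text.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")
end

puts to_ascii_ignoring("My CODES\u2019S APOSTROPHE")  # My CODESS APOSTROPHE
```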

UTF-8 issue in Ruby on Rails with × character

<a class="close" href="#">×</a>
I get an error regarding the use of ×.
It's used in error messages in Twitter's Bootstrap framework, and I get an "invalid byte sequence in UTF-8" error when I try to use it. Is there any workaround, apart from using a normal x or X?
I have:
# Configure the default encoding used in templates for Ruby 1.9.
config.encoding = "utf-8"
in my application.rb.
This seems almost too simple, but why aren't you using the HTML entity &times;?
You need to set the encoding at the top of the file where that character is used. You can do this with:
# coding: utf-8
class MyClass
end
I haven't tried it in an ERB file, but I don't see why that would be any different. You can also use the word "encoding" instead of just "coding" if that feels better; at minimum, all that is required is "coding".
What editor are you using?
I suspect that you are saving the source file using an encoding other than UTF-8 (such as Latin-1 or ANSI on Windows), which is then causing ruby to fail to interpret the file correctly.
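This diagnosis can be illustrated directly: "×" (U+00D7) is two bytes in UTF-8 but the single byte D7 in Latin-1, and a lone D7 followed by an ASCII character is not valid UTF-8. The sketch below simulates a view saved as Latin-1; it assumes Latin-1 is the wrong encoding in play, which the answer only suspects.

```ruby
# "×" (U+00D7) is C3 97 in UTF-8 but the single byte D7 in ISO-8859-1 /
# Windows-1252. Simulate a view saved as Latin-1:
latin1_bytes = %(<a class="close" href="#">\xD7</a>).b

as_utf8 = latin1_bytes.dup.force_encoding("UTF-8")
puts as_utf8.valid_encoding?  # false: this is what "invalid byte sequence in UTF-8" means

# If the file really is Latin-1, transcode it once and save the result as UTF-8.
fixed = latin1_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts fixed                    # <a class="close" href="#">×</a>
puts fixed.valid_encoding?    # true
```

To convert an actual view file the same way, something like `File.binwrite(path, File.binread(path).force_encoding("ISO-8859-1").encode("UTF-8"))` would work, where `path` is whichever template holds the character.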
I've tried adding the times symbol to one of my views (using HAML), and it worked correctly. I'm using Vim as my editor and saving in UTF-8 without any BOM.
# encoding: utf-8
class ClassiClass
end
Everything works fine!
