Prawn encoding not correct - ruby-on-rails

I have problems running this code with Prawn:
require 'prawn'
Prawn::Document.generate "example.pdf" do |pdf|
  pdf.text_box "W\xF6rth".force_encoding('UTF-8'), :at => [200,720], :size => 32
end
Somehow I get this error:
`rescue in normalize_encoding': Arguments to text methods must be UTF-8 encoded (Prawn::Errors::IncompatibleStringEncoding)
But when I try this code, it works:
pdf.text_box "Wörth".force_encoding('UTF-8')
What am I doing wrong? How can I also fix my first example with the \xF6 in the string? Thanks!

"W\xF6rth" is not a valid UTF-8 sequence.
"W\xF6rth".valid_encoding?
=> false
The highest character code that UTF-8 encodes in a single byte is 0x7F; anything above that needs two or more bytes.
"Wörth".bytes.map { |b| b.to_s(16) }
=> ["57", "c3", "b6", "72", "74", "68"]
           ^^----^^ <-- Two bytes representing UTF-8 "ö"
I think you're trying to convert ISO-8859-1 to UTF-8.
In ISO-8859-1 "ö" is 0xF6.
This is what should work in your case:
"W\xf6rth".force_encoding('iso-8859-1').encode('utf-8')
=> "Wörth"
I.e.
pdf.text_box "W\xF6rth".force_encoding('iso-8859-1').encode('utf-8'), ...
References:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
http://en.wikipedia.org/wiki/UTF-8
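The conversion above can be wrapped in a small helper that also checks the result. This is a minimal sketch, assuming the input bytes really are ISO-8859-1; the helper name `latin1_to_utf8` is hypothetical and not part of Prawn:

```ruby
# Hypothetical helper: reinterpret raw bytes as ISO-8859-1 and transcode to UTF-8.
def latin1_to_utf8(raw)
  # dup so the caller's string is not retagged in place
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "W\xF6rth"
puts raw.valid_encoding?    # false: a lone 0xF6 byte is not valid UTF-8
text = latin1_to_utf8(raw)
puts text.valid_encoding?   # true
puts text.bytes.map { |b| b.to_s(16) }.inspect
```

The returned string can then be passed to pdf.text_box in place of the original literal.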

Related

How to decode utf8 characters with ruby Base64

Just as the title describes, how can I do the following?
require 'base64'
text = 'éééé'
encode = Base64.encode64(text)
Base64.decode64(encode)
Desired result: éééé instead of \xC3\xA9\xC3\xA9
When you decode64 you get back a string with BINARY (a.k.a. ASCII-8BIT) encoding:
Base64.decode64(encode).encoding
# => #<Encoding:ASCII-8BIT>
The trick is to force-apply a particular encoding:
Base64.decode64(encode).force_encoding('UTF-8')
# => "éééé"
This assumes that your string is valid UTF-8, which it might not be, so use with caution.
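One way to act on that caution is to verify the bytes after tagging them. This is a minimal sketch; the method name `decode64_utf8` is made up for illustration:

```ruby
require 'base64'

# Hypothetical helper: decode Base64 and tag the result as UTF-8,
# raising if the decoded bytes are not valid UTF-8.
def decode64_utf8(encoded)
  decoded = Base64.decode64(encoded).force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'decoded data is not valid UTF-8' unless decoded.valid_encoding?
  decoded
end

puts decode64_utf8(Base64.encode64('éééé'))
```

Raising on invalid input is only one possible choice; replacing the bad bytes would be another.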
Just use Base64's encode and decode methods:
require 'base64'
=> true
Base64.encode64('aksdfjd')
=> "YWtzZGZqZA==\n"
Base64.decode64 "YWtzZGZqZA==\n"
=> "aksdfjd"

rails incompatible character encodings: UTF-8 and ASCII-8BIT in json

I use RestClient to retrieve a json string from a webservice via GET.
This works fine, but as soon as there are umlauts (ü) and other characters (e.g. ß) in the string, I get this error in my view:
@output = RestClient.get 'https://myurl.com/api/v1/orders/53e0ae7f6630361c46060000', {:authorization => 'Token xxxxxx', :content_type => :json, :accept => :json}
<%= @output %>
=>
Encoding::CompatibilityError
incompatible character encodings: UTF-8 and ASCII-8BIT
Any idea how to solve this?
Solved after adding this line:
@output = @output.force_encoding('utf-8').encode
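The error comes from combining a UTF-8 string that contains non-ASCII characters with a response body tagged as ASCII-8BIT. This is a minimal sketch that reproduces it without RestClient; the `body` string stands in for the HTTP response and is an assumption:

```ruby
# Stand-in for a response body: the UTF-8 bytes of "Grüße", tagged as binary.
body = "Gr\xC3\xBC\xC3\x9Fe".b
template = 'Größe: ' # a UTF-8 string with non-ASCII characters, like a view

begin
  template + body
rescue Encoding::CompatibilityError => e
  puts e.message
end

# Retagging the bytes as UTF-8 (no transcoding) makes the two strings compatible.
fixed = body.dup.force_encoding(Encoding::UTF_8)
puts template + fixed
```

This only helps when the service really sends UTF-8; bytes in another encoding would need transcoding instead of retagging.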

"\xC2" to UTF-8 in conversion from ASCII-8BIT to UTF-8

I have a rails project that runs fine with MRI 1.9.3. When I try to run with Rubinius I get this error in app/views/layouts/application.html.haml:
"\xC2" to UTF-8 in conversion from ASCII-8BIT to UTF-8
It turns out the page had an invalid character (an interpunct '·'), which I found out with the following code (credits to this gist and this question):
lines = IO.readlines("app/views/layouts/application.html.haml").map do |line|
  line.force_encoding('ASCII-8BIT').encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?')
end
File.open("app/views/layouts/application.html.haml", "w") do |file|
  file.puts(lines)
end
After running this code, I could find the problematic characters with a simple git diff and moved the code to a helper file with # encoding: utf-8 at the top.
I'm not sure why this doesn't fail with MRI, but it should, since I'm not specifying the encoding of the haml file.
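A less invasive way to locate such characters is to scan the file and report the offending lines instead of rewriting it. This is a minimal sketch; the helper name `non_ascii_lines` is hypothetical:

```ruby
# Hypothetical helper: return [line_number, line] pairs for every line
# that contains a byte outside the 7-bit ASCII range.
def non_ascii_lines(path)
  File.binread(path).each_line.with_index(1)
      .reject { |line, _number| line.ascii_only? }
      .map { |line, number| [number, line.chomp] }
end

path = 'app/views/layouts/application.html.haml'
if File.exist?(path)
  non_ascii_lines(path).each { |number, line| puts "#{number}: #{line.inspect}" }
end
```

Reading in binary mode avoids any encoding errors while scanning, and the file on disk is left untouched.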

UTF-8 conversion not working with String#encode but Iconv

I had this with Iconv:
git_log = Iconv.conv 'UTF-8', 'iso8859-1', git_log
Now I want to change it to use String#encode because of deprecation warnings, but this doesn't work:
git_log = git_log.encode(Encoding::UTF_8, :invalid => :replace, :undef => :replace, :replace => '')
I used to use Iconv here, and it's still working:
https://github.com/gamersmafia/gamersmafia/blob/master/lib/formatting.rb#L244
But when I replace these lines with the String#encode method, the first gsub raises an "invalid byte sequence in UTF-8" error.
Do you know why?
In your call to String#encode you don't specify a source encoding. Ruby uses the string's current encoding as the source, which appears to be UTF-8, and according to the docs:
Please note that conversion from an encoding enc to the same encoding enc is a no-op, i.e. the receiver is returned without any changes, and no exceptions are raised, even if there are invalid bytes.
In other words the call has no effect, and leaves the bytes in the string as they are, encoded as ISO-8859-1. The next call to gsub then tries to interpret these bytes as UTF-8, and since they are invalid (they are unchanged from ISO-8859-1) you get the error you see.
String#encode has a form that accepts the source encoding as the second parameter, so you can specify it explicitly, much as you do with Iconv. Try this:
git_log = git_log.encode(Encoding::UTF_8,
                         Encoding::ISO_8859_1,
                         :invalid => :replace,
                         :undef => :replace,
                         :replace => '')
You could also use the ! form in this case, which has the same effect:
git_log.encode!(Encoding::UTF_8,
                Encoding::ISO_8859_1,
                :invalid => :replace,
                :undef => :replace,
                :replace => '')
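To see the effect of naming the source encoding, here is a minimal sketch with a made-up log line (the sample bytes are an assumption, not real git output):

```ruby
# ISO-8859-1 bytes for "Jörg fixed encoding" sitting in a string tagged as UTF-8.
git_log = "J\xF6rg fixed encoding"
puts git_log.valid_encoding?   # false

# Name ISO-8859-1 as the source so the bytes are actually transcoded.
converted = git_log.encode(Encoding::UTF_8, Encoding::ISO_8859_1,
                           :invalid => :replace, :undef => :replace, :replace => '')
puts converted.valid_encoding? # true
puts converted.gsub(/fixed/, 'repaired')
```

Once the string holds valid UTF-8, later calls such as gsub operate on it without an encoding error.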
Try the following approach, which removes a character from a string if the character is mal-encoded:
invalid_character_indices = []
mystring.each_char.with_index do |char, i|
  invalid_character_indices << i unless char == char.encode(Encoding::UTF_8, Encoding::ISO_8859_1, :invalid => :replace, :undef => :replace, :replace => "")
end
invalid_character_indices.each do |i|
  mystring.delete!(mystring[i])
end
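If the goal is only to drop bytes that are invalid in the string's current encoding, a shorter route is possible. This is a minimal sketch, assuming Ruby 2.1 or later for String#scrub; `mystring` here is sample data:

```ruby
mystring = "W\xF6rth" # tagged as UTF-8, with one invalid byte

# Keep only the characters that are valid in the current encoding.
cleaned = mystring.each_char.select(&:valid_encoding?).join

# String#scrub does a similar job in one call; '' is the replacement text.
scrubbed = mystring.scrub('')

puts cleaned
puts scrubbed
```

Neither variant mutates `mystring`, and neither depends on character indices that shift while deleting.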

How to change the encoding during CSV parsing in Rails

I would like to know how I can change the encoding of my CSV file when I import and parse it. I have this code:
csv = CSV.parse(output, :headers => true, :col_sep => ";")
csv.each do |row|
  row = row.to_hash.with_indifferent_access
  insert_data_method(row)
end
When I read my file, I get this error:
Encoding::CompatibilityError in FileImportingController#load_file
incompatible character encodings: ASCII-8BIT and UTF-8
I read about row.force_encoding('utf-8') but it does not work:
NoMethodError in FileImportingController#load_file
undefined method `force_encoding' for #<ActiveSupport::HashWithIndifferentAccess:0x2905ad0>
Thanks.
I had to read CSV files encoded in ISO-8859-1.
Doing the documented
CSV.foreach(filename, encoding:'iso-8859-1:utf-8', col_sep: ';', headers: true) do |row|
threw the exception
ArgumentError: invalid byte sequence in UTF-8
from csv.rb:2027:in '=~'
from csv.rb:2027:in 'init_separators'
from csv.rb:1570:in 'initialize'
from csv.rb:1335:in 'new'
from csv.rb:1335:in 'open'
from csv.rb:1201:in 'foreach'
so I ended up reading the file and converting it to UTF-8 while reading, then parsing the string:
CSV.parse(File.open(filename, 'r:iso-8859-1:utf-8'){|f| f.read}, col_sep: ';', headers: true, header_converters: :symbol) do |row|
  pp row
end
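A self-contained way to try the read-and-transcode approach is to write a small ISO-8859-1 file first. This is a minimal sketch; the sample data and column names are invented:

```ruby
require 'csv'
require 'tempfile'

# Sample data as ISO-8859-1 bytes (invented for illustration).
latin1_csv = "name;city\nJ\u00F6rg;W\u00F6rth\n".encode(Encoding::ISO_8859_1)

rows = Tempfile.create('import') do |file|
  file.binmode
  file.write(latin1_csv)
  file.flush

  # 'r:iso-8859-1:utf-8' reads ISO-8859-1 bytes and transcodes them to UTF-8.
  utf8_text = File.open(file.path, 'r:iso-8859-1:utf-8') { |f| f.read }
  CSV.parse(utf8_text, col_sep: ';', headers: true, header_converters: :symbol).map(&:to_h)
end

puts rows.inspect
```

Because the text is already UTF-8 by the time CSV sees it, the parser never has to guess the encoding of the file.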
force_encoding is meant to be run on a string, but it looks like you're calling it on a hash. You could say:
output.force_encoding('utf-8')
csv = CSV.parse(output, :headers => true, :col_sep => ";")
...
Hey I wrote a little blog post about what I did, but it's slightly more verbose than what's already been posted. For whatever reason, I couldn't get those solutions to work and this did.
The gist is that I simply replace (or in my case, remove) the invalid/undefined characters in my file and then rewrite it. I used this method to convert the files:
def convert_to_utf8_encoding(original_file)
  original_string = original_file.read
  final_string = original_string.encode(invalid: :replace, undef: :replace, replace: '') # If you'd rather invalid characters be replaced with something else, do so here.
  final_file = Tempfile.new('import') # No need to save a real File
  final_file.write(final_string)
  final_file.close # Don't forget me
  final_file
end
Hope this helps.
Edit: No destination encoding is specified here because encode assumes that you're encoding to your default encoding, which for most Rails applications is UTF-8 (I believe).
