How to decode utf8 characters with ruby Base64 - ruby-on-rails

Just as the title describes, how can I do the following?
require 'base64'
text = 'éééé'
encode = Base64.encode64(text)
Base64.decode64(encode)
Desired result: éééé, not the escaped bytes \xC3\xA9\xC3\xA9

When you decode64 you get back a string with BINARY (a.k.a. ASCII-8BIT) encoding:
Base64.decode64(encode).encoding
# => #<Encoding:ASCII-8BIT>
The trick is to force-apply a particular encoding:
Base64.decode64(encode).force_encoding('UTF-8')
# => "éééé"
This assumes that your string is valid UTF-8, which it might not be, so use with caution.
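One way to act on that caution is to check the decoded bytes before trusting them. The following is a minimal sketch, not part of the Base64 library: decode64_utf8 is a hypothetical helper name, and raising ArgumentError is just one possible way to react to invalid input.

```ruby
require "base64"

# Hypothetical helper: decode Base64, label the bytes as UTF-8,
# and refuse to return them if they are not valid UTF-8.
def decode64_utf8(encoded)
  decoded = Base64.decode64(encoded).force_encoding(Encoding::UTF_8)
  raise ArgumentError, "decoded bytes are not valid UTF-8" unless decoded.valid_encoding?
  decoded
end

round_trip = decode64_utf8(Base64.encode64("éééé"))
puts round_trip           # the original text
puts round_trip.encoding  # the encoding label applied by force_encoding
```

force_encoding only changes the label on the string; it never converts bytes, which is why the valid_encoding? check is needed.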

Just use Base64's encode and decode methods:
require 'base64'
=> true
Base64.encode64('aksdfjd')
=> "YWtzZGZqZA==\n"
Base64.decode64 "YWtzZGZqZA==\n"
=> "aksdfjd"

Related

Prawn encoding not correct

I have problems running this code with Prawn:
require 'prawn'
Prawn::Document.generate "example.pdf" do |pdf|
pdf.text_box "W\xF6rth".force_encoding('UTF-8'), :at => [200,720], :size => 32
end
Somehow I get this error:
`rescue in normalize_encoding': Arguments to text methods must be UTF-8 encoded (Prawn::Errors::IncompatibleStringEncoding)
But when I try this code, it works:
pdf.text_box "Wörth".force_encoding('UTF-8')
What am I doing wrong? How can I also fix my first example with the \xF6 in the string? Thanks!
"W\xF6rth" is not a valid UTF-8 sequence.
"W\xF6rth".valid_encoding?
=> false
The maximum one-byte character code in UTF-8 is 0x7F; anything above that has to be encoded with two or more bytes.
"Wörth".bytes.map { |b| b.to_s(16) }
=> ["57", "c3", "b6", "72", "74", "68"]
^^----^^ <-- Two bytes representing UTF-8 "ö"
I think you're trying to convert ISO-8859-1 to UTF-8.
In ISO-8859-1 "ö" is 0xF6.
This is what should work in your case:
"W\xf6rth".force_encoding('iso-8859-1').encode('utf-8')
=> "Wörth"
I.e.
pdf.text_box "W\xF6rth".force_encoding('iso-8859-1').encode('utf-8'), ...
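The difference between relabelling and converting can be seen without Prawn. This is a small sketch using only core String methods; the variable names are illustrative.

```ruby
# The raw bytes as they would arrive from an ISO-8859-1 source.
latin1_bytes = "W\xF6rth".b

# Relabelling alone does not help: 0xF6 on its own is not valid UTF-8.
mislabelled = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts mislabelled.valid_encoding?

# Label the bytes with their real encoding first, then transcode.
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts as_utf8
puts as_utf8.bytes.map { |b| format("%02x", b) }.inspect
```

force_encoding tells Ruby what the bytes already are, while encode rewrites the bytes into the target encoding.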
References:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
http://en.wikipedia.org/wiki/UTF-8

Format encrypted string to pass as URL param

I am using the 'encryptor' gem to scramble a string that I want to include as a URL parameter on a redirect. Unfortunately, the URI::encode function does not turn the encrypted string into an acceptable format to be included into the URL. How can I turn it into something that can be passed as a URL parameter?
salt = Time.now.to_i.to_s
secret_key = 'secret'
iv = OpenSSL::Cipher::Cipher.new('rc2').random_iv
encrypted_url = Encryptor.encrypt("some url parameter as string", :algorithm => 'rc2', :key => secret_key, :iv => iv, :salt => salt)
url = URI::encode(encrypted_url)
redirect_to 'http://domain.com/' + url
Base64 is recommended in this use case.
Addressable
Generally, URLs should be encoded with the Addressable gem. In your case the encrypted string contains bytes that are not valid UTF-8, which raises an error with the standard parse, so you'll need to use Addressable's encode_component instead.
require 'encryptor'
require 'openssl'
require 'addressable/uri'
salt = Time.now.to_i.to_s
# => "1414221973"
secret_key = 'secret'
# => "secret"
iv = OpenSSL::Cipher::Cipher.new('rc2').random_iv
# => "\x97\xE5\x83\xFF#\x97\x0Fn"
encrypted_url = Encryptor.encrypt("some url parameter as string", :algorithm => 'rc2', :key => secret_key, :iv => iv, :salt => salt)
# => "\xD6\x1D\x1A\x8A\x06f\x91\x91I\xD2\x04\xEB\x81\xFF\xCC&\xFA\e\x94,\xAE\xA0\xDA\xFA\xD2\xD8w\xF3\xD4\x8E\xB64"
url = Addressable::URI.encode_component(encrypted_url)
# => "%D6%1D%1A%8A%06f%91%91I%D2%04%EB%81%FF%CC&%FA%1B%94,%AE%A0%DA%FA%D2%D8w%F3%D4%8E%B64"
redirect_to 'http://domain.com/?' + url # You'll want to prepend url params with a question mark
For the URL, I recommend giving the encrypted string a parameter name:
'http://domain.com/?encsite=' + url
NOTE: The % escapes are valid in URLs as they are; that is what percent-encoding looks like. Do not run URI.encode on the result again, or each % becomes %25 and the receiver decodes the wrong bytes.
In my tests with Addressable I got the following:
require 'addressable/uri'
#...
url = Addressable::URI.parse(encrypted_url)
# => #<Addressable::URI:0x93bcf0 URI:��f��I�����&�,������w�Ԏ�4>
url.normalize
# ArgumentError: invalid byte sequence in UTF-8
# SO ENCODE INSTEAD
url = Addressable::URI.encode_component(encrypted_url)
# => "%D6%1D%1A%8A%06f%91%91I%D2%04%EB%81%FF%CC&%FA%1B%94,%AE%A0%DA%FA%D2%D8w%F3%D4%8E%B64"
For more in-depth information on Addressable encoding, see the list of methods with descriptions here: http://www.rubydoc.info/gems/addressable/Addressable/URI
Base64
You can just use Base64 instead, e.g.:
require 'base64'
#...
url = Base64.encode64(encrypted_url)
# => "1h0aigZmkZFJ0gTrgf/MJvoblCyuoNr60th389SOtjQ=\n"
url.chomp!
# => "1h0aigZmkZFJ0gTrgf/MJvoblCyuoNr60th389SOtjQ="
Base64.decode64(url) == encrypted_url
# => true
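Plain Base64 output can contain +, / and =, which have their own meaning in URLs. A possible variant is the URL-safe alphabet from the same standard library module. The sketch below uses a shortened stand-in for the ciphertext rather than real Encryptor output, and reuses the encsite parameter name suggested above.

```ruby
require "base64"

# Stand-in for the encrypted bytes; any binary string behaves the same way.
ciphertext = "\xD6\x1D\x1A\x8A\x06f\x91\x91I\xD2\x04\xEB\x81\xFF\xCC&".b

# URL-safe alphabet ("-" and "_" instead of "+" and "/"), without "=" padding.
token = Base64.urlsafe_encode64(ciphertext, padding: false)
url = "http://domain.com/?encsite=#{token}"
puts url

# On the receiving side, decode the parameter back into the original bytes.
restored = Base64.urlsafe_decode64(token)
puts restored == ciphertext
```

Because the token only uses letters, digits, - and _, it needs no further escaping as a query parameter value.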

Ruby string force encoding

How can I force encode this: Al-F\u0026#257;ti\u0026#293;ah to Al-Fātiĥah
I tried .encode!('UTF-16', :undef => :replace, :invalid => :replace, :replace => "") and force_encoding("UTF-8") with no luck
That text seems to include HTML or XML entities.
Try
require "cgi/util"
CGI.unescapeHTML("Al-F\u0026#257;ti\u0026#293;ah")
or
# gem install nokogiri
require "nokogiri"
Nokogiri::XML.fragment("Al-F\u0026#257;ti\u0026#293;ah").text
See: Converting escaped XML entities back into UTF-8
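To see what the unescaping does, here is a dependency-free sketch that handles only decimal numeric references such as &#257;. decode_decimal_references is a hypothetical helper, not a library method; for named or hexadecimal entities the CGI or Nokogiri calls above are the better choice.

```ruby
# Hypothetical helper: replace decimal references like "&#257;" with the
# character at that code point. Named and hex entities are left untouched.
def decode_decimal_references(text)
  text.gsub(/&#(\d+);/) { [Regexp.last_match(1).to_i].pack("U") }
end

# "\u0026" is simply "&", so this is the string "Al-F&#257;ti&#293;ah".
raw = "Al-F\u0026#257;ti\u0026#293;ah"
puts decode_decimal_references(raw)
```

This also shows why force_encoding and encode had no effect: the string was already valid UTF-8, it just contained markup escapes rather than the characters themselves.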

Encoding::UndefinedConversionError: "\xE4" from ASCII-8BIT to UTF-8

I tried to fetch this CSV-File with Net::HTTP.
File.open(file, "w:UTF-8") do |f|
content = Net::HTTP.get_response(URI.parse(url)).body
f.write(content)
end
After reading my local CSV file again, I got some weird output:
Nationalit\xE4t;Alter 0-5
I tried to encode it to UTF-8, but got the error Encoding::UndefinedConversionError: "\xE4" from ASCII-8BIT to UTF-8
The rchardet gem tells me the content is ISO-8859-2, but converting to UTF-8 does not work.
When I open it in a normal text editor, it displays correctly.
You can go with force_encoding:
require 'net/http'
url = "http://data.linz.gv.at/katalog/population/abstammung/2012/auslg_2012.csv"
File.open('output', "w:UTF-8") do |f|
content = Net::HTTP.get_response(URI.parse(url)).body
f.write(content.force_encoding("UTF-8"))
end
But this will make you lose some accented characters in your .csv file.
If you are absolutely sure that you will always use this URL as input, and that the file will always keep this encoding, you can do
# encoding: utf-8
require 'net/http'
url = "http://data.linz.gv.at/katalog/population/abstammung/2012/auslg_2012.csv"
File.open('output', "w:UTF-8") do |f|
content = Net::HTTP.get_response(URI.parse(url)).body
f.write(content.encode("UTF-8", "ISO-8859-15"))
end
But this will only work for this file.
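The conversion step can be tried without the network. In this sketch, to_utf8 is a hypothetical helper and the sample line stands in for the response body, which Net::HTTP returns as binary (ASCII-8BIT) here. The source encoding is an assumption you have to supply yourself.

```ruby
# Hypothetical helper: label the raw bytes with their real encoding,
# then transcode them to UTF-8.
def to_utf8(body, source_encoding)
  body.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# Stand-in for the downloaded body: raw bytes with no meaningful encoding label.
body = "Nationalit\xE4t;Alter 0-5".b

# Relabelling as UTF-8 keeps the stray 0xE4 byte, so the string stays invalid.
puts body.dup.force_encoding("UTF-8").valid_encoding?

# Declaring the real source encoding lets Ruby convert 0xE4 properly.
line = to_utf8(body, "ISO-8859-15")
puts line
```

Calling encode directly on the binary string fails because ASCII-8BIT has no character mapping for 0xE4, which is the error quoted in the question.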

How to change the encoding during CSV parsing in Rails

I would like to know how can I change the encoding of my CSV file when I import it and parse it. I have this code:
csv = CSV.parse(output, :headers => true, :col_sep => ";")
csv.each do |row|
row = row.to_hash.with_indifferent_access
insert_data_method(row)
end
When I read my file, I get this error:
Encoding::CompatibilityError in FileImportingController#load_file
incompatible character encodings: ASCII-8BIT and UTF-8
I read about row.force_encoding('utf-8') but it does not work:
NoMethodError in FileImportingController#load_file
undefined method `force_encoding' for #<ActiveSupport::HashWithIndifferentAccess:0x2905ad0>
Thanks.
I had to read CSV files encoded in ISO-8859-1.
Doing the documented
CSV.foreach(filename, encoding:'iso-8859-1:utf-8', col_sep: ';', headers: true) do |row|
threw the exception
ArgumentError: invalid byte sequence in UTF-8
from csv.rb:2027:in '=~'
from csv.rb:2027:in 'init_separators'
from csv.rb:1570:in 'initialize'
from csv.rb:1335:in 'new'
from csv.rb:1335:in 'open'
from csv.rb:1201:in 'foreach'
so I ended up reading the file and converting it to UTF-8 while reading, then parsing the string:
CSV.parse(File.open(filename, 'r:iso-8859-1:utf-8'){|f| f.read}, col_sep: ';', headers: true, header_converters: :symbol) do |row|
pp row
end
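The read-then-parse approach can be checked end to end with a temporary file. This is a sketch under assumptions: the sample rows are invented for illustration, and read_latin1_csv is a hypothetical helper name.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: read an ISO-8859-1 file, transcoding to UTF-8 while
# reading, then parse the resulting UTF-8 string.
def read_latin1_csv(path)
  text = File.open(path, "r:iso-8859-1:utf-8") { |f| f.read }
  CSV.parse(text, col_sep: ";", headers: true)
end

# Write a small ISO-8859-1 sample file (0xE4 is "ä", 0xD6 is "Ö").
sample = Tempfile.new(["latin1", ".csv"])
sample.binmode
sample.write("Nationalit\xE4t;Alter 0-5\n\xD6sterreich;12\n".b)
sample.close

table = read_latin1_csv(sample.path)
puts table.headers.inspect
puts table.first["Nationalität"]
sample.unlink
```

The "r:iso-8859-1:utf-8" mode string names the external encoding of the file first and the internal encoding Ruby should hand back second.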
force_encoding is meant to be run on a string, but it looks like you're calling it on a hash. You could say:
output.force_encoding('utf-8')
csv = CSV.parse(output, :headers => true, :col_sep => ";")
...
Hey I wrote a little blog post about what I did, but it's slightly more verbose than what's already been posted. For whatever reason, I couldn't get those solutions to work and this did.
The gist is that I simply replace (or in my case, remove) the invalid/undefined characters in my file and then rewrite it. I used this method to convert the files:
def convert_to_utf8_encoding(original_file)
original_string = original_file.read
final_string = original_string.encode(invalid: :replace, undef: :replace, replace: '') #If you'd rather invalid characters be replaced with something else, do so here.
final_file = Tempfile.new('import') #No need to save a real File
final_file.write(final_string)
final_file.close #Don't forget me
final_file
end
Hope this helps.
Edit: No destination encoding is specified here because encode without a destination argument transcodes to Encoding.default_internal, which for most Rails applications is UTF-8 (I believe).
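If the goal is only to drop invalid bytes from a string that is already labelled UTF-8, String#scrub (Ruby 2.1 and later) is a possible alternative that does not depend on the default encoding. The sample string below is invented for illustration.

```ruby
# A string labelled UTF-8 that contains a stray ISO-8859-1 byte (0xE9).
dirty = "caf\xE9;42".dup.force_encoding(Encoding::UTF_8)
puts dirty.valid_encoding?

# scrub replaces each invalid byte sequence; an empty string removes it.
clean = dirty.scrub("")
puts clean
puts clean.valid_encoding?
```

Like the replace: '' option above, this silently discards data, so converting from the real source encoding is preferable whenever that encoding is known.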
