ActiveAdmin, CSV import, change encoding to cp1251 - ruby-on-rails

I use ActiveAdmin, which provides CSV file downloads on the index screen for each resource. How can I change the encoding of the CSV file to cp1251?

In plain Ruby you can encode the generated CSV string like this (your_row stands for whatever array of values you export):
contents = CSV.generate { |csv| csv << your_row }.encode("cp1251")

Add the following line to config/initializers/active_admin.rb, setting encoding to the target you need (for example 'cp1251' instead of 'ISO-8859-1'):
config.csv_options = { col_sep: ';', force_quotes: true, encoding: 'ISO-8859-1', encoding_options: {invalid: :replace, undef: :replace, replace: '?'}}
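For reference, here is a minimal sketch of what such encoding options amount to in plain Ruby, assuming the CSV is first built as UTF-8 and then transcoded with String#encode. The helper name csv_in_cp1251 is hypothetical; 'Windows-1251' is Ruby's canonical name for cp1251, and characters that cp1251 cannot represent are replaced with '?'.

```ruby
require 'csv'

# Hypothetical helper: builds a semicolon-separated CSV in UTF-8,
# then transcodes the whole string to Windows-1251 (cp1251).
def csv_in_cp1251(rows)
  utf8 = CSV.generate(col_sep: ';', force_quotes: true) do |csv|
    rows.each { |row| csv << row }
  end
  # Characters with no cp1251 equivalent become '?' instead of raising.
  utf8.encode('Windows-1251', invalid: :replace, undef: :replace, replace: '?')
end

data = csv_in_cp1251([%w[Привет мир]])
puts data.encoding # the string is now tagged as Windows-1251
```

Transcoding the finished string keeps the CSV generation itself in UTF-8, so only the final output depends on the target encoding.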

Related

Invalid Byte Sequence in UTF-8 from Excel file

(Ruby 2.5) I have a method that reads and parses a csv file that's being uploaded via Alchemy CMS
def process_csv(csv_file, current_user_id, original_filename)
lock_importer
errors = []
index = 0
string_converter = lambda { |field| field.strip }
total = CSV.foreach(csv_file, headers: true).count
csv_string = csv_file.read.encode!("UTF-8", "iso-8859-1", invalid: :replace)
CSV.parse(csv_string, headers: true, header_converters: :symbol, skip_blanks: true, converters: [string_converter] ) do |row|
# do other stuff
end
end
But when I try to upload a CSV file that has a column (name) containing special characters, I receive the "invalid byte sequence in UTF-8" error. The value I'm testing with is N'öt Réal Stô'rë.
I've tried a few solutions that I found on the web but no luck - any suggestions?
It's unclear what your csv_file is. I guess it is a File object.
Sometimes I get CSV files from Excel as UTF-16. So let's try an example:
I have a csv-file stored in UTF-16BE with the following content:
line;comment;UmlautÄ
1;Das ist UTF-16 BE;Ä
2;öüäÖÄÜ;Ä
If I execute the following code:
require 'csv'
def process_csv(csv_file)
csv_string = csv_file.read#.encode!("UTF-8", "iso-8859-1", invalid: :replace)
CSV.parse(csv_string, headers: true, skip_blanks: true, col_sep: ';') do |row|
p row # do other stuff
end
end
process_csv(File.open('example_utf16BE.txt'))
then I also get an "invalid byte sequence in UTF-8" error.
If I use
process_csv(File.open('example_utf16BE.txt', 'rb', encoding: 'BOM|utf-16BE'))
then everything works.
So I guess you get a File object in the wrong encoding, and the code csv_file.read.encode!("UTF-8", "iso-8859-1", invalid: :replace) is an attempt to repair this problem.
What you can do:
Add to your code:
p csv_file
p csv_file.external_encoding
You should get
#<File:example_utf16BE.txt>
#<Encoding:UTF-16BE>
Now check whether the file (in my example: example_utf16BE.txt) really has the encoding shown in the second line.
If not, try to adapt the creation of the File object.
If this is not possible, you can try csv_file.set_encoding 'utf-8' to change the encoding before you read the content.
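As an alternative to opening the file with an encoding option, the conversion can also be done explicitly on the raw bytes. The following is only a sketch under the assumption that the file really is UTF-16BE with an optional BOM; the helper name parse_utf16be_csv is hypothetical.

```ruby
require 'csv'

# Hypothetical helper: reads raw bytes, declares them UTF-16BE,
# transcodes to UTF-8 and drops a leading BOM before parsing.
def parse_utf16be_csv(path)
  raw  = File.binread(path)
  text = raw.force_encoding(Encoding::UTF_16BE)
            .encode(Encoding::UTF_8)
            .delete_prefix("\uFEFF")
  CSV.parse(text, headers: true, skip_blanks: true, col_sep: ';').map(&:to_h)
end
```

Doing the conversion by hand makes the assumed source encoding visible in one place, which helps when the File object is created by a framework you do not control.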

Parse binary CSV file in Ruby

This should have been such an easy thing... but I can't for the life of me figure out how to parse a CSV file that doesn't seem to have a specific encoding.
File.open(Rails.root.join('data', 'mike/test-csv.csv'), 'rb') { |f| f.read }
=> "ID,\x00Q\x00u\x00a\x00n\x00t\x00i\x00t\x00y\n\x006\x00e\x005\x004\x009\x001\x00e\x007\x00-\x007\x00f\x001\x005\x00-\x004\x001\x007\x00d\x00-\x00a\x004\x000\x003\x00-345\x00,\x00\x005\x000\x00.\x000\x000\x000\x000\x000\x000\x000\x000\x00\n"
Here's a gist of it, can't figure out a way to post the specific CSV.
All I get from checking the encoding of the file is that it's in binary format, any thoughts on how I could get it into a normal csv?
Note: This is a downloaded CSV, so converting it to another encoding by opening it in Excel and exporting (or something like that) is not an option :)
Thanks!
Updating with attempted solution 1:
path = Rails.root.join('data', 'mike/test-csv.csv')
CSV.read(path, {:headers => true, :encoding => 'utf-8'}).each do |d|
puts d
end
Result: 6e5491e7-7f15-417d-a403-345,50.00000000
While this is correct, it ONLY works with puts, for example:
CSV.read(path, {:headers => true, :encoding => 'utf-8'}).map { |row| row }
=> [#<CSV::Row "ID":"\u00006\u0000e\u00005\u00004\u00009\u00001\u0000e\u00007\u0000-\u00007\u0000f\u00001\u00005\u0000-\u00004\u00001\u00007\u0000d\u0000-\u0000a\u00004\u00000\u00003\u0000-345\u0000" "\u0000Q\u0000u\u0000a\u0000n\u0000t\u0000i\u0000t\u0000y":"\u0000\u00005\u00000\u0000.\u00000\u00000\u00000\u00000\u00000\u00000\u00000\u00000\u0000">]
CSV.read(path, {:headers => true, :encoding => 'utf-8'}).map(&:to_s)
=> ["\u00006\u0000e\u00005\u00004\u00009\u00001\u0000e\u00007\u0000-\u00007\u0000f\u00001\u00005\u0000-\u00004\u00001\u00007\u0000d\u0000-\u0000a\u00004\u00000\u00003\u0000-345\u0000,\u0000\u00005\u00000\u0000.\u00000\u00000\u00000\u00000\u00000\u00000\u00000\u00000\u0000\n"]
It's unfortunately still not the correct string :(
Final Solution (via @ashmaroli below):
path = Rails.root.join('data', 'mike/test-csv.csv')
csv_text = ''
File.open(path, 'r') do |csv|
csv.each_line do |line|
csv_text << line.gsub(/\u0000/, '')
end
end
CSV.parse(csv_text, headers:true).map do |row| row end
Result:
[#<CSV::Row "ID":"6e5491e7-7f15-417d-a403-345" "Quantity":"50.00000000">]
path = Rails.root.join('data', 'mike/test-csv.csv')
file = ""
File.open(path, 'r') do |csv|
csv.each_line do |line|
file << line.gsub(/\u0000/, '')
end
end
print file
print file.inspect # same as above just wraps the string in a
# single line with "\n" chars
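The same idea can be wrapped in a small function that works on the raw bytes. This is a sketch only: the helper name parse_csv_without_nuls is hypothetical, and dropping NUL bytes is only safe when the actual content is plain ASCII, as it appears to be in the sample above. For genuine UTF-16 data, transcoding is the safer route.

```ruby
require 'csv'

# Hypothetical helper: removes every NUL byte from the raw input,
# then parses what is left as UTF-8 CSV with a header row.
def parse_csv_without_nuls(raw)
  cleaned = raw.b.delete("\x00").force_encoding(Encoding::UTF_8)
  CSV.parse(cleaned, headers: true).map(&:to_h)
end

# Usage, assuming `path` points at the downloaded file:
# rows = parse_csv_without_nuls(File.binread(path))
```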

Rails 4.2 - how to fix ascii code in CSV exporting without gem 'iconv'?

When exporting CSV in a Rails 4.2 app, Chinese characters (UTF-8) come out garbled in the CSV output:
中åˆåŒç†Šå·¥ç­‰ç”¨é¤
We tried options in send_data without luck:
send_data @payment_requests.to_csv, :type => 'text/csv; charset=utf-8; header=present'
And:
send_data @payment_requests.to_csv.force_encoding("UTF-8")
In model, there is forced encoding utf8:
# encoding: utf-8
But it does not work. Some online posts suggest using the iconv gem. However, iconv depends on the platform's Ruby version. Is there a cleaner solution to fix the garbled output in Rails 4.2 CSV exporting?
If @payment_requests.to_csv includes ASCII text, then you should use the encode method:
@payment_requests.to_csv.encode("UTF-8")
or
@payment_requests.to_csv.force_encoding("ASCII").encode("UTF-8")
depending on which internal encoding @payment_requests.to_csv has.
You can try:
@payment_requests.to_csv.force_encoding("ISO-8859-1")
For Chinese characters:
CSV.generate(options) do |csv|
csv << column_names
all.each do |product|
csv << product.attributes.values_at(*column_names)
end
end.encode('gb2312', :invalid => :replace, :undef => :replace, :replace => "?")
This is what worked for me:
head = 'EF BB BF'.split(' ').map{|a|a.hex.chr}.join()
csv_str = CSV.generate(csv = head) do |csv|
csv << [ , , , ...]
@elements.each do |element|
csv << [ , , , ...]
end
end
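The three bytes EF BB BF are the UTF-8 byte order mark, which spreadsheet programs such as Excel commonly use to recognise a file as UTF-8. A slightly more explicit sketch of the same approach follows; the helper name csv_with_bom and its parameters are hypothetical.

```ruby
require 'csv'

UTF8_BOM = "\uFEFF" # encodes to the bytes EF BB BF in UTF-8

# Hypothetical helper: CSV.generate appends to the string it is given,
# so starting from the BOM puts it at the very beginning of the output.
def csv_with_bom(header, rows)
  CSV.generate(UTF8_BOM.dup) do |csv|
    csv << header
    rows.each { |row| csv << row }
  end
end

# Usage in a controller, assuming `records` responds to each:
# send_data csv_with_bom(%w[name], records), type: 'text/csv; charset=utf-8'
```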

"\x9D" to UTF-8 in conversion from Windows-1252 to UTF-8

I have created a csv uploader on my rails app, but sometimes I get an error of
"\x9D" to UTF-8 in conversion from Windows-1252 to UTF-8
This is the source to my uploader:
def self.import(file)
CSV.foreach(file.path, headers: true, encoding: "windows-1252:utf-8") do |row|
title = row[1]
row[1] = title.to_ascii
description = row[2]
row[2] = description.to_ascii
Event.create! row.to_hash
end
end
I am using the unidecode gem (https://github.com/norman/unidecoder) to normalize any goofy characters that a user may input. I've run into this error a few times, but can't determine how to fix it. I thought the encoding: "windows-1252:utf-8" option would fix the problem, but it made no difference.
Thanks stack!
There is no 9D character (as well as 81, 8D, 8F, 90) in Windows-1252. It means your text is not in Windows-1252 encoding. At the very least your source text is corrupt.
I was running into this error from reading url contents:
table = CSV.parse(URI.open(document.url).read)
Turns out the API I am fetching conditionally returns GZIP if the file is too large.
Another annoying thing is that the Rails decompression helper then failed with a UTF-8 encoding error.
This did NOT work:
ActiveSupport::Gzip.decompress(URI.open(document.url).read)
This did work:
Zlib::GzipReader.wrap(URI.open(document.url), &:read)
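Since the response is only sometimes compressed, one option is to check for the gzip magic bytes before decompressing. This is a minimal sketch that assumes the body is already in memory as a string; the helper name maybe_gunzip is hypothetical.

```ruby
require 'zlib'
require 'stringio'

GZIP_MAGIC = "\x1F\x8B".b # first two bytes of every gzip stream

# Hypothetical helper: returns the body unchanged unless it looks gzipped.
def maybe_gunzip(body)
  return body unless body.b.start_with?(GZIP_MAGIC)

  Zlib::GzipReader.new(StringIO.new(body)).read
end

# Usage, assuming `document.url` as in the snippet above:
# table = CSV.parse(maybe_gunzip(URI.open(document.url).read))
```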
My next problem was that CSV.parse() reads the entire blob, and a single line contained errors.
downloaded_file = StringIO.new(Zlib::GzipReader.wrap(URI.open(document.url), &:read))
tempfile = Tempfile.new("open-uri", binmode: true)
IO.copy_stream(downloaded_file, tempfile.path)
headers = nil
File.foreach(tempfile.path) do |line|
row = []
if headers.blank?
headers = CSV.parse_line(line, col_sep: "\t", liberal_parsing: true)
else
line_data = CSV.parse_line(line.force_encoding("UTF-8").encode('UTF-8', :invalid => :replace, :undef => :replace), col_sep: "\t", liberal_parsing: true)
row = headers.zip(line_data)
end
puts row.inspect
... # do a lot more stuff
end
wow.
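The line-by-line part can be condensed into a small function. The sketch below is not the code above but a variation on it: it uses String#scrub to replace invalid bytes, and the helper name parse_tsv_lines is hypothetical. It accepts any IO-like object that responds to each_line.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: parses tab-separated lines one at a time so that a
# single bad line cannot abort the whole import. Invalid bytes become '?'.
def parse_tsv_lines(io)
  headers = nil
  rows = []
  io.each_line do |line|
    clean  = line.dup.force_encoding(Encoding::UTF_8).scrub('?')
    fields = CSV.parse_line(clean, col_sep: "\t", liberal_parsing: true)
    next if fields.nil? # blank line

    if headers.nil?
      headers = fields
    else
      rows << headers.zip(fields).to_h
    end
  end
  rows
end
```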

How to change the encoding during CSV parsing in Rails

I would like to know how I can change the encoding of my CSV file when I import and parse it. I have this code:
csv = CSV.parse(output, :headers => true, :col_sep => ";")
csv.each do |row|
row = row.to_hash.with_indifferent_access
insert_data_method(row)
end
When I read my file, I get this error:
Encoding::CompatibilityError in FileImportingController#load_file
incompatible character encodings: ASCII-8BIT and UTF-8
I read about row.force_encoding('utf-8') but it does not work:
NoMethodError in FileImportingController#load_file
undefined method `force_encoding' for #<ActiveSupport::HashWithIndifferentAccess:0x2905ad0>
Thanks.
I had to read CSV files encoded in ISO-8859-1.
Doing the documented
CSV.foreach(filename, encoding:'iso-8859-1:utf-8', col_sep: ';', headers: true) do |row|
threw the exception
ArgumentError: invalid byte sequence in UTF-8
from csv.rb:2027:in '=~'
from csv.rb:2027:in 'init_separators'
from csv.rb:1570:in 'initialize'
from csv.rb:1335:in 'new'
from csv.rb:1335:in 'open'
from csv.rb:1201:in 'foreach'
so I ended up reading the file and converting it to UTF-8 while reading, then parsing the string:
CSV.parse(File.open(filename, 'r:iso-8859-1:utf-8'){|f| f.read}, col_sep: ';', headers: true, header_converters: :symbol) do |row|
pp row
end
force_encoding is meant to be run on a string, but it looks like you're calling it on a hash. You could say:
output.force_encoding('utf-8')
csv = CSV.parse(output, :headers => true, :col_sep => ";")
...
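If the uploaded data is not always UTF-8, relabelling alone is not enough. One possible sketch is to try UTF-8 first and fall back to a single-byte encoding when the bytes are not valid UTF-8. The helper names to_utf8 and parse_upload are hypothetical, and ISO-8859-1 as the fallback is an assumption about the data.

```ruby
require 'csv'

# Hypothetical helper: treats the bytes as UTF-8 when they are valid UTF-8,
# otherwise assumes the fallback encoding and transcodes to UTF-8.
def to_utf8(raw, fallback: Encoding::ISO_8859_1)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  raw.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

# Hypothetical helper: normalises the string before handing it to CSV.
def parse_upload(raw)
  CSV.parse(to_utf8(raw), headers: true, col_sep: ';').map(&:to_h)
end
```

Because the conversion happens on the whole string before parsing, every field in every row comes out as UTF-8 and there is no need to touch the row hashes afterwards.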
Hey, I wrote a little blog post about what I did, but it's slightly more verbose than what's already been posted. For whatever reason, I couldn't get those solutions to work and this did.
The gist is that I simply replace (or in my case, remove) the invalid/undefined characters in my file and then rewrite it. I used this method to convert the files:
def convert_to_utf8_encoding(original_file)
original_string = original_file.read
final_string = original_string.encode(invalid: :replace, undef: :replace, replace: '') #If you'd rather invalid characters be replaced with something else, do so here.
final_file = Tempfile.new('import') #No need to save a real File
final_file.write(final_string)
final_file.close #Don't forget me
final_file
end
Hope this helps.
Edit: No destination encoding is specified here because encode assumes that you're encoding to your default encoding, which for most Rails applications is UTF-8 (I believe).
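A variation on the same idea, for cases where the input is meant to be UTF-8 already, is to use String#scrub, which exists specifically to drop or replace invalid byte sequences. This is only a sketch; the helper name scrubbed_utf8_tempfile and its replacement parameter are hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper: reads everything from an IO, removes (or replaces)
# invalid UTF-8 sequences, and returns a rewound Tempfile with the result.
def scrubbed_utf8_tempfile(io, replacement: '')
  text = io.read.force_encoding(Encoding::UTF_8).scrub(replacement)
  file = Tempfile.new('import')
  file.write(text)
  file.rewind # flush and go back to the start so callers can read it
  file
end
```

Unlike the method above, this returns the Tempfile still open and rewound, so the caller is responsible for closing it.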
