Parse binary CSV file in Ruby - ruby-on-rails

This should have been such an easy thing... but I can't for the life of me figure out how to parse a CSV file that doesn't seem to have a specific encoding.
File.open(Rails.root.join('data', 'mike/test-csv.csv'), 'rb') { |f| f.read }
=> "ID,\x00Q\x00u\x00a\x00n\x00t\x00i\x00t\x00y\n\x006\x00e\x005\x004\x009\x001\x00e\x007\x00-\x007\x00f\x001\x005\x00-\x004\x001\x007\x00d\x00-\x00a\x004\x000\x003\x00-345\x00,\x00\x005\x000\x00.\x000\x000\x000\x000\x000\x000\x000\x000\x00\n"
Here's a gist of it, can't figure out a way to post the specific CSV.
All I get from checking the encoding of the file is that it's in binary format, any thoughts on how I could get it into a normal csv?
Note: This is a downloaded CSV so converting it to another encoding via opening it in excel and exporting (or something like that) is not an option :)
Thanks!
Updating with attempted solution 1:
path = Rails.root.join('data', 'mike/test-csv.csv')
CSV.read(path, {:headers => true, :encoding => 'utf-8'}).each do |d|
puts d
end
Result: 6e5491e7-7f15-417d-a403-345,50.00000000
While this is correct, it ONLY works with puts, for example:
CSV.read(path, {:headers => true, :encoding => 'utf-8'}).map { |row| row }
=> [#<CSV::Row "ID":"\u00006\u0000e\u00005\u00004\u00009\u00001\u0000e\u00007\u0000-\u00007\u0000f\u00001\u00005\u0000-\u00004\u00001\u00007\u0000d\u0000-\u0000a\u00004\u00000\u00003\u0000-345\u0000" "\u0000Q\u0000u\u0000a\u0000n\u0000t\u0000i\u0000t\u0000y":"\u0000\u00005\u00000\u0000.\u00000\u00000\u00000\u00000\u00000\u00000\u00000\u00000\u0000">]
CSV.read(path, {:headers => true, :encoding => 'utf-8'}).map(&:to_s)
=> ["\u00006\u0000e\u00005\u00004\u00009\u00001\u0000e\u00007\u0000-\u00007\u0000f\u00001\u00005\u0000-\u00004\u00001\u00007\u0000d\u0000-\u0000a\u00004\u00000\u00003\u0000-345\u0000,\u0000\u00005\u00000\u0000.\u00000\u00000\u00000\u00000\u00000\u00000\u00000\u00000\u0000\n"]
It's unfortunately still not the correct string :(
Final Solution (via @ashmaroli below):
path = Rails.root.join('data', 'mike/test-csv.csv')
csv_text = ''
File.open(path, 'r') do |csv|
csv.each_line do |line|
csv_text << line.gsub(/\u0000/, '')
end
end
CSV.parse(csv_text, headers:true).map do |row| row end
Result:
[#<CSV::Row "ID":"6e5491e7-7f15-417d-a403-345" "Quantity":"50.00000000">]

path = Rails.root.join('data', 'mike/test-csv.csv')
file = ""
File.open(path, 'r') do |csv|
  csv.each_line do |line|
    # strip the NUL bytes that pad the characters
    file << line.gsub(/\u0000/, '')
  end
end
print file
print file.inspect # same content as above, but shown on a single line with "\n" escapes
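The same idea as a self-contained sketch, with an inline sample string standing in for the file. The sample data and the strip_nuls helper name are made up for illustration. Note that deleting NUL bytes is only safe when the real content is plain ASCII; if the file is genuinely UTF-16, transcoding it is likely the more robust route.

```ruby
require 'csv'

# Hypothetical helper: drop the NUL padding bytes from raw CSV text.
def strip_nuls(raw)
  raw.delete("\u0000")
end

# Made-up sample that mimics the NUL-padded bytes shown in the question.
raw = "ID,\x00Q\x00t\x00y\n\x00a\x001\x00,\x005\x000\x00\n"

rows = CSV.parse(strip_nuls(raw), headers: true).map(&:to_h)
p rows
```

Once the NUL bytes are gone, the header and the fields parse as ordinary strings.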

Related

Invalid Byte Sequence in UTF-8 from Excel file

(Ruby 2.5) I have a method that reads and parses a csv file that's being uploaded via Alchemy CMS
def process_csv(csv_file, current_user_id, original_filename)
  lock_importer
  errors = []
  index = 0
  string_converter = lambda { |field| field.strip }
  total = CSV.foreach(csv_file, headers: true).count
  csv_string = csv_file.read.encode!("UTF-8", "iso-8859-1", invalid: :replace)
  CSV.parse(csv_string, headers: true, header_converters: :symbol, skip_blanks: true, converters: [string_converter]) do |row|
    # do other stuff
  end
end
But when I try to upload a CSV file that has a column (name) with a string containing special characters, I receive the Invalid Byte Sequence in UTF-8 error. I'm trying to test the value N'öt Réal Stô'rë.
I've tried a few solutions that I found on the web but no luck - any suggestions?
It's unclear what your csv_file is. I guess it is a File object.
Sometimes I get CSV files from Excel encoded as UTF-16. So let's try an example:
I have a csv-file stored in UTF-16BE with the following content:
line;comment;UmlautÄ
1;Das ist UTF-16 BE;Ä
2;öüäÖÄÜ;Ä
If I execute the following code:
require 'csv'
def process_csv(csv_file)
  csv_string = csv_file.read#.encode!("UTF-8", "iso-8859-1", invalid: :replace)
  CSV.parse(csv_string, headers: true, skip_blanks: true, col_sep: ';') do |row|
    p row # do other stuff
  end
end
process_csv(File.open('example_utf16BE.txt'))
then I also get an Invalid byte sequence in UTF-8 error.
If I use
process_csv(File.open('example_utf16BE.txt', 'rb', encoding: 'BOM|utf-16BE'))
then everything works.
So I guess you get a File object in the wrong encoding, and the code csv_file.read.encode!("UTF-8", "iso-8859-1", invalid: :replace) is an attempt to repair this problem.
What you can do:
Add to your code:
p csv_file
p csv_file.external_encoding
You should get
#<File:example_utf16BE.txt>
#<Encoding:UTF-16BE>
Now check whether the file (in my example: example_utf16BE.txt) really has the encoding shown in the second line.
If not, try to adapt the File-object creation.
If this is not possible, then you can try to use csv_file.set_encoding 'utf-8' to change the encoding before you read the content.
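A minimal sketch of why the declared encoding matters, using UTF-16BE bytes built in memory instead of a real file (the sample content is made up). It uses String#force_encoding and String#encode rather than the File-level options above, but the principle is the same: tell Ruby what the bytes really are, then transcode to UTF-8 before parsing.

```ruby
require 'csv'

# Made-up sample: build UTF-16BE bytes in memory instead of reading a file.
utf8_source = "line;comment\n1;öüä\n"
raw_bytes = utf8_source.encode('UTF-16BE').b # .b relabels as raw bytes, like a binary read

# Interpreted as UTF-8, these bytes are not a valid sequence.
looks_valid = raw_bytes.dup.force_encoding('UTF-8').valid_encoding?

# Label the bytes with their real encoding, then transcode to UTF-8.
csv_string = raw_bytes.dup.force_encoding('UTF-16BE').encode('UTF-8')
rows = CSV.parse(csv_string, headers: true, col_sep: ';').map(&:to_h)
p looks_valid, rows
```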

Parse csv text with an accent

I am reading a CSV file and I can't seem to properly store a word with an accent in it.
csv_raw = File.read(self.attachment.path)
csv_text = csv_raw.encode("utf-8", :invalid => :replace)
csv = CSV.parse(csv_text, :headers => true, :encoding => "iso-8859-1:utf-8")
csv.each do |row|
  row
end
One value in a row of the CSV file I want to parse is Lerías. However, it comes through as Ler�as
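One possible approach, sketched under the assumption that the file is really ISO-8859-1 (the question hints at this with its :encoding option but does not confirm it). The sample bytes are made up. The idea is to declare the real source encoding before transcoding, so that the byte for "í" is converted instead of replaced.

```ruby
require 'csv'

# Made-up sample: "Lerías" stored as ISO-8859-1, where "í" is the single byte 0xED.
csv_raw = "name\nLer\xEDas\n".b

# Declare the real source encoding first, then transcode to UTF-8.
csv_text = csv_raw.force_encoding('ISO-8859-1').encode('UTF-8')
names = CSV.parse(csv_text, headers: true).map { |row| row['name'] }
p names
```

When reading from disk, File.read(path, encoding: 'ISO-8859-1:UTF-8') should achieve the same transcoding in one step.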

Rails 4.2 - how to fix ascii code in CSV exporting without gem 'iconv'?

When exporting CSV in a Rails 4.2 app, Chinese characters (UTF-8) show up as garbled codes in the CSV output:
中åˆåŒç†Šå·¥ç­‰ç”¨é¤
We tried options in send_data without luck:
send_data @payment_requests.to_csv, :type => 'text/csv; charset=utf-8; header=present'
And:
send_data @payment_requests.to_csv.force_encoding("UTF-8")
In the model, the encoding is forced to UTF-8:
# encoding: utf-8
But it does not work. There are online posts that suggest using the iconv gem. However, iconv depends on the platform's Ruby version. Is there a cleaner solution to fix the garbled output in Rails 4.2 CSV exporting?
If @payment_requests.to_csv includes ASCII text, then you should use the encode method:
@payment_requests.to_csv.encode("UTF-8")
or
@payment_requests.to_csv.force_encoding("ASCII").encode("UTF-8")
depending on which internal encoding @payment_requests.to_csv has.
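The difference between the two methods can be sketched as follows: force_encoding only relabels the existing bytes, while encode transcodes them. The sample character is made up, and the snippet only illustrates how garbled output of this kind can arise; it is not a diagnosis of the asker's export.

```ruby
original = "中" # UTF-8 bytes E4 B8 AD

# force_encoding only relabels: the bytes stay the same, their meaning changes.
relabeled = original.dup.force_encoding('ISO-8859-1')
same_bytes = relabeled.bytes == original.bytes

# encode transcodes: each misread Latin-1 character becomes its own UTF-8 sequence.
mojibake = relabeled.encode('UTF-8')

# The damage is reversible as long as no bytes were replaced along the way.
repaired = mojibake.encode('ISO-8859-1').force_encoding('UTF-8')
p same_bytes, mojibake.length, repaired == original
```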
You can try:
@payment_requests.to_csv.force_encoding("ISO-8859-1")
for Chinese characters
CSV.generate(options) do |csv|
  csv << column_names
  all.each do |product|
    csv << product.attributes.values_at(*column_names)
  end
end.encode('gb2312', :invalid => :replace, :undef => :replace, :replace => "?")
This is what worked for me:
head = 'EF BB BF'.split(' ').map{|a|a.hex.chr}.join() # the three bytes of the UTF-8 byte order mark (BOM)
csv_str = CSV.generate(csv = head) do |csv|
  csv << [ , , , ...]
  @elements.each do |element|
    csv << [ , , , ...]
  end
end
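A self-contained variant of the BOM approach, as a sketch. The csv_with_bom helper name and the sample rows are made up. It prepends the BOM as the "\uFEFF" character instead of assembling it from hex bytes, which keeps the whole string in UTF-8.

```ruby
require 'csv'

# Hypothetical helper: build CSV text and prefix it with a UTF-8 byte order mark.
def csv_with_bom(rows)
  body = CSV.generate do |csv|
    rows.each { |row| csv << row }
  end
  "\uFEFF" + body
end

csv_str = csv_with_bom([%w[id name], ['1', '中文']])
p csv_str.bytes.first(3).map { |b| format('%02X', b) }
```

Spreadsheet applications such as Excel commonly use the BOM as a hint that the file is UTF-8.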

Generating a PDF With Images from Base64 with Prawn

I am trying to save multiple PNGs in one PDF. I'm receiving the PNGs from an API call to the Endicia Label Server, which returns a Base64-encoded image in its response.
Based on this Question:
How to convert base64 string to PNG using Prawn without saving on server in Rails
def batch_order_labels
  @orders = Spree::Order.ready_to_ship.limit(1)
  dt = Date.current.strftime("%d %b %Y ")
  title = "Labels - #{dt} - #{@orders.count} Orders"
  Prawn::Document.generate("#{title}.pdf") do |pdf|
    @orders.each do |order|
      label = order.generate_label
      if order.international?
        @image = label.response_body.scan(/<Image PartNumber=\"1\">([^<>]*)<\/Image>/imu).flatten.last
      else
        @image = label.image
      end
      file = Tempfile.new('labelimg', :encoding => 'utf-8')
      file.write Base64.decode64(@image)
      file.close
      pdf.image file
      pdf.start_new_page
    end
  end
  send_data("#{title}.pdf")
end
But I'm receiving following error:
"\x89" from ASCII-8BIT to UTF-8
Any Idea?
There's no need to write the image data to a tempfile; Prawn::Document#image can accept a StringIO.
Try replacing this:
file = Tempfile.new('labelimg', :encoding => 'utf-8')
file.write Base64.decode64(@image)
file.close
pdf.image file
With this:
require 'stringio'
.....
image_data = StringIO.new( Base64.decode64(@image) )
pdf.image(image_data)
The problem is that the API returns this data in UTF-8, so I don't have much choice.
Anyhow, I found this solution to be working:
file = Tempfile.new('labelimg', :encoding => 'utf-8')
File.open(file, 'wb') do |f|
  f.write Base64.decode64(@image)
end
You can't convert the decoded Base64 data to UTF-8.
Leave it as ASCII-8BIT (Ruby's name for raw bytes):
file = Tempfile.new('labelimg', :encoding => 'ascii-8bit')
file.write Base64.decode64(@image)
file.close
or, even better, leave it as binary by not specifying an encoding:
file = Tempfile.new('labelimg')
file.write Base64.decode64(@image)
file.close
UTF-8 is a multibyte text encoding, and it is not suitable for transferring binary data such as pictures.
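The answers above can be sketched without Prawn, using only the standard library. The 8-byte PNG signature is a made-up stand-in for a real label image. The point is that Base64.decode64 returns raw bytes (ASCII-8BIT), which can either stay in memory as a StringIO or be written to a tempfile in binary mode so that no transcoding to UTF-8 is attempted.

```ruby
require 'base64'
require 'stringio'
require 'tempfile'

# Made-up sample: the 8-byte PNG signature stands in for a real label image.
png_bytes = "\x89PNG\r\n\x1A\n".b
encoded = Base64.strict_encode64(png_bytes) # what an API response might carry

decoded = Base64.decode64(encoded)
decoded_encoding = decoded.encoding # ASCII-8BIT, i.e. raw bytes

# Option 1: keep the bytes in memory.
image_io = StringIO.new(decoded)

# Option 2: write to a tempfile in binary mode so no transcoding is attempted.
written = Tempfile.create('labelimg') do |file|
  file.binmode
  file.write(decoded)
  file.flush
  File.binread(file.path)
end
p decoded_encoding, image_io.size, written == png_bytes
```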

Ruby csv import trouble

I tried to import data from csv in my rails app, but something went wrong:
CSV::MalformedCSVError in ArticlesController#index
Unclosed quoted field on line 1.
My csv looks like this:
"Код";"№ по каталогу (артикул)";"Наименование товара";"Ед. изм.";"Цена опт.";"Доп.";"Остатки";"Класс";"Группа";"Бренд";"Блок."
2223;15-562-44;15-562-44 (27-B07-F) VW Polo 95-R ;шт ;37,430;;;Амортизаторы ;Амортизаторы BOGE ;;
10327;24-052-1;24-052-1(46-A27-0) LAND ROVER 84- F ;шт ;68,750;;;Амортизаторы ;Амортизаторы BOGE ;;
10328;24-053-1;24-053-1(46-A28-0) LAND ROVER 84- R ;шт ;68,750;;;Амортизаторы ;Амортизаторы BOGE ;;
Maybe this is because of the first line (which has no ;;)
My code looks like this:
def csv_import
  require 'csv'
  file = File.open("/#{Rails.public_path}/uploads/smallcsv.csv")
  #csv = CSV.parse(file)
  csv = CSV.open(file, "r:ISO-8859-15:UTF-8", {:col_sep => ";", :row_sep => ";;", :headers => :first_row})
  file_path = "/#{Rails.public_path}/uploads/smallcsv.csv"
  #@parsed_file=CSV::Reader.parse(file_path)
  csv.each do |row|
    ename = row[2]
    eprice = row[5]
    eqnt = row[7]
    esupp = row[10]
    logger.warn(ename)
  end
end
I'm running Ruby 1.9+ with the fastercsv gem.
I figured this out myself using "CSV - Unquoted fields do not allow \r or \n (line 2)".
The problem was with the first line, so :auto helped me.
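A minimal sketch of parsing data in this shape. It assumes that ":auto" refers to the :row_sep option, which the answer does not state explicitly, and it uses a shortened, made-up excerpt instead of the original file.

```ruby
require 'csv'

# Shortened, made-up excerpt in the same shape as the file in the question:
# a quoted header row followed by unquoted, semicolon-separated data rows.
data = <<~CSV_TEXT
  "Код";"Наименование товара";"Цена опт."
  2223;15-562-44 VW Polo 95-R;37,430
  10327;24-052-1 LAND ROVER 84- F;68,750
CSV_TEXT

# row_sep: :auto (the default) detects the line ending from the data itself.
rows = CSV.parse(data, col_sep: ';', row_sep: :auto, headers: :first_row)
codes = rows.map { |row| row['Код'] }
rows.each { |row| puts "#{row['Наименование товара']}: #{row['Цена опт.']}" }
```

With :row_sep set to ";;" as in the question, the quoted header line is not terminated where the parser expects, which is consistent with the "Unclosed quoted field on line 1" error.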

Resources