How do I parse an Excel file that will give me data exactly as it appears visually? - ruby-on-rails

I'm on Rails 5 (Ruby 2.4). I want to read an .xls doc and I would like to get the data into CSV format, just as it appears in the Excel file. Someone recommended I use Roo, and so I have
book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
arr_of_arrs = CSV.parse(text)
However what is getting returned is not the same as what I see in the spreadsheet. For isntance, a cell in the spreadsheet has
16:45.81
and when I get the CSV data from above, what is returned is
"0.011641319444444444"
How do I parse the Excel doc and get exactly what I see? I don't care if I use Roo to parse or not, just as long as I can get CSV data that is a representation of what I see rather than some weird internal representation. For reference the file type I was parsing givies this when I run "file name_of_file.xls" ...
Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Dwight Schroot, Last Saved By: Dwight Schroot, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 21 17:05:21 2010, Last Saved Time/Date: Wed Oct 13 16:52:14 2010, Security: 0

You need to save the custom formula in a text format on the .xls side. If your opening the .xls file from the internet this won't work but this will fix your problem if you can manipulate the file. You can do this using the function =TEXT(A2, "mm:ss.0") A2 is just the cell I'm using as an example.
book = ::Roo::Spreadsheet.open(file_location)
puts book.cell('B', 2)
=> '16.45.8'
If manipulating the file is not an option you could just pass a custom converter to CSV.new() and convert the decimal time back to the correct format you need.
require 'roo-xls'
require 'csv'
CSV::Converters[:time_parser] = lambda do |field, info|
case info[:header].strip
when "time" then begin
# 0.011641319444444444 * 24 hours * 3600 seconds = 1005.81
parse_time = field.to_f * 24 * 3600
# 1005.81.divmod(60) = [16, 45.809999999999999945]
mm, ss = parse_time.divmod(60)
# returns "16:45.81"
time = "#{mm}:#{ss.round(2)}"
time
rescue
field
end
else
field
end
end
book = ::Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
csv = CSV.new(sheet.to_csv, headers: true, converters: [:time_parser]).map {|row| row.to_hash}
puts csv
=> {"time "=>"16:45.81"}
{"time "=>"12:46.0"}

Under the hood roo-xls gem uses the spreadsheet gem to parse the xls file. There was a similar issue to yours logged here, but it doesn't appear that there was any real resolution. Internally xls stores 16:45.81 as a Number and associates some formatting with it. I believe the issue has something to do with the spreadsheet gem not correctly handling the cell format.
I did try messing around with adding a format mm:ss.0 by following this guide but I couldn't get it to work, maybe you'll have more luck.

You can use converters option. It seems looking like this:
arr_of_arrs = CSV.parse(text, {converters: :date_time})
http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html

Your problem seems to be with the way you're parsing (reading) the input file.
roo parses only Excel 2007-2013 (.xlsx) files. From you question, you want to parse .xls, which is a different format.
Like the documentation says, use the roo-xls gem instead.

Related

How to open base64 spreadsheet on Ruby

I've been trying to manipulate a file that's base64 encoded that I'm recieving from my client.
I'm currently using https://github.com/zdavatz/spreadsheet/blob/master/GUIDE.md to manipulate it, however, there doesn't appear to be any way to open a file directly from the base64 blob, or should I write it and then read from it? can't that a potential security threat for the server?
for example, if I recieve a file :
file = params[:file] with contents:
data:application/vnd.ms-excel;base64,0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAOwADAP7
(should I remove the data:application/vnd.ms-excel;base64, ?)
I'd like to open it with this:
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet.open "#{Rails.root}/app/assets/spreadsheet/event.xls"
(or with a blob or temp fle)
Sorry if it's pretty obvious, been looking for hours and there's not much info about it available, tried creating a temp file first but I don't think that's supported and there's not much I can get from the docs.
Shot in the dark: Maybe decode it, write to binary-enabled tempfile, and then feed that to Spreadsheet?
tmpfile = Tempfile.new.binmode
tmpfile << Base64.decode64(params[:file])
tmpfile.rewind
book = Spreadsheet.open(tmpfile)

How to correctly handle character encoding when using Postgresql's copy_data function?

In my Rails app, I managed to stream large CSV files directly from Postgres based on solutions mentioned in this SO post. My working code looks somewhat like so:
query = <A Long SQL Query String>
response.headers["Cache-Control"] = "no-cache"
response.headers["Content-Type"] = "text/csv; charset=utf-8"
response.headers["Content-Disposition"] =
%(attachment; filename="#{csv_filename}")
response.headers["Last-Modified"] = Time.now.ctime.to_s
conn = ActiveRecord::Base.connection.raw_connection
conn.copy_data("COPY (#{query}) TO STDOUT WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE E'\\\\');") do
while row = conn.get_copy_data
response.stream.write row
end
end
response.stream.close
end
Some of the columns (VARCHAR) being queried have values as either English or Chinese strings. The CSV file resulting from the above code doesn’t show the Chinese characters as is. Instead, I get something like this:
大大 文文
Am I supposed to change the way I’m using the copy_data function, or is there something I could do to the CSV file to solve this? I’ve tried saving the file as UTF-8 .txt file, as well as trying the convert_to function mentioned in the copy_data documentation, but to no avail.
This depends of the original encoding included in the CSV file.
Do this on Linux :
file -i you_file
Are you sure it's not UTF-16 or GB 18030 ?
And also in what kind of encoding is setup your database ?
do a \l in psql to see this.
So it boiled down to my MS Excel not being able to render the Chinese chars correctly. On MacOS, opening the same .csv file using the Numbers app (or even Atom, for that matter) resolved this issue for me.

Importing a CSV to Rails database

I asked this question earlier this week, and it worked fine. I just tried it with a slightly bigger spreadsheet and it doesn't seem to work for some reason.
My code is as follows:
require 'roo'
xlsx = Roo::Spreadsheet.open(File.expand_path('../Downloads/unistats/LOCATION.csv'))
xlsx.each_row_streaming(offset: 1) do |row|
Location.find_or_create_by(ukprn: row[0].value, accomurl: row[1].value, instbeds: row[3].value, instlower: row[4].value, instupper: row[5].value, locid: row[6].value, locname: row[7].value, lat: row[9].value, long: row[10].value, locukprn: row[11].value, loccountry: row[12].value, privatelower: row[13].value, privateupper: row[14].value, suurl: row[15].value)
end
But unlike last time, this is coming up with this error:
NoMethodError: undefined method `each_row_streaming' for #<Roo::CSV:0xb9e0b78>
Did you mean? each_row_using_tempdir
This file is a CSV rather than .xlsx but that shouldn't make a difference.
Any ideas what I'm doing wrong?
It does actually makes a difference that you're trying to read a CSV file using the Excel methods.
Excerpts from the Roo documentation.
# Load a CSV file
s = Roo::CSV.new("mycsv.csv")
# Load a tab-delimited csv
s = Roo::CSV.new("mytsv.tsv", csv_options: {col_sep: "\t"})
# Load a csv with an explicit encoding
s = Roo::CSV.new("mycsv.csv", csv_options: {encoding: Encoding::ISO_8859_1})
A neat way to read both Excel and CSV files is to do something like
if File.extname(filename).start_with?('xls')
workbook = Roo::Excel.new(filename)
else
workbook = Roo::CSV.new(filename)
end
workbook.default_sheet = workbook.sheets[0]
(workbook.first_row..workbook.last_row).each do |line|
...
end

Ole::Storage::FormatError: OLE2 signature is invalid

I want to read an Excel File in my Rails Application.
This is how I open my Excel file and read it.
doc = Spreadsheet.open('./try.xls', "r")
sheet = doc.worksheet 0
sheet.each do |row|
array_rows << row.to_a
end
I have it as a rake task.When I try to Read this file it throws an error.
Ole::Storage::FormatError: OLE2 signature is invalid
What is happening? what should I do?
The .xls file must be saved in EXCEL 2003 format. So
File-->Save As
from All Formats dropdown select the Excel year 2003
This solved my problem
On Mac I had to save it as Excel 97-2004(.xls) to get it to work

Spreadsheet - encoding problem with reading cyrillic characters

I'm working on a rails app for a small shop. It needs to load an .xls file, parse it and maybe load to the database.
I use Spreadsheet gem to work with the file.
The problem is that the file contains russian characters which are displayed as "└ÛÛ.ExT H-1727F (ÓÝÓÙ¯Ò GP T304)"
The reference says, I need to specify the encoding, but I don't know which one is used in this file. I tried "win-1251" but it gave me an error about being unable to find a "utf-8 to win-1251 converter"
I've setting encoding to "WINDOWS-1251" but it gave me this error:
U+00BE to WINDOWS-1251 in conversion from CP850 to UTF-8 to WINDOWS-1251
So then I've tried CP850, which didn't throw an error, but the characters were still not readable.
There's not much code really.
# -*- encoding : utf-8 -*-
...
def show
require 'spreadsheet'
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet.open 'c:\rails\renergy23\public\price-16-04-11.xls'
#sheet = book.worksheet 0
end
For simpicity I don't load it to the database right now. Instead I output it in my view:
- 30.times do |i|
= #sheet.row i+10
%br
http://dl.dropbox.com/u/4976861/price-16-04-11.xls
I kinda solved this after 1.5 months by first saving the document in .xlsx and then saving it in .xls (97-2003). I couldn't use the .xlsx because of some weird OLE signature incorrect error.

Resources