Parsing XLS Spreadsheet in Rails using Roo Gem - ruby-on-rails

I am trying to parse an XLS file with the roo gem without using a file upload plugin. Unfortunately, I cannot access the data of the file.
I get the error:
#<File:0x007ffac2282250> is not an Excel file
So roo is not recognizing the file as an Excel file. Do I need to save the file locally to use roo, or is there a way around that? I would like to parse the data of the Excel file directly into the database.
The params that are coming through:
Parameters: {"utf8"=>"✓", "authenticity_token"=>"yLqOpSK981tDNYjKSoWBh0VnFEKSk0XA/wOt3r+yWJc=", "uploadform"=>{"name"=>"xls", "file"=>#<ActionDispatch::Http::UploadedFile:0x007ffac22b6550 @original_filename="cities2.xls", @content_type="application/octet-stream", @headers="Content-Disposition: form-data; name=\"uploadform[file]\"; filename=\"cities2.xls\"\r\nContent-Type: application/octet-stream\r\n", @tempfile=#<File:/var/folders/qn/70msrkt90pd390sdr14_0g2m0000gn/T/RackMultipart20120306-3729-1m2xcsp>>}, "commit"=>"Save Uploadform"}
I am trying to access the file with
if params[:uploadform][:file].original_filename =~ /.*\.xls$/i
  oo = Excel.new(params[:uploadform][:file].open)
  rooparse(oo)
end
I also tried params[:uploadform][:file].read and plain params[:uploadform][:file], but I think .open should be the correct method here?
And would you recommend using Paperclip or CarrierWave here?
Thank you for your help!

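Yes, I cannot parse the full file yet, but that's another problem. At least I am getting the first row from the table into my database with the following lines:
require 'fileutils'
# 'iconv' is not needed here (it was removed from the standard library in Ruby 2.0)
upload = params[:uploadform][:file]
tmp = upload.tempfile
# Copy the tempfile to a path that keeps the original ".xls" name,
# because Excel.new checks the file name's extension
file = File.join("public", upload.original_filename)
FileUtils.cp tmp.path, file
begin
  oo = Excel.new(file)
  rooparse(oo)
ensure
  FileUtils.rm file # clean up the copy even if parsing raises
end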
Thanks for your input!

Looking at the source for Excel.new, it seems that it wants a file name, not a File object or handle. In other words, it needs a string containing the full path, including the file name, of the file you want to parse. It also checks the file's extension, so if the tempfile's name doesn't end with ".xls", you'll need to rename (or copy) the file first.
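Because Excel.new checks the extension and Rack's tempfile is named like "RackMultipart20120306-3729-1m2xcsp", one workaround is to copy the upload to a temporary path that ends in the original extension. This is a minimal stdlib-only sketch; with_extension is a hypothetical helper name, and Excel.new/rooparse in the usage comment come from the question.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: copy the file at src_path to a temporary path ending
# in ext (e.g. ".xls"), yield that path, and return the block's value.
# Dir.mktmpdir deletes the directory and the copy when the block exits.
def with_extension(src_path, ext)
  Dir.mktmpdir do |dir|
    dest = File.join(dir, "upload#{ext}")
    FileUtils.cp(src_path, dest)
    yield dest
  end
end

# Usage in the controller (names from the question):
#   upload = params[:uploadform][:file]
#   ext = File.extname(upload.original_filename) # ".xls"
#   with_extension(upload.tempfile.path, ext) { |path| rooparse(Excel.new(path)) }
```

Using a fresh temporary directory avoids writing into public/, where the copy could be served to other users while it exists.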

The path is available as params[:file].tempfile.path (params[:uploadform][:file].tempfile.path with your form).
You can try this:
Excel.new(params[:uploadform][:file].tempfile.path)

Related

How do I use paperclip and pdf-reader to parse PDF before uploading to S3 bucket?

I'm building a feature that parses a PDF format CV. I have a method that is called on :before_save which handles parsing. I'm able to access the PDF file within this method, before it saves using...
file = cv.queued_for_write[:original]
But then I need to pass the file to PDF::Reader, however, it seems like pdf-reader only accepts paths or URLs to files, not the actual file itself. This approach...
reader = PDF::Reader.new(file)
Throws this error:
ArgumentError (input must be an IO-like object or a filename):
Do I need to save the file to a tmp folder or something and then pass the path to the pdf-reader to parse it? I'm hoping to parse the PDF as quickly as possible, so that doesn't seem ideal. Any advice is appreciated!
I figured out that the "queued_for_write" object has a path attribute.
file = cv.queued_for_write[:original]
So I can just access it like this:
reader = PDF::Reader.new(file.path)
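The error message says pdf-reader accepts "an IO-like object or a filename". As an illustration only (this is not pdf-reader's actual source, and open_input is a hypothetical name), a duck-typed check of that kind usually looks like the sketch below, which is why passing a plain path String such as file.path satisfies it.

```ruby
# Sketch of an "IO-like object or a filename" check in plain Ruby.
def open_input(input)
  if input.respond_to?(:read) && input.respond_to?(:seek)
    input                   # already IO-like: use it as-is
  elsif input.is_a?(String)
    File.open(input, "rb")  # a filename: open it in binary mode
  else
    raise ArgumentError, "input must be an IO-like object or a filename"
  end
end
```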

How do I get this ruby on rails app to copy a file to my hard drive?

I have a file coming into my rails app from another website. The POST data looks like this:
Parameters: {"file"=>#<ActionDispatch::Http::UploadedFile:0x007fa03cf0c8d0 @original_filename="Georgia.svg", @content_type="application/octet-stream", @headers="Content-Disposition: form-data; name=\"file\"; filename=\"Georgia.svg\"\r\nContent-Type: application/octet-stream\r\n", @tempfile=#<File:/var/folders/g0/m3jlqvpd4cbc3khznvn5c_7m0000gn/T/RackMultipart20130507-52672-1sw119a>>, "originalFileName"=>"Georgia.ttf"}
My controller code is this:
def target
  @incoming_file = params[:file]
  file_name = params[:originalFileName]
  File.open("/Users/my_home_directory/#{file_name}", "w+b") {|f| f.write(@thing)}
end
Right now, I can create a file on my hard drive, but it only contains a line of text that shows the object.
This is the content of the file created on my hard drive:
<ActionDispatch::Http::UploadedFile:0x007fa03cd1c318>
So I can write a file with the name of the uploaded file, but I can't figure out how to write the data from the uploaded file to my drive. I'm new to Ruby on Rails. Help me see what I'm missing.
Thx.
The obvious solution would be the one suggested by Richie Min, but it is a bad solution in terms of memory usage, which can become critical if you start uploading large files, since
File.open(...) {|f| f.write(@incoming_file.read)}
reads the whole uploaded file into memory with @incoming_file.read. A better option would be:
def target
  @incoming_file = params[:file]
  file_name = params[:originalFileName]
  FileUtils.mv @incoming_file.tempfile, "/Users/my_home_directory/#{file_name}"
end
Uploaded data is always stored in temporary files, and UploadedFile.read is just a proxy to the actual File object, which is accessible through UploadedFile.tempfile. This may also not be the best solution if the temporary folder and the destination directory are on different partitions or even different disk drives (the move then turns into a full copy), but it is still much better than reading the whole file into memory in a Rails controller.
You can test it with:
curl -X POST -F file=@[some large file] -F originalFileName=somefilename.ext http://[your url]
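A streaming copy is another option that also avoids loading the upload into memory: IO.copy_stream copies in chunks and works the same whether or not the destination is on another partition. This is a sketch; save_upload is a hypothetical helper. It also runs the client-supplied originalFileName through File.basename so a name like "../../etc/passwd" cannot escape the target directory.

```ruby
require 'fileutils'

# Hypothetical helper: stream the tempfile at tempfile_path into dest_dir
# under the client's file name, stripped of any directory components.
def save_upload(tempfile_path, dest_dir, client_file_name)
  safe_name = File.basename(client_file_name) # "../../x.svg" -> "x.svg"
  FileUtils.mkdir_p(dest_dir)
  dest = File.join(dest_dir, safe_name)
  IO.copy_stream(tempfile_path, dest)         # chunked copy, not one big read
  dest
end

# In the controller from the question:
#   save_upload(@incoming_file.tempfile.path, "/Users/my_home_directory",
#               params[:originalFileName])
```

Unlike FileUtils.mv, this leaves the Rack tempfile in place, to be cleaned up with the request as usual.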
Use:
File.open("/Users/my_home_directory/#{file_name}", "w+b") {|f| f.write(@incoming_file.read)}

Modify a csv file with a Ruby script

I have a .xlsx file converted to .csv. I need to write a script to modify this file (change/rename columns, etc.). How can I open this .csv file and save it from within the script?
Thanks!
Open the CSV file just like any other file in Ruby, using the standard File API:
csv_file = File.open('data.csv', 'r')
Parse it manually or use a library like FasterCSV. Make your modifications, write back to the file, and close it. There is nothing inherently special about a CSV file; work with it as you would with any other file in Ruby.
You should probably work with a CSV library (or, in the Ruby world, a gem). Install the gem (on Ruby 1.9 and later, FasterCSV ships with the standard library as CSV),
and your code will look something like this:
FasterCSV.foreach("path/to/file.csv") do |row|
# use row here...
end
http://fastercsv.rubyforge.org/
As far as I know, you cannot modify a CSV file in place. You would have to write the output to another file.
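The "write to another file" approach can be sketched with Ruby's standard CSV library (FasterCSV's successor): read the table with headers, write the renamed header and every row to a temporary file, then move it over the original. rename_column is a hypothetical helper, and the sketch assumes the file has a header row.

```ruby
require 'csv'
require 'fileutils'

# Hypothetical helper: rename one header column of the CSV at path.
def rename_column(path, from, to)
  table = CSV.read(path, headers: true)   # CSV::Table with a header row
  tmp_path = "#{path}.tmp"
  CSV.open(tmp_path, "w") do |out|
    out << table.headers.map { |h| h == from ? to : h }
    table.each { |row| out << row.fields } # copy data rows unchanged
  end
  FileUtils.mv(tmp_path, path)             # replace the original file
end
```

Writing to a separate file first means a crash halfway through leaves the original CSV intact.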

Ruby/Rails CSV parsing, invalid byte sequence in UTF-8

I am trying to parse a CSV file generated from an Excel spreadsheet.
Here is my code
require 'csv'
file = File.open("input_file")
csv = CSV.parse(file)
But I get this error
ArgumentError: invalid byte sequence in UTF-8
I think the error is because Excel encodes the file in ISO-8859-1 (Latin-1) and not in UTF-8.
Can someone help me with a workaround for this issue, please?
Thanks in advance.
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file=File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open read only with the encoding ISO-8859-1.
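If the rest of the application expects UTF-8, the Latin-1 bytes can also be transcoded before parsing. This sketch assumes the file really is ISO-8859-1 (a Windows export may be windows-1252 instead); parse_latin1_csv is a hypothetical helper name.

```ruby
require 'csv'

# Read raw bytes, label them as Latin-1, transcode to UTF-8, then parse.
def parse_latin1_csv(path)
  utf8 = File.binread(path)
             .force_encoding(Encoding::ISO_8859_1)
             .encode(Encoding::UTF_8)
  CSV.parse(utf8)
end
```

Every byte sequence is valid ISO-8859-1, so this never raises "invalid byte sequence"; if the guess about the encoding is wrong, the symptom is mis-decoded characters rather than an error.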
Specify the encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # ...
end
You can supply the source encoding directly in the file mode parameter:
CSV.foreach("file.csv", "r:windows-1250") do |row|
  # your code
end
If you have only one (or a few) files, so you don't need to detect the encoding automatically for whatever file you get as input, and the contents are viewable as plain text (TXT, CSV, etc.) separated by, e.g., semicolons, you can manually create a new file with a .csv extension, paste the contents of your file into it, and then parse it as usual.
Keep in mind that this is a workaround, but when you only need to parse one big Excel file converted to some flavour of CSV on Linux, it saves time experimenting with all those fancy encodings.
Save the file in UTF-8, unless for some reason you need to save it differently, in which case you can specify the encoding when reading the file:
add a second argument "r:ISO-8859-1", as in File.open("input_file", "r:ISO-8859-1").
I had this same problem and was just using Google Spreadsheets and then downloading as CSV. That was the easiest solution.
Then I came across this gem
https://github.com/singlebrook/utf8-cleaner
Now I don't need to worry about this issue at all. Hope this helps!

rails reading lines of file before upload with paperclip

I am trying to upload a file in Rails (using Paperclip), and I want to process some of the file data before letting Paperclip send it off to S3 storage. In my controller, I just grab the file parameter (which does give me a file) and then try to read the lines into an array:
csv_file = params[:activity][:data]
array = IO.readlines(csv_file.path)
The problem is, I'm only getting the last line of the file. I tried using .rewind, but still get just the last line.
I dislike readlines and always use regular expressions instead. Try splitting on the end-of-line character:
\n
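One plausible cause of seeing only the last line (an assumption; the question doesn't show the file) is a CSV with bare carriage-return ("\r") line endings, as some older Mac Excel exports produce. IO.readlines splits on "\n" by default, so it returns the whole file as a single string, and when that string is printed to a terminal each "\r" moves the cursor back to the start, leaving only the last line visible. A regex that accepts "\r\n", "\r" or "\n" handles all three; read_lines_any_ending is a hypothetical name.

```ruby
# Split a file into lines regardless of its line-ending convention.
# "\r\n" comes first in the alternation so Windows endings aren't split twice.
def read_lines_any_ending(path)
  File.read(path).split(/\r\n|\r|\n/)
end
```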
Use this handy block structure to ensure that the file handle is closed:
File.open(csv_file.path) do |f|
  lines = f.readlines
  # process lines...
end
Reading a whole file into memory might not be a good idea depending on the size of the files.
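For large uploads, File.foreach reads one line at a time instead of building the whole array, so memory use stays roughly constant. A minimal sketch; each_data_line is a hypothetical helper that also strips line endings and skips blank lines.

```ruby
# Yield each non-blank line of the file at path, without its line ending.
def each_data_line(path)
  File.foreach(path) do |line|
    line = line.chomp
    yield line unless line.empty?
  end
end

# Usage with the upload from the question:
#   each_data_line(params[:activity][:data].path) { |line| puts line }
```

File.foreach also takes a separator argument (for example File.foreach(path, "\r")) if the file uses a non-default line ending.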
