Parse remote csv on Rails 4 - ruby-on-rails

I keep getting the error "file name is too long".
I am running Rails on Heroku, so I am trying to save an uploaded file to cloud storage and then import it from there, so that it is not lost on the dyno.
I want to create a new object for each row in the CSV. Parsing the CSV worked perfectly in development when using a temp file, but I have to change this for Heroku.
What is wrong with my code that keeps the remote CSV from being parsed correctly?
def self.import_open_order(file_url)
  open(file_url) do |file|
    CSV.parse(self.parse_headers(file.read), headers: true) do |row|
      ...

This fixed it:
def self.import_open_order(file)
  imported_file = open(file)
  CSV.parse(self.parse_headers(imported_file), headers: true) do |row|
Since open(file) returns a Tempfile (open(file).class is Tempfile), I was able to just create the Tempfile and pass it through to CSV.parse.
I swear I had already tried this, but now it works!
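A minimal sketch of the same idea, assuming (the post does not confirm it) that open-uri is what supplies open(file). OpenURI hands back a StringIO for small responses and a Tempfile for larger ones, so code that only calls read on the result copes with both; on Ruby 3.0 and later, URLs are opened with URI.open rather than Kernel#open. The helper name rows_from_io and the sample data are hypothetical, and a StringIO stands in for the downloaded file.

```ruby
require 'csv'
require 'stringio'
require 'tempfile'

# Hypothetical helper: accepts anything that responds to #read
# (StringIO, Tempfile, File) and returns one Hash per CSV row.
def rows_from_io(io)
  CSV.parse(io.read, headers: true).map(&:to_h)
end

# Stand-in for the object that URI.open(file_url) would return.
remote = StringIO.new("order_id,quantity\n1001,3\n1002,7\n")
rows = rows_from_io(remote)
rows.each { |row| puts "#{row['order_id']}: #{row['quantity']}" }
```

Because the helper never looks at a file name, it is not affected by where the remote file was buffered.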

Related

Rails Import/Parse from CSV UTF-8 Missing Column

So I'm working on allowing users to import data from a CSV file. Right now all the fields import correctly except the first one, whichever field that happens to be.
What I've discovered is that the file type affects the import.
My code looks like this:
require 'csv'

class Import < Operation
  def call(file, training_event_id)
    csv_data = CSV.parse(file.read, headers: true)
    list_occo = []
    csv_data.each do |row|
      occupant = Occupant.new
      occupant.account_number = row['Account Number']
      occupant.check_in = row['Check In']
      binding.pry
      occupant.training_event_id = training_event_id
      list_occo << occupant
    end
    binding.pry
    occo_errors = check_file(list_occo)
    list_occo.each(&:save) if occo_errors.empty?
    return occo_errors
  end
end
When I hit the binding.pry and inspect occupant, the account number is nil if the file was saved as "CSV UTF-8". If I save it as plain CSV, there is no issue. Is there a way to convert a CSV UTF-8 file to plain CSV? I tried passing an encoding to the parse call, such as encoding: 'iso-8859-1', but that didn't work.
Is there a way to convert the CSV UTF-8 file, or a way to check the file format up front to ensure it's CSV and not CSV UTF-8?
Just in case someone comes across this issue in the future: I looked at the file in the Rails console using CSV.read(file.path) and noticed U+FEFF preceding the first column header. There's a rabbit hole of information about BOM and UTF-8 issues. Because I did not want to switch to a CSV/File.open approach, I first attempted things like a split, a gsub, and UTF-8 checks on the file. Then I simply changed the csv_data line to:
csv_data = CSV.parse(File.read(file, encoding: 'bom|utf-8'), headers: true)
Then in my controller I changed the argument from (params[:file]) to (params[:file].path), as I was getting this error:
no implicit conversion of ActionDispatch::Http::UploadedFile into String
Hopefully this helps someone else.
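To make the effect of the byte order mark concrete, here is a small sketch that writes a CSV starting with the UTF-8 BOM bytes (EF BB BF) and reads it back both ways. The temp file and its contents are made up for illustration; only the 'bom|utf-8' encoding option comes from the answer above.

```ruby
require 'csv'
require 'tempfile'

# Write a small CSV that begins with a UTF-8 byte order mark.
sample = Tempfile.new(['occupants', '.csv'])
sample.binmode
sample.write("\xEF\xBB\xBF".b + "Account Number,Check In\n42,2020-01-01\n".b)
sample.close

# Plain UTF-8 read: the first header keeps an invisible U+FEFF prefix,
# so looking it up by its visible name finds nothing.
naive = CSV.parse(File.read(sample.path, encoding: 'utf-8'), headers: true)

# 'bom|utf-8' read: the mark is stripped before the data reaches CSV.parse.
clean = CSV.parse(File.read(sample.path, encoding: 'bom|utf-8'), headers: true)

puts naive.first['Account Number'].inspect
puts clean.first['Account Number'].inspect
```

Only the first column is affected because the mark sits at the very start of the file, which is why the remaining fields imported correctly.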

Importing a CSV to Rails database

I asked this question earlier this week, and it worked fine. I just tried it with a slightly bigger spreadsheet and it doesn't seem to work for some reason.
My code is as follows:
require 'roo'
xlsx = Roo::Spreadsheet.open(File.expand_path('../Downloads/unistats/LOCATION.csv'))
xlsx.each_row_streaming(offset: 1) do |row|
  Location.find_or_create_by(ukprn: row[0].value, accomurl: row[1].value, instbeds: row[3].value, instlower: row[4].value, instupper: row[5].value, locid: row[6].value, locname: row[7].value, lat: row[9].value, long: row[10].value, locukprn: row[11].value, loccountry: row[12].value, privatelower: row[13].value, privateupper: row[14].value, suurl: row[15].value)
end
But unlike last time, this is coming up with this error:
NoMethodError: undefined method `each_row_streaming' for #<Roo::CSV:0xb9e0b78>
Did you mean? each_row_using_tempdir
This file is a CSV rather than .xlsx but that shouldn't make a difference.
Any ideas what I'm doing wrong?
It actually does make a difference: you're trying to read a CSV file using the Excel methods.
Excerpts from the Roo documentation:
# Load a CSV file
s = Roo::CSV.new("mycsv.csv")
# Load a tab-delimited csv
s = Roo::CSV.new("mytsv.tsv", csv_options: {col_sep: "\t"})
# Load a csv with an explicit encoding
s = Roo::CSV.new("mycsv.csv", csv_options: {encoding: Encoding::ISO_8859_1})
A neat way to read both Excel and CSV files is to do something like this:
# File.extname returns the extension with its leading dot, e.g. ".xls"
if File.extname(filename).start_with?('.xls')
  workbook = Roo::Excel.new(filename)
else
  workbook = Roo::CSV.new(filename)
end
workbook.default_sheet = workbook.sheets[0]
(workbook.first_row..workbook.last_row).each do |line|
  ...
end
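For a plain CSV, Ruby's standard csv library can also stream rows one at a time without Roo. The following is only a sketch under assumptions: the first line is treated as a header to skip (mirroring offset: 1 in the question), and the helper name each_location_row, the two mapped columns, and the sample data are illustrative rather than taken from the real LOCATION.csv.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: yields one attribute Hash per data row, reading the
# file row by row instead of loading it all into memory.
def each_location_row(path)
  return enum_for(:each_location_row, path) unless block_given?

  CSV.foreach(path).with_index do |row, index|
    next if index.zero? # skip the header line
    yield({ ukprn: row[0], accomurl: row[1] })
  end
end

# Made-up sample file standing in for the real spreadsheet.
sample = Tempfile.new(['LOCATION', '.csv'])
sample.write("UKPRN,ACCOMURL\n10000055,http://example.com/a\n10000163,http://example.com/b\n")
sample.close

each_location_row(sample.path) { |attrs| puts attrs.inspect }
```

In the Rails app, each yielded Hash could be passed to Location.find_or_create_by in place of the row[n].value lookups.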

Ruby on Rails - CSV file not storing to file path

I have a file path, but I don't know how to store the CSV file at that path. I tried the code below, but the file did not appear at that location. I want the uploaded CSV file to be stored at a specific location.
File.join(file-path, filename)
If you want to open a CSV there are a couple of options, but the code above has an error: file-path is not a valid Ruby identifier (Ruby reads it as file - path).
You can try this:
# Read the whole file and parse it into an array of rows
parsed_file = CSV.parse(File.read('/Users/yourname/Desktop/' + 'file.csv'))
parsed_file.each do |row|
  puts row[0] # will print first column of each row
end
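The question also asks how to store the uploaded file at a given location, which File.join alone does not do: it only builds the path string. A minimal sketch of the copy step follows, assuming the upload is already available at some temporary path (in a Rails controller that would typically be the uploaded file's path). The helper name store_csv and all paths are hypothetical.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: copies the CSV at source_path into target_dir under
# filename and returns the full destination path.
def store_csv(source_path, target_dir, filename)
  FileUtils.mkdir_p(target_dir)
  # File.basename drops any directory parts a client-supplied name may carry.
  destination = File.join(target_dir, File.basename(filename))
  FileUtils.cp(source_path, destination)
  destination
end

Dir.mktmpdir do |dir|
  upload = File.join(dir, 'upload.tmp') # stands in for the uploaded file's temp path
  File.write(upload, "name,qty\nwidget,2\n")
  saved = store_csv(upload, File.join(dir, 'imports'), 'file.csv')
  puts File.read(saved)
end
```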

CSV file encoding in Rails with S3 and Heroku

My rails app uploads CSV files to S3, then subsequently pulls them down into a tempfile to send each row's data to a Sidekiq worker. I'm using Carrierwave and fog to handle the uploading.
This all worked beautifully until I recently switched to Heroku. Now, when trying to create my tempfile, I get the following error:
Error type Encoding::UndefinedConversionError
Error message "\xA2" from ASCII-8BIT to UTF-8
I've tried setting the encoding when creating the tempfile as well as working with the CSV file and continue to get the same error. I cannot reproduce this error on my local machine, which has made this entire process that much more fun :)
Currently, my Sidekiq worker calls the following method:
def upload_csv(filename, file_path)
  file = Tempfile.new(filename, Rails.root.join('tmp'), encoding: "ISO8859-1:utf-8").tap do |f|
    open(file_path).rewind
    f.write(open(file_path).read)
    f.close
  end
  CSV.foreach(file, headers: true, encoding: "ISO8859-1:utf-8") do |row|
    #do stuff to rows
  end
end
I understand the very basics of encoding, but I'm super stuck on this. Any insight would be appreciated.
Thanks!
Not sure if this will help anyone else, but I found a solution that works for me:
def upload_csv(filename, file_path)
  file = Tempfile.new(filename, Rails.root.join('tmp')).tap do |f|
    open(file_path).rewind
    f.write(open(file_path).read.force_encoding('utf-8'))
    f.close
  end
  CSV.foreach(file, headers: true) do |row|
    #do stuff to rows
  end
end
Even though I could confirm that the file was UTF-8 encoded before it was uploaded, open(file_path).read.encoding was returning ASCII-8BIT. Ruby was getting confused about how to write the file, because it had to convert the data from ASCII-8BIT to UTF-8.
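The difference between relabeling and converting can be shown without S3 or Heroku. In this sketch, the sample string, the helper names relabel_as_utf8 and latin1_to_utf8, and the ISO-8859-1 case are all assumptions added for illustration; the answer above only uses force_encoding.

```ruby
# Bytes of a UTF-8 CSV, but labeled as binary (ASCII-8BIT), which is how
# a remote read can return them.
raw = "name,price\ncaf\u00E9,5\n".b

# Relabel only: the bytes are untouched, the copy just gets a new encoding tag.
def relabel_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

# Transcode: interpret the bytes as ISO-8859-1 and convert them to UTF-8.
def latin1_to_utf8(bytes)
  bytes.encode(Encoding::UTF_8, Encoding::ISO_8859_1)
end

begin
  raw.encode(Encoding::UTF_8) # binary has no defined mapping for bytes above 0x7F
rescue Encoding::UndefinedConversionError => e
  puts "conversion failed: #{e.message}"
end

text = relabel_as_utf8(raw)
puts text.encoding
puts text.valid_encoding?
```

Relabeling is only safe when the bytes really are UTF-8, so checking valid_encoding? afterwards is a cheap guard. If the source file was actually saved in a single-byte encoding, a conversion along the lines of latin1_to_utf8 would be the better fit.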

Generating a CSV and uploading it to S3 when finished in a background job

I'm providing users with the ability to download an extremely large amount of data via CSV. To do this, I'm using Sidekiq and putting the task off into a background job once they've initiated it. In the background job I generate a CSV containing all of the proper data, store it in /tmp, and then call save! on my model, passing the location of the file to the Paperclip attribute, which then stores it in S3.
All of this is working perfectly fine locally. My problem now lies with Heroku and its ability to store files only for a short duration, depending on what node you're on. My background job is unable to find the tmp file that gets saved because of how Heroku deals with these files. I guess I'm searching for a better way to do this. If there's some way that everything can be done in memory, that would be awesome. The only problem is that Paperclip expects an actual file object as an attribute when you're saving the model. Here's what my background job looks like:
class CsvWorker
  include Sidekiq::Worker

  def perform(report_id)
    puts "Starting the jobz!"
    report = Report.find(report_id)
    items = query_ranged_downloads(report.start_date, report.end_date)
    csv = compile_csv(items)
    update_report(report.id, csv)
  end

  def update_report(report_id, csv)
    report = Report.find(report_id)
    report.update_attributes(csv: csv, status: true)
    report.save!
  end

  def compile_csv(items)
    clean_items = items.compact
    path = File.new("#{Rails.root}/tmp/uploads/downloads_by_title_#{Process.pid}.csv", "w")
    csv_string = CSV.open(path, "w") do |csv|
      csv << ["Item Name", "Parent", "Download Count"]
      clean_items.each do |row|
        if !row.item.nil? && !row.item.parent.nil?
          csv << [
            row.item.name,
            row.item.parent.name,
            row.download_count
          ]
        end
      end
    end
    return path
  end
end
I've omitted the query method for readability's sake.
I don't think Heroku's temporary file storage is the problem here. The warnings around that mostly center on the facts that a) dynos are ephemeral, so anything you write can and will disappear without notice; and b) dynos are interchangeable, so the presence of inter-request tempfiles is a matter of luck when you have more than one web dyno running. However, in no situation do temporary files just vanish while your worker is running.
One thing I notice is that you're actually creating two temporary files with the same name:
> path = File.new("/tmp/filename", "w")
=> #<File:/tmp/filename>
> path.fileno
=> 3
> CSV.open(path, "w") do |csv| csv << %w(foo bar baz); puts csv.fileno end
4
=> nil
You could change the path = line to just set the filename (instead of opening it for writing), and then make update_report open the filename for reading. I haven't dug into what Paperclip does when you give it an empty, already-overwritten, opened-for-writing file handle, but changing that flow may well fix the issue.
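That first suggestion could look roughly like the sketch below, which keeps the path as a plain String, lets CSV.open own the only write handle, and reopens the file for reading once the write has finished. The helper name write_report, the sample row, and the temp directory are hypothetical, and the Paperclip assignment itself is left out.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: writes rows to the given path (a String, not an open
# File) and returns that path. CSV.open closes its handle when the block ends.
def write_report(path, rows)
  CSV.open(path, 'w') do |csv|
    csv << ['Item Name', 'Parent', 'Download Count']
    rows.each { |row| csv << row }
  end
  path
end

Dir.mktmpdir do |dir|
  path = write_report(File.join(dir, 'downloads_by_title.csv'), [['Intro', 'Course A', 12]])
  # Reopen for reading only after the write is complete; a handle like this
  # is what would be given to the attachment attribute.
  File.open(path, 'r') { |file| puts file.read }
end
```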
Alternately, you could do this in memory instead: generate the CSV as a string and give it to Paperclip as a StringIO. (Paperclip supports certain non-file objects, including StringIOs, using e.g. Paperclip::StringioAdapter.) Try something like:
# returns a CSV as a string
def compile_csv(items)
  CSV.generate do |csv|
    # ...
  end
end

def update_report(report_id, csv)
  report = Report.find(report_id)
  report.update_attributes(csv: StringIO.new(csv), status: true)
  report.save!
end
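As a self-contained illustration of the in-memory route, the sketch below fills in the elided CSV.generate body with the row logic from the question and wraps the result in a StringIO. The Struct definitions and sample data are hypothetical stand-ins for the real query results, and the Paperclip step is not shown.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-ins for the records returned by the omitted query.
Parent = Struct.new(:name)
Item = Struct.new(:name, :parent)
Download = Struct.new(:item, :download_count)

# Builds the whole CSV in memory and returns it as a String.
def compile_csv(items)
  CSV.generate do |csv|
    csv << ['Item Name', 'Parent', 'Download Count']
    items.compact.each do |row|
      # Skip rows without an item or without a parent, as in the question.
      next if row.item.nil? || row.item.parent.nil?
      csv << [row.item.name, row.item.parent.name, row.download_count]
    end
  end
end

items = [
  Download.new(Item.new('Intro', Parent.new('Course A')), 12),
  nil,
  Download.new(Item.new('Orphan', nil), 3)
]

# The StringIO is the file-like object that would be assigned to the attachment.
csv_io = StringIO.new(compile_csv(items))
puts csv_io.read
```

Nothing touches the dyno's filesystem here, so the result does not depend on which dyno runs the job, at the cost of holding the full CSV in memory.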
