Rails - compressing CSV files - ruby-on-rails

I'm trying to attach a zipped CSV file to an e-mail without any joy. I've tried following http://api.rubyonrails.org/classes/ActiveSupport/Gzip.html:
class UserExportProcessor
#queue = :user_export_queue
def self.perform(person_id, collection_ids)
person = Person.unscoped.find(person_id)
collection = Person.unscoped.where(id: [49522, 70789])
file = ActiveSupport::Gzip.compress(collection.to_csv)
PersonMailer.people_export(person, file).deliver
end
end
This sends an attachment - still as a CSV file - filled with symbols (no letters or numbers).
When I try and remove the compression:
class UserExportProcessor
#queue = :user_export_queue
def self.perform(person_id, collection_ids)
person = Person.unscoped.find(person_id)
collection = Person.unscoped.where(id: [49522, 70789])
PersonMailer.people_export(person, collection.to_csv).deliver
end
end
The system e-mails the CSV file as it should and the CSV is properly formed. What am I doing wrong? Do I need to make a new file out of the compressed data? I've tried various approaches with no joy..
Thanks in advance
EDIT: I'm trying
class UserExportProcessor
require 'zip'
#queue = :user_export_queue
def self.perform(person_id, collection_ids)
person = Person.unscoped.find(person_id)
collection = Person.unscoped.where(id: [49522, 70789])
file = Zip::ZipFile.open("files.zip", Zip::ZipFile::CREATE) { |zipfile|
puts zipfile.get_output_stream(collection.to_csv)
zipfile.mkdir("a_dir")
end
PersonMailer.people_export(person, file).deliver
end
end
However this fails with:
Errno::ENAMETOOLONG: File name too long - /Users/mark/projects/bla/Role,Title,First Name,Last Name,Address 1,Address 2,Address 3,City,Postcode,Country,Email,Telephone,Mobile,Job Title,Company,Area of work,Department,Regions,Account Manager,Sales Coordinator,Production Studios,Production Partners,Genres,Last login,Created date
Is there any way for me to set the file name with the above approach?

The mistake probably happens in the code of PersonMailer.people_export (which you didn't include). So this is my best guess: you probably add the attachment and do not properly define the mime-type.
Make sure you set the correct file extension when you add the attachment:
http://edgeguides.rubyonrails.org/action_mailer_basics.html#adding-attachments
something along this line should work:
attachments['archive.zip'] = ActiveSupport::Gzip.compress(collection.to_csv)

Related

Reading text from a PDF works in Rails console but not in Rails application

I have a simple one-page searchable PDF that is uploaded to a Rails 6 application model (Car) using Active Storage. I can extract the text from the PDF using the 'tempfile' and 'pdf-reader' gems in the Rails console:
> #car.creport.attached?
=> true
> f = Tempfile.new(['file', '.pdf'])
> f.binmode
> f.write(#car.creport.blob.download)
> r = PDF::Reader.new(f.path.to_s)
> r.pages[1].text
=> "Welcome to the ABC Car Report for January 16, 20...
But, if I try the same thing in the create method of my cars_controller.rb, it doesn't work:
# cars_controller.rb
...
def create
#car = Car.new(car_params)
#car.filetext = ""
f = Tempfile.new(['file', '.pdf'])
f.binmode
f.write(#car.creport.blob.download)
r = PDF::Reader.new(f.path.to_s)
#car.filetext = r.pages[1].text
...
end
When I run the Rails application I can create a new Car and select a PDF file to attach. But when I click 'Submit' I get a FileNotFoundError in cars_controller.rb at the f.write() line.
My gut instinct is that the controller is trying to read the blob in order to write it to the temp file too soon (i.e., before the blob has even been written). I tried inserting a sleep(2) to give it time, but I get the same FileNotFoundError.
Any ideas?
Thank you!
I don't get why you're jumping through so many hoops. And using .download without a block loads the entire file into memory (yikes). If #car.creport is an ActiveStorage attachment you can just use the open method instead:
#car.creport.blob.open do |file|
file.binmode
r = PDF::Reader.new(file) # just pass the IO object
#car.filetext = r.pages[1].text
end if #car.creport
This steams the file to disk instead (as a tempfile).
If you're just taking file input via a plain old file input you will get a ActionDispatch::Http::UploadedFile in the parameters that also is extemely easy to open:
params[:file].open do |file|
file.binmode
r = PDF::Reader.new(file) # just pass the IO object
#car.filetext = r.pages[1].text
end if params[:file].respond_to?(:open)
The difference looks like it's with your #car variable.
In the console you have a blob attached (#car.creport.attached? => true). In your controller, you're initializing a new instance of the Car class, so unless you have some initialization going on that attaches something in the background, that will be nil.
Why that would return a 'file not found' error I'm not sure, but from what I can see that's the only difference between code samples. You're trying to write #car.creport.blob.download, which is present on #car in console, but nil in your controller.

Rails 5.2 Import XLSX with ActiveStorage and Creek

I have a model called ImportTemp which is used for store imported XLSX file to database. I'm using ActiveStorage for storing the files.
this is the model code:
class ImportTemp < ApplicationRecord
belongs_to :user
has_one_attached :file
has_one_attached :log_result
end
this is my import controller code:
def import
# Check filetype
case File.extname(params[:file].original_filename)
when ".xlsx"
# Add File to ImportFile model
import = ImportTemp.new(import_type: 'UnitsUpload', user: current_user)
import.file.attach(params[:file])
import.save
# Import unit via sidekiq with background jobs
ImportUnitWorker.perform_async(import.id)
# Notice
flash.now[:notice] = "We are processing your xlsx, we will inform you after it's done via notifications."
# Unit.import_file(xlsx)
else flash.now[:error] = t('shared.info.unknown')+": #{params[:file].original_filename}"
end
end
after upload the xlsx file, then the import will be processed in sidekiq. This is the worker code (still doesn't do the import actually) :
class ImportUnitWorker
include Sidekiq::Worker
sidekiq_options retry: false
def perform(file_id)
import_unit = ImportTemp.find(file_id)
# Open the uploaded xlsx to Creek
creek = Creek::Book.new(Rails.application.routes.url_helpers.rails_blob_path(import_unit.file, only_path: true))
sheet = creek.sheets[0]
puts "Opening Sheet #{sheet.name}"
sheet.rows.each do |row|
puts row
end
units = []
# Unit.import(units)
end
but after i tried it, it gives me error:
Zip::Error (File /rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCZz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--3960b6ba5b55f7004e09967d16dfabe63f09f0a9/2018-08-10_10_39_audit_gt.xlsx not found)
but if i tried to open it with my browser, which is the link looks like this:
http://localhost:3000/rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCZz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--3960b6ba5b55f7004e09967d16dfabe63f09f0a9/2018-08-10_10_39_audit_gt.xlsx
it's working and the xlsx is downloaded. My question is what's wrong with it? why the file is not found in the sidekiq?
I ended up using Tempfile as suggested by George Claghorn. I don't know if this is the best solution or best practices, but it works for me now. I'm going to use this solution while waiting Rails 6 stable to come out with ActiveStorage::Blob#open feature.
def perform(file_id)
import = ImportTemp.find(file_id)
temp_unit = Tempfile.new([ 'unit_import_temp', '.xlsx' ], :encoding => 'ascii-8bit')
units = []
begin
# Write xlsx from ImportTemp to Tempfile
temp_unit.write(import.file.download)
# Open the temp xlsx to Creek
book = Creek::Book.new(temp_unit.path)
sheet = book.sheets[0]
sheet.rows.each do |row|
# Skip the header
next if row.values[0] == 'Name' || row.values[1] == 'Abbreviation'
cells = row.values
# Add cells to new Unit
unit = Unit.new(name: cells[0], abbrev: cells[1], desc: cells[2])
units << unit
end
# Import the unit
Unit.import(units)
ensure
temp_unit.close
temp_unit.unlink # deletes the temp file
end
end
Rails.application.routes.url_helpers.rails_blob_path doesn’t return the path to the file on disk. Rather, it returns a path that can be combined with a hostname to produce an URL for downloading the file, for use in links.
You have two options:
If you’d prefer to keep ImportUnitWorker indifferent to the storage service in use, “download” the file to a tempfile on disk. Switch to Rails master and use ActiveStorage::Blob#open:
def perform(import_id)
import = ImportTemp.find(import_id)
units = []
import.file.open do |file|
book = Creek::Book.new(file.path)
sheet = creek.sheets[0]
# ...
end
Unit.import(units)
end
If you don’t mind ImportWorker knowing that you use the disk service, ask the service for the path to the file on disk. ActiveStorage::Service::DiskService#path_for(key) is private in Rails 5.2, so either forcibly call it with send or upgrade to Rails master, where it’s public:
def perform(import_id)
import = ImportTemp.find(import_id)
units = []
path = ActiveStorage::Blob.service.send(:path_for, import.file.key)
book = Creek::Book.new(path)
sheet = creek.sheets[0]
# ...
Unit.import(units)
end
The answer now seems to be (unless I am missing something):
Creek::Book.new file.service_url, check_file_extension: false, remote: true

How to handle a file_as_string (generated by Prawn) so that it is accepted by Carrierwave?

I'm using Prawn to generate a PDF from the controller of a Rails app,
...
respond_to do |format|
format.pdf do
pdf = GenerateReportPdf.new(#object, view_context)
send_data pdf.render, filename: "Report", type: "application/pdf", disposition: "inline"
end
end
This works fine, but I now want to move GenerateReportPdf into a background task, and pass the resulting object to Carrierwave to upload directly to S3.
The worker looks like this
def perform
pdf = GenerateReportPdf.new(#object)
fileString = ???????
document = Document.new(
object_id: #object.id,
file: fileString )
# file is field used by Carrierwave
end
How do I handle the object returned by Prawn (?????) to ensure it is a format that can be read by Carrierwave.
fileString = pdf.render_file 'filename' writes the object to the root directory of the app. As I'm on Heroku this is not possible.
file = pdf.render returns ArgumentError: string contains null byte
fileString = StringIO.new( pdf.render_file 'filename' ) returns TypeError: no implicit conversion of nil into String
fileString = StringIO.new( pdf.render ) returns ActiveRecord::RecordInvalid: Validation failed: File You are not allowed to upload nil files, allowed types: jpg, jpeg, gif, png, pdf, doc, docx, xls, xlsx
fileString = File.open( pdf.render ) returns ArgumentError: string contains null byte
....and so on.
What am I missing? StringIO.new( pdf.render ) seems like it should work, but I'm unclear why its generating this error.
It turns out StringIO.new( pdf.render ) should indeed work.
The problem I was having was that the filename was being set incorrectly and, despite following the advise below on Carrierwave's wiki, a bug elsewhere in the code meant that the filename was returning as an empty string. I'd overlooked this an assumed that something else was needed
https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Upload-from-a-string-in-Rails-3
my code ended up looking like this
def perform
s = StringIO.new(pdf.render)
def s.original_filename; "my file name"; end
document = Document.new(
object_id: #object.id
)
document.file = s
document.save!
end
You want to create a tempfile (which is fine on Heroku as long as you don't expect it to persist across requests).
def perform
# Create instance of your Carrierwave Uploader
uploader = MyUploader.new
# Generate your PDF
pdf = GenerateReportPdf.new(#object)
# Create a tempfile
tmpfile = Tempfile.new("my_filename")
# set to binary mode to avoid UTF-8 conversion errors
tmpfile.binmode
# Use render to write the file contents
tmpfile.write pdf.render
# Upload the tempfile with your Carrierwave uploader
uploader.store! tmpfile
# Close the tempfile and delete it
tmpfile.close
tmpfile.unlink
end
Here's a way you can use StringIO like Andy Harvey mentioned, but without adding a method to the StringIO intstance's eigenclass.
class VirtualFile < StringIO
attr_accessor :original_filename
def initialize(string, original_filename)
#original_filename = original_filename
super(string)
end
end
def perform
pdf_string = GenerateReportPdf.new(#object)
file = VirtualFile.new(pdf_string, 'filename.pdf')
document = Document.new(object_id: #object.id, file: file)
end
This one took me couple of days, the key is to call render_file controlling the filepath so you can keep track of the file, something like this:
in one of my Models e.g.: Policy i have a list of documents and this is just the method for updating the model connected with the carrierwave e.g.:PolicyDocument < ApplicationRecord mount_uploader :pdf_file, PdfDocumentUploader
def upload_pdf_document_file_to_s3_bucket(document_type, filepath)
policy_document = self.policy_documents.where(policy_document_type: document_type)
.where(status: 'processing')
.where(pdf_file: nil).last
policy_document.pdf_file = File.open(file_path, "r")
policy_document.status = 's3_uploaded'
policy_document.save(validate:false)
policy_document
rescue => e
policy_document.status = 's3_uploaded_failed'
policy_document.save(validate:false)
Rails.logger.error "Error uploading policy documents: #{e.inspect}"
end
end
in one of my Prawn PDF File Generators e.g.: PolicyPdfDocumentX in here please note how im rendering the file and returning the filepath so i can grab from the worker object itself
def generate_prawn_pdf_document
Prawn::Document.new do |pdf|
pdf.draw_text "Hello World PDF File", size: 8, at: [370, 462]
pdf.start_new_page
pdf.image Rails.root.join('app', 'assets', 'images', 'hello-world.png'), width: 550
end
end
def generate_tmp_file(filename)
file_path = File.join(Rails.root, "tmp/pdfs", filename)
self.generate_prawn_pdf_document.render_file(file_path)
return filepath
end
in the "global" Worker for creating files and uploading them in the s3 bucket e.g.: PolicyDocumentGeneratorWorker
def perform(filename, document_type, policy)
#here we create the instance of the prawn pdf generator class
pdf_generator_class = document_type.constantize.new
#here we are creating the file, but also `returning the filepath`
file_path = pdf_generator_class.generate_tmp_file(filename)
#here we are simply updating the model with the new file created
policy.upload_pdf_document_file_to_s3_bucket(document_type, file_path)
end
finally how to test, run rails c and:
the_policy = Policies.where....
PolicyDocumentGeneratorWorker.new.perform('report_x.pdf', 'PolicyPdfDocumentX',the_policy)
NOTE: im using meta-programming in case we have multiple and different file generators, constantize.new is just creating new prawn pdf doc generator instance so is similar to PolicyPdfDocument.new that way we can only have one pdf doc generator worker class that can handle all of your prawn pdf documents so for instance if you need a new document you can simply PolicyDocumentGeneratorWorker.new.perform('report_y.pdf', 'PolicyPdfDocumentY',the_policy)
:D
hope this helps someone to save some time

write stream to paperclip

I want to store received email attachment with usage of paperclip. From email I get part.body and I have no idea how to put it to paperclip'ed model. For now I create temporary file and write port.body to it, store this file to paperclip, and delete file. Here is how I do it with temporary file:
l_file = File.open(l_path, "w+b", 0644)
l_file.write(part.body)
oAsset = Asset.new(
:email_id => email.id,
:asset => l_file,
:header => h,
:original_file_name => o,
:hash => h)
oAsset.save
l_file.close
File.delete(l_path)
:asset is my 'has_attached_file' field. Is there a way to omit file creation and to do something like: :asset => part.body in Asset.new ?
This is how I would do it, assuming your using the mail gem to read the email. you'll need the whole email 'part', not just part.body
file = StringIO.new(part.body) #mimic a real upload file
file.class.class_eval { attr_accessor :original_filename, :content_type } #add attr's that paperclip needs
file.original_filename = part.filename #assign filename in way that paperclip likes
file.content_type = part.mime_type # you could set this manually aswell if needed e.g 'application/pdf'
now just use the file object to save to the Paperclip association.
a = Asset.new
a.asset = file
a.save!
Hope this helps.
Barlow's answer is good, but it is effectively monkey-patching the StringIO class. In my case I was working with Mechanize::Download#body_io and I didn't want to possibly pollute the class leading to unexpected bugs popping up far away in the app. So I define the methods on the instances metaclass like so:
original_filename = "whatever.pdf" # Set local variables for the closure below
content_type = "application/pdf"
file = StringIO.new(part.body)
metaclass = class << file; self; end
metaclass.class_eval do
define_method(:original_filename) { original_filename }
define_method(:content_type) { content_type }
end
I like gtd's answer a lot, but it can be simpler.
file = StringIO.new(part.body)
class << file
define_method(:original_filename) { "whatever.pdf" }
define_method(:content_type) { "application/pdf" }
end
There's not really a need to extract the "metaclass" into a local variable, just append some class to the object.
From ruby 1.9, you can use StringIO and define_singleton_method :
def attachment_from_string(string, original_filename, content_type)
StringIO.new(string).tap do |file|
file.define_singleton_method(:original_filename) { original_filename }
file.define_singleton_method(:content_type) { content_type }
end
end
This would have been better as a comment on David-Barlow's answer but I don't have enough reputation points yet...
But, as others mentioned I didn't love the monkey-patching. Instead, I just created a new class that inherited from StringIO, like so:
class TempFile < StringIO
attr_accessor :original_filename, :content_type
end
For posterity, here is the best answer. Put the top part in vendor/paperclip/data_uri_adapter.rb and the bottom part in config/initializers/paperclip.rb.
https://github.com/thoughtbot/paperclip/blob/43eb9a36deb09ce5655028a1061578dbf0268a5d/lib/paperclip/io_adapters/data_uri_adapter.rb
This requires a data URI scheme stream, but these days that seems pretty common. Simply set your paperclip'd variable to a string with the stream data, and the code takes care of the rest.
I used a similar technique to pull down images into paperclip
this should work, but is obvs untested:
io = part.body
def io.original_filename; part.original_file_name || 'unknown-file-name'; end
asset = Asset.new(:email=>email)
asset.asset = io
When we are assigning the IO directly to the paperclip instance, it needs to have a .original_file_name to it, so that's what we're doing in the second line.

Error with Paperclip / FasterCSV Processing for optional csv upload

I have a page where a user can import data to the site. either in the form of copy and pasting into a text area from excel, or by uploading a .csv file.
The controller checks if a csv has been uploaded - if so it processes this, else it will process the pasted content. (working on the assumption the user will only choose one option for now).
The copy and paste part works perfectly, however, the problem arises when I try to process the uploaded csv file:
I get the error:
can't convert
ActionController::UploadedTempfile
into String
#events_controller
def invite_save
#event = Event.find(params[:id])
if params[:guest_list_csv]
lines = parse_csv_file(params[:guest_list_csv])
else
#csv file uploaded
lines = params[:guest_list_paste]
end
if lines.size > 0
lines.each do |line|
new_user(line.split)
end
flash[:notice] = "List processing was successful."
else
flash[:error] = "List data processing failed."
end
end
private
def parse_csv_file(path_to_csv)
lines = []
require 'fastercsv'
FasterCSV.foreach(path_to_csv) do |row|
lines << row
end
lines
end
def new_user(line)
#code to create new user would go here
end
I'm essentially trying to upload and process the csv in one smooth action, rather than have to get the user to press a "process" button.
On the line #6 above
if params[:guest_list_csv]
lines = parse_csv_file(params[:guest_list_csv])
else
#csv file uploaded
lines = params[:guest_list_paste]
end
The problem is params[:guest_list_csv] is not the actual string, neither is the path, since it's a file object. What you need is explicitly call #path on it.
# line 6
lines = parse_csv_file(params[:guest_list_csv].path)
Please try it and see if it fixes your problem.

Resources