Zip up all Paperclip attachments stored on S3

Paperclip is a great upload plugin for Rails. Storing uploads on the local filesystem or Amazon S3 seems to work well. I'd just as soon store files on the local filesystem, but S3 is required for this app because it will be hosted on Heroku.
How would I go about getting all of my uploads/attachments from S3 in a single zipped download?
Getting a zip of files from the local filesystem seems straightforward. It's getting the files from S3 that has me puzzled. I think it may have something to do with the way rubyzip handles files referenced by URL. I've tried various approaches but can't seem to avoid errors.
format.zip {
  registrations_with_attachments = Registration.find_by_sql('SELECT * FROM registrations WHERE abstract_file_name NOT LIKE ""')
  headers['Cache-Control'] = 'no-cache'
  tmp_filename = "#{RAILS_ROOT}/tmp/tmp_zip_" <<
                 Time.now.to_f.to_s <<
                 ".zip"
  # rubyzip gem version 0.9.1
  # rdoc http://rubyzip.sourceforge.net/
  Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE) do |zip|
    # get all of the attachments
    # attempt to get files stored on S3
    # FAIL
    registrations_with_attachments.each { |e| zip.add("abstracts/#{e.abstract.original_filename}", e.abstract.url(:original, false)) }
    # => No such file or directory - http://s3.amazonaws.com/bucket/original/abstract.txt
    # Should note that these files in the S3 bucket are publicly accessible. No ACL.
    # works with local storage. Thanks to Henrik Nyh
    # registrations_with_attachments.each { |e| zip.add("abstracts/#{e.abstract.original_filename}", e.abstract.path(:original)) }
  end
  send_data(File.open(tmp_filename, "rb+").read, :type => 'application/zip', :disposition => 'attachment', :filename => tmp_filename.to_s)
  File.delete tmp_filename
}

You almost certainly want to use e.abstract.to_file.path instead of e.abstract.url(...).
See:
Paperclip::Storage::S3#to_file (should return a Tempfile)
Tempfile#path
UPDATE
From the changelog:
New in 3.0.1:
API CHANGE: #to_file has been removed. Use the #copy_to_local_file method instead.
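On Paperclip >= 3.0.1, then, the download step becomes an explicit copy to a local path. Here is a rough, untested sketch of the original loop under that API (it also assumes a newer rubyzip, where Zip::ZipFile was renamed to Zip::File):

require 'zip'
require 'tmpdir'

Zip::File.open(tmp_filename, Zip::File::CREATE) do |zip|
  registrations_with_attachments.each do |e|
    # copy the S3 object down to a real local file first...
    local_path = File.join(Dir.tmpdir, e.abstract.original_filename)
    e.abstract.copy_to_local_file(:original, local_path)
    # ...then add that path to the archive; rubyzip reads the file when the
    # archive is committed at the end of this block, so the local copies
    # must not be deleted before then
    zip.add("abstracts/#{e.abstract.original_filename}", local_path)
  end
end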

@vlard's solution is OK. However, I've run into some issues with to_file: it creates a Tempfile, and the garbage collector sometimes deletes that file before it has been added to the zip. Therefore I was getting random Errno::ENOENT: No such file or directory errors.
So I'm using the following code now (I've kept the initial code's variable names for consistency with the initial question):
format.zip {
  registrations_with_attachments = Registration.find_by_sql('SELECT * FROM registrations WHERE abstract_file_name NOT LIKE ""')
  headers['Cache-Control'] = 'no-cache'
  # note: the nanoseconds option in strftime reduces the risk of a filename
  # collision when 2 or more users initiate the download at the same time
  tmp_filename = "#{RAILS_ROOT}/tmp/tmp_zip_" <<
                 Time.now.strftime('%Y-%m-%d-%H%M%S-%N').to_s <<
                 ".zip"
  # rubyzip gem version 0.9.4
  zip = Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE)
  zip.close
  registrations_with_attachments.each { |e|
    file_to_add = e.file.to_file
    zip = Zip::ZipFile.open(tmp_filename)
    zip.add("abstracts/#{e.abstract.original_filename}", file_to_add.path)
    zip.close
    # referencing file_to_add here forces the garbage collector to keep the
    # Tempfile alive (and on disk) until after it has been added to the zip
    puts "added #{file_to_add.path} to #{tmp_filename}"
  }
  send_data(File.open(tmp_filename, "rb+").read, :type => 'application/zip', :disposition => 'attachment', :filename => tmp_filename.to_s)
  File.delete tmp_filename
}
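Opening and closing the archive once per attachment works, but another way to defuse the garbage-collection race is to hold every Tempfile reference in an array until the archive has been written. A sketch along those lines, untested, using the same rubyzip 0.9.x API as above:

tempfiles = [] # strong references stop the GC from finalizing the Tempfiles
Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE) do |zip|
  registrations_with_attachments.each do |e|
    file_to_add = e.abstract.to_file
    tempfiles << file_to_add
    zip.add("abstracts/#{e.abstract.original_filename}", file_to_add.path)
  end
end
tempfiles.each(&:close!) # clean up only after the zip has been committed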

Related

How to write csv files to S3 from inside a Job?

I have a data backup system for customers of my app. I gather up all associated csv files and zip them; once the zip file is complete, I attach it to an email. This process breaks on Heroku due to their ephemeral filesystem. I thought that since heroku-16 we could write to the app/tmp directory, and that because this process occurs within the same transaction the files would be fine, but that doesn't seem to be the case. I don't even seem to be writing the files to the tmp directory in production (in development I am).
So, what I would like to do instead is just write the csv files directly to S3, then Zip those files and also save the .zip to S3...then, pull that file as an email attachment. To do this, I need to generate the csv files and write them to S3 from inside ActiveJob. I use S3 already as part of ActiveStorage, but this process will not utilize ActiveStorage.
Is there a command for manually doing a direct upload to an S3 bucket? I've been digging around in the docs but don't see what I'm after.
The Job (using /tmp)
def perform(company_id, recipient_id)
  company = Company.find(company_id)
  source_folder = "#{ Rails.root }/tmp"
  zipfile_name = "company_#{ company.id }_archive.zip"
  zipfile_path = "#{ Rails.root }/tmp/#{ zipfile_name }"
  input_filenames = []
  # USERS: create a new empty csv file,
  # ... then add rows to it
  # ... and add the file name to the list of files array
  users_file_name = "#{ company.name.parameterize.underscore }_users_list.csv"
  input_filenames << users_file_name
  users_csv_file = File.new("#{ Rails.root.join('tmp') }/#{ users_file_name }", 'w')
  users_csv_file << company.users.to_csv
  users_csv_file.close
  ...
  # gather up the created files and zip them
  Zip::File.open(zipfile_path, create: true) do |zipfile|
    input_filenames.uniq.each do |filename|
      zipfile.add(filename, File.join(source_folder, filename))
    end
  end
  puts "attaching data_export".colorize(:red)
  # NOTE: StringIO.new here wraps the path *string* itself, not the file's
  # contents -- File.open(zipfile_path) is probably what was intended
  company.data_exports.attach(
    io: StringIO.new("#{ Rails.root }/tmp/company_14_#{ Time.current.to_date.to_s }_archive.zip"),
    filename: 'company_14_archive.zip',
    content_type: 'application/zip'
  )
  last_id = company.data_exports.last.id
  puts "sending mail using company.id: #{ company.id }, recipient_id: #{ recipient_id }, company.data_exports.last.id: #{ last_id }".colorize(:red)
  CompanyMailer.mail_data_export(
    company.id,
    recipient_id,
    last_id
  )
end
You can upload a file to S3 like this:
key = "file_name.zip"
file_path = "tmp/file_name.zip"
new_s3_client = Aws::S3::Resource.new(region: 'eu-west-1', access_key_id: '123', secret_access_key: '456')
new_bucket = new_s3_client.bucket('public')
obj = new_bucket.object(key)
obj.upload_file(file_path)
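Since the whole point is to avoid the dyno's filesystem, you can also skip the temp file and write the CSV content straight into an S3 object body. A minimal sketch, assuming the aws-sdk-s3 gem; the bucket and key names are made up for illustration:

require 'aws-sdk-s3'

s3 = Aws::S3::Resource.new(region: 'eu-west-1') # credentials from ENV or instance profile
bucket = s3.bucket('my-backup-bucket')          # hypothetical bucket name

csv_string = company.users.to_csv               # build the CSV entirely in memory
bucket.object("exports/#{company.id}/users_list.csv").put(body: csv_string)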

Download zip file from rails 4 to angularjs

I'm trying to download a zip file sent from a Rails 4 application to the front end.
The zip file construction works correctly; I can unzip the file and get the content:
filename = "cvs_job_#{params[:job_id]}.zip"
archive_path = "#{Rails.root}/tmp/#{filename}"
File.delete(archive_path) if File.exist?(archive_path)
Zip::File.open(archive_path, Zip::File::CREATE) do |zipfile|
  params[:user_ids].each do |user_id|
    user = User.find(user_id)
    zipfile.add("#{user.last_name}_#{user.first_name}.pdf", user.cv_file.path) unless user.cv_file.nil?
  end
end
send_file("#{Rails.root}/tmp/#{filename}", :type => 'application/zip', :disposition => 'attachment')
but how am I supposed to handle the response back in the promise?
$http(req).success(function(success){
  console.log(success)
})
I can see the zip file's raw bytes in the Chrome console, something like:
"...8f�~��/g6�I�-v��=� ..."
I have tried many solutions but none are working.
I thought I would be able to send the file and download it from my front end.

Rails download file direct from S3 with content-disposition = attachment?

This is my controller:
def download
  data = open(@attachment.file.url).read
  @attachment.clicks = @attachment.clicks.to_i + 1
  @attachment.save
  send_data data, :type => @attachment.content_type, :filename => @attachment.name
end
example:
@attachment.file.url = "http://my_bucket.cloudfront.net/uploads/attachment/file/50/huge_file.pptx"
I did this, but if @attachment is a huge file (e.g. 300 MB), my server crashes.
How can I let users download the file in the browser directly from AWS instead?
Also, a side question: would you suggest serving downloads from S3 (where the files are stored) or through CloudFront?
If you're using the carrierwave gem, you can try this to track the number of clicks while redirecting the download to S3:
def download
  @attachment.clicks = @attachment.clicks.to_i + 1
  @attachment.save
  redirect_to @attachment.file.url(query: { "response-content-disposition" => "attachment;" })
end
references:
Rails carrierwave S3 get url with Content-Disposition header
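If you're on the aws-sdk-s3 gem rather than fog, a presigned URL achieves the same redirect-style download. A sketch, not taken from the question: the region and bucket are placeholders, and it assumes @attachment.file.path is the object's S3 key:

require 'aws-sdk-s3'

obj = Aws::S3::Resource.new(region: 'us-east-1')
                       .bucket('my-bucket')
                       .object(@attachment.file.path)
url = obj.presigned_url(
  :get,
  expires_in: 300,
  response_content_disposition: "attachment; filename=\"#{@attachment.name}\""
)
redirect_to url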

Export a large amount of data in a zip

I'm exporting some data from my server to the client.
It's a zip archive, but when the amount of data is too big: timeout!
# On my controller
def export
  filename = 'my_archive.zip'
  temp_file = Tempfile.new(filename)
  begin
    Zip::OutputStream.open(temp_file) { |zos| }
    Zip::File.open(temp_file.path, Zip::File::CREATE) do |zip|
      @videos.each do |v|
        video_file_name = v.title + '.mp4'
        zip.add(video_file_name, v.source.file.path(:original))
      end
    end
    zip_data = File.read(temp_file.path)
    send_data(zip_data, :type => 'application/zip', :filename => filename)
  ensure
    temp_file.close
    temp_file.unlink
  end
end
I'm using Paperclip to attach the videos in my app.
Is there any way to create and send the zip (with a stream?) without such a long wait?
You could try the zipline gem. It claims to be "Hacks on Hacks on Hacks" so heads up! Looks very easy to use though, worth a shot.
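For reference, zipline streams the archive as it builds it, so nothing is buffered on disk and the response starts immediately. A minimal sketch following its README; the controller and attribute names here are illustrative:

class VideosController < ApplicationController
  include Zipline # from the zipline gem

  def export
    # each entry is [attachment, name_inside_zip]; zipline fetches and
    # streams them one at a time
    files = @videos.map { |v| [v.source, "#{v.title}.mp4"] }
    zipline(files, 'my_archive.zip')
  end
end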

Migrating paperclip S3 images to new url/path format

Is there a recommended technique for migrating a large set of paperclip S3 images to a new :url and :path format?
The reason for this is that after upgrading to Rails 3.1, new versions of thumbs are not being shown after cropping (the previously cached version is shown instead). This is because the filename no longer changes (since asset_timestamp was removed in Rails 3.1). I'm using :fingerprint in the url/path format, but it is generated from the original, which doesn't change when cropping.
I was intending to insert :updated_at in the url/path format and update attachment.updated_at during cropping, but after implementing that change all existing images would need to be moved to their new locations. That's around half a million images to rename on S3.
At this point I'm considering copying them to their new location first, then deploying the code change, then moving any images which were missed (i.e. uploaded after the copy), but I'm hoping there's an easier way... any suggestions?
I had to change my Paperclip path in order to support image cropping, and I ended up creating a rake task to help out.
namespace :paperclip_migration do
  desc 'Migrate data'
  task :migrate_s3 => :environment do
    # Make sure that all of the models have been loaded so any attachments are registered
    puts 'Loading models...'
    Dir[Rails.root.join('app', 'models', '**/*')].each { |file| File.basename(file, '.rb').camelize.constantize }

    # Iterate through all of the registered attachments
    puts 'Migrating attachments...'
    attachment_registry.each_definition do |klass, name, options|
      puts "Migrating #{klass}: #{name}"
      klass.find_each(batch_size: 100) do |instance|
        attachment = instance.send(name)
        unless attachment.blank?
          attachment.styles.each do |style_name, style|
            old_path = interpolator.interpolate(old_path_option, attachment, style_name)
            new_path = interpolator.interpolate(new_path_option, attachment, style_name)
            # puts "#{style_name}:\n\told: #{old_path}\n\tnew: #{new_path}"
            s3_copy(s3_bucket, old_path, new_path)
          end
        end
      end
    end
    puts 'Completed migration.'
  end

  #############################################################################

  private

  # Paperclip configuration
  def attachment_registry
    Paperclip::AttachmentRegistry
  end

  def s3_bucket
    ENV['S3_BUCKET']
  end

  def old_path_option
    ':class/:id_partition/:attachment/:hash.:extension'
  end

  def new_path_option
    ':class/:attachment/:id_partition/:style/:filename'
  end

  def interpolator
    Paperclip::Interpolations
  end

  # S3
  def s3
    AWS::S3.new(access_key_id: ENV['S3_KEY'], secret_access_key: ENV['S3_SECRET'])
  end

  def s3_copy(bucket, source, destination)
    source_object = s3.buckets[bucket].objects[source]
    destination_object = source_object.copy_to(destination, { metadata: source_object.metadata.to_h })
    destination_object.acl = source_object.acl
    puts "Copied #{source}"
  rescue Exception => e
    puts "*Unable to copy #{source} - #{e.message}"
  end
end
I didn't find a feasible method for migrating to a new url format, so I ended up overriding Paperclip::Attachment#generate_fingerprint so that it appends :updated_at.
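For what it's worth, one possible shape of that override (a sketch, not the author's exact code; it leans on Paperclip's private #generate_fingerprint(source) hook and #instance_read, so it is tied to the Paperclip version in use):

module Paperclip
  class Attachment
    alias_method :original_generate_fingerprint, :generate_fingerprint

    # Append the attachment's updated_at to the content fingerprint so that
    # re-cropping (which touches <name>_updated_at) changes the generated path.
    def generate_fingerprint(source)
      "#{original_generate_fingerprint(source)}#{instance_read(:updated_at).to_i}"
    end
  end
end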
