How to write csv files to S3 from inside a Job?

I have a data backup system for customers of my app. I gather up all associated CSV files and zip them. Once the zip file is complete, I attach it to an email. This process breaks on Heroku because of its filesystem. I thought that since heroku-16 we could write to the app/tmp directory, and that because this process happens within the same transaction the files would be fine, but that doesn't seem to be the case. I don't even seem to be writing the files to the tmp directory in production (in development I am).
So what I would like to do instead is write the CSV files directly to S3, then zip those files and save the .zip to S3 as well, and then pull that file in as an email attachment. To do this, I need to generate the CSV files and write them to S3 from inside an ActiveJob job. I already use S3 as part of ActiveStorage, but this process will not use ActiveStorage.
Is there a command for uploading directly to an S3 bucket manually? I've been digging around in the docs, etc., but don't see what I'm after.
The Job (using /tmp)
def perform(company_id, recipient_id)
  company = Company.find(company_id)
  source_folder = "#{ Rails.root }/tmp"
  zipfile_name = "company_#{ company.id }_archive.zip"
  zipfile_path = "#{ Rails.root }/tmp/#{ zipfile_name }"
  input_filenames = []
  # USERS: create a new empty csv file,
  # ... then add rows to it
  # ... and, add the file name to the list of files array
  users_file_name = "#{ company.name.parameterize.underscore }_users_list.csv"
  input_filenames << users_file_name
  users_csv_file = File.new("#{ Rails.root.join('tmp') }/#{ users_file_name }", 'w')
  users_csv_file << company.users.to_csv
  users_csv_file.close
  ...
  # gather up the created files and zip them
  Zip::File.open(zipfile_path, create: true) do |zipfile|
    input_filenames.uniq.each do |filename|
      zipfile.add(filename, File.join(source_folder, filename))
    end
  end
  puts "attaching data_export".colorize(:red)
  company.data_exports.attach(
    # open the zip itself; StringIO.new(path) would attach the path string, not the file
    io: File.open(zipfile_path, 'rb'),
    filename: zipfile_name,
    content_type: 'application/zip'
  )
  last_id = company.data_exports.last.id
  puts "sending mail using company.id: #{ company.id }, recipient_id: #{ recipient_id }, company.data_exports.last.id: #{ last_id }".colorize(:red)
  CompanyMailer.mail_data_export(
    company.id,
    recipient_id,
    last_id
  )
end
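A pitfall worth ruling out in code like this, separate from Heroku's filesystem: `StringIO.new(path)` does not open the file at `path`. It creates an in-memory IO whose contents are the path string itself, so anything that reads it (for example ActiveStorage's `attach(io: ...)`) stores a tiny file containing the path text rather than the zip. A minimal, stdlib-only illustration; the helper names are made up for the demo:

```ruby
require 'stringio'
require 'tempfile'

# What a consumer reads from StringIO.new(path): StringIO wraps the string
# it is given, so this is just the text of the path, not the file.
def read_via_stringio(path)
  StringIO.new(path).read
end

# What a consumer reads from an IO opened on the file itself (binary mode,
# which is what a zip needs).
def read_via_file(path)
  File.open(path, 'rb', &:read)
end

Tempfile.create('archive') do |tmp|
  tmp.write('pretend these are zip bytes')
  tmp.flush
  puts read_via_stringio(tmp.path) == tmp.path                   # prints true
  puts read_via_file(tmp.path) == 'pretend these are zip bytes'  # prints true
end
```

With ActiveStorage, the equivalent of the second helper is passing `io: File.open(zipfile_path, 'rb')` to `attach`.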

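You can upload a file to S3 like this:
require 'aws-sdk-s3'
key = "file_name.zip"
file_path = "tmp/file_name.zip"
# placeholder credentials; in practice read them from ENV or let the SDK's default credential chain find them
new_s3_client = Aws::S3::Resource.new(region: 'eu-west-1', access_key_id: '123', secret_access_key: '456')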
new_bucket = new_s3_client.bucket('public')
obj = new_bucket.object(key)
obj.upload_file(file_path)
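Since the CSV only needs to end up in S3, it does not have to touch the disk at all. A minimal sketch, assuming the aws-sdk-s3 gem's `Aws::S3::Client#put_object`, which accepts a String or an IO as `body:`. The helper names `build_csv` and `upload_csv`, and the bucket and key in the usage comment, are hypothetical; the client is passed in so the logic can be exercised without AWS:

```ruby
require 'csv'

# Builds a CSV document in memory; no temporary file is written.
def build_csv(headers, rows)
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << row }
  end
end

# Uploads +csv_string+ under +key+. +client+ is assumed to be an
# Aws::S3::Client; any object with a compatible
# put_object(bucket:, key:, body:, content_type:) method also works.
def upload_csv(client, bucket, key, csv_string)
  client.put_object(bucket: bucket, key: key, body: csv_string, content_type: 'text/csv')
end

# Hypothetical usage inside the job:
#   require 'aws-sdk-s3'
#   client = Aws::S3::Client.new(region: 'eu-west-1')
#   csv = build_csv(%w[id email], company.users.pluck(:id, :email))
#   upload_csv(client, 'my-backups', "company_#{company.id}/users.csv", csv)
```

The same `put_object` call could upload the finished zip, though building the zip in memory depends on your zip library and isn't shown here.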

Related

S3 save old url, change paperclip config, set new url as old

So here is the thing: currently, when a user downloads our files, they have names like 897123uiojdkashdu182uiej.pdf. I need to change that to file-name.pdf.
So, logically, I changed the paperclip.rb config from this:
Paperclip::Attachment.default_options.update({
  path: '/:hash.:extension',
  hash_secret: Rails.application.secrets.secret_key_base
})
to this:
Paperclip::Attachment.default_options.update({
  path: "/attachment/#{SecureRandom.urlsafe_base64(64)}/:filename",
  hash_secret: Rails.application.secrets.secret_key_base
})
This works just fine, and the filenames are great. However, the old files are now inaccessible because the path changed. So I came up with the following plan.
First, I made a rake task that stores the old paths in the database:
namespace :paperclip do
  desc "Set old urls for attachments"
  task :update_old_urls => :environment do
    Asset.find_each do |asset|
      if asset.attachment
        attachment_url = asset.attachment.try!(:url)
        file_url = "https:#{attachment_url}"
        puts "Set old url asset attachment #{asset.id} - #{file_url}"
        asset.update(old_url: file_url)
      else
        puts "No attachment found in asset #{asset.id}"
      end
    end
  end
end
Now asset.old_url stores the current URL of the file. Then I change the config, which makes the file inaccessible.
Now it's time for the new rake task:
require 'uri'
require 'open-uri'
namespace :paperclip do
  desc "Recreate attachments and save them to new destination"
  task :move_attachments => :environment do
    Asset.find_each do |asset|
      unless asset.old_url.blank?
        url = asset.old_url
        filename = File.basename(asset.attachment.path)
        file = File.new("#{Rails.root}/tmp/#{filename}", "wb")
        # URI.open: Kernel#open no longer fetches URLs in Ruby 3
        file.write(URI.open(url).read)
        if File.exist? file
          puts "Re-saving asset attachment #{asset.id} - #{filename}"
          asset.attachment = file
          asset.save
          # if there are multiple styles, you want to recreate them :
          asset.attachment.reprocess!
          file.close
        else
          puts "Missing file attachment #{asset.id} - #{filename}"
        end
        File.delete(file)
      end
    end
  end
end
But my plan didn't work at all: I still can't access the files, and asset.url still isn't equal to asset.old_url.
I'd really appreciate any help!

With S3, you can set the filename used when saving through a header. Specifically, the user will still go to a URL like https://foo.bar.com/mangled/path/some/weird/hash/whatever?options, but when the browser offers to save the file, you control the filename (not the URL).
This relies on the browser reading the Content-Disposition header from the response: if it reads Content-Disposition: attachment; filename="filename.jpg", it will save (or ask the user to save as) filename.jpg, regardless of the original URL.
You can make S3 add this header either by adding one more parameter to the URL or by setting metadata on the object.
The former can be done by passing it to the url method:
has_attached_file :attachment,
  s3_url_options: ->(instance) {
    {response_content_disposition: "attachment; filename=\"#{instance.filename}\""}
  }
Check https://github.com/thoughtbot/paperclip/blob/v6.1.0/lib/paperclip/storage/s3.rb#L221-L225 for the relevant source code.
The latter can be done in bulk via Paperclip (and you should also configure it to do this for new uploads). It will also take a long time!
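Asset.find_each do |asset|
  next unless asset.attachment
  s3_object = asset.attachment.s3_object
  s3_object.copy_to(
    s3_object,
    metadata_directive: 'REPLACE',
    # with REPLACE, S3 keeps only what this request sends, so resend the content type
    content_type: s3_object.content_type,
    content_disposition: "attachment; filename=\"#{asset.filename}\""
  )
end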
# for new uploads
has_attached_file :attachment,
  s3_headers: ->(att) {
    {content_disposition: "attachment; filename=\"#{att.model.filename}\""}
  }
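The snippets above interpolate the filename straight into the header, so a filename containing a double quote produces a malformed header, and non-ASCII names can be mangled. A sketch of a more defensive builder, following the RFC 6266 approach of an ASCII `filename` fallback plus an RFC 5987 `filename*` parameter carrying the percent-encoded UTF-8 name. The helper name `content_disposition` is hypothetical:

```ruby
require 'erb'

# Builds a Content-Disposition header value (RFC 6266):
# - `filename` gets an ASCII-only fallback: non-ASCII characters become "_",
#   and quotes, backslashes and control characters are replaced so they
#   cannot break out of the quoted string;
# - `filename*` carries the real UTF-8 name, percent-encoded (RFC 5987).
def content_disposition(filename, disposition: 'attachment')
  fallback = filename
             .encode('US-ASCII', invalid: :replace, undef: :replace, replace: '_')
             .gsub(/[\x00-\x1f\x7f"\\]/, '_')
  encoded = ERB::Util.url_encode(filename)
  %(#{disposition}; filename="#{fallback}"; filename*=UTF-8''#{encoded})
end

puts content_disposition('report.pdf')
# attachment; filename="report.pdf"; filename*=UTF-8''report.pdf
```

Its return value could stand in for the hand-built strings, e.g. `{content_disposition: content_disposition(att.model.filename)}` inside the `s3_headers` lambda.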

Why is AWS uploading literal file paths, instead of uploading images?

TL;DR
How do you pass file paths to the AWS S3 Ruby client and have them interpreted as images, not as literal file path strings?
More Details
I'm using the Ruby AWS S3 client to upload images programmatically. I took this code from their example starter code and barely modified it. See https://docs.aws.amazon.com/sdk-for-ruby/v3/developer-guide/s3-example-upload-bucket-item.html
def object_uploaded?(s3_client, bucket_name, object_key)
  response = s3_client.put_object(
    body: "tmp/cosn_img.jpeg", # is always interpreted literally
    acl: "public-read",
    bucket: bucket_name,
    key: object_key
  )
  if response.etag
    return true
  else
    return false
  end
rescue StandardError => e
  puts "Error uploading object: #{e.message}"
  return false
end
# Full example call:
def run_me
  bucket_name = 'cosn-images'
  object_key = "#{order_number}-trello-pic_#{list_config[:ac_campaign_id]}.jpeg"
  region = 'us-west-2'
  s3_client = Aws::S3::Client.new(region: region)
  if object_uploaded?(s3_client, bucket_name, object_key)
    puts "Object '#{object_key}' uploaded to bucket '#{bucket_name}'."
  else
    puts "Object '#{object_key}' not uploaded to bucket '#{bucket_name}'."
  end
end
This works and uploads to AWS, but it uploads just the file path from body, not the actual file itself.
As far as I can see from the Client documentation, this should work. https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Client.html#put_object-instance_method
Also, manually uploading this file through the frontend does work just fine, so it has to be an issue in my code.
How are you supposed to let AWS know that it should interpret that file path as a file path, and not just as a string literal?

You have two issues:
You have commas at the end of your variable assignments in object_uploaded? that affect how your values are stored. Remove these.
You need to pass the file as a File object, not as a file path, like this:
image = File.open("#{Rails.root}/tmp/cosn_img.jpeg")
See full code below:
def object_uploaded?(image, s3_client, bucket_name, object_key)
  response = s3_client.put_object(
    body: image,
    acl: "public-read",
    bucket: bucket_name,
    key: object_key
  )
  puts response
  if response.etag
    return true
  else
    return false
  end
rescue StandardError => e
  puts "Error uploading object: #{e.message}"
  return false
end
def run_me
  image = File.open("#{Rails.root}/tmp/cosn_img.jpeg")
  bucket_name = 'cosn-images'
  object_key = "#{order_number}-trello-pic_#{list_config[:ac_campaign_id]}.jpeg"
  region = 'us-west-2'
  s3_client = Aws::S3::Client.new(region: region)
  if object_uploaded?(image, s3_client, bucket_name, object_key)
    puts "Object '#{object_key}' uploaded to bucket '#{bucket_name}'."
  else
    puts "Object '#{object_key}' not uploaded to bucket '#{bucket_name}'."
  end
end
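The `run_me` above opens the image with `File.open` but never closes it. A sketch of the same upload using the block form of `File.open`, so the handle is closed as soon as `put_object` returns or raises, while keeping the true/false contract of `object_uploaded?`. It assumes `s3_client` is an `Aws::S3::Client` (or any object with a compatible `put_object`); the method name `upload_file_to_s3` is hypothetical, and the `acl:` option is left out for brevity:

```ruby
# Uploads the file at +path+ and reports success the same way as
# object_uploaded? does. The block form of File.open closes the handle
# when the block exits, whether put_object succeeds or raises.
def upload_file_to_s3(s3_client, bucket_name, object_key, path)
  File.open(path, 'rb') do |file|
    response = s3_client.put_object(
      body: file, # an IO, so its bytes are uploaded rather than the path text
      bucket: bucket_name,
      key: object_key
    )
    !response.etag.nil?
  end
rescue StandardError => e
  warn "Error uploading object: #{e.message}"
  false
end
```

A call would then look like `upload_file_to_s3(Aws::S3::Client.new(region: 'us-west-2'), 'cosn-images', object_key, Rails.root.join('tmp', 'cosn_img.jpeg'))`.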

Their docs seem a bit odd and not straightforward, but it seems that you might need to pass in a File/IO object instead of the path.
The Ruby docs have an example like this:
s3_client.put_object(
  :bucket_name => 'mybucket',
  :key => 'some/key',
  :content_length => File.size('myfile.txt')
) do |buffer|
  File.open('myfile.txt') do |io|
    buffer.write(io.read(length)) until io.eof?
  end
end
Or another option from the AWS Ruby SDK docs, under "Streaming a file from disk":
File.open('/source/file/path', 'rb') do |file|
  s3.put_object(bucket: 'bucket-name', key: 'object-key', body: file)
end

Export large amount of data in a zip

I'm exporting some data from my server to the client.
It's a zip archive, but when the amount of data is too big, the request times out!
# In my controller
def export
  filename = 'my_archive.zip'
  temp_file = Tempfile.new(filename)
  begin
    Zip::OutputStream.open(temp_file) { |zos| }
    Zip::File.open(temp_file.path, Zip::File::CREATE) do |zip|
      @videos.each do |v|
        video_file_name = v.title + '.mp4'
        zip.add(video_file_name, v.source.file.path(:original))
      end
    end
    zip_data = File.read(temp_file.path)
    send_data(zip_data, :type => 'application/zip', :filename => filename)
  ensure
    temp_file.close
    temp_file.unlink
  end
end
I'm using Paperclip to attach videos in my app.
Is there any way to create and upload the zip (with a stream?) without such a long wait?

You could try the zipline gem. It describes itself as "Hacks on Hacks on Hacks", so heads up! It looks very easy to use, though, so it's worth a shot.

Downloading and zipping files that were uploaded to S3 with CarrierWave

I have a small Rails 3.2.1 app that uses CarrierWave 0.5.8 for file uploads to S3 (using Fog)
I want users to be able to select some images that they'd like to download, then zip them up and send them a zip. Here is what I've come up with:
def generate_zip
  # A collection of Photo objects. The Photo object has a PhotoUploader mounted.
  photos = Photo.all
  tmp_filename = "#{Rails.root}/tmp/" << Time.now.strftime('%Y-%m-%d-%H%M%S-%N').to_s << ".zip"
  zip = Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE)
  zip.close
  photos.each do |photo|
    file_to_add = photo.photo.file
    zip = Zip::ZipFile.open(tmp_filename)
    zip.add("tmp/", file_to_add.path)
    zip.close
  end
  # do the rest.. like send zip or upload file and e-mail link
end
This doesn't work because photo.photo.file returns an instance of CarrierWave::Storage::Fog::File instead of a regular file.
EDIT: The error this leads to:
Errno::ENOENT: No such file or directory - uploads/photos/name.jpg
I also tried the following:
tmp_filename = "#{Rails.root}/tmp/" << Time.now.strftime('%Y-%m-%d-%H%M%S-%N').to_s << ".zip"
zip = Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE)
zip.close
photos.each do |photo|
  processed_uri = URI.parse(URI.escape(URI.unescape(photo.photo.file.authenticated_url)).gsub("[", "%5B").gsub("]", "%5D"))
  file_to_add = CarrierWave::Uploader::Download::RemoteFile.new(processed_uri)
  zip = Zip::ZipFile.open(tmp_filename)
  zip.add("tmp/", file_to_add.path)
  zip.close
end
But this gives me a 403. Some help would be greatly appreciated. It probably isn't that hard; I'm just Doing it Wrong™

I've managed to solve the problem with help from @ffoeg.
The solution offered by @ffoeg didn't work quite so well for me, since I was dealing with zip files > 500 MB, which caused me problems on Heroku. I've therefore moved the zipping to a background process using Resque:
app/workers/photo_zipper.rb:
require 'zip/zip'
require 'zip/zipfilesystem'
require 'open-uri'
require 'tmpdir'
class PhotoZipper
  @queue = :photozip_queue
  # I pass
  def self.perform(id_of_object_with_images, id_of_user_to_be_notified)
    user_mail = User.where(:id => id_of_user_to_be_notified).pluck(:email)
    export = PhotoZipper.generate_zip(id_of_object_with_images, id_of_user_to_be_notified)
    Notifications.zip_ready(export.archive_url, user_mail).deliver
  end
  # Zipfile generator
  def self.generate_zip(id_of_object_with_images, id_of_user_to_be_notified)
    object = ObjectWithImages.find(id_of_object_with_images)
    photos = object.images
    # base temp dir
    temp_dir = Dir.mktmpdir
    # path for zip we are about to create, I find that ruby zip needs to write to a real file
    # This assumes the ObjectWithImages object has an attribute title which is a string.
    zip_path = File.join(temp_dir, "#{object.title}_#{Date.today.to_s}.zip")
    Zip::ZipOutputStream.open(zip_path) do |zos|
      photos.each do |photo|
        path = photo.photo.path
        zos.put_next_entry(path)
        zos.write photo.photo.file.read
      end
    end
    # Find the user that made the request
    user = User.find(id_of_user_to_be_notified)
    # Create an export object associated to the user
    export = user.exports.build
    # Associate the created zip to the export
    export.archive = File.open(zip_path)
    # Upload the archive
    export.save!
    # return the export object
    export
  ensure
    # clean up the tempdir now!
    FileUtils.rm_rf temp_dir if temp_dir
  end
end
app/controllers/photos_controller.rb:
format.zip do
  # pick the last ObjectWithImages... of course you should include your own logic here
  id_of_object_with_images = ObjectWithImages.last.id
  # enqueue the PhotoZipper task
  Resque.enqueue(PhotoZipper, id_of_object_with_images, current_user.id)
  # don't keep the user waiting; flash a message with information about what's happening behind the scenes
  redirect_to some_path, :notice => "Your zip is being created, you will receive an e-mail once this process is complete"
end
Many thanks to @ffoeg for helping me out. If your zips are smaller, you could try @ffoeg's solution.

Here is my take. There could be typos but I think this is the gist of it :)
# action method, stream the zip
def download_photos_as_zip # silly name but you get the idea
  generate_zip do |zipname, zip_path|
    File.open(zip_path, 'rb') do |zf|
      # you may need to set these to get the file to stream (if you care about that)
      # self.last_modified
      # self.etag
      # self.response.headers['Content-Length']
      self.response.headers['Content-Type'] = "application/zip"
      self.response.headers['Content-Disposition'] = "attachment; filename=#{zipname}"
      self.response.body = Enumerator.new do |out| # Enumerator is ruby 1.9
        while !zf.eof? do
          out << zf.read(4096)
        end
      end
    end
  end
end
# Zipfile generator
def generate_zip(&block)
  photos = Photo.all
  # base temp dir
  temp_dir = Dir.mktmpdir
  # path for zip we are about to create, I find that ruby zip needs to write to a real file
  zip_path = File.join(temp_dir, 'export.zip')
  Zip::ZipFile::open(zip_path, true) do |zipfile|
    photos.each do |photo|
      zipfile.get_output_stream(photo.photo.identifier) do |io|
        io.write photo.photo.file.read
      end
    end
  end
  # yield the zipfile to the action
  block.call 'export.zip', zip_path
ensure
  # clean up the tempdir now!
  FileUtils.rm_rf temp_dir if temp_dir
end
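One caveat with the streaming action above: the Enumerator assigned to `response.body` is typically iterated by the web server after the action has returned, and by then the `File.open` block has already closed `zf` and `generate_zip`'s `ensure` has removed the temp directory, so reading can fail with a closed-stream error. A sketch of an enumerator that opens the file itself when iteration starts and closes it when iteration ends (`file_chunks` is a hypothetical name). The file still has to exist when the response is sent, so the temp-directory cleanup would also need to move, for example to after the body has been consumed:

```ruby
# Returns an Enumerator over +chunk_size+-byte pieces of the file at +path+.
# Nothing is opened until iteration begins, and the file is closed when
# iteration finishes, so the enumerator can be consumed later by someone else.
def file_chunks(path, chunk_size: 4096)
  Enumerator.new do |out|
    File.open(path, 'rb') do |f|
      # IO#read(n) returns nil at end of file, which ends the loop
      while (chunk = f.read(chunk_size))
        out << chunk
      end
    end
  end
end
```

In the action, `self.response.body = file_chunks(zip_path)` would replace the hand-written Enumerator, provided `zip_path` still exists when the body is read.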

Zip up all Paperclip attachments stored on S3

Paperclip is a great upload plugin for Rails. Storing uploads on the local filesystem or Amazon S3 seems to work well. I'd just as soon store files on localhost, but S3 is required for this app because it will be hosted on Heroku.
How would I go about getting all of my uploads/attachments from S3 in a single zipped download?
Getting a zip of files from the local filesystem seems straightforward. It's getting the files from S3 that has me puzzled. I think it may have something to do with the way rubyzip handles files referenced by URL. I've tried various approaches but can't seem to avoid errors.
format.zip {
  registrations_with_attachments = Registration.find_by_sql('SELECT * FROM registrations WHERE abstract_file_name NOT LIKE ""')
  headers['Cache-Control'] = 'no-cache'
  tmp_filename = "#{RAILS_ROOT}/tmp/tmp_zip_" <<
    Time.now.to_f.to_s <<
    ".zip"
  # rubyzip gem version 0.9.1
  # rdoc http://rubyzip.sourceforge.net/
  Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE) do |zip|
    # get all of the attachments
    # attempt to get files stored on S3
    # FAIL
    registrations_with_attachments.each { |e| zip.add("abstracts/#{e.abstract.original_filename}", e.abstract.url(:original, false)) }
    # => No such file or directory - http://s3.amazonaws.com/bucket/original/abstract.txt
    # Should note that these files in S3 bucket are publicly accessible. No ACL.
    # works with local storage. Thanks to Henrik Nyh
    # registrations_with_attachments.each { |e| zip.add("abstracts/#{e.abstract.original_filename}", e.abstract.path(:original)) }
  end
  send_data(File.open(tmp_filename, "rb+").read, :type => 'application/zip', :disposition => 'attachment', :filename => tmp_filename.to_s)
  File.delete tmp_filename
}

You almost certainly want to use e.abstract.to_file.path instead of e.abstract.url(...).
See:
Paperclip::Storage::S3::to_file (should return a Tempfile)
Tempfile::path
UPDATE
From the changelog:
New in 3.0.1:
API CHANGE: #to_file has been removed. Use the #copy_to_local_file method instead.

@vlard's solution is OK. However, I've run into some issues with to_file. It creates a tempfile, and the garbage collector (sometimes) deletes the file before it has been added to the zip file. Therefore, I'm getting random Errno::ENOENT: No such file or directory errors.
So I'm using the following code now (I've kept the variable names from the original question for consistency):
format.zip {
  registrations_with_attachments = Registration.find_by_sql('SELECT * FROM registrations WHERE abstract_file_name NOT LIKE ""')
  headers['Cache-Control'] = 'no-cache'
  # please note that using the nanoseconds option in strftime reduces the risk of a collision when 2 or more users initiate the download at the same time
  tmp_filename = "#{RAILS_ROOT}/tmp/tmp_zip_" <<
    Time.now.strftime('%Y-%m-%d-%H%M%S-%N').to_s <<
    ".zip"
  # rubyzip gem version 0.9.4
  zip = Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE)
  zip.close
  registrations_with_attachments.each { |e|
    file_to_add = e.abstract.to_file
    zip = Zip::ZipFile.open(tmp_filename)
    zip.add("abstracts/#{e.abstract.original_filename}", file_to_add.path)
    zip.close
    puts "added #{file_to_add.path} to #{tmp_filename}" # force garbage collector to keep the file_to_add until after the file has been added to zip
  }
  send_data(File.open(tmp_filename, "rb+").read, :type => 'application/zip', :disposition => 'attachment', :filename => tmp_filename.to_s)
  File.delete tmp_filename
}
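Timestamp-based names, even with nanoseconds, only make collisions unlikely, and `File.delete` here only runs on the success path. As an alternative sketch (not what the answer uses), Ruby's `Dir.mktmpdir` creates a fresh, uniquely named directory and, in block form, removes it when the block exits, even after an exception. The helper name `with_zip_path` is hypothetical:

```ruby
require 'tmpdir'

# Yields a path for the zip inside a freshly created temporary directory.
# Dir.mktmpdir picks a unique directory name, so concurrent requests get
# separate paths, and in block form it removes the directory (and the zip
# inside it) when the block finishes, even if the block raises.
# Returns whatever the block returns.
def with_zip_path(basename = 'export.zip')
  Dir.mktmpdir('zip-export') do |dir|
    yield File.join(dir, basename)
  end
end
```

Inside `format.zip`, the zip-building code could run in `with_zip_path do |tmp_filename| ... end`, ending with `send_data(File.binread(tmp_filename), ...)`; `send_data` receives the bytes immediately, so removing the directory afterwards is safe.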
