Why is AWS uploading literal file paths, instead of uploading images? - ruby-on-rails

TL;DR
How do you pass a file path to the AWS S3 Ruby client so that it uploads the file's contents as an image, rather than the path as a literal string?
More Details
I'm using the Ruby AWS S3 client to upload images programmatically. I took this code from their example startup code and have barely modified it. See https://docs.aws.amazon.com/sdk-for-ruby/v3/developer-guide/s3-example-upload-bucket-item.html
def object_uploaded?(s3_client, bucket_name, object_key)
  response = s3_client.put_object(
    body: "tmp/cosn_img.jpeg", # is always interpreted literally
    acl: "public-read",
    bucket: bucket_name,
    key: object_key
  )
  if response.etag
    return true
  else
    return false
  end
rescue StandardError => e
  puts "Error uploading object: #{e.message}"
  return false
end

# Full example call:
def run_me
  bucket_name = 'cosn-images'
  object_key = "#{order_number}-trello-pic_#{list_config[:ac_campaign_id]}.jpeg"
  region = 'us-west-2'
  s3_client = Aws::S3::Client.new(region: region)
  if object_uploaded?(s3_client, bucket_name, object_key)
    puts "Object '#{object_key}' uploaded to bucket '#{bucket_name}'."
  else
    puts "Object '#{object_key}' not uploaded to bucket '#{bucket_name}'."
  end
end
This runs and uploads an object to AWS, but the object contains just the file path string from body, not the actual file itself.
(Screenshot: the literal file path is shown when you click on the attachment link.)
As far as I can see from the Client documentation, this should work. https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Client.html#put_object-instance_method
Also, manually uploading this file through the frontend does work just fine, so it has to be an issue in my code.
How are you supposed to let AWS know that it should interpret that file path as a file path, and not just as a string literal?

You have two issues:
1. You have commas at the end of your variable assignments in object_uploaded? that are affecting how your variables are stored. Remove these.
2. You need to reference the file as a File object, not as a file path string, like this:
image = File.open("#{Rails.root}/tmp/cosn_img.jpeg")
See full code below:
def object_uploaded?(image, s3_client, bucket_name, object_key)
  response = s3_client.put_object(
    body: image,
    acl: "public-read",
    bucket: bucket_name,
    key: object_key
  )
  puts response
  if response.etag
    return true
  else
    return false
  end
rescue StandardError => e
  puts "Error uploading object: #{e.message}"
  return false
end

def run_me
  image = File.open("#{Rails.root}/tmp/cosn_img.jpeg")
  bucket_name = 'cosn-images'
  object_key = "#{order_number}-trello-pic_#{list_config[:ac_campaign_id]}.jpeg"
  region = 'us-west-2'
  s3_client = Aws::S3::Client.new(region: region)
  if object_uploaded?(image, s3_client, bucket_name, object_key)
    puts "Object '#{object_key}' uploaded to bucket '#{bucket_name}'."
  else
    puts "Object '#{object_key}' not uploaded to bucket '#{bucket_name}'."
  end
end

Their docs seem a bit weird and not straightforward, but it looks like you need to pass in a file/IO object instead of the path.
The Ruby docs here have an example like this:
s3_client.put_object(
  :bucket_name => 'mybucket',
  :key => 'some/key',
  :content_length => File.size('myfile.txt')
) do |buffer|
  File.open('myfile.txt') do |io|
    buffer.write(io.read(16 * 1024)) until io.eof? # read in ~16 KB chunks
  end
end
Or, another option from the AWS Ruby SDK docs, under "Streaming a file from disk":
File.open('/source/file/path', 'rb') do |file|
  s3.put_object(bucket: 'bucket-name', key: 'object-key', body: file)
end
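Either form works because put_object accepts a String as well as an IO for body. As a rough sketch (untested, and only sensible for small files), you could also read the whole file into memory yourself:
s3.put_object(
  bucket: 'bucket-name',
  key: 'object-key',
  body: File.read('/source/file/path', mode: 'rb') # whole file contents in memory
)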

Related

How to write csv files to S3 from inside a Job?

I have a data backup system for customers of my app. I gather up all associated CSV files and zip them. Once that zip file is complete, I attach it to an email. This process breaks on Heroku due to its ephemeral file system. I thought that since heroku-16 we could write to the app's tmp directory, and that as long as this all happens within the same job run the files would be fine, but that doesn't seem to be the case. I don't even seem to be writing the files to the tmp directory in production (in development I am).
So, what I would like to do instead is just write the CSV files directly to S3, then zip those files and also save the .zip to S3... then pull that file in as an email attachment. To do this, I need to generate the CSV files and write them to S3 from inside ActiveJob. I already use S3 as part of ActiveStorage, but this process will not utilize ActiveStorage.
Is there a command for me to manually upload directly to an S3 bucket? I've been digging around in the docs but don't see what I'm after.
The Job (using /tmp)
def perform(company_id, recipient_id)
  company = Company.find(company_id)
  source_folder = "#{ Rails.root }/tmp"
  zipfile_name = "company_#{ company.id }_archive.zip"
  zipfile_path = "#{ Rails.root }/tmp/#{ zipfile_name }"
  input_filenames = []

  # USERS: create a new empty csv file,
  # ... then add rows to it
  # ... and, add the file name to the list of files array
  users_file_name = "#{ company.name.parameterize.underscore }_users_list.csv"
  input_filenames << users_file_name
  users_csv_file = File.new("#{ Rails.root.join('tmp') }/#{ users_file_name }", 'w')
  users_csv_file << company.users.to_csv
  users_csv_file.close

  ...

  # gather up the created files and zip them
  Zip::File.open(zipfile_path, create: true) do |zipfile|
    input_filenames.uniq.each do |filename|
      zipfile.add(filename, File.join(source_folder, filename))
    end
  end

  puts "attaching data_export".colorize(:red)
  company.data_exports.attach(
    io: StringIO.new("#{ Rails.root }/tmp/company_14_#{ Time.current.to_date.to_s }_archive.zip"),
    filename: 'company_14_archive.zip',
    content_type: 'application/zip'
  )
  last_id = company.data_exports.last.id

  puts "sending mail using company.id: #{ company.id }, recipient_id: #{ recipient_id }, company.data_exports.last.id: #{ last_id }".colorize(:red)
  CompanyMailer.mail_data_export(
    company.id,
    recipient_id,
    last_id
  )
end
You can upload a file to S3 like this:
key = "file_name.zip"
file_path = "tmp/file_name.zip"
new_s3_client = Aws::S3::Resource.new(region: 'eu-west-1', access_key_id: '123', secret_access_key: '456')
new_bucket = new_s3_client.bucket('public')
obj = new_bucket.object(key)
obj.upload_file(file_path)
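If you want to skip the local file entirely (the original ask), a minimal sketch with the plain S3 client could look like the following; the bucket name and region are placeholders, and company is assumed to come from the job as in the code above:
s3 = Aws::S3::Client.new(region: 'eu-west-1')
s3.put_object(
  bucket: 'your-backup-bucket',                # placeholder bucket name
  key: "company_#{company.id}_users_list.csv",
  body: company.users.to_csv,                  # put_object accepts a plain String body
  content_type: 'text/csv'
)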

S3 save old url, change paperclip config, set new url as old

So here is the thing: currently our files, when a user downloads them, have names like 897123uiojdkashdu182uiej.pdf. I need to change that to file-name.pdf.
So, logically, I go and change the paperclip.rb config from this:
Paperclip::Attachment.default_options.update({
  path: '/:hash.:extension',
  hash_secret: Rails.application.secrets.secret_key_base
})
to this:
Paperclip::Attachment.default_options.update({
  path: "/attachment/#{SecureRandom.urlsafe_base64(64)}/:filename",
  hash_secret: Rails.application.secrets.secret_key_base
})
which works just fine; the filenames are great. However, old files are now inaccessible because of the path change, so I came up with the following plan.
First I made a rake task which will store the old paths in the database:
namespace :paperclip do
  desc "Set old urls for attachments"
  task :update_old_urls => :environment do
    Asset.find_each do |asset|
      if asset.attachment
        attachment_url = asset.attachment.try!(:url)
        file_url = "https:#{attachment_url}"
        puts "Set old url asset attachment #{asset.id} - #{file_url}"
        asset.update(old_url: file_url)
      else
        puts "No attachment found in asset #{asset.id}"
      end
    end
  end
end
Now asset.old_url stores the current URL of the file. Then I go and change the config, making the file inaccessible at its old path.
Now it's time for the new rake task:
require 'uri'
require 'open-uri'

namespace :paperclip do
  desc "Recreate attachments and save them to new destination"
  task :move_attachments => :environment do
    Asset.find_each do |asset|
      unless asset.old_url.blank?
        url = asset.old_url
        filename = File.basename(asset.attachment.path)
        file = File.new("#{Rails.root}/tmp/#{filename}", "wb")
        file.write(open(url).read)
        if File.exists? file
          puts "Re-saving asset attachment #{asset.id} - #{filename}"
          asset.attachment = file
          asset.save
          # if there are multiple styles, you want to recreate them:
          asset.attachment.reprocess!
          file.close
        else
          puts "Missing file attachment #{asset.id} - #{filename}"
        end
        File.delete(file)
      end
    end
  end
end
But my plan didn't work at all: I still can't access the files, and asset.url still isn't equal to asset.old_url.
I would appreciate help very much!
With S3, you can set the "filename upon saving" as a header. Specifically, the user will get to a URL like https://foo.bar.com/mangled/path/some/weird/hash/whatever?options, and when the browser offers to save the file, you can control the filename (not the URL).
The trick relies on the browser reading the Content-Disposition header from the response: if it reads Content-Disposition: attachment; filename="filename.jpg", it will save (or ask the user to save as) filename.jpg, independently of the original URL.
You can force S3 to add this header by adding one more parameter to the URL or by setting metadata on the file.
The former can be done by passing it to the url method:
has_attached_file :attachment,
  s3_url_options: ->(instance) {
    {response_content_disposition: "attachment; filename=\"#{instance.filename}\""}
  }
Check https://github.com/thoughtbot/paperclip/blob/v6.1.0/lib/paperclip/storage/s3.rb#L221-L225 for the relevant source code.
The latter can be done in bulk via paperclip (and you should also configure it to do it on new uploads). It will also take a long time!!
Asset.find_each do |asset|
  next unless asset.attachment
  s3_object = asset.attachment.s3_object
  s3_object.copy_to(
    s3_object,
    metadata_directive: 'REPLACE',
    content_disposition: "attachment; filename=\"#{asset.filename}\""
  )
end

# for new uploads
has_attached_file :attachment,
  s3_headers: ->(att) {
    {content_disposition: "attachment; filename=\"#{att.model.filename}\""}
  }
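If you want to spot-check the bulk update afterwards, a quick sketch (aws-sdk-s3 v3; the region, bucket and key are placeholders) is to load one of the copied objects and confirm the header stuck:
resource = Aws::S3::Resource.new(region: 'us-east-1')
obj = resource.bucket('my-bucket').object('mangled/path/some/weird/hash/whatever')
puts obj.content_disposition
# => attachment; filename="filename.jpg"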

How to get objects' keys in amazon s3 in Rails?

I use S3 to store some photos and send their paths through Rails to mobile clients. When I started to send the data for a specific item in the show method, I used this code with the aws-sdk gem to check where the photos are and get their paths:
s3 = Aws::S3::Resource.new(
  access_key_id: 'askdlkasdmkmakml',
  secret_access_key: 'aklsdmkmasldkmasmdlmasdl',
  region: 'ap-southeast-1'
)
images = []
s3.bucket('my-pics').objects.find_all do |object|
  images << object.key if object.key.include?(self.barcode_number)
end
I tested this code with plain Ruby (not Rails) and I get this as the result:
photos/9000101044393/1.jpg
photos/9000101044393/2.jpg
but when I use this code inside Rails I get all this instead:
[#<Aws::S3::ObjectSummary:0x007fe348713170 #bucket_name="my-pics",
#key="photos/9000101044393/1.jpg", #data=#<struct
Aws::S3::Types::Object key="photos/9000101044393/1.jpg",
last_modified=2018-03-01 05:14:57 UTC,
etag="\"ee4540acc2a5bfc948507e0927e9dd1b\"", size=140119,
storage_class="STANDARD", owner=#<struct Aws::S3::Types::Owner
display_name="mohammed.eliass",
id="06bf2e3c37f83d96de16b13fc00efc20a097988edd1d4">>, #client=#
<Aws::S3::Client>>, #<Aws::S3::ObjectSummary:0x007fe348713008
#bucket_name="my-pics", #key="photos/9000101044393/2.jpg", #data=#
<struct Aws::S3::Types::Object key="photos/9000101044393/2.jpg",
last_modified=2018-03-01 05:14:57 UTC,
etag="\"b13dc7b1a516fee5ed15bccc57e\"", size=33132,
storage_class="STANDARD", owner=#<struct Aws::S3::Types::Owner
display_name="mohammed.eliass", id="06bf2e3c37f83d96de16b13fc0adasdaddskfl88edd1d449d85b7157d95bdf334">>,
#client=#<Aws::S3::Client>>, #<Aws::S3::ObjectSummary:0x007fe348712ef0
#bucket_name="my-pics", #key="photos/9000101044430/1.jpg", #data=#
<struct Aws::S3::Types::Object key="photos/9000101044430/1.jpg",
last_modified=2018-03-01 05:14:59 UTC,
etag="\"308ab75c98257b821469b4b93e9ca8f8\"", size=141066,
storage_class="STANDARD", owner=#<struct Aws::S3::Types::Owner
display_name="mohammed.eliass"]
So, any ideas on how I should get rid of all this and get only the keys?
From the docs:
bucket.objects.each do |obj|
  puts obj.key
end
Untested, but if you just want the matching keys in your output:
def matching_s3_keys
  s3 = Aws::S3::Resource.new(
    access_key_id: 'askdlkasdmkmakml',
    secret_access_key: 'aklsdmkmasldkmasmdlmasdl',
    region: 'ap-southeast-1'
  )
  images = s3.bucket('my-pics').objects.select do |object|
    # This looks a bit weird, as I'm unsure if you really
    # want to match an S3 key with another value in your app/db.
    # Just showing here a way to filter your results.
    object.key == self.barcode_number
  end
  images.map(&:key)
end
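If your keys follow a prefix convention like photos/<barcode>/..., as your example output suggests, you can also let S3 filter server-side instead of listing the whole bucket. A rough sketch (aws-sdk-s3 v3; credential/region handling omitted, prefix layout assumed):
def matching_s3_keys(barcode_number)
  s3 = Aws::S3::Resource.new(region: 'ap-southeast-1')
  # Only keys starting with the prefix are returned by the listing.
  s3.bucket('my-pics').objects(prefix: "photos/#{barcode_number}/").map(&:key)
end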
Try this to retrieve the object keys:
s3 = AWS::S3.new
s3.buckets['bucket-name'].objects.each do |o|
  puts o.key
end
https://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
In your case:
images = []
s3.buckets['my-pics'].objects.each do |object|
  images << object.key if object.key.include?(self.barcode_number)
end

Need to change the storage "directory" of files in an S3 Bucket (Carrierwave / Fog)

I am using Carrierwave with 3 separate models to upload photos to S3. I kept the default settings for the uploader, which was to store photos in the root of the S3 bucket. I then decided to store them in sub-directories like avatars/, items/, etc. based on the model they were uploaded from...
Then, I noticed that files of the same name were being overwritten and when I deleted a model record, the photo wasn't being deleted.
I've since changed the store_dir from an uploader-specific setup like this:
def store_dir
  "items"
end
to a generic one which stores photos under the model ID (I use Mongo, FYI):
def store_dir
  "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
Here comes the problem. I am trying to move all the photos already in S3 into the proper "directory" within S3. From what I've read, S3 doesn't have directories per se. I'm having trouble with the rake task: since I changed the store_dir, Carrierwave is looking for all the previously uploaded photos in the wrong directory.
namespace :pics do
  desc "Fix directory location of pictures on s3"
  task :item_update => :environment do
    connection = Fog::Storage.new({
      :provider => 'AWS',
      :aws_access_key_id => 'XXXX',
      :aws_secret_access_key => 'XXX'
    })
    directory = connection.directories.get("myapp-uploads-dev")
    Recipe.all.each do |l|
      if l.images.count > 0
        l.items.each do |i|
          if i.picture.path.to_s != ""
            new_full_path = i.picture.path.to_s
            filename = new_full_path.split('/')[-1].split('?')[0]
            thumb_filename = "thumb_#{filename}"
            original_file_path = "items/#{filename}"
            puts "attempting to retrieve: #{original_file_path}"
            original_thumb_file_path = "items/#{thumb_filename}"
            photo = directory.files.get(original_file_path) rescue nil
            if photo
              puts "we found: #{original_file_path}"
              photo.expires = 2.years.from_now.httpdate
              photo.key = new_full_path
              photo.save
              thumb_photo = directory.files.get(original_thumb_file_path) rescue nil
              if thumb_photo
                puts "we found: #{original_thumb_file_path}"
                thumb_photo.expires = 2.years.from_now.httpdate
                thumb_photo.key = "/uploads/item/picture/#{i.id}/#{thumb_filename}"
                thumb_photo.save
              end
            end
          end
        end
      end
    end
  end
end
So I'm looping through all the Recipes, looking for items with photos, determining the old Carrierwave path, and trying to update it with the new one based on the store_dir change. I thought that if I simply updated photo.key with the new path it would work, but it doesn't.
What am I doing wrong? Is there a better way to accomplish this?
Here's what I did to get this working...
namespace :pics do
  desc "Fix directory location of pictures"
  task :item_update => :environment do
    connection = Fog::Storage.new({
      :provider => 'AWS',
      :aws_access_key_id => 'XXX',
      :aws_secret_access_key => 'XXX'
    })
    bucket = "myapp-uploads-dev"
    puts "Using bucket: #{bucket}"
    Recipe.all.each do |l|
      if l.images.count > 0
        l.items.each do |i|
          if i.picture.path.to_s != ""
            new_full_path = i.picture.path.to_s
            filename = new_full_path.split('/')[-1].split('?')[0]
            thumb_filename = "thumb_#{filename}"
            original_file_path = "items/#{filename}"
            original_thumb_file_path = "items/#{thumb_filename}"
            puts "attempting to retrieve: #{original_file_path}"
            # copy original item
            begin
              connection.copy_object(bucket, original_file_path, bucket, new_full_path, 'x-amz-acl' => 'public-read')
              puts "we just copied: #{original_file_path}"
            rescue
              puts "couldn't find: #{original_file_path}"
            end
            # copy thumb
            begin
              connection.copy_object(bucket, original_thumb_file_path, bucket, "uploads/item/picture/#{i.id}/#{thumb_filename}", 'x-amz-acl' => 'public-read')
              puts "we just copied: #{original_thumb_file_path}"
            rescue
              puts "couldn't find thumb: #{original_thumb_file_path}"
            end
          end
        end
      end
    end
  end
end
Perhaps not the prettiest thing in the world, but it worked.
You need to be interacting with the S3 Objects directly to move them. You'll probably want to look at copy_object and delete_object in the Fog gem, which is what CarrierWave uses to interact with S3.
https://github.com/fog/fog/blob/8ca8a059b2f5dd2abc232dd2d2104fe6d8c41919/lib/fog/aws/requests/storage/copy_object.rb
https://github.com/fog/fog/blob/8ca8a059b2f5dd2abc232dd2d2104fe6d8c41919/lib/fog/aws/requests/storage/delete_object.rb
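For reference, a minimal sketch (untested; the bucket and keys are placeholders) of a "move" built from those two Fog calls, copying to the new key and then deleting the old one:
connection = Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
)
bucket  = 'myapp-uploads-dev'
old_key = 'items/photo.jpg'
new_key = 'uploads/item/picture/123/photo.jpg'

# Copy the object to its new key, then remove the original.
connection.copy_object(bucket, old_key, bucket, new_key, 'x-amz-acl' => 'public-read')
connection.delete_object(bucket, old_key)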

Migrating paperclip S3 images to new url/path format

Is there a recommended technique for migrating a large set of paperclip S3 images to a new :url and :path format?
The reason for this is because after upgrading to rails 3.1, new versions of thumbs are not being shown after cropping (previously cached version is shown). This is because the filename no longer changes (since asset_timestamp was removed in rails 3.1). I'm using :fingerprint in the url/path format, but this is generated from the original, which doesn't change when cropping.
I was intending to insert :updated_at in the url/path format, and update attachment.updated_at during cropping, but after implementing that change all existing images would need to be moved to their new location. That's around half a million images to rename over S3.
At this point I'm considering copying them to their new location first, then deploying the code change, then moving any images which were missed (ie uploaded after the copy), but I'm hoping there's an easier way... any suggestions?
I had to change my paperclip path in order to support image cropping, I ended up creating a rake task to help out.
namespace :paperclip_migration do
  desc 'Migrate data'
  task :migrate_s3 => :environment do
    # Make sure that all of the models have been loaded so any attachments are registered
    puts 'Loading models...'
    Dir[Rails.root.join('app', 'models', '**/*')].each { |file| File.basename(file, '.rb').camelize.constantize }

    # Iterate through all of the registered attachments
    puts 'Migrating attachments...'
    attachment_registry.each_definition do |klass, name, options|
      puts "Migrating #{klass}: #{name}"
      klass.find_each(batch_size: 100) do |instance|
        attachment = instance.send(name)
        unless attachment.blank?
          attachment.styles.each do |style_name, style|
            old_path = interpolator.interpolate(old_path_option, attachment, style_name)
            new_path = interpolator.interpolate(new_path_option, attachment, style_name)
            # puts "#{style_name}:\n\told: #{old_path}\n\tnew: #{new_path}"
            s3_copy(s3_bucket, old_path, new_path)
          end
        end
      end
    end
    puts 'Completed migration.'
  end

  #############################################################################
  private

  # Paperclip Configuration
  def attachment_registry
    Paperclip::AttachmentRegistry
  end

  def s3_bucket
    ENV['S3_BUCKET']
  end

  def old_path_option
    ':class/:id_partition/:attachment/:hash.:extension'
  end

  def new_path_option
    ':class/:attachment/:id_partition/:style/:filename'
  end

  def interpolator
    Paperclip::Interpolations
  end

  # S3
  def s3
    AWS::S3.new(access_key_id: ENV['S3_KEY'], secret_access_key: ENV['S3_SECRET'])
  end

  def s3_copy(bucket, source, destination)
    source_object = s3.buckets[bucket].objects[source]
    destination_object = source_object.copy_to(destination, {metadata: source_object.metadata.to_h})
    destination_object.acl = source_object.acl
    puts "Copied #{source}"
  rescue Exception => e
    puts "*Unable to copy #{source} - #{e.message}"
  end
end
I didn't find a feasible method for migrating to a new URL format. I ended up overriding Paperclip::Attachment#generate_fingerprint so that it appends :updated_at.
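For anyone curious, a rough sketch of what that override might look like (untested; assumes a Paperclip version that defines Attachment#generate_fingerprint and an <attachment>_updated_at column on the model):
# e.g. in an initializer (hypothetical file name)
module Paperclip
  class Attachment
    alias_method :original_generate_fingerprint, :generate_fingerprint

    # Append updated_at so re-cropping (which touches updated_at)
    # changes the fingerprint, and therefore the generated URL/path.
    def generate_fingerprint(source)
      fingerprint = original_generate_fingerprint(source)
      timestamp = instance_read(:updated_at)
      timestamp ? "#{fingerprint}#{timestamp.to_i}" : fingerprint
    end
  end
end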
