Paperclip or Google Cloud Storage issue when renaming paths - ruby-on-rails

I've a Rails app with Paperclip and I use Google Cloud Storage. So far so good.
To avoid having both development and production using the same storage, I decided to change the default Paperclip path to another based based on the environment. This way every env has his own directory. Then I consistently moved the old images from the default Paperclip path to the new ones.
The problem is that now old images give a 404, whereas any new image I upload works properly. Is there any way to fix that?
Here it's the previous settings:
module MyApp
class Application < Rails::Application
config.paperclip_defaults = {
storage: :fog,
fog_public: true,
fog_directory: 'myapp-01',
fog_credentials: {
google_storage_access_key_id: ENV['GOOGLE_STORAGE_ID'],
google_storage_secret_access_key: ENV['GOOGLE_STORAGE_SECRET'],
provider: 'Google'
}
}
}
I override the default using the following settings:
path: ":rails_env/:class/:attachment/:id_partition/:style/:filename",
url: "/:rails_root/:class/:attachment/:id_partition/:style/:filename"
My guess is that it's not sufficient to update Paperclip config with the new path and move all images to the new directory. You need also to update the old records...
If you wonder, the old records point to root/images/?123456789.

Your guess is right. Changing the config is not enough, you need to move the files. This, is better left for a rake task or background job. I have some code for S3, but it should give you an idea of how to implement it for Google:
def old_key(image, file_name_field)
# Previous `:path`: '/:class/:attachment/:id/:style/:filename'
klass = self.class.to_s.pluralize.downcase
attachment = image.pluralize
"#{klass}/#{attachment}/#{id}/original/#{send(file_name_field)}"
end
def re_path(image)
file_name_field = "#{image}_file_name"
return if send(file_name_field).blank?
old_object = bucket.object(old_key(image, file_name_field))
return unless old_object.exists?
Rails.logger.warn "Re-saving image attachment #{self.class}/#{id}"
send "#{image}=", URI.parse(old_object.public_url)
save
end
I'm basically building the old path using my own interpolation, finding the object in S3 (hence key/object lingo) and re-download every image from S3. Be careful with this, since you might incur in extra cost for downloading rather than just moving, if that's someone Google allows.
Then I just called this method on every image for every object:
Object.each do { |o| o.re_path(:logo); o.re_path(:background); }

Related

Rails - Resave All Models for S3 Migration

rails 6.1.3.2
aws-sdk-s3 gem
I currently have a rails app in production that uses ActiveStorage to attach image data to a wrapper Image model. It's currently using the local strategy to save images to disk and I am migrating it to S3. I am not using paperclip or anything similar.
I succeeded in setting it up. Currently it is set to use local primarily and have S3 as a mirror so that I can write to two places during the migration. However the documentation says that it will only save new images to S3 upon create and update of a record. I would like to "re-save" all models in production to force the migration to happen. Does anyone know how to do this?
Looks like it was already answered!
If you happen to be stuck with only access to the Rails Console like I was, this solution worked perfectly. If you copy-paste this code into the console, it will begin to produce output of the S3 uploads. After 5k of those, I was done. An immense thank you to Tayden for the solution.
all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]
# Iterate through each blob
ActiveStorage::Blob.all.each do |blob|
# Select services where file exists
services = all_services.select { |file| file.exist? blob.key }
# Skip blob if file doesn't exist anywhere
next unless services.present?
# Select services where file doesn't exist
mirrors = all_services - services
# Open the local file (if one exists)
local_file = File.open(services.find{ |service| service.is_a? ActiveStorage::Service::DiskService }.path_for blob.key) if services.select{ |service| service.is_a? ActiveStorage::Service::DiskService }.any?
# Upload local file to mirrors (if one exists)
mirrors.each do |mirror|
mirror.upload blob.key, local_file, checksum: blob.checksum
end if local_file.present?
# If no local file exists then download a remote file and upload it to the mirrors (thanks #Rystraum)
services.first.open blob.key, checksum: blob.checksum do |temp_file|
mirrors.each do |mirror|
mirror.upload blob.key, temp_file, checksum: blob.checksum
end
end unless local_file.present?

How to specify a prefix when uploading to S3 using activestorage's direct upload?

With a standard S3 configuration:
AWS_ACCESS_KEY_ID: [AWS ID]
AWS_BUCKET: [bucket name]
AWS_REGION: [region]
AWS_SECRET_ACCESS_KEY: [secret]
I can upload a file to S3 (using direct upload) with this Rails 5.2 code (only relevant code shown):
form.file_field :my_asset, direct_upload: true
This will effectively put my asset in the root of my S3 bucket, upon submitting the form.
How can I specify a prefix (e.g. "development/", so that I can mimic a folder on S3)?
2022 update: as of Rails 6.1 (check this commit), this is actually supported:
user.avatar.attach(key: "avatars/#{user.id}.jpg", io: io, content_type: "image/jpeg", filename: "avatar.jpg")
My current workaround (at least until ActiveStorage introduces the option to pass a path for the has_one_attached and has_many_attached macros) on S3 is to implement the move_to method.
So I'm letting ActiveStorage save the image to S3 as it normally does right now (at the top of the bucket), then moving the file into a folder structure.
The move_to method basically copies the file into the folder structure you pass then deletes the file that was put at the root of the bucket. This way your file ends up where you want it.
So for instance if we were storing driver details: name and drivers_license, save them as you're already doing it so that it's at the top of the bucket.
Then implement the following (I put mine in a helper):
module DriversHelper
def restructure_attachment(driver_object, new_structure)
old_key = driver_object.image.key
begin
# Passing S3 Configs
config = YAML.load_file(Rails.root.join('config', 'storage.yml'))
s3 = Aws::S3::Resource.new(region: config['amazon']['region'],
credentials: Aws::Credentials.new(config['amazon']['access_key_id'], config['amazon']['secret_access_key']))
# Fetching the licence's Aws::S3::Object
old_obj = s3.bucket(config['amazon']['bucket']).object(old_key)
# Moving the license into the new folder structure
old_obj.move_to(bucket: config['amazon']['bucket'], key: "#{new_structure}")
update_blob_key(driver_object, new_structure)
rescue => ex
driver_helper_logger.error("Error restructuring license belonging to driver with id #{driver_object.id}: #{ex.full_message}")
end
end
private
# The new structure becomes the new ActiveStorage Blob key
def update_blob_key(driver_object, new_key)
blob = driver_object.image_attachment.blob
begin
blob.key = new_key
blob.save!
rescue => ex
driver_helper_logger.error("Error reassigning the new key to the blob object of the driver with id #{driver_object.id}: #{ex.full_message}")
end
end
def driver_helper_logger
#driver_helper_logger ||= Logger.new("#{Rails.root}/log/driver_helper.log")
end
end
It's important to update the blob key so that references to the key don't return errors.
If the key is not updated any function attempting to reference the image will look for it in it's former location (at the top of the bucket) rather than in it's new location.
I'm calling this function from my controller as soon as the file is saved (that is, in the create action) so that it looks seamless even though it isn't.
While this may not be the best way, it works for now.
FYI: Based on the example you gave, the new_structure variable would be new_structure = "development/#{driver_object.image.key}".
I hope this helps! :)
Thank you, Sonia, for your answer.
I tried your solution and it works great, but I encountered problems with overwriting attachments. I often got IntegrityError while doing it. I think, that this and checksum handling may be the reason why the Rails core team don't want to add passing pathname feature. It would require changing the entire logic of the upload method.
ActiveStorage::Attached#create_from_blob method, could also accepts an ActiveStorage::Blob object. So I tried a different approach:
Create a Blob manually with a key that represents desired file structure and uploaded attachment.
Attach created Blob with the ActiveStorage method.
In my usage, the solution was something like that:
def attach file # method for attaching in the model
blob_key = destination_pathname(file)
blob = ActiveStorage::Blob.find_by(key: blob_key.to_s)
unless blob
blob = ActiveStorage::Blob.new.tap do |blob|
blob.filename = blob_key.basename.to_s
blob.key = blob_key
blob.upload file
blob.save!
end
end
# Attach method from ActiveStorage
self.file.attach blob
end
Thanks to passing a full pathname to Blob's key I received desired file structure on a server.
Sorry, that’s not currently possible. I’d suggest creating a bucket for Active Storage to use exclusively.
The above solution will still give IntegrityError, need to use File.open(file). Thank Though for idea.
class History < ApplicationRecord
has_one_attached :gs_history_file
def attach(file) # method for attaching in the model
blob_key = destination_pathname(file)
blob = ActiveStorage::Blob.find_by(key: blob_key.to_s)
unless blob
blob = ActiveStorage::Blob.new.tap do |blob|
blob.filename = blob_key.to_s
blob.key = blob_key
#blob.byte_size = 123123
#blob.checksum = Time.new.strftime("%Y%m%d-") + Faker::Alphanumeric.alpha(6)
blob.upload File.open(file)
blob.save!
end
end
# Attach method from ActiveStorage
self.gs_history_file.attach blob
end
def destination_pathname(file)
"testing/filename-#{Time.now}.xlsx"
end
end

Paperclip Nginx 504 Gateway Time-out

I have a Rails 4 application that allows to upload videos using the jQuery Dropzone plugin and the paperclip gem. Each uploaded video is encoded into multiple formats and uploaded to Amazon S3 in the background using delayed_paperclip, av-transcoder and sidekiq gems.
All works fine with most videos, but with a higher size like 1.1GB after the upload reaches what seems like the end of the progress bar of the dropzone plugin it returns an Nginx 504 Gateway Time-out.
As far as server goes, the rails app runs on Nginx + Passenger on a couple of servers that are behind a load balancer (Nginx used here too). I do not have timeouts set in the upstream section of the load balancer, the client_max_body_size is set to 2000M (both on the load balancer and servers), I've tried setting passenger_pool_idle_time to a large value (600), that didn't help, I have also tried setting send_timeout (600s), nothing made any difference.
Note: When making those changes, I did them on the host files of both servers as well as of the load balancer and always restarted nginx afterwards.
I've read also several answers regarding similar problems like this one and this one but still can't figure this out, google wasn't much more helpful either.
Some extra notes for those unfamiliar with the whole paperclip/delayed_paperclip process, the file is uploaded to the server and then the operation is done as far as the user is concerned, in the background the post processing of the videos (encoding/uploading to S3) is pushed to Redis as a job and Sidekiq processes it whenever it has time/resources.
What could be causing this issue? How can I debug this and solve it?
UPDATE
Thanks to Sergey's answer I was able to solve the issue. Since I was restricted to a specific version of Paperclip, I couldn't update it to the newest version that has the fix, therefore I'll leave here what I ended up doing.
In the engine that I use to handle the uploads I've added the following code in the engine_name.rb file to override the methods from Paperclip that needed fixing:
Paperclip::AbstractAdapter.class_eval do
def copy_to_tempfile(src)
link_or_copy_file(src.path, destination.path)
destination
end
def link_or_copy_file(src, dest)
Paperclip.log("Trying to link #{src} to #{dest}")
FileUtils.ln(src, dest, force: true) # overwrite existing
#destination.close
#destination.open.binmode
rescue Errno::EXDEV, Errno::EPERM, Errno::ENOENT => e
Paperclip.log("Link failed with #{e.message}; copying link #{src} to #{dest}")
FileUtils.cp(src, dest)
end
end
Paperclip::AttachmentAdapter.class_eval do
def copy_to_tempfile(source)
if source.staged?
link_or_copy_file(source.staged_path(#style), destination.path)
else
source.copy_to_local_file(#style, destination.path)
end
destination
end
end
Paperclip::Storage::Filesystem.class_eval do
def flush_writes #:nodoc:
#queued_for_write.each do |style_name, file|
FileUtils.mkdir_p(File.dirname(path(style_name)))
begin
move_file(file.path, path(style_name))
rescue SystemCallError
File.open(path(style_name), "wb") do |new_file|
while chunk = file.read(16 * 1024)
new_file.write(chunk)
end
end
end
unless #options[:override_file_permissions] == false
resolved_chmod = (#options[:override_file_permissions] &~ 0111) || (0666 &~ File.umask)
FileUtils.chmod( resolved_chmod, path(style_name) )
end
file.rewind
end
after_flush_writes # allows attachment to clean up temp files
#queued_for_write = {}
end
private
def move_file(src, dest)
# Support hardlinked files
if File.identical?(src, dest)
File.unlink(src)
else
FileUtils.mv(src, dest)
end
end
end
I faced similar issue a while ago. Maybe, my experience will help.
We had m3.medium instance on Amazon with 4Gb of memory.
User could be able to upload large video files. We faced an issue of 504 error when uploading files larger than 400Mb.
During monitoring and logging the upload process it appeared that Paperclip creates 4 files per attachment and thus all the instance resources work on a file system.
Here there is a description of this problem
https://github.com/thoughtbot/paperclip/issues/1642
and proposed a solution - use links instead of files when possible. You can see the appropriate code changes here
https://github.com/arnonhongklay/paperclip/commit/cd80661df18d7cd112944bfe26d90cb87c928aad
However 2 days ago Paperclip was updated to 5.2.0 version and they implemented similar solution.
So for now it creates only one file per attachment. Thus our file system is not overloaded and after updating to 5.2.0 version we stopped receiving 504 error.
Conclusion:
Use monkey patch from the link attached above if you're restricted in Paperclip version for some reason
Update Paperclip to 5.2.0 version. Should help.

Aws::S3 undefined method 'with_prefix' on a ::Resource

So, first, I'll give you an idea of what I'm trying to accomplish. I host a bunch of photos on S3. They're car photos, and organized by VIN. At creation, I don't always have the VIN, so I make a fake one. Later on, when updated with the correct VIN, I want to update the folder name on S3. Can't rename stuff in S3, so I have to copy with a new name, and delete the original. Phew! Now on to how I'm failing...
creds = ::Aws::Credentials.new(Settings.aws.access_key_id, Settings.aws.secret_access_key)
s3 = ::Aws::S3::Client.new(region: 'us-west-1', credentials: creds)
bucket = Settings.aws.bucket
s3.copy_object(bucket: bucket, copy_source: "#{bucket}/foo", key: 'bar')
s3.delete_object(bucket: bucket, key: "#{bucket}/foo")
This does not work... Aws::S3::Errors::NoSuchKey: The specified key does not exist. when calling copy_object. copy_source is a folder, so it doesn't work like a regular object, fine.
So, I looked around and found that I have to call:
s3 = ::Aws::S3::Resource.new(region: 'us-west-1', credentials: creds)
bucket = bucket = s3.bucket(Settings.aws.bucket)
bucket.objects.with_prefix('foo/').each do |object|
object.copy_from(copy_source: "#{bucket}/foo")
end
This does not work... notice that I'm calling ::Resource now, as I don't know how to get a bucket from ::Client. With above, I get NoMethodError: undefined method 'with_prefix' for #<Aws::Resources::Collection:0x007fad587255a8> with baffles me, as everything I read seems to point to that solution.
I'm using v2 of the AWS SDK. Not sure if I'm looking at v1 solutions?
I have no idea how the copy_from works, honestly... what I really want is a copy_to which doesn't seem to be in the documentation.
aws-sdk-core-ruby github
aws sdk docs
The 'objects' method of 'bucket' class in aws-sdk v2 returns a Collection of ObjectSummary (See: http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Bucket.html#objects-instance_method).
To query a list of objects with a prefix you should use:
bucket.objects({prefix: 'some_prefix'})
I use
bucket.objects.find_all { |object| object.key.include?('foo/') }
for aws sdk v2

Tracking Upload Progress of File to S3 Using Ruby aws-sdk

Firstly, I am aware that there are quite a few questions that are similar to this one in SO. I have read most, if not all of them, over the past week. But I still can't make this work for me.
I am developing a Ruby on Rails app that allows users to upload mp3 files to Amazon S3. The upload itself works perfectly, but a progress bar would greatly improve user experience on the website.
I am using the aws-sdk gem which is the official one from Amazon. I have looked everywhere in its documentation for callbacks during the upload process, but I couldn't find anything.
The files are uploaded one at a time directly to S3 so it doesn't need to load it into memory. No multiple file upload necessary either.
I figured that I may need to use JQuery to make this work and I am fine with that.
I found this that looked very promising: https://github.com/blueimp/jQuery-File-Upload
And I even tried following the example here: https://github.com/ncri/s3_uploader_example
But I just could not make it work for me.
The documentation for aws-sdk also BRIEFLY describes streaming uploads with a block:
obj.write do |buffer, bytes|
# writing fewer than the requested number of bytes to the buffer
# will cause write to stop yielding to the block
end
But this is barely helpful. How does one "write to the buffer"? I tried a few intuitive options that would always result in timeouts. And how would I even update the browser based on the buffering?
Is there a better or simpler solution to this?
Thank you in advance.
I would appreciate any help on this subject.
The "buffer" object yielded when passing a block to #write is an instance of StringIO. You can write to the buffer using #write or #<<. Here is an example that uses the block form to upload a file.
file = File.open('/path/to/file', 'r')
obj = s3.buckets['my-bucket'].objects['object-key']
obj.write(:content_length => file.size) do |buffer, bytes|
buffer.write(file.read(bytes))
# you could do some interesting things here to track progress
end
file.close
After read the source code of the AWS gem, I've adapted (or mostly copy) the multipart upload method to yield the current progress based on how many chunks have been uploaded
s3 = AWS::S3.new.buckets['your_bucket']
file = File.open(filepath, 'r', encoding: 'BINARY')
file_to_upload = "#{s3_dir}/#{filename}"
upload_progress = 0
opts = {
content_type: mime_type,
cache_control: 'max-age=31536000',
estimated_content_length: file.size,
}
part_size = self.compute_part_size(opts)
parts_number = (file.size.to_f / part_size).ceil.to_i
obj = s3.objects[file_to_upload]
begin
obj.multipart_upload(opts) do |upload|
until file.eof? do
break if (abort_upload = upload.aborted?)
upload.add_part(file.read(part_size))
upload_progress += 1.0/parts_number
# Yields the Float progress and the String filepath from the
# current file that's being uploaded
yield(upload_progress, upload) if block_given?
end
end
end
The compute_part_size method is defined here and I've modified it to this:
def compute_part_size options
max_parts = 10000
min_size = 5242880 #5 MB
estimated_size = options[:estimated_content_length]
[(estimated_size.to_f / max_parts).ceil, min_size].max.to_i
end
This code was tested on Ruby 2.0.0p0

Resources