Transferring some (not all) files between S3 buckets using Paperclip - ruby-on-rails

I have a Heroku-hosted app that uses Paperclip to store User photos on Amazon S3.
I want to move some (not all) files to a new bucket based on some internal logic (the app is multi-tenant, and I'm separating AWS file storage and my Postgres DB into separate tenants/schemas).
I have 2 options I'm considering:
Option 1 - Use the AWS CLI to move files directly between buckets
This option is AWS-native, but it has the drawback of having to deal with the entire folder structure for each file. Moving a file means moving all of its styles - original, medium, thumbnail, etc. - so it's not as straightforward as copying one file over.
It also copies everything over to the new bucket with the exact same folder/id structure, which I'd like to avoid, since the User's corresponding DB info (e.g. id) will change when I migrate them over in the Postgres DB.
Option 2 - Use Paperclip to pull down each file locally and re-upload it
This is an attractive option because it lets Paperclip handle all the work.
However, Paperclip uses the bucket name to construct the URL of the file. I need it to pull from one bucket and push to another. Is there a way to set the bucket name individually for each transaction?

Paperclip uses the bucket name to construct the URL of a remote file, but the names of those directories and files don't depend on the bucket name. If your files or directories contain the bucket name, you are doing it wrong and should start by fixing that.
Do the following:
Sync your public/system directory with the old bucket using the aws s3 sync OLD_BUCKET_URL public/system command
Perform the changes to the directories and files locally with a Ruby script that uses Paperclip (see the sketch after these steps)
Sync (upload) your public/system directory to the new bucket using the aws s3 sync public/system NEW_BUCKET_URL command
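As a rough illustration of the middle step, here is a minimal sketch. It assumes a User model with a Paperclip photo attachment and a hypothetical tenant scope, and that Paperclip points at filesystem storage while the script runs so the re-saved files land in public/system for the final sync; old_local_path_for is a made-up helper that maps a migrated record to the file synced down from the old bucket:

# Re-save each migrated user's photo through Paperclip so it regenerates the
# directory structure (and all styles) under the record's new id.
User.where(tenant: 'new_tenant').find_each do |user|
  old_file = old_local_path_for(user)   # hypothetical helper, not part of Paperclip
  next unless File.exist?(old_file)
  user.photo = File.open(old_file)      # Paperclip copies the file and re-processes each style
  user.save!
end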

Related

How do I get the local path to an Active Storage Blob?

I built my Rails app and figured I would use Active Storage (AS). I then realized that AS dumps all your files into your storage root. I need to segregate those between clients, and I would also rather organize them on a per-model basis, etc. CarrierWave (CW) can do this out of the box. I am going to build a rake task to migrate these old attachments over.
The AS blob key is the filename stored locally, except on my local machine it's stored like this:
/storage/HR/mw/HRmWZZNk4wd7dD1nt9iUbi1n
and on my S3 compatible store:
/HRmWZZNk4wd7dD1nt9iUbi1n
There seems to be no built-in method to return the local path of an AS file (which CW has). I know I can build the local path on the fly, but I'm looking to see if I am missing something obvious here.
Found it here:
Get path to ActiveStorage file on disk
ActiveStorage::Blob.service.send(:path_for, user.avatar.key)
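For example, a migration rake task could use that call to resolve every attachment's path on disk. This is only a sketch: it assumes the Disk service and a User model with an avatar attachment, and path_for is a private method, so it may change between Rails versions:

# Resolve the on-disk path of each avatar blob (Disk service only).
User.find_each do |user|
  next unless user.avatar.attached?
  local_path = ActiveStorage::Blob.service.send(:path_for, user.avatar.key)
  puts local_path   # e.g. storage/HR/mw/HRmWZZNk4wd7dD1nt9iUbi1n
end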

Exporting existing uploaded images in Refinery to Amazon S3

I have an existing Rails RefineryCMS application, which has been running for quite some time. It has a lot of image and document uploads, which have always been uploaded to the local filesystem.
But we are moving to Heroku, and this will be a problem, since Heroku doesn't persist these files.
So we need to get all the existing images and documents exported to Amazon S3.
How could we achieve this?
Would it be as plain and simple as copying the existing files from the current production environment over to the S3 bucket?
Kind regards
The plain solution was just to copy the existing folders generated by Dragonfly's file uploads in "app/public" over to Amazon S3.
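For example (the local path and bucket name below are only placeholders), the copy itself can be done with the AWS CLI:

aws s3 sync public/system s3://my-refinery-bucket/system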

Which directory should I save generated images to in Rails, as a best practice?

I generate an image in my Rails app using RMagick and save it to public/images.
But it looks like Capistrano's default deploy task resets that folder on each deploy.
Should I configure Capistrano so that public/images is not deleted?
Or was it a wrong idea in the first place to generate images into public/images?
Generally, the standard is to keep publicly accessible content in the public directory. For storing images you can use either the public directory (provided you want them to be publicly accessible) or some cloud-based storage like Amazon S3; it depends entirely on the requirement.
And finally, since you are keeping dynamically generated data (or possibly user-uploaded data) in the public directory, you should configure Capistrano not to delete or modify that data.
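With Capistrano 3, for example, that is usually done by adding the directory to linked_dirs in config/deploy.rb (the directory name below is just an example), so it lives in shared/ and is symlinked into each release instead of being wiped:

# config/deploy.rb -- keep public/images in shared/ and symlink it into each release
set :linked_dirs, fetch(:linked_dirs, []).push("public/images")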

Extracting uploaded archive to S3 with CarrierWave on Heroku

I want to do something that I thought would be a simple task:
Have a form with these controls:
File upload for one file
Checkbox if this file should be extracted
Text input where I would specify which file I should link to (required only if the checkbox is checked) - index_file
After submitting the form:
If the checkbox isn't checked, upload the file via CarrierWave to S3 to the specified store_dir
If the checkbox is checked, extract all files from the archive (I expect only ZIP archives; I need to keep the directory structure), upload the extracted files to the specified store_dir, and set the index_file in the database (I don't need to save anything to the database about the other extracted files)
As I have found, it isn't an easy task because of Heroku limitations. These files will be large (hundreds of MiB or a few GiB), so I don't want to re-download the file from S3 if possible.
I think that using Delayed Job or Resque might work, but I'm not exactly sure how to do it or what the best solution to my problem is.
Does anyone have any idea how to solve this using as few resources as possible? I can change CarrierWave to another uploader (Paperclip, etc.) and my hosting provider too if it isn't possible on Heroku.
I was also thinking about using CloudFlare; would this still work without problems?
Thank you for your answers.
Based on this Heroku support email, it would seem that the /tmp directory is many gigabytes in size. You just need to clean up after yourself, so Heroku as a platform is not the issue.
A couple of articles may help you solve the problem:
https://github.com/jnicklas/carrierwave/wiki/How-to%3A-Make-Carrierwave-work-on-Heroku - which explains how to configure your app to use the /tmp directory as the cache directory for CarrierWave. Pay attention to the following line:
use Rack::Static, :urls => ['/carrierwave'], :root => 'tmp' # adding this line
This instructs Rack to serve /carrierwave/xyz from the /tmp directory (useful for storing images temporarily).
Then, using the uploader.cache! method, you can deliberately cache the inbound uploaded file. Once cached, you can do checks to determine whether to call the uploader.store! method, which will promote the contents to S3 (assuming you configured S3 as the store for CarrierWave).
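Putting that together, a minimal sketch of what the wiki page describes; the uploader name is hypothetical, and it assumes CarrierWave is already configured with S3 (via fog) as the store:

class ArchiveUploader < CarrierWave::Uploader::Base
  storage :fog   # S3, per your existing CarrierWave/fog configuration

  # Heroku dynos only allow writes under tmp/, so cache inbound uploads there
  def cache_dir
    "#{Rails.root}/tmp/uploads"
  end
end

uploader = ArchiveUploader.new
uploader.cache!(params[:file])   # stage the upload under tmp/uploads
# ... inspect or extract the cached file here ...
uploader.store!                  # promote the cached file to S3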

Paperclip: migrating from file system storage to Amazon S3

I have a RoR website where users can upload photos. I use the Paperclip gem to upload the photos and store them on the server as files. I am planning to move to Amazon S3 for storing the photos. I need to move all my existing photos from the server to Amazon S3. Can someone tell me the best way to move the photos? Thanks!
You'll want to log into your AWS Console and create a bucket structure to hold your images. Neither S3 nor Paperclip has any tools for bulk migrations from the file system to S3; you'll need to use the s3cmd tool for that. In particular, you're interested in the s3cmd sync command, something along the lines of:
s3cmd sync ./public/system/images/ s3://imagesbucket
If you have any image URLs hard-coded into your database (a la markdown/template code), this might be a little tricky. One option would be to manually update your URLs to point to the new bucket. Alternatively, you can use rack-rewrite.
You can easily do this by creating a bucket on Amazon S3 that has the same folder structure as the public directory of your Rails app.
So say, for instance, you create a new bucket on Amazon S3 called MyBucket and it has a folder in it called images. You'd just move everything in your Rails app's images folder over to that new bucket's images folder.
Then you can set up your app to use an asset host like this answer describes: is it good to use S3 for Rails "public/images" and there an easy way to do it?
If you are using image_tag or other tag helpers (javascripts, stylesheets, etc.), they will use that asset_host in production environments and properly generate the URL to your S3 bucket.
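That asset host typically goes in the production environment config; a minimal example, with the bucket URL as a placeholder:

# config/environments/production.rb
config.action_controller.asset_host = "https://MyBucket.s3.amazonaws.com"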
I found this script which takes care of moving the images to Amazon S3 bucket using rake task.
https://gist.github.com/924617
