Syncing files in a Rails repo to S3

I am thinking of implementing a rake task that would sync certain files in my repository to S3. The catch is that I only want to update the files when they have changed in my repo. So if file A gets modified and B stays the same, only file A will be synchronized to S3 during my next app deploy.
What is a reliable way to determine that a file has been modified? I am thinking of using git to determine whether the file has changed locally. Is there any other way to do this? Does S3 provide similar functionality?

S3 does not presently support conditional PUTs, which would be the ideal solution, but you can get this behavior with two requests instead. Your sync operation would look something like:
For each file that you want on S3:
Calculate the MD5 of the local file.
Issue a HEAD request for that S3 object.
Issue a PUT request if the object's ETag (the hex MD5 of the object for single-part, unencrypted uploads) differs or the object does not exist.
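A minimal sketch of that loop using the aws-sdk-s3 gem, assuming a hypothetical bucket name and that the syncable files live under public/uploads:

require "aws-sdk-s3"
require "digest"

BUCKET = "my-app-assets" # hypothetical bucket name
client = Aws::S3::Client.new

Dir.glob("public/uploads/**/*").select { |f| File.file?(f) }.each do |path|
  key = path.sub("public/", "")
  md5 = Digest::MD5.file(path).hexdigest

  # HEAD the object; a missing key raises NotFound.
  remote_etag =
    begin
      client.head_object(bucket: BUCKET, key: key).etag.delete('"')
    rescue Aws::S3::Errors::NotFound
      nil
    end

  next if remote_etag == md5 # unchanged, skip the PUT

  File.open(path, "rb") do |io|
    client.put_object(bucket: BUCKET, key: key, body: io)
  end
end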
That said, this sounds a lot like something you'd do with assets, in which case you'd be reinventing the wheel. The Rails 3 asset pipeline addresses this problem well -- in particular, fingerprinting assets and putting the hash in the URL allows you to serve them with insanely long max-age values since they're immutable -- and the asset_sync gem can already put your assets on S3 automatically.
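If you go the asset route, asset_sync is configured through a fog-backed initializer; a typical config/initializers/asset_sync.rb looks roughly like this (the environment variable names are placeholders):

AssetSync.configure do |config|
  config.fog_provider = 'AWS'
  config.fog_directory = ENV['FOG_DIRECTORY']
  config.aws_access_key_id = ENV['AWS_ACCESS_KEY_ID']
  config.aws_secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']
end

With that in place, rake assets:precompile pushes the fingerprinted assets to S3, skipping files that already exist remotely.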

What about deleted files? The easy way to handle those is to blast the whole directory and re-upload everything from the latest version.
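A more surgical alternative, reusing the hypothetical BUCKET and client from the sketch above, is to delete remote keys that no longer exist locally (note that list_objects_v2 returns at most 1,000 keys per page, so a real task would paginate):

local_keys = Dir.glob("public/uploads/**/*")
                .select { |f| File.file?(f) }
                .map { |f| f.sub("public/", "") }

client.list_objects_v2(bucket: BUCKET).contents.each do |obj|
  client.delete_object(bucket: BUCKET, key: obj.key) unless local_keys.include?(obj.key)
end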

Related

True Paperclip Replacement (Specifically, Structure of the File System)

With Rails 6, I need to replace Paperclip, but I can't find any substitute that easily replicates it.
Specifically, the file structure paperclip used:
:model/:attachment_field/000/000/000/:identifier/:style/:original_file_name
Over the last decade we have built several tools that rely on that structure (or something similar). In addition, our users expect that after uploading an image they can reference the styles with the same file name and a permanent URL (not a randomly generated name like ActiveStorage and Shrine produce), and change the "style" component of the URL to a different one in their HTML.
I've spent several days on both Shrine and ActiveStorage trying to get the file structure and naming to work, and keep failing; despite being "natural replacements", they don't actually handle things in the same way.
Our end system is on Amazon S3, though integrating with that hasn't been the issue, just the file system.
Thanks for your help. It's been really frustrating having to remove something that works great when there seems to be nothing that actually replaces it if you want/need things done in the same way. I'd rather not have to start rewriting all of the tools that we developed and resetting our customers' expectations to work with a new structure.
Thanks so much.
Have you tried Carrierwave? You can specify any storage path and build it dynamically using model name (model.class.to_s.underscore), attachment field (mounted_as), model id (model.id). The original file name is also available as original_filename.
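A minimal uploader sketch along those lines, assuming a fog/S3-backed CarrierWave setup (the id_partition helper is hypothetical, mimicking Paperclip's zero-padded :id_partition interpolation):

class AttachmentUploader < CarrierWave::Uploader::Base
  storage :fog

  # Rebuild a Paperclip-style path: model/attachment_field/000/000/123/filename
  def store_dir
    "#{model.class.to_s.underscore}/#{mounted_as}/#{id_partition}"
  end

  private

  # Zero-pad the id to nine digits and split it into 3-digit segments.
  def id_partition
    format("%09d", model.id).scan(/\d{3}/).join("/")
  end
end

CarrierWave keeps the original file name by default; note that versions (version :thumb) prefix the file name rather than adding a directory, so replicating Paperclip's :style directory component would need a store_dir or full_filename override per version.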

How do I get the local path to an Active Storage Blob?

I built my Rails app and figured I would use Active Storage (AS). I then realized that AS dumps all your files into your storage root. I need to segregate files between clients, and I would also rather organize them on a per-model basis. Carrierwave (CW) can do this out of the box. I am going to build a rake task to migrate the old attachments over.
The AS blob key is the filename stored locally, except on my local machine it's stored like this:
/storage/HR/mw/HRmWZZNk4wd7dD1nt9iUbi1n
and on my S3 compatible store:
/HRmWZZNk4wd7dD1nt9iUbi1n
There seems to be no built-in method to return the local path of an AS file (which CW has). I know I can build the local path on the fly, but I'm looking to see if I am missing something obvious here.
Found it here:
Get path to ActiveStorage file on disk
ActiveStorage::Blob.service.send(:path_for, user.avatar.key)
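Note that path_for is a private method of the Disk service, so this only applies where Active Storage is configured with the Disk service; a guarded sketch (user.avatar stands in for whatever attachment you have):

service = ActiveStorage::Blob.service
if service.is_a?(ActiveStorage::Service::DiskService)
  local_path = service.send(:path_for, user.avatar.key)
  # e.g. storage/HR/mw/HRmWZZNk4wd7dD1nt9iUbi1n
end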

Strategy for avoiding file upload naming conflicts

I have a webapp in Rails which has an AJAX file upload feature. Files are uploaded to a remote server (AWS S3). My current strategy is to upload the files to a temp/ directory (with their original names) until the user submits the form, and then rename them to their definitive names.
But the problem is that if multiple users try to upload two files with the same name at the same time, one is going to overwrite the other.
The strategy I was thinking of to solve this was to generate a random SHA1 when the upload page is loaded, store it in a table locally to make sure it's unique, and remove it when the temp file is renamed.
Do you see problems with this approach?
What's a good strategy to solve this problem?
One problem is that if they navigate away from the page without uploading anything, their hash will stay in the database and eventually make a mess. I would avoid storing anything this temporary in the database.
Rather than trying to come up with your own way to name temporary files, why not use the Ruby tempfile library, which will do it for you?
Originally, I thought you were uploading the files to the Ruby server and then uploading them to S3 yourself. Tempfiles won't help if users are uploading files directly. If you just want unique names for your temp files, a UUID generator might work for you. There is a Ruby UUID generator gem which is designed not to produce duplicates, even in a distributed setting. If you name your files with these, you shouldn't need to store anything in the database.
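The standard library's SecureRandom.uuid gives the same collision resistance without an extra gem; a hypothetical helper that keeps the original extension so content-type detection still works:

require "securerandom"

def temp_upload_key(original_filename)
  ext = File.extname(original_filename)
  "temp/#{SecureRandom.uuid}#{ext}"
end

temp_upload_key("report.pdf") # => e.g. "temp/0f8fad5b-d9cb-469f-a165-70867728950e.pdf"

Because the key is unique per upload, nothing needs to be tracked in the database, and abandoned temp/ objects can be swept up later with an S3 lifecycle rule.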

Extracting uploaded archive to S3 with CarrierWave on Heroku

I want to do something that I thought would be a simple task:
Have a form with these controls:
File upload for one file
Checkbox if this file should be extracted
Text input where I would specify which file I should link to (required only if the checkbox is checked) - index_file
After submitting form:
If the checkbox isn't checked, upload the file via CarrierWave to S3 to the specified store_dir
If the checkbox is checked, extract all files from the archive (I expect only ZIP archives; I need to keep the directory structure), upload extracted files to the specified store_dir and set the index_file in database (I don't need to save to database anything about other extracted files)
As I have found, it isn't an easy task because of Heroku limitations. These files will have a large size (hundreds of MiBs or a few GiBs), so I don't want to redownload this file from S3 if possible.
I think that using Delayed Job or Resque might work, but I'm not exactly sure how to do it and what is the best solution of my problem.
Does anyone have any idea how to solve this using as few resources as possible? I can change CarrierWave to another uploader (Paperclip etc.) and change my hosting provider too if it isn't possible on Heroku.
I was also thinking about using CloudFlare, would this still work without problems?
Thank you for your answers.
Based on this Heroku support email, it would seem that the /tmp directory is many gigs in size. You just need to clean up after yourself, so Heroku as a platform is not the issue.
A couple of articles may help you solve the problem:
https://github.com/jnicklas/carrierwave/wiki/How-to%3A-Make-Carrierwave-work-on-Heroku - which explains how to configure your app to use the /tmp directory as the cache directory for CarrierWave. Pay attention to the following line:
use Rack::Static, :urls => ['/carrierwave'], :root => 'tmp' # adding this line
This instructs Rack to serve /carrierwave/xyz from the /tmp directory (useful for storing images temporarily).
Then, using the uploader.cache! method, you can deliberately cache the inbound uploaded file. Once stored, you can do checks to determine whether to call the uploader.store! method, which will promote the contents to S3 (assuming you configured S3 as the store for CarrierWave).
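A minimal sketch of that cache-then-store flow, assuming a hypothetical uploader class named ZipUploader backed by fog/S3 (extracting the archive, e.g. with the rubyzip gem, would happen between the two steps inside your Delayed Job or Resque worker):

uploader = ZipUploader.new

# Cache the inbound upload into CarrierWave's cache dir (tmp/ on Heroku).
File.open("archive.zip") do |file|
  uploader.cache!(file)
end

# ... extract the cached archive and inspect its contents here ...

# Promote the cached file to the permanent store (S3).
uploader.store!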

Taking images out of SVN

I have a bunch of images in SVN I would like to move out and put on S3. How have you dealt with keeping images out of your Ruby on Rails apps and out of SVN?
You don't have to put your images into your repository. You could still have them on your server, but it doesn't really matter where they're linked from. If you don't use any plugins to manage your assets then you can just remove them from your repository, upload all of them to S3 and update all links pointing to them.
If you do use some kind of plugin like paperclip or attachment_fu then you'll have to tell it where to find your files.
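For Paperclip, for example, that is set per attachment; a sketch with placeholder credentials and path:

class Product < ActiveRecord::Base
  has_attached_file :photo,
    storage: :s3,
    s3_credentials: "#{Rails.root}/config/s3.yml",
    path: "/:class/:attachment/:id/:style/:filename"
end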
