Strategy for avoiding file upload naming conflicts - ruby-on-rails

I have a webapp in Rails which has an AJAX file upload feature. Files are uploaded to a remote server (AWS S3). My current strategy is to upload the files to a temp/ directory (with their original names) until the user submits the form, and then rename them to their definitive names.
But the problem is that if multiple users upload files with the same name at the same time, one will overwrite the other.
The strategy I was thinking of to solve this was to generate a random SHA1 hash when the upload page is loaded, store the hashes in a local table to make sure they're unique, and remove each one when its temp file is renamed.
Do you see problems with this approach?
What's a good strategy to solve this problem?

One problem is, if they navigate away from the page without uploading anything, their hash will stay in the database, and eventually make a mess. I would avoid storing anything this temporary in the database.
Rather than try to come up with your own way to name temporary files, why not use the Ruby Tempfile library, which will do it for you?
Originally, I thought you were uploading the files to the Ruby server and then uploading them to S3 yourself. Tempfiles won't help if users are uploading files directly. If you just want unique names for your temp files, a UUID generator might work for you. There is a Ruby UUID generator gem which is designed to not produce duplicates, even in a distributed setting. If you name your files with these, you shouldn't need to store anything in the database.
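As a rough illustration of the UUID idea, here is a minimal sketch. It uses SecureRandom.uuid from the Ruby standard library rather than the UUID gem mentioned above, and the helper name temp_upload_key and the temp/ prefix layout are assumptions made for the example, not part of the original setup.

```ruby
require "securerandom"

# Hypothetical helper: builds a key under temp/ from a random (version 4) UUID,
# keeping only the original file extension so the original name never
# collides with another user's upload.
def temp_upload_key(original_name)
  ext = File.extname(original_name).downcase
  "temp/#{SecureRandom.uuid}#{ext}"
end

key = temp_upload_key("Report.PDF")
puts key
```

Because the key is generated per upload, nothing needs to be stored in the database or cleaned up if the user leaves the page; the original file name can be sent along with the form and applied when the file is renamed.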

Related

True Paperclip Replacement (Specifically Structure of the File System)

With Rails 6, I need to replace Paperclip, but I can't find any substitute that easily replicates it.
Specifically, the file structure Paperclip used:
:model/:attachment_field/000/000/000/:identifier/:style/:original_file_name
Over the last decade we have built several tools that rely on that structure (or something similar). In addition, our users expect that after uploading an image they can reference the styles with the same file name and a permanent URL (not a randomly generated name like ActiveStorage and Shrine produce) and change the "style" component in the URL to a different one in their HTML.
I've spent several days on both Shrine and ActiveStorage trying to get the file structure and naming to work, and I keep failing: despite being "natural replacements", they don't actually handle things in the same way.
Our end system is on Amazon S3, though integrating with that hasn't been the issue, just the file system.
Thanks for your help. It's been really frustrating having to remove something that works great when there seems to be nothing that actually replaces it, if you want or need things done in the same way. I'd rather not have to start rewriting all of the tools that we developed and resetting our customers' expectations to work with a new structure.
Thanks so much.
Have you tried Carrierwave? You can specify any storage path and build it dynamically using model name (model.class.to_s.underscore), attachment field (mounted_as), model id (model.id). The original file name is also available as original_filename.
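To make the path-building idea concrete, here is a plain-Ruby sketch of the logic that would go inside a CarrierWave uploader's store_dir. It assumes the 000/000/000 segment in the question is the record ID zero-padded to nine digits and split into groups of three; the helper names (id_partition, underscore_name, legacy_store_dir) are hypothetical, and the question's separate :identifier segment is left out because its meaning is not stated.

```ruby
# Hypothetical helper: 1234 -> "000/001/234" (assumed meaning of 000/000/000).
def id_partition(id)
  format("%09d", id).scan(/\d{3}/).join("/")
end

# Minimal stand-in for ActiveSupport's String#underscore so the sketch runs
# without Rails; inside an app, model.class.to_s.underscore would be used.
def underscore_name(class_name)
  class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Hypothetical helper: builds the directory an uploader could return
# from store_dir, given the values CarrierWave exposes as model and mounted_as.
def legacy_store_dir(model_name:, mounted_as:, id:, style:)
  File.join(underscore_name(model_name), mounted_as.to_s, id_partition(id), style.to_s)
end

puts legacy_store_dir(model_name: "UserProfile", mounted_as: :avatar, id: 1234, style: :thumb)
```

In a real uploader the same expression would be built from model.class, mounted_as and model.id, with the original file name appended by the uploader itself.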

Rails 3.2 zip multiple text files

I need to write about 10 text files from query results then zip and send them.
Is there a way to do this all in memory, or do I need to write the files to /tmp or the database first? What is best practice for a Rails 3.2.11 application?
I don't need any functionality beyond creating the files, zipping and sending them in a single action. The files are not large.
You will need to create some temporary files. Where you choose to put them is up to you, however.
Here's a blog post (not mine, and not tested, but I see no reason the process described shouldn't work) that describes using Rails to zip some files and send the resulting archive to the user. It shouldn't be too hard to adapt it for your needs.
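For the temporary-file step, a minimal sketch using only the Ruby standard library might look like the following. The helper name with_temp_text_files and the shape of its argument (a hash of file name to text) are assumptions for illustration; the zipping and sending would happen inside the block, as in the blog post's approach.

```ruby
require "tmpdir"

# Hypothetical helper: writes each named text into a fresh temporary
# directory, yields the list of file paths, and lets Dir.mktmpdir delete
# the directory (and the files in it) when the block finishes.
def with_temp_text_files(reports)
  Dir.mktmpdir("reports") do |dir|
    paths = reports.map do |name, body|
      path = File.join(dir, name)
      File.write(path, body)
      path
    end
    yield paths
  end
end

with_temp_text_files("users.txt" => "alice\nbob\n", "orders.txt" => "42\n") do |paths|
  # Zip and send the files here, before the directory is removed.
  paths.each { |path| puts "#{File.basename(path)}: #{File.size(path)} bytes" }
end
```

Keeping everything inside one block means the files never outlive the request, which matters on hosts where /tmp is shared or limited.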

Extracting uploaded archive to S3 with CarrierWave on Heroku

I want to do something that I thought would be a simple task:
Have a form with these controls:
File upload for one file
Checkbox if this file should be extracted
Text input where I would specify which file should I link to (required only if the checkbox is checked) - index_file
After submitting form:
If the checkbox isn't checked, upload the file via CarrierWave to S3 to the specified store_dir
If the checkbox is checked, extract all files from the archive (I expect only ZIP archives; I need to keep the directory structure), upload extracted files to the specified store_dir and set the index_file in database (I don't need to save to database anything about other extracted files)
As I have found, it isn't an easy task because of Heroku limitations. These files will have a large size (hundreds of MiBs or a few GiBs), so I don't want to redownload this file from S3 if possible.
I think that using Delayed Job or Resque might work, but I'm not exactly sure how to do it or what the best solution to my problem is.
Does anyone have any idea how to solve it using as few resources as possible? I can change CarrierWave to another uploader (Paperclip etc.) and my hosting provider too if it isn't possible on Heroku.
I was also thinking about using CloudFlare, would this still work without problems?
Thank you for answers.
Based on this Heroku support email, it would seem that the /tmp directory is many gigabytes in size. You just need to clean up after yourself, so Heroku as a platform is not the issue.
A couple of articles may help you solve the problem:
https://github.com/jnicklas/carrierwave/wiki/How-to%3A-Make-Carrierwave-work-on-Heroku - which explains how to configure your app to use the /tmp directory as the cache directory for CarrierWave. Pay attention to the following line:
use Rack::Static, :urls => ['/carrierwave'], :root => 'tmp' # adding this line
This instructs Rack to serve /carrierwave/xzy from the /tmp directory (useful for storing images temporarily).
Then, using the uploader.cache! method, you can deliberately cache the inbound uploaded file. Once it is cached, you can run your checks to decide whether to call the uploader.store! method, which will promote the contents to S3 (assuming you configured S3 as the store for CarrierWave).

carrierwave upload caching

How does CarrierWave's upload caching functionality work? From what I've read, it looks like it keeps the uploaded file in public/uploads/tmp to avoid a re-upload across form redisplays. I am guessing the cached file gets assigned a unique ID but is still publicly accessible. How can I make it more secure for sensitive uploads, or disable this feature altogether?
One way to avoid this is to have the uploader as a separate model from the target model, such that validation errors won't require reuploading.
CarrierWave keeps uploaded images in a cache dir so you can easily re-submit forms in case of validation errors without forcing your users to re-upload images.
By default the cache dir is public/uploads/tmp, but you can change it by setting the cache_dir configuration parameter.
Usually uploaded images are available for download without authentication. Therefore, placing uploaded and cached files in a public directory is fine. You can also change your uploader class to have a filename method that generates a unique random ID to make it less guessable.
By the way, this blog post describes how to integrate CarrierWave while storing and transforming images in the cloud and delivering through a CDN.

How to change Redmine to support versioned files in every issue

It's Redmine, a Ruby on Rails application. Currently, every issue can have one or more files. But if a user decides to update or change them, the old files are replaced. My task is to develop something that allows versioned files for every issue, so that if a user updates the content of an existing issue, the previous state of the issue is preserved and can be displayed in some form.
I'm new to RoR and Redmine development.
I guess the best thing in this case is to modify Redmine so that instead of uploading files to the issue, you put the files into a Subversion repository and then add a link in the issue.
Alternatively, allow multiple files to be added, and modify the code to rename them every time one is uploaded, appending a suffix (_1, _2 etc.) to each filename.
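The suffix-renaming alternative could be sketched roughly as below. This is plain Ruby and not Redmine code: the helper name versioned_name is hypothetical, and it assumes the caller can supply the list of file names already attached to the issue.

```ruby
# Hypothetical helper: returns `filename` unchanged if it is not yet in
# `existing`; otherwise appends _1, _2, ... before the extension until
# the result is a name that is not already taken.
def versioned_name(filename, existing)
  return filename unless existing.include?(filename)

  ext = File.extname(filename)
  base = File.basename(filename, ext)
  n = 1
  n += 1 while existing.include?("#{base}_#{n}#{ext}")
  "#{base}_#{n}#{ext}"
end

attached = ["spec.pdf", "spec_1.pdf"]
puts versioned_name("spec.pdf", attached)
```

With this scheme the unsuffixed file is always the first upload and the highest suffix is the latest version, so older versions stay attached to the issue instead of being replaced.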

Resources