carrierwave upload caching - ruby-on-rails

How does carrierwave upload caching functionality work? From what I've read, it looks like it keeps the uploaded file in public/uploads/tmp to avoid reupload across form redisplays. I am guessing the cache would get assigned a unique id, but still be publicly accessible. How to make it more secure for sensitive uploads or disable this feature altogether?
One way to avoid this is to have the uploader as a separate model from the target model, such that validation errors won't require reuploading.

CarrierWave keeps uploaded images in a cache dir so you can easily re-submit forms in case of validation errors without forcing your users to re-upload images.
By default the cache dir is public/uploads/tmp, but you can change it by setting the cache_dir configuration parameter.
Usually uploaded images are available for download without authentication. Therefore, placing uploaded and cached files in a public directory is fine. You can also change your uploader class to have a filename method that generates a unique random ID to make it less guessable.
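As a rough illustration of the "unique random ID" idea, here is a minimal sketch in plain Ruby. The helper name unguessable_filename is made up for this example and is not part of CarrierWave; it only shows one way to build a hard-to-guess name while keeping the original extension.

```ruby
require "securerandom"

# Hypothetical helper (not a CarrierWave API): returns a random,
# hard-to-guess file name that keeps the original extension.
def unguessable_filename(original_filename)
  ext = File.extname(original_filename.to_s).downcase
  # 16 random bytes rendered as 32 hex characters
  "#{SecureRandom.hex(16)}#{ext}"
end

puts unguessable_filename("passport scan.PNG")
```

In an uploader, the filename method could return a value built this way. CarrierWave may call filename more than once per upload, so the random part would probably need to be generated once and remembered (for example on the model) rather than regenerated on every call.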
By the way, this blog post describes how to integrate CarrierWave while storing and transforming images in the cloud and delivering through a CDN.

Related

Processing file before upload using ActiveStorage

How would I process a file before it's uploaded using ActiveStorage? I need to be able to modify an SVG file's content before it actually gets uploaded to S3. Can't seem to find any callbacks.
There is no way to do this natively with ActiveStorage. It's the major drawback with using ActiveStorage.
As far as I know, the only way to modify an upload is to create a variant of the original upload after it is created...which creates a (completely different) variant image based on the image that was originally uploaded.
ActiveStorage is easy to set up but, after using it in a few applications, alternatives such as CarrierWave seem like better options.
In addition, if you want to upload in a background job, ActiveStorage is a pain.
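One possible workaround, assuming the file passes through your Rails app (not a direct browser-to-S3 upload): change the content yourself before handing it to ActiveStorage. This is not a built-in callback, just a sketch. The helper name rewrite_svg and the colour replacement are made up for illustration; the attach(io:, filename:, content_type:) call in the comment is the regular ActiveStorage way to attach an IO object.

```ruby
require "stringio"

# Hypothetical helper: reads the uploaded IO, lets the caller transform
# the SVG text in a block, and returns a new StringIO with the result.
def rewrite_svg(upload_io)
  svg = upload_io.read
  StringIO.new(yield(svg))
end

# Stand-in for the uploaded file; in a controller this would be params[:logo]
original = StringIO.new('<svg xmlns="http://www.w3.org/2000/svg"><rect fill="#ff0000"/></svg>')

# Example transformation only: swap one fill colour for another
processed = rewrite_svg(original) { |svg| svg.gsub('#ff0000', '#000000') }

# In a Rails app (assumption: the model declares `has_one_attached :logo`):
#   record.logo.attach(io: processed, filename: "logo.svg", content_type: "image/svg+xml")
puts processed.string
```

Only the modified content is ever sent to storage this way, so no variant is needed afterwards.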

how do I add files already stored on s3 to carrierwave backed by same datastore?

I already have files in S3 that were uploaded via FTP, and I'd like to attach them to my models, which accept uploads via CarrierWave. The FTP uploads and the CarrierWave uploads share the same bucket. Is there a way to just assign the S3 key of an existing file to a (new) associated record, so that the file is then handled via CarrierWave's attachment strategy?
Assuming you have the thumbnails already created, and can store the files in the correct directory on S3, you could simply:
user.update_column(:image, "your-image-name.png")
This will not execute any callbacks, and carrierwave will assume all processing has already been completed.
That's a lot of assumptions though, so this would likely not work for you in reality.

Where do you store uploaded user images

I am not yet using a service such as Amazon S3, so where in the file structure should I store uploaded user images? I want to avoid the public directory as the images are private.
Are you using a plugin to handle your uploads? Many of them allow you to specify a path to store files. If you want to avoid the public folder, a reasonable suggestion would be "#{Rails.root}/uploads/images/" (RAILS_ROOT in Rails 2).
It's very much a matter of personal taste though.
For example, in a CarrierWave uploader this will place items in an uploads folder below Rails.root, which is not publicly accessible.
# Store files outside public/, grouped by model class, mount name and record id
def store_dir
  "#{Rails.root}/uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
https://github.com/carrierwaveuploader/carrierwave#changing-the-storage-directory

Strategy for avoiding file upload naming conflicts

I have a webapp in Rails which has an AJAX file upload feature. Files are uploaded to a remote server (AWS S3). My current strategy is to upload the files to a temp/ directory (with their original name) until the user submits the form, and then rename them to their definitive name.
But the problem is that if multiple users upload files with the same name at the same time, one is going to overwrite the other.
The strategy I was thinking of to solve this was to generate a random SHA1 hash when the upload page is loaded, store the hashes in a local table to make sure they're unique, and remove each one when its temp file is renamed.
Do you see problems with this approach?
What's a good strategy to solve this problem?
One problem is, if they navigate away from the page without uploading anything, their hash will stay in the database, and eventually make a mess. I would avoid storing anything this temporary in the database.
Rather than try to come up with your own way to name temporary files, why not use the ruby tempfile library, which will do it for you?
Originally, I thought you were uploading the files to the ruby server, and uploading them to s3 yourself. Tempfiles won't help if users are uploading files directly. If you just want unique names for your temp files, a UUID generator might work for you. There is a Ruby UUID generator gem which is designed to not produce duplicates, even in a distributed setting. If you name your files with these, you shouldn't need to store anything in the database.
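A minimal sketch of the UUID naming idea, using SecureRandom.uuid from Ruby's standard library in place of a separate UUID gem. The helper names temp_key and final_key and the temp/ and uploads/ prefixes are assumptions for illustration. Random UUIDs make collisions extremely unlikely rather than strictly impossible.

```ruby
require "securerandom"

# Hypothetical helper: builds a per-upload temp key so two users uploading
# "report.pdf" at the same time end up under different prefixes.
def temp_key(original_filename)
  # Drop any directory part and replace unusual characters
  safe_name = File.basename(original_filename.to_s).gsub(/[^\w.\-]/, "_")
  "temp/#{SecureRandom.uuid}/#{safe_name}"
end

# Hypothetical helper: the definitive key once the form is submitted.
def final_key(temp_key, record_id)
  "uploads/#{record_id}/#{File.basename(temp_key)}"
end

key = temp_key("my report.pdf")
puts key
puts final_key(key, 42)
```

The temp key can travel with the form (for instance in a hidden field), so nothing has to be stored in the database until the form is actually submitted.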

Extracting uploaded archive to S3 with CarrierWave on Heroku

I want to do something that I thought would be a simple task:
Have a form with these controls:
File upload for one file
Checkbox if this file should be extracted
Text input where I would specify which file should I link to (required only if the checkbox is checked) - index_file
After submitting form:
If the checkbox isn't checked, upload the file via CarrierWave to S3 to the specified store_dir
If the checkbox is checked, extract all files from the archive (I expect only ZIP archives; I need to keep the directory structure), upload extracted files to the specified store_dir and set the index_file in database (I don't need to save to database anything about other extracted files)
As I have found, it isn't an easy task because of Heroku limitations. These files will have a large size (hundreds of MiBs or a few GiBs), so I don't want to redownload this file from S3 if possible.
I think that using Delayed Job or Resque might work, but I'm not exactly sure how to do it or what the best solution to my problem is.
Does anyone have any idea how to solve this using as few resources as possible? I can change CarrierWave to another uploader (Paperclip etc.) and my hosting provider too if it isn't possible on Heroku.
I was also thinking about using CloudFlare, would this still work without problems?
Thank you for answers.
Based on this heroku support email, it would seem that the /tmp directory is many gigs in size. You just need to clean up after yourself so Heroku as a platform is not the issue.
A couple of articles may help you solve the problem:
https://github.com/jnicklas/carrierwave/wiki/How-to%3A-Make-Carrierwave-work-on-Heroku - which explains how to configure your app to use the /tmp directory as the cache directory for CarrierWave. Pay attention to the following line:
use Rack::Static, :urls => ['/carrierwave'], :root => 'tmp' # adding this line
This instructs rack to serve /carrierwave/xzy from the /tmp directory (useful for storing images temporarily)
Then, using the uploader.cache! method, you can deliberately cache the inbound uploaded file. Once it is cached, you can run checks to decide whether to call the uploader.store! method, which will promote the contents to S3 (assuming you configured S3 as the store for CarrierWave).
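The cache-then-decide flow could look roughly like this. The method name handle_upload and its extract: flag are made up for this sketch. It only relies on the uploader responding to cache!, store! and file, as a CarrierWave uploader does, and it leaves the actual ZIP extraction to the caller's block.

```ruby
# Hypothetical helper: caches the upload first, then either hands the
# cached file's local path to the caller (to unpack it) or promotes the
# cached file to the configured storage.
# A block is required when extract is true.
def handle_upload(uploader, uploaded_file, extract:)
  uploader.cache!(uploaded_file)   # copy the upload into the cache dir
  if extract
    yield uploader.file.path       # caller unpacks the archive from this path
    :extracted
  else
    uploader.store!                # move the cached file to the store (e.g. S3)
    :stored
  end
end
```

With the checkbox from the form, a call might look like handle_upload(uploader, params[:file], extract: true) { |path| ... }, where the block unpacks the archive at path and uploads the extracted files. Since the archive is read from the local cache, it does not have to be downloaded from S3 again. Remember to delete the cached file afterwards.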

Resources