How can I migrate CarrierWave files to a new storage mechanism?

I have a Ruby on Rails site with models using CarrierWave for file handling, currently using local storage. I want to start using cloud storage and I need to migrate existing local files to the cloud. I am wondering if anyone can point out a method for doing this?
Bonus points for using a model attribute that would allow me to do this row-by-row in the background without interrupting my site for extended downtime (in other words, some model rows would still have local storage while others used cloud storage).
My first instinct is to create a new uploader for each model that uses cloud storage, so that I have two uploaders on each model, then transfer the files from one to the other, set an attribute to indicate which file should be used until they are all transferred, and finally remove the old uploader. That seems a little excessive.

Minimal to Possibly Zero Downtime Procedure
In my opinion, the easiest and fastest way to accomplish what you want with almost no downtime is this (I will assume you will use the AWS cloud, but a similar procedure applies to any cloud service):
Figure out and set up your assets bucket, bucket policies, etc. for making the assets publicly accessible.
Using s3cmd (a command line tool for interacting with S3) or a GUI app, copy the entire assets folder from the file system to the appropriate folder in S3.
In your app, set up CarrierWave and update your models/uploaders for :fog storage (see the configuration sketch at the end of this answer).
Do not restart your application yet. Instead, bring up a Rails console and, for your models, check that the new asset URLs are correct and accessible as planned. For example, for a Video model with a picture asset, you can check this way:
Video.first.picture.url
This will give you a full cloud URL based on the updated settings. Copy the URL and paste it into a browser to make sure you can reach it.
If this works for at least one instance of each model that has assets, you are good to restart your application.
Upon restart, all your assets are served from the cloud, and you didn't need any migrations or multiple uploaders in your models.
(Based on a comment by @Frederick Cheung): Using s3cmd (or something similar), rsync or sync the assets folder from the filesystem to S3 to account for assets that were uploaded between steps 2 and 5, if any.
PS: If you need help setting up carrierwave for cloud storage, let me know.
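For reference, here is a minimal fog configuration sketch for AWS (assuming the fog-aws gem; the bucket and credential env var names are placeholders):
# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.fog_provider = 'fog/aws'
  config.fog_credentials = {
    provider:              'AWS',
    aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
    region:                ENV['AWS_REGION']
  }
  config.fog_directory = ENV['S3_BUCKET']   # the bucket from step 1
  config.fog_public    = true               # assets are publicly accessible
end
With that in place, each uploader only needs storage :fog.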

I'd try the following steps:
Change the storage in the uploaders to :fog or whatever you want to use
Write a migration like rails g migration MigrateFiles to let carrierwave get the current files, process them and upload them to the cloud.
If your model looks like this:
class Video
  mount_uploader :attachment, VideoUploader
end
The migration would look like this:
@videos = Video.all
@videos.each do |video|
  video.remote_attachment_url = video.attachment_url
  video.save
end
If you execute this migration the following should happen:
CarrierWave downloads each image because you specified a remote URL for the attachment (the current location, like http://test.com/images/1.jpg) and saves it to the cloud because you changed that in the uploader.
Edit:
Since San pointed out that this will not work directly, you should first create an extra column, run a migration to copy the current attachment URLs from all the videos into that column, change the uploader to :fog after that, and then run the above migration using the copied URLs from the new column. With another migration you can delete the column again. Not that clean and easy, but done in a few minutes.
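A rough sketch of that workaround, assuming the Video/attachment example above (the column name, migration version, and old host are illustrative):
# 1. Add a temporary column and snapshot the current local URLs
#    (while the uploader still uses :file storage).
class AddOldAttachmentUrlToVideos < ActiveRecord::Migration[6.1]
  def up
    add_column :videos, :old_attachment_url, :string
    Video.reset_column_information
    Video.find_each do |video|
      video.update_column(:old_attachment_url, video.attachment.url)
    end
  end
end

# 2. Switch the uploader to storage :fog, then re-upload from the
#    snapshotted URLs (for example in a rake task or another migration).
Video.where.not(old_attachment_url: nil).find_each do |video|
  video.remote_attachment_url = "https://your-old-host.example#{video.old_attachment_url}"
  video.save!
end

# 3. Finally, drop the temporary column with one more migration.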

When using Heroku, most people suggest Cloudinary: free and simple to set up.
My case was that we were using the Cloudinary service and needed to move to AWS S3 for various reasons.
This is what I did with the uploader:
class AvatarUploader < CarrierWave::Uploader::Base
  # Pick the storage backend based on an environment variable.
  # Returning nil leaves the storage setting untouched (Cloudinary manages its own).
  def self.set_storage
    if ENV['UPLOADER_SERVICE'] == 'aws'
      :fog
    else
      nil
    end
  end

  if ENV['UPLOADER_SERVICE'] == 'aws'
    include CarrierWave::MiniMagick
  else
    include Cloudinary::CarrierWave
  end

  storage set_storage
end
Also, set up the rake task:
task migrate_cloudinary_to_aws: :environment do
  # Snapshot the profiles (and their current Cloudinary URLs) before switching providers.
  profile_image_old_url = []
  Profile.where("picture IS NOT NULL").each do |profile_image|
    profile_image_old_url << profile_image
  end

  # Flip the provider and reload the uploader so :fog storage takes effect.
  ENV['UPLOADER_SERVICE'] = 'aws'
  load("#{Rails.root}/app/uploaders/avatar_uploader.rb")

  Profile.where("picture IS NOT NULL OR cover IS NOT NULL").each do |profile_image|
    old_profile_image = profile_image_old_url.detect { |image| image.id == profile_image.id }
    next if old_profile_image.nil?
    profile_image.remote_picture_url = old_profile_image.picture.url
    profile_image.save
  end
end
The trick is changing the uploader provider via an environment variable. Good luck!

I migrated the CarrierWave files to Amazon S3 with s3cmd and it works.
Here are the steps to follow:
Change the storage kind of the uploader to :fog.
Create a bucket on Amazon S3 if you don't already have one.
Install s3cmd on the remote server: sudo apt-get install s3cmd
Configure s3cmd: s3cmd --configure.
You will need to enter the access key and secret key here, provided by Amazon.
Sync the files with this command: s3cmd sync /path_to_your_files s3://bucket_name/
Set the --acl-public flag to upload the files as public and avoid permission issues.
Restart your server.
Notes:
sync will not duplicate your files; it first checks whether each file is already present on the remote server.

Related

How do you set a file to be uploaded to a specific folder when using Cloudinary and Active Storage?

I understand how this can be done when uploading directly to Cloudinary using the below syntax and passing the folder name as an argument.
Cloudinary::Uploader.upload("sample.jpg", :use_filename => true, :folder => "folder1/folder2")
However, I am using ActiveStorage, and when I upload a photo in the above manner it isn't attached to my Post model or associated in any way with my app.
I am using the following code to attach images
post.send(:images).attach io: StringIO.new(new_data), filename: blob.filename.to_s,
content_type: 'image'
It does not accept an argument to specify the folder. I have tried my best to read both ActiveStorage and Cloudinary docs in an attempt to find a way to make this work, however, I cannot seem to figure it out.
I have seen that setting a custom folder header may be a way to get this to work, but again cannot figure out how to set a custom header for the above code which takes place in the below job.
require 'tmpdir'
require 'fileutils'
require 'open-uri'

class ResizeImagesJob < ApplicationJob
  queue_as :default

  def perform(post)
    post.images.each do |image|
      blob = image.blob
      blob.open do |temp_file|
        path = temp_file.path
        pipeline = ImageProcessing::MiniMagick.source(path)
                                              .resize_to_limit(1200, 1200)
                                              .call(destination: path)
        new_data = File.binread(path)
        post.send(:images).attach io: StringIO.new(new_data), filename: blob.filename.to_s,
                                  content_type: 'image'
      end
      image.purge_later
    end
  end
end
What the above job does is wait until after a post is created, then resize and reattach the photos to the post while deleting the originals. I am taking this approach to avoid integrity errors that occur when resizing an image directly on Cloudinary after upload.
What I want to do is store the resized photos in a different folder. The reason for this is that I am using direct_upload and it is possible for users to upload photos without ever creating a post, thus causing me to store unused photos. This would provide an easy way to identify and deal with such images.
For folder names based on Rails environment
You can dynamically set your folder on storage.yml:
cloudinary:
  service: Cloudinary
  folder: <%= Rails.env %>
And so Cloudinary will automatically create folders based on your Rails env.
This is a long-standing issue with Active Storage that seems to have been worked around by the Cloudinary team. Thanks for the amazing work ❤️
By default, the Cloudinary Active Storage service uploads to the root of the Cloudinary account. If you would like to upload files to a different base folder, you can configure that in the storage.yml file. To do that, add the folder option and set it to the folder to which you would like the Cloudinary service to upload the resources:
cloudinary_gallery:
  service: Cloudinary
  folder: my_gallery_images
If what you're looking for is not a base folder, but rather to dynamically change the folder for uploaded files per upload, then, in short, that isn't supported. The Cloudinary Active Storage service is implemented similarly to other storage providers, such as Azure Storage, Google Cloud Storage or Amazon S3. As a storage service, its main function is to store files on the integrated service, but Active Storage is not able to support many custom upload flows, such as dynamically changing the storage path (folder) per upload. This functionality is supported on Cloudinary itself, but due to the constraint of how standard Active Storage services are integrated, it is currently not supported through Active Storage.
However, the developers working on Rails/Active Storage have a planned update in the upcoming versions to support defining multiple Active Storage adapters for the same service. This would allow you to configure multiple Cloudinary configurations in the storage.yml file which can then be configured per attachment. There is a Pull request that was merged to support this - https://github.com/rails/rails/pull/34935.
Using the above changes, we can do something like this -
In storage.yml
cloudinary_profiles:
  service: Cloudinary
  folder: profiles
cloudinary_images:
  service: Cloudinary
  folder: images
Then you can do this (where different attachments can reference different adapters) -
class User < ApplicationRecord
  has_one_attached :profile, service: :cloudinary_profiles
  has_many_attached :images, service: :cloudinary_images
end
If you need a single attachment to upload to different folders (or apply different upload parameters/configurations) dynamically, then Active Storage, in general, won't be applicable for this use-case due to the constraints that Active Storage services have based on their standard implementation. You can use the Cloudinary Ruby SDK without Active Storage for a lot more fine-grained control based on your use-case.
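A rough sketch of that direct-SDK route (the file source and folder name are illustrative; the folder option is the same one shown in the question):
# Upload with the Cloudinary Ruby SDK directly, choosing the folder per upload.
result = Cloudinary::Uploader.upload(
  file.tempfile.path,            # e.g. an ActionDispatch::Http::UploadedFile
  folder: "posts/#{post.id}",    # dynamic, per-upload folder
  use_filename: true
)
# result["secure_url"] points at the uploaded asset; persist it yourself, e.g.
# post.update(cover_url: result["secure_url"])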
I just tested this and it works. You can write some Ruby (ERB) code in your storage.yml file to tell Cloudinary to host your files in a different pre-defined folder, one for production and one for development environments:
<% if Rails.env == "development" %>
  <% test = "test-dev" %>
<% else %>
  <% test = "test-prod" %>
<% end %>
cloudinary:
  service: Cloudinary
  folder: <%= test %>
I hope this helps!

Retrieving images from AWS S3 bucket

I'm building a fairly basic Ruby on Rails app, I'll be using about 2000 images, and this is my first real dive into aws/s3. The app won't have any user interaction, so I'm not sure if it's better to have all of the images on the app, and then upload them to my bucket, or add them to my bucket manually, and then download them to the app from there. The AWS documentation is a bit all over the place.
I currently have carrierwave installed and not sure what the next steps should be, or how to retrieve images from S3 into rails. I'll be using Heroku as well, but I've already set up the config with my AWS credentials.
uploaders/photo_uploader.rb
class PhotoUploader < CarrierWave::Uploader::Base
  storage :fog

  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end

  def content_type_whitelist
    /image\//
  end
end
initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.fog_provider = 'fog/aws'
  config.fog_credentials = {
    provider: "AWS",
    aws_access_key_id: ENV["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"]
  }
  config.fog_directory = ENV["S3_BUCKET"]
end
The first step is to integrate image uploading, and you can utilize a number of libraries to make this happen.
You want to grab the dotenv-rails gem so you can securely manage the credentials you will need for AWS S3. This is a dedicated resource for a production-ready RoR app.
The next gems you want are carrierwave-aws and carrierwave, which will manage everything, so that's three gems thus far. The fourth and final gem is mini_magick, which is required in order to use the image-processing methods CarrierWave provides.
The second step is to sign up for an AWS account to use an S3 bucket. You cannot keep the images on the app itself because Heroku's filesystem is ephemeral: if you deploy to Heroku with the images, Heroku will get rid of them.
Once you've added these gems, run bundle install and then build out the basic functionality.
Here is some documentation on carrierwave: https://github.com/carrierwaveuploader/carrierwave
The documentation in the above link will walk you through how to properly install carrierwave.
So you will do something like:
rails generate uploader Photo
In your photo_uploader.rb file, you want to uncomment this code:
def extension_whitelist
  %w(jpg jpeg gif png)
end
You want this uncommented to serve as a validator of the type of files you can upload. If a file is not a jpg, jpeg, gif, or png, RoR will throw an error. This whitelist is handy, so I strongly recommend it.
Next, you have to set up your mapping between your uploader and the database.
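That mapping is the usual CarrierWave mount; a sketch, assuming a hypothetical Photo model with a string image column:
# rails g migration AddImageToPhotos image:string
class Photo < ApplicationRecord
  mount_uploader :image, PhotoUploader
end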
So, fast forwarding to the part where you need to connect AWS to your app. This is where your dotenv-rails gem comes in. By the way, all these gems can be found in rubygems.org.
In the root of your folder, you are going to create a file called .env.
In the .env file you are going to add these:
S3_BUCKET_NAME=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
Never push the AWS keys to any codebase versioning tool like Github.
You want to go into your .gitignore file and ensure the .env file is included. This way git will not track that file.
To get your AWS credentials, go to your name in the AWS console, click on it, and you will see a dropdown with "My Security Credentials" as an option.
Next, create your bucket.
To test successful integrating with your RoR app, go to rails console and run this command:
ENV.fetch('S3_BUCKET_NAME')
If you get an error at this stage, you may need to go to config/application.rb and add:
require "dotenv-rails"
Once you have done that, go back into rails c and run ENV.fetch('S3_BUCKET_NAME') again, and you should be good to go if you followed the steps correctly.
You should have an initializers folder, and in there you are going to create a carrierwave.rb file.
Inside of that file you are going to paste all the code that's under the Usage section of this documentation:
https://github.com/sorentwo/carrierwave-aws
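Roughly, that Usage section amounts to something like this (a sketch, assuming the env var names from the .env file above):
# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.storage    = :aws
  config.aws_bucket = ENV.fetch('S3_BUCKET_NAME')
  config.aws_acl    = 'public-read'

  config.aws_credentials = {
    access_key_id:     ENV.fetch('AWS_ACCESS_KEY_ID'),
    secret_access_key: ENV.fetch('AWS_SECRET_ACCESS_KEY'),
    region:            ENV.fetch('AWS_REGION')
  }
end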
Go back to your photo_uploader.rb file and change storage :file to storage :aws.
Home stretch here: go back to the carrierwave.rb file. There is one line of code you need to completely remove from what you copied and pasted from the above link, and it is this line here:
config.asset_host = "http://example.com"
Now you can start up your rails server and instead of pointing to your local file system it should now be pointing to your bucket.
You need to upload all these images using the application; after installing carrierwave and fog-aws, you need to create a model, controller, and form for uploading images.
OK, currently you are confused about how to show the image after it is uploaded, right?
Put simply: assuming the image uploaded properly, imagine the table is images, the model is Image, and the column is picture (since you did not provide those names).
images_controller
class ImagesController < ApplicationController
  def index
    @images = Image.all
  end
end
views/images/index.html.erb
<% @images.each do |image| %>
  <%= image_tag image.picture.url %>
<% end %>
Note
This is not to promote a product.
If you need to see a sample with source code, this is the BitBucket repository and this is the live Heroku app; use a Stripe test card number, and a CVC code must be provided (type anything, like 232, etc.).

Where do you store uploaded user images

I am not yet using a service such as Amazon S3, so where in the file structure should I store uploaded user images? I want to avoid the public directory as the images are private.
Are you using a plugin to handle your uploads? Many of them allow you to specify a path to store files; if you want to avoid the public folder, a reasonable suggestion would be "#{RAILS_ROOT}/uploads/images/".
It's very much a matter of personal taste though.
For example, in a CarrierWave uploader this will place items in an uploads folder below RAILS_ROOT, which is not publicly accessible.
def store_dir
  "#{RAILS_ROOT}/uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
https://github.com/carrierwaveuploader/carrierwave#changing-the-storage-directory
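Since the files live outside the public folder, you would serve them through a controller. A minimal sketch, assuming a hypothetical Image model with mount_uploader :file, ImageUploader and some authorization scheme:
class PrivateImagesController < ApplicationController
  def show
    image = current_user.images.find(params[:id])   # authorize access first
    # CarrierWave's local storage exposes the file's absolute path via #path
    send_file image.file.path,
              type: image.file.content_type,
              disposition: 'inline'
  end
end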

file paths will not update after file upload in rails production environment

I have a Rails application which uploads an image, named after the book id, to assets/books_icon when the user creates a book.
upload_icon(params[:book][:image_upload])

# upload_image when creating/updating a book
def upload_icon(uploaded_io)
  photo_directory = "app/assets/images/"
  # only when the user uploads a photo
  if uploaded_io.present?
    # upload to the exact location, named after the book id
    extension = File.extname(uploaded_io.original_filename)
    photo_location = 'books_image/' + @book.id.to_s + extension
    # open in binary mode
    photo_full_location = photo_directory + photo_location
    File.open(photo_full_location, 'wb') do |file|
      file.write(uploaded_io.read)
    end
    # only have to state the directory;
    # image_tag will use the asset pipeline, which adds 'assets/images/' as a prefix in src
    @book.update_attribute(:image_url, photo_location)
  end
end
It works in development mode. Then I deployed the Rails application with Passenger on a Mac Apache2 server with MySQL as the database, and I changed config.assets.compile = true in the development configuration file.
In production mode, I can create a new book and I can upload the image to assets/books_icon. However, the file path for the image I just uploaded will not update.
For example, I create a new book with id 2, and there is a 2.jpg in assets/books_icon. But the Rails application tells me http://localhost/assets/images/books_icon/2.jpg is missing.
However, when I restart the Apache server, I can view the picture at http://localhost/assets/images/books_icon/2.jpg
Are there any solutions to this type of caching problem?
In this context, an uploaded image isn't really considered an "asset" -- think of it as data that you happen to be storing somewhere else. (An image asset might be a logo, or background image, etc.). Rails' asset pipeline does some pretty tricky stuff in order to resolve a simple file path to an actual resource (e.g. image) when it's an asset.
By default, Rails makes the application's public folder the document root, and thus a place that you could upload images to -- perhaps in a subdirectory called "upload/img", in which case you could reference an image with the path /upload/img/mybook.jpg.
This approach tends to be fragile, however, because the image is directly associated with data in your database but located on the filesystem of a single server. It starts falling apart when you move from development to staging or production servers.
One approach I would not recommend is to upload the image and store it as a blob type in your database. Another I would recommend is to have another "central" server that you can upload images to that acts as an extension to your database. Many people use Amazon AWS "S3" service for this kind of thing. Take a look at the CarrierWave gem which does an excellent job of making all of this really easy, flexible, and powerful.
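For the specific problem in the question, writing under public/ instead of app/assets sidesteps the asset pipeline entirely. A sketch adapting the asker's upload_icon (the directory layout and column name follow the question; adjust as needed):
def upload_icon(uploaded_io)
  return unless uploaded_io.present?

  extension      = File.extname(uploaded_io.original_filename)
  photo_location = "uploads/books_icon/#{@book.id}#{extension}"
  full_path      = Rails.root.join('public', photo_location)

  # make sure the target directory exists, then write the upload in binary mode
  FileUtils.mkdir_p(File.dirname(full_path))
  File.open(full_path, 'wb') { |file| file.write(uploaded_io.read) }

  # reference it with a plain path ("/uploads/books_icon/2.jpg") rather than an
  # asset-pipeline lookup, so no precompile or server restart is needed
  @book.update_attribute(:image_url, "/#{photo_location}")
end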

Extracting uploaded archive to S3 with CarrierWave on Heroku

I want to do something that I thought would be a simple task:
Have a form with these controls:
File upload for one file
Checkbox if this file should be extracted
Text input where I would specify which file I should link to (required only if the checkbox is checked) - index_file
After submitting form:
If the checkbox isn't checked, upload the file via CarrierWave to S3 to the specified store_dir
If the checkbox is checked, extract all files from the archive (I expect only ZIP archives; I need to keep the directory structure), upload extracted files to the specified store_dir and set the index_file in database (I don't need to save to database anything about other extracted files)
As I have found, it isn't an easy task because of Heroku limitations. These files will be large (hundreds of MiB or a few GiB), so I don't want to re-download them from S3 if possible.
I think that using Delayed Job or Resque might work, but I'm not exactly sure how to do it and what is the best solution of my problem.
Does anyone have any idea how to solve it with using the lowest resources as possible? I can change CarrierWave to another uploader (Paperclip etc.) and my hosting provider too if it isn't possible on Heroku.
I was also thinking about using CloudFlare, would this still work without problems?
Thank you for answers.
Based on this Heroku support email, it would seem that the /tmp directory is many gigs in size. You just need to clean up after yourself, so Heroku as a platform is not the issue.
A couple of articles may help you solve the problem:
https://github.com/jnicklas/carrierwave/wiki/How-to%3A-Make-Carrierwave-work-on-Heroku - which explains how to configure your app to use the /tmp directory as the cache directory for CarrierWave. Pay attention to the following line:
use Rack::Static, :urls => ['/carrierwave'], :root => 'tmp' # adding this line
This instructs Rack to serve /carrierwave/xyz from the /tmp directory (useful for storing images temporarily).
Then, using the uploader.cache! method, you can deliberately cache the inbound uploaded file. Once cached, you can do checks to determine whether to call the uploader.store! method, which will promote the contents to S3 (assuming you configured S3 as the store for CarrierWave).
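For reference, a sketch of that flow (the uploader name is hypothetical; the cache path follows the wiki page linked above):
# config/initializers/carrierwave.rb: cache uploads under tmp so they live
# on the dyno's ephemeral disk
CarrierWave.configure do |config|
  config.cache_dir = "#{Rails.root}/tmp/uploads"
end

# somewhere in a controller or background job
uploader = ArchiveUploader.new     # hypothetical uploader
uploader.cache!(params[:file])     # writes the upload into tmp/uploads
# ... inspect or extract the cached file here ...
uploader.store!                    # promotes the cached file to S3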
