With ActiveStorage you can now define mirrors for storing your files:
local:
  service: Disk
  root: <%= Rails.root.join("storage") %>
amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: mybucket
mirror:
  service: Mirror
  primary: local
  mirrors:
    - amazon
    - another_mirror
If you add a mirror at some later point, you have to take care of copying the existing files yourself, e.g. from "local" to "amazon" or "another_mirror".
Is there a convenient method to keep the files in sync?
Or a method that runs a validation to check whether all files are available on each service?
I have a couple of solutions that might work for you, one for Rails <= 6.0 and one for Rails >= 6.1:
Firstly, you need to iterate through your ActiveStorage blobs:
ActiveStorage::Blob.all.each do |blob|
  # work with blob
end
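If you have a lot of blobs, ActiveRecord's find_each is a drop-in alternative that loads them in batches instead of all at once:
ActiveStorage::Blob.find_each do |blob|
  # work with blob
end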
then...
Rails <= 6.0
You will need the blob's key, checksum, and the local file on disk.
local_file = ActiveStorage::Blob.service.primary.path_for blob.key
# I'm picking the first mirror as an example,
# but you can select a specific mirror if you want
mirror = blob.service.mirrors.first
mirror.upload blob.key, File.open(local_file), checksum: blob.checksum
You may also want to avoid uploading a file if it already exists on the mirror. You can check that with exist?:
mirror = blob.service.mirrors.first

# If the file doesn't exist on the mirror, upload it
unless mirror.exist? blob.key
  # Upload the file to the mirror
end
Putting it together, a rake task might look like:
# lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    # Iterate through each blob
    ActiveStorage::Blob.all.each do |blob|
      # We assume the primary storage is local
      local_file = ActiveStorage::Blob.service.primary.path_for blob.key
      # Iterate through each mirror
      blob.service.mirrors.each do |mirror|
        # If the file doesn't exist on the mirror, upload it
        mirror.upload(blob.key, File.open(local_file), checksum: blob.checksum) unless mirror.exist? blob.key
      end
    end
  end
end
You may run into a situation like @Rystraum mentioned, where you need to mirror from somewhere other than the local disk. In that case, the rake task could look like this:
# lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    # All services in our rails configuration
    all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]
    # Iterate through each blob
    ActiveStorage::Blob.all.each do |blob|
      # Select the services where the file exists
      services = all_services.select { |service| service.exist? blob.key }
      # Skip the blob if the file doesn't exist anywhere
      next unless services.present?
      # The services where the file doesn't exist yet
      mirrors = all_services - services
      # Open the local file (if one exists)
      disk_service = services.find { |service| service.is_a? ActiveStorage::Service::DiskService }
      local_file = File.open(disk_service.path_for(blob.key)) if disk_service
      if local_file.present?
        # Upload the local file to the mirrors
        mirrors.each do |mirror|
          local_file.rewind # rewind so each mirror receives the full file
          mirror.upload blob.key, local_file, checksum: blob.checksum
        end
      else
        # No local file exists, so download a remote file and upload it to the mirrors (thanks @Rystraum)
        services.first.open blob.key, checksum: blob.checksum do |temp_file|
          mirrors.each do |mirror|
            temp_file.rewind # rewind so each mirror receives the full file
            mirror.upload blob.key, temp_file, checksum: blob.checksum
          end
        end
      end
    end
  end
end
While the first rake task answers the OP's question, the latter is much more versatile:
- It can be used with any combination of services
- A DiskService is not required
- Uploading from a DiskService is prioritized
- It avoids extra exist? calls, since we only call it once per service per blob
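As for the OP's second question about validating that every file is available on each service, here's a check-only sketch along the same lines (the task name verify_mirrors is my own invention; it reuses the same exist? calls as above):
# lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Reports blobs that are missing from any service'
  task verify_mirrors: [:environment] do
    all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]
    ActiveStorage::Blob.find_each do |blob|
      # Collect every service that doesn't have this blob's file
      missing = all_services.reject { |service| service.exist? blob.key }
      puts "#{blob.key} missing from: #{missing.map { |s| s.class.name }.join(', ')}" if missing.any?
    end
  end
end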
Rails >= 6.1
It's super easy, just call this on each blob...
blob.mirror_later
Wrapping it up as a rake task looks like:
# lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      blob.mirror_later
    end
  end
end
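Bear in mind that mirror_later only enqueues a job (ActiveStorage::MirrorJob), so nothing is copied until your Active Job backend picks it up. If you want the copy to happen while the task runs, performing the job inline should work too (a sketch assuming Rails 6.1, where mirror_later delegates to this job):
ActiveStorage::Blob.find_each do |blob|
  # Runs the mirroring synchronously instead of enqueuing it
  ActiveStorage::MirrorJob.perform_now(blob.key, checksum: blob.checksum)
end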
(03-11-2021) On Rails > 6.1.4.1, using activestorage > 6.1.4.1, with the following setup:
Gemfile:
gem 'azure-storage-blob', github: 'Azure/azure-storage-ruby'
config/environments/production.rb
# Store uploaded files on the local file system (see config/storage.yml for options).
config.active_storage.service = :mirror # or :microsoft / :amazon
config/storage.yml:
amazon:
  service: S3
  access_key_id: XXX
  secret_access_key: XXX
  region: XXX
  bucket: XXX
microsoft:
  service: AzureStorage
  storage_account_name: YYY
  storage_access_key: YYY
  container: YYY
mirror:
  service: Mirror
  primary: amazon
  mirrors: [ microsoft ]
This does NOT work:
ActiveStorage::Blob.all.each do |blob|
  blob.mirror_later
end && puts("Mirroring done!")
What DID work is:
ActiveStorage::Blob.all.each do |blob|
  ActiveStorage::Blob.service.try(:mirror, blob.key, checksum: blob.checksum)
end && puts("Mirroring done!")
Not sure why; mirror_later enqueues a background job (ActiveStorage::MirrorJob), so it presumably needs a working Active Job backend to actually run it, which never happened for me. Maybe future versions of Rails handle it differently.
TL;DR
If you need to mirror your entire storage immediately, add this rake task and execute it in your given environment with bundle exec rails active_storage:mirror_all:
lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      ActiveStorage::Blob.service.try(:mirror, blob.key, checksum: blob.checksum)
    end && puts("Mirroring done!")
  end
end
Optional:
Once you've mirrored all the blobs, you'll probably want to change each blob's service name so they are actually served from the right storage:
namespace :active_storage do
  desc 'Change each blob service name to microsoft'
  task switch_to_microsoft: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      blob.service_name = 'microsoft'
      blob.save
    end && puts("All blobs will now be served from microsoft!")
  end
end
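Since that loop only rewrites a single column, a one-statement alternative is update_all, which skips validations and callbacks (fine here, as service_name is plain data):
# One SQL statement instead of one save per blob
ActiveStorage::Blob.update_all(service_name: 'microsoft')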
Finally, change config.active_storage.service in production.rb, or make the primary mirror the one you want future uploads to go to.
I've built on top of https://stackoverflow.com/a/57579839/365218 so that the rake task does not assume the file is on local disk.
I started with S3, and due to cost concerns I've decided to move the files to disk and use S3 and Azure as mirrors instead.
So my situation is that for some files my primary (disk) doesn't have the file, and my complete repository is actually on my 1st mirror.
So, it's 2 things:
- Move files from S3 to disk
- Add a new mirror, and keep it up to date
namespace :active_storage do
  desc "Ensures all files are mirrored"
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      # Prefer the primary as the source, otherwise find a mirror that has the file
      source_service = if blob.service.primary.exist? blob.key
        blob.service.primary
      else
        blob.service.mirrors.find { |m| m.exist? blob.key }
      end

      source_service.open(blob.key, checksum: blob.checksum) do |file|
        # Backfill the primary first (covers moving files from S3 to disk)
        blob.service.primary.upload(blob.key, file, checksum: blob.checksum) unless blob.service.primary.exist? blob.key
        # Then fill in any mirror that is missing the file
        blob.service.mirrors.each do |mirror|
          next if mirror == source_service
          next if mirror.exist? blob.key
          file.rewind # rewind so every upload receives the full file
          mirror.upload(blob.key, file, checksum: blob.checksum)
        end
      end
    rescue StandardError
      puts blob.key.to_s
    end
  end
end
Everything is stored according to ActiveStorage's keys, so as long as your bucket names and file names aren't changed in the transfer, you can just copy everything over to the new service. See this post for how to copy stuff over.
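If you'd rather drive that copy from Rails instead of the provider's tooling, here's a rough sketch (hedged: :old_service and :new_service are hypothetical names from your config/storage.yml, and service#open with a checksum is the same call the rake tasks above rely on):
configs = Rails.configuration.active_storage.service_configurations
source  = ActiveStorage::Service.configure(:old_service, configs)
target  = ActiveStorage::Service.configure(:new_service, configs)

ActiveStorage::Blob.find_each do |blob|
  next if target.exist?(blob.key)
  # Stream each file out of the old service and into the new one, verifying the checksum
  source.open(blob.key, checksum: blob.checksum) do |file|
    target.upload(blob.key, file, checksum: blob.checksum)
  end
end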
Any idea how to migrate a running project using Refile to the new Rails Active Storage?
Does anyone know of a tutorial/guide on how to do that?
Thanks,
Patrick
I wrote a short post about it here which explains the process in detail:
https://dev.to/mtrolle/migrating-from-refile-to-activestorage-2dfp
Historically I hosted my Refile-attached files in AWS S3, so what I did was refactor all my code to use ActiveStorage instead. This primarily involved updating my models and views to use ActiveStorage syntax.
Then I removed the Refile gem and replaced it with the gems ActiveStorage needs, like the image_processing gem and the aws-sdk-s3 gem.
Finally I created a Rails DB migration file to handle the actual migration of existing files. In it I looped through all records in my model with a Refile attachment, found their respective files in AWS S3, downloaded them, and attached them to the model again as ActiveStorage attachments.
Once the files were moved I could remove the legacy Refile database fields:
require 'mini_magick' # included by the image_processing gem
require 'aws-sdk-s3'  # included by the aws-sdk-s3 gem

class User < ActiveRecord::Base
  has_one_attached :avatar
end

class MovingFromRefileToActiveStorage < ActiveRecord::Migration[6.0]
  def up
    puts 'Connecting to AWS S3'
    s3_client = Aws::S3::Client.new(
      access_key_id: ENV['AWS_S3_ACCESS_KEY'],
      secret_access_key: ENV['AWS_S3_SECRET'],
      region: ENV['AWS_S3_REGION']
    )

    puts 'Migrating user avatar images from Refile to ActiveStorage'
    User.where.not(avatar_id: nil).find_each do |user|
      tmp_file = Tempfile.new
      # Read the S3 object into our tmp_file
      s3_client.get_object(
        response_target: tmp_file.path,
        bucket: ENV['AWS_S3_BUCKET'],
        key: "store/#{user.avatar_id}"
      )

      # Find the content_type of the S3 file using ImageMagick
      # If you've been smart enough to save :avatar_content_type with Refile, you can use that value instead
      content_type = MiniMagick::Image.new(tmp_file.path).mime_type

      # Attach the tmp file to our User as an ActiveStorage attachment
      user.avatar.attach(
        io: tmp_file,
        filename: "avatar.#{content_type.split('/').last}",
        content_type: content_type
      )

      if user.avatar.attached?
        user.save # Save our changes to the user
        puts "- migrated #{user.try(:name)}'s avatar image."
      else
        puts "- \e[31mFailed to migrate the avatar image for user ##{user.id} with Refile id #{user.avatar_id}\e[0m"
      end

      tmp_file.close
    end

    # Now remove the actual Refile column
    remove_column :users, :avatar_id, :string
    # If you've created other Refile fields like *_content_type, you can safely remove those as well
    # remove_column :users, :avatar_content_type, :string
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end
The guide says that I can save an attachment to disk to run a process on it, like this:
message.video.open do |file|
  system '/path/to/virus/scanner', file.path
  # ...
end
My model has an attachment defined as:
has_one_attached :zip
And then in the model I have defined:
def process_zip
  zip.open do |file|
    # process the zip file
  end
end
However I am getting an error:
private method `open' called
on the zip.open call.
How can I save the zip locally for processing?
As an alternative in Rails 5.2 you can do this:
def process_zip
  # Download the zip file into the temp dir
  zip_path = "#{Dir.tmpdir}/#{zip.filename}"
  File.open(zip_path, 'wb') do |file|
    file.write(zip.download)
  end

  # Zip::File is provided by the rubyzip gem
  Zip::File.open(zip_path) do |zip_file|
    # process the zip file
    # ...
    puts "processing file #{zip_file}"
  end
end
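A variant of the same idea that cleans up after itself: Tempfile.create deletes the file when the block exits, and binmode avoids newline mangling on Windows (Zip::File still comes from the rubyzip gem):
def process_zip
  Tempfile.create([zip.filename.base, zip.filename.extension_with_delimiter], binmode: true) do |file|
    file.write(zip.download)
    file.flush # make sure everything is on disk before rubyzip reads it

    Zip::File.open(file.path) do |zip_file|
      # process the zip file
      puts "processing file #{zip_file}"
    end
  end
end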
That’s an edge guide (note edgeguides.rubyonrails.org in the URL); it applies to the master branch of the rails/rails repository on GitHub. The latest changes in master haven’t been included in a released version of Rails yet.
You’re likely using Rails 5.2. Use edge Rails to take advantage of ActiveStorage::Blob#open:
gem "rails", github: "rails/rails"
I am struggling to access files on S3 with Carrierwave.
In my uploader file doc_uploader.rb I have the following code
storage :file

def store_dir
  "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
to upload the "doc" model, defined as follows:
class Doc < ActiveRecord::Base
  belongs_to :user
  mount_uploader :doc, DocUploader
end
To access the uploaded file I have the following lines of code in a controller:
@doc = current_user.docs.order("created_at").last # last file uploaded by user
io = open("#{Rails.root}/public" + @doc.doc.url)
Everything works perfectly locally. Now I want to move my files to S3; in the uploader I use fog and replace
storage :file
with
storage :fog
I adjust my config file carrierwave.rb and uploading works perfectly. However, to access the file I try to use
@doc = current_user.docs.order("created_at").last
io = open("#{@doc.doc.url}")
and I get the following error
No such file or directory # rb_sysopen - /uploads/doc/doc/11/the_uploaded_file.pdf
Could anyone give me the right syntax to access the file on S3 please? Thanks.
When accessing the asset through the console, it gives you only the path; you might need to append the protocol & host to @doc.doc.url, something like:
io = open("http://example.com#{@doc.doc.url}")
Or you can set the asset host in the environment config, but this is not really necessary:
config.asset_host = 'http://example.com'
This only applies if you are using the console; in any web view it will not apply, as CarrierWave seems to handle it.
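Alternatively, you can sidestep URLs entirely and read the contents through CarrierWave itself; as far as I know the uploader proxies read to the underlying file for both :file and :fog storage:
@doc = current_user.docs.order("created_at").last
contents = @doc.doc.read # raw file contents, local disk or S3 alike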
I am trying to generate a csv file in a rake task and...
Email it
Upload it to Amazon s3.
Here is the task.
desc "This task is called by the Heroku scheduler add-on"
require 'csv'
task :send_report => :environment do
file = Baseline.to_csv
ReportMailer.database_report(file).deliver_now
Report.create!(:data => file)
end
The generation of the csv file and its attachment to the email work fine (not shown). It's the carrierwave upload that isn't working. Please note that I have other uploaders for other models and they work fine, so my bucket settings are correct.
Here are the other files.
class Report < ActiveRecord::Base
  mount_uploader :data, ReportUploader
end
and
class ReportUploader < CarrierWave::Uploader::Base
  storage :fog

  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end

  def extension_white_list
    %w(jpg jpeg gif png csv xls)
  end
end
I have tried various permutations such as store! with no luck. I should add that if I look at the database, the new report is being created (but the data attribute is nil, with no upload in sight).
Thanks