With ActiveStorage you can now define mirrors for storing your files:
local:
  service: Disk
  root: <%= Rails.root.join("storage") %>
amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: mybucket
mirror:
  service: Mirror
  primary: local
  mirrors:
    - amazon
    - another_mirror
If you add a mirror at some later point, you have to take care of copying the existing files yourself, e.g. from "local" to "amazon" or "another_mirror".
Is there a convenient method to keep the files in sync?
Or a method that runs a validation to check whether all files are available on each service?
I have a couple of solutions that might work for you, one for Rails <= 6.0 and one for Rails >= 6.1:
Firstly, you need to iterate through your ActiveStorage blobs:
ActiveStorage::Blob.all.each do |blob|
  # work with blob
end
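If you have a lot of blobs, ActiveRecord's find_each is a drop-in alternative that loads them in batches instead of all at once:
ActiveStorage::Blob.find_each do |blob|
  # work with blob
end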
then...
Rails <= 6.0
You will need the blob's key, checksum, and the local file on disk.
local_file = ActiveStorage::Blob.service.primary.path_for blob.key
# I'm picking the first mirror as an example,
# but you can select a specific mirror if you want
mirror = blob.service.mirrors.first
mirror.upload blob.key, File.open(local_file), checksum: blob.checksum
You may also want to avoid uploading a file if it already exists on the mirror. You can check that with exist?:
mirror = blob.service.mirrors.first

# If the file doesn't exist on the mirror, upload it
unless mirror.exist? blob.key
  # Upload the file to the mirror
end
Putting it together, a rake task might look like:
# lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    # Iterate through each blob
    ActiveStorage::Blob.all.each do |blob|
      # We assume the primary storage is local
      local_file = ActiveStorage::Blob.service.primary.path_for blob.key
      # Iterate through each mirror
      blob.service.mirrors.each do |mirror|
        # If the file doesn't exist on the mirror, upload it
        mirror.upload(blob.key, File.open(local_file), checksum: blob.checksum) unless mirror.exist? blob.key
      end
    end
  end
end
You may run into a situation like @Rystraum mentioned, where you need to mirror from somewhere other than the local disk. In that case, the rake task could look like this:
# lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    # All services in our rails configuration
    all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]
    # Iterate through each blob
    ActiveStorage::Blob.all.each do |blob|
      # Select the services where the file exists
      services = all_services.select { |service| service.exist? blob.key }
      # Skip the blob if the file doesn't exist anywhere
      next unless services.present?
      # The services where the file doesn't exist yet
      mirrors = all_services - services
      # Open the local file (if one exists)
      disk_service = services.find { |service| service.is_a? ActiveStorage::Service::DiskService }
      local_file = File.open(disk_service.path_for(blob.key)) if disk_service
      if local_file.present?
        # Upload the local file to the mirrors
        mirrors.each do |mirror|
          local_file.rewind # rewind so each mirror receives the full file
          mirror.upload blob.key, local_file, checksum: blob.checksum
        end
      else
        # No local file exists, so download a remote file and upload it to the mirrors (thanks @Rystraum)
        services.first.open blob.key, checksum: blob.checksum do |temp_file|
          mirrors.each do |mirror|
            temp_file.rewind # rewind so each mirror receives the full file
            mirror.upload blob.key, temp_file, checksum: blob.checksum
          end
        end
      end
    end
  end
end
While the first rake task answers the OP's question, the latter is much more versatile:
- It can be used with any combination of services
- A DiskService is not required
- Uploading from a DiskService is prioritized
- It avoids extra exist? calls, since we only call it once per service per blob
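As for the OP's second question about validating that every file is available on each service, here's a check-only sketch along the same lines (the task name verify_mirrors is my own invention; it reuses the same exist? calls as above):
# lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Reports blobs that are missing from any service'
  task verify_mirrors: [:environment] do
    all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]
    ActiveStorage::Blob.find_each do |blob|
      # Collect every service that doesn't have this blob's file
      missing = all_services.reject { |service| service.exist? blob.key }
      puts "#{blob.key} missing from: #{missing.map { |s| s.class.name }.join(', ')}" if missing.any?
    end
  end
end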
Rails >= 6.1
It's super easy, just call this on each blob...
blob.mirror_later
Wrapping it up as a rake task looks like:
# lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      blob.mirror_later
    end
  end
end
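Bear in mind that mirror_later only enqueues a job (ActiveStorage::MirrorJob), so nothing is copied until your Active Job backend picks it up. If you want the copy to happen while the task runs, performing the job inline should work too (a sketch assuming Rails 6.1, where mirror_later delegates to this job):
ActiveStorage::Blob.find_each do |blob|
  # Runs the mirroring synchronously instead of enqueuing it
  ActiveStorage::MirrorJob.perform_now(blob.key, checksum: blob.checksum)
end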
(03-11-2021) On Rails > 6.1.4.1, using activestorage > 6.1.4.1, with the following setup:
Gemfile:
gem 'azure-storage-blob', github: 'Azure/azure-storage-ruby'
config/environments/production.rb
# Store uploaded files on the local file system (see config/storage.yml for options).
config.active_storage.service = :mirror # or :microsoft / :amazon
config/storage.yml:
amazon:
  service: S3
  access_key_id: XXX
  secret_access_key: XXX
  region: XXX
  bucket: XXX
microsoft:
  service: AzureStorage
  storage_account_name: YYY
  storage_access_key: YYY
  container: YYY
mirror:
  service: Mirror
  primary: amazon
  mirrors: [ microsoft ]
This does NOT work:
ActiveStorage::Blob.all.each do |blob|
  blob.mirror_later
end && puts("Mirroring done!")
What DID work is:
ActiveStorage::Blob.all.each do |blob|
  ActiveStorage::Blob.service.try(:mirror, blob.key, checksum: blob.checksum)
end && puts("Mirroring done!")
Not sure why; mirror_later enqueues a background job (ActiveStorage::MirrorJob), so it presumably needs a working Active Job backend to actually run it, which never happened for me. Maybe future versions of Rails handle it differently.
TL;DR
If you need to mirror your entire storage immediately, add this rake task and execute it in your given environment with bundle exec rails active_storage:mirror_all:
lib/tasks/active_storage.rake
namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      ActiveStorage::Blob.service.try(:mirror, blob.key, checksum: blob.checksum)
    end && puts("Mirroring done!")
  end
end
Optional:
Once you've mirrored all the blobs, you'll probably want to change each blob's service name so they are actually served from the right storage:
namespace :active_storage do
  desc 'Change each blob service name to microsoft'
  task switch_to_microsoft: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      blob.service_name = 'microsoft'
      blob.save
    end && puts("All blobs will now be served from microsoft!")
  end
end
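Since that loop only rewrites a single column, a one-statement alternative is update_all, which skips validations and callbacks (fine here, as service_name is plain data):
# One SQL statement instead of one save per blob
ActiveStorage::Blob.update_all(service_name: 'microsoft')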
Finally, change config.active_storage.service in production.rb, or make the primary mirror the one you want future uploads to go to.
I've built on top of https://stackoverflow.com/a/57579839/365218 so that the rake task does not assume the file is on local disk.
I started with S3, and due to cost concerns I've decided to move the files to disk and use S3 and Azure as mirrors instead.
So my situation is that for some files my primary (disk) doesn't have the file, and my complete repository is actually on my 1st mirror.
So, it's 2 things:
- Move files from S3 to disk
- Add a new mirror, and keep it up to date
namespace :active_storage do
  desc "Ensures all files are mirrored"
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      # Prefer the primary as the source, otherwise find a mirror that has the file
      source_service = if blob.service.primary.exist? blob.key
        blob.service.primary
      else
        blob.service.mirrors.find { |m| m.exist? blob.key }
      end

      source_service.open(blob.key, checksum: blob.checksum) do |file|
        # Backfill the primary first (covers moving files from S3 to disk)
        blob.service.primary.upload(blob.key, file, checksum: blob.checksum) unless blob.service.primary.exist? blob.key
        # Then fill in any mirror that is missing the file
        blob.service.mirrors.each do |mirror|
          next if mirror == source_service
          next if mirror.exist? blob.key
          file.rewind # rewind so every upload receives the full file
          mirror.upload(blob.key, file, checksum: blob.checksum)
        end
      end
    rescue StandardError
      puts blob.key.to_s
    end
  end
end
Everything is stored according to ActiveStorage's keys, so as long as your bucket names and file names aren't changed in the transfer, you can just copy everything over to the new service. See this post for how to copy stuff over.
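If you'd rather drive that copy from Rails instead of the provider's tooling, here's a rough sketch (hedged: :old_service and :new_service are hypothetical names from your config/storage.yml, and service#open with a checksum is the same call the rake tasks above rely on):
configs = Rails.configuration.active_storage.service_configurations
source  = ActiveStorage::Service.configure(:old_service, configs)
target  = ActiveStorage::Service.configure(:new_service, configs)

ActiveStorage::Blob.find_each do |blob|
  next if target.exist?(blob.key)
  # Stream each file out of the old service and into the new one, verifying the checksum
  source.open(blob.key, checksum: blob.checksum) do |file|
    target.upload(blob.key, file, checksum: blob.checksum)
  end
end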
Any idea how to migrate a running project using Refile to the new Rails Active Storage?
Does anyone know of a tutorial/guide on how to do that?
Thanks,
Patrick
I wrote a short post about it here which explains the process in detail:
https://dev.to/mtrolle/migrating-from-refile-to-activestorage-2dfp
Historically I hosted my Refile-attached files in AWS S3, so what I did was refactor all my code to use ActiveStorage instead. This primarily involved updating my models and views to use ActiveStorage syntax.
Then I removed the Refile gem and replaced it with the gems ActiveStorage needs, like the image_processing gem and the aws-sdk-s3 gem.
Finally I created a Rails DB migration file to handle the actual migration of existing files. In it I looped through all records in my model with a Refile attachment, found their respective files in AWS S3, downloaded them, and attached them to the model again as ActiveStorage attachments.
Once the files were moved I could remove the legacy Refile database fields:
require 'mini_magick' # included by the image_processing gem
require 'aws-sdk-s3'  # included by the aws-sdk-s3 gem

class User < ActiveRecord::Base
  has_one_attached :avatar
end

class MovingFromRefileToActiveStorage < ActiveRecord::Migration[6.0]
  def up
    puts 'Connecting to AWS S3'
    s3_client = Aws::S3::Client.new(
      access_key_id: ENV['AWS_S3_ACCESS_KEY'],
      secret_access_key: ENV['AWS_S3_SECRET'],
      region: ENV['AWS_S3_REGION']
    )

    puts 'Migrating user avatar images from Refile to ActiveStorage'
    User.where.not(avatar_id: nil).find_each do |user|
      tmp_file = Tempfile.new
      # Read the S3 object into our tmp_file
      s3_client.get_object(
        response_target: tmp_file.path,
        bucket: ENV['AWS_S3_BUCKET'],
        key: "store/#{user.avatar_id}"
      )

      # Find the content_type of the S3 file using ImageMagick
      # If you've been smart enough to save :avatar_content_type with Refile, you can use that value instead
      content_type = MiniMagick::Image.new(tmp_file.path).mime_type

      # Attach the tmp file to our User as an ActiveStorage attachment
      user.avatar.attach(
        io: tmp_file,
        filename: "avatar.#{content_type.split('/').last}",
        content_type: content_type
      )

      if user.avatar.attached?
        user.save # Save our changes to the user
        puts "- migrated #{user.try(:name)}'s avatar image."
      else
        puts "- \e[31mFailed to migrate the avatar image for user ##{user.id} with Refile id #{user.avatar_id}\e[0m"
      end

      tmp_file.close
    end

    # Now remove the actual Refile column
    remove_column :users, :avatar_id, :string
    # If you've created other Refile fields like *_content_type, you can safely remove those as well
    # remove_column :users, :avatar_content_type, :string
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end
The guide says that I can save an attachment to disk to run a process on it, like this:
message.video.open do |file|
  system '/path/to/virus/scanner', file.path
  # ...
end
My model has an attachment defined as:
has_one_attached :zip
And then in the model I have defined:
def process_zip
  zip.open do |file|
    # process the zip file
  end
end
However I am getting an error:
private method `open' called
on the zip.open call.
How can I save the zip locally for processing?
As an alternative in Rails 5.2 you can do this:
def process_zip
  # Download the zip file into the temp dir
  zip_path = "#{Dir.tmpdir}/#{zip.filename}"
  File.open(zip_path, 'wb') do |file|
    file.write(zip.download)
  end

  # Zip::File is provided by the rubyzip gem
  Zip::File.open(zip_path) do |zip_file|
    # process the zip file
    # ...
    puts "processing file #{zip_file}"
  end
end
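A variant of the same idea that cleans up after itself: Tempfile.create deletes the file when the block exits, and binmode avoids newline mangling on Windows (Zip::File still comes from the rubyzip gem):
def process_zip
  Tempfile.create([zip.filename.base, zip.filename.extension_with_delimiter], binmode: true) do |file|
    file.write(zip.download)
    file.flush # make sure everything is on disk before rubyzip reads it

    Zip::File.open(file.path) do |zip_file|
      # process the zip file
      puts "processing file #{zip_file}"
    end
  end
end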
That’s an edge guide (note edgeguides.rubyonrails.org in the URL); it applies to the master branch of the rails/rails repository on GitHub. The latest changes in master haven’t been included in a released version of Rails yet.
You’re likely using Rails 5.2. Use edge Rails to take advantage of ActiveStorage::Blob#open:
gem "rails", github: "rails/rails"
I am struggling to access files on S3 with Carrierwave.
In my uploader file doc_uploader.rb I have the following code
storage :file

def store_dir
  "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
to upload the "doc" model, defined as follows:
class Doc < ActiveRecord::Base
  belongs_to :user
  mount_uploader :doc, DocUploader
end
To access the uploaded file I have the following lines of code in a controller:
@doc = current_user.docs.order("created_at").last # last file uploaded by user
io = open("#{Rails.root}/public" + @doc.doc.url)
Everything works perfectly locally. Now I want to move my files to S3; in the uploader I use fog and replace
storage :file
with
storage :fog
I adjust my config file carrierwave.rb and uploading works perfectly. However, to access the file I try to use
@doc = current_user.docs.order("created_at").last
io = open("#{@doc.doc.url}")
and I get the following error
No such file or directory # rb_sysopen - /uploads/doc/doc/11/the_uploaded_file.pdf
Could anyone give me the right syntax to access the file on S3 please? Thanks.
When accessing the asset through the console, it gives you only the path; you might need to append the protocol & host to @doc.doc.url, something like:
io = open("http://example.com#{@doc.doc.url}")
Or you can set the asset host in the environment config, but this is not really necessary:
config.asset_host = 'http://example.com'
This only applies if you are using the console; in any web view it will not apply, as CarrierWave seems to handle it.
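Alternatively, you can sidestep URLs entirely and read the contents through CarrierWave itself; as far as I know the uploader proxies read to the underlying file for both :file and :fog storage:
@doc = current_user.docs.order("created_at").last
contents = @doc.doc.read # raw file contents, local disk or S3 alike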
I am trying to generate a csv file in a rake task and...
Email it
Upload it to Amazon s3.
Here is the task.
desc "This task is called by the Heroku scheduler add-on"
require 'csv'
task :send_report => :environment do
file = Baseline.to_csv
ReportMailer.database_report(file).deliver_now
Report.create!(:data => file)
end
The generation of the csv file and its attachment to the email work fine (not shown). It's the carrierwave upload that isn't working. Please note that I have other uploaders for other models and they work fine, so my bucket settings are correct.
Here are the other files.
class Report < ActiveRecord::Base
  mount_uploader :data, ReportUploader
end
and
class ReportUploader < CarrierWave::Uploader::Base
  storage :fog

  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end

  def extension_white_list
    %w(jpg jpeg gif png csv xls)
  end
end
I have tried various permutations such as store! with no luck. I should add that if I look at the database, the new report is being created (but the data attribute is nil, with no upload in sight).
Thanks