Why is Carrierwave-Backgrounder / Sidekiq triggering redundant image processing?

I'm using Carrierwave-Backgrounder with Sidekiq to process User icons. Unfortunately, every time I update the User model it triggers CarrierWave::Workers::ProcessAsset to process the existing image. Is there a way to make the backgrounder run only if the User has an icon or changes an existing one? I've tried everything...
user.rb
class User < ActiveRecord::Base
  mount_uploader :icon, IconUploader
  process_in_background :icon
end
icon_uploader.rb
class IconUploader < CarrierWave::Uploader::Base
  include ::CarrierWave::Backgrounder::Delay
end
INFO: Booting Sidekiq 2.14.1 using redis://localhost:6379/0 with options
INFO: Running in ruby 2.0.0p247 (2013-06-27 revision 41674)
INFO: Starting processing, hit Ctrl-C to stop
CarrierWave::Workers::ProcessAsset JID-f2c19a1e33e83be2dce31961 INFO: start
CarrierWave::Workers::ProcessAsset JID-f2c19a1e33e83be2dce31961 INFO: done: 0.733 sec
CarrierWave::Workers::ProcessAsset JID-a497e66f54609f76678db81a INFO: start
CarrierWave::Workers::ProcessAsset JID-a497e66f54609f76678db81a INFO: done: 0.32 sec
CarrierWave::Workers::ProcessAsset JID-576dd9a036323e700e86860c INFO: start
CarrierWave::Workers::ProcessAsset JID-576dd9a036323e700e86860c INFO: done: 0.588 sec

As gems go, carrierwave-backgrounder is pretty simple. It only enqueues a job if the field has been "updated", where "updated" is defined as any of the following being true:
1. avatar_changed?
2. previous_changes.has_key?(:avatar)
3. remote_avatar_url.present?
4. avatar_cache.present?
Conditions 1 and 2 apply if you're changing the avatar through assignment, 3 applies if you're letting people paste in URLs, and 4 applies if you're caching uploads between form reloads.
If you're absolutely certain that none of those are the case and you're still seeing jobs on every save, you may want to create an issue on their issue tracker.
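The four checks above can be sketched in plain Ruby. This is a paraphrase for illustration, not the gem's actual implementation; `FakeUser` and its method names are hypothetical stand-ins:

```ruby
# Hypothetical stand-in illustrating the four conditions carrierwave-backgrounder
# checks before enqueuing ProcessAsset (the real gem reads these off the
# mounted uploader column via ActiveRecord's dirty tracking).
class FakeUser
  attr_accessor :icon_changed, :previous_changes, :remote_icon_url, :icon_cache

  def initialize
    @icon_changed = false
    @previous_changes = {}
  end

  # A job is enqueued only when at least one of the four conditions holds.
  def trigger_icon_background_processing?
    @icon_changed ||
      @previous_changes.key?(:icon) ||
      !@remote_icon_url.to_s.empty? ||
      !@icon_cache.to_s.empty?
  end
end

user = FakeUser.new
user.trigger_icon_background_processing?  # => false, nothing enqueued

user.icon_cache = "1/icon.png"            # a cached upload from a form reload
user.trigger_icon_background_processing?  # => true, a job would be enqueued
```

So a save that touches none of those four signals should not enqueue anything; if it still does, one of them is unexpectedly truthy.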

Related

Sidekiq-Unique-jobs Lock works only for first Worker, then all pending jobs get executed at the same time?

I could have 3 jobs pending in the queue; as soon as the first job is done, all remaining jobs get executed at the same time. Why is that? I want each worker to take a lock and make the other jobs wait, in serialized fashion.
class StuffWorker
  include Sidekiq::Worker

  sidekiq_options lock: :until_executed,
                  lock_timeout: 999,
                  lock_info: true,
                  lock_args_method: :lock_args

  def self.lock_args(args)
    [args[0], args[1]]
  end

  def perform(company_id, person_id)
    sleep 10
    logger.info "STARTING IT! at #{DateTime.now.strftime('%H:%M:%S')}"
  end
end
Produces the following:
JID-ce8c692b5341adb7a24584ab INFO: STARTING IT! at 23:29:52
JID-ce8c692b5341adb7a24584ab INFO: done: 10.728 sec
JID-ca8dac1cbd7cbaf5d87f6096 INFO: STARTING IT! at 23:30:02
JID-463bfe792775e1412d3c0af7 INFO: STARTING IT! at 23:30:02
JID-463bfe792775e1412d3c0af7 INFO: done: 17.754 sec
JID-ca8dac1cbd7cbaf5d87f6096 INFO: done: 14.024 sec
Possibly what you want is a runtime lock only, not a queue lock. You have to use the :while_executing lock type: https://github.com/mhenrixon/sidekiq-unique-jobs#while-executing.
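A sketch of the worker with only the lock type swapped, based on the sidekiq-unique-jobs README (untested; verify option names against the gem version you run):

```ruby
class StuffWorker
  include Sidekiq::Worker

  # :while_executing lets duplicates queue up freely but runs jobs with the
  # same lock_args one at a time; the original :until_executed instead
  # rejects duplicates while one is queued or running.
  sidekiq_options lock: :while_executing,
                  lock_timeout: 999,
                  lock_info: true,
                  lock_args_method: :lock_args

  def self.lock_args(args)
    [args[0], args[1]]
  end

  def perform(company_id, person_id)
    sleep 10
    logger.info "STARTING IT! at #{DateTime.now.strftime('%H:%M:%S')}"
  end
end
```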

How do I log initialize/end times in Sidekiq?

I found these logs here that I'm trying to duplicate in my code, specifically the start time and done times:
PID-- ThreadID----- LLvl YourKlass- JobID--------
TID-oveahmcxw INFO: HardWorker JID-oveaivtrg start
TID-oveajt7ro INFO: HardWorker JID-oveaish94 start
TID-oveahmcxw INFO: HardWorker JID-oveaivtrg done: 10.003 sec
TID-oveajt7ro INFO: HardWorker JID-oveaish94 done: 10.002 sec
Does anyone know how to access these values? I know there is a jid that gets populated (which I'm using), but I can't for the life of me find any documentation on where start and done come from.
You can add logging by using a middleware:
class JobMiddleware
  def call(worker, msg, queue)
    Rails.logger.info "Job #{worker.jid} started at #{Time.now}"
    yield
    Rails.logger.info "Job #{worker.jid} ended at #{Time.now}"
  end
end
Add this to config/initializers/sidekiq.rb and restart your sidekiq workers.
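The middleware only takes effect once it is added to Sidekiq's server chain, so the initializer would contain something like this (standard Sidekiq middleware registration; a sketch, not code from the answer):

```ruby
# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add JobMiddleware
  end
end
```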
The Sidekiq wiki has more info about Sidekiq middleware.

Problems with file.path with csv import via sidekiq on heroku

I am using a background job to import user data from a CSV file into my database. At first I did this synchronously by calling a method on my User model, passing the file path transmitted via a form file_field:
User.import_csv(params[:file].path)
Worked well locally and on production (heroku).
Now that I'm dealing with huge CSV files, I realized I need a job to perform this import in the background. I'm familiar with Redis and Sidekiq, so the job was built quickly.
CsvImportJob.perform_async(URI.parse(params[:file].path))
and in my worker:
def perform(file_path)
User.import_csv(file_path)
end
Well, that also works perfectly locally, but as soon as I run it in production I see the following error in my log:
» 10 Aug 2015 13:56:26.596 2015-08-10 11:56:25.987726+00:00 app worker.1 - - 3 TID-oqvt6v1d4 ERROR: Actor crashed!
» 10 Aug 2015 13:56:26.596 2015-08-10 11:56:25.987728+00:00 app worker.1 - - Errno::ENOENT: No such file or directory # rb_sysopen - /tmp/RackMultipart20150810-6-14u804c.csv
» 10 Aug 2015 13:56:26.596 2015-08-10 11:56:25.987730+00:00 app worker.1 - - /app/vendor/ruby-2.2.2/lib/ruby/2.2.0/csv.rb:1256:in `initialize'
That /tmp/ path is the file_path variable.
Somehow Heroku is not able to find the file when I pass it to a Sidekiq job. When I do this without Sidekiq, it works.
I don't really know how to tackle this issue so any help is appreciated.
I had the same experience, you can look at a similar project of mine at https://github.com/coderaven/datatable-exercise/tree/parallel_processing
(Basically just focus on object_record.rb model and the jobs: import_csv_job.rb and process_csv_job.rb)
The error: Errno::ENOENT: No such file or directory # rb_sysopen
You said this works on Heroku without Sidekiq, so the path you are passing (the /tmp/ path in your example) is presumably valid at the time the request is handled.
So here are 2 probable problems and their solutions:
1.) You saved a path that is unknown or inaccessible to Heroku, so the application cannot open it once the job runs. When you handle the CSV import without Sidekiq, the uploaded file only needs to exist for the duration of the request; a Sidekiq job, however, runs later, so the path must still exist and be accessible to the app at that point.
Solution: Save the file to storage somewhere. Heroku has an ephemeral filesystem, so you cannot keep files around via the running web app; to work around this, use an Amazon S3-like service (you can also use Google Drive, as I did) to save your files, then give the path to your Sidekiq worker so it can access and process the file later.
2.) If the paths are correct and the files are saved correctly, then in my experience the problem may be that you are using File.open instead of open-uri's open method. File.open does not accept remote files; you need to require open-uri in your worker and then use its open method to handle remote files.
ex.
require 'open-uri'

class ProcessCsvJob < ActiveJob::Base
  queue_as :default

  def perform(csv_path)
    csv_file = open(csv_path, 'rb:UTF-8')
    SmarterCSV.process(csv_file) do |array|
      # .... code here for processing ...
    end
  end
end
I'm fully aware this question is almost a year old, so if you have already solved it, this answer can still serve as a documentation archive for those who experience the same problem.
You can't pass a file object to the perform method.
The fix is to massage the data beforehand and pass in the parameters you need directly.
Something like...
def import_csv(file)
  CSV.foreach(file.path, headers: true) do |row|
    new_user = { email: row[0], password: row[1] }
    CsvImportJob.perform_async(new_user)
  end
end
Note: you'd call CsvImportJob.perform_later for Sidekiq with ActiveJob and Rails 5.
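One detail worth knowing when passing hashes like this: Sidekiq serializes job arguments as JSON, so symbol keys arrive in perform as strings. A plain-Ruby sketch of that round trip (the sample data is made up):

```ruby
require 'csv'
require 'json'

csv_data = "email,password\nalice@example.com,secret\n"

# Build one argument hash per row, as import_csv above does.
args = CSV.parse(csv_data, headers: true).map do |row|
  { email: row["email"], password: row["password"] }
end

# Sidekiq pushes arguments through JSON, so symbol keys become strings:
received = JSON.parse(JSON.generate(args.first))
received["email"]  # => "alice@example.com"
received[:email]   # => nil -- read string keys inside perform
```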
You got the error because in production/staging the web app and Sidekiq run on different servers.
Use my solution: upload the CSV to Google Cloud Storage.
class Services::Downloader
  require 'fog'

  StorageCredentials = YAML.load_file("#{::Rails.root}/config/g.yml")[Rails.env]

  # Uploads the local file to the bucket and returns a signed URL
  # that stays valid for 24 hours.
  def self.download(file_name, local_path)
    storage = Fog::Storage.new(
      provider: "Google",
      google_storage_access_key_id: StorageCredentials['key_id'],
      google_storage_secret_access_key: StorageCredentials['access_key'])
    storage.get_bucket(StorageCredentials['bucket'])
    f = File.open(local_path)
    storage.put_object(StorageCredentials['bucket'], file_name, f)
    f.close
    storage.get_object_https_url(StorageCredentials['bucket'], file_name, Time.now.to_f + 24.hours)
  end
end
Class User
class User < ApplicationRecord
  require 'csv'
  require 'open-uri'

  def self.import_data(file)
    load_file = open(file)
    data = CSV.read(load_file, { encoding: "UTF-8", headers: true, header_converters: :symbol, converters: :all })
    ...
Worker
class ImportWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'workers', retry: 0

  def perform(filename)
    User.import_data(filename)
  end
end
and the code that starts the worker:
path = Services::Downloader.download(zip.name, zip.path)
ImportWorker.perform_async(path)

Couldn't find Post without ID sidekiq error

I can't figure out what's going on with Sidekiq. I could've sworn this worked yesterday, but I must have been dreaming.
Here's my worker class:
class TagPostWorker
  include Sidekiq::Worker

  sidekiq_options queue: "tag"
  sidekiq_options retry: false

  def perform(options = {})
    current_user = User.find(options[:user_id])
  end
end
I've tried running this command in the show method for the Post:
TagPostWorker.perform_async({:user_id => current_user.id})
But I get this error:
2013-08-17T22:45:45Z 4029 TID-ors6jfr54 TagPostWorker JID-ae203958bb3bcee01c8f83ef INFO: start
2013-08-17T22:45:45Z 4029 TID-ors6jfr54 TagPostWorker JID-ae203958bb3bcee01c8f83ef INFO: fail: 0.003 sec
2013-08-17T22:45:45Z 4029 TID-ors6jfr54 WARN: {"retry"=>false, "queue"=>"tag", "class"=>"TagPostWorker", "args"=>[{"user_id"=>7}], "jid"=>"ae203958bb3bcee01c8f83ef", "enqueued_at"=>1376779545.9099338}
2013-08-17T22:45:45Z 4029 TID-ors6jfr54 WARN: Couldn't find Post without an ID
I don't understand how Sidekiq could even be attempting to find a Post, since I'm not even referencing one in the perform method. Any ideas what could be going on?
Any help is appreciated.
You may want to take a look at this blog post.
It sounds ridiculous, but Sidekiq is so fast that it can run your
worker before your model even finishes saving.
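If that race is the culprit, one common guard is to enqueue only after the database transaction commits. A sketch using ActiveRecord's after_commit callback (not code from the question; it assumes the Post carries a user_id):

```ruby
class Post < ActiveRecord::Base
  # The worker can't outrun the INSERT: the job is pushed only after COMMIT,
  # so the record is guaranteed to be findable when perform runs.
  after_commit :enqueue_tagging, on: :create

  def enqueue_tagging
    # String key, because Sidekiq round-trips arguments through JSON.
    TagPostWorker.perform_async("user_id" => user_id)
  end
end
```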
Also strange, I could've sworn I restarted Rails/sidekiq.
But I renamed the worker to TagWorker, restarted Rails/sidekiq and it started working again!

slow rails stack

When I run
rails server
or
rake -T
or some other rails script, it takes a lot of time, approx 1 minute.
What is the best way to determine what exactly is so slow ?
How can the speed be improved ?
Rails is 3.0.3, run through Ruby 1.9.2 (RVM) on Linux.
That has been bothering me too since I switched to Rails 3.
To your second question: I found by digging through the framework that the initializers take about half the time of a simple rake or rails call before it actually starts doing its task.
If you put these simple timing lines into the loop of initializer calls in $GEM_PATH/gems/railties-3.0.3/lib/rails/initializable.rb (or piggy-back it if you like):
def run_initializers(*args)
  return if instance_variable_defined?(:@ran)
  t0 = Time.now
  initializers.tsort.each do |initializer|
    t = Time.now
    initializer.run(*args)
    puts("%60s: %.3f sec" % [initializer.name, Time.now - t])
  end
  puts("%60s: %.3f sec" % ["for all", Time.now - t0])
  @ran = true
end
EDIT: Or, for railties 4.2.1:
def run_initializers(group = :default, *args)
  return if instance_variable_defined?(:@ran)
  t0 = Time.now
  initializers.tsort.each do |initializer|
    t = Time.now
    initializer.run(*args) if initializer.belongs_to?(group)
    puts("%60s: %.3f sec" % [initializer.name, Time.now - t])
  end
  puts("%60s: %.3f sec" % ["for all", Time.now - t0])
  @ran = true
end
... you can follow what happens. On my system, a 2.4 Core 2 Duo MacBook, the initializers take about 7 seconds.
There are a few that are especially slow on my system. When I filter all out below a second, I get this result on my system:
load_active_support: 1.123 sec
active_support.initialize_time_zone: 1.579 sec
load_init_rb: 1.118 sec
set_routes_reloader: 1.291 sec
I am sure somebody (is it me?) will take some time to start there and optimize.
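The wrap-and-time pattern above doesn't depend on Rails at all. A self-contained demo with fake initializers (the step names and sleep durations are made up):

```ruby
# Each "initializer" is just a labelled lambda standing in for a real step.
steps = {
  "load_active_support"  => -> { sleep 0.05 },
  "initialize_time_zone" => -> { sleep 0.10 },
}

t0 = Time.now
steps.each do |name, step|
  t = Time.now
  step.call
  puts("%60s: %.3f sec" % [name, Time.now - t])
end
total = Time.now - t0
puts("%60s: %.3f sec" % ["for all", total])
```

The same wrapper works around any sequential startup phase, which makes it easy to spot the one or two steps that dominate the total.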
Our Rails 3.1 startup time was almost 1 minute (we use a lot of gems).
Then we found out about some Ruby 1.9.3 tuning options on reddit:
http://www.reddit.com/r/ruby/comments/wgtqj/how_i_spend_my_time_building_rails_apps/c5daer4
export RUBY_HEAP_MIN_SLOTS=800000
export RUBY_HEAP_FREE_MIN=100000
export RUBY_HEAP_SLOTS_INCREMENT=300000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=79000000
put this in your shell environment/profile/bashrc, and you are done.
We boosted our startup from 1 minute to 9 seconds.
One workaround I use for this is to preload the Rails environment with rails-sh. That way only the first rails/rake command is slow and the rest are pretty fast. I wrote a fuller answer to this in another question.
Another way I tried more recently, compatible with the first, is installing a patched Ruby (with RVM or rbenv, or from source) and adjusting environment variables (see #stwienert's answer). The falcon and railsexpress patches both seem to pick up significant performance in Ruby 1.9. Check out RVM/rbenv on how to install patched Rubies with them.
I used robokopp's tip here to discover that most of the time was being used in the build_middleware_stack and load_config_initializers steps for me. This is because I am using the omniauth gem that adds middlewares and perhaps has heavy initialization steps. I am on Rails 3.1.rc1, and my initialization takes almost 13 seconds (I am on ruby 1.9.2p180).
Even for a brand new Rails 3.1.rc1 app, initialization takes ~3.6 seconds, with the most time taken by load_config_initializers.
So I suggest you look for gems/your own code that have heavy initializers or add too many middlewares.
