I send a newsletter to my customers (~10k emails) every morning, and sometimes this Sidekiq job consumes so much CPU/memory that the website (a Rails app) stops responding and suffers outages.
When I look at the Sidekiq dashboard, I see that something is wrong with the newsletter (probably an invalid email address, with Sidekiq repeatedly trying to send it again?) and that it's stuck.
How do I prevent this behavior and stop Sidekiq from repeating the task (which I believe is the cause of the outages)?
Here's my code:
rake task:
namespace :mailer do
  desc "Carrier blast - morning"
  task newsletter_morning: :environment do
    NewslettertJob.perform_later
  end
end
job definition:
class NewslettertJob < ApplicationJob
  def perform
    ...
    NewsletterMailer.morning_blast(data).deliver_now
  end
end
and NewsletterMailer:
class NewsletterMailer < ApplicationMailer
  def morning_blast(data)
    ...
    customers.each do |customer|
      yield customer, nil
      next if customer.email.blank?
      begin
        Retryable.retryable(tries: 1, sleep: 30, on: [Net::OpenTimeout, Net::SMTPAuthenticationError, Net::SMTPServerBusy]) do
          send_email(customer.email).deliver
        end
        send_email(customer.email).deliver
      rescue Net::SMTPSyntaxError => e
        error_msg = "Newsletter sending failed on #{Time.now} with: #{e.message}. e.inspect: #{e.inspect}"
        logger.warn error_msg
        yield customer, nil
        next
      end
    end
  end
end
What I want is for the newsletter to go out every morning and, if Rails/Sidekiq runs into a problem, for the job to simply stop, so that the newsletter does not affect the main website (its server).
Thanks in advance for any advice. I have been stuck on this issue for a while now.
If your machine only has one core, Sidekiq and Puma will fight for CPU. Lower Sidekiq's concurrency so it uses less CPU, get a machine with multiple cores, or move Sidekiq to a different machine.
If a Sidekiq process is using 100% of a core, lower the concurrency setting. The default in Sidekiq 6.0 is 10, which is a good default; but if your jobs are just delivering emails (so they mostly wait on the network rather than burn CPU), you could probably bump that to 20. You can run multiple Sidekiq processes if you wish to use multiple cores to process jobs faster.
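To make the concurrency point concrete: Sidekiq's concurrency setting is the number of threads one process uses to run jobs, so it caps how many jobs compete for the CPU at once. The stdlib-only Ruby sketch below is not Sidekiq code; `run_pool` and the sleeping lambdas are hypothetical stand-ins, and it simply shows a fixed-size pool never running more jobs at once than it has threads.

```ruby
# Plain-Ruby sketch of a fixed-size worker pool, similar in spirit to
# Sidekiq's concurrency setting. run_pool and the jobs are hypothetical.
def run_pool(jobs, pool_size:)
  queue = Queue.new
  jobs.each { |job| queue << job }
  pool_size.times { queue << :stop } # one stop marker per worker thread

  lock = Mutex.new
  running = 0
  peak = 0
  done = []

  workers = Array.new(pool_size) do
    Thread.new do
      while (job = queue.pop) != :stop
        lock.synchronize do
          running += 1
          peak = [peak, running].max # highest number of jobs seen running at once
        end
        result = job.call # simulated work
        lock.synchronize do
          running -= 1
          done << result
        end
      end
    end
  end
  workers.each(&:join)
  { peak: peak, done: done }
end

jobs = Array.new(20) { |i| -> { sleep 0.01; i } }
stats = run_pool(jobs, pool_size: 3)
puts "peak concurrent jobs: #{stats[:peak]}, completed: #{stats[:done].size}"
```

In Sidekiq itself the value is usually set in config/sidekiq.yml or with the `-c` command-line option.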
Ideally, I think you should separate your background-job servers from your web servers, so that background processing won't affect the web server's performance. I work for a very high-traffic, high-load company, and we use this sort of architecture here.
This answer explains how to stop retries: Disable automatic retry with ActiveJob, used with Sidekiq
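Independently of the ActiveJob/Sidekiq specifics covered in that answer, the policy itself is simple: cap the number of retries, and never retry errors that cannot succeed on a second attempt (such as a malformed address). The sketch below is plain Ruby, not Sidekiq's retry subsystem; `run_with_retries`, `TransientError` and `PermanentError` are hypothetical names, and `max_retries: 0` plays the role of disabling retries.

```ruby
# Plain-Ruby sketch of a capped retry policy (hypothetical names throughout).
class TransientError < StandardError; end # e.g. a temporary SMTP timeout
class PermanentError < StandardError; end # e.g. an invalid address

# Yields the attempt number; retries only errors listed in retry_on,
# and at most max_retries times. max_retries: 0 means "attempt once".
def run_with_retries(max_retries:, retry_on: [TransientError])
  attempts = 0
  begin
    attempts += 1
    yield attempts
    { status: :ok, attempts: attempts }
  rescue *retry_on => e
    retry if attempts <= max_retries
    { status: :failed, attempts: attempts, error: e.class }
  rescue StandardError => e
    # Anything not listed in retry_on fails immediately, without retrying.
    { status: :failed, attempts: attempts, error: e.class }
  end
end

result = run_with_retries(max_retries: 0) { raise TransientError, "SMTP busy" }
puts result.inspect
```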
Another thing: your e-mail sending is done synchronously (.deliver). This turns your task into one huge monolithic process covering many customers, with a large impact on memory. Instead, you could use deliver_later, so each customer gets its own small job. This will also help alleviate CPU and memory usage. You could even create a worker that sends the e-mail to a single customer, and use your monolithic job merely to dispatch those.
class NewslettertJob < ApplicationJob
  def perform
    ...
    customers.each do |customer|
      NewsletterMailer.morning_blast(customer, data).deliver_later if customer.email.present?
    end
  end
end
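The dispatch-then-deliver idea can be sketched without Rails. In this plain-Ruby illustration an Array stands in for the Sidekiq queue, and `Customer`, `dispatch` and `deliver_one` are hypothetical; the point is that the dispatcher only enqueues IDs, and a bad address fails one small job instead of the whole blast.

```ruby
# Plain-Ruby sketch of "one small job per customer". An Array stands in for
# the Sidekiq queue; Customer, dispatch and deliver_one are hypothetical.
Customer = Struct.new(:id, :email)

# Dispatcher: cheap, enqueues only IDs, and skips customers without an email.
def dispatch(customers, queue)
  customers.each do |customer|
    queue << customer.id unless customer.email.to_s.strip.empty?
  end
end

# One job per customer: a failure is recorded and does not stop the others.
def deliver_one(customer_id, customers_by_id, sent:, failed:)
  customer = customers_by_id.fetch(customer_id)
  raise ArgumentError, "invalid address" unless customer.email.include?("@")
  sent << customer.id # a real job would send the e-mail here
rescue ArgumentError => e
  failed << [customer_id, e.message]
end

customers = [
  Customer.new(1, "a@example.com"),
  Customer.new(2, ""),             # skipped by the dispatcher
  Customer.new(3, "not-an-email"), # fails only its own job
  Customer.new(4, "d@example.com")
]
queue = []
dispatch(customers, queue)
by_id = customers.to_h { |c| [c.id, c] }
sent = []
failed = []
queue.each { |id| deliver_one(id, by_id, sent: sent, failed: failed) }
puts "sent=#{sent.inspect} failed=#{failed.inspect}"
```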
That said, I think the silver bullet is separating your Sidekiq server from your web server, with one server dedicated to background tasks. On your web server, you don't even start the Sidekiq processes.
Related
I have a ruby on rails web application deployed on Heroku.
This web app fetches job feeds from given URLs as XML, processes these XMLs, and creates a single XML file. It worked pretty well for a while. However, as the number of URLs and job ads grew, it stopped working. The process sometimes takes up to 45 seconds, since there are over 35K job vacancies (Heroku times out after 30 seconds), so I am getting an H12 timeout error. This error led me to read about worker dynos and background processing.
I figured out that I should apply the approach below:
Scalable-approach Heroku
Now I am using Redis and Sidekiq on my project. And I am able to create a background worker to do all the dirty work. But here is my question.
Instead of doing this call in the controller class:
def apply
  send_data Aggregator.new(providers: providers).call,
            type: 'text/xml; charset=UTF-8;',
            disposition: 'attachment; filename=indeed_apply_yes.xml'
end
I am doing this perform_async call instead:
def apply
  ReportWorker.perform_async(Time.now)
  redirect_to health_path # and returns status 200 ok
end
I implemented the class below: ReportWorker calls the Aggregator service. data_xml holds the data that I need to show somewhere, or have downloaded automatically, once it's ready.
class ReportWorker
  include Sidekiq::Worker
  sidekiq_options retry: false
  data_xml = nil

  def perform(start_date)
    url_one = 'https://www.examplea.com/abc/download-xml'
    url_two = 'https://www.exampleb.com/efg/download-xml'
    cursor = 'stop'
    providers = [url_one, url_two, cursor]
    puts "SIDEKIQ WORKER GENERATING THE XML-DATA AT #{start_date}"
    data_xml = Aggregator.new(providers: providers).call
    puts "SIDEKIQ WORKER GENERATED THE XML-DATA AT #{Time.now}"
  end
end
I know that it's not recommended to make the send_data/send_file methods accessible outside of controller classes. Any suggestions on how to do this?
Thanks in advance!!
Can you set up a database in your application? You could then store a record for each completed job there. You could also save the entire file in the database, but I recommend cloud storage (such as Amazon S3).
After that, you can show the current status of queued jobs on a page for the user, with a 'Download' button once the job is done.
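A minimal sketch of that status-tracking flow, assuming an in-memory store in place of the database table and cloud storage suggested above (`ReportStore` and its methods are hypothetical): the controller creates a record and enqueues the job, the worker marks it done with the generated XML, and the download action only returns data once the status is `:done`.

```ruby
# Plain-Ruby sketch: track report jobs by ID so a page can poll their status
# and offer a download when ready. ReportStore is hypothetical; a real app
# would keep the record in the database and the file in S3 or similar.
class ReportStore
  Report = Struct.new(:id, :status, :data)

  def initialize
    @reports = {}
    @lock = Mutex.new # the worker and the web request may touch it concurrently
    @next_id = 0
  end

  # Controller side: create a record before enqueuing the background job.
  def create
    @lock.synchronize do
      @next_id += 1
      @reports[@next_id] = Report.new(@next_id, :queued, nil)
      @next_id
    end
  end

  # Worker side: store the generated XML and mark the report as done.
  def complete(id, data)
    @lock.synchronize do
      report = @reports.fetch(id)
      report.data = data
      report.status = :done
    end
  end

  def status(id)
    @lock.synchronize { @reports.fetch(id).status }
  end

  # What a "Download" action would serve; nil until the job has finished.
  def download(id)
    @lock.synchronize do
      report = @reports.fetch(id)
      report.status == :done ? report.data : nil
    end
  end
end

store = ReportStore.new
id = store.create
puts store.status(id)
worker = Thread.new { store.complete(id, "<jobs></jobs>") } # stands in for ReportWorker
worker.join
puts store.status(id)
puts store.download(id)
```

In a Rails app, the download action could then pass the stored data to send_data, which keeps send_data inside the controller.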
I have an API that uses a service object, in which I use Ruby threads to reduce the API's response time. I have tried to share the context using the following example. It was working fine with Rails 4 and Ruby 2.2.1.
Now we have upgraded Rails to 5.2.3 and Ruby to 2.6.5, after which the service stopped working. If I call the service from the console, it works fine. But when it is called through the API, the service becomes unresponsive once it reaches CurrencyConverter.new. Any idea what the issue could be?
class ParallelTest
  def initialize
    puts "Initialized"
  end

  def perform
    # Our sample set of currencies
    currencies = ['ARS','AUD','CAD','CNY','DEM','EUR','GBP','HKD','ILS','INR','USD','XAG','XAU']
    # Create an array to keep track of threads
    threads = []
    currencies.each do |currency|
      # Keep track of the child processes as you spawn them
      threads << Thread.new do
        puts currency
        CurrencyConverter.new(currency).print
      end
    end
    # Join on the child processes to allow them to finish
    threads.each do |thread|
      thread.join
    end
    { success: true }
  end
end

class CurrencyConverter
  def initialize(params)
    @curr = params
  end

  def print
    puts @curr
  end
end
If I remove the CurrencyConverter.new(currency), then everything works fine. CurrencyConverter is a service object that I have.
Found the Issue
Thanks to @anothermh for this link
https://guides.rubyonrails.org/threading_and_code_execution.html#wrapping-application-code
https://guides.rubyonrails.org/threading_and_code_execution.html#load-interlock
As the guide explains, when one thread is performing an autoload by evaluating the class definition from the appropriate file, it is important that no other thread encounters a reference to the partially-defined constant.
Only one thread may load or unload at a time, and to do either, it must wait until no other threads are running application code. If a thread is waiting to perform a load, it doesn't prevent other threads from loading (in fact, they'll cooperate, and each perform their queued load in turn, before all resuming running together).
This can be resolved by permitting concurrent loads.
https://guides.rubyonrails.org/threading_and_code_execution.html#permit-concurrent-loads
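Rails.application.executor.wrap do
  threads = []
  currencies.each do |currency|
    threads << Thread.new do
      # Each new thread wraps its own work in the executor, as the guide recommends
      Rails.application.executor.wrap do
        CurrencyConverter.new(currency)
        puts currency
      end
    end
  end
  # Join once, after all threads are started, while allowing them to autoload
  ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
    threads.map(&:join)
  end
end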
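The same spawn-then-join shape, stripped of Rails, looks like the sketch below: every thread is started first and only then awaited, so the slow calls overlap instead of running one after another. This is plain Ruby; `convert` is a hypothetical stand-in for CurrencyConverter, and inside Rails the waiting part still belongs in `permit_concurrent_loads` as the guide describes.

```ruby
# Plain-Ruby sketch: start all threads, then wait for all of them.
# Thread#value joins the thread and returns its block's result.
# convert is a hypothetical stand-in for CurrencyConverter.
def convert(currency)
  sleep 0.05 # simulated I/O, e.g. an HTTP call to a rates API
  "#{currency} converted"
end

currencies = %w[ARS AUD CAD CNY EUR GBP USD]

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = currencies.map { |currency| Thread.new { convert(currency) } }
results = threads.map(&:value) # waits for each thread; order matches currencies
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results.inspect
puts format("%.2fs for %d calls", elapsed, currencies.size)
```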
Thank you all for your time, I appreciate it.
Don't re-invent the wheel and use Sidekiq instead. 😉
From the project's page:
Simple, efficient background processing for Ruby.
Sidekiq uses threads to handle many jobs at the same time in the same process. It does not require Rails but will integrate tightly with Rails to make background processing dead simple.
With 400+ contributors and 10k+ stars on GitHub, they have built a solid, production-ready parallel job execution system that is easy to set up.
Have a look at their Getting Started guide to see for yourself.
I want to run an infinite loop on a separate thread that starts as soon as the app initializes (in an initializer). Here's what it might look like:
# in config/initializers/item_loop.rb
Thread.new do
  loop do
    Item.find_each do |item|
      # Get price from third-party api and update record.
      item.update_price!
      # Need to wait a little between requests to avoid getting throttled.
      sleep 5
    end
  end
end
I tend to accomplish this by running batch updates in recurring background jobs. But this doesn't make sense since I don't really need parallelization, downtime, or queueing, I just want to update one item at a time in a single thread, forever.
Yet there are multiple things that concern me:
Leaked Connections: Should I open up a new connection_pool for the thread? Should I use a gem like safely to avoid crashing the thread?
Thread Safety: Should I be worried about race conditions? Should I make use of Mutex and synchronize? Does using ActiveRecord::Base.transaction impact thread safety?
Deadlock: Should I use Rails.application.executor.wrap?
Concurrent Ruby/Sleep Intervals: Should I use TimerTask from concurrent-ruby gem instead of sleep or something other than Thread.new?
Information on any of these subjects is appreciated.
Usually, to perform a job in a background process (a non-web-server process), a background worker manager is used. Rails has a common interface for such managers called ActiveJob. There are a few implementations of a background worker manager: Sidekiq, DelayedJob, Resque, etc. Sidekiq is preferred. Returning to the actual problem: you can create a schedule to run UpdatePriceJob at a fixed interval using the gem sidekiq-scheduler. Another nice extension, for throttling Sidekiq workers, is sidekiq-throttler.
Some code snippets:
# app/workers/update_price_worker.rb
# Actual Worker class
class UpdatePriceWorker
  include Sidekiq::Worker
  sidekiq_options throttle: { threshold: 720, period: 1.hour }

  def perform(item_id)
    Item.find(item_id).update_price!
  end
end

# app/workers/update_price_master_worker.rb
# Master worker that loops over items
class UpdatePriceMasterWorker
  include Sidekiq::Worker

  def perform
    Item.find_each { |item| UpdatePriceWorker.perform_async item.id }
  end
end
# config/sidekiq.yml
:schedule:
  update_price:
    cron: '0 */4 * * *' # Runs once per 4 hours - depends on how many Items are there
    class: UpdatePriceMasterWorker
The idea of this setup: we run the master worker every 4 hours (this depends on how long it takes to update all items). The master worker creates one job per item to update that item's price. UpdatePriceWorker is throttled to at most 720 requests per hour, i.e. one every 5 seconds on average.
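For intuition about the numbers: 720 jobs per hour is 3600 / 720 = 5 seconds per job on average, the same pace as the `sleep 5` in the question. The fixed-window throttle below is a plain-Ruby illustration of a threshold-per-period limit; it is not how sidekiq-throttler is implemented, and `FixedWindowThrottle` is a hypothetical class. The clock is injectable so the behaviour can be checked without waiting.

```ruby
# Plain-Ruby sketch of a fixed-window throttle: at most `threshold` jobs
# per `period` seconds. FixedWindowThrottle is hypothetical and is NOT the
# sidekiq-throttler implementation.
class FixedWindowThrottle
  def initialize(threshold:, period:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @threshold = threshold
    @period = period
    @clock = clock
    @window_start = @clock.call
    @count = 0
  end

  # Returns true if a job may run now, false if it should be delayed.
  def allow?
    now = @clock.call
    if now - @window_start >= @period
      # A new window has started: reset the counter.
      @window_start = now
      @count = 0
    end
    return false if @count >= @threshold
    @count += 1
    true
  end
end

puts 3600.0 / 720 # average seconds between jobs at 720 per hour
```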
In a similar case, I use rails runner supervised by the god gem or Kubernetes (k8s).
rails runner runs in a separate process, so we do not have to worry about connection leaks or thread safety.
The god gem or k8s handles concurrency and monitors the job for failures. Running one process with a fixed sleep time keeps us within the third-party API's rate limits (running N processes with N API keys could speed things up).
I think deadlock could happen in any concurrent setup.
I do not think this loop + sleep approach is a design flaw, because:
cron always starts jobs on schedule, so long-running jobs can end up running simultaneously, and you need extra logic to avoid overlapping jobs. A plain loop + sleep keeps maximum throughput without any job overlap.
ActiveJob is good for one-shot, long-running tasks, but it is not a good fit for a daemon.
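A sketch of such a loop, meant to run in its own process (for example started with rails runner and supervised by god or k8s). It is plain Ruby: `PriceLoop` is hypothetical, a failing item is reported through `on_error` and skipped instead of killing the loop, and TERM/INT ask it to stop after the current item. The sleeper is injectable so it can be exercised without waiting.

```ruby
# Plain-Ruby sketch of a single-threaded "loop + sleep" updater that runs in
# its own process. PriceLoop, on_error and the item source are hypothetical.
class PriceLoop
  def initialize(interval:, sleeper: ->(seconds) { sleep(seconds) },
                 on_error: ->(item, error) { warn "update failed for #{item.inspect}: #{error.message}" })
    @interval = interval
    @sleeper = sleeper
    @on_error = on_error
    @stopping = false
  end

  # Safe to call from a signal handler: it only flips a flag.
  def stop!
    @stopping = true
  end

  # Processes items one at a time, forever (or max_passes times).
  # Returns the number of passes made; a pass interrupted by stop! still counts.
  def run(items, max_passes: nil)
    passes = 0
    until @stopping || (max_passes && passes >= max_passes)
      items.each do |item|
        break if @stopping
        begin
          yield item
        rescue StandardError => e
          @on_error.call(item, e) # report and move on; one bad item must not kill the loop
        end
        @sleeper.call(@interval) # pause between requests to respect the API's rate limit
      end
      passes += 1
    end
    passes
  end
end

updater = PriceLoop.new(interval: 5)
%w[TERM INT].each { |signal| Signal.trap(signal) { updater.stop! } }
# In the real script (hypothetical): updater.run(Item.find_each) { |item| item.update_price! }
```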
Can loading many models in a Sidekiq worker cause a memory leak? Do they get garbage collected?
For example:
class Worker
include Sidekiq::Worker
def perform
Model.find_each do |item|
end
end
end
Can using ActiveRecord::Base.connection inside a worker cause problems, or is the connection closed automatically?
I think you are running into a problem that I also had with a "worker" - the actual problem was the code, not Sidekiq in any way, shape or form.
In my problematic code, I thoughtlessly just loaded up a boatload of models with a big, fat, greedy query (hundreds of thousands of instances).
I fixed my worker/code quite simply. In my case, I switched my DB call from all to find_in_batches, with a smaller number of records pulled per batch.
Model.find_in_batches(batch_size: 100) do |records|
  # ... I prefer find_in_batches here; each batch arrives as an Array of up to 100 records
  # ... (find_each also accepts batch_size:, but yields one record at a time)
  # ... other programming stuff
end
As soon as I did this, a job that used to bring down Sidekiq after a while (by running the box out of memory) has run with find_in_batches for 5 months without my ever having to restart Sidekiq... OK, I may have restarted Sidekiq a few times in the last 5 months when I deployed or did maintenance :), but not because of the worker!
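Why batching helps: memory use is bounded by the batch size rather than by the table size. The plain-Ruby sketch below uses a lazy Enumerator as a hypothetical stand-in for the database (`fetch_records` is not an ActiveRecord method) and `each_slice(100)` in the role of find_in_batches(batch_size: 100), so only one batch of 100 records is held at a time.

```ruby
# Plain-Ruby sketch of batched processing. fetch_records is a hypothetical
# stand-in for a database query that produces rows one at a time.
def fetch_records(total)
  Enumerator.new do |yielder|
    total.times { |i| yielder << { id: i + 1 } }
  end
end

largest_batch = 0
processed = 0
# each_slice pulls rows lazily and yields them in arrays of at most 100,
# so the full result set is never materialized at once.
fetch_records(1_050).each_slice(100) do |batch|
  largest_batch = [largest_batch, batch.size].max
  processed += batch.size # ... process the batch here ...
end
puts "processed #{processed} records, largest batch #{largest_batch}"
```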
I have a Rails app that sends multiple requests, both sequentially and in parallel, to a third-party API and does calculations in the backend.
I would like to know how long each of my API requests and calculations takes. Is there a performance testing gem I should use?
Note: my app uses Sidekiq to process backend jobs.
http://guides.rubyonrails.org/performance_testing.html might get you started; check out section 3 for details on wrapping methods in "benchmark", which outputs some useful stats to the log.
As a quick example:
def process
  Benchmark.bm do |x|
    x.report("Processing Task") do
      process_task(task_options)
    end
  end
end
would output something like:
                     user     system      total        real
Processing Task  8.206000   1.092000   9.298000 ( 14.609000)
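If you want per-step numbers without a gem, one option is to time each labelled step with Ruby's monotonic clock, which is unaffected by system clock adjustments. In the sketch below `timed` and `TIMINGS` are hypothetical helpers, and the sleep and sum stand in for an API request and a calculation; the same wrapper could be called inside a Sidekiq job's perform and the numbers logged.

```ruby
# Plain-Ruby sketch: time each labelled step with the monotonic clock.
# TIMINGS and timed are hypothetical helpers, not part of Rails or Sidekiq.
TIMINGS = Hash.new { |hash, label| hash[label] = [] }

# Runs the block, records its duration in seconds under `label`
# (even if the block raises), and returns the block's value.
def timed(label)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  TIMINGS[label] << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

response = timed("third-party request") { sleep 0.02; :response } # stand-in for an HTTP call
timed("calculation") { (1..100_000).sum }                         # stand-in for backend work

TIMINGS.each do |label, durations|
  puts format("%-20s calls=%d total=%.4fs", label, durations.size, durations.sum)
end
```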