I have a newsletter that I send out to my customers (~10k emails) every morning, and it sometimes happens that this Sidekiq job takes up so much CPU/memory that the website (a Rails app) stops responding and faces blackouts.
When I look at the Sidekiq dashboard, I see there is some problem with the newsletter (probably an invalid email address that Sidekiq keeps retrying?) and it's stuck.
How do I prevent this behavior and stop the Sidekiq task from repeating (which I believe is the cause of the outage)?
Here's my code:
rake task:
namespace :mailer do
  desc "Carrier blast - morning"
  task :newsletter_morning => [:environment] do
    NewslettertJob.perform_later
  end
end
job definition:
class NewslettertJob < ApplicationJob
  def perform
    ...
    NewsletterMailer.morning_blast(data).deliver_now
  end
end
and NewsletterMailer:
class NewsletterMailer < ApplicationMailer
  def morning_blast(data)
    ...
    customers.each do |customer|
      if customer.email.blank?
        yield customer, nil
        next
      end
      begin
        Retryable.retryable(tries: 1, sleep: 30, on: [Net::OpenTimeout, Net::SMTPAuthenticationError, Net::SMTPServerBusy]) do
          send_email(customer.email).deliver
        end
      rescue Net::SMTPSyntaxError => e
        error_msg = "Newsletter sending failed on #{Time.now} with: #{e.message}. e.inspect: #{e.inspect}"
        logger.warn error_msg
        yield customer, nil
        next
      end
    end
  end
end
What I want to achieve is that the newsletter is sent out every morning, and if Rails/Sidekiq runs into a problem, it simply shuts itself down so that the newsletter does not affect the "life" of the main website (its server).
Thank you in advance for any advice. I have been stuck on this issue for a while now.
If your machine only has one core, Sidekiq and puma will fight for CPU. Lower Sidekiq's concurrency so it uses less CPU, or get a machine with multiple cores, or move Sidekiq to a different machine.
If a Sidekiq process is using 100% of a core, lower the concurrency setting. The default in Sidekiq 6.0 is 10, which is a good default but if you are just delivering emails you could probably bump that to 20. You can run multiple Sidekiq processes if you wish to utilize multiple cores to process jobs faster.
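For example, a minimal config/sidekiq.yml sketch for capping concurrency (the value here is only an example; tune it to your workload and core count):
# config/sidekiq.yml -- example value only
:concurrency: 5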
Ideally, you should separate your background task servers from your web servers, so that background processing doesn't impact the performance of the web server. I work for a very high-traffic, high-load company, and that is the architecture we use.
There are explanations on how to stop retries in this answer: Disable automatic retry with ActiveJob, used with Sidekiq
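If you are on ActiveJob (Rails 5.1+), one sketch of that idea is to discard the job on failure instead of letting it be retried; the error class and logging here are illustrative, not your actual code:
class NewslettertJob < ApplicationJob
  # Drop the job on error instead of re-enqueueing it over and over.
  discard_on(StandardError) do |job, error|
    Rails.logger.warn "Newsletter job discarded: #{error.message}"
  end
  ...
end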
Another thing: your e-mail sending is done synchronously (.deliver). That makes your task one huge monolithic process over many customers, with a large impact on memory. Instead, you could use deliver_later, so each customer gets its own small job. This will also help alleviate CPU and memory usage. You could even create a worker that sends the e-mail for a single customer, and use your monolithic job merely to dispatch those.
class NewslettertJob < ApplicationJob
  def perform
    ...
    customers.each do |customer|
      NewsletterMailer.morning_blast(customer, data).deliver_later if customer.email.present?
    end
  end
end
However, I think the silver bullet is separating your Sidekiq server from your web server - having one server dedicated to background tasks. On your web server, you don't even start the Sidekiq processes.
I have a piece of code that performs the same queries over and over, and it's doing that in a background worker within a thread.
I checked out the ActiveRecord query cache middleware, but apparently it needs to be enabled before use. However, I'm not sure whether it's a safe thing to do and whether it will affect other running threads.
you can see the tests here: https://github.com/rails/rails/blob/3e36db4406beea32772b1db1e9a16cc1e8aea14c/activerecord/test/cases/query_cache_test.rb#L19
My question is: can I borrow and/or use the middleware directly to enable the query cache for the duration of a block, safely, in a thread?
When I tried wrapping the code in ActiveRecord::Base.cache do ... end, my CI started failing left and right...
EDIT: Rails 5 and later: the ActiveRecord query cache is automatically enabled even for background jobs like Sidekiq (see: https://github.com/mperham/sidekiq/wiki/Problems-and-Troubleshooting#activerecord-query-cache for information on how to disable it).
Rails 4.x and earlier:
The difficulty with applying ActiveRecord::QueryCache to your Sidekiq workers is that, aside from the implementation details of it being a middleware, it's meant to be built during the request and destroyed at the end of it. Since background jobs don't have a request, you need to be careful about when you clear the cache. A reasonable approach would be to cache only for the duration of the perform method.
So, to implement that, you'll probably need to write your own piece of Sidekiq middleware, based on ActiveRecord::QueryCache but following Sidekiq's middleware guide. E.g.,
class SidekiqQueryCacheMiddleware
  def call(worker, job, queue)
    connection = ActiveRecord::Base.connection
    enabled = connection.query_cache_enabled
    connection_id = ActiveRecord::Base.connection_id
    connection.enable_query_cache!

    yield # run the job with the query cache enabled
  ensure
    # Restore the connection and cache state, whether or not the job raised.
    ActiveRecord::Base.connection_id = connection_id
    ActiveRecord::Base.connection.clear_query_cache
    ActiveRecord::Base.connection.disable_query_cache! unless enabled
  end
end
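To wire it in, register the middleware in Sidekiq's server middleware chain, e.g. in an initializer (the file path is an assumption):
# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add SidekiqQueryCacheMiddleware
  end
end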
I am using Unicorn as my app server for my Rails app, and am trying to figure out why there is sometimes a non-trivial (> 5 seconds) delay between the start of a request and when it reaches my controller.
This is what my production.log prints out:
Started GET "/search/articles.json?q=mashable.com" for 138.7.7.33 at 2015-07-23 14:59:19 -0400
Parameters: {"q"=>"mashable.com"}
Searching articles for keyword: mashable.com, format: json, Time: 2015-07-23 14:59:26 -0400
Notice how there is a 7-second delay between Started GET and "Searching articles for keyword", which is the first thing the controller method does.
articles.json is routed to my controller method "articles" which simply does this for now:
def articles
  format = params[:format]
  keyword = params["q"]
  Rails.logger.info "Searching articles for keyword: #{keyword}, format: #{format}, Time: #{Time.now.to_s}"
end
This is my routes.rb
MyApp::Application.routes.draw do
  match '/search/articles' => 'search#articles'
  # more routes here, but articles is the first route
end
What could possibly cause this delay? Is it because a Unicorn worker is busy? Is it because a Unicorn worker is taking up too much memory, which leads the system to be slow?
Note: I don't believe the delay is in making any database connections, but I could be wrong. The code doesn't need to make a database call, the max connections for my database is 1000, and there are usually at most 1-2 connections.
Three thoughts:
You'll probably be better served using Puma instead of Unicorn
It could be that your system is running out of memory, or it could have plenty of memory available: install New Relic to troubleshoot where the bottleneck is
It could also be that you have more Unicorn instances than the number of connections your DB allows, in which case the instance is having to wait for others to disconnect before it can connect. This would likely manifest itself with irregular 5-second delays rather than happening every time.
Actually, it might be caused by a before_filter callback; you should check that.
I think it can be caused by a lack of memory and thus frequent garbage collection, which freezes the whole system.
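If you suspect GC, one quick sanity check (an illustrative sketch, not production code) is to compare GC counters before and after the slow section and see whether collections line up with the stalls:
before = GC.stat[:count] # total GC runs so far
# ... run the slow code path ...
after = GC.stat[:count]
Rails.logger.info "GC runs during the slow section: #{after - before}"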
If it's a production problem it could be caused by slow clients sending requests. New Relic and Monit are good options. You could consider sending signals to Unicorn workers to restart them to better understand the problem.
You could also try adding preload_app true in your Unicorn config to speed up the startup time of worker processes.
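For example, a minimal config/unicorn.rb sketch (the values are assumptions; tune them for your app):
# config/unicorn.rb -- example values only
worker_processes 4   # number of worker processes
preload_app true     # load the app once in the master, then fork workers
timeout 30           # kill workers stuck longer than 30s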
I have a script called 'worker.rb'. When run, this script will perform processing for a while (let's say an hour) and then die.
I need another script which will be responsible for spawning the worker script above. Let's call this script 'runner.rb'. 'runner.rb' will be called with an argument dictating how many workers it is allowed to spawn.
I'd like runner.rb to do the following: (e.g. 'ruby runner.rb 5')
- Query the database for specific values (e.g. got 100 values)
- Spawn 5 instances of 'worker.rb' (passing the first 5 values respectively)
- Keep checking for any of the instances of 'worker.rb' spawned above to finish and then call 'worker.rb' again with the 6th value from the database and continue this process indefinitely.
I'm using the Daemons gem but am lost as to the best way to go about this. The 'runner' script should definitely be daemonized - but should 'worker' also be daemonized?
How should 'runner' go about checking if 'worker' has finished or not? Can this be done using a PID stored in a file?
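For what it's worth, the spawn-and-refill loop itself can be sketched with plain Process APIs and no daemonization (fetch_values_from_db is a hypothetical helper, and worker.rb is assumed to take its value as ARGV[0]):
max_workers = Integer(ARGV[0] || 5)
values = fetch_values_from_db # hypothetical: the values queried from the database
pids = {}

values.each do |value|
  # All slots busy? Block until any child exits, then free its slot.
  if pids.size >= max_workers
    finished_pid = Process.wait
    pids.delete(finished_pid)
  end
  pids[Process.spawn("ruby", "worker.rb", value.to_s)] = value
end

Process.waitall # wait for the remaining workers to finish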
I used the Daemons gem before, but somehow it didn't do well at keeping the number of child processes constant. So I made another one, called light_daemon. You can have light_daemon prefork a certain number of worker processes. If one of the workers dies for any reason, light_daemon will spawn a new one to replace it. If your worker process might leak memory, you can have the worker actively die before it gets too big. The parent process keeps the number of worker processes constant. I used it on the production site of one of my projects, and it worked pretty well.
The following is an example daemon using the light-daemon gem.
require 'rubygems'
require 'light_daemon'

class Client
  def initialize
    @count = 0
  end

  def call
    `echo "process: #{Process.pid}" >> /tmp/light-daemon.txt`
    sleep 3
    @count += 1
    (@count < 100) ? true : false
  end
end

LightDaemon::Daemon.start(Client.new, :children => 2, :pid_file => "/tmp/light-daemon.pid")
In this daemon, the worker process dies after the call method has been invoked 100 times. Then a new worker process is spawned and the cycle continues.
Running a rails site right now using SQLite3.
About once every 500 requests or so, I get a
ActiveRecord::StatementInvalid (SQLite3::BusyException: database is locked:...
What's the way to fix this that would be minimally invasive to my code?
I'm using SQLite at the moment because you can store the DB in source control, which makes backing up natural, and you can push changes out very quickly. However, it's obviously not really set up for concurrent access. I'll migrate over to MySQL tomorrow morning.
You mentioned that this is a Rails site. Rails allows you to set the SQLite retry timeout in your database.yml config file:
production:
  adapter: sqlite3
  database: db/mysite_prod.sqlite3
  timeout: 10000
The timeout value is specified in milliseconds. Increasing it to 10 or 15 seconds should decrease the number of BusyExceptions you see in your log.
This is just a temporary solution, though. If your site needs true concurrency then you will have to migrate to another db engine.
By default, SQLite returns immediately with a blocked/busy error if the database is busy and locked. You can ask it to wait and keep trying for a while before giving up. This usually fixes the problem, unless you have thousands of threads accessing your db, in which case I agree SQLite would be inappropriate.
// set SQLite to wait and retry for up to 100ms if database locked
sqlite3_busy_timeout( db, 100 );
All of these things are true, but they don't answer the question, which is likely: why does my Rails app occasionally raise a SQLite3::BusyException in production?
@Shalmanese: what is the production hosting environment like? Is it on a shared host? Is the directory that contains the sqlite database on an NFS share? (Likely, on a shared host.)
This problem likely has to do with the phenomenon of file locking with NFS shares and SQLite's lack of concurrency.
If you have this issue but increasing the timeout does not change anything, you might have another concurrency issue with transactions. Here it is in summary:
Begin a transaction (acquires a SHARED lock)
Read some data from the DB (we are still using the SHARED lock)
Meanwhile, another process starts a transaction and writes data (acquiring the RESERVED lock).
Then you try to write; you are now trying to request the RESERVED lock.
SQLite raises the SQLITE_BUSY exception immediately (independently of your timeout) because your previous reads may no longer be accurate by the time it can get the RESERVED lock.
One way to fix this is to patch the ActiveRecord sqlite adapter to acquire the RESERVED lock directly at the beginning of the transaction, by passing the :immediate option to the driver. This will decrease performance a bit, but at least all your transactions will honor your timeout and occur one after the other. Here is how to do this using prepend (Ruby 2.0+); put this in an initializer:
module SqliteTransactionFix
  def begin_db_transaction
    log('begin immediate transaction', nil) { @connection.transaction(:immediate) }
  end
end

module ActiveRecord
  module ConnectionAdapters
    class SQLiteAdapter < AbstractAdapter
      prepend SqliteTransactionFix
    end
  end
end
Read more here: https://rails.lighthouseapp.com/projects/8994/tickets/5941-sqlite3busyexceptions-are-raised-immediately-in-some-cases-despite-setting-sqlite3_busy_timeout
Just for the record: in one application with Rails 2.3.8 we found out that Rails was ignoring the "timeout" option Rifkin Habsburg suggested.
After some more investigation we found a possibly related bug in Rails dev: http://dev.rubyonrails.org/ticket/8811. And after some more investigation we found the solution (tested with Rails 2.3.8):
Edit this ActiveRecord file: activerecord-2.3.8/lib/active_record/connection_adapters/sqlite_adapter.rb
Replace this:
def begin_db_transaction #:nodoc:
  catch_schema_changes { @connection.transaction }
end
with
def begin_db_transaction #:nodoc:
  catch_schema_changes { @connection.transaction(:immediate) }
end
And that's all! We haven't noticed a performance drop, and now the app supports many more requests without breaking (it waits for the timeout). Sqlite is nice!
bundle exec rake db:reset
It worked for me; it will reset the database and show the pending migrations.
SQLite can make other processes wait until the current one has finished.
I use this line to connect when I know I may have multiple processes trying to access the Sqlite DB:
conn = sqlite3.connect('filename', isolation_level = 'exclusive')
According to the Python Sqlite Documentation:
You can control which kind of BEGIN statements pysqlite implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
I had a similar problem with rake db:migrate. The issue was that the working directory was on an SMB share.
I fixed it by copying the folder over to my local machine.
Most answers are for Rails rather than raw Ruby, and the OP's question IS for Rails, which is fine. :)
So I just want to leave this solution over here in case any raw Ruby user has this problem and is not using a YAML configuration.
After instantiating the connection, you can set it like this:
db = SQLite3::Database.new "#{path_to_your_db}/your_file.db"
db.busy_timeout = 15000 # in ms: it will retry for 15 seconds before raising an exception.
                        # This can be any number you want. The default value is 0.
Source: this link
-- Open the database
db = sqlite3.open("filename")

-- Ten attempts are made to proceed if the database is locked
function my_busy_handler(attempts_made)
  if attempts_made < 10 then
    return true
  else
    return false
  end
end

-- Set the new busy handler
db:set_busy_handler(my_busy_handler)

-- Use the database
db:exec(...)
What table is being accessed when the lock is encountered?
Do you have long-running transactions?
Can you figure out which requests were still being processed when the lock was encountered?
Argh - the bane of my existence over the last week. SQLite3 locks the db file when any process writes to the database, i.e. any UPDATE/INSERT type query (and also SELECT COUNT(*) for some reason). However, it handles multiple reads just fine.
So, I finally got frustrated enough to write my own thread-locking code around the database calls. By ensuring that the application can only have one thread writing to the database at any point, I was able to scale to thousands of threads.
And yeah, it's slow as hell. But it's also fast enough and correct, which is a nice property to have.
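A minimal sketch of that approach (names are illustrative, not the poster's actual code): serialize every write behind one process-wide mutex.
DB_WRITE_LOCK = Mutex.new

# Wrap every write in this helper so only one thread writes at a time.
def with_db_write_lock(&block)
  DB_WRITE_LOCK.synchronize(&block)
end

with_db_write_lock do
  Article.create!(title: "hello") # any UPDATE/INSERT-type work goes here
end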
I found a deadlock in the sqlite3-ruby extension and fixed it here; have a go with it and see if it fixes your problem:
https://github.com/dxj19831029/sqlite3-ruby
I opened a pull request, but there has been no response from the maintainers.
Anyway, some busy exceptions are expected, as described by SQLite itself.
Be aware of this condition (from the SQLite documentation on busy handlers):
The presence of a busy handler does not guarantee that it will be invoked when there is lock contention. If SQLite determines that invoking the busy handler could result in a deadlock, it will go ahead and return SQLITE_BUSY or SQLITE_IOERR_BLOCKED instead of invoking the busy handler. Consider a scenario where one process is holding a read lock that it is trying to promote to a reserved lock and a second process is holding a reserved lock that it is trying to promote to an exclusive lock. The first process cannot proceed because it is blocked by the second and the second process cannot proceed because it is blocked by the first. If both processes invoke the busy handlers, neither will make any progress. Therefore, SQLite returns SQLITE_BUSY for the first process, hoping that this will induce the first process to release its read lock and allow the second process to proceed.
If you meet this condition, the timeout isn't honored anymore. To avoid it, don't put a SELECT inside BEGIN/COMMIT, or use an exclusive lock for BEGIN/COMMIT.
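With the sqlite3 gem, taking the RESERVED lock up front looks roughly like this (the table and file names are made up for illustration):
require 'sqlite3'

db = SQLite3::Database.new("db/example.sqlite3")
# :immediate acquires the RESERVED lock at BEGIN, so the read and the
# write below run under one lock and honor the busy timeout.
db.transaction(:immediate) do
  count = db.get_first_value("SELECT COUNT(*) FROM articles")
  db.execute("INSERT INTO stats (article_count) VALUES (?)", count)
end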
Hope this helps. :)
This is often a consequence of multiple processes accessing the same database, e.g. when the "allow only one instance" flag was not set in RubyMine.
Try running the following, it may help:
ActiveRecord::Base.connection.execute("BEGIN TRANSACTION; END;")
From: Ruby: SQLite3::BusyException: database is locked:
This may clear up any transaction holding up the system.
I believe this happens when a transaction times out. You really should be using a "real" database. Something like Drizzle, or MySQL. Any reason why you prefer SQLite over the two prior options?