Run Sidekiq workers without database connection - ruby-on-rails

Each Sidekiq worker (thread) requires 1 connection to the database. PostgreSQL can handle at most a few hundred connections. This is a bottleneck for scalability.
Since I need about a thousand workers and they don't require PostgreSQL (I can pass all the data I need through Redis and remove the SQL), I am wondering whether it's possible to start the Rails environment without connections to PostgreSQL.
How can I start Sidekiq workers without PostgreSQL?
Note that I still need PostgreSQL for the normal web app/backend, so I cannot remove ActiveRecord from the Rails app altogether.

If a thread doesn't use the database, it won't take a connection. This assumption is false:
"Each Sidekiq worker (thread) requires 1 connection to the database."
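To make that concrete, here is a toy model of a pool whose connections are only checked out when a thread actually asks for one. It is not ActiveRecord itself; ToyPool and run_jobs are hypothetical names, and the thread counts are arbitrary. It only illustrates why many threads can sit behind a small pool as long as most of them never touch the database.

```ruby
# Toy model of a lazily used connection pool. This is NOT ActiveRecord;
# ToyPool and run_jobs are hypothetical names used only for illustration.
class ToyPool
  attr_reader :peak_in_use

  def initialize(size)
    @size = size
    @in_use = 0
    @peak_in_use = 0
    @mutex = Mutex.new
    @freed = ConditionVariable.new
  end

  # Hold a connection only while the block runs, then return it.
  def with_connection
    checkout
    begin
      yield
    ensure
      checkin
    end
  end

  private

  def checkout
    @mutex.synchronize do
      # Block until a connection is free, as a real pool would.
      @freed.wait(@mutex) while @in_use >= @size
      @in_use += 1
      @peak_in_use = @in_use if @in_use > @peak_in_use
    end
  end

  def checkin
    @mutex.synchronize do
      @in_use -= 1
      @freed.signal
    end
  end
end

# Run `thread_count` job threads; only every `db_every`-th job touches the pool.
def run_jobs(pool, thread_count:, db_every:)
  threads = Array.new(thread_count) do |i|
    Thread.new do
      if (i % db_every).zero?
        pool.with_connection { sleep 0.001 } # simulated query
      else
        i * 2 # simulated Redis-only work: never asks the pool for anything
      end
    end
  end
  threads.each(&:join)
  pool.peak_in_use
end

pool = ToyPool.new(5)
peak = run_jobs(pool, thread_count: 100, db_every: 10)
puts "100 threads, peak connections in use: #{peak}"
```

The peak can never exceed the pool size, and threads that skip the database never count against it.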

Related

Is there a way to start a Rails puma webserver within rails without a db connection?

I have a microservice that is taking in webhooks to process but it is currently getting pounded by the sender of said webhooks. Right now I am taking them and inserting the webhooks into the db for processing but the data is so bursty at times that I don't have enough bandwidth to manage the flood of requests and I cannot scale anymore as I'm out of db connections. The current thought is to just take the webhooks and throw them into a Kafka queue for processing; using Kafka I can scale up the number of frontend workers to whatever I need to handle the deluge of requests and I have the replayability of Kafka. By throwing the webhooks into Kafka, the frontend web server no longer needs a pool of db connections as it literally is just taking the request and throwing into the queue for processing. Does anyone have any knowledge on removing the db connectivity from Puma or have an alternative to do what's being asked?
Currently running
ruby 2.6.3
rails 6.0.1
puma 3.11
I ended up using Puma's before_fork and on_worker_boot hooks in the Puma config so that those particular web workers do not re-establish the database connection.
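The exact config was not posted, so the following is only a sketch of what such a Puma config might look like. The worker count is an assumed value, and the sketch relies on the app never touching a model in these workers, because ActiveRecord would otherwise connect lazily on first use.

```ruby
# config/puma.rb (sketch; the worker count is an assumed value)
workers 4
preload_app!

before_fork do
  # Drop any connections the master opened while preloading the app,
  # so forked workers do not inherit open database sockets.
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
end

on_worker_boot do
  # Intentionally empty: the usual ActiveRecord::Base.establish_connection
  # call is left out so these workers do not reconnect on boot.
end
```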

Ensure max database connections is not exceeded with Rails and Sidekiq

I am using Nginx with Phusion Passenger and a single-threaded Rails application. Here's the catch: within that application, I am using multi-threaded Sidekiq to perform some background jobs. Typically, in my database.yml, I would only need to set the pool value to 1. Here's an example:
default: &default
  adapter: mysql2
  encoding: utf8
  collation: utf8_unicode_ci
  pool: 1
  username: username
  password: password
  host: localhost
The reason is that, for each TCP connection opened, when an HTTP request comes in through that socket, Nginx takes the request and passes it to Passenger. Passenger detects that it's a Rails app and spawns a Rails instance, which renders the response as HTML and sends it back to Nginx, which then passes it on to the client (browser). So with a single-threaded Rails app, each Passenger instance needs only one database connection.
But in my sidekiq.yml, I have set concurrency to 5:
:concurrency: 5
This means for each passenger rack instance, I will have 5 concurrent threads handled by sidekiq plus the one connection for the main app, that is a total of 6 database connections for one passenger instance.
When I look at passenger-status, I notice that max_pool_size is set to 6:
----------- General information -----------
Max pool size : 6
So does that mean passenger will never spawn more than 6 Rails instances concurrently? And if that's the case, does that mean my math is correct: 6 (instances) * 6 (database connections: 5 for sidekiq and 1 for main app) = 36 (total database connections possible for my rails app to handle concurrently).
Right now my mysql database is configured to handle 151 max concurrent connections.
SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 151   |
+-----------------+-------+
I just want to make sure my math is correct regarding passenger, rails and sidekiq.
First of all, your Sidekiq processes and your web server (in your case, Passenger) are separate. Passenger's thread pool size has no effect on your Sidekiq concurrency; instead, your Sidekiq configuration specifies a separate concurrency. So, we'll consider the two separately:
Passenger
The ActiveRecord database pool value is the number of database connections that your web process will use, in total across all threads. If your Passenger server is set up in multi-process mode, then your max connections from your web processes is db pool size * passenger pool size. On the other hand, if you set it up in multi-threaded mode (which I'd recommend if possible), your max connections is just db pool size (multiplied by however many processes are running; Puma, for example, runs by default two processes with up to fifteen threads or so, so the max connections in that case would be 30).
So, if you're using multi-threaded mode, a pool size of 1 is absolutely not sufficient -- you'll want at least as big a pool as you expect to have threads. In multi-process mode, 1 might work but I doubt it's really worth straying from the default of 5, until you encounter issues.
Sidekiq
Sidekiq always runs in multi-threaded mode (you can technically run multiple processes as well, but I'll assume you aren't). So, like above, you want your connection pool to be at least as big as the number of threads. This might mean that you technically need two different values for your db pool value depending on whether the Rails env is spinning up for Passenger, or for Sidekiq -- see this issue on the Sidekiq repo or this helpful Heroku guide for more information on how to address that.
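One way to get two different pool values is a small helper that picks the size per process type. This is only a sketch: db_pool_size, the environment variable names, and the fallback of 5 are assumptions made for illustration, not something either linked resource prescribes.

```ruby
# Hypothetical helper: choose the ActiveRecord pool size for the current
# process type. The env var names and the fallback of 5 are assumptions.
def db_pool_size(sidekiq_server:, env: ENV)
  if sidekiq_server
    # A Sidekiq process wants one connection per worker thread.
    Integer(env.fetch("SIDEKIQ_CONCURRENCY", 5))
  else
    # A web process wants one connection per request thread.
    Integer(env.fetch("WEB_THREADS", 5))
  end
end

puts db_pool_size(sidekiq_server: true,  env: { "SIDEKIQ_CONCURRENCY" => "25" })
puts db_pool_size(sidekiq_server: false, env: { "WEB_THREADS" => "1" })
```

In a Rails app the flag could come from Sidekiq.server?, and the result could be interpolated into database.yml through ERB.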
In summary
Don't forget that, aside from all the above, you may easily have multiple servers all running the same Rails app, but only one database with one connection limit. If you're running Passenger in multi-process mode with a max of 6 processes and set your db pool size to 5, then each web server node will use up to 30 connections. If a node is also running a Sidekiq process with a concurrency of 5, add 5 to that. You will probably not need more than one Sidekiq server, so 4 web nodes at 30 connections each + one Sidekiq process at 5 connections = 125 maximum connections, well within your MySQL connection limit.
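The same estimate as a small calculation, using the example numbers from this answer. max_db_connections is a hypothetical helper name.

```ruby
# Worked version of the estimate above; the inputs are this answer's example numbers.
def max_db_connections(web_nodes:, processes_per_node:, pool_per_process:,
                       sidekiq_processes:, sidekiq_concurrency:)
  web = web_nodes * processes_per_node * pool_per_process
  background = sidekiq_processes * sidekiq_concurrency
  web + background
end

total = max_db_connections(web_nodes: 4, processes_per_node: 6, pool_per_process: 5,
                           sidekiq_processes: 1, sidekiq_concurrency: 5)
mysql_max_connections = 151 # from SHOW VARIABLES in the question
puts "#{total} of #{mysql_max_connections} connections" # 125 of 151 connections
```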
I reviewed the Passenger documentation again, and while the answer above answers the question, I want to add a little more detail:
HTTP client via TCP sends a request to Nginx
Phusion Passenger loaded into Nginx checks if request should be handled by Passenger. If so, request is sent to Passenger Core.
Passenger core, using load balancing rules, determines which process a request should be forwarded to.
Passenger core also takes care of application spawning: if it determines that having more application processes is necessary or beneficial, then it will make that happen subject to user-configured limits: the core will never spawn more processes than a user-configured maximum.
Passenger core also has monitoring and statistics: passenger-memory-stats and passenger-status
Passenger core restarts an application process if it crashes.
UstRouter sits idle and does not consume resources if you did not configure it to send data to Union Station, a monitoring web service
Watchdog monitors Passenger Core and UstRouter. If either of them crash, they are restarted by the Watchdog.
passenger-memory-stats will verify the three aforementioned processes as well as the spawned rack apps:
------ Passenger processes ------
PID VMSize Private Name
---------------------------------
18355 419.1 MB ? Passenger watchdog
18358 1096.5 MB ? Passenger core
18363 427.2 MB ? Passenger ust-router
18700 818.9 MB 256.2 MB Passenger RubyApp: myapp_rack_rails
24783 686.9 MB 180.2 MB Passenger RubyApp: myapp_rack_rails
passenger-status shows that the max_pool_size is 6. That is, at most there will be 6 rack apps spawned by Passenger Core:
----------- General information -----------
Max pool size : 6
App groups : 2
Processes : 3
As stated in another answer, the ActiveRecord database pool value is the number of database connections that your web process will use, in total across all threads.
But since I am using the free Passenger server, which is set up in multi-process mode, then my max connections from my web processes is db pool size * passenger pool size. So since Passenger pool size is 6, and if my db pool size is 1, that is 6 * 1 = 6. That will be 6 maximum database connections.
Sidekiq always runs in multi-threaded mode.
If someone wants to use sidekiq they must configure the number of threads they want to run on or use the default (25). If they are using a database (likely) then to not hit a connection timeout error they will need to have at least as many connections in their database pool as sidekiq threads. Currently they must configure these two values in two different places, database pool in database.yml for ActiveRecord, and sidekiq connections either via command line or the sidekiq yml file. This is a problem as it is difficult to remember when you are modifying one value that you need to modify both.
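Since the two values live in different files, a small check can at least flag when they drift apart. The sketch below inlines example file contents as strings (the values are made up), and pool_covers_concurrency? is a hypothetical helper name. It assumes a pool default of 5 when database.yml omits the key, and the Sidekiq default of 25 quoted above.

```ruby
require "yaml"

# Example file contents inlined as strings; in an app they would be read from
# config/sidekiq.yml and config/database.yml (values here are made up).
SIDEKIQ_YML = <<~CONFIG
  :concurrency: 25
CONFIG

DATABASE_YML = <<~CONFIG
  production:
    adapter: mysql2
    pool: 5
CONFIG

# Returns true when the pool has at least one connection per Sidekiq thread.
def pool_covers_concurrency?(sidekiq_yaml, database_yaml, env: "production")
  # sidekiq.yml uses symbol keys (":concurrency"), so Symbol must be permitted.
  sidekiq = YAML.safe_load(sidekiq_yaml, permitted_classes: [Symbol])
  database = YAML.safe_load(database_yaml)
  concurrency = sidekiq.fetch(:concurrency, 25) # 25 is the default quoted above
  pool = database.fetch(env).fetch("pool", 5)   # assumed default when omitted
  pool >= concurrency
end

unless pool_covers_concurrency?(SIDEKIQ_YML, DATABASE_YML)
  warn "database pool is smaller than Sidekiq concurrency"
end
```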

Postgres connection not closing after sidekiq Ruby script

It's a small Ruby script running under Sidekiq. It opens a connection with
db_connect = Sequel.connect(@db_credential, search_path: @namespace)
It never explicitly closes the connection; I think this is not supposed to be necessary?
After the script has been run many times, and all the runs have completed, and the Sidekiq web panel shows no tasks running or queued, Postgres shows 60 Sidekiq connections:
postgres=# select count(*) from pg_stat_activity where application_name like '%sidekiq%';
count
-------
60
(1 row)
The database is on localhost, so nothing else is creating these connections.
psql 9.3.6, Sidekiq 3.3.3, Rails 4.0.0, ruby 2.1.1p76, sequel 4.19.0, Ubuntu 14.04.2 LTS.
You can:
Either use Sequel pooling by connecting only once and keeping the db_connect value across your Sidekiq task executions
Or you can connect every time, but then you have to disconnect manually by calling the disconnect method (http://sequel.jeremyevans.net/rdoc/classes/Sequel/Database.html#method-i-disconnect).
I believe the issue with your current approach is that you're constructing a new connection pool on every Sidekiq task execution by calling Sequel.connect, and these connections keep hanging around. It may take a long time before they're actually garbage collected, if ever.
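The difference between the two options and the leaking pattern can be sketched with a stand-in class that only counts open connections. FakeDatabase, SharedDB, and the job methods are hypothetical; with Sequel the real calls would be Sequel.connect and Sequel::Database#disconnect.

```ruby
# FakeDatabase is a stand-in used only to count open connections; with Sequel
# the real calls would be Sequel.connect and Sequel::Database#disconnect.
class FakeDatabase
  @open = 0
  class << self
    attr_accessor :open
  end

  def self.connect
    self.open += 1
    new
  end

  def disconnect
    self.class.open -= 1
  end
end

# Option 1: connect once per process and reuse the handle across jobs.
# (In threaded code, set this up at boot or guard it with a Mutex.)
module SharedDB
  def self.handle
    @handle ||= FakeDatabase.connect
  end
end

def job_with_shared_handle
  SharedDB.handle # queries would go through the shared handle
  :done
end

# Option 2: connect per job, but always disconnect when the job ends.
def job_with_own_connection
  db = FakeDatabase.connect
  :done
ensure
  db&.disconnect
end

# The pattern from the question: connect per job and never disconnect.
def leaky_job
  FakeDatabase.connect
  :done
end

3.times { job_with_own_connection }
3.times { job_with_shared_handle }
puts "open after well-behaved jobs: #{FakeDatabase.open}"
3.times { leaky_job }
puts "open after leaky jobs: #{FakeDatabase.open}"
```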

How does Redis work with Rails and Sidekiq

Problem: need to send e-mails from Rails asynchronously.
Environment: Windows 7, Ruby 2.0, Rails 4.1, Sidekiq, Redis
After setting everything up, starting Sidekiq and starting Redis, I can see the mail request queued to Redis through the monitor:
1414256204.699674 "exec"
1414256204.710675 "multi"
1414256204.710675 "sadd" "queues" "default"
1414256204.710675 "lpush" "queue:default" "{\"retry\":true,\"queue\":\"default\",\"class\":\"Sidekiq::Extensions::DelayedMailer\",\"args\":[\"---\\n- !ruby/class 'UserMailer'\\n- :async_reminder\\n- - 673\\n\"],\"jid\":\"d4024c0c219201e5d1649c54\",\"enqueued_at\":1414256204.709674}"
But the mailer method never seems to get executed. The mail doesn't get sent and none of the log messages show up.
How does Redis know to execute the job on the queue and does something else need to be setup in the environment for it to know where the application resides?
Is delayed_job a better solution?
I started redis in one window, bundle exec sidekiq in another window, and rails server in a third window.
How does an item on the redis queue get picked up and processed? Is sidekiq both putting things on the redis queue and checking to see if something was added that needs to be processed?
Redis is used just for storage. It stores jobs to be done. It does not execute anything. DelayedJob uses your database for job storage instead of Redis.
Rails process pushes new jobs to Redis.
Sidekiq process pops jobs from Redis and executes them.
In your MONITOR output, you should see LPUSH commands when Rails sends mail. You should also see BRPOP commands from Sidekiq.
You need to make sure that both Rails and Sidekiq processes use the same Redis server, database number, and namespace (if any). It's a frequent problem that they don't.
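As a rough picture of that push/pop flow, here is an in-memory stand-in for the Redis list. ToyRedis, enqueue, and work_one are hypothetical names; real Sidekiq talks to an actual Redis server and uses the blocking BRPOP rather than the plain pop shown here.

```ruby
require "json"

# In-memory stand-in for a Redis list; ToyRedis is hypothetical and only
# mimics push-at-head / pop-at-tail semantics.
class ToyRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  def lpush(key, value)
    @lists[key].unshift(value) # push onto the head
    @lists[key].size
  end

  def rpop(key)
    @lists[key].pop # pop from the tail, so the oldest job comes out first
  end
end

# "Rails side": serialise the job and push it onto the queue.
def enqueue(redis, klass, args)
  redis.lpush("queue:default", JSON.generate({ "class" => klass, "args" => args }))
end

# "Sidekiq side": pop one job, decode it, and hand it to the block.
def work_one(redis)
  payload = redis.rpop("queue:default")
  return nil unless payload

  job = JSON.parse(payload)
  yield job["class"], job["args"]
end

redis = ToyRedis.new
enqueue(redis, "UserMailer", [673])
work_one(redis) { |klass, args| puts "executing #{klass} with #{args.inspect}" }
```

If the two sides used different list keys (the analogue of a different database number or namespace), the pop would simply find nothing, which matches the symptom in the question.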

Rails3 active record pool and Sidekiq multi-thread

I am using Sidekiq with Rails 3. Sidekiq runs 25 threads by default. I would like to increase the thread limit, which I have done by changing sidekiq.yml.
So, what is the relation between the pool value in database.yml and Sidekiq's thread count? What is the maximum value of the MySQL pool? Does it depend on server memory?
sidekiq.yml:
:verbose: true
:concurrency: 50
:pool: 50
:queues:
  - [queue_primary, 7]
  - [default, 5]
  - [queue_secondary, 3]
database.yml:
production:
  adapter: mysql2
  encoding: utf8
  reconnect: false
  database: db_name
  pool: 50
  username: root
  password: root
  socket: /var/run/mysqld/mysqld.sock
Each Sidekiq job executes in one of up to 50 threads with your configuration. Inside the job, any time an ActiveRecord model needs to access the database, it uses a database connection from the pool of available connections shared by all ActiveRecord models in this process. The connection pool lets a thread take a connection or blocks until a free connection is available.
If you have fewer connections available in your ActiveRecord database connection pool than running Sidekiq jobs/threads, jobs will block waiting for a connection and may time out (after about 5 seconds) and fail.
This is why it's important that you have as many available database connections as threads in your sidekiq worker process.
Unicorn is a single-threaded, multi-process server - so you shouldn't need more than one connection for each Unicorn back-end worker process.
However, the database can only handle so many connections (depending on OS, hardware, and configuration limits) so you need to make sure that you are distributing your database connections where they are needed and not exceeding your maximum.
For example, if your database is limited to 1000 connections, you could only run 20 sidekiq processes with 50 threads each and nothing else.
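That budget is a simple integer division. In the sketch below, max_sidekiq_processes is a hypothetical helper, and reserved_for_web is an assumed extra input for connections set aside for web processes.

```ruby
# How many Sidekiq processes fit under a database connection limit.
# reserved_for_web is an assumed extra input, not part of the example above.
def max_sidekiq_processes(db_limit:, threads_per_process:, reserved_for_web: 0)
  (db_limit - reserved_for_web) / threads_per_process
end

puts max_sidekiq_processes(db_limit: 1000, threads_per_process: 50) # 20
puts max_sidekiq_processes(db_limit: 1000, threads_per_process: 50, reserved_for_web: 100)
```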
