I am trying to deploy a concurrent Rails 4 Puma app with Capistrano and was confused by the example in the capistrano-puma gem.
From this snippet on GitHub:
set :puma_threads, [0, 16]
set :puma_workers, 0
What is the difference between threads and workers in Puma?
What does a puma_workers value of 0 mean, and what does [0, 16] threads mean?
Which parameters do I need to achieve concurrency? My aim is simple SSE to send notifications. What are the best parameters for this in Puma?
I am sorry if these are simple questions, but I am having a hard time finding resources online, even on the official site. If someone can point me to an article that answers my question, I am happy to accept it. Thanks.
Although the capistrano-puma README does not spell it out, set :puma_workers, 0 does not mean unlimited workers: per Puma's own documentation, workers 0 runs Puma in single mode, i.e. one process with no forked (clustered) workers. set :puma_threads, [0, 16] sets the minimum and maximum size of each process's thread pool: Puma spins up threads on demand, up to 16, and lets idle threads die back down to 0.
A worker is a separate OS process running an instance of your application.
Each instance can run multiple threads, so if you have 2 workers each running up to 16 threads, your server can serve 2 * 16 = 32 requests at a time. If the average response time is 100 ms, that works out to roughly (1000 / 100) * 32 = 320 requests per second.
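The arithmetic above can be sketched directly (the 100 ms average response time is the illustrative figure from the answer, not a measured value):

```ruby
workers     = 2    # separate Puma processes
max_threads = 16   # threads per process

concurrent_requests = workers * max_threads   # requests the server can handle at once

avg_response_ms = 100
requests_per_second = (1000.0 / avg_response_ms) * concurrent_requests

puts concurrent_requests   # 32
puts requests_per_second   # 320.0
```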
I want to see whether requests are piling up and threads are busy handling requests, or whether it is time to scale up.
I used Puma.stats but it only returns:
{
  "started_at": "2020-09-07T14:43:53Z",
  "backlog": 0,
  "running": 7,
  "pool_capacity": 3,
  "max_threads": 7
}
I cannot see if the threadpool is full. Is there a way of seeing that info?
Puma.stats includes the information needed to know how many Puma threads are occupied. According to Puma's documentation, pool_capacity is the number of threads that are not occupied:
This number represents the number of requests that the server is capable of taking right now.
For example, if the number is 5, there are 5 idle threads ready to take a request; if one request comes in, the value drops to 4 until that request finishes processing.
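Given that, the number of busy threads can be derived from the stats hash; here using the exact stats output from the question:

```ruby
require "json"

stats = JSON.parse(<<~JSON)
  {
    "started_at": "2020-09-07T14:43:53Z",
    "backlog": 0,
    "running": 7,
    "pool_capacity": 3,
    "max_threads": 7
  }
JSON

busy_threads    = stats["max_threads"] - stats["pool_capacity"] # threads handling requests now
pool_full       = stats["pool_capacity"].zero?                  # no free capacity left
queue_backed_up = stats["backlog"] > 0                          # requests waiting for a thread

puts busy_threads     # 4
puts pool_full        # false
puts queue_backed_up  # false
```

When pool_capacity hits 0 and backlog starts growing, the thread pool is saturated and it may be time to scale.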
I saw in Sidekiq's official wiki that ActiveJob is said to be much slower.
But that claim dates from March 2018 and, according to this issue, was based on Rails 4.2 and Sidekiq 5.1.1; the latest versions are Rails 6 and Sidekiq 6.
Is it still the case that a pure Sidekiq worker is recommended over ActiveJob with the Sidekiq adapter?
I prepared a simple benchmark: https://github.com/mpan-wework/sidekiq-benchmark/actions?query=workflow%3ARSpec
CreateUserJob
behaves like Benchmark Job
"CreateUserJob-0, 1601366451612"
"CreateUserJob-last, 500, 1601366532766"
runs for 501 times
PureJob
behaves like Benchmark Job
"PureJob-0, 1601366532791"
"PureJob-last, 500, 1601366542691"
runs for 501 times
CreateUserWorker
behaves like Benchmark Worker
"CreateUserWorker-0, 1601366542695"
"CreateUserWorker-last, 500, 1601366621057"
runs for 501 times
PureWorker
behaves like Benchmark Worker
"PureWorker-0, 1601366621072"
"PureWorker-last, 500, 1601366630103"
runs for 501 times
Finished in 2 minutes 58.5 seconds (files took 1.72 seconds to load)
4 examples, 0 failures
The benchmark runs on GitHub Actions with one Postgres container as the database and one Redis container as the cache.
The Pure job/worker only runs in-memory commands; the CreateUser job/worker creates 100 users through SQL.
0 marks the timestamp of the first job/worker to run; when each job/worker finishes, it writes its id and end time to the cache, so last marks the last job/worker to finish.
For each type of job/worker, 501 items are enqueued.
From the data collected, PureJob takes 9.900 seconds while PureWorker takes 9.031 seconds; CreateUserJob takes 81.154 seconds while CreateUserWorker takes 78.362 seconds. A plain Sidekiq worker is faster than ActiveJob with the Sidekiq adapter, but not by as much as the wiki states.
I have not yet tested on a Kubernetes cluster with multiple Rails and Sidekiq pods, but I suspect the difference would not be significant.
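The per-type durations quoted above can be recomputed from the first/last timestamps in the logged output (milliseconds since the epoch):

```ruby
# First start-timestamp and last end-timestamp logged for each type, in ms.
timestamps = {
  "CreateUserJob"    => [1_601_366_451_612, 1_601_366_532_766],
  "PureJob"          => [1_601_366_532_791, 1_601_366_542_691],
  "CreateUserWorker" => [1_601_366_542_695, 1_601_366_621_057],
  "PureWorker"       => [1_601_366_621_072, 1_601_366_630_103],
}

durations = timestamps.transform_values { |(first, last)| (last - first) / 1000.0 }

puts durations
# {"CreateUserJob"=>81.154, "PureJob"=>9.9, "CreateUserWorker"=>78.362, "PureWorker"=>9.031}
```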
ActiveJob is an adapter layer on top of Sidekiq::Worker, so it will always add some overhead. That said, how much overhead is version-, application- and use-case-specific, so only you can measure your own system to determine the overhead you are seeing.
Benchmarks show that Active Job is 2-20x times slower pushing jobs to Redis and has ~3x the processing overhead (with Rails 5.1.4 and Sidekiq 5.1.1).
https://github.com/mperham/sidekiq/wiki/Active-Job#performance
I am getting ActiveRecord::ConnectionTimeoutError once or twice a day. Could someone help me calculate how many connections my application makes to the DB, and suggest how to optimise my connections?
Here is my configuration:
AWS
Database: MySQL
Version: 5.7.23
Provider: AWS RDS (db.m5.large, vCPU: 2, RAM: 8 GB)
3 web servers, each with the configuration below:
# database.yml
pool: 20
# puma.rb
RAILS_MAX_THREADS : 5
WEB_CONCURRENCY : 2
1 Sidekiq server with the configuration below:
# sidekiq
concurrency: 25
I checked the maximum number of connections my DB can handle:
# MySQL Max connections ("show global variables like 'max_connections';")
624
The total number of connections to a database equals the number of connections per server times the number of servers.
Total DB Connections = Connections per server * server count.
Connections per server = AR Database Pool Size * Processes per server (usually set with WEB_CONCURRENCY or SIDEKIQ_COUNT)
So for the web servers you have:
AR Database Pool Size = 20
Processes per server = 2
Server Count = 3
Total DB Connections(Web Server) = 20 * 2 * 3 = 120
Then for the Sidekiq server:
AR Database Pool Size = 20
Processes per server = 1
Server Count = 1
Total DB Connections(Sidekiq Server) = 20 * 1 * 1 = 20
So the total expected DB connections should be 140, which is way below the limit of the RDS instance.
My guess is that you are getting the ActiveRecord::ConnectionTimeoutError because your Sidekiq concurrency setting is higher than the AR connection pool value. All of the Sidekiq threads need an ActiveRecord database connection, so setting the AR pool size to a number smaller than Sidekiq's concurrency means some Sidekiq threads will become blocked waiting for a free database connection. In your case, at some point in time you might have 25 threads trying to access the database through a database pool that can use at most 20 connections and if a thread can't get a free database connection within 5 seconds, you get a connection timeout error.
In Sidekiq the total DB connections should be
minimum(Threads That Need a Database Connection, AR Database Pool Size) * Processes per Server (WEB_CONCURRENCY or SIDEKIQ_COUNT) * Server Count.
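Plugging the numbers from the question into these formulas (pool size 20, WEB_CONCURRENCY = 2, three web servers, one Sidekiq process with concurrency 25):

```ruby
pool = 20

# Web tier: pool size * processes per server * server count
web_connections = pool * 2 * 3                                 # 120

# Sidekiq tier: only min(threads needing a connection, pool size)
# connections can actually be checked out per process
sidekiq_concurrency = 25
sidekiq_connections = [sidekiq_concurrency, pool].min * 1 * 1  # 20

total = web_connections + sidekiq_connections
puts total  # 140 -- well under MySQL's max_connections of 624
```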
Additionally, the Sidekiq documentation states:
Starting in Rails 5, RAILS_MAX_THREADS can be used to configure Rails and Sidekiq concurrency. Note that ActiveRecord has a connection pool which needs to be properly configured in config/database.yml to work well with heavy concurrency. Set pool equal to the number of threads pool: <%= ENV['RAILS_MAX_THREADS'] || 10 %>
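A minimal sketch of the fix along those lines (values are illustrative, not from the original post): drive the pool size from RAILS_MAX_THREADS so you can keep one database.yml while exporting RAILS_MAX_THREADS=5 on the web servers and RAILS_MAX_THREADS=25 on the Sidekiq server, ensuring every Sidekiq thread can get a connection.

```yaml
# config/database.yml (sketch)
production:
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
```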
Most of this answer is based on the Sidekiq in Practice email series from Nate Berkopec
I have a memory leak problem with my server (which is written in Ruby on Rails).
I want to implement a temporary solution that restarts the dynos automatically when their memory limit is exceeded. What is the best way to do this? And is it risky?
There is a great solution for it if you're using Puma as a server.
https://github.com/schneems/puma_worker_killer
You can restart your server when its RAM usage exceeds some threshold. For example:
PumaWorkerKiller.config do |config|
  config.ram                       = 1024      # total RAM available to the app, in MB
  config.frequency                 = 5         # how often to check memory usage, in seconds
  config.percent_usage             = 0.98      # kill the largest worker at 98% of config.ram
  config.rolling_restart_frequency = 12 * 3600 # also rolling-restart all workers every 12 hours
end
PumaWorkerKiller.start
Also, since a worker can be killed mid-request, I would suggest wrapping your database writes in atomic transactions to prevent data corruption and other odd issues in your DB.
The default value is 4 hours. When I ran my data processing, I got this error message:
E, [2014-08-15T06:49:57.821145 #17238] ERROR -- : 2014-08-15T06:49:57+0000: [Worker(delayed_job host:app-name pid:17238)] Job ImportJob (id=8) FAILED (1 prior attempts) with Delayed::WorkerTimeout: execution expired (Delayed::Worker.max_run_time is only 14400 seconds)
I, [2014-08-15T06:49:57.830621 #17238] INFO -- : 2014-08-15T06:49:57+0000: [Worker(delayed_job host:app-name pid:17238)] 1 jobs processed at 0.0001 j/s, 1 failed
Which means the current limit is set to 4 hours.
Because I have a large amount of data that might take 40 or 80 hours to process, I was curious whether I can raise MAX_RUN_TIME to that many hours.
Are there any limits or downsides to setting MAX_RUN_TIME to, say, 100 hours? Or is there another way to process this data?
EDIT: is there a way to set MAX_RUN_TIME to an infinite value?
There does not appear to be a way to set MAX_RUN_TIME to infinity, but you can set it very high. To configure the max run time, add a setting to your delayed_job initializer (config/initializers/delayed_job_config.rb by default):
Delayed::Worker.max_run_time = 7.days
Assuming you are running your Delayed Job daemon on its own utility server (i.e. so that it doesn't affect your web server, assuming you have one) then I don't see why long run times would be problematic. Basically, if you're expecting long run times and you're getting them then it sounds like all is normal and you should feel free to up the MAX_RUN_TIME. However, it is also there to protect you so I would suggest keeping a reasonable limit lest you run into an infinite loop or something that actually never will complete.
As far as setting MAX_RUN_TIME to infinity goes, it doesn't look possible, since Delayed Job doesn't make max_run_time optional, and there's a place in the code where a to_i conversion is done, which doesn't work with infinity:
[2] pry(main)> Float::INFINITY.to_i
# => FloatDomainError: Infinity
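As for "another way to process this data": one alternative to a huge MAX_RUN_TIME is to split the import into many small jobs, each finishing well within the limit. The slicing itself is plain Ruby; ImportBatchJob below is a hypothetical class name, not something from the original post:

```ruby
# Split 10,000 record ids into batches of 1,000. Each batch would then be
# enqueued as its own delayed job, e.g.:
#   Delayed::Job.enqueue(ImportBatchJob.new(batch))   # ImportBatchJob is hypothetical
all_ids = (1..10_000).to_a
batches = all_ids.each_slice(1_000).to_a

puts batches.size        # 10
puts batches.first.size  # 1000
```

This way a single stuck batch fails (and retries) on its own instead of invalidating 80 hours of work.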