Is Puma WEB_CONCURRENCY on a per Dyno basis for Heroku?

I'm using the Puma web server on Heroku and currently have 3 standard-2x dynos. The app is Ruby on Rails.
My understanding is that increasing WEB_CONCURRENCY in config/puma.rb increases the number of Puma workers, at the expense of additional RAM usage.
Current Setup:
workers ENV.fetch("WEB_CONCURRENCY") { 5 }
Question:
Are the 5 concurrent workers on a per-dyno basis, or overall?
If I have 3 dynos, does this mean I have 15 workers, or only 5?
I previously looked for a way to check the current number of running workers, but couldn't find any command to do this on Heroku.

Yes, the web concurrency is on a per-dyno basis.
Each dyno is an independent container, potentially running on a different server, so you should treat each dyno as an independent server. With 3 dynos and WEB_CONCURRENCY set to 5, you have 15 Puma workers in total.
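To make the math concrete, here is a minimal config/puma.rb sketch (the thread settings are assumptions, not from the question):
# config/puma.rb
# WEB_CONCURRENCY is read per dyno: 3 dynos x 5 workers = 15 worker processes total.
workers ENV.fetch("WEB_CONCURRENCY") { 5 }

# Threads multiply per worker: total threads = dynos x workers x RAILS_MAX_THREADS.
max_threads = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads max_threads, max_threads

# Load the app before forking workers so copy-on-write memory is shared.
preload_app!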

Related

Rails: Puma cluster mode vs threaded-only?

In my Rails app, hosted on the DigitalOcean App Platform, we're seeing high RAM usage and can accommodate only 2 Puma workers in a Pro instance (1 vCPU, 2 GB RAM). Should I prefer more small instances without any workers, since that would be more cost-effective?
i.e. if it were Heroku, should I use a 2x dyno with more Puma workers, or multiple 1x dynos?
Is there any inherent advantage to using Puma in cluster mode vs threaded-only with more instances?

Why are there so many Sidekiq processes?

I ran htop on my production server to see what was eating my RAM. A lot of Sidekiq processes are running; is this normal?
Press Shift-H. htop shows individual threads as separate processes by default; there is only one actual Sidekiq process.
By default, one Sidekiq process creates 25 threads, which is what you appear to be running with.
If that's crushing your machine with I/O, you can adjust it down:
sidekiq -c 10
https://github.com/mperham/sidekiq/wiki/Advanced-Options
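Equivalently, the thread count can be set in Sidekiq's config file instead of on the command line (a minimal sketch; the value 10 is just an example):
# config/sidekiq.yml
:concurrency: 10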
Note that even on MRI these are native threads inside a single process, not separate processes; htop simply lists each thread with the memory stats of its parent process, which makes the usage look larger than it is.

How to schedule PhantomJS scrapes on free Heroku dynos?

I'm under the impression that free dynos spin down after a while.
What happens to a script that currently runs alongside my main Ruby server and fires off a PhantomJS scraper every now and again?
Do I need a dedicated worker process for this, or will Heroku Scheduler do just fine alongside a paid dyno?
I've no issue paying for it, but development always takes a while and their workers are a little pricey.
Thanks in advance.
If you want to run a script periodically, Heroku Scheduler is really the ideal way to do it. It uses one-off dynos, which DO count towards your free dyno allocation each month, but they only run for the duration of the task and stop afterwards.
This is much cheaper than running a dedicated worker dyno that is up 24x7: a one-off dyno powered by Heroku Scheduler only runs for a few minutes per day.
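In practice, you wrap the scrape in a rake task and point a Scheduler job at it (a minimal sketch; the task name and the PhantomScraper class are hypothetical):
# lib/tasks/scrape.rake
namespace :scrape do
  desc "Run the PhantomJS scraper once, then exit"
  task run: :environment do
    PhantomScraper.new.run  # hypothetical scraper class
  end
end
Then set the Scheduler job's command to: rake scrape:run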

Can one Sidekiq instance process multiple queues from multiple applications?

I can run multiple Sidekiq processes, each one processing queues for a specific Rails application.
But if I have 10 applications, then there will always be 10 Sidekiq processes running, each one consuming memory.
How can I run only one Sidekiq process that serves multiple applications?

Tell Sidekiq to use all available Heroku workers

I need to batch process a large set of files (millions of database records) as quickly as possible. For this purpose, I split the files into 3 directories and set up Sidekiq with the standard configuration (no config file).
I then started 3 Heroku workers and called 3 methods, which started 3 Sidekiq workers, all on the "default" queue. Initially, Sidekiq used 2 of the Heroku workers, and after a while it decided to use only 1.
How can I force Sidekiq to use all 3 workers to get the job done ASAP?
Thanks
I found the solution at the bottom of this page: http://manuelvanrijn.nl/blog/2012/11/13/sidekiq-on-heroku-with-redistogo-nano/
# config/sidekiq.yml
:concurrency: 1
# Procfile
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
worker: bundle exec sidekiq -e production -C config/sidekiq.yml
Also, if you have many workers and a free / cheap Redis instance, make sure you limit the number of connections from each worker to the Redis server:
# config/initializers/sidekiq.rb
require 'sidekiq'

Sidekiq.configure_client do |config|
  config.redis = { size: 1 }
end

Sidekiq.configure_server do |config|
  config.redis = { size: 2 }
end
You can calculate the maximum of connections here: http://manuelvanrijn.nl/sidekiq-heroku-redis-calc/
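As an illustrative calculation (the dyno counts are assumptions): with 3 worker dynos each holding 2 server connections and 3 web dynos each holding 1 client connection, you use 3 x 2 + 3 x 1 = 9 connections, which just fits under the 10-connection limit of a Redis To Go Nano instance.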
I wanted to clarify a few things about your question. It reads "Tell Sidekiq to use all available Heroku workers", but in fact, each dyno runs its own Sidekiq process via a command like bundle exec sidekiq -e production -C config/sidekiq.yml. Each of these Sidekiq processes can handle multiple threads, as specified in config/sidekiq.yml with a line like :concurrency: 3, which is what the Sidekiq docs recommend for a Heroku standard-2x dyno (see https://github.com/mperham/sidekiq/wiki/Heroku for details), since it only has 1 GB of memory.
But technically you don't need to tell Sidekiq to use all available Heroku processes. There is another key element here: the Redis server. Your main app publishes jobs to Redis. Each Sidekiq process, on whichever dyno it runs, can be configured with the same queue; all of them are then subscribed to that queue and pull jobs from it. This is clearly stated by the creator of Sidekiq on the Sidekiq GitHub page: https://github.com/mperham/sidekiq/issues/3603.
There are a couple of key points to sharing the load. First, restrict the concurrency of each Sidekiq process to a number like the one mentioned above. Second, limit the connections to the Redis server from within Sidekiq.configure_client. Finally, think of Heroku's load balancing as somewhat different from how an ALB works in AWS: an ALB distributes traffic round-robin to instances in target groups, scaled on metrics defined in launch templates and auto scaling groups, such as vCPU utilization, memory utilization, and read/write IO. The load balancing here is more like a publish-subscribe system, where Sidekiq processes take work when they are able to, subject to their concurrency and Redis connection limits.
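To make the publish-subscribe point concrete, here is a minimal worker sketch (the class name and job body are hypothetical); every Sidekiq process subscribed to the default queue competes for these jobs, so adding worker dynos adds consumers without any extra coordination:
# app/workers/file_batch_worker.rb
class FileBatchWorker
  include Sidekiq::Worker
  sidekiq_options queue: "default"

  def perform(file_id)
    # fetch and process one file/record here
  end
end
Enqueuing with FileBatchWorker.perform_async(42) pushes the job to Redis, and any of the three dynos' Sidekiq processes may pull it.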
Finally, Heroku discourages long-running jobs. The longer your job runs, the more memory it will eat up, and Heroku dynos are expensive: a standard-2x costs about 4x as much as a t3.micro in AWS for the same vCPU and memory (1 GB). Furthermore, in AWS you can create a spot fleet, where you purchase compute for as little as 10 percent of its on-demand price and execute batch jobs on spot instances; AWS also has a dedicated service called AWS Batch. Neither option exists on Heroku. It's therefore important to keep price in mind, and thus how long the job runs. Read this article, where Heroku explains why long-running jobs are a bad fit for its environment: https://devcenter.heroku.com/articles/scaling#understanding-concurrency. Try to keep a job under 2 minutes.
