It seems that cheaper and zerg mode (with broodlord) do much the same thing: that is, spawn additional workers when needed.
What is the difference, and why would I use one and not the other?
With cheaper you choose the maximum number of processes an instance can spawn; there is no way to change it without editing the config and reloading.
Zerg mode allows you to run new instances (even with a different config) attached to the same socket used by an already running instance.
This allows various forms of autoscaling.
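To make the contrast concrete, here is a minimal sketch of the two setups (the worker counts and socket path are illustrative):

; cheaper: the ceiling is baked into the config
[uwsgi]
workers = 10          ; hard maximum, fixed until you edit and reload
cheaper = 2           ; keep at least 2 workers alive
cheaper-initial = 3   ; start with 3

; zerg: a running instance exposes an attach point...
[uwsgi]
zerg-server = /var/run/uwsgi-zerg.sock
; ...and a new instance (with any config) can join it later via:
; zerg = /var/run/uwsgi-zerg.sock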
Looking for guidance on Reactor schedulers.
I want to run certain IO tasks in the background, e.g. sending emails to the tech team. To make them asynchronous I use Mono.fromRunnable subscribed on a scheduler.
I have a choice to either use Schedulers.elastic() or Schedulers.newElastic(). I prefer the latter because it allows me to give it a unique name, which would help in log analysis.
Is it OK to make a static variable, e.g.
Scheduler emailSched = Schedulers.newElastic("email");
and subscribeOn my Mono to it, or should I create a new Scheduler instance every time?
I found only "What is the difference between Schedulers.newElastic and Schedulers.elastic methods?" and that did not help my question much.
should I create a new Scheduler instance every time?
There's no technical reason why you need to if you don't want to. In most instances it probably doesn't matter.
The key differences are:
You can give it a different name if you want (trivial)
Any individual elastic scheduler will cache and reuse the executors it creates under the hood, with a default timeout of 60 seconds. That caching is not shared between different scheduler instances of the same name, however.
You can dispose any individual elastic scheduler without affecting other schedulers of the same name.
In the case you describe, none of those really come into play.
Separately from the above, note that Schedulers.boundedElastic() is now the preferred option, especially for wrapping blocking IO (which seems to be what you're doing here).
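A minimal sketch of the static-field approach using the newer bounded variant (the thread cap, queue cap, and sendEmailToTechTeam() are illustrative placeholders):

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

// inside some component class: one named scheduler, created once and reused
static final Scheduler EMAIL_SCHED = Schedulers.newBoundedElastic(4, 100, "email");

Mono.fromRunnable(() -> sendEmailToTechTeam())  // hypothetical blocking call
    .subscribeOn(EMAIL_SCHED)
    .subscribe();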
As per the title, if I am creating workers via Helm or Kubernetes, is it possible to assign "worker resources" (https://distributed.readthedocs.io/en/latest/resources.html#worker-resources) after the workers have been created?
The use case is tasks that hit a database: I would like to limit the number of processes able to hit the database in a given run, without limiting the total size of the cluster.
As of 2019-04-09 there is no standard way to do this. You've found the Worker.set_resources method, which is reasonable to use. Eventually I would also expect Worker plugins to handle this, but they aren't implemented yet.
For your application of controlling access to a database, it sounds like what you're really after is a semaphore. You might help build one (it's actually decently straightforward given the current Lock implementation), or you could use a Dask Queue to simulate one.
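A minimal sketch of the Queue-as-semaphore idea (the scheduler address, slot count, and run_query are illustrative placeholders):

from dask.distributed import Client, Queue

client = Client("tcp://scheduler:8786")  # hypothetical scheduler address

slots = Queue("db-slots")                # named queue shared across the cluster
for _ in range(3):                       # allow at most 3 concurrent database hits
    slots.put(1)

def query_db(x):
    token = slots.get()                  # blocks until a slot is free
    try:
        return run_query(x)              # hypothetical database call
    finally:
        slots.put(token)

futures = client.map(query_db, range(100))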
This is just an idea; let me know if I'm missing anything or whether it could be a good one.
It's common to have N Rails processes running on a single server/VM, but they can't perform at their best due to the GIL (Global Interpreter Lock).
Instead of running N processes on a single server, I could run N containers, each with a single Rails process (each listening on a different port).
This way I should be able to execute more Rails processes in parallel, right?
I guess containers add overhead, but it could probably make sense anyway.
Any opinions?
Thank you
This would be far less efficient than running N processes. The simple reason here is that most process managers for Ruby on Rails use the "pre-fork" model where a large amount of code is loaded in before the processes are split off.
Forking a process uses very little additional memory: the second process inherits a near-exact copy of the first. Any pages it then writes to will incur more memory overhead, but since things like the Rails library and other gems are not changed, they come along for free, basically.
If you had multiple processes that are independent, each would need to load, parse, and initialize every Ruby class necessary to run Rails.
This isn't to say that the containerized approach is without merit, but it may necessitate a hybrid approach: N containers with M processes each.
Remember, if you're really having trouble with the GIL, just use JRuby, which doesn't have one.
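For example, a pre-fork server such as Puma makes this model explicit in its config; a minimal sketch (the worker and thread counts are illustrative):

# config/puma.rb
workers 4        # four forked processes share the preloaded code copy-on-write
threads 1, 5     # threads within each worker (still subject to the GIL)
preload_app!     # load Rails once in the master, then fork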
This won't improve concurrency at all: the GIL applies to threads within a single process. Multiple processes can already execute concurrently; the pattern of having multiple Rails processes arises because of the GIL.
As tadman says, you'll also potentially increase memory usage. You might be able to estimate it (assuming you deploy using Passenger), as the passenger-memory-stats tool allows you to get RSS as well as private dirty RSS (i.e. memory that is resident but not shared with the parent process). If the non-shared memory is almost none, then you wouldn't waste any by following a non-fork model.
Containers are wonderful things, but what you've outlined isn't a reason to use them.
I have a VPS which is (currently) hosting 5 different Rails applications, all with different domains. To make them work I've added one server {} block per app in my nginx config file. I've left everything else at the defaults; for instance, there's only one nginx worker process.
In addition, I also have 2 Rails workers for one of the apps.
Now, this works as is, but performance is poor, speed in particular. How could I make my apps quicker within these constraints?
Thanks!
Your problem is that you are deep into swap. The slowness you're experiencing when switching apps is the system loading the requested app from swap into physical memory.
To address this, you can observe who is hogging the memory (also using 'top'), and address that. It's possible you'll find some things to tune, but also quite possible you'll find that you are near the physical limits of what's possible without significant architectural changes.
If your time is worth much, your best course of action will be to upgrade to an instance with at least 1GB of memory, because you are already using nearly that much.
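A quick way to confirm the diagnosis with standard Linux tools (exact output format varies by distribution):

free -m        # compare swap used against total physical memory
top -o %MEM    # sort processes by resident memory to find the hogs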
The nginx worker_processes setting should be set to the number of cores you have available. You mentioned you have it set to 1. Do you have more cores than that?
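For reference, a one-line sketch of that setting (auto asks nginx to match the core count and is available in reasonably recent versions):

# nginx.conf
worker_processes auto;    # or an explicit count, e.g. worker_processes 4;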
Given that each Ruby on Rails application needs at least about 40MB of memory, I was wondering if there is a way of running multiple Rails application instances (different ones) on one Ruby interpreter, so that all shared libraries (RMagick etc.) are shared between the application instances, saving memory.
If that were possible, I could run 5-6 Rails applications on a single 256MB-RAM virtual server.
Is that possible?
I don't think this is possible without substantially changing the current code base.
But all is not lost.
If these websites are fairly low-traffic and you have a fast VPS, you should keep in mind that mod_passenger drops instances from memory if they're inactive for a while. So in theory you could run an unlimited number of applications as long as you only have a few active at the same time. The price is a slower response on the first request that loads the instance.
Another option would be to load all the shared libraries, then fork off as many child processes as you have apps (use Process.fork) and run a different app in each child.
Pages of memory which are only read and not written will be shared between the parent and child processes.
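A minimal sketch of that fork-after-preload idea (the app paths and ports are hypothetical, and error handling is omitted):

require 'rack'
require 'rmagick'   # shared libraries loaded once, in the parent

apps = { '/srv/app1' => 3001, '/srv/app2' => 3002 }

apps.each do |root, port|
  Process.fork do
    Dir.chdir(root)
    require File.join(root, 'config', 'environment')  # boot this Rails app in the child
    Rack::Server.start(app: Rails.application, Port: port)
  end
end

Process.waitall   # keep the parent alive while the children serve requests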