Limit clients from making client.restart() calls in a multiuser cluster - dask

I have created a dask-ssh cluster which is shared among multiple users. We do not have a resource manager yet (we are looking at job-queue with a batch grid).
I would like to limit restarting the cluster to the admin, so that users/clients cannot run client.restart() on a whim and interfere with other clients.
If that is not possible, do you suggest letting every user create their own dask cluster?
Thanks!

I believe that the following config may solve your problem:
distributed:
  scheduler:
    blocked-handlers:
      - restart
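As a minimal sketch of one way to apply that setting (LocalCluster is used here purely for illustration; on a dask-ssh cluster the YAML above would instead go into the scheduler host's dask configuration, e.g. a file under ~/.config/dask/, before the scheduler starts):

# Hedged sketch: block the scheduler's "restart" handler via dask config.
import dask
from dask.distributed import Client, LocalCluster

dask.config.set({"distributed.scheduler.blocked-handlers": ["restart"]})

cluster = LocalCluster(n_workers=2)   # the scheduler reads this config at startup
client = Client(cluster)

# With the handler blocked, client.restart() should be refused by the scheduler
# instead of bouncing every user's workers.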
If that is not possible, do you suggest letting every user create their own dask cluster?
This is also a good option.

Related

How to use a scheduler (cron) container to execute commands in other containers

I've spent a fair amount of time researching and I've not found a solution to my problem that I'm comfortable with. My app is working in a dockerized environment:
one container for the database;
one or more containers for the APP itself. Each container holds a specific version of the APP.
It's a multi-tenant application, so each client (or tenant) may be related to only one version at a time (migration should be handled per client, but that's not relevant).
The problem is that I would like to have another container to handle scheduling jobs, like sending e-mails, processing some data, etc. The scheduler would then execute commands in the app containers. Projects like Ofelia offer great promise, but I would have to know the target container ahead of time. That's not possible, because I need to go to the database container to discover which version the client is on in order to figure out which container the command should be executed in.
Is there a tool to help me here? Should I change the structure somehow? Any tips would be welcome.
Thanks.
So your question is that you want to get the APP's version info from the database container before scheduling jobs, right?
I think this relates to the business logic rather than to the dockerized environment. You may have a few ways to solve the problem:
Check the network and make sure the containers can reach each other
The database should support some form of remote access (RPC), so you can use it to get the version data
You can use RPC-style tools, like SSH
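As a rough sketch of what such a scheduler script could look like (everything here is an assumption rather than part of the original setup: a PostgreSQL "tenants" table holding the version per client, app containers named app-<version>, and a scheduler container that has the Docker socket mounted):

# Hypothetical scheduler helper: look up a tenant's APP version in the database,
# then execute the command inside the matching app container.
import docker      # pip install docker
import psycopg2    # pip install psycopg2-binary

def run_job_for_tenant(tenant_id, command):
    # 1. Ask the database which APP version this tenant is on (assumed schema).
    conn = psycopg2.connect(host="database", dbname="app", user="app", password="secret")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT version FROM tenants WHERE id = %s", (tenant_id,))
        (version,) = cur.fetchone()

    # 2. Run the command inside the container for that version (assumed naming scheme).
    client = docker.from_env()  # requires /var/run/docker.sock mounted in this container
    container = client.containers.get("app-%s" % version)
    exit_code, output = container.exec_run(command)
    if exit_code != 0:
        raise RuntimeError("job failed in app-%s: %r" % (version, output))
    return output

A plain cron entry (or a tool like Ofelia pointed at the scheduler container itself) can then call this helper per tenant, so the version lookup happens at run time rather than ahead of time.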

Is it possible to assign worker resources to dask distributed worker after creation?

As per title, if I am creating workers via helm or kubernetes, is it possible to assign "worker resources" (https://distributed.readthedocs.io/en/latest/resources.html#worker-resources) after workers have been created?
The use case is tasks that hit a database; I would like to limit the number of processes able to hit the database in a given run, without limiting the total size of the cluster.
As of 2019-04-09 there is no standard way to do this. You've found the Worker.set_resources method, which is reasonable to use. Eventually I would also expect Worker plugins to handle this, but they aren't implemented.
For your application of controlling access to a database, it sounds like what you're really after is a semaphore. You might help build one (it's actually decently straightforward given the current Lock implementation), or you could use a Dask Queue to simulate one.
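A hedged sketch of the Worker.set_resources route mentioned above (the scheduler address, the "DB" resource name, and the task are assumptions; it also assumes a distributed version where set_resources is a coroutine):

# Tag each already-running worker with one unit of an arbitrary "DB" resource,
# then restrict database-bound tasks to that resource.
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # assumed scheduler address

async def add_db_resource(dask_worker=None):
    # client.run passes the local Worker object when the function accepts `dask_worker`.
    await dask_worker.set_resources(DB=1)

client.run(add_db_resource)

def query_database(key):
    ...  # the real task that talks to the database

# At most one DB-tagged task runs per worker at a time; tasks without the
# resource annotation are unaffected, so the rest of the cluster stays busy.
futures = client.map(query_database, range(100), resources={"DB": 1})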

Monitoring that processes are running and active?

We have a number of RabbitMQ consumers running, and I want to make sure that every one of those processes is working.
What are the best industry applied approaches for that?
We are considering using prometheus for that, is that the right direction?
You can go the Prometheus route. Use this link to get started; it will fairly quickly set you up with the ability to monitor RabbitMQ with Prometheus.
You can use Alertmanager to set up alerts on failed processes, and top that with automation to make sure you have service continuity.
HTH.
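One hedged sketch of the consumer side (the port and metric name are arbitrary choices, not from the answer): each consumer process exposes its own metrics endpoint, so Prometheus's built-in up metric catches dead processes and a heartbeat counter catches stuck ones.

# Each RabbitMQ consumer process runs this alongside its consume loop.
import time
from prometheus_client import Counter, start_http_server  # pip install prometheus-client

HEARTBEAT = Counter("consumer_messages_processed_total",
                    "Messages successfully processed by this consumer")

def handle_message(body):
    ...  # the real message handling goes here
    HEARTBEAT.inc()

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<consumer-host>:8000/metrics
    while True:
        time.sleep(1)        # placeholder for the actual consume loop

An alerting rule on up == 0 (scrape target gone) or on a flat rate of the heartbeat counter then tells you a consumer has died or stalled.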

How does Celery discover new Nodes?

I'm running Celery, RabbitMQ and Gunicorn in Docker.
My question is this: I understand that Celery is designed for distributed processing. What I have seen no docs on at all is, assuming that I have several machines/nodes on the same LAN, how do they discover each other? Does RabbitMQ play a role? Do celery instances somehow discover each other? Is there a list of suitable hosts somewhere? If so, how do I edit it?
Also, assuming I'm going to use only one node to handle the HTTP requests, do I still need to have gunicorn running on all nodes? I ask this because the gunicorn start command has a setting for the number of workers. And is this setting applicable only to that node, or is it a max total for all connected nodes?
EDIT:
After the first answer, I started working on this. It seems that I need some sort of networking setup, either swarm or bridging, etc. I should clarify that I'm using docker-compose to bring up the solution; I see that a normal swarm setup doesn't work, and that I have to use something slightly different if I go that route.
To be clear: I need a way in which I can add celery workers on separate hosts and have them be able to communicate with the "main" host so that I can increase the capacity of the system. If someone could provide a clear process for achieving this or a link to such, it'd be most helpful.
I hope I've expressed this clearly, please let me know if you need any further info.
Thanks!
I feel like #ffledgling didn't fully answer the question, so I am adding a note:
Here is a list of all events sent by the worker to the broker (in your case RabbitMq): http://docs.celeryproject.org/en/latest/userguide/monitoring.html#event-reference
As you can see, there are a few worker self-related messages/events:
worker-online
worker-heartbeat
worker-offline
All of them contain the hostname as a signature. Therefore a successful handshake flow (not exactly a handshake, because the master doesn't respond with a message, but the metaphor works) may look like this:
new worker online --> worker sends a worker-online message to the queue --> master receives it and starts to read logs from the worker host --> master schedules tasks --> ...
Beyond that, the hostname is a standard body field in every event (both task- and worker-related); here is the documentation: http://docs.celeryproject.org/en/latest/internals/protocol.html?highlight=event%20reference#standard-body-fields
For example, if you look at the task-started event, it also contains the hostname as a signature; this is how the master knows who picked up the task and where to read the task's log from.
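As an illustration of consuming those events yourself, here is a small sketch following the real-time processing pattern from the monitoring guide linked above (the broker URL is an assumption):

# Listen for the worker lifecycle events described above; every event body
# carries the hostname of the worker that sent it.
from celery import Celery

app = Celery(broker="amqp://guest:guest@rabbitmq.example.local:5672//")  # assumed broker

def on_worker_event(event):
    print(event["type"], "from", event["hostname"])

with app.connection() as connection:
    receiver = app.events.Receiver(connection, handlers={
        "worker-online": on_worker_event,
        "worker-heartbeat": on_worker_event,
        "worker-offline": on_worker_event,
    })
    receiver.capture(limit=None, timeout=None, wakeup=True)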
I understand that Celery is designed for distributed processing. What I have seen no docs on at all is, assuming that I have several machines/nodes on the same LAN, how do they discover each other? Does RabbitMQ play a role? Do celery instances somehow discover each other? Is there a list of suitable hosts somewhere? If so, how do I edit it?
Celery is a distributed task queue that works using a message brokering system such as RabbitMQ.
What essentially happens is that all celery workers connect to a shared queue such as RabbitMQ. The master(s) dispatch work by pushing it onto the queue. The workers, which are also connected to the queue, pull work off of it and then attempt to execute it. Once a task is finished (successfully or otherwise), the worker pushes the result back, where the master(s) can then query it.
Given this architecture, you do not need to add a list of hosts; the workers "auto-detect" work. You simply need to start them up and ensure they can talk to the queue.
A slightly more detailed explanation from another SO answer.
Link to the architecture with a diagram.
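To make that concrete, here is a minimal sketch of the setup (the broker URL, task, and hostname are assumptions): the same app module is deployed to every machine, and each additional machine simply runs a worker pointed at the shared broker.

# tasks.py -- deployed identically on every node
from celery import Celery

app = Celery(
    "tasks",
    broker="amqp://guest:guest@rabbitmq.example.local:5672//",  # the shared RabbitMQ
    backend="rpc://",                                           # optional result backend
)

@app.task
def add(x, y):
    return x + y

# On each extra host/container, start a worker against the same broker:
#   celery -A tasks worker --loglevel=info --hostname=worker1@%h
# The web node just calls add.delay(2, 3); whichever connected worker is free picks it up.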
Also, assuming I'm going to use only one node to handle the HTTP requests, do I still need to have gunicorn running on all nodes? I ask this because the gunicorn start command has a setting for the number of workers. And is this setting applicable only to that node, or is it a max total for all connected nodes?
No, you do not need gunicorn running on all the nodes, just the one you're using to serve HTTP requests via Python. Celery workers do not need gunicorn. The worker setting in gunicorn refers to the number of workers in the HTTP listener pool. This is separate from, independent of, and unrelated to the set of workers that celery uses.

Single cron job across multiple AWS EC2 images

We have a Ruby on Rails application running on EC2 with the autoscaling feature enabled. We have been using whenever to manage cron. New instances are created automatically from an image of the main instance during traffic spikes and dropped when traffic is low. But this also copies the cron jobs to the newly created instances.
We have a specific requirement where we want to limit cron to a single instance.
I found a gem which looks like it handles this specific requirement, but I am skeptical about it because it is built for Elastic Beanstalk and is no longer maintained.
As a workaround, you can have a condition within the cron job specifying that it should only execute on a single instance elected from your autoscaling group, e.g. only the oldest instance, or only the instance with the "lowest" instance ID, or whatever condition you like.
You can achieve such a thing by having your instances call the AWS API, as in the sketch below.
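A rough sketch under several assumptions (a hypothetical Auto Scaling group name, the IMDSv1 metadata endpoint, and an instance role allowed to call autoscaling:DescribeAutoScalingGroups):

# Guard script: exit 0 only on the "elected" instance (lowest in-service instance ID).
import sys
import boto3      # pip install boto3
import requests   # pip install requests

ASG_NAME = "my-rails-asg"  # hypothetical Auto Scaling group name

def i_am_elected():
    my_id = requests.get(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).text
    group = boto3.client("autoscaling").describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]
    in_service = sorted(
        i["InstanceId"] for i in group["Instances"] if i["LifecycleState"] == "InService"
    )
    return bool(in_service) and in_service[0] == my_id

if __name__ == "__main__":
    sys.exit(0 if i_am_elected() else 1)

The whenever-generated cron entries can then be prefixed with this guard (elected.py && bundle exec rake some:task), so the job still lives on every instance but only runs on one.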
As a more proper solution, you could perhaps use a single cronified Lambda accessing your instances; this is now possible, as per this page.
The best option is to set scale-in protection. It prevents your instance from being terminated during scaling events.
You can find more information on AWS here: https://aws.amazon.com/blogs/aws/new-instance-protection-for-auto-scaling/
