I'd like to use dask-distributed, both for supported algorithms and for general task graph execution. Unfortunately, the batch scheduler we use doesn't support DRMAA so I can't use dask-drmaa. We have NFS available to all hosts. Is there a way I can start using Dask or do I need to get DRMAA supported by the batch scheduler?
You can use your batch scheduler to run the dask-scheduler and dask-worker processes, which are just normal Python processes. Because you have a shared network file system (NFS), this should be particularly easy.
Use your batch scheduler to run the following command:
dask-scheduler --scheduler-file /path/to/some/new-file.json
Also use your batch scheduler to run the following command many times:
dask-worker --scheduler-file /path/to/some/new-file.json
dask-worker --scheduler-file /path/to/some/new-file.json
dask-worker --scheduler-file /path/to/some/new-file.json
The scheduler and workers will use that file to share their addresses and connect to each other.
You can then connect to this cluster from any Python session on the same network with the following commands:
>>> from dask.distributed import Client
>>> client = Client(scheduler_file='/path/to/some/new-file.json')
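Once connected, a quick sanity check (a minimal example; the function and values are arbitrary) is to submit a trivial task and pull back the result:
>>> future = client.submit(lambda x: x + 1, 10)  # runs on one of the dask-workers
>>> future.result()
11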
Further information can be found here: http://distributed.readthedocs.io/en/latest/setup.html#using-a-shared-network-file-system-and-a-job-scheduler
Related
My local computer (the client) has resources that would be useful for the computation, but when the distributed cluster is used, the client sits idle during client.gather. How can I create a worker on the local machine that connects to the existing distributed scheduler?
I remember reading an issue about this on GitHub, but I can't find it anymore (the thread was in the context of using GPU on the client).
I was looking for the answer in the wrong repositories; the correct repo is dask_jobqueue and this is the link: https://github.com/dask/dask-jobqueue/issues/471
In that thread, @stuarteberg suggests a workaround: temporarily open a local cluster to copy its worker spec, add that spec to the distributed cluster's specification (one of the dask-jobqueue clusters), and scale the number of workers accordingly, which leads to the creation of a local worker.
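A simpler sketch of my own (it assumes you know the running scheduler's address; tcp://scheduler-host:8786 below is a placeholder) is to start an extra worker on the local machine and point it at the existing scheduler, either by running dask-worker tcp://scheduler-host:8786 in a local terminal or programmatically:
import asyncio
from dask.distributed import Worker

async def run_local_worker(scheduler_address='tcp://scheduler-host:8786'):
    # The worker registers with the remote scheduler and then takes part
    # in the cluster like any other worker, so the local machine's resources
    # are no longer idle during client.gather.
    worker = await Worker(scheduler_address)
    await worker.finished()  # keep the worker alive until it is shut down

# asyncio.run(run_local_worker())  # replace the placeholder address with your scheduler's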
Assume we are running locally (on localhost) and my script is running some Dask tasks.
Is there a way to find out what scheduler/tasks are running? (I don't know what port the scheduler is running on.)
You can find out the various ports active in the scheduler by calling client.scheduler_info().
I would then recommend using the diagnostic dashboard if possible. The port for the dashboard is listed under "bokeh" in the "services" key:
>>> client.scheduler_info()['services']
{'bokeh': 43917}
Alternatively, you can always use the client.run_on_scheduler function to inspect scheduling state directly.
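For example (a sketch; it assumes a recent version of distributed, where the scheduler keeps its tasks in a tasks dictionary), a function run this way receives the Scheduler instance through the special dask_scheduler keyword argument:
>>> def count_tasks(dask_scheduler=None):
...     return len(dask_scheduler.tasks)
>>> client.run_on_scheduler(count_tasks)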
While running Dask 0.16.0 on OSX 10.12.6 I'm unable to connect a local dask-worker to a local dask-scheduler. I simply want to follow the official Dask tutorial. Steps to reproduce:
Step 1: run dask-scheduler
Step 2: run dask-worker 10.160.39.103:8786
The problem seems to be related to the Dask scheduler and not the worker, as I'm not even able to access the port by other means (e.g., nc -zv 10.160.39.103 8786).
However, the scheduler process is clearly still running on the machine.
My first guess is that due to network rules your computer may not accept network connections that look like they're coming from the outside world. You might want to try using dask-worker localhost:8786 and see if that works instead.
Also, as a reminder, you can always start a scheduler and worker directly from Python without creating dask-scheduler and dask-worker processes:
from dask.distributed import Client
# client = Client('scheduler-address:8786')
client = Client() # create scheduler and worker automatically
As a foolproof method, you can also pass processes=False, which avoids networking issues entirely:
client = Client(processes=False)
I would like to do the equivalent of Client(LocalCluster()) from the command line.
When interacting with distributed from Jupyter notebooks, I end up restarting my kernel often and starting a new LocalCluster each time, as well as refreshing my bokeh webpage.
I would much rather have a process running in the background that I could just connect to, is this possible?
The relevant doc page here is http://distributed.readthedocs.io/en/latest/setup.html#using-the-command-line
In one terminal, write the following:
$ dask-scheduler
In another terminal, write the following:
$ dask-worker localhost:8786
The defaults are a bit different here: LocalCluster creates N single-threaded workers, while dask-worker starts one N-threaded worker. You can change these defaults with the following keywords:
$ dask-worker localhost:8786 --nthreads 1 --nprocs 4
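From your notebook you can then attach to this long-running scheduler instead of creating a new LocalCluster each time (assuming the default scheduler port 8786 on the same machine):
>>> from dask.distributed import Client
>>> client = Client('localhost:8786')  # attach to the scheduler started above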
I don't start my IPython cluster with the ipcluster command but with the individual commands ipcontroller and ipengine because I use several machines over a network. When starting the cluster with the ipcluster command, stopping the cluster is rather straightforward:
ipcluster stop
However, I haven't been able to find the procedure when using the individual commands separately.
Thanks for your help
The easiest way is by connecting a Client and issuing a shutdown command:
import ipyparallel as ipp
c = ipp.Client()
c.shutdown(hub=True)
Client.shutdown() shuts down engines; adding hub=True tells it to bring down the central controller process as well.