How can I keep a PBSCluster running? - dask

I have access to a cluster running PBS Pro and would like to keep a PBSCluster instance running on the headnode. My current (obviously broken) script is:
import dask_jobqueue
from paths import get_temp_dir
def main():
temp_dir = get_temp_dir()
scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1, scheduler_options=scheduler_options)
if __name__ == '__main__':
main()
This script is obviously broken because after the cluster is created the main() function exits and the cluster is destroyed.
I imagine I must call some sort of execute_io_loop function, but I can't find anything in the API.
So, how can I keep my PBSCluster alive?

I'm thinking that the section of the Python API (advanced) in the docs might be a good way to try to solve this issue.
Mind you this is an example of how to create Schedulers and Workers, but I'm assuming that the logic could be used in a similar way for your case.
import asyncio
async def create_cluster():
temp_dir = get_temp_dir()
scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1, scheduler_options=scheduler_options)
if __name__ == "__main__":
asyncio.get_event_loop().run_until_complete(create_cluster())
You might have to change the code a bit, but it should keep your create_cluster running until it finished.
Let me know if this works for you.

Related

Luigi: Running Tasks in a Docker Image

I am testing Luigi's abilities for running tasks in a Docker container, i.e. I would like Luigi to spawn a container from a given Docker image and execute a task therein.
As far as I understood, there is luigi.contrib.docker_runner.DockerTask for this purpose. I tried to modify its command and add an output:
import luigi
from luigi.contrib.docker_runner import DockerTask
class Task(DockerTask):
def output(self):
return luigi.LocalTarget("bla.txt")
def command(self):
return f"touch {self.output()}"
if __name__ == "__main__":
luigi.build([Task()], workers=2, local_scheduler=True)
But I am getting
It seems that there is a TypeError in docker. Is my use of DockerTask erroneous? Unfortunately, I cannot find any examples for use cases...
Here the answer from my reply to your email:
luigi.LocalTarget doesn't return a file path or string, but a target object, therefore f"touch {self.output()}" isn't a valid shell command.
Second, I just looked into the documentation of DockerTask.
https://luigi.readthedocs.io/en/stable/_modules/luigi/contrib/docker_runner.html#DockerTask
command is should be a property, not a method. This explains your error message, the code calls "Task().command", which is a function, and not a string. A solution might be to use a static attribute for the output name
class Task(DockerTask):
output_name = "bla.txt"
def output(self):
return luigi.LocalTarget(self.output_name)
command = f"touch {self.output_name}"
or to use a property decorator:
class Task(DockerTask):
def output(self):
return luigi.LocalTarget("bla.txt")
#property
def command(self):
target = self.output() # not a string/fpath
return f"touch {target.path}"
though I'm not sure whether this would work, I'd prefer the first option.
In the future, make sure to check the documentation for the base class that you're implementing, e.g. checking if an attribute is a method or property, maybe use a debugger to understand what's happening.
And for the sake of searchability and saving data and bandwidth, in the future try to post error logs not as screenshots but in text, either as a snippet or via a link to a pastebin or something like that.

How to monitor Python Application(without django or other framework) in instana

i have a simple python application without any framework and API calls.
how i will monitor python application on instana kubernates.
i want code snippet to add with python application ,which trace application
and display on instana
how i will monitor python application on instana kubernates
There is publicly available guide, that should help you setting up the kubernetes agent.
i have a simple python application without any framework and API calls
Well, instana is for distributed tracing, meaning distributed components calling each other, each other's APIs predominantly by using known frameworks (with registered spans).
Nevertheless, you could make use of SDKSpan, here is a super simple example:
import os
os.environ["INSTANA_TEST"] = "true"
import instana
import opentracing.ext.tags as ext
from instana.singletons import get_tracer
from instana.util.traceutils import get_active_tracer
def foo():
tracer = get_active_tracer()
with tracer.start_active_span(
operation_name="foo_op",
child_of=tracer.active_span
) as foo_scope:
foo_scope.span.set_tag(ext.SPAN_KIND, "exit")
result = 20 + 1
foo_scope.span.set_tag("result", result)
return result
def main():
tracer = get_tracer()
with tracer.start_active_span(operation_name="main_op") as main_scope:
main_scope.span.set_tag(ext.SPAN_KIND, "entry")
answer = foo() + 21
main_scope.span.set_tag("answer", answer)
if __name__ == '__main__':
main()
spans = get_tracer().recorder.queued_spans()
print('\nRecorded Spans and their IDs:',
*[(index,
span.s,
span.data['sdk']['name'],
dict(span.data['sdk']['custom']['tags']),
) for index, span in enumerate(spans)],
sep='\n')
This should work in any environment, even without an agent and should give you an output like this:
Recorded Spans and their IDs:
(0, 'ab3af60079f3ca57', 'foo_op', {'span.kind': 'exit', 'result': 21})
(1, '53b67f7298684cb7', 'main_op', {'span.kind': 'entry', 'answer': 42})
Of course in a production, you wouldn't want to print the recorded spans, but send it to the well configured agent, so you should remove setting the INSTANA_TEST.

how to set up proper printout destination for dask multiprocessing in jupyter notebook on linux

I am using dask in jupyter notebook on a linux server to run python functions on multiple CPUs. The python functions have standard print statement. I would like the output of the print to be shown in the jupyter notebook right below the cell. However, the print out were all shown in the console. Can anyone explain why this happens and how to make dask.function.print output to the notebook, or both the console and the notebook.
The following is a simplified version of the problem:
import dask
import functools
from dask import compute, delayed
iter_list=[0,1]
def iFunc(item):
print('Meme',item)
# call this function itself will print normally to the
# notebook below the cell, desired.
with dask.config.set(scheduler='processes',num_workers=2):
func1=functools.partial(iFunc)
ret=compute([delayed(func1)(item) for item in iter_list])
# surprisingly, Meme 0, Meme 1 only print out to the console,
# not the notebook, Not desired, hard to debug. Any clue?
The whole point of dask is leveraging multiple threads, processes, or nodes/machines to distribute work. The workers you create are therefore not on the same thread as your client, and may not be on the same process, or even the same machine (or like, in the same country) as your client, depending on how you set up your cluster.
If you start a LocalCluster from your jupyter notebook, whether you're using threads or processes, you should see printed output appearing as output in the cells which execute jobs on the workers:
In [1]: import dask.distributed as dd
In [2]: client = dd.Client(processes=4)
In [3]: def job():
...: print("hello from a worker!")
In [4]: client.submit(job).result()
hello from a worker!
However, if a different process is spinning up your workers, it is up to that process to decide how to handle stdout. So if you're spinning up workers using the jupyterlab terminal, stdout will appear there. If you're spinning up workers in a kubernetes pod, stdout will appear in the worker logs. Dask doesn't actively manage standard out, so it's up to you to handle this. Note that this also applies to logging - neither stdout nor logs are captured by dask. This is actually a really important design feature - many distributed systems have their own systems for managing the standard out & logging of nodes, and dask does not want to impose its own parallel/conflicting system for handling output. The main focus of dask is executing the tasks, not managing a distributed logging system.
That said, dask does have the infrastructure for passing around messages, and this is something the package could support. There is an open issue and pull request attempting to add this ability as a feature, but it looks like there are a lot of open design questions that would need to be resolved before this could be added. Many of them revolve around the issues I raised above - how to add a clean distributed logging feature without overburdening the scheduler, complicating the already complex set of configuration options, or overriding the important, existing logging systems users rely on. The dask core team seems to agree that this is a good idea, if the tough design questions can be resolved.
You certainly always have the option of returning messages. For example, the following would work:
In [10]: def job():
...: return_blob = {"diagnostics": {}, "messages": [], "return_val": None}
...: start = time.time()
...: return_blob["diagnostics"]["start"] = start
...:
...: try:
...: return_blob["messages"].append("raising error")
...: # this causes a DivideByZeroError
...: return_blob["return_val"] = 1 / 0
...: except Exception as e:
...: return_blob["diagnostics"]["error"] = e
...:
...: return_blob["diagnostics"]["end"] = time.time()
...: return return_blob
...:
In [11]: client.submit(job).result()
Out[11]:
{'diagnostics': {'start': 1644091274.738912,
'error': ZeroDivisionError('division by zero'),
'end': 1644091274.7389162},
'messages': ['raising error'],
'return_val': None}

Dask Client detect local default cluster already running

from dask.distributed import Client
Client()
Client(do_not_spawn_new_if_default_address_in_use=True) # should not spawn a new default cluster
Is this possible to do somehow?
The non-public function distributed.client._get_global_client() will return the current client if it exists, or None
client = _get_global_client() or Client()
Since it is internal, the API may change without notice.
You shouldn't really be creating multiple clients within the same Python session. It may be worth digging deeper into why you are calling Client() more than once.
If you already have a Dask cluster running on the default address you could set the DASK_SCHEDULER_ADDRESS environment variable which will instruct the client to look there instead of creating a local cluster.
>>> import os
>>> os.environ['DASK_SCHEDULER_ADDRESS'] = 'tcp://localhost:8786'
>>> from dask.distributed import Client
>>> Client() # Does not create a cluster
<Client: 'tcp://127.0.0.1:8786' processes=0 threads=0, memory=0 B>

Seeing logs of dask workers

I'm having trouble changing the temporary directory in Dask. When I change the temporary-directory in dask.yaml for some reason Dask is still writing out in /tmp (which is full). I now want to try and debug this, but when I use client.get_worker_logs() I only get INFO output.
I start my cluster with
from dask.distributed import LocalCluster, Client
cluster = LocalCluster(n_workers=1, threads_per_worker=4, memory_limit='10gb')
client = Client(cluster)
I already tried adding distributed.worker: debug to the distributed.yaml, but this doesn't change the output. I also check I am actually changing the configuration by calling dask.config.get('distributed.logging')
What am I doing wrong?
By default LocalCluster silences most logging. Try passing the silence_logs=False keyword
cluster = LocalCluster(..., silence_logs=False)

Resources