Computing dask array chunks asynchronously (Dask + FastAPI) - dask

I am building a FastAPI application that will serve chunks of a Dask array. I would like to leverage FastAPI's asynchronous functionality alongside Dask distributed's ability to operate asynchronously. Below is an MCVE (minimal, complete, verifiable example) that demonstrates what I'm trying to do on both the server and client sides of the application:
Server-side:
import time

import dask.array as da
import numpy as np
import uvicorn
from dask.distributed import Client
from fastapi import FastAPI

app = FastAPI()

# create a dask array that we can serve
data = da.from_array(np.arange(0, 1e6, dtype=np.int64), chunks=100)

async def _get_block(block_id):
    """return one block of the dask array as a list"""
    block_data = data.blocks[block_id].compute()
    return block_data.tolist()

@app.get("/")
async def get_root():
    time.sleep(1)
    return {"Hello": "World"}

@app.get("/{block_id}")
async def get_block(block_id: int):
    time.sleep(1)  # so we can test concurrency
    my_list = await _get_block(block_id)
    return {"block": my_list}

if __name__ == "__main__":
    client = Client(n_workers=2)
    print(client)
    print(client.cluster.dashboard_link)
    uvicorn.run(app, host="0.0.0.0", port=9000, log_level="debug")
Client-side:
import dask
import requests
from dask.distributed import Client

client = Client()

responses = [
    dask.delayed(requests.get, pure=False)(f"http://127.0.0.1:9000/{i}")
    for i in range(10)
]
dask.compute(responses)
In this setup, the compute() call in _get_block is blocking, so only one chunk is computed at a time. I've tried various combinations of Client(asynchronous=True) and client.compute(dask.compute(responses)), without any improvement. Is it possible to await the computation of the dask array?

This line
    block_data = data.blocks[block_id].compute()
is a blocking call. If you instead did client.compute(data.blocks[block_id]), you would get an awaitable future that can be used in conjunction with your IOLoop, so long as Dask is using the same loop.
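A minimal sketch of what that could look like in the server above (the startup hook and the module-level client are assumptions, not part of the original code): connect an asynchronous Dask client on the running event loop, then await the future returned by client.compute.

from dask.distributed import Client

client = None

@app.on_event("startup")
async def startup():
    global client
    # asynchronous=True ties the client to the running asyncio event loop
    client = await Client(asynchronous=True)

async def _get_block(block_id):
    """Return one block of the dask array as a list, without blocking the loop."""
    future = client.compute(data.blocks[block_id])  # awaitable future
    block_data = await future
    return block_data.tolist()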
Note that the Intake server would very much like to work in this manner (it too aspires to stream data by chunks for arrays and other data types).

Related

An example of workers with the right resource allocation in Dask distributed

Does anyone have working sample code that shows how to use CPU and GPU workers selectively with the client.submit API that Dask distributed provides?
I am trying to train XGBoost with dask-cudf in a distributed manner on GPU machines, but I am not able to make it respect the resource tags I provide for different tasks.
My friend and coworker, @pentschev (GitHub), wanted to point you to this example from here:
https://github.com/dask/distributed/pull/4869#issue-909265778
import asyncio
import threading

import dask
from dask.distributed import Client, Scheduler, Worker
from distributed.threadpoolexecutor import ThreadPoolExecutor

def get_thread_name(prefix):
    return prefix + threading.current_thread().name

async def main():
    async with Scheduler() as s:
        async with Worker(
            s.address,
            nthreads=5,
            executor={
                "GPU": ThreadPoolExecutor(1, thread_name_prefix="Dask-GPU-Threads")
            },
            resources={"GPU": 1, "CPU": 4},
        ) as w:
            async with Client(s.address, asynchronous=True) as c:
                with dask.annotate(resources={"CPU": 1}, executor="default"):
                    print(await c.submit(get_thread_name, "CPU-"))
                with dask.annotate(resources={"GPU": 1}, executor="GPU"):
                    print(await c.submit(get_thread_name, "GPU-"))

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())
Output:
CPU-Dask-Default-Threads'-29802-2
GPU-Dask-GPU-Threads-29802-3
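For a deployed cluster (rather than the in-process Scheduler/Worker pair above), the equivalent is to start each worker with matching resource tags; an illustrative invocation (address and values are placeholders) might be:

dask-worker tcp://scheduler:8786 --nthreads 5 --resources "GPU=1 CPU=4"

Tasks annotated with dask.annotate(resources={...}), or submitted with the resources= keyword, should then only run on workers that advertise those resources.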

Display progress on dask.compute(*something) call

I have the following structure in my code using Dask:
@dask.delayed
def calculate(data):
    services = data.service_id
    prices = data.price
    return [services, prices]

output = []
for qid in notebook.tqdm(ids):
    r = calculate(parts[parts.quotation_id == qid])
    output.append(r)
It turns out that, when I call the dask.compute() method on my output list, I don't get any progress indication. The diagnostic UI doesn't "capture" this action, and I'm not even sure it's properly running (judging by my processor usage, I think it's not).
result = dask.compute(*output)
I'm following the "best practices" article from Dask's documentation:
https://docs.dask.org/en/latest/delayed-best-practices.html
What am I missing?
Edit: I think it's running, because I still get memory leak/high usage warnings. Still no progress indication.
As pointed out in the related post, Dask has two methods for displaying progress: one for "normal" Dask, and one for dask.distributed.
Here's a reproducible example:
import random
from time import sleep

import dask
from dask.diagnostics import ProgressBar
from dask.distributed import Client, progress

# simulate work
@dask.delayed
def work(x):
    sleep(x)
    return True

# generate tasks
random.seed(42)
tasks = [work(random.randint(1, 5)) for x in range(50)]
Using plain dask
ProgressBar().register()
dask.compute(*tasks)
produces a text-based progress bar in the terminal.
Using dask.distributed
client = Client()
futures = client.compute(tasks)
progress(futures)
produces a progress display (rendered as a widget in Jupyter, or as text output in a plain console).

Dask opportunistic caching in custom graphs

I have a custom DAG such as:
dag = {'load': (load, 'myfile.txt'),
'heavy_comp': (heavy_comp, 'load'),
'simple_comp_1': (sc_1, 'heavy_comp'),
'simple_comp_2': (sc_2, 'heavy_comp'),
'simple_comp_3': (sc_3, 'heavy_comp')}
And I'm looking to compute the keys simple_comp_1, simple_comp_2, and simple_comp_3, which I perform as follows:
import dask
from dask.distributed import Client
from dask_yarn import YarnCluster

task_1 = dask.get(dag, 'simple_comp_1')
task_2 = dask.get(dag, 'simple_comp_2')
task_3 = dask.get(dag, 'simple_comp_3')
tasks = [task_1, task_2, task_3]

cluster = YarnCluster()
cluster.scale(3)
client = Client(cluster)
dask.compute(tasks)
cluster.shutdown()
It seems that, without caching, computing these 3 keys leads to the computation of heavy_comp 3 times as well. Since this is a heavy computation, I tried to implement opportunistic caching from here, as follows:
from dask.cache import Cache
cache = Cache(2e9)
cache.register()
However, when I tried to print what was being cached, I got nothing:
>>> cache.cache.data
[]
>>> cache.cache.heap.heap
{}
>>> cache.cache.nbytes
{}
I even tried increasing the cache size to 6GB, however to no effect. Am I doing something wrong? How can I get Dask to cache the result of the key heavy_comp?
Expanding on MRocklin's answer, and to format the code from the comments below the question:
Computing the entire graph at once works as you would expect: heavy_comp is only executed once, which is what you want. Consider the following code you provided in the comments, completed with stub function definitions:
def load(fn):
    print('load')
    return fn

def sc_1(i):
    print('sc_1')
    return i

def sc_2(i):
    print('sc_2')
    return i

def sc_3(i):
    print('sc_3')
    return i

def heavy_comp(i):
    print('heavy_comp')
    return i

def merge(*args):
    print('merge')
    return args

dag = {'load': (load, 'myfile.txt'),
       'heavy_comp': (heavy_comp, 'load'),
       'simple_comp_1': (sc_1, 'heavy_comp'),
       'simple_comp_2': (sc_2, 'heavy_comp'),
       'simple_comp_3': (sc_3, 'heavy_comp'),
       'merger_comp': (merge, 'sc_1', 'sc_2', 'sc_3')}

import dask
result = dask.get(dag, 'merger_comp')
print('result:', result)
It outputs:
load
heavy_comp
sc_1
sc_2
sc_3
merge
result: ('sc_1', 'sc_2', 'sc_3')
As you can see, "heavy_comp" is only printed once, showing that the function heavy_comp has only been executed once.
The opportunistic cache in the core Dask library only works for the single-machine scheduler, not the distributed scheduler.
However, if you just compute the entire graph at once, Dask will hold onto intermediate values intelligently. If there are values you would like to hold onto regardless, you might also look at the persist function.
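A minimal sketch of the persist approach, rewritten with dask.delayed rather than the raw graph (the delayed wrappers are an assumption, not the original code): heavy_comp is computed once and held in cluster memory, and the three downstream tasks reuse it.

import dask

loaded = dask.delayed(load)('myfile.txt')
heavy = dask.delayed(heavy_comp)(loaded).persist()  # computed once, kept in memory

results = dask.compute(
    dask.delayed(sc_1)(heavy),
    dask.delayed(sc_2)(heavy),
    dask.delayed(sc_3)(heavy),
)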

How to let all workers do the same task in Dask?

I want to let all workers do the same task, like this:
from dask import distributed
from distributed import Client, LocalCluster
import dask
import socket

def writer(filename, data):
    with open(filename, 'w') as f:
        f.writelines(data)

def get_ip(x):
    return socket.gethostname()

# writer('/data/1.txt', a)
client = Client('192.168.123.1:8786')
A = client.submit(get_ip, 0, workers=['w1', 'w2'], pure=False)
print(client.ncores(),
      client.scheduler_info()
      # dask.config.get('distributed')
      )
A.result()
I have 2 workers, but it only prints 1 worker's hostname.
A simple way to achieve what you want is to use the Client.run method:
client.run(socket.gethostname)
This runs the function on all workers and returns all results. It does not use the normal task scheduling system, which is designed for a very different purpose from what you want.
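For example (the addresses and hostnames below are illustrative), Client.run returns a dict mapping each worker's address to that worker's result:

hostnames = client.run(socket.gethostname)
print(hostnames)
# {'tcp://192.168.123.2:40000': 'w1', 'tcp://192.168.123.3:40001': 'w2'}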

Kivy - threads, queues, clocks and Python sockets

I'm brand new to Kivy, and also new to GUI, but not new to programming.
I am completely missing the boat, the canoe, and the airplane on using Kivy.
In 30 years of programming, from machine code, assembly, Fortran, C, C++, Java, to Python, I've never tried to use a framework such as Kivy whose documentation is this thin, because it's so new. I know it'll get better, but I'm trying to use it now.
In my code, I'm trying to implement Queueing, so that I can obtain Python socket data. In normal Python programming, I would have IPC via a Queue - put data in, get data out.
I understand about Kivy, mostly from what I've read in various forums (I can't say I've found it in the documentation at kivy.org), that the following constraints apply:
Kivy needs to run in its own thread.
Nothing in Kivy should sleep.
Nothing in Kivy should do blocking IO.
After a LOT of Google searching, the only thing I've actually found that approaches being useful is an informative note here on Stack Overflow. However, while it almost solves my problem, the answer assumes I know more about Kivy than I do; I don't know how to incorporate the answer.
If someone could take the time to put together a COMPLETE short demo of using that example, or one of your own unique COMPLETE answers, I would much appreciate it!
Here's some short code I put together, but it doesn't work, because it blocks on the get() call.
from Queue import Queue
from kivy.lang import Builder
from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.properties import StringProperty
from kivy.clock import Clock
from threading import Thread

class ClockedQueue(BoxLayout):
    text1 = StringProperty("Start")

    def __init__(self):
        super(ClockedQueue, self).__init__()
        self.q = Queue()
        self.i = 0
        Clock.schedule_interval(self.get, 2, false)

    def get(self, dt):
        print("get entry")
        val = self.q.get()
        print(self.i + val)
        self.i += 1

class ClockedQueueApp(App):
    def build(self):
        return ClockedQueue()

class SourceQueue(Queue):
    def __init__(self):
        q = Queue()
        for word in ['First', 'Second']:
            q.put(word)
        print("SourceQueue finished.")

def main():
    th = Thread(target=SourceQueue)
    th.start()
    ClockedQueueApp().run()
    return 0

if __name__ == '__main__':
    main()
Thanks!
Here's some short code I put together, but it doesn't work, because it blocks on the get() call.
So what you really want to do is get items from your queue in a non-blocking way?
There are multiple ways to do this. The simplest seems to be to check whether the queue has any items before getting one. Queue has several methods that help with this, including checking whether it is empty, and making get non-blocking (by setting its first argument to False). If you do this instead of calling get on its own, you won't block waiting for the queue to have items: if it's empty, or you can't immediately get anything, you simply do nothing.
I don't know what you want to do with the items you get from the queue, but if they're short operations that don't take long, you won't need anything more than this. For instance, you could Clock.schedule_interval the get method to run every frame, do nothing if the queue is empty, and operate on the data whenever you get something back. No blocking, and no messing with your own threads. A sketch of this pattern follows below.
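Here is a minimal sketch of that non-blocking pattern, adapted from the question's get method (assuming Empty is also imported from Queue):

def get(self, dt):
    try:
        val = self.q.get(False)  # non-blocking: raises Empty instead of waiting
    except Empty:
        return  # nothing queued this frame; try again on the next interval
    print(self.i, val)
    self.i += 1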
You can also create your own thread and run the blocking code in it, which is a general way to deal with blocking issues, especially tasks that can't be split into short sections performed between frames. I don't know the details offhand, but it should just involve using Python threads normally. You can check the source of Kivy's UrlRequest for an example; it downloads from a URL in a background thread.
Edit: Also, your SourceQueue is messed up (you override its __init__ to make a new queue that you don't store anywhere), and your clock scheduling has a meaningless third argument false, which isn't even defined. I don't know what's going on there; it probably affects what you're trying to do, but it doesn't matter to my general answer above.
I was finally able to create something that worked.
Thanks everyone for your suggestions!
Here's the code (because I'm new, Stack Overflow wouldn't let me post it as an answer to my own question until 5:00 AM):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# threads_and_kivy.py
#
'''threads_and_kivy.py

Trying to build up a foundation that satisfies the following:
- has a thread that will implement code that:
  - simulates reading data from a Python socket
  - works on the data
  - puts the data onto a Python Queue
- has a Kivy main thread that:
  - via class ShowGUI
  - reads data from the Queue
  - updates a class variable of type StringProperty so it will
    update the label_text property.
'''
from threading import Thread
from Queue import Queue, Empty
import time

from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.properties import StringProperty
from kivy.lang import Builder
from kivy.clock import Clock

kv = '''
<ShowGUI>:
    Label:
        text: str(root.label_text)
'''
Builder.load_string(kv)

q = Queue()

class SimSocket():
    global q

    def __init__(self, queue):
        self.q = queue

    def put_on_queue(self):
        print("<-----..threaded..SimSocket.put_on_queue(): entry")
        for i in range(10):
            print(".....threaded.....SimSocket.put_on_queue(): Loop " + str(i))
            time.sleep(1)  # just here to simulate occasional data send
            self.some_data = ["SimSocket.put_on_queue(): Data Loop " + str(i)]
            self.q.put(self.some_data)
        print("..threaded..SimSocket.put_on_queue(): thread ends")

class ShowGUI(BoxLayout):
    label_text = StringProperty("Initial - no data")
    global q

    def __init__(self):
        super(ShowGUI, self).__init__()
        print("ShowGUI.__init__() entry")
        Clock.schedule_interval(self.get_from_queue, 1.0)

    def get_from_queue(self, dt):
        print("---------> ShowGUI.get_from_queue() entry")
        try:
            queue_data = q.get(timeout=5)
            self.label_text = queue_data[0]
            for qd in queue_data:
                print("SimKivy.get_from_queue(): got data from queue: " + qd)
        except Empty:
            print("Error - no data received on queue.")
            print("Unschedule Clock's schedule")
            Clock.unschedule(self.get_from_queue)

class KivyGui(App):
    def build(self):
        return ShowGUI()

def main():
    global q
    ss = SimSocket(q)
    simSocket_thread = Thread(name="simSocket", target=ss.put_on_queue)
    simSocket_thread.start()
    print("Starting KivyGui().run()")
    KivyGui().run()
    return 0

if __name__ == '__main__':
    main()
