I am trying to use "completed_count()" to track how many tasks are left in a group in Celery.
My "client" runs this:
from celery import group
from proj import do

wordList = []
with open('word.txt') as wordData:
    for line in wordData:
        wordList.append(line)

readAll = group(do.s(i) for i in wordList)
result = readAll.apply_async()
while not result.ready():
    print(result.completed_count())
result.get()
The 'word.txt' is just a file with one word on each line.
Then I have the celery worker(s) set to run the do task as:
from time import sleep

@app.task(acks_late=True)
def do(word):
    sleep(1)
    return f"I'm doing {word}"
My broker is pyamqp and I use rpc for the backend.
I thought it would print an increasing count of completed tasks on each loop iteration on the client side, but all I get is 0s.
The problem is not in the completed_count method. You are getting zeros because result.ready() stays False even after all the tasks have completed. This looks like a bug in the rpc result backend; there is an issue about it on GitHub. Consider changing the backend setting to amqp, which works correctly as far as I can see.
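For reference, a minimal sketch of what that backend change could look like; the module name proj and the broker URL here are assumptions, not taken from the question:
from celery import Celery

app = Celery(
    'proj',
    broker='pyamqp://guest@localhost//',  # assumed broker URL
    backend='amqp',  # switch from 'rpc://' to the amqp result backend
)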
I've been using Celery for a while now; in production I use RabbitMQ as the broker and Redis for the backend in a K8s cluster, with no problems so far. Locally, I run a docker-compose setup with a few services (Flask API, 2 different workers, Beat, Redis, Flower, Hasura), using Redis as both the broker and the backend.
I haven't had problems with this setup for the past months, but yesterday I started getting erratic behavior while accessing task results.
Tasks are sent to the queue and the worker recognizes and performs them, but when querying for the task state I sometimes get DisabledBackend. It usually happens on the first request and then it works; I couldn't find a pattern for when it works and when it doesn't, it's erratic.
I've read somewhere that Celery doesn't work very well with Flask's built-in server, so I switched to uWSGI with pretty much the same setup I have in production:
[uwsgi]
wsgi-file = app/uwsgi.py
callable = application
http = :8080
processes = 4
threads = 2
master = true
chmod-socket = 660
vacuum = true
die-on-term = true
buffer-size = 32768
enable-threads = true
req-logger = python:uwsgi
I've seen a similar question for Django in which the problem seemed to be with mod_wsgi on Apache, which is not my case, but the behavior seems similar. Every other question I've seen was related to misconfiguration of the backend, which is not my case either.
Any ideas on what might be causing this?
Thanks.
So it seems that I need to access AsyncResult via my own Celery app instance instead of importing it from the celery package directly, or else pass the Celery app instance as an argument.
So, this doesn't work:
from celery.result import AsyncResult

@app.route('/status/<task_id>')
def get_status(task_id):
    task = AsyncResult(task_id)
    return task.state
This works:
from app import my_celery  # your own Celery application instance

@app.route('/status/<task_id>')
def get_status(task_id):
    task = my_celery.AsyncResult(task_id)
    return task.state
This also works:
from app import my_celery
from celery.result import AsyncResult

@app.route('/status/<task_id>')
def get_status(task_id):
    task = AsyncResult(task_id, app=my_celery)
    return task.state
I'm guessing what happens is that by calling AsyncResult directly from the celery package, it doesn't pick up my Celery app's configuration, hence it thinks that there's no backend configured to query results from.
But that would only explain complete failure of the function, not the erratic behavior. I'm guessing this is down to different threads, and situations in which the app instance happens to be imported already, so Celery finds it; not too sure though.
I've run a couple of tests and it seems to be working fine again after changing the imported AsyncResult, but I'll keep digging.
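For completeness, a minimal sketch of what the my_celery instance in app.py might look like so the views above have a configured backend to query; the Redis URLs here are assumptions, not taken from the question:
from celery import Celery

my_celery = Celery(
    'app',
    broker='redis://redis:6379/0',   # assumed broker URL
    backend='redis://redis:6379/1',  # assumed result backend URL
)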
I have 3 Ubuntu machines (CPU only). My Dask scheduler and client are both on the same machine, whereas the two Dask workers are running on the other two machines. When I launch the first task, it gets scheduled on the first worker, but then, upon launching the second task while the first one is still executing, it does not get scheduled on the second worker. Here is the sample client code that I tried.
### client.py
from dask.distributed import Client
import time, sys, os, random

def my_task(arg):
    print("doing something in my_task")
    time.sleep(2)
    print("inside my task..", arg)
    print("again doing something in my_task")
    time.sleep(2)
    print("return some random value")
    value = random.randint(1, 100)
    print("value::", value)
    return value

client = Client("172.25.49.226:8786")
print("client::", client)
future = client.submit(my_task, "hi")
print("future result::", future.result())
print("closing the client..")
client.close()
I am running "python client.py" two times, almost at the same time, from two different terminals/machines. Both clients appear to execute, but they produce exactly the same output, which they should not, since my_task() returns a random value. I tested this on the Ubuntu machines.
However, a month back I was able to run the same tasks in parallel on CentOS machines. Now, if I go back and run the same two tasks from those CentOS machines, the problem persists there too. This is strange: they do not run in parallel, and I'm not able to figure out this behavior of Dask. Am I missing any OS-level settings or something else?
Run the below almost at the same time,
python client.py # from one machine/terminal
python client.py # from another machine/terminal
These two tasks should run in parallel, each on a different worker (we have two free workers available), but this is not happening. I can't see any log output on the second worker's console nor on the scheduler while the first task continues to execute. In the end I noticed that both tasks finish at exactly the same time with exactly the same output.
However, the above client code works well in parallel on Windows, with each task running from a different terminal, but I would like to run it on the Ubuntu machines.
By default, if you call the same function on the same inputs, Dask will assume that this produces the same value and will only compute it once. You can override this behavior with the pure=False keyword:
future = client.submit(func, *args, pure=False)
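Applied to the client script in the question, that would look like the sketch below (nothing else needs to change); each submission now gets a unique key, so the two clients' tasks are computed separately:
future = client.submit(my_task, "hi", pure=False)  # unique key per submission
print("future result::", future.result())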
When using Dask's distributed scheduler I have a task that is running on a remote worker that I want to stop.
How do I stop it? I know about the cancel method, but this doesn't seem to work if the task has already started executing.
If it's not yet running
If the task has not yet started running you can cancel it by cancelling the associated future
future = client.submit(func, *args) # start task
future.cancel() # cancel task
If you are using dask collections then you can use the client.cancel method
x = x.persist() # start many tasks
client.cancel(x) # cancel all tasks
If it is running
However if your task has already started running on a thread within a worker then there is nothing that you can do to interrupt that thread. Unfortunately this is a limitation of Python.
Build in an explicit stopping condition
The best you can do is to build in some sort of stopping criterion into your function with your own custom logic. You might consider checking a shared variable within a loop. Look for "Variable" in these docs: http://dask.pydata.org/en/latest/futures.html
from dask.distributed import Client, Variable

client = Client()
stop = Variable()
stop.set(False)

def long_running_task():
    while not stop.get():
        ...  # do stuff

future = client.submit(long_running_task)
# ... wait a while
stop.set(True)
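A hedged variant of the same idea: give the Variable a name and pass it into the task explicitly, so the flag is coordinated through the scheduler rather than relying on the closed-over object; the name "stop-flag" is just an illustrative choice:
from dask.distributed import Client, Variable

client = Client()
stop = Variable("stop-flag")  # a named flag shared through the scheduler
stop.set(False)

def long_running_task(stop):
    # the flag is received as an argument instead of captured from the closure
    while not stop.get():
        ...  # do stuff

future = client.submit(long_running_task, stop)
# ... wait a while
stop.set(True)  # the loop exits on its next check of the flag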
I want to run graphs/futures on my distributed cluster which all have a 'load data' root task and then a bunch of training tasks that run on that data. A simplified version would look like this:
from dask.distributed import Client
client = Client(scheduler_ip)
load_data_future = client.submit(load_data_func, 'path/to/data/')
train_task_futures = [client.submit(train_func, load_data_future, params)
                      for params in train_param_set]
Running this as above the scheduler gets one worker to read the file, then it spills that data to disk to share it with the other workers. However, loading the data is usually reading from a large HDF5 file, which can be done concurrently, so I was wondering if there was a way to force all workers to read this file concurrently (they all compute the root task) instead of having them wait for one worker to finish then slowly transferring the data from that worker.
I know there is the client.run() method which I can use to get all workers to read the file concurrently, but how would you then get the data you've read to feed into the downstream tasks?
I cannot use the dask data primitives to concurrently read HDF5 files because I need things like multi-indexes and grouping on multiple columns.
Revisited this question and found a relatively simple solution, though it uses internal API methods and involves a blocking call to client.run(). Using the same variables as in the question:
from distributed import get_worker

client_id = client.id

def load_dataset():
    worker = get_worker()
    data = {'load_dataset-0': load_data_func('path/to/data')}
    info = worker.update_data(data=data, report=False)
    worker.scheduler.update_data(who_has={key: [worker.address] for key in data},
                                 nbytes=info['nbytes'], client=client_id)

client.run(load_dataset)
Now if you run client.has_what() you should see that each worker holds the key load_dataset-0. To use this in downstream computations you can simply create a future for the key:
from distributed import Future
load_data_future = Future('load_dataset-0', client=client)
and this can be used with client.compute() or dask.delayed as usual. Indeed the final line from the example in the question would work fine:
train_task_futures = [client.submit(train_func, load_data_future, params)
                      for params in train_param_set]
Bear in mind that this uses the internal API methods Worker.update_data and Scheduler.update_data, and that while it works fine as of distributed.__version__ == 1.21.6, it could be subject to change in future releases.
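As a small usage follow-up (a sketch; train_func and train_param_set are the names from the question), the completed training results can then be gathered back to the client as usual:
results = client.gather(train_task_futures)  # blocks until all training tasks finish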
As of today (distributed.__version__ == 1.20.2) what you ask for is not possible. The closest thing would be to compute once and then replicate the data explicitly
from dask.distributed import wait

future = client.submit(load, path)
wait(future)
client.replicate(future)
You may want to raise this as a feature request at https://github.com/dask/distributed/issues/new
When a periodic task is running, how do I get the task's return value? I need the result of the run.
This is my problem:
For example, my periodic task:
@shared_task(name='add')
def add():
    x, y = 1, 2
    return x + y
I add the task as a periodic task from the Django admin, then start the worker with the -B and DEBUG options. It runs well, but I want to get the return value. Is there any method to get the result while the periodic task is running?
To get the result of a task you can call the .get() method on the AsyncResult object, which is returned when you add the task to the queue:
result = add.delay()
result.get()  # returns 3
Also make sure that your results are stored for long enough by setting CELERY_RESULT_PERSISTENT or CELERY_TASK_RESULT_EXPIRES. Read more in the AMQP backend settings section.
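As a sketch of those settings (old-style upper-case names, matching the ones above; the backend choice and the one-day expiry are illustrative assumptions), they could go in your Django/Celery settings module:
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'  # assumed backend; use whichever you have configured
CELERY_RESULT_PERSISTENT = True  # keep results across broker restarts (AMQP result backends)
CELERY_TASK_RESULT_EXPIRES = 86400  # keep results for one day (seconds); assumed value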