Asyncio: start a task inside another task?

I'm trying to learn how to use asyncio, but I've hit a roadblock.
What am I trying to do? I'm trying to create a number of workers that start their own task as soon as they are created. So while task3 is being created and started, task1 should already be executing its task. I'm doing that by using a loop inside a single coroutine; at each iteration a worker is created and started.
The problem I'm facing: when the first worker completes its task, the others just stop and don't continue.
This is my code:
import asyncio
class Worker:
def __init__(self, session_name):
self.name = session_name
self.messagelist = ['--------1', '--------2', '--------3', '--------4']
async def job(self):
for i, message in enumerate(self.messagelist):
print(f"### Worker {self.name} says {message}")
await asyncio.sleep(20)
class Testmanager:
def __init__(self):
self.workers_name = ['test0', 'test1', 'test2', 'test3', 'test4']
async def create_and_start_workers(self, loop):
for i, name in enumerate(self.workers_name):
worker = Worker(name)
print(f"# Created worker {worker.name}")
loop.create_task(worker.job())
print(f"## Started worker {worker.name}")
await asyncio.sleep(10)
def start(self):
loop = asyncio.get_event_loop()
loop.run_until_complete(self.create_and_start_workers(loop))
loop.close()
manager = Testmanager()
manager.start()
When run, it initially works as expected, but after a while I get a lot of:
Task was destroyed but it is pending!
task: <Task pending coro=<Worker.job() done, defined at PATH_REDACTED> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x0000026AF6315438>()]>>
What am I doing wrong?
Thanks for the help.

What am I doing wrong?
You never await the tasks you create to run in parallel, so once create_and_start_workers returns after its last sleep, run_until_complete finishes and the loop is closed while the worker tasks are still pending; that is what the "Task was destroyed but it is pending!" warnings are telling you. For example:
async def create_and_start_workers(self, loop):
tasks = []
for i, name in enumerate(self.workers_name):
worker = Worker(name)
print(f"# Created worker {worker.name}")
tasks.append(loop.create_task(worker.job()))
print(f"## Started worker {worker.name}")
await asyncio.sleep(10)
await asyncio.gather(*tasks)
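On Python 3.7+ you can also let asyncio.run() manage the event loop instead of calling run_until_complete() and close() yourself. A minimal sketch of the same idea, reusing the Worker class from the question (this variant is not from the original answer):

import asyncio

class Testmanager:
    def __init__(self):
        self.workers_name = ['test0', 'test1', 'test2', 'test3', 'test4']

    async def create_and_start_workers(self):
        tasks = []
        for name in self.workers_name:
            worker = Worker(name)
            # create_task schedules the job on the currently running loop
            tasks.append(asyncio.create_task(worker.job()))
            print(f"## Started worker {worker.name}")
            await asyncio.sleep(10)
        # keep the coroutine (and the loop) alive until every worker has finished
        await asyncio.gather(*tasks)

    def start(self):
        asyncio.run(self.create_and_start_workers())

manager = Testmanager()
manager.start()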

Related

Can I await the same Task multiple times in Python?

I need to do a lot of work, but luckily it's easy to decouple into different tasks for asynchronous execution. Some of those depend on each other, and it's perfectly clear to me how one task can await multiple others to get their results. However, I don't know how I can have multiple different tasks await the same coroutine and both get the result. The documentation also doesn't mention this case as far as I can find.
Consider the following minimal example:
from asyncio import create_task, gather
async def TaskA():
... # This is clear
return result
async def TaskB(task_a):
task_a_result = await task_a
... # So is this
return result
async def TaskC(task_a):
task_a_result = await task_a
... # But can I even do this?
return result
async def main():
task_a = create_task(TaskA())
task_b = create_task(TaskB(task_a))
task_c = create_task(TaskC(task_a))
gather(task_b, task_c) # Can I include task_a here to signal the intent of "wait for all tasks"?
For the actual script, all tasks do some database operations, some of which involve foreign keys, and therefore depend on other tables already being filled. Some depend on the same table. I definitely need:
All tasks run once, and only once
Some tasks are dependent on others being done before starting.
In brief, the question is, does this work? Can I await the same instantiated coroutine multiple times, and get the result every time? Or do I need to put awaits in main(), and pass the result? (which is the current setup, and I don't like it.)
You can await the same task multiple times:
from asyncio import create_task, gather, run
async def coro_a():
print("executing coro a")
return 'a'
async def coro_b(task_a):
task_a_result = await task_a
print("from coro_b: ", task_a_result)
return 'b'
async def coro_c(task_a):
task_a_result = await task_a
print("from coro_a: ", task_a_result)
return 'c'
async def main():
task_a = create_task(coro_a())
print(await gather(coro_b(task_a), coro_c(task_a)))
if __name__ == "__main__":
run(main())
Will output:
executing coro a
from coro_b: a
from coro_c: a
['b', 'c']
What you cannot do is await the same coroutine multiple times:
...
async def main():
task_a = coro_a()
print(await gather(coro_b(task_a), coro_c(task_a)))
...
This will raise RuntimeError: cannot reuse already awaited coroutine.
As long as you schedule your coroutine coro_a with create_task, your code will work.
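As for the side question in the snippet above (whether task_a can also be passed to gather to express "wait for all tasks"): a task can be awaited by gather and by other coroutines at the same time, so a sketch along the following lines, reusing the coroutines above, should work:

async def main():
    task_a = create_task(coro_a())
    # task_a appears both as a gather argument and inside coro_b / coro_c
    results = await gather(task_a, coro_b(task_a), coro_c(task_a))
    print(results)  # ['a', 'b', 'c']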

Is it possible to suspend and restart tasks in async Python?

The question should be simple enough, but I couldn't find anything about it.
I have an async Python program that contains a rather long-running task that I want to be able to suspend and restart at arbitrary points (arbitrary of course meaning anywhere there's an await keyword).
I was hoping there was something along the lines of task.suspend() and task.resume() but it seems there isn't.
Is there an API for this on task- or event-loop-level or would I need to do this myself somehow? I don't want to place an event.wait() before every await...
What you're asking for is possible, but not trivial. First, note that you can never have suspends on every await, but only on those that result in suspension of the coroutine, such as asyncio.sleep(), or a stream.read() that doesn't have data ready to return. Awaiting a coroutine immediately starts executing it, and if the coroutine can return immediately, it does so without dropping to the event loop. await only suspends to the event loop if the awaitee (or its awaitee, etc.) requests it. More details in these questions: [1], [2], [3], [4].
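For instance, a minimal illustration of the difference, with made-up coroutine names (not part of the code below):

import asyncio

async def returns_immediately():
    # no suspending await inside: the caller's await runs this to completion
    # without ever giving control back to the event loop
    return 42

async def suspends():
    await asyncio.sleep(0)  # explicitly yields control to the event loop
    return 42

async def main():
    print(await returns_immediately())
    print(await suspends())

asyncio.run(main())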
With that in mind, you can use the technique from this answer to intercept each resumption of the coroutine with additional code that checks whether the task is paused and, if so, waits for the resume event before proceeding.
import asyncio
class Suspendable:
def __init__(self, target):
self._target = target
self._can_run = asyncio.Event()
self._can_run.set()
self._task = asyncio.ensure_future(self)
def __await__(self):
target_iter = self._target.__await__()
iter_send, iter_throw = target_iter.send, target_iter.throw
send, message = iter_send, None
# This "while" emulates yield from.
while True:
# wait for can_run before resuming execution of self._target
try:
while not self._can_run.is_set():
yield from self._can_run.wait().__await__()
except BaseException as err:
send, message = iter_throw, err
# continue with our regular program
try:
signal = send(message)
except StopIteration as err:
return err.value
else:
send = iter_send
try:
message = yield signal
except BaseException as err:
send, message = iter_throw, err
def suspend(self):
self._can_run.clear()
def is_suspended(self):
return not self._can_run.is_set()
def resume(self):
self._can_run.set()
def get_task(self):
return self._task
Test:
import time
async def heartbeat():
while True:
print(time.time())
await asyncio.sleep(.2)
async def main():
task = Suspendable(heartbeat())
for i in range(5):
print('suspending')
task.suspend()
await asyncio.sleep(1)
print('resuming')
task.resume()
await asyncio.sleep(1)
asyncio.run(main())

Sidekiq - pick up DB changes made by another worker

In a Rails app, I have projects which have many tasks.
A task may have a predecessor that needs to be completed before the task can start.
I use Sidekiq for creating tasks.
class ScheduleProjectJob < ApplicationJob
queue_as :default
def perform(project)
tasks = Array(project.tasks)
while !tasks.empty? do
task = tasks.shift
if task.without_predecessor? || task.predecessor_scheduled?
ScheduleTaskJob.perform_later(task)
else
tasks << task
end
end
end
end
I loop through the tasks and schedule a task if it doesn't have a predecessor or, in case it has one, when the predecessor has already been scheduled.
To check if the predecessor has been scheduled, I check in the database whether the predecessor's state is scheduled (tasks are created in the created state and updated to scheduled at the end of ScheduleTaskJob).
The check is as follows:
Task.joins(:task_template).
where(%q(task_templates.dep_id = :dep AND
task_templates.tag = :tag AND
tasks.state = :state),
dep: task_template.dep_id,
tag: task_template.runs_after_tag,
state: 'scheduled').
count > 0
The query above seems to work fine when I manually set the DB up and run it.
However, when it runs inside the ScheduleProjectJob, the state of the predecessor task is always reported as created, even though I can see in the DB that the record has been updated to scheduled.
Am I missing anything here?
ActiveRecord caches query results. When you expect a query's result to change, wrap the query with:
ActiveRecord::Base.uncached do # or YourModel.uncached do
some_query.count
end

How to explicitly stop a running/live task through dask?

I have a simple task that is scheduled by the dask-scheduler and is running on a worker node.
My requirement is to be able to stop the task on demand, whenever the user wants.
You will have to build this into your task, perhaps by explicitly checking a distributed Variable object in a loop.
from dask.distributed import Variable

stop = Variable()
stop.set(False)

def my_task():
    while True:
        if stop.get():
            return
        else:
            pass  # do stuff

# `client` is assumed to be an existing dask.distributed.Client
future = client.submit(my_task)
# wait
stop.set(True)
You will need something explicit like this. Tasks are normally run in separate threads. As far as I know there is no way to interrupt a thread (though I would be happy to learn otherwise).
@MRocklin, thanks for your suggestion. Here is the machinery I've built around explicitly stopping the running/live task. The code below is not refactored, but the logic should be easy to trace. Thanks - Manoranjan (your answer was really helpful) :) keep doing good.
import subprocess
import time

from dask.distributed import Variable, Client

def my_task(proc):
    # Runs on a dask worker: start a child process once, then keep polling the
    # distributed Variable and terminate the child when it is set to True.
    print("my_task..")
    print("child proc::", proc)
    p = None
    child_process_created = False
    while True:
        print("stop.get()::", stop.get())
        if stop.get():
            print("event triggered for stopping the live task..")
            if p is not None:
                p.terminate()
            return 100
        else:
            if not child_process_created:
                print("child_process_created::", child_process_created)
                # sleep.py stands in for the long-running work being wrapped
                p = subprocess.Popen(["python", "sleep.py"])
                child_process_created = True
                print("subprocess p::", p, " type::", type(p))
            time.sleep(1)

if __name__ == '__main__':
    clienta = Client("192.168.1.2:8786")
    # The Variable is created by name so the task running on the worker can read it
    stop = Variable("name-xx", client=clienta)
    stop.set(False)
    future = clienta.submit(my_task, 10)
    print("future:: waiting a few seconds on the client side", future)
    time.sleep(3)
    print("future after sleeping::", future)
    stop.set(True)
    print("future after stopping the child process::", future)
    print("child process should be stopped by now..")
    print("over.!")

Update args of a scheduled job

I have scheduled a job
Worker.perform_at(time, args)
And I can fetch the scheduled jobs
job = Sidekiq::ScheduledSet.new.find_job(jid)
job.args # this is the args I passed above
I need to update the args that will be passed to the worker when it is called, i.e. update job.args. How do I do that?
This won't work:
job.args = new_args
Sidekiq::ScheduledSet.new.to_a[0] = job
Updating the task is not the way to achieve this; instead, cancel the job and create a new one with the new args:
job = Sidekiq::ScheduledSet.new.find_job(jid)
# time = job.time  # or just set the time needed
Sidekiq::Status.cancel jid
Worker.perform_at(time, new_args)
This will also make it easier for you to debug and log the jobs, because editing/updating them on the fly could cause bugs that are very hard to identify.
