The question should be simple enough, but I couldn't find anything about it.
I have an async Python program that contains a rather long-running task that I want to be able to suspend and restart at arbitrary points (arbitrary of course meaning everywhere where there's an await keyword).
I was hoping there was something along the lines of task.suspend() and task.resume() but it seems there isn't.
Is there an API for this on task- or event-loop-level or would I need to do this myself somehow? I don't want to place an event.wait() before every await...
What you're asking for is possible, but not trivial. First, note that you can never have suspends on every await, but only on those that result in suspension of the coroutine, such as asyncio.sleep(), or a stream.read() that doesn't have data ready to return. Awaiting a coroutine immediately starts executing it, and if the coroutine can return immediately, it does so without dropping to the event loop. await only suspends to the event loop if the awaitee (or its awaitee, etc.) requests it. More details in these questions: [1], [2], [3], [4].
With that in mind, you can use the technique from this answer to intercept each resumption of the coroutine with additional code that checks whether the task is paused and, if so, waits for the resume event before proceeding.
import asyncio
class Suspendable:
def __init__(self, target):
self._target = target
self._can_run = asyncio.Event()
self._can_run.set()
self._task = asyncio.ensure_future(self)
def __await__(self):
target_iter = self._target.__await__()
iter_send, iter_throw = target_iter.send, target_iter.throw
send, message = iter_send, None
# This "while" emulates yield from.
while True:
# wait for can_run before resuming execution of self._target
try:
while not self._can_run.is_set():
yield from self._can_run.wait().__await__()
except BaseException as err:
send, message = iter_throw, err
# continue with our regular program
try:
signal = send(message)
except StopIteration as err:
return err.value
else:
send = iter_send
try:
message = yield signal
except BaseException as err:
send, message = iter_throw, err
def suspend(self):
self._can_run.clear()
def is_suspended(self):
return not self._can_run.is_set()
def resume(self):
self._can_run.set()
def get_task(self):
return self._task
Test:
import time
async def heartbeat():
while True:
print(time.time())
await asyncio.sleep(.2)
async def main():
task = Suspendable(heartbeat())
for i in range(5):
print('suspending')
task.suspend()
await asyncio.sleep(1)
print('resuming')
task.resume()
await asyncio.sleep(1)
asyncio.run(main())
Related
I need to do a lot of work, but luckily it's easy to decouple into different tasks for asynchronous execution. Some of those depend on each other, and it's perfectly clear to me how on task can await multiple others to get their results. However, I don't know how I can have multiple different tasks await the same coroutine, and both get the result. The Documentation also doesn't mention this case as far as I can find.
Consider the following minimal example:
from asyncio import create_task, gather
async def TaskA():
... # This is clear
return result
async def TaskB(task_a):
task_a_result = await task_a
... # So is this
return result
async def TaskC(task_a):
task_a_result = await task_a
... # But can I even do this?
return result
async def main():
task_a = create_task(TaskA())
task_b = create_task(TaskB(task_a))
task_c = create_task(TaskC(task_a))
gather(task_b, task_c) # Can I include task_a here to signal the intent of "wait for all tasks"?
For the actual script, all tasks do some database operations, some of which involve foreign keys, and therefore depend on other tables already being filled. Some depend on the same table. I definitely need:
All tasks run once, and only once
Some tasks are dependent on others being done before starting.
In brief, the question is, does this work? Can I await the same instantiated coroutine multiple times, and get the result every time? Or do I need to put awaits in main(), and pass the result? (which is the current setup, and I don't like it.)
You can await the same task multiple times:
from asyncio import create_task, gather, run
async def coro_a():
print("executing coro a")
return 'a'
async def coro_b(task_a):
task_a_result = await task_a
print("from coro_b: ", task_a_result)
return 'b'
async def coro_c(task_a):
task_a_result = await task_a
print("from coro_a: ", task_a_result)
return 'c'
async def main():
task_a = create_task(coro_a())
print(await gather(coro_b(task_a), coro_c(task_a)))
if __name__ == "__main__":
run(main())
Will output:
executing coro a
from coro_b: a
from coro_a: a
['b', 'c']
What you can not do is to await the same coroutine multiples times:
...
async def main():
task_a = coro_a()
print(await gather(coro_b(task_a), coro_c(task_a)))
...
Will raise RuntimeError: cannot reuse already awaited coroutine.
As long as you schedule your coroutine coro_a using create_task your code will work.
Suppose I need to send a large amount of data from the client to the server using python gRPC. And I want to continue the rest computation when sending the message out instead of blocking the code. Is there any way can implement this?
I will illustrate the question by an example using the modified code from the greeter_client.py
for i in range(5):
res=computation()
response = stub.SayHello(helloworld_pb2.HelloRequest(data=res))
I want the computation of the next iteration continue while sending the "res" of last iteration. To this end, I have tried the "async/await", which looks like this
async with aio.insecure_channel('localhost:50051') as channel:
stub = helloworld_pb2_grpc.GreeterStub(channel)
for j in range(5):
res=computation()
response = await stub.SayHello(helloworld_pb2.HelloRequest(data=res))
But the running time is actually the same with the version without async/await. The async/await does not work. I am wondering is there anything wrong in my codes or there are other ways?
Concurrency is different than parallelism. AsyncIO allows multiple coroutines to run on the same thread, but they are not actually computed at the same time. If the thread is given a CPU-heavy work like "computation()" in your snippet, it doesn't yield control back to the event loop, hence there won't be any progress on other coroutines.
Besides, in the snippet, the RPC depends on the result of "computation()", this meant the work will be serialized for each RPC. But we can still gain some concurrency from AsyncIO, by handing them over to the event loop with asyncio.gather():
async with aio.insecure_channel('localhost:50051') as channel:
stub = helloworld_pb2_grpc.GreeterStub(channel)
async def one_hello():
res=computation()
response = await stub.SayHello(helloworld_pb2.HelloRequest(data=res))
await asyncio.gather(*(one_hello() for _ in range(5)))
I am trying to understand the use of Future.cancel() with asyncio. The Python documentation is very light on this. I've had no success with existing questions on here or search engines. I just want to understand happens when a task is awaiting a future which is cancelled.
Here is my code:
import asyncio
async def foo(future):
await asyncio.sleep(3)
future.cancel()
async def bar(future):
await future
print("hi")
async def baz(future):
await bar(future)
print("ho")
loop = asyncio.get_event_loop()
future = loop.create_future()
loop.create_task(baz(future))
loop.create_task(foo(future))
loop.run_forever()
"hi" is not seen printed. So I initially guessed bar was returning at the line await future in the case of a cancel.
However, "ho" is not printed either. So it seems logical that cancelling a future never yields back to tasks awaiting it? But then these tasks are sitting in the event loop forever? This seems undesirable, where have I misunderstood?
In this case the answer lies in the documentation, but you have to look for it a bit. First, a reminder of what it means to await a future:
# the expression:
x = await future
# is equivalent to:
... magically suspend the coroutine until the future.done() becomes true ...
x = future.result()
In other words, once the execution of the coroutine that contains await resumes, the value of the await statement will be the result() of the awaited future.
The question is: when you cancel a future, what is its result? The documentation says:
If the Future has been cancelled, this method raises a CancelledError exception.
So when someone cancel a future you awaited, the await future expression will raise an exception! This neatly explains why bar doesn't print hi (because await future has raised), and why baz doesn't print ho (because await bar(...) has raised).
A traceback is never printed because loop.create_task spawns the coroutine in the "background" (of sorts) - if no one inspects the return value, the exception will be lost. And since you threw away the task object returned by create_task and used run_forever to have the loop running forever, the loop just continues running, waiting (forever) for new tasks to somehow arrive.
If you changed the code to actually collect the result of bar, you would easily observe the CancelledError:
if __name__ == '__main__':
loop = asyncio.get_event_loop()
future = loop.create_future()
loop.create_task(foo(future))
loop.run_until_complete(baz(future))
Output:
Traceback (most recent call last):
File "xxx.py", line 19, in <module>
loop.run_until_complete(baz(future))
File "/usr/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
return future.result()
File "/usr/lib/python3.5/asyncio/futures.py", line 266, in result
raise CancelledError
concurrent.futures._base.CancelledError
I have a simple task which is scheduled by dask-scheduler and is running on a worker node.
My requirement is, I want to have the control to stop the task on demand as and when the user wants..
You will have to build this into your task, perhaps by explicitly checking a distributed Variable object in a loop.
from dask.distributed import Variable
stop = Variable()
stop.set(False)
def my_task():
while True:
if stop.get():
return
else:
# do stuff
future = client.submit(my_task)
# wait
stop.set(True)
You will need something explicit like this. Tasks are normally run in separate threads. As far as I know there is no way to interrupt a thread (though I would be happy to learn otherwise).
#MRocklin. thanks for your suggestion.. and here is the machinery that I've built around explicit stopping of the running/live task. Although the below code is not re-factored.. kindly trace the logic behind it.. Thanks - Manoranjan (I will mark your answer was really helpful..) :) keep doing good..
import os
import subprocess
from dask.distributed import Variable, Client
from multiprocessing import Process, current_process
import time
global stop
def my_task(proc):
print("my_task..")
print("child proc::", proc)
p = None
childProcessCreated = False
while True:
print("stop.get()::", stop.get())
if stop.get():
print("event triggered for stopping the live task..")
p.terminate()
return 100
else:
if childProcessCreated == False:
print("childProcessCreated::", childProcessCreated)
p = subprocess.Popen("python sleep.py", shell=False)
childProcessCreated = True
print("subprocess p::", p, " type::", type(p))
time.sleep(1)
print("returnning with 20")
return 20
if __name__ == '__main__':
clienta = Client("192.168.1.2:8786")
print("global declaration..")
global stop
stop = Variable("name-xx", client = clienta)
stop.set(False)
future = clienta.submit(my_task, 10)
print("future::waiting for 4 sec..in client side", future)
time.sleep(3)
print("future after sleeping for sec", future)
#print("result::", future.result())
stop.set(True)
print("future after stopping the child process::", future)
print("child process should be stopped by now..")
#print("future::", future)
#print("future result::",future.result())
print("over.!")
My Rails web app has dozens of methods from making calls to an API and processing query result. These methods have the following structure:
def method_one
batch_query_API
process_data
end
..........
def method_nth
batch_query_API
process_data
end
def summary
method_one
......
method_nth
collect_results
end
How can I run all query methods at the same time instead of sequential in Rails (without firing up multiple workers, of course)?
Edit: all of the methods are called from a single instance variable. I think this limits the use of Sidekiq or Delay in submitting jobs simultaneously.
Ruby has the excellent promise gem. Your example would look like:
require 'future'
def method_one
...
def method_nth
def summary
result1 = future { method_one }
......
resultn = future { method_nth }
collect_results result1, ..., resultn
end
Simple, isn't it? But let's get to more details. This is a future object:
result1 = future { method_one }
It means, the result1 is getting evaluated in the background. You can pass it around to other methods. But result1 doesn't have any result yet, it is still processing in the background. Think of passing around a Thread. But the major difference is - the moment you try to read it, instead of passing it around, it blocks and waits for the result at that point. So in the above example, all the result1 .. resultn variables will keep getting evaluated in the background, but when the time comes to collect the results, and when you try to actually read these values, the reads will wait for the queries to finish at that point.
Install the promise gem and try the below in Ruby console:
require 'future'
x = future { sleep 20; puts 'x calculated'; 10 }; nil
# adding a nil to the end so that x is not immediately tried to print in the console
y = future { sleep 25; puts 'y calculated'; 20 }; nil
# At this point, you'll still be using the console!
# The sleeps are happening in the background
# Now do:
x + y
# At this point, the program actually waits for the x & y future blocks to complete
Edit: Typo in result, should have been result1, change echo to puts
You can take a look at a new option in town: The futoroscope gem.
As you can see by the announcing blog post it tries to solve the same problem you are facing, making simultaneous API query's. It seems to have pretty good support and good test coverage.
Assuming that your problem is a slow external API, a solution could be the use of either threaded programming or asynchronous programming. By default when doing IO, your code will block. This basically means that if you have a method that does an HTTP request to retrieve some JSON your method will tell your operating system that you're going to sleep and you don't want to be woken up until the operating system has a response to that request. Since that can take several seconds, your application will just idly have to wait.
This behavior is not specific to just HTTP requests. Reading from a file or a device such as a webcam has the same implications. Software does this to prevent hogging up the CPU when it obviously has no use of it.
So the question in your case is: Do we really have to wait for one method to finish before we can call another? In the event that the behavior of method_two is dependent on the outcome of method_one, then yes. But in your case, it seems that they are individual units of work without co-dependence. So there is a potential for concurrency execution.
You can start new threads by initializing an instance of the Thread class with a block that contains the code you'd like to run. Think of a thread as a program inside your program. Your Ruby interpreter will automatically alternate between the thread and your main program. You can start as many threads as you'd like, but the more threads you create, the longer turns your main program will have to wait before returning to execution. However, we are probably talking microseconds or less. Let's look at an example of threaded execution.
def main_method
Thread.new { method_one }
Thread.new { method_two }
Thread.new { method_three }
end
def method_one
# something_slow_that_does_an_http_request
end
def method_two
# something_slow_that_does_an_http_request
end
def method_three
# something_slow_that_does_an_http_request
end
Calling main_method will cause all three methods to be executed in what appears to be parallel. In reality they are still being sequentually processed, but instead of going to sleep when method_one blocks, Ruby will just return to the main thread and switch back to method_one thread, when the OS has the input ready.
Assuming each method takes two 2 ms to execute minus the wait for the response, that means all three methods are running after just 6 ms - practically instantly.
If we assume that a response takes 500 ms to complete, that means you can cut down your total execution time from 2 + 500 + 2 + 500 + 2 + 500 to just 2 + 2 + 2 + 500 - in other words from 1506 ms to just 506 ms.
It will feel like the methods are running simultanously, but in fact they are just sleeping simultanously.
In your case however you have a challenge because you have an operation that is dependent on the completion of a set of previous operations. In other words, if you have task A, B, C, D, E and F, then A, B, C, D and E can be performed simultanously, but F cannot be performed until A, B, C, D and E are all complete.
There are different ways to solve this. Let's look at a simple solution which is creating a sleepy loop in the main thread that periodically examines a list of return values to make sure some condition is fullfilled.
def task_1
# Something slow
return results
end
def task_2
# Something slow
return results
end
def task_3
# Something slow
return results
end
my_responses = {}
Thread.new { my_responses[:result_1] = task_1 }
Thread.new { my_responses[:result_2] = task_2 }
Thread.new { my_responses[:result_3] = task_3 }
while (my_responses.count < 3) # Prevents the main thread from continuing until the three spawned threads are done and have dumped their results in the hash.
sleep(0.1) # This will cause the main thread to sleep for 100 ms between each check. Without it, you will end up checking the response count thousands of times pr. second which is most likely unnecessary.
end
# Any code at this line will not execute until all three results are collected.
Keep in mind that multithreaded programming is a tricky subject with numerous pitfalls. With MRI it's not so bad, because while MRI will happily switch between blocked threads, MRI doesn't support executing two threads simultanously and that solves quite a few concurrency concerns.
If you want to get into multithreaded programming, I recommend this book:
http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601
It's centered around Java, but the pitfalls and concepts explained are universal.
You should check out Sidekiq.
RailsCasts episode about Sidekiq.