CuPy stream synchronization using the with statement

Are the two codes equivalent?
Code 1:

with cp.cuda.Stream(non_blocking=False) as stream:
    # do stuff
    stream.synchronize()

Code 2:

with cp.cuda.Stream(non_blocking=False) as stream:
    # do stuff
stream.synchronize()

Yes, these two are equivalent. with cp.cuda.Stream(non_blocking=False) as stream: switches the current stream of CuPy, i.e., all the code within the block will be performed on stream.
In CuPy, streams are not synchronized automatically on exiting the context manager so that users can fully control the synchronization timing. This is handy when dealing with multiple streams, e.g.:
s1 = cp.cuda.Stream(non_blocking=False)
s2 = cp.cuda.Stream(non_blocking=False)

for i in range(10):
    with s1:
        # do some work
    with s2:
        # do some work

s1.synchronize()
s2.synchronize()
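As a quick check of the stream-switching behavior described above, you can compare stream pointers inside and outside the block. This is a minimal sketch; cp.cuda.get_current_stream() and the Stream.ptr attribute are part of CuPy's public API, the rest is purely illustrative:

import cupy as cp

s = cp.cuda.Stream(non_blocking=False)

print(cp.cuda.get_current_stream().ptr == s.ptr)      # False: default stream is current
with s:
    # inside the block, kernels are enqueued on `s`
    print(cp.cuda.get_current_stream().ptr == s.ptr)  # True
    x = cp.arange(10) ** 2
s.synchronize()  # explicit sync, as discussed above
print(x)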

Related

Nim closure doesn't update captured variable when running in thread. Why?

I have a closure that updates a captured external variable. It works well when the closure is called in the same thread where the variable is defined. But when I pass the closure to a child thread, it doesn't update the external variable as expected. What happens when a closure is passed to a thread? Are there any docs about it?
# nim c -r --threads:on testClosure.nim
import strutils

template hexDumpAddrOf[T](v: T): string =
  # echo fmt"size of addr: {sizeof(v.unsafeAddr)}, sizeof ByteAddress: {sizeof(ByteAddress)}"
  var p = cast[array[sizeof(ByteAddress), uint8]](v.unsafeAddr)
  var result = ""
  for x in p:
    result = x.toHex() & result
  result

proc closureThreadProcWrapper(closure: proc()) =
  closure()

proc testClosureThread() =
  var thr: Thread[proc()]
  var output = @["first"]  # to be updated by the thread
  echo "  original addr of output: ", hexDumpAddrOf(output)

  proc localClosure() =
    # The address of the captured output differs when running in a child thread.
    echo "  localClosure addr of output: ", hexDumpAddrOf(output)
    output.add "anything"

  localClosure()  # prints the same addr as the original one
  echo "invoked closure directly and external var is updated by closure: ", output  # prints @["first", "anything"]

  createThread(thr, closureThreadProcWrapper, localClosure)  # prints a different addr of output ???
  thr.joinThread
  echo "invoked closure in child thread. external var doesn't update as expected: ", output  # still @["first", "anything"] ???

when isMainModule:
  testClosureThread()
The output is:
  original addr of output: 00007F63349C8060
  localClosure addr of output: 00007F63349C8060
invoked closure directly and external var is updated by closure: @["first", "anything"]
  localClosure addr of output: 00007F63348C9060
invoked closure in child thread. external var doesn't update as expected: @["first", "anything"]
From the manual about the memory model for threads:
Nim's memory model for threads is quite different than that of other common programming languages (C, Pascal, Java): Each thread has its own (garbage collected) heap, and sharing of memory is restricted to global variables. This helps to prevent race conditions. GC efficiency is improved quite a lot, because the GC never has to stop other threads and see what they reference.
In your example, output is a local variable, not a global one. Hence you're modifying the thread's own copy, not the one from the "outside".

Can Python gRPC do computation when sending messages out?

Suppose I need to send a large amount of data from the client to the server using Python gRPC, and I want to continue with the rest of the computation while the message is being sent out instead of blocking. Is there any way to implement this?
I will illustrate the question with an example using modified code from greeter_client.py:
for i in range(5):
    res = computation()
    response = stub.SayHello(helloworld_pb2.HelloRequest(data=res))
I want the computation of the next iteration to continue while the "res" of the last iteration is being sent. To this end, I have tried async/await, which looks like this:
async with aio.insecure_channel('localhost:50051') as channel:
    stub = helloworld_pb2_grpc.GreeterStub(channel)
    for j in range(5):
        res = computation()
        response = await stub.SayHello(helloworld_pb2.HelloRequest(data=res))
But the running time is actually the same as the version without async/await; the async/await does not seem to help. I am wondering whether there is anything wrong in my code, or whether there are other ways?
Concurrency is different from parallelism. AsyncIO allows multiple coroutines to run on the same thread, but they are not actually computed at the same time. If the thread is given CPU-heavy work like computation() in your snippet, it doesn't yield control back to the event loop, so there won't be any progress on other coroutines.
Besides, in the snippet, each RPC depends on the result of computation(), which means the work is serialized within each iteration. But we can still gain some concurrency from AsyncIO by handing the RPCs over to the event loop with asyncio.gather():
async with aio.insecure_channel('localhost:50051') as channel:
    stub = helloworld_pb2_grpc.GreeterStub(channel)

    async def one_hello():
        res = computation()
        response = await stub.SayHello(helloworld_pb2.HelloRequest(data=res))

    await asyncio.gather(*(one_hello() for _ in range(5)))
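If computation() itself is CPU-heavy, it will still block the event loop in the snippet above. One way around that is to push it off the loop's thread, e.g. with asyncio.to_thread (Python 3.9+). This is a minimal sketch, assuming the generated helloworld stubs and a server from the question's setup; the computation() body is a placeholder, and note the GIL still limits pure-Python CPU parallelism (a ProcessPoolExecutor via loop.run_in_executor would sidestep that):

import asyncio
from grpc import aio

import helloworld_pb2
import helloworld_pb2_grpc

def computation():
    # stand-in for the CPU-heavy work from the question
    return b"some bytes"

async def main():
    async with aio.insecure_channel('localhost:50051') as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)

        async def one_hello():
            # Run the blocking computation in a worker thread so the
            # event loop stays free to drive the other RPCs.
            res = await asyncio.to_thread(computation)
            return await stub.SayHello(helloworld_pb2.HelloRequest(data=res))

        responses = await asyncio.gather(*(one_hello() for _ in range(5)))

asyncio.run(main())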

Access tasks results launched by other tasks in Dask

My application requires me to launch tasks from within other tasks, like the following
from dask.distributed import Client, get_client

def a():
    # ... some computation ...
    pass

def b():
    # ... some computation ...
    pass

def c():
    client = get_client()
    fa = client.submit(a)
    fb = client.submit(b)
    ra, rb = client.gather([fa, fb])
    return ra + rb

client = Client()  # or get_client() if a client already exists
res = client.submit(c)
However, I would like to have access to the intermediate results a and b (when calling c), but only c shows up in client.futures.
Is there a way to tell dask to keep the results for a and b?
I have tried to use the Future.add_done_callback method but it does not work for submit calls inside other submit calls.
Thank you
You probably want to look at Dask's coordination primitives like shared variables, queues, and pub/sub. https://docs.dask.org/en/latest/futures.html#coordination-primitives
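For example, a named Queue can carry the inner futures back to the outer client, which also keeps their results alive. This is a minimal sketch assuming dask.distributed's Queue can transmit futures (as described in the linked docs); the queue name "intermediates" and the toy task bodies are illustrative, not from the question:

from dask.distributed import Client, Queue, get_client

def a():
    return 1  # placeholder computation

def b():
    return 2  # placeholder computation

def c():
    client = get_client()
    fa = client.submit(a)
    fb = client.submit(b)
    # Publish the inner futures so other clients can see (and hold) them.
    q = Queue("intermediates")
    q.put(fa)
    q.put(fb)
    ra, rb = client.gather([fa, fb])
    return ra + rb

client = Client()
res = client.submit(c)

q = Queue("intermediates")
fa = q.get()          # a Future for a's intermediate result
print(fa.result())    # 1
print(res.result())   # 3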

How to find the concurrent.future input arguments for a Dask distributed function call

I'm using Dask to distribute work to a cluster. I'm creating a cluster and calling .submit() to submit a function to the scheduler, which returns a Future object. I'm trying to figure out how to obtain the input arguments of that future once it has completed.
For example:
from dask.distributed import Client
from dask_yarn import YarnCluster

def somefunc(a, b, c, ..., n):
    # do something
    return

cluster = YarnCluster.from_specification(spec)
client = Client(cluster)

future = client.submit(somefunc, arg1, arg2, ..., argn)
# ^^^ how do I obtain the input arguments for this future object?
# `future.args` doesn't work
Futures don't hold onto their inputs. You can do this yourself though.
futures = {}
future = client.submit(func, *args)
futures[future] = args
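As a usage sketch of this pattern, the dictionary lets you recover the inputs as results come in; func and the argument tuples below are illustrative placeholders, and as_completed is dask.distributed's standard helper:

from dask.distributed import Client, as_completed

def func(x, y):  # stand-in for the real work
    return x + y

client = Client()

futures = {}
for args in [(1, 2), (3, 4)]:
    fut = client.submit(func, *args)
    futures[fut] = args

for fut in as_completed(futures):
    print(futures[fut], fut.result())  # inputs alongside each result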
A future only knows the key by which it is uniquely known on the scheduler. At the time of submission, if it has dependencies, these are transiently found and sent to the scheduler, but no copy is kept locally.
The pattern you are after sounds more like delayed, which keeps hold of its graph, and indeed client.compute(delayed_thing) returns a future.
from dask import delayed

d = delayed(somefunc)(a, b, c)
future = client.compute(d)
dict(d.dask)  # graph of things needed by d
You could communicate directly with the scheduler to find the dependencies of some key, which will in general also be keys, and so reverse-engineer the graph, but that does not sound like a great path, so I won't try to describe it here.

Parallelizing methods in Rails

My Rails web app has dozens of methods that make calls to an API and process the query results. These methods have the following structure:
def method_one
  batch_query_API
  process_data
end

..........

def method_nth
  batch_query_API
  process_data
end

def summary
  method_one
  ......
  method_nth
  collect_results
end
How can I run all of the query methods at the same time instead of sequentially in Rails (without firing up multiple workers, of course)?
Edit: all of the methods are called from a single instance variable. I think this limits the use of Sidekiq or Delay in submitting jobs simultaneously.
Ruby has the excellent promise gem. Your example would look like:
require 'future'

def method_one
  ...
end

..........

def method_nth
  ...
end

def summary
  result1 = future { method_one }
  ......
  resultn = future { method_nth }
  collect_results result1, ..., resultn
end
Simple, isn't it? But let's get to more details. This is a future object:
result1 = future { method_one }
It means that result1 is being evaluated in the background. You can pass it around to other methods, but result1 doesn't have a result yet; it is still processing in the background. Think of passing around a Thread. The major difference is that the moment you try to read it, instead of passing it around, it blocks and waits for the result at that point. So in the above example, all of the result1 .. resultn variables keep getting evaluated in the background, but when the time comes to collect the results, and you try to actually read these values, the reads wait for the queries to finish at that point.
Install the promise gem and try the below in a Ruby console:

require 'future'

x = future { sleep 20; puts 'x calculated'; 10 }; nil
# adding a nil at the end so that the console does not immediately try to print x
y = future { sleep 25; puts 'y calculated'; 20 }; nil

# At this point, you'll still be using the console!
# The sleeps are happening in the background.

# Now do:
x + y
# At this point, the program actually waits for the x & y future blocks to complete.
You can take a look at a new option in town: the futoroscope gem.
As you can see from the announcing blog post, it tries to solve the same problem you are facing: making simultaneous API queries. It seems to have pretty good support and good test coverage.
Assuming that your problem is a slow external API, a solution could be the use of either threaded or asynchronous programming. By default, when doing IO, your code blocks. This basically means that if you have a method that does an HTTP request to retrieve some JSON, your method tells the operating system that you're going to sleep and you don't want to be woken up until the operating system has a response to that request. Since that can take several seconds, your application will just idly have to wait.
This behavior is not specific to just HTTP requests. Reading from a file or a device such as a webcam has the same implications. Software does this to prevent hogging up the CPU when it obviously has no use of it.
So the question in your case is: do we really have to wait for one method to finish before we can call another? In the event that the behavior of method_two depends on the outcome of method_one, then yes. But in your case, it seems that they are individual units of work without co-dependence, so there is potential for concurrent execution.
You can start new threads by initializing an instance of the Thread class with a block that contains the code you'd like to run. Think of a thread as a program inside your program. Your Ruby interpreter will automatically alternate between the thread and your main program. You can start as many threads as you'd like, but the more threads you create, the longer your main program may have to wait before returning to execution. However, we are probably talking microseconds or less. Let's look at an example of threaded execution.
def main_method
  Thread.new { method_one }
  Thread.new { method_two }
  Thread.new { method_three }
end

def method_one
  # something_slow_that_does_an_http_request
end

def method_two
  # something_slow_that_does_an_http_request
end

def method_three
  # something_slow_that_does_an_http_request
end
Calling main_method will cause all three methods to be executed in what appears to be parallel. In reality they are still being sequentially processed, but instead of going to sleep when method_one blocks, Ruby just returns to the main thread and switches back to the method_one thread when the OS has the input ready.
Assuming each method takes 2 ms to execute, minus the wait for the response, that means all three methods are running after just 6 ms - practically instantly.
If we assume that a response takes 500 ms to complete, that means you can cut down your total execution time from 2 + 500 + 2 + 500 + 2 + 500 to just 2 + 2 + 2 + 500 - in other words from 1506 ms to just 506 ms.
It will feel like the methods are running simultaneously, but in fact they are just sleeping simultaneously.
In your case, however, you have a challenge because you have an operation that is dependent on the completion of a set of previous operations. In other words, if you have tasks A, B, C, D, E and F, then A, B, C, D and E can be performed simultaneously, but F cannot be performed until A, B, C, D and E are all complete.
There are different ways to solve this. Let's look at a simple solution: creating a sleepy loop in the main thread that periodically examines a list of return values to check that some condition is fulfilled.
def task_1
  # Something slow
  return results
end

def task_2
  # Something slow
  return results
end

def task_3
  # Something slow
  return results
end

my_responses = {}

Thread.new { my_responses[:result_1] = task_1 }
Thread.new { my_responses[:result_2] = task_2 }
Thread.new { my_responses[:result_3] = task_3 }

# Prevent the main thread from continuing until the three spawned threads
# are done and have dumped their results in the hash. Sleeping 100 ms
# between checks avoids polling the response count thousands of times per
# second, which is most likely unnecessary.
while my_responses.count < 3
  sleep(0.1)
end

# Any code below this line will not execute until all three results are collected.
Keep in mind that multithreaded programming is a tricky subject with numerous pitfalls. With MRI it's not so bad because, while MRI will happily switch between blocked threads, it doesn't support executing two threads simultaneously, and that resolves quite a few concurrency concerns.
If you want to get into multithreaded programming, I recommend this book:
http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601
It's centered around Java, but the pitfalls and concepts explained are universal.
You should check out Sidekiq.
RailsCasts episode about Sidekiq.

Resources