Can Python gRPC do computation when sending messages out?

Suppose I need to send a large amount of data from the client to the server using Python gRPC, and I want to continue with the rest of the computation while the message is being sent out, instead of blocking on the call. Is there any way to implement this?
I will illustrate the question with an example based on modified code from greeter_client.py:
for i in range(5):
    res = computation()
    response = stub.SayHello(helloworld_pb2.HelloRequest(data=res))
I want the computation of the next iteration to continue while the "res" of the last iteration is being sent. To this end, I have tried async/await, which looks like this:
async with aio.insecure_channel('localhost:50051') as channel:
    stub = helloworld_pb2_grpc.GreeterStub(channel)
    for j in range(5):
        res = computation()
        response = await stub.SayHello(helloworld_pb2.HelloRequest(data=res))
But the running time is actually the same as the version without async/await, so async/await does not help here. I am wondering whether there is anything wrong in my code, or whether there is another way to do this?

Concurrency is different from parallelism. AsyncIO allows multiple coroutines to run on the same thread, but they are not actually computed at the same time. If the thread is given CPU-heavy work like computation() in your snippet, it doesn't yield control back to the event loop, so there won't be any progress on other coroutines.
Besides, in the snippet, each RPC depends on the result of computation(), which means the work is serialized for every RPC. But we can still gain some concurrency from AsyncIO by handing the RPCs over to the event loop with asyncio.gather():
async with aio.insecure_channel('localhost:50051') as channel:
    stub = helloworld_pb2_grpc.GreeterStub(channel)

    async def one_hello():
        res = computation()
        response = await stub.SayHello(helloworld_pb2.HelloRequest(data=res))

    await asyncio.gather(*(one_hello() for _ in range(5)))
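If computation() is genuinely CPU-bound, one further option (not shown above) is to push it off the event-loop thread with loop.run_in_executor, so the loop stays free to drive the RPCs while the work runs elsewhere. A minimal sketch, reusing the names from the snippets above and assuming computation takes no arguments and is picklable (a process pool is used to sidestep the GIL):

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def main():
    loop = asyncio.get_running_loop()
    async with aio.insecure_channel('localhost:50051') as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)

        async def one_hello(pool):
            # Offload the CPU-heavy work to another process so the event
            # loop stays free to drive the outgoing RPCs in the meantime.
            res = await loop.run_in_executor(pool, computation)
            return await stub.SayHello(helloworld_pb2.HelloRequest(data=res))

        with ProcessPoolExecutor() as pool:
            await asyncio.gather(*(one_hello(pool) for _ in range(5)))

asyncio.run(main())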

Related

Why is this gen_statem call blocking?

Hello, I am trying to figure out why the calls block when using gen_statem, since my FSM is running in a separate process.
-module(fsm).
-record(data,{
    current="None",
    intvCount=0,
    jobCount=0
}).
-export([init/1,terminate/3,callback_mode/0,code_change/4]).
-export([state/1,start/0,interview/2,reject/2,wait/1]).
-export([sitting_home/3,interviewing/3]).
-export([handle_event/3]).
-behaviour(gen_statem).

handle_event({call,From},get_state,Data)->
    io:format("why you need state>"),
    {keep_state,Data};
handle_event({call,From},Event,Data)->
    {keep,state,Data}.

%API
start()->
    gen_statem:start_link(?MODULE,[],[]).
state(PID)->
    gen_statem:call(PID,get_state).
interview(PID,Company)->
    gen_statem:call(PID,{intv,Company}).
reject(PID,Company)->
    gen_statem:call(PID,{reject,Company}).
wait(PID)->
    gen_statem:call(PID,{wait}).

%mandatory
code_change(V,State,Data,Extra)->{ok,State,Data}.
callback_mode() ->
    state_functions.
init([])->
    {ok,sitting_home,#data{current="None",jobCount=0,intvCount=0}}.
terminate(Reasom,State,Data)->
    void.

% State implementations
sitting_home({call,From},{intv,Company},Data=#data{intvCount=C})->
    io:format("called for interview"),
    {next_state,interviewing,Data#data{intvCount=C+1},{reply,From,something}};
sitting_home({call,From},Event,Data)->
    {keep_state,Data}.
interviewing({call,From},{rejected,Company},Data)->
    {next_state,sitting_home,Data,{reply,From,somethingelse}};
interviewing({call,From},wait,Data)->
    {keep_state,Data}.
Usage:
> {ok,Pid} = fsm:start().
> fsm:state(Pid).     % blocks!
> {ok,Pid} = fsm:start().
> fsm:interview(Pid, some_company).
called for interview    % and blocks
Why do both calls block? I am spawning a separate process (besides the shell) in which the FSM runs, using gen_statem:start_link.
Update
I have updated my post after it was pointed out that I forgot to use reply in order to send something back to the caller. However, handle_event/3 still blocks even in this form:
handle_event({call,From},get_state,Data)->
    {keep_state,Data,[{reply,From,Data}]}.
Because that's what gen_statem:call does:
Makes a synchronous call to the gen_statem ServerRef by sending a request and waiting until its reply arrives
and your state functions don't send any replies. They should look like
sitting_home({call,From},{intv,Company},Data=#data{intvCount=C})->
    io:format("called for interview"),
    {next_state,interviewing,Data#data{intvCount=C+1},{reply,From,WhateverReplyYouWant}};
or
sitting_home({call,From},{intv,Company},Data=#data{intvCount=C})->
    io:format("called for interview"),
    gen_statem:reply(From, WhateverReplyYouWant),
    {next_state,interviewing,Data#data{intvCount=C+1}};
If there's no useful reply, consider:
- using cast instead of call (and handling cast as the EventType in your state functions), or
- ok as the reply.

How to find the concurrent.future input arguments for a Dask distributed function call

I'm using Dask to distribute work to a cluster. I'm creating a cluster and calling .submit() to submit a function to the scheduler, which returns a Future object. I'm trying to figure out how to obtain the input arguments of that future object once it has completed.
For example:
from dask.distributed import Client
from dask_yarn import YarnCluster

def somefunc(a, b, c, ..., n):
    # do something
    return

cluster = YarnCluster.from_specification(spec)
client = Client(cluster)

future = client.submit(somefunc, arg1, arg2, ..., argn)
# ^^^ how do I obtain the input arguments for this future object?
# `future.args` doesn't work
Futures don't hold onto their inputs. You can do this yourself though.
futures = {}
future = client.submit(func, *args)
futures[future] = args
A future only knows the key by which it is uniquely known on the scheduler. At the time of submission, if it has dependencies, these are transiently found and sent to the scheduler, but no copy is kept locally.
The pattern you are after sounds more like delayed, which keeps hold of its graph, and indeed client.compute(delayed_thing) returns a future.
d = delayed(somefunc)(a, b, c)
future = client.compute(d)
dict(d.dask) # graph of things needed by d
You could communicate directly with the scheduler to find the dependencies of some key, which will in general also be keys, and so reverse-engineer the graph, but that does not sound like a great path, so I won't try to describe it here.
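Putting the first pattern together, here is a small self-contained sketch of tracking inputs yourself and recovering them as futures complete. The local Client and the toy somefunc are placeholders for illustration; with YarnCluster you would keep Client(cluster) as in the question:

from dask.distributed import Client, as_completed

client = Client()  # local cluster for illustration; use Client(cluster) in practice

def somefunc(a, b):
    return a + b

futures = {}
for a, b in [(1, 2), (3, 4), (5, 6)]:
    future = client.submit(somefunc, a, b)
    futures[future] = (a, b)          # remember the inputs ourselves

for future in as_completed(futures):
    print(futures[future], "->", future.result())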

How to parallelize HTTP requests within an Apache Beam step?

I have an Apache Beam pipeline running on Google Dataflow whose job is rather simple:
It reads individual JSON objects from Pub/Sub
Parses them
And sends them via HTTP to some API
This API requires me to send the items in batches of 75. So I built a DoFn that accumulates events in a list and publishes them via this API once I have 75. This turned out to be too slow, so I thought of executing those HTTP requests on different threads using a thread pool.
The implementation of what I have right now looks like this:
private class WriteFn : DoFn<TheEvent, Void>() {
    @Transient lateinit var api: TheApi
    @Transient lateinit var currentBatch: MutableList<TheEvent>
    @Transient lateinit var executor: ExecutorService

    @Setup
    fun setup() {
        api = buildApi()
        executor = Executors.newCachedThreadPool()
    }

    @StartBundle
    fun startBundle() {
        currentBatch = mutableListOf()
    }

    @ProcessElement
    fun processElement(processContext: ProcessContext) {
        val record = processContext.element()
        currentBatch.add(record)
        if (currentBatch.size >= 75) {
            flush()
        }
    }

    private fun flush() {
        val payloadTrack = currentBatch.toList()
        executor.submit {
            api.sendToApi(payloadTrack)
        }
        currentBatch.clear()
    }

    @FinishBundle
    fun finishBundle() {
        if (currentBatch.isNotEmpty()) {
            flush()
        }
    }

    @Teardown
    fun teardown() {
        executor.shutdown()
        executor.awaitTermination(30, TimeUnit.SECONDS)
    }
}
This seems to work "fine" in the sense that data is making it to the API. But I don't know if this is the right approach and I have the sense that this is very slow.
The reason I think it's slow is that, when load testing (by sending a few million events to Pub/Sub), it takes the pipeline up to 8 times longer to forward those messages to the API (which has response times of under 8 ms) than it takes my laptop to feed them into Pub/Sub.
Is there any problem with my implementation? Is this the way I should be doing this?
Also... am I required to wait for all the requests to finish in my @FinishBundle method (i.e. by getting the futures returned by the executor and waiting on them)?
You have two interrelated questions here:
1. Are you doing this right / do you need to change anything?
2. Do you need to wait in @FinishBundle?
The answer to the second question: yes. But actually you need to flush more thoroughly, as will become clear.
Once your @FinishBundle method succeeds, a Beam runner will assume the bundle has completed successfully. But your @FinishBundle only sends the requests; it does not ensure they have succeeded. So you could lose data that way if the requests subsequently fail. Your @FinishBundle method should actually block and wait for confirmation of success from TheApi. Incidentally, all of the above should be idempotent, since after finishing the bundle, an earthquake could strike and cause a retry ;-)
So to answer the first question: should you change anything? Just the above. The practice of batching requests this way can work as long as you are sure the results are committed before the bundle is committed.
You may find that doing so will cause your pipeline to slow down, because @FinishBundle happens more frequently than @Setup. To batch up requests across bundles you need to use the lower-level features of state and timers. I wrote up a contrived version of your use case at https://beam.apache.org/blog/2017/08/28/timely-processing.html. I would be quite interested in how this works for you.
It may simply be that the extremely low latency you are expecting, in the low millisecond range, is not available when there is a durable shuffle in your pipeline.
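For concreteness, here is a rough sketch of the "wait for the outstanding requests in finish_bundle" idea, written with the Beam Python SDK rather than the Kotlin of the question; build_api() and send_to_api are hypothetical stand-ins for your client:

import concurrent.futures
import apache_beam as beam

class WriteFn(beam.DoFn):
    BATCH_SIZE = 75

    def setup(self):
        self._api = build_api()                      # hypothetical client factory
        self._executor = concurrent.futures.ThreadPoolExecutor()

    def start_bundle(self):
        self._batch = []
        self._pending = []

    def process(self, element):
        self._batch.append(element)
        if len(self._batch) >= self.BATCH_SIZE:
            self._flush()

    def _flush(self):
        payload, self._batch = self._batch, []
        self._pending.append(self._executor.submit(self._api.send_to_api, payload))

    def finish_bundle(self):
        if self._batch:
            self._flush()
        # Block until every request has finished; .result() re-raises any
        # failure so the runner retries the bundle instead of losing data.
        for fut in concurrent.futures.as_completed(self._pending):
            fut.result()

    def teardown(self):
        self._executor.shutdown(wait=True)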

How do I wait for multiple asynchronous operations to finish before sending a response in Ruby on Rails?

In some web dev I do, I have multiple operations beginning, like GET requests to external APIs, and I want them both to start at the same time because one doesn't rely on the result of the other. I want things to be able to run in the background. I found the concurrent-ruby library, which seems to work well: by mixing it into a class you create, the class's methods gain asynchronous versions which run on a background thread. This led me to write code like the following, where FirstAsyncWorker and SecondAsyncWorker are classes I've coded, into which I've mixed the Concurrent::Async module, and in which I've coded a method named "work" that sends an HTTP request:
def index
  op1_result = FirstAsyncWorker.new.async.work
  op2_result = SecondAsyncWorker.new.async.work

  render text: results(op1_result, op2_result)
end
However, the controller will implicitly render a response at the end of the action method's execution. So the response gets sent before op1_result and op2_result get values and the only thing sent to the browser is "#".
My solution to this so far is to use Ruby threads. I write code like:
def index
  op1_result = nil
  op2_result = nil

  op1 = Thread.new do
    op1_result = get_request_without_concurrent
  end
  op2 = Thread.new do
    op2_result = get_request_without_concurrent
  end

  # Wait for the two operations to finish
  op1.join
  op2.join

  render text: results(op1_result, op2_result)
end
I don't use a mutex because the two threads don't access the same memory. But I wonder if this is the best approach. Is there a better way to use the concurrent-ruby library, or are there other libraries better suited to this situation?
I ended up answering my own question after some more research into the concurrent-ruby library. Futures ended up being what I was after! Simply put, they execute a block of code in a background thread and attempting to access the Future's calculated value blocks the main thread until that background thread has completed its work. My Rails controller actions end up looking like:
def index
  op1 = Concurrent::Future.execute { get_request }
  op2 = Concurrent::Future.execute { another_request }

  render text: "The result is #{result(op1.value, op2.value)}."
end
The line with render blocks until both async tasks have finished, at which point result can begin running.

F# Start/Stop class instance at the same time

I am doing F# programming, and I have some special requirements.
I have 3 class instances; each instance has to run for one hour every day, from 9:00AM to 10:00AM. I want to control them from the main program, starting them at the same time and stopping them at the same time as well. The following is my code to start them at the same time, but I don't know how to stop them at the same time.
#light
module Program

open ClassA
open ClassB
open ClassC

let A = new ClassA.A("A")
let B = new ClassB.B("B")
let C = new ClassC.C("C")

let task = [ async { return A.jobA("A") };
             async { return B.jobB("B") };
             async { return C.jobC("C") } ]

task |> Async.Parallel |> Async.RunSynchronously |> ignore
If anyone knows how to stop all 3 class instances at 10:00AM, please show me your code.
Someone told me that I can use async with cancellation tokens, but since I am calling instances of classes in different modules, it is difficult for me to find suitable code samples.
Thanks,
The jobs themselves need to be stoppable, either by having a Stop() API of some sort, or cooperatively being cancellable via CancellationTokens or whatnot, unless you're just talking about some job that spins in a loop and you'll just thread-abort it eventually? Need more info about what "stop" means in this context.
As Brian said, the jobs themselves need to support cancellation. The cancellation programming model that works best with F# is based on CancellationToken, because F# passes the CancellationToken around automatically in asynchronous workflows.
To implement the cancellation, your JobA methods will need to take an additional argument:
type A() =
    member x.Foo(str, cancellationToken:CancellationToken) =
        for i in 0 .. 10 do
            cancellationToken.ThrowIfCancellationRequested()
            someOtherWork()
The idea is that you call ThrowIfCancellationRequested frequently during the execution of your job. If cancellation is requested, the method throws and the operation stops. Once you do this, you can write an asynchronous workflow that gets the current CancellationToken and passes it to the JobA member when calling it:
let task =
    [ async { let! tok = Async.CancellationToken
              return A.JobA("A", tok) };
      async { let! tok = Async.CancellationToken
              return B.JobB("B", tok) } ]
Now you can create a new token using CancellationTokenSource and start the workflow. When you then cancel the token source, it will automatically stop any jobs running as part of the workflow:
let src = new CancellationTokenSource()
Async.Start(task |> Async.Parallel |> Async.Ignore, cancellationToken = src.Token)

// To cancel the job:
src.Cancel()
You asked this question on hubfs.net, and I'll repeat my answer here: try using Quartz.NET. You'd just implement IInterruptableJob in A, B, and C, defining how they stop, and then schedule another job at 10:00AM to stop the others.
Quartz.NET has a nice tutorial, FAQ, and lots of examples. It's pretty easy to use for simple cases like this, yet very powerful if you ever need more complex scheduling, monitoring jobs, logging, etc.
