Reactor: list of Monos, retry on failure

I have a List<Mono<String>>. Each Mono represents an API call where I wait on I/O for the result. The problem is that sometimes a call returns nothing (an empty String), and I need to repeat it in that case.
Right now it looks like this:
val firstAskForItemsRetrieved = firstAskForItems.map {
    it["statistic"] = (it["statistic"] as Mono<Map<Any, Any>>).block()
    it
}
I wait for all the Monos to finish, then repeat the request for any item whose body came back empty:
val secondAskForItems = firstAskForItemsRetrieved
    .map {
        if ((it["statistic"] as Map<Any, Any>).isEmpty()) {
            // repeat request
            it["statistic"] = getUserItem(userName) // returns a Mono
        } else
            it["statistic"] = Mono.just(it["statistic"])
        it
    }
And then I block on each item again:
val secondAskForItemsRetrieved = secondAskForItems.map {
    it["statistic"] = (it["statistic"] as Mono<Map<Any, Any>>).block()
    it
}
I can see that this looks ugly.
Is there another way to retry a call in a Mono if it fails, without doing it manually?
Is blocking on each item the right way to get them all?
How can I make the code better?
Thank you.

There are a few operators that I believe can help you:
For the "wait for all Monos" use case, have a look at the static methods when and zip.
when only cares about completion, so even if the Monos are empty it will just signal onComplete once all of them have finished. You don't get the data, however.
zip cares about the values and expects all Monos to be valued. When all Monos are valued, it combines their values according to the passed Function. Otherwise it just completes empty.
To retry the empty Monos, have a look at repeatWhenEmpty. It resubscribes to an empty Mono, so if that Mono is "cold" it would restart the source (e.g. make another HTTP request).
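The shape of "start every call, retry the ones that come back empty, then wait once for all of them" can be sketched with nothing but the JDK. This is not Reactor: CompletableFuture stands in for Mono, and retryWhileEmpty, fetchAll and emptyFirst are hypothetical names made up for the illustration. Note also that a Mono emitting "" is not an empty Mono, so in Reactor the empty bodies would first have to be dropped (for example with filter) before repeatWhenEmpty could see them.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class RetryEmptyDemo {
    // Invokes the call again while its result is an empty string,
    // for at most maxAttempts invocations in total. If every attempt
    // is empty, the empty string is what the caller gets back.
    static CompletableFuture<String> retryWhileEmpty(
            Supplier<CompletableFuture<String>> call, int maxAttempts) {
        return call.get().thenCompose(body -> {
            if (body.isEmpty() && maxAttempts > 1) {
                return retryWhileEmpty(call, maxAttempts - 1);
            }
            return CompletableFuture.completedFuture(body);
        });
    }

    // Starts all calls first, then waits a single time for all of them,
    // instead of blocking on each element inside a map.
    static List<String> fetchAll(
            List<Supplier<CompletableFuture<String>>> calls, int maxAttempts) {
        List<CompletableFuture<String>> started = calls.stream()
                .map(c -> retryWhileEmpty(c, maxAttempts))
                .collect(Collectors.toList());
        CompletableFuture.allOf(started.toArray(new CompletableFuture[0])).join();
        return started.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    // Hypothetical flaky API call: empty for the first emptyCount attempts.
    static Supplier<CompletableFuture<String>> emptyFirst(String value, int emptyCount) {
        AtomicInteger attempts = new AtomicInteger();
        return () -> CompletableFuture.supplyAsync(
                () -> attempts.incrementAndGet() <= emptyCount ? "" : value);
    }

    public static void main(String[] args) {
        // "a" is empty once and succeeds on the retry; "b" succeeds immediately.
        System.out.println(fetchAll(List.of(emptyFirst("a", 1), emptyFirst("b", 0)), 3));
        // prints [a, b]
    }
}
```

The supplier plays the role of a "cold" source: each retry calls it again and so issues a fresh request, which is the same property the answer relies on for repeatWhenEmpty.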


sol3: Is it possible to reset the lua state so that any coroutine function can be re-run from the beginning?

This is the overall flow of my setup:
void ScriptPlayer::new_lua_state() {
    lua = {};
    lua.open_libraries(sol::lib::base, sol::lib::package, sol::lib::coroutine, sol::lib::math);
    [...]
    // Proceeds to initialize the state with usertype definitions and values
}
void ScriptPlayer::play(std::string path) {
    main_coroutine = sol::nil;
    script_env = sol::environment(lua, sol::create, lua.globals());
    auto result = lua.load_file(path);
    main_coroutine = sol::coroutine(result);
    script_env.set_on(main_coroutine);
}
void ScriptPlayer::update() {
    if (main_coroutine) {
        main_coroutine();
    }
}
"new_lua_state" is called once at the beginning of everything, then "play" is called anytime I want to execute a new lua script (that yields). "update" is executed every frame, and progresses the coroutine until it's finished, at which point it stops.
The problem:
If I call "play" while the previous script coroutine has yielded but hasn't yet finished, I expect lua to discard the whole environment and create a new one, discard the old coroutine, parse the script again, create a brand new coroutine and start its execution from the beginning.
What I get instead is that the coroutine will STILL be running from the state of the previous script's coroutine (which should be completely discarded) and not from the very beginning.
How is this possible? Where exactly is the state of the coroutine stored?
I tried wrapping the state with a thread, I tried calling lua.clear_stack but nothing made any difference in the fact that the new coroutine never starts from the beginning of the function when I re-parse the script and re-create the sol::coroutine object.
Any clarification is hugely appreciated.
Here was the solution:
https://github.com/ThePhD/sol2/issues/1061
Apparently my attempt of wrapping the state with a thread was faulty, because that was exactly the thing to do.
So to solve this, here's what I did:
void ScriptPlayer::play(std::string path) {
    script_env = sol::nil;
    main_coroutine = sol::nil;
    script_thread = sol::thread::create(lua);
    script_env = sol::environment(script_thread.state(), sol::create,
                                  script_thread.state().globals());
    auto result = script_thread.state().load_file(path);
    main_coroutine = sol::coroutine(result);
    script_env.set_on(main_coroutine);
}
One thing that still blows my mind is that if I remove the second line (which resets the C++-held reference to the Lua coroutine), the result goes back to wrongly resuming the previous coroutine, despite that very same variable being set to a different value shortly after.
This baffles me deeply.

Combine framework: how to process each element of array asynchronously before proceeding

I'm having a bit of a mental block using the iOS Combine framework.
I'm converting some code from "manual" fetching from a remote API to using Combine. Basically, the API is SQL and REST (in actual fact it's Salesforce, but that's irrelevant to the question). What the code used to do is call a REST query method that takes a completion handler. What I'm doing is replacing this everywhere with a Combine Future. So far, so good.
The problem arises when the following scenario happens (and it happens a lot):
We do a REST query and get back an array of "objects".
But these "objects" are not completely populated. Each one of them needs additional data from some related object. So for each "object", we do another REST query using information from that "object", thus giving us another array of "objects".
This might or might not allow us to finish populating the first "objects" — or else, we might have to do another REST query using information from each of the second "object", and so on.
The result was a lot of code structured like this (this is pseudocode):
func fetchObjects(completion: @escaping ([Object]) -> Void) {
    let restQuery = ...
    RESTClient.performQuery(restQuery) { results in
        let partialObjects = results.map { ... }
        let group = DispatchGroup()
        for partialObject in partialObjects {
            let restQuery = ... // something based on partialObject
            group.enter()
            RESTClient.performQuery(restQuery) { results in
                group.leave()
                let partialObjects2 = results.map { ... }
                partialObject.property1 = // something from partialObjects2
                partialObject.property2 = // something from partialObjects2
                // and we could go down yet _another_ level in some cases
            }
        }
        group.notify {
            completion(partialObjects)
        }
    }
}
Every time I say results in in the pseudocode, that's the completion handler of an asynchronous networking call.
Okay, well, I see well enough how to chain asynchronous calls in Combine, for example by using Futures and flatMap (pseudocode again):
let future1 = Future...
future1.map {
    // do something
}.flatMap {
    let future2 = Future...
    return future2.map {
        // do something
    }
}
// ...
In that code, the way we form future2 can depend upon the value we received from the execution of future1, and in the map on future2 we can modify what we received from upstream before it gets passed on down the pipeline. No problem. It's all quite beautiful.
But that doesn't give me what I was doing in the pre-Combine code, namely the loop. Here I was, doing multiple asynchronous calls in a loop, held in place by a DispatchGroup before proceeding. The question is:
What is the Combine pattern for doing that?
Remember the situation. I've got an array of some object. I want to loop through that array, doing an asynchronous call for each object in the loop, fetching new info asynchronously and modifying that object on that basis, before proceeding on down the pipeline. And each loop might involve a further nested loop gathering even more information asynchronously:
Fetch info from online database, it's an array
|
V
For each element in the array, fetch _more_ info, _that's_ an array
|
V
For each element in _that_ array, fetch _more_ info
|
V
Loop thru the accumulated info and populate that element of the original array
The old code for doing this was horrible-looking, full of nested completion handlers and loops held in place by DispatchGroup enter/leave/notify. But it worked. I can't get my Combine code to work the same way. How do I do it? Basically my pipeline output is an array of something, I feel like I need to split up that array into individual elements, do something asynchronously to each element, and put the elements back together into an array. How?
The way I've been solving this works, but doesn't scale, especially when an asynchronous call needs information that arrived several steps back in the pipeline chain. I've been doing something like this (I got this idea from https://stackoverflow.com/a/58708381/341994):
An array of objects arrives from upstream.
I enter a flatMap and map the array to an array of publishers, each headed by a Future that fetches further online stuff related to one object, and followed by a pipeline that produces the modified object.
Now I have an array of pipelines, each producing a single object. I merge that array and produce that publisher (a MergeMany) from the flatMap.
I collect the resulting values back into an array.
But this still seems like a lot of work, and even worse, it doesn't scale when each sub-pipeline itself needs to spawn an array of sub-pipelines. It all becomes incomprehensible, and information that used to arrive easily into a completion block (because of Swift's scoping rules) no longer arrives into a subsequent step in the main pipeline (or arrives only with difficulty because I pass bigger and bigger tuples down the pipeline).
There must be some simple Combine pattern for doing this, but I'm completely missing it. Please tell me what it is.
With your latest edit and this comment below:
I literally am asking is there a Combine equivalent of "don't proceed to the next step until this step, involving multiple asynchronous steps, has finished"
I think this pattern can be achieved with .flatMap to an array publisher (Publishers.Sequence), which emits the elements one by one and then completes, followed by whatever per-element async processing is needed, and finalized with a .collect, which waits for all elements to complete before proceeding.
So, in code, assuming we have these functions:
func getFoos() -> AnyPublisher<[Foo], Error>
func getPartials(for: Foo) -> AnyPublisher<[Partial], Error>
func getMoreInfo(for: Partial, of: Foo) -> AnyPublisher<MoreInfo, Error>
We can do the following:
getFoos()
    .flatMap { fooArr in
        fooArr.publisher.setFailureType(to: Error.self)
    }
    // per-foo element async processing
    .flatMap { foo in
        getPartials(for: foo)
            .flatMap { partialArr in
                partialArr.publisher.setFailureType(to: Error.self)
            }
            // per-partial of foo async processing
            .flatMap { partial in
                getMoreInfo(for: partial, of: foo)
                    // build completed partial with more info
                    .map { moreInfo in
                        var newPartial = partial
                        newPartial.moreInfo = moreInfo
                        return newPartial
                    }
            }
            .collect()
            // build completed foo with all partials
            .map { partialArr in
                var newFoo = foo
                newFoo.partials = partialArr
                return newFoo
            }
    }
    .collect()
Using the accepted answer, I wound up with this structure:
head // [Entity]
    .flatMap { entities -> AnyPublisher<Entity, Error> in
        Publishers.Sequence(sequence: entities).eraseToAnyPublisher()
    }.flatMap { entity -> AnyPublisher<Entity, Error> in
        self.makeFuture(for: entity) // [Derivative]
            .flatMap { derivatives -> AnyPublisher<Derivative, Error> in
                Publishers.Sequence(sequence: derivatives).eraseToAnyPublisher()
            }
            .flatMap { derivative -> AnyPublisher<Derivative2, Error> in
                self.makeFuture(for: derivative).eraseToAnyPublisher() // Derivative2
            }.collect().map { derivative2s -> Entity in
                self.configuredEntity(entity, from: derivative2s)
            }.eraseToAnyPublisher()
    }.collect()
That has exactly the elegant tightness I was looking for! So the idea is:
We receive an array of something, and we need to process each element asynchronously. The old way would have been a DispatchGroup and a for...in loop. The Combine equivalent is:
The equivalent of the for...in line is flatMap and Publishers.Sequence.
The equivalent of the DispatchGroup (dealing with asynchronousness) is a further flatMap (on the individual element) and some publisher. In my case I start with a Future based on the individual element we just received.
The equivalent of the right curly brace at the end is collect(), waiting for all elements to be processed and putting the array back together again.
So to sum up, the pattern is:
flatMap the array to a Sequence.
flatMap the individual element to a publisher that launches the asynchronous operation on that element.
Continue the chain from that publisher as needed.
collect back into an array.
By nesting that pattern, we can take advantage of Swift scoping rules to keep the thing we need to process in scope until we have acquired enough information to produce the processed object.

How to parallelize HTTP requests within an Apache Beam step?

I have an Apache Beam pipeline running on Google Dataflow whose job is rather simple:
It reads individual JSON objects from Pub/Sub
Parses them
And sends them via HTTP to some API
This API requires me to send the items in batches of 75. So I built a DoFn that accumulates events in a list and publishes them via this API once it has 75. This turned out to be too slow, so I thought of executing those HTTP requests in different threads using a thread pool instead.
The implementation of what I have right now looks like this:
private class WriteFn : DoFn<TheEvent, Void>() {
    @Transient var api: TheApi
    @Transient var currentBatch: MutableList<TheEvent>
    @Transient var executor: ExecutorService
    @Setup
    fun setup() {
        api = buildApi()
        executor = Executors.newCachedThreadPool()
    }
    @StartBundle
    fun startBundle() {
        currentBatch = mutableListOf()
    }
    @ProcessElement
    fun processElement(processContext: ProcessContext) {
        val record = processContext.element()
        currentBatch.add(record)
        if (currentBatch.size >= 75) {
            flush()
        }
    }
    private fun flush() {
        val payloadTrack = currentBatch.toList()
        executor.submit {
            api.sendToApi(payloadTrack)
        }
        currentBatch.clear()
    }
    @FinishBundle
    fun finishBundle() {
        if (currentBatch.isNotEmpty()) {
            flush()
        }
    }
    @Teardown
    fun teardown() {
        executor.shutdown()
        executor.awaitTermination(30, TimeUnit.SECONDS)
    }
}
This seems to work "fine" in the sense that data is making it to the API. But I don't know if this is the right approach and I have the sense that this is very slow.
The reason I think it's slow is that when load testing (by sending a few million events to Pub/Sub), the pipeline takes up to 8 times longer to forward those messages to the API (which has response times of under 8ms) than my laptop takes to feed them into Pub/Sub.
Is there any problem with my implementation? Is this the way I should be doing this?
Also... am I required to wait for all the requests to finish in my @FinishBundle method (i.e. by getting the futures returned by the executor and waiting on them)?
You have two interrelated questions here:
Are you doing this right / do you need to change anything?
Do you need to wait in @FinishBundle?
The second answer: yes. But actually you need to flush more thoroughly, as will become clear.
Once your @FinishBundle method succeeds, a Beam runner will assume the bundle has completed successfully. But your @FinishBundle only sends the requests; it does not ensure they have succeeded. So you could lose data that way if the requests subsequently fail. Your @FinishBundle method should actually block and wait for confirmation of success from TheApi. Incidentally, all of the above should be idempotent, since after finishing the bundle, an earthquake could strike and cause a retry ;-)
So to answer the first question: should you change anything? Just the above. The practice of batching requests this way can work as long as you are sure the results are committed before the bundle is committed.
You may find that doing so will cause your pipeline to slow down, because @FinishBundle is called more frequently than @Setup. To batch up requests across bundles you need to use the lower-level features of state and timers. I wrote up a contrived version of your use case at https://beam.apache.org/blog/2017/08/28/timely-processing.html. I would be quite interested in how this works for you.
It may simply be that the extremely low latency you are expecting, in the low millisecond range, is not available when there is a durable shuffle in your pipeline.
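The blocking flush described in this answer can be sketched roughly as follows. It is written in plain Java against the JDK only, with no Beam dependency: BatchingSender is a hypothetical stand-in for the DoFn, its send callback stands in for api.sendToApi, and add/finishBundle/teardown mirror @ProcessElement/@FinishBundle/@Teardown. The only change from the question's logic is that the futures returned by submit are kept and waited on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class BatchingSender<T> {
    private final int batchSize;
    private final Consumer<List<T>> send; // assumption: stands in for api.sendToApi
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final List<T> currentBatch = new ArrayList<>();
    private final List<Future<?>> inFlight = new ArrayList<>();

    BatchingSender(int batchSize, Consumer<List<T>> send) {
        this.batchSize = batchSize;
        this.send = send;
    }

    // Per-element step, like @ProcessElement.
    void add(T element) {
        currentBatch.add(element);
        if (currentBatch.size() >= batchSize) {
            flush();
        }
    }

    // Submits a copy of the batch and remembers the future for later.
    private void flush() {
        List<T> payload = new ArrayList<>(currentBatch);
        currentBatch.clear();
        inFlight.add(executor.submit(() -> send.accept(payload)));
    }

    // End-of-bundle step, like @FinishBundle: returns only after every
    // submitted request has finished; a failed request surfaces here
    // as an ExecutionException instead of being silently dropped.
    void finishBundle() throws InterruptedException, ExecutionException {
        if (!currentBatch.isEmpty()) {
            flush();
        }
        try {
            for (Future<?> f : inFlight) {
                f.get();
            }
        } finally {
            inFlight.clear();
        }
    }

    // Like @Teardown.
    void teardown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        List<List<Integer>> sent = Collections.synchronizedList(new ArrayList<>());
        BatchingSender<Integer> sender = new BatchingSender<>(75, sent::add);
        for (int i = 0; i < 200; i++) {
            sender.add(i);
        }
        sender.finishBundle();
        sender.teardown();
        System.out.println(sent.size()); // prints 3 (batches of 75, 75 and 50)
    }
}
```

Letting the exception escape from the end-of-bundle step is what allows a runner to treat the bundle as failed and retry it, which is also why the answer asks for the send to be idempotent.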

Chaining dependent observables

I need to create dependent API calls where the second one needs a value returned by the first one. First thing that comes to mind is using flatMap
ApiManager.shared
    .createReport(report: report)
    .flatMap { (report) -> Observable<Report> in
        return ApiManager.shared.createReportStep(reportID: report.ID)
    }
createReport returns Observable<Report>, which after a successful call emits the updated Report model (with ID). After that I need to call the API to create a report step, for which report.ID is needed.
Everything looks and works fine with that code, but the problem comes when I need to do something after each of these steps (createReport and createReportStep). I placed code in the onNext block, but it is called only once, after both of the steps are completed.
Is there a way to receive an onNext signal after each of the two steps? I could use something like this:
ApiManager.shared
    .createReport(report: report)
    .concat(ApiManager.shared.createReportStep(reportID: report.ID))
This would emit two signals like I want, but then again, where do I get the updated report.ID from to pass to createReportStep?
If you don't mind the time component and only need to have access to both report and what is returned by createReportStep(reportID:), you could go with creating a tuple in flatMap's block
ApiManager.shared
    .createReport(report: report)
    .flatMap { (report) -> Observable<(Report, Report)> in
        return ApiManager.shared.createReportStep(reportID: report.ID)
            .map { (report, $0) }
    }
The resulting observable would contain both results in a tuple.
If the time component is important, you could do the following
let report = ApiManager.shared
.createReport(report: report)
.share()
let reportStep = report.map { $0.ID }.flatMap(ApiManager.shared.createReportStep)
Observable.concat([report, reportStep])
Here, the important bit is the share call. It will ensure createReport performs its work only once, but you would have two next events as requested.

Dart nested futures whenComplete fires first

I need to make a series of database queries that each return a stream of results. Once all the information is collected and sent, the 'complete' message needs to be sent last. In my code 'sendCompleteMessageToClient' gets sent first.
Future.forEach(centerLdapNames.values, (center) {
  db
      .collection(center)
      .find({'date': {'\$gte': json['from'], '\$lt': json['to']}})
      .forEach(sendAppointmentToClient);
}).whenComplete(() => sendCompleteMessageToClient("all"));
How do I wait for all 'sendAppointmentToClient' to finish properly?
I guess you are just missing the return of the future:
Future.forEach(centerLdapNames.values, (center) {
  return db // <== always return the future returned by calls to async functions, to keep the chain connected
      .collection(center)
      .find({'date': {'\$gte': json['from'], '\$lt': json['to']}})
      .forEach(sendAppointmentToClient);
}).whenComplete(() => sendCompleteMessageToClient("all"));
If you use Future.wait instead, the calls may be executed in parallel rather than one after the other:
Future.wait(centerLdapNames.values.map((center) { ... }), eagerError: false)
    .whenComplete((...))
