Trying to manage multiple Flux/Mono streams, starting some of them before others and combining some of them, and getting a bit lost - project-reactor

I have a module that accepts entity IDs and a "resolution type" as parameters, then gathers data (primarily) asynchronously via multiple operations that return Fluxes. The resolution is broken into multiple (again, primarily) asynchronous operations, each gathering a different data type that contributes to the resolution. I say "primarily" asynchronously because some resolution types require one or more preliminary operations that must happen synchronously to provide information for the remaining asynchronous Flux operations. While that synchronous work is taking place, at least a portion of the overall asynchronous resolution can begin, and I would like to start those Flux operations in parallel with the synchronous ones. Then, once the synchronous data has been resolved, I can get the remaining Fluxes underway. Some resolution types will have all Flux operations returning data; others gather less information, and some of the Fluxes will remain empty. The resolution operations are time-expensive, and starting some Flux operations earlier compresses the total time, which is quite important for what I am trying to accomplish. So eager subscription is ideal, as long as I can guarantee that I will not miss any item emission.
With that in mind, how can I:
1. Create a "holder" or a "container" for each of the Flux operations that will be needed to resolve everything, and initialize them as empty (like Flux.empty())?
2. Add items to whatever I create in item 1 above? It was initialized as empty, but I may want the data from one or more finite, asynchronous Flux operations. I do not care to keep them separate; they can appear as one stream, since I will use collectList() on them to produce a Mono.
3. When some of these Flux operations should start before others, how can I start them early and ensure that I do not miss any data? And if I start a name-resolution Flux, for example, can I add to it, as in item 2 above? Say I want to start retrieving some data, then perform a synchronous operation, and then create another name-resolution Flux from the result of that synchronous operation: can I append this new Flux to the original name-resolution Flux, since it returns the same data type? I am aware of Flux.merge(), but it would be convenient to work with a single Flux reference that I can keep adding to, if possible.
Will I need a collection object, like a list, and then a merge operation? Initially I considered ConnectableFlux, until I realized it is for connecting multiple subscribers rather than multiple publishers. Connecting multiple publishers is what I think would answer my need, unless this is a common pattern that can be handled in a better way.
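For reference, the "single publisher reference I can keep appending to, then complete" shape exists in the JDK itself as java.util.concurrent.SubmissionPublisher (in Reactor, the analogous tool is the Sinks API, e.g. Sinks.many()). This is a minimal, Reactor-free sketch of the idea only; the class name and collectAll helper are invented for illustration:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class AppendableStreamSketch {

    // One publisher reference that several producers can keep "appending" to,
    // completed exactly once so the consumer sees a single finite stream.
    static List<Integer> collectAll() throws InterruptedException {
        List<Integer> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>();
        publisher.subscribe(new Flow.Subscriber<Integer>() {
            public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            public void onNext(Integer item) { received.add(item); }
            public void onError(Throwable t) { done.countDown(); }
            public void onComplete() { done.countDown(); }
        });

        publisher.submit(1); // an "early" producer emits immediately
        publisher.submit(2);
        publisher.submit(3); // a later producer appends after the synchronous step
        publisher.close();   // completes the stream, like a Flux completing before collectList()

        done.await();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(collectAll());
    }
}
```

No item is missed because the publisher buffers per subscriber, and the downstream only terminates when close() is called.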
I have only been doing reactive programming for a short time, so please be patient with the way I am trying to describe what I want to do. If I can better clarify my intentions, please let me know where I have been unclear, and I will gladly attempt to clear it up. Thanks in advance for your time and help!
EDIT:
Here is the final Kotlin version, nice and concise:
private val log = KotlinLogging.logger {}

class ReactiveDataService {

    private val createMono: () -> Mono<List<Int>> = {
        Flux.just(9, 8, 7)
            .flatMap {
                Flux.fromIterable(List(it) { Random.nextInt(0, 100) })
                    .parallel()
                    .runOn(Schedulers.boundedElastic())
            }
            .collectList()
            .cache()
    }

    private val processResults: (List<String>, List<String>) -> String =
        { d1, d2 -> "\n\tdownstream 1: $d1\n\tdownstream 2: $d2" }

    private val convert: (List<Int>, Int) -> Flux<String> =
        { data, multiplier -> Flux.fromIterable(data.map { String.format("%3d", it * multiplier) }) }

    fun doQuery(): String? {
        val mono = createMono()
        val downstream1 = mono.flatMapMany { convert(it, 1) }.collectList()
        val downstream2 = mono.flatMapMany { convert(it, 2) }.collectList()
        return Mono.zip(downstream1, downstream2, processResults).block()
    }
}

fun main() {
    val service = ReactiveDataService()
    val start = System.currentTimeMillis()
    val result = service.doQuery()
    log.info("{}\n\tTotal time: {}ms", result, System.currentTimeMillis() - start)
}
And the output:
downstream 1: [ 66, 39, 40, 88, 97, 35, 70, 91, 27, 12, 84, 37, 35, 15, 45, 27, 85, 22, 55, 89, 81, 21, 43, 62]
downstream 2: [132, 78, 80, 176, 194, 70, 140, 182, 54, 24, 168, 74, 70, 30, 90, 54, 170, 44, 110, 178, 162, 42, 86, 124]
Total time: 209ms

It sounds like an ideal job for Reactor. The synchronous calls can be wrapped to return Fluxes (or Monos) using an elastic scheduler, allowing them to execute in parallel. Then, using the various operators, you can compose them all together into a single Flux that represents the result. Subscribe to that Flux and the whole machine kicks off.
I think you need to use Mono.flatMapMany instead of Flux.usingWhen.
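The time compression described above (kick off the expensive work eagerly, join it later) can be sketched with plain CompletableFuture. This is a JDK-only analogy, not the Reactor code below: supplyAsync plays the role of eagerly subscribing to a Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic()).cache(). The names slowLookup and resolve are invented for illustration:

```java
import java.util.concurrent.CompletableFuture;

public class EagerStartSketch {

    // Stand-in for a blocking, time-expensive lookup.
    static String slowLookup() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "async-data";
    }

    static String resolve() {
        // Start the expensive call immediately, on a background thread...
        CompletableFuture<String> early = CompletableFuture.supplyAsync(EagerStartSketch::slowLookup);
        // ...do the synchronous preliminary step while it runs...
        String syncResult = "sync-data";
        // ...then join: total time is roughly max(async, sync), not their sum.
        return syncResult + "+" + early.join();
    }

    public static void main(String[] args) {
        System.out.println(resolve());
    }
}
```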
public class ReactiveDataService {

    public static void main(final String[] args) {
        ReactiveDataService service = new ReactiveDataService();
        service.doQuery();
    }

    private Flux<Integer> process1(final List<Integer> data) {
        return Flux.fromIterable(data);
    }

    private Flux<Integer> process2(final List<Integer> data) {
        return Flux.fromIterable(data).map(i -> i * 2);
    }

    private String process3(List<Integer> downstream1, List<Integer> downstream2) {
        System.out.println("downstream 1: " + downstream1);
        System.out.println("downstream 2: " + downstream2);
        return "Done";
    }

    private void doQuery() {
        // Simulated expensive source: for each limit, generate that many
        // slow random numbers, in parallel on the bounded-elastic scheduler.
        final Mono<List<Integer>> mono =
            Flux.just(9, 8, 7)
                .flatMap(limit ->
                    Flux.fromStream(
                            Stream.generate(() -> new Random().nextInt(100))
                                .peek(i -> {
                                    try {
                                        Thread.sleep(500);
                                    } catch (InterruptedException ignored) {
                                    }
                                })
                                .limit(limit))
                        .parallel()
                        .runOn(Schedulers.boundedElastic()))
                .collectList()
                .cache(); // cache() lets both downstreams share a single upstream run

        final Mono<List<Integer>> downstream1 = mono.flatMapMany(this::process1).collectList();
        final Mono<List<Integer>> downstream2 = mono.flatMapMany(this::process2).collectList();
        Mono.zip(downstream1, downstream2, this::process3).block();
    }
}

Related

Dart - Detect non-completing futures

I have a Dart console app that is calling into a third-party library.
When my console app calls the third-party library the call to the method returns however my CLI app then 'hangs' for 10 seconds or so before finally shutting down.
I suspect that the library has some type of resource that it has created but has not closed/completed.
My best guess is that it is a non-completed future.
So I'm looking for ways to detect resources that haven't been freed.
My first port of call would be looking for a technique to detect futures that haven't been completed but solutions for other resource types would be useful.
I'm currently using a runZoneGuarded, passing in a ZoneSpecification to hook calls.
Edit: with some experimentation, I've found I can detect timers and cancel them. In a simple experiment, I found that a non-cancelled timer will cause the app to hang. If I cancel the timers (during my checkLeaks method) the app will shut down; however, this isn't enough in my real-world app, so I'm still looking for ways to detect other resources.
Here is the experimental code I have:
#! /usr/bin/env dcli
import 'dart:async';

import 'package:dcli/dcli.dart';
import 'package:onepub/src/pub/global_packages.dart';
import 'package:onepub/src/pub/system_cache.dart';
import 'package:onepub/src/version/version.g.dart';
import 'package:pub_semver/pub_semver.dart';

void main(List<String> arguments) async {
  print(orange('OnePub version: $packageVersion '));
  print('');
  print(globals);
  // await globals.repairActivatedPackages();

  await runZonedGuarded(() async {
    Timer(Duration(seconds: 20), () => print('timer done'));
    unawaited(Future.delayed(Duration(seconds: 20)));
    var completer = Completer();
    unawaited(
        Future.delayed(Duration(seconds: 20), () => completer.complete()));
    // await globals.activateHosted(
    //   'dcli_unit_tester',
    //   VersionConstraint.any,
    //   null, // all executables
    //   overwriteBinStubs: true,
    //   url: null, // hostedUrl,
    // );
    print('end activate');
  }, (error, stackTrace) {
    print('Uncaught error: $error');
  }, zoneSpecification: buildZoneSpec());

  print('end');
  checkLeaks();
  // await entrypoint(arguments, CommandSet.ONEPUB, 'onepub');
}

late final SystemCache cache = SystemCache(isOffline: false);

GlobalPackages? _globals;
GlobalPackages get globals => _globals ??= GlobalPackages(cache);

List<void Function()> actions = [];
List<Source<Timer>> timers = [];

int testCounter = 0;
int timerCount = 0;
int periodicCallbacksCount = 0;
int microtasksCount = 0;

ZoneSpecification buildZoneSpec() {
  return ZoneSpecification(
    createTimer: (source, parent, zone, duration, f) {
      timerCount += 1;
      final result = parent.createTimer(zone, duration, f);
      timers.add(Source(result));
      return result;
    },
    createPeriodicTimer: (source, parent, zone, period, f) {
      periodicCallbacksCount += 1;
      final result = parent.createPeriodicTimer(zone, period, f);
      timers.add(Source(result));
      return result;
    },
    scheduleMicrotask: (source, parent, zone, f) {
      microtasksCount += 1;
      actions.add(f);
      final result = parent.scheduleMicrotask(zone, f);
      return result;
    },
  );
}

void checkLeaks() {
  print(actions.length);
  print(timers.length);
  print('testCounter $testCounter');
  print('timerCount $timerCount');
  print('periodicCallbacksCount $periodicCallbacksCount');
  print('microtasksCount $microtasksCount');
  for (var timer in timers) {
    if (timer.source.isActive) {
      print('Active Timer: ${timer.st}');
      timer.source.cancel();
    }
  }
}

class Source<T> {
  Source(this.source) {
    st = StackTrace.current;
  }

  T source;
  late StackTrace st;
}
In my real-world testing I can see that I do have hanging timers caused by HTTP connections. As I originally guessed, this seems to point to another problem: the HTTP connections are not being closed down correctly.
Active Timer: #0 new Source (file:///home/bsutton/git/onepub/onepub/bin/onepub.dart:105:21)
#1 buildZoneSpec.<anonymous closure> (file:///home/bsutton/git/onepub/onepub/bin/onepub.dart:68:18)
#2 _CustomZone.createTimer (dart:async/zone.dart:1388:19)
#3 new Timer (dart:async/timer.dart:54:10)
#4 _HttpClientConnection.startTimer (dart:_http/http_impl.dart:2320:18)
#5 _ConnectionTarget.returnConnection (dart:_http/http_impl.dart:2381:16)
#6 _HttpClient._returnConnection (dart:_http/http_impl.dart:2800:41)
#7 _HttpClientConnection.send.<anonymous closure>.<anonymous closure>.<anonymous closure> (dart:_http/http_impl.dart:2171:25)
#8 _rootRunUnary (dart:async/zone.dart:1434:47)
In general, it's impossible to find things that don't happen.
There is no way to find all futures in the program.
With a zone, you might be able to intercept all the callbacks being "registered" in the zone, but you can't know which of them must be called. A future can have both value handlers and error handlers, and at most one of them will ever be called. So, just because a callback on a future isn't called, it doesn't mean the future didn't complete.
A future most likely won't keep the isolate alive, though.
An uncompleted future will just be garbage collected if nothing important is hanging on to it.
The most likely culprits for keeping an isolate alive are timers and receive ports.
(The VM internal implementation of timers, and I/O, and sockets, all use receive ports, so it's really just the ports.)
Again, there is no way to find all open ports programmatically.
You need a debugger with memory inspection tools for that.
I'd recommend using the developer tools to look for instances of ReceivePort or RawReceivePort that are not being garbage collected, and see whether they are still alive.
Also be careful with runZonedGuarded.
Since runZonedGuarded introduces a new error zone (because it introduces an uncaught error handler in the new zone), an error future created inside the zone will not be seen to complete outside the zone.
That means that the code:
await runZonedGuarded(() async {
will not work if the body throws. The error of the future is handled by the zone instead of the await, so the await just sees a future which never completes.

rootBundle.loadString hanging for large-ish (50k+) files due to isolate?

I'm trying to load a large-ish (1000 lines, 68k) text file using
final String enString = await rootBundle.loadString('res/string/string_en.json');
The Dart class function AssetBundle.loadString that loads the string is
Future<String> loadString(String key, { bool cache = true }) async {
  final ByteData data = await load(key);
  if (data == null)
    throw FlutterError('Unable to load asset: $key');
  // 50 KB of data should take 2-3 ms to parse on a Moto G4, and about 400 μs
  // on a Pixel 4.
  if (data.lengthInBytes < 50 * 1024) {
    return utf8.decode(data.buffer.asUint8List());
  }
  // For strings larger than 50 KB, run the computation in an isolate to
  // avoid causing main thread jank.
  return compute(_utf8decode, data, debugLabel: 'UTF8 decode for "$key"');
}
Looking at the code above, if the file is bigger than 50k, as mine is, an isolate is used.
As a test, I cut my file in half (to 32k) and it loaded in a second (not using the isolate). But, unedited, the function hangs when the isolate is used.
My file is just a simple JSON file of key-value pairs. Here are the first few lines:
{
  "ctaButtonConfirm": "Confirm",
  "ctaButtonContinue": "Continue",
  "ctaButtonReview": "Review",
  "balance": "Balance",
  "totalBalance": "Total Balance",
  "transactions": "Transactions",
  :
It seems like it hangs when the isolate is used?
EDIT
Based on the loadString code above, I wrote an extension function that doesn't use an isolate, and it works fine, so it's looking like the isolate doesn't like my file?
extension AssetBundleX on AssetBundle {
  Future<String> loadStringWithoutIsolate(String key) async {
    final ByteData data = await load(key);
    return utf8.decode(data.buffer.asUint8List());
  }
}
You can't access rootBundle from a spawned isolate, so use the main isolate instead.
Or, as the docs put it: "This is useful for operations that take longer than a few milliseconds, and which would therefore risk skipping frames. For tasks that will only take a few milliseconds, consider SchedulerBinding.scheduleTask instead."
So you can try SchedulerBinding.scheduleTask instead.

How do I write a futures::Stream to disk without storing it entirely in memory first?

There's an example of downloading a file with Rusoto S3 here:
How to save a file downloaded from S3 with Rusoto to my hard drive?
The problem is that it looks like it's downloading the whole file into memory and then writing it to disk, because it uses the write_all method which takes an array of bytes, not a stream. How can I use the StreamingBody, which implements futures::Stream to stream the file to disk?
Since StreamingBody implements Stream<Item = Vec<u8>, Error = Error>, we can construct a MCVE that represents that:
extern crate futures; // 0.1.25

use futures::{prelude::*, stream};

type Error = Box<std::error::Error>;

fn streaming_body() -> impl Stream<Item = Vec<u8>, Error = Error> {
    const DUMMY_DATA: &[&[u8]] = &[b"0123", b"4567", b"89AB", b"CDEF"];
    let iter_of_owned_bytes = DUMMY_DATA.iter().map(|&b| b.to_owned());
    stream::iter_ok(iter_of_owned_bytes)
}
We can then get a "streaming body" somehow and use Stream::for_each to process each element in the Stream. Here, we just call write_all with some provided output location:
use std::{fs::File, io::Write};

fn save_to_disk(mut file: impl Write) -> impl Future<Item = (), Error = Error> {
    streaming_body().for_each(move |chunk| file.write_all(&chunk).map_err(Into::into))
}
We can then write a little testing main:
fn main() {
    let mut file = Vec::new();
    {
        let fut = save_to_disk(&mut file);
        fut.wait().expect("Could not drive future");
    }
    assert_eq!(file, b"0123456789ABCDEF");
}
Important notes about the quality of this naïve implementation:
The call to write_all may potentially block, which you should not do in an asynchronous program. It would be better to hand off that blocking work to a threadpool.
The usage of Future::wait forces the thread to block until the future is done, which is great for tests but may not be correct for your real use case.
See also:
What is the best approach to encapsulate blocking I/O in future-rs?
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?

React-Native + Redux + ImmutableJS Memory Leak

I have a strange memory leak in my React-Native app. It's a constant RAM increase.
My state is normalized, and then converted to an immutable state. There is a sockets handler which updates existing objects in state. This causes the RAM to slowly increase as new messages are updating state.
State:
const state = {
  entities: {
    2000: {
      1: {
        id: 1,
        name: "I am normalized",
        coordinates: [
          { lat: 0, lng: 0 }
        ]
      },
      2: {
        id: 2,
        name: "me too",
        coordinates: [
          { lat: 0, lng: 0 }
        ]
      }
    },
    1337: {
      2: {
        id: 2,
        name: "me too",
        coordinates: [
          { lat: 0, lng: 0 }
        ]
      },
      3: {
        id: 3,
        name: "also normalized",
        coordinates: [
          { lat: 0, lng: 0 }
        ]
      }
    }
  },
  results: {
    2000: [1, 2],
    1337: [2, 3]
  }
};
This is then converted with fromJS() to immutable state.
I have a sockets handler, which passes the action.payload to a reducer.
action = {
  payload: {
    message_type: COORDINATES_UPDATE,
    messages: [
      {
        id: 1,
        coordinates: [
          { lat: 180, lng: 180 }
        ]
      },
      {
        id: 2,
        coordinates: [
          { lat: 90, lng: 90 }
        ]
      }
    ]
  }
}
The reducer that handles the incoming action:
case SOCKET_MESSAGE: {
  let newState = state;
  if (action.payload.message_type == "COORDINATES_UPDATE") {
    action.payload.messages.map((incoming_message) => {
      let id = incoming_message.id;
      let coordinates = incoming_message.coordinates;
      newState.get("results").map((data, entities_id) => {
        if (data.indexOf(id) > -1) {
          newState = newState.setIn(
            ["entities", entities_id, "" + id, "coordinates"],
            fromJS(coordinates)
          );
        }
      });
    });
    return newState;
  }
}
This searches the results Map() for an existing id; if it exists, it updates the entities object. As far as I know, there are no problems with this logic: the state updates properly and is reflected in the render() component, although for debugging purposes I am rendering an empty <View /> as my whole app and only updating state.
However each setIn, or updateIn increases RAM ever so slightly, and with the frequency of updates I get it grows to GB in minutes.
Relevant Packages:
"react": "16.0.0",
"react-native": "0.50.3",
"immutable": "^3.8.2",
"normalizr": "^3.2.4",
"redux": "^3.7.2",
Oh, that's a huge one ;)
You should probably check two things:
0) How many socket connections do you have? You might have 5-10, and all the data is multiplied.
1) Do you use redux-dev-tool? It might consume a very large amount of memory in your case; consider deactivating it for production/testing.
Normalizr is originally developed for browsers. The issue of memory consumption of entities is well-known, however, practically it has not been regarded as a barrier because page lifecycles on browsers are short enough.
Best practices to handle memory consumption of normalized data · Issue #47 · paularmstrong/normalizr
Meanwhile, in a native application, memory must be released properly. Unfortunately, there is no elegant library or solution at present. As compromise plans:
A. Don't use Normalizr. Update every entity by yourself.
B. Implement garbage collection on entities by yourself.
I ended up trying many different solutions from this thread and GitHub, but the most time-consuming (and sad) one worked out best: taking out ImmutableJS.
Memory stabilized after I replaced it with lodash functions.
const newState = state; // just because I don't want to mutate state
let updates = {}; // put my updates here
return _.merge({}, newState, updates); // merge into an empty object
reselect expects a new object every time the state mutates, and I have a very nested state structure; hence I went with _.merge({}, ...) instead of _.assign.
There's still a very slight increase but this is way better than what it used to be.

Several parallel batches in Neo4jphp

Is it possible to create several batches at one time?
For example, I have code with a running batch (batch 1), and inside this batch a method is called which has another batch inside it (batch 2). The code is not working.
When I remove the outer batch (batch 1), the node is created. Maybe only one batch is possible at a time?
The example code is below:
$batch = $client->startBatch();
$widget = NULL;
try {
    $widgetLabel = $client->makeLabel('Widget');
    $widget = $client->makeNode();
    $widget
        ->setProperty('base_filename', md5(uniqid('', TRUE)))
        ->setProperty('datetime_added', time())
        ->setProperty('current_version', 0)
        ->setProperty('shared', 0)
        ->setProperty('active', 1)
        ->save();

    // add widget history
    $history = Model_History::create($widget, $properties);
    if ($history == NULL) {
        throw new Exception('Could not create widget history!');
    }

    $widget->setProperty('current_version', $history->getID());
    $widget->save();

    $client->commitBatch($batch);
} catch (Exception $e) {
    $client->endBatch();
}
Batch 2 is inside the Model_History::create() method. I don't get back a valid $widget (a Neo4jphp node) from this code.
If the second batch is being created with another call to $client->startBatch(), it will actually be the same batch object as $batch. If you call $client->commitBatch() from there, it will commit the outer batch (since they are the same).
Don't start a second batch in Model_History::create(). Start the outer batch, go through all your code, and commit it once at the end.
