Are these two Observable Operations Equivalent? - f#

I'm not sure why, but for some reason when using the observable that is created via concat I will always get all values that are pushed from my list (works as intended). Where as with the normal subscribe it seems that some values never make it to those who have subscribed to the observable (only in certain conditions).
These are the two cases that I am using. Could anyone attempt to explain why in certain cases when subscribing to the second version not all values are received? Are they not equivalent? The intent here is to rewind the stream. What are some reasons that could explain why Case 2 fails while Case 1 does not.
Replay here is just a list of the ongoing stream.
Case 1.
let observable =
Observable.Create(fun (o:IObserver<'a>) ->
let next b =
for v in replay do
o.OnNext(v.Head)
o.OnNext(b)
o.OnCompleted()
someOtherObs.Subscribe(next, o.OnError, o.OnCompleted))
let toReturn = observable.Concat(someOtherObs).Publish().RefCount()
Case 2.
let toReturn =
Observable.Create(fun (o:IObserver<'a>) ->
for v in replay do
o.OnNext(v.Head)
someOtherObs.Subscribe(o)
).Publish().RefCount()

Caveat! I don't use F# regularly enough to be 100% comfortable with the syntax, but I think I see what's going on.
That said, both of these cases look odd to me and it greatly depends on how someOtherObs is implemented, and where (in terms of threads) things are running.
Case 1 Analysis
You apply concat to a source stream which appears to work like this:
It subscribes to someOtherObs, and in response to the first event (a) it pushes the elements of replay to the observer.
Then it sends event (a) to the observer.
Then it completes. At this point the stream is finished and no further events are sent.
In the event that someOtherObs is empty or just has a single error, this will be propagated to the observer instead.
Now, when this stream completes, someOtherObs is concatenated on to it. What happens now is a little unpreditcable - if someOtherObs is cold, then the first event would be sent a second time, if someOtherObs is hot, then the first event is not resent, but there's a potential race condition around which event of the remainder will go next which depends on how someOtherObs is implemented. You could easily miss events if it's hot.
Case 2 Analysis
You replay all the replay events, and then send all the events of someOtherObs - but again there's a race condition if someOtherObs is hot because you only subscribe after pushing replay, and so might miss some events.
Comments
In either case, it seems messy to me.
This looks like an attempt to do a merge of a state of the world (sotw) and a live stream. In this case, you need to subscribe to the live stream first, and cache any events while you then acquire and push the sotw events. Once sotw is pushed, you push the cached events - being careful to de-dupe events that may been read in the sotw - until you are caught up with live at which point you can just pass live events though.
You can often get away with naive implementations that flush the live cache in an OnNext handler of the live stream subscription, effectively blocking the source while you flush - but you run the risk of applying too much back pressure to the live source if you have a large history and/or a fast moving live stream.
Some considerations for you to think on that will hopefully set you on the right path.
For reference, here is an extremely naïve and simplistic C# implementation I knocked up that compiles in LINQPad with rx-main nuget package. Production ready implementations I have done in the past can get quite complex:
void Main()
{
// asynchronously produce a list from 1 to 10
Func<Task<List<int>>> sotw =
() => Task<List<int>>.Run(() => Enumerable.Range(1, 10).ToList());
// a stream of 5 to 15
var live = Observable.Range(5, 10);
// outputs 1 to 15
live.MergeSotwWithLive(sotw).Subscribe(Console.WriteLine);
}
// Define other methods and classes here
public static class ObservableExtensions
{
public static IObservable<TSource> MergeSotwWithLive<TSource>(
this IObservable<TSource> live,
Func<Task<List<TSource>>> sotwFactory)
{
return Observable.Create<TSource>(async o =>
{
// Naïve indefinite caching, no error checking anywhere
var liveReplay = new ReplaySubject<TSource>();
live.Subscribe(liveReplay);
// No error checking, no timeout, no cancellation support
var sotw = await sotwFactory();
foreach(var evt in sotw)
{
o.OnNext(evt);
}
// note naive disposal
// and extremely naive de-duping (it really needs to compare
// on some unique id)
// we are only supporting disposal once the sotw is sent
return liveReplay.Where(evt => !sotw.Any(s => s.Equals(evt)))
.Subscribe(o);
});
}
}

Related

How are Cold Streams able to work properly in a concurrent environment, while obeying "Item 79" in "Effective Java"?

In summary:
The cascade effect nature of the Cold Stream, from Inactive to Active, Is in itself an "alien" execution (alien to the reactive design) that MUST BE EXECUTED WITHIN THE SYNCRHONIZED REGION, and this is unavoidable, going against Item 79 of Effective Java.
Effective Java: Item 79:
"..., to avoid deadlock and data corruption, never call an alien
method from within a synchronized region. More generally, keep the
amount of work that you do from within synchronized to a minimum."
never call an alien
method from within a synchronized region
An add(Consumer<T> observer) AND remove(Consumer<T> observer) WILL BE concurrent (because of switchMaps that react to asynchronous changes in values/states), BUT according to Item 79, it should not be possible for a subscribe(Publisher p); method to even exist.
Since a subscribe(publisher) MUST WORK as a callback function that reacts to additions and removals of observers...
private final Object lock = new Object();
private volatile BooleanConsumer suscriptor;
public void subscribe(Publisher p) {
syncrhonized (lock) {
suscriptor = isActive -> {
if (isActive) p.add(this);
else p.remove(this);
}
}
}
public void add(Consumer<T> observer) {
syncrhonized (lock) {
observers.add(observer);
if (observer.size() > 0) suscriptor.accept(true);
}
}
I would argue that using a volatile mediator is better than holding on to the Publisher directly, but holding on to the publisher makes no difference at all, because by altering its state (when adding ourselves to the publisher) we are triggering the functions (other possible subscriptions to publishers) within it!!!, There really is no difference.
Doing it via misdirection is the proper answer, and doing so is the main idea behind the separation of concerns principle!!
Instead, what Item 79 is asking, is that each time an observer is added, we manually synchronize FROM THE OUT/ALIEN-SIDE, and deliberately check whether a subscription must be performed.
synchronized (alienLock) {
observable.add(observer);
if (observable.getObserverSize() > 0) {
publishier.add(observable);
}
}
and each time an observer is removed:
synchronized (alienLock) {
observable.remove(observer);
if (observable.getObserverSize() == 0) {
publishier.remove(observable);
}
}
Imagine those lines repeated EACH and EVERY TIME a node forks or joins on to a new one (in the reactive graph), it would be an insane amount of boilerplate defeating the entire purpose.
Reading carefully the item, you can see that the rule is there to prevent a "something" done wrong by the user that hangs the thread preventing accesses.
And this answer will be part me trying to justify why this is possible but also a non-issue in this case.
Binary Events.
In this case an event that involves only 2 states, isActive == true || false;
This means that if the consumption gets "hanged" on true, the only other option that may be waiting is "false", BUT even worst...
IF one of the two becomes deadlocked, it means the entire system is deadlocked anyways... in reality the issue is outside the design, not inside.
What I mean is that out of the 2 possible options: true or false. the time it takes for either of them to execute is meaningless since the ONLY OTHER OPTION IS STILL REQUIRED TO WAIT regardless.
Enclosed functionality of the lock.
Even if subscribe(Publisher p) methods can be concatenated, the only thing the user has access to, IS NOT THE lock per se, but the method.
So even If we are executing "alien methods" with functions, inside our synchronized bodies, THOSE FUNCTIONS ARE STILL OURS, and we know what they have inside them and how they work and what exactly they will perform.
In this case the only uncertainty in the system is not what the functions will do, but HOW MANY CONTATENATIONS THE SYSTEM HAS.
What's wrong in my code:
Finally, the only thing (I see) wrong, is that observers and subscriptions MOST DEFINITEY WORK IN SEPARATE LOCKS, because observers MUST NOT, under any circumstance should allow themselves to get locked while a subscription domino effect is taking place.
I believe that's all...

How to pass native void pointers to a Dart Isolate - without copying?

I am working on exposing an audio library (C library) for Dart. To trigger the audio engine, it requires a few initializations steps (non blocking for UI), then audio processing is triggered with a perform function, which is blocking (audio processing is a heavy task). That is why I came to read about Dart isolates.
My first thought was that I only needed to call the performance method in the isolate, but it doesn't seem possible, since the perform function takes the engine state as first argument - this engine state is an opaque pointer ( Pointer in dart:ffi ). When trying to pass engine state to a new isolate with compute function, Dart VM returns an error - it cannot pass C pointers to an isolate.
I could not find a way to pass this data to the isolate, I assume this is due to the separate memory of main isolate and the one I'm creating.
So, I should probably manage the entire engine state in the isolate which means :
Create the engine state
Initialize it with some options (strings)
trigger the perform function
control audio at runtime
I couldn't find any example on how to perform this actions in the isolate, but triggered from main thread/isolate. Neither on how to manage isolate memory (keep the engine state, and use it). Of course I could do
Here is a non-isolated example of what I want to do :
Pointer<Void> engineState = createEngineState();
initEngine(engineState, parametersString);
startEngine(engineState);
perform(engineState);
And at runtime, triggered by UI actions (like slider value changed, or button clicked) :
setEngineControl(engineState, valueToSet);
double controleValue = getEngineControl(engineState);
The engine state could be encapsulated in a class, I don't think it really matters here.
Whether it is a class or an opaque datatype, I can't find how to manage and keep this state, and perform triggers from main thread (processed in isolate). Any idea ?
In advance, thanks.
PS: I notice, while writing, that my question/explaination may not be precise, I have to say I'm a bit lost here, since I never used Dart Isolates. Please tell me if some information is missing.
EDIT April 24th :
It seems to be working with creating and managing object state inside the Isolate. But the main problem isn't solved. Because the perform method is actually blocking while it is not completed, there is no way to still receive messages in the isolate.
An option I thought first was to use the performBlock method, which only performs a block of audio samples. Like this :
while(performBlock(engineState)) {
// listen messages, and do something
}
But this doesn't seem to work, process is still blocked until audio performance finishes. Even if this loop is called in an async method in the isolate, it blocks, and no message are read.
I now think about the possibility to pass the Pointer<Void> managed in main isolate to another, that would then be the worker (for perform method only), and then be able to trigger some control methods from main isolate.
The isolate Dart package provides a registry sub library to manage some shared memory. But it is still impossible to pass void pointer between isolates.
[ERROR:flutter/lib/ui/ui_dart_state.cc(157)] Unhandled Exception: Invalid argument(s): Native objects (from dart:ffi) such as Pointers and Structs cannot be passed between isolates.
Has anyone already met this kind of situation ?
It is possible to get an address which this Pointer points to as a number and construct a new Pointer from this address (see Pointer.address and Pointer.fromAddress()). Since numbers can freely be passed between isolates, this can be used to pass native pointers between them.
In your case that could be done, for example, like this (I used Flutter's compute to make the example a bit simpler but that would apparently work with explicitly using Send/ReceivePorts as well)
// Callback to be used in a backround isolate.
// Returns address of the new engine.
int initEngine(String parameters) {
Pointer<Void> engineState = createEngineState();
initEngine(engineState, parameters);
startEngine(engineState);
return engineState.address;
}
// Callback to be used in a backround isolate.
// Does whichever processing is needed using the given engine.
void processWithEngine(int engineStateAddress) {
final engineState = Pointer<Void>.fromAddress(engineStateAddress);
process(engineState);
}
void main() {
// Initialize the engine in a background isolate.
final address = compute(initEngine, "parameters");
final engineState = Pointer<Void>.fromAddress(address);
// Do some heavy computation in a background isolate using the engine.
compute(processWithEngine, engineState.address);
}
I ended up doing the processing of callbacks inside the audio loop itself.
while(performAudio())
{
tasks.forEach((String key, List<int> value) {
double val = getCallback(key);
value.forEach((int element) {
callbackPort.send([element, val]);
});
});
}
Where the 'val' is the thing you want to send to callback. The list of int 'value' is a list of callback index.
Let's say you audio loop performs with vector size of 512 samples, you will be able to pass your callbacks after every 512 audio samples are processed, which means 48000 / 512 times per second (assuming you sample rate is 48000). This method is not the best one but it works, I still have to see if it works in very intensive processing context though. Here, it has been thought for realtime audio, but it could work the same for audio rendering.
You can see the full code here : https://framagit.org/johannphilippe/csounddart/-/blob/master/lib/csoundnative.dart

Dart: Do I have to cancel Stream subscriptions and close StreamSinks?

I know I have to cancel Stream Subscriptions when I no longer want to receive any events.
Do I have to this even after I receive a 'Done' event? Or do I get memory leaks?
What happens to Streams that are passed to addStream of another Stream? Are they automatically canceled?
Same Question on the StreamSink side do I have to close them if the stream is already done?
Short-answer: no, but you should. Nothing in the contract of either StreamSubscription or StreamSink requires closing the resources, but some use cases can lead to memory leaks if you don't close them, even though in some cases, doing so might be confusing. Part of the confusion around these classes is that they are overloaded, and handle two fairly distinct use cases:
Resource streams (like file I/O, network access)
Event streams (like click handlers)
Let's tackle these subjects one at a time, first, StreamSubscription:
StreamSubscription
When you listen to a Stream, you receive a StreamSubscription. In general, when you are done listening to that Stream, for any reason, you should close the subscription. Not all streams will leak memory if choose not to, but, some will - for example, if you are reading input from a file, not closing the stream means the handle to the file may remain open.
So, while not strictly required, I'd always cancel when done accessing the stream.
StreamSink
The most common implementation of StreamSink is StreamController, which is a programmatic interface to creating a Stream. In general, when your stream is complete (i.e. all data emitted), you should close the controller.
Here is where it gets a little confusing. Let's look at those two cases:
File I/O
Imagine you were creating an API to asynchronously read a File line-by-line:
Stream<String> readLines(String path);
To implement this, you might use a StreamController:
Stream<String> readLines(String path) {
SomeFileResource someResource;
StreamController<String> controller;
controller = new StreamController<String>(
onListen: () {
someResource = new SomeFileResource(path);
// TODO: Implement adding to the controller.
},
);
return controller.stream;
}
In this case, it would make lots of sense to close the controller when the last line has been read. This gives a signal to the user (a done event) that the file has been read, and is meaningful (you can close the File resource at that time, for example).
Events
Imagine you were creating an API to listen to news articles on HackerNews:
Stream<String> readHackerNews();
Here it makes less sense to close the underlying sink/controller. Does HackerNews ever stop? Event streams like this (or click handlers in UI programs) don't traditionally "stop" without the user accessing for it (i.e cancelling the StreamSubscription).
You could close the controller when you are done, but it's not required.
Hope that makes sense and helps you out!
I found in my case that if I have code like this:
Stream<String> readHackerNews(String path) {
StreamController<String> controller = StreamController<String>();
......
return controller.stream;
}
I see a warning message "Close instance of 'dart.core.Sink'." in the Visual Studio Code.
In order to fix this warning I added
controller.close()
to the event handler for the OnCancel event, see below:
Stream<String> readHackerNews(String path) {
StreamController<String> controller = StreamController<String>();
//TODO: your code here
controller.onCancel = () {
controller.close();
};
return controller.stream;
}
Hope this helps!

How to buffer stream events?

I have a web component which subscribes to a stream.
Since the web component is re-created each time it's displayed, I have to clean up the subscriber and redo it.
Right now I am adding all subscribers to a list and in removed() life-cycle method I'm doing :
subscriptions.forEach((sub) => sub.cancel());
Now, to the problem: when the web component isn't displayed, there's no one listening to the stream. The issue is that the component is missing data/events when it's not displayed.
What I need is buffering. Events need to be buffered and sent at once when a listener is registered. According to the documentation, buffering happens until a listener is registered:
The controller will buffer all incoming events until the subscriber is registered.
This works, but the problem is that the listener will at some point removed, and re-registered, and it appears this does not trigger buffering.
It appears that buffering happens only initially, not later on even if all listeners are gone.
So the question is: how do I buffer in this situation where listeners may be gone and back?
Note: normally you shouldn't be able to resubscribe to a Stream that has already been closed. This seems to be a bug we forgot to fix.
I'm unfamiliar with web-components but I hope I'm addressing your problem with the following suggestion.
One way (and there are of course many) would be to create a new Stream for every subscriber (like html-events do) that pauses the original stream.
Say origin is the original Stream. Then implement a stream getter that returns a new Stream that is linked to origin:
Untested code.
Stream origin;
var _subscription;
final _listeners = new Set<StreamController>();
_addListener(controller) {
_listeners.add(controller);
if (_subscription == null) {
_subscription = origin.listen((event) {
// When we emit the event we want listeners to be able to unsubscribe
// or add new listeners. In order to avoid ConcurrentModificationErrors
// we need to make sure that the _listeners set is not modified while
// we are iterating over it with forEach. Here we just create a copy with
// toList().
// Alternatively (more efficient) we could also queue subscription
// modification requests and do them after the forEach.
_listeners.toList().forEach((c) => c.add(event));
});
}
_subscription.resume(); // Just in case it was paused.
}
_removeListener(controller) {
_listeners.remove(controller);
if (_listeners.isEmpty) _subscription.pause();
}
Stream get stream {
var controller;
controller = new StreamController(
onListen: () => _addListener(controller),
onCancel: () => _removeListener(controller));
return controller.stream;
}
If you need to buffer events immediately you need to start the subscription right away and not lazily as in the sample code.

One thread showing interest in another thread (consumer / producer)

I would like to have to possibility to make thread (consumer) express interest in when another thread (producer) makes something. But not all the time.
Basically I want to make a one-shot consumer. Ideally the producer through would go merrily about its business until one (or many) consumers signal that they want something, in which case the producer would push some data into a variable and signal that it has done so. The consumer will wait until the variable has become filled.
It must also be so that the one-shot consumer can decide that it has waited too long and abandon the wait (a la pthread_cond_timedwait)
I've been reading many articles and SO questions about different ways to synchronize threads. Currently I'm leaning towards a condition variable approach.
I would like to know if this is a good way to go about it (being a novice at thread programming I probably have quite a few bugs in there), or if it perhaps would be better to (ab)use semaphores for this situation? Or something else entirely? Just an atomic assign to a pointer variable if available? I currently don't see how these would work safely, probably because I'm trying to stay on the safe side, this application is supposed to run for months, without locking up. Can I do without the mutexes in the producer? i.e.: just signal a condition variable?
My current code looks like this:
consumer {
pthread_mutex_lock(m);
pred = true; /* signal interest */
while (pred) {
/* wait a bit and hopefully get an answer before timing out */
pthread_cond_timedwait(c, m, t);
/* it is possible that the producer never produces anything, in which
case the pred will stay true, we must "designal" interest here,
unfortunately the also means that a spurious wake could make us miss
a good answer, no? How to combat this? */
pred = false;
}
/* if we got here that means either an answer is available or we timed out */
//... (do things with answer if not timed out, otherwise assign default answer)
pthread_mutex_unlock(m);
}
/* this thread is always producing, but it doesn't always have listeners */
producer {
pthread_mutex_lock(m);
/* if we have a listener */
if (pred) {
buffer = "work!";
pred = false;
pthread_cond_signal(c);
}
pthread_mutex_unlock(m);
}
NOTE: I'm on a modern linux and can make use of platform-specific functionality if necessary
NOTE2: I used the seemingly global variables m, c, and t. But these would be different for every consumer.
High-level recap
I want a thread to be able to register for an event, wait for it for a specified time and then carry on. Ideally it should be possible for more than one thread to register at the same time and all threads should get the same events (all events that came in the timespan).
What you want is something similar to a std::future in c++ (doc). A consumer requests a task to be performed by a producer using a specific function. That function creates a struct called future (or promise), holding a mutex, a condition variable associated with the task as well as a void pointer for the result, and returns it to the caller. It also put that struct, the task id, and the parameters (if any) in a work queue handled by the producer.
struct future_s {
pthread_mutex_t m;
pthread_cond_t c;
int flag;
void *result;
};
// basic task outline
struct task_s {
struct future_s result;
int taskid;
};
// specific "mytask" task
struct mytask_s {
struct future_s result;
int taskid;
int p1;
float p2;
};
future_s *do_mytask(int p1, float p2){
// allocate task data
struct mytask_s * t = alloc_task(sizeof(struct mytask_s));
t->p1 = p1;
t->p2 = p2;
t->taskid = MYTASK_ID;
task_queue_add(t);
return (struct future_s *)t;
}
Then the producer pull the task out of the queue, process it, and once terminated, put the result in the future and trigger the variable.
The consumer may wait for the future or do something else.
For a cancellable futures, include a flag in the struct to indicate that the task is cancelled. The future is then either:
delivered, the consumer is the owner and must deallocate it
cancelled, the producer remains the owner and disposes of it.
The producer must therefore check that the future has not been cancelled before triggering the condition variable.
For a "shared" future, the flag turns into a number of subscribers. If the number is above zero, the order must be delivered. The consumer owning the result is left to be decided between all consumers (First come first served? Is the result passed along to all consumers?).
Any access to the future struct must be mutexed (which works well with the condition variable).
Regarding the queues, they may be implemented using a linked list or an array (for versions with limited capacity). Since the functions creating the futures may be called concurrently, they have to be protected with a lock, which is usually implemented with a mutex.

Resources