There are many usages of the Fuseable interface in the Reactor source code, but I can't find any reference explaining what it is. Could someone explain its purpose?
The Fuseable interface, and the interfaces nested inside it, define the contracts used for stream fusion. Stream fusion is a reactive streams optimisation.
Without any such optimisation (in "normal" execution if you will), each reactive operator:
Subscribes to a previous operator in the chain
Is notified when the subscriber has completed
Performs its operation
Notifies its subscribers
...and then the cycle repeats for all operators. This is fantastic for making sure everything stays non-blocking, but all of those asynchronous calls come with some amount of overhead.
"Stream fusion" (or "operator fusion") significantly reduces this overhead by performing two or more of the operations in one chunk (fusing them together as one unit) and passing values between them using a Queue or similar rather than via subscriptions. It's not always possible, of course: it can't be done this way when running in parallel, when certain side effects come into play, and so on. But it's a neat optimisation when it is possible.
I have a general efficiency question about dart streams.
I have a project that makes some use of them, but it has been proposed that we convert nearly everything (functions and data) to be dart streams. This is in order to achieve a fully reactive architecture.
I don't know how streams really work under the hood, so I don't really know if this kind of design comes with any kind of memory or computational overhead.
Thanks for your attention to this question.
There is an overhead. It's not necessarily big, but it's there.
Streams have a well-defined asynchronous behavior, and it's documented how they react to listeners being added, paused or cancelled, even if that happens while an event is being delivered (because, most often, that is when it happens).
Streams are asynchronous, which means there is a delay between adding an event to the stream (through a StreamController), and that event being received by the listener. That delay makes it necessary to store (buffer) the event, schedule a microtask, and then unbuffer the event and deliver it in that later microtask. Scheduling a microtask costs. There might be zones involved, which can cost extra.
On top of that, the stream needs to be able to react to pause and cancel events in a timely manner, which means that each event delivery is also flanked by extra checks of whether the event handler has paused or cancelled. It's not a lot of overhead, but it's there.
For single-subscription streams, that's about it.
For broadcast streams, which can have multiple listeners, there can be a little extra overhead to handle new listeners being added while delivering the event. Again, not a lot, but it's there. The state-space for a stream is actually quite complicated.
(You can create "a synchronous StreamController" which delivers events "immediately", but most of the time, you shouldn't. Those are not for avoiding asynchrony; they are for avoiding extra asynchronous delays when propagating already synchronous events, and they should be used very carefully to avoid breaking code that assumes it won't get events in the middle of something else. A properly implemented reactive framework will use such controllers in its implementation, but that will not get rid of the inherent delay of delivering the original asynchronous event.)
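The delivery path described above can be modelled roughly as follows. This is a toy model written in Python, not Dart's implementation: MicrotaskQueue and ToyStream are hypothetical names, and the real stream state machine is considerably more involved. It only shows the buffer, schedule, deliver sequence, the pause/cancel checks around each delivery, and how a synchronous mode skips the buffering step.

```python
from collections import deque


class MicrotaskQueue:
    """Hypothetical stand-in for the runtime's microtask queue."""

    def __init__(self):
        self._tasks = deque()

    def schedule(self, task):
        self._tasks.append(task)

    def run_all(self):
        while self._tasks:
            self._tasks.popleft()()


class ToyStream:
    """Toy single-subscription stream with one listener."""

    def __init__(self, microtasks, listener, sync=False):
        self._microtasks = microtasks
        self._listener = listener
        self._sync = sync
        self._pending = deque()
        self.paused = False
        self.cancelled = False

    def add(self, event):
        if self.cancelled:
            return
        if self._sync and not self.paused:
            # Synchronous mode: deliver right away, no buffering.
            self._listener(event)
            return
        # Asynchronous mode: buffer the event and deliver it in a later microtask.
        self._pending.append(event)
        self._microtasks.schedule(self._flush)

    def _flush(self):
        # The state is re-checked before every single delivery, because the
        # listener may pause or cancel while it is handling an event.
        while self._pending and not self.paused and not self.cancelled:
            self._listener(self._pending.popleft())

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False
        self._microtasks.schedule(self._flush)

    def cancel(self):
        self.cancelled = True
        self._pending.clear()
```

In the asynchronous mode, nothing reaches the listener until the microtask queue runs, which is exactly the extra buffering and scheduling cost discussed in the answer.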
Now, performance is not absolute. Using streams everywhere might make your life easier, and if the performance is good enough for your application (it's not dominating the actual computations), then the increased development speed and maintainability might pay for itself. You should measure (and have repeatable benchmarks to measure) before making a decision about an implementation strategy based on performance alone.
I have a system, connected to financial markets, that makes a very heavy use of events.
All the code is structured as a cascade of events with filters, aggregations, etc in between.
Originally the system was written in C# and then ported to F# (which in retrospect was a great move), and events in the C# code got replaced by events in F# without giving it much thought.
I have heard about the observer pattern, but I haven't really gone through the topic. And recently, I have read, through some random browsing, about F#'s Mailbox processor.
I read this: Difference between Observer Pattern and Event-Driven Approach and I didn't get it, but apparently over 150 people voted that the answer wasn't too clear as well :)
In an article like this: https://hackernoon.com/observer-vs-pub-sub-pattern-50d3b27f838c it seems like the observer pattern is strictly identical to events...
At first glance, they seem to be solving the same kind of problems, just with different interfaces but that got me to think about 2 questions:
Is the mailbox processor really being used? It seems to appear mostly in older documentation and, in the packages I'm using, I haven't come across any that use it.
Regarding the observer pattern, only one package across the sizeable amount we're using makes internal use of it, but everything else is just using basic events.
Are there specific use cases that fit the Observable pattern and the MailboxProcessor? Do they have features that are unique, or are they just syntactic help around events in the end?
As simplified as possible:
Mailbox
This is a minimal implementation of the actor model.
You post messages to a queue, and your loop reads the messages from the queue, one by one. Maybe it posts to another mailbox or it does something with the messages.
Any action can only take place when a message is received.
Posting to the queue is non-blocking, i.e., there is no back-pressure.
All exceptions are caught and exposed as an event on the mailbox. They are expected to be handled by the actor above it.
Other actor frameworks provide features like supervisors, contracts, failover, etc.
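The mailbox idea can be sketched in a few lines. The following is a minimal Python/asyncio model, not F#'s MailboxProcessor: the class name Mailbox, the errors list (a stand-in for the error event) and the None stop sentinel are all assumptions made for this sketch. It also keeps processing after recording an error, which is a simplification chosen here.

```python
import asyncio


class Mailbox:
    """Toy actor: messages are queued and handled strictly one at a time."""

    def __init__(self, handler):
        self._queue = asyncio.Queue()  # unbounded, so post() never waits
        self._handler = handler
        self.errors = []               # hypothetical stand-in for an error event

    def post(self, message):
        # Non-blocking: the sender is never slowed down (no back-pressure).
        self._queue.put_nowait(message)

    async def run(self):
        while True:
            message = await self._queue.get()
            if message is None:        # hypothetical stop sentinel
                break
            try:
                self._handler(message)
            except Exception as exc:
                # Exceptions are caught and exposed instead of escaping the loop.
                self.errors.append(exc)


async def demo_mailbox():
    handled = []

    def handler(message):
        if message < 0:
            raise ValueError("negative message")
        handled.append(message)

    box = Mailbox(handler)
    for message in (1, 2, -1, 3, None):
        box.post(message)
    await box.run()
    return handled, box.errors
```

Running asyncio.run(demo_mailbox()) handles the three valid messages in order and records one error for the negative one.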
Events
Events are a language supported callback mechanism.
It's a simple implementation. You register a callback delegate, and when the event is raised, your delegate is called.
Delegates are called in the order they are added.
Events are blocking and synchronous. If one delegate blocks, the rest are delayed.
Events are about writing code to respond to events, as opposed to what came before them, which was polling.
The handler for an event is usually the final end-point for that event, and it usually has side-effects.
Sharing a handler is common. For example, ten buttons might have the same function handling clicks, because the sender of the event is known.
You handle exceptions by yourself, typically in the handler code
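The properties listed above can be shown with a toy multicast event in Python. This is not the .NET event mechanism; the Event class and its subscribe/fire methods are hypothetical names for this sketch. Handlers run synchronously in the order they were added, and one handler is shared between two sources because the sender is passed along.

```python
class Event:
    """Toy multicast event: handlers run synchronously, in subscription order."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def fire(self, sender, args=None):
        # Iterate over a copy so a handler may subscribe during delivery.
        for handler in list(self._handlers):
            handler(sender, args)  # a slow handler delays all the later ones


def demo_events():
    log = []

    def on_click(sender, args):
        # Shared handler: the sender tells us which button was clicked.
        log.append(f"clicked {sender}")

    def audit(sender, args):
        log.append(f"audit {sender}")

    ok_clicked, cancel_clicked = Event(), Event()
    ok_clicked.subscribe(on_click)
    ok_clicked.subscribe(audit)
    cancel_clicked.subscribe(on_click)

    ok_clicked.fire("ok")
    cancel_clicked.fire("cancel")
    return log
```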
Observables
There's a source (Observable) which you can subscribe to with a sink (Observer). An observable represents a bounded or un-bounded stream of values. An unbounded stream (an Observable which never completes) seems similar to an event, but there are several important properties to Observables.
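An Observable emits a series of notifications, which follows this contract:
OnNext* (OnError|OnCompleted)?
That is, zero or more OnNext notifications, optionally followed by exactly one terminal notification (OnError or OnCompleted).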
All notifications are serialized
Notifications may or may not be synchronous. There's no guarantee of back-pressure.
The value of Observables lies in the fact that they are composable.
An observable represents a stream of future notifications, operators act to transform this stream.
This approach is sometimes called complex event processing (CEP).
Exception handling is part of the pipeline, and there are many combinators to deal with it.
You typically never implement an Observer yourself. You use combinators to set up a pipeline which models the behavior you want.
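A very small model of this, written in Python, might look like the following. It is not Rx: Observable, from_iterable, map, filter and catch_with are hypothetical names written from scratch for this sketch. It shows three things from the list above: nothing is delivered after a terminal notification, operators compose into a pipeline, and error handling is itself an operator in that pipeline.

```python
class Observable:
    def __init__(self, subscribe_fn):
        self._subscribe_fn = subscribe_fn

    def subscribe(self, on_next, on_error=None, on_completed=None):
        done = False  # set once a terminal notification has been delivered

        def next_(value):
            if not done:
                on_next(value)

        def error(exc):
            nonlocal done
            if not done:
                done = True
                if on_error:
                    on_error(exc)

        def completed():
            nonlocal done
            if not done:
                done = True
                if on_completed:
                    on_completed()

        self._subscribe_fn(next_, error, completed)

    def map(self, fn):
        def subscribe_fn(on_next, on_error, on_completed):
            def mapped(value):
                try:
                    result = fn(value)
                except Exception as exc:
                    on_error(exc)  # failures travel down the pipeline
                    return
                on_next(result)
            self.subscribe(mapped, on_error, on_completed)
        return Observable(subscribe_fn)

    def filter(self, predicate):
        def subscribe_fn(on_next, on_error, on_completed):
            def filtered(value):
                if predicate(value):
                    on_next(value)
            self.subscribe(filtered, on_error, on_completed)
        return Observable(subscribe_fn)

    def catch_with(self, fallback):
        # Replace an error with one fallback value, then complete normally.
        def subscribe_fn(on_next, on_error, on_completed):
            def recover(exc):
                on_next(fallback)
                on_completed()
            self.subscribe(on_next, recover, on_completed)
        return Observable(subscribe_fn)


def from_iterable(items):
    def subscribe_fn(on_next, on_error, on_completed):
        for item in items:
            on_next(item)
        on_completed()
    return Observable(subscribe_fn)


def demo_observable():
    out = []
    pipeline = (from_iterable([1, 2, 0, 4])
                .map(lambda x: 10 // x)      # fails on the third item
                .filter(lambda x: x > 5)
                .catch_with(-1))
    pipeline.subscribe(out.append, on_completed=lambda: out.append("done"))
    return out
```

In the demo, the division by zero becomes an error notification, catch_with turns it into a fallback value followed by completion, and the remaining source item is dropped because the pipeline has already terminated.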
My Azure Durable Function (runtime v3) receives an average of 3M events per day. After it has been running for two or three weeks, it gets slower and slower. When I clear the two storage tables (History and Instances) used by the Durable Functions framework, performance recovers and it works as expected. The function app is hosted on the Consumption plan. Inside the function app I also use Durable Entities, and my code uses sub-orchestrators for the fan-out mechanism.
Is this problem possible when it comes to heavy workload? Do I need to clear those table storages from time to time or do I need to Delete the state of completed entities inside my Durable Entity Function?
Someone, please help me
Yes, you should perform periodic clean-ups yourself by calling the PurgeInstanceHistoryAsync method. See a similar post on how to do this: https://stackoverflow.com/a/60894392
Also review any loops or Monitor patterns that you may have in your code.
Any looping logic, (like foreach, for or while loops) will replay from the initial startup state. Whilst the Durable Function replay architecture is very efficient at doing this, the code we write may not be optimised for repetitive iterations.
The Durable Monitor pattern is almost an anti-pattern. The concept is OK, but it is easily misinterpreted and is open to abuse. It is designed for a low-frequency loop that polls an endpoint either for a set number of iterations or up until a finite time, or of course until the state of the endpoint being monitored has changed. That state change will be the trigger to perform the rest of the operation.
It is NOT an example of how to use general or high-frequency looping structures in Durable Functions.
It is NOT an example of how to implement a traditional HTTP endpoint reporting monitor in an infinite loop (while(true)) style, perhaps to record changes into a data store over time.
If your durable function logic has an iterator that may involve many iterations, consider migrating the iteration step to a sub-orchestration that uses the Eternal Orchestration pattern.
My app's components often call dependent components that sport asynchronous methods returning Q.js promises. I'd like to write synchronous tests of such outer components whenever possible ... mostly because synchronous tests are more readable, but also because it can be almost impossible to know when a dependent component is "ready" (as discussed below).
I've designed the dependent components so I can configure them to behave synchronously when under test. But their APIs still return Q.js promises. Even though such a promise will be fully resolved "immediately" (e.g., return Q(some_data);), Q guarantees that the promise won't actually resolve until the next tick. This (properly) ensures asynchronous behavior even when the time-to-resolution is effectively zero.
I get it.
But that means I can't write synchronous tests for the app components and I can't control when the ready-to-go promises resolve. I can't test the code at all when the dependent component doesn't expose the promise to the caller ... which it should not do when the method of the dependent component API should be fire-and-forget as is often the case.
It would be great if my test could tell Q that a "tick" had occurred, thus causing it to attempt to resolve queued promises. This idea is inspired by Angular's $q which has this feature baked in (you call $scope.$apply) for just this purpose.
I don't see any way to trigger a "tick" in Q today. For sure I do NOT want to monkey-punch setTimeout!
Is there a way I don't know about? Would this be a good feature?
It’s an intriguing idea. Q does not provide any way to force flush of the event queue today. There are probably good security and safety reasons to not provide this capability in most cases, and I would certainly encourage writing asynchronous tests for asynchronous systems. I am interested in hearing feedback on the idea from our panel of active contributors, so I have filed an issue to Q’s next-gen event queue implementation, ASAP (linked).
I'd like to delay the handling for some captured events in ActionScript until a certain time. Right now, I stick them in an Array when captured and go through it when needed, but this seems inefficient. Is there a better way to do this?
Well, to me this seems a clean and efficient way of doing that.
What do you mean by delaying? Do you mean simply processing them later, or processing them after a given time?
You can always set a timeout for the actual processing function in your event handler (using flash.utils.setTimeout), to process the event at a precise moment in time. But that can become inefficient, since you may have many timeouts dangling about that need to be handled by the runtime.
Maybe you could specify your needs a little more.
edit:
Ok, basically, Flash Player is single threaded - that is, bytecode execution is single threaded. And any event that is dispatched is processed immediately, i.e. dispatchEvent(someEvent) will directly call all registered handlers (thus AS bytecode).
Now there are events which are actually generated in the background. These come either from I/O (network, user input) or timers (TimerEvents). It may happen that some of these events occur while bytecode is being executed. This usually happens in a background thread, which passes the event (in the abstract sense of the term) to the main thread through a (de)queue.
If the main thread is busy executing bytecode, then it will ignore these messages until it is done (notice: nearly any bytecode execution is always the implicit consequence of an event (be it enter frame, or input, or timer or load operation or whatever)). When it is idle, it will look in all queues, until it finds an available message, wraps the information into an ActionScript Event object, and dispatches it as previously described.
Thus this queueing is a very low level mechanism, that comes from thread-to-thread communication (and appears in many multi-threading scenarios), and is inaccessible to you.
But as I said before, your approach is both valid and makes sense.
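For reference, the buffering approach from the question boils down to something like the sketch below. It is written in Python rather than ActionScript, and DeferredEvents, capture and flush are hypothetical names: captured events are appended to a queue and handed to the real handler only when the program decides it is time.

```python
from collections import deque


class DeferredEvents:
    """Collects captured events and processes them later, in arrival order."""

    def __init__(self, handler):
        self._handler = handler
        self._pending = deque()

    def capture(self, event):
        # Called from the event listener: just remember the event.
        self._pending.append(event)

    def flush(self):
        # Called when the program is ready; returns how many events were handled.
        count = 0
        while self._pending:
            self._handler(self._pending.popleft())
            count += 1
        return count
```

Appending and removing from the front are both cheap here, so the buffer itself is unlikely to be the inefficient part.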
Store them into Vector instead of Array :p
I think it's all about how you structure your program, maybe you can assign the captured event under the related instance? So that it's all natural to process the captured event with it instead of querying from a global vector