Disruptor pattern with pull event handler - buffer

I'm going to improve the performance of an existing solution by introducing the Disruptor pattern. The existing solution has a single producer and a single consumer with a shared byte buffer: the producer pushes data to the shared buffer and the consumer pulls data from it. However, the Disruptor pattern doesn't seem to support a pull-style event handler; Disruptor event handlers are triggered automatically by the Disruptor framework. Any suggestions?
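In case it helps, here is a minimal sketch of a pull-style consumer, assuming the LMAX Disruptor's EventPoller API (its pull-oriented alternative to EventHandler); the ByteChunk event type and the payload are placeholders:

import com.lmax.disruptor.EventPoller;
import com.lmax.disruptor.RingBuffer;

public class PollingDisruptorSketch {

    // Placeholder event type carrying a chunk of bytes.
    static final class ByteChunk {
        byte[] data;
    }

    public static void main(String[] args) throws Exception {
        RingBuffer<ByteChunk> ringBuffer =
                RingBuffer.createSingleProducer(ByteChunk::new, 1024);

        // EventPoller lets the consumer pull, instead of being called back by the framework.
        EventPoller<ByteChunk> poller = ringBuffer.newPoller();
        ringBuffer.addGatingSequences(poller.getSequence()); // don't let the producer lap the poller

        // Producer side: claim a slot, fill it, publish.
        long seq = ringBuffer.next();
        try {
            ringBuffer.get(seq).data = "some bytes".getBytes();
        } finally {
            ringBuffer.publish(seq);
        }

        // Consumer side: pull at its own pace, e.g. from its own loop or scheduler.
        poller.poll((event, sequence, endOfBatch) -> {
            System.out.println(new String(event.data)); // your existing consumer logic goes here
            return true; // keep draining whatever is currently available
        });
    }
}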

Related

Error handling in a data pipeline using Project Reactor

I'm writing a data pipeline using Reactor and Reactor Kafka and use Spring's Message<> to save
the ReceiverOffset of the ReceiverRecord in the headers, to be able to call ReceiverOffset.acknowledge() when processing finishes. I also have the out-of-order commit feature enabled.
When processing an event fails, I want to be able to log the error, write to another topic that represents all the failure events, and commit to the source topic. I'm currently solving that by returning Either<Message<Error>,Message<myPojo>> from each processing stage; that way the stream will not be stopped by exceptions, and I'm able to save the original event headers and eventually commit the failed messages at the bottom of the pipeline.
The problem is that each step of the pipeline gets Either<> as input and needs to filter out the previous errors and apply its logic only on the Either.right, and that can be cumbersome, especially when working with buffers and the operator gets List<Either<>> as input. So I would like to keep my business pipeline clean and get only Message<MyPojo> as input, but also not miss errors that need to be handled.
I read that sending those error messages to another channel or stream is a solution for that.
Spring Integration uses that pattern for error handling, and I also read an article (link to article) that solves this problem in Akka Streams using 'divertTo()'.
I couldn't find documentation or code examples of how to implement that in Reactor.
Is there any way to use the Spring Integration error channel with Reactor, or any other ideas for implementing that?
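(For illustration, the "divert errors to another stream" idea could look roughly like this in Reactor, assuming Vavr's Either and a hypothetical process(...) stage over an incoming records Flux; I haven't verified this:)

Flux<Either<Message<Error>, Message<MyPojo>>> classified = records
        .map(record -> process(record))   // each stage returns Either
        .publish()
        .autoConnect(2);                  // emit only once both branches are subscribed

Flux<Message<Error>> errors = classified
        .filter(Either::isLeft)
        .map(Either::getLeft);            // divert: log, write to the failures topic, then commit

Flux<Message<MyPojo>> successes = classified
        .filter(Either::isRight)
        .map(Either::get);                // the business pipeline sees only Message<MyPojo>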
Not familiar with Reactor per se, but you can keep the stream linear. The trick, since Vavr's Either is right-biased, is to use flatMap, which takes a function from Message<MyPojo> to Either<Message<Error>, Message<MyPojo>>. If the Either coming in is a Right (i.e. a Message<MyPojo>), the function gets invoked; otherwise the Left just gets passed through.
// Apologies if the Java is atrocious... haven't written Java since pre-Java 8
incomingEither.flatMap(
        myPojoMessage -> ... // compute a new Either here (this stage's logic)
);
Presumably at some point you want to do something (publish to a dead-letter topic, tickle metrics, whatever) with the Message<Error> case, so for that, orElseRun will come in handy.
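For instance, that terminal step might look roughly like this (a sketch assuming Vavr's Either; finalEither, log, publishToFailureTopic and acknowledge(...) are placeholders, not real API):

finalEither
        .peek(okMessage -> acknowledge(okMessage))      // Right: success path, just commit the offset
        .orElseRun(errorMessage -> {
            log.error("stage failed: {}", errorMessage.getPayload());
            publishToFailureTopic(errorMessage);        // the "all failures" topic
            acknowledge(errorMessage);                  // still commit the source topic offset
        });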

How can I implement throttling by the message value using MassTransit? (backend is SNS/SQS but flexible)

I'm interested in using MassTransit as the event bus to help me bust a cache, but I'm not sure how to properly throttle the service.
The situation
I have a .Net service that has a refreshCache(itemId) API which recomputes the cache for itemId. I want to call this whenever code in my organization modifies any data related to itemId.
However, due to legacy code, I may have 10 events for a given itemId emitted within the same second. Since the refreshCache(itemId) call is expensive, I'd prefer to only call it once every second or so per itemId.
For instance, imagine that I have 10 events emitted for item1 and then 1 event emitted for item2. I'd like refreshCache to be called twice, once with item1 and once with item2.
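(MassTransit itself is .NET, but to make the desired coalescing concrete, here is a language-neutral sketch of the per-itemId debounce idea, written in Java; ItemRefreshDebouncer, the one-second window and refreshCache are illustrative only, not MassTransit API:)

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Collapses bursts of events per itemId: the first event in a window schedules one
// refreshCache call; later events for the same itemId within that window are dropped.
public class ItemRefreshDebouncer {

    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void onItemModified(String itemId) {
        if (pending.add(itemId)) {              // first event for this itemId in the current window
            scheduler.schedule(() -> {
                pending.remove(itemId);
                refreshCache(itemId);           // the expensive call happens once per window
            }, 1, TimeUnit.SECONDS);
        }
        // duplicates within the window are simply ignored
    }

    private void refreshCache(String itemId) {
        System.out.println("refreshing cache for " + itemId); // placeholder for the real refresh
    }
}

With this shape, 10 events for item1 and 1 event for item2 within the same second end up as exactly two refreshCache calls.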
Trouble with MassTransit
I could send event messages that essentially are just itemId over SNS/SQS, and the .Net service could use a MassTransit consumer to listen to that SQS queue and call refreshCache for each message. Ideally, I can also throttle either in SNS/SQS or MassTransit.
I've read these docs: https://masstransit-project.com/advanced/middleware/rate-limiter.html and have tried to find the middleware in the code but wasn't able to locate it.
They seem to suggest that the rate-limiting just delays the delivery of messages, which means that my refreshCache would get called 10 times with item1 before getting called with item2. Instead, I'd prefer it get called once per item, ideally both immediately.
Similarly, it seems as if SNS and SQS can either rate-limit in-order delivery or throttle based on the queue but not based on the contents of that queue. It would not be feasible for me to have separate queues per itemId, as there will be 100,000+ distinct itemIds.
The Ask
Is what I'm trying to do possible in MassTransit? If not, is it possible via SQS? I'm also able to be creative with using RabbitMQ or adding in Lambdas, but would prefer to keep it simple.

what are streams in Dart

What is the difference between async and streams, and where should we use streams instead of async in the Dart language? As described in the official documentation, a stream represents a sequence of data.
Async execution means registering a callback that is called when some other computation completes.
This can be an operating system call like file.readAsString(), or an HTTP request to a server where the client continues executing (rendering the UI or doing other things), and when the response from the server arrives, your code gets called to process the response.
In Dart you usually get a Future back from such an async call, where you can register a callback using .then(/* pass callback here */).
async and await are syntactic sugar so you don't need to clutter your code with .then(...).then(...).
A stream can be sync or async, but async means something different here than the async explained above.
A stream is similar to a Future in some ways, but the callback can be called more than once if multiple events are emitted, until the sender or receiver closes the stream.
An async execution completes a Future once when it's done, and that's it.
A stream can also be seen as an iterable like an array, but where the items are pushed instead of pulled.
Another main difference is that there are many operators available for streams: to map streams, to fork and join multiple streams, and many more.
Many of these operators are reminiscent of methods available for collections like arrays, because, as mentioned, a stream has similarities to iterables.
Streams compose well, and with the set of available operators, streams allow a kind of declarative programming which can be quite powerful: a lot can be achieved with a few streams and operators combined well.

Initial state for a dataflow job

I'm trying to figure out how we "seed" the window state for some of our streaming Dataflow jobs. The scenario: we have a stream of forum messages, and we want to emit a running count of messages for each topic for all time, so we have a streaming Dataflow job with a global window and triggers that emit each time a record for a topic comes in. All good so far. But prior to the stream source, we have a large file which we'd like to process to get our historical counts. Also, because topics live forever, we need the historical count to inform the outputs from the stream source, so we kind of need the same logic to run over the file, then start running over the stream source when the file is exhausted, while keeping the window state.
Current ideas:
Write a custom unbounded source that does just that. Reads over the file until it's exhausted and then starts reading from the stream. Not much fun because writing custom sources is not much fun.
Run the logic in batch mode over the file, and as the last step emit the state to a stream sink somehow, then have a streaming version of the logic start up that reads from both the state stream and the data stream, and somehow combines the two. This seems to make some sense, but not sure how to make sure that the streaming job reads everything from the state source, to initialise, before reading from the data stream.
Pipe the historical data into a stream, write a job that reads from both the streams. Same problems as the second solution, not sure how to make sure one stream is "consumed" first.
EDIT: Latest option, and what we're going with, is to write the calculation job such that it doesn't matter at all what order the events arrive in, so we'll just push the archive to the pub/sub topic and it will all work. That works in this case, but obviously it affects the downstream consumer (need to either support updates or retractions) so I'd be interested to know what other solutions people have for seeding their window states.
You can do what you suggested in bullet point 2: run two pipelines (in the same main), with the first one populating a Pub/Sub topic from the large file. This is similar to what the StreamingWordExtract example does.
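For example, the seeding pipeline could look roughly like this (a sketch using the Apache Beam Java SDK's TextIO and PubsubIO; the file path and topic name are placeholders):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SeedThenStream {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

        // Pipeline 1: replay the historical file into the Pub/Sub topic that the
        // streaming job already reads from.
        Pipeline seed = Pipeline.create(options);
        seed.apply(TextIO.read().from("gs://my-bucket/history/*.txt"))
            .apply(PubsubIO.writeStrings().to("projects/my-project/topics/forum-messages"));
        seed.run().waitUntilFinish();

        // Pipeline 2: the existing streaming job (global window, per-element trigger)
        // is then started against the same topic and counts the replayed history
        // together with the live events.
    }
}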

dart vm send back stream from isolate

There is a similar question (how to process an HTTP stream with Dart) about processing streams in dart2js. I am focused on the VM.
I can read from the spec that:
The content of message can be: primitive values (null, num, bool, double, String), instances of SendPort, and lists and maps whose elements are any of these. List and maps are also allowed to be cyclic.
In the special circumstances when two isolates share the same code and are running in the same process (e.g. isolates created via spawn), it is also possible to send object instances (which would be copied in the process). This is currently only supported by the dartvm. For now, the dart2js compiler only supports the restricted messages described above.
I've learnt that I cannot send the following objects to isolates and back: HttpRequest and HttpResponse objects; I cannot send streams either.
Q: I cannot really understand how I should process a big chunk of data in an isolate and then send it back to the main isolate so that it can, in turn, be sent back to the client.
Normally, if I want to read a file, I can obtain a stream, apply transforms, and then pipe the stream to the HTTP response. What is the best practice for doing that using isolates?
Thanks for the help!
I have provided an example, but I'll give a quick overview of how you can achieve this, although it may not necessarily be the best practice - it's simply the only way I personally know how.
We are given a method called Isolate.spawnUri(Uri uri, List<String> args, dynamic message)
The message parameter can hold any of the values the extract in your first post mentions. What we want to do is, in the main thread, create a new ReceivePort and listen for incoming data, then spawn an isolate with the message being the ReceivePort's sendPort.
The isolate should then create its own ReceivePort, use the message value to send back its own sendPort, and listen on its receive port. What this essentially does is create a two-way communication channel, allowing us to keep the isolate alive and send work back and forth.
Any response from your isolate will come through your original receive port and any data you want to send to the isolate will be through the send port the isolate just sent back.
You can then send the stream data directly to the isolate for it to process at will, and it can send back data as it's made (or you can just close the ports after the isolate has sent its response - it's up to you how you want to go about it).
I have provided an example on GitHub:
https://github.com/TomCaserta/ExampleIsolate
Please note, if you want your isolates to work in dart2js, it's important that you run dart2js on each of your isolate files via the following command:
dart2js -o <filename>.dart.js <filename>.dart
