What will Project Reactor do in this scenario?

In a very low transaction volume environment, such as some of our test environments, what will the Project Reactor classes do if the subscriber requests N new items to work on, but the publisher currently has fewer than N new items to send? Will the publisher (1) send fewer than N new items immediately, (2) wait until it has N new items to send, or (3) wait until a configurable timeout before it sends fewer than N new items? I'm trying to troubleshoot a problem in one of these very low volume environments, and it leads me to believe that something is stuck on the publisher side, even though the subscriber should be regularly requesting batches of work.

If the publisher can only partially fulfill the request, it must send those x < N items immediately. It must also remember that the remaining demand is now y = N - x, and send further elements as they become available, even if the subscriber has made no additional request.
If the publisher knows that it won't be able to fulfill the full demand, because it works with a limited dataset, then it must complete the sequence early with an onComplete signal after having emitted what it can.
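For what it's worth, a minimal Reactor sketch of the early-completion case (a finite Flux stands in for the low-volume publisher; the class name and strings are made up): the subscriber requests 5 items, the publisher emits the 2 it has immediately, and since its dataset is exhausted it completes rather than waiting for the demand to be met.

import org.reactivestreams.Subscription;
import reactor.core.publisher.BaseSubscriber;
import reactor.core.publisher.Flux;

public class PartialDemandDemo {
    public static void main(String[] args) {
        // A publisher that only has 2 items available.
        Flux<String> lowVolumePublisher = Flux.just("event-1", "event-2");

        lowVolumePublisher.subscribe(new BaseSubscriber<String>() {
            @Override
            protected void hookOnSubscribe(Subscription subscription) {
                request(5); // ask for 5 items, but only 2 exist
            }

            @Override
            protected void hookOnNext(String value) {
                System.out.println("received: " + value); // arrives immediately
            }

            @Override
            protected void hookOnComplete() {
                // Finite source: completes early instead of waiting to fill the demand.
                System.out.println("onComplete: remaining demand of 3 never filled");
            }
        });
    }
}

With a live (unbounded) source the same request(5) would instead be remembered as outstanding demand, and later items would be delivered without a further request.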

Related

AnyLogic priority-based queueing

So the initial problem is that I have 2 products, one individual and one standard. After the separate products are produced, they are sent out, but in the logistics department there is only 1 worker. So how do I prioritize the individual product? The worker should always send out the individual product before the standard product.
I'm stuck because I have no idea how to set up the queueing, whether agent comparison or priority-based, and how does the block know which product is which?
Thanks
Easiest approach:
add a parameter "myPriority" to your product agent type (integer)
when creating an individualized one, set it to 10, else to 1
in your queue block, use priority-based queueing with myPriority as the priority (see the sketch after this list). This will ensure the queue always moves higher-priority agents to the front. Make sure your queue block expects agents of your Product type
Also check the example models and the help :)
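Since the screenshot of the queue settings isn't reproduced here, a plain-Java sketch of the ordering you get (this is not the AnyLogic API, just an illustration of what priority-based queueing on a myPriority parameter does; the class and field names are made up):

import java.util.Comparator;
import java.util.PriorityQueue;

class Product {
    final String name;
    final int myPriority; // 10 = individualized, 1 = standard
    Product(String name, int myPriority) { this.name = name; this.myPriority = myPriority; }
}

public class PriorityDemo {
    public static void main(String[] args) {
        // Higher priority first, mirroring "move higher-prio agents to the front".
        PriorityQueue<Product> queue =
            new PriorityQueue<>(Comparator.comparingInt((Product p) -> p.myPriority).reversed());
        queue.add(new Product("standard", 1));
        queue.add(new Product("individual", 10));
        queue.add(new Product("standard", 1));
        // The single logistics worker always takes the head of the queue:
        while (!queue.isEmpty()) {
            System.out.println("shipping " + queue.poll().name);
        }
    }
}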

How to dedupe across overlapping sliding windows in Apache Beam / Dataflow

I have the following requirement:
read events from a Pub/Sub topic
take a window of duration 30 minutes and period 1 minute
in that window, if 3 events for a given ID all match some predicate, then I need to raise an event in a different Pub/Sub topic
The event should be raised as soon as the 3rd event comes in for the grouping ID, as this is for detecting fraudulent behaviour. In one pane there may be many IDs that have 3 events matching my predicate, so I may need to emit multiple events per pane.
I am able to write a function which consumes a PCollection, does the necessary grouping, logic and filtering, and emits events according to my business logic.
Questions:
The output PCollection contains duplicates due to the overlapping sliding windows. I understand this is the expected behaviour of sliding windows, but how can I avoid this whilst staying in the same Dataflow pipeline? I realise I could dedupe in an external system, but that just adds complexity to my system.
I also need to write some sort of trigger that fires each and every time my condition is reached in a window.
Is Dataflow suitable for this type of real-time detection scenario?
Many thanks
You can rewindow the output PCollection into the global window (using the regular Window.into()) and dedupe using a GroupByKey.
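A hedged sketch of that rewindow-and-dedupe idea with the Beam Java SDK. The Alert type, its fraudKey field, and the DedupTransforms/DedupByKeyFn names are placeholders rather than anything from the question; and instead of a literal GroupByKey it uses a stateful DoFn keyed by the grouping id, which in the global window yields at most one output per key:

import java.io.Serializable;

import org.apache.beam.sdk.coders.BooleanCoder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

/** Hypothetical stand-in for whatever your fraud-detection step emits. */
class Alert implements Serializable {
    String fraudKey;   // e.g. the grouping id
    String getFraudKey() { return fraudKey; }
}

class DedupTransforms {

    /** Rewindow the per-window alerts into the global window, key them, and dedupe. */
    static PCollection<Alert> dedup(PCollection<Alert> alerts) {
        return alerts
            .apply(Window.<Alert>into(new GlobalWindows()))
            .apply(WithKeys.of((Alert a) -> a.getFraudKey())
                .withKeyType(TypeDescriptors.strings()))
            .apply(ParDo.of(new DedupByKeyFn()));
    }

    /** Stateful DoFn: per key (in the global window), emit only the first alert seen. */
    static class DedupByKeyFn extends DoFn<KV<String, Alert>, Alert> {
        @StateId("seen")
        private final StateSpec<ValueState<Boolean>> seenSpec = StateSpecs.value(BooleanCoder.of());

        @ProcessElement
        public void process(ProcessContext c, @StateId("seen") ValueState<Boolean> seen) {
            if (seen.read() == null) {        // first occurrence of this key
                seen.write(true);
                c.output(c.element().getValue());
            }
        }
    }
}

The per-key ValueState flag is used because a GroupByKey firing repeatedly with discarding panes would re-emit duplicates arriving in later panes; the state flag suppresses them for the lifetime of the key.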
It sounds like you're already returning the events of interest as a PCollection. In order to "do something for each event", all you need is a ParDo.of(whatever action you want) applied to this collection. Triggers do something else: they control what happens when a new value V arrives for a particular key K in a GroupByKey<K, V>: whether to drop the value, or buffer it, or to pass the buffered KV<K, Iterable<V>> for downstream processing.
Yes :)

How do we enforce privacy while providing tracing of provenance using multiple channels in Hyperledger Fabric v1.0?

In Hyperledger Fabric v0.6, a supply chain app can be implemented that allows tracing of provenance and avoids double-spending (i.e., distributing/selling more items than one has), and thus avoids counterfeiting. As an example, when a supplier supplies 500 units of an item to a distributor, this data is stored in the ledger. The distributor can distribute a specified quantity of an item to a particular reseller by calling a "transfer" function. The transfer function does the following (a rough sketch follows the list):
checks if the distributor has enough quantity of an item to distribute to a particular reseller (i.e., if quantity to transfer <= current quantity)
updates the ledger (i.e., deducts the current quantity of the distributor and adds this to the current quantity of the reseller)
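A rough plain-Java illustration of that check-and-update logic (not the actual v0.6 chaincode; the party names and quantities are illustrative):

import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the ledger's "current quantity per party" state.
class TransferDemo {
    private final Map<String, Integer> quantities = new HashMap<>();

    TransferDemo() {
        quantities.put("distributor", 500); // supplied by the supplier
        quantities.put("reseller1", 0);
    }

    /** Transfers qty units from one party to another, refusing to over-spend. */
    boolean transfer(String from, String to, int qty) {
        int current = quantities.getOrDefault(from, 0);
        if (qty > current) {
            return false; // cannot distribute more than it has (no double spend)
        }
        quantities.put(from, current - qty);                       // deduct
        quantities.put(to, quantities.getOrDefault(to, 0) + qty);  // add
        return true;
    }

    public static void main(String[] args) {
        TransferDemo ledger = new TransferDemo();
        System.out.println(ledger.transfer("distributor", "reseller1", 300)); // true
        System.out.println(ledger.transfer("distributor", "reseller1", 300)); // false, only 200 left
    }
}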
With this approach, the distributor cannot distribute more (i.e., double spend) than what it has (e.g., distributing counterfeit/smuggled items).
In addition, a consumer can trace the provenance (e.g., an item was purchased from reseller1, which came from a distributor, which came from a supplier) by looking at the ledger.
However, since it uses a single ledger, privacy is an issue (e.g., reseller2 can see the quantity of items ordered by reseller1, etc.)
A proposed solution to impose privacy is to use multiple channels in Hyperledger Fabric v1.0. In this approach, a separate channel/ledger is used by the supplier and distributor. Similarly, a separate channel/ledger is used by the distributor and reseller1, and another separate channel/ledger for the distributor and reseller2.
However, since the resellers (i.e., reseller1 and reseller2) have no access to the channel/ledger of the supplier and distributor, the resellers have no idea of the real quantity supplied by the supplier to the distributor. For example, if the supplier supplied only 500 units to the distributor, the distributor could claim to the resellers that it procured 1000 units from the supplier. With this approach, double spending / counterfeiting is not prevented.
In addition, how will tracing of provenance be implemented? Will a consumer be given access to all the channels/ledgers? If this is the case, then privacy becomes an issue again.
Given this, how can we use multiple channels in Hyperledger Fabric v1.0 while allowing tracing of provenance and prohibiting double spending?
As Artem points out, there is no straightforward way to do this today.
Chaincodes may read across channels, but only weakly, and they may not make the content of this read a contingency of the commit. Similarly, transactions across channels are not ordered, which creates other complications.
However, it should be possible to safely move an asset across channels, so long as there is at least one trusted participant in both channels. You can think of this as the regulatory or auditor role.
To accomplish this, the application would essentially have to implement a mutex on top of fabric which ensures a resource does not migrate to two different channels at once.
Consider a scenario with companies A, B, and regulator R. A is known to have control over an asset Q in channel A-R, and B wants to safely take control over asset Q in channel A-B-R.
To safely accomplish this, A may do the following:
(1) A proposes to lock Q at sequence 0 in A-R to channel A-B-R. Accepted and committed.
(2) A proposes the existence of Q at sequence 0 in A-B-R, endorsed by R (who performs a cross-channel read to A-R to verify the asset is locked to A-B-R). Accepted and committed.
(3) A proposes to transfer Q to B in A-B-R, at sequence 0. All check that the record for Q at sequence 0 exists and include it in their readset, then set it to sequence 1 in their writeset.
The green path is done. Now, let's say instead that B decided not to purchase Q, and A wishes to sell it to C in A-C-R. We start by assuming (1) and (2) above have completed.
A proposes to remove asset Q from consideration in channel A-B-R. R reads Q at sequence 0, writes it at sequence 1, and marks it as unavailable.
A proposes to unlock asset Q in A-R. R performs a cross channel read in A-B-R and confirms that the sequence is 1, endorses the unlock in A-R.
A proposes the existence of Q at sequence 1 in A-C-R, and proceeds as in (1)
Attack path: assume (1) and (2) are done once more.
A proposes the existence of Q at sequence 0 in A-C-R. R will read A-R and find it is not locked to A-C-R, will not endorse.
A proposes to remove the asset Q from consideration in A-R after a transaction in A-B-R has moved control to B. Both the move and unlock transaction read that value at the same version, so only one will succeed.
The key here is that B trusts the regulator to enforce that Q cannot be unlocked in A-R until Q has been released in A-B-R. The unordered reads are fine across the channels, so long as you include a monotonic sequence number to ensure that the asset is locked at the correct version.
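As a rough illustration of the mutex idea (plain Java, not the Fabric chaincode API; all names here are illustrative assumptions), the per-asset record can carry a monotonic sequence number and the channel it is locked to, and R's endorsement checks hinge on cross-channel reads of that record:

/** Illustrative per-asset record; in practice this would live in a channel's world state. */
class AssetRecord {
    String assetId;     // e.g. "Q"
    long sequence;      // monotonic version, bumped on every state change
    String owner;       // e.g. "A" or "B"
    String lockedTo;    // channel the asset is currently locked to (e.g. "A-B-R"), or null
    boolean retired;    // marked unavailable in this channel before unlocking elsewhere
}

class RegulatorChecks {

    /**
     * Check R performs (via a cross-channel read of the source channel, e.g. A-R) before
     * endorsing "existence of Q at sequence s" in a destination channel such as A-B-R.
     */
    static boolean mayCreateIn(AssetRecord source, String destinationChannel, long claimedSequence) {
        return destinationChannel.equals(source.lockedTo)
            && source.sequence == claimedSequence;
    }

    /**
     * Check R performs (via a cross-channel read of e.g. A-B-R) before endorsing an unlock
     * back in A-R: the asset must already have been retired there at the expected sequence.
     */
    static boolean mayUnlock(AssetRecord destination, long expectedRetiredSequence) {
        return destination.retired && destination.sequence == expectedRetiredSequence;
    }
}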
At the moment there is no straightforward way of providing provenance across two different channels within Hyperledger Fabric 1.0. There are a few directions for supporting such scenarios:
The first is the ability to keep portions of the ledger's data segregated within the channel; this work item is described in FAB-1151.
Additionally, a proposal for adding support for private data, while maintaining the ability to prove existence and ownership of a claimed asset, was posted to the mailing list.
What you can do currently is leverage application-side encryption to provide privacy and keep all related transactions on the same channel, i.e. the same ledger (pretty much similar to the approach you had back in v0.6).
Starting in v1.2, Fabric offers the ability to create private data collections, which allow a defined subset of organizations on a channel the ability to endorse, commit, or query private data without having to create a separate channel.
Now in your case, you can make a subset of your reseller data private to the particular entity without creating a separate channel.
For more info, refer to the Fabric documentation.

How do CEP rules engines store time data?

I'm thinking about designing an event processing system.
The rules per se are not the problem.
What bogs me down is how to store event data so that I can efficiently answer questions/facts like:
If number of events of type A in the last 10 minutes equals N,
and the average events of type B per minute over the last M hours is Z,
and the current running average of another metric is Y...
then
fire some event (or store a new fact/event).
How do Esper/Drools/MS StreamInsight store their time-dependent data so that they can efficiently calculate event stream properties? Do they just store it in SQL databases and continuously query them?
Do they preprocess the rules so they can know beforehand what "knowledge" they need to store?
Thanks
EDIT: I found that what I want is called Event Stream Processing, and the Wikipedia example shows what I would like to do:
WHEN Person.Gender EQUALS "man" AND Person.Clothes EQUALS "tuxedo"
FOLLOWED-BY
Person.Clothes EQUALS "gown" AND
(Church_Bell OR Rice_Flying)
WITHIN 2 hours
ACTION Wedding
Still the question remains: how do you implement such a data store? The key is "WITHIN 2 hours" and the ability to process thousands of events per second.
Esper analyzes the rule and only stores derived state (aggregations etc., if any) and, if needed by the rule, also a subset of events. Esper allows defining contexts as described in the book by Opher Etzion and Peter Niblett, which I recommend reading. By specifying a context, Esper can minimize the amount of state it retains and can make queries easier to read.
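As a rough illustration of what "derived state only" means, here is a hedged sketch against the classic Esper client API (EPServiceProvider style; newer Esper versions use a compile-and-deploy API instead), with an assumed EventA class. For a rule like "count of EventA over the last 10 minutes", Esper retains the running count plus the events currently inside the 10-minute window, not the whole stream:

import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class CountWindowDemo {

    // Assumed event type with whatever properties you need.
    public static class EventA {
        private final String id;
        public EventA(String id) { this.id = id; }
        public String getId() { return id; }
    }

    public static void main(String[] args) {
        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider();
        engine.getEPAdministrator().getConfiguration().addEventType(EventA.class);

        // Sliding 10-minute time window; only the count and the windowed events are retained.
        EPStatement stmt = engine.getEPAdministrator().createEPL(
            "select count(*) as cnt from EventA.win:time(10 min) having count(*) >= 3");

        stmt.addListener((newData, oldData) -> {
            if (newData != null) {  // ignore remove-stream updates
                System.out.println("EventA count in last 10 min: " + newData[0].get("cnt"));
            }
        });

        for (int i = 0; i < 3; i++) {
            engine.getEPRuntime().sendEvent(new EventA("e" + i)); // third event triggers the listener
        }
    }
}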
It's not difficult to store events happening within a time window of a certain length. The problem gets more difficult if you have to consider additional constraints: here an analysis of the rules is indicated so that you can maintain sets of events matching the constraints.
Storing events in an (external) database will be too slow.

WSO2 CEP, Esper, or something else?

I have a use case where a system transaction happens/completes over a period of time and with multiple "building up" steps. Each step in the process generates one or more events (up to 22 events per transaction). All events within a transaction share a unique (UUID) correlation ID.
An example: a transaction X will have the building blocks EventA, EventB, EventC..., all tagged with the same correlation identifier.
The ultimate goal here is to switch from persisting all the separate events in an RDBMS and querying a consolidated view (lots of joins), to persisting only 1 encompassing transaction record that consolidates attributes from each step in the transaction.
My research so far has led me toward reading about Esper (Java stack here) and WSO2 CEP. In my case each event is submitted/enqueued into JMS, and I am wondering if a solution like WSO2 CEP can be used to consolidate JMS events/messages (streams) and, based on correlation ID (and a maximum time limit of 30 min), produce one consolidated record and send it down JMS to ultimately persist in a DB.
Since I am still in research mode, I was wondering if I am on the right path for a solution?
Has anybody achieved such a thing using WSO2 CEP, or is it overkill? Any other recommendations?
Thanks
-S
You can use WSO2 CEP by integrating it with JMS to send and receive events, and by using Siddhi pattern queries[1] to consolidate events arriving from the same transaction.
30 min is a reasonable time period, but it's recommended to test the scenario with a realistic test data set, because the CEP servers need enough memory to hold the pending states. This will greatly depend on the event rate.
AFAIK this is not overkill in an enterprise deployment.
[1]https://docs.wso2.com/display/CEP200/Patterns
I would recommend trying Esper patterns. For multi-event systems where some particular information needs to be collected, patterns work best.
A sample would be:
select * from TemperatureEvent
match_recognize (
    measures A as temp1, B as temp2, C as temp3, D as temp4
    pattern (A B C D)
    define
        A as A.temperature > 100,
        B as (A.temperature < B.temperature),
        C as (B.temperature < C.temperature),
        D as (C.temperature < D.temperature) and D.temperature > (A.temperature * 1.5))
Here we have 4 events and 5 conditions involving these events. The example is taken from a demo project.

Resources