How do we enforce privacy while providing tracing of provenance using multiple channels in Hyperledger Fabric v1.0?

In Hyperledger Fabric v0.6, a supply chain app can be implemented that allows tracing of provenance and avoids double-spending (i.e., distributing/selling more items than one has), and thus avoids counterfeiting. As an example, when a supplier supplies 500 units of an item to a distributor, this data is stored in the ledger. The distributor can distribute a specified quantity of an item to a particular reseller by calling a "transfer" function. The transfer function does the following:
checks if the distributor has enough quantity of an item to distribute to a particular reseller (i.e., if quantity to transfer <= current quantity)
updates the ledger (i.e., deducts the current quantity of the distributor and adds this to the current quantity of the reseller)
With this approach, the distributor cannot distribute more (i.e., double spend) than what it has (e.g., distributing counterfeit/smuggled items).
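The check-then-update logic of the transfer function can be sketched in plain Python. This is only an illustration of the rule described above, not Fabric chaincode: the `ledger` dictionary, the `transfer` signature, and the `InsufficientQuantity` exception are hypothetical names chosen for the example.

```python
class InsufficientQuantity(Exception):
    """Raised when a party tries to transfer more units than it holds."""


def transfer(ledger, sender, receiver, quantity):
    """Move `quantity` units from sender to receiver, rejecting overspends.

    `ledger` is a plain dict (party -> quantity held), standing in for the
    world state that real chaincode would read and write.
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    available = ledger.get(sender, 0)
    # Step 1: the quantity to transfer must not exceed the current quantity.
    if quantity > available:
        raise InsufficientQuantity(
            f"{sender} holds {available}, cannot transfer {quantity}"
        )
    # Step 2: deduct from the sender and add to the receiver.
    ledger[sender] = available - quantity
    ledger[receiver] = ledger.get(receiver, 0) + quantity
    return ledger


# The supplier has already supplied 500 units to the distributor.
ledger = {"distributor": 500}
transfer(ledger, "distributor", "reseller1", 300)
```

A second call that tries to move another 300 units to reseller2 raises `InsufficientQuantity`, because only 200 units remain with the distributor.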
In addition, a consumer can trace the provenance (e.g., an item was purchased from reseller1, which came from a distributor, which came from a supplier) by looking at the ledger.
However, since it uses a single ledger, privacy is an issue (e.g., reseller2 can see the quantity of items ordered by reseller1, etc.)
A proposed solution to impose privacy is to use multiple channels in Hyperledger Fabric v1.0. In this approach, a separate channel/ledger is used by the supplier and distributor. Similarly, a separate channel/ledger is used by the distributor and reseller1, and another separate channel/ledger for the distributor and reseller2.
However, since the resellers (i.e., reseller1 and reseller2) have no access to the channel/ledger of the supplier and distributor, the resellers have no idea of the real quantity supplied by the supplier to the distributor. For example, if the supplier supplied only 500 units to the distributor, the distributor can claim to the resellers that it procured 1000 units from the supplier. With this approach, double spending / counterfeiting will not be avoided.
In addition, how will tracing of provenance be implemented? Will a consumer be given access to all the channels/ledgers? If this is the case, then privacy becomes an issue again.
Given this, how can we use multiple channels in Hyperledger Fabric v1.0 while allowing tracing of provenance and prohibiting double spending?

As Artem points out, there is no straightforward way to do this today.
Chaincodes may read across channels, but only weakly, and they may not make the content of this read a contingency of the commit. Similarly, transactions across channels are not ordered, which creates other complications.
However, it should be possible to safely move an asset across channels, so long as there is at least one trusted participant in both channels. You can think of this as the regulatory or auditor role.
To accomplish this, the application would essentially have to implement a mutex on top of fabric which ensures a resource does not migrate to two different channels at once.
Consider a scenario with companies A, B, and regulator R. A is known to have control over an asset Q in channel A-R, and B wants to safely take control over asset Q in channel A-B-R.
To safely accomplish this, A may do the following:
(1) A proposes to lock Q at sequence 0 in A-R to channel A-B-R. Accepted and committed.
(2) A proposes the existence of Q at sequence 0 in A-B-R, endorsed by R (who performs a cross-channel read to A-R to verify the asset is locked to A-B-R). Accepted and committed.
(3) A proposes to transfer Q to B in A-B-R, at sequence 0. All endorsers check that the record for Q at sequence 0 exists and include it in their readset, then set it to sequence 1 in their writeset.
The green path is done. Now, let's say instead that B decided not to purchase Q, and A wished to sell it to C in A-C-R. We start by assuming that (1) and (2) above have completed.
A proposes to remove asset Q from consideration in channel A-B-R. R reads Q at sequence 0, writes it at sequence 1, and marks it as unavailable.
A proposes to unlock asset Q in A-R. R performs a cross channel read in A-B-R and confirms that the sequence is 1, endorses the unlock in A-R.
A proposes the existence of Q at sequence 1 in A-C-R, and proceeds as in (1)
Attack path: assume (1) and (2) are done once more.
A proposes the existence of Q at sequence 0 in A-C-R. R will read A-R and find it is not locked to A-C-R, will not endorse.
A proposes to remove the asset Q from consideration in A-R after a transaction in A-B-R has moved control to B. Both the move and unlock transaction read that value at the same version, so only one will succeed.
The key here is that B trusts the regulator to enforce that Q cannot be unlocked in A-R until Q has been released in A-B-R. The unordered reads across the channels are fine, so long as you include a monotonic sequence number to ensure that the asset is locked at the correct version.
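To make the lock-and-sequence idea concrete, here is a toy, single-process simulation of the steps above. It is not Fabric chaincode and uses no Fabric API: `Channel`, `Regulator`, and every method name are hypothetical, each channel's world state is just a dict, and the regulator's endorsement is reduced to a function returning True or False.

```python
class Channel:
    """A toy stand-in for one channel's world state (asset id -> record)."""

    def __init__(self, name):
        self.name = name
        self.state = {}


class Regulator:
    """Hypothetical trusted party R that endorses in every channel."""

    def __init__(self, home):
        self.home = home  # channel where the asset originates (A-R in the text)

    def lock(self, asset, target, seq):
        """Lock the asset in the home channel to one target channel."""
        rec = self.home.state[asset]
        if rec["locked_to"] is not None or rec["seq"] != seq:
            return False
        rec["locked_to"] = target.name
        return True

    def introduce(self, asset, target, seq):
        """Endorse the asset's existence in `target` after checking the lock."""
        home = self.home.state[asset]  # cross-channel read of the home channel
        if home["locked_to"] != target.name or home["seq"] != seq:
            return False
        if asset in target.state:
            return False
        target.state[asset] = {"owner": "A", "seq": seq, "available": True}
        return True

    def transfer(self, asset, channel, new_owner, seq):
        """Move control to a new owner; only valid at the expected sequence."""
        rec = channel.state.get(asset)
        if rec is None or rec["seq"] != seq or not rec["available"]:
            return False
        rec.update(owner=new_owner, seq=seq + 1)
        return True

    def release(self, asset, channel, seq):
        """Remove the asset from consideration; bumps the sequence."""
        rec = channel.state.get(asset)
        if rec is None or rec["seq"] != seq or rec["owner"] != "A":
            return False
        rec.update(available=False, seq=seq + 1)
        return True

    def unlock(self, asset, channel, released_seq):
        """Unlock in the home channel only if the asset was released."""
        home = self.home.state[asset]
        rec = channel.state.get(asset)  # cross-channel read of the trading channel
        if home["locked_to"] != channel.name or rec is None:
            return False
        if rec["available"] or rec["seq"] != released_seq:
            return False
        home.update(locked_to=None, seq=released_seq)
        return True


a_r, a_b_r, a_c_r = Channel("A-R"), Channel("A-B-R"), Channel("A-C-R")
a_r.state["Q"] = {"owner": "A", "seq": 0, "locked_to": None}
r = Regulator(a_r)

results = {
    "lock": r.lock("Q", a_b_r, 0),                    # step (1)
    "introduce": r.introduce("Q", a_b_r, 0),          # step (2)
    "double_introduce": r.introduce("Q", a_c_r, 0),   # attack: not locked to A-C-R
    "transfer": r.transfer("Q", a_b_r, "B", 0),       # step (3)
    "late_release": r.release("Q", a_b_r, 0),         # attack: sequence already moved
    "late_unlock": r.unlock("Q", a_b_r, 1),           # attack: asset was never released
}
```

In this sketch both attack attempts are refused because the regulator's checks depend on the lock target and the sequence number, which is the role the monotonic sequence plays in the description above. Real endorsement, ordering, and readset/writeset validation are all out of scope here.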

At the moment there is no straightforward way of providing provenance across two different channels within Hyperledger Fabric 1.0. There are a few directions for supporting such scenarios:
The first is the ability to keep portions of the ledger's data and provide discrete segregation within the channel; this work item is described in FAB-1151.
Additionally, a proposal for adding support for private data, while maintaining the ability to prove the existence and ownership of a claimed asset, was posted on the mailing list.
What you can do currently is use application-side encryption to provide privacy and keep all related transactions on the same channel, i.e., the same ledger (much like the approach you had back in v0.6).

Starting in v1.2, Fabric offers the ability to create private data collections, which allow a defined subset of organizations on a channel to endorse, commit, or query private data without having to create a separate channel.
In your case, you can make a subset of your reseller data private to a particular entity without creating a separate channel.
For more information, refer to the Fabric documentation.


Device Delete event Handling in Rule chain being able to reduce the total device count at Customer Level

I store the total count of devices as a server attribute at the customer entity level, and that attribute is in turn used by dashboard widgets such as doughnut charts. To maintain the count, I have put a rule chain in place that handles the Device addition/assignment event and increments the "totalDeviceCount" attribute at the customer level. But when a device is deleted or unassigned, I am unable to reach the Customer entity using an "Enrichment" node, because the relation has already been removed by the time this event is triggered. As a result, I cannot keep the count correct for the widgets.
Has anyone come across similar requirement? How to handle this scenario?
What you could do is to count your devices periodically, instead of tracking each individual addition/removal.
This you can achieve using the Aggregate Latest Node, where you can indicate a period (say, each minute), the entity or devices you want to count, and to which variable name you want to save it.
This node outputs a POST_TELEMETRY_REQUEST. If you are OK with that, just route that node to Save Timeseries. If you want an attribute, route that node to a Script Transformation Node and change the msgType to POST_ATTRIBUTES_REQUEST.

Is Sales Transaction modeled as Hub or a Link in Data Vault 2.0

I'm a rookie in Data Vault, so please excuse my ignorance. I am currently ramping up and modeling Raw Data Vault in parallel using Data Vault 2.0. I have few assumptions and need help validating them.
1) Individual Hubs are modeled for:
a) Product(holds pk-Product_Hkey, BK,Metadata),
b) Customer(holds pk-Customer_Hkey,BK,Metadata),
c) Store(holds pk-Store_Hkey,BK,Metadata).
Now a Sales Txn that involves all of the above Business Objects should be modeled as a Link table:
d) Link table - Sales_Link (holds pk-Sales_Hkey, Sales Txn ID, Product_Hkey(fk), Customer_Hkey(fk), Store_Hkey(fk), Metadata), with a Satellite attached to the Link table holding descriptive data about the Link.
Is the above approach valid ?
My rationale for the above Link Table is because
I consider Sales Txn ID as a non-BK & hence
Sales Txn's must be hosted in a Link as opposed to hub.
2) Operational data has different types of customers (Retail, Professional). All customers, regardless of type, should be modeled in one hub, and the distinction between customer types should be made by modeling different Satellites (one for Retail, one for Professional) tied to the Customer hub.
Is the above valid?
I have researched online technical forums, but got conflicting theories, so I'm posting it here.
I would suggest modeling sales as a Hub if you are fine with the points below; otherwise a Link is a perfectly good design.
Sales transaction as a hub (Sales_Hub):
What is the business key? Can you consider "Sales Txn ID" (a unique number) as a BK?
Is this hub, or the same BK, used in another Link (other than Sales_Link), i.e., a link on a link?
Are you OK with Sales_Link having no satellite, since all the descriptive data would hang off Sales_Hub?
It will also store the same BK and audit metadata in two places (Hub and Link) and require additional joins to fetch data from the Hub's satellite.
On your second question, separate satellites are valid when customer information (retail, professional, etc.) is stored in separate tables in the source system(s).
If the data comes through a single source table, you should model a single satellite and then apply soft rules to split the customers by type in the Business Data Vault.
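As an aside on the `_Hkey` columns in the question, here is a minimal sketch of how hub and link hash keys are commonly derived from business keys. The `hash_key` helper, the `||` delimiter, the trim-and-uppercase normalization, the choice of MD5, and the sample key values are all assumptions made for illustration; your own standards may differ.

```python
import hashlib


def hash_key(*business_keys, delimiter="||"):
    """Derive a deterministic hash key from one or more business keys.

    Keys are trimmed and upper-cased before hashing so that cosmetic
    differences between source systems do not produce different keys
    (an assumed normalization rule, not a requirement of the question).
    """
    normalized = [str(key).strip().upper() for key in business_keys]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()


# Hub hash keys: one business key each (sample values are made up).
product_hkey = hash_key("P-1001")
customer_hkey = hash_key("C-42")
store_hkey = hash_key("S-7")

# Link hash key: here built from the transaction ID plus the business keys
# of the hubs that the sales transaction connects.
sales_hkey = hash_key("TXN-0001", "P-1001", "C-42", "S-7")
```

Whether "Sales Txn ID" takes part in the link hash key, or becomes the business key of its own hub, is exactly the modeling decision discussed in this thread; the sketch only shows the mechanics.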

Neo4J - Which is better to store element as a property of user or as a node & relationship?

I have a problem designing a graph model with a million users. I need to store whether each user is registered or not.
As I see it, we have two options:
Store a property "register = true/false" on each user node. With 1 million users, we have 1 million "register" properties.
Store a single Registered node and create a relationship to it only from registered users. The number of relationships then equals the number of registered users.
Which option is better for search performance and for minimizing storage?
Thanks in advance.
Modeling your data as a graph is a difficult thing to pin down exactly. Typically, when it comes to NoSQL databases, the most important thing to consider is how you will be using your data, and to model it based on that.
Using the external node might run into performance problems, as Neo4J typically starts to run into issues during traversing as it approaches around 10,000 relationships in a single node. You will be well above that limit with an external "Registered" node; on the other hand as long as you are not anchoring your search to that node, it should be okay.
No matter which route you go, the query you described in the comments will likely anchor on (start with) the user, then traverse to who their friends are, and then for each friend, it will check whether it
A. has the "registered" property set to 'true'
B. has a relationship to the "Registered" node.
Each of these methods appears to have a similar execution time, and indexing on the "registered" property will have negligible impact because it is not being used as an anchor (presumably; you would have to PROFILE your query with both methods to find out for sure). So, like you mentioned, one might consider the space constraints.
Besides that, there is not much difference from a performance analysis perspective between the two methods that I can see.
A third option, mentioned by @InverseFalcon, is to use an additional label, ':Registered', on those nodes that are registered. This might well result in a faster comparison than keeping it in a property, as labels are inlined in the node store and can be checked there, whereas properties might have an additional level of indirection to the property store.

data model for notification in social network?

I build a social network with Neo4j, it includes:
Node labels: User, Post, Comment, Page, Group
Relationships: LIKE, WRITE, HAS, JOIN, FOLLOW,...
It is like Facebook.
Example: user A follows user B. When B performs an action such as liking a post, commenting, following another user, following a page, or joining a group, that action is sent to A. Similarly, users C, D, and E, who also follow B, receive the same notification.
I don't know how to design the data model for this problem. I have some candidate solutions:
Create Notification nodes for every user. When an action is executed, create n notifications for n followers. Benefit: we can check whether a user has seen the notification, right? But the number of nodes increases quickly, to the power of n.
Run a query on every notification API call (from the client application). This query only gets the list of actions by followed users within a certain time window (24 hours, or 2 to 3 days). But we cannot track whether followers have seen a notification, and this query may slow the server down.
Create a limited number of nodes, such as 20 or 30 per user.
Create unlimited nodes (including the time of the action) and delete those whose time-of-action property is older than 24 hours (the expiry time could be 2 or 3 days).
Can anyone help me solve this problem? Which solution should I choose, or is there a better way?
I believe that the best approach is option 1. As you said, you will be able to know whether the follower has read the notification. As for the number of notification nodes per follower: this problem is called "supernodes" or "dense nodes", i.e., nodes that have too many connections.
The book Learning Neo4j (by Rik Van Bruggen, available for download on the Neo4j website) talks about "dense nodes" or "supernodes" and says:
"[supernodes] becomes a real problem for graph traversals because the graph
database management system will have to evaluate all of the connected
relationships to that node in order to determine what the next step
will be in the graph traversal."
The book proposes a solution that consists of adding meta nodes between the follower and the notifications (in your case). Each meta node should have at most a hundred connections. When the current meta node reaches 100 connections, a new meta node must be created and added to the hierarchy. The book illustrates this with an example of popular artists and their fans.
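The bucketing rule behind the meta nodes can be sketched in a few lines of Python. This models only the fan-out logic, not Neo4j itself: each meta node is represented as a plain list, and `add_notification`, `MAX_LINKS`, and the notification names are hypothetical.

```python
MAX_LINKS = 100  # per-meta-node cap mentioned above


def add_notification(meta_nodes, notification, max_links=MAX_LINKS):
    """Attach a notification to the newest meta node, opening a new one when full.

    `meta_nodes` is a list of lists; each inner list stands in for one meta
    node and the notifications connected to it. Returns the index of the
    meta node that received the notification.
    """
    if not meta_nodes or len(meta_nodes[-1]) >= max_links:
        meta_nodes.append([])  # current meta node is full (or none exists yet)
    meta_nodes[-1].append(notification)
    return len(meta_nodes) - 1


meta_nodes = []
for i in range(250):
    add_notification(meta_nodes, f"notification-{i}")

sizes = [len(node) for node in meta_nodes]  # [100, 100, 50]
```

With 250 notifications no single node carries more than 100 connections, so a traversal never has to evaluate one very dense node.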
I think you should not worry about it right now. If your follower nodes become a problem in the future, you will be able to refactor your database schema. For now, keep things simple!
In the series of posts called "Building a Twitter clone with Neo4j", Max de Marzi describes the process of building the model. Maybe it can help you make better decisions about your model!

wso2/ws02 CEP, ESPER or something else?

I have a use case where a system transaction is completed over a period of time through multiple "building up" steps. Each step in the process generates one or more events (up to 22 events per transaction). All events within a transaction share a unique (UUID) correlation ID.
For example, transaction X will have the building blocks EventA, EventB, EventC, and so on, all tagged with a unique correlation identifier.
The ultimate goal is to switch from persisting all the separate events in an RDBMS and querying a consolidated view (lots of joins) to persisting only one encompassing transaction record that consolidates attributes from each step in the transaction.
My research so far led me toward reading about Esper (Java stack here) and WSo2/WS02 CEP. In my case each event is submitted/enqueued into JMS, and I am wondering if a solution like WS02/WSo2 CEP can be used to consolidate JMS events/messages (streams) and based on correlation ID (and maximum time limit 30 min) produce one consolidated record and send it down JMS to ultimately persist in a DB.
Since I am still in research mode, I was wondering if I am on the right path for a solution?
Has anybody achieved something like this using WS02/WSo2 CEP, or is it overkill? Any other recommendations?
You can use WSO2 CEP by integrating it with JMS to send and receive events, and by using Siddhi pattern queries[1] to consolidate events arriving from the same transaction.
30 minutes is a reasonable time period, but it is recommended to test the scenario with a test data set, because the servers need enough memory for CEP to hold the state. This will depend greatly on the event rate.
AFAIK this is not overkill in an enterprise deployment.
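Independent of the engine chosen, the state a CEP engine has to hold for this use case can be sketched as follows. This is plain Python, not Siddhi or Esper, and it makes one assumption that the question does not state: that a known final step (`final_step` below) marks a transaction as complete. All class, method, and field names are hypothetical.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 30 * 60  # maximum time limit from the question


@dataclass
class PendingTransaction:
    first_seen: float
    steps: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


class Consolidator:
    """Merge events that share a correlation ID into one record."""

    def __init__(self, final_step, window=WINDOW_SECONDS):
        self.final_step = final_step  # assumed marker of completion
        self.window = window
        self.pending = {}  # correlation ID -> PendingTransaction

    def on_event(self, correlation_id, step, attributes, now):
        """Record one event; return the consolidated record if it completes."""
        txn = self.pending.setdefault(
            correlation_id, PendingTransaction(first_seen=now)
        )
        txn.steps.append(step)
        txn.attributes.update(attributes)
        if step == self.final_step:
            return self._emit(correlation_id, complete=True)
        return None

    def expire(self, now):
        """Flush transactions that have been open for the whole window."""
        expired = [
            cid for cid, txn in self.pending.items()
            if now - txn.first_seen >= self.window
        ]
        return [self._emit(cid, complete=False) for cid in expired]

    def _emit(self, correlation_id, complete):
        txn = self.pending.pop(correlation_id)
        return {
            "correlation_id": correlation_id,
            "complete": complete,
            "steps": txn.steps,
            **txn.attributes,
        }


consolidator = Consolidator(final_step="EventC")
consolidator.on_event("tx-1", "EventA", {"amount": 10}, now=0)
consolidator.on_event("tx-1", "EventB", {"currency": "USD"}, now=60)
record = consolidator.on_event("tx-1", "EventC", {"status": "done"}, now=120)
```

The `pending` dictionary is the in-memory state the answer above warns about: its size grows with the event rate and with how long transactions stay open, which is why testing with realistic data matters.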
I would recommend trying Esper patterns. For multi-event systems where some particular information has to be collected, patterns work best.
A sample would be:
    select * from TemperatureEvent
    match_recognize (
        measures A as temp1, B as temp2, C as temp3, D as temp4
        pattern (A B C D)
        define
            A as A.temperature > 100,
            B as (A.temperature < B.temperature),
            C as (B.temperature < C.temperature),
            D as (C.temperature < D.temperature) and D.temperature > (A.temperature * 1.5)
    )
Here, we have 4 events and 5 conditions involving these events. Example is taken from demo project.
