Where does Hazelcast map.get() go? And scalability concerns

According to this, "Each map.get(k) will be a remote operation". But where is the remote? For example, I have one node that writes into the IMap with key k, and another 50 nodes that read from the IMap using map.get(k). What happens when the 50 nodes call map.get(k)? Does each call go to the node that does the write? If so, how many copies of the IMap will this "remote" node create in response to these 50 calls? Is it multi-threaded? Is this IMap a singleton, or will each thread create a deep copy of the IMap?

But where is the remote?
The answer is in the preceding sentence of the documentation you linked: "Imagine that you are reading the key k so many times and k is owned by another member in your cluster." Each key is hashed and mapped to a partition, as explained in http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#sharding-in-hazelcast and the "Data Partitioning" section that follows. The cluster member that owns that partition is the owner (or primary replica) of the key. Every read and write on that key is executed by the same thread on that particular member (unless your configuration allows reads from backups).
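As an illustrative sketch (the map name "my-map" and key "k" are placeholders), Hazelcast's PartitionService lets you see which member owns the partition a key hashes to; that owner is the "remote" member that serves the get:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Member;
import com.hazelcast.core.Partition;

public class WhoOwnsTheKey {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        // Ask the cluster which partition the key hashes to and which member owns it.
        Partition partition = hz.getPartitionService().getPartition("k");
        Member owner = partition.getOwner();
        System.out.println("Key k -> partition " + partition.getPartitionId()
                + ", owned by " + owner);
        // A map.get("k") issued from any member is routed to that owner
        // (unless read-from-backup is enabled and a backup replica is local).
        hz.getMap("my-map").get("k");
        hz.shutdown();
    }
}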
What happens when 50 nodes call map.get(k)? Does each call go to the node that does the write?
Yes, it is always the key owner that executes operations on that key.
If so, how many copies of the IMap will this "remote" node create in response to these 50 calls?
The member only has one instance of the IMap, no copies.
Is it multi-threaded?
No, all map operations involving the same key k are executed on the same partition thread on the same member, which is the primary replica of that key. You can read more about the threading model of operations at http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#operation-threading
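For completeness, here is a minimal sketch (the map name is a placeholder) of the read-from-backup option mentioned above; with it enabled, a member holding a backup replica of k may answer map.get(k) locally instead of asking the owner, at the cost of potentially stale reads:

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ReadBackupExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Keep one backup copy of each partition and allow reading it locally.
        config.addMapConfig(new MapConfig("my-map")
                .setBackupCount(1)
                .setReadBackupData(true));
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        hz.getMap("my-map").get("k");
        hz.shutdown();
    }
}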

Related

Chord Join DHT - join protocol for second node

I have a distributed hash table (DHT) which is running on multiple instances of the same program, either on multiple machines or for testing on different ports on the same machine. These instances are started after each other. First, the base node is started, then the other nodes join it.
I am a little troubled about how I should implement the join of the second node in a way that also works for all the other nodes (all of them, of course, run the same program) without defining lots of border cases.
For a node to join, it sends a join message first, which gets passed to the correct node (here it's just the base node) and then answered with a notify message.
With these two messages, the predecessor of the base node and the successor of the joining node get set. But how does the other property of each get set? I know that the nodes occasionally send a stabilise message to their successor, which compares the sender to its own predecessor and answers with a notify message carrying that predecessor in case it differs from the sender.
Now the base node can't send a stabilise message because it doesn't know its successor, and the new node can send one, but the base node's predecessor is already valid, so nothing changes.
I am guessing that, in the end, both properties should point to the other node for the node to be fully joined.
Here is another diagram of what I think the sequence should be when a third node joins. But again, when do I update the properties based on a stabilise message, and when do I send a notify message back? In the diagram it is easy to see, but in code it is hard to decide.
The trick here is to set the successor to the same value as the predecessor if it is NULL after the join message has been received. Everything else gets handled nicely by the rest of the protocol.
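A rough sketch of how those rules could be wired together, including that NULL-successor trick (plain Java; messages are modelled as direct method calls between in-memory nodes for brevity, and names like onJoin/onStabilise/onNotify and the isBetween helper are my own, not from the question):

class ChordNode {
    final long id;
    ChordNode predecessor; // may be null
    ChordNode successor;   // may be null

    ChordNode(long id) { this.id = id; }

    // Handle a join request that has been routed to the responsible node.
    void onJoin(ChordNode joiner) {
        joiner.onNotify(this);                 // the joiner learns its successor
        if (predecessor == null || isBetween(joiner.id, predecessor.id, id)) {
            predecessor = joiner;              // the joiner becomes our predecessor
        }
        if (successor == null) {
            successor = predecessor;           // the trick: a lone base node closes the ring
        }
    }

    // Periodic maintenance: ask our successor to cross-check the ring.
    void stabilise() {
        if (successor != null) {
            successor.onStabilise(this);
        }
    }

    // A predecessor candidate announced itself via a stabilise message.
    void onStabilise(ChordNode sender) {
        if (predecessor == null || isBetween(sender.id, predecessor.id, id)) {
            predecessor = sender;
        }
        sender.onNotify(predecessor);          // tell the sender who we think precedes us
    }

    // Adopt a better successor if the notified node sits between us and it.
    void onNotify(ChordNode candidate) {
        if (successor == null || isBetween(candidate.id, id, successor.id)) {
            successor = candidate;
        }
    }

    // True if x lies in the half-open ring interval (from, to].
    static boolean isBetween(long x, long from, long to) {
        return from < to ? (x > from && x <= to) : (x > from || x <= to);
    }
}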

How do we enforce privacy while providing tracing of provenance using multiple channels in Hyperledger Fabric v1.0?

In Hyperledger Fabric v0.6, a supply chain app can be implemented that allows tracing of provenance and avoids double-spending (i.e., distributing/selling more items than it has) and thus avoids counterfeiting. As an example, when a supplier supplies 500 units of an item to a distributor, this data is stored in the ledger. The distributor can distribute a specified quantity of an item to a particular reseller by calling a "transfer" function. The transfer function does the following:
checks if the distributor has enough quantity of an item to distribute to a particular reseller (i.e., if quantity to transfer <= current quantity)
updates the ledger (i.e., deducts the current quantity of the distributor and adds this to the current quantity of the reseller)
With this approach, the distributor cannot distribute more (i.e., double spend) than what it has (e.g., distributing counterfeit/smuggled items).
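A plain-Java sketch (not actual chaincode; the class and method names are made up) of the transfer check described above, which is what prevents distributing more than is held:

import java.util.HashMap;
import java.util.Map;

class TransferLedger {
    private final Map<String, Long> quantities = new HashMap<>();

    // Record units supplied to an owner (e.g. supplier -> distributor).
    void supply(String owner, long units) {
        quantities.merge(owner, units, Long::sum);
    }

    // Returns false (transfer rejected) if 'from' does not hold enough units.
    boolean transfer(String from, String to, long units) {
        long current = quantities.getOrDefault(from, 0L);
        if (units > current) {
            return false;                       // cannot distribute more than it has
        }
        quantities.put(from, current - units);  // deduct from the distributor
        quantities.merge(to, units, Long::sum); // add to the reseller
        return true;
    }
}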
In addition, a consumer can trace the provenance (e.g., an item was purchased from reseller1, which came from a distributor, which came from a supplier) by looking at the ledger.
However, since it uses a single ledger, privacy is an issue (e.g., reseller2 can see the quantity of items ordered by reseller1, etc.)
A proposed solution to impose privacy is to use multiple channels in Hyperledger Fabric v1.0. In this approach, a separate channel/ledger is used by the supplier and distributor. Similarly, a separate channel/ledger is used by the distributor and reseller1, and another separate channel/ledger for the distributor and reseller2.
However, since the resellers (i.e., reseller1 and reseller2) have no access to the channel/ledger of the supplier and distributor, the resellers have no idea of the real quantity supplied by the supplier to the distributor. For example, if the supplier supplied only 500 units to the distributor, the distributor can claim to the resellers that it procured 1000 units from the supplier. With this approach, double spending / counterfeiting will not be avoided.
In addition, how will tracing of provenance be implemented? Will a consumer be given access to all the channels/ledgers? If this is the case, then privacy becomes an issue again.
Given this, how can we use multiple channels in Hyperledger Fabric v1.0 while allowing tracing of provenance and prohibiting double spending?
As Artem points out, there is no straightforward way to do this today.
Chaincodes may read across channels, but only weakly, and they may not make the content of this read a contingency of the commit. Similarly, transactions across channels are not ordered, which creates other complications.
However, it should be possible to safely move an asset across channels, so long as there is at least one trusted participant in both channels. You can think of this as the regulatory or auditor role.
To accomplish this, the application would essentially have to implement a mutex on top of fabric which ensures a resource does not migrate to two different channels at once.
Consider a scenario with companies A, B, and regulator R. A is known to have control over an asset Q in channel A-R, and B wants to safely take control over asset Q in channel A-B-R.
To safely accomplish this, A may do the following:
A proposes to lock Q at sequence 0 in A-R to channel A-B-R. Accepted and committed.
A proposes the existence of Q at sequence 0 in A-B-R, endorsed by R (who performs a cross channel read to A-R to verify the asset is locked to A-B-R). Accepted and committed.
A proposes to transfer Q to B in A-B-R, at sequence 0. All endorsers check that the record for Q at sequence 0 exists, include it in their read set, then set it to sequence 1 in their write set.
The green path is done. Now, let's say instead that B decided not to purchase Q, and A wishes to sell it to C in A-C-R. We start by assuming (1) and (2) above have completed.
A proposes to remove asset Q from consideration in channel A-B-R. R reads Q at sequence 0, writes it at sequence 1, and marks it as unavailable.
A proposes to unlock asset Q in A-R. R performs a cross channel read in A-B-R and confirms that the sequence is 1, endorses the unlock in A-R.
A proposes the existence of Q at sequence 1 in A-C-R, and proceeds as in (1).
Attack path: assume (1) and (2) are done once more.
A proposes the existence of Q at sequence 0 in A-C-R. R will read A-R and find it is not locked to A-C-R, will not endorse.
A proposes to remove the asset Q from consideration in A-R after a transaction in A-B-R has moved control to B. Both the move and unlock transaction read that value at the same version, so only one will succeed.
The key here is that B trusts the regulator to enforce that Q cannot be unlocked in A-R until Q has been released in A-B-R. The unordered reads are fine across the channels, so long as you include a monotonic sequence number to ensure that the asset is locked at the correct version.
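For intuition, here is a minimal sketch (plain Java, not a Fabric API; the field names are made up) of the per-channel record for Q that the scheme relies on, carrying the monotonic sequence number plus the lock/availability state that R verifies with cross-channel reads before endorsing:

class AssetRecord {
    final String assetId;        // e.g. "Q"
    long sequence;               // monotonic version, bumped on every state change
    String lockedToChannel;      // e.g. "A-B-R" while Q is in play there, else null
    boolean available;           // false once Q is removed from consideration

    AssetRecord(String assetId) {
        this.assetId = assetId;
        this.sequence = 0;
        this.lockedToChannel = null;
        this.available = true;
    }
}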
At the moment there is no straightforward way of providing provenance across two different channels within Hyperledger Fabric 1.0. There are a few directions to support such scenarios:
The first is the ability to keep portions of the ledger data segregated within the channel; the work item is described here: FAB-1151.
Additionally, a proposal for adding support for private data, while maintaining the ability to prove existence and ownership of a claimed asset, was posted on the mailing list.
What you can do currently is leverage application-side encryption to provide privacy and keep all related transactions on the same channel, i.e. the same ledger (pretty much the approach you had back in v0.6).
Starting in v1.2, Fabric offers the ability to create private data collections, which allow a defined subset of organizations on a channel the ability to endorse, commit, or query private data without having to create a separate channel.
Now in your case, you can make a subset of your reseller data private to the particular entity without creating a separate channel.
For more info, refer to the Fabric Doc.

Synchronizing a current message id in a conversation between Alice and Bob

I'm faced with this situation:
Host A and B are exchanging messages in a conversation through a broker.
When host B receives a message, it sends back a delivery token to host A so that A can show the user that B has received the message. This may also happen the other way around.
At any point A or B may be offline and the broker will hold on to the messages until they come online and then deliver them.
Each host stores its own and the other host's messages in a database table:
ID | From | To | Msg | Type | Uid
I figured using the naive table primary key id would have been a bad choice to identify the messages (as it depends on insertion order), so I defined a custom unique id field (uid).
My question is:
How can I make sure that the current message id stays synchronized between hosts A and B, so that only one message has a given id? That way I can use the delivery token id to identify which message was received, which would not be possible if more than one message had the same id.
If I do this naively, incrementing it every time we send/receive a message, at first it looks ok:
Host A sends a message with ID 1 and increases its current ID to 2
Host B receives the message and increases its current ID to 2
Host B sends a message with ID 2 and increases its current ID to 3
Host A receives the message and increases its current ID to 3
...
But it may very easily break:
Host A sends a message with ID 1 and increases its current ID to 2
Host B sends a message (before receiving the previous one) with ID 1
Clash: two messages with ID 1 received by both hosts.
I thought of generating a large UUID every time (with an extremely low chance of collision), but it introduces a large overhead, as every message would need to both carry and store one.
Unfortunately any solution regarding the broker is not viable because I can't touch the code of the broker.
This is a typical problem in distributed systems (class exercise?). I suppose you are trying to keep the same ID in order to determine an absolute order among all messages exchanged between Alice and Bob. If this is not the case, the solution provided in the comment by john1020 should be enough. Another possibility is to have the ID stored in one node that can be accessed by both A and B, with a distributed lock mechanism synchronizing access. That way you always define an order, even in the face of collisions. But this is not always possible and is sometimes not efficient.
Unfortunately, there is no way of keeping an absolute order (except having that unique counter with distributed locks). If you have one ID that can be modified by both A and B, you will have a problem of eventual consistency and risk of collisions. A collision is basically the problem you described.
Now, imagine both Bob and Alice send a message at the same time, and both set the ID to 2. What would be the order in which you store the messages? Actually it doesn't matter; it's like the situation when two people speak on the phone at the same time. There is a collision.
However, what is interesting is to identify messages that actually have a sequence or cause-effect relationship, so you can keep an order between messages that are caused by other messages: Bob invites Alice to dance and Alice says yes, two messages with an order.
For keeping such an order you can apply techniques like vector clocks (based on Leslie Lamport's timestamp vector algorithm): https://en.wikipedia.org/wiki/Vector_clock. You can also read about Amazon's Dynamo: http://the-paper-trail.org/blog/consistency-and-availability-in-amazons-dynamo/
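A minimal vector-clock sketch (illustrative only, not tied to your broker or table schema) showing how per-host counters capture the happened-before relation between messages:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();

    // Called on a host before it sends a message; the clock is stamped on the message.
    void tick(String hostId) {
        counters.merge(hostId, 1L, Long::sum);
    }

    // Called when a message stamped with 'other' is received: take component-wise max.
    void merge(VectorClock other) {
        other.counters.forEach((host, value) ->
                counters.merge(host, value, (a, b) -> Math.max(a, b)));
    }

    // True if this clock happened strictly before the other one (causally earlier).
    boolean happenedBefore(VectorClock other) {
        boolean strictlySmaller = false;
        Set<String> hosts = new HashSet<>(counters.keySet());
        hosts.addAll(other.counters.keySet());
        for (String host : hosts) {
            long mine = counters.getOrDefault(host, 0L);
            long theirs = other.counters.getOrDefault(host, 0L);
            if (mine > theirs) return false;   // not <= in every component: concurrent or later
            if (mine < theirs) strictlySmaller = true;
        }
        return strictlySmaller;
    }
}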
Also you can use the same mechanism Cassandra uses for distributed counters. This is a nice description: http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf

py2neo set_labels not working with get_or_create_indexed_node

Why does batch.set_labels() work with batch.create() but not with batch.get_or_create_indexed_node()?
This works, a node is created as expected.
from py2neo import neo4j, node

# neo4j_graph is an existing neo4j.GraphDatabaseService instance
batch = neo4j.WriteBatch(neo4j_graph)
a = batch.create(node({'name': 'a'}))
batch.set_labels(a, 'Person')
batch.submit()
This does not work, no node is created.
graph_db.get_or_create_index(neo4j.Node, 'node_index')
batch = neo4j.WriteBatch(neo4j_graph)
b = batch.get_or_create_indexed_node(NEO4J_NODE_INDEX, 'name', 'b',
                                     {'name': 'b'})
batch.set_labels(b, 'Person')
batch.submit()
This is a limitation of the way that the batch endpoint and other resources work through the REST interface and requires some familiarity with the REST API to fully understand.
The batch endpoint bundles up a number of HTTP requests into a single call that is carried out within a single transaction. One of the values returned from such requests is a location URI and this can in some cases be passed through into other requests. For example, one may want to create two nodes and then a relationship connecting them. This can be achieved by using pointers such as {0} and {1} to refer to the nodes previously created as the endpoints of the new relationship. For more details on this notation, see here.
The difficulty comes when using the legacy index calls. When a node is created through a legacy index, the location URI returned is that of the index entry point, not of the newly-created node. This cannot be used as a reference (such as in your set_labels call above) and so the expected behaviour does not occur.
Unfortunately, there is no straightforward workaround for this. You can move over to Cypher but you have no way to write to legacy indexes there. Perhaps you can look at schema indexes for this instead?

Neo4j PHP: acquire write lock

I'm sinking in big trouble.
Can anyone tell me how I can acquire a write lock through Cypher?
Note: I will use the REST API, so my Cypher will be issued from PHP.
EDIT:
Scenario:
I am using Neo4j REST server and PHP to access it.
Now I have created a node, say 'counter-node', which generates new user ids. The logic is simply to add 1 to the previous value.
Now if two users arrive simultaneously, the first user reads the 'counter-node' value, but before it can write back the incremented value, the second user reads it. Thus the value in 'counter-node' is not as expected.
Any help?
You don't need to acquire write locks explicitly. All nodes that you modify in a transaction are write-locked automatically.
So if you do this in your logic:
start tx
increment counter node
read the value of the counter node and set it on the user node as ID
commit tx
no two users will ever get the same ID.
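For illustration only, here is the same pattern written against Neo4j's embedded Java API (3.x); the question itself uses the REST API from PHP, and the counter/user property names here are assumptions. The point is simply that the counter node is write-locked from the moment it is modified until the transaction commits:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

public class UserIdGenerator {
    // Increment the counter and assign the new value to the user, atomically.
    public static long nextUserId(GraphDatabaseService db, Node userNode) {
        try (Transaction tx = db.beginTx()) {
            Node counter = db.findNode(Label.label("Counter"), "name", "user-id");
            long next = (long) counter.getProperty("value") + 1;
            counter.setProperty("value", next);   // takes a write lock held until commit
            userNode.setProperty("id", next);     // assign the new id to the user node
            tx.success();
            return next;
        }
    }
}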
The popular APOC plugin for Neo4j has a selection of explicit Locking procedures that can be called via Cypher such as call apoc.lock.nodes([nodes])
Learn more at neo4j-contrib.github.io/neo4j-apoc-procedures/#_locking
Note: as far as I can tell, this functionality doesn't exist natively in Cypher so APOC is probably your best bet.
