I'm faced with this situation:
Host A and B are exchanging messages in a conversation through a broker.
When host B receives a message, it sends a delivery token back to host A, so that A can show the user that B has received the message. This may also happen the other way around.
At any point A or B may be offline and the broker will hold on to the messages until they come online and then deliver them.
Each host stores its own and the other host's messages in a database table:
ID | From | To | Msg | Type | Uid
I figured that using the naive table primary-key ID to identify messages would be a bad choice (as it depends on insertion order), so I defined a custom unique ID field (uid).
My question is:
How can I make sure the current message ID stays synchronized between hosts A and B, so that only one message ever has a given ID? I want to use the delivery token's ID to identify which message was received, and that wouldn't be possible if more than one message had the same ID.
If I do this naively, incrementing it every time we send/receive a message, it looks OK at first:
Host A sends a message with ID 1 and increases its current ID to 2
Host B receives the message and increases its current ID to 2
Host B sends a message with ID 2 and increases its current ID to 3
Host A receives the message and increases its current ID to 3
...
But it may very easily break:
Host A sends a message with ID 1 and increases its current ID to 2
Host B sends a message (before receiving the previous one), also with ID 1
Clash: both hosts now hold two different messages with ID 1.
I thought of generating a large UUID for every message (with an extremely low chance of collision), but that introduces a lot of overhead, as every message would have to both carry and store one.
Unfortunately, any solution involving the broker is not viable, because I can't touch the broker's code.
This is a typical Distributed Systems problem (class exercise?). I suppose you are trying to keep a single ID counter in order to determine an absolute order among all messages exchanged between Alice and Bob. If that is not the case, the solution provided in the comment by john1020 should be enough. Another possibility is to store the ID on one node that both A and B can access, with a distributed locking mechanism synchronizing access; that way you always define an order, even in the face of collisions. But this is not always possible, and sometimes not efficient.
Unfortunately, there is no way of keeping an absolute order except having that unique counter behind distributed locks. If you have one ID that can be modified by both A and B, you have an eventual-consistency problem and a risk of collisions. A collision is basically the problem you described.
Now imagine Bob and Alice both send a message at the same time, and both set the ID to 2. In what order would you store the messages? Actually, it doesn't matter: it's like two people speaking on the phone at the same time. There is a collision.
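One coordination-free alternative (a sketch of ours, not something the broker provides) is to identify each message by the pair (sender, per-sender counter). This gives up a single global order, but it makes collisions impossible:

```python
# A sketch (assumption, not from the question): pair a stable host name with
# a per-host counter, so IDs are unique without any coordination.
class Host:
    def __init__(self, name):
        self.name = name
        self.counter = 0

    def next_msg_id(self):
        # (sender, sequence) pairs can never collide across hosts, even if
        # A and B both send "at the same time".
        self.counter += 1
        return (self.name, self.counter)

a, b = Host("A"), Host("B")
ids = {a.next_msg_id(), b.next_msg_id(), a.next_msg_id()}
# ("A", 1), ("B", 1) and ("A", 2): all distinct, so the clash cannot occur
```

The delivery token would then echo the full pair back, which uniquely identifies the acknowledged message.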
However, what is interesting is to identify messages that actually have a sequence or a cause-effect relationship, so you can keep an order between messages that are caused by other messages: Bob invites Alice to dance and Alice says yes, two messages with a clear order.
For keeping such an order you can apply techniques like vector clocks (based on Leslie Lamport's logical clocks): https://en.wikipedia.org/wiki/Vector_clock . You can also read about AWS's DynamoDB: http://the-paper-trail.org/blog/consistency-and-availability-in-amazons-dynamo/
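The vector-clock idea can be sketched in a few lines of Python (class and function names are ours, purely illustrative):

```python
# Minimal vector-clock sketch (illustrative; not from any library).
class VectorClock:
    def __init__(self, node, peers):
        self.node = node
        self.clock = {p: 0 for p in peers}

    def tick(self):
        # Local event or send: advance our own component and return a
        # snapshot to attach to the outgoing message.
        self.clock[self.node] += 1
        return dict(self.clock)

    def merge(self, received):
        # On receive: element-wise max, then tick our own component.
        for p, t in received.items():
            self.clock[p] = max(self.clock.get(p, 0), t)
        self.clock[self.node] += 1

def happened_before(vc1, vc2):
    # vc1 causally precedes vc2 iff vc1 <= vc2 element-wise and vc1 != vc2.
    return vc1 != vc2 and all(t <= vc2.get(p, 0) for p, t in vc1.items())

alice = VectorClock("A", ["A", "B"])
bob = VectorClock("B", ["A", "B"])
stamp = alice.tick()   # Alice sends a message carrying {"A": 1, "B": 0}
bob.merge(stamp)       # Bob's clock becomes {"A": 1, "B": 1}
# happened_before(stamp, bob.clock) is True: the reply is causally ordered
```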
You can also use the same mechanism Cassandra uses for distributed counters. This is a nice description: http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf
I am using the total count of devices as a "server attribute" at the customer-entity level, which is in turn used by dashboard widgets such as doughnut charts. To keep this count up to date, I have a rule chain in place that handles the "Device" addition/assignment event and increments the "totalDeviceCount" attribute at the customer level. But when a device is deleted/unassigned, I cannot reach the customer entity using an "Enrichment" node, because the relation has already been removed by the time this event triggers. So I have the challenge of maintaining the correct count for the widgets.
Has anyone come across similar requirement? How to handle this scenario?
What you could do is count your devices periodically, instead of tracking each individual addition/removal.
You can achieve this with the Aggregate Latest node, where you can specify a period (say, one minute), the entities or devices you want to count, and the variable name to save the result under.
This node outputs a POST_TELEMETRY_REQUEST. If that is fine for you, just route it to a Save Timeseries node. If you want an attribute instead, route it to a Script Transformation node and change the msgType to POST_ATTRIBUTE_REQUEST.
I have a distributed hash table (DHT) running on multiple instances of the same program, either on multiple machines or, for testing, on different ports of the same machine. The instances are started one after another: first the base node is started, then the other nodes join it.
I am a little unsure how to implement the join of the second node in a way that also works for all the other nodes (all of which, of course, run the same program) without special-casing every border case.
For a node to join, it first sends a join message, which gets passed to the correct node (here it is just the base node) and is then answered with a notify message.
With these two messages, the predecessor of the base node and the successor of the joining node get set. But how does the other pointer get set? I know that nodes occasionally send a stabilise message to their successor, which compares the sender to its predecessor and answers with a notify message carrying that predecessor in case it differs from the sender.
Now the base node can't send a stabilise message, because it doesn't know its successor; the new node can send one, but the predecessor it would be told about is already valid.
I am guessing that, in the end, both pointers should point to the other node for the join to be complete.
Here is another diagram of what I think the sequence should be when the third node joins. But again: when do I update the pointers based on a stabilise message, and when do I send a notify message back? In the diagram it is easy to see, but in code it is hard to decide.
The trick here is to set the successor to the same value as the predecessor if it is still NULL after the join message has been received. Everything else is handled nicely by the rest of the protocol.
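With that trick, the join plus the periodic stabilise/notify exchange might look like this (a single-threaded sketch with messages modelled as direct method calls; all names are ours, and the id-range checks of full Chord are omitted):

```python
# Sketch of join/stabilise handling for the two-node case described above.
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = None
        self.predecessor = None

    def handle_join(self, joiner):
        # The joining node becomes our predecessor; the notify reply lets
        # it set its successor.
        self.predecessor = joiner
        joiner.successor = self
        # The trick: if we had no successor yet (we were alone in the
        # ring), point our successor at the new predecessor as well.
        if self.successor is None:
            self.successor = joiner

    def stabilise(self):
        # Ask the successor for its predecessor; if some node slipped in
        # between, adopt it (full Chord also checks that it lies between
        # our id and the successor's id), then notify the successor.
        if self.successor is None:
            return
        x = self.successor.predecessor
        if x is not None and x is not self:
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        # Adopt the sender as predecessor if we have none (full Chord
        # also checks the id range before overwriting).
        if self.predecessor is None:
            self.predecessor = candidate

base, new = Node(1), Node(2)
base.handle_join(new)   # both pointers of base now refer to new
base.stabilise()        # new learns its predecessor via notify
```

After these two steps, each node's successor and predecessor point at the other node, which is the fully joined state described above.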
In Hyperledger Fabric v0.6, a supply chain app can be implemented that allows tracing of provenance and avoids double-spending (i.e., distributing/selling more units of an item than it holds) and thus avoids counterfeiting. As an example, when a supplier supplies 500 units of an item to a distributor, this data is stored in the ledger. The distributor can distribute a specified quantity of an item to a particular reseller by calling a "transfer" function. The transfer function does the following:
checks if the distributor has enough quantity of an item to distribute to a particular reseller (i.e., if quantity to transfer <= current quantity)
updates the ledger (i.e., deducts the current quantity of the distributor and adds this to the current quantity of the reseller)
With this approach, the distributor cannot distribute more (i.e., double spend) than what it has (e.g., distributing counterfeit/smuggled items).
In addition, a consumer can trace the provenance (e.g., an item was purchased from reseller1, which came from a distributor, which came from a supplier) by looking at the ledger.
However, since it uses a single ledger, privacy is an issue (e.g., reseller2 can see the quantity of items ordered by reseller1, etc.)
A proposed solution to impose privacy is to use multiple channels in Hyperledger Fabric v1.0. In this approach, a separate channel/ledger is used by the supplier and distributor. Similarly, a separate channel/ledger is used by the distributor and reseller1, and another separate channel/ledger for the distributor and reseller2.
However, since the resellers (i.e., reseller1 and reseller2) have no access to the channel/ledger of the supplier and distributor, the resellers have no idea of the real quantity supplied by the supplier to the distributor. For example, if the supplier supplied only 500 units to the distributor, the distributor can claim to the resellers that it procured 1000 units from the supplier. With this approach, double spending / counterfeiting is not prevented.
In addition, how will tracing of provenance be implemented? Will a consumer be given access to all the channels/ledgers? If this is the case, then privacy becomes an issue again.
Given this, how can we use multiple channels in Hyperledger Fabric v1.0 while allowing tracing of provenance and prohibiting double spending?
As Artem points out, there is no straightforward way to do this today.
Chaincodes may read across channels, but only weakly, and they may not make the content of this read a contingency of the commit. Similarly, transactions across channels are not ordered, which creates other complications.
However, it should be possible to safely move an asset across channels, so long as there is at least one trusted participant in both channels. You can think of this as the regulatory or auditor role.
To accomplish this, the application would essentially have to implement a mutex on top of fabric which ensures a resource does not migrate to two different channels at once.
Consider a scenario with companies A, B, and regulator R. A is known to have control over an asset Q in channel A-R, and B wants to safely take control over asset Q in channel A-B-R.
To accomplish this safely, A may do the following:
A proposes to lock Q at sequence 0 in A-R to channel A-B-R. Accepted and committed.
A proposes the existence of Q at sequence 0 in A-B-R, endorsed by R (who performs a cross channel read to A-R to verify the asset is locked to A-B-R). Accepted and committed.
A proposes to transfer Q to B in A-B-R, at sequence 0. All parties check that the record for Q at sequence 0 exists, include it in their read set, then set it to sequence 1 in their write set.
The green path is done. Now, suppose instead that B decided not to purchase Q, and A wishes to sell it to C in channel A-C-R. We start by assuming (1) and (2) above have completed.
A proposes to remove asset Q from consideration in channel A-B-R. R reads Q at sequence 0, writes it at sequence 1, and marks it as unavailable.
A proposes to unlock asset Q in A-R. R performs a cross channel read in A-B-R and confirms that the sequence is 1, endorses the unlock in A-R.
A proposes the existence of Q at sequence 1 in A-C-R, and proceeds as in (1).
Attack path: assume (1) and (2) are done once more.
A proposes the existence of Q at sequence 0 in A-C-R. R will read A-R and find it is not locked to A-C-R, will not endorse.
A proposes to remove the asset Q from consideration in A-R after a transaction in A-B-R has moved control to B. Both the move and unlock transaction read that value at the same version, so only one will succeed.
The key here is that B trusts the regulator to enforce that Q cannot be unlocked in A-R until Q has been released in A-B-R. The unordered reads across channels are fine, so long as you include a monotonic sequence number to ensure that the asset is locked at the correct version.
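The endorsement checks in steps 1-2 and the attack path can be modelled in a few lines (a toy sketch, not Fabric chaincode; the helper names and state layout are invented):

```python
# Toy model of the lock/sequence discipline described above. A channel's
# world state is a dict: asset id -> {"seq": ..., "locked_to": ...}.

def lock(channel_state, asset, target_channel):
    # Step 1: lock the asset to a target channel. R refuses to endorse a
    # second lock, which is what stops the attack path.
    rec = channel_state.get(asset)
    if rec is None or rec["locked_to"] is not None:
        return False
    rec["locked_to"] = target_channel
    rec["seq"] += 1
    return True

def endorse_existence(source_state, asset, claiming_channel):
    # Step 2: R's cross-channel read, verifying the asset really is
    # locked to the channel that claims it.
    rec = source_state.get(asset)
    return rec is not None and rec["locked_to"] == claiming_channel

a_r = {"Q": {"seq": 0, "locked_to": None}}    # channel A-R
assert lock(a_r, "Q", "A-B-R")                # green path, step 1
assert endorse_existence(a_r, "Q", "A-B-R")   # green path, step 2
assert not lock(a_r, "Q", "A-C-R")            # attack: R will not endorse
assert not endorse_existence(a_r, "Q", "A-C-R")
```

The monotonic `seq` field plays the role of the sequence number in the steps above: any later unlock or removal must read and advance it, so conflicting moves collide on the same version and only one can commit.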
At the moment there is no straightforward way of providing provenance across two different channels within Hyperledger Fabric 1.0. There are a few directions for supporting such scenarios:
The first is the ability to keep portions of the ledger's data segregated within the channel; this work item is described in FAB-1151.
Additionally, a proposal to add support for private data, while maintaining the ability to prove existence and ownership of a claimed asset, was posted on the mailing list.
What you can do currently is leverage application-side encryption to provide privacy and keep all related transactions on the same channel, i.e. the same ledger (pretty much the approach you had back in v0.6).
Starting in v1.2, Fabric offers the ability to create private data collections, which allow a defined subset of organizations on a channel the ability to endorse, commit, or query private data without having to create a separate channel.
In your case, you can then make the relevant subset of your reseller data private to a particular entity without creating a separate channel.
For more information, refer to the Fabric documentation.
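For illustration, such a collection is declared in a collections configuration JSON supplied when the chaincode is deployed; the collection name and MSP IDs below are invented for this scenario:

```json
[
  {
    "name": "distributorReseller1Private",
    "policy": "OR('DistributorMSP.member', 'Reseller1MSP.member')",
    "requiredPeerCount": 1,
    "maxPeerCount": 2,
    "blockToLive": 0
  }
]
```

Only the organizations named in the policy store the private data; other channel members see just a hash of it on the channel ledger, which still supports existence proofs.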
I am building a social network with Neo4j. It includes:
Node labels: User, Post, Comment, Page, Group
Relationships: LIKE, WRITE, HAS, JOIN, FOLLOW,...
It is like Facebook. For example, user A follows user B: when B performs an action, such as liking a post, commenting, following another user, following a page, or joining a group, that action is sent to A. Similarly, users C, D and E, who also follow B, receive the same notification.
I don't know how to design the data model for this problem. I have some candidate solutions:
1. Create Notification nodes for every user: when an action is executed, create n notifications for its n followers. Benefit: we can track whether a user has seen a notification. But the number of nodes increases very quickly (n per action).
2. Create a query for every notification API call (from the client application); the query only fetches the list of actions by followed users within a certain time window (24 hours, or 2-3 days). But followers can't mark these notifications as seen, and the query may slow the server down.
3. Create a limited number of notification nodes, such as 20-30 per user.
4. Create unlimited nodes (each carrying the time of the action) and delete nodes whose action time is older than 24 hours (or an expiry time of 2-3 days).
Can anyone help me solve this problem? Which solution should I choose, or is there a better way?
I believe the best approach is option 1. As you said, you will be able to know whether the follower has read the notification. About the number of notification nodes per follower: this problem is called "supernodes" or "dense nodes": nodes that have too many connections.
The book Learning Neo4j (by Rik Van Bruggen, available for download on the Neo4j website) talks about dense nodes or supernodes and says:
"[supernodes] becomes a real problem for graph traversals because the graph
database management system will have to evaluate all of the connected
relationships to that node in order to determine what the next step
will be in the graph traversal."
The book proposes a solution that consists of adding meta nodes between the follower and the notification (in your case). Each meta node should have at most about a hundred connections; if the current meta node reaches 100 connections, a new meta node must be created and added to the hierarchy, as in the book's example figure of popular artists and their fans.
I think you should not worry about this right now. If your follower nodes become a problem in the future, you will be able to refactor your database schema. For now, keep things simple!
In the series of posts called "Building a Twitter clone with Neo4j", Max de Marzi describes the process of building the model. Maybe it can help you make better decisions about your model!
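To make option 1 concrete, the fan-out could be a single Cypher statement run whenever an action happens (the labels and FOLLOW relationship follow the question; the HAS edge and the Notification property names are our assumptions, not a fixed schema):

```python
# Illustrative Cypher for option 1: fan one action out to all followers.
FANOUT_QUERY = """
MATCH (actor:User {id: $actorId})<-[:FOLLOW]-(follower:User)
CREATE (follower)-[:HAS]->(:Notification {
    action:    $action,
    actorId:   $actorId,
    createdAt: timestamp(),
    seen:      false
})
"""
# With the official Neo4j driver this would run as, for example:
#   session.run(FANOUT_QUERY, actorId=..., action="LIKE")
# Marking a notification as read is then a single property update
# (seen = true) on the Notification node.
```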
What is the best way to proxy marketplace messaging using SMS?
User Model:
Each conversation has an owner_id and a renter_id; if a message is received from one, it should be proxied to the other.
If the owner is connected to many conversations, what is the best way to make sure the messages are directed to the proper recipient?
Update:
It looks like Twilio recommends purchasing a phone number for each conversation.
This would require owning N phone numbers, where N is at least the number of conversations grouped by unique user/recipient.
For example, with the Airbnb data model, you would need to know the owner with the largest number of unique renters... This seems like a lot of potential overhead. Please correct me if I'm wrong.
This concept will definitely require multiple Twilio numbers if you want to give a frictionless experience (no PINs to enter), but you will only ever need as many numbers as the number of people a single user can contact.
This is explained in more detail here. You only need to work out a starting pool size; the rest can be dynamic.
Say the maximum number of properties any owner owns is N, and he rents them out on all 365 days of the year to different renters; the owner then has N*365 renters in their "address book", so you would only ever need N*365 numbers, even with 100,000 users. If, from historical data, you can work out the maximum N and the maximum number of rental days (say M), the required number of phone numbers is N*M. This is just a starting point and doesn't have to stay a fixed constant.
As a fail-safe, add a handler for when you cross a threshold (say, 90% of your pool of N*M numbers) and use the Twilio REST API to add numbers to the pool dynamically.
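The allocation rule behind this (reuse a number for a pair as long as neither party already uses it in another conversation) can be sketched as follows; the class and its API are invented for illustration, and the Twilio webhook side is reduced to a lookup:

```python
# Sketch of masked-number allocation: each (user, counterpart) pair gets a
# proxy number that is unique among both parties' active conversations,
# so an inbound SMS can be routed unambiguously.
class NumberPool:
    def __init__(self, numbers):
        self.numbers = list(numbers)
        self.assignments = {}   # (user, counterpart) -> proxy number

    def proxy_for(self, user, counterpart):
        key = (user, counterpart)
        if key not in self.assignments:
            # A number is reusable as long as neither party already uses
            # it for a different conversation.
            in_use = {n for (u, _c), n in self.assignments.items()
                      if u in (user, counterpart)}
            free = next(n for n in self.numbers if n not in in_use)
            self.assignments[key] = free
            self.assignments[(counterpart, user)] = free  # same both ways
        return self.assignments[key]

    def route(self, from_user, proxy_number):
        # An inbound SMS from `from_user` to `proxy_number` is forwarded
        # to the counterpart bound to that (sender, number) pair.
        for (u, c), n in self.assignments.items():
            if u == from_user and n == proxy_number:
                return c
        return None
```

An owner with many renters consumes one pool number per renter, while each renter typically consumes only one, which is why the pool size is bounded by the busiest user's contact count rather than the total number of conversations.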