Implement privacy settings in status updates in Neo4j database


Social networks nowadays allow a user to set privacy settings for each post. For example, on Facebook a user can set the privacy setting of a status update to "just me", "friends", "public", etc. How would one design such a system in Neo4j? (I could not find documentation on anything like this.)
One way would be to set a property on each post. But this doesn't look scalable, as it cannot be extended to something like "share with group 1" (assuming users are allowed to create their own groups, or circles as in Google+). Also, when a user tries to access the newsfeed, merging separate results from "just me", "friends" and "public" sounds like a nightmare.
Any suggestions?

For the sharing level I'd set a label on the content: :SharePublic (or leave that one off), :ShareFriends, :ShareGroup, :ShareIndividual.
For groups and individuals, create a relationship from the content to the group or user it is shared with.
To aggregate the newsfeed for a user, go over the potential news items until the limit is reached:
- allow public
- allow individual if pointing to me
- allow group if pointing to one of my groups
- allow friend if I am friends with the author (check whether there is a friendship rel from me to the author)
As the feed is a core high-performance use case, I'd probably not do it in Cypher but in a server extension.
Cypher "pseudocode"
MATCH (u:User {login:{login}})
MATCH (:Head)-[:NEXT*..1000]->(p:Post)
WHERE p:SharePublic
OR (p:ShareIndividual AND (p)-[:SHARED_WITH]->(u))
OR (p:ShareFriends AND (p)<-[:AUTHOR]-()-[:FRIEND]-(u))
OR (p:ShareGroup AND (p)-[:SHARED_WITH]->(:Group)<-[:IN_GROUP]-(u))
RETURN p
LIMIT 30
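As a rough illustration of the loop described above (walk the candidate posts, keep the ones the viewer is allowed to see, stop at the limit), here is a minimal in-memory Python sketch. The `Post` class, the `share` values and the `can_see` / `newsfeed` helpers are all hypothetical names invented for this example; a real server extension would walk the graph instead of a list.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    author: str
    share: str  # assumed levels: "public", "friends", "group", "individual"
    shared_with: set = field(default_factory=set)  # user or group names


def can_see(post, user, friends, groups):
    """Apply the four sharing rules from the answer to a single post."""
    if post.share == "public":
        return True
    if post.share == "individual":
        return user in post.shared_with
    if post.share == "group":
        return bool(post.shared_with & groups)
    if post.share == "friends":
        return post.author in friends
    return False


def newsfeed(posts, user, friends, groups, limit=30):
    """Walk posts (assumed newest first) and stop once `limit` items are found."""
    feed = []
    for post in posts:
        if can_see(post, user, friends, groups):
            feed.append(post)
            if len(feed) == limit:
                break
    return feed
```

Adding a new sharing level then means adding one more branch (or label), rather than merging separately queried result sets.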

Related

Can graph database query "nodes that a given node has no relationship with"?

I am working on a dating app where users can "like" or "dislike" other users and get matched.
As you can imagine the most important query of the app would be:
Give me a stack of nearby user profiles that I have NOT liked/disliked before.
I tried to work on this with a document database (Firestore) and figured it's simply not suitable for such kind of application and hence landed in the graph database world which is new and fascinating to me.
I understand that by nature a graph database retrieves data by tracing through relationships and makes relationships first-class citizens. My question is: what if the nodes I am trying to get are exactly those with no relationship from the given node? What would the query look like? Can anyone provide an example query?
Edit:
- added nearby criteria to the query statement

This is definitely possible. Here is an example query, assuming likes and dislikes are stored as LIKES and DISLIKES relationships pointing from your profile to the other one:
MATCH (me:Profile {name: "Chris"})
MATCH (other:Profile)
WHERE other <> me AND NOT (me)-[:LIKES|DISLIKES]->(other)
RETURN other
As stated in the comments on your original question, this might not scale well on a large dataset. That said, it is pretty uncommon to use only one criterion for matching. For example, the list of possible profiles to match from can be narrowed by:
geolocation
profiles at depth 2 (who likes me, who else do those people like, and do those people like me?)
shared interests
age group
skin color
...
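To make the combination of "not yet rated" and "nearby" concrete, here is a small Python sketch that works on plain dictionaries rather than a graph. The function names, the `rated` mapping and the 25 km default radius are assumptions made up for this example; the distance is the standard haversine formula.

```python
import math


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) pairs, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def candidates(me, profiles, rated, max_km=25.0):
    """Profiles near `me` that `me` has neither liked nor disliked yet.

    profiles: name -> (lat, lon); rated: name -> set of names already rated.
    """
    seen = rated.get(me, set())
    here = profiles[me]
    return [name for name, loc in profiles.items()
            if name != me
            and name not in seen
            and distance_km(here, loc) <= max_km]
```

Applying the cheap, selective filter first (here: distance) keeps the "no relationship" check limited to a small candidate set, which is the same idea as narrowing the match in the graph query.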

graph modeling approach for node/edge user access control

Are there sets of best practices to approach how to model data in a graph database (I am considering arangodb right now but the question would apply to other platforms)? Here is a practical case to illustrate my question:
Assuming we are creating a centralised contact list for users. Each user has contacts but some contacts could be common to users e.g. John knows Mary, and Marc knows Mary. I would thus have 3 nodes (John, Mary and Marc) but John should only see his relationship to Mary, not Marc's relationship to Mary
So how should a full graph be designed in order to support user access to their information?
Option 1: Create 1 graph per user. That way, I know exactly who can see what (I could for example prefix all my collections with the user id). That would be simple but would duplicate a lot of data (e.g. if I put all my family in the db, my brother will do too, creating twice the same data, in different graphs)
Option 2: Create 1 general graph with Contact nodes, plus User nodes. I would have the contact John, Mary and Marc connected, but the User node representing John, would be linked to the Contact nodes John and Mary only. That way I would know to get only the contact nodes that are connected to the User node I am focusing on.
The problem is that edges cannot be linked to the User node (I cannot have an edge going from a node to an edge...can I?). So I would have to add an attribute of user_id to all the edges in order to only fetch the ones relevant to the current user.
This is slightly nicer as I do not have to duplicate nodes, but I would still have to duplicate edges as they would be user specific
Option 3: Do it SQL like with a Rights table, maintaining a list of Contact ids along with what user can see what Node and what Edge (heavy on joins)
Option 4: ???
As in everything, there are many ways to reach a solution but I was wondering what was considered best practice to balance cleanliness of approach and performance for insertion/deletion...knowing that performance might be platform dependent

I would suggest an Option 4:
First, I would not distinguish between User and Contact nodes; all of them should be Contact nodes.
If you create a new User, you basically create a new Contact for them (or use an existing one) and connect your application's authentication to this specific Contact.
Then you can use directed edges to create the contact list for a user.
Say you have two users, John and Mary. John can then add Mary to his contact list without Mary noticing. If she wants to add John, you add a second edge.
If you want symmetrical contacts only (if John adds Mary to his list, he should automatically appear in her list), you simply ignore the direction in your queries.
If you now want to get the contacts for John, this can be done by selecting the neighbors of John.
In ArangoDB this can be realized with two collections, say Contact and Knows, where Knows holds the edges.
The following code, pasted into arangosh, creates the situation you described above:
db._create("Contact");
db._createEdgeCollection("Knows");
db.Contact.save({_key: "John", mail: "john@example.com"});
db.Contact.save({_key: "Mary", mail: "mary@somewhere.com"});
db.Contact.save({_key: "Marc", mail: "marc@somewhereelse.com"});
db.Knows.save("Contact/John", "Contact/Mary", {});
db.Knows.save("Contact/Marc", "Contact/Mary", {});
To query the contact list for user John:
db._query('RETURN NEIGHBORS(Contact, Knows, "John", "outbound")').toArray()
Should give Mary as result, no information about Marc.
If you do not want to merge Contacts and User accounts as I suggested, you could also keep them in separate collections. In this case you have to slightly modify the edges and the query:
db.Knows.save("User/John", "Contact/Mary", {});
db.Knows.save("User/Marc", "Contact/Mary", {});
db._query('RETURN NEIGHBORS(User, Knows, "John", "outbound")').toArray()
This should give the same result.
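The directed-edge idea is independent of ArangoDB. As a minimal sketch, the same John / Mary / Marc situation can be modelled in Python with a list of (from, to) pairs; the `contacts` helper and its `direction` values are hypothetical names chosen for this example, not an ArangoDB API.

```python
# Each pair is a directed "Knows" edge: (owner of the contact list, contact).
knows = [("John", "Mary"), ("Marc", "Mary")]


def contacts(user, edges, direction="outbound"):
    """Return the neighbors of `user`, following edges in the given direction.

    "outbound": contacts the user added; "inbound": users who added them;
    "any": ignore the direction (symmetrical contacts).
    """
    out = {dst for src, dst in edges if src == user}
    if direction == "outbound":
        return out
    inbound = {src for src, dst in edges if dst == user}
    return inbound if direction == "inbound" else out | inbound
```

With these edges, `contacts("John", knows)` contains only Mary, and nothing about Marc's edge to Mary leaks into John's list.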
Edit:
Regarding your question in Option 2:
In ArangoDB it is actually possible to point edges to other edges. However, the built-in graph functionality will then treat the edges pointed to as if they were nodes, which means it does not follow their direction automatically. But you can use these resulting edges in further AQL statements and continue the search with AQL features.

Neo4j how to do a highscore system

I work in a company that makes a social game where our users can have friends and can make content that based on popularity shows up on highscores.
I am trying to find out whether we can move some of our data to a graph database like neo4j and one of the things I can't figure out is how to implement a highscore system in a graph database. I basically want to make queries like this:
Get list of movies/artbooks/photos content created by friends ordered by content with most likes.
Get list of movies/artbooks/photos content created by ALL USERS in the last 7 days ordered by content with most likes.
What kind of data modeling and queries should we do to implement this?
The data model I was planning was to have users as nodes, with the content made by a user attached as a linked list of content nodes whose latest entry is linked to the user. But how do I get highscores into such a model?
Thanks.

Here is one possible model:
(f:User {name: "Fred"})-[:CREATED]->(c:Content {created: 2345, type: "Music"})
(m:User {name: "Mary"})-[:LIKES {score:5}]->(c:Content)
(f)-[:KNOWS]->(m);
To get the content created by all Users since a specific timestamp, in descending order by the number of likes, you can use the following query. The OPTIONAL MATCH is used to avoid filtering out Content with no likes.
MATCH (c:Content)
WHERE c.created > 1234
OPTIONAL MATCH ()-[l:LIKES]->(c)
RETURN c, COUNT(l) AS num_likes
ORDER BY num_likes DESC;
Here is a console that illustrates this.
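For readers who want to see the ranking logic outside of Cypher, here is a small Python sketch of the same idea: count likes per content item, keep items with zero likes (the role of OPTIONAL MATCH), and sort by count. The `highscores` function, the dictionary layout and the optional `authors` filter (for the "created by friends" variant from the question) are assumptions made for this example.

```python
from collections import Counter


def highscores(content, likes, since, authors=None):
    """Rank content created after `since` by number of likes, most liked first.

    content: id -> {"author": ..., "created": ...}
    likes:   iterable of (user, content_id) pairs
    authors: optional set of author names to restrict to (e.g. my friends)
    """
    counts = Counter(content_id for _user, content_id in likes)
    rows = [(cid, counts[cid])  # a Counter yields 0 for unliked content
            for cid, item in content.items()
            if item["created"] > since
            and (authors is None or item["author"] in authors)]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Passing the set of a user's friends as `authors` gives the first list from the question; passing a timestamp seven days in the past as `since` gives the second.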

Neo4j user status privacy setting model

I am following this article to set up a system where users can follow each other and also become friends: http://neo4j.com/docs/stable/cypher-cookbook-newsfeed.html
A user (A) can friend another user (B), and hence A is automatically following B. User A can also follow B without adding B as a friend. Hence there should be a distinction made in the feed results. If A and B are not confirmed friends, A should get only those status updates from B that are marked public. If A is a confirmed friend of B, A should get all status updates from B. Even if A is B's friend, A can also unfollow B's feed (the typical Facebook model?). So basically I need to check who A follows and grab their updates. However, while doing this I also need to check whether A has access to these status updates.
Is there an easy Cypher query to implement this? Or do you have a better model in mind? Assuming all updates are public, the following query should work. How would you add a privacy-setting dimension to it if there are friends-only posts too?
MATCH (me { name: 'Joe' })-[rels:FOLLOWS*0..1]-(anotherUser)
WITH anotherUser
MATCH (anotherUser)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)
RETURN anotherUser.name AS name, statusupdates.date AS date, statusupdates.text AS text
ORDER BY statusupdates.date DESC LIMIT 3

Yes, you can implement all of these requirements. It seems to me to boil down to a few extra, carefully chosen WHERE clauses.
Here's your base query, with modifications:
MATCH (me { name: 'Joe' })-[rels:FOLLOWS*0..1]-(anotherUser)
WITH anotherUser
MATCH (anotherUser)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)
WHERE statusupdates.visibility='PUBLIC'
RETURN anotherUser.name AS name, statusupdates.date AS date, statusupdates.text AS text
ORDER BY statusupdates.date DESC LIMIT 3
Here, I've just added a WHERE to check for visibility=PUBLIC (which I made up, because the sample app doesn't specify those things; that would have to be part of your model one way or another).
You might consider combining that query, via UNION, with another query intended to fetch the status updates from friends only (if it's a friend, then it doesn't matter what the visibility is). Both halves of a UNION must return the same column names, so this one returns the same three columns:
MATCH (me { name: 'Joe' })-[:FRIEND]-(friend)-[:STATUS|NEXT*1..]->(statusupdates)
RETURN friend.name AS name, statusupdates.date AS date, statusupdates.text AS text
ORDER BY statusupdates.date DESC LIMIT 3;
Instead of using UNION, you could also combine the two queries with an OPTIONAL MATCH clause on the second pattern. Either way, your query needs to get all status updates that are public posts from people you follow, posts from friends, or both. Conceptually it's easy to break that into those two separate cases and then UNION the two result sets together.
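The two-case split can be sketched in plain Python to show what the combined result should contain. Everything here is hypothetical scaffolding for illustration: updates are `(author, date, text, visibility)` tuples, `follows` and `friends` are dictionaries of sets, and the set union plays the role of Cypher's UNION (which also removes duplicates).

```python
def visible_updates(me, follows, friends, updates, limit=3):
    """Public updates from people I follow, plus all updates from my friends."""
    followed = follows.get(me, set())
    my_friends = friends.get(me, set())

    # Case 1: anyone I follow, but only their PUBLIC updates.
    public_from_followed = {u for u in updates
                            if u[0] in followed and u[3] == "PUBLIC"}
    # Case 2: confirmed friends, regardless of visibility.
    all_from_friends = {u for u in updates if u[0] in my_friends}

    merged = public_from_followed | all_from_friends  # union, duplicates removed
    return sorted(merged, key=lambda u: u[1], reverse=True)[:limit]
```

Note that sorting and limiting happen once, after the union; applying LIMIT inside each half separately could drop updates that belong in the merged top results.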

Neo4j suggestion on large scale

I need to implement a suggestion system for my project.
In this system we should recommend people based on some parameters like current city, education, friends of friends, etc.
I have designed this by creating (or updating) MAY_KNOW relationships whenever users edit their profile or become friends with someone, and I retrieve them with MATCH u-[r:MAY_KNOW]-x RETURN * ORDER BY r.weight so people can find the people most similar to them.
But I think this is not best practice, because soon the MAY_KNOW relationships from/to every user could reach millions, and scanning and sorting them will be costly.
Do you have a better idea?

Depends a bit on the data-structure, I assume there are relationships to cities, education facilities and friends. So you don't actually have MAY_KNOW relationships as those are only inferred?
Also it depends if you want to create a cross products between all your users (how many) and how you would want to filter out non-related people.
Perhaps check out this blog post from Max: http://maxdemarzi.com/2013/04/19/match-making-with-neo4j/
So something like this query might work (depending on the data volume I'd rewrite it in the Java API).
MATCH (p:Person {id:{user_id}})
MATCH (p)-[:LIVES_IN]->(:City)<-[:LIVES_IN]-(other)
MATCH (p)-[:GRADUATED]->(:School)<-[:GRADUATED]-(other)
MATCH (p)-[:KNOWS]->(:Person)<-[:KNOWS]-(other)
RETURN other
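The query above, as written, only returns people who match all three patterns at once. If a weighted ranking is wanted instead, as the question's r.weight suggests, one possible approach is to compute the score at query time from the shared cities, schools and friends rather than storing MAY_KNOW relationships. The Python sketch below illustrates that scoring on plain dictionaries; the `suggest` function, the equal weighting and the data layout are assumptions for this example only.

```python
from collections import Counter


def suggest(user, lives_in, graduated, knows, top=10):
    """Score other people by how many cities, schools and friends they share.

    Each argument maps a person to a set (of cities, schools, or people known).
    People the user already knows are left out of the suggestions.
    """
    scores = Counter()
    for relation in (lives_in, graduated, knows):
        mine = relation.get(user, set())
        for other, theirs in relation.items():
            shared = mine & theirs
            if other != user and shared:
                scores[other] += len(shared)  # one point per shared item
    already = knows.get(user, set())
    ranked = [(o, s) for o, s in scores.most_common() if o not in already]
    return ranked[:top]
```

Because nothing is materialized, there is no per-user set of inferred relationships to keep up to date; the cost moves to query time, which is why the answer suggests the Java API for larger volumes.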
