Is this concept applicable for a graph database? - neo4j

I have been reading about Graph databases and want to know if this type of structure is applicable to it:
Company > Has user Accounts > Accounts send out facebook posts (which are available to all users)
Up to here - I think this makes sense - yes it would be a good use of Graph. A post has a relationship to any accounts and you can find out the direction both ways - posts for a company and which posts were sent by which users or companies.
However
Users get added and deleted on a daily basis and I need a record store of how many there were at a given time
Accounts are getting results for each post (likes/friends) which I need to store on a daily basis
I need to find out how many likes a company received (on any given day)
I also need to find out how many likes a user received
I need to find out how many likes a user received per post
You would need to store Likes as a group and then date-value - can you even have "sub" properties?
I struggle at this point unless you are storing lots of date-value property lists per node. Is that the way you would do it? If I wanted to find out the latter 2 points, for example, would it be as efficient as an RDBMS?

Here is a very simple example of a Graph data model that seems to cover your stated use cases. (Since nodes can have multiple labels, all Company and User nodes are also Entity nodes -- to simplify the model.)
(:Company:Entity {id:100})-[:HAS_USER]->(:User:Entity {id: 200})
(:Entity)-[:SENT]->(:Post {date: 123, msg: "I like cats!"})
(:Entity)-[:LIKES {date: 234}]->(:Post)
Your use cases:
Users get added and deleted on a daily basis and I need a record store of how many there were at a given time.
How to count all users:
MATCH (u:User)
RETURN COUNT(*);
How to count a company's users:
MATCH (c:Company {id:100})-[:HAS_USER]->(u:User)
RETURN COUNT(*);
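The two counts above reflect the graph as it is now. For a count at a given time, one possible sketch is to keep deleted users in the graph and record validity on the relationship. The `since` and `until` properties below are hypothetical (they are not part of the model above), and 234 is just an example date value in the same style as the other dates:

```cypher
// Assumption: HAS_USER carries hypothetical `since` and `until` properties,
// and a "deleted" user keeps its node, with `until` set to the deletion date.
MATCH (c:Company {id: 100})-[r:HAS_USER]->(u:User)
WHERE r.since <= 234
  AND (r.until IS NULL OR r.until > 234)
RETURN count(u);
```

With this variation, the current user count is simply the same query restricted to relationships where `until` is not set.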
I need to find out how many likes a company received (on any given day)
MATCH (c:Company {id: 100})-[:SENT]->(p:Post)<-[:LIKES {date:234}]-()
RETURN COUNT(*)
I also need to find out how many likes a user received
MATCH (u:User {id:200})-[:SENT]->(p:Post)<-[:LIKES]-()
RETURN COUNT(*);
I need to find out how many likes a user received per post
MATCH (u:User {id:200})-[:SENT]->(p:Post)<-[:LIKES]-()
RETURN p, COUNT(*)
You would need to store Likes as a group and then date-value - can you even have "sub" properties?
You do not need to explicitly group likes by date (if that is what you mean). Such "groupings" can be easily obtained by the appropriate query (e.g., the query for the second use case above, which filters the LIKES relationships by their date property).
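As a rough illustration of such a grouping, the query below returns one row per date with the number of likes a company received on that date. It assumes the `date` property on LIKES holds a day-level value (as in the model above); if it held a full timestamp, it would first have to be truncated to a day.

```cypher
// Cypher groups implicitly by the non-aggregated column (l.date).
MATCH (c:Company {id: 100})-[:SENT]->(:Post)<-[l:LIKES]-()
RETURN l.date AS day, count(l) AS likes
ORDER BY day;
```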

Related

Find the movies the user hasn't watched that the other users he follows have watched

I have a database containing movies and users. Users can follow each other and rate movies. I have a user with the id 599. I want to find all the movies he hasn't watched that the other users he follows have watched. Here's what I tried so far. I do get a result and it works but the numbers don't really add up so I think there might be something off.
MATCH (u:user {id:599}),(m2:Movie), (u2:user),(u)-[r]->(u2),(u2)-->(m2)
WHERE not (u)-->(m2)
RETURN m2.title, m2.movieId,u2.id;
As @Inversefalcon stated, perhaps your query needs to be specific about the relationship types, in case there are multiple types of relationships between the same nodes.
For example (assuming your data model has the relationship types FOLLOWS and WATCHED):
MATCH (u:user {id:599})-[:FOLLOWS]->(u2:user)-[:WATCHED]->(m2:Movie)
WHERE not (u)-[:WATCHED]->(m2)
RETURN m2.title, m2.movieId, u2.id;
If this does not solve your issue, then please provide some sample data and what your expected number of results is.
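One possible reason for counts that look off (an assumption, since no sample data was given) is that the same movie appears once per followed user who watched it. A sketch that returns each movie once, together with how many followed users watched it, using the same assumed FOLLOWS and WATCHED relationship types:

```cypher
// One row per unwatched movie; count(DISTINCT u2) is the number of
// followed users who watched it.
MATCH (u:user {id: 599})-[:FOLLOWS]->(u2:user)-[:WATCHED]->(m2:Movie)
WHERE NOT (u)-[:WATCHED]->(m2)
RETURN m2.title, m2.movieId, count(DISTINCT u2) AS watchedByFollowed
ORDER BY watchedByFollowed DESC;
```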

Neo4j - complete a query with an alternative match if it finds few results

I am trying to write a query which looks for potential friends in a Neo4j db based on common friends and interests.
I don't want to post the whole query (part of school assignment), but this is the important part
MATCH (me:User {firstname: "Name"}), (me)-[:FRIEND]->(friend:User)<-[:FRIEND]-(potential:User), (me)-[:MEMBER]->(i:Interest)
WHERE NOT (potential)-[:FRIEND]->(me)
WITH COLLECT(DISTINCT potential) AS potentialFriends,
COLLECT(DISTINCT friend) AS friends,
COLLECT(i) as interests
UNWIND potentialFriends AS potential
/*
#HANDLING_FINDINGS
Here I count common friends, interests and try to find relationships between
potential friends too -- hence the collect/unwind
*/
RETURN potential,
commonFriends,
commonInterests,
(commonFriends+commonInterests) as totalPotential
ORDER BY totalPotential DESC
LIMIT 10
In the section #HANDLING_FINDINGS I use the found potential friends to find relationships between each other and calculate their potential (i.e. sum of shared friends and common interests) and then order them by potential.
The problem is that there might be users with no friends, to whom I would also like to recommend some friends.
My question - can I somehow insert a few random users into the "potential" findings if their count is below 10 so that everyone gets a recommendation?
I have tried something like this
...
UNWIND potentialFriends AS potential
CASE
WHEN (count(potential) < 10 )
...
But that produced an error as soon as it hit start of the CASE. I think that case can be used only as part of a command like return? (maybe just return)
Edit with 2nd related question:
I was already thinking of matching all users and then ranking them based on common friends/interests, but wouldn't searching through the whole DB be intensive?
A CASE expression can be used wherever a value is needed, but it cannot be used as a complete clause.
With respect to your main question, you can put a WITH clause like the following between your existing WITH and UNWIND clauses:
WITH friends, interests,
CASE WHEN SIZE(potentialFriends) < 10 THEN {randomFriends} ELSE potentialFriends END AS potentialFriends
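If the size of the potentialFriends collection is less than 10, the CASE expression assigns the value of the {randomFriends} parameter to potentialFriends; otherwise the collection is passed through unchanged. (The {randomFriends} form is the older parameter syntax; Neo4j 4.0 and later only accept $randomFriends.)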
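Note that the CASE above replaces the list rather than topping it up. If the goal is to pad a short list up to 10 entries, a variation along these lines may be closer to what was asked. This is a self-contained sketch in which plain integers stand in for user nodes, and `randomFriends` is a literal list rather than a parameter:

```cypher
// Stand-in data: integers play the role of user nodes.
WITH [1, 2, 3] AS potentialFriends, [3, 7, 8, 9] AS randomFriends
// Keep only the random entries that are not already candidates.
WITH potentialFriends,
     [r IN randomFriends WHERE NOT r IN potentialFriends] AS extras
// CASE is used as a value inside WITH, not as a clause of its own.
WITH CASE
       WHEN size(potentialFriends) < 10 THEN potentialFriends + extras
       ELSE potentialFriends
     END AS padded
RETURN padded[..10] AS potentialFriends;
```

The slice `[..10]` caps the padded list at 10 entries; the original candidates always come first, so they are never displaced by random ones.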
As for your second question, yes it would be expensive.

DB model for logging ongoing transactions

I am used to having tables for ongoing activities from my former life as a relational DB guy. I am wondering how I would store ongoing information like transactions, logs or whatever in a neo4j DB. Let's assume I have an account, which has been assigned to a user A:
(u:User {name:"A"})
I want to keep track of the transactions he does, e.g. deducting or adding a value:
(t:Transaction {value:"-20", date:timestamp()})
Would I create a new node for every transaction and assign it to the user:
(u) -[r:changeBalance]-> (t)
In the end I might have lots of nodes assigned to the user, each holding a single transaction, i.e. lots of nodes with only one piece of information each. I was wondering whether a query limited to the last 50 transactions (limit 50, sort by t.date) might still have to read all available transaction nodes to build the full sort order before the limit applies. That seems inefficient.
How would you model a list of actions in a neo4j DB? Any hint is much appreciated.
If you used a simple query like the following, you would NOT be reading every Transaction node in the database for each User.
MATCH (u:User)-[r:changeBalance]->(t:Transaction)
RETURN u, t
ORDER BY t.date;
You'd only be reading the Transaction nodes that are directly related to each User (via a changeBalance relationship; relationship types are case-sensitive, so the query uses the same spelling as your example). So, the performance would not be as bad as you are afraid it might be.
Although everything is fine with your query (you are reading only the transactions that are related to this specific user), this approach can be improved.
Let's imagine that your application runs for 5 years and you have a user who makes 10 transactions per day. That results in ~18250 transactions (5 x 365 x 10) connected to a single node.
This is not a great idea from a data model perspective. If you then want to filter the results (for example, get the 50 latest transactions) on some non-indexed field, the query has to traverse all 18250 nodes.
This can be solved by adding additional relationships to the database.
Currently you have a graph like this: (user)-[:HAS]->(transaction)
                 ( user )
               /    |     \
(transaction1) (transaction2) (transaction3)
You can add an additional relationship between transactions to specify the sequence of events,
like this: (transaction)-[:NEXT]->(transaction)
                 ( user )
               /    |     \
(transaction1)-(transaction2)-(transaction3)
Note: there is no need for an additional PREVIOUS relationship, because Neo4j stores relationship pointers in both directions, so traversing backwards is as fast as traversing forwards.
Also maintain relationships to the user's first and last transactions:
(user)-[:LAST_TRANSACTION]->(transaction)
(user)-[:FIRST_TRANSACTION]->(transaction)
This lets you reach the last transaction in 1 hop, and then the latest 50 by walking the NEXT chain backwards from it.
So, at the cost of some additional complexity, you can traverse and manipulate your data more efficiently.
This idea comes from the EventStore database (and similar event stores).
Moreover, with such a data model the user's balance can be aggregated by folding over the sequence of transactions. This can give you a nice and fast way to get the user's balance at any point.
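The price of this model is that every new transaction has to keep the chain and the LAST_TRANSACTION pointer up to date. A minimal sketch of appending one transaction, assuming the user already has at least one transaction and that `value` is stored as a number (the very first transaction would need separate handling that also creates FIRST_TRANSACTION):

```cypher
// Find the current tail of the chain for this user.
MATCH (user:User {id: 1})-[old:LAST_TRANSACTION]->(last:Transaction)
// Create the new transaction and link it after the old tail.
CREATE (t:Transaction {value: -20, date: timestamp()})
CREATE (last)-[:NEXT]->(t)
// Move the LAST_TRANSACTION pointer to the new node.
CREATE (user)-[:LAST_TRANSACTION]->(t)
DELETE old;
```

All of this runs as one statement, so the pointer and the chain change together in a single transaction.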
Getting the latest 50 transactions in this model can look like this:
MATCH (user:User {id: 1}) WITH user
MATCH (user)-[:LAST_TRANSACTION]->(last_transaction:Transaction) WITH last_transaction
// 0 hops is the last transaction itself, so 0..49 yields at most 50 nodes
MATCH (last_transaction)<-[:NEXT*0..49]-(transactions:Transaction)
RETURN transactions
Getting the total user balance can look like this:
MATCH (user:User {id: 1}) WITH user
MATCH (user)-[:FIRST_TRANSACTION]->(first_transaction:Transaction) WITH first_transaction
// *0.. includes first_transaction itself, so a single-transaction chain also works
MATCH (first_transaction)-[:NEXT*0..]->(transactions:Transaction)
RETURN sum(transactions.value)

Neo4j how to do a highscore system

I work in a company that makes a social game where our users can have friends and can make content that based on popularity shows up on highscores.
I am trying to find out whether we can move some of our data to a graph database like neo4j and one of the things I can't figure out is how to implement a highscore system in a graph database. I basically want to make queries like this:
Get list of movies/artbooks/photos content created by friends ordered by content with most likes.
Get list of movies/artbooks/photos content created by ALL USERS in the last 7 days ordered by content with most likes.
What kind of data modeling and queries should we do to implement this?
The data model I was planning was to have users as nodes and the content made by a user linked to the user as a list of connected content nodes, with the latest one linked to the user. But how do I get highscores into such a model?
Thanks.
Here is one possible model:
(f:User {name: "Fred"})-[:CREATED]->(c:Content {created: 2345, type: "Music"})
(m:User {name: "Mary"})-[:LIKES {score:5}]->(c:Content)
(f)-[:KNOWS]->(m);
To get the content created by all Users since a specific timestamp, in descending order by the number of likes, you can use the following query. The OPTIONAL MATCH is used to avoid filtering out Content with no likes.
MATCH (c:Content)
WHERE c.created > 1234
OPTIONAL MATCH ()-[l:LIKES]->(c)
RETURN c, COUNT(l) AS num_likes
ORDER BY num_likes DESC;
Here is a console that illustrates this.
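The other query from the question (content created by friends, ordered by likes) can be sketched along the same lines. This assumes that the KNOWS relationship in the model above represents friendship and that its direction does not matter; both are assumptions, not something the model states.

```cypher
// Content created by Fred's friends, most liked first.
MATCH (me:User {name: "Fred"})-[:KNOWS]-(friend:User)-[:CREATED]->(c:Content)
// DISTINCT avoids counting likes twice when a friend is reachable via
// KNOWS relationships in both directions.
WITH DISTINCT c
OPTIONAL MATCH ()-[l:LIKES]->(c)
RETURN c, count(l) AS num_likes
ORDER BY num_likes DESC;
```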

Neo4j user status privacy setting model

I am following this article to setup a system where users can follow each other and also become friends : http://neo4j.com/docs/stable/cypher-cookbook-newsfeed.html
A user (A) can friend another user (B), and hence A automatically follows B. User A can also follow B without adding B as a friend. Hence a distinction should be made in the feed results. If A and B are not confirmed friends, A should only get status updates from B that are marked public. If A is a confirmed friend of B, A should get all status updates from B. Even if A is B's friend, A can also unfollow B's feed (the typical Facebook model?). So basically I need to check who A follows and grab their updates. However, while doing this I also need to check whether A has access to these status updates.
Is there an easy cypher to implement this? Or do you have a better model in mind? Assuming all updates are public following query should work. How would you add privacy setting dimension to it if there are friends only posts too?
MATCH (me { name: 'Joe' })-[rels:FOLLOWS*0..1]-(anotherUser)
WITH anotherUser
MATCH (anotherUser)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)
RETURN anotherUser.name AS name, statusupdates.date AS date, statusupdates.text AS text
ORDER BY statusupdates.date DESC LIMIT 3
Yes, you can implement all of these requirements. It seems to me to boil down to a few extra, carefully chosen WHERE clauses.
Here's your base query, with modifications:
MATCH (me { name: 'Joe' })-[rels:FOLLOWS*0..1]-(anotherUser)
WITH anotherUser
MATCH (anotherUser)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)
WHERE statusupdates.visibility='PUBLIC'
RETURN anotherUser.name AS name, statusupdates.date AS date, statusupdates.text AS text
ORDER BY statusupdates.date DESC LIMIT 3
Here, I've just added a WHERE to check for visibility=PUBLIC (which I made up, because the sample app doesn't specify those things; that would have to be part of your model one way or another).
You might consider doing that query along with a UNION to another query, which would be intended to fetch only those status updates from friends. (If it's a friend, then it doesn't matter what the visibility is)
MATCH (me { name: 'Joe' })-[:FRIEND]-(friend)-[:STATUS|NEXT*1..]->(statusupdates)
RETURN statusupdates
ORDER BY statusupdates.date DESC LIMIT 3;
Instead of using UNION you could also combine the two queries with an OPTIONAL MATCH clause on the second pattern. Either way, your query needs to get all status updates that are public posts from people you follow, posts from friends, or both. So conceptually it's easy to break that into those two separate cases and then UNION the two result sets together. Note that both parts of a UNION must return the same column names, so the friends query would need to return name, date and text as well.
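A third option, different from both UNION and OPTIONAL MATCH, is to fold the two cases into a single WHERE clause with a pattern predicate. The sketch below reuses the made-up `visibility` property and assumes FRIEND and FOLLOWS relationships as described in the question. It matches only followed users (one FOLLOWS hop, not `*0..1`), so Joe's own updates are not included:

```cypher
MATCH (me { name: 'Joe' })-[:FOLLOWS]->(anotherUser)
MATCH (anotherUser)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)
// Keep an update if it is public, or if its author is a friend of Joe.
WHERE statusupdates.visibility = 'PUBLIC'
   OR (me)-[:FRIEND]-(anotherUser)
RETURN anotherUser.name AS name, statusupdates.date AS date, statusupdates.text AS text
ORDER BY statusupdates.date DESC LIMIT 3;
```

Because the first MATCH requires a FOLLOWS relationship, a friend whose feed Joe has unfollowed drops out automatically.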
