How to make transactions running a particular query execute sequentially in Neo4j? - neo4j

I'm working on an app where users can un-bookmark a post they've bookmarked before. But I realized that if multiple requests is sent by a particular user to un-bookmark a post they've bookmarked before, node properties get set multiple times. For example, if user 1 bookmarked post 1, their noOfBookmarks (both user and post) will increase by 1 and when they un-bookmark, their noOfBookmarks will decrease by 1. But sometimes during concurrent requests, I get incorrect or negative noOfBookmarks depending on the number of requests. I'm using MATCH which will return 0 rows when the pattern can't be found.
I think the problem is because of the isolation level neo4j is using. During concurrent requests, the changes made by the first query to run will not be visible to other transactions until the first transaction is committed. So the MATCH is still returning rows, that's why I'm getting invalid properties. I think what I need is for transactions to be executed sequentially or get an exclusive read lock.
I've tried setting a property on the user and post node (before MATCHing the bookmark relationship) which will make the first transaction get a write lock on those nodes. I thought other transactions will wait at this point for the write lock to be released before continuing but it didn't work.
How do I ensure the first transaction during concurrent requests modify the graph and other transactions stop at that MATCH (which is the behaviour during sequential requests)?
This is my cypher query:
MATCH (user:User { id: $userId })
MATCH (post:Post { id: $postId })
WITH user, post
MATCH (user)-[bookmarkRel:BOOKMARKED_POST]->(post)
WITH bookmarkRel, user, post
DELETE bookmarkRel
WITH post, user
SET post.noOfBookmarks = post.noOfBookmarks - 1,
user.noOfBookmarks = user.noOfBookmarks - 1
RETURN post { .* }
Thank you

Related

How to create random relationships between nodes?

I'm trying to create random transaction between bank accounts. I have created the following query:
//Create transactions
CALL apoc.periodic.iterate("
match (a:BANK_ACCOUNT)
WITH apoc.coll.randomItem(collect(a)) as sender
return sender", "
MATCH (b:BANK_ACCOUNT)
WHERE NOT sender = b
WITH apoc.coll.randomItem(collect(b)) as receiver
MERGE (sender)-[r:HAS_TRANSFERED {time: datetime()}]->(receiver)
set r.amount = rand()*1000",
{batchSize:100, parallel:false});
I would assume that it would create 100 random transactions between random bank accounts. Instead it creates 1 new bank account and 1 new relationship. What am I doing wrong and what should I do?
Thanks for your help !
The following query uses apoc.coll.randomItems to get 200 different random accounts at one time (which is much faster than getting one random account 200 times):
MATCH (ba:BankAccount)
WITH apoc.coll.randomItems(COLLECT(ba), 200) AS accts
WHERE SIZE(accts) > 1
UNWIND RANGE(0, SIZE(accts)/2*2-1, 2) AS i
WITH accts[i] AS sender, accts[i+1] AS receiver
CREATE (sender)-[:TRANSFERED_TO {time: datetime()}]->(receiver)
Notes:
This query uses CREATE instead of MERGE because it is unlikely that a TRANSFERED_TO relationship already exists with the current time as the time value. (You can choose to use MERGE anyway, if duplication is still possible.)
The WHERE SIZE(accts) > 1 test avoids errors when there are not at least 2 accounts.
SIZE(accts)/2*2-1 calculation prevents the RANGE function from generating a list index (i) that exceeds the last valid index for a sender account.

Is Neo4j newsfeed flawed?

I am using this example, http://neo4j.com/docs/stable/cypher-cookbook-newsfeed.html, to maintain newsfeeds for my users. So I use the following to post a status update:
MATCH (me)
WHERE me.name='Bob'
OPTIONAL MATCH (me)-[r:STATUS]-(secondlatestupdate)
DELETE r
CREATE (me)-[:STATUS]->(latest_update { text:'Status',date:123 })
WITH latest_update, collect(secondlatestupdate) AS seconds
FOREACH (x IN seconds | CREATE (latest_update)-[:NEXT]->(x))
RETURN latest_update.text AS new_status
I encountered a severe flaw in this and don't know how to fix it. In a very rare scenario where two status updates are posted at the exactly same time (ex. 10ms apart), instead of replacing the current status, Neo4j creates two status updates. This leads to a much bigger problem where, the next updates are posted twice!
This looks like a race condition. To resolve that you basically need to make sure that at a given time only one transaction is modifying the status for this specific user.
Neo4j's Java API does have the ability to set locks to achieve this. Cypher doesn't have an explicit feature for this but you can e.g. remove a non-existing property to force a lock on the given node. With a lock in place concurrent transaction need to wait this the holder of the lock is finished with his transaction.
So grab a lock early in your statement:
MATCH (me)
WHERE me.name='Bob'
REMOVE me._not_existing // side effect: grab a lock early
WITH me
OPTIONAL MATCH (me)-[r:STATUS]-(secondlatestupdate)
DELETE r
CREATE (me)-[:STATUS]->(latest_update { text:'Status',date:123 })
WITH latest_update, collect(secondlatestupdate) AS seconds
FOREACH (x IN seconds | CREATE (latest_update)-[:NEXT]->(x))
RETURN latest_update.text AS new_status

Returning like count and whether current user liked a post in Cypher

I have a possibly bone-headed question, but I'm just starting out with Neo4j, and I hope someone can help me out with learning Cypher syntax, which I've just started learning and evaluating.
I have two User nodes, and a single NewsPost node. Both users LIKE the NewsPost. I'm able to construct a Cypher query to count the likes for the post, but I'm wondering if it's also possible to check if the current user has liked the post in the same query.
What I have so far for a Cypher query is
match (p:NewsPost)<-[r:LIKES]-(u:User)
where id(p) = 1
return p, count(*)
Which returns the post and like count, but I can't figure out the other part of "has the current user liked this post". I know you're not supposed to filter on <id>, but I learned that after the fact and I'll go back and fix it later.
So first, is it possible to answer the "has the current user liked this post" question in the same query? And if so, how do I modify my query to do that?
The smallest change to your query that adds a true/false test for a particular user liking the news post would be
MATCH (p:NewsPost)<-[r:LIKES]-(u:User)
WHERE ID(p) = 1
RETURN p, count(r), 0 < size(p<-[:LIKES]-(:User {email:"michael#nero.com"}))
This returns, in addition to your query, the comparison of 0 being less than the size of the path from the news post node via an incoming likes relationship to a user node with email address michael#nero.com. If there is no such path you get false, if there is one or more such paths you get true.
If that does what you want you can go ahead and change the query a little, for instance use RETURN ... AS ... to get nicer result identifiers, and so on.
What you are looking for is Case.
In your database you should have something unique for each user (id property, email or maybe login, I don't know), so you have to match this user, and then match the relation to the post you want, using case you can return a boolean.
Example:
Optional Match (u:User{login:"Michael"})-[r:LIKES]-(p:newPost{id:1})
return CASE WHEN r IS NULL THEN false ELSE true END as userLikesTopic
If you want to get the relation directly (to get a property in it as example) you can remove the CASE part and directly return r, if it does not exist, null will be returned from the query.

Return all users given user chats with and the latest message in conversation

my relationships look like this
A-[:CHATS_WITH]->B - denotes that the user have sent at least 1 mesg to the other user
then messages
A-[:FROM]->message-[:SENT_TO]->B
and vice versa
B-[:FROM]->message-[:SENT_TO]->A
and so on
now i would like to select all users a given user chats with together with the latest message between the two.
for now i have managed to get all messages between two users with this query
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH me-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-other
RETURN other,m ORDER BY m.timestamp DESC
how can I return just the latest message for each conversation?
Taking what you already have do you just want to tag LIMIT 1 to the end of the query?
The preferential way in a graph store is to manually manage a linked list to model the interaction stream in which case you'd just select the head or tail of the list. This is because you are playing to the graphs strengths (traversal) rather than reading data out of every Message node.
EDIT - Last message to each distinct contact.
I think you'll have to collect all the messages into an ordered collection and then return the head, but this sounds like it get get very slow if you have many friends/messages.
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH me-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-other
WITH other, m
ORDER BY m.timestamp DESC
RETURN other, HEAD(COLLECT(m))
See: Neo Linked Lists and Neo Modelling a Newsfeed.

Modelling a forum with Neo4j

I need to model a forum with Neo4j. I have "forums" nodes which have messages and, optionally, these messages have replies: forum-->message-->reply
The cypher query I am using to retrieve the messages of a forum and their replies is:
start forum=node({forumId}) match forum-[*1..]->msg
where (msg.parent=0 and msg.ts<={ts} or msg.parent<>0)
return msg ORDER BY msg.ts DESC limit 10
This query retrieves the messages with time<=ts and all their replies (a message has parent=0 and a reply has parent<>0)
My problem is that I need to retrieve pages of 10 messages (limit 10) independently of the number or replies.
For example, if I had 20 messages and the first one with 100 replies, it would only return 10 rows: the first message and 9 replies but I need the first 10 messages and the 100 replies of the first one.
How can I limit the result based on the number of messages and not their replies?
The ts property is indexed, but is this query efficient when mixing it with other where clauses?
Do you know a better way to model this kind of forum with Neo?
Supposing you switch to labels and avoid IDs (as they can be recycled and therefore are not stable identifiers):
MATCH (forum:FORUM)<--(message:MESSAGE {parent:0})
WHERE forum.name = '%s' // where %s identifies the forum in a *stable* way
WITH message // using a subquery allows to apply LIMIT only to main messages
ORDER BY message.ts DESC
LIMIT 10
OPTIONAL MATCH (message)<-[:REPLIES_TO]-(replies)
RETURN message, replies
The only important change here is to split the reply and message matching in two sub-queries, so that the LIMIT clause applies to the first subquery only.
However, you need to link the relevant replies to the matched main messages in the second subquery (I introduced a fictional relationship REPLIES_TO to link replies to messages).
And when you need to fetch page 2,3,4 etc.
You need an extra parameter (which the biggest message timestamp of the previous page, let's say previous_timestamp).
The first sub-query WHERE clause becomes:
WHERE forum.name = '%s' AND message.ts > previous_timestamp

Resources