Modelling a forum with Neo4j

I need to model a forum with Neo4j. I have "forums" nodes which have messages and, optionally, these messages have replies: forum-->message-->reply
The cypher query I am using to retrieve the messages of a forum and their replies is:
start forum=node({forumId}) match forum-[*1..]->msg
where (msg.parent=0 and msg.ts<={ts} or msg.parent<>0)
return msg ORDER BY msg.ts DESC limit 10
This query retrieves the messages with time<=ts and all their replies (a message has parent=0 and a reply has parent<>0)
My problem is that I need to retrieve pages of 10 messages (LIMIT 10) independently of the number of replies.
For example, if I had 20 messages and the first one with 100 replies, it would only return 10 rows: the first message and 9 replies but I need the first 10 messages and the 100 replies of the first one.
How can I limit the result based on the number of messages and not their replies?
The ts property is indexed, but is this query efficient when mixing it with other where clauses?
Do you know a better way to model this kind of forum with Neo?

Supposing you switch to labels and avoid IDs (as they can be recycled and therefore are not stable identifiers):
MATCH (forum:FORUM)<--(message:MESSAGE {parent:0})
WHERE forum.name = '%s' // where %s identifies the forum in a *stable* way
WITH message // splitting the query with WITH lets LIMIT apply only to the top-level messages
ORDER BY message.ts DESC
LIMIT 10
OPTIONAL MATCH (message)<-[:REPLIES_TO]-(replies)
RETURN message, replies
The only important change here is to split the reply and message matching in two sub-queries, so that the LIMIT clause applies to the first subquery only.
However, you need to link the relevant replies to the matched main messages in the second subquery (I introduced a fictional relationship REPLIES_TO to link replies to messages).
And when you need to fetch page 2, 3, 4, etc.,
you need an extra parameter: the smallest (i.e. oldest) message timestamp of the previous page, let's say previous_timestamp, since results are ordered by descending timestamp.
The first sub-query's WHERE clause becomes:
WHERE forum.name = '%s' AND message.ts < previous_timestamp
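As an illustration of the paging logic (plain Python, not Neo4j code), here is a minimal keyset-pagination sketch; the fetch_page helper and the message dicts are invented for this example:

```python
# Keyset ("seek") pagination: each page is bounded by the oldest timestamp
# of the previous page instead of using SKIP/OFFSET, mirroring the
# previous_timestamp parameter in the Cypher above.

def fetch_page(messages, page_size, previous_timestamp=None):
    """Return the next page of top-level messages, newest first.

    messages: list of dicts with a 'ts' key, in any order.
    previous_timestamp: oldest 'ts' seen on the prior page, or None
    for the first page.
    """
    ordered = sorted(messages, key=lambda m: m["ts"], reverse=True)
    if previous_timestamp is not None:
        # keep only messages strictly older than the previous page's tail
        ordered = [m for m in ordered if m["ts"] < previous_timestamp]
    return ordered[:page_size]

msgs = [{"id": i, "ts": i * 10} for i in range(1, 26)]  # ts 10..250

page1 = fetch_page(msgs, 10)
page2 = fetch_page(msgs, 10, previous_timestamp=page1[-1]["ts"])
```

The advantage of keyset pagination over SKIP/LIMIT is that each page is anchored to a timestamp, so pages stay stable even when new messages arrive between fetches.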

Related

How to make transactions running a particular query execute sequentially in Neo4j?

I'm working on an app where users can un-bookmark a post they've bookmarked before. But I realized that if multiple requests are sent by a particular user to un-bookmark the same post, node properties get set multiple times. For example, if user 1 bookmarks post 1, noOfBookmarks (on both the user and the post) increases by 1, and when they un-bookmark it, noOfBookmarks decreases by 1. But during concurrent requests I sometimes get incorrect or negative noOfBookmarks, depending on the number of requests. I'm using MATCH, which returns 0 rows when the pattern can't be found.
I think the problem is the isolation level Neo4j uses. During concurrent requests, the changes made by the first query to run are not visible to other transactions until the first transaction is committed, so the MATCH still returns rows, which is why I'm getting invalid properties. I think what I need is for the transactions to execute sequentially, or to get an exclusive read lock.
I've tried setting a property on the user and post node (before MATCHing the bookmark relationship) which will make the first transaction get a write lock on those nodes. I thought other transactions will wait at this point for the write lock to be released before continuing but it didn't work.
How do I ensure that, during concurrent requests, the first transaction modifies the graph and the other transactions stop at that MATCH (which is the behaviour during sequential requests)?
This is my cypher query:
MATCH (user:User { id: $userId })
MATCH (post:Post { id: $postId })
WITH user, post
MATCH (user)-[bookmarkRel:BOOKMARKED_POST]->(post)
WITH bookmarkRel, user, post
DELETE bookmarkRel
WITH post, user
SET post.noOfBookmarks = post.noOfBookmarks - 1,
user.noOfBookmarks = user.noOfBookmarks - 1
RETURN post { .* }
Thank you
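The failure described above is a classic check-then-act race. As a language-neutral illustration (plain Python, not the Neo4j API), here is a sketch where a lock makes the existence check and the decrement atomic; in Neo4j, one common approach is to take an explicit write lock on the user and post nodes (for example via apoc.lock.nodes) before MATCHing the bookmark relationship:

```python
# Two concurrent un-bookmark requests can both see the bookmark, both
# "delete" it, and both decrement the counter. Guarding the check and the
# decrement with one lock makes them a single atomic step.
import threading

class BookmarkStore:
    def __init__(self):
        self.bookmarked = True      # user has bookmarked the post
        self.no_of_bookmarks = 1
        self._lock = threading.Lock()

    def unbookmark(self):
        # Without the lock, two threads could both pass the `if` check
        # before either sets bookmarked = False, decrementing twice.
        with self._lock:
            if not self.bookmarked:   # the "MATCH returned 0 rows" case
                return
            self.bookmarked = False
            self.no_of_bookmarks -= 1

store = BookmarkStore()
threads = [threading.Thread(target=store.unbookmark) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place, only the first request performs the delete and decrement; the other four hit the early return, so the counter never goes negative.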

How to get max value in cypher subquery

I've read the questions about subqueries but I'm still stuck on this use case.
I have Documents that contain one or more keywords and each document has linked user comments with a status property. I want to get just the most recent status (if it exists) returned for each document in the query. If I run a query like the following, I just get one row.
MATCH (d:Document)-[:Keys]->(k:Keywords)
WITH d,k
OPTIONAL MATCH (d)--(c:Comments)
ORDER BY c.created DESC LIMIT 1
RETURN d.Title as Title, k.Word as Keyword, c.Status as Status
I have hundreds of documents I want to return with the latest status like:
Title                   | Keyword | Status
War in the 19th Century | WWI     | Reviewed
War in the 19th Century | Weapons | Reviewed
The Great War           | WWI     | Pending
World War I             | WWI     | <null>
I have tried multiple queries using WITH clause but no luck yet. Any suggestions would be appreciated.
This query should do what you probably intended:
MATCH (d:Document)-[:Keys]->(k:Keywords)
OPTIONAL MATCH (d)--(c:Comments)
WITH d, COLLECT(k.Word) AS Keywords, c
ORDER BY c.created DESC
RETURN d.Title AS Title, Keywords, COLLECT(c)[0].Status AS Status
(COLLECT() skips nulls, so a document with no comments still appears, with a null Status.)
Since a comment is related to a document, and not a document/keyword pair, it makes more sense to return a collection of keywords for each Title/Status pair. Your original query, if it had worked, would have returned the same Title/Status pair multiple times, each time with a different keyword.
We've got a knowledge base article explaining how to limit results of a match per-row, that should give you a few good options.
EDIT
Here's a full example, using apoc.cypher.run() to perform the limited subquery.
MATCH (d:Document)-[:Keys]->(k:Keywords)
WITH d, COLLECT(k.Word) AS Keywords
// collect keywords first so each document is on a single row
CALL apoc.cypher.run('
WITH $d AS d
OPTIONAL MATCH (d)--(c:Comments)
RETURN c
ORDER BY c.created DESC
LIMIT 1
', {d:d}) YIELD value
RETURN d.Title AS Title, Keywords, value.c.Status AS Status
(The passed-in d is only visible as the $d parameter inside the query string, so it is re-bound with WITH before being used in the pattern.)
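For clarity, here is the same latest-status-per-document logic sketched in plain Python (the sample rows mirror the example table above; field names are illustrative):

```python
# Group comment rows by document, sort each group newest-first, and take
# the head -- or None when a document has no comments at all.
from collections import defaultdict

documents = ["War in the 19th Century", "The Great War", "World War I"]
comments = [  # (document title, created, status)
    ("War in the 19th Century", 5, "Draft"),
    ("War in the 19th Century", 9, "Reviewed"),
    ("The Great War", 3, "Pending"),
]

by_doc = defaultdict(list)
for title, created, status in comments:
    by_doc[title].append((created, status))

latest_status = {}
for title in documents:
    group = sorted(by_doc[title], reverse=True)  # newest first
    latest_status[title] = group[0][1] if group else None
```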

Neo4j Cypher performance on simple match

I have a very simple Cypher query that gives me poor performance.
I have approximately 2 million users and 60 book categories, with around 28 million READ relationships from users to categories.
When I run this query:
MATCH (u:User)-[read:READ]->(bc:BookCategory)
WHERE read.timestamp >= timestamp() - (1000*60*60*24*30)
RETURN DISTINCT bc.id;
It returns 8.5k rows in 2 to 2.5 minutes (first run).
And when I do this cypher:
MATCH (u:User)-[read:READ]->(bc:BookCategory)
WHERE read.timestamp >= timestamp() - (1000*60*60*24*30)
RETURN u.id, u.email, read.timestamp;
It returns 55k rows in 3 to 6 minutes (first run).
I already have indexes on User id and email, but I still don't think this performance is acceptable. Any idea how I can improve it?
First of all, you can PROFILE your query to find out what happens under the hood.
It currently looks like the query scans all nodes in the database to complete.
Reasons:
Neo4j supports indexes only for the '=' operator (or 'IN'), so a range predicate such as '>=' cannot use them (later Neo4j versions added native range support for indexed properties)
To complete the query, it has to traverse the relationships one by one, checking each timestamp
There is no straightforward way to deal with this problem.
You should look into creating proper graph structure, to deal with Time-specific queries more efficiently. There are several ways how to represent time in graph databases.
You can take a look at the graphaware/neo4j-timetree library.
Can you explain your model a bit?
Where are the books and the "reading"-Event in it?
Afaik, all you want to know is which book categories have been read recently (in the last month)?
You could create a second relationship type, RECENTLY_READ, which is expired (deleted) by a batch job when it is older than 30 days. (That can be two simple Cypher statements: one that creates those relationships and one that deletes them.)
WITH (1000*60*60*24*30) as month
MATCH (a:User)-[read:READ]->(b:BookCategory)
WHERE read.timestamp >= timestamp() - month
MERGE (a)-[rr:RECENTLY_READ]->(b)
// MERGE cannot take a WHERE clause; a CASE expression keeps the newest timestamp instead
SET rr.timestamp = CASE WHEN coalesce(rr.timestamp, 0) < read.timestamp
                        THEN read.timestamp ELSE rr.timestamp END;
WITH (1000*60*60*24*30) as month
MATCH (a:User)-[rr:RECENTLY_READ]->(b:BookCategory)
WHERE rr.timestamp < timestamp() - month
DELETE rr;
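To make the intent of those two statements concrete, here is the same refresh-and-expire logic sketched in plain Python (all names are illustrative; in practice the two Cypher statements above would run as a periodic batch job):

```python
# Step 1 refreshes RECENTLY_READ timestamps from READ events within the
# last 30 days; step 2 expires entries older than 30 days.

MONTH_MS = 1000 * 60 * 60 * 24 * 30  # 30 days in milliseconds

def refresh_and_expire(reads, recently_read, now_ms):
    """reads: dict (user, category) -> latest READ timestamp.
    recently_read: dict (user, category) -> RECENTLY_READ timestamp,
    mutated in place and returned."""
    # Step 1: MERGE, keeping only the newest timestamp
    for pair, ts in reads.items():
        if ts >= now_ms - MONTH_MS and recently_read.get(pair, 0) < ts:
            recently_read[pair] = ts
    # Step 2: DELETE expired relationships
    expired = [p for p, ts in recently_read.items() if ts < now_ms - MONTH_MS]
    for pair in expired:
        del recently_read[pair]
    return recently_read

now = 100 * MONTH_MS
rels = refresh_and_expire(
    {("u1", "scifi"): now - 10,            # fresh READ: gets refreshed
     ("u2", "history"): now - 2 * MONTH_MS},  # stale READ: ignored
    {("u3", "poetry"): now - 3 * MONTH_MS},   # old entry: expired
    now,
)
```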
There is another way to achieve exactly what you want here, but unfortunately it is not possible in Cypher.
With a relationship-index on timestamp on your read relationship you can run a Lucene-NumericRangeQuery in Neo4j's Java API.
But I wouldn't really recommend to go down this route.

Return all users given user chats with and the latest message in conversation

my relationships look like this
A-[:CHATS_WITH]->B - denotes that the user has sent at least 1 message to the other user
then messages
A-[:FROM]->message-[:SENT_TO]->B
and vice versa
B-[:FROM]->message-[:SENT_TO]->A
and so on
Now I would like to select all the users a given user chats with, together with the latest message between the two.
For now I have managed to get all messages between two users with this query:
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH me-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-other
RETURN other,m ORDER BY m.timestamp DESC
How can I return just the latest message for each conversation?
Taking what you already have, do you just want to tack LIMIT 1 onto the end of the query?
The preferential way in a graph store is to manually manage a linked list to model the interaction stream, in which case you'd just select the head or tail of the list. This plays to the graph's strengths (traversal) rather than reading data out of every message node.
EDIT - Last message to each distinct contact.
I think you'll have to collect all the messages into an ordered collection and then return the head, but this sounds like it could get very slow if you have many friends/messages.
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH (me)-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-(other)
WITH other, m
ORDER BY m.timestamp DESC
RETURN other, HEAD(COLLECT(m))
See: Neo Linked Lists and Neo Modelling a Newsfeed.
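To make the HEAD(COLLECT(m)) idea concrete, here is the same newest-message-per-contact logic sketched in plain Python (field names are illustrative):

```python
# Sort all messages newest-first, then keep only the first one seen per
# contact -- the equivalent of ORDER BY ... DESC + HEAD(COLLECT(m)).

def latest_message_per_contact(messages):
    """messages: list of dicts with 'other' (contact) and 'timestamp'."""
    latest = {}
    for m in sorted(messages, key=lambda m: m["timestamp"], reverse=True):
        latest.setdefault(m["other"], m)  # first seen = newest for contact
    return latest

msgs = [
    {"other": "alice", "timestamp": 5, "text": "hi"},
    {"other": "alice", "timestamp": 9, "text": "bye"},
    {"other": "bob", "timestamp": 7, "text": "yo"},
]
latest = latest_message_per_contact(msgs)
```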

getting random rows with yql?

I want to use javascript to fetch data with yql from flickr,
e.g.
select id from flickr.photos.search(10) where text = 'music' and license=4
however, I would like to fetch 10 random rows rather than the latest, since the latest tend to be 10 photos all from the same person.
Is that possible in YQL itself (I suspect not),
or are there any workarounds that could achieve the same effect?
(It does not have to be completely random; the main thing I want to avoid is getting 10 photos from the same poster.)
To get only results from unique owners, you can use the unique() function (docs).
My suggestion would be to query for a larger result set (more likely to have 10 unique people) then call unique() followed by truncate() to limit to 10 results, as below.
select id from flickr.photos.search(100) where text = 'music' and
license=4 | unique(field="owner") | truncate(count=10)
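For clarity, here is the over-fetch, deduplicate-by-owner, then truncate pipeline sketched in plain Python (field names mirror the YQL fields and are otherwise illustrative):

```python
# Over-fetch a large result set, keep at most one photo per owner, then
# cut the list down to the requested count -- the unique() | truncate()
# pipeline from the YQL query above.

def unique_by_owner(photos, count=10):
    seen, result = set(), []
    for p in photos:  # photos arrive newest-first, as from the search
        if p["owner"] not in seen:
            seen.add(p["owner"])
            result.append(p)
        if len(result) == count:
            break
    return result

# 100 photos but only 4 distinct owners, so the result caps out at 4
photos = [{"id": i, "owner": f"user{i % 4}"} for i in range(100)]
picks = unique_by_owner(photos, count=10)
```

Note that if the over-fetched set contains fewer than 10 distinct owners, the result will be shorter than 10, which is why querying a larger pool (100 here) helps.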
