Linked list query takes too long - neo4j

I have build a linked list model using neo4j. Here there is a representation:
A user has a list of Events and each one has two attributes: date and done. Given a particular time, I would like to set all previous events' done attribute to true.
My current query is this one:
MATCH (user:User {id: {myId} })-[rel:PREV*]->(event:Event {done:false})
WHERE event.date <= {eventTime}
SET event.done = true;
This query takes 12 sec when the list has 500 events and I would like to make it faster. One possibility would be to stop the query once it finds an event which is already done, but I don't know how to do it.

You can use shortestPath for this and it will be much faster. In general, you should never use [:REL_TYPE*] because it does an exhaustive search for every path of any length between the nodes.
I created your data:
CREATE (:User {id:1})-[:PREV]->(:Event {id:1, date:1450806880004, done:false})-[:PREV]->(:Event {id:2, date:1450806880003, done:false})-[:PREV]->(:Event {id:3, date:1450806880002, done:true})-[:PREV]->(:Event {id:4, date:1450806880002, done:true});
Then, the following query will find all previous Event nodes in a particular User's linked list where done=false and the date is less than or equal to, say, 1450806880005.
MATCH p = shortestPath((u:User)-[:PREV*]->(e:Event))
WHERE u.id = 1 AND
e.done = FALSE AND
e.date <= 1450806880005
RETURN p;
This yields:
p
[(6:User {id:1}), (6)-[6:PREV]->(7), (7:Event {date:1450806880004, done:false, id:1})]
[(6:User {id:1}), (6)-[6:PREV]->(7), (7:Event {date:1450806880004, done:false, id:1}), (7)-[7:PREV]->(8), (8:Event {date:1450806880003, done:false, id:2})]
So you can see it's returning two paths, one that terminates at Event with id=1 and another that terminates at Event with id=2.
Then you can do something like this:
MATCH p = shortestPath((u:User)-[:PREV*]->(e:Event))
WHERE u.id = 1 AND e.done = FALSE AND e.date <= 1450806880005
FOREACH (event IN TAIL(NODES(p)) | SET event.done = TRUE)
RETURN p;
I'm using TAIL here because it grabs all the nodes except for the first one (since we don't want to update this property for the User node). Now all of the done properties have been updated on the Event nodes:
p
[(6:User {id:1}), (6)-[6:PREV]->(7), (7:Event {date:1450806880004, done:true, id:1})]
[(6:User {id:1}), (6)-[6:PREV]->(7), (7:Event {date:1450806880004, done:true, id:1}), (7)-[7:PREV]->(8), (8:Event {date:1450806880003, done:true, id:2})]
EDIT: And don't forget the super fun bug where the shortestPath function silently sets the maximum hop limit to 15 in Neo4j < 2.3.0. See
ShortestPath doesn't find any path without max hops limit
Find all events between 2 dates
So if you're on Neo4j < 2.3.0, you'll want to do:
MATCH p = shortestPath((u:User)-[:PREV*..1000000000]->(e:Event))
WHERE u.id = 1 AND e.done = FALSE AND e.date <= 1450806880005
FOREACH (event IN TAIL(NODES(p)) | SET event.done = TRUE)
RETURN p;

Your question is fairly vague with performance issues and targets, but one critical thing for performance is creating an index on properties you are examining. In your case, that would mean creating an index on both the done property and the date property:
CREATE INDEX ON :Event(done)
CREATE INDEX ON :Event(date)
Additionally, your query retrieves all events in the entire history of a user, as seen in:
-[rel:PREV*]->
You could cap the depth, such as
-[rel:PREV*..20]->
to prevent complete traversal. That might not give you the outcome you're looking for, but it would prevent long-running queries if you have an extreme number of nodes in your linked list (you haven't specified how large that list could get, so I have no idea if this will actually help).

Related

Filtering path with variable length multiple relationships (People You May Know query)

So let's say we have User nodes, Company nodes, Project nodes, School nodes and Event nodes. And there are the following relationships between these nodes
(User)-[:WORKED_AT {start: timestamp, end:timestamp}]->(Company)
(User)-[:COLLABORATED_ON]->(Project)
(Company)-[:COLLABORATED_ON]->(Project)
(User)-[:IS_ATTENDING]->(Event)
(User)-[:STUDIED_AT]->(School)
I am trying to recommend users to any given user. My starting query looks like this
MATCH p=(u:User {id: {leftId}})-[r:COLLABORATED_ON|:AUTHORED|:WORKED_AT|:IS_ATTENDING|:STUDIED_AT*1..3]-(pymk:User)
RETURN p
LIMIT 24
Now this returns me all the pymk users within 1 to 3 relationships away, which is fine. But I want to filter the path according to the relationship attributes. Like remove the following path if the user and pymk work start date and end date is not overlapping.
(User)-[:WORKED_AT]->(Company)<-[:WORKED_AT]-(User)
I can do this with single query
MATCH (u:User)-[r1:WORKED_AT]->(Company)<-[r2:WORKED_AT]-(pymk:User)
WHERE
(r1.startedAt < r2.endedAt) AND (r2.startedAt < r1.endedAt)
RETURN pymk
But couldn't get my head around doing it within a collection of paths. I don't even know if this is possible.
Any help is appreciated.
This should do the trick:
MATCH p=(:User {id: {leftId}})-[:COLLABORATED_ON|:AUTHORED|:WORKED_AT|:IS_ATTENDING|:STUDIED_AT*1..3]-(:User)
WITH p, [rel in relationships(p) WHERE type(rel) = 'WORKED_AT'] as worked
WHERE size(worked) <> 2 OR
apoc.coll.max([work in worked | work.startedAt]) < apoc.coll.min([work in worked | work.endedAt])
RETURN p
LIMIT 24
We're using APOC here to get the max and min of a collection (the max() and min() aggregation functions in just Cypher are aggregation functions across rows, and can't be used on lists).
This relies on distilling the logic of overlapping down to max([start times]) < min([end times]), which you can check out in this highly popular answer here

neo4j Get list of new nodes (change-feed)

Is there a way on Neo4j to fetch a list of all the new nodes created after a certain time? like a built in change-feed?
I know this could be done by traversing the entire graph and comparing if a node's date is > than the treshold set before.
However, this is not optimal at the very least and would not perform well on a 10 million node graph.
Is there a way to know if new nodes were added? (or relationships) some sort of change feed like a built in bloom filter?
If not, any ideas on getting a change feed every x minutes?
Have you tried an INDEX? With an index your query performance will be improved. Try creating an index in the property related to the creation time of the nodes.
CREATE INDEX ON :Person(created_at)
After, when creating a node, you can use the timestamp() function and save the current timestamp in the property created_at of :Person nodes.
CREATE (:Person {name:'Jon', created_at: timestamp()})
CREATE (:Person {name:'Doe', created_at: timestamp()})
Then you can query normally by the created_at property of :Person nodes and the index will be used.
MATCH (p:Person)
WHERE p.created_at > 1502882338889 // given a timestamp...
RETURN p
Also, if you don't need all the nodes modified after a given timestamp at same time, you can make a pagination in the query and work with pieces of the entire data using SKIP and LIMIT.
MATCH (p:Person)
WHERE p.created_at > 1502882338889 // given a timestamp...
RETURN p
ORDER BY p.created_at
SKIP 1
LIMIT 2

cql doesnt pass node in WITH statement

I'm using the following query to count the number of created users and create a user if the user with that id doesnt exist:
MERGE (uc:UserCounter)
ON CREATE SET uc.count = 0
WITH uc
MATCH (u:User{id:X})
WITH uc, count(u) as counts
MERGE (u:User{id:X})
ON CREATE SET uc.count = uc.count+1, u.id = uc.count, u.creation_ts = TIMESTAMP()
RETURN counts
I'm also returning counts to see if the user existed before or not. This query gives me back
(no rows). After some debugging, I came to the conclusion, that the uc node is not been passed until the end. What am I missing ?
This looks like the same problem as this question: if the user doesn't exist yet, the MATCH will not return any row despite the count() aggregation. You'll need an OPTIONAL MATCH for it to work instead.
Your query seems off though: why would you match/merge on id X, then overwrite it on creation with the current count? It's also dubious it would work correctly when executed concurrently.

Perform MATCH on collection / Break apart collection

I am trying to mimc the functionality of the neo4j browser to display my graph in my front end. The neo4j browser issues two calls for every query - the first call performs the query that the user types into the query box and the second call uses find the relationships between every node returned in the first user-entered query.
{
"statements":[{
"statement":"START a = node(1,2,3,4), b = node(1,2,3,4)
MATCH a -[r]-> b RETURN r;",
"resultDataContents":["row","graph"],
"includeStats":true}]
}
In my application I would like to be more efficient so I would like to be able to get all of my nodes and relationships in a single query. The query that I have at present is:
START person = node({personId})
MATCH person-[:RELATIONSHIP*]-(p:Person)
WITH distinct p
MATCH p-[r]-(d:Data), p-[:DETAILS]->(details), d-[:FACT]->(facts)
RETURN p, r, d, details, facts
This query runs well but it doesn't give me the "d" and "details" nodes which were linked to the original "person".
I have tried to join the "p" and "person" results in a collection:
collect(p) + collect(person) AS people
But this does not allow me to perform a MATCH on the resulting collection. As far as I can figure out there is no way of breaking apart a collection.
The only option I see at the moment is to split the query into two; return the "collect(p) + collect(person) AS people" collection and then use the node values in a second query. Is there a more efficient way of performing this query?
If you use the quantifier *0.. RELATIONSHIP is also match at a depth of 0 making person the same as p in this case. The * without specified limits defaults to 1..infinity
START person = node({personId})
MATCH person-[:RELATIONSHIP*0..]-(p:Person)
WITH distinct p
MATCH p-[r]-(d:Data), p-[:DETAILS]->(details), d-[:FACT]->(facts)
RETURN p, r, d, details, facts

Getting multiple relationship property returns from a single cypher call

I'm new to cypher, neo4j and graph databases in general. The data model I was given to work with is a tad confusing, but it looks like the nodes are just GUID placeholders with all the real "data" as properties on relationships to the nodes (which relates each node back to node zero).
Each node (which basically only has a guid) has a dozen relations with key/value pairs that are the actual data I need. (I'm guessing this was done for versioning?.. )
I need to be able to make a single cypher call to get properties off two (or more) relationships connected to the same node -- here is two calls that I'd like to make into one call;
start n = Node(*)
match n-[ar]->m
where has(ar.value) and has(ar.proptype) and ar.proptype = 'ccid'
return ar.value
and
start n = Node(*)
match n-[br]->m
where has(br.value) and has(br.proptype) and br.proptype = 'description'
return br.value
How do I go about doing this in a single cypher call?
EDIT - for clarification;
I want to get the results as a list of rows with a column for each value requested.
Something returned like;
n.id as Node, ar.value as CCID, br.value as Description
Correct me if I'm wrong, but I believe you can just do this:
start n = Node(*)
match n-[r]->m
where has(r.value) and has(r.proptype) and (r.proptype = 'ccid' or r.proptype = 'description')
return r.value
See here for more documentation on Cypher operations.
Based on your edits I'm guessing that you're actually looking to find cases where a node has both a ccid and a description? The question is vague, but I think this is what you're looking for.
start n = Node(*)
match n-[ar]->m, n-[br]->m
where (has(ar.value) and has(ar.proptype) and ar.proptype = 'ccid') and
(has(br.value) and has(br.prototype) and br.proptype = 'description')
return n.id as Node, ar.value as CCID, br.value as Description
You can match relationships from two sides:
start n = Node(*)
match m<-[br]-n-[ar]->m
where has(ar.value) and has(ar.proptype) and ar.proptype = 'ccid' and
has(br.value) and has(br.proptype) and br.proptype = 'description'
return ar.value, br.value

Resources