How to recursively update a Project Planning Graph with Cypher - neo4j

I am referring to this GraphGist: https://neo4j.com/graphgist/project-management
I'm actually trying to update a project plan when a duration on one task changes.
In the GraphGist, the whole project is always calculated from the initial activity to the last activity. This doesn't work well for me in a multi-project environment where I don't really know what the starting point is, and I don't know the end point either. What I would like for now is just to update the earliest start of any activity which depends on a task I just updated.
The latest I have is the following:
MATCH p1=(:Activity {description:'Perform needs analysis'})<-[:REQUIRES*]-(j:Activity)
UNWIND nodes(p1) as task
MATCH (pre:Activity)<-[:REQUIRES]-(task:Activity)
WITH MAX(pre.duration+pre.earliest_start) as updateEF,task
SET task.earliest_start = updateEF
The intent is to get all the paths in the project which depend on the task I just updated (in this case: "Perform needs analysis"). At every step of the path I also check whether other dependencies would override my duration update.
So, of course, it only works on the direct connections.
If I have A<-[:REQUIRES]-B<-[:REQUIRES]-C
and I increase the duration of A, I believe it updates B based on A, but then C is calculated from B's earliest start as it was before B was updated.
How can I make this recursive? Maybe using REDUCE?
(still searching...)

This is a very interesting issue.
You want to update the nodes 1 step away from the originally updated node, and then update the nodes 2 steps away (incorporating the previously-updated values as appropriate), and then 3 steps away, and so on until every node reachable from the original node has been updated.
The Cypher planner does not generate code that performs this kind of query/update pattern, where new values are propagated step by step through paths.
However, there is a workaround using the APOC plugin. For example, using apoc.periodic.iterate:
CALL apoc.periodic.iterate(
"MATCH p=(:Activity {description:'Perform needs analysis'})<-[:REQUIRES*]-(task:Activity)
RETURN task ORDER BY LENGTH(p)",
"MATCH (pre:Activity)<-[:REQUIRES]-(task)
WITH MAX(pre.duration+pre.earliest_start) as updateEF, task
SET task.earliest_start = updateEF",
{batchSize:1})
The first Cypher statement passed to the procedure generates the task nodes, ordered by distance from the original node. The second Cypher statement gets the pre nodes for each task, and sets the appropriate earliest_start value for that task. The batchSize:1 option tells the procedure to perform every iteration of the second statement in its own transaction, so that subsequent iterations will see the updated values.
NOTE: If the same task can be encountered multiple times at different distances, you will have to determine if this approach is right for you. Also, you cannot have other operations writing to the DB at the same time, as that could lead to inconsistent results.
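For the case flagged in the note, where the same task is reachable at several distances, one option is to process the downstream tasks in dependency (topological) order instead of by path length. The sketch below is plain Python over an in-memory copy of the graph, not Cypher or APOC; the dict layout and the function name propagate_earliest_start are assumptions made for illustration.

```python
from collections import deque

def propagate_earliest_start(duration, requires, earliest_start, changed):
    """Recompute earliest_start for every task downstream of `changed`.

    duration:       {task: duration}
    requires:       {task: [prerequisite tasks]}  (the REQUIRES relationships)
    earliest_start: {task: current earliest start}; updated in place
    changed:        the task whose duration was just edited
    """
    # Invert REQUIRES so we can walk from a task to its dependents.
    dependents = {}
    for task, pres in requires.items():
        for pre in pres:
            dependents.setdefault(pre, []).append(task)

    # Collect every task reachable from the changed one.
    affected, stack = set(), [changed]
    while stack:
        for nxt in dependents.get(stack.pop(), []):
            if nxt not in affected:
                affected.add(nxt)
                stack.append(nxt)

    # Kahn's algorithm restricted to the affected tasks: a task is ready
    # once all of its affected prerequisites have been recomputed.
    pending = {t: sum(1 for p in requires.get(t, []) if p in affected)
               for t in affected}
    ready = deque(t for t, n in pending.items() if n == 0)
    while ready:
        task = ready.popleft()
        earliest_start[task] = max(
            earliest_start[p] + duration[p] for p in requires[task])
        for nxt in dependents.get(task, []):
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)
    return earliest_start
```

With this ordering, a task that requires both A and B (where B also requires A) is only recomputed after B has its new value, so each affected task is written once.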

Related

apoc.periodic.iterate fails the batch if there is duplicate data in the parameter

I am using an apoc.periodic.iterate query to store millions of records. Since the data may contain duplicates, I am using MERGE to create nodes, but unfortunately whenever the data is duplicated the whole batch fails with an error like this:
"LockClient[200] can't wait on resource RWLock[NODE(14), hash=1645803399] since => LockClient[200] <-[:HELD_BY]- RWLock[NODE(101)"
Setting parallel to false works fine.
Removing the duplicates also lets the query pass.
But both of these solutions take more time, since I am dealing with millions of records. Is there an alternative, such as making it wait for the lock?
You cannot use parallel:true, because you are creating relationships in your query. Every time you add a relationship to a node, the Cypher engine takes a write lock on that node, and other processes can't add to that particular node. That is why you get the lock exception. There is not much you can do except run it with the parallel:false setting.
To avoid deadlocks, concurrent requests that update the DB should avoid touching the same nodes or relationships (including the nodes on both ends of those relationships). One way to achieve this is to figure out a way to have the concurrent requests work on disjoint subgraphs.
Or, you can retry queries that throw a DeadlockDetectedException. The docs show an example of how to do that.
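As a rough illustration of the retry idea, here is a minimal Python sketch. It does not use the Neo4j driver: DeadlockDetected is a hypothetical stand-in for whatever transient deadlock error your client raises, and run_with_retry, max_attempts and base_delay are names made up for this example.

```python
import random
import time

class DeadlockDetected(Exception):
    """Hypothetical stand-in for the client's transient deadlock error."""

def run_with_retry(work, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call work(); on DeadlockDetected, back off and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DeadlockDetected:
            if attempt == max_attempts:
                # Give up and let the caller see the error.
                raise
            # Exponential backoff with jitter so that competing workers
            # do not all retry at the same instant.
            sleep(base_delay * (2 ** (attempt - 1)) * (1 + random.random()))
```

Here work would be a function that runs one batch in its own transaction, so a retry repeats the whole batch rather than part of it.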

Creating relationships between nodes in neo4j is extremely slow

I'm using a python script to generate and execute queries loaded from data in a CSV file. I've got a substantial amount of data that needs to be imported so speed is very important.
The problem I'm having is that merging between two nodes takes a very long time, and including the cypher to create the relations between the nodes causes a query to take around 3 seconds (for a query which takes around 100ms without).
Here's a small bit of the query I'm trying to execute:
MERGE (s0:Chemical{`name`: "10074-g5"})
SET s0.`name`="10074-g5"
MERGE (y0:Gene{`gene-id`: "4149"})
SET y0.`name`="MAX"
SET y0.`gene-id`="4149"
MERGE (s0)-[:INTERACTS_WITH]->(y0)
MERGE (s1:Chemical{`name`: "10074-g5"})
SET s1.`name`="10074-g5"
MERGE (y1:Gene{`gene-id`: "4149"})
SET y1.`name`="MAX"
SET y1.`gene-id`="4149"
MERGE (s1)-[:INTERACTS_WITH]->(y1)
Any suggestions on why this is running so slowly? I've got indexes set up on Chemical->name and Gene->gene-id, so I honestly don't understand why this runs so slowly.
Most of your SET clauses are just setting properties to the same values they already have (as guaranteed by the preceding MERGE clauses).
The remaining SET clauses probably only need to be executed if the MERGE had created a new node. So, they should probably be preceded by ON CREATE.
You should never generate a long sequence of almost identical Cypher code. Instead, your Cypher code should use parameters, and you should pass your data as parameter(s).
You said you have a :Gene(id) index, whereas your code actually requires a :Gene(gene-id) index.
Below is sample Cypher code that uses the dataList parameter (a list of maps containing the desired property values), which fixes most of the above issues. The UNWIND clause just "unwinds" the list into individual maps.
UNWIND $dataList AS d
MERGE (s:Chemical{name: d.sName})
MERGE (y:Gene{`gene-id`: d.yId})
ON CREATE SET y.name=d.yName
MERGE (s)-[:INTERACTS_WITH]->(y)
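On the Python side, the dataList parameter could be built from the CSV rows along these lines. This is only a sketch: the column names chemical, gene_id and gene_name are assumptions about the CSV file, and build_data_list and chunks are hypothetical helper names. The map keys sName, yId and yName are the ones used in the query above.

```python
import csv
import io

def build_data_list(csv_text):
    """Turn CSV rows into the list of maps expected by $dataList,
    dropping rows that repeat a chemical/gene pair (order is preserved)."""
    seen = set()
    data_list = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["chemical"], row["gene_id"])
        if key in seen:
            continue
        seen.add(key)
        data_list.append({
            "sName": row["chemical"],
            "yId": row["gene_id"],
            "yName": row["gene_name"],
        })
    return data_list

def chunks(items, size):
    """Yield successive slices so each query gets a bounded parameter list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each chunk would then be passed as the dataList parameter of one query execution, so the Cypher text stays constant and only the parameter changes.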

Cypher Query: Get related nodes of shortest path

I am trying to develop a routing system based on data in a Neo4j 3.0.4 database. The graph contains multiple stops. Some of these stops are scheduled, like bus stops and train stops, but not all of them. Those that are scheduled are connected to a schedule node. Each schedule node is connected to an offer.
A subgraph looks like this:
My question is: How can I create a query that returns this subgraph? Up to now I wrote this query in cypher:
MATCH (from:Stop{poiId:'A'}), (to:Stop{poiId:'Z'}) ,
path = allShortestPaths((from)-[r*]->(to))
RETURN path
This results in all shortest paths from stop A to stop Z. Between A and Z are two more stops, which are included in the returned path. For all stops I want to get the related schedules, and for these schedules the related offers.
Furthermore, it would be great if it were possible to use constraints based on the schedule node, e.g. allShortestPaths from A to Z where filter(time IN schedule.monday WHERE time > 1100).
If that is not possible, is it possible to create a new query with this constraint based on the previous query?
EDIT1: Further information:
The schedules contain departure times for each stop. Based on a desired departure time (or, alternatively, a desired arrival time), I want to calculate the full travel time and get the 5 best connections (those with the least travel time).
For example, I want to start at 7:00. The switch relation has a time cost of 2 minutes, so check schedule 1 for a departure after 7:02 and, if there is one, take the first departure after 7:02. The connected_by relation has a time cost of 12 minutes, and the last switch_to relation has no time cost, so I will arrive at 07:14. Note: if I have to switch service lines while travelling, I have to check the schedule again. If the schedule does not fit the desired time window, exclude that path from the result. I want to get the 5 best paths (based on travel time or arrival time); the number of hops is not important. If there is a connection with, e.g., 6 stops but less travel time (or an earlier arrival time), then prefer that one. I know this is a big and difficult problem, and I have no idea how to start. If there is a way to do this via REST (or, if not, in Java), I would be glad for any hint!
You can use the UNWIND construct in Cypher to get the nodes of a path and use OPTIONAL MATCH to look for schedules & offers.
I created a sample dataset:
CREATE
(offer: Offer),
(sch1: Schedule),
(sch2: Schedule),
(stop1: Stop {name: "stop1"}),
(stop2: Stop {name: "stop2"}),
(stop3: Stop {name: "stop3"}),
(stop4: Stop {name: "stop4"}),
(stop1)-[:SWITCH_TO]->(stop2),
(stop2)-[:CONNECTED_BY]->(stop3),
(stop3)-[:SWITCH_TO]->(stop4),
(stop2)-[:SCHEDULED_BY]->(sch1),
(stop3)-[:SCHEDULED_BY]->(sch2),
(sch1)-[:OFFERED_BY]->(offer),
(sch2)-[:OFFERED_BY]->(offer)
To get the subgraph, you can issue this query:
MATCH
(from:Stop {name:'stop1'}), (to:Stop {name:'stop4'}),
path = allShortestPaths((from)-[r*]->(to))
UNWIND nodes(path) AS stopNode
OPTIONAL MATCH (stopNode)-[sb:SCHEDULED_BY]->(schedule:Schedule)-[ob:OFFERED_BY]-(offer:Offer)
RETURN stopNode, sb, ob, schedule, offer
Using this approach, the edges in r are dropped, so it does not return the whole subgraph. The visualization on Neo4j's web UI adds those edges, so the result looks like this:
Anyways, I hope the post contains useful information - let me know how it works for you.
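For the travel-time part of the question (EDIT1), the per-path arithmetic could be done in client code once the paths and schedules have been fetched. The following Python sketch is only an illustration under several assumptions: times are stored as integers such as 702 for 07:02 (as the 1100 in the question suggests), "after 7:02" is read as "at or after", the departure list is made up, and to_minutes and arrival_time are hypothetical names.

```python
from bisect import bisect_left

def to_minutes(hhmm):
    """Convert an integer such as 702 (meaning 07:02) to minutes since midnight."""
    return (hhmm // 100) * 60 + hhmm % 100

def arrival_time(start, legs):
    """Walk the legs of one path and return the arrival time in minutes,
    or None if a scheduled leg has no departure left.

    Each leg is (cost_minutes, departures). departures is a sorted list of
    minutes since midnight for a scheduled leg, or None for an unscheduled one.
    """
    now = start
    for cost, departures in legs:
        if departures is not None:
            # First departure at or after the current time.
            i = bisect_left(departures, now)
            if i == len(departures):
                return None
            now = departures[i]
        now += cost
    return now

# The example from the question: switch (2 min), scheduled ride (12 min),
# final switch (no cost). The departure times are invented for illustration.
schedule_1 = [to_minutes(t) for t in (650, 702, 730)]
legs = [(2, None), (12, schedule_1), (0, None)]
arrival = arrival_time(to_minutes(700), legs)
```

Running this for every candidate path and sorting by the result (dropping the None entries) would give the best connections by arrival time; paths that do not fit the schedule fall out automatically.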

Different results of two (synonymous) queries in Neo4j

I have noticed that some queries return fewer results than expected. I took one of the missing results and tried to force Neo4j to return it, and I succeeded with the following query:
match (q0),(q1),(q2),(q3),(q4),(q5)
where
q0.name='v4' and q1.name='v3' and q2.name='v5' and
q3.name='v1' and q4.name='v3' and q5.name='v0' and
(q1)-->(q0) and (q0)-->(q3) and (q2)-->(q0) and (q4)-->(q0) and
(q5)-->(q4)
return *
I assumed that the following query is semantically equivalent to the previous one. However, in this case Neo4j returns no result at all.
match (q1)-->(q0), (q0)-->(q3), (q2)-->(q0), (q4)-->(q0), (q5)-->(q4)
where
q0.name='v4' and q1.name='v3' and q2.name='v5' and
q3.name='v1' and q4.name='v3' and q5.name='v0'
return *
I have also manually verified that the required edges among vertices v0, v1, v3, v4 and v5 are present in the database with right directions.
Am I missing some important difference between these queries or is it just a bug of Neo4j? (I have tested these queries on Neo4j 2.1.6 Community Edition.)
Thank you for any advice
/EDIT: Updating to newest version 2.2.1 was of no help.
This might not be a complete answer, but here's what I found out.
These queries aren't synonymous, if I understand correctly.
First of all, use EXPLAIN (or even PROFILE) to look under the hood. The first query will be executed as follows:
The second query:
As you can see (even without going deep down), those are different queries in terms of both efficiency and semantics.
Next, what's actually going on here:
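The 1st query will look through all (single) nodes, filter them by name, then try to group them according to your pattern, which involves computing a Cartesian product (hence the enormous space complexity), then collect those groups into larger ones, and then evaluate your other conditions.
The 2nd query will first pick a pair of nodes connected by some relationship (which satisfy the condition on the name property), then throw in the third node and filter again, and so on till the end. The number of nodes is expected to decrease after every filter cycle.
By the way, is it possible that you accidentally set the same name twice (for q1 and q4)? Both are 'v3' in your queries. Within a single MATCH, Cypher will not traverse the same relationship twice, so (q1)-->(q0) and (q4)-->(q0) cannot both bind to one v3-->v4 relationship, which would explain the empty result. The pattern predicates in the WHERE clause of the 1st query are each checked on their own, so they have no such restriction.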
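One semantic difference that may matter here is relationship uniqueness: within a single MATCH pattern, Cypher does not bind the same relationship twice, whereas separate pattern predicates in WHERE are checked independently. The Python sketch below mimics that rule on a toy graph. It assumes there is exactly one relationship for each of the edges you listed (in particular only one from v3 to v4); pattern_matches is a hypothetical helper, not a Neo4j API.

```python
def pattern_matches(edges, pattern, binding, unique_relationships):
    """Check a fully bound pattern against a set of directed edges.

    edges:    set of (source, target) node names, one entry per relationship
    pattern:  list of (source_var, target_var) pairs
    binding:  {variable: node name}
    """
    used = set()
    for src_var, dst_var in pattern:
        edge = (binding[src_var], binding[dst_var])
        if edge not in edges:
            return False
        if unique_relationships and edge in used:
            # The same relationship would have to be traversed twice.
            return False
        used.add(edge)
    return True

# Assumed data: one relationship per edge named in the question.
edges = {("v3", "v4"), ("v4", "v1"), ("v5", "v4"), ("v0", "v3")}
pattern = [("q1", "q0"), ("q0", "q3"), ("q2", "q0"), ("q4", "q0"), ("q5", "q4")]
binding = {"q0": "v4", "q1": "v3", "q2": "v5", "q3": "v1", "q4": "v3", "q5": "v0"}

as_where_predicates = pattern_matches(edges, pattern, binding, False)
as_single_match = pattern_matches(edges, pattern, binding, True)
```

Under these assumptions the first call succeeds and the second fails, because q1 and q4 are both bound to v3 and would need to share the single v3 to v4 relationship.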

Multiple Match statements in Neo4j

I have a list of MATCH statements which are totally unrelated to each other, like these:
MATCH (a:Person),(b:InProceedings) WHERE a.identifier = 'person/joseph-valeri' and b.identifier = 'conference/edm2008/paper/209' CREATE (a)-[r:creator]->(b)
MATCH (a:Person),(b:InProceedings) WHERE a.identifier = 'person/nell-duke' and b.identifier = 'conference/edm2008/paper/209' CREATE (a)-[r:creator]->(b)
But if I execute them at once I get the following error:
WITH is required between CREATE and MATCH (line 2, column 1)
What changes should I incorporate?
(I am new to Neo4j)
Does this need to happen in a single transaction? In which case you should be matching your nodes up front before performing the create:
MATCH (jo:Person{identifier:'person/joseph-valeri'}), (nell:Person{identifier:'person/nell-duke'}), (b:InProceedings{identifier:'conference/edm2008/paper/209'})
CREATE (jo)-[:creator]->(b), (nell)-[:creator]->(b)
If it's just the two creators you could change the create to:
CREATE (jo)-[:creator]->(b)<-[:creator]-(nell)
If this isn't what you want to achieve then effectively what you have posted is two distinct Cypher statements that you are trying to run as one, and the parser is getting confused.
Post comment edit
Given that you said millions, I think you are going to find the transaction time for the import prohibitive. If you can write to CSV instead of to the big Cypher dump, you should investigate the CSV import syntax (and specifically pay attention to PERIODIC COMMIT).
If for some reason that is not an option and you are starting from empty then build slowly - creating nodes first. These are going to need names to keep the speed up (but these names aren't persisted, just constant in your Cypher query):
CREATE (a:Person{identifier:'person/joseph-valeri'}),
(b:Person{identifier:'person/nell-duke'}),
(zzz:Person{identifier:'person/do-you-really-want-person-in-all-these-identifiers'}),
(inProca:InProceedings{identifier:'conference/edm2008/paper/209'}),
(inProcb:InProceedings{identifier:'conference/edm2009/paper/209'})
You will have kept track of a, b .. zzz in your Python script, allowing you to build the CREATE statement up with:
(a)-[:creator]->(inProca), (zzz)-[:creator]->(inProcb)
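A minimal Python sketch of that bookkeeping is shown here. The function name build_create and the generated variable names n0, n1, ... are made up for illustration, and the sketch assumes identifiers contain no single quotes or backslashes, since it pastes them into the statement text.

```python
def build_create(nodes, relationships):
    """Build one CREATE statement from node and relationship descriptions.

    nodes:         list of (label, identifier) pairs
    relationships: list of (source identifier, type, target identifier)
    Assumes identifiers contain no single quotes or backslashes.
    """
    var_of = {}
    parts = []
    for index, (label, identifier) in enumerate(nodes):
        # Query-local variable name; it is not stored in the database.
        var = f"n{index}"
        var_of[identifier] = var
        parts.append(f"({var}:{label}{{identifier:'{identifier}'}})")
    for source, rel_type, target in relationships:
        parts.append(f"({var_of[source]})-[:{rel_type}]->({var_of[target]})")
    return "CREATE " + ",\n  ".join(parts)
```

The var_of dictionary plays the role of "keeping track of a, b .. zzz": it maps each identifier to the variable that names it within this one statement.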
Now if all of your nodes already exist and you just want to build the relationships in now, then you have the choice of:
Performing individual MATCH and CREATEs for each new relationship, executing each one individually. This looks like what your original code was doing. You should move the conditions into the MATCH rather than the WHERE clause.
MATCHing a large set of nodes and CREATEing new relationships. This is more akin to what my initial code was doing and will require your script to be smart in generating the queries.
MERGEing existing nodes into new relationships.
Whatever you do, you're going to need to batch the writes within the transaction or you're going to run out of memory - you can advise Neo4j to do this by using the USING PERIODIC COMMIT 50000 syntax; here is a great blog post on it.
