I am trying to model the state changes of a batch. I capture each change and use an EpochTime column to track when it happened. I managed to get this far with the code below:
MATCH(n:Batch), (n2:Batch)
WHERE n.BatchId = n2.Batch
WITH n, n2 ORDER BY n2.Name
WITH n, COLLECT(n2) as others
WITH n, others, COALESCE(
HEAD(FILTER(x IN others where x.EpochTime > n.EpochTime)),
HEAD(others)
) as next
CREATE (n)-[:NEXT]->(next)
RETURN n, next;
This makes my graph circular because of the HEAD(others) fallback, so the chain doesn't stop at the node with the maximum EpochTime. If I remove HEAD(others), I can't figure out how to skip creating the relationship for the last node. I'm not sure how to put a condition around the relationship creation so that nothing is created when the next node is null.
This might do what you want:
MATCH(n:Batch)
WITH n ORDER BY n.EpochTime
WITH n.BatchId AS id, COLLECT(n) AS ns
CALL apoc.nodes.link(ns, 'NEXT')
RETURN id, ns;
It orders all the Batch nodes by EpochTime, and then collects all the Batch nodes with the same BatchId value. For each collection, it calls the apoc procedure apoc.nodes.link to link all its nodes together (in chronological order) with NEXT relationships. Finally, it returns each distinct BatchId and its ordered collection of Batch nodes.
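To make the sort, group, and link steps concrete, here is a minimal Python sketch of the same logic outside Neo4j. The sample rows and the helper name next_links are hypothetical; it only illustrates why the latest state of each batch ends up with no outgoing NEXT link, and is not a replacement for the APOC call.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows standing in for Batch nodes.
batches = [
    {"BatchId": "b1", "EpochTime": 300, "Name": "done"},
    {"BatchId": "b1", "EpochTime": 100, "Name": "created"},
    {"BatchId": "b2", "EpochTime": 150, "Name": "created"},
    {"BatchId": "b1", "EpochTime": 200, "Name": "running"},
]


def next_links(rows):
    """Return (earlier, later) Name pairs for consecutive states of each batch."""
    links = []
    # Sort by BatchId first so groupby sees each batch as one run,
    # then by EpochTime so each run is in chronological order.
    ordered = sorted(rows, key=itemgetter("BatchId", "EpochTime"))
    for _, group in groupby(ordered, key=itemgetter("BatchId")):
        states = list(group)
        # zip stops at the shorter input, so the last state gets no outgoing link.
        links.extend((a["Name"], b["Name"]) for a, b in zip(states, states[1:]))
    return links


links = next_links(batches)
print(links)
```

A batch with a single state (b2 here) produces no links at all, which is the non-circular behaviour the question asks for.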
I am very new to Neo4j, so this is probably a simple question.
I have several hundred nodes with a property "seq" (for sequence). This number basically represents the day of the month, so every one of these nodes has a seq property between 1 and 31. I want to combine all the nodes with the same seq into a single node, so that all the nodes with seq = 1 are combined into a "January 1" node, all nodes with seq = 2 are combined into a "January 2" node, and so on. I have a property "pat_id" that should be combined into an array from all the merged nodes for a day.
Here is my code:
WITH range(1,31) as counts
UNWIND counts AS cnt
MATCH (n:OUTPT {seq:cnt})
WITH collect(n) AS nodes
CALL apoc.refactor.mergeNodes(nodes, {properties: {
pat_id:'combine',
seq:'discard'}, mergeRels:true})
YIELD node
RETURN node
I initially tried to do this with a FOREACH loop, but I can't do a MATCH inside a FOREACH.
I have been doing an UNWIND, but it is only merging the nodes with the first value (seq = 1). I assume this is because the RETURN statement ends the loop. But when I remove the RETURN statement, I get this error:
Query cannot conclude with CALL (must be RETURN or an update clause) (line 5, column 1 (offset: 99))
"CALL apoc.refactor.mergeNodes(nodes, {properties: {"
Any help would be appreciated.
The problem is with this line:
WITH collect(n) AS nodes
You've matched to all :OUTPT nodes with a sequence number within 1-31, but then you aggregate them into a single large collection, then merge them into a single node.
If you want to collect the nodes according to the sequence number, then the sequence number (in your case, cnt) needs to be the grouping key of the aggregation:
WITH cnt, collect(n) AS nodes
That will get you one row per distinct cnt value, each carrying the list of nodes that share that value.
Because Cypher operations execute per row, your APOC refactor call will execute per row. Because each row is associated with a different cnt value, and each has a different list, you will be performing the refactoring for each list separately.
The output will be one row per cnt value, with a single node per row (as a result of merging all the nodes in that row's list into a single node).
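The effect of the grouping key can be illustrated with a small Python sketch. The (seq, pat_id) rows and both function names are made up for the example; the point is only the difference between collecting everything into one list and collecting per seq value.

```python
from collections import defaultdict

# Hypothetical (seq, pat_id) rows standing in for :OUTPT nodes.
rows = [(1, "p10"), (2, "p20"), (1, "p11"), (2, "p21"), (1, "p12")]


def collect_all(rows):
    """Like WITH collect(n): no grouping key, so everything lands in one list."""
    return [pat_id for _, pat_id in rows]


def collect_by_seq(rows):
    """Like WITH cnt, collect(n): seq is the grouping key, one list per value."""
    groups = defaultdict(list)
    for seq, pat_id in rows:
        groups[seq].append(pat_id)
    return dict(groups)


merged_all = collect_all(rows)
merged_by_seq = collect_by_seq(rows)
print(merged_all)
print(merged_by_seq)
```

With the grouping key in place, each list would be merged on its own, which corresponds to one merged node per day rather than one node for the whole month.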
The following Cypher query creates the cnt property and sets it to 0 on every node the first time I run it. Running the exact same query a second time updates the cnt property. Is it possible to increment each node's cnt for every relationship added during the load, without running the query twice?
LOAD CSV WITH HEADERS FROM "file:///graph_data.csv" AS row
MERGE (t1:Term {word:row.term1})
MERGE (t2:Term {word:row.term2})
WITH t1, t2, row
MERGE (t1)-[:TOGETHER {id:row.id}]-(t2)
ON MATCH SET
t1.cnt = t1.cnt+1,
t2.cnt = t2.cnt+1
ON CREATE SET
t1.cnt=0,
t2.cnt=0
RETURN t1,t2
The following query finds the counts (number of relationships) associated with each node. This seems to be a better method than storing the count as a property.
MATCH (t:Term), (s:Term)
WHERE t <> s AND (t)-[:TOGETHER]-(s)
RETURN t.word, COUNT((t)-[:TOGETHER]-(s));
It may not be necessary for you to store and maintain cnt properties at all.
For instance, to find out how many TOGETHER relationships a specific Term has:
MATCH (t:Term {word: 'cat'})
RETURN SIZE((t)-[:TOGETHER]-()) AS cnt;
I have two kinds of nodes, labeled Class and Parents. These nodes are connected by a hasParents relationship. There are 4 million Class nodes and 700K Parents nodes. I want to create a Sibling relationship between the Class nodes. I ran the following query:
Match (A:Class)-[:hasParents]-> (B:Parents) <-[:hasParents]-(C:Class) Merge (A)-[:Sibling]-(C)
This query is taking ages to complete. I have indexed both the class_id property of the Class nodes and the parent_id property of the Parents nodes. I am using Neo4j version 2.1.6. Any suggestions to speed this up?
First of all, the indices won't help the query since the properties are not referenced anywhere in the query.
With 700K Parents nodes and 4M Class nodes, you have on average 5.7 classes per parent. Rounding that up to 6 classes under one parent gives 15 Sibling relationships (6 × 5 / 2), so there would be more than 10M relationships to create for the whole graph.
That's a lot for one transaction; you're almost guaranteed to hit an OutOfMemory error.
To avoid that, you should batch changes into several smaller transactions.
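As a rough check on the size estimate above, the arithmetic can be reproduced in a few lines of Python. Rounding the average up to 6 classes per parent and treating every parent as average are simplifying assumptions; the real total depends on how the children are actually distributed.

```python
from math import comb

parents = 700_000
classes = 4_000_000

avg_children = classes / parents           # about 5.71 classes per parent
pairs_per_parent = comb(6, 2)              # undirected pairs among 6 siblings
total_pairs = parents * pairs_per_parent   # rough total of Sibling relationships

print(round(avg_children, 2), pairs_per_parent, total_pairs)
```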
I'd use a marker label to manage the progression. First, mark all the parents:
MATCH (p:Parents) SET p:ToProcess
Then, repeatedly select a subset of the nodes that remain to be processed, and connect the siblings:
MATCH (p:ToProcess)
WITH p
LIMIT 1000 // pick one batch first, so only these parents lose the marker
REMOVE p:ToProcess
WITH p
OPTIONAL MATCH (p)<-[:hasParents]-(c:Class)
WITH p, collect(c) AS children
FOREACH (c1 IN children |
FOREACH (c2 IN filter(c IN children WHERE c <> c1) |
MERGE (c1)-[:Sibling]-(c2)))
RETURN count(p)
As the query returns the number of parents that were processed, you just repeat it until it returns 0. At that point, no parent has the ToProcess label anymore.
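The repeat-until-zero loop could be driven from client code along these lines. This is only a sketch: process_in_batches and fake_run_batch are hypothetical names, and the fake function simulates the batched query with an in-memory list instead of talking to a real Neo4j driver.

```python
def process_in_batches(run_batch):
    """Call run_batch() until it reports 0 processed parents; return the total."""
    total = 0
    while True:
        processed = run_batch()
        if processed == 0:
            return total
        total += processed


# Hypothetical stand-in for the database: 2500 parents still marked ToProcess.
remaining = list(range(2500))


def fake_run_batch(limit=1000):
    """Simulate one run of the batched query: unmark up to `limit` parents."""
    batch = remaining[:limit]
    del remaining[:limit]
    return len(batch)


total = process_in_batches(fake_run_batch)
print(total)
```

In a real setup, run_batch would execute the batched Cypher query in its own transaction and return the count(p) value from the result.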
Edited:
I need to check whether the relationship exists. If it does not, I calculate the similarity between the nodes, which is somewhat time-consuming, and then insert the relationship between them.
I need to repeat this for every pair of nodes in the graph.
The logic for this scenario looks like this:
If the relationship does not exist
then calculate similarity and insert relationship
else
do nothing (or) return value
Another problem with this query is that it may cause memory exceptions. If so, how can I overcome this?
This is my query,
MATCH (a{word:"review"}),(b{word:"nothing"})
MERGE (a)-[r:jsim]->(b)
MERGE (a)<-[s:jsim]-(b)
SET r.val =
CASE WHEN NOT (HAS (r.val))
THEN [1]
ELSE 2 END
SET s.val =
CASE WHEN NOT (HAS (s.val))
THEN [2]
ELSE 1 END
RETURN r,s
In my actual problem, the FALSE case has a big query that iterates through all the nodes in the graph and has to store many values on the stack, so a memory exception may arise here.
My IF ELSE CASE query is :
MATCH (a)-[r]->(b) where r.val>1
WITH collect(DISTINCT b.word) as our_word_pairs,a
MATCH (c)-[r]->(d) where r.val>1 AND Not c = a
WITH collect(DISTINCT d.word) as other_word_pairs,a,c,our_word_pairs
WITH FILTER(X in our_word_pairs where X in other_word_pairs) AS word_pair_intersection,
(our_word_pairs+other_word_pairs) AS all_word_pairs,other_word_pairs,a,c,our_word_pairs
WITH DISTINCT(all_word_pairs) as all_word_pairs,word_pair_intersection,a,c
WITH (1.0*SIZE(word_pair_intersection)/SIZE(all_word_pairs)) AS jsim
Now the jsim is the value I need to assign.
Assume a and b are two nodes. I have to find the similarity between them and add the relationship with that value. The similarity of A and B is the number of nodes they have in common divided by the total number of distinct nodes they are connected to.
Ex: A-->p, A-->q, A-->r, A-->s
B-->r, B-->s, B-->t, B-->u, B-->v
Sim(A,B) = Common nodes / Total nodes
= 2/7
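The worked example above is the Jaccard similarity of the two neighbour sets. A minimal Python sketch of that calculation follows; the function name jaccard is illustrative, and returning 0 for two empty sets is an assumption made here to avoid dividing by zero.

```python
from fractions import Fraction


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two nodes with no neighbours get similarity 0.
        return Fraction(0)
    return Fraction(len(a & b), len(union))


neighbours_a = {"p", "q", "r", "s"}
neighbours_b = {"r", "s", "t", "u", "v"}

sim = jaccard(neighbours_a, neighbours_b)
print(sim)  # 2/7
```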
I have this kind of data model in the database:
(a)<-[:has_parent]-(b)<-[:has_parent]-(c)<-[:has_parent]-(...)
Every parent can have multiple children, and this can go on to an unknown number of levels.
I want to find these values for every node:
the number of descendants it has
the depth (distance from the node) of every descendant
the creation time of every descendant
I also want to rank the returned nodes based on these values. Right now, with no optimization, the query runs very slowly (especially as the number of descendants increases).
The Questions:
what can I do in the model to make the query performant (indexing, data structure, ...)
what can I do in the query
what can I do anywhere else?
edit:
the query starts from a specific node using START or MATCH
to clarify:
a. the query may start from any point in the hierarchy, not just the root node
b. every node under the starting node is returned ranked by the total number of descendants it has, the distance (from the returned node) of every descendant & timestamp of every descendant it has.
c. by descendant I mean everything under it, not just its direct children
for example,
here's a sample graph:
http://console.neo4j.org/r/awk6m2
First you need to know how to find the root node. The following statement finds the nodes that have no outbound has_parent relationship. Be aware that this statement is potentially expensive in a large graph.
MATCH (n)
WHERE NOT ((n)-[:has_parent]->())
RETURN n
Instead you should use an index to find that node:
MATCH (n:Node {name:'abc'})
Starting with our root node, we traverse inbound has_parent relationships with variable depth. For each node traversed we calculate the number of children. Since this might be zero, an OPTIONAL MATCH is used:
MATCH (root:Node) // line 1-3 to find root node, replace by index lookup
WHERE NOT ((root)-[:has_parent]->())
WITH root
MATCH p =(root)<-[:has_parent*]-() // variable path length match
WITH last(nodes(p)) AS currentNode, length(p) AS currentDepth
OPTIONAL MATCH (currentNode)<-[:has_parent]-(c) // traverse children
RETURN currentNode, currentNode.created, currentDepth, count(c) AS countChildren
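For intuition about what the variable-length match computes, here is a small Python sketch of the same traversal over an in-memory hierarchy. The has_parent dictionary and the function name are hypothetical; the sketch assumes the data is a tree (one parent per node, no cycles), and it returns every descendant of a starting node together with its depth.

```python
from collections import deque

# Hypothetical child -> parent edges, one entry per has_parent relationship.
has_parent = {"b": "a", "c": "a", "d": "b", "e": "d"}


def descendants_with_depth(start, has_parent):
    """Map each descendant of `start` to its distance from `start`."""
    # Invert the edges so children can be looked up by parent.
    children = {}
    for child, parent in has_parent.items():
        children.setdefault(parent, []).append(child)

    result = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in children.get(node, []):
            result[child] = depth + 1
            queue.append((child, depth + 1))
    return result


depths = descendants_with_depth("a", has_parent)
print(depths, len(depths))
```

The number of descendants is the size of the returned mapping, and starting from any other node (for example "b") gives the same information for that subtree.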