The following Cypher query creates the cnt property and sets all to 0 when I run it the first time. The exact query run a second time updates the cnt property. Is it possible to increment node cnt for each relation that is added upon the load without running twice?
LOAD CSV WITH HEADERS FROM "file:///graph_data.csv" AS row
MERGE (t1:Term {word:row.term1})
MERGE (t2:Term {word:row.term2})
WITH t1, t2, row
MERGE (t1)-[:TOGETHER {id:row.id}]-(t2)
ON MATCH SET
t1.cnt = t1.cnt+1,
t2.cnt = t2.cnt+1
ON CREATE SET
t1.cnt=0,
t2.cnt=0
RETURN t1,t2
The following query finds the counts (number of relationships) associated with each node. This seems to be a better method than storing the count as a property.
MATCH (t:Term), (s:Term)
WHERE t <> s AND (t)-[:TOGETHER]-(s)
RETURN t.word, COUNT((t)-[:TOGETHER]-(s));
It may not be necessary for you to store and maintain cnt properties at all.
For instance, to find out how many TOGETHER relationships a specific Term has:
MATCH (t:Term {word: 'cat'})
RETURN COUNT((t)-[:TOGETHER]-()) AS cnt;
Related
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi,I am trying to get total number of records for the above query.How I change the query using count function to get the record count directly.
Thanks in advance
The following query uses the aggregating funtion COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case the first example where:
count(*)
Is used to return a count of each returned item should work.
I have a couple of nodes and I need to add a incremental value in all nodes of a particular label.
match wd= (w:MYNODE)
forEach(n IN nodes(wd)|
set n.incrementalId=??
);
currently I have tried for each to traverse on each node but unable to get index of the each loop
ps: I have also tried with but unable to increment ie ++ the current value.
This is a more succinct version of #InverseFalcon's first query (and resembles the form of the query in your question):
MATCH (w:MYNODE)
WITH COLLECT(w) as ws
FOREACH(i IN RANGE(0, SIZE(ws)-1) | SET (ws[i]).incrementalId = i);
If the nodes already exist, then you'll have to use a more awkward approach using collections and indexes:
MATCH (w:MYNODE)
WITH collect(w) as myNodes
UNWIND range(0, size(myNodes)-1) as index
WITH index, myNodes[index] as node
SET node.incrementalId = index
If the nodes don't already exist, and you want to create some number of them with incremental ids, then it's an easier task using the range() function to produce a list of indexes:
UNWIND range(1, 100) as id
CREATE (:MYNODE {id:id})
I am trying to do a model for state changes of a batch. I capture the various changes and I have an Epoch time column to track these. I managed to get this done with the below code :
MATCH(n:Batch), (n2:Batch)
WHERE n.BatchId = n2.Batch
WITH n, n2 ORDER BY n2.Name
WITH n, COLLECT(n2) as others
WITH n, others, COALESCE(
HEAD(FILTER(x IN others where x.EpochTime > n.EpochTime)),
HEAD(others)
) as next
CREATE (n)-[:NEXT]->(next)
RETURN n, next;
It makes my graph circular because of the HEAD(others) and doesn't stop at the Node with the maximum Epoch time. If I remove the HEAD(others) then I am unable to figure out how to stop the creation of relationship for the last node. Not sure how to put conditions around the creation of relationship so I can stop creating relationships when the next node is null
This might do what you want:
MATCH(n:Batch)
WITH n ORDER BY n.EpochTime
WITH n.BatchId AS id, COLLECT(n) AS ns
CALL apoc.nodes.link(ns, 'NEXT')
RETURN id, ns;
It orders all the Batch nodes by EpochTime, and then collects all the Batch nodes with the same BatchId value. For each collection, it calls the apoc procedure apoc.nodes.link to link all its nodes together (in chronological order) with NEXT relationships. Finally, it returns each distinct BatchId and its ordered collection of Batch nodes.
My question is similar to the one pointed here :
Creating unique node and relationship NEO4J over huge dataset
I have 2 tables Entity (Entities.txt) & Relationships (EntitiesRelationships_Updated.txt) which looks like below: Both the tables are inside an import folder within the Neo4j database. What I am trying to do is load the tables using the load csv command and then create relationships.
As in the table below: If ParentID is 0, it means that ENT_ID does not have a parent. If it is populated, then it has a parent. For example in the table below, ENT_ID = 3 is the parent of ENT_ID = 4 and ENT_ID = 1 is the parent of ENT_ID = 2
**Entity Table**
ENT_ID Name PARENTID
1 ABC 0
2 DEF 1
3 GHI 0
4 JKG 3
**Relationship Table**
RID ENT_IDPARENT ENT_IDCHILD
1 1 2
2 3 5
The Entity table has 2 million records and the relationship tables has about 400K lines
Each RID has a particular tag associated with it. For example RID = 1 has it that the relation is A FATHER_OF B; RID = 2 has it that the relation is A MOTHER_OF B. Similarly there are 20 such RIDs associated.
Both of these are in txt format.
My first step is to load the entity table. I used the following script:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Entities.txt" AS Entity FIELDTERMINATOR '|'
CREATE (n:Entity{ENT_ID: toInt(Entity.ENT_ID),NAME: Entity.NAME,PARENTID: toInt(Entity.PARENTID)})
This query works fine. It takes about 10 minutes to load 2.8mil records. The next step I do is to index the records:
CREATE INDEX ON :Entity(PARENTID)
CREATE INDEX ON :Entity(ENT_ID)
This query runs fine as well. Following this I tried creating the relationships from the relationship table using a similar query as in the above link:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///EntitiesRelationships_Updated.txt" AS Rships FIELDTERMINATOR '|'
MATCH (n:A {ENT_IDPARENT : Rships.ENT_IDPARENT})
with Entity, n
MATCH (m:B {ENT_IDCHILD : Rships.ENT_IDCHILD})
with m,n
MERGE (n)-[r:RELATION_OF]->(m);
As I do this, my query keeps running for about an hour and it stops at a particular size(in my case 2.2gb) I followed this query based on the link above. This includes the edit from the solution below and still does not work
I have one more query, which would be as follows (Based on the above link). I run this query as I want to create a relationship based of the Entity table
PROFILE
MATCH(Entity)
MATCH (a:Entity {ENT_ID : Entity.ENT_ID})
WITH Entity, a
MATCH (b:Entity {PARENTID : Entity.PARENTID})
WITH a,b
MERGE (a)-[r:PARENT_OF]->(b)
While I tried running this query, I get a Java Heap Space Error. Unfortunately, I have not been able to get the solution for these.
Could you please advice if I am doing something wrong?
This query allows you to take advantage of your :Entity(ENT_ID) index:
MATCH (child:Entity)
WHERE child.PARENTID > 0
WITH child.PARENTID AS pid, child
MATCH (parent:Entity {ENT_ID : pid})
MERGE (parent)-[:PARENT_OF]->(child);
Cypher does not use indices when the property value comes from another node. To get around that, the above query uses a WITH clause to represent child.PARENTID as a variable (pid). The time complexity of this query should be O(N). You original query has a complexity of O(N * N).
[EDITED]
If the above query takes too long or encounters errors that might be related to running out of memory, try this variant, which creates 1000 new relationships at a time. You can change 1000 to any number that is workable for you.
MATCH (child:Entity)
WHERE child.PARENTID > 0 AND NOT ()-[:PARENT_OF]->(child)
WITH child.PARENTID AS pid, child
LIMIT 1000
MATCH (parent:Entity {ENT_ID : pid})
CREATE (parent)-[:PARENT_OF]->(child)
RETURN COUNT(*);
The WHERE clause filters out child nodes that already have a parent relationship. And the MERGE operation has been changed to a simpler CREATE operation, since we have already ascertained that the relationship does not yet exist. The query returns a count of the number of relationships created. If the result is less than 1000, then all parent relationships have been created.
Finally, to make the repeated queries automated, you can install the APOC plugin on the neo4j server and use the apoc.periodic.commit procedure, which will repeatedly invoke a query until it returns 0. In this example, I use a limit parameter of 10000:
CALL apoc.periodic.commit(
"MATCH (child:Entity)
WHERE child.PARENTID > 0 AND NOT ()-[:PARENT_OF]->(child)
WITH child.PARENTID AS pid, child
LIMIT {limit}
MATCH (parent:Entity {ENT_ID : pid})
CREATE (parent)-[:PARENT_OF]->(child)
RETURN COUNT(*);",
{limit: 10000});
Your entity creation Cypher looks fine, as do your indexes.
I am rather confused about the last two Cypher fragments though.
Since your relationships have a specific label or id associated with them, it's probably best to add your relationships by loading from the relationship table data, though the node labels in your query (A and B) aren't used in your Entity creation and aren't in your graph, and neither are ENT_IDPARENT or ENT_IDCHILD fields. Looks like this isn't really the Cypher you used, but an example you built off of?
I'd change this relationship creation query to this, setting the type property of the relationship for post-processing later (this assumes that there can only be one :RELATION_OF relation between the same two nodes):
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///EntitiesRelationships_Updated.txt" AS Rships FIELDTERMINATOR '|'
MATCH (parent:Entity {ENT_ID : Rships.ENT_IDPARENT})
MATCH (child:Entity {ENT_ID : Rships.ENT_IDCHILD})
MERGE (parent)-[r:RELATION_OF]->(child)
ON CREATE SET r.RID = Rships.RID;
Later on, if you like, you can match on your relationships with an RID, and add the corresponding type ("FATHER_OF", "MOTHER_OF", etc) property.
As for creating the :PARENT_OF relationship, you're doing some extra match on an Entity variable bound to every single node in your graph - get rid of that.
Instead, use this:
PROFILE
// first, match on all Entities with a PARENTID property
MATCH(child:Entity)
WHERE EXISTS(child.PARENTID)
// next, find the parent for each child by the child's PARENTID
WITH child
MATCH (parent:Entity {ENT_ID : child.PARENTID})
MERGE (parent)-[:PARENT_OF]->(child)
// lastly remove the parentid from the child, so it won't be reprocessed
// if we run the query again.
REMOVE child.PARENTID
EDITED the above query to use an existence check on child.PARENTID, and to remove child.PARENTID after the corresponding relationship has been created.
If you need a solution that uses batching, you could do this manually (adding LIMIT 100000 to your WITH child line, or you could install the APOC Procedures Library and use its periodic.commit() function to batch your processing.
I'm trying to find the number of nodes of a certain kind in my database that are connected to more than one other node of another kind. In my case, it's place nodes connected to several name nodes. I have a query that works:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p,count(n) as counts
WHERE counts > 1
RETURN p;`
However, that only returns the place nodes, and ideally I'd like it to return all the nodes and edges involved. I've found a question on returning variables from before the WITH, but if I include any of the other variables I've defined, the query returns no responses, i.e. this query returns nothing:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p, count(n) as counts, rels
WHERE counts > 1
RETURN p;
I don't know how to return the information that I want without changing the results of the query. Any help would be much appreciated
The reason your second query returns nothing is because its WITH clause specifies as aggregation "grouping keys" both p and rels. Since each rels path has only a single n value, counts would always be 1.
Something like this might work for you:
MATCH path=(p:Place)-[:Called]->(:Name)
WITH p, COLLECT(path) as paths
WHERE SIZE(paths) > 1
RETURN p, paths;
This returns each matching Place node and all its paths.
Try this:
MATCH (p:Place)-[c:Called]->(n:Name)
WHERE size((p)-[:Called]->(:Name)) > 1
WITH p,count(n) as counts, collect(n) AS names, collect(c) AS calls
RETURN p, names, calls, counts ORDER BY counts DESC;
This query makes use of Cypher's collect() function to create lists of the names and called relationships for each place that has more than Called relationship with a Name node.