How to remove Neo4j nodes with duplicate properties?

In Neo4j 2.1.6, I have nodes that are non-unique in respect of a certain property, inputID.
Using Cypher, how do I remove all nodes that are duplicates in terms of a given property, leaving only uniques?
I have tried the following...
MATCH (n:Input)
WITH n.inputID, collect(n) AS nodes
WHERE size(nodes) > 1
FOREACH (n in tail(nodes) | DELETE n)
...but it results in...
Expression in WITH must be aliased (use AS) (line 2, column 6)
"WITH n.inputID, collect(n) AS nodes"
^
Thanks,
G

You're not aliasing that WITH variable. Change this:
WITH n.inputID, collect(n) AS nodes
To this:
WITH n.inputID AS inputID, collect(n) AS nodes

As you correctly found out, using tail on a collection lets you remove the duplicates. Don't forget to remove each node's relationships before the node itself (DETACH DELETE, available since Neo4j 2.3), and to alias the field as FrobberOfBits mentioned:
MATCH (n:Input)
WITH n.inputID AS inputID, collect(n) AS nodes
WHERE size(nodes) > 1
FOREACH (n in tail(nodes) | DETACH DELETE n)
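To make the collect/tail logic concrete, here is a rough plain-Python model of what the query does. This is not Neo4j code: the `rows` sample data and the `nodes_to_delete` helper are made up for illustration. It groups nodes by inputID, keeps the first node of each group, and marks the rest for deletion.

```python
from collections import defaultdict


def nodes_to_delete(nodes, key="inputID"):
    """Return the nodes that repeat an earlier node's `key` value.

    Loosely mirrors:
        WITH n.inputID AS inputID, collect(n) AS nodes
        ... tail(nodes)
    """
    groups = defaultdict(list)        # collect(n), grouped by inputID
    for node in nodes:
        groups[node[key]].append(node)

    doomed = []
    for group in groups.values():
        doomed.extend(group[1:])      # tail(nodes): everything but the first
    return doomed


# Hypothetical sample data standing in for (:Input) nodes.
rows = [
    {"id": 1, "inputID": "a"},
    {"id": 2, "inputID": "b"},
    {"id": 3, "inputID": "a"},
    {"id": 4, "inputID": "a"},
]

duplicates = nodes_to_delete(rows)
survivors = [n for n in rows if n not in duplicates]
```

Groups with a single node have an empty tail, so nothing is deleted for them; the `WHERE size(nodes) > 1` filter in the query only saves the work of visiting those groups.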

Related

How Many Nodes Are Involved in a Match

How can I know how many nodes and edges are involved in a MATCH? Is there another way besides Explain / Profile Match?
If you mean how many nodes are matched in a path, such as a variable-length path, then you can assign a path variable for this:
MATCH p = (k:Person {name:'Keanu Reeves'})-[*..8]-(t:Person {name:'Tom Hanks'})
WITH p LIMIT 1
RETURN p, length(p) as pathLength, length(p) + 1 as numberOfNodesInPath
You can also use nodes(p) and relationships(p) to get the collection of nodes and relationships that make up the path, and you can use size() on those collections to get their size.
Cypher also has a COUNT() function that counts the number of matched elements. For example:
MATCH (n)
RETURN COUNT(n);
This query will count all nodes in your database.
You can find more information in the cypher manual, under the aggregating functions. Check it out.
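The following Cypher snippet should return the number of distinct nodes found by any given MATCH clause, together with the total number of relationships summed over all matched paths (a relationship that appears in several paths is counted once per path). Just replace <your code here> with your MATCH pattern, keeping it assigned to the path variable p.
MATCH p = <your code here>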
WITH COLLECT(NODES(p)) AS ns, SUM(SIZE(RELATIONSHIPS(p))) AS relCount
UNWIND ns AS nodeList
UNWIND nodeList AS node
RETURN COUNT(DISTINCT node) AS nodeCount, relCount;
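As a sanity check on what that snippet counts, here is a small plain-Python model (not Neo4j code; the `count_match` helper and the sample `paths` are invented for illustration). Each matched path is represented as a pair of node ids and relationship ids. It shows that COUNT(DISTINCT node) removes repeats across paths, while SUM(SIZE(RELATIONSHIPS(p))) simply adds up path lengths.

```python
def count_match(paths):
    """paths: list of (node_ids, rel_ids) tuples, one tuple per matched path.

    Returns (distinct node count, summed relationship count,
             distinct relationship count).
    """
    distinct_nodes = set()
    distinct_rels = set()
    rel_total = 0
    for node_ids, rel_ids in paths:
        distinct_nodes.update(node_ids)   # COUNT(DISTINCT node)
        rel_total += len(rel_ids)         # SUM(SIZE(RELATIONSHIPS(p)))
        distinct_rels.update(rel_ids)     # what a DISTINCT count would give
    return len(distinct_nodes), rel_total, len(distinct_rels)


# Two hypothetical paths that share node A, node B and relationship r1.
paths = [
    (["A", "B", "C"], ["r1", "r2"]),
    (["A", "B", "D"], ["r1", "r3"]),
]

node_count, rel_total, rel_distinct = count_match(paths)
```

If you need distinct relationships as well, the relationships would have to be unwound and counted with DISTINCT in the same way as the nodes.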

How to simplify my Cypher with several relationships and a path with several nodes

match p=(a:ACCT_NO)
-[r1:TRX_TO]-(n1:ACCT_NO)
-[r2:TRX_TO]-(n2:ACCT_NO)
-[r3:TRX_TO]-(n3:ACCT_NO)
-[r4:TRX_TO]-(b:ACCT_NO)
-[rb:BELONG_TO]->(c1:CUSTOM_NO{sensitivity:'1'})
-[:RELATE_TO*0..2]-(c2:CUSTOM_NO)
where r1.trxAmt > 10000 and r2.trxAmt > 10000 and r3.trxAmt > 10000 and r4.trxAmt > 10000
and a.acctNo in $doubtAcct
and not n1.acctNo in $fliterAcct
and not n2.acctNo in $fliterAcct
and not n3.acctNo in $fliterAcct
return p;
I want to find the paths between a and b in which no intermediate node is in the list $fliterAcct
and the trxAmt attribute of every relationship is greater than 10000.
My question is: how can I simplify my Cypher?
I don't want to spell out n1, n2, n3 and r1, r2, r3 explicitly when I need to search across several relationships.
Can I use a pattern like [r:TRX_TO*...3]? (I actually tried, but the error is Type mismatch: expected Any, Map, Node or Relationship but was List (line 2, column 7 (offset: 54)) "where r.trxAmt > 10000")
Here is a simplified query:
MATCH p = (a:ACCT_NO)-[rels:TRX_TO*4]-(:ACCT_NO)-[:BELONG_TO]->(:CUSTOM_NO{sensitivity:'1'})-[:RELATE_TO*0..2]-(:CUSTOM_NO)
WHERE a.acctNo in $doubtAcct AND
ALL(r IN rels WHERE r.trxAmt > 10000) AND
NONE(n IN NODES(p)[1..4] WHERE n.acctNo in $fliterAcct)
RETURN p;
[EDITED]
Or, since the usage of identifiers for variable-length relationships has been deprecated since version 3.2, you can do this instead:
MATCH p = (a:ACCT_NO)-[:TRX_TO*4]-(:ACCT_NO)-[:BELONG_TO]->(:CUSTOM_NO{sensitivity:'1'})-[:RELATE_TO*0..2]-(:CUSTOM_NO)
WHERE a.acctNo in $doubtAcct AND
ALL(r IN RELATIONSHIPS(p)[0..4] WHERE r.trxAmt > 10000) AND
NONE(n IN NODES(p)[1..4] WHERE n.acctNo in $fliterAcct)
RETURN p;
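One thing to watch with the slices: Cypher list slices exclude the upper bound, just like Python's. In a four-hop TRX_TO path the intermediate nodes n1..n3 sit at indices 1 to 3, which is the slice 1..4, and the four relationships are the slice 0..4. The plain-Python sketch below models the ALL/NONE predicates on one candidate path (the `path_ok` helper and the sample values are hypothetical, not part of the question's data).

```python
def path_ok(node_accts, rel_amounts, filter_accts, min_amt=10000, hops=4):
    """Check one candidate path.

    node_accts:  acctNo of each node along the TRX_TO part (a first, b last).
    rel_amounts: trxAmt of each TRX_TO relationship, in path order.
    """
    inner = node_accts[1:hops]        # indices 1..3 -> n1, n2, n3
    trx = rel_amounts[0:hops]         # indices 0..3 -> r1, r2, r3, r4
    all_big = all(amt > min_amt for amt in trx)                      # ALL(...)
    none_filtered = not any(acct in filter_accts for acct in inner)  # NONE(...)
    return all_big and none_filtered


# Hypothetical path a -> n1 -> n2 -> n3 -> b.
accts = ["a", "n1", "n2", "n3", "b"]
amounts = [20000, 15000, 12000, 11000]

ok_plain = path_ok(accts, amounts, {"x"})
blocked_by_n3 = path_ok(accts, amounts, {"n3"})
end_node_ignored = path_ok(accts, amounts, {"b"})
small_last_hop = path_ok(accts, [20000, 15000, 12000, 5000], {"x"})
```

With a slice of 1..3 the check on n3 would be skipped, and with 0..3 the amount on the last hop would never be tested.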
You're on the right track: a variable-length pattern is the right way to go, but you'll need a different approach to ensure that all nodes and relationships adhere to the restrictions you want.
match p=(a:ACCT_NO)-[:TRX_TO*4]-(b:ACCT_NO)-[rb:BELONG_TO]->(c1:CUSTOM_NO{sensitivity:'1'})-[:RELATE_TO*0..2]-(c2:CUSTOM_NO)
where a.acctNo in $doubtAcct and all(rel in relationships(p) where type(rel) <> 'TRX_TO' OR rel.trxAmt > 10000)
and none(node in nodes(p)[1..4] where node.acctNo in $fliterAcct)
return p;

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

Background
Hi all, I am currently trying to write a cypher statement that allows me to find a set of paths on a map from a starting point. I want my search result to always return connecting streets within 5 nodes. Optionally, if there's a nearby hospital, I would like my search pattern to also indicate nearby hospitals.
Main Problem
Because there isn't always a nearby hospital to the current street, sometimes my optional match search pattern comes back as null. Here's the current cypher statement I'm using:
MATCH path=(a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WHERE ALL (x IN nodes(path) WHERE (x:Street))
WITH DISTINCT nodes(path) + nodes(optionalPath) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
However, this syntax only works if optionalPath contains nodes. If it doesn't, the expression nodes(path) + nodes(optionalPath) adds null to a list, the result is null, and I get no records. This is true even when the nodes(path) term does contain nodes.
What's the best way to get around this problem?
You can use COALESCE to replace a NULL with some other value. For example:
MATCH path=(:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
WHERE ALL (x IN nodes(path) WHERE x:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodes(path) + COALESCE(nodes(optionalPath), []) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
I have also made a few other improvements:
The WHERE clause was moved up right after the first MATCH. This eliminates the unwanted path values immediately. Your original query would get all path values (even unwanted ones) and always perform the second MATCH query, and only eliminate unwanted paths afterwards. (But, it is actually not clear if you even need the WHERE clause at all; for example, if the CONNECTED_TO relationship is only used between Street nodes.)
The DISTINCT in your WITH clause would have prevented duplicate n collections, but the collections internally could have had duplicate nodes. This was probably not what you wanted.
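The null behaviour behind the COALESCE fix can be modelled in a few lines of plain Python (not Neo4j code; `concat`, `coalesce` and the sample lists are illustrative stand-ins, with None playing the role of Cypher's null).

```python
def concat(left, right):
    """Cypher-style list '+': a null operand makes the whole result null."""
    if left is None or right is None:
        return None
    return left + right


def coalesce(*values):
    """Return the first non-null argument, like Cypher's COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


street_nodes = ["s1", "s2"]   # nodes(path)
hospital_nodes = None         # nodes(optionalPath) when OPTIONAL MATCH finds nothing

broken = concat(street_nodes, hospital_nodes)               # null: UNWIND yields no rows
fixed = concat(street_nodes, coalesce(hospital_nodes, []))  # streets survive
```

Substituting an empty list keeps the street nodes even when no hospital was found.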
It seems you don't really want the path, just all the street nodes within 5 steps, plus any connected hospitals. So I would simplify your query to just that, and then condense the 3 columns down to 1.
MATCH (a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH collect(a) + collect(b) + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
If Streets can be indirectly connected (with a hospital in between), then I'd adjust it like this:
MATCH (a:Street {id: 123})-[:CONNECTED_TO]-(b:Street)
WITH a as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodez + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
It's a bit more verbose, but just says exactly what you want (and also adds the start node to the hospital check list)

Excluding Nodes that are unconnected

I have a Graph connected as:
   -->(D)-->(E)-->(F)
  /
(A)-->(B)
  \
   -->(C)
The graph is a tree rooted at A, with a directed :HAS_CHILD relationship from parent to child.
What I want to do is exclude nodes by a given property, for instance:
MATCH (n:Node)
WHERE n.name <> "D"
return n
Which would give me the subgraph:
(E)-->(F)
(A)-->(B)
  \
   -->(C)
Here E and F are not reachable from the root node. How do I exclude such subtrees?
Preferred result would be:
(A)-->(B)
  \
   -->(C)
I think we do not have the full picture of your data and what you really want here.
My guess is that your data model is a tree. It seems to me that you're trying to define a node to exclude, which also excludes all branches beneath that node (so in your example, you may have a rich and complex subtree beneath D, and you want to exclude all of that). This assumes a directed relationship down from parents to children in your tree.
If so, you can try the following query. I'm assuming the relationship from parent to child as :HAS_CHILD, since that wasn't included in your description.
MATCH (excluded:Node {name: "D"})
WITH excluded
MATCH (n:Node)
WHERE n <> excluded
AND NOT (excluded)-[:HAS_CHILD*]->(n)
RETURN n
Or, an alternative, which may perform better if your tree is large and the subtree beneath your excluded node is comparatively smaller than the entire tree:
MATCH (excludedRoot:Node {name: "D"})-[:HAS_CHILD*0..]->(excluded)
WITH COLLECT(excluded) as excludedNodes
MATCH (n:Node)
WHERE NOT n IN excludedNodes
RETURN n
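The second query amounts to "collect the excluded node plus everything below it, then keep the rest". A minimal plain-Python sketch of that idea, using the tree from the question as a hand-written adjacency dict (the `subtree` helper and the `children` mapping are illustrative, not Neo4j code):

```python
def subtree(children, root):
    """All nodes reachable from root via HAS_CHILD, root included (like *0..)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


# parent -> children, standing in for (parent)-[:HAS_CHILD]->(child)
children = {"A": ["B", "C", "D"], "D": ["E"], "E": ["F"]}
all_nodes = {"A", "B", "C", "D", "E", "F"}

excluded = subtree(children, "D")         # COLLECT(excluded)
remaining = sorted(all_nodes - excluded)  # WHERE NOT n IN excludedNodes
```

Only the excluded subtree is traversed, which is why this form tends to be the cheaper one when that subtree is small compared with the whole tree.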
So you want all the nodes that are neither D nor only connected to D:
MATCH (excluded:Node {name: "D"})
MATCH (n:Node)
WHERE n <> excluded
OPTIONAL MATCH (n)--(n2:Node)
WHERE n2 <> excluded
WITH n, collect(n2) AS nodes
WHERE size(nodes) > 0
RETURN n
This supposes that there's only one excluded node, as it will exclude the connected nodes for each excluded.
Should there be more than one, this modified query should work:
MATCH (excluded:Node {name: "D"})
WITH collect(excluded) AS excluded
MATCH (n:Node)
WHERE NOT n IN excluded
OPTIONAL MATCH (n)--(n2:Node)
WHERE NOT n2 IN excluded
WITH n, collect(n2) AS nodes
WHERE size(nodes) > 0
RETURN n

Cypher: Node not found in same query

I have this test database:
I want to remove the path with the two +B nodes on the right. In general, this case can be described as a path that contains PS nodes (+B nodes are also PS nodes) that do not have an incoming :SOURCE edge. The sub-path I want to delete runs from (writer) (exclusive) to the node that has no incoming :SOURCE edge (inclusive).
For that I have this query:
MATCH p1=(writer:A {type:'writer'})-[*]->(Q:PS)-[:TARGET*]->(T)
WITH (Q), (writer)
MATCH (Q)
WHERE NOT ()-[:SOURCE]->(Q)
WITH (Q), (writer)
MATCH p2=(writer)-[*]->(Q)
WHERE ANY (x IN NODES(p2)[1..] WHERE x:PS AND NOT ()-[:SOURCE]->(x))
WITH REDUCE(s = [], y IN NODES(p2)[1..] | CASE
WHEN y:PS THEN s + y
ELSE s END) AS todo
FOREACH (z IN todo | DETACH DELETE z);
It first identifies the said node(s) and then passes them on to make a new path selection, that ends in that node. This all works correctly. What does not work is the very last part beginning with WITH REDUCE. It says it does not find Q, but Q does not even occur in that part.
The error is Node <some ID> not found. Why is that? Why does it not find the node again and why does it even try to in the last part? Cutting the last part off and the query works as intended up to that point.
[UPDATED]
That error typically means that you are trying to delete a node that has already been deleted, or to delete a relationship that has an endpoint node that has already been deleted.
Try replacing your FOREACH clause with the following snippet, which eliminates duplicate nodes before attempting to delete them, and also deletes the relationships before the nodes:
UNWIND todo AS node
WITH DISTINCT node
MATCH (node)-[r]-()
WITH COLLECT(DISTINCT node) AS nodes, COLLECT(DISTINCT r) AS rels
FOREACH(r IN rels | DELETE r)
FOREACH(n IN nodes | DELETE n);
Also, your query seems to be very inefficient. Here is a simpler version, which includes the above fix:
MATCH p=(:A {type:'writer'})-[*]->(:PS)-[:TARGET]->()
WHERE ANY (x IN NODES(p)[1..-1] WHERE x:PS AND NOT ()-[:SOURCE]->(x))
WITH REDUCE(s = [], y IN NODES(p)[1..] | CASE
WHEN y:PS THEN s + y
ELSE s END) AS todo
UNWIND todo AS node
WITH DISTINCT node
MATCH (node)-[r]-()
WITH COLLECT(DISTINCT node) AS nodes, COLLECT(DISTINCT r) AS rels
FOREACH(r IN rels | DELETE r)
FOREACH(n IN nodes | DELETE n);
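The ordering in that fix (deduplicate, delete relationships, then delete nodes) can be illustrated with a toy in-memory graph. This is only a sketch of the idea, not how Neo4j is implemented: `TinyGraph`, `safe_delete` and the sample data are all hypothetical.

```python
class TinyGraph:
    """Toy graph that refuses double deletes and deleting connected nodes."""

    def __init__(self, nodes, rels):
        self.nodes = set(nodes)
        self.rels = dict(rels)            # rel id -> (start node, end node)

    def delete_rel(self, rel_id):
        del self.rels[rel_id]

    def delete_node(self, node):
        if node not in self.nodes:
            raise KeyError(f"Node {node} not found")
        if any(node in ends for ends in self.rels.values()):
            raise ValueError(f"Node {node} still has relationships")
        self.nodes.remove(node)


def safe_delete(graph, todo):
    unique = list(dict.fromkeys(todo))    # WITH DISTINCT node (order kept)
    rel_ids = [rid for rid, ends in graph.rels.items()
               if any(node in ends for node in unique)]
    for rid in rel_ids:                   # FOREACH(r IN rels | DELETE r)
        graph.delete_rel(rid)
    for node in unique:                   # FOREACH(n IN nodes | DELETE n)
        graph.delete_node(node)


graph = TinyGraph(
    nodes=["writer", "ps1", "ps2"],
    rels={"r1": ("writer", "ps1"), "r2": ("ps1", "ps2")},
)
# The same node can show up once per matched path, hence the repeats.
safe_delete(graph, ["ps1", "ps2", "ps1", "ps2"])
```

Without the deduplication step, the second attempt to delete ps1 in this toy model raises "Node ps1 not found", which is analogous to the error in the question.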

Resources