Why does this create two relationships instead of one?
MATCH (a:Person{name:'Barack'}), (b:Person{name:'Raback'})
CREATE (a)-[r:SHAKES_HANDS_WITH{id:toString(rand())}]->(b)
RETURN r
(Random number "id" is just added for demo purposes.)
You probably have 2 Person nodes with the same name (either 'Barack' or 'Raback').
Assuming that the other name has only a single node, the MATCH clause will produce 2 rows, which will cause the CREATE clause to be executed twice.
To verify if this is your scenario, this query will show you how many nodes have each name:
MATCH (a:Person)
WHERE a.name IN ['Barack', 'Raback']
RETURN a.name, COUNT(a) as nodeCount
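If duplicates do turn up and you only want a single relationship for now, one possible workaround is to cut each side down to one node before the CREATE. This is a minimal sketch and not a fix for the underlying data: it assumes any node with the given name is acceptable, and LIMIT 1 picks one arbitrarily.

```cypher
// Sketch: keep one arbitrary node per name so that CREATE runs exactly once.
MATCH (a:Person {name: 'Barack'})
WITH a LIMIT 1
MATCH (b:Person {name: 'Raback'})
WITH a, b LIMIT 1
CREATE (a)-[r:SHAKES_HANDS_WITH]->(b)
RETURN r
```

Merging or deleting the duplicate Person nodes is probably the better long-term answer, since every other query matching on name will see the same multiplication of rows.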
What I have tried:
UNWIND ["SVC_HAS_DCSV","APP_HAS_EPLD","DCSV_HAS_SVC_EP","PART_HAS_WGE","HAS_REMOTECONNECTION","EPLD_HAS_DCSV","PART_HAS_EPLD","EPLD_HAS_INSTANCE","EPLD_HAS_SVC_EP","ALLOW_CONN_FROM","DONE_BY_POLICY","LOCATION_HAS_DE","APP_HAS_SVC","DE_HAS_EPLD","DCSV_HAS_ENDPOINTS","ALLOW_CONN_TO","CLOUD_HAS_LOCATION","DE_HAS_WGE","DE_HAS_PART"] as rel_name
MATCH (a)-[r]->(b)
where r._edgeType=rel_name AND a.t_id="MCNM-TEST"
WITH DISTINCT count(r) as r_count,rel_name
RETURN rel_name, r_count
Here, I am trying to count, for each relationship type (e.g. APP_HAS_EPLD), the number of edges in the graph where a.t_id is "MCNM-TEST", and then return each rel_name together with its r_count.
I don't think it can be optimized in its current form, since it's traversing the entire graph, without any index getting used. However, you can make certain modifications and try them.
Add a generic label to all nodes, say Entity, using the following query.
MATCH (a)
SET a:Entity
Create an index on the node label Entity and the property t_id.
CREATE INDEX t_id_entity IF NOT EXISTS FOR (n:Entity) ON (n.t_id)
Now, try the following query.
MATCH (a:Entity{t_id: 'MCNM-TEST'})-[r]->(b)
UNWIND ["SVC_HAS_DCSV","APP_HAS_EPLD","DCSV_HAS_SVC_EP","PART_HAS_WGE","HAS_REMOTECONNECTION","EPLD_HAS_DCSV","PART_HAS_EPLD","EPLD_HAS_INSTANCE","EPLD_HAS_SVC_EP","ALLOW_CONN_FROM","DONE_BY_POLICY","LOCATION_HAS_DE","APP_HAS_SVC","DE_HAS_EPLD","DCSV_HAS_ENDPOINTS","ALLOW_CONN_TO","CLOUD_HAS_LOCATION","DE_HAS_WGE","DE_HAS_PART"] as rel_name
WITH r, rel_name WHERE r._edgeType=rel_name
WITH DISTINCT count(r) as r_count,rel_name
RETURN rel_name, r_count
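A possible variation, offered as a sketch rather than a measured improvement: filtering with IN and grouping on the property itself avoids pairing every relationship with every entry of the list. The list is shortened here for readability, and the Entity label and its index are assumed to exist as described above.

```cypher
// Sketch: each relationship is tested once against the list,
// and the aggregation groups by the stored _edgeType value.
MATCH (a:Entity {t_id: 'MCNM-TEST'})-[r]->()
WHERE r._edgeType IN ['SVC_HAS_DCSV', 'APP_HAS_EPLD', 'DCSV_HAS_SVC_EP']  // shortened list
RETURN r._edgeType AS rel_name, count(r) AS r_count
```

As with the UNWIND version, edge types with no matching relationships simply do not appear in the result.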
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How can I change the query to use the count function and get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
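To make the suggestions above concrete, here is a sketch of what a trimmed, profiled variant might look like. The relationship types RUNS_ON and HOSTS and both directions are hypothetical placeholders, since the real types are not given in the question; substitute whatever your model uses.

```cypher
// RUNS_ON, HOSTS and the arrow directions are assumptions for illustration only.
PROFILE
MATCH (m:Application)-[:RUNS_ON]->(:Server)-[:HOSTS]->(n)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
```

Running the original and the trimmed query with PROFILE and comparing the DB hits should show whether dropping the master_node label tests actually helps on your data. It is also worth confirming that both versions return the same count.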
You can find examples of how to use count in the Neo4j docs.
In your case, the first example, where count(*) is used to return a count of each returned item, should work.
I wrote a script to batch-create a bunch of relationships in Neo4j. Here is the Cypher:
:param batch => [{startId: 'abc123', endId: 'abc321'}, {startId: 'abc456', endId: 'abc654'}]
UNWIND $batch as row
MATCH (from {id: row.startId})
MATCH (to {id: row.endId})
CREATE (from)-[rel:HAS]->(to)
RETURN rel
The problem is that there might be some startId/endId entries that don't match any nodes and are silently ignored. Is there a way to return the list of rows that don't match any nodes, and still create the relationship for the nodes that do match?
I tried OPTIONAL MATCH to fail fast as soon as a startId/endId doesn't find a node; however, the query execution was really slow.
First of all, you should always try to specify a label for the node that is used to kick off a MATCH (unless the MATCH pattern uses any already-bound nodes). Otherwise, every single node in the DB must be scanned. In addition, you should consider using indexes to speed up your MATCHs (but, again, you'd need to specify the labels).
Here is a query that uses the APOC procedure apoc.do.when to create a new relationship when appropriate. It returns each row and the corresponding new relationship (or NULL if either node is not found):
UNWIND $batch as row
OPTIONAL MATCH (from:Foo {id: row.startId})
OPTIONAL MATCH (to:Foo {id: row.endId})
CALL apoc.do.when(
from IS NOT NULL AND to IS NOT NULL,
'CREATE (from)-[rel:HAS]->(to) RETURN rel',
'RETURN NULL AS rel',
{from: from, to: to}) YIELD value
RETURN row, value.rel AS rel
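If APOC is not available, a similar effect can probably be had in plain Cypher with the conditional FOREACH idiom. This is a sketch under the same assumption as above (a Foo label on both nodes, ideally with an index on id), and it returns only the rows that failed to match, which is what the question asks for.

```cypher
UNWIND $batch AS row
OPTIONAL MATCH (from:Foo {id: row.startId})
OPTIONAL MATCH (to:Foo {id: row.endId})
// The CASE yields a one-element list only when both nodes were found,
// so the CREATE runs once for matched rows and not at all otherwise.
FOREACH (ignored IN CASE WHEN from IS NOT NULL AND to IS NOT NULL THEN [1] ELSE [] END |
  CREATE (from)-[:HAS]->(to)
)
WITH row, from, to
WHERE from IS NULL OR to IS NULL
RETURN collect(row) AS unmatchedRows
```

The trade-off is that the created relationships are not returned; use the apoc.do.when version if you need them.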
According to this post I tried to map all related entities in a list.
I used the same query as in the post, with a condition, to return a list of User, but it returns duplicate objects:
MATCH (user:User) WHERE <complex conditions>
WITH user, calculatedValue
MATCH p=(user)-[r*0..1]-() RETURN user, calculatedValue, nodes(p), rels(p)
Is it a bug? I'm using SDN 4.2.4.RELEASE with neo4j 3.2.1
Not a bug.
Keep in mind a MATCH in Neo4j will find all occurrences of a given pattern. Let's look at your last MATCH:
MATCH p=(user)-[r*0..1]-()
Because you have a variable match of *0..1, this will always return at least one row with just the user itself (with rels(p) empty and nodes(p) containing only the user), and then you'll get a row for every connected node (user will always be present on that row, and in the nodes(p) collection, along with the other connected node).
In the end, when you have a single user node and n directly connected nodes, you will get n + 1 rows. You can run the query in the Neo4j browser, looking at the table results, to confirm.
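As a quick check of the n + 1 claim, a sketch like the following counts the paths produced per user. The complex conditions from the question are omitted, so it runs over every User node.

```cypher
// Sketch: one path is the zero-length path (the user alone);
// the rest are one-hop paths, one per relationship on the user.
MATCH (user:User)
MATCH p = (user)-[*0..1]-()
RETURN user, count(p) AS rowsPerUser
```

Note that the count follows relationships rather than distinct neighbours, so two relationships to the same node produce two rows.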
A better match might be something like:
...
OPTIONAL MATCH (user)-[r]-(b)
RETURN user, calculatedValue, collect(r) as rels, collect(b) as connectedNodes
Because we aggregate on all relationships and connected nodes (rather than just the relationships and nodes for each path), you'll get a single row result per user node.
I have some duplicate nodes, all with the label Tag. By duplicates I mean nodes that share the same name property, for example:
{ name: writing, _id: 57ec2289a90f9a2deece7e6d},
{ name: writing, _id: 57db1da737f2564f1d5fc5a1},
{ name: writing }
The _id field is no longer used, so in effect these three nodes are the same, except that each of them has different relationships.
What I would like to do is:
Find all duplicate nodes (check)
MATCH (n:Tag)
WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
WHERE count > 1
RETURN name, nodelist, count
Copy all relationships from the duplicate nodes into the first one
Delete all the duplicate nodes
Can this be achieved with a Cypher query? Or do I have to write a script in some programming language? (That is what I'm trying to avoid.)
APOC Procedures has some graph refactoring procedures that can help. I think apoc.refactor.mergeNodes() ought to do the trick.
Be aware that in addition to transferring all relationships from the other nodes onto the first node of the list, it will also apply any labels and properties from the other nodes onto the first node. If that's not something you want to do, then you may have to collect incoming and outgoing relationships from the other nodes and use apoc.refactor.to() and apoc.refactor.from() instead.
Here's the query for merging nodes:
MATCH (n:Tag)
WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
WHERE count > 1
CALL apoc.refactor.mergeNodes(nodelist) YIELD node
RETURN node
The above Cypher query didn't work on my database, version 3.4.16.
What worked for me was:
MATCH (n:Tag)
WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
WHERE count > 1
CALL apoc.refactor.mergeNodes(nodelist,{
properties:"combine",
mergeRels:true
})
YIELD node
RETURN node;