Is there scope of optimization in this neo4j query? - neo4j

What I have tried:
UNWIND ["SVC_HAS_DCSV","APP_HAS_EPLD","DCSV_HAS_SVC_EP","PART_HAS_WGE","HAS_REMOTECONNECTION","EPLD_HAS_DCSV","PART_HAS_EPLD","EPLD_HAS_INSTANCE","EPLD_HAS_SVC_EP","ALLOW_CONN_FROM","DONE_BY_POLICY","LOCATION_HAS_DE","APP_HAS_SVC","DE_HAS_EPLD","DCSV_HAS_ENDPOINTS","ALLOW_CONN_TO","CLOUD_HAS_LOCATION","DE_HAS_WGE","DE_HAS_PART"] as rel_name
MATCH (a)-[r]->(b)
where r._edgeType=rel_name AND a.t_id="MCNM-TEST"
WITH DISTINCT count(r) as r_count,rel_name
RETURN rel_name, r_count
Here, for each relationship type (e.g. APP_HAS_EPLD), I am trying to count the edges in the graph whose start node has t_id "MCNM-TEST", and then return each rel_name together with its r_count.

I don't think it can be optimized in its current form, since it's traversing the entire graph, without any index getting used. However, you can make certain modifications and try them.
Add a generic label to all nodes, say Entity, using the following query.
MATCH (a)
SET a:Entity
Create an index on the node label Entity and the property t_id.
CREATE INDEX t_id_entity IF NOT EXISTS FOR (n:Entity) ON (n.t_id)
Now, try the following query.
MATCH (a:Entity{t_id: 'MCNM-TEST'})-[r]->(b)
UNWIND ["SVC_HAS_DCSV","APP_HAS_EPLD","DCSV_HAS_SVC_EP","PART_HAS_WGE","HAS_REMOTECONNECTION","EPLD_HAS_DCSV","PART_HAS_EPLD","EPLD_HAS_INSTANCE","EPLD_HAS_SVC_EP","ALLOW_CONN_FROM","DONE_BY_POLICY","LOCATION_HAS_DE","APP_HAS_SVC","DE_HAS_EPLD","DCSV_HAS_ENDPOINTS","ALLOW_CONN_TO","CLOUD_HAS_LOCATION","DE_HAS_WGE","DE_HAS_PART"] as rel_name
WITH r, rel_name WHERE r._edgeType = rel_name
WITH DISTINCT count(r) as r_count,rel_name
RETURN rel_name, r_count
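As a variation on the query above, the UNWIND can be dropped so that each relationship is checked once against the whole list instead of once per list entry. This is a minimal sketch, assuming the Entity label and the t_id index from the earlier steps exist, and assuming the list of edge types is passed in as a query parameter named $rel_names (a hypothetical name, not something from the question).

```cypher
// Assumption: $rel_names holds the same list of _edgeType strings used above,
// e.g. ["SVC_HAS_DCSV", "APP_HAS_EPLD", "DCSV_HAS_SVC_EP"].
MATCH (a:Entity {t_id: 'MCNM-TEST'})-[r]->()
WHERE r._edgeType IN $rel_names
// Grouping by the property value replaces the UNWIND + equality filter.
RETURN r._edgeType AS rel_name, count(r) AS r_count
```

As in the original query, edge types with no matching relationships simply do not appear in the result. Running either version with PROFILE should show whether the index on t_id is actually being used.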

Related

Cypher - given relationship, get the nodes

If I do the query
MATCH (:Label1 {prop1: "start node"}) -[relationships*1..10]-> ()
UNWIND relationships as relationship
RETURN DISTINCT relationship
How do I get nodes for each of acquired relationship to get result in format:
╒════════╤════════╤═══════╕
│"from" │"type" │"to" │
╞════════╪════════╪═══════╡
├────────┼────────┼───────┤
└────────┴────────┴───────┘
Is there a function such as type(r) but for getting nodes from relationship?
RomanMitasov and ray have working answers above.
I don't think they quite get at what you want to do though, because you're basically returning every relationship in the graph in a sort of inefficient way. I say that because without a start or end position, specifying a path length of 1-10 doesn't do anything.
For example:
CREATE (r1:Temp)-[:TEMP_REL]->(r2:Temp)-[:TEMP_REL]->(r3:Temp)
Now we have a graph with 3 Temp nodes with 2 relationships: from r1 to r2, from r2 to r3.
Run your query on these nodes:
MATCH (:Temp)-[rels*1..10]->(:Temp)
UNWIND rels as rel
RETURN startNode(rel), type(rel), endNode(rel)
And you'll see you get four rows. Which is not what you want because there are only two distinct relationships.
You could modify that to return only distinct values, but you're still over-searching the graph.
To get an idea of what relationships are in the graph and what they connect, I use a query like:
MATCH (n)-[r]->(m)
RETURN labels(n), type(r), labels(m), count(r)
The downside of that, of course, is that it can take a while to run if you have a very large graph.
If you just want to see the structure of your graph:
CALL db.schema.visualization()
Best wishes and happy graphing! :)
Yes, such functions do exist!
startNode(r) to get the start node from relationship r
endNode(r) to get the end node
Here's the final query:
MATCH () -[relationships*1..10]-> ()
UNWIND relationships as r
RETURN startNode(r) as from, type(r) as type, endNode(r) as to
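Putting the two answers together, a sketch that keeps the start node from the question and returns each relationship only once might look like the following. The column aliases from_node, rel_type, and to_node are chosen here for illustration only.

```cypher
// Anchor the traversal at the start node used in the question.
MATCH (:Label1 {prop1: "start node"})-[relationships*1..10]->()
UNWIND relationships AS r
// The same relationship can appear on several paths, so deduplicate first.
WITH DISTINCT r
RETURN startNode(r) AS from_node, type(r) AS rel_type, endNode(r) AS to_node
```

Deduplicating on the relationship itself, rather than on the returned columns, keeps parallel relationships of the same type between the same two nodes as separate rows.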

Do not return set of nodes from a specific path in Cypher

I am trying to return a set of nodes from two sessions, with the condition that a returned node must not be present in another (third) session. I am using the following code, but it is not working as intended.
MATCH (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
UNWIND ['abc1', 'abc2'] as session_id
MATCH (target:Session {session_id: session_id})-[r:HAS_PRODUCT]->(product:Product)
where p<>product
WITH distinct product.products_id as products_id, r
RETURN products_id, count(r) as score
ORDER BY score desc
This query was supposed to return all nodes present in abc1 & abc2 but not in abc3. This query is not excluding all products present in abc3. Is there any way I can get it working?
UPDATE 1:
I tried to simplify it without UNWIND as this
match (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
MATCH (target:Session {session_id: 'abc1'})-[r:HAS_PRODUCT]->(product:Product)
where product <> p
WITH distinct product.products_id as products_id
RETURN products_id
Even this is not working. It returns all items present in abc1 without removing those that are already in abc3. It seems that where product <> p is not working correctly.
I would suggest checking whether the nodes are in a list, and proving out the approach with a very simple example first.
Here is a simple Cypher example showing one way to do it. This approach can then be extended into the complex query:
// get first two product IDs as a list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
RETURN list
// now show two more product IDs which are not in that list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
MATCH (p2:Product)
WHERE NOT ID(p2) in list
RETURN ID(p2) LIMIT 2
Note: I'm using the ID() of the nodes instead of the entire node, same dbhits but may be more performant...
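Extended to the sessions from the question, the list approach could be sketched as follows. The reason where p <> product does not exclude anything is that it is evaluated once per (p, product) row, so a product is only dropped from the row where it is paired with itself and survives in every row where it is paired with a different p. Collecting the abc3 products first avoids that. This is an untested sketch that reuses the labels and properties from the question; the variable name excluded is introduced here.

```cypher
// Collect every product attached to the session that should be excluded.
MATCH (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
WITH collect(p) AS excluded
// Then look at the sessions of interest and skip anything in that list.
MATCH (s:Session)-[r:HAS_PRODUCT]->(product:Product)
WHERE s.session_id IN ['abc1', 'abc2'] AND NOT product IN excluded
RETURN product.products_id AS products_id, count(r) AS score
ORDER BY score DESC
```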

Duplicate object in a list when I try to map all related entities

According to this post I tried to map all related entities in a list.
I used the same query as in that post, with a condition, to return a list of User objects, but it returns duplicate objects.
MATCH (user:User) WHERE <complex conditions>
WITH user, calculatedValue
MATCH p=(user)-[r*0..1]-() RETURN user, calculatedValue, nodes(p), rels(p)
Is it a bug? I'm using SDN 4.2.4.RELEASE with neo4j 3.2.1
Not a bug.
Keep in mind a MATCH in Neo4j will find all occurrences of a given pattern. Let's look at your last MATCH:
MATCH p=(user)-[r*0..1]-()
Because you have a variable match of *0..1, this will always return at least one row with just the user itself (with rels(p) empty and nodes(p) containing only the user), and then you'll get a row for every connected node (user will always be present on that row, and in the nodes(p) collection, along with the other connected node).
In the end, when you have a single user node and n directly connected nodes, you will get n + 1 rows. You can run the query in the Neo4j browser, looking at the table results, to confirm.
A better match might be something like:
...
OPTIONAL MATCH (user)-[r]-(b)
RETURN user, calculatedValue, collect(r) as rels, collect(b) as connectedNodes
Because we aggregate on all relationships and connected nodes (rather than just the relationships and nodes for each path), you'll get a single row result per user node.

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

Background
Hi all, I am currently trying to write a cypher statement that allows me to find a set of paths on a map from a starting point. I want my search result to always return connecting streets within 5 nodes. Optionally, if there's a nearby hospital, I would like my search pattern to also indicate nearby hospitals.
Main Problem
Because there isn't always a nearby hospital to the current street, sometimes my optional match search pattern comes back as null. Here's the current cypher statement I'm using:
MATCH path=(a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WHERE ALL (x IN nodes(path) WHERE (x:Street))
WITH DISTINCT nodes(path) + nodes(optionalPath) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
However, this syntax only works if optionalPath contains nodes. If it doesn't, the expression nodes(path) + nodes(optionalPath) adds null to a list and I get no records. This is true even when nodes(path) does contain nodes.
What's the best way to get around this problem?
You can use COALESCE to replace a NULL with some other value. For example:
MATCH path=(:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
WHERE ALL (x IN nodes(path) WHERE x:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodes(path) + COALESCE(nodes(optionalPath), []) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
I have also made a few other improvements:
The WHERE clause was moved up right after the first MATCH. This eliminates the unwanted path values immediately. Your original query would get all path values (even unwanted ones), always perform the second MATCH, and only eliminate unwanted paths afterwards. (But it is actually not clear whether you need the WHERE clause at all; for example, if the CONNECTED_TO relationship is only used between Street nodes.)
The DISTINCT in your WITH clause would have prevented duplicate n collections, but the collections internally could have had duplicate paths. This was probably not what you wanted.
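The effect of COALESCE in the query above can be checked in isolation. The snippet below is only a sketch: literal values stand in for nodes(path) and nodes(optionalPath), and the names streetNodes, hospitalNodes, withoutCoalesce, and withCoalesce are made up for illustration.

```cypher
// null stands in for nodes(optionalPath) when the OPTIONAL MATCH found nothing.
WITH [1, 2, 3] AS streetNodes, null AS hospitalNodes
RETURN streetNodes + hospitalNodes AS withoutCoalesce,
       streetNodes + coalesce(hospitalNodes, []) AS withCoalesce
```

Per the behavior described in the question, the first column should come back as null (which is why the later UNWIND produced no rows), while the second should keep the street values.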
It seems you don't really want the path, just all the street nodes within 5 steps, plus any connected hospitals. So I would simplify your query to just that, and then condense the 3 columns down to 1.
MATCH (a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH collect(a) + collect(b) + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
If Streets can be indirectly connected (with a hospital in between), then I'd adjust it like this:
MATCH (a:Street {id: 123})-[:CONNECTED_TO]-(b:Street)
WITH a as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodez + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
It's a bit more verbose, but it says exactly what you want (and also adds the start node to the hospital check list).

Neo4j Cypher find two disjoint nodes

I'm using Neo4j to try to find any node that is not connected to a specific node "a". The query that I have so far is:
MATCH p = shortestPath((a:Node {id:"123"})-[*]-(b:Node))
WHERE p IS NULL
RETURN b.id as b
So it tries to find the shortest path between a and b. If it doesn't find a path, then it returns that node's id. However, this causes my query to run for a few minutes then crashes when it runs out of memory. I was wondering if this method would even work, and if there is a more efficient way? Any help would be greatly appreciated!
edit:
MATCH (a:Node {id:"123"})-[*]-(b:Node),
(c:Node)
WITH collect(b) as col, a, b, c
WHERE a <> b AND NOT c IN col
RETURN c.id
So col (collect(b)) contains every node connected to a, therefore if c is not in col then c is not connected to a?
For one, you're giving this MATCH an impossible predicate to fulfill, so it will never find the shortest path.
WHERE clauses are associated with MATCH, OPTIONAL MATCH, and WITH clauses, so your query is asking for the shortest path where the path doesn't exist. That will never return anything.
Also, the shortestPath will start at the node you DON'T want to be connected, so this has no way of finding the nodes that aren't connected to it.
Probably the easiest way to approach this is to MATCH to all nodes connected to your node in question, then MATCH to all :Nodes checking for those that aren't in the connected set. That means you won't have to do a shortestPath from every single node in the db, just a membership check in a collection.
You'll need APOC Procedures for this, as it has the fastest means of matching to nodes within a subgraph.
MATCH (a:Node {id:"123"})
CALL apoc.path.subgraphNodes(a, {}) YIELD node
WITH collect(node) as subgraph
MATCH (b:Node)
WHERE NOT b in subgraph
RETURN b.id as b
EDIT
Your edited query is likely to blow up, because it generates a huge result set: every node reachable from your start node by a unique path, in a cartesian product with every :Node.
Instead, go step by step, collect the distinct nodes (because otherwise you'll get multiples of the same nodes that can be reached via different paths), and then only after you have your collection should you start your match for nodes that aren't in the list.
MATCH (:Node {id:"123"})-[*0..]-(b:Node)
WITH collect(DISTINCT b) as col
MATCH (a:Node)
WHERE NOT a IN col
RETURN a.id

Resources