Using aggregation thresholds in a Cypher query to refine results - neo4j

I'm trying to write a Cypher query that uses aggregation to pull back the most relevant paths. My desired path is described in the MATCH clause below:
MATCH p=(a:TYP1)--(b:TYP2)--(c:TYP3)--(d:TYP4)
RETURN a, count(distinct d) as cntTYP4
ORDER BY cntTYP4 DESC
This produces a list of TYP1 nodes sorted in descending order by the number of distinct TYP4 nodes they link to through the MATCH pattern. What I would like to do is return all paths p where cntTYP4 > 5 (for example). My attempts to structure such a query have been unsuccessful thus far. Hopefully I'm missing something obvious!

You can use WITH to do this. Something like:
MATCH p=(a:TYP1)--(b:TYP2)--(c:TYP3)--(d:TYP4)
WITH a, count(distinct d) as cntTYP4
WHERE cntTYP4 > 5
RETURN a, cntTYP4
ORDER BY cntTYP4 DESC
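To see what the WITH ... WHERE step computes, and how the surviving `a` nodes could then be used to get back the full paths the question asks for, here is a plain-Python model. It is only a sketch of the semantics, not Cypher: the tuples stand in for matched (a, b, c, d) paths, the sample data is invented, and the threshold is lowered from 5 to 1 to keep the sample small.

```python
from collections import defaultdict

# Hypothetical matched paths (a, b, c, d); the data is invented for illustration.
paths = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c2", "d2"),
    ("a1", "b2", "c3", "d2"),  # reaches d2 again: must not be counted twice
    ("a2", "b3", "c4", "d3"),
]
THRESHOLD = 1  # the question uses 5; lowered for the tiny sample

# WITH a, count(DISTINCT d) AS cntTYP4
distinct_d = defaultdict(set)
for a, _b, _c, d in paths:
    distinct_d[a].add(d)
cnt = {a: len(ds) for a, ds in distinct_d.items()}

# WHERE cntTYP4 > threshold
kept = {a: n for a, n in cnt.items() if n > THRESHOLD}

# Matching the pattern again from the surviving `a` nodes yields the paths.
kept_paths = [p for p in paths if p[0] in kept]
```

In Cypher, the last step would presumably be a second MATCH on the same pattern placed after the WHERE, reusing the variable a.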
HTH,
Andrés

Related

How Many Nodes Are Involved in a Match

How can I know how many nodes and edges are involved in a MATCH? Is there another way besides Explain / Profile Match?
If you mean how many nodes are matched in a path, such as a variable-length path, then you can assign a path variable for this:
MATCH p = (k:Person {name:'Keanu Reeves'})-[*..8]-(t:Person {name:'Tom Hanks'})
WITH p LIMIT 1
RETURN p, length(p) as pathLength, length(p) + 1 as numberOfNodesInPath
You can also use nodes(p) and relationships(p) to get the collection of nodes and relationships that make up the path, and you can use size() on those collections to get their size.
Cypher also has a COUNT() function that counts the number of elements, as in this query:
MATCH (n)
RETURN COUNT(n);
This query will count all nodes in your database.
You can find more information in the Cypher manual, under aggregating functions.
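The following Cypher snippet should return the number of distinct nodes, and the number of relationships summed over all paths, found by any given MATCH clause. Note that a relationship that appears in several paths is counted once per path. Just replace <your code here> with your MATCH pattern, making sure it binds the path to the variable p.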
MATCH <your code here>
WITH COLLECT(NODES(p)) AS ns, SUM(SIZE(RELATIONSHIPS(p))) AS relCount
UNWIND ns AS nodeList
UNWIND nodeList AS node
RETURN COUNT(DISTINCT node) AS nodeCount, relCount;
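A plain-Python model of that snippet may help show what the two numbers mean. This is a sketch with invented node and relationship ids, not Cypher. It also computes a distinct relationship count for comparison, which the snippet itself does not do.

```python
# Hypothetical paths as (nodes, relationships); all ids are made up.
paths = [
    (["n1", "n2", "n3"], ["r1", "r2"]),
    (["n1", "n2", "n4"], ["r1", "r3"]),  # shares n1, n2 and r1 with the first path
]

# COLLECT(NODES(p)) followed by two UNWINDs flattens the nested list.
all_nodes = [node for nodes, _rels in paths for node in nodes]
node_count = len(set(all_nodes))  # COUNT(DISTINCT node)

# SUM(SIZE(RELATIONSHIPS(p))) adds up per-path sizes, so r1 is counted twice.
rel_count = sum(len(rels) for _nodes, rels in paths)

# For comparison only: relationships counted once each.
distinct_rel_count = len({r for _nodes, rels in paths for r in rels})
```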

neo4j indegree outdegree union

I want to compute indegree and outdegree and return a graph that connects the top 5 indegree nodes with the top 5 outdegree nodes. I have written the following query:
match (a:Port1)<-[r]-()
return a.id as NodeIn, count(r) as Indegree
order by Indegree DESC LIMIT 5
union
match (n:Port1)-[r]->()
return n.id as NodeOut, count(r) as Outdegree
order by Outdegree DESC LIMIT 5
union
match p=(u:Port1)-[:LinkTo*1..]->(t:Port1)
where u.id in NodeIn and t.id in NodeOut
return p
I get this error:
All sub queries in an UNION must have the same column names (line 4, column 1 (offset: 99)) "union"
What changes do I need to make to the query?
There are a few things we can improve.
The matches you're doing aren't the most efficient way to get the incoming and outgoing degrees of a node.
Also, UNION can only be used to combine query results with identical columns. In this case we don't even need UNION: we can use WITH to pipe results from one part of the query to the next, and COLLECT() the nodes we need in between.
Try this query:
match (a:Port1)
with a, size((a)<--()) as Indegree
order by Indegree DESC LIMIT 5
with collect(a) as NodesIn
match (a:Port1)
with NodesIn, a, size((a)-->()) as Outdegree
order by Outdegree DESC LIMIT 5
with NodesIn, collect(a) as NodesOut
unwind NodesIn as NodeIn
unwind NodesOut as NodeOut
// we now have a cartesian product between both lists
match p=(NodeIn)-[:LinkTo*1..]->(NodeOut)
return p
Be aware that this performs two NodeLabelScans of :Port1 nodes and takes a cross product of the top 5 of each, so there are 25 variable-length path matches. That can be expensive, as it generates all possible paths from each NodeIn to each NodeOut.
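As a rough illustration of where the 25 comes from, here is a plain-Python model of the "top N by degree, then cross product" part of the query. It is a sketch, not Cypher: the edge list is invented, and TOP_N is 2 instead of 5 to keep the sample small.

```python
from collections import Counter
from itertools import product

# Hypothetical directed edges (source, target) between :Port1 nodes.
edges = [
    ("p1", "p2"), ("p1", "p3"), ("p1", "p4"),
    ("p2", "p3"), ("p4", "p3"), ("p4", "p2"),
]
TOP_N = 2  # the answer uses 5

out_degree = Counter(src for src, _dst in edges)
in_degree = Counter(dst for _src, dst in edges)

# ORDER BY ... DESC LIMIT n, then collect() into a list
nodes_in = [n for n, _count in in_degree.most_common(TOP_N)]
nodes_out = [n for n, _count in out_degree.most_common(TOP_N)]

# Two UNWINDs in a row pair every NodeIn with every NodeOut.
pairs = list(product(nodes_in, nodes_out))
```

With TOP_N = 5 the same product has 5 * 5 = 25 pairs, and the variable-length MATCH runs once per pair.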
If you only want the shortest connection between each pair, you might try replacing the variable-length match with a shortestPath() call, which returns only the shortest path found between each two nodes:
...
match p = shortestPath((NodeIn)-[:LinkTo*1..]->(NodeOut))
return p
Also, make sure your desired direction is correct. You're matching the nodes with the highest in-degree and following outgoing paths from them to the nodes with the highest out-degree, which seems like it might be backwards to me, but you know your requirements best.

Limit the results of a union cypher query

Let's say we have the example query from the documentation:
MATCH (n:Actor)
RETURN n.name AS name
UNION
MATCH (n:Movie)
RETURN n.title AS name
I know that if I do that:
MATCH (n:Actor)
RETURN n.name AS name
LIMIT 5
UNION
MATCH (n:Movie)
RETURN n.title AS name
LIMIT 5
I can reduce the returned results of each subquery to 5. How can I LIMIT the total results of the UNION query?
This is not yet possible, but there is already an open neo4j issue that requests the ability to do post-UNION processing, which includes what you are asking about. You can add a comment to that neo4j issue if you support having it resolved.
This can be done using UNION post processing by rewriting the query using the COLLECT function and the UNWIND clause.
First we turn the columns of a result into a map (struct, hash, dictionary), to retain its structure. For each partial query we use the COLLECT to aggregate these maps into a list, which also reduces our row count (cardinality) to one (1) for the following MATCH. Combining the lists is a simple list concatenation with the “+” operator.
Once we have the complete list, we use UNWIND to transform it back into rows of maps. After this, we use the WITH clause to deconstruct the maps into columns again and perform operations like sorting, pagination, filtering or any other aggregation or operation.
The rewritten query will be as below:
MATCH (n:Actor)
with collect({name: n.name}) as row
MATCH (n:Movie)
with row + collect({name: n.title}) as rows
unwind rows as row
with row.name as name
return name LIMIT 5
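The collect / concatenate / unwind steps can be modelled in plain Python as below. This is a sketch of the semantics only, with invented rows (the name "Hitch" is deliberately used for both an actor and a movie) and a limit of 3 instead of 5. One difference worth knowing: UNION without ALL removes duplicate rows, while list concatenation keeps them, so the model also shows a de-duplicated variant.

```python
# Hypothetical result rows of the two partial queries; the data is invented.
actor_rows = [{"name": "Keanu Reeves"}, {"name": "Hitch"}]
movie_rows = [{"name": "Hitch"}, {"name": "The Matrix"}]

# collect(...) per partial query, then list concatenation with "+"
rows = actor_rows + movie_rows

# UNWIND rows AS row / WITH row.name AS name / RETURN name LIMIT 3
names = [row["name"] for row in rows]
limited = names[:3]

# Concatenation keeps duplicates, unlike UNION. A DISTINCT step mimics UNION;
# dict.fromkeys drops repeats while keeping first-seen order.
distinct_limited = list(dict.fromkeys(names))[:3]
```

In the Cypher version, the equivalent of the last line would presumably be WITH DISTINCT row.name AS name before the LIMIT.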
This is possible as of Neo4j 4.0, using a CALL subquery:
CALL {
MATCH (p:Person) RETURN p
UNION
MATCH (p:Person) RETURN p
}
RETURN p.name, p.age ORDER BY p.name
Read more about post-union processing here: https://neo4j.com/docs/cypher-manual/4.0/clauses/call-subquery/

Find shortest path between nodes with additional filter

I'm trying to model flights between airports on certain dates. So far my test graph looks like this:
Finding the shortest path between, for example, LTN and WAW is trivial with:
MATCH (f:Airport {code: "LTN"}), (t:Airport {code: "WAW"}),
p = shortestPath((f)-[]-(t)) RETURN p
Which gives me:
But I have no idea how to get only the paths whose Flights have a FLIES_ON relationship with a given Date.
Link to Neo4j console
Here's what I would do with your given model. The other commenters' queries don't seem right, as they use ANY() instead of ALL(). You specifically said you only want paths where all Flight nodes on the path are attached to a given Date node with a :FLIES_ON relationship:
MATCH (LTN:Airport {code:"LTN"}),
(WAW:Airport {code:"WAW"}),
p =(LTN)-[:ROUTE*]-(WAW)
WHERE ALL(x IN FILTER(x IN NODES(p) WHERE x:Flight)
WHERE (x)<-[:FLIES_ON]-(:Date {date:"130114"}))
WITH p ORDER BY LENGTH(p) LIMIT 1
RETURN p
http://console.neo4j.org/r/xgz84y
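To make the ANY() versus ALL() distinction concrete, here is a small plain-Python model. It is a sketch with an invented path (the intermediate airport and the flight codes are hypothetical), in which only one of two flights operates on the requested date.

```python
# Hypothetical path: airports and flights alternate; codes are invented.
path_nodes = [
    {"label": "Airport", "code": "LTN"},
    {"label": "Flight", "code": "F1"},
    {"label": "Airport", "code": "AMS"},
    {"label": "Flight", "code": "F2"},
    {"label": "Airport", "code": "WAW"},
]
# Assumed data: flights that have a :FLIES_ON link to the requested date.
flies_on_date = {"F1"}  # F2 does not fly that day

# FILTER(x IN NODES(p) WHERE x:Flight)
flights = [n for n in path_nodes if n["label"] == "Flight"]

any_ok = any(f["code"] in flies_on_date for f in flights)  # ANY(...)
all_ok = all(f["code"] in flies_on_date for f in flights)  # ALL(...)
```

Here any_ok is true and all_ok is false: an ANY()-based filter would accept this path even though its second leg cannot be flown on that date.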
Though this would not be my preferred structure for this kind of data, in answering your question I might go this way instead: get the paths, filter them, and take the first one ordered by length.
In the console tests it runs faster than the one suggested above, as the query plan is simpler.
Anyhoo, I hope this at least points you in a good direction :)
MATCH (f:Airport { cd: "ltn" }),(t:Airport { cd: "waw" }), p =((f)-[r*]-(t))
WHERE ANY (x IN relationships(p)
WHERE type(x)='FLIES_ON') AND ANY (x IN nodes(p)
WHERE x.cd='130114')
RETURN p
ORDER BY length(p)
LIMIT 1
The problem is that using shortestPath or allShortestPaths will never include the Date nodes.
What you need to do is filter the pattern on the Date node (I don't know how you store the date, so I'll assume Ymd format):
MATCH (f:Airport {code: "LTN"}), (t:Airport {code: "WAW"})
MATCH p=(f)-[*]-(t)
WHERE ANY (r in rels(p) WHERE type(r) = 'FLIES_ON')
AND ANY (n in nodes(p) WHERE 'Date' IN labels(n) AND n.date = 20150120)
RETURN p
ORDER BY length(p)
LIMIT 1
Another, less costly solution is to include the date in your match and build the path through it yourself:
MATCH (n:Date {date:20150120})
MATCH (f:Airport {code:"LTN"}), (t:Airport {code:"WAW"})
MATCH p=(f)<-[*]-(n)-[*]->(t)
RETURN distinct(p)
ORDER BY length(p)

How to retrieve only the nodes from the path in Neo4J Cypher query?

I have a query of the following kind:
MATCH (u1:User{name:"user_name"}), (s1:Statement), s1-[:BY]->u1
WITH DISTINCT s1,u1
MATCH (s2:Statement), s2-[:BY]->u1,
p=s1<-[:OF]-c-[:OF]->s2
WHERE s1 <> s2
WITH collect(p) AS coll, count(p) AS paths, s1, s2
RETURN s1,s2,paths,coll
ORDER BY paths DESC
LIMIT 2;
Right now it returns a list of all the paths p in the coll variable. I want it to list only the nodes c. How can I do this?
Maybe the query is not right. In that case, what I'm trying to do is:
1) Find all statements made by a user;
2) Find the nodes that connect those two statements;
3) Return those statements, which have the most nodes connecting them, ORDER BY DESC, including the names of the actual nodes that connect them.
Thank you!
I can't test it at the moment, but you could try something like
MATCH (u:User {name:"user_name"})<-[:BY]-(s1)<-[:OF]-(c)-[:OF]->(s2)-[:BY]->(u)
RETURN s1, s2, collect(c) as connections
ORDER BY size(connections) DESC
LIMIT 2
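The grouping that collect(c) performs can be modelled in plain Python as follows. This is a sketch with invented statement and connector ids, not Cypher. Because the pattern is symmetric in s1 and s2, each pair of statements presumably matches in both orders, which the sample rows imitate.

```python
from collections import defaultdict

# Hypothetical (s1, c, s2) rows produced by the MATCH; ids are invented.
rows = [
    ("sA", "c1", "sB"), ("sA", "c2", "sB"), ("sA", "c3", "sC"),
    ("sB", "c1", "sA"), ("sB", "c2", "sA"), ("sC", "c3", "sA"),
]

# RETURN s1, s2, collect(c): non-aggregated columns act as the grouping key.
connections = defaultdict(list)
for s1, c, s2 in rows:
    connections[(s1, s2)].append(c)

# ORDER BY number of connections DESC, LIMIT 2
ranked = sorted(connections.items(), key=lambda kv: len(kv[1]), reverse=True)[:2]
```

In this sample both top rows describe the same pair in mirrored order, so an extra filter that keeps only one ordering (for example, comparing internal ids) may be worth adding to the query.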

Resources