Optimize cypher query to avoid cartesian product - neo4j

The purpose of the query is pretty trivial. For a given node (userId), I want to return all nodes in the graph that have a relationship to it within X hops, and I want to aggregate and return the distance between them (dist, a property set on each relationship).
I came up with this:
MATCH p=shortestPath((user:FOLLOWERS{userId:{1}})-[r:follow]-(f:FOLLOWERS))
WHERE f <> user
RETURN (f.userId) as userId,
reduce(s = '', rel IN r | s + rel.dist + ',') as dist,
length(r) as hop
userId ({1}) is given as input and is indexed.
I believe I have a cartesian product here. How would you suggest avoiding it?

You can make the cartesian product less onerous by creating an index on :FOLLOWERS(userId) to speed up one of the two "legs" of the cartesian product:
CREATE INDEX ON :FOLLOWERS(userId);
Even though this will not get rid of the cartesian product, it will run in O(N log N) time, which is much faster than O(N ^ 2).
By the way, your r relationship needs to be variable-length in order for your query to work. You should specify a reasonable upper bound (which depends on your DB) to ensure that the query will finish in a reasonable time and not run out of memory. For example:
MATCH p=shortestPath((user:FOLLOWERS { userId: 1 })-[r:follow*..5]-(f:FOLLOWERS))
WHERE f <> user
RETURN (f.userId) AS userId,
REDUCE (s = '', rel IN r | s + rel.dist + ',') AS dist,
LENGTH(r) AS hop;
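To make explicit what this query computes, here is a minimal in-memory sketch in Python (an illustration, not Neo4j's implementation): a breadth-first search from the start user that stops at 5 hops, ignores relationship direction as the undirected `-[r:follow*..5]-` pattern does, and keeps one shortest path per reached node. The function name `shortest_follow_paths` and the `(a, b, dist)` edge tuples are hypothetical stand-ins for the :follow relationships and their dist property.

```python
from collections import deque


def shortest_follow_paths(edges, start, max_hops=5):
    """Return (user_id, dist_string, hop) for one shortest path to every node
    reachable from `start` within `max_hops` undirected hops.
    `edges` is a hypothetical list of (a, b, dist) tuples standing in for :follow."""
    # Undirected adjacency list, because the pattern -[r:follow*..5]- ignores direction.
    adj = {}
    for a, b, dist in edges:
        adj.setdefault(a, []).append((b, dist))
        adj.setdefault(b, []).append((a, dist))

    # BFS visits nodes in order of hop count, so the first path found is a shortest one.
    best = {start: []}  # node -> dist values along its shortest path
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(best[node]) == max_hops:
            continue  # do not expand beyond the hop bound
        for nxt, dist in adj.get(node, []):
            if nxt not in best:
                best[nxt] = best[node] + [dist]
                queue.append(nxt)

    rows = []
    for node, dists in best.items():
        if node == start:
            continue  # WHERE f <> user
        # Same string the reduce() builds: every dist value followed by a comma.
        dist_string = "".join(f"{d}," for d in dists)
        rows.append((node, dist_string, len(dists)))
    return sorted(rows)


edges = [(1, 2, 3), (2, 3, 4), (1, 3, 10), (3, 4, 1)]
print(shortest_follow_paths(edges, 1))  # [(2, '3,', 1), (3, '10,', 1), (4, '10,1,', 2)]
```

Node 3 is reported with the direct "10," path rather than the 1-2-3 path whose dist values sum to 7: shortestPath minimizes the number of hops, not the sum of a relationship property.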

Related

Pattern Matching in Neo4j

Assume that in an application, the user gives us a graph and we want to consider it as a pattern and find all occurrences of the pattern in the Neo4j database. If we knew what the pattern is, we could write the pattern as a Cypher query and run it against our database. However, we do not know the pattern beforehand; we receive it from the user in the form of a graph. How can we perform pattern matching on the database based on the given graph (pattern)? Is there an APOC procedure for that? Any external library?
One way of doing this is to decompose your input graph into edges and create a dynamic Cypher query from it. I worked on this quite some time ago, and the solution below is not perfect, but it indicates a possible direction.
For example, if you feed in a small input graph and take the id(node) of each of its nodes (I am not taking the relationship ids, which is one of the imperfections),
this query
WITH $nodeids AS selection
UNWIND selection AS s
WITH COLLECT (DISTINCT s) AS selection
WITH selection,
SPLIT(left('a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z',SIZE(selection)*2-1),",") AS nodeletters
WITH selection,
nodeletters,
REDUCE (acc="", nl in nodeletters |
CASE acc
WHEN "" THEN acc+nl
ELSE acc+','+nl
END) AS rtnnodes
MATCH (n) WHERE id(n) IN selection
WITH COLLECT(n) AS nodes,selection,nodeletters,rtnnodes
UNWIND nodes AS n
UNWIND nodes AS m
MATCH (n)-[r]->(m)
WITH DISTINCT "("
+nodeletters[REDUCE(x=[-1,0], i IN selection | CASE WHEN i = id(n) THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END)[0]]
+TRIM(REDUCE(acc = '', p IN labels(n)| acc + ':'+ p))+")-[:"+type(r)+"]->("
+ nodeletters[REDUCE(x=[-1,0], i IN selection | CASE WHEN i = id(m) THEN [x[1], x[1]+1] ELSE [x[0], x[1]+1] END)[0]]
+TRIM(REDUCE(acc = '', p IN labels(m)| acc + ':'+ p))+")" as z,rtnnodes
WITH COLLECT(z) AS parts,rtnnodes
WITH REDUCE(y=[], x in range(0, size(parts)-1) | y + replace(parts[x],"[","[r" + (x+1))) AS parts2,
REDUCE (acc="", x in range(0, size(parts)-1) | CASE acc WHEN "" THEN acc+"r"+(x+1) ELSE acc+",r"+(x+1) END) AS rtnrels,
rtnnodes
RETURN
REDUCE (acc="MATCH ",p in parts2 |
CASE acc
WHEN "MATCH " THEN acc+p
ELSE acc+','+p
END)+
" RETURN "+
rtnnodes+","+rtnrels+
" LIMIT "+$limit
AS cypher
returns something like
cypher: "MATCH (a:Person)-[r1:DRIVES]->(b:Car),(a:Person)-[r2:KNOWS]->(c:Person) RETURN a,b,c,r1,r2 LIMIT 50"
which you can feed to the next query.
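The string-building part can be sketched in Python as well, assuming the pattern graph is already available as a list of (source id, relationship type, target id) edges plus a label map; unlike the Cypher above, this sketch does not look the relationships up in the database. `pattern_to_cypher` is a hypothetical helper, and it does not backtick-escape labels or types, so names containing spaces or special characters would need extra quoting.

```python
from string import ascii_lowercase


def pattern_to_cypher(node_ids, labels, edges, limit=50):
    """Build a MATCH query string from an input pattern graph (hypothetical helper).

    node_ids: ids of the selected nodes (duplicates ignored, order kept)
    labels:   dict node id -> list of label names
    edges:    list of (source id, relationship type, target id)
    """
    selection = list(dict.fromkeys(node_ids))  # like COLLECT(DISTINCT s)
    if len(selection) > len(ascii_lowercase):
        raise ValueError("only 26 node variables (a..z) are available")
    letter = {nid: ascii_lowercase[i] for i, nid in enumerate(selection)}

    def node(nid):
        # e.g. (a:Person); labels are concatenated as :L1:L2
        return "(" + letter[nid] + "".join(":" + lbl for lbl in labels[nid]) + ")"

    # Each relationship gets its own variable r1, r2, ...
    parts = [
        f"{node(src)}-[r{i}:{rel_type}]->{node(dst)}"
        for i, (src, rel_type, dst) in enumerate(edges, start=1)
    ]
    returned = [letter[n] for n in selection] + [f"r{i}" for i in range(1, len(edges) + 1)]
    return "MATCH " + ",".join(parts) + " RETURN " + ",".join(returned) + f" LIMIT {limit}"


example = pattern_to_cypher(
    [10, 11, 12],
    {10: ["Person"], 11: ["Car"], 12: ["Person"]},
    [(10, "DRIVES", 11), (10, "KNOWS", 12)],
)
print(example)
# MATCH (a:Person)-[r1:DRIVES]->(b:Car),(a:Person)-[r2:KNOWS]->(c:Person) RETURN a,b,c,r1,r2 LIMIT 50
```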
In Graphileon, you can just select the nodes, and the result will be visualized as well.
Disclosure: I work for Graphileon
I have used patterns in genealogy queries.
The X chromosome is not transmitted from father to son. As you traverse a family tree, you can use the reduce function to build a concatenated string of the sex of each ancestor along the path. You can then accept only results that lack MM (father-son). This query gives all the descendants inheriting the X chromosome of the ancestor with RN=32.
match p=(n:Person{RN:32})<-[:father|mother*..99]-(m)
with m, reduce(status ='', q IN nodes(p)| status + q.sex) AS c
where c=replace(c,'MM','')
return distinct m.fullname as Fullname
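The same filtering logic can be sketched outside the database. This minimal Python version assumes a hypothetical `children` map (person -> list of children) and a `sex` map ('M'/'F'); it builds the sex string along each line of descent and keeps a descendant if at least one line never contains "MM". Because a string that contains "MM" still contains it once extended, the sketch stops descending such lines early, which the Cypher query does not do; the set of returned descendants is the same.

```python
def x_inheritors(children, sex, ancestor, max_depth=99):
    """Descendants of `ancestor` reachable by at least one line of descent with no
    father -> son step ('MM' in the sex string). Inputs are hypothetical maps."""
    result = set()

    def walk(person, sexes, depth):
        for child in children.get(person, []):
            path_sexes = sexes + sex[child]
            if "MM" in path_sexes:
                continue  # a father -> son step breaks X inheritance along this line
            result.add(child)
            if depth + 1 < max_depth:  # mirrors the *..99 hop bound
                walk(child, path_sexes, depth + 1)

    walk(ancestor, sex[ancestor], 0)
    return result


children = {32: [1, 2], 1: [3], 2: [4], 3: [5]}
sex = {32: "M", 1: "F", 2: "M", 3: "M", 4: "F", 5: "F"}
print(sorted(x_inheritors(children, sex, 32)))  # [1, 3, 5]
```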
I am developing other pattern-specific queries as part of a Neo4j PlugIn for genealogy. These will include patterns of triangulation groups.
GitHub repository for Neo4j Genealogy PlugIn

Can graph algorithms take nodes' and relationships' properties in Neo4J?

I'm starting to use the Graph Algorithms plugin for Neo4j (3.3.x) and wanted to ask whether the plugin can take the properties of nodes/relationships into account, so that I could add to a request like this:
CALL algo.pageRank.stream('Page', 'LINKS', {iterations:20, dampingFactor:0.85})
YIELD node, score
RETURN node,score order by score desc limit 20
For example, a condition on the properties of the nodes labeled Page (e.g. only those with timestamp > certain_date), or only the LINKS relationships that have a specific property x.
Or, if that's not possible, should I use Cypher projection and simply pass a Cypher query to the pageRank algorithm?
You can use Cypher projection to be more selective about which nodes and relationships to process with a graph algorithm.
For example, to execute the algo.pageRank algorithm only on Page nodes whose timestamp > 1000, and LINKS relationships that have a specific property x, this should work:
MERGE (dummy:Dummy)
WITH dummy, ID(dummy) AS dummy_id
CALL algo.pageRank.stream(
'OPTIONAL MATCH (p:Page) WHERE p.timestamp > 1000 RETURN CASE WHEN p IS NOT NULL THEN ID(p) ELSE ' + dummy_id + ' END AS id',
'OPTIONAL MATCH (p1:Page)-[link:LINKS]->(p2:Page) WHERE EXISTS(link.x) WITH CASE WHEN link IS NOT NULL THEN [ID(p1), ID(p2)] ELSE [' + dummy_id + ',' + dummy_id + '] END AS res RETURN res[0] AS source, res[1] as target',
{graph:'cypher', iterations:20, dampingFactor:0.85})
YIELD node, score
WITH dummy, node, score
WHERE node <> dummy
RETURN node, score ORDER BY score DESC LIMIT 20;
NOTE: The graph algorithms are currently badly behaved (i.e., they throw exceptions) when either of the Cypher statements used in a Cypher projection return no results. The above query works around that by making sure that both statements return a dummy node instead of returning nothing. The Cypher statement that "wraps" the algorithm call will then filter out the dummy node if it is returned by the algorithm.
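Conceptually, a Cypher projection hands the algorithm a filtered node set and a filtered relationship set. A minimal Python sketch of that idea, with hypothetical page timestamps and link properties: filter first, then run a textbook power-iteration PageRank (teleport term (1 - d)/N, dangling-node mass spread evenly) on what is left. This is not the Graph Algorithms plugin's implementation, whose scores are scaled differently, and keeping a relationship only when both endpoints survive the node filter is an assumption of this sketch.

```python
def pagerank(nodes, edges, damping=0.85, iterations=20):
    """Textbook power-iteration PageRank over `nodes` and (source, target) `edges`."""
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical data: page -> timestamp, and (source, target, properties) links.
pages = {"A": 500, "B": 1500, "C": 2000, "D": 3000}
links = [
    ("B", "C", {"x": 1}),
    ("B", "D", {"x": 1}),
    ("C", "B", {}),
    ("C", "D", {"x": 2}),
    ("D", "B", {"x": 3}),
    ("A", "B", {"x": 4}),
]

# The "projection": only pages with timestamp > 1000, only links that have property x,
# and (an assumption of this sketch) only links whose endpoints were both kept.
kept_pages = [p for p, ts in pages.items() if ts > 1000]
kept_links = [(s, t) for s, t, props in links
              if "x" in props and s in kept_pages and t in kept_pages]

scores = pagerank(kept_pages, kept_links)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))
```

Here A is dropped by the timestamp filter and C -> B by the property filter, so PageRank runs on B, C and D only.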

How to perform distinct while having multiple paths using Cypher

For a given node (sourceNode) I want to retrieve all nodes that are connected to my sourceNode within 3 hops.
The problem starts when there are multiple paths between the source and destination nodes.
I don't care which path I get as long as I get exactly one and not the others (ideally only the shortest path).
So this is my code:
MATCH (user:C9 {userId:'70'})-[r:follow*1..3]-f WHERE f <> user
RETURN DISTINCT (f.userId) as userId,
reduce(s = '', rel IN r | s + rel.dist + ',') as dist,
length(r) as hop
The response contains the same userId several times; DISTINCT does not remove them, because each row also carries the path-specific dist and hop values.
I would like to avoid the duplicated rows with the same userId.
Any idea how to perform the distinct here?
Thanks,
ray.
How about something like this? Rather than looking for distinct users, just use shortestPath to get to each follower 1..3 hops out from the starting user.
MATCH p=shortestPath((user:C9 {userId:'70'})-[r:follow*1..3]-(f))
WHERE f <> user
RETURN f.userId,
reduce(s = '', rel IN r | s + rel.dist + ',') as dist,
length(p) as hop
Alternatively, if you want the shortest distance regardless of the number of hops, you could do something like the following. Instead of using shortestPath, sum the dist values along each path, order by that total, collect the totals per user, order by user, and return the first element of each collection, which will be the shortest.
MATCH p=(user:C9 {userId:'70'})-[r:follow*1..3]-(f)
WHERE f <> user
with f.userId as user_id
, reduce(s = 0, rel IN relationships(p) | s + rel.dist) as dist
, length(p) as hops
order by dist
with user_id, collect(dist) as dists_per_follow, collect(hops) as hops_per_follow
return user_id
, dists_per_follow[0] as shortest
, dists_per_follow, hops_per_follow
order by user_id
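The aggregation above (sum per path, ORDER BY, then collect per user) can be mirrored in Python to show why the first collected element is the shortest: the rows are sorted by total distance before they are grouped, and the grouping keeps that order. The `(user_id, [dist, ...])` input rows are a hypothetical stand-in for the matched paths.

```python
from collections import defaultdict


def shortest_per_follower(path_rows):
    """path_rows: hypothetical (user_id, [dist, ...]) pairs, one per matched path."""
    # One (user_id, total dist, hops) row per path, like the first WITH clause.
    rows = [(user_id, sum(dists), len(dists)) for user_id, dists in path_rows]
    # ORDER BY dist; Python's sort is stable, so ties keep their input order.
    rows.sort(key=lambda row: row[1])
    dists_per_user = defaultdict(list)
    hops_per_user = defaultdict(list)
    for user_id, dist, hops in rows:
        dists_per_user[user_id].append(dist)  # collect(dist)
        hops_per_user[user_id].append(hops)   # collect(hops)
    # Because the rows were sorted first, element 0 of each list is the shortest.
    return {
        user_id: (dists[0], dists, hops_per_user[user_id])
        for user_id, dists in sorted(dists_per_user.items())
    }


paths = [("71", [5, 5]), ("71", [3]), ("72", [1, 1, 1]), ("72", [4])]
print(shortest_per_follower(paths))
```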

Stop Cypher traversal when where condition on reduce() can no longer be satisfied

Suppose I have a Neo4j database with a single node type and a single relationship type, to keep things simple. All relationships have a "cost" property (as in classical graph problems) whose values are non-negative.
Suppose now I want to find all the possible paths between the node with ID A and the node with ID B, with an upper bound on path length (e.g. 10), such that the total path cost is below or equal to a given constant (e.g. 20).
The Cypher code to accomplish this is the following (and it works):
START a = node(A), b = node(B)
MATCH (a) -[r*0..10]-> (b)
WITH extract(x in r | x.cost) as costs, reduce(acc = 0, x in r | acc + x.cost) as totalcost
WHERE totalcost < 20
RETURN costs, totalcost
The problem with this query is that it doesn't take advantage of the fact that costs are non-negative, so paths that exceed the total cost limit could be pruned. Instead, it lists all possible paths of length 0 to 10 between nodes A and B (which can be ridiculously expensive), calculates their total costs, and then filters out the paths above the limit. Pruning paths early would lead to massive performance improvements.
I know this is doable with the traversal framework by using BranchStates and preventing expansion when relevant, but I would like to find a Cypher solution (mainly due to the reasons exposed here).
I am currently using version 2.2.2, if that matters.
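For comparison, this is the pruning the question asks for, written as a plain depth-first search in Python rather than in Cypher or the traversal framework: each relationship is used at most once per path (Cypher's uniqueness rule), and a branch is cut as soon as its running cost can no longer stay below the limit (the query's `< 20`), which is only valid because costs are non-negative. The function name and the `(source, target, cost)` tuples are hypothetical.

```python
def bounded_cost_paths(rels, start, end, max_len=10, limit=20):
    """Directed paths start -> end with at most `max_len` relationships and total
    cost < `limit`. rels: hypothetical (source, target, cost) tuples, cost >= 0;
    the list index serves as the relationship id."""
    out = {}
    for rel_id, (src, dst, cost) in enumerate(rels):
        out.setdefault(src, []).append((rel_id, dst, cost))

    results = []

    def extend(node, used, costs, total):
        if node == end:
            # Record the path, but keep going: a path may pass through `end`.
            results.append((list(costs), total))
        if len(costs) == max_len:
            return
        for rel_id, dst, cost in out.get(node, []):
            if rel_id in used:
                continue  # relationship uniqueness within one path
            if total + cost >= limit:
                continue  # prune: non-negative costs can never bring the total back down
            used.add(rel_id)
            costs.append(cost)
            extend(dst, used, costs, total + cost)
            costs.pop()
            used.remove(rel_id)

    extend(start, set(), [], 0)
    return results


rels = [("A", "B", 5), ("B", "C", 5), ("A", "C", 15), ("C", "B", 1), ("A", "C", 25)]
print(bounded_cost_paths(rels, "A", "C"))  # [([5, 5], 10), ([15], 15)]
```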
Would summing the relationship costs before the extract be sufficient?
START a = node(A), b = node(B)
MATCH (a)-[r*0..10]->(b)
WHERE sum(r.cost) < 20
WITH extract(x in r | x.cost) as costs, reduce(acc = 0, x in r | acc + x.cost) as totalcost
RETURN costs, totalcost
By the way, wanting to prune means you want an imperative approach!
Also, please help Cypher a bit: use labels.

Limiting a Neo4j cypher query results by sum of relationship property

Is there a way to limit a cypher query by the sum of a relationship property?
I'm trying to create a Cypher query that returns nodes within a distance of 100 of the start node. All the relationships have a distance set, and the sum of all the distances in a path is the total distance from the start node.
If the WHERE clause could handle aggregate functions, what I'm looking for might look like this:
START n=node(1)
MATCH path = n-[rel:street*]-x
WHERE SUM( rel.distance ) < 100
RETURN x
Is there a way that I can sum the distances of the relationships in the path for the where clause?
Sure, what you want to do is like a HAVING clause in a SQL query.
In Cypher you can chain query segments and use the results of previous parts in the next part by using WITH; see the manual.
For your example, one would expect something like:
START n=node(1)
MATCH n-[rel:street*]-x
WITH SUM(rel.distance) as distance
WHERE distance < 100
RETURN x
Unfortunately, sum doesn't work with collections yet (and rel is a collection here).
So I tried to do it differently (for variable-length paths):
START n=node(1)
MATCH n-[rel:street*]-x
WITH collect(rel.distance) as distances
WITH head(distances) + head(tail(distances)) + head(tail(tail(distances))) as distance
WHERE distance < 100
RETURN x
Unfortunately, head of an empty list doesn't return null (which could be coalesced to 0); it just fails. So this approach only works for fixed-length paths; I don't know if that works for you.
I've come across the same problem recently. In more recent versions of Neo4j this is solved by the extract and reduce functions. You could write:
START n=node(1)
MATCH path = (n)-[rel:street*..100]-(x)
WITH extract(r IN rel | r.distance) AS distances, x
WITH reduce(res = 0, d IN distances | res + d) AS distance, x
WHERE distance <100
RETURN x
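Since all distances are non-negative, "some path to x has a total below 100" is equivalent to "the shortest path to x has a total below 100". A different technique, Dijkstra's algorithm with a cutoff, therefore answers the original question without enumerating paths. Below is a minimal Python sketch with hypothetical `(a, b, distance)` edge tuples. The `*..100` in the queries here limits the number of relationships, not the summed distance, whereas this sketch bounds only the distance.

```python
import heapq


def within_distance(edges, start, max_distance=100):
    """Nodes (other than `start`) whose shortest undirected distance from `start`
    is < max_distance. edges: hypothetical (a, b, distance) tuples, distance >= 0."""
    adj = {}
    for a, b, d in edges:
        adj.setdefault(a, []).append((b, d))
        adj.setdefault(b, []).append((a, d))

    best = {start: 0}
    heap = [(0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node]:
            continue  # stale heap entry; a shorter distance was already found
        for nxt, d in adj.get(node, []):
            nd = dist + d
            # The cutoff: never record or expand anything at or beyond max_distance.
            if nd < max_distance and nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return {node: d for node, d in best.items() if node != start}


edges = [("s", "a", 40), ("a", "b", 50), ("s", "b", 120), ("b", "c", 20)]
print(within_distance(edges, "s"))  # {'a': 40, 'b': 90}
```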
I don't know about a limitation in the WHERE clause, but you can simply specify it in the MATCH clause:
START n=node(1)
MATCH path = n-[rel:street*..100]-x
RETURN x
see http://docs.neo4j.org/chunked/milestone/query-match.html#match-variable-length-relationships