Set constraint on relationship property for variable-path search in GraphDB - neo4j

I have a neo4j graph that is similar in structure to the example below:
MATCH (n:Person)-[k:KNOWS]->(f)
WHERE k.since < 2000
RETURN f.name, f.age, f.email
which comes straight off the neo4j examples.
What I am looking to do is this:
Start with one node by name ("Jennifer" in this case) and find all the nodes, regardless of path depth, that stem from the initial node but where the relationship KNOWS has a property since < 2000
So Jennifer might know Gary since before 2000 who also knows Bill since before 2000. And Jennifer knows Michelle since before 2000 (et cetera)
This is where I am stuck:
MATCH p=(n:Person {name:'Jennifer'})-[:KNOWS*]-(f)
RETURN [k IN p WHERE k.since < 2000]
If I run any query with :KNOWS*, it just hangs up forever, even for a relatively small database of 21 nodes and 840 relationships.
I figured I need to use WITH REDUCE() somehow but it isn't clicking...
Can anyone point me in the right direction here?
Much appreciated!

You can use the all() list predicate to ensure that all relationships in the path adhere to the predicate. This will be evaluated during expansion, so may yield better performance:
MATCH p=(n:Person {name:'Jennifer'})-[:KNOWS*]-(f)
WHERE all(rel in relationships(p) WHERE rel.since < 2000)
RETURN DISTINCT f
That said, Cypher is concerned with finding all possible paths that fit the pattern, and that approach isn't always a good match when you're interested in distinct nodes, not distinct paths (especially when the paths backtrack to previously visited nodes via different relationships).
You may want to consider adding an upper limit to your variable length expansion.
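To illustrate why applying the predicate during expansion helps, here is a minimal Python sketch that mimics a variable-length traversal in which a relationship may appear only once per path. It is not Neo4j code: the toy data and the function name `reachable` are made up for illustration. A branch is dropped as soon as a relationship fails `since < 2000`, and only distinct end nodes are collected.

```python
# Hypothetical toy data: (relationship id, node a, node b, since).
# KNOWS is followed in both directions, like the undirected pattern above.
KNOWS = [
    (0, "Jennifer", "Gary", 1995),
    (1, "Gary", "Bill", 1998),
    (2, "Jennifer", "Michelle", 1990),
    (3, "Michelle", "Dan", 2005),   # fails since < 2000, so Dan is never reached
    (4, "Bill", "Jennifer", 1999),  # closes a cycle back to the start node
]

def reachable(start, max_hops=None):
    """Distinct nodes reachable over paths whose relationships all have since < 2000."""
    found = set()

    def expand(node, used, depth):
        if max_hops is not None and depth == max_hops:
            return
        for rel_id, a, b, since in KNOWS:
            if rel_id in used or node not in (a, b):
                continue  # a relationship may appear only once per path
            if since >= 2000:
                continue  # prune during expansion instead of filtering afterwards
            other = b if node == a else a
            found.add(other)
            expand(other, used | {rel_id}, depth + 1)

    expand(start, frozenset(), 0)
    return found

print(sorted(reachable("Jennifer")))
```

The optional `max_hops` argument plays the role of an upper bound such as `[:KNOWS*..5]`. Because of the cycle in the toy data, the start node itself also shows up as an end node.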

Related

neo4j query to find indirect paths runs extremely slow

MATCH (a:Author {id:'author_1'}),
(art:Article {id:'PMID:21473878'})
WITH a, art
MATCH r=((a)-[*2..4]-(art))
RETURN r
In a database with roughly 1.3 million nodes and 8 million relations this query runs forever. Is there anything I can do?
There are indexes on :Author and :Article id
===============
In this particular case, the query planner can sometimes use an inefficient approach when matching to patterns connecting two nodes you already know.
In this case, the planner takes one node as the start node, expands to all possible nodes from the given pattern, and then applies a filter on all of those nodes to see if it's the other node at the end of the match. This is unnecessary property access, especially with large numbers of matched nodes.
The better approach is for both your start and end node to be looked up via the index, then perform expansions from one of those nodes, and use a hash join to determine which of those end nodes is the same as the end node you're looking for. This approach only uses property access once when matching to the id of the end node in question (instead of for every single node found at the end of the expansion).
The trick right now is how to get Neo4j to use this approach in the planner. This may work:
MATCH (a:Author {id:'author_1'}),
(art:Article {id:'PMID:21473878'})
MATCH r=((a)-[*2..4]-(end))
WHERE end = art
RETURN r
At the least, I'd expect this to be about as fast as your approach using an OPTIONAL MATCH.
The two separate node matches are not necessary. Remove them and try:
MATCH r=(:Author {id:'author_1'})-[*2..4]-(:Article {id:'PMID:21473878'})
return r

Is neo4j suitable for searching for paths of specific length

I am a total newcomer in the world of graph databases. But let's put that aside.
I have a task to find a circular path of a certain length (or of any other measure) from a start point and back.
So for example, I need to find a path from one node and back which is 10 "nodes" long and at the same time has around 15 weights of some kind. This is just an example.
Is this somehow possible with neo4j, or is it even the right thing to use?
Hope I clarified it enough, and thank you for your answers.
Regards
Neo4j is a good choice for cycle detection.
If you need to find one path from n to n of length 10, you could try some query like this one:
MATCH p=(n:TestLabel {uuid: 1})-[rels:TEST_REL_TYPE*10]-(n)
RETURN p LIMIT 1
The match clause here is asking Cypher to find all paths from n to itself, of exactly 10 hops, using a specific relationship type. This is called variable length relationships in Neo4j. I'm using limit 1 to return only one path.
The resulting path can be visualized as a graph.
You can also specify a range of length, such as [*8..10] (from 8 to 10 hops away).
I'm not sure I understand what you mean with:
has around 15 weights of some kind
You can check relationship properties, such as weight, in variable-length paths if you need to. There is a specific example in the documentation.
Maybe you will also be interested in shortestPath() and allShortestPaths() functions, for which you need to know the end node as well as the start one, and you can find paths between them, even specifying the length.
Since you did not provide a data model, I will just assume that your starting/ending nodes all have the Foo label, that the relevant relationships all have the BAR type, and that your circular path relationships all point in the same direction (which should be faster to process, in general). Also, I gather that you only want circular paths of a specific length (10). Finally, I am guessing that you prefer circular paths with lower total weight, and that you want to ignore paths whose total weight exceeds a bounding value (15). This query accomplishes the above, returning the matching paths and their path weights, in ascending order:
MATCH p=(f:Foo)-[rels:BAR*10]->(f)
WITH p, REDUCE(s = 0, r IN rels | s + r.weight) AS pathWeight
WHERE pathWeight <= 15
RETURN p, pathWeight
ORDER BY pathWeight;
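As a rough illustration of what the REDUCE step computes, the following Python sketch enumerates directed cycles of a fixed length in a tiny in-memory graph and folds the relationship weights into a total. The data, the function name `cycles`, and the use of 3 hops instead of 10 are assumptions made to keep the example small; it is not how Neo4j evaluates the query.

```python
from functools import reduce

# Hypothetical directed BAR relationships: (start node, end node, weight).
BAR = [
    ("a", "b", 2), ("b", "c", 3), ("c", "a", 4),   # cycle a->b->c->a, weight 9
    ("a", "d", 9), ("d", "e", 9), ("e", "a", 9),   # cycle a->d->e->a, weight 27
]

def cycles(start, hops, max_weight):
    """Directed cycles of exactly `hops` relationships from `start` back to `start`."""
    results = []

    def expand(node, rels):
        if len(rels) == hops:
            if node == start:
                # Same fold as REDUCE(s = 0, r IN rels | s + r.weight)
                weight = reduce(lambda s, r: s + r[2], rels, 0)
                if weight <= max_weight:
                    results.append((weight, [r[0] for r in rels] + [start]))
            return
        for rel in BAR:
            if rel[0] == node and rel not in rels:  # no relationship reused in a path
                expand(rel[1], rels + [rel])

    expand(start, [])
    return sorted(results)  # ascending by weight, like ORDER BY pathWeight

print(cycles("a", hops=3, max_weight=15))
```

Unlike the Cypher query, which considers every Foo node as a possible start, this sketch takes a single start node as an argument.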

Optimizing Cypher Query

I am currently starting to work with Neo4j and its query language, Cypher.
I have multiple queries that follow the same pattern.
I am doing some comparison between a SQL database and Neo4j.
In my Neo4j database I have one type of label (person) and one type of relationship (FRIENDSHIP). The person has the properties personID, name, email, phone.
Now I want to find the friends of the n-th degree. I also want to filter out those persons that are also friends of a lower degree.
For example, if I search for friends of the 3rd degree, I want to filter out those that are also friends of the first and/or second degree.
Here my query type:
MATCH (me:person {personID:'1'})-[:FRIENDSHIP*3]-(friends:person)
WHERE NOT (me:person)-[:FRIENDSHIP]-(friends:person)
AND NOT (me:person)-[:FRIENDSHIP*2]-(friends:person)
RETURN COUNT(DISTINCT friends);
I found something similar somewhere.
This query works.
My problem is that this query pattern is much too slow if I search for a higher degree of friendship and/or if the number of persons grows.
So I would really appreciate it if someone could help me optimize this.
If you just wanted to handle depths of 3, this should return the distinct nodes that are 3 degrees away but not also less than 3 degrees away:
MATCH (me:person {personID:'1'})-[:FRIENDSHIP]-(f1:person)-[:FRIENDSHIP]-(f2:person)-[:FRIENDSHIP]-(f3:person)
RETURN apoc.coll.subtract(COLLECT(f3), COLLECT(f1) + COLLECT(f2) + me) AS result;
The above query uses the APOC function apoc.coll.subtract to remove the unwanted nodes from the result. The function also makes sure the collection contains distinct elements.
The following query is more general, and should work for any given depth (by just replacing the number after *). For example, this query will work with a depth of 4:
MATCH p=(me:person {personID:'1'})-[:FRIENDSHIP*4]-(:person)
WITH NODES(p)[0..-1] AS priors, LAST(NODES(p)) AS candidate
UNWIND priors AS prior
RETURN apoc.coll.subtract(COLLECT(DISTINCT candidate), COLLECT(DISTINCT prior)) AS result;
The problem with Cypher's variable-length relationship matching is that it's looking for all possible paths to that depth. This can cause unnecessary performance issues when all you're interested in are the nodes at certain depths and not the paths to them.
APOC's path expander using 'NODE_GLOBAL' uniqueness is a more efficient means of matching to nodes at inclusive depths.
When using 'NODE_GLOBAL' uniqueness, nodes are only ever visited once during traversal. Because of this, when we set the path expander's minLevel and maxLevel to be the same, the results are nodes at that level that are not present at any lower level, which is exactly the result you're trying to get.
Try this query after installing APOC:
MATCH (me:person {personID:'1'})
CALL apoc.path.expandConfig(me, {uniqueness:'NODE_GLOBAL', minLevel:4, maxLevel:4}) YIELD path
// a single path for each node at depth 4 but not at any lower depth
RETURN COUNT(path)
Of course you'll want to parameterize your inputs (personID, level) when you get the chance.
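The effect of visiting each node only once can be sketched in plain Python with a breadth-first search. This is only an illustration of the idea, assuming breadth-first expansion, and not a description of how APOC is implemented; the relationship data and the function name `friends_at_depth` are hypothetical.

```python
from collections import deque

# Hypothetical undirected FRIENDSHIP relationships between person IDs.
FRIENDSHIP = [
    ("1", "2"), ("1", "3"), ("2", "4"), ("3", "4"),
    ("4", "5"), ("2", "3"), ("5", "6"),
]

def friends_at_depth(start, level):
    """Nodes whose shortest distance from `start` is exactly `level`."""
    neighbours = {}
    for a, b in FRIENDSHIP:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    depth = {start: 0}          # each node is recorded (visited) at most once
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == level:
            continue            # no need to expand beyond the requested level
        for other in neighbours.get(node, ()):
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return {n for n, d in depth.items() if d == level}

print(friends_at_depth("1", 3))
```

Because a node is recorded the first time it is reached, a node that is reachable in fewer hops can never be reported at a deeper level, so no separate subtraction of lower-degree friends is needed.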

follow all relationships but specific ones

How can I tell cypher to NOT follow a certain relationship/edge?
E.g. I have a :NODE that is connected to another :NODE via a :BUDDY relationship. Additionally, every :NODE is related to :STUFF in arbitrary depth by arbitrary edges which are NOT of type :BUDDY. I now want to add a shortcut relation from each :NODE to its :STUFF. However, I do not want to include the :STUFF of its buddies.
(:NODE)-[:BUDDY]->(:NODE)
(:NODE)-[*]->(:STUFF)
My current query looks like this:
MATCH (n:Node)-[*]->(s:STUFF) WHERE NOT (n)-[:BUDDY]->()-[*]->(s) CREATE (n)-[:HAS]->(s)
However I have some issues with this query:
1) If I ever add a :BUDDY relationship not directly between :NODE but children of :NODE the query will use that relationship for matching. This might not be intended as I do not want to include buddies at all.
2) Explain tells me that neo4j does the match (:NODE)-[*]->(:STUFF) and then AntiSemiApply the pattern (n)-[:BUDDY]->(). As a result it matches the whole graph to then unmatch most of the found connections. This seems inefficient and the query runs slower than I like (however subjective this might sound).
One (bad) fix is to restrict the depth of (:NODE)-[*]->(:STUFF) via (:NODE)-[*..XX]->(:STUFF). However, I cannot guarantee that depth unless I use a ridiculously high number for worst case scenarios.
I'd actually just like to tell neo4j to just not follow a certain relationship. E.g. MATCH (n:NODE)-[ALLBUT(:BUDDY)*]->(s:STUFF) CREATE (n)-[:HAS]->(s). How can I achieve this without having to enumerate all allowed connections and connect them with a | (which is really fast - but I have to manually keep track of all possible relations)?
One option for this particular schema is to explicitly traverse past the point where the BUDDY relationship is a concern, and then do all the unbounded traversing you like from there. Then you only have to apply the filter to single-step relationships:
MATCH (n:Node) -[r]-> (leaf)
WHERE NOT type(r) = 'BUDDY'
WITH n, leaf
MATCH (leaf) -[*] -> (s:Stuff)
WITH n, COLLECT(DISTINCT leaf) AS leaves, COLLECT(DISTINCT s) AS stuff
RETURN n, [leaf IN leaves WHERE leaf:Stuff] + stuff AS stuffs
The other option is to install APOC and take a look at the path expander procedures, which allow you to 'blacklist' node labels (such as :Node) from your path query, which may work just as well depending on your graph.
My final solution is generating a string from the set of
{relations}\{relations_id_do_not_want}
and using that for matching. Since I am using an API and can generate this string automatically, it is not as much of an inconvenience as I feared, but it is still an inconvenience. However, this was the only solution I was able to find since posting this question.
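A minimal sketch of that generation step in Python, assuming the full list of relationship types is available to the client. The type names below are hypothetical, and `allowed_rel_pattern` is an illustrative helper, not part of any Neo4j API; in practice the list could come from something like CALL db.relationshipTypes().

```python
def allowed_rel_pattern(all_types, unwanted):
    """Build a variable-length relationship pattern listing every type except the unwanted ones."""
    allowed = sorted(set(all_types) - set(unwanted))
    if not allowed:
        raise ValueError("no relationship types left to traverse")
    return "[:" + "|".join(allowed) + "*]"

# Hypothetical relationship types present in the graph.
types = ["BUDDY", "OWNS", "CONTAINS", "USES"]
pattern = allowed_rel_pattern(types, ["BUDDY"])

# Same query shape as in the question, with the generated pattern substituted in.
query = f"MATCH (n:NODE)-{pattern}->(s:STUFF) CREATE (n)-[:HAS]->(s)"
print(query)
```

The sketch assumes plain type names without special characters; names that need quoting in Cypher would have to be escaped before being joined.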
You could use a condition on the relationship type:
MATCH (:Node)-[r]->(:OtherNode)
WHERE NOT type(r) = 'your_unwanted_rel_type'
I have no idea about the performance, though.

Neo4j - Cypher return 1 to 1 relationships

Using neo4j 1.9.2, I'm trying to find all nodes in my graph that have a one to one relationship to another node. Let's say I have persons in my graph and I would like to find all persons, that have exactly one friend (since 2013), and this one friend only has the other person as friend and no one else. As a return, I would like to have all these pairs of "isolated" friends.
I tried the following:
START n=node(*) MATCH n-[r:is_friend]-m-[s:is_friend]-n
WHERE r.since >= 2013 and s.since >= 2013
WITH n, m, count(r), count(s)
WHERE count(r) = 1 AND count(s) = 1
RETURN n, m
But this query does not do what it is supposed to do - it simply returns nothing.
Note: There exists just one relation between the two persons. So one friend has an incoming relationship and the other one an outgoing one. Also, these two persons might have some other relations, like "works_in" or so, but I just want to check if there is a 1:1 relation of type *is_friend* between the persons.
EDIT: The suggestion of Stefan works perfectly if using node(*) as starting point. But when trying this query for one specific node as start point (e.g. start n=node(42)), it doesn't work. What would the solution look like in this case?
Update: I'm still wondering about a solution for this scenario: How to check if a given start node has a 1-to-1 relation to another node of a specific relationship type. Any ideas?
Here it's crucial to understand the concept of paths in the MATCH clause. A path is an alternating sequence of node, relationship, node, relationship, ..., node. There is the constraint that the same relationship will never occur twice in the same path - otherwise there would be a danger of having endless loops.
That said, you need to decide if is_friend in your domain is directed. If it is directed you'd distinguish a being friend to b and b being friend to a. From the description I assume is_friend is undirected and the statement should look like:
START n=node(*) MATCH n-[r:is_friend]-()
WHERE r.since >= 2013
WITH n, count(r) as numberOfFriends
WHERE numberOfFriends=1
RETURN n
You don't have to care about the other end, it's traversed nonetheless since you do a node(*). Be aware that node(*) gets obviously more expensive when your graph grows.
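The query above returns single nodes that have exactly one friend. The additional check that the friend in turn has no other friends can be sketched in plain Python as follows. This only illustrates the counting logic on made-up data (the names and the helper `isolated_pairs` are hypothetical); it is not Cypher and says nothing about how Neo4j 1.9 would execute it.

```python
from collections import defaultdict

# Hypothetical is_friend relationships: (person, person, since). Direction is ignored.
IS_FRIEND = [
    ("ann", "bob", 2013),                         # isolated pair
    ("cat", "dan", 2014), ("dan", "eve", 2015),   # dan has two friends
    ("fay", "gus", 2010),                         # too old with the default filter
]

def isolated_pairs(rels, since=2013):
    """Pairs of people who are each other's only friend (friendship dated `since` or later)."""
    friends = defaultdict(set)
    for a, b, year in rels:
        if year >= since:
            friends[a].add(b)
            friends[b].add(a)

    pairs = set()
    for person, others in friends.items():
        if len(others) == 1:
            (other,) = others
            # The single friend must have this person as their only friend too.
            if friends[other] == {person}:
                pairs.add(tuple(sorted((person, other))))
    return sorted(pairs)

print(isolated_pairs(IS_FRIEND))
```

Each pair is stored in sorted order so that it is reported once rather than once per direction.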
