Cypher matched results difer with "with" and without "with" - neo4j

First of all sorry for the clumsy title, I couldn't think of a better way to expose it.
The issue is I get different results when querying cypher in a single match---result and when spliting it in a match --- with--- match ---result structure.
The match---result skips certain results.
My code:
match---result query
match (up:U)-[r1:COCS]->(op:O)-[r2:CCLS]->(jp:J)-[r3:PRE]->(n:J{id:"AC"})<-[j2o:CCLS]-(o:O)<-[o2u:COCS]-(u:U)
return up,type(r1), op, type(r2), jp, type(r3), n, type(j2o), o, type(o2u), u
Returns less results (there are results missing that match the path structure).
match--with---match---result query
match (up:U)-[r1:COCS]->(op:O)-[r2:CCLS]->(jp:J)-[r3:PRE]->(n:J{id:"AC"})
with up, r1, op, r2, jp, r3, n
match(n)<-[j2o:CCLS]-(o:O)<-[o2u:COCS]-(u:U)
return up,type(r1), op, type(r2), jp, type(r3), n, type(j2o), o, type(o2u), u
Returns the correct results
I do not understand why this is so. It makes no sense to me.
The way I understand how the with works, both should return the same results. Can someone throw some light?
This is with Neo4J 2.1.6
Thank you.

I can think of an explanation for this seemingly anomalous behavior.
To quote from the neo4j manual:
While pattern matching, Neo4j makes sure to not include matches where
the same graph relationship is found multiple times in a single
pattern. In most use cases, this is a sensible thing to do.
In your first query, the following sub-pattern appears twice (once on either side of the (n) node):
(:U)-[:COCS]->(:O)
Since the first query consists of a single pattern, Cypher would prevent the same COCS relationship from showing up twice in the same result row. In your case, this prevented some rows from showing up in the results.
Your second query splits the original query so that the above sub-pattern no longer appears twice in a single pattern. Therefore, you got "complete" results.
So, the lesson here is: if you use a pattern that repeats a relationship subpattern, make sure that you really intend to filter out those rows in which the same relationship instance shows up multiple times.

Related

Adding a property filter to cypher query explodes memory, why?

I'm trying to write a query that explores a DAG-type graph (a bill of materials) for all construction paths leading down to a specific part number (second MATCH), among all the parts associated with a given product (first MATCH). There is a strange behavior I don't understand:
This query runs in a reasonable time using Neo4j community edition (~2 s):
WITH '12345' as snid, 'ABCDE' as pid
MATCH (m:Product {full_sn:snid})-[:uses]->(p:Part)
WITH snid, pid, collect(p) AS mparts
MATCH path=(anc:Part)-[:has*]->(child:Part)
WHERE ALL(node IN nodes(path) WHERE node IN mparts)
WITH snid, path, relationships(path)[-1] AS rel,
nodes(path)[-2] AS parent, nodes(path)[-1] AS child
RETURN stuff I want
However, to get the query I want, I must add a filter on the child using the part number pid in the second MATCH statement:
MATCH path=(anc:Part)-[:has*]->(child:Part {pn:pid})
And when I try to run the new query, neo4j browser compains that there is not enough memory. (Neo.TransientError.General.OutOfMemoryError). When I run it with EXPLAIN, the db hits are exploding into the 10s of billions, as if I'm asking it for a massive cartestian product: but all I have done is added a restriction on the child, so this should be reducing the search space, shouldn't it?
I also tried adding an index on :Part(pn). Now the profile shown by EXPLAIN looks very efficient, but I still have the same memory error.
If anyone can help me understand why this change between the two queries is causing problems, I'd greatly appreciate it!
Best wishes,
Ben
MATCH path=(anc:Part)-[:has*]->(child:Part)
The * is exploding to every downstream child node.
That's appropriate if that is what's desired. If you make this an optional match and limit to the collect items, this should restrict the return results.
OPTIONAL MATCH path=(anc:Part)-[:has*]->(child:Part)
This is conceptionally (& crudely) similar to an inner join in SQL.

What does a comma in a Cypher query do?

A co-worker coded something like this:
match (a)-[r]->(b), (c) set c.x=y
What does the comma do? Is it just shorthand for MATCH?
Since Cypher's ASCII-art syntax can only let you specify one linear chain of connections in a row, the comma is there, at least in part, to allow you to specify things that might branch off. For example:
MATCH (a)-->(b)<--(c), (b)-->(d)
That represents three nodes which are all connected to b (two incoming relationships, and one outgoing relationship.
The comma can also be useful for separating lines if your match gets too long, like so:
MATCH
(a)-->(b)<--(c),
(c)-->(d)
Obviously that's not a very long line, but that's equivalent to:
MATCH
(a)-->(b)<--(c)-->(d)
But in general, any MATCH statement is specifying a pattern that you want to search for. All of the parts of that MATCH form the pattern. In your case you're actually looking for two unconnected patterns ((a)-[r]->(b) and (c)) and so Neo4j will find every combination of each instance of both patterns, which could potentially be very expensive. In Neo4j 2.3 you'd also probably get a warning about this being a query which would give you a cartesian product.
If you specify multiple matches, however, you're asking to search for different patterns. So if you did:
MATCH (a)-[r]->(b)
MATCH (c)
Conceptually I think it's a bit different, but the result is the same. I know it's definitely different with OPTIONAL MATCH, though. If you did:
MATCH (a:Foo)
OPTIONAL MATCH (a)-->(b:Bar), (a)-->(c:Baz)
You would only find instances where there is a Foo node connected to nothing, or connected to both a Bar and a Baz node. Whereas if you do this:
MATCH (a:Foo)
OPTIONAL MATCH (a)-->(b:Bar)
OPTIONAL MATCH (a)-->(c:Baz)
You'll find every single Foo node, and you'll match zero or more connected Bar and Baz nodes independently.
EDIT:
In the comments Stefan Armbruster made a good point that commas can also be used to assign subpatterns to individual identifiers. Such as in:
MATCH path1=(a)-[:REL1]->(b), path2=(b)<-[:REL2*..10]-(c)
Thanks Stefan!
EDIT2: Also see Mats' answer below
Brian does a good job of explaining how the comma can be used to construct larger subgraph patterns, but there's also a subtle yet important difference between using the comma and a second MATCH clause.
Consider the simple graph of two nodes with one relationship between them. The query
MATCH ()-->() MATCH ()-->() RETURN 1
will return one row with the number 1. Replace the second MATCH with a comma, however, and no rows will be returned at all:
MATCH ()-->(), ()-->() RETURN 1
This is because of the notion of relationship uniqueness. Inside each MATCH clause, each relationship will be traversed only once. That means that for my second query, the one relationship in the graph will be matched by the first pattern, and the second pattern will not be able to match anything, leading to the whole pattern not matching. My first query will match the one relationship once in each of the clauses, and thus create a row for the result.
Read more about this in the Neo4j manual: http://neo4j.com/docs/stable/cypherdoc-uniqueness.html

Neo4j- incorrect count in multiple match query

When I am trying to execute this query
match(u:User)-[ro:OWNS]->(p:PushDevice) where p.type='gcm'
match(com:Comment)
return count(com) as total_comments,count(ro) as device
this is returning the same number in both total_comments and device which is the number of total comment.
I feel like your query should work, though I'm more confident that this will work:
MATCH (u:User)-[ro:OWNS]->(p:PushDevice) WHERE p.type='gcm'
WITH count(ro) AS device
MATCH (com:Comment)
RETURN count(com) as total_comments, device
Your query is generating a row for every combination of your MATCH results. If you just returned the ro and com values, this would be more clear. See this console for an example. That console has 2 comments and a single OWNS relationship, but the result shows 2 rows (both rows have the same OWNS relationship). So, your query is essentially counting the number of rows -- not what you expected.
Here is an example of a query that would work as you you expected:
MATCH (u:User)-[ro:OWNS]->(p:PushDevice {type:'gcm'})
WITH COUNT(ro) AS device
MATCH (com:Comment)
RETURN count(com) AS total_comments, device;
[EDITED]
This would also work logically, but is less performant (as it takes a cartesian product and then filters out duplicates):
MATCH (u:User)-[ro:OWNS]->(p:PushDevice { type: 'gcm' })
MATCH (com:Comment)
RETURN COUNT(DISTINCT com), COUNT(DISTINCT ro);
Observation
The power of neo4j comes from its efficient handling of relationships. So, the most efficient queries tend to be for connected subgraphs (where all nodes are connected by relationships).
Since your query is not for a single connected subgraph, getting the answer you want is naturally going to be a bit more convoluted and can be inefficient.
If you determine that the suggested queries are too slow, you can try making 2 separate queries instead. That may also make make your code easier to understand.

Cypher directionless query not returning all expected paths

I have a cypher query that starts from a machine node, and tries to find nodes related to it using any of the relationship types I've specified:
match p1=(n:machine)-[:REL1|:REL2|:REL3|:PERSONAL_PHONE|:MACHINE|:ADDRESS*]-(n2)
where n.machine="112943691278177215"
optional match p2=(n2)-[*]->()
return p1,p2
limit 300
The optional match clause is my attempt to traverse outwards in my model from each of the nodes found in p1. The below screenshot shows the part of the results I'm having issues with:
You can see from the starting machine node, it finds a personal_phone node via two app nodes related to the machine. For clarification, this part of the model is designed like so:
So it appeared to be working until I realized that certain paths were somehow being left out of the results. If I run a second query showing me all apps related to that particular personal_phone node, I get the following:
match p1=(n:personal_phone)<-[*]-(n2)
where n.personal_phone="(xxx) xxx-xxxx"
return p1
limit 100
The two apps I have segmented out, are the two apps shown in the earlier image.
So why doesn't my original query show the other 7 apps related to the personal_phone?
EDIT : Despite the overly broad optional match combined with the limit 300 statement, the returned results show only 52 nodes and 154 rels. This is because the paths following relationships with an outward direction are going to stop very quickly. I could have put a max 2 on it but was being lazy.
EDIT 2: The query I finally came up with to give me what I want is this:
match p1=(m:machine)<-[:MACHINE]-(a:app)
where m.machine="112943691278177215"
optional match p2=(a:app)-[:REL1|:REL2|:REL3|:PERSONAL_PHONE|:MACHINE|:ADDRESS*0..3]-(n)
where a<>n and a<>m and m<>n
optional match p3=(n)-[r*]->(n2)
where n2<>n
return distinct n, r, n2
This returns 74 nodes and 220 rels which seems to be the correct result (387 rows). So it seems like my incredibly inefficient query was the reason the graph was being truncated. Not only were the nodes being traversed many times, but the paths being returned contained duplicate information which consumed the limited rows available for return. I guess my new questions are:
When following multiple hops, should I always explicitly make sure the same nodes aren't traversed via where clauses?
If I was to return p3 instead, it returns 1941 rows to display 74 nodes and 220 rels. There seems to be a lot of duplication present. Is it typically better to use return distinct (like I have above) or is there a way to easily dedupe the nodes and relationships within a path?
So part of your issue here (updated questions) is that you're returning paths, and not individual nodes/relationships.
For example, if you do MATCH p=(n)-[*]-() and your data is A->B->C->D then the results you'll get will be A->B, A->B->C, A->B->C->D and so on. If on the other hand you did MATCH (n)-[r:*]-(m) and then worked with r and m, you could get the same data, but deal with the distinct things on the path rather than have to sort that out later.
It seems you want the nodes and relationships, but you're asking for the paths - so you're getting them. ALL of them. :)
When following multiple hops, should I always explicitly make sure the
same nodes aren't traversed via where clauses?
Well, the way you did it, yes -- but honestly I haven't ever run into that problem before. Part of the issue again is the overly-broad query you're running. Lacking any constraint, it ends up roping in the items you've already matched, which buys you this problem. Perhaps better would be to match some set of possible labels, to narrow your query down. By narrowing it down, you wouldn't have the same issue, for example something like:
MATCH (n)-[r:*]-(m)
WHERE 'foo' in labels(m) or 'bar' in labels(m)
RETURN n, r, m;
Note we're not doing path matching, and we're specifying some range of labels that could be m, without leaving it completely wild-west. I tend to formulate queries this way, so your question #2 never really arises. Presumably you have a reasonable data model that would act as your grounding for that.

Clarification on multiple MATCH patterns in a Cypher query

In the below query, does the 2nd match pattern john-[r?:HAS_SEEN]->(movie) run on the result of the first match john-[:IS_FRIEND_OF]->(user)-[:HAS_SEEN]->(movie) . I am trying to understand if this is similar to the unix pipe concept i.e. the result of the 1st pattern is the input to the 2nd pattern.
start john=node(1)
match
john-[:IS_FRIEND_OF]->(user)-[:HAS_SEEN]->(movie),
john-[r?:HAS_SEEN]->(movie)
where r is null
return movie;
I don't think I would compare multiple MATCH clauses to the UNIX pipes concept. Using multiple, comma-separated matches is just a way of breaking out of the 1-dimensional constraint of writing relationships with a single sentence. For example, the following is completely valid:
MATCH a--b,
b--c,
c--d,
d--e,
a--c
At the very end I went back and referenced a and c even though they weren't used in the clause directly before. Again, this is just a way of drawing 2 dimensions' worth of relationships by only using 1-dimensional sentences. We're drawing a 2-dimensional picture with several 1-dimensional pieces.
On a side note, I WOULD compare the WITH clause to UNIX pipes -- I'd call them analogous. WITH will pipe out any results it finds into the next set of clauses you give it.
Yes, simply think of these two matches to be one - i.e.
match (movie)<-[r?:HAS_SEEN]-john-[:IS_FRIEND_OF]->(user)-[:HAS_SEEN]->(movie)
or
match john-[:IS_FRIEND_OF]->(user)-[:HAS_SEEN]->(movie)<-[r?:HAS_SEEN]-john

Resources