How to find the nodes that overlap between two sets of nodes - neo4j

I have a directed graph, and for a given node N, I want to find the nodes that have inbound relationships to N but no outbound relationships from N. It seems like it should be a simple thing, but I'm having trouble wrapping my head around the query.
So far I've got:
start n=node({id}) match (n)<-[:RELTYPE]-inbound
but can't figure out how to structure the rest of the clause. I'm feeling rather stupid. I could, of course, just do two queries and perform the calculation in my Java code, but it seems like there should be a query that would do the job more efficiently.
Thanks!

Never mind, I'm an idiot.
start n=node({id}) match n<-[:RELTYPE]-someone where not n-[:RELTYPE]->someone return someone;
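The WHERE NOT clause above amounts to a set difference: the nodes with a relationship into N, minus the nodes N has a relationship to. A minimal Python sketch of the same idea, using a hypothetical in-memory edge list (the node names and the `inbound_only` helper are made up for illustration and are not part of any Neo4j API):

```python
# Hypothetical toy graph: each pair is (source, target) for one RELTYPE relationship.
edges = [("a", "n"), ("b", "n"), ("n", "b"), ("n", "c")]


def inbound_only(edges, node):
    """Return nodes with a relationship into `node` that `node` does not point back to."""
    inbound = {src for src, dst in edges if dst == node}
    outbound = {dst for src, dst in edges if src == node}
    return inbound - outbound


print(inbound_only(edges, "n"))  # {'a'}
```

Here "b" is dropped because "n" also points to it, which is what the `where not n-[:RELTYPE]->someone` filter does inside the query.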

Related

Adding a property filter to cypher query explodes memory, why?

I'm trying to write a query that explores a DAG-type graph (a bill of materials) for all construction paths leading down to a specific part number (second MATCH), among all the parts associated with a given product (first MATCH). There is a strange behavior I don't understand:
This query runs in a reasonable time using Neo4j community edition (~2 s):
WITH '12345' as snid, 'ABCDE' as pid
MATCH (m:Product {full_sn:snid})-[:uses]->(p:Part)
WITH snid, pid, collect(p) AS mparts
MATCH path=(anc:Part)-[:has*]->(child:Part)
WHERE ALL(node IN nodes(path) WHERE node IN mparts)
WITH snid, path, relationships(path)[-1] AS rel,
nodes(path)[-2] AS parent, nodes(path)[-1] AS child
RETURN stuff I want
However, to get the query I want, I must add a filter on the child using the part number pid in the second MATCH statement:
MATCH path=(anc:Part)-[:has*]->(child:Part {pn:pid})
When I try to run the new query, the Neo4j browser complains that there is not enough memory (Neo.TransientError.General.OutOfMemoryError). When I run it with EXPLAIN, the db hits explode into the tens of billions, as if I were asking for a massive Cartesian product. But all I have done is add a restriction on the child, so this should be reducing the search space, shouldn't it?
I also tried adding an index on :Part(pn). Now the profile shown by EXPLAIN looks very efficient, but I still have the same memory error.
If anyone can help me understand why this change between the two queries is causing problems, I'd greatly appreciate it!
Best wishes,
Ben
MATCH path=(anc:Part)-[:has*]->(child:Part)
The * is exploding to every downstream child node.
That's appropriate if that is what's desired. If you make this an optional match and limit it to the collected items, this should restrict the returned results.
OPTIONAL MATCH path=(anc:Part)-[:has*]->(child:Part)
This is conceptually (and crudely) similar to an outer join in SQL.

Matching immediate neighbors of a node without returning relationships between neighbors

I am attempting to return the immediate (outgoing) neighbors of a node, without the relationships between those nodes. I've tried many, many queries, including these:
MATCH (k:Record{name: 'First'})-->(m)
RETURN k,m
and
MATCH (:Record{name: 'First'})-[r]->()
RETURN r
but I always get all the relationships between the neighbors as well, which is not relevant for my application, and needlessly clutters the visualization. Is there an easy way to avoid the return of these relationships? I have read a great deal in the last 2 days but have been unable to make this work.
Note: the question "Neo4j - Cypher query for finding neighbourhood graph" seems to imply that either of these should work, because in that question he's asking for exactly what I get: the neighborhood graph. This makes me think that there's something wrong with my install or the creation of my relationships.
Edit: This is a duplicate of "is it possible to hide a Node in the neo4j browser once it has been shown". Thanks to Dave Bennett for the answer.

Is it the optimal way of expressing "go through all nodes" queries in Cypher?

I have quite a large social graph in which I execute global queries like this one:
match (n:User)-[r:LIKES]->(k:User)
where not (k:User)-[]->(n:User)
return count(r);
They take a lot of time and memory, so I am curious whether they are expressed in an optimal way. I have a feeling that when I execute such a query, Cypher first matches everything that fits the expression (which takes a lot of memory) and only then starts to count. I would rather go through every node, check the pattern, and update the counter if necessary. That way such queries would not require a lot of memory. So how is such a query actually executed? If it is not optimal, is there a way to make it better (in Cypher)?
If you used the query just as you wrote it, you may not be getting what you think you are. Putting labels on node "variables" can cause them to be treated as fresh (partial) patterns instead of bound nodes. Is your query any faster if you use
MATCH (n:User)-[r:LIKES]->(k:User)
WHERE NOT (n)<--(k)
RETURN count(r)
Here's how this works (not considering internal optimizations, which I don't begin to understand).
For each User node, every outgoing LIKES relationship is followed. If the other end of the LIKES relationship is a User node, the two nodes and the relationship are bound to the names n, k, and r and passed to the WHERE clause. Every outgoing relationship on the bound k node is then tested to see if it connects to the bound n node. If no such relationship is found, the match is considered successful. The count() function in the RETURN clause counts the resulting collection of relationships that were passed from the match.
If you have a densely connected graph, and particularly if there are many relationships between nodes other than the LIKES relationships, this can be quite an extensive search.
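The per-relationship check described above can be sketched in plain Python. This is only an illustration of the logic, not how Neo4j executes the query internally; the triples and the `count_unreciprocated_likes` helper are hypothetical, and every node is assumed to be a User.

```python
from collections import defaultdict

# Hypothetical toy data: (source, relationship type, target).
relationships = [
    ("ann", "LIKES", "bob"),
    ("bob", "LIKES", "ann"),
    ("ann", "LIKES", "cat"),
    ("cat", "FOLLOWS", "dan"),
    ("dan", "LIKES", "cat"),
]


def count_unreciprocated_likes(relationships):
    # Index the outgoing neighbours of every node, whatever the relationship
    # type, mirroring the untyped pattern (n)<--(k).
    outgoing = defaultdict(set)
    for src, _rel_type, dst in relationships:
        outgoing[src].add(dst)

    count = 0
    for n, rel_type, k in relationships:
        if rel_type != "LIKES":
            continue  # only LIKES relationships are candidates
        if n not in outgoing.get(k, set()):
            count += 1  # k has no relationship back to n
    return count


print(count_unreciprocated_likes(relationships))  # 1
```

Only ann's LIKES to cat is counted. dan's LIKES to cat is not, because cat has a FOLLOWS relationship back to dan and the pattern does not restrict the relationship type. The counter is updated one relationship at a time, so nothing but the running total has to be kept once the lookup index exists.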
As a further experiment, you might try changing the WHERE clause to read
WHERE NOT (k)-->(n)
and see if it makes any difference. I don't think it will, but I could be wrong.

Neo4j with all relations between all nodes

I'm parsing a cypher query to a .gexf (xml) file. Entering this query in the Neo4j admin gui returns all nodes with their interconnecting relationships (relations between all b-nodes)
START a=node(52681) MATCH(a)-[r]-(b) RETURN a,r,b
The neo4j webgui seems to make its own queries, since it draws all the relationships between the b-nodes and not just those between the a and b-nodes. The JSON response contains no data from which I can build an xml file with the relationships between the b-nodes.
I've resolved this so far by doing a separate query for each and every b-node:
MATCH (a)-[r]-(b) WHERE id(a)=52681 AND id(b)=12345
But that doesn't seem like very good design... I would like to get this done in one query only.
Also, I tend to overcomplicate things.
I don't think there's an easy/efficient way to do this.
Consider that the paths between each pair of nodes are likely variable in size, and therefore something like (a)-[r]-(b) will only get you the results you want if a and b are both one degree away.
If they are, however, all only one degree away (and assuming no self-loops, which would be easy enough to take care of anyway), something like
MATCH (a)-[r]-(b) RETURN a, r, b
...would likely do the trick, albeit in a horribly inefficient fashion. But if your paths between a and b are > 1 level deep, it obviously won't work.
In that case, something like this might work, but again be horrible:
MATCH (a)-[r*]-(b) RETURN a, r, b
...but if the depth of your paths is anything more than a few levels, well...ouch.
When you start asking questions of the graph that span the entire graph and require working/traversing the entirety of it, the kinds of questions you're asking start to blow up a bit.
So, likely, the resolution you came up with is probably the only way to really tackle this.
That said, I'd love to know if anyone else has a different take on this.
HTH, if only a bit.
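For what it's worth, the result the question is after (the relationships among the b-nodes themselves) can be described as: collect the neighbours of a, then keep every relationship whose two endpoints both lie in that set. A rough Python sketch over a hypothetical edge list, with made-up node ids and a made-up `neighbourhood_edges` helper; it filters client-side data and makes no claim about how to phrase this in a single Cypher query:

```python
# Hypothetical undirected view of the graph: each pair is one relationship.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (1, 5)]


def neighbourhood_edges(edges, centre):
    """Return the relationships that connect two direct neighbours of `centre`."""
    # Direct neighbours: the other endpoint of every relationship touching `centre`.
    neighbours = {b if a == centre else a for a, b in edges if centre in (a, b)}
    neighbours.discard(centre)  # ignore self-loops on the centre node
    return sorted((a, b) for a, b in edges if a in neighbours and b in neighbours)


print(neighbourhood_edges(edges, 1))  # [(2, 3)]
```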

Returning multiple nodes in cypher with Index lookup

I have the following cypher query being called multiple times.
start n=node:MyIndex(Name="ABC")
return n
Then somewhere else in the code
start m=node:MyIndex(NAME="XYZ")
return m
My database is hosted in Azure, so I am having latency/performance issues. To speed things up and avoid multiple round trips, I thought about combining multiple Cypher queries into a single one.
Actually, I am getting 10+ nodes in lookup but for simplicity I have decided to show example with just two nodes below.
start n=node:MyIndex(Name="ABC"), m=node:MyIndex(NAME="XYZ")
return n, m
My goal is to get what I can in one round trip instead of 10+. It works if the index lookup succeeds for all nodes. However, the Cypher query returns zero rows if even one index lookup fails. I was hoping I would get a NULL equivalent in n or m for the missing node. However, no luck.
Please suggest what I am doing wrong and any workarounds to reduce the round trips. Many thanks!
You can use a parametrized query with lucene syntax, e.g.:
START n=node:MyIndex({query}) return n
and parametrize with
{'query':'Name:(ABC XYZ)'}
where the value in parentheses is a string of the names you are looking for, separated by spaces.
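A small sketch of how the parameter value could be assembled from a list of names. The `build_index_query` helper is hypothetical, and it assumes the names contain no spaces or Lucene special characters, which would need quoting or escaping:

```python
def build_index_query(field, names):
    """Build a Lucene query string such as 'Name:(ABC XYZ)'."""
    # Assumption: each name is a single plain term (no spaces, no special characters).
    return "{}:({})".format(field, " ".join(names))


# Parameter map to send along with: START n=node:MyIndex({query}) return n
params = {"query": build_index_query("Name", ["ABC", "XYZ"])}
print(params)  # {'query': 'Name:(ABC XYZ)'}
```

Because this is one lookup that returns a row per matching node, a name that is missing from the index simply produces no row instead of emptying the whole result.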
