I have an Neo4j Cypher query I struggling with. What I want is to have all nodes with a given relationship, until it reach a relation with an properties who is below a given number ( < 25). All nodes after that relationship is found should be skipped.
Here a sample of my graph:
and how I would like the result to be (every nodes after relationship IS_OWNER with Share < 25 should be skipped)
I hope someone can show how to write the Cypher query who can give the desired result (picture 2) from the sample graph (picture 1). Every nodes after relationship IS_OWNER with Share < 25 should be skipped.
The ALL() function should work here, letting you specify that all relationships in the path should have Share >= 25, and pruning further expansion otherwise.
Something like this, assuming starting at company C:
MATCH (:Company{name:'C'})-[rels:IS_OWNER*0..]-(companies)
WHERE all(rel in rels WHERE rel.Share >= 25)
RETURN companies
EDIT:
While it seems like using a variable on the variable-length relationship is deprecated in the newer versions of neo4j (I'll double-check on that, doesn't seem like a good decision), here's another way to specify this which gets around the warning:
MATCH p = (:Company{name:'C'})-[:IS_OWNER*0..]-(companies)
WHERE all(rel in relationships(p) WHERE rel.Share >= 25)
RETURN companies
Related
I am a newbie who just started learning graph database and I have a problem querying the relationships between nodes.
My graph is like this:
There are multiple relationships from one node to another, and the IDs of these relationships are different.
How to find relationships where the number of relationships between two nodes is greater than 2,or is there a problem with the model of this graph?
Just like on the graph, I want to query node d and node a.
I tried to use the following statement, but the result is incorrect:
match (from)-[r:INVITE]->(to)
with from, count(r) as ref
where ref >2
return from
It seems to count the number of relations issued by all from, not the relationship between from-->to.
to return nodes who have more then 2 relationship between them you need to check the size of the collected rels. something like
MATCH (x:Person)-[r:INVITE]-(:Party)
WITH x, size(collect(r)) as inviteCount
WHERE inviteCount > 2
RETURN x
Aggregating functions like COLLECT and COUNT use non-aggregating terms in the same WITH (or RETURN) clause as "grouping keys".
So, here is one way to get pairs of nodes that have more than 2 INVITE relationships (in a specific direction) between them:
MATCH (from)-[r:INVITE]->(to)
WITH from, to, COUNT(r) AS ref
WHERE ref > 2
RETURN from, to
NOTE: Ideally (for clarity and efficiency), your nodes would have specific labels and the MATCH pattern would specify those labels.
Given that I'm very new to Neo4j. I have a schema which looks like the below image:
Here Has nodes are different for example Passport, Merchant, Driving License, etc. and also these nodes are describing the customer node (looking for future scope of filtering customers based on these nodes).
SIMILAR is a self-relation meaning there exists a customer with ID:1 is related to another customer with ID:2 with a score of 2800.
I have the following questions:
Is this a good schema given the condition of the future scope I mentioned above, or getting all the properties in a single customer node is viable? (Different nodes may have array of items as well, for example: ()-[:HAS]->(Phone) having {active: "+91-1231241", historic_phone_numbers: ["+91-121213", "+91-1231421"]})
I want to get the customer along with describing nodes in relation to other customers. For that, I tried the below query (w/o number of relation more than 1):
// With number_of_relation > 1
MATCH (searched:Customer)-[r:SIMILAR]->(matched:Customer)
WHERE r.score > 2700
WITH searched, COLLECT(matched.customer_id) AS MatchedList, count(r) as cnt
WHERE cnt > 1
UNWIND MatchedList AS matchedCustomer
MATCH (person:Customer {customer_id: matchedCustomer})-[:HAS|:LIVES_IN|:IS_EMPLOYED_BY]->(related)
RETURN searched, person, related
Result what I got is below, notice one customer node not having its describing nodes:
// without number_of_relation > 1
// second attempt - for a sample customer_id
MATCH (matched)<-[r:SIMILAR]-(c)-[:HAS|:LIVES_IN|:IS_EMPLOYED_BY]->(b)
WHERE size(keys(b)) > 0
AND c.customer_id = "1b093559-a39b-4f95-889b-a215cac698dc"
AND r.score > 2700
RETURN b AS props, c AS src_cust, r AS relation, matched
Result I got are below, notice related nodes are not having their describing nodes:
If I had two describing nodes with some property (some may have a list) upon which I wanted to query and build the expected graph specified in point 2 above, how can I do that?
I want the database to find a similar customer given the describing nodes. Example: A customer {name: "Dave"} has phone {active_number: "+91-12345"} is similar to customer {name: "Mike"} has phone {active_number: "+91-12345"}. How can get started with this?
If something is unclear, please ask. I can explain with examples.
[EDITED]
Yes, the schema seems fine, except that you should not use the same HAS relationship type between different node label pairs.
The main problem with your first query is that its top MATCH clause uses a directional relationship pattern, ()-->(), which does not allow all Customer nodes to have a chance to be the searched node (because some nodes may only be at the tail end of SIMILAR relationships). This tweaked query should work better:
MATCH (searched:Customer)-[r:SIMILAR]-(matched:Customer)
WHERE r.score > 2700
WITH searched, COLLECT(matched) AS matchedList
WHERE SIZE(matchedList) > 1
UNWIND matchedList AS person
MATCH (person)-[:HAS|LIVES_IN|IS_EMPLOYED_BY]->(pDesc)
WITH searched, person, COLLECT(pDesc) AS personDescribers
MATCH (searched)-[:HAS|LIVES_IN|IS_EMPLOYED_BY]->(sDesc)
RETURN searched, person, personDescribers, COLLECT(sDesc) AS searchedDescribers
It's not clear what you want are trying to do.
To get all Customers who have the same phone number:
MATCH (c:Customer)-[:HAS_PHONE]-(p:Phone)
WHERE p.activeNumber = '+91-12345'
WITH p.activeNumber AS phoneNumber, COLLECT(c) AS customers
WHERE SIZE(customers) > 1
RETURN phoneNumber, customers
So let's say we have User nodes, Company nodes, Project nodes, School nodes and Event nodes. And there are the following relationships between these nodes
(User)-[:WORKED_AT {start: timestamp, end:timestamp}]->(Company)
(User)-[:COLLABORATED_ON]->(Project)
(Company)-[:COLLABORATED_ON]->(Project)
(User)-[:IS_ATTENDING]->(Event)
(User)-[:STUDIED_AT]->(School)
I am trying to recommend users to any given user. My starting query looks like this
MATCH p=(u:User {id: {leftId}})-[r:COLLABORATED_ON|:AUTHORED|:WORKED_AT|:IS_ATTENDING|:STUDIED_AT*1..3]-(pymk:User)
RETURN p
LIMIT 24
Now this returns me all the pymk users within 1 to 3 relationships away, which is fine. But I want to filter the path according to the relationship attributes. Like remove the following path if the user and pymk work start date and end date is not overlapping.
(User)-[:WORKED_AT]->(Company)<-[:WORKED_AT]-(User)
I can do this with single query
MATCH (u:User)-[r1:WORKED_AT]->(Company)<-[r2:WORKED_AT]-(pymk:User)
WHERE
(r1.startedAt < r2.endedAt) AND (r2.startedAt < r1.endedAt)
RETURN pymk
But couldn't get my head around doing it within a collection of paths. I don't even know if this is possible.
Any help is appreciated.
This should do the trick:
MATCH p=(:User {id: {leftId}})-[:COLLABORATED_ON|:AUTHORED|:WORKED_AT|:IS_ATTENDING|:STUDIED_AT*1..3]-(:User)
WITH p, [rel in relationships(p) WHERE type(rel) = 'WORKED_AT'] as worked
WHERE size(worked) <> 2 OR
apoc.coll.max([work in worked | work.startedAt]) < apoc.coll.min([work in worked | work.endedAt])
RETURN p
LIMIT 24
We're using APOC here to get the max and min of a collection (the max() and min() aggregation functions in just Cypher are aggregation functions across rows, and can't be used on lists).
This relies on distilling the logic of overlapping down to max([start times]) < min([end times]), which you can check out in this highly popular answer here
I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into
is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN
> fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it requires 12 db hits although all nodes/ relationships are already retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
Also I have tested different versions of same above query, but the same Query Profile result is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.), queries are handled by the query planner and then turned into graph operations by neo4j's internals. It make sense that the query planner will find what is likely to be the most efficient way to search the graph (hence why we love neo), and so just because a cypher query is written one way, it won't necessarily search the graph in the way we imagine it will in our head.
The documentation on this seemed a little sparse (or, rather I couldn't find it properly), any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label, and value x for the index mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationshop to a node m with the :Consumer label, and value y for its proprerty mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use an expand into relationship, make the requirement for the relationship optional, or use explicitly iterative execution. Both these queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
I'm trying to learn neo4j on my own and I would like to ask how I can find all the nodes that satisfy a specific condition. Concretely, let's say that I have a graph and the only relation connecting the nodes is the relation :FOLLOWS.
How can I find all the nodes of the graph that have at least k "followers"?
If I would like to find exactly find all the nodes with exactly 2 followers I would do
MATCH (a)-[:FOLLOWS]->(n), (b)-[:FOLLOWS]->(n)
RETURN n
Either way, the above method seems pretty tedious when it comes to finding all the nodes that have k "followers".
You'll use predicates in the WHERE clause to add restrictions to narrow down to the data you're interested in.
You can find the number of occurrences of a pattern with the SIZE() function.
Also, using labels on your nodes should help speed up your queries. For this example, I'll assume that :Person is the label for your nodes.
MATCH (p:Person)
WHERE SIZE( (p)<-[:FOLLOWS]-(:Person) ) = 2
RETURN p
You can use inequalities to satisfy your "at least" condition. If you have a between condition, you can chain your inequalities like so:
MATCH (p:Person)
WHERE 5 < SIZE( (p)<-[:FOLLOWS]-(:Person) ) <= 10
RETURN p
which will give you persons with between 6 and 10 followers (inclusive).