Mutual Friends of Friends Neo4j Cypher - neo4j

I am new in Neo4j and Cypher and writing on my BA-Thesis in which I compare a RDBMS against Neo4j Graph Database in case of social networks. I´ve defined some queries in SQL and Cypher for a Performance Test over JDBC and REST API in JMETER. However, I have a problem declaring the Cypher query to get the Nodes which are the mutual friends of friends for a certain Node.
My first approach was like so:
MATCH (me:Enthusiast {Id: 488})-[:abonniert]->(f:Enthusiast)-[:abonniert]->(fof:Enthusiast)<-[:abonniert]-(f) RETURN o

I guess you're pretty close with your Cypher statement. I assume that "mutual friend on 2nd degree" means that I'm mutual friend with someone the target is mutual friend as well?
If so (shortening labels and relationship types for readbility):
MATCH
(me:En {Id: 488})-[:abonniert]->(f:En)-[:abonniert]->(fof:En),
(fof)-[:abonniert]->(f)-[:abonniert]->me
RETURN fof

it would be nice if you can create an example scenario at http://console.neo4j.org/ .
i would also omit the relationships direction.
MATCH (me:Enthusiast {Id: 488})-[:abonniert]->(f:Enthusiast),
(f)-[:abonniert]-(x:Enthusiast)-[:aboniert]-(y:Enthusiast)
WHERE f--y AND Id(y) <> 488
RETURN f, y, count(x) as NrMutFr
edit
try this console query, works for the scenario: http://console.neo4j.org/r/tws07k
my above query would in that case be
MATCH (me:Enthusiast {Id: 488})-[:abonniert]->(f:Enthusiast),
(f)-[:abonniert]->(x:Enthusiast)<-[:aboniert]-(y:Enthusiast)
WHERE me--y
RETURN f, y, count(x) as NrMutFr
the difference between your posted question query is that you must finish the last node with a new substitute y and not f. than also, if necessary, again match that y with starting me node

Once you've matched your friends you should be able to express the rest of the query as a path predicate: match "my friends", filter out everyone except "those of my friends who have some friend in common", which amounts to the same as "those of my friends who have a friend-of-friend who is a friend of mine.
MATCH (me:Enthusiast { Id: 488 })-[:abonniert]->(f)
WHERE f-[:abonniert]-()-[:abonniert]-()<-[:abonniert]-me
RETURN f
Here's a console: http://console.neo4j.org/r/87n0j9. If I have misunderstood your question you can make changes in that console, click "share" and post back the link here with an explanation of what result you expect to get back.
Edit
If you want to get the nodes that are two or more of your friends are related to in common, you can do
MATCH (me:Enthusiast { Id: 488 })-[:subscribed]->(f)-[:subscribed]->(common)
WITH common, count(common) AS cnt
WHERE cnt > 1
RETURN common
A node that is a common neighbour of your neighbours can be described as a node you can reach on at least two paths. You can therefore match your neighbour-of-neighbours, count the times each "non" is matched, and if it is matched more than once then it is a "non" that is common to at least two of your neighbours. If you want you can return that count and order the result by it as a type of scoring (since this seems to be for recommendation purposes).
MATCH (me:Enthusiast { Id: 488 })-[:subscribed]->(f)-[:subscribed]->(common)
WITH common, count(common) AS score
WHERE score > 1
RETURN common, score
ORDER BY score DESC

Related

Cypher pattern for getting self related nodes

Given that I'm very new to Neo4j. I have a schema which looks like the below image:
Here Has nodes are different for example Passport, Merchant, Driving License, etc. and also these nodes are describing the customer node (looking for future scope of filtering customers based on these nodes).
SIMILAR is a self-relation meaning there exists a customer with ID:1 is related to another customer with ID:2 with a score of 2800.
I have the following questions:
Is this a good schema given the condition of the future scope I mentioned above, or getting all the properties in a single customer node is viable? (Different nodes may have array of items as well, for example: ()-[:HAS]->(Phone) having {active: "+91-1231241", historic_phone_numbers: ["+91-121213", "+91-1231421"]})
I want to get the customer along with describing nodes in relation to other customers. For that, I tried the below query (w/o number of relation more than 1):
// With number_of_relation > 1
MATCH (searched:Customer)-[r:SIMILAR]->(matched:Customer)
WHERE r.score > 2700
WITH searched, COLLECT(matched.customer_id) AS MatchedList, count(r) as cnt
WHERE cnt > 1
UNWIND MatchedList AS matchedCustomer
MATCH (person:Customer {customer_id: matchedCustomer})-[:HAS|:LIVES_IN|:IS_EMPLOYED_BY]->(related)
RETURN searched, person, related
Result what I got is below, notice one customer node not having its describing nodes:
// without number_of_relation > 1
// second attempt - for a sample customer_id
MATCH (matched)<-[r:SIMILAR]-(c)-[:HAS|:LIVES_IN|:IS_EMPLOYED_BY]->(b)
WHERE size(keys(b)) > 0
AND c.customer_id = "1b093559-a39b-4f95-889b-a215cac698dc"
AND r.score > 2700
RETURN b AS props, c AS src_cust, r AS relation, matched
Result I got are below, notice related nodes are not having their describing nodes:
If I had two describing nodes with some property (some may have a list) upon which I wanted to query and build the expected graph specified in point 2 above, how can I do that?
I want the database to find a similar customer given the describing nodes. Example: A customer {name: "Dave"} has phone {active_number: "+91-12345"} is similar to customer {name: "Mike"} has phone {active_number: "+91-12345"}. How can get started with this?
If something is unclear, please ask. I can explain with examples.
[EDITED]
Yes, the schema seems fine, except that you should not use the same HAS relationship type between different node label pairs.
The main problem with your first query is that its top MATCH clause uses a directional relationship pattern, ()-->(), which does not allow all Customer nodes to have a chance to be the searched node (because some nodes may only be at the tail end of SIMILAR relationships). This tweaked query should work better:
MATCH (searched:Customer)-[r:SIMILAR]-(matched:Customer)
WHERE r.score > 2700
WITH searched, COLLECT(matched) AS matchedList
WHERE SIZE(matchedList) > 1
UNWIND matchedList AS person
MATCH (person)-[:HAS|LIVES_IN|IS_EMPLOYED_BY]->(pDesc)
WITH searched, person, COLLECT(pDesc) AS personDescribers
MATCH (searched)-[:HAS|LIVES_IN|IS_EMPLOYED_BY]->(sDesc)
RETURN searched, person, personDescribers, COLLECT(sDesc) AS searchedDescribers
It's not clear what you want are trying to do.
To get all Customers who have the same phone number:
MATCH (c:Customer)-[:HAS_PHONE]-(p:Phone)
WHERE p.activeNumber = '+91-12345'
WITH p.activeNumber AS phoneNumber, COLLECT(c) AS customers
WHERE SIZE(customers) > 1
RETURN phoneNumber, customers

Cypher - Neo4j Query Profiling

I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into
is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN
> fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it requires 12 db hits although all nodes/ relationships are already retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
Also I have tested different versions of same above query, but the same Query Profile result is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.), queries are handled by the query planner and then turned into graph operations by neo4j's internals. It make sense that the query planner will find what is likely to be the most efficient way to search the graph (hence why we love neo), and so just because a cypher query is written one way, it won't necessarily search the graph in the way we imagine it will in our head.
The documentation on this seemed a little sparse (or, rather I couldn't find it properly), any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label, and value x for the index mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationshop to a node m with the :Consumer label, and value y for its proprerty mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use an expand into relationship, make the requirement for the relationship optional, or use explicitly iterative execution. Both these queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;

Getting relationships from all node's in Neo4j

I am trying to query using Neo4j.
I would like to print result of obtaining information while AUTO-COMPLETE is ON in Neo4j.
For example, suppose query that creating 3 nodes as shown below.
create (david:Person {name: 'david'}), (mike:Person {name: 'mike'}), (book:Book {title:'book'}), (david)-[:KNOWS]->(mike), (david)-[:WRITE]->(book), (mike)-[:WRITE]->(book)
Here are 2 images:
Auto-complete on
Auto-complete off
Figure is shown after query, and I would like to obtain all relating node’s relationships based on starting node ('book' node).
I used this query as shown below.
match (book:Book)-[r]-(person) return book, r, person
Whether AUTO-COMPLETE is ON or OFF, I expect to obtain all node’s relationships including “David knows Mike”, but system says otherwise.
I studied a lot of Syntax structure at neo4j website, and somehow it is very difficult for me. So, I upload this post to acquire assistance for you.
You have to return all the data that you need yourself explicitly. It would be bad for Neo4j to automatically return all the relationships for a super node with thousands of relationships for example, as it would mean lots of I/O, possibly for nothing.
MATCH (book:Book)-[r]-(person)-[r2]-()
RETURN book, r, person, collect(r2) AS r2
Thanks to InverseFalcon, this is my query that works.
MATCH p = (book:Book)-[r]-(person:Person)
UNWIND nodes(p) as allnodes WITH COLLECT(ID(allnodes)) AS ALLID
MATCH (a)-[r2]-(b)
WHERE ID(a) IN ALLID AND ID(b) IN ALLID
WITH DISTINCT r2
RETURN startNode(r2), r2, endNode(r2)

neo4j cartesian product performance improvement

I have a Graph database with over 2 million nodes. I have an application which takes a social graph and does some inference on it. As one step of the algorithm, I have to get all possible combinations of a relationship [:friends] of two connected nodes. Currently, I have a query which looks like:
match (a)-[:friend]-(c), (b)-[:friend]-(d) where id(a)={ida} and id(b)={idb} return distinct c as first, d as second
So, I already know the nodes a and b and I want to get all the possible pairs that can be made from friends of a and b.
This is obviously a very slow operation. I was wondering if there is a more efficient way of getting the same result in neo4j. Perhaps adding indexes might help? Any ideas / clues are welcome!
Example
Node a has friends : x, y
Node b has friends : g, h, i``
Then the result should be:
x,g
x,h
x,i
y,g
y,h
y,i`
If you are not already you should use labels to speed up your query, which might look like:
MATCH (p1:Person)-[:FRIEND]->(p3:Person),(p2:Person)-[:FRIEND]->(p4:Person)
WHERE ID(p1) = 6 AND ID(p2) = 7
RETURN p3 as first, p4 as second
Obviously that will rely on you having created your nodes with a :Person label.
How many friends does the average node have?
I wouldn't use two patterns but just one and the IN operator.
MATCH (p:Person)-[:FRIEND]->(friend:Person)
WHERE id(p) IN [1,2,3]
RETURN p, collect(friend) as friends
Then you have no cross product and you can also return the friends nicely as collection per person.

Neo4j cypher query with variable relationship path length

I'm moving my complex user database where users can be on one of many teams, be friends with each other and more to Neo4j. Doing this in a RDBMS was painful and slow, but is simple and blazing with Neo4j. :)
I was hoping there is a way to query for
a relationship that is 1 hop away and
another relationship that is 2 hops away
from the same query.
START n=node:myIndex(user='345')
MATCH n-[:IS_FRIEND|ON_TEAM*2]-m
RETURN DISTINCT m;
The reason is that users that are friends are one edge from each other, but users linked by teams are linked through that team node, so they are two edges away. This query does IS_FRIEND*2 and ON_TEAM*2, which gets teammates (yeah) and friends of friends (boo).
Is there a succinct way in Cypher to get both differing length relations in a single query?
I rewrote it to return a collection:
start person=node(1)
match person-[:IS_FRIEND]-friend
with person, collect(distinct friend) as friends
match person-[:ON_TEAM*2]-teammate
with person, friends, collect(distinct teammate) as teammates
return person, friends + filter(dupcheck in teammates: not(dupcheck in friends)) as teammates_and_friends
http://console.neo4j.org/r/oo4dvx
thanks for putting together the sample db, Werner.
I have created a small test database at http://console.neo4j.org/?id=sqyz7i
I have also created a query which will work as you described:
START n=node(1)
MATCH n-[:IS_FRIEND]-m
WITH collect(distinct id(m)) as a, n
MATCH n-[:ON_TEAM*2]-m
WITH collect(distinct id(m)) as b, a
START n=node(*)
WHERE id(n) in a + b
RETURN n

Resources