I am trying to write a query which returns only the first common node between two nodes in a scenario where there may be multiple.
I'm using this graph for reference: http://neo4j.com/docs/stable/cypher-cookbook-friend-finding.html
For example, I'm Joe, and I would like to find the friends-of-friends I don't know, each paired with only one person I should ask for an introduction. An example result set is this, even though Bill is also connected to Ian:
Bill Derrick
Sara Ian
Sara Jill
I've tried using DISTINCT, but that doesn't group properly:
MATCH (joe { name: 'Joe' })-[:knows]-(friend)-[:knows]-(friend_of_friend)
WHERE NOT (joe)-[:knows]-(friend_of_friend)
WITH DISTINCT friend_of_friend, friend
RETURN friend.name, friend_of_friend.name
I'm starting to believe I need a second query with the friend node passed to it. Hopefully not though, because that sounds painfully inefficient. What am I missing?
You need to aggregate at the level of friend using the collect function:
MATCH (joe { name: 'Joe' })-[:knows]-(friend)-[:knows]-(friend_of_friend)
WHERE NOT (joe)-[:knows]-(friend_of_friend)
RETURN friend.name, collect(friend_of_friend.name)
Update:
MATCH path=(joe { name: 'Joe' })-[:knows]-(friend)-[:knows]-(friend_of_friend)
WHERE NOT (joe)-[:knows]-(friend_of_friend)
RETURN collect(friend)[0] AS friend, friend_of_friend
This gives you 3 rows:
Bill, Derrick
Bill, Ian or Sara, Ian
Sara, Jill
It is not deterministic whether Bill-Ian or Sara-Ian appears in the result.
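To make the difference between the two answers concrete, here is a small Python sketch (not Neo4j code) that models the result rows on a hypothetical :knows graph reconstructed from the example rows in the question, not copied from the cookbook graph. Grouping by friend, as in the first query, still lists Ian under both Bill and Sara; grouping by friend_of_friend and keeping collect(friend)[0], as in the update, leaves one introducer per person. The sorted() calls are an assumption added only to make the sketch repeatable; Cypher gives no such ordering guarantee.

```python
from collections import defaultdict

# Hypothetical undirected :knows edges, reconstructed from the example rows
# in the question (an assumption, not a copy of the cookbook graph).
KNOWS = [
    ("Joe", "Bill"), ("Joe", "Sara"),
    ("Bill", "Derrick"), ("Bill", "Ian"),
    ("Sara", "Ian"), ("Sara", "Jill"),
]

def adjacency(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def fof_rows(me, edges):
    """(friend, friend_of_friend) rows, like the MATCH ... WHERE NOT part."""
    adj = adjacency(edges)
    rows = []
    for friend in sorted(adj[me]):  # sorted only to make this sketch deterministic
        for fof in sorted(adj[friend]):
            # Skip me (Cypher won't reuse the same relationship) and people I already know.
            if fof != me and fof not in adj[me]:
                rows.append((friend, fof))
    return rows

def fofs_per_friend(rows):
    """Mimic RETURN friend.name, collect(friend_of_friend.name)."""
    groups = defaultdict(list)
    for friend, fof in rows:
        groups[friend].append(fof)
    return dict(groups)

def first_introducer(rows):
    """Mimic RETURN collect(friend)[0] AS friend, friend_of_friend."""
    groups = defaultdict(list)
    for friend, fof in rows:
        groups[fof].append(friend)
    return {fof: friends[0] for fof, friends in groups.items()}
```

Here fofs_per_friend still reports Ian twice, while first_introducer keeps a single friend for Ian; which one is kept depends purely on row order.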
Related
I have a big dataset of person data and found a lot of duplicates using an algorithm.
I marked these duplicates in Neo4j with a relationship.
Example:
(p:Person)-[:similar]->(d:Person)
For testing purposes, I created virtual nodes by combining all nodes connected by the similar relationship.
CALL algo.unionFind.stream('Person', 'similar', {})
YIELD nodeId, setId
WITH setId AS idd, collect(algo.getNodeById(nodeId)) AS nodis
WHERE size(nodis) > 1
CALL apoc.nodes.collapse(nodis,{properties:'combine'}) YIELD from, rel
RETURN idd, from, rel
Here I found the problem: only two nodes were compared and stored in each result.
Example:
ID: 5, Peter Smith
ID: 4635, Peter Smit
ID: 4635, Peter Smit
ID: 765, Peter Smith
ID: 5, Peter Smith
ID: 765, Peter Smith
I want to refactor the graph and merge each set of duplicates (a forest) into one node, but only one node is merged. How can I merge all the forests that exist due to the 'similar' relationship?
UPDATE:
I found a partial solution: the following code merges all similar persons and combines all their properties into lists. That seems fine to me, except that the IDs are now in a list too, but that isn't the topic of this question.
CALL algo.unionFind.stream('Person', 'similar', {})
YIELD nodeId,setId
WITH setId AS idd, collect(algo.getNodeById(nodeId)) AS nodis
CALL apoc.refactor.mergeNodes(nodis, {properties:'combine', mergeRels: true}) YIELD node
RETURN node
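As I understand it, algo.unionFind computes connected components, so every node in one 'similar' cluster receives the same setId; collecting by setId therefore hands the whole cluster to apoc.refactor.mergeNodes at once rather than pair by pair. Here is a minimal Python union-find sketch of that grouping step, not the library's implementation. The IDs 5, 4635 and 765 are from the example above; the (7, 8) pair is hypothetical.

```python
class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def duplicate_sets(similar_pairs):
    """Group node ids linked (directly or transitively) by 'similar' pairs."""
    uf = UnionFind()
    for a, b in similar_pairs:
        uf.union(a, b)
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    # Like WHERE size(nodis) > 1: only sets with actual duplicates.
    return [group for group in groups.values() if len(group) > 1]
```

For the example pairs, all three Peter Smith records end up in a single set, which is the unit you would pass to the merge.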
How about using a unique constraint?
I also faced the same problem with MERGE.
Example:
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
In my application, Users are connected by the Projects they work on together. My schema looks like (User)-[:WORKED_ON]->(Project) with multiple users per project and multiple projects per user.
I'd like to be able to show someone their network of not only who they've worked with but also their "friends of friends" that they might want to connect with.
In other words, I want to find Users I'm connected to (either because we're both connected to the same project, or we're both connected to the same person through our projects).
Visually:
Common project: (me: User)-[:WORKED_ON]->(:Project)<-[:WORKED_ON]-(collaborator:User)
Common Collaborator: (me: User)-[:WORKED_ON]->(:Project)<-[:WORKED_ON]-(common_collaborator:User)-[:WORKED_ON]->(:Project)<-[:WORKED_ON]-(collaborator:User)
So the end result would be a list something like this:
Jane Doe: You and Jane worked on project X
John Doe: You and John have both worked with Jane
Right now the only way I've found is to do it in several different queries. First,
MATCH (a:User {user_id: 'theuserid'})-[:WORKED_ON*1..4]-(collaborator:User) RETURN collaborator
gives me a list of all the collaborators (first- and second-degree connections).
Then I programmatically loop through each collaborator and query AGAIN for HOW we are connected, first querying for common projects and then for common users:
MATCH (user:User {user_id: 'theuserid'}), (other_user: User {user_id: 'otheruserid'}), (user)-[:WORKED_ON]->(common_project:Project)<-[:WORKED_ON]-(other_user) RETURN common_project
MATCH (user:User {user_id: 'theuserid'}), (other_user: User {user_id: 'otheruserid'}), (user)-[:WORKED_ON*2]-(common_collaborator:User)-[:WORKED_ON*2]-(other_user) RETURN common_collaborator
Surely there's a better way? Thanks in advance for your help.
Your first approach already looks pretty good, except that I would use a lower bound of 2 for the variable-length pattern, since a one-hop WORKED_ON path can only end at a Project and can never match collaborator:User:
MATCH (a:User {user_id: 'theuserid'})-[:WORKED_ON*2..4]-(collaborator:User)
RETURN collaborator;
Also, you should make sure you have an index on :User(user_id).
[UPDATE]
Here is a query that returns rows containing the user plus either a project and collaborator (2-hop paths) or a common_collaborator and collaborator (4-hop paths). This should be what you wanted.
MATCH p=(user:User {user_id: 'theuserid'})-[:WORKED_ON*2..4]-(col:User)
RETURN user,
CASE WHEN LENGTH(p) = 2
THEN {project: NODES(p)[1], collaborator: col}
ELSE {common_collaborator: NODES(p)[2], collaborator: col} END AS data;
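The CASE expression relies on the shape of the graph: with only User and Project nodes, a 2-hop path is user, project, collaborator, and a 4-hop path is user, project, common collaborator, project, collaborator. The Python sketch below models that classification on hypothetical data; it is not a Neo4j API. The walks generator approximates Cypher's rule that a relationship is not traversed twice within one path, assuming at most one WORKED_ON relationship per user/project pair.

```python
from collections import defaultdict

# Hypothetical (user, project) WORKED_ON edges, invented for illustration.
WORKED_ON = [("me", "X"), ("jane", "X"), ("jane", "Y"), ("john", "Y")]

def adjacency(edges):
    adj = defaultdict(set)
    for user, project in edges:
        adj[user].add(project)
        adj[project].add(user)
    return adj

def walks(adj, start, min_hops, max_hops):
    """Yield node lists for paths of min_hops..max_hops that never reuse an edge."""
    def step(path, used):
        hops = len(path) - 1
        if hops >= min_hops:
            yield list(path)
        if hops == max_hops:
            return
        for nxt in sorted(adj[path[-1]]):
            edge = frozenset((path[-1], nxt))
            if edge in used:
                continue
            path.append(nxt)
            used.add(edge)
            yield from step(path, used)
            path.pop()
            used.remove(edge)
    yield from step([start], set())

def describe_connections(edges, me):
    """Mimic the CASE on LENGTH(p): 2 hops -> shared project, 4 hops -> common collaborator."""
    adj = adjacency(edges)
    users = {user for user, _ in edges}
    rows = []
    for p in walks(adj, me, 2, 4):
        col = p[-1]
        if col not in users:  # odd-length paths end at a Project, like col:User rejecting them
            continue
        if len(p) - 1 == 2:
            rows.append((col, {"project": p[1]}))          # NODES(p)[1]
        else:
            rows.append((col, {"common_collaborator": p[2]}))  # NODES(p)[2]
    return rows
```

With this data, Jane comes back via project X and John via the common collaborator Jane, matching the two example lines in the question.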
It seems the following WHERE clause does not work because the query has two relationships (WorksAt and ResponsibleFor). If there were only one relationship, this would work like magic. The query below returns all the courses in the Science department, but it does not filter out courses not taught by Maria Smith. All I want is to get only the courses taught by Maria Smith, who works in the Science department.
I came across the WITH and START clauses, which seem like candidates for making this work, since they let you filter one part of the query before passing it on to the next.
http://neo4j.com/docs/stable/query-with.html
But I haven't been able to grasp the concept yet. Can anyone help?
MATCH (d:Department)<-[w:WorksAt]-(t:Tutor)-[r:ResponsibleFor]->(c:Courses)
WHERE d.name='Science'
AND t.name='Maria Smith'
return c,r
There are a number of ways to skin this particular cat. Let's break it down.
Find the tutor named 'Maria Smith' who works in the 'Science' department
MATCH (d:Department)<-[:WorksAt]-(t:Tutor)
WHERE d.name = 'Science' AND t.name = 'Maria Smith'
RETURN t
Find the courses that a tutor teaches
MATCH (t:Tutor)-[:ResponsibleFor]->(c:Courses)
RETURN t.name, c
Bring these two together to get the courses that Maria Smith from the Science department teaches
MATCH (d:Department)<-[:WorksAt]-(t:Tutor)
WHERE d.name = 'Science' AND t.name = 'Maria Smith'
WITH t
MATCH (t)-[r:ResponsibleFor]->(c:Courses)
RETURN t.name, r, c
This can also be written as
MATCH (d:Department { name : 'Science' })<-[:WorksAt]-(t:Tutor { name : 'Maria Smith' })
WITH t
MATCH (t)-[r:ResponsibleFor]->(c:Courses)
RETURN t.name, r, c
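Conceptually, WITH t ends one stage of the query and passes only the matching tutors on to the next MATCH. The Python sketch below models that two-stage pipeline on plain tuples; it is an illustration, not how Neo4j executes the query. Only 'Maria Smith' and 'Science' come from the question; 'John Doe' and the course names are made up.

```python
# Hypothetical data; only 'Maria Smith' and 'Science' come from the question.
WORKS_AT = [("Maria Smith", "Science"), ("John Doe", "Science")]
RESPONSIBLE_FOR = [
    ("Maria Smith", "Biology"),
    ("Maria Smith", "Chemistry"),
    ("John Doe", "Physics"),
]

def courses_for(tutor_name, department):
    # Stage 1, like MATCH (d)<-[:WorksAt]-(t) WHERE ... WITH t:
    # only tutors that pass the filter survive into the next stage.
    tutors = {t for t, d in WORKS_AT if d == department and t == tutor_name}
    # Stage 2, like MATCH (t)-[r:ResponsibleFor]->(c) RETURN t.name, c:
    # expand each surviving tutor to the courses they are responsible for.
    return [(t, c) for t, c in RESPONSIBLE_FOR if t in tutors]
```

John Doe's Physics course never appears, because he is dropped in the first stage before any courses are expanded.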
To maximise query performance, you can use schema indexes to quickly locate your Department and Tutor nodes. Are you doing this? To create the indexes, use:
CREATE INDEX ON :Department(name)
CREATE INDEX ON :Tutor(name)
Run these lines separately.
As an aside, if you wanted to list the courses that each tutor teaches, as in the second query above, you could use the following query to aggregate the courses for each tutor:
MATCH (t:Tutor)-[:ResponsibleFor]->(c:Courses)
RETURN t.name as CourseTutor, collect(c.name) as CourseName
Hope this helps.
Nice breakdown. For performance details on this type of query, refer to Wes Freeman's Pragmatic Cypher Optimization. In setting up the match, start with the smaller node set and work toward the larger (Wes's Rule 4).
I know this may be a simple question but I'm having a hard time finding an answer.
I want to find all Persons who are INTERESTED_IN the same Activities as the Person with id 1, but who are not FRIENDS_WITH person 1.
Something like
MATCH (p:Person {Id:1})--[r:INTERSTED_IN]-->(a:Activity {name:Skiing})<--(f:Person)
RETURN f.name
This might be wrong.
I think this will find everyone with the same relationship, but then I want to make sure they aren't already friends.
I'm trying to figure out cypher and can't find any good examples of this.
Almost got it!
MATCH (p:Person { id: 1 })-[r:INTERESTED_IN]->(a:Activity { name: 'Skiing' })<-[r2:INTERESTED_IN]-(f:Person)
WHERE NOT (p)-[:FRIENDS_WITH]-(f)
RETURN f.name
Note that id here is a property, and not the internal node ID. If that's what you're looking for, you'd do this:
MATCH (p:Person)-[r:INTERESTED_IN]->(a:Activity { name: 'Skiing' })<-[r2:INTERESTED_IN]-(f:Person)
WHERE ID(p) = 1 AND NOT (p)-[:FRIENDS_WITH]-(f)
RETURN f.name
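The WHERE NOT (p)-[:FRIENDS_WITH]-(f) predicate acts as an anti-join: keep everyone who shares the activity, then drop anyone connected to p by FRIENDS_WITH in either direction. Here is a Python sketch of that filter on hypothetical in-memory data; the ids, activities and friendships are invented for illustration.

```python
# Hypothetical data: person id -> activities, plus friendships.
INTERESTED_IN = {1: {"Skiing", "Chess"}, 2: {"Skiing"}, 3: {"Skiing"}, 4: {"Chess"}, 5: {"Golf"}}
FRIENDS_WITH = {(1, 2)}

def are_friends(a, b):
    # The pattern (p)-[:FRIENDS_WITH]-(f) has no direction, so check both ways.
    return (a, b) in FRIENDS_WITH or (b, a) in FRIENDS_WITH

def non_friends_sharing(person, activity):
    """People interested in `activity` who are not friends with `person`."""
    if activity not in INTERESTED_IN.get(person, set()):
        return []  # the MATCH finds nothing if person isn't interested in it
    return sorted(
        other
        for other, activities in INTERESTED_IN.items()
        if other != person and activity in activities and not are_friends(person, other)
    )
```

For Skiing, person 2 shares the activity but is filtered out as a friend, leaving only person 3.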
And it's "cypher." ;-)
I have set up a graph gist to show my problem: http://gist.neo4j.org/?dropbox-2900504%2Fnames.adoc
My problem is that if I don't explicitly return the person node or person id, two of my person nodes get merged into one in the result. They both have the same last name and the same labels on the person node (ids 3 and 4, Tom and Sarah Smith).
If I add a label to the person node, as with James Smith (id 1) in this example, there is no problem. If I were to remove his :Foo label, he would also be merged with Sarah and Tom in query 2.
If this is not a bug, is there a way for me to return these people distinctly without the person id or node being returned?
I have shown the problem in the above gist, with the only difference between the two queries being that the second one also returns the person id.
Many thanks for your help,
tekiegirl
Edit:
How I want my results to look (basically like query 3 in the gist, but without the person id):
labels names
[Person, Bar] [Sally, Jones]
[Person, Foo] [James, Smith]
[Person] [Sarah, Smith]
[Person] [Tom, Smith]
I think maybe you're not expecting the aggregation behavior you get with collect. Is this what you're trying to get?
MATCH (:Club { name:'FooFighters' })-[:MEMBER]->(p:Person)-[r:NAMED]->(n:Name)
RETURN labels(p) AS labels, n.content AS names
ORDER BY r.order, names
Update with more info, now that I understand what you were doing with your multiple names and the ORDER BY in the WITH:
collect actually performs an implicit group by on the other (non-aggregated) terms, making them distinct and grouping on them. If you want to group by person, you need to include the person p in the WITH/RETURN where you collect. Here's a rewrite; if you want, you can avoid returning p in the last RETURN statement:
MATCH (:Club{name:'FooFighters'})-[:MEMBER]->(p:Person)-[r:NAMED]->(n:Name)
WITH p, n, r
ORDER BY r.order
WITH p, labels(p) as labels, collect(n.content) as names
RETURN labels, names
ORDER BY names[length(names)-1], names[0]
http://gist.neo4j.org/?8008646
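To see why Tom and Sarah collapse into one row, the Python sketch below mimics how collect treats every non-aggregated column as a grouping key. The rows are hypothetical and only loosely modelled on the question, not the actual gist data; the stable sort on the order value stands in for ORDER BY r.order before the collect.

```python
from collections import defaultdict

# Hypothetical (person_id, labels, name_order, name_content) rows, loosely
# modelled on the question; not the actual gist data.
ROWS = [
    (1, ("Person", "Foo"), 1, "James"), (1, ("Person", "Foo"), 2, "Smith"),
    (3, ("Person",), 1, "Tom"), (3, ("Person",), 2, "Smith"),
    (4, ("Person",), 1, "Sarah"), (4, ("Person",), 2, "Smith"),
]

def collect_names(rows, include_person):
    """Mimic WITH [p,] labels(p) AS labels, collect(n.content) AS names.

    Every non-aggregated column becomes part of the grouping key, so leaving
    the person out merges people whose labels are identical."""
    groups = defaultdict(list)
    # Stable sort on name_order stands in for ORDER BY r.order before collect.
    for pid, labels, order, content in sorted(rows, key=lambda row: row[2]):
        key = (pid, labels) if include_person else (labels,)
        groups[key].append(content)
    return [(key[-1], names) for key, names in groups.items()]
```

Without the person in the key, Tom and Sarah share the labels [Person] and their names end up in one list; with p in the key, each person gets their own row, just as in the rewritten query.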