neo4j - multiple optional matches

I have the following neo4j database:
http://console.neo4j.org/?id=gkkmha
I then run the following query:
MATCH (person:Person)-[:plays]->(instrument:Instrument {name: 'Drums'})
OPTIONAL MATCH (band:Band { name: 'bandname' })-[:genre]->(genre:Genre)<-[:likes]-(person)
OPTIONAL MATCH (band)-[:influenced]->(influence:Influence)<-[:influenced]-(person)
RETURN person.name, COLLECT(genre.name) as matched_genres, COLLECT (influence.name) as matched_influences, (count(genre)/4.0) as score
ORDER BY score DESC
I want to be able to find musicians who play the specified instrument but also have similar genre matches and influences to the band. So far I've got it working for matching genres and returning a list of those genres, but I can't make it do the same for influences. I want it to return a list of matching influences as well.
Ideally it'd also get the total number of genres and influences the band is associated with (though this is just a nice to have).
Current output:
+-----------------------------------------------------------------+
| person.name    | matched_genres    | matched_influences | score |
+-----------------------------------------------------------------+
| "Robert Smith" | ["Soul","Motown"] | []                 | 0.5   |
| "Alex Smith"   | ["Soul"]          | []                 | 0.25  |
| "Mr Drummer"   | []                | []                 | 0.0   |
+-----------------------------------------------------------------+
3 rows
54 ms
Any thoughts?

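I believe you have a typo there: instead of :influenced, try :Influenced. Relationship types in Cypher are case-sensitive, so the two do not match the same relationships.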
MATCH (person:Person)-[:plays]->(instrument:Instrument { name: 'Drums' })
OPTIONAL MATCH (band:Band { name: 'bandname' })-[:genre]->(genre:Genre)<-[:likes]-(person)
OPTIONAL MATCH (band)-[:Influenced]->(influence:Influence)<-[:Influenced]-(person)
RETURN person.name, COLLECT(genre.name) AS matched_genres, COLLECT(influence.name) AS matched_influences, (count(genre)/4.0) AS score
ORDER BY score DESC
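Two further things may be worth checking, sketched below under assumptions. First, chaining two OPTIONAL MATCH clauses multiplies rows (each matched genre is paired with each matched influence), so the collected lists and count(genre) can contain duplicates. Second, if the first OPTIONAL MATCH finds nothing, band is null and the second one cannot match either. The variation below binds the band up front and aggregates each list separately. It assumes the relationship type really is :Influenced, as suggested above, and it returns no rows at all if the band does not exist.

```cypher
// Sketch only: labels, relationship types and property names are taken
// from the question; :Influenced is the assumption made in the answer above.
MATCH (band:Band { name: 'bandname' })
MATCH (person:Person)-[:plays]->(:Instrument { name: 'Drums' })
OPTIONAL MATCH (band)-[:genre]->(genre:Genre)<-[:likes]-(person)
// Aggregate genres before matching influences so the two lists do not multiply
WITH band, person, collect(DISTINCT genre.name) AS matched_genres
OPTIONAL MATCH (band)-[:Influenced]->(influence:Influence)<-[:Influenced]-(person)
WITH person, matched_genres, collect(DISTINCT influence.name) AS matched_influences
RETURN person.name, matched_genres, matched_influences,
       size(matched_genres) / 4.0 AS score
ORDER BY score DESC
```

collect() skips null values, so a person with no matches still gets an empty list rather than a list containing null.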

Related

How to get the closest nodes in Cypher?

Based on the model here, I am trying to find the closest meeting rooms by their proximity in the model. I want results like this:
+-------+----------+
| Room  | Distance |
+-------+----------+
| room1 | 3        |
| room2 | 3        |
| room3 | 4        |
| room4 | 4        |
+-------+----------+
My model:
I have tried this query:
MATCH (p:Person {name:"test"})-[r*2..]->(f:Floor)<-[:ROOM_LOCATED_IN_FLOOR]-(r:Room)
RETURN p, f, r
which only returns the meeting rooms on the floor where the person is located. I want to traverse to the rooms on other floors as well.
Here is some sample data for testing:
CREATE (p:Person)
CREATE (d:Desk)
CREATE (f1:Floor)
CREATE (f2:Floor)
CREATE (r1:Room {name : 'room1'})
CREATE (r2:Room {name : 'room2'})
CREATE (r3:Room {name : 'room3'})
CREATE (r4:Room {name : 'room4'})
CREATE (p)-[:SEATED_AT]->(d)-[:LOCATED_IN]->(f1)-[:HAS_NEXT]->(f2)
CREATE (f1)<-[:PART_OF]-(r1)
CREATE (f1)<-[:PART_OF]-(r2)
CREATE (f2)<-[:PART_OF]-(r3)
CREATE (f2)<-[:PART_OF]-(r4)
Then, you can get the desired result with the size() and relationships() functions:
MATCH p = (:Person)-[*]-(r:Room)
RETURN r.name as Room, size( relationships(p) ) as Distance
ORDER BY Distance
The output will be:
╒═══════╤══════════╕
│"Room" │"Distance"│
╞═══════╪══════════╡
│"room1"│3 │
├───────┼──────────┤
│"room2"│3 │
├───────┼──────────┤
│"room3"│4 │
├───────┼──────────┤
│"room4"│4 │
└───────┴──────────┘
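In the sample data there is exactly one path from the person to each room. In a graph where a room can be reached by several paths, the query above would return one row per path. A minimal variation, assuming the same sample data, keeps only the shortest distance per room; length(p) returns the number of relationships in a path, which is the same value as size(relationships(p)).

```cypher
// Sketch only: uses the labels from the sample data above.
MATCH p = (:Person)-[*]-(r:Room)
// min() groups by room name, so each room appears once with its shortest distance
RETURN r.name AS Room, min(length(p)) AS Distance
ORDER BY Distance
```

An unbounded [*] pattern can become expensive on larger graphs, so an upper bound such as [*..10] (an arbitrary value chosen for illustration) may be worth adding.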

Is it possible to match node by id, that lies in the properties of other node or relationship?

I've got a problem with simple matching.
For example,
I have some node
start startNode = node(0)
It has a relationship with another node. One of the relationship's properties is idOfThirdNode, which holds id(thirdNode).
I found out that start point = node( ) accepts only digits as arguments, so toInt(rel.idOfThirdNode) cannot be used there at all, and neither can match (point:_Node) where id(point) = rel.idOfThirdNode
Finding a node by property is not a problem, but it isn't possible to set a new duplicate id property.
Is there a solution to this problem, or is the only option to save this property in the model and start a new match using that property as an id?
Edit:
Earlier, I got the result I wanted from this query:
start startNode = node({0})
optional match startNode-[r:REL]-(relNode: _Node)
return distinct startNode, id(r) as linkId, id(relNode) as nodeId,
r.idOfthirdNode as point
which produced a nice table with nulls in some fields:
______________________________________
| StartNode| linkId | nodeId | point |
--------------------------------------
| startNode| 1 | 2 | null |
| info | | | |
-------------------------------------
| startNode| 3 | 4 | 5 |
| info | | | |
But now this WHERE clause removes all the rows with nulls:
start startNode = node({0})
optional match startNode-[r:REL]-(relNode: _Node), (pointNode:_Node)
where id(pointNode) = r.idOfthirdNode
return distinct startNode, id(r) as linkId, id(relNode) as nodeId,
collect({name: pointNode.name, id: id(pointNode)}) as point
and I get only the second row.
You should be able to do something like this:
MATCH (point:_Node), (node:Label)
WHERE ID(point) = node.idOfThirdNode
RETURN *
But I've never actually seen that done, because relationships are so much better than foreign keys.
This should work for you:
START startNode = node(0)
MATCH (startNode)-[rel]->(secondNode), (thirdNode:_Node)
WHERE ID(thirdNode) = rel.idOfThirdNode
RETURN startNode, secondNode, thirdNode
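Note that a plain MATCH drops rows where no third node is found. To keep the rows with nulls that the question asks about, one option is to put the id lookup in its own OPTIONAL MATCH, so that a missing third node only leaves pointNode null instead of discarding the whole row. This is a sketch using the names from the question; the start node id 0 is a placeholder.

```cypher
// Sketch only: _Node, REL and idOfthirdNode are the names used in the question.
MATCH (startNode)
WHERE id(startNode) = 0
OPTIONAL MATCH (startNode)-[r:REL]-(relNode:_Node)
// Separate optional step: when r.idOfthirdNode is null or matches no node,
// pointNode is null but the row itself is kept
OPTIONAL MATCH (pointNode:_Node)
WHERE id(pointNode) = r.idOfthirdNode
RETURN DISTINCT startNode, id(r) AS linkId, id(relNode) AS nodeId, id(pointNode) AS point
```

The WHERE clause belongs to the OPTIONAL MATCH directly before it, which is why splitting the pattern into two optional clauses changes which rows survive.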

Performance in Neo4j cypher query

I have the following cypher query:
MATCH (country:Country { name: 'norway' }) <- [:LIVES_IN] - (person:Person)
WITH person
MATCH (skill:Skill { name: 'java' }) <- [:HAS_SKILL] - (person)
WITH person
OPTIONAL MATCH (skill:Skill { name: 'javascript' }) <- [rel:HAS_SKILL] - (person)
WITH person, CASE WHEN skill IS NOT NULL THEN 1 ELSE 0 END as matches
ORDER BY matches DESC
LIMIT 50
RETURN COLLECT(ID(person)) as personIDs
It seems to perform worse as more nodes are added. Right now, with only 5000 Person nodes (a Person node can have multiple HAS_SKILL relationships to Skill nodes), the query takes around 180 ms, and adding another 1000 Person nodes with relationships adds 30-40 ms. We are planning on having millions of Person nodes, so adding 40 ms for every 1000 Person nodes is a no-go.
I use parameters in my query instead of 'norway', 'java', 'javascript' in the above query. I have created indexes on :Country(name) and :Skill(name).
My goal with the query is to match every person who lives in a specified country (norway) and also has the skill 'java'. If the person also has the skill 'javascript', they should be ranked higher in the result.
How can I restructure the query to improve performance?
Edit:
There also seems to be an issue with the :Country nodes, if I switch out
MATCH (country:Country { name: 'norway' }) <- [:LIVES_IN] - (person:Person)
with
MATCH (city:City { name: 'vancouver' }) <- [:LIVES_IN] - (person:Person)
the query time drops to around 15-50 ms, depending on which city I query for. There is still a noticeable increase in query time when adding more nodes.
Edit 2:
It seems like the query time increases by a huge amount when there are a lot of rows in the first match clause. If I switch the query to match on Skill nodes first, the query time decreases substantially. The query is part of an API and is created dynamically, so I do not know which of the match clauses will return the fewest rows. There will probably also be a lot more rows in every match clause when the database grows.
Edit 3
I have done some testing from the answers and I now have the following query:
MATCH (country:Country { name: 'norway'})
WITH country
MATCH (country) <- [:LIVES_IN] - (person:Person)
WITH person
MATCH (person) - [:HAS_SKILL] -> (skill:Skill) WHERE skill.name = 'java'
MATCH (person) - [:MEMBER_OF_GROUP] -> (group:Group) WHERE group.name = 'some_group_name'
RETURN DISTINCT ID(person) as id
LIMIT 50
This still has performance issues. Is it maybe better to first match all the skills etc., like with the Country node? The query can also grow bigger: I may have to add matching against multiple skills, groups, projects etc.
Edit 4
I modified the query slightly and it seems like this did the trick. I now match all the needed skills, company, groups, country etc first. Then use those later in the query. In the profiler this reduced the database hits from 700k to 188 or something. It is a slightly different query from my original query (different labeled nodes etc), but it solves the same problem. I guess this can be further improved by maybe matching on the node with the least relationships first etc, to start with a reduced number of nodes. I'll do some more testing later!
MATCH (company:Company { name: 'relinkgroup' })
WITH company
MATCH (skill:Skill { name: 'java' })
WITH company, skill
MATCH (skill2:Skill { name: 'ajax' })
WITH company, skill, skill2
MATCH (country:Country { name: 'canada' })
WITH company, skill, skill2, country
MATCH (company) <- [:WORKED_AT] - (person:Person)
, (person) - [:HAS_SKILL] -> (skill)
, (person) - [:HAS_SKILL] -> (skill2)
, (person) - [:LIVES_IN] -> (country)
RETURN DISTINCT ID(person) as id
LIMIT 50
For the first line of your query, the execution has to look for all possible paths between the country and person. By limiting your initial match (and thus defining a more precise starting point for the traversal), you will gain some performance.
So instead of
MATCH (country:Country { name: 'norway' }) <- [:LIVES_IN] - (person:Person)
Try doing it in two steps :
MATCH (country:Country { name: 'norway' })
WITH country
MATCH (country)<-[:LIVES_IN]-(person:Person)
WITH person
As an example, I'll use the simple movie app in the neo4j console : http://console.neo4j.org/
Here is a query equivalent to yours, finding the people who know Cypher:
MATCH (n:Crew)-[r:KNOWS]-m WHERE n.name='Cypher' RETURN n, m
The execution plan will be :
Execution Plan
ColumnFilter
|
+Filter
|
+TraversalMatcher
+------------------+------+--------+-------------+----------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+------------------+------+--------+-------------+----------------------------------------+
| ColumnFilter | 2 | 0 | | keep columns n, m |
| Filter | 2 | 14 | | Property(n,name(0)) == { AUTOSTRING0} |
| TraversalMatcher | 7 | 16 | | m, r, m |
+------------------+------+--------+-------------+----------------------------------------+
Total database accesses: 30
And by defining an accurate starting point :
MATCH (n:Crew) WHERE n.name='Cypher' WITH n MATCH (n)-[:KNOWS]-(m) RETURN n,m
Result in the following execution plan :
Execution Plan
ColumnFilter
|
+SimplePatternMatcher
|
+Filter
|
+NodeByLabel
+----------------------+------+--------+-------------------+----------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+-------------------+----------------------------------------+
| ColumnFilter | 2 | 0 | | keep columns n, m |
| SimplePatternMatcher | 2 | 0 | m, n, UNNAMED53 | |
| Filter | 1 | 8 | | Property(n,name(0)) == { AUTOSTRING0} |
| NodeByLabel | 4 | 5 | n, n | :Crew |
+----------------------+------+--------+-------------------+----------------------------------------+
Total database accesses: 13
As you can see, the first method uses the traversal matcher, which gets far more expensive as the number of nodes grows, because you're doing a global match on the graph.
The second uses an explicit starting point, found through the label index.
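To see which plan your own query gets, you can prefix it with PROFILE, which runs the query and reports the rows and database hits per operator. The example below simply profiles the second query from above; operator names differ between Neo4j versions, so the output on a newer release may not look like the plans shown here.

```cypher
// Sketch only: same query as above, with PROFILE added in front.
PROFILE
MATCH (n:Crew)
WHERE n.name = 'Cypher'
WITH n
MATCH (n)-[:KNOWS]-(m)
RETURN n, m
```

EXPLAIN can be used in the same position to show the plan without executing the query.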
EDIT
For the skills part, I would do something like this. If you have some test data to provide, it would be helpful for testing:
MATCH (country:Country { name: 'norway' })
WITH country
MATCH (country)<-[:LIVES_IN]-(person:Person)-[:HAS_SKILL]->(skill:Skill)
WHERE skill.name = 'java'
WITH person
OPTIONAL MATCH (person)-[:HAS_SKILL]->(skillb:Skill) WHERE skillb.name = 'javascript'
WITH person, skillb
There is no need for global lookups here: the persons have already been found, so the query just follows the HAS_SKILL relationships and filters on the skill.name value.
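Putting that fragment together with the ordering and limit from the question gives something like the following. This is only a sketch that combines the two pieces; it has not been run against the asker's data, and the literal names would be parameters in the real query.

```cypher
// Sketch only: labels and relationship types are taken from the question.
MATCH (country:Country { name: 'norway' })
MATCH (country)<-[:LIVES_IN]-(person:Person)-[:HAS_SKILL]->(:Skill { name: 'java' })
// The optional skill only affects ranking, not whether the person is returned
OPTIONAL MATCH (person)-[:HAS_SKILL]->(skillb:Skill { name: 'javascript' })
WITH person, CASE WHEN skillb IS NOT NULL THEN 1 ELSE 0 END AS matches
ORDER BY matches DESC
LIMIT 50
RETURN collect(id(person)) AS personIDs
```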
Edit 2:
Concerning your last edit, maybe this last part of the query :
MATCH (company) <- [:WORKED_AT] - (person:Person)
, (person) - [:HAS_SKILL] -> (skill)
, (person) - [:HAS_SKILL] -> (skill2)
, (person) - [:LIVES_IN] -> (country)
Could be better written as :
MATCH (person:Person)-[:WORKED_AT]->(company)
WHERE (person)-[:HAS_SKILL]->(skill)
AND (person)-[:HAS_SKILL]->(skill2)
AND (person)-[:LIVES_IN]->(country)

Nodes with same relation to a third node in a graph database

I was following the Neo4J online tutorial and I came to a question while trying this query with the query tool:
match (a:Person)-[:ACTED_IN|:DIRECTED]->()<-[:ACTED_IN|:DIRECTED]-(b:Person)
return a,b;
I was expecting one of the pairs returned to have the same Person in both identifiers, but that didn't happen. Can somebody explain why? Does a match clause exclude repeated elements in the different identifiers used?
UPDATE:
This question came to me in "Lesson 3 - Adding Relationships with Cypher, more" from the Neo4J online tutorial, where the query I mentioned above is presented.
I refined the query to the following one, in order to focus more directly my question:
MATCH (a:Person {name:"Keanu Reeves"})-[:ACTED_IN]->()<-[:ACTED_IN]-(b)
RETURN a,b;
The results:
|---------------|--------------------|
| a | b |
|---------------|--------------------|
| Keanu Reeves | Carrie-Anne Moss |
| Keanu Reeves | Laurence Fishburne |
| Keanu Reeves | Hugo Weaving |
| Keanu Reeves | Brooke Langton |
| Keanu Reeves | Gene Hackman |
| Keanu Reeves | Orlando Jones |
|------------------------------------|
So why is there no row with Keanu Reeves as both a and b? Shouldn't he match both :ACTED_IN relationships?
The behavior you observed is by design.
To quote the manual:
While pattern matching, Cypher makes sure to not include matches where
the same graph relationship is found multiple times in a single
pattern. In most use cases, this is a sensible thing to do.
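This uniqueness rule applies within a single MATCH clause. If you actually want rows where a and b are the same person, one way to sketch it is to split the pattern into two MATCH clauses, so that the second clause is free to reuse the relationship already traversed by the first. The identifier m is introduced here only for illustration.

```cypher
// Sketch only: same data as the refined query in the question.
MATCH (a:Person {name: "Keanu Reeves"})-[:ACTED_IN]->(m)
// Separate clause: the relationship used above may be matched again here,
// so b can be bound to the same node as a
MATCH (m)<-[:ACTED_IN]-(b)
RETURN a, b
```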
I would check your data sample. Your query looks like it works just fine for me. I replicated with a simple data set, and here's verification that it does produce pairs like what you're looking for.
Joe acted in "Some Flick"
neo4j-sh (?)$ create (p:Person {name:"Joe"})-[:ACTED_IN]->(m:Movie {name:"Some Flick"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2
Relationships created: 1
Properties set: 2
Labels added: 2
14 ms
But Joe is so multi-talented, he also directed "Some Flick".
neo4j-sh (?)$ match (p:Person {name: "Joe"}), (m:Movie {name: "Some Flick"}) create p-[:DIRECTED]->m;
+-------------------+
| No data returned. |
+-------------------+
Relationships created: 2
23 ms
So who are the actor/director pairs that we know of?
neo4j-sh (?)$ match (a:Person)-[:ACTED_IN|:DIRECTED]->()<-[:ACTED_IN|:DIRECTED]-(b:Person)
> return a,b;
+-----------------------------------------------------+
| a | b |
+-----------------------------------------------------+
| Node[222128]{name:"Joe"} | Node[222128]{name:"Joe"} |
| Node[222128]{name:"Joe"} | Node[222128]{name:"Joe"} |
+-----------------------------------------------------+
2 rows
50 ms
Of course it's Joe.

Getting different (and incorrect) results running the same cypher query on identical DB schemas

I'm running the following cypher query on two identical neo4j DB schemas:
START dave = node(7)
// dave's friend who lives and attends an event in the same city
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event<-[:ATTENDS]-friend
RETURN dave.name, friend.name, city.name, event.name;
When I run the above query on the DB schema on my local server, I get correct results--a single path:
+----------------------------------------------------+
| dave.name | friend.name | city.name | event.name |
+----------------------------------------------------+
| "dave" | "adam" | "london" | "exhibition" |
+----------------------------------------------------+
In fact for each of the 4 persons node(4, 5, 6, 7), adam=node(4) is the only person who lives and attends an event in the same city.
However, when I run the same query here (on the exact same DB schema as on my local server) I'm getting the following incorrect result:
+----------------------------------------------------+
| dave.name | friend.name | city.name | event.name |
+----------------------------------------------------+
| "dave" | "adam" | "london" | "exhibition" |
| "dave" | "adam" | "london" | "exhibition" |
| "dave" | "bill" | "paris" | "seminar" | // bill doesn't attend seminar
+----------------------------------------------------+
For other persons instead of dave=node(7), the results here are also incorrect (extra paths that don't exist).
Try separating the match pattern into two parts. I have never used the same identifier twice in one match pattern.
Instead of
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event<-[:ATTENDS]-friend
use
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event, event<-[:ATTENDS]-friend

Resources