I was following the Neo4j online tutorial and ran into a question while trying this query in the query tool:
match (a:Person)-[:ACTED_IN|:DIRECTED]->()<-[:ACTED_IN|:DIRECTED]-(b:Person)
return a,b;
I was expecting one of the returned pairs to have the same Person in both identifiers, but that didn't happen. Can somebody explain why? Does a MATCH clause exclude repeated elements across the different identifiers used?
UPDATE:
This question came up in "Lesson 3 - Adding Relationships with Cypher, more" of the Neo4j online tutorial, where the query I mentioned above is presented.
I refined the query to the following one, to focus my question more directly:
MATCH (a:Person {name:"Keanu Reeves"})-[:ACTED_IN]->()<-[:ACTED_IN]-(b)
RETURN a,b;
The results:
|--------------|--------------------|
| a            | b                  |
|--------------|--------------------|
| Keanu Reeves | Carrie-Anne Moss   |
| Keanu Reeves | Laurence Fishburne |
| Keanu Reeves | Hugo Weaving       |
| Keanu Reeves | Brooke Langton     |
| Keanu Reeves | Gene Hackman       |
| Keanu Reeves | Orlando Jones      |
|--------------|--------------------|
So why is there no row with Keanu Reeves in both a and b? Shouldn't he match both :ACTED_IN relationships?
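The behavior you observed is by design. For Keanu Reeves to appear as both a and b, the same ACTED_IN relationship would have to be matched twice within the pattern, and Cypher does not allow that.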
To quote the manual:
While pattern matching, Cypher makes sure to not include matches where
the same graph relationship is found multiple times in a single
pattern. In most use cases, this is a sensible thing to do.
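To make the uniqueness rule concrete, here is a minimal Python sketch that imitates it on a toy graph. This is only a model of the rule quoted above, not how Neo4j is implemented; the data, the `co_actors` function and the `unique_relationships` switch are all made up for illustration.

```python
from itertools import product

# Toy graph (hypothetical data): each tuple is one ACTED_IN relationship.
ACTED_IN = [
    ("Keanu Reeves", "The Matrix"),
    ("Carrie-Anne Moss", "The Matrix"),
    ("Keanu Reeves", "The Replacements"),
    ("Gene Hackman", "The Replacements"),
]

def co_actors(name, unique_relationships=True):
    """Imitate (a {name})-[r1:ACTED_IN]->(m)<-[r2:ACTED_IN]-(b) and return each b."""
    rows = []
    # The list indexes i and j stand in for relationship identities.
    for (i, (a, m1)), (j, (b, m2)) in product(enumerate(ACTED_IN), repeat=2):
        if a != name or m1 != m2:
            continue  # wrong start node, or the two hops end at different movies
        if unique_relationships and i == j:
            continue  # r1 and r2 would be the same relationship, so the row is dropped
        rows.append(b)
    return rows

print(co_actors("Keanu Reeves"))
print(co_actors("Keanu Reeves", unique_relationships=False))
```

With the rule switched on, Keanu never shows up as b, because the only way to reach him is to walk back over the relationship that was just used. With it switched off, he is paired with himself once per movie.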
I would check your data sample. Your query looks like it works just fine for me. I replicated with a simple data set, and here's verification that it does produce pairs like what you're looking for.
Joe acted in "Some Flick"
neo4j-sh (?)$ create (p:Person {name:"Joe"})-[:ACTED_IN]->(m:Movie {name:"Some Flick"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2
Relationships created: 1
Properties set: 2
Labels added: 2
14 ms
But Joe is so multi-talented, he also directed "Some Flick".
neo4j-sh (?)$ match (p:Person {name: "Joe"}), (m:Movie {name: "Some Flick"}) create p-[:DIRECTED]->m;
+-------------------+
| No data returned. |
+-------------------+
Relationships created: 2
23 ms
So who are the actor/director pairs that we know of?
neo4j-sh (?)$ match (a:Person)-[:ACTED_IN|:DIRECTED]->()<-[:ACTED_IN|:DIRECTED]-(b:Person)
> return a,b;
+-----------------------------------------------------+
| a | b |
+-----------------------------------------------------+
| Node[222128]{name:"Joe"} | Node[222128]{name:"Joe"} |
| Node[222128]{name:"Joe"} | Node[222128]{name:"Joe"} |
+-----------------------------------------------------+
2 rows
50 ms
Of course it's Joe.
Related
I'm importing a dataset of the following structure into Neo4j:
| teacher | student | period |
|:---------------:|---------|:------:|
| Mr. Smith | Michael | 1 |
| Mrs. Oliver | Michael | 2 |
| Mrs. Roth | Michael | 3 |
| Mrs. Oliver | Michael | 4 |
| Mrs. Oliver | Susan | 1 |
| Mrs. Roth | Susan | 2 |
My goal is to create a graph where a teacher "sends" students from one period to the next, showing the flow of students between teachers. For instance, the graph for the data above would look like this:
In words, my logic looks like this:
1. Generate a unique node for every teacher.
2. For each student, create a relationship connecting the earliest period to the next earliest period, until the latest period is reached.
My code so far completes the first step:
LOAD CSV WITH HEADERS FROM 'file:///neo_sample.csv' AS row // loads local file
MERGE(a:teacher {teacher: row.teacher}) // used merge instead of create to produce unique teacher nodes.
Here is how you can produce your illustrated graph.
Assuming your CSV file looks like this:
teacher;student;period
Mr. Smith;Michael;1
Mrs. Oliver;Michael;2
Mrs. Roth;Michael;3
Mrs. Oliver;Michael;4
Mrs. Oliver;Susan;1
Mrs. Roth;Susan;2
then this query should work:
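LOAD CSV WITH HEADERS FROM 'file:///neo_sample.csv' AS row FIELDTERMINATOR ';'
// LOAD CSV yields strings; convert period so ordering is numeric (toInt() on Neo4j 2.x)
WITH row.teacher AS t, row.student AS s, toInteger(row.period) AS p
ORDER BY p
// one ordered list of {teacher, period} entries per student
WITH s, COLLECT({t:t, p:p}) AS data
// link each entry to the next one in the student's schedule
FOREACH(i IN RANGE(0, SIZE(data)-2) |
  MERGE (a:Teacher {name: data[i].t})
  MERGE (b:Teacher {name: data[i+1].t})
  MERGE (a)-[:SENDS {student: s, period: data[i].p}]->(b)
)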
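If it helps to see the pairing logic outside of Cypher, here is a rough Python sketch of the same idea: sort by period, group by student, then link each entry to the next one. The `sends_edges` name and the tuple layout are my own choices for illustration, and the rows are simply the sample data from the question.

```python
from collections import defaultdict

# Sample rows from the question: (teacher, student, period).
ROWS = [
    ("Mr. Smith", "Michael", 1),
    ("Mrs. Oliver", "Michael", 2),
    ("Mrs. Roth", "Michael", 3),
    ("Mrs. Oliver", "Michael", 4),
    ("Mrs. Oliver", "Susan", 1),
    ("Mrs. Roth", "Susan", 2),
]

def sends_edges(rows):
    """Return (from_teacher, to_teacher, student, period) for consecutive periods."""
    by_student = defaultdict(list)
    # Sort by period first so each student's list ends up in schedule order.
    for teacher, student, period in sorted(rows, key=lambda r: r[2]):
        by_student[student].append((teacher, period))
    edges = []
    for student, data in by_student.items():
        # zip(data, data[1:]) pairs entry i with entry i+1, like the FOREACH loop.
        for (t_from, period), (t_to, _) in zip(data, data[1:]):
            edges.append((t_from, t_to, student, period))
    return edges

for edge in sends_edges(ROWS):
    print(edge)
```

Each tuple corresponds to one SENDS relationship in the graph, with the student and the starting period as its properties.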
I'm using Neo4j to develop a proof of concept, and I want to get the node IDs for all paths from my root node to the leaves. An example with IDs:
ROOT1-->N1--->SN2--->L1
ROOT1-->N2--->SN3--->L3
What I want to get in my query result is: ROOT1,N1,SN2 and ROOT1,N2,SN3
I'm new to Cypher and I'm struggling to get this result; any help would be useful.
I assume that the ID that you mention is an id property.
To get a collection of the node ids in each full path (except for the leaf node):
MATCH p=(root {id: 'ROOT1'})-[*]->(leaf)
WHERE NOT (leaf)-->()
RETURN EXTRACT(x IN NODES(p)[..-1] | x.id) AS result;
Here is a sample result:
+----------------------+
| result |
+----------------------+
| ["ROOT1","N1","SN2"] |
| ["ROOT1","N2","SN3"] |
+----------------------+
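For comparison, the same traversal can be sketched in plain Python. This is only an illustration of what the query computes, assuming the graph is a tree (no cycles); the `CHILDREN` map and the `paths_without_leaf` helper are hypothetical names built from the example in the question.

```python
# Adjacency map for the example in the question (parent id -> child ids).
CHILDREN = {
    "ROOT1": ["N1", "N2"],
    "N1": ["SN2"],
    "N2": ["SN3"],
    "SN2": ["L1"],
    "SN3": ["L3"],
}

def paths_without_leaf(node, prefix=()):
    """Yield the ids on each root-to-leaf path, leaving out the leaf itself."""
    children = CHILDREN.get(node, [])
    if not children:
        # A node with no outgoing relationships is a leaf: emit what led to it.
        # The prefix check mirrors [*], which needs at least one relationship.
        if prefix:
            yield list(prefix)
        return
    for child in children:
        yield from paths_without_leaf(child, prefix + (node,))

print(list(paths_without_leaf("ROOT1")))
```

Dropping the leaf here plays the same role as the `[..-1]` slice on `NODES(p)` in the query.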
I'm using Neo4j 2.1.7. Recently I was experimenting with MATCH queries that search for nodes with several labels, and I found that the query
Match (p:A:B) return count(p) as number
and
Match (p:B:A) return count(p) as number
take different amounts of time, especially when you have, for example, 2 million A nodes and 0 B nodes.
So does label order affect search time? Is this feature documented anywhere?
Neo4j internally maintains a label scan store, which is basically a lookup to quickly get all nodes carrying a given label A.
When doing a query like
MATCH (n:A:B) return count(n)
the label scan store is used to find all A nodes, and those nodes are then filtered by whether they carry label B as well. If n(A) >> n(B), it's far more efficient to do MATCH (n:B:A) instead, since you look up only a few B nodes and filter those for A.
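A back-of-the-envelope Python sketch of why the order matters when the planner does not reorder labels for you. It models the label scan store as a plain dictionary of sets, which is an assumption made purely for illustration; the node ids and the `count_with_labels` function are invented.

```python
# Hypothetical label lookup: label -> set of node ids carrying that label.
LABEL_INDEX = {
    "A": set(range(0, 2000)),  # many A nodes
    "B": {1999, 5000, 5001},   # few B nodes; node 1999 carries both labels
}

def count_with_labels(first, second):
    """Scan every node with `first`, keep those that also carry `second`."""
    scanned = 0
    matches = 0
    for node_id in LABEL_INDEX[first]:
        scanned += 1  # one unit of work per scanned node
        if node_id in LABEL_INDEX[second]:
            matches += 1
    return matches, scanned

print(count_with_labels("A", "B"))  # same count, many nodes scanned
print(count_with_labels("B", "A"))  # same count, few nodes scanned
```

Both orders return the same count, but starting from the rarer label touches far fewer nodes.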
You can use PROFILE MATCH (n:A:B) return count(n) to see the query plan. For Neo4j <= 2.1.x you'll see a different query plan depending on the order of the labels you've specified.
Starting with Neo4j 2.2 (milestone M03 is available as of writing this reply) there's a cost-based Cypher optimizer. Cypher is now aware of node statistics and uses them to optimize the query.
As an example I've used the following statements to create some test data:
create (:A:B);
with 1 as a foreach (x in range(0,1000000) | create (:A));
with 1 as a foreach (x in range(0,100) | create (:B));
We now have about 100 B-only nodes, about 1M A-only nodes, and 1 node carrying both labels. In 2.2 the two statements:
MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)
result in the exact same query plan (and therefore in the same execution speed):
+------------------+---------------+------+--------+-------------+---------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation | 3 | 1 | 0 | count(n) | |
| Filter | 12 | 1 | 12 | n | hasLabel(n:A) |
| NodeByLabelScan | 12 | 12 | 13 | n | :B |
+------------------+---------------+------+--------+-------------+---------------+
Since there are only a few B nodes, it's cheaper to scan for B's and filter for A. Smart Cypher, isn't it ;-)
Could anyone please tell me the difference between MERGE and CREATE in Cypher queries?
How does Neo4j store data physically?
Thanks in advance..
CREATE does just what it says. It creates, and if that means creating duplicates, well then it creates. MERGE does the same thing as create, but also checks to see if a node already exists with the properties you specify. If it does, then it doesn't create. This helps avoid duplicates. Here's an example: I use CREATE twice to create a person with the same name.
neo4j-sh (?)$ create (p:Person {name: "Bob"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
9 ms
neo4j-sh (?)$ create (p:Person {name: "Bob"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
5 ms
So now when we query, there are two Bobs.
neo4j-sh (?)$ match (p:Person {name:"Bob"}) return p;
+--------------------------+
| p |
+--------------------------+
| Node[222124]{name:"Bob"} |
| Node[222125]{name:"Bob"} |
+--------------------------+
2 rows
46 ms
Let's MERGE in another Bob and see what happens.
neo4j-sh (?)$ merge (p:Person {name:"Bob"});
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
2 ms
neo4j-sh (?)$ match (p:Person {name:"Bob"}) return p;
+--------------------------+
| p |
+--------------------------+
| Node[222124]{name:"Bob"} |
| Node[222125]{name:"Bob"} |
+--------------------------+
2 rows
11 ms
Bob already existed, so MERGE did nothing here. Querying again, the same two Bobs are present. Had there been no Bobs in the database, MERGE would have done the same thing as CREATE.
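The same contrast, boiled down to a toy in-memory store in Python. This is a simplified sketch of the semantics only: the `TinyGraph` class is made up, and unlike the real MERGE (which binds every matching node) it just returns the first match.

```python
class TinyGraph:
    """Toy node store used to contrast create-always with match-or-create."""

    def __init__(self):
        self.nodes = []

    def create(self, label, **props):
        # CREATE-like: always adds a new node, duplicates included.
        node = {"label": label, **props}
        self.nodes.append(node)
        return node

    def merge(self, label, **props):
        # MERGE-like: reuse a node that already has this label and these properties.
        wanted = {"label": label, **props}
        for node in self.nodes:
            if all(node.get(key) == value for key, value in wanted.items()):
                return node  # simplification: first match only
        return self.create(label, **props)

g = TinyGraph()
g.create("Person", name="Bob")
g.create("Person", name="Bob")
g.merge("Person", name="Bob")    # matches an existing Bob, adds nothing
g.merge("Person", name="Alice")  # no Alice yet, so this behaves like create
print(len(g.nodes))
```

After the two creates and two merges there are three nodes: two Bobs from `create`, none from the first `merge`, and one Alice from the second.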
I have a graph like this:
(2)<-[0:CHILD]-(1)-[1:CHILD]->(3)
In words: Node 1,2 and 3 (all with names); Edges 0 and 1
I wrote the following Cypher query:
START nodes = node(1,2,3), relationship = relationship(0,1)
RETURN nodes, relationship
and got this result:
==> +-----------------------------------------------+
==> | nodes | relationship |
==> +-----------------------------------------------+
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[0] {} |
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[1] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[0] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[1] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[0] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[1] {} |
==> +-----------------------------------------------+
==> 6 rows, 0 ms
Now my question:
Why do I get every node twice and every relationship three times? I just want to get each of them once.
Thanks for your time ^^
The way Cypher works is very similar to SQL. When you create your variables in your START clause, you're sort of doing a from nodes, relationships in SQL (tables). The reason you're getting a cartesian product of all of the possible values for the two, is because you're not doing any sort of match or where to filter them, so it's basically like:
select *
from nodes, relationships
Where you forgot to put the foreign key relationship between the tables.
In Cypher, you do this by doing a match, usually:
start n=node(1,2,3), r=relationship(0,1)
match n-[r]-m // find where the n nodes and the r relationships point (to m)
return *
But since you have no match, you get a cartesian product.
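Here is a small Python sketch of the same effect, using the ids from the question. The `RELS` mapping of relationship id to (start node, end node) is my reading of the graph drawn in the question, and the variable names are made up for illustration.

```python
from itertools import product

NODES = [1, 2, 3]
RELS = {0: (1, 2), 1: (1, 3)}  # relationship id -> (start node, end node)

# Two unrelated START variables: every node is combined with every relationship.
unconstrained = list(product(NODES, RELS))
print(len(unconstrained))

# Adding the match n-[r]-m keeps only rows where r actually touches n.
matched = [(n, r) for n, r in unconstrained if n in RELS[r]]
print(matched)
```

The unconstrained version gives 3 x 2 = 6 rows, which is exactly the output in the question. Once the relationship has to be attached to the node, only the rows that exist in the graph remain.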
You should only see the nodes and relationships once, unless you do some matching.
Tried to reproduce your problem, but I haven't been able to.
http://tinyurl.com/cobd8oq
Is it possible for you to create a console.neo4j.org example of your problem?
Thanks,
Andrés