I'm importing a dataset of the following structure into Neo4j:
| teacher | student | period |
|:---------------:|---------|:------:|
| Mr. Smith | Michael | 1 |
| Mrs. Oliver | Michael | 2 |
| Mrs. Roth | Michael | 3 |
| Mrs. Oliver | Michael | 4 |
| Mrs. Oliver | Susan | 1 |
| Mrs. Roth | Susan | 2 |
My goal is to create a graph where a teacher "sends" students from one period to the next, showing the flow of students between teachers. The table above, for instance, would produce a graph like this:
In words, my logic looks like this:
1. Generate a unique node for every teacher.
2. For each student, create a relationship connecting the earliest period to the next earliest period, until the latest period is reached.
My code so far completes the first step:
LOAD CSV WITH HEADERS FROM 'file:///neo_sample.csv' AS row // loads local file
MERGE(a:teacher {teacher: row.teacher}) // used merge instead of create to produce unique teacher nodes.
Here is how you can produce your illustrated graph.
Assuming your CSV file looks like this:
teacher;student;period
Mr. Smith;Michael;1
Mrs. Oliver;Michael;2
Mrs. Roth;Michael;3
Mrs. Oliver;Michael;4
Mrs. Oliver;Susan;1
Mrs. Roth;Susan;2
then this query should work:
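LOAD CSV WITH HEADERS FROM 'file:///neo_sample.csv' AS row FIELDTERMINATOR ';'
// LOAD CSV reads every field as a string; convert period so ORDER BY sorts numerically
// (toInteger() is toInt() on older Neo4j releases)
WITH row.teacher AS t, row.student AS s, toInteger(row.period) AS p
ORDER BY p
// one ordered list of {teacher, period} maps per student
WITH s, COLLECT({t:t, p:p}) AS data
// link each entry to the next one in the student's schedule
FOREACH(i IN RANGE(0, SIZE(data)-2) |
  MERGE (a:Teacher {name: data[i].t})
  MERGE (b:Teacher {name: data[i+1].t})
  MERGE (a)-[:SENDS {student: s, period: data[i].p}]->(b)
)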
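To make the pairing logic easier to follow outside of Cypher, here is a minimal Python sketch of the same idea: group rows by student, sort by period, and link each teacher to the next one. The function name `sends_edges` and the tuple layout are illustrative assumptions, not part of the original answer.

```python
from collections import defaultdict

# Sample rows from the question: (teacher, student, period)
rows = [
    ("Mr. Smith", "Michael", 1),
    ("Mrs. Oliver", "Michael", 2),
    ("Mrs. Roth", "Michael", 3),
    ("Mrs. Oliver", "Michael", 4),
    ("Mrs. Oliver", "Susan", 1),
    ("Mrs. Roth", "Susan", 2),
]

def sends_edges(rows):
    """Return (sender, receiver, student, period) for consecutive periods."""
    by_student = defaultdict(list)
    for teacher, student, period in rows:
        by_student[student].append((period, teacher))

    edges = []
    for student, visits in by_student.items():
        visits.sort()  # order each student's schedule by period
        # pair every entry with the one that follows it
        for (period, sender), (_, receiver) in zip(visits, visits[1:]):
            edges.append((sender, receiver, student, period))
    return edges

edges = sends_edges(rows)
for edge in edges:
    print(edge)
```

Each tuple corresponds to one SENDS relationship; a student with a single period produces no edges, just as RANGE(0, SIZE(data)-2) is empty in that case.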
Hi, I have tables like this:
exams
id | exam_name
1 | computer science
2 | Environment science
exam_students
id | exam_id | student_name
1 | 1 | Josh
2 | 1 | Michael
3 | 1 | John
I just need to join them and count the total students for each exam, with output like this:
exam_name | total_students |
computer science | 3 |
Environment science| 0 |
Thank you for any help and suggestions.
Try this:
SELECT
a.exam_name, count(b.id) AS total_students
FROM
exams a
LEFT JOIN exam_students b ON a.id = b.exam_id
GROUP BY
a.id, a.exam_name
Hope this helps.
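As a quick check, the query can be run against the sample data with Python's built-in sqlite3 module. This is only a sketch: the in-memory database, the column types, and the added ORDER BY (for stable output) are assumptions, and the original poster's database engine is not stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exams (id INTEGER PRIMARY KEY, exam_name TEXT);
CREATE TABLE exam_students (id INTEGER PRIMARY KEY, exam_id INTEGER, student_name TEXT);
INSERT INTO exams VALUES (1, 'computer science'), (2, 'Environment science');
INSERT INTO exam_students VALUES (1, 1, 'Josh'), (2, 1, 'Michael'), (3, 1, 'John');
""")

# LEFT JOIN keeps exams without students; COUNT(b.id) ignores the NULLs
# that the join produces for them, so those exams get a total of 0.
totals = conn.execute("""
SELECT a.exam_name, COUNT(b.id) AS total_students
FROM exams a
LEFT JOIN exam_students b ON a.id = b.exam_id
GROUP BY a.id, a.exam_name
ORDER BY a.id
""").fetchall()

for exam_name, total in totals:
    print(exam_name, total)
```

Counting b.id rather than * is what makes the empty exam report 0 instead of 1.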
In my Neo4j database, I have a unique constraint set:
neo4j-sh (?)$ schema
Indexes
ON :Post(uuid) ONLINE (for uniqueness constraint)
Constraints
ON (post:Post) ASSERT post.uuid IS UNIQUE
However, when I profile a query, the lookup seems to be done by NodeByLabelScan:
neo4j-sh (?)$ profile match (p:Post {uuid:"503cb957-9da0-490c-808d-48b64a1b1f64"}) return p;
+---+
| p |
+---+
+---+
0 row
12 ms
Compiler CYPHER 2.2
Planner COST
Filter
|
+NodeByLabelScan
+-----------------+---------------+------+--------+-------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+-----------------+---------------+------+--------+-------------+---------------------------+
| Filter | 1 | 0 | 2 | p | p.uuid == { AUTOSTRING0} |
| NodeByLabelScan | 1 | 1 | 2 | p | :Post |
+-----------------+---------------+------+--------+-------------+---------------------------+
Total database accesses: 4
Is there something I am missing here?
My neo4j version is 2.2.3.
Neo4j 2.2 introduced a cost-based planner (the profile above shows "Planner COST"). My guess is that Cypher estimates a NodeByLabelScan followed by a filter to be cheaper than an index lookup here, because there are so few :Post nodes.
I was following the Neo4j online tutorial and ran into a question while trying this query in the query tool:
match (a:Person)-[:ACTED_IN|:DIRECTED]->()<-[:ACTED_IN|:DIRECTED]-(b:Person)
return a,b;
I was expecting at least one of the returned pairs to have the same Person in both identifiers, but that didn't happen. Can somebody explain why? Does a match clause exclude repeated elements across the different identifiers used?
UPDATE:
This question came up in "Lesson 3 - Adding Relationships with Cypher, more" from the Neo4j online tutorial, where the query I mentioned above is presented.
I refined the query to the following one, in order to focus my question more directly:
MATCH (a:Person {name:"Keanu Reeves"})-[:ACTED_IN]->()<-[:ACTED_IN]-(b)
RETURN a,b;
The results:
|---------------|--------------------|
| a | b |
|---------------|--------------------|
| Keanu Reeves | Carrie-Anne Moss |
| Keanu Reeves | Laurence Fishburne |
| Keanu Reeves | Hugo Weaving |
| Keanu Reeves | Brooke Langton |
| Keanu Reeves | Gene Hackman |
| Keanu Reeves | Orlando Jones |
|------------------------------------|
So why is there no row with Keanu Reeves in both a and b? Shouldn't he match on both :ACTED_IN relationships?
The behavior you observed is by design.
To quote the manual:
While pattern matching, Cypher makes sure to not include matches where
the same graph relationship is found multiple times in a single
pattern. In most use cases, this is a sensible thing to do.
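The effect of that rule can be imitated with a small Python sketch. The relationship list, the movie assignments, and the function `pattern_matches` are hypothetical illustrations of the matching semantics, not Neo4j's implementation.

```python
# Hypothetical sample: (relationship id, person, movie)
rels = [
    (0, "Keanu Reeves", "The Matrix"),
    (1, "Carrie-Anne Moss", "The Matrix"),
    (2, "Keanu Reeves", "The Replacements"),
    (3, "Gene Hackman", "The Replacements"),
    (4, "Joe", "Some Flick"),  # Joe ACTED_IN Some Flick
    (5, "Joe", "Some Flick"),  # Joe also DIRECTED Some Flick
]

def pattern_matches(person, rels, unique_relationships=True):
    """Imitate (a)-[r1]->(movie)<-[r2]-(b) for a fixed person a."""
    pairs = []
    for r1, a, movie1 in rels:
        if a != person:
            continue
        for r2, b, movie2 in rels:
            if movie1 != movie2:
                continue
            # Cypher never binds the same relationship twice in one pattern
            if unique_relationships and r1 == r2:
                continue
            pairs.append((a, b))
    return pairs

print(pattern_matches("Keanu Reeves", rels))
print(pattern_matches("Keanu Reeves", rels, unique_relationships=False))
print(pattern_matches("Joe", rels))
```

With the rule enabled, Keanu never pairs with himself, because getting back to him would reuse the relationship that led to the movie. A person only pairs with themselves when two different relationships connect them to the same node, which is the situation in the sketch's "Joe" rows.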
I would check your data sample. Your query looks like it works just fine for me. I replicated with a simple data set, and here's verification that it does produce pairs like what you're looking for.
Joe acted in "Some Flick"
neo4j-sh (?)$ create (p:Person {name:"Joe"})-[:ACTED_IN]->(m:Movie {name:"Some Flick"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2
Relationships created: 1
Properties set: 2
Labels added: 2
14 ms
But Joe is so multi-talented, he also directed "Some Flick".
neo4j-sh (?)$ match (p:Person {name: "Joe"}), (m:Movie {name: "Some Flick"}) create p-[:DIRECTED]->m;
+-------------------+
| No data returned. |
+-------------------+
Relationships created: 2
23 ms
So who are the actor/director pairs that we know of?
neo4j-sh (?)$ match (a:Person)-[:ACTED_IN|:DIRECTED]->()<-[:ACTED_IN|:DIRECTED]-(b:Person)
> return a,b;
+-----------------------------------------------------+
| a | b |
+-----------------------------------------------------+
| Node[222128]{name:"Joe"} | Node[222128]{name:"Joe"} |
| Node[222128]{name:"Joe"} | Node[222128]{name:"Joe"} |
+-----------------------------------------------------+
2 rows
50 ms
Of course it's Joe.
I'm running the following cypher query on two identical neo4j DB schemas:
START dave = node(7)
// dave's friend who lives and attends an event in the same city
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event<-[:ATTENDS]-friend
RETURN dave.name, friend.name, city.name, event.name;
When I run the above query on the DB schema on my local server, I get correct results--a single path:
+----------------------------------------------------+
| dave.name | friend.name | city.name | event.name |
+----------------------------------------------------+
| "dave" | "adam" | "london" | "exhibition" |
+----------------------------------------------------+
In fact for each of the 4 persons node(4, 5, 6, 7), adam=node(4) is the only person who lives and attends an event in the same city.
However, when I run the same query here (on the exact same DB schema as on my local server) I'm getting the following incorrect result:
+----------------------------------------------------+
| dave.name | friend.name | city.name | event.name |
+----------------------------------------------------+
| "dave" | "adam" | "london" | "exhibition" |
| "dave" | "adam" | "london" | "exhibition" |
| "dave" | "bill" | "paris" | "seminar" | // bill doesn't attend seminar
+----------------------------------------------------+
For other persons instead of dave=node(7), the results here are also incorrect (extra paths that don't exist).
Try splitting the match into two parts; I have never used the same identifier twice in one match pattern.
Instead of
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event<-[:ATTENDS]-friend
use
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event, event<-[:ATTENDS]-friend
I have a graph like this:
(2)<-[0:CHILD]-(1)-[1:CHILD]->(3)
In words: Node 1,2 and 3 (all with names); Edges 0 and 1
I write the following cypher-query:
START nodes = node(1,2,3), relationship = relationship(0,1)
RETURN nodes, relationship
and got this result:
==> +-----------------------------------------------+
==> | nodes | relationship |
==> +-----------------------------------------------+
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[0] {} |
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[1] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[0] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[1] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[0] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[1] {} |
==> +-----------------------------------------------+
==> 6 rows, 0 ms
Now my question:
Why do I get every node twice and every relationship three times? I just want to get each of them once.
Thanks for your time ^^
The way Cypher works is very similar to SQL. Declaring your variables in the START clause is roughly like listing tables in a SQL FROM clause (from nodes, relationships). You get a cartesian product of all possible values of the two because there is no match or where to filter them, so it's basically like:
select *
from nodes, relationships
where you forgot to add the join condition (the foreign key relationship) between the tables.
In Cypher, you do this by doing a match, usually:
start n=node(1,2,3), r=relationship(0,1)
match n-[r]-m // find where the n nodes and the r relationships point (to m)
return *
But since you have no match, you get a cartesian product.
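The row counts can be reproduced with a short Python sketch. The `endpoints` mapping encodes the graph from the question, (2)<-[0:CHILD]-(1)-[1:CHILD]->(3); the variable names are illustrative assumptions.

```python
from itertools import product

nodes = [1, 2, 3]
relationships = [0, 1]

# No match clause: every node is combined with every relationship
rows = list(product(nodes, relationships))
print(len(rows))  # 3 nodes x 2 relationships

# Graph from the question: relationship id -> (start node, end node)
endpoints = {0: (1, 2), 1: (1, 3)}

# Imitating "match n-[r]-m": keep a row only if n is an endpoint of r
matched = [(n, r) for n, r in rows if n in endpoints[r]]
print(matched)
```

The unfiltered product has six rows, matching the output in the question; the filtered version keeps only the combinations where the node actually touches the relationship.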
You should only see the nodes and relationships once, unless you do some matching.
Tried to reproduce your problem, but I haven't been able to.
http://tinyurl.com/cobd8oq
Is it possible for you to create a console.neo4j.org example of your problem?
Thanks,
Andrés