Neo4j - Create relationship within label on property

I'm importing a dataset of the following structure into Neo4j:
| teacher | student | period |
|:---------------:|---------|:------:|
| Mr. Smith | Michael | 1 |
| Mrs. Oliver | Michael | 2 |
| Mrs. Roth | Michael | 3 |
| Mrs. Oliver | Michael | 4 |
| Mrs. Oliver | Susan | 1 |
| Mrs. Roth | Susan | 2 |
My goal is to create a graph where a teacher "sends" students from one period to the next, showing the flow of students between teachers. For the data above, for instance, the graph would look like this:
In words, my logic is:
1. Generate a unique node for every teacher.
2. For each student, create a relationship from the teacher of the earliest period to the teacher of the next earliest period, and so on until the latest period is reached.
My code so far completes the first step:
LOAD CSV WITH HEADERS FROM 'file:///neo_sample.csv' AS row // loads local file
MERGE(a:teacher {teacher: row.teacher}) // used merge instead of create to produce unique teacher nodes.

Here is how you can produce your illustrated graph.
Assuming your CSV file looks like this:
teacher;student;period
Mr. Smith;Michael;1
Mrs. Oliver;Michael;2
Mrs. Roth;Michael;3
Mrs. Oliver;Michael;4
Mrs. Oliver;Susan;1
Mrs. Roth;Susan;2
then this query should work:
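LOAD CSV WITH HEADERS FROM 'file:///neo_sample.csv' AS row FIELDTERMINATOR ';'
// LOAD CSV reads every field as a string, so convert the period to a number;
// otherwise "10" would sort before "2". (toInteger is called toInt in older Neo4j versions.)
WITH row.teacher AS t, row.student AS s, toInteger(row.period) AS p
ORDER BY p
// one row per student, holding that student's teachers in period order
WITH s, COLLECT({t:t, p:p}) AS data
// link each teacher to the teacher of the student's next period
FOREACH(i IN RANGE(0, SIZE(data)-2) |
  MERGE (a:Teacher {name: data[i].t})
  MERGE (b:Teacher {name: data[i+1].t})
  MERGE (a)-[:SENDS {student: s, period: data[i].p}]->(b)
)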
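To see which relationships the query is meant to produce, the same "pair each period with the next one" logic can be sketched outside Neo4j. This is only an illustration in plain Python, not part of the import: the function name `sends_edges` and the tuple layout are made up for the example, and the CSV text is the sample from above.

```python
import csv
import io
from collections import defaultdict

CSV_TEXT = """teacher;student;period
Mr. Smith;Michael;1
Mrs. Oliver;Michael;2
Mrs. Roth;Michael;3
Mrs. Oliver;Michael;4
Mrs. Oliver;Susan;1
Mrs. Roth;Susan;2
"""

def sends_edges(csv_text):
    """Return (sender, receiver, student, period) tuples (hypothetical helper)."""
    by_student = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=";"):
        # Convert the period to int so that sorting is numeric, not textual.
        by_student[row["student"]].append((int(row["period"]), row["teacher"]))

    edges = []
    for student, visits in by_student.items():
        visits.sort()  # earliest period first
        # Pair every visit with the one that follows it.
        for (period, sender), (_, receiver) in zip(visits, visits[1:]):
            edges.append((sender, receiver, student, period))
    return edges

edges = sends_edges(CSV_TEXT)
for edge in edges:
    print(edge)
```

Each printed tuple corresponds to one SENDS relationship created by the FOREACH loop: three for Michael and one for Susan.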

Related

MySQL join and count for each category

Hi, I have tables like this:
exams
id | exam_name
1 | computer science
2 | Environment science
exam_students
id | exam_id | student_name
1 | 1 | Josh
2 | 1 | Michael
3 | 1 | John
I just need to join the tables, count the total number of students for each exam, and output something like this:
exam_name | total_students |
computer science | 3 |
Environment science| 0 |
Thank you for any help and suggestions.
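Try this:
SELECT
  a.exam_name, COUNT(b.id) AS total_students
FROM
  exams a
  -- LEFT JOIN keeps exams that have no students; COUNT(b.id) ignores the NULLs, giving 0
  LEFT JOIN exam_students b ON a.id = b.exam_id
GROUP BY
  a.id, a.exam_name
Hope this helps.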
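A quick way to check the LEFT JOIN plus COUNT behaviour is to run it against a throwaway database. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made for convenience; the query uses only standard SQL), with the table contents taken from the question. The added ORDER BY is only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exams (id INTEGER PRIMARY KEY, exam_name TEXT);
    CREATE TABLE exam_students (id INTEGER PRIMARY KEY, exam_id INTEGER, student_name TEXT);
    INSERT INTO exams VALUES (1, 'computer science'), (2, 'Environment science');
    INSERT INTO exam_students VALUES (1, 1, 'Josh'), (2, 1, 'Michael'), (3, 1, 'John');
""")

# COUNT(b.id) counts only non-NULL ids, so an exam without students yields 0.
rows = conn.execute("""
    SELECT a.exam_name, COUNT(b.id) AS total_students
    FROM exams a
    LEFT JOIN exam_students b ON a.id = b.exam_id
    GROUP BY a.id, a.exam_name
    ORDER BY a.id
""").fetchall()

for exam_name, total_students in rows:
    print(exam_name, total_students)
```

Using COUNT(*) instead would report 1 for the exam without students, because the LEFT JOIN still produces one row for it.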

Why does PROFILE show a NodeByLabelScan for a property that has a unique constraint in Neo4j?

In my Neo4j data, I have a unique constraint set.
neo4j-sh (?)$ schema
Indexes
ON :Post(uuid) ONLINE (for uniqueness constraint)
Constraints
ON (post:Post) ASSERT post.uuid IS UNIQUE
However, when I profile a query, the search seems to be done by NodeByLabelScan:
neo4j-sh (?)$ profile match (p:Post {uuid:"503cb957-9da0-490c-808d-48b64a1b1f64"}) return p;
+---+
| p |
+---+
+---+
0 row
12 ms
Compiler CYPHER 2.2
Planner COST
Filter
|
+NodeByLabelScan
+-----------------+---------------+------+--------+-------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+-----------------+---------------+------+--------+-------------+---------------------------+
| Filter | 1 | 0 | 2 | p | p.uuid == { AUTOSTRING0} |
| NodeByLabelScan | 1 | 1 | 2 | p | :Post |
+-----------------+---------------+------+--------+-------------+---------------------------+
Total database accesses: 4
Is there something I am missing here?
My neo4j version is 2.2.3.
Neo4j 2.2 introduced a cost-based query planner. My guess is that Cypher estimates a NodeByLabelScan followed by a Filter to be cheaper than an index lookup here, because there are so few :Post nodes (the profile shows the scan returning a single row).

Nodes with same relation to a third node in a graph database

I was following the Neo4j online tutorial and ran into a question while trying this query in the query tool:
match (a:Person)-[:ACTED_IN|:DIRECTED]->()<-[:ACTED_IN|:DIRECTED]-(b:Person)
return a,b;
I was expecting one of the returned pairs to have the same Person in both identifiers, but that didn't happen. Can somebody explain to me why? Does a MATCH clause exclude elements that are repeated across the different identifiers?
UPDATE:
This question came up in "Lesson 3 - Adding Relationships with Cypher, more" of the Neo4j online tutorial, where the query I mentioned above is presented.
I refined the query to the following one to focus my question more directly:
MATCH (a:Person {name:"Keanu Reeves"})-[:ACTED_IN]->()<-[:ACTED_IN]-(b)
RETURN a,b;
The results:
|---------------|--------------------|
| a | b |
|---------------|--------------------|
| Keanu Reeves | Carrie-Anne Moss |
| Keanu Reeves | Laurence Fishburne |
| Keanu Reeves | Hugo Weaving |
| Keanu Reeves | Brooke Langton |
| Keanu Reeves | Gene Hackman |
| Keanu Reeves | Orlando Jones |
|------------------------------------|
So why is there no row with Keanu Reeves in both a and b? Shouldn't he match both :ACTED_IN relationships?
The behavior you observed is by design.
To quote the manual:
While pattern matching, Cypher makes sure to not include matches where
the same graph relationship is found multiple times in a single
pattern. In most use cases, this is a sensible thing to do.
I would check your data sample. Your query looks like it works just fine for me. I replicated with a simple data set, and here's verification that it does produce pairs like what you're looking for.
Joe acted in "Some Flick"
neo4j-sh (?)$ create (p:Person {name:"Joe"})-[:ACTED_IN]->(m:Movie {name:"Some Flick"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2
Relationships created: 1
Properties set: 2
Labels added: 2
14 ms
But Joe is so multi-talented, he also directed "Some Flick".
neo4j-sh (?)$ match (p:Person {name: "Joe"}), (m:Movie {name: "Some Flick"}) create p-[:DIRECTED]->m;
+-------------------+
| No data returned. |
+-------------------+
Relationships created: 2
23 ms
So who are the actor/director pairs that we know of?
neo4j-sh (?)$ match (a:Person)-[:ACTED_IN|:DIRECTED]->()<-[:ACTED_IN|:DIRECTED]-(b:Person)
> return a,b;
+-----------------------------------------------------+
| a | b |
+-----------------------------------------------------+
| Node[222128]{name:"Joe"} | Node[222128]{name:"Joe"} |
| Node[222128]{name:"Joe"} | Node[222128]{name:"Joe"} |
+-----------------------------------------------------+
2 rows
50 ms
Of course it's Joe.
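The two observations fit together: Keanu Reeves reaches each movie through a single ACTED_IN relationship, which cannot be used twice in one pattern, while Joe reaches "Some Flick" through two different relationships. A rough Python emulation of that uniqueness rule is sketched below. It is not how Neo4j is implemented; the tuple layout, the function name `co_workers`, and the tiny data set (including "The Matrix" as the movie title) are assumptions made up for the illustration.

```python
from itertools import product

# Hypothetical toy graph: (relationship id, person, relationship type, movie).
RELS = [
    (0, "Keanu Reeves", "ACTED_IN", "The Matrix"),
    (1, "Carrie-Anne Moss", "ACTED_IN", "The Matrix"),
    (2, "Joe", "ACTED_IN", "Some Flick"),
    (3, "Joe", "DIRECTED", "Some Flick"),
]

def co_workers(rels):
    """Emulate (a)-[r1]->(movie)<-[r2]-(b), never reusing one relationship as both r1 and r2."""
    pairs = []
    for r1, r2 in product(rels, repeat=2):
        same_movie = r1[3] == r2[3]
        distinct_rels = r1[0] != r2[0]  # the relationship-uniqueness rule
        if same_movie and distinct_rels:
            pairs.append((r1[1], r2[1]))
    return pairs

pairs = co_workers(RELS)
for a, b in pairs:
    print(a, "|", b)
```

In this toy data, Keanu Reeves is never paired with himself, whereas Joe appears twice as (Joe, Joe): once with ACTED_IN on the left and DIRECTED on the right, and once the other way round, which matches the two rows in the shell output above.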

Getting different (and incorrect) results running the same Cypher query on identical DB schemas

I'm running the following cypher query on two identical neo4j DB schemas:
START dave = node(7)
// dave's friend who lives and attends an event in the same city
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event<-[:ATTENDS]-friend
RETURN dave.name, friend.name, city.name, event.name;
When I run the above query on the DB schema on my local server, I get correct results--a single path:
+----------------------------------------------------+
| dave.name | friend.name | city.name | event.name |
+----------------------------------------------------+
| "dave" | "adam" | "london" | "exhibition" |
+----------------------------------------------------+
In fact for each of the 4 persons node(4, 5, 6, 7), adam=node(4) is the only person who lives and attends an event in the same city.
However, when I run the same query here (on the exact same DB schema as on my local server) I'm getting the following incorrect result:
+----------------------------------------------------+
| dave.name | friend.name | city.name | event.name |
+----------------------------------------------------+
| "dave" | "adam" | "london" | "exhibition" |
| "dave" | "adam" | "london" | "exhibition" |
| "dave" | "bill" | "paris" | "seminar" | // bill doesn't attend seminar
+----------------------------------------------------+
For other persons instead of dave=node(7), the results here are also incorrect (extra paths that don't exist).
Try splitting the MATCH pattern into two parts; I have never used the same identifier twice within a single pattern.
Instead of
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event<-[:ATTENDS]-friend
use
MATCH dave-[:FRIEND]-friend-[:LIVES]->city-[:HOSTS]->event, event<-[:ATTENDS]-friend

understanding cypher output

I have a graph like this:
(2)<-[0:CHILD]-(1)-[1:CHILD]->(3)
In words: Node 1,2 and 3 (all with names); Edges 0 and 1
I ran the following Cypher query:
START nodes = node(1,2,3), relationship = relationship(0,1)
RETURN nodes, relationship
and got as a result:
==> +-----------------------------------------------+
==> | nodes | relationship |
==> +-----------------------------------------------+
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[0] {} |
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[1] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[0] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[1] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[0] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[1] {} |
==> +-----------------------------------------------+
==> 6 rows, 0 ms
Now my question:
Why do I get every node twice and every relationship three times? I just want to get each of them once.
Thanks for your time ^^
Cypher works much like SQL here. Declaring two variables in the START clause is roughly equivalent to listing two tables in a SQL FROM clause. You get the cartesian product of all possible values of the two (3 nodes x 2 relationships = 6 rows) because no MATCH or WHERE ties them together, so it's basically like:
select *
from nodes, relationships
where you forgot the join condition (the foreign key relationship) between the tables.
In Cypher, you usually express that condition with a MATCH:
start n=node(1,2,3), r=relationship(0,1)
match n-[r]-m // find where the n nodes and the r relationships point (to m)
return *
But since you have no match, you get a cartesian product.
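The row counts can be reproduced with a small sketch in plain Python, where `itertools.product` plays the role of the unfiltered START clause and a simple endpoint check plays the role of the MATCH. The names `NODES`, `RELS`, `unfiltered` and `matched` are made up for the illustration; the graph is the one from the question.

```python
from itertools import product

NODES = [1, 2, 3]
# Relationship id -> (start node, end node), from (2)<-[0:CHILD]-(1)-[1:CHILD]->(3)
RELS = {0: (1, 2), 1: (1, 3)}

# No MATCH: every node is combined with every relationship.
unfiltered = list(product(NODES, RELS))

# With a pattern like n-[r]-m: keep a row only if n is an endpoint of r,
# and bind m to the other endpoint.
matched = []
for n, r in unfiltered:
    start, end = RELS[r]
    if n == start:
        matched.append((n, r, end))
    elif n == end:
        matched.append((n, r, start))

print(len(unfiltered), "rows without a match")
print(len(matched), "rows with a match:", matched)
```

Without the filter every node is listed once per relationship (twice) and every relationship once per node (three times), which is exactly the six-row output in the question.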
You should only see the nodes and relationships once, unless you do some matching.
Tried to reproduce your problem, but I haven't been able to.
http://tinyurl.com/cobd8oq
Is it possible for you to create a console.neo4j.org example of your problem?
Thanks,
Andrés
