I have a Neo4j database with different labels on the nodes, such as a:Banker and b:Customer. Each has an email property. I want to search for an email, but without searching the entire database. So I want to do something like Match (a:Banker {email: '123#mymail.com'}) OR Match (b:Customer {email: '123#mymail.com'}). There are constraints on email for both labels, but I don't want the same email to appear under both labels, so before I add a node I need to determine whether the email already exists in either the Banker or the Customer nodes. I suspect this can be done in an efficient, scalable way that would not leave the user staring at a spinner when trying to add the one-millionth record. Any help would be much appreciated.
I would add an additional label, Person, to all Bankers and Customers.
CREATE CONSTRAINT ON (p:Person) ASSERT p.Email IS UNIQUE
CREATE CONSTRAINT ON (b:Banker) ASSERT b.Email IS UNIQUE
CREATE CONSTRAINT ON (c:Customer) ASSERT c.Email IS UNIQUE
CREATE (b:Person:Banker {Email: "123#mymail.com"})
CREATE (b:Person:Customer {Email: "321#mymail.com"})
CREATE (c:Person:Customer {Email: "123#mymail.com"})
The last one will fail because a Person (the Banker) already has the same email. You can then search with MATCH (p:Person {Email: "123#mymail.com"}), or restrict the search to b:Banker or c:Customer.
You can also do (p:Person:Customer:Banker) if a person is all three.
It will also allow you to do MERGE which creates an entry if it doesn't already exist.
Since you already have a database you can do:
MATCH (b:Banker)
SET b:Person;
MATCH (c:Customer)
SET c:Person;
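As a rough sketch of the check the question asks for, assuming the Person label and the Email constraint from this answer are already in place: the lookup goes through the shared Person label, so a single indexed search covers both Bankers and Customers. The second email value is made up for illustration.

```cypher
// Does this email already belong to anyone, whatever their role?
// The uniqueness constraint on :Person(Email) backs this lookup with an index.
MATCH (p:Person {Email: "123#mymail.com"})
RETURN labels(p) AS labels;

// Or create the node only if the email is not taken yet.
// "456#mymail.com" is a hypothetical value, not from the question.
MERGE (p:Person {Email: "456#mymail.com"})
ON CREATE SET p:Customer
RETURN p;
```

If the MERGE finds an existing Person, it returns that node unchanged, so the caller can inspect its labels to see whether the email was already in use.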
A somewhat "safer" approach than @Liam's would be to just have the Person label, without the Banker and Customer labels. That way, it would be harder to accidentally create/merge a node without the Person label, since that would be the only label for a person. Also, this approach would not require 2 (or 3) uniqueness checks every time you added a person.
With this approach, you could also add isCustomer and isBanker boolean properties, as needed, and create indexes on :Person(isCustomer) and :Person(isBanker) to quickly locate customers versus bankers.
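A minimal sketch of those indexes, written in the older index syntax that matches the CREATE CONSTRAINT ... ASSERT style used elsewhere in this thread (recent Neo4j versions use CREATE INDEX FOR (p:Person) ON (p.isBanker) instead). The Email property name is carried over from the other answer and is an assumption here.

```cypher
// One index per role flag.
CREATE INDEX ON :Person(isCustomer);
CREATE INDEX ON :Person(isBanker);

// Locate all bankers through the index.
MATCH (p:Person {isBanker: true})
RETURN p.Email;
```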
Now, having said the above, I wonder if you really need the isCustomer and isBanker properties (or the Customer and Banker labels) at all. That is, the fact that a Person node is a banker and/or a customer may be derivable from that node's relationships. It seems reasonable for your data model to contain Bank nodes with relationships between them and people. For example, in the following data model, b is a banker at "XYZ Bank", c is a customer, and bc is both:
(b:Person)-[:WORKS_AT]->(xyz:Bank {id:123, name: 'XYZ Bank'}),
(c:Person)-[:BANKS_AT]->(xyz),
(bc:Person)-[:BANKS_AT]->(xyz)<-[:WORKS_AT]-(bc)
This query would find all bankers:
MATCH (banker:Person)-[:WORKS_AT]->(:Bank)
RETURN banker;
This would find all customers:
MATCH (customer:Person)-[:BANKS_AT]->(:Bank)
RETURN customer;
This would find all bankers who are also customers at the same bank:
MATCH (both:Person)-[:WORKS_AT]->(:Bank)<-[:BANKS_AT]-(both)
RETURN both;
I just downloaded and installed Neo4j. Now I'm working with a simple CSV file that looks like this:
So first I'm using this to merge the nodes for that file:
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MERGE(Rank:rank{rang: line.Rank})
MERGE(Name:name{nom: line.Name})
MERGE(Sport:sport{sport: line.Sport})
MERGE(Nation:nation{pays: line.Nation})
MERGE(Gender: gender{genre: line.Gender})
MERGE(BirthDate:birthDate{dateDeNaissance: line.BirthDate})
MERGE(BirthPlace: birthplace{lieuDeNaissance: line.BirthPlace})
MERGE(Height: height{taille: line.Height})
MERGE(Pay: pay{salaire: line.Pay})
and this to create some constraints:
CREATE CONSTRAINT ON(name:Name) ASSERT name.nom IS UNIQUE
CREATE CONSTRAINT ON(rank:Rank) ASSERT rank.rang IS UNIQUE
Then I want to display which country each athlete lives in. For that I use:
Create(name)-[:WORK_AT]->(nation)
But this is what appears:
I would like to know why I get that, please.
I thank in advance anyone that takes time to help me.
Several issues come to mind:
If your CREATE clause is part of your first query: since the CREATE clause uses the variable names name and nation, and your MERGE clauses use Name and Nation (which have different casing) -- the CREATE clause would just create new nodes instead of using the Name and Nation nodes.
If your CREATE clause is NOT part of your first query: your CREATE clause would just create new nodes (since variable names, even assuming they had the same casing, are local to a query and are not stored in the DB).
Solution: You can add this clause to the end of the first query:
CREATE (Name)-[:WORK_AT]->(Nation)
Yes, I agree with @cybersam; the problem is the case sensitivity of the 'name' and 'nation' variables.
My suggestion:
MERGE (Name)-[:WORK_AT]->(Nation)
I see that you're using MERGE for nodes, so in case any values of Name or Nation are duplicated, you should use MERGE instead of CREATE for the relationship as well.
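Putting both answers together, a trimmed-down version of the import could look like the sketch below. It keeps only the two node types involved in the relationship. The variable names athlete and country are made up for illustration. The labels are written as Name and Nation on the assumption that they should match the constraints in the question; labels are case-sensitive too, so a constraint on :Name does not cover nodes labelled :name.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
// Variables (athlete, country) are only visible inside this one query.
MERGE (athlete:Name {nom: line.Name})
MERGE (country:Nation {pays: line.Nation})
// MERGE avoids duplicate relationships when the same pair appears twice.
MERGE (athlete)-[:WORK_AT]->(country);
```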
I'm really struggling getting my head around neo4j and was hoping someone might be able to help point me in the right direction with the below.
Basically, I have a list of what can be referred to as events; an event describes a student entering and leaving a room.
Each event has a unique identifier; it also has an identifier for the student in question along with start and end times (e.g. the student entered the room at 12:00 and left at 12:05) and an identifier for the room.
The events might look like the following, with columns separated by a pipe delimiter:
ID|SID|ROOM|ENTERS|LEAVES
1|1|BLUE|1/01/2015 11:00|4/01/2015 10:19
2|2|GREEN|1/01/2015 12:11|1/01/2015 12:11
3|2|YELLOW|1/01/2015 12:11|1/01/2015 12:20
4|2|BLUE|1/01/2015 12:20|5/01/2015 10:48
5|3|GREEN|1/01/2015 18:41|1/01/2015 18:41
6|3|YELLOW|1/01/2015 18:41|1/01/2015 21:00
7|3|BLUE|1/01/2015 21:00|9/01/2015 9:30
8|4|BLUE|1/01/2015 19:30|3/01/2015 11:00
9|5|GREEN|2/01/2015 19:08|2/01/2015 19:08
10|5|ORANGE|2/01/2015 19:08|3/01/2015 2:43
11|5|PURPLE|3/01/2015 2:43|4/01/2015 16:44
12|6|GREEN|3/01/2015 11:52|3/01/2015 11:52
13|6|YELLOW|3/01/2015 11:52|3/01/2015 17:45
14|6|RED|3/01/2015 17:45|7/01/2015 10:00
Questions that might be asked could be:
what rooms have student x visited and in what order
what does the movement of students between rooms look like - which room do students go to when they leave room y
That sounds simple enough but I'm tying myself into knots.
I started off creating unique constraints for both student and room
create constraint on (student: Student) assert student.id is unique
I then did the same for room.
I then loaded student as
using periodic commit 1000 load csv with headers from 'file://c:/event.csv' as line merge (s:Student {id: line.SID});
I also did the same for room and visits.
However, I have absolutely no idea how to create the relationships needed to answer the above questions. Each event lists the time the student enters and leaves the room, but not the room the student went to next. Starting with the extract, should it be changed so that it contains the room the student left for? If someone could help talk through how I need to think about the relationships that need to be created, that would be very much appreciated.
Cheers
As the popular saying goes, there is more than one way to skin an Ouphe - or thwart a mage. One way you could do it (which makes for the simplest modeling imo) is as follows :
CREATE CONSTRAINT ON (s:Student) ASSERT s.studentID IS UNIQUE;
CREATE CONSTRAINT ON (r:Room) ASSERT r.roomID IS UNIQUE;
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///dorm.csv" as line fieldterminator '|'
MERGE (s:Student {studentID: line.SID})
MERGE (r:Room {roomID: line.ROOM})
CREATE (s)-[:VISIT {starttime: apoc.date.parse(line.ENTERS,'s',"dd/MM/yyyy HH:mm"), endtime: apoc.date.parse(line.LEAVES,'s',"dd/MM/yyyy HH:mm")}]->(r);
// What rooms has student x visited and in what order
MATCH (s:Student {studentID: "2"})-[v:VISIT]->(r:Room)
RETURN r.roomID,v.starttime ORDER BY v.starttime;
// What does the movement of students between rooms look like - which room do students go to when they leave room y
MATCH (s:Student)-[v:VISIT]->(r:Room {roomID: "GREEN"})
WITH s, v
MATCH (s)-[v2:VISIT]->(r2:Room)
WHERE v2 <> v AND v2.starttime >= v.endtime
RETURN s.studentID, r2.roomID, v2.starttime ORDER BY s.studentID, v2.starttime;
So actually you would only have Student and Room as nodes and a student VISITing a room would make up the relationship (with the enter/leave times as properties of that relationship). Now, as you can see that might not be ideal for your second query (although it does work). One alternative is to have a Visit node and chain it (as timeline events) to both Students and Rooms. There's plenty of examples around on how to do that.
Hope this helps,
Tom
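As a possible refinement of the second query, here is a sketch that keeps only the room each student entered directly after leaving GREEN and counts how often each destination occurs. It assumes the VISIT model above. It also assumes that a visit starting at the very moment the previous one ends counts as the next visit, which is why it uses >= and excludes the GREEN visit itself.

```cypher
MATCH (s:Student)-[v:VISIT]->(:Room {roomID: "GREEN"})
MATCH (s)-[v2:VISIT]->(r2:Room)
WHERE v2 <> v AND v2.starttime >= v.endtime
// Sort the later visits so the earliest one comes first per student.
WITH s, v, v2, r2
ORDER BY v2.starttime
// Keep only the first room visited after each GREEN visit.
WITH s, v, head(collect(r2.roomID)) AS nextRoom
RETURN nextRoom, count(*) AS moves
ORDER BY moves DESC;
```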
Good morning,
I want to build a structure in Neo4J where I can handle my users and groups (kind of ACL). The idea is to have for each user and for each group a node with all the details. The groups shall become a graph where a root group will have sub-groups that can have also sub-groups without limit. The relation will be -[:IS_SUBGROUP_OF]- - so far nothing exciting. Every user will be related to a group with -[:IS_MEMBER_OF]- to have a clear assignment. Of course a user can be a member of 1 or more groups. Some users will have a different relation like -[:IS_LEADER_OF]- to identify teamlead of the groups.
My tasks:
Assignment: I can query each member of a group with a simple query, I can even query members of the subgroups using the current logged in and asking user:
MATCH (d1:Group:Local) -- (c:User)
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1)
WHERE c.login = userLogin
RETURN DISTINCT d.lastname, d.firstname
I get every user related to every group of the current user and below (subgroups). Maybe you have a hint on how I can improve the query or the model.
Approval
Here I am stuck, as I want to get all users of the querying user's current group and all members of all subgroups, except the leader of the current group. The reason is that a teamlead shall not be able to approve actions for himself, but should be able to for every other member of his group and all members of subgroups, including their teamleads.
I tried to use the relation -[:IS_LEADER_OF]- to exclude them, but then I also lose the teamleads of the subgroups. Does anyone have an idea how I could either change the model or query the graph to get all users except the teamlead of the current group?
Thanks for your time,
Balael
* EDIT *
I think I am getting close; I just need to understand the results of these two queries:
MATCH (d:User) -- (g:Group) WHERE g.uuid = "xx"
RETURN d.lastname, d.firstname
Returns all users in this group, no matter what the relationship is (leader / member).
MATCH (d:User) -- (g:Group), (g)--(c:User{uuid:"yy"})
RETURN d.lastname, d.firstname
Returns all users of that group except user c. I would have expected c in the list of d users as well, since c is part of that group and should be matched by (d:User).
I do not understand the difference between both queries, maybe someone has a hint for me?
You can simplify your query slightly (however this should not have an impact on performance):
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1:Group:Local)--(c:User{login:"userlogin"})
RETURN DISTINCT d.lastname, d.firstname
I don't completely understand your question, but I assume you want to make sure that d1 and c are not connected by an IS_LEADER_OF relationship. If so, try:
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1:Group:Local)-[r]-(c:User{login:"userlogin"})
WHERE type(r)<>'IS_LEADER_OF'
RETURN DISTINCT d.lastname, d.firstname
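If the goal is instead the one stated in the question (everyone in the team lead's group and its subgroups, except the team lead who is asking), one possible sketch is to start from the asking user and filter that user out. The relationship directions here are an assumption, since the question writes them undirected.

```cypher
// Assumption: relationships point from the user to the group.
MATCH (c:User {login: "userlogin"})-[:IS_LEADER_OF]->(d1:Group:Local)
MATCH (d:User)-[:IS_MEMBER_OF|IS_LEADER_OF]->(:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1)
// Drop only the asking team lead; leads of subgroups are still returned.
WHERE d <> c
RETURN DISTINCT d.lastname, d.firstname;
```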
Following up on * EDIT * in the question:
In a MATCH you specify a path. By definition, a path does not use the same relationship twice; otherwise there would be a danger of running into infinite recursion. Looking at the second query in the "EDIT" section above: the right part matches yy's relationship to the group, whereas the left part matches all users related to this group. To prevent the same relationship from being used twice, the left part cannot match yy.
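A minimal sketch of the usual workaround, relying on the fact that this uniqueness rule applies within a single MATCH clause only: splitting the pattern across two MATCH clauses lets the second clause reuse yy's relationship, so yy shows up among the d users as well.

```cypher
// First clause: find the group(s) that user yy is connected to.
MATCH (g:Group)--(c:User {uuid: "yy"})
// Second clause: relationships matched above may be matched again here.
MATCH (d:User)--(g)
RETURN d.lastname, d.firstname;
```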
I am trying to create a social network-like structure.
I would like to create a timeline of posts which looks like this
(user:Person)-[:POSTED]->(p1:POST)-[:PREV]->(p2:POST)...
My problem is the following.
Assuming a post for a user already exists, I can create a new post by executing the following cypher query
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post:POST)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Assuming, the user has not created a post yet, this query fails. So I tried to somehow include both cases (user has no posts / user has at least one post) in one update query (I would like to insert a new post in the "post timeline")
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN rel IS NOT NULL THEN [rel] ELSE [] END |
DELETE rel
)
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
CREATE (post)-[:PREV]->(o)
)
MERGE (user)-[:POSTED]->(post)
Is there any kind of if-statement (or some type of CREATE IF NOT NULL) to avoid using a FOREACH loop twice? The query looks a little bit complicated, and I know that each loop will run at most once.
However, this was the only solution, I could come up with after studying this SO post. I read in an older post that there is no such thing as an if-statement.
EDIT: The question is: Is it even good to include both cases in one query since I know that the "no-post case" will only occur once and that all other cases are "at least one post"?
Cheers
I've seen a solution to cases like this in some articles. To use a single query for all cases, you could create a special terminating node for the list of posts. A person with no posts would be like:
(:Person)-[:POSTED]->(:PostListEnd)
Now in all cases you can run the query:
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Note that no label is specified for prev_post, so it can match either (:POST) or (:PostListEnd).
After running the query, a person with 1 post will be like:
(:Person)-[:POSTED]->(:POST)-[:PREV]->(:PostListEnd)
Since the PostListEnd node has no info of its own, you can have the same one node for all your users.
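For this to work, every Person needs the initial link to the terminator from the moment it is created. A rough sketch of that setup step, assuming a single shared PostListEnd node and the same id placeholder used in the question:

```cypher
// Reuse the one shared terminator node, creating it on first use.
MERGE (end:PostListEnd)
// A new user starts with an empty post list that points at the terminator.
CREATE (user:Person {id: "#id"})-[:POSTED]->(end);
```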
I also do not see a better solution than using FOREACH.
However, I think I can make your query a bit more efficient. My solution essentially merges the 2 FOREACH tests into 1, since prev_post and rel must either be both NULL or both non-NULL. It also combines the CREATE and the MERGE (which should have been a CREATE, anyway).
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (user)-[:POSTED]->(post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
DELETE rel
CREATE (post)-[:PREV]->(o)
)
The Neo4j v3.2 developer manual describes how you can create what is essentially a composite key made of multiple node properties:
CREATE CONSTRAINT ON (n:Person) ASSERT (n.firstname, n.surname) IS NODE KEY
However, this is only available for the Enterprise Edition, not Community.
"CASE" is as close to an if-statement as you're going to get, I think.
The FOREACH probably isn't so bad given that you're likely limited in scope. But I see no particular downside to separating the query into two, especially to keep it readable and given the operations are fairly small.
Just my two cents.
I'm using Neo4j 2.0.0-M06. Just learning Cypher and reading the docs. In my mind this query would work, but I should be so lucky...
I'm importing tweets into a MySQL database, and from there importing them into Neo4j. If a tweet already exists in the Neo4j database, it should be updated.
My query:
MATCH (y:Tweet:Socialmedia) WHERE
HAS (y.tweet_id) AND y.tweet_id = '123'
CREATE UNIQUE (n:Tweet:Socialmedia {
body : 'This is a tweet', tweet_id : '123', tweet_userid : '321', tweet_username : 'example'
} )
Neo4j says: This pattern is not supported for CREATE UNIQUE
The database currently has no nodes with the matching labels, so there are no tweets whatsoever in the Neo4j database.
What is the correct query?
You want to use MERGE for this query, along with a unique constraint.
CREATE CONSTRAINT on (t:Tweet) ASSERT t.tweet_id IS UNIQUE;
MERGE (t:Tweet {tweet_id:'123'})
ON CREATE
SET t:Socialmedia,
t.body = 'This is a tweet',
t.tweet_userid = '321',
t.tweet_username = 'example';
This will use an index to look up the tweet by id and do nothing if the tweet exists; otherwise it will create the node and set those properties.
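Since the question also asks for existing tweets to be updated, a possible variation is to add an ON MATCH branch. This is a sketch only: it assumes a Neo4j version that supports ON CREATE SET / ON MATCH SET (the syntax changed between the 2.0 milestones), it spells the label Socialmedia as in the question, and the choice of body as the only property refreshed on a match is illustrative.

```cypher
MERGE (t:Tweet {tweet_id: '123'})
// First import of this tweet: set everything.
ON CREATE SET t:Socialmedia,
              t.body = 'This is a tweet',
              t.tweet_userid = '321',
              t.tweet_username = 'example'
// Tweet already imported: refresh only what may have changed.
ON MATCH SET t.body = 'This is a tweet';
```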
I would like to point out that one can use a combination of CREATE CONSTRAINT and then a normal CREATE (without UNIQUE).
This is for cases where one expects a unique node and wants to throw an exception if the node unexpectedly exists. (Far cheaper than looking for the node before creating it).
Also note that MERGE seems to take more CPU cycles than a CREATE. (It also takes more CPU cycles even if an exception is thrown)
An alternative scenario covering CREATE CONSTRAINT, CREATE and MERGE (though admittedly not the primary purpose of this post).