Given the following graph:
(a)<--(b)-->(c)<--(d)-->(e)<--(f)-->(a)
I believe it is (currently) impossible to create a node (g) using the merge clause such that:
(g)-->(a)
(g)-->(c)
(g)-->(e)
The reason being that it requires a comma to describe the above pattern, and the MERGE clause will not accept a comma. e.g. (a)<--(g)-->(c), (g)-->(e)
For ease of reference, see picture below. Given that graph (except node 6), I cannot create node 6 using the MERGE command.
Can someone come up with a way to do this? I believe new functionality needs to be added, but I'd like to be more reasonably sure there's not a viable workaround before heading down that path.
There is no way to do this with a single MERGE in Cypher, or in APOC, right now. That said, there is a workaround. It's a bit manual: you'll need to acquire locks on the nodes in question (we'll use APOC for that), then use OPTIONAL MATCH along with WHERE ... IS NULL to determine whether the center node exists, and create it only when it doesn't.
For this, I'm using the following example graph to mimic yours, before the addition of node 6:
create (zero:Node{name:0})
create (one:Node{name:1})
create (two:Node{name:2})
create (three:Node{name:3})
create (four:Node{name:4})
create (five:Node{name:5})
create (zero)<-[:TYPE]-(one)-[:TYPE]->(two)
create (two)<-[:TYPE]-(three)-[:TYPE]->(four)
create (four)<-[:TYPE]-(five)-[:TYPE]->(zero)
And now, the query to merge:
match (node:Node)
where node.name in [0,2,4]
with collect(node) as nodes
call apoc.lock.nodes(nodes)
with nodes[0] as first, nodes[1] as second, nodes[2] as third
optional match (first)<-[:TYPE]-(center)-[:TYPE]->(second)
where (center)-[:TYPE]->(third)
with first, second, third, center
where center is null
// above 'where' will result in no rows if center exists, preventing creation of duplicate pattern below
create (first)<-[:TYPE]-(newCenter:Node{name:6})-[:TYPE]->(second)
create (newCenter)-[:TYPE]->(third)
I am new to Cypher and I am trying to learn it through a small project I am trying to set up.
I have the following data model so far:
For every Thought created, I connect Tags through Categories.
The Categories only serve as intermediate between the Tags and Thoughts, this is done to improve querying, prevent Tag duplication and reduce relationships between the objects.
To prevent creation of new Tags with the same value, I thought of the following query:
CREATE (t: Thought {moment:timestamp(), message:'Testing new Thought'})
MERGE (t1: Tag{value: 'work'})
MERGE (t2: Tag{value: 'tasks'})
MERGE (t3: Tag{value: 'administration'})
MERGE (c: Category)
MERGE (t1)<-[u:CONSISTS_OF{index:0}]-(c)
MERGE (t2)<-[v:CONSISTS_OF{index:1}]-(c)
MERGE (t3)<-[w:CONSISTS_OF{index:2}]-(c)
MERGE (t)-[x:CATEGORIZED_AS{index: 0}]->(c)
This works fine, except for one thing: the Thought receives a relationship with all created Categories.
This I understand, I define no restrictions in the MERGE query.
However, I do not know how to apply restrictions to the CATEGORIZED_AS relationship?
I tried to add this to the bottom of the query, but that does not work:
WHERE (t)-[x]->(c)
Any idea how to apply a restriction like I need in my case?
EDIT:
I forgot to mention the unique connection of a Category:
A category is connected to a fixed set of Tags in a specific order.
E.g. I have three tags:
work
tasks
administration
The only way the Category matches the Thought is if the Category has the following relationships with the Tags:
work <-[:CONSISTS_OF {index:0}]-(category)
tasks <-[:CONSISTS_OF {index:1}]-(category)
administration <-[:CONSISTS_OF {index:2}]-(category)
Any other order of relationships is invalid and a new Category should be created.
The Problem: Use of MERGE
MERGE will try to find a pattern in the graph. If it finds the pattern it will return it; otherwise it will create the entire pattern. This works individually for each MERGE clause. So it works as expected for the (n:Tag) nodes, since you only want one tag for each word in the graph, but the issue comes later in your query, when you try to merge a category.
What you want to do is try and find this (c:Category) that is connected to these three (t:Tag) nodes with these r.index properties on the relationship (:Tag)-[r:CONSISTS_OF]-(). However, you're running four merge clauses which do the following:
MERGE (c: Category)
Find or create any node c with the label Category.
MERGE (t1)<-[u:CONSISTS_OF{index:0}]-(c)
MERGE (t2)<-[v:CONSISTS_OF{index:1}]-(c)
MERGE (t3)<-[w:CONSISTS_OF{index:2}]-(c)
Find or Create a relationship between that node and t1, then t2, t3 etc.
If you were to run that query, and then change one of the tags to something different like "rest", and run the query again, you'd expect a new category to appear. But it won't with the current query, it'll simply create a new tag, then find the existing (c:Category) node in that first MERGE clause, and create a relationship between it and the new tag. So, rather than having two categories each linked to three tags (with two tags being shared), you'll just have four tags all linked to one category with duplicate indexes on your relationships.
So, what you actually want to do is use MERGE to find the complex pattern like below.
MERGE (t1)<-[:CONSISTS_OF {index:0}]-(c:Category)-[:CONSISTS_OF {index:1}]->(t2),
(t3)<-[:CONSISTS_OF {index:2}]-(c)
Annoyingly, that will give you a syntax error, as Cypher can't currently merge comma-separated patterns like that. So, here comes the creative bit.
Solution 1: Conditional Execution with CASE and FOREACH (Easy)
This is quite a handy go-to for these kinds of situations; see the commented query below. You'll essentially split the merge up, use OPTIONAL MATCH to try to find the pattern, and then use a little trick in Cypher syntax to CREATE the pattern if it turns out not to exist.
CREATE (t: Thought {moment:timestamp(), message:'Testing new Thought'})
MERGE (t1:Tag{value: 'work'})
MERGE (t2:Tag{value: 'abayo'})
MERGE (t3:Tag{value: 'rest'})
WITH *
// we can't merge this category because it's a complex pattern
// so, can we find it in the db?
OPTIONAL MATCH (t1)<-[:CONSISTS_OF {index:0}]-(c:Category)-[:CONSISTS_OF {index:1}]->(t2),
(t3)<-[:CONSISTS_OF {index:2}]-(c)
// the CASE here works in conjunction with the foreach to
// conditionally execute the create clause
// note: CASE c WHEN NULL never matches (c = NULL evaluates to null), so test with IS NULL
WITH t, t1, t2, t3, c, CASE WHEN c IS NULL THEN [1] ELSE [] END AS make_cat
FOREACH (i IN make_cat |
// if no such category exists, this code will run as c is null
// if a category does exist, c will not be null, and so this won't run
CREATE (t1)<-[:CONSISTS_OF {index:0}]-(new_cat:Category)-[:CONSISTS_OF {index:1}]->(t2),
(t3)<-[:CONSISTS_OF {index:2}]-(new_cat)
)
// now we're not sure if we're referring to new_cat or c
// remove variable c from scope
WITH t, t1, t2, t3
// and now match it, we know for sure now we'll find it
// (keep the index properties, so a category holding the same tags in another order can't match)
// alternatively, use conditional execution again here
MATCH (t1)<-[:CONSISTS_OF {index:0}]-(c:Category)-[:CONSISTS_OF {index:1}]->(t2),
(t3)<-[:CONSISTS_OF {index:2}]-(c)
// now we have the category, we definitely want
// to create the relationship between the thought and the category
CREATE (t)-[:CATEGORIZED_AS]->(c)
RETURN *
Solution 2: Refactor Your Graph (Hard)
I haven't included a query here (although I can if requested), but an alternative would be to refactor your graph to attach tags to categories in a ring (or chain, with a final member marker) structure, so that you can merge the pattern straight away without having to split it up.
Since the categories are in an order, you could express the data like the below, in one MERGE clause.
MERGE (c:Category)-[:CONSISTS_OF_TAG_SEQUENCE]->(t1)-[:NEXT_TAG_IN_SEQUENCE]->(t2)-[:NEXT_TAG_IN_SEQUENCE]->(t3)-[:NEXT_TAG_IN_SEQUENCE]->(c)
This might seem like a neat solution at first, but the problem is that tags will belong to multiple categories. If tags are shared between categories, you will need to do at least one of the following:
1) Create a composite index to identify categories and store this as a property of the sequential relationships, so you know which relationships to follow in your path (i.e., so you can always find one, and only one, sequence of tags for a category).
2) Still link each tag to the categories it is in and query on this pattern (to allow you to find that single path like in #1).
3) Use an intermediate node to achieve the same as 1 and 2.
4) All of the above and more.
As you might have guessed, this will make your query much more complicated than it needs to be quite quickly. It could be fun to try, and may suit some use cases, but for the time being I'd stick with the easy solution!
My solution to your problem is to enforce that every Category has a unique, consistently reproducible id. In your case, add a cid or id field, where the value is something along the lines of tag1<_>tag2<_>tag3<_>. (<_> is used because the chances of it being part of a tag are practically zero. If _ is an invalid tag character, replacing <_> with _ will do just fine.)
This way you can lock onto a category node without having to know anything about the nodes it is attached to. Essentially, the unique id IS your merge logic. This can even be dynamically built up in Cypher using reduce. I usually also have a value field as a "pretty print display id value".
When running the final Cypher, you would MERGE each node alone by its instance id, use SET for non-node-defining fields, then use CREATE UNIQUE to make sure there is one and only one relationship between the nodes.
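A minimal sketch of this id-based approach, assuming a property named cid (a hypothetical name, following the suggestion above) and the three tag values from the question. It builds the id with reduce, merges the Category on that id alone, and then merges each tag and its indexed relationship separately. Plain MERGE is used for the relationships here in place of Create Unique; treat it as an illustration rather than a definitive implementation.

```cypher
// Assumption: `cid` is a hypothetical property name for the reproducible category id.
WITH ['work', 'tasks', 'administration'] AS tagValues
// Concatenate the ordered tag values, each followed by the '<_>' separator.
WITH tagValues, reduce(acc = '', v IN tagValues | acc + v + '<_>') AS cid
// The id alone identifies the category, so a single-node MERGE is enough.
MERGE (c:Category {cid: cid})
WITH c, tagValues
UNWIND range(0, size(tagValues) - 1) AS i
MERGE (tag:Tag {value: tagValues[i]})
// One relationship per tag position.
MERGE (c)-[:CONSISTS_OF {index: i}]->(tag)
```

Because the tag order is part of the id, the same tags in a different order produce a different cid and therefore a different Category.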
Hi, how can I translate this SQL query into a Cypher query?
SELECT n.enginetype, n.Rocket20, n.Yearlong, n.DistanceOn
FROM TIMETAB AS n
JOIN PLANEAIR AS p ON (n.tailnum = p.tailNum)
If anything needs to be set up before running that query, such as creating relationships, please show how to do that too. Thanks.
Here's a good guide for comparing SQL with Cypher and showing the equivalent Cypher for some SQL queries.
If we were to translate this directly, we'd use :PLANEAIR and :TIMETAB node labels (though I'd recommend using better names for these), and we'll need a relationship between them. Let's call it :RELATION.
Joins in SQL tend to be replaced with relationships between nodes, so we'll need to create these patterns in your graph:
(:PLANEAIR)-[:RELATION]->(:TIMETAB)
There are several ways to get your data into the graph, usually through LOAD CSV. The general approach is to MERGE your :PLANEAIR and :TIMETAB nodes on some id or unique property (maybe TailNum?), use ON CREATE SET ... after the MERGE to add the rest of the properties to the node when it's created, and then MERGE the relationship between the nodes.
The MERGE section of the developers manual should be helpful here, though I'd recommend reading through the entire dev manual anyway.
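As a rough sketch of that import flow: the file names (planeair.csv, timetab.csv) and the CSV column headers below are assumptions made up for illustration, and tailNum is assumed to be the key property. The two statements are meant to be run separately.

```cypher
// Hypothetical file name and column headers; adjust to the real data.
LOAD CSV WITH HEADERS FROM 'file:///planeair.csv' AS row
MERGE (p:PLANEAIR {tailNum: row.tailNum})
ON CREATE SET p.enginetype = row.enginetype;

// Second statement: create the time-table nodes and link them to their plane.
LOAD CSV WITH HEADERS FROM 'file:///timetab.csv' AS row
MERGE (n:TIMETAB {tailNum: row.tailNum})
ON CREATE SET n.Rocket20 = row.Rocket20,
              n.Yearlong = row.Yearlong,
              n.DistanceOn = row.DistanceOn
WITH n, row
MATCH (p:PLANEAIR {tailNum: row.tailNum})
MERGE (p)-[:RELATION]->(n);
```

Merging on the key property alone and setting everything else in ON CREATE SET keeps a re-run of the import from creating duplicate nodes.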
With this in place, the Cypher equivalent query is:
MATCH (p:PLANEAIR)-[:RELATION]->(n:TIMETAB)
RETURN n.enginetype, n.Rocket20, n.Yearlong, n.DistanceOn
Now this is just a literal translation of your SQL query. You may want to reconsider your model, however, as I'm not sure how much value there is in keeping time-related data for a plane separate from its node. You may just want to have all of the :TIMETAB properties on the :PLANEAIR node and do away with the :TIMETAB nodes completely. Of course your queries and use cases should guide how to model that data best.
EDIT
As far as creating the relationship between :PLANEAIR and :TIMETAB nodes (and again, I recommend using better labels for these, and maybe even keeping all time-related properties on a :Plane node instead of a separate one), provided you already have those nodes created, you'll need to do a joining match. It will help to have unique constraints on :PLANEAIR(tailNum) and :TIMETAB(tailNum) (or indexes, if this isn't supposed to be a unique property):
CREATE CONSTRAINT ON (p:PLANEAIR)
ASSERT p.tailNum IS UNIQUE;
CREATE CONSTRAINT ON (n:TIMETAB)
ASSERT n.tailNum IS UNIQUE;
Now we're ready to create the relationships:
MATCH (p:PLANEAIR)
MATCH (n:TIMETAB)
WHERE p.tailNum = n.tailNum
CREATE (p)-[:RELATION]->(n)
REMOVE n.tailNum
Now that the relationships are created, and :TIMETAB tailNum property removed, we can drop the unique constraint on :TIMETAB(tailNum), since the relationship to :PLANEAIR is all we need.
DROP CONSTRAINT ON (n:TIMETAB)
ASSERT n.tailNum IS UNIQUE
I have a linked list, in neo4j that looks something like this:
CREATE (p:Procedure {id:1})
CREATE (s1:Step {title:"Do Thing 1"})
CREATE (s2:Step {title:"Do Thing 2"})
MERGE (p)-[:FIRST_STEP {parent:[1]}]->(s1)-[:NEXT {parent:[1]}]->(s2)
Now I might create another list that contains this list, and for that to work, I'd either create a separate set of relationships with a new parent value, or I'd add the new parent id to the list of parents, e.g. parent:[1,2].
Now, is it possible to do a match like this:
match (p:Procedure)-[rel:FIRST_STEP|NEXT*]->(steps)
WHERE p.id = 1 and 1 in rel.parent
return p, steps
I can do it if I put the constraint in the initial declaration of the relationship e.g. -[rel:FIRST_STEP|NEXT* {parent:1}]->, but that doesn't allow me to do the "IN" query.
Any thoughts or direction much appreciated.
Are there any expected use cases that will modify the list in some way, such as inserting, rearranging, or removing nodes? And if so, are the changes to one list meant to reflect changes to the other?
If these use cases exist, and if the list changes are meant to stay in sync with each other, single relationships with a list of parent ids makes sense (though the APOC Procedures library contains graph refactoring procedures that could handle either design).
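With the list-of-parent-ids design, the traversal from the question can be filtered per relationship. A variable bound to a variable-length pattern holds a list of relationships, so the IN test has to be applied to each element. A sketch, assuming every relationship on the path carries a parent list as in the question's example data:

```cypher
// rels is the list of relationships on each matched path.
MATCH (p:Procedure {id: 1})-[rels:FIRST_STEP|NEXT*]->(step)
// Keep only paths where every relationship lists 1 among its parents.
WHERE ALL(r IN rels WHERE 1 IN r.parent)
RETURN p, step
```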
If changes to one list aren't meant to reflect in the other list, then separate relationships per parent make the most sense.
Also, as far as I can tell there aren't easy operations to subtract elements from a list (you can use "+" to add an element, but you can't use "-"). I think you'd have to use a filter() to do this, which is a little awkward. It's easier syntactically to delete relationships entirely than to remove elements from lists on relationships, though that probably won't be a driving concern for your design choice.
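A list comprehension is one way to write that filtering step. The sketch below assumes the parent lists from the question and removes the parent id 2 from every relationship in procedure 1's list; the id values are only examples.

```cypher
// Collect each relationship of the list once, then rebuild its parent list without 2.
MATCH (:Procedure {id: 1})-[rels:FIRST_STEP|NEXT*]->(:Step)
UNWIND rels AS r
WITH DISTINCT r
SET r.parent = [x IN r.parent WHERE x <> 2]
```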
I am running neo4j-community-3.0.0-M05.
I am trying out Neo4J Cypher Query Language's MERGE clause. Its explanation is given as follows
It acts like a combination of MATCH or CREATE, which checks for the existence of data first before creating it. With MERGE you define a pattern to be found or created. Usually, as with MATCH you only want to include the key property to look for in your core pattern. MERGE allows you to provide additional properties you want to set ON CREATE.
I already have following node:
(:Movie{title:"Forrest Gump", released:1994})
and now I wanted to add a dummy property addedOn with dummy value 20160108 to it just to try out MERGE clause:
MERGE (a:Movie{title:"Forrest Gump"})
ON CREATE SET a.addedOn= "20160108"
RETURN a;
However, this does not seem to work:
Why is this so?
What you're seeing is precisely the expected behaviour.
Since MERGE finds your pre-existing Forrest Gump, this node is used. The ON CREATE handler will not fire since you didn't create anything.
If you had an ON MATCH handler, it would have fired, since MERGE's match was successful.
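For illustration, a variant of the question's query that sets the property in both cases, using the same dummy value:

```cypher
MERGE (a:Movie {title: "Forrest Gump"})
// Fires only when MERGE had to create the node.
ON CREATE SET a.addedOn = "20160108"
// Fires only when MERGE found an existing node.
ON MATCH SET a.addedOn = "20160108"
RETURN a;
```

When the value is the same either way, a plain SET a.addedOn = "20160108" after the MERGE is shorter and has the same effect.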
I have a simple model of a chess tournament. It has 5 players playing each other. The graph looks like this:
The graph is generally fine, but upon further inspection, you can see that both sets
Guy1 vs Guy2,
and
Guy4 vs Guy5
have a redundant relationship each.
The problem is obviously in the data, where there is an extraneous complementary row for each of these matches (so in a sense this is a data quality issue in the underlying csv):
I could clean these rows by hand, but the real dataset has millions of rows. So I'm wondering how I could remove these relationships in either of 2 ways, using CQL:
1) Don't read in the extra relationship in the first place
2) Go ahead and create the extra relationship, but then remove it later.
Thanks in advance for any advice on this.
The code I'm using is this:
// Here, we load and create nodes
LOAD CSV WITH HEADERS FROM
'file:///.../chess_nodes.csv' AS line
WITH line
MERGE (p:Player {
player_id: line.player_id
})
ON CREATE SET p.name = line.name
ON MATCH SET p.name = line.name
ON CREATE SET p.residence = line.residence
ON MATCH SET p.residence = line.residence
// Here create the edges
LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
WITH line
MATCH (p1:Player {player_id: line.player1_id})
WITH p1, line
OPTIONAL MATCH (p2:Player {player_id: line.player2_id})
WITH p1, p2, line
MERGE (p1)-[:VERSUS]->(p2)
It is obvious that you don't need this extra relationship as it doesn't add any value nor weight to the graph.
There is something that few people are aware of, despite being in the documentation.
MERGE can be used on undirected relationships; Neo4j will pick a direction for you (as relationships MUST be directed in the graph).
Documentation reference : http://neo4j.com/docs/stable/query-merge.html#merge-merge-on-an-undirected-relationship
As an example, say you run the following statement for the first time:
MATCH (a:User {name:'A'}), (b:User {name:'B'})
MERGE (a)-[:VERSUS]-(b)
It will create the relationship as it doesn't exist. However if you run it a second time, nothing will be changed nor created.
I think this would solve your problem, as you won't have to worry about cleaning the data up front or running cleanup scripts on your graph afterwards.
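Applied to the edge import from the question, this could look roughly like the sketch below. It reuses the question's file path (elided there) and column names, and assumes both players already exist, so a plain MATCH replaces the OPTIONAL MATCH:

```cypher
LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
MATCH (p1:Player {player_id: line.player1_id})
MATCH (p2:Player {player_id: line.player2_id})
// Undirected MERGE: a VERSUS relationship in either direction counts as a match,
// so the complementary row does not create a second relationship.
MERGE (p1)-[:VERSUS]-(p2)
```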
I'd suggest creating a "match" node like so
(x:Player)-[:MATCH]->(m:Match)<-[:MATCH]-(y:Player)
to enable tracking details about the match separate from the players.
If you need to track player matchups distinct from the matches themselves, then
(x:Player)-[:HAS_PLAYED]->(pair:HasPlayed)<-[:HAS_PLAYED]-(y:Player)
would do the trick.
If the schema has to stay as-is and the only requirement is to remove redundant relationships, then
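MATCH (p1:Player)-[r1:VERSUS]->(p2:Player)-[r2:VERSUS]->(p1)
WHERE id(p1) < id(p2)
DELETE r2
should do the trick. This finds all p1, p2 nodes with bi-directional VERSUS relationships and removes one of them. The WHERE clause makes each pair match only once; without it, the pattern also matches with p1 and p2 swapped, and both relationships would be deleted.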
You need to use UNWIND to do the trick.
MATCH (p1:Player)-[r:VERSUS]-(p2:Player)
WHERE id(p1) < id(p2)
WITH p1, p2, collect(r) AS rels
UNWIND tail(rels) AS rel
DELETE rel;
The previous code finds the direct connections of type VERSUS between p1 and p2 using MATCH (note that the pattern is undirected, so the WHERE clause keeps each pair from being matched twice). It then collects the relationships for each pair and deletes every relationship except the first, since tail() returns the list without its first element.
Of course you can add a check to see whether the length of the collection is 2.