How to move all neo4j relationships with all their labels and properties from one node to another?

Suppose you've got two nodes that represent the same thing, and you want to merge those two nodes. Both nodes can have any number of relations with other nodes.
The basics are fairly easy, and would look something like this:
MATCH (a), (b) WHERE a.id == b.id
MATCH (b)-[r]->()
CREATE (a)-[s]->()
SET s = PROPERTIES(r)
DELETE DETACH b
Only I can't create a relation without a type. And Cypher doesn't support variable labels either. I'd love to be able to do something like
CREATE (a)-[s:{LABELS(r)}]->(o)
but that doesn't work. To create the relation, you need to know the type of the relation, and in this case I really don't.
Is there a way to dynamically assign types to relationships, or am I going to have to query the types of the old relation, and then string concat new queries with the proper types? That's not impossible, but a lot slower and more complex. And this could potentially match a lot of elements and even more relationships, so having to generate a separate query for every instance is going to slow things down quite a lot.
Or is there a way to change the target of the old relationship? That would probably be the fastest, but I'm not aware of any way to do that.

I think you need to take a look at APOC, especially apoc.create.relationship, which enables creating relationships with a dynamic type.
Adapting your example, you should end up with something along the lines of (not tested):
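MATCH (a), (b)
// = is Cypher's equality operator; id(a) < id(b) stops a node from matching itself
WHERE a.id = b.id AND id(a) < id(b)
// only the outgoing relationships of b are copied here
MATCH (b)-[r]->(n)
CALL apoc.create.relationship(a, type(r), properties(r), n) YIELD rel
DETACH DELETE b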
NB:
- relationships have a TYPE, not a label
- the proper Cypher statement to delete a node together with the relationships attached to it is DETACH DELETE (and not DELETE DETACH)
Related resource: https://markhneedham.com/blog/2016/10/30/neo4j-create-dynamic-relationship-type/
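Note that the pattern (b)-[r]->(n) only covers relationships that start at b. Below is a rough sketch of how the incoming side could be handled as well. It assumes that APOC is installed, that there are exactly two duplicates per id value, and that no relationship runs directly between the two duplicates. Like the query above, it has not been run against real data.

```cypher
// Step 1: copy relationships that START at the duplicate b onto a.
// id(a) < id(b) keeps one node of each pair as the survivor
// and prevents a node from being matched against itself.
MATCH (a), (b)
WHERE a.id = b.id AND id(a) < id(b)
MATCH (b)-[r]->(n)
CALL apoc.create.relationship(a, type(r), properties(r), n) YIELD rel
RETURN count(rel) AS copiedOutgoing;

// Step 2: copy relationships that END at b, keeping their direction.
MATCH (a), (b)
WHERE a.id = b.id AND id(a) < id(b)
MATCH (n)-[r]->(b)
CALL apoc.create.relationship(n, type(r), properties(r), a) YIELD rel
RETURN count(rel) AS copiedIncoming;

// Step 3: remove the duplicates together with their old relationships.
MATCH (a), (b)
WHERE a.id = b.id AND id(a) < id(b)
DETACH DELETE b;
```

Running the three statements separately keeps each one simple. On recent Neo4j versions the id() function is deprecated, so the survivor selection may need adapting there.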

The APOC procedure apoc.refactor.mergeNodes should be very helpful. That procedure is very powerful, and you need to read the documentation to understand how to configure it to do what you want in your specific situation.
Here is a simple example that shows how to use the procedure's default configuration to merge nodes with the same id:
MATCH (node:Foo)
WITH node.id AS id, COLLECT(node) AS nodes
WHERE SIZE(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {}) YIELD node
RETURN node
In this example, I specified an arbitrary Foo label to avoid accidentally merging unwanted nodes. Doing so also helps to speed up the query if you have a lot of nodes with other labels (since they will not need to be scanned for the id property).
The aggregating function COLLECT is used to collect a list of all the nodes with the same id. After checking the size of the list, it is passed to the procedure.
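As an illustration of the configuration map, the variant below passes two options that, to my knowledge, the APOC documentation describes for this procedure (properties and mergeRels). Treat it as a sketch and check the documentation for your APOC version before relying on it. The Foo label is the same arbitrary label as above.

```cypher
MATCH (node:Foo)
WITH node.id AS id, collect(node) AS nodes
WHERE size(nodes) > 1
// properties: 'discard' keeps the first node's value when several nodes set the same key;
// mergeRels: true asks APOC to merge relationships of the same type and direction.
CALL apoc.refactor.mergeNodes(nodes, {properties: 'discard', mergeRels: true}) YIELD node
RETURN node
```

The first node in each collected list is the one that survives, so if it matters which duplicate wins, the list should be ordered before it is passed in.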

Related

Neo4j WHERE NOT clause for different nodes not working

I will explain briefly my query:
match (a)-[:requires]-(b),
(a)-[:instanceOf]->(n)<-[:superclassOf*]-(c:Host_configuration),
(h)-[:instanceOf]->(z)<-[:superclassOf*]-(t:Host)
where not b = h
return distinct a, b
My wish is to return all (a)-[:requires]-(b) patterns where a is somehow a subclass of Host_configuration but b is not a subclass of Host.
This query, however, also returns nodes that actually are subclasses of Host.
EDIT
I don't want to retrieve all a elements connected to b elements that are not tied to Host. I want to retrieve all patterns between a and b that are not like (a)-[:requires]-(h)
Try this query:
match (a)-[:requires]-(b),
(a)-[:instanceOf]->(n)<-[:superclassOf*]-(:Host_configuration)
where not (b)-[:instanceOf]->(z)<-[:superclassOf*]-(:Host)
return distinct a, b
It's possible to directly update the where clause with the path you want to exclude. You can define the where clause to exclude b where it is a subclass of Host.
I believe you should care about the direction of the (a)-[:requires]-(b) pattern. If you do not specify a direction, you would not know who requires whom AND you might also get the same node pair twice (in opposite orders). In my answer, I assume you meant (a)-[:requires]->(b), but you can easily reverse the direction if need be.
For efficiency, you should perform both instanceOf/superclassOf tests using path patterns in a WHERE clause. Such a path pattern just checks for a single match before succeeding, and does not bother to expend the resources to hunt down all possible matches. (By the way, a path pattern in a WHERE clause cannot introduce new variables.)
Once the above issues are taken care of, your MATCH clause would just be MATCH (a)-[:requires]->(b), and any given a/b pair would only be found once (as long as your DB has at most one requires relationship going from a given node to another given node). So that should mean that your RETURN clause can omit the DISTINCT option, which would be more efficient.
So, this may work better for you:
MATCH (a)-[:requires]->(b)
WHERE
(a)-[:instanceOf]->()<-[:superclassOf*]-(:Host_configuration) AND
NOT (b)-[:instanceOf]->()<-[:superclassOf*]-(:Host)
RETURN a, b
By the way, it would also be more efficient for the MATCH clause to specify the node labels for a and b, so that the DB does not have to scan every node in the DB. I have not done that in my answer.

NEO4J - Matching a path where middle node might exist or not

I have the following graph:
I would like to get all contractors and subcontractors and clients, starting from David.
So I thought of a query like this:
MATCH (a:contractor)-[*0..1]->(b)-[w:works_for]->(c:client) return a,b,c
This would return:
(0:contractor {name:"David"}) (0:contractor {name:"David"}) (56:client {name:"Sarah"})
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
That is the desired result. The issue here is performance.
If the DB contains millions of records and I leave (b) without a label, the query will take forever. If I add a label to (b) such as (b:subcontractor) I won't hit millions of rows but I will only get results with subcontractors:
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Is there a more efficient way to do this?
link to graph example: https://console.neo4j.org/r/pry01l
There are some things to consider with your query.
The relationship type is not specified. Is it the case that the only relationships from contractor nodes are works_for and hired? If not, you should constrain the relationship types being matched in your query. For example
MATCH (a:contractor)-[:works_for|hired*0..1]->(b)-[w:works_for]->(c:client)
RETURN a,b,c
The fact that (b) is unlabelled does not mean that every node in the graph will be matched. It will be reached either as a result of traversing the works_for or hired relationships if specified, or any relationship from :contractor, or via the works_for relationship.
If you do want to label it, and you have a hierarchy of types, you can assign multiple labels to nodes and just use the most general one in your query. For example, you could have a label such as ExternalStaff as the generic label, and then further add Contractor or SubContractor to distinguish individual nodes. Then you can do something like
MATCH (a:contractor)-[:works_for|hired*0..1]->(b:ExternalStaff)-[w:works_for]->(c:client)
RETURN a,b,c
It really depends on your use cases.
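If the nodes already exist, the generic label can be added afterwards. A minimal sketch, assuming the existing labels are exactly contractor and subcontractor as in the question, with ExternalStaff being the hypothetical generic label suggested above:

```cypher
// Add the hypothetical generic label to every contractor and subcontractor node.
// The existing labels stay in place; a node can carry several labels at once.
MATCH (n)
WHERE n:contractor OR n:subcontractor
SET n:ExternalStaff;
```

After this, (b:ExternalStaff) matches both David and John in the example graph, while the more specific labels remain available for queries that need them.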

How could i use this SQL on cypher(neo4j)

Hi, how can I transform this SQL query into a Cypher query?
SELECT n.enginetype, n.Rocket20, n.Yearlong, n.DistanceOn
FROM TIMETAB AS n
JOIN PLANEAIR AS p ON (n.tailnum = p.tailNum)
If I need to create any relationships or anything else before I can use that query, please explain how to do that too. Thanks.
Here's a good guide for comparing SQL with Cypher and showing the equivalent Cypher for some SQL queries.
If we were to translate this directly, we'd use :PLANEAIR and :TIMETAB node labels (though I'd recommend using better names for these), and we'll need a relationship between them. Let's call it :RELATION.
Joins in SQL tend to be replaced with relationships between nodes, so we'll need to create these patterns in your graph:
(:PLANEAIR)-[:RELATION]->(:TIMETAB)
There are several ways to get your data into the graph, usually through LOAD CSV. The general approach is to MERGE your :PLANEAIR and :TIMETAB nodes with some id or unique property (maybe tailNum?), use ON CREATE SET ... after the MERGE to add the rest of the properties to the node when it's created, and then MERGE the relationship between the nodes.
The MERGE section of the developers manual should be helpful here, though I'd recommend reading through the entire dev manual anyway.
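The import steps described above could look roughly like this. The file names and CSV column headers are assumptions made up for illustration, since the question does not say how the data is stored. LOAD CSV reads every value as a string, so numeric columns may need converting.

```cypher
// planes.csv is a hypothetical file with the columns tailNum and enginetype.
LOAD CSV WITH HEADERS FROM 'file:///planes.csv' AS row
MERGE (p:PLANEAIR {tailNum: row.tailNum})
ON CREATE SET p.enginetype = row.enginetype;

// timetab.csv is a hypothetical file with the columns
// tailNum, Rocket20, Yearlong and DistanceOn.
LOAD CSV WITH HEADERS FROM 'file:///timetab.csv' AS row
MERGE (n:TIMETAB {tailNum: row.tailNum})
ON CREATE SET n.Rocket20 = row.Rocket20,
              n.Yearlong = row.Yearlong,
              n.DistanceOn = row.DistanceOn
WITH n, row
// Look up the plane this row belongs to and connect the two nodes.
MATCH (p:PLANEAIR {tailNum: row.tailNum})
MERGE (p)-[:RELATION]->(n);
```

Using MERGE for the relationship means the import can be re-run without creating duplicate relationships.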
With this in place, the Cypher equivalent query is:
MATCH (p:PLANEAIR)-[:RELATION]->(n:TIMETAB)
RETURN n.Rocket20, p.enginetype, n.Yearlong, n.DistanceOn
Now this is just a literal translation of your SQL query. You may want to reconsider your model, however, as I'm not sure how much value there is in keeping time-related data for a plane separate from its node. You may just want to have all of the :TIMETAB properties on the :PLANEAIR node and do away with the :TIMETAB nodes completely. Of course your queries and use cases should guide how to model that data best.
EDIT
As far as creating the relationship between :PLANEAIR and :TIMETAB nodes goes (and again, I recommend using better labels for these, and maybe even keeping all time-related properties on a :Plane node instead of a separate one), provided you already have those nodes created, you'll need to do a joining match. It will help to have unique constraints on :PLANEAIR(tailNum) and :TIMETAB(tailNum) (or indexes, if this isn't supposed to be a unique property):
CREATE CONSTRAINT ON (p:PLANEAIR)
ASSERT p.tailNum IS UNIQUE;
CREATE CONSTRAINT ON (n:TIMETAB)
ASSERT n.tailNum IS UNIQUE;
Now we're ready to create the relationships
MATCH (p:PLANEAIR)
MATCH (n:TIMETAB)
WHERE p.tailNum = n.tailNum
CREATE (p)-[:RELATION]->(n)
REMOVE n.tailNum
Now that the relationships are created, and :TIMETAB tailNum property removed, we can drop the unique constraint on :TIMETAB(tailNum), since the relationship to :PLANEAIR is all we need.
DROP CONSTRAINT ON (n:TIMETAB)
ASSERT n.tailNum IS UNIQUE

neo4j and uni-directional relationship

I'm new to neo4j. I've just read some information on this tool, installed it on Ubuntu and made a bunch of queries. And at this moment, I must confess that I really like it. However, there is something (I think very simple and intuitive) which I do not know how to implement. So, I created three nodes like so:
CREATE (n:Object {id:1}) RETURN n
CREATE (n:Object {id:2}) RETURN n
CREATE (n:Object {id:3}) RETURN n
And I created a hierarchical relationship between them:
MATCH (a:Object {id:1}), (b:Object {id:2}) CREATE (a)-[:PARENT]->(b)
MATCH (a:Object {id:2}), (b:Object {id:3}) CREATE (a)-[:PARENT]->(b)
So, I think this simple hierarchy should look like this:
(id:1)
-> (id:2)
-> (id:3)
What I want now is to get a path from any node. For example, if I want to have a path from node (id:2), I will get (id:2) -> (id:3). And if I want to get a path from node (id:1), I will get (id:1)->(id:2)->(id:3). I tried this query:
MATCH (n:Object {id:2})-[*]-(children) return n, children
which I thought should return a path (id:2)->(id:3), but unexpectedly (just for me) it returns (id:1)->(id:2)->(id:3). So, what am I doing wrong and what is the right query to use?
All relationships in neo4j are directed. When you say (n)-[:foo]->(m), that relationship goes only one way, from n to m.
Now what's tricky about this is that you can navigate the relationship both ways. This doesn't make the relationship bi-directional, it never is -- it only means that you can look at it in either direction.
When you write this query: (n:Object {id:2})-[*]-(children) you didn't put an arrow head on that relationship, so children could refer to something either downstream or upstream of the node in question.
In other words, saying (n)-[:test]-(m) is the same thing as matching both (n)<-[:test]-(m) and (n)-[:test]->(m).
So children could refer to the ID 1 object or the ID 3 object.
Returning only children
To directly answer your question,
Your query
MATCH (n:Object {id:2})-[*]-(children) return n, children
matches not only relationships FROM (n {id:2}) TO its children, but also relationships TO (n {id:2}) FROM its parents.
You need to additionally specify the direction that you'd like. This returns the results you expect:
MATCH (n:Object {id:2})-[*]->(children) return n, children
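Since the question asks for a path rather than a list of nodes, here is a small sketch that binds the path to a variable and returns it. It uses the Object label, id property and PARENT type from the question. Restricting the result to leaf nodes is my own addition, there to avoid also getting the shorter intermediate paths.

```cypher
// Follow PARENT relationships downstream only, and keep just the paths
// that end at a node with no further PARENT relationship going out.
MATCH p = (n:Object {id: 1})-[:PARENT*]->(leaf)
WHERE NOT (leaf)-[:PARENT]->()
RETURN p;
```

With the three nodes from the question this yields the single path (id:1)->(id:2)->(id:3), and starting from {id: 2} instead yields (id:2)->(id:3).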
Issues with the example
I'd like to answer your comment about uni-directional and bi-directional relationships, but let's first resolve a couple of issues with the example.
Using correct labels
Let's revisit your example:
(:Object {id:1})-[:PARENT]->(:Object {id:2})-[:PARENT]->(:Object {id:3})
There's no point to using labels like :Object, :Node, :Thing. If you really don't care, don't use a label at all!
In this case, it looks like we're talking about people, although it could easily also be motherboards and daughterboards, or something else!
Let's use People instead of Objects:
(:Person {id:1})-[:PARENT]->(:Person {id:2})-[:PARENT]->(:Person {id:3})
IDs in Neo4j
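Neo4j assigns its own internal ID to every node and relationship. You can retrieve that ID with id(nodeOrRelationship), and look a node up by it with a WHERE clause or, in legacy Cypher, by specifying it as the start point of your match. Note that START n=node(2) MATCH (n)-[*]-(children) return n, children selects the node whose internal ID is 2, so it is equivalent to your original query MATCH (n:Object {id:2})-[*]-(children) return n, children only if that node also happens to carry the property id:2. The internal ID and an id property you set yourself are independent.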
Let's, instead of IDs, store something useful about the nodes, like names:
(:Person {name:'Bob'})-[:PARENT]->(:Person {name:'Mary'})-[:PARENT]->(:Person {name:'Tom'})
Relationship ambiguity
Lastly, let's disambiguate the relationships. Does PARENT mean "is the parent of", or "has this parent"? It might be clear to you which one you meant, but someone unfamiliar with your system might have the opposite interpretation.
I think you meant "is the parent of", so let's make that clear:
(:Person {name:'Bob'})-[:PARENT_OF]->(:Person {name:'Mary'})-[:PARENT_OF]->(:Person {name:'Tom'})
More information about uni-directional and bi-directional relationships in Neo4j
Now that we've taken care of a few basic issues with the example, let's address the directionality of relationships in Neo4j and graphs in general.
There are several ways we could have expressed the relationships in this example. Let's look at a few.
Undirected/bidirectional relationship
Let's abstract the parent relationship that we used above, for the purposes of discussion:
(bob)-[:KIN]-(mary)-[:KIN]-(tom)
Here the relationship KIN indicates that they are related but we don't know exactly who is the parent of whom. Is Tom the child of Mary, or vice-versa?
Notice that I didn't use any arrows. In the graph pseudo-code above, the KIN relationship is a bidirectional or undirected relationship.
Relationships in Neo4j, however, are always directional. If the KIN relationship was really how you wanted to track things, then you'd create a directional relationship, but always ignore the direction in your MATCH queries, e.g. MATCH (a)-[:KIN]-(b) and not MATCH (a)-[:KIN]->(b).
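A small sketch of that idea, using the Person nodes and names from the earlier example: the relationships have to be created with some direction, and the query then simply leaves the arrow head off.

```cypher
// CREATE requires a direction, so one is picked arbitrarily here.
CREATE (bob:Person {name: 'Bob'}),
       (mary:Person {name: 'Mary'}),
       (tom:Person {name: 'Tom'}),
       (bob)-[:KIN]->(mary),
       (mary)-[:KIN]->(tom);

// No arrow head: matches KIN relationships pointing either way,
// so both Bob and Tom come back as Mary's relatives.
MATCH (:Person {name: 'Mary'})-[:KIN]-(relative)
RETURN relative.name;
```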
But is the KIN relationship really the best way to store this information? We can make it more specific. Let's go back to the PARENT_OF relationship that we were using earlier.
Directed/unidirectional relationship
Back to the example. We know that Bob is the parent of Mary who is the parent of Tom:
(bob)-[:PARENT_OF]->(mary)-[:PARENT_OF]->(tom)
Obviously, the corollary of this is:
(bob)<-[:CHILD_OF]-(mary)<-[:CHILD_OF]-(tom)
Or, equivalently:
(tom)-[:CHILD_OF]->(mary)-[:CHILD_OF]->(bob)
So, should we go ahead and create both the PARENT_OF and the CHILD_OF relationships between our (bob), (mary) and (tom) nodes?
The answer is no. We can pick one of those relationships, whichever best models the idea, and still be able to search both ways.
Using only the :PARENT_OF relationship, we can do
MATCH (mary {name:'Mary'})-[:PARENT_OF]->(children) RETURN children
to find the children, or
MATCH (mary {name:'Mary'})<-[:PARENT_OF]-(parents) RETURN parents
to find the parents, using (mary) as the starting point each time.
For more information, see this fantastic article from GraphAware

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind, they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to pass the graph via a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is, that I don't know how to tell Cypher how to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at each outgoing relationship at every single path depth relating to the start node, but as I already explained, there is only one right path. So it should do some kind of a DFS search instead of a BFS search. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use your own unique IDs, and store them in your nodes and use them for your GameIDs; and, if you do this, then you should also create a uniqueness constraint for your own IDs -- this will, as a nice side effect, automatically create an index for your IDs.
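A sketch of what that could look like. The label Position and the property name uid are made up for illustration, and randomUUID() is only available in newer Neo4j versions than the one the START syntax above suggests, so on an older installation the IDs would have to be generated by the application instead.

```cypher
// Give every position node without an ID its own generated one.
// Position and uid are hypothetical names.
MATCH (p:Position)
WHERE p.uid IS NULL
SET p.uid = randomUUID();

// Enforce uniqueness; this also creates an index on uid.
// Recent Neo4j versions spell this CREATE CONSTRAINT FOR ... REQUIRE ... IS UNIQUE.
CREATE CONSTRAINT ON (p:Position) ASSERT p.uid IS UNIQUE;
```

The GameID property on the move relationships would then store the uid of the starting position rather than the value of id(n).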
