Neo4j Cypher - Returning nodes and their nested nodes of the same type

I want to return a list of the Item nodes contained within a Box, each with its list of nested Item nodes. Because the relationships between an Item node and its nested Item nodes may differ (e.g. WHEELS, WINDOWS, LIGHTS), I would like a query that ignores the relationship type and returns every nested Item node together with its Item children. An Item has either at least one Item child or none (resulting in an empty children list).
I want to be able to do this with just a Box identifier (e.g. boxID) being passed.
NOTE: I'm new to Neo4j and Cypher so please reply with a (fairly) detailed answer of how the query works. I want to be able to understand how it works. Thanks!
E.g.
MATCH (iA: Item)-[r]->(iB: Item)-[r]->(b: Box)
WHERE b.boxID = $boxID
RETURN COLLECT(iB.itemID AS ItemID, ib.name as ItemName, COLLECT(iA.itemID as ItemID, iA.name as ItemName, COLLECT(...) ) AS ItemChildren)
The COLLECT(..) part confuses me. How do I return an Item node, all of its Item children, all of each child's Item children, and so on until a node has no children? Is there a better way to MATCH all of the nodes?

That is very easy using a variable-length relationship pattern:
MATCH (b:Box)-[:CONTAINS]->(:ItemInstance)-[*]-(i:Item)
WHERE b.boxID = $boxID
RETURN COLLECT(DISTINCT i) AS ItemChildren
The DISTINCT option is needed because the variable-length relationship result can return the same item multiple times.
This query also acknowledges the relationship directionality shown in your diagram. The CONTAINS relationship pattern specifies the appropriate directionality, but the variable-length relationship (-[*]-) specifies no directionality since your data model does not use a consistent direction throughout the tree starting at an ItemInstance.
Caveat: unbounded variable-length relationships can take a very long time or even run out of memory, depending on how big your DB is and how many relationships each node has. This can be worked around by specifying a reasonable upper bound on the length.
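To make the effect of an upper bound (written in Cypher as something like -[*..5]-) more concrete, here is a minimal sketch in Python over a small in-memory graph. It is not how Neo4j evaluates the pattern; the node names, the adjacency dictionary and the reachable_items function are all made up for illustration. It only shows that a hop limit stops the expansion early and that tracking visited nodes gives the same set of nodes that DISTINCT would.

```python
from collections import deque

def reachable_items(adjacency, labels, start, max_hops):
    """Collect distinct Item nodes within max_hops of start, ignoring direction."""
    seen = {start}
    items = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # upper bound reached: do not expand this node further
        for neighbour in adjacency.get(node, ()):
            if neighbour in seen:
                continue  # each node is reported once, like DISTINCT
            seen.add(neighbour)
            if labels.get(neighbour) == "Item":
                items.append(neighbour)
            queue.append((neighbour, depth + 1))
    return items

# Hypothetical data: every edge is listed in both directions (undirected view).
adjacency = {
    "instance1": ["car"],
    "car": ["instance1", "wheel", "window"],
    "wheel": ["car", "bolt"],
    "window": ["car"],
    "bolt": ["wheel"],
}
labels = {"instance1": "ItemInstance", "car": "Item", "wheel": "Item",
          "window": "Item", "bolt": "Item"}

within_two = reachable_items(adjacency, labels, "instance1", 2)
within_three = reachable_items(adjacency, labels, "instance1", 3)
```

With a bound of 2 the sketch never reaches "bolt", which sits three hops from the starting ItemInstance; raising the bound to 3 includes it.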

Related

NEO4J - Matching a path where middle node might exist or not

I have the following graph:
I would like to get all contractors, subcontractors and clients, starting from David.
So I thought of a query like this:
MATCH (a:contractor)-[*0..1]->(b)-[w:works_for]->(c:client) return a,b,c
This would return:
(0:contractor {name:"David"}) (0:contractor {name:"David"}) (56:client {name:"Sarah"})
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Which returns the desired result. The issue here is performance.
If the DB contains millions of records and I leave (b) without a label, the query will take forever. If I add a label to (b) such as (b:subcontractor) I won't hit millions of rows but I will only get results with subcontractors:
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Is there a more efficient way to do this?
link to graph example: https://console.neo4j.org/r/pry01l
There are some things to consider with your query.
The relationship type is not specified. Is it the case that the only relationships from contractor nodes are works_for and hired? If not, you should constrain the relationship types being matched in your query. For example:
MATCH (a:contractor)-[:works_for|hired*0..1]->(b)-[w:works_for]->(c:client)
RETURN a,b,c
The fact that (b) is unlabelled does not mean that every node in the graph will be matched. (b) is either the :contractor node itself (the zero-length case) or a node one hop away from it, reached via works_for or hired if you specify those types, or via any relationship if you do not. In both cases it must also have an outgoing works_for relationship to a :client.
If you do want to label it, and you have a hierarchy of types, you can assign multiple labels to nodes and just use the most general one in your query. For example, you could have a label such as ExternalStaff as the generic label, and then further add Contractor or SubContractor to distinguish individual nodes. Then you can do something like
MATCH (a:contractor)-[:works_for|hired*0..1]->(b:ExternalStaff)-[w:works_for]->(c:client)
RETURN a,b,c
Depends really on your use cases.

neo4j: CYPHER query all properties of a node

We are evaluating Neo4j for future projects. Currently we are just experimenting with learning Cypher and its capabilities. But one thing that I think should be very straightforward has so far eluded me. I want to be able to see all properties and their values for any given node. In SQL that would be something like:
select * from TableX where ID = 12345;
I have looked through the latest Neo4j docs and run numerous Google searches, but so far I am coming up empty. I did find the keys() function, which returns the property names as a list of strings, but that is marginally useful at best. What I want is a query that will return property names and the corresponding values, like:
name     :  "Lebron"
city     :  "Cleveland"
college  :  "St. Vincent–St. Mary High School"
You may want to reread the Neo4j docs.
Returning the node itself will include the properties map for the node, which is typically the way you will get all properties (keys and values) for the node.
MATCH (n)
WHERE id(n) = 12345
RETURN n
If you explicitly just want the properties but without the metadata related to the node itself, returning properties(n) (assuming n is a node variable) will return the node's properties.
MATCH (n)
WHERE id(n) = 12345
RETURN properties(n) as props
With regard to how columns (variables) work, these are always explicit, so you don't have a way to dynamically get the columns corresponding to the properties of a node. You will instead need to use the approaches above, where the variable corresponds to either a node (where you can get at the properties map through the structure) or a properties map.
The main difference between this approach and select * in SQL is that Neo4j has no table schema, so you can use whatever properties you want on nodes of the same type, and those can differ between nodes of the same type, so there is no common structure to reference which will provide the properties for a node of a given label (you would need to scan all nodes of that label and accumulate the distinct properties to do so).
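The "scan all nodes of that label and accumulate the distinct properties" step can be pictured with a small Python sketch. This is only an illustration over hypothetical in-memory dictionaries, not a Neo4j API; the node layout and the distinct_property_keys name are assumptions made for the example.

```python
def distinct_property_keys(nodes, label):
    """Scan every node carrying the label and accumulate its property names."""
    keys = set()
    for node in nodes:
        if label in node["labels"]:
            keys.update(node["properties"])  # adds the dict's keys
    return sorted(keys)

# Hypothetical nodes: same label, different property sets (no shared schema).
nodes = [
    {"labels": {"Player"}, "properties": {"name": "Lebron", "city": "Cleveland"}},
    {"labels": {"Player"}, "properties": {"name": "Tim", "college": "Wake Forest"}},
    {"labels": {"Team"}, "properties": {"name": "Cavaliers", "founded": 1970}},
]

player_keys = distinct_property_keys(nodes, "Player")
```

Because the two Player nodes carry different properties, no single node tells you the full set; only the scan over all of them does.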

Neo4j ordered tree

We are working with a hierarchy tree structure where a parent has zero or more children, and a child has either one or zero parents. When we query for a list of direct children for a given parent the query returns the children in random order. We need the children to return in the order we define when we create or update the children.
I have added a relationship between children -[:Sibling]-> so the 'top' sibling has only an incoming :Sibling relationship, and the 'bottom' sibling has only an outgoing relationship.
Given this, is there a Cypher query to return the children in sibling order?
I have a query that returns each child, and its sibling, but now I have to write some code to return the list in the correct order.
An alternative approach might be to add a sort number to each child node. This would need to be updated for all the children if one of them changes order. This approach seems slightly foreign to the graph database concept.
If this problem has been encountered before, is there a standard algorithm for solving it programmatically?
Update1
sample data as requested by Bruno
(parent1)
(child1)-[:ChildOf]->(parent1)
(child2)-[:ChildOf]->(parent1) (child2)-[:Sibling]->(child1)
(child3)-[:ChildOf]->(parent1) (child3)-[:Sibling]->(child2)
is there a cypher query to return child1, child2, child3 in that order?
if not, then the ordering can be done programmatically
using properties instead of relationships
(parent1)
(child1)-[:ChildOf]->(parent1) (child1:{order:1})
(child2)-[:ChildOf]->(parent1) (child2:{order:2})
(child3)-[:ChildOf]->(parent1) (child3:{order:3})
`match (c)-[:ChildOf]->(parent1) return c order by c.order`
I do not expect that there is a cypher query that can update the order of children.
Update2
I have now arrived at the following query which returns children in the right order
`match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling`
This query depends on adding a -[:FirstChildOf]->(parent) relationship.
If I don't hear otherwise I'll set this to the answer.
Shall I assume there is no cypher query for inserting a node into an ordered list?
When you create your graph model, you should concentrate on the questions you want your data model to answer.
If you want to get the children of a parent ordered by a "create or update" property, then you should store that property, since a relationship in general does not represent an order.
It is not foreign to the graph database concept, since graph databases use properties. It is not always easy to decide whether to store something as a relationship or a property. It is all about proper modelling.
If you have the concept of '[:NEXT_SIBLING]' or similar, then it will be a pain when you have to remove a node, because the chain has to be re-linked around it. So I would use it only when the order is constant, for example in a time tree, where the days follow each other and never change.
In general, if the creation order should be used, I would use timestamps like this:
create (n:Node {created:timestamp()});
Then you can use the timestamp to order.
match (p:Parent)-[:HAS_CHILDREN]->(n:Child) where p.name='xy' return n order by n.created;
And you can use timestamps for relationships too.
Similar question is here:
Modeling an ordered tree with neo4j
Here are some tips I use for dealing with ordered lists of nodes in Neo4J.
Reverse the relation direction to (child1)<-[:HasChild]-(parent1). (mostly just logical reinforcement of the next items, since a "parent has list of children")
Add the property index to HasChild. This will let you do WITH child, hasChild.index as sid ORDER BY sid ASC for sibling order. (This is how I maintain arbitrary order information on lists of children) This is on the relationship because it assumes that a node can be part of more than one ordered list.
You can use length(shortestPath((root)-[*]->(child1))) AS depth to then order them by depth from a root.
Since this is for arbitrary order, you must update all the indexes to update the order. For a basic insert you can do something like WITH COLLECT(hasChild) AS rels FOREACH (r IN [x IN rels WHERE x.index >= 3] | SET r.index = r.index + 1); otherwise you will have to rewrite them all to change the order arbitrarily.
If you don't actually care about the order, just that it is consistent, you can use WITH child, ID(child) as sid ORDER BY sid ASC. This essentially is "Order by node age"
One other option is to use a meta relationship. So :HasChild would act as a list of nodes, and then something like :NextMember would tell you what the next item from this one is. This is more flexible, but in my opinion harder to work with (you need to handle the case where there is no next item, you have to do a 'trace' over the nodes to get the right order, it doesn't work if you want to add this node to another ordered list later, etc.)
Of course, if the order isn't arbitrary (based on name or age or something), then it is much better to just sort on the non-arbitrary logic.
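The index-shifting insert from the tips above can be sketched in plain Python. This is a simplified model, assuming the index stored on each HasChild relationship is kept in a dictionary keyed by child name; insert_child and the sample names are hypothetical.

```python
def insert_child(indexes, new_child, position):
    """Insert new_child at position, shifting every index >= position up by one.

    indexes maps child name -> index stored on its HasChild relationship.
    Returns the children sorted by their (updated) index.
    """
    for child, index in list(indexes.items()):
        if index >= position:
            indexes[child] = index + 1  # make room for the new child
    indexes[new_child] = position
    return sorted(indexes, key=indexes.get)

indexes = {"child1": 1, "child2": 2, "child3": 3}
ordered = insert_child(indexes, "childX", 2)
```

Every relationship at or after the insert position has to be rewritten, which is the cost of keeping an arbitrary order as a numeric property.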
The query for returning the children in the right order is
match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling
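Cypher does not guarantee row order without an ORDER BY, so it may be safer to make the order explicit, either by ordering on the path length in the query or by walking the chain in client code. The second option is sketched below in Python; the function name and the edge-list format are assumptions for illustration, with each pair mirroring (later)-[:Sibling]->(earlier) from the sample data.

```python
def children_in_order(first_child, sibling_edges):
    """Walk the Sibling chain starting from the first child.

    sibling_edges holds (later, earlier) pairs, one per Sibling relationship.
    """
    next_of = {earlier: later for later, earlier in sibling_edges}
    ordered = [first_child]
    seen = {first_child}
    while ordered[-1] in next_of:
        nxt = next_of[ordered[-1]]
        if nxt in seen:
            raise ValueError("cycle in Sibling chain")
        ordered.append(nxt)
        seen.add(nxt)
    return ordered

# Edges from the sample data, deliberately given out of order.
edges = [("child3", "child2"), ("child2", "child1")]
result = children_in_order("child1", edges)
```

The walk does one dictionary lookup per child, so the edges can arrive from the database in any order.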

Limitation on Selection of nodes using labels

I want to restrict a Cypher query to a single node of a given label without knowing any property values.
Say I have a few nodes labelled Buyer, and I know nothing more than that about the database. I want to see the list of properties for the Buyer nodes, and all Buyer nodes have the same set of properties. So I did this:
My Approach:
MATCH (n:Buyer)
with keys(n) as each_node_keys
UNWIND each_node_keys as all_keys
RETURN DISTINCT(all_keys)
In my approach, the first line of the query, MATCH (n:Buyer), selects all the Buyer nodes; the query then iterates over all of them, collects all the property keys, and only then filters. That is not a good idea.
To avoid this, I want to LIMIT the nodes being selected.
Instead of selecting all the nodes, how can I restrict the query to select only one node? Since I don't know any property values, I cannot filter on a property. Once a node has been picked, no further nodes should be picked. How can I do that?
If, as you said, all Buyer nodes have the same property keys, you can just limit the MATCH to one node:
MATCH (n:Buyer)
WITH n LIMIT 1
RETURN keys(n)

Do labels and properties have an id value in Neo4j?

I know that nodes and relationships have an integral value that identifies them. Is the same true of labels and/or properties? Is it sufficient to identify a node labeling by giving a node id and a string? That is, is it possible to assign the same label to the same node/relationship more than once?
All nodes and relationships have IDs and can be looked up by their IDs. You can do it like this:
MATCH (n) WHERE id(n) = 5 RETURN n;
IDs should be thought of as an internal implementation detail; don't rely on them to have any particular value or to be consistent or ordered, just rely on them to uniquely identify each node.
In general, IMHO it's good practice to assign your own meaningful identifier to nodes, probably indexed, that you can use to find your nodes, in the same way you'd assign a primary key to a relational database record.
It is possible to assign a label to more than one node; labels should be thought of more as classes of nodes, sort of like entities in an ERD. Typically labels would be things like Person, Company, Job, etc. Labels don't have much to do with identifiers though.
Do labels and properties have an id value in Neo4j?
No
Is it sufficient to identify a node labeling by giving a node id and a string?
If by this you mean that you have the node id and a label value in a String variable then yes, it is. It is also sufficient to just use the ID.
MATCH (n) WHERE ID(n) = 1234 RETURN n
Better:
MATCH (n:YourLabel) WHERE ID(n) = 1234 RETURN n
As FrobberOfBits intimated, it is also preferable to ignore (within reason) the internal IDs that Neo attaches to nodes and relationships in your external interactions. This is in part because Neo recycles its internal identifiers (if you create a node that gets assigned ID 1 and then delete it, the next created node could be assigned ID 1), and in part because using them exposes the internals of the system. Instead you should probably attach meaningful identifiers or UUIDs where required.
Using your own identifier:
MATCH (n:YourLabel{uid:1234}) RETURN n
To make lookups fast, index them:
CREATE INDEX ON :YourLabel(uid)
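Why a recycled internal ID is a problem for external references can be shown with a toy model in Python. This is not Neo4j's actual allocator; ToyStore and its methods are invented for illustration and only mimic the "delete a node, the next node may get the same ID" behaviour described above, alongside a UUID stored as a property.

```python
import uuid

class ToyStore:
    """Toy model only: hands freed internal ids back out to new nodes."""

    def __init__(self):
        self._next_id = 0
        self._free = []
        self.nodes = {}

    def create(self, name):
        if self._free:
            internal_id = self._free.pop()  # reuse a freed id
        else:
            internal_id = self._next_id
            self._next_id += 1
        self.nodes[internal_id] = {"name": name, "uid": str(uuid.uuid4())}
        return internal_id

    def delete(self, internal_id):
        del self.nodes[internal_id]
        self._free.append(internal_id)

store = ToyStore()
dave_id = store.create("Dave")
dave_uid = store.nodes[dave_id]["uid"]
store.delete(dave_id)
erin_id = store.create("Erin")
erin_uid = store.nodes[erin_id]["uid"]
```

In this model an external system that remembered Dave's internal ID would now silently resolve it to Erin, while a lookup by Dave's uid would correctly find nothing.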
Is it possible to assign the same label to the same node/relationship more than once?
Nodes can have as many labels as you want to assign them (including none), which is handy when you want to maintain a hierarchy of Node "types". Having a label makes them faster to lookup as Neo has a hint of where to start.
Relationships can only have a single type and that type is immutable. i.e If you want to change from type HAS_A to HAD_A you cannot just change the type, you must delete and re-add the relationship.
A node can be related to another node as many times as you want, using the same relationship type or different relationship types. Performancewise it is better to have different relationship types than to use properties on relationships as the lookup is faster.
CREATE (p:Person{name:"Dave"}), (m:Pet), (d:Pet),
(p)-[:HAS_PET{type:"Cat"}]->(m),
(p)-[:HAS_PET{type:"Dog"}]->(d)
Is fine, as is:
CREATE (p:Person{name:"Dave"}), (m:Pet), (d:Pet),
(p)-[:HAS_CAT]->(m),
(p)-[:HAS_DOG]->(d)
Which is now faster if you wanted to do a query for matching all dogs. If Dave's dog gets reassigned you cannot do:
MATCH (p:Person{name:"Dave"})-[rel:HAS_DOG]->()
SET rel:HAS_CAT
But you could do:
MATCH (p:Person{name:"Dave"})-[rel:HAS_PET{type:"Dog"}]->()
SET rel.type = "Cat"
But I've probably missed the point of the multiple-assignment question.
Not sure what you're aiming for:
Yes, property names, relationship types and labels have internal IDs; they are not stored as strings.
You can identify nodes by label + property + value: exactly one node if you have a uniqueness constraint, otherwise possibly multiple.
You can identify nodes by label (many).
