Limitation on Selection of nodes using labels - neo4j

I want to restrict how many nodes a Cypher query selects when they share a label, without knowing any of their property values.
Let's say I have a few nodes with the label Buyer, and I don't know anything more than that about the database. I want to see the list of properties for the Buyer nodes, and all Buyer nodes have the same set of properties. So I did this:
My Approach:
MATCH (n:Buyer)
WITH keys(n) AS each_node_keys
UNWIND each_node_keys AS all_keys
RETURN DISTINCT all_keys
In my approach I can clearly see that the first line of the query, MATCH (n:Buyer), selects all the nodes, so the query iterates over every node, collects all the properties and only then filters them. That is not a good idea.
To avoid this, I want to LIMIT the nodes being selected. Instead of selecting all the nodes, how can I restrict the query to select only one node? Since I don't know any property values, I cannot filter on a property. Once one node is picked, no further nodes should be picked. How can I do that?

If, as you said, all Buyer nodes have the same property keys, you can just limit the MATCH to one node:
MATCH (n:Buyer)
WITH n LIMIT 1
RETURN keys(n)
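To see why the LIMIT helps, here is a minimal plain-Python model (not Neo4j code) of the two queries. It assumes the query is evaluated lazily, so LIMIT 1 stops pulling from the label scan after the first node, which is generally how Neo4j executes such a query. The data and function names are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterator, List

# Hypothetical stand-in for the nodes a :Buyer label scan would produce.
BUYERS: List[Dict[str, object]] = [
    {"name": "Ann", "city": "Oslo"},
    {"name": "Bob", "city": "Rome"},
    {"name": "Cid", "city": "Lima"},
]


def label_scan(nodes: List[Dict[str, object]], counter: Dict[str, int]) -> Iterator[Dict[str, object]]:
    """Yield nodes one at a time, counting how many are actually read."""
    for node in nodes:
        counter["read"] += 1
        yield node


def keys_all_distinct(nodes):
    """Model of MATCH ... UNWIND keys(n) ... RETURN DISTINCT: reads every node."""
    counter = {"read": 0}
    seen = []
    for node in label_scan(nodes, counter):
        for key in node:
            if key not in seen:
                seen.append(key)
    return seen, counter["read"]


def keys_first_only(nodes):
    """Model of MATCH ... WITH n LIMIT 1 RETURN keys(n): stops after one node."""
    counter = {"read": 0}
    first = next(islice(label_scan(nodes, counter), 1), None)
    return (list(first) if first is not None else []), counter["read"]


print(keys_all_distinct(BUYERS))  # (['name', 'city'], 3)
print(keys_first_only(BUYERS))    # (['name', 'city'], 1)
```

Both return the same keys only because every node has the same keys; if the nodes were heterogeneous, the LIMIT 1 version would silently miss keys.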

Related

NEO4J - Matching a path where middle node might exist or not

I have the following graph:
I would like to get all contractors, subcontractors and clients, starting from David.
So I thought of a query likes this:
MATCH (a:contractor)-[*0..1]->(b)-[w:works_for]->(c:client) return a,b,c
This would return:
(0:contractor {name:"David"}) (0:contractor {name:"David"}) (56:client {name:"Sarah"})
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Which returns the desired result. The issue here is performance.
If the DB contains millions of records and I leave (b) without a label, the query will take forever. If I add a label to (b) such as (b:subcontractor) I won't hit millions of rows but I will only get results with subcontractors:
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Is there a more efficient way to do this?
link to graph example: https://console.neo4j.org/r/pry01l
There are some things to consider with your query.
The relationship type is not specified. Is it the case that the only relationships from contractor nodes are works_for and hired? If not, you should constrain the relationship types being matched in your query. For example:
MATCH (a:contractor)-[:works_for|:hired*0..1]->(b)-[w:works_for]->(c:client)
RETURN a,b,c
The fact that (b) is unlabelled does not mean that every node in the graph will be matched. (b) is either the :contractor node itself (the zero-length case) or a node reached from it by one works_for or hired relationship (or by any relationship, if no types are specified), and it must also have an outgoing works_for relationship to a :client node.
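A small plain-Python model of the pattern makes the point concrete. It is a sketch, not Neo4j code: the graph is a hypothetical reconstruction of the linked console example (the David-hired-John edge is an assumption), and the function names are illustrative. Candidates for b are drawn only from a and its direct neighbours, so a node like Zoe is never visited even though nothing constrains b's label.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical reconstruction of the example graph; the "hired" edge is assumed.
LABELS = {
    "David": "contractor",
    "John": "subcontractor",
    "Sarah": "client",
    "Zoe": "subcontractor",  # works for Sarah but is not reachable from a contractor
}
EDGES: List[Tuple[str, str, str]] = [
    ("David", "hired", "John"),
    ("David", "works_for", "Sarah"),
    ("John", "works_for", "Sarah"),
    ("Zoe", "works_for", "Sarah"),
]


def out_neighbours(node: str, types: Set[str]) -> List[str]:
    """Targets of outgoing relationships from `node` whose type is in `types`."""
    return [dst for src, rel, dst in EDGES if src == node and rel in types]


def match_rows(first_hop: Set[str], b_label: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Model of (a:contractor)-[:works_for|:hired*0..1]->(b)-[:works_for]->(c:client)."""
    rows = []
    for a, label in LABELS.items():
        if label != "contractor":
            continue
        # *0..1 means b is a itself (zero hops) or a direct neighbour of a.
        for b in [a] + out_neighbours(a, first_hop):
            if b_label is not None and LABELS[b] != b_label:
                continue
            for c in out_neighbours(b, {"works_for"}):
                if LABELS[c] == "client":
                    rows.append((a, b, c))
    return rows


print(match_rows({"works_for", "hired"}))
# [('David', 'David', 'Sarah'), ('David', 'John', 'Sarah')]
print(match_rows({"works_for", "hired"}, b_label="subcontractor"))
# [('David', 'John', 'Sarah')]
```

The second call reproduces the trade-off from the question: labelling b as :subcontractor drops the row where b is David himself.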
If you do want to label it, and you have a hierarchy of types, you can assign multiple labels to nodes and just use the most general one in your query. For example, you could have a label such as ExternalStaff as the generic label, and then further add Contractor or SubContractor to distinguish individual nodes. Then you can do something like
MATCH (a:contractor)-[:works_for|:hired*0..1]->(b:ExternalStaff)-[w:works_for]->(c:client)
RETURN a,b,c
It really depends on your use case.

Neo4j ordered tree

We are working with a hierarchy tree structure where a parent has zero or more children, and a child has either one or zero parents. When we query for a list of direct children for a given parent the query returns the children in random order. We need the children to return in the order we define when we create or update the children.
I have added a relationship between children -[:Sibling]-> so the 'top' sibling has only an incoming :Sibling relationship, and the 'bottom' sibling has only an outgoing relationship.
Given this, is there a Cypher query to return the children in sibling order?
I have a query that returns each child, and its sibling, but now I have to write some code to return the list in the correct order.
An alternative approach might be to add a sort number to each child node. This would need to be updated for all the children if one of them changes order. This approach seems slightly foreign to the graph database concept.
If this problem has been encountered before, is there a standard algorithm for solving it programmatically?
Update 1
Sample data, as requested by Bruno:
(parent1)
(child1)-[:ChildOf]->(parent1)
(child2)-[:ChildOf]->(parent1) (child2)-[:Sibling]->(child1)
(child3)-[:ChildOf]->(parent1) (child3)-[:Sibling]->(child2)
Is there a Cypher query to return child1, child2, child3 in that order?
If not, then the ordering can be done programmatically using properties instead of relationships:
(parent1)
(child1)-[:ChildOf]->(parent1) (child1 {order:1})
(child2)-[:ChildOf]->(parent1) (child2 {order:2})
(child3)-[:ChildOf]->(parent1) (child3 {order:3})
`match (c)-[:ChildOf]->(parent1) return c order by c.order`
I do not expect that there is a cypher query that can update the order of children.
Update 2
I have now arrived at the following query which returns children in the right order
`match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling`
This query depends on adding a -[:FirstChildOf]->(parent) relationship.
If I don't hear otherwise I'll set this to the answer.
Shall I assume there is no cypher query for inserting a node into an ordered list?
When you create your graph model, you should concentrate on the questions you want answered with your data model.
If you want to get the children of a parent ordered by a "create or update" property, then you should store that property, since a relationship in general does not represent an order.
It is not foreign to the graph database concept, since graph databases use properties too. It is not always an easy task to decide whether to store something as a relationship or as a property; it is all about proper modelling.
If you have the concept of '[:NEXT_SIBLING]' or similar, it will be a pain when you have to remove a node. So I would only use it when this state is constant, for example in a timetree, where the days follow each other and the order does not change.
In general, if the creation order should be used, I would use timestamps like this:
create (n:Node {created:timestamp()});
Then you can use the timestamp to order.
match (p:Parent)-[:HAS_CHILDREN]->(n:Child) where p.name='xy' return n order by n.created;
And you can use timestamps for relationships too.
A similar question is here:
Modeling an ordered tree with neo4j
Here are some tips I use for dealing with ordered lists of nodes in Neo4J.
Reverse the relation direction to (child1)<-[:HasChild]-(parent1). (mostly just logical reinforcement of the next items, since a "parent has list of children")
Add the property index to HasChild. This will let you do WITH child, hasChild.index as sid ORDER BY sid ASC for sibling order. (This is how I maintain arbitrary order information on lists of children) This is on the relationship because it assumes that a node can be part of more than one ordered list.
You can use LENGTH(shortestPath((root)-[*]->(child1))) AS depth to order them by depth from a root.
Since this is for arbitrary order, you must update the indexes to change the order. For a basic insert at position 3, you can shift the later siblings with something like MATCH (parent)-[r:HasChild]->() WHERE r.index >= 3 SET r.index = r.index + 1 before creating the new relationship with index 3; otherwise you will have to rewrite them all to change the order arbitrarily.
If you don't actually care about the order, just that it is consistent, you can use WITH child, ID(child) AS sid ORDER BY sid ASC. This is roughly "order by node age", although the IDs of deleted nodes can be reused.
One other option is to use a meta relationship. :HasChild would act as a list of nodes, and something like :NextMember would tell you which item comes after this one. This is more flexible, but in my opinion harder to work with (you need to handle the case where a node has no next member, you have to 'trace' through the nodes to get the right order, it doesn't work if you want to add this node to another ordered list later, etc.).
Of course, if the order isn't arbitrary (it is based on name, age or something similar), then it is much better to just sort on that non-arbitrary property.
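The index-on-relationship tip can be sketched in plain Python (a model, not Neo4j code). Each dict stands in for one HasChild relationship with an index property; the helper names and newChild are hypothetical. Inserting at a position means shifting every sibling at or after it by one, which is the SET r.index = r.index + 1 step.

```python
from typing import Dict, List

# Each dict stands in for one (parent)-[:HasChild {index: ...}]->(child) relationship.
has_child: List[Dict] = [
    {"child": "child1", "index": 1},
    {"child": "child2", "index": 2},
    {"child": "child3", "index": 3},
]


def ordered_children(rels: List[Dict]) -> List[str]:
    """Model of: WITH child, hasChild.index AS sid ORDER BY sid ASC."""
    return [r["child"] for r in sorted(rels, key=lambda r: r["index"])]


def insert_child(rels: List[Dict], child: str, position: int) -> None:
    """Shift siblings at or after `position` by one, then add the new relationship."""
    for r in rels:
        if r["index"] >= position:
            r["index"] += 1
    rels.append({"child": child, "index": position})


insert_child(has_child, "newChild", 2)
print(ordered_children(has_child))  # ['child1', 'newChild', 'child2', 'child3']
```

Keeping the index on the relationship rather than the child node is what lets the same child sit at different positions in different parents' lists.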
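The query for returning the children in the right order is below. Cypher does not guarantee row order without ORDER BY, so the rows are ordered by the length of the Sibling path back to the first child:
MATCH (firstChild)-[:FirstChildOf]->(parent) MATCH p = (sibling)-[:Sibling*]->(firstChild) RETURN firstChild, sibling ORDER BY length(p)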
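Here is a plain-Python model (not Neo4j code) of the Sibling-chain approach, using the sample data from the question. Each child's position equals the number of Sibling hops back to the first child, which is why ordering by path length yields sibling order. The insert_after helper and childX are hypothetical; they show that splicing a node into the chain means rewiring relationships rather than updating a number.

```python
from typing import Dict, List

# Sample data from the question: (child)-[:Sibling]->(previous sibling).
SIBLING: Dict[str, str] = {"child2": "child1", "child3": "child2"}
# Extra relationship added in Update 2: (firstChild)-[:FirstChildOf]->(parent).
FIRST_CHILD_OF: Dict[str, str] = {"child1": "parent1"}


def children_in_order(parent: str) -> List[str]:
    """Walk the Sibling chain from the first child; position = path length."""
    first = next(c for c, p in FIRST_CHILD_OF.items() if p == parent)
    # Invert the map so each child points at the sibling that follows it.
    following = {prev: child for child, prev in SIBLING.items()}
    order = [first]
    while order[-1] in following:
        order.append(following[order[-1]])
    return order


def insert_after(prev: str, new: str) -> None:
    """Splice `new` into the chain directly after `prev`."""
    for child, target in list(SIBLING.items()):
        if target == prev:
            SIBLING[child] = new  # the old follower now points at the new node
    SIBLING[new] = prev


print(children_in_order("parent1"))  # ['child1', 'child2', 'child3']
insert_after("child1", "childX")
print(children_in_order("parent1"))  # ['child1', 'childX', 'child2', 'child3']
```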

Neo4j: from a list of labels, getting which one is a child and which one is the parent

I have a problem in which there are a number of nodes A, B, C, D
where
B-->A
C-->B
D-->B
and the relationship between them is children.
Now I want to query Neo4j to find which of a list of labels (B, C, D) exists at the bottom of the graph.
I am making a bot application. In the Neo4j database, relationships will be stored between different terms.
Like :dog-->:animal
:labra-->:dog
:germanShepard-->:dog
Now, if a user asks a question like "tell me about dog", I should be able to get the dog label data, and if the user asks "tell me about labra dog", I should be able to get the labra label data. I am breaking the user input into tokens and then trying to find which label is at the bottom.
You can try something like
Match (a:Label) where not (a)<--(:Label) return a
(should work but I didn't test it)
As mentioned in my comment, using a unique label for every single node is going to be costly in the long run, and is going to impact your lookup speed on your queries.
So, if I'm understanding your use case correctly, you're breaking up user input into tokens, and the tokens should match to nodes on the same path in your graph. You want to find the label on the "bottom" of the graph, basically a leaf node, though in your description child nodes point toward their parent. I'll assume it's a :Parent relationship from the child to the parent node.
Here's a query which might do what you want. We'll assume you pass in the list of tokens as a parameter {tokens}. Please review the developer documentation for using parameters.
UNWIND {tokens} as token
MATCH (n)
WHERE token IN labels(n)
AND NOT ()-[:Parent]->(n)
RETURN n
This will ensure the nodes you return are not themselves parents of any other node.
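A minimal plain-Python model (not Neo4j code) of this first query, using the terms from the question as a child-to-parent map. The function name is hypothetical. A token is kept only if it names a known term and no other term has a Parent relationship pointing at it.

```python
from typing import Dict, List

# (child)-[:Parent]->(parent), taken from the bot example in the question.
PARENT: Dict[str, str] = {
    "dog": "animal",
    "labra": "dog",
    "germanShepard": "dog",
}


def leaf_tokens(tokens: List[str]) -> List[str]:
    """Model of: WHERE token IN labels(n) AND NOT ()-[:Parent]->(n)."""
    parents = set(PARENT.values())
    known = set(PARENT) | parents
    return [t for t in tokens if t in known and t not in parents]


print(leaf_tokens(["dog", "labra"]))  # ['labra']
print(leaf_tokens(["dog"]))           # []
```

The second call shows the limitation addressed next: asking only about "dog" returns nothing, because dog is a parent of other nodes.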
However, if you instead want to be able to return nodes even if they are parents of other nodes, then we could return the node that is farthest from the root node. This requires a :Root node at the root of your entire graph. For the example in your description, :Root would be the parent of :animal.
UNWIND {tokens} as token
MATCH (n)
WHERE token IN labels(n)
MATCH (n)-[r:Parent*]->(:Root)
RETURN n
ORDER BY SIZE(r) DESC
LIMIT 1
Keep in mind that this query isn't guaranteed to work when there are multiple nodes with the same distance to the :Root. For example, if "germanShepard" and "labra" were given as elements of the tokens list, only one of the corresponding nodes would be returned because of the LIMIT 1, with no guarantee of which node would be returned.
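The "farthest from :Root" variant can be modelled the same way in plain Python (a sketch, not Neo4j code; function names are hypothetical). The depth is the number of Parent hops up to Root, and picking the maximum corresponds to ORDER BY SIZE(r) DESC LIMIT 1.

```python
from typing import Dict, List, Optional

# (child)-[:Parent]->(parent), with the added :Root above "animal".
PARENT: Dict[str, str] = {
    "animal": "Root",
    "dog": "animal",
    "labra": "dog",
    "germanShepard": "dog",
}


def hops_to_root(term: str) -> int:
    """Length of the Parent* path from `term` up to Root."""
    hops = 0
    while term != "Root":
        term = PARENT[term]
        hops += 1
    return hops


def deepest_token(tokens: List[str]) -> Optional[str]:
    """Model of ORDER BY SIZE(r) DESC LIMIT 1 over the matched tokens."""
    known = [t for t in tokens if t in PARENT]
    if not known:
        return None
    # Python's max() keeps the first of several equal maxima; Cypher makes no such promise.
    return max(known, key=hops_to_root)


print(deepest_token(["dog", "labra"]))  # labra
```

With ["labra", "germanShepard"] both tokens are three hops from Root, so only one comes back, which is exactly the tie described above.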

Neo4j Cypher: Match and Delete the subgraph based on value of node property

Suppose I have 3 subgraphs in Neo4j and I would like to select and delete a whole subgraph if all the nodes in it match the filtering criteria, that is, each node's value property is <= 1. However, if there is at least one node within the subgraph that does not match the criteria, then the subgraph will not be deleted.
In this case the left subgraph will be deleted but the right subgraph and the middle one will stay. The right one will not be deleted even though it has some nodes with value 1 because there are also nodes with values greater than 1.
userids and values are the node properties.
I will be thankful if anyone can suggest me the cypher query that can be used to do that. Please note that the query will be on the whole graph, that is on all three subgraphs or more if there are anymore.
Thanks for the clarification, that's a tricky requirement, and it's not immediately clear to me what the best approach is that will scale well with large graphs, as most possibilities seem to be expensive full graph operations. We'll likely need to use a few steps to set up the graph for easier querying later. I'm also assuming you mean "disconnected subgraphs", otherwise this answer won't work.
One start might be to label nodes as :Alive or :Dead based upon the property value. It should help if all nodes are of the same label, and if there's an index on the value property for that label, as our match operations could take advantage of the index instead of having to do a full label scan and property comparison.
MATCH (a:MyNode)
WHERE a.value <= 1
SET a:Dead
And separately
MATCH (a:MyNode)
WHERE a.value > 1
SET a:Alive
Then your query to mark nodes to delete would be:
MATCH (a:Dead)
WHERE NOT (a)-[*]-(:Alive)
SET a:ToDelete
And if all looks good with the nodes you've marked for delete, you can run your delete operation, using apoc.periodic.commit() from APOC Procedures to batch the operation if necessary.
MATCH (a:ToDelete)
DETACH DELETE a
If operations on disconnected subgraphs are going to be common, I highly encourage using a special node connected to each subgraph you create (such as a single :Cluster node at the head of the subgraph), so you can begin such operations on :Cluster nodes. That would greatly speed up these kinds of queries, since your query operations would be executed per cluster instead of per :Dead node.
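The deletion rule can be checked with a plain-Python model (not Neo4j code) that works per disconnected subgraph, in the spirit of the per-cluster suggestion. The node ids, values and edges are hypothetical. A :Dead node with no path to an :Alive node is exactly a node whose whole connected component has values <= 1, so the model finds each component once with a breadth-first search and tests it as a unit.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical data: node id -> value property, plus undirected edges.
VALUES: Dict[str, int] = {"a": 1, "b": 0, "c": 1, "d": 1, "e": 3, "f": 2, "g": 1}
EDGES: List[Tuple[str, str]] = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "g")]


def components(values: Dict[str, int], edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Split the graph into disconnected subgraphs with a breadth-first search."""
    adj = defaultdict(set)
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    seen: Set[str] = set()
    result = []
    for start in values:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result.append(comp)
    return result


def nodes_to_delete(values: Dict[str, int], edges: List[Tuple[str, str]]) -> Set[str]:
    """Delete a subgraph only if every node in it has value <= 1 (no :Alive node)."""
    doomed: Set[str] = set()
    for comp in components(values, edges):
        if all(values[n] <= 1 for n in comp):
            doomed |= comp
    return doomed


print(sorted(nodes_to_delete(VALUES, EDGES)))  # ['a', 'b', 'c']
```

Each node is visited once here, whereas the per-:Dead-node query may explore the same subgraph again for every :Dead node in it.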

Do labels and properties have an id value in Neo4j?

I know that nodes and relationships have an integral value that identifies them. Is the same true of labels and/or properties? Is it sufficient to identify a node labeling by giving a node id and a string? That is, is it possible to assign the same label to the same node/relationship more than once?
All nodes and relationships have IDs and can be looked up by their IDs. You can do it like this:
MATCH (n) WHERE id(n) = 5 RETURN n;
IDs should be thought of as an internal implementation detail; don't rely on them to have any particular value or to be consistent or ordered, just rely on them to uniquely identify each node.
In general, IMHO it's good practice to assign your own meaningful identifier to nodes, probably indexed, that you can use to find your nodes, in the same way you'd assign a primary key to a relational database record.
It is possible to assign a label to more than one node; labels should be thought of more as classes of nodes, sort of like entities in an ERD. Typically labels would be things like Person, Company, Job, etc. Labels don't have much to do with identifiers though.
Do labels and properties have an id value in Neo4j?
No
Is it sufficient to identify a node labeling by giving a node id and a string?
If by this you mean that you have the node id and a label value in a String variable then yes, it is. It is also sufficient to just use the ID.
MATCH (n) WHERE ID(n) = 1234 RETURN n
Better:
MATCH (n:YourLabel) WHERE ID(n) = 1234 RETURN n
As FrobberOfBits intimated, it is also preferable to ignore (within reason) the internal IDs that Neo attaches to nodes/relationships in your external interactions. This is partly because Neo recycles its internal identifiers (so if you create a node, it gets assigned ID 1; if you then delete that node, the next created node could be assigned ID 1), and partly to avoid exposing the internals of the system. Instead, you should attach meaningful identifiers or UUIDs where required.
Using your own identifier:
MATCH (n:YourLabel{uid:1234}) RETURN n
To make lookups fast, index them:
CREATE INDEX ON :YourLabel(uid)
Is it possible to assign the same label to the same node/relationship more than once?
Nodes can have as many labels as you want to assign them (including none), which is handy when you want to maintain a hierarchy of Node "types". Having a label makes them faster to lookup as Neo has a hint of where to start.
Relationships can only have a single type, and that type is immutable, i.e. if you want to change from type HAS_A to HAD_A you cannot just change the type; you must delete and re-add the relationship.
A node can be related to another node as many times as you want, using the same relationship type or different relationship types. Performance-wise, it is better to have different relationship types than to use properties on relationships, as the lookup is faster.
CREATE (p:Person{name:"Dave"}), (m:Pet), (d:Pet),
(p)-[:HAS_PET{type:"Cat"}]->(m),
(p)-[:HAS_PET{type:"Dog"}]->(d)
Is fine, as is:
CREATE (p:Person{name:"Dave"}), (m:Pet), (d:Pet),
(p)-[:HAS_CAT]->(m),
(p)-[:HAS_DOG]->(d)
Which is now faster if you wanted to do a query for matching all dogs. If Dave's dog gets reassigned you cannot do:
MATCH (p:Person{name:"Dave"})-[rel:HAS_DOG]->()
SET rel:HAS_CAT
But you could do:
MATCH (p:Person{name:"Dave"})-[rel:HAS_PET{type:"Dog"}]->()
SET rel.type = "Cat"
But I've probably missed the point of the multiple-assignment question.
Not sure what you're aiming for:
Yes: property names, relationship types and labels have internal IDs; they are not stored as strings.
You can identify nodes by label + property + value: exactly one node if you have a uniqueness constraint, otherwise possibly multiple.
You can identify nodes by label alone (possibly many).
