How to calculate custom degree based on the node label or other conditions? - neo4j

I have a scenario where I need to calculate a custom degree from the first node (:employee) to other nodes, where the degree should only be incremented when the next node's label is :natural or :relative, but not when it is :legal.
Example:
The thing is, I'm having trouble generating this custom degree property the way I need it.
So far I've tried playing with FOREACH and CASE but had no luck. The closest I got to getting some sort of calculated custom degree is this:
match p = (:employee)-[*5..5]-()
WITH distinct nodes(p) AS nodes
FOREACH(i IN RANGE(0, size(nodes) - 1) |
FOREACH(node IN [nodes[i]] |
SET node.degree = i
))
return *
limit 1
But even this isn't right: despite having 5 distinct nodes, I get SIZE(nodes) = 6, because the :legal node is counted twice for some reason.
Does anyone know how to achieve my goal within a single cypher query?
Also, if you know why the :legal node is counted twice, please let me know. I suspect it is because it has 2 :natural nodes related to it, but I don't know the inner workings that make it appear twice.
More context:
:employee nodes are, well, employees of an organization
:relative nodes are relatives to an employee
:natural nodes are natural persons that may or may not be related to a :legal
:legal nodes are companies (legal persons) that may or may not be related to an :employee, :relative, :natural or another :legal through an IS_PARTNER relationship when, in real life, they are on the board of directors or are shareholders of that company (:legal).
custom degree is what I aim to create; it will define how close one node is to another, given some conditions specific to this project (specified below).
All nodes have a total_contracts property that holds the total amount of money received through contracts.
The objective is to find any employees with relationships to another node that has total_contracts > 0 and is within custom degree <= 3, as employees may be receiving money from external sources when they shouldn't.
The reason this custom degree ignores the distance when it is a :legal node is that we treat companies as being at the same distance as the natural person who is a partner.
In the illustrated example above, the employee has a son, DIEGO, who is a shareholder of a company (ALLURE) and has 2 other business partners there (JOSE and ROSIEL). When I ask what the degree of the son to the employee is, I should get 1, as they are directly related; when I ask what the degree of JOSE to the employee is, I should get 2, as JOSE is related to DIEGO through ALLURE, and we shouldn't increment the custom degree when it is a company, only when it's a person.

The trick with this type of graph is making sure we avoid paths that loop back to the same nodes (which is definitely going to happen quite a lot because you're using multiple relationships between nodes instead of just one; you may want to make sure this is necessary in your model).
The easiest way to do that is via APOC Procedures, as you can adjust the uniqueness of traversals so that nodes are unique in each path.
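To see why the question's 5-hop MATCH produced six node entries with only five distinct nodes, here is a small plain-Python model of the two uniqueness rules. It is not Neo4j code. It mimics Cypher's default within a single MATCH pattern, which never reuses a relationship but may revisit a node, and compares that with node-unique paths (the idea behind APOC's NODE_PATH). The graph is hypothetical: it assumes JOSE and ALLURE are joined by two parallel relationships, which is the kind of multiple-relationship setup that lets a path return to ALLURE.

```python
# Hypothetical graph modelled on the question. Each relationship has an id so that
# parallel relationships between the same two nodes stay distinguishable.
# The second JOSE-ALLURE relationship (id 5) is an assumption for illustration.
RELS = [
    (1, "EMPLOYEE", "DIEGO"),
    (2, "DIEGO", "ALLURE"),
    (3, "ALLURE", "JOSE"),
    (4, "ALLURE", "ROSIEL"),
    (5, "JOSE", "ALLURE"),
]


def neighbours(node):
    """Yield (rel_id, other_node) for every relationship touching node, ignoring direction."""
    for rid, a, b in RELS:
        if a == node:
            yield rid, b
        elif b == node:
            yield rid, a


def paths(start, hops, node_unique):
    """Enumerate the node lists of all paths with exactly `hops` relationships from start.

    A relationship is never reused within one path (Cypher's default for a MATCH pattern).
    If node_unique is True, a node is never reused either (like APOC's NODE_PATH).
    """
    results = []

    def walk(path_nodes, used_rels):
        if len(used_rels) == hops:
            results.append(list(path_nodes))
            return
        for rid, nxt in neighbours(path_nodes[-1]):
            if rid in used_rels:
                continue
            if node_unique and nxt in path_nodes:
                continue
            walk(path_nodes + [nxt], used_rels | {rid})

    walk([start], frozenset())
    return results


print(paths("EMPLOYEE", 5, node_unique=False))
print(paths("EMPLOYEE", 5, node_unique=True))
```

Under relationship uniqueness, every 5-hop path here is EMPLOYEE, DIEGO, ALLURE, JOSE, ALLURE, ROSIEL: six entries, five distinct nodes, and the :legal node appears twice. With node uniqueness, no 5-hop path exists in this small graph.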
So, for example, for a specific start node (let's say the :employee has empId:1, just for the sake of mocking up a lookup of the node), we'll calculate a degree for all nodes within 5 hops of the starting node. The idea here is that we take the length of the path (the number of hops) minus the number of :legal nodes in the path (by filtering the nodes in the path down to just the :legal nodes, then getting the size of that filtered list).
MATCH (e:employee {empId:1})
CALL apoc.path.expandConfig(e, {minLevel:1, maxLevel:5, uniqueness:'NODE_PATH'}) YIELD path
WITH e, last(nodes(path)) as endNode,
length(path) - size([x in nodes(path) WHERE x:legal]) as customDegree
RETURN e, endNode, customDegree
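As a sanity check of the formula, here is a plain-Python model (not Cypher) that applies length(path) minus the number of :legal nodes to every node-unique path of 1 to 5 hops, using the example from the question. The employee's name is not given, so "EMPLOYEE" is a placeholder, and the relationships are assumed from the description. The query above returns one row per path, so a node reachable by several paths can appear several times. This model keeps the lowest degree per end node. In Cypher, aggregating with something like `RETURN endNode, min(customDegree)` is one possible way to do the same.

```python
# Hypothetical data modelled on the question's example: node -> label,
# plus undirected relationships. "EMPLOYEE" is a placeholder name.
LABELS = {
    "EMPLOYEE": "employee",
    "DIEGO": "relative",
    "ALLURE": "legal",
    "JOSE": "natural",
    "ROSIEL": "natural",
}
EDGES = [("EMPLOYEE", "DIEGO"), ("DIEGO", "ALLURE"), ("ALLURE", "JOSE"), ("ALLURE", "ROSIEL")]


def custom_degrees(start, max_hops=5):
    """Return {end_node: lowest custom degree} over node-unique paths of 1..max_hops hops.

    custom degree = hops in the path - number of :legal nodes in the path.
    """
    adj = {}
    for a, b in EDGES:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    best = {}

    def walk(path):
        if len(path) > 1:
            hops = len(path) - 1
            degree = hops - sum(1 for n in path if LABELS[n] == "legal")
            end = path[-1]
            best[end] = min(degree, best.get(end, degree))
        if len(path) - 1 == max_hops:
            return
        for nxt in adj.get(path[-1], []):
            if nxt not in path:  # node-unique paths: no node twice in one path
                walk(path + [nxt])

    walk([start])
    return best


print(custom_degrees("EMPLOYEE"))
```

DIEGO comes out at 1 and JOSE at 2, matching the expectations in the question. ALLURE shares DIEGO's degree because the company is not counted.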

Related

NEO4J - Matching a path where middle node might exist or not

I have the following graph:
I would like to get all contractors, subcontractors and clients, starting from David.
So I thought of a query like this:
MATCH (a:contractor)-[*0..1]->(b)-[w:works_for]->(c:client) return a,b,c
This would return:
(0:contractor {name:"David"}) (0:contractor {name:"David"}) (56:client {name:"Sarah"})
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Which returns the desired result. The issue here is performance.
If the DB contains millions of records and I leave (b) without a label, the query will take forever. If I add a label to (b) such as (b:subcontractor) I won't hit millions of rows but I will only get results with subcontractors:
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Is there a more efficient way to do this?
link to graph example: https://console.neo4j.org/r/pry01l
There are some things to consider with your query.
The relationship type is not specified: is it the case that the only relationships from contractor nodes are works_for and hired? If not, you should constrain the relationship types being matched in your query. For example:
MATCH (a:contractor)-[:works_for|:hired*0..1]->(b)-[w:works_for]->(c:client)
RETURN a,b,c
The fact that (b) is unlabelled does not mean that every node in the graph will be matched. (b) can only be the :contractor itself (in the zero-hop case) or a node reached from it over the works_for or hired relationships (if specified) or over any relationship (if not), and it must also have a works_for relationship to a :client.
If you do want to label it, and you have a hierarchy of types, you can assign multiple labels to nodes and just use the most general one in your query. For example, you could have a label such as ExternalStaff as the generic label, and then further add Contractor or SubContractor to distinguish individual nodes. Then you can do something like
MATCH (a:contractor)-[:works_for|:hired*0..1]->(b:ExternalStaff)-[w:works_for]->(c:client)
RETURN a,b,c
It really depends on your use cases.
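To make the point about the unlabelled (b) concrete, here is a plain-Python model (not Cypher) of the `[:works_for|:hired*0..1]` pattern. The linked console graph is not reproduced, so the edges are a hypothetical reconstruction from the two result rows in the question. In particular, the direction and type of David's `hired` relationship to John are assumed. The model shows that (b) is drawn only from the contractor and its one-hop neighbours, not from the whole graph.

```python
# Hypothetical edges reconstructed from the question's two result rows.
NODES = {"David": "contractor", "John": "subcontractor", "Sarah": "client"}
RELS = [
    ("David", "works_for", "Sarah"),
    ("David", "hired", "John"),
    ("John", "works_for", "Sarah"),
]


def match_rows(first_hop_types):
    """Model (a:contractor)-[:T1|:T2*0..1]->(b)-[:works_for]->(c:client).

    b is either a itself (zero hops) or a node one outgoing hop away over one of
    first_hop_types, so only those candidates are ever examined.
    """
    rows = []
    for a, label in NODES.items():
        if label != "contractor":
            continue
        candidates = [a] + [dst for src, t, dst in RELS if src == a and t in first_hop_types]
        for b in candidates:
            for src, t, c in RELS:
                if src == b and t == "works_for" and NODES[c] == "client":
                    rows.append((a, b, c))
    return rows


print(match_rows({"works_for", "hired"}))
```

Both rows from the question come back: (David, David, Sarah) for the zero-hop case and (David, John, Sarah) via the subcontractor.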

Neo4j: Iterating from leaf to parent AND finding common children

I've migrated my relational database to neo4j and am studying whether I can implement some functionalities before I commit to the new system. I just read two neo4j books, but unfortunately they don't cover two key features I was hoping would be more self-evident. I'd be most grateful for some quick advice on whether these things will be easy to implement or whether I should stick to sql! Thx!
Features I need are:
1) I have run a script to assign :leaf label to all nodes that are leaves in my tree. In paths between a known node and its related leaf nodes, I aim to assign to every node a level property that reflects how many hops that node is from the known node (or leaf node - whatever I can get to work most easily).
I tried:
match path=(n:Leaf)-[:R*]->(:Parent {Parent_ID: $known_value})
with n, length(nodes(path)) as hops
set n.Level2=hops;
and
match path=(n:Leaf)-[:R*]->(:Parent {Parent_ID: $known_value})
with n, path, length(nodes(path)) as hops
foreach (n IN relationships (path) |
set n.Level=hops);
The first assigns property with value of full length of path to only leaf nodes. The second assigns property with value of full length of path to all relationships in path.
Should I be using shortestpath instead, create a bogus property with value =1 for all nodes and iteratively add weight of that property?
2) I need to find the common children for a given parent node. For example, my children each [:like] lots of movies, and I would like to create [:like] relationships from myself to just the movies that my children all like in common (so if 1 of 1 likes a movie, then I like it too, but if only 2 of 3 like a movie, nothing happens).
I found a solution with three paths here:
Need only common nodes across multiple paths - Neo4j Cypher
But I need a solution that works for any number of paths (starting from 1).
3) Then I plan to start at my furthest leaf nodes, create relationships to children's movies, and move level by level toward my known node and repeat create relationships, so that the top-most grandparent likes only the movies that all children [of all children of all children...] like in common and if there's one that everybody agrees on, that's the movie the entire extended family will watch Saturday night.
Can this be done with neo4j and how hard a task is it for someone with rudimentary Cypher? This is mostly how I did it in my relational database / Should I be looking at implementing this totally differently in graph database?
Most grateful for any advice. Thanks!
1.
shortestPath() may help when your already matched start and end nodes are not the root and the leaf, in that it won't continue to look for additional paths once the first is found. If your already matched start and end nodes are the root and the leaf when the graph is a tree structure (acyclic), there's no real reason to use shortestPath().
Typically when setting something like the depth of a node in a tree, you would use length(path), so the root would be at depth 0, its children at depth 1.
Usually depth is calculated with respect to the root node and not leaf nodes (as an intermediate node may be the ancestor of multiple leaf nodes at differing distances). Taking the depth as the distance from the root makes the depths consistent.
Your approach with setting the property on relationships will be a problem, as the same relationship can be present in multiple paths for multiple leaf nodes at varying depths. Your query could overwrite the property on the same relationship over and over until the last write wins. It would be better to match down to all nodes (leave out :Leaf in the query), take the last relationship in the path, and set its depth:
MATCH path=(:Parent {Parent_ID: $known_value})<-[:R*]-()
WITH length(path) as length, last(relationships(path)) as rel
SET rel.Level = length
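Here is a small plain-Python model (not Cypher) of depth measured from the root, on a hypothetical tree whose :R relationships point from child to parent, as in the question. The value the query above writes on each relationship equals the depth of the child at its lower end. Depth from the root gives every node exactly one value. Distance to a leaf does not: in this tree, "b" is 1 hop from leaf "e" but 2 hops from leaf "f".

```python
from collections import deque

# Hypothetical tree: each (child, parent) pair is one :R relationship
# pointing from child to parent.
R = [("b", "root"), ("c", "root"), ("d", "b"), ("e", "b"), ("f", "d")]


def depths(root):
    """Depth of every node below root, measured in hops from the root (root = 0)."""
    children = {}
    for child, parent in R:
        children.setdefault(parent, []).append(child)
    result = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            result[child] = result[node] + 1  # one hop further from the root
            queue.append(child)
    return result


print(depths("root"))
```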
2.
So if all child nodes of a parent in the tree :like a movie then the parent should :like the movie. Something like this should work:
MATCH path=(:Parent {Parent_ID: $known_value})<-[:R*0..]-(n)
WITH n, size((n)<-[:R]-()) as childCount
MATCH (n)<-[:R]-()-[:like]->(m:Movie)
WITH n, childCount, m, count(m) as movieLikes
WHERE childCount = movieLikes
MERGE (n)-[:like]->(m)
The idea here is that for a movie, if the count of that movie node equals the count of the child nodes then all of the children liked the movie (provided that a node can only :like the same movie once).
However, this query can't be used to build up likes from the bottom up: the :like relationships (liking personally, as opposed to liking because all children liked it) would have to be present on all nodes first for this query to work.
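The counting trick can be checked with a plain-Python model (not Cypher) on hypothetical data. For one parent, count how many children like each movie and keep the movies whose count equals the number of children. Sets stand in for the assumption that a node likes a given movie at most once.

```python
# Hypothetical data: parent -> children, and the movies each child already likes.
CHILDREN = {"me": ["kid1", "kid2", "kid3"], "solo_parent": ["only_kid"]}
LIKES = {
    "kid1": {"Up", "Heat", "Alien"},
    "kid2": {"Up", "Heat"},
    "kid3": {"Up", "Alien"},
    "only_kid": {"Heat"},
}


def common_likes(parent):
    """Movies liked by every child of parent: like-count per movie == child count."""
    kids = CHILDREN.get(parent, [])
    counts = {}
    for kid in kids:
        for movie in LIKES.get(kid, set()):
            counts[movie] = counts.get(movie, 0) + 1
    return {movie for movie, n in counts.items() if n == len(kids)}


print(common_likes("me"))
```

With one child ("1 of 1"), every movie that child likes qualifies. With three children, "Heat" and "Alien" are liked by only 2 of 3, so only "Up" remains.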
3.
In order to do a bottom-up approach, you would need to force the query to execute in a particular order, and I believe the best way to do that is to first order the nodes to process in depth order, then use apoc.cypher.doIt(), a proc in APOC Procedures which lets you execute an entire Cypher query per row, to do the calculation.
This approach should work:
MATCH path=(:Parent {Parent_ID: $known_value})<-[:R*0..]-(n)
WHERE NOT n:Leaf // leaves should have :like relationships already created
WITH n, length(path) as depth, size((n)<-[:R]-()) as childCount
ORDER BY depth DESC
CALL apoc.cypher.doIt("
MATCH (n)<-[:R]-()-[:like]->(m:Movie)
WITH n, childCount, m, count(m) as movieLikes
WHERE childCount = movieLikes
MERGE (n)-[:like]->(m)
RETURN count(m) as relsCreated",
{n:n, childCount:childCount}) YIELD value
RETURN sum(value.relsCreated) as relsCreated
That said, I'm not sure this will do what you think it will do. Or rather, it will only work the way you think it will if the only :like relationships to movies are initially set on just the leaf nodes, and (prior to running this propagation query) no other intermediate node in the tree has any :like relationship to a movie.
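The ordering argument can be modelled in plain Python (not Cypher, and not APOC) on a hypothetical three-level tree. Likes start only on the leaves, as the caveat above requires. Internal nodes are processed deepest first, like `ORDER BY depth DESC`, so each child's likes are final before its parent is evaluated. Each parent receives the intersection of its children's likes.

```python
# Hypothetical tree (child -> parent); likes exist only on the leaves at first.
PARENT = {"kid1": "mom", "kid2": "mom", "kid3": "uncle", "mom": "grandma", "uncle": "grandma"}
LEAF_LIKES = {
    "kid1": {"Up", "Heat"},
    "kid2": {"Up", "Heat", "Alien"},
    "kid3": {"Up", "Alien"},
}


def propagate(root):
    """Give each internal node the movies all of its children like, deepest nodes first."""
    children = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)

    def depth(node):
        d = 0
        while node != root:
            node = PARENT[node]
            d += 1
        return d

    likes = {leaf: set(movies) for leaf, movies in LEAF_LIKES.items()}
    for node in sorted(children, key=depth, reverse=True):
        kid_sets = [likes.get(kid, set()) for kid in children[node]]
        likes[node] = set.intersection(*kid_sets)
    return likes


print(propagate("grandma")["grandma"])
```

Here "mom" ends up with {"Up", "Heat"}, "uncle" with {"Up", "Alien"}, and "grandma" with only {"Up"}, the one movie every leaf agrees on.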

Choose a path depending on relationship property on neo4j?

All my nodes are 'Places' with only the 'name' property, and I have different relationships named A, B and C, each one of them has a 'cost' property.
If I am at the first node, connected to the second one, I want to 'take' the relationship with the lower cost.
For example:
MATCH (p1:Place{name: place1})
MATCH (p2:Place{name: place2})
MERGE (p1)-[:A {cost: 10}]->(p2)
MERGE (p1)-[:B {cost: 5}]->(p2)
MERGE (p1)-[:C {cost: 20}]->(p2)
What I want to do is take (in this case) the relationship B.
The cost of each relationship type is always the same (A always costs 10, B always 5), so maybe it isn't even necessary to put a cost property on them.
Is the best solution to do it with a query, or to list the paths and select the best one in Java?
Depending on that, how can I do it, and what would the query be?
There are a few ways you can do this.
For few nodes and few relationships, it should be easy enough to order the relationships by cost and grab the first one:
...
MATCH path=(:Place{name: place1})-[r]-(:Place{name: place2})
WITH path, r
ORDER BY r.cost ASC  // cost must be stored as a number; strings sort lexicographically ("10" < "20" < "5")
LIMIT 1
RETURN path
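The ORDER BY/LIMIT 1 approach amounts to taking the minimum by cost. The plain-Python model below (not Cypher) also shows why the cost type matters. If costs are stored as strings, as in the question's original MERGE statements, string ordering picks "10" before "5", so the wrong relationship wins. The data is the hypothetical A/B/C example from the question.

```python
# The three parallel relationships from the question: (type, cost).
RELS = [("A", 10), ("B", 5), ("C", 20)]


def cheapest(rels):
    """Equivalent of ORDER BY r.cost ASC LIMIT 1 with numeric costs."""
    return min(rels, key=lambda rel: rel[1])


def cheapest_if_costs_were_strings(rels):
    """Same ordering, but comparing costs stored as strings like "10" and "5"."""
    return min(rels, key=lambda rel: str(rel[1]))


print(cheapest(RELS))
print(cheapest_if_costs_were_strings(RELS))
```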
If this is for a more complex operation, such as calculating a path of least cost between nodes, then this turns into a weighted shortest path query, and you might want to look into solutions using Dijkstra's algorithm. APOC Procedures has an implementation you might use.
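For the multi-hop case, here is a minimal plain-Python sketch of Dijkstra's algorithm over a hypothetical multigraph where several typed relationships (A, B, C) can join the same pair of places. It is not APOC's implementation and assumes relationships can be traversed in either direction. The sketch shows that parallel relationships need no special handling: each one is simply another weighted edge, and the cheapest one wins.

```python
import heapq

# Hypothetical multigraph: (from, to, relationship type, cost).
RELS = [
    ("p1", "p2", "A", 10), ("p1", "p2", "B", 5), ("p1", "p2", "C", 20),
    ("p2", "p3", "A", 10), ("p1", "p3", "C", 20),
]


def cheapest_path(start, goal):
    """Dijkstra over undirected weighted relationships.

    Returns (total_cost, [(from, rel_type, to), ...]) or None if goal is unreachable.
    """
    adj = {}
    for a, b, rel_type, cost in RELS:
        adj.setdefault(a, []).append((b, rel_type, cost))
        adj.setdefault(b, []).append((a, rel_type, cost))

    best = {start: 0}
    heap = [(0, start, [])]
    while heap:
        cost, node, steps = heapq.heappop(heap)
        if node == goal:
            return cost, steps
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry: a cheaper route to node was already found
        for nxt, rel_type, rel_cost in adj.get(node, []):
            new_cost = cost + rel_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, steps + [(node, rel_type, nxt)]))
    return None


print(cheapest_path("p1", "p3"))
```

Going from p1 to p3 via B and then A costs 15, which beats the direct C relationship at 20.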

neo4j: from a list of labels, getting which one is a child and which one is a parent

I have a problem in which there are a number of nodes A, B, C, D
where
B-->A
C-->B
D-->B
and the relation between them is children.
Now I want to query Neo4j to find which of a list of labels (B, C, D) exists at the bottom of the graph.
I am making a bot application. In the neo4j database relations would be stored between different terms.
Like :dog-->:animal
:labra-->:dog
:germanShepard-->:dog
Now if a user asks a question like "tell me about dog", I should be able to get the dog label data, and if the user asks "tell me about labra dog", I should be able to get the labra label data. I am breaking the user input into tokens and then trying to find which label is at the bottom.
You can try something like
Match (a:Label) where not (a)<--(:Label) return a
(should work but I didn't test it)
As mentioned in my comment, using a unique label for every single node is going to be costly in the long run, and is going to impact your lookup speed on your queries.
So, if I'm understanding your use case correctly, you're breaking up user input into tokens, and the tokens should match to nodes on the same path in your graph. You want to find the label on the "bottom" of the graph, basically a leaf node, though in your description child nodes point toward their parent. I'll assume it's a :Parent relationship from the child to the parent node.
Here's a query which might do what you want. We'll assume you pass in the list of tokens as a parameter {tokens}. Please review the developer documentation for using parameters.
UNWIND {tokens} as token
MATCH (n)
WHERE token IN labels(n)  // labels(n) is a list, so test membership rather than equality
AND NOT ()-[:Parent]->(n)
RETURN n
This will ensure the nodes you return are not themselves parents of any other node.
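Here is a plain-Python model (not Cypher) of that filter on the question's dog/animal hierarchy, assumed to be stored as child-to-parent :Parent relationships. It keeps the tokens whose node is not the parent of any other node. It also shows the limitation that motivates the next query: a query for "dog" alone returns nothing, because dog is a parent of labra.

```python
# Hypothetical hierarchy from the question, as (child, parent) :Parent relationships.
PARENT_RELS = [("dog", "animal"), ("labra", "dog"), ("germanShepard", "dog")]


def bottom_tokens(tokens):
    """Tokens matching a node that is not the parent of any other node."""
    parents = {parent for _, parent in PARENT_RELS}
    known = parents | {child for child, _ in PARENT_RELS}
    return [t for t in tokens if t in known and t not in parents]


print(bottom_tokens(["labra", "dog"]))
```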
However, if you instead wanted to be able to return nodes even if they were parents of other nodes, then we could return the node that is farthest from the root node. This requires a :Root node at the root of your entire graph. For your example in your description, :Root would be the parent of :animal.
UNWIND {tokens} as token
MATCH (n)
WHERE token IN labels(n)
MATCH (n)-[r:Parent*]->(:Root)
RETURN n
ORDER BY size(r) DESC  // longest path to :Root first, i.e. the farthest node
LIMIT 1
Keep in mind that this query isn't guaranteed to work when there are multiple nodes with the same distance to the :Root. For example, if "germanShepard" and "labra" were given as elements of the tokens list, only one of the corresponding nodes would be returned because of the LIMIT 1, with no guarantee of which node would be returned.
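If ties matter, one option is to keep every token at the maximum distance instead of applying LIMIT 1. Here is a plain-Python model of that variant (not Cypher), using the question's hierarchy with a hypothetical Root placed above "animal". It illustrates the idea and is not a translation of the query above.

```python
# Hypothetical hierarchy: child -> parent, with a Root above "animal".
PARENT = {"animal": "Root", "dog": "animal", "labra": "dog", "germanShepard": "dog"}


def distance_to_root(term):
    """Number of :Parent hops from term up to Root."""
    hops = 0
    while term != "Root":
        term = PARENT[term]
        hops += 1
    return hops


def deepest_tokens(tokens):
    """All known tokens at the greatest distance from Root (ties are kept)."""
    known = [t for t in tokens if t in PARENT]
    if not known:
        return []
    deepest = max(distance_to_root(t) for t in known)
    return [t for t in known if distance_to_root(t) == deepest]


print(deepest_tokens(["labra", "germanShepard"]))
```

Unlike the first query, a lone "dog" now returns dog. Passing both "labra" and "germanShepard" returns both instead of an arbitrary one.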

In Neo4j for every disjoint subgraph return the node with the most relationships

I’m new to Neo4j and graph theory and I’m trying to figure out if I can use Neo4j to solve a problem I have. Please correct me if I’m using the wrong words to describe stuff. Since I’m new to the subject I haven’t really wrapped my head around what to call everything.
I think the easiest way to describe my problem is with a lot of pictures.
Let’s say you have two disjoint subgraphs that look like this.
From the subgraphs above I want to get a list of subgraphs that fulfills one of two criteria.
Criteria 1.
If a node has a unique relationship to another node, the nodes and relationship should be returned as a subgraph.
Criteria 2.
If the relations are not unique, I'd like the node with the most relationships to be returned, as a subgraph with its relationships and related nodes.
If other nodes tie under criteria 2, I want all of their subgraphs to be returned.
Or put in the context of this graph,
Give me the people who have unique games, and if there are other people having the same games, give me back the person with the most games. If they tie, return all people who tie.
Or actually, return the whole subgraph, not only the person.
To clarify what I am after here is a picture that describes the result I want to get. The ordering of the result is not important.
Disjoint subgraph A, because of Criteria 1, Andrew is the only person who has Bubble Bobble.
Disjoint subgraph B, because of Criteria 1, Johan is the only person who has Puzzle Bobble 1.
Disjoint subgraph C, because of Criteria 2, Julia since she has the most games.
Disjoint subgraph D, because of Criteria 2, Anna since she ties with Julia for having the most games.
Worth noting is that Johan's relationship to Puzzle Bobble 2 is not returned, because it's not unique and he does not have the most games.
Is this a problem you could solve with only Neo4j and is it a good idea?
If you could solve it how would you do it in Cypher?
Create script:
CREATE (p1:Person {name:"Johan"}),
(p2:Person {name:"Julia"}),
(p3:Person {name:"Anna"}),
(p4:Person {name:"Andrew"}),
(v1:Videogame {name:"Puzzle Bobble 1"}),
(v2:Videogame {name:"Puzzle Bobble 2"}),
(v3:Videogame {name:"Puzzle Bobble 3"}),
(v4:Videogame {name:"Puzzle Bobble 4"}),
(v5:Videogame {name:"Bubble Bobble"}),
(p1)-[:HAS]->(v1),
(p1)-[:HAS]->(v2),
(p2)-[:HAS]->(v2),
(p2)-[:HAS]->(v3),
(p2)-[:HAS]->(v4),
(p3)-[:HAS]->(v2),
(p3)-[:HAS]->(v3),
(p3)-[:HAS]->(v4),
(p4)-[:HAS]->(v5)
I feel like this solution might not be quite what you're looking for, but it could be a good start:
MATCH (game:Videogame)<-[:HAS]-(owner:Person)
OPTIONAL MATCH (owner)-[:HAS]->(other_game:Videogame)
WITH game, owner, count(other_game) AS other_game_count
ORDER BY other_game_count DESC
RETURN game, collect(owner)[0] AS top_owner
Here the query:
Finds all of the games and their owners (games without owners will not be matched)
Does an OPTIONAL MATCH against any other games those owners might own (by doing an optional match we're saying that it's OK if they own zero)
Passes through each game/owner pair along with a count of the games owned by that owner (the current game included), sorting so that those with the most games come first
RETURN the first owner for each game (the ORDER is preserved when doing the collect)
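To pin down the intended result, including ties, here is a plain-Python model of the two criteria as the question states them, run on the data from the create script. It is not a Cypher query and not a translation of the query above, which keeps only `collect(owner)[0]` and so returns one owner per game even when owners tie. The output matches the four subgraphs described in the question.

```python
# Ownership from the question's create script: person -> games they HAVE.
HAS = {
    "Johan": {"Puzzle Bobble 1", "Puzzle Bobble 2"},
    "Julia": {"Puzzle Bobble 2", "Puzzle Bobble 3", "Puzzle Bobble 4"},
    "Anna": {"Puzzle Bobble 2", "Puzzle Bobble 3", "Puzzle Bobble 4"},
    "Andrew": {"Bubble Bobble"},
}


def select(has):
    """Return {person: games to keep} under the question's two criteria.

    Criterion 1: a game with a single owner is kept for that owner.
    Criterion 2: a shared game is kept for every owner tied for the most games.
    """
    owners = {}
    for person, games in has.items():
        for game in games:
            owners.setdefault(game, []).append(person)

    result = {}
    for game, people in owners.items():
        if len(people) == 1:
            winners = people
        else:
            top = max(len(has[p]) for p in people)
            winners = [p for p in people if len(has[p]) == top]
        for p in winners:
            result.setdefault(p, set()).add(game)
    return result


print(select(HAS))
```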
