Choose a path depending on a relationship property in Neo4j?

All my nodes are 'Places' with only the 'name' property, and I have different relationships named A, B and C, each one of them has a 'cost' property.
If I am at the first node, connected to the second one, I want to 'take' the relationship with the lowest cost.
For example:
MATCH (p1:Place {name: $place1})
MATCH (p2:Place {name: $place2})
MERGE (p1)-[:A {cost: 10}]->(p2)
MERGE (p1)-[:B {cost: 5}]->(p2)
MERGE (p1)-[:C {cost: 20}]->(p2)
What I want to do is take (in this case) the relationship B.
The cost of each relationship type is always the same (A always costs 10, and B always 5), so maybe it is not even necessary to store the cost property at all.
Is the best solution to do this with a query, or to list the paths and select the best one in Java?
Depending on that, how can I do it, and what would the query be?

There are a few ways you can do this.
For a few nodes and relationships, it should be easy enough to order the relationships by cost and grab the first one:
...
MATCH path=(:Place {name: $place1})-[r]-(:Place {name: $place2})
WITH path, r
ORDER BY r.cost ASC
LIMIT 1
RETURN path
If this is for a more complex operation, such as calculating a path of least cost between nodes, then this turns into a weighted shortest path query, and you might want to look into solutions using Dijkstra's algorithm. APOC Procedures has an implementation you might use.
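For illustration, a weighted shortest path call via APOC's Dijkstra procedure might look roughly like this (an untested sketch: it assumes the cost property holds a number rather than a string, and that A, B and C are the relationship types you want to traverse):
MATCH (p1:Place {name: $place1})
MATCH (p2:Place {name: $place2})
// 'A>|B>|C>' limits the traversal to outgoing A, B and C relationships;
// 'cost' is the relationship property used as the weight
CALL apoc.algo.dijkstra(p1, p2, 'A>|B>|C>', 'cost') YIELD path, weight
RETURN path, weight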

Related

How to move all neo4j relationships with all their labels and properties from one node to another?

Suppose you've got two nodes that represent the same thing, and you want to merge those two nodes. Both nodes can have any number of relations with other nodes.
The basics are fairly easy, and would look something like this:
MATCH (a), (b) WHERE a.id = b.id
MATCH (b)-[r]->(o)
CREATE (a)-[s]->(o)
SET s = PROPERTIES(r)
DELETE DETACH b
Only I can't create a relation without a type. And Cypher doesn't support variable labels either. I'd love to be able to do something like
CREATE (a)-[s:{LABELS(r)}]->(o)
but that doesn't work. To create the relation, you need to know the type of the relation, and in this case I really don't.
Is there a way to dynamically assign types to relationships, or am I going to have to query the types of the old relation, and then string concat new queries with the proper types? That's not impossible, but a lot slower and more complex. And this could potentially match a lot of elements and even more relationships, so having to generate a separate query for every instance is going to slow things down quite a lot.
Or is there a way to change the target of the old relationship? That would probably be the fastest, but I'm not aware of any way to do that.
I think you need to take a look at APOC, especially apoc.create.relationship, which enables creating relationships with a dynamic type.
Adapting your example, you should end up with something along the lines of (not tested):
MATCH (a), (b) WHERE a.id = b.id AND a <> b
MATCH (b)-[r]->(n)
CALL apoc.create.relationship(a, type(r), properties(r), n) YIELD rel
DETACH DELETE b
NB:
relationships have a TYPE, not a label
the proper Cypher statement to delete a node together with its attached relationships is DETACH DELETE (and not DELETE DETACH)
Related resource: https://markhneedham.com/blog/2016/10/30/neo4j-create-dynamic-relationship-type/
The APOC procedure apoc.refactor.mergeNodes should be very helpful. That procedure is very powerful, and you need to read the documentation to understand how to configure it to do what you want in your specific situation.
Here is a simple example that shows how to use the procedure's default configuration to merge nodes with the same id:
MATCH (node:Foo)
WITH node.id AS id, COLLECT(node) AS nodes
WHERE SIZE(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {}) YIELD node
RETURN node
In this example, I specified an arbitrary Foo label to avoid accidentally merging unwanted nodes. Doing so also helps to speed up the query if you have a lot of nodes with other labels (since they will not need to be scanned for the id property).
The aggregating function COLLECT is used to collect a list of all the nodes with the same id. After checking the size of the list, it is passed to the procedure.
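If you also want the relationships of the merged nodes carried over, and conflicting property values kept rather than discarded, that is controlled through the config map. A rough sketch based on the documented options (check the APOC docs for the exact behaviour in your version):
MATCH (node:Foo)
WITH node.id AS id, COLLECT(node) AS nodes
WHERE SIZE(nodes) > 1
// 'combine' keeps differing property values as a list instead of discarding them;
// mergeRels moves the relationships of the merged nodes onto the surviving node
CALL apoc.refactor.mergeNodes(nodes, {properties: 'combine', mergeRels: true}) YIELD node
RETURN node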

Neo4j: Iterating from leaf to parent AND finding common children

I've migrated my relational database to neo4j and am studying whether I can implement some functionalities before I commit to the new system. I just read two neo4j books, but unfortunately they don't cover two key features I was hoping would be more self-evident. I'd be most grateful for some quick advice on whether these things will be easy to implement or whether I should stick to sql! Thx!
Features I need are:
1) I have run a script to assign a :Leaf label to all nodes that are leaves in my tree. In paths between a known node and its related leaf nodes, I aim to assign to every node a level property that reflects how many hops that node is from the known node (or from the leaf node, whichever I can get to work most easily).
I tried:
match path=(n:Leaf)-[:R*]->(:Parent {Parent_ID: $known_value})
with n, length(nodes(path)) as hops
set n.Level2=hops;
and
match path=(n:Leaf)-[:R*]->(:Parent {Parent_ID: $known_value})
with n, path, length(nodes(path)) as hops
foreach (rel IN relationships(path) |
set rel.Level=hops);
The first assigns property with value of full length of path to only leaf nodes. The second assigns property with value of full length of path to all relationships in path.
Should I be using shortestPath instead, or create a bogus property with value = 1 for all nodes and iteratively add up the weight of that property?
2) I need to find the common children for a given parent node. For example, my children each [:like] lots of movies, and I would like to create [:like] relationships from myself to just the movies that my children all like in common (so if 1 of 1 likes a movie, then I like it too, but if only 2 of 3 like a movie, nothing happens).
I found a solution with three paths here:
Need only common nodes across multiple paths - Neo4j Cypher
But I need a solution that works for any number of paths (starting from 1).
3) Then I plan to start at my furthest leaf nodes, create relationships to children's movies, and move level by level toward my known node and repeat create relationships, so that the top-most grandparent likes only the movies that all children [of all children of all children...] like in common and if there's one that everybody agrees on, that's the movie the entire extended family will watch Saturday night.
Can this be done with neo4j, and how hard a task is it for someone with rudimentary Cypher? This is mostly how I did it in my relational database. Should I be looking at implementing this totally differently in a graph database?
Most grateful for any advice. Thanks!
1.
shortestPath() may help when your already-matched start and end nodes are not the root and the leaf, in that it won't continue to look for additional paths once the first is found. If your already-matched start and end nodes are the root and a leaf, and the graph is a tree structure (acyclic), there's no real reason to use shortestPath().
Typically when setting something like the depth of a node in a tree, you would use length(path), so the root would be at depth 0, its children at depth 1.
Usually depth is calculated with respect to the root node and not leaf nodes (as an intermediate node may be the ancestor of multiple leaf nodes at differing distances). Taking the depth as the distance from the root makes the depths consistent.
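So if the goal is a per-node depth property, a sketch along these lines should do it (reusing the :Parent/:R structure from your queries, and assuming the tree is acyclic so each node is reached by a single path from the root):
MATCH path=(:Parent {Parent_ID: $known_value})<-[:R*0..]-(n)
// the known root node gets Level 0, its children Level 1, and so on
SET n.Level = length(path)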
Your approach with setting the property on relationships will be a problem, as the same relationship can be present in multiple paths for multiple leaf nodes at varying depths. Your query could overwrite the property on the same relationship over and over until the last write wins. It would be better to match down to all nodes (leave out :Leaf in the query), take the last relationship in the path, and set its depth:
MATCH path=(:Parent {Parent_ID: $known_value})<-[:R*]-()
WITH length(path) as length, last(relationships(path)) as rel
SET rel.Level = length
2.
So if all child nodes of a parent in the tree :like a movie then the parent should :like the movie. Something like this should work:
MATCH path=(:Parent {Parent_ID: $known_value})<-[:R*0..]-(n)
WITH n, size((n)<-[:R]-()) as childCount
MATCH (n)<-[:R]-()-[:like]->(m:Movie)
WITH n, childCount, m, count(m) as movieLikes
WHERE childCount = movieLikes
MERGE (n)-[:like]->(m)
The idea here is that for a movie, if the count of that movie node equals the count of the child nodes then all of the children liked the movie (provided that a node can only :like the same movie once).
This query can't be used to build up likes from the bottom up however, the like relationships (liking personally, as opposed to liking because all children liked it) would have to be present on all nodes first for this query to work.
3.
In order to do a bottom-up approach, you would need to force the query to execute in a particular order. I believe the best way to do that is to first order the nodes to process by depth, then use apoc.cypher.doIt(), a procedure in APOC Procedures which lets you execute an entire Cypher query per row, to do the calculation.
This approach should work:
MATCH path=(:Parent {Parent_ID: $known_value})<-[:R*0..]-(n)
WHERE NOT n:Leaf // leaves should have :like relationships already created
WITH n, length(path) as depth, size((n)<-[:R]-()) as childCount
ORDER BY depth DESC
CALL apoc.cypher.doIt("
MATCH (n)<-[:R]-()-[:like]->(m:Movie)
WITH n, childCount, m, count(m) as movieLikes
WHERE childCount = movieLikes
MERGE (n)-[:like]->(m)
RETURN count(m) as relsCreated",
{n:n, childCount:childCount}) YIELD value
RETURN sum(value.relsCreated) as relsCreated
That said, I'm not sure this will do what you think it will do. Or rather, it will only work the way you think it will if the only :like relationships to movies are initially set on just the leaf nodes, and (prior to running this propagation query) no other intermediate node in the tree has any :like relationship to a movie.

How to calculate custom degree based on the node label or other conditions?

I have a scenario where I need to calculate a custom degree from a starting node (:employee), where the degree should only be incremented when the next node's label is :natural or :relative, but not when it is :legal.
Example: [graph image; the structure is described under "More context" below]
The thing is I'm having trouble generating this custom degree property as I needed it.
So far I've tried playing with FOREACH and CASE but had no luck. The closest I got to getting some sort of calculated custom degree is this:
match p = (:employee)-[*5..5]-()
WITH distinct nodes(p) AS nodes
FOREACH(i IN RANGE(0, size(nodes)) |
FOREACH(node IN [nodes[i]] |
SET node.degree = i
))
return *
limit 1
But even this isn't right, as despite having 5 distinct nodes, I get SIZE(nodes) = 6, as the :legal node is accounted for twice for some reason.
Does anyone know how to achieve my goal within a single cypher query?
Also, if you know why the :legal node is accounted for twice, please let me know. I suspect it is because it has 2 :natural nodes related to it, but I don't know the inner workings that make it appear twice.
More context:
:employee nodes are, well, employees of an organization
:relative nodes are relatives to an employee
:natural nodes are natural persons that may or may not be related to a :legal
:legal nodes are companies (legal persons) that may or may not be related to an :employee, :relative, :natural or another :legal via an IS_PARTNER relationship when, in real life, they are part of the board of directors or are shareholders of that company (:legal).
custom degree is what I aim to create; it will define how close one node is to another given some conditions specific to this project (specified below).
All nodes have a total_contracts property, which is the total amount of money received through contracts.
The objective is to find any employees with relationships to another node that has total_contracts > 0 and are up to custom degree <= 3, as employees may be receiving money from external sources, when they shouldn't.
As for why I need this custom degree to ignore the distance when it is a :legal node: we treat a company as being at the same distance as the natural person who is its partner.
In the illustrated example above, the employee has a son, DIEGO, who is a shareholder of a company (ALLURE) and has 2 other business partners (JOSE and ROSIEL). When I ask what's the degree of the son to the employee, I should get 1, as they are directly related; when I ask what's the degree of JOSE to the employee, I should get 2, as JOSE is related to DIEGO through ALLURE and we shouldn't increment the custom degree when it is a company, only when it is a person.
The trick with this type of graph is making sure we avoid paths that loop back to the same nodes (which is definitely going to happen quite a lot because you're using multiple relationships between nodes instead of just one...you may want to make sure this is necessary in your model).
The easiest way to do that is via APOC Procedures, as you can adjust the uniqueness of traversals so that nodes are unique in each path.
So for example, for a specific start node (let's say the :employee has empId:1, just for the sake of mocking up a lookup of the node), we'll calculate a degree for all nodes within 5 hops of the starting node. The idea here is that we'll take the length of the path (the number of hops) minus the number of :legal nodes in the path (obtained by filtering the nodes in the path for just :legal nodes, then getting the size of that filtered list).
MATCH (e:employee {empId:1})
CALL apoc.path.expandConfig(e, {minLevel:1, maxLevel:5, uniqueness:'NODE_PATH'}) YIELD path
WITH e, last(nodes(path)) as endNode,
length(path) - size([x in nodes(path) WHERE x:legal]) as customDegree
RETURN e, endNode, customDegree
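From there, your stated objective (employees connected within a custom degree of 3 to nodes that have received contract money) should just be a matter of filtering on the computed value. A rough, untested extension of the same query (still using the mocked empId lookup, and assuming total_contracts is numeric):
MATCH (e:employee {empId:1})
CALL apoc.path.expandConfig(e, {minLevel:1, maxLevel:5, uniqueness:'NODE_PATH'}) YIELD path
WITH e, last(nodes(path)) as other,
length(path) - size([x in nodes(path) WHERE x:legal]) as customDegree
// keep only nodes close enough (by custom degree) that have received contract money
WHERE customDegree <= 3 AND other.total_contracts > 0
RETURN e, other, min(customDegree) as customDegree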

Neo4j and Cypher - How can I create/merge chained sequential node relationships (and even better time-series)?

To keep things simple, as part of the ETL on my time-series data, I added a sequence number property to each row corresponding to 0..370365 (370,366 nodes, 5,555,490 properties - not that big). I later added a second property, naming the original "outeseq" and the second "ineseq", to see if an outright equivalence to base the relationship on might speed things up a bit.
I can get both of the following queries to run properly on up to ~30k nodes (LIMIT 30000) but past that, it's just an endless wait. My JVM has 16g max (if it can even use it on a Windows box):
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq=b.outeseq-1
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
or
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq=b.ineseq
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
I also added these in hopes of speeding things up:
CREATE CONSTRAINT ON (a:BOOK)
ASSERT a.outeseq IS UNIQUE;
CREATE CONSTRAINT ON (b:BOOK)
ASSERT b.ineseq IS UNIQUE
I can't get the relationships created for the entire data set! Help!
Alternatively, I can also get bits of the relationships built with parameters, but haven't figured out how to parameterize the sequence over all of the node-to-node sequential relationships, at least not in a semantically general enough way to do this.
I profiled the query, but didn't see any reason for it to "blow up".
Another question: I would like each relationship to have a property to represent the difference in the timestamps of each node, or delta-t. Is there a way to take the difference between the two values in two sequential nodes and assign it to the relationship, for all of the relationships at the same time?
The last Q, if you have the time - I'd really like to use the raw data and just chain the directed relationships from one node's timestamp to the next nearest node with the minimum delta, but I didn't attempt this for fear that it would cause scanning of all the nodes in order to build each relationship.
Before anyone suggests that I look to KDB or other db's for time series, let me say I have a very specific reason to want to use a DAG representation.
It seems like this should be so easy...it probably is and I'm blind. Thanks!
Creating Relationships
Since your queries work on 30k nodes, I'd suggest running them page by page over all the nodes. This seems feasible because outeseq and ineseq are unique and numeric, so you can sort nodes by those properties and run the query against one slice at a time.
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq = b.outeseq-1
WITH a, b ORDER BY a.outeseq SKIP {offset} LIMIT 30000
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
You will need to run the query about 13 times, changing {offset} each time to cover all the data. It would be easy to write a script in any language which has a Neo4j client.
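If you have APOC Procedures installed, another option is to let apoc.periodic.iterate handle the batching in a single statement instead of a manual SKIP/LIMIT loop. A rough, untested sketch:
CALL apoc.periodic.iterate(
"MATCH (a:BOOK) RETURN a",
"MATCH (b:BOOK) WHERE b.outeseq = a.outeseq + 1 MERGE (a)-[:FORWARD_SEQ]->(b)",
// commit every 10k source nodes; keep it single-threaded to avoid lock contention
{batchSize: 10000, parallel: false})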
Updating Relationship's Properties
You can assign the timestamp delta to relationships using a SET clause following the MATCH. Assuming the timestamp is a long:
MATCH (a:BOOK)-[s:FORWARD_SEQ]->(b:BOOK)
SET s.delta = abs(b.timestamp - a.timestamp);
Chaining Nodes With Minimal Delta
When relationships have the delta property, the graph becomes a weighted graph. So we can apply this approach to calculate the shortest path using deltas. Then we just save the length of the shortest path (the sum of deltas) into a relationship between the first and the last node.
MATCH p=(a:BOOK)-[:FORWARD_SEQ*1..]->(b:BOOK)
WITH p AS shortestPath, a, b,
reduce(weight=0, r in relationships(p) | weight+r.delta) AS totalDelta
ORDER BY totalDelta ASC
LIMIT 1
MERGE (a)-[nearest:NEAREST {delta: totalDelta}]->(b)
RETURN nearest;
Disclaimer: the queries above are not guaranteed to work as-is; they just hint at possible approaches to the problem.

How to find distinct nodes in a Neo4j/Cypher query

I'm trying to do some pattern matching in neo4j/cypher and I came across this issue:
There are two types of graphs I want to search for:
Star graphs: A graph with one center node and multiple outgoing relationships.
n-length line graphs: A line graph with length n where none of the nodes are repeats (I have some bidirectional edges and cycles in my graph)
So the main problem is that when I do something such as:
MATCH (a)-->(b), (a)-->(c), (a)-->(d)
MATCH (a)-->(b)-->(c)-->(d)
Cypher doesn't guarantee (when I tried it) that a, b, c, and d are all different nodes. For small graphs, this can easily be fixed with
WHERE not(a=b) AND not(a=c) AND ...
But I'm trying to have graphs of size 10+, so checking equality between all pairs of nodes isn't a viable option. AFAIK, RETURN DISTINCT does not work either, since it doesn't check equality among variables, only across different rows. Is there any simple way I can specify the query to make the differently named nodes distinct?
Old question, but look to APOC Path Expander procedures for how to address these kinds of use cases, as you can change the traversal uniqueness behavior for expansion (the same way you can when using the traversal API...which these procedures use).
Cypher implicitly uses RELATIONSHIP_PATH uniqueness, meaning that per path returned, a relationship must be unique, it cannot be used multiple times in a single path.
While this is good for queries where you need all possible paths, it's not a good fit for queries where you want distinct nodes or a subgraph or to prevent repeating nodes in a path.
For an n-length path, let's say depth 6 with only outgoing relationships of any type, we can change the uniqueness to NODE_PATH, where a node must be unique per path, no repeats in a path:
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {maxLevel:6, uniqueness:'NODE_PATH'}) YIELD path
RETURN path
If you want all reachable nodes up to a certain depth (or at any depth by omitting maxLevel), you can use NODE_GLOBAL uniqueness, or instead just use apoc.path.subgraphNodes():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphNodes(n, {maxLevel:6}) YIELD node
RETURN node
NODE_GLOBAL uniqueness means that a node must be unique across all paths: it will only be visited once, and there will only be one path to a node from a given start node. This keeps the number of paths that need to be evaluated down significantly, but because of this behavior not all relationships will be traversed, since expansion stops at nodes that have already been visited.
You will not get relationships back with this procedure (you can use apoc.path.spanningTree() for that, although as previously mentioned not all relationships will be included, as we will only capture a single path to each node, not all possible paths to nodes). If you want all nodes up to a max level and all possible relationships between those nodes, then use apoc.path.subgraphAll():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphAll(n, {maxLevel:6}) YIELD nodes, relationships
RETURN nodes, relationships
Richer options exist for label and relationship filtering, or filtering (whitelist, blacklist, endnode, terminator node) based on lists of pre-matched nodes.
We also support repeating sequences of relationships or node labels.
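As a rough illustration of the filtering config (untested; the KNOWS and LIKES relationship types and the Person and Banned labels are made up for the example):
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {
maxLevel: 6,
uniqueness: 'NODE_PATH',
relationshipFilter: 'KNOWS>|LIKES', // only traverse outgoing KNOWS or any-direction LIKES relationships
labelFilter: '+Person|-Banned' // whitelist Person nodes, blacklist Banned nodes
}) YIELD path
RETURN path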
If you need filtering by node or relationship properties during expansion, then this won't be a good option, as that feature is not yet supported.
