Cypher query shortest path - neo4j

I build a graphe this way: the nodes represents: busStops, and the relationship represent the bus line linking bus stops each others.
The relationship type correspond to the time needed to go from a node two another one.
When I'm querying the graph (thanks to cypher) to get the shortestPath between two which are maybe not linked, the result is the one where the number of relations used is the smallest.
I would to change that in order that the shortest path corresponds to the path where the addition of all relationship types used between two nodes(which correspond to the time) is the smallest?

first, you are doing it wrong. don't use a unique relationship type for each time. use one relationship type and then put a property "time" on all relations.
second, to calculate the addition you can use this cypher formula:
START from=node({busStopId1}), to=node({busStopId2})
MATCH p=from-[:LINE*]-to //asterix * means any distance
RETURN p,reduce(total = 0, r in relationships(p): total + r.time) as tt
ORDER by tt asc;

Related

Choose a path depending on relationship property on neo4j?

All my nodes are 'Places' with only the 'name' property, and I have different relationships named A, B and C, each one of them has a 'cost' property.
If I am at the first node, connected to the second one, I want to 'take' the relationship with the lower cost.
For example:
MATCH (p1:Place{name: place1})
MATCH (p2:Place{name: place2})
MERGE (place1)-[:A{cost: "10"}]->(place2)
MERGE (place1)-[:B{cost: "5"}]->(place2)
MERGE (place1)-[:C{cost: "20"}]->(place2)
What Ii want to do, is take (in this case) the relationship B
The costs of the relationships are always the same for the name each one of them (A always costs 10, and B always 5) so maybe it will not be necessary to put the cost property to it).
the best solution is to do it with a query or list the paths and select the best one with java?
Depending on that, how can I do it? and what it would be the query?
There's a few ways you can do this.
For few nodes and few relationships, it should be easy enough to order the relationships by cost and grab the first one:
...
MATCH path=(:Place{name: place1})-[r]-(:Place{name: place2})
WITH path, r
ORDER BY r.cost ASC
LIMIT 1
RETURN path
If this is for a more complex operation, such as calculating a path of least cost between nodes, then this turns into a weighted shortest path query, and you might want to look into solutions using Dijkstra's algorithm. APOC Procedures has an implementation you might use.

How to cluster nodes together in Neo4j

My graph is 1M nodes. The data model is intentionally simple. There are Entities and IDType nodes. A single Entity may have 1:many IDType nodes. And an IDType node may be connected to 1:many Entities. This forms the graph.
The goal is to find all clusters of IDType's and Entities that are connected together into what I call a cluster of nodes (subgraph I guess some call it). Imagine if we had 1M nodes. I would like to find "clusters" like this in the graph data, I'm trying to figure out how to do that. I've written the cypher query that I believe does it, but it's not clear to me if it's doing what is intended.
The question: how do I efficiently traverse my graph and cluster together nodes so that there is a single row or group of rows that I can return as a row-based result set to my python driver program to then operate over that cluster. While this doesn't need to be the exact structure of my result, this is a sense of what I'm looking for.
cluster|nodes
1|2,3,4,5,6,7
2|10,11,12,13
3|15,17,19,20,21,25,27,28,33
Where the "cluster" is some arbitrary clustering of the list of nodes (frankly if I have a single line that's just a collection of clusters or some other way of telling they are all related, then I'm golden). The "nodes" number represents a unique integer-based property that we tag to every Entity node.
The query is below. The concept is that an "Entity" node can have 1 or many "ID" nodes and I'm trying to get all "Entity" and "ID" that are related to each other through the relationship "HAS_ID".
Conceptually, if there is a relationship that exists in the data like this Entity1-->ID1<--Entity2-->ID2<--Entity3-->ID3<--Entity4-->ID4<--Entity5 then I want to "cluster" them together so that I can create a unique number that represents this group of nodes. With my example, there are 5 entities, but there could just as easily be 2 entities, or 50 entities, which are all related to one another, that's why I'm thinking the variable length path is what I need.
The below is my attempt to do this in the graph. But 1) is it correct? 2) is it efficient because it seems to runs indefinitely 3) how do i best "group" these together?
match
(n:Entity)-[e1:HAS_ID*]-(o)
where n.key <> o.key
return *
limit 10
;
I've also tried
match (n:Entity)-[e1:HAS_ID*]-(o)
where n.key <> o.key
with distinct n.key as key_1, o.key as key_2
return key_1, collect(key_2)
limit 100
;
This seems to do close to what I want, but I'm still not getting a single group for a given key, in other words, I can have 5 rows returned but they are all still related, which I'd rather have 1 row in that case... He's an example, you can see that key "49518" is on the first and second row, I'd rather have one row that grouped them all together.
49518 [49004, 49871, 49940, 50525, 49101, 49625, 50165, 50017, 49098, 50383]
49940 [49088, 49706, 50292, 50470, 49140, 49258, 49216, 49559, 50004, 50346, 49237, 49518, 49894, 49101, 49625, 50165, 50017, 49098, 50383]
Well, for one, your query doesn't match the relationship pattern you described.
Each of your arrows in your pattern is a [:HAS_ID] relationship, so if entities and IDs are always alternating between each relationship, then your current query would only match patterns like this:
(:Entity)-[:HAS_ID]->(:ID)<-[:HAS_ID]-(:Entity)-[:HAS_ID]->(:ID)<-[:HAS_ID]-(:Entity)
3 entities, 2 IDs, 4 relationships. That doesn't match your example pattern of 5 entities, 4 IDs, and 8 relationships. So at the very least, you'll want to alter your pattern to use *8.
As for efficiency...the thing you're trying to do seems rather inefficient, as it must attempt to find this pattern on every single :Entity node in your graph, trying every single :HAS_ID relationship it finds. If your entire graph is made of this same pattern of :Entity and :ID and :HAS_ID, then your query is going to be traversing your entire graph, not once but multiple times.
You are going to get duplicate results. Even if we assume that your entire graph is made up of isolated 5 entity / 4 ID / 8 relationship chains like a snake, as in your example (an entity either being at the end of the chain with one link to an ID, or somewhere in the middle with links to 2 IDs), then you'll be getting 2 matches for that same group of nodes, one matching from one end of the chain, the other matching the other end. And that's the simple case...I'm guessing your graph could be much more complex than this, allowing even more possibilities for many different patterns to match on the exact same group of nodes. A unique path using your pattern does not equate to a unique grouping of nodes.
At the very least, you'll probably want to match on a pattern and use RETURN DISTINCT NODES(p) to enforce unique sets of nodes, but I still think the matching may take quite a bit of time.

Neo4j and Cypher - How can I create/merge chained sequential node relationships (and even better time-series)?

To keep things simple, as part of the ETL on my time-series data, I added a sequence number property to each row corresponding to 0..370365 (370,366 nodes, 5,555,490 properties - not that big). I later added a second property and named it "outeseq" (original) and "ineseq" (second) to see if an outright equivalence to base the relationship on might speed things up a bit.
I can get both of the following queries to run properly on up to ~30k nodes (LIMIT 30000) but past that, its just an endless wait. My JVM has 16g max (if it can even use it on a windows box):
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq=b.outeseq-1
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
or
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq=b.ineseq
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
I also added these in hopes of speeding things up:
CREATE CONSTRAINT ON (a:BOOK)
ASSERT a.outeseq IS UNIQUE
CREATE CONSTRAINT ON (b:BOOK)
ASSERT b.ineseq IS UNIQUE
I can't get the relationships created for the entire data set! Help!
Alternatively, I can also get bits of the relationships built with parameters, but haven't figured out how to parameterize the sequence over all of the node-to-node sequential relationships, at least not in a semantically general enough way to do this.
I profiled the query, but did't see any reason for it to "blow-up".
Another question: I would like each relationship to have a property to represent the difference in the time-stamps of each node or delta-t. Is there a way to take the difference between the two values in two sequential nodes, and assign it to the relationship?....for all of the relationships at the same time?
The last Q, if you have the time - I'd really like to use the raw data and just chain the directed relationships from one nodes'stamp to the next nearest node with the minimum delta, but didn't run right at this for fear that it cause scanning of all the nodes in order to build each relationship.
Before anyone suggests that I look to KDB or other db's for time series, let me say I have a very specific reason to want to use a DAG representation.
It seems like this should be so easy...it probably is and I'm blind. Thanks!
Creating Relationships
Since your queries work on 30k nodes, I'd suggest to run them page by page over all the nodes. It seems feasible because outeseq and ineseq are unique and numeric so you can sort nodes by that properties and run query against one slice at time.
MATCH (a:BOOK),(b:BOOK)
WHERE a.outeseq = b.outeseq-1
WITH a, b ORDER BY a.outeseq SKIP {offset} LIMIT 30000
MERGE (a)-[s:FORWARD_SEQ]->(b)
RETURN s;
It will take about 13 times to run the query changing {offset} to cover all the data. It would be nice to write a script on any language which has a neo4j client.
Updating Relationship's Properties
You can assign timestamp delta to relationships using SET clause following the MATCH. Assuming that a timestamp is a long:
MATCH (a:BOOK)-[s:FORWARD_SEQ]->(b:BOOK)
SET s.delta = abs(b.timestamp - a.timestamp);
Chaining Nodes With Minimal Delta
When relationships have the delta property inside, the graph becomes a weighted graph. So we can apply this approach to calculate the shortest path using deltas. Then we just save the length of the shortest path (summ of deltas) into the relation between the first and the last node.
MATCH p=(a:BOOK)-[:FORWARD_SEQ*1..]->(b:BOOK)
WITH p AS shortestPath, a, b,
reduce(weight=0, r in relationships(p) : weight+r.delta) AS totalDelta
ORDER BY totalDelta ASC
LIMIT 1
MERGE (a)-[nearest:NEAREST {delta: totalDelta}]->(b)
RETURN nearest;
Disclaimer: queries above are not supposed to be totally working, they just hint possible approaches to the problem.

Fixed length path between two nodes without knowing relationship type

I want to find paths of a certain length between two nodes - but I don't know what relationships are involved. The tutorials and examples seems to require I know what type of relationship I want to use. E.g.
MATCH (martin { name:"Martin Sheen" })-[:ACTED_IN*1..3]-(x)
RETURN x
I want to ask the graph:
MATCH (martin { name:"Martin Sheen" })-[:3..3]-(x)
RETURN x
But Neo4J seems to hang (I have 600,000 nodes / 1.3m relationships). How can I find paths of a certain length between two nodes, in a performant manner, provided I have no information about the relationships?
Thanks
First of all, this is not surprising it is really slow. Since Neo4j2.0, the use of labels is almost mandatory for performance reasons.
So make sure to use labels + an indexed property at least for matching your first node.
Secondly, you can just match paths of fixed length without knowing the relationship types.
Assuming your "Martin Sheen" node has a label Person :
Create index for Person/name :
CREATE INDEX ON :Person(name);
Match the person and the path
MATCH p=(martin:Person {name:"Martin Sheen"})-[*3..3]->(x)
RETURN p

Get all Routes between two nodes neo4j

I'm working on a project where I have to deal with graphs...
I'm using a graph to get routes by bus and bike between two stops.
The fact is,all my relationship contains the time needed to go from the start point of the relationship and the end.
In order to get the shortest path between to node, I'm using the shortest path function of cypher. But something, the shortest path is not the fastest....
Is there a way to get all paths between two nodes not linked by a relationship?
Thanks
EDIT:
In fact I change my graph, to make it easier.
So I still have all my nodes. Now the relationship type correspond to the time needed to go from a node to another.
The shortestPath function of cypher give the path which contains less relationship. I would like that it returns the path where the addition of all Type (the time) is the smallest..
Is that possible?
Thanks
In cypher, to get all paths between two nodes not linked by a relationship, and sort by a total in a weight, you can use the reduce function introduced in 1.9:
start a=node(...), b=node(...) // get your start nodes
match p=a-[r*2..5]->b // match paths (best to provide maximum lengths to prevent queries from running away)
where not(a-->b) // where a is not directly connected to b
with p, relationships(p) as rcoll // just for readability, alias rcoll
return p, reduce(totalTime=0, x in rcoll: totalTime + x.time) as totalTime
order by totalTime
You can throw a limit 1 at the end, if you need only the shortest.
You can use the Dijkstra/Astar algorithm, which seems to be a perfect fit for you. Take a look at http://api.neo4j.org/1.8.1/org/neo4j/graphalgo/GraphAlgoFactory.html
Unfortunately you cannot use those from Cypher.

Resources