Neo4j: find a continuous route by time

I have 40 stations, identified by ID, and about 30k relationships between these stations. Each relationship has time properties (arrival and departure time) and the name of the line.
I need to find a route between stations A and B within a specific time range.
For example:
there is no direct route between stations A and C, so you must use
A -> B -> C = means id: 1 -> 2 -> 3
I am using this query:
MATCH p=(s1:L2Station{id:1})-[r:RIDE*]->(s2:L2Station{id:3}) WHERE ALL(x in r where x.deptime>=1438605300 AND x.deptime<=1438691700)
WITH reduce(acc = [], route in rels(p)|
CASE
WHEN toInt(route.arrtime) < last(extract(b in acc| b.deptime)) THEN null
WHEN length(acc) > 0 AND last(extract(a in acc| a.rid)) = route.rid THEN acc + route
ELSE acc + route
END) as reducedRoutes
WHERE reducedRoutes is not null
return reducedRoutes, length(reducedRoutes) as len
order by len;
but this query took about 8 minutes :(
If I use this query:
MATCH p=(s1:L2Station{id:1})-[r:RIDE]->(s2:L2Station{id:3}) WHERE r.deptime>1438732800 AND r.deptime<1438819200
....
it returns nothing. I am only able to get stations with a direct route.
Can anybody help me?
Thanks
Ondra

You are probably seeing the impact of attempting to match with an unbounded path length: [r:RIDE*]. Logically, this forces Neo4j to follow every possible path that starts at station 1 to see which ones end at the target station (s2, id 3 in your query). Probably only a few of the attempted paths will ultimately match, but Neo4j is forced to follow every path to the bitter end.
If possible, you should try putting an upper bound on the path length. For example, to match paths up to length 3: [r:RIDE*..3].
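To make the effect of the upper bound concrete, here is a minimal plain-Python sketch of the same idea, outside Neo4j. It is not the asker's query: the function name find_routes and the (src, dst, deptime, arrtime) tuple layout are assumptions made for illustration. It walks rides depth-first, stops at max_hops legs, and only follows a leg that departs inside the time window and no earlier than the previous leg arrived.

```python
from collections import defaultdict


def find_routes(rides, start, goal, t_min, t_max, max_hops=3):
    """Return every chain of rides from start to goal with at most max_hops legs.

    rides: iterable of (src, dst, deptime, arrtime) tuples (hypothetical layout).
    A leg is followed only if it departs inside [t_min, t_max] and no earlier
    than the previous leg arrived.
    """
    outgoing = defaultdict(list)
    for ride in rides:
        outgoing[ride[0]].append(ride)

    routes = []

    def extend(station, ready_at, legs):
        if station == goal and legs:
            routes.append(list(legs))
            return
        if len(legs) == max_hops:
            return  # the upper bound: stop expanding this path
        for ride in outgoing[station]:
            _, dst, dep, arr = ride
            if dep < max(ready_at, t_min) or dep > t_max:
                continue  # outside the window, or departs before we arrive
            legs.append(ride)
            extend(dst, arr, legs)
            legs.pop()

    extend(start, t_min, [])
    return sorted(routes, key=len)  # shortest routes first
```

Without the max_hops check the search would keep expanding every time-feasible chain, which is the cost the unbounded [r:RIDE*] pattern incurs.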

Related

Traversing Relationships a Variable Number of Times in Cypher

I have a graph of Airports, Routes between them and Airlines that carry it. I created routes as separate nodes, rather than just a relationship, so that I can connect each with an Airline, and other nodes.
Each Route node has an IS_FROM relationship with the origin airport and an IS_TO relationship with the destination. It also has an IS_BY relationship with its airline.
I am trying to traverse this tree, n times, for routes between two airports. For example, if n = 3, I want to get all the routes, that will lead from LAX to LHR, with 3 or fewer connections.
So basically, my result would be a union of the following:
No Connecting Airports:
MATCH (a1:Airport {iata : 'LAX'})<-[:IS_FROM]-(r:Route)-[:IS_TO]->(a2:Airport {iata : 'LHR'}), (r)-[:IS_BY]->(ai:Airline) return a1 , r , a2 , ai;
1 Connecting airports:
MATCH (a1:Airport {iata : 'LAX'})<-[:IS_FROM]-(r:Route)-[:IS_TO]->(a2:Airport)<-[:IS_FROM]-(r2:Route)-[:IS_TO]->(a3:Airport {iata: 'LHR'}), (r2)-[:IS_BY]->(ai:Airline) return a1 , r , a2 , a3 , r2 , ai;
and so on.
So the query should dynamically traverse the (:Airport)<-[:IS_FROM]-(:Route)-[:IS_TO]->(:Airport) pattern n times, and return the nodes (I am more interested in returning the Airlines that connect to those routes).
You can first extract all the paths between Airports that are 3 or fewer hops away, and then use OPTIONAL MATCH to see which of the nodes in the path are Routes, and which Airlines are offering them.
MATCH path = (:Airport {iata:$start_airport})-[*2..6]-(:Airport {iata:$end_airport})
WITH path,nodes(path) as path_airports_and_connecting_routes
UNWIND path_airports_and_connecting_routes as node
OPTIONAL MATCH (node)-[:IS_BY]-(airline:Airline)
WITH collect(node) as airports_and_routes,airline
RETURN airports_and_routes + [airline]
Caveat: Variable-length paths don't allow passing parameters, so you can't do something like [*2..2*n].
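One common workaround for this caveat is to build the query text in the client, so the bound is a literal by the time Cypher sees it. The sketch below assumes a hypothetical helper build_route_query and takes n as the maximum number of routes, so the upper bound is 2*n relationships (one IS_FROM and one IS_TO per route); only the validated integer is interpolated, and the airport codes stay as ordinary parameters.

```python
def build_route_query(max_routes):
    """Return a Cypher string whose variable-length bound is 2 * max_routes.

    max_routes is a hypothetical argument: the largest number of Route nodes
    allowed on a path. It is validated as a positive int before being
    interpolated, because the bound cannot be passed as a query parameter.
    """
    if isinstance(max_routes, bool) or not isinstance(max_routes, int):
        raise ValueError("max_routes must be an integer")
    if max_routes < 1:
        raise ValueError("max_routes must be at least 1")
    upper = 2 * max_routes
    return (
        "MATCH path = (:Airport {iata:$start_airport})"
        f"-[*2..{upper}]-"
        "(:Airport {iata:$end_airport}) "
        "RETURN path"
    )
```

For example, build_route_query(3) produces the [*2..6] bound used in the query above.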
I don't know if I got your question right. To me, your problem could be solved this way:
MATCH (a1:Airport {iata : 'LAX'})<-[r1:IS_FROM]-(r:Route)-[r2:IS_TO]->(a2:Airport{iata : 'LHR'})
OPTIONAL MATCH (r)-[r3:IS_BY]->(ai:Airline)
RETURN a1,r1,r,r2,a2,r3,ai

Iterate through Neo4j graph matching on node properties

I have for example the following graph in Neo4j
(startnode)-[:BELONG_TO]-(Interface)-[:IS_CONNECTED]-(Interface)-[:BELONG_TO]-
#the line below can repeat itself 0..n times
(node)-[:BELONG_TO]-(Interface)-[:IS_CONNECTED]-(Interface)-[:BELONG_TO]-
#up to the endnode
(endnode)
There is an Interface property I also need to match on. I do not want to follow all the paths, just the ones whose Interface nodes have the property value I am looking for. For example Interface.VlanList CONTAINS ",23,"
I have done the following in Cypher, but it implies that I already know how many iterations I am going to find, which in reality is not the case.
match (n:StartNode {name:"device name"}) -[:BELONG_TO]- (i:Interface) -[:IS_CONNECTED]- (ii:Interface)-[:BELONG_TO]-(nn:Node) -[:BELONG_TO]- (iii:Interface) -[:IS_CONNECTED]- (iiii:Interface) -[:BELONG_TO]-(nnn:Node)
where i.VlanList CONTAINS ",841,"
AND ii.VlanList CONTAINS ",841,"
AND iii.VlanList CONTAINS ",841,"
return n, i,ii,nn,iii,iiii,nnn
I have been looking at the documentation but can not work out how the above could be resolved.
This should work:
// put the searchstring in a variable
WITH ',841,' AS searchstring
// look up the start and end nodes
MATCH (startNode: .... {...}), (endNode: .... {...})
// look for paths of variable length
// that have your search string in all nodes,
// except the first and the last one
WITH searchstring,startNode,endNode
MATCH path=(startNode)-[:BELONG_TO|IS_CONNECTED*]-(endNode)
WHERE ALL(i IN nodes(path)[1..-1] WHERE i.VlanList CONTAINS searchstring)
RETURN path
You can also look at https://neo4j.com/labs/apoc/4.1/graph-querying/path-expander/ for more ideas about how you can limit the pathfinding.
This query should work for you (assuming that the relationship directions I chose are correct):
MATCH p = (sNode:StartNode)-[:BELONG_TO]->(i1:Interface)-[:IS_CONNECTED]->(i2:Interface)-[:BELONG_TO]->(n1)-[:BELONG_TO|IS_CONNECTED*0..]->(eNode:Node)
WHERE sNode.name = "device name" AND eNode.name = "foo" AND LENGTH(p)%3 = 0
WITH p, i1, i2, n1, eNode, RELATIONSHIPS(p) AS rels, NODES(p) AS ns
WHERE n1 = eNode OR (
ALL(j IN RANGE(3, SIZE(rels)-3, 3) WHERE
'BELONG_TO' = TYPE(rels[j]) = TYPE(rels[j+2]) AND
'IS_CONNECTED' = TYPE(rels[j+1])) AND
ALL(x IN ([i1, i2] + REDUCE(s = [], i IN RANGE(3, SIZE(ns)-2, 3) | CASE WHEN i%3 = 0 THEN s ELSE s +ns[i] END))
WHERE x:Interface AND x.VlanList CONTAINS $substring)
)
RETURN p
It checks that the returned paths have the required pattern of node labels, node property value, and relationship types. It takes advantage of the variable length relationship syntax, using zero as the lower bound. Since there is no upper bound, the variable length relationship query can take "forever" to finish (and in such a situation, you should use a reasonable upper bound).

Optimizing a Cypher query to improve performance

The query I've written returns accurate results based on some random testing I've done. However, the query execution takes really long (7699.43 s)
I need help optimising this query.
count(Person) -> 67895
count(has_POA) -> 355479
count(POADocument) -> 40
count(issued_by) -> 40
count(Company) -> 21
count(PostCode) -> 9845
count(Town) -> 1673
count(in_town) -> 9845
count(offers_services_in) -> 17107
All the entity nodes are indexed on Id's (not Neo4j IDs). The PostCode nodes are also indexed on PostCode.
MATCH pa = (p:Person)-[r:has_POA]->(d:POADocument)-[:issued_by]->(c:Company),
(pc:PostCode), (t:Town)
WHERE r.recipient_postcode = pc.PostCode AND (pc)-[:in_town]->(t) AND NOT (c)-[:offers_services_in]->(t)
RETURN p as Person, r as hasPOA, t as Town, d as POA, c as Company
Much thanks in advance!
-Nancy
I made some changes in your query:
MATCH (p:Person)-[r:has_POA {recipient_postcode : $code }]->(d:POADocument)-[:issued_by]->(c:Company),
(pc:PostCode {PostCode : $PostCode })-[:in_town]->(t:Town)
WHERE NOT (c)-[:offers_services_in]->(t)
RETURN p as Person, r as hasPOA, t as Town, d as POA, c as Company
Since you are not using the entire path, I removed the pa variable.
I moved the pattern existence check ((pc)-[:in_town]->(t)) from WHERE to MATCH.
I used parameters instead of the equality check r.recipient_postcode = pc.PostCode in WHERE. If you are running the query in Neo4j Browser, you can set the parameters by running the command :params {code : 10}.
Here is a simplified version of your current query.
MATCH (p:Person)-[r:has_POA]->(d:POADocument)-[:issued_by]->(c:Company)
MATCH (t:Town)<-[:in_town]-(pc:PostCode{PostCode:r.recipient_postcode})
WHERE NOT (c)-[:offers_services_in]->(t)
RETURN p as Person,r as hasPOA,t as Town, d as POA,c as Company
Your big performance hits are going to be on the Cartesian product between all the match sets, and the raw amount of data you are asking for.
In this simplified version, I'm using one less match, and the second match uses a variable from the first match to avoid generating a Cartesian product. I would also recommend using LIMIT and SKIP to page your results to limit data transfer.
If you can adjust your model, I would recommend converting the has_POA relation to an issued_POA node so that you can take advantage of Neo4j's relation finding on the 2 postcodes related to that instance, and making the second match a gimme instead of an extra indexed search (after you adjust the query to match the new model, of course).
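As a rough illustration of why the Cartesian product hurts, the plain-Python sketch below compares pairing every has_POA record with every postcode against a keyed lookup. The function names and the dict layout are assumptions for the example, and it assumes each PostCode value is unique. With the counts given in the question, pairing 355479 has_POA relationships with 9845 PostCode nodes alone is roughly 3.5 billion combinations, before Town is even considered.

```python
def join_nested(poas, postcodes):
    """Compare every POA with every postcode, like an unconstrained MATCH."""
    comparisons = 0
    matches = []
    for poa in poas:
        for pc in postcodes:
            comparisons += 1
            if poa["recipient_postcode"] == pc["PostCode"]:
                matches.append((poa["id"], pc["town"]))
    return matches, comparisons


def join_indexed(poas, postcodes):
    """Look each postcode up by key, like a MATCH driven by an indexed property."""
    by_code = {pc["PostCode"]: pc for pc in postcodes}  # assumes unique codes
    lookups = 0
    matches = []
    for poa in poas:
        lookups += 1
        pc = by_code.get(poa["recipient_postcode"])
        if pc is not None:
            matches.append((poa["id"], pc["town"]))
    return matches, lookups
```

Both functions return the same matches; the second does one lookup per POA instead of one comparison per (POA, postcode) pair.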

Window in cypher

So basically it comes down to this. I have a (:PERSON) that used his (:CAR) at a given (:TIME). This triplet is fully connected. It might be that a (:CAR) is used by other (:PERSON) and a (:PERSON) can use multiple (:CAR) all of that at different (:TIME).
What I want to query is that for each combination (p:PERSON)-[:AT]-(t:TIME) I want the number of cars used in t-6H (p-[:USED]-(c:CAR)-[:AT]-(o:TIME) in t-6H).
Here is what I have achieved so far, but this only takes each :PERSON once.
MATCH (n:PERSON)-[:AT]-(t:TIME)
WITH n,t
MATCH (n)-[:USED]-(c:CAR)-[:AT]-(o:TIME)
WITH n,t,c,toFLoat(t.id) as current, toFloat(o.id) as previous
WITH n,t,c,current-previous as diff
WHERE (diff) >= 0 AND (diff) <= 3600*6
WITH n, count(distinct c) as cnt
RETURN n, cnt
Where :TIME(id) is a String containing the time in seconds
Hope this is clear. Thanks for the help.
You should group the count by both the person and t:
MATCH (n:PERSON)-[:AT]-(t:TIME)
WITH n,t
MATCH (n)-[:USED]-(c:CAR)-[:AT]-(o:TIME)
WITH n,t,c,toFloat(t.id) - toFloat(o.id) as diff
WHERE (diff) >= 0 AND (diff) <= 3600*6
WITH n,t, count(distinct c) as cnt
RETURN n,t, cnt
Also you should make your TIME(id) a numeric value so you can remove the toFloat from your query which will improve the performance.
Maybe you should put your t Time in your USED relation.
Either you'll want only one USED per Person + Car and then have a collection of times (not nice for querying),
or you'll have multiple USED relations.
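The grouping can be checked against a small plain-Python model of the same window. This is only a sketch: the (person, car, time_seconds) tuple layout and the name cars_in_window are assumptions, and it considers only the person's own usages. For every (person, t) pair it counts the distinct cars that person used between t-6h and t inclusive.

```python
from collections import defaultdict

WINDOW_SECONDS = 6 * 3600


def cars_in_window(usages, window=WINDOW_SECONDS):
    """Map (person, t) to the number of distinct cars used in [t - window, t].

    usages: iterable of (person, car, time_seconds) tuples (hypothetical layout).
    """
    by_person = defaultdict(list)
    for person, car, t in usages:
        by_person[person].append((t, car))

    counts = {}
    for person, events in by_person.items():
        for t, _ in events:
            # same filter as the query: 0 <= t - o <= window
            cars = {car for o, car in events if 0 <= t - o <= window}
            counts[(person, t)] = len(cars)
    return counts
```

Keying the result on (person, t) rather than on the person alone is what the added t in the final WITH achieves.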

Unknown Error with Cypher Request for sum() over Subtrees

I am trying to run the following Cypher query:
match (n:FOLDER)-[r*]->(m:FILE) with n,sum(m.size) as calc SET n.calculatedSize=calc
After about one minute, the Neo4j browser says "Unknown error".
My query should summarize the size of the whole subtree, so every folder should have the summed size of all its subitems (FOLDER and FILE). In the production environment there will be about 9 million items with a maximum depth of 15.
Why does the query return "Unknown error", and is there a better way to calculate the size?
fadanner,
You might find it is faster to first do a one-level calculation to sum the file sizes into their immediate parent folders, then work up.
MATCH (n:FOLDER)-[r]-(m:FILE)
WITH n, sum(m.size) as calc
SET n.calculatedSize = calc
Set a temporary property on all FOLDER nodes to indicate whether they have been visited yet.
MATCH (m:FOLDER) set m.seen = 0
Mark the leaf folders as seen.
MATCH (m:FOLDER)
WHERE NOT (m)-[:CONTAINS]->(:FOLDER)
SET m.seen = 1
Repeatedly apply this query until the return value is zero to calculate all the sizes.
MATCH (m:FOLDER {seen : 0})-[:CONTAINS]->(n:FOLDER)
WITH m, sum(n.seen) AS val1, count(n) AS val2, sum(n.calculatedSize) AS val3
WHERE val1 = val2
SET m.calculatedSize = coalesce(m.calculatedSize, 0) + val3, m.seen=1
RETURN count(m)
Once you are done, remove the 'seen' properties with
MATCH(m:FOLDER)
REMOVE m.seen
Hope this helps.
Grace and peace,
Jim
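The bottom-up procedure in this answer can be sketched in plain Python to show why each round is cheap. The function name folder_sizes and the two dictionaries are assumptions for illustration, not part of any Neo4j API: each folder starts with the sum of its direct files, leaf folders are marked as seen, and a folder is finished only once all of its child folders are.

```python
def folder_sizes(children, file_sizes):
    """Return folder -> total size of all files anywhere below it.

    children: folder -> list of child folders (hypothetical layout).
    file_sizes: folder -> list of sizes of the files directly inside it.
    """
    folders = set(children) | set(file_sizes)
    for kids in children.values():
        folders.update(kids)

    # Step 1: one-level sum of direct files (0 for folders without files).
    sizes = {f: sum(file_sizes.get(f, [])) for f in folders}
    # Step 2: folders without child folders are already complete.
    seen = {f for f in folders if not children.get(f)}

    # Step 3: repeat until a round finishes no folder.
    while True:
        ready = [
            f for f in folders
            if f not in seen and all(k in seen for k in children[f])
        ]
        if not ready:
            break
        for f in ready:
            sizes[f] += sum(sizes[k] for k in children[f])
            seen.add(f)
    return sizes
```

Each round only touches direct parent-child links, so the number of rounds is bounded by the depth of the tree (at most 15 in the question) instead of enumerating every folder-to-file path.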
Try to specify a limit in your variable path length:
match (n:FOLDER)-[r*..15]->(m:FILE)
with n,sum(m.size) as calc
SET n.calculatedSize=calc

Resources