How can I get second shortpath? - neo4j

I created big graph database. Graph includes relationships between organizations. I would like get shortpath between two nodes.
I filtered relationships types by next query
MATCH (o:Organization) WHERE Id(o) = 23806112
MATCH (o1:Organization) WHERE Id(o1) = 24385058
MATCH p = allShortestPaths((o)-[*1..10]-(o1)) WHERE
ALL (r IN RELATIONSHIPS(p) WHERE
(type(r) = 'OrganizationFounderOrganization' AND r.DateFrom <= datetime('2019-03-13') )
OR (type(r) = 'OrganizationFounderPerson' AND r.DateFrom <= datetime('2019-03-13') )
OR (type(r) = 'OrganizationChief' AND r.DateFrom <= datetime('2019-03-13') )
OR (type(r) = 'OrganizationManagingCompany' AND r.DateFrom <= datetime('2019-03-13') )
OR (type(r) = 'OrganizationPhone')
OR (type(r) = 'OrganizationAddress' AND NOT EXISTS(r.DateTo) )
)
RETURN p SKIP 0 LIMIT 30
Filters by relationships do not may be combined because they could use other filters.
My execution plan by this query
http://joxi.ru/E2pNPDDH9Rpger
When I get paths, I filter nodes by other conditions (capital, status) and if I get not right paths I apply next query with filter by bad nodes.
MATCH (o:Organization) WHERE Id(o) = 23806112
MATCH (o1:Organization) WHERE Id(o1) = 24385058
MATCH p = allShortestPaths((o)-[*1..10]-(o1)) WHERE
ALL (r IN RELATIONSHIPS(p) WHERE
(type(r) = 'OrganizationFounderOrganization' AND r.DateFrom <= datetime('2019-03-13') )
OR (type(r) = 'OrganizationFounderPerson' AND r.DateFrom <= datetime('2019-03-13') )
OR (type(r) = 'OrganizationChief' AND r.DateFrom <= datetime('2019-03-13') )
OR (type(r) = 'OrganizationManagingCompany' AND r.DateFrom <= datetime('2019-03-13') )
OR (type(r) = 'OrganizationPhone')
OR (type(r) = 'OrganizationAddress' AND NOT EXISTS(r.DateTo) )
)
AND ALL (n IN NODES(p) WHERE NOT Id(n) IN [15665,1557884,7888953]
RETURN p SKIP 0 LIMIT 30
Execution plan second query
http://joxi.ru/KAxWPDDcMJGyY2
In the process of executing second query neo4j become unresponsive and I should restart container
Neo4j version 3.4.7
Store Sizes
Count Store 5.69 KiB
Label Store 16.02 KiB
Index Store 9.27 GiB
Schema Store 8.01 KiB
Array Store 8.01 KiB
Logical Log 16.53 MiB
Node Store 1.18 GiB
Property Store 20.33 GiB
Relationship Store 8.79 GiB
String Store 18.73 GiB
Total Store Size 58.56 GiB
ID Allocation
Node ID 84686407
Property ID 532508736
Relationship ID 276570526
Relationship Type ID 13
Container memory limit 122880mb
Processor 32 core

Related

Finding right path in Cypher Neo4j

I'm working with Flight Analyzer database (https://neo4j.com/graphgist/flight-analyzer).
We have there few nodes and relationships types.
Nodes:
Airport
(SEA:Airport { name:'SEA' })
Flight
(f0:Flight { date:'11/30/2015 04:24:12', duration:218, distance:1721, airline:'19977' })
Ticket
(t1f0:Ticket { class:'economy', price:1344.75 })
Relationships
Destination
(f0)-[:DESTINATION]->(ORD)
Origin
(f0)-[:ORIGIN]->(SEA)
Assign
(t1f0)-[:ASSIGN]->(f0)
Now I need to find some path and I have problem with that connection ORIGIN - FLIGHT - DESTINATION.
I need to find all airports that are connected to LAX airport with sum of ticket prices < 3000.
I tried
MATCH path = (origin:Airport { name:"LAX" })<-[r:ORIGIN|DESTINATION*..5]->(destination:Airport)
WHERE REDUCE(s = 0, n IN [x IN NODES(path) WHERE 'Flight' IN LABELS(x)] |
s + [(n)<-[:ASSIGN]-(ticket) | ticket.price][0]
) < 3000
RETURN path
but in this solution LAX can be ORIGIN and DESTINATION too. I only want to chose paths that always have the same order aiport1 <- origin - flight1 - destination -> airport2 <- origin - flight2 - destination -> aiport etc..
I need to include departure and arrive time so
flight1 date + duration < flight2 date then flight2 date + duration < flight3 date etc...
[UPDATED]
This query should check that:
matched paths have alternating ORIGIN/DESTINATION relationships, and
every departing flight lands at least 30 minutes before the next departing flight (if any), and
the sum of the ticket prices of the Flight nodes (which are every other node starting at the second one) < 3000
MATCH p = (origin:Airport {name: 'LAX'})-[:ORIGIN|DESTINATION*..5]-(destination:Airport)
WHERE
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE
TYPE(RELATIONSHIPS(p)[i]) = ['ORIGIN', 'DESTINATION'][i] AND
(i%4 <> 1 OR (i + 2) > LENGTH(p) OR
(apoc.date.parse(NODES(p)[i].date,'m','MM/dd/yyyy hh:mm:ss') + NODES(p)[i].duration + 30) < apoc.date.parse(NODES(p)[i+2].date,'m','MM/dd/yyyy hh:mm:ss'))
) AND
REDUCE(s = 0, n IN [k IN RANGE(1, LENGTH(p), 2) | NODES(p)[k]] |
s + [(n)<-[:ASSIGN]-(ticket) | ticket.price][0]
) < 3000
RETURN p
The query uses the apoc.date.parse function to convert each date into the number of epoch minutes, so that a duration (assumed to also be in minutes) can be added to it.
I believe, you should create new relationships like flyto from an airport to an airport with ticket price and ticket class. it can be useful.
then you can find flights easier.
match
(a:Airport )<-[:ORIGIN]-(f:Flight)-[:DESTINATION ]->(b:Airport ),
(f)-[:ASSIGN]-(t:Ticket)
CREATE (a)-[r:FLY_TO {price:t.price,Class:t.class} ]->(b)

Conditional Order By Clause In Cypher Query Neo4j

Hi I want to sort my graph results by two filters..
My Cypher Query looks like this..
MATCH (n:User{user_id:304020})-[r:know]->(m:User) with m MATCH (m)-[s:like|create|share]->(o{is_active:1})
with m, s, o, (toInt(timestamp()/1000)-toInt(o.created_on))/86400 as days,
(toInt(timestamp()/1000)-toInt(o.created_on))/3600 as hours,
(1- round(o.impression_count_all/20)/50) as low_boost
with m,s,o,days,low_boost,hours,
CASE
WHEN days > 30 THEN 0.05
WHEN days >=20 AND days <=30 THEN 0.1
WHEN days >=10 AND days <=20 THEN 0.2
WHEN days >=5 AND days <=10 THEN 0.4
WHEN days >=2 AND days <=5 THEN 0.5
WHEN days =1 THEN 0.6
WHEN days < 1 THEN
CASE
WHEN hours <= 2 THEN 1
WHEN hours > 2 AND hours <= 8 THEN 0.9
WHEN hours > 8 AND hours <= 16 THEN 0.8
WHEN hours > 16 AND hours < 23 THEN 0.75
WHEN hours >= 23 AND hours <= 24 THEN 0.7
END
END as rs,
CASE
WHEN low_boost > 0 THEN low_boost
WHEN low_boost <= 0 THEN 0
END as lb
where has(o.trending_score_all) and has(o.impression_count_all) and not(o.is_featured=2)
RETURN distinct o.story_id as story_id,
(o.trending_score_all*4) as ts, (o.trending_score_all + rs + lb) as final_score,
count(s) as rel_count,max(s.activity_id) as id, toInt(o.created_on) as created_on
ORDER BY (CASE WHEN ts > 3 THEN final_score desc, rel_count desc ELSE ts) END) DESC
skip 0 limit 10;
Now If ts > 3 ,I want to sort results by both final_score and rel_count ELSE srt only by ts..
Please modify order by..
Does this much simplified query (which uses a single argument for ORDER BY) work for you?
MATCH (u:User)-[r:like]->(s:Story)
WITH s, (s.trending_score_all*4) AS ts
RETURN DISTINCT s.story_id, ts, TOINT(s.impression_count_72)
ORDER BY (CASE WHEN ts > 3 THEN ts ELSE TOINT(s.impression_count_72) END) DESC
LIMIT 10;
[EDITED]
If you need to sort by a varying number of values (depending on the situation) you have to use a workaround, as Cypher does not support that directly.
For example, suppose when (ts > 3) you wanted to order by ts DESC and then by s.story_id ASC. In this situation, you could change the above ORDER BY clause to this:
ORDER BY
(CASE WHEN ts > 3 THEN ts ELSE TOINT(s.impression_count_72) END) DESC,
(CASE WHEN ts > 3 THEN s.story_id ELSE NULL END) ASC
By using NULL (or any literal value) in this way, you can have any of the sub-sorts effectively do nothing.
1) You use pagination (skip and limit)
2) If I understand what you need, then add "else" to sort:
UNWIND RANGE(1,100) as i
WITH i, rand()*5 as x, toInt(rand()*10) as y
RETURN i, x, y, CASE WHEN x>3 THEN 1 ELSE 0 END as for_sort
ORDER BY
CASE
WHEN for_sort=1 THEN x ELSE y END DESC,
CASE
WHEN for_sort=1 THEN y ELSE x END DESC

Find top 3 nodes with maximum relationships

The structure of my data base is:
( :node ) -[:give { money: some_int_value } ]-> ( :Org )
One node can have multiple relations.
I need to find top 3 nodes with the most number of relations :give with their property money holding: vx <= money <= vy
Using ORDER BY and LIMIT should solve your problem:
Match ( n:node ) -[r:give { money: some_int_value } ]-> ( :Org )
RETURN n
ORDER BY count(r) DESC //Order by the number of relations each node has
LIMIT 3 //We only want the top 3 nodes
Instead of using the label 'node', maybe use something more descriptive like Person for the label so the datamodel is more clear:
MATCH (p:Person)-[r:give]->(o:Org)
WITH count(r) AS num, sum(r.money) AS total, p
RETURN p, num, total ORDER BY num DESC LIMIT 3;
I'm not sure what you mean by "their property money holding: vx <= money <= vy". If you could clarify I can update my answer accordingly. You can calculate the total of the money properties using the sum() function.
Edit
To only include relationships with money property with value greater than 10 and less 25:
MATCH (p:Person)-[r:give]->(o:Org)
WHERE r.money >= 10 AND r.money <= 25
WITH count(r) AS num, sum(r.money) AS total, p
RETURN p, num, total ORDER BY num DESC LIMIT 3;

Cypher: find a path which takes the maximum valued step each time

I am trying to write a cypher query that finds a path between nodes a and b such that each step has the maximum timestamp value out of all available alternatives that is less than 15.
Here is my query so far, it does everything except for select the maximum possible timestamp at each step. How do I express this condition?
MATCH path=(a:NODE)-[rs:PARENT*]->(b:NODE)
WHERE a.name = 'SOME_VALUE' and b.name = 'SOME_OTHER_VALUE' AND ALL (r IN rs
WHERE r.timestamp < 15)
RETURN path
This is just awful sudo code but I think it expresses what I am looking for
MATCH path=(a:NODE)-[rs:PARENT*]->(b:NODE)
WHERE a.name = 'SOME_VALUE' and b.name = 'SOME_OTHER_VALUE' AND ALL (r IN rs
WHERE r.timestamp < 15 AND r.timestamp = max(allPossibleRsForThisStep))
RETURN path
Can this kind of query be written in cypher?
It won't be fast in cypher, it's possible to compute all maximum values first and then do what you want to do by compare the max value in a list with the current value.
Something like this (not sure if it works)
WITH range(1,10) as max_vals // a list with 10 values (actual values are not that important)
MATCH (a:NODE)-[rs:PARENT*..10]->(b:NODE)
WHERE a.name = 'SOME_VALUE' and b.name = 'SOME_OTHER_VALUE'
WITH a,b,
map(idx in range(0,size(rs)) |
max_vals[idx] = case when max_vals[idx]<rs[idx].timestamp then rs[idx].timestamp else max_vals[idx] end ), max_vals
MATCH path=(a)-[rs:PARENT*..10]->(b)
AND ALL (idx in range(0,size(rs) WHERE rs[idx].timestamp < 15 AND rs[idx].timestamp = max_vals[idx])
RETURN path

How to verify that specific paths do not exist in cypher query

I would like to get nodes, which do no have a specific relation (a relation with specific properties).
The graph contains entity nodes (the n's), which occur at specific lines (line_nr) in files (the f).
The current query I have is as follows:
start n=node:entities("text:*")
MATCH p=(n)-[left:OCCURS]->(f)
, p4=(f)<-[right4?:OCCURS]-(n4)
, p7=(f)<-[right7?:OCCURS]-(n7)
WHERE ( (
( n4.text? =~ "nonreachablenodestextregex" AND (p4 = null OR left.line_nr < right4.line_nr - 0 OR left.line_nr > right4.line_nr + 0 OR ID(left) = ID(right4) ) ) )
AND (
( n7.text? =~ "othernonreachablenodestextregex" AND (p7 = null OR left.line_nr < right7.line_nr - 0 OR left.line_nr > right7.line_nr + 0 OR ID(left) = ID(right7) ) ) ) )
WITH n, left, f, count(*) as group_by_cause
RETURN ID(left) as occ_id,
n.text as ent_text,
substring(f.text, ABS(left.file_offset-1), 2 + LENGTH(n.text) ) as occ_text,
f.path as file_path,
left.line_nr as occ_line_nr,
ID(f) as file_id
Instead of a new path in the MATCH clause, I thought it would also be possible to have:
NOT ( (f)<-[right4:OCCURS]-(n4) )
But, I do not want to exclude the existence of any path, but specific paths.
As an alternative solution, I thought to include additional start nodes (as I have an index on the not to be reachable node), to remove the text comparison in the WHERE clause. This however doesn't return anything if there are no nodes in neo4j matching the wildcard.
start n=node:entities("text:*")
, n4=node:entities("text:nonreachablenodestextwildcard")
, n7=node:entities("text:othernonreachablenodestextwildcard")
MATCH p=(n)-[left:OCCURS]->(f)
, p4=(f)<-[right4?:OCCURS]-(n4)
, p7=(f)<-[right7?:OCCURS]-(n7)
WHERE ( (
( (p4 = null
OR left.line_nr < right4.line_nr - 0
OR left.line_nr > right4.line_nr + 0
OR ID(left) = ID(right4) ) ) )
AND (
( (p7 = null
OR left.line_nr < right7.line_nr - 0
OR left.line_nr > right7.line_nr + 0
OR ID(left) = ID(right7) ) )
) )
Old Update:
As mentioned in the answers, I could use the predicate functions to construct an inner query. I therefore updated the query to:
start n=node:entities("text:*")
MATCH p=(n)-[left:OCCURS]->(f)
WHERE ( (
(NONE(path in (f)<-[:OCCURS]-(n4)
WHERE
(LAST(nodes(path))).text =~ "nonreachablenodestextRegex"
AND FIRST(r4 in rels(p)).line_nr <= left.line_nr
AND FIRST(r4 in rels(p)).line_nr >= left.line_nr
)
) )
AND (
(NONE(path in (f)<-[:OCCURS]-(n7)
WHERE
(LAST(nodes(path))).text =~ "othernonreachablenodestextRegex"
AND FIRST(r7 in rels(p)).line_nr <= left.line_nr
AND FIRST(r7 in rels(p)).line_nr >= left.line_nr
)
) )
)
WITH n, left, f, count(*) as group_by_cause
RETURN ....
This gives me an java.lang.OutOfMemoryException :
java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Pattern.compile(Pattern.java:1432)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
at scala.util.matching.Regex.<init>(Regex.scala:38)
at scala.collection.immutable.StringLike$class.r(StringLike.scala:226)
at scala.collection.immutable.StringOps.r(StringOps.scala:31)
at org.neo4j.cypher.internal.parser.v1_9.Base.ignoreCase(Base.scala:31)
at org.neo4j.cypher.internal.parser.v1_9.Base.ignoreCases(Base.scala:49)
at org.neo4j.cypher.internal.parser.v1_9.Base$$anonfun$ignoreCases$1.apply(Base.scala:49)
at org.neo4j.cypher.internal.parser.v1_9.Base$$anonfun$ignoreCases$1.apply(Base.scala:49)
at scala.util.parsing.combinator.Parsers$Parser.p$3(Parsers.scala:209)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$1.apply(Parsers.scala:210)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$1.apply(Parsers.scala:210)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:163)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:210)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:210)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:183)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$1.apply(Parsers.scala:210)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$1.apply(Parsers.scala:210)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:163)
(The last 6 lines are repeated a few more times)
Solution
The previous update probably contains a syntax error somewhere, got it fixed slightly different as follows:
start n=node:entities("text:*")
MATCH p=(n)-[left:OCCURS]->(f)
WHERE (
(NONE ( path in (f)<-[:OCCURS]-()
WHERE
ANY(n4 in nodes(path)
WHERE ID(n4) <> ID(n)
AND n4.type = 'ENTITY'
AND n4.text =~ "a regex expr"
)
AND ALL(r4 in rels(path)
WHERE r4.line_nr <= left.line_nr + 0
AND r4.line_nr >= left.line_nr - 0
)
)
) )
AND
NONE ( ...... )
WITH n, left, f, count(*) as group_by_cause
RETURN ...
It is however slow. Order of seconds (>10) for small graph:
4 entities-nodes and 6 :OCCURS relations in total, all to 1 single destination f node, with line_nr's between 0 and 3.
Performance Update
The following is about twice as fast:
start n=node:entities("text:*")
MATCH p=(n)-[left:OCCURS]->(f)
, p4=(f)<-[right4?:OCCURS]-(n4)
, p7=(f)<-[right7?:OCCURS]-(n7)
WHERE
( n4.text? =~ "regex1"
AND (p4 = null
OR left.line_nr < right4.line_nr - 0
OR left.line_nr > right4.line_nr + 0
OR ID(left) = ID(right4)
)
)
AND
( n7.text? =~ "regex2"
AND (p7 = null .....)
)
WITH n, left, f, count(*) as group_by_cause
RETURN ....
I think instead of the optional relationships you should use the pattern predicates in WHERE as you noted. The pattern expressions actually return a collection of paths, so you could do collection predicates like (ALL, NONE, ANY, SINGLE)
WHERE NONE(path in (f)<-[:OCCURS]-(n4) WHERE
ALL(r in rels(p) : r.line_nr = 42 ))
see: http://docs.neo4j.org/chunked/milestone/query-function.html#_predicates

Resources