Cypher - Number ordered relationship - neo4j

I am stuck on a Cypher query. I would like to create relationships with an implicit order based on an integer property on each instruction. For example, I would like to match address_1 to address_2, address_2 to address_3, and so forth whenever the WHERE condition applies. Here is the statement I attempted:
MATCH (i:Instruction), (i2:Instruction)
WHERE i.address < i2.address AND i.vars_written = i2.vars_read
MERGE (i)-[r:USED_BY]->(i2)
RETURN i, i2, r
The issue is that a lower-numbered address maps to every other address that fits the WHERE condition instead of only the relevant ones, creating extra relationships where there shouldn't be any, like address_1 to address_3.
One instruction may have multiple USED_BY relationships, but once another instruction has the same vars_written property, it should start a new chain of USED_BY relationships.
To provide a concrete example, consider the following 4 nodes:
Index 1 parameters: {
address: 1
vars_written: var_b
vars_read: var_a
}
Index 2 parameters: {
address: 2
vars_written: var_c
vars_read: var_b
}
Index 3 parameters: {
address: 3
vars_written: var_c <- Same vars_written as Index 2
vars_read: var_b
}
Index 4 parameters: {
address: 4
vars_written: var_e
vars_read: var_c <- var_c was overwritten in Index 3, should NOT map to Index 2
}
Produces:
1 -[USED_BY]-> 2
1 -[USED_BY]-> 3
2 -[USED_BY]-> 4 (this one should not be created, because only 3 should map to 4!)
3 -[USED_BY]-> 4

Credit to the Neo4j team for answering on their Slack channel:
MATCH (i:Instruction), (i2:Instruction)
WHERE i.address < i2.address AND i.vars_written = i2.vars_read
WITH i, i2 ORDER BY i2.address ASC // <-- order the i2 by its address
WITH i, head(collect(i2)) as i2 // <-- collect all of the matches and take the first
MERGE (i)-[r:USED_BY]->(i2)
RETURN i, i2, r
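The trick here is that the ORDER BY in the WITH fixes the row order, collect() then gathers the i2 candidates for each i in that order, and head() keeps only the first (lowest-address) match. A minimal standalone illustration of the pattern, using toy values rather than the Instruction data:
UNWIND [3, 1, 2] AS x
WITH x ORDER BY x ASC
RETURN head(collect(x)) AS first // returns 1, the smallest value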

Related

Finding right path in Cypher Neo4j

I'm working with the Flight Analyzer database (https://neo4j.com/graphgist/flight-analyzer).
It has a few node and relationship types.
Nodes:
Airport
(SEA:Airport { name:'SEA' })
Flight
(f0:Flight { date:'11/30/2015 04:24:12', duration:218, distance:1721, airline:'19977' })
Ticket
(t1f0:Ticket { class:'economy', price:1344.75 })
Relationships:
Destination
(f0)-[:DESTINATION]->(ORD)
Origin
(f0)-[:ORIGIN]->(SEA)
Assign
(t1f0)-[:ASSIGN]->(f0)
Now I need to find paths, and I have a problem with the ORIGIN - FLIGHT - DESTINATION connection.
I need to find all airports that are connected to LAX airport with sum of ticket prices < 3000.
I tried
MATCH path = (origin:Airport { name:"LAX" })<-[r:ORIGIN|DESTINATION*..5]->(destination:Airport)
WHERE REDUCE(s = 0, n IN [x IN NODES(path) WHERE 'Flight' IN LABELS(x)] |
s + [(n)<-[:ASSIGN]-(ticket) | ticket.price][0]
) < 3000
RETURN path
but in this solution LAX can act as ORIGIN and as DESTINATION too. I only want to choose paths that always follow the same order: airport1 <- origin - flight1 - destination -> airport2 <- origin - flight2 - destination -> airport3, etc.
I also need to include the departure and arrival times, so that
flight1 date + duration < flight2 date, then flight2 date + duration < flight3 date, etc.
[UPDATED]
This query should check that:
matched paths have alternating ORIGIN/DESTINATION relationships, and
every departing flight lands at least 30 minutes before the next departing flight (if any), and
the sum of the ticket prices of the Flight nodes (which are every other node starting at the second one) < 3000
MATCH p = (origin:Airport {name: 'LAX'})-[:ORIGIN|DESTINATION*..5]-(destination:Airport)
WHERE
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE
TYPE(RELATIONSHIPS(p)[i]) = ['ORIGIN', 'DESTINATION'][i%2] AND
(i%2 <> 1 OR (i + 2) > LENGTH(p) OR
(apoc.date.parse(NODES(p)[i].date,'m','MM/dd/yyyy hh:mm:ss') + NODES(p)[i].duration + 30) < apoc.date.parse(NODES(p)[i+2].date,'m','MM/dd/yyyy hh:mm:ss'))
) AND
REDUCE(s = 0, n IN [k IN RANGE(1, LENGTH(p), 2) | NODES(p)[k]] |
s + [(n)<-[:ASSIGN]-(ticket) | ticket.price][0]
) < 3000
RETURN p
The query uses the apoc.date.parse function to convert each date into the number of epoch minutes, so that a duration (assumed to also be in minutes) can be added to it.
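For reference, the conversion can be checked in isolation (this assumes the APOC plugin is installed and that the date strings really follow the format used above):
RETURN apoc.date.parse('11/30/2015 04:24:12', 'm', 'MM/dd/yyyy hh:mm:ss') AS departureInEpochMinutes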
I believe you should create new relationships, like FLY_TO from an airport to an airport, carrying the ticket price and ticket class. It can be useful; then you can find flights more easily.
MATCH
(a:Airport)<-[:ORIGIN]-(f:Flight)-[:DESTINATION]->(b:Airport),
(f)-[:ASSIGN]-(t:Ticket)
CREATE (a)-[r:FLY_TO {price: t.price, Class: t.class}]->(b)
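With those FLY_TO relationships in place, the original question could then be sketched as a variable-length traversal that sums the price stored on each hop (an untested sketch; the 3000 limit is from the question, while the *..3 hop bound is just an illustrative choice):
MATCH p = (:Airport {name: 'LAX'})-[:FLY_TO*..3]->(destination:Airport)
WHERE REDUCE(s = 0, r IN RELATIONSHIPS(p) | s + r.price) < 3000
RETURN destination, p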

Neo4j Cypher round value

I have the following Cypher:
MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
WITH u, hv ORDER BY hv.createDate DESC
WITH count(hv) as count, count(hv) / {maxResults} as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
Right now:
count(hv) = 260
{maxResults} = 100
The step variable equals 2, but I expect round(260/100) = 3.
I tried round(count(hv) / {maxResults}) as step, but step is still 2.
How to fix my query in order to get a proper round (3 as step variable in this particular case)?
Use toFloat() in one of the values:
return round(toFloat(260) / 100)
Output:
╒═══════════════════════════╕
│"round(toFloat(260) / 100)"│
╞═══════════════════════════╡
│3                          │
└───────────────────────────┘
You're currently doing integer division. If you enter return 260/100 you'll get 2, and that's the value that gets rounded (though there's nothing to round, so you get 2 back).
You need to be working with floating point values. You can do this by giving maxResults an explicit decimal (100.0), or by using toFloat() around either maxResults or the count. return 260/100.0, return toFloat(260)/100, and return 260/toFloat(100) will all result in 2.6. If you round() that, you'll get your expected value of 3.
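A minimal comparison of the two behaviours:
// integer division truncates; dividing by a float first preserves the fraction
RETURN 260 / 100 AS integerDivision, round(260 / toFloat(100)) AS rounded
// integerDivision = 2, rounded = 3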

Cypher Neo4j query optimization

I use the following Cypher query:
MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
WITH u, hv
ORDER BY hv.createDate DESC
WITH count(hv) as count, ceil(toFloat(count(hv)) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
This query works fine and does exactly what I need.
Right now I'm concerned about two possible issues in this query, from the point of view of performance and Cypher best practices.
First of all, as you can see, I use the same count(hv) function twice. Will this cause performance problems during execution, or are Cypher and Neo4j smart enough to optimize it? If not, please show how to fix it.
The second place is the CASE statement inside the range() function. The same question applies here: will this CASE statement be evaluated only once, or once for every iteration over my range? Please show how to fix it if needed.
UPDATED
I tried to add a separate WITH for count, but the query doesn't return any results (it returns an empty result):
MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
WITH u, hv ORDER BY hv.createDate DESC
WITH u, hv, count(hv) as count
WITH u, hv, count, ceil(toFloat(count) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
1 MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
2 WHERE v.id = {valueId}
3 OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
4 WHERE {fetchCreateUsers}
5 WITH u, hv
6 ORDER BY hv.createDate DESC
7 WITH count(hv) as count, ceil(toFloat(count(hv)) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
8 RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
(1) You need to pass hv in line 5, because its values are collected in line 7. That said, you can still do something like this:
5 WITH u, collect(hv) AS hvs, count(hv) as count
UNWIND hvs AS hv
However, this is not very elegant and probably not worth doing.
(2) You can calculate the CASE expression in line 7:
7 WITH count, data, step, CASE step WHEN 0 THEN 1 ELSE step END AS stepFlag
8 RETURN REDUCE(s = [], i IN RANGE(0, count - 1, stepFlag) | s + data[i]) AS result, step, count
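Putting both points together, the rewritten query might look roughly like this (an untested sketch, keeping the parameter names from the question):
MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
WITH u, hv ORDER BY hv.createDate DESC
WITH count(hv) AS count, ceil(toFloat(count(hv)) / {maxResults}) AS step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
WITH count, data, step, CASE step WHEN 0 THEN 1 ELSE step END AS stepFlag
RETURN REDUCE(s = [], i IN RANGE(0, count - 1, stepFlag) | s + data[i]) AS result, step, count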

Find top 3 nodes with maximum relationships

The structure of my data base is:
( :node ) -[:give { money: some_int_value } ]-> ( :Org )
One node can have multiple relations.
I need to find the top 3 nodes with the largest number of :give relationships whose money property satisfies vx <= money <= vy.
Using ORDER BY and LIMIT should solve your problem:
Match ( n:node )-[r:give { money: some_int_value } ]->( :Org )
RETURN n, count(r) AS num // count the relationships each node has
ORDER BY num DESC // order by that count
LIMIT 3 // we only want the top 3 nodes
Instead of using the label 'node', maybe use something more descriptive like Person for the label so the data model is clearer:
MATCH (p:Person)-[r:give]->(o:Org)
WITH count(r) AS num, sum(r.money) AS total, p
RETURN p, num, total ORDER BY num DESC LIMIT 3;
I'm not sure what you mean by "their property money holding: vx <= money <= vy". If you could clarify I can update my answer accordingly. You can calculate the total of the money properties using the sum() function.
Edit
To only include relationships whose money property is between 10 and 25 (inclusive):
MATCH (p:Person)-[r:give]->(o:Org)
WHERE r.money >= 10 AND r.money <= 25
WITH count(r) AS num, sum(r.money) AS total, p
RETURN p, num, total ORDER BY num DESC LIMIT 3;
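If the vx and vy bounds from the question need to stay dynamic, they could be passed as parameters instead of hard-coded values (a sketch, using vx and vy as parameter names):
MATCH (p:Person)-[r:give]->(o:Org)
WHERE r.money >= {vx} AND r.money <= {vy}
WITH count(r) AS num, sum(r.money) AS total, p
RETURN p, num, total ORDER BY num DESC LIMIT 3;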

Cypher: analog of `sort -u` to merge 2 collections?

Suppose I have a node with a collection in a property, say
START x = node(17) SET x.c = [ 4, 6, 2, 3, 7, 9, 11 ];
and somewhere else (e.g. from a .csv file) I get another collection of values, say
c1 = [ 11, 4, 5, 8, 1, 9 ]
I'm treating my collections as just sets; the order of elements does not matter. What I need is to merge x.c with c1 with some magic operation so that the resulting x.c will contain only the distinct elements from both. The following idea comes to mind (yet untested):
LOAD CSV FROM "file:///tmp/additives.csv" as row
START x=node(TOINT(row[0]))
MATCH c1 = [ elem IN SPLIT(row[1], ':') | TOINT(elem) ]
SET
x.c = [ newxc IN x.c + c1 WHERE (newx IN x.c AND newx IN c1) ];
This won't work; it will give an intersection, not a collection of distinct items.
More RTFM gives another idea: use REDUCE()? But how?
Or how could I extend Cypher with a new built-in function UNIQUE() which accepts a collection and returns it cleaned of duplicates?
UPD. It seems the FILTER() function is something close, but it gives an intersection again :(
x.c = FILTER( newxc IN x.c + c1 WHERE (newx IN x.c AND newx IN c1) )
WBR,
Andrii
How about something like this...
with [1,2,3] as a1
, [3,4,5] as a2
with a1 + a2 as all
unwind all as a
return collect(distinct a) as unique
Add two collections and return the collection of distinct elements.
Dec 15, 2014 - here is an update to my answer...
I started with a node in the neo4j database...
//create a node in the DB with a collection of values on it
create (n:Node {name:"Node 01",values:[4,6,2,3,7,9,11]})
return n
I created a csv sample file with two columns...
Name,Coll
"Node 01","11,4,5,8,1,9"
I created a LOAD CSV statement...
LOAD CSV
WITH HEADERS FROM "file:///c:/Users/db/projects/coll-merge/load_csv_file.csv" as row
// find the matching node
MATCH (x:Node)
WHERE x.name = row.Name
// merge the collections
WITH x.values + split(row.Coll,',') AS combo, x
// process the individual values
UNWIND combo AS value
// use toInt as the values from the csv come in as string
// may be a better way around this but i am a little short on time
WITH toInt(value) AS value, x
// might as well sort 'em so they are all purdy
ORDER BY value
WITH collect(distinct value) AS values, x
SET x.values = values
You could use reduce like this:
with [1,2,3] as a, [3,4,5] as b
return reduce(r = [], x in a + b | case when x in r then r else r + [x] end)
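To write the de-duplicated result back onto the node from the question, the same reduce() expression can be combined with SET, roughly like this (a sketch reusing the x.c property and node 17 from the question, with c1 inlined as a literal list):
MATCH (x) WHERE id(x) = 17
WITH x, [11, 4, 5, 8, 1, 9] AS c1
SET x.c = reduce(r = [], v IN x.c + c1 | CASE WHEN v IN r THEN r ELSE r + [v] END)
RETURN x.c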
Since Neo4j 3.0, with APOC Procedures you can easily solve this with apoc.coll.union(). In 3.1+ it's a function, and can be used like this:
...
WITH apoc.coll.union(list1, list2) as unionedList
...
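Applied to the node from the earlier LOAD CSV example, that might look like this (a sketch; it assumes APOC is installed and reuses the Node 01 node and its values property from above):
MATCH (x:Node {name: 'Node 01'})
SET x.values = apoc.coll.union(x.values, [11, 4, 5, 8, 1, 9])
RETURN x.values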
