Getting aggregate ratio from a property from relationships with Cypher - neo4j

Let's say I have a graph like this (excuse the pseudo-code):
(A)-[:RSVP {reply:true}]->(X)
(B)-[:RSVP {reply:true}]->(X)
(C)-[:RSVP {reply:false}]->(X)
How do I get the ratio of positive responses? I'm expecting a result that will give me the acceptance ratio of 0.66.

I mocked up your data this way:
CREATE (A), (B), (C), (X {label: "party time"})
MERGE (A)-[:RSVP {reply:true}]->(X)
MERGE (B)-[:RSVP {reply:true}]->(X)
MERGE (C)-[:RSVP {reply:false}]->(X);
Then, with this query, we can simply count the "yes's" and the "no's" separately, and create the ratio with simple division:
MATCH (X {label:"party time"})
MATCH (X)<-[:RSVP {reply:true}]-(yeses),
(X)<-[:RSVP {reply:false}]-(nos)
RETURN count(distinct(yeses))/count(distinct(nos));
The answer I get is 2, because there are 2 yeses and 1 no. (2/1 => 2)

Using @FrobberOfBits' sample data, the following is a more general query that takes care of a couple of special cases (which would otherwise cause Cypher to raise a "/ by zero" error).
If there are no RSVPs, the query returns the string "No Matches".
If there are no false RSVPs, the query returns the string "Infinity".
Otherwise, the query returns the ratio of yeses to nos.
MATCH (X {label:"party time"})
OPTIONAL MATCH (X)<-[:RSVP {reply:true}]-(yes)
OPTIONAL MATCH (X)<-[:RSVP {reply:false}]-(no)
WITH LENGTH(COLLECT(DISTINCT yes)) AS yeses, LENGTH(COLLECT(DISTINCT no)) AS nos
RETURN CASE
WHEN yeses = 0 AND nos = 0 THEN "No Matches"
WHEN nos = 0 THEN "Infinity"
ELSE TOFLOAT(yeses)/nos
END;
To get the ratio that you originally asked for (ratio of yeses to the total number of responses), the query would be:
MATCH (X {label:"party time"})
OPTIONAL MATCH (X)<-[:RSVP {reply:true}]-(yes)
OPTIONAL MATCH (X)<-[:RSVP {reply:false}]-(no)
WITH LENGTH(COLLECT(DISTINCT yes)) AS yeses, LENGTH(COLLECT(DISTINCT no)) AS nos
RETURN CASE
WHEN yeses = 0 AND nos = 0 THEN "No Matches"
ELSE TOFLOAT(yeses)/(yeses + nos)
END;
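The same guard can be applied on the client side if the two counts are fetched separately. Below is a minimal Go sketch of that idea; the function name acceptanceRatio and the error value are hypothetical, not part of any Neo4j driver. It also shows why the conversion to float matters: dividing two integers truncates, just as count(...)/count(...) does in the first answer.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoReplies plays the role of the "No Matches" branch in the Cypher CASE.
// The name is hypothetical.
var errNoReplies = errors.New("no RSVPs")

// acceptanceRatio returns yeses / (yeses + nos) as a float.
// It returns errNoReplies instead of dividing by zero.
func acceptanceRatio(yeses, nos int) (float64, error) {
	total := yeses + nos
	if total == 0 {
		return 0, errNoReplies
	}
	// Convert before dividing; integer division would truncate 2/3 to 0.
	return float64(yeses) / float64(total), nil
}

func main() {
	ratio, err := acceptanceRatio(2, 1) // two yeses, one no
	if err != nil {
		fmt.Println("No Matches")
		return
	}
	fmt.Printf("%.2f\n", ratio) // prints 0.67
}
```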

Related

apoc.cypher.mapParallel2 in Neo4j is not giving expected results

I have the following query
MATCH (e) WHERE SIZE((e:Customer)<-[:Transaction]-()) <> 0
AND SIZE(()<-[:Transaction]-(e)) <> 0
MATCH path = (e)-[:Transaction*..10]-(e) return path
I am getting the expected results with the above query.
I am trying to parallelize this query with the following query
MATCH (e:Customer) WHERE SIZE((e)<-[:Transaction]-()) <> 0 AND SIZE(()<-[:Transaction]-(e)) <> 0 WITH
collect(e.ID) AS users CALL apoc.cypher.mapParallel2("match (e:Customer)-[:Transaction*..10]->(e)
where e.ID=_ return e.ID as ll",{},users,10) yield value return value.ll
This query doesn't return anything. Please help me with this.
Does this query, which is also more efficient, work for you?
MATCH (e:Customer)
WHERE (e)<-[:Transaction]-() AND ()<-[:Transaction]-(e)
WITH collect(e) AS users
CALL apoc.cypher.mapParallel2(
"match (_)-[:Transaction*..10]->(_) return _.ID as ll",
{},users,10) YIELD value
RETURN value.ll

Compare data of two nodes of different labels in Neo4j with some property

Match(csav:CSAVHierarchy) with csav
Match(cx:CXCustomerHierarchy) with cx
Optional Match(csav)-[:CSAVCustomerHasChild]->(csa:CSAVHierarchy) where csa._type='CXCustomer' OR csa._type='CXCustomerBU'
Optional Match(cx)-[:CXCustomerHasChild]->(cxc:CXCustomerHierarchy) where cxc._type='CXCustomer' OR cxc._type='CXCustomerBU'
return
CASE
WHEN csa.ssid = cxc.ssid and csa.elementLabel = cxc.elementLabel
THEN "yes"
ELSE "No" END As result
This query produces a cartesian product, and I want to carry both nodes' data forward for further use.
What am I missing?
You can use the Apoc plugin (see https://neo4j-contrib.github.io/neo4j-apoc-procedures):
Match(n:Person{ssid:"1234"}) with collect(n) as nodes CALL apoc.refactor.mergeNodes(nodes) YIELD node RETURN node
This query may do what you want. It returns the unique cxc and csa pairs that pass all your tests.
MATCH (csa:CSAVHierarchy)
WHERE
(:CSAVHierarchy)-[:CSAVCustomerHasChild]->(csa) AND
(csa._type='CXCustomer' OR csa._type='CXCustomerBU')
MATCH (cxc:CXCustomerHierarchy)
WHERE
(:CXCustomerHierarchy)-[:CXCustomerHasChild]->(cxc) AND
csa.ssid = cxc.ssid AND
csa.elementLabel = cxc.elementLabel AND
(cxc._type='CXCustomer' OR cxc._type='CXCustomerBU')
RETURN cxc, csa
For better performance, you should also create indexes on :CSAVHierarchy(_type) and :CXCustomerHierarchy(_type).
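One thing to watch in queries like these is operator precedence: Cypher, like Go, binds AND more tightly than OR, so a _type test of the form "x AND a OR b" needs parentheses around the OR part to mean "x AND (a OR b)". A small Go sketch of the difference follows; the function and parameter names are hypothetical and only stand in for the predicates.

```go
package main

import "fmt"

// unparenthesized mirrors `hasParent AND typeA OR typeB`,
// which is evaluated as (hasParent AND typeA) OR typeB.
func unparenthesized(hasParent, typeA, typeB bool) bool {
	return hasParent && typeA || typeB
}

// parenthesized mirrors `hasParent AND (typeA OR typeB)`.
func parenthesized(hasParent, typeA, typeB bool) bool {
	return hasParent && (typeA || typeB)
}

func main() {
	// A node with no parent whose _type matches the second alternative.
	fmt.Println(unparenthesized(false, false, true)) // true: the parent check is bypassed
	fmt.Println(parenthesized(false, false, true))   // false: the parent check still applies
}
```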
This is the solution I came up with:
MATCH(cxc:CXCustomerHierarchy)-[:_properties]->(auditnode)-->(spoke)
where cxc._type='CXCustomer' OR cxc._type='CXCustomerBU' AND spoke.start_date <= 1554272198875 <= spoke.end_date AND spoke.status = "Confirmed"
with cxc
OPTIONAL MATCH (cxc)<-[r:CXCustomerHasChild]-(parent) with cxc
MATCH(csav:CSAVHierarchy)-[:_properties]->(auditnode)-->(spoke) with cxc,csav
where csav._type='CXCustomer' OR csav._type='CXCustomerBU' AND spoke.start_date <= 1554272198875 <= spoke.end_date AND spoke.status = "Confirmed"
OPTIONAL MATCH (csav)<-[r:CSAVCustomerHasChild]-(parent) with csav,cxc
return
CASE
WHEN csav.sourceSystemId <> cxc.sourceSystemId OR csav.elementLabel <> cxc.elementLabel
THEN csav.elementLabel
ELSE "SIMILAR DATA " END As result

How To Generate k-ary trees in Neo4j quickly?

I'm generating perfect k-ary trees in Neo4j, but my queries for doing so don't seem very efficient, and I was wondering if I could improve on them in any way. My Go code below shows all three queries I'm running to generate the trees; k is the number of children per node and h is the tree height:
func createPerfectKaryTreeInNeo(k, h int, execNeo func(string) error) error {
	lastNode := ((iPow(k, (h + 1)) - 1) / (k - 1)) - 1
	err := execNeo(fmt.Sprintf("FOREACH(i IN RANGE(0, %d, 1) | CREATE (:NODE {id:i, value:i}))", lastNode))
	if err != nil {
		return err
	}
	err = execNeo(fmt.Sprintf("MATCH (a:NODE), (b:NODE) WHERE b.id = a.id * %d + 1 CREATE (a)-[:FIRST_CHILD]->(b)", k))
	if err != nil {
		return err
	}
	err = execNeo(fmt.Sprintf("MATCH (a:NODE), (b:NODE) WHERE b.id = a.id + 1 AND a.id %% %d <> 0 CREATE (a)-[:NEXT_SIBLING]->(b)", k))
	if err != nil {
		return err
	}
	return nil
}
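The helper iPow is called above but not shown. A minimal sketch, assuming it is plain integer exponentiation, together with the lastNode formula pulled out into its own (hypothetically named) function so it can be checked for small trees:

```go
package main

import "fmt"

// iPow is a stand-in for the helper the question calls but does not show.
// Assumption: it computes base^exp for non-negative exp using integers.
func iPow(base, exp int) int {
	result := 1
	for i := 0; i < exp; i++ {
		result *= base
	}
	return result
}

// lastNodeID returns the highest id in a perfect k-ary tree of height h
// whose ids start at 0. It uses the geometric sum
// 1 + k + ... + k^h = (k^(h+1) - 1) / (k - 1), so it requires k >= 2.
func lastNodeID(k, h int) int {
	return (iPow(k, h+1)-1)/(k-1) - 1
}

func main() {
	// k=3, h=2: 1 + 3 + 9 = 13 nodes, so the ids run from 0 to 12.
	fmt.Println(lastNodeID(3, 2))
}
```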
I think this is slow for h > 9 because of the last two queries, which MATCH on two unconnected nodes. When I run this in the Neo4j web client, it warns:
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing. While
occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
(identifier is: (b))
Is there a way I can reformulate these queries to be more efficient?
EDIT:
The code is here if you wish to run it: https://github.com/robsix/data_model_perf_test
Graphs are designed to quickly identify a single point and then traverse from there. Your query structure (write all the nodes, then sort them and add relationships) does pretty much the opposite, which is why you're getting all those warnings. Unfortunately, to provide variable children per node, you will need to be able to query the id property quickly, so make sure that you have an index on :NODE(id) (labels are case-sensitive, so it must match the NODE label used in the queries) and then try a single big query like this:
WITH 3 AS k, 2 AS h
WITH k, REDUCE(s = toFloat(0), x IN RANGE(1, h-1)|s + k^x) AS max_parent_id
UNWIND RANGE(0, toInt(max_parent_id)) AS parent_id
WITH k, parent_id, k*parent_id+1 AS first_child_id
MERGE (parent:NODE {id:parent_id, value:parent_id})
MERGE (child:NODE {id: first_child_id, value:first_child_id})
MERGE (parent) - [:FIRST_CHILD] -> (child)
WITH k, first_child_id
UNWIND RANGE(first_child_id + 1, first_child_id + k - 1) AS next_child_id
MERGE (last_child:NODE {id:next_child_id -1, value:next_child_id -1})
MERGE (next_child:NODE {id:next_child_id, value:next_child_id})
MERGE (last_child) - [:NEXT_SIBLING] -> (next_child)
This will run through all the possible parent ids, and for each one, will MERGE (match or create) a node with the right ID. It will then MERGE the first child node, whose ID you can already calculate, along with the FIRST_CHILD relationship. This will avoid your cartesian problem. The query will then go through the ids of each remaining sibling of that first child, MERGE the previous sibling (which already exists at that point, so this acts as a match), and MERGE the next sibling along with the NEXT_SIBLING relationship.
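To check the id arithmetic without a database, here is a minimal Go sketch that lists the relationships the query is expected to produce. The edge type and the function name karyEdges are hypothetical; only the formulas (first child of p is k*p+1, highest parent id is k + k^2 + ... + k^(h-1)) come from the query.

```go
package main

import "fmt"

// edge is a hypothetical in-memory stand-in for a relationship.
type edge struct {
	kind     string
	from, to int
}

// karyEdges lists the FIRST_CHILD and NEXT_SIBLING relationships of a
// perfect k-ary tree of height h with ids assigned breadth-first from 0.
func karyEdges(k, h int) []edge {
	// Highest id that still has children: k + k^2 + ... + k^(h-1),
	// the same value the REDUCE in the query computes.
	maxParent, pow := 0, 1
	for x := 1; x < h; x++ {
		pow *= k
		maxParent += pow
	}
	var edges []edge
	for p := 0; p <= maxParent; p++ {
		first := k*p + 1
		edges = append(edges, edge{"FIRST_CHILD", p, first})
		// Chain the remaining k-1 children to their predecessor.
		for c := first + 1; c <= first+k-1; c++ {
			edges = append(edges, edge{"NEXT_SIBLING", c - 1, c})
		}
	}
	return edges
}

func main() {
	for _, e := range karyEdges(3, 2) {
		fmt.Printf("(%d)-[:%s]->(%d)\n", e.from, e.kind, e.to)
	}
}
```

For k=3 and h=2 this yields 12 relationships over 13 nodes, one incoming relationship for every node except the root.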
UPDATE: I am so sorry, I totally overlooked the node visualization when testing it. I've updated the query since, to solve an index error and account for some reordering that I didn't know Cypher did. You learn something every day! But yeah, what's up there now generates the right graph.
The best I can come up with is still to use three queries, but they are used in an interesting way to create the k-ary tree without making Neo4j do too many searches:
func createPerfectKaryTreeInNeo(k, h int, execNeo func(string) error) error {
	lastNode := ((iPow(k, (h+1)) - 1) / (k - 1)) - 1
	if lastNode % 2 != 0 {
		err := execNeo(fmt.Sprintf("UNWIND RANGE(0, %d, 2) AS id CREATE (a:NODE {id:id, value: id})-[:NEXT_SIBLING]->(b:NODE {id: id+1, value: id+1}) WITH a, b MATCH (c:NODE {id: b.id+1}) CREATE (b)-[:NEXT_SIBLING]->(c)", lastNode - 1))
		if err != nil {
			return err
		}
	} else {
		err := execNeo(fmt.Sprintf("UNWIND RANGE(1, %d, 2) AS id CREATE (a:NODE {id:id, value: id})-[:NEXT_SIBLING]->(b:NODE {id: id+1, value: id+1}) WITH a, b MATCH (c:NODE {id: b.id+1}) CREATE (b)-[:NEXT_SIBLING]->(c)", lastNode))
		if err != nil {
			return err
		}
		err = execNeo("MATCH (a:NODE {id:1}) CREATE (:NODE {id:0, value:0})-[:NEXT_SIBLING]->(a)")
		// This error was previously assigned but never checked.
		if err != nil {
			return err
		}
	}
	lastParentNode := (lastNode - 1) / k
	err := execNeo(fmt.Sprintf("UNWIND RANGE(0, %d, 1) AS id MATCH shortestPath((a:NODE {id:id})-[:NEXT_SIBLING *]->(b:NODE {id:id*%d+1})) CREATE (a)-[:FIRST_CHILD]->(b)", lastParentNode, k))
	if err != nil {
		return err
	}
	err = execNeo(fmt.Sprintf("MATCH (a:NODE)-[r:NEXT_SIBLING]->(b:NODE) WHERE a.id %% %d = 0 DELETE r", k))
	if err != nil {
		return err
	}
	return nil
}
I should note that this algorithm is specifically for perfect k-ary trees with node id's allocated in breadth first order, the way it works is:
1) generate all the nodes in pairs and assign them all in order as being NEXT_SIBLINGS of one another i.e. 0->1->2->3->4 so you end up with a straight graph.
2) loop through all the ids small enough to have children and match using the shortestPath function in the hope that neo4j is smart enough to work out that given the current shape of the graph as soon as it finds a match, that is the shortest possible path and so return early without continuing to search further.
3) the last query then grabs adjacent nodes that should not be considered NEXT_SIBLINGS and deletes the relationship leaving behind a perfect k-ary tree with depth h.
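Steps 1 and 3 above can be sanity-checked in memory. The following is a minimal Go sketch under that reading of the algorithm; the function name remainingSiblingSources is hypothetical. It builds the straight chain 0->1->...->lastNode and drops every NEXT_SIBLING edge whose source id is divisible by k, which is the condition the DELETE query uses.

```go
package main

import "fmt"

// remainingSiblingSources simulates the chain from step 1 and the cleanup
// from step 3. It returns the source id of every NEXT_SIBLING edge
// (id -> id+1) that survives the `a.id % k = 0` deletion.
func remainingSiblingSources(k, lastNode int) []int {
	var kept []int
	for id := 0; id < lastNode; id++ {
		// The root (id 0) and the last child of each parent (id k*p+k)
		// are divisible by k, so their outgoing chain edge is removed.
		if id%k != 0 {
			kept = append(kept, id)
		}
	}
	return kept
}

func main() {
	// k=3, h=2 gives lastNode = 12.
	fmt.Println(remainingSiblingSources(3, 12))
}
```

With k=3 and lastNode=12, eight sibling edges remain, two for each of the four parents.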
The changes have sped up the data creation by at least an order of magnitude.
UPDATE:
The accepted answer above is correct; this is just the Go code that matches it:
func createPerfectKaryTreeInNeo(k, h int, execNeo func(string) error) error {
	return execNeo(fmt.Sprintf(`
		WITH %d AS k, %d AS h
		WITH k AS k, REDUCE(s = toFloat(0), x IN RANGE(1, h-1)|s + k^x) AS max_parent_id
		UNWIND RANGE(0, toInt(max_parent_id)) AS parent_id
		WITH k AS k, parent_id, k*parent_id+1 AS first_child_id
		MERGE (parent:NODE {id:parent_id, value:parent_id})
		MERGE (child:NODE {id: first_child_id, value:first_child_id})
		MERGE (parent) - [:FIRST_CHILD] -> (child)
		WITH k AS k, first_child_id
		UNWIND RANGE(first_child_id + 1, first_child_id + k - 1) AS next_child_id
		MERGE (last_child:NODE {id:next_child_id -1, value:next_child_id -1})
		MERGE (next_child:NODE {id:next_child_id, value:next_child_id})
		MERGE (last_child) - [:NEXT_SIBLING] -> (next_child)
	`, k, h))
}
It is several orders of magnitude faster than the improvements I originally described in this answer.

neo4j cypher left padding String in Where Clause

I have a String property in my nodes where the length of the String isn't fixed.
Now I must find the right node by this property, but I get a fixed-length value from another system. For example, my node has the value '0123' but I get the value '000123' to search for.
I need a function like left padding with zeros, used in the WHERE clause, like this:
MATCH (a:LABEL) where leftPad(a.property, 6, '0') = '000123' return a
LIMIT 1
Is something like this possible with good performance?
You could do this:
MATCH (a:LABEL)
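// Prepend (6 - SIZE(a.property)) zeros; assumes the stored value is at most 6 characters long.
WHERE SUBSTRING('000000', 0, 6 - SIZE(a.property)) + a.property = '000123'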
RETURN a
LIMIT 1;
Or, if all the characters are numeric, then you could do this:
MATCH (a:LABEL)
WHERE TOINT(a.property) = TOINT('000123')
RETURN a
LIMIT 1;
However, it would be even better if you could just store the property value as an integer in the first place, and also compare it to an integer, which would be the fastest. This might be very easy to do, depending on your situation. Write the integer literal without leading zeros, since some Cypher versions treat a leading 0 as an octal prefix.
MATCH (a:LABEL)
WHERE a.property = 123
RETURN a
LIMIT 1;
Try it with reduce:
MATCH (a:LABEL)
WHERE REDUCE(lp='', n in RANGE(0,5-size(a.property)) | lp+'0') + a.property = '000123'
RETURN a
or try it with regular expression:
MATCH (a:LABEL)
WHERE a.property =~ '(0){0,3}123'
RETURN a
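To make the padding semantics that these answers aim for concrete, here is a minimal Go sketch of the leftPad function from the question. The name comes from the question itself; this is an illustration of the intended behavior, not a Cypher built-in, and it assumes single-byte (ASCII) values such as digit strings.

```go
package main

import (
	"fmt"
	"strings"
)

// leftPad prepends pad to s until the result is width bytes long.
// Values that are already width bytes or longer are returned unchanged.
func leftPad(s string, width int, pad byte) string {
	if len(s) >= width {
		return s
	}
	return strings.Repeat(string(pad), width-len(s)) + s
}

func main() {
	// The stored value '0123' should match the incoming value '000123'.
	fmt.Println(leftPad("0123", 6, '0') == "000123")
}
```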

Neo4j: Conditional return/IF clause/String manipulation

This is in continuation of Neo4j: Listing node labels
I am constructing a dynamic MATCH statement to return the hierarchy structure & use the output as a Neo4j JDBC input to query the data from a java method:
MATCH p=(:Service)<-[*]-(:Anomaly)
WITH head(nodes(p)) AS Service, p, count(p) AS cnt
RETURN DISTINCT Service.company_id, Service.company_site_id,
"MATCH srvhier=(" +
reduce(labels = "", n IN nodes(p) | labels + labels(n)[0] +
"<-[:BELONGS_TO]-") + ") WHERE Service.company_id = {1} AND
Service.company_site_id = {2} AND Anomaly.name={3} RETURN " +
reduce(labels = "", n IN nodes(p) | labels + labels(n)[0] + ".name,");
The output is as follows:
MATCH srvhier=(Service<-[:BELONGS_TO]-Category<-[:BELONGS_TO]-SubService<-
[:BELONGS_TO]-Assets<-[:BELONGS_TO]-Anomaly<-[:BELONGS_TO]-) WHERE
Service.company_id = {1} and Service.company_site_id = {21} and
Anomaly.name={3} RETURN Service.name, Category.name, SubService.name,
Assets.name, Anomaly.name,
The problem I am seeing:
The "BELONGS_TO" gets appended to my last node
Line 2: Assets<-[:BELONGS_TO]-Anomaly**<-[:BELONGS_TO]-**
Are there string functions (I have looked at Substring..) that can be used to remove it? Or can I use a CASE statement with condition n=cnt to append "BELONGS_TO"?
The same problem persists with my last line:
Line 5: Assets.name,Anomaly.name**,** - the additional "," that I need to eliminate.
Thanks.
I think you need to introduce a case statement into the reduce clause something like this snippet below. If the node isn't the last element of the collection then append the "<-[:BELONGS_TO]-" relationship. If it is the last element then don't append it.
...
reduce(labels = "", n IN nodes(p) |
CASE
WHEN n <> nodes(p)[length(nodes(p))-1] THEN
labels + labels(n)[0] + "<-[:BELONGS_TO]-"
ELSE
labels + labels(n)[0]
END)
...
Cypher has a substring function that works basically like you'd expect. An example: here's how you'd return everything but the last three characters of a string:
return substring("hello", 0, length("hello")-3);
(That returns "he")
So you could use substring to trim the last separator off of your query that you don't want.
But I don't understand why you're building your query in such a complex way; you're using cypher to write cypher (which is OK) but (and I don't understand your data model 100%) it seems to me like there's probably an easier way to write this query.
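If the query string is ultimately consumed from Java or another client anyway, one alternative is to return only the label list from Cypher and assemble the text on the client, where joining with a separator avoids the trailing "<-[:BELONGS_TO]-" and the trailing comma altogether. A minimal Go sketch of that idea follows; the function name buildQuery is hypothetical, and the output deliberately mirrors the shape of the string in the question, with the WHERE clause omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// buildQuery joins the labels with the relationship text and the
// ".name" columns with commas, so no separator is left dangling.
func buildQuery(labels []string) string {
	names := make([]string, len(labels))
	for i, l := range labels {
		names[i] = l + ".name"
	}
	return "MATCH srvhier=(" + strings.Join(labels, "<-[:BELONGS_TO]-") +
		") RETURN " + strings.Join(names, ", ")
}

func main() {
	fmt.Println(buildQuery([]string{"Service", "Category", "SubService", "Assets", "Anomaly"}))
}
```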
