Post-Calculations on Aggregations - neo4j

I'm new to Neo4j and I feel stuck on simple operations that I would solve in regular SQL using subqueries.
How can I subtract the two resulting rows? I have grouped the results and I would like to return the difference between them:
MATCH (seguidores:RelevantTwitterUser {location:"Madrid"})-[:FOLLOWS]->(seguidos:RelevantTwitterUser {location:"Barcelona"})
WITH COLLECT({origen:seguidores.location, user:seguidores.userId}) AS ROWS
MATCH (seguidores:RelevantTwitterUser {location:"Barcelona"})-[:FOLLOWS]->(seguidos:RelevantTwitterUser {location:"Madrid"})
WITH ROWS + COLLECT({origen:seguidores.location, user:seguidores.userId}) AS allRows
UNWIND allRows AS ROW
RETURN ROW.origen, COUNT(ROW.user)
with the output:

Answer using aggregating functions
WITH "Madrid" AS loc1, "Barcelona" AS loc2
MATCH (:RelevantTwitterUser{location:loc1})-[:FOLLOWS]->(:RelevantTwitterUser{location:loc2})
WITH loc1, loc2, COUNT(*) AS count1
MATCH (:RelevantTwitterUser{location:loc2})-[:FOLLOWS]->(:RelevantTwitterUser{location:loc1})
WITH loc1, loc2, count1, COUNT(*) AS count2
RETURN loc1, count1, loc2, count2, count1 - count2 AS diff
You should read the documentation on aggregating functions (like COUNT) if you want to understand how this query works, and to avoid getting the wrong counts if you need to modify this query. It is especially important to understand how grouping keys (e.g., loc1, loc2, and count1 in the last WITH clause) affect the behavior of aggregating functions.
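To make the two counts concrete, here is a minimal Python sketch (not Cypher) of what the query computes. The users and FOLLOWS pairs are made-up sample data, and count_follows is a hypothetical helper; it only models each COUNT(*) as a count of matched relationships in one direction.

```python
# Made-up sample data: user id -> location, plus directed FOLLOWS pairs.
location = {"u1": "Madrid", "u2": "Madrid", "u3": "Barcelona"}
follows = [("u1", "u3"), ("u2", "u3"), ("u3", "u1")]

def count_follows(src_loc, dst_loc):
    """Count FOLLOWS pairs going from src_loc to dst_loc, like COUNT(*) over one MATCH."""
    return sum(
        1
        for follower, followed in follows
        if location[follower] == src_loc and location[followed] == dst_loc
    )

count1 = count_follows("Madrid", "Barcelona")   # Madrid users following Barcelona users
count2 = count_follows("Barcelona", "Madrid")   # Barcelona users following Madrid users
diff = count1 - count2
print(count1, count2, diff)
```

With this sample data, two relationships go from Madrid to Barcelona and one goes the other way, so the difference is 1.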
Answer using SIZE() on pattern expressions
WITH "Madrid" AS loc1, "Barcelona" AS loc2
WITH loc1, loc2,
SIZE((:RelevantTwitterUser{location:loc1})-[:FOLLOWS]->(:RelevantTwitterUser{location:loc2})) AS count1,
SIZE((:RelevantTwitterUser{location:loc2})-[:FOLLOWS]->(:RelevantTwitterUser{location:loc1})) AS count2
RETURN loc1, count1, loc2, count2, count1 - count2 AS diff

Related

Neo4j: Why difference in result?

I have the Cypher query:
match(p:Product {StyleNumber : "Z94882A", Color: "Black"})--(stock:Stock {Retailer: "11"})
with sum(stock.Stockcount) as onstock, p
optional match(p)-->(s:Sale {Retailer : "11"})
where s.Date = 20170801
return p.Color,p.Size, onstock as stock, sum(s.Quantity) as sold
This gives correctly:
Color,Size,Stock,Sold
Black,M,3,0
Black,S,3,1
Black,L,1,1
Black,XL,5,2
But if I leave out the Size property in the return statement and just return:
return p.Color, onstock as stock, sum(s.Quantity) as sold
This only returns 3 rows (Size "M" is missing):
Black,3,1
Black,1,1
Black,5,2
I can't figure out why there is a difference in the result?
Because you are using the sum() aggregation function.
Cypher doesn't have a GROUP BY clause (like traditional SQL databases), but when you use an aggregation function all non-aggregated fields are implicitly used as grouping fields.
So when you remove p.Size from the RETURN, the first row is grouped with the second row because all the implicit grouping values are equal (p.Color = 'Black' and onstock = 3). The values of the Sold column for both rows are then summed (0 + 1 = 1), producing the row:
Black,3,1
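The implicit grouping can be reproduced outside the database. The following is a minimal Python sketch, not Cypher: the rows are copied from the result table in the question, and aggregate is a hypothetical helper that models "non-aggregated columns become the grouping key". It does not describe how Neo4j executes the query internally.

```python
from collections import defaultdict

# (color, size, onstock, sold) rows, copied from the result table in the question.
rows = [
    ("Black", "M", 3, 0),
    ("Black", "S", 3, 1),
    ("Black", "L", 1, 1),
    ("Black", "XL", 5, 2),
]

def aggregate(rows, key):
    """Sum the sold column per grouping key, keeping first-seen key order."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return [group + (sold,) for group, sold in totals.items()]

# RETURN p.Color, p.Size, onstock, sum(...): the key is (color, size, onstock).
with_size = aggregate(rows, key=lambda r: (r[0], r[1], r[2]))
# RETURN p.Color, onstock, sum(...): the key shrinks to (color, onstock).
without_size = aggregate(rows, key=lambda r: (r[0], r[2]))

print(with_size)
print(without_size)
```

Dropping the size from the key makes the M and S rows share the key ("Black", 3), so they collapse into a single row whose sold value is 0 + 1 = 1.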

neo4j cypher left padding String in Where Clause

I have a String property on my nodes whose length isn't fixed.
Now I must find the right node by this property, but I get a fixed-length value from another system. For example, my node has the value '0123' but I get '000123' to search with.
I need a function like left padding with zeros, used in the WHERE clause like this:
MATCH (a:LABEL) where leftPad(a.property, 6, '0') = '000123' return a
LIMIT 1
Is something like this possible with good performance?
You could do this:
MATCH (a:LABEL)
WHERE SUBSTRING('000000', 0, 6 - SIZE(a.property)) + a.property = '000123' // pads to 6 characters; assumes SIZE(a.property) <= 6
RETURN a
LIMIT 1;
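The padding arithmetic is easy to check outside the database. This is a minimal Python sketch of the idea, with left_pad as a hypothetical helper named after the function the question asks for: the number of fill characters is the target width minus the length of the value, never the length of the value itself.

```python
def left_pad(value: str, width: int, fill: str = "0") -> str:
    """Pad value on the left with fill up to width; longer values are returned unchanged."""
    missing = max(width - len(value), 0)
    return fill * missing + value

print(left_pad("0123", 6))
print(left_pad("123456", 6))
```

For '0123' and a width of 6, two zeros are added, which gives '000123'. A value that is already 6 characters long gets no padding.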
Or, if all the characters are numeric, then you could do this:
MATCH (a:LABEL)
WHERE TOINT(a.property) = TOINT('000123')
RETURN a
LIMIT 1;
However, it would be even better if you could just store the property value as an integer in the first place, and also compare it to an integer, which would be the fastest. This might be very easy to do, depending on your situation.
MATCH (a:LABEL)
WHERE a.property = 123
RETURN a
LIMIT 1;
Try it with reduce:
MATCH (a:LABEL)
WHERE REDUCE(lp = '', n IN RANGE(0, 5 - SIZE(a.property)) | lp + '0') + a.property = '000123'
RETURN a
or try it with regular expression:
MATCH (a:LABEL)
WHERE a.property =~ '(0){0,3}123'
RETURN a

How to find the oldest sibling in every family

If I have a lot of families represented like:
(parent:Person)<-[:CHILD_OF]-(child:Person {age:19})
then what would a query look like that finds the oldest child of each family?
I have the following suggestion, but this only returns 1 node:
match (parent:Person)<--(child:Person) return child order by child.age desc limit 1
This can probably be optimized, but here's a quick go at it:
match (c:Person)-[:CHILD_OF]->(p)
with p, max(c.age) as maxAge, collect(c) as children
return p,filter (x in children where x.age=maxAge)
match (parent:Person)<--(child:Person) with parent, max(child.age) as maxAge
Match (parent)<--(child:Person) where child.age = maxAge
return *
This may return several children if they share the same maximum age. If you want to return only one, add one more criterion, such as
max(id(child))
which gives:
Match (parent:Person)<--(child:Person) with parent, max(child.age) as maxAge
Match (parent)<--(child:Person) where child.age = maxAge with parent, max(id(child)) as maxId
Match (parent)<--(child:Person) where id(child) = maxId return *
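The "maximum per group, then filter" pattern used by these answers can be sketched in plain Python. The triples below are hypothetical sample data standing in for CHILD_OF relationships, and oldest_children is an illustrative helper, not part of any Neo4j API.

```python
from collections import defaultdict

# Hypothetical sample data: (parent, child, age) triples, one per CHILD_OF relationship.
child_of = [
    ("Ann", "Bob", 19),
    ("Ann", "Cid", 23),
    ("Dan", "Eve", 12),
    ("Dan", "Fay", 12),
]

def oldest_children(triples):
    """Return, per parent, every child whose age equals that family's maximum age."""
    by_parent = defaultdict(list)
    for parent, child, age in triples:
        by_parent[parent].append((child, age))
    result = {}
    for parent, children in by_parent.items():
        max_age = max(age for _, age in children)  # like max(c.age) grouped by parent
        result[parent] = [child for child, age in children if age == max_age]
    return result

print(oldest_children(child_of))
```

In this sample, Dan's family has two children tied at the maximum age, so both are returned. That is the case the extra tie-breaking criterion is meant to handle.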

Neo4j - minimum value from array properties

How do I get the minimum value of an array property with Cypher?
MATCH (n)-[r]->(m) RETURN n,m,min(r.timestamps)
The query above does not work.
r has an array property holding timestamps, r.timestamps.
How do I get the lowest value in timestamps?
You can use unwind:
MATCH (n)-[r]->(m)
UNWIND r.timestamps AS timestamp
RETURN n, m, min(timestamp)
I found an answer like this, but it looks ugly
MATCH
(h1)-[r]-(h2)
RETURN h1, h2,
reduce(minTimestamp = 999999999999999999, t IN r.timestamps | CASE WHEN minTimestamp < t THEN minTimestamp ELSE t END)
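The two answers are not quite equivalent, which a small Python model makes visible. The relationship data here is made up, and both helpers are hypothetical. The UNWIND version groups by the pair of nodes, so several relationships between the same two nodes share one minimum. The reduce version yields one minimum per relationship.

```python
# Made-up sample data: (start node, end node, timestamps array) per relationship.
rels = [
    ("a", "b", [30, 10, 20]),
    ("a", "b", [5, 40]),
    ("b", "c", [7]),
]

def min_per_pair(rels):
    """Model of UNWIND + min(): flatten every array, then take the minimum per (n, m)."""
    best = {}
    for start, end, timestamps in rels:
        for t in timestamps:  # each element becomes its own row, like UNWIND
            key = (start, end)
            if key not in best or t < best[key]:
                best[key] = t
    return best

def min_per_relationship(rels):
    """Model of the reduce() version: one minimum for each relationship's own array."""
    return [(start, end, min(timestamps)) for start, end, timestamps in rels]

print(min_per_pair(rels))
print(min_per_relationship(rels))
```

If every pair of nodes is connected by at most one relationship, both approaches give the same numbers.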

neo4j cypher - how to find all nodes that have a relationship to list of nodes

I have nodes named "options". "Users" choose these options. I need a Cypher query that works like this:
retrieve the users who have chosen all the options given in a list.
MATCH (option:Option)<-[:CHOSE]-(user:User) WHERE option.Key IN ['1','2','3'] RETURN user
This query gives me the users who chose option(1), option(2) and option(3), but it also gives me the user who only chose option(2).
What I need is only the users who chose all of them: option(1), option(2) and option(3).
For an all-Cypher solution (I don't know if it's better than Chris' answer; you'll have to test and compare), you can collect the option.Key values for each user and filter out the users who don't have an option.Key for every value in your list:
MATCH (u:User)-[:CHOSE]->(opt:Option)
WITH u, collect(opt.Key) as optKeys
WHERE ALL (v IN {values} WHERE v IN optKeys)
RETURN u
Or match all the options whose keys are in your list and the users that chose them, collect those options per user, and compare the size of the option collection to the size of your list. If your list has no duplicates, a user whose option collection is the same size has chosen all the options.
MATCH (u:User)-[:CHOSE]->(opt:Option)
WHERE opt.Key IN {values}
WITH u, collect(opt) as opts
WHERE length(opts) = length({values}) // assuming {values} don't have duplicates
RETURN u
Either should limit results to users connected with all the options whose key values are specified in {values} and you can vary the length of the collection parameter without changing the query.
If the number of options is limited, you could do:
MATCH
(user:User)-[:CHOSE]->(option1:Option),
(user)-[:CHOSE]->(option2:Option),
(user)-[:CHOSE]->(option3:Option)
WHERE
option1.Key = '1'
AND option2.Key = '2'
AND option3.Key = '3'
RETURN
user.Id
Which will only return the user with all 3 options.
It's a bit rubbishy as obviously you end up with 3 lines where you have 1, but I don't know how to do what you want using the IN keyword.
If you're coding against it, it's pretty simple to generate the WHERE and MATCH clause, but still - not ideal. :(
EDIT - Example
Turns out there is some string manipulation going on here (!), but you can always cache bits. Importantly - it's using Params which would allow neo4j to cache the queries and supply faster responses with each call.
public static IEnumerable<User> GetUser(IGraphClient gc)
{
    var query = GenerateCypher(gc, new[] {"1", "2", "3"});
    return query.Return(user => user.As<User>()).Results;
}
public static ICypherFluentQuery GenerateCypher(IGraphClient gc, string[] options)
{
    ICypherFluentQuery query = new CypherFluentQuery(gc);
    for (int i = 0; i < options.Length; i++)
        query = query.Match(string.Format("(user:User)-[:CHOSE]->(option{0}:Option)", i));
    for (int i = 0; i < options.Length; i++)
    {
        string paramName = string.Format("option{0}param", i);
        string whereString = string.Format("option{0}.Key = {{{1}}}", i, paramName);
        query = i == 0 ? query.Where(whereString) : query.AndWhere(whereString);
        query = query.WithParam(paramName, options[i]);
    }
    return query;
}
MATCH (user:User)-[:CHOSE]->(option:Option)
WHERE option.key IN ['1', '2', '3']
WITH user, COUNT(*) AS num_options_chosen
WHERE num_options_chosen = LENGTH(['1', '2', '3'])
RETURN user.name
This will only return users that have relationships with all the Options with the given keys in the array. This assumes there are not multiple [:CHOSE] relationships between users and options. If it is possible for a user to have multiple [:CHOSE] relationships with a single option, you'll have to add some conditionals as necessary.
I tested the above query with the below dataset:
CREATE (User1:User {name:'User 1'}),
(User2:User {name:'User 2'}),
(User3:User {name:'User 3'}),
(Option1:Option {key:'1'}),
(Option2:Option {key:'2'}),
(Option3:Option {key:'3'}),
(Option4:Option {key:'4'}),
(User1)-[:CHOSE]->(Option1),
(User1)-[:CHOSE]->(Option4),
(User2)-[:CHOSE]->(Option2),
(User2)-[:CHOSE]->(Option3),
(User3)-[:CHOSE]->(Option1),
(User3)-[:CHOSE]->(Option2),
(User3)-[:CHOSE]->(Option3),
(User3)-[:CHOSE]->(Option4)
And I get only 'User 3' as the output.
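The same check can be modeled in plain Python against that dataset. This is a sketch rather than Cypher: chose mirrors the CREATE statement above, and the two helper functions are hypothetical. It contrasts a set-containment test with the count-based test, and shows why the count-based test relies on there being no duplicate [:CHOSE] relationships.

```python
# Option keys chosen per user, mirroring the CREATE statement above.
chose = {
    "User 1": ["1", "4"],
    "User 2": ["2", "3"],
    "User 3": ["1", "2", "3", "4"],
}

def users_with_all(chose, wanted):
    """Set containment: every wanted key must appear among the user's choices."""
    wanted_set = set(wanted)
    return [user for user, keys in chose.items() if wanted_set <= set(keys)]

def users_by_count(chose, wanted):
    """Count matching relationships and compare with the list length, like COUNT(*)."""
    return [
        user
        for user, keys in chose.items()
        if sum(1 for key in keys if key in wanted) == len(wanted)
    ]

wanted = ["1", "2", "3"]
print(users_with_all(chose, wanted))
print(users_by_count(chose, wanted))

# Hypothetical extra user with a duplicate relationship to option '1' and none to '3'.
with_duplicate = dict(chose, **{"User 4": ["1", "1", "2"]})
print(users_with_all(with_duplicate, wanted))
print(users_by_count(with_duplicate, wanted))
```

Both functions agree on the original dataset. With the duplicate relationship added, the count-based version also returns the hypothetical 'User 4', while the containment version does not.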
For shorter lists, you can use path predicates in your WHERE clause:
MATCH (user:User)
WHERE (user)-[:CHOSE]->(:Option { Key: '1' })
AND (user)-[:CHOSE]->(:Option { Key: '2' })
AND (user)-[:CHOSE]->(:Option { Key: '3' })
RETURN user
Advantages:
Clear to read
Easy to generate for dynamic length lists
Disadvantages:
For each different list length, you will have a different query that has to be parsed and cached by Cypher. With too many dynamic queries, your cache hit rate drops sharply, query compilation work goes up, and query performance goes down.
