Cypher join information from different tables into a single one - neo4j


I'm new to Cypher and I'm struggling with this problem.
I have these two queries:
MATCH (u:UserNode)-[:PROMOTER_OF*1..]->(c:UserNode)
WHERE u.promoterActualRole IN ["GOLD","RUBY","SAPPHIRE","BRONZE","EMERALD", "DIAMOND"]
AND datetime(c.promoterStartActivity) >= datetime("2021-02-01T00:00:00Z")
AND datetime(c.promoterStartActivity)<= datetime("2021-05-31T23:59:59Z")
AND c.promoterEnabled = true
AND u.firstName="Gianvito"
WITH distinct u as user, count(c) as num_promoter
WHERE num_promoter >= 150
RETURN user.firstName as name, user.email as email, num_promoter
which returns a table like this:
name | email | num_promoter
Gianvito | gianvito#email.com | 1475
and
MATCH (u:UserNode)-[:PROMOTER_OF*1..]->(c:UserNode)
WHERE u.promoterActualRole IN ["GOLD","RUBY","SAPPHIRE","BRONZE","EMERALD", "DIAMOND"]
AND datetime(c.subscriptionDate) >= datetime("2021-02-01T00:00:00Z")
AND datetime(c.subscriptionDate)<= datetime("2021-05-31T23:59:59Z")
AND c.kycStatus = "OK"
AND u.firstName="Gianvito"
WITH distinct u as user, count(c) as num_swaggy
WHERE num_swaggy >= 1
RETURN user.firstName as name, user.email as email , num_swaggy
which returns:
name | email | num_swaggy
Gianvito | gianvito#email.com | 1820
I would like to merge these two results into a single table.
I tried a UNION, but that only gives me a single table with two separate rows, with the common information duplicated and null wherever a value is not present.
How can I obtain a table like this one?
name | email | num_promoter | num_swaggy
Gianvito | gianvito#email.com | 1475 | 1820

If you're using Neo4j 4.x or higher, you can UNION the results of the queries in a subquery, and outside of it perform a sum() to get the results into a single row per user:
CALL {
MATCH (u:UserNode)-[:PROMOTER_OF*1..]->(c:UserNode)
WHERE u.promoterActualRole IN ["GOLD","RUBY","SAPPHIRE","BRONZE","EMERALD", "DIAMOND"]
AND datetime(c.promoterStartActivity) >= datetime("2021-02-01T00:00:00Z")
AND datetime(c.promoterStartActivity)<= datetime("2021-05-31T23:59:59Z")
AND c.promoterEnabled = true
AND u.firstName="Gianvito"
WITH u as user, count(c) as num_promoter
WHERE num_promoter >= 150
RETURN user, num_promoter, 0 as num_swaggy
UNION
MATCH (u:UserNode)-[:PROMOTER_OF*1..]->(c:UserNode)
WHERE u.promoterActualRole IN ["GOLD","RUBY","SAPPHIRE","BRONZE","EMERALD", "DIAMOND"]
AND datetime(c.subscriptionDate) >= datetime("2021-02-01T00:00:00Z")
AND datetime(c.subscriptionDate)<= datetime("2021-05-31T23:59:59Z")
AND c.kycStatus = "OK"
AND u.firstName="Gianvito"
WITH u as user, count(c) as num_swaggy
WHERE num_swaggy >= 1
RETURN user, 0 as num_promoter, num_swaggy
}
WITH user, sum(num_promoter) as num_promoter, sum(num_swaggy) as num_swaggy
RETURN user.firstName as name, user.email as email , num_promoter, num_swaggy
Also you don't need to use DISTINCT when you're performing any aggregation, since the grouping key will become distinct automatically as a result of the aggregation.
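To see why padding with 0 and then summing collapses the two rows into one, here is a minimal Python sketch of the same idea. The row tuples and the merge_counts helper are hypothetical stand-ins for what the two UNION branches return; they are not part of Neo4j or its drivers.

```python
from collections import defaultdict

# Hypothetical rows as each UNION branch would return them:
# (user key, num_promoter, num_swaggy), with 0 padding the missing column.
promoter_rows = [("gianvito#email.com", 1475, 0)]
swaggy_rows = [("gianvito#email.com", 0, 1820)]


def merge_counts(*branches):
    """Sum both count columns per user, like WITH user, sum(...), sum(...)."""
    totals = defaultdict(lambda: [0, 0])
    for rows in branches:
        for user, num_promoter, num_swaggy in rows:
            totals[user][0] += num_promoter
            totals[user][1] += num_swaggy
    return {user: tuple(counts) for user, counts in totals.items()}


merged = merge_counts(promoter_rows, swaggy_rows)
print(merged)
```

A user who passes the threshold in only one branch still gets a row, with 0 left in the other column.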

Related

Cypher UNION - How to apply to collected lists

Consider the following use of UNION cypher command:
MATCH (user:User)-[]-(org:Organization)
WHERE org.size > 100
RETURN collect({
user.name,
user.age
}) AS userList
UNION
MATCH (user:User)-[]-(family:Family)
WHERE family.mood = "Happy"
RETURN collect({
user.name,
user.age
}) AS userList
The UNION does not work: this query returns users only from the first MATCH. I suspect it's because of the collect statements, but the project's design requires the data to be collected. Is there a way to create a union of the collections, or perhaps to collect after the union?

Your query will work once you 1) return a valid map (key: value pairs) and 2) wrap the UNION in a CALL subquery, so that you can collect after the union.
RETURN {
name: user.name,
age: user.age
} AS userList
See sample below:
CALL {MATCH (user:user{id:"some_id"})
RETURN {
id: user.id,
age: user.age
} AS userList
UNION
MATCH (user:user{id:"some_id2"})
RETURN {
id: user.id,
age: user.age
} AS userList
}
RETURN collect(userList) as userList
Result:
╒══════════════════════════════════════════════════════════╕
│"userList" │
╞══════════════════════════════════════════════════════════╡
│[{"id":"some_id","age":null},{"id":"some_id2","age":null}]│
└──────────────────────────────────────────────────────────┘
I am using neo4j version 4.4.3

You can use apoc.coll.union of the APOC library, to create a union of two lists, like this:
MATCH (user:User)-[]-(org:Organization)
WHERE org.size > 100
WITH collect({
name: user.name,
age: user.age
}) AS userList1
MATCH (user:User)-[]-(family:Family)
WHERE family.mood = "Happy"
WITH userList1, collect({
name: user.name,
age: user.age
}) AS userList2
RETURN apoc.coll.union(userList1, userList2) AS userList
The function apoc.coll.union will not include duplicates; if you want to keep duplicates, use apoc.coll.unionAll instead.
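The difference between the two functions can be modeled in a few lines of Python. This is only a sketch of the duplicate handling described above: the helper names and sample users are made up, and it says nothing about the order in which APOC returns elements.

```python
def union_all(first, second):
    """Keep every element from both lists, duplicates included."""
    return first + second


def union_distinct(first, second):
    """Keep each distinct element once (first-seen order, chosen for this sketch)."""
    result = []
    for item in first + second:
        if item not in result:
            result.append(item)
    return result


# Hypothetical collected maps; "Bob" appears in both lists.
user_list1 = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 41}]
user_list2 = [{"name": "Bob", "age": 41}, {"name": "Cid", "age": 25}]

print(len(union_all(user_list1, user_list2)))       # 4
print(len(union_distinct(user_list1, user_list2)))  # 3
```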

Neo4j - UNION of 3 different queries

I have a problem with one composed query, which has three parts.
Get direct friends
Get friends of friends
Get others - just fill up space to limit
So it should always return a limited number of users, ordered by direct friends, then friends of friends, then others. The first two parts are very fast, no problem there, but the last part is slow and gets slower as the database grows. There are indexes on Person.number and Person.createdAt.
Does anyone have an idea how to improve or rewrite this query to make it more performant?
MATCH (me:Person { number: $number })-[r:KNOWS]-(contact:Person { registered: "true" }) WHERE contact.number <> $number AND (r.state = "contact" OR r.state = "declined")
MATCH (contact)-[:HAS_AVATAR]-(avatar:Avatar { primary: true })
WITH contact, avatar
RETURN contact AS friend, avatar, contact.createdAt AS rank
ORDER BY contact.createdAt DESC
UNION
MATCH (me:Person { number: $number })-[:KNOWS]-(friend)-[:KNOWS { state: "accepted" }]-(friend_of_friend:Person { registered: "true" }) WHERE NOT friend.username = 'default' AND NOT (me)-[:KNOWS]-(friend_of_friend)
MATCH (friend_of_friend)-[:HAS_AVATAR]-(avatar:Avatar { primary: true })
OPTIONAL MATCH (friend_of_friend)-[rel:KNOWS]-(friend)
RETURN friend_of_friend AS friend, avatar, COUNT(rel) AS rank
ORDER BY rank DESC
UNION
MATCH (me:Person { number: $number })
MATCH (others:Person { registered: "true" }) WHERE others.number <> $number AND NOT (me)-[:KNOWS]-(others) AND NOT (me)-[:KNOWS]-()-[:KNOWS { state: "accepted" }]-(others:Person { registered: "true" })
MATCH (others)-[:HAS_AVATAR]->(avatar:Avatar { primary: true })
OPTIONAL MATCH (others)-[rel:KNOWS { state: "accepted" }]-()
WITH others, rel, avatar
RETURN others AS friend, avatar, COUNT(rel) AS rank
ORDER BY others.createdAt DESC
SKIP $skip
LIMIT $limit
Here are some profiles:
https://i.stack.imgur.com/LfNww.png
https://i.stack.imgur.com/0EO0r.png
The final solution was to break the whole query into three and call them separately. In our case, 99% of requests never reach the third query, and the first two are very fast. Even when the third stage is reached it is still fast, so the UNION may have been what slowed things down the most.
const contacts = await this.neo4j.readQuery(`...
if (contacts.records.length < limit){
const friendOfFriend = await this.neo4j.readQuery(`...
if (contacts.records.length + friendOfFriend.records.length < limit){
const others = await this.neo4j.readQuery(`...
merge all results
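The staged approach above can be sketched as follows in Python. The three stage functions are hypothetical stand-ins for the three Cypher queries (the real ones are elided in the original), and the sketch ignores SKIP-based paging and does not deduplicate across stages the way UNION would.

```python
from typing import Callable, List


def fetch_tiered(stages: List[Callable[[int], list]], limit: int) -> list:
    """Run each stage only while the page is not full yet."""
    results: list = []
    for stage in stages:
        remaining = limit - len(results)
        if remaining <= 0:
            break  # earlier stages already filled the page
        results.extend(stage(remaining)[:remaining])
    return results


calls = []  # records which stages actually ran


def contacts(n):
    calls.append("contacts")
    return ["c1", "c2"][:n]


def friends_of_friends(n):
    calls.append("fof")
    return ["f1", "f2", "f3"][:n]


def others(n):
    calls.append("others")
    return ["o1", "o2"][:n]


page = fetch_tiered([contacts, friends_of_friends, others], limit=4)
print(page)   # ['c1', 'c2', 'f1', 'f2']
print(calls)  # ['contacts', 'fof']
```

The expensive third stage never runs here because the first two already fill the page, which matches the behaviour described above.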

You're doing a lot of work in that third query before the limit. You may want to move the ordering and LIMIT up sooner.
It's also going to be more efficient to pre-match the friends (and friends of friends) in a single MATCH pattern; we can use *0..1 as an optional relationship to a potential next node.
And just a bit of style advice, I find it a good idea to reserve plurals for lists/collections and otherwise use singular, as you will only have a single one of those nodes per row.
Try this out for the third part:
MATCH (me:Person { number: $number })
OPTIONAL MATCH (me)-[:KNOWS]-()-[:KNOWS*0..1 { state: "accepted" }]-(other:Person {registered:"true"})
WITH collect(DISTINCT other) as excluded
MATCH (other:Person { registered: "true" }) WHERE other.createdAt < dateTime() AND other.number <> $number AND NOT other IN excluded
WITH other
ORDER BY other.createdAt DESC
SKIP $skip
LIMIT $limit
MATCH (other)-[:HAS_AVATAR]->(avatar:Avatar { primary: true })
WITH other, avatar, size((other)-[:KNOWS { state: "accepted" }]-()) AS rank
RETURN other AS friend, avatar, rank
If we know the type of createdAt then we can add a modification that may trigger index-backed ordering which could improve this.

Get count of child objects in grails Criteria query

Let's say a domain class A has many class B objects. I need a criteria query that returns:
A.id
A.name
B.count (number of B elements associated with A)
B.lastUpdated (date of the most recent update among the B elements associated with A, given that every B element has a last_updated date)
Also the query should be flexible enough to add conditions/restrictions to both A and B domain objects.
Currently I have gotten as far as this:
A.createCriteria().list {
createAlias('b','b')
projections{
property('id')
property('gender')
property('dateOfBirth')
count('b.id')
property('publicId')
}
}
But the problem is that it only returns one object, and the count of child objects covers all the elements of B instead of just those associated with each A.

I was recently in a similar situation: I needed a query in which one column stores the count of the "many" side of a one-to-many relationship.
Unlike in your scenario, though, I used a native SQL query to solve it.
The solution was to use derived tables (I do not know how to implement them with a criteria query).
In case you find it useful, here is the implementation, taken from a Grails service:
List<Map> resumeInMonth(final String monthName) {
final session = sessionFactory.currentSession
final String query = """
SELECT
t.id AS id,
e.full_name AS fullName,
t.subject AS issue,
CASE t.status
WHEN 'open' THEN 'open'
WHEN 'pending' THEN 'In progress'
WHEN 'closed' THEN 'closed'
END AS status,
CASE t.scheduled
WHEN TRUE THEN 'scheduled'
WHEN FALSE THEN 'non-scheduled'
END AS scheduled,
ifnull(d.name, '') AS device,
DATE(t.date_created) AS dateCreated,
DATE(t.last_updated) AS lastUpdated,
IFNULL(total_tasks, 0) AS tasks
FROM
tickets t
INNER JOIN
employees e ON t.employee_id = e.id
LEFT JOIN
devices d ON d.id = t.device_id
LEFT JOIN
(SELECT
ticket_id, COUNT(1) AS total_tasks
FROM
tasks
GROUP BY ticket_id) ta ON t.id = ta.ticket_id
WHERE
MONTHNAME(t.date_created) = :monthName
ORDER BY dateCreated DESC"""
final sqlQuery = session.createSQLQuery(query)
final results = sqlQuery.with {
resultTransformer = AliasToEntityMapResultTransformer.INSTANCE
setString('monthName', monthName)
list()
}
results
}
The part of interest is to declare a column in the main SELECT and then, in the FROM clause, declare the derived table that stores the result in a column with the same name:
SELECT ...
total_tasks -- add the count column to your select
FROM tickets t
JOIN (SELECT ticket_id, COUNT(1) AS total_tasks
FROM tasks
GROUP BY ticket_id) ta ON t.id = ta.ticket_id
...rest of query
This last example comes from the answer that the user Aaron Dietz gave to a question I asked myself.
I hope you find it useful.

It turns out I wasn't very far from the solution. I just needed to group by the right property, which is the foreign key column in the child table (b.a in this case), so the following now works:
A.createCriteria().list {
createAlias('b','b')
projections{
property('id')
property('gender')
property('dateOfBirth')
count('b.id')
groupProperty('b.a')
property('publicId')
}
}

In the criteria you need to group by every property that is not an aggregate.
Try the following:
A.createCriteria().list {
createAlias('b','b')
projections{
groupProperty('id','id')
groupProperty('gender','gender')
groupProperty('dateOfBirth','dateOfBirth')
count('b.id','total')
groupProperty('publicId','publicId')
}
}
Or, if you want a list of map objects returned, you can add resultTransformer(CriteriaSpecification.ALIAS_TO_ENTITY_MAP):
A.createCriteria().list {
resultTransformer(CriteriaSpecification.ALIAS_TO_ENTITY_MAP)
createAlias('b','b')
projections{
groupProperty('id','id')
groupProperty('gender','gender')
groupProperty('dateOfBirth','dateOfBirth')
count('b.id','total')
groupProperty('publicId','publicId')
}
}
Hope it can help
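As a rough illustration of what grouping by the parent key achieves, here is a minimal Python sketch that computes the count and the most recent update per parent. The b_rows data and field names are hypothetical; it models the grouping logic only, not the Grails or Hibernate API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical B rows: (id of the owning A, last_updated of that B)
b_rows = [
    (1, date(2021, 1, 5)),
    (1, date(2021, 3, 2)),
    (2, date(2021, 2, 1)),
]

# Group by the owning A: one counter and one running maximum per group.
grouped = defaultdict(lambda: {"total": 0, "lastUpdated": None})
for a_id, last_updated in b_rows:
    entry = grouped[a_id]
    entry["total"] += 1
    if entry["lastUpdated"] is None or last_updated > entry["lastUpdated"]:
        entry["lastUpdated"] = last_updated

summary = dict(grouped)
print(summary)
```

Without the grouping key there is only one group, which is why the ungrouped criteria returned a single row with a count over all B elements.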

Using equals on different properties of the same collection returns no record, why?

When I use the following query:
MATCH (emp:Employee)
WHERE emp.supervisor_id = 159
RETURN emp
I get 4 employees/nodes with supervisor_id = 159,
and this query also returns a result, 1 employee with employeeID = 159:
MATCH (emp:Employee)
WHERE emp.employeeID = 159
RETURN emp
But when I use the = operator to compare the two properties, it returns nothing (no changes, no records).
MATCH (emp:Employee)
WHERE emp.employeeID = emp.supervisor_id
RETURN emp
I assume it's a logic mistake, but I just can't figure it out.
Any idea pls.

In your query you are searching for a single node with the label Employee whose employeeID attribute equals its own supervisor_id.
However, from what I understand, you want to match two different nodes with the label Employee.
So your query should be this one:
MATCH (emp1:Employee), (emp2:Employee)
WHERE emp1.employeeID = emp2.supervisor_id
CREATE (emp1)-[:MANAGER_OF]->(emp2)
This query creates a Cartesian product, so if you have a lot of Employee nodes, you should batch the creation of relationships with an APOC procedure (https://neo4j-contrib.github.io/neo4j-apoc-procedures/) like this:
CALL apoc.periodic.iterate(
"MATCH (emp1:Employee) RETURN emp1",
"MATCH (emp2:Employee) WHERE emp1.employeeID = emp2.supervisor_id CREATE (emp1)-[:MANAGER_OF]->(emp2)",
{batchSize:5000, parallel:true}
);
Cheers
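The difference between the one-node and two-node comparison can be sketched in Python. The employee records below are made up for illustration; only the property names come from the question.

```python
# Hypothetical employees; 159 supervises 201 and 202.
employees = [
    {"employeeID": 159, "supervisor_id": 100},
    {"employeeID": 201, "supervisor_id": 159},
    {"employeeID": 202, "supervisor_id": 159},
]

# Same-node comparison: only matches someone who is their own supervisor.
same_node = [e for e in employees if e["employeeID"] == e["supervisor_id"]]

# Two-node comparison: pair each employee with the node whose employeeID
# equals their supervisor_id (a lookup by id instead of a Cartesian product).
by_id = {e["employeeID"]: e for e in employees}
manager_of = [
    (by_id[e["supervisor_id"]]["employeeID"], e["employeeID"])
    for e in employees
    if e["supervisor_id"] in by_id
]

print(same_node)   # []
print(manager_of)  # [(159, 201), (159, 202)]
```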

neo4j cypher - how to find all nodes that have a relationship to list of nodes

I have nodes named "options". "Users" choose these options. I need a Cypher query that works like this:
retrieve the users who have chosen all the options given as a list.
MATCH (option:Option)<-[:CHOSE]-(user:User) WHERE option.Key IN ['1','2','3'] RETURN user
This query gives me the users who chose option(1), option(2) and option(3), but it also gives me the user who only chose option(2).
What I need is only the users who chose all of them: option(1), option(2) and option(3).

For an all cypher solution (don't know if it's better than Chris' answer, you'll have to test and compare) you can collect the option.Key for each user and filter out those who don't have a option.Key for each value in your list
MATCH (u:User)-[:CHOSE]->(opt:Option)
WITH u, collect(opt.Key) as optKeys
WHERE ALL (v IN {values} WHERE v IN optKeys)
RETURN u
or match all the options whose keys are in your list and the users that chose them, collect those options per user and compare the size of the option collection to the size of your list (if you don't give duplicates in your list the user with an option collection of equal size has chosen all the options)
MATCH (u:User)-[:CHOSE]->(opt:Option)
WHERE opt.Key IN {values}
WITH u, collect(opt) as opts
WHERE length(opts) = length({values}) // assuming {values} don't have duplicates
RETURN u
Either should limit results to users connected with all the options whose key values are specified in {values} and you can vary the length of the collection parameter without changing the query.

If the number of options is limited, you could do:
MATCH
(user:User)-[:CHOSE]->(option1:Option),
(user)-[:CHOSE]->(option2:Option),
(user)-[:CHOSE]->(option3:Option)
WHERE
option1.Key = '1'
AND option2.Key = '2'
AND option3.Key = '3'
RETURN
user.Id
Which will only return the user with all 3 options.
It's a bit rubbishy as obviously you end up with 3 lines where you have 1, but I don't know how to do what you want using the IN keyword.
If you're coding against it, it's pretty simple to generate the WHERE and MATCH clause, but still - not ideal. :(
EDIT - Example
Turns out there is some string manipulation going on here (!), but you can always cache bits. Importantly - it's using Params which would allow neo4j to cache the queries and supply faster responses with each call.
public static IEnumerable<User> GetUser(IGraphClient gc)
{
var query = GenerateCypher(gc, new[] {"1", "2", "3"});
return query.Return(user => user.As<User>()).Results;
}
public static ICypherFluentQuery GenerateCypher(IGraphClient gc, string[] options)
{
ICypherFluentQuery query = new CypherFluentQuery(gc);
for(int i = 0; i < options.Length; i++)
query = query.Match(string.Format("(user:User)-[:CHOSE]->(option{0}:Option)", i));
for (int i = 0; i < options.Length; i++)
{
string paramName = string.Format("option{0}param", i);
string whereString = string.Format("option{0}.Key = {{{1}}}", i, paramName);
query = i == 0 ? query.Where(whereString) : query.AndWhere(whereString);
query = query.WithParam(paramName, options[i]);
}
return query;
}

MATCH (user:User)-[:CHOSE]->(option:Option)
WHERE option.key IN ['1', '2', '3']
WITH user, COUNT(*) AS num_options_chosen
WHERE num_options_chosen = LENGTH(['1', '2', '3'])
RETURN user.name
This will only return users that have relationships with all the Options with the given keys in the array. This assumes there are not multiple [:CHOSE] relationships between users and options. If it is possible for a user to have multiple [:CHOSE] relationships with a single option, you'll have to add some conditionals as necessary.
I tested the above query with the below dataset:
CREATE (User1:User {name:'User 1'}),
(User2:User {name:'User 2'}),
(User3:User {name:'User 3'}),
(Option1:Option {key:'1'}),
(Option2:Option {key:'2'}),
(Option3:Option {key:'3'}),
(Option4:Option {key:'4'}),
(User1)-[:CHOSE]->(Option1),
(User1)-[:CHOSE]->(Option4),
(User2)-[:CHOSE]->(Option2),
(User2)-[:CHOSE]->(Option3),
(User3)-[:CHOSE]->(Option1),
(User3)-[:CHOSE]->(Option2),
(User3)-[:CHOSE]->(Option3),
(User3)-[:CHOSE]->(Option4)
And I get only 'User 3' as the output.
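The same check can be reproduced outside the database. The Python sketch below mirrors the test dataset above as plain sets and compares the two strategies discussed in these answers (a containment test and a count comparison); the variable names are illustrative only.

```python
# Options chosen per user, mirroring the CREATE statement above.
chose = {
    "User 1": {"1", "4"},
    "User 2": {"2", "3"},
    "User 3": {"1", "2", "3", "4"},
}
required = {"1", "2", "3"}

# Strategy 1: every required key must be among the user's option keys.
all_check = [user for user, keys in chose.items() if required <= keys]

# Strategy 2: count the matching options and compare with the list size
# (assumes no duplicate keys and at most one CHOSE relationship per option).
count_check = [
    user for user, keys in chose.items()
    if len(keys & required) == len(required)
]

print(all_check)    # ['User 3']
print(count_check)  # ['User 3']
```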

For shorter lists, you can use path predicates in your WHERE clause:
MATCH (user:User)
WHERE (user)-[:CHOSE]->(:Option { Key: '1' })
AND (user)-[:CHOSE]->(:Option { Key: '2' })
AND (user)-[:CHOSE]->(:Option { Key: '3' })
RETURN user
Advantages:
Clear to read
Easy to generate for dynamic length lists
Disadvantages:
For each different list length, you will have a different query that has to be parsed and cached by Cypher. With too many dynamic queries, your cache hit rate will go through the floor, query compilation work will go up, and query performance will go down.
