Neo4j Get nodes and node count simultaneously - neo4j

I'm relatively new to Neo4j, so I apologize if there is an obvious answer to this question. I have a db with User nodes, Account nodes and ASSIGNED_TO relationships between them. I have a query (below) to get the users and their assigned accounts, but I also want to get a count of the users found in the same query, regardless of the LIMIT/SKIP result. What seems to be happening is that the user count is based on the OPTIONAL MATCH result, not on the result of the MATCH.
I have 3 users and 3 accounts in the database with 2 users assigned to 2 accounts and one user assigned only to one account.
This is the query:
MATCH (user:User)
WITH user
OPTIONAL MATCH (user)-[assigned:ASSIGNED_TO]-(account:Account)
RETURN user, count(user) as userCount, collect(account) as accounts
SKIP 0 LIMIT 25
This is the result:
user userCount accounts
{id: 2} 1 [{id: 2}]
{id: 1} 2 [{id: 2}, {id: 1}]
{id: 3} 2 [{id: 1}, {id: 3}]
I want the userCount value to be 3 for all rows. If I change 'count(user)' to 'count(DISTINCT user)' I get 1 for userCount. I want to avoid running 2 separate queries if possible.

A collect-unwind pair should do the trick
MATCH (user:User)
WITH collect(user) as users, count(DISTINCT user) as userCount
UNWIND users as user
OPTIONAL MATCH (user)-[assigned:ASSIGNED_TO]-(account:Account)
RETURN user, userCount, collect(account) as accounts
SKIP 0 LIMIT 25
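To see why count(user) differs per row in the original query, here is a minimal sketch that models the rows in plain Python rather than Cypher. The (user, account) pairs mirror the result table in the question; the variable names are made up for illustration.

```python
from collections import defaultdict

# Rows after MATCH (user) + OPTIONAL MATCH (user)-[:ASSIGNED_TO]-(account),
# using the ids from the question: one row per (user, account) pair.
rows = [(2, 2), (1, 2), (1, 1), (3, 1), (3, 3)]

# RETURN user, count(user), collect(account): `user` is not aggregated, so it
# is the grouping key and the count is taken per user, not over all users.
per_user = defaultdict(list)
for user, account in rows:
    per_user[user].append(account)
grouped = {user: (len(accounts), accounts) for user, accounts in per_user.items()}

# Counting before expanding (the collect/UNWIND approach) fixes the total
# first and then carries it along as an ordinary column.
users = sorted({user for user, _ in rows})
user_count = len(users)
carried = {user: (user_count, per_user[user]) for user in users}

print(grouped)  # {2: (1, [2]), 1: (2, [2, 1]), 3: (2, [1, 3])}
print(carried)  # {1: (3, [2, 1]), 2: (3, [2]), 3: (3, [1, 3])}
```

The first dictionary reproduces the 1, 2, 2 counts from the question; the second shows the count of 3 on every row.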

// Get user count
MATCH (user:User) WITH count(user) as userCount
// Get user
MATCH (user:User)
// To optimize a query, first apply the pagination
WITH user, userCount SKIP 0 LIMIT 25
// The other part of query
OPTIONAL MATCH (user)-[assigned:ASSIGNED_TO]-(account:Account)
RETURN user,
userCount,
collect(distinct account) as accounts

Related

Why these two Cypher queries return different result?

I'm trying to learn Cypher and I have the data of a trust network. I wanted to query the people who trust the "15 most trusted people", so I wrote this query:
QUERY1:
MATCH (u1:USER)-[:TRUST]->(u2:USER)
with u2.Id as id, COUNT(u2) AS score
order by score desc
limit 15
match p=(w1:USER)-[:TRUST]->(w2:USER {Id: id})
return w1.Id as user1, w2.Id as user2
after that I wanted to change the last 2 lines of query to this:
QUERY2:
MATCH (u1:USER)-[:TRUST]->(u2:USER)
with u2.Id as id, COUNT(u2) AS score
order by score desc
limit 15
match p=(w1:USER)-[:TRUST]->(w2:USER {Id: id})-[:TRUST]->(w3:USER)
return w1.Id as user1,w2.Id as user2, w3.Id as user3
After analyzing the result, I guessed that something was wrong.
So I hard-coded id to a specific value, for example 575, and then count(p) is equal to 1937520. BUT if I run the last line of the query with the hard-coded Id as a separate query:
QUERY3:
MATCH r=(u1:USER)-[:TRUST]->(u2:USER {Id: "575"})-[:TRUST]->(u3:USER)
return count(r)
the count(r) is equal to 129168!
I checked that the User "575" trusts 207 people and is trusted by 624 people, so the QUERY3 result seems correct: 207*624=129168. My question is: why?!
I can't understand what is wrong with QUERY2, and the second question is: does it mean that the QUERY1 result is wrong too?
EDIT1:
thanks for answers, but I still had problem with this, so I checked another scenario and I've got the following result:
If I write a query like this:
QUERY4:
MATCH (n) WITH n limit 15 return "1"
I get 15 "1"s printed in the output, which means the last part of QUERY2 executes 15 times whether I hard-code the Id or not, as if it were in a for loop. So the problem was my assumption that WITH X LIMIT N doSomeThing would execute like a foreach(x : X) loop only if I use x, and would not if I don't use x. Stupid assumption...
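The arithmetic behind this can be checked with a small sketch, written in Python rather than Cypher. It only uses the figures quoted in the question; the variable names are made up.

```python
# Figures quoted in the question for user "575".
trusts_out = 207   # people that user 575 trusts
trusted_by = 624   # people who trust user 575

# One 2-level path per (w1, w3) combination around user 575.
paths_for_575 = trusted_by * trusts_out

# WITH ... LIMIT 15 leaves 15 rows, and the following MATCH is evaluated once
# per incoming row, even when it does not reference any variable of that row.
incoming_rows = 15
total_paths = sum(paths_for_575 for _ in range(incoming_rows))

print(paths_for_575)  # 129168
print(total_paths)    # 1937520
```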
This query might do what you intended.
MATCH (:USER)-[r:TRUST]->(u2:USER)
WITH u2, COUNT(r) AS score
ORDER BY score DESC
LIMIT 15
MATCH (w1:USER)-[:TRUST]->(u2)-[:TRUST]->(w3:USER)
RETURN w1.Id AS user1, u2.Id AS user2, w3.Id AS user3;
It first finds the 15 most-trusted users, then finds all the 2-level trust paths that those users are in the middle of, and finally returns the ids of the users in those paths.
Also, the second MATCH reuses the u2 nodes already found by the first MATCH, to speed up the processing of the second MATCH.
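As a rough illustration of the two steps (rank by incoming trust, then expand around the top users), here is a minimal Python sketch. The edge list and the function names are hypothetical and are not taken from the question's data.

```python
from collections import Counter

# Hypothetical trust edges as (truster, trusted) pairs.
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "b"), ("c", "b"), ("c", "d")]

def most_trusted(edges, n):
    # WITH u2, COUNT(r) AS score ORDER BY score DESC LIMIT n
    scores = Counter(trusted for _, trusted in edges)
    return [user for user, _ in scores.most_common(n)]

def two_level_paths(edges, top_users):
    # MATCH (w1)-[:TRUST]->(u2)-[:TRUST]->(w3) for each of the top users
    paths = []
    for u2 in top_users:
        incoming = [w1 for w1, trusted in edges if trusted == u2]
        outgoing = [w3 for truster, w3 in edges if truster == u2]
        paths.extend((w1, u2, w3) for w1 in incoming for w3 in outgoing)
    return paths

top = most_trusted(edges, 1)
print(top)                               # ['c']
print(len(two_level_paths(edges, top)))  # 6
```

With 3 incoming and 2 outgoing edges around "c", the sketch yields 3 * 2 = 6 paths, the same kind of product as 207 * 624 in the question.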
In QUERY3, you are matching u2 to a single user (user 575). QUERY 3 is correct.
However, in QUERY2 the WITH (line 2) groups the rows by u2.Id, so it produces one row per id, with score equal to the number of matched rows for that id, and the LIMIT (line 4) keeps 15 of those rows. The final MATCH is then evaluated once for each of the 15 rows. With the id hard-coded to 575, each of the 15 evaluations finds the same 129168 paths, which gives 1937520 results, exactly 15 * 129168.
So the factor of 15 comes from the LIMIT 15 (line 4), not from the pattern itself. For the hard-coded test, LIMIT 1 would do the trick. If you do not need the score, the first part can also be written like this:
MATCH (u1:USER)-[:TRUST]->(u2:USER)
WITH DISTINCT(u2.Id) AS id
LIMIT 15
And then the rest of QUERY2 (or QUERY1, for that matter). I eliminated the score variable, but if it's meant to be count(u1), it can be re-added with no problem.
I'll just break down Query 2 and the rest should make sense.
QUERY2:
MATCH (u1:USER)-[:TRUST]->(u2:USER)
with u2.Id as id, COUNT(u2) AS score
order by score desc
limit 15
match p=(w1:USER)-[:TRUST]->(w2:USER {Id: id})-[:TRUST]->(w3:USER)
return w1.Id as user1,w2.Id as user2, w3.Id as user3
Starting with
MATCH (u1:USER)-[:TRUST]->(u2:USER)
with u2.Id as id, COUNT(u2) AS score
order by score desc
limit 15
The MATCH produces one row for every "u1 trusts u2" pair. In the WITH, u2.Id is not aggregated, so it acts as the grouping key, and COUNT(u2) is the number of matched rows for that id, i.e. how many users trust it. After the ORDER BY and LIMIT 15 you are left with 15 rows, one for each of the 15 most-trusted ids.
So that just leaves
match p=(w1:USER)-[:TRUST]->(w2:USER {Id: id})-[:TRUST]->(w3:USER)
This matches each path p where a user w1 trusts user w2 (once for each id row from the first part) who trusts a user w3. If you replace id with a hard-coded value, the MATCH still runs once per row, so every path is returned 15 times.
The first part can be made more explicit by carrying u2 itself and counting the incoming trust relationships, which also lets a later MATCH reuse the node instead of looking it up by Id again:
MATCH (u1:USER)-[trusts:TRUST]->(u2:USER)
with u2, COUNT(trusts) AS score
order by score desc
limit 15
So now you have the 15 most trusted users, and you can verify this with return u2.Id, score. To get the people who trust them, you would then just need to ask something like...
MATCH (u3:USER)-[:TRUST]->(u2)
and u3 will then be all users who trust someone from top 15 trusted people (u2).
As an additional note, if you are using the Neo4j web browser, try prepending the PROFILE keyword to your Cypher for some insight into what the query actually does.
Edit 1:
Now to explain what query 4, MATCH (n) WITH n limit 15 return "1", does. As I am sure you guessed, MATCH (n) WITH n limit 15 matches all nodes but limits the results to the first 15. In the RETURN part, you are saying "For each row, return the constant '1'", which gives you 15 distinct rows internally, but the returned rows are not distinct. This is what the DISTINCT keyword is for. Using RETURN DISTINCT "1" says "For each row, return the constant '1', but filter the result set to only have distinct rows", i.e. no two rows will have the same values. The non-distinct result is useful if you know there will be some duplicate rows but you want to see them anyway (maybe for a weight reference, or knowing that they are from 2 separate fields).
As I mentioned in EDIT1, the problem here was that I thought WITH X LIMIT N doSomeThing would execute like a foreach(x : X) loop if I use x, and would not if I don't use x. Stupid assumption...

neo4j how to work with two match

I have two queries.
First query is
match (user)-[r:CreatesChat]-(chatitems)
Second query is
match (chatitems)-[r:PartOf]-(teamschat)-[s:OwnedBy]-()
I want to return the first 3 users from the first query
And to return the first 3 teams from the second query
The goal is to check if users from first query have the teams of second query
My neo4j query is
match (user)-[r:CreatesChat]-(chatitems)
with user.id as uid,chatitems.id as chatid
order by uid desc
with collect([uid])[..3] as users,collect([chatid])[..3] as chats
UNWIND users AS idusers
match (chatitems)-[r:PartOf]-(teamschat)-[s:OwnedBy]-()
return idusers
This query returns
Returned 133239 rows in 1360 ms, displaying first 1000 rows
But when I execute the query
match (user)-[r:CreatesChat]-(chatitems)
with user.id as uid,chatitems.id as chatid
order by uid desc
with collect([uid])[..3] as users,collect([chatid])[..3] as chats
UNWIND users AS idusers
return idusers
the idusers returned are right:
Returned 3 rows in 539 ms.
How can I relate these two queries ?
I think you want to collect both the top 3 users and the top 3 teams and then unwind over each collection. Something like this:
MATCH (user:User)-[:CreatesChat]->(chatitems:Chat)
WITH user ORDER BY user.id DESC LIMIT 3
WITH collect(user) AS users
MATCH (chatitems:Item)-[:PartOf]->(teamsChat:Team)-[:OwnedBy]-()
WITH users, teamsChat ORDER BY teamsChat.id DESC LIMIT 3
WITH users, collect(teamsChat) AS teams
UNWIND users AS user
UNWIND teams AS team
MATCH p=(chatitems:Item)-[:PartOf]-(team)-[:OwnedBy]-(user)
RETURN p
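The row count in the question is consistent with a Cartesian product: the second MATCH there shares no variable with idusers, so each of the 3 unwound rows would be paired with every match of the pattern. The Python sketch below illustrates that effect. The user ids are made up, and the figure 44413 is an assumption chosen only because 3 * 44413 = 133239; the question does not state how many matches the pattern has on its own.

```python
from itertools import product

# Three unwound user ids, as in the question's query (values are made up).
id_users = [30, 20, 10]

# Stand-in for the rows of a MATCH that shares no variable with id_users.
# ASSUMPTION: 44413 is picked so that the product equals the reported total.
pattern_matches = range(44413)

# Every incoming row is combined with every match of the unrelated pattern.
row_count = sum(1 for _ in product(id_users, pattern_matches))
print(row_count)  # 133239
```

Binding the second pattern to a variable carried over from the first part, as the final MATCH in the answer does with team and user, is what avoids this blow-up.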

neo4j match node with multiple nodes connected to it, return them as array

Assume a node, for this example a User, has many friends (they are Users as well).
Let's also assume that I'm looking up the user by id, which is unique.
How can I query to get one row back with the friends as an array property?
Example:
MATCH (user:User {id: "some-id"})-[:FriendsWith]->(friend:User)
RETURN user, friend
Now I expected the result to be an array of length one,
like this [{user: data, friend: [array of users]}]
But instead I got rows [{user: , friend:}, {user, friend: }]
The user was duplicated in each row.
You can use the collect function to create a collection:
MATCH (user:User {id: "some-id"})-[:FriendsWith]->(friend:User)
RETURN user, collect(friend.name) AS friends
There is an implicit group by when using an aggregation: every non-aggregated item in the RETURN (here user) becomes a grouping key.
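The implicit grouping can be pictured with a short Python sketch. The friend names are placeholders and do not come from the question.

```python
# Rows as the MATCH would produce them: one (user id, friend name) pair per friend.
rows = [("some-id", "Ann"), ("some-id", "Bob"), ("some-id", "Cid")]

# RETURN user, collect(friend.name): the non-aggregated item (user) is the
# grouping key, and collect() gathers the remaining values into a list.
friends_by_user = {}
for user, friend_name in rows:
    friends_by_user.setdefault(user, []).append(friend_name)

records = [{"user": user, "friends": names} for user, names in friends_by_user.items()]
print(records)  # [{'user': 'some-id', 'friends': ['Ann', 'Bob', 'Cid']}]
```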

Neo4j match nodes that are in relationship with one OR another node

I'm trying to wrap my head around one query. For example I have this pattern (photo:Photo)-[:AUTHOR]->(user:User). User can have friends (user:User)-[:FRIEND]->(friend:User). So how can i make a query in which i will find all :Photos made by me or my friends and sort them by date if there is any?
MATCH (user:User {id: 'me'}), (user)-[:FRIEND]-(friend:User)
//other pattern matches that I need to do
OPTIONAL MATCH (photo:Photo)-[:AUTHOR]-(user | friend)
RETURN photo
ORDER BY photo.date
LIMIT 42
But as far as I know this construct (user | friend) is invalid. So what's the correct way to do that?
If you only look for a single relationship to a defined User node, a simple way would be to use a variable length relationship with length 0 to 1. This collects all nodes with a distance of 0 (which is your start node) and all nodes with a distance of 1.
MATCH (user:User {id: 'me'})-[:FRIEND*0..1]-(me_and_friend:User)
OPTIONAL MATCH (photo:Photo)-[:AUTHOR]-(me_and_friend)
RETURN photo
ORDER BY photo.date
LIMIT 42
A more generic solution would be to collect different nodes into arrays, combine these arrays and then use UNWIND to MATCH again:
MATCH (user:User {id: 'me'}), (user)-[:FRIEND]-(friend:User)
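// DISTINCT: user appears once per friend row, so collect(user) alone would repeat it
WITH collect(DISTINCT user) + collect(DISTINCT friend) AS me_and_friends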
UNWIND me_and_friends AS allusers
OPTIONAL MATCH (photo:Photo)-[:AUTHOR]-(allusers)
RETURN photo
ORDER BY photo.date
LIMIT 42
This could be useful if you MATCH longer paths or more complex patterns.
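Both variants boil down to "build the set containing me and my friends, then fetch photos authored by anyone in it". A minimal Python sketch of that idea follows; the friend pairs, photo records and function names are all hypothetical.

```python
# Hypothetical data: undirected FRIEND pairs and photo authorship.
friend_pairs = [("me", "ann"), ("bob", "me")]
photos = [
    {"id": 1, "author": "me", "date": "2015-03-01"},
    {"id": 2, "author": "ann", "date": "2015-01-15"},
    {"id": 3, "author": "zoe", "date": "2015-02-01"},
    {"id": 4, "author": "bob", "date": "2015-02-20"},
]

def me_and_friends(user, pairs):
    # Distance 0 is the user; distance 1 is anyone on either end of a FRIEND pair.
    people = {user}
    for a, b in pairs:
        if a == user:
            people.add(b)
        elif b == user:
            people.add(a)
    return people

def photos_by(people, photos, limit=42):
    # Keep photos whose author is in the set, ordered by date, capped at `limit`.
    selected = [p for p in photos if p["author"] in people]
    return sorted(selected, key=lambda p: p["date"])[:limit]

result_ids = [p["id"] for p in photos_by(me_and_friends("me", friend_pairs), photos)]
print(result_ids)  # [2, 4, 1]
```

Using a set for the people means the user is included exactly once, no matter how many friends there are.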

Neo4j similar paths

I want to write a query that takes all users (without a pre-condition like user ids) and finds the most common similar paths (for example, the top 10 user flows).
For example:
User u1 has events: a,b,c,d
User u2 has events: b,d,e
Each event is a node with property event-type
the result should look like:
[a,b,e] - 100 users
[a,c,f] -80 users
[b,d,t]- 50 users
.......
The data that generated the 1st aggregated row in the result could be, for example:
user 1: a,b,c,e
user 2: a,b,e,f
.........
user 100: a,c,t,b,g,e
i wonder if this link can help:
http://neo4j.com/docs/stable/rest-api-graph-algos.html#rest-api-execute-a-dijkstra-algorithm-with-equal-weights-on-relationships
Here is a Cypher query that returns all the Event nodes that user 1 and user 2 have in common (in a single row):
MATCH (u1:User {id: 1}) -[:HAS]-> (e:Event) <-[:HAS]- (u2:User {id: 2})
RETURN u1, u2, COLLECT(e);
[Added by MichaelHunger; modified by cybersam] For your additional question try:
// Specify the user ids of interest. This would normally be a query parameter.
WITH [1,2,3] as ids
MATCH (u1:User) -[:HAS]-> (e:Event)
// Only match events for users with one of the specified ids.
WHERE u1.id IN ids
// Count # of distinct user ids per event, and count # of input ids
WITH e, size(collect(distinct u1.id)) as n_users, size(ids) AS n_ids
// Only match when the 2 counts are the same
WHERE n_users = n_ids
RETURN e;
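The counting trick in that query (keep an event only when the number of distinct matching users equals the number of requested ids) can be sketched in Python as follows. The (user id, event type) pairs and the function name are invented for illustration.

```python
# Hypothetical (user id, event type) pairs standing in for (:User)-[:HAS]->(:Event).
has = [(1, "a"), (1, "b"), (2, "b"), (2, "d"), (3, "b"), (3, "a"), (4, "b")]

def common_events(has, ids):
    wanted = set(ids)
    users_per_event = {}
    for user_id, event in has:
        if user_id in wanted:  # WHERE u1.id IN ids
            users_per_event.setdefault(event, set()).add(user_id)
    # Keep events whose distinct-user count equals the number of requested ids.
    return sorted(e for e, users in users_per_event.items() if len(users) == len(wanted))

print(common_events(has, [1, 2, 3]))  # ['b']
print(common_events(has, [1, 3]))     # ['a', 'b']
```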

Resources