Trying a single Neo4j Cypher query for my graph - neo4j

I have "users" who own "items"; users are also friends with each other. I am trying to construct a Cypher query that returns all items I own plus those my friends own. I can run the two queries individually but can't figure out how to combine them into a single query.
RELATIONSHIPS:
(u:user)-[:OWNS]-(i:items)
(u:user)-[:FRIEND]-(f:user)
Let's say I have just two users in my DB and 100 items. Of those 100, the first user owns 5 items (1-5) and the second user owns another 5 items (6-10). These two users are also friends.
I get 5 items if I do:
MATCH (user1)-[:OWNS]->(i:items) return i
I get another 5 items if I do:
MATCH (user1)-[:FRIEND]->(f)-[:OWNS]->(i:items) return i
But I need to combine both for a given user (user1) so I can return all 10 items in a single shot. How do I do that?

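You have two (or more) options. Labels are case-sensitive in Cypher, so replace :User and :Item below with the labels your graph actually uses (:user and :items in the question).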
Union
MATCH (user:User {name:"Raja"})-[:OWNS]->(i:Item) return i
UNION
MATCH (user:User {name:"Raja"})-[:FRIEND]->(f)-[:OWNS]->(i:Item) return i
Variable length paths
MATCH (user:User {name:"Raja"})-[:FRIEND*0..1]->(f)-[:OWNS]->(i:Item) return i
In this case we look at friends at a distance of 0 (the user itself) to 1 (first-degree friends).
The first option might be faster.
The second is more versatile.
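To see why both options give the same ten items, here is a rough Python model of the two queries. It is only a sketch: the OWNS and FRIEND dictionaries are hypothetical stand-ins for the graph described in the question, and the function names are made up for illustration.

```python
# Hypothetical in-memory stand-in for the graph in the question:
# two users who are friends, each owning five items.
OWNS = {"user1": [1, 2, 3, 4, 5], "user2": [6, 7, 8, 9, 10]}
FRIEND = {"user1": ["user2"], "user2": ["user1"]}


def items_via_union(user):
    """Mimics the UNION query: own items, then friends' items, deduplicated."""
    own = OWNS.get(user, [])
    friends_items = [i for f in FRIEND.get(user, []) for i in OWNS.get(f, [])]
    # UNION (without ALL) removes duplicate rows; keep first-seen order.
    return list(dict.fromkeys(own + friends_items))


def items_via_variable_length(user):
    """Mimics [:FRIEND*0..1]: f is the user itself (0 hops) or a friend (1 hop)."""
    reachable = [user] + FRIEND.get(user, [])
    return [i for f in reachable for i in OWNS.get(f, [])]


if __name__ == "__main__":
    print(items_via_union("user1"))
    print(items_via_variable_length("user1"))
```

Widening the hop range in the second function (for example to friends of friends) is the kind of change that makes the variable-length version more versatile.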

Related

Find all items that weren't bought by a person and count the times they were bought

I have a graph that looks like this.
Using Cypher, I want to find all the items bought by the people who bought the same items as Gremlin.
Basically I want to imitate the query in the gremlin examples that looks like this
g.V().has("name","gremlin")
.out("bought").aggregate("stash")
.in("bought").out("bought")
.where(not(within("stash")))
.groupCount()
.order(local).by(values,desc)
I was trying to do it like this
MATCH (n)-[:BOUGHT]->(g_item)<-[:BOUGHT]-(r),
(r)-[:BOUGHT]->(n_item)
WHERE
n.name = 'Gremlin'
AND NOT (n)-[:BOUGHT]->(n_item)
RETURN n_item.id, count(*) as frequency
ORDER by frequency DESC
but it seems it doesn't count frequencies properly - they seem to be twice as big.
4 - 4
5 - 2
3 - 2
But items 3 and 5 were each bought only once, and item 4 was bought twice.
What's the problem?
Cypher is interested in paths, and your MATCH finds the following:
2 paths to item 3 both through Rexter (via items 2 and 1)
2 paths to item 5 through Pipes (via items 1 and 2)
4 paths to item 4 via Rexter and Pipes (via items 1 and 2 for each person)
Basically the items are being counted multiple times because there are multiple paths to that same item per individual person via different common items with Gremlin.
To get accurate counts, you have two options. Either match to distinct r users first, and only then match out to the items those users bought (excluding anything in the collection of items bought by Gremlin); or do the entire match, but before counting, reduce the rows to distinct (person, item) pairs so that each item occurs only once per person, and then count per item across all persons.
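The double counting can be reproduced outside Neo4j. The sketch below is a plain Python model, not Cypher: the BOUGHT data is an assumption reconstructed from the paths listed above (the graph image is not available), and the function names are hypothetical.

```python
from collections import Counter

# Assumed purchases, reconstructed from the paths listed in the answer.
BOUGHT = {
    "Gremlin": {1, 2},
    "Rexter": {1, 2, 3, 4},
    "Pipes": {1, 2, 4, 5},
}


def matched_rows(name):
    """One row per path n -> g_item <- r -> n_item, as the MATCH produces."""
    mine = BOUGHT[name]
    rows = []
    for g_item in sorted(mine):
        for r, r_items in BOUGHT.items():
            if r == name or g_item not in r_items:
                continue
            # Skip items the starting person already bought.
            for n_item in sorted(r_items - mine):
                rows.append((g_item, r, n_item))
    return rows


def naive_counts(name):
    """count(*) over every path: inflated by the number of shared items."""
    return Counter(n_item for _, _, n_item in matched_rows(name))


def distinct_counts(name):
    """Deduplicate (r, n_item) pairs first, like WITH DISTINCT r, n_item."""
    pairs = {(r, n_item) for _, r, n_item in matched_rows(name)}
    return Counter(n_item for _, n_item in pairs)


if __name__ == "__main__":
    print(naive_counts("Gremlin"))
    print(distinct_counts("Gremlin"))
```

With this assumed data the naive count sees each (person, item) pair once per item shared with Gremlin, which is why every frequency comes out doubled.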
Here's a query that uses the second approach
MATCH (n:Person)-[:BOUGHT]->(g_item)
WHERE n.name = 'Gremlin'
WITH n, collect(g_item) as excluded
UNWIND excluded as g_item // now you have excluded list to use later
MATCH (g_item)<-[:BOUGHT]-(r)-[:BOUGHT]->(n_item)
WHERE r <> n AND NOT n_item in excluded
WITH DISTINCT r, n_item
WITH n_item, count(*) as frequency
RETURN n_item.id, frequency
ORDER by frequency DESC
You should be using labels in your graph, and you should use them in your query in order to leverage indexes and quickly find a starting point in the graph. In your case, an index on :Person(name), and usage of the :Person label in the query, should make this quick even as more nodes and more :Persons are added to the graph.
EDIT
If you're just looking for conciseness of the query, and don't have a large enough graph where performance will be an issue, then you can use your original query but add one extra line to get distinct rows of r and n_item before you count the item. This ensures that you only count an item per person once when you get the count.
Note that this forgoes optimizations for handling excluded items (it will do a pattern match per item rather than aggregating the collection of bought items and doing a collection membership check), and it aggregates on items while doing property access rather than doing property access only after aggregating by the node.
MATCH (n:Person)-[:BOUGHT*2]-(r)-[:BOUGHT]->(n_item)
WHERE n.name = 'Gremlin'
WITH DISTINCT n, r, n_item
WHERE NOT (n)-[:BOUGHT]->(n_item)
RETURN n_item.id, count(*) as frequency
ORDER by frequency DESC
I am adding a quick shortcut in your match, using :BOUGHT*2 to indicate two :BOUGHT hops to r, since we don't really care about the item in-between.

How to return single node with Multiple match for different labels in neo4j?

I am using neo4j 3.2.x. What I am trying to do is write a query that updates the relationships between two kinds of nodes (User and Gender) and returns a single User node with an interestedInGender property pulled from the relationships as an array. The problem with the query below is that it returns multiple records, each with a single value for interestedInGender. The relationships are created properly, but multiple records come back. I just want to return the User node. Is there any way to fix this query so it returns a single User node?
MATCH (u:User {_id:"1234"})
MATCH (ig:Gender) WHERE ig.value IN ["male", "female"]
WITH u, ig
OPTIONAL MATCH (u)-[igr:INTERESTED_IN_GENDER]->()
DELETE igr
with u, ig
MERGE (u)-[:INTERESTED_IN_GENDER]->(ig)
RETURN u{
._id,
interestedInGender: [(u)-[:INTERESTED_IN_GENDER]->(ig:Gender) | ig.value]
}
The reason you're getting multiple records (rows) is because your ig match to gender matches to two :Gender nodes...two rows where both rows have the same u node, but different ig nodes. That cardinality remains throughout the rest of your query, and so you get two rows back.
You need to shrink the cardinality of u back down to 1 after you MERGE the relationship, so add this after your MERGE but before your RETURN:
WITH distinct u
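If the row cardinality is hard to picture, this small Python model may help. It is a sketch only: the rows list and both helper functions are hypothetical, and the gender list is hard-coded where the real query would evaluate the pattern comprehension.

```python
# Hypothetical rows standing in for the query's intermediate results.
user = {"_id": "1234"}
genders = ["male", "female"]

# MATCH (u) followed by MATCH (ig): one row per (u, ig) combination.
rows = [{"u": user, "ig": ig} for ig in genders]


def with_distinct_u(rows):
    """Mimics WITH DISTINCT u: keep only u, one row per distinct user."""
    seen, out = set(), []
    for row in rows:
        key = row["u"]["_id"]
        if key not in seen:
            seen.add(key)
            out.append({"u": row["u"]})
    return out


def project(rows):
    """Mimics the RETURN map projection, evaluated once per remaining row."""
    return [
        {"_id": r["u"]["_id"], "interestedInGender": list(genders)}
        for r in rows
    ]


if __name__ == "__main__":
    print(len(project(rows)))                   # rows without DISTINCT
    print(len(project(with_distinct_u(rows))))  # rows with DISTINCT
```

The projection itself is the same in both cases; only the number of rows it is evaluated for changes.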

Neo4j 'match' within 'unwind' operation

I have a Neo4j query where I am trying to get all distinct ids and then, for each id, return all nodes that match that id. Here's what my query looks like:
match (x:Log) with collect(distinct x.id) as ids
unwind ids as i
match (y:Log {id:i}) return y;
I was hoping that the results of this query would be grouped by id aka have multiple nodes per row, where each row holds all nodes that share an id. Instead it is returning only one node per row, and there are multiple rows of the same id. How do I get the results to be grouped by id?
Example: Let's say there are five nodes of label Log. Two of the Log nodes have id='abc' and three of them have id='123'. Right now, my query is returning five rows that each contain one node, but I would like it to return two rows: one row that contains all the Log nodes of id='123', and another row that contains all the Log nodes of id='abc'.
This will return a distinct id per row, along with the nodes having that id:
MATCH (x:Log)
RETURN x.id AS id, COLLECT(x) as nodes;
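The grouping works because the non-aggregated column (x.id) acts as the grouping key for COLLECT. A rough Python equivalent, using hypothetical data shaped like the example in the question:

```python
from collections import defaultdict

# Hypothetical Log nodes matching the example in the question.
logs = [
    {"node": "a", "id": "abc"},
    {"node": "b", "id": "abc"},
    {"node": "c", "id": "123"},
    {"node": "d", "id": "123"},
    {"node": "e", "id": "123"},
]


def collect_by_id(nodes):
    """Mimics RETURN x.id AS id, COLLECT(x) AS nodes."""
    groups = defaultdict(list)
    for n in nodes:
        # The id is the implicit grouping key; nodes are collected per key.
        groups[n["id"]].append(n["node"])
    return dict(groups)


if __name__ == "__main__":
    for log_id, nodes in collect_by_id(logs).items():
        print(log_id, nodes)
```

Five input nodes collapse to two groups, one per distinct id.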

Optimizing Cypher Query - Neo4j

I have the following query
MATCH (User1 )-[:VIEWED]->(page)<-[:VIEWED]- (User2 )
RETURN User1.userId,User2.userId, count(page) as cnt
It's a relatively simple query to find co-page view counts between users.
It's just too slow, and I have to terminate it after some time.
Details
User consists of about 150k Nodes
Page consists of about 180k Nodes
User -VIEWS-> Page has about 380k Relationships
User has 7 attributes, and Page has about 5 attributes.
Both User and Page are indexed on UserId and PageId respectively.
Heap Size is 512mb (tried to run on 1g too)
What are some ways to optimize this query? I don't think the number of nodes and relationships is that large.
Use Labels
Always use Node labels in your patterns.
MATCH (u1:User)-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
RETURN u1.userId, u2.userId, count(p) AS cnt;
Don't match on duplicate pairs of users
This query will be executed for all pairs of users (that share a viewed page) twice. Each user will be mapped to User1 and then each user will also be mapped to User2. To limit this:
MATCH (u1:User)-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE id(u1) > id(u2)
RETURN u1.userId, u2.userId, count(p) AS cnt;
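To illustrate how the id comparison halves the result, here is a small Python model. The VIEWED data and the co_views function are hypothetical, with integer keys standing in for internal node ids.

```python
from collections import Counter
from itertools import permutations

# Hypothetical data: node id -> set of pages viewed.
VIEWED = {1: {"p1", "p2"}, 2: {"p1", "p2"}, 3: {"p2"}}


def co_views(only_one_direction):
    """Count shared pages per ordered pair of distinct users."""
    counts = Counter()
    for u1, u2 in permutations(VIEWED, 2):
        # Mimics WHERE id(u1) > id(u2): keep one ordering of each pair.
        if only_one_direction and not u1 > u2:
            continue
        shared = len(VIEWED[u1] & VIEWED[u2])
        if shared:
            counts[(u1, u2)] = shared
    return counts


if __name__ == "__main__":
    print(len(co_views(False)))  # every pair appears in both orders
    print(len(co_views(True)))   # each pair appears once
```

The counts per pair are unchanged; only the mirrored duplicates disappear.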
Query for a specific user
If you can bind either side of the pattern the query will be much faster. Do you need to execute this query for all pairs of users? Would it make sense to execute it relative to a single user only? For example:
MATCH (u1:User {name: "Bob"})-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE NOT u1=u2
RETURN u1.userId, u2.userId, count(p) AS cnt;
As you try different queries, you can prepend EXPLAIN or PROFILE to the Cypher query to see the execution plan. EXPLAIN shows the plan without running the query; PROFILE runs it and also reports the number of database hits.

neo4j match statement returns more nodes than wanted

In my DB I have a User node that is connected to a UserDevice node. The relationship between them has a property called pushId.
I'm trying to get a list of the pushIds and the device ids for specific users.
match (user:User)-[r:WITH_DEVICE]->(device:UserDevice)
where user.id="222" or user.id="243243"
RETURN r.pushId,device.id
Instead of 2 rows, it duplicates one row and returns 3 rows.
Use the DISTINCT keyword:
match (user:User)-[r:WITH_DEVICE]->(device:UserDevice)
where user.id="222" or user.id="243243"
RETURN DISTINCT r.pushId,device.id
http://neo4j.com/docs/stable/query-return.html#return-unique-results
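As a rough picture of what DISTINCT does to the returned rows, consider this Python sketch. The rows are made up, and the cause of the repeated row (for example a second WITH_DEVICE relationship between the same nodes) is an assumption that the question does not confirm.

```python
# Hypothetical rows as (r.pushId, device.id) tuples; one row is repeated.
rows = [("push-a", "dev-1"), ("push-a", "dev-1"), ("push-b", "dev-2")]


def distinct(rows):
    """Mimics RETURN DISTINCT: drop repeated rows, keep first-seen order."""
    return list(dict.fromkeys(rows))


if __name__ == "__main__":
    print(distinct(rows))
```

DISTINCT compares the whole returned row, so two rows only collapse when both pushId and device id match.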
