Neo4j Pattern Comprehension failing with path condition in where clause - neo4j

I am using Neo4j 3.4 and am struggling with this particular query
MATCH (u:User)-[:IS_A_MEMBER_OF]->(c:Church)
RETURN size([(p:Post)<-[:POSTED]-(:User)-[:IS_A_MEMBER_OF]->(c) WHERE NOT (u)-[:ACKNOWLEDGED|POSTED]->(p) | p])
This query is designed to get the number of posts for the given Church that a user has not yet acknowledged and did not post themselves. In other words, it should retrieve all the posts by members of the church, then figure out which ones the user u has neither acknowledged nor posted, and return the count.
Unfortunately, I cannot figure out why Neo4j is not doing the check in the where clause. Is there something about pattern comprehensions that I am missing? The number returned is the same for all users, regardless of whether they have acknowledged or posted any of the posts.
Thanks!

Here is a working example. I used count instead of size: size operates on lists, whereas here every match produces a row, and count aggregates all of those rows into a single result.
MATCH (u:User)-[:IS_A_MEMBER_OF]->(c:Church),
(c)<-[:IS_A_MEMBER_OF]-(:User)-[:POSTED]->(p:Post)
WHERE NOT (u)-[:ACKNOWLEDGED|POSTED]->(p)
RETURN c, u, count(p)
This returns, for every church and each member of that church, the number of posts that the member has neither acknowledged nor posted.
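To make the intended result concrete, here is a minimal in-memory sketch in plain Python (not Cypher, and not how Neo4j evaluates the query). The function name, the dictionaries, and the sample users and posts are all hypothetical. It only illustrates that the count has to depend on the individual user u.

```python
def unseen_post_counts(members, posted, acknowledged):
    """Count, per member, the church posts they neither posted nor acknowledged."""
    # All posts written by any member of the church.
    church_posts = set()
    for member in members:
        church_posts |= posted.get(member, set())
    # Per-user filter: remove the user's own posts and the ones they acknowledged.
    return {
        user: len(church_posts - posted.get(user, set()) - acknowledged.get(user, set()))
        for user in members
    }


# Hypothetical sample data: three members of one church.
members = ["alice", "bob", "carol"]
posted = {"alice": {"p1", "p2"}, "bob": {"p3"}}
acknowledged = {"bob": {"p1"}}

counts = unseen_post_counts(members, posted, acknowledged)
print(counts)
```

If every user came back with the same number here, the per-user filter would clearly not be applied, which is the symptom described in the question.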

I submitted a similar bug report. This seems to be a problem starting with 3.3.6.
https://github.com/neo4j/neo4j/issues/11967

Related

How to optimise recursive query - Neo4j?

I am developing a contact tracing framework using Neo4j. There are 2 types of nodes, namely Person and Location. There exists a relationship VISITED between a Person and a Location, which has properties startTS and endTS. Example:
Now suppose person 1 is infected. I need to find all the persons who have been in contact with this person. For each person identified, I need to find all other persons who have been in contact with that person. This process is repeated until an identified person has not met anyone. Here is working code:
MATCH path = (infected:Person {id:'1'})-[*]-(otherPerson:Person)
WITH relationships(path) as rels, otherPerson
WHERE all(i in range(1, size(rels)-1)
WHERE i % 2 = 0
OR (rels[i].endTS >= rels[i-1].startTS AND rels[i].startTS <= rels[i-1].endTS)
)
RETURN otherPerson
The problem is that the process is taking way too much time to complete with large datasets. Can the above query be optimised? Thank you for your help.
For this one, unfortunately, there are some limitations on our syntax for filtering these more complex conditions during expansion. We can cover post-expansion filtering, but you'd want an upper bound otherwise this won't perform well on a more complex graph.
To get what you need today (filtering during-expansion instead of after), you would need to implement a custom procedure in Java leveraging our traversal API, and then call the procedure in your Cypher query.
Advanced syntax that can cover these cases has already been proposed for GQL, and we definitely want that in Cypher. It's on our backlog.
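As a rough illustration of what filtering during expansion means, here is a minimal sketch in plain Python. It is not the Neo4j traversal API and not a custom procedure. The tuple layout (person, location, startTS, endTS) and the function names are assumptions made for the example. The overlap test is the same condition as in the query above, but it is checked while the search expands, so each person is expanded only once.

```python
from collections import defaultdict, deque


def overlaps(a_start, a_end, b_start, b_end):
    # Same interval test as in the Cypher query: the two visits share some time.
    return a_end >= b_start and a_start <= b_end


def trace_contacts(visits, infected):
    """Return everyone reachable from `infected` through overlapping visits."""
    by_person = defaultdict(list)
    by_location = defaultdict(list)
    for person, location, start, end in visits:
        by_person[person].append((location, start, end))
        by_location[location].append((person, start, end))

    seen = {infected}
    queue = deque([infected])
    while queue:
        current = queue.popleft()
        for location, start, end in by_person[current]:
            for other, o_start, o_end in by_location[location]:
                # Filter while expanding: skip visits that do not overlap,
                # and never expand the same person twice.
                if other not in seen and overlaps(start, end, o_start, o_end):
                    seen.add(other)
                    queue.append(other)
    seen.discard(infected)
    return seen


# Hypothetical sample data: (person, location, startTS, endTS)
visits = [
    ("1", "A", 0, 10),
    ("2", "A", 5, 15),
    ("2", "B", 20, 30),
    ("3", "B", 25, 35),
    ("4", "B", 40, 50),
]
contacts = trace_contacts(visits, "1")
print(sorted(contacts))
```

The variable-length pattern enumerates every path and filters afterwards, whereas this sketch visits each person at most once. That difference is the reason a during-expansion filter scales better.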

How to delete nodes and relationship by using aggregate function on a value

I am using Neo4j for the first time, and it's fun using such an interactive database, but I am currently stuck on a problem. I have data about people (uid, first name, last name, skills), and I also have a relationship [:has_skill].
My result frame looks like this: p1 has a skill s (Robert has skill java).
I need to find out how many people have skills in common, so I tried the following Cypher query:
match (p1:People)-[:has_skill]->(s:Skill)<-[:has_skill]-(p2:People)
where p1.people_uid="49981" and p2.people_uid="34564"
return p1.first_name+' '+p1.last_name as Person1, p2.first_name+' '+p2.last_name as Person2,s.skill_name,s.skillid,count(s)
I am getting different persons for p1, but because of the large skill sets the p2 person is repeated and the skill does not change. To get better results, I tried to delete every node and relationship where a person's skill count is greater than 6, but the delete fails with "invalid use of aggregating function".
This is my attempt to delete
match (p1:People)-[:has_skill]->(s:Skill)
where count(s)>6
detach delete p1,s
If anyone could guide me or point out where I am going wrong, your help would be highly appreciated. Thanks in advance.
Make sure that count and other aggregating functions appear within a WITH clause or a RETURN clause. This seems to be a design decision that Neo Technology made when creating Neo4j. See the following links for cases similar to yours:
How to count the number of relationships in Neo4j
Neo4j aggregate function
I need to count the number of connection between two nodes with a certain property
Also - see the WITH clause documentation here and the RETURN clause documentation here, in particular, this part of the WITH documentation:
Another use is to filter on aggregated values. WITH is used to introduce aggregates which can then be used in predicates in WHERE. These aggregate expressions create new bindings in the results. WITH can also, like RETURN, alias expressions that are introduced into the results using the aliases as the binding name.
In your case, you are going to want your aggregate function to be used within a WITH clause because you need to use WHERE afterwards to filter only those persons with more than 6 skills. You can use the following query to see which persons have more than 6 skills:
match (p1:People)-[r:has_skill]->(s:Skill)
with p1,count(s) as rels, collect (s) as skills
where rels > 6
return p1,rels,skills
After confirming that the result set is correct, you can use the following query to delete the persons who have more than 6 skills along with all the skill nodes that these persons are related to:
MATCH(p1:People)-[r:has_skill]->(s:Skill)
WITH p1,count(s) as rels, collect (s) as skills
WHERE rels > 6
FOREACH(s in skills | DETACH DELETE s)
DETACH DELETE p1
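The aggregate-then-filter step that WITH followed by WHERE performs can be mimicked in plain Python. This is only a sketch of the row logic, not of how Neo4j executes the query. The function name, the (person, skill) pair layout, and the sample data are hypothetical.

```python
from collections import defaultdict


def people_with_many_skills(has_skill, threshold=6):
    """Group skills per person first, then filter on the aggregated count."""
    skills_by_person = defaultdict(list)
    for person, skill in has_skill:
        # Equivalent of: WITH p1, count(s) AS rels, collect(s) AS skills
        skills_by_person[person].append(skill)
    # Equivalent of: WHERE rels > threshold
    return {
        person: skills
        for person, skills in skills_by_person.items()
        if len(skills) > threshold
    }


# Hypothetical sample data: one person with 7 skills, one with 2.
has_skill = [("robert", f"skill{i}") for i in range(7)]
has_skill += [("anna", "java"), ("anna", "python")]

result = people_with_many_skills(has_skill)
print(result)
```

The filter can only run after the grouping has produced a count per person, which is why the aggregate has to be introduced in WITH before WHERE can refer to it.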

Neo4j - complete a query with an alternative match if it finds few results

I am trying to write a query which looks for potential friends in a Neo4j db based on common friends and interests.
I don't want to post the whole query (part of school assignment), but this is the important part
MATCH (me:User {firstname: "Name"}), (me)-[:FRIEND]->(friend:User)<-[:FRIEND]-(potential:User), (me)-[:MEMBER]->(i:Interest)
WHERE NOT (potential)-[:FRIEND]->(me)
WITH COLLECT(DISTINCT potential) AS potentialFriends,
COLLECT(DISTINCT friend) AS friends,
COLLECT(i) as interests
UNWIND potentialFriends AS potential
/*
#HANDLING_FINDINGS
Here I count common friends, interests and try to find relationships between
potential friends too -- hence the collect/unwind
*/
RETURN potential,
commonFriends,
commonInterests,
(commonFriends+commonInterests) as totalPotential
ORDER BY totalPotential DESC
LIMIT 10
In the section #HANDLING_FINDINGS I use the found potential friends to find relationships between each other and calculate their potential (i.e. sum of shared friends and common interests) and then order them by potential.
The problem is that there might be users with no friends, to whom I would also like to recommend friends.
My question - can I somehow insert a few random users into the "potential" findings if their count is below 10 so that everyone gets a recommendation?
I have tried something like this
...
UNWIND potentialFriends AS potential
CASE
WHEN (count(potential) < 10 )
...
But that produced an error as soon as it hit the start of the CASE. I think that CASE can be used only as part of a clause like RETURN? (maybe just RETURN)
Edit with 2nd related question:
I was already thinking of matching all users and then ranking them based on common friends/interests, but wouldn't searching through the whole DB be intensive?
A CASE expression can be used wherever a value is needed, but it cannot be used as a complete clause.
With respect to your main question, you can put a WITH clause like the following between your existing WITH and UNWIND clauses:
WITH friends, interests,
CASE WHEN SIZE(potentialFriends) < 10 THEN {randomFriends} ELSE potentialFriends END AS potentialFriends
If the size of the potentialFriends collection is less than 10, the CASE expression assigns the value of the {randomFriends} parameter to potentialFriends.
As for your second question, yes it would be expensive.
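The effect of that CASE expression can be sketched in plain Python. The function name and sample values are hypothetical, and the list of random users is assumed to be supplied by the caller, just like the {randomFriends} parameter.

```python
def choose_potential_friends(potential_friends, random_friends, minimum=10):
    """Mirror of the CASE expression: fall back when there are too few results."""
    # CASE WHEN SIZE(potentialFriends) < 10 THEN {randomFriends} ELSE potentialFriends END
    if len(potential_friends) < minimum:
        return random_friends
    return potential_friends


# Hypothetical sample data.
few = ["u1", "u2"]
many = [f"u{i}" for i in range(12)]
fallback = [f"r{i}" for i in range(10)]

print(choose_potential_friends(few, fallback))
print(choose_potential_friends(many, fallback))
```

Note that, as written, the expression replaces a short list with the fallback list rather than topping it up, so the two friend-based candidates in the first call are dropped.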

Neo4j- incorrect count in multiple match query

When I am trying to execute this query
match(u:User)-[ro:OWNS]->(p:PushDevice) where p.type='gcm'
match(com:Comment)
return count(com) as total_comments,count(ro) as device
This returns the same number for both total_comments and device, namely the total number of comments.
I feel like your query should work, though I'm more confident that this will work:
MATCH (u:User)-[ro:OWNS]->(p:PushDevice) WHERE p.type='gcm'
WITH count(ro) AS device
MATCH (com:Comment)
RETURN count(com) as total_comments, device
Your query is generating a row for every combination of your MATCH results. If you just returned the ro and com values, this would be more clear. See this console for an example. That console has 2 comments and a single OWNS relationship, but the result shows 2 rows (both rows have the same OWNS relationship). So, your query is essentially counting the number of rows -- not what you expected.
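The row multiplication can be reproduced with a small Python sketch. This models the rows only and is not Neo4j itself. The identifiers ro1, com1 and com2 are made up, and the data matches the console example: one OWNS relationship and two comments.

```python
from itertools import product

owns_rels = ["ro1"]           # one OWNS relationship to a 'gcm' device
comments = ["com1", "com2"]   # two Comment nodes

# Two independent MATCH clauses yield every combination of their results.
rows = list(product(owns_rels, comments))

# count(com) and count(ro) both count rows, so they come out equal.
total_comments = len([com for _, com in rows])
device = len([ro for ro, _ in rows])

# COUNT(DISTINCT ...) undoes the duplication after the fact.
distinct_comments = len({com for _, com in rows})
distinct_devices = len({ro for ro, _ in rows})

# Aggregating before the second MATCH avoids building the product at all.
device_first = len(owns_rels)
comments_after = len(comments)

print(rows)
print(total_comments, device)
print(distinct_comments, distinct_devices)
```

Both plain counts equal the number of rows, while the distinct counts and the aggregate-first approach recover the separate totals.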
Here is an example of a query that would work as you expected:
MATCH (u:User)-[ro:OWNS]->(p:PushDevice {type:'gcm'})
WITH COUNT(ro) AS device
MATCH (com:Comment)
RETURN count(com) AS total_comments, device;
[EDITED]
This would also work logically, but is less performant (as it takes a cartesian product and then filters out duplicates):
MATCH (u:User)-[ro:OWNS]->(p:PushDevice { type: 'gcm' })
MATCH (com:Comment)
RETURN COUNT(DISTINCT com), COUNT(DISTINCT ro);
Observation
The power of neo4j comes from its efficient handling of relationships. So, the most efficient queries tend to be for connected subgraphs (where all nodes are connected by relationships).
Since your query is not for a single connected subgraph, getting the answer you want is naturally going to be a bit more convoluted and can be inefficient.
If you determine that the suggested queries are too slow, you can try making 2 separate queries instead. That may also make your code easier to understand.

How to query recommendation using Cypher

I'm trying to query Book nodes for recommendation by Cypher.
I want to recommend A:Book and C:Book for A:User.
I'm sorry, a graph image would help explain this question, but I couldn't upload one because my reputation is too low to use the upload function.
I wrote query below.
match (u1:User{uid:'1003'})-->(o1:Order)-->(b1:Book)<--(o2:Order)
<--(u2:User)-->(o3:Order)-->(b2:Book)
return b2
This query returns all Books (A, B, C, D) despite Cypher's uniqueness.
I expect it to return only A:Book and C:Book.
Is this behavior part of Neo4j's specification?
How do I get expected return? Thanks, everyone.
environment:
Neo4j ver.v2.0.0-RC1
Using Neo4j Server with REST API
Without the sample graph it's hard to say why you get something back when you expected something else. You can share a sample graph by including a create statement that would generate said graph, or by creating it in Neo4j console and putting the link in your question. Here is an example of the latter: console.neo4j.org/r/fnnz6b
In the meantime, you probably want to declare the type of the relationships in your pattern. If a :User has more than one type of outgoing relationships you will be excluding those other paths based on the labels of the nodes on the other end, which is much less efficient than to only traverse the right relationships to begin with.
To my mind it's not clear whether (u:User)-->(o:Order)-->(b:Book) means that a user has one or more orders, and each order consists of one or more books; or if it means only that a user ordered a book. If you can share a sample, hopefully that will be clear too.
Edit:
Great, so looking at the graph: You get B and D back because others who bought B also bought D, and others who bought D also bought B, which is your criterion for recommendation. You can add a filter in the WHERE clause to exclude those books that the user has already bought, something like
WHERE NOT (u1)-[:BUY]->()-[:CONTAINS]->(b2)
This will give you A, C, C back, since there are two matching paths to C. It's probably not important to get two result items for C, so you can either limit the return to give only distinct values
RETURN DISTINCT(b2)
or group the return values by counting the matching paths for each result as a 'recommendation score'
RETURN b2, COUNT(b2) as score
Also, if each order only [CONTAINS] one book, you could try modelling without order, just (:User)-[:BOUGHT]->(:Book).
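The filtering and scoring described above can be sketched in plain Python. The purchase data below is hypothetical (the original sample graph is not shown here) and was chosen so that the result matches the A, C, C outcome mentioned above. Orders are collapsed into a direct user-to-books mapping, which is a simplifying assumption.

```python
from collections import Counter


def recommend(purchases, user):
    """Score books bought by users who share a purchase with `user`."""
    own = purchases[user]
    scores = Counter()
    for shared_book in own:
        for other, books in purchases.items():
            if other == user or shared_book not in books:
                continue
            for candidate in books:
                # Skip the book the path came through, and apply the
                # WHERE NOT filter: exclude books the user already bought.
                if candidate != shared_book and candidate not in own:
                    scores[candidate] += 1  # one matching path
    return scores


# Hypothetical purchases: user -> set of books bought.
purchases = {
    "u1": {"B", "D"},
    "u2": {"B", "C", "D"},
    "u3": {"A", "D"},
}

scores = recommend(purchases, "u1")
print(scores.most_common())
```

The keys of the counter correspond to RETURN DISTINCT b2, and the values to COUNT(b2) used as a recommendation score.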
