How to delete nodes and relationship by using aggregate function on a value - neo4j

I am using neo4j for the first time, and its fun using such an interactive database, but currently i got stuck in a problem, i have a data of people(uid,first name,last name, skills) , i also have a relationship [:has_skill]
my result frame looks like - p1 has a skill s (Robert has skill java)
I need to find out how many people have common skills, so i tried the following cypher query
match (p1:People)-[:has_skill]->(s:Skill)<-[:has_skill]-(p2:People)
where p1.people_uid="49981" and p2.people_uid="34564"
return p1.first_name+' '+p1.last_name as Person1, p2.first_name+' '+p2.last_name as Person2,s.skill_name,s.skillid,count(s)
i am getting p1 as different persons, but due to high skill set, the p2 person is getting repeated, and also the skill is not changing, i tried to delete every node and relationship where skill count of a person is greater then 6 to get good results, but cannot delete it, i am getting "invalid use of aggregating function"
This is my attempt to delete
match (p1:People)-[:has_skill]->(s:Skill)
where count(s)>6
detach delete p1,s
Please if anyone could guide or correct me where i am going wrong, your help would be highly appreciable . Thanks in advance.

Make sure when using count or other aggregating functions, they are within a WITH clause or a RETURN clause - seems to be a design decision that Neo Technology made when creating Neo4j - see some of the following links for similar cases to yours:
How to count the number of relationships in Neo4j
Neo4j aggregate function
I need to count the number of connection between two nodes with a certain property
Also - see the WITH clause documentation here and the RETURN clause documentation here, in particular, this part of the WITH documentation:
Another use is to filter on aggregated values. WITH is used to introduce aggregates which can then be used in predicates in WHERE. These aggregate expressions create new bindings in the results. WITH can also, like RETURN, alias expressions that are introduced into the results using the aliases as the binding name.
In your case, you are going to want your aggregate function to be used within a WITH clause because you need to use WHERE afterwards to filter only those persons with more than 6 skills. You can use the following query to see which persons have more than 6 skills:
match (p1:People)-[r:has_skill]->(s:Skill)
with p1,count(s) as rels, collect (s) as skills
where rels > 6
return p1,rels,skills
After confirming that the result set is correct, you can use the following query to delete the persons who have more than 6 skills along with all the skill nodes that these persons are related to:
MATCH(p1:People)-[r:has_skill]->(s:Skill)
WITH p1,count(s) as rels, collect (s) as skills
WHERE rels > 6
FOREACH(s in skills | DETACH DELETE s)
DETACH DELETE p1

Related

Cypher: Avoid cycle in multiple relationship paths

My website has pages dedicated to some events, represented by nodes in neo4j. Those events possess sub-events which are relationships under neo4j, and which correspond to links on the source page to the target page. I currently have a search engine that highlights the links to the searched events, but it is flawed by cycles in the data model. Indeed it is highlighting all the links that contain cyclic references to the current page if this page contains any link to the searched event.
The aim is therefore to have a query that is able to flag the nodes and relationships which are related to the searched events, without flagging a path only because of cyclic relationships.
I've created a small dataset representative of the issue that you can build using this query:
CREATE
(r:Event:Searched {name:'R', tag:1}),
(d:Event:Searched {name:'D', tag:1}),
(o:Event {name:'O'}),
(a:Event {name:'A'}),
(b:Event {name:'B'}),
(c:Event {name:'C'}),
(e:Event {name:'E'}),
(o)-[:hasEvent]->(a),
(o)-[:hasEvent]->(e),
(o)-[:hasEvent]->(r),
(o)-[:hasEvent]->(c),
(a)-[:hasEvent]->(b),
(b)-[:hasEvent]->(o),
(c)-[:hasEvent]->(d)
Which produces the following graph:
My aim is to have a query that only fetches nodes C and O, but not A or B, as the only reason they are flagged is that O is already flagged:
.
My current query that I need to fix is the following:
MATCH path=(upper:Event)-[:hasEvent*]->(source:Event:Searched)
RETURN upper
I hope you can help me, I couldn't manage to make similar questions' answers work on my specific case.
Ideally, the solution shouldn't be too computing-intensive, as my real model is quite big (2.300.000 nodes and 9.500.000 relationships), and the current indexing in the search engine is already quite slow.
Thank you in advance for your help
You could try out apoc.path.expandConfig procedure. It has a uniqueness property, which you can be configured that a node cannot be traversed more than once.
MATCH (n:Event)
CALL apoc.path.expandConfig(n, {
relationshipFilter: "hasEvent>",
labelFilter: "/Searched",
uniqueness: "NODE_GLOBAL"
}) YIELD path
RETURN [n IN nodes(path) WHERE NOT n:Searched] AS upper
However, this query will still return A and B, because with MATCH (n:Event) you start looking from every node (regardless of the relationship between O and A). But if I understood you correctly, you don't wan't to start from all nodes, but from a specific one ("pages dedicated to some events"). So you might want to start with, e.g., MATCH (n:Event {name: "O"}) that will return only O and C.

How to optimise recursive query - Neo4j?

I am developing a contact tracing framework using Neo4j. There are 2 types of nodes, namely Person and Location. There exists a relationship VISITED between a Person and a Location, which has properties startTS and endTS. Example:
Now suppose person 1 is infected. I need to find all the persons who have been in contact with this person. For each person identified, I need to find all other persons who have been in contact with that person. This process is repeated until an identified person has not met anyone. Here is a working code:
MATCH path = (infected:Person {id:'1'})-[*]-(otherPerson:Person)
WITH relationships(path) as rels, otherPerson
WHERE all(i in range(1, size(rels)-1)
WHERE i % 2 = 0
OR (rels[i].endTS >= rels[i-1].startTS AND rels[i].startTS <= rels[i-1].endTS)
)
RETURN otherPerson
The problem is that the process is taking way too much time to complete with large datasets. Can the above query be optimised? Thank you for your help.
For this one, unfortunately, there are some limitations on our syntax for filtering these more complex conditions during expansion. We can cover post-expansion filtering, but you'd want an upper bound otherwise this won't perform well on a more complex graph.
To get what you need today (filtering during-expansion instead of after), you would need to implement a custom procedure in Java leveraging our traversal API, and then call the procedure in your Cypher query.
Advanced syntax that can cover these cases has already been proposed for GQL, and we definitely want that in Cypher. It's on our backlog.

Get full graph that node N is a part of in neo4j

I'm trying use Cypher to get the entire graph that exists if I start at a given node in neo4j. When I say entire graph I mean all nodes and relationships that are connected to at least one other node in the graph.
I've seen examples where people can get all nodes that might be connected to a given start node with a known relationship. Examples of this include this and this, but how could I do this if I do not know the relationships?
Ultimately I'd like every node and relationship where I start at one given node and sprawl out, listing the nodes that are linked by every relationship.
I've tried this:
START n=node(441007) MATHC (n)-[:*]->(d) RETURN d
but the syntax is incorrect. I'm unsure if you can submit a wildcard relationship. Additionaly I do not think this will give me what I am looking for.
Try this:
MATCH (n)-[r*]->(d)
WHERE ID(n) = 441007
RETURN r, d
This will fan out from n (if using an older version of Neo, you should revert to your START syntax) and return you the paths to each d Node that can be reached. It is relationship type agnostic through not defining the relationship label. If you didn't care about the path you could omit itwith:
MATCH (n)-[*]->(d)
WHERE ID(n) = 441007
RETURN d
Obviously on a large graph this will get expensive!
Edit
Meant to add the link to the cheat sheet, check out the section called Patterns.
Hej WildBill,
i have created a Company Graph for learning Neo4J, so i send the following Pattern against the Graph a got this result:
START a=node(9)
MATCH (a)<-[rel]-(d)
MATCH (d)-[sk]->(skill)
RETURN a, d, skill
Node 9 is my Company, which is part of the Graph.

How to query recommendation using Cypher

I'm trying to query Book nodes for recommendation by Cypher.
I want to recommend A:Book and C:Book for A:User.
i'm sorry I need some graph to explain this question, but I could't up graph image because my lepletion lacks for upload function.
I wrote query below.
match (u1:User{uid:'1003'})-->(o1:Order)-->(b1:Book)<--(o2:Order)
<--(u2:User)-->(o3:Order)-->(b2:Book)
return b2
This query return all Books(A,B,C,D) dispite cypher's Uniqueness.
I expect to only return A:Book and C:Book.
Is this behavior Neo4j' specification?
How do I get expected return? Thanks, everyone.
environment:
Neo4j ver.v2.0.0-RC1
Using Neo4j Server with REST API
Without the sample graph its hard to say why you get something back when you expected something else. You can share a sample graph by including a create statement that would generate said graph, or by creating it in Neo4j console and putting the link in your question. Here is an example of the latter: console.neo4j.org/r/fnnz6b
In the meantime, you probably want to declare the type of the relationships in your pattern. If a :User has more than one type of outgoing relationships you will be excluding those other paths based on the labels of the nodes on the other end, which is much less efficient than to only traverse the right relationships to begin with.
To my mind its not clear whether (u:User)-->(o:Order)-->(b:Book) means that a user has one or more orders, and each order consists of one or more books; or if it means only that a user ordered a book. If you can share a sample, hopefully that will be clear too.
Edit:
Great, so looking at the graph: You get B and D back because others who bought B also bought D, and others who bought D also bought B, which is your criterion for recommendation. You can add a filter in the WHERE clause to exclude those books that the user has already bought, something like
WHERE NOT (u1)-[:BUY]->()-[:CONTAINS]->(b2)
This will give you A, C, C back, since there are two matching paths to C. It's probably not important to get two result items for C, so you can either limit the return to give only distinct values
RETURN DISTINCT(b2)
or group the return values by counting the matching paths for each result as a 'recommendation score'
RETURN b2, COUNT(b2) as score
Also, if each order only [CONTAINS] one book, you could try modelling without order, just (:User)-[:BOUGHT]->(:Book).

neo4j complex pattern searching

I'm new to NEO4J and I need help on a specific problem. Or an answer if it's even possible.
SETUP:
We have 2 distinct type of nodes: users (A,B,C,D) and Products (1,2,3,4,5,6,7,8)
Next we have 2 distinct type of relationships between users and products where a users WANTS a Product and where a product is OWNED BY a user.
1,2 is owned by A
3,4 is owned by B
5,6 is owned by C
7,8 is owned by D
Now
B wants 1
C wants 3
D wants 5
So for now, I have no problems and I created the graph data with no difficulty. My questions starts here. We have a circle, when A wants product 8.
A-[:WANTS]->8-[:OWNEDBY]->D-[:WANTS]->5-[:OWNEDBY]->C-[:WANTS]->3-[:OWNEDBY]->B-[:WANTS]->1-[:OWNEDBY]->A
So we have a distinct pattern, U-[:WANTS]->P-[:OWNEDBY]->U
Now what I want to do is to find the paths toward the start node (initiating user that wants a product) following that pattern.
How do I define this using Cypher? Or do I need another way?
Thanks upfront.
i got a feeling this can be hacked with reduce and counting every even (every second) relationship:
MATCH p=A-[:OWNEDBY|WANTS*..20]->X
WITH r in relationships(p)
RETURN type(r),count(r) as cnt,
WHERE cnt=10;
or maybe counting all paths where the number of rels is even:
MATCH p=A-[:OWNEDBY|WANTS*..]->X
RETURN p,reduce(total, r in relationships(p): total + 1) as tt
WHERE tt%2=0;
but you graph must have the strict pattern, where all incoming relationship from the set of ownedby and wants must be different from all outgoin relationships from the same set. in other words, this pattern can not exist: A-[:WANTS]->B-[:WANTS]->C or A-[:OWNEDBY]->B-[:OWNEDBY]->C
the query is probably wrong in syntax, but the logic can be implemented in cypher whn you will play more with it.
or use gremlin, I think I saw somewhere a gremlin query, where you could define a pattern and than loop n-times via that pattern further till the end node.
I've played around with this and created http://console.neo4j.org/?id=qq9v1 showing the sample graph. I've found the following cypher statement solving the issue:
start a=node(1)
match p=( a-[:WANTS]->()-[:WANTS|OWNEDBY*]-()-[:OWNEDBY]->a )
return p
order by length(p) desc
limit 1
There is just one glitch: the intermediate part [:WANTS|OWNEDBY*] does not mandate alternating WANT and OWNEDBY chains. As #ulkas stated, this should not be an issue if you take care during data modelling.
You might also look into http://api.neo4j.org/current/org/neo4j/graphalgo/GraphAlgoFactory.html to apply graph algorithms from Java code. You might use unmangaged extensions to provide REST access to that.

Resources