Cypher find paths query - path

I don't have much experience with Cypher path queries, but paths seem to be the sensible way to do what I want efficiently...
I have the following relevant relationships in my Neo4j:
p1-[r:SNEAK]->otherProfile
p1-[r:FRIEND]-otherProfile (the direction not relevant)
p1-[r:HANG]->venue<-[r:HANG]-otherProfile
p1-[r:INTERACT]->session<-[r:INTERACT]-otherProfile
p1-[r:INTERACT]->session<-[r:LIKE]-otherProfile
Let's say I have p1 in hand. I want a query that retrieves all the profiles for which at least one of the following conditions holds (with distinct profiles):
p1 sneaked at them
p1 is a friend or a friend of a friend of theirs
p1 hangs at at least one of the same venues as them
p1 has a session with them
p1 has a session and they liked it
I also need to be able to extract the relationship types in order to figure out how these profiles are related...
Let's assume that these are all the relationship types in the DB.
At first it seems simple: just retrieve all the paths below:
p=p1-[r*1..2]-profile
There are a few problems with that:
1) It will also return profiles that sneaked at p1
2) It will also return profiles that one of p1's friends sneaked at
3) It will also return profiles that are friends of profiles p1 sneaked at
Is it possible to perform one Cypher query which will do the job for my use-case?

An easy shortcut is to use a path predicate in the WHERE clause with NOT. You can do your match and then specify WHERE NOT (p1)-[:SNEAK*1..2]->(profile), or however you want to qualify it.
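If a single variable-length pattern matches too much, another option is to spell out each allowed pattern and combine them with UNION. The following is only a sketch: the labels Profile, Venue and Session, the id property and the $profileId parameter are assumptions that do not appear in the question.

```cypher
// Assumed labels (Profile, Venue, Session) and parameter ($profileId).
// p1 sneaked at them (direction matters)
MATCH (p1:Profile {id: $profileId})-[r:SNEAK]->(other:Profile)
RETURN other, [type(r)] AS relTypes
UNION
// friend or friend of friend (direction ignored)
MATCH (p1:Profile {id: $profileId})-[rs:FRIEND*1..2]-(other:Profile)
WHERE other <> p1
RETURN other, [r IN rs | type(r)] AS relTypes
UNION
// both hang at the same venue
MATCH (p1:Profile {id: $profileId})-[r1:HANG]->(:Venue)<-[r2:HANG]-(other:Profile)
WHERE other <> p1
RETURN other, [type(r1), type(r2)] AS relTypes
UNION
// p1 interacted with a session the other profile interacted with or liked
MATCH (p1:Profile {id: $profileId})-[r1:INTERACT]->(:Session)<-[r2:INTERACT|LIKE]-(other:Profile)
WHERE other <> p1
RETURN other, [type(r1), type(r2)] AS relTypes
```

UNION removes duplicate rows, but a profile that qualifies through several patterns still appears once per distinct relTypes list, so a final grouping step is needed if exactly one row per profile is required.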

Related

How to delete nodes and relationship by using aggregate function on a value

I am using Neo4j for the first time, and it's fun using such an interactive database, but I am currently stuck on a problem. I have data about people (uid, first name, last name, skills), and I also have a relationship [:has_skill].
My result frame looks like this: p1 has a skill s (Robert has skill java).
I need to find out how many people have common skills, so I tried the following Cypher query:
match (p1:People)-[:has_skill]->(s:Skill)<-[:has_skill]-(p2:People)
where p1.people_uid="49981" and p2.people_uid="34564"
return p1.first_name+' '+p1.last_name as Person1, p2.first_name+' '+p2.last_name as Person2,s.skill_name,s.skillid,count(s)
I am getting different persons for p1, but because of the large skill sets the p2 person is repeated and the skill does not change. To get better results I tried to delete every node and relationship where a person's skill count is greater than 6, but the delete fails with "invalid use of aggregating function".
This is my attempt to delete:
match (p1:People)-[:has_skill]->(s:Skill)
where count(s)>6
detach delete p1,s
Please, could anyone guide me or correct me where I am going wrong? Your help would be highly appreciated. Thanks in advance.
Make sure that count and other aggregating functions are used within a WITH clause or a RETURN clause. This seems to be a design decision that Neo Technology made when creating Neo4j. See the following links for cases similar to yours:
How to count the number of relationships in Neo4j
Neo4j aggregate function
I need to count the number of connection between two nodes with a certain property
Also - see the WITH clause documentation here and the RETURN clause documentation here, in particular, this part of the WITH documentation:
Another use is to filter on aggregated values. WITH is used to introduce aggregates which can then be used in predicates in WHERE. These aggregate expressions create new bindings in the results. WITH can also, like RETURN, alias expressions that are introduced into the results using the aliases as the binding name.
In your case, you are going to want your aggregate function to be used within a WITH clause because you need to use WHERE afterwards to filter only those persons with more than 6 skills. You can use the following query to see which persons have more than 6 skills:
match (p1:People)-[r:has_skill]->(s:Skill)
with p1,count(s) as rels, collect (s) as skills
where rels > 6
return p1,rels,skills
After confirming that the result set is correct, you can use the following query to delete the persons who have more than 6 skills along with all the skill nodes that these persons are related to:
MATCH(p1:People)-[r:has_skill]->(s:Skill)
WITH p1,count(s) as rels, collect (s) as skills
WHERE rels > 6
FOREACH(s in skills | DETACH DELETE s)
DETACH DELETE p1
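Coming back to the original goal of counting the skills two people share: every non-aggregated expression in RETURN acts as a grouping key, so returning s.skill_name and s.skillid next to count(s) yields one row per skill with a count of 1. A sketch that groups by the two people only, using the labels and properties from the question (the aliases commonSkills and skillNames are made up for illustration):

```cypher
// Group by the two people only; the skills are aggregated, not returned as keys.
MATCH (p1:People {people_uid: "49981"})-[:has_skill]->(s:Skill)<-[:has_skill]-(p2:People {people_uid: "34564"})
RETURN p1.first_name + ' ' + p1.last_name AS Person1,
       p2.first_name + ' ' + p2.last_name AS Person2,
       count(s) AS commonSkills,
       collect(s.skill_name) AS skillNames
```

This needs no deletes at all, which may be preferable to removing people and skills from the graph just to tidy up the result.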

Why doesn't Neo4j allow undirected or bidirectional relationships at creation time?

I know that Neo4j requires a relationship direction at creation time, but allows this direction to be ignored at query time. This way I can query my graph ignoring the relationship direction.
I also know that there are some workarounds for cases where the relationships are naturally bidirectional or undirected, as described here.
My question is: why is it implemented that way? Is there a good reason not to allow undirected or bidirectional relationships at creation time? Is it a limitation of the database architecture?
Cypher statements like the ones below are not allowed:
CREATE ()-[:KNOWS]-()
CREATE ()<-[:KNOWS]->()
I searched the web for an answer, but I did not find much. For example, this github issue.
It is strange to have to define a direction for a relationship that doesn't have one. It seems to me that I'm hurting the semantics of my graph.
EDIT 1:
To clarify my standpoint about a "semantic problem" (maybe the term is wrong):
Suppose that I run this simple CREATE statement:
CREATE (a:Person {name:'a'})-[:KNOWS]->(b:Person {name:'b'})
As a result I have this very simple graph:
The :KNOWS relationship has a direction only because Neo4j requires a relationship direction at creation time. In my domain a knows b and b knows a.
Now, a new team member will query my graph with this Cypher query:
MATCH path = (a:Person {name:'a'})-[:KNOWS]-(b:Person {name:'b'})
return path
This new team member doesn't know that when I created this graph I considered the :KNOWS relationship to be undirected. The result that he will see is the same:
From this result the new team member may think that only Person a knows Person b. That seems bad to me. Doesn't it to you? Does this make any sense?
Fundamentally, it boils down to the internals of how the data is stored on disk in Neo4j -- note Chapter 6 of the O'Reilly Neo4j e-book.
In the data structure of a relationship they have a "firstNode" and a "secondNode", where each is either the left or the right hand side of the relationship.
To flag a relationship as uni/bi-directional would require an additional bit per node, where I would argue it is better to retain the direction in the data store and just ignore direction during querying.
In Neo4j relationships are always directed.
But if you don't care about the direction, you can ignore the direction when querying.
MATCH (p1:Person {name:"me"})-[:KNOWS]-(p2)
RETURN p2;
And with MERGE you can also leave off the direction when creating.
MATCH (p1:Person {name:"me"})
MATCH (p2:Person {name:"you"})
MERGE (p1)-[:KNOWS]-(p2);
You only need 2 relationships if they really convey a different meaning, e.g. :FOLLOWS on Twitter.
It seems to me that I'm hurting the semantics of my graph.
I can't see why a < or > symbol used during creation of a relationship hurts the semantics of your graph if you are going to not use that symbol during matching (and thus treating that relationship as undirected/bidirectional).
Suppose that the syntax proposed by you is supported. Now how will you connect with an undirected relationship two nodes a and b? You still have two options:
CREATE (a)-[:KNOWS]-(b)
CREATE (b)-[:KNOWS]-(a)
The pair (a, b) is always ordered by appearance even if not by semantics. So even if we remove the < or > symbol from the relationship declaration, the problem with the order of nodes in it cannot be eliminated. Therefore, simply don't treat it as a problem.

Adding relationship to existing nodes with Cypher doesn't work

I am working on the Panama dataset using Neo4J graph database 1.1.5 web version. I identified Ion Sturza, former Prime Minister of Moldova, in the database and want to make a map of his related network. I used the following code to query using Cypher (creating a variable 'IonSturza'):
MATCH (IonSturza {name: "Ion Sturza"}) RETURN IonSturza
I found that the entity 'CONSTANTIN LUTSENKO' is linked to entities like 'Quade..' and 'Kinbo...' separately, under a version of the name in small letters, as in this picture. I therefore want to create a relationship 'SAME_COMPANY_AS' between the all-caps and the uncapped version. I tried the following code based on this answer by #StefanArmbruster:
MATCH (a:Officer {name :"Constantin Lutsenko"}),(b:Officer{name :
"CONSTANTIN LUTSENKO"})
where (a:Officer{name :"Constantin Lutsenko"})-[:SHAREHOLDER_OF]->
(b:Entity{id:'284429'})
CREATE (a)-[:SAME_COMPANY_AS]->(b)
Instead of indexing, I used the 'where' statement to specify the uncapped version which is linked only to the entity bearing id '284429'.
My code however shows the cartesian product error message:
This query builds a cartesian product between disconnected patterns. If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (b))
Also when I execute, there are no changes, no rows!! What am I missing here? Can someone please help me with inserting this relationship between the nodes. Thanks in advance!
The cartesian product warning will appear whenever you're matching on two or more disconnected patterns. In this case, however, it's fine, because you're looking up both of them by what is likely a unique name, so your result should be one node each.
If each separate part of that pattern returned multiple nodes, then you would have (rows of a) x (rows of b), a cartesian product between the two result sets.
So in this particular case, don't mind the warning.
As for why you're not seeing changes, note that you're reusing variables for different parts of the graph: you're using variable b for both the uppercase version of the officer, and for the :Entity in your WHERE. There is no node that matches to both.
Instead, use different variables for each, and include the :Entity in your match. Also, once you match to nodes and bind them to variables, you can reuse the variable names later in your query without having to repeat its labels or properties.
Try this:
MATCH (a:Officer {name :"Constantin Lutsenko"})-[:SHAREHOLDER_OF]->
(:Entity{id:'284429'}),(b:Officer{name : "CONSTANTIN LUTSENKO"})
CREATE (a)-[:SAME_COMPANY_AS]->(b)
Though I'm not quite sure of what you're trying to do...is an :Officer a company? That relationship type doesn't quite seem right.
I tried the answer by #InverseFalcon and thanks to it, by modifying the property identifier from 'id' to 'name' and using the property for both 'a' and 'b', 4 relationships were created by the following code:
MATCH (a:Officer {name :"Constantin Lutsenko"})-[:SHAREHOLDER_OF]->
(:Entity{name:'KINBOROUGH PORTFOLIO LTD.'}),
(b:Officer{name : "CONSTANTIN LUTSENKO"})-[:SHAREHOLDER_OF]->
(:Entity{name:'Chandler Group Holdings Ltd'})
CREATE (a)-[:SAME_NAME_AS]->(b)
Thank you so much #InverseFalcon!

Optimizing neo4j queries involving creation of relationships

I'm interested in creating relationships between two nodes having certain properties. The Neo4j query for this could be written as:
MATCH (x:User {username: "user2064000"}), (y:User {username: "user2064001"}) MERGE (x)-[:KNOWS]->(y)
While the query does have the intended effect, the Neo4j web console also warns about the query creating a cartesian product (and about them being slow).
How should I rewrite the above query in order to prevent a cartesian product?
This is just a warning, and in your case you don't have to worry about it, because you are doing the following cartesian product: 1 x 1 (I assume that you have a unique constraint on username).
This warning appears when you describe two disjoint patterns in a single MATCH clause.
Cheers.
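For the 1 x 1 assumption to hold, username has to be unique. A sketch of how that might be set up follows; the constraint name user_username_unique is made up, and the exact constraint syntax depends on the Neo4j version (older releases use CREATE CONSTRAINT ON (u:User) ASSERT u.username IS UNIQUE).

```cypher
// Assumption: Neo4j 4.4 or later syntax; the constraint name is hypothetical.
CREATE CONSTRAINT user_username_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.username IS UNIQUE;

// Same lookup as in the question, written as two MATCH clauses.
// With the constraint in place, each MATCH finds at most one node.
MATCH (x:User {username: "user2064000"})
MATCH (y:User {username: "user2064001"})
MERGE (x)-[:KNOWS]->(y);
```

As far as I know, writing the lookups as separate MATCH clauses also keeps the notification from being shown, although the work done is the same.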

How to select relationships spreading from neo4j?

We have a scenario where we display to the user how pictures (or messages) spread through relationships.
For example: Relationship 1 of Node A has a message "Foo", Relationship 2 of Node 2 also has the same message "Foo" ... Relationship n of Node n also has the same message "Foo".
Now we want to display a relationship graph by querying Neo4j.
This is my query:
MATCH (a)-[r1]-()-[r2]-()-[r3]-()-[r4]-()
WHERE a.id = '59072662'
and r2.message_id = r1.target_message_id
and r3.message_id = r2.target_message_id
and r4.message_id = r3.target_message_id
RETURN r1,r2,r3,r4
The problem is, this query does not work if there are only 2 levels of linking. If there is only a r1 and r2, this query returns nothing.
Please tell me how to write a Cypher query that returns the set of relationships for my case.
Adding to Stefan's answer.
If you want to keep track of how pictures spread then you would also include a relationship to the image like:
(message)-[:INCLUDES]->(image)
If you want how a specific picture got spread in the message network:
MATCH (i:Image {url: "X"}), p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
WHERE (m)-[:INCLUDES]->(i) WITH length(p) as length, sender ORDER BY length
RETURN DISTINCT sender
This will return all senders, ordered by path length, so the top one should be the original sender.
If you're just interested in the original sender you could use LIMIT 1.
Alternatively, if you find yourself traversing huge networks and hitting performance issue because of the massive paths that have to be traversed, you could also add a relationship between the message and the original uploader.
As for the question you posted at the bottom, about how to get the set of relationships in a variable-length path:
You define a path, like in the example above
p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
Then, to access the relationships in that path, you use the rels function (called relationships in newer Neo4j versions, where rels has been removed)
RETURN rels(p)
You didn't provide much details on your use case. From my experience I suggest that you rethink your way of graph data modelling.
A message seems to be a central concept in your domain. Therefore the message should probably be modeled as a node. To connect (a) and (b) via message (m), you might use something like (a)-[:SENT]->(m {message_id: ....})-[:TO]->(b).
Using this (m) could easily have a REFERS_TO relationship to another message making the query above way more graphy.
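With messages as nodes, the fixed four-hop pattern from the question can become a variable-length one, so chains of any depth are returned. This is a sketch under assumptions: the labels User and Message and the direction of REFERS_TO (a later message points at the one it refers to) are not specified anywhere above.

```cypher
// Start from the messages sent by the user from the question.
MATCH (a:User {id: '59072662'})-[:SENT]->(m:Message)
// Follow REFERS_TO backwards for zero or more hops, so a chain with
// only one or two levels still produces rows.
MATCH p = (m)<-[:REFERS_TO*0..]-(later:Message)
RETURN later, length(p) AS hops, relationships(p) AS chain
ORDER BY hops
```

An upper bound such as *0..10 may be worth adding on large graphs, since unbounded variable-length patterns can become expensive.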

Resources