I'm trying to answer a simple question: in a graph with a "knows" relationship, does some person A know some person B? Ideally I would answer this question with either true or false, but I'm failing to do so.
I found the following in another StackOverflow question which is almost what I want, except that apart from just answering my question, it also changes the graph:
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
MERGE (p)-[r:KNOWS]->(b)
ON CREATE SET r.alreadyExisted=false
ON MATCH SET r.alreadyExisted=true
RETURN r.alreadyExisted;
In the end I would like to put this in a Spring Data Neo4j repository like this:
public interface PersonRepository extends GraphRepository<Person> {
boolean knows(final Long me, final Long other);
}
That means that if there is a way to do it without Cypher, using Spring's query and finder methods, that would be fine too.
The Cypher query for this is simple; the key is the EXISTS() function, which returns true if the pattern given to it exists in the graph and false otherwise.
Here's the Cypher query:
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
RETURN EXISTS( (p)-[:KNOWS]-(b) )
You can even make it more concise:
RETURN EXISTS( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) )
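To make the yes/no semantics concrete, here is a toy plain-Java model of the boolean that a knows(me, other) repository method would return. KnowsGraph and its methods are hypothetical; this is not Spring Data Neo4j or the Neo4j driver. Because the pattern (p)-[:KNOWS]-(b) has no arrow, a KNOWS edge in either direction counts; the directed variant corresponds to (p)-[:KNOWS]->(b).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of :Person nodes linked by :KNOWS relationships.
// Not a Neo4j or Spring Data API; it only mirrors the boolean the query returns.
class KnowsGraph {
    // Each directed KNOWS edge is stored as "fromUserId->toUserId".
    private final Set<String> knowsEdges = new HashSet<>();

    void addKnows(long fromUserId, long toUserId) {
        knowsEdges.add(fromUserId + "->" + toUserId);
    }

    // Like exists((:Person {userId: me})-[:KNOWS]-(:Person {userId: other})):
    // no arrow in the pattern, so an edge in either direction counts.
    boolean knows(long me, long other) {
        return knowsEdges.contains(me + "->" + other)
                || knowsEdges.contains(other + "->" + me);
    }

    // Like exists((:Person {userId: me})-[:KNOWS]->(:Person {userId: other})).
    boolean knowsDirected(long me, long other) {
        return knowsEdges.contains(me + "->" + other);
    }
}
```

After addKnows(0, 1), both knows(0, 1) and knows(1, 0) return true, while knowsDirected(1, 0) returns false.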
As a complementary note to what @InverseFalcon said:
// first
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
RETURN exists( (p)-[:KNOWS]-(b) )
// second
RETURN exists( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) )
There is a difference between the two examples that were provided:
the first one builds a Cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this will build a Cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH
In short, if you have 5 persons in your database, P = {0, 1, 2, 3, 4}:
the first query naively checks |P| × |P| = 5 × 5 = 25 candidate pairs of Person nodes, keeping only the pair where the first Person has userId 0 and the second has userId 1;
the second query only checks whether a path exists between the Person with userId 0 and the Person with userId 1.
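The arithmetic above can be sketched in plain Java. CartesianDemo is a hypothetical toy that models the naive evaluation the answer describes: build every pair from the two disconnected patterns, then filter. The real Neo4j planner may apply each userId predicate before combining the two sides (especially with an index), so treat 25 as the worst case rather than what PROFILE will necessarily show.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy: naive evaluation of two disconnected patterns over the same
// set of person ids. Not how the Neo4j planner necessarily executes the query.
class CartesianDemo {
    // Every (a, b) pair: |persons| x |persons| rows.
    static List<long[]> crossProduct(List<Long> persons) {
        List<long[]> pairs = new ArrayList<>();
        for (long a : persons) {
            for (long b : persons) {
                pairs.add(new long[] {a, b});
            }
        }
        return pairs;
    }

    // Keep only pairs whose first id is `first` and second id is `second`.
    static List<long[]> filterPairs(List<long[]> pairs, long first, long second) {
        List<long[]> kept = new ArrayList<>();
        for (long[] p : pairs) {
            if (p[0] == first && p[1] == second) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

With the five ids 0 to 4, crossProduct yields 25 pairs, and filtering for (0, 1) leaves exactly one.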
Also, because exists can be both a function and a keyword, the convention is to write the function in lowercase, exists( ... ), and the keyword in uppercase, as in this existential subquery:
MATCH (n)
// Use an existential subquery to filter.
WHERE EXISTS {
  MATCH (n)-->(m) WHERE n.userId = m.userId
}
RETURN n
You can also alias the return value with AS, for example:
RETURN exists( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) ) AS knows
So this is a very basic question. I am trying to write a Cypher query that creates a node and connects it to multiple nodes.
As an example, let's say I have a database with towns and cars. I want to create a query that:
creates people, and
connects them with the town they live in and any cars they may own.
So here goes:
Here's one way I tried this query (I have WHERE clauses that specify which town and which cars, but I've left them out to simplify):
MATCH (t: Town)
OPTIONAL MATCH (c: Car)
MERGE a = ((c) <-[:OWNS_CAR]- (p:Person {name: "John"}) -[:LIVES_IN]-> (t))
RETURN a
But this returns multiple people named John - one for each car he owns!
In two queries:
MATCH (t:Town)
MERGE a = ((p:Person {name: "John"}) -[:LIVES_IN]-> (t))
MATCH (p:Person {name: "John"})
OPTIONAL MATCH (c:Car)
MERGE a = ((p) -[:OWNS_CAR]-> (c))
This gives me the result I want, but I was wondering if I could do this in 1 query. I don't like the idea that I have to find John again! Any suggestions?
It took me a bit to wrap my head around why MERGE sometimes creates duplicate nodes when I didn't intend that. This article helped me.
The basic insight is that it would be best to merge the Person node first before you match the towns and cars. That way you won't get a new Person node for each relationship pattern.
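Here is a toy plain-Java model of that insight, assuming two cars and one town. MergeJohnDemo is hypothetical and only mimics MERGE's all-or-nothing matching: the question's query runs the full-pattern MERGE once per car row, and because no existing John owns the second car, the whole pattern, including a new Person, is created again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy graph, only to show why MERGE on a whole pattern can create
// one Person per input row. Not a Neo4j API.
class MergeJohnDemo {
    static class Person {
        final String name;
        final List<String> cars = new ArrayList<>();
        final List<String> towns = new ArrayList<>();
        Person(String name) { this.name = name; }
    }

    final List<Person> persons = new ArrayList<>();

    // Like MERGE (c)<-[:OWNS_CAR]-(p:Person {name})-[:LIVES_IN]->(t) for ONE row:
    // reuse a Person only if the whole pattern already exists, else create all of it.
    void mergeWholePattern(String name, String car, String town) {
        for (Person p : persons) {
            if (p.name.equals(name) && p.cars.contains(car) && p.towns.contains(town)) {
                return; // whole pattern matched, nothing to create
            }
        }
        Person p = new Person(name); // no full match: a brand new Person
        p.cars.add(car);
        p.towns.add(town);
        persons.add(p);
    }

    // Like MERGE (p:Person {name}) once, then MERGE each relationship from that p.
    void mergePersonFirst(String name, List<String> cars, String town) {
        Person john = null;
        for (Person p : persons) {
            if (p.name.equals(name)) { john = p; break; }
        }
        if (john == null) { john = new Person(name); persons.add(john); }
        for (String car : cars) {
            if (!john.cars.contains(car)) john.cars.add(car);
        }
        if (!john.towns.contains(town)) john.towns.add(town);
    }
}
```

Calling mergeWholePattern once per car leaves two John nodes; mergePersonFirst leaves one John who owns both cars.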
If Person nodes are uniquely identified by their name properties, a unique constraint would prevent you from creating duplicates even if you run a mistaken query.
If a person can have multiple cars and residences in multiple towns, you also want to avoid a cartesian product of cars and towns in your result set before you do the merge. Try using the table output in Neo4j Browser to see how many rows are getting returned before you do the MERGE to create relationships.
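To see the row multiplication being warned about, here is a small plain-Java sketch (RowCountDemo is hypothetical). It assumes, as OPTIONAL MATCH does, that an empty match still yields one row with a null, and counts the rows produced by chaining the car and town matches with no aggregation in between. In the query below, the WITH p, COLLECT(c) as cars step collapses the car rows back to one row per p before the towns are matched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy: rows produced by chaining two OPTIONAL MATCH clauses
// (cars, then towns) with no aggregation in between. Not a Neo4j API.
class RowCountDemo {
    // OPTIONAL MATCH with no result still yields one row, with null for the variable.
    static List<String> optional(List<String> matches) {
        return matches.isEmpty() ? Collections.<String>singletonList(null) : matches;
    }

    // One output row per (car, town) combination: a Cartesian product.
    static List<String[]> chainedRows(List<String> cars, List<String> towns) {
        List<String[]> rows = new ArrayList<>();
        for (String car : optional(cars)) {
            for (String town : optional(towns)) {
                rows.add(new String[] {car, town});
            }
        }
        return rows;
    }
}
```

Two cars and two towns give 4 rows before the MERGE; with no matching cars, the two towns still give 2 rows (with a null car).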
Here's how I would approach your query.
MERGE (p:Person {name:"John"})
WITH p
OPTIONAL MATCH (c:Car)
WHERE c.licensePlate in ["xyz123", "999aaa"]
WITH p, COLLECT(c) as cars
OPTIONAL MATCH (t:Town)
WHERE t.name in ["Lexington", "Concord"]
WITH p, cars, COLLECT(t) as towns
FOREACH(car in cars | MERGE (p)-[:OWNS]->(car))
FOREACH(town in towns | MERGE (p)-[:LIVES_IN]->(town))
RETURN p, towns, cars
I'm wondering: once I have read a node's data and want to match the same node in another query, which approach performs best? Using the ID like this:
MATCH (n) WHERE ID(n) = 1234
or using an indexed property of the node:
MATCH (n:Label {SomeIndexProperty: 3456})
Which one is better?
IDs are technical identifiers internal to Neo4j, and they should not be used as primary keys for your application.
Every node (and relationship) has a technical ID, and it's stable over time.
But if you delete a node, for example the node 32, Neo4j will reuse this ID for a new node.
So you can safely use an ID in your queries within the same transaction; outside of it, you should know what you are doing.
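This plain-Java toy shows why a stored technical ID can silently point at a different node later. ToyNodeStore is hypothetical, not Neo4j code: once a node is deleted, its ID goes back into a pool and can be handed to the next new node, so an application that kept the old ID now reads someone else. Whether and when Neo4j actually reuses a given ID depends on version and configuration; the sketch only models the reuse described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy store whose ID allocator reuses freed IDs, mimicking the
// reuse described above. Not Neo4j's actual allocator.
class ToyNodeStore {
    private final Map<Long, String> nodes = new HashMap<>();
    private final Deque<Long> freedIds = new ArrayDeque<>();
    private long nextId = 0;

    // Prefer a previously freed ID; otherwise allocate a fresh one.
    long create(String name) {
        long id = freedIds.isEmpty() ? nextId++ : freedIds.pop();
        nodes.put(id, name);
        return id;
    }

    void delete(long id) {
        if (nodes.remove(id) != null) {
            freedIds.push(id); // this ID may be handed out again
        }
    }

    String get(long id) {
        return nodes.get(id);
    }
}
```

If an application stores Alice's ID, deletes her node, and then Bob is created, looking up the stored ID returns Bob.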
The only way to retrieve the technical ID is to use the ID function, as you do in your first query: MATCH (n) WHERE ID(n) = 1234 RETURN n.
The ID is not exposed as a node's property, so you can't do MATCH (n {ID:1234}) RETURN n.
As you have noticed, when the WHERE condition is a strict equality, you can put it directly on the node.
For example:
MATCH (n:Node) WHERE n.name = 'logisima' RETURN n
MATCH (n:Node {name:'logisima'}) RETURN n
Those two queries are identical: they generate the same query plan, and the inline form is just syntactic sugar.
Is it faster to retrieve a node by its ID or by an indexed property ?
The easiest way to answer this question is to profile the two queries.
On the ID-based one, you will see a NodeByIdSeek operator costing 1 db hit, and on the one using a unique constraint you will see a NodeUniqueIndexSeek operator with 2 db hits.
So searching a node by its ID is faster.
Trying to understand when to use a property value vs a WHERE clause.
$Match (g:GROUP {GroupID: 1}) RETURN g
gives the expected response (all reported properties as expected).
And,
$match (a:ADDRESS {AddressID: 454}) return a
gives the expected response (all reported properties as expected).
However, the combo in a MERGE
MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454}) Return g.ShortName, type(r), a.Line1;
creates two new nodes (with no properties, of course, except a redundant AddressID and GroupID). The AddressID and GroupID were created with toInt(), and I also tried wrapping the property values in toInt() (same result):
Added 2 labels, created 2 nodes, set 2 properties, created 1 relationship, returned 1 row in 77 ms.
So, after using DETACH DELETE on the extraneous nodes, I tried again with this (which works):
Match (g:GROUP) WHERE g.GroupID = 1
Match (a:ADDRESS) WHERE a.AddressID = 454
MERGE (g)-[r:USES]->(a)
RETURN g.ShortName, type(r), a.Line1
Returned 1 row in 14 ms.
WHY does the separate MATCHing work while the property spec does not?
MERGE is one of the trickier clauses for exactly this behavior.
From the Cypher documentation for MERGE:
When using MERGE on full patterns, the behavior is that either the
whole pattern matches, or the whole pattern is created. MERGE will not
partially use existing patterns — it’s all or nothing. If partial
matches are needed, this can be accomplished by splitting a pattern up
into multiple MERGE clauses.
So when you're going to MERGE a pattern, and you aren't using variables bound to already existing nodes, then the entire pattern is matched, or if it doesn't exist, the entire pattern is created, which, in your case, creates duplicate nodes, as your intent is to use existing nodes in the MERGE.
In general, when you want to MERGE a relationship or pattern between nodes that already exist, it's best to MATCH or MERGE on the nodes which should already exist first, and then MERGE the pattern with the matched or merged variables.
EDIT
I think there's some confusion here about the reasons for the differences in the queries.
This doesn't have anything to do with whether the properties are defined in a WHERE clause, or inline on the nodes in the MATCH clauses.
In fact, you can do this just fine with your last query, and it will behave identically:
Match (g:GROUP {GroupID:1})
Match (a:ADDRESS {AddressID:454})
MERGE (g)-[r:USES]->(a)
RETURN g.ShortName, type(r), a.Line1
The reason for the difference is, again, the behavior of MERGE.
Really, the easiest way to grasp what's going on is to consider what the behavior would be if MERGE were replaced first with MATCH and then, if no match was found, with CREATE. With g and a already bound by the two MATCH clauses, MERGE (g)-[r:USES]->(a) first attempts:
MATCH (g)-[r:USES]->(a)
and if there is no match, it does a CREATE instead:
CREATE (g)-[r:USES]->(a)
That should make sense: a CREATE with already-bound nodes creates only the missing part, the relationship.
Contrast that with using MERGE on the entire pattern:
MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
Return g.ShortName, type(r), a.Line1;
First this will attempt a MATCH:
MATCH (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
and then when no match is found, a CREATE
CREATE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454})
And given what we know of how CREATE works, it doesn't attempt to match parts of the pattern (and there are no variables that have already matched to existing elements of the graph), it creates the pattern as a whole, creating a brand new :GROUP and :ADDRESS node with the given properties, and the new :USES relationship.
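The MATCH-then-CREATE reading of MERGE can be modeled in a few lines of plain Java. MergeSemanticsDemo is a hypothetical toy, not the Neo4j engine: mergeFullPattern looks for the whole GROUP-USES-ADDRESS pattern and otherwise creates all of it, while mergeBetweenBound receives nodes that were already matched and can only create the relationship.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of MERGE as "MATCH the whole pattern, else CREATE the whole pattern".
// Hypothetical classes, not the Neo4j engine.
class MergeSemanticsDemo {
    static class Node {
        final String label;
        final int key;
        Node(String label, int key) { this.label = label; this.key = key; }
    }

    static class Rel {
        final Node from, to;
        Rel(Node from, Node to) { this.from = from; this.to = to; }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Rel> uses = new ArrayList<>();

    Node addNode(String label, int key) {
        Node n = new Node(label, key);
        nodes.add(n);
        return n;
    }

    // MERGE (g:GROUP {GroupID: gid})-[:USES]->(a:ADDRESS {AddressID: aid})
    // with nothing bound: match the full pattern or create all of it.
    void mergeFullPattern(int gid, int aid) {
        for (Rel r : uses) {
            if (r.from.label.equals("GROUP") && r.from.key == gid
                    && r.to.label.equals("ADDRESS") && r.to.key == aid) {
                return;
            }
        }
        Node g = addNode("GROUP", gid);   // new node even if a GROUP with gid exists
        Node a = addNode("ADDRESS", aid); // new node even if an ADDRESS with aid exists
        uses.add(new Rel(g, a));
    }

    // MATCH g, MATCH a, then MERGE (g)-[:USES]->(a): only the relationship can be created.
    void mergeBetweenBound(Node g, Node a) {
        for (Rel r : uses) {
            if (r.from == g && r.to == a) {
                return;
            }
        }
        uses.add(new Rel(g, a));
    }
}
```

Starting from an existing GROUP 1 and ADDRESS 454 with no USES relationship, mergeFullPattern(1, 454) leaves four nodes (two of them duplicates), while calling mergeBetweenBound twice leaves the original two nodes and a single USES relationship.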
MERGE (g:GROUP {GroupID: 1})-[r:USES]->(a:ADDRESS {AddressID: 454}) Return g.ShortName, type(r), a.Line1; most likely creates two nodes, because those properties (GroupID for the GROUP node / AddressID for ADDRESS node) are not the only properties on those nodes.
Matching the nodes first ensures that you're getting the existing nodes with matching properties (which could have other properties, too) and merging the relationship between those.
If you had a uniqueness constraint on both GroupID for GROUP nodes and AddressID for ADDRESS nodes, then the MERGE without matching first should still make that connection.
I want to add a "created by" relationship to nodes in my database. Any node should be able to have this relationship, but there can never be more than one.
Right now my query looks something like this:
MATCH (u:User {email: 'my@mail.com'})
MERGE (n:Node {name: 'Node name'})
ON CREATE SET n.name='Node name', n.attribute='value'
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(u)
RETURN n
As I understand Cypher, there is no way to achieve what I want: the current query only makes the relationship unique between TWO given nodes, not for ONE node. So running it for another User will create another CREATED_BY relationship, whereas I want to limit every node to a single outgoing CREATED_BY relationship.
Is there a way to achieve this without running multiple queries involving program logic?
Thanks.
Update
I tried to simplify the query by removing implementation details. In case it helps, here's the updated query based on cybersam's response:
MERGE (c:Company {name: 'Test Company'})
ON CREATE SET c.uuid='db764628-5695-40ee-92a7-6b750854ebfa', c.created_at='2015-02-23 23:08:15', c.updated_at='2015-02-23 23:08:15'
WITH c
OPTIONAL MATCH (c)
WHERE NOT (c)-[:CREATED_BY]-()
CREATE (c)-[:CREATED_BY {date: '2015-02-23 23:08:15'}]->(u:User {token: '32ba9d2a2367131cecc53c310cfcdd62413bf18e8048c496ea69257822c0ee53'})
RETURN c
Still not working as expected.
Update #2
I ended up splitting this into two queries.
The problem I found was that there were two possible outcomes:
Using OPTIONAL MATCH, the CREATED_BY relationship was created and (n) was returned, but the relationship would always be created if it didn't already exist between (n) and (u), so changing the email attribute would create another one.
Without OPTIONAL MATCH (with the WHERE NOT (c)-[:CREATED_BY]-() clause), the node (n) was not found, so no relationship was created (yay!), but without getting (n) back, the MERGE query loses all its meaning, I think.
My Solution was the following two queries:
MERGE (n:Node {name: 'Name'})
ON CREATE SET n.attribute='value'
WITH n
OPTIONAL MATCH (n)-[r:CREATED_BY]-()
RETURN n, r
Then I had program logic check the value of r; if there was no relationship, I would run the second query:
MATCH (n:Node {name: 'Name'})
MATCH (u:User {email: 'my@email.com'})
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(u)
RETURN n
Unfortunately I couldn't find any real solution for combining these into a single Cypher query. Sam, thanks! I've selected your answer because it was very close, even though it didn't quite solve my problem.
This should work for you:
MERGE (n:Node {name: 'Node name'})
ON CREATE SET n.attribute='value'
WITH n
OPTIONAL MATCH (n)
WHERE NOT (n)-[:CREATED_BY]->()
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(:User {email: 'my@mail.com'})
RETURN n;
I've removed the starting MATCH clause (because I presume you want to create a CREATED_BY relationship even when that User does not yet exist in the DB), and simplified the ON CREATE to remove the redundant setting of the name property.
I have also added an OPTIONAL MATCH that will only match an n node that does not already have an outgoing CREATED_BY relationship, followed by a CREATE UNIQUE clause that fully specifies the User node.
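The invariant the question is after (at most one outgoing CREATED_BY per node, no matter which user runs the query) boils down to "attach only if missing". CreatedByDemo is a hypothetical, single-threaded plain-Java toy; it does not model Cypher or concurrent transactions, only the intended outcome: the first creator wins and later attempts are no-ops.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the invariant the question asks for:
// every node has at most one outgoing CREATED_BY relationship.
class CreatedByDemo {
    // node name -> creator email (a Map key holds at most one value by construction)
    private final Map<String, String> createdBy = new HashMap<>();

    // Returns true if a CREATED_BY link was added, false if one already existed.
    boolean attachCreatorIfMissing(String nodeName, String userEmail) {
        return createdBy.putIfAbsent(nodeName, userEmail) == null;
    }

    String creatorOf(String nodeName) {
        return createdBy.get(nodeName);
    }
}
```

A second call for the same node with a different user returns false and leaves the original creator in place.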
I am trying to figure out how to limit a shortest path query in cypher so that it only connects "Person" nodes containing a specific property.
Here is my query:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {id: 2})) RETURN p
I would like to limit it so that when it connects from one Person node to another Person node, the Person node has to contain a property called "job" and a value of "engineer."
Could you help me construct the query? Thanks!
Your requirements are not very clear, but if you simply want one of the people to have an id of 1 and the other person to be an engineer, you would use this:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {job: "engineer"}))
RETURN p;
This kind of query should be much faster if you also create indexes on the id and job properties of Person.
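On an unweighted pattern like -[*]-, shortestPath looks for the fewest hops, which is the quantity breadth-first search computes. As a hedged plain-Java sketch of the answer's interpretation (one shortest path from the person with id 1 to each engineer), ShortestToEngineers is hypothetical: it takes an assumed adjacency map and job map, returns hop counts rather than the paths themselves, and skips the start node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: unweighted shortest-path lengths via breadth-first search,
// then keep only target nodes whose job is "engineer". Not a Neo4j API.
class ShortestToEngineers {
    // adjacency is undirected (listed both ways), like a -[*]- pattern with no arrow.
    static Map<Integer, Integer> distancesFrom(int start, Map<Integer, List<Integer>> adjacency) {
        Map<Integer, Integer> dist = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        dist.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int current = queue.poll();
            for (int next : adjacency.getOrDefault(current, List.of())) {
                if (!dist.containsKey(next)) { // first visit is the shortest in BFS
                    dist.put(next, dist.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return dist;
    }

    // One shortest-path length per reachable engineer (the start node excluded).
    static Map<Integer, Integer> toEngineers(int start, Map<Integer, List<Integer>> adjacency,
                                             Map<Integer, String> jobs) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : distancesFrom(start, adjacency).entrySet()) {
            if (e.getKey() != start && "engineer".equals(jobs.get(e.getKey()))) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

With edges 1-2, 2-3 and 1-5, where 3 and 5 are engineers, toEngineers(1, ...) reports engineer 5 at one hop and engineer 3 at two hops.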