I am fairly new to Neo4j and have a problem with a Cypher query.
I run Cypher queries from Java, and my Neo4j database is designed as follows:
I have a user node with attributes such as id, name, age, email and gender, and a city node with attributes id and name. Every user is related to a city node by a relationship (lives). However, a user may not be associated with any city.
I want to fetch all the details of a user and the city he lives in with a single query, which is:
match p, c, p-[:lives]->c where p.type = 'com.Person' and c.type='com.City' and p.id = 12345 return p.name, p.age, p.email, p.gender, c.name;
The query works well when the user is related to a city, but it fails when the user is not associated with a city.
Could you please help me with a query that can handle both scenarios?
Your MATCH and WHERE clauses require every matched p to be associated with a city. Use the OPTIONAL MATCH clause for the parts of the pattern that may be absent.
By the way, the "p, c" in "MATCH p, c, p-[:lives]->c" is unnecessary and inefficient.
To get the results you want, try the following (c.name will be null if there is no associated city):
MATCH (p {type: "com.Person", id: 12345})
OPTIONAL MATCH (p)-[:lives]->(c {type: "com.City"})
RETURN p.name, p.age, p.email, p.gender, c.name;
Also, I would strongly suggest using the labels Person and City for your p and c nodes (instead of the type properties). That would make your queries much more efficient and also clearer. If you made this change to your nodes, the faster query would look like:
MATCH (p:Person {id: 12345})
OPTIONAL MATCH (p)-[:lives]->(c:City)
RETURN p.name, p.age, p.email, p.gender, c.name;
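If the existing nodes only carry the type property, the labels have to be added before the label-based query can match anything. Here is a minimal sketch of that one-off migration, assuming the type values are 'com.Person' and 'com.City' as in the question. The index name person_id is hypothetical, and the CREATE INDEX ... FOR form shown is the Neo4j 4.x+ syntax (older releases use CREATE INDEX ON :Person(id)).

```cypher
// Add labels based on the existing type property (run once).
MATCH (p {type: 'com.Person'})
SET p:Person;

MATCH (c {type: 'com.City'})
SET c:City;

// Optional: index the lookup property so that
// MATCH (p:Person {id: ...}) does not have to scan every Person node.
CREATE INDEX person_id FOR (p:Person) ON (p.id);
```

Once the labels are in place, the type properties are no longer needed by the query and can be kept or removed as you prefer.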
[newbie question] I'm kinda in doubt if it's better using specific relationships or labels, but first let me give you a little bit more context.
Suppose that the graph should be able to answer the following questions/queries:
Given a person, return all the emails associated;
Given a person, return all the contacts;
I've come up with these 2 possible models:
They're both able to fulfill the required requests with the following queries:
First model:
MATCH (p:Person {name:"Bob"})-[:REACHABLE_BY]-(e:Email)
RETURN p.name AS Name, e.contact AS Contact
MATCH (p:Person {name:"Bob"})-[:REACHABLE_BY]-(c:Contact)
RETURN p.name AS Name, c.contact AS Contact
Second model:
MATCH (p:Person {name:"Bob"})-[:REACHABLE_BY_EMAIL]-(c:Contact)
RETURN p.name AS Name, c.contact AS Contact
MATCH (p:Person {name:"Bob"})-[:REACHABLE_BY_FAX]-(c:Contact)
RETURN p.name AS Name, c.contact AS Contact
UNION ALL
MATCH (p:Person {name:"Bob"})-[:REACHABLE_BY_EMAIL]-(c2:Contact)
RETURN p.name AS Name, c2.contact AS Contact
But I'm wondering if there's a best practice to follow in this case. I know that specific relationship types are better in some cases, since they reduce the number of nodes involved in the query (instead of filtering later by some property), but I feel that in this case we can achieve the same result (maybe also the same performance) by using different labels.
Both of your models will work fine. But to obtain the best performance, you can combine these two models into a single one. Like this:
All the Contact nodes that store emails, will have Email label as well.
All the Contact nodes that store faxes will have Fax label as well.
The relationship type between Person and Email nodes will be REACHABLE_BY_EMAIL
The relationship type between Person and Fax nodes will be REACHABLE_BY_FAX
Using this model, you can easily query a person's emails or faxes with these queries:
MATCH (p:Person)-[:REACHABLE_BY_EMAIL]->(email)
RETURN p, email
MATCH (p:Person)-[:REACHABLE_BY_FAX]->(fax)
RETURN p, fax
Note that I have not specified the Email or Fax labels in the query, as they are redundant.
Also, you can now query all the emails and faxes using simply
MATCH (e:Email) RETURN e
MATCH (f:Fax) RETURN f
If the need arises.
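To make the combined model concrete, here is a small sketch that creates sample data and then fetches all of a person's contacts in one pattern. The contact values are made up for illustration; the contact property and the name "Bob" come from the question.

```cypher
// Hypothetical sample data for the combined model.
MERGE (p:Person {name: "Bob"})
MERGE (e:Contact:Email {contact: "bob@example.com"})
MERGE (f:Contact:Fax {contact: "+1-555-0100"})
MERGE (p)-[:REACHABLE_BY_EMAIL]->(e)
MERGE (p)-[:REACHABLE_BY_FAX]->(f);

// All of Bob's contacts, using relationship-type alternation
// instead of a UNION of two queries.
MATCH (p:Person {name: "Bob"})-[:REACHABLE_BY_EMAIL|REACHABLE_BY_FAX]->(c:Contact)
RETURN p.name AS Name, c.contact AS Contact;
```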
You can also use
(:Person)-[:REACHABLE_BY]->(:Contact)-[:HAS_TYPE]->(:ContactType)
with 'Fax', 'Phone', 'Email' as ContactType nodes.
In your queries, using directions will help you speed up things. Your 2nd query of the second model can be written as
MATCH (p:Person {name:"Bob"})-->(c:Contact)
RETURN p.name AS Name, c.contact AS Contact
For the model I suggest, the queries would be:
MATCH (p:Person {name:"Bob"})-->(c:Contact)-->(:ContactType {name:'Email'})
RETURN p.name AS Name, c.contact AS Contact
and
MATCH (p:Person {name:"Bob"})-->(c:Contact)
RETURN p.name AS Name, c.contact AS Contact
So this is a very basic question. I am trying to make a cypher query that creates a node and connects it to multiple nodes.
As an example, let's say I have a database with towns and cars. I want to create a query that:
creates people, and
connects them with the town they live in and any cars they may own.
So here goes:
Here's one way I tried this query (I have WHERE clauses that specify which town and which cars, but to simplify):
MATCH (t: Town)
OPTIONAL MATCH (c: Car)
MERGE a = ((c) <-[:OWNS_CAR]- (p:Person {name: "John"}) -[:LIVES_IN]-> (t))
RETURN a
But this returns multiple people named John - one for each car he owns!
In two queries:
MATCH (t:Town)
MERGE a = ((p:Person {name: "John"}) -[:LIVES_IN]-> (t))
MATCH (p:Person {name: "John"})
OPTIONAL MATCH (c:Car)
MERGE a = ((p) -[:OWNS_CAR]-> (c))
This gives me the result I want, but I was wondering if I could do this in 1 query. I don't like the idea that I have to find John again! Any suggestions?
It took me a bit to wrap my head around why MERGE sometimes creates duplicate nodes when I didn't intend that. This article helped me.
The basic insight is that it would be best to merge the Person node first before you match the towns and cars. That way you won't get a new Person node for each relationship pattern.
If Person nodes are uniquely identified by their name properties, a unique constraint would prevent you from creating duplicates even if you run a mistaken query.
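A sketch of such a constraint, assuming name really is unique per Person. The constraint name person_name_unique is hypothetical, and the exact syntax depends on the Neo4j version.

```cypher
// Neo4j 4.4+ / 5.x syntax.
CREATE CONSTRAINT person_name_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.name IS UNIQUE;

// Older releases use the ASSERT form instead:
// CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;
```

With the constraint in place, a query that tries to create a second Person named "John" fails with an error instead of silently creating a duplicate.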
If a person can have multiple cars and residences in multiple towns, you also want to avoid a cartesian product of cars and towns in your result set before you do the merge. Try using the table output in Neo4j Browser to see how many rows are getting returned before you do the MERGE to create relationships.
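One way to check the row count before merging is to run only the matching part and count the rows. This is a sketch that assumes cars are identified by a licensePlate property and towns by name, with made-up values.

```cypher
// Count the rows that would be fed into MERGE if cars and towns
// were matched together without collecting them first.
MATCH (c:Car)
WHERE c.licensePlate IN ["xyz123", "999aaa"]
MATCH (t:Town)
WHERE t.name IN ["Lexington", "Concord"]
RETURN count(*) AS rowsBeforeMerge;
```

With two matching cars and two matching towns this returns 4, one row per car/town combination, which is the cartesian product you want to avoid feeding into MERGE.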
Here's how I would approach your query.
MERGE (p:Person {name:"John"})
WITH p
OPTIONAL MATCH (c:Car)
WHERE c.licensePlate in ["xyz123", "999aaa"]
WITH p, COLLECT(c) as cars
OPTIONAL MATCH (t:Town)
WHERE t.name in ["Lexington", "Concord"]
WITH p, cars, COLLECT(t) as towns
FOREACH(car in cars | MERGE (p)-[:OWNS_CAR]->(car))
FOREACH(town in towns | MERGE (p)-[:LIVES_IN]->(town))
RETURN p, towns, cars
I have User nodes and Interest nodes. Given an array of interests, I want to be able to create/delete/change the relationships between the User and the Interests. I also want to update some properties on the user node in the same query.
So far this is what I have managed to do:
MATCH (user:User {id: id})
OPTIONAL MATCH (user)-[oldRel:InterestedIn]->(:Interest)
DETACH DELETE oldRel
WITH user
UNWIND {interestsIds} as id
MATCH (interest:Interest {id: id})
MERGE (user)-[rel: InterestedIn]->(interest)
SET user.name = {user}.name, ..(more sets)
RETURN user, collect(interest) as interests
I think this one is working, though sometimes it looks like the interests are returned duplicated.
Also, this query looks like a bit of an overkill. Any idea how to write it in a better way?
Does this seem about right?
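MATCH (user:User {id: {id}})
// Collect the wanted interests first so that an empty result does not feed null into MERGE
OPTIONAL MATCH (interest:Interest)
WHERE interest.id IN {interestsIds}
WITH user, collect(interest) AS wanted
FOREACH (i IN wanted | MERGE (user)-[:InterestedIn]->(i))
WITH user
// OPTIONAL so the query still returns the user when there is nothing to delete
OPTIONAL MATCH (user)-[rel:InterestedIn]->(interest:Interest)
WHERE NOT(interest.id IN {interestsIds})
DELETE rel
WITH DISTINCT user
OPTIONAL MATCH (user)-[:InterestedIn]->(interest:Interest)
RETURN user, collect(interest)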
I am trying to figure out how to limit a shortest path query in cypher so that it only connects "Person" nodes containing a specific property.
Here is my query:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {id: 2})) RETURN p
I would like to limit it so that when it connects from one Person node to another Person node, the Person node has to contain a property called "job" and a value of "engineer."
Could you help me construct the query? Thanks!
Your requirements are not very clear, but if you simply want one of the people to have an id of 1 and the other person to be an engineer, you would use this:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {job: "engineer"}))
RETURN p;
This kind of query should be much faster if you also create indexes for the id and job properties of Person.
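If the intended meaning is instead that the endpoints stay fixed (ids 1 and 2) and every Person node on the path must be an engineer, a predicate over nodes(p) is one way to express it. This is a sketch under that assumed reading of the question. Note that it also requires both endpoints to be engineers, and that it lets non-Person nodes through unchecked.

```cypher
MATCH p = shortestPath((from:Person {id: 1})-[*]-(to:Person {id: 2}))
// Every Person on the path must have job = "engineer";
// nodes without the Person label are not restricted.
WHERE all(n IN nodes(p) WHERE NOT n:Person OR n.job = "engineer")
RETURN p;
```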
Suppose I have two kinds of nodes, Person and Competency. They are related by a KNOWS relationship. For example:
(:Person {id: 'thiago'})-[:KNOWS]->(:Competency {id: 'neo4j'})
How do I query this schema to find every Person that knows all the nodes in a given set of Competency nodes?
Suppose that I need to find every Person that knows "java" and "haskell", and I'm only interested in the nodes that know all of the listed Competency nodes.
I've tried this query:
match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id;
But I get back a list of every Person that knows either "java" or "haskell", with duplicate entries for those who know both.
Adding a count(c) at the end of the query eliminates the duplicates:
match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id, count(c);
Then, in this particular case, I can iterate over the result and filter out the rows whose count is less than two to get the nodes I want.
I've found out that I could also do it by appending consecutive match clauses to keep filtering the nodes, in this case:
match (p:Person)-[:KNOWS]->(:Competency {id:'haskell'})
match (p)-[:KNOWS]->(:Competency {id:'java'})
return p.id;
Is this the only way to express this query? I mean, do I need to build the query by concatenating strings? I'm looking for a fixed query with parameters.
with ['java','haskell'] as skills
match (p:Person)-[:KNOWS]->(c:Competency)
where c.id in skills
with p, count(*) as c1, size(skills) as c2
where c1 = c2
return p.id
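Since the question asks for a fixed query with parameters, the same counting idea can be written against a list parameter. This is a sketch assuming a parameter named $skills (a hypothetical name) that holds the competency ids without duplicates; the $param syntax applies to Neo4j 3.0 and later, while older releases write {skills}.

```cypher
// $skills is a list parameter supplied by the caller, e.g. ["java", "haskell"].
MATCH (p:Person)-[:KNOWS]->(c:Competency)
WHERE c.id IN $skills
// count(DISTINCT c) guards against duplicate KNOWS relationships.
WITH p, count(DISTINCT c) AS matched
WHERE matched = size($skills)
RETURN p.id;
```

The query text stays the same no matter how many competencies are requested; only the parameter value changes.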
One thing you can do is count all the skills, then find the users whose number of skill relationships equals that count:
MATCH (n:Skill) WITH count(n) as skillMax
MATCH (u:Person)-[:HAS]->(s:Skill)
WITH u, count(s) as skillsCount, skillMax
WHERE skillsCount = skillMax
RETURN u, skillsCount
Chris
Untested, but this might do the trick:
match (p:Person)-[:KNOWS]->(c:Competency)
with p, collect(c.id) as cs
where all(x in ['java', 'haskell'] where x in cs)
return p.id;
How about this...
WITH ['java','haskell'] AS comp_col
MATCH (p:Person)-[:KNOWS]->(c:Competency)
WHERE c.id in comp_col
WITH comp_col
, p
, count(*) AS total
WHERE total = size(comp_col)
RETURN p.id, total
Put the competencies you want in a collection.
Match all the people that have any of those competencies.
Keep only the people whose count of matched competencies equals the size of the competency collection from the start.
I think this will work for what you need, but if you are building these queries programmatically, the best performance might come from successive match clauses. Especially if you know which competencies are most and least common when building your queries, you could order the matches so that the least common come first and the most common come last. I think that would narrow down to your desired persons the fastest.
It would be interesting to see what the plan analyzer in the shell says about the different approaches.
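To compare the approaches, a query can be prefixed with PROFILE, which runs it and reports the execution plan together with the rows and database hits per operator (EXPLAIN shows the plan without running the query). A sketch using the successive-match variant from the question:

```cypher
PROFILE
MATCH (p:Person)-[:KNOWS]->(:Competency {id: 'haskell'})
MATCH (p)-[:KNOWS]->(:Competency {id: 'java'})
RETURN p.id;
```

Running the counting variants with the same prefix against the same data gives comparable plans, so the total database hits can be checked side by side.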