Given that I'm very new to Neo4j. I have a schema which looks like the below image:
Here Has nodes are different for example Passport, Merchant, Driving License, etc. and also these nodes are describing the customer node (looking for future scope of filtering customers based on these nodes).
SIMILAR is a self-relation meaning there exists a customer with ID:1 is related to another customer with ID:2 with a score of 2800.
I have the following questions:
Is this a good schema given the condition of the future scope I mentioned above, or getting all the properties in a single customer node is viable? (Different nodes may have array of items as well, for example: ()-[:HAS]->(Phone) having {active: "+91-1231241", historic_phone_numbers: ["+91-121213", "+91-1231421"]})
I want to get the customer along with describing nodes in relation to other customers. For that, I tried the below query (w/o number of relation more than 1):
// With number_of_relation > 1
MATCH (searched:Customer)-[r:SIMILAR]->(matched:Customer)
WHERE r.score > 2700
WITH searched, COLLECT(matched.customer_id) AS MatchedList, count(r) as cnt
WHERE cnt > 1
UNWIND MatchedList AS matchedCustomer
MATCH (person:Customer {customer_id: matchedCustomer})-[:HAS|:LIVES_IN|:IS_EMPLOYED_BY]->(related)
RETURN searched, person, related
Result what I got is below, notice one customer node not having its describing nodes:
// without number_of_relation > 1
// second attempt - for a sample customer_id
MATCH (matched)<-[r:SIMILAR]-(c)-[:HAS|:LIVES_IN|:IS_EMPLOYED_BY]->(b)
WHERE size(keys(b)) > 0
AND c.customer_id = "1b093559-a39b-4f95-889b-a215cac698dc"
AND r.score > 2700
RETURN b AS props, c AS src_cust, r AS relation, matched
Result I got are below, notice related nodes are not having their describing nodes:
If I had two describing nodes with some property (some may have a list) upon which I wanted to query and build the expected graph specified in point 2 above, how can I do that?
I want the database to find a similar customer given the describing nodes. Example: A customer {name: "Dave"} has phone {active_number: "+91-12345"} is similar to customer {name: "Mike"} has phone {active_number: "+91-12345"}. How can get started with this?
If something is unclear, please ask. I can explain with examples.
[EDITED]
Yes, the schema seems fine, except that you should not use the same HAS relationship type between different node label pairs.
The main problem with your first query is that its top MATCH clause uses a directional relationship pattern, ()-->(), which does not allow all Customer nodes to have a chance to be the searched node (because some nodes may only be at the tail end of SIMILAR relationships). This tweaked query should work better:
MATCH (searched:Customer)-[r:SIMILAR]-(matched:Customer)
WHERE r.score > 2700
WITH searched, COLLECT(matched) AS matchedList
WHERE SIZE(matchedList) > 1
UNWIND matchedList AS person
MATCH (person)-[:HAS|LIVES_IN|IS_EMPLOYED_BY]->(pDesc)
WITH searched, person, COLLECT(pDesc) AS personDescribers
MATCH (searched)-[:HAS|LIVES_IN|IS_EMPLOYED_BY]->(sDesc)
RETURN searched, person, personDescribers, COLLECT(sDesc) AS searchedDescribers
It's not clear what you want are trying to do.
To get all Customers who have the same phone number:
MATCH (c:Customer)-[:HAS_PHONE]-(p:Phone)
WHERE p.activeNumber = '+91-12345'
WITH p.activeNumber AS phoneNumber, COLLECT(c) AS customers
WHERE SIZE(customers) > 1
RETURN phoneNumber, customers
Related
I am a newbie who just started learning graph database and I have a problem querying the relationships between nodes.
My graph is like this:
There are multiple relationships from one node to another, and the IDs of these relationships are different.
How to find relationships where the number of relationships between two nodes is greater than 2,or is there a problem with the model of this graph?
Just like on the graph, I want to query node d and node a.
I tried to use the following statement, but the result is incorrect:
match (from)-[r:INVITE]->(to)
with from, count(r) as ref
where ref >2
return from
It seems to count the number of relations issued by all from, not the relationship between from-->to.
to return nodes who have more then 2 relationship between them you need to check the size of the collected rels. something like
MATCH (x:Person)-[r:INVITE]-(:Party)
WITH x, size(collect(r)) as inviteCount
WHERE inviteCount > 2
RETURN x
Aggregating functions like COLLECT and COUNT use non-aggregating terms in the same WITH (or RETURN) clause as "grouping keys".
So, here is one way to get pairs of nodes that have more than 2 INVITE relationships (in a specific direction) between them:
MATCH (from)-[r:INVITE]->(to)
WITH from, to, COUNT(r) AS ref
WHERE ref > 2
RETURN from, to
NOTE: Ideally (for clarity and efficiency), your nodes would have specific labels and the MATCH pattern would specify those labels.
I have a graph model -
(p:Person)-[r:LINK {startDate: timestamp, endDate: timestamp}]->(c:Company)
A person can be linked to multiple companies at the same time and a company can have multiple people linking to it at the same time (i.e. there is a many-to-many relationship between companies and people).
The endDate property is optional and will only be present when a person has left a company.
I am trying to display a network of connections and can successfully return all related nodes from a person using the following cypher query (this will display 2 levels of people connections) -
MATCH (p:Person {id:<id>})-[r:LINK*0..4]-(l) RETURN *
What I now need to do is filter the relationships where the relationships match on timeframe, e.g. Person 1 worked at Company A between 01/01/2000 and 31/12/2002. Person 2 worked at Company A between 01/01/2001 and 31/06/2001. Person 3 worked at Company A between 01/01/2005 and is still at Company A. The results for Person 1 should include Person 2 but not Person 3.
This same logic needs to be applied to all levels of the graph (we allow the user to display 3 levels of connections) and relates to the parent node in each level, i.e. when displaying level 2, the dates for Person 2 and Person 3 should be used to filter their respective relationships.
Essentially, we are trying to do something similar to the LinkedIn connections but to filter based on people working at companies at the same time.
I have tried using the REDUCE function but cannot get the logic to work for the optional end date - can someone please advise how to filter the relationships based on the start and end dates?
It turns out there are 4 ways in which date ranges can overlap, but only 2 in which they do not (person 1 ends before person 2 starts, or person 2 ends before person 1 starts), so it is much simpler to check that neither of these no-overlap conditions exist.
In the level 1 case, this query should do the trick:
MATCH (start:Person{id:1})-[r1:LINK]->(c)<-[r2:LINK]-(suggest)
WHERE NOT ((r1.endDate IS NOT NULL and r1.endDate < r2.startDate)
OR (r2.endDate IS NOT NULL and r2.endDate < r1.startDate))
RETURN suggest
The tricky part is applying this to multiple levels.
While we could create a single Cypher query to handle this dynamically, the evaluation of the relationships would only happen after expansion, not during, so it may not be the most efficient:
MATCH path = (start:Person{id:1})-[:LINK*..6]-(suggest:Person)
WITH path, start, suggest, apoc.coll.pairsMin(relationships(path)) as pairs
WITH path, start, suggest, [index in range(0, size(pairs)-1) WHERE index % 2 = 0 | pairs[index]] as pairs
WHERE none(pair in pairs WHERE (pair[0].endDate IS NOT NULL AND pair[0].endDate < pair[1].startDate)
OR (pair[1].endDate IS NOT NULL AND pair[1].endDate < pair[0].startDate))
RETURN suggest
Some of the highlights here...
We're using apoc.coll.pairsMin() from APOC Procedures to get pairs of adjacent relationships from the collection of relationships in each path, but we're only interested in the even-numbered entries (the two relationships from people working at the same company), because the odd-numbered pairs correspond to relationships from the same person going to two different companies.
So if we were executing on this pattern:
MATCH path = (start:Person)-[r1:LINK]->(c1)<-[r2:LINK]-(person2)-[r3:LINK]->(c2)<-[r4:LINK]-(person3)
The apoc.coll.pairsMin(relationships(path)) would return [[r1, r2], [r2,r3], [r3,r4]], and as you can see the relationships we need to consider are the ones linking 2 people to a company, so indexes 0 and 2 in the pairs list.
After we get our pairs we need to ensure that all of those interesting relationship pairs in the path considered to a suggestion meet your criteria and overlap (or do not NOT overlap).
Something like this should work:
MATCH path=(p:Person {id: $id})-[r:LINK*..4]-(l)
WHERE ALL(x IN NODES(path)[1..] WHERE x.startDate <= p.endDate AND x.endDate >= p.startDate)
RETURN path;
Assumptions:
The id value of the main person of interest is provided by the $id parameter.
You want the variable-length relationship pattern to have a lower bound of 1 (which is the default). If you used 0 for a lower bound, then you will also get the main person of interest as a result -- which is probably not what you want.
startDate and endDate have values that are suitable for comparison using comparison operators
I have an Neo4j Cypher query I struggling with. What I want is to have all nodes with a given relationship, until it reach a relation with an properties who is below a given number ( < 25). All nodes after that relationship is found should be skipped.
Here a sample of my graph:
and how I would like the result to be (every nodes after relationship IS_OWNER with Share < 25 should be skipped)
I hope someone can show how to write the Cypher query who can give the desired result (picture 2) from the sample graph (picture 1). Every nodes after relationship IS_OWNER with Share < 25 should be skipped.
The ALL() function should work here, letting you specify that all relationships in the path should have Share >= 25, and pruning further expansion otherwise.
Something like this, assuming starting at company C:
MATCH (:Company{name:'C'})-[rels:IS_OWNER*0..]-(companies)
WHERE all(rel in rels WHERE rel.Share >= 25)
RETURN companies
EDIT:
While it seems like using a variable on the variable-length relationship is deprecated in the newer versions of neo4j (I'll double-check on that, doesn't seem like a good decision), here's another way to specify this which gets around the warning:
MATCH p = (:Company{name:'C'})-[:IS_OWNER*0..]-(companies)
WHERE all(rel in relationships(p) WHERE rel.Share >= 25)
RETURN companies
I have a Neo4J DB up and running with currently 2 Labels: Company and Person.
Each Company Node has a Property called old_id.
Each Person Node has a Property called company.
Now I want to establish a relation between each Company and each Person where old_id and company share the same value.
Already tried suggestions from: Find Nodes with the same properties in Neo4J and
Find Nodes with the same properties in Neo4J
following the first link I tried:
MATCH (p:Person)
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
resulting in no change at all and as suggested by the second link I tried:
START
p=node(*), c=node(*)
WHERE
HAS(p.company) AND HAS(c.old_id) AND p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN p, c;
resulting in a runtime >36 hours. Now I had to abort the command without knowing if it would eventually have worked. Therefor I'd like to ask if its theoretically correct and I'm just impatient (the dataset is quite big tbh). Or if theres a more efficient way in doing it.
This simple console shows that your original query works as expected, assuming:
Your stated data model is correct
Your data actually has Person and Company nodes with matching company and old_id values, respectively.
Note that, in order to match, the values must be of the same type (e.g., both are strings, or both are integers, etc.).
So, check that #1 and #2 are true.
Depending on the size of your dataset you want to page it
create constraint on (c:Company) assert c.old_id is unique;
MATCH (p:Person)
WITH p SKIP 100000 LIMIT 100000
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN count(*);
Just increase the skip value from zero to your total number of people in 100k steps.
I'm trying to out how to exclude nodes from a query. My graph consists of users, skills, skill scoring, questions and endo
user has skills
skill has scorings in relation with questions ((skill)-->(scorings)-->(questions))
endo is a relationship for users and skills scorings ((user)-->(endos)-->(scorings))
I would like to find all question for user, but exclude those questions who user has already endo relationship
I thought I could do :
MATCH (u:`User`),
u<-[rel1:`USERS`]-(s:`Skill`),
s-[rel2:`SKILLS`]->(k:`SkillScoring`),
k-[rel3:`ANSWER`]->(q:`Question`),
u<-[rel4:`ENDO`]-(e:`Endo`)
WHERE NOT((e)-->(u)) AND (u.id='1')
RETURN u, e, k, q
UPDATE:
endo nodes are connected like this.
blue is user node
purple is journaled endorsement node (created at)
green is skill scoring node
In fact the relationship "ENDORSEMENT" has a node (journalised) which connects the skill scoring nodes
UPDATE:
When I execute this query, it returns me the question in connection with the user
MATCH (u:User),
u<-[rel1:USERS]-(s:SoftSkill),
s-[rel2:SOFT_SKILLS]->(k:SkillScoring),
k-[rel3:ANSWER]->(q:Question),
u<-[:ENDO]-()<-[:ENDO]->(k)
WHERE u.id='1'
RETURN q, u
by cons when I execute this query to exclude the question, the query returns me questions but also question that I don't want
MATCH (u:User),
u<-[rel1:USERS]-(s:SoftSkill),
s-[rel2:SOFT_SKILLS]->(k:SkillScoring),
k-[rel3:ANSWER]->(q:Question)
WHERE u.id='1' AND NOT u<-[:ENDO]-()<-[:ENDO]->(k)
RETURN q, u
What's wrong? Any suggestions?
Thanks
Your query first says that u has :ENDO relationship from e, then that e has no relationship to u, but that can't be.
How are endo nodes connected to scorings/questions in those cases you want to exclude? Could you try something like
MATCH (u:User)-[rel1:USERS]->(s:Skill)-[rel2:SKILLS]->(k:SkillScoring)-[rel3:ANSWER]->(q:Question)
WHERE u.id = '1' AND NOT u<-[:ENDO]-()-[:??]->k
and fill in the relationship type at ?? for how the endo node (anonymous node () above) is connected?
If you want you can create an example graph at http://console.neo4j.org, that would help make your intention clear.
Also, it's fine to use backticks around labels and relationships, but you don't need to unless they contain whitespace or some other uncommon characters.