Neo4j Cypher return combination of matched nodes - neo4j

I'd like to do some analytics over the database with the following schema:
As shown in the picture above, Vacancy1 requires knowledge of Java and Python and offers a salary of 5000$. Some candidates know everything and fit the salary (like Candidate5, whose desired salary is 4950$). Other candidates know only some of the skills, like only Java or only Python, but together they cover everything the vacancy requires, for example:
Candidate1(Java, 2000$), Candidate2(Python, 1500$)
Together, these candidates know Java and Python, and their combined salary is 3500$.
Is it possible to write a Neo4j query that finds all possible sets of candidates that satisfy the vacancy's conditions?
For example, for the picture above the result should contain something like this:
[candidate5],
[candidate1, candidate2],
[candidate1, candidate4],
[candidate3, candidate2]
Please note that the combinations in the result may contain any number of candidates and are not limited to 1 or 2 as in the example above.
Could you please show an example of such Cypher query?
UPDATED
What if I need to take into account some additional properties, for example experience, like minExp on the diagram below:
Here, we need a candidate for Vacancy1 with minExp = 3.
Candidate2 has exp (experience) = 2 and is not a good fit from the Java point of view, but paired with Candidate3 (exp = 5), together they are a good fit for Vacancy1. Is it possible to improve the query so that it takes this information into account and produces such combinations?

I am a fan of the Neo4j APOC functions, and APOC has a function that returns all possible combinations of a given list. It returns a list of lists with 1, 2, 3, or up to n items.
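// Skills required by the vacancy; n caps the size of each combination
With ["Java", "Python"] as skills
With skills, size(skills) as n
Match (v:Vacancy)-[:CONTAINS]->(s:Skills)<-[:CONTAINS]-(c:Candidate)
// Keep candidates who know a required skill and whose salary fits the vacancy
Where s.language in skills and c.salary <= v.salary
With n, v, collect(distinct c) as candidates
With v, apoc.coll.combinations(candidates, 1, n) as allCandidatesCombi
Unwind allCandidatesCombi as combi
// Keep combinations whose total salary fits the vacancy
With v, combi where apoc.coll.sum([c in combi | c.salary]) <= v.salary
Return v, combi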
References:
n is the number of skills, which is also the maximum number of candidates in a combination
apoc.coll.combinations will give you all possible combinations of the candidates, with 1 to n candidates each
Unwind is like a for loop and gives you each item of that list one at a time
apoc.coll.sum will sum up the candidates' salaries
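To make the combination logic easier to check outside the database, here is a minimal sketch in plain Python rather than Cypher. It is only a model of the idea, not the query above: the function name fitting_sets and the dictionary layout are made up for illustration, and only the three candidates whose skills and salaries are stated in the question are included. Unlike the query, the sketch also checks explicitly that each combination covers every required skill.

```python
from itertools import combinations


def fitting_sets(required_skills, max_salary, candidates, max_size=None):
    """Return every candidate set that covers the skills within the salary.

    candidates maps a name to (set of skills, desired salary).
    This layout is an assumption made for the sketch.
    """
    required = set(required_skills)
    # Like n in the query: at most one candidate per required skill.
    limit = max_size or len(required)
    names = sorted(candidates)
    result = []
    for size in range(1, limit + 1):
        for combo in combinations(names, size):
            known = set().union(*(candidates[name][0] for name in combo))
            total = sum(candidates[name][1] for name in combo)
            if required <= known and total <= max_salary:
                result.append(list(combo))
    return result


# Only the candidates whose data is given in the question text.
candidates = {
    "candidate1": ({"Java"}, 2000),
    "candidate2": ({"Python"}, 1500),
    "candidate5": ({"Java", "Python"}, 4950),
}
sets = fitting_sets({"Java", "Python"}, 5000, candidates)
print(sets)
```

With this data the sketch keeps candidate5 alone and the pair candidate1 plus candidate2, and rejects any pair that includes candidate5 because the combined salary exceeds 5000.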

Related

Cypher Query to Collect Arbitrary Depth Nodes and Edge Properties

I have a graph that looks like the image below. However, the depth and the number of rollups from the Person to the topmost Rollup is variable, depending on how the rollups have been structured by the user. The edges from the Person to the Metric (HAS_METRIC) hold the score values, and the relationships from the metrics to the Rollup (HAS_PARENT) hold the weighting that should be applied to the value as it is rolled up to a top score.
Ideally, I would like to have a query that produces a table with the rollup and the summed/weighted scores. Like this:
node | value
-------------------
Metric A 23
Metric B 55
Metric C 29
Metric D 78
Rollup A 45.4
Rollup B 58.4
Rollup Tot 51.9
However, I do not understand how to collect the edge properties of the HAS_PARENT relationships.
MATCH (p:Person)-[score:HAS_METRIC]->(m:Metric)-[weight:HAS_PARENT]->(ru:Rollup)
-[par_rel:HAS_PARENT*..8]->(ru_par:Rollup)
WITH p, score, m, weight, par_rel, ru, ru_par
RETURN p.uid, score.score, m.uid, weight.weight, ru.uid, par_rel.weight, ru_par.uid
This query is giving me a type mismatch because it does not know what to do with the par_rel.weight. Any pointers are appreciated.
I believe what you are looking for is the relationships(path) function, one of the default path functions in Cypher. It returns all relationships in a path, and you can combine it with one or more Cypher list expressions to get the values you need from the relationships.
Generally speaking, you could do something like:
MATCH p = (n)-[:HAS_PARENT*..8]->()
RETURN [x IN relationships(p) | x.weight] AS weights
You might also find the reduce function useful. E.g.:
...
RETURN reduce(s = 0, x IN relationships(p) | s + x.weight) AS sumWeight
But you need to be careful with variable-length path queries, and you should probably constrain them so that you only get the paths you are interested in.
A good idea is probably to mark your leaf and root nodes so that you match only paths from a leaf to a/the root, not intermediate ones. E.g.:
MATCH p = (n)-[:HAS_PARENT*..8]->(root)
WHERE NOT (root)-[:HAS_PARENT]->() AND NOT (n)<-[:HAS_PARENT]-()
...
And of course you can combine these Cypher fragments with others in order to return everything you need in a single query.
I hope this helps. Let us know when you succeed.
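For readers less familiar with list expressions and reduce, the two patterns above behave roughly like the following plain-Python sketch. The path here is a hypothetical list of dictionaries standing in for the HAS_PARENT relationships of one matched path; the weight values are invented for illustration and are not taken from the question.

```python
from functools import reduce

# Hypothetical path: each dict stands for one HAS_PARENT relationship.
path = [{"weight": 0.25}, {"weight": 0.5}]

# Rough equivalent of [x IN relationships(p) | x.weight]
weights = [rel["weight"] for rel in path]

# Rough equivalent of reduce(s = 0, x IN relationships(p) | s + x.weight)
sum_weight = reduce(lambda s, rel: s + rel["weight"], path, 0)

print(weights, sum_weight)
```

The key point is that a variable-length relationship variable is a list, so a property has to be extracted per element rather than read directly from the list.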

Checking if subgraph fulfill condition as a whole

I find it hard to explain, so consider the following picture
I'm trying to select all products that fulfill the warehouse requirements
In this example I need to select all products that have a maximum size of 5 AND maximum weight of 10.
To simplify, I only have MAX (no MIN or EQ) constraints, so the operator can be hardcoded.
I've tried to group the requirement subgraph using COLLECT and using the ALL operator, but failed.
Query to create the graph
CREATE
// NODES
(warehouse:WAREHOUSE{name:'My Warehouse'}),
(smallProduct:PRODUCT{name:'Small Product'}),
(largeProduct:PRODUCT{name:'Large Product'}),
(size:CONSTRAINT{name:'Size'}),
(weight:CONSTRAINT{name:'Weight'}),
// RELATIONSHIPS
(warehouse)-[:LIMIT{value:5}]->(size),
(warehouse)-[:LIMIT{value:5}]->(weight),
(smallProduct)-[:AMOUNT{value:3}]->(size),
(smallProduct)-[:AMOUNT{value:2}]->(weight),
(largeProduct)-[:AMOUNT{value:10}]->(size),
(largeProduct)-[:AMOUNT{value:4}]->(weight)
UPDATE
The following query apparently solves the problem:
MATCH (warehouse:WAREHOUSE)
MATCH rel = ((warehouse)-[limit:LIMIT]->(constraint:CONSTRAINT)<-[amount:AMOUNT]-(product:PRODUCT))
WITH warehouse, product, collect(relationships(rel)) as paths
WHERE all( p in paths WHERE p[0].value > p[1].value )
return product
I am wondering if there is a better solution.
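As a way to sanity-check what the query is expected to return, the same condition can be modelled in plain Python. This is only a sketch under assumptions: the dictionaries mirror the values in the CREATE statement above, the function name products_within_limits is made up, and the comparison is strict (limit greater than amount), as in the query.

```python
# Values copied from the CREATE statement: LIMIT and AMOUNT relationship values.
limits = {"Size": 5, "Weight": 5}
amounts = {
    "Small Product": {"Size": 3, "Weight": 2},
    "Large Product": {"Size": 10, "Weight": 4},
}


def products_within_limits(limits, amounts):
    """Keep products whose amount is below the limit for every shared constraint."""
    return [
        name
        for name, values in amounts.items()
        if all(limits[c] > v for c, v in values.items() if c in limits)
    ]


fitting = products_within_limits(limits, amounts)
print(fitting)
```

Like the query, the sketch only compares constraints that both the warehouse and the product are connected to; a product with no shared constraint would pass trivially here, while the MATCH in the query would not return it at all.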

find the particular node which fit three criteria neo4j

I would like to find the Name node that trades with three fruits only.
I tried to use the following code in neo4j.
match (s:good)-[r:TRADES]-(n:Name)-[:TRADES]-(p:good)
WHERE (s.good = 'Apple' or s.good='Orange') and p.stock ='Grapes'
return s,n,p
which returns the result below.
However, I just want the following: only the one who trades Grapes, Orange and Apple, and nothing else.
I don't know which part of the Cypher is incorrect. Thank you for helping.
We have a knowledge base article on match intersection, which is what you're trying to do here. However, your other restriction is that these are the only 3 connected nodes, so we have some additional work to do.
Using the first approach in the article, we just have to add an additional predicate to ensure the degree of :TRADES relationships equals the size of the collection:
WITH ['Apple', 'Orange', 'Grapes'] as names
MATCH (g:good)<-[:TRADES]-(n:Name)
WHERE g.good in names AND size((n)-[:TRADES]->()) = size(names)
WITH n, size(names) as inputCnt, count(DISTINCT g) as cnt
WHERE cnt = inputCnt
RETURN n
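The two conditions in that query (the degree equals the number of wanted goods, and every wanted good is matched) can be illustrated with a small plain-Python sketch. The trader names and their goods are hypothetical sample data, and exclusive_traders is a made-up helper name; nothing here comes from the original graph.

```python
def exclusive_traders(trades, wanted):
    """Return names that trade all wanted goods and nothing else."""
    wanted = set(wanted)
    result = []
    for name, goods in trades.items():
        degree_matches = len(goods) == len(wanted)         # size((n)-[:TRADES]->()) = size(names)
        all_matched = len(goods & wanted) == len(wanted)   # count(DISTINCT g) = inputCnt
        if degree_matches and all_matched:
            result.append(name)
    return sorted(result)


# Hypothetical sample data for illustration only.
trades = {
    "trader_a": {"Apple", "Orange", "Grapes"},
    "trader_b": {"Apple", "Orange", "Grapes", "Banana"},
    "trader_c": {"Apple", "Grapes"},
}
result = exclusive_traders(trades, ["Apple", "Orange", "Grapes"])
print(result)
```

In this sample, trader_b fails the degree check because of the extra good, and trader_c fails the intersection check because Orange is missing.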

neo4j cartesian product performance improvement

I have a Graph database with over 2 million nodes. I have an application which takes a social graph and does some inference on it. As one step of the algorithm, I have to get all possible combinations of a relationship [:friends] of two connected nodes. Currently, I have a query which looks like:
match (a)-[:friend]-(c), (b)-[:friend]-(d) where id(a)={ida} and id(b)={idb} return distinct c as first, d as second
So, I already know the nodes a and b and I want to get all the possible pairs that can be made from friends of a and b.
This is obviously a very slow operation. I was wondering if there is a more efficient way of getting the same result in neo4j. Perhaps adding indexes might help? Any ideas / clues are welcome!
Example
Node a has friends : x, y
Node b has friends : g, h, i
Then the result should be:
x,g
x,h
x,i
y,g
y,h
y,i
If you are not doing so already, you should use labels to speed up your query, which might look like:
MATCH (p1:Person)-[:FRIEND]->(p3:Person),(p2:Person)-[:FRIEND]->(p4:Person)
WHERE ID(p1) = 6 AND ID(p2) = 7
RETURN p3 as first, p4 as second
Obviously that will rely on you having created your nodes with a :Person label.
How many friends does the average node have?
I wouldn't use two patterns but just one and the IN operator.
MATCH (p:Person)-[:FRIEND]->(friend:Person)
WHERE id(p) IN [1,2,3]
RETURN p, collect(friend) as friends
Then you have no cross product and you can also return the friends nicely as collection per person.
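If the pairs themselves are still needed, one option is to build them in the application from the collected friend lists. A minimal sketch in plain Python, assuming the query result has already been turned into a dictionary of friend lists (the dictionary layout is an assumption; the names a, b, x, y, g, h, i are the ones from the example in the question):

```python
from itertools import product

# Assumed shape of the collected result: person -> list of friends.
friends = {"a": ["x", "y"], "b": ["g", "h", "i"]}

# Every pair (friend of a, friend of b), built client-side.
pairs = list(product(friends["a"], friends["b"]))
print(pairs)
```

This moves the cross product out of the database: the query returns the sum of the two friend counts in rows, and the pairing happens in memory.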

neo4j cypher: "stacking" nodes from query result

Considering the existence of three types of nodes in a db, connected by the schema
(a)-[ra {qty}]->(b)-[rb {qty}]->(c)
with the user being able to have some of each in their wishlist or whatever.
What would be the best way to query the database to return a list of all the nodes the user has on their wishlist, considering that when they have an (a), the associated (b) and (c) should also be returned in the result, after multiplying some of their fields (say b.price and c.price) by the respective ra.qty and rb.qty?
NOTE: you can find the same problem without the variable length over here
Assuming you have users connected to the things they want like so:
(user:User)-[:WANTS]->(part:Part)
And that parts, like you describe, have dependencies on other parts in specific quantities:
CREATE
(a:Part)-[:CONTAINS {qty:2}]->(b:Part),
(a)-[:CONTAINS {qty:3}]->(c:Part),
(b)-[:CONTAINS {qty:2}]->(c)
Then you can find all parts, and how many of each, you need like so:
MATCH
(user:User {name:"Steven"})-[:WANTS]->(part),
chain=(part)-[:CONTAINS*1..4]->(subcomponent:Part)
RETURN subcomponent, sum( reduce( total=1, r IN relationships(chain) | total * r.qty) )
The 1..4 term says to look between 1-4 sub-components down the tree. You can obv. set that to whatever you like, including "1..", infinite depth.
The second term there is a bit complex. It helps to try the query without the sum to see what it does. Without that, the reduce will do the multiplying of parts that you want for each "chain" of dependencies. Adding the sum will then aggregate the result by subcomponent (inferred from your RETURN clause) and sum up the total count for that subcomponent.
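To see the arithmetic without a database, the same reduce-then-sum idea can be traced in plain Python on the three CONTAINS relationships created above. This is only an illustrative sketch: the dictionary layout and the function name required_quantities are assumptions, and the walk simply enumerates every chain of up to max_depth relationships, much like the variable-length pattern does.

```python
from collections import defaultdict

# The CONTAINS relationships from the CREATE statement: parent -> [(child, qty)].
contains = {"a": [("b", 2), ("c", 3)], "b": [("c", 2)]}


def required_quantities(part, contains, max_depth=4):
    """Sum, per subcomponent, the product of qty along every chain from part."""
    totals = defaultdict(int)

    def walk(node, multiplier, depth):
        if depth == max_depth:
            return
        for child, qty in contains.get(node, []):
            chain_total = multiplier * qty   # the reduce(...) over one chain
            totals[child] += chain_total     # the sum(...) per subcomponent
            walk(child, chain_total, depth + 1)

    walk(part, 1, 0)
    return dict(totals)


quantities = required_quantities("a", contains)
print(quantities)
```

For part a there are three chains: a to b (2), a to c (3), and a to b to c (2 * 2 = 4), so b is needed 2 times and c is needed 3 + 4 = 7 times.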
Figuring out the price is then a matter of multiplying the aggregate quantity of each part by its price. I'll leave that as an exercise for the reader ;)
You can try this out by running the queries in the online console at http://console.neo4j.org/
