Checking if subgraph fulfill condition as a whole - neo4j

I find it hard to explain, so consider the following picture
I'm trying to select all products that fulfill the warehouse requirements
In this example I need to select all products that have a maximum size of 5 AND maximum weight of 10.
To simplify, I only have MAX (no MIN or EQ) constraints, so the operator can be hardcoded.
I've tried to group the requirement subgraph using COLLECT and using the ALL operator, but failed.
Query to create the graph
CREATE
// NODES
(warehouse:WAREHOUSE{name:'My Warehouse'}),
(smallProduct:PRODUCT{name:'Small Product'}),
(largeProduct:PRODUCT{name:'Large Product'}),
// RELATIONSHIPS
(size:CONSTRAINT{name:'Size'}),
(weight:CONSTRAINT{name:'Weight'}),
(warehouse)-[:LIMIT{value:5}]->(size),
(warehouse)-[:LIMIT{value:5}]->(weight),
(smallProduct)-[:AMOUNT{value:3}]->(size),
(smallProduct)-[:AMOUNT{value:2}]->(weight),
(largeProduct)-[:AMOUNT{value:10}]->(size),
(largeProduct)-[:AMOUNT{value:4}]->(weight)
UPDATE
The following query apparently solves the problem:
MATCH (warehouse:WAREHOUSE)
MATCH rel = ((warehouse)-[limit:LIMIT]->(constraint:CONSTRAINT)<-[amount:AMOUNT]-(product:PRODUCT))
WITH warehouse, product, collect(relationships(rel)) as paths
WHERE all( p in paths WHERE p[0].value > p[1].value )
return product
I am wondering if there is a better solution.

Related

"Query Optimization" : Neo4j Query builds a cartesian product between disconnected patterns -

I’m supposed to have graph of multiple nodes(more than 2) with their relationships at 1st degree, second degree, third degree.
For that right now I am using this query
WITH ["1258311979208519680","3294971891","1176078684270333952",”117607868427845”] as ids
MATCH (n1:Target),(n2:Target) WHERE n1.id in ids and n2.id in ids and n1.id<>n2.id and n1.uid=103 and n2.uid=103
MATCH p = ((n1)-[*..3]-(n2)) RETURN p limit 30
In which 4 nodes Id’s are mention in WITH[ ] and next [*..3] it is used to draw 3rd degree graph between the selected nodes.
WHAT the ABOVE QUERY DOING
After running the above query it will return the mutual nodes in case of second degree [*..2] if any of the 2 selected nodes have mutual relation it’ll return.
WHAT I WANT
*1) First of all I want to optimize the query, as it is taking so much time and this query causing the Cartesian product which slow down the query process.
2) As in this above query if any 2 nodes have mutual relationship it will return the data, I WANT, the query will return mutual nodes attached with all selected nodes. Means if we have some nodes in return, these nodes must have relation to all selected target nodes.
Any suggestions to modify the query, to optimize the query.
If you are looking for to avoid the cartesian product issue with the given query
WITH ["1258311979208519680","3294971891","1176078684270333952",”117607868427845”] as ids
MATCH (n1:Target),(n2:Target) WHERE n1.id in ids and n2.id in ids and n1.id<>n2.id and n1.uid=103 and n2.uid=103
MATCH p = ((n1)-[*..3]-(n2)) RETURN p limit 30
I suggest to use this one below
MATCH (node1:Target) WHERE node1.id IN ["1258311979208519680","3294971891","1176078684270333952"]
MATCH (node2:Target) WHERE node2.id IN ["1258311979208519680","3294971891","1176078684270333952"]
and node1.id <> node2.id
MATCH p=(node1)-[*..2]-(node2)
RETURN p
It will remove the cartesian product issue.
Try this..

Cypher - Neo4j Query Profiling

I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into
is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN
> fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it requires 12 db hits although all nodes/ relationships are already retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
Also I have tested different versions of same above query, but the same Query Profile result is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.), queries are handled by the query planner and then turned into graph operations by neo4j's internals. It make sense that the query planner will find what is likely to be the most efficient way to search the graph (hence why we love neo), and so just because a cypher query is written one way, it won't necessarily search the graph in the way we imagine it will in our head.
The documentation on this seemed a little sparse (or, rather I couldn't find it properly), any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label, and value x for the index mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationshop to a node m with the :Consumer label, and value y for its proprerty mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use an expand into relationship, make the requirement for the relationship optional, or use explicitly iterative execution. Both these queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;

Neo4j relate nodes by same property

I have a Neo4J DB up and running with currently 2 Labels: Company and Person.
Each Company Node has a Property called old_id.
Each Person Node has a Property called company.
Now I want to establish a relation between each Company and each Person where old_id and company share the same value.
Already tried suggestions from: Find Nodes with the same properties in Neo4J and
Find Nodes with the same properties in Neo4J
following the first link I tried:
MATCH (p:Person)
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
resulting in no change at all and as suggested by the second link I tried:
START
p=node(*), c=node(*)
WHERE
HAS(p.company) AND HAS(c.old_id) AND p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN p, c;
resulting in a runtime >36 hours. Now I had to abort the command without knowing if it would eventually have worked. Therefor I'd like to ask if its theoretically correct and I'm just impatient (the dataset is quite big tbh). Or if theres a more efficient way in doing it.
This simple console shows that your original query works as expected, assuming:
Your stated data model is correct
Your data actually has Person and Company nodes with matching company and old_id values, respectively.
Note that, in order to match, the values must be of the same type (e.g., both are strings, or both are integers, etc.).
So, check that #1 and #2 are true.
Depending on the size of your dataset you want to page it
create constraint on (c:Company) assert c.old_id is unique;
MATCH (p:Person)
WITH p SKIP 100000 LIMIT 100000
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN count(*);
Just increase the skip value from zero to your total number of people in 100k steps.

Neo4j - Cypher statement to build relationships took near half of day to complete with error "Self-suppression not permitted"

Usually I am building relationships between nodes while loading from CSV files. Here is a statement written cypher I used this time to build relationships between nodes. The Language nodes are 39K and the Description nodes are 2M.
MATCH (d:Description),(l:Language)
> WHERE d.description_language = l.language_name
> CREATE (d)-[r:HAS_LANGUAGE]->(l);
After a long, run the error I got is:
Self-suppression not permitted
I have created indexes on for the properties to be used in the relationship.
Indexes
...
ON :Description(woka_id) ONLINE
ON :Description(description_language) ONLINE
ON :Language(language_id) ONLINE (for uniqueness constraint)
ON :Language(language_name) ONLINE (for uniqueness constraint)
...
What I am doing wrong here causing such a long time to complete the relationships creation (more than 10 hours)?
You are dealing with a very large cartesian product at the filter step:
WHERE d.description_language = l.language_name
You could try to MATCH the Descriptions, group them by their description_language and CREATE the relationships from there:
MATCH (d:Description)
WITH d.description_language AS dl, collect(d) as all_d_for_lang
MATCH (l:Language {language_name: dl})
UNWIND all_d_for_lang AS d
CREATE (l)-[:HAS_LANGUAGE]->(d)
If you look at the PROFILE of this query you'll see there are less DB hits (limit the number of descriptions in the first MATCH for testing).
In general, I think the best way would be to use your CSV files to generate relationships when you create the nodes, i.e. do this application side, not on the database.
Since you are creating relationships from every Description node and there are 2M of them I would just grab the description that are not yet matched and do them in smaller batches.
Something like...
match (d:Description)
where not ( d-[:HAS_LANGUAGE]->() )
with d
limit 200000
match (l:Language {language_name: d.description_language} )
create d-[:HAS_LANGUAGE]->l

neo4j cypher: "stacking" nodes from query result

Considering the existence of three types of nodes in a db, connected by the schema
(a)-[ra {qty}]->(b)-[rb {qty}]->(c)
with the user being able to have some of each in their wishlist or whatever.
What would be the best way to query the database to return a list of all the nodes the user has on their wishlist, considering that when he has an (a) then in the result the associated (b) and (c) should also be returned after having multiplied some of their fields (say b.price and c.price) for the respective ra.qty and rb.qty?
NOTE: you can find the same problem without the variable length over here
Assuming you have users connected to the things they want like so:
(user:User)-[:WANTS]->(part:Part)
And that parts, like you describe, have dependencies on other parts in specific quantities:
CREATE
(a:Part) -[:CONTAINS {qty:2}]->(b:Part),
(a:Part) -[:CONTAINS {qty:3}]->(c:Part),
(b:Part) -[:CONTAINS {qty:2}]->(c:Part)
Then you can find all parts, and how many of each, you need like so:
MATCH
(user:User {name:"Steven"})-[:WANTS]->(part),
chain=(part)-[:CONTAINS*1..4]->(subcomponent:Part)
RETURN subcomponent, sum( reduce( total=1, r IN relationships(chain) | total * r.rty) )
The 1..4 term says to look between 1-4 sub-components down the tree. You can obv. set that to whatever you like, including "1..", infinite depth.
The second term there is a bit complex. It helps to try the query without the sum to see what it does. Without that, the reduce will do the multiplying of parts that you want for each "chain" of dependencies. Adding the sum will then aggregate the result by subcomponent (inferred from your RETURN clause) and sum up the total count for that subcomponent.
Figuring out the price is then an excercise of multiplying the aggregate quantities of each part. I'll leave that as an exercise for the reader ;)
You can try this out by running the queries in the online console at http://console.neo4j.org/

Resources