I have a query that returns two variables A & B. A returns collections that contain a variable number of values. B is a mathematical score related to each collection. I want to store A as nodes in the database with property B. Then if possible to check each node of A against the other nodes and create a relationship if both the values of B> some number. Is this possible in Neo4j?
An example of what my A and B looks like
MATCH (t:Trans)-[:CONTAINS]->(i2:Item), (t:Trans)-[:CONTAINS]->(i1:Item), (t:Trans)-[:CONTAINS]->(i3:Item)
WITH i1, i2, i3, COUNT(*) as c
WHERE c>100
WITH COLLECT({i1: i1, i2:i2, i3:i3, c: c}) AS data
UNWIND data AS d
WITH COLLECT({i1:d.i1.I_ID, i2:d.i2.I_ID, i3:d.i3.I_ID}) as Itemset, d
RETURN Itemset, d.c as NumTransactions
A B
{a,b,c} 45
{e,f} 23
{a,e,f} 12
{d} 89
[UPDATED]
Your query is doing a lot of unnecessary work. Also, your Itemset is always a list containing a single map, so having a list seems unnecessary.
This query should be equivalent (except that Itemset is just a map):
MATCH (t:Trans)-[:CONTAINS]->(i2:Item), (t)-[:CONTAINS]->(i1:Item), (t)-[:CONTAINS]->(i3:Item)
WITH {i1: i1.I_ID, i2: i2.I_ID, i3: i3.I_ID} AS Itemset, COUNT(*) as NumTransactions
WHERE NumTransactions > 100
RETURN Itemset, NumTransactions
It feels to me like you should not be storing the Itemset in an A node. Here is an example of how to create relationships from each Item to the same B node:
MATCH (t:Trans)-[:CONTAINS]->(i2:Item), (t)-[:CONTAINS]->(i1:Item), (t)-[:CONTAINS]->(i3:Item)
WITH i1.I_ID AS id1, i2.I_ID AS id2, i3.I_ID AS id3, COUNT(*) as c
WHERE c > 100
MERGE (b:B {numTransactions: c})
MERGE (i1)-[:FOO]->(b)
MERGE (i2)-[:FOO]->(b)
MERGE (i3)-[:FOO]->(b)
The second part looks easy enough:
give the A nodes a distinct label
MATCH (n:A_LABEL) WHERE a.b_property > 35 RETURN (n)
iterate over the list, checking for a relationship from the first one to each of the rest, and creating one if it's not there.
Could the values change over time? If so, you may need a corresponding function that removes this kind of relationship from nodes whose score is now <= the threshold value.
It's harder to advise about how to store the collections, without knowing more about the problem you're solving.
Related
I have developed a query which, by trial and error, appears to find all of the duplicated relationships in a Neo4j DB. I want delete all but one of these relationships but I'm concerned that I have not thought of problematic cases that could result in data deletion.
So, does this query delete all but one of a duplicated relationship?
MATCH (a)-->(b)<--(a) # identify where the duplication is present
WITH DISTINCT a, b
MATCH (a)-[r]->(b) # get all duplicated paths themselves
WITH a, b, collect(r)[1..] as rs # remove the first instance from the list
UNWIND rs as r
DELETE r
If I replace the UNWIND rs as r; DELETE r with WITH a, b, count(rs) as cnt RETURN cnt it seems to return the unnecessary relationships.
I'm still relucant to put this somewhere to be used by others, though....
Thanks
First of all, let me (strictly) define the term: "duplicate relationships". Two relationships are duplicates if they:
Connect the same pair of nodes (call them a and b)
Have the same relationship type
Have exactly the same set of properties (both names and values)
Have the same directionality between a and b (iff directionality is significant for use case)
Your query only considers #1 and #4, so it generally could delete non-duplicate relationships as well.
Here is a query that will take all of the above into consideration (assuming #4 should be included):
MATCH (a)-[r1]->(b)<-[r2]-(a)
WHERE TYPE(r1) = TYPE(r2) AND PROPERTIES(r1) = PROPERTIES(r2)
WITH a, b, apoc.coll.union(COLLECT(r1), COLLECT(r2))[1..] AS rs
UNWIND rs as r
DELETE r
Aggregating functions (like COLLECT) use non-aggregated terms as grouping keys, so there is no need for the query to perform a separate redundant DISTINCT a,b test.
The APOC function apoc.coll.union returns the distinct union of its 2 input lists.
MATCH (c:someNode) WHERE LOWER(c.erpId) contains (LOWER("1"))
OR LOWER(c.constructionYear) contains (LOWER("1"))
OR LOWER(c.label) contains (LOWER("1"))
OR LOWER(c.name) contains (LOWER("1"))
OR LOWER(c.description) contains (LOWER("1"))with collect(distinct c) as rows, count(c) as total
MATCH (c:someNode)-[adtype:OFFICIAL_someNode_ADDRESS]->(ad:anotherObject)
WHERE toString(ad.streetAddress) contains "1"
OR toString(ad.postalCity) contains "1"
with distinct rows+collect( c) as rows, count(c) +total as total
UNWIND rows AS part
RETURN part order by part.name SKIP 20 Limit 20
When I run the following cypher query it returns duplicate results. Also it the skip does not seem to work. What am I doing worng
When you use WITH DISTINCT a, b, c (or RETURN DISTINCT a, b, c), that just means that you want each resulting record ({a: ..., b: ..., c: ...}) to be distinct -- it does not affect in any way the contents of any lists that may be part of a, b, or c.
Below is a simplified query that might work for you. It does not use the LOWER() and TOSTRING() functions at all, as they appear to be superfluous. It also only uses a single MATCH/WHERE pair to find all the the nodes of interest. The pattern comprehension syntax is used as part of the WHERE clause to get a non-empty list of true value(s) iff there are any anotherObject node(s) of interest. Notice that DISTINCT is not needed.
MATCH (c:someNode)
WHERE
ANY(
x IN [c.erpId, c.constructionYear, c.label, c.name, c.description]
WHERE x CONTAINS "1") OR
[(c)-[:OFFICIAL_someNode_ADDRESS]->(ad:anotherObject)
WHERE ad.streetAddress CONTAINS "1" OR ad.postalCity CONTAINS "1"
| true][0]
RETURN c AS part
ORDER BY part.name SKIP 20 LIMIT 20;
I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
Find all the nodes that connect to the query subset
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
Then for each connected node (b) I want to return the SUM of the weights that connect to the query subset
For example. If node b1 has 4 edges that connect to nodes in the query subset I would like to RETURN SUM(r2.weight) AS totalWeight for b2. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important as I have 30,000 nodes and 1.5M edges if you have any suggestions regarding this please throw them into the mix.
Many thanks
Matt
Why do you need so many Match statements? You can specify a nodes and b nodes in single Match statement and select only those who have a relationship between them.
After that just return b nodes and sum of the weights. b nodes will automatically be acting as a group by if it is returned along with aggregation function such as sum.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).
A node A has 3 connected Nodes B1, B2, B3. Those Bx Nodes have again connected Nodes C1,C2,C3 and C4. Also Node A have 2 connected nodes C5 and C6.
Starting with node A I want to collect all C-nodes. I did a query for the A node, collect the two C-Nodes, then a query for the B-nodes, collect again all C-nodes and merge both arrays. Work but is not very clever.
I tried (Pseudocode)
MATCH (g)<-[:IS_SUBGROUP_OF*1]-(i)-[:HAS_C_NODES]->(c) WHERE g = A.uuid RETURN C_NODES
But I get either all c-nodes for A or for the B-nodes
How would I do a query that collects all C-Nodes starting with Node A?
* edited *
Here is some example data:
CREATE (a:A), (b1:B1), (b2:B2), (b3:B3), (c1:C1), (c2:C2), (c3:C3), (c4:C4), (a)-[r:HAS]->(c4), (a)-[r1:HAS]->(b1), (a)-[r2:HAS]->(b2), (a)-[r3:HAS]->(b3), (b1)-[r4:HAS]->(c1), (b1)-[r5:HAS]->(c2), (b2)-[r6:HAS]->(c3)
A query should return all nodes starting with C, no matter to which node they are connected (A or B).
You can add multiple labels for each node. You should use this to your advantage and segregate all the B and C nodes into a second label.
Eg:
CREATE (a:A), (b1:B1:BType), (b2:B2:BType), (b3:B3:BType), (c1:C1:CType), (c2:C2:CType), (c3:C3:CType), (c4:C4:CType), (a)-[r:HAS]->(c4), (a)-[r1:HAS]->(b1), (a)-[r2:HAS]->(b2), (a)-[r3:HAS]->(b3), (b1)-[r4:HAS]->(c1), (b1)-[r5:HAS]->(c2), (b2)-[r6:HAS]->(c3)
I have modified your create statement to group all the B nodes as :BType label and all the C nodes as :CType label.
You can simply use the optional match keyword to selectively traverse through the relationships if they exist and obtain the results you want.
match (a:A)-[:HAS]->(b:BType)-[:HAS]->(c:CType) optional match (a:A)-[:HAS]->(xc:CType) return c,xc
If you would like both sets of nodes to be grouped together you could try this statement instead which uses collect().
match (a:A)-[:HAS]->(b:BType)-[:HAS]->(c:CType) with a,collect (distinct c) as set1 optional match (a:A)-[:HAS]->(xc:CType) return set1 + collect (distinct xc) as output
I have nodes A and B and a relationship R from A to B. R has a property date_created. Is it possible to construct a Neo4j query that matches a specific A and returns the related (through R) B nodes with, say, the 10 highest values of date_created?
I think it would be straightforward if I wanted to return the B nodes that had a date_created property on R of greater than some specific date, but not sure if I can do the kind of relative query I want to do?
Thanks,
Paul
You aren't listing any labels, so I'll assume that you have a 'Object' label:
MATCH (a:Object)-[:r]->(b:Object)
WHERE a.some_property = {some_value}
RETURN b
ORDER BY r.date_created DESC
LIMIT 10