DELETE the MIN counted data in neo4j

I want to delete some data after doing some counting in Neo4j. This can be done manually (count the data, then delete it), but I need someone to tell me whether it is possible to do this automatically, that is, counting and deleting in the same query. I couldn't find a way to return the rows with the minimum count after aggregating with the min() function. I can probably work around this with ORDER BY and LIMIT, but I want to be sure there is no other option before I go that way.
This is the link to the data. The data is a custom event log that consists only of case_id and activity name.
This is what I've already tried:
// LOAD DATA
LOAD CSV WITH HEADERS FROM "file:///*.csv" AS line
CREATE (:Activity {CaseId: line.Case_ID, Name: line.Activity});
LOAD CSV WITH HEADERS FROM "file:///*.csv" AS line
CREATE (:CaseActivity {CaseId: line.Case_ID, Name: line.Activity});
// SEQUENCE DISCOVERY
MATCH (c:Activity)
WITH collect(c) AS Caselist
UNWIND range(0, size(Caselist) - 2) AS idx
WITH Caselist[idx] AS s1, Caselist[idx+1] AS s2
MATCH (b:CaseActivity), (a:CaseActivity)
WHERE s1.CaseId = s2.CaseId AND
  s1.Name = a.Name AND
  s2.Name = b.Name AND
  s1.CaseId = a.CaseId AND
  s2.CaseId = b.CaseId
MERGE (a)-[:NEXT {relation: "NEXT"}]->(b);
// TRACE FREQUENCIES
MATCH (a:Activity)
WITH a.CaseId AS id, collect(a.Name) AS Trace_Type
MATCH (b:CaseActivity)
WHERE id = b.CaseId
RETURN count(DISTINCT b.CaseId) AS Frequencies, Trace_Type, collect(DISTINCT b.CaseId) AS CaseId
ORDER BY Frequencies DESC;

Your question did not specify what you wanted to delete. This query assumes that you wanted your last query to delete the b nodes (and return some data about the deleted b nodes):
MATCH (a:Activity)
WITH a.CaseId as id, COLLECT(a.Name) AS Trace_Type
MATCH (b:CaseActivity)
WHERE id = b.CaseId
WITH
COUNT(distinct b.CaseId) AS Frequencies,
Trace_Type,
COLLECT(distinct b.CaseId) AS CaseId,
COLLECT(DISTINCT b) AS bs
FOREACH(x IN bs | DELETE x)
RETURN Frequencies, Trace_Type, CaseId
ORDER BY Frequencies DESC;
Variables containing values obtained from deleted b nodes (like Frequencies and CaseId) will still be valid after the nodes are deleted.
A tricky thing to note about your specific example is that your last WITH clause was using aggregation, with Trace_Type as the grouping key. In order for my answer to avoid changing the grouping key (and thereby possibly changing your returned results), I just added COLLECT(DISTINCT b) AS bs to the WITH clause. Then, since each bs is a list of b nodes (for a Trace_Type), I used FOREACH to delete the nodes in each list.
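To come back to the MIN part of the question: one way to do the counting and the deletion in a single query, without relying on ORDER BY and LIMIT, is to collect the grouped rows, compute min() over the counts, and then delete only the groups whose count equals that minimum. The following is a minimal sketch, not a tested solution. It assumes you want to delete the CaseActivity nodes of the least frequent trace type(s); the names rows, minFreq and row are made up for illustration. It uses DETACH DELETE because the CaseActivity nodes have NEXT relationships, and Neo4j refuses a plain DELETE on a node that still has relationships.

```cypher
MATCH (a:Activity)
WITH a.CaseId AS id, COLLECT(a.Name) AS Trace_Type
MATCH (b:CaseActivity)
WHERE id = b.CaseId
WITH
  Trace_Type,
  COUNT(DISTINCT b.CaseId) AS Frequencies,
  COLLECT(DISTINCT b) AS bs
// Gather all groups into one list and compute the smallest count.
WITH COLLECT({Trace_Type: Trace_Type, Frequencies: Frequencies, bs: bs}) AS rows,
     MIN(Frequencies) AS minFreq
UNWIND rows AS row
// Keep only the group(s) whose count equals the minimum (ties are all kept).
WITH row, minFreq
WHERE row.Frequencies = minFreq
// DETACH DELETE also removes the NEXT relationships of each node.
FOREACH (x IN row.bs | DETACH DELETE x)
RETURN row.Frequencies AS Frequencies, row.Trace_Type AS Trace_Type;
```

If several trace types share the minimum count, this deletes the nodes of all of them, whereas ORDER BY Frequencies ASC LIMIT 1 would pick just one. It is worth running the query with the FOREACH line removed first, to check which groups would be affected.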

Related

Do not return set of nodes from a specific path in Cypher

I am trying to return a set of nodes from two sessions, with the condition that a returned node must not be present in a third session. I am using the following code, but it is not working as intended.
MATCH (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
UNWIND ['abc1', 'abc2'] as session_id
MATCH (target:Session {session_id: session_id})-[r:HAS_PRODUCT]->(product:Product)
where p<>product
WITH distinct product.products_id as products_id, r
RETURN products_id, count(r) as score
ORDER BY score desc
This query was supposed to return all nodes present in abc1 & abc2 but not in abc3. This query is not excluding all products present in abc3. Is there any way I can get it working?
UPDATE 1:
I tried to simplify it by dropping the UNWIND, like this:
match (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
MATCH (target:Session {session_id: 'abc1'})-[r:HAS_PRODUCT]->(product:Product)
where product <> p
WITH distinct product.products_id as products_id
RETURN products_id
Even this is not working. It returns all items present in abc1 without removing those that are already in abc3. It seems that where product <> p is not working correctly.

I would suggest checking whether the nodes are in a list. To prove out the approach, start with a very simple example.
Here is a simple Cypher query showing one way to do it. The approach can then be extended to the more complex query:
// get first two product IDs as a list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
RETURN list;
// now show two more product IDs which are not in that list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
MATCH (p2:Product)
WHERE NOT ID(p2) in list
RETURN ID(p2) LIMIT 2
Note: I'm comparing the ID() of the nodes instead of the entire nodes. The db hits are the same, but it may be more performant.
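Applied to the sessions from the question, the same list idea might look like the sketch below. It is untested against the real data and assumes the labels and properties from the question; excluded is a made-up variable name. The original WHERE p <> product does not work because it is evaluated once per (p, product) pair of rows: a product from abc1 is only filtered out on the row where it is paired with itself, and it survives on every row where it is paired with some other product of abc3.

```cypher
// Collect the products of session abc3 into a single list first.
MATCH (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
WITH collect(p) AS excluded
// Then keep only products of abc1/abc2 that are not in that list.
MATCH (s:Session)-[r:HAS_PRODUCT]->(product:Product)
WHERE s.session_id IN ['abc1', 'abc2'] AND NOT product IN excluded
RETURN product.products_id AS products_id, count(r) AS score
ORDER BY score DESC
```

Because collect() is used without any grouping key, the first part still produces one row (with an empty list) when abc3 has no products, so the second MATCH runs in that case too.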

Deleting duplicate relationships in neo4j - is this correct?

I have developed a query which, by trial and error, appears to find all of the duplicated relationships in a Neo4j DB. I want to delete all but one of these relationships, but I'm concerned that I have not thought of problematic cases that could result in unintended data deletion.
So, does this query delete all but one of a duplicated relationship?
MATCH (a)-->(b)<--(a) // identify where the duplication is present
WITH DISTINCT a, b
MATCH (a)-[r]->(b) // get all duplicated relationships themselves
WITH a, b, collect(r)[1..] as rs // remove the first instance from the list
UNWIND rs as r
DELETE r
If I replace UNWIND rs as r DELETE r with WITH a, b, count(rs) as cnt RETURN cnt, it seems to return the unnecessary relationships.
I'm still reluctant to put this somewhere to be used by others, though...
Thanks

First of all, let me (strictly) define the term "duplicate relationships". Two relationships are duplicates if they:
1. Connect the same pair of nodes (call them a and b)
2. Have the same relationship type
3. Have exactly the same set of properties (both names and values)
4. Have the same directionality between a and b (iff directionality is significant for the use case)
Your query only considers #1 and #4, so in general it could delete non-duplicate relationships as well.
Here is a query that will take all of the above into consideration (assuming #4 should be included):
MATCH (a)-[r1]->(b)<-[r2]-(a)
WHERE TYPE(r1) = TYPE(r2) AND PROPERTIES(r1) = PROPERTIES(r2)
WITH a, b, apoc.coll.union(COLLECT(r1), COLLECT(r2))[1..] AS rs
UNWIND rs as r
DELETE r
Aggregating functions (like COLLECT) use non-aggregated terms as grouping keys, so there is no need for the query to perform a separate redundant DISTINCT a,b test.
The APOC function apoc.coll.union returns the distinct union of its 2 input lists.
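One thing to watch in the query above: the grouping keys are only a and b, so if the same pair of nodes has duplicates of two different kinds (say two FOO relationships and two BAR relationships), everything ends up in one list and [1..] drops all but a single relationship overall. A possible variant, sketched here without APOC and not tested on real data, groups by type and properties as well, so that one relationship per distinct kind is kept. relType, props, rels and dup are illustrative names.

```cypher
// Group relationships by start node, end node, type and property map.
MATCH (a)-[r]->(b)
WITH a, b, type(r) AS relType, properties(r) AS props, collect(r) AS rels
// Only groups with more than one member contain duplicates.
WHERE size(rels) > 1
// Keep the first relationship of each group and delete the rest.
UNWIND rels[1..] AS dup
DELETE dup
```

This touches every relationship in the database, so on a large graph it may be worth restricting the MATCH to specific labels or relationship types, and replacing DELETE dup with RETURN dup for a dry run first.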

Nodes with relationship to multiple nodes

I want to get the Persons that know everyone in a group of persons which know some specific places.
This:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(DISTINCT b) as persons
Match (a:Person)
WHERE ALL(b in persons WHERE (a)-[:knows]->(b))
RETURN a
works, but the second part does a full NodeByLabelScan before applying the WHERE clause, which is extremely slow; in a bigger db it takes 8~9 seconds. I also tried this:
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
Match (a:Person)-[:knows]->(b)
RETURN a
This only needs 2 ms; however, it returns all persons that know any person of group b, instead of those that know everyone.
So my question is: is there an efficient query to get what I want?

We have a knowledge base article for this kind of query that shows a few approaches.
One of these is to match the :Person nodes that know members of the group, and then count the number of times each of those persons shows up in the results. Provided there aren't multiple :knows relationships between the same two people, if the count is equal to the size of the collection of people from your first match, then that person must know all of the people in the collection.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
WITH collect(b) as persons
UNWIND persons as b // so we have the entire list of persons along with each person
WITH size(persons) as total, b
MATCH (a:Person)-[:knows]->(b)
WITH total, a, count(a) as knownCount
WHERE total = knownCount
RETURN a
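If multiple :knows relationships between the same two people are possible, or the same person can match the first pattern more than once, a variant of the query above can count distinct b nodes instead of rows. This is only a sketch under that assumption, using the same labels and names as above:

```cypher
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'})
// DISTINCT keeps each group member once, so size(persons) is the true group size.
WITH collect(DISTINCT b) AS persons
UNWIND persons AS b
WITH size(persons) AS total, b
MATCH (a:Person)-[:knows]->(b)
// Counting distinct b nodes ignores parallel :knows relationships.
WITH total, a, count(DISTINCT b) AS knownCount
WHERE knownCount = total
RETURN a
```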

Here is a simpler Cypher query that also compares counts, the same basic idea used by @InverseFalcon.
MATCH (:Place {name:'Breiter Weg'})<-[:knows]-(b:Person)-[:knows]->(:Place {name:'Buchhandel'}), (a:Person)-[:knows]->(b)
WITH COLLECT({a:a, b:b}) as data, COUNT(DISTINCT b) AS total
UNWIND data AS d
WITH total, d.a AS a, COUNT(d.b) AS bCount
WHERE total = bCount
RETURN a

Return multiple sums of relationship weights using cypher

I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
1. Find all the nodes that connect to the query subset
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
2. Then, for each connected node (b), return the SUM of the weights of the edges that connect it to the query subset
For example, if node b1 has 4 edges that connect to nodes in the query subset, I would like to RETURN SUM(r2.weight) AS totalWeight for b1. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important, as I have 30,000 nodes and 1.5M edges. If you have any suggestions in that regard, please throw them into the mix.
Many thanks
Matt

Why do you need so many MATCH statements? You can specify the a nodes and the b nodes in a single MATCH statement and select only those that have a relationship between them.
After that, just return the b nodes and the sum of the weights. b automatically acts as the grouping key when it is returned alongside an aggregating function such as sum.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum

I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).
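Since speed was part of the question, here is a rough sketch of how the single-MATCH form could be combined with a list parameter and an index on name. Nothing here is measured. $queryNames and node_name_idx are hypothetical names, and the CREATE INDEX syntax shown assumes a reasonably recent Neo4j (4.x or later); older versions use a different form.

```cypher
// Optional: index the name property so the IN lookup does not scan all nodes.
CREATE INDEX node_name_idx IF NOT EXISTS FOR (n:nodeName) ON (n.name);

// $queryNames is a hypothetical list parameter, e.g. ['n1', 'n2', 'n3'].
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE a.name IN $queryNames AND NOT b.name IN $queryNames
// b.name is the grouping key; sum and count are computed per b.
RETURN b.name AS name, sum(r.weight) AS totalWeight, count(r) AS edgeCount
ORDER BY totalWeight DESC;
```

Running the query with PROFILE in front of it shows whether the index is actually used for the a.name lookup.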

Perform MATCH on collection / Break apart collection

I am trying to mimic the functionality of the Neo4j browser to display my graph in my front end. The Neo4j browser issues two calls for every query: the first call performs the query that the user types into the query box, and the second call finds the relationships between every node returned by the first, user-entered query.
{
  "statements":[{
    "statement":"START a = node(1,2,3,4), b = node(1,2,3,4) MATCH a -[r]-> b RETURN r;",
    "resultDataContents":["row","graph"],
    "includeStats":true}]
}
In my application I would like to be more efficient so I would like to be able to get all of my nodes and relationships in a single query. The query that I have at present is:
START person = node({personId})
MATCH person-[:RELATIONSHIP*]-(p:Person)
WITH distinct p
MATCH p-[r]-(d:Data), p-[:DETAILS]->(details), d-[:FACT]->(facts)
RETURN p, r, d, details, facts
This query runs well but it doesn't give me the "d" and "details" nodes which were linked to the original "person".
I have tried to join the "p" and "person" results in a collection:
collect(p) + collect(person) AS people
But this does not allow me to perform a MATCH on the resulting collection. As far as I can figure out there is no way of breaking apart a collection.
The only option I see at the moment is to split the query into two; return the "collect(p) + collect(person) AS people" collection and then use the node values in a second query. Is there a more efficient way of performing this query?

If you use the quantifier *0.., RELATIONSHIP is also matched at a depth of 0, making person the same as p in that case. The * without specified limits defaults to 1..infinity.
START person = node({personId})
MATCH person-[:RELATIONSHIP*0..]-(p:Person)
WITH distinct p
MATCH p-[r]-(d:Data), p-[:DETAILS]->(details), d-[:FACT]->(facts)
RETURN p, r, d, details, facts
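START and node patterns without parentheses belong to older Cypher versions. On a version without START, the same idea could be written roughly as follows. This is an untested sketch that assumes the node is looked up by its internal id through a $personId parameter, and that the starting node carries the Person label (otherwise the zero-length match would not bind it to p):

```cypher
// Look up the starting node by internal id (assumed parameter $personId).
MATCH (person) WHERE id(person) = $personId
// *0.. also matches a path of length zero, so person itself is one of the p rows.
MATCH (person)-[:RELATIONSHIP*0..]-(p:Person)
WITH DISTINCT p
MATCH (p)-[r]-(d:Data), (p)-[:DETAILS]->(details), (d)-[:FACT]->(facts)
RETURN p, r, d, details, facts
```

As for breaking apart a collection: in Cypher versions that have it, UNWIND turns a list such as collect(p) + collect(person) back into rows that a following MATCH can use.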
