Neo4J cypher query to find similar graphs

I have several separated graphs in a single database and I am currently searching for a way to get a list of all similar graphs.
For instance, I have the following three graphs:
As you can see, graphs 1 and 2 are similar and graph 3 is different, because the last node of graph 3 has Label_4 and not Label_3 (as is the case for graphs 1 and 2).
Therefore, I would like to get as a result of the query something like:
[a1->b1->c1,a2->b2->c2],[a3->b3->d3]
where a1->b1->c1 is graph 1, a2->b2->c2 is graph 2, and a3->b3->d3 is graph 3.
Is there a way to achieve this with Cypher? The representation of the result can also be different, as long as it groups similar graphs (e.g., a list of node IDs or only the start node IDs would also be fine).
For the creation of the example I used the following commands:
CREATE (a1:Label_1 {name: "Label_1"})
CREATE (b1:Label_2 {name: "Label_2"})
CREATE (c1:Label_3 {name: "Label_3"})
CREATE (a2:Label_1 {name: "Label_1"})
CREATE (b2:Label_2 {name: "Label_2"})
CREATE (c2:Label_3 {name: "Label_3"})
CREATE (a3:Label_1 {name: "Label_1"})
CREATE (b3:Label_2 {name: "Label_2"})
CREATE (d3:Label_4 {name: "Label_4"})
CREATE (a1)-[:FOLLOWS]->(b1)
CREATE (b1)-[:FOLLOWS]->(c1)
CREATE (a2)-[:FOLLOWS]->(b2)
CREATE (b2)-[:FOLLOWS]->(c2)
CREATE (a3)-[:FOLLOWS]->(b3)
CREATE (b3)-[:FOLLOWS]->(d3)

If you are (A) trying to group complete directed graphs (i.e., directed graphs that start at a root node and end at a leaf node), and (B) only interested in using one of the (possibly many) labels of each node, this should work. Be aware that, due to the unbounded variable-length relationship, it could take a very long time or run out of memory on large DBs:
MATCH p = (n)-[*]->(m)
WHERE NOT ()-->(n) AND NOT (m)-->()
RETURN [x IN NODES(p) | LABELS(x)[0]] as labelPath, COLLECT(p)
You can remove the (A) constraint by removing the WHERE clause, but then you'd have a much bigger result set (and increase the time to completion and the risk of running out of memory).
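The grouping step of the query can also be illustrated outside Neo4j. The following is only a minimal Python sketch, assuming each root-to-leaf path has already been fetched as a list of (node name, first label) pairs; the function name group_paths_by_labels and the sample data are hypothetical and merely mirror the three example graphs from the question.

```python
from collections import defaultdict

def group_paths_by_labels(paths):
    """Group paths whose nodes carry the same sequence of labels."""
    groups = defaultdict(list)
    for path in paths:
        # The label sequence plays the role of labelPath in the Cypher query.
        signature = tuple(label for _name, label in path)
        groups[signature].append([name for name, _label in path])
    return dict(groups)

# Hypothetical data mirroring the three example graphs.
paths = [
    [("a1", "Label_1"), ("b1", "Label_2"), ("c1", "Label_3")],
    [("a2", "Label_1"), ("b2", "Label_2"), ("c2", "Label_3")],
    [("a3", "Label_1"), ("b3", "Label_2"), ("d3", "Label_4")],
]

for signature, members in group_paths_by_labels(paths).items():
    print(signature, members)
```

Graphs 1 and 2 end up under the same label signature, while graph 3 forms its own group.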

Related

Find all leaves of a selected subgraph with Neo4j/ Cypher

Initial Situation
Large Neo4j 3.4.6 graph with a tree-like structure (10 levels deep, 10 million nodes).
Without exception, all nodes are connected to each other. All nodes are of the same type, and so are all relationships.
Exactly one central root node.
Reduced and simplified example:
Graphic representation
CREATE (Root:CustomType {name: 'Root'})
CREATE (NodeA:CustomType {name: 'NodeA'})
CREATE (NodeB:CustomType {name: 'NodeB'})
CREATE (NodeC:CustomType {name: 'NodeC'})
CREATE (NodeD:CustomType {name: 'NodeD'})
CREATE (NodeE:CustomType {name: 'NodeE'})
CREATE (NodeF:CustomType {name: 'NodeF'})
CREATE (NodeG:CustomType {name: 'NodeG'})
CREATE (NodeH:CustomType {name: 'NodeH'})
CREATE (NodeI:CustomType {name: 'NodeI'})
CREATE (NodeJ:CustomType {name: 'NodeJ'})
CREATE (NodeK:CustomType {name: 'NodeK'})
CREATE (NodeL:CustomType {name: 'NodeL'})
CREATE (NodeM:CustomType {name: 'NodeM'})
CREATE (NodeN:CustomType {name: 'NodeN'})
CREATE (NodeO:CustomType {name: 'NodeO'})
CREATE (NodeP:CustomType {name: 'NodeP'})
CREATE (NodeQ:CustomType {name: 'NodeQ'})
CREATE
(Root)-[:CONTAINS]->(NodeA),
(Root)-[:CONTAINS]->(NodeB),
(Root)-[:CONTAINS]->(NodeC),
(NodeA)-[:CONTAINS]->(NodeD),
(NodeA)-[:CONTAINS]->(NodeE),
(NodeA)-[:CONTAINS]->(NodeF),
(NodeE)-[:CONTAINS]->(NodeG),
(NodeE)-[:CONTAINS]->(NodeH),
(NodeF)-[:CONTAINS]->(NodeI),
(NodeF)-[:CONTAINS]->(NodeJ),
(NodeF)-[:CONTAINS]->(NodeK),
(NodeI)-[:CONTAINS]->(NodeL),
(NodeI)-[:CONTAINS]->(NodeM),
(NodeJ)-[:CONTAINS]->(NodeN),
(NodeK)-[:CONTAINS]->(NodeO),
(NodeK)-[:CONTAINS]->(NodeP),
(NodeM)-[:CONTAINS]->(NodeQ);
Challenge to be solved
By means of a MATCH-WITH-UNWIND Cypher query I’m successfully able to select a subtree and bind it to a path. Let’s say the subtree spans over the nodes A,E,F,I and J.
Based on this path I need all leaves of the subtree, not those of the complete tree.
MATCH
path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'}) /* simplified */
WITH
nodes(path) as selectedPath
/* here: necessary magic to identify the leaf nodes of the subtree */
RETURN
leafNode;
Among other things I tried to solve the requirement with a WHERE NOT(node-->()) approach, but realized this works for leaves of the complete tree only. Unfortunately I was not able to convince the WHERE NOT(node-->()) clause to respect the selected subtree boundaries.
So, how can I find all leaves of a selected subgraph with Cypher and Neo4j? Can you please give me some advice on how to solve this challenge? Many thanks in advance for pointing me in the right direction!

You correctly noted that checking for nodes with no children only works for the entire tree. So you need to go through all the relationships in the subtree and find every node that appears as the end of a relationship but never as the start of one:
MATCH
path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'})
UNWIND relationships(path) AS r
WITH collect(DISTINCT endNode(r)) AS endNodes,
collect(DISTINCT startNode(r)) AS startNodes
UNWIND endNodes AS leaf
WITH leaf WHERE NOT leaf IN startNodes
RETURN leaf
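The same "end node but never start node" rule can be sketched in plain Python. This is only an illustration, assuming the relationships of the selected subtree are available as (start, end) name pairs; the function name subtree_leaves and the sample data are hypothetical.

```python
def subtree_leaves(relationships):
    """Return nodes that end a relationship but never start one, in first-seen order."""
    start_nodes = {start for start, _end in relationships}
    leaves = []
    for _start, end in relationships:
        if end not in start_nodes and end not in leaves:
            leaves.append(end)
    return leaves

# Hypothetical relationships of a subtree spanning the nodes A, E, F, I and J.
selected = [
    ("NodeA", "NodeE"),
    ("NodeA", "NodeF"),
    ("NodeF", "NodeI"),
    ("NodeF", "NodeJ"),
]

print(subtree_leaves(selected))  # ['NodeE', 'NodeI', 'NodeJ']
```

NodeE and NodeI have children in the complete tree, but they count as leaves here because only the relationships inside the selected subtree are considered.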

Neo4j's Cypher query language - reducing nodes in a match

Relatively new to Neo4j. I realize the way I originally posted this it was too ambiguous. Below is hopefully a better explanation.
//Subgraph 1
Create (p1:Person {name: 'Person1'})
Create (p2:Person {name: 'Person2'})
Create (a1:Address {street: 'Suspicious'})
Create (p1)-[:Resides]->(a1)
Create (p2)-[:Resides]->(a1)
//Subgraph 2
Create (p3:Person {name: 'Person3'})
Create (p4:Person {name: 'Person4'})
Create (a2:Address {street: 'Double'})
Create (p3)-[:Resides]->(a2)
Create (p4)-[:Resides]->(a2)
Create (p3)-[:Knows]->(p4)
//Subgraph 3
Create (p5:Person {name: 'Person5'})
Create (a3:Address {street: 'Single'})
Create (p5)-[:Resides]->(a3)
What I would like to write is a query to detect the following:
- All addresses (and people) that have 2 or more People residing there that do not know each other.
This means that only Subgraph1 should be found.
Subgraph2 would not be found because there are 2 people that reside there but they know each other.
Subgraph3 would not be found because there is only 1 person residing there.
Again, thanks for the help.

This Cypher query should work:
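MATCH (n1)-[:Resides]->()<-[:Resides]-(n2)
WHERE NOT exists((n1)-[:Knows]-(n2))
RETURN n1, n2
Start by matching nodes that have a Resides relationship to the same node, then filter out pairs that are connected by a Knows relationship. Relationship types are case-sensitive, so they are spelled here exactly as created in the question (Resides and Knows).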
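For readers who want to check the intended result against the sample data, here is a small Python sketch of the same filtering logic. It assumes the residence and acquaintance relationships are given as plain tuples; the function name strangers_sharing_address is hypothetical. Unlike the Cypher query, it reports each pair only once and also returns the address.

```python
from itertools import combinations

def strangers_sharing_address(resides, knows):
    """resides: (person, address) pairs; knows: (person, person) pairs, direction ignored."""
    acquainted = {frozenset(pair) for pair in knows}
    by_address = {}
    for person, address in resides:
        by_address.setdefault(address, []).append(person)
    result = []
    for address, people in by_address.items():
        for p1, p2 in combinations(people, 2):
            if frozenset((p1, p2)) not in acquainted:
                result.append((address, p1, p2))
    return result

# Sample data mirroring the three subgraphs from the question.
resides = [
    ("Person1", "Suspicious"), ("Person2", "Suspicious"),
    ("Person3", "Double"), ("Person4", "Double"),
    ("Person5", "Single"),
]
knows = [("Person3", "Person4")]

print(strangers_sharing_address(resides, knows))
# [('Suspicious', 'Person1', 'Person2')]
```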

neo4j: REST API call to get the whole connected subgraph

I am building a D3 viewer and would like to get all the nodes connected to a given node, plus all the links connecting these nodes. In fact, the same thing as in the default Neo4j viewer.
For example, I have
CREATE (a:Person {name:'a'})
CREATE (b:Person {name:'b'})
CREATE (c:Person {name:'c'})
CREATE (d:Person {name:'d'})
CREATE (a)-[:KNOWS]->(b)
CREATE (a)-[:KNOWS]->(c)
CREATE (b)-[:KNOWS]->(c)
CREATE (c)-[:KNOWS]->(d)
I can issue some ugly cypher query to give to the POST call, but it does not scale with medium size graphs (~100k nodes, 3M relationships).
Looking at the queries posted by the Neo4j browser, it looks like there are two successive ones:
A) get the connected nodes
MATCH (p:Person {name:'a'})-[l:KNOWS]-(q:Person) RETURN p,q
B) get the links existing between these nodes
"START a = node(185282,185283,185284), b = node(185282,185283,185284)
MATCH a -[r]-> b
RETURN r;"
My two questions:
Is there a clean way to get list of unique nodes?
Is this two steps method the preferred one?
Again, the data can be a bit heavy, so let's keep that in mind
Thanks for the help
Alex

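You can pipe the results of the first query into the second one using WITH to get the connections between the 'a' Person and the Persons it knows. Note that the second MATCH binds r only between p and each q, so relationships among the neighbours themselves (such as the one between b and c) are not returned by this query: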
MATCH (p:Person)-[l:KNOWS]-(q:Person)
WHERE p.name = "a"
WITH p,q
MATCH (p)-[r]-(q)
RETURN p,r,q
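The two-step approach described in the question (first the connected nodes, then every link among them) can be sketched on plain data as follows. This is a minimal Python illustration, assuming the relationships are available as (source, target) pairs; the function name neighbourhood_subgraph is hypothetical.

```python
def neighbourhood_subgraph(edges, centre):
    """Return (nodes, links): the centre, its direct neighbours, and all edges among them."""
    # Step A: collect the unique nodes connected to the centre (direction ignored).
    nodes = {centre}
    for source, target in edges:
        if source == centre:
            nodes.add(target)
        elif target == centre:
            nodes.add(source)
    # Step B: keep every edge whose two endpoints are both in that node set.
    links = [(s, t) for s, t in edges if s in nodes and t in nodes]
    return nodes, links

# Sample data mirroring the KNOWS relationships from the question.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
nodes, links = neighbourhood_subgraph(edges, "a")

print(sorted(nodes))  # ['a', 'b', 'c']
print(links)          # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Using a set for the nodes gives the unique node list the question asks about, and the link between b and c is included even though neither end is the centre.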

Get nodes that don't have certain relationship (cypher/neo4j)

I have the following two node types:
c:City {name: 'blah'}
s:Course {title: 'whatever', city: 'New York'}
Looking to create this:
(s)-[:offered_in]->(c)
I'm trying to get all courses that are NOT tied to cities and create the relationship to the city (the city gets created if it doesn't exist). However, the issue is that my dataset is about 5 million nodes and any query I make times out (unless I do it in increments of 10k).
Does anybody have any advice?
EDIT:
Here is a query for jobs that I'm running now. It has to be done in 10k chunks (out of millions) because it takes a few minutes as it is. It creates the city if it doesn't exist:
match (j:Job)
where not has(j.merged) and has(j.city)
WITH j
LIMIT 10000
MERGE (c:City {name: j.city})
WITH j, c
MERGE (j)-[:in]->(c)
SET j.merged = 1
return count(j)
(For now I don't know of a good way to filter out the ones already matched, so I'm trying to do it by tagging them with a custom "merged" attribute that I already have an index on.)

500000 is a fair few nodes, and in your other question you suggested that 90% of them lack the relationship you want to create here, so it is going to take a bit of time. Without more knowledge of your system (spec, Neo4j setup, programming environment) and of when you are running this (on old data or on insert), this is just a best guess at a tidier solution:
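MATCH (j:Job)
WHERE NOT (j)-[:in]->() AND j.city IS NOT NULL
MERGE (c:City {name: j.city})
MERGE (j)-[:in]->(c)
RETURN count(j)
The relationship type is written :in, as in your query, because relationship types are case-sensitive. Obviously you can add your limits back as required.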
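To illustrate why filtering on the missing relationship makes the extra "merged" flag unnecessary, here is a small Python sketch of the batch loop on in-memory data. It is only a model of the idea, not Neo4j code; the function name link_jobs_to_cities and the data structures are hypothetical.

```python
def link_jobs_to_cities(jobs, cities, links, batch_size=10000):
    """Link one batch of not-yet-linked jobs to their city; return the batch size used."""
    linked_jobs = {job_id for job_id, _city in links}
    # Equivalent of: WHERE NOT (j)-[:in]->() AND j.city IS NOT NULL
    pending = [
        (job_id, city)
        for job_id, city in jobs.items()
        if city is not None and job_id not in linked_jobs
    ]
    batch = pending[:batch_size]
    for job_id, city in batch:
        cities.add(city)            # like MERGE on the city: no duplicate is created
        links.add((job_id, city))   # like MERGE on the relationship
    return len(batch)

# Hypothetical sample data: job id -> city name (None means no city property).
jobs = {1: "New York", 2: "Boston", 3: None, 4: "New York"}
cities, links = set(), set()

# Repeat small batches until nothing is left to link.
while link_jobs_to_cities(jobs, cities, links, batch_size=2):
    pass

print(sorted(cities))
print(sorted(links))
```

Each pass only sees jobs that still lack the link, so already processed jobs drop out of later batches without any marker property.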

How to "combine" two nodes and relationships in neo4j using Cypher

I'm modeling a "tag cloud" with the graph:
(t:Tag {name:'cypher'})-[:IN]->(g:TagGroup)<-[:TAGGED]-(x)
I.e., a named tag is part of a "TagGroup", to which zero or more nodes are "TAGGED". I chose this design because I want the ability to combine two or more named tags (e.g. "cypher" and "neo4j") so that both (Tag)s are [IN] the new (TagGroup) and the new (TagGroup) is the endpoint for the union of all nodes that were previously [TAGGED].
My only (not very pleasing) attempt is:
match (t:Tag {name:'cypher'})-[i:IN]->(g:TagGroup),
(t2:Tag {name:'neo4j'})-[:IN]->(g2:TagGroup)<-[y:TAGGED]-(x)
create (t2)-[:IN]->(g)
create unique (g)<-[:TAGGED]-(x)
with g2 as g2
match (g2)<-[r]->() delete g2,r
My main issue is that it only combines two nodes, and it doesn't feel very efficient (although I have no alternatives to compare it with). Ideally I'd be able to combine an arbitrary set of (Tag)s by name.
Any ideas if this can be done with Cypher, and if so, how?

You can use labels instead of creating separate tag groups.
E.g., if the tags neo4j and cypher belong to a tag group, say XYZ, then:
MERGE (a:Tag {name: "neo4j"})-[:TAGGED]->(x)
MERGE (b:Tag {name: "cypher"})-[:TAGGED]->(x)
set a :XYZ , b :XYZ
So the next time you want the tags of a particular group that are TAGGED to a particular post x:
MATCH (a:Tag:XYZ)-[:TAGGED]->(x) return a.name
