Get entire subgraph starting from one node in Neo4J?

I want to extract (retrieve) all the nodes and relationships in a graph starting from a specific node.
I have tried something like:
MATCH (n:Resource {resource_id: "R208997"})
MATCH p=(n)-[*]->(m)
RETURN p
This fetches all the paths from the node I have, but it is not really what I want.
What I want is to have a table showing the following:
From | Rel. | To
----------------
x | r | y
z | r2 | g
I am using Neo4J 3.4.12 Community Edition.

The best approach is to use APOC Procedures, which include path expander procedures that do this efficiently.
You can use apoc.path.subgraphAll() for this, YIELDing relationships, which you can then alias accordingly:
MATCH (n:Resource {resource_id: "R208997"})
CALL apoc.path.subgraphAll(n, {relationshipFilter:'>'}) YIELD relationships
UNWIND relationships as rel
RETURN startNode(rel) as from, type(rel) as rel, endNode(rel) as to
If you need to output only certain properties from the nodes rather than the node itself, then you can modify that in your RETURN accordingly.
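To make the shape of the result concrete, here is a minimal Python sketch of the same idea: follow outgoing edges from a start node and emit one (from, rel, to) row per relationship reached. It is only a model of the traversal, not APOC's implementation; the function name subgraph_rows and the sample edges are made up for illustration.

```python
from collections import deque

def subgraph_rows(edges, start):
    """Return (from, rel, to) rows for every edge reachable from `start`
    by following outgoing edges only."""
    # Build an adjacency map: node -> list of (rel_type, target).
    outgoing = {}
    for src, rel, dst in edges:
        outgoing.setdefault(src, []).append((rel, dst))

    seen = {start}
    queue = deque([start])
    rows = []
    while queue:
        node = queue.popleft()
        for rel, dst in outgoing.get(node, []):
            # Each node is expanded once, so each edge is emitted once.
            rows.append((node, rel, dst))
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return rows

# Hypothetical sample data, including a cycle and an unreachable node.
edges = [
    ("x", "r", "y"),
    ("y", "r2", "g"),
    ("g", "r3", "x"),  # cycle back to the start node
    ("q", "r4", "x"),  # q is not reachable from x along outgoing edges
]
for row in subgraph_rows(edges, "x"):
    print(" | ".join(row))
```

The cycle does not cause an infinite loop because visited nodes are tracked, which is also why a subgraph procedure is a better fit here than an unbounded [*] path match.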

Related

Complex neo4j cypher query to traverse a graph and extract nodes of a specific label and use them in optional match

I have a huge database, 260GB in size, which stores a ton of transaction information. It has Agent, Customer, Phone and ID_Card as the nodes. The relationships are as follows:
Agent_Send, Customer_Send, Customer_at_Agent, Customer_used_Phone, Customer_used_ID.
A single agent is connected to many customers, so hitting the agent node while querying a path is not feasible. Below is my query:
match p=((ph: Phone {Phone_ID : "3851308.0"})-[r:Customer_Send
| Customer_used_ID | Customer_used_Phone *1..5]-(n2))
with nodes(p) as ns
return extract (node in ns | Labels(node) ) as Labels
I am starting with a phone number and trying to extract a big "Customer" network. I am intentionally not touching the "Customer_at_Agent" relationship in the above networked query as it is not optimal as far as performance is concerned.
So, the idea is to extract all the "Customer" labeled nodes from the path and match it with [Customer_at_Agent] relationship.
For instance, something like:
match p=((ph: Phone {Phone_ID : "3851308.0"})-[r:Customer_Send
| Customer_used_ID | Customer_used_Phone *1..5]-(n2))
with nodes(p) as ns
return extract (node in ns | Labels(node) ) as Labels
of "type customer as c "
optional match (c)-[r1:Customer_at_Agent]-(n3)
return distinct p,r1
I am still new to neo4j and cypher and I am not able to figure out a hack to extract only "customer" nodes from the path and use that in the optional match.
Thanks in advance.
Use filter notation instead of extract and you can drop any nodes that aren't labelled right. Try out this query instead:
MATCH p = (ph:Phone {Phone_ID : "3851308.0"}) - [:Customer_Send|:Customer_used_ID|:Customer_used_Phone*1..5] - ()
WITH ph, [node IN NODES(p) WHERE node:Customer] AS customer_nodes
UNWIND customer_nodes AS c_node
OPTIONAL MATCH (c_node) - [r1:Customer_at_Agent] - ()
RETURN ph, COLLECT(DISTINCT r1)
So the second line takes the phone number and the path generated and gives you a list of nodes that have the Customer label as customer_nodes. You then unwind this list so you have individual nodes you can use in path matching. Line 4 performs your optional match and finds the r1 you're interested in, then line 5 will return the phone number node you started with and a collection of all of the r1 relationships that you found on customer nodes hooked up to that phone number.
UPDATE: I added some modifications to clean up your first query line as well. If you aren't going to use an alias (like r or n2 in the first line), then don't assign them in the first place; they can affect performance and cause confusion. Empty nodes and relationships are totally fine if you don't actually have any restrictions to place on them. You also don't need parentheses to mark off a path; they are used as a core part of Cypher's ASCII art to signify nodes, so I find they are more confusing than helpful.
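The filter-then-lookup logic can be sketched in plain Python, assuming paths are lists of node ids and labels are kept in a dictionary. All names and data here (agent_links_for_paths, ph1, c1, agent7) are hypothetical; this only mirrors the steps of the query and is not how Neo4j evaluates it.

```python
def agent_links_for_paths(paths, labels, agent_links):
    """Keep only Customer nodes from each path, then collect the distinct
    Customer_at_Agent links attached to those customers."""
    # Equivalent of [node IN NODES(p) WHERE node:Customer] followed by UNWIND.
    customers = []
    for path in paths:
        customers.extend(node for node in path if "Customer" in labels[node])

    # Equivalent of OPTIONAL MATCH ... COLLECT(DISTINCT r1):
    # customers without an agent link simply contribute nothing.
    found = set()
    for customer in customers:
        for link in agent_links.get(customer, []):
            found.add(link)
    return sorted(found)

# Hypothetical sample data.
labels = {
    "ph1": {"Phone"},
    "c1": {"Customer"},
    "id1": {"ID_Card"},
    "c2": {"Customer"},
}
paths = [["ph1", "c1"], ["ph1", "c1", "id1", "c2"]]
agent_links = {
    "c1": [("c1", "Customer_at_Agent", "agent7")],
    "c2": [("c2", "Customer_at_Agent", "agent7")],
}
print(agent_links_for_paths(paths, labels, agent_links))
```

Note that c1 appears on both paths but its agent link is reported once, which is what the DISTINCT in the query is for.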

Return Neo4J Combined Relationships When Searching Across Several Relationship Types

I would like to query for various things and return a combined set of relationships. In the example below, I want to return all people named Joe living on Main St. I want to return both the has_address and has_state relationships.
MATCH (p:Person),
(p)-[r:has_address]-(a:Address),
(a)-[r1:has_state]-(s:State)
WHERE p.name =~ ".*Joe.*" AND a.street = ".*Main St.*"
RETURN r, r1;
But when I run this query in the Neo4J browser and look under the "Text" view, it seems to put r and r1 as columns in a table (something like this):
│r  │r1 │
╞═══╪═══╡
│{} │{} │
rather than as desired with each relationship on a different row, like:
Joe Smith | has_address | 1 Main Street
1 Main Street | has_state | NY
Joe Richards | has_address | 22 Main Street
I want to download this as a CSV file for filtering elsewhere. How do I re-write the query in Neo4J to get the desired result?
You may want to look at the Cypher cheat sheet, specifically the Relationship Functions.
That said, you have variables on all the nodes you need. You can output all the data you need on each row.
MATCH (p:Person),
(p)-[r:has_address]-(a:Address),
(a)-[r1:has_state]-(s:State)
WHERE p.name =~ ".*Joe.*" AND a.street =~ ".*Main St.*"
RETURN p.name AS name, a.street AS address, s.name AS state
That should be enough.
What you seem to be asking for above is a way to union r and r1, but in such a way that they alternate in-order, one row being r and the next being its corresponding r1. This is a rather atypical kind of query, and as such there isn't a lot of support for easily making this kind of output.
If you don't mind rows being out of order, it's easy to do, but your start and end nodes for each relationship are no longer the same type of thing.
MATCH (p:Person),
(p)-[r:has_address]-(a:Address),
(a)-[r1:has_state]-(s:State)
WHERE p.name =~ ".*Joe.*" AND a.street =~ ".*Main St.*"
WITH COLLECT(r) + COLLECT(r1) as rels
UNWIND rels AS rel
RETURN startNode(rel) AS start, type(rel) AS type, endNode(rel) as end
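As a rough illustration of what the COLLECT-and-UNWIND approach produces, the Python sketch below flattens (person, address, state) matches into one row per relationship and writes them as CSV. The helper names and the sample rows are assumptions made for this example (the state for 22 Main Street is invented); it models the output shape only.

```python
import csv
import io

def relationship_rows(matches):
    """Flatten (person, address, state) matches into one row per relationship."""
    address_rows = []
    state_rows = []
    for person, address, state in matches:
        address_rows.append((person, "has_address", address))
        state_rows.append((address, "has_state", state))
    # Mirrors COLLECT(r) + COLLECT(r1): all has_address rows come first,
    # then all has_state rows, so rows are not interleaved.
    return address_rows + state_rows

def to_csv(rows):
    """Serialize rows to CSV text with a start/type/end header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start", "type", "end"])
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical sample data based on the names used in the question.
matches = [
    ("Joe Smith", "1 Main Street", "NY"),
    ("Joe Richards", "22 Main Street", "NY"),
]
print(to_csv(relationship_rows(matches)))
```

As the answer says, the start and end columns then hold different kinds of things (people in some rows, addresses in others), so any downstream filtering has to key on the type column.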

Cypher: Find relationships of nodes with connected parents

I'm hoping this diagram will be sufficient to explain what I'm after:
            true
  a--------------------b
  |                    |
parent               parent
  |                    |
 a_e------------------b_e
        experimental
Nodes a_e and b_e are experimental observations that each have only one parent, a and b, respectively. I know a true relationship exists between a and b, and I want to find cases where experimental relationships were observed between a_e and b_e. Among other things, I tried the following:
MATCH (n)-[:true]-(m)
WITH n,m
MATCH (n)-[:parent]-(i)
MATCH (m)-[:parent]-(j)
WITH i,j
OPTIONAL MATCH (i)-[r]-(j)
RETURN r
but this returns no rows. I'm thinking of this like a nested loop, matching all possible relationships between all i's and all j's. Is this type of query possible?
Something like
match (n)-[:true]-(m)
match (n)-[:parent]->(n_child)-[:experimental]-(m_child)<-[:parent]-(m)
return n_child,m_child
(not tested)
Assuming this is an example and you have labels etc. on your nodes.
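The nested-loop intuition from the question can be written out directly. The Python sketch below assumes the graph is given as three plain collections (true edges, a child-to-parent map, and experimental edges); the function name and the extra node a_e2 are hypothetical and only there to show a child with no observed relationship.

```python
def experimental_pairs(true_edges, parent_of, experimental_edges):
    """For each true edge (n, m), return child pairs (i, j) where i is a child
    of n, j is a child of m, and an experimental edge links i and j."""
    # Invert the child -> parent map into parent -> list of children.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    # Experimental edges are treated as undirected, like -[:experimental]-.
    observed = {frozenset(edge) for edge in experimental_edges}

    pairs = []
    for n, m in true_edges:
        for i in children.get(n, []):
            for j in children.get(m, []):
                if frozenset((i, j)) in observed:
                    pairs.append((i, j))
    return pairs

# Hypothetical sample data matching the diagram, plus one unlinked child.
true_edges = [("a", "b")]
parent_of = {"a_e": "a", "b_e": "b", "a_e2": "a"}
experimental_edges = [("b_e", "a_e")]
print(experimental_pairs(true_edges, parent_of, experimental_edges))
```

In Cypher the single connected pattern does the same job as the three loops, because every candidate pair of children is tested against the experimental relationship in one MATCH.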

Create relationships in Neo4j

I have a graph with about 800k nodes and I want to create random relationships among them, using Cypher.
Examples like the following didn't work because the cartesian product is too big:
match (u),(p)
with u,p
create (u)-[:LINKS]->(p);
For example I want 1 relationship for each node (800k), or 10 relationships for each node (8M).
In short, I need a Cypher query that UNIFORMLY creates relationships between nodes.
Does someone know the query to create relationships in this way?
So you want every node to have exactly x relationships? Try this in batches until no more relationships are updated:
MATCH (u),(p) WHERE size((u)-[:LINKS]->()) < {x}
WITH u,p LIMIT 10000 WHERE rand() < 0.2 // LIMIT to 10000 then sample
CREATE (u)-[:LINKS]->(p)
This should work (assuming your neo4j server has enough memory):
MATCH (n)
WITH COLLECT(n) AS ns, COUNT(n) AS len
FOREACH (i IN RANGE(1, {numLinks}) |
FOREACH (x IN ns |
FOREACH(y IN [ns[TOINT(RAND()*len)]] |
CREATE (x)-[:LINK]->(y) )));
This query collects all nodes, and uses nested loops to do the following {numLinks} times: create a LINK relationship between every node and a randomly chosen node.
The innermost FOREACH is used as a workaround for the current Cypher limitation that you cannot put an operation that returns a node inside a node pattern. To be specific, this is illegal: CREATE (x)-[:LINK]->(ns[TOINT(RAND()*len)]).
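For readers who want to see the sampling logic outside Cypher, here is a small Python sketch of the same nested loops: for each of numLinks rounds, every node gets one link to a uniformly chosen node. The function name random_links and the seed parameter are assumptions added for reproducibility; like the query, it allows self-links and duplicate links.

```python
import random

def random_links(nodes, num_links, seed=None):
    """Give every node `num_links` outgoing links to uniformly random targets."""
    rng = random.Random(seed)
    links = []
    for _ in range(num_links):           # outer FOREACH over RANGE(1, numLinks)
        for source in nodes:             # FOREACH (x IN ns ...)
            # Same idea as ns[TOINT(RAND()*len)]: pick a uniform random index.
            target = nodes[rng.randrange(len(nodes))]
            links.append((source, target))
    return links

# Hypothetical small example: 5 nodes, 2 links per node.
sample = random_links(["n1", "n2", "n3", "n4", "n5"], 2, seed=42)
print(len(sample))
```

Because the work is linear in nodes times numLinks, this avoids the cartesian product that made the original MATCH (u),(p) attempt impractical.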

neo4j collecting nodes and relations type b-->a<--c,a<--d

I am extending maxdemarzi's excellent graph visualisation example (http://maxdemarzi.com/2013/07/03/the-last-mile/) using VivaGraph backed by neo4j.
I want to display relationships of the type
a-->b<--c,b<--d
I tried the query
MATCH p = (a)--(b:X)--(c),(b:X)--(d)
RETURN EXTRACT(n in nodes(p) | {id:ID(n), name:COALESCE(n.name, n.title, ID(n)), type:LABELS(n)}) AS nodes,
EXTRACT(r in relationships(p)| {source:ID(startNode(r)) , target:ID(endNode(r))}) AS rels
It looks like the named query picks up only a-->b<--c pattern and omits the b<--d patterns.
Am I missing something... can I not add multiple patterns in a named query?
The most immediate problem is that the comma in the MATCH clause separates the first pattern from the second. The variable 'p' only stores the first pattern. This is why you aren't getting the results you desire. Independent of that, you are at risk of having a 'loose binding' by putting a label on both of your nodes named 'b' in the two patterns. The second 'b' node should not have a label.
So here is a version of your query that should work.
MATCH p1=(a)-->(b:X)<--(c), p2=(b)<--(d)
WITH nodes(p1) + d AS ns, relationships(p1) + relationships(p2) AS rs
RETURN EXTRACT(n IN ns | {id:ID(n), name:COALESCE(n.name, n.title, ID(n)), type:LABELS(n)}) AS nodes,
EXTRACT(r in rs| {source:ID(startNode(r)) , target:ID(endNode(r))}) AS rels
Capture both paths, then build collections from the nodes and relationships of both paths. The collection of nodes actually only extracts the nodes from p1 and adds the 'd' node. You could write that part as
nodes(p1) + nodes(p2) as ns
but then the 'b' node will appear in the list twice.
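If concatenating both node lists is more convenient, the duplicate can be removed afterwards. The Python sketch below shows an order-preserving de-duplication by id, assuming nodes are represented as dictionaries with an id key; the helper name unique_by_id and the sample nodes are made up for illustration.

```python
def unique_by_id(nodes):
    """Drop repeated nodes while keeping the first occurrence and the order."""
    seen = set()
    result = []
    for node in nodes:
        if node["id"] not in seen:
            seen.add(node["id"])
            result.append(node)
    return result

# Hypothetical nodes for the pattern a-->b<--c, b<--d.
a = {"id": 1, "name": "a"}
b = {"id": 2, "name": "b"}
c = {"id": 3, "name": "c"}
d = {"id": 4, "name": "d"}

nodes_p1 = [a, b, c]
nodes_p2 = [b, d]  # b is shared between the two paths
combined = unique_by_id(nodes_p1 + nodes_p2)
print([node["name"] for node in combined])
```

A visualisation library that keys nodes by id would otherwise receive b twice, which is the problem the answer sidesteps by adding only d.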
