How to return a tree branch from large tree graph using cypher - neo4j

I have a large scale graph database with hundreds of nodes and relationships. its look like a tree.
i want to write a query to get returned just only a branch.
i have attached a simple graphical representation of what i needed for more information....
here i want to start traverse from node A then A->B->C that is a one branch, then again start from A->B-->D that is another branch. finally i want to merge those two branch and get a output as shown in the right side.
there can be more than two output to merge it depends on my original graph. this is a example.
different color arrows shows different types of relationships.
patterns that i needed to check are:
(A)<-[:ORANGE]-p->[:RED]-q ;where p & p can be any nodes node A is known
(A)<-[:ORANGE]-r->[:GREEN]-s ;where r & s can be any nodes node A is known
![figure][1]
https://lh5.googleusercontent.com/-1a41h63adqs/UaQ7B1qdAxI/AAAAAAAAAI4/2QjGS5pa1Zc/s1600/Presentation1.png

Try MATCH (n)-[r]->(m) WHERE n.property = "B" RETURN n,r,m.
This will return the paths needed to make your graph.

Related

neo4j query to exclude nodes related to nodes with certain properties

I am trying to write a neo4j query where I only want to present nodes that are have no relation to nodes with a specific property. One way to think of it is where two separate graphs exist where one node has the property I want to exclude. I should get a result that only contains the graph of the set of nodes not connected to the node that has the property I want to exclude. This is what the graph looks like before my query
match (n) where not (n{property:'valueIWishToExclude'})--() return n
This is what the result of the query looks like
I only want to have the four connected nodes in my results. How can I set up a query that excludes the nodes that are not connected to the node with the property I wish to exclude?
In fact you need those nodes from which there is no path to the node that should be excluded. You can use the shortestPath function and ALL predicate:
match (ex) where n.property = 'valueIWishToExclude'
with collect(ex) as exn
match (n) where (not n.property = 'valueIWishToExclude') and
ALL(e in exn where not shortestPath( (n)-[*]-(e) ) is null)
return n
You are almost there, just add in the relationship in your query to only get the nodes that are related to each other
MATCH (n:label) -[:RELATED]->() where n.property<>'exclude'
RETURN n
That should return only the nodes connected to each other, as the other nodes do not have that relationship.
Let me know if that worked for you.
You may want to alter your wording a bit, what you're asking for in this question, and what you really want, are not the same thing.
In Neo4j (and most graph databases), the phrase "nodes that have no relation to..." means nodes that are not connected by a relationship to the node in question.
In that context, in your right graph (assuming the one node selected is the node marked as excluded), one node would fit the criteria and be returned as a possible result, the topmost node, since it doesn't have a relationship to the node you want to exclude; It is however two relationships removed from the excluded node.
You seem to be asking for something else, though. You seem to want nodes that are not in the same subgraph as the node to exclude. Or, alternately, nodes that have no path to the excluded node.
Make sure on future queries you're clear about what you're asking, or you'll get answers that have no relevance to what you really want.
One approach that will work is to first find all nodes within the subgraph of the excluded node, and then return all nodes that are not in those subgraph nodes.
You'll want to install APOC Procedures so you can make use of a fast means of obtaining nodes within the subgraph.
You'll also want to use labels in your graph, and maybe put an index on the property you're searching for as this will make your search fast. As it is now, your query must examine every node in your entire database to find nodes with the property in question, and that will become slower and slower as your graph grows.
Your query might look like this (using 'Label' as a stand-in for the node label):
MATCH (n:Label{propertyToExclude:'valueToExclude'})
CALL apoc.path.expandConfig(n, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
WITH COLLECT(DISTINCT LAST(NODES(path))) as subgraph
MATCH (n)
WHERE NOT n in subgraph
RETURN n

Using cypher or traversal api to match only a single node on extreme sides of a path

Say I have following path in the graph:
(:Type1)<-[:RelType1]-(:Type2)<-[:RelType2]-()<-[*]-(centernode)-[*]->()-[:RelType2]->(:Type2)-[:RelType1]->(:Type1)
Given <id> of (:Type1) node on left side, I am able to MATCH above path and get corresponding (:Type1) node on right side (notice that the path is symmetric and its center is node (centernode)). In my usecase we get <id>s of (:Type1) node, get the corresponding (:Type1) node on the other side and then process further.
However it may happen that I get <id>s of both nodes of (:Type1). In that case separate queries will be fired starting at corresponding node and will evaluate to the (:Type1) node on the other side, thus further execution will continue on both the nodes.
Q1. How can I avoid processing both nodes. That is, if given two <id>s of (:Type1) nodes which reside on extreme sides of same path, how can I ensure only one of the queries starting at one of these nodes matching node on the other side is executed so that only one of those nodes are processed further and other node is say held in temporary buffer to process afterwards (if processing of first node fails).
Added fact: Above I have a single path with two (:Type1) nodes at its extreme sides. I may have three or more paths emanating from (centernode) and ending in (:Type1) node. So I want only one of those (:Type1) nodes to get processed first, and next (:Type1) node will processed only if earlier processing fails.
Q2. Is this scenario even possible with pure cypher? Or I have to end up using Neo4J Traversal API? If yes how this can be done, as I have to ensure uniqueness of nodes/relationships visited across two different traveresals.
Q3. How can I add path expander in Traversal API to match path of type (:Type1)<-[:RelType1]-(:Type2)<-[:RelType2]-(). Should I be doing something like this:
at each traversal `next()`
if (node is of Type1)
follow <-[:RelType1]-
if (node is of Type2)
follow <-[:RelType2]-
(Above is pseudocode. I am new to Traversal API. I have went through all docs and examples. So I am guessing inside expander I have to put if() filters to check current nodes type and decide which relation type and its direction to expand next. Above pseudocode is meant to indicate that.)
Is this how such cypher can be writting in Traversal API? Or is there any better way?
An old trick is to use node ids to order pairs (ID(a) < ID(b)), which filters out "duplicate" results. So if you feed all your source IDs into a single query, you can make use this trick to filter out duplicates:
WITH [1, 2, 3, 4] AS sourceIds
UNWIND sourceIds AS sourceId
MATCH (source:Type1)
WHERE ID(source) = sourceId
MATCH
(source)<-[:RelType1]-(:Type2)<-[:RelType2]-
()<-[*]-(centernode)-[*]->()
-[:RelType2]->(:Type2)-[:RelType1]->(target:Type1)
WHERE ID(source) < ID(target)
RETURN source, target
Could this work for your use case?

Remove redundant two way relationships in a Neo4j graph

I have a simple model of a chess tournament. It has 5 players playing each other. The graph looks like this:
The graph is generally fine, but upon further inspection, you can see that both sets
Guy1 vs Guy2,
and
Guy4 vs Guy5
have a redundant relationship each.
The problem is obviously in the data, where there is a extraneous complementary row for each of these matches (so in a sense this is a data quality issue in the underlying csv):
I could clean these rows by hand, but the real dataset has millions of rows. So I'm wondering how I could remove these relationships in either of 2 ways, using CQL:
1) Don't read in the extra relationship in the first place
2) Go ahead and create the extra relationship, but then remove it later.
Thanks in advance for any advice on this.
The code I'm using is this:
/ Here, we load and create nodes
LOAD CSV WITH HEADERS FROM
'file:///.../chess_nodes.csv' AS line
WITH line
MERGE (p:Player {
player_id: line.player_id
})
ON CREATE SET p.name = line.name
ON MATCH SET p.name = line.name
ON CREATE SET p.residence = line.residence
ON MATCH SET p.residence = line.residence
// Here create the edges
LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
WITH line
MATCH (p1:Player {player_id: line.player1_id})
WITH p1, line
OPTIONAL MATCH (p2:Player {player_id: line.player2_id})
WITH p1, p2, line
MERGE (p1)-[:VERSUS]->(p2)
It is obvious that you don't need this extra relationship as it doesn't add any value nor weight to the graph.
There is something that few people are aware of, despite being in the documentation.
MERGE can be used on undirected relationships, neo4j will pick one direction for you (as realtionships MUST be directed in the graph).
Documentation reference : http://neo4j.com/docs/stable/query-merge.html#merge-merge-on-an-undirected-relationship
An example with the following statement, if you run it for the first time :
MATCH (a:User {name:'A'}), (b:User {name:'B'})
MERGE (a)-[:VERSUS]-(b)
It will create the relationship as it doesn't exist. However if you run it a second time, nothing will be changed nor created.
I guess it would solve your problem as you will not have to worry about cleaning the data in upfront nor run scripts afterwards for cleaning your graph.
I'd suggest creating a "match" node like so
(x:Player)-[:MATCH]->(m:Match)<-[:MATCH]-(y:Player)
to enable tracking details about the match separate from the players.
If you need to track player matchups distinct from the matches themselves, then
(x:Player)-[:HAS_PLAYED]->(pair:HasPlayed)<-[:HAS_PLAYED]-(y:Player)
would do the trick.
If the schema has to stay as-is and the only requirement is to remove redundant relationships, then
MATCH (p1:Player)-[r1:VERSUS]->(p2:Player)-[r2:VERSUS]->(p1)
DELETE r2
should do the trick. This finds all p1, p2 nodes with bi-directional VERSUS relationships and removes one of them.
You need to use UNWIND to do the trick.
MATCH (p1:Player)-[r:VERSUS]-(p2:Player)
WITH p1,p2,collect(r) AS rels
UNWIND tail(rels) as rel
DELETE rel;
THe previous code will find the direct connections of type VERSUS between p1 and p2 using match (note that this is not directed). Then will get the collection of relationships and finally the last of those relations, which is deleted.
Of course you can add a check to see whether the length of the collection is 2.

Get full graph that node N is a part of in neo4j

I'm trying use Cypher to get the entire graph that exists if I start at a given node in neo4j. When I say entire graph I mean all nodes and relationships that are connected to at least one other node in the graph.
I've seen examples where people can get all nodes that might be connected to a given start node with a known relationship. Examples of this include this and this, but how could I do this if I do not know the relationships?
Ultimately I'd like every node and relationship where I start at one given node and sprawl out, listing the nodes that are linked by every relationship.
I've tried this:
START n=node(441007) MATHC (n)-[:*]->(d) RETURN d
but the syntax is incorrect. I'm unsure if you can submit a wildcard relationship. Additionaly I do not think this will give me what I am looking for.
Try this:
MATCH (n)-[r*]->(d)
WHERE ID(n) = 441007
RETURN r, d
This will fan out from n (if using an older version of Neo, you should revert to your START syntax) and return you the paths to each d Node that can be reached. It is relationship type agnostic through not defining the relationship label. If you didn't care about the path you could omit itwith:
MATCH (n)-[*]->(d)
WHERE ID(n) = 441007
RETURN d
Obviously on a large graph this will get expensive!
Edit
Meant to add the link to the cheat sheet, check out the section called Patterns.
Hej WildBill,
i have created a Company Graph for learning Neo4J, so i send the following Pattern against the Graph a got this result:
START a=node(9)
MATCH (a)<-[rel]-(d)
MATCH (d)-[sk]->(skill)
RETURN a, d, skill
Node 9 is my Company, which is part of the Graph.

Extract subgraph in neo4j

I have a large network stored in Neo4j. Based on a particular root node, I want to extract a subgraph around that node and store it somewhere else. So, what I need is the set of nodes and edges that match my filter criteria.
Afaik there is no out-of-the-box solution available. There is a graph matching component available, but it works only for perfect matches. The Neo4j API itself defines only graph traversal which I can use to define which nodes/edges should be visited:
Traverser exp = Traversal
.description()
.breadthFirst()
.evaluator(Evaluators.toDepth(2))
.traverse(root);
Now, I can add all nodes/edges to sets for all paths, but this is very inefficient. How would you do it? Thanks!
EDIT Would it make sense to add the last node and the last relationship of each traversal to the subgraph?
As for graph matching, that has been superseded by http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html which would fit nicely, and supports fuzzy matchin with optional relationships.
For subgraph representation, I would use the Cypher output to maybe construct new Cypher statements for recreating the graph, much like a SQL export, something like
start n=node:node_auto_index(name='Neo')
match n-[r:KNOWS*]-m
return "create ({name:'"+m.name+"'});"
http://console.neo4j.org/r/pqf1rp for an example
I solved it by constructing the induced subgraph based on all traversal endpoints.
Building the subgraph from the set of last nodes and edges of every traversal does not work, because edges that are not part of any shortest paths would not be included.
The code snippet looks like this:
Set<Node> nodes = new HashSet<Node>();
Set<Relationship> edges = new HashSet<Relationship>();
for (Node n : traverser.nodes())
{
nodes.add(n);
}
for (Node node : nodes)
{
for (Relationship rel : node.getRelationships())
{
if (nodes.contains(rel.getOtherNode(node)))
edges.add(rel);
}
}
Every edge is added twice. One time for the outgoing node and one time for the incoming node. Using a Set, I can ensure that it's in the collection only once.
It is possible to iterate over incoming/outgoing edges only, but it is unclear how loops (edge from a node to itself) are handled. To which category do they belong to? This snippet does not have this issue.
See dumping the database to cypher statements
dump START n=node({self}) MATCH p=(n)-[r:KNOWS*]->(m) RETURN n,r,m;
There's also an example for importing the subgraph of first database (db1) into a second (db2).

Resources