Extract subgraph in Neo4j

I have a large network stored in Neo4j. Based on a particular root node, I want to extract a subgraph around that node and store it somewhere else. So, what I need is the set of nodes and edges that match my filter criteria.
AFAIK there is no out-of-the-box solution available. There is a graph matching component, but it only works for exact matches. The Neo4j API itself only defines graph traversal, which I can use to specify which nodes/edges should be visited:
Traverser exp = Traversal
        .description()
        .breadthFirst()
        .evaluator(Evaluators.toDepth(2))
        .traverse(root);
Now, I can add all nodes/edges to sets for all paths, but this is very inefficient. How would you do it? Thanks!
EDIT: Would it make sense to add only the last node and the last relationship of each traversal path to the subgraph?

Graph matching has been superseded by Cypher (http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html), which would fit nicely and supports fuzzy matching with optional relationships.
For the subgraph representation, I would use the Cypher output to construct new Cypher statements that recreate the graph, much like an SQL export. Something like:
start n=node:node_auto_index(name='Neo')
match n-[r:KNOWS*]-m
return "create ({name:'"+m.name+"'});"
See http://console.neo4j.org/r/pqf1rp for an example.
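The same "export as Cypher" idea can be sketched outside the database. Below is a minimal Java sketch, assuming the subgraph has already been pulled into memory as a list of node names plus KNOWS edges given as index pairs (this representation and the class name are hypothetical, not a Neo4j API). Unlike the string concatenation in the query above, it escapes quotes in names and also emits the relationships, all in one CREATE statement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: node i has name names.get(i); each edge is {fromIndex, toIndex}.
public class CypherExport {

    // Escape backslashes and single quotes so a name is safe inside a '...' literal.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Builds one CREATE statement; node identifiers n0, n1, ... let relationships refer to nodes.
    static String toCreate(List<String> names, List<int[]> edges) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            parts.add("(n" + i + " {name:" + quote(names.get(i)) + "})");
        }
        for (int[] e : edges) {
            parts.add("(n" + e[0] + ")-[:KNOWS]->(n" + e[1] + ")");
        }
        return "CREATE " + String.join(", ", parts) + ";";
    }

    public static void main(String[] args) {
        List<int[]> edges = new ArrayList<>();
        edges.add(new int[] {0, 1});
        // prints: CREATE (n0 {name:'Neo'}), (n1 {name:'Trinity'}), (n0)-[:KNOWS]->(n1);
        System.out.println(toCreate(List.of("Neo", "Trinity"), edges));
    }
}
```

Emitting a single statement keeps the node identifiers in scope for the relationship patterns; for very large subgraphs you would probably want to batch instead.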

I solved it by constructing the induced subgraph based on all traversal endpoints.
Building the subgraph from the set of last nodes and last edges of every traversal path does not work, because edges that are not part of any shortest path (for example, an edge between two nodes at the same depth) would not be included.
The code snippet looks like this:
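import java.util.HashSet;
import java.util.Set;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;

// Inside a method, given the Traverser "traverser" from above:
// collect every node reached by the traversal.
Set<Node> nodes = new HashSet<Node>();
Set<Relationship> edges = new HashSet<Relationship>();
for (Node n : traverser.nodes())
{
    nodes.add(n);
}
// Induced subgraph: keep every relationship whose other end is also in the node set.
for (Node node : nodes)
{
    for (Relationship rel : node.getRelationships())
    {
        if (nodes.contains(rel.getOtherNode(node)))
            edges.add(rel);
    }
}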
Every edge is visited twice: once from its start node and once from its end node. Using a Set ensures that it appears in the collection only once.
It is possible to iterate over only incoming or only outgoing edges, but it is unclear how loops (an edge from a node to itself) are handled. Which category do they belong to? This snippet does not have that issue.
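To check the induced-subgraph logic without a database, here is a minimal in-memory Java sketch of the same loop. The Rel class stands in for Relationship and is hypothetical; otherNode mirrors getOtherNode, and a self-loop is assumed to be listed once for its node. It shows that an edge no traversal path uses (11) is kept, an edge leaving the node set (13) is dropped, and a self-loop (14) is handled by the same check.

```java
import java.util.*;

public class InducedSubgraph {

    // Hypothetical stand-in for a relationship: an id plus start and end node ids.
    static final class Rel {
        final long id, start, end;
        Rel(long id, long start, long end) { this.id = id; this.start = start; this.end = end; }
        // Like getOtherNode: for a self-loop this returns the node itself.
        long otherNode(long node) { return node == start ? end : start; }
    }

    // Returns the ids of all relationships whose both endpoints lie in 'nodes'.
    static Set<Long> induce(Set<Long> nodes, Map<Long, List<Rel>> adjacency) {
        Set<Long> edges = new HashSet<>();
        for (long node : nodes) {
            for (Rel rel : adjacency.getOrDefault(node, List.of())) {
                // Each edge is seen from both endpoints; the Set keeps it once.
                if (nodes.contains(rel.otherNode(node))) {
                    edges.add(rel.id);
                }
            }
        }
        return edges;
    }

    // Builds an undirected adjacency list; a self-loop is recorded once for its node.
    static Map<Long, List<Rel>> adjacency(List<Rel> rels) {
        Map<Long, List<Rel>> adj = new HashMap<>();
        for (Rel r : rels) {
            adj.computeIfAbsent(r.start, k -> new ArrayList<>()).add(r);
            if (r.end != r.start) {
                adj.computeIfAbsent(r.end, k -> new ArrayList<>()).add(r);
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(
            new Rel(10, 1, 2),
            new Rel(11, 2, 3),   // not needed by any traversal path, still kept
            new Rel(12, 1, 3),
            new Rel(13, 3, 4),   // node 4 was not visited: dropped
            new Rel(14, 2, 2));  // self-loop on a visited node: kept
        Set<Long> visited = Set.of(1L, 2L, 3L);
        // prints: [10, 11, 12, 14]
        System.out.println(new TreeSet<>(induce(visited, adjacency(rels))));
    }
}
```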

See "dumping the database to Cypher statements", for example:
dump START n=node({self}) MATCH p=(n)-[r:KNOWS*]->(m) RETURN n,r,m;
There's also an example of importing the subgraph of the first database (db1) into a second one (db2).

Related

neo4j query to exclude nodes related to nodes with certain properties

I am trying to write a Neo4j query that only returns nodes that have no relation to nodes with a specific property. One way to think of it: two separate graphs exist, and one node in one of them has the property I want to exclude. I should get a result that only contains the graph of nodes that are not connected to the node with that property. This is what the graph looks like before my query:
match (n) where not (n{property:'valueIWishToExclude'})--() return n
This is what the result of the query looks like:
I only want the four connected nodes in my results. How can I set up a query that excludes the nodes that are connected (directly or indirectly) to the node with the property I wish to exclude?
In fact you need those nodes from which there is no path to the node that should be excluded. You can use the shortestPath function and ALL predicate:
match (ex) where ex.property = 'valueIWishToExclude'
with collect(ex) as exn
match (n) where (not n.property = 'valueIWishToExclude') and
ALL(e in exn where shortestPath( (n)-[*]-(e) ) is null)
return n
You are almost there; just add the relationship to your query so that you only get the nodes that are related to each other:
MATCH (n:label) -[:RELATED]->() where n.property<>'exclude'
RETURN n
That should return only the nodes connected to each other, as the other nodes do not have that relationship.
Let me know if that worked for you.
You may want to alter your wording a bit; what you're asking for in this question and what you really want are not the same thing.
In Neo4j (and most graph databases), the phrase "nodes that have no relation to..." means nodes that are not connected by a relationship to the node in question.
In that context, in your right-hand graph (assuming the one selected node is the node marked as excluded), one node would fit the criteria and be returned as a possible result: the topmost node, since it doesn't have a relationship to the node you want to exclude. It is, however, two relationships removed from the excluded node.
You seem to be asking for something else, though. You seem to want nodes that are not in the same subgraph as the node to exclude. Or, alternately, nodes that have no path to the excluded node.
Make sure on future queries you're clear about what you're asking, or you'll get answers that have no relevance to what you really want.
One approach that will work is to first find all nodes within the subgraph of the excluded node, and then return all nodes that are not among those subgraph nodes.
You'll want to install APOC Procedures so you can make use of a fast means of obtaining nodes within the subgraph.
You'll also want to use labels in your graph, and maybe put an index on the property you're searching for as this will make your search fast. As it is now, your query must examine every node in your entire database to find nodes with the property in question, and that will become slower and slower as your graph grows.
Your query might look like this (using 'Label' as a stand-in for the node label):
MATCH (n:Label{propertyToExclude:'valueToExclude'})
CALL apoc.path.expandConfig(n, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
WITH COLLECT(DISTINCT LAST(NODES(path))) as subgraph
MATCH (n)
WHERE NOT n in subgraph
RETURN n
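The two steps of this query (expand the excluded node's subgraph breadth-first with each node visited once, then keep everything else) can be sketched in plain Java on an in-memory, undirected adjacency map. This is only an illustration of the idea, not APOC's implementation; the class and method names are hypothetical, and nodes without any relationships would need to be added as map keys to be returned.

```java
import java.util.*;

public class ExcludeComponent {

    // Returns every node NOT reachable (ignoring direction) from any node in 'excluded'.
    static Set<Integer> notConnected(Map<Integer, List<Integer>> graph, Set<Integer> excluded) {
        // Breadth-first search seeded with all excluded nodes.
        Set<Integer> seen = new HashSet<>(excluded);
        Deque<Integer> queue = new ArrayDeque<>(excluded);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int next : graph.getOrDefault(node, List.of())) {
                if (seen.add(next)) {   // each node is visited at most once (like NODE_GLOBAL)
                    queue.add(next);
                }
            }
        }
        // Complement: all known nodes minus the excluded node's subgraph.
        Set<Integer> result = new TreeSet<>(graph.keySet());
        result.removeAll(seen);
        return result;
    }

    // Adds an undirected edge a-b to the adjacency map.
    static void link(Map<Integer, List<Integer>> g, int a, int b) {
        g.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        g.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = new HashMap<>();
        link(g, 1, 2); link(g, 2, 3);                // subgraph of excluded node 1
        link(g, 4, 5); link(g, 5, 6); link(g, 6, 7); // separate four-node graph
        // prints: [4, 5, 6, 7]
        System.out.println(notConnected(g, Set.of(1)));
    }
}
```

Node 3 is dropped even though it has no direct relationship to node 1, which is exactly the "no path" reading discussed above rather than the "no relationship" one.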

Using cypher or traversal api to match only a single node on extreme sides of a path

Say I have the following path in the graph:
(:Type1)<-[:RelType1]-(:Type2)<-[:RelType2]-()<-[*]-(centernode)-[*]->()-[:RelType2]->(:Type2)-[:RelType1]->(:Type1)
Given the <id> of the (:Type1) node on the left side, I can MATCH the above path and get the corresponding (:Type1) node on the right side (note that the path is symmetric and its center is the node (centernode)). In my use case we get the <id>s of (:Type1) nodes, find the corresponding (:Type1) node on the other side, and then process it further.
However, it may happen that I get the <id>s of both (:Type1) nodes. In that case, separate queries will be fired, each starting at one of the nodes and evaluating to the (:Type1) node on the other side, so further execution will continue on both nodes.
Q1. How can I avoid processing both nodes? That is, given two <id>s of (:Type1) nodes that reside on the extreme sides of the same path, how can I ensure that only one of the queries (each starting at one of these nodes and matching the node on the other side) is executed, so that only one of those nodes is processed further and the other is, say, held in a temporary buffer to be processed afterwards (if processing of the first node fails)?
Added fact: above I have a single path with two (:Type1) nodes at its extreme sides. I may have three or more paths emanating from (centernode) and ending in a (:Type1) node. So I want only one of those (:Type1) nodes to be processed first, and the next (:Type1) node to be processed only if the earlier processing fails.
Q2. Is this scenario even possible with pure Cypher? Or do I have to end up using the Neo4j Traversal API? If so, how can this be done, given that I have to ensure uniqueness of visited nodes/relationships across two different traversals?
Q3. How can I add a path expander in the Traversal API to match a path of the form (:Type1)<-[:RelType1]-(:Type2)<-[:RelType2]-()? Should I be doing something like this:
at each traversal `next()`
if (node is of Type1)
follow <-[:RelType1]-
if (node is of Type2)
follow <-[:RelType2]-
(The above is pseudocode. I am new to the Traversal API. I have gone through all the docs and examples, so I am guessing that inside the expander I have to put if() filters that check the current node's type and decide which relationship type and direction to expand next. The pseudocode above is meant to indicate that.)
Is this how such a Cypher pattern can be written with the Traversal API? Or is there a better way?
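The if() dispatch in the pseudocode can be made table-driven: for each node label, one rule says which relationship type and direction to follow next, and expansion stops at a node whose label has no rule. The sketch below models that on an in-memory graph rather than the Neo4j Traversal API (in Neo4j this logic would sit inside a custom PathExpander, whose exact signature varies by version). The Rel/Step classes and the rule table are hypothetical.

```java
import java.util.*;

public class TypeDrivenExpander {

    enum Dir { INCOMING, OUTGOING }

    // Hypothetical typed relationship between two node ids.
    static final class Rel {
        final String type; final int start, end;
        Rel(String type, int start, int end) { this.type = type; this.start = start; this.end = end; }
    }

    // Rule: which relationship type and direction to follow from a node with a given label.
    static final class Step {
        final String relType; final Dir dir;
        Step(String relType, Dir dir) { this.relType = relType; this.dir = dir; }
    }

    // Depth-first expansion that follows only the relationship allowed by the current node's label.
    static Set<Integer> expand(int start, Map<Integer, String> labels, List<Rel> rels, Map<String, Step> rules) {
        Set<Integer> reached = new LinkedHashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            if (!reached.add(node)) continue;                      // visit each node once
            Step step = rules.get(labels.getOrDefault(node, ""));  // null: no rule for this label
            if (step == null) continue;                            // stop expanding here
            for (Rel r : rels) {
                if (!r.type.equals(step.relType)) continue;
                // INCOMING: current node is the end; OUTGOING: current node is the start.
                if (step.dir == Dir.INCOMING && r.end == node) stack.push(r.start);
                if (step.dir == Dir.OUTGOING && r.start == node) stack.push(r.end);
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        Map<Integer, String> labels = Map.of(1, "Type1", 2, "Type2", 4, "Type2");
        List<Rel> rels = List.of(
            new Rel("RelType1", 2, 1),   // (1:Type1)<-[:RelType1]-(2:Type2)
            new Rel("RelType2", 3, 2),   // (2:Type2)<-[:RelType2]-(3)
            new Rel("RelType2", 4, 1));  // wrong type for a Type1 node: not followed
        Map<String, Step> rules = Map.of(
            "Type1", new Step("RelType1", Dir.INCOMING),
            "Type2", new Step("RelType2", Dir.INCOMING));
        // prints: [1, 2, 3]
        System.out.println(expand(1, labels, rels, rules));
    }
}
```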
An old trick is to use node IDs to order pairs (ID(a) < ID(b)), which filters out "duplicate" results. So if you feed all your source IDs into a single query, you can make use of this trick to filter out duplicates:
WITH [1, 2, 3, 4] AS sourceIds
UNWIND sourceIds AS sourceId
MATCH (source:Type1)
WHERE ID(source) = sourceId
MATCH
(source)<-[:RelType1]-(:Type2)<-[:RelType2]-
()<-[*]-(centernode)-[*]->()
-[:RelType2]->(:Type2)-[:RelType1]->(target:Type1)
WHERE ID(source) < ID(target)
RETURN source, target
Could this work for your use case?
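The effect of the ID-ordering filter can be sketched in Java, assuming each requested (:Type1) node id has already been mapped to its counterpart on the other side of the path (the map and class name are hypothetical). One caveat worth checking: the plain ID(source) < ID(target) condition also drops a pair when only its larger ID was supplied. The sketch therefore keeps a pair when its other end was not requested; in Cypher the analogous condition would be something like WHERE ID(source) < ID(target) OR NOT ID(target) IN sourceIds.

```java
import java.util.*;

public class SymmetricPairs {

    // Keeps one orientation of each symmetric (source, target) pair.
    static List<List<Long>> dedupe(List<Long> sourceIds, Map<Long, Long> counterpart) {
        Set<Long> requested = new HashSet<>(sourceIds);
        List<List<Long>> pairs = new ArrayList<>();
        for (long source : sourceIds) {
            Long target = counterpart.get(source);
            if (target == null) continue;  // no matching path from this node
            // Same idea as WHERE ID(source) < ID(target), but a pair is also kept
            // when its other end was not requested, so it is never lost entirely.
            if (source < target || !requested.contains(target)) {
                pairs.add(List.of(Long.valueOf(source), target));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<Long, Long> counterpart = Map.of(1L, 2L, 2L, 1L, 7L, 5L, 5L, 7L);
        // 1 and 2 are both requested: only (1, 2) survives.
        // Only 7 is requested: (7, 5) is kept although 7 > 5.
        // prints: [[1, 2], [7, 5]]
        System.out.println(dedupe(List.of(1L, 2L, 7L), counterpart));
    }
}
```

The surviving source is the node to process first; its partner can then be held back and processed only if that first attempt fails.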

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched by this query are all of the same kind; they are different positions in a game. Each relationship has a property "GameID", which identifies the right relationship when you want to traverse the graph along a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
Some nodes have hundreds of incoming and outgoing relationships; others have only a few.
The problem is that I don't know how to tell Cypher to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think Cypher looks at every outgoing relationship at every path depth from the start node, but as I already explained, there is only one right path. So it should do some kind of DFS instead of a BFS. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about, and getting that initial collection of relationships would be fast because of the index.
As an aside: since Neo4j reuses its internally generated IDs (of deleted nodes), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate your own unique IDs, store them in your nodes, and use them as your GameIDs. If you do this, you should also create a uniqueness constraint for your own IDs; as a nice side effect, this automatically creates an index for them.
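Why filtering on GameID during expansion matters can be seen in a small in-memory Java sketch: if only the outgoing move with the right GameID is followed at each step, the work grows with path length times node degree instead of with every branch at every depth. This is a hypothetical model (Move class, adjacency map), not how Cypher plans the query, and it relies on the question's statement that at most one outgoing move per node carries a given GameID.

```java
import java.util.*;

public class GameIdWalk {

    // Hypothetical stand-in for a :move relationship carrying a GameID property.
    static final class Move {
        final int from, to; final String gameId;
        Move(int from, int to, String gameId) { this.from = from; this.to = to; this.gameId = gameId; }
    }

    // From 'start', repeatedly follow the single outgoing move with 'gameId' until none is left.
    static List<Integer> walk(int start, String gameId, Map<Integer, List<Move>> outgoing) {
        List<Integer> path = new ArrayList<>(List.of(start));
        Set<Integer> seen = new HashSet<>(path);  // guard against accidental cycles
        int current = start;
        while (true) {
            Move next = null;
            for (Move m : outgoing.getOrDefault(current, List.of())) {
                if (m.gameId.equals(gameId)) { next = m; break; }  // filter before expanding
            }
            if (next == null || !seen.add(next.to)) break;
            current = next.to;
            path.add(current);
        }
        return path;
    }

    static void add(Map<Integer, List<Move>> g, int from, int to, String gameId) {
        g.computeIfAbsent(from, k -> new ArrayList<>()).add(new Move(from, to, gameId));
    }

    public static void main(String[] args) {
        Map<Integer, List<Move>> g = new HashMap<>();
        add(g, 3, 4, "3"); add(g, 3, 9, "8");  // node 3 has moves from two games
        add(g, 4, 5, "3"); add(g, 4, 6, "8");
        add(g, 5, 7, "3");
        // prints: [3, 4, 5, 7]
        System.out.println(walk(3, "3", g));
    }
}
```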

How to return a tree branch from a large tree graph using Cypher

I have a large-scale graph database with hundreds of nodes and relationships. It looks like a tree.
I want to write a query that returns only a single branch.
I have attached a simple graphical representation of what I need.
Here I want to start traversing from node A: A->B->C is one branch, then starting again from A, A->B->D is another branch. Finally, I want to merge those two branches and get an output like the one shown on the right side.
Depending on my original graph, there can be more than two outputs to merge; this is just an example.
Arrows of different colors represent different relationship types.
The patterns I need to check are:
(A)<-[:ORANGE]-(p)-[:RED]->(q), where p and q can be any nodes and node A is known
(A)<-[:ORANGE]-(r)-[:GREEN]->(s), where r and s can be any nodes and node A is known
Figure: https://lh5.googleusercontent.com/-1a41h63adqs/UaQ7B1qdAxI/AAAAAAAAAI4/2QjGS5pa1Zc/s1600/Presentation1.png
Try MATCH (n)-[r]->(m) WHERE n.property = "B" RETURN n,r,m.
This will return the paths needed to make your graph.
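Merging the branches amounts to taking the union of the relationships matched by each pattern, with shared relationships (such as the ORANGE one into A) kept once. The Java sketch below does this on an in-memory list of typed relationships; the Rel class and method name are hypothetical. In Cypher the same union could likely be expressed in one MATCH with a relationship-type alternation such as -[:RED|GREEN]->.

```java
import java.util.*;

public class BranchMerge {

    // Hypothetical typed relationship between two named nodes.
    static final class Rel {
        final String type, start, end;
        Rel(String type, String start, String end) { this.type = type; this.start = start; this.end = end; }
        @Override public String toString() { return "(" + start + ")-[:" + type + "]->(" + end + ")"; }
    }

    // Matches (a)<-[:ORANGE]-(p)-[:T]->(q) for every T in secondTypes and
    // returns the union of all matched relationships, i.e. the merged branch.
    static Set<String> mergedBranch(String a, List<Rel> rels, Set<String> secondTypes) {
        Set<String> result = new LinkedHashSet<>();
        for (Rel orange : rels) {
            if (!orange.type.equals("ORANGE") || !orange.end.equals(a)) continue;
            String p = orange.start;
            for (Rel second : rels) {
                if (secondTypes.contains(second.type) && second.start.equals(p)) {
                    result.add(orange.toString());  // shared edge is stored once
                    result.add(second.toString());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(
            new Rel("ORANGE", "B", "A"),
            new Rel("RED", "B", "C"),
            new Rel("GREEN", "B", "D"),
            new Rel("BLUE", "B", "E"),     // type not requested: ignored
            new Rel("ORANGE", "X", "Y"));  // not attached to A: ignored
        // prints: [(B)-[:ORANGE]->(A), (B)-[:RED]->(C), (B)-[:GREEN]->(D)]
        System.out.println(mergedBranch("A", rels, Set.of("RED", "GREEN")));
    }
}
```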

Neo4j Spatial- two nodes created for every spatially indexed node

I am using Neo4j 1.8.2 with Neo4j Spatial 0.9 for 1.8.2 (http://m2.neo4j.org/content/repositories/releases/org/neo4j/neo4j-spatial/0.9-neo4j-1.8.2/)
I followed the example code from http://architects.dzone.com/articles/neo4jcypher-finding-football with one change: instead of SpatialIndexProvider.SIMPLE_WKT_CONFIG, I used SpatialIndexProvider.SIMPLE_POINT_CONFIG_WKT.
Everything works fine until you execute the following query:
START n=node:stadiumsLocation('withinDistance:[53.489271,-2.246704, 5.0]')
RETURN n.name, n.wkt;
n.name is null. When I explored the graph, I found this data:
Node[80]{lon:-2.20024,lat:53.483,id:79,gtype:1,bbox:[-2.20024,53.483,-2.20024,53.483]}
Node[168]{lon:-2.29139,lat:53.4631,id:167,gtype:1,bbox:[-2.29139,53.4631,-2.29139,53.4631]}
Node 80, which was returned, appears to be the node created for the spatial record; it contains a property id:79. Node 79 is the actual stadium record from the example.
As per the source of IndexProviderTest, the comments
//We not longer need this as the node we get back already a 'Real' node
// Node node = db.getNodeById( (Long) spatialRecord.getProperty( "id" ) );
seem to indicate that this feature isn't available in the version I am using.
My question is, what is the recommended way to use withinDistance with other match conditions? There are a couple of other conditions to be fulfilled but I can't seem to get a handle on the actual node to actually match them.
Should I explicitly create relations? Not use Cypher and use the core API to do a traversal? Split the queries?
Two options:
a) Use GeoPipeline.startNearestNeighborLatLonSearch to get a starting set of nodes, and supply them to a subsequent Cypher query that does the matching/filtering on other properties
b) Since my lat/longs are shared by many entities [I use the centroid of an area], I can create a relationship from the spatial node to every entity located in that area and then use one Cypher query such as:
START n=node:stadiumsLocation('withinDistance:[53.489271,-2.246704, 5.0]')
MATCH (n)<-[:LOCATED_IN]-(something)
WHERE something.someProp=5
RETURN something
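Option a) is a two-step filter: first a spatial candidate set, then a check on the remaining properties. The Java sketch below illustrates only that shape on an in-memory list. It swaps in a linear scan with a haversine great-circle distance (Earth radius assumed to be 6371 km) for the spatial index, and the Place class and the stadium names are hypothetical; the coordinates are taken from the nodes shown in the question.

```java
import java.util.*;

public class NearbyThenFilter {

    // Hypothetical entity with a location and one extra property to filter on.
    static final class Place {
        final String name; final double lat, lon; final int someProp;
        Place(String name, double lat, double lon, int someProp) {
            this.name = name; this.lat = lat; this.lon = lon; this.someProp = someProp;
        }
    }

    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle (haversine) distance in kilometres.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }

    // Step 1: keep places within maxKm; step 2: keep those whose someProp matches.
    static List<String> query(List<Place> places, double lat, double lon, double maxKm, int wanted) {
        List<String> out = new ArrayList<>();
        for (Place p : places) {
            if (distanceKm(lat, lon, p.lat, p.lon) <= maxKm && p.someProp == wanted) {
                out.add(p.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Place> places = List.of(
            new Place("stadiumA", 53.483, -2.20024, 5),   // about 3 km away, someProp matches
            new Place("stadiumB", 53.4631, -2.29139, 3),  // nearby, but someProp differs
            new Place("farAway", 51.5074, -0.1278, 5));   // matches someProp, far too distant
        // prints: [stadiumA]
        System.out.println(query(places, 53.489271, -2.246704, 5.0, 5));
    }
}
```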
As advised by Peter, I went with option b.
Note, though, that there is no way to get the spatially indexed node back so that you can create relationships from it. I had to do a withinDistance query with a distance of 0.0.
Can you execute the enhanced test case I wrote? It is at https://github.com/neo4j/spatial/blob/2803093d544f56d7dfe8f1d122e049fa73489d8a/src/test/java/org/neo4j/gis/spatial/IndexProviderTest.java#L199 and shows how to find a location and traverse with Cypher to the next node.
