I have a DAG (a tree) in which the directed edges are of only three kinds:
Left to right (siblings)
Child to Parent
Parent to a child
Specifically, the problem is to evaluate an attributed parse tree, but the specific problem doesn't matter.
Sort of :
Which traversal is guaranteed to give a topological sort of the nodes?
I think inorder will fail, but in some places it is suggested that inorder is the way to go. I know reverse postorder works on general DAGs, but I think there must be a simpler traversal for my case.
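Since your graph is a DAG, and therefore has no back edges, you can use depth-first search to traverse it and add each node to a list as it comes off the DFS stack (that is, once all the nodes it points to have been visited). That list is in reverse topological order, so reverse it if each node must appear before the nodes its edges point to.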
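To make the DFS idea concrete, here is a minimal sketch in plain Java. The class name DfsTopoSort, the string node names and the adjacency-map representation are all assumptions made for illustration; the question does not say how the tree is stored. An edge u -> v is taken to mean "u must come before v".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not taken from the question.
class DfsTopoSort {
    // edges.get(u) lists every v with a directed edge u -> v (u before v).
    static List<String> sort(Map<String, List<String>> edges, List<String> nodes) {
        Set<String> visited = new HashSet<>();
        List<String> finished = new ArrayList<>();
        for (String n : nodes) {
            visit(n, edges, visited, finished);
        }
        // Nodes were recorded in finishing (post) order; reversing gives a topological order.
        Collections.reverse(finished);
        return finished;
    }

    private static void visit(String n, Map<String, List<String>> edges,
                              Set<String> visited, List<String> finished) {
        if (!visited.add(n)) {
            return; // already handled
        }
        for (String next : edges.getOrDefault(n, List.of())) {
            visit(next, edges, visited, finished);
        }
        finished.add(n); // the node "comes off the stack" here
    }

    public static void main(String[] args) {
        // parent -> left child -> right sibling
        Map<String, List<String>> edges = Map.of(
            "parent", List.of("left"),
            "left", List.of("right"),
            "right", List.of());
        System.out.println(sort(edges, List.of("right", "left", "parent")));
        // prints [parent, left, right]
    }
}
```

The recursion is only meant for small trees; a very deep tree would need an explicit stack instead.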
Related
I have a Neo4j database containing vertices that are either of type folder or of type leaf. A general tree is modelled using :childof relationships, and there is a single 'root' node that is the common ancestor of all vertices.
When presenting the tree, I want to filter out full branches based on the id of any vertex of type folder. Additionally, there is a filter on the properties of vertices of type leaf. The tricky part is that I do not want to see any folder whose descendant leaf nodes are all filtered out. Each query only returns immediate descendants, but the filter is applied to the whole subtree. The query must return the immediate children and a collection of the ids of the folders containing leaves that are not filtered out.
The use case is an API for showing a hierarchy based on some filter constraints. I have programmed this in the API application code, but transferring all the data from the database to the API application is too slow, so I need to improve the query to reduce the data transfer. A third approach is a purpose-built process that does this filtering and keeps the tree in memory. This has been done with some success, but I prefer to use off-the-shelf software if I can.
The following code is used to get the top-level nodes, without filtering. I struggle with expressing "MATCH only if at least one descendant also matches".
MATCH (p)-[:childof]->(s:Folder) WHERE s.name = 'root'
WITH p OPTIONAL MATCH (v)-[:childof*1..]->(p)
WHERE NOT((v)<-[:childof]-(:Folder))
RETURN p, collect(v.id) as folder_ids
My own feeling is that the problem is too specific for a general-purpose graph engine, but I am hoping to be proved wrong.
It sounds like you're close.
We can use pattern comprehensions at the folder level to check for children that meet the filter, and make sure we only keep folders that have at least one child that meets the filter criteria.
At the immediate-descendant level, if we use a MATCH instead of an OPTIONAL MATCH, then because the folders have already been filtered, the only immediate descendants left are those with at least one such folder.
Let's say, for example, that our filter is that leaf nodes must have active = true. We want the folders under consideration to have at least one child node meeting the filter, and when we get back to the immediate descendants, we only want to keep a descendant if its collection of eligible folders isn't empty.
Something like this:
MATCH (p)-[:childof]->(s:Folder) WHERE s.name = 'root'
WITH p
MATCH (folder)-[:childof*1..]->(p)
WHERE NOT((folder)<-[:childof]-(:Folder)) AND
size([(folder)<-[:childof]-(child) WHERE child.active = true | child]) <> 0
RETURN p, collect(folder.id) as folder_ids
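For comparison with the in-application version the question mentions, here is a rough in-memory analogue of the same pruning rule in plain Java. Everything in it (the FolderFilter and Node classes, the sample tree) is hypothetical, and unlike the query it also counts an immediate child itself when that child directly contains an active leaf.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; not part of the question or of Neo4j.
class FolderFilter {
    static final class Node {
        final String id;
        final boolean folder;   // true for folders, false for leaves
        final boolean active;   // only meaningful for leaves
        final List<Node> children = new ArrayList<>();

        Node(String id, boolean folder, boolean active) {
            this.id = id;
            this.folder = folder;
            this.active = active;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Collects the ids of folders at or below `node` that directly contain an active leaf.
    static void collectFolders(Node node, List<String> out) {
        if (!node.folder) {
            return;
        }
        boolean hasActiveLeaf = false;
        for (Node c : node.children) {
            if (!c.folder && c.active) {
                hasActiveLeaf = true;
            }
        }
        if (hasActiveLeaf) {
            out.add(node.id);
        }
        for (Node c : node.children) {
            collectFolders(c, out);
        }
    }

    // Maps each immediate child of root to its eligible folder ids; children with none are dropped.
    static Map<String, List<String>> visibleChildren(Node root) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Node child : root.children) {
            List<String> ids = new ArrayList<>();
            collectFolders(child, ids);
            if (!ids.isEmpty()) {
                result.put(child.id, ids);
            }
        }
        return result;
    }

    // Sample tree: root -> a -> a1 -> x (active leaf); root -> b -> y (inactive leaf).
    static Node sample() {
        Node a1 = new Node("a1", true, false).add(new Node("x", false, true));
        Node a = new Node("a", true, false).add(a1);
        Node b = new Node("b", true, false).add(new Node("y", false, false));
        return new Node("root", true, false).add(a).add(b);
    }

    public static void main(String[] args) {
        System.out.println(visibleChildren(sample())); // prints {a=[a1]}
    }
}
```

Folder b disappears because its only leaf is filtered out, which is the behaviour the MATCH (rather than OPTIONAL MATCH) is meant to give.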
Say we have a Neo4j database with several 50,000 node subgraphs. Each subgraph has a root. I want to find all nodes in one subgraph.
One way would be to recursively walk the tree. It works but can be thousands of trips to the database.
A second way is to add a subgraph identifier to each node:
MATCH(n {subgraph_id:{my_graph_id}}) return n
Another way would be to relate each node in a subgraph to the subgraph's root:
MATCH(n)-[]->(root:ROOT {id: {my_graph_id}}) return n
This feels more "graphy" if that matters. Seems expensive.
Or, I could add a label to each node. If {my_graph_id} was "BOBS_QA_COPY" then
MATCH(n:BOBS_QA_COPY) return n
would scoop up all the nodes in the subgraph.
My question is when is it appropriate to use a garden-variety property, add relationships, or set a label?
Setting a label to identify a particular subgraph makes me feel weird, like I am abusing the tool. I expect labels to say what something is, not which instance of something it is.
For example, if we were graphing car information, I could see having parts labeled "FORD EXPLORER". But I am less sure that it would make sense to have parts labeled "TONYS FORD EXPLORER". Now, I could see (USER id:"Tony") having a relationship to a FORD EXPLORER graph...
I may be having a bout of "SQL brain"...
Let's work this through, step by step.
If there are N non-root nodes, adding an extra N ROOT relationships makes the least sense. It is very expensive in storage, it will pollute the data model with relationships that don't need to be there and that can unnecessarily complicate queries that want to traverse paths, and it is not the fastest way to find all the nodes in a subgraph.
Adding a subgraph ID property to every node is also expensive in storage (but less so), and would require either: (a) scanning every node to find all the nodes with a specific ID (slow), or (b) using an index, say, :Node(subgraph_id) (faster). Approach (b), which is preferable, would also require that all the nodes have the same Node label.
But wait: if approach (b) already requires all nodes to be labelled, why don't we just use a different label for each subgraph? By doing that, we don't need the subgraph_id property at all, and we don't need an index either! And finding all the nodes with the same label is fast.
Thus, using a per-subgraph label would be the best option.
I am new to Neo4j and currently playing with this tree structure:
The numbers in the yellow boxes are a property named order on the relationship CHILD_OF.
My goal was
a) to manage the sorting order of nodes at the same level through this property rather than through directed relationships (like e.g. LEFT, RIGHT or IS_NEXT_SIBLING, etc.).
b) to be able to use plain integers instead of complete paths for the order property (i.e. not maintaining something like 0001.0001.0002).
However, I can't find a hint on how, or whether, it is possible to query the graph recursively so that it returns the nodes depth-first while using the order property on the relationship to sort each level.
I expect that, if it is possible, it might involve matching the complete path and iterating over it with Cypher's collection utilities, but I am not even close enough to post a good starting point.
Question
What I expect from answers to this question is not necessarily a solution, but a hint on whether this is a bad approach that would perform badly anyway. In terms of Cypher, I am interested in whether there is a practical solution.
I have a general idea on how I would tackle it as a Neo4j server plugin with the Java traversal or core api (which doesn't mean that it would perform well, but that's another topic), so this question really targets the design and Cypher aspect.
This might work:
MATCH path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, extract(r in rels(path) | r.order) as orders
ORDER BY orders
RETURN path
If it complains about sorting arrays, compute a number in which each digit (or pair of digits) is one of your order values, and order by that number:
MATCH path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, reduce(a=1, r in rels(path) | a*10+r.order) as orders
ORDER BY orders
RETURN path
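To see what the two orderings do, here is a small plain-Java sketch that takes the list of order values along each path and sorts those lists both ways. The class OrderPathSort and its sample data are made up for illustration, and the sketch assumes that sorting arrays in Cypher compares them element by element. Under that assumption the first ordering gives depth-first (pre-order) output, whereas the single-number encoding sorts shorter paths before longer ones, so it looks more like a level-by-level listing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; it models only the sorting, not the graph query.
class OrderPathSort {
    // Element-by-element comparison; a path sorts before any of its extensions.
    static final Comparator<List<Integer>> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // Mirrors reduce(a=1, r in rels(path) | a*10+r.order); assumes single-digit order values.
    static long encode(List<Integer> orders) {
        long a = 1;
        for (int o : orders) {
            a = a * 10 + o;
        }
        return a;
    }

    static List<List<Integer>> sortLexicographic(List<List<Integer>> paths) {
        List<List<Integer>> copy = new ArrayList<>(paths);
        copy.sort(LEXICOGRAPHIC);
        return copy;
    }

    static List<List<Integer>> sortByEncoding(List<List<Integer>> paths) {
        List<List<Integer>> copy = new ArrayList<>(paths);
        copy.sort(Comparator.comparingLong(OrderPathSort::encode));
        return copy;
    }

    public static void main(String[] args) {
        List<List<Integer>> paths = List.of(
            List.of(2, 1), List.of(1, 2), List.of(1), List.of(2), List.of(1, 1));
        System.out.println(sortLexicographic(paths)); // prints [[1], [1, 1], [1, 2], [2], [2, 1]]
        System.out.println(sortByEncoding(paths));    // prints [[1], [2], [1, 1], [1, 2], [2, 1]]
    }
}
```

If the numeric form is needed, padding every path to the same length before encoding would be one way to keep the depth-first order, at the cost of fixing a maximum depth.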
I want to find a spanning tree of a graph that contains loops. I cannot use a regular BFS traversal here, so I looked at the allsimplepaths function in the Java API, but it seems to find loops between two nodes. Right now I select a random root, but I don't know the end points. So I just want to get a spanning tree from the graph even though it may have many loops; it should be converted to a DAG and then give the tree structure. The graph may have more than one spanning tree.
How can I do this? Can allsimplepaths be applied here?
Look at TraversalDescription with an appropriate uniqueness (NODE_GLOBAL) and Path-Expanders that follow the interesting relationships.
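The effect of global node uniqueness can be sketched without the Neo4j API. The plain-Java class below (SpanningTreeSketch, a made-up name, working on an assumed adjacency map of string node names) visits every node at most once and keeps the edge through which each node was first reached; those edges form a spanning tree of whatever is reachable from the root, loops or not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; this is not the Neo4j traversal framework.
class SpanningTreeSketch {
    // Returns the tree edges as {from, to} pairs, in the order they were discovered.
    static List<String[]> spanningTree(Map<String, List<String>> adjacency, String root) {
        Set<String> seen = new HashSet<>();
        List<String[]> treeEdges = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String neighbor : adjacency.getOrDefault(current, List.of())) {
                // A node is accepted only the first time it is reached (global uniqueness).
                if (seen.add(neighbor)) {
                    treeEdges.add(new String[] {current, neighbor});
                    queue.add(neighbor);
                }
            }
        }
        return treeEdges;
    }

    public static void main(String[] args) {
        // Triangle a-b-c plus a pendant node d attached to c.
        Map<String, List<String>> adjacency = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("a", "c"),
            "c", List.of("a", "b", "d"),
            "d", List.of("c"));
        for (String[] e : spanningTree(adjacency, "a")) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

Which spanning tree comes out depends on the root and on the order in which neighbours are listed, which matches the remark that the graph may have more than one.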
I have a large network stored in Neo4j. Based on a particular root node, I want to extract a subgraph around that node and store it somewhere else. So, what I need is the set of nodes and edges that match my filter criteria.
Afaik there is no out-of-the-box solution available. There is a graph matching component available, but it works only for perfect matches. The Neo4j API itself defines only graph traversal which I can use to define which nodes/edges should be visited:
Traverser exp = Traversal
.description()
.breadthFirst()
.evaluator(Evaluators.toDepth(2))
.traverse(root);
Now, I can add all nodes/edges to sets for all paths, but this is very inefficient. How would you do it? Thanks!
EDIT Would it make sense to add the last node and the last relationship of each traversal to the subgraph?
As for graph matching, that has been superseded by http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html which would fit nicely, and supports fuzzy matching with optional relationships.
For subgraph representation, I would use the Cypher output to maybe construct new Cypher statements for recreating the graph, much like a SQL export, something like
start n=node:node_auto_index(name='Neo')
match n-[r:KNOWS*]-m
return "create ({name:'"+m.name+"'});"
http://console.neo4j.org/r/pqf1rp for an example
I solved it by constructing the induced subgraph based on all traversal endpoints.
Building the subgraph from the set of last nodes and edges of every traversal does not work, because edges that are not part of any shortest paths would not be included.
The code snippet looks like this:
Set<Node> nodes = new HashSet<Node>();
Set<Relationship> edges = new HashSet<Relationship>();
// Collect every node reached by the traversal.
for (Node n : traverser.nodes())
{
    nodes.add(n);
}
// Keep each relationship whose two endpoints are both in the node set.
for (Node node : nodes)
{
    for (Relationship rel : node.getRelationships())
    {
        if (nodes.contains(rel.getOtherNode(node)))
            edges.add(rel);
    }
}
Every edge is added twice: once from its start node and once from its end node. Because the collection is a Set, each edge ends up in it only once.
It is possible to iterate over incoming or outgoing edges only, but it is unclear how loops (an edge from a node to itself) are handled. Which category do they belong to? This snippet does not have that issue.
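The same induced-subgraph logic can be tried outside the database. The sketch below is plain Java with a made-up Edge class standing in for Relationship, and it assumes a self-loop shows up once in its node's list of incident edges (how Neo4j reports loops is exactly the open point above). It shows that the Set removes the double visit and that a loop on a selected node is kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Neo4j types, for illustration only.
class InducedSubgraph {
    static final class Edge {
        final String from;
        final String to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }

        // Plays the role of getOtherNode: the endpoint that is not `node`.
        String other(String node) {
            return node.equals(from) ? to : from;
        }
    }

    // Returns every edge of allEdges whose two endpoints are both in `nodes`.
    static Set<Edge> inducedEdges(Set<String> nodes, List<Edge> allEdges) {
        // Index the edges by endpoint, like node.getRelationships() would.
        Map<String, List<Edge>> incident = new HashMap<>();
        for (Edge e : allEdges) {
            incident.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e);
            if (!e.to.equals(e.from)) {
                incident.computeIfAbsent(e.to, k -> new ArrayList<>()).add(e);
            }
        }
        Set<Edge> edges = new HashSet<>();
        for (String node : nodes) {
            for (Edge e : incident.getOrDefault(node, List.of())) {
                if (nodes.contains(e.other(node))) {
                    edges.add(e); // seen from both endpoints, stored once
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Edge> all = List.of(
            new Edge("a", "b"), new Edge("b", "c"), new Edge("a", "a"), new Edge("c", "d"));
        Set<Edge> induced = inducedEdges(Set.of("a", "b", "c"), all);
        System.out.println(induced.size()); // prints 3
    }
}
```

The edge c-d is dropped because d is outside the node set, while the loop a-a survives.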
See dumping the database to cypher statements
dump START n=node({self}) MATCH p=(n)-[r:KNOWS*]->(m) RETURN n,r,m;
There's also an example of importing a subgraph of the first database (db1) into a second (db2).