Neo4j query to ignore parent nodes which don't satisfy a condition but keep the same structure

I have a tree-like structure and I'm trying to write a Cypher query that replaces a parent node with its child if the parent node does not have a certain relation.
For example, the query MATCH (c)-[:CHILD_OF*]->(p {id:"123"}) return c returns a structure like this (we don't care what the other nodes are; the structure is the only thing that needs to be preserved):
()<-(A)
()<-()<-(B)<-()<-(C)
()<-(D)<-(E)<-()<-(F)
\-(G)<-()<-(H)
How could I get the query to ignore all nodes without a certain property but keep the same structure, like so:
(A)
(B)<-(C)
(D)<-(E)<-(F)
(G)<-(H)

You should take a look at the procedures for creating virtual nodes and relationships in APOC Procedures.
These will allow you to create virtual relationships, that will not be saved to the graph, but will be present and viewable in your query.
The tricky part will be creating those new virtual relationships. You'll likely be filtering down nodes in all paths to the nodes you're interested in. At that point you may need to use apoc.coll.pairsMin() in order to get each adjacent pair of nodes in the collection on a row so you can create the virtual relationships between them.
After all the virtual relationships are created (in the same Cypher query), return them together with the nodes you kept, and you should see the graph you want. Virtual relationships exist only in the query result, so they are returned rather than matched.
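The filter-and-pair step described above can be modelled in plain Python. This is only a sketch of the logic, not APOC code: the `parent` map, the `keep` set, and the placeholder node names (`x1`, `x2`, ...) are assumptions made up to mirror the example tree in the question, and `pairs_min` stands in for what apoc.coll.pairsMin() does to a list.

```python
def pairs_min(items):
    """Adjacent pairs of a list: [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(items, items[1:]))


def path_to_root(node, parent):
    """Walk child -> parent links until a node without a parent is reached."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def collapse(parent, keep):
    """Return (child, nearest kept ancestor) pairs, i.e. the virtual edges."""
    edges = set()
    for node in parent:
        kept = [n for n in path_to_root(node, parent) if n in keep]
        edges.update(pairs_min(kept))
    return edges


# Hypothetical tree shaped like the example in the question (child -> parent).
parent = {
    "x1": "root", "A": "x1",
    "x2": "root", "x3": "x2", "B": "x3", "x4": "B", "C": "x4",
    "x5": "root", "D": "x5", "E": "D", "x6": "E", "F": "x6",
    "G": "x5", "x7": "G", "H": "x7",
}
keep = set("ABCDEFGH")  # nodes that satisfy the condition

virtual_edges = collapse(parent, keep)
```

Each pair in `virtual_edges` corresponds to one virtual relationship you would create between two kept nodes; A, B, D and G have no kept ancestor, so they end up as the heads of their chains.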

Related

Get all nodes with a specific type of relationship to a root node

I have a rather large and complex graph in Neo4j (millions of nodes and relationships of various types). I want to get all child nodes (at all depths) of a specific root node, but only through a specific type of relationship.
I have tried: Match (n:NODE_TYPE)-[*:REL_TYPE]->(r:NODE_TYPE {id:SPECIFIC_ID}) return n
But I get a syntax error for specifying a type on the relationship.
Querying the whole graph without specifying the relationship type takes a really long time, and nodes could be reached through paths that eventually lead to the root node but use other types of relationships (which is not good for my use case).
You need to swap the order of the relationship type and the wildcard operator:
Match (n:NODE_TYPE)-[:REL_TYPE*]->(r:NODE_TYPE {id:SPECIFIC_ID})
return n
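To make clear what the corrected pattern selects, here is a small Python model of "follow only one relationship type, to any depth". It is a sketch under assumptions: the edge list, the node names, and the `descendants` helper are made up for illustration and have nothing to do with how Neo4j executes the query.

```python
from collections import deque


def descendants(edges, root, rel_type):
    """Nodes that reach `root` through relationships of `rel_type` only.

    `edges` is a list of (child, relationship type, parent) triples.
    """
    incoming = {}
    for child, kind, parent in edges:
        if kind == rel_type:  # relationships of other types are never followed
            incoming.setdefault(parent, []).append(child)

    found, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for child in incoming.get(node, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


# Hypothetical data: d reaches root only through an OTHER relationship.
edges = [
    ("a", "REL_TYPE", "root"),
    ("b", "REL_TYPE", "a"),
    ("c", "OTHER", "root"),
    ("d", "REL_TYPE", "c"),
]
result = descendants(edges, "root", "REL_TYPE")
```

Here `d` is left out even though it has a REL_TYPE relationship, because the path from it to the root passes through a relationship of another type.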

How to partially isolate a subgraph without using labels in neo4j

I'm creating a graph that contains a large number of subgraphs of roughly treelike structure in that the 'root' of each subgraph only has outwardly directed relationships. The many leaves and branches of this subgraph all contain data related to the root. This is so that a single query like the following will return all data associated with a given root, and only the data associated with that root:
MATCH (root:ROOT {id: 'foo'})-[*]->(leaves) RETURN leaves
There are very strong reasons to optimize for this query. However, the subgraphs are not truly isolated, because some of the leaves are actually categories that can receive relationships from many roots, so structures like this exist:
(root)-[]->(category)<-[]-(root)
This seems like a great way to preserve the integrity of the subgraphs while also allowing for complex relationships between them, however, there's one catch. I can't have simple, one-to-one relationships directly between roots, or one root will contaminate the other's response to the first query. As I see it, there are only two real options.
Build a new dummy node for each 1-to-1 relationship between roots. Like so:
(root)-[]->(dummy)<-[]-(root)
I hate this option. It proliferates useless nodes and it dilutes the concept of relationships.
Give every child of each subgraph a label identifying it as a member of the subgraph. This is an even worse option as I see it. Since the subgraphs number in the many thousands it would dramatically pollute the label space.
I've also considered filtering on the type of a direct relationship, but that only excludes the foreign root, and not its children.
Filtering on the type of direct 1-to-1 relationships, with a structure like this:
(root)-[:bar]->(foreign_root)-[]->(foreign_leaves)
And a primary query like this:
MATCH (root {id: 'foo'})-[*]->(leaves) WHERE NOT (root)-[:bar]->(leaves) RETURN leaves
This produces a result of (foreign_leaves). That is undesirable for multiple reasons: it makes the most important query larger, and it doesn't actually isolate the subgraph.
So, in one sense I am asking, is there a way to create a direct, 1-to-1 relationship between two of these roots without massive graph pollution or cross-contamination between subgraphs? In a larger sense, am I viewing the problem wrongly?
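I think you are almost there. In your last Cypher query, you can tweak your WHERE clause so that it tests the first relationship of each path instead of the destination node. A predicate such as NOT (root)-[:bar]->() would not do this: it only checks whether root has any :bar relationship at all, so it drops either every row or none. Bind the path and check its first relationship, like this:
MATCH p = (root {id: 'foo'})-[*]->(leaves)
WHERE type(relationships(p)[0]) <> 'bar'
RETURN leaves
This way, you filter out all paths that start with a :bar relationship.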
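The intended effect, dropping every path that starts with a :bar relationship while still following everything else, can be checked with a small Python model. The adjacency map, the relationship type `has`, and the `reachable` helper are assumptions for illustration only.

```python
def reachable(graph, root, skip_first=None):
    """Nodes reachable from `root`, ignoring paths whose first
    relationship has type `skip_first`.

    `graph` maps a node to a list of (relationship type, destination).
    """
    seen = set()
    # Only the first hop is filtered; deeper hops may use any type.
    stack = [dst for rel, dst in graph.get(root, []) if rel != skip_first]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dst for _rel, dst in graph.get(node, []))
    return seen


# Hypothetical graph: two roots sharing a category, plus a direct :bar link.
graph = {
    "foo": [("has", "leaf1"), ("has", "category"), ("bar", "other_root")],
    "leaf1": [("has", "leaf2")],
    "other_root": [("has", "foreign_leaf"), ("has", "category")],
}

everything = reachable(graph, "foo")
isolated = reachable(graph, "foo", skip_first="bar")
```

With the filter in place, the shared category is still returned (it is reachable without going through :bar), but the foreign root and its leaves are not.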

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind: they are different positions in a game. Each relationship has a property "GameID", which identifies the right relationship to follow when traversing the graph along a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is that I don't know how to tell Cypher to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries don't finish. I think that Cypher looks at every outgoing relationship at every path depth from the start node, but as I already explained, there is only one right path. So it should do a depth-first search (DFS) instead of a breadth-first search (BFS). Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use your own unique IDs, and store them in your nodes and use them for your GameIDs; and, if you do this, then you should also create a uniqueness constraint for your own IDs -- this will, as a nice side effect, automatically create an index for your IDs.
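The traversal the question is after, following at each position only the move whose GameID matches, touches one relationship per hop instead of fanning out. A plain-Python sketch of that walk follows; the `moves` map, the position names, and the `max_hops` guard are all assumptions for illustration, not anything Neo4j provides.

```python
def follow_game(moves, start, game_id, max_hops=1000):
    """Follow the single chain of moves tagged with `game_id`.

    `moves` maps a position to a list of (GameID, next position).
    `max_hops` is a safety guard in case the data contains a cycle.
    """
    path = [start]
    for _ in range(max_hops):
        candidates = moves.get(path[-1], [])
        # Pick the one outgoing move that belongs to this game, if any.
        nxt = next((dst for gid, dst in candidates if gid == game_id), None)
        if nxt is None:
            break
        path.append(nxt)
    return path


# Hypothetical positions shared by several games.
moves = {
    "p0": [("g1", "p1"), ("g2", "p2")],
    "p1": [("g1", "p3"), ("g3", "p2")],
    "p3": [("g2", "p0")],
}

game_one = follow_game(moves, "p0", "g1")
game_two = follow_game(moves, "p0", "g2")
```

The work is proportional to the length of the game plus the out-degree of the positions on it, which is why restricting the candidate relationships up front (for example through an index on GameID) matters so much.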

Cypher: Multiple independent queries in one call

In my Neo4j 2.0 server database I have a forest, i.e. a set of trees. One of my use cases is to get the child nodes of an arbitrary subset of tree nodes.
For instance, I have the root nodes
root1 root2 root3 root4
and now I want the child nodes of root1 and root4, and I need to know which children belong to which root. Each query individually is a simple MATCH Cypher query. But for the sake of performance I would like to keep the number of database calls low, since I use the Neo4j server. Thus I am thinking about a way to tell Cypher "give me the child terms of root1 and root4 and tell me which node belongs to which root in the result". That is, I am thinking of a kind of map, or a collection of result sets where the first element is the child nodes of the first root, the second element the child nodes of the second root, and so on.
Is there a way to do this in Cypher or will I have to fall back to a server plugin here?
Thank you and best regards!
Edit:
To clarify: My main concern is that I need to know which children belong to which root. As an example, consider the small graph generated by this command:
create (r1:ROOT {name:"root1"}),
(r2:ROOT {name:"root2"}),
(c11:CHILD {name:"child1_1"}),
(c12:CHILD {name:"child1_2"}),
(c13:CHILD {name:"child1_3"}),
(c21:CHILD {name:"child2_1"}),
(c22:CHILD {name:"child2_2"}),
(c23:CHILD {name:"child2_3"}),
(r1)-[:HAS_CHILD]->(c11),
(r1)-[:HAS_CHILD]->(c12),
(r1)-[:HAS_CHILD]->(c13),
(r2)-[:HAS_CHILD]->(c21),
(r2)-[:HAS_CHILD]->(c22),
(r2)-[:HAS_CHILD]->(c23)
Here, we get root1 and root2 with three children each.
To get the children of root1 I would issue the following query:
MATCH (r:ROOT)-[:HAS_CHILD]->c where r.name='root1' RETURN collect(c)
Now I know the children of root root1.
The question is: what would a query look like that fetches the children of root1 AND root2 and shows which child belongs to which root? Because clearly the query
MATCH (r:ROOT)-[:HAS_CHILD]->c where r.name='root1' OR r.name='root2' RETURN collect(c.id)
would give me the children of both roots. But now I would not know which root had which children. So what can I do?
You should give us more details, but a query like this (after adjusting properties and relationships) should do what you want:
MATCH (child) <-[:HAS_CHILD]- (root:ROOT)
WHERE root.name IN ['root1','root4']
RETURN child, root
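Since that query returns one row per (child, root) pair, turning the rows into the "map" the question asks for is a small client-side step. A minimal sketch, assuming the rows have already been reduced to (root name, child name) tuples; the `rows` list and the `children_by_root` helper are made up for illustration and use the names from the example graph.

```python
from collections import defaultdict


def children_by_root(rows):
    """Group (root, child) rows into {root: [children]}."""
    grouped = defaultdict(list)
    for root, child in rows:
        grouped[root].append(child)
    return dict(grouped)


# Hypothetical result rows for the example graph from the question.
rows = [
    ("root1", "child1_1"),
    ("root1", "child1_2"),
    ("root1", "child1_3"),
    ("root2", "child2_1"),
    ("root2", "child2_2"),
    ("root2", "child2_3"),
]

mapping = children_by_root(rows)
```

One database call is enough this way, and the association between each child and its root is kept.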

Slow performing cypher query that creates nodes to group existing nodes by property values

I have a performance issue with a modifying Cypher query. Given is an origin node that has a huge number of outgoing relationships to child nodes. These child nodes all have a key property. The goal is to create new nodes between the origin and the child nodes that group all child nodes sharing the same value of the key property. A plot of that idea can be found at the neo4j console: http://console.neo4j.org/?id=vinntj
I use the query together with spring-data-neo4j 2.2.2.RELEASE and neo4j 1.9.2 embedded. The parameter for that query must be a node id and the result of that query should be the modified root node.
The query currently looks like (a bit more complex than in the linked neo4j console):
START root=node({0})
MATCH (root)-[r:LEAF]->(child)
SET root.__type__='my.GroupedRoot'
DELETE r
WITH child.`custom-GROUP` AS groupingKey, root AS origin, child AS leaf
CREATE UNIQUE (origin)-[:GROUP]->(group{__type__:'my.Group',key:'GROUP',value:groupingKey,origin:ID(origin)})-[:LEAF]->(leaf)
RETURN DISTINCT origin
The property custom-GROUP is the key to group by. In SDN it is represented by a DynamicProperties object. I annotated it to be indexed as well as the groupingKey and origin property of the created group node.
With 5000 child nodes it takes ~50sec to group them. For 10000 nodes ~90sec. For 20000 nodes ~380s and for 30000 nodes > 50min! This looks like worse than linear scaling to me. But my goal is O(n) scaling and to get 500000+ child nodes processed in under 30min. I assume that the CREATE UNIQUE part of the query causes the problem, because for each new group node it always needs to check which group nodes have already been created. And the amount to check grows with the number of already grouped child nodes.
Does someone have an idea how to make this query faster? Or how to do the same thing faster with another query?
If CREATE UNIQUE is indeed the problem, then this query will first create the groups and then attach the leaves to them:
START root=node(*)
MATCH (root)-[r:LEAF]->(child)
WHERE HAS (root.key) AND root.key='root'
WITH DISTINCT child.key AS groupingKey, root as origin
CREATE UNIQUE (origin)-[:GROUP]->(intermediate { key:groupingKey,origin:ID(origin)})
WITH groupingKey, origin, intermediate
MATCH (origin)-[r:LEAF]->(leaf)
WHERE leaf.key = groupingKey
DELETE r
CREATE (intermediate)-[:LEAF]->(leaf)
RETURN DISTINCT origin
The console is not letting me view the execution plan for either of our queries for some reason, so I don't know for sure if it will help.
You might also consider indexing the roots so that you don't have to do a "WHERE" on all of the nodes. You could just check an index for key=root.
EDIT: An alternative to the above query is the following, which avoids matching the leaf nodes a second time by using COLLECT.
START root=node(*)
MATCH (root)-[r:LEAF]->(child)
WHERE HAS (root.key) AND root.key='root'
DELETE r
WITH DISTINCT child.key AS groupingKey, root as origin, COLLECT(child) as children
CREATE UNIQUE (origin)-[:GROUP]->(intermediate { key:groupingKey,origin:ID(origin)})
WITH groupingKey, origin, intermediate, children
FOREACH(leaf IN children : CREATE (intermediate)-[:LEAF]->(leaf))
RETURN DISTINCT origin
Well, in the end I decided not to use this kind of Cypher query on such a large amount of data. I implemented the same functionality using the traversal API for extracting the groupable items and the Neo4jTemplate to create the new nodes and relationships. Now 50000 items can be grouped in 5474ms instead of ~1h with the previously used Cypher query. This is a very big improvement.
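The reason a hand-written pass can be linear is that the "does this group already exist?" check becomes a hash-map lookup instead of a search over the groups created so far. The sketch below shows that idea in plain Python. It is not the traversal API or Neo4jTemplate code mentioned above, and the `regroup` helper, the `group:<key>` naming, and the sample leaves are assumptions for illustration.

```python
def regroup(origin, leaves):
    """Insert one group node per distinct key between origin and leaves.

    `leaves` is a list of (leaf id, grouping key).
    Returns (origin -> group edges, group -> leaf edges).
    """
    group_of = {}  # grouping key -> group node id, O(1) lookup per leaf
    group_edges = []
    leaf_edges = []
    for leaf, key in leaves:
        if key not in group_of:
            # First time this key is seen: create its group exactly once.
            group_of[key] = f"group:{key}"
            group_edges.append((origin, group_of[key]))
        leaf_edges.append((group_of[key], leaf))
    return group_edges, leaf_edges


# Hypothetical leaves with their grouping keys.
group_edges, leaf_edges = regroup("root", [("a", "x"), ("b", "y"), ("c", "x")])
```

Every leaf is handled once, so the total work grows in proportion to the number of child nodes.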
