Neo4j: Finding simple path between two nodes takes a lot of time - neo4j

Neo4j: Finding simple path between two nodes takes a lot of time even after using upper limit (*1..4). I don't want to use allShortestPath or shortestPath because it doesnt return all the paths.
Match p=((n {Name:"Node1"}) -[*1..4]-> (m {Name:"Node2"})) return p;
Any suggestions to make it faster ?

If you have a lot of nodes, try creating an index so that the neo4j DB engine does not have to search through every node to find the ones with the right Name property values.
I am presuming that, in your example, the n and m nodes are really the same "type" of node. If this is true, then:
Add a label (I'll call it 'X') to every node (of the same type as n and m). You can use the following to add the 'X' label to node(s) represented by the variable n. You'd want to precede it with the appropriate MATCH clause:
SET n:X
Create an index on the Name property of nodes with the X label like this:
CREATE INDEX ON :X(Name);
Modify your query to:
MATCH p=((n:X {Name:"Node1"}) -[*1..4]-> (m:X {Name:"Node2"}))
RETURN p;
If you do the above, then your query should be faster.

Related

Getting a subgraph not crossing through a certain node

I have a huge (non cyclic) graph and want to find all nodes reachable by relation X from a given node. However, I do not want to cross a node having a certain attribute {attr:'donotcross'} as this represents a choke point I do not want to cross (i.e. this is the only node leading to an adjacent subgraph).
Currently I do breadth first search myself using a trivial Cypher query to isolate neighboring nodes and python, stopping the recursion as soon as I reach that specific node. This, however, is really slow and I think that using pure Cypher to isolate those nodes could be faster.
What does the Cypher query look like returning all connected nodes via X but not traversing a node with property attr:'donotcross'?
My intuition would be
MATCH (n)-[:X*]->(inter)-[:X*]->(m) WHERE NOT inter.attr = 'donotcross' RETURN m
With n being the start node. However, this does not work as this pattern can match a path with a forbidden node if there are more than the forbidden node in between the start and target node.
Using Cypher alone, you can use the following approach:
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE inter.attr = 'donotcross')
RETURN DISTINCT m
Keep in mind you should at least be using labels for your starting node n, if you aren't able to look them up by an indexed property for a specific label.
Also, if there are relatively few of these donotcross nodes, and if there is an index on the label of these nodes on attr, then it may be faster to first match on these nodes, collect them, then filter based on that:
MATCH (x) // best to use a label and index lookup!
WHERE x.attr = 'donotcross'
WITH collect(x) as excluded
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE node in excluded)
RETURN DISTINCT m

Is there a way to return data according to properties in Neo4j?

In each of my nodes I have a selection of properties like education_id , work_id, locale etc etc. All these properties can have one or more than one values of the sort of education_id:112 or education_id:165 i.e. Node A might have education_id:112 and Node B might have education_id:165 and again Node C might have education_id:112 and so on.
I want a cypher query that return all nodes for a particular value of the property and I don't care about the value of the property beforehand.
To put it into perspective, in the example I have provided, it must return Node A and Node C under education_id:112 and Node B under education_id:165
Note: I am not providing multiple cypher queries specifying the values of properties each time. The whole output must be dynamic.
The output to the query should be something like
education_id:112 Node A, Node C
education_id:165 Node B
These are the results of a single query statement.
Not quite sure I understand your question, but based on the expected output:
MATCH (n) RETURN n.education_id,collect(n)
will group nodes by distinct values of education_id
You should probably take a look at the cypher refcard. What you are looking for is the WHEREclause:
Match (a) WHERE a.education_id = 112 return a
You can also specify property directly in the MATCH clause.
Match (a{education_id: 112}) RETURN a

Find all relations starting with a given node

In a graph where the following nodes
A,B,C,D
have a relationship with each nodes successor
(A->B)
and
(B->C)
etc.
How do i make a query that starts with A and gives me all nodes (and relationships) from that and outwards.
I do not know the end node (C).
All i know is to start from A, and traverse the whole connected graph (with conditions on relationship and node type)
I think, you need to use this pattern:
(n)-[*]->(m) - variable length path of any number of relationships from n to m. (see Refcard)
A sample query would be:
MATCH path = (a:A)-[*]->()
RETURN path
Have also a look at the path functions in the refcard to expand your cypher query (I don't know what exact conditions you'll need to apply).
To get all the nodes / relationships starting at a node:
MATCH (a:A {id: "id"})-[r*]-(b)
RETURN a, r, b
This will return all the graphs originating with node A / Label A where id = "id".
One caveat - if this graph is large the query will take a long time to run.

Neo4j Cypher finding the average of node properties which have another property in common

I have a "Gene" Label/node type with properties "value" and "geneName"
I have a separate Label/node type called Pathway with property "
I want to go through all the different geneName's and find the average of all the Gene's value with that Gene name. I need all those Gene's displayed as different rows. Bearing in mind I have a a lot of geneName's so I can't name them all in the query. I need to do this inside a certain Pathway.
MATCH (sample)-[:Measures]->(gene)-[:Part_Of]->(pathway)
WHERE pathway.pathwayName = 'Pyrimidine metabolism'
WITH sample, gene, Collect (distinct gene.geneName) AS temp
I have been trying to figure this out all day now and all I can manage to do is retrieve all the rows of geneNames. I'm lost from there.
RETURN extract(n IN temp | RETURN avg(gene.value))
Mabye?
This query should return the average gene value for each distinct gene name:
MATCH (sample)-[:Measures]->(gene)-[:Part_Of]->(pathway:Pathway)
WHERE pathway.pathwayName = 'Pyrimidine metabolism'
RETURN sample, gene.geneName AS name, AVG(gene.value) AS avg;
When you use an aggregation function (like AVG), it automatically uses distinct values for the non-aggregating values in the same WITH or RETURN clause (i.e., sample and gene.geneName in the above query).
For efficiency, I have also added the label to the pathway nodes so that neo4j can start off by scanning just Pathway nodes instead of all nodes. In addition, you should consider creating an index on :Pathway(pathwayName), so that the search for the Pathway is as fast as possible.

Neo4j deep hierarchy query

I've this kind of data model in the db:
(a)<-[:has_parent]<-(b)-[:has_parent]-(c)<-[:has_parent]-(...)
every parent can have multiple children & this can go on to unknown number of levels.
I want to find these values for every node
the number of descendants it has
the depth [distance from the node] of every descendant
the creation time of every descendant
& I want to rank the returned nodes based on these values. Right now, with no optimization, the query runs very slow (especially when the number of descendants increases).
The Questions:
what can I do in the model to make the query performant (indexing, data structure, ...)
what can I do in the query
what can I do anywhere else?
edit:
the query starts from a specific node using START or MATCH
to clarify:
a. the query may start from any point in the hierarchy, not just the root node
b. every node under the starting node is returned ranked by the total number of descendants it has, the distance (from the returned node) of every descendant & timestamp of every descendant it has.
c. by descendant I mean everything under it, not just it's direct children
for example,
here's a sample graph:
http://console.neo4j.org/r/awk6m2
First you need to know how to find the root node. The following statement finds the nodes having no outboung parent relationship - be aware that statement is potentially expensive in a large graph.
MATCH (n)
WHERE NOT ((n)-[:has_parent]->())
RETURN n
Instead you should use an index to find that node:
MATCH (n:Node {name:'abc'})
Starting with our root node, we traverse inbound parent relationship with variable depth. On each node traversed we calculate the number of children - since this might be zero a OPTIONAL MATCH is used:
MATCH (root:Node) // line 1-3 to find root node, replace by index lookup
WHERE NOT ((root)-[:has_parent]->())
WITH root
MATCH p =(root)<-[:has_parent*]-() // variable path length match
WITH last(nodes(p)) AS currentNode, length(p) AS currentDepth
OPTIONAL MATCH (currentNode)<-[:has_parent]-(c) // tranverse children
RETURN currentNode, currentNode.created, currentDepth, count(c) AS countChildren

Resources