I am trying to build a Cypher query for two scenarios:
Tests having a depth of more than 2
A specific test having a depth of more than 2
As you can see in the image, tests 1, 2 and 3 are related through paths deeper than 2. The Cypher query I ran was:
MATCH p=()<-[r:TEST_FOR*..10]-() RETURN p LIMIT 50
Now, when I change my query to the ones below, I get no records/results or nodes.
1) MATCH p=()<-[r:TEST_FOR*2..5]-() RETURN p LIMIT 50
2) MATCH p=()<-[r:TEST_FOR*2]-() RETURN p LIMIT 50
3) MATCH p=(d:Disease)<-[r:TEST_FOR*]-(t:Tests) WHERE t.testname = 'Alkaline Phosphatase (ALP)' RETURN p
4) MATCH p=()<-[r:TEST_FOR*..10]-(t:Tests {testname:'Alkaline Phosphatase (ALP)'}) RETURN p LIMIT 50
When I run queries 3 and 4 above, I get the same results, i.e. 1 test with 5 diseases, but the result does not extend any further from that specific test.
But as you can see in the image, the test is connected to two other tests! My structure is as follows:
Tests (testname)
Disease (diseasename, did)
Linknode (parentdiseaseid, testname)
I used the query below to create the TEST_FOR relationship:
MATCH (d:Disease), (l:Linknode) WHERE d.did = l.parentdiseaseid
WITH d, l.testname AS name
MATCH (t:Tests {testname: name}) CREATE (d)<-[:TEST_FOR]-(t);
Direction is the problem here. Each of your 4 queries above uses a directed variable-length matching pattern, meaning that every relationship traversed must use the direction indicated (incoming). However, that won't work once you hit :Tests nodes, since they only have outgoing relationships to :Disease nodes.
The easy fix is to omit the direction, so the matching pattern will traverse :TEST_FOR relationships regardless of direction. For example:
MATCH p=()-[r:TEST_FOR*2..5]-() RETURN p LIMIT 50
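To see why the direction matters, here is a small in-memory model in Python (not Neo4j code). The `TEST_FOR` pairs are hypothetical sample data shaped like the question's graph, and `count_paths` is a hypothetical helper that follows Cypher's rule that a relationship may appear only once per path while nodes may repeat. With direction enforced, no path longer than one hop exists; ignoring direction, the 2-hop paths from one test through a shared disease to another test appear.

```python
from collections import defaultdict

# Hypothetical sample data: each (test, disease) pair stands for a
# (:Tests)-[:TEST_FOR]->(:Disease) relationship.
TEST_FOR = [
    ("ALP", "D1"), ("ALP", "D2"),
    ("T2", "D2"), ("T2", "D3"),
    ("T3", "D3"),
]


def count_paths(edges, min_hops, max_hops, directed):
    """Count paths with min_hops..max_hops relationships.

    As in Cypher, a relationship may not be reused within one path, but nodes
    may repeat. With directed=True, relationships are only followed from their
    start node to their end node (a Tests -> Disease hop, never the reverse).
    """
    adjacency = defaultdict(list)  # node -> [(relationship id, neighbour)]
    for rel_id, (start, end) in enumerate(edges):
        adjacency[start].append((rel_id, end))
        if not directed:
            adjacency[end].append((rel_id, start))

    count = 0

    def walk(node, used, hops):
        nonlocal count
        if hops >= min_hops:
            count += 1
        if hops == max_hops:
            return
        for rel_id, neighbour in adjacency[node]:
            if rel_id not in used:
                walk(neighbour, used | {rel_id}, hops + 1)

    all_nodes = {node for edge in edges for node in edge}
    for start in all_nodes:
        walk(start, frozenset(), 0)
    return count


print(count_paths(TEST_FOR, 2, 5, directed=True))   # like <-[*2..5]-, prints 0
print(count_paths(TEST_FOR, 2, 5, directed=False))  # like -[*2..5]-, prints 20
```

For example, `ALP` reaches `T2` through the shared disease `D2` only when direction is ignored. Like Cypher, the undirected count includes each path once from each end.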
Note that the reason your original query
MATCH p=()<-[r:TEST_FOR*..10]-() RETURN p LIMIT 50
was returning the full graph because it never traversed deeper than a single hop from each :Disease node (:Tests nodes have no incoming :TEST_FOR relationships to continue along). Since it started from every :Disease node, it naturally reached every single :Tests node as well. You would have gotten the same graph with this:
MATCH p=()<-[r:TEST_FOR]-() RETURN p LIMIT 50
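A similar hypothetical model illustrates why the `*..10` query returned exactly what the single-hop pattern returns. `directed_paths` (a hypothetical helper) lists every directed path of 1 to `max_hops` relationships; because no relationship ever continues out of a :Disease node, raising the bound from 1 to 10 adds nothing, yet every node is still covered.

```python
from collections import defaultdict

# Hypothetical sample (:Tests)-[:TEST_FOR]->(:Disease) pairs.
TEST_FOR = [("ALP", "D1"), ("ALP", "D2"), ("T2", "D2"), ("T2", "D3"), ("T3", "D3")]


def directed_paths(edges, max_hops):
    """Every directed path of 1..max_hops relationships, as a tuple of nodes.

    Relationship reuse is not checked: this Tests -> Disease graph has no
    directed cycles, so no path could reuse a relationship anyway.
    """
    outgoing = defaultdict(list)
    for start, end in edges:
        outgoing[start].append(end)

    found = set()

    def walk(path):
        if len(path) > 1:
            found.add(tuple(path))
        if len(path) - 1 == max_hops:
            return
        for neighbour in outgoing[path[-1]]:
            walk(path + [neighbour])

    for start in {node for edge in edges for node in edge}:
        walk([start])
    return found


one_hop = directed_paths(TEST_FOR, 1)     # like <-[r:TEST_FOR]-
up_to_ten = directed_paths(TEST_FOR, 10)  # like <-[r:TEST_FOR*..10]-
print(one_hop == up_to_ten)  # prints True
```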
Related
I have written the three queries below and am trying to understand the difference between them.
Query1:
MATCH (person)-[r]->(otherPerson)
Query2:
MATCH (person)-->(otherPerson)
Query3:
MATCH (person)--(otherPerson)
Please let me know if there is any difference between the three queries.
Queries 1 and 2 are basically the same: you are asking for all pairs of nodes connected by a relationship that starts at the person node and ends at the otherPerson node. In Query 1 you also bind the relationship to a variable, r, which allows you to return it. In Query 1 you could do:
MATCH (person)-[r]->(otherPerson) RETURN r
In Query 2 you could not return the relationship, because it is not bound to a variable.
Query 3 is similar to Query 2, except that the relationship may point in either direction: it may start at person and end at otherPerson, or start at otherPerson and end at person.
Queries 1 and 2 will find all nodes and bind each of them to the variable person. They then follow every outgoing relationship and bind the node at the other end to otherPerson. In Query 1 the relationship is also bound to the variable r.
Query 3 matches the same pattern, except that it traverses both incoming and outgoing relationships to find the otherPerson node.
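A small Python model (hypothetical data, not Neo4j code) makes the difference concrete. Queries 1 and 2 produce the same rows, one per relationship; Query 3 produces each relationship twice, once with each node in the person role. This sketch assumes there are no self-loops.

```python
# Hypothetical relationships, each a (start node, end node) pair.
RELS = [("Alice", "Bob"), ("Bob", "Carol")]


def match_directed(rels):
    """Rows for (person)-[r]->(otherPerson): one per relationship."""
    return [(start, rel_id, end) for rel_id, (start, end) in enumerate(rels)]


def match_undirected(rels):
    """Rows for (person)--(otherPerson): each relationship in both roles."""
    rows = []
    for rel_id, (start, end) in enumerate(rels):
        rows.append((start, rel_id, end))
        rows.append((end, rel_id, start))
    return rows


print(len(match_directed(RELS)))    # prints 2
print(len(match_undirected(RELS)))  # prints 4
```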
I am creating a simple graph DB for transportation between a few cities. My structure is:
Station = physical station
Stop = each station has several stops, depending on time and line ID
Ride = connection between stops
I need to find a route from city A to city C, but there is no direct stop connection between them; they are connected through city B. Please see the picture (as a new user I can't post images in the question).
How can I get the route from City A, going from STOP1 via RIDE1 to STOP2, then from STOP2 through the same City B to STOP3, and finally from STOP3 via RIDE2 to STOP4 (City C)?
Thank you.
UPDATE
The solution from Vince is OK, but I need to filter the STOP nodes on departure time, something like:
MATCH p=shortestPath((a:City {name:'A'})-[*{departuretime>xxx}]-(c:City {name:'C'})) RETURN p
Is it possible to do this without iterating over the whole collection of matches? That is too slow.
If you are simply looking for a single route between two nodes, this Cypher query will return the shortest path between two City nodes, A and C.
MATCH p=shortestPath((a:City {name:'A'})-[*]-(c:City {name:'C'})) RETURN p
In general if you have a lot of potential paths in your graph, you should limit the search depth appropriately:
MATCH p=shortestPath((a:City {name:'A'})-[*..4]-(c:City {name:'C'})) RETURN p
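As a rough illustration of what a bounded `shortestPath` does, here is a breadth-first search in Python over a hypothetical, undirected version of the route in the question. Like `[*]`, it ignores labels and relationship types and counts only hops; the hypothetical `max_hops` parameter plays the role of the `..4` bound. In this made-up graph the route needs 6 hops, so a bound of 4 finds nothing.

```python
from collections import deque

# Hypothetical undirected version of the route described in the question.
EDGES = [("A", "STOP1"), ("STOP1", "STOP2"), ("STOP2", "B"),
         ("B", "STOP3"), ("STOP3", "STOP4"), ("STOP4", "C")]


def undirected_adjacency(edges):
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def shortest_path(adjacency, start, goal, max_hops=None):
    """Fewest-hops path from start to goal (breadth-first search), or None.

    max_hops mimics an upper bound such as [*..4]: nodes at that depth are
    not expanded any further.
    """
    previous = {start: None}
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        if max_hops is not None and depth[node] == max_hops:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return None


graph = undirected_adjacency(EDGES)
print(shortest_path(graph, "A", "C"))              # the 6-hop route
print(shortest_path(graph, "A", "C", max_hops=4))  # prints None
```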
If you want to return all possible paths, you can omit the shortestPath function:
MATCH p=(a:City {name:'A'})-[*]-(c:City {name:'C'}) RETURN p
The same caveats apply. See the Neo4j documentation for full details.
Update
After your subsequent comment.
I'm not sure what the exact purpose of the time property is here, but it seems as if you actually want to find the shortest weighted path between two nodes, based on some minimum time cost. This is of course different from shortestPath, which minimises only the number of edges traversed, not the cost of those edges.
You'd normally model the traversal cost on edges rather than nodes, but your graph has time only on the STOP nodes (and not, for example, on the RIDE edges or the CITY nodes). To make a shortest weighted path query work here, we'd need to model time as a property on all nodes and edges. If you make this change, and set the value to 0 for all nodes and edges where it isn't relevant, then the following Cypher query does what I think you need.
MATCH p=(a:City {name: 'A'})-[*]-(c:City {name:'C'})
RETURN p AS shortestPath,
reduce(time=0, n in nodes(p) | time + n.time) AS m,
reduce(time=0, r in relationships(p) | time + r.time) as n
ORDER BY m + n ASC
LIMIT 1
In your example graph this produces a least cost path between A and C:
(A)->(STOP1)-(STOP2)->(B)->(STOP5)->(STOP6)->(C)
with a minimum time cost of 230.
This path includes two stops you have designated "bad", though I don't really understand why they're bad, because their traversal costs are less than other stops that are not "bad".
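The reduce-based query can be modelled in Python as an exhaustive search: enumerate every path from A to C (reusing no relationship), add up node and relationship times, and keep the cheapest. The graph and times below are hypothetical, chosen so that the fewest-hops route (A, S3, C) is not the cheapest one. Like the Cypher query, this tries every path, so its cost grows quickly on densely connected graphs.

```python
# Hypothetical times: on nodes (stops) and on relationships (rides).
NODE_TIME = {"A": 0, "S3": 100, "S4": 5, "S5": 5, "C": 0}
RELATIONSHIPS = [("A", "S3", 0), ("S3", "C", 0),
                 ("A", "S4", 0), ("S4", "S5", 20), ("S5", "C", 0)]


def least_cost_path(node_time, relationships, start, goal):
    """Cheapest path by summed node and relationship time, trying all paths.

    Paths never reuse a relationship. A path stops growing once it reaches
    the goal; with non-negative times, going on and coming back cannot be cheaper.
    """
    adjacency = {}
    for rel_id, (a, b, t) in enumerate(relationships):
        adjacency.setdefault(a, []).append((rel_id, b, t))
        adjacency.setdefault(b, []).append((rel_id, a, t))

    best = None  # (total time, path)

    def walk(node, path, used, cost):
        nonlocal best
        if node == goal:
            if best is None or cost < best[0]:
                best = (cost, path)
            return
        for rel_id, neighbour, t in adjacency.get(node, ()):
            if rel_id not in used:
                walk(neighbour, path + [neighbour], used | {rel_id},
                     cost + t + node_time[neighbour])

    walk(start, [start], frozenset(), node_time[start])
    return best


print(least_cost_path(NODE_TIME, RELATIONSHIPS, "A", "C"))
# prints (30, ['A', 'S4', 'S5', 'C'])
```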
Or, use Dijkstra
This simple Cypher query will probably not be performant on densely connected graphs. If you find that performance is a problem, you should use the REST API and the path endpoint of your source node, and request a shortest weighted path to the target node using Dijkstra's algorithm. See the Neo4j REST API documentation for details.
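For comparison, here is a local Dijkstra sketch in Python using `heapq`; it is not the Neo4j REST endpoint, only the same algorithm. Because time lives on nodes as well as relationships, this sketch assumes the cost of a step is the relationship time plus the time of the node being entered. On the hypothetical data below it picks the three-hop route through S4 and S5 rather than the two-hop route through the costly S3, without enumerating every path.

```python
import heapq

# Hypothetical times: on nodes (stops) and on relationships (rides).
NODE_TIME = {"A": 0, "S3": 100, "S4": 5, "S5": 5, "C": 0}
RELATIONSHIPS = [("A", "S3", 0), ("S3", "C", 0),
                 ("A", "S4", 0), ("S4", "S5", 20), ("S5", "C", 0)]


def dijkstra(node_time, relationships, start, goal):
    """Least total time from start to goal, or None if goal is unreachable.

    The cost of a step is the relationship time plus the time of the node
    being entered; the start node's own time is counted up front.
    """
    adjacency = {}
    for a, b, t in relationships:
        adjacency.setdefault(a, []).append((b, t))
        adjacency.setdefault(b, []).append((a, t))

    best = {start: node_time[start]}
    previous = {start: None}
    heap = [(node_time[start], start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return cost, path[::-1]
        if cost > best[node]:
            continue  # stale heap entry: a cheaper route was found later
        for neighbour, t in adjacency.get(node, ()):
            new_cost = cost + t + node_time[neighbour]
            if neighbour not in best or new_cost < best[neighbour]:
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return None


print(dijkstra(NODE_TIME, RELATIONSHIPS, "A", "C"))
# prints (30, ['A', 'S4', 'S5', 'C'])
```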
Ah ok, if the requirement is to find paths through the graph where the departure time at every stop is no earlier than the departure time of the previous stop, this should work:
MATCH p=(:City {name:'A'})-[*]-(:City {name:'C'})
WITH p, [n IN nodes(p) WHERE n:Stop] AS stops
UNWIND stops AS s
WITH p, stops, s ORDER BY s.time
WITH p, stops, collect(s) AS sortedStops
WHERE stops = sortedStops
WITH p, last(stops).time - head(stops).time AS elapsed
RETURN p, elapsed ORDER BY elapsed ASC
This query works by matching every possible path and then collecting the stops on each matched path twice: once in the order in which they appear on the path, and once sorted by departure time. Only if the two collections are equal (i.e. in number and order) is the path admitted to the results; this step evicts invalid routes. Finally, the paths themselves are ordered by least elapsed time between the first and last stop, so the quickest route is first in the list.
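The filtering idea can be sketched in Python with hypothetical departure times (minutes after midnight, an assumption) and hypothetical candidate paths: keep the stops of each path in path order, compare that list with the same stops sorted by time, and rank the surviving paths by the elapsed time between the first and last stop.

```python
# Hypothetical departure times for Stop nodes only; cities have no entry,
# so they are skipped like non-Stop nodes.
STOP_TIME = {"STOP1": 480, "STOP2": 540, "STOP3": 600, "STOP4": 660,
             "STOP5": 500, "STOP6": 700, "STOP7": 550, "STOP8": 590}

# Hypothetical candidate paths from A to C, as lists of node names.
PATHS = [
    ["A", "STOP1", "STOP2", "B", "STOP3", "STOP4", "C"],
    ["A", "STOP1", "STOP2", "B", "STOP5", "STOP6", "C"],
    ["A", "STOP7", "STOP8", "B", "STOP3", "STOP4", "C"],
]


def valid_routes(paths, stop_time):
    """(elapsed, path) for paths whose stops depart in time order, quickest first."""
    results = []
    for path in paths:
        stops = [node for node in path if node in stop_time]       # path order
        by_time = sorted(stops, key=lambda node: stop_time[node])  # departure order
        if stops and stops == by_time:  # same stops in the same order: valid route
            elapsed = stop_time[stops[-1]] - stop_time[stops[0]]
            results.append((elapsed, path))
    results.sort(key=lambda item: item[0])
    return results


for elapsed, path in valid_routes(PATHS, STOP_TIME):
    print(elapsed, path)
```

The second path is rejected because STOP5 departs before STOP2.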
Normal warnings about performance, etc. apply :)
I have a dataset that looks like this: (Artefact)-[:HAS]->(Keyword), where keywords can be shared by multiple artefacts. What I am trying to achieve is:
Returning the most interconnected keyword nodes, the count of artefacts related to each keyword, and the overlap between keyword nodes via the hop to another keyword, (keyword)-(artefact)-(keyword), i.e. the "shared" artefact count between two keywords.
In other words, a count of the artefact records in the intersection of two keyword nodes. For example, given these three artefact nodes:
1) spoon (keywords; metal, food)
2) sword (keywords; metal, fighting)
3) fork (keywords; metal, food)
The query would therefore return the keyword node (e.g. metal), the count of artefacts related to that keyword (3: spoon, sword and fork), and the count of artefacts it shares with each other keyword (metal has 2 indirect connections to food and 1 to fighting).
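To pin down the numbers being asked for, here is a small Python computation over the three example artefacts (plain in-memory data, not Cypher). It counts artefacts per keyword and, for every pair of keywords, the artefacts they share.

```python
from collections import Counter
from itertools import combinations

# The three example artefacts from the question and their keywords.
ARTEFACTS = {
    "spoon": {"metal", "food"},
    "sword": {"metal", "fighting"},
    "fork": {"metal", "food"},
}


def keyword_stats(artefacts):
    """Artefacts per keyword, and artefacts shared by each pair of keywords."""
    per_keyword = Counter()
    shared = Counter()
    for keywords in artefacts.values():
        per_keyword.update(keywords)
        # sorted() gives each unordered pair one canonical (a, b) key
        for a, b in combinations(sorted(keywords), 2):
            shared[(a, b)] += 1
    return per_keyword, shared


per_keyword, shared = keyword_stats(ARTEFACTS)
print(per_keyword["metal"])           # prints 3
print(shared[("food", "metal")])      # prints 2
print(shared[("fighting", "metal")])  # prints 1
```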
Once I've worked that out, for the sake of speed (I realise this is a big query), I want to create a related_to relationship between keywords holding the number of artefacts they share. I only select 1 record to create this relationship, to test that it works :) (hence the LIMIT 1)
MATCH (n:Keyword)-[r*2]-(x:Keyword)
WITH n, COUNT(r) AS c, x
LIMIT 1
MERGE (n)-[s:RELATED_KEY]-(x) SET s.weight = c
I'm using Neo4j Community Edition (2.1.6).
Many thanks, Andy
This query will return the first part of your answer:
MATCH (k:Keyword)
WITH k
LIMIT 1
MATCH (k)<-[:HAS]-(a)
WITH k, collect(a) as artefacts
WITH k, artefacts, size(artefacts) as c
UNWIND artefacts as artefact
MATCH (k)<-[:HAS]-(artefact)-[:HAS]->(k2)
RETURN c, artefacts, collect(distinct(k2.name)) as keywords, count(distinct(k2.name)) as keyWordsCount
However, I guess you could create the relationships between the related nodes directly:
MATCH (k:Keyword)
WITH k
LIMIT 1
MATCH (k)<-[:HAS]-(a)-[:HAS]->(other)
MERGE (k)-[r:RELATED_TO]->(other)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = r.weight + 1
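The MERGE / ON CREATE / ON MATCH pattern builds the shared-artefact count incrementally: each matched (keyword, artefact, other keyword) row either creates the relationship with weight 1 or adds 1 to it. Below is a Python model of that behaviour for a single keyword, using the question's example artefacts and assuming at most one :HAS relationship between an artefact and a keyword (so `other` is never the keyword itself).

```python
# The question's example: artefact -> keywords it HAS.
ARTEFACTS = {
    "spoon": {"metal", "food"},
    "sword": {"metal", "fighting"},
    "fork": {"metal", "food"},
}


def merge_related_to(artefacts, keyword):
    """Model MATCH (k)<-[:HAS]-(a)-[:HAS]->(other) MERGE (k)-[r:RELATED_TO]->(other)."""
    weights = {}  # (k, other) -> r.weight
    for keywords in artefacts.values():
        if keyword not in keywords:
            continue  # this artefact does not match (k)<-[:HAS]-(a)
        for other in keywords - {keyword}:
            pair = (keyword, other)
            if pair not in weights:
                weights[pair] = 1   # ON CREATE SET r.weight = 1
            else:
                weights[pair] += 1  # ON MATCH SET r.weight = r.weight + 1
    return weights


print(merge_related_to(ARTEFACTS, "metal"))
```

The resulting weight equals the number of artefacts the two keywords share.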
I just imported the English Wikipedia into Neo4j and am playing around. I started by looking up the pages that link to the page "Berlin":
MATCH p=(p1:Page {title:"Berlin"})<-[*1..1]-(otherPage)
WITH nodes(p) as neighbors
LIMIT 500
RETURN DISTINCT neighbors
That works quite well. What I would like to achieve next is to show the 2nd degree of relationships. In order to be able to display them correctly, I would like to limit the number of first degree relationship nodes to 20 and then query the next level of relationship.
How does one achieve that?
I don't know the Wikipedia model, but I'm assuming that there are many different relationship types and that is why you used -[*1..1]-. I think that is analogous to -[]- or even --, so I doubt it has any serious impact.
You can collect the first-level matches and limit them to 20 using a WITH with a LIMIT. You can then perform a second match using those (at most 20) other pages as the start points.
MATCH (p1:Page {title:"Berlin"})<-[*1..1]-(otherPage:Page)
WITH p1, otherPage
LIMIT 20
MATCH (otherPage)<-[*1..1]-(secondDegree:Page)
WHERE secondDegree <> p1
WITH otherPage, secondDegree
LIMIT 500
RETURN otherPage, COLLECT(secondDegree)
There are many ways to return the data; this one returns each first-degree match with a list of its second-degree matches.
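The two-step shape of this query (limit the first degree, then expand only those pages) can be modelled in Python on hypothetical link data. One difference to keep in mind: Cypher's LIMIT without ORDER BY keeps an arbitrary subset, while this sketch simply keeps the first items of each list.

```python
# Hypothetical link data: page -> pages that link to it (incoming links).
INCOMING = {
    "Berlin": [f"P{i}" for i in range(30)],
    "P0": ["Berlin", "X", "Y"],
    "P25": ["Z"],
}


def two_degree_neighbourhood(incoming, start, first_limit=20, total_limit=500):
    """Limit the first-degree pages, then collect their own incoming pages."""
    first_degree = incoming.get(start, [])[:first_limit]  # WITH p1, otherPage LIMIT 20
    rows = []
    for page in first_degree:
        for second in incoming.get(page, []):
            if second != start:                            # WHERE secondDegree <> p1
                rows.append((page, second))
    rows = rows[:total_limit]                              # WITH ... LIMIT 500
    grouped = {}
    for page, second in rows:                              # RETURN otherPage, COLLECT(secondDegree)
        grouped.setdefault(page, []).append(second)
    return grouped


print(two_degree_neighbourhood(INCOMING, "Berlin"))  # prints {'P0': ['X', 'Y']}
```

P25 never appears because it falls outside the first 20 pages, and first-degree pages with no second-degree matches drop out, as they do with a plain MATCH.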
If the only type of relationship is :Link and you want to keep the start node then you can change the query to this:
MATCH (p1:Page {title:"Berlin"})<-[:Link]-(otherPage:Page)
WITH p1, otherPage
LIMIT 20
MATCH (otherPage)<-[:Link]-(secondDegree:Page)
WHERE secondDegree <> p1
WITH p1, otherPage, secondDegree
LIMIT 500
RETURN p1, otherPage, COLLECT(secondDegree)
I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction or the type of the relationships.
Example: I have three nodes in the graph database, person->Purchase->Product.
I need to get the path connecting these three nodes, but I do not know the order in which to query them. For example, if I write the query as person-Product-Purchase, it returns no rows because the order is incorrect.
So in this case how should I frame the query?
In a nutshell, I need to find the path between more than two nodes, where the nodes in the MATCH clause may be given in whatever order the user knows.
You could list all of the nodes in multiple bound identifiers in the START clause, and then your MATCH would find the ones that match, in any order. You could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
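The idea behind binding a, b and c to the same candidate set can be modelled in Python: try every assignment of the candidates to a, b and c, and keep only those where a-->b and b-->c both exist. The relationships and the candidate order below are hypothetical.

```python
from itertools import product

# Hypothetical directed relationships between the three nodes.
RELATIONSHIPS = {("person", "purchase"), ("purchase", "product")}

# The user may list the nodes in any order.
CANDIDATES = ["product", "person", "purchase"]


def chains(candidates, relationships):
    """Every (a, b, c) drawn from the candidates with a-->b and b-->c."""
    return [
        (a, b, c)
        for a, b, c in product(candidates, repeat=3)
        if (a, b) in relationships and (b, c) in relationships
    ]


print(chains(CANDIDATES, RELATIONSHIPS))  # prints [('person', 'purchase', 'product')]
```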
Wouldn't it be acceptable to make several queries? In your case you'd automatically generate 6 queries, one for each possible ordering (the number of orderings is the factorial of the number of nodes, 3! = 6).
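Generating those queries is mechanical. The Python sketch below builds one undirected pattern per ordering, assuming (hypothetically) that each node is identified by a `name` property and that names contain no quote characters.

```python
from itertools import permutations


def ordered_queries(names):
    """One Cypher query string per ordering of the node names.

    `--` ignores relationship direction and type.
    """
    queries = []
    for order in permutations(names):
        pattern = "--".join(f"({{name:'{name}'}})" for name in order)
        queries.append(f"MATCH p={pattern} RETURN p")
    return queries


for query in ordered_queries(["person", "purchase", "product"]):
    print(query)
```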
A possible solution would be to first get three sets of nodes (s, m, e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because the starting, middle and end nodes are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional WHERE conditions make sure the same node is not bound to more than one of s, m and e in a result row.
The relationships between s and m, and between m and e, are matched as one or more hops (*1..), disregarding direction.
Note that Cypher 3 syntax is used here.
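A Python model of this query, on a hypothetical chain Neo - Trinity - Oracle - Cypher (Trinity is an invented intermediate node), shows that it finds the right middle node whatever order the names are listed in. It enumerates paths rather than checking plain reachability, because within one MATCH all relationships in r1 and r2 must be distinct, which rules out orderings such as (Oracle, Neo, Cypher) that would have to walk back over the same relationship.

```python
from itertools import permutations

# Hypothetical undirected relationships: Neo - Trinity - Oracle - Cypher.
RELATIONSHIPS = [("Neo", "Trinity"), ("Trinity", "Oracle"), ("Oracle", "Cypher")]
NAMES = ["Oracle", "Neo", "Cypher"]


def trails(adjacency, node, used):
    """Yield (end node, relationships used) for every path of 1+ hops from node.

    Like a single Cypher MATCH, no relationship is used twice; nodes may repeat.
    """
    for rel_id, neighbour in adjacency.get(node, ()):
        if rel_id not in used:
            now_used = used | {rel_id}
            yield neighbour, now_used
            yield from trails(adjacency, neighbour, now_used)


def matching_triples(relationships, names):
    """(s, m, e) of distinct names matching (s)-[*1..]-(m)-[*1..]-(e)."""
    adjacency = {}
    for rel_id, (a, b) in enumerate(relationships):
        adjacency.setdefault(a, []).append((rel_id, b))
        adjacency.setdefault(b, []).append((rel_id, a))

    found = set()
    for s, m, e in permutations(names, 3):  # s <> m, s <> e, m <> e
        for end, used in trails(adjacency, s, frozenset()):
            # r2 may not reuse any relationship already used by r1
            if end == m and any(end2 == e for end2, _ in trails(adjacency, m, used)):
                found.add((s, m, e))
                break
    return found


print(sorted(matching_triples(RELATIONSHIPS, NAMES)))
# prints [('Cypher', 'Oracle', 'Neo'), ('Neo', 'Oracle', 'Cypher')]
```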