I am new to Neo4j and am trying to extract a subgraph from the whole graph according to certain rules. However, my current output is not what I want.
Suppose I have four nodes on the graph, which are A, B, C, D, and they are connected as:
A -- B
B -- C
C -- D
I want to obtain a subgraph (or rather two traces) consisting of four nodes and two edges:
A -- B
C -- D
However, when I run a Cypher query through the Neo4j web interface, I always get the whole graph, that is, a graph with four nodes and three edges.
The Cypher query is something like this:
MATCH (n)-[r]-(m) WHERE n.id = "ID_A" AND m.id = "ID_B"
RETURN n, r, m
UNION
MATCH (n)-[r]-(m) WHERE n.id = "ID_C" AND m.id = "ID_D"
RETURN n, r, m
To be more specific, I expect this query to give a subgraph with two traces, but all three edges connecting these four nodes are shown in the output.
Am I clear? Could anyone give me some help on how to produce the subgraph? Thank you!
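It looks like the browser's auto-complete option is switched on: it also draws relationships that exist between the returned nodes, even when the query did not return them. Disable it in the browser settings.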
[ http://neo4j.com/developer/guide-neo4j-browser/ ]
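To see why the extra B -- C edge shows up, here is a small Python sketch of the effect. It is only a toy model of the auto-completion option described above, not the browser's actual implementation; the names ALL_EDGES, matched_edges and displayed_edges are made up for illustration.

```python
# Toy model of the graph from the question: A -- B -- C -- D.
ALL_EDGES = {("A", "B"), ("B", "C"), ("C", "D")}


def matched_edges(pairs):
    """Edges that the two MATCH ... UNION branches actually return."""
    return {e for e in ALL_EDGES if e in pairs or (e[1], e[0]) in pairs}


def displayed_edges(edges, connect_result_nodes):
    """Edges drawn for a result set.

    With the option on, every stored edge whose two endpoints are both in
    the result is drawn, whether or not the query returned it.
    """
    if not connect_result_nodes:
        return set(edges)
    nodes = {n for e in edges for n in e}
    return {e for e in ALL_EDGES if e[0] in nodes and e[1] in nodes}


result = matched_edges({("A", "B"), ("C", "D")})
print(sorted(displayed_edges(result, connect_result_nodes=True)))
print(sorted(displayed_edges(result, connect_result_nodes=False)))
```

With the option on, the sketch draws all three edges because B and C are both among the returned nodes; with it off, only A -- B and C -- D remain.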
Related
When doing a Cypher query to retrieve a specific subgraph with automorphisms, let's say
MATCH (a)-[:X]-(b)-[:X]-(c)
RETURN a, b, c
It seems that the default behaviour is to return every retrieved subgraph and all their automorphisms.
In that example, if (u)-[:X]-(v)-[:X]-(w) is a subgraph matching the pattern, the output will contain u, v, w but also w, v, u, which describe the same subgraph.
Is there a way to retrieve each subgraph only once?
EDIT: It would be great if Cypher had a feature to do that during the search, using some kind of symmetry-breaking condition, as it would reduce the computing time. If that is not the case, how would you post-process the results to get the desired output?
In your query, (a)-[r:X]-(b) and (a)-[t:X]-(c) describe the same pattern, since (b) and (c) can be interchanged. Why match it twice? MATCH (a)-[r:X]-(b) RETURN a, r, b returns all the subgraphs you are looking for.
EDIT
You can do something like the following to find the nodes that have exactly two relationships of type X:
MATCH (a)-[r:X]-(b) WHERE size((a)-[:X]-()) = 2 RETURN a, r, b
For this kind of mirrored pattern, we can add a restriction on the internal graph ids so that only one of the two paths is kept:
MATCH (a)-[:X]-(b)-[:X]-(c)
WHERE id(a) < id(c)
RETURN a, b, c
This will also prevent the case where a = c.
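The effect of the id comparison can be checked on a tiny example. The following Python sketch is an illustration only: it uses plain integers as stand-ins for internal ids, and EDGES, neighbours and two_hop_paths are hypothetical names, not part of any Neo4j API.

```python
# Undirected toy graph u -- v -- w, with integers standing in for id(n).
EDGES = [(1, 2), (2, 3)]


def neighbours(node):
    """All nodes connected to `node`, ignoring direction."""
    out = set()
    for x, y in EDGES:
        if x == node:
            out.add(y)
        if y == node:
            out.add(x)
    return out


def two_hop_paths(break_symmetry):
    """Enumerate (a, b, c) rows for the pattern (a)--(b)--(c)."""
    rows = []
    for b in sorted({n for e in EDGES for n in e}):
        for a in sorted(neighbours(b)):
            for c in sorted(neighbours(b)):
                if a == c:
                    # No parallel edges here, so a == c would reuse the same
                    # relationship, which one Cypher pattern never does.
                    continue
                if break_symmetry and not a < c:
                    continue  # the id(a) < id(c) restriction
                rows.append((a, b, c))
    return rows


print(two_hop_paths(break_symmetry=False))
print(two_hop_paths(break_symmetry=True))
```

Without the restriction the sketch yields both (1, 2, 3) and its mirror image (3, 2, 1); with it, only (1, 2, 3) is kept.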
In a graph where the following nodes
A,B,C,D
have a relationship with each node's successor
(A->B)
and
(B->C)
etc.
How do I write a query that starts at A and gives me all nodes (and relationships) reachable from it?
I do not know the end node (C).
All I know is to start from A and traverse the whole connected graph (with conditions on relationship and node type).
I think you need to use this pattern:
(n)-[*]->(m) - variable length path of any number of relationships from n to m. (see Refcard)
A sample query would be:
MATCH path = (a:A)-[*]->()
RETURN path
Also have a look at the path functions in the refcard to extend your Cypher query (I don't know which exact conditions you'll need to apply).
To get all the nodes / relationships starting at a node:
MATCH (a:A {id: "id"})-[r*]-(b)
RETURN a, r, b
This will return all paths starting at the node with label A where id = "id".
One caveat: if the graph is large, the query will take a long time to run.
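For intuition about what such a traversal collects, here is a small breadth-first sketch in Python. It is an assumption-laden illustration, not how Neo4j evaluates the pattern: it follows outgoing edges only (like the directed -[*]-> form), visits each edge once, and the names EDGES and reachable_from are invented for the example. A variable-length pattern, by contrast, produces one row per matching path, which is one reason it can become slow on large graphs.

```python
from collections import deque

# Directed toy graph from the question, A -> B -> C -> D,
# plus an unrelated X -> Y that must not be reached from A.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]


def reachable_from(start):
    """Return (nodes, edges) reachable from `start` along outgoing edges."""
    seen_nodes = [start]
    seen_edges = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, dst in EDGES:
            if src == node and (src, dst) not in seen_edges:
                seen_edges.append((src, dst))
                if dst not in seen_nodes:
                    seen_nodes.append(dst)
                    queue.append(dst)
    return seen_nodes, seen_edges


nodes, edges = reachable_from("A")
print(nodes)
print(edges)
```

Conditions on node or relationship type would go into the inner `if`, which corresponds to adding labels, relationship types or a WHERE clause to the Cypher pattern.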
I have a large number of nodes that have outgoing relationships to an even larger number of nodes. I want to be able to query for a limited number of starting nodes and return the related nodes with them, but the number of related nodes should also be limited.
Is this possible in neo4j 1.9?
For example create these nodes and have an auto index on name:
CREATE p = (bar{company:'Bar1'})<-[:FREQUENTS]-(andres {name:'Andres'})-[:WORKS_AT]->(neo{company:'Neo1'})
WITH andres
CREATE (restaurant{company:'Restaurant1'})<-[:FREQUENTS]-(andres)-[:WORKS_AT]->(lib{company:'Library'}) ;
CREATE p = (bar{company:'Bar2'})<-[:FREQUENTS]-(todd {name:'Todd'})-[:WORKS_AT]->(neo{company:'Neo2'})
WITH todd
CREATE (restaurant{company:'Restaurant2'})<-[:FREQUENTS]-(todd)-[:WORKS_AT]->(lib{company:'Library2'}) ;
CREATE p = (bar{company:'Bar3'})<-[:FREQUENTS]-(hank {name:'Hank'})-[:WORKS_AT]->(neo{company:'Neo3'})
WITH hank
CREATE (restaurant{company:'Restaurant3'})<-[:FREQUENTS]-(hank)-[:WORKS_AT]->(lib{company:'Library3'}) ;
What I would like is something like:
START p=node:node_auto_index('*:*')
MATCH p-[:WORKS_AT]-> c, p-[:FREQUENTS]-> f
RETURN p, collect(distinct c.company), collect(distinct f.company) LIMIT 2;
This should return 2 rows with each collection limited to one element, but without applying a function to the collections: I tried that on a large data set and it became extremely slow. So I need some way to LIMIT the matches.
If this is not possible in neo4j 1.9, would there be a solution in neo4j 2.0?
Can you try something like this:
START p=node:node_auto_index('*:*')
RETURN p,
head(extract(path in p-[:WORKS_AT]->() : head(tail(nodes(path))))) as work_company,
head(extract(path in p-[:FREQUENTS]->() : head(tail(nodes(path))))) as visit_company
The head function on the extracted path nodes should be lazy, so it pulls only the first match from the pattern.
If you look at the profiling output, you should see that it touches only the first node for each pattern.
It could be that the *:* query triggers some very large operations in the indexing layer, rather than being lazy. I would try something like this:
START p=node:node_auto_index('*:*')
WITH p LIMIT 2
MATCH p-[:WORKS_AT]-> c, p-[:FREQUENTS]-> f
RETURN p, collect(distinct c.company), collect(distinct f.company)
I would like to query for the following subgraph in my Neo4J database:
(a)-->(b)-->(c)-->(d)
       |
       |-->(e)
Note: a, b, c, d, e are attribute values (not unique) of the nodes. There are thousands of nodes with similar attribute values (a to e), but they are randomly connected to one another.
How can I write a Cypher query that finds this particular subgraph (akin to the subgraph isomorphism problem) and returns (a)? I've tried the following Cypher query, but other subgraphs show up:
START n1=node:SomeIndex(AttrVal="a")
MATCH n1-[]->n2-[]->n3-[]->n4
WHERE n2.AttrVal="b" AND n3.AttrVal="c" and n4.AttrVal="d"
WITH n1, n2
MATCH n2-[]->n5
WHERE n5.AttrVal="e"
RETURN n1
Am I using the WITH and 2nd MATCH clause wrongly?
Thanks!
You can use the comma to combine multiple paths in a single match clause:
START n1=node:SomeIndex(AttrVal="a")
MATCH n1-[]->n2-[]->n3-[]->n4, n2-[]->n5
WHERE n2.AttrVal="b" AND n3.AttrVal="c" AND n4.AttrVal="d" AND n5.AttrVal="e"
RETURN n1
Side note 1:
you can also refactor the statement like this:
START n1=node:SomeIndex(AttrVal="a"), n2=node:SomeIndex(AttrVal="b"),
n3=node:SomeIndex(AttrVal="c"), n4=node:SomeIndex(AttrVal="d"),
n5=node:SomeIndex(AttrVal="e")
MATCH n1-[]->n2-[]->n3-[]->n4, n2-[]->n5
RETURN n1
Depending on the structure of your graph the second might be faster.
Side note 2:
When matching an arbitrary relationship type as you did in n1-[]->n2 you can use a shorter and more readable notation: n1-->n2
I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example: the graph database has three nodes, person->Purchase->Product.
I need to get the path connecting these three nodes, but I do not know the order in which to query them. For example, if I write the query as person-Product-Purchase, it returns no rows because the order is incorrect.
So in this case how should I frame the query?
In a nutshell, I need to find the path between more than two nodes, where the nodes in the match clause may be given in whatever order the user knows.
You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
Wouldn't it be acceptable to make several queries? In your case you'd automatically generate 6 queries covering all possible orderings (the factorial of the number of variables, here 3! = 6).
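A minimal sketch of generating those queries in Python could look like this. It assumes the nodes are identified by a name property (as in the index lookup in the other answer) and that an undirected pattern is acceptable; the function name ordered_path_queries is made up, and the strings are built by plain interpolation for readability only, so real code should pass the names as query parameters instead.

```python
from itertools import permutations


def ordered_path_queries(names):
    """Build one Cypher query string per ordering of the given node names."""
    queries = []
    for order in permutations(names):
        # e.g. (n0 {name: 'person'})--(n1 {name: 'purchase'})--(n2 {name: 'product'})
        pattern = "--".join(
            f"(n{i} {{name: '{name}'}})" for i, name in enumerate(order)
        )
        queries.append(f"MATCH p = {pattern} RETURN p")
    return queries


for query in ordered_path_queries(["person", "purchase", "product"]):
    print(query)
```

Each query can then be run in turn, keeping the ones that return rows. The number of queries grows factorially, so this is only practical for a handful of nodes.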
A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where conditions make sure the same node is not used twice in one result row.
The relationships are matched with one or more hops (*1..) between the nodes s and m, and between m and e, disregarding direction.
Note that Cypher 3 syntax is used here.