I've been playing around with Neo4j and have a problem for which I do not have a solution, hence my question here.
For my particular problem I'll describe a simplified version that captures the essence. Suppose I have a graph of locations that are connected either directly or via a detour:
direct: (A)-[:GOES_TO]->(B)
indirect: (A)-[:GOES_THROUGH]->(C)-[:COMES_BACK_TO]->(B)
If I want to have everything between "Go" and the "Finish" with a GOES_TO relationship I can easily use the Cypher query:
START a=node:NODE_IDX(Id = "Go"), b=node:NODE_IDX(Id = "Finish")
MATCH a-[r:GOES_TO*]->b
RETURN a,r,b
Here, NODE_IDX is an index on the nodes (Id).
Where I get stuck is when I want to have all the paths between "Go" and "Finish" that are not GOES_TO relationships but rather multiple GOES_THROUGH-->()-->COMES_BACK_TO relationship combinations (of variable depth).
I do not want to filter out the GOES_TO relationships, because there are many more relationship types among the nodes and I do not want to have to exclude all of them (dynamically). Is it possible to write a variable-depth, multi-relationship MATCH like the one I envisage?
Thanks!
Let me restate what I believe is being asked.
"If there is a path of the form (a)-[:X]->(b), find all other paths from a to b."
The answer is simple:
MATCH p=(a)-[:X]->(b), q=(a)-[r*]->(b)
WHERE p<>q
RETURN r;
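For the original GOES_THROUGH / COMES_BACK_TO case, one way this might be written is with relationship-type alternatives in a variable-length pattern, plus a filter that enforces the alternation. This is an untested sketch in newer Cypher syntax. The Location label is an assumption (the question uses a legacy index instead); Id is the property from the question.

```cypher
// Assumption: nodes carry a hypothetical :Location label and the Id property from the question.
MATCH p = (a:Location {Id: 'Go'})-[:GOES_THROUGH|COMES_BACK_TO*]->(b:Location {Id: 'Finish'})
// Keep only paths that strictly alternate GOES_THROUGH, COMES_BACK_TO, GOES_THROUGH, ...
WHERE length(p) % 2 = 0
  AND all(i IN range(0, length(p) - 1)
          WHERE type(relationships(p)[i]) =
                CASE i % 2 WHEN 0 THEN 'GOES_THROUGH' ELSE 'COMES_BACK_TO' END)
RETURN p
```

Because only the two named types are expanded, the other relationship types never need to be excluded explicitly. Without the WHERE clause the pattern would also accept any mix of the two types.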
My website has pages dedicated to some events, represented by nodes in neo4j. Those events possess sub-events which are relationships under neo4j, and which correspond to links on the source page to the target page. I currently have a search engine that highlights the links to the searched events, but it is flawed by cycles in the data model. Indeed it is highlighting all the links that contain cyclic references to the current page if this page contains any link to the searched event.
The aim is therefore to have a query that is able to flag the nodes and relationships which are related to the searched events, without flagging a path only because of cyclic relationships.
I've created a small dataset representative of the issue that you can build using this query:
CREATE
(r:Event:Searched {name:'R', tag:1}),
(d:Event:Searched {name:'D', tag:1}),
(o:Event {name:'O'}),
(a:Event {name:'A'}),
(b:Event {name:'B'}),
(c:Event {name:'C'}),
(e:Event {name:'E'}),
(o)-[:hasEvent]->(a),
(o)-[:hasEvent]->(e),
(o)-[:hasEvent]->(r),
(o)-[:hasEvent]->(c),
(a)-[:hasEvent]->(b),
(b)-[:hasEvent]->(o),
(c)-[:hasEvent]->(d)
Which produces the following graph:
My aim is to have a query that only fetches nodes C and O, but not A or B, since the only reason A and B are flagged is that O is already flagged.
My current query that I need to fix is the following:
MATCH path=(upper:Event)-[:hasEvent*]->(source:Event:Searched)
RETURN upper
I hope you can help me, I couldn't manage to make similar questions' answers work on my specific case.
Ideally, the solution shouldn't be too computationally intensive, as my real model is quite big (2,300,000 nodes and 9,500,000 relationships), and the current indexing in the search engine is already quite slow.
Thank you in advance for your help
You could try the apoc.path.expandConfig procedure. It has a uniqueness property, which can be configured so that a node is not traversed more than once.
MATCH (n:Event)
CALL apoc.path.expandConfig(n, {
relationshipFilter: "hasEvent>",
labelFilter: "/Searched",
uniqueness: "NODE_GLOBAL"
}) YIELD path
RETURN [n IN nodes(path) WHERE NOT n:Searched] AS upper
However, this query will still return A and B, because with MATCH (n:Event) you start looking from every node (regardless of the relationship between O and A). But if I understood you correctly, you don't want to start from all nodes, but from a specific one ("pages dedicated to some events"). So you might want to start with, e.g., MATCH (n:Event {name: "O"}), which will return only O and C.
I am trying to find out, using Cypher, how many hops it takes to reach a friend. I have these relationships:
(Gutierrez)-[:Conhece]->(Felipe),
(Felipe)-[:Conhece]->(Gutierrez),
(Felipe)-[:Conhece]->(Fernando),
(Fernando)-[:Conhece]->(Felipe),
(Fernando)-[:Conhece]->(Pedro),
(Pedro)-[:Conhece]->(Fernando),
(Pedro)-[:Conhece]->(Arthur),
(Arthur)-[:Conhece]->(Pedro),
(Arthur)-[:Conhece]->(Vitor),
(Vitor)-[:Conhece]->(Arthur),
and when I execute my query it shows Fernando. What I want is to show only Vitor, Pedro and Arthur.
MATCH (n:Leitor)-[r:Conhece]-(m)
WHERE n.nome='Pedro' OR m.nome='Vitor'
RETURN n,r,m
with my Bacon Path query ->
Ok, I'm adding another answer because I think I understand what you want and it's different than my other answer.
If you want to find everybody that both Pedro and Vitor have met (in this case, just Arthur):
MATCH (pedro:Leitor)-[:Conhece]-(in_common:Leitor)-[:Conhece]-(vitor:Leitor)
WHERE pedro.nome='Pedro' AND vitor.nome='Vitor'
RETURN in_common
That also might look a bit better like this:
MATCH (pedro:Leitor {nome: 'Pedro'})-[:Conhece]-(in_common:Leitor)-[:Conhece]-(vitor:Leitor {nome: 'Vitor'})
RETURN in_common
I also notice from your screenshots that every meeting has two relationships. That may very well be what you want, but if your plan was to always have two relationships whenever two people meet then you can get away with just one relationship. When you query bidirectionally (that is, without specifying direction like in the queries above) then you'll get relationships in either direction.
Normally you only want relationships going in both directions if the direction is important. That could be because you're just recording that it goes from one node to another, or it could be because you're storing different values on the relationships.
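As a rough, untested sketch of the single-relationship approach: MERGE with an undirected pattern matches an existing relationship in either direction and only creates one if none exists. The label and property names are taken from the question, and the two people are just examples.

```cypher
// Store one Conhece relationship per pair of people.
// MERGE without a direction matches either direction and creates the
// relationship only if none exists yet.
MATCH (a:Leitor {nome: 'Pedro'}), (b:Leitor {nome: 'Arthur'})
MERGE (a)-[:Conhece]-(b);

// Query without a direction so the stored direction does not matter.
MATCH (p:Leitor {nome: 'Pedro'})-[:Conhece]-(friend:Leitor)
RETURN friend.nome;
```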
Here you go:
MATCH shortestPath((n:Leitor)-[rels:Conhece*]-(m))
WHERE n.nome IN ['Pedro', 'Vitor']
RETURN n,rels,m,length(rels)
In this case rels will be a collection of relationships because the path is variable length. You can also do:
MATCH path=shortestPath((n:Leitor)-[rels:Conhece*]-(m))
WHERE n.nome IN ['Pedro', 'Vitor']
RETURN n,rels,m,length(rels),path,length(path)
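If the goal is the chain between two specific people, it may help to bind both ends before calling shortestPath. A minimal sketch, assuming every person has the Leitor label and the nome property used in the question:

```cypher
// Bind both endpoints first, then ask for the shortest Conhece chain between them.
MATCH (pedro:Leitor {nome: 'Pedro'}), (vitor:Leitor {nome: 'Vitor'})
MATCH path = shortestPath((pedro)-[:Conhece*]-(vitor))
RETURN [n IN nodes(path) | n.nome] AS pessoas,  // names along the path
       length(path) AS hops                     // number of relationships
```

With the relationships listed in the question, the only route from Pedro to Vitor that avoids Fernando goes through Arthur, so this should yield Pedro, Arthur, Vitor with 2 hops.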
I want to find a node based on multiple relationships to other nodes.
For example, find a movie that actor A acted in, B directed and C filmed.
Can anyone tell me how to do that?
Perhaps START would do it, but since it needs a legacy index, I would prefer MATCH.
You should be able to string multiple matches together, such as:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Actor),
(m:Movie)<-[:DIRECTED]-(d:Director),
(m:Movie)<-[:FILMED_BY]-(f:Filmer)
or:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Actor)
MATCH (m:Movie)<-[:DIRECTED]-(d:Director)
MATCH (m:Movie)<-[:FILMED_BY]-(f:Filmer)
Note: I haven't tested this, but I believe both styles should work. And... for brevity, I left out details such as specifying actor/director/filmer name, and the RETURN portion. (and I made an assumption you were using labels; again, just an example on how to accomplish this).
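Filled in with the details left out above, the first style might look like the following. This is still an untested sketch; the name property and the values 'A', 'B' and 'C' are assumptions standing in for whatever your data uses.

```cypher
// All three patterns share the variable m, so only movies that satisfy
// every pattern are returned. The name property is hypothetical.
MATCH (m:Movie)<-[:ACTED_IN]-(:Actor {name: 'A'}),
      (m)<-[:DIRECTED]-(:Director {name: 'B'}),
      (m)<-[:FILMED_BY]-(:Filmer {name: 'C'})
RETURN m
```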
We have a scenario in which we show the user how pictures (or messages) spread through relationships.
For example: Relationship 1 of Node A has a message "Foo", Relationship 2 of Node 2 also has the same message "Foo" ... Relationship n of Node n also has the same message "Foo".
Now we want to display a relationship graph by querying Neo4j.
This is my query:
MATCH (a)-[r1]-()-[r2]-()-[r3]-()-[r4]-()
WHERE a.id = '59072662'
and r2.message_id = r1.target_message_id
and r3.message_id = r2.target_message_id
and r4.message_id = r3.target_message_id
RETURN r1,r2,r3,r4
The problem is that this query does not work if there are only 2 levels of linking. If there are only r1 and r2, the query returns nothing.
How can I write a Cypher query that returns the set of relationships in my case?
Adding to Stefan's answer.
If you want to keep track of how pictures spread then you would also include a relationship to the image like:
(message)-[:INCLUDES]->(image)
If you want how a specific picture got spread in the message network:
MATCH (i:Image {url: "X"}), p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
WHERE (m)-[:INCLUDES]->(i)
WITH length(p) AS length, sender
ORDER BY length
RETURN DISTINCT sender
This will return all senders, ordered by path length, so the top one should be the original sender.
If you're just interested in the original sender you could use LIMIT 1.
Alternatively, if you find yourself traversing huge networks and hitting performance issue because of the massive paths that have to be traversed, you could also add a relationship between the message and the original uploader.
As for the question you posted at the bottom, about how to get the set of relationships in a variable-length path:
You define a path, like in the example above
p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
Then, to access the relationships in that path, you use the rels function (named relationships in newer Cypher versions)
RETURN rels(p)
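Applied to the data model in the question (relationships carrying message_id and target_message_id), a variable-length version of the original query could look roughly like this. It is an untested sketch; the upper bound of 4 simply mirrors the four hops in the question.

```cypher
// Follow 1 to 4 hops from the start node, whatever depth actually exists.
MATCH p = (a)-[*1..4]-()
WHERE a.id = '59072662'
  // Each relationship after the first must continue the previous one's message chain.
  // For a one-hop path the range is empty, so the condition holds trivially.
  AND all(i IN range(1, length(p) - 1)
          WHERE relationships(p)[i].message_id = relationships(p)[i - 1].target_message_id)
RETURN relationships(p)
```

This returns the shorter chains as well as the longer ones, so a two-level spread no longer comes back empty.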
You didn't provide many details about your use case. From my experience, I suggest that you rethink your graph data model.
A message seems to be a central concept in your domain, so the message should probably be modeled as a node. To connect (a) and (b) via message (m), you might use something like (a)-[:SENT]->(m {message_id: ....})-[:TO]->(b).
With this model, (m) could easily have a REFERS_TO relationship to another message, making the query above much more graph-like.
My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind, they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to pass the graph via a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is that I don't know how to tell Cypher to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries don't finish. I think that Cypher looks at each outgoing relationship at every single path depth relative to the start node, but as I already explained, there is only one right path. So it should do some kind of DFS instead of a BFS. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
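Without a legacy relationship index, a similar restriction can be sketched by putting a property map on the variable-length pattern, so that every relationship in the path must carry the same GameID. This is untested, and whether it performs well on 20-200 hops depends on the Neo4j version and planner.

```cypher
// Assumption: GameID is stored as a string, as the STR(id(n)) comparison
// in the question suggests.
MATCH p = (n)-[:move* {GameID: '3'}]->(m)
WHERE id(n) = 3
RETURN m
```

The intent is that only relationships with the matching GameID are followed at each step, rather than expanding every outgoing move relationship and filtering afterwards.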
As an aside: since Neo4j reuses its internally generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate your own unique IDs, store them in your nodes and use them for your GameIDs. If you do this, you should also create a uniqueness constraint for your own IDs; as a nice side effect, this automatically creates an index for them.
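A uniqueness constraint of that kind might be declared as follows. The Position label, the uid property and the constraint name are hypothetical, and the syntax shown is the one used by recent Neo4j versions (older releases use a different CREATE CONSTRAINT form).

```cypher
// Hypothetical names: label Position, property uid, constraint position_uid.
// Creating the constraint also creates a backing index on uid.
CREATE CONSTRAINT position_uid IF NOT EXISTS
FOR (p:Position) REQUIRE p.uid IS UNIQUE
```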