Directed search in Cypher and Neo4j Bloom - neo4j

Given the following model:
Nodes: Host, Listener
Relations: Host -HOSTS-> Listener, Listener -CONNECTS_TO-> Listener
I want to perform a directed search on the CONNECTS_TO relation, to see the paths from host A to host B, but only if there is a listener on A that connects to a listener on B. Something like this:
(A:Host)-[:HOSTS]-(L_A:Listener)-[:CONNECTS_TO]->(L_B:Listener)-[:HOSTS]-(B:Host)
So for instance if I do have a relation L_A -CONNECTS_TO-> L_B, the query should return the path A->L_A->L_B<-B,
but if I also have a relation L_B-CONNECTS_TO->L_A the query should still return only the first path, since I'm looking for path from A to B, and if my only relation is the second one, the query should not return anything since A cannot connect to B, only B to A.
My Cypher query looks something like this, but still all the relations are returned:
match path=(A:Host)-[:HOSTS]-(L_A:Listener)-[:CONNECTS_TO*1]->(L_B:Listener)-[:HOSTS]-(B:Host)
where A.name = 'hostA' and B.name = 'hostB'
return path;
Neo4j Browser still shows all the paths between A and B, regardless of the direction of CONNECTS_TO. I also tried to perform the same directed query in Bloom, but to no avail, so my questions are:
What might be wrong with my Cypher query?
Is it possible to perform this directed search in Bloom (as I cannot seem to see any directions there)?
Thank you!
LE: graph generation
create
(_0:`Host` {`name`:"hostA"}),
(_1:`Host` {`name`:"hostB"}),
(_2:`Host` {`name`:"hostC"}),
(_3:`Listener` {`name`:"listenerA"}),
(_4:`Listener` {`name`:"listenerB"}),
(_5:`Listener` {`name`:"listenerC"}),
(_0)-[:`HOSTS`]->(_3),
(_1)-[:`HOSTS`]->(_4),
(_2)-[:`HOSTS`]->(_5),
(_3)-[:`CONNECTS_TO`]->(_5),
(_3)-[:`CONNECTS_TO`]->(_4),
(_4)-[:`CONNECTS_TO`]->(_3),
(_5)-[:`CONNECTS_TO`]->(_4)
query for my only direct path between A and B:
match path=(source:Host)-[:HOSTS]-(l_source:Listener)-[:CONNECTS_TO*1]->(l_dest:Listener)-[:HOSTS]-(dest:Host) where source.name = 'hostA' and dest.name = 'hostB' return path;
In https://console.neo4j.org/ it seems to work just fine, and on closer inspection the JSON/text/table result in Neo4j Browser also seems correct; it's just that the visualization shows all the relations regardless of direction. Is there a way to display only the ones that I'm interested in? I know they can be interpreted as bi-directional, but in this case the direction is quite important.
In Bloom, even though the search path is correct, and the number of nodes found is also correct, in this case 4 (hostA, listenerA, listenerB, hostB), all the nodes and relations are displayed, and this is what I'm trying to avoid, as our structure is very big, and we want to visualize certain paths only.

it's just that the visualization shows all the relations regardless of direction. Is there a way to display only the ones that I'm interested in?
If it is just the visualization, did you turn off the Connect result nodes feature in the browser?
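With the option turned on, the Browser also draws every relationship that exists between the returned nodes, including the reverse CONNECTS_TO.
With it turned off, only the relationships that are part of the returned paths are drawn.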
For Bloom one option would be to put the query into a search phrase like this:
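A minimal sketch of the Cypher behind such a search phrase, assuming a phrase text along the lines of `Connections from $source to $dest` (the wording and the parameter names `$source` and `$dest` are hypothetical, not taken from the original answer):

```cypher
// Hypothetical Bloom search phrase query; $source and $dest come from the phrase text.
// Only the CONNECTS_TO hop is directed; HOSTS is matched in either direction, as in the question.
MATCH path = (source:Host {name: $source})-[:HOSTS]-(:Listener)
             -[:CONNECTS_TO]->(:Listener)-[:HOSTS]-(dest:Host {name: $dest})
RETURN path
```

With the sample graph from the question, entering hostA and hostB should match the single path through listenerA and listenerB.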
The resulting visualization shows only the relationships matched by the query.

Related

Cypher: Avoid cycle in multiple relationship paths

My website has pages dedicated to some events, represented by nodes in neo4j. Those events possess sub-events which are relationships under neo4j, and which correspond to links on the source page to the target page. I currently have a search engine that highlights the links to the searched events, but it is flawed by cycles in the data model. Indeed it is highlighting all the links that contain cyclic references to the current page if this page contains any link to the searched event.
The aim is therefore to have a query that is able to flag the nodes and relationships which are related to the searched events, without flagging a path only because of cyclic relationships.
I've created a small dataset representative of the issue that you can build using this query:
CREATE
(r:Event:Searched {name:'R', tag:1}),
(d:Event:Searched {name:'D', tag:1}),
(o:Event {name:'O'}),
(a:Event {name:'A'}),
(b:Event {name:'B'}),
(c:Event {name:'C'}),
(e:Event {name:'E'}),
(o)-[:hasEvent]->(a),
(o)-[:hasEvent]->(e),
(o)-[:hasEvent]->(r),
(o)-[:hasEvent]->(c),
(a)-[:hasEvent]->(b),
(b)-[:hasEvent]->(o),
(c)-[:hasEvent]->(d)
Which produces the following graph:
My aim is to have a query that fetches only nodes C and O, but not A or B, since the only reason A and B are flagged is that O is already flagged.
My current query that I need to fix is the following:
MATCH path=(upper:Event)-[:hasEvent*]->(source:Event:Searched)
RETURN upper
I hope you can help me, I couldn't manage to make similar questions' answers work on my specific case.
Ideally, the solution shouldn't be too computing-intensive, as my real model is quite big (2.300.000 nodes and 9.500.000 relationships), and the current indexing in the search engine is already quite slow.
Thank you in advance for your help
You could try the apoc.path.expandConfig procedure. It has a uniqueness setting that can be configured so that a node cannot be traversed more than once.
MATCH (n:Event)
CALL apoc.path.expandConfig(n, {
relationshipFilter: "hasEvent>",
labelFilter: "/Searched",
uniqueness: "NODE_GLOBAL"
}) YIELD path
RETURN [n IN nodes(path) WHERE NOT n:Searched] AS upper
However, this query will still return A and B, because with MATCH (n:Event) you start looking from every node (regardless of the relationship between O and A). But if I understood you correctly, you don't want to start from all nodes, but from a specific one ("pages dedicated to some events"). So you might want to start with, e.g., MATCH (n:Event {name: "O"}), which will return only O and C.
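Put together, a sketch of the query anchored on a single page might look like this. It assumes the APOC plugin is installed and uses the sample dataset from the question; the UNWIND and DISTINCT steps are an addition to flatten the per-path lists into one row per node:

```cypher
// Start from one page only (here O), instead of from every :Event node.
MATCH (n:Event {name: "O"})
CALL apoc.path.expandConfig(n, {
  relationshipFilter: "hasEvent>",   // follow hasEvent in the outgoing direction only
  labelFilter: "/Searched",          // paths end at :Searched nodes
  uniqueness: "NODE_GLOBAL"          // never visit the same node twice
}) YIELD path
// Flatten the nodes of every path, dropping the :Searched end nodes.
UNWIND [x IN nodes(path) WHERE NOT x:Searched] AS upper
RETURN DISTINCT upper.name AS name
```

On the sample data the matching paths should be O to R and O to C to D, so the cycle through A and B does not contribute any rows.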

Cypher: Find any path between nodes

I have a neo4j graph that looks like this:
Nodes:
Blue Nodes: Account
Red Nodes: PhoneNumber
Green Nodes: Email
Graph design:
(:PhoneNumber) -[:PART_OF]->(:Account)
(:Email) -[:PART_OF]->(:Account)
The problem I am trying to solve is to
Find any path that exists between Account1 and Account2.
This is what I have tried so far with no success:
MATCH p=shortestPath((a1:Account {accId:'1234'})-[]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[:PART_OF]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[*]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=(a1:Account {accId:'1234'})<-[:PART_OF*1..100]-(n)-[:PART_OF]->(a2:Account {accId:'5678'}) RETURN p;
Same queries as above without the shortest path function call.
By looking at the graph I can see there is a path between these 2 nodes but none of my queries yield any result. I am sure this is a very simple query but being new to Cypher, I am having a hard time figuring out the right solution. Any help is appreciated.
Thanks.
All those queries are along the right lines, but need some tweaking to work. In the longer term, though, to get a better system to easily search for connections between accounts, you'll probably want to refactor your graph.
Solution for Now: Making Your Query Work
The path between any two (n:Account) nodes in your graph is going to look something like this:
(a1:Account)<-[:PART_OF]-(:Email)-[:PART_OF]->(ai:Account)<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->(a2:Account)
Since you have only one type of relationship in your graph, the two nodes will thus be connected by an indeterminate number of patterns like the following:
<-[:PART_OF]-(:Email)-[:PART_OF]->
or
<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->
So, your two nodes will be connected through an indeterminate number of intermediate (:Account), (:Email), or (:PhoneNumber) nodes, all connected by -[:PART_OF]- relationships of alternating direction. Unfortunately, to my knowledge (and I'd love to be corrected here), using straight Cypher you can't search for a repeated pattern like this in your current graph. So, you'll simply have to use an undirected search to find nodes (a1:Account) and (a2:Account) connected through -[:PART_OF]- relationships. At first glance, your query would look like this:
MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*]-(a2:Account { accId: {a2_id} }))
RETURN *
(notice here I've used cypher parameters rather than the integers you put in the original post)
That's very similar to your query #3, but, like you said - it doesn't work. I'm guessing what happens is that it doesn't return a result, or returns an out of memory exception? The problem is that since your graph has circular paths in it, and that query will match a path of any length, the matching algorithm will literally go around in circles until it runs out of memory. So, you want to set a limit, like you have in query #4, but without the directions (which is why that query doesn't work).
So, let's set a limit. Your limit of 100 relationships is a little on the large side, especially in a cyclical graph (i.e., one with circles), and could potentially match in the region of 2^100 paths.
As a (very arbitrary) rule of thumb, any query with a potential undirected and unlabelled path length of more than 5 or 6 may begin to cause problems unless you're very careful with your graph design. In your example, it looks like these two nodes are connected via a path length of 8. We also know that for any two nodes, the given minimum path length will be two (i.e., two -[:PART_OF]- relationships, one into and one out of a node labelled either :Email or :PhoneNumber), and that any two accounts, if linked, will be linked via an even number of relationships.
So, ideally we'd set our relationship length between 2 and 10. However, Cypher's shortestPath() function only supports paths with a minimum length of either 0 or 1, so I've set it between 1 and 10 in the example below (even though we know that in reality, the shortest path will have a length of at least two).
MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*1..10]-(a2:Account { accId: {a2_id} }))
RETURN *
Hopefully, this will work with your use case, but remember, it may still be very memory intensive to run on a large graph.
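The `{param}` parameter syntax used above was removed in Neo4j 4.0. On current versions the same query would be written with `$param`; the parameter names `a1_id` and `a2_id` are carried over from the answer:

```cypher
// Same bounded, undirected shortest-path search, using $-style parameters.
MATCH p = shortestPath(
  (a1:Account {accId: $a1_id})-[:PART_OF*1..10]-(a2:Account {accId: $a2_id})
)
RETURN p
```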
Longer Term Solution: Refactor Graph and/or Use APOC
Depending on your use case, a better or longer term solution would be to refactor your graph to be more specific about relationships to speed up query times when you want to find accounts linked only by email or phone number - i.e. -[:ACCOUNT_HAS_EMAIL]- and -[:ACCOUNT_HAS_PHONE]-. You may then also want to use APOC's shortest path algorithms or path finder functions, which will most likely return a faster result than using cypher, and allow you to be more specific about relationship types as your graph expands to take in more data.

Neo4j Cypher query: How to traverse

I am trying to learn neo4j, so I just took a use case of a travel app to learn but I am not sure about the optimal way to solve it. Any help will be appreciated.
Thanks in advance.
So consider a use case in which I have to travel from one place (PLACE A) to other (PLACE C) by train, but there is no direct connection between the two places. And so we have to change our train in PLACE B.
Two places are connected via an IS_CONNECTED relation (the green nodes in the image).
If there is an IS_CONNECTED relation between two places, then both places also have an outgoing CONNECTED_VIA relation to a common train, which shows how they are connected (the red nodes in the image).
My question is: how are we supposed to find out that we have to change trains at place B?
My understanding is:
We will check whether the two places are connected via the IS_CONNECTED relationship
match (start:place{name:"heidelberg"}), (end:place{name:"frankfurt"})
MATCH path = (start)-[:IS_CONNECTED*..]->(end)
RETURN path
This will show that these two places are connected.
Then we check whether place A and place C are directly connected, with the query
match (p:place{name:"heidelberg"})-[:CONNECTED_VIA]->(q)<-[:CONNECTED_VIA]-(t:place{name:"frankfurt"})
return q
And this will return nothing, because there is no direct connection.
My brain stopped functioning after this. I have been trying to figure this out for the past 3 days. I am sorry if I look so confused.
Please click here for the image of what i am referring
You'll want to use variable-length relationships in your :CONNECTED_VIA match, and then get the :place nodes that are in your path. And it's usually a good idea to use an upper bound, whatever makes sense in your graph.
Then we can use a filter on the nodes in your path to keep only the ones that are :place nodes (labels are case-sensitive, so the filter has to use the same lowercase label as your data).
match path = (p:place{name:"heidelberg"})-[:CONNECTED_VIA*..4]-(t:place{name:"frankfurt"})
return path, [node in nodes(path)[1..-1] where node:place] as connectionPlaces
And if you're only interested in the shortest paths, you may want to check the shortestPath() or allShortestPaths() functions.
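As a sketch, the shortest-path variant of the query above could look like the following. The upper bound of 4 is kept from the earlier query and is an assumption about how many hops make sense in this graph:

```cypher
// Look up both endpoints first, then ask for one shortest undirected path between them.
MATCH (p:place {name: "heidelberg"}), (t:place {name: "frankfurt"})
MATCH path = shortestPath((p)-[:CONNECTED_VIA*..4]-(t))
// Drop the first and last node, keeping only the intermediate :place nodes (the change points).
RETURN path, [node IN nodes(path)[1..-1] WHERE node:place] AS connectionPlaces
```

Replacing shortestPath with allShortestPaths would return every path of the same minimal length rather than just one.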
One last thing to note...when determining if two locations are connected, if all you need is a true or false if they're connected, you can use the EXISTS() function to return whether such a pattern exists:
match (start:place{name:"heidelberg"}), (end:place{name:"frankfurt"})
return exists((start)-[:IS_CONNECTED*..5]->(end))

How to select relationships spreading from neo4j?

We need to show users how pictures (or messages) spread across relationships.
For example: Relationship 1 of Node A has a message "Foo", Relationship 2 of Node 2 also has same message "Foo" ... Relationship n of Node n also has same message "Foo".
Now we are going to display a relationship graph by querying Neo4j.
This is my query:
MATCH (a)-[r1]-()-[r2]-()-[r3]-()-[r4]-()
WHERE a.id = '59072662'
and r2.message_id = r1.target_message_id
and r3.message_id = r2.target_message_id
and r4.message_id = r3.target_message_id
RETURN r1,r2,r3,r4
The problem is that this query does not work if there are only 2 levels of linking. If there are only an r1 and an r2, this query returns nothing.
Please tell me how to write a Cypher query that returns the set of relationships in my case.
Adding to Stefan's answer.
If you want to keep track of how pictures spread then you would also include a relationship to the image like:
(message)-[:INCLUDES]->(image)
If you want how a specific picture got spread in the message network:
MATCH (i:Image {url: "X"}), p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
WHERE (m)-[:INCLUDES]->(i) WITH length(p) as length, sender ORDER BY length
RETURN DISTINCT sender
This will return all senders, ordered by path length, so the top one should be the original sender.
If you're just interested in the original sender you could use LIMIT 1.
Alternatively, if you find yourself traversing huge networks and hitting performance issue because of the massive paths that have to be traversed, you could also add a relationship between the message and the original uploader.
As for the question you posted at the bottom, about how to get the set of relationships in a variable-length path:
You define a path, like in the example above
p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
Then, to access the relationships in that path, you use the relationships function (older Neo4j versions also accepted the shorter rels, which has since been removed)
RETURN relationships(p)
You didn't provide much details on your use case. From my experience I suggest that you rethink your way of graph data modelling.
A message seems to be a central concept in your domain. Therefore the message should probably be modeled as a node. To connect (a) and (b) via message (m), you might use something like (a)-[:SENT]->(m {message_id: ....})-[:TO]->(b).
Using this (m) could easily have a REFERS_TO relationship to another message making the query above way more graphy.
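With that model, following a message through any number of hops becomes a single variable-length match. This is only a sketch: the `Message` label, the direction of `REFERS_TO` (from a later message to the one it refers to), and the `$message_id` parameter are all assumptions, not part of the original answer:

```cypher
// Assumed model: (later:Message)-[:REFERS_TO]->(earlier:Message).
// *0.. also matches the starting message itself, so chains of only 1 or 2 hops are returned too.
MATCH p = (origin:Message {message_id: $message_id})<-[:REFERS_TO*0..]-(m:Message)
RETURN m, relationships(p) AS hops
```

On a large graph it is probably worth replacing the open-ended `*0..` with an explicit upper bound.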

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind, they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to pass the graph via a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is, that I don't know how to tell Cypher how to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at each outgoing relationship at every single path depth relative to the start node, but as I already explained, there is only one right path. So it should do some kind of DFS instead of a BFS. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use your own unique IDs, and store them in your nodes and use them for your GameIDs; and, if you do this, then you should also create a uniqueness constraint for your own IDs -- this will, as a nice side effect, automatically create an index for your IDs.
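A sketch of such a constraint, assuming the position nodes carry a hypothetical `Position` label and store the self-generated ID in a hypothetical `uid` property. The syntax shown is the Neo4j 5 form; older versions use `CREATE CONSTRAINT ON ... ASSERT ... IS UNIQUE` instead:

```cypher
// Hypothetical label (Position) and property (uid); adjust to the real model.
// The constraint is backed by an index, so lookups by uid are indexed as well.
CREATE CONSTRAINT position_uid IF NOT EXISTS
FOR (p:Position) REQUIRE p.uid IS UNIQUE
```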
