How to select relationships spreading from neo4j? - neo4j

We have a scenario to display relationships spreading pictures(or messages) to user.
For example: Relationship 1 of Node A has a message "Foo", Relationship 2 of Node 2 also has same message "Foo" ... Relationship n of Node n also has same message "Foo".
Now we are going to display a relationship graph by query Neo4j.
This is my query:
MATCH (a)-[r1]-()-[r2]-()-[r3]-()-[r4]
WHERE a.id = '59072662'
and r2.message_id = r1.target_message_id
and r3.message_id = r2.target_message_id
and r4.message_id = r3.target_message_id
RETURN r1,r2,r3,r4
The problem is, this query does not work if there are only 2 levels of linking. If there is only a r1 and r2, this query returns nothing.
Please tell me how to write a Cypher query returns a set of relationships of my case?

Adding to Stefan's answer.
If you want to keep track of how pictures spread then you would also include a relationship to the image like:
(message)-[:INCLUDES]->(image)
If you want how a specific picture got spread in the message network:
MATCH (i:Image {url: "X"}), p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
WHERE (m)-[:INCLUDES]->(i) WITH length(p) as length, sender ORDER BY length
RETURN DISTINCT sender
This will return all senders, ordered by path length, so the top one should be the original sender.
If you're just interested in the original sender you could use LIMIT 1.
Alternatively, if you find yourself traversing huge networks and hitting performance issue because of the massive paths that have to be traversed, you could also add a relationship between the message and the original uploader.
The answer to the question you psoted at the bottom, about the way to get a set of relationships in a variable length path:
You define a path, like in the example above
p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
Then, to access the relationships in that path, you use the rels function
RETURN rels(p)

You didn't provide much details on your use case. From my experience I suggest that you rethink your way of graph data modelling.
A message seems to be a central concept in your domain. Therefore the message should be probably modeled as a node. To connect (a) and (b) via message (m), you might use something like (a)-[:SENT]->(m {message_id: ....})-[TO:]->(b).
Using this (m) could easily have a REFERS_TO relationship to another message making the query above way more graphy.

Related

NEO4J - Matching a path where middle node might exist or not

I have the following graph:
I would look to get all contractors and subcontractors and clients, starting from David.
So I thought of a query likes this:
MATCH (a:contractor)-[*0..1]->(b)-[w:works_for]->(c:client) return a,b,c
This would return:
(0:contractor {name:"David"}) (0:contractor {name:"David"}) (56:client {name:"Sarah"})
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Which returns the desired result. The issue here is performance.
If the DB contains millions of records and I leave (b) without a label, the query will take forever. If I add a label to (b) such as (b:subcontractor) I won't hit millions of rows but I will only get results with subcontractors:
(0:contractor {name:"David"}) (1:subcontractor {name:"John"}) (56:client {name:"Sarah"})
Is there a more efficient way to do this?
link to graph example: https://console.neo4j.org/r/pry01l
There are some things to consider with your query.
The relationship type is not specified- is it the case that the only relationships from contractor nodes are works_for and hired? If not, you should constrain the relationship types being matched in your query. For example
MATCH (a:contractor)-[:works_for|:hired*0..1]->(b)-[w:works_for]->(c:client)
RETURN a,b,c
The fact that (b) is unlabelled does not mean that every node in the graph will be matched. It will be reached either as a result of traversing the works_for or hired relationships if specified, or any relationship from :contractor, or via the works_for relationship.
If you do want to label it, and you have a hierarchy of types, you can assign multiple labels to nodes and just use the most general one in your query. For example, you could have a label such as ExternalStaff as the generic label, and then further add Contractor or SubContractor to distinguish individual nodes. Then you can do something like
MATCH (a:contractor)-[:works_for|:hired*0..1]->(b:ExternalStaff)-[w:works_for]->(c:client)
RETURN a,b,c
Depends really on your use cases.

Cypher: retrieve all attached nodes of more than one type?

Beginner Cypher question. I know how to get all the nodes of a particular type attached to a particular person in my database. Here I am retrieving all the friends of a particular person, within 10 hops:
MATCH (rebecca:Person {name:"Rebecca"})-[r*1..10]->(friends:Friend)
RETURN rebecca, friends
But how would I extend this to get nodes of two types: either the friends, or the neighbours, of Rebecca?
You can filter on the label of the friends identifier :
MATCH (rebecca:Person {name:"Rebecca"})-[r*1..10]->(other)
WHERE ALL( x IN ["Friend","Neighbour"] WHERE x IN labels(other) )
RETURN rebecca, other
NB: The answer from InverseFalcon is perfectly valid, here it is just another way to do this filter.
Note that this is not really ideal, FRIEND and NEIGHBOUR are semantically best described as relationships and you can see here that when
going away from the natural way of thinking as a graph (relationships matters!) you suffer from it in your queries.
There isn't an OR we can use on the label in the MATCH itself, so you may have to filter with a WHERE clause:
MATCH (rebecca:Person {name:"Rebecca"})-[r*1..10]->(friendOrNeighbor)
WHERE friendOrNeighbor:Friend or friendOrNeighbor:Neighbor
RETURN DISTINCT rebecca, friendOrNeighbor
Keep in mind variable-length relationship matches like this are meant to find all possible paths up to the given max limit, so this is actually doing extra work that you may not need, that may be slow if there are many relationships within that local graph.
You may want to consider apoc.path.expandConfig() from APOC Procedures. If you use 'NODE_GLOBAL' for uniqueness, and specify the upper bound with maxLevel: 10, it's a much more efficient means of getting the nodes you want faster.

neo4j getting from a list of labels that which is a child and which one is parent

I have a problem in which there a number of nodes A,B,C,D
where
B-->A
C-->B
D-->B
and the relation between them is children.
Now I want to query Neo4j to find that from a list of labels (B,C,D) which nodes exists at the bottom of the graph
I am making a bot application. In the neo4j database relations would be stored between different terms.
Like :dog-->:animal
:labra-->:dog
:germanShepard-->:dog
Now If a user asks a qustion tell me about dog then i should be able to get dog label data and if the user asks tell me about labra dog then i should be able to get labra label data.I am breaking the user input into tokens and then trying to find which label is at the bottom.
You can try something like
Match (a:Label) where not (a)<--(:Label) return a
(should work but I didn't test it)
As mentioned in my comment, using a unique label for every single node is going to be costly in the long run, and is going to impact your lookup speed on your queries.
So, if I'm understanding your use case correctly, you're breaking up user input into tokens, and the tokens should match to nodes on the same path in your graph. You want to find the label on the "bottom" of the graph, basically a leaf node, though in your description child nodes point toward their parent. I'll assume it's a :Parent relationship from the child to the parent node.
Here's a query which might do what you want. We'll assume you pass in the list of tokens as a parameter {tokens}. Please review the developer documentation for using parameters.
UNWIND {tokens} as token
MATCH (n)
WHERE labels(n) = token
AND NOT ()-[:Parent]->(n)
RETURN n
This will ensure the nodes you return are not themselves parents of any other node.
However, if you want instead wanted to be able to return nodes even if they were parents of other nodes, then we could instead return the node that is farthest from the root node. This requires a :Root node at the root of your entire graph. For your example in your description, :Root would be the parent of :animal.
UNWIND {tokens} as token
MATCH (n)
WHERE labels(n) = token
MATCH (n)-[r:Parent*]->(:Root)
RETURN n
ORDER BY SIZE(r)
LIMIT 1
Keep in mind that this query isn't guaranteed to work when there are multiple nodes with the same distance to the :Root. For example, if "germanShepard" and "labra" were given as elements of the tokens list, only one of the corresponding nodes would be returned because of the LIMIT 1, with no guarantee of which node would be returned.

NEO4J shortestPath taking into account a particular relationship pattern

I have a graph where I have chains of nodes that have a relationship [:LINKS_TO] and I can successfully get the shortestPath function to work.
For most of my users this level of detail is fine.
I have another set of users where there is a need for a richer set of information on the relationship. Given that properties on relationships are supposed to represent strengths or scores for the relationship I have created specific nodes to hold descriptive metadata.
This means I have a pattern that says (start)-[:PARTICIPATES]-(middle)-[:REFERENCES]->(end)
There can be any number of nodes between the start and end points in the chain.
I am struggling to get the shortestPath function to return any results for the more detailed chain. Is there a way to do this using Cypher?
You could also have kept your metadata information on the relationships.
For your needs, this should work:
MATCH p = shortestPath((start)-[:PARTICIPATES|:REFERENCES*]->(end))
RETURN nodes(p)

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind, they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to pass the graph via a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is, that I don't know how to tell Cypher how to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if i raise the values, the querys won't finish. I think that Cypher looks at each outgoing relationship at every single path depth relating to the start node, but as I already explained, there is only one right path. So it should do some kind of a DFS search instead of a BFS search. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH cause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use you own unique IDs, and store them in your nodes and use them for your GameIDs; and, if you do this, then you should also create a uniqueness constraint for your own IDs -- this will, as a nice side effect, automatically create an index for your IDs.

Resources