NEO4J shortestPath taking into account a particular relationship pattern - neo4j

I have a graph where I have chains of nodes that have a relationship [:LINKS_TO] and I can successfully get the shortestPath function to work.
For most of my users this level of detail is fine.
I have another set of users where there is a need for a richer set of information on the relationship. Given that properties on relationships are supposed to represent strengths or scores for the relationship I have created specific nodes to hold descriptive metadata.
This means I have a pattern that says (start)-[:PARTICIPATES]-(middle)-[:REFERENCES]->(end)
There can be any number of nodes between the start and end points in the chain.
I am struggling to get the shortestPath function to return any results for the more detailed chain. Is there a way to do this using Cypher?

You could also have kept your metadata information on the relationships.
For your needs, this should work:
MATCH p = shortestPath((start)-[:PARTICIPATES|:REFERENCES*]->(end))
RETURN nodes(p)
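To make the effect of the relationship-type filter concrete, here is a minimal Python sketch of a breadth-first shortest-path search that only follows relationships of the allowed types. It is not how Neo4j implements shortestPath; the `shortest_path` helper, the node names, and the sample edges are all made up for illustration.

```python
from collections import deque

def shortest_path(edges, start, end, allowed_types):
    """Breadth-first search that only follows edges whose type is allowed.

    edges: list of (source, rel_type, target) tuples, followed source -> target.
    Returns the nodes on one shortest path, or None if no path exists.
    """
    # Keep only the relationships whose type passes the filter.
    adjacency = {}
    for source, rel_type, target in edges:
        if rel_type in allowed_types:
            adjacency.setdefault(source, []).append(target)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# Hypothetical data: a detailed chain plus a direct LINKS_TO shortcut.
edges = [
    ("a", "PARTICIPATES", "m1"),
    ("m1", "REFERENCES", "b"),
    ("b", "PARTICIPATES", "m2"),
    ("m2", "REFERENCES", "c"),
    ("a", "LINKS_TO", "c"),
]

detailed = shortest_path(edges, "a", "c", {"PARTICIPATES", "REFERENCES"})
simple = shortest_path(edges, "a", "c", {"LINKS_TO"})
print(detailed)
print(simple)
```

With only LINKS_TO allowed the search takes the shortcut; with PARTICIPATES and REFERENCES allowed it has to walk through the intermediate metadata nodes.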

What does indexing mean in neo4j and how does it affect performance?

I have an idea of how indexing works in an RDBMS, but I can't see how indexing works in neo4j. Also, what is schema indexing?
To quote from neo4j's free book, Graph Databases:
Indexes help optimize the process of finding specific nodes. Most of the time, when querying a graph, we’re happy to let the traversal process discover the nodes and relationships that meet our information goals. By following relationships that match a specific graph pattern, we encounter elements that contribute to a query’s result. However, there are certain situations that require us to pick out specific nodes directly, rather than discover them over the course of a traversal. Identifying the starting nodes for a traversal, for example, requires us to find one or more specific nodes based on some combination of labels and property values.
That same book does an extensive comparison between neo4j and relational databases as well.
As for what the above-mentioned indexes (also known as "schema indexes") index: they index the nodes that have a specific node label and node property combination.
There is also a different indexing mechanism called "manual" (or "legacy", or "explicit") indexing, which is now only recommended for special use cases.
[UPDATE]
As an example, suppose we have already created an index on :Person(firstname), like so:
CREATE INDEX ON :Person(firstname);
In that case, the following query can quickly start off by using the index to find the desired Person nodes. Once those nodes are found, neo4j can easily traverse their outgoing WORKS_AT relationships to find the related Company nodes:
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WHERE p.firstname = 'Karan'
RETURN p, c;
Without that index, the query would have to either:
Scan through all Person nodes to find the right ones, before traversing their outgoing WORKS_AT relationships, or
Find all Company nodes, traverse their incoming WORKS_AT relationships, and compare the firstname values of every Person at the other end of the relationship.
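The difference between the indexed lookup and the full scan can be sketched in plain Python. This is only an analogy, not Neo4j's index implementation: the `people` list, the `works_at` mapping, and both helper functions are hypothetical stand-ins for Person nodes, WORKS_AT relationships, and the index.

```python
# Hypothetical in-memory stand-ins for Person nodes and WORKS_AT relationships.
people = [
    {"id": 1, "firstname": "Karan"},
    {"id": 2, "firstname": "Asha"},
    {"id": 3, "firstname": "Karan"},
]
works_at = {1: "Acme", 2: "Globex", 3: "Initech"}  # person id -> company name

def find_by_scan(firstname):
    """Without an index: inspect every Person node."""
    return [p["id"] for p in people if p["firstname"] == firstname]

def build_index(nodes, key):
    """Build a property-value -> node-ids mapping once, up front."""
    index = {}
    for node in nodes:
        index.setdefault(node[key], []).append(node["id"])
    return index

firstname_index = build_index(people, "firstname")

def find_by_index(firstname):
    """With an index: jump straight to the matching nodes."""
    return firstname_index.get(firstname, [])

# Find the starting nodes via the index, then "traverse" to their companies.
start_ids = find_by_index("Karan")
companies = [works_at[person_id] for person_id in start_ids]
print(start_ids, companies)
```

Both lookups return the same nodes; the index only changes how much data has to be examined to find the starting points of the traversal.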

How *not* to return the relationships that link a node to itself in neo4j

In the database that I am using, nodes often have multiple relationships to themselves, which makes the resulting graph very messy. As this is for a presentation, how do we structure a Cypher query that does not return the self-referencing relationships?
I have tried
match p=((n:actor) -[*1..3]-> (nd:movie)) where n.name='Craig' and nd.name='Pride_and_prejudice' and not (n)-[]->(n) return p
but it didn't give the desired result.
If you have a lot of relationships from actor nodes to themselves, the variable-length path query might not be ideal. It will always include the self-referencing relationships, which hurts performance and returns too many results. One solution is to explicitly MATCH the first step and filter on the label (labels are case-sensitive, so the query uses the lowercase labels from the question):
MATCH p=( (n:actor)-[r1]-(n1)-[*0..2]->(nd:movie) )
WHERE NOT n1:actor
RETURN ...
The *0..2 relationship will catch cases where n1 is itself the movie node.
Alternatively, you can filter the variable length path for a property as described here: http://neo4j.com/docs/stable/query-match.html#match-match-with-properties-on-a-variable-length-path

How to select relationships spreading from neo4j?

We need to show users how pictures (or messages) spread through relationships.
For example: Relationship 1 of Node A has a message "Foo", Relationship 2 of Node 2 also has the same message "Foo" ... Relationship n of Node n also has the same message "Foo".
We want to display this as a relationship graph by querying Neo4j.
This is my query:
MATCH (a)-[r1]-()-[r2]-()-[r3]-()-[r4]-()
WHERE a.id = '59072662'
and r2.message_id = r1.target_message_id
and r3.message_id = r2.target_message_id
and r4.message_id = r3.target_message_id
RETURN r1,r2,r3,r4
The problem is that this query does not work if there are only 2 levels of linking. If there is only an r1 and an r2, the query returns nothing.
How can I write a Cypher query that returns the set of relationships in my case?
Adding to Stefan's answer.
If you want to keep track of how pictures spread then you would also include a relationship to the image like:
(message)-[:INCLUDES]->(image)
If you want how a specific picture got spread in the message network:
MATCH (i:Image {url: "X"}), p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
WHERE (m)-[:INCLUDES]->(i)
WITH length(p) AS length, sender
ORDER BY length
RETURN DISTINCT sender
This will return all senders, ordered by path length, so the top one should be the original sender.
If you're just interested in the original sender you could use LIMIT 1.
Alternatively, if you find yourself traversing huge networks and hitting performance issues because of the massive paths that have to be traversed, you could also add a relationship between the message and the original uploader.
As for the question you posted at the bottom, about how to get the set of relationships in a variable-length path:
You define a path, like in the example above
p=(recipient:User)<-[*]-(m:Message)<-[*]-(sender:User)
Then, to access the relationships in that path, you use the rels function
RETURN rels(p)
You didn't provide many details on your use case. From my experience, I suggest that you rethink the way you model your graph data.
A message seems to be a central concept in your domain, so it should probably be modeled as a node. To connect (a) and (b) via message (m), you might use something like (a)-[:SENT]->(m {message_id: ....})-[:TO]->(b).
With this model, (m) could easily have a REFERS_TO relationship to another message, making the query above way more graphy.

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind: they are different positions in a game. Each relationship has a property "GameID", which identifies the right relationship to follow when walking the graph along a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
Some nodes have hundreds of incoming and outgoing relationships; others have only a few.
The problem is that I don't know how to tell Cypher to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at each outgoing relationship at every single path depth relative to the start node, but as I already explained, there is only one right path. So it should do some kind of DFS instead of a BFS. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate your own unique IDs, store them in your nodes, and use them for your GameIDs. If you do this, you should also create a uniqueness constraint for your own IDs; as a nice side effect, this will automatically create an index for them.
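Going back to the traversal itself, the idea from the question (only one relationship per node carries the right GameID, so the walk never needs to branch) can be sketched in Python. This is an illustration of the logic only, not a Cypher feature or Neo4j API; the `follow_game` helper and the sample `moves` data are hypothetical, and the sketch assumes at most one outgoing move per node for a given GameID.

```python
def follow_game(moves, start, game_id):
    """Walk from start, following only moves that carry the given GameID.

    moves: dict mapping a node to a list of (game_id, target) pairs.
    Assumes at most one outgoing move per node for a given GameID.
    """
    path = [start]
    seen = {start}
    node = start
    while True:
        # Discard every relationship that belongs to another game.
        next_nodes = [target for gid, target in moves.get(node, []) if gid == game_id]
        if not next_nodes:
            return path
        node = next_nodes[0]
        if node in seen:  # stop if the walk returns to an earlier position
            return path
        seen.add(node)
        path.append(node)

# Hypothetical positions and moves; the first tuple element is the GameID.
moves = {
    "p0": [("3", "p1"), ("7", "p2")],
    "p1": [("3", "p3"), ("9", "p0")],
    "p3": [("7", "p0")],
}

print(follow_game(moves, "p0", "3"))
print(follow_game(moves, "p0", "7"))
```

Because relationships from other games are discarded at every hop, the work grows with the length of the path rather than with the number of relationships reachable from the start node.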

Neo4j: Conditions on Relationships with Depth

What I'm trying to do
Being relatively new to Neo4j, I'm trying to find certain nodes with Cypher in a Neo4j graph database. The nodes should be connected by a chain of relationships of a certain type with further conditions on the relationships:
// Cypher
START self = node(3413)
MATCH (self)<-[rel:is_parent_of*1..100]-(ancestors)
WHERE rel.some_property = 'foo'
RETURN DISTINCT ancestors
What goes wrong
If I drop the depth part *1..100, the query works, but then, of course, it allows only one relationship between self and the ancestors.
But, if I allow the ancestors to be several steps away from self by introducing the depth *1..100, the query fails:
Error: Expected rel to be a Map but it was a Collection
I thought, maybe this syntax defines rel to be is_parent_of*1..100 rather than defining rel to be a relationship of type is_parent_of and allowing a larger relationship depth.
So, I've tried to make my intentions clear by using parentheses: [(rel:is_parent_of)*1..100]. But this causes a syntax error.
I'd appreciate any help to fix this. Thanks!
Nomenclature
Calling *1..100 depth originates in the nomenclature of the neography ruby gem, where this is done using the abstract depth method.
In neo4j, this is called variable length relationships, as can be seen here in the documentation: MATCH / Variable length relationships.
Reason for the error
The reason for the "Expected rel to be a Map but it was a Collection" error is, indeed, that rel does not refer to each single relationship but rather to the entire collection of matched relationships.
For an example, see here in the documentation: MATCH / Relationship variable in variable length relationships.
Solution
First, acknowledge that the identifier refers to a collection (i.e. a set of multiple items) and call it rels instead of rel. Then, in the WHERE clause, state that the condition has to apply to all rel items in the rels collection using the ALL predicate.
// Cypher
START self = node(3413)
MATCH (self)<-[rels:is_parent_of*1..100]-(ancestors)
WHERE ALL (rel in rels WHERE rel.some_property = 'foo')
RETURN DISTINCT ancestors
The ALL predicate is explained here in the documentation: Functions / Predicate Functions.
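The logic of the ALL predicate can be mirrored in Python with the built-in `all`. This is only a sketch of the semantics: the `all_match` helper, the candidate names, and the property values are invented for illustration, with each candidate ancestor represented by the list of relationships on its path.

```python
def all_match(rels, key, value):
    """True only if every relationship in the collection has the property value."""
    return all(rel.get(key) == value for rel in rels)

# Hypothetical paths: candidate ancestor -> relationships on the path to it.
paths = {
    "parent": [{"some_property": "foo"}],
    "grandparent": [{"some_property": "foo"}, {"some_property": "foo"}],
    "great_aunt": [{"some_property": "foo"}, {"some_property": "bar"}],
}

ancestors = sorted(
    name for name, rels in paths.items()
    if all_match(rels, "some_property", "foo")
)
print(ancestors)
```

A single relationship with a different value is enough to exclude the whole path, which is why the condition has to be stated over the collection rather than over one relationship.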
I was led to this solution by this stackoverflow answer to a related question.
Long query time
Unfortunately, filtering on relationship properties costs a lot of time. The above query, with only a couple of nodes in the database, takes over 3000ms on my development machine.
