Find some connection between two nodes in Neo4j movie database - neo4j

How can I find some connection between the Actor node with id=1100 and the Actor node with id=65731 in the Movie graph database from the Neo4j sample dataset? So far I have found that id 1100 is Arnold Schwarzenegger and id 65731 is Sam Worthington. When I run a Cypher query using the ACTS_IN relationship, it shows no movie in common.
To find the names, I used this Cypher:
MATCH (a:Actor {id:"1100"}),(b:Actor {id:"65731"})
RETURN a.name, b.name
To find the relationship, I used this Cypher:
MATCH (a:Actor {name:"Arnold Schwarzenegger"})-[:ACTS_IN]->()<-[:ACTS_IN]-(b:Actor {name:"Sam Worthington"})
USING INDEX a:Actor(name)
USING INDEX b:Actor(name)
RETURN count(*)
I am looking for any kind of connection, not only a shared movie.

In general, to find the paths between any 2 nodes, you can perform a variable-length relationship query.
For example:
MATCH path=(a:Actor{name:"Arnold Schwarzenegger"})-[*]-(b:Actor{name:"Sam Worthington"})
RETURN path;
Note, however, that unbounded variable-length relationship queries can take a very long time to complete (or may seem to never complete), even with relatively small DBs. The best practice is to put a reasonable upper bound on the depth of the query. For example, to search for path depths of at most 5:
MATCH path=(a:Actor{name:"Arnold Schwarzenegger"})-[*..5]-(b:Actor{name:"Sam Worthington"})
RETURN path;
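To illustrate why an upper bound keeps the search manageable, here is a minimal Python sketch of a depth-limited breadth-first search over a tiny in-memory graph. The edge list, the placeholder names "Movie A", "Actor X" and "Movie B", and the helper functions are all hypothetical and are not taken from the sample dataset. Unlike the Cypher query, which returns every matching path, this sketch stops at the first (shortest) path it finds.

```python
from collections import deque

# Hypothetical toy data: undirected links between actors and movies.
EDGES = [
    ("Arnold Schwarzenegger", "Movie A"),
    ("Movie A", "Actor X"),
    ("Actor X", "Movie B"),
    ("Movie B", "Sam Worthington"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def find_path(adjacency, start, goal, max_depth):
    """Return one shortest path of at most max_depth hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # depth bound reached, do not expand this path further
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(EDGES)
# The two actors are 4 hops apart in this toy graph.
print(find_path(adjacency, "Arnold Schwarzenegger", "Sam Worthington", max_depth=5))
print(find_path(adjacency, "Arnold Schwarzenegger", "Sam Worthington", max_depth=3))  # None
```

With a bound of 5 the path is found; with a bound of 3 the search gives up early instead of exploring the rest of the graph.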

Related

How to get all connected nodes in neo4j

I want to get a list of all connected nodes starting from node 0, as shown in the diagram.
Based on your comment:
I want to get a list of all the connected nodes. For example in the
above case when I search for connected nodes for 0, it should return
nodes- 1,2,3
This query will do what you want:
MATCH ({id : 0})-[*]-(connected)
RETURN connected
The above query returns all nodes connected to the node with id=0 (I'm assuming that the numbers inside the nodes are values of an id property), at any depth, in both directions, and for any relationship type. See the "Relationships in depth" section of the docs.
While this works fine for small graphs, note that it is a very expensive operation: it walks the entire graph reachable from the start point ({id : 0}), following every relationship type. This is really not a good idea for production environments.
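As a rough illustration of what "all connected nodes at any depth, in both directions" means, here is a minimal Python sketch that walks an in-memory edge list. The edge data is hypothetical (the diagram from the question is not reproduced here), and the function is only an analogy for the traversal, not how Neo4j executes the query. Unlike the Cypher pattern, it visits each node once and leaves out the start node.

```python
def connected_nodes(edges, start):
    """Return every node reachable from start, ignoring direction, excluding start."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    seen.discard(start)
    return seen


# Hypothetical edges: nodes 0-3 form one component, nodes 4-5 another.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]
print(sorted(connected_nodes(edges, 0)))  # [1, 2, 3]
```

The visited set is what keeps the cost proportional to the size of the component rather than to the number of distinct paths through it.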
If you wish to match the nodes that have a relationship to another node, you can use this:
MATCH (n) MATCH (n)-[r]-() RETURN n,r
It returns all the nodes that have a relationship to at least one other node, irrespective of the direction of the relationship.
If you wish to add a constraint you can do it this way:
MATCH (n:Label {id:"id"}) MATCH (n)-[r]-() RETURN n,r
For larger or more heavily interconnected graphs, APOC Procedures offers a more efficient means of traversal that returns all nodes in a subgraph.
As others have already mentioned, it's best to use labels on your nodes, and add either an index or a unique constraint on the label+property for fast lookup of your starting node.
Using a label of "Label", and a parameter of idParam, a query to get nodes of the subgraph with APOC would be:
MATCH (n:Label {id:$idParam})
CALL apoc.path.subgraphNodes(n, {minLevel:1}) YIELD node
RETURN node
Nodes will be distinct, and the starting node will not be returned with the rest.
EDIT
There's currently a restriction preventing usage of minLevel in subgraphNodes(). You can either filter out the starting node yourself, or use apoc.path.expandConfig() with uniqueness:'NODE_GLOBAL' to get the same effect.

How to extract a graph out of Neo4J and reconstruct it in the programming language

Consider I have a bunch of connected nodes in Neo4J, forming a tree or a graph or whatever, and I want to have them in the programming language that I'm using (I'm using Java but that's not important).
I know I can have them all with a single cypher query like this:
MATCH (n0:Root)-[:Child*0..]->(nx:Node) WHERE ID(n0) = 1 RETURN nx;
But the problem I have here is that once returned to Java, I don't know which node is connected to which! How can I return the data so I can reconstruct the graph in my programming language?
I can see that the Neo4J web interface is doing that but I don't know how!?
In your query you are returning only :Node and not any relationship info or :Root nodes.
One example is to return the ids of the nodes and the type of the relationship between them:
MATCH (s)-[r]->(t)
RETURN id(s) AS source, id(t) AS target, type(r) AS relationship_type
You can modify this query depending on what you want to export.
The whole idea is to return nodes in pairs (source)->(destination). If you want to export only a specific subgraph that is connected to a specific starting node labeled :Root, you can return the graph like this:
MATCH (n0:Root:Node)-[:Child*0..]->(n1:Node)-[:Child]->(n2:Node)
RETURN n1, n2;
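Once the result comes back as (source, target) pairs, rebuilding the structure on the client side is mostly bookkeeping. Below is a minimal Python sketch under the assumption that each row has the shape (source id, target id, relationship type); the sample rows and the rebuild_graph helper are hypothetical, and fetching the rows through a driver is left out. The same idea carries over to Java with a Map of lists.

```python
from collections import defaultdict


def rebuild_graph(rows):
    """Turn (source_id, target_id, relationship_type) rows into an adjacency map."""
    children = defaultdict(list)
    nodes = set()
    for source, target, rel_type in rows:
        children[source].append((rel_type, target))
        nodes.update((source, target))
    return nodes, dict(children)


# Hypothetical rows, shaped like the result of a query returning
# id(s) AS source, id(t) AS target, type(r) AS relationship_type.
rows = [
    (1, 2, "Child"),
    (1, 3, "Child"),
    (3, 4, "Child"),
]
nodes, children = rebuild_graph(rows)
print(sorted(nodes))  # [1, 2, 3, 4]
print(children)       # {1: [('Child', 2), ('Child', 3)], 3: [('Child', 4)]}
```

Nodes that never appear as a source (here 2 and 4) are the leaves; they are still recorded in the node set.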
As an alternative, if you have access to APOC Procedures, you can take advantage of apoc.path.subgraphAll(), which gives you a list of all nodes in the subgraph, and all relationships between nodes in the subgraph.
MATCH (n0:Root)
CALL apoc.path.subgraphAll(n0,{relationshipFilter:'Child>'}) YIELD nodes, relationships
...

How to filter results by node label in neo4j cypher?

I have a graph database that maps out connections between buildings and bus stations, where the graph contains other connecting pieces like roads and intersections (among many node types).
What I'm trying to figure out is how to filter a path down to only return specific node types. I have two related questions that I'm currently struggling with.
Question 1: How do I return the labels of nodes along a path?
It seems like a logical first step is to determine what type of nodes occur along the path.
I have tried the following:
MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
WITH nodes(p) AS nodes
RETURN DISTINCT labels(nodes);
However, I'm getting a type exception error that labels() expects data of type node and not Collection. I'd like to dynamically know what types of nodes are on my paths so that I can eventually filter my paths.
Question 2: How can I return a subset of the nodes in a path that match a label I identified in the first step?
Say I found that between (a:Building) and (b1:Bus) and (b2:Bus) I can expect to find (:Intersection) nodes and (:Street) nodes.
This is a simplified model of my graph:
(a:Building)--(:Street)--(:Street)--(b1:Bus)
            \--(:Street)--(:Intersection)--(:Street)--(b2:Bus)
I've written a MATCH statement that would look for all possible paths between (:Building) and (:Bus) nodes. What would I need to do next to filter to selectively return the Street nodes?
MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
// Insert logic to only return (:Street) nodes from p
Any guidance on this would be greatly appreciated!
To get the distinct labels along matching paths:
MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
WITH NODES(p) AS nodes
UNWIND nodes AS n
WITH LABELS(n) AS ls
UNWIND ls AS label
RETURN DISTINCT label;
To return the nodes that have the Street label:
MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
WITH NODES(p) AS nodes
UNWIND nodes AS n
WITH n
WHERE 'Street' IN LABELS(n)
RETURN n;
Cybersam's answers are good, but their output is simply a column of labels, so the path information is lost completely. If there are multiple paths from a :Building to a :Bus, the first query outputs every label found on any node in any matched path. You cannot tell how many paths exist, which labels occur in some paths but not others, or which labels are common to several paths.
Likewise, the second query loses path information: if multiple paths use different streets to get from a :Building to a :Bus, it returns all streets involved in all paths. It could end up returning every street in your graph, which doesn't seem very useful.
You need queries that preserve path information.
For 1, finding the distinct labels on nodes on each path I would offer this query:
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
WITH REDUCE(myLabels = [], node in nodes | myLabels + labels(node)) as myLabels
RETURN DISTINCT myLabels
For 2, this query preserves path information:
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
WITH FILTER(node in nodes WHERE (node:Street)) as pathStreets
RETURN pathStreets
Note that these are both expensive operations, as they perform a cartesian product of all buildings and all buses, as in the queries in your description. I highly recommend narrowing down the buildings and buses you're matching upon, hopefully to very few or specific buildings at least.
I also encourage limiting how deep you're looking in your pattern. I get the idea that many, if not most, of your nodes in your graph are connected by :CONNECTED_TO relationships, and if we don't cap that to a reasonable amount, your query could be finding every single path through your entire graph, no matter how long or convoluted or nonsensical, and I don't think that's what you want.
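The difference between flattening everything into one column and keeping one result per path can be shown with plain lists. The following Python sketch is only an analogy: the paths, node names, and helper functions are hypothetical, modelled on the simplified graph in the question. Note that it deduplicates labels within each path, which the REDUCE query does not do.

```python
def labels_per_path(paths):
    """For each path, collect the distinct labels of its nodes in first-seen order."""
    result = []
    for path in paths:
        seen = []
        for _name, labels in path:
            for label in labels:
                if label not in seen:
                    seen.append(label)
        result.append(seen)
    return result


def nodes_with_label(paths, wanted):
    """For each path, keep only the names of nodes that carry the wanted label."""
    return [[name for name, labels in path if wanted in labels] for path in paths]


# Hypothetical paths; each node is a (name, labels) pair.
paths = [
    [("a", ["Building"]), ("s1", ["Street"]), ("s2", ["Street"]), ("b1", ["Bus"])],
    [("a", ["Building"]), ("s3", ["Street"]), ("i1", ["Intersection"]),
     ("s4", ["Street"]), ("b2", ["Bus"])],
]
print(labels_per_path(paths))
print(nodes_with_label(paths, "Street"))  # [['s1', 's2'], ['s3', 's4']]
```

Because each path produces its own list, it stays visible that only the second path passes through an intersection.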

Find the nodes in a Neo subgraph

I have a cyclic subgraph. I would like to know all the relationships in that subgraph. I don't know how deep the subgraph is, nor do I want to hardcode any relationship types.
The best thing I have found so far is driven by this snippet.
match(n:X)-[r*]->(m)
From r, I can find what I need. However, even for a small subgraph the cardinality of r* can be 30k or more. There is no point in having Neo calculate every path through the subgraph. I really just need the nodes or the individual relationships (preferred).
What is a way to just get the individual relationships in a subgraph? We're using Cypher.
Cypher provides no way to get all the relationships in a subgraph without following the paths. Besides, it has to explore those paths anyway in order to figure out what nodes and relationships belong to the subgraph.
To ensure that you get each relationship in a cyclic subgraph only once, you can do this:
MATCH p=(:Foo)-[*]->()
WITH RELATIONSHIPS(p) AS rels
UNWIND rels AS r
RETURN DISTINCT r;
Note, however, that variable-length path queries with no upper bound can be very expensive and may end up running "forever".
Alternate approach
If you can identify all the nodes in the desired subgraph, then there can be a more performant approach.
For example, let's suppose that all the nodes in the desired subgraph (and only those nodes) have the label X. In that case, this quick query will return all the relationships in the subgraph:
MATCH (:X)-[r]->()
RETURN r;
You can collect all nodes in a connected component with a breadth-first or depth-first search without a filter.
The neo4j REST API has a traversal endpoint which can be used to do exactly that. It's not a Cypher query, but it could solve your problem: http://neo4j.com/docs/stable/rest-api-traverse.html
You can POST something like this against a node. There are options to visit each node only once; I'm not sure, but this might help with a cyclic graph.
{
  "order" : "breadth_first",
  "uniqueness" : "node_global",
  "return_filter" : {
    "language" : "builtin",
    "name" : "all"
  },
  "max_depth" : 20
}
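To see why visiting each node only once sidesteps the path explosion in a cyclic subgraph, here is a minimal Python sketch of a breadth-first traversal with a global visited set and a depth cap. It is an in-memory analogy, not the REST API's implementation; the edge list and the subgraph_relationships helper are hypothetical.

```python
from collections import deque


def subgraph_relationships(edges, start, max_depth):
    """Expand each reachable node once and record each outgoing relationship once."""
    outgoing = {}
    for source, target in edges:
        outgoing.setdefault(source, []).append((source, target))

    visited = {start}
    relationships = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth cap reached, do not expand this node
        for edge in outgoing.get(node, []):
            relationships.append(edge)  # a node is expanded once, so its edges appear once
            target = edge[1]
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return visited, relationships


# Hypothetical cyclic subgraph: a -> b -> c -> a, plus c -> d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
nodes, rels = subgraph_relationships(edges, "a", max_depth=20)
print(sorted(nodes))  # ['a', 'b', 'c', 'd']
print(rels)           # [('a', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'd')]
```

The work done is proportional to the number of nodes and relationships, even though the cycle makes the number of distinct paths much larger.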

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind, they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to pass the graph via a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is that I don't know how to tell Cypher to do this. The above query works for a depth of 1 or 2, but it should use [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at each outgoing relationship at every single path depth relative to the start node, but as I already explained, there is only one right path. So it should do some kind of DFS instead of a BFS. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use your own unique IDs, store them in your nodes, and use them for your GameIDs. If you do this, you should also create a uniqueness constraint for your own IDs; as a nice side effect, this automatically creates an index for them.
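The benefit of restricting the relationships up front can be sketched outside the database. In the following Python illustration, the moves are indexed by (source node, GameID), so following one game is a chain of lookups instead of a fan-out over every outgoing relationship. The data and the follow_game helper are hypothetical, and the sketch assumes at most one outgoing move per node and GameID, as described in the question.

```python
def follow_game(moves, start, game_id):
    """Follow the single chain of moves tagged with game_id, starting at start."""
    # Index relationships by (source node, GameID) so each step is one lookup.
    next_position = {}
    for source, target, rel_game_id in moves:
        next_position[(source, rel_game_id)] = target

    path = [start]
    seen = {start}
    current = start
    while (current, game_id) in next_position:
        current = next_position[(current, game_id)]
        if current in seen:
            break  # guard against a cycle in malformed data
        seen.add(current)
        path.append(current)
    return path


# Hypothetical moves as (source, target, GameID); two games share positions 3 and 10.
moves = [
    (3, 10, "3"),
    (3, 11, "7"),
    (10, 20, "3"),
    (10, 21, "7"),
    (20, 30, "3"),
]
print(follow_game(moves, 3, "3"))  # [3, 10, 20, 30]
```

Each step touches only the one relationship that belongs to the game, so the cost grows with the length of the game rather than with the degree of the nodes along the way.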
