Neo4j: Find Person who acted in and directed any movies

Neo4j: Find Person who acted in and directed any movies - neo4j

I want to display all Person who act as well as direct a movie. It does not matter if the person direct a movie but does not act in the movie. As long as there are edges ACTED_IN and DIRECTED exist on a node, the query will display the result.
I have tried several Cypher queries. This one I believe show the nearest result I intend to:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE exists( (p)-[:DIRECTED]->() )
RETURN distinct *
Now, the issue is, one of the result shows "James Marshall" ACTED_IN "A Few Good Men" but he also DIRECTED two different movies which are "Ninja Assasin" and "V for Vendetta".
My current outcome only display "James Marshall" ACTED_IN "A Few Good Men" and does not show the other two movies he DIRECTED. So, how can I improve my Cypher?

You can first match on persons that have the relationships you need (this will be a degree check), then MATCH on the pattern using both relationships at once (which would be an OR match for the relationships in question otherwise):
MATCH (p:Person)
WHERE (p)-[:ACTED_IN]->() AND (p)-[:DIRECTED->()
MATCH path = (p)-[:ACTED_IN | DIRECTED]->(m:Movie)
RETURN path

You can use the solution in this post
The key point is to add another relationships in the other way around (right to the left to the movie), and then add in the WHERE clause the condition.
MATCH (a1:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(a2:Person)
WHERE exists( (a2)-[:DIRECTED]->(m) ) RETURN a2, m

Related

Enforce order of relations in multipath queries

I'm looking into neo4j as a Graph database, and variable length path queries will be a very important use case. I now think I've found an example query that Cypher will not support.
The main issue is that I want to treat composed relations as a single relation. Let my give an example: finding co-actors. I've done this using the standard database of movies. The goal is to find all actors that have acted alongside Tom Hanks. This can be found with the query:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:ACTED_IN]-(a:Person) return a
Now, what if we want to find co-actors of co-actors recursively.
We can rewrite the above query to:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*2]-(a:Person) return a
And then it becomes clear we can do this with
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*]-(a:Person) return a
Notably, all odd-length paths are excluded because they do not end in a Person.
Now, I have found a query that I cannot figure out how to make recursive:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-(a:Person) return DISTINCT a
In words, all actors that have a director in common with Tom Hanks.
In order to make this recursive I tried:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN|DIRECTED*]-(a:Person) return DISTINCT a
However, (besides not seeming to complete at all). This will also capture co-actors.
That is, it will match paths of the form
()-[:ACTED_IN]->()<-[:ACTED_IN]-()
So what I am wondering is:
can we somehow restrict the order in which relations occur in a multi-path query?
Something like:
MATCH (tom {name: "Tom Hanks"}){-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-}*(a:Person) return DISTINCT a
Where the * applies to everything in the curly braces.

The path expander procs from APOC Procedures should help here, as we added the ability to express repeating sequences of labels, relationships, or both.
In this case, since you want to match on the actor of the pattern rather than the director (or any of the movies in the path), we need to specify which nodes in the path you want to return, which requires either using the labelFilter in addition to the relationshipFilter, or just to use the combined sequence config property to specify the alternating labels/relationships expected, and making sure we use an end node filter on the :Person node at the point in the pattern that you want.
Here's how you would do this after installing APOC:
MATCH (tom:Person {name: "Tom Hanks"})
CALL apoc.path.expandConfig(tom, {sequence:'>Person, ACTED_IN>, *, <DIRECTED, *, DIRECTED>, *, <ACTED_IN', maxLevel:12}) YIELD path
WITH last(nodes(path)) as person, min(length(path)) as distance
RETURN person.name
We would usually use subgraphNodes() for these, since it's efficient at expanding out and pruning paths to nodes we've already seen, but in this case, we want to keep the ability to revisit already visited nodes, as they may occur in further iterations of the sequence, so to get a correct answer we can't use this or any of the procs that use NODE_GLOBAL uniqueness.
Because of this, we need to guard against exploring too many paths, as the permutations of relationships to explore that fit the path will skyrocket, even after we've already found all distinct nodes possible. To avoid this, we'll have to add a maxLevel, so I'm using 12 in this case.
This procedure will also produce multiple paths to the same node, so we're going to get the minimum length of all paths to each node.
The sequence config property lets us specify alternating label and relationship type filterings for each step in the sequence, starting at the starting node. We are using an end node filter symbol, > before the first Person label (>Person) indicating that we only want paths to the Person node at this point in the sequence (as the first element in the sequence it will also be the last element in the sequence as it repeats). We use the wildcard * for the label filter of all other nodes, meaning the nodes are whitelisted and will be traversed no matter what their label is, but we don't want to return any paths to these nodes.

If you want to see all the actors who acted in movies directed by directors who directed Tom Hanks, but who have never acted with Tom, here is one way:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->(m)
MATCH (m)<-[:ACTED_IN]-(ignoredActor)
WITH COLLECT(DISTINCT m) AS ignoredMovies, COLLECT(DISTINCT ignoredActor) AS ignoredActors
UNWIND ignoredMovies AS movie
MATCH (movie)<-[:DIRECTED]-()-[:DIRECTED]->(m2)
WHERE NOT m2 IN ignoredMovies
MATCH (m2)<-[:ACTED_IN]-(a:Person)
WHERE NOT a IN ignoredActors
RETURN DISTINCT a
The top 2 MATCH clauses are deliberately not combined into one clause, so that the Tom Hanks node will be captured as an ignoredActor. (A MATCH clause filters out any result that use the same relationship twice.)

get output of query in source-edge-destination format

I am new to cypher. I am using the movie database of neo4j CE 3.1.2.
I want to get output of my query
match (a:Person)-[r]-(b:Movie) return a, r, b
in the form source, edge, destination, for example
Tom Hanks, ACTED_IN, That Thing You Do
Tom Hanks, DIRECTED, That Thing You Do
Gary Sinise, ACTED_IN, The Green Mile
Basically I want a row for every source_node, destination_node connection.
Please help.
EDIT 1
If my graph is like A->B->C->D and if I want all down tree nodes of A, then my output should be like
A,connected,B
B,connected,C
C,connected,D

You're looking at the graphical view of the results. If you look to the left of the graph, you'll see the ability to change to Row and Text views of the results, which should give you what you want...almost.
Output for relationships currently shows only properties on the relationships, not the type, so you may need to adjust your query slightly.
match (a:Person)-[r]-(b:Movie)
return a, type(r) as type, b
If you want ONLY the person's name, the type of the relationship, and the movie title, then you'll need to return the properties from the nodes, not just the nodes themselves:
match (a:Person)-[r]-(b:Movie)
return a.name as name, type(r) as type, b.title as title

Neo4J contains all Query

I'm brand new to neo4j and graph databases.
I'm trying to create a query I would describe as a 'contains all' however I think I'm very far away and not sure how to progress
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actors)
-[:guest_stars_in]-(movie2)
RETURN movie2.name
Let's say
MATCH (movie:Movie{name:'tropic thunder'})-[:stars_in]-(actors)
returns 5 actors
I'm looking to match exactly (all 5 actors -> same 5 actors as guest stars) or as a subset (all 5 actors are a subset of a movie which has 10 guest stars).
Hope that makes sense. Thanks for your help :D

The first thing I would point out is that you should call the variable actor instead of actors. It may seem picky, but it's a common confusion with Cypher. With the MATCH you are matching one sub-pattern at a time.
So to start out let's find each movie2 and get an array of the actors in question:
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actor)
-[:guest_stars_in]-(movie2)
RETURN movie2.name, collect(actor)
A first instinct might be to extend the path like so:
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actor)
-[:guest_stars_in]-(movie2)-[:guest_starts_in]-(actor2)
But again, we're matching every possible match of that path in the database. So for each actor, we're going to match all possible actor2s, which would lead to duplicates.
What we can do, though, is to take our first query and change the RETURN to a WITH in order to pass our data onto a second part of the query:
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actor)
-[:guest_stars_in]-(movie2)
WITH movie2, collect(actor) AS original_movie_actors
MATCH movie2-[:guest_stars_in]-(guest_star)
RETURN movie2.name, original_movie_actors, collect(guest_star) AS guest_stars
This gives us
a list of movies in question
the list of the actors who both stared in "tropic thunder" and guest stared in the movie in question
all guest stars for the movie in question
From here you could probably figure it out in your programming language of choice. But we can figure this out in Cypher too:
MATCH (movie:Movie {name:'tropic thunder'})-[:stars_in]-(actor)
-[:guest_stars_in]-(movie2)
WITH movie2, collect(actor) AS original_movie_actors
MATCH movie2-[:guest_stars_in]-(guest_star)
WITH movie2, original_movie_actors, collect(guest_star) AS guest_stars
RETURN
movie.name,
ALL(guest_star IN guest_stars WHERE guest_star IN original_movie_actors) AS all_matched,
length(original_movie_actors) / length(guest_stars) AS percentage_match
I threw in a percentage_match as a double-check and in case that's useful

Cypher path querying (using Neo4j)

I have a graph datebase so that there is in it some pattern like this one:
(n1)-[:a]->(n2),
(n1)-[:b]->(n2),
(n1)-[:c]->(n2),
(n1)-[:e]->(n2),
(n1)-[:d]->(n3),
(n2)-[:b]->(n4)
And I want to have all graph with this pattern
MATCH p={
(n3)<-[:d]-(n1)-[:a]->(n2)-[:b]->(n4),
(n1)-[:b]->(n2)<-[:c]-(n1),
(n1)-[:e]->(n2)
}
RETURN p
Is it possible? I've search a little but I haven't found how to do it.
I know we can use "|" for a type like this
()-[:a|b]->()
but there is no "&" and the path assigning only works on pattern which are written without ",".
Thanks
EDIT:
If it could help, here is another example of what I'm seeking:
In a database with movies, person and relations like ACTED_IN, KNOWS, FRIEND and HATE
I want all the graphs containing an actor "Actor1" (who ACTED_IN a movie "M") who KNOWS "Person1", FRIEND "Person2" and HATE "Person3" which ACTED_IN the same movie "M".
An UNION like the one in the answer of "Michael Hunger" does not work because we have multiple subgraphs and not graphs. Moreover, some subgraph might not be correct answers for the bigger pattern.

Your query will be very inefficient, as you don't restrict your search to a set of start nodes neither with labels or label+property combinations !!!!
You can use UNION for that:
MATCH p=(n3)<-[:d]-(n1)-[:a]->(n2)-[:b]->(n4) RETURN p
UNION
MATCH p=(n1)-[:b]->(n2)<-[:c]-(n1) RETURN p
UNION
MATCH p=(n1)-[:e]->(n2) RETURN p

Please clarify the query used in Neo4j online course

I am going through the online course at http://www.neo4j.org/learn/online_course.
In that under the section (Graph LAB) - (Paths), the below query was used to RETURN all of the actors and directors in all of the movies.
MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, m.title, d.name;
It is perfectly alright.
To the next question "How would you change this query to RETURN only the directors who acted in their own movies?"
They gave the solution as change (d) to (a). So the query is,
MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(a)
RETURN a.name, m.title, a.name;
When i execute this query it throws output as "Unforgiven, Clint Eastwood".
But my doubt is when i look at the interactive network diagram, the node "Clint Eastwood" was connected to the "Movie" node only through the relationship "DIRECTED". There was not a separate relationship "ACTED_IN". Then how neo4j selects only "Unforgiven, Clint Eastwood" and discards other directors.
Please clarify this. Is my understanding wrong.
r karthik.

Yes, the DIRECTED relationship is overlaying the ACTED_IN one. You can see the ACTED_IN relationship after deleting the DIRECTED relationship:
MATCH (a)-[:ACTED_IN]->(m)<-[d:DIRECTED]-(a)
DELETE d;
I agree that the visualization should somehow avoid letting overlays like this happen. You should bring this to the attention of the neo4j folks. At the very least, they may want to change the tutorial to avoid this situation.

This works for me:
MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(a)
RETURN a.name, m.title;

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Neo4j: Find Person who acted in and directed any movies - neo4j

Related

Enforce order of relations in multipath queries

get output of query in source-edge-destination format

Neo4J contains all Query

Cypher path querying (using Neo4j)

Please clarify the query used in Neo4j online course

Categories

Resources