Basic questions about path syntax in Cypher - neo4j

I have questions about the syntax of this Cypher query:
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies)
RETURN tom, tomHanksMovies
I swear I've seen some paths that have two dashes, as in --[:ACTED_IN]. What is the difference between two and one dash?
The relationship in the MATCH pattern is: [:ACTED_IN]. I think it's safe to say that the key is missing because there is no need for an identifier.
By extension, then why doesn't (tomHanksMovies) need to be written to explicitly show that it is basically just an identifier, as in (tomHanksMovies:)? Or is it not an identifier? I've read it called a variable as well. What's the correct terminology?

You would have seen Cypher patterns like this: (a)-->(b), but never (a)--[:ACTED_IN]->(b), as the latter is not legal. The -- syntax just means that there is a relationship, but the relationship type does not matter (and you don't need to use any relationship properties).
You indicate an identifier as the first string after the ( for a node or [ for a relationship, as long as the string does not start with a : or { character. The : character is used before a node label or relationship type. The { and } characters are used enclose property name/value pairs.
An identifier is referred to that way in the neo4j documentation, so that is the preferred name. However, people often use variable as well.

Related

How to create a relationship with cypher when two nodes of different types has the same value for a certain property?

The two types of nodes are Address and Wallet respectively. The property to be matched is called primWallAddr.
I am not sure if this would work:
MATCH (addr:Address {primWallAddr}), (wa:Wallet {primWallAddr})
CREATE (addr)-[:belongsTo]->(wa)
Or do I have to use "where"? (If yes, an example would be nice)
Sorry for such a basic question. I want to be sure and could not find something comparable (in my opinion) online.
You need to modify your query a bit for it to work:
MATCH (addr:Address), (wa:Wallet) WHERE addr.primWallAddr=wa.primWallAddr
CREATE (addr)-[:belongsTo]->(wa)

What is pattern comprehension and custom projection in neo4j cypher

I was reading cypher refcard in which I came across following:
Pattern comprehensions may be used to do a custom projection from a match directly into a list:
MATCH (a)
RETURN [(a)-->(b) WHERE b.name = 'Bob' | b.age]
I prepared simple graph and tried similar looking queries on it. But it kept giving error Invalid input 'W': expected whitespace, comment, a relationship pattern on WHERE.
Q1. Whats the meaning of the above cypher, should it return all paths (a)-->(b) with b.name=Bob or return b.age?
Q2. I never saw path specification (a)-->(b) after RETURN. Obviously I am missing some basics here. Whats that?
NOTE: Pattern comprehension was only introduced in Neo4j 3.1, versions 3.0.x and below won't have this feature.
Answer to Q1: The meaning in this example is: "Given variable a (since it is in scope from earlier in the query) find an outgoing relation to some node and bind it to the variable b where node b's name property is 'Bob'. Populate a list with the age property of every b node.
The | in this context separates the pattern and where clause from the expression of what values to populate into the resulting list.
Not sure I'm following what you're asking about in Q2.
For your specific usage, why it's giving you an error, we need to be able to see what you're doing with it to figure out the problem. Can you add that to your description?
Though if I was to venture a guess, you might be using a pattern in the pattern comprehension that doesn't have any relationships, something like this:
return [(a:Person) | a.name] as names
Currently usages like this will fail when there is no relationship in the pattern, something I consider a bug, and filed as such to the issues list.
For more info, here's the pattern comprehension entry in the dev guide, and a longer writeup about pattern comprehension (and map projection).

Back-reference neo4j

Can I use some back-reference sort of mechanism in neo4j? I'm not interested in what matches the query, just that it is the same thing in many places. Something like:
MATCH (a:Event {diagnosis1:11})
MATCH (b:Event {diagnosis1:15})
MATCH (c:Event {diagnosis1:5})
MATCH (a)-[rel:Next {PatientID:*}]->(b)
MATCH (b)-[rel1:Next {PatientID:\{1}]->(c)
The idea is that I just require the attribute IDs from both edges to be the same, without specifying it. The whole purpose of it would be not generating all possible matchings, to then filter them, but only hop in the specific places.
I've asked something similar in a more specific way here.
Edit: I know WHERE clauses can be used for that, but they filter the query AFTER matching the edges and nodes. I want to do that DURING the matching!
Use a WHERE clause with simple references, there's no need for back references:
MATCH (a)-[rel:Next]->(b)
MATCH (b)-[rel1:Next]->(c)
WHERE rel.PatientID = rel1.PatientID
Update
First of all, Cypher is a declarative query language: you express what you want, the runtime takes care of executing and optimizing it any way it can, so it's not that obvious that it would do it the way you think it will, or that using "back references" would magically solve the problem; it's just another way of writing the same thing.
So, your problem is that the match creates all the relationship pairs before filtering them. How about splitting the match in 2 phases using WITH?
MATCH (a:Event {diagnosis1:11})-[rel:Next]->(b:Event {diagnosis1:15})
WITH a, b, rel
MATCH (b)-[rel1:Next]->(c:Event {diagnosis1:5})
WHERE rel1.PatientID = rel.PatientID
That should only select the second relationships that match the first, but I'm not sure if it's an O(n^2) algorithm in Cypher's runtime.
Otherwise, if you drop to the Java API (which would mean either an extension or a procedure, depending on your version of Neo4j), you can probably implement in O(n) by
scanning all the relationships between a and b, indexing them by PatientID in some multimap (see Guava, or use a Map<K, Collection<V>>); this is O(n)
then doing the same for all the relationships between b and c, still O(n)
iterate on the keys of one multimap to get the values in both and match them, still O(n)

What does a comma in a Cypher query do?

A co-worker coded something like this:
match (a)-[r]->(b), (c) set c.x=y
What does the comma do? Is it just shorthand for MATCH?
Since Cypher's ASCII-art syntax can only let you specify one linear chain of connections in a row, the comma is there, at least in part, to allow you to specify things that might branch off. For example:
MATCH (a)-->(b)<--(c), (b)-->(d)
That represents three nodes which are all connected to b (two incoming relationships, and one outgoing relationship.
The comma can also be useful for separating lines if your match gets too long, like so:
MATCH
(a)-->(b)<--(c),
(c)-->(d)
Obviously that's not a very long line, but that's equivalent to:
MATCH
(a)-->(b)<--(c)-->(d)
But in general, any MATCH statement is specifying a pattern that you want to search for. All of the parts of that MATCH form the pattern. In your case you're actually looking for two unconnected patterns ((a)-[r]->(b) and (c)) and so Neo4j will find every combination of each instance of both patterns, which could potentially be very expensive. In Neo4j 2.3 you'd also probably get a warning about this being a query which would give you a cartesian product.
If you specify multiple matches, however, you're asking to search for different patterns. So if you did:
MATCH (a)-[r]->(b)
MATCH (c)
Conceptually I think it's a bit different, but the result is the same. I know it's definitely different with OPTIONAL MATCH, though. If you did:
MATCH (a:Foo)
OPTIONAL MATCH (a)-->(b:Bar), (a)-->(c:Baz)
You would only find instances where there is a Foo node connected to nothing, or connected to both a Bar and a Baz node. Whereas if you do this:
MATCH (a:Foo)
OPTIONAL MATCH (a)-->(b:Bar)
OPTIONAL MATCH (a)-->(c:Baz)
You'll find every single Foo node, and you'll match zero or more connected Bar and Baz nodes independently.
EDIT:
In the comments Stefan Armbruster made a good point that commas can also be used to assign subpatterns to individual identifiers. Such as in:
MATCH path1=(a)-[:REL1]->(b), path2=(b)<-[:REL2*..10]-(c)
Thanks Stefan!
EDIT2: Also see Mats' answer below
Brian does a good job of explaining how the comma can be used to construct larger subgraph patterns, but there's also a subtle yet important difference between using the comma and a second MATCH clause.
Consider the simple graph of two nodes with one relationship between them. The query
MATCH ()-->() MATCH ()-->() RETURN 1
will return one row with the number 1. Replace the second MATCH with a comma, however, and no rows will be returned at all:
MATCH ()-->(), ()-->() RETURN 1
This is because of the notion of relationship uniqueness. Inside each MATCH clause, each relationship will be traversed only once. That means that for my second query, the one relationship in the graph will be matched by the first pattern, and the second pattern will not be able to match anything, leading to the whole pattern not matching. My first query will match the one relationship once in each of the clauses, and thus create a row for the result.
Read more about this in the Neo4j manual: http://neo4j.com/docs/stable/cypherdoc-uniqueness.html

Neo4j node property type

I'm playing around with neo4j, and I was wondering, is it common to have a type property on nodes that specify what type of Node it is? I've tried searching for this practice, and I've seen some people use name for a purpose like this, but I was wondering if it was considered a good practice or if indexes would be the more practical method?
An example would be a "User" node, which would have type: user, this way if the index was bad, I would be able to do an all-node scan and look for types of user.
Labels have been added to neo4j 2.0. They fix this problem.
You can create nodes with labels:
CREATE (me:American {name: "Emil"}) RETURN me;
You can match on labels:
MATCH (n:American)
WHERE n.name = 'Emil'
RETURN n
You can set any number of labels on a node:
MATCH (n)
WHERE n.name='Emil'
SET n :Swedish:Bossman
RETURN n
You can delete any number of labels on a node:
MATCH (n { name: 'Emil' })
REMOVE n:Swedish
Etc...
True, it does depend on your use case.
If you add a type property and then wish to find all users, then you're in potential trouble as you've got to examine that property on every node to get to the users. In that case, the index would probably do better- but not in cases where you need to query for all users with conditions and relations not available in the index (unless of course, your index is the source of the "start").
If you have graphs like mine, where a relation type implies two different node types like A-(knows)-(B) and A or B can be a User or a Customer, then it doesn't work.
So your use case is really important- it's easy to model graphs generically, but important to "tune" it as per your usage pattern.
IMHO you shouldn't have to put a type property on the node. Instead, a common way to reference all nodes of a specific "type" is to connect all user nodes to a node called "Users" maybe. That way starting at the Users node, you can very easily find all user nodes. The "Users" node itself can be indexed so you can find it easily, or it can be connected to the reference node.
I think it's really up to you. Some people like indexed type attributes, but I find that they're mostly useful when you have other indexed attributes to narrow down the number of index hits (search for all users over age 21, for example).
That said, as #Luanne points out, most of us try to solve the problem in-graph first. Another way to do that (and the more natural way, in my opinion) is to use the relationship type to infer a practical node type, i.e. "A - (knows) -> B", so A must be a user or some other thing that can "know", and B must be another user, a topic, or some other object that can "be known".
For client APIs, modeling the element type as a property makes it easy to instantiate the right domain object in your client-side code so I always include a type property on each node/vertex.
The "type" var name is commonly used for this, but in some languages like Python, "type" is a reserved word so I use "element_type" in Bulbs ( http://bulbflow.com/quickstart/#models ).
This is not needed for edges/relationships because they already contain a type (the label) -- note that Neo4j also uses the keyword "type" instead of label for relationships.
I'd say it's common practice. As an example, this is exactly how Spring Data Neo4j knows of which entity type a certain node is. Each node has "type" property that contains the qualified class name of the entity. These properties are automatically indexed in the "types" index, thus nodes can be looked up really fast. You could implement your use case exactly like this.
Labels have recently been added to Neo4j 2.0 ( http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-labels.html ). They are still under development at the moment, but they address this exact problem.

Resources