I have a question about extracting specific elements from array-valued properties in Neo4j. For example, suppose each node in our database has a property 'Scores', an integer array of length 4. Is there a way to extract the first and fourth elements for every node in a path, i.e. can we do something along the lines of:
start src=node(1), end =node(7)
match path=src-[*..2]-end
return extract(n in nodes(path)| n.Scores[1], n.Scores[4]);
p.s. I am using Neo4j 2.0.0-RC1
Does this work for you?
START src=node(1), end=node(7)
MATCH path=src-[*..2]-end
RETURN extract(n in nodes(path)| [n.Scores[0], n.Scores[3]] )
Basically, that creates a collection for each node holding its 1st and 4th score (indexes start at 0). See 8.2.1. Expressions in general
An expression in Cypher can be:
...
A collection of expressions:
["a", "b"], [1,2,3],["a", 2, n.property, {param}], [ ].
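To make the shape of the result concrete, here is a small Python model of what the extract expression computes. The node dictionaries and score values are made up for illustration; only the zero-based indexing mirrors the Cypher query.

```python
# Hypothetical stand-ins for the nodes on one path; each has a 4-element Scores array.
path_nodes = [
    {"Scores": [10, 20, 30, 40]},
    {"Scores": [5, 6, 7, 8]},
]


def first_and_fourth(nodes):
    """Mirror extract(n in nodes(path) | [n.Scores[0], n.Scores[3]])."""
    return [[n["Scores"][0], n["Scores"][3]] for n in nodes]


print(first_and_fourth(path_nodes))  # [[10, 40], [5, 8]]
```

Each inner list corresponds to one node on the path, so the result is a collection of two-element collections.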
I've got a list of ids and a list of values. I want to find each node by its id and set a property to the corresponding value.
With just one Node that is super basic:
MATCH (n) WHERE n.id='node1' SET n.name='value1'
But I have a list of ids ['node1', 'node2', 'node3'] and the same number of values ['value1', 'value2', 'value3'] (for simplicity I used a pattern, but the values and ids vary a lot). My first approach was to use the query above and call the database once per id. That is no longer appropriate, since I now have thousands of ids, which would result in thousands of requests.
I came up with an approach that iterates over the entries of both lists and sets the values. The first node in the node list has to get the first value from the value list, and so on.
MATCH (n) WHERE n.id IN["node1", "node2"]
WITH n, COLLECT(n) as nodeList, COLLECT(["value1","value2"]) as valueList
UNWIND nodeList as nodes
UNWIND valueList as values
FOREACH (index IN RANGE(0, size(nodeList)) | SET nodes.name=values[index])
RETURN nodes, values
The problem with this query is that every node gets the same value (the last one in the value list). The cause is the last part, SET nodes.name=values[index]: I can't use the index on the left side, since nodes[index].name doesn't work and the database throws an error if I try. I tried it with nodeList, node and n, and nothing worked. I'm not sure this is the right way to achieve the goal; maybe there is a more elegant way.
Create pairs from the ids and values first, then use UNWIND and simple MATCH .. SET query:
// The first line will likely come from parameters instead
WITH ['node1', 'node2', 'node3'] AS ids,['value1', 'value2', 'value3'] AS values
// range() includes its upper bound, so stop at size(ids) - 1
WITH [i in range(0, size(ids) - 1) | {id:ids[i], value:values[i]}] as pairs
UNWIND pairs AS pair
MATCH (n:Node) WHERE n.id = pair.id
SET n.value = pair.value
The line
WITH [i in range(0, size(ids) - 1) | {id:ids[i], value:values[i]}] as pairs
combines two concepts: list comprehensions and maps. The list comprehension (with its WHERE clause omitted) converts a list of indexes into a list of maps with the keys id and value.
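As a rough illustration of the pairing step, here is the same idea modeled in Python. The function name make_pairs and the length check are assumptions added for the sketch. Note that Cypher's range() includes its upper bound, so the last valid index there is size(ids) - 1; Python's range(len(ids)) covers the same indexes.

```python
ids = ["node1", "node2", "node3"]
values = ["value1", "value2", "value3"]


def make_pairs(ids, values):
    """Build one {id, value} map per index, like the list comprehension over range()."""
    if len(ids) != len(values):
        raise ValueError("ids and values must have the same length")
    return [{"id": ids[i], "value": values[i]} for i in range(len(ids))]


pairs = make_pairs(ids, values)
for pair in pairs:  # plays the role of UNWIND pairs AS pair
    print(pair["id"], "->", pair["value"])
```

Because each pair carries its own id and value, the later MATCH and SET never need to index into two separate lists.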
I have a big Neo4j database with info about celebs. All of them have relations with many others: they are linked, dated, or married to each other. I need to get a random path starting from one celeb with a defined count of relations (5). I don't care who ends up in the chain; the only condition is that no celeb may be repeated in it.
To be more clear: I need to get "new" chain after each query, for example:
I try to get a chain starting with Rita Ora
She has relations with Drake, Jay Z and Justin Bieber
The query takes a random one of these, for example Jay Z
Then the query takes the relations of Jay Z: Karrine Steffans, Rosario Dawson and Rita Ora
The query can't take Rita Ora because she is already in the chain, so it takes a random one of the other two, for example Rosario Dawson
...
And at the end we should have a chain Rita Ora - Jay Z - Rosario Dawson - other celeb - other celeb 2
Is it possible to do this with a query?
This is doable in Cypher, but it's quite tricky. You mention that
the only condition I have I shouldn't have repeated celebs in chain.
This condition could be captured by using node-isomorphic pattern matching, which requires all nodes in a path to be unique. Unfortunately, this is not yet supported in Cypher. It is proposed as part of the openCypher project, but is still work-in-progress. Currently, Cypher only supports relationship uniqueness, which is not enough for this use case as there are multiple relationship types (e.g. A is married to B, but B also collaborated with A, so we already have a duplicate with only two nodes).
APOC solution. If you can use the APOC library, take a look at the path expander, which supports various uniqueness constraints, including NODE_GLOBAL.
Plain Cypher solution. To work around this limitation, you can capture the node uniqueness constraint with a filtering operation:
// 4 relationships give a path of 5 nodes, matching the 5-celeb chain in the question
MATCH p = (c1:Celebrity {name: 'Rita Ora'})-[*4]-(c2:Celebrity)
UNWIND nodes(p) AS node
WITH p, count(DISTINCT node) AS countNodes
WHERE countNodes = 5
RETURN p
LIMIT 1
Performance-wise this should be okay as long as you limit its results because the query engine will basically keep enumerating new paths until one of them passes the filtering test.
The goal of the UNWIND nodes(p) AS node WITH count(DISTINCT node) ... construct is to remove duplicates from the list of nodes by first UNWIND-ing it to separate rows, then aggregating them to a unique collection using DISTINCT. We then check whether the list of unique nodes still has 5 elements - if so, the original list was also unique and we RETURN the results.
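The uniqueness test boils down to comparing the number of distinct nodes with the number of nodes on the path. A minimal Python sketch of that check follows; the function name and the sample chains are invented for illustration.

```python
def has_unique_nodes(node_ids):
    """True when no node occurs twice, i.e. count(DISTINCT node) equals the path's node count."""
    return len(set(node_ids)) == len(node_ids)


# Hypothetical chains; the second one revisits Rita Ora.
chain = ["Rita Ora", "Jay Z", "Rosario Dawson", "celeb 4", "celeb 5"]
looped = ["Rita Ora", "Jay Z", "Rita Ora", "Drake", "celeb 5"]

print(has_unique_nodes(chain))   # True
print(has_unique_nodes(looped))  # False
```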
Note. Instead of UNWIND and count(DISTINCT ...), getting unique elements from a list could be expressed in other ways:
(1) Using a list comprehension and ranges:
WITH [1, 2, 2, 3, 2] AS l
RETURN [i IN range(0, size(l)-1) WHERE NOT l[i] IN l[0..i] | l[i]]
(2) Using reduce:
WITH [1, 2, 2, 3, 2] AS l
RETURN reduce(acc = [], i IN l | acc + CASE NOT i IN acc WHEN true THEN [i] ELSE [] END)
However, I believe both forms are less readable than the original one.
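For comparison, the two alternative forms translate fairly directly to Python. This is only a model of the logic (keep an element if it has not been seen earlier), and the function names are made up for the sketch.

```python
from functools import reduce


def dedupe_with_comprehension(l):
    """Keep l[i] only if it does not occur in the slice before index i."""
    return [l[i] for i in range(len(l)) if l[i] not in l[:i]]


def dedupe_with_reduce(l):
    """Accumulate elements, appending each one only if it is not yet in the accumulator."""
    return reduce(lambda acc, x: acc + ([x] if x not in acc else []), l, [])


sample = [1, 2, 2, 3, 2]
print(dedupe_with_comprehension(sample))  # [1, 2, 3]
print(dedupe_with_reduce(sample))         # [1, 2, 3]
```

Both variants preserve the order of first occurrence, which the count(DISTINCT ...) approach does not need to care about because it only compares sizes.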
I have a graph which looks like this:
Here is the link to the graph in the neo4j console:
http://console.neo4j.org/?id=av3001
Basically, you have two branching paths, of variable length. I want to match the two paths between orange node and yellow nodes. I want to return one row of data for each path, including all traversed nodes. I also want to be able to include different WHERE clauses on different intermediate nodes.
At the end, i need to have a table of data, like this:
a - b - c - d
neo - morpheus - null - leo
neo - morpheus - trinity - cypher
How could i do that?
I have tried using OPTIONAL MATCH, but i can't get the two rows separately.
I have tried using variable length path, which returns the two paths but doesn't allow me to access and filter intermediate nodes. Plus it returns a list, and not a table of data.
I've seen this question:
Cypher - matching two different possible paths and return both
It's on the same subject but the example is very complex, a more generic solution to this simpler problem is what i'm looking for.
You can define your end node with a WHERE clause. In your case, the end node is one that has no outgoing relationship. I'm not sure why you expect a null in the result, as in neo - morpheus - null - leo.
MATCH p=(n:Person{name:"Neo"})-[*]->(end) where not (end)-->()
RETURN extract(x IN nodes(p) | x.name)
Edit:
This may not be the best option, as I am not sure how to do this programmatically. If I use UNWIND I get back only one row, so this is a dummy solution:
MATCH p=(n{name:"Neo"})-[*]->(end) where not (end)-->()
with nodes(p) as list
return list[0].name,list[1].name,list[2].name,list[3].name
You can use Cypher to match a path like this: MATCH p=(:a)-[*]->(:d) RETURN p. Here p is the path, i.e. the nodes and relationships in the order they were traversed. You can apply WHERE to filter the path just like with node matching, and apply any list functions you need to NODES(p).
I will add these examples too
// Where on path
MATCH p=(:a)-[*]-(:d) WHERE NONE(n in NODES(p) WHERE n.name="Trinity") WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Split path into columns
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Match path, filter on label
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN FILTER(n in p WHERE "a" in LABELS(n)) as a, FILTER(n in p WHERE "b" in LABELS(n)) as b, FILTER(n in p WHERE "c" in LABELS(n)) as c, FILTER(n in p WHERE "d" in LABELS(n)) as d
Unfortunately, you HAVE to explicitly set some logic for each column. You can't make dynamic columns (that I know of). In your table example, what is the rule for which column gets 'null'? In the last example, I set each column to be the set of nodes of a label.
I.m.o. you're asking for extensive post-processing of the results of a simple query (give me all the paths starting from Neo). I say this because:
You state you need to be able to specify specific WHERE clauses for each path (but you don't specify which clauses for which path ... indicating this might be a dynamic thing ?)
You don't know the size of the longest path beforehand ... but you still want the result to be a same-size-for-all-results table. And would any null columns then always be just before the end node? Why (for that makes no real sense other than convenience)?
...
Therefore (and again i.m.o.) you need to process the results in a (Java or whatever you prefer) program. There you'll have full control over the resultset and be able to slice and dice as you wish. Cypher (exactly like SQL in fact) can only do so much and it seems that you're going beyond that.
Hope this helps,
Regards,
Tom
P.S. This may seem like an easy opt-out, but look at how simple your query is as compared to the constructs that have to be wrought trying to answer your logic. So ... separate the concerns.
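As a sketch of the kind of client-side post-processing suggested above, the following Python pads each path to a fixed number of columns. The rule used here (gaps go just before the end node) is an assumption read off the example table in the question, and to_row and width are hypothetical names.

```python
def to_row(names, width):
    """Pad a path to `width` columns, putting the gaps just before the end node."""
    if len(names) > width:
        raise ValueError("path is longer than the table width")
    padding = [None] * (width - len(names))
    if len(names) < 2:
        return names + padding
    return names[:-1] + padding + names[-1:]


# Hypothetical name lists, as they might come back from nodes(p) for each path.
paths = [
    ["neo", "morpheus", "leo"],
    ["neo", "morpheus", "trinity", "cypher"],
]
rows = [to_row(p, 4) for p in paths]
for row in rows:
    print(row)
```

Any per-column filtering can then be applied to the rows in the same program, where the padding rule is explicit rather than encoded in the query.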
I am taking my first steps in Cypher and Neo4j and trying to understand how Cypher deals with "variables".
Specifically, I have a query
match (A {name: "A"})
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
match (c)-[:st]->(b)
return b
which does the job I want. Now, in the code I am using a match clause two times (lines 2 and 3), so that the variables (c) and (b) basically contain the same nodes before the final match on line 4.
Can I write the query without having to repeat the second match clause? Using
match (A {name: "A"})
match (A)<-[:st*]-(B)-[:hp]->(b)
match (b)-[:st]->(b)
return b
seems to be something very different, returning nothing since there are no :st type relationships from a node in (b) to itself. My understanding so far is that, even if (b) and (c) contain the same nodes,
match (c)-[:st]->(b)
tries to find matches between ANY node of (c) and ANY node of (b), whereas
match (b)-[:st]->(b)
tries to find matches from a particular node of (b) onto itself? Or is it that one has to think of the 3 match clauses as a holistic pattern?
Thanx for any insight into the inner working ...
When you write the 2 MATCH statements
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
they don't depend on each other's results (only on the result of the previous MATCH finding A). The Cypher engine could execute them independently and then return a cartesian product of their results, or it could execute the first MATCH and for each result, then execute the second MATCH, producing a series of pairs using the current result of the first MATCH and each result of the second MATCH (the actual implementation is a detail). Actually, it could also detect that the same pattern is matched twice, execute it only once and generate all possible pairs from the results.
To summarize, b and c are taken from the same collection of results, but independently, so you'll get pairs where b and c are the same node, but also all the other pairs where they are not.
If you do a single MATCH, you obviously have a single node.
Supposing a MATCH returns 2 nodes 1 and 2, with the 2 intermediate MATCH the final MATCH will see all 4 pairs:
        1        2
1    (1, 1)   (1, 2)
2    (2, 1)   (2, 2)
whereas with a single intermediate MATCH and a final MATCH using b twice, it will only see:
        1        2
1    (1, 1)
2             (2, 2)
which are not the interesting pairs, if you don't have self-relationships.
Note that it's the same in a SQL database if you do a SELECT on 2 tables without a join: you also get a cartesian product of unrelated results.
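The difference between the two cases can be modeled in a few lines of Python. This is only an illustration of the pairing behaviour described above, not of how the Cypher engine is implemented; the list matched stands for a hypothetical result of the shared pattern.

```python
from itertools import product

matched = [1, 2]  # hypothetical nodes returned by the shared pattern

# Two independent MATCH clauses: b and c range over the results separately,
# so the final MATCH sees every combination.
two_variables = list(product(matched, matched))

# One MATCH with b reused: both positions are bound to the same node.
one_variable = [(b, b) for b in matched]

print(two_variables)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(one_variable)   # [(1, 1), (2, 2)]
```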
I have a graph of tags, which are related with each other. My goal is to create a Cypher query, which will return all tags that are related to an array of input tags via 1 or 2 hops.
I made a query, which doesn't work quite as intended.
MATCH (t:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t
MATCH (a:Tag)-[:RELATED*1..2]-(t)
RETURN DISTINCT a;
This query first finds the nodes A, B and C and then searches for tags that are related to A, B or C via at most one intermediate node.
What I want to do though is to find tags, which are related to ALL three nodes (A, B and C).
I know I could concatenate MATCH and WITH statements, and do something like this:
MATCH (t:Tag)-[:RELATED*1..2]-(a:Tag)
WHERE t.name="A"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="B"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="C"
...
RETURN DISTINCT a;
But it runs painfully slowly as the number of input tags increases (in this case there are only 3 input tags: A, B, C).
So is there a way to make it in one query, similar to my first try?
Here is a solution that only requires a single MATCH clause.
MATCH (t:Tag)-[:RELATED*..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, COLLECT(DISTINCT other) AS others
WITH COLLECT(others) AS coll
RETURN FILTER(x IN coll[0] WHERE ALL(y IN coll[1..] WHERE x IN y)) AS res;
The query finds all the tags (other) that are "related" (by up to 2 steps) to each of your named tags (t).
It then uses aggregation to collect the distinct other nodes for each t. In this example, we end up with 3 others collections -- 1 for each t.
It then collects all the others collections into a single coll collection.
Finally, since the result set is supposed to be the intersection of every others collection, the query walks through the nodes in first others collection, and extracts the ones that are also in the remaining others collections. And, since each others collection already contains distinct nodes, the result must also have distinct nodes.
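The final intersection step can be modeled in Python as follows. The tag names in coll are invented, and intersect_all is a hypothetical helper that follows the shape of the FILTER/ALL expression.

```python
def intersect_all(coll):
    """Mirror FILTER(x IN coll[0] WHERE ALL(y IN coll[1..] WHERE x IN y))."""
    if not coll:
        return []
    first, rest = coll[0], coll[1:]
    return [x for x in first if all(x in y for y in rest)]


# One hypothetical "others" collection per input tag.
coll = [
    ["X", "Y", "Z"],  # related to A
    ["Y", "Z", "W"],  # related to B
    ["Z", "Y"],       # related to C
]
print(intersect_all(coll))  # ['Y', 'Z']
```

The result keeps the order of the first collection, and it contains no duplicates as long as that first collection has none.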
In addition, if you have a lot of tags, the above query can be made a bit faster by:
Creating an index (or uniqueness constraint, which automatically creates an index for you) on :Tag(name), and then
Specifying the use of that index in your query -- by inserting the following clause between the MATCH and WHERE clauses. Currently, the Cypher engine does not automatically use the index for this specific query.
USING INDEX t:Tag(name)
How about this one:
MATCH (t:Tag)-[:RELATED*1..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, collect(other.name) as others
WHERE ALL(x in ["A","B","C"] WHERE x in others)
RETURN t
The trick is to put all the related nodes for t into a collection (others) and use the ALL predicate to make sure that A, B and C are all part of it.
Here is an alternative:
MATCH shortestPath((t:Tag)<-[:RELATED*1..2]-(source:Tag)) //make sure there are no duplicate paths
WHERE source.name IN ["A","B","C"] AND NOT source.name = t.name //shortest path for identical node would throw an exception
WITH COLLECT(t) as tags //all tags that were reachable, with duplicates for reachable from multiple tags
UNWIND tags as tag //for each tag
WITH tag, tags //using with as match would be a drastic slowdown
WHERE size(filter(t IN tags WHERE ID(t) = ID(tag))) = 3 //if it is connected to all three, it must have been matched three times
RETURN DISTINCT tag //since any match will still be in there 3 (or n) times
It first matches all reachable tags. Any tag that is reachable from every tag in a list of length n must have been matched n times if shortestPath is used. If you then filter by that criterion (present n times), the wanted tags can be retrieved with DISTINCT.
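The counting idea can be sketched in Python like this. It assumes, as the explanation above does, that each (source, tag) pair contributes exactly one match; the sample data and the name reachable_from_all are made up for illustration.

```python
from collections import Counter


def reachable_from_all(matches, n):
    """Return the tags that were matched exactly n times, once per source tag."""
    counts = Counter(matches)
    return [tag for tag, count in counts.items() if count == n]


# Hypothetical flat list of matched tags for 3 source tags; only X was reached from all three.
matches = ["X", "Y", "X", "Z", "X", "Y"]
print(reachable_from_all(matches, 3))  # ['X']
```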