neo4j - puzzling behavior where it should be very simple

I'm totally baffled. Have been using neo4j for a while now but my file just got much much bigger (1.4G) and all of a sudden simple queries just don't work anymore. Does cypher break down when the file gets big?
MATCH (n:Node)
WHERE n.ID = "myid"
WITH DISTINCT n
OPTIONAL MATCH (n)-[rel:RELATIONSHIP]->(:Node)
REMOVE rel.Property
WITH DISTINCT n
OPTIONAL MATCH (n)-[rel:RELATIONSHIP]->(other:Node)
WHERE other.ID in ["this","that"]
SET rel.Property=true //I added this inside a foreach and both "this" and "that" started getting set properly, but I'm not sure why that would make a difference...
return n, other
This query invariably only sets Property to true for "that" and not "this". I'm totally baffled.
When instead I end with RETURN rel {.*}, id(rel) it shows two trues, with one of the rel ids being 67876, but then when I run
MATCH ()-[rel]-()
WHERE id(rel)=67876
RETURN rel {}
I get {} as a result (ie the property is not there at all!!)
I added foreach but does not seem to really make a difference (nor should it I don't think).
Even more confusing, if I end with WITH DISTINCT n return [(n)-[rel:RELATIONSHIP]->(o) | rel.filter] it will be missing one. However, if I remove the DISTINCT n, I get more than one row and the last ones are correct-- ie in the results the exact same relationship is coming back as having Property first null then true. It's like I've come across Schrödinger's cat.
Could my file be corrupt and how would I fix it? TIA!
Addendum: I got it to work by repeating the match at the end and doing away with comprehensive maps for the return, but I'm still puzzled about why the comprehensive maps are returning incorrect information in first row and correct information in second row - all referring to the same relationship property.

Related

Match relationships if a certain parameter exists

I've been using neo4j for a while and recently I got stuck on a query that I can't seem to run successfully.
My goal: I have a type of relationship called HAS_RELATIONSHIP. This type of relationship sometimes has a property called verified. I want to get a subgraph of those relationships that don't have this property so I can add the property afterwards.
What I have done so far:
Match (a)-[r:HAS_RELATIONSHIP]-(b)
where r.verified=False
set r.verified=True
LIMIT 5
return r, a, b
The part that is not working is where r.verified=False. It should be something like exists(r)=verified, but that kind of query doesn't seem to exist. I have looked at OPTIONAL MATCH, but it doesn't seem to be the solution either.
Any ideas?
You can use the NOT operator together with the predicate function exists() for this problem:
MATCH ()-[r:HAS_RELATIONSHIP]->() WHERE NOT exists(r.verified) RETURN r
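Since the question also wants to set the property on a limited batch, a minimal sketch of the whole step might look like the following. It assumes the relationship direction does not matter for the result (the pattern is directed only so that each relationship is matched once). It uses IS NULL, which tests for a missing property just as NOT exists(r.verified) does.

```cypher
// Sketch: find up to 5 HAS_RELATIONSHIP relationships that have no
// verified property, then set it. LIMIT cannot follow SET directly,
// so it is attached to a WITH clause placed before the SET.
MATCH (a)-[r:HAS_RELATIONSHIP]->(b)
WHERE r.verified IS NULL
WITH a, r, b
LIMIT 5
SET r.verified = true
RETURN a, r, b
```

Running the statement repeatedly would process the remaining relationships five at a time, because relationships that were already updated no longer satisfy the WHERE clause.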

Getting first root of a leaf in neo4j

I have the following simple graph:
CREATE (:leaf)<-[:rel]-(:nonleaf)<-[:rel]-(:nonleaf)<-[:rel]-(:nonleaf)<-[:rel]-(r1:nonleaf{root:true})<-[:rel]-(r2:nonleaf{root:true})
I want to get the first ancestor starting from (:leaf) with root:true set on it. That is I want to get r1. For this I wrote following cypher:
MATCH (:leaf)<-[*]-(m)<-[*]-(r:nonleaf{root:true}) WHERE m.root<>true OR NOT exists(m.root)
RETURN r
But it returned both (r1) and (r2). Same happened for following cypher:
MATCH shortestPath((l:leaf)<-[*]-(r:nonleaf{root:true}))
RETURN r
What's going on here?
Update
OK, after thinking about it more, it clicked that (r2) is also returned because on the path from (:leaf) to (r2) there are nodes with no root property set on them (this should have been obvious to me earlier; it was a subtle interpretation mistake). In other words, the query returns (:nonleaf{root:true}) if the following condition is true for at least one m: m.root<>true OR NOT exists(m.root). The requirement is that the condition hold for all m on the path, not for at least one m. It remains to figure out how to express this in Cypher, and my grip on Cypher isn't that tight...
You can enforce that there is a single root node on the matched path with the help of the single() predicate function:
MATCH p=(:leaf)<-[*]-(r:nonleaf{root:true})
WHERE SINGLE(m IN nodes(p) WHERE exists(m.root) AND m.root=true )
RETURN r
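The "for all m" requirement from the question's update can also be written directly with a list predicate. The following is a sketch under the assumption that only the nodes strictly between the leaf and the candidate root need to be checked. It uses coalesce() because a missing root property makes `m.root = true` evaluate to null rather than false, and a null predicate would cause the whole none() test to be filtered out by WHERE.

```cypher
// Sketch: keep only the root that has no other root between it and the leaf.
// nodes(p)[1..-1] drops the first node (the leaf) and the last node (r).
MATCH p = (:leaf)<-[:rel*]-(r:nonleaf {root: true})
WHERE none(m IN nodes(p)[1..-1] WHERE coalesce(m.root, false))
RETURN r
```

On the sample graph from the question, the path to r2 passes through r1, so it is rejected. The path to r1 contains only unmarked nodes, so it is kept.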
You just need to adjust your where condition a little so that it says "and the :nonleaf node before the root :nonleaf node matched is not itself marked as a root node". I think this will satisfy your needs.
MATCH (l:leaf)<-[*]-(r:nonleaf {root: true})
WHERE NOT (:nonleaf {root: true})<--(r)
RETURN r
UPDATED
Reading the updated example in the comments, I thought of another way to solve your problem using the APOC procedure apoc.path.expandConfig.
It does require a slight change to your data. Each root: true node should have a :root label set on it. Here is an update statement...
MATCH (n:nonleaf {root:true})
SET n:root
RETURN n
And here is the updated query
MATCH (leaf:leaf {name: 'leaf'})
WITH leaf
CALL apoc.path.expandConfig(leaf, {relationshipFilter:'<rel',labelFilter:'/root'} ) yield path
RETURN last(nodes(path))

Return results if at least one node (related but possibly not a result) has a certain property, in Neo4j

I have a graph that consists of a set of disjoint family trees.
I have a working query that has a few OPTIONAL MATCH statements, which allow me to get only the immediate parents and siblings of someone in the main_person's family tree, assuming that those relatives are of interest to us:
MATCH (p:Person {main_person: 'y'})
OPTIONAL MATCH (p)<-[]-(parent:Person)
WHERE parent.`person_of_interest` = 'y'
OPTIONAL MATCH (parent:Person)-[]->(sib:Person)
WHERE sib <> p
AND sib.`person_of_interest` = 'y'
RETURN
p, parent, sib;
But say I want to qualify this by making sure:
at least one member of a family has a test_me = 'y' property. This can be a far, distant member of the family. It definitely doesn't have to be family member that is a person_of_interest, or is a close family member.
If at least one of them has this property, then we can return the family members we are looking for. But if nobody has the property, then we don't want any results for that family.
I'm not sure how to construct this. I keep trying to start with the test_me = 'y' part, and carry it with a WITH:
MATCH (p:Person)-[]-(m)
WHERE ANY m.test_me = 'y'
WITH p, m
. . .
Maybe it should be more like:
MATCH (p:Person {main_person: 'y'})
OPTIONAL MATCH (p)<-[]-(parent:Person)
OPTIONAL MATCH (parent:Person)-[]->(sib:Person)
WHERE sib <> p
HAVING <condition here>
RETURN
p, parent, sib;
If this were SQL, I'd try to use a temp table to pipe things along.
None of it is really working.
Thanks for reading this.
[UPDATED to answer updated question]
This query may work for you (or it may run out of memory or appear to run forever):
MATCH (p:Person {main_person: 'y'})
WHERE EXISTS((p)-[*0..]-({test_me: 'y'}))
OPTIONAL MATCH (p)<--(parent:Person)
WHERE parent.person_of_interest = 'y'
OPTIONAL MATCH (parent:Person)-->(sib:Person)
WHERE sib <> p AND sib.person_of_interest = 'y'
RETURN p, COLLECT(parent) AS parents, COLLECT(sib) AS sibs;
The [*0..] syntax denotes a variable length relationship search where the matching paths can have 0 or more relationships. The query uses a lower bound of 0 instead of 1 (the default) because we also want to test whether p itself has the desired test_me property value.
However, variable length relationship searches are notorious for using a lot of memory or taking a long time to finish when no upper bound is specified, so normally a query would specify a reasonable upper bound (e.g., [*0..5]).
By the way, you should probably pass values such as 'y' as parameters instead of hard-coding them.
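Putting the last two suggestions together, a bounded and parameterized variant might look like the sketch below. The parameter name flag and the upper bound of 5 are assumptions made for illustration. The `$name` parameter syntax applies to Neo4j 3.0 and later, and the existence check is written as a plain pattern predicate in WHERE.

```cypher
// In Neo4j Browser the parameter could be defined first with:
//   :param flag => 'y'
MATCH (p:Person {main_person: $flag})
WHERE (p)-[*0..5]-({test_me: $flag})   // bounded search; 5 is an arbitrary cap
OPTIONAL MATCH (p)<--(parent:Person)
WHERE parent.person_of_interest = $flag
OPTIONAL MATCH (parent)-->(sib:Person)
WHERE sib <> p AND sib.person_of_interest = $flag
RETURN p,
       collect(DISTINCT parent) AS parents,
       collect(DISTINCT sib) AS sibs
```

DISTINCT is used inside collect() because each parent appears once per matching sibling row, so a plain collect(parent) could list the same parent several times.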
You're definitely on the right track; I think you already have your answer even if you don't realize it.
What you have in your description works as the start of your query, with just a few modifications:
MATCH pattern=(p:Person{main_person: 'y'})-[*]-()
WHERE ANY (person IN nodes(pattern) WHERE person.test_me = 'y')
WITH DISTINCT p
...
The variable-length relationship lets you consider every person in the tree, as well as the main_person (if there are non-family relationships in your graph, you'll want to use types on your relationship to ensure you're only considering a single family's tree). If nobody in p's family tree has your desired property, the MATCH produces no rows for p, and any subsequent matching using p will yield no results. This should let you specify the rest of the query freely, and as long as all matches include p, you shouldn't get any results at the end for families without the desired property value.
EDIT fixed my query a bit, the ANY() clause wasn't written correctly.

Return multiple relationship counts for one MATCH statement

I want to do something like this:
MATCH (p:person)-[a:UPVOTED]->(t:topic),(p:person)-[b:DOWNVOTED]->(t:topic),(p:person)-[c:FLAGGED]->(t:topic) WHERE ID(t)=4 RETURN COUNT(a),COUNT(b),COUNT(c)
..but I get all 0 counts when I should get 2, 1, 1
A better solution is to use size(), which drastically improves the performance of the query:
MATCH (t:Topic)
WHERE id(t) = 4
RETURN size((t)<-[:DOWNVOTED]-(:Person)) as downvoted,
size((t)<-[:UPVOTED]-(:Person)) as upvoted,
size((t)<-[:FLAGGED]-(:Person)) as flagged
If you are sure that the other nodes on the relationships are always labelled with Person, you can remove them from the query and it will be a bit faster again
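One caveat: applying size() to a pattern is no longer accepted in Neo4j 5, where a COUNT subquery expression takes its place. A rough equivalent is sketched below. It assumes Neo4j 5 or later and uses the lowercase labels from the question, since labels are case-sensitive.

```cypher
// Sketch for Neo4j 5+: count each relationship type independently,
// so no rows are multiplied and zero counts are still reported.
MATCH (t:topic)
WHERE id(t) = 4
RETURN COUNT { (t)<-[:UPVOTED]-(:person) }   AS upvoted,
       COUNT { (t)<-[:DOWNVOTED]-(:person) } AS downvoted,
       COUNT { (t)<-[:FLAGGED]-(:person) }   AS flagged
```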
Let's start with refactoring the query a bit (hopefully the meaning of it isn't lost):
MATCH
(t:topic),
(p:person)-[upvote:UPVOTED]->(t),
(p:person)-[downvote:DOWNVOTED]->(t),
(p:person)-[flag:FLAGGED]->(t)
WHERE ID(t)=4
RETURN COUNT(upvote), COUNT(downvote), COUNT(flag)
Since t is your primary variable (since you are filtering on it), I've matched once with the label and then used just the variable throughout the rest of the matches. Seeing the query cleaned up like this, it seems to me that you're trying to count all upvotes/downvotes/flags for a topic, but you don't care who did those things. Currently, since you're using the same variable p Cypher is going to try to match the same person for all three lines. So you could have different variables:
(p1:person)-[upvote:UPVOTED]->(t),
(p2:person)-[downvote:DOWNVOTED]->(t),
(p3:person)-[flag:FLAGGED]->(t)
Or better, since you're not referencing the people anywhere else, you can just leave the variables out:
(:person)-[upvote:UPVOTED]->(t),
(:person)-[downvote:DOWNVOTED]->(t),
(:person)-[flag:FLAGGED]->(t)
And stylistically, I would also suggest starting your matches with the item that you're filtering on:
(t)<-[upvote:UPVOTED]-(:person),
(t)<-[downvote:DOWNVOTED]-(:person),
(t)<-[flag:FLAGGED]-(:person)
The next problem comes in because by making these a MATCH, you're saying that there NEEDS to be a match. Which means you'll never get cases with zeros. So you'll want OPTIONAL MATCH:
MATCH (t:topic)
WHERE ID(t)=4
OPTIONAL MATCH (t)<-[upvote:UPVOTED]-(:person)
OPTIONAL MATCH (t)<-[downvote:DOWNVOTED]-(:person)
OPTIONAL MATCH (t)<-[flag:FLAGGED]-(:person)
RETURN COUNT(upvote), COUNT(downvote), COUNT(flag)
Even then, what you're saying is: "Find a topic and find all cases where there is 1 upvote, no downvote, no flag; 1 upvote, 1 downvote, no flag; and so on through all permutations." Each additional OPTIONAL MATCH multiplies the rows already matched, so the counts get inflated. That means you'll want to COUNT one at a time:
MATCH (t:topic)
WHERE ID(t)=4
OPTIONAL MATCH (t)<-[r:UPVOTED]-(:person)
WITH t, COUNT(r) AS upvotes
OPTIONAL MATCH (t)<-[r:DOWNVOTED]-(:person)
WITH t, upvotes, COUNT(r) AS downvotes
OPTIONAL MATCH (t)<-[r:FLAGGED]-(:person)
RETURN upvotes, downvotes, COUNT(r) AS flags
A couple of miscellaneous items:
Be careful about using Neo IDs as a long-term reference because they can be recycled.
Use parameters whenever possible for performance / security (WHERE ID(t)={topic_id})
Also, labels are generally TitleCase. See The Zen of Cypher guide.
Check this query; I think it will help you.
MATCH (p:person)-[a:UPVOTED]->(t:topic),
(p)-[b:DOWNVOTED]->(t),(p)-[c:FLAGGED]->(t)
WHERE ID(t)=4
RETURN COUNT(a) as a_count,COUNT(b) as b_count,COUNT(c) as c_count;
Your current MATCH requires that the same person node (identified by p) have relationships of all 3 types with t. This is because an identifier is bound to a specific node (or relationship, or value), and (unless hidden by a WITH clause, which you do not have in your query) will reference that same node (or relationship, or value) throughout a query.
Based on your expected results, I am assuming that you are just trying to count the number of relationships of those 3 types between any person and t. If so, this is a performant way to do that:
MATCH (t:topic)
WHERE ID(t) = 4
MATCH (:person)-[r:UPVOTED|DOWNVOTED|FLAGGED]->(t)
RETURN REDUCE(s=[0,0,0], x IN COLLECT(r) |
CASE TYPE(x)
WHEN 'UPVOTED' THEN [s[0]+1, s[1], s[2]]
WHEN 'DOWNVOTED' THEN [s[0], s[1]+1, s[2]]
ELSE [s[0], s[1], s[2]+1]
END
) As res;
res is an array with the number of UPVOTED, DOWNVOTED, and FLAGGED relationships, respectively, between any person and t.
Another approach would be to use separate OPTIONAL MATCH statements for each relationship type, returning three COUNT(DISTINCT x) values. But the above query uses a single MATCH statement, greatly reducing the number of DB hits, which are generally expensive.
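For comparison, the OPTIONAL MATCH alternative mentioned above could be sketched as follows. DISTINCT matters here because the three OPTIONAL MATCH clauses multiply rows. With 2 upvotes, 1 downvote and 1 flag there are 2 result rows, so a plain COUNT(d) would report 2 downvotes instead of 1.

```cypher
// Sketch: separate OPTIONAL MATCH clauses, de-duplicated when counting.
MATCH (t:topic)
WHERE id(t) = 4
OPTIONAL MATCH (t)<-[u:UPVOTED]-(:person)
OPTIONAL MATCH (t)<-[d:DOWNVOTED]-(:person)
OPTIONAL MATCH (t)<-[f:FLAGGED]-(:person)
RETURN count(DISTINCT u) AS upvotes,
       count(DISTINCT d) AS downvotes,
       count(DISTINCT f) AS flags
```

Unlike the plain MATCH version, this still returns a row (with zeros) when the topic has no relationships of one or more of the three types.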

Cypher: how to return multiple nodes along a path?

I have the following graph structure:
(Building)<-[:PART_OF]-(Floor)
(Floor)<-[:PART_OF]-(Room)
(Room)<-[:INSIDE]-(Asset)
All nodes between a building and an asset are optional; for example, there might be another hierarchy level in between, or an asset can be directly inside a building.
To get all assets in a specific building I use: MATCH (b:Building {id: {buildingId}})<-[*]-(a:Asset) RETURN a
How can I change this query to also return the PART_OF hierarchies along the paths?
The value of Room, Floor, ... is stored in a 'value' property.
Eventually, I want to know, for each returned asset, the value of Floor and Room and the labels.
I thought of starting from something like MATCH (b:Building {id: {buildingId}})<-[:PART_OF*0..]-(x)<-[:INSIDE]-(a:Asset) RETURN a, labels(x), x.value but it returns only the hierarchy level that is directly connected to the asset.
EDIT:
match (b:Building)<-[:PART_OF*0..]-(x)<-[:PART_OF*0..]-()<-[:INSIDE]-(a:Asset) return a, labels(x), x.value seems to do the trick. Does it look correct?
Assuming you gave your complete graph structure in your question, the following might work for your needs (I assume that the building's 'id' property value is parameterized):
OPTIONAL MATCH (b:Building {id: {id}})<-[:PART_OF]-(f:Floor)<-[:PART_OF]-(r:Room)<-[:INSIDE]-(a:Asset)
RETURN a, f.value AS fVal, r.value as rVal
UNION
OPTIONAL MATCH (b:Building {id: {id}})<-[:PART_OF]-(f:Floor)<-[:INSIDE]-(a:Asset)
RETURN a, f.value AS fVal, null as rVal
UNION
OPTIONAL MATCH (b:Building {id: {id}})<-[:INSIDE]-(a:Asset)
RETURN a, null AS fVal, null as rVal;
If an asset is not part of a room, then the rVal value will be null. If it is also not part of a floor, then the fVal value will be null.
Also, if there are no assets at all for the building (or any floor/room in the building), then you will still get a single row in the result, but all values will be null.
I did not bother to return any labels, since that should not be necessary with this approach.
You could try examining the paths returned between m:Machine and b:Building. Assuming you don't just want the shortest path(s) and assuming you're using Cypher 2.0 (which it looks like you are), try something like this (note the "p" binding for the path):
MATCH p = (m:Machine)-->(b:Building) RETURN nodes(p), rels(p)
(You can also use "--" instead of "-->" if direction isn't an issue.)
And if you need other info from what's returned, you can always use EXTRACT and other functions, e.g.
RETURN EXTRACT(n IN nodes(p) | n.value)
Hope this helps!
EDIT:
I may have misread the question. You may want to use "allShortestPaths" (around the (m:Machine)-->(b:Building) portion) OR use (m:Machine)-[*]->(b:Building) for variable depth paths (watch the performance, though; you may want to limit the depth) if my original answer doesn't give you what you want.
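Applied to the Building/Asset graph from the question, the path-binding idea might be sketched like this. It returns one row per asset with the whole hierarchy between building and asset as a list. The parameter name buildingId and the alias hierarchy are assumptions. The sketch uses `$` parameter syntax and a list comprehension instead of EXTRACT, which later Neo4j versions no longer provide.

```cypher
// Sketch: bind the whole path, then describe every node strictly
// between the building (first node) and the asset (last node).
MATCH p = (b:Building {id: $buildingId})<-[:PART_OF*0..]-()<-[:INSIDE]-(a:Asset)
RETURN a,
       [n IN nodes(p)[1..-1] | {labels: labels(n), value: n.value}] AS hierarchy
```

For an asset placed directly inside the building, the path has only two nodes and hierarchy is an empty list.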
