Neo4j query execution process - neo4j

I am using Neo4j 5.1 Enterprise edition.
I performed the following code:
profile MATCH(d:Dataset {name:'dataset2'})<-[:`has_d`]-(s:Score)-[:`has_a`]->(a:Algorithm {name:'algorithm1'})
MATCH (t:Tag) WHERE t.name IN ['tag1', 'tag2', 'tag3', 'tag4', 'tag5']`
MATCH (i:Image)-[:has_score]->(s)-[:`has_tag`]->(t)
RETURN i LIMIT 100
Due to the profile result is too big, i only post here the important part:
I was expecting it to filter Tag by name before doing Expand.
Why Neo4j did Expand before Filter?
How can i fix it? Is the order of execution irrelevant?
Is Filter#Neo4j a simple filter or it uses our index?
I'm very sorry for asking so many questions, maybe some of them are stupid and obvious, but I don't understand why.
Any help would be greatly appreciated

It needs to follow the relationships by type and direction first from the source nodes.
So it does the expand by type and direction
and only then sees the end node and then can filter those nodes if they have the matching label(s).
If your relationships are already uniquely identifying the target node label, then you can leave off the labels from those (but not from the start node otherwise it won't use the index).

Related

Adding a property filter to cypher query explodes memory, why?

I'm trying to write a query that explores a DAG-type graph (a bill of materials) for all construction paths leading down to a specific part number (second MATCH), among all the parts associated with a given product (first MATCH). There is a strange behavior I don't understand:
This query runs in a reasonable time using Neo4j community edition (~2 s):
WITH '12345' as snid, 'ABCDE' as pid
MATCH (m:Product {full_sn:snid})-[:uses]->(p:Part)
WITH snid, pid, collect(p) AS mparts
MATCH path=(anc:Part)-[:has*]->(child:Part)
WHERE ALL(node IN nodes(path) WHERE node IN mparts)
WITH snid, path, relationships(path)[-1] AS rel,
nodes(path)[-2] AS parent, nodes(path)[-1] AS child
RETURN stuff I want
However, to get the query I want, I must add a filter on the child using the part number pid in the second MATCH statement:
MATCH path=(anc:Part)-[:has*]->(child:Part {pn:pid})
And when I try to run the new query, neo4j browser compains that there is not enough memory. (Neo.TransientError.General.OutOfMemoryError). When I run it with EXPLAIN, the db hits are exploding into the 10s of billions, as if I'm asking it for a massive cartestian product: but all I have done is added a restriction on the child, so this should be reducing the search space, shouldn't it?
I also tried adding an index on :Part(pn). Now the profile shown by EXPLAIN looks very efficient, but I still have the same memory error.
If anyone can help me understand why this change between the two queries is causing problems, I'd greatly appreciate it!
Best wishes,
Ben
MATCH path=(anc:Part)-[:has*]->(child:Part)
The * is exploding to every downstream child node.
That's appropriate if that is what's desired. If you make this an optional match and limit to the collect items, this should restrict the return results.
OPTIONAL MATCH path=(anc:Part)-[:has*]->(child:Part)
This is conceptionally (& crudely) similar to an inner join in SQL.

Return Nodes related with a relatiohip to other Neo4j

i have just started using Neo4j, and after creating the whole graph i'm trying to get all the nodes related to another one by a relatioship.
Match (n)-[Friendship_with]->({Name:"Gabriel"}) return n
That should give me the nodes that are friend of Gabriel, what i'm doing wrong?
I have used too this:
Match n-[r:Friendship_with]->n1 where n1.Name="Gabriel" return n
That give me some nodes, but some of then aren't directly related to Gabriel (for example, Maria is friend of Gabriel, she appears when i write that, but Alex who is friend of Maria and not from Gabriel, appear too)
This is weird.
Your query is correct.
I would suggest to check your data. Are you sure there isn't any direct connection between Alex and Gabriel ?
You could visualize your graph and see what is happening exactly in the neo4j browser. Just type a query with a bit more info like:
MATCH (n)-[f:Friendship_with]->(p {Name:"Gabriel"}) return n,f,p
and use the graph view to inspect your data.
EDIT:
As Pointed out by Michael, your first query is missing a colon in front of the specified relationship label "Friendship_with". So neo4j thinks it is a (rather long) variable name for your relationships, just as 'n' or 'n1'. It will thus retrieve anything that is connected to Gabriel without filtering by relationship label.
It doesn't explain though why you:
get the same results with the first and second query
get a 2nd degree relation as a result
so check your data anyway :)
You forgot the colon before :Friendship_with
Don't forget to provide labels, e.g. (n1:Person {Name:"Gabriel"})
Also some of your friendships might go in the other direction, so leave off the direction-arrow: Match (n:Person)-[Friendship_with]-(:Person {Name:"Gabriel"}) return n

Different results of two (synonymous) queries in Neo4j

I have identified that some queries happen to return less results than expected. I have taken one of the missing results and tried to force Neo4j to return this result - and I succeeded with the following query:
match (q0),(q1),(q2),(q3),(q4),(q5)
where
q0.name='v4' and q1.name='v3' and q2.name='v5' and
q3.name='v1' and q4.name='v3' and q5.name='v0' and
(q1)-->(q0) and (q0)-->(q3) and (q2)-->(q0) and (q4)-->(q0) and
(q5)-->(q4)
return *
I have supposed that the following query is semantically equivalent to the previous one. However in this case, Neo4j returns no result at all.
match (q1)-->(q0), (q0)-->(q3), (q2)-->(q0), (q4)-->(q0), (q5)-->(q4)
where
q0.name='v4' and q1.name='v3' and q2.name='v5' and
q3.name='v1' and q4.name='v3' and q5.name='v0'
return *
I have also manually verified that the required edges among vertices v0, v1, v3, v4 and v5 are present in the database with right directions.
Am I missing some important difference between these queries or is it just a bug of Neo4j? (I have tested these queries on Neo4j 2.1.6 Community Edition.)
Thank you for any advice
/EDIT: Updating to newest version 2.2.1 was of no help.
This might not be a complete answer, but here's what I found out.
These queries aren't synonymous, if I understand correctly.
First of all, use EXPLAIN (or even PROFILE) to look under the hood. The first query will be executed as follows:
The second query:
As you can see (even without going deep down), those are different queries in terms of both efficiency and semantics.
Next, what's actually going on here:
the 1st query will look through all (single) nodes, filter them by name, then - try to group them according to your pattern, which will involve computing Cartesian product (hence the enormous space complexity), then collect those groups into the larger ones, and then evaluate your other conditions.
the 2nd query will first pick a pair of nodes connected with some relationship (which satisfy the condition on the name property), then throw in the third node and filter again, ..., and so on till the end. The number of nodes is expected to decrease after every filter cycle.
By the way, is it possible that you accidentally set the same name twice (for q1 and q3?)

Approaches to tagging in Neo4j

I'm pretty new to Neo4j; I've only gotten as far as writing a hello world. Before I proceed, I want to make sure I have the right idea about how Neo4j works and what it can do for me.
As an example, say you wanted to write a Neo4j back end for a site like this. Questions would be nodes. Naïvely, tags would be represented by an array property on the question node. If you wanted to find questions with a certain tag, you'd have to scan every question in the database.
I think a better approach would to represent tags as nodes. If you wanted to find all questions with a certain tag, you'd start at the tag node and follow the relationships to the questions. If you wanted to find questions with all of a set of tags, you'd start at one of the tag nodes (preferably the least common/most specific one, if you know which one that is), follow its relationships to questions, and then select the questions with relationships to the other tags. I don't know how to express that in Cypher yet, but is that the right idea?
In my real application, I'm going to have entities with a potentially long list of tags, and I'm going to want to find entities that have all of the requested tags. Is this something where Neo4j would have significant advantages over SQL?
Kevin, correct.
You'd do it like that.
I even created a model some time ago for stackoverflow that does this.
For Cypher you can imagine queries like these
Find the User who was most active
MATCH (u:User)
OPTIONAL MATCH (u)-[:AUTHORED|ASKED|COMMENTED]->()
RETURN u,count(*)
ORDER BY count(*) DESC
LIMIT 5
Find co-used Tags
MATCH (t:Tag)
OPTIONAL MATCH (t)<-[:TAGGED]-(question)-[:TAGGED]->(t2)
RETURN t.name,t2.name,count(distinct question) as questions
ORDER BY questions DESC
MATCH (t:Tag)<-[r:TAGGED]->(question)
RETURN t,r,question

Returning all relationships for a list of nodes

This is quite a general question but to make it more understandable I'll give it a bit of context.
In neo4j I have a series of words (nodes) that are associated with one another. I want to specify a list of nodes and the Cypher query return a list of any relationships between those nodes.
The nodes specified in the list are all guaranteed to have at least one relationship to another node specified in the list.
I created a query to do this and in certain circumstances it works fine - http://console.neo4j.org/?id=s30cbm
Unfortunately, when I add the words 'bark' and 'dog' to the list I get an 'unexpected traversal state encountered' error message. I presume this is because the database cursor has got to the fruit node and then there's no relationship between that and bark, even though there is a relationship from tree to bark. http://console.neo4j.org/?id=258d6g
I'm obviously doing the query slightly wrong and any advice would be appreciated on how I can rectify this.
This works in the latest console (your second link), btw, so it looks like they fixed it. Looks like it should be working in 1.9-M04+.

Resources