I'm a newbie to Cypher and graph databases. I am trying to display a kind of tree from my DB, with a source node and its downstream leaves. I would like to keep only the leaves that have certain labels, and only the paths where every relationship satisfies a constraint on its properties.
The root is a Column1 node and I want to end on a Column1 node as well, but there can be some Column2 nodes along the path.
I wrote this:
MATCH p=(:Column1{name:'Root'})-[*1..7]-(:Column1)
WHERE
all(n IN nodes(p)
WHERE all(l in labels(n)
WHERE l IN ['Column1', 'Column2']
)
AND n.deleted='0'
)
AND all(r IN relationships(p)
WHERE r.deleted='0')
RETURN p
When launched in the Neo4j browser, the resulting graph is wrong and includes some relationships where deleted='1'. However, if I export the CSV table and look for deleted='1' (or even just 1), there are no results.
So it seems the query is correct, but somehow the graphical display shows relationships where deleted='1'.
Is it a bug, or is it the query?
I also tried
MATCH (:Column1{name:'Root'})-[*1..7{deleted:'0'}]-(t)
WHERE t:Column1 or t:Column2
RETURN *
but in my DB, it takes forever to complete, compared to the previous query.
Solved it!
I had to uncheck the "Connect result nodes" check box in the config tab of the browser. Otherwise, the browser queries for all the relationships between the displayed nodes!
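One way to confirm that the query itself is correct, independently of how the browser draws the graph, is to aggregate the deleted values of the relationships on the matched paths. This is a minimal sketch that reuses the query from the question; the aliases deleted and rels are just illustrative names.

```cypher
// Same filter as in the question, but return a table instead of a graph,
// so the browser's "Connect result nodes" setting cannot add anything.
MATCH p=(:Column1 {name: 'Root'})-[*1..7]-(:Column1)
WHERE all(n IN nodes(p)
          WHERE all(l IN labels(n) WHERE l IN ['Column1', 'Column2'])
            AND n.deleted = '0')
  AND all(r IN relationships(p) WHERE r.deleted = '0')
UNWIND relationships(p) AS r
// Count each relationship once per distinct value of its deleted property.
RETURN r.deleted AS deleted, count(DISTINCT r) AS rels
```

If the only row that comes back has deleted = '0', the extra relationships in the graph view are being added by the browser and not by the query.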
Try doing
MATCH (:Column1 {name: 'Root'})-[*1..7 {deleted: '0'}]-(t)
WHERE labels(t)[0] IN ['Column1', 'Column2']
RETURN *
Depending on how many relationships the root Column1 node has, and how many Column1 nodes exist in your db, I wouldn't recommend this approach without a LIMIT 50 or whatever limit you want. If you do
PROFILE
MATCH (:Column1 {name: 'Root'})-[*1..7 {deleted: '0'}]-(t)
WHERE labels(t)[0] IN ['Column1', 'Column2']
RETURN *
You'll see just how many db hits this makes, which is really resource consuming.
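As a rough sketch of the LIMIT suggestion, the query could be capped like this. It uses the label predicates from the question's second query, which do not depend on the order in which labels(t) returns the labels; the value 50 is only an example.

```cypher
// Cap the result so an unexpectedly large expansion does not flood the browser.
MATCH (:Column1 {name: 'Root'})-[*1..7 {deleted: '0'}]-(t)
WHERE t:Column1 OR t:Column2
RETURN DISTINCT t
LIMIT 50
```

Prefixing this with PROFILE should show whether the limit actually reduces the db hits on your data; that is not guaranteed, since the variable-length expansion may still do most of the work.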
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How do I change the query to use the count function and get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
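The suggestions above could be combined along these lines. This is only a sketch: RUNS_ON and HOSTS are hypothetical relationship types (the real ones are not given in the question), the directions are guesses, and dropping the master_node label is only safe if it does not change the result on your data.

```cypher
// RUNS_ON and HOSTS are hypothetical relationship types; substitute your own.
// PROFILE reports the db hits so this variant can be compared with the original.
PROFILE
MATCH (m:Application)-[:RUNS_ON]->(:Server)-[:HOSTS]->(n)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
```

Compare the count and the db hits against the undirected, untyped version before adopting it.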
You can find examples of how to use count in the Neo4j docs.
In your case, the first example there should work. It uses
count(*)
to return a count of each returned item.
I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into
is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it require 12 db hits although all nodes/relationships are already retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
Also I have tested different versions of same above query, but the same Query Profile result is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your Cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.); queries are handled by the query planner and then turned into graph operations by Neo4j's internals. It makes sense that the query planner will look for what is likely to be the most efficient way to search the graph (hence why we love neo), so just because a Cypher query is written one way, it won't necessarily search the graph the way we imagine it will in our heads.
The documentation on this seemed a little sparse (or rather, I couldn't find it properly). Any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label, and value x for the indexed property mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationship to a node m with the :Consumer label, and value y for its property mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
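If you only want to look at the plan without running the query, EXPLAIN can be used in place of PROFILE. As a sketch, the simplified form described above could be written as a single pattern; whether the planner produces exactly the same operators for it depends on your Neo4j version and database statistics.

```cypher
// Shows the planned operators without executing the query.
EXPLAIN
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"})-[r:HAS_CONTACT]->(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
RETURN n, m, r;
```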
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use an expand into relationship, make the requirement for the relationship optional, or use explicitly iterative execution. Both these queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
I have currently visualised a graph between myself and a number of other people.
My Current query is:
MATCH (p)-[:emailed]->(m)
WITH p,count(m) as rels, collect(m) as Contact
WHERE rels > 2
RETURN p,Contact, rels
It creates a pretty complex graph as per image below:
Messy Graph
You can manually remove them by directly clicking on them as per below:
Manually hide node from visualisation
Which then results in a very different looking graph.
Q. How do I change my query so that the graph visualisation automatically leaves out the nodes that I wish to remove? (i.e. by editing the query, so I don't have to manually remove each one)
By doing either
A) Adding a list of the specific node IDs to ignore in the query, OR
B) (Ideally) Excluding all nodes that meet a criterion on a node property
In this case: ignore nodes whose slug property contains 'myname'
MATCH (p)-[:emailed]->(m)
WITH p,count(m) as rels, collect(m) as Contact
WHERE rels > 2 AND NOT WHERE p.slug Contains 'Mahdi'
RETURN p,Contact, rels
Thanks for any help!
I would change it slightly. If you collect the actual :emailed relationships rather than just counting the nodes they are connected to, you can use them in your result set. Then if you turn off autocomplete as JeromeB suggests above, you will actually see some relationships. If you turn off autocomplete in your current query, there will only be nodes and no relationships, which I don't think is what you are after (unless of course it is).
You could also check that the p.slug attribute exists when testing for CONTAINS; otherwise, if the attribute does not exist, you will not generate any results for that row.
MATCH (p:User)-[r:emailed]->(m:User)
WITH p, COLLECT(r) as rels, COLLECT(m) as contact
WHERE (NOT p.slug CONTAINS 'Mahdi' OR NOT EXISTS(p.slug))
AND size(rels) > 2
RETURN p, contact, rels
I would also add a label to the nodes in the match and an index on the slug property.
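For the index, a minimal sketch follows. It assumes the :User label from the query above; the exact statement depends on your Neo4j version, and the index name user_slug is only a placeholder.

```cypher
// Neo4j 3.x syntax (the same era as the EXISTS(p.slug) form used above):
CREATE INDEX ON :User(slug)

// Neo4j 4.x and later use a named form instead, for example:
// CREATE INDEX user_slug FOR (u:User) ON (u.slug)
```

Whether a CONTAINS predicate can make use of the index varies by version and index type, so it is worth checking the plan with PROFILE afterwards.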
The autocomplete is 'Connect result nodes' in the Gear tab.
I had a query of this kind, which would basically find a specific node "Statement", find all the nodes connected to it with an :OF relation, find all their connections to one another, as well all the relations of the node "Statement" to other nodes and the node "Statement" itself:
MATCH (s:Statement{uid:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}),
c1-[:OF]->s<-[:OF]-c2, c1-[to:TO]->c2
WITH DISTINCT to, c1, c2, s
MATCH c1-[by:BY]->u, c2-[at:AT]->ctx
WHERE to.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
AND by.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
AND at.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
DELETE s,rel,to,by,at;
This worked OK when there were 3 nodes connected to the "Statement" node, but when there are 100, it crashes the database.
I tried playing around passing different nodes and relationships with a WITH, but it didn't help.
The closest to a solution that I could get was to set up automatic indexing on relationship properties and then execute the deletion with two queries:
MATCH (s:Statement{uid:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}),
s-[by:BY]->u, s-[in:IN]->ctx, c-[of:OF]->s DELETE by,in,of,s;
START rel=relationship:relationship_auto_index
(statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878")
DELETE rel;
2 Questions:
1) I know that the first query took too long because there were too many iterations. How to avoid that?
2) Do you know how to combine the two faster queries above into one so that it works fast and preferably without using the relationship index and START clause?
Thank you!
For this statement
You must not separate the condition on to from the match. Otherwise Cypher will find all matches first and only filter after it is done with that.
MATCH (s:Statement{uid:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}),
c1-[:OF]->s<-[:OF]-c2, c1-[to:TO]->c2
WHERE to.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
WITH DISTINCT to, c1, c2, s
MATCH c1-[by:BY]->u, c2-[at:AT]->ctx
WHERE by.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
AND at.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
DELETE s,rel,to,by,at;
Also I'm not sure whether this c1-[:OF]->s<-[:OF]-c2, c1-[to:TO]->c2 spans a cross-product.
Just do this:
MATCH (s:Statement{uid:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}),
c1-[:OF]->s<-[:OF]-c2, c1-[to:TO]->c2
WHERE to.statement="e63cf470-ade4-11e3-bc66-2d7f9b2c7878"
RETURN count(*),count(distinct c1), count(distinct c2), count(distinct to)
to see some numbers.
You also don't seem to use (u) and (ctx) in the result? So it might be an option to convert that into a condition (I'd have to try it). Then you might even be able to leave off the WITH (if the cardinality with DISTINCT is not much smaller than without).
....
WHERE c2-[:AT {statement:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}]->()
AND c1-[:BY {statement:"e63cf470-ade4-11e3-bc66-2d7f9b2c7878"}]->()
DELETE s,rel,to,b
HTH
Would love to get the dataset to try it out.
I have the following query in Neo4J Cypher 2.0:
MATCH (u:User{uid:'1111111'}), (c1:Concept), (c2:Concept),
c1-[:BY]->u, c2-[:BY]->u, c1-[rel:TO]->c2
WITH c1,c2,rel
MATCH c1-[:AT]->ctx, c2-[:AT]-ctx
WHERE ctx.uid = rel.context
RETURN c1.uid AS source_id, c1.name AS source_name,
c2.uid AS target_id, c2.name AS target_name,
rel.uid AS edge_id,
rel.context AS context_id, ctx.name AS context_name;
It looks for all the nodes with the Concept label (c1 and c2) connected to the User node u and finds their connections to one another (rel, from c1 to c2). It then finds the contexts (ctx) those concept nodes appear in, keeping only those whose uid matches the .context property of the relationship rel (rel.context). Finally, it returns a table with the source id and name, the target id and name, the connection id, the .context property of that relationship, and the name of the context with that id.
So all works fine, but the question is: WHY?
I mean, how does Cypher so neatly match the right ctx.uid to the right rel.context, so that it ends up in exactly the right place in the results table?
Can somebody explain to me the magic behind this?
Or am I completely wrong and just getting messy results?
Thank you!
It creates a pattern graph that represents your combined match patterns. It then uses indexes to find bound nodes from which it starts applying the pattern graph, and returns a result row for every match found.
While applying the pattern graph, it uses your WHERE conditions to filter out unwanted paths as early as possible.
If it can't find bound nodes, it has to go over all nodes of a label (like :Concept) or over all nodes of the graph (if you haven't specified any label or lookup condition).
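To make the row-by-row behaviour more visible, the query from the question can be written with the pattern spelled out. This is a sketch of an equivalent formulation, not a different algorithm: each result row carries one bound combination of c1, rel and c2, and the WHERE condition is checked against the rel in that same row, which is why ctx.uid always lines up with rel.context.

```cypher
// Each row binds one (c1, rel, c2) combination; ctx is then matched for that row only.
MATCH (u:User {uid: '1111111'})<-[:BY]-(c1:Concept)-[rel:TO]->(c2:Concept)-[:BY]->(u)
MATCH (c1)-[:AT]->(ctx), (c2)-[:AT]-(ctx)
WHERE ctx.uid = rel.context   // evaluated against the rel bound in the same row
RETURN c1.uid AS source_id, c1.name AS source_name,
       c2.uid AS target_id, c2.name AS target_name,
       rel.uid AS edge_id,
       rel.context AS context_id, ctx.name AS context_name;
```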