I'm using Neo4j 2.1.7. Recently I was experimenting with MATCH queries that search for nodes with several labels, and I found that the query
Match (p:A:B) return count(p) as number
and
Match (p:B:A) return count(p) as number
take different amounts of time, especially when you have, for example, 2 million A nodes and 0 B nodes.
So does the order of labels affect search time? Is this feature documented anywhere?
Neo4j internally maintains a label scan store: basically a lookup that quickly returns all nodes carrying a given label A.
When doing a query like
MATCH (n:A:B) return count(n)
the label scan store is used to find all A nodes, and those nodes are then filtered on whether they carry label B as well. If n(A) >> n(B), it's far more efficient to write MATCH (n:B:A) instead, since you look up only a few B nodes and filter those for A.
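To make the cost difference concrete, here is a minimal Python sketch of that scan-then-filter strategy. It models the label scan store as a hypothetical dict from label to a set of node IDs (not Neo4j's real storage format), scans the first label written in the pattern, filters on the rest, and counts how many nodes get examined.

```python
def build_label_store(nodes):
    """Build a toy label scan store: label -> set of node ids.

    `nodes` maps node id -> set of labels (hypothetical in-memory data)."""
    store = {}
    for node_id, labels in nodes.items():
        for label in labels:
            store.setdefault(label, set()).add(node_id)
    return store


def count_with_labels(store, nodes, labels):
    """Scan the first label, then filter on the remaining ones.

    Returns (matching_count, nodes_examined), mimicking the 2.1.x behaviour
    where the first label written in the pattern drives the scan."""
    first, rest = labels[0], labels[1:]
    examined = matched = 0
    for node_id in store.get(first, set()):
        examined += 1
        if all(label in nodes[node_id] for label in rest):
            matched += 1
    return matched, examined


# Hypothetical data: 1 A:B node, 1000 A-only nodes, 10 B-only nodes.
nodes = {0: {"A", "B"}}
nodes.update({i: {"A"} for i in range(1, 1001)})
nodes.update({i: {"B"} for i in range(1001, 1011)})
store = build_label_store(nodes)

print(count_with_labels(store, nodes, ["A", "B"]))  # (1, 1001)
print(count_with_labels(store, nodes, ["B", "A"]))  # (1, 11)
```

Both orders give the same count; only the amount of work differs.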
You can use PROFILE MATCH (n:A:B) return count(n) to see the query plan. For Neo4j <= 2.1.x you'll see a different query plan depending on the order of the labels you've specified.
Starting with Neo4j 2.2 (milestone M03 available as of writing this reply) there's a cost-based Cypher optimizer. Cypher is now aware of node statistics and uses them to optimize the query.
As an example I've used the following statements to create some test data:
create (:A:B);
with 1 as a foreach (x in range(1,1000000) | create (:A));
with 1 as a foreach (x in range(1,100) | create (:B));
We now have 100 B nodes, 1M A nodes and 1 AB node. In 2.2, the two statements
MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)
result in the exact same query plan (and therefore in the same execution speed):
+------------------+---------------+------+--------+-------------+---------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation | 3 | 1 | 0 | count(n) | |
| Filter | 12 | 1 | 12 | n | hasLabel(n:A) |
| NodeByLabelScan | 12 | 12 | 13 | n | :B |
+------------------+---------------+------+--------+-------------+---------------+
Since there are only a few B nodes, it's cheaper to scan for B's and filter for A. Smart Cypher, isn't it ;-)
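The choice the 2.2 planner makes here can be sketched in a few lines of Python. This is only an illustration of the idea "scan the label with the fewest nodes, filter on the others"; `label_counts` is a hypothetical stand-in for the statistics Cypher keeps, not its real cost model.

```python
def plan_label_match(label_counts, labels):
    """Pick the scan label from per-label node counts (cost-based idea).

    The label with the fewest nodes becomes the NodeByLabelScan; the
    remaining labels become hasLabel filters. Labels missing from
    `label_counts` are treated as empty (count 0)."""
    scan = min(labels, key=lambda label: label_counts.get(label, 0))
    filters = [label for label in labels if label != scan]
    return scan, filters


# Counts roughly matching the test data above: ~1M A nodes, ~100 B nodes.
counts = {"A": 1_000_001, "B": 101}
print(plan_label_match(counts, ["A", "B"]))  # ('B', ['A'])
print(plan_label_match(counts, ["B", "A"]))  # ('B', ['A'])
```

Because the decision depends only on the statistics, the order in which the labels are written no longer matters.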
I haven't found a question about this or any mention of it in the Neo4j manual.
This query returns the start node:
start n = node:node_auto_index(subject_id='A1')
match (n)-[]->()<-[]-(n)
return distinct n.subject_id;
==> +--------------+
==> | n.subject_id |
==> +--------------+
==> | "A1" |
==> +--------------+
==> 1 row
but this query does not return the start node. Is there any way to make it return the start node along with the other matching nodes?
start n = node:node_auto_index(subject_id='A1')
match (n)-[]->()<-[]-(s)
where s.subject_id = 'A1'
return distinct s.subject_id;
==> +--------------+
==> | s.subject_id |
==> +--------------+
==> +--------------+
==> 0 row
Just to be sure I have the syntax right, the previous query works on nodes other than the start node:
start n = node:node_auto_index(subject_id='A1')
match (n)-[]->()<-[]-(s)
where s.subject_id = 'B2'
return distinct s.subject_id;
==> +--------------+
==> | s.subject_id |
==> +--------------+
==> | "B2" |
==> +--------------+
==> 1 row
I think you ran into identifier uniqueness in cypher paths.
In the same path two different identifiers (if not bound upfront) won't point to the same node.
In your first example, both ends of the path are bound (to the same node), and in the last example you have two different nodes, one bound to n and the other bound to s.
In the second example, the same node would end up bound to both n and s, which Cypher does not do within a path.
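The rule as stated in this answer can be illustrated with a toy Python matcher. It is only a model of the described behaviour, not Neo4j's matching code; the graph, the node names and the `allow_same_node` switch are hypothetical.

```python
def match_out_in(edges, n, allow_same_node):
    """Match (n)-[]->(x)<-[]-(s) for a bound start node n; return all s.

    edges: list of (start, end) pairs, i.e. directed relationships.
    allow_same_node=False applies the rule described above: an identifier
    that is not bound upfront (s) is never bound to the same node as n."""
    result = set()
    for start, middle in edges:
        if start != n:
            continue
        for s, end in edges:
            if end != middle:
                continue
            if not allow_same_node and s == n:
                continue
            result.add(s)
    return result


# Hypothetical graph: A1 and B2 both point at the same node X.
edges = [("A1", "X"), ("B2", "X")]
print(sorted(match_out_in(edges, "A1", allow_same_node=True)))   # ['A1', 'B2']
print(sorted(match_out_in(edges, "A1", allow_same_node=False)))  # ['B2']
```

With the rule applied, B2 is still found, but A1 can never come back through s, which matches the empty result of the `s.subject_id = 'A1'` query.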
The structure of my nodes is like this:
==> | Node[613]{name:"The Bigos",fs_id:"51a8e1a12fc6e7ef6d121077"}
==> | Node[614]{name:"Maceraperest",fs_id:"51bafb3d498ed54bd4c7fa8c"}
==> | Node[616]{name:"Viking",fs_id:"51bafe1de4b090ea9dceb20e"}
==> | Node[618]{name:"Metro Gross Market",fs_id:"51bb426c498e47af428ca013"}
When I try to create these nodes again, a PHP script I wrote checks fs_id to find out whether the node already exists. If it does, the script returns that node and does not create a new one.
Now the problem is that even though it does not create new nodes, the console shows that it did:
==> | Node[613]{name:"The Bigos",fs_id:"51a8e1a12fc6e7ef6d121077"}
==> | Node[613]{name:"The Bigos",fs_id:"51a8e1a12fc6e7ef6d121077"}
==> | Node[613]{name:"The Bigos",fs_id:"51a8e1a12fc6e7ef6d121077"}
==> | Node[614]{name:"Maceraperest",fs_id:"51bafb3d498ed54bd4c7fa8c"}
==> | Node[614]{name:"Maceraperest",fs_id:"51bafb3d498ed54bd4c7fa8c"}
==> | Node[614]{name:"Maceraperest",fs_id:"51bafb3d498ed54bd4c7fa8c"}
==> | Node[616]{name:"Viking",fs_id:"51bafe1de4b090ea9dceb20e"}
==> | Node[616]{name:"Viking",fs_id:"51bafe1de4b090ea9dceb20e"}
==> | Node[616]{name:"Viking",fs_id:"51bafe1de4b090ea9dceb20e"}
==> | Node[618]{name:"Metro Gross Market",fs_id:"51bb426c498e47af428ca013"}
==> | Node[618]{name:"Metro Gross Market",fs_id:"51bb426c498e47af428ca013"}
==> | Node[618]{name:"Metro Gross Market",fs_id:"51bb426c498e47af428ca013"}
Look at the node IDs: they are the same! And if I explore node 618, for example, in the data browser, it shows a single node. Also the query
start n=node(618) return n;
also returns a single row. But the query below returns multiple rows with the same node ID, and the row count increases every time I test the above nodes for existence:
start n=node(331) match n-[:BEEN]->(venues) return venues order by id(venues);
It might be nothing, but I'm curious whether Neo4j is somehow using extra memory for this or whether it is just some kind of caching.
You probably just have multiple BEEN relationships; each of those relationships yields another result row.
If you want just one row per venue, do this:
start n=node(331)
match n-[:BEEN]->(venues)
return distinct venues;
To see the different relationships, use:
start n=node(331)
match n-[rel:BEEN]->(venues)
return venues,collect(rel);
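A small Python model shows why the duplicates appear and what the two queries change. The relationship and venue IDs are hypothetical, and the list order here is just Python's; real Cypher gives no row order guarantee without ORDER BY.

```python
from collections import defaultdict

# Hypothetical BEEN relationships from node 331 as (relationship_id, venue_id);
# venue 613 is reached through two separate relationships.
been = [(1, 613), (2, 613), (3, 614)]


def venue_rows(rels):
    """MATCH n-[:BEEN]->(venues) RETURN venues: one row per relationship."""
    return [venue for _, venue in rels]


def distinct_venues(rels):
    """RETURN DISTINCT venues: one row per venue (first-seen order kept)."""
    return list(dict.fromkeys(venue for _, venue in rels))


def venues_with_rels(rels):
    """RETURN venues, collect(rel): one row per venue, relationships grouped."""
    grouped = defaultdict(list)
    for rel_id, venue in rels:
        grouped[venue].append(rel_id)
    return dict(grouped)


print(venue_rows(been))        # [613, 613, 614]
print(distinct_venues(been))   # [613, 614]
print(venues_with_rels(been))  # {613: [1, 2], 614: [3]}
```

The node itself is stored once; the extra rows come purely from the extra relationships pointing at it.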
I am getting confused by the way relationships are created through Cypher. I was under the impression that _src-[:likes]-_dst creates a bidirectional relationship, but it looks like that is not the case, since _src-[:likes]-_dst == _src<-[:likes]-_dst (example provided below).
Let's say I create the following graph using the _src-[:likes]-_dst notation (using '-' as opposed to '->'):
create
(_u1 {type:"User",`name`:"u1",userId:'u1' }) , ( _u2 {type:"User",`name`:"u2",userId:'u2'} ) , ( _u3 {type:"User",`name`:"u3",userId:'u3' }) , ( _u4 {type:"User",`name`:"u4",userId:'u4' }) , ( _u5 {type:"User",`name`:"u5",userId:'u5'}) , (_u6 {type:"User",`name`:"u6",userId:'u6'}),
(_f1 {type:"Item",`name`:"f1",itemId:'f1' }) , ( _f2 {type:"Item",`name`:"f2",itemId:'f2' }) , ( _f3 {type:"Item",`name`:"f3",itemId:'f3' }) , ( _f4 {type:"Item",`name`:"f4",itemId:'f4'}) , (_f5 {type:"Item",`name`:"f5",itemId:'f5'}),
_u1-[:`likes`{likeValue:3}]-_f1 , _u1-[:`likes` {likeValue:13}]-_f2 , _u1-[:`likes` {likeValue:1}]-_f3 , _u1-[:`likes` {likeValue:5}]-_f4,
_u2-[:`likes`{likeValue:7}]-_f1 , _u2-[:`likes` {likeValue:13}]-_f2 , _u2-[:`likes` {likeValue:1}]-_f3,
_u3-[:`likes`{likeValue:5}]-_f1 , _u3-[:`likes` {likeValue:8}]-_f2 , _u4-[:`likes`{likeValue:5}]-_f1
,_u5-[:`likes` {likeValue:8}]-_f2,_u6-[:`likes` {likeValue:8}]-_f2;
My impression was that this way you tell Neo4j to create a bidirectional relationship. Now look at the following query:
neo4j-sh (?)$ start n=node(*) match n-[:likes]->m where has(n.type) and n.type='User' return n,m;
==> +-------+
==> | n | m |
==> +-------+
==> +-------+
==> 0 row
But the opposite works:
neo4j-sh (?)$ start n=node(*) match n-[r]->m where has(n.type) and n.type="Item" return n,m limit 3;
==> +-----------------------------------------------------------------------------------------+
==> | n | m |
==> +-----------------------------------------------------------------------------------------+
==> | Node[7]{type:"Item",name:"f1",itemId:"f1"} | Node[4]{type:"User",name:"u4",userId:"u4"} |
==> | Node[7]{type:"Item",name:"f1",itemId:"f1"} | Node[3]{type:"User",name:"u3",userId:"u3"} |
==> | Node[7]{type:"Item",name:"f1",itemId:"f1"} | Node[2]{type:"User",name:"u2",userId:"u2"} |
==> +-----------------------------------------------------------------------------------------+
The question is: why is a-[:likes]-b = a<-[:likes]-b?
Now I create two more nodes and a relationship as instructed in the Cypher manual:
create (_u7 {type:"User",`name`:"u7",userId:'u7' });
create (_f7 {type:"Item",`name`:"f7",itemId:'f7' });
start src=node(*),dst=node(*) where src.name='u7' and dst.name='f7' create src-[:likes{likeValue:3}]-dst;
neo4j-sh (?)$ start n=node(*) match n-[r]->m where has(n.type) and n.type="User" return n,m limit 3;
==> +-------+
==> | n | m |
==> +-------+
==> +-------+
==> 0 row
Same result: we can't query from User to Item, but we can from Item to User.
Now, if I use the following method, things change:
create (_u {type:"User",`name`:"u8",userId:'u8' }) , ( _f {type:"User",`name`:"f8",userId:'f8'} ), _u-[:likes{likeValue:2}]-_f;
neo4j-sh (?)$ start n=node(*) match n-[r]->m where has(n.type) and n.type="User" return n,m limit 3;
==> +-------------------------------------------------------------------------------------------+
==> | n | m |
==> +-------------------------------------------------------------------------------------------+
==> | Node[19]{type:"User",name:"f8",userId:"f8"} | Node[18]{type:"User",name:"u8",userId:"u8"} |
==> +-------------------------------------------------------------------------------------------+
What is going on? These are my questions:
1- Why does create _src-[:likes]-_dst not create a bidirectional relationship?
2- If it can't then why even allow _src-[:likes]-_dst for relationship creation? Why not force people to use directions when creating relationships?
3- What is the difference between the two methods I used to create relationships? (u7-f7 and u8-f8)
You can't create a bidirectional relationship using _src-[:likes]-_dst.
In Neo4j, a relationship can and must have exactly one direction. So to represent a bidirectional relationship, you have two options:
a) Create the relationship with a direction but ignore the direction when querying (_src-[:likes]-_dst matches both directions when used in a MATCH clause)
b) Create two relationships, one in each direction
It appears that if you execute a CREATE without a direction, such as _src-[:likes]-_dst, an incoming relationship is created for _src.
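The two options can be compared with a toy Python model in which every relationship is stored with exactly one direction. The node names and the helper are hypothetical; the `single` list assumes the behaviour observed above, where the undirected CREATE stored the relationship as incoming for _src.

```python
# Every relationship is stored with exactly one direction: (start, type, end).
def neighbours(rels, node, rel_type, direction):
    """Return nodes reachable from `node` over `rel_type`.

    direction: "out" for -[]->, "in" for <-[]-, "both" for -[]- (MATCH only).
    One result per matching relationship, so duplicates are possible."""
    found = []
    for start, typ, end in rels:
        if typ != rel_type:
            continue
        if direction in ("out", "both") and start == node:
            found.append(end)
        if direction in ("in", "both") and end == node:
            found.append(start)
    return found


# What the undirected CREATE appears to store for u1/f1 (incoming for _src = u1).
single = [("f1", "likes", "u1")]
print(neighbours(single, "u1", "likes", "out"))   # []
print(neighbours(single, "u1", "likes", "both"))  # ['f1']  (option a)

# Option b: one relationship in each direction.
pair = [("u1", "likes", "f1"), ("f1", "likes", "u1")]
print(neighbours(pair, "u1", "likes", "out"))     # ['f1']
print(neighbours(pair, "u1", "likes", "both"))    # ['f1', 'f1']
```

Note that with option (b) an undirected match returns every neighbour twice, so you would normally query it with an explicit direction.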
I'm exploring a new graph data model in Neo4j, and I was wondering how to list all the possible node properties (just the names, not their values) if possible.
For relationships, I found this very handy generic Cypher query:
start n=node(*)
match n-[r]-m
return distinct type(r)
which returns a useful list of properties you can start using to query the graph more specifically:
==> +------------+
==> | type(r) |
==> +------------+
==> | "RATED" |
==> | "FRIEND" |
==> | "DIRECTED" |
==> | "ACTS_IN" |
==> +------------+
==> 4 rows
==> 0 ms
==>
Is there any function or expression that does the same for node properties?
Thanks
type() does not return relationship properties, but the relationship type.
Both nodes and relationships can have properties, but only relationships can have a type.
To list all the property keys used on nodes in the graph, you can try the following Cypher:
match (n)
WITH distinct keys(n) as properties
UNWIND properties as property
return distinct property
Thanks,
Vishal
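The query above can be mirrored step by step in a short Python sketch over hypothetical in-memory property maps (sample values borrowed from the earlier questions). It only models the Cypher semantics (DISTINCT keys lists, UNWIND, DISTINCT again); it is not a Neo4j client call.

```python
def distinct_property_keys(nodes):
    """Mirror of MATCH (n) WITH DISTINCT keys(n) AS properties
    UNWIND properties AS property RETURN DISTINCT property.

    `nodes` is a list of property maps (hypothetical in-memory data)."""
    key_lists = dict.fromkeys(tuple(node.keys()) for node in nodes)  # WITH DISTINCT keys(n)
    result = []
    for keys in key_lists:         # UNWIND properties AS property
        for key in keys:
            if key not in result:  # RETURN DISTINCT property
                result.append(key)
    return result


nodes = [
    {"name": "The Bigos", "fs_id": "51a8e1a12fc6e7ef6d121077"},
    {"name": "Viking", "fs_id": "51bafe1de4b090ea9dceb20e"},
    {"type": "User", "name": "u1", "userId": "u1"},
]
print(distinct_property_keys(nodes))  # ['name', 'fs_id', 'type', 'userId']
```

The first DISTINCT is just an optimization: many nodes share the same key list, so deduplicating before UNWIND keeps the intermediate result small.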