The structure of my nodes is like this:
==> | Node[613]{name:"The Bigos",fs_id:"51a8e1a12fc6e7ef6d121077"}
==> | Node[614]{name:"Maceraperest",fs_id:"51bafb3d498ed54bd4c7fa8c"}
==> | Node[616]{name:"Viking",fs_id:"51bafe1de4b090ea9dceb20e"}
==> | Node[618]{name:"Metro Gross Market",fs_id:"51bb426c498e47af428ca013"}
When I try to create these nodes again, a PHP script I wrote checks fs_id to find out whether the node already exists. If it exists, the script returns the existing node and does not create a new one.
Now the problem is that even though it does not create new nodes, the console output suggests that it did.
==> | Node[613]{name:"The Bigos",fs_id:"51a8e1a12fc6e7ef6d121077"}
==> | Node[613]{name:"The Bigos",fs_id:"51a8e1a12fc6e7ef6d121077"}
==> | Node[613]{name:"The Bigos",fs_id:"51a8e1a12fc6e7ef6d121077"}
==> | Node[614]{name:"Maceraperest",fs_id:"51bafb3d498ed54bd4c7fa8c"}
==> | Node[614]{name:"Maceraperest",fs_id:"51bafb3d498ed54bd4c7fa8c"}
==> | Node[614]{name:"Maceraperest",fs_id:"51bafb3d498ed54bd4c7fa8c"}
==> | Node[616]{name:"Viking",fs_id:"51bafe1de4b090ea9dceb20e"}
==> | Node[616]{name:"Viking",fs_id:"51bafe1de4b090ea9dceb20e"}
==> | Node[616]{name:"Viking",fs_id:"51bafe1de4b090ea9dceb20e"}
==> | Node[618]{name:"Metro Gross Market",fs_id:"51bb426c498e47af428ca013"}
==> | Node[618]{name:"Metro Gross Market",fs_id:"51bb426c498e47af428ca013"}
==> | Node[618]{name:"Metro Gross Market",fs_id:"51bb426c498e47af428ca013"}
Look at the node ids: they are the same! And if I explore node 618, for example, in the data browser, it returns a single node. Also the query
start n=node(618) return n;
also returns a single row. But the query below returns multiple rows with the same node id, and the row count increases each time I test the above nodes for existence.
start n=node(331) match n-[:BEEN]->(venues) return venues order by id(venues);
It might be nothing, but I'm curious whether Neo4j is somehow using extra memory for this or whether it is just some kind of caching.
You probably just have multiple BEEN relationships to the same venue; each of those relationships yields another result row.
If you just want one row per venue, do this:
start n=node(331)
match n-[:BEEN]->(venues)
return distinct venues;
To see the different relationships, use:
start n=node(331)
match n-[rel:BEEN]->(venues)
return venues,collect(rel);
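To make the "one row per relationship" point concrete, here is a small Python sketch that imitates the three queries above on made-up data. The `been` list and the helper names are assumptions for illustration only; this is not how Neo4j evaluates the query internally.

```python
from collections import defaultdict

# Hypothetical data: (start_node_id, rel_id, venue_id), one triple per BEEN relationship.
been = [
    (331, 1, 613), (331, 2, 613), (331, 3, 613),
    (331, 4, 614), (331, 5, 614),
    (331, 6, 616),
]

def match_been(start):
    """One result row per relationship, like `match n-[:BEEN]->(venues) return venues`."""
    return [venue for (src, _rel, venue) in been if src == start]

def match_been_distinct(start):
    """Like `return distinct venues`: each venue once, however many relationships exist."""
    return sorted(set(match_been(start)))

def rels_per_venue(start):
    """Like `return venues, collect(rel)`: the relationships grouped by venue."""
    grouped = defaultdict(list)
    for src, rel, venue in been:
        if src == start:
            grouped[venue].append(rel)
    return dict(grouped)

print(match_been(331))           # repeated venue ids, one per relationship
print(match_been_distinct(331))  # each venue id once
print(rels_per_venue(331))       # which relationships point at which venue
```

If the grouped output shows several relationships per venue, the script is most likely creating a new BEEN relationship on every run even though it reuses the node.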
I'm getting started with Neo4j 2.0.1 and I'm already running into performance problems that make me think that my approach is wrong. I have a single node type so far (all with the label NeoPerson) and one type of relationship (all of type NeoWeight). In my test setup, there are about 100,000 nodes and each node has between 0 and 300 relationships to other nodes. There is a Neo4j 2.0-style index on NeoPerson's only field, called profile_id (e.g. CREATE INDEX ON :NeoPerson(profile_id)). Looking up a NeoPerson by profile_id is reasonably fast:
neo4j-sh (?)$ match (n:NeoPerson {profile_id:38}) return n;
+----------------------------+
| n |
+----------------------------+
| Node[23840]{profile_id:38} |
+----------------------------+
1 row
45 ms
However, once I throw relationships into the mix, it gets quite slow.
neo4j-sh (?)$ match (n:NeoPerson {profile_id:38})-[e:NeoWeight]->() return n, e;
+----------------------------------------------------------------------------+
| n | e |
+----------------------------------------------------------------------------+
| Node[23840]{profile_id:38} | :NeoWeight[8178324]{value:384} |
| Node[23840]{profile_id:38} | :NeoWeight[8022460]{value:502} |
...
| Node[23840]{profile_id:38} | :NeoWeight[54914]{} |
+----------------------------------------------------------------------------+
244 rows
2409 ms
My understanding was that traversing relationships from a single node should be quite efficient (isn't that the point of using a graph database?), so why is it taking over 2 seconds for such a simple query on a small data set? I didn't see a way to add an index on a relationship whose keys are the source and/or destination nodes.
People use Neo4j in production without issues. If the first user query has to return within a few ms, they warm up the caches after server start, e.g. by running their most important use-case queries upfront.
It takes some time to load the nodes and rels from disk, especially if the relationships (and their properties) of the single node are scattered across the relationship store file and are loaded from a spinning disk.
The first query also takes a bit longer because its query plan has to be built and compiled.
That's why in production you usually use parameters, so that the compiled query plan can be cached and reused.
What is the use-case you're trying to address?
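To illustrate why parameters help with query caching, here is a toy Python sketch of a plan cache keyed by query text. `compile_plan`, `plan_cache` and `run` are hypothetical stand-ins, not Neo4j APIs; the only point is that queries with inlined literals differ as strings, while a parameterized query keeps the same text on every call.

```python
# Toy plan cache keyed by query text. compile_plan stands in for the expensive
# "build and compile the query plan" step; nothing here is a real Neo4j API.
compiled = []
plan_cache = {}

def compile_plan(query_text):
    compiled.append(query_text)
    return ("plan", query_text)

def run(query_text, params=None):
    """Look up the plan by query text and compile only on a cache miss."""
    if query_text not in plan_cache:
        plan_cache[query_text] = compile_plan(query_text)
    return plan_cache[query_text], params or {}

# Literal values inlined: every profile_id produces a different query string.
for pid in (38, 39, 40):
    run("match (n:NeoPerson {profile_id:%d})-[e:NeoWeight]->() return n, e" % pid)
literal_compiles = len(compiled)

compiled.clear()
plan_cache.clear()

# Parameterized: the query text stays identical, only the parameter map changes.
for pid in (38, 39, 40):
    run("match (n:NeoPerson {profile_id:{pid}})-[e:NeoWeight]->() return n, e", {"pid": pid})
param_compiles = len(compiled)

print(literal_compiles, param_compiles)
```

In this toy model the literal variant compiles once per distinct value, while the parameterized variant compiles once in total.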
I haven't found a question about this or found any comment in the Neo4j manual.
This query returns the start node:
start n = node:node_auto_index(subject_id='A1')
match (n)-[]->()<-[]-(n)
return distinct n.subject_id;
==> +--------------+
==> | n.subject_id |
==> +--------------+
==> | "A1" |
==> +--------------+
==> 1 row
but this query does not return the start node. Is there any way to make it return the start node along with the other matching nodes?
start n = node:node_auto_index(subject_id='A1')
match (n)-[]->()<-[]-(s)
where s.subject_id = 'A1'
return distinct s.subject_id;
==> +--------------+
==> | s.subject_id |
==> +--------------+
==> +--------------+
==> 0 row
Just to be sure I have the syntax right, the previous query works on nodes other than the start node:
start n = node:node_auto_index(subject_id='A1')
match (n)-[]->()<-[]-(s)
where s.subject_id = 'B2'
return distinct s.subject_id;
==> +--------------+
==> | s.subject_id |
==> +--------------+
==> | "B2" |
==> +--------------+
==> 1 row
I think you ran into identifier uniqueness in Cypher paths.
Within the same path, two different identifiers (if not bound upfront) won't point to the same node.
In your first example both ends of the path are bound (to the same node), and in the last example you have two different nodes, one bound to n and the other bound to s.
In the second example you would end up with the same node being bound to both n and s, which Cypher does not do within a path.
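As a rough illustration of the rule as stated in this answer, here is a toy Python matcher over a made-up two-edge graph. The `edges` list and both functions are assumptions for the sketch; it models only the "two identifiers never share a node" rule described above and is not Neo4j's actual pattern matcher.

```python
# Hypothetical toy graph: directed edges as (source, target) pairs.
edges = [("A1", "X"), ("B2", "X")]

def match_two_identifiers(start):
    """Pattern (n)-[]->()<-[]-(s) with n bound to `start` and s unbound.
    Models the rule described above: s is never bound to the node n already holds."""
    found = set()
    for src, middle in edges:
        if src != start:
            continue
        for other, other_middle in edges:
            if other_middle == middle and other != start:
                found.add(other)
    return sorted(found)

def match_same_identifier(start):
    """Pattern (n)-[]->()<-[]-(n): both ends use the same, already bound identifier,
    so in this model any outgoing edge from `start` satisfies it."""
    return any(src == start for src, _target in edges)

print(match_same_identifier("A1"))   # the first query finds the start node
print(match_two_identifiers("A1"))   # the second pattern never yields A1 as s
```

Under that rule, filtering the second pattern with `where s.subject_id = 'A1'` can only ever give an empty result, which matches what the question shows.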
This is the number of nodes before I create the new one:
neo4j-sh (0)$ match n return n;
==> +------------------------------------------------------------------------+
==> | n |
==> +------------------------------------------------------------------------+
==> | Node[0]{} |
==> | Node[1]{address:"rioeduardo92#gmail.com",comment:"home",person_id:"1"} |
==> | Node[2]{address:"rioeduardo92#yahoo.com",comment:"work",person_id:"1"} |
==> | Node[3]{person_id:"1",name:"Rio"} |
==> +------------------------------------------------------------------------+
After I created the new one, the node that I just created got node number 300:
neo4j-sh (0)$ create (n:lolo{color:'blue'}) return n;
==> +-------------------------+
==> | n |
==> +-------------------------+
==> | Node[300]{color:"blue"} |
==> +-------------------------+
Thank you
It's not the number of nodes that is increasing but the internal node id. If you created a lot of nodes and then deleted them, for example, your new node may have been given the next highest id (300) because the old ids haven't been recycled yet.
This is why you should never rely on the internal node id to serve as an identifier/key for your nodes.
start n=node(*) return count(n)
should give you the true number of nodes in your graph
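The difference between "highest id handed out" and "number of nodes" can be sketched with a toy store in Python. `ToyStore` is purely hypothetical: it assumes ids come from a simple counter and that freed ids are not reused right away, which is a simplification of the behaviour described above.

```python
class ToyStore:
    """Hypothetical node store: ids come from a counter and freed ids are not reused immediately."""

    def __init__(self):
        self.next_id = 0
        self.nodes = {}

    def create(self, **props):
        node_id = self.next_id
        self.next_id += 1
        self.nodes[node_id] = props
        return node_id

    def delete(self, node_id):
        del self.nodes[node_id]

    def count(self):
        """The true number of nodes, like `start n=node(*) return count(n)`."""
        return len(self.nodes)

store = ToyStore()
ids = [store.create() for _ in range(300)]  # ids 0..299
for node_id in ids[4:]:                     # delete everything except nodes 0..3
    store.delete(node_id)

new_id = store.create(color="blue")
print(new_id, store.count())
```

Here the new node gets id 300 although only five nodes exist, which is the situation in the question.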
Given a Neo4J Node with an array property, how do I create a Cypher query to return only the node(s) that match an array literal?
Using the console I created a node with the array property called "list":
neo4j-sh (0)$ create n = {list: [1,2,3]};
==> +-------------------+
==> | No data returned. |
==> +-------------------+
==> Nodes created: 1
==> Properties set: 1
==> 83 ms
neo4j-sh (0)$ start n=node(1) return n;
==> +-----------------------+
==> | n |
==> +-----------------------+
==> | Node[1]{list:[1,2,3]} |
==> +-----------------------+
==> 1 row
==> 1 ms
However, my query does not return the Node that was just created given a WHERE clause that matches an array literal:
neo4j-sh (0)$ start n=node(1) where n.list=[1,2,3] return n;
==> +---+
==> | n |
==> +---+
==> +---+
==> 0 row
==> 0 ms
It's entirely possible I'm mis-using Cypher. Any tips on doing exact array property matching in Cypher would be helpful.
The console is always running the latest SNAPSHOT builds of Neo4j. The version refers to the Cypher syntax parser; we will point that out more clearly :)
Now, there have been some fixes to array handling in Cypher, see https://github.com/neo4j/community/pull/815 and https://github.com/neo4j/community/issues/818, which are probably the ones that make the console work. These were merged after 1.8.M07, so to get it working locally, please download one of the latest 1.8-SNAPSHOT builds, build it from GitHub, or wait for 1.8.M08, which is due very soon.
/peter
Which version of Neo4j are you using?
Your same code works for me in 1.8M07.
http://console.neo4j.org/?id=p9cy6l
Update:
I get the same result (no results) in a local install via the web client. Maybe it's a web client issue?
I have a graph like this:
(2)<-[0:CHILD]-(1)-[1:CHILD]->(3)
In words: nodes 1, 2 and 3 (all with names); edges 0 and 1.
I wrote the following Cypher query:
START nodes = node(1,2,3), relationship = relationship(0,1)
RETURN nodes, relationship
and got as a result:
==> +-----------------------------------------------+
==> | nodes | relationship |
==> +-----------------------------------------------+
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[0] {} |
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[1] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[0] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[1] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[0] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[1] {} |
==> +-----------------------------------------------+
==> 6 rows, 0 ms
Now my question:
Why do I get every node twice and every relationship three times? I just want to get each of them once.
thanks for your time ^^
The way Cypher works is very similar to SQL. Declaring variables in your START clause is roughly like listing tables in a SQL FROM clause (from nodes, relationships). You get a cartesian product of all possible values for the two because there is no MATCH or WHERE to relate them, so it's basically like:
select *
from nodes, relationships
Where you forgot to put the foreign key relationship between the tables.
In Cypher, you do this by doing a match, usually:
start n=node(1,2,3), r=relationship(0,1)
match n-[r]-m // find where the n nodes and the r relationships point (to m)
return *
But since you have no match, you get a cartesian product.
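The same effect can be reproduced in a few lines of Python. The `nodes` list and `rels` dict below are a hand-written copy of the graph from the question (an assumption for illustration), and the filter imitates the undirected `match n-[r]-m` from the query above.

```python
from itertools import product

# The graph from the question: (2)<-[0:CHILD]-(1)-[1:CHILD]->(3)
nodes = [1, 2, 3]
rels = {0: (1, 2), 1: (1, 3)}  # relationship id -> (start node, end node)

# No match clause: every node is paired with every relationship.
unfiltered = list(product(nodes, rels))

# With `match n-[r]-m`: keep only pairs where n is an endpoint of r.
matched = [(n, r) for n, r in product(nodes, rels) if n in rels[r]]

print(len(unfiltered))  # 3 nodes x 2 relationships
print(matched)
```

Without the filter, each node appears twice and each relationship three times, exactly as in the question's output.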
You should only see the nodes and relationships once, unless you do some matching.
Tried to reproduce your problem, but I haven't been able to.
http://tinyurl.com/cobd8oq
Is it possible for you to create a console.neo4j.org example of your problem?
Thanks,
Andrés