I have a large number of EmpBase nodes imported (from a CSV file exported from PostgreSQL), like:
neo4j-sh (?)$ match (e:EmpBase) return e limit 10;
+-------------------------------------------------------------------------+
| e |
+-------------------------------------------------------------------------+
| Node[8992]{neo_eb_id:8993,neo_eb_name:"emp_no_8993",neo_eb_bossID:5503} |
| Node[8993]{neo_eb_id:8994,neo_eb_name:"emp_no_8994",neo_eb_bossID:8131} |
| Node[8994]{neo_eb_id:8995,neo_eb_name:"emp_no_8995",neo_eb_bossID:8624} |
What Cypher query can create these self-referencing relationships, so that every node with a neo_eb_bossID gets a relationship to the corresponding boss node?
In PostgreSQL the table is about 1020 MB. In Neo4j, after import, the console reports 6.42 GiB.
To create the relationships based on neo_eb_bossID, you can match the nodes and run a FOREACH loop that creates a relationship from each node to its related node:
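// Collect all EmpBase nodes, then link each one to its boss
MATCH (e:EmpBase) WITH collect(e) AS empbases
FOREACH (emp in empbases |
  // MERGE finds the boss node by id, or creates it if none exists
  MERGE (target:EmpBase {neo_eb_id:emp.neo_eb_bossID})
  MERGE (emp)-[:YOUR_RELATIONSHIP]->(target)
)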
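To make the semantics of that query concrete, here is a minimal plain-Python model of it. It is an illustration only, not Neo4j code; the function name link_to_bosses and the sample ids are hypothetical. Each employee is linked to the node whose neo_eb_id equals its neo_eb_bossID, and, as with MERGE, the boss node is created if it does not exist and no relationship is added twice.

```python
def link_to_bosses(employees):
    """Model of the FOREACH/MERGE query (hypothetical helper, not a Neo4j API).

    employees: list of dicts with 'neo_eb_id' and 'neo_eb_bossID' keys.
    Returns (nodes_by_id, relationships); relationships is a set of
    (employee_id, boss_id) pairs, so a pair is never added twice (like MERGE).
    """
    nodes_by_id = {emp["neo_eb_id"]: dict(emp) for emp in employees}
    relationships = set()
    for emp in employees:
        boss_id = emp["neo_eb_bossID"]
        # MERGE (target:EmpBase {neo_eb_id: ...}): reuse the boss node or create it
        if boss_id not in nodes_by_id:
            nodes_by_id[boss_id] = {"neo_eb_id": boss_id}
        # MERGE (emp)-[:YOUR_RELATIONSHIP]->(target)
        relationships.add((emp["neo_eb_id"], boss_id))
    return nodes_by_id, relationships


# Hypothetical sample: employee 3 reports to an id (4) that is not in the data
SAMPLE = [
    {"neo_eb_id": 1, "neo_eb_bossID": 3},
    {"neo_eb_id": 2, "neo_eb_bossID": 3},
    {"neo_eb_id": 3, "neo_eb_bossID": 4},
]
```

The model exposes one consequence of using MERGE for the target: a neo_eb_bossID that points to no existing employee produces a new boss node holding only neo_eb_id, rather than being skipped.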
Concerning the self-relationship, I find it hard to understand exactly what you want.
Chris
In Neo4j my database consists of chains of nodes. For each distinct structure/layout (does graph theory have a better word?), I want to count the number of chains. For example, the database consists of 9 nodes and 5 relationships like this:
(:a)->(:b)
(:b)->(:a)
(:a)->(:b)
(:a)->(:b)->(:b)
where (:a) is a node with label a. Properties on nodes and relationships are irrelevant.
The result of the counting should be:
------------------------
| Structure | n |
------------------------
| (:a)->(:b) | 2 |
| (:b)->(:a) | 1 |
| (:a)->(:b)->(:b) | 1 |
------------------------
Is there a query that can achieve this?
Appendix
Query to create test data:
create (:a)-[:r]->(:b), (:b)-[:r]->(:a), (:a)-[:r]->(:b), (:a)-[:r]->(:b)-[:r]->(:b)
EDIT:
Thanks for the clarification.
We can get the equivalent of what you want by capturing the path pattern as the list of labels present:
MATCH path = (start)-[*]->(end)
WHERE NOT ()-->(start) and NOT (end)-->()
RETURN [node in nodes(path) | labels(node)[0]] as structure, count(path) as n
This gives you a list of the node labels along each path (only the first label of each node; remember that nodes can have multiple labels, which may throw off your results).
As for getting it into the exact format from your example, that's a different matter. We could do this with the text functions in APOC Procedures, specifically apoc.text.join().
We would first need to format each extracted label, adding the leading : and the surrounding parentheses. Then we can use apoc.text.join() to get a string in which the nodes are joined by your desired '->' symbol:
MATCH path = (start)-[*]->(end)
WHERE NOT ()-->(start) and NOT (end)-->()
WITH [node in nodes(path) | labels(node)[0]] as structure, count(path) as n
RETURN apoc.text.join([label in structure | '(:' + label + ')'], '->') as structure, n
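The same counting logic can be sketched outside Neo4j. Below is a hedged plain-Python model of the query above (not how Neo4j executes it), assuming the graph consists of acyclic chains as in the question; count_structures is a hypothetical name. It walks from every node without incoming relationships to every reachable node without outgoing ones, and counts the formatted label sequences.

```python
from collections import Counter


def count_structures(labels, edges):
    """Model of:
        MATCH path = (start)-[*]->(end)
        WHERE NOT ()-->(start) AND NOT (end)-->()
    labels: dict node_id -> label; edges: list of (src, dst) pairs.
    Assumes acyclic chains, as in the question.
    """
    outgoing = {node: [] for node in labels}
    has_incoming = set()
    for src, dst in edges:
        outgoing[src].append(dst)
        has_incoming.add(dst)

    counts = Counter()

    def walk(path):
        node = path[-1]
        if not outgoing[node]:
            # reached an end node with no outgoing relationships
            counts["->".join(f"(:{labels[n]})" for n in path)] += 1
            return
        for nxt in outgoing[node]:
            if nxt not in path:  # guard against cycles
                walk(path + [nxt])

    for start in labels:
        # [*] needs at least one hop, so isolated nodes are skipped
        if start not in has_incoming and outgoing[start]:
            walk([start])
    return counts


# The question's test data, with arbitrary node ids 1..9
LABELS = {1: "a", 2: "b", 3: "b", 4: "a", 5: "a", 6: "b", 7: "a", 8: "b", 9: "b"}
EDGES = [(1, 2), (3, 4), (5, 6), (7, 8), (8, 9)]
```

On the question's test data this yields 2 for (:a)->(:b) and 1 each for (:b)->(:a) and (:a)->(:b)->(:b), matching the expected table.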
I have a CSV file containing activities (a process graph):
startActivityId,Name,endActivityId
1,A,2
2,B,3
3,C,4
4,D,5
so that it should look like this: A->B->C->D
I imported the CSV file into the Neo4j server successfully, using this Cypher query:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:path/graph/activity.csv" AS row
CREATE (:Activity {startactivityId:row.startActivityId, Name: row.Name, endActivityId: row.endActivityId});
I then created an index on startactivityId:
CREATE INDEX ON :activity(startActivityId);
Then I wanted to create the relationships between these nodes, so I tried this Cypher query:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:path/graph/activity.csv" AS row
MATCH (startActivity:Activity {startActivityId: row.startActivityId})
MATCH (endActivity:Activity {startActivityId: row.endActivityId})
MERGE (startActivity)-[:LINKS_TO]->(endActivity);
but no relationships are created; nothing happens.
I'm sure I missed something, because I'm new to Cypher, but I can't figure it out.
Any ideas?
I copied your updated CSV (and removed the whitespace at the start of the first column) and ran your queries.
neo4j-sh (?)$ USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Users/jonatan/src/doc/stackexchange/32225817.pdc" as row CREATE (:Activity {startActivityId:row.startActivityId, name:row.Name, endActivityId:row.endActivityId});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 4
Properties set: 12
Labels added: 4
115 ms
neo4j-sh (?)$ USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Users/jonatan/src/doc/stackexchange/32225817.pdc" as row MATCH (s:Activity {startActivityId:row.startActivityId}) MATCH (e:Activity {startActivityId:row.endActivityId}) MERGE (s)-[r:LINKS_TO]->(e) RETURN r;
+-------------------+
| r |
+-------------------+
| :LINKS_TO[2084]{} |
| :LINKS_TO[2085]{} |
| :LINKS_TO[2086]{} |
+-------------------+
3 rows
Relationships created: 3
178 ms
Three relationships were created. To confirm that they are the right ones, I match and return the path (:Activity)-[:LINKS_TO]->().
neo4j-sh (?)$ MATCH p=(:Activity)-[:LINKS_TO]->() RETURN p;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| p |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| [Node[1415]{name:"A",startActivityId:"1",endActivityId:"2"},:LINKS_TO[2084]{},Node[1416]{name:"B",startActivityId:"2",endActivityId:"3"}] |
| [Node[1416]{name:"B",startActivityId:"2",endActivityId:"3"},:LINKS_TO[2085]{},Node[1417]{name:"C",startActivityId:"3",endActivityId:"4"}] |
| [Node[1417]{name:"C",startActivityId:"3",endActivityId:"4"},:LINKS_TO[2086]{},Node[1418]{name:"D",startActivityId:"4",endActivityId:"5"}] |
+-------------------------------------------------------------------------------------------------------------------------------------------+
3 rows
49 ms
neo4j-sh (?)$
It looks OK to me; I'm not sure what isn't working for you.
What does MATCH p=(:Activity)-[r]->() RETURN p; tell you?
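Comparing the question's queries with the ones in this answer, one difference stands out: the question's CREATE stores the id under startactivityId (lowercase a), while both MATCH clauses look up startActivityId. Property keys are case-sensitive (as are labels, so the index on :activity does not cover :Activity nodes either), which would explain why the MATCH finds nothing. The sketch below is a hypothetical plain-Python model of that lookup, not Neo4j code; load_nodes and link are illustrative names.

```python
import csv
import io

# The question's CSV
CSV_TEXT = """startActivityId,Name,endActivityId
1,A,2
2,B,3
3,C,4
4,D,5
"""


def load_nodes(key_name):
    """Model of the CREATE query: one node per row, id stored under key_name."""
    rows = csv.DictReader(io.StringIO(CSV_TEXT))
    return [{key_name: r["startActivityId"], "Name": r["Name"],
             "endActivityId": r["endActivityId"]} for r in rows]


def link(nodes):
    """Model of MATCH/MATCH/MERGE, which looks nodes up by 'startActivityId'."""
    rels = []
    for row in csv.DictReader(io.StringIO(CSV_TEXT)):
        # Keys are compared exactly, so 'startactivityId' never matches
        start = [n for n in nodes if n.get("startActivityId") == row["startActivityId"]]
        end = [n for n in nodes if n.get("startActivityId") == row["endActivityId"]]
        # A MATCH that finds nothing drops the row, so no relationship is made
        for s in start:
            for e in end:
                rels.append((s["Name"], e["Name"]))
    return rels
```

With the key spelled startactivityId, link returns no relationships; with startActivityId it returns the three A->B, B->C, C->D links seen in the transcript above (the last row's endActivityId 5 has no matching node).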
I'm using Neo4j 2.1.7. Recently I was experimenting with MATCH queries, searching for nodes with several labels, and I found that, in general, the queries
Match (p:A:B) return count(p) as number
and
Match (p:B:A) return count(p) as number
take different amounts of time, especially in cases where you have, for example, 2 million A nodes and 0 B nodes.
So does the order of labels affect search time? Is this feature documented anywhere?
Neo4j internally maintains a label scan store; that's basically a lookup to quickly get all nodes carrying a defined label A.
When doing a query like
MATCH (n:A:B) return count(n)
the label scan store is used to find all A nodes, which are then filtered to keep only those that also carry label B. If n(A) >> n(B), it's far more efficient to write MATCH (n:B:A) instead, since you look up only a few B nodes and filter those for A.
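A toy model of that scan-then-filter strategy shows why the written order matters in 2.1.x: the number of nodes that must be checked equals the size of the first label's scan. This is a simplification, not Neo4j's actual label scan store; the function names and node counts are hypothetical.

```python
from collections import defaultdict


def build_label_index(node_labels):
    """Map each label to the ids of nodes carrying it (stand-in for the label scan store)."""
    index = defaultdict(list)
    for node_id, labels in enumerate(node_labels):
        for label in labels:
            index[label].append(node_id)
    return index


def scan_then_filter(index, node_labels, pattern):
    """Model of MATCH (n:L1:L2...) in 2.1.x: scan the FIRST label, filter on the rest.

    Returns (matches, checked): matching node count and how many scanned nodes were checked.
    """
    first, rest = pattern[0], pattern[1:]
    scanned = index.get(first, [])
    matches = sum(1 for node_id in scanned
                  if all(label in node_labels[node_id] for label in rest))
    return matches, len(scanned)
```

With 1 A:B node, 1,000 A-only nodes and 10 B-only nodes, both orders find the single A:B node, but ["A", "B"] checks 1,001 nodes while ["B", "A"] checks 11.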
You can use PROFILE MATCH (n:A:B) return count(n) to see the query plan. For Neo4j <= 2.1.x you'll see a different query plan depending on the order of the labels you've specified.
Starting with Neo4j 2.2 (milestone M03 was available as of writing this reply) there's a cost-based Cypher optimizer. Cypher is now aware of node statistics and uses them to optimize the query.
As an example I've used the following statements to create some test data:
create (:A:B);
with 1 as a foreach (x in range(0,1000000) | create (:A));
with 1 as a foreach (x in range(0,100) | create (:B));
We now have roughly 100 B nodes, 1M A nodes and 1 A:B node (range() includes both bounds, so each FOREACH creates one extra node). In 2.2 the two statements:
MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)
result in the exact same query plan (and therefore in the same execution speed):
+------------------+---------------+------+--------+-------------+---------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation | 3 | 1 | 0 | count(n) | |
| Filter | 12 | 1 | 12 | n | hasLabel(n:A) |
| NodeByLabelScan | 12 | 12 | 13 | n | :B |
+------------------+---------------+------+--------+-------------+---------------+
Since there are only few B nodes, it's cheaper to scan for B's and filter for A. Smart Cypher, isn't it ;-)
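The idea behind that choice can be sketched as picking the label with the fewest nodes to drive the scan, whatever order the labels are written in. This is a deliberately simplified, hypothetical model (plan_label_match is an illustrative name); the real planner estimates costs from richer statistics.

```python
def plan_label_match(label_counts, labels):
    """Return (label_to_scan, labels_to_filter) for MATCH (n:L1:L2:...).

    Picks the label with the fewest nodes according to label_counts
    (unknown labels count as 0; ties broken by name), so the written
    order of the labels does not change the plan.
    """
    scan = min(labels, key=lambda label: (label_counts.get(label, 0), label))
    filters = sorted(label for label in labels if label != scan)
    return scan, filters
```

Because the scan label depends only on the statistics, (n:A:B) and (n:B:A) produce the same plan, mirroring the identical PROFILE output above.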
Could anyone please tell me the difference between MERGE and CREATE in Cypher queries?
And how does Neo4j store data physically?
Thanks in advance.
CREATE does just what it says. It creates, and if that means creating duplicates, well, then it creates. MERGE does the same thing as CREATE, but first checks whether a node with the labels and properties you specify already exists. If it does, MERGE doesn't create anything. This helps avoid duplicates. Here's an example: I use CREATE twice to create a person with the same name.
neo4j-sh (?)$ create (p:Person {name: "Bob"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
9 ms
neo4j-sh (?)$ create (p:Person {name: "Bob"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
5 ms
So now when we query, there are two Bobs.
neo4j-sh (?)$ match (p:Person {name:"Bob"}) return p;
+--------------------------+
| p |
+--------------------------+
| Node[222124]{name:"Bob"} |
| Node[222125]{name:"Bob"} |
+--------------------------+
2 rows
46 ms
Let's MERGE in another Bob and see what happens.
neo4j-sh (?)$ merge (p:Person {name:"Bob"});
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
2 ms
neo4j-sh (?)$ match (p:Person {name:"Bob"}) return p;
+--------------------------+
| p |
+--------------------------+
| Node[222124]{name:"Bob"} |
| Node[222125]{name:"Bob"} |
+--------------------------+
2 rows
11 ms
Bob already existed, so MERGE did nothing here. Querying again, the same two Bobs are present. Had there been no Bobs in the database, MERGE would have done the same thing as CREATE.
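A hypothetical in-memory model of the two clauses for a single-node pattern reproduces the Bob example. ToyGraph is an illustration, not a Neo4j API: create always appends, while merge first looks for a node with the label and all the given properties and only creates one when none exists.

```python
class ToyGraph:
    """Hypothetical model of CREATE vs MERGE for a single labelled node."""

    def __init__(self):
        self.nodes = []  # each node: (label, frozenset of property items)

    def create(self, label, **props):
        # CREATE always adds a new node, even if an identical one exists
        node = (label, frozenset(props.items()))
        self.nodes.append(node)
        return node

    def merge(self, label, **props):
        # MERGE reuses a node with the label and at least the given properties...
        wanted = frozenset(props.items())
        for node in self.nodes:
            node_label, node_props = node
            if node_label == label and wanted <= node_props:
                return node
        # ...and only creates one when none matches
        return self.create(label, **props)

    def match(self, label, **props):
        wanted = frozenset(props.items())
        return [n for n in self.nodes if n[0] == label and wanted <= n[1]]
```

The subset test reflects that a MERGE pattern matches nodes that have at least the specified properties; extra properties on an existing node do not prevent a match.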
I'm exploring a new graph data model in Neo4j and I was wondering how to list all the possible node property keys, without their values if possible.
For relationships, I found this very handy generic Cypher query:
start n=node(*)
match n-[r]-m
return distinct type(r)
which returns a useful list of properties you can then use to query the graph more specifically:
==> +------------+
==> | type(r) |
==> +------------+
==> | "RATED" |
==> | "FRIEND" |
==> | "DIRECTED" |
==> | "ACTS_IN" |
==> +------------+
==> 4 rows
==> 0 ms
==>
Is there any function or expression that does the same thing for node properties?
Thanks
type() does not return relationship properties, but the relationship type.
Both nodes and relationships can have properties, but only relationships can have a type.
To list all the property keys of nodes in the graph database, you can try the following Cypher:
match (n)
WITH distinct keys(n) as properties
UNWIND properties as property
return distinct property
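What this query computes can be modeled in a few lines of plain Python: collect every key from every node and keep each one once. This is a hypothetical illustration over made-up node property maps, not a Neo4j API.

```python
def distinct_property_keys(nodes):
    """Model of:
        MATCH (n) WITH DISTINCT keys(n) AS properties
        UNWIND properties AS property
        RETURN DISTINCT property
    nodes: iterable of property dicts. Returns keys in first-seen order.
    """
    seen = []
    for props in nodes:
        for key in props:          # keys(n), then UNWIND
            if key not in seen:    # RETURN DISTINCT property
                seen.append(key)
    return seen
```

RETURN DISTINCT does not promise any particular order, so the first-seen ordering here is only a convenience of the model.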
Thanks,
Vishal