Getting node/edge creation/removal statistics - neo4j

I am running a Python (3.8) script which uses pip library neo4j 4.0.0 to interact with a community edition neo4j 4.1.1 server.
I am running many queries which use MERGE to update or create if nodes and relationships don't exist.
So far this is working well, as the database is getting the data as intended.
From my script's side, I would like however to know how many nodes and edges were created in each query.
The issue is that in these Cypher queries that I send to the database from my script, I call MERGE more than one time and also use APOC procedures (though the APOC ones are just for updating labels, they don't create entities).
Here is an example of a query:
comment_field_names: List[str] = list(threads[0].keys())
cypher_single_properties: List[str] = []
for c in comment_field_names:
cypher_single_properties.append("trd.{0} = {1}.{0}".format(c, "trd_var"))
cypher_property_string: str = ", ".join(cypher_single_properties)
with driver.session() as session:
crt_stmt = ("UNWIND $threads AS trd_var "
"MERGE (trd:Thread {thread_id:trd_var.thread_id}) "
"ON CREATE SET PY_REPLACE "
"ON MATCH SET PY_REPLACE "
"WITH trd AS trd "
"CALL apoc.create.addLabels(trd, [\"Comment\"]) YIELD node "
"WITH trd as trd "
"MERGE (trd)-[r:THREAD_IN]->(Domain {domain_id:trd.domain_id}) "
"ON CREATE SET r.created_utc = trd.created_utc "
"ON MATCH SET r.created_utc = trd.created_utc "
"RETURN distinct 'done' ")
crt_params = {"threads": threads}
# Insert the individual properties we recorded earlier.
crt_stmt = crt_stmt.replace("PY_REPLACE", cypher_property_string)
run_res = session.run(crt_stmt, crt_params)
This works fine and the nodes get created with the properties passed from the threads Dict which is passed through the variable crt_params to UNWIND.
However, the Result instance in run_res does not have any ResultSummary inside it with a SummaryCounters instance for me to access statistics of created nodes and relations.
I suspect this is because of:
"RETURN distinct 'done' "
However, I am not sure if this is the reason.
Hoping someone may be able to help me set up my queries so that, no matter the number of MERGE operations I perform, I get the statistics for the whole query that was sent in crt_stmt.
Thank you very much.

When using the earlier neo4j version, you could write n = result.summary().counters.nodes_created, but from 4.0 the summary() method does not exist.
Now I found from https://neo4j.com/docs/api/python-driver/current/breaking_changes.html that Result.summary() has been replaced with Result.consume(), this behaviour is to consume all remaining records in the buffer and returns the ResultSummary.
You can get all counters by counters = run_res.consume().counters

Related

How to set "resultDataContents" in Neo4jrb?

I want to visualise data from Neo4j with the frontend-library D3.js in an Rails application, using Neo4jrb. For example I could use the following query to get my graph data.
query = "MATCH path = (a)-[b]->(c) RETURN path"
result = Neo4j::Session.current.query(query)
But this query is not giving me the exact data I want.
According to the Neo4j data visualisation guide there is a possibility to set the parameter resultDataContents to "graph". (
Neo4j documentation for "resultDataContents")
This is exactly what I need for my application. Is there any possibility to set this parameter in Neo4jrb, or another idea how to achieve such a result?
Unfortunately not currently. The neo4j-core gem (which the neo4j gem uses) was build to abstract away the REST format. The "graph" format returns data in a different way.
You have a couple of options. You could make the JSON queries yourself or you could retrieve the nodes and relationships from the queries that you perform and then build your own nodes/relationships structure which is returned. This might be more future-proof anyway if you ever want to switch to Bolt.
A way that you might do this in your case:
query = "MATCH path = (a)-[b]->(c) RETURN nodes(path) AS nodes, rels(path) AS rels"
result = Neo4j::Session.current.query(query)
response = {nodes: [], rels: []}
result.each do |row|
response[:nodes].concat(row.nodes)
response[:rels].concat(row.rels)
end
response[:nodes].uniq!
response[:rels].uniq!

How to correctly use conditionals like IF or CASE in Cypher query language (Neo4J) to successfully create relationships?

I failed to create relationships in Neo4J and I would like to encourage anyone who has sucessfully done it to help me.
The desired result is to have a detailed visualisation of who is a brother to whom, who is who's mother and so on. I want to extract the data from single parent-child relationships. That means, setting a relationship like [:relatedTo {:how['daughter']}] if a node has a parent whose name corresponds to the field node.name and the gender of the node is F.
I have my CSV file that looks like this.
1;Jakub Hančin;M;1994;4;3
2;Hana Hančinová;F;1991;4;3
3;Alojz Hančin jr.;M;1968;15;14
4;Viera Hančinová;F;1968;9;
5;Miroslav Barus sr.;M;1965;9;
6;Helena Barusová;F;1942;;
7;Miroslav Barus jr.;M;1995;6;5
8;Martin Barus;M;1991;6;5
9;Hedviga Barusová;F;1945;;
10;Peter Hančin jr.;M;1991;12;13
11;Zuzka Hančinová;F;1996;12;13
12;Andrea Hančinová;F;1966;;
13;Peter Hančin sr.;M;1965;15;14
14;Alojz Hančin sr.;M;1937;;
15;Anna Hančinová;F;1945;;
This is my personal family tree and I would like to visualize it through Neo4J.
It is a file created with Excel, where I put the information into a table and create a database. Then it was converted to .csv file which is importable into Neo4J. I have sucessfully installed it and now I am at the point of writing the Cypher script to manage it. So far, I have this:
LOAD CSV WITH HEADERS FROM "file:c:/users/Skelo/Desktop/Family Database/Family Database CSV UTF.txt" AS row FIELDTERMINATOR ';'
CREATE (n:Person)
SET n = row, n.name = row.name,
n.personID = toInt(row.personID) , n.G = row.G,
n.Year = toInt(row.Year), n.Parent1 = row.Parent1, n.Parent2 = row.Parent2
WITH n
MATCH(n:Person),(b:Person)
WHERE n.Parent1 = b.name OR n.Parent2 = b.name
CASE b.gender
WHEN b.gender = 'F' THEN
CREATE (b)-[:isRelatedTo{how:['mother']}]->(n)
WHEN b.gender = 'M' THEN
CREATE (b)-[:isRelatedTo{how:['father']}]->(n)
RETURN *
The error message shown looks like this.
Invalid input 'A': expected 'r/R' (line 11, column 2 (offset: 389))
"CASE b.gender"
^
Somehow, I can't figure out why this does not work. Why can't I use the Case command? The Neo4J does not allow me to use anything but the command CREATE (it expects a letter R after C and not an A, this means the CREATE command).
Again, I want to do this. I have a few nodes that are correctly set. For each of those nodes (they represent people), I want to look into the Parent1 and Parent2 fields and to look for a node that has the same name as one of these fields. If it matches one of these, I want to mark that node as a father or a mother to the previous node (judging by the gender of the node, which represents the person).
This way I would like to fill the graph database with many relationships, but I fail at this very basic step. Please help me. If you can, please do not only say what is wrong and why it is wrong, but present a solution that works.
Since you want to create the isRelatedTo relationship regardless of gender and only the property is dependent upon a conditional, do this:
CREATE (b)-[r:isRelatedTo]->(n)
SET r.how = CASE b.gender WHEN 'F' THEN 'mother' ELSE 'father' END

py2neo return number of nodes and relationships created

I need to create a python function such that it adds nodes and relationship to a graph and returns the number of created nodes and relationships.
I have added the nodes and relationship using graph.cypher.execute().
arr_len = len(dic_st[story_id]['PER'])
for j in dic_st[story_id]['PER']:
graph.cypher.execute("MERGE (n:PER {name:{name}})",name = j[0].upper()) #creating the nodes of PER in the story
print j[0]
for j in range(0,arr_len):
for k in range(j+1,arr_len):
graph.cypher.execute("MATCH (p1:PER {name:{name1}}), (p2:PER {name:{name2}}) WHERE upper(p1.name)<>upper(p2.name) CREATE UNIQUE (p1)-[r:in_same_doc {st_id:{st_id}}]-(p2)", name1=dic_st[story_id]['PER'][j][0].upper(),name2=dic_st[story_id]['PER'][k][0].upper(),st_id=story_id) #linking the edges for PER nodes
What I need is to return the number of new nodes and relationships created.
What I get to know from the neo4j documentation is that there is something called "ON CREATE" and "ON MATCH" for MERGE in cypher, but thats not being very useful.
The browser interface for neo4j do actually shows the number of nodes and relationship updated. This is what I need to return, but I am not getting quite the way for it to access it.
Any help please.
In case you need the exact counts of properties either created or updated then you have use "Match" with "Create" or "Match" with "Set" and then count the size of results. Merge may not return which ones are updated and which ones are created.
When you post your query against the Cypher endpoint of the neo4j REST API without using py2neo, you can include the argument "includeStats": true in your post request to get the node/relationship statistics. See this question for an example.
As far as I can tell, py2neo currently does not support additional parameters for the Cypher query (even though it is using the same API endpoints under the hood).
In Python, you could do something like this (using the requests and json packages):
import requests
import json
payload = {
"statements": [{
"statement": "CREATE (t:Test) RETURN t",
"includeStats": True
}]
}
r = requests.post('http://your_server_host:7474/db/data/transaction/commit',
data=json.dumps(payload))
print(r.text)
The response will include statistics about the number of nodes created etc.
{
"stats":{
"contains_updates":true,
"nodes_created":1,
"nodes_deleted":0,
"properties_set":1,
"relationships_created":0,
"relationship_deleted":0,
"labels_added":1,
"labels_removed":0,
"indexes_added":0,
"indexes_removed":0,
"constraints_added":0,
"constraints_removed":0
}
}
After executing your query using x = session.run(...) you can use x.summary.counters to get the statistics noted in Martin Perusse's answer. See the documentation here.
In older versions the counters are available as a "private" field under x._summary.counters.

Too much time importing data and creating nodes

i have recently started with neo4j and graph databases.
I am using this Api to make the persistence of my model. I have everything done and working but my problems comes related to efficiency.
So first of all i will talk about the scenary. I have a couple of xml documents which translates to some nodes and relations between the, as i already read that this API still not support a batch insertion, i am creating the nodes and relations once a time.
This is the code i am using for creating a node:
var newEntry = new EntryNode { hash = incremento++.ToString() };
var result = client.Cypher
.Merge("(entry:EntryNode {hash: {_hash} })")
.OnCreate()
.Set("entry = {newEntry}")
.WithParams(new
{
_hash = newEntry.hash,
newEntry
})
.Return(entry => new
{
EntryNode = entry.As<Node<EntryNode>>()
});
As i get it takes time to create all the nodes, i do not understand why the time it takes to create one increments so fats. I have made some tests and am stuck at the point where creating an EntryNode the setence takes 0,2 seconds to resolve, but once it has reached 500 it has incremented to ~2 seconds.
I have also created an index on EntryNode(hash) manually on the console before inserting any data, and made test with both versions, with and without index.
Am i doing something wrong? is this time normal?
EDITED:
#Tatham
Thanks for the answer, really helped. Now i am using the foreach statement in the neo4jclient to create 1000 nodes in just 2 seconds.
On a related topic, now that i create the nodes this way i wanted to also create relationships. This is the code i am trying right now, but got some errors.
client.Cypher
.Match("(e:EntryNode)")
.Match("(p:EntryPointerNode)")
.ForEach("(n in {set} | " +
"FOREACH (e in (CASE WHEN e.hash = n.EntryHash THEN [e] END) " +
"FOREACH (p in pointers (CASE WHEN p.hash = n.PointerHash THEN [p] END) "+
"MERGE ((p)-[r:PointerToEntry]->(ee)) )))")
.WithParam("set", nodesSet)
.ExecuteWithoutResults();
What i want it to do is, given a list of pairs of strings, get the nodes (which are uniques) with the string value as the property "hash" and create a relationship between them. I have tried a couple of variants to do this query but i dont seem to find the solution.
Is this possible?
This approach is going to be very slow because you do a separate HTTP call to Neo4j for every node you are inserting. Each call is then a transaction. Finally, you are also returning the node back, which is probably a waste.
There are two options for doing this in batches instead.
From https://stackoverflow.com/a/21865110/211747, you can do something like this, where you pass in a set of objects and then FOREACH through them in Cypher. This means one, larger, HTTP call to Neo4j and then executing in a single transaction on the DB:
FOREACH (n in {set} | MERGE (c:Label {Id : n.Id}) SET c = n)
http://docs.neo4j.org/chunked/stable/query-foreach.html
The other option, coming soon, is that you will be able to write something like this in Cypher:
LOAD CSV WITH HEADERS FROM 'file://c:/temp/input.csv' AS n
MERGE (c:Label { Id : n.Id })
SET c = n
https://github.com/davidegrohmann/neo4j/blob/2.1-fix-resource-failure-load-csv/community/cypher/cypher/src/test/scala/org/neo4j/cypher/LoadCsvAcceptanceTest.scala

Neo4j Cypher: How to stop duplicate SET if multiple CREATES

I have a complex cypher query that creates multiple nodes and increments some counters on those nodes. For sake of example here is a simplified version of what I am trying to do:
START a = node(1), e = node(2)
CREATE a-[r1]->(b {})-[r2]->(c {}), e-[r3]->b-[r4]->(d{})
SET a.first=a.first+1, e.second=e.second+1
RETURN b
The issue is that because there are two CREATE commands the SET commands run twice and the values are incremented by 2 instead of 1 as intended. I have looked to see if I can merge the multiple CREATE statements and I cannot.
My initial idea is to separate out the different creates into a batch query, however I was wondering if there is another option.
Where are you executing this query? What version of neo4j are you using?
I went to console.neo4j.org and successfully ran the following and it correctly added one to both a.first and e.second:
START a = node(1), e = node(2)
CREATE a-[r:KNOWS]->b-[r2:KNOWS]->c, e-[:KNOWS]->b-[:KNOWS]->d
SET a.first=a.first+1, e.second=e.second+1
RETURN b

Resources