py2neo relationship push empties nodes

I'm using py2neo (using the Relationship.push() method) to update the properties of existing relationships. The relationship properties are updated successfully, but the related nodes get emptied of all their properties. Is this standard behavior?
Thanks.

I'm not able to recreate what you describe using the code below:
>>> from py2neo import *
>>> graph = Graph(password="p4ssw0rd")
>>> a = Node(name="Alice")
>>> b = Node(name="Bob")
>>> ab = Relationship(a, "KNOWS", b)
>>> graph.create(ab)
>>> remote(a)._id
1
>>> graph.evaluate("MATCH (n) WHERE id(n) = {x} RETURN n", x=1)
(alice {name:"Alice"})
>>> ab["since"] = 1999
>>> graph.push(ab)
>>> graph.evaluate("MATCH (n) WHERE id(n) = {x} RETURN n", x=1)
(alice {name:"Alice"})
That said, bear in mind that what you describe as pushing a relationship is in fact pushing the entire subgraph consisting of the relationship plus its start and end nodes. Therefore, if the local copies of those nodes contain no properties, this is taken as a signal to update the remote nodes as well by removing their properties.
Because of this "entire subgraph update" behaviour, you'll need to make sure that the local copies of your nodes are up to date before pushing, for example by pulling them first. There is no higher-level API operation that pushes just the relationship and ignores the nodes; these operations work on whole subgraphs. To do otherwise, you'll need to drop into Cypher.
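For example, a minimal sketch of the pull-before-push pattern, continuing the session above; the final statement shows the pure-Cypher alternative that touches only the relationship:
>>> graph.pull(a)   # refresh the local node copies first, so stale or
>>> graph.pull(b)   # empty local properties are not pushed back out
>>> ab["since"] = 1999
>>> graph.push(ab)  # now pushes the relationship plus up-to-date nodes
>>> graph.run("MATCH ()-[r]->() WHERE id(r) = {x} SET r.since = 1999",
...           x=remote(ab)._id)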

Related

Make copy of subtree containing 500k nodes and 1m relations in neo4j

I am evaluating Neo4j. I created some random data to compare with other DBs. The data represents a tree structure with 10k, 100k and 1m nodes. There are two relationship types: a hierarchical one, and a connection chain relation like a linked list.
One of the operations that I want to test is making a copy of a subtree. This operation is done in three steps (copy nodes, copy relations, connect to target). The operation works fine for the 10k and 100k trees. But for the biggest example, with a copy tree of 500k nodes, Neo4j never comes back.
The browser shows that it keeps reconnecting and nothing happens. I think 500k nodes should not be that much. The test data in CSV files is around 300 MB.
What am I doing wrong?
1: copy nodes
match (r {`domain key` : 'unit-B2'})-[:isPredecessorOf*0..]->(n:`T-Base`)
with n as map
create (copy:`T-Base`)
set copy = map, copy.`domain key` = map.`domain key` + '-copy'
with map, copy
create (copy)-[:isCopyOf]->(map)
2: copy relations
match (s {`domain key` : 'unit-B2'})-[:isPredecessorOf*0..]->(n)
with collect(n) as st, s
match (s)-[:isPredecessorOf*0..]->(t)-[r:`isPredecessorOf`]->(x)
where x in st
with startNode(r) as s, endNode(r) as d
match (s)<-[:isCopyOf]-(source), (d)<-[:isCopyOf]-(dest)
with source, dest
create (source)-[:isPredecessorOf]->(dest)
match (s {`domain key` : 'unit-B2'})-[:isPredecessorOf*0..]->(n)
with collect(n) as st, s
match (s)-[:isPredecessorOf*0..]->(t)-[r:`isConnectedTo`]->(x)
where x in st
with startNode(r) as s, endNode(r) as d
match (s)<-[:isCopyOf]-(source), (d)<-[:isCopyOf]-(dest)
with source, dest
create (source)-[:isConnectedTo]->(dest)
3: connect root of copy tree to target node
match (source{`domain key`:'unit-B1'}), (dest{`domain key`:'unit-B2-copy'})
create (source)-[:isPredecessorOf]->(dest)
How do you run Neo4j? It's probably just a memory configuration issue: large transactional updates need a lot of heap, and for 1M records you need about 4G (e.g. dbms.memory.heap.max_size=4g in neo4j.conf on Neo4j 3.x).
You should also use a label for r and s, so the starting nodes can be found without scanning every node in the store.
Separate your 2nd statement into two statements.
If you have to do larger transactional updates, you can install the apoc procedures and use apoc.periodic.iterate to execute updates in batches.
e.g.
call apoc.periodic.iterate("
match (r:Label {`domain key`: 'unit-B2'})-[:isPredecessorOf*0..]->(n:`T-Base`)
return distinct n as map
","
create (copy:`T-Base`)
set copy = map, copy.`domain key` = map.`domain key` + '-copy'
with map, copy
create (copy)-[:isCopyOf]->(map)
",{batchSize:10000, iterateList:true})

Neo4j: Customize Path Traversal

I am pretty new to Neo4j. I have implemented an example use case with the following setup:
acyclic directed graph
nodes have a property called externalID
Nodes:
Node Type S (Start Node)
Node Type E (End Node)
Node Type I (Intermediate Node)
Relations:
Node Type S can only have outgoing relations to Nodes of Type I
Node Type I can have ingoing relations from I and S
Node Type I can have outgoing relations to I and E
Node Type E can only have incoming relations from I
All relations have a weight property assigned which can be any number
With the help of Stack Overflow and several tutorials I was able to formulate a Cypher query that gets me all paths from any start node with a given externalID to the matching end node with the same externalID.
MATCH p=(a:S)-[r*]->(b:E)
WHERE a.externalID=b.externalID
WITH p, relationships(p) as rcoll
RETURN p
The query works more or less well so far...
However, I have no idea how to change the behavior on how the graph is scanned for possible paths. Actually I only need a subset of all possible paths. Such paths fulfill the following requirement:
The path traversal is started at a Start Node S with a given capacity C.
if a relationship is traversed, the weight property of this relationship is subtracted from the current capacity C (which means negative weights are added)
if the capacity goes negative, the path up to this point is invalid (the path up to the previous node is still valid and may continue with other relationships)
if the capacity is still positive, continue with another relationship from this point and use the result of C - weight as the new C
Can I somehow adjust the query or is there any other possibility with Neo4j to get all paths using the strategy above?
Thanks a lot for your help in advance.
This Cypher query might be suitable for your use case:
MATCH p = (a:S)-[r*]->(b:E)
WHERE a.externalID = b.externalID
WITH
p,
REDUCE(c = a.capacity, r IN RELATIONSHIPS(p) |
CASE WHEN c < 0 THEN -1 ELSE c - r.weight END) AS residual
WHERE residual >= 0
RETURN p;
The REDUCE expression pins residual at -1 as soon as the capacity drops below 0, so a path stays invalid even if later (negative) weights would push the running total positive again.
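If you are driving this from Python, here is a rough py2neo sketch of the same query with the starting capacity supplied as a parameter instead of a node property (the capacity value of 10 is an arbitrary assumption):
from py2neo import Graph

graph = Graph()

query = """
MATCH p = (a:S)-[r*]->(b:E)
WHERE a.externalID = b.externalID
WITH p, REDUCE(c = {capacity}, r IN RELATIONSHIPS(p) |
     CASE WHEN c < 0 THEN -1 ELSE c - r.weight END) AS residual
WHERE residual >= 0
RETURN p
"""
# One record per surviving path; 'p' is bound in the RETURN clause.
for record in graph.run(query, capacity=10):
    print(record["p"])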

Inserting a Relation into Neo4j using MERGE or MATCH runs forever

I am experimenting with Neo4j using a simple dataset of Locations. A location can have a relation to another location.
a:Location - [rel] - b:Location
I already have the locations in the database (roughly 700,000+ Location entries).
Now I wanted to add the relation data (170M edges), but I wanted to experiment with the import logic on a smaller set first, so I basically picked 2 nodes that are in the set and tried to create a relationship as follows.
MERGE p = (a:Location {locationid: 3616})-[w:WikiLink]->(b:Location {locationid: 467501})
RETURN p;
and also tried the approach straight from the docs:
MATCH (a:Person),(b:Person)
WHERE a.name = 'Node A' AND b.name = 'Node B'
CREATE (a)-[r:RELTYPE { name : a.name + '<->' + b.name }]->(b)
RETURN r
I tried using a directional merge, an undirected merge, etc. I basically tried multiple variants of the above queries, and the result is the same: they run forever, not completing even after 15 minutes. Which is very odd.
Indexes
ON :Location(locationid) ONLINE (for uniqueness constraint)
Constraints
ON (location:Location) ASSERT location.locationid IS UNIQUE
This is what I am currently using:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///edgelist.csv' AS line WITH line
MATCH (a:Location {locationid: toInt(line.locationidone)}), (b:Location {locationid: toInt(line.locationidtwo)})
MERGE (a)-[w:WikiLink {weight: toFloat(line.edgeweight)}]-(b)
RETURN COUNT(w);
If you look at the terminal output below you can see Neo4j reports a 258 ms query execution time; the real time, however, is somewhat above that. This query already takes a few seconds too long in my opinion (the machine this runs on has 48GB RAM, 16 cores and is relatively new).
I am currently running this query with LIMIT 1000 (before it was LIMIT 1), but the script has already been running for a few minutes. I wonder if I have to switch from MERGE to CREATE. The problem is, I cannot understand the call graph that EXPLAIN gives me well enough to determine the bottleneck.
time /usr/local/neo4j/bin/neo4j-shell -file import-relations.cql
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| p |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [Node[758609]{title:"Tehran",locationid:3616,locationlabel:"NIL"},:WikiLink[9422418]{weight:1.2282325516616477E-7},Node[917147]{title:"Khorugh",locationid:467501,locationlabel:"city"}] |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
Relationships created: 1
Properties set: 1
258 ms
real 0m1.417s
user 0m1.497s
sys 0m0.158s
If you haven't:
create constraint on (loc:Location) assert loc.locationid is unique;
Then find both nodes, and create the relationship.
MATCH (a:Location {locationid: 3616}),(b:Location {locationid: 467501})
MERGE p = (a)-[w:WikiLink]->(b)
RETURN p;
or if the locations don't exist yet:
MERGE (a:Location {locationid: 3616})
MERGE (b:Location {locationid: 467501})
MERGE p = (a)-[w:WikiLink]->(b)
RETURN p;
You should also use parameters if you do that from a program.
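For example, a hypothetical py2neo call (the parameter names one and two are made up):
from py2neo import Graph

graph = Graph()
# The literal IDs move into parameters, so the query text stays constant
# and the server can cache the execution plan.
graph.run("MATCH (a:Location {locationid: {one}}), "
          "(b:Location {locationid: {two}}) "
          "MERGE p = (a)-[w:WikiLink]->(b) RETURN p",
          one=3616, two=467501)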
Have you indexed the Location nodes on locationid?
CREATE INDEX ON :Location(locationid)
I had a similar problem adding edges to a graph; indexing the nodes made the linking run over 150x faster.
If the nodes aren't indexed, Neo4j will do a serial scan for the two nodes to link together.
USING PERIODIC COMMIT <value>:
Specifies the number of records (rows) to be committed per transaction. Since you have plenty of RAM, it is good to use a value greater than 100,000. This will reduce the number of transactions committed and might further reduce the overall time.
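Alternatively, if you would rather control batching from the client, here is a rough py2neo sketch that streams the edge list and commits every BATCH rows (the file name and column names are taken from the question; the batch size is an assumption to tune):
import csv
from py2neo import Graph

graph = Graph()
BATCH = 100000  # rows per transaction; tune to the available heap

with open("edgelist.csv") as f:
    tx = graph.begin()
    for i, line in enumerate(csv.DictReader(f), start=1):
        tx.run("MATCH (a:Location {locationid: {one}}), "
               "(b:Location {locationid: {two}}) "
               "MERGE (a)-[w:WikiLink {weight: {wt}}]-(b)",
               one=int(line["locationidone"]),
               two=int(line["locationidtwo"]),
               wt=float(line["edgeweight"]))
        if i % BATCH == 0:       # commit a full batch, start a new one
            tx.commit()
            tx = graph.begin()
    tx.commit()                  # commit any remaining partial batch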

Create Neo4j Nodes by passing a data variable in Python

I work with a large data set in Python and would like to create Neo4j nodes out of a data array within Python. So, my naive attempt to do so would be something like the following.
(In Python script)
Product_IDs = data_array[1:1000]  # a list of product IDs
tot_node_num = len(Product_IDs)   # total number of product IDs
graph = Graph()
tx = graph.cypher.begin()
tx.append("FOREACH (r IN range(1,tot_node_num) | CREATE (:Product {ID:Product_IDs[r]}))")
tx.commit()
With the above statement, the variables tot_node_num and Product_IDs are not recognized. How can I pass an array created in my Python script down to Cypher to create nodes in the Neo4j graph database?
Thank you!
Parameters are the best way to pass in variables. Bear in mind, however, that while this works for expressions and property values, parameters cannot be used for labels and relationship types. To help with this, py2neo provides the cypher_escape function (http://py2neo.org/2.0/cypher.html#py2neo.cypher.cypher_escape):
>>> from py2neo.cypher import cypher_escape
>>> rel_type = "KNOWS WELL"
>>> "MATCH (a)-[:%s]->(b) RETURN a, b" % cypher_escape(rel_type)
'MATCH (a)-[:`KNOWS WELL`]->(b) RETURN a, b'
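For the node-creation problem itself, the whole list can be passed as a parameter and unwound server-side; a sketch using the same py2neo 2.x transaction API as the question (data_array comes from the question):
from py2neo import Graph

graph = Graph()
Product_IDs = data_array[1:1000]  # list of product IDs, as in the question

tx = graph.cypher.begin()
# The list travels as the {ids} parameter; UNWIND yields one row per ID,
# and CREATE makes one Product node per row.
tx.append("UNWIND {ids} AS pid CREATE (:Product {ID: pid})",
          {"ids": Product_IDs})
tx.commit()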

merging nodes into a new one with cypher and neo4j

Using Neo4j Graph Database Kernel 2.0.0-M02 and the new MERGE clause, I was trying to merge nodes into a new one (MERGE does not really merge, but binds matching nodes to the returning identifier, according to the documentation) and delete the old nodes. I only care at the moment about properties being transferred to the new node, not relationships.
What I have at the moment is the Cypher below:
merge (n:User {form_id:123}) //I get the nodes with form_id=123 and label User
with n match p=n //subject to change, to have them in a collection
create (x) //create a new node
foreach(n in nodes(p): set x=n) //properties of n copied over to x
return n,x
Problems
1. When the foreach runs, it creates a new x for every n.
2. Moving properties from n to x replaces all properties each time with those of the current n,
so if the 1st n node from the merge has 2 properties a,b and the second has c,d, then after set x=n all new nodes end up with the c,d properties. I know this is stated in the documentation, so my question is:
Is there a way to merge all properties of N number of nodes (and maybe relationships as well) in a new node with cypher only?
I don't think the Cypher language currently has a syntax that non-destructively copies any and all properties from one node into another.
However, I'll present the solution to a simple situation that may be similar to yours. Let's say that some User nodes have the properties a & b, and some others have c & d. For example:
CREATE (:User { id:1,a: 1,b: 2 }),(:User { id:1,c: 3,d: 4 }),
(:User { id:2,a:10,b:20 }),(:User { id:2,c:30,d:40 });
This is how we would "merge" all User nodes with the same id into a single node:
MATCH (x:User), (y:User)
WHERE x.id=y.id AND has(x.a) AND has(y.c)
SET x.c = y.c, x.d = y.d
DELETE y
RETURN x
You can try this out in the neo4j sandbox at: http://console.neo4j.org/
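As an aside, later Neo4j releases make the generic case possible: SET x += properties(y) (properties() needs Neo4j 3.0+) copies all of y's properties onto x without clearing x's existing ones. A sketch assuming such a server, run through py2neo; the id(x) < id(y) guard stops a node being matched against itself:
from py2neo import Graph

graph = Graph()
# Copy every property of y onto x non-destructively, then drop y.
# Plain DELETE assumes y has no remaining relationships, as in the
# sample data above; otherwise use DETACH DELETE (Neo4j 2.3+).
graph.run("""
MATCH (x:User), (y:User)
WHERE x.id = y.id AND id(x) < id(y)
SET x += properties(y)
DELETE y
RETURN x
""")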
With Neo4j 3.x it is also possible to merge two nodes into one using a specific APOC procedure.
First you need to download the APOC procedures jar file into your $NEO4J_HOME/plugins folder and restart the Neo4j server.
Then you can call apoc.refactor.mergeNodes this way:
MATCH (x:User), (y:User)
WHERE x.id = y.id AND id(x) < id(y)
CALL apoc.refactor.mergeNodes([x,y]) YIELD node
RETURN node
As far as I can see, the resulting node has all the properties of both x and y, with y's values winning whenever both nodes set the same key (the id(x) < id(y) condition keeps each node from being merged with itself).
