Clone nodes and relationships with Cypher

Clone nodes and relationships with Cypher - neo4j

Is it possible to clone arbitrary nodes and relationships in a single Cypher neo4j 2.0 query?
'Arbitrary' reads 'without specifying their labels and relationship types'. Something like:
MATCH (node1:NodeType)-[e]->(n)
CREATE (clone: labels(n)) set clone=n set clone.prop=1
CREATE (node1)-[e1:type(e)]->(clone) set e1=e set e1.prop=2
is not valid in Cypher, so one cannot simply get labels from one node or relationship and assign them to another, because labels are compiled into the query literally.
Sure, labels and relation types are important for MATCH and WHERE for producing effective query plan, but isn't CREATE making another case?

The easiest way to clone parts of a graph is to use the dump command in Neo4j shell. dump generates cypher create statements from your return clauses. The result of dump can be appied to the graph database to create clones.

Today, April 2022, I believe the best approach might be using an APOC procedure
I had a similar requirement and this worked for me.
MATCH (rootA:Root{name:'A'}),
(rootB:Root{name:'B'})
MATCH path = (rootA)-[:LINK*]->(node)
WITH rootA, rootB, collect(path) as paths
CALL apoc.refactor.cloneSubgraphFromPaths(paths, {
standinNodes:[[rootA, rootB]]
})
YIELD input, output, error
RETURN input, output, error

Related

APOC/Neo4j: match variable length of relationships in between

For example, if I want to match a pattern like this:
MATCH (:Person)-[:A]->(:Movie)-[:B*]->(:Movie)-[:C]->(:Person)
where the first and last relationship's length ([:A] and [:C]) is fixed (they are both with length 1), but the middle relationship's length ([:B]) is variable.
Then the following paths will both match this pattern:
(:Person)-[:A]->(:Movie)-[:B]->(:Movie)-[:C]->(:Person)
(:Person)-[:A]->(:Movie)-[:B]->()-[:B]->(:Movie)-[:C]->(:Person)
I'm wondering if it's possible to use only one APOC path expander procedure (say apoc.path.expandConfig) and use the sequence to achieve this? (It seems that APOC does not support variable length of relationships in between though)
I know that I could create a new relationship between these two (:Movie) nodes based on this post
but my goal is to return the whole path all at once, including the relationship and nodes between these two (:Movie) nodes.
Edited: This Cypher query can output what I want, but I want to
compare the performance between Cypher and APOC (for research
purposes). That's why I'm asking the equivalent way of using APOC for
this query.

Create relationship with properties using a query in Cypher

I would like to know if this is possible. I have a query that produces a nice report showing a relationship between two entities through two other nodes. There can be more than one path. I now want to create a direct relationship between those two nodes and count the number of paths and sum based upon data in the nodes in between. the report query is below.
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
return bo.AgencyName, count(sol.Number) as awards, so.orgName, sum(prop.finalPrice) as awardVolume;
What I want to do is similar to below which will not work.
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
create (bo)-[:HAS_AWARDED{awardCount: count(sol.Number), awardVolume: sum(prop.finalPrice)}]->(so);
If I remove the properties for the relationship, it works but want to add the properties without to much programing.
I am using the most recent version of Neo4j 3.2.
thanks

The problem here is you are trying to use count() and sum() functions in an invalid context. The below query should work:
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
with bo, so, count(sol.Number) as count_sol, sum(prop.finalPrice) as sum_finalPrice
create (bo)-[:HAS_AWARDED{awardCount: count_sol, awardVolume: sum_finalPrice}]->(so);
This query uses WITH to pass bo, so and the result of the aggregation functions count(sol.Number) and sum(prop.finalPrice) to the next context. After, these values are used to create the new relation between bo and so.

Cypher query optimisation - Utilising known properties of nodes

Setup:
Neo4j and Cypher version 2.2.0.
I'm querying Neo4j as an in-memory instance in Eclipse created TestGraphDatabaseFactory().newImpermanentDatabase();.
I'm using this approach as it seems faster than the embedded version and I assume it has the same functionality.
My graph database is randomly generated programmatically with varying numbers of nodes.
Background:
I generate cypher queries automatically. These queries are used to try and identify a single 'target' node. I can limit the possible matches of the queries by using known 'node' properties. I only use a 'name' property in this case. If there is a known name for a node, I can use it to find the node id and use this in the start clause. As well as known names, I also know (for some nodes) if there are names known not to belong to a node. I specify this in the where clause.
The sorts of queries that I am running look like this...
START
nvari = node(5)
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvari:C4)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION),
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION),
WHERE
NOT(nvarj.Name IN ['nf']) AND NOT(nvarm.Name IN ['nb','nj'])
RETURN DISTINCT target
Another way to think about this (if it helps), is that this is an isomorphism testing problem where we have some information about how nodes in a query and target graph correspond to each other based on restrictions on labels.
Question:
With regards to optimisation:
Would it help to include relation variables in the match clause? I took them out because the node variables are sufficient to distinguish between relationships but this might slow it down?
Should I restructure the match clause to have match/where couples including the where clauses from my previous example first? My expectation is that they can limit possible bindings early on. For example...
START
nvari = node(5)
MATCH
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarj.Name IN ['nf'])
MATCH
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarm.Name IN ['nb','nj'])
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION)
RETURN DISTINCT target
On the side:
(Less important but still an interest) If I make each relationship in a match clause an optional match except for relationships containing the target node, would cypher essentially be finding a maximum common sub-graph between the query and the graph data base with the constraint that the MCS contains the target node?
Thanks a lot in advance! I hope I have made my requirements clear but I appreciate that this is not a typical use-case for Neo4j.

I think querying with node properties is almost always preferable to using relationship properties (if you had a choice), as that opens up the possibility that indexing can help speed up the query.
As an aside, I would avoid using the IN operator if the collection of possible values only has a single element. For example, this snippet: NOT(nvarj.Name IN ['nf']), should be (nvarj.Name <> 'nf'). The current versions of Cypher might not use an index for the IN operator.
Restructuring a query to eliminate undesirable bindings earlier is exactly what you should be doing.
First of all, you would need to keep using MATCH for at least the first relationship in your query (which binds target), or else your result would contain a lot of null rows -- not very useful.
But, thinking clearly about this, if all the other relationships were placed in separate OPTIONAl MATCH clauses, you'd be essentially saying that you want a match even if none of the optional matches succeeded. Therefore, the logical equivalent would be:
MATCH (target:C5)-[:IN_LOCATION]->(nvara:LOCATION)
RETURN DISTINCT target
I don't think this is a useful result.

Neo4j Spatial- two nodes created for every spatially indexed node

I am using Neo4j 1.8.2 with Neo4j Spatial 0.9 for 1.8.2 (http://m2.neo4j.org/content/repositories/releases/org/neo4j/neo4j-spatial/0.9-neo4j-1.8.2/)
Followed the example code from here http://architects.dzone.com/articles/neo4jcypher-finding-football with one change- instead of SpatialIndexProvider.SIMPLE_WKT_CONFIG, I used SpatialIndexProvider.SIMPLE_POINT_CONFIG_WKT
Everything works fine until you execute the following query:
START n=node:stadiumsLocation('withinDistance:[53.489271,-2.246704, 5.0]')
RETURN n.name, n.wkt;
n.name is null. When I explored the graph, I found this data:
Node[80]{lon:-2.20024,lat:53.483,id:79,gtype:1,bbox:-2.20024,53.483,-2.20024,53.483]}
Node[168]{lon:-2.29139,lat:53.4631,id:167,gtype:1,bbox:-2.29139,53.4631,-2.29139,53.4631]}
For Node 80 returned, it looks like this is the node created for the spatial record, which contains a property id:79. Node 79 is the actual stadium record from the example.
As per the source of IndexProviderTest, the comments
//We not longer need this as the node we get back already a 'Real' node
// Node node = db.getNodeById( (Long) spatialRecord.getProperty( "id" ) );
seem to indicate that this feature isn't available in the version I am using.
My question is, what is the recommended way to use withinDistance with other match conditions? There are a couple of other conditions to be fulfilled but I can't seem to get a handle on the actual node to actually match them.
Should I explicitly create relations? Not use Cypher and use the core API to do a traversal? Split the queries?

Two options:
a) Use GeoPipline.startNearestNeighborLatLonSearch to get a starting set of nodes, supply to subsequent Cypher query to do matching/filtering on other properties
b) Since my lat/longs are common across many entities [using centroid of an area], I can create a relation from the spatial node to all entities that are located in that area and then use one Cypher query such as:
START n=node:stadiumsLocation('withinDistance:[53.489271,-2.246704, 5.0]')
MATCH (n)<-[:LOCATED_IN]-(something)
WHERE something.someProp=5
RETURN something
As advised by Peter, went with option b.
Note though, there is no way to get the spatially indexed node back so that you can create relations from it. Had to do a withinDistance query for 0.0 distance.

can you execute the enhanced testcase I did at https://github.com/neo4j/spatial/blob/2803093d544f56d7dfe8f1d122e049fa73489d8a/src/test/java/org/neo4j/gis/spatial/IndexProviderTest.java#L199 ? It shows how to find a location, and traverse with cypher to the next node.

How do I provide multiple queries in Neo4j Cypher?

I want to use the results from the first query in the second query. I am not sure how to do this in Cypher?
Current code,
START user1=node:USER_INDEX(USER_INDEX = "userA")
MATCH user1-[r1:ACCESSED]->docid1<-[r2:ACCESSED]-user2, user2-[r3:ACCESSED]->docid2
WHERE r2.Topic=r3.Topic
RETURN distinct docid2.Label;
I want to have different conditions checked in the WHERE clause for the same docid2 set of nodes and accumulate the results and perform order by based on a date field.
I am not able to provide multiple match and return within the same transaction.
That is when I am trying to have two different cypher scripts and combine them in a third query. Is this possible in cypher?
Or is there any option to write custom functions and invoke them?
Do we have stored Cypher scripts like Stored Gremlin scripts?

As Michael mentioned in the comment, you can use the "with" statement to stream result into further statements. Unfortunately, you can't start another statement after the "where" clause. Multiple return statements would be kind of illogical, but you can do multiple things in a single query e.g.:
START x=node:node_auto_index(key="x")
with count(x) as exists
start y=node:node_auto_index(key="y")
where exists = 0
create (n {key:"y"})<-[:rel]-y
return n, y
This would check if the "x" node exists and if it doesn't, proceed to create it and add a couple of parameters.
If you wish to do more sophisticated things on result sets, your best options are either batch requests or the Java API...

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Clone nodes and relationships with Cypher - neo4j

The easiest way to clone parts of a graph is to use the dump command in Neo4j shell. dump generates cypher create statements from your return clauses. The result of dump can be appied to the graph database to create clones.

Related

APOC/Neo4j: match variable length of relationships in between

Create relationship with properties using a query in Cypher

Cypher query optimisation - Utilising known properties of nodes

Neo4j Spatial- two nodes created for every spatially indexed node

How do I provide multiple queries in Neo4j Cypher?

Categories

Resources