Weird Neo4J Cypher behavior on setting relationship properties - neo4j

I have a Cypher request for Neo4J of this kind:
MATCH (u:User {uid: $userId})
UNWIND $contextNames as contextName
MERGE (context:Context {name:contextName.name,by:u.uid,uid:contextName.uid})
ON CREATE SET context.timestamp=$timestamp
MERGE (context)-[by:BY]->(u)
SET by.timestamp = $timestamp
My params are:
{
"userId": "15229100-b20e-11e3-80d3-6150cb20a1b9",
"contextNames": [
{
"uid": "822e2580-1f5e-11e9-9ed0-5b93e8900a78",
"name": "fnas"
}
],
"timestamp": "1912811921129"
}
That query above sets 8 properties, (probably) because I have 8 other relationships of the :BY type connected to that u. This seems illogical to me: it should find only the one relationship between a concrete context and a concrete u, create it if it doesn't exist, and set the property on it.
However, when I do this:
MATCH (u:User {uid: $userId})
UNWIND $contextNames as contextName
MERGE (context:Context {name:contextName.name,by:u.uid,uid:contextName.uid})
ON CREATE SET context.timestamp=$timestamp
MERGE (context)-[:BY{timestamp:$timestamp}]->(u)
It either creates a relationship (if the one with the same timestamp doesn't exist) or it simply doesn't do anything (which seems to be the right behavior).
What is the reason for this discrepancy? A bug in Neo4J?
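One way to investigate, assuming the extra writes come from several :BY relationships between the same context and the same user (for instance, relationships created by earlier runs of the second query with different timestamps), is to count them. This is only a diagnostic sketch that reuses the parameters shown above; the returned column names are made up for illustration:

```cypher
// Count the :BY relationships that MERGE (context)-[by:BY]->(u) would match.
MATCH (u:User {uid: $userId})
UNWIND $contextNames AS contextName
MATCH (context:Context {name: contextName.name, by: u.uid, uid: contextName.uid})-[by:BY]->(u)
RETURN context.uid AS contextUid,
       count(by) AS byCount,
       collect(by.timestamp) AS timestamps
```

If byCount is greater than 1, the first query's MERGE binds by to every matching relationship, and the SET then runs once for each of them.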

Related

Match paths of node types where nodes may have cycles

I'm trying to find a match pattern to match paths of certain node types. I don't care about the type of relation. Any relation type may match. I only care about the node types.
Of course the following would work:
MATCH (n)-->(:a)-->(:b)-->(:c) WHERE id(n) = 0
But, some of these paths may have relations to themselves. This could be for :b, so I'd also like to match:
MATCH (n)-->(:a)-->(:b)-->(:b)-->(:c) WHERE id(n) = 0
And:
MATCH (n)-->(:a)-->(:b)-->(:b)-->(:b)-->(:c) WHERE id(n) = 0
I can do this with relations easily enough, but I can't figure out how to do this with nodes, something like:
MATCH (n)-->(:a)-->(:b*1..)-->(:c) WHERE id(n) = 0
As a practical example, let's say I have a database with people, cars and bikes. The cars and bikes are "owned" by people, and people have relationships like son, daughter, husband, wife, etc. What I'm looking for is a query that from a specific node, gets all nodes of related types. So:
MATCH (n)-->(:person*1..)-->(:car) WHERE Id(n) = 0
I would expect that to get node "n", its parents, grandparents, children, grandchildren, all recursively. And then of those people, their cars. If I could assume that I know the full list of relations, and that they only apply to people, I could get this to work as follows:
MATCH
p = (n)-->(:person)-[:son|daughter|husband|wife|etc*0..]->(:person)-->(:car)
WHERE Id(n) = 0
RETURN nodes(p)
What I'm looking for is the same without having to specify the full list of relations; but just the node label.
Edit:
If you want to find the path from one Person node to each Car node, using only the node labels, and assuming nodes may create cycles, you can use apoc.path.expandConfig.
For example:
MERGE (mark:Person {name: "Mark"})
MERGE (lju:Person {name: "Lju"})
MERGE (praveena:Person {name: "Praveena"})
MERGE (zhen:Person {name: "Zhen"})
MERGE (martin:Person {name: "Martin"})
MERGE (joe:Person {name: "Joe"})
MERGE (stefan:Person {name: "Stefan"})
MERGE (alicia:Person {name: "Alicia"})
MERGE (markCar:Car {name: "Mark's car"})
MERGE (ljuCar:Car {name: "Lju's car"})
MERGE (praveenaCar:Car {name: "Praveena's car"})
MERGE (zhenCar:Car {name: "Zhen's car"})
MERGE (zhen)-[:CHILD_OF]-(mark)
MERGE (praveena)-[:CHILD_OF]-(martin)
MERGE (praveena)-[:MARRIED_TO]-(joe)
MERGE (zhen)-[:CHILD_OF]-(joe)
MERGE (alicia)-[:CHILD_OF]-(joe)
MERGE (martin)-[:CHILD_OF]-(mark)
MERGE (stefan)-[:CHILD_OF]-(zhen)
MERGE (lju)-[:CHILD_OF]-(stefan)
MERGE (markCar)-[:OWNED]-(mark)
MERGE (ljuCar)-[:OWNED]-(lju)
MERGE (praveenaCar)-[:OWNED]-(praveena)
MERGE (zhenCar)-[:OWNED]-(zhen)
Running a query:
MATCH (n:Person{name:'Joe'})
CALL apoc.path.expandConfig(n, {labelFilter: "Person|/Car", uniqueness: "NODE_GLOBAL"})
YIELD path
RETURN path
will return four unique paths from the Joe node to the four Car nodes. There are several options for path uniqueness; see the uniqueness setting of apoc.path.expandConfig.
The / prefix in /Car makes Car a termination label, i.e. returned paths extend only up to a node with that label.
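If APOC is not available, a similar result can be sketched in plain Cypher by matching a variable-length path over any relationship type and then filtering on the labels of the intermediate nodes. The upper bound of 6 hops is an arbitrary assumption to keep the search finite; adjust it to your data:

```cypher
// Any relationship type and direction, up to 6 hops (assumed bound).
MATCH p = (n:Person {name: 'Joe'})-[*..6]-(c:Car)
// Every node strictly between the start node and the car must be a Person.
WHERE all(x IN nodes(p)[1..-1] WHERE x:Person)
RETURN p
```

Unlike the NODE_GLOBAL expansion above, this can return several paths to the same car, and it may be considerably slower on a densely connected graph, because Cypher only forbids reusing a relationship within a single path.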

Conditional partial merge of pattern into graph

I'm trying to create a relationship that connects a person to a city -> state -> country without recreating the city/state/country nodes and relationships if they already exist, so that I end up with only one USA node in my graph, for example.
I start with a person
CREATE (p:Person {name:'Omar', Id: 'a'})
RETURN p
Then I'd like to turn the following cases into an apoc.do.case statement, or into a single MERGE statement that, using a unique constraint, creates a new node if no node is found and otherwise matches the existing node:
// first case where the city/state/country all exist
MATCH (locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality)-[:SITUATED_IN]->(adminArea)-[:SITUATED_IN]->(country)
return p
// second case where only state/country exist
MATCH (adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea)-[:SITUATED_IN]->(country)
return p
// third case where only country exists
MATCH (country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country)
return p
// last case where none of city/state/country exist, so I have to create all nodes + relations
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
return p
The key here is that I only want to end up with one (California)->(USA). I don't want those nodes and relationships to get duplicated.
Your queries that use MATCH never specify which Person you want. Variable names like p only exist for the life of a query (and sometimes not even that long). So p is unbound in your MATCH queries, and can result in your MERGE clauses creating empty nodes. You need to add MATCH (p:Person {Id: 'a'}) to the start of those queries (assuming all people have unique Id values).
It should NOT be the responsibility of every single query to ensure that all needed localities exist and are connected correctly -- that is way too much complexity and overhead for every query. Instead, you should create the appropriate localities and inter-locality relationships separately -- before you need them. In fact, it should be the responsibility of each query that creates a locality to create all the relationships associated with it.
MERGE matches or creates its pattern as a whole: it creates nothing only if every single thing in the pattern already exists; otherwise it creates the entire pattern (reusing only the variables that are already bound). So, to avoid duplicates, a MERGE pattern should have at most 1 thing that might not already exist: at most 1 relationship, and if it has a relationship then the 2 end nodes should already be bound (by MATCH clauses, for example).
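As a minimal sketch of that rule applied to the locality chain from the question (assuming, for illustration only, that a name is unique within each label, which real-world city names are not):

```cypher
// One MERGE per node: each node is matched or created on its own.
MERGE (country:Country {name: 'USA'})
MERGE (adminArea:AdministrativeArea {name: 'California'})
MERGE (locality:Locality {name: 'San Diego'})
// Both end nodes are bound now, so each MERGE below can only
// match or create a single relationship, never a duplicate node.
MERGE (adminArea)-[:SITUATED_IN]->(country)
MERGE (locality)-[:SITUATED_IN]->(adminArea)
```

Running this repeatedly leaves a single (California)->(USA) chain, whichever of the three nodes existed beforehand.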
Once the Locality nodes and the inter-locality relationships exist, you can add a person like this:
MATCH (locality:Locality {name: "San Diego"})
MERGE (p:Person {Id: 'a'}) // create person if needed, specifying a unique identifier
ON CREATE SET p.name = 'Omar' // set other properties as needed
MERGE (p)-[:SITUATED_IN]->(locality) // create relationship if necessary
The above considerations should help you design the code for creating the Locality nodes and the inter-locality relationships.
Finally, the solution I used is much simpler: a series of merges.
match (person:Person {Id: 'Omar'}) // that should be present in the graph
merge (country:Country {name: 'USA'})
merge (state:State {name: 'California'})-[:SITUATED_IN]->(country)
merge (city:City {name: 'Los Angeles'})-[:SITUATED_IN]->(state)
merge (person)-[:SITUATED_IN]->(city)
return person;

Adding multiple relationships using WITH, WHERE, and UNWIND

I have data in the following structure:
{"id": "1", "name": "A. I. Lazarev", "org": "United States Department of State", "tags": [{"t": "Infrared"}, {"t": "Near-infrared spectroscopy"}, {"t": "Infrared astronomy"}, {"t": "Data collection"}], "pubs": [{"i": "1542417502", "r": 6}]}
{"id": "2", "name": "Stevan Spremo", "tags": [{"t": "Micro-g environment"}, {"t": "Antibiotics"}, {"t": "Bacteriology"}], "pubs": [{"i": "222163962", "r": 0}]}
{"id": "3", "name": "Bricchi G", "pubs": [{"i": "2417067698", "r": 1}, {"i": "2406980973", "r": 1}]}
Some of the rows have tags, some have organizations, some have both, and some have neither.
I'd like to add relationships between (1) authors and tags, (2) authors and organizations, and (3) authors and publications. I have the publications as nodes already, so it should be fairly straightforward to get (3) once I get (1) and (2).
I have been trying to use the following code:
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:/test.txt') YIELD value AS q RETURN q",
"UNWIND q.id as id
CREATE (a:Author {id:id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {{name: tags.t}})
CREATE (a)-[:HAS_TAGS]->(t)
WITH q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false})
The tags and the organizations show up multiple times in the data, but should only have one node each, so I have used MERGE to create unique nodes for these.
The problem with the above code is that it creates duplicate AFFILIATED_WITH relationships: for each author, it creates as many AFFILIATED_WITH relationships as there are tags.
How can I change the cypher query so that it isn't creating duplicate relationships?
After this clause:
UNWIND q.tags as tags
your query will have as many data rows as the number of tags for the current q (each row will have q, a, id, tags values). The subsequent operations will be performed once per data row. That is why you are creating too many AFFILIATED_WITH relationships.
To solve your issue, you have to reduce the number of data rows appropriately, at the appropriate time (and this will also speed up your processing, since unnecessarily repeated operations will be avoided). In your case, you can just change the second WITH q, a clause to WITH DISTINCT q, a:
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
"CREATE (a:Author {id:q.id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {name: tags.t})
CREATE (a)-[:HAS_TAGS]->(t)
WITH DISTINCT q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false}
)
I have also simplified the query by removing the unnecessary UNWIND q.id as id clause, and fixed some syntax issues.
[UPDATED]
If you want to add the AUTHORED relationships (as requested in the comments to this answer), you should do that before you create the AFFILIATED_WITH relationships -- since the WHERE q.org is not null clause would filter out some q rows. Also, whenever you use CREATE to create a relationship, Cypher requires that you specify a direction for the relationship.
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
"CREATE (a:Author {id:q.id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {name: tags.t})
CREATE (a)-[:HAS_TAGS]->(t)
WITH DISTINCT q, a
UNWIND q.pubs as pubs
MERGE (p:Quanta {id: pubs.i})
CREATE (a)-[r:AUTHORED {rank: pubs.r}]->(p)
WITH DISTINCT q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false}
)
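One caveat with the UNWIND-based queries above: UNWIND on a missing (null) or empty list produces no rows, so an author without tags would never reach the later clauses. A possible variant, sketched here under the same assumptions about the file and property names as the answer, uses FOREACH instead. FOREACH performs its updates once per list element without changing the number of rows, so there is still exactly one row per author afterwards and no DISTINCT is needed:

```cypher
// coalesce(..., []) turns a missing list into an empty one,
// so authors without tags or pubs are still processed.
CALL apoc.periodic.iterate(
  "CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
  "CREATE (a:Author {id: q.id, name: q.name, citations: q.n_citation, publications: q.n_pubs})
   FOREACH (tag IN coalesce(q.tags, []) |
     MERGE (t:Tag {name: tag.t})
     CREATE (a)-[:HAS_TAGS]->(t))
   FOREACH (pub IN coalesce(q.pubs, []) |
     MERGE (p:Quanta {id: pub.i})
     CREATE (a)-[:AUTHORED {rank: pub.r}]->(p))
   WITH q, a
   WHERE q.org IS NOT NULL
   MERGE (o:Organization {name: q.org})
   CREATE (a)-[:AFFILIATED_WITH]->(o)",
  {batchSize: 10000, iterateList: true, parallel: false}
)
```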

How to avoid duplicate nodes when importing JSON into Neo4J

Let's say I have a JSON containing relationships between people:
[
{
"name": "mike",
"loves": ["karen", "david", "joy"],
"loved": ["karen", "joy"]
},
{
"name": "karen",
"loves": ["mike", "david", "joy"],
"loved": ["mike"]
},
{
"name": "joy",
"loves": ["karen"],
"loved": ["karen", "david"]
}
]
I want to import nodes and relationships into a Neo4J DB. For this sample, there's only one relationship ("LOVES") and the 2 lists each user has just control the arrow's direction. I use the following query to import the JSON:
UNWIND {json} as person
CREATE (p:Person {name: person.username})
FOREACH (l in person.loves | MERGE (v:Person {name: l}) CREATE (p)-[:LOVES]->(v))
FOREACH (f in person.loved | MERGE (v:Person {name: f}) CREATE (v)-[:LOVES]->(p))
My problem is that I now have duplicate nodes (i.e. 2 nodes with {name: 'karen'}). I know I could probably use UNIQUE if I insert records one at a time. But what should I use here when importing a large JSON? (to be clear: the name property would always be unique in the JSON - i.e., there are no 2 "mikes").
[EDITED]
Since you cannot assume that a Person node does not yet exist, you need to MERGE your Person nodes everywhere.
If there is no need to use your loved data (that is, if the loves data is sufficient to create all the necessary relationships):
UNWIND {json} as person
MERGE (p:Person {name: person.name})
FOREACH (l in person.loves | MERGE (v:Person {name: l}) CREATE (p)-[:LOVES]->(v))
On the other hand, if the loved data is needed, then you need to use MERGE when creating the relationships as well (since any relationship might already exist).
UNWIND {json} as person
MERGE (p:Person {name: person.name})
FOREACH (l in person.loves | MERGE (v:Person {name: l}) MERGE (p)-[:LOVES]->(v))
FOREACH (f in person.loved | MERGE (v:Person {name: f}) MERGE (v)-[:LOVES]->(p))
In both cases, you should create an index (or uniqueness constraint) on :Person(name) to speed up the query.
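A sketch of such a uniqueness constraint follows. The first form is the older syntax, matching the style used elsewhere on this page; the commented-out form is the newer syntax, and the constraint name person_name_unique is a made-up example:

```cypher
// Older syntax (Neo4j 3.x and 4.x):
CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;

// Newer syntax (Neo4j 4.4 and later; the older form was removed in 5.x):
// CREATE CONSTRAINT person_name_unique IF NOT EXISTS
// FOR (p:Person) REQUIRE p.name IS UNIQUE;
```

A uniqueness constraint also provides an index on :Person(name), so the MERGE lookups by name benefit from it as well.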

How to merge tree in neo4j

Let's say I have a database with named nodes and that the database is either empty or already contains a small tree: a node A with an R1 relationship to B and an R2 relationship to C.
I now need a neo4j statement that inserts exactly that tree structure if it does not already exist in the database.
For simple node pair merge, I could use something like
MERGE ({name: 'A'})-[:R1]->({name: 'B'})
But I want the tree structure. How do I add C here?
Firstly, you have to add a label to your tree nodes (Tree in my example below) and create a unique constraint on the name attribute, like this:
CREATE CONSTRAINT ON (n:Tree) ASSERT n.name IS UNIQUE;
Then you can use this script to create the C node, and the others if they don't exist:
MERGE (a:Tree {name: 'A'})
MERGE (b:Tree {name: 'B'})
MERGE (c:Tree {name: 'C'})
MERGE (a)-[:R1]->(b)
MERGE (a)-[:R2]->(c);
As you can see you have to use one MERGE per node, and then one MERGE per relationship.

Resources