Neo4J Cypher - Case Insensitive MERGE - neo4j

Is there a way to do a case insensitive MERGE in Cypher (Neo4J)?
I'm creating a graph of entities I have been able to extract from a set of documents, and want to merge entities that are the same across multiple documents (accepting the risk that the same name doesn't mean it's the same entity!). The issue is that the case can vary between documents.
At the moment, I'm using the MERGE syntax to create merged nodes, but it is sensitive to the differences in case. How can I perform a case-insensitive merge?

There is no direct way but you can try out something like below.MERGE is made for pattern matching and labels of different cases constitute different patterns
MERGE (a:Crew123)
WITH a,labels(a) AS t
LIMIT 1
MATCH (n)
WHERE [l IN labels(n)
WHERE lower(l)=lower(t[0])] AND a <> n
WITH a,collect(n) AS s
FOREACH (x IN s |
DELETE a)
RETURN *
The above query will give you an ERROR but it will delete the newly created node if a similar label exists. You can add additional pattern in the MERGE clause . And in case there are no similar labels it will run successfully.
Again this is just a work around to not allow new similar labels.

If the data is coming for instance from a CSV or similar source (parameter) you can use a dedicated, consistent case property for the merge and set the original value separately.
e.g.
CREATE CONSTRAINT ON (u:User) ASSERT u.iname IS UNIQUE;
LOAD CSV WITH HEADERS FROM "http://some/url" AS line
WITH line, lower(line.name) as iname
MERGE (u:User {iname:iname}) ON CREATE SET u.name = line.name;

The best solution we found was to change our schema to include a labelled node that contains the upper-cased value which we can merge on, whilst still retaining the case information on the original mode. E.g. (OriginalCase)-[uppercased]->(ORIGINALCASE)

Related

Loading sparse adjacency matrix on Neo4j

I'm trying to load a sparse (co-occurrence) matrix in Neo4j but after many failed queries, it's getting frustrating.
Raw data
Basically, I want to create the nodes from the ids, and the relationship weight against each other node (including itself) should be the value on the matrix.
So, for example, 'nhs' should have a self-relationship with weight 41 and 16 with 'england', and so on.
I was trying things like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[:w]-(b);
I'm not sure how to attach the edge values though (and not yet sure if the merges are producing the expected result).
Thanks in advance for the assistance
If you just need to add a property on a relationship, where the property value is in your CSV, then it's just a matter of adding a variable for the relationship that you MERGE in, and then using SET (or ON CREATE SET, if you only want to set the property if the relationship didn't exist and needed to be created). So something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[r:w]-(b)
SET r.weight = row.weight
EDIT
Ah, took a look at the CSV clip. This is a very strange way to format your data. You have data in your header (that is, your headers are trying to define the other node to lookup) which is the wrong way to go about this. You should instead have, per row, one column that defines one of the two nodes to connect (like the "id" column) and then another column for the other node (something like an "id2"). That way you can just do two MATCHes to get your nodes, then a MERGE between them, and then setting the relationship property, similar to the sample query I provided above.
But if you're set on this format, then it's going to be a more complicated query, since we have to deal with dynamic access of the row keys and values.
Something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (start:Node {name:row.id})
WITH start, row, [key in keys(row) WHERE key <> 'id'] as keys
FOREACH (key in keys |
MERGE (end:Node {name:key})
MERGE (start)-[r:w]-(end)
ON CREATE SET r.weight = row[key] )
This is a nice Cypher challenge :) Let's say that LOAD CSV is not really meant to do this and probably you would be happier by flattening your data
Here is what I came up with :
LOAD CSV FROM "https://gist.githubusercontent.com/ikwattro/a5260d131f25bcce97c945cb97bc0bee/raw/4ce2b3421ad80ca946329a0be8a6e79ca025f253/data.csv" AS row
WITH collect(row) AS rows
WITH rows, rows[0] AS firstRow
UNWIND rows AS row
WITH firstRow, row SKIP 1
UNWIND range(0, size(row)-2) AS i
RETURN firstRow[i+1], row[0], row[i+1]
You can take a look at the gist

How to match line of csv which is ignored by constraint and create only relationship

I have been created a graph having a constraint on primary id. In my csv a primary id is duplicate but the other proprieties are different. Based on the other properties I want to create relationships.
I tried multiple times to change the code but it does not do what I need.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id})
with line.cui= cui
MATCH (m:Intervention)
where m.id = cui
MERGE (n)-[:HAS_INTERVENTION]->(m);
I already have the nodes Intervention in the graph as well as the trials. So what I am trying to do is to match a trial with the id from intervention and create only the relationship. Instead is creating me also the nodes.
This is a sample of my data, so the same primary id, having different cuis and I am trying to match on cui:
You can refer the following query which finds Trial and Intervention nodes by primary_id and cui respectively and creates the relationship between them.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id}), (m:Intervention {id: line.cui})
MERGE (n)-[:HAS_INTERVENTION]->(m);
The behavior you observed is caused by 2 aspects of the Cypher language:
The WITH clause drops all existing variables except for the ones explicitly specified in the clause. Therefore, since your WITH clause does not specify the n node, n becomes an unbound variable after the clause.
The MERGE clause will create its entire pattern if any part of the pattern does not already exist. Since n is not bound to anything, the MERGE clause would go ahead and create the entire pattern (including the 2 nodes).
So, you could have fixed the issue by simply specifying the n variable in the WITH clause, as in:
WITH n, line.cui= cui
But #Raj's query is even better, avoiding the need for WITH entirely.

Bulk merge of relations in Neo4J using Cypher

I am trying to create/update distinct relations between two nodes with a single bulk operation in Cypher, leveraging the MERGE and FOREACH clause.
Right now, I am trying to do it with the following, but it is not syntactically correct:
MERGE (u1:Person {id:1})
MERGE (u2:Person {id:3})
FOREACH (score IN [{name:'R1',val:1.0},{name:'R2',val:0.5}]|
MERGE (u1)-[r]-(u2)
WHERE type(r) = score.name
ON CREATE SET r.weight=score.val,r.created=timestamp(),r.updated=r.created
ON MATCH SET r.weight=score.val,r.updated=timestamp()
)
May you please suggest me a query to achieve that.
I think the problem with your query is this:
MERGE (u1)-[r]-(u2)
WHERE type(r) = score.name
Creting relationships without a type is not allowed, nor is it to use a variable name (score.name) for the type of the relationship. I can only suggest two partial solutions:
1) If you are writing the query from some code, insert the name from there. For instance, in PHP: :
....
$rels[] = [val => 1.0, name => 'R1'];
foreach ($rels as $rel) {
$query[] = 'MERGE (u1)-[r:' . '$rel["name"]' . ']-(u2)';
ON CREATE SET r.weight=score.val, r.created=timestamp(), r.updated=r.created
ON MATCH SET r.weight=score.val,r.updated=timestamp()
}
....
This probably would give an error because of reusing the "r" identifier for the relationship, but that could be avoided making it variable too.
2) A cleaner solution, but maybe not available in your environment, is to use APOC. In Neo4j 3.0+ it´s available to install with many functions for you to use, in particular apoc.create.relationship. I am not familiar with this, but here it´s quite well explained.
Anyway here I also leave the current open issue at neo4j repository in case it´s useful.

Avoid processing duplicate data when CSV importing via Cypher

I have a set of CSV files with duplicate data, i.e. the same row might (and does) appear in multiple files. Each row is uniquely identified by one of the columns (id) and has quite a few other columns that indicate properties, as well as required relationships (i.e. ids of other nodes to link to). The files all have the same format.
My problem is that, due to size and number of the files, I want to avoid processing the rows that already exist - I know that as long as id is the same, the contents of the rows will be the same across the files.
Can any cypher wizard advise how to write a query that would create the node, set all the properties and create all the relationship if a node with given id does not exist, but skip the action altogether if such node is found? I tried with MERGE ON CREATE, something along the lines of:
LOAD CSV WITH HEADERS FROM "..." AS row
MERGE (f:MyLabel {id:row.uniqueId})
ON CREATE SET f....
WITH f,row
MATCH (otherNode:OtherLabel {id : row.otherNodeId})
MERGE (f) -[:REL1] -> (otherNode)
but unfortunately that can only be applied to not setting the properties again, but I couldn't work out how to skip the merging part of relationships (only shown one here, but there are quite a few more).
Thanks in advance!
You can just optionally match the node and then skip with WHERE n IS NULL
Make sure you have an index or constraint on :MyLabel(id)
LOAD CSV WITH HEADERS FROM "..." AS row
OPTIONAL MATCH (f:MyLabel {id:row.uniqueId})
WHERE f IS NULL
MERGE (f:MyLabel {id:row.uniqueId})
ON CREATE SET f....
WITH f,row
MATCH (otherNode:OtherLabel {id : row.otherNodeId})
MERGE (f) -[:REL1] -> (otherNode)

MATCH prior to MERGE in single Cypher query confuses ON MATCH and ON CREATE

I've run into this on Neo4j 2.1.5. I have a query which I'm issuing from Node.js using the Neo4j REST API. The point of this query is to be able to create or update a given Node and set its state (including labels and properties) to some known state. The MATCH and REMOVE clause prior to the WITH is to work around the fact that there's no direct way to remove all of a Node's labels nor is there a way to update a Node's labels with a given set of labels. You have to explicitly remove the labels you don't want and add the labels you do want. And there's no way to remove labels in the MERGE clause.
A somewhat simplified version of the query looks like:
MATCH (m {name:'Brian'})
REMOVE m:l1:l2
WITH m
MERGE (n {name:'Brian'})
ON MATCH SET n={mprops} ON CREATE SET n={cprops}
RETURN n
where mprops = {updated:true, created:false} and cprops = {updated:false, created:true}. I do this so that in a single Cypher query I can remove all of the Node's existing labels and set new labels using the ON MATCH clause. The problem is that including the initial MATCH seems to confuse the ON MATCH vs ON CREATE logic.
Assuming the Brian Node already exists, the result of this query should show that n.created = false and n.updated = true. However, I get the opposite result, n.created=true, n.updated=false. If I remove the initial MATCH (and WITH) clause and execute only the MERGE clause, the results are as expected. So somehow, the inclusion of the MATCH clause causes the MERGE clause to think that a CREATE vs MATCH is happening.
I realize this is a weird use of the WITH clause, but it did seem like it would work around the limitation in manipulating labels. And Cypher thinks that it's valid Cypher. I'm assuming this is just a bug and an edge case, but I wanted to get others insights and possible alternatives before I report it.
I realize that I could have created a transaction and issued the MATCH and MERGE as separate queries within that transaction, but there are reasons that this does not work well in the design of the API I'm writing.
Thanks!
If you prefix your query with MATCH it will never execute if there is no existing ('Brian') node.
You also override all properties with your SET n = {param} you should use SET n += {param}
MERGE (n:Label { name:'Brian' })
ON MATCH SET n += {create :false,update:true }
ON CREATE SET n += {create :true,update:false }
REMOVE n:WrongLabel
RETURN n
I don't see why your query would not work, but the issues brought up by #FrobberOfBits are valid.
However, logically, your example query is equivalent to this one:
MATCH (m {name:'Brian'})
REMOVE m:l1:l2
SET m={mprops}
RETURN m
This query is simpler, avoids the use of MERGE entirely, and may avoid whatever issue you are seeing. Does this represent what you were trying to do?

Resources