failure trying to shorten CASE-based predicate - neo4j

This was inspired by these two SO threads:
Boolean value return from Neo4j cypher query without CASE
How to set property name and value on csv load?
I have a CSV with 3 columns:
first_name,prop_name,prop_value
John,weight,100
Paul,height,200
John,hair_color,blonde
Paul,weight,120
So, there are a number of people, and their properties are randomly scattered across different rows. My goal is to screen the rows and assign all found properties to their holders. For the sake of simplicity, let's focus on the 'weight' property only.
I do know how to make this work the long way:
LOAD CSV WITH HEADERS FROM
"file:///test.csv" AS line
WITH line
MERGE (m:Person {name:line.first_name})
WITH line, CASE line.prop_name WHEN "weight" THEN [1] ELSE [] END as loopW
UNWIND loopW as w
MATCH (p:Person {name: line.first_name})
SET p.weight = line.prop_value
But then, I tried to replace the CASE line with a shorter version
WITH line, collect(line.prop_name = "weight") as loopW
...which resulted in weird behavior: the created nodes did get their 'weight' keys assigned, but sometimes with the wrong values. So, I could see something like (:Person {weight:blue})
What would be the right way to get rid of the CASE?

You should know that your current usage filters out all lines that don't have "weight" as the prop_name (the UNWIND of an empty collection wipes out all the other lines and they won't get processed).
What you really need is a better way to set dynamically named properties on your nodes so you can avoid the CASE usage completely.
If you can install APOC Procedures (read the installation instructions on how to modify your neo4j.conf to whitelist the procedures, and check the version matrix to make sure you pick the release that corresponds to your Neo4j version), there is one that is a perfect fit for what you're trying to do: CALL apoc.create.setProperty( [node,id,ids,nodes], key, value) YIELD node
Usage would be something like:
LOAD CSV WITH HEADERS FROM
"file:///test.csv" AS line
MERGE (m:Person {name:line.first_name})
CALL apoc.create.setProperty(m, line.prop_name, line.prop_value) YIELD node
RETURN count(distinct m)
EDIT
Expanding on what's wrong with the original query:
UNWIND produces rows in a multiplicative way, with respect to the number of elements in the collection. If the collection in the row had 5 elements, the single row would result in 5 rows, one for each element. If the collection was empty, the row would be removed instead, as there wouldn't be any elements in the collection for which to output rows. Because of this, any further WITH line, CASE ... lines in the query wouldn't do any good.
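A tiny standalone illustration of both behaviors in one query (not part of the import, just a demonstration):

```cypher
// the outer UNWIND yields 2 rows, one per collection;
// the inner UNWIND expands [1,2,3] into 3 rows and
// eliminates the row carrying the empty collection
UNWIND [[1, 2, 3], []] AS coll
UNWIND coll AS x
RETURN coll, x
// → 3 rows, all from [1,2,3]; the [] row vanishes
```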
Let's analyze the original query, going by the example input csv:
LOAD CSV WITH HEADERS FROM
"file:///test.csv" AS line
WITH line // redundant, this WITH is not needed
MERGE (m:Person {name:line.first_name}) // 4 rows corresponding with 2 nodes
WITH line, CASE line.prop_name WHEN "weight" THEN [1] ELSE [] END as loopW
// still 4 rows, 2 have [1] as loopW, the other 2 have [] as loopW
UNWIND loopW as w // 2 rows eliminated by unwinding empty collections
MATCH (p:Person {name: line.first_name})
SET p.weight = line.prop_value
// only 2 rows are for 'John,weight,100' and 'Paul,weight,120'
// any further repetitions of WITH line, CASE ... UNWIND for different props will fail and eliminate the remaining 2 rows.
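As an aside, one way to get a conditional SET without eliminating rows is the FOREACH trick: iterate over a collection that is either empty or single-element, so the row itself survives either way and several blocks can be chained. A hedged sketch against the same test.csv (the 'height' block is just to show the chaining):

```cypher
LOAD CSV WITH HEADERS FROM "file:///test.csv" AS line
MERGE (m:Person {name: line.first_name})
// FOREACH runs its updates 0 or 1 times per row but never removes the row
FOREACH (_ IN CASE WHEN line.prop_name = "weight" THEN [1] ELSE [] END |
  SET m.weight = line.prop_value)
FOREACH (_ IN CASE WHEN line.prop_name = "height" THEN [1] ELSE [] END |
  SET m.height = line.prop_value)
```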

Related

load csv doesn't work, maybe because of its big size

I'm trying to import my csv file to neo4j, here is the cypher query:
load csv with headers from "file:/Users/mac/Desktop/Dataset.csv" as row with row
merge (u:User {name:row.user, nationality:row.nation})
merge (l:Location {name:row.location,region:row.region})
merge (u) -[t:to {date:row.DateReview,rating:row.rating}]-> (l)
return u,l,t
it takes a few seconds to run, then returns:
Cannot merge the following node because of null property value for 'nationality': (:User {nationality: null})
The problem is that the same query works perfectly for a CSV file of 40 lines, but my current file has about 17,200 lines and is 2.2 MB in size.
thank you in advance for any help or suggestions.
It means some rows in the nation column are blank (null), and Neo4j does not allow assigning a null value to a property inside MERGE. (With SET, a null-valued property is simply not stored.) Also check the other fields, like DateReview and rating.
If some of the rows are blank, you can SET the values in the Cypher query after doing the MERGE. The trim() makes sure that a value containing only spaces is also treated as blank (null).
load csv with headers from "file:/Users/mac/Desktop/Dataset.csv" as row
with row
merge (u:User {name:row.user})
set u.nationality = case trim(row.nation)
when "" then null else row.nation end
return u
Do not worry about setting the value of nationality to null. Neo4j will not store that nationality:null value in the database.
Please do the same trick (SET) on location.region and to.date
Are you sure that you need both properties name and nationality for a :User to be unique? If not, you can just MERGE on the name field.
As a fallback for any fields used in MERGE to be empty, you can do
MERGE (:User {myMergeField: COALESCE(row.myColumn, '_noValue')})
and after the import, DETACH DELETE all :User nodes with myMergeField = '_noValue'
The fact that your query runs fine if you do a LIMIT 8000 indicates that there is a data problem after row 8000.
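If dropping incomplete rows is acceptable, another option is to filter them out before any MERGE runs; a hedged sketch using the file path from the question:

```cypher
LOAD CSV WITH HEADERS FROM "file:/Users/mac/Desktop/Dataset.csv" AS row
// skip rows where a merge-relevant column is missing or only whitespace
WITH row
WHERE row.user IS NOT NULL AND trim(coalesce(row.nation, '')) <> ''
MERGE (u:User {name: row.user})
SET u.nationality = row.nation
RETURN count(u)
```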

Neo4j Cypher two separate unwind loops in one query

I am reading a big file into neo4j with the script below:
WITH $dict.rows as rows UNWIND rows as row
WITH row WHERE row.object CONTAINS 'wikidata'
MERGE(e:Entity {wikidataId: replace(row.object,"http://www.wikidata.org/entity/","")})
SET e.dbpediaUri = row.subject
WITH distinct $dict.rows as rows UNWIND rows as row
MATCH(e:Entity) where e.dbpediaUri = row.subject
WITH row, e
CREATE(object:Property {value:row.object, type: "string"})
WITH row,e,object
CALL apoc.create.relationship(e, row.predicate, {source:"dbpedia", type:"uri"}, object) YIELD rel
RETURN null
Here I want to first merge entities with the given wikidata id (here I need the WITH with WHERE so that I first get the desired property), and in the second loop I want to add relationships to that entity.
I'm wondering if this code would end up with cartesian product? Will the second WITH ... UNWIND statement run inside the first one or not? If so, how can I achieve what I want to do in one query?
To my understanding, your second UNWIND on rows will run inside the first one. I assume you want to prevent this. After SET e.dbpediaUri = row.subject you need to close the loop by using a proper aggregation function, for example: WITH collect(e) AS entities
Caution: if WITH row WHERE row.object CONTAINS 'wikidata' returns 0 records, the rest of the Cypher query will NOT be executed and the second UNWIND will never be reached. It is therefore better to split your query into 2 separate transactions.
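Putting that advice together, a hedged rewrite of the original query. The count(*) barrier collapses the stream back to a single row before the second pass; note that an aggregation with no grouping keys produces one row even over zero input rows, which should also sidestep the 0-records caveat, though splitting into two transactions remains the safer route:

```cypher
UNWIND $dict.rows AS row
WITH row WHERE row.object CONTAINS 'wikidata'
MERGE (e:Entity {wikidataId: replace(row.object, "http://www.wikidata.org/entity/", "")})
SET e.dbpediaUri = row.subject
// close the first loop: aggregate so the stream collapses to one row
WITH count(*) AS firstPassDone
UNWIND $dict.rows AS row
MATCH (e:Entity) WHERE e.dbpediaUri = row.subject
CREATE (object:Property {value: row.object, type: "string"})
CALL apoc.create.relationship(e, row.predicate, {source: "dbpedia", type: "uri"}, object) YIELD rel
RETURN count(rel)
```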

Loading sparse adjacency matrix on Neo4j

I'm trying to load a sparse (co-occurrence) matrix in Neo4j but after many failed queries, it's getting frustrating.
Raw data
Basically, I want to create the nodes from the ids, and the relationship weight against each other node (including itself) should be the value on the matrix.
So, for example, 'nhs' should have a self-relationship with weight 41 and 16 with 'england', and so on.
I was trying things like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[:w]-(b);
I'm not sure how to attach the edge values though (and not yet sure if the merges are producing the expected result).
Thanks in advance for the assistance
If you just need to add a property on a relationship, where the property value is in your CSV, then it's just a matter of adding a variable for the relationship that you MERGE in, and then using SET (or ON CREATE SET, if you only want to set the property if the relationship didn't exist and needed to be created). So something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[r:w]-(b)
SET r.weight = row.weight
EDIT
Ah, took a look at the CSV clip. This is a very strange way to format your data. You have data in your header (that is, your headers are trying to define the other node to look up), which is the wrong way to go about this. You should instead have, per row, one column that defines one of the two nodes to connect (like the "id" column) and then another column for the other node (something like an "id2"). That way you can just do two MATCHes to get your nodes, then a MERGE between them, and then set the relationship property, similar to the sample query I provided above.
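If you do reshape the data into that long format, the import collapses to something like this sketch (the filename, the id2 column name, and toInteger() for whole-number weights are all assumptions):

```cypher
// hypothetical reshaped file with columns: id, id2, weight
LOAD CSV WITH HEADERS FROM 'file:///matpharma_long.csv' AS row
MERGE (a:Node {name: row.id})
MERGE (b:Node {name: row.id2})
MERGE (a)-[r:w]-(b)
SET r.weight = toInteger(row.weight)
```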
But if you're set on this format, then it's going to be a more complicated query, since we have to deal with dynamic access of the row keys and values.
Something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (start:Node {name:row.id})
WITH start, row, [key in keys(row) WHERE key <> 'id'] as keys
FOREACH (key in keys |
MERGE (end:Node {name:key})
MERGE (start)-[r:w]-(end)
ON CREATE SET r.weight = row[key] )
This is a nice Cypher challenge :) That said, LOAD CSV is not really meant for this, and you would probably be happier flattening your data first.
Here is what I came up with :
LOAD CSV FROM "https://gist.githubusercontent.com/ikwattro/a5260d131f25bcce97c945cb97bc0bee/raw/4ce2b3421ad80ca946329a0be8a6e79ca025f253/data.csv" AS row
WITH collect(row) AS rows
WITH rows, rows[0] AS firstRow
UNWIND rows AS row
WITH firstRow, row SKIP 1
UNWIND range(0, size(row)-2) AS i
RETURN firstRow[i+1], row[0], row[i+1]
You can take a look at the gist
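Building on that transposition, a sketch that writes the nodes and weighted relationships instead of just returning the triples (the :Node label and toInteger() for whole-number weights are assumptions, not from the gist; the undirected MERGE also handles the self-relationships like nhs-nhs):

```cypher
LOAD CSV FROM "https://gist.githubusercontent.com/ikwattro/a5260d131f25bcce97c945cb97bc0bee/raw/4ce2b3421ad80ca946329a0be8a6e79ca025f253/data.csv" AS row
WITH collect(row) AS rows
WITH rows, rows[0] AS firstRow
UNWIND rows AS row
WITH firstRow, row SKIP 1
UNWIND range(0, size(row)-2) AS i
// row[0] is the row id, firstRow[i+1] the column id, row[i+1] the weight
MERGE (a:Node {name: row[0]})
MERGE (b:Node {name: firstRow[i+1]})
MERGE (a)-[r:w]-(b)
SET r.weight = toInteger(row[i+1])
```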

How to set property name and value on csv load?

How can one set both the property name and its value loading a tall skinny csv file?
The csv file would contain only 3 columns: node name (id), property name (p), and property value (v). A node with two properties would therefore correspond to 2 lines.
LOAD CSV... row
MERGE (n) WHERE n.Name = row.id
SET n.{row.p} = row.v
This syntax doesn't exist; it's just to illustrate what I'd like to do. Is there a way to do such a thing in Cypher? That would be really useful, rather than having to pivot the data first.
A file contains definitions for person nodes:
Name,Age
A,25
B,34
A second file contains properties for specific nodes, one property per line
Name,property_name,property_value
A,weight,64
A,height,180
B,hair color,blond
I'd like to update nodes A and B and set additional properties based on the second file.
As mentioned below, one possibility is to create (:Data) nodes containing one property each, and link them to person nodes
CREATE (p) -[:hasProperty]-> (:Data {Name: row.property_name, Value: row.property_value})
However, this might not be very efficient and extracting person nodes and properties gets much more complex.
MATCH (p:Person) --> (d:Data)
RETURN {name: p.name, age: p.age, property_names: collect(d.Name), property_values: collect(d.Value)}
The holy grail would be either to set the property name dynamically on load, or a pivot function to return data properties on nodes.
You cannot assign a property key dynamically from parameters; however, since the row is a map, you can set the whole row as properties on the node,
if that is OK for you:
LOAD CSV ... AS row
MERGE (n:Label {Name: row.id})
SET n += row
EDIT
Based on your edit, for your second file, if you don't have too many different values (weight, height, etc.) you can create a conditional collection, for example:
LOAD CSV WITH ... AS row
WITH row, CASE row.property_name WHEN "weight" THEN [1] ELSE [] END as loopW
UNWIND loopW as w
MATCH (p:Person {name: row.Name})
SET p.weight = row.property_value
WITH row, CASE row.property_name WHEN "height" THEN [1] ELSE [] END as loopH
UNWIND loopH as h
MATCH (p:Person {name: row.Name})
SET p.height = row.property_value
...

Avoid processing duplicate data when CSV importing via Cypher

I have a set of CSV files with duplicate data, i.e. the same row might (and does) appear in multiple files. Each row is uniquely identified by one of the columns (id) and has quite a few other columns that indicate properties, as well as required relationships (i.e. ids of other nodes to link to). The files all have the same format.
My problem is that, due to size and number of the files, I want to avoid processing the rows that already exist - I know that as long as id is the same, the contents of the rows will be the same across the files.
Can any cypher wizard advise how to write a query that would create the node, set all the properties and create all the relationship if a node with given id does not exist, but skip the action altogether if such node is found? I tried with MERGE ON CREATE, something along the lines of:
LOAD CSV WITH HEADERS FROM "..." AS row
MERGE (f:MyLabel {id:row.uniqueId})
ON CREATE SET f....
WITH f,row
MATCH (otherNode:OtherLabel {id : row.otherNodeId})
MERGE (f) -[:REL1] -> (otherNode)
but unfortunately that only avoids setting the properties again; I couldn't work out how to skip the merging of the relationships (only one is shown here, but there are quite a few more).
Thanks in advance!
You can just optionally match the node and then skip rows where the match succeeded, filtering with an IS NULL check.
Make sure you have an index or constraint on :MyLabel(id)
LOAD CSV WITH HEADERS FROM "..." AS row
OPTIONAL MATCH (existing:MyLabel {id:row.uniqueId})
WITH row, existing
WHERE existing IS NULL
MERGE (f:MyLabel {id:row.uniqueId})
ON CREATE SET f....
WITH f,row
MATCH (otherNode:OtherLabel {id : row.otherNodeId})
MERGE (f) -[:REL1] -> (otherNode)
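For the index/constraint mentioned above, a sketch in Neo4j 4.4+ syntax (the constraint name is arbitrary; older versions used CREATE CONSTRAINT ON (m:MyLabel) ASSERT m.id IS UNIQUE):

```cypher
// a uniqueness constraint also creates the backing index,
// keeping the MERGE lookups on :MyLabel(id) fast
CREATE CONSTRAINT mylabel_id IF NOT EXISTS
FOR (m:MyLabel) REQUIRE m.id IS UNIQUE
```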
