How to set property name and value on csv load? - neo4j

How can one set both the property name and its value when loading a tall, skinny CSV file?
The CSV file would contain only 3 columns: node name (id), property name (p) and property value (v). A node with two properties would therefore correspond to 2 lines.
LOAD CSV... row
MERGE (n) WHERE n.Name = row.id
SET n.{row.p} = row.v
This syntax doesn't exist; it's just to explain what I'd like to do. Is there a way to do such a thing in Cypher? That would be really useful, rather than having to pivot the data first.
A file contains definitions for person nodes:
Name,Age
A,25
B,34
A second file contains properties for specific nodes, one property per line
Name,property_name,property_value
A,weight,64
A,height,180
B,hair color,blond
I'd like to update nodes A and B and set additional properties based on the second file.
As mentioned below, one possibility is to create (:Data) nodes containing one property each, and link them to person nodes:
CREATE (p) -[:hasProperty]-> (:Data {Name: row.property_name, Value: row.property_value})
However, this might not be very efficient and extracting person nodes and properties gets much more complex.
MATCH (p:Person) --> (d:Data)
RETURN {name: p.name, age: p.age, property_names: collect(d.Name), property_values: collect(d.Value)}
The holy grail would be either to set the property name dynamically on load, or to have a pivot function that returns the data properties on the nodes.

You cannot assign a property key dynamically from parameters. However, since the row is a map, you can set the whole row as properties on the node, if that is OK for you:
LOAD CSV ... AS row
MERGE (n:Label {Name: row.id})
SET n += row
EDIT
Based on your edit, for your second file, if you don't have too many different property names (weight, height, etc.), you can create a conditional collection, for example:
LOAD CSV WITH ... AS row
WITH row, CASE row.property_name WHEN 'weight' THEN [1] ELSE [] END as loopW
UNWIND loopW as w
MATCH (p:Person {name: row.Name})
SET p.weight = row.property_value
WITH row, CASE row.property_name WHEN 'height' THEN [1] ELSE [] END as loopH
UNWIND loopH as h
MATCH (p:Person {name: row.Name})
SET p.height = row.property_value
...
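One caveat with chaining WITH ... UNWIND like this is that UNWIND on an empty list drops the row, so later branches never see rows that failed an earlier test. A commonly used alternative is to wrap each conditional SET in a FOREACH over a one-element or empty list, which leaves the row stream untouched. A minimal sketch, assuming the second file is reachable as file:///properties.csv (a hypothetical path) and that person nodes keep the name in a Name property:

```cypher
// Hypothetical file path; columns are Name, property_name, property_value
LOAD CSV WITH HEADERS FROM "file:///properties.csv" AS row
MATCH (p:Person {Name: row.Name})
// Each FOREACH body runs once when the CASE yields [1], and not at all for []
FOREACH (ignored IN CASE row.property_name WHEN 'weight' THEN [1] ELSE [] END |
  SET p.weight = row.property_value)
FOREACH (ignored IN CASE row.property_name WHEN 'height' THEN [1] ELSE [] END |
  SET p.height = row.property_value)
```

Each property name still needs its own FOREACH, so this only pays off when the set of names is small and known in advance.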

Related

can you add a value as the relationship type from the csv

E.g., instead of [:owes], I would like the relationship type to be the amount they owe (row.amount).
I couldn't come up with much.
The simple Cypher script below loads the CSV file and creates a relationship whose type is taken from row.amount, using APOC (Awesome Procedures On Cypher):
LOAD CSV WITH HEADERS FROM "file:///testing.csv" AS row
MERGE (p:Person {name: row.fromPerson})
MERGE (m:Person {name: row.toPerson})
WITH p, m, row
CALL apoc.create.relationship(p, row.amount, {amount: row.amount}, m) YIELD rel
RETURN p, m, rel;
Sample testing.csv:
fromPerson,amount,toPerson
"Tom Hanks",100,"Meg Ryan"
You wouldn't want to have this as the relationship type. The standard way of storing such information is to keep OWES as the relationship type and store the amount as a relationship property.
Example statement :
LOAD CSV WITH HEADERS FROM "file:///..." AS row
MERGE (from:User {id: row.from_id})
MERGE (to:User {id: row.to_id})
MERGE (from)-[r:OWES]->(to)
SET r.amount = row.amount
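Values read by LOAD CSV arrive as strings, so if the amount is meant to be summed or compared numerically, it may help to convert it on load. A sketch that reuses the column names of the sample testing.csv from the other answer; the variable names debtor and creditor are illustrative only:

```cypher
LOAD CSV WITH HEADERS FROM "file:///testing.csv" AS row
MERGE (debtor:Person {name: row.fromPerson})
MERGE (creditor:Person {name: row.toPerson})
MERGE (debtor)-[r:OWES]->(creditor)
// toInteger() returns null if the string cannot be parsed as a number
SET r.amount = toInteger(row.amount);

// With a numeric property, totals per debtor become a plain aggregation
MATCH (debtor:Person)-[r:OWES]->(:Person)
RETURN debtor.name AS debtor, sum(r.amount) AS total_owed;
```

This kind of aggregation is much harder when the amount is encoded in the relationship type itself.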
If for visualisation purposes you want to see the amount as the caption for the relationship in the Neo4j browser, you can do the following.
Click on the relationship type in the panel on the right
Select the property you want to use as caption

failure trying to shorten CASE-based predicate

This was inspired by these two SO threads:
Boolean value return from Neo4j cypher query without CASE
How to set property name and value on csv load?
I have a CSV with 3 columns:
first_name,prop_name,prop_value
John,weight,100
Paul,height,200
John,hair_color,blonde
Paul,weight,120
So, there are a number of people, and their properties are randomly scattered across different rows. My goal is to screen the rows and assign all found properties to their holders. For the sake of simplicity, let's focus on the 'weight' property only.
I do know how to make this work the long way:
LOAD CSV WITH HEADERS FROM
"file:///test.csv" AS line
WITH line
MERGE (m:Person {name:line.first_name})
WITH line, CASE line.prop_name WHEN "weight" THEN [1] ELSE [] END as loopW
UNWIND loopW as w
MATCH (p:Person {name: line.first_name})
SET p.weight = line.prop_value
But then, I tried to replace the CASE line with a shorter version
WITH line, collect(line.prop_name = "weight") as loopW
...which resulted in weird behavior, where the created nodes did get their 'weight' keys assigned to, but sometimes with the wrong values. So, I could see something like (:Person {weight:blue})
What would be the right way to get rid of the CASE?
You should know that your current usage filters out all lines that don't have "weight" as the prop_name (the UNWIND of an empty collection wipes out all the other lines and they won't get processed).
What you really need is a better way to set dynamically named properties on your nodes so you can avoid the CASE usage completely.
If you can install APOC Procedures (please read the instructions at the top for how to modify your neo4j.conf to whitelist the procedures, and pay attention to the version matrix to ensure you get the version that corresponds with your Neo4j version) there is one that is a perfect fit for what you're trying to do: CALL apoc.create.setProperty( [node,id,ids,nodes], key, value) YIELD node
Usage would be something like:
LOAD CSV WITH HEADERS FROM
"file:///test.csv" AS line
MERGE (m:Person {name:line.first_name})
CALL apoc.create.setProperty(m, line.prop_name, line.prop_value) YIELD node
RETURN count(distinct m)
EDIT
Expanding on what's wrong with the original query:
UNWIND produces rows in a multiplicative way, with respect to the number of elements in the collection. If the collection in the row had 5 elements, the single row would result in 5 rows, one for each element. If the collection was empty, the row would be removed instead, as there wouldn't be any elements in the collection for which to output rows. Because of this, any further WITH line, CASE ... lines in the query wouldn't do any good.
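The row-elimination behaviour can be seen in isolation with literal lists, no CSV needed. The variable names items and x below are illustrative only:

```cypher
// Three input rows: two non-empty lists and one empty list
UNWIND [[1, 2], [], [3]] AS items
UNWIND items AS x
RETURN x;
// Rows come back for 1, 2 and 3; the row holding [] produces nothing

// Substituting [null] for an empty list keeps that row alive
UNWIND [[1, 2], [], [3]] AS items
UNWIND CASE WHEN size(items) = 0 THEN [null] ELSE items END AS x
RETURN items, x;
```

The second form is a common workaround when the row has to survive the UNWIND, at the cost of having to handle the null afterwards.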
Let's analyze the original query, going by the example input csv:
LOAD CSV WITH HEADERS FROM
"file:///test.csv" AS line
WITH line //redundant, line not needed
MERGE (m:Person {name:line.first_name}) // 4 rows corresponding with 2 nodes
WITH line, CASE line.prop_name WHEN "weight" THEN [1] ELSE [] END as loopW
// still 4 rows, 2 have [1] as loopW, the other 2 have [] as loopW
UNWIND loopW as w // 2 rows eliminated by unwinding empty collections
MATCH (p:Person {name: line.first_name})
SET p.weight = line.prop_value
// only 2 rows remain: 'John,weight,100' and 'Paul,weight,120'
// any further repetitions of WITH line, CASE ... UNWIND for different props will fail and eliminate the remaining 2 rows.

Performance: Creating a relationship in neo4j based on property ID

For test purposes I imported data from another data source into Neo4j.
I imported the data only as nodes. Now I want to add the edges based on the imported ID. Every node has 2 fields:
id: contains the identification as String
from: contains all connections as a String[]
To improve performance I also created an index for the property "id" and an index for the property "from".
First I created both properties as String (the from list as a comma-separated String).
This works, but is really slow:
MATCH (e:Test1),(r:Test2)
WHERE r.from CONTAINS e._id
MERGE (e)-[:HAS]->(r)
Is there a better way?
PS: I also tried storing the from field as a String[]. Then I used the following query:
MATCH (e:Test1),(r:Test2)
WHERE e._id IN r.from
MERGE (e)-[:HAS]->(r)
-> Performance is the same
The problem, in both cases, is that you match every combination of nodes: the Cartesian product. It would be better to split the string on commas into identifiers. For example:
MATCH (T2:Test2)
UNWIND split(T2.from, ",") as id
MATCH (T1:Test1) WHERE T1._id = id
MERGE (T1)-[:HAS]->(T2)
Or, if you keep the identifiers in the array:
MATCH (T2:Test2)
UNWIND T2.from as id
MATCH (T1:Test1) WHERE T1._id = id
MERGE (T1)-[:HAS]->(T2)
And, of course, do not forget about the index.
Actually, at import time, you should be creating the :HAS relationships instead of creating the from property (which forces you to make a wasteful additional query to create the relationships, and leaves you with redundant from properties that you would probably want to delete).
For example, if you are using LOAD CSV to import, and your import file has test2Id and from columns (a string and a string collection, respectively), this import query should create all the nodes and relationships:
LOAD CSV WITH HEADERS FROM "file:///input.csv" AS row
MERGE (t2:Test2 {id: row.test2Id})
WITH row, t2
UNWIND row.from AS t1Id
MERGE (t1:Test1 {id: t1Id})
MERGE (t1)-[:HAS]->(t2);
For better performance, you would want indexes on both :Test1(id) and :Test2(id).
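Note that LOAD CSV returns every field as a string, so a multi-valued from column has to be split before it can be unwound. A sketch under the assumption that the ids inside the from column are separated by semicolons (the separator is hypothetical and must differ from the CSV field delimiter):

```cypher
// Index syntax for Neo4j 4.x and later; 3.x used CREATE INDEX ON :Test1(id)
CREATE INDEX FOR (n:Test1) ON (n.id);
CREATE INDEX FOR (n:Test2) ON (n.id);

// Assumes the from column holds ids separated by ';', e.g. "a1;a2;a3"
LOAD CSV WITH HEADERS FROM "file:///input.csv" AS row
MERGE (t2:Test2 {id: row.test2Id})
WITH row, t2
UNWIND split(row.from, ';') AS t1Id
// trim() guards against stray spaces around the separator
MERGE (t1:Test1 {id: trim(t1Id)})
MERGE (t1)-[:HAS]->(t2);
```

The index statements have to run before the import, as separate statements, for the MERGE lookups to benefit from them.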

How to use cypher to create a relationship between items in an array and another node

I would like to use cypher to create a relationship between items in an array and another node.
The result from this query was a list of empty nodes connected to each other.
MATCH (person:person),(preference:preference)
UNWIND person.preferences AS p
WITH p
WHERE NOT (person)-[:likes]->(preference) AND
p = preference.name CREATE (person)-[r:likes]->(preference)
Where person.preferences contains an array of preference names.
Obviously I am doing something wrong. I am new to neo4j and any help with above would be much appreciated.
Properties are attributes of nodes, while relationships involve one or two nodes. As such, it's not possible to create a relationship between properties of two nodes. You'd need to split the properties into their own collection of nodes, and then create relationships between the respective nodes.
You can do all that in one statement - like so:
create (:Person {name: "John"})-[:LIKES]->(:Preference {food: "ice cream"})
For other people, you don't want to create duplicate Preferences, so you'd look up the preference, create the :Person node, and then create the relationship, like so:
match (preference:Preference {food: "ice cream"})
create (person:Person {name: "Jane"})
create (person)-[:LIKES]->(preference)
The bottom line for your use case is you'll need to split the preference arrays into a set of nodes and then create relationships between the people nodes and your new preference nodes.
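For data that is already loaded, that split could look roughly like the following. This is a sketch that assumes the labels and property names from the question (person, preference, preferences, name) and that preference nodes are identified by name:

```cypher
// Turn each entry of person.preferences into a shared preference node
MATCH (person:person)
UNWIND person.preferences AS prefName
// MERGE reuses an existing preference node with this name or creates one
MERGE (preference:preference {name: prefName})
MERGE (person)-[:likes]->(preference);
```

Because both steps use MERGE, running the statement a second time should not create duplicate nodes or relationships.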
One thing....
MATCH (person:person),(preference:preference)
Creates a Cartesian product (inefficient and causes weird things)
Try this...
// Get all persons
MATCH (person:person)
// unwind preference list, (table is now person | preference0, person | preference1)
UNWIND person.preferences AS p
// For each row, match on preference
MATCH (preference:preference)
// Filter on preference column
WHERE preference.name=p
// MERGE instead of CREATE to "create if doesn't exist"
MERGE (person)-[:likes]->(preference)
RETURN person,preference
If this doesn't work, could you supply your sample data and Neo4j version? (As far as I can tell, your query should technically work)

Add Unique Nodes and relationship between them from Existing Setup in Neo4j

This is in continuation of the problem defined here
Query case-specific nodes in Neo4j
So the situation looks like the image below (please bear with the crappy image).
The blue links denote the [:RELATES_TO] relationships, with the numbers in black boxes denoting the value of the Length property. Similar values also exist for all the other [:RELATES_TO] relationships, which are not shown here.
Now I would like to find and create unique nodes based on the 'Name' property of the Performer nodes. Continuing with the example in the link, there will be only 4 new unique nodes [A,B,C,D]. Let's call them NewUniqueNodes, with name as a property.
Then I would like to query each case in turn. Within each case, I need to query [:RELATES_TO] relationship in increasing order of Length property. For any such pair of nodes (x,y) I need to add a relationship [:FINALRESULT{strength:0}] from NewUniqueNode(Name:x) to NewUniqueNode(name:y) with strength being updated to (strength + value). The value is the number associated with value property of [:RELATES_TO] for the pair of nodes(x,y).
[Example and Expected Output]
In case1, the order of visiting nodes will be
Node(ID:3) to Node(ID:4)
Node(ID:1) to Node(ID:2)
Node(ID:1) to Node(ID:3)
On processing these nodes, the result would be
NewUniqueNode(name:A)-[:FINALRESULT{strength: 1}]-NewUniqueNode(name:D)
NewUniqueNode(name:A)-[:FINALRESULT{strength: 1}]-NewUniqueNode(name:B)
NewUniqueNode(name:B)-[:FINALRESULT{strength: 1}]-NewUniqueNode(name:A)
On processing the full set of cases(case1 + case2 + case3), the result would be something like
NewUniqueNode(name:A)-[:FINALRESULT{strength: 1}]-NewUniqueNode(name:D)
NewUniqueNode(name:A)-[:FINALRESULT{strength: 3}]-NewUniqueNode(name:B)
NewUniqueNode(name:B)-[:FINALRESULT{strength: 2}]-NewUniqueNode(name:A)
NewUniqueNode(name:C)-[:FINALRESULT{strength: 1}]-NewUniqueNode(name:B)
NewUniqueNode(name:A)-[:FINALRESULT{strength: 1}]-NewUniqueNode(name:A)
According to this Neo4j console setup, based on the previous question http://console.neo4j.org/r/vci9yd
I have the following query :
MATCH (n:Performer)
WITH collect(DISTINCT (n.name)) AS names
UNWIND names as name
MERGE (nn:NewUniqueNode {name:name})
WITH DISTINCT names // collapse to a single row so the matches below run once, not once per name
MATCH (c:Case)
MATCH (p1)-[r:RELATES_TO]->(p2)<-[:RELATES]-(c)-[:RELATES]->(p1)
WITH r
ORDER BY r.length
MATCH (nn1:NewUniqueNode {name:startNode(r).name})
MATCH (nn2:NewUniqueNode {name:endNode(r).name})
MERGE (nn1)-[rf:FINAL_RESULT]->(nn2)
SET rf.strength = CASE WHEN rf.strength IS NULL THEN r.value ELSE rf.strength + r.value END
Explanations :
First we match all performer nodes and collect the distinct name values in the names variable.
Secondly, we iterate the names with the UNWIND clause, creating a NewUniqueNode for each name in the names collection
Then we match all cases, within each case we look for the :RELATES_TO relationships that are inside this case and ordering them by the relationship length value
Then for each relationship found, we match the NewUniqueNode corresponding to the startNode name value, and same for the NewUniqueNode corresponding to the endNode name value
Lastly, we merge the :FINAL_RESULT relationship between those two unique nodes, then set its strength property from the :RELATES_TO relationship's value property (adding to any strength already present). I guess you could do the same with ON CREATE and ON MATCH on the MERGE.
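The ON CREATE / ON MATCH variant mentioned in the last step might look like this. It is a standalone sketch with a hard-coded increment of 2 between two example nodes; in the full query the literal would be r.value:

```cypher
// Hypothetical standalone example: add 2 to the strength between A and B
MATCH (a:NewUniqueNode {name: 'A'}), (b:NewUniqueNode {name: 'B'})
MERGE (a)-[rf:FINAL_RESULT]->(b)
  ON CREATE SET rf.strength = 2                // relationship did not exist yet
  ON MATCH SET rf.strength = rf.strength + 2   // relationship already existed
RETURN rf.strength;
```

A shorter equivalent is a single SET rf.strength = coalesce(rf.strength, 0) + 2 after the MERGE, since coalesce() falls back to 0 when the property is still missing.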
