I'm searching for a way to remove every property, of any node in the DB, having a specific value using Cypher.
Context
I have a bulk CSV file exported from a relational table with plenty of NULL values. LOAD CSV imports them as values. Removing them (replacing them with empty '' in the CSV file) resulted in the same issue (properties without values). I tried many (many) Cypher operations to discard the NULL values, but nothing worked.
I can't find anything in the docs or by Googling. Can this be done using only Cypher? It seems to me that it is not (yet) supported.
Thanks.
How about this (when you know the property-names):
MATCH (n:Label)
WHERE n.property = ''
REMOVE n.property;
MATCH (u:User)
WHERE u.age = ''
SET u.age = null;
If you know which columns these are in your import, you can do something like this:
load csv with headers from "" as line
with line, case line.foo when '' then null else line.foo end as foo
create (:User {name:line.name, foo:foo})
It won't create the properties with null.
For numeric values it's easier as toInt() and toFloat() return null on unparseable values like ''.
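As a rough sketch of the numeric case: assuming a hypothetical file users.csv with name and age columns, and a Neo4j version where the function is spelled toInteger() (older releases spell it toInt()), an empty age should simply leave the node without an age property:

```cypher
// users.csv and its column names are placeholders for illustration.
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS line
// toInteger('') yields null, and CREATE does not store null properties.
CREATE (:User {name: line.name, age: toInteger(line.age)});
```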
No, there is no way to do this with Cypher alone. I suppose you have already seen the way to do it via REST. That is the best solution for now.
Related
I am new to Neo4j and I have a relatively complex (but small) database which I have simplified to the following:
The first door has no key, all other doors have keys, the window doesn't require a key. The idea is that if a person has key:'A', I want to see all possible paths they could take.
Here is the code to generate the db
CREATE (r1:room {name:'room1'})-[:DOOR]->(r2:room {name:'room2'})-[:DOOR {key:'A'}]->(r3:room {name:'room3'})
CREATE (r2)-[:DOOR {key:'B'}]->(r4:room {name:'room4'})-[:DOOR {key:'A'}]->(r5:room {name:'room5'})
CREATE (r4)-[:DOOR {key:'C'}]->(r6:room {name:'room6'})
CREATE (r2)-[:WINDOW]->(r4)
Here is the query I have tried, expecting it to return everything except for room6, instead I have an error which means I really don't know how to construct the query.
with {key:'A'} as params
match (n:room {name:'room1'})-[r:DOOR*:WINDOW*]->(m)
where r.key=params.key or not exists(r.key)
return n,m
To be clear, I don't need my query debugged so much as help understanding how to write it correctly.
Thanks!
This should work for you:
WITH {key:'A'} AS params
MATCH p=(n:room {name:'room1'})-[:DOOR|WINDOW*]->(m)
WHERE ALL(r IN RELATIONSHIPS(p) WHERE NOT EXISTS(r.key) OR r.key=params.key)
RETURN n, m
With your sample data, the result is:
╒════════════════╤════════════════╕
│"n"             │"m"             │
╞════════════════╪════════════════╡
│{"name":"room1"}│{"name":"room2"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room3"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room4"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room5"}│
└────────────────┴────────────────┘
I am new to Neo4j and graph databases. I am trying to create a simple graph DB, but my MATCH query is returning null values. How do I avoid nulls or any errors in my coding?
Create DB
CREATE
(d2:CorNode:course:dscor{Cors_Name:'DataScience_Sem2',Start_Date:'January2018',No_Students:[tointeger(10)]})-[:sem2dsrelation{Name:'Semester2',Cor_Name:'Data Science',A_Y:'AY2017-18',Cor_Node:'CMM534'}]->(Rob:FLNode:course:dscor{Name_F:'Rob Lothian', Role:'Leader',Mod_code:'CMM534'}),
(d2)-[:sem2dsrelation{Name:'Semester2', Cor_Name:'Data Science',A_Y:'AY2017-18',Cor_Node:'CMM534'}]->(Eyad:FLNode:course:dscor{Name_F:'Eyad Elyan',Role:'Leader',Mod_code:'CMM534'}),
(d2)-[:sem2dsrelation{Name:'Semester2', Cor_Name:'Data Science',A_Y:'AY2017-18',Cor_Node:'CMM507'}]->(David:FLNode:course:dscor{Name_F:'David Lonie', Role:'Leader', Mod_code:'CMM507'}),
(d2)-[:sem2dsrelation{Name:'Semester2',Cor_Name:'Data Science', A_Y:'AY2017-18',Cor_Node:'CMM535'}]->(Rob),
(d1:CorNode:course:dscor{Cors_Name:'DataScience_Sem1',Start_Date:'Septemer2017',No_Students:[tointeger(25)]})-[:sem1dsrelation{Name:'Semester1',Cor_Name:'Data Science',A_Y:'AY2017-18',Cor_Node:'CMM524'}]->(Rob),
(d1)-[:sem1dsrelation{Name:'Semester1', Cor_Name:'Data Science',A_Y:'AY2017-18',Cor_Node:'CMM531'}]->(Eyad),
(d1)-[:sem1dsrelation{Name:'Semester1', Cor_Name:'Data Science',A_Y:'AY2017-18',Cor_Node:'CMM510'}]->(Ines:FLNode:course:dscor{Name_F:'Ines Arana',Role:'Leader',Mod_code:'CMM510'})
Query DB
MATCH (:dscor)-[ds1:sem1dsrelation{Name:'Semester1',Cor_Name:'Data Science'}]-(FLNode:dscor)
RETURN DISTINCT [FLNode.Name_F,ds1.Cor_Node]
You basically have a typo. In (FLNode:dscor), "FLNode" is being used as a variable name instead of as a label.
To fix this, you need to assign a variable name (say, x) to the node, make sure FLNode is specified to be a label (by putting a : in front of it), and then use x in the result. Like so:
MATCH (:dscor)-[ds1:sem1dsrelation{Name:'Semester1',Cor_Name:'Data Science'}]-(x:FLNode:dscor)
RETURN DISTINCT [x.Name_F,ds1.Cor_Node];
So I've been trying to load a CSV file in which participants rated whom they would get advice from or talk to when they have problems with studying. The table looks something like this:
The letters are just names of the people. As you can see, there are nulls in this table. I'm trying to load this into Neo4j so we can visualise who is choosing whom and whether the relationship is reciprocal. Any ideas? All help is much appreciated!
Using IS NOT NULL can solve your problem.
LOAD CSV WITH HEADERS FROM 'file:///xyz.csv' AS line
WITH line LIMIT 10
RETURN line
Using this, you can see how your data is being loaded (don't forget the LIMIT). Since all values loaded from a CSV are strings, your empty column values will show up as "".
From that you can create your nodes by following the blog I've referenced. Using IS NOT NULL, you can also skip the null values when creating your schema.
Example:
MERGE (n:Person{name:line.Person})-[:CHOSE]-(:Study1{name:line[1]})
MERGE (n)-[:CHOSE]-(:Study2{name:line[2]})
MERGE (n)-[:CHOSE]-(:Study3{name:line[3]})
MERGE (n)-[:CHOSE]-(:Study4{name:line[4]})
MERGE (n)-[:CHOSE]-(:Study5{name:line[5]})
OR you can use
WITH Line[1] as Person, Line[2] as Study1 and so on...
WHERE Study5 IS NOT NULL
MERGE (n:Person{name:line.Person})-[:CHOSE]-(:Study1{name:line[1]})
MERGE (n)-[:CHOSE]-(:Study2{name:line[2]})
MERGE (n)-[:CHOSE]-(:Study3{name:line[3]})
MERGE (n)-[:CHOSE]-(:Study4{name:line[4]})
MERGE (n)-[:CHOSE]-(:Study5{name:line[5]})
For more detail go through this example.
Hope this helps!
I am trying to load a large dataset into neo4j-3 and am looking at the options. I found neo4j-import, but it is for the initial load only, and I have to load around 2M records every week.
I tried loading through the shell but ran into performance issues. I tried the following:
1) Creating constraint upfront.
2) Creating Node and relationships in separate query.
3) Heap space 8G
4) dbms.memory.pagecache 4G
Many times the import just hangs and does nothing for hours.
Edit - CSV load being executed:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
OPTIONAL MATCH (per:Person {UID : "Person."+row.player_cardnum})
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.`Creation Date` = timestamp(), p.`Modification Date` = timestamp() ;
EDIT
On a second look, seems like you're trying to implement some kind of conditional logic to your insert.
It looks like what you're trying to do is figure out if a :Person exists with a UID (derived from some concatenation with row.player_cardnum), and in the case where that :Person doesn't exist and the match fails, MERGE a :Person with the CardNumber given by row.player_cardnum.
If this is your goal, you're ALMOST there with your query. The problem is with your WHERE clause.
Understand that a WHERE clause is linked to the preceding MATCH, OPTIONAL MATCH, or WITH, and only affects that linked clause.
With that WHERE on that OPTIONAL MATCH, per will always be null, but more importantly, your row will still exist, and the following MERGE will ALWAYS take place for all rows in the CSV. This is probably the source of your slowdown, as it's creating new :Person nodes for all rows.
If you're trying to null out the row completely when the OPTIONAL MATCH hits on an existing :Person (so the MERGE won't happen in that case), you'll need to add a WITH clause, and make sure your WHERE clause is applied to it instead of the OPTIONAL MATCH.
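The difference can be seen on a small scale. The sketch below assumes a hypothetical :Thing label with an id property and uses UNWIND in place of the CSV rows; it is only an illustration, not the import query itself.

```cypher
// Variant 1: WHERE belongs to the OPTIONAL MATCH.
// The predicate only restricts what the pattern may bind to, so every
// x is still returned and n is always null.
UNWIND [1, 2, 3] AS x
OPTIONAL MATCH (n:Thing {id: x})
WHERE n IS NULL
RETURN x, n;

// Variant 2: WHERE belongs to the WITH.
// Rows where the OPTIONAL MATCH found a :Thing are dropped, so only the
// x values without an existing node remain.
UNWIND [1, 2, 3] AS x
OPTIONAL MATCH (n:Thing {id: x})
WITH x, n
WHERE n IS NULL
RETURN x;
```

In the second variant, any clause placed after the WHERE (such as a MERGE) would run only for the surviving rows.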
Additionally, make sure that you have either unique constraints or indexes on Person.UID and Person.CardNumber. As for the UID match, I've heard that indexes are not used when there's some kind of string concatenation of the thing you're matching upon, so you may need to assemble it first and pass it in with a WITH.
Your final query would look like this:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
// first build the UID so we can take advantage of the index
WITH row, "Person." + row.player_cardnum AS UID
OPTIONAL MATCH (per:Person {UID : UID})
// the WHERE now applies to the WITH, which will filter out and null out the row when an OPTIONAL MATCH is found
WITH row, per
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.`Creation Date` = timestamp(), p.`Modification Date` = timestamp() ;
I have a column in a csv that looks like this:
I am using this code to test how the splitting of the dates is working:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth
return date_of_birth;
This code block works fine and gives me what I'd expect, which is a collection of three values for each date, or a null if there was no date, e.g.:
[4, 5, 1971]
[0, 0, 2003]
[0, 0, 2005]
. . .
null
null
. . .
My question is, what is this problem with the nulls that are created, and why can't I do a MERGE when there are nulls?
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth, line
MERGE (p:Person {
date_of_birth: date_of_birth
});
This block above gives me the error:
Cannot merge node using null property value for date_of_birth
I have searched around and have only found one other SO question about this error, which has no answer. Other searches didn't help.
I was under the impression that if there isn't a value, then Neo4j simply doesn't create the element.
I figured maybe the node can't be generated since, after all, how can a node be generated if there is no value to generate it from? So, since I know there are no ID's missing, maybe I could MERGE with ID and date, so Neo4j always sees a value.
But this code didn't fare any better (same error message):
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth, line
MERGE (p:Person {
ID: line.ID
,date_of_birth: date_of_birth
});
My next idea is that maybe this error is because I'm trying to split a null value on slashes? Maybe the whole issue is due to the SPLIT.
But alas, same error when simplified to this:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH line
MERGE (p:Person {
subject_person_id: line.subject_person_id
,date_of_birth: line.date_of_birth
});
So I don't really understand the cause of the error. Thanks for looking at this.
EDIT
@stdob-- and @cybersam have both answered with equally excellent responses; if you came here via Google, please consider both of them accepted.
As @cybersam said, MERGE does not work well when a property in the merge pattern is null. So you can merge on the key that is never null and set the other property with ON CREATE and ON MATCH:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
MERGE (p:Person {
subject_person_id: line.subject_person_id
})
ON CREATE SET p.date_of_birth = line.date_of_birth
ON MATCH SET p.date_of_birth = line.date_of_birth
Some Cypher clauses, like MERGE, do not work well with NULL values.
The somewhat tricky workaround for handling this situation with MERGE is to use the FOREACH clause to conditionally perform the MERGE. This query might work for you:
LOAD CSV WITH HEADERS FROM 'file:///..some_csv.csv' AS line
FOREACH (x IN CASE WHEN line.date_of_birth IS NULL THEN [] ELSE [1] END |
MERGE (:Person {date_of_birth: SPLIT(line.date_of_birth, '/')})
);
Another solution that I've been rather fond of is to just tell Cypher to skip rows in which the field of interest is NULL, as follows:
USING PERIODIC COMMIT #
LOAD CSV WITH HEADERS FROM
'file:///.../csv.csv' AS line
WITH line, SPLIT(line.somedatefield, delimiter) AS date
WHERE NOT line.somedatefield IS NULL
[THE REST OF YOUR QUERY INVOLVING THE FIELD]
Or you can use COALESCE(n.property, defaultValue), which returns the first of its arguments that is not null.
Following Vojtech Ruzicka's approach, you can use something like this: your_value: COALESCE(line.your_value, 'default value')
Link to the documentation here, in case you need more information.
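A minimal sketch of the COALESCE approach applied to the import from the question, reusing its column names. The default of 'unknown' and the choice to merge on subject_person_id alone are assumptions made for illustration:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///..some_csv.csv' AS line
// Merge on the key that is never null ...
MERGE (p:Person {subject_person_id: line.subject_person_id})
// ... and fall back to a default when the CSV field is null.
SET p.date_of_birth = coalesce(line.date_of_birth, 'unknown');
```

Whether a placeholder value is preferable to leaving the property unset depends on how the data is queried later; a sentinel such as 'unknown' has to be filtered out explicitly in date comparisons.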