I want to create some new nodes in my database if a certain condition is satisfied:
MATCH (u:User)-[:has]->(a:Account)-[:initiated]->(l:Process) WHERE ID(l)=984
UNWIND [{mobile:123,email:'a#b.com'}, {mobile:456, email:'a1#b1.com'}] as x
OPTIONAL MATCH (u1:User) WHERE u1.mobile = x.mobile OR u1.email = x.email CASE u1 WHEN u1 IS NULL THEN CREATE (u)-[:pending]->(p:Pending {mobile: x.mobile, email: x.email})
ELSE CREATE (u1)-[:pending]->(p:Pending {mobile: x.mobile, email: x.email})
END
I want to check whether any user exists with the given mobile or email. If one exists, I want to create the node (p) attached to their node (u1); otherwise I want to create it attached to my own node, i.e. (u).
Somehow CREATE does not work inside CASE.
Currently the only way to do conditional writes is using the FOREACH/CASE WHEN trick. Based on your condition you create either a 1 element or an empty array and iterate over that using FOREACH, e.g.
...
FOREACH(ignoreMe IN CASE WHEN u1 IS NULL THEN [1] ELSE [] END |
  CREATE (u)-[:pending]->(p:Pending {mobile: x.mobile, email: x.email}))
FOREACH(ignoreMe IN CASE WHEN u1 IS NOT NULL THEN [1] ELSE [] END |
  CREATE (u1)-[:pending]->(p:Pending {mobile: x.mobile, email: x.email}))
See http://www.markhneedham.com/blog/2014/06/17/neo4j-load-csv-handling-conditionals/ for more details.
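In this particular case both branches create the same node and differ only in which user it is attached to, so a possible alternative to the FOREACH trick is to pick the target with coalesce() and issue a single CREATE. This is only a sketch reusing the question's query; the alias target is introduced here for illustration. Note that if several users match, one Pending node is created per match.

```cypher
MATCH (u:User)-[:has]->(a:Account)-[:initiated]->(l:Process) WHERE id(l) = 984
UNWIND [{mobile: 123, email: 'a#b.com'}, {mobile: 456, email: 'a1#b1.com'}] AS x
OPTIONAL MATCH (u1:User) WHERE u1.mobile = x.mobile OR u1.email = x.email
// u1 is null when nobody matched, so coalesce falls back to u
WITH x, coalesce(u1, u) AS target
CREATE (target)-[:pending]->(:Pending {mobile: x.mobile, email: x.email})
```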
Related
It seems like any variable I define outside the WITH and UNWIND clauses is not available after them.
I wonder why this query returns that owner is not defined.
MATCH (owner:User { id: 'b6bba33e-646a-46c2-80a2-4b6a6b5cece9' })
// Create one coaching that has two participants (WITH clause)
CREATE (coaching:Coaching {
id: randomUUID(),
subject: 'Testing coaching 1' ,
notes: 'testing this new feature',
createdAt: DateTime()
})<-[:COACHES]-(owner)
WITH ['e1d8aaef-8db6-4dc1-8f3e-1bce06463d04', '6c20b918-284c-42a9-bc6f-ab4f99a09f2f'] AS participants
UNWIND participants AS p
MATCH (participant:User { id: p })
// Create relationship participant PARTICIPATES in coaching
CREATE (coaching)<-[:PARTICIPATES]-(participant)
RETURN owner, coaching, p
Variable `owner` not defined (line 16, column 8 (offset: 536))
"RETURN owner, coaching, p"
I needed to add the variables to the WITH clause.
...
WITH ['e1d8aaef-8db6-4dc1-8f3e-1bce06463d04', '6c20b918-284c-42a9-bc6f-ab4f99a09f2f'] AS participants, owner, coaching
UNWIND participants AS p
MATCH (participant:User { id: p })
// Create relationship participant PARTICIPATES in coaching
CREATE (coaching)<-[:PARTICIPATES]-(participant)
...
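If listing every variable by hand gets tedious, WITH * should carry over everything currently in scope while still letting you add new aliases. A minimal sketch based on the query above, with the Coaching properties shortened for brevity:

```cypher
MATCH (owner:User {id: 'b6bba33e-646a-46c2-80a2-4b6a6b5cece9'})
CREATE (coaching:Coaching {id: randomUUID(), subject: 'Testing coaching 1'})<-[:COACHES]-(owner)
// WITH * keeps owner and coaching in scope and adds the participants alias
WITH *, ['e1d8aaef-8db6-4dc1-8f3e-1bce06463d04', '6c20b918-284c-42a9-bc6f-ab4f99a09f2f'] AS participants
UNWIND participants AS p
MATCH (participant:User {id: p})
CREATE (coaching)<-[:PARTICIPATES]-(participant)
RETURN owner, coaching, p
```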
I have created the following nodes in neo4j (1 million of them):
CREATE (p:Person { name: 'user1', email: ['user1#gmail.com', 'user1#yahoo.com'] }) RETURN p
CREATE (p:Person { name: 'user2', email: ['user2#gmail.com', 'user2#yahoo.com'] }) RETURN p
...
CREATE (p:Person { name: 'user1000000', email: ['user1000000#gmail.com', 'user1000000#yahoo.com'] }) RETURN p
I have created the following indexes:
CREATE BTREE INDEX i1 FOR (n:Person) ON (n.name)
CREATE BTREE INDEX i2 FOR (n:Person) ON (n.email)
With the above data, the following query takes 2ms to complete and I can concurrently execute about 2800 such queries per second on my desktop.
MATCH (p:Person) WHERE p.name = 'user10' RETURN DISTINCT p.name
But the following query takes 710ms to complete and I can concurrently execute only about 5 such queries per second on my desktop.
MATCH (p:Person) WHERE 'user10#gmail.com' IN p.email RETURN DISTINCT p.name
Is there any way to speed up the second query and also increase the throughput?
Edit 1:
I tried to use separate nodes for email as suggested by @jose_bacoy in his answer.
I created the following nodes:
CREATE (m1:mail { email: 'user1#gmail.com' })
CREATE (m2:mail { email: 'user1#yahoo.com' })
CREATE (p:Person { name: 'user1' })
CREATE (p) - [:attribute] -> (m1)
CREATE (p) - [:attribute] -> (m2)
RETURN p
...
CREATE (m1:mail { email: 'user1000000#gmail.com' })
CREATE (m2:mail { email: 'user1000000#yahoo.com' })
CREATE (p:Person { name: 'user1000000' })
CREATE (p) - [:attribute] -> (m1)
CREATE (p) - [:attribute] -> (m2)
RETURN p
and indexed them as follows:
CREATE BTREE INDEX i1 FOR (n:Person) ON (n.name)
CREATE BTREE INDEX i2 FOR (n:mail) ON (n.email)
The speed is also good. Latency: 4ms, throughput 1850 queries per second.
The problem with this is that the following query performs very badly.
MATCH (p:Person) - [:attribute] -> (m1:mail)
MATCH (p) - [:attribute] -> (m2:mail)
WHERE m1.email = 'user10#gmail.com' OR m2.email = 'user10#yahoo.com'
RETURN DISTINCT p.name
On my desktop, the latency is about 5s and the throughput is less than 1 per second.
Edit 2:
I modified the query as suggested by Charchit Kapoor below. Following is the query I used.
MATCH (p:Person) - [:attribute] -> (m:mail)
WHERE m.email IN ['user10#gmail.com', 'user10#yahoo.com']
RETURN DISTINCT p.name
This has a latency of about 4ms and a throughput of about 2600 queries per second.
Your data model is not aligned with your query. email is a list property on the Person node, and you are searching inside that list. Below is a script that changes the data model from Person.email to a relationship, (Person)-[:HAS_EMAIL]->(Email). The APOC procedure apoc.periodic.iterate divides your Person nodes into batches and runs them in parallel for efficiency. Ensure that you have APOC installed.
The script creates the (Person)-[:HAS_EMAIL]->(Email) relationships and removes the email property from each Person once it has been processed. You can change the batch size (10k per batch) to suit your setup. You will also want a unique index on Email; I will leave it up to you how to create it.
CALL apoc.periodic.iterate(
"MATCH (p:Person) RETURN p as person;",
"WITH person
UNWIND person.email as email
MERGE (e:Email {email: email})
MERGE (person)-[:HAS_EMAIL]->(e)
SET person.email = null;",
{batchSize:10000, parallel:true, retries:3});
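For the unique index on Email mentioned above, one option is a uniqueness constraint, which is backed by an index. The sketch below assumes Neo4j 4.4 or later; the constraint name email_unique is made up for this example. Creating it before running the migration would probably also help the MERGE lookups and guard against duplicate Email nodes when batches run in parallel.

```cypher
// Neo4j 4.4+ syntax; email_unique is a hypothetical name
CREATE CONSTRAINT email_unique IF NOT EXISTS
FOR (e:Email) REQUIRE e.email IS UNIQUE
// On earlier 4.x versions the equivalent form is:
// CREATE CONSTRAINT email_unique IF NOT EXISTS ON (e:Email) ASSERT e.email IS UNIQUE
```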
After doing this and creating the index on Email.email, profiling shows that the BTREE index is being used:
PROFILE MATCH (p:Person) -[:HAS_EMAIL] -> (e:Email)
WHERE e.email = 'user10#gmail.com'
RETURN DISTINCT p.name
BTREE INDEX e:Email(email) WHERE
email = $autostring_0
Previously, the plan showed NodeByLabelScan and a Filter on $autostring_0 IN p.email. Even if you create an index on a list property, it is not used for this kind of lookup.
Your second query can be structured differently, first find all the relevant emails and then find the related users:
MATCH (m1:mail)
WHERE m1.email IN ['user10#gmail.com', 'user10#yahoo.com']
MATCH (p)-[:attribute]->(m1)
RETURN DISTINCT p.name
I tried the following, but it threw this error. I wish to create a new person node only if there isn't a person node in the existing database that has the exact same properties.
org.neo4j.driver.exceptions.ClientException: Invalid input 'R': expected
MERGE (n:Person{id: abc.id})
MERGE (m:Place{place:def.id})
MERGE (o:Thing{id:abcd.id})
WITH n,m,o
OPTIONAL MATCH (n) – [:present_at] -> x with n,m,o, collect (distinct x) as known_place
OPTIONAL MATCH (m) – [:is] -> y with n,m,o, collect (distinct y) as known_thing
FOREACH (a in ( CASE WHEN NOT m IN known_place THEN [1] ELSE [] END ) CREATE (n)-[:present_at] ->(m))
FOREACH (a in ( CASE WHEN NOT o IN known_thing THEN [1] ELSE [] END ) CREATE (m)-[:is] ->(o))
That error was caused by a missing | in each of your FOREACH clauses. For example, this would fix that syntax error:
FOREACH (a in ( CASE WHEN NOT m IN known_place THEN [1] ELSE [] END ) | CREATE (n)-[:present_at] ->(m))
FOREACH (a in ( CASE WHEN NOT o IN known_thing THEN [1] ELSE [] END ) | CREATE (m)-[:is] ->(o))
However, your query would still have numerous other syntax errors.
In fact, the entire query could be refactored to be simpler and more efficient:
WITH {id: 123} AS abc, {id: 234} as def, {id: 345} AS abcd
MERGE (n:Person{id: abc.id})
MERGE (m:Place{place: def.id})
MERGE (o:Thing{id: abcd.id})
FOREACH (a in ( CASE WHEN NOT EXISTS((n)-[:present_at]->(m)) THEN [1] END ) | CREATE (n)-[:present_at]->(m))
FOREACH (a in ( CASE WHEN NOT EXISTS((m)-[:is]->(o)) THEN [1] END ) | CREATE (m)-[:is]->(o))
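Since the goal is simply "create the relationship unless it already exists", the same effect can likely be had without FOREACH at all, by using MERGE on the relationship pattern between the already bound nodes. A minimal sketch using the same placeholder maps as above:

```cypher
WITH {id: 123} AS abc, {id: 234} AS def, {id: 345} AS abcd
MERGE (n:Person {id: abc.id})
MERGE (m:Place {place: def.id})
MERGE (o:Thing {id: abcd.id})
// MERGE between bound nodes creates the relationship only if it is missing
MERGE (n)-[:present_at]->(m)
MERGE (m)-[:is]->(o)
```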
I want to MERGE a node:
MERGE (a: Article {URL: event.URL})
If the node does not exist, I need to do this:
ON CREATE FOREACH( site_name in CASE WHEN event.site_name is not null then [1] ELSE [] END |
MERGE (w: Website { value: event.site_name})
MERGE (w)-[:PUBLISHED]->(a))
// all of the tag creation
FOREACH( tag in CASE WHEN event.tags is not NULL then event.tags else [] END |
Merge (t: Article_Tag {value: tag})
CREATE (a)-[: HAS_ARICLE_TAG {date:event_datetime}]->(t))
I believe that ON CREATE only works with SET, but as shown above, I need to execute multiple statements. Is there a way to create multiple nodes and relationships with an ON CREATE clause?
EDIT: I have tried ON CREATE FOREACH(ignoreme in case when event.article is not null then [1] else [] end |... multiple statements, but this does not get around the SET restriction.
This is the best way to do this: just wrap it in a FOREACH statement
MATCH (a: Article {URL: event.URL})
FOREACH(ignoreme in case when a is not null then [1] else [] end |... Statement here...)
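A related pattern that keeps the MERGE is to let ON CREATE SET a temporary marker property and then drive the FOREACH from that marker. The sketch below is only an illustration: the property name isNew and the literal event map are assumptions standing in for whatever supplies event in the real query, and only the Website part of the question is shown.

```cypher
// Hypothetical event map; in practice this comes from the surrounding query
WITH {URL: 'https://example.com/a', site_name: 'Example'} AS event
MERGE (a:Article {URL: event.URL})
ON CREATE SET a.isNew = true
// Runs only when the Article was just created and a site name is present
FOREACH (ignoreMe IN CASE WHEN a.isNew AND event.site_name IS NOT NULL THEN [1] ELSE [] END |
  MERGE (w:Website {value: event.site_name})
  MERGE (w)-[:PUBLISHED]->(a))
// Clear the marker so later runs treat the node as existing
REMOVE a.isNew
```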
So I am importing a rather robust CSV with tons of information in it. Rather than slicing it up, duplicating a lot of data, and cleansing it, does Neo4j support several WHERE clauses in one import, for instance:
USING PERIODIC COMMIT 1000
LOAD CSV FROM 'file:///registryDump.csv' AS line
WITH line
WHERE line[25] IS NOT NULL
MERGE (u:User {name: line[25]})
ON CREATE SET u.source = "Registry", u.type = "Owner"
Then additionally add another:
WHERE line[12] IS NOT NULL
MERGE (u:User {name: line[12]})
ON CREATE SET u.source = "Registry", u.type = "Steward"
Making a much larger clause?
Use a combination of CASE and FOREACH:
WITH [0,1,null,3] as line
FOREACH(ignoreMe IN CASE WHEN line[0]=0 THEN [1] ELSE [] END |
  MERGE (U:User{name:0})
  ON CREATE SET U.source = "Registry", U.type = "Steward"
)
FOREACH(ignoreMe IN CASE WHEN line[1]<1 THEN [1] ELSE [] END |
  MERGE (U:User{name:1})
)
FOREACH(ignoreMe IN CASE WHEN line[2] IS NOT NULL THEN [1] ELSE [] END |
  MERGE (U:User{name:line[2]})
)
FOREACH(ignoreMe IN CASE WHEN line[3] IS NOT NULL THEN [1] ELSE [] END |
  MERGE (U:User{name:line[3]})
)
Taken from Mark Needham. Neo4j: LOAD CSV – Handling empty columns
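Applied to the import from the question, the same pattern might look roughly like the sketch below. It reuses the file name and column indexes from the question and leaves out USING PERIODIC COMMIT, which would need to be added back (or replaced by whatever batching your Neo4j version supports) for a large file.

```cypher
LOAD CSV FROM 'file:///registryDump.csv' AS line
// Owner column: only merge when the field is present
FOREACH (ignoreMe IN CASE WHEN line[25] IS NOT NULL THEN [1] ELSE [] END |
  MERGE (u:User {name: line[25]})
  ON CREATE SET u.source = "Registry", u.type = "Owner")
// Steward column: handled independently of the owner column
FOREACH (ignoreMe IN CASE WHEN line[12] IS NOT NULL THEN [1] ELSE [] END |
  MERGE (u:User {name: line[12]})
  ON CREATE SET u.source = "Registry", u.type = "Steward")
```

Each FOREACH is evaluated per row, so a row with both columns filled produces both users, and a row with neither produces none.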