Stuck after ~500 inserts

I am inserting nodes and relationships into my neo4j DB (GrapheneDB, but it also happens locally).
After roughly 500 inserts the insert statement hangs.
After a neo4j server restart, the same insert works as usual and I can continue with the next ~500 inserts.
Do you have any clue why it gets stuck?
One insert statement looks like the following:
MERGE (b0:Company{company_id:{b1},universal_name:{b2},company_name:{b3}})
ON CREATE SET b0.funding_total_usd = null
ON MATCH SET b0.funding_total_usd = null
MERGE (b13:Industry{name:{b12}})
MERGE (b0)-[:company_industry]->(b13)
MERGE (b15:Category{name:{b14}})
MERGE (b0)-[:company_category]->(b15)
MERGE (b17:Category{name:{b16}})
MERGE (b0)-[:company_category]->(b17)
MERGE (b19:Category{name:{b18}})
MERGE (b0)-[:company_category]->(b19)
MERGE (b21:Category{name:{b20}})
MERGE (b0)-[:company_category]->(b21)
MERGE (b23:Category{name:{b22}})
MERGE (b0)-[:company_category]->(b23)
MERGE (b25:Category{name:{b24}})
MERGE (b0)-[:company_category]->(b25)
MERGE (b27:Category{name:{b26}})
MERGE (b0)-[:company_category]->(b27)
Indexes are present:
Indexes
ON :Category(name) ONLINE (for uniqueness constraint)
ON :Company(company_id) ONLINE (for uniqueness constraint)
ON :Company(universal_name) ONLINE (for uniqueness constraint)
ON :Industry(name) ONLINE (for uniqueness constraint)
Constraints
ON ( category:Category ) ASSERT category.name IS UNIQUE
ON ( company:Company ) ASSERT company.company_id IS UNIQUE
ON ( company:Company ) ASSERT company.universal_name IS UNIQUE
ON ( industry:Industry ) ASSERT industry.name IS UNIQUE
I use the following PHP code to submit the statement:
$config = \GraphAware\Bolt\Configuration::create()
->withCredentials($user, $pw)
->withTimeout($timeout);
if($ssl) {
$config = $config->withTLSMode(\GraphAware\Bolt\Configuration::TLSMODE_REQUIRED);
}
$driver = \GraphAware\Bolt\GraphDatabase::driver($uri, $config);
$driver->session()->run($query, $binds);
Tested versions: 3.4.12 and 3.5.1
#edit: Added code which is used to submit the statement and neo4j version.

You should be batching your insertions, and you should not be explicitly creating separate variables for each individual node. Instead, see if you can pass a parameter that holds a list of property maps, so that a single statement can process the whole list using UNWIND.
See some of our batching tips and tricks.
Applied to your query, your parameter input per batch could look something like this:
{entries:[{companyId:12345, universalName:'foo', companyName:'bar',
industry:'industry', categories:[{name:'cat1'}, {name:'cat2'},
{name:'cat3'}]}]}
And the query itself per batch execution could look like this:
UNWIND $entries as entry
MERGE (c:Company{company_id:entry.companyId, universal_name:entry.universalName, company_name:entry.companyName})
SET c.funding_total_usd = null
MERGE (industry:Industry{name:entry.industry})
MERGE (c)-[:company_industry]->(industry)
WITH entry, c
UNWIND entry.categories as cat
MERGE (category:Category{name:cat.name})
MERGE (c)-[:company_category]->(category)
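
As a rough sketch of how the per-batch parameter maps could be assembled on the client side, the following Python groups source rows into chunks and shapes each chunk like the `entries` list above. The input row layout, the helper names `to_entry` and `batches`, and the batch size are assumptions made for illustration; they are not part of the original answer or of any driver API.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical input row layout; the output keys mirror the parameter map above.
Row = Dict[str, Any]


def to_entry(row: Row) -> Dict[str, Any]:
    """Shape one source row like a single element of the `entries` list."""
    return {
        "companyId": row["company_id"],
        "universalName": row["universal_name"],
        "companyName": row["company_name"],
        "industry": row["industry"],
        "categories": [{"name": name} for name in row["categories"]],
    }


def batches(rows: List[Row], batch_size: int) -> Iterator[Dict[str, Any]]:
    """Yield one parameter map per batch, shaped as {"entries": [...]}."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {"entries": [to_entry(r) for r in chunk]}


# Made-up sample rows, only to show the resulting shape.
rows = [
    {"company_id": i, "universal_name": f"u{i}", "company_name": f"c{i}",
     "industry": "industry", "categories": ["cat1", "cat2"]}
    for i in range(5)
]
params_per_batch = list(batches(rows, batch_size=2))
```

Each yielded map would then be sent as the parameters of one execution of the UNWIND query, so that one round trip handles a whole batch instead of a single company.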

Related

Why is LOAD WITH HEADERS statement only importing the first row of my dataset?

I am working on a project where I must import a dataset into neo4j. After running the LOAD CSV WITH HEADERS statement, I noticed that it only imported the first row from my file. I then tried the apoc plugin and ran CALL apoc.periodic.iterate, thinking that since my dataset has 16719 rows, the import needed to be processed in batches so it would not fail.
apoc.periodic.iterate attempt:
CALL apoc.periodic.iterate
(
"LOAD CSV WITH HEADERS FROM 'file:///Video_Games_Sales_as_at_22_Dec_2016.csv' as row
WITH row
RETURN row",
"MERGE (g:Game)
ON CREATE SET g.Name = row.Name,
g.Release = row.Release,
g.NASales = row.NASales,
g.EUSales = row.EUSales,
g.JPSales = row.JPSales,
g.OtherSales = row.OtherSales,
g.GlobalSales = row.GlobalSales
MERGE (p:Platform)
ON CREATE SET p.Name = row.Platform
MERGE (c:Genre)
ON CREATE SET c.Type = row.Genre
MERGE (v:Publisher)
ON CREATE SET v.Name = row.Publisher
MERGE (x:Developer)
ON CREATE SET x.Name = row.Developer
MERGE (r:Rating)
ON CREATE SET r.Rating = row.Rating
MERGE (g)-[:ON_PLATFORM]-(p)
MERGE (g)-[:GENRE]-(c)
MERGE (g)-[:PUBLISHEDBY]-(v)
MERGE (g)-[:DEVELOPEDBY]-(x)
MERGE (g)-[:RATED]-(r)",
{batchSize: 10000, iterateList: true}
)
YIELD batches, total
RETURN batches, total;
Even after running this new statement, it only imported the first row and all relationships.
In an attempt to figure out what I am doing wrong, I would like to know if anyone has experienced a similar issue?
With that being said, if you see where I am messing up, please point me in the right direction.

It most likely has to do with the fact that you merge on a bare label, as in
MERGE (g:Game)
A MERGE without properties matches any existing :Game node. Once the first row has created one, every later row matches that same node, and ON CREATE SET never runs again.
Normally you do
MERGE (g:Game {Name: row.Name})
assuming that Name is an identifying property.
Also, make sure that you have a uniqueness constraint on the Name property.
The same applies to all the other node labels that you are using.
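
To make the difference concrete, here is a minimal sketch in Python of the match-or-create behaviour. It is a toy in-memory model, not Neo4j: the `ToyGraph` class and the sample names are invented for illustration, and real MERGE binds every matching node rather than only the first one.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


class ToyGraph:
    """Toy in-memory stand-in for MERGE matching; not Neo4j itself."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []

    def merge(self, label: str, match: Dict[str, Any],
              on_create: Dict[str, Any]) -> Node:
        # A node matches when it has the label and every property in `match`.
        # With an empty `match`, any node with that label matches.
        for node in self.nodes:
            if node["label"] == label and all(
                node["props"].get(k) == v for k, v in match.items()
            ):
                return node  # found: the ON CREATE SET part is skipped
        node = {"label": label, "props": {**match, **on_create}}
        self.nodes.append(node)
        return node


names = ["A", "B", "C"]  # made-up game names

bare = ToyGraph()
for name in names:
    bare.merge("Game", {}, {"Name": name})   # like MERGE (g:Game) ON CREATE SET ...

keyed = ToyGraph()
for name in names:
    keyed.merge("Game", {"Name": name}, {})  # like MERGE (g:Game {Name: row.Name})
```

In this model `bare` ends up holding a single node carrying the first row's name, while `keyed` holds one node per name, which matches the symptom described in the question.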

Merge statement in Cypher

I came across this statement in an Intro to Cypher video:
Ignoring the last MERGE statement, does the MERGE essentially do an INSERT...ON DUPLICATE KEY ? For example:
MERGE (a:Person {name: "Ann"})
ON CREATE SET a.twitter = "#ann"
Would correspond to:
INSERT INTO Person (name) VALUES ("Ann")
ON DUPLICATE KEY SET twitter = "#ann"
And by extension, if there is a MERGE on a node that doesn't already exist does it act as if it is a CREATE keyword?

Yes, that is what MERGE does. Note that it is not limited to just key fields. It takes into account all fields you provide in the MERGE clause. See also https://neo4j.com/docs/cypher-manual/current/clauses/merge/

WITH is required between MERGE and MATCH (line 4, column 1 (offset: 63))

I am trying to use CYPHER to create a simple graph on NEO4J.
Below is the query:
MERGE (nut:asset{name:'nut'})
MERGE (bolt:asset{name:'bolt'})
MATCH (nut:asset)
WITH nut,bolt
MERGE (nut:asset)-[:hasPart]->(washer:asset{name:'washer',domain:'tool'})
It throws an error:
WITH is required between MERGE and MATCH (line 4, column 1 (offset: 63))
"MATCH (nut:asset)"
^
When I try to change my query to
MERGE (nut:asset{name:'nut'})
MERGE (bolt:asset{name:'bolt'})
MERGE (nut:asset)-[:hasPart]->(washer:asset{name:'washer',domain:'tool'})
It says:
Can't create node `nut` with labels or properties here. The variable is already declared in this context
How should I use the MERGE statement in this context? I followed the tutorial from the Neo4j link to construct my query.

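The first error occurs because a MATCH follows the MERGE clauses directly. Cypher requires a WITH between an updating clause such as MERGE and a following reading clause such as MATCH; the WITH carries the variables you still need into the next part of the query.
The second error occurs because nut is already bound by the first MERGE and you declare it again with a label in the last MERGE. Refer to a variable that is already bound as (nut), without labels or properties.
WITH can also be used to reduce the cardinality, and with it the time taken by the query.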
The first query can be written like this:
MERGE (nut:asset{name:'nut'})
WITH nut
MERGE (bolt:asset{name:'bolt'})
WITH nut,bolt
MERGE (nut)-[:hasPart]->(washer:asset{name:'washer',domain:'tool'})
and the second one:
MERGE (nut:asset{name:'nut'})
MERGE (bolt:asset{name:'bolt'})
MERGE (nut)-[:hasPart]->(washer:asset{name:'washer',domain:'tool'})

Cypher 'Node Already Exists' issue with MERGE

I am perplexed as to why I am getting an error with this Cypher statement. I have a unique constraint on the address of the Location node, but I am using a MERGE, which should find the node if it exists and only return the id for the rest of the statement. What am I missing?
Here is my statement:
MERGE(l:Location{location_name:"Starbucks", address:"36350 Van Dyke Ave", city: "Sterling Heights",state: "MI", zip_code:"48312",type:"location",room_number:"",long:-83.028889,lat:42.561152})
CREATE(m:Meetup{meet_date:1455984000,access:"Private",status:"Active",type:"project",did_happen:"",topic:"New features for StudyUup",agenda:"This is a brainstorming session to come with with new ideas for the companion website, StudyUup. Using MatchUup as the base, what should be added, removed, or modified? Bring your thinking caps and ideas!"})
WITH m,l
MATCH (g:Project{title_slug:"studyuup"}) MATCH (p:Person{username:"wkolcz"})
WITH m,l,g,p
MERGE (g)-[:CREATED {rating:0}]->(m)
MERGE (m)-[:MEETUP_AT {rating:0}]->(l)-[:HOSTED_MEETUP]->(m)
MERGE (m)<-[:ATTENDING]-(p)
RETURN id(m) as meeting_id
I am getting:
Node 416 already exists with label Location and property "address"=[36350 Van Dyke Ave]

You've encountered a common misunderstanding of MERGE. MERGE merges on everything you've specified within the single MERGE clause. So the order of operations is:
1. Search for a :Location node with all of the properties you've specified.
2. If found, return the node.
3. If not found, create the node.
Your problem occurs at step 3. Because no node with all of the properties you've specified exists, MERGE tries to create a node with all of those properties. That's when your uniqueness constraint is violated.
The best practice is to merge on the property that you've constrained to be unique and then use SET to update the other properties. In your case:
MERGE (l:Location {address:"36350 Van Dyke Ave"})
SET l.location_name = "Starbucks",
l.city = "Sterling Heights"
...
The same logic is going to apply for the relationships you're merging later in the query. If the entire pattern doesn't exist, it's going to try to create the entire pattern. That's why you should stick to the best practice of:
MERGE (node1:Label1 {unique_property: "value"})
MERGE (node2:Label2 {unique_property: "value"})
MERGE (node1)-[:REL]-(node2)
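
The search-then-create order described in this answer can be imitated with a small in-memory model. The Python below is only a sketch under stated assumptions: `ToyStore` and `ConstraintViolation` are hypothetical names, the store knows a single uniqueness constraint, and none of this is Neo4j's actual implementation.

```python
from typing import Any, Dict, List


class ConstraintViolation(Exception):
    """Raised when a create would duplicate a unique property value."""


class ToyStore:
    """Toy model of MERGE plus one uniqueness constraint; not Neo4j itself."""

    def __init__(self, unique_key: str) -> None:
        self.unique_key = unique_key
        self.nodes: List[Dict[str, Any]] = []

    def merge(self, props: Dict[str, Any]) -> Dict[str, Any]:
        # Steps 1 and 2: look for a node carrying every given property.
        for node in self.nodes:
            if all(node.get(k) == v for k, v in props.items()):
                return node
        # Step 3: nothing matched, so create; the constraint is checked here.
        key = props.get(self.unique_key)
        if key is not None and any(
            n.get(self.unique_key) == key for n in self.nodes
        ):
            raise ConstraintViolation(f"{self.unique_key}={key!r} already exists")
        node = dict(props)
        self.nodes.append(node)
        return node


store = ToyStore(unique_key="address")
store.merge({"address": "36350 Van Dyke Ave", "location_name": "Starbucks"})

# Merging on a full property map that differs in one value reaches step 3.
try:
    store.merge({"address": "36350 Van Dyke Ave",
                 "location_name": "Starbucks", "room_number": ""})
    failed = False
except ConstraintViolation:
    failed = True

# Merging on the constrained property alone matches; then update, like SET.
node = store.merge({"address": "36350 Van Dyke Ave"})
node["room_number"] = ""
```

In this model the full-map merge fails because the existing node has no room_number, so nothing matches and the create collides with the stored address. Merging on address alone finds the node, and the remaining properties are applied afterwards.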

Is it better to make one MERGE request for multiple nodes / edges creation in Neo4J Cypher 2.0 or to split it into transactions?

I have a long Cypher query (the new Neo4J 2.0 version), which creates multiple nodes and connections using the MERGE command.
The question is: do you think I'm better off splitting it into different parts and submitting it as a transaction (for robustness) or should I keep the long single one (for speed)?
Here's the query:
MATCH (u:User {name: "User"}) MERGE (tag1:Hashtag {name:"tag1"}) MERGE (tag2:Hashtag
{name:"tag2"}) MERGE (tag3:Hashtag {name:"tag3"}) MERGE (tag4:Hashtag {name:"tag4"})
MERGE tag1-[:BY]->u MERGE tag2-[:BY]->u MERGE tag3-[:BY]->u MERGE tag4-[:BY]->u;
(I purposefully made the request shorter, imagine that there are like 50 tags (nodes) and even more edges for example)

As long as your query statement is not hundreds of lines and your data created doesn't exceed 50k elements, I'd stick with one query.
But you should use parameters instead.
I would also rewrite your query with FOREACH and parameters:
MATCH (u:User {name: {userName}})
FOREACH (tagName in {tags} |
MERGE (tag:Hashtag {name:tagName})
MERGE (tag)-[:BY]->(u)
)
params:
{userName:"User", tags: ["tag1",...,"tagN"]}
