Multiple MATCH statements in Neo4j

I have a list of MATCH statements which are totally unrelated to each other, for example:
MATCH (a:Person),(b:InProceedings) WHERE a.identifier = 'person/joseph-valeri' and b.identifier = 'conference/edm2008/paper/209' CREATE (a)-[r:creator]->(b)
MATCH (a:Person),(b:InProceedings) WHERE a.identifier = 'person/nell-duke' and b.identifier = 'conference/edm2008/paper/209' CREATE (a)-[r:creator]->(b)
But if I execute them at once I get the following error:
WITH is required between CREATE and MATCH (line 2, column 1)
What changes should I incorporate?
(I am new to Neo4j)

Does this need to happen in a single transaction? If so, you should match your nodes up front before performing the create:
MATCH (jo:Person{identifier:'person/joseph-valeri'}), (nell:Person{identifier:'person/nell-duke'}), (b:InProceedings{identifier:'conference/edm2008/paper/209'})
CREATE (jo)-[:creator]->(b), (nell)-[:creator]->(b)
If it's just the two creators you could change the create to:
CREATE (jo)-[:creator]->(b)<-[:creator]-(nell)
If this isn't what you want to achieve then effectively what you have posted is two distinct Cypher statements that you are trying to run as one, and the parser is getting confused.
Edit (after your comment):
Given that you said millions, I think you will find the transaction time for the import prohibitive. If you can write to CSV instead of to the big Cypher dump, you should investigate the CSV import syntax (LOAD CSV), paying particular attention to PERIODIC COMMIT.
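If the Python script can emit CSV rather than Cypher, the writing side might look like the sketch below. This is only an illustration under assumptions: the function name `creators_to_csv`, the column names `person` and `paper`, and the file name `creators.csv` in the comment are all made up here, and the Cypher in the comment applies only to Neo4j versions that support USING PERIODIC COMMIT.

```python
import csv
import io


def creators_to_csv(pairs):
    """Serialise (person_identifier, paper_identifier) pairs as CSV text.

    The header names "person" and "paper" are hypothetical; they only need
    to match whatever the LOAD CSV query refers to.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["person", "paper"])
    writer.writerows(pairs)
    return buffer.getvalue()


# A query along these lines could then consume the file (assumed file name):
#   USING PERIODIC COMMIT 50000
#   LOAD CSV WITH HEADERS FROM 'file:///creators.csv' AS row
#   MATCH (a:Person {identifier: row.person}),
#         (b:InProceedings {identifier: row.paper})
#   CREATE (a)-[:creator]->(b)

csv_text = creators_to_csv([
    ("person/joseph-valeri", "conference/edm2008/paper/209"),
    ("person/nell-duke", "conference/edm2008/paper/209"),
])
print(csv_text)
```

Using the csv module rather than string concatenation means identifiers containing commas or quotes are escaped correctly.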
If for some reason that is not an option and you are starting from empty, then build slowly, creating nodes first. These nodes are going to need variable names to keep the speed up (the names aren't persisted; they exist only within your Cypher query):
CREATE (a:Person{identifier:'person/joseph-valeri'}),
(b:Person{identifier:'person/nell-duke'}),
(zzz:Person{identifier:'person/do-you-really-want-person-in-all-these-identifiers'}),
(inProca:InProceedings{identifier:'conference/edm2008/paper/209'}),
(inProcb:InProceedings{identifier:'conference/edm2009/paper/209'})
You will have kept track of a, b .. zzz in your Python script, allowing you to build the CREATE statement up with:
(a)-[:creator]->(inProca), (zzz)-[:creator]->(inProcb)
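The bookkeeping described here could be sketched in Python roughly as follows. This is not the actual script from the question: the function name `build_create_statement`, the generated variable names `n0`, `n1`, ... and the input shapes are assumptions made for illustration. Values are inlined into the query text only to mirror the dump-style approach, so identifiers containing quotes would need escaping.

```python
def build_create_statement(nodes, relationships):
    """Build one CREATE statement for new nodes plus relationships between them.

    nodes: list of (label, identifier) tuples
    relationships: list of (from_identifier, rel_type, to_identifier) tuples
    The variable names (n0, n1, ...) are hypothetical and live only inside
    the query text; they are not stored in the database.
    """
    var_for = {}   # identifier -> Cypher variable name
    patterns = []
    for i, (label, identifier) in enumerate(nodes):
        var = f"n{i}"
        var_for[identifier] = var
        patterns.append(f"({var}:{label} {{identifier: '{identifier}'}})")
    for src, rel_type, dst in relationships:
        # CREATE requires a direction on every relationship.
        patterns.append(f"({var_for[src]})-[:{rel_type}]->({var_for[dst]})")
    return "CREATE " + ",\n       ".join(patterns)


statement = build_create_statement(
    [("Person", "person/joseph-valeri"),
     ("InProceedings", "conference/edm2008/paper/209")],
    [("person/joseph-valeri", "creator", "conference/edm2008/paper/209")],
)
print(statement)
```

Looking variables up by identifier keeps the relationship patterns consistent with the node patterns, including letter case, which matters because Cypher variable names are case-sensitive.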
Now if all of your nodes already exist and you just want to build the relationships in now, then you have the choice of:
Performing individual MATCH and CREATEs for each new relationship, executing them each individually. This looks like what your original code was doing. You should move the conditions into the MATCH rather than the WHERE clause.
MATCHing a large set of nodes and CREATEing new relationships. This is more akin to what my initial code was doing and will require your script to be smart in generating the queries.
MERGEing existing nodes into new relationships.
Whatever you do, you're going to need to batch the writes within the transaction or you're going to run out of memory. You can advise Neo4j to do this by using the USING PERIODIC COMMIT 50000 syntax.
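When the import is driven from a script rather than LOAD CSV, a similar effect can be approximated by splitting the rows into bounded chunks and committing one transaction per chunk. A minimal sketch, assuming the rows are any Python iterable; the helper name `batches` is hypothetical and the default of 50000 simply mirrors the figure above.

```python
def batches(rows, size=50000):
    """Yield successive lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


for chunk in batches(range(5), size=2):
    print(chunk)
```

Each chunk would then be sent in its own transaction, so memory use is bounded by the batch size rather than by the total number of rows.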

Related

How to make relationships between already created nodes of different columns from a single csv file?

I have a single csv file whose contents are as follows -
id,name,country,level
1,jon,USA,international
2,don,USA,national
3,ron,USA,local
4,bon,IND,national
5,kon,IND,national
6,jen,IND,local
7,ken,IND,international
8,ben,GB,local
9,den,GB,international
10,lin,GB,national
11,min,AU,national
12,win,AU,local
13,kin,AU,international
14,bin,AU,international
15,nin,CN,national
16,con,CN,local
17,eon,CN,international
18,fon,CN,international
19,pon,SZN,national
20,zon,SZN,international
First of all I created a constraint on id
CREATE CONSTRAINT idConstraint ON (n:Name) ASSERT n.id IS UNIQUE
Then I created nodes for name, then for country and finally for level as follows -
LOAD CSV WITH HEADERS FROM "file:///demo.csv" AS row
MERGE (name:Name {name: row.name, id: row.id, country:row.country, level:row.level})
MERGE (country:Country {name: row.country})
MERGE (level:Level {type: row.level})
I can see the nodes fine. However, I want to be able to query for things like, for a given country how many names are there? For a given level, how many countries and then how many names for that country are there?
So for that I need to make Relationships between the nodes.
For that I tried like this -
LOAD CSV WITH HEADERS FROM "file:///demo.csv" AS row
MATCH (n:Name {name:row.name}), (c:Country {name:row.country})
CREATE (n)-[:LIVES_IN]->(c)
RETURN n,c
However this gives me a warning as follows -
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (c))
Moreover the resulting Graph looks slightly wrong - each Name node has 2 relations with a country whereas I would think there would be only one?
I also have a nagging fear that I am not doing things in an optimized or correct way. This is just a demo. In my real dataset, I often cannot run multiple CREATE or MERGE statements together. I have to LOAD the same CSV file again and again to do pretty much everything from creating nodes. When creating relationships, because a cartesian product forms, the command basically gives Java Heap Memory error.
PS. I just started with neo4j yesterday. I really don't know much about it. I have been struggling with this for a whole day, hence thought of asking here.
You can ignore the cartesian product warning, since that exact approach is needed in order to create the relationships that form the patterns you need.
As for the multiple relationships, it's possible you may have run the query twice. The second run would have created the duplicate relationships. You could use MERGE instead of CREATE for the relationships, that would ensure that there would be no duplicates.

Creating relationships between nodes in neo4j is extremely slow

I'm using a python script to generate and execute queries loaded from data in a CSV file. I've got a substantial amount of data that needs to be imported so speed is very important.
The problem I'm having is that merging between two nodes takes a very long time, and including the cypher to create the relations between the nodes causes a query to take around 3 seconds (for a query which takes around 100ms without).
Here's a small bit of the query I'm trying to execute:
MERGE (s0:Chemical{`name`: "10074-g5"})
SET s0.`name`="10074-g5"
MERGE (y0:Gene{`gene-id`: "4149"})
SET y0.`name`="MAX"
SET y0.`gene-id`="4149"
MERGE (s0)-[:INTERACTS_WITH]->(y0)
MERGE (s1:Chemical{`name`: "10074-g5"})
SET s1.`name`="10074-g5"
MERGE (y1:Gene{`gene-id`: "4149"})
SET y1.`name`="MAX"
SET y1.`gene-id`="4149"
MERGE (s1)-[:INTERACTS_WITH]->(y1)
Any suggestions on why this is running so slowly? I've got indexes set up on Chemical->name and Gene->gene-id, so I honestly don't understand why this runs so slowly.
Most of your SET clauses are just setting properties to the same values they already have (as guaranteed by the preceding MERGE clauses).
The remaining SET clauses probably only need to be executed if the MERGE had created a new node. So, they should probably be preceded by ON CREATE.
You should never generate a long sequence of almost identical Cypher code. Instead, your Cypher code should use parameters, and you should pass your data as parameter(s).
You said you have a :Gene(id) index, whereas your code actually requires a :Gene(gene-id) index.
Below is sample Cypher code that uses the dataList parameter (a list of maps containing the desired property values), which fixes most of the above issues. The UNWIND clause just "unwinds" the list into individual maps.
UNWIND $dataList AS d
MERGE (s:Chemical{name: d.sName})
MERGE (y:Gene{`gene-id`: d.yId})
ON CREATE SET y.name=d.yName
MERGE (s)-[:INTERACTS_WITH]->(y)
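On the Python side, building and passing dataList might look like the sketch below. It is an illustration under assumptions: the CSV column names (`chemical`, `gene_id`, `gene_name`) and the helper names are invented here, and `session` is assumed to be an open session from the official neo4j Python driver, whose run() accepts query parameters as keyword arguments.

```python
import csv
import io

QUERY = """
UNWIND $dataList AS d
MERGE (s:Chemical {name: d.sName})
MERGE (y:Gene {`gene-id`: d.yId})
  ON CREATE SET y.name = d.yName
MERGE (s)-[:INTERACTS_WITH]->(y)
"""


def build_data_list(csv_text):
    """Turn CSV rows into the list of maps expected by $dataList.

    Exact repeats are dropped so the same row is not sent twice.
    Column names are hypothetical.
    """
    seen = set()
    data_list = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["chemical"], row["gene_id"], row["gene_name"])
        if key in seen:
            continue
        seen.add(key)
        data_list.append(
            {"sName": row["chemical"], "yId": row["gene_id"], "yName": row["gene_name"]}
        )
    return data_list


def run_import(session, data_list):
    # One query text, many rows: the data travels as a parameter
    # instead of being pasted into thousands of generated statements.
    return session.run(QUERY, dataList=data_list)
```

Because the query text never changes, the database can reuse its plan across calls, which is the main point of parameterising.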

Creating/Managing millions of Vertex Tree in Neo4j 3.0.4

I'm doing some stuff with my University and I've been asked to create a system that builds Complete Trees with millions of nodes (1 or 2 million at least).
I was trying to create the Tree with a Load CSV Using a periodic commit and it worked well with the creation of just Nodes (70000 ms on a general purpose Notebook :P ). When I tried the same with the Edges, it didn't scale as well.
Using periodic commit LOAD CSV WITH HEADERS FROM 'file:///Archi.csv' AS line
Merge (:Vertex {name:line.from})<-[:EDGE {attr1: toFloat(line.attr1), attr2:toFloat(line.attr2), attr3: toFloat(line.attr3), attr4: toFloat(line.attr4), attr5: toFloat(line.attr5)}]-(:Vertex {name:line.to})
I need to guarantee that a Tree is generated in no more than 5 minutes.
Is there a faster method that can deliver such performance?
P.S. : The task doesn't expect to use Neo4j, but just a Database (either SQL or NoSQL), but I found out this NoSQL Graph DB and I thought would be nice to implement with Neo4j as the graph data structure is given for free.
P.P.S : I'm using Cypher
I think you should read up on MERGE in the developer documentation again, to make sure you understand exactly what it's doing.
A few things in particular to be aware of...
If the pattern you are merging does not exist as a whole, every element of the pattern will be created, which could result in duplicate :Vertex nodes. If your :Vertex nodes are supposed to be in the database already, there are no relationships yet, and you are sure that no relationship repeats itself in your CSV, I strongly urge you to MATCH on the start and end nodes and then CREATE the relationship between them instead of using MERGE. Remember that a MERGE on a relationship with many attributes will first try to match on all of them, so as the number of relationships between nodes grows, there will be an increasing number of comparisons, which will slow your query down further. CREATE is a better choice if you know that no relationship will be duplicated and you are sure those relationships don't exist yet.
I also urge you to create an index on :Vertex(name), as that will significantly help matching on end nodes.
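The advice to use CREATE rather than MERGE depends on no relationship repeating in the CSV, and that precondition can be checked before loading. A minimal sketch, assuming the rows have already been read as dicts (for example with csv.DictReader) and carry the `from` and `to` columns used in the query above; the helper name `repeated_edges` is hypothetical.

```python
from collections import Counter


def repeated_edges(rows):
    """Return the (from, to) pairs that occur more than once in `rows`."""
    counts = Counter((row["from"], row["to"]) for row in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


rows = [
    {"from": "a", "to": "b"},
    {"from": "a", "to": "c"},
    {"from": "a", "to": "b"},
]
print(repeated_edges(rows))
```

An empty result suggests the MATCH-then-CREATE form is safe with respect to duplicates within the file; it says nothing about relationships that already exist in the database.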

Using Excel to generate significant numbers of Cypher statements for Neo4j data loading

I will apologise right at the outset as I am sure my question is elementary! I am not a database man, but I have an idea and only a graph database is going to do it, so I am learning right from the very beginning. I am using Neo4j 2.3 and building the blocks of my structure in org charts, which I then convert into Excel. I am comfortable with Excel; I am an engineer!
I use CONCATENATE within Excel to build my Cypher statements and generating the nodes works perfectly, so far so good.
I then used the same technique to build the Cypher statements for the relationships and when I trialled it using a single Cypher Statement the relationship loads perfectly but when I try a set of statements I get a message saying that I need WITH between MATCH and MERGE.
I have read up the stuff about WITH and I can see that I am mixing read and write statements without separating them properly, I can also see that aliasing comes into it - but for the life of me I can't see how to deal with it!
The first sheet looks like this and this generates the nodes nicely:
The second sheet - for the relationships, looks like this:
Any help at all would be much appreciated!
Each of the statements your second sheet generates could be executed independently, since you don't reference any of the aliases from previous lines.
Or you could add a WITH to the end of each statement, clearing out the aliases in scope:
MATCH (a1 {id: 470}), (b1 {id: 48}) MERGE (a1)-[:HAS_ROD_ASSY]->(b1) WITH NULL AS _
MATCH (a2 {id: 463}), (b2 {id: 584}) MERGE (a2)-[:ROD_FEATURES]->(b2) WITH NULL AS _
...
LOAD CSV
However, you might find the LOAD CSV functionality in Cypher easier to work with.
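The first option mentioned above, running each generated statement on its own, could be scripted along these lines. This is a sketch under assumptions: it presumes the spreadsheet output has exactly one statement per line, the helper names are hypothetical, and `session` is assumed to be an open session from the neo4j Python driver.

```python
def split_statements(generated_text):
    """Return one Cypher statement per non-empty line of the generated text."""
    statements = []
    for line in generated_text.splitlines():
        # Drop surrounding whitespace and an optional trailing semicolon.
        line = line.strip().rstrip(";").strip()
        if line:
            statements.append(line)
    return statements


def run_each(session, statements):
    # Each statement runs on its own, so variables such as a1 or b1
    # never carry over from one statement to the next and no WITH is needed.
    for statement in statements:
        session.run(statement)
```

Running statements one by one is simple but slow for large sheets, which is where LOAD CSV becomes the more practical route.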

Creating nodes and relationships at the same time in neo4j

I am trying to build a database in Neo4j with a structure that contains seven different types of nodes, in total around 4-5000 nodes with around 40000 relationships between them. With the Cypher code I am currently using, I first create the nodes like this:
Create (node1:type {name:'example1', type:'example2'})
Around 4000 of that example with unique nodes.
Then I've got relationships stated as such:
Create
(node1)-[:r]-(node51),
(node2)-[:r]-(node5),
(node3)-[:r]-(node2);
Around 40000 of such unique relationships.
With smaller scale graphs this has not been any problem at all. But with this one, the "Executing query" message never stops loading.
Any suggestions on how I can make this type of query work? Or what I should do instead?
Edit: What I'm trying to build is a big graph over a product, with its releases, release versions, features etc., in the same way as the Movie graph example is built.
The product has about 6 releases in total, and each release has around 20 release versions. In total there are 371 features, and for these 371 features there are also 438 feature versions. Every release version (120 in total) then has around 2-300 feature versions each. These feature versions are mapped to their Feature, which has dependencies on a little bit of everything in the DB. I have also involved HW dependencies, such as the possible HW to run these features on, releases on, etc. So basically I'm using Cypher code such as:
Create (Product1:Product {name:'ABC', type:'Product'})
Create (Release1:Release {name:'12A', type:'Release'})
Create (Release2:Release {name:'13A', type:'release'})
Create (ReleaseVersion1:ReleaseVersion {name:'12.0.1', type:'ReleaseVersion'})
Create (ReleaseVersion2:ReleaseVersion {name:'12.0.2', type:'ReleaseVersion'})
and below those I've structured them using
Create (Product1)<-[:Is_Version_Of]-(Release1),
(Product1)<-[:Is_Version_Of]-(Release2),
(Release2)<-[:Is_Version_Of]-(ReleaseVersion21),
All the way down to features, and then I've also added dependencies between them such as:
(Feature1)-[:Requires]->(Feature239),
(Feature239)-[:Requires]->(Feature51);
Since I had to find all this information from many different Excel sheets etc., I wrote the code this way thinking I could just put it together in one mass Cypher query and run it in the browser on localhost. It worked really well as long as I did not use more than 4-5000 queries at a time; it then created the entire database in about 5-10 seconds at most. But now that I'm trying to run around 45000 queries at the same time, it has been running for almost 24 hours and is still loading and saying "executing query...". I wonder if there is any way I can improve the time it takes. Will the database eventually be created? Or can I add some smarter indexes or do other things to improve the performance? Because of the way my Cypher is written now, I cannot divide it into pieces, since everything in the database has some sort of connection to the product. Do I need to rewrite the code, or is there a smooth way around this?
You can create multiple nodes and relationships interlinked with a single create statement, like this:
create (a { name: "foo" })-[:HELLO]->(b {name : "bar"}),
(c {name: "Baz"})-[:GOODBYE]->(d {name:"Quux"});
So that's one approach, rather than creating each node individually with a single statement, then each relationship with a single statement.
You can also create multiple relationships from objects by matching first, then creating:
match (a {name: "foo"}), (d {name:"Quux"}) create (a)-[:BLAH]->(d);
Of course you could have multiple match clauses, and multiple create clauses there.
You might try to match a given type of node, and then create all necessary relationships from that type of node. You have enough relationships that this is going to take many queries. Make sure you've indexed the property you're using to match the nodes. As your DB gets big, that's going to be important to permit fast lookup of things you're trying to create new relationships off of.
You haven't specified which query you're running that isn't "stopping loading". Update your question with specifics, and let us know what you've tried, and maybe it's possible to help.
If you have one of the nodes already created then a simple approach would be:
MATCH (n: user {uid: "1"}) CREATE (n) -[r: posted]-> (p: post {pid: "42", title: "Good Night", msg: "Have a nice and peaceful sleep.", author: n.uid});
Here the user node already exists and you have created a new relation and a new post node.
Another interesting approach might be to generate your statements directly in Excel, see http://blog.bruggen.com/2013/05/reloading-my-beergraph-using-in-graph.html?view=sidebar for an example. You can run a lot of CREATE statements in one transaction, so this should not be overly complicated.
If you're able to use the Neo4j 2.1 prerelease milestones, then you should try using the new LOAD CSV and PERIODIC COMMIT features. They are designed for just this kind of use case.
LOAD CSV allows you to describe the structure of your data with one or more Cypher patterns, while providing the values in CSV to avoid duplication.
PERIODIC COMMIT can help make large imports more reliable and also improve performance by reducing the amount of memory that is needed.
It is possible to use a single Cypher query to create a new node as well as relate it to an existing one.
As an example, assume you're starting with:
an existing "One" node which has an "id" property "1"
And your goal is to:
create a second node, let's call that "Two", and it should have a property id:"2"
relate the two nodes together
You could achieve that goal using a single Cypher query like this:
MATCH (one:One {id:'1'})
CREATE (one) -[:RELATED_TO]-> (two:Two {id:'2'})
