I'm using LOAD CSV for some tests and ran into a problem: how can I create an intermediate node with CREATE only if one doesn't already exist?
For example:
I have (p:Person)-[:HAS_BANCOMAT]->(bm:Bancomat)-[:FROM]->(b:Bank). In my model, a Person can have only one Bancomat, so if my CSV file contains people who have more than one Bancomat, I want to persist only the first occurrence.
I've ended up with this script:
LOAD CSV WITH HEADERS FROM 'file:///myfile.csv' AS row
WITH row.name as name, row.bank as bank, row.id as bancomat_id
//OMITTING CREATING PERSONS p AND BANKS b PART
//JUST GOING ON WHAT IS NOT WORKING
WHERE size((p)-[:HAS_BANCOMAT]->(:Bancomat)-[:FROM]->(b)) = 0
CREATE (bm:Bancomat {bancomatId: bancomat_id})
CREATE (p)-[:HAS_BANCOMAT]->(bm)
CREATE (bm)-[:FROM]->(b)
The size() part is not working. I've also tried NOT on the (p)-[:HAS_BANCOMAT]->(:Bancomat)-[:FROM]->(b) path, but that doesn't work either.
MERGE is not the solution I'm looking for.
Any suggestions about what I'm doing wrong?
EDIT1: It's not working because the script still creates more than one linked Bancomat when a person has more than one in the file.
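The intended rule, keeping only the first Bancomat seen for each person, can be illustrated outside Cypher. The following is a minimal Python sketch of that rule applied to the CSV rows before import, not a fix for the Cypher query itself. The sample data and the helper name `first_bancomat_per_person` are hypothetical; the column names `name`, `bank` and `id` are taken from the script above.

```python
import csv
import io


def first_bancomat_per_person(rows):
    """Return only the first row seen for each person name."""
    seen = set()
    kept = []
    for row in rows:
        if row["name"] in seen:
            continue  # this person already has a Bancomat; skip later rows
        seen.add(row["name"])
        kept.append(row)
    return kept


# Hypothetical sample data; Alice appears twice with different Bancomats.
SAMPLE_CSV = """name,bank,id
Alice,BankA,1
Alice,BankB,2
Bob,BankA,3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept_rows = first_bancomat_per_person(rows)
```

With the filtered rows, a plain CREATE per row would then produce at most one Bancomat per person.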
To start with Neo4j (4.2.3) I loaded a year's worth of flights data (7m rows) and wanted to try and model a flight as a relationship between origin and destination airport. However the following query just eats up memory and has not finished after two days, so something is clearly amiss:
MATCH (f:Flight), (dest:Airport), (orig:Airport)
WHERE f.Dest = dest.IATA_Code AND f.Origin = orig.IATA_Code
CREATE (orig)-[r:FlightTo {DeptDateTime:f.DepDT, ArriveDateTime:f.ArrDT, Flight:f.Name}]->(dest)
I can do this instead:
LOAD CSV WITH HEADERS FROM 'file:///flights.csv' AS row
MERGE (o:Org_Airport {Org_IATA:row.Origin})
MERGE (d:Dest_Airport {Dest_IATA:row.Dest})
CREATE (o)-[r:FlightTo {DeptDateTime:row.DepDT, ArriveDateTime:row.ArrDT, Flight:row.Name}]->(d)
While this has the advantage of working (and in a reasonable time), it feels ugly to essentially duplicate the airports and to go through the CSV file again when all the required data is already in the database.
My graph thinking probably isn't quite there yet, so I'd appreciate guidance on the best way to add a relationship like this, keeping in mind that the original load files might get lost.
Do you have indexes set? Looking at your first query, you'd need:
CREATE INDEX ON :Flight(Dest);
CREATE INDEX ON :Airport(IATA_Code);
If you don't have indexes or constraints on the label/property, the lookup/MERGE will be very slow.
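To see why the index matters, compare scanning every airport for every flight with a single keyed lookup per flight. The Python sketch below only models the idea with a dictionary standing in for the index. It is not how Neo4j implements index lookups, and the sample records are hypothetical; the property names `IATA_Code`, `Origin`, `Dest` and `Name` follow the query in the question.

```python
def link_flights(flights, airports):
    """Pair each flight with its origin and destination airport codes."""
    # Build the "index" once: IATA code -> airport record.
    by_code = {airport["IATA_Code"]: airport for airport in airports}
    links = []
    for flight in flights:
        # One keyed lookup per endpoint instead of a scan over all airports.
        orig = by_code.get(flight["Origin"])
        dest = by_code.get(flight["Dest"])
        if orig is None or dest is None:
            continue  # no matching airport, so no relationship is created
        links.append((orig["IATA_Code"], flight["Name"], dest["IATA_Code"]))
    return links


# Hypothetical sample data.
airports = [{"IATA_Code": "JFK"}, {"IATA_Code": "LAX"}]
flights = [
    {"Name": "AA1", "Origin": "JFK", "Dest": "LAX"},
    {"Name": "XX9", "Origin": "JFK", "Dest": "ZZZ"},  # unknown destination
]
links = link_flights(flights, airports)
```

Without the dictionary, each flight would have to be compared against every pair of airports, which is roughly what the unindexed three-way MATCH does.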
I created a simple csv that has some boxing matches. I'm trying to figure out how to model this in Neo4j.
The csv looks like this:
My interest in practicing using this small dataset in Neo4j was because it seems like Neo4j would be a good way to easily query who fought who, and who had common opponents, or whatever.
My first thought was that naturally, each boxer should be represented in a 'boxer' node, and each fight should be represented in a 'fight' node.
After modeling it as such, I realized, that there isn't actually one node for each boxer, because over time, the boxer's age changes. So I realized that each boxer would have to have a separate node for each fight. For example, Glass Joe has 2 fights and thus he appears twice, once when he was 23 and once next year when he battled Sandman and he was 24:
But this kinda defeats the purpose. Now, my graph will be made up of disconnected sets of 3 nodes, one for each fight in the csv. So what's the purpose?
My question is, how can I model such a simple yet complex situation like this: some type of tournament or game that changes over time, and the properties of the competitors' nodes change -- yet we want the graph to be connected:
(oops: Sandman should now be 51)
But then again, I don't think the above image is correct -- the edges shown are actually properties of the boxer node. If they are properties of the boxer...then they don't belong on the edge, right?
Here is my code so far (and the csv lives here):
LOAD CSV WITH HEADERS FROM
'file:///<grab it from dropbox please!>' AS line
CREATE (b:boxer {boxer_id: line.boxer_id, name: line.name})
SET b.age = TOINT(line.age);
LOAD CSV WITH HEADERS FROM
'file:///<grab it from dropbox please!>' AS line
MERGE(f:fight {fight_id: line.fight_id});
I end up with these nodes:
...but not sure how to connect them. Any advice or recommendations would be greatly appreciated.
Your first instinct was right. Ideally, if you had the boxer's birthday, that's what you would store. That would also help you tell apart boxers who have the same name/nickname. Storing the boxer's age on the relationship is a good idea, though.
If you really wanted to store each node for each boxer for each row you could do the following:
(:BoxerRecord)-[:FOUGHT_IN]->(:Fight)
(:BoxerRecord)-[:REPRESENTS]->(:Boxer)
So basically you use CREATE for each BoxerRecord and MERGE for each Boxer, so that rows for the same boxer end up attached to a single Boxer node.
Then if you wanted to find all of the boxers that two people have both fought (the boxer IDs below are made up):
MATCH
(b1:Boxer {boxer_id: 100}),
(b2:Boxer {boxer_id: 101}),
(b1)<-[:REPRESENTS]-(:BoxerRecord)-[:FOUGHT_IN]->(:Fight)<-[:FOUGHT_IN]-(:BoxerRecord)-[:REPRESENTS]->(common_boxer:Boxer)<-[:REPRESENTS]-(:BoxerRecord)-[:FOUGHT_IN]->(:Fight)<-[:FOUGHT_IN]-(:BoxerRecord)-[:REPRESENTS]->(b2)
RETURN common_boxer, count(*)
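The traversal above boils down to "opponents of b1 intersected with opponents of b2". As a rough illustration of that logic only, here is a Python sketch over an in-memory list of fights. The tuples `(fight_id, boxer_a, boxer_b)` and all IDs are hypothetical sample data, and the function names are made up for this example.

```python
from collections import defaultdict


def opponents_by_boxer(fights):
    """Map each boxer ID to the set of boxer IDs they have fought."""
    opponents = defaultdict(set)
    for _fight_id, boxer_a, boxer_b in fights:
        opponents[boxer_a].add(boxer_b)
        opponents[boxer_b].add(boxer_a)
    return opponents


def common_opponents(fights, b1, b2):
    """Return the boxers that both b1 and b2 have fought."""
    opponents = opponents_by_boxer(fights)
    # Exclude b1 and b2 themselves in case they fought each other.
    return (opponents[b1] & opponents[b2]) - {b1, b2}


# Hypothetical fights: (fight_id, boxer_a, boxer_b).
fights = [(1, 100, 200), (2, 101, 200), (3, 100, 300)]
shared = common_opponents(fights, 100, 101)
```

Here boxer 200 fought both 100 and 101, while boxer 300 only fought 100.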
I am trying to build a database in Neo4j with a structure that contains seven different types of nodes, in total around 4,000-5,000 nodes with around 40,000 relationships between them. I first create the nodes with Cypher like this:
Create (node1:type {name:'example1', type:'example2'})
Around 4000 of that example with unique nodes.
Then I've got relationships stated as such:
Create
(node1)-[:r]-(node51),
(node2)-[:r]-(node5),
(node3)-[:r]-(node2);
Around 40000 of such unique relationships.
With smaller graphs this has not been a problem at all, but with this one the "Executing query" message never goes away.
Any suggestions on how I can make this type of query work? Or what I should do instead?
Edit: What I'm trying to build is a big graph of a product, with its releases, release versions, features, etc., in the same way the Movie graph example is built.
The product has about 6 releases in total, and each release has around 20 release versions. In total there are 371 features, and for these 371 features there are 438 feature versions. Every release version (120 in total) then has around 200-300 feature versions. Each feature version is mapped to its feature, which has dependencies on a little bit of everything in the DB. I have also included HW dependencies, such as the possible HW to run these features on, releases, etc. So basically I'm using Cypher code such as:
Create (Product1:Product {name:'ABC', type:'Product'})
Create (Release1:Release {name:'12A', type:'Release'})
Create (Release2:Release {name:'13A', type:'release'})
Create (ReleaseVersion1:ReleaseVersion {name:'12.0.1', type:'ReleaseVersion'})
Create (ReleaseVersion2:ReleaseVersion {name:'12.0.2', type:'ReleaseVersion'})
Below those, I've structured them using:
Create (Product1)<-[:Is_Version_Of]-(Release1),
(Product1)<-[:Is_Version_Of]-(Release2),
(Release2)<-[:Is_Version_Of]-(ReleaseVersion21),
All the way down to features, and then I've also added dependencies between them such as:
(Feature1)-[:Requires]->(Feature239),
(Feature239)-[:Requires]->(Feature51);
Since I had to collect all this information from many different Excel sheets, I wrote the code this way thinking I could just put it together in one big Cypher query and run it in the browser on localhost. It worked really well as long as I did not use more than 4,000-5,000 queries at a time; it then created the entire database in 5-10 seconds at most. But now that I'm trying to run around 45,000 queries at once, it has been running for almost 24 hours and is still showing "executing query...". Is there any way I can reduce the time it takes? Will the database eventually be created? Can I use smarter indexes or anything else to improve performance? The way my Cypher is written now, I cannot divide it into pieces, since everything in the database has some sort of connection to the product. Do I need to rewrite the code, or is there a smooth way around this?
You can create multiple nodes and relationships interlinked with a single create statement, like this:
create (a { name: "foo" })-[:HELLO]->(b {name : "bar"}),
(c {name: "Baz"})-[:GOODBYE]->(d {name:"Quux"});
So that's one approach, rather than creating each node individually with a single statement, then each relationship with a single statement.
You can also create multiple relationships from objects by matching first, then creating:
match (a {name: "foo"}), (d {name:"Quux"}) create (a)-[:BLAH]->(d);
Of course you could have multiple match clauses, and multiple create clauses there.
You might try to match a given type of node, and then create all necessary relationships from that type of node. You have enough relationships that this is going to take many queries. Make sure you've indexed the property you're using to match the nodes. As your DB gets big, that's going to be important to permit fast lookup of things you're trying to create new relationships off of.
You haven't specified which query you're running that isn't "stopping loading". Update your question with specifics, and let us know what you've tried, and maybe it's possible to help.
If you have one of the nodes already created then a simple approach would be:
MATCH (n: user {uid: "1"}) CREATE (n) -[r: posted]-> (p: post {pid: "42", title: "Good Night", msg: "Have a nice and peaceful sleep.", author: n.uid});
Here the user node already exists and you have created a new relation and a new post node.
Another interesting approach might be to generate your statements directly in Excel, see http://blog.bruggen.com/2013/05/reloading-my-beergraph-using-in-graph.html?view=sidebar for an example. You can run a lot of CREATE statements in one transaction, so this should not be overly complicated.
If you're able to use the Neo4j 2.1 prerelease milestones, then you should try using the new LOAD CSV and PERIODIC COMMIT features. They are designed for just this kind of use case.
LOAD CSV allows you to describe the structure of your data with one or more Cypher patterns, while providing the values in CSV to avoid duplication.
PERIODIC COMMIT can help make large imports more reliable and also improve performance by reducing the amount of memory that is needed.
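The idea behind committing periodically is simply to process the input in fixed-size chunks, so that no single transaction has to hold the whole import in memory. The Python sketch below shows only that chunking idea. It is not Neo4j's implementation, and the helper name `batches` and the batch size are assumptions made for illustration.

```python
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # input exhausted
        yield chunk


# Each chunk would be written and committed as its own transaction.
chunks = list(batches(range(7), 3))
```

A smaller batch size means less memory per transaction at the cost of more commits.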
It is possible to use a single Cypher query to create a new node and relate it to an existing one.
As an example, assume you're starting with:
an existing "One" node which has an "id" property "1"
And your goal is to:
create a second node, let's call that "Two", and it should have a property id:"2"
relate the two nodes together
You could achieve that goal using a single Cypher query like this:
MATCH (one:One {id:'1'})
CREATE (one) -[:RELATED_TO]-> (two:Two {id:'2'})
I am attempting to create a linked list with neo4j, and have a root node with no relationships. Here is the pseudo cypher I am trying to create, but I am not sure how, or even if it is possible:
START root=node(1), item=node(2)
MATCH root-[old?:LINK]->last
WHERE old IS NOT NULL
CREATE root-[:LINK]->item-[:LINK]->last
DELETE old
WHERE old IS NULL
CREATE root-[:LINK]->item
Basically I am trying to insert a node into the list if the list exists, and simply create the first list item otherwise. Obviously you cannot use multiple WHEREs like I have done above. Any ideas how I can achieve this in Cypher?
The docs solve the problem by first creating a recurrent :LINK relationship on the root node, but I would like to solve this without doing that (as you then need to create possibly unnecessary relationships for each node).
For anyone interested, I figured out a way to solve the above using some WITH tricks. This is essentially a solution for creating linked lists in neo4j without having to first create a self referencing relationship.
START root=node(1), item=node(2)
MATCH root-[old?:LIST_NEXT]->last
CREATE root-[:LIST_NEXT]->item
WITH item, old, last
WHERE old IS NOT NULL
CREATE item-[:LIST_NEXT]->last
DELETE old
This works by first looking for an existing link relationship, and creating the new one from the root to the item. Then by using WITH we can chain the query to now examine whether or not the matched relationship did in fact exist. If it did, then remove it, and create the remaining link piece from the new item to the old one.
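The same control flow can be modeled in a few lines of Python, with a dictionary standing in for the LIST_NEXT relationships. This is only a sketch of the logic described above, using hypothetical node names, and says nothing about how Neo4j executes the query.

```python
def insert_after_root(links, root, item):
    """Insert `item` directly after `root` in a singly linked list.

    `links` maps each node to the node its LIST_NEXT points at.
    """
    old_last = links.get(root)   # optional match: root -> last, if any
    links[root] = item           # create root -> item (replaces the old link)
    if old_last is not None:     # only when a previous link existed
        links[item] = old_last   # create item -> last
    return links


links = {}
insert_after_root(links, "root", "a")  # empty list: just root -> a
insert_after_root(links, "root", "b")  # existing list: root -> b -> a
```

Overwriting `links[root]` plays the role of both creating the new relationship and deleting the old one.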
For this, you might want to look at MERGE, http://docs.neo4j.org/chunked/snapshot/query-merge.html#merge-merge-with-on-create-and-on-match
And maybe at the linked list example, http://docs.neo4j.org/chunked/snapshot/cookbook-linked-list.html
I'm very interested in building a data visualisation component and can
see how it could be done but would prefer not to reinvent something
which already exists. If this truly is a 'first' then I'm prepared to put my initial code
on Github for others to share [and hopefully improve !!]
Essentially I'd like to be able to do the following:
1) Access a table or tables within a database and create nodes based
on entries within them. Add nodes on create, remove them on delete.
2) Use the foreign keys and/or join tables [for many-many links] to
create edges. Add edge(s) when node created, remove edges when node
deleted, check and add/remove edges when node updated.
3) Pass the nodes and edges to Gephi for display
I can see how to do steps 1 and 2 quickly and easily -- what I haven't
been able to find (after much searching) is how to do step 3.
Has anyone had any success in doing this? -- any example code that they're willing to share ?
Thanks
We tried something similar once, but it may not help you that much. We wrote a Rake task that got data out of our DB, which we then fed into Gephi manually. That wasn't really satisfactory, and in the end I went with Rake task -> CSV -> R script for visualization (basically connections of users on a world map). If you are not dead set on using Gephi I could show you some of the R code :-)