How do I refactor data two neo4j nodes to a relationship?


I'm doing an experiment with using a graph database (neo4j). I have two CSVs that I imported into a neo4j datastore. I'm a little shaky on the neo terminology, so forgive me. Let's say I have:
Customer (AccountNumber, CustomerName) and
CustomerGroups (AccountNumber, GroupName).
I would like to create a new node label made up of the distinct GroupName values from CustomerGroups. I'll call it Group.
I then want to create "HAS_GROUP" relationships from Customer to Group using the common AccountNumber from CustomerGroups.
Once the above is completed, I could delete CustomerGroups as it's no longer needed.
I'm just stuck at the syntax. I can get the distinct groups from CustomerGroups with:
MATCH (n:CustomerGroups) RETURN DISTINCT n.GROUP_NAME
and I get back about 50 distinct groups, but I can't figure out how to add a create statement to the results, something like CREATE g:Group {GroupName: n.GROUP_NAME}
I know my follow-up question is how to do the MATCH to the new group using the old table with common account numbers.
FYI: I've indexed the AccountNumber in both Nodes. Both Customer and CustomerGroups have over 5 Million nodes. Not bad for a laptop (2 min to import using neo4j-import). I was impressed!
Thanks for any help you can give!

Instead of creating a CustomerGroups label and creating nodes for that, you should be able to define relationships that you would like to create in your neo4j-import. It would certainly be a lot faster too. See:
http://neo4j.com/docs/stable/import-tool-header-format.html
To your question, you could probably do something like:
MATCH (cg:CustomerGroups)
MATCH (customer:Customer {AccountNumber: cg.AccountNumber}), (group:Group {GroupName: cg.GroupName})
CREATE (customer)-[:IN_GROUP]->(group)
You'd definitely want to make sure you have indexes on :Customer(AccountNumber) and :Group(GroupName) first. But even then it would still be much slower than doing it as part of your initial import.
Also, you may want MERGE instead of CREATE if the query might be run more than once, since CREATE adds a duplicate relationship on every run.
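The query above assumes the Group nodes already exist, which the question had not yet managed. A minimal sketch of the whole refactoring, run as three separate statements, might look like the following. It assumes the imported property is named GROUP_NAME (as in the question's own query) and uses the HAS_GROUP relationship name from the question; adjust both to match the actual import.

```cypher
// Step 1: create one Group node per distinct group name.
// Assumption: the CustomerGroups property is GROUP_NAME, as in the question's query.
MATCH (cg:CustomerGroups)
WHERE cg.GROUP_NAME IS NOT NULL
WITH DISTINCT cg.GROUP_NAME AS groupName
MERGE (:Group {GroupName: groupName});

// Step 2: link each customer to its group through the shared account number.
// MERGE avoids duplicate relationships if the statement is run again.
MATCH (cg:CustomerGroups)
MATCH (c:Customer {AccountNumber: cg.AccountNumber})
MATCH (g:Group {GroupName: cg.GROUP_NAME})
MERGE (c)-[:HAS_GROUP]->(g);

// Step 3: remove the helper nodes once the relationships exist.
// Plain DELETE only succeeds while these nodes have no relationships of their own.
MATCH (cg:CustomerGroups)
DELETE cg;
```

With over 5 million CustomerGroups nodes, each statement touches a lot of data in a single transaction, so it may need to be run in batches depending on available memory.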

Related

Create relationship with properties using a query in Cypher

I would like to know if this is possible. I have a query that produces a nice report showing a relationship between two entities through two other nodes. There can be more than one path. I now want to create a direct relationship between those two entities, storing the number of paths and a sum based on data in the nodes in between. The report query is below.
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
return bo.AgencyName, count(sol.Number) as awards, so.orgName, sum(prop.finalPrice) as awardVolume;
What I want to do is something like the query below, which does not work.
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
create (bo)-[:HAS_AWARDED{awardCount: count(sol.Number), awardVolume: sum(prop.finalPrice)}]->(so);
If I remove the properties from the relationship, it works, but I want to add the properties without too much programming.
I am using the most recent version of Neo4j 3.2.
thanks

The problem here is you are trying to use count() and sum() functions in an invalid context. The below query should work:
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
with bo, so, count(sol.Number) as count_sol, sum(prop.finalPrice) as sum_finalPrice
create (bo)-[:HAS_AWARDED{awardCount: count_sol, awardVolume: sum_finalPrice}]->(so);
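This query uses WITH to pass bo, so and the results of the aggregation functions count(sol.Number) and sum(prop.finalPrice) to the next part of the query. Because bo and so are the only non-aggregated values in the WITH, they act as the grouping key, giving one row per agency/vendor pair. These values are then used to create the new relationship between bo and so.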
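One thing to keep in mind is that CREATE adds a new HAS_AWARDED relationship every time the query runs. If the summary is meant to be refreshed periodically, a rerunnable variant could use MERGE for the relationship and SET for the properties. This is a sketch of that alternative, not part of the original answer:

```cypher
MATCH (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
WHERE sol.currStatus = 'Awarded'
// Aggregate per (bo, so) pair before touching the graph.
WITH bo, so, count(sol.Number) AS awardCount, sum(prop.finalPrice) AS awardVolume
// MERGE reuses an existing HAS_AWARDED relationship instead of adding a second one.
MERGE (bo)-[r:HAS_AWARDED]->(so)
// SET overwrites the stored totals with the freshly computed values.
SET r.awardCount = awardCount, r.awardVolume = awardVolume;
```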

Cypher: Create relationships between nodes based on a common property key id

I'm brand new to Cypher (and Stack Overflow) and am having trouble creating relationships between nodes based on shared property keys.
I would like to do something like this:
MATCH (a:Person)-->()<--(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
to create a relationship between Country node and Person nodes where they share the same id.
The above creates no errors when run but doesn't create any relationships either and I know that the ids should match.
Many thanks!!
EDIT:
I think I know what is going wrong: I'm asking to match nodes that already have a relationship to each other, but no relationships are set up yet, hence 0 results. I have now tried:
MATCH (a:Person),
(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
and the query is running. It's a big data set so might take a while......

That worked. Had to reduce the size of my data set (down from 64k nodes) as Neo4j was taking way too long to process but once I had a smaller set it worked fine.

One minor addition for future Googlers.
Per the help files as of version 3.4, the has() function has been superseded by exists() and has been removed.
The new code should read:
MATCH (a:Person),
(b:Country)
WHERE EXISTS (a.id) AND EXISTS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
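The slowness reported above is likely because MATCH (a:Person), (b:Country) with a WHERE filter can end up comparing every Person with every Country. A sketch that lets the database look up the matching Country directly is shown below. It assumes an index on :Country(id) has been created beforehand, and it uses IS NOT NULL, which also works on releases where exists() on a property is no longer available.

```cypher
// Start from each Person that actually has an id ...
MATCH (a:Person)
WHERE a.id IS NOT NULL
// ... and look up the Country with the same id (fast when :Country(id) is indexed).
MATCH (b:Country {id: a.id})
// MERGE keeps the query safe to rerun without creating duplicate relationships.
MERGE (a)-[:LIVES]->(b);
```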

Creating nodes and relationships at the same time in neo4j

I am trying to build a database in Neo4j with a structure that contains seven different types of nodes, in total around 4,000-5,000 nodes with around 40,000 relationships between them. In the Cypher code I am currently using, I first create the nodes with:
Create (node1:type {name:'example1', type:'example2'})
Around 4000 of that example with unique nodes.
Then I've got relationships stated as such:
Create
(node1)-[:r]-(node51),
(node2)-[:r]-(node5),
(node3)-[:r]-(node2);
Around 40000 of such unique relationships.
With smaller scale graphs this has not been any problem at all. But with this one, the Executing query never stops loading.
Any suggestions on how I can make this type of query work? Or what i should do instead?
Edit: What I'm trying to build is a big graph of a product, with its releases, release versions, features etc., in the same way as the Movie graph example is built.
The product has about 6 releases in total, and each release has around 20 release versions. In total there are 371 features, and for these 371 features there are also 438 feature versions. Every release version (120 in total) then has around 200-300 feature versions. These feature versions are mapped to their feature, which has dependencies on a little bit of everything in the db. I have also included HW dependencies, such as the possible HW to run these features on, releases, etc. So basically I'm using Cypher code such as:
Create (Product1:Product {name:'ABC', type:'Product'})
Create (Release1:Release {name:'12A', type:'Release'})
Create (Release2:Release {name:'13A', type:'release'})
Create (ReleaseVersion1:ReleaseVersion {name:'12.0.1', type:'ReleaseVersion'})
Create (ReleaseVersion2:ReleaseVersion {name:'12.0.2', type:'ReleaseVersion'})
and below those I've connected them using
Create (Product1)<-[:Is_Version_Of]-(Release1),
(Product1)<-[:Is_Version_Of]-(Release2),
(Release2)<-[:Is_Version_Of]-(ReleaseVersion21),
All the way down to features, and then I've also added dependencies between them such as:
(Feature1)-[:Requires]->(Feature239),
(Feature239)-[:Requires]->(Feature51);
Since I had to gather all this information from many different Excel sheets, I wrote the code this way thinking I could just put it together in one mass Cypher query and run it in the browser on localhost. It worked really well as long as I did not use more than 4,000-5,000 queries at a time; then it created the entire database in about 5-10 seconds at most. But now that I'm trying to run around 45,000 queries at once, it has been running for almost 24 hours and is still loading and saying "executing query...". Is there any way I can improve the time it takes? Will the database eventually be created? Can I use smarter indexes or other things to improve performance? The way my Cypher is written now, I cannot divide it into pieces, since everything in the database has some sort of connection to the product. Do I need to rewrite the code, or is there a smooth way around this?

You can create multiple nodes and relationships interlinked with a single create statement, like this:
create (a { name: "foo" })-[:HELLO]->(b {name : "bar"}),
(c {name: "Baz"})-[:GOODBYE]->(d {name:"Quux"});
So that's one approach, rather than creating each node individually with a single statement, then each relationship with a single statement.
You can also create multiple relationships from objects by matching first, then creating:
match (a {name: "foo"}), (d {name:"Quux"}) create (a)-[:BLAH]->(d);
Of course you could have multiple match clauses, and multiple create clauses there.
You might try to match a given type of node, and then create all necessary relationships from that type of node. You have enough relationships that this is going to take many queries. Make sure you've indexed the property you're using to match the nodes. As your DB gets big, that's going to be important to permit fast lookup of things you're trying to create new relationships off of.
You haven't specified which query you're running that isn't "stopping loading". Update your question with specifics, and let us know what you've tried, and maybe it's possible to help.
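To make the match-then-create suggestion concrete: identifiers such as Feature1 only exist inside the statement that created them, so once the script is split into smaller queries, each relationship has to find its endpoints again by a property. The sketch below assumes features carry a Feature label and a unique name property, and the names 'Feature1' and 'Feature239' are placeholders; none of these are confirmed by the question.

```cypher
// Index the lookup property once, before loading relationships.
// This is the Neo4j 2.x/3.x syntax; later releases use CREATE INDEX FOR ... ON ...
CREATE INDEX ON :Feature(name);

// A separate, later query: re-find both endpoints by property, then connect them.
// The label, property, and values here are hypothetical.
MATCH (a:Feature {name: 'Feature1'}), (b:Feature {name: 'Feature239'})
CREATE (a)-[:Requires]->(b);
```

Splitting the work this way keeps each transaction small, at the cost of one indexed lookup per endpoint.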

If you have one of the nodes already created then a simple approach would be:
MATCH (n: user {uid: "1"}) CREATE (n) -[r: posted]-> (p: post {pid: "42", title: "Good Night", msg: "Have a nice and peaceful sleep.", author: n.uid});
Here the user node already exists and you have created a new relation and a new post node.

Another interesting approach might be to generate your statements directly in Excel, see http://blog.bruggen.com/2013/05/reloading-my-beergraph-using-in-graph.html?view=sidebar for an example. You can run a lot of CREATE statements in one transaction, so this should not be overly complicated.

If you're able to use the Neo4j 2.1 prerelease milestones, then you should try using the new LOAD CSV and PERIODIC COMMIT features. They are designed for just this kind of use case.
LOAD CSV allows you to describe the structure of your data with one or more Cypher patterns, while providing the values in CSV to avoid duplication.
PERIODIC COMMIT can help make large imports more reliable and also improve performance by reducing the amount of memory that is needed.
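As a rough illustration, a two-pass import of features and their dependencies might look like the sketch below. The file names (features.csv, requires.csv) and column names (name, source, target) are hypothetical, and USING PERIODIC COMMIT is only available in releases that still support it (later versions replaced it with a different batching construct).

```cypher
// Pass 1: one Feature node per row of a hypothetical features.csv with a "name" column.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///features.csv' AS row
MERGE (:Feature {name: row.name});

// Pass 2: one Requires relationship per row of a hypothetical requires.csv
// with "source" and "target" columns holding feature names.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///requires.csv' AS row
MATCH (a:Feature {name: row.source})
MATCH (b:Feature {name: row.target})
MERGE (a)-[:Requires]->(b);
```

The pattern is written once and the CSV supplies the values, so the 45,000 hand-written CREATE lines shrink to a few statements. An index on :Feature(name) would keep the lookups in the second pass fast.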

It is possible to use a single Cypher query to create a new node as well as relate it to an existing one.
As an example, assume you're starting with:
an existing "One" node which has an "id" property "1"
And your goal is to:
create a second node, let's call that "Two", and it should have a property id:"2"
relate the two nodes together
You could achieve that goal using a single Cypher query like this:
MATCH (one:One {id:'1'})
CREATE (one) -[:RELATED_TO]-> (two:Two {id:'2'})

How to create multiple nodes and relationships in Neo4J with one Cypher / REST query?

I want to create multiple nodes (if they do not exist yet) and relationships between them (parallel ones, if they already exist) with one query.
What would be the best way to do that in Neo4J 2.0?
I tried different ways, but what I've found so far is either to add them pair by pair, as described here, merge on multiple relationships (but that seems to work also only by pairs), or through transactions (as described here). The combination of the 2nd and the 3rd option would work fine, but I would just like to limit it to two queries:
1) Create all the nodes (if they don't exist yet), get their IDs.
2) Create relationships between them (using IDs obtained in 1).
3) Submit the two queries as statements into transaction, commit.
The only thing is that I'm new to Cypher and don't know how to make a query like that.
Can anybody help, please?
Thank you!

How to use "MATCH.. CREATE UNIQUE.." with neography

I'm trying to write an importer script that will take a MySQL resultset and add nodes and relationships in my Neo4J DB. To simplify things, my resultset has the following fields:
application_id
amount
application_date
account_id
I want to create a node with Application label with the application_id, amount, application_date fields and another node with Account label with account_id field and a relationship between them.
My SQL query is from the applications table so I'm not afraid of dups there, but an account_id can appear more than once and I obviously don't want to create multiple nodes for that.
I'm using neography (but willing to switch if there is something simpler). What would be the best (easiest) way to achieve that?
My script will drop the database before it starts so no leftovers to take care of.
Should I create an index before and use create_unique_node?
I think I can do what I want in Cypher using "MATCH .. CREATE UNIQUE..", what's the equivalent of that in neography? I don't understand how index_name gets into the equation...
Should I or should I not define the constraint on Account?
Many thanks,
It's my first time with graph DBs so apologies if I miss a concept here..

From what you describe, it looks like you should use the constraints that come with Neo4j 2.0: http://docs.neo4j.org/chunked/milestone/query-constraints.html#constraints-create-uniqueness-constraint
CREATE CONSTRAINT ON (account:Account) ASSERT account.account_id IS UNIQUE
Then you can use the MATCH .. CREATE UNIQUE clauses for all your inserts. You can use neography to submit the cypher queries, see the examples here: https://github.com/maxdemarzi/neography/wiki/Scripts-and-queries
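For the per-row insert itself, one possible shape is sketched below using MERGE, which is a swapped-in alternative to CREATE UNIQUE and is also available in Neo4j 2.0. The relationship type BELONGS_TO is a made-up name, and the parameters are written as $name (Neo4j 3.0 and later; 2.x releases spelled them {name}). The values would be supplied by the importer script for each result-set row.

```cypher
// Reuse the Account node if this account_id was seen before, otherwise create it.
MERGE (acc:Account {account_id: $account_id})
// Applications are unique per row in the source query, so a plain CREATE is enough.
CREATE (app:Application {
  application_id: $application_id,
  amount: $amount,
  application_date: $application_date
})
// BELONGS_TO is a hypothetical relationship type chosen for this sketch.
CREATE (app)-[:BELONGS_TO]->(acc);
```

With the uniqueness constraint in place, the MERGE lookup is backed by an index, so repeated account_id values stay cheap to resolve.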
