Remove property from all nodes in a Neo4j database - neo4j

I've inherited a Neo4j database which is massive to say the least. I've been looking into it and noticed we store a large string in each node that is not used.
Let's say we have a few million nodes called Post and each Post has Text. I'd like to remove Text from each Post in the entire database. Can someone point me down the path of doing so via Cypher?
Also, how would you recommend safely doing this? Should I back up the entire instance incase something goes wrong? Or does Neo4j have a transactional history I can rely on?

Neo4j does not allow NULL in the property so you can REMOVE Text from Post. Here is the syntax:
MATCH (p:Post)
REMOVE p.text
However, you said you have a massive database so I would recommend to remove it by batch.
CALL apoc.periodic.iterate(
"MATCH (p:Post) RETURN p",
"REMOVE p.text",
{batchSize:10000, parallel:true})
It would create batches of 10k Post nodes then remove the property text then run it in parallel.
In any mass updates that you do, it is highly recommended to create a backup of the database.

This code should delete all of the p.text in the nodes that hold it:
MATCH (p:Post)
WHERE exists(p.text)
REMOVE p.text

Related

What's the recommended memory efficient way to perform mass property update?

I have a Neo4j DB with more than 30 million nodes. I'm wondering what might be the most efficient (regarding memory and speed) approach to do a bulk update of a returning pattern within a node property using only the Cypher Shell.
e.g.: having a node with the label USER with a property name of the type String like:
'Peter_Test'
If I want to get rid of all the underscores in a bulk update what's the best way to achieve this without having to select each of the 30 million nodes in a single transaction, update the content and write it back to the same property?
A selection of all USER nodes upfront and a following UNWIND for each entry within the selection plus an update would definitely run into memory issues.
Any advice to perform such a task?
You can use apoc procedure apoc.periodic.iterate for this ,
CALL apoc.periodic.iterate( "MATCH (o:Order) WHERE o.date > '2016-10-13' RETURN o", "MATCH (o)-[:HAS_ITEM]->(i) WITH o, sum(i.value) as value SET o.value = value", {batchSize:100})
This is example in the documentation. Where you return the nodes to be updated in first query and do the update in second query.here ,you won't load all 30 million nodes at once..you can configure it using batch size .
Check out the documentation here

remove all label with a Cypher query?

I deleted all of node and relationship. Now, I want to delete all existing labels with a Cypher query but I can't.
You are probably referring to the neo4j browser's "Node labels" display. The browser can continue to display labels that have been deleted from all nodes (or even if the DB no longer has any nodes). This is really just a minor nuisance.
As long as your Cypher queries show that there are no nodes with that label, rest assured that the label does not "really" exist in the DB.
If you're removing all of the data (nodes and relationships) anyway, you might as well delete your graph.db directory or wherever you store your data. This will also result in having pre-existing labels not show up in the browser.
This will also remove all indexes you might have had set up.

Cypher: Create relationships between nodes based on a common property key id

I'm brand new to Cypher (and Stackoverflow) and am having trouble creating relationships between nodes based on share property keys.
I would like to do something like this:
MATCH (a:Person)-->()<--(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
to create a relationship between Country node and Person nodes where they share the same id.
The above creates no errors when run but doesn't create any relationships either and I know that the ids should match.
Many thanks!!
EDIT:
I think I know what is going wrong - I'm asking to match nodes that have a relationship to eachother but no relationships are set up yet hence 0 results. I have now tried:
MATCH (a:Person),
(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
and the query is running. It's a big data set so might take a while......
That worked. Had to reduce the size of my data set (down from 64k nodes) as Neo4j was taking way too long to process but once I had a smaller set it worked fine.
One minor addition for future Googlers.
per the help files as of version 3.4
The has() function has been superseded by exists() and has been removed.
The new code should read
MATCH (a:Person),
(b:Country)
WHERE EXISTS (a.id) AND EXISTS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);

Creating nodes and relationships at the same time in neo4j

I am trying to build an database in Neo4j with a structure that contains seven different types of nodes, in total around 4-5000 nodes and between them around 40000 relationships. The cypher code i am currently using is that i first create the nodes with the code:
Create (node1:type {name:'example1', type:'example2'})
Around 4000 of that example with unique nodes.
Then I've got relationships stated as such:
Create
(node1)-[:r]-(node51),
(node2)-[:r]-(node5),
(node3)-[:r]-(node2);
Around 40000 of such unique relationships.
With smaller scale graphs this has not been any problem at all. But with this one, the Executing query never stops loading.
Any suggestions on how I can make this type of query work? Or what i should do instead?
edit. What I'm trying to build is a big graph over a product, with it's releases, release versions, features etc. in the same way as the Movie graph example is built.
The product has about 6 releases in total, each release has around 20 releaseversion. In total there is 371 features and of there 371 features there is also 438 featureversions. ever releaseversion (120 in total) then has around 2-300 featureversions each. These Featureversions are mapped to its Feature whom has dependencies towards a little bit of everything in the db. I have also involed HW dependencies such as the possible hw to run these Features on, releases on etc. so basicaly im using cypher code such as:
Create (Product1:Product {name:'ABC', type:'Product'})
Create (Release1:Release {name:'12A', type:'Release'})
Create (Release2:Release {name:'13A, type:'release'})
Create (ReleaseVersion1:ReleaseVersion {name:'12.0.1, type:'ReleaseVersion'})
Create (ReleaseVersion2:ReleaseVersion {name:'12.0.2, type:'ReleaseVersion'})
and below those i've structured them up using
Create (Product1)<-[:Is_Version_Of]-(Release1),
(Product1)<-[:Is_Version_Of]-(Release2),
(Release2)<-[:Is_Version_Of]-(ReleaseVersion21),
All the way down to features, and then I've also added dependencies between them such as:
(Feature1)-[:Requires]->(Feature239),
(Feature239)-[:Requires]->(Feature51);
Since i had to find all this information from many different excel-sheets etc, i made the code this way thinking i could just put it together in one mass cypher query and run it on the /browser on the localhost. it worked really good as long as i did not use more than 4-5000 queries at a time. Then it created the entire database in about 5-10 seconds at maximum, but now when I'm trying to run around 45000 queries at the same time it has been running for almost 24 hours, and are still loading and saying "executing query...". I wonder if there is anyway i can improve the time it takes, will the database eventually be created? or can i do some smarter indexes or other things to improve the performance? because by the way my cypher is written now i cannot divide it into pieces since everything in the database has some sort of connection to the product. Do i need to rewrite the code or is there any smooth way around?
You can create multiple nodes and relationships interlinked with a single create statement, like this:
create (a { name: "foo" })-[:HELLO]->(b {name : "bar"}),
(c {name: "Baz"})-[:GOODBYE]->(d {name:"Quux"});
So that's one approach, rather than creating each node individually with a single statement, then each relationship with a single statement.
You can also create multiple relationships from objects by matching first, then creating:
match (a {name: "foo"}), (d {name:"Quux"}) create (a)-[:BLAH]->(d);
Of course you could have multiple match clauses, and multiple create clauses there.
You might try to match a given type of node, and then create all necessary relationships from that type of node. You have enough relationships that this is going to take many queries. Make sure you've indexed the property you're using to match the nodes. As your DB gets big, that's going to be important to permit fast lookup of things you're trying to create new relationships off of.
You haven't specified which query you're running that isn't "stopping loading". Update your question with specifics, and let us know what you've tried, and maybe it's possible to help.
If you have one of the nodes already created then a simple approach would be:
MATCH (n: user {uid: "1"}) CREATE (n) -[r: posted]-> (p: post {pid: "42", title: "Good Night", msg: "Have a nice and peaceful sleep.", author: n.uid});
Here the user node already exists and you have created a new relation and a new post node.
Another interesting approach might be to generate your statements directly in Excel, see http://blog.bruggen.com/2013/05/reloading-my-beergraph-using-in-graph.html?view=sidebar for an example. You can run a lot of CREATE statements in one transaction, so this should not be overly complicated.
If you're able to use the Neo4j 2.1 prerelease milestones, then you should try using the new LOAD CSV and PERIODIC COMMIT features. They are designed for just this kind of use case.
LOAD CSV allows you to describe the structure of your data with one or more Cypher patterns, while providing the values in CSV to avoid duplication.
PERIODIC COMMIT can help make large imports more reliable and also improve performance by reducing the amount of memory that is needed.
It is possible to use a single cypher query to create a new node as well as relate it to an existing now.
As an example, assume you're starting with:
an existing "One" node which has an "id" property "1"
And your goal is to:
create a second node, let's call that "Two", and it should have a property id:"2"
relate the two nodes together
You could achieve that goal using a single Cypher query like this:
MATCH (one:One {id:'1'})
CREATE (one) -[:RELATED_TO]-> (two:Two {id:'2'})

Delete all relations and connected nodes in Neo4j for a user

We have selected neo4j as the DB for our web application. The user has a large number of relations and connected nodes. As of now there are about 20 relations for a user. One of the features is a newsfeed feature. If i want to delete a user completely, is the cypher query the best way to delete or is there any other alternative?
Since we are still planning to add new features, the relationships and nodes connected to the user also will increase. So if we use cypher query, the query has to be modified for every new relationship added. Please advise.
Thanks,
Pavan
Yes, you can use Cypher to remove a user. Of course, there are alternative methods, depending on the language or framework you're using with your web application. If you like to have advise on that, please specifiy how you're using Neo4j in detail.
Note that you have to remove all relationships (outgoing and incoming) first in order to be able to remove the node.
Example:
START n = node(3)
MATCH n-[r]-()
DELETE n, r
This example was taken from the official manual: http://docs.neo4j.org/chunked/milestone/query-delete.html
As of Neo4j 2.3 there is another way to do this:
MATCH (n { name:'Andres' })
DETACH DELETE n
I found this example in the documentation at: http://neo4j.com/docs/stable/query-delete.html
An alternative could be to write a gremlin script that traverses your graph starting with your user and is putting in two collection the relationships and the nodes that you intend to delete. If you want to delete everything, perhaps you can implement your depth first traversal in Gremlin and delete while traversing.

Resources