Performing partial backup in neo4j

I have several separate, independent structures in the database. I need to back up each of these structures separately rather than doing a full backup of everything.
Is there a way to back up only a specific part of the graph? I checked the backup strategies in the Neo4j documentation. There are incremental and full backups, but I could not find a way to extract and back up only part of the graph, or one independent graph structure within the database.
Ideally, I would define a Cypher query and back up its result. For example, most relational databases let you extract or back up a single table or dataset (depending on the database). I am looking to do something similar in Neo4j: specify a node label, or some other criterion, and back up just that.

You can use the experimental dump command in neo4j-shell.
Example: dumping the User nodes to a users.cypher file that will contain all the Cypher statements needed to recreate the users later:
./bin/neo4j-shell -c 'dump MATCH (n:User) RETURN n;' > users.cypher
Related info in the documentation: http://neo4j.com/docs/stable/shell-commands.html#_dumping_the_database_or_cypher_statement_results
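Since dump accepts an arbitrary query, one command per structure can be generated when each structure is identified by a label. The hypothetical Java helper below (PartialDump and its methods are illustrative names; label names are assumed to contain no backticks or quotes) builds such a command and also pulls in the relationships between nodes of the same label. Whether dump tolerates the null r values that OPTIONAL MATCH returns for unconnected nodes is an assumption to check on a small sample first.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: one neo4j-shell "dump" command per label, so each
// independent structure ends up in its own .cypher file.
final class PartialDump {

    // Nodes with the given label plus the relationships between them.
    // OPTIONAL MATCH keeps nodes that have no such relationships.
    // Assumes plain label names (no backticks or single quotes).
    static String query(String label) {
        return "MATCH (n:`" + label + "`) "
             + "OPTIONAL MATCH (n)-[r]->(:`" + label + "`) "
             + "RETURN n, r;";
    }

    // Full command line; the output file is named after the label.
    static String command(String label) {
        return "./bin/neo4j-shell -c 'dump " + query(label) + "' > "
             + label.toLowerCase(Locale.ROOT) + ".cypher";
    }

    public static void main(String[] args) {
        for (String label : List.of("User", "Product")) {
            System.out.println(command(label));
        }
    }
}
```

Relationships that cross from one structure into another are deliberately left out, which matches the "independent structures" case in the question.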

Related

Metadata in neo4j graph database

I know that Neo4j stores data structured as graphs rather than tables. In an RDBMS we have table schemas, but in Neo4j there are no tables; only nodes, relationships and properties are defined. So is there any concept of metadata in Neo4j? For example, is any information about nodes and relationships stored in the database? If so, how is it stored and what does it contain? Also, where can we find the metadata in the graph database (its location)?
Thanks,
Neo4j doesn't directly store metadata in the way that you're looking for. The NeoProfiler tool was written precisely for this purpose. You can run it on a Neo4j database, and it will pull out as much information on labels, indexes, constraints, properties, nodes, and relationships as it can. The way it works isn't far from the queries that @ulkas suggests in the other answer here; the output is just much better.
More broadly, in an RDBMS the schema information you pull out substantially constrains the database. The schema there is like a set of rules: you can't insert data unless it conforms to that schema. Neo4j is so flexible that even if there were a schema, it would just document what's there; it would not constrain what you can put in. At any time you can insert new data that has nothing to do with the present schema (except that you can't violate things like uniqueness constraints).
If you want to see an equivalent schema for your database in Neo4j, check out NeoProfiler, mentioned above. A few people have written about "metagraphs", that is, representing a Neo4j schema as a graph itself, where for example a node stands for a label. Relationships from that "label node" go out to other label nodes, specifying what sorts of relationships can exist between nodes of those labels. For example, nodes labeled "Employee" may frequently have "works_for" relationships to nodes labeled "Company".
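The metagraph idea can be sketched in a few lines of Java: count how often each (start label)-[type]->(end label) pattern occurs; each distinct pattern then becomes an edge between two "label nodes". The class name and the sample triples are hypothetical; in practice the triples would come from a query over the real data (the query in the comment is one possible choice), and APOC's apoc.meta.graph, mentioned in another answer, builds this kind of summary inside the database.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a "metagraph": keys are label-level patterns, values are how many
// relationships in the data follow that pattern. The triples could come from
// a query such as
//   MATCH (a)-[r]->(b) RETURN labels(a)[0], type(r), labels(b)[0]
// (only the first label of each node is considered here).
final class MetaGraph {

    static final class Triple {
        final String startLabel, relType, endLabel;
        Triple(String startLabel, String relType, String endLabel) {
            this.startLabel = startLabel;
            this.relType = relType;
            this.endLabel = endLabel;
        }
    }

    static Map<String, Integer> summarize(List<Triple> triples) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (Triple t : triples) {
            String pattern = "(:" + t.startLabel + ")-[:" + t.relType + "]->(:" + t.endLabel + ")";
            counts.merge(pattern, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Triple> sample = List.of(
            new Triple("Employee", "WORKS_FOR", "Company"),
            new Triple("Employee", "WORKS_FOR", "Company"),
            new Triple("Employee", "KNOWS", "Employee"));
        summarize(sample).forEach((pattern, n) -> System.out.println(pattern + " x" + n));
    }
}
```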
No, there is no direct metadata. The most you can do is query all the structure types to get a small insight into what kind of graph is stored in the database:
MATCH ()-[r]->()
RETURN type(r), count(*)
MATCH (n)
RETURN labels(n), count(*)
The database files themselves are stored in the folder data/graph.db, but apart from some index and key files they are binary and not easy to read.
Since then, the official APOC library has become available.
It includes procedures such as apoc.meta.graph, apoc.meta.schema and others.
The APOC documentation describes the installation; if you run into sandbox errors, check the answers in this question.

Loading Neo4j database dump (neo4j-shell)

My database was affected by the bug in Neo4j 2.1.1 that tends to corrupt the database in areas where many nodes have been deleted. It turns out that most of the affected relationships had been marked for deletion in my database. I have dumped the rest of the data with neo4j-shell using a single query. This gives a 1.5 GB Cypher file that I need to import into a fresh database to get my data back in a healthy data structure.
I have noticed that the dump file contains definitions for (1) schema, (2) nodes and (3) relationships. I have already removed the schema definitions from the file because they can be applied later. The issue now is that the dump file uses a single series of node identifiers (in the format _nodeid) for both node creation and relationship creation, so it seems that all CREATE statements (33,160,527 in my case) need to be run in a single transaction.
My first attempt to do so kept the server busy for 36 hours without results. I had neo4j-shell load the data directly into a new database directory instead of connecting to a server. The data files in the new database directory never showed any sign of receiving data, and the message log showed many messages indicating thread blocks.
What is the best way to get this data back into the database? Should I load a specific config file? Do I need to allocate a large Java heap? What is the trick to loading such a large dump file into a database?
The dump command is not meant for large-scale exports. There was originally a version that handled them, but it was not included in the product.
If you still have the old database around, you can try a few things:
contact Neo4j support to help you recover your data
use my store-utils to copy it over to a new db (it will skip all broken records)
query the data with cypher and export the results as csv
you could use the shell-import-tools for that
and then import your data from the CSV using either the shell tools again, the LOAD CSV command, or the batch-importer
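For the "query with Cypher and export as CSV" route, the main pitfall when writing the file yourself is quoting. The sketch below (the class name is hypothetical) encodes one result row per line following RFC 4180 quoting rules, so values containing commas or quotes survive a later LOAD CSV WITH HEADERS import; running the query and writing the lines to a file are left out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal CSV encoding for rows returned by a Cypher query. Quoting follows
// RFC 4180: a field containing a comma, double quote or line break is wrapped
// in double quotes, and embedded double quotes are doubled. Null becomes an
// empty field.
final class CsvExport {

    static String field(Object value) {
        if (value == null) return "";
        String s = value.toString();
        boolean needsQuotes = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        return needsQuotes ? "\"" + s.replace("\"", "\"\"") + "\"" : s;
    }

    static String line(List<?> row) {
        return row.stream().map(CsvExport::field).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(line(List.of("id", "name")));        // header row
        System.out.println(line(List.of(42, "Smith, \"Jo\""))); // 42,"Smith, ""Jo"""
    }
}
```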
Here is what I finally did:
First I identified all unaffected nodes and marked them with one specific label (let's say Carriable). This was a pretty easy process in my case because all the affected nodes had the same label, so I just excluded that label. I did not have to identify the affected relationships separately, because all of them were also connected to nodes with the affected label.
Then I exported the whole database except the affected nodes and relationships to GraphML using a single query (in neo4j-shell):
export-graphml -o /home/mah/full.gml -t -r match (n:Carriable) optional match (n)-[i]-(:Carriable) return n,i
This took about a half hour to yield a 4GB XML file.
Then I imported the entire GraphML back into a mint database:
JAVA_OPTS="-Xmx8G" neo4j-shell -c "import-graphml -c -t -b 10000 -i /home/mah/full.gml" -path /db/newneo
This took yet another half hour to accomplish.
Please note that I allocated more than sufficient Java heap memory (JAVA_OPTS="-Xmx8G"), imposed a particularly small batch size (-b 10000) and allowed the use of on-disk caching.
Finally, I removed the unnecessary "Carriable" label and recreated the constraints.
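The selection made by the export query can be pictured in memory: the unaffected nodes are kept, and a relationship survives only when both of its endpoints are kept. This Java sketch uses made-up node ids and a hypothetical affected label Broken; it only illustrates the filtering logic and does not touch a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory illustration of the selection made by
//   match (n:Carriable) optional match (n)-[i]-(:Carriable) return n,i
// Kept nodes are those without the affected label; a relationship survives only
// if both of its endpoints are kept. Ids and label names are sample data.
final class CarriableSubgraph {

    static final class Rel {
        final long start, end;
        Rel(long start, long end) { this.start = start; this.end = end; }
    }

    // Step 1: the nodes that would get the Carriable label.
    static Set<Long> carriable(Map<Long, Set<String>> labelsByNode, String affectedLabel) {
        Set<Long> kept = new TreeSet<>();
        labelsByNode.forEach((id, labels) -> {
            if (!labels.contains(affectedLabel)) kept.add(id);
        });
        return kept;
    }

    // Step 2: relationships between two kept nodes; all others are dropped.
    static List<Rel> keptRelationships(List<Rel> rels, Set<Long> kept) {
        List<Rel> out = new ArrayList<>();
        for (Rel r : rels) {
            if (kept.contains(r.start) && kept.contains(r.end)) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, Set<String>> labels =
            Map.of(1L, Set.of("User"), 2L, Set.of("User"), 3L, Set.of("Broken"));
        Set<Long> kept = carriable(labels, "Broken");
        List<Rel> rels = keptRelationships(List.of(new Rel(1, 2), new Rel(2, 3)), kept);
        System.out.println(kept + " " + rels.size()); // [1, 2] 1
    }
}
```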

Neo4j data modeling for branching/merging graphs

We are working on a system where users can define their own nodes and connections, and can query them with arbitrary queries. A user can create a "branch" much like in SCM systems and later can merge back changes into the main graph.
Is it possible to create an efficient data model for that in Neo4j? What would be the best approach? Of course we don't want to duplicate all the graph data for every branch as we have several million nodes in the DB.
I have read Ian Robinson's excellent article on Time-Based Versioned Graphs and Tom Zeppenfeldt's alternative approach with Network versioning using relationnodes but unfortunately they are solving a different problem.
I would love to know what you guys think; any thoughts appreciated.
I'm not sure what your experience level is. Any insight into that would be helpful.
My guess is that this system would rely heavily on labels (tags) on the nodes. Maybe come up with 5-20 very broad node types, each with a name and a few key properties. Then you could let users select from those base categories and create their own spin-offs by adding labels.
Say you had your basic categories of (:Thing{Name:"",Place:""}) and (:Object{Category:"",Count:4})
Your users would have a drop-down or something with "Thing" and "Object". They'd select "Thing" for instance, and type a new label (Say "Cool"), values for "Name" and "Place", and add any custom properties (IsAwesome:True).
So now you've got a new node (:Thing:Cool{Name:"Rock",Place:"Here",IsAwesome:True}), which lets you query by broad categories or by user-created categories. Hopefully this keeps each broad category to a proportional fraction of your overall node count.
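If users type in their own labels, the statement text has to be built with some care. In the Cypher versions this thread is about, labels cannot be passed as query parameters, so the sketch below backtick-quotes them (doubling any embedded backtick) and passes all property values as a single $props map, assuming Neo4j 3.x or later parameter syntax. The class, method and category names are illustrative, not an existing API.

```java
import java.util.Set;

// Sketch for the label-based model: a node gets one of a fixed set of broad base
// categories plus a label typed in by the user. Labels are backtick-quoted
// (embedded backticks doubled) and spliced into the text, while property values
// travel as one $props map bound by the driver.
final class UserNodeStatement {

    static final Set<String> BASE_CATEGORIES = Set.of("Thing", "Object");

    static String quote(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    static String create(String baseCategory, String userLabel) {
        if (!BASE_CATEGORIES.contains(baseCategory)) {
            throw new IllegalArgumentException("unknown base category: " + baseCategory);
        }
        return "CREATE (n:" + quote(baseCategory) + ":" + quote(userLabel) + " $props) RETURN n";
    }

    public static void main(String[] args) {
        // props would be bound separately, e.g. {Name: "Rock", Place: "Here", IsAwesome: true}
        System.out.println(create("Thing", "Cool"));
    }
}
```

Restricting the base category to a known set keeps the broad categories fixed while still letting users invent their own spin-off labels.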
Not sure if this is exactly what you're asking for. Good luck!
Hmm. While this isn't unreasonable, first think about the type of system you're replacing: SQL. In SQL databases you wouldn't use branches, because they are plain data storage. If you're trying to get data from multiple sources into one DB, I'd suggest exporting them all to CSV files and using MERGE statements in Cypher to bring them all into your DB at once.
This could work similarly to branching: at merge time, each person runs a script on their own copy of the DB that takes all the nodes and edges in that copy and writes them to a CSV, e.g.:
MATCH (n)-[e]->(n2)
RETURN n, e, n2
Then compare these CSVs as you pull them into your final DB to see what's already there from the other copies:
LOAD CSV WITH HEADERS FROM "file:///YourFile.csv" AS file
// Hypothetical column names: one pair of columns per endpoint node
MERGE (n:Node {Property1: file.StartProperty1, Property2: file.StartProperty2})
MERGE (n2:Node {Property1: file.EndProperty1, Property2: file.EndProperty2})
MERGE (n)-[:Edge]->(n2)
This will work, as long as you're using node types that you already know about and each person isn't creating new data structures that you don't know about until the merge.
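To see what the MERGE statements do when several copies are combined, here is an in-memory Java sketch. Each hypothetical row is assumed to hold the start node's Property1 and Property2 followed by the end node's. A node is looked up by its key and only created when missing, and an Edge is added only once per ordered pair, so rows that appear in more than one person's copy don't create duplicates.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory sketch of MERGE semantics while several exported copies are pulled
// into one database. Each CSV row is assumed to be {startP1, startP2, endP1, endP2};
// as with MERGE, null key values are not allowed (List.of rejects them).
final class BranchMerge {

    final Map<List<String>, Integer> nodeIds = new LinkedHashMap<>(); // key -> node id
    final Set<List<Integer>> edges = new LinkedHashSet<>();           // [from, to]

    // Get-or-create, like MERGE (n:Node {Property1: ..., Property2: ...})
    int mergeNode(String p1, String p2) {
        List<String> key = List.of(p1, p2);
        Integer id = nodeIds.get(key);
        if (id == null) {
            id = nodeIds.size();
            nodeIds.put(key, id);
        }
        return id;
    }

    // One CSV row: merge both endpoints, then the relationship between them.
    void mergeRow(String[] row) {
        int from = mergeNode(row[0], row[1]);
        int to = mergeNode(row[2], row[3]);
        edges.add(List.of(from, to)); // like MERGE (n)-[:Edge]->(n2)
    }

    public static void main(String[] args) {
        BranchMerge db = new BranchMerge();
        db.mergeRow(new String[] {"a", "1", "b", "2"}); // from one person's copy
        db.mergeRow(new String[] {"a", "1", "b", "2"}); // same data from another copy
        db.mergeRow(new String[] {"b", "2", "c", "3"});
        System.out.println(db.nodeIds.size() + " nodes, " + db.edges.size() + " edges"); // 3 nodes, 2 edges
    }
}
```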

Importing data from oracle to neo4j using java API

Can you please share any links or sample source code for generating a graph in Neo4j from Oracle database table data?
In my use case, Oracle table names become nodes and columns become properties. I also need to generate the graph as a tree structure.
Make sure you commit the transaction after creating the nodes with tx.success(), tx.finish().
If you still don't see the nodes, please post your code and/or any exceptions.
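Use JDBC to extract your Oracle data. Then use the Java API to build the corresponding nodes:
// Neo4j 3.x embedded API; the store path and the label/property names are placeholders
import java.io.File;
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(new File("data/graph.db"));
try (Transaction tx = db.beginTx()) {
    Node dataNode = db.createNode(Label.label("TABLENAME"));
    dataNode.setProperty("column name", "column value"); // do this for each column
    tx.success(); // mark for commit; the commit happens when tx is closed
}
db.shutdown();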
Also remember to batch your transactions. I tend to use around 1500 creates per transaction and it works fine for me, but you might have to experiment a little.
Just page through each table, e.g. SELECT * FROM table ORDER BY id OFFSET X*1000 ROWS FETCH NEXT 1000 ROWS ONLY (Oracle 12c and later; Oracle has no LIMIT clause, and older versions need ROWNUM), with X being the number of times you've already run the query. Keep those 1000 records in a collection so you can build your nodes from them. Repeat this until you've handled every record in your database.
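The paging and batching advice can be sketched in Java without any Neo4j dependency: build the page query for each X, then split each page's rows into transaction-sized batches. Table and column names are placeholders, the ORDER BY on a unique column is an assumption needed for stable pages, and OFFSET ... FETCH NEXT requires Oracle 12c or later.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the paging loop plus transaction batching. Names are placeholders,
// not user input, so plain string concatenation is acceptable here.
final class OraclePaging {

    // Oracle 12c+ row limiting; ORDER BY a unique column keeps pages stable.
    static String pageQuery(String table, String orderColumn, int page, int pageSize) {
        return "SELECT * FROM " + table + " ORDER BY " + orderColumn
             + " OFFSET " + ((long) page * pageSize)
             + " ROWS FETCH NEXT " + pageSize + " ROWS ONLY";
    }

    // Split one page of rows into chunks, one chunk per Neo4j transaction.
    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            out.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        for (int page = 0; page < 3; page++) {
            System.out.println(pageQuery("EMPLOYEES", "ID", page, 1000));
        }
    }
}
```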
I'm not sure what you mean by "generate graph in tree structure". If you mean you'd like to convert foreign keys into relationships, index the key, and instead of adding the FK as a property, create a relationship to the original node. You can find that node with an index lookup, or with your own small in-memory index built on a HashMap. But since you're already holding 1000 SQL records in memory while building the transaction, be a bit careful with memory depending on your JVM settings.
You need to code this ETL process yourself. Follow the steps below:
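The HashMap idea can be sketched as two passes: first "create" a node per row and remember primary key to node id, then turn each foreign key into a relationship to the parent node. The node ids here stand in for whatever id your node creation returns, and the (pk, parent pk) rows and the CHILD_OF relationship type are hypothetical. Two passes mean parent rows don't have to come before their children, which is handy for tree-shaped data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of an in-memory index that turns foreign keys into relationships.
final class ForeignKeyLinks {

    // Each row: {primaryKey, parentKey}; parentKey is null for a root row.
    // Returns {childNodeId, parentNodeId} pairs, i.e. (child)-[:CHILD_OF]->(parent).
    static List<long[]> build(List<String[]> rows) {
        Map<String, Long> nodeIdByPk = new HashMap<>();
        long nextId = 0;
        // Pass 1: "create" a node per row and index it by primary key.
        for (String[] row : rows) {
            nodeIdByPk.put(row[0], nextId++);
        }
        // Pass 2: replace each foreign key with a relationship to the parent node.
        List<long[]> rels = new ArrayList<>();
        for (String[] row : rows) {
            if (row[1] == null) continue;
            Long parent = nodeIdByPk.get(row[1]);
            if (parent == null) throw new IllegalStateException("dangling FK: " + row[1]);
            rels.add(new long[] {nodeIdByPk.get(row[0]), parent});
        }
        return rels;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"1", null}, new String[] {"2", "1"}, new String[] {"3", "2"});
        for (long[] rel : build(rows)) {
            System.out.println("(" + rel[0] + ")-[:CHILD_OF]->(" + rel[1] + ")");
        }
    }
}
```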
Write your first Neo4j example by following this article.
Understand how to model with graphs.
There are multiple ways of talking to Neo4j using Java. Choose the one that suits your needs.

Is there a tool to dump a Neo4j graph as Cypher and re-load it from Cypher?

Everyone familiar with MySQL has likely used the mysqldump command which can generate a file of SQL statements representing both the schema and data in a MySQL database.
These SQL text files are commonly used for many purposes: backups, seeding replicas, copying databases between installations (e.g. copying prod DBs to staging environments), and others.
Is there a similar tool for Neo4j that can dump an entire graph into a text file of Cypher statements, that when executed on an empty database would reconstruct the original data?
Thanks.
In neo4j version 2 (e.g. 2.0.0M3), using neo4j-shell, you can use the command
neo4j-shell -c dump > dump.cql
which will create the Cypher statements (pretty much like mysqldump does). To read the file back in, you can use
cat dump.cql | neo4j-shell
Cypher is just a query language for Neo4j, just as SQL is for MySQL or other relational databases. If you wish to transfer the db, you just need to copy the folder containing the database files (with the database shut down). Simple.
For example, my folder simple-graph contains all the db files. Just copy the folder and store it at some other location. You can then start using it directly:
GraphDatabaseService graphDb = new EmbeddedGraphDatabase(DB_PATH); // DB_PATH is the path to the new location
You can use the procedure apoc.export.cypher.all() to dump all the data in your database.
For example, you can dump the database into a single file called dump-file.cypher (writing export files must be enabled with apoc.export.file.enabled=true in the configuration):
neo4j@neo4j> CALL apoc.export.cypher.all('dump-file.cypher');
For details of the procedure, please see the documentation: https://neo4j.com/labs/apoc/4.4/overview/apoc.export/apoc.export.cypher.all/.
