Back up Neo4j Community Edition offline on Unix (Mac or Linux)

Previously I had a problem making a 'backup', as shown in this question: I got an error when trying to restore the database because I had copied it while the database was running.
So I ran an experiment with a new database on another computer (this time running Ubuntu). I tried this:
I created a few nodes and relationships, about 10 (the Matrix example).
Then I stopped the neo4j service.
I copied the data folder that contains graph.db to another location.
After that I deleted the graph.db folder and started neo4j.
It automatically created a new graph.db folder and the database started fresh without any data, which is expected.
Then I stopped it again and pasted the old graph.db folder back.
I get an error:
Starting Neo4j Server...WARNING: not changing user
waiting for server to be ready... Failed to start within 120 seconds.
The error appears after 5 seconds, not after 120 seconds.
I also tried pasting back the whole data folder. Same error.
How should I back up and restore Neo4j Community Edition manually, offline?
I read in some posts that you just copy and restore the folder, but that does not work.
Thank you for your help.

Online backup, in the sense of taking a consistent backup while Neo4j is running, is only available in Neo4j Enterprise Edition. Enterprise Edition's backup also runs a verbose consistency check on the backup, something you do not get in Community Edition either.
The only safe option in Community Edition is to shut down Neo4j cleanly and copy the graph.db folder away recursively. I typically use:
cd data
tar -zcf graph.db.tar.gz graph.db/
To restore, shut down Neo4j, remove the existing graph.db folder, and restore the original graph.db folder from your backup:
cd data
rm -rf graph.db
tar -zxf graph.db.tar.gz
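The two tar commands above can be wrapped in small helper functions. This is a sketch that assumes Neo4j has already been shut down cleanly; the function names are hypothetical. Using tar -C instead of cd leaves the caller's working directory unchanged, and the restore refuses to delete graph.db when the archive does not exist.

```shell
#!/usr/bin/env bash
# Offline backup/restore helpers for a Neo4j Community graph.db folder.
# Assumption: Neo4j has been shut down cleanly before either function is called.

# backup_graph_db DATA_DIR ARCHIVE: pack DATA_DIR/graph.db into ARCHIVE (.tar.gz)
backup_graph_db() {
  local data_dir=$1 archive=$2
  if [ ! -d "$data_dir/graph.db" ]; then
    echo "no graph.db in '$data_dir'" >&2
    return 1
  fi
  # -C switches into DATA_DIR only for tar, so member paths start at graph.db/
  tar -zcf "$archive" -C "$data_dir" graph.db
}

# restore_graph_db DATA_DIR ARCHIVE: replace DATA_DIR/graph.db with ARCHIVE's copy
restore_graph_db() {
  local data_dir=$1 archive=$2
  if [ -z "$data_dir" ] || [ ! -f "$archive" ]; then
    echo "usage: restore_graph_db DATA_DIR ARCHIVE (archive must exist)" >&2
    return 1
  fi
  rm -rf "$data_dir/graph.db" || return 1
  tar -zxf "$archive" -C "$data_dir"
}
```

For example, with Neo4j stopped: backup_graph_db data data/graph.db.tar.gz, and later restore_graph_db data data/graph.db.tar.gz.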

I also ran into this issue and wrote the following two one-liners:
Make a backup of the current state
service neo4j stop && now=$(date +"%m_%d_%Y") && cd /var/lib/neo4j/data/databases/ && tar -cvzf /var/backups/neo4j/$now.gb.tar.gz graph.db && service neo4j start
service neo4j stop = stop the neo4j service
now=$(date +"%m_%d_%Y") = store the current date in a variable
cd /var/lib/neo4j/data/databases/ = change to the Neo4j directory where graph.db is located
tar -cvzf /var/backups/neo4j/$now.gb.tar.gz graph.db = make a compressed copy of graph.db and save it to /var/backups/neo4j/$now.gb.tar.gz
service neo4j start = restart neo4j
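One weakness of the && chain is that if tar fails, service neo4j start never runs and the database stays down. Below is a sketch of the same dated backup as a bash function that always starts Neo4j again. NEO4J_STOP and NEO4J_START are hypothetical override variables (defaulting to the service commands above) so the function can be tried without a real service, and -v is dropped so the function's only output is the archive path.

```shell
#!/usr/bin/env bash
# Dated Neo4j backup that restarts the service even if tar fails.
# NEO4J_STOP / NEO4J_START are hypothetical overrides; defaults mirror the one-liner.
NEO4J_STOP=${NEO4J_STOP:-"service neo4j stop"}
NEO4J_START=${NEO4J_START:-"service neo4j start"}

# dated_backup [DB_PARENT_DIR] [BACKUP_DIR]: prints the archive path on success
dated_backup() {
  local db_parent=${1:-/var/lib/neo4j/data/databases}
  local backup_dir=${2:-/var/backups/neo4j}
  local archive status
  archive="$backup_dir/$(date +"%m_%d_%Y").gb.tar.gz"
  mkdir -p "$backup_dir" || return 1
  $NEO4J_STOP || return 1          # unquoted on purpose: split into command + args
  tar -czf "$archive" -C "$db_parent" graph.db
  status=$?
  $NEO4J_START                     # runs whether or not tar succeeded
  if [ "$status" -eq 0 ]; then
    echo "$archive"
  else
    rm -f "$archive"               # do not leave a partial archive behind
  fi
  return "$status"
}
```

Called without arguments, it uses the same paths as the one-liner: dated_backup.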
Restore the neo4j database from a backup
service neo4j stop && cd /var/lib/neo4j/data/databases/ && rm -r graph.db && tar xf /var/backups/neo4j/10_25_2016.gb.tar.gz -C /var/lib/neo4j/data/databases/ && service neo4j start
service neo4j stop = stop the neo4j service
cd /var/lib/neo4j/data/databases/ = change to the Neo4j directory where graph.db is located
rm -r graph.db = remove the current graph.db and all its contents
tar xf /var/backups/neo4j/10_25_2016.gb.tar.gz -C /var/lib/neo4j/data/databases/ = extract the backup to the directory where the old graph.db was located. Be sure to change the filename 10_25_2016.gb.tar.gz to the name of your backup file.
service neo4j start = restart neo4j
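The restore one-liner deletes graph.db before it knows whether the archive is readable. A more cautious sketch extracts into a staging directory first and only swaps the folders once extraction has succeeded. It assumes Neo4j is already stopped and that the archive holds a top-level graph.db/ folder, as produced by the backup command above; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# restore_from_archive DB_PARENT_DIR ARCHIVE
# Assumption: Neo4j is stopped and ARCHIVE contains a top-level graph.db/ folder.
restore_from_archive() {
  local db_parent=$1 archive=$2 staging status
  if [ -z "$db_parent" ] || [ ! -f "$archive" ]; then
    echo "usage: restore_from_archive DB_PARENT_DIR ARCHIVE" >&2
    return 1
  fi
  staging=$(mktemp -d "$db_parent/restore.XXXXXX") || return 1
  # Extract into the staging dir first: a corrupt archive leaves graph.db untouched.
  if ! tar -xzf "$archive" -C "$staging" || [ ! -d "$staging/graph.db" ]; then
    echo "cannot extract graph.db/ from $archive" >&2
    rm -rf "$staging"
    return 1
  fi
  # Only now replace the live folder with the extracted copy.
  rm -rf "$db_parent/graph.db" && mv "$staging/graph.db" "$db_parent/graph.db"
  status=$?
  rm -rf "$staging"
  return "$status"
}
```

Usage with the paths from above: restore_from_archive /var/lib/neo4j/data/databases /var/backups/neo4j/10_25_2016.gb.tar.gz, followed by service neo4j start.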
Info:
This seems to work for me, but as I do not have a lot of experience with bash scripting, I doubt this is the optimal way. Still, I think it is understandable and easy to customize :)
Cheers

If you can't shut down and copy the files, you can write a cron script that fetches the data from Neo4j and stores it in some other database, say MongoDB. You can write a cron script for the restore as well.
This method is only for those who don't have the money to buy Enterprise Edition and can't shut down their server.

Related

How to remove Neo4j completely?

After installing neo4j on Arch with yay -S neo4j-community and starting it with systemctl start neo4j, I open http://localhost:7474/browser/ and execute the demo command create database movies, but I get the error:
Neo.ClientError.Statement.NotSystemDatabaseError
Unsupported administration command: CREATE DATABASE movies
Out of desperation I tried to remove neo4j completely, but it still "remembers" my initial password, and the menu bar still shows a reference to graph.db, which I had removed manually.
Where does neo4j store its files, and does anyone have an idea how to fix the problem with creating the database?

How to get Postgres to start on Big Sur?

I'm attempting to launch a Rails server on Big Sur (M1 chip) and Postgres is giving the following error:
ActiveRecord::ConnectionNotEstablished (could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
):
I've seen and tried several fixes but none have worked, including the following:
Reinstalling postgres via homebrew.
Reinstalling the pg gem.
brew services restart.
Trying to delete a postmaster.pid file (none exists). The file "/usr/local/var/postgres/postmaster.pid" does not exist on my machine.
My postgres.log file contains the following lines, repeated:
could not open directory "pg_notify": No such file or directory
LOG: database system is shut down
While Genetic's answer works, a quicker solution would be to delete the partially created database (assuming you have just installed postgres and there's no data to be lost) and then run initdb as listed in brew info postgresql to recreate the database:
brew services stop postgresql
rm -rf "$(brew --prefix)/var/postgres"
initdb --locale=C -E UTF-8 "$(brew --prefix)/var/postgres"
brew services start postgresql
The original error on the console didn't change until I entered the following command:
brew services restart -vvv postgresql
After doing this, the errors updated. It then displayed the other directories and sub-directories that were missing. Once I added everything, all was fine.
The solution by anonymus_rex in the comments worked for me. Here are the exact steps I needed to take, elaborated a bit in case they help anyone. I was stuck on this for way too long.
I tried almost all of the answers in this question and in this other one, and this is what finally got Postgres to start for me:
Tail the Postgres logs.
The path needs to be adjusted depending on where Postgres is installed and on your version. I am using postgresql@14 on an M1 Mac running Monterey and installed it with Homebrew.
I finally found the path I needed to look at using this article.
tail /opt/homebrew/var/log/postgresql@14.log
The output shows this:
2023-02-03 15:33:49.294 CST [82651] FATAL: could not open directory "pg_notify": No such file or directory
2023-02-03 15:33:49.294 CST [82651] LOG: database system is shut down
Go to the data directory: cd /opt/homebrew/var/postgresql@14
Create the missing directory (it may be a different one for you):
mkdir pg_notify
Repeat this process for all missing directories.
I needed to create pg_tblspc, pg_replslot, pg_twophase, pg_stat_tmp, pg_logical/snapshots, pg_logical/mappings, pg_commit_ts, and pg_snapshots, but I recommend re-running the tail command each time to make sure you are not missing different directories and files than I was.
Finally, after re-running the tail command after creating each missing directory, I got this output:
2023-02-03 15:49:18.909 CST [85772] LOG: redo done at 0/17211D8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-02-03 15:49:18.914 CST [85771] LOG: database system is ready to accept connections
I was then able to create and migrate the database in my project ・ᴗ・
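The mkdir steps from this answer can be collected into a loop, sketched below. It only recreates empty directories, so it is only appropriate when the rest of the data directory is intact. The default path assumes the Homebrew postgresql@14 layout on Apple Silicon, and, as the answer recommends, the log should still be checked afterwards for anything else that is missing.

```shell
#!/usr/bin/env bash
# Directories this answer found missing (pg_notify plus the list above).
PG_MISSING_DIRS="pg_notify pg_tblspc pg_replslot pg_twophase pg_stat_tmp
pg_logical/snapshots pg_logical/mappings pg_commit_ts pg_snapshots"

# recreate_pg_dirs [DATA_DIR]: create any missing directory and print what it created.
# Assumption: default DATA_DIR is the Homebrew postgresql@14 path on Apple Silicon.
recreate_pg_dirs() {
  local data_dir=${1:-/opt/homebrew/var/postgresql@14} d
  if [ ! -d "$data_dir" ]; then
    echo "no such data dir: $data_dir" >&2
    return 1
  fi
  for d in $PG_MISSING_DIRS; do      # unquoted: split the list on whitespace
    if [ ! -d "$data_dir/$d" ]; then
      mkdir -p "$data_dir/$d" || return 1
      echo "created $d"
    fi
  done
}
```

After running it, tail the log again (tail /opt/homebrew/var/log/postgresql@14.log) and restart the service.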

InfluxDB: Move only one database of many from one server instance to another

I have an InfluxDB server instance containing several databases, like sensors, network, telegraf and so on.
Together these databases consume several dozen GB, and I want to offload only the sensors database to another, more powerful machine.
The simplest case would be that I create a new InfluxDB server instance on that other machine, and just move (rsync) the influxdb/data/sensors folder to the other machine, and delete it from the original one.
While I haven't tested it, I assume that this does not work that easily; there is a data/_internal directory, then there's the meta/meta.db file as well as the wal/* directory, which will probably require everything to be left "as-is" in order for the server instance to boot without error.
Since I'm talking about dozens of GB per database, I would ideally like to mount a new SSD, copy the files and directories onto it, and then mount that SSD on the other machine and use it directly as the new data source without further copying.
Basically, I wish I could do this as easily as moving rrd-tool's rrd files from one machine to another.
Is this possible? If not, what are my options?
Edit 2022: This solution works for InfluxDB 1.x; the commands shown here may not be directly applicable to 2.x. Here is a link to the 2.x backup/restore documentation: https://docs.influxdata.com/influxdb/v2.2/backup-restore/
The InfluxDB 2.2 influx backup command is not compatible with versions of InfluxDB prior to 2.0.0.
I resorted to using influxd backup / influxd restore as Yuri Lachin pointed out.
While it does have the drawback of first having to save the data to disk and then read it back in from there, it seems to be the most flexible approach.
Rsyncing 50GB does take a certain amount of time, and the databases would need to be offline during that time, which is not a requirement for backup / restore; so no data is lost. It also makes it possible to migrate the data that used to be on a single InfluxDB instance to different InfluxDB servers without having to think about the metadata database.
The backup / restore can be done in steps: first, back up all the data of the database and restore it into the new server instance; then export the newest data that didn't make it into the first backup and restore it into the new database as well.
Step 1:
On the machine hosting the new, empty InfluxDB server instance, back up the data from the remote, old InfluxDB instance:
influxd backup \
-portable \
-host 192.168.11.10:8088 \
-database sensors \
/var/lib/influxdb/export-sensors-01
Afterwards import this data into the new server instance:
influxd restore \
-portable \
/var/lib/influxdb/export-sensors-01
Step 2:
Now take the time to adjust the IP-address or domain name to which the InfluxDB clients are currently connected, and make them point to the new InfluxDB server; restart the clients if necessary.
Step 3:
Between the time the backup finished and the time you restarted the clients with the new IP address, new data was still being written to the old database, so we need to sync that data over.
Again, on the new server, pull a backup from the old one, but specify the time range of the missing data and a different target directory:
influxd backup \
-portable \
-host 192.168.11.10:8088 \
-database sensors \
-start 2019-06-22T19:30:00Z \
-end 2019-06-24T00:00:00Z \
/var/lib/influxdb/export-sensors-02
Apparently it is important to specify -end as well: in one test without an -end argument, the backup started to copy the entire database again. I aborted it with Ctrl-D, deleted /var/lib/influxdb/export-sensors-02, and started it again with the -end argument set.
The -start argument can overlap the already restored data by a couple of minutes, since when this second backup is restored, the duplicated entries are either ignored or overwrite the identical existing values.
For example, if you start the main backup at 4pm and it finishes at 6pm, the second backup can use a -start argument of 5:55pm and an -end argument a couple of days in the future. That is no problem, because as soon as you switch the clients' IP addresses, no more future data will be written to the old database. The -since argument would probably have been better, but I was experimenting a bit with time ranges, so I stuck with -start and -end.
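The time window for the catch-up backup can be computed rather than typed by hand. The sketch below follows the 4pm/6pm example: it takes the time (as a Unix epoch) when the first backup finished, starts a few minutes earlier, and sets -end a couple of days ahead. The helper names and the defaults (5 minutes, 2 days) are assumptions; to_rfc3339 tries GNU date first and falls back to BSD/macOS date.

```shell
#!/usr/bin/env bash
# to_rfc3339 EPOCH: print EPOCH as a UTC RFC 3339 timestamp, e.g. 2019-06-22T19:30:00Z
to_rfc3339() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||   # GNU date
    date -u -r "$1" +%Y-%m-%dT%H:%M:%SZ                 # BSD/macOS date
}

# catchup_window FIRST_BACKUP_END_EPOCH [OVERLAP_MIN] [DAYS_AHEAD]
# Prints "-start <t - overlap> -end <t + days>" for the second influxd backup.
catchup_window() {
  local t=$1 overlap=${2:-5} ahead=${3:-2}
  printf '%s %s %s %s\n' -start "$(to_rfc3339 $((t - overlap * 60)))" \
    -end "$(to_rfc3339 $((t + ahead * 86400)))"
}
```

If you note the epoch with date +%s right after the first backup finishes, the result can be spliced (unquoted, so it splits into four words) into the Step 3 command, e.g. influxd backup -portable -host 192.168.11.10:8088 -database sensors $(catchup_window "$END") /var/lib/influxdb/export-sensors-02, where END is that saved value.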
To insert the missing data that you just backed up into /var/lib/influxdb/export-sensors-02, you need to do a bit more work, since you can't restore into an already existing database. If you try, nothing is damaged; a warning message is shown and the restore is aborted.
So we will need to restore the data into a new, temporary database:
influxd restore \
-portable \
-database sensors \
-newdb sensors_tmp_backup \
/var/lib/influxdb/export-sensors-02
Then copy the data into the sensors database:
influx \
-database=sensors_tmp_backup \
-execute 'SELECT * INTO sensors..:MEASUREMENT FROM /.*/ GROUP BY *'
And delete the temporary database:
influx \
-database=sensors_tmp_backup \
-execute 'DROP DATABASE sensors_tmp_backup'
If all is OK, delete the backup directories:
rm -rf /var/lib/influxdb/export-sensors-01
rm -rf /var/lib/influxdb/export-sensors-02
Before changing the addresses in Step 2, you can rehearse Step 3 a couple of times by letting the new DB catch up with the old, current one through a couple of backups. It's a good way to get acquainted with the procedure in Step 3.
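Because Step 3 may be rehearsed several times, it can help to wrap it in one function. The sketch below issues the same influxd/influx commands as Step 3 plus the cleanup, stopping at the first failure. The function name and the DRY_RUN switch (which only prints the commands) are hypothetical additions, and the temporary database name follows the sensors_tmp_backup pattern used above.

```shell
#!/usr/bin/env bash
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical switch)
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# catch_up HOST DB START END EXPORT_DIR: Step 3 for InfluxDB 1.x as one function
catch_up() {
  local host=$1 db=$2 start=$3 end=$4 dir=$5
  local tmpdb="${db}_tmp_backup"
  run influxd backup -portable -host "$host" -database "$db" \
    -start "$start" -end "$end" "$dir" || return 1
  # Restore into a temporary DB, because restoring into an existing DB is refused.
  run influxd restore -portable -database "$db" -newdb "$tmpdb" "$dir" || return 1
  run influx -database="$tmpdb" \
    -execute "SELECT * INTO ${db}..:MEASUREMENT FROM /.*/ GROUP BY *" || return 1
  run influx -database="$tmpdb" -execute "DROP DATABASE ${tmpdb}" || return 1
  run rm -rf "$dir"
}
```

For example, DRY_RUN=1 catch_up 192.168.11.10:8088 sensors 2019-06-22T19:30:00Z 2019-06-24T00:00:00Z /var/lib/influxdb/export-sensors-02 prints the five commands; without DRY_RUN it runs them.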
If you're running InfluxDB in Docker, like I am doing, you can execute all the commands from the host. Step 3 would then look like this:
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influxd backup -portable -host 192.168.11.10:8088 -database sensors -start 2019-06-22T19:40:00Z -end 2019-06-24T00:00:00Z /var/lib/influxdb/export-sensors-02
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influxd restore -portable -database sensors -newdb sensors_tmp_back /var/lib/influxdb/export-sensors-02
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influx -database=sensors_tmp_back -execute 'SELECT * INTO sensors..:MEASUREMENT FROM /.*/ GROUP BY *'
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influx -database=sensors_tmp_back -execute 'DROP DATABASE sensors_tmp_back'
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 rm -rf /var/lib/influxdb/export-sensors-01
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 rm -rf /var/lib/influxdb/export-sensors-02
If you have problems accessing the remote InfluxDB server, keep in mind that the RPC port 8088 is usually bound to localhost for security reasons, so you may first need to bind it to 0.0.0.0, for example by setting the environment variable INFLUXDB_BIND_ADDRESS to 0.0.0.0:8088 on the remote instance, as specified in the documentation, and then restarting the server.
I'm not sure it is safe to rsync the influxdb/data/sensors directory from a running InfluxDB instance. At a minimum, you should first copy the files with rsync while influxd is running, then stop the influxd service and repeat the rsync to fetch the recently updated files.
Without copying influxdb/meta/meta.db to the new server, your new instance won't know about the existing old databases and measurements.
AFAIK, the procedure of manual file copying is not officially documented or recommended by InfluxData.
Using the official influxd backup / influxd restore commands is probably a safer approach. They were buggy 1-2 years ago when I tried them, but are likely to work now. You can run the backup on the new server against the remote old instance and restore the backup locally.
You may try copying the influxdb/data/sensors directory to the new machine, as you mentioned in your question.
The _internal database holds runtime statistics, so you can ignore it if you don't use that database.
I'm not sure where it keeps its metadata, so be cautious.
The wal/* directory is just the write-ahead log, used to avoid data loss. I assume you have some downtime for this activity. If the most recent data is already visible in the sensors DB before you start copying, there is no chance of losing data from the WAL.

How to create new database in neo4j?

I'm using Linux 16.04 and have a fresh neo4j installation, set up by referring to the exegetic and DigitalOcean sites.
By default there's a graph.db database.
My question is: how do I create a new database and create nodes and relationships between them?
As shown in the picture, the default DB name is graph.db.
Since you're using Neo4j 3.x, to create a new database without removing your existing one, you can simply edit the neo4j.conf file in the conf directory of your $NEO4J_HOME.
Search for dbms.active_database=, which should have the default value of graph.db. Replace it with some other name and start neo4j again. Now, a new database will be created under that directory name. To switch back to your previous db, repeat the steps, just replace your new value with graph.db in the configuration file.
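The same edit can be scripted with sed instead of an editor. This is a sketch assuming a Neo4j 3.x neo4j.conf (for the Debian/Ubuntu package it lives at /etc/neo4j/neo4j.conf) and a database name without |, & or \ characters, which are special in the sed expression. set_active_db is a hypothetical helper that also uncomments the setting, and appends it if it is missing.

```shell
#!/usr/bin/env bash
# set_active_db CONF NAME: set dbms.active_database=NAME in a Neo4j 3.x neo4j.conf
# Assumption: NAME contains no '|', '&' or '\' (special in the sed expression below).
set_active_db() {
  local conf=$1 name=$2
  [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
  if grep -Eq '^#?[[:space:]]*dbms\.active_database=' "$conf"; then
    # Uncomment (if needed) and overwrite the existing line; keeps CONF.bak as a copy.
    sed -i.bak -E "s|^#?[[:space:]]*dbms\.active_database=.*|dbms.active_database=$name|" "$conf"
  else
    printf 'dbms.active_database=%s\n' "$name" >> "$conf"
  fi
}
```

For example, sudo bash -c '. ./set_active_db.sh && set_active_db /etc/neo4j/neo4j.conf movies' (the file name is hypothetical), then restart neo4j; running it again with graph.db switches back.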
Neo Technology has come out with a new desktop tool called Neo4j Desktop that greatly improves productivity. You can download it here.
Using it you can manage different projects, create different databases, and simply manage them / switch between them, using the GUI.
Really saves a lot of time.
Apparently in Community Edition you only have one database, so I used Docker containers to run one server per database. Modify the ports and data volume as shown below:
docker run \
--rm \
--publish=8474:7474 --publish=8687:7687 \
--volume=$HOME/neo4j/data2:/data \
--volume=$HOME/Downloads/neo4j/import:/var/lib/neo4j/import \
--name=neo4j \
--env NEO4J_AUTH=neo4j/password \
neo4j:3.4
# Defaults:
# --publish=7474:7474 --publish=7687:7687 \
# --volume=$HOME/neo4j/data:/data \
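If you run several containers this way, a small function can derive the per-database flags. In this sketch the container name neo4j-NAME, the data folder $HOME/neo4j/NAME and the DRY_RUN switch are assumptions; the image tag, import volume and credentials are copied from the command above. The arguments are kept in a bash array so paths with spaces survive.

```shell
#!/usr/bin/env bash
# start_neo4j_instance NAME HTTP_PORT BOLT_PORT
# Hypothetical naming scheme: container neo4j-NAME, data in $HOME/neo4j/NAME.
start_neo4j_instance() {
  local name=$1 http=$2 bolt=$3
  local cmd=(docker run --rm
    "--publish=$http:7474" "--publish=$bolt:7687"
    "--volume=$HOME/neo4j/$name:/data"
    "--volume=$HOME/Downloads/neo4j/import:/var/lib/neo4j/import"
    "--name=neo4j-$name"
    --env NEO4J_AUTH=neo4j/password
    neo4j:3.4)
  # DRY_RUN=1 prints the command instead of starting the container.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, start_neo4j_instance dev 7474 7687 in one terminal and start_neo4j_instance test 8474 8687 in another. Because of --rm, stopping a container removes it, but its data folder on the host is kept.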
From the Neo4j documentation:
Community Edition is a fully functional edition of Neo4j, suitable for single instance deployments. It has full support for key Neo4j features, such as ACID compliance, Cypher, and programming APIs. It is ideal for learning Neo4j, for do-it-yourself projects, and for applications in small workgroups.
So you only have one database instance.
If you want to get started with Neo4j there is a section in the community edition called "jump into code." There is a wizard to tell you how to get started with their language "Cypher."
To create a new Neo4j database in a Unix environment, the following steps are needed:
First, the Neo4j configuration file is located here:
cd /etc/neo4j (ls shows neo4j.conf)
Open the file with vim: sudo vim neo4j.conf
Edit the following (press i to enter insert mode):
There is a commented-out assignment near the beginning:
#dbms.active_database=graph.db
Remove the comment marker and prefix graph.db with the name of the folder that will contain the new database,
e.g. dbms.active_database=new_db/graph.db
Then press Esc and type :wq to save the change.
After running sudo service neo4j start, you will see that the active database is new_db/graph.db.
If you want to check that everything went fine, follow these steps:
Go to: cd /var/lib/neo4j
Run ls (you will see certificates, plugins, data, import), then go to: cd data/databases and run ls again. You will notice that you have the old database (graph.db) and the new folder new_db, which contains the newly created database graph.db.
Remarks:
Neo4j is developed for single-database use, and all manipulations are performed on a single database.
If you want to clear the database, you can go to the location of graph.db and erase everything there, since doing that from within Neo4j is very difficult and most of the time you will forget to delete dependencies, labels, ...
For example, say we want to clear the newly created database graph.db in the folder new_db:
We go to: cd ..../new_db
Run ls (you will see graph.db)
Run: sudo rm -rf graph.db/*
Last remark: if you want to access the default database again, just comment out the assignment that you edited.
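The rm -rf graph.db/* step can be guarded so that an empty or mistyped path does not delete something else. In this sketch, clear_graph_db (a hypothetical name) refuses any path that does not end in graph.db, keeps the folder itself, and also removes hidden files, which the graph.db/* glob would skip. It assumes Neo4j has been stopped first.

```shell
#!/usr/bin/env bash
# clear_graph_db DB_DIR: delete everything inside DB_DIR but keep DB_DIR itself.
# Assumption: Neo4j is stopped (e.g. sudo service neo4j stop) before calling this.
clear_graph_db() {
  local dir=$1
  case "$dir" in
    */graph.db|graph.db|*/graph.db/) ;;   # only accept paths ending in graph.db
    *) echo "refusing: '$dir' is not a graph.db directory" >&2; return 1 ;;
  esac
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # -mindepth 1 skips DB_DIR itself; unlike graph.db/*, this also matches dotfiles.
  find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}
```

For the example above this would be clear_graph_db ..../new_db/graph.db, run as a user allowed to delete those files.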
The process is a little tricky in the case of a causal cluster.
First, stop all the neo4j instances across the VMs in your cluster:
sudo systemctl stop neo4j
DB location on Linux machines = /var/lib/neo4j/data/databases
To delete the existing DB: rm -rf /database/graph.db
Edit the new DB name in the template:
Search for dbms.active_database=, which should have the default value of graph.db. Replace it with the new DB name. On restart, neo4j will automatically create it.
Remember to UNCOMMENT the line.
Unbind all the nodes; this clears the cluster state and forces each node to freshly join the cluster:
neo4j-admin unbind
Now, this is really important, and most people are unaware of it.
Now go ahead and start neo4j instances in all the nodes one by one. This should create new DBs across and you’ll see the nodes joining the cluster.
sudo systemctl start neo4j
Check the logs using
journalctl --unit=neo4j -r
OR
sudo systemctl status neo4j

How to have two Neo4j DBs on the same computer: dev and test

I'm using Neo4j with Node.js to build a REST API.
I'd like to write some tests for this API. How do I use a "test database" during those tests?
With MySQL or MongoDB, I'd fiddle the resource URL to use a different database like "app-test" vs "app".
What's the smart way to do that in neo4j?
Thanks!
SOLUTION: This is what I did:
Made a directory db/test. In that dir, put:
the two bash scripts below
a test.zip with a backup of the database you want
install.sh
#!/bin/bash
VERSION=neo4j-community-2.1.5
# Download a copy of the server
wget http://dist.neo4j.org/$VERSION-unix.tar.gz
# Unpack it here
tar -xvzf $VERSION-unix.tar.gz
# Change the default port to http->7475 https->7476
sed -i.bak s/7474/7475/g $VERSION/conf/neo4j-server.properties
sed -i.bak s/7473/7476/g $VERSION/conf/neo4j-server.properties
restart.sh
#!/bin/bash
VERSION=neo4j-community-2.1.5
echo === stop the server
$VERSION/bin/neo4j stop
echo --- replace the database
rm -rf $VERSION/data/graph.db
unzip -q test.zip -d $VERSION/data/graph.db
echo --- start the server
$VERSION/bin/neo4j start
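If the tests start immediately after restart.sh, it can help to confirm that the HTTP endpoint actually answers before running them (depending on the version, neo4j start may or may not already wait for this). A hedged sketch of a polling helper is below; wait_for is hypothetical, and the curl probe in the usage note assumes the 7475 HTTP port set by install.sh.

```shell
#!/usr/bin/env bash
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]: retry COMMAND once per second
# until it succeeds; give up with status 1 after TIMEOUT_SECONDS.
wait_for() {
  local timeout=$1; shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: ./restart.sh && wait_for 60 curl -sf http://localhost:7475/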
Then I added it to the Gruntfile using grunt-run:
run: {
restartTestDb: {
exec: 'cd db/test && ./restart.sh',
}
},
Works.
You can do the same thing with neo4j. Just install another copy of neo4j in a separate location, and configure it to use a different port.
After each test, you can stop that server and delete its graph.db directory to clear all the data.
I'm currently working on a project that can manage and switch databases from the command line.
I'm still fixing the neo4j download feature; it is a work in progress, but the more users it has, the sooner it can become stable.
I should push it to GitHub within an hour or two.
EDIT: Repository here https://github.com/neoxygen/neo4j-toolkit
Live video here http://recordit.co/YRVhOJKXdj
