InfluxDB: Move only one database of many from one server instance to another - influxdb

I have an InfluxDB server instance containing several databases, like sensors, network, telegraf and so on.
Together these databases consume several dozen GB, and I want to offload only the sensors database to another, more powerful machine.
The simplest case would be that I create a new InfluxDB server instance on that other machine, and just move (rsync) the influxdb/data/sensors folder to the other machine, and delete it from the original one.
While I haven't tested it, I assume that this does not work that easily; there is a data/_internal directory, then there's the meta/meta.db file as well as the wal/* directory, which will probably require everything to be left "as-is" in order for the server instance to boot without error.
Since I'm talking about dozens of GB per database, I'd ideally like to mount a new SSD, copy the files/directories onto it, then mount that SSD on the other machine and use it directly as the new data source without further copying.
Basically, I wish this were as easy as moving rrdtool's RRD files from one machine to another.
Is this possible? If not, what are my options?

Edit 2022: This solution works for InfluxDB 1.x; the commands shown here may not be directly applicable to 2.x. Here is a link to the 2.x backup/restore documentation: https://docs.influxdata.com/influxdb/v2.2/backup-restore/
The InfluxDB 2.2 influx backup command is not compatible with versions of InfluxDB prior to 2.0.0.
I resorted to using influxd backup / influxd restore as Yuri Lachin pointed out.
While it has the drawback of first having to save the data to disk and then read it back in from there, it seems to be the most flexible approach.
Rsyncing 50 GB takes a while, and the databases would need to be offline during that time. Backup / restore has no such requirement, so no data is lost. It also lets you migrate data that used to live on a single InfluxDB instance to different InfluxDB servers without having to think about the metadata database.
The backup / restore can be done in stages: first back up all the data of the database and restore it into the new server instance, then export the newest data that didn't make it into the first backup and restore that into the new database as well.
Step 1:
On the machine containing the new, empty InfluxDB server instance, back up the data from the remote, old InfluxDB instance:
influxd backup \
-portable \
-host 192.168.11.10:8088 \
-database sensors \
/var/lib/influxdb/export-sensors-01
Afterwards import this data into the new server instance:
influxd restore \
-portable \
/var/lib/influxdb/export-sensors-01
Step 2:
Now take the time to adjust the IP-address or domain name to which the InfluxDB clients are currently connected, and make them point to the new InfluxDB server; restart the clients if necessary.
Step 3:
Between the moment the first backup finished and the moment you restarted the clients with the new IP address, new data was still being written to the old database, so that data needs to be synced over.
Again, on the new server, pull a backup from the old one, but specify the time range of the missing data and a different target directory:
influxd backup \
-portable \
-host 192.168.11.10:8088 \
-database sensors \
-start 2019-06-22T19:30:00Z \
-end 2019-06-24T00:00:00Z \
/var/lib/influxdb/export-sensors-02
Apparently it is important to specify -end as well: one test I ran without an -end argument started to back up the entire database again. I aborted it, deleted /var/lib/influxdb/export-sensors-02 and started it again with the -end argument set.
The -start argument may reach a couple of minutes back into data that has already been restored: when this second backup is copied into the target database, points with the same measurement, tags and timestamp simply overwrite the existing identical values, so no duplicates are created.
For example, if you start the main backup at 4pm and it finishes at 6pm, the second backup can use a -start of 5:55pm and an -end a couple of days in the future. That is no problem, because as soon as you switch the IP addresses of the clients, no more data will be written to the old database. The -since argument would probably have been a better fit, but I was experimenting a bit with time ranges, so I left it at -start plus -end.
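As a rough sketch of how such an overlapping -start value could be computed rather than typed by hand: the helper below subtracts a few minutes from the time the first backup finished. The function name overlap_start is made up for illustration, and it assumes GNU date (the -d option); BSD/macOS date uses different flags.

```shell
# Hypothetical helper: print an RFC3339 UTC timestamp N minutes before a given one.
# Assumes GNU date; not part of InfluxDB.
overlap_start() {
  local finished_at="$1"       # UTC time at which the first backup finished
  local overlap_min="${2:-5}"  # minutes of overlap, default 5
  local epoch
  epoch=$(date -u -d "$finished_at" +%s) || return 1
  date -u -d "@$((epoch - overlap_min * 60))" +%Y-%m-%dT%H:%M:%SZ
}

overlap_start "2019-06-22T18:00:00Z" 5   # prints 2019-06-22T17:55:00Z
```

The printed value can then be passed to influxd backup as the -start argument.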
Inserting the missing data that you just backed up into /var/lib/influxdb/export-sensors-02 takes a bit more work, since you can't restore into an already existing database. If you try, nothing is damaged: a warning is shown and the restore is aborted.
So we will need to restore the data into a new, temporary database:
influxd restore \
-portable \
-database sensors \
-newdb sensors_tmp_backup \
/var/lib/influxdb/export-sensors-02
Then copy the data into the sensors database:
influx \
-database=sensors_tmp_backup \
-execute 'SELECT * INTO sensors..:MEASUREMENT FROM /.*/ GROUP BY *'
And delete the temporary database:
influx \
-database=sensors_tmp_backup \
-execute 'DROP DATABASE sensors_tmp_backup'
If all is OK, delete the backup directories:
rm -rf /var/lib/influxdb/export-sensors-01
rm -rf /var/lib/influxdb/export-sensors-02
Before changing the addresses in Step 2, you can rehearse Step 3 a couple of times by letting the new database catch up with the old, live one via a few incremental backups. It's a good way to get acquainted with the procedure in Step 3.
If you're running InfluxDB in Docker, like I am doing, you can execute all the commands from the host. Step 3 would then look like this:
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influxd backup -portable -host 192.168.11.10:8088 -database sensors -start 2019-06-22T19:40:00Z -end 2019-06-24T00:00:00Z /var/lib/influxdb/export-sensors-02
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influxd restore -portable -database sensors -newdb sensors_tmp_back /var/lib/influxdb/export-sensors-02
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influx -database=sensors_tmp_back -execute 'SELECT * INTO sensors..:MEASUREMENT FROM /.*/ GROUP BY *'
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influx -database=sensors_tmp_back -execute 'DROP DATABASE sensors_tmp_back'
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 rm -rf /var/lib/influxdb/export-sensors-01
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 rm -rf /var/lib/influxdb/export-sensors-02
If you have problems reaching the remote InfluxDB server, keep in mind that the RPC port 8088 is usually bound to localhost for security reasons. You may need to bind it to 0.0.0.0 first, probably by setting the environment variable INFLUXDB_BIND_ADDRESS on the remote instance to 0.0.0.0:8088, as specified in the documentation, and then restarting the server.
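A minimal sketch of that binding change, assuming an InfluxDB 1.x instance whose start-up environment you control. How the variable reaches the process (shell, systemd unit, docker run -e) depends on your setup and is not shown here.

```shell
# Option A: environment variable; in InfluxDB 1.x this overrides the config file.
export INFLUXDB_BIND_ADDRESS="0.0.0.0:8088"

# Option B: the equivalent top-level setting in influxdb.conf:
#   bind-address = "0.0.0.0:8088"
```

Since this exposes the backup/restore port to the network, it is probably wise to limit access with a firewall and to revert the setting once the migration is done.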

I'm not sure it is safe to rsync the influxdb/data/sensors directory from a running InfluxDB instance. At the very least, you should run rsync once while influxd is running, then stop the influxd service and run rsync again to fetch the files that changed in the meantime.
Without copying influxdb/meta/meta.db to the new server, your new instance won't know about the existing old databases and measurements.
AFAIK, the procedure of manual file copying is not officially documented or recommended by InfluxData.
Using the official influxd backup / influxd restore commands is probably a safer approach. They were buggy 1-2 years ago when I tried them, but are likely to work now. You can run the backup on the new server against the remote old instance and restore it locally.

You may try, as you mentioned in your question, copying the influxdb/data/sensors directory to the new machine.
The _internal database holds runtime statistics, so you can ignore it if you don't need them.
I don't know where InfluxDB keeps its metadata, so be cautious.
The wal/* directory is nothing but the write-ahead log, which exists to avoid data loss. I assume you have some downtime for this activity. If you can see the most recent data in the sensors DB before you start copying, there is no chance of losing data from the WAL.

Related

influxdb: unclear usage of tsi1 index after upgrading from in-memory type

influxdb 1.5.2
I've tried switching from the inmem index type to tsi1 according to the documentation:
https://docs.influxdata.com/influxdb/v1.5/administration/upgrading/#switching-from-in-memory-tsm-based-index-to-disk-tsi-based-index
set index-version = "tsi1" in the config file
stop influxdb
run the index migration for all data: sudo -H -u influxdb bash -c 'influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal/'
start the influxdb service
The index directories were created, but the system started using even more memory than before :(
I also checked the modification dates of the files inside the index directory, and they hadn't changed for hours (they still show the time when the buildtsi command completed).
How can I be sure that InfluxDB has started using the new index type?
I see that the devs are working on visibility in newer versions of InfluxDB:
https://github.com/influxdata/influxdb/pull/9777
https://github.com/influxdata/influxdb/issues/9707
But right now (in 1.5.x) it's absolutely unclear to me.
Make sure that the index was built successfully.
If there is not enough memory, the build process will be killed by the kernel's out-of-memory killer before it completes.
InfluxDB will then ignore the incomplete index files and use the inmem index instead.
Check /var/log/messages for OOM kills.
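One possible way to do that check, as a sketch: the function name find_oom_kills is made up, and the patterns only cover the usual wording of kernel OOM messages. Some distributions log to /var/log/syslog or only to the journal instead.

```shell
# Hypothetical helper: list kernel OOM-killer messages from a syslog file.
# Defaults to /var/log/messages; pass another path as the first argument.
find_oom_kills() {
  local logfile="${1:-/var/log/messages}"
  grep -Ei 'out of memory|oom-killer|killed process' "$logfile"
}
```

Calling find_oom_kills and looking for influx_inspect in the matches should show whether the buildtsi run was killed.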

Issue Importing the Paradise Papers Dataset into Neo4j

Hej all,
I am having an issue with importing the Paradise Papers dataset into a Neo4j (3.3.2) database.
It seems that the data is imported correctly into the database, as reported by neo4j-admin import.
...
IMPORT DONE in 1m 4s 889ms.
Imported:
867931 nodes
1657838 relationships
17838925 properties
Peak memory usage: 488.28 MB
...
However, after importing the data, the database seems to be empty, as reported by the Cypher queries MATCH (n) RETURN count(n); and CALL apoc.meta.graph();
...
count(n)
0
nodes, relationships
[], []
...
The following link points to a script, which should reproduce my issue. It is a Bash script for OS X/BSD (I think the -E switch for sed does not exist on Linux). Additionally, the script requires Docker to be installed and running on the system.
https://github.com/HelgeCPH/cypher_kernel/blob/master/example/import_data.sh
To run the script quickly:
wget https://raw.githubusercontent.com/HelgeCPH/cypher_kernel/master/example/import_data.sh
chmod u+x import_data.sh
./import_data.sh
I cannot see what I am doing wrong. Do I have to point to the database explicitly when running cypher-shell?
Checking on the container, the database files exist (ls -ltrh data/databases/graph.db) and their timestamps correspond to the time when importing the data.
Thanks in advance for your help!
There were multiple errors in your script:
Nodes were not loaded because the :ID column is not set in the CSV files. That's why I added this part:
for file in import/csv_paradise_papers/*.nodes.*.csv
do
sed -i -E '1s/node_id/node_id:ID/' "$file"
done
Node labels were not set either. You can set them directly on the command line like this: --nodes:MyLabel
If you query Neo4j while the server is restarting, you will probably get an error because the server is not ready yet. That's why I added a sleep 5 at the end.
A better approach would be to wait until the server responds, with something like this:
until curl --output /dev/null --silent --head --fail http://localhost:7474; do
printf '.'
sleep 1
done
Last point: I don't know why, but if you restart neo4j inside the container, you will not see the imported data. If you restart the container itself, it works.

How to create new database in neo4j?

I'm on Linux (16.04) with a fresh Neo4j install, set up by following guides on the exegetic and DigitalOcean sites.
By default there's a graph.db database.
My question is: how do I create a new database and create nodes and relationships between nodes?
As the picture shows, the default DB name is graph.db.
Since you're using Neo4j 3.x, to create a new database without removing your existing one, you can simply edit the neo4j.conf file in the conf directory of your $NEO4J_HOME.
Search for dbms.active_database=, which should have the default value of graph.db. Replace it with some other name and start neo4j again. Now, a new database will be created under that directory name. To switch back to your previous db, repeat the steps, just replace your new value with graph.db in the configuration file.
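The edit described above could be scripted roughly like this. It is only a sketch: set_active_db is a made-up helper name, and the config path in the usage note is an assumption that depends on how Neo4j was installed.

```shell
# Hypothetical helper: point dbms.active_database at another directory name (Neo4j 3.x).
# Rewrites the line whether or not it is commented out; appends it if it is missing.
set_active_db() {
  local conf="$1" name="$2" tmp
  tmp=$(mktemp) || return 1
  if grep -Eq '^#?dbms\.active_database=' "$conf"; then
    sed -E "s|^#?dbms\.active_database=.*|dbms.active_database=${name}|" "$conf" > "$tmp"
  else
    { cat "$conf"; echo "dbms.active_database=${name}"; } > "$tmp"
  fi
  # Write back through the original file so its owner and mode are kept.
  cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

For example, set_active_db "$NEO4J_HOME/conf/neo4j.conf" movies.db followed by a restart of Neo4j; calling it again with graph.db switches back.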
Neo Technology has released a desktop tool called Neo4j Desktop that greatly improves productivity. You can download it here
With it you can manage different projects, create different databases, and switch between them using the GUI.
It really saves a lot of time.
Apparently in Community Edition you only have 1 database, so I used docker containers to create one server per db. Modify the ports + data volume as shown below:
docker run \
--rm \
--publish=8474:7474 --publish=8687:7687 \
--volume=$HOME/neo4j/data2:/data \
--volume=$HOME/Downloads/neo4j/import:/var/lib/neo4j/import \
--name=neo4j \
--env NEO4J_AUTH=neo4j/password \
neo4j:3.4
# Defaults:
# --publish=7474:7474 --publish=7687:7687 \
# --volume=$HOME/neo4j/data:/data \
From the Neo4j documentation:
Community Edition is a fully functional edition of Neo4j, suitable for
single instance deployments. It has full support for key Neo4j
features, such as ACID compliance, Cypher, and programming APIs. It is
ideal for learning Neo4j, for do-it-yourself projects, and for
applications in small workgroups.
So you only have one database instance.
If you want to get started with Neo4j there is a section in the community edition called "jump into code." There is a wizard to tell you how to get started with their language "Cypher."
To create a new Neo4j database in a Unix environment, follow these steps:
first, go to the directory that holds the Neo4j configuration file:
cd /etc/neo4j (ls should show neo4j.conf);
open the file with vim: sudo vim neo4j.conf;
press i to enter insert mode and edit the following:
near the beginning there is a commented-out assignment:
#dbms.active_database=graph.db; remove the comment sign and put the name of the folder that should contain the new database in front of graph.db
i.e.: dbms.active_database=new_db/graph.db; press Esc and type :wq to save the change
After running sudo service neo4j start, the active database will be new_db/graph.db
To check that everything went fine, follow these steps:
go to: cd /var/lib/neo4j;
run ls (you will see certificates, plugins, data, import); then cd data/databases and run ls again: you will see the old database (graph.db) and the new folder new_db, which contains the newly created database graph.db
Remarks:
Neo4j is developed for single database use, and all the manipulations are performed on a single database.
If you want to clear the database, you can go to the location of graph.db and erase everything in it, since doing that from within Neo4j is very difficult and most of the time you will forget to delete dependencies, labels, ...
For example, say we want to clear the newly created database graph.db in the folder new_db:
go to: cd ..../new_db;
run ls (you will see graph.db);
with Neo4j stopped, run: sudo rm -rf graph.db/*;
Last remark: if you want to go back to the default database, just comment out the assignment you edited again.
The process is a little tricky in the case of a causal cluster.
First, stop all the Neo4j instances across the VMs in your cluster:
sudo systemctl stop neo4j
DB location on Linux machines: /var/lib/neo4j/data/databases
To delete the existing DB: rm -rf /var/lib/neo4j/data/databases/graph.db
Edit the new DB name in the template:
Search for dbms.active_database=, which should have the default value of graph.db. Replace it with the new DB name. On restart, Neo4j will create it automatically.
Remember to UNCOMMENT the line.
Unbind all the nodes. This is really important and most people are unaware of it: it clears the cluster state and forces each node to join the cluster afresh.
neo4j-admin unbind
Now go ahead and start the Neo4j instances on all the nodes, one by one. This should create the new DB on each of them, and you'll see the nodes joining the cluster.
sudo systemctl start neo4j
Check the logs using
journalctl --unit=neo4j -r
OR
sudo systemctl status neo4j

How to backup and restore Open edX from one server to other?

I have an Open edX system running entirely on a single server, but its performance is bad. Its RAM consumption is growing day by day, so now I want to back it up and restore it on another, bigger server.
The Open edX documentation makes this information hard to find; I've searched for a while but haven't found what I need. If you know how to do this, please guide me.
Many thanks,
You need to back up the edxapp and cs_comments_service_development databases in MongoDB and all the data from MySQL.
Backing up:
mysqldump edxapp -u root --single-transaction > backup/backup.sql
mongodump --db edxapp
mongodump --db cs_comments_service_development
Restoring:
mysql -u root edxapp < backup/backup.sql
mongo edxapp --eval "db.dropDatabase()"
mongorestore dump/
It worked for me: it copies all courses, accounts, progress and discussions.
Idea taken from BluePlanetLife/openedx-server-prep; for more details, look here
This might not be an exact answer, nor a standard solution for a production environment, but it might help you.
The manual way can be as follows:
Set up a new edX instance on the new server.
Update all your repos (edx-platform, custom XBlocks) to the appropriate branch or tag.
(I haven't tested the database replacement in the next two steps in a production environment.)
Replace the MySQL databases 'edxapp', 'ora' and 'xqueue' on the new server with the old ones.
Replace the MongoDB databases 'cs_comments_service_development' and 'edxapp' on the new server with the old ones.
I was able to replace the MySQL 'edxapp' database on the devstack.

Backup neo4j community edition offline in unix: mac or linux

Previously I had a problem when making a 'backup', as described in this question: I got an error when trying to restore the database, because I had copied it while the database was running.
So I did an experiment with a new database on another computer (this time with Ubuntu). I tried this:
I created some nodes and relationships, very few, about 10 (the Matrix example).
Then I stopped the neo4j service.
I copied the data folder, which contains graph.db, to another location.
After that I deleted the graph.db folder and started neo4j.
It automatically created a new graph.db folder and the database ran as new, without any data, which is normal.
Then I stopped it again and pasted the old graph.db folder back.
I got an error:
Starting Neo4j Server...WARNING: not changing user waiting
for server to be ready... Failed to start within 120 seconds.
The error appears after 5 seconds, not after 120 seconds.
I tried pasting the whole data folder. Same error.
How should I back up and restore Neo4j Community Edition offline, manually?
I read in some posts that you only need to copy and restore, but that does not work.
Thank you for your help
Online backup, in a sense of taking a consistent backup while Neo4j is running, is only available in Neo4j enterprise edition. Enterprise edition's backup also features a verbose consistency check of the backup, something you do not get in community either.
The only safe option in community edition is to shut down Neo4j cleanly and copy away the graph.db folder recursively. I'm typically using:
cd data
tar -zcf graph.db.tar.gz graph.db/
To restore, shut down neo4j, remove the existing graph.db folder and restore the original graph.db folder from your backup:
cd data
rm -rf graph.db
tar -zxf graph.db.tar.gz
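The same archive/restore steps can be wrapped in two small functions, sketched below. The names archive_db and restore_db are made up for illustration. Stop Neo4j before calling either one and start it again afterwards; the service commands are left out because they differ between installs.

```shell
# archive_db <databases_dir> <db_name> <archive.tar.gz>
# Use an absolute path for the archive to avoid surprises with tar -C.
archive_db() {
  tar -C "$1" -czf "$3" "$2"
}

# restore_db <databases_dir> <db_name> <archive.tar.gz>
# Unpacks into a staging directory first, so a broken archive
# leaves the existing store untouched.
restore_db() {
  local dir="$1" name="$2" archive="$3" staging
  staging=$(mktemp -d "$dir/restore.XXXXXX") || return 1
  if tar -C "$staging" -xzf "$archive" && [ -d "$staging/$name" ]; then
    rm -rf "${dir:?}/$name"
    mv "$staging/$name" "$dir/$name"
    rm -rf "$staging"
  else
    rm -rf "$staging"
    return 1
  fi
}
```

Unlike a plain rm followed by tar, the restore only deletes the current graph.db after the archive has been unpacked successfully.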
I also ran into this issue and wrote the following two one-liners:
Make a backup of the current state
service neo4j stop && now=$(date +"%m_%d_%Y") && cd /var/lib/neo4j/data/databases/ && tar -cvzf /var/backups/neo4j/$now.gb.tar.gz graph.db && service neo4j start
service neo4j stop = stop the neo4j service
now=$(date +"%m_%d_%Y") = declare the current date as variable
cd /var/lib/neo4j/data/databases/ = change directories to your neo4j dir where the graph.db is located
tar -cvzf /var/backups/neo4j/$now.gb.tar.gz graph.db = make a compressed copy of the graph.db and save it to /var/backups/neo4j/$now.gb.tar.gz
service neo4j start = restart neo4j
Restore neo4j database from a backup
service neo4j stop && cd /var/lib/neo4j/data/databases/ && rm -r graph.db && tar xf /var/backups/neo4j/10_25_2016.gb.tar.gz -C /var/lib/neo4j/data/databases/ && service neo4j start
service neo4j stop = stop the neo4j service
cd /var/lib/neo4j/data/databases/ = change directories to your neo4j dir where the graph.db is located
rm -r graph.db = remove the current graph.db and all its contents
tar xf /var/backups/neo4j/10_25_2016.gb.tar.gz -C /var/lib/neo4j/data/databases/ = Extract the backup to the directory where the old graph.db was located. Be sure to adjust the filename 10_25_2016.gb.tar.gz to what you called your file
service neo4j start = restart neo4j
Info:
This seems to work for me, but as I don't have a lot of experience with bash scripting, I doubt it is the optimal way. I think it is understandable and easy to customize, though :)
Cheers
If you can't shut down and copy the files, you can write a cron script that fetches the data from Neo4j and stores it in some other database, say MongoDB. You can write a cron script for the restore as well.
This method is only for those who can't afford the Enterprise Edition and can't shut down their server.
