InfluxDB: merging a large database with SELECT INTO fails. Alternatives? (Docker)

influxdb 1.8.10
I have two databases that were originally one, but due to hardware limitations I had to move to a new system and simply started a new database there.
Now I've upgraded to a new system and want to merge the two again.
I've restored the backups of both into a separate Docker instance as two databases: energypre2021 and energycombined (which has the data from 2021 onward).
If I run
influx -database=energycombined -execute 'SELECT * INTO energypre2021..:MEASUREMENT FROM /.*/ GROUP BY *'
in the Docker container, I just get kicked out of the container without any message, and no merged database is created.
The log only shows this:
ts=2022-09-08T22:30:10.118858Z lvl=info msg="Open store (end)" log_id=0cooRaaG000 service=store trace_id=0cooRa~0000 op_name=tsdb_open op_event=end op_elapsed=4042.491ms
Any tips on how to merge both databases efficiently? I'm willing to merge one measurement (table) at a time if needed.
64 GB RAM + 1 TB SSD should be enough for this, I think.
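Since the question mentions merging one measurement at a time, here is a minimal sketch of that idea, assuming the single huge SELECT ... INTO is what exhausts the container. It only prints InfluxQL statements that copy one measurement in fixed time windows (epoch-second literals such as `time >= 1609459200s`), using each database's default retention policy via the `"db".."measurement"` form. The function name `gen_select_into` and the measurement name `power` are hypothetical; the list of real measurements can be obtained with `SHOW MEASUREMENTS`.

```shell
#!/usr/bin/env bash
# Sketch: print SELECT ... INTO statements that copy ONE measurement from
# SRC_DB to DST_DB in fixed time windows, so no single query has to process
# the whole database. gen_select_into is a hypothetical helper name.
set -euo pipefail

# gen_select_into SRC_DB DST_DB MEASUREMENT START_EPOCH END_EPOCH STEP_SECONDS
gen_select_into() {
  local src="$1" dst="$2" m="$3" t="$4" end="$5" step="$6" next
  if [ "$step" -le 0 ]; then
    echo "STEP_SECONDS must be > 0" >&2
    return 1
  fi
  while [ "$t" -lt "$end" ]; do
    next=$(( t + step ))
    if [ "$next" -gt "$end" ]; then next="$end"; fi
    # Half-open window [t, next); GROUP BY * keeps tags as tags.
    printf 'SELECT * INTO "%s".."%s" FROM "%s".."%s" WHERE time >= %ss AND time < %ss GROUP BY *\n' \
      "$dst" "$m" "$src" "$m" "$t" "$next"
    t="$next"
  done
}

# Usage (not executed here): run each statement separately through the CLI.
# gen_select_into energycombined energypre2021 power 1609459200 1640995200 604800 |
#   while IFS= read -r q; do influx -database=energycombined -execute "$q"; done
```

Smaller windows mean more queries but less work per query; a week or a day per window is a reasonable starting point to tune from.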

I actually made a portable backup (influxd backup -portable) of both instances:
influxd backup -portable -database energy /mnt/backup/pre2020
influxd backup -portable -database energy /mnt/backup/newestdata
Then I restored the backup with the least data (newestdata) into an empty InfluxDB instance:
influxd restore -portable -database energy /mnt/backup/newestdata
and exported a copy of it (with -compress, because the file is imported later with -compressed=true):
influx_inspect export -compress -datadir /var/lib/influxdb/data/ -waldir /var/lib/influxdb/wal/ -out /mnt/backup/newestdata.gz
Then I dropped that instance and restored the backup with the most data:
influxd restore -portable -database energy /mnt/backup/pre2020
and then imported the export:
influx -import -compressed=true -path=/mnt/backup/newestdata.gz
This way, both instances' backups end up, one after the other, in a single, initially empty InfluxDB instance.
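The sequence above can be scripted so that it is easy to rerun. This is a sketch, not the poster's actual script: the paths and the database name `energy` come from the answer, "dropping that instance" is assumed to mean `DROP DATABASE energy` (restore refuses to overwrite an existing database), and `-compress` is added to the export because the file is later imported with `-compressed=true`. With `DRY_RUN=1` (the default here) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the restore -> export -> drop -> restore -> import sequence.
# DRY_RUN=1 (default) prints the commands instead of running them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
BACKUP_ROOT="${BACKUP_ROOT:-/mnt/backup}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

merge_energy() {
  # 1. Restore the smaller backup into the empty instance.
  run influxd restore -portable -database energy "$BACKUP_ROOT/newestdata"
  # 2. Dump it as compressed line protocol (reads the local shard files).
  run influx_inspect export -database energy -compress \
      -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal \
      -out "$BACKUP_ROOT/newestdata.gz"
  # 3. Assumption: drop the database so the larger backup can be restored.
  run influx -execute 'DROP DATABASE energy'
  run influxd restore -portable -database energy "$BACKUP_ROOT/pre2020"
  # 4. Replay the smaller export on top of the larger data set.
  run influx -import -compressed=true -path="$BACKUP_ROOT/newestdata.gz"
}
```

Run it once with the default dry run, check the printed commands, then execute it with `DRY_RUN=0 merge_energy` inside the container.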

You could probably export the database from the two instances separately and then import them one by one.
Step 1: export the database on each of the two instances:
influx_inspect export -compress -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal -out /root/1.gz
influx_inspect export -compress -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal -out /root/2.gz
Step 2: import these two files one by one into the target instance:
influx -import -compressed=true -path=/root/1.gz
influx -import -compressed=true -path=/root/2.gz
See the InfluxDB documentation for more details on influx_inspect export and on influx -import.
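Each export file records which database its points belong to, so `influx -import` recreates the original database name. When the two sources have different names (for example energypre2021 and energycombined), one option is to rewrite that name before importing. This sketch assumes the 1.x export layout of a `CREATE DATABASE <db> WITH NAME <rp>` line in the DDL section and `# CONTEXT-DATABASE:<db>` lines in the DML section; `retarget_export` is a hypothetical helper, and database names containing `/` would need escaping.

```shell
#!/usr/bin/env bash
# Sketch: point a gzip'ed influx_inspect export at a different database name
# so that two exports can be imported into ONE database.
set -euo pipefail

# retarget_export IN.gz OUT.gz NEW_DB
retarget_export() {
  local in="$1" out="$2" db="$3"
  gzip -dc "$in" \
    | sed -E \
        -e "s/^CREATE DATABASE [^ ]+/CREATE DATABASE ${db}/" \
        -e "s/^# CONTEXT-DATABASE: ?.*/# CONTEXT-DATABASE:${db}/" \
    | gzip -c > "$out"
}
```

For example, `retarget_export /root/2.gz /root/2-merged.gz energycombined` followed by `influx -import -compressed=true -path=/root/2-merged.gz`. Only the header lines change; the line-protocol data itself is streamed through untouched.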

Related

Influx v1.8 CLI query gets Killed

I'm looking at options to export data from InfluxDB to MySQL. I'm exploring the option to export the data to flat files for the import (so we don't have to hit our production InfluxDB instance).
When I execute the command influx -database 'mydb' -execute 'SELECT * FROM "1D"' -format csv > my-influx-all.csv it runs for about a minute and then outputs Killed.
My test DB is about 2.1GB in size atm so not large. The production DB is 51GB. Is there a flag I can pass so Influx CLI doesn't die? Or is there an alternate way to export data into a flat file?
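The query is probably running out of memory (OOM) and being killed; further details should be in the logs (for a kernel OOM kill, check dmesg).
If you want to export the data in a readable format, you could try influx_inspect. It reads the data and WAL files directly, so point it at those directories (the paths below are the Linux package defaults):
sudo influx_inspect export -database yourDatabase -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal -out "influx_backup.db"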
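If CSV through the CLI is still preferred, another option is to split the `SELECT *` into time windows so that each query returns a bounded result. A sketch under these assumptions: windows are given as epoch seconds, only the first non-empty window's CSV header is kept, and `INFLUX_CMD` is a hypothetical hook (defaulting to the real `influx` binary) that exists only so the function can be tested without a server.

```shell
#!/usr/bin/env bash
# Sketch: export one measurement to CSV window by window instead of with a
# single huge SELECT *. INFLUX_CMD is a hypothetical test hook.
set -euo pipefail

INFLUX_CMD="${INFLUX_CMD:-influx}"

# export_csv_chunks DB MEASUREMENT START_EPOCH END_EPOCH STEP_SECONDS
export_csv_chunks() {
  local db="$1" m="$2" t="$3" end="$4" step="$5"
  local next chunk header_done=0
  if [ "$step" -le 0 ]; then
    echo "STEP_SECONDS must be > 0" >&2
    return 1
  fi
  while [ "$t" -lt "$end" ]; do
    next=$(( t + step ))
    if [ "$next" -gt "$end" ]; then next="$end"; fi
    # One query per window; memory use is bounded by the largest window.
    chunk="$("$INFLUX_CMD" -database "$db" -format csv \
      -execute "SELECT * FROM \"$m\" WHERE time >= ${t}s AND time < ${next}s")"
    if [ -n "$chunk" ]; then
      if [ "$header_done" -eq 0 ]; then
        printf '%s\n' "$chunk"                  # keep the first header
        header_done=1
      else
        printf '%s\n' "$chunk" | tail -n +2     # drop repeated headers
      fi
    fi
    t="$next"
  done
}
```

For the question's measurement, something like `export_csv_chunks mydb 1D 1577836800 1609459200 86400 > my-influx-all.csv` would export one day per query.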

InfluxDB: Move only one database of many from one server instance to another

I have an InfluxDB server instance containing several databases, like sensors, network, telegraf and so on.
Together these databases consume several dozens of GB, and I want to offload only the sensors database to another more powerful machine.
The simplest case would be that I create a new InfluxDB server instance on that other machine, and just move (rsync) the influxdb/data/sensors folder to the other machine, and delete it from the original one.
While I haven't tested it, I assume that this does not work that easily; there is a data/_internal directory, then there's the meta/meta.db file as well as the wal/* directory, which will probably require everything to be left "as-is" in order for the server instance to boot without error.
Since I'm talking about dozens of GBs per database, ideally I'd just like to mount a new SSD, copy the files and directories, then mount that SSD on the other machine and use it directly as the new data source without further copying.
Basically, I wish this could be as easy as moving rrdtool's .rrd files from one machine to another.
Is this possible? If not, what are my options?
Edit 2022: This is a solution which works for InfluxDB 1.x, the commands shown here may not be directly applicable to 2.x. Here is a link to the 2.x backup/restore documentation: https://docs.influxdata.com/influxdb/v2.2/backup-restore/
The InfluxDB 2.2 influx backup command is not compatible with versions of InfluxDB prior to 2.0.0.
I resorted to using influxd backup / influxd restore as Yuri Lachin pointed out.
While it does have the drawback of first saving the data to disk and then reading it back in, it seems to be the most flexible approach.
Rsyncing 50 GB takes a considerable amount of time, and the databases would need to be offline during that time. Backup/restore does not require that, so no data is lost. It also allows migrating data that used to live on one single InfluxDB instance to different InfluxDB servers without having to deal with the metadata database.
The backup/restore can be done in steps: first back up all the data of the database and restore it into the new server instance, then back up the newest data that didn't make it into the first backup and restore that into the new database as well.
Step 1:
On the machine containing the new, empty InfluxDB server instance, backup the data from the remote, old InfluxDB instance:
influxd backup \
-portable \
-host 192.168.11.10:8088 \
-database sensors \
/var/lib/influxdb/export-sensors-01
Afterwards import this data into the new server instance:
influxd restore \
-portable \
/var/lib/influxdb/export-sensors-01
Step 2:
Now take the time to adjust the IP-address or domain name to which the InfluxDB clients are currently connected, and make them point to the new InfluxDB server; restart the clients if necessary.
Step 3:
Between the moment the backup finished and the moment you restarted the clients with the new IP address, new data was still being written to the old database, so that data needs to be synced over.
Again, on the new server, pull a backup from the old one, but specify the time range of the missing data and a different target directory:
influxd backup \
-portable \
-host 192.168.11.10:8088 \
-database sensors \
-start 2019-06-22T19:30:00Z \
-end 2019-06-24T00:00:00Z \
/var/lib/influxdb/export-sensors-02
Apparently it is important to specify -end as well: in one test without an -end argument, the backup started to copy the entire database again. I aborted it, deleted /var/lib/influxdb/export-sensors-02 and started again with -end set.
The -start time may overlap the already restored data by a couple of minutes: when this second backup is restored, the duplicate entries are either ignored or simply overwrite the identical existing values.
For example, if you start the main backup at 4pm and it finishes at 6pm, the second backup can use a -start of 5:55pm and an -end a couple of days in the future. That is no problem, because as soon as you switch the clients' IP addresses, no more data will be written to the old database. The -since argument would probably have been better, but I was experimenting with time ranges, so I stuck with -start and -end.
In order to insert the missing data which you just backed up into /var/lib/influxdb/export-sensors-02 you need to do a bit more work, since you can't restore into an already existing database. If you try to do it, nothing is damaged, only a warning message is shown and restore gets aborted.
So we will need to restore the data into a new, temporary database:
influxd restore \
-portable \
-database sensors \
-newdb sensors_tmp_backup \
/var/lib/influxdb/export-sensors-02
Then copy the data into the sensors database:
influx \
-database=sensors_tmp_backup \
-execute 'SELECT * INTO sensors..:MEASUREMENT FROM /.*/ GROUP BY *'
And delete the temporary database:
influx \
-database=sensors_tmp_backup \
-execute 'DROP DATABASE sensors_tmp_backup'
If all is OK, delete the backup directories:
rm -rf /var/lib/influxdb/export-sensors-01
rm -rf /var/lib/influxdb/export-sensors-02
Before changing the addresses in Step 2, you can test Step 3 a couple of times by letting the new database catch up with the old, current one through a few incremental backups. It's a good way to get acquainted with the procedure in Step 3.
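Because Step 3 may be repeated several times, it can be wrapped in one function. This is a sketch assuming the host, database name and paths from the example above; the temporary database name `<db>_tmp_backup` follows the example, and with `DRY_RUN=1` (the default) the commands are only printed so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch wrapping Step 3: time-bounded backup from the old server, restore
# under a temporary name, copy into the live database, clean up.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# catch_up OLD_HOST:PORT DB START END BACKUP_DIR
catch_up() {
  local host="$1" db="$2" start="$3" end="$4" dir="$5"
  local tmp="${db}_tmp_backup"
  run influxd backup -portable -host "$host" -database "$db" \
      -start "$start" -end "$end" "$dir"
  # restore cannot write into an existing database, hence -newdb.
  run influxd restore -portable -database "$db" -newdb "$tmp" "$dir"
  run influx -database="$tmp" \
      -execute "SELECT * INTO \"$db\"..:MEASUREMENT FROM /.*/ GROUP BY *"
  run influx -execute "DROP DATABASE \"$tmp\""
  run rm -rf "$dir"
}
```

Each rehearsal is then a single call such as `catch_up 192.168.11.10:8088 sensors 2019-06-22T19:30:00Z 2019-06-24T00:00:00Z /var/lib/influxdb/export-sensors-02`, with `DRY_RUN=0` once the printed commands look right.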
If you're running InfluxDB in Docker, like I am doing, you can execute all the commands from the host. Step 3 would then look like this:
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influxd backup -portable -host 192.168.11.10:8088 -database sensors -start 2019-06-22T19:40:00Z -end 2019-06-24T00:00:00Z /var/lib/influxdb/export-sensors-02
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influxd restore -portable -database sensors -newdb sensors_tmp_back /var/lib/influxdb/export-sensors-02
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influx -database=sensors_tmp_back -execute 'SELECT * INTO sensors..:MEASUREMENT FROM /.*/ GROUP BY *'
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 influx -database=sensors_tmp_back -execute 'DROP DATABASE sensors_tmp_back'
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 rm -rf /var/lib/influxdb/export-sensors-01
docker exec -w /var/lib/influxdb/ influxdb-1.7.6 rm -rf /var/lib/influxdb/export-sensors-02
If you are having problems accessing the remote InfluxDB server keep in mind that the RPC-port 8088 is usually bound to localhost for security reasons, so you may need to bind it to 0.0.0.0 first, probably by setting the environment variable INFLUXDB_BIND_ADDRESS on the remote instance to 0.0.0.0:8088, as specified in the documentation, and then restarting the server.
I'm not sure it is safe to rsync the influxdb/data/sensors directory from a running InfluxDB instance. At the very least, rsync the files while influxd is running, then stop the influxd service and repeat the rsync to pick up recently updated files.
Without copying influxdb/meta/meta.db to the new server, your new instance won't know about the existing old databases and measurements.
AFAIK, the procedure of manual file copying is not officially documented or recommended by InfluxData.
Probably using official influxd backup / influxd restore commands is a safer approach. They were buggy 1-2 years ago when I tried them, but are likely to work now. You can run backup on a new server from remote old instance and restore backup locally.
I would try what you mentioned in your question: copy the influxdb/data/sensors directory to the new machine.
The _internal database only holds runtime statistics, so you can ignore it if you don't use that data.
I don't know where it keeps its metadata, so be cautious.
The wal/* directory is just the write-ahead log, which protects against data loss. I assume you can afford some downtime for this. If you can confirm that the most recent data is already in the sensors DB before you copy, there is no chance of losing data from the WAL.

InfluxDB: restore: DB metadata not changed. database may already exist

I'm trying to restore the DB "test". I started by dropping the database test, which succeeded, but when I try to restore using influxd restore -portable -newdb "test" test_backup.influx I get this error:
restore: DB metadata not changed. database may already exist
The database is not listed when I show databases so I find this a bit strange.
You can try adding -db, -datadir and -metadir:
influxd restore -portable -db "test" -newdb "test" -datadir /.../data -metadir /.../meta test_backup.influx

Issue Importing the Paradise Papers Dataset into Neo4j

Hi all,
I am having an issue with importing the Paradise Papers dataset into a Neo4j (3.3.2) database.
It seems that the data is imported correctly into the database, as reported by neo4j-admin import.
...
IMPORT DONE in 1m 4s 889ms.
Imported:
867931 nodes
1657838 relationships
17838925 properties
Peak memory usage: 488.28 MB
...
However, after importing the data, the database seems to be empty, as reported by the Cypher queries MATCH (n) RETURN count(n); and CALL apoc.meta.graph();
...
count(n)
0
nodes, relationships
[], []
...
The following link points to a script, which should reproduce my issue. It is a Bash script for OS X/BSD (I think the -E switch for sed does not exist on Linux). Additionally, the script requires Docker to be installed and running on the system.
https://github.com/HelgeCPH/cypher_kernel/blob/master/example/import_data.sh
To run the script quickly:
wget https://raw.githubusercontent.com/HelgeCPH/cypher_kernel/master/example/import_data.sh
chmod u+x import_data.sh
./import_data.sh
I cannot see what I am doing wrong. Do I have to point to the database explicitly when running cypher-shell?
Checking on the container, the database files exist (ls -ltrh data/databases/graph.db) and their timestamps correspond to the time when importing the data.
Thanks in advance for your help!
There were several errors in your script:
Nodes were not loaded because the :ID column is not set in the CSV files. That's why I added this part:
for file in import/csv_paradise_papers/*.nodes.*.csv
do
# GNU sed syntax; with macOS/BSD sed write -i '' instead of -i
sed -i -E '1s/node_id/node_id:ID/' "$file"
done
Node labels were not set either. You can set them directly on the command line, like this: --nodes:MyLabel
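With one CSV file per node type, the `--nodes:<Label>` arguments can be derived from the file names instead of typed by hand. A sketch assuming names like `paradise_papers.nodes.address.csv` and the `--nodes:Label=file` form of `neo4j-admin import` (check `neo4j-admin import --help` for your version); `node_args` and the capitalisation rule are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: one --nodes:<Label>=<file> argument per node CSV, label taken from
# the last dot-separated part of the file name (address -> Address).
set -euo pipefail

# node_args FILE...   (prints one argument per line)
node_args() {
  local f base label
  for f in "$@"; do
    base="$(basename "$f" .csv)"   # paradise_papers.nodes.address
    label="${base##*.}"            # address
    label="$(printf '%s' "${label:0:1}" | tr '[:lower:]' '[:upper:]')${label:1}"
    printf '%s\n' "--nodes:${label}=${f}"
  done
}

# Usage (not executed here), collecting the arguments in an array:
# args=()
# while IFS= read -r a; do args+=("$a"); done \
#   < <(node_args import/csv_paradise_papers/*.nodes.*.csv)
# neo4j-admin import "${args[@]}" ...
```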
If you query Neo4j while the server is restarting, you will probably get an error because the server is not ready yet. That's why I added a sleep 5 at the end.
A better approach is to wait until the server responds, with something like this:
until curl --output /dev/null --silent --head --fail http://localhost:7474; do
printf '.'
sleep 1
done
One last point: I don't know why, but if you restart Neo4j inside the container, you will not see the imported data, whereas restarting the container itself works.

Moving data between Neo4j databases

I have to move data between two Neo4j databases. One of them is older (2.1.8) and the new one is 2.3.0.
This is what I tried, but as the output shows, something goes wrong:
/home/adam/neo4j-community-2.1.8/bin/neo4j-shell -path /home/adam/neo4j_bak9/ -c "dump" | /home/adam/neo4j-community-2.3.0/bin/neo4j-shell -file -
Transaction started
3 ms
WARNING: Invalid input 'c': expected whitespace, comment, ';' or end of input (line 2, column 1 (offset: 39))
"create index on :`Location`(`latitude`)"
^
ERROR (-v for expanded information):
Transaction was marked as successful, but unable to commit transaction so rolled back.
-host Domain name or IP of host to connect to (default: localhost)
-port Port of host to connect to (default: 1337)
-name RMI name, i.e. rmi://<host>:<port>/<name> (default: shell)
-pid Process ID to connect to
-c Command line to execute. After executing it the shell exits
-file File containing commands to execute, or '-' to read from stdin. After executing it the shell exits
-readonly Connect in readonly mode (only for connecting with -path)
-path Points to a neo4j db path so that a local server can be started there
-config Points to a config file when starting a local server
Example arguments for remote:
-port 1337
-host 192.168.1.234 -port 1337 -name shell
-host localhost -readonly
...or no arguments for default values
Example arguments for local:
-path /path/to/db
-path /path/to/db -config /path/to/neo4j.config
-path /path/to/db -readonly
It looks like Neo4j produces syntax that the new version cannot read. Am I doing something wrong, or is this a bug?
I've had that problem too. I think the new version expects semicolons after the create index statements at the top of the dump (or vice versa). It's a pity that export/import between versions isn't smoother.
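If the missing semicolons are indeed the problem, the dump can be patched in the pipe between the two shells. This sketch assumes each `create index on ...` statement sits on its own line ending in `)`, as in the error output; other statements in the dump are passed through unchanged, and `fix_dump` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: terminate bare "create index on :`L`(`p`)" lines with ';' so the
# newer neo4j-shell accepts them. Lines already ending in ';' are untouched.
set -euo pipefail

fix_dump() {
  sed -E 's/^(create index on .*\))[[:space:]]*$/\1;/'
}

# Usage (not executed here), with the paths from the question:
# /home/adam/neo4j-community-2.1.8/bin/neo4j-shell -path /home/adam/neo4j_bak9/ -c "dump" \
#   | fix_dump \
#   | /home/adam/neo4j-community-2.3.0/bin/neo4j-shell -file -
```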
Another option, and the easiest and cleanest way to upgrade Neo4j (assuming you can afford a bit of downtime):
Shut down both servers
Copy the graph.db dir from the old data dir to the new one
Make sure that the new database has allow_store_upgrade=true set in the conf/neo4j.properties file
Start up the new database
When it starts up, it should see that the database files are from an old version and automatically upgrade them to the 2.3.0 format.
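The allow_store_upgrade step can be made repeatable, which helps if the upgrade is rehearsed on copies first. A sketch assuming a plain key=value conf/neo4j.properties file: it removes any existing (possibly commented-out) allow_store_upgrade line, appends `allow_store_upgrade=true`, and rewrites the file through a temp file rather than relying on GNU- or BSD-specific `sed -i`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: ensure exactly one uncommented allow_store_upgrade=true line.
set -euo pipefail

enable_store_upgrade() {
  local conf="$1" tmp
  tmp="$(mktemp)"
  # Keep every line except existing allow_store_upgrade settings.
  grep -v -E '^[[:space:]]*#?[[:space:]]*allow_store_upgrade[[:space:]]*=' "$conf" > "$tmp" || true
  printf '%s\n' 'allow_store_upgrade=true' >> "$tmp"
  cat "$tmp" > "$conf"   # rewrite in place, keeping the file's permissions
  rm -f "$tmp"
}
```

For example, `enable_store_upgrade /path/to/neo4j-community-2.3.0/conf/neo4j.properties` before starting the new server; running it twice leaves a single setting.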
