Elasticsearch create snapshot repo throws RepositoryVerificationException - docker

I am trying to take a snapshot of an Elasticsearch cluster. The design is the following: there are 3 VMs, each running 1 master, 1 data and 1 client node in Docker containers. Each VM has a volume attached for storage, so the cluster has 3 master nodes, 3 client nodes, 3 data nodes and 3 volumes.
After reading the documentation, I created a separate backup volume and attached it to one of the VMs. I then set up an NFS share between all 3 VMs that stores its data on the backup volume, and modified the cluster so that the shared NFS directory is mounted as a volume on all nodes in the cluster.
So now each VM has the following:
VM1:
drwxr-xr-x 16 root root 3560 Jul 24 10:30 dev
drwxr-xr-x 2 nobody nogroup 4096 Jul 24 11:49 elastic-backup
drwxr-xr-x 97 root root 4096 Jul 24 14:04 etc
drwxr-xr-x 5 root root 4096 Apr 27 12:53 home
VM2:
drwxr-xr-x 2 root root 4096 Jul 24 13:52 bin
drwxr-xr-x 3 root root 4096 Jul 24 12:09 boot
drwxr-xr-x 5 root root 4096 Jan 27 16:41 data
drwxr-xr-x 16 root root 3580 Jul 24 11:48 dev
drwxr-xr-x 2 nobody nogroup 4096 Jul 24 11:49 elastic-backup
VM3:
drwxr-xr-x 3 root root 4096 Jul 24 15:28 boot
drwxr-xr-x 5 root root 4096 Jan 27 16:41 data
drwxr-xr-x 16 root root 3560 Jul 24 10:30 dev
drwxr-xr-x 2 nobody nogroup 4096 Jul 24 15:34 elastic-backup
When I create a file in it, I can see and modify it, and the change is visible from each VM.
Elasticsearch docker nodes:
drwxr-xr-x 1 elasticsearch elasticsearch 4096 May 15 2018 config
drwxr-xr-x 4 elasticsearch elasticsearch 4096 Jul 23 12:15 data
drwxr-xr-x 2 elasticsearch elasticsearch 4096 Jul 24 15:08 elastic-backup
Each docker elasticsearch node has the same directory mounted. I can see all the files from each node.
The problem is that whenever I try to create a snapshot repository, I get the following error:
Call:
PUT /_snapshot/elastic-backup-1
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/elastic-backup"
  }
}
Error:
{
"error": {
"root_cause": [
{
"type": "repository_verification_exception",
"reason": "[elastic-backup-1] [[some-id, 'RemoteTransportException[[master-2][VM2-ip][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elastic-backup-1] a file written by master to the store [/usr/share/elasticsearch/elastic-backup] cannot be accessed on the node [{master-2}{some-id}{some-id}{VM2-ip}{VM2-ip}{zone=AZ2}]. This might indicate that the store [/usr/share/elasticsearch/elastic-backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'], [some-id, 'RemoteTransportException[[data-2][VM2-ip][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elastic-backup-1] a file written by master to the store [/usr/share/elasticsearch/elastic-backup] cannot be accessed on the node [{data-2}{some-id}{some-id}{VM2-ip}{VM2-ip}{zone=AZ2}]. This might indicate that the store [/usr/share/elasticsearch/elastic-backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'], [some-id, 'RemoteTransportException[[data-1][VM1-ip][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elastic-backup-1] a file written by master to the store [/usr/share/elasticsearch/elastic-backup] cannot be accessed on the node [{data-1}{some-id}{some-id}{VM1-ip}{VM1-ip}{zone=AZ1}]. This might indicate that the store [/usr/share/elasticsearch/elastic-backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'], [some-id, 'RemoteTransportException[[master-1][VM1-ip][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elastic-backup-1] a file written by master to the store [/usr/share/elasticsearch/elastic-backup] cannot be accessed on the node [{master-1}{some-id}{some-id}{VM1-ip}{VM1-ip}{zone=AZ1}]. 
This might indicate that the store [/usr/share/elasticsearch/elastic-backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'], [some-id, 'RemoteTransportException[[data-3][VM3-ip][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elastic-backup-1] a file written by master to the store [/usr/share/elasticsearch/elastic-backup] cannot be accessed on the node [{data-3}{some-id}{some-id}{VM3-ip}{VM3-ip}{zone=AZ1}]. This might indicate that the store [/usr/share/elasticsearch/elastic-backup] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"
}
etc ..
Am I doing anything wrong? How can this be fixed?

As stated by Christian_Dahlqvist, you must provide a shared file system.
You need to have a shared volume, e.g. an NFS volume, behind the repository path that all nodes can access. This means that if node 1 writes a file, it will be visible to node 2 and node 3. A directory in the local file system will therefore not work, even if the path is identical on all machines.
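The verification that fails in the question amounts to this: the master writes a file under the repository path, and every other node must be able to read it back. Below is a minimal Python sketch of that kind of check, simulated on a single machine with temporary directories standing in for each node's mount. The function name verify_shared_store and the marker file naming are hypothetical and are not Elasticsearch's own implementation.

```python
import os
import tempfile
import uuid


def verify_shared_store(writer_path, reader_paths):
    """Write a marker via one mount; return the mounts that cannot read it back."""
    marker = f"verify-{uuid.uuid4().hex}.tmp"
    token = uuid.uuid4().hex
    marker_file = os.path.join(writer_path, marker)
    with open(marker_file, "w") as fh:
        fh.write(token)
    failures = []
    try:
        for path in reader_paths:
            try:
                with open(os.path.join(path, marker)) as fh:
                    if fh.read() != token:
                        failures.append(path)
            except OSError:
                # Missing file or permission denied: not usable from this mount.
                failures.append(path)
    finally:
        os.remove(marker_file)
    return failures


# Simulation only: one directory plays the NFS export, another plays a
# node-local directory that merely has the same name on every machine.
with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as local:
    shared_failures = verify_shared_store(shared, [shared, shared])
    local_failures = verify_shared_store(shared, [local])

print(shared_failures)            # []
print(local_failures == [local])  # True
```

In a real cluster the read would have to run inside each node's container, as the user Elasticsearch runs as, because the error message names file permissions as a second possible cause besides a store that is not shared.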

docker-compose mounted volume is empty, but other volumes created during Docker image build are populated

Starting with an empty directory, I created this docker-compose.yml:
version: '3.9'
services:
  neo4j:
    image: neo4j:3.2
    restart: unless-stopped
    ports:
      - 7474:7474
      - 7687:7687
    volumes:
      - ./conf:/conf
      - ./data:/data
      - ./import:/import
      - ./logs:/logs
      - ./plugins:/plugins
    environment:
      # Raise memory limits
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms.memory.heap.initial_size=1G
      - NEO4J_dbms_memory_heap_max__size=1G
Then I add the import directory, which contains data files I intend to work with in the container.
At this point, my directory looks like this:
0 drwxr-xr-x 9 cc staff 288 Dec 11 18:57 .
0 drwxr-xr-x 5 cc staff 160 Dec 11 18:15 ..
8 -rw-r--r-- 1 cc staff 458 Dec 11 18:45 docker-compose.yml
0 drwxr-xr-x 20 cc staff 640 Dec 11 18:57 import
I run docker-compose up -d --build, and the container is built. Now the local directory looks like this:
0 drwxr-xr-x 9 cc staff 288 Dec 11 18:57 .
0 drwxr-xr-x 5 cc staff 160 Dec 11 18:15 ..
0 drwxr-xr-x 2 cc staff 64 Dec 11 13:59 conf
0 drwxrwxrwx@ 4 cc staff 128 Dec 11 18:08 data
8 -rw-r--r-- 1 cc staff 458 Dec 11 18:45 docker-compose.yml
0 drwxr-xr-x 20 cc staff 640 Dec 11 18:57 import
0 drwxrwxrwx@ 3 cc staff 96 Dec 11 13:59 logs
0 drwxr-xr-x 3 cc staff 96 Dec 11 15:32 plugins
The conf, data, logs, and plugins directories are created.
data and logs are populated from the build of the Neo4j image, and conf and plugins are empty, as expected.
I use docker exec to look at the directory structures on the container:
8 drwx------ 1 neo4j neo4j 4096 Dec 11 23:46 .
8 drwxr-xr-x 1 root root 4096 May 11 2019 ..
36 -rwxrwxrwx 1 neo4j neo4j 36005 Feb 18 2019 LICENSE.txt
128 -rwxrwxrwx 1 neo4j neo4j 130044 Feb 18 2019 LICENSES.txt
12 -rwxrwxrwx 1 neo4j neo4j 8493 Feb 18 2019 NOTICE.txt
4 -rwxrwxrwx 1 neo4j neo4j 1594 Feb 18 2019 README.txt
4 -rwxrwxrwx 1 neo4j neo4j 96 Feb 18 2019 UPGRADE.txt
8 drwx------ 1 neo4j neo4j 4096 May 11 2019 bin
4 drwxr-xr-x 2 neo4j neo4j 4096 Dec 11 23:46 certificates
8 drwx------ 1 neo4j neo4j 4096 Dec 11 23:46 conf
0 lrwxrwxrwx 1 root root 5 May 11 2019 data -> /data
4 drwx------ 1 neo4j neo4j 4096 Feb 18 2019 import
8 drwx------ 1 neo4j neo4j 4096 May 11 2019 lib
0 lrwxrwxrwx 1 root root 5 May 11 2019 logs -> /logs
4 drwx------ 1 neo4j neo4j 4096 Feb 18 2019 plugins
4 drwx------ 1 neo4j neo4j 4096 Feb 18 2019 run
My problem is that the import directory in the container is empty. The data and logs directories are not empty though.
The data and logs directories on my local have extended attributes which the conf and plugins do not:
xattr -l data
com.docker.grpcfuse.ownership: {"UID":100,"GID":101}
The only difference I can identify is that those are the directories that had data created in them by docker-compose when it grabbed the Neo4j image.
Can anyone explain what is happening here and tell me how I can get this to work? I'm using Mac OS X 10.15 and docker-compose version 1.27.4, build 40524192.
Thanks.
TL;DR: your current setup probably works fine.
To walk through the specific behavior you're observing:
On container startup, Docker will create empty directories on the host if they don't exist, and mount-point directories inside the container. (Which is why those directories appear.)
Docker never copies data from an image into a bind mount. This behavior only happens for named volumes (and only the very first time you use them, not on later runs; and only on native Docker, not on Kubernetes).
But, the standard database images generally know how to initialize an empty data directory. In the case of the neo4j image, its Dockerfile ends with an ENTRYPOINT directive that runs at container startup; that docker-entrypoint.sh knows how to do various sorts of first-time setup. That's how data gets into ./data.
The image also declares a WORKDIR /var/lib/neo4j (via an intermediate environment variable). That explains, in your ls -l listing, why there are symlinks like data -> /data. Your bind mount is to /import, but if you docker-compose exec neo4j ls import, it will look relative to that WORKDIR, which is why the directory looks empty.
But, the entrypoint script specifically looks for a /import directory inside the container, and if it exists and is readable, it sets an environment variable NEO4J_dbms_directories_import=/import.
This all suggests to me that your setup is correct, and if you try to execute an import, it will work correctly and see your host data. You are looking at a /var/lib/neo4j/import directory from the image, and it's empty, but the image startup knows to also look for /import in the container, and your mount points there.
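The relative-versus-absolute distinction above can be illustrated with a small Python sketch. It only models how a path is resolved against a working directory; the WORKDIR value is the one the answer attributes to the image, and the helper name resolve is hypothetical. Symlinks such as data -> /data are not followed here.

```python
import posixpath

# Working directory the answer attributes to the neo4j image.
WORKDIR = "/var/lib/neo4j"


def resolve(path, cwd=WORKDIR):
    """Resolve a path the way a process started in cwd would interpret it."""
    return posixpath.normpath(posixpath.join(cwd, path))


relative_target = resolve("import")   # what `ls import` looks at
absolute_target = resolve("/import")  # where the bind mount actually lives

print(relative_target)  # /var/lib/neo4j/import
print(absolute_target)  # /import
```

So listing import from the working directory inspects the image's own empty directory, while listing /import should show the files from the host.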

Accessing multiple allure reports over http

I have test-results in a folder, which will be updated with new executions on every test run.
[root@server test-results]# ls -lrt /var/log/test-results/
total 352
drwxrwxrwx. 14 root root 4096 Jan 10 10:28 9ca9cd74-21d3-4d39-b556-1ca914a37408
drwxrwxrwx. 14 root root 4096 Jan 10 10:41 4fc5f9fa-ee03-4370-98bc-0cca6dcb95d6
drwxrwxrwx. 14 root root 4096 Jan 10 13:00 9e7a9239-cbd4-48a1-929e-bf90892903b0
drwxrwxrwx. 14 root root 4096 Jan 11 09:09 544bd6c9-8d43-4e17-bbc7-8395498e98b6
drwxr-xr-x. 14 root root 4096 Jan 14 10:40 faa3284c-01b0-4581-89fd-57a1919d13b7
drwxr-xr-x. 14 root root 4096 Jan 14 11:31 ee84c6f4-048d-4e9d-96c0-2bc4f6ab3f53
drwxr-xr-x. 14 root root 4096 Jan 14 11:46 c229cacb-1e27-4629-a67f-3eb2965006f9
drwxr-xr-x. 14 root root 4096 Jan 14 12:35 2cca5070-0333-4d95-a1e5-c409c3185bf3
drwxr-xr-x. 14 root root 4096 Jan 14 13:13 e19c3bd1-3a1f-459a-b8f2-5e1bfad11fe9
Each of these folders contains test data that can be parsed by the Allure engine.
I am trying to find a way to view these results at URLs of the following form:
https://localhost:port/<test folder name>/
https://localhost:port/9ca9cd74-21d3-4d39-b556-1ca914a37408/
https://localhost:port/4fc5f9fa-ee03-4370-98bc-0cca6dcb95d6/
https://localhost:port/9e7a9239-cbd4-48a1-929e-bf90892903b0/
https://localhost:port/544bd6c9-8d43-4e17-bbc7-8395498e98b6/
I tried using https://github.com/fescobar/allure-docker-service. It does provide a way to view the Allure report, but it does not give me separate links for the different test results; it can only show the report for the data in a single folder.
I am not sure of the right way to do this. Please let me know if any more information is needed.
You can start multiple Allure containers, each using a different port and showing a different directory, or you can use one directory that holds all the files together.
You can now also handle multiple projects with Allure Docker Service:
https://github.com/fescobar/allure-docker-service#MULTIPLE-PROJECTS---REMOTE-REPORTS

How can Docker container write to a mounted directory with permissions granted through group membership?

Versions
Host OS: Debian 4.9.110
Docker Version: 18.06.1-ce
Scenario
I have a directory where multiple users (user-a and user-b) have read/write access through a common group membership (shared), set up via chown:
/media/disk-a/shared/$ ls -la
drwxrwsr-x 4 user-a shared 4096 Oct 7 22:21 .
drwxrwxr-x 7 root root 4096 Oct 1 19:58 ..
drwxrwsr-x 5 user-a shared 4096 Oct 7 22:10 folder-a
drwxrwsr-x 3 user-a shared 4096 Nov 10 22:10 folder-b
UIDs & GIDs are as follows:
uid=1000(user-a) gid=1000(user-a) groups=1000(user-a),1003(shared)
uid=1002(user-b) gid=1002(user-b) groups=1002(user-b),1003(shared)
Relevant /etc/group looks like this:
shared:x:1003:user-a,user-b
When I su into either user, files can be created as expected within the shared directory.
The shared directory is attached to a Docker container via a bind mount at /shared/. The Docker container runs as user-b (using the --user "1002:1002" parameter):
$ ps aux | grep user-b
user-b 1347 0.2 1.2 1579548 45740 ? Ssl 17:47 0:02 entrypoint.sh
id from within the container prints the following, which looks fine to me:
I have no name!@7a5d2cc27491:/$ id
uid=1002 gid=1002
Also ls -la mirrors its host system equivalent perfectly:
I have no name!@7a5d2cc27491:/shared$ ls -la
total 16
drwxrwsr-x 4 1000 1003 4096 Oct 7 20:21 .
drwxr-xr-x 1 root root 4096 Oct 8 07:58 ..
drwxrwsr-x 5 1000 1003 4096 Oct 7 20:10 folder-a
drwxrwsr-x 3 1000 1003 4096 Nov 10 20:10 folder-b
Problem
From within the container, I cannot write anything to the shared directory. For example, touch test gives:
I have no name!@7a5d2cc27491:/shared$ touch test
touch: cannot touch 'test': Permission denied
I can write to a directory that is owned directly by user-b (user and group) and mounted into the container. It is only the group membership that does not seem to be respected at all.
I have looked into things like user namespace remapping, but these seem to be solutions to a different problem. What am I missing?
Your container user has gid=1002, but is not a member of the group shared with gid=1003.
In addition to --user "1002:1002" you need --group-add 1003.
Then the container user is allowed to access the shared folder with gid=1003.
id should show:
I have no name!@7a5d2cc27491:/$ id
uid=1002 gid=1002 groups=1003
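Why the supplementary group matters can be sketched with a simplified model of the classic Unix permission check, written in Python. This is an illustration under stated assumptions, not how the kernel is implemented: it ignores ACLs and capabilities, and the function name can_write is hypothetical. The numeric IDs and the drwxrwsr-x mode are taken from the question.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gid, groups=()):
    """Simplified owner/group/other write check (no ACLs, no capabilities)."""
    if uid == 0:
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if gid == owner_gid or owner_gid in groups:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


SHARED_MODE = 0o2775  # drwxrwsr-x, owned by uid 1000 and gid 1003 (shared)

# --user "1002:1002" alone: the process falls into the "other" class.
without_group = can_write(SHARED_MODE, 1000, 1003, uid=1002, gid=1002)

# --user "1002:1002" --group-add 1003: the group class applies.
with_group = can_write(SHARED_MODE, 1000, 1003, uid=1002, gid=1002, groups=(1003,))

print(without_group, with_group)  # False True
```

The container process only carries the IDs it was started with; the host's /etc/group is not consulted, which is why the membership has to be passed explicitly.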

upgrade datastax-agent for opscenter RHEL isolated nodes from tarball

A colleague installed a 3 node DSE cluster.
When bringing up OpsCenter on the seed node, we get an alert that the agent requires an upgrade from 5.2.0 to 5.2.2 and that the agent is not installed on the other two nodes.
Because of environmental restrictions, we have neither internet access nor root credentials to perform the automated upgrade/install from OpsCenter. I downloaded and unpacked the 5.2.2 agent tarball and the latest OpsCenter.
Where do I overlay the 5.2.0 contents with 5.2.2 to perform the upgrade manually, with only SU access via the command line? On the non-seed nodes, I started the agents manually.
Non-seed:
root 8362 1 2 Nov30 ? 03:36:33 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el6_7.x86_64/jre/bin/java -Xmx128M -Djclouds.mpu.parts.magnitude=100000 -Djclouds.mpu.parts.size=16777216 -Dopscenter.ssl.trustStore=ssl/agentKeyStore -Dopscenter.ssl.keyStore=ssl/agentKeyStore -Dopscenter.ssl.keyStorePassword=opscenter -Dagent-pidfile=./datastax-agent.pid -Dlog4j.configuration=file:./conf/log4j.properties -Djava.security.auth.login.config=./conf/kerberos.config -jar datastax-agent-5.2.2-standalone.jar ./conf/address.yaml
Seed:
497 4375 1 2 Nov30 ? 03:42:23 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el6_7.x86_64/jre/bin/java -Xmx128M -Djclouds.mpu.parts.magnitude=100000 -Djclouds.mpu.parts.size=16777216 -Dopscenter.ssl.trustStore=/var/lib/datastax-agent/ssl/agentKeyStore -Dopscenter.ssl.keyStore=/var/lib/datastax-agent/ssl/agentKeyStore -Dopscenter.ssl.keyStorePassword=opscenter -Dagent-pidfile=/var/run/datastax-agent/datastax-agent.pid -Dlog4j.configuration=file:./conf/log4j.properties -Djava.security.auth.login.config=./conf/kerberos.config -jar datastax-agent-5.2.0-standalone.jar /var/lib/datastax-agent/conf/address.yaml
We are completely new to DSE and RHEL.
drwxr-xr-x 4 cassandra cassandra 4096 Sep 17 12:20 datastax-agent
drwxr-xr-x 7 root root 4096 Nov 30 14:31 datastax-agent-5.2.2
drwxr-xr-x 4 root root 4096 Nov 30 14:31 datastax-agent-old
datastax-agent:
total 24836
drwxrwxr-x 7 cassandra cassandra 4096 Sep 17 12:20 .
drwxr-xr-x. 95 root root 4096 Dec 1 17:08 ..
drwxrwxr-x 3 cassandra cassandra 4096 Nov 16 13:16 bin
drwxrwxr-x 2 cassandra cassandra 4096 Sep 17 12:20 conf
-rw-rw-r-- 1 cassandra cassandra 25402316 Jul 14 12:19 datastax-agent-5.2.0-standalone.jar
drwxrwxr-x 2 cassandra cassandra 4096 Sep 17 12:20 doc
drwxrwxr-x 2 cassandra cassandra 4096 Sep 17 12:20 ssl
drwxrwxr-x 3 cassandra cassandra 4096 Sep 17 12:20 tmp
datastax-agent-5.2.2:
total 25044
drwxr-xr-x 7 root root 4096 Dec 1 17:08 .
drwxr-xr-x. 95 root root 4096 Dec 1 17:08 ..
drwxr-xr-x 3 root root 4096 Dec 1 17:08 bin
drwxr-xr-x 2 root root 4096 Dec 1 17:08 conf
-rw-r--r-- 1 root root 25608470 Dec 1 17:08 datastax-agent-5.2.2-standalone.jar
-rw-r--r-- 1 root root 5 Dec 1 18:06 datastax-agent.pid
drwxr-xr-x 2 root root 4096 Dec 1 17:08 doc
drwxr-xr-x 2 root root 4096 Dec 1 17:08 log
drwxr-xr-x 2 root root 4096 Dec 1 17:08 ssl
How did you install DSE: rpm, tarball or standalone installer? Either way, to get the new agent in place, the only thing you need is the new jar file, so drop the one from the tarball into the location where you see datastax-agent-5.2.0-standalone.jar (which unfortunately varies based upon the install method you used, hence my question above :-). Move the old jar out of the way and restart the agent process (/etc/init.d/datastax-agent stop followed by /etc/init.d/datastax-agent start).
To upgrade an OpsCenter agent installed from a tarball, simply extract the new tarball into the directory where the agent was previously installed and remove the old jar file (datastax-agent-5.2.0-standalone.jar in this case).
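The jar swap described above can be sketched in Python as follows. This is only an illustration run against temporary directories: the directory names mirror the listing in the question, the helper name swap_agent_jar is hypothetical, and the real install location depends on how the agent was installed. Stopping and starting the agent is not covered here.

```python
import shutil
import tempfile
from pathlib import Path


def swap_agent_jar(install_dir, new_jar, backup_dir):
    """Move older standalone jars aside, then copy the new jar into place."""
    install_dir, new_jar, backup_dir = Path(install_dir), Path(new_jar), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materialises the listing so the directory is not modified mid-iteration.
    for old in sorted(install_dir.glob("datastax-agent-*-standalone.jar")):
        if old.name != new_jar.name:
            shutil.move(str(old), str(backup_dir / old.name))
            moved.append(old.name)
    shutil.copy2(new_jar, install_dir / new_jar.name)
    return moved


# Demonstration with stand-in files; no real agent installation is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    install = root / "datastax-agent"
    unpacked = root / "datastax-agent-5.2.2"
    install.mkdir()
    unpacked.mkdir()
    (install / "datastax-agent-5.2.0-standalone.jar").write_bytes(b"old")
    (unpacked / "datastax-agent-5.2.2-standalone.jar").write_bytes(b"new")
    moved = swap_agent_jar(
        install,
        unpacked / "datastax-agent-5.2.2-standalone.jar",
        root / "datastax-agent-old",
    )
    remaining = sorted(p.name for p in install.glob("*.jar"))

print(moved)      # ['datastax-agent-5.2.0-standalone.jar']
print(remaining)  # ['datastax-agent-5.2.2-standalone.jar']
```

Keeping the old jar in a backup directory rather than deleting it makes it easy to roll back if the new agent does not start.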

Is it safe to delete docker logs generated at /var/lib/docker/containers/HASH

The log is currently 13GB. I don't know whether it is safe to delete it, or how to make it smaller.
root@faith:/var/lib/docker/containers/f1ac17e833be2e5d1586d34c51324178bd18f969d1046cbb59f10eaa4bcf84be# ls -alh
total 13G
drwx------ 2 root root 4.0K Mar 6 08:35 .
drwx------ 3 root root 4.0K Feb 24 11:00 ..
-rw-r--r-- 1 root root 2.1K Feb 24 10:15 config.json
-rw------- 1 root root 13G Feb 25 00:27 f1ac17e833be2e5d1586d34c51324178bd18f969d1046cbb59f10eaa4bcf84be-json.log
-rw-r--r-- 1 root root 611 Feb 24 10:15 hostconfig.json
-rw-r--r-- 1 root root 13 Feb 24 10:15 hostname
-rw-r--r-- 1 root root 175 Feb 24 10:15 hosts
-rw-r--r-- 1 root root 61 Feb 24 10:15 resolv.conf
-rw-r--r-- 1 root root 71 Feb 24 10:15 resolv.conf.hash
Congratulations, you have discovered one of The Big Unsolved Problems with Docker!
As Nathaniel says, Docker assumes it has complete ownership of things under /var/lib/docker so trying to delete files there from behind Docker's back may not work.
However, based on comments in issue 7333 and in PR 9753, it looks like people are successfully using logrotate and the copytruncate directive to rotate docker logs. Both these links are worth reading, because they contain a long discussion about the pitfalls of Docker logging and some potential solutions.
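The idea behind copytruncate is to copy the live log to a rotated file and then truncate the original in place, so the process writing to it keeps its open file handle. A minimal Python sketch of that idea, using a temporary file rather than anything under /var/lib/docker (the function name copytruncate and the file names are illustrative, and this is not logrotate's actual code):

```python
import os
import shutil
import tempfile


def copytruncate(log_path, rotated_path):
    """Copy the log aside, then empty the original without replacing the file."""
    shutil.copyfile(log_path, rotated_path)
    # Truncating in place keeps the same inode, so an existing writer's
    # file descriptor stays valid. Lines written between the copy and the
    # truncate are lost.
    os.truncate(log_path, 0)


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "container-json.log")
    rotated = log + ".1"
    writer = open(log, "a")  # stands in for the long-running log writer
    writer.write("old line\n")
    writer.flush()
    copytruncate(log, rotated)
    writer.write("new line\n")  # same handle, still works after rotation
    writer.flush()
    writer.close()
    with open(rotated) as fh:
        rotated_text = fh.read()
    with open(log) as fh:
        live_text = fh.read()

print(repr(rotated_text))  # 'old line\n'
print(repr(live_text))     # 'new line\n'
```

The writer here opens the file in append mode, so its next write lands at the new end of the truncated file instead of leaving a gap at the old offset.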
Ideally, Docker itself would have much better native support for log management. Until then, here are some alternatives to consider:
If you control the source for your applications, you can configure everything to log to syslog rather than to stdout/stderr. You then have a variety of solutions you can pursue, from running a syslog service inside your container to exposing the host's /dev/log inside the container.
Another option is to run systemd inside your container, and use this to start your services. systemd will collect stdout/stderr from your services and feed that to journald, and journald will take care of things like log rotation (and also give you a reasonably flexible mechanism for querying the logs).
These ought to be cleaned up when you delete the container. (Thus, it is not OK to delete them, because Docker believes that it has control of /var/lib/docker.)
