Container is unhealthy. Encountered errors while bringing up the project - docker

I upgraded Elasticsearch from 7.10.1 to 7.17.6 by changing the version number in the .env files.
The environment is a 3-node cluster (es01, es02, es03) on the same server, plus one Kibana instance (kib01) on it.
Running sudo docker-compose up -d gives the error below. The unhealthy container mentioned in the error is es01.
Starting es03 ... done
Recreating es01 ... done
Starting es02 ... done
ERROR: for kib01 Container "<container-ID>" is unhealthy.
ERROR: Encountered errors while bringing up the project.
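One way to see why Compose considers the container unhealthy (a rough sketch; it assumes the container is named es01, matching the service name in the output above, and that the service defines a healthcheck) is to dump the health-check results and recent logs:
$ docker inspect --format '{{json .State.Health}}' es01
$ docker logs --tail 50 es01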
/var/log/messages shows the following, in case it is related to the issue:
Sep 9 avahi-daemon[890]: Withdrawing address record for on <ID1>.
Sep 9 avahi-daemon[890]: Withdrawing workstation service for <id>.
Sep 9 libvirtd: 2022-09-09 : 1468: error : virFileReadAll:1460 : Failed to open file '/sys/class/net/<id>/operstate': No such file or directory
Sep 9 libvirtd: 2022-09-09 : 1468: error : virNetDevGetLinkInfo:2552 : unable to read: /sys/class/net/<id>/operstate: No such file or directory
Sep 9 kernel: device <ID1> left promiscuous mode
Sep 9 kernel: : port 2(<ID1>) entered disabled state
Sep 9 avahi-daemon[890]: Withdrawing workstation service for <ID1>.
Sep 9 NetworkManager[884]: <info> [] manager: (<ID2>): new Veth device (/org/freedesktop/NetworkManager/Devices/100)
Sep 9 NetworkManager[884]: <info> [] device (<ID1>): released from master device
Sep 9 kernel: : port 3() entered disabled state
...................................................................
Update:
There is another folder at the same location. When I remove the previously created containers and run sudo docker-compose up -d from this folder, I get the output below, and the Elasticsearch cluster health shows good for all 3 nodes. However, opening the Kibana home URL in the browser still throws a "site can't be reached" error. Any help resolving this issue would be appreciated.
Creating es02 ... done
Creating es03 ... done
Creating es01 ... done
Creating kib01 ... done
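For reference, a quick way to confirm what the update describes from the Docker host (a rough sketch; it assumes Elasticsearch is published on port 9200 and Kibana on 5601, as in Elastic's reference compose files, and that TLS is not enabled on the HTTP layer):
$ curl -s http://localhost:9200/_cluster/health?pretty    # expect "status" : "green"
$ curl -sI http://localhost:5601/                          # Kibana should answer here if it is reachable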

Related

How to connect to Docker containers using hostnames from Docker host? [duplicate]

I want to connect to my Docker containers from my Docker host using hostnames.
I already know how to connect to containers by mapping their ports using docker run -p <host-port>:<container-port> ... and then access them through localhost.
I can also connect to containers using the IP addresses given by docker inspect <container>, but these IP addresses are not static.
How can I give containers hostnames, so that I can connect to them through exposed ports without having to think about non-static IPs?
Use docker-compose and define services in it. Each container becomes part of a service, and one container can talk to another using the name of the service that container belongs to.
Ex:
$ cat docker-compose.yml
version: '3.1'
services:
  server:
    image: redis
    command: [ "redis-server" ]
  client:
    image: redis
    command: [ "redis-cli", "-h", "server", "ping" ]
    links:
      - server
$
$
$ docker-compose up
Starting server_1 ... done
Starting client_1 ... done
Attaching to server_1, client_1
client_1 | PONG
server_1 | 1:C 10 Dec 2019 12:59:20.161 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
server_1 | 1:C 10 Dec 2019 12:59:20.161 # Redis version=5.0.6, bits=64, commit=00000000, modified=0, pid=1, just started
server_1 | 1:C 10 Dec 2019 12:59:20.161 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
server_1 | 1:M 10 Dec 2019 12:59:20.162 * Running mode=standalone, port=6379.
server_1 | 1:M 10 Dec 2019 12:59:20.162 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
server_1 | 1:M 10 Dec 2019 12:59:20.162 # Server initialized
server_1 | 1:M 10 Dec 2019 12:59:20.162 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
server_1 | 1:M 10 Dec 2019 12:59:20.162 * Ready to accept connections
client_1 exited with code 0
Here, I created two services, server and client. server starts a redis-server, and client tries to connect to it. Also, note that I haven't exposed any ports here, so the client container is talking to the server container using server (the service name):
client:
  image: redis
  command: [ "redis-cli", "-h", "server", "ping" ]

Redis is shutting down for no reason in docker container

I am trying to launch a Redis Docker container using docker-compose, but I always get this error. These are my docker-compose commands: docker-compose -f docker-compose.yml build and docker-compose -f docker-compose.yml up -d --force-recreate. I am running the Docker containers on AWS ECS with a t2.micro EC2 instance. I am not sure if that is the reason why. Any insight would be helpful.
I have also included my docker-compose.yml:
version: '2.1'
services:
  redis:
    image: redis:latest
    container_name: redis
    volumes:
      - redis_data:/data
    ports:
      - 6379:6379
  app:
    image: custom_image
    build: .
    depends_on:
      redis:
        condition: service_started
    ports:
      - 8003:8003
    links:
      - redis
volumes:
  redis_data:
Error
1:C 11 Sep 00:18:34.345 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 11 Sep 00:18:34.348 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 11 Sep 00:18:34.348 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 11 Sep 00:18:34.349 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
1:M 11 Sep 00:18:34.349 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
1:M 11 Sep 00:18:34.349 # Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
1:M 11 Sep 00:18:34.350 * Running mode=standalone, port=6379.
1:M 11 Sep 00:18:34.350 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 11 Sep 00:18:34.350 # Server initialized
1:M 11 Sep 00:18:34.350 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 11 Sep 00:18:34.350 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 11 Sep 00:18:34.350 * Ready to accept connections
1:signal-handler (1536625117) Received SIGTERM scheduling shutdown...
1:M 11 Sep 00:18:37.375 # User requested shutdown...
1:M 11 Sep 00:18:37.375 * Saving the final RDB snapshot before exiting.
1:M 11 Sep 00:18:37.378 * DB saved on disk
1:M 11 Sep 00:18:37.378 # Redis is now ready to exit, bye bye...
Ran into the same issue. After a bit of digging, we found that it was killed by systemd because the service had gone inactive.
Running the systemctl show docker.service command shows that the inactive and active enter timestamps match up with when the Redis service stopped and started again:
InactiveEnterTimestamp=Tue 2021-08-03 22:07:19 AEST
ActiveEnterTimestamp=Wed 2021-08-04 09:30:36 AEST
Our solution was simply to perform some activity on Redis periodically so that it doesn't enter the inactive state; a rough sketch is below.
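A minimal keep-alive sketch, assuming the container is named redis as in the compose file above; scheduling it from cron or a systemd timer keeps some traffic flowing to the server:
$ docker exec redis redis-cli ping    # prints PONG while the server is up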

Docker stopped all of sudden in CentOS 7

I was running docker on my CentOS 7 machine.
Today I was trying to upgrade a container, so I stopped it and tried to pull a new image.
I got the error below:
Error getting v2 registry: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp: lookup https_proxy=http: no such host"
I checked the proxy settings for the machine in /etc/environment and for Docker in /etc/systemd/system/docker.service.d/http-proxy.conf.
They are set correctly.
I enabled daemon logs for Docker, and the logs say:
Sep 14 10:43:18 myCentOsServer kernel: [4913751.074277] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer kernel: [4913751.084599] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer kernel: [4913751.084888] docker0: port 1(veth1e3300a) entered disabled state
Sep 14 10:43:18 myCentOsServer NetworkManager[794]: <info> [1505349798.0267] device (veth1e3300a): released from master device docker0
Sep 14 10:44:48 myCentOsServer dockerd[29136]: time="2017-09-14T10:44:48.802236300+10:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: proxyconnect tcp: dial tcp: lookup https_proxy=http: no such host"
I tried the commands below, but they hang:
systemctl daemon-reload
systemctl restart docker
Any idea what might be the issue?
Thanks in advance.
I was finally able to solve this issue.
The issue was with my Docker mount point. Mine was set to /var/lib/docker, and I suspect it got corrupted when I did a data volume export.
Steps I followed (the equivalent commands are sketched after the list):
1) Navigated to /var/lib/docker, took a backup of the images, containers and volumes folders, and deleted them.
2) Reloaded the daemon.
3) Restarted Docker.
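A rough shell sketch of those steps, assuming the default /var/lib/docker data root and a systemd-managed daemon; folder names can vary with the Docker version and storage driver, and this removes local images, containers and volumes from the data root, so keep the backup:
$ sudo systemctl stop docker
$ sudo mkdir -p /root/docker-backup
$ sudo mv /var/lib/docker/image /var/lib/docker/containers /var/lib/docker/volumes /root/docker-backup/
$ sudo systemctl daemon-reload
$ sudo systemctl start docker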
Now it is working fine.
However, the bad news is that I lost the data dump I had taken from one of the containers (using volumes-from).
But it was a dev version of the software, so I reinstalled and redid the setup.
This occurs sometimes in CentOS. You can simply restart the Docker service with:
systemctl restart docker.service

When using Mesos, Marathon, and Zookeeper, my mesos-slave doesn't start when I specify the "containerizers" file with "docker,mesos"?

I have 3 CentOS VMs. I installed Zookeeper, Marathon, and Mesos on the master node, and only Mesos on the other 2 VMs. The master node has no mesos-slave running on it. I am trying to run Docker containers, so I specified "docker,mesos" in the containerizers file. One of the mesos-agents starts fine with this configuration, and I have been able to deploy a container to that slave. However, the second mesos-agent simply fails to start with this configuration (it works if I take out the containerizers file, but then it doesn't run containers). Here are some of the logs and information that has come up:
Here are some "messages" in the log directory:
Apr 26 16:09:12 centos-minion-3 systemd: Started Mesos Slave.
Apr 26 16:09:12 centos-minion-3 systemd: Starting Mesos Slave...
WARNING: Logging before InitGoogleLogging() is written to STDERR
[main.cpp:243] Build: 2017-04-12 16:39:09 by centos
[main.cpp:244] Version: 1.2.0
[main.cpp:247] Git tag: 1.2.0
[main.cpp:251] Git SHA: de306b5786de3c221bae1457c6f2ccaeb38eef9f
[logging.cpp:194] INFO level logging started!
[systemd.cpp:238] systemd version `219` detected
[main.cpp:342] Inializing systemd state
[systemd.cpp:326] Started systemd slice `mesos_executors.slice`
[containerizer.cpp:220] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
[linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
[provisioner.cpp:249] Using default backend 'copy'
[slave.cpp:211] Mesos agent started on (1)#172.22.150.87:5051
[slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
[slave.cpp:541] Agent resources: cpus(*):1; mem(*):919; disk(*):2043; ports(*):[31000-32000]
[slave.cpp:549] Agent attributes: [ ]
[slave.cpp:554] Agent hostname: node3
[status_update_manager.cpp:177] Pausing sending status updates
[state.cpp:62] Recovering state from '/var/lib/mesos/meta'
[state.cpp:706] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
[status_update_manager.cpp:203] Recovering status update manager
[docker.cpp:868] Recovering Docker containers
[containerizer.cpp:599] Recovering containerizer
[provisioner.cpp:410] Provisioner recovery complete
[group.cpp:340] Group process (zookeeper-group(1)#172.22.150.87:5051) connected to ZooKeeper
[group.cpp:830] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
[group.cpp:418] Trying to create path '/mesos' in ZooKeeper
[detector.cpp:152] Detected a new leader: (id='15')
[group.cpp:699] Trying to get '/mesos/json.info_0000000015' in ZooKeeper
[zookeeper.cpp:259] A new leading master (UPID=master#172.22.150.88:5050) is detected
Failed to perform recovery: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1; stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Apr 26 16:09:13 centos-minion-3 systemd: Unit mesos-slave.service entered failed state.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service failed.
Logs from docker:
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: inactive (dead) since Tue 2017-04-25 18:00:03 CDT; 24h ago
     Docs: docs.docker.com
 Main PID: 872 (code=exited, status=0/SUCCESS)
Apr 26 18:25:25 centos-minion-3 systemd[1]: Dependency failed for Docker Application Container Engine.
Apr 26 18:25:25 centos-minion-3 systemd[1]: Job docker.service/start failed with result 'dependency'
Logs from flannel:
[flanneld-start: network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
You have the answer in your logs:
Failed to perform recovery: Collect failed:
Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1;
stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Mesos keeps its state/metadata on local disk. When it is restarted, it tries to load this state. If the configuration has changed and is not compatible with the previous state, it won't start.
Just bring Docker back to life by fixing the problems with flannel and etcd, and everything will be fine; a rough sketch of the recovery steps is below.
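A rough sketch of the remediation quoted from the agent log above, assuming the default --work_dir=/var/lib/mesos and the systemd-managed mesos-slave.service shown in the logs (adjust names to your setup):
$ sudo systemctl start docker                       # get the Docker daemon healthy first
$ sudo rm -f /var/lib/mesos/meta/slaves/latest      # drop the stale agent state so it doesn't try to recover old executors
$ sudo systemctl restart mesos-slave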
Add the following flag when starting the agent:
--reconfiguration_policy=additive
More details here: http://mesos.apache.org/documentation/latest/agent-recovery/

Setting multiple DOCKER_OPTS arguments

If you want to pass an option to the Docker Engine at startup on Ubuntu, you can edit the /etc/default/docker file.
Here I'm setting the storage driver to AUFS:
DOCKER_OPTS="--storage-driver=aufs"
However, if I pass more than one argument, Docker doesn't start. For example:
DOCKER_OPTS="--insecure-registry=0.0.0.0:5000 --storage-driver=aufs"
Now Docker fails to start:
# service docker stop && service docker start
docker start/running, process 31569
# service docker status
docker stop/waiting
From /var/log/syslog:
Mar 11 14:55:30 myhost kernel: [ 2788.030270] init: docker main process (31253) terminated with status 1
Mar 11 14:55:30 myhost kernel: [ 2788.030279] init: docker main process ended, respawning
Mar 11 14:55:30 myhost kernel: [ 2788.085931] init: docker main process (31287) terminated with status 1
Mar 11 14:55:30 myhost kernel: [ 2788.085940] init: docker respawning too fast, stopped
Each argument works on its own, but when they are passed together the Docker service refuses to start. I am using Docker version 1.10.3, build 20f81dd, on Ubuntu 14.04 (kernel 3.13.0-74-generic).
How can I pass more than one argument to DOCKER_OPTS?
The arguments must be separated by a comma (,).
This format works:
DOCKER_OPTS="--insecure-registry=0.0.0.0:5000,--storage-driver=aufs"
