Docker container with status "Dead" after consul healthcheck runs - docker

I am using consul's healthcheck feature, and I keep getting these "dead" containers:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
20fd397ba638 progrium/consul:latest "\"/bin/bash -c 'cur 15 minutes ago Dead
What is exactly a "Dead" container? When does a stopped container become "Dead"?
For the record, I run the progrium/consul and gliderlabs/registrator images with SERVICE_XXXX_CHECK env variables to do health checking. Every X seconds it runs a healthcheck script in an image, something like docker run --rm my/img healthcheck.sh
I'm interested in general in what "dead" means and how to prevent it from happening. Another peculiar thing is that my dead containers have no name.
This is some info from the container inspection:
"State": {
"Dead": true,
"Error": "",
"ExitCode": 1,
"FinishedAt": "2015-05-30T19:00:01.814291614Z",
"OOMKilled": false,
"Paused": false,
"Pid": 0,
"Restarting": false,
"Running": false,
"StartedAt": "2015-05-30T18:59:51.739464262Z"
},
The strange thing is that it's only every now and then that a container becomes dead and isn't removed.
Thank you
Edit:
Looking at the logs, I found what makes the container removal fail:
Handler for DELETE /containers/{name:.*} returned error: Cannot destroy container 003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc:
Driver aufs failed to remove root filesystem 003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc:
rename /var/lib/docker/aufs/diff/003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc
/var/lib/docker/aufs/diff/003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc-removing:
device or resource busy
Why does this happen?
edit2:
found this: https://github.com/docker/docker/issues/9665

Update March 2016: issue 9665 has just been closed by PR 21107 (for docker 1.11 possibly)
That should help avoid the "Driver aufs failed to remove root filesystem", "device or resource busy" problem.
Original answer May 2015
Dead is one of the container states, which is tested by Container.Start():
if container.removalInProgress || container.Dead {
    return fmt.Errorf("Container is marked for removal and cannot be started.")
}
A container is marked Dead when stopping it fails, in order to prevent it from being restarted.
Amongst the possible causes of failure, see container.Kill().
It means kill -15 and kill -9 are both failing.
// 1. Send a SIGTERM
if err := container.killPossiblyDeadProcess(15); err != nil {
    logrus.Infof("Failed to send SIGTERM to the process, force killing")
    if err := container.killPossiblyDeadProcess(9); err != nil {
That usually means, as the OP mentions, a busy device or resource preventing the process from being killed.

There are a lot of bugs caused by EBUSY, in particular when devicemapper is used.
There is a tracker bug for all of the EBUSY related issues.
see https://github.com/docker/docker/issues/5684#issuecomment-69052334
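Until the fix lands, dead containers have to be cleaned up by hand (sometimes only after unmounting the busy filesystem or restarting the daemon). A minimal sketch with the docker SDK for Python (an assumption, not part of the OP's setup; removal is best-effort and may still hit the same EBUSY error):
import docker
from docker.errors import APIError

client = docker.from_env()
# "dead" is a valid value for the status filter of the container list API
for container in client.containers.list(all=True, filters={"status": "dead"}):
    print("retrying removal of", container.id[:12])
    try:
        container.remove(force=True)
    except APIError as err:
        # typically the same "device or resource busy" error quoted above
        print("still cannot remove:", err)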

Related

How to configure a JanusGraph image for running in docker

I followed this installation guide: https://docs.janusgraph.org/getting-started/installation/
I run:
docker run -it -p 8182:8182 janusgraph/janusgraph
but when I try to connect with the Gremlin console I get this exception:
gremlin-driver-initializer] INFO org.apache.tinkerpop.gremlin.driver.ConnectionPool - Signalled closing of connection pool on Host{address=localhost/127.0.0.1:8182, hostUri=ws://localhost:8182/gremlin} with core size of 2
18:32:42.556 [gremlin-driver-initializer] ERROR org.apache.tinkerpop.gremlin.driver.Client - Could not initialize client for Host{address=localhost/127.0.0.1:8182, hostUri=ws://localhost:8182/gremlin}
18:32:42.560 [main] ERROR org.apache.tinkerpop.gremlin.driver.Client -
java.net.ConnectException: Connection refused: no further information
I tried with Docker Desktop and realized that my container automatically stops after 26 seconds. I have read that Docker containers automatically stop when nothing is running in them. When I inspect it there is this message:
/etc/opt/janusgraph/janusgraph-server.yaml will be used to start JanusGraph Server in foreground.
Could you help me configure it?
When you start the container with JanusGraph Server as you did, it should proceed with log messages until:
6028 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
6028 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Channel started at port 8182.
It then keeps running in a waiting loop for clients to connect.
When you connect from a local Gremlin Console with the same TinkerPop version as logged by the JanusGraph container, the session should proceed as follows:
plugin activated: janusgraph.imports
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode
gremlin> g.V()
gremlin> g.addV()
==>v[4264]
Thanks for your help, HadoopMarc, but it wasn't an issue of a different console version. I tried calling:
docker inspect <container-id>
and discovered this:
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": true,
"Dead": false,
"Pid": 0,
"ExitCode": 137,
"Error": "",
"StartedAt": "2022-11-22T08:18:16.4661912Z",
"FinishedAt": "2022-11-22T08:19:21.7929991Z"
},
with "OOMKilled": true. I affect more RAM to my docker container ( swith from 1Go to 20 Go and my docker container successfully started ! :)
But still with no logs .. :(
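For anyone hitting the same thing: exit code 137 together with "OOMKilled": true means the kernel killed the process for running out of memory. A small sketch with the docker SDK for Python (an assumption, not something the question used) to flag such containers:
import docker

client = docker.from_env()
for container in client.containers.list(all=True, filters={"status": "exited"}):
    state = client.api.inspect_container(container.id)["State"]
    if state.get("OOMKilled"):
        print(container.name, "was OOM-killed, exit code", state["ExitCode"])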

Facing issue while deploying Docker images through AWS-Greengrass Connector Service

BACKGROUND:
We are trying to deploy an app as a Docker container through the AWS Greengrass Connector Service to the edge device (running Greengrass Core as a container in a Linux environment).
We are configuring the Greengrass group connector in the cloud for Docker app deployment.
ISSUES:
While deploying from the AWS Greengrass group (AWS cloud), we see a successful deployment message, but the application is not getting deployed to the edge device (running Greengrass Core as a container).
LOGS:
DockerApplicationDeploymentLog:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"
[2020-11-05T10:35:44.789Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:41,docker deployer starting up
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:45,checking inputs
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:52,docker group permissions
[2020-11-05T10:35:45.02Z][FATAL]-lambda_runtime.py:141,Failed to import handler function "handlers.function_handler" due to exception: "getgrnam(): name not found: 'docker'"
RuntimeSystemLog:
[2020-11-05T10:31:49.78Z][DEBUG]-Restart worker because it was killed. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Reserve worker. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Doing start attempt: {"Attempt count": 0, "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Creating directory. {"dir": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.78Z][DEBUG]-changed ownership {"path": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5", "new uid": 121, "new gid": 121}
[2020-11-05T10:31:49.782Z][DEBUG]-Resolving environment variable {"Variable": "PYTHONPATH=/greengrass/ggc/deployment/lambda/arn.aws.lambda.ap-south-1.aws.function.DockerApplicationDeployment.6"}
[2020-11-05T10:31:49.79Z][DEBUG]-Resolving environment variable {"Variable": "PATH=/usr/bin:/usr/local/bin"}
[2020-11-05T10:31:49.799Z][DEBUG]-Resolving environment variable {"Variable": "DOCKER_DEPLOYER_DOCKER_COMPOSE_DESTINATION_FILE_PATH=/home/ggc_user"}
[2020-11-05T10:31:49.82Z][DEBUG]-Creating new worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.82Z][DEBUG]-Starting worker process. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.829Z][DEBUG]-Worker process started. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:49.83Z][DEBUG]-Start work result: {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "state": "Starting", "initDurationSeconds": 0.012234454}
[2020-11-05T10:31:49.831Z][INFO]-Created worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:53.155Z][DEBUG]-Received a credential provider request {"serverLambdaArn": "arn:aws:lambda:::function:GGTES", "clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager getting work {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "funcArn": "arn:aws:lambda:::function:GGTES", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully GET work. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager putting work result. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager put work result successfully. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.157Z][DEBUG]-Handled a credential provider request {"clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.158Z][DEBUG]-GET work item. {"fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.158Z][DEBUG]-Worker timer doesn't exist. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df"}
Did you double-check that you meet the requirements listed in
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html#docker-app-connector-linux-user
I don't know this particular error, but it complains about some missing basic user/group settings:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"
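If you want to verify this on the edge device, the same getgrnam() lookup can be reproduced with the Python standard library (Unix only); the 'docker' group must exist and the Greengrass user must belong to it, per the linked requirements:
import grp

try:
    docker_group = grp.getgrnam("docker")
    print("docker group exists, gid:", docker_group.gr_gid, "members:", docker_group.gr_mem)
except KeyError:
    # this is the condition the Greengrass connector is reporting
    print("no 'docker' group on this host - create it and add the Greengrass user to it")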

docker: Error creating container: 400 Client Error: Bad Request ("invalid reference format")

While trying to build an awx image (Ansible works) for ppc64le, the following comes up:
TASK [image_build : Build AWX distribution using container] ***************************************************************************************************************************************************
fatal: [localhost -> localhost]: FAILED! => {"changed": false, "msg": "Error creating container: 400 Client Error: Bad Request (\"invalid reference format\")"}
to retry, use: --limit #/root/awx/installer/install.retry
PLAY RECAP ****************************************************************************************************************************************************************************************************
localhost : ok=10 changed=3 unreachable=0 failed=1
How can I see what really happens in the background? Are there any verbose Docker logs I can look at? The message itself is somewhat useless to me. I already set Ansible to verbose, but that was of no help either.
Docker image names can only consist of lowercase letters (a-z), digits, and a few separators; uppercase characters are rejected with "invalid reference format".
Either you are giving an unsupported image name, or the variables (or paths) passed to the build (or the container) cannot be resolved.
To enable debug logs, add "--debug" to the docker daemon command line (/etc/systemd/system/multi-user.target.wants/docker.service for systemd-based Linux environments).
For reference: https://docs.docker.com/config/daemon/#configure-the-docker-daemon
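Alternatively, per the linked docs, debug logging can be turned on through the daemon configuration file /etc/docker/daemon.json instead of editing the unit file:
{
  "debug": true
}
then restart the daemon (or send it SIGHUP to reload the configuration).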

Docker container RestartCount not incrementing

Test
def test_can_pop_new_container(self):
    config = {
        'ip': '10.49.0.2',
        'subnet': '10.49.0.0/16',
        'gateway': '10.49.0.202',
        'vlan': 102,
        'hostname': 'test-container',
    }
    container = container_services.pop_new_container(config, self.docker_api)
    inspection = self.docker_api.inspect_container(container.get('Id'))
    print('before', inspection.get('RestartCount'), inspection.get('StartedAt'))
    container_services.restart(container, self.docker_api)
    new_inspection = self.docker_api.inspect_container(container.get('Id'))
    print('after', new_inspection.get('RestartCount'), new_inspection.get('StartedAt'))
Code
def restart(container, docker_client):
    return docker_client.restart(container.get('Id'))
Output
From the test I get
before 0 None
after 0 None
From docker ps I can confirm that the container restarted.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
86f16438ffdd docker.akema.fr:5000/coaxis/coaxisopt_daemon:latest "/usr/bin/supervis..." 28 seconds ago Up 17 seconds confident_dijkstra
Question
Why is RestartCount still at 0 then? Am I using the wrong field?
As already indicated in the comment, the field RestartCount is used in the context of Restart Policies to keep track of restart attempts in case of failures.
It will not be incremented in case of user-initiated restarts.
You can look at docker events to keep track of normal container restarts. This is also available in docker-py; see the sketch below.
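A minimal sketch with the docker SDK for Python (an assumption about your client library; RestartCount is only incremented by the daemon's restart policy, so user-initiated restarts are observed through the event stream instead):
import docker

client = docker.from_env()
# blocks and yields one dict per matching event
for event in client.events(decode=True, filters={"event": "restart"}):
    print(event["Actor"]["ID"][:12], event["Action"], event["time"])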

Debugging Elastic Beanstalk Docker run failures?

I'm new to EB and AWS, and my Docker images build fine but fail to run on Elastic Beanstalk. My suspicion is that they are not connecting to the database correctly; however, I'm not getting anything useful when I run "eb logs" from the command line. Here are the errors:
{
"status": "FAILURE",
"api_version": "1.0",
"results": [
{
"status": "FAILURE",
"msg": "(TRUNCATED)...rrun.aws.json: No such file or directory
73927c49adff622a1a229d9369bdd80674d96d20f3eb99a9cdea786f4411a368
Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Wed May 20 17:15:02 UTC 2015:.
Check snapshot logs for details.
Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/04run.sh failed.
For more detail, check /var/log/eb-activity.log using console or EB CLI",
"returncode": 1,
"events": [
{
"msg": "Successfully pulled node:0.12.2-slim",
"severity": "TRACE",
"timestamp": 1432142064
},
{
"msg": "Successfully built aws_beanstalk/staging-app",
"severity": "TRACE",
"timestamp": 1432142094
},
{
"msg": "Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Wed May 20 17:15:02 UTC 2015:. Check snapshot logs for details.",
"severity": "ERROR",
"timestamp": 1432142102
}
]
}
],
"truncated": "true"
}
And after the build completes:
[2015-05-20T17:15:02.694Z] INFO [8603] - [CMD-AppDeploy/AppDeployStage0/AppDeployPreHook/04run.sh] : Activity execution failed, because: cat: /var/app/current/Dockerrun.aws.json: No such file or directory
cat: /var/app/current/Dockerrun.aws.json: No such file or directory
73927c49adff622a1a229d9369bdd80674d96d20f3eb99a9cdea786f4411a368
Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Wed May 20 17:15:02 UTC 2015:. Check snapshot logs for details. (ElasticBeanstalk::ExternalInvocationError)
caused by: cat: /var/app/current/Dockerrun.aws.json: No such file or directory
cat: /var/app/current/Dockerrun.aws.json: No such file or directory
73927c49adff622a1a229d9369bdd80674d96d20f3eb99a9cdea786f4411a368
Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Wed May 20 17:15:02 UTC 2015:. Check snapshot logs for details. (Executor::NonZeroExitStatus)
The Docker containers work locally, so what else can I do to figure out what's going wrong? I keep hearing about "snapshot logs", but where do I check these snapshot logs? Are they the output of the "eb logs" command I'm already running?
I had this issue for a day or two. I managed to see the logs by going to AWS Console > Elastic Beanstalk > Environment > ${YOUR_APPLICATION_ENV}
On the left pane:
Log > Request Logs > Download > Open in any text editor.
/var/log/eb-docker/containers/eb-current-app/
Follow that path and you will see what is causing the error and can fix it.
Assuming you have SSH access to the EC2 instance running your container, these are a few log files useful for debugging single container Docker instances in Beanstalk:
/tmp/docker_build.log
/tmp/docker_pull.log
/tmp/docker_run.log
In order to look at the error logs for the running process, first read the
/tmp/docker_run.log file. This file contains the Docker container id, something like this:
c6ae58e4ad77e926f6a8230237acf95771c6b5d80d48fb1bc20591f964fd690c
The first few characters should match the container id listed by the docker ps command. Use this value to find the corresponding log file in the following directory:
/var/log/eb-docker/containers/eb-current-app/
The format of the file name is eb-docker-ps-id-stdouterr.log
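Putting the two together, a rough sketch of that lookup in Python (paths taken from this answer; the exact file-name prefix may differ, so it globs on the id):
from pathlib import Path

# /tmp/docker_run.log holds the container id (assumed to be the last token in the file)
container_id = Path("/tmp/docker_run.log").read_text().split()[-1]
log_dir = Path("/var/log/eb-docker/containers/eb-current-app/")
for log_file in log_dir.glob(f"*{container_id[:12]}*stdouterr.log"):
    print(log_file.read_text())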
I had this issue when my containers were crashing because no traffic was allowed between Elastic Beanstalk and RDS. If you use any database, try curling it. Also, you might want to try sudo docker logs CONTAINER_ID and try to catch something useful. It might also help to try launching the container manually from the instance. There's a slight possibility something will come up.
