AWS ECS container exiting without specific reason - docker

The ECS service exits after running for only a couple of seconds. If we start a container from the same image manually, it runs fine with this command: sudo docker run -i -p 9100:9100 -p 9110:9110 -p 9120:9120 -p 9130:9130 847782638323.dkr.ecr.us-east-1.amazonaws.com/bytemark/cap
ECS logs are as follows:
2017-01-23T19:37:00Z [INFO] Created docker container for task bytemark-cap:2 arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e, Status: (CREATED->RUNNING) Containers: [bytemark-cap-container (CREATED->RUNNING),]: bytemark-cap-container(847782638323.dkr.ecr.us-east-1.amazonaws.com/bytemark/cap:latest) (CREATED->RUNNING) -> c8ee05c5cd688209939d96b54cb0c74c4122686036362d7bfa19f85bdc2dd56e
2017-01-23T19:37:00Z [INFO] Starting container module="TaskEngine" task="bytemark-cap:2 arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e, Status: (CREATED->RUNNING) Containers: [bytemark-cap-container (CREATED->RUNNING),]" container="bytemark-cap-container(847782638323.dkr.ecr.us-east-1.amazonaws.com/bytemark/cap:latest) (CREATED->RUNNING)"
2017-01-23T19:37:01Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e Status:RUNNING Reason: SentStatus:NONE}"
2017-01-23T19:37:01Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e bytemark-cap-container -> RUNNING, Ports [{9100 9100 0.0.0.0 0} {9110 9110 0.0.0.0 0} {9120 9120 0.0.0.0 0} {9130 9130 0.0.0.0 0}], Known Sent: NONE"
2017-01-23T19:37:01Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e -> RUNNING, Known Sent: NONE"
2017-01-23T19:37:01Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e bytemark-cap-container -> RUNNING, Ports [{9100 9100 0.0.0.0 0} {9110 9110 0.0.0.0 0} {9120 9120 0.0.0.0 0} {9130 9130 0.0.0.0 0}], Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e bytemark-cap-container -> RUNNING, Ports [{9100 9100 0.0.0.0 0} {9110 9110 0.0.0.0 0} {9120 9120 0.0.0.0 0} {9130 9130 0.0.0.0 0}], Known Sent: NONE"
2017-01-23T19:37:01Z [INFO] Redundant container state change for task bytemark-cap:2 arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e, Status: (RUNNING->RUNNING) Containers: [bytemark-cap-container (RUNNING->RUNNING),]: bytemark-cap-container(847782638323.dkr.ecr.us-east-1.amazonaws.com/bytemark/cap:latest) (RUNNING->RUNNING) to RUNNING, but already RUNNING
2017-01-23T19:37:01Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e -> RUNNING, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e -> RUNNING, Known Sent: NONE"
2017-01-23T19:37:01Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e Status:STOPPED Reason: SentStatus:RUNNING}"
2017-01-23T19:37:01Z [INFO] Error retrieving stats for container c8ee05c5cd688209939d96b54cb0c74c4122686036362d7bfa19f85bdc2dd56e: context canceled
2017-01-23T19:37:01Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e bytemark-cap-container -> STOPPED, Exit 0, , Known Sent: RUNNING"
2017-01-23T19:37:01Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-east-1:847782638323:task/5d0c49a2-9591-469e-b05b-ebbaaefac26e -> STOPPED, Known Sent: RUNNING"
2017-01-23T19:37:01Z [INFO] Container c8ee05c5cd688209939d96b54cb0c74c4122686036362d7bfa19f85bdc2dd56e is terminal, stopping stats collection
Is there a way to debug the actual cause of the ECS service exiting?
Thank you.

If you can SSH into the Docker instances (possible if you created them through ECS and specified a key pair, or if you are using your own instances), you can get the logs straight from Docker.
I had an issue where my only container was dying after a few seconds. You can list all containers, including stopped ones, with
docker ps -a
Find the container you want the logs for and run
docker logs <container-id>
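Beyond docker logs, docker inspect reports the recorded exit state of a stopped container as JSON. A minimal Python sketch for pulling out the fields that usually explain an exit (the sample document below is illustrative, made-up data, not real output from this task):

```python
import json

# Illustrative (made-up) fragment of `docker inspect <container-id>` output;
# the real command returns a JSON array with many more fields.
sample = """
[{"State": {"Status": "exited", "ExitCode": 137,
            "OOMKilled": true, "Error": ""}}]
"""

def exit_summary(inspect_json):
    # The State block records why the container stopped: the process's
    # exit code, whether the kernel OOM-killed it, and any daemon error.
    state = json.loads(inspect_json)[0]["State"]
    return state["ExitCode"], state["OOMKilled"], state["Error"]

code, oom_killed, error = exit_summary(sample)
print(code, oom_killed, error)
```

An exit code of 137, for example, corresponds to SIGKILL; together with OOMKilled being true it would point at a memory limit, such as the one in the ECS task definition.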

Related

Included container doesn't work with docker-compose

I have a Kafka/ZooKeeper container and a Divolte container defined in https://github.com/divolte/docker-divolte/blob/master/docker-compose.yml, which start correctly and work via
docker-compose up -d --build
I want to add the HDFS container from https://hub.docker.com/r/mdouchement/hdfs/, which starts correctly and works via
docker run -p 22022:22 -p 8020:8020 -p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 -it mdouchement/hdfs
But after adding this code to the yml:
hdfs:
  image: mdouchement/hdfs
  environment:
    DIVOLTE_KAFKA_BROKER_LIST: kafka:9092
  ports:
    - "22022:22"
    - "8020:8020"
    - "50010:50010"
    - "50020:50020"
    - "50070:50070"
    - "50075:50075"
  depends_on:
    - kafka
The web UI at http://localhost:50070 and the namenode at http://localhost:8020/ did not answer. Could you help me add the new container? Which of the HDFS ports do I have to use as the source connection port?
The logs of the HDFS container are:
2020-02-21T15:11:47.613270635Z Starting OpenBSD Secure Shell server: sshd.
2020-02-21T15:11:50.440130986Z Starting namenodes on [localhost]
2020-02-21T15:11:54.616344960Z localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
2020-02-21T15:11:54.616369660Z localhost: starting namenode, logging to /opt/hadoop/logs/hadoop-root-namenode-278b399bc998.out
2020-02-21T15:11:59.328993612Z localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
2020-02-21T15:11:59.329016212Z localhost: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-278b399bc998.out
2020-02-21T15:12:06.078269195Z Starting secondary namenodes [0.0.0.0]
2020-02-21T15:12:10.837364362Z 0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
2020-02-21T15:12:10.839375064Z 0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-root-secondarynamenode-278b399bc998.out
2020-02-21T15:12:17.249040842Z starting portmap, logging to /opt/hadoop/logs/hadoop--portmap-278b399bc998.out
2020-02-21T15:12:18.253954832Z DEPRECATED: Use of this script to execute hdfs command is deprecated.
2020-02-21T15:12:18.253993233Z Instead use the hdfs command for it.
2020-02-21T15:12:18.254002633Z
2020-02-21T15:12:21.277829129Z starting nfs3, logging to /opt/hadoop/logs/hadoop--nfs3-278b399bc998.out
2020-02-21T15:12:22.284864146Z DEPRECATED: Use of this script to execute hdfs command is deprecated.
2020-02-21T15:12:22.284883446Z Instead use the hdfs command for it.
2020-02-21T15:12:22.284887146Z
Port description:
Portmap -> 111
NFS -> 2049
HDFS namenode -> 8020 (hdfs://localhost:8020)
HDFS datanode -> 50010
HDFS datanode (ipc) -> 50020
HDFS Web browser -> 50070
HDFS datanode (http) -> 50075
HDFS secondary namenode -> 50090
SSH -> 22
The docker-compose ps output is:
Name Command State Ports
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
divolte-streamsets-quickstart_divolte_1 /opt/divolte/start.sh Up 0.0.0.0:8290->8290/tcp
divolte-streamsets-quickstart_hdfs_1 /bin/sh -c service ssh sta ... Exit 0
divolte-streamsets-quickstart_kafka_1 supervisord -n Up 2181/tcp, 9092/tcp, 9093/tcp, 9094/tcp, 9095/tcp, 9096/tcp, 9097/tcp, 9098/tcp, 9099/tcp
divolte-streamsets-quickstart_streamsets_1 /docker-entrypoint.sh dc -exec Up 0.0.0.0:18630->18630/tcp
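One difference worth noting between the two invocations above: the manual docker run command uses -it, while Compose does not allocate an interactive TTY by default, which is consistent with the hdfs service showing Exit 0. A sketch of the service with those flags mirrored (an assumption to verify, not a confirmed fix):

```yaml
hdfs:
  image: mdouchement/hdfs
  # Mirror the -i and -t flags from the manual `docker run -it` invocation:
  stdin_open: true
  tty: true
  ports:
    - "8020:8020"
    - "50070:50070"
```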

How to add advanced Kafka configurations to the Landoop `fast-data-dev` image

Does anyone know how to pass custom Kafka configuration options to the landoop/fast-data-dev docker image?
Since there is no way to pass a custom config file and/or config params, what I've tried so far is to mount my own server.properties config file into /opt/confluent/etc/kafka by adding the following to my docker-compose file:
landoop:
  hostname: 'landoop'
  image: 'landoop/fast-data-dev:latest'
  expose:
    - '3030'
  ports:
    - '3030:3030'
  environment:
    - RUNTESTS=0
    - RUN_AS_ROOT=1
  volumes:
    - ./docker/landoop/tmp:/tmp
    - ./docker/landoop/opt/confluent/etc/kafka:/opt/confluent/etc/kafka
However, this causes Kafka to emit the following logs:
landoop_1 | 2017-09-28 11:53:03,886 INFO exited: broker (exit status 1; not expected)
landoop_1 | 2017-09-28 11:53:04,749 INFO spawned: 'broker' with pid 281
landoop_1 | 2017-09-28 11:53:05,851 INFO success: broker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
landoop_1 | 2017-09-28 11:53:11,867 INFO exited: rest-proxy (exit status 1; not expected)
landoop_1 | 2017-09-28 11:53:12,604 INFO spawned: 'rest-proxy' with pid 314
landoop_1 | 2017-09-28 11:53:13,024 INFO exited: schema-registry (exit status 1; not expected)
landoop_1 | 2017-09-28 11:53:13,735 INFO spawned: 'schema-registry' with pid 341
landoop_1 | 2017-09-28 11:53:13,739 INFO success: rest-proxy entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
In addition, when I go to http://localhost:3030/kafka-topics-ui/, I see the following:
KAFKA REST
/api/kafka-rest-proxy
CONNECTIVITY ERROR
Any suggestions? Thank you.
There are a couple of things that I did which simplified the whole process. This is valid only for a dev environment.
1. Start the container with an interactive shell as the entry point.
2. Start the container on the host network.
3. Make the necessary changes to the server.properties file, including the host IP where the container is running (as mentioned in step 2, the container is running on the host network).
4. If you want to make any advanced configuration changes, you may do so now.
5. Run the actual entry point, /usr/local/bin/setup-and-run.sh.
Actual commands used:
Start the container:
sudo docker run -it --entrypoint /bin/bash --net=host --rm -e ADV_HOST=HOSTIP landoop/fast-data-dev:latest
Add the following to /run/broker/server.properties:
advertised.host.name = HOST-IP
advertised.port = 9092
Run /usr/local/bin/setup-and-run.sh
As of today, with the latest version of landoop/fast-data-dev, it is possible to specify custom Kafka configuration options by converting the option name to uppercase, replacing dots with underscores, and prepending KAFKA_.
For example, if you want to set specific values for "log.retention.bytes" and "log.retention.hours", you should add the following to the environment section of your docker-compose file:
environment:
  KAFKA_LOG_RETENTION_BYTES: 1073741824
  KAFKA_LOG_RETENTION_HOURS: 48
  ADV_HOST: 127.0.0.1
  RUNTESTS: 0
  BROWSECONFIGS: 1
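The naming rule above (uppercase, dots to underscores, service prefix prepended) can be captured in a small helper; this Python sketch is just an illustration of the convention, not part of the image:

```python
def to_fdd_env(service_prefix, option):
    # fast-data-dev convention: uppercase the property name, replace
    # dots with underscores, and prepend the service prefix.
    return service_prefix + "_" + option.upper().replace(".", "_")

print(to_fdd_env("KAFKA", "log.retention.bytes"))  # KAFKA_LOG_RETENTION_BYTES
print(to_fdd_env("KAFKA", "log.retention.hours"))  # KAFKA_LOG_RETENTION_HOURS
```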
In the same way you can also specify configuration options for the other services (schema registry, connect, rest proxy). Check the docs for details: https://hub.docker.com/r/landoop/fast-data-dev/.
Once the container is up, you can confirm this by looking at the configuration file at the following path inside the container:
/run/broker/server.properties
or through the Landoop UI at the following URL, if you've set BROWSECONFIGS to 1 in the environment parameters:
http://127.0.0.1:3030/config/broker/server.properties

Using Docker with Sensu

I am a novice with Docker and want to use Sensu for monitoring containers. I have set up a Sensu server and a Sensu client (where my Docker containers are running) using the material below:
http://devopscube.com/monitor-docker-containers-guide/
I see the Sensu client information in the Uchiwa dashboard after running the command below:
docker run -d --name sensu-client --privileged \
-v $PWD/load-docker-metrics.sh:/etc/sensu/plugins/load-docker-metrics.sh \
-v /var/run/docker.sock:/var/run/docker.sock \
usman/sensu-client SENSU_SERVER_IP RABIT_MQ_USER RABIT_MQ_PASSWORD CLIENT_NAME CLIENT_IP
However, when I start a new container from the same host machine, I do not see that client's information in the Uchiwa dashboard.
It would be great if anyone who has used Sensu to monitor Docker containers could offer guidance.
Thanks for your time.
Please find the logs of the sensu-client below:
[ec2-user@ip-172-31-0-89 sensu-client]$ sudo su
[root@ip-172-31-0-89 sensu-client]# docker logs sensu-client
/usr/lib/python2.6/site-packages/supervisor-3.1.3-py2.6.egg/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (
including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2017-01-09 04:11:47,210 CRIT Supervisor running as root (no user in config file)
2017-01-09 04:11:47,212 INFO supervisord started with pid 12
2017-01-09 04:11:48,214 INFO spawned: 'sensu-client' with pid 15
2017-01-09 04:11:49,524 INFO success: sensu-client entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2017-01-09 04:11:49,530 INFO exited: sensu-client (exit status 0; expected)

Docker Cloud Service Discovery Two Containers

In Docker Cloud I am trying to get one container to speak with another container. I believe the problem is the hostname not resolving (this is set in /conf.d/kafka.yaml, shown below).
To get the two containers to communicate, I have tried many variations, including the full hostname kafka-development-1, kafka-development-1.kafka, etc.
The error I keep getting is in the datadog-agent info output.
Within the container I run ./etc/init.d/datadog-agent info and receive:
kafka
-----
- instance #kafka-kafka-development-9092 [ERROR]: 'Cannot connect to instance
kafka-development:9092 java.io.IOException: Failed to retrieve RMIServer stub:
javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: \n\tjava.net.SocketException: Connection reset]' collected 0 metrics
- Collected 0 metrics, 0 events & 0 service checks
The steps I take, in detail:
SSH into the Docker node:
$ docker ps
CONTAINER | PORTS
datadog-agent-kafka-development-1.2fb73f62 | 8125/udp, 9001/tcp
kafka-development-1.3dc7c2d0 | 0.0.0.0:9092->9092/tcp
I log into the containers to see their values; this is the datadog-agent:
$ docker exec -it datadog-agent-kafka-development-1.2fb73f62 /bin/bash
$ > echo $DOCKERCLOUD_CONTAINER_HOSTNAME
datadog-agent-kafka-development-1
$ > tail /etc/hosts
172.17.0.7 datadog-agent-kafka-development-1
10.7.0.151 datadog-agent-kafka-development-1
This is the kafka container:
$ docker exec -it kafka-development-1.3dc7c2d0 /bin/bash
$ > echo $DOCKERCLOUD_CONTAINER_HOSTNAME
kafka-development-1
$ > tail /etc/hosts
172.17.0.6 kafka-development-1
10.7.0.8 kafka-development-1
$ > echo $KAFKA_ADVERTISED_HOST_NAME
kafka-development.c23d1d00.svc.dockerapp.io
$ > echo $KAFKA_ADVERTISED_PORT
9092
$ > echo $KAFKA_ZOOKEEPER_CONNECT
zookeeper-development:2181
Datadog conf.d/kafka.yaml:
instances:
  - host: kafka-development
    port: 9092 # This is the JMX port on which Kafka exposes its metrics (usually 9999)
    tags:
      kafka: broker
      env: development
# ... Defaults Below
Can anyone see what I am doing wrong?
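A quick way to test the hostname-resolution theory from inside the agent container is a short Python check; the service names below are the ones from the question and are assumptions to adjust:

```python
import socket

def resolves(hostname):
    # Return the resolved IPv4 address, or None if the name does not
    # resolve (which would explain the datadog-agent connection error).
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None

for name in ("kafka-development", "kafka-development-1"):
    print(name, "->", resolves(name))
```

Note that even if the name resolves, the check speaks JMX, and the comment in the yaml itself says the JMX port is usually 9999, not the broker port 9092.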

No nodes connecting to host in docker swarm

I just followed this tutorial step by step for setting up a docker swarm in EC2 -- https://docs.docker.com/swarm/install-manual/
I created 4 Amazon Servers using the Amazon Linux AMI.
manager + consul
manager
node1
node2
I followed the instructions to start the swarm, and everything seemed to go OK as far as creating the Docker instances.
Server 1
Running docker ps gives:
The Consul logs show this:
2016/07/05 20:18:47 [INFO] serf: EventMemberJoin: 729a440e5d0d 172.17.0.2
2016/07/05 20:18:47 [INFO] serf: EventMemberJoin: 729a440e5d0d.dc1 172.17.0.2
2016/07/05 20:18:48 [INFO] raft: Node at 172.17.0.2:8300 [Follower] entering Follower state
2016/07/05 20:18:48 [INFO] consul: adding server 729a440e5d0d (Addr: 172.17.0.2:8300) (DC: dc1)
2016/07/05 20:18:48 [INFO] consul: adding server 729a440e5d0d.dc1 (Addr: 172.17.0.2:8300) (DC: dc1)
2016/07/05 20:18:48 [ERR] agent: failed to sync remote state: No cluster leader
2016/07/05 20:18:49 [WARN] raft: Heartbeat timeout reached, starting election
2016/07/05 20:18:49 [INFO] raft: Node at 172.17.0.2:8300 [Candidate] entering Candidate state
2016/07/05 20:18:49 [INFO] raft: Election won. Tally: 1
2016/07/05 20:18:49 [INFO] raft: Node at 172.17.0.2:8300 [Leader] entering Leader state
2016/07/05 20:18:49 [INFO] consul: cluster leadership acquired
2016/07/05 20:18:49 [INFO] consul: New leader elected: 729a440e5d0d
2016/07/05 20:18:49 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2016/07/05 20:18:49 [INFO] consul: member '729a440e5d0d' joined, marking health alive
2016/07/05 20:18:50 [INFO] agent: Synced service 'consul'
I registered each node using the following command with the appropriate IPs:
docker run -d swarm join --advertise=x-x-x-x:2375 consul://x-x-x-x:8500
Each of those created a docker instance
Node1
Running docker ps gives:
With logs that suggest there's a problem:
time="2016-07-05T21:33:50Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
time="2016-07-05T21:36:20Z" level=error msg="cannot set or renew session for ttl, unable to operate on sessions"
time="2016-07-05T21:37:20Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
time="2016-07-05T21:39:50Z" level=error msg="cannot set or renew session for ttl, unable to operate on sessions"
time="2016-07-05T21:40:50Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
...
And lastly, when I get to the final step of querying host information on my Consul machine,
docker -H :4000 info
I see no nodes. When I then try the step of running an app, I get the obvious error:
[ec2-user@ip-172-31-3-233 ~]$ docker -H :4000 run hello-world
docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.
[ec2-user@ip-172-31-3-233 ~]$
Thanks for any insight on this. I'm still pretty confused by much of the swarm model and not sure where to go from here to diagnose.
It looks like Consul is either not binding to a public IP address, or is not accessible on its public IP due to security-group or VPC settings. You are setting the discovery URL to consul://172.31.3.233:8500 on the Docker nodes, so I would suggest trying to connect to that address from an external IP, either in your browser or via curl, like this:
% curl http://172.31.3.233:8500/ui/dist/
(this should return the Consul UI HTML)
If you cannot connect (connection refused or timeout) then add a TCP port 8500 ingress rule to your AWS VMs, and try again.
After investigating your issue, I see that you forgot to open port 2375 for the Docker Engine on all four nodes.
Before starting the Swarm manager or a Swarm node, you have to open a TCP port for the Docker Engine, so that Swarm can talk to the Docker Engine on that port.
With Docker on Ubuntu 14.04, you can open the port by editing the file /etc/default/docker and adding -H tcp://0.0.0.0:2375 to DOCKER_OPTS. For example:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
After that, restart the Docker Engine:
service docker restart
If you are using CentOS, the solution is the same; you can read my blog article: https://sonnguyen.ws/install-docker-docker-swarm-centos7/
One other thing: I think you should install and run Consul on all nodes (all 4 servers), so that your Swarm can work with Consul on each node.
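Whichever ports turn out to be the blocker (2375 for the Engine, 8500 for Consul), a small reachability probe helps confirm the security-group changes took effect. A hedged Python sketch (substitute your real node IPs for the placeholders):

```python
import socket

def port_open(host, port, timeout=2.0):
    # True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder addresses; replace with the actual node IPs:
# for ip in ("x.x.x.x", "y.y.y.y"):
#     print(ip, 2375, port_open(ip, 2375), 8500, port_open(ip, 8500))
```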
