I'm using Jenkins slaves in an AWS ECS cluster, which I configured as described on this page:
Jenkins in ECS.
Normally it works well, but sometimes at rush hour the slave container starts very slowly, taking more than 40 minutes, or the container cannot start at all.
I have to terminate the ECS instance and launch a new one. When the container cannot start, I see this log in ecs-agent:
STOPPED, Reason CannotCreateContainerError: API error (500):
devmapper: Thin Pool has 788 free data blocks which is less than
minimum required 4454 free data blocks. Create more free space in thin
pool or use dm.min_free_space option to change behavior
Here is my docker info, please advise me how to fix this issue.
[root@ip-10-124-2-159 ec2-user]# docker info
Containers: 10
Running: 1
Paused: 0
Stopped: 9
Images: 2
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: ext4
Data file:
Metadata file:
Data Space Used: 8.646 GB
Data Space Total: 23.35 GB
Data Space Available: 14.71 GB
Metadata Space Used: 2.351 MB
Metadata Space Total: 25.17 MB
Metadata Space Available: 22.81 MB
Thin Pool Minimum Free Space: 2.335 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.93-RHEL7 (2015-01-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.39-34.54.amzn1.x86_64
Operating System: Amazon Linux AMI 2016.09
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.8 GiB
Name: ip-10-124-2-159
ID: 6HVT:TWH3:YP6T:GMZO:23TM:EUAA:F7XJ:ISII:QDE7:V2SN:XKFI:XPGZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
And I don't know why only 4 tasks can run at the same time, even though the ECS instance still has resources available. How can I increase it?
Your problem is a very common one when you start and stop containers very often, and the post you mentioned is all about that! It specifically says:
"The Amazon EC2 Container Service Plugin can launch containers on your
ECS cluster that automatically register themselves as Jenkins slaves,
execute the appropriate Jenkins job on the container, and then
automatically remove the container/build slave afterwards"
The problem with this is that, if the stopped containers are not cleaned up, you eventually run out of space in the thin pool, as you have experienced. You can check this yourself if you SSH into the instance and run the following command:
docker ps -a
If you run this command when Jenkins is getting in trouble, you should see an almost endless list of stopped containers. You can delete them all by running the following command:
docker rm -f $(docker ps -a -f status=exited)
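If you just want a quick count of how many exited containers have piled up, a small variation works:
docker ps -aq -f status=exited | wc -l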
However, doing this manually every so often is really not very convenient, so what you really want to do is include the following script in the userData parameter of your ECS instance configuration when you launch it:
echo ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m >> /etc/ecs/ecs.config
echo ECS_CLUSTER=<NAME_OF_CLUSTER> >> /etc/ecs/ecs.config
echo ECS_DISABLE_IMAGE_CLEANUP=false >> /etc/ecs/ecs.config
echo ECS_IMAGE_CLEANUP_INTERVAL=10m >> /etc/ecs/ecs.config
echo ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m >> /etc/ecs/ecs.config
This will instruct the ECS agent to enable a cleanup daemon that checks every 10 minutes (that is the lowest interval you can set) for images to delete, deletes containers 1 minute after the task has stopped, and deletes images which are 30 minutes old and no longer referenced by an active Task Definition. You can learn more about these variables here.
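For reference, a minimal userData sketch that applies all of these at launch (the cluster name is a placeholder, and this assumes an ECS-optimized AMI where the agent reads /etc/ecs/ecs.config):
#!/bin/bash
# Append the cleanup settings to the ECS agent config at boot
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_CLUSTER=my-jenkins-cluster
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1m
ECS_DISABLE_IMAGE_CLEANUP=false
ECS_IMAGE_CLEANUP_INTERVAL=10m
ECS_IMAGE_MINIMUM_CLEANUP_AGE=30m
EOF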
In my experience, this configuration might not be enough if you start and stop containers very fast, so you may want to attach a decent volume to your instance in order to make sure you have enough space to carry on while the daemon cleans up the stopped containers.
Thanks Jose for the answer.
But, this command worked for me in Docker 1.12.*
docker rm $(docker ps -aqf "status=exited")
The 'q' flag returns only the container IDs from the result, and those are what get removed.
If you upgrade to the latest ECS agent (or the latest ECS AMIs, amzn-ami-2017.09.d-amazon-ecs-optimized or later), then you can configure ECS automated cleanup of defunct images, containers and volumes in your ecs config for the EC2 hosts serving the cluster.
This cleans up after a node(label){} clause, but not Docker executions during that build.
node container and its volumes - cleaned
docker images generated by steps executed upon that node - not cleaned
ECS is blind to what happens on that node. Given that the nodes themselves should be the largest things, ECS automated clean up should reduce the need to run a separate cleaning task to a minimum.
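If the images generated during builds do pile up anyway, one option (my own suggestion, not something ECS provides) is to prune them from each container instance on a schedule, e.g. an /etc/cron.d entry like:
# Remove unused images older than a day; images used by running tasks are not touched
0 3 * * * root docker image prune -af --filter "until=24h"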
I have an Apache server in production that is running in a Docker container, which I've deployed to a Google Compute instance using the "gcloud compute instances create-with-container" command. The /var/www/html folder is mounted onto the container from the boot disk of the compute instance to make it persistent, using the --container-mount-host-path flag:
gcloud compute instances create-with-container $INSTANCE_NAME \
--zone=europe-north1-a \
--container-image gcr.io/my-project/my-image:latest \
--container-mount-host-path mount-path=/var/www/html,host-path=/var/www/html,mode=rw \
--machine-type="$MACHINE_TYPE"
But now I've run into the problem that the size of the Docker partition is only 5.7G!
Output of df -h:
...
/dev/sda1 5.7G 3.6G 2.2G 62% /mnt/stateful_partition
overlay 5.7G 3.6G 2.2G 62% /var/lib/docker/overlay2/4f223d8157033ce937a79af741df3eadf79a02d2d003f01a085301ff66884bf2/merged
overlay 5.7G 3.6G 2.2G 62% /var/lib/docker/overlay2/86316491e2bb20bc300c1cc55c9f9254001ed77d6ec7f05f716af1e52fe15f53/merged
...
I had assumed that the partition size would increase automatically, but I ran into the problem where the website couldn't write files to disk anymore because the partition was full. As a quick fix, I ran "docker system prune -a" (there were a bunch of old images hanging around) on the host machine to make some more space on the Docker partition.
So my question is, what is the proper way of increasing the size of the partition?
You can resize the boot disk in the Google Cloud Console GUI. However, since this is a container host, I recommend deleting the virtual machine instance and creating a new instance with the correct configuration.
The default disk size is usually 10 GB. To create a virtual machine instance with a larger disk, specify that when creating the instance.
Add the following to your CLI command:
--boot-disk-size=32GB
Optionally specify the type of persistent disk to control costs:
--boot-disk-type=pd-standard
gcloud compute instances create-with-container
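Putting it together with the command from the question, it might look like this (the disk size and type values are just examples):
gcloud compute instances create-with-container $INSTANCE_NAME \
--zone=europe-north1-a \
--container-image gcr.io/my-project/my-image:latest \
--container-mount-host-path mount-path=/var/www/html,host-path=/var/www/html,mode=rw \
--machine-type="$MACHINE_TYPE" \
--boot-disk-size=32GB \
--boot-disk-type=pd-standard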
I'm fairly new to Docker and Docker Compose, and I recently switched back to Ubuntu after a year or so of using OSX.
I am working with some docker-compose projects that are quite resource-consuming, and when configuring the environment on Ubuntu I stumbled across a problem: with Docker on a Mac (https://docs.docker.com/docker-for-mac/) you can specify maximum resource allocation - like disk space, memory, CPU - for the entire system (so to speak) in the Docker app, but on Ubuntu I didn't find such a thing anywhere.
I saw that there is a way to do this for a specific container, but what if I want to - say - allow a max of 6GB of RAM for ALL containers? Is there a way to do this that I'm not seeing?
Thanks a lot!
You need to set up a cgroup with limited CPU and memory and point the Docker engine to it.
Example of a cgroup config in "/etc/systemd/system/my_docker_slice.slice":
[Unit]
Description=my cgroup for Docker
Before=slices.target
[Slice]
MemoryAccounting=true
MemoryHigh=2G
MemoryMax=2.5G
CPUAccounting=true
CPUQuota=50%
and then update your docker daemon.json in /etc/docker/
{
"cgroup-parent": "/my_docker_slice.slice"
}
Note:
If the cgroup has a leading forward slash (/), the cgroup is created
under the root cgroup, otherwise the cgroup is created under the
daemon cgroup.
You can read more by searching for "Default cgroup parent" here.
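After creating the slice file and updating daemon.json, reload systemd and restart Docker so the new cgroup parent takes effect (a short sketch, assuming Docker is managed by systemd):
sudo systemctl daemon-reload                # pick up the new .slice unit
sudo systemctl restart docker               # restart the daemon with the new cgroup-parent
systemctl status my_docker_slice.slice      # confirm the slice is active once containers run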
I loaded some Docker images by running
docker load --input <file>
I can then see these images when executing
docker image ls
After a while the images start disappearing; every few minutes there are fewer and fewer images listed. I have not run any of the images yet. What could be the cause of this issue?
EDIT: This issue arises with Docker inside the minikube VM.
Since you've mentioned that the Docker daemon runs inside the minikube VM, I assume that you might be hitting the K8s garbage collection mechanism, which keeps system utilization at an appropriate level and reduces the number of unused containers (built from images) according to specific thresholds.
These eviction thresholds are fully managed by the kubelet, the K8s node agent, which cleans up unused images and containers according to the parameters (flags) propagated in the kubelet configuration file.
Therefore, I guess you can investigate the K8s eviction behavior by looking at the thresholds adjusted in the kubelet config file, which is generated by the minikube bootstrapper at the following path: /var/lib/kubelet/config.yaml.
As mentioned in @mk_sta's answer, to fix the issue you need to:
Create or edit /var/lib/kubelet/config.yaml with
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  imagefs.available: "5%"
The default value is 15%.
minikube stop
minikube start --extra-config=kubelet.config=/var/lib/kubelet/config.yaml
Or free up more space on the Docker partition.
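To see whether you are actually close to the imagefs threshold before (or after) changing it, you can check disk usage inside the minikube VM (a quick sketch, assuming the default setup):
minikube ssh            # open a shell inside the minikube VM
df -h /var/lib/docker   # filesystem usage that imagefs.available is measured against
docker system df        # space taken by images, containers and volumes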
https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/#create-the-config-file
https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#hard-eviction-thresholds
I have a docker environment with 2 containers (Jenkins and Nexus, both with their own named volume).
I have a daily cron-job which deletes unused containers and images. This is working fine. But the problem is inside my devicemapper:
du -sh /var/lib/docker/
30G docker/
I checked each folder in my docker folder:
Volumes (big, but that's normal in my case):
/var/lib/docker# du -sh volumes/
14G volumes/
Containers:
/var/lib/docker# du -sh containers/
3.2M containers/
Images:
/var/lib/docker# du -sh image/
5.8M image/
Devicemapper:
/var/lib/docker# du -sh devicemapper/
16G devicemapper/
/var/lib/docker/devicemapper/mnt is 7.3G
/var/lib/docker/devicemapper/devicemapper is 8.1G
Docker info:
Storage Driver: devicemapper
Pool Name: docker-202:1-xxx-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 5.377 GB
Data Space Total: 107.4 GB
Data Space Available: 28.8 GB
Metadata Space Used: 6.148 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.141 GB
Udev Sync Supported: true
What is this space and am I able to clean this without breaking stuff?
Don't use a devicemapper loop file for anything serious! Docker has big warnings about this.
The /var/lib/docker/devicemapper/devicemapper directory contains the sparse loop files that hold all the data that Docker mounts. So you would need to use LVM tools to trawl around in them and do things. Have a read through the removal issues with devicemapper; they are kinda sorta resolved, but maybe not.
I would move away from devicemapper where possible or use LVM thin pools on anything RHEL based. If you can't change storage drivers, the same procedure will at least clear up any allocated sparse space you can't reclaim.
Changing the docker storage driver
Changing the storage driver requires dumping your /var/lib/docker directory, which contains all your Docker data. There are ways to save portions of it, but that involves messing around with Docker internals. It's better to commit and export any containers or volumes you want to keep and import them after the change. Otherwise you will have a fresh, blank Docker install!
Export data
Stop Docker
Remove /var/lib/docker
Modify your docker startup to use the new storage driver.
Set --storage-driver=<name> in /lib/systemd/system/docker.service or /etc/systemd/system/docker.service or /etc/default/docker or /etc/sysconfig/docker
Start Docker
Import Data
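As a rough sketch of the export and import steps around the wipe (container, volume and image names here are just examples):
# Before removing /var/lib/docker: export a container and back up a named volume
docker export my-container > my-container.tar
docker run --rm -v my-volume:/data -v "$PWD":/backup busybox tar czf /backup/my-volume.tgz -C /data .
# After Docker is back up on the new storage driver: import them again
docker import my-container.tar my-image:restored
docker volume create my-volume
docker run --rm -v my-volume:/data -v "$PWD":/backup busybox tar xzf /backup/my-volume.tgz -C /data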
AUFS
AUFS is not in the mainline kernel (and never will be), which means distros have to actively include it somehow. For Ubuntu it's in the linux-image-extra packages.
apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual
Then change the storage driver option to --storage-driver=aufs
OverlayFS
OverlayFS is already available in Ubuntu, just change the storage driver to --storage-driver=overlay2 or --storage-driver=overlay if you are still using a 3.x kernel
I'm not sure how good an idea this is right now. It can't be much worse than the loop file, but the overlay2 driver, while pretty solid for dev use, isn't considered production ready yet (e.g. Docker Enterprise doesn't provide support for it). It is being pushed to become the standard driver due to the AUFS/kernel issues.
Direct LVM Thin Pool
Instead of the devicemapper loop file you can use an LVM thin pool directly. RHEL makes this easy with a docker-storage-setup utility that is distributed with their EPEL docker package. Docker has detailed steps for setting up the volumes manually.
--storage-driver=devicemapper \
--storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool \
--storage-opt dm.use_deferred_removal=true
Docker 17.06+ supports managing simple direct-lvm block device setups for you.
Just don't run out of space in the LVM volume, ever. You end up with an unresponsive Docker daemon that needs to be killed and then LVM resources that are still in use that are hard to clean up.
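If you prefer configuration files over daemon flags, the same options can live in /etc/docker/daemon.json (a sketch; the thin pool device name must match whatever your LVM setup actually created):
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.thinpooldev=/dev/mapper/docker-thinpool",
    "dm.use_deferred_removal=true"
  ]
}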
A periodic docker system prune -a works for me on systems where I use devicemapper and not the LVM thinpool. The pattern I use is:
I label any containers, images, etc with label "protected" if I want them to be exempt from cleanup
I then periodically run docker system prune -a --filter=label!=protected (either manually or on cron with -f)
Labeling examples:
docker run --label protected ...
docker create --label=protected=true ...
For images, Dockerfile's LABEL, eg LABEL protected=true
To add a label to an existing image that I cannot easily rebuild, I make a 2 line Dockerfile with the above, build a new image, then switch the new image for the old one (tag).
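For example, the relabel Dockerfile and the rebuild under the same tag might look like this (image name and tag are placeholders):
# Dockerfile.relabel -- wrap the existing image and add the protected label
FROM myregistry/myimage:1.2.3
LABEL protected=true

docker build -f Dockerfile.relabel -t myregistry/myimage:1.2.3 .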
General Docker label documentation
First, what is devicemapper (official documentation)
Device Mapper has been included in the mainline Linux kernel since version 2.6.9 [in 2005]. It is a core part of RHEL family of Linux distributions.
The devicemapper driver stores every image and container on its own virtual device. These devices are thin-provisioned copy-on-write snapshot devices.
Device Mapper technology works at the block level rather than the file level. This means that devicemapper storage driver's thin provisioning and copy-on-write operations work with blocks rather than entire files.
The devicemapper is the default Docker storage driver on some Linux distributions.
Docker hosts running the devicemapper storage driver default to a configuration mode known as loop-lvm. This mode uses sparse files to build the thin pool used by image and container snapshots
Docker 1.10 [from 2016] and later no longer matches image layer IDs with directory names in /var/lib/docker.
However, there are two key directories.
The /var/lib/docker/devicemapper/mnt directory contains the mount points for image and container layers.
The /var/lib/docker/devicemapper/metadata directory contains one file for every image layer and container snapshot.
If your docker info does show your Storage Driver is devicemapper (and not aufs), proceed with caution with those folders.
See for instance issue 18867.
I faced the same issue, where my /var/lib/docker/devicemapper/devicemapper/data file had reached ~91% of the root volume (~45G of 50G). I tried removing all the unwanted images and deleting volumes, but nothing helped in reducing this file.
After some googling I understood that the "data" file is a loopback-mounted sparse file, and Docker uses it to store the mount locations and other files we would have stored inside the containers.
Finally I removed all the containers which had been run before and were now stopped
Warning: Deletes all docker containers
docker rm $(docker ps -aq)
This reduced the devicemapper file significantly. Hope this may help you.
I have changed the storage driver to devicemapper. docker info gives the following output:
Server Version: 1.9.0
Storage Driver: devicemapper
Pool Name: docker-253:1-16-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: extfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 1.821 GB
Data Space Total: 268.4 GB
Data Space Available: 11.66 GB
Metadata Space Used: 2.101 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.145 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.90 (2014-09-01)
Execution Driver: native-0.2
First of all, I don't know how to set a quota per container. Should I maybe use flags in docker run commands?
With devicemapper as the storage driver, you cannot set a disk size per container; every container gets the same fixed size. As per the output of docker info, that fixed size would be around 100GB in your case. However, you have one of the following 2 options depending on your requirement.
a.) You can change this fixed size from 100GB to some other value, like 20GB, but in that case all the containers would still have the same fixed disk size of 20GB. If you want to go ahead with this option, you can follow these steps:
Stop docker service, sudo service docker stop
Remove the existing docker directory (which in your case is the default one, i.e. /var/lib/docker) -- NOTE: this will delete all your existing docker images and containers.
Start docker daemon with option docker daemon -s devicemapper --storage-opt dm.basesize=20G
Or, in place of step 3, add the option DOCKER_OPTS='-g /var/lib/docker -s devicemapper --storage-opt dm.basesize=20G' in the file /etc/default/docker and restart the docker service
sudo service docker start
Now, whatever containers you spawn will have a disk size of 20GB.
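On newer Docker versions, the same base size can also go in /etc/docker/daemon.json instead of DOCKER_OPTS (just a sketch keeping the 20GB example; the same caveat about wiping /var/lib/docker applies):
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.basesize=20G"
  ]
}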
b.) As a second option, you can increase the disk size of your existing containers beyond whatever base disk size you have set (which by default is 100GB, or 20GB if you follow the first option). To do this, here is a very useful article you can follow. It may help you set different disk sizes for different containers.
Hope this answer is useful for your requirement, thanks.