How to specify GPU limits in Docker Swarm using docker-compose?

How to specify GPU, CPU, and memory limits using docker-compose?
For this I followed the instructions at https://nvidia.github.io/nvidia-container-runtime/.
Using:
$ apt-get install nvidia-container-runtime
I also checked that it works with:
$ docker run -it --rm --gpus all ubuntu nvidia-smi
Example docker-compose.yaml:
version: '3.8'  # deploy and generic_resources require compose file format 3.5+ (swarm mode)
...
services:
  ...
  my-service:
    ...
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 1
        limits:
          cpus: '4'
          memory: 4096M
  another-service:
    ...
    deploy:
      resources:
        limits:
          cpus: '0.001'
          memory: 50M
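For the swarm scheduler to match kind: 'gpu', each node's Docker daemon has to advertise the GPU as a generic resource. A minimal sketch of /etc/docker/daemon.json, assuming the nvidia runtime is installed; the GPU UUID is a placeholder (take the real one from nvidia-smi -a):
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "node-generic-resources": ["gpu=GPU-placeholder-uuid"]
}
After restarting the daemon on each node, the stack can be deployed as usual (the stack name mystack is arbitrary):
$ docker stack deploy -c docker-compose.yaml mystack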

Related

How to select specific GPUs in docker-compose

On a machine with multiple Nvidia GPUs, is there a way to specify in the docker-compose.yml file that we want to use two specific GPU device IDs in the Docker container?
The following seems to be the equivalent of docker run --gpus=all, since all the GPUs are listed when running nvidia-smi inside the Docker container.
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          capabilities: [gpu]
You may use device_ids or count.
For example, to allow only GPUs with ID 0 and 3:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0', '3']
          capabilities: [gpu]
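A count-based variant looks like this; a short sketch (the value 2 is only an example, and count: all reserves every GPU):
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 2
          capabilities: [gpu]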
Enabling GPU access with Compose

Set Docker resource limits as variables via GitLab CI/CD

I'm trying to set some resource limits on a Docker container. I'm able to add the values below to a docker-compose.yml file for Docker resource limits:
resources:
  limits:
    cpus: '2'
    memory: 4GB
  reservations:
    cpus: '1'
    memory: 4GB
How would I pass these in via the GitLab pipeline for the container being built, but set them as variables?
I was able to override the Java heap size by adding:
java_xmx=${JAVA_XMX_OVERRIDE}
and the value
JAVA_XMX_OVERRIDE: "-Xmx2048m"
How would I do the same with resource limits?
Thanks
You can use variables in Docker Compose and propagate them with the start command.
compose.yaml:
version: '3.9'
services:
  httpd:
    container_name: httpd-test
    image: httpd
    deploy:
      resources:
        limits:
          memory: ${MEMORY}
Start the container:
$ MEMORY=60M docker-compose up -d
$ docker stats
CONTAINER ID   NAME         CPU %   MEM USAGE / LIMIT   MEM %    NET I/O      BLOCK I/O   PIDS
ace0d6d638e1   httpd-test   0.00%   26.86MiB / 60MiB    44.77%   4.3kB / 0B   0B / 0B     82
You should be able to define an environment variable in your GitLab pipeline:
variables:
  MEMORY: 60M
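Putting it together, a minimal .gitlab-ci.yml sketch, assuming a runner that can talk to the Docker daemon (the job name and stage are arbitrary):
deploy:
  stage: deploy
  variables:
    MEMORY: 60M
  script:
    # the variable is expanded by docker-compose when it reads compose.yaml
    - docker-compose -f compose.yaml up -d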
I ended up adding a docker-compose file template to the pipeline; in the template I modified the resource limits with Ansible:
- name: Compose Docker | Get text block for service
  set_fact:
    service_content: "{{ lookup('template', 'templates/docker-compose-service.yml.j2') }}"
  tags:
    - compose_docker
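The Jinja2 template can then parameterize the limits; a hypothetical fragment of docker-compose-service.yml.j2 (the variable names cpu_limit and memory_limit are invented for illustration):
deploy:
  resources:
    limits:
      cpus: "{{ cpu_limit }}"
      memory: "{{ memory_limit }}"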

How to check Docker Swarm resource reservations via the CLI

I introduced Docker Swarm resource limits on a cluster (24 GB RAM and 12 vCPUs) and specified service reservations with the following configuration:
redis:
  image: redis
  deploy:
    replicas: 1
    resources:
      reservations:
        cpus: '1'
        memory: 300m
  ports:
    - "6379:6379"
Now the problem is that I get the error "no suitable node (insufficient resources on 3 nodes)" and I can't tell which resources are exhausted and where exactly. Is there a way to see the overall resource reservations?
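There is no single built-in summary, but the pieces can be queried; a sketch using stock Docker commands, where <node-name> and <service-name> are placeholders (services deployed via docker stack deploy are prefixed with the stack name):
# CPU and memory each node advertises to the swarm (values in nanoCPUs and bytes)
$ docker node inspect <node-name> --format '{{ .Description.Resources.NanoCPUs }} {{ .Description.Resources.MemoryBytes }}'
# What a service reserves per task
$ docker service inspect <service-name> --format '{{ json .Spec.TaskTemplate.Resources.Reservations }}'
# Which tasks landed on which node
$ docker node ps $(docker node ls -q)
Summing the reservations of the tasks on each node and comparing against that node's advertised capacity shows where the scheduler runs out of room.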

Where is Kubernetes in Docker (KIND) mapping its volume mounts on Windows 10

I'm following the instructions here to install an Elasticsearch cluster on KIND (Kubernetes in Docker): https://www.elastic.co/blog/alpha-helm-charts-for-elasticsearch-kibana-and-cncf-membership
This is running in a 4-node cluster on Docker on Windows 10. I'm running into a problem similar to what's reported here: https://github.com/elastic/helm-charts/issues/137
I'm trying to figure out where the mounts are so I can chown that directory. Where is this mapped on the local machine?
I'm not running WSL 2 yet.
In order to change the owner of the /usr/share/elasticsearch/data/nodes directory, you have to create an initContainer that changes the permissions.
You can do it by fetching the elasticsearch chart:
helm fetch --untar elastic/elasticsearch
Then edit values.yaml and add the following lines:
antiAffinity: "soft"
# Shrink default JVM heap.
esJavaOpts: "-Xmx128m -Xms128m"
# Allocate smaller chunks of memory per pod.
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "1000m"
    memory: "512M"
# Request smaller persistent volumes.
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "hostpath"
  resources:
    requests:
      storage: 100M
extraInitContainers: |
  - name: create
    image: busybox:1.28
    command: ['mkdir', '-p', '/usr/share/elasticsearch/data/nodes/']
    volumeMounts:
      - mountPath: /usr/share/elasticsearch/data
        name: elasticsearch-master
  - name: file-permissions
    image: busybox:1.28
    command: ['chown', '-R', '1000:1000', '/usr/share/elasticsearch/']
    volumeMounts:
      - mountPath: /usr/share/elasticsearch/data
        name: elasticsearch-master
It changes the CPU and memory requests and limits for the pods and starts an initContainer that runs chown -R 1000:1000 /usr/share/elasticsearch/ to change the ownership of the directory.
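With values.yaml edited in the fetched chart directory, installing it looks roughly like this (Helm 3 syntax; the release name elasticsearch is an assumption):
$ helm install elasticsearch ./elasticsearch
$ kubectl get pods   # the init containers run before each elasticsearch pod starts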

Docker swarm with Docker Toolbox doesn't run

I followed the Docker tutorial to set up a swarm.
I used Docker Toolbox, because I'm on Windows 10 Family.
I completed all the steps, but at the end the "curl ip_address" command doesn't work; accessing the URL in a browser fails as well.
$ docker --version
Docker version 18.03.0-ce, build 0520e24302
docker-compose.yml, located in /home/docker of the virtual machine called "myvm1":
version: "3"
services:
web:
# replace username/repo:tag with your name and image details
image: 12081981/friendlyhello:part1
deploy:
replicas: 5
resources:
limits:
cpus: "0.1"
memory: 50M
restart_policy:
condition: on-failure
ports:
- "80:80"
networks:
- webnet
networks:
webnet:
The swarm:
$ docker-machine ssh myvm1 "docker stack ps getstartedlab"
ID             NAME                  IMAGE                          NODE    DESIRED STATE   CURRENT STATE           ERROR   PORTS
blmx8mldam52   getstartedlab_web.1   12081981/friendlyhello:part1   myvm1   Running         Running 9 seconds ago
04ctl86chp6o   getstartedlab_web.2   12081981/friendlyhello:part1   myvm3   Running         Running 6 seconds ago
r3qyznllno9j   getstartedlab_web.3   12081981/friendlyhello:part1   myvm3   Running         Running 6 seconds ago
2twwicjssie9   getstartedlab_web.4   12081981/friendlyhello:part1   myvm1   Running         Running 9 seconds ago
o4rk4x7bb3vm   getstartedlab_web.5   12081981/friendlyhello:part1   myvm3   Running         Running 6 seconds ago
Result of "docker-machine ls":
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER     ERRORS
default   -        virtualbox   Running   tcp://192.168.99.100:2376           v18.09.0
myvm1     *        virtualbox   Running   tcp://192.168.99.102:2376           v18.09.0
myvm3     -        virtualbox   Running   tcp://192.168.99.103:2376           v18.09.0
Test with curl:
$ curl 192.168.99.102
curl: (7) Failed to connect to 192.168.99.102 port 80: Connection refused
How do I debug this?
I can give more information, if you want.
Thanks in advance.
Use of the routing mesh on Windows appears to be an EE-only feature right now. You can monitor this Docker for Windows issue for more details. The current workaround is to use DNSRR internally and publish ports to the host directly instead of through the routing mesh. If you want your application to be reachable from any node in the cluster, you'd need to have a service on every host in the cluster, scheduled globally, listening on the requested port. E.g.:
version: "3.2"
services:
web:
# replace username/repo:tag with your name and image details
image: 12081981/friendlyhello:part1
deploy:
# global runs 1 on every node, instead of the replicated variant
mode: global
# DNSRR skips the VIP normally assigned to services
endpoint_mode: dnsrr
resources:
limits:
cpus: "0.1"
memory: 50M
restart_policy:
condition: on-failure
ports:
- target: 80
published: 80
protocol: tcp
# host publishes the port directly from the container without the routing mesh
mode: host
networks:
- webnet
networks:
webnet:
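After updating the file, a quick way to verify, reusing the stack name and node IP from the question; with host-mode publishing and a global service, every node should answer on port 80 directly:
$ docker-machine ssh myvm1 "docker stack deploy -c docker-compose.yml getstartedlab"
$ curl 192.168.99.102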
