Disable swap on a kubelet - docker

I'm running Kubernetes 1.2.0 on a number of lab machines. The machines have swap enabled. As the machines are used for other purposes, too, I cannot disable swap globally.
I'm observing the following problem: If I start a pod with a memory limit, the container starts swapping after it reached the memory limit. I would expect the container to be killed.
According to this issue, this problem has supposedly been fixed, but it still occurs with Kubernetes 1.2.0. If I check the running container with docker inspect, I can see that MemorySwap = -1 and MemorySwappiness = -1. If I start a pod with low memory limits, it starts swapping almost immediately.
I had some ideas, but I couldn't figure out how to do any of these:
Change the default setting in Docker so no container is allowed to swap
Add a parameter to the Kubernetes container config so it passes --memory-swappiness=0
Fiddle with docker's cgroup and disallow swapping for the group
How can I prevent the containers from swapping?

Since version 1.8, Kubernetes (specifically the kubelet) fails to start if swap is enabled on Linux (flag --fail-swap-on=true), as Kubernetes can't handle swap. That means that on a default installation you can rely on swap being disabled on Kubernetes nodes.
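A quick way to check whether swap is actually active on a node uses standard Linux tools (nothing Kubernetes-specific); and if the kubelet has to run on such a node anyway, it must be started with the fail-swap check turned off (sketch only):
swapon --show                      # lists active swap devices/files; empty output means no swap
free -h                            # the Swap row shows total/used swap
kubelet --fail-swap-on=false ...   # lets the kubelet start despite active swap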
To test it in a local Docker container, set --memory-swap equal to --memory, e.g.:
docker run --memory="10m" --memory-swap="10m" dominikk/swap-test
My test image is based on this small program, with the addition of flushing stdout so the output shows up in Docker:
setvbuf(stdout, NULL, _IONBF, 0); // flush stdout buffer every time
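To verify the HostConfig values the question refers to, you can inspect a container started with these limits (a quick check; the container name swap-test is made up):
docker run -d --name swap-test --memory="10m" --memory-swap="10m" dominikk/swap-test
docker inspect -f 'Memory={{.HostConfig.Memory}} MemorySwap={{.HostConfig.MemorySwap}}' swap-test
When both values are equal, the container cannot use swap; MemorySwap = -1 (as observed in the question) means unlimited swap.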
You can also test it with docker-compose up (memswap_limit only works with Compose file format version 2.x):
version: '2'
services:
  swap-test:
    image: dominikk/swap-test
    mem_limit: 10m
    # memswap_limit:
    #   -1: unlimited swap
    #    0: field unset
    #   >0: mem_limit + swap
    #   == mem_limit: swap disabled
    memswap_limit: 10m

If you are just playing around, there is no need to bother with turning swap off; things will still run, but resource isolation won't work as well. If you are using Kubernetes seriously enough to need resource isolation, you should not be running other workloads on the machines.

Related

Can I limit the active service in a swarm to 1 instance?

I'm currently in the process of setting up a swarm with 5 machines. I'm just wondering if I can, and should, limit the swarm to allow only one active instance of a service, with all the others waiting to jump in when that service fails.
This is to prevent potential concurrency problems with MariaDB (as the nodes still write to a NAS), or connection limits to an external service (like Node-RED with Telegram).
If you're deploying with stack files you can set "replicas: 1" in the deploy section to make sure only one instance runs at a time.
If that instance fails (crashes or exits), Docker will start another one.
https://docs.docker.com/compose/compose-file/deploy/#replicas
If the service is replicated (which is the default), replicas
specifies the number of containers that SHOULD be running at any given
time.
services:
  frontend:
    image: awesome/webapp
    deploy:
      mode: replicated
      replicas: 6
If you want multiple instances running and only one "active" hitting the database you'll have to coordinate that some other way.
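For the database case from the question, a minimal stack-file sketch might look like the following (the image tag, volume path and service name are assumptions, not taken from the question):
version: "3.8"
services:
  mariadb:
    image: mariadb:10.6                    # hypothetical image/tag
    volumes:
      - /mnt/nas/mariadb:/var/lib/mysql    # assumed NAS-backed path
    deploy:
      mode: replicated
      replicas: 1                          # never more than one container at a time
      restart_policy:
        condition: on-failure              # schedule a replacement if the task fails
Note that Swarm has no built-in application-level leader election, so "the others wait and jump in" here simply means a replacement task is scheduled only after the running one fails.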

docker-compose: Reserve a different GPU for each scaled container

I have a docker-compose file that looks like the following:
version: "3.9"
services:
api:
build: .
ports:
- "5000"
deploy:
resources:
reservations:
devices:
- capabilities: [gpu]
count: 1
When I run docker-compose up, this runs as intended, using the first GPU on the machine.
However, if I run docker-compose up --scale api=2, I would expect each docker container to reserve one GPU on the host.
The actual behaviour is that both containers receive the same GPU, meaning that they compete for resources. Additionally, I also get this behaviour if I have two containers specified in the docker-compose.yml, both with count: 1. If I manually specify device_ids for each container, it works.
How can I make it so that each docker container reserves exclusive access to 1 GPU? Is this a bug or intended behaviour?
The behavior of docker-compose when a scale is requested is to create additional containers per the exact specification provided by the service.
There are very few specification parameters that vary during the creation of the additional containers, and the devices, which are part of the host_config set of parameters, are copied without modification.
docker-compose is a Python project, so if this feature is important to you, you can try to implement it yourself. The logic that drives the lifecycle of the services (creation, scaling, etc.) resides in compose/services.py.
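Based on the questioner's observation that manually specifying device_ids works, a sketch of that workaround is to duplicate the service instead of using --scale and pin each copy to one GPU (the service names are made up; this needs a docker-compose release with GPU device support):
version: "3.9"
services:
  api-gpu0:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              device_ids: ["0"]   # first GPU only
  api-gpu1:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              device_ids: ["1"]   # second GPU only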

Increase memory of Docker container with docker-compose on Windows?

On Docker for Windows, I have a simple SQL Server container based on microsoft/mssql-server-windows-developer that is launched with docker-compose up via a simple docker-compose.yaml file.
Is there a way to allocate more than 1GB of memory to this container? I can do it when running the image directly or when I build my image with -m 4GB, but I can't figure out how to do this when using Docker Compose. This container needs more than 1GB of RAM to run properly and all of my research has revealed nothing helpful thus far.
I've looked into the resources configuration option, but that only applies when running under Docker Swarm, which I don't need.
In Compose file format version 2.x you can use the mem_limit option, as below:
version: '2.4'
services:
  my-svc:
    image: microsoft/mssql-server-windows-developer
    mem_limit: 4G
In Compose file format version 3 it is replaced by the resources option, which requires Docker Swarm:
version: '3'
services:
  my-svc:
    image: microsoft/mssql-server-windows-developer
    deploy:
      resources:
        limits:
          memory: 4G
There is a compatibility flag that can be used to translate the deploy section into equivalent version 2 parameters when running docker-compose --compatibility up. However, this is not recommended for production deployments.
From the documentation:
docker-compose 1.20.0 introduces a new --compatibility flag designed
to help developers transition to version 3 more easily. When enabled,
docker-compose reads the deploy section of each service’s definition
and attempts to translate it into the equivalent version 2 parameter.
Currently, the following deploy keys are translated:
resources limits and memory reservations
replicas
restart_policy condition and max_attempts
All other keys are ignored and produce a warning if present. You can review the configuration that will be used to deploy by using the --compatibility flag with the config command.
We recommend against using --compatibility mode in production. Because the resulting configuration is only an approximation using non-Swarm mode properties, it may produce unexpected results.
Looking for options to set resources on non swarm mode containers?
The options described here are specific to the deploy key and swarm mode. If you want to set resource constraints on non swarm deployments, use Compose file format version 2 CPU, memory, and other resource options. If you have further questions, refer to the discussion on the GitHub issue docker/compose/4513.
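In practice that comes down to two commands (shown here as a sketch; run them in the directory containing the compose file):
docker-compose --compatibility config   # shows how the deploy section would be translated
docker-compose --compatibility up -d    # starts the services with the translated limits applied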
You can use Compose file format version 2 instead of version 3 and set the memory limit with mem_limit (available in version 2). So you can use a docker-compose file like this:
version: "2.4"
services:
sql-server:
image: microsoft/mssql-server-windows-developer
environment:
- ACCEPT_EULA=Y
- SA_PASSWORD=t3st&Pa55word
mem_limit: 4GB
You can check the memory limit using docker stats.
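For example (a quick check; the container name shown in the output depends on your Compose project name):
docker stats --no-stream   # the MEM USAGE / LIMIT column should show the 4GB cap for the sql-server container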
I was also looking to set this up via docker-compose. I had a hard time figuring out why SQL Server worked on a new machine but no longer on my older one. I finally recalled that I had reduced the amount of memory Docker Desktop is allowed to allocate. You find this setting through the settings button, under Resources/Advanced. Setting Memory to 2GB resolved the issue for me.

How is it possible that data in Kafka survives container recycling?

First, I do not know whether this issue is with Kafka or with Docker … I am a rookie regarding both topics. But I assume that it is more a Docker than a Kafka problem (in fact it is probably my problem of not really understanding one or the other …).
I installed Docker on a Raspberry Pi 4 and created Docker images for Kafka and for Zookeeper; I had to create them myself because 64-bit Raspberry Pi was not supported by any of the existing images (at least I could not find one). But I got them working.
Next I implemented the Kafka Streams example (Wordcount) from the Kafka documentation; it runs fine, counting the words in all the texts you push into it, keeping the numbers from all previous runs. That is somehow expected; at least it is described that way in that documentation.
So after some test runs I wanted to reset the whole thing.
I thought the easiest way to get there is to shut down the docker containers, delete the mounted folders on the host and start over.
But that does not work: the word counters are still there! Meaning the word count did not start from 0 …
Ok, next turn: not only removing the containers, but rebuilding the images, too! Both Zookeeper and Kafka, of course!
No difference! The word count from all the previous runs was retained.
Using docker system prune --volumes made no difference either …
From my limited understanding of Docker, I assumed that any runtime data is stored in the container, or in the mounted folders (volumes). So when I delete the containers and the folders on the Docker host that were mounted by the containers, I expect that any status would have gone.
Obviously not … so I missed something important here, most probably with Docker.
The docker-compose file I used:
version: '3'
services:
  zookeeper:
    image: tquadrat/zookeeper:latest
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
      - "8080:8080"
    volumes:
      - /data/zookeeper/config:/config
      - /data/zookeeper/data:/data
      - /data/zookeeper/datalog:/datalog
      - /data/zookeeper/logs:/logs
    environment:
      ZOO_SERVERS: "server.1=zookeeper:2888:3888;2181"
    restart: always
  kafka:
    image: tquadrat/kafka:latest
    depends_on:
      - zookeeper
    ports:
      - "9091:9091"
    volumes:
      - /data/kafka/config:/config
      - /data/kafka/logs:/logs
    environment:
      KAFKA_LISTENERS: "INTERNAL://kafka:29091,EXTERNAL://:9091"
      KAFKA_ADVERTISED_LISTENERS: "INTERNAL://kafka:29091,EXTERNAL://TCON-PI4003:9091"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_INTER_BROKER_LISTENER_NAME: "INTERNAL"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_DELETE_TOPIC_ENABLE: "true"
    restart: always
The script file I used to clear out the mounted folders:
#!/bin/sh
set -eux
DATA="/data"
KAFKA_DATA="$DATA/kafka"
ZOOKEEPER_DATA="$DATA/zookeeper"
sudo rm -R "$KAFKA_DATA"
sudo rm -R "$ZOOKEEPER_DATA"
mkdir -p "$KAFKA_DATA/config" "$KAFKA_DATA/logs"
mkdir -p "$ZOOKEEPER_DATA/config" "$ZOOKEEPER_DATA/data" "$ZOOKEEPER_DATA/datalog" "$ZOOKEEPER_DATA/logs"
Any ideas?
Kafka Streams stores its own state under the "state.dir" config on the host machine it's running on. In the Apache Kafka libraries, this defaults to a location under /tmp. First check whether you have overridden that property in your code.
As far as Docker goes, try without volumes first.
Using docker system prune --volumes made no difference either …
That would clean unattached volumes made with docker volume create or volumes: in Compose, not host-mounted directories.
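As a quick way to see whether a Streams application has left local state behind, you can look for the state directory on the machine the processor runs on (both the /tmp location and the application id streams-wordcount are assumptions; they depend on your configuration):
ls /tmp/kafka-streams/streams-wordcount/    # if this exists, the processor kept local state
rm -r /tmp/kafka-streams/streams-wordcount  # removing it resets the local state (stop the app first)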
As I assumed right from the beginning, the problem was mainly my lack of knowledge.
The behaviour I observed is not related to a magical data store for Docker that survives all attempts to kill it; it is not related to Docker at all.
I use those Docker images to run Zookeeper and the Kafka server on the Raspberry Pi. Then I switched back to my workstation machine and wrote the code (the "Wordcount" sample) that implements a Kafka Streams processor. When I started that in my IDE, it was executed on my local machine, accessing Kafka over the network.
My assumption was that any state was stored on the Kafka server, so that dumping that should reset the whole thing; as that did not work, I dumped the Zookeeper data as well, and as this was also to no avail, I removed nearly everything …
After some hints here I found that Kafka Streams processors maintain their own local state in a filesystem folder that is configured through state.dir (StreamsConfig.STATE_DIR_CONFIG) – see Configuring a Streams Application. This means that a Kafka Streams processor maintains its own local state independent from any Kafka server, and – as in my case when it runs on my local machine – also outside/unrelated to any Docker container …
According to the documentation, the default location should be /var/lib/kafka-streams, but this is not writeable in my environment – no idea where the Stream processor put its state instead.
After setting the configuration value state.dir for my Streams processor explicitly to a folder in my home directory, I could see that state on my disk, and after removing that, the word count started over with one.
A deeper look into the documentation for Kafka Streams revealed that I could have achieved the same with a call to KafkaStreams.cleanUp() before starting or after closing the stream processor (no removal of files on the filesystem required).

Slow mounting of Docker volumes with large number of files on Linux

We are experiencing some very strange behaviour when mounting volumes with large amounts of data (e.g. a million files).
The current setup:
Docker host: Container Linux by CoreOS 1465.7.0 (Ladybug)
Docker version client: 18.06.1-ce
Docker version host: 17.09.0-ce
I have tried different versions of docker and CoreOs, both newer and older releases without any differences.
The data directory on the Docker host is probably mapped to some storage tech; I'm not sure about the setup here, but I can fill in details if necessary. From my point of view it looks like a normal folder.
The initial error happened when switching from an anonymous volume mounted through a dummy container (docker-compose v1) to a named volume (docker-compose v3). After creating the named volume, I shut down the Docker service and did a manual copy of the files from the old volume to the new volume. This has been tested with small amounts of data, and that works, so it doesn't seem to be an issue with actually moving the data within the internal /var/lib/docker area. I am also able to recreate this issue with a new installation where I copy a decently large amount of data.
Building the container with Compose works fine:
myservice:
  build: myservice
  restart: always
  ports:
    - "8080:8080"
  volumes:
    - type: volume
      source: repo
      target: /home/repo

volumes:
  repo:
The repo volume is the one with a lot of data. Now, when trying to up the services, I get a timeout, and the service gets stuck in "Created":
ERROR: for my-servce-new_exp_1 HTTPSConnectionPool(host='xxxx.xx.xxx.xxx', port=2376): Read timed out. (read timeout=60)
ERROR: for exp HTTPSConnectionPool(host='xxx.xx.xxx.xxx', port=2376): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
I have tried to increase the timeout, but then I just get another timeout after a while.
Now, if I restart the Docker service or the host, the service does come up and run (but doing it this way causes issues with the internal DNS mappings between services).
If I up the service with an empty/small volume, it works as expected.
As a curiosity, I found something possibly related when trying to mount the same volume into a Docker container:
docker run -it --rm --name rmytest --volume my-service-new_repo:/data ubuntu:latest
This will time out after e.g. 30 minutes or so.
If, on the other hand, I add any option for the consistency parameter of the volume mapping, it runs within a couple of seconds:
docker run -it --rm --name rmytest --volume my-service-new_repo:/data:consistent ubuntu:latest
I have had no success adding the same options to the compose files either, e.g.:
volumes:
  - type: volume
    source: repo
    target: /home/repo
    consistency: delegated
This yields the same result: a timeout and a service that does not come up. Any help and pointers in the right direction would be much appreciated.
As mentioned in my own comment, this was due to SELinux and the labeling of data when mounting. To avoid this issue, we had to turn off the labeling:
mycontainer:
  build: mycontainer
  restart: always
  # To avoid issue with named volume and mounting time
  security_opt:
    - "label=disable"
Disclaimer: I'm not 100% sure about the full consequences of using this option, but in our situation this was the feasible way of solving it for now.
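For the docker run test from the question, the equivalent of that Compose option is the --security-opt flag (a sketch, reusing the volume name from the question):
docker run -it --rm --name rmytest \
  --security-opt label=disable \
  --volume my-service-new_repo:/data \
  ubuntu:latest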
