Folder deleted/not created inside the common dir mounted with emptyDir{} type on EKS Fargate pod - docker

We are facing a strange issue with EKS Fargate pods. We want to push logs to CloudWatch with a sidecar fluent-bit container, so we mount the separately created /logs/boot and /logs/access folders in both containers using an emptyDir: {} volume. But somehow the access folder is getting deleted. When we tested this setup in local Docker it produced the desired results, but it does not work when deployed on EKS Fargate. Below are our manifest files.
Dockerfile
FROM anapsix/alpine-java:8u201b09_server-jre_nashorn
ARG LOG_DIR=/logs
# Install base packages
RUN apk update
RUN apk upgrade
# RUN apk add ca-certificates && update-ca-certificates
# Dynamically set the JAVA_HOME path
RUN export JAVA_HOME="$(dirname $(dirname $(readlink -f $(which java))))" && echo $JAVA_HOME
# Add Curl
RUN apk --no-cache add curl
RUN mkdir -p $LOG_DIR/boot $LOG_DIR/access
RUN chmod -R 0777 $LOG_DIR/*
# Add metadata to the image to describe which port the container is listening on at runtime.
# Change TimeZone
RUN apk add --update tzdata
ENV TZ="Asia/Kolkata"
# Clean APK cache
RUN rm -rf /var/cache/apk/*
# Setting JAVA HOME
ENV JAVA_HOME=/opt/jdk
# Copy all files and folders
COPY . .
RUN rm -rf /opt/jdk/jre/lib/security/cacerts
COPY cacerts /opt/jdk/jre/lib/security/cacerts
COPY standalone.xml /jboss-eap-6.4-integration/standalone/configuration/
# Set the working directory.
WORKDIR /jboss-eap-6.4-integration/bin
EXPOSE 8177
CMD ["./erctl"]
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vinintegrator
  namespace: eretail
  labels:
    app: vinintegrator
    pod: fargate
spec:
  selector:
    matchLabels:
      app: vinintegrator
      pod: fargate
  replicas: 2
  template:
    metadata:
      labels:
        app: vinintegrator
        pod: fargate
    spec:
      securityContext:
        fsGroup: 0
      serviceAccount: eretail
      containers:
      - name: vinintegrator
        imagePullPolicy: IfNotPresent
        image: 653580443710.dkr.ecr.ap-southeast-1.amazonaws.com/vinintegrator-service:latest
        resources:
          limits:
            memory: "7629Mi"
            cpu: "1.5"
          requests:
            memory: "5435Mi"
            cpu: "750m"
        ports:
        - containerPort: 8177
          protocol: TCP
        # securityContext:
        #   runAsUser: 506
        #   runAsGroup: 506
        volumeMounts:
        - mountPath: /jboss-eap-6.4-integration/bin
          name: bin
        - mountPath: /logs
          name: logs
      - name: fluent-bit
        image: 657281243710.dkr.ecr.ap-southeast-1.amazonaws.com/fluent-bit:latest
        imagePullPolicy: IfNotPresent
        env:
        - name: HOST_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 200m
            memory: 100Mi
        volumeMounts:
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: logs
          mountPath: /logs
          readOnly: true
      volumes:
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      - name: logs
        emptyDir: {}
      - name: bin
        persistentVolumeClaim:
          claimName: vinintegrator-pvc
Below is the /logs folder ownership and permission. Please notice the 's' in drwxrwsrwx
drwxrwsrwx 3 root root 4096 Oct 1 11:50 logs
Below is the content of the logs folder. Note that the access folder is missing: it was either never created or was deleted.
/logs # ls -lrt
total 4
drwxr-sr-x 2 root root 4096 Oct 1 11:50 boot
/logs #
Below is the configmap of Fluent-Bit
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: eretail
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush 5
        Log_Level info
        Daemon off
        Parsers_File parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
    @INCLUDE application-log.conf
  application-log.conf: |
    [INPUT]
        Name tail
        Path /logs/boot/*.log
        Tag boot
    [INPUT]
        Name tail
        Path /logs/access/*.log
        Tag access
    [OUTPUT]
        Name cloudwatch_logs
        Match *boot*
        region ap-southeast-1
        log_group_name eks-fluent-bit
        log_stream_prefix boot-log-
        auto_create_group On
    [OUTPUT]
        Name cloudwatch_logs
        Match *access*
        region ap-southeast-1
        log_group_name eks-fluent-bit
        log_stream_prefix access-log-
        auto_create_group On
  parsers.conf: |
    [PARSER]
        Name docker
        Format json
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ
Below is error log of Fluent-bit container
AWS for Fluent Bit Container Image Version 2.14.0
Fluent Bit v1.7.4
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2021/10/01 06:20:33] [ info] [engine] started (pid=1)
[2021/10/01 06:20:33] [ info] [storage] version=1.1.1, initializing...
[2021/10/01 06:20:33] [ info] [storage] in-memory
[2021/10/01 06:20:33] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/10/01 06:20:33] [error] [input:tail:tail.1] read error, check permissions: /logs/access/*.log
[2021/10/01 06:20:33] [ warn] [input:tail:tail.1] error scanning path: /logs/access/*.log
[2021/10/01 06:20:38] [error] [net] connection #33 timeout after 5 seconds to: 169.254.169.254:80
[2021/10/01 06:20:38] [error] [net] socket #33 could not connect to 169.254.169.254:80

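I suggest removing the following from your Dockerfile. An emptyDir volume mounted at /logs hides whatever the image created at that path, so these lines have no effect inside the pod:
RUN mkdir -p $LOG_DIR/boot $LOG_DIR/access
RUN chmod -R 0777 $LOG_DIR/*
Instead, set up the log directories and permissions in an init container: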
apiVersion: v1
kind: Pod # Deployment
metadata:
  name: busy
  labels:
    app: busy
spec:
  volumes:
  - name: logs # Shared folder with ephemeral storage
    emptyDir: {}
  initContainers: # Setup your log directory here
  - name: setup
    image: busybox
    command: ["/bin/ash", "-c"]
    args:
    - >
      mkdir -p /logs/boot /logs/access;
      chmod -R 777 /logs
    volumeMounts:
    - name: logs
      mountPath: /logs
  containers:
  - name: app # Run your application and logs to the directories
    image: busybox
    command: ["/bin/ash", "-c"]
    args:
    - >
      while :; do echo "$(date): $(uname -r)" | tee -a /logs/boot/boot.log /logs/access/access.log; sleep 1; done
    volumeMounts:
    - name: logs
      mountPath: /logs
  - name: logger # Any logger that you like
    image: busybox
    command: ["/bin/ash", "-c"]
    args: # tail the app logs, forward to CW etc...
    - >
      sleep 5;
      tail -f /logs/boot/boot.log /logs/access/access.log
    volumeMounts:
    - name: logs
      mountPath: /logs
The snippet runs on Fargate as well; run kubectl logs -f busy -c logger to see the tailing. In the real world, "app" is your Java app and "logger" is any log agent you like. Note that Fargate has a native logging capability based on AWS Fluent Bit, so you do not need to run AWS Fluent Bit as a sidecar.
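For reference, a minimal sketch of the built-in Fargate log router configuration mentioned above might look like the following. The region and log group name are taken from the question; the stream prefix fargate- is an assumption chosen for illustration. As far as I know, the built-in router only collects what containers write to stdout/stderr, so an application that logs to files under /logs would have to log to stdout instead, or keep using a sidecar.

```yaml
# Namespace that Fargate checks for its built-in log router configuration
apiVersion: v1
kind: Namespace
metadata:
  name: aws-observability
  labels:
    aws-observability: enabled
---
# ConfigMap read by the built-in Fluent Bit log router
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  output.conf: |
    [OUTPUT]
        Name cloudwatch_logs
        Match *
        region ap-southeast-1
        log_group_name eks-fluent-bit
        # assumption: illustrative prefix, pick your own
        log_stream_prefix fargate-
        auto_create_group true
```

The pod execution role of the Fargate profile also needs permission to write to CloudWatch Logs for this to work.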

Related

Cannot mount local config for maven in docker multi-stage builds

I am using Jenkins with Kubernetes agents (Kubernetes plugin).
I want to run a Docker multi-stage build. In the Maven build stage I want to use a local repository (configured in settings.xml).
I have already created a ConfigMap on K8s and mounted it for the build job.
agent {
  kubernetes {
    yaml '''
      apiVersion: v1
      kind: Pod
      spec:
        volumes:
        - name: docker-socket
          emptyDir: {}
        - configMap:
            defaultMode: 420
            name: nexus-xml-test
          name: config-vol
        containers:
        - name: docker-pod
          image: docker:19.03.1
          command:
          - cat
          tty: true
          volumeMounts:
          - name: docker-socket
            mountPath: /var/run
          - mountPath: /root/.m2
            name: config-vol
        - name: docker-daemon
          image: docker:19.03.1-dind
          securityContext:
            privileged: true
          volumeMounts:
          - name: docker-socket
            mountPath: /var/run
    '''
  }
}
I have already verified that settings.xml is mounted:
2022-09-13 17:27:54 + cd /root/.m2
2022-09-13 17:27:54 + ls
2022-09-13 17:27:54 settings.xml
In the Dockerfile, I added this command:
RUN mvn -s /root/.m2/settings.xml
But when I build, it cannot find settings.xml:
2022-09-13 17:45:48 Step 2/10 : RUN mvn -s /root/.m2/settings.xml
2022-09-13 17:45:51 ---> Running in e779f9fcf9b6
2022-09-13 17:45:52 [ERROR] Error executing Maven.
2022-09-13 17:45:52 [ERROR] The specified user settings file does not exist: /root/.m2/settings.xml
2022-09-13 17:45:52 The command '/bin/sh -c mvn -s /root/.m2/settings.xml' returned a non-zero code: 1
Any suggestions would be appreciated.

Why I cannot read files from a shared PersistentVolumeClaim between containers in Kubernetes?

I have a docker image felipeogutierrez/tpch-dbgen that I build using docker-compose and I push it to docker-hub registry using travis-CI.
version: "3.7"
services:
  other-images: ....
  tpch-dbgen:
    build: ../docker/tpch-dbgen
    image: felipeogutierrez/tpch-dbgen
    volumes:
      - tpch-dbgen-data:/opt/tpch-dbgen/data/
      - datarate:/tmp/
    stdin_open: true
and this is the Dockerfile to build this image:
FROM gcc AS builder
RUN mkdir -p /opt
COPY ./generate-tpch-dbgen.sh /opt/generate-tpch-dbgen.sh
WORKDIR /opt
RUN chmod +x generate-tpch-dbgen.sh && ./generate-tpch-dbgen.sh
In the end, this script creates a directory /opt/tpch-dbgen/data/ with some files that I would like to read from another Docker image that I am running on Kubernetes. I also have a Flink image that I built to run on Kubernetes. It starts 3 Flink Task Managers and one stream application that reads the files produced by the tpch-dbgen image. I think the right approach is to create a PersistentVolumeClaim so I can share the directory /opt/tpch-dbgen/data/ from the image felipeogutierrez/tpch-dbgen with my Flink image in Kubernetes. So, first I have this file to create the PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tpch-dbgen-data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi
Then I create an init container that launches the image felipeogutierrez/tpch-dbgen, and after that I launch my image felipeogutierrez/explore-flink:1.11.1-scala_2.12:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      initContainers:
      - name: tpch-dbgen
        image: felipeogutierrez/tpch-dbgen
        #imagePullPolicy: Always
        env:
        command: ["ls"]
        # command: ['sh', '-c', 'for i in 1 2 3; do echo "job-1 `date`" && sleep 5s; done;', 'ls']
        volumeMounts:
        - name: tpch-dbgen-data
          mountPath: /opt/tpch-dbgen/data
      containers:
      - name: taskmanager
        image: felipeogutierrez/explore-flink:1.11.1-scala_2.12
        #imagePullPolicy: Always
        env:
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        - name: tpch-dbgen-data
          mountPath: /opt/tpch-dbgen/data
        securityContext:
          runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      - name: tpch-dbgen-data
        persistentVolumeClaim:
          claimName: tpch-dbgen-data-pvc
The Flink stream application starts, but it cannot read the files in the directory /opt/tpch-dbgen/data of the image felipeogutierrez/tpch-dbgen. I am getting the error: java.io.FileNotFoundException: /opt/tpch-dbgen/data/orders.tbl (No such file or directory). This is strange, because when I run the felipeogutierrez/tpch-dbgen container directly I can list the files. So I suppose there is something wrong with my Kubernetes configuration. Can anyone point out what I am missing in the Kubernetes configuration files?
$ docker run -i -t felipeogutierrez/tpch-dbgen /bin/bash
root@10c0944a95f8:/opt# pwd
/opt
root@10c0944a95f8:/opt# ls tpch-dbgen/data/
customer.tbl dbgen dists.dss lineitem.tbl nation.tbl orders.tbl part.tbl partsupp.tbl region.tbl supplier.tbl
Also, when I check the logs of the tpch-dbgen container, I can see the tpch-dbgen directory that I want to read, although I cannot run command: ["ls tpch-dbgen"] from my Kubernetes config file.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
flink-jobmanager-n9nws 1/1 Running 2 17m
flink-taskmanager-777cb5bf77-ncdl4 1/1 Running 0 4m54s
flink-taskmanager-777cb5bf77-npmrx 1/1 Running 0 4m54s
flink-taskmanager-777cb5bf77-zc2nw 1/1 Running 0 4m54s
$ kubectl logs flink-taskmanager-777cb5bf77-ncdl4 tpch-dbgen
generate-tpch-dbgen.sh
tpch-dbgen
Docker has an unusual feature where, under some specific circumstances, it will populate a newly created volume from the image. You should not rely on this functionality, since it completely ignores updates in the underlying images and it doesn't work on Kubernetes.
In your Kubernetes setup, you create a new empty PersistentVolumeClaim, and then mount this over your actual data in both the init and main containers. As with all Unix mounts, this hides the data that was previously in that directory. Nothing causes data to get copied into that volume. This works the same way as every other kind of mount, except the Docker named-volume mount: you'll see the same behavior if you change your Compose setup to do a host bind mount, or if you play around with your local development system using a USB drive as a "volume".
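To see the hiding effect in isolation, a small throwaway Pod along these lines can help. This is only a sketch: the Pod name is hypothetical, and it assumes the stock alpine image ships a few files under /etc/apk, which is used here purely as an example of a directory that is populated in the image.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mount-hides-demo # hypothetical name
spec:
  restartPolicy: Never
  volumes:
  - name: empty
    emptyDir: {}
  containers:
  - name: without-volume
    image: alpine
    # Lists the files that are baked into the image
    command: ["ls", "-A", "/etc/apk"]
  - name: with-volume
    image: alpine
    # Same command, but the empty volume is mounted over the directory
    command: ["ls", "-A", "/etc/apk"]
    volumeMounts:
    - name: empty
      mountPath: /etc/apk
```

Comparing kubectl logs mount-hides-demo -c without-volume with kubectl logs mount-hides-demo -c with-volume should show the image's files in the first case and an empty listing in the second.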
You need to make your init container (or something else) explicitly copy data into the directory. For example:
initContainers:
- name: tpch-dbgen
  image: felipeogutierrez/tpch-dbgen
  command:
  - /bin/cp
  - -a
  - /opt/tpch-dbgen/data/. # trailing /. copies the directory's contents, not the directory itself
  - /data/
  volumeMounts:
  - name: tpch-dbgen-data
    mountPath: /data # NOT the same path as in the image
If the main process modifies these files in place, you can make the command be more intelligent, or write a script into your image that only copies the individual files in if they don't exist yet.
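One way the "only copy what is missing" variant could be sketched is with a small shell loop in the init container. This assumes the image provides /bin/sh, cp and basename (the gcc base image does), and it only checks top-level entries of the data directory.

```yaml
initContainers:
- name: tpch-dbgen
  image: felipeogutierrez/tpch-dbgen
  command:
  - /bin/sh
  - -c
  - |
    # Copy each top-level entry only if it is not in the volume yet,
    # so files already modified by the main process are left alone.
    for f in /opt/tpch-dbgen/data/*; do
      [ -e "/data/$(basename "$f")" ] || cp -a "$f" /data/
    done
  volumeMounts:
  - name: tpch-dbgen-data
    mountPath: /data # NOT the same path as in the image
```

The explicit existence test is used instead of relying on a no-clobber flag of cp, whose exit-status behavior differs between coreutils versions.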
It could potentially make more sense to have your image generate the data files at startup time, rather than at image-build time. That could look like:
FROM gcc
COPY ./generate-tpch-dbgen.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/generate-tpch-dbgen.sh
CMD ["generate-tpch-dbgen.sh"]
Then in your init container, you can run the default command (the generate script) with the working directory set to the volume directory:
initContainers:
- name: tpch-dbgen
  image: felipeogutierrez/tpch-dbgen
  volumeMounts:
  - name: tpch-dbgen-data
    mountPath: /opt/tpch-dbgen/data # or anywhere really
  workingDir: /opt/tpch-dbgen/data # matching mountPath
I managed to get the PersistentVolumeClaim working and share it between pods. Basically I had to use the subPath property, which I learned from this answer https://stackoverflow.com/a/43404857/2096986, and I am using a simple Job that I learned from this answer https://stackoverflow.com/a/64023672/2096986. The final result is below:
The Dockerfile:
FROM gcc AS builder
RUN mkdir -p /opt
COPY ./generate-tpch-dbgen.sh /opt/generate-tpch-dbgen.sh
WORKDIR /opt
RUN chmod +x /opt/generate-tpch-dbgen.sh
ENTRYPOINT ["/bin/sh","/opt/generate-tpch-dbgen.sh"]
The script generate-tpch-dbgen.sh has to end with the line sleep infinity & wait so that the container does not exit. The PersistentVolumeClaim is the same as in the question. Then I create a Job with the subPath property:
apiVersion: batch/v1
kind: Job
metadata:
  name: tpch-dbgen-job
spec:
  template:
    metadata:
      labels:
        app: flink
        component: tpch-dbgen
    spec:
      restartPolicy: OnFailure
      volumes:
      - name: tpch-dbgen-data
        persistentVolumeClaim:
          claimName: tpch-dbgen-data-pvc
      containers:
      - name: tpch-dbgen
        image: felipeogutierrez/tpch-dbgen
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: /opt/tpch-dbgen/data
          name: tpch-dbgen-data
          subPath: data
I then use the same volume in the other Deployment, also with the subPath property:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      - name: tpch-dbgen-data
        persistentVolumeClaim:
          claimName: tpch-dbgen-data-pvc
      containers:
      - name: taskmanager
        image: felipeogutierrez/explore-flink:1.11.1-scala_2.12
        imagePullPolicy: Always
        env:
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        - name: tpch-dbgen-data
          mountPath: /opt/tpch-dbgen/data
          subPath: data
        securityContext:
          runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary
Maybe the issue is the accessMode you set on your PVC. ReadWriteOnce means the volume can be mounted read-write by a single node only, so pods scheduled on other nodes cannot mount it.
See here for details.
You could try to use ReadWriteMany.
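As a sketch, the claim from the question with the access mode switched would look roughly like this. Whether it binds depends on the cluster: ReadWriteMany needs a volume type that supports it (NFS-style storage, for example), and the commented-out storageClassName is a placeholder, not something taken from the question.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tpch-dbgen-data-pvc
spec:
  accessModes:
  - ReadWriteMany
  # storageClassName: your-rwx-capable-class # placeholder, must exist in your cluster
  resources:
    requests:
      storage: 200Mi
```

Changing the access mode alone does not put any files into the volume; it only affects which nodes can mount it.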
Your generate-tpch-dbgen.sh script is executed while building the Docker image, which places those files in the /opt/tpch-dbgen/data directory of the image. So, when you run the image, you can see those files.
The problem with the k8s PVC is that when you mount the (initially empty) volume into your containers, it covers the /opt/tpch-dbgen/data directory along with the files in it.
Solution:
Don't execute generate-tpch-dbgen.sh while building the Docker image; execute it at runtime instead. Then the files will be created in the shared PV by the init container.
Something like below:
FROM gcc AS builder
RUN mkdir -p /opt
COPY ./generate-tpch-dbgen.sh /opt/generate-tpch-dbgen.sh
RUN chmod +x /opt/generate-tpch-dbgen.sh
ENTRYPOINT ["/bin/sh","/opt/generate-tpch-dbgen.sh"]

Missing write permissions to the following paths: /var/www/html/pub/media

kubectl -n magento logs magento-install-jssk6
I am getting "Database found In ConfigModel.php line 166: Missing write permissions to the following paths: /var/www/html/pub/media" from this install Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: magento-install
  namespace: magento
spec:
  template:
    metadata:
      name: install
      labels:
        app: magento-install
        k8s-app: magento
    spec:
      containers:
      - name: magento-setup
        image: kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm
        command: ["/bin/sh"]
        args:
        - -c
        - |
          /bin/bash <<'EOF'
          bin/install.sh
          php bin/magento setup:perf:generate-fixtures setup/performance-toolkit/profiles/ce/small.xml
          magerun index:list | awk '{print $2}' | tail -n+4 | xargs -I{} magerun index:set-mode schedule {}
          magerun cache:flush
          EOF
        envFrom:
        - configMapRef:
            name: config
        volumeMounts:
        - mountPath: /var/www/html/pub/media
          name: media
      volumes:
      - name: media
        persistentVolumeClaim:
          claimName: media
      restartPolicy: OnFailure
When I try to change the permissions, I get: chown: changing ownership of '/var/www/html/pub/media': Operation not permitted
This happens because you run chown as the www-data user while the current owner of the directory is root.
You can resolve the issue by using an init container that runs as root (the user with id 0). Below is a modified version of your magento-install Job with the init container already added:
apiVersion: batch/v1
kind: Job
metadata:
  name: magento-install
  namespace: magento
spec:
  template:
    metadata:
      name: install
      labels:
        app: magento-install
        k8s-app: magento
    spec:
      initContainers:
      - name: magento-chown
        securityContext:
          runAsUser: 0
        image: kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm
        command: ['sh', '-c', 'chown -R www-data:www-data /var/www/html/pub/media']
        volumeMounts:
        - name: media
          mountPath: "/var/www/html/pub/media"
      containers:
      - name: magento-setup
        image: kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm
        command: ["/bin/sh"]
        args:
        - -c
        - |
          /bin/bash <<'EOF'
          bin/install.sh
          php bin/magento setup:perf:generate-fixtures setup/performance-toolkit/profiles/ce/small.xml
          magerun index:list | awk '{print $2}' | tail -n+4 | xargs -I{} magerun index:set-mode schedule {}
          magerun cache:flush
          EOF
        envFrom:
        - configMapRef:
            name: config
        volumeMounts:
        - mountPath: /var/www/html/pub/media
          name: media
      volumes:
      - name: media
        persistentVolumeClaim:
          claimName: media
      restartPolicy: OnFailure
Once you attach to your newly created Pod by using:
kubectl exec -ti -n magento magento-install-z66qg -- /bin/bash
You'll see that the owner of the /var/www/html/pub/media directory is no longer root but the www-data user:
www-data@magento-install-z66qg:~/html$ ls -ld /var/www/html/pub/media
drwxr-xr-x 3 www-data www-data 4096 Jul 27 18:45 /var/www/html/pub/media
We can simplify it even more. The init container doesn't even need to use the kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm image. It might as well be a simple container based on busybox, which runs as root by default so you can omit the security context from the previous example and your initContainers section will look as follows:
initContainers:
- name: magento-chown
  image: busybox
  command: ['sh', '-c', 'chown -R www-data:www-data /var/www/html/pub/media']
  volumeMounts:
  - name: media
    mountPath: /var/www/html/pub/media
The final effect will be exactly the same.
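A different technique that is sometimes used instead of a chown init container is a pod-level fsGroup, sketched below as a fragment of the Job spec. This is not part of the approach above, and it rests on two assumptions: that www-data has GID 33 in the image (check with id www-data) and that the volume type behind the media claim supports ownership management by the kubelet, which not every storage backend does.

```yaml
spec:
  template:
    spec:
      securityContext:
        # assumption: GID of www-data in the image; verify before use
        fsGroup: 33
      containers:
      - name: magento-setup
        image: kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm
        volumeMounts:
        - mountPath: /var/www/html/pub/media
          name: media
      volumes:
      - name: media
        persistentVolumeClaim:
          claimName: media
      restartPolicy: OnFailure
```

With fsGroup set, the kubelet makes the volume group-owned by that GID and adds the GID to the container processes, so the init container may become unnecessary where this is supported.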

How to specify a directory in ConfigMap that is located with in a docker container running on Kubernetes?

I'm working in a setup where I have a docker container running airflow deployed on Kubernetes. What I'm trying to do is package the dags definition file with the docker container that contains the airflow installation (for versioning purposes), and then have the ConfigMap which defines the dags_folder directory specify the directory (within airflow-docker) where that dags definition file is.
Dockerfile (airflow is the k8s namespace)
RUN mkdir /home/airflow/ \
&& mkdir /home/airflow/dags \
&& chown airflow:airflow /home/airflow \
&& chown airflow:airflow /home/airflow/dags
...
ADD dags.py /home/airflow/dags
USER airflow
ConfigMap
airflow.cfg: |
  [core]
  dags_folder = /home/airflow/dags
You have to create the config file as an entry in the ConfigMap data:
data:
  airflow.cfg: |
    [core]
    dags_folder = /home/airflow/dags
Then mount it in the deployment.yml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-app-name
  namespace: default # or your namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: your-app-name
  template:
    metadata:
      labels:
        app: your-app-name
    spec:
      containers:
      - name: your-container-name
        image: your-image
        ports:
        - name: my-app-port
          containerPort: 7000
        volumeMounts:
        - mountPath: /your/directory/airflow.cfg
          subPath: airflow.cfg # key of the ConfigMap entry to mount as a single file
          name: config-volume # this should match the volume name below
      volumes:
      - name: config-volume
        configMap:
          name: your-config-map # name of the ConfigMap

Disable Transparent Huge Pages from Kubernetes

I deploy Redis container via Kubernetes and get the following warning:
WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled
Is it possible to disable THP via Kubernetes? Perhaps via init-containers?
Yes, with init-containers it's quite straightforward:
apiVersion: v1
kind: Pod
metadata:
  name: thp-test
spec:
  restartPolicy: Never
  terminationGracePeriodSeconds: 1
  volumes:
  - name: host-sys
    hostPath:
      path: /sys
  initContainers:
  - name: disable-thp
    image: busybox
    volumeMounts:
    - name: host-sys
      mountPath: /host-sys
    command: ["sh", "-c", "echo never >/host-sys/kernel/mm/transparent_hugepage/enabled"]
  containers:
  - name: busybox
    image: busybox
    command: ["cat", "/sys/kernel/mm/transparent_hugepage/enabled"]
Demo (notice that this is a system wide setting):
$ ssh THATNODE cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ kubectl create -f thp-test.yaml
pod "thp-test" created
$ kubectl logs thp-test
always madvise [never]
$ kubectl delete pod thp-test
pod "thp-test" deleted
$ ssh THATNODE cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
Ay,
I don't know if what I did is a good idea but we needed to deactivate THP on all our K8S VMs for all our apps. So I used a DaemonSet instead of adding an init-container to all our stacks :
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: thp-disable
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: thp-disable
  template:
    metadata:
      labels:
        name: thp-disable
    spec:
      restartPolicy: Always
      terminationGracePeriodSeconds: 1
      volumes:
      - name: host-sys
        hostPath:
          path: /sys
      initContainers:
      - name: disable-thp
        image: busybox
        volumeMounts:
        - name: host-sys
          mountPath: /host-sys
        command: ["sh", "-c", "echo never >/host-sys/kernel/mm/transparent_hugepage/enabled"]
      containers:
      - name: busybox
        image: busybox
        command: ["watch", "-n", "600", "cat", "/sys/kernel/mm/transparent_hugepage/enabled"]
I think it's a little dirty but it works.
