I am trying to perform a Kubernetes Rolling Update using Helm v2; however, I'm unable to.
When I perform a helm upgrade on a slow Tomcat image, the original pod is destroyed.
I would like to figure out how to achieve zero downtime by incrementally replacing Pod instances with new ones and draining the old ones.
To demonstrate, I created a sample slow Tomcat Docker image, and a Helm chart.
To install:
helm install https://github.com/h-q/slowtom/raw/master/docs/slowtom.tgz --name slowtom \
-f https://github.com/h-q/slowtom/raw/master/docs/slowtom/environments/initial.yaml
You can follow the logs by running kubectl logs -f slowtom-sf-0, and once the pod is ready you can access the application at http://localhost:30901
To upgrade:
(and that's where I need help)
helm upgrade slowtom https://github.com/h-q/slowtom/raw/master/docs/slowtom.tgz \
-f https://github.com/h-q/slowtom/raw/master/docs/slowtom/environments/upgrade.yaml
The upgrade.yaml values file is identical to initial.yaml except for the image tag version number.
Here the original pod is destroyed and only then does the new one start, so in the meantime users are unable to access the application at http://localhost:30901.
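You can watch the Pod being replaced while the upgrade runs (the app label value comes from the chart values shown below):
kubectl get pods -l app=slowtom-sf -w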
To delete:
helm del slowtom --purge
Reference
Local Helm Chart
Download helm chart:
curl -LO https://github.com/h-q/slowtom/raw/master/docs/slowtom.tgz
tar vxfz ./slowtom.tgz
Install from local helm-chart:
helm install --debug ./slowtom --name slowtom -f ./slowtom/environments/initial.yaml
Upgrade from local helm-chart:
helm upgrade --debug slowtom ./slowtom -f ./slowtom/environments/upgrade.yaml
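You can also render the chart without installing anything, as a quick sanity check of the templates (Helm v2 syntax, same paths as above):
helm install --dry-run --debug ./slowtom --name slowtom -f ./slowtom/environments/initial.yaml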
Docker Image
Dockerfile
FROM tomcat:8.5-jdk8-corretto
RUN mkdir /usr/local/tomcat/webapps/ROOT && \
echo '<html><head><title>Slow Tomcat</title></head><body><h1>Slow Tomcat Now Ready</h1></body></html>' >> /usr/local/tomcat/webapps/ROOT/index.html
RUN echo '#!/usr/bin/env bash' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo 'x=2' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo 'secs=$(($x * 60))' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo 'while [ $secs -gt 0 ]; do' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo ' >&2 echo -e "Blast off in $secs\033[0K\r"' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo ' sleep 1' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo ' : $((secs--))' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo 'done' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo '>&2 echo "slow cataline done. will now start real catalina"' >> /usr/local/tomcat/bin/slowcatalina.sh && \
echo 'exec catalina.sh run' >> /usr/local/tomcat/bin/slowcatalina.sh && \
chmod +x /usr/local/tomcat/bin/slowcatalina.sh
ENTRYPOINT ["/usr/local/tomcat/bin/slowcatalina.sh"]
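To build and smoke-test the image locally (a sketch; the repository name and tag follow the chart values below):
docker build -t hqasem/slow-tomcat:1 .
docker run --rm -p 8080:8080 hqasem/slow-tomcat:1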
Helm Chart Content
slowtom/Chart.yaml
apiVersion: v1
description: slow-tomcat Helm chart for Kubernetes
name: slowtom
version: 1.1.2 # whatever
slowtom/values.yaml
# Do not use this file directly; use the ones in the environments folder
slowtom/environments/initial.yaml
# Storefront
slowtom_sf:
name: "slowtom-sf"
hasHealthcheck: "true"
isResilient: "false"
replicaCount: 2
aspect_values:
- name: y_aspect
value: "storefront"
image:
repository: hqasem/slow-tomcat
pullPolicy: IfNotPresent
tag: 1
env:
- name: y_env
value: whatever
slowtom/environments/upgrade.yaml
# Storefront
slowtom_sf:
name: "slowtom-sf"
hasHealthcheck: "true"
isResilient: "false"
replicaCount: 2
aspect_values:
- name: y_aspect
value: "storefront"
image:
repository: hqasem/slow-tomcat
pullPolicy: IfNotPresent
tag: 2
env:
- name: y_env
value: whatever
slowtom/templates/deployment.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: {{ .Values.slowtom_sf.name }}
labels:
chart: "{{ .Chart.Name | trunc 63 }}"
chartVersion: "{{ .Chart.Version | trunc 63 }}"
visualize: "true"
app: {{ .Values.slowtom_sf.name }}
spec:
replicas: {{ .Values.slowtom_sf.replicaCount }}
selector:
matchLabels:
app: {{ .Values.slowtom_sf.name }}
template:
metadata:
labels:
app: {{ .Values.slowtom_sf.name }}
visualize: "true"
spec:
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: {{ .Values.slowtom_sf.name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
command: ["/usr/local/tomcat/bin/slowcatalina.sh"]
args: ["whatever"]
env:
{{ toYaml .Values.env | indent 12 }}
{{ toYaml .Values.slowtom_sf.aspect_values | indent 12 }}
resources:
{{ toYaml .Values.resources | indent 12 }}
---
slowtom/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
name: {{.Values.slowtom_sf.name}}
labels:
chart: "{{ .Chart.Name | trunc 63 }}"
chartVersion: "{{ .Chart.Version | trunc 63 }}"
app: {{.Values.slowtom_sf.name}}
visualize: "true"
hasHealthcheck: "{{ .Values.slowtom_sf.hasHealthcheck }}"
isResilient: "{{ .Values.slowtom_sf.isResilient }}"
spec:
type: NodePort
selector:
app: {{.Values.slowtom_sf.name}}
sessionAffinity: ClientIP
ports:
- protocol: TCP
port: 8080
targetPort: 8080
name: http
nodePort: 30901
---
Unlike a Deployment, a StatefulSet does not start a new Pod before destroying the old one during a rolling update. Instead, the expectation is that you run multiple replicas and they are replaced one by one, with each old Pod terminated before its replacement is created, so a single replica is necessarily taken down first. Either keep your replica count at 2 or more (and make sure traffic only goes to Pods that are actually ready), or switch to a Deployment template.
https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#rolling-update
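If you switch to a Deployment, you can also make the rollout behaviour explicit so a replacement Pod is created and becomes ready before an old one is removed. A minimal sketch of the relevant part of such a template, reusing the values above (not a drop-in replacement for the whole file):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.slowtom_sf.name }}
spec:
  replicas: {{ .Values.slowtom_sf.replicaCount }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: {{ .Values.slowtom_sf.name }}
  template:
    # same Pod template as in the StatefulSet above,
    # plus a readiness probe so new Pods only receive traffic once Tomcat is up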
I solved this problem by adding Readiness or Startup Probes to my deployment.yaml
slowtom/templates/deployment.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: {{ .Values.slowtom_sf.name }}
labels:
chart: "{{ .Chart.Name | trunc 63 }}"
chartVersion: "{{ .Chart.Version | trunc 63 }}"
visualize: "true"
app: {{ .Values.slowtom_sf.name }}
spec:
replicas: {{ .Values.slowtom_sf.replicaCount }}
selector:
matchLabels:
app: {{ .Values.slowtom_sf.name }}
template:
metadata:
labels:
app: {{ .Values.slowtom_sf.name }}
visualize: "true"
spec:
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: {{ .Values.slowtom_sf.name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
command: ["/usr/local/tomcat/bin/slowcatalina.sh"]
args: ["whatever"]
env:
{{ toYaml .Values.env | indent 12 }}
{{ toYaml .Values.slowtom_sf.aspect_values | indent 12 }}
resources:
{{ toYaml .Values.resources | indent 12 }}
readinessProbe:
httpGet:
path: /
port: 8080
initialDelaySeconds: 60
periodSeconds: 30
timeoutSeconds: 1
failureThreshold: 3
---
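A startup probe is the other option mentioned above. On clusters that support startupProbe (Kubernetes 1.16+), something like this in the same container spec would hold back the rollout until the slow start-up completes, instead of relying on a large initialDelaySeconds (a sketch using the same endpoint, not what I ended up shipping):
startupProbe:
  httpGet:
    path: /
    port: 8080
  periodSeconds: 10
  failureThreshold: 30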
Related
We are in the process of migrating an init container to a Kubernetes Job, so I added the init container's image location to the containers section of job.yaml, but the shell script invoked from the init container's Dockerfile is not being executed. Could someone help me figure out what is going wrong here?
job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: "{{ .Release.Name }}-init-job"
namespace: {{ .Release.Namespace }}
spec:
template:
metadata:
annotations:
linkerd.io/inject: disabled
"helm.sh/hook-delete-policy": before-hook-creation
"helm.sh/hook": pre-install,pre-upgrade,pre-delete
"helm.sh/hook-weight": "-5"
spec:
serviceAccountName: {{ .Release.Name }}-init-service-account
containers:
- name: app-installer
image: artifactorylocation/test-init-container:1.0.1
command:
- /bin/bash
- -c
- echo Hello executing k8s init-container
securityContext:
readOnlyRootFilesystem: true
restartPolicy: OnFailure
Dockerfile of test-init-container
FROM repository/java17-ol8-x64:adddd4c
WORKDIR /
ADD target/test-init-container-ms.jar ./
ADD target/lib ./lib
ADD start.sh /
RUN chmod +x /start.sh
CMD ["sh", "/start.sh"]
EXPOSE 8080
start.sh is not being executed by the Job.
Just change your job.yaml so that it no longer overrides the image's default command (CMD):
apiVersion: batch/v1
kind: Job
metadata:
name: "{{ .Release.Name }}-init-job"
namespace: {{ .Release.Namespace }}
spec:
template:
metadata:
annotations:
linkerd.io/inject: disabled
"helm.sh/hook-delete-policy": before-hook-creation
"helm.sh/hook": pre-install,pre-upgrade,pre-delete
"helm.sh/hook-weight": "-5"
spec:
serviceAccountName: {{ .Release.Name }}-init-service-account
containers:
- name: app-installer
image: artifactorylocation/test-init-container:1.0.1
securityContext:
readOnlyRootFilesystem: true
restartPolicy: OnFailure
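This works because command: overrides the image's ENTRYPOINT and args: overrides its CMD, so the original spec replaced sh /start.sh with the echo. If you do want both the message and the script, one option (a sketch, keeping the rest of the container spec unchanged) is to chain them explicitly:
command: ["/bin/bash", "-c"]
args: ["echo Hello executing k8s init-container && sh /start.sh"]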
I am facing a "CrashLoopBackoff" error when I deploy a .Net Core API with helm upgrade --install flextoeco . :
NAME READY STATUS RESTARTS AGE
flextoecoapi-6bb7cdd846-r6c67 0/1 CrashLoopBackOff 4 (38s ago) 3m8s
flextoecoapi-fb7f7b556-tgbrv 0/1 CrashLoopBackOff 219 (53s ago) 10h
mssql-depl-86c86b5f44-ldj48 0/1 Pending
I have run kubectl describe pod flextoecoapi-6bb7cdd846-r6c67 and part of the output is below:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m4s default-scheduler Successfully assigned default/flextoecoapi-6bb7cdd846-r6c67 to fbcdcesdn02
Normal Pulling 5m3s kubelet Pulling image "golide/flextoeco:1.1.1"
Normal Pulled 4m57s kubelet Successfully pulled image "golide/flextoeco:1.1.1" in 6.2802081s
Normal Killing 4m34s kubelet Container flextoeco failed liveness probe, will be restarted
Normal Created 4m33s (x2 over 4m57s) kubelet Created container flextoeco
Normal Started 4m33s (x2 over 4m56s) kubelet Started container flextoeco
Normal Pulled 4m33s kubelet Container image "golide/flextoeco:1.1.1" already present on machine
Warning Unhealthy 4m14s (x12 over 4m56s) kubelet Readiness probe failed: Get "http://10.244.6.59:80/": dial tcp 10.244.0.59:80: connect: connection refused
Warning Unhealthy 4m14s (x5 over 4m54s) kubelet Liveness probe failed: Get "http://10.244.6.59:80/": dial tcp 10.244.0.59:80: connect: connection refused
Warning BackOff 3s (x10 over 2m33s) kubelet Back-off restarting failed container
Going by the suggestions here, it appears I have a number of options to fix this, the most notable being:
i) Add a command to the Dockerfile that ensures there is some foreground process running
ii) Extend the liveness probe's initialDelaySeconds
I opted for the first and edited my Dockerfile as below:
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build-env
WORKDIR /app
# Copy csproj and restore as distinct layers
COPY *.csproj ./
RUN dotnet restore
# Copy everything else and build
COPY . ./
RUN dotnet publish -c Release -o out
# Build runtime image
FROM mcr.microsoft.com/dotnet/aspnet:3.1
WORKDIR /app
ENV ASPNETCORE_URLS http://+:5000
COPY --from=build-env /app/out .
ENTRYPOINT ["dotnet", "FlexToEcocash.dll"]
CMD tail -f /dev/null
After this change I am still getting the same error.
UPDATE
Skipped: The deployment works perfectly when I am not using Helm, i.e. I can kubectl apply the deployment/service/NodePort/ClusterIP manifests and the API is deployed without issues.
I have tried updating values.yaml and service.yaml as below, but after redeploying the CrashLoopBackOff error persists:
templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: {{ include "flextoeco.fullname" . }}
labels:
{{- include "flextoeco.labels" . | nindent 4 }}
spec:
type: {{ .Values.service.type }}
ports:
- port: {{ .Values.service.port }}
targetPort: http
protocol: TCP
name: http
selector:
{{- include "flextoeco.selectorLabels" . | nindent 4 }}
values.yaml
I have explicitly specified the CPU and memory usage here
replicaCount: 1
image:
repository: golide/flextoeco
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion.
tag: "1.1.2"
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
podAnnotations: {}
podSecurityContext: {}
# fsGroup: 2000
securityContext: {}
service:
type: ClusterIP
port: 80
ingress:
enabled: false
className: ""
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
hosts:
- host: flextoeco.local
paths:
- path: /
pathType: ImplementationSpecific
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
resources:
limits:
cpu: 1
memory: 1Gi
requests:
cpu: 100m
memory: 250Mi
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 100
targetCPUUtilizationPercentage: 80
# targetMemoryUtilizationPercentage: 80
nodeSelector: {}
templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "flextoeco.fullname" . }}
labels:
{{- include "flextoeco.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "flextoeco.selectorLabels" . | nindent 6 }}
template:
metadata:
{{- with .Values.podAnnotations }}
annotations:
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "flextoeco.selectorLabels" . | nindent 8 }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "flextoeco.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: 80
protocol: TCP
livenessProbe:
tcpSocket:
port: 8085
initialDelaySeconds: 300
periodSeconds: 30
timeoutSeconds: 20
readinessProbe:
tcpSocket:
port: 8085
initialDelaySeconds: 300
periodSeconds: 30
resources:
{{- toYaml .Values.resources | nindent 12 }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
In the Deployment spec, I need to use port 5000 as the containerPort: value and also as the port: in the probes, because my application is listening on port 5000:
- name: http
containerPort: 5000
protocol: TCP
livenessProbe:
tcpSocket:
port: 5000
initialDelaySeconds: 300
periodSeconds: 30
timeoutSeconds: 20
readinessProbe:
tcpSocket:
port: 5000
initialDelaySeconds: 300
periodSeconds: 30
The configuration in service.yaml is correct: if the Deployment spec maps the name http to port 5000, then referring to targetPort: http in the Service is right.
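A quick way to confirm the fix is to check that the Service now has ready endpoints and that the app answers through it (assuming the Service name renders as flextoeco):
kubectl get endpoints flextoeco
kubectl port-forward svc/flextoeco 8080:80
curl http://localhost:8080/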
I am trying to run a docker container within a job that I am deploying with helm using AKS. The purpose of this is to run some tests using Selenium and make some postgres calls to automate web ui tests.
When trying to run within the job, the following error is received:
"Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
I ran into this problem locally, but can work around it using
docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock web-ui-auto:latest /bin/bash
The problem is that I am using Helm to deploy the Job separately from the running tasks, since it can take about an hour to complete.
I tried adding a deployment.yaml to my Helm chart like this:
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "web-ui-auto.fullname" . }}
labels:
app: {{ template "web-ui-auto.name" . }}
chart: {{ template "web-ui-auto.chart" . }}
draft: {{ .Values.draft | default "draft-app" }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
revisionHistoryLimit: 5
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
app: {{ template "web-ui-auto.name" . }}
release: {{ .Release.Name }}
template:
metadata:
labels:
app: {{ template "web-ui-auto.name" . }}
draft: {{ .Values.draft | default "draft-app" }}
release: {{ .Release.Name }}
annotations:
buildID: {{ .Values.buildID | default "" | quote }}
container.apparmor.security.beta.kubernetes.io/{{ .Chart.Name }}: runtime/default
spec:
containers:
- name: {{ .Chart.Name }}
securityContext:
runAsNonRoot: {{ .Values.securityContext.runAsNonRoot }}
runAsUser: {{ .Values.securityContext.runAsUser }}
runAsGroup: {{ .Values.securityContext.runAsGroup }}
allowPrivilegeEscalation: {{ .Values.securityContext.allowPrivilegeEscalation }}
seccompProfile:
type: RuntimeDefault
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: {{ .Values.deployment.containerPort }}
protocol: TCP
volumeMounts:
- name: dockersock
mountPath: "/var/run/docker.sock"
volumes:
- name: dockersock
hostPath:
path: /var/run/docker.sock
but I was still met with failures. My question is: what is the best method to run Docker within the Job successfully when deploying with Helm? Any help is appreciated.
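For reference, the same hostPath socket mount expressed inside the Job template (only a sketch of the approach described above, not a verified fix; the -tests name suffix is illustrative, and whether the node's Docker socket is exposed at all depends on the AKS node's container runtime):
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ template "web-ui-auto.fullname" . }}-tests
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        volumeMounts:
        - name: dockersock
          mountPath: /var/run/docker.sock
      volumes:
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock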
When I run kubectl -n magento logs magento-install-jssk6 I am getting Database found and then In ConfigModel.php line 166: Missing write permissions to the following paths: /var/www/html/pub/media in my install Job:
apiVersion: batch/v1
kind: Job
metadata:
name: magento-install
namespace: magento
spec:
template:
metadata:
name: install
labels:
app: magento-install
k8s-app: magento
spec:
containers:
- name: magento-setup
image: kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm
command: ["/bin/sh"]
args:
- -c
- |
/bin/bash <<'EOF'
bin/install.sh
php bin/magento setup:perf:generate-fixtures setup/performance-toolkit/profiles/ce/small.xml
magerun index:list | awk '{print $2}' | tail -n+4 | xargs -I{} magerun index:set-mode schedule {}
magerun cache:flush
EOF
envFrom:
- configMapRef:
name: config
volumeMounts:
- mountPath: /var/www/html/pub/media
name: media
volumes:
- name: media
persistentVolumeClaim:
claimName: media
restartPolicy: OnFailure
and when I try to change the permissions I get:
chown: changing ownership of '/var/www/html/pub/media': Operation not permitted
It happens because you run chown as the www-data user while the current owner of this directory is root.
You can resolve the issue by using an init container that runs as root (user id 0). Below is a modified version of your magento-install Job with the init container already added:
apiVersion: batch/v1
kind: Job
metadata:
name: magento-install
namespace: magento
spec:
template:
metadata:
name: install
labels:
app: magento-install
k8s-app: magento
spec:
initContainers:
- name: magento-chown
securityContext:
runAsUser: 0
image: kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm
command: ['sh', '-c', 'chown -R www-data:www-data /var/www/html/pub/media']
volumeMounts:
- name: media
mountPath: "/var/www/html/pub/media"
containers:
- name: magento-setup
image: kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm
command: ["/bin/sh"]
args:
- -c
- |
/bin/bash <<'EOF'
bin/install.sh
php bin/magento setup:perf:generate-fixtures setup/performance-toolkit/profiles/ce/small.xml
magerun index:list | awk '{print $2}' | tail -n+4 | xargs -I{} magerun index:set-mode schedule {}
magerun cache:flush
EOF
envFrom:
- configMapRef:
name: config
volumeMounts:
- mountPath: /var/www/html/pub/media
name: media
volumes:
- name: media
persistentVolumeClaim:
claimName: media
restartPolicy: OnFailure
Once you attach to your newly created Pod by using:
kubectl exec -ti -n magento magento-install-z66qg -- /bin/bash
You'll see that the current owner of the /var/www/html/pub/media directory is no longer root but the www-data user:
www-data@magento-install-z66qg:~/html$ ls -ld /var/www/html/pub/media
drwxr-xr-x 3 www-data www-data 4096 Jul 27 18:45 /var/www/html/pub/media
We can simplify it even more. The init container doesn't even need to use the kiweeteam/magento2:vanilla-2.3.4-php7.3-fpm image. It might as well be a simple container based on busybox, which runs as root by default, so you can omit the security context from the previous example, and your initContainers section will look as follows:
initContainers:
- name: magento-chown
image: busybox
command: ['sh', '-c', 'chown -R www-data:www-data /var/www/html/pub/media']
volumeMounts:
- name: media
  mountPath: "/var/www/html/pub/media"
The final effect will be exactly the same.
I have a config for deploying 4 pods (hence, 4 workers) for Airflow on Kubernetes using Docker. However, all of a sudden, worker-0 is unable to make a certain curl request, whereas the other workers can. This is resulting in the failure of pipelines.
I have tried reading about mismatched configs and StatefulSets, but in my case there is one config for all the workers and it is the single source of truth.
The statefulsets-workers.yaml file is as follows:
# Workers are not in a Deployment but in a StatefulSet, to allow each worker to expose a mini-server
# that only serves logs, which will be used by the web server.
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: {{ template "airflow.fullname" . }}-worker
labels:
app: {{ template "airflow.name" . }}-worker
chart: {{ template "airflow.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
serviceName: "{{ template "airflow.fullname" . }}-worker"
updateStrategy:
type: RollingUpdate
# Use experimental burst mode for faster StatefulSet scaling
# https://github.com/kubernetes/kubernetes/commit/****
podManagementPolicy: Parallel
replicas: {{ .Values.celery.num_workers }}
template:
metadata:
{{- if .Values.airflow.pallet.config_path }}
annotations:
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
{{- end }}
labels:
app: {{ template "airflow.name" . }}-worker
release: {{ .Release.Name }}
spec:
restartPolicy: Always
terminationGracePeriodSeconds: 30
securityContext:
runAsUser: 1002
fsGroup: 1002
containers:
- name: {{ .Chart.Name }}-worker
imagePullPolicy: {{ .Values.airflow.image_pull_policy }}
image: "{{ .Values.airflow.image }}:{{ .Values.airflow.imageTag }}"
volumeMounts:
{{- if .Values.airflow.storage.enabled }}
- name: google-cloud-key
mountPath: /var/secrets/google
readOnly: true
{{- end }}
- name: worker-logs
mountPath: /usr/local/airflow/logs
- name: data
mountPath: /usr/local/airflow/rootfs
env:
{{- if .Values.airflow.storage.enabled }}
- name: GOOGLE_APPLICATION_CREDENTIALS
value: /var/secrets/google/key.json
{{- end }}
{{- range $setting, $option := .Values.airflow.config }}
- name: {{ $setting }}
value: {{ $option }}
{{- end }}
securityContext:
allowPrivilegeEscalation: false
envFrom:
- configMapRef:
name: pallet-env-file
args: ["worker"]
ports:
- name: wlog
containerPort: 8793
protocol: TCP
{{- if .Values.airflow.image_pull_secret }}
imagePullSecrets:
- name: {{ .Values.airflow.image_pull_secret }}
{{- end }}
{{- if .Values.airflow.storage.enabled }}
volumes:
- name: google-cloud-key
secret:
secretName: {{ .Values.airflow.storage.secretName }}
{{- end }}
volumeClaimTemplates:
- metadata:
name: worker-logs
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 50Gi
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 50Gi
I expect all the workers to be able to connect to the service to which I am making the curl request.
It turns out that the environment was indeed the same; however, the receiving machine didn't have the new IP of the node whitelisted.
When all the pods crashed, they took the node down with them, and restarting the node gave it a new IP. Hence, the connection timed out for the worker on that node.
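For anyone hitting something similar: checking the node's current external IP (and re-whitelisting it on the receiving side) is a quick way to confirm this is what happened:
kubectl get nodes -o wide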