Jenkins is running on EKS and there are affinity rules in place on both the Jenkins main and worker pods.
The idea is to prevent the Jenkins worker pods from running on the same EKS worker nodes where the Jenkins main pod is running.
The following rules work until resource limits are pushed, at which point the Jenkins worker pods are scheduled onto the same EKS worker nodes as the Jenkins main pod.
Are there affinity / anti-affinity rules to prevent this from happening?
The rules in place for Jenkins main:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions: # assign to eks apps worker group
- key: node.app/group
operator: In
values:
- apps
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions: # don't assign to a node running jenkins main
- key: app.kubernetes.io/name
operator: In
values:
- jenkins
- key: app.kubernetes.io/component
operator: In
values:
- main
topologyKey: kubernetes.io/hostname
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions: # try not to assign to a node already running a jenkins worker
- key: app.kubernetes.io/name
operator: In
values:
- jenkins
- key: app.kubernetes.io/component
operator: In
values:
- worker
topologyKey: kubernetes.io/hostname
The rules in place for Jenkins worker:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions: # assign to eks apps worker group
- key: node.app/group
operator: In
values:
- apps
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions: # don't assign to a node running jenkins main
- key: app.kubernetes.io/name
operator: In
values:
- jenkins
- key: app.kubernetes.io/component
operator: In
values:
- main
topologyKey: kubernetes.io/hostname
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions: # try not to assign to a node already running a jenkins worker
- key: app.kubernetes.io/name
operator: In
values:
- jenkins
- key: app.kubernetes.io/component
operator: In
values:
- worker
topologyKey: kubernetes.io/hostname
So lo and behold, guess what... the main pod labels weren't set correctly.
Now you can see the selector labels displaying here:
> aws-vault exec nonlive-build -- kubectl get po -n cicd --show-labels
NAME READY STATUS RESTARTS AGE LABELS
jenkins-6597db4979-khxls 2/2 Running 0 4m8s app.kubernetes.io/component=main,app.kubernetes.io/instance=jenkins
To achieve this, new entries were added to the values file:
main:
metadata:
labels:
app.kubernetes.io/name: jenkins
app.kubernetes.io/component: main
And the Helm _helpers.tpl template was updated accordingly:
{{- define "jenkins.selectorLabels" -}}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- if .Values.main.metadata.labels }}
{{- range $k, $v := .Values.main.metadata.labels }}
{{ $k }}: {{ $v }}
{{- end }}
{{- end }}
{{- end }}
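For completeness, the helper output still has to be rendered onto the main pod itself. A minimal sketch of how the selector labels might be consumed in the chart's deployment template (the file name and indentation depth are assumptions, not the chart's actual code):
# templates/jenkins-main-deployment.yaml (hypothetical excerpt)
template:
  metadata:
    labels:
      {{- include "jenkins.selectorLabels" . | nindent 8 }}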
There is a Kubernetes cluster with 100 nodes, and I have to clean specific images manually. I know kubelet garbage collection may help, but it doesn't apply in my case.
After browsing the internet, I found a solution - docker in docker - to solve my problem.
I just want to remove the image on each node one time. Is there any way to run a job on each node exactly once?
I checked Kubernetes labels and podAffinity, but still have no ideas - could anybody help?
Also, I tried to use a DaemonSet to solve the problem, but it turns out that it only removes the image on a part of the nodes instead of all of them. I don't know what the problem might be...
Here is the DaemonSet example:
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: test-ds
labels:
k8s-app: test
spec:
selector:
matchLabels:
k8s-app: test
template:
metadata:
labels:
k8s-app: test
spec:
containers:
- name: test
env:
- name: DELETE_IMAGE_NAME
value: "nginx"
image: busybox
command: ['sh', '-c', 'curl --unix-socket /var/run/docker.sock -X DELETE http://localhost/v1.39/images/$(DELETE_IMAGE_NAME)']
securityContext:
privileged: true
volumeMounts:
- mountPath: /var/run/docker.sock
name: docker-sock-volume
ports:
- containerPort: 80
volumes:
- name: docker-sock-volume
hostPath:
# location on host
path: /var/run/docker.sock
If you want to run your Job on a single specific node, you can use a nodeSelector in the Pod spec:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: test
spec:
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: test
image: busybox
args:
- /bin/sh
- -c
- date; echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
nodeSelector:
name: node3
A DaemonSet should ideally resolve your issue, as it creates a Pod on each available node in the cluster.
You can read more about affinity here: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express. The key enhancements are:
The affinity/anti-affinity language is more expressive. The language offers more matching rules besides exact matches created with a logical AND operation.
You can use affinity in the Job YAML, something like:
apiVersion: v1
kind: Pod
metadata:
name: with-node-affinity
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/e2e-az-name
operator: In
values:
- e2e-az1
- e2e-az2
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: another-node-label-key
operator: In
values:
- another-node-label-value
containers:
- name: with-node-affinity
image: k8s.gcr.io/pause:2.0
Update
Now, if you have an issue with the DaemonSet, affinity with a Job is of little use either, as a Job creates a single Pod which will get scheduled to a single node per the affinity rules. Either create 100 Jobs with different affinity rules, or use a Deployment + anti-affinity to schedule the replicas on different nodes.
We will create one Deployment with pod anti-affinity and make sure multiple Pods of a single Deployment won't get scheduled onto one node.
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-deployment
labels:
app: test
spec:
replicas: 100
selector:
matchLabels:
app: test
template:
metadata:
labels:
app: test
spec:
containers:
- name: test
image: <Image>
ports:
- containerPort: 80
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- test
topologyKey: "kubernetes.io/hostname"
Try using this Deployment template and replace the image with yours. You can reduce the replicas to 10 first, instead of 100, to check whether it is spreading the Pods or not.
Read more at : https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#an-example-of-a-pod-that-uses-pod-affinity
Extra:
You can also write and use a custom CRD: https://github.com/darkowlzz/daemonset-job, which behaves as a DaemonSet and a Job combined.
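As a concrete sketch of the run-once-per-node idea for the image-cleanup case above: a DaemonSet whose container deletes the image through the Docker socket and then sleeps, so the Pod does not exit and re-run the call. This assumes the nodes use Docker as the container runtime; curlimages/curl is used only because busybox does not ship curl, and running as root is needed to read the host socket.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: image-cleaner
spec:
  selector:
    matchLabels:
      k8s-app: image-cleaner
  template:
    metadata:
      labels:
        k8s-app: image-cleaner
    spec:
      containers:
      - name: cleaner
        image: curlimages/curl   # busybox has no curl; this image does
        securityContext:
          runAsUser: 0           # the host docker.sock is root-owned
        command:
        - sh
        - -c
        # delete the image once, then keep the container alive so it does not restart and re-run
        - "curl --unix-socket /var/run/docker.sock -X DELETE http://localhost/v1.39/images/nginx; while true; do sleep 3600; done"
        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: docker-sock-volume
      volumes:
      - name: docker-sock-volume
        hostPath:
          path: /var/run/docker.sock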
Jenkins is running in an AWS EKS cluster under a jenkins-ci namespace. When the multibranch pipeline job "Branch-A" starts a build, it picks up the correct configuration (KubernetesPod.yaml) and runs successfully, but when job "Branch-B" starts a build it uses job A's configuration, such as the Docker image and buildUrl.
Gitlab Configuration:
Branch-A -- KubernetesPod.yaml
apiVersion: v1
kind: Pod
spec:
serviceAccount: jenkins
nodeSelector:
env: jenkins-build
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: env
operator: In
values:
- jenkins-build
tolerations:
- key: "highcpu"
operator: "Equal"
value: "true"
effect: "NoSchedule"
volumes:
- name: dev
hostPath:
path: /dev
imagePullSecrets:
- name: gitlab
containers:
- name: build
image: registry.gitlab.com/mycompany/sw-group/docker/ycp:docker-buildtest-1
imagePullPolicy: IfNotPresent
command:
- cat
securityContext:
privileged: true
volumeMounts:
- mountPath: /dev
name: dev
tty: true
resources:
requests:
memory: "4000Mi"
cpu: "3500m"
limits:
memory: "4000Mi"
cpu: "3500m"
Branch-B -- KubernetesPod.yaml
apiVersion: v1
kind: Pod
spec:
serviceAccount: jenkins
nodeSelector:
env: jenkins-build
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: env
operator: In
values:
- jenkins-build
tolerations:
- key: "highcpu"
operator: "Equal"
value: "true"
effect: "NoSchedule"
volumes:
- name: dev
hostPath:
path: /dev
imagePullSecrets:
- name: gitlab
containers:
- name: build
image: registry.gitlab.com/mycompany/sw-group/docker/ycp:docker-buildtest-2
imagePullPolicy: IfNotPresent
command:
- cat
securityContext:
privileged: true
volumeMounts:
- mountPath: /dev
name: dev
tty: true
resources:
requests:
memory: "4000Mi"
cpu: "3500m"
limits:
memory: "4000Mi"
cpu: "3500m"
Jenkins Branch-A console output:
Seen branch in repository origin/unknownMishariBranch
Seen branch in repository origin/vikg/base
Seen 471 remote branches
Obtained Jenkinsfile.kubernetes from 85b8ab296342b98be52cbef26acf20b15503c273
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] readTrusted
Obtained KubernetesPod.yaml from 85b8ab296342b98be52cbef26acf20b15503c273
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Still waiting to schedule task
Waiting for next available executor
Agent company-pod-8whw9-wxflb is provisioned from template Kubernetes Pod Template
---
apiVersion: "v1"
kind: "Pod"
metadata:
annotations:
buildUrl: "https://jenkins.mycompany.com/job/multibranch/job/branch-A/3/"
labels:
jenkins: "slave"
jenkins/mycompany-pod: "true"
name: "mycompany-pod-8whw9-wxflb"
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: "env"
operator: "In"
values:
- "jenkins-build"
weight: 1
containers:
- command:
- "cat"
image: "registry.gitlab.com/mycompany/sw-group/docker/ycp:docker-buildtest-1"
imagePullPolicy: "IfNotPresent"
name: "build"
resources:
limits:
memory: "4000Mi"
cpu: "3500m"
requests:
memory: "4000Mi"
cpu: "3500m"
Jenkins Branch-B console output:
Seen branch in repository origin/unknownMishariBranch
Seen branch in repository origin/viking/base
Seen 479 remote branches
Obtained Jenkinsfile.kubernetes from 38ace636171311ef35dc14245bf7a36f49f24e11
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] readTrusted
Obtained KubernetesPod.yaml from 38ace636171311ef35dc14245bf7a36f49f24e11
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Still waiting to schedule task
Waiting for next available executor
Agent mycompany-pod-qddx4-08xtm is provisioned from template Kubernetes Pod Template
---
apiVersion: "v1"
kind: "Pod"
metadata:
annotations:
buildUrl: "https://jenkins.mycompany.com/job/multibranch/job/branch-A/3/"
labels:
jenkins: "slave"
jenkins/mycompany-pod: "true"
name: "mycompany-pod-qddx4-08xtm"
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: "env"
operator: "In"
values:
- "jenkins-build"
weight: 1
containers:
- command:
- "cat"
image: "registry.gitlab.com/mycompany/sw-group/docker/ycp:docker-buildtest-1"
imagePullPolicy: "IfNotPresent"
name: "build"
resources:
limits:
memory: "4000Mi"
cpu: "3500m"
requests:
memory: "4000Mi"
cpu: "3500m"
Whenever a build gets triggered, it uses the same label name in the Jenkinsfile.
I am posting the relevant part of my Jenkinsfile below.
The change below (making the agent label unique per build) solved my problem.
Before:
pipeline {
agent {
kubernetes {
label "sn-optimus"
defaultContainer "jnlp"
yamlFile "KubernetesPod.yaml"
}
}
After:
pipeline {
agent {
kubernetes {
label "sn-optimus-${currentBuild.startTimeInMillis}"
defaultContainer "jnlp"
yamlFile "KubernetesPod.yaml"
}
}
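This works most likely because the Kubernetes plugin matches builds to pod templates by their label: when two podTemplate definitions share the same label, a build can be handed an agent created from the other branch's template. Appending currentBuild.startTimeInMillis makes the label unique per build, so each build provisions its agent from its own KubernetesPod.yaml.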
I am running my Kubernetes cluster on AWS EKS, which runs Kubernetes 1.10.
I am following this guide to deploy Elasticsearch in my cluster:
elasticsearch Kubernetes
The first time I deployed it, everything worked fine. Now, when I redeploy, it gives me the following error.
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2018-08-24T18:07:28,448][INFO ][o.e.n.Node ] [es-master-6987757898-5pzz9] stopping ...
[2018-08-24T18:07:28,534][INFO ][o.e.n.Node ] [es-master-6987757898-5pzz9] stopped
[2018-08-24T18:07:28,534][INFO ][o.e.n.Node ] [es-master-6987757898-5pzz9] closing ...
[2018-08-24T18:07:28,555][INFO ][o.e.n.Node ] [es-master-6987757898-5pzz9] closed
Here is my deployment file.
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: es-master
labels:
component: elasticsearch
role: master
spec:
replicas: 3
template:
metadata:
labels:
component: elasticsearch
role: master
spec:
initContainers:
- name: init-sysctl
image: busybox:1.27.2
command:
- sysctl
- -w
- vm.max_map_count=262144
securityContext:
privileged: true
containers:
- name: es-master
image: quay.io/pires/docker-elasticsearch-kubernetes:6.3.2
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: CLUSTER_NAME
value: myesdb
- name: NUMBER_OF_MASTERS
value: "2"
- name: NODE_MASTER
value: "true"
- name: NODE_INGEST
value: "false"
- name: NODE_DATA
value: "false"
- name: HTTP_ENABLE
value: "false"
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
- name: NETWORK_HOST
value: "0.0.0.0"
- name: PROCESSORS
valueFrom:
resourceFieldRef:
resource: limits.cpu
resources:
requests:
cpu: 0.25
limits:
cpu: 1
ports:
- containerPort: 9300
name: transport
livenessProbe:
tcpSocket:
port: transport
initialDelaySeconds: 20
periodSeconds: 10
volumeMounts:
- name: storage
mountPath: /data
volumes:
- emptyDir:
medium: ""
name: "storage"
I have seen a lot of posts talking about increasing the value but I am not sure how to do it. Any help would be appreciated.
Just want to add to this issue:
If you create the EKS cluster with eksctl, then you can append this to the NodeGroup creation YAML:
preBootstrapCommands:
- "sed -i -e 's/1024:4096/65536:65536/g' /etc/sysconfig/docker"
- "systemctl restart docker"
This will solve the problem for newly created clusters by fixing the Docker daemon config.
Update the default-ulimits parameter in the file /etc/docker/daemon.json:
{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 65536,
      "Hard": 65536
    }
  }
}
and restart the Docker daemon.
This is the only thing that worked for me on EKS when setting up an EFK stack. Add this to your nodegroup creation YAML file under nodeGroups:, then create your nodegroup and schedule your ES pods on it.
preBootstrapCommands:
- "sysctl -w vm.max_map_count=262144"
- "systemctl restart docker"
I was able to get a Kubernetes Job up and running on AKS (it uses a Docker Hub image to process a biological sample and then upload the output to blob storage - this is done with a bash command that I provide in the args section of my YAML file). However, I have 20 samples and would like to spin up 20 nodes so that I can process the samples in parallel (one sample per node). How do I send each sample to a different node? The "parallelism" option in a YAML file processes all of the 20 samples on each of the 20 nodes, which is not what I want.
Thank you for the help.
If you want each instance of the job to be on a different node, you can use a DaemonSet; that's exactly what it does - it provisions one Pod per worker node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd-elasticsearch
namespace: kube-system
labels:
k8s-app: fluentd-logging
spec:
selector:
matchLabels:
name: fluentd-elasticsearch
template:
metadata:
labels:
name: fluentd-elasticsearch
spec:
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
containers:
- name: fluentd-elasticsearch
image: k8s.gcr.io/fluentd-elasticsearch:1.20
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
Another way of doing that is using pod anti-affinity:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- zk
topologyKey: "kubernetes.io/hostname"
The requiredDuringSchedulingIgnoredDuringExecution field tells the Kubernetes Scheduler that it should never co-locate two Pods which have app label as zk in the domain defined by the topologyKey. The topologyKey kubernetes.io/hostname indicates that the domain is an individual node. Using different rules, labels, and selectors, you can extend this technique to spread your ensemble across physical, network, and power failure domains.
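Applied to the original question, here is a minimal sketch of a Job that combines parallelism with this anti-affinity rule so that no two of its Pods land on the same node. It assumes at least 20 schedulable nodes; the image and command are placeholders for the actual sample-processing step:
apiVersion: batch/v1
kind: Job
metadata:
  name: process-samples
spec:
  completions: 20
  parallelism: 20
  template:
    metadata:
      labels:
        app: process-samples
    spec:
      restartPolicy: Never
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - process-samples
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: worker
        image: myregistry/sample-processor:latest          # placeholder image
        command: ["/bin/sh", "-c", "process-sample.sh"]    # placeholder command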
How/where are the samples stored? You could load them (or a pointer to the actual sample) into a queue like Kafka and let the application retrieve each sample once and upload it to the blob after computation. You can then even ensure that if a computation fails, another pod will pick it up and restart the computation.
I tried to create a pod with particular environment variables for the uWSGI configuration, but I got this message:
failed to load "phptime.yml": json: cannot unmarshal number into Go value of type string
when I tried to run this command:
kubectl create -f phptime.yml
I found that the trouble occurs with environment variables that have names like this:
UWSGI_HTTP-MODIFIER1
or
UWSGI_PHP-SAPI-NAME
or
UWSGI_MASTER-AS-ROOT
but with environment variables that have the following names, everything is OK:
UWSGI_HTTP
or
UWSGI_INCLUDE
A lot of our containers take their configuration from environment variables, and I need to include all of my configuration variables. This is my replication controller config:
containers:
- name: phptime
image: ownregistry/phpweb:0.5
env:
- name: UWSGI_UID
value: go
- name: UWSGI_GID
value: go
- name: UWSGI_INCLUDE
value: /var/lib/go-agent/pipelines/test/test-dev0/.uwsgi_dev.ini
- name: UWSGI_PHP-SAPI-NAME
value: apache
- name: UWSGI_HTTP
value: :8086
- name: UWSGI_HTTP-MODIFIER1
value: 14
- name: UWSGI_PIDFILE
value: '/tmp/uwsgi.pid'
- name: UWSGI_MASTER-FIFO
value: '/tmp/fifo0'
- name: UWSGI_MASTER-AS-ROOT
value: 'true'
- name: UWSGI_MASTER
value: 'true'
ports:
- containerPort: 8086
resources:
limits:
cpu: 500m
memory: 200Mi
requests:
cpu: 500m
memory: 200Mi
volumeMounts:
- mountPath: /var/lib/go-agent/pipelines/test/test-dev0/
name: site
readOnly: true
volumes:
- hostPath:
path: /home/user/www/
name: site
Is this a Kubernetes issue or is it mine? How can I solve this? Thanks!
You must quote any value that you want to set as an environment variable if the YAML parser might interpret it as a non-string type.
For example, in influxdb-grafana-controller.yaml the values true and false are quoted because they could be interpreted as booleans. The same constraint applies to purely numerical values.
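In the Pod spec above, the unquoted 14 for UWSGI_HTTP-MODIFIER1 is the value that trips the parser, since env var values must be strings. Quoting the numeric value fixes it, for example:
- name: UWSGI_HTTP-MODIFIER1
  value: "14"   # quoted so the YAML parser keeps it a string rather than a number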