Kubernetes: parallelize multiple samples in a directory - Docker

I was able to get a Kubernetes Job up and running on AKS (it uses a Docker Hub image to process a biological sample and then upload the output to blob storage; this is done with a bash command that I provide in the args section of my YAML file). However, I have 20 samples and would like to spin up 20 nodes so that I can process the samples in parallel (one sample per node). How do I send each sample to a different node? The "parallelism" option in a YAML file processes all 20 samples on each of the 20 nodes, which is not what I want.
Thank you for the help.

If you want each instance of the job to run on a different node, you can use a DaemonSet; that's exactly what it does: it provisions one Pod per worker node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: k8s.gcr.io/fluentd-elasticsearch:1.20
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
Another way of doing that is to use pod anti-affinity:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: "app"
          operator: In
          values:
          - zk
      topologyKey: "kubernetes.io/hostname"
The requiredDuringSchedulingIgnoredDuringExecution field tells the Kubernetes scheduler that it should never co-locate two Pods that have the app label set to zk in the domain defined by the topologyKey. The topologyKey kubernetes.io/hostname indicates that the domain is an individual node. Using different rules, labels, and selectors, you can extend this technique to spread your ensemble across physical, network, and power failure domains.
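For the original use case (20 samples, one per node), one hedged sketch is an Indexed Job (batch/v1, Kubernetes 1.21+) combined with the anti-affinity above: each Pod receives a JOB_COMPLETION_INDEX environment variable it can use to pick its sample, and the anti-affinity keeps the Pods on separate nodes. The image name, sample naming scheme, and helper script are placeholders, not anything from the original question.

apiVersion: batch/v1
kind: Job
metadata:
  name: process-samples
spec:
  completions: 20
  parallelism: 20
  completionMode: Indexed        # gives each Pod a unique completion index (0..19)
  template:
    metadata:
      labels:
        app: process-samples
    spec:
      restartPolicy: Never
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - process-samples
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: worker
        image: mydockerhubuser/sample-processor:latest   # placeholder image
        command: ["/bin/sh", "-c"]
        # JOB_COMPLETION_INDEX is injected automatically for Indexed Jobs;
        # the sample file names and upload-to-blob.sh script are assumptions.
        args:
        - 'process-sample /data/sample-${JOB_COMPLETION_INDEX}.fastq && upload-to-blob.sh'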

How/where are the samples stored? You could load them (or a pointer to the actual sample) into a queue like Kafka and let the application retrieve each sample once and upload the output to blob storage after computation. You can then even ensure that if a computation fails, another pod will pick it up and restart the computation. A sketch of this work-queue pattern follows below.
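A minimal sketch of that work-queue pattern, using a Redis list instead of Kafka for brevity (the redis Service, the samples list, the image, and the process-sample / upload-to-blob.sh commands are all assumptions): each Pod of the Job keeps popping sample names until the queue is empty.

apiVersion: batch/v1
kind: Job
metadata:
  name: sample-queue-workers
spec:
  parallelism: 20
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: mydockerhubuser/sample-processor:latest   # placeholder image with redis-cli available
        command: ["/bin/sh", "-c"]
        args:
        - |
          # pop one sample name at a time from the assumed "samples" list on an assumed "redis" Service
          while true; do
            SAMPLE=$(redis-cli -h redis LPOP samples)
            [ -z "$SAMPLE" ] && break
            process-sample "$SAMPLE" && upload-to-blob.sh "$SAMPLE"
          done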

Related

How to run a job on each node of Kubernetes instead of a DaemonSet

There is a Kubernetes cluster with 100 nodes, and I have to clean up specific images manually. I know the kubelet garbage collector may help, but it doesn't apply in my case.
After browsing the internet, I found a solution - Docker in Docker - to solve my problem.
I just want to remove the image on each node once; is there any way to run a job on each node exactly once?
I checked Kubernetes labels and pod affinity, but still have no idea; could anybody help?
Also, I tried to use a DaemonSet to solve the problem, but it turns out that it only removes the image on some of the nodes instead of all of them, and I don't know what the problem might be...
Here is the DaemonSet example:
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: test-ds
  labels:
    k8s-app: test
spec:
  selector:
    matchLabels:
      k8s-app: test
  template:
    metadata:
      labels:
        k8s-app: test
    spec:
      containers:
      - name: test
        env:
        - name: DELETE_IMAGE_NAME
          value: "nginx"
        image: busybox
        command: ['sh', '-c', 'curl --unix-socket /var/run/docker.sock -X DELETE http://localhost/v1.39/images/$(DELETE_IMAGE_NAME)']
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: docker-sock-volume
        ports:
        - containerPort: 80
      volumes:
      - name: docker-sock-volume
        hostPath:
          # location on host
          path: /var/run/docker.sock
If you want to run your job on a single specific node, you can use a nodeSelector in the Pod spec:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: test
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
          nodeSelector:
            name: node3
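Note that nodeSelector matches node labels, so the node must carry a matching label first; for example (the node name and label here mirror the selector above and are assumptions about your cluster):

kubectl label nodes node3 name=node3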
A DaemonSet should ideally resolve your issue, as it creates a Pod on each available node in the cluster.
You can read more about affinity here: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express. The key enhancement is that the affinity/anti-affinity language is more expressive: it offers more matching rules besides exact matches created with a logical AND operation.
You can use affinity in the Job YAML, something like:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
Update
Now, if you have an issue with the DaemonSet, affinity with a plain Job is also of limited use, because a Job creates a single Pod that gets scheduled to a single node according to the affinity. Either create 100 Jobs with different affinity rules, or use a Deployment + anti-affinity to schedule the replicas on different nodes.
We will create one Deployment with pod anti-affinity and make sure multiple Pods of a single Deployment won't get scheduled on one node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    app: test
spec:
  replicas: 100
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: <Image>
        ports:
        - containerPort: 80
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - test
            topologyKey: "kubernetes.io/hostname"
Try using this Deployment template and replace the image with yours. You can reduce replicas to 10 instead of 100 first, to check whether it spreads the Pods.
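A quick way to check which node each replica landed on (the app=test label comes from the template above):

kubectl get pods -l app=test -o wide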
Read more at : https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#an-example-of-a-pod-that-uses-pod-affinity
Extra: you can also write and use a custom CRD, https://github.com/darkowlzz/daemonset-job, which behaves as a combination of a DaemonSet and a Job.

Why do I have so many duplicated processes?

I'm stress-testing my Kubernetes API and I found out that every request is creating a process inside the worker node.
Deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${KUBE_APP_NAME}-deployment
  namespace: ${KUBE_NAMESPACE}
  labels:
    app_version: ${KUBE_APP_VERSION}
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app_name: ${KUBE_APP_NAME}
  template:
    metadata:
      labels:
        app_name: ${KUBE_APP_NAME}
    spec:
      containers:
      - name: ${KUBE_APP_NAME}
        image: XXX:${KUBE_APP_VERSION}
        imagePullPolicy: Always
        env:
        - name: MONGODB_URI
          valueFrom:
            secretKeyRef:
              name: mongodb
              key: uri
        - name: JWT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: jwt
              key: password
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
      imagePullSecrets:
      - name: regcred
Apache Bench was used: ab -p payload.json -T application/json -c 10 -n 2000
Why is this?
It's hard to answer your question of whether it is normal that the requests are being kept open.
We don't know what exactly your payload is and how big it is. We also don't know whether the image you are using handles those requests correctly.
You should use verbosity level 2 (ab -v 2 <host>) and check what is taking so long.
You are using Apache Bench with the -c 10 -n 2000 options, which means there will be:
-c 10 concurrent connections at a time,
-n 2000 requests in total.
You could use -k to enable HTTP KeepAlive:
-k
Enable the HTTP KeepAlive feature, i.e., perform multiple requests within one HTTP session. Default is no KeepAlive.
It would be easier if you provided the output of running ab.
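For example (host and path are placeholders):

ab -k -v 2 -p payload.json -T application/json -c 10 -n 2000 http://<host>/<path>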
As for the Kubernetes part, we can read the definition of a Pod available at Viewing Pods and Nodes:
A Pod is a Kubernetes abstraction that represents a group of one or more application containers (such as Docker or rkt), and some shared resources for those containers
...
The containers in a Pod share an IP Address and port space, are always co-located and co-scheduled, and run in a shared context on the same Node.

persistentVolumeReclaimPolicy on directly mounted NFS volumes - kubernetes

I have a directly mounted NFS volume for MySQL data and need to implement a storage policy for retaining data across pod deletion and avoiding any corruption. Please recommend something useful.
I did not find a way to enable persistentVolumeReclaimPolicy: Retain on directly mounted volumes. I know it can be done when creating a PV/PVC, but is it possible from StatefulSet volumes? Some guidance is needed on understanding the YAML options for a particular object, and on where to find all the options (parameters) available for an object; currently I am googling each option and trying it, which is hard.
I could not mount a ConfigMap file (my.cnf) to a file in the pod; it removes the underlying files in the mount path. I am curious how this is handled generally: do we need a separate mount path for each config file? (See the subPath sketch after the YAML below.)
code block
apiVersion: v1
kind: Service
metadata:
  name: mymariadb
  labels:
    app: mymariadb
spec:
  ports:
  - port: 3306
    name: mysql
    targetPort: mysql
    nodePort: 30003
  type: NodePort
  selector:
    app: mymariadb
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mymariadb
  labels:
    app: mymariadb
spec:
  serviceName: "mymariadb"
  selector:
    matchLabels:
      app: mymariadb
  template:
    metadata:
      labels:
        app: mymariadb
    spec:
      containers:
      - name: mariadb
        image: mariadb:10.3.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: xxxx
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /data
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql  # /conf.d removing files
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
      volumes:
      - name: data
        nfs:
          server: 10.12.32.41
          path: /data/mymariadb
          spec:
            persistentVolumeReclaimPolicy: Retain  # not taking
      - name: conf
        configMap:
          name: mycustconf
          items:
          - key: my.cnf
            path: my.cnf
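On the ConfigMap point: mounting the ConfigMap volume directly at /etc/mysql shadows everything already in that directory. A commonly used alternative, sketched here under the assumption that MariaDB reads extra config from /etc/mysql/conf.d, is to mount only the single file with subPath:

        volumeMounts:
        - name: conf
          mountPath: /etc/mysql/conf.d/my.cnf  # only this one file is added; existing files stay visible
          subPath: my.cnf                      # path of the key inside the ConfigMap volume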
Firstly, I would not suggest an NFS mount on a Kubernetes platform, for two reasons. From a security perspective, another container can access the NFS mount on the worker nodes. Secondly, from a performance perspective, the connection between worker nodes and storage will be slower compared to other solutions. As you know, performance is critical for DB connections. I think you should evaluate that.
I suggest you use one of the cloud-native storage solutions. You can view them at the link below; Ceph and Gluster are popular products.
https://landscape.cncf.io/category=cloud-native-storage&format=card-mode&grouping=category
If you really want to continue with the NFS solution, you can check two points:
1) Did you check the access list on the storage appliance? You should see the worker nodes listed for the NFS mount.
2) Try to mount the NFS storage on the worker nodes manually; after that, you can try to apply the deployment to your Kubernetes cluster.
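On the reclaim policy question: persistentVolumeReclaimPolicy is a field of a PersistentVolume object, not of an inline nfs: volume, so it cannot be set inside the StatefulSet spec. A hedged sketch of a PV/PVC pair you could reference from the StatefulSet instead (names and sizes are examples):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mymariadb-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data when the claim is deleted
  nfs:
    server: 10.12.32.41
    path: /data/mymariadb
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mymariadb-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""          # avoid dynamic provisioning by a default StorageClass
  volumeName: mymariadb-pv      # bind explicitly to the PV above
  resources:
    requests:
      storage: 10Gi

In the StatefulSet, the data volume then becomes a persistentVolumeClaim with claimName: mymariadb-data instead of the inline nfs: block.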

Kubernetes: how to assign pods to all nodes with a specific label

I want to run a certain job on every single node in specific node groups.
Can Kubernetes do the same thing as Swarm's global mode?
To complete David Maze's answer:
A DaemonSet is used to create a Pod on each Node. More information here.
Example:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: k8s.gcr.io/fluentd-elasticsearch:1.20
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
If you do not want to schedule Pods on every node, you can use the Taints and Tolerations concept. Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. For more information, look through the link.
For example:
You can add a Taint to a Node:
kubectl taint nodes <Node_name> key=value:NoSchedule
After that, Pods will have no opportunity to schedule on that Node, even from a DaemonSet.
You can add a toleration to a Pod (or to a DaemonSet in your case) to allow it to be scheduled on the node with that taint:
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
You're looking for a DaemonSet.

Multidrive access for Kubernetes Nodes

So, forgive me. I just started learning Docker and Kubernetes a month ago.
I've got this to the point where I have my .yml file that takes my Minecraft server and runs it. I now want FTP access. Currently, there's a drive for the world folder and one for the config folder of the server (since I can't put the entire directory on a mounted drive (right?), and those two folders need to be saved every time the image is rebuilt).
So, I want to be able to access /config, preferably while the Minecraft node is still reading and writing. A few questions here:
How do I make the most minimal FTP image possible when writing the Dockerfile for it? I am unable to figure out a scenario. The best I have is a base image on python:alpine and using something like this.
Is it even possible to have a node access the drive while it's in use by another? Or do I have to make some custom script in the interface I'm making that turns off the Minecraft server and then starts up the FTP node?
Current yml:
apiVersion: v1
kind: Service
metadata:
  name: lapitos
  labels:
    type: lapitos
spec:
  type: LoadBalancer
  ports:
  - name: minecraft
    port: 25565
    protocol: TCP
    targetPort: 25565
  - name: minecraft-rcon
    port: 25575
    protocol: TCP
    targetPort: 25575
  selector:
    app: lapitos
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: lapitos
spec:
  serviceName: lapitos
  replicas: 1
  selector:
    matchLabels:
      app: lapitos
  template:
    metadata:
      labels:
        app: lapitos
    spec:
      containers:
      - name: lapitos
        image: gcr.io/mchostingnet-202204/lapitosbeta2
        resources:
          limits:
            cpu: "2"
          requests:
            cpu: "2"
        ports:
        - containerPort: 25565
          name: minecraft
        volumeMounts:
        - name: world
          mountPath: /world
        - name: config
          mountPath: /config
        - name: logs
          mountPath: /logs
  volumeClaimTemplates:
  - metadata:
      name: world
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 25Gi
  - metadata:
      name: config
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: logs
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
1.- Grab an FTP image that suits you from any registry and use it, instead of making your own. Whether making your own is still a requirement, I don't know.
Note: Compute Engine has port 21 blocked.
2.- Yes, you can. Volume access modes:
ReadWriteOnce – the volume can be mounted as read-write by a single node
ReadOnlyMany – the volume can be mounted read-only by many nodes
ReadWriteMany – the volume can be mounted as read-write by many nodes
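So access while the Minecraft pod is writing depends on the access mode and the storage backend: with ReadWriteOnce, a second pod can share the volume only if it lands on the same node, while ReadWriteMany (e.g. NFS/Filestore-backed storage) allows true multi-pod access. A minimal sketch of a second pod mounting the StatefulSet's config claim (the claim name follows the <template>-<statefulset>-<ordinal> convention, so config-lapitos-0 here; the FTP image is a placeholder):

apiVersion: v1
kind: Pod
metadata:
  name: config-ftp
spec:
  containers:
  - name: ftp
    image: some-registry/ftp-server:latest   # placeholder FTP image
    ports:
    - containerPort: 21
    volumeMounts:
    - name: config
      mountPath: /srv/ftp/config   # expose the server config over FTP
  volumes:
  - name: config
    persistentVolumeClaim:
      claimName: config-lapitos-0   # PVC created by the volumeClaimTemplate above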
