Add a Persistent Volume Claim to a Kubernetes Dask Cluster - dask

I am running a Dask cluster and a Jupyter notebook server on cloud resources, using Kubernetes and Helm.
I am using a YAML file for the Dask cluster and Jupyter, initially taken from https://docs.dask.org/en/latest/setup/kubernetes-helm.html:
apiVersion: v1
kind: Pod
worker:
  replicas: 2  # number of workers
  resources:
    limits:
      cpu: 2
      memory: 2G
    requests:
      cpu: 2
      memory: 2G
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
# We want to keep the same packages on the workers and jupyter environments
jupyter:
  enabled: true
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
  resources:
    limits:
      cpu: 1
      memory: 2G
    requests:
      cpu: 1
      memory: 2G
and I am using another YAML file to create the storage locally:
# CREATE A PERSISTENT VOLUME CLAIM // attached to our pod config
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dask-cluster-persistent-volume-claim
spec:
  accessModes:
    - ReadWriteOnce  # can be used by a single node; ReadOnlyMany: for multiple nodes; ReadWriteMany: read/written to/by many nodes
  resources:
    requests:
      storage: 2Gi  # storage capacity
I would like to add a persistent volume claim to the first YAML file, but I couldn't figure out where to add the volumes and volumeMounts.
If you have an idea, please share it. Thank you.

I started by creating a PVC with the following YAML file:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pdask-cluster-persistent-volume-claim
spec:
  accessModes:
    - ReadWriteOnce  # can be used by a single node; ReadOnlyMany: for multiple nodes; ReadWriteMany: read/written to/by many nodes
  resources:  # https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
    requests:
      storage: 2Gi
and applied it in bash:
kubectl apply -f Dask-Persistent-Volume-Claim.yaml
#persistentvolumeclaim/pdask-cluster-persistent-volume-claim created
I checked the creation of the persistent volume:
kubectl get pv
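The binding of the claim itself can also be checked directly; a small sketch using the claim name from the file above (the STATUS column should read Bound once a volume has been provisioned for it):
kubectl get pvc pdask-cluster-persistent-volume-claim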
I made major changes to the Dask cluster YAML: I added the volumes and volumeMounts, so that I read/write to a directory (/save_data) on the persistent volume created previously, and I set serviceType to LoadBalancer with a servicePort:
apiVersion: v1
kind: Pod

scheduler:
  name: scheduler
  enabled: true
  image:
    repository: "daskdev/dask"
    tag: 2021.8.1
    pullPolicy: IfNotPresent
  replicas: 1  # (should always be 1)
  serviceType: "LoadBalancer"  # Scheduler service type. Set to `LoadBalancer` to expose outside of your cluster.
  # serviceType: "NodePort"
  # serviceType: "ClusterIP"
  # loadBalancerIP: null  # Some cloud providers allow you to specify the loadBalancerIP when using the `LoadBalancer` service type. If your cloud does not support it this option will be ignored.
  servicePort: 8786  # Scheduler service internal port.

# DASK WORKERS
worker:
  name: worker  # Dask worker name.
  image:
    repository: "daskdev/dask"  # Container image repository.
    tag: 2021.8.1  # Container image tag.
    pullPolicy: IfNotPresent  # Container image pull policy.
    dask_worker: "dask-worker"  # Dask worker command. E.g. `dask-cuda-worker` for a GPU worker.
  replicas: 2
  resources:
    limits:
      cpu: 2
      memory: 2G
    requests:
      cpu: 2
      memory: 2G
  mounts:  # Worker Pod volumes and volume mounts. mounts.volumes follows the Kubernetes API v1 Volumes spec; mounts.volumeMounts follows the Kubernetes API v1 VolumeMount spec.
    volumes:
      - name: dask-storage
        persistentVolumeClaim:
          claimName: pdask-cluster-persistent-volume-claim  # must match the metadata.name of the PVC created above
    volumeMounts:
      - name: dask-storage
        mountPath: /save_data  # folder for storage
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade

# We want to keep the same packages on the worker and jupyter environments
jupyter:
  name: jupyter  # Jupyter name.
  enabled: true  # Enable/disable the bundled Jupyter notebook.
  # rbac: true  # Create RBAC service account and role to allow Jupyter pod to scale worker pods and access logs.
  image:
    repository: "daskdev/dask-notebook"  # Container image repository.
    tag: 2021.8.1  # Container image tag.
    pullPolicy: IfNotPresent  # Container image pull policy.
  replicas: 1  # Number of notebook servers.
  serviceType: "LoadBalancer"  # Jupyter service type. Set to `LoadBalancer` to expose outside of your cluster.
  # serviceType: "NodePort"
  # serviceType: "ClusterIP"
  servicePort: 80  # Jupyter service internal port.
  # This hash corresponds to the password 'dask'
  # password: 'sha1:aae8550c0a44:9507d45e087d5ee481a5ce9f4f16f37a0867318c'  # Password hash.
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
  resources:
    limits:
      cpu: 1
      memory: 2G
    requests:
      cpu: 1
      memory: 2G
  mounts:  # Jupyter Pod volumes and volume mounts. mounts.volumes follows the Kubernetes API v1 Volumes spec; mounts.volumeMounts follows the Kubernetes API v1 VolumeMount spec.
    volumes:
      - name: dask-storage
        persistentVolumeClaim:
          claimName: pdask-cluster-persistent-volume-claim  # must match the metadata.name of the PVC created above
    volumeMounts:
      - name: dask-storage
        mountPath: /save_data  # folder for storage
Then, I installed my Dask configuration using Helm:
helm install my-config dask/dask -f values.yaml
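If the release already exists (for example after editing values.yaml), the same file can be re-applied; a sketch reusing the release name from the command above:
helm upgrade my-config dask/dask -f values.yaml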
Finally, I accessed my Jupyter pod interactively:
kubectl exec -ti [pod-name] -- /bin/bash
to verify the existence of the /save_data folder.
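Once inside, a hedged sketch of what can be checked (the mount path comes from the values file above):
df -h /save_data              # the claim should show up as a mounted filesystem
touch /save_data/test.txt     # confirm the volume is writable from the pod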

Related

Accessing CIFS files from pods

We have a docker image that is processing some files on a samba share.
For this we created a cifs share which is mounted to /mnt/dfs and files can be accessed in the container with:
docker run -v /mnt/dfs/project1:/workspace image
Now what I was asked to do is get the container into k8s, and to access a cifs share from a pod, a cifs volume driver using FlexVolume can be used. That's where some questions pop up.
I installed this repo as a DaemonSet
https://k8scifsvol.juliohm.com.br/
and it's up and running.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cifs-volumedriver-installer
spec:
  selector:
    matchLabels:
      app: cifs-volumedriver-installer
  template:
    metadata:
      name: cifs-volumedriver-installer
      labels:
        app: cifs-volumedriver-installer
    spec:
      containers:
        - image: juliohm/kubernetes-cifs-volumedriver-installer:2.4
          name: flex-deploy
          imagePullPolicy: Always
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /flexmnt
              name: flexvolume-mount
      volumes:
        - name: flexvolume-mount
          hostPath:
            path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
The next thing to do is add a PersistentVolume, but that needs a capacity, 1Gi in the example. Does this mean that we lose all data on the smb server? Why should there be a capacity for an already existing server?
Also, how can we access a subdirectory of the mount /mnt/dfs from within the pod? So how do we access data from /mnt/dfs/project1 in the pod?
Do we even need a PV? Could the pod just read from the host's mounted share?
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mycifspv
spec:
  capacity:
    storage: 1Gi
  flexVolume:
    driver: juliohm/cifs
    options:
      opts: sec=ntlm,uid=1000
      server: my-cifs-host
      share: /MySharedDirectory
    secretRef:
      name: my-secret
  accessModes:
    - ReadWriteMany
No, that field has no effect on the FlexVol plugin you linked. It doesn't even bother parsing out the size you pass in :)
Managed to get it working with the fstab/cifs plugin.
Copy its cifs script to /usr/libexec/kubernetes/kubelet-plugins/volume/exec and give it execute permissions. Also restart kubelet on all nodes.
https://github.com/fstab/cifs
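A hedged sketch of that manual installation, assuming the default FlexVolume plugin directory and the usual vendor~driver naming convention; run it on every node:
git clone https://github.com/fstab/cifs && cd cifs
sudo mkdir -p /usr/libexec/kubernetes/kubelet-plugins/volume/exec/fstab~cifs
sudo cp cifs /usr/libexec/kubernetes/kubelet-plugins/volume/exec/fstab~cifs/cifs
sudo chmod +x /usr/libexec/kubernetes/kubelet-plugins/volume/exec/fstab~cifs/cifs
sudo systemctl restart kubelet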
Then I added:
containers:
  - name: pablo
    image: "10.203.32.80:5000/pablo"
    volumeMounts:
      - name: dfs
        mountPath: /data
volumes:
  - name: dfs
    flexVolume:
      driver: "fstab/cifs"
      fsType: "cifs"
      secretRef:
        name: "cifs-secret"
      options:
        networkPath: "//dfs/dir"
        mountOptions: "dir_mode=0755,file_mode=0644,noperm"
Now there is the /data mount inside the container, pointing to //dfs/dir.
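The volume above references a cifs-secret that is not shown; a hedged sketch of what it might look like, following the fstab/cifs README (the type matches the driver name, and the username and password values are base64-encoded placeholders):
apiVersion: v1
kind: Secret
metadata:
  name: cifs-secret
type: fstab/cifs
data:
  username: ZXhhbXBsZQ==   # base64 of "example" (placeholder)
  password: c2VjcmV0       # base64 of "secret" (placeholder)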

AKS - How to mount volume with file for pod/image

I am kind of new to AKS deployment with volume mounts. I want to create a pod in AKS with an image; that image needs a volume mount with a config.yaml file (which I already have and which needs to be passed to the image for it to run successfully).
Below is the docker command that is working on my local machine.
docker run -v <Absolute_path_of_config.yaml>:/config.yaml image:tag
I want to achieve the same thing in AKS. When I tried to deploy the same using an Azure File mount (with a PersistentVolumeClaim), the volume gets attached. The question now is how to pass the config.yaml file to that pod. I tried uploading the config.yaml file to the Azure File Share volume that is attached in the pod deployment, without any success.
Below is the pod deployment file that I used:
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: mypod
      image: image:tag
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 250m
          memory: 1Gi
      volumeMounts:
        - mountPath: "/config.yaml"
          name: volume
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: my-azurefile-storage
I need help regarding how I can use that local config.yaml file for the AKS deployment so the image can run properly.
Thanks in advance.
Create a Kubernetes secret from the config.yaml file:
kubectl create secret generic config-yaml --from-file=config.yaml
Mount it as a volume in the pod:
apiVersion: v1
kind: Pod
metadata:
  name: config
spec:
  containers:
    - name: config
      image: alpine
      command:
        - cat
      resources: {}
      tty: true
      volumeMounts:
        - name: config
          mountPath: /config.yaml
          subPath: config.yaml
  volumes:
    - name: config
      secret:
        secretName: config-yaml
Exec into the pod and view the file:
kubectl exec -it config sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # ls
bin dev home media opt root sbin sys usr
config.yaml etc lib mnt proc run srv tmp var
/ # cat config.yaml
---
apiUrl: "https://my.api.com/api/v1"
username: admin
password: password

Kubernetes DaemonSet Permission Denied on mounted Volume - Docker in Docker dind

I tried running a simple DaemonSet on a kube cluster. The idea was that other kube pods would connect to that container's docker daemon (dockerd) and execute commands on it. (The other pods are Jenkins slaves and would just have the env var DOCKER_HOST point to 'tcp://localhost:2375'.) In short, the config looks like this:
dind.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dind
spec:
  selector:
    matchLabels:
      name: dind
  template:
    metadata:
      labels:
        name: dind
    spec:
      # tolerations:
      #   - key: node-role.kubernetes.io/master
      #     effect: NoSchedule
      containers:
        - name: dind
          image: docker:18.05-dind
          resources:
            limits:
              memory: 2000Mi
            requests:
              cpu: 100m
              memory: 500Mi
          volumeMounts:
            - name: dind-storage
              mountPath: /var/lib/docker
      volumes:
        - name: dind-storage
          emptyDir: {}
Error message when running it:
mount: mounting none on /sys/kernel/security failed: Permission denied
Could not mount /sys/kernel/security.
AppArmor detection and --privileged mode might break.
mount: mounting none on /tmp failed: Permission denied
I took the idea from a Medium post that didn't describe it fully: https://medium.com/hootsuite-engineering/building-docker-images-inside-kubernetes-42c6af855f25, which describes Docker outside of Docker, Docker in Docker, and Kaniko.
I found the solution:
apiVersion: v1
kind: Pod
metadata:
  name: dind
spec:
  containers:
    - name: jenkins-slave
      image: gcr.io/<my-project>/myimg  # it has docker installed on it
      command: ['docker', 'run', '-p', '80:80', 'httpd:latest']
      resources:
        requests:
          cpu: 10m
          memory: 256Mi
      env:
        - name: DOCKER_HOST
          value: tcp://localhost:2375
    - name: dind-daemon
      image: docker:18.05-dind
      resources:
        requests:
          cpu: 20m
          memory: 512Mi
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-graph-storage
          mountPath: /var/lib/docker
  volumes:
    - name: docker-graph-storage
      emptyDir: {}
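A quick hedged check of the setup (container and pod names taken from the manifest above; a process exec'd into the jenkins-slave container inherits its DOCKER_HOST environment variable):
kubectl exec -it dind -c jenkins-slave -- docker ps   # should reach the dockerd running in the dind-daemon sidecar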

Docker container does/doesn't work inside Kubernetes

I am a bit confused here. It works as a normal docker container, but when it goes inside a pod it doesn't. So here is how I do it.
Dockerfile on my local machine to create the image and publish it to the docker registry:
FROM alpine:3.7
COPY . /var/www/html
CMD tail -f /dev/null
Now if I just pull the image (after deleting the local one) and run it as a container, it works and I can see my files inside /var/www/html.
Now I want to use that inside my Kubernetes cluster.
Setup: minikube --vm-driver=none
I am running kube inside minikube with the none driver option, so it is a single-node cluster.
EDIT
I can see my data inside /var/www/html if I remove the volume mounts and claim from the deployment file.
Deployment file
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    io.kompose.service: app
  name: app
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: app
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
      containers:
        - image: kingshukdeb/mycode
          name: pd-mycode
          resources: {}
          volumeMounts:
            - mountPath: /var/www/html
              name: claim-app-storage
      restartPolicy: Always
      volumes:
        - name: claim-app-storage
          persistentVolumeClaim:
            claimName: claim-app-nginx
status: {}
PVC file
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  labels:
    io.kompose.service: app-nginx1
  name: claim-app-nginx
spec:
  storageClassName: testmanual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
PV file
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-nginx1
  labels:
    type: local
spec:
  storageClassName: testmanual
  capacity:
    storage: 100Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/volumes/app"
Now when I run these files, it creates the pod, PV, and PVC, and the PVC is bound to the PV. But if I go inside my container I don't see my files. The hostPath is /data/volumes/app. Any ideas will be appreciated.
When a PVC is bound to a pod, the volume is mounted at the location described in the pod/deployment YAML file. In your case: mountPath: /var/www/html. That's why the files "baked into" the container image are not accessible (simple explanation why here).
You can confirm this by exec'ing into the container with kubectl exec YOUR_POD -i -t -- /bin/sh and running mount | grep "/var/www/html".
Solution
You may solve this in many ways. It's best practice to keep your static data separate (i.e. in a PV) and keep the container image as small and fast as possible.
If you transfer the files you want to mount in the PV to the host's path /data/volumes/app, they will be accessible in your pod; you can then create a new image omitting the COPY operation. This way, even if the pod crashes, changes to files made by your app will be saved.
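A hedged sketch of that transfer, assuming minikube --vm-driver=none (so the hostPath directory lives on the local machine) and a hypothetical local source directory ./html:
sudo mkdir -p /data/volumes/app
sudo cp -r ./html/. /data/volumes/app/       # ./html is a hypothetical local copy of the site files
sudo chown -R 1000:1000 /data/volumes/app    # match runAsUser/runAsGroup from the deployment above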
If the PV will be claimed by more than one pod, you need to change accessModes as described here:
The access modes are:
ReadWriteOnce – the volume can be mounted as read-write by a single node
ReadOnlyMany – the volume can be mounted read-only by many nodes
ReadWriteMany – the volume can be mounted as read-write by many nodes
In-depth explanation of Volumes in Kubernetes docs: https://kubernetes.io/docs/concepts/storage/persistent-volumes/

Kubernetes access Persistent volume mount externally

I have set up a Kubernetes cluster and mounted a volume as a gcePersistentDisk in Google Cloud. It claims and mounts successfully in pods.
But I want to access this volume externally so that I can write to it through git/ssh or manually. As the disk is already used and mounted, I cannot access it.
How can I write files to it externally?
gcePersistentDisk is a network-based disk, and provisioned volumes can only be used by GCE instances in the same project and zone.
The fact is that this kind of resource supports ReadWriteOnce and ReadOnlyMany. You can use a GCE persistent disk to share data as read-only between multiple pods in the same zone.
Back to your question: you can write to this volume only from one pod. No other pods can use it as write storage, neither external ones nor others in the same project.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: php
  labels:
    app: php
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php
  template:
    metadata:
      labels:
        app: php
    spec:
      containers:
        - image: php:7.1-apache
          imagePullPolicy: Always
          name: php
          resources:
            requests:
              cpu: 200m
          ports:
            - containerPort: 80
              name: php
          volumeMounts:
            - name: php-persistent-storage
              mountPath: /var/www
      volumes:
        - name: php-persistent-storage
          gcePersistentDisk:
            pdName: php-phantomjs-disk
            fsType: ext4
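For the read-only sharing mentioned above, only the volume stanza changes; a hedged sketch reusing the disk name from the deployment:
volumes:
  - name: php-persistent-storage
    gcePersistentDisk:
      pdName: php-phantomjs-disk
      fsType: ext4
      readOnly: true   # lets the same persistent disk be mounted read-only by many pods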
