I understand that files / folders can be copied into a container using the command:
kubectl cp /tmp/foo_dir <some-pod>:/tmp/bar_dir
However, I am looking to do this in a YAML file.
How would I go about doing this? (Assuming that I am using a deployment for the container)
You are heading in the wrong direction. Kubernetes handles this in several ways.
First, consider a ConfigMap:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap
With a ConfigMap you can easily define the configuration files for the application running in the container.
If you know the files or folders exist on a worker node, you can use hostPath to mount them into the container, pinning the pod to that node with nodeName: node01 in the Kubernetes YAML.
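As a minimal sketch of the ConfigMap approach, the manifest below defines one configuration file and mounts it into a Deployment's container. All names here (app-config, my-app, the nginx image, the /etc/app mount path and the file contents) are assumptions for illustration, not values from the question.

```yaml
# Hypothetical names throughout: app-config, my-app, nginx image, /etc/app.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # Each key becomes a file name inside the mounted directory.
  app.properties: |
    greeting=hello
    log.level=info
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25
          volumeMounts:
            - name: config
              mountPath: /etc/app   # file appears as /etc/app/app.properties
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: app-config
```

If the files already exist on your workstation, kubectl create configmap app-config --from-file=/tmp/foo_dir builds a ConfigMap with one key per file in that directory, which you can then reference from the Deployment in the same way.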
https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
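A hedged sketch of the hostPath variant follows. It assumes /tmp/foo_dir already exists on a node named node01; the pod name and the busybox image are placeholders chosen for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo          # hypothetical name
spec:
  nodeName: node01             # schedule onto the node that holds the files
  containers:
    - name: app
      image: busybox:1.36      # placeholder image
      command: ["sh", "-c", "ls -l /tmp/bar_dir && sleep 3600"]
      volumeMounts:
        - name: host-files
          mountPath: /tmp/bar_dir
  volumes:
    - name: host-files
      hostPath:
        path: /tmp/foo_dir     # must already exist on node01
        type: Directory        # fail the mount if the path is not a directory
```

The same spec.nodeName and volumes entries can be placed under spec.template.spec of a Deployment.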
If the files or folders are generated temporarily, you can use emptyDir:
https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
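For the emptyDir case, one possible arrangement (an assumption, not the only option) is an init container that generates the files into a shared emptyDir before the main container starts. The names, image and generated file below are illustrative only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo          # hypothetical name
spec:
  initContainers:
    - name: generate
      image: busybox:1.36      # placeholder image
      # Writes a file into the shared volume before the main container starts.
      command: ["sh", "-c", "echo generated > /work/data.txt"]
      volumeMounts:
        - name: scratch
          mountPath: /work
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "cat /tmp/bar_dir/data.txt && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /tmp/bar_dir
  volumes:
    - name: scratch
      emptyDir: {}             # lives only as long as the pod does
```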
You cannot. Mapping local files from your workstation is not a feature of Kubernetes.
Related
I'm trying to run Spark in a Kubernetes cluster as described here: https://spark.apache.org/docs/latest/running-on-kubernetes.html
It works fine for some basic scripts like the provided examples.
I noticed that the config folder, despite being added to the image built by "docker-image-tool.sh", is overwritten by a mount of a ConfigMap volume.
I have two questions:
What sources does spark use to generate that config map or how do you edit it? As far as I understand the volume gets deleted when the last pod is deleted and regenerated when a new pod is created
How are you supposed to handle the spark-env.sh script which can't be added to a simple config map?
One initially non-obvious thing about Kubernetes is that changing a ConfigMap (a set of configuration values) is not detected as a change to Deployments (how a Pod, or set of Pods, should be deployed onto the cluster) or Pods that reference that configuration. That expectation can result in unintentionally stale configuration persisting until a change to the Pod spec. This could include freshly created Pods due to an autoscaling event, or even restarts after a crash, resulting in misconfiguration and unexpected behaviour across the cluster.
Note: This doesn’t impact ConfigMaps mounted as volumes, which are periodically synced by the kubelet running on each node.
To update a ConfigMap, run:
$ kubectl replace -f file.yaml
You must create a ConfigMap before you can use it, so I recommend modifying the ConfigMap first and then redeploying the pod.
Note that a container using a ConfigMap as a subPath volume mount will not receive ConfigMap updates.
The configMap resource provides a way to inject configuration data into Pods. The data stored in a ConfigMap object can be referenced in a volume of type configMap and then consumed by containerized applications running in a Pod.
When referencing a configMap object, you can simply provide its name in the volume to reference it. You can also customize the path to use for a specific entry in the ConfigMap.
When a ConfigMap already being consumed in a volume is updated, projected keys are eventually updated as well. Kubelet is checking whether the mounted ConfigMap is fresh on every periodic sync. However, it is using its local ttl-based cache for getting the current value of the ConfigMap. As a result, the total delay from the moment when the ConfigMap is updated to the moment when new keys are projected to the pod can be as long as kubelet sync period (1 minute by default) + ttl of ConfigMaps cache (1 minute by default) in kubelet.
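The two referencing styles described above can be sketched as follows. This assumes a ConfigMap named app-config with a key app.properties already exists in the namespace; those names, the pod name and the target path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: configmap-items-demo   # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36      # placeholder image
      command: ["sh", "-c", "cat /etc/config/conf/settings.properties && sleep 3600"]
      volumeMounts:
        - name: config
          mountPath: /etc/config
  volumes:
    - name: config
      configMap:
        name: app-config       # reference the ConfigMap by name
        items:                 # optional: project only selected keys
          - key: app.properties
            path: conf/settings.properties   # custom path under mountPath
```

Without the items list, every key in the ConfigMap is projected as a file named after the key. Because this is a whole-volume mount rather than a subPath mount, the projected file is refreshed after the ConfigMap changes, subject to the delay described above.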
What I strongly recommend, though, is using the Kubernetes Operator for Spark. It supports mounting volumes and ConfigMaps in Spark pods to customize them, a feature that is not available in Apache Spark as of version 2.4.
A SparkApplication can specify a Kubernetes ConfigMap storing Spark configuration files such as spark-env.sh or spark-defaults.conf using the optional field .spec.sparkConfigMap, whose value is the name of the ConfigMap. The ConfigMap is assumed to be in the same namespace as the SparkApplication.
Spark on K8S provides configuration options that allow for mounting certain volume types into the driver and executor pods. Volumes are "delivered" from the Kubernetes side but they can be delivered from local storage in Spark. If no volume is set as local storage, Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager, the pods will be created with an emptyDir volume mounted for each directory listed in spark.local.dir or the environment variable SPARK_LOCAL_DIRS. If no directories are explicitly specified, a default directory is created and configured appropriately.
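A rough sketch of a SparkApplication that uses .spec.sparkConfigMap is shown below. Treat it as an illustration under assumptions: the apiVersion depends on the operator release you install, and the ConfigMap name (spark-conf), image, jar path, Spark version and service account are all placeholders.

```yaml
# Assumes a ConfigMap named spark-conf (holding spark-env.sh and/or
# spark-defaults.conf) already exists in the same namespace.
apiVersion: sparkoperator.k8s.io/v1beta2   # check against your operator version
kind: SparkApplication
metadata:
  name: spark-pi               # hypothetical name
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: <your-spark-image>    # placeholder, replace with your own image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar  # placeholder path
  sparkVersion: "2.4.5"        # placeholder version
  sparkConfigMap: spark-conf   # name of the ConfigMap with the Spark config files
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark      # placeholder service account
  executor:
    instances: 1
    cores: 1
    memory: 512m
```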
Useful blog: spark-kubernetes-operator.
I want to create some Docker images that generate text files. However, since the images are pushed to Container Registry in GCP, I am not sure where the files will be generated when I use kubectl run myImage. If I specify a path in the program, like '/usr/bin/myfiles', would they be downloaded to the VM instance where I am typing "kubectl run myImage"? I think this is probably not the case. What is the solution?
Ideally, I would like all the files to be in one place.
Thank you
Container Registry and Kubernetes are mostly irrelevant to the issue of where a container will persist files it creates.
Some process running within a container that generates files will persist the files to the container instance's file system. Exceptions to this are stdout and stderr which are both available without further ado.
When you run container images, you can mount volumes into the container instance and this provides possible solutions to your needs. When running Docker Engine, it's common to mount the host's file system into the container to share files between the container and the host: docker run ... --volume=[host]:[container] yourimage ....
On Kubernetes, there are many types of volumes. A seemingly obvious solution is to use gcePersistentDisk, but this has a limitation in that these disks may only be mounted for write on one pod at a time. A more powerful solution may be to use an NFS-based solution such as nfs or gluster. These should provide a means for you to consolidate files outside of the container instances.
A good solution, though I'm unsure whether it is available, would be to write your files as Google Cloud Storage objects.
A tenet of containers is that they should operate without making assumptions about their environment. Your containers should not make assumptions about running on Kubernetes and should not make assumptions about non-default volumes. By this I mean that your containers write files to the container's file system. When you run the container, you apply the configuration that, for example, provides an NFS volume mount or a GCS bucket mount that actually persists the files beyond the container.
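As a sketch of the NFS option, the pod below mounts an NFS export at the directory the program writes to, so files from every pod end up in one place. The server address, export path, mount path and image are assumptions made for the example; an NFS server must already be reachable from the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: file-writer            # hypothetical name
spec:
  containers:
    - name: writer
      image: busybox:1.36      # placeholder for your own image
      # Writes one file per pod into the shared directory.
      command: ["sh", "-c", "date > /data/out/$HOSTNAME.txt && sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /data/out # the path your program writes to
  volumes:
    - name: shared
      nfs:
        server: nfs.example.internal   # hypothetical NFS server
        path: /exports/myfiles         # hypothetical export
```

The container itself only knows that it writes to /data/out; the decision to back that path with NFS lives entirely in the pod spec.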
HTH!
I am trying to set up AKS, where I have used an Azure disk to mount the source code of the application. When I run the kubectl describe pods command, the disk shows as mounted, but I don't know how to copy the code onto it.
I was advised to use the kubectl cp command, but my pod name changes every time I deploy, so what should I do?
You need to copy the files to the disk directly (not to the pod). You can use your pod or a worker node to do that. You can use kubectl cp to copy files to the pod and then move them to the mounted disk like you normally would. Alternatively, you can copy the files over SSH to the worker node and put them on the mounted disk.
I have mounted a hostpath volume in a Kubernetes container. Now I want to mount a configmap file onto the hostpath volume.
Is that possible?
Not really. A larger question would be why you'd want to do that.
The standard way to add configurations in Kubernetes is using ConfigMaps. They are stored in etcd and the size limit is 1MB. When your pod comes up the configuration is mounted on a pod mount point that you can specify in the pod spec.
You may want the opposite which is to use a hostPath that has some configuration and that's possible. Say, that you want to have some config that is larger than 1MB (which is not usual) and have your pod use it. The gotcha here is that you need to put this hostPath and the files in all your cluster nodes where your pod may start.
No. The volume mounts are all about pushing data into pods or persisting data that originates in a pod, and aren't usually a bidirectional data transfer mechanism.
If you want to see what's in a ConfigMap, you can always kubectl get configmap NAME -o yaml to dump it out.
(With some exceptions around things like the Docker socket, hostPath volumes aren't that common in non-Minikube Kubernetes installations, especially once you get into multi-host setups, and I'd investigate other paths to do whatever you're using it for now.)
I have a single file which I want to mount in the container. The file is in the conf folder, which also contains other files, but the only file I want to mount is helper.conf. Doing this in Docker:
docker run -it -v /path/to/dir/:/path/inside/container --name mycontainer dockerimage
Doing this throws below error:
Are you trying to mount a directory onto a file (or vice-versa)? Check
if the specified host path exists and is the expected type
To resolve this, I created another folder with name config inside conf and used below line:
docker run -it -v /path/to/dir/config:/path/inside/container --name mycontainer dockerimage
This works perfectly fine. The same happens with Kubernetes. Is it not possible to mount just a single file from a directory where other files are also present? Am I using the wrong keywords for this?
How can I resolve in Kubernetes?
The answer has been provided by @Ryan Dawson.
The best way to mount a single file in container (in Kubernetes) would be to use ConfigMap:
ConfigMaps allow you to decouple configuration artifacts from image
content to keep containerized applications portable.
A ConfigMap can be used in this case to create a resource that keeps the configuration separate from the container image. As the configuration is a set of key-value pairs, it can be exposed inside the container either as environment variables or as a volume. After creating the ConfigMap, you create a pod that specifies which ConfigMap it consumes to get the necessary values.
In your situation, as Joel B and Tommy Nguyen specified in this Stack Overflow question:
You could use subPath to mount a single file into an existing directory.
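A minimal sketch of such a subPath mount, assuming helper.conf has first been loaded into a ConfigMap (for example with kubectl create configmap helper-conf --from-file=/path/to/dir/helper.conf). The ConfigMap name helper-conf is hypothetical; the container name, image and paths are taken from the docker run command in the question.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mycontainer
spec:
  containers:
    - name: mycontainer
      image: dockerimage
      volumeMounts:
        - name: helper-conf
          # Mount only this one file; other files already present in
          # /path/inside/container stay visible.
          mountPath: /path/inside/container/helper.conf
          subPath: helper.conf   # key (file name) inside the ConfigMap
  volumes:
    - name: helper-conf
      configMap:
        name: helper-conf        # hypothetical ConfigMap holding helper.conf
```

Keep in mind that a file mounted through subPath does not pick up later changes to the ConfigMap; the pod has to be recreated to see them.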