Using NFS with Dask workers - dask

I have been experimenting with using an NFS shared drive with my user and Dask workers. Is this something that can work? I noticed that Dask created two files in my home directory, global.lock and purge.lock, and did not clean them up when workers were finished. What do these files do?

It is entirely normal to use the NFS to host a user's software environment. The files you're seeing are used by a different system altogether.
When Dask workers run out of space they spill excess data to disk. An NFS can work here, but it's much nicer to use local disk if available. This is usually configurable with the --local-directory dask-worker keyword, or the temporary-directory configuration value.
You can read more about storage issues with NFS and more guidelines here: https://docs.dask.org/en/latest/setup/hpc.html

Yes, Dask can be used with an NFS mound, and indeed you can share configuration/scheduler state between the various processes. Each worker process will use its own temporary storage area. The lock files are safe to ignore, and their existence will depend on exactly the workload you are doing.

Related

What purpose to ephemeral volumes serve in Kubernetes?

I'm starting to learn Kubernetes recently and I've noticed that among the various tutorials online there's almost no mention of Volumes. Tutorials cover Pods, ReplicaSets, Deployments, and Services - but they usually end there with some example microservice app built using a combination of those four. When it comes to databases they simply deploy a pod with the "mongo" image, give it a name and a service so that other pods can see it, and leave it at that. There's no discussion of how the data is written to disk.
Because of this I'm left to assume that with no additional configuration, containers are allowed to write files to disk. I don't believe this implies files are persistent across container restarts, but if I wrote a simple NodeJS application like so:
const fs = require("fs");
fs.writeFileSync("test.txt", "blah");
const value = fs.readFileSync("test.txt", "utf8");
console.log(value);
I suspect this would properly output "blah" and not crash due to an inability to write to disk (note that I haven't tested this because, as I'm still learning Kubernetes, I haven't gotten to the point where I know how to put my own custom images in my cluster yet -- I've only loaded images already on Docker Hub so far)
When reading up on Kubernetes Volumes, however, I came upon the Ephemeral Volume -- a volume that:
get[s] created and deleted along with the Pod
The existence of ephemeral volumes leads me to one of two conclusions:
Containers can't write to disk without being granted permission (via a Volume), and so every tutorial I've seen so far is bunk because mongo will crash when you try to store data
Ephemeral volumes make no sense because you can already write to disk without them, so what purpose do they serve?
So what's up with these things? Why would someone create an ephemeral volume?
Container processes can always write to the container-local filesystem (Unix permissions permitting); but any content that goes there will be lost as soon as the pod is deleted. Pods can be deleted fairly routinely (if you need to upgrade the image, for example) or outside your control (if the node it was on is terminated).
In the documentation, the types of ephemeral volumes highlight two major things:
emptyDir volumes, which are generally used to share content between containers in a single pod (and more specifically to publish data from an init container to the main container); and
injecting data from a configMap, the downward API, or another data source that might be totally artificial
In both of these cases the data "acts like a volume": you specify where it comes from, and where it gets mounted, and it hides any content that was in the underlying image. The underlying storage happens to not be persistent if a pod is deleted and recreated, unlike persistent volumes.
Generally prepackaged versions of databases (like Helm charts) will include a persistent volume claim (or create one per replica in a stateful set), so that data does get persisted even if the pod gets destroyed.
So what's up with these things? Why would someone create an ephemeral volume?
Ephemeral volumes are more of a conceptual thing. The main need for this concept is driven from microservices and orchestration processes, and also guided by 12 factor app. But why?
Because, one major use case is when you are deploying a number of microservices (and their replicas) using containers across multiple machines in a cluster you don't want a container to be reliant on its own storage. If containers rely on their on storage, shutting them down and starting new ones affects the way your app behaves, and this is something everyone wants to avoid. Everyone wants to be able to start and stop containers whenever they want, because this allows easy scaling, updates, etc.
When you actually need a service with persistent data (like DB) you need a special type of configuration, especially if you are running on a cluster of machines. If you are running on one machine, you could use a mounted volume, just to be sure that your data will persist even after container is stopped. But if you want to just load balance across hundreds of stateless API services, ephemeral containers is what you actually want.

Do memory mapped files in Docker containers in Kubernetes work the same as in regular processes in Linux?

I have process A and process B. Process A opens a file, calls mmap and write to it, process B do the same but reads the same mapped region when process A has finished writing.
Using mmap, process B is suppossed to read the file from memory instead of disk assuming process A has not called munmap.
If I would like to deploy process A and process B to diferent containers in the same pod in Kubernetes, is memory mapped IO supposed to work the same way as the initial example? Should container B (process B) read the file from memory as in my regular Linux desktop?
Let's assume both containers are in the same pod and are reading/writing the file from the same persistent volume. Do I need to consider a specific type of volume to achieve mmap IO?
In case you are courious I am using Apache Arrow and pyarrow to read and write those files and achieve zero-copy reads.
A Kubernetes pod is a group of containers that are deployed together on the same host. (reference). So this question is really about what happens for multiple containers running on the same host.
Containers are isolated on a host using a number of different technologies. There are two that might be relevant here. Neither prevent two processes from different containers sharing the same memory when they mmap a file.
The two things to consider are how the file systems are isolated and how memory is ring fenced (limited).
How the file systems are isolated
The trick used is to create a mount namespace so that any new mount points are not seen by other processes. Then file systems are mounted into a directory tree and finally the process calls chroot to set / as the root of that directory tree.
No part of this affects the way processes mmap files. This is just a clever trick on how file names / file paths work for the two different processes.
Even if, as part of that setup, the same file system was mounted from scratch by the two different processes the result would be the same as a bind mount. That means the same file system exists under two paths but it is *the same file system, not a copy.
Any attempt to mmap files in this situation would be identical to two processes in the same namespace.
How are memory limits applied?
This is done through cgroups. cgroups don't really isolate anything, they just put limits on what a single process can do.
But there is a natuarl question to ask, if two processes have different memory limits through cgroups can they share the same shared memory? Yes they can!
Note: file and shmem may be shared among other cgroups. In that case,
mapped_file is accounted only when the memory cgroup is owner of page
cache.
The reference is a little obscure, but describes how memory limits are applied to such situations.
Conclusion
Two processes both memory mapping the same file from the same file system as different containers on the same host will behave almost exactly the same as if the two processes were in the same container.

How to properly manage storage in Jelastic

Okay, another question.
In AWS I have EBS, which allows me to create volumes, define iops/size for them, mount to desired EC2 machines and take snapshots.
How can I achieve same features in Jelastic? I have option to create "Storage Container" but it belongs only to one environment. How can I backup this volume?
Also, what's the best practice of managing storage devices for things like databases? Use separate storage container?
I have option to create "Storage Container" but it belongs only to one environment.
Yes the Storage Container belongs to 1 environment (either part of one of your other environments, or its own), but you can mount it in 1+ other containers (i.e. inside containers of other environments).
You can basically consider a storage container to be similar to AWS EBS: it can be mounted anywhere you like (multiple times even) in containers within environments in the same region.
How can I backup this volume?
Check your hosting provider's backup policy. In our case we perform backups of all containers for our customers for free. Customers do not need to take additional backups themselves. No need for those extra costs and steps... It might be different at some other Jelastic providers so please check this with your chosen provider(s).
If you wish to make your own backups, you can define a script to do it and set it in cron for example. That script can transfer archives to S3 or anything you wish.
what's the best practice of managing storage devices for things like databases?
Just like with AWS, you may experience performance issues if you use remote storage for database access. Jelastic should generally give you lower latency than EBS, but even so I recommend to keep your database storage local (not via storage containers).
Unlike AWS EC2, you do not have the general risk of local storage disappearing (i.e. your Jelastic containers local storage is not ephemeral; you can safely write data there and expect it to be persistent).
If you need multiple database nodes, it is recommended to use database software level clustering features (master-master or master-slave replication for example) instead of sharing the filesystem.
Remember that any shared filesystem is a shared (single) point of failure. What you gain in application / software convenience you may also lose in reliability / high availability. It's often worth making the extra steps in your application to handle this issue another way, or perhaps consider using lsyncd (there are Jelastic marketplace addons for this) to replicate parts of your filesystem instead of mounting a shared storage container.

Docker images across multiple disks

I'm getting going with Docker, and I've found that I can put the main image repository on a different disk (symlink /var/lib/docker to some other location).
However, now I'd like to see if there is a way to split that across multiple disks.
Specifically, I have an old SSD that is blazingly fast to read from, but doesn't have too many writes left until it kicks the can. It would be awesome if I could store the immutable images on here, then have my writeable images on some other location that can handle the writes.
Is this something that is possible? How do you split up the repository?
Maybe you could do this using the AUFS driver and some trickery such as moving layers to the SSD after initially creating them and pointing symlinks at them - I'm not sure, I never had a proper look at how that storage driver worked.
With devicemapper thinp, btrfs and OverlayFS this isnt possible AFAICT:
The Docker dm-thinp and btrfs drivers both build layers one on top of the other using block device snapshot mechanisms. Your best bet here would be to include the SSD in the storage pool and rely on some ability to migrate the r/o snapshots to a specific block device that is part of the pool. Doubt this exists though.
The OverlayFS driver stacks layers by hard-linking files in independent directory structures. Hard-links only work within a filesystem.

File/Directory synchronization across network

I need to synchronize few directories/files within the cluster. Say if a file content changes in one node I need to propagate the change to other nodes so that the file content are same atany point of time.Same applies when some files/directories are deleted. DRBD is not a option so is there any library which can do this for me.
I'd consider using rsync :) A handy tool for syncing between remote hosts.
You need to use a distributed filesystem (GlusterFS comes to mind) that can guarantee the synchronization and locking depending on cluster's usage of the files. Otherwise, you may want to consider a centralized storage served via NFS for the simplicity. Beyond that but still centralized would be a SAN filesystem like GFS but be aware that setup requires more to it like fencing.
Have you considered NFS? SMB? If the updates don't have to be immediate you could consider rsync

Resources