Is a Docker storage driver persistent storage? - docker

I'm new to Docker and I'm trying to understand persistent storage.
In the section Manage Application data > Store data within containers > About storage drivers (https://docs.docker.com/storage/storagedriver/), the documentation says:
"Storage drivers allow you to create data in the writable layer of your container. The files won't be persisted after the container is deleted, and both read and write speeds are lower than native file system performance."
But later, in the section Manage Application data > Store data within containers > Use the Device Mapper storage driver (https://docs.docker.com/storage/storagedriver/device-mapper-driver/), they use direct-lvm, which creates logical volumes that allow data to persist.
My question: does using a storage driver mean that
the container-generated data is ephemeral?
the container-generated data is ephemeral if we are using a logical volume on a loopback device?
the container-generated data is persistent if we are using a logical volume on a block device?

The storage driver configuration is essentially an install-time setting that's not really relevant once you've gotten it set up correctly. In particular if you run docker info and it says it's using an overlay2 driver I would recommend closing this particular browser tab and not changing anything.
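For example, this prints just the storage driver the daemon is using (overlay2 being the typical answer on a modern install):
docker info --format '{{.Driver}}'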
Of the paragraph you quoted, the important thing to take away is that files you create inside a container, if they aren't inside a volume directory, will be lost as soon as the container is deleted. It doesn't matter what underlying storage driver you're using. The performance differences between the container filesystem, named volumes, and bind mounts almost never matter (except on macOS hosts, where bind mounts are notoriously slow).
The data the storage driver persists includes both the temporary container filesystems (they get persisted until the container is deleted) and the underlying image data. It does not include named Docker volumes or other bind-mounted host directories.
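To make the distinction concrete (the volume and image names are arbitrary), data written to a named volume outlives the container, while data written anywhere else in the container filesystem does not:
docker volume create mydata
docker run --rm -v mydata:/data alpine sh -c 'echo hello > /data/file.txt'
docker run --rm -v mydata:/data alpine cat /data/file.txt   # still prints "hello"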
If you're using devicemapper, you might see if you can upgrade your host to a newer Linux distribution that can use the overlay2 driver. In particular that avoids the fixed space limit of the devicemapper driver. If you must use devicemapper, general wisdom has been that using a dedicated partition for it is better than using a file. As I said up front, though, this is essentially install-time configuration and has no bearing on your application or docker run commands.

Related

How to bypass memory caching while using FIO inside of a docker container?

I am trying to benchmark I/O performance on my host and in a Docker container using the flexible I/O tester (fio) with O_DIRECT enabled in order to bypass memory caching. The result is very suspicious: Docker performs almost 50 times better than my host machine, which is impossible. It seems like Docker is not bypassing the caching at all, even when I run it with --privileged mode. This is the command I ran inside the container. Any suggestions?
fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=4k --numjobs=1 --size=10G --runtime=600 --group_reporting --output-format=json >/home/docker/docker_seqread_4k.json
(Note this isn't really a programming question, so Stack Overflow is the wrong place to ask it... Maybe Super User or Server Fault would be a better choice and get faster answers?)
The result is very suspicious. docker performs almost 50 times better than my host machine which is impossible. It seems like docker is not bypassing the caching at all.
If your best-case latencies are suspiciously small compared to your worst-case latencies, it is highly likely your suspicions are well founded and that kernel caching is still happening. Asking for O_DIRECT is a hint, not an order, and the filesystem can choose to ignore it and use the cache anyway (see the part about "You're asking for direct I/O to a file in a filesystem but...").
If you have the option and you're interested in disk speed, it is better to do any such test outside of a container (with all the caveats that implies). Another option when you can't or don't want to disable caching is to ensure that you do I/O that is at least two to three times the size (both in terms of amount and the region being used) of RAM, so the majority of I/O can't be satisfied by buffers/cache (and if you're doing write I/O, then do something like end_fsync=1 too).
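As a rough illustration of that advice (the size below assumes a machine with well under 30 GB of RAM; adjust it to your hardware), make the working set several times larger than RAM and force a final flush for writes:
fio --name=bigwrite --rw=write --direct=1 --ioengine=libaio --bs=4k --numjobs=1 --size=30G --end_fsync=1 --group_reporting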
In summary, the filesystem being used by docker may make it impossible to accurately do what you're requesting (measure the disk speed by bypassing cache while using whatever your default docker filesystem is).
Why a Docker benchmark may not give the results you expect
The Docker engine uses, by default, the OverlayFS [1][2] driver for data storage in containers. It assembles all of the different layers from the images and makes them readable. Writing is always done to the "top" layer, which is the container storage.
When performing reads and writes to the container's filesystem, you're passing through Docker's overlay2 driver, through the OverlayFS kernel driver, through your filesystem driver (e.g. ext4) and onto your block device. Additionally, as Anon mentioned, DIRECT/O_DIRECT is just a hint, and may not be respected by any of the layers you're passing through.
Getting more accurate results
To get accurate benchmarks within a Docker container, you should write to a volume mount or change your storage driver to one that is not overlaid, such as the Device Mapper driver or the ZFS driver.
Both the Device Mapper driver and the ZFS driver require a dedicated block device (you'll likely need a separate hard drive), so using a volume mount might be the easiest way to do this.
Use a volume mount
Use the -v option with a directory that sits on a block device on your host.
docker run -v /absolute/host/directory:/container_mount_point alpine
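For example (the image name is a placeholder for any image that has fio installed, and the host path should sit on the disk you want to measure), the fio run from the question could then target the bind-mounted directory instead of the container's writable layer:
docker run --rm -v /mnt/fast-disk/bench:/bench fio-image \
  fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=4k --numjobs=1 --size=10G --runtime=600 --group_reporting --directory=/bench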
Use a different Docker storage driver
Note that the storage driver must be changed on the Docker daemon (dockerd) and cannot be set per container. From the documentation:
Important: When you change the storage driver, any existing images and containers become inaccessible. This is because their layers cannot be used by the new storage driver. If you revert your changes, you can access the old images and containers again, but any that you pulled or created using the new driver are then inaccessible.
With that disclaimer out of the way, you can change your storage driver by editing daemon.json and restarting dockerd.
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.directlvm_device=/dev/sd_",
    "dm.thinp_percent=95",
    "dm.thinp_metapercent=1",
    "dm.thinp_autoextend_threshold=80",
    "dm.thinp_autoextend_percent=20",
    "dm.directlvm_device_force=false"
  ]
}
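Concretely, after editing /etc/docker/daemon.json you would restart the daemon (assuming a systemd-based host):
sudo systemctl restart docker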
Additional container benchmark notes - kernel
If you are trying to compare different flavors of Linux, keep in mind that Docker is still running on your host machine's kernel.
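You can verify this yourself: both commands below report the same kernel release, because the container shares the host's kernel.
uname -r
docker run --rm alpine uname -r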

GCP: how to access cloud storage bucket from a VM instance

I'm trying to deploy and run a docker image in a GCP VM instance.
I need it to access a certain Cloud Storage Bucket (read and write).
How do I mount a bucket inside the VM? How do I mount a bucket inside the Docker container running in my VM?
I've been reading the Google Cloud documentation for a while, but I'm still confused. All the guides show how to access a bucket from a local machine, not how to mount it to a VM.
https://cloud.google.com/storage/docs/quickstart-gsutil
I found something about FUSE, but it looks overly complicated for just mounting a single bucket into the VM filesystem.
Google Cloud Storage is an object storage API; it is not a filesystem. As a result, it isn't really designed to be "mounted" within a VM. It is designed to be highly durable and scalable to extraordinarily large objects (and large numbers of objects).
Though you can use gcsfuse to mount it as a filesystem, that method has pretty significant drawbacks. For example, it can be expensive in operation count to do even simple operations for a normal filesystem.
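For reference, a minimal gcsfuse invocation looks roughly like this (the bucket name and mount point are placeholders, and gcsfuse has to be installed separately):
mkdir -p /mnt/my-bucket
gcsfuse my-example-bucket /mnt/my-bucket
fusermount -u /mnt/my-bucket   # unmount when done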
Likewise, there are many surprising behaviors that are a result of the fact that it is an object store. For example, you can't edit objects -- they are immutable. To give the illusion of writing to the middle of an object, the object is, in effect, deleted and recreated whenever a call to close() or fsync() happens.
The best way to use GCS is to design your application to use the API (or the S3 compatible API) directly. That way the semantics are well understood by the application, and you can optimize for them to get better performance and control your costs. Thus, to access it from your docker container, ensure your container has a way to authenticate through GCS (either through the credentials on the instance, or by deploying a key for a service account with the necessary permissions to access the bucket), then have the application call the API directly.
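A minimal sketch of that approach using the official Cloud SDK image and a mounted service-account key (all names and paths below are examples, not something prescribed by GCP):
docker run --rm \
  -v /secure/path/sa-key.json:/creds/key.json:ro \
  google/cloud-sdk:slim \
  bash -c "gcloud auth activate-service-account --key-file=/creds/key.json && gsutil cp gs://my-example-bucket/input.csv /tmp/input.csv"
On a Compute Engine VM you can often skip the key entirely and rely on the instance's service account, provided the instance scopes and IAM permissions allow access to the bucket.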
Finally, if what you need is actually a filesystem, and not specifically GCS, Google Cloud does offer at least 2 other options if you need a large mountable filesystem that is designed for that specific use case:
Persistent Disk, which is the baseline block storage that you get with a VM; you can attach many of these disks to a single VM. However, you can't mount them read/write on multiple VMs at once -- if you need to attach a disk to multiple VMs, it must be read-only for all instances it is attached to.
Cloud Filestore is a managed service that provides an NFS server in front of a persistent disk. Thus, the filesystem can be mounted read/write and shared across many VMs. However it is significantly more expensive (as of this writing, about $0.20/GB/month vs $0.04/GB/month in us-central1) than PD, and there are minimum size requirements (1TB).
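Mounting a Filestore share from a VM is then an ordinary NFS mount (the IP address and share name below are placeholders):
sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/filestore
sudo mount 10.0.0.2:/vol1 /mnt/filestore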
Google Cloud Storage buckets cannot be mounted in Google Compute Engine instances or containers without third-party software such as FUSE. Neither Linux nor Windows has a built-in driver for Cloud Storage.
Compute Engine VMs come with the Google Cloud SDK installed, so without mounting anything you can copy files in and out using gsutil commands, for example:
gsutil ls gs://

Can multiple clients of Infinispan replicated cache share the same persistent file store?

Cross posted at https://developer.jboss.org/thread/279735
Suppose we have multiple Docker containers (each of which runs Java webapps, so multiple JVMs essentially), each of which uses the same replicated Infinispan cache either in:
Embedded mode, using JGroups for discovery
Server mode, using the Infinispan server image from Docker Hub, with the clients connecting via Hot Rod.
In either case, all the cache members/clients have a file-store from which they preload at startup (the sample below is for server mode; it's similar for embedded mode's XML):
Via Docker Compose, the file-store path inside each container (let's say /var/tmp/server/OUR_CACHE.dat) is bind-mounted to the same file on the Docker host.
<paths>
    <path name="cachestore.root" path="/var/tmp"/>
</paths>
...
<local-cache name="OUR_CACHE">
    <expiration lifespan="-1"/>
    <locking isolation="SERIALIZABLE" acquire-timeout="30000" concurrency-level="1000" striping="false"/>
    <file-store relative-to="cachestore.root" path="server" max-entries="-1" purge="false" passivation="false" preload="true" fetch-state="true"/>
    <memory>
        <binary size="100000000" eviction="MEMORY"/>
    </memory>
</local-cache>
My question is: is the persistence mechanism designed so that multiple replicated cache clients can read from and write to the same file store without any errors?
My understanding is that in replicated mode the key/value pairs become eventually consistent across all nodes' memory. But my goal is to ensure that multiple client containers using the same file-store for persistence and preloading do not adversely affect this replication.
If it's not advised to share the same file-store .dat file, then what is the best practice for putting a GUID in each container's file-store path in the .xml file?
Each Docker container's hostname (which is the container ID) is unique, but it won't be known before it's deployed, so it won't be easy to populate the infinispan_server.xml file with the value for "path" statically.
Thanks,
_Prateek
Isn't the point of replicated mode that each node has its own independent copy of the data? I don't fully understand what you're trying to achieve.
To the last point of your question:
If it's not advised to share the same file-store .dat file, then what is the best practice to have a GUID in the filepath for each container's path in the .xml file. Each Docker container's hostname (which is the container ID) is unique, but it won't be known before it's deployed, so it won't be easy to populate the infinispan_server.xml file with the value for "path" statically.
Can you put a placeholder in the config xml (e.g.: ${myprop}) and specify it at startup (e.g.: -Dmyprop=hostname01)?
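A sketch of that idea (the property name is made up, and it assumes the server image passes JAVA_OPTS through to the JVM):
# In infinispan_server.xml, reference a system property with a default:
#   <file-store relative-to="cachestore.root" path="${store.suffix:server}" ... />
# Then give each container its own value at startup:
docker run -d -e JAVA_OPTS="-Dstore.suffix=node01" jboss/infinispan-server
docker run -d -e JAVA_OPTS="-Dstore.suffix=node02" jboss/infinispan-server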

In docker, how are storage driver and backing file system different?

The docker info command lists both a storage driver (e.g. devicemapper) and a backing filesystem (e.g. xfs). What do these two attributes mean, and how are they different from each other?
The "storage driver" is software component that docker uses to manages storage. This may be one of the overlay drivers, which use the overlay filesystem driver in the kernel, or the devicemapper driver, which allocates chunks of storage using the Linux device mapper, or any of several other drivers.
At some level, all of these drivers need to store files, which means they need to use a fileysstem. In the case of overlay-type drivers (like overlay, overlay2, aufs) this is an existing filesystem in your host. For the devicemapper driver (and similar drivers that operate on block storage), this is a filesystem created on the block devices that Docker carves out of the devicemapper storage. The "backing filesystem" is the filesystem being used to store the files, and will be something like "XFS" or "ext4", etc.
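Both values appear next to each other in the docker info output, for example:
docker info 2>/dev/null | grep -iE 'storage driver|backing filesystem'
# Storage Driver: devicemapper
#  Backing Filesystem: xfs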
Some of the Docker storage drivers use regular files on top of an existing file system. aufs, overlay, overlay2 and devicemapper in loop-lvm mode all work via a formatted "backing filesystem".
btrfs, zfs and devicemapper in direct-lvm mode use a volume/device directly so there is no formatted file system in between the Docker container and the actual device being used. A file system is still required for Docker to store data on devices so it will create one using the reported "backing filesystem".

Docker container behavior when used in production

I am currently reading up on Docker. From what I understand, a container that is based on an image saves only the changes. If I were to use this in a production setup, are those changes persisted as soon as applications running "inside" the container write them to disk, or does it have to be done manually?
My concern is - what if the host abruptly shuts down? Will all the changes be lost?
The theory is that there's no real difference between a Docker container and a classical VM or physical host in most situations.
If the host abruptly dies, you can lose recent data using a container just as you can using a physical host:
your application may not have actually issued the write operation to save the data to disk,
the operating system may have decided to wait a bit before sending data to the storage devices,
the filesystem may not have finished the write,
the data may not have been really flushed to the physical storage device.
Now, by default, Docker uses AUFS (a stackable filesystem) which works at the file level.
If you're writing to a file that was existing in the Docker image, AUFS will first copy this base file to the upper, writable layer (container), before writing your change. This causes a delay depending on the size of the original file. Interesting and more technical information here.
I guess that if a power cut happens while this original file is being copied and before your changes have been written, that would be one reason to see more data loss with a Docker container than with a "classical" host.
You can move your critical data to a Docker "volume", which would be a regular filesystem on the host, bind-mounted into the container. This is the recommended way to deal with important data that you want to keep across container deployments.
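For example (the image and volume names are illustrative), a database's data directory is usually placed on a named volume or bind mount so it outlives any single container:
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres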
To mitigate the potential AUFS issue, you could tell Docker to use LVM thin-provisioned block devices instead of AUFS (wipe /var/lib/docker and start the daemon with docker -d -s devicemapper). However, I don't know if this storage backend has received as much testing as the default AUFS one (it works fine for me, though).
