Is it possible to speed up writes inside a Docker container?

I have a very large file in my Docker container (it's a VirtualBox image) which, unfortunately, must be modified as part of running it. Docker's copy-on-write policy works against me here: any mutation or copy of the file takes about 10 minutes, compared to about 10 seconds to copy the same file on the host.
Can anything be done to speed up the creation/copy of very large files within a docker container? Note that this is an entirely transient file that I do not need to persist after the container is closed.

Declare the folder the file is in as a volume. If you do this, the copy-on-write policy is not applied to that folder. Note that you don't have to mount this volume from the host system; it is sufficient to declare it as a volume.
For more information: https://docs.docker.com/userguide/dockervolumes/
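A minimal sketch of this suggestion, assuming a hypothetical image my_image and a hypothetical in-container directory /vm that holds the large file (neither name comes from the question). Passing -v with only a container path declares an anonymous volume at that path, and --rm removes the container together with its anonymous volumes on exit, which suits a transient file.

```shell
# Hypothetical helper; my_image and /vm in the usage line are placeholders.
run_with_scratch_volume() {
  local image="$1" dir="$2"
  # -v with only a container path creates an anonymous volume there,
  # so writes under that path do not go through the copy-on-write layer.
  docker run --rm -v "$dir" "$image"
}

# Usage (requires a running Docker daemon):
#   run_with_scratch_volume my_image /vm
```

The same declaration can be made in the image itself with a VOLUME /vm instruction in the Dockerfile. If the image already has files in that directory, Docker copies them into the new volume when the container starts, so expect one ordinary copy at startup.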

Related

Docker: How to include big folder to container?

I am new to Docker and I have a Docker Compose setup with three different services. But I have a problem regarding file size in Docker.
To serve images to our users, our server (written in Java/Spring) reads from a local directory called Images, which is also used to save new images. This directory is almost 50 GB in size, and I can't include it inside the Docker container because of size limitations.
I created an Images folder inside the container and then tried to symlink it to the Images directory on the host machine, but that also failed.
My question is, how can I give access to this folder inside the container?
There is a size limit on a Docker container's root filesystem, known as the base device size (a setting of the devicemapper storage driver). The default value is 10GB.
You can increase this value by passing the --storage-opt option to the docker run command. See https://docs.docker.com/engine/reference/commandline/run/#set-storage-driver-options-per-container
Or, if you are running it with docker-compose, see https://docs.docker.com/compose/compose-file/compose-file-v2/#storage_opt
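For illustration, a sketch of the docker run form, assuming a hypothetical image name and size. The size option is only honored by some storage drivers (for example devicemapper, btrfs, zfs, or overlay2 on an xfs backing filesystem mounted with pquota), so check which driver your installation uses before relying on it.

```shell
# Hypothetical helper; the image name and size in the usage line are placeholders.
run_with_rootfs_size() {
  local image="$1" size="$2"
  # --storage-opt size=... sets the size of this container's root filesystem.
  docker run --rm --storage-opt "size=$size" "$image"
}

# Usage (requires a running Docker daemon and a supporting storage driver):
#   run_with_rootfs_size my_image 60G
```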

Does docker container maintain volume data?

This might come across as a stupid question, but I am unable to figure something about docker volumes. Going through the official documentation I can see that we can map the host machine file system on the container for persistent storage. Following the instruction I was successfully able to mount a folder on my container.
Once I exec bash into the container, I can see the mapped directory structure there as expected. My question is how the data is mapped between these two paths, that is, from the container to the mounted volume on the host OS. Is the data duplicated, or does the container store the data directly on the volume on the host OS, with the mapped path acting as something like a symlink?
The question arises because we are trying to keep a large amount of data on a mounted disk that is accessible to the container, on the assumption that a mounted volume stores the data directly on the disk and nothing in the container.
The Docker documentation refers to this type of mount as a "bind mount"; that's also a technical Linux term for a mount that makes one part of the filesystem also appear somewhere else, and there's a mount --bind option you can use outside of Docker (usually a pretty specialized option).
On native Linux, the host content and the container-visible content are literally the exact same disk content. If you have a bind-mounted host directory or a named Docker volume mounted over a container directory, all reads and writes will use that mounted content, and in fact nothing will be written to the container filesystem on that path.
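As a rough illustration, the sketch below writes a file from inside a container into a bind-mounted host directory; on native Linux the file should then be readable directly on the host, with no copy step. The function name, the alpine image, and the /data mount path are assumptions chosen for the example.

```shell
# Hypothetical helper; the alpine image and /data path are illustrative choices.
write_through_bind_mount() {
  local host_dir="$1"   # must be an absolute path on the host
  # Bind-mount host_dir at /data and create a file there from inside the container.
  docker run --rm -v "$host_dir:/data" alpine \
    sh -c 'echo "written in the container" > /data/from-container.txt'
}

# Usage (requires a running Docker daemon):
#   write_through_bind_mount /srv/shared
#   cat /srv/shared/from-container.txt
```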
You mention symlinks; these are always resolved as filenames in their respective filesystem space. If the mounted filesystem has a symlink passwd -> /etc/passwd then reading it will yield the host's password file on the host, and the container's password file inside the container. If it has a symlink f -> ../f then it will look at the directory above the mount point in whichever filesystem is local.
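The symlink behavior can be seen without Docker at all, since a symlink only stores a path string that is resolved wherever it is read. The sketch below uses a temporary directory as a stand-in for the mounted directory; the function and directory names are made up for the example.

```shell
# Hypothetical demo: "mounted" stands in for the bind-mounted directory.
symlink_demo() {
  local root
  root="$(mktemp -d)"
  mkdir "$root/mounted"
  echo "outside the mounted directory" > "$root/f"
  ln -s /etc/passwd "$root/mounted/passwd"   # absolute target
  ln -s ../f "$root/mounted/f"               # relative target
  # The link stores only the path, not the file content.
  readlink "$root/mounted/passwd"
  # The relative link resolves against the directory above "mounted".
  cat "$root/mounted/f"
  rm -rf "$root"
}

symlink_demo
```

Because only the path is stored, the same link read inside a container resolves against the container's own /etc/passwd or the container's own parent directory.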
On non-Linux this process is a little bit more technically complex since there is typically a Linux virtual machine involved in the mix. This usually manifests as file synchronization appearing slow. For data you don't need to directly access as a human, storing it in a named Docker volume will usually be faster.

Where do docker images' new Files get saved to in GCP?

I want to create some Docker images that generate text files. However, since the images are pushed to Container Registry in GCP, I am not sure where the files will be generated when I use kubectl run myImage. If I specify a path in the program, like '/usr/bin/myfiles', would they be downloaded to the VM instance where I am typing "kubectl run myImage"? I think this is probably not the case. What is the solution?
Ideally, I would like all the files to be in one place.
Thank you
Container Registry and Kubernetes are mostly irrelevant to the issue of where a container will persist files it creates.
Some process running within a container that generates files will persist the files to the container instance's file system. Exceptions to this are stdout and stderr which are both available without further ado.
When you run container images, you can mount volumes into the container instance, and this provides possible solutions to your needs. When running Docker Engine, it's common to mount the host's file system into the container to share files between the container and the host: docker run ... --volume=[host]:[container] yourimage ....
On Kubernetes, there are many types of volumes. A seemingly obvious solution is to use gcePersistentDisk, but this has the limitation that these disks may only be mounted for writing by one pod at a time. A more powerful solution may be to use a shared network filesystem such as nfs or gluster. These should provide a means for you to consolidate files outside of the container instances.
A good solution, though I'm unsure whether it is available to you, would be to write your files as Google Cloud Storage objects.
A tenet of containers is that they should operate without making assumptions about their environment. Your containers should not make assumptions about running on Kubernetes and should not make assumptions about non-default volumes. By this I mean that your containers should simply write files to the container's file system. When you run the container, you apply the configuration (e.g. an NFS volume mount or a GCS bucket mount) that actually persists the files beyond the container.
HTH!

Docker: Handling user uploads and saving files

I have been reading about Docker, and one of the first things I read was that it runs images in a read-only manner. This raised a question in my mind: what happens if I need users to upload files? Where would the files go (are they appended to the image)? In other words, how should uploaded files be handled?
Docker containers are meant to be immutable and replaceable - you should be able to stop a container and replace it with a newer version without any ill effects. It's bad practice to store any configuration or operational data inside the container.
The situation you describe with file uploads would typically be resolved with a volume, which mounts a folder from the host filesystem into the container. Any modifications performed by the container to the mounted folder would persist on the host filesystem. When the container is replaced, the folder is re-mounted when the new container is started.
It may be helpful to read up on volumes: https://docs.docker.com/storage/volumes/
Docker containers use file systems similar to those of their underlying operating system, which in your case seems to be Windows Nano Server (Windows optimized to be used in a container).
So any uploads to your container will be placed at the path you provided when uploading the file.
But this data is ephemeral: it lives in the container's own writable layer and is lost when the container is removed.
To use persistent storage you must provide a volume for your Docker container. You can think of volumes as external disks attached to a container that mount on a path inside the container. This will persist data regardless of container state.
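A minimal sketch of the volume approach, assuming a hypothetical image my_image whose application saves uploads under /app/uploads (both names are placeholders, not taken from the question). The named volume outlives the container, so a replacement container started with the same -v argument sees the same files.

```shell
# Hypothetical helper; image, volume name and mount path are placeholders.
run_with_uploads_volume() {
  local image="$1" volume="$2" mount_path="$3"
  # Create the named volume (docker run would also create it on first use).
  docker volume create "$volume"
  # Mount the volume at the path where the application writes uploads.
  docker run -d -v "$volume:$mount_path" "$image"
}

# Usage (requires a running Docker daemon):
#   run_with_uploads_volume my_image uploads /app/uploads
```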

Docker: in memory file system

I have a Docker container which does a lot of reads and writes to disk. I would like to test what happens when my entire Docker filesystem is in memory. I have seen some answers here that say it will not be a real performance improvement, but this is for testing.
The ideal solution I would like to test is sharing the common parts of each image and copying them into memory when needed.
The files each container creates at runtime should also be in memory, and kept separate per container. The filesystem shouldn't be more than 5GB when idle and up to 7GB during processing.
Simple solutions would duplicate all shared files (even the parts of the OS you never use) for each container.
There's no difference between the storage of the image and the base filesystem of the container: the layered FS accesses the image layers directly as a RO layer, with the container using a RW layer above to catch any changes. Therefore your goal of having the container running in memory while the Docker installation remains on disk doesn't have an easy implementation.
If you know where your RW activity is occurring (it's fairly easy to check the docker diff of a running container), the best option to me would be a tmpfs mounted at that location in your container, which is natively supported by docker (from the docker run reference):
$ docker run -d --tmpfs /run:rw,noexec,nosuid,size=65536k my_image
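To find where the read/write activity lands, one option is to summarize the output of docker diff by top-level directory. The helper below is a sketch with a made-up name, and it assumes paths without spaces because awk splits fields on whitespace.

```shell
# Hypothetical helper: counts changed paths per top-level directory.
write_hotspots() {
  local container="$1"
  # docker diff prints one line per path: A (added), C (changed), D (deleted).
  docker diff "$container" |
    awk '{ split($2, parts, "/"); print "/" parts[2] }' |
    sort | uniq -c | sort -rn
}

# Usage (requires a running Docker daemon):
#   write_hotspots my_container
```

Directories that dominate the count are the natural candidates for a --tmpfs mount.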
Docker stores image, container, and volume data in its directory by default. Container HDs are made of the original image and the 'container layer'.
You might be able to set this up using a RAM disk. You would hard allocate some RAM, mount it, and format it with your file system of choice. Then move your Docker installation to the mounted RAM disk and symlink it back to the original location.
Setting up a Ram Disk
Best way to move the Docker directory
Obviously this is only useful for testing, as Docker and its images, volumes, containers, etc. would be lost on reboot.
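For the RAM disk step only, a sketch using tmpfs, which is a swapped-in simplification: tmpfs is itself an in-memory filesystem, so nothing needs formatting. The mount point and size are assumptions, the command needs root, and moving the Docker directory is not shown here.

```shell
# Hypothetical helper; /mnt/ramdisk and 8g in the usage line are placeholders.
mount_ramdisk() {
  local mount_point="$1" size="$2"
  mkdir -p "$mount_point"
  # tmpfs keeps its contents in memory (and swap), capped at the given size.
  mount -t tmpfs -o "size=$size" tmpfs "$mount_point"
}

# Usage (as root):
#   mount_ramdisk /mnt/ramdisk 8g
```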
