Artifactory Docker Registry Free /var partition space - docker

Because Docker images are pushed to our JFrog Docker registry without any cleanup, my /var partition is currently full.
Since I am able to SSH into the machine, I wanted to know whether I can simply delete the images at the /var location, as I am unable to start the Artifactory service due to insufficient space.

The Docker images are stored as checksum-named binary files in the filestore. You have no way of knowing which checksum belongs to which image, and since images often share layers, deleting even a single file can corrupt several images.
For the short term, I recommend moving (not deleting) a few binary files off the partition so you can start your registry back up. You can also delete the backup directory: backups are on by default, you may not actually want or need them, and they occupy a lot of space. Once the service is running, either delete enough images through Artifactory to clear space, or, preferably, expand the filestore, or, better yet, move it to a different partition so the app/OS is not mixed with the application data. In any case, once you have more free space, move the binary files back to their original location.
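A rough sketch of that short-term workaround (paths assume a default Artifactory layout under $ARTIFACTORY_HOME and a spare mount at /mnt/spare; both are assumptions you should adjust for your install):

    # Move (don't delete!) one shard of checksum binaries to a partition with space
    mkdir -p /mnt/spare/filestore-overflow
    mv "$ARTIFACTORY_HOME/data/filestore/00" /mnt/spare/filestore-overflow/

    # Reclaim the backup directory if you don't need those backups
    rm -rf "$ARTIFACTORY_HOME/backup/"*

    # Start the service, clean up images or grow the filestore, then restore:
    mv /mnt/spare/filestore-overflow/00 "$ARTIFACTORY_HOME/data/filestore/"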

Related

Increasing the storage space of a docker container on Windows to 2-3TB

Working on a Windows computer with 5 TB of available space, building an application to process large amounts of data that uses Docker containers to create replicable environments. Most of the data processing is done in parallel across many smaller Docker containers, but the final tool/container requires all the data to come together in one place. The output area is mounted as a volume, but most of the data is simply copied into the container; this will be multiple TBs of storage. RAM, luckily, isn't an issue in this case.
Willing to try any suggestions and make what changes I can.
Is this possible?
I've tried increasing the disk space for Docker using .wslconfig, but this doesn't help.
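For what it's worth, .wslconfig mainly governs memory, CPU, and swap; with the WSL 2 backend the data disk is a VHDX whose maximum size has to be expanded separately. A hedged sketch of that documented procedure (the .vhdx path varies across Docker Desktop versions, so locate yours first):

    # From an elevated PowerShell prompt:
    wsl --shutdown
    diskpart
    # inside diskpart (maximum is in MB; 3000000 is roughly 3 TB):
    #   select vdisk file="C:\Users\<you>\AppData\Local\Docker\wsl\data\ext4.vhdx"
    #   expand vdisk maximum=3000000
    #   exit
    # Depending on the setup, the ext4 filesystem inside the VM may also
    # need to be grown (e.g. with resize2fs) before Docker sees the space.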

Longhorn Volume a lot bigger than mounted drive

I have a small InfluxDB database running inside my K3S cluster.
As Storage Class I use Longhorn.
I know it's not optimal to run a database in Kubernetes, but this is only for some metric logging for Telegraf.
The problem is that the volume mounted in the pod holds about 200 MB, but in Longhorn its actual size is 2.5 GB. The volume is only one day old; at this rate my disk will be full soon.
Why is this? And is this something I can fix?
I suspect the reason for this is snapshots.
Longhorn volumes have several different size "properties":
- Size: what you define in your manifest. The actual filesystem contents can't exceed it.
- Used space on the volume head: essentially how full the volume is. Run df -h inside an attached pod, or use a tool like df-pv, to check usage (this is the relevant number when your volume is getting full).
- Snapshot size: how big a snapshot is, built incrementally on top of the previous one. This can be viewed in the snapshots section of the Longhorn UI.
- Actual size: how much space the volume really occupies on your host machine. This can be larger than the volume's defined size for a number of reasons, the most common being snapshots.
Longhorn keeps a history of previous changes to a volume as snapshots. You can either create them manually from the UI or create a RecurringJob that does it for you automatically.
Having many snapshots is problematic when a lot of data is (re-)written to a volume. Imagine the following scenario:
1. Write a 1 GB file to the volume.
2. Take a snapshot (this snapshot is now 1 GB big).
3. Delete the file (the volume head only records the "file deleted" info; the previous snapshot's size is unaffected).
4. Write a new 1 GB file. The volume head is now 1 GB (the new file) plus the deletion record from step 3, but the previous snapshot still holds another 1 GB. Your actual size is therefore already twice the space currently used inside the volume.
There's also an ongoing discussion about reclaiming space automatically.
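To compare the two numbers yourself, something along these lines should work (the pod name and mount path are placeholders; the Longhorn Volume custom resource reports the actual size in its status):

    # Usage inside the pod (the volume head)
    kubectl exec -it influxdb-0 -- df -h /var/lib/influxdb

    # Defined size vs. actual size as Longhorn sees it
    kubectl -n longhorn-system get volumes.longhorn.io \
      -o custom-columns=NAME:.metadata.name,SIZE:.spec.size,ACTUAL:.status.actualSize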

How to inspect contents of different Docker image layers?

My current understanding of a Docker image is that it is a collection of individual layers. Each layer only contains deltas that are merged via the union filesystem (which simply mounts all layers on top of each other). When instantiating an image, another (writable) layer is put on top that will then contain all container-specific changes that are persisted between restarts. Please correct me if I am wrong in any of the above.
I would like to inspect the contents of each of the various layers. I am particularly interested in the top-most layer, to see whether my containerized app writes any data that would bloat the container, like a log. I am working on macOS, which does not store the files in /var/lib/docker/ but seems to keep them inside a VM. I read about the docker-machine tool, which makes it easy to connect to the Docker engine via SSH, where one would be able to see and mount all the layers. However, that tool seems to be discontinued.
Does anybody have an idea on 1) how to connect to the docker engine to get access to the layers and 2) how to find out what files are contained in a particular layer?
Edit: it seems to be possible to use docker diff to see the file differences between the original image and the running container, which is mainly what I wanted to achieve, but the original questions remain.
You can list the layers and their sizes with the docker history command, but to inspect the contents of all layers I recommend using the dive tool.
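A quick sketch of those options plus a tool-free fallback (image and container names are placeholders):

    # List an image's layers and their sizes
    docker history my-image:latest

    # Show what a running container has changed on top of its image
    docker diff my-container

    # Browse the contents of every layer interactively
    dive my-image:latest

    # Without extra tools: export the image and inspect the layer archives
    docker save my-image:latest -o image.tar
    tar -tf image.tar    # each layer is itself a tar you can list or extract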

How file lookup works in a Docker container

According to the Docker docs, every Dockerfile instruction creates a layer, and all the layers are kept when you create a new image based on an old one. When I create my own image, I might therefore have hundreds of layers involved, because of the recursive inheritance of layers from the base image.
In my understanding, file lookup in a container works this way:
A process wants to access file a; the lookup starts at the container layer (the thin read/write layer).
UnionFS checks whether this layer has a record for the file (either the file itself or a deletion marker). If yes, it returns the file or reports "not found" respectively, ending the lookup. If no, it passes the task to the layer below.
The lookup ends at the bottom layer.
If that is how it works, consider a file that resides in the bottom layer and is untouched by the layers above, /bin/sh maybe: a lookup would have to traverse all the layers down to the bottom. Even though each layer might be very lightweight, such a lookup would still take perhaps 100x as long as a regular one, which should be noticeable. Yet in my experience Docker is pretty fast, almost the same as a native OS. Where am I wrong?
This is all thanks to UnionFS and Union mounts!
Straight from wikipedia:
It allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system.
And from an interesting article:
In the kernel, the filesystems are stacked in order of their mount sequence, the first mounted filesystem is at the bottom of the mount stack, and the latest mount is at the top of the stack. Only the files and directories of the top of the mount stack are visible. With union mounts, directory entries from the lower filesystems are merged with the directory entries of the upper filesystem, thus making a logical combination of all mounted filesystems. Files with the same name in a lower filesystem are masked, as the upper one takes precedence.
So it doesn't "go through layers" in the conventional sense (e.g. one at a time); rather, it knows at any given time which file resides on which disk.
Doing this at the filesystem layer also means none of the software has to worry about where a file resides: it simply asks for /bin/sh, and the filesystem knows where to get it.
More info can be found in this webinar.
So to answer your question:
Where am I wrong?
You are thinking that it has to look through the layers one at a time, when it doesn't have to do that. (UnionFS is awesome!)
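To see the masking behavior concretely, here is a minimal overlayfs sketch (paths are made up for illustration; requires root on a Linux host):

    mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/merged
    echo "from lower" > /tmp/lower/a.txt
    echo "from upper" > /tmp/upper/a.txt

    # Overlay the two branches into one coherent view
    mount -t overlay overlay \
      -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work /tmp/merged

    cat /tmp/merged/a.txt   # prints "from upper": the upper branch masks the lower
    rm /tmp/merged/a.txt    # recorded as a whiteout in /tmp/upper; lower is untouched

The rm at the end creates exactly the kind of "marked as deleted" record the question describes, without ever modifying the lower branch.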
To add to the correct prior answer: implementors of copy-on-write (CoW) and union filesystems want near-native performance, so of course they have tuned their implementations and "API" for the best possible lookup and filesystem performance.
That said, it's good to be aware that Docker does not operate on top of only a single 'type' of union/CoW filesystem, but has a small array of available options, with defaults depending on the Linux distro on which it is installed.
AUFS and overlay(fs) are the most common, but Docker also supports devicemapper (contributed by Red Hat and supported on Fedora/RHEL/CentOS), btrfs, and zfs. I have a blog post comparing and contrasting the various options, which may be of interest.
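To see which driver your own daemon ended up with, docker info reports it (output will be something like overlay2 or aufs):

    docker info --format '{{.Driver}}'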

Docker images across multiple disks

I'm getting going with Docker, and I've found that I can put the main image repository on a different disk by symlinking /var/lib/docker to some other location.
However, now I'd like to see if there is a way to split that across multiple disks.
Specifically, I have an old SSD that is blazingly fast to read from but doesn't have too many writes left before it kicks the can. It would be awesome if I could store the immutable images on it and keep my writable images in some other location that can handle the writes.
Is this something that is possible? How do you split up the repository?
Maybe you could do this using the AUFS driver and some trickery, such as moving layers to the SSD after initially creating them and pointing symlinks at them; I'm not sure, as I never had a proper look at how that storage driver worked.
With devicemapper thinp, btrfs, and OverlayFS this isn't possible, AFAICT:
The Docker dm-thinp and btrfs drivers both build layers one on top of the other using block-device snapshot mechanisms. Your best bet here would be to include the SSD in the storage pool and rely on some ability to migrate the read-only snapshots to a specific block device that is part of the pool. I doubt this exists, though.
The OverlayFS driver stacks layers by hard-linking files in independent directory structures, and hard links only work within a single filesystem.
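For the single-disk relocation mentioned in the question, note that instead of a symlink you can point the daemon at the other disk directly via the data-root option (the mount point is illustrative):

    # /etc/docker/daemon.json
    {
      "data-root": "/mnt/bigdisk/docker"
    }
    # then restart the daemon, e.g.: systemctl restart docker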
