I'm working on a NUMA server which has two memory nodes.
I want to create a file system that lives entirely in main memory, like tmpfs or ramfs, and I want to bind it to a specific memory node. In other words, I don't want the ramfs contents to be interleaved across the two memory nodes.
So how can I achieve this?
I have tried the numactl command with the --file option, but it seems to work only for single files (I need to load a whole directory).
thanks
I found out that the mpol option of the mount command does what I want.
For example the command:
mount -t tmpfs -o size=4g,mpol=bind:0 tmpfs pathToTheDir
will create a 4GB filesystem whose pages will be allocated on memory node 0.
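To verify where the pages actually land, you can write some data into the mount and check the per-node file-page counts with numastat (part of the numactl package); testfile is just an illustrative name:
dd if=/dev/zero of=pathToTheDir/testfile bs=1M count=1024
numastat -m | grep -i filepages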
I have an Apache server in production running in a Docker container, which I've deployed to a Google Compute Engine instance using the "gcloud compute instances create-with-container" command. The /var/www/html folder is mounted into the container from the boot disk of the compute instance to make it persistent, using the --container-mount-host-path flag:
gcloud compute instances create-with-container $INSTANCE_NAME \
--zone=europe-north1-a \
--container-image gcr.io/my-project/my-image:latest \
--container-mount-host-path mount-path=/var/www/html,host-path=/var/www/html,mode=rw \
--machine-type="$MACHINE_TYPE"
But now I've run into the problem that the size of the Docker partition is only 5.7G!
Output of df -h:
...
/dev/sda1 5.7G 3.6G 2.2G 62% /mnt/stateful_partition
overlay 5.7G 3.6G 2.2G 62% /var/lib/docker/overlay2/4f223d8157033ce937a79af741df3eadf79a02d2d003f01a085301ff66884bf2/merged
overlay 5.7G 3.6G 2.2G 62% /var/lib/docker/overlay2/86316491e2bb20bc300c1cc55c9f9254001ed77d6ec7f05f716af1e52fe15f53/merged
...
I had assumed that the partition size would increase automatically, but I ran into the problem where the website couldn't write files onto disk anymore because the partition was full. As a quick fix, I ran "docker prune -a" (there were a bunch of old images hanging around) on the host machine to make some more space on the docker partition.
So my question is, what is the proper way of increasing the size of the partition?
You can resize the boot disk in the Google Cloud Console GUI. However, since this is a container host, I recommend deleting the virtual machine instance and creating a new instance with the correct configuration.
The default disk size is usually 10 GB. To create a virtual machine instance with a larger disk, specify that when creating the instance.
Add the following to your CLI command:
--boot-disk-size=32GB
Optionally specify the type of persistent disk to control costs:
--boot-disk-type=pd-standard
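Combined with the command from the question, the call would look roughly like this:
gcloud compute instances create-with-container $INSTANCE_NAME \
--zone=europe-north1-a \
--container-image gcr.io/my-project/my-image:latest \
--container-mount-host-path mount-path=/var/www/html,host-path=/var/www/html,mode=rw \
--machine-type="$MACHINE_TYPE" \
--boot-disk-size=32GB \
--boot-disk-type=pd-standard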
See the documentation for gcloud compute instances create-with-container.
Is it possible to mount an overlay fs inside a (privileged) docker container? At least my intuitive approach, which works fine outside of a container, fails:
> mkdir /tmp/{up,low,work,merged}
> mount -t overlay overlay -o lowerdir=/tmp/low/,upperdir=/tmp/up/,workdir=/tmp/work/ /tmp/merged/
mount: /tmp/merged: wrong fs type, bad option, bad superblock on overlay, missing codepage or helper program, or other error.
Additional information:
Docker version 18.09.1, build 4c52b90
Kernel 4.19.0-8-amd64
Debian 10 (host and docker-image)
Found something that worked! Putting the upperdir and workdir on a tmpfs does the trick for me.
Like so:
> mkdir /tmp/overlay
> mkdir /tmp/{low,merged}
> mount -t tmpfs tmpfs /tmp/overlay
> mkdir /tmp/overlay/{up,work}
> mount -t overlay overlay -o lowerdir=/tmp/low/,upperdir=/tmp/overlay/up/,workdir=/tmp/overlay/work/ /tmp/merged/
I'd still be interested in an explanation of why creating an overlay without tmpfs fails inside a Docker container.
How to mount an overlayfs inside a docker container:
https://gist.github.com/detunized/7c8fc4c37b49c5475e68ef9574587eee
Basically, you need to run the container with either --privileged or the more secure --cap-add=SYS_ADMIN.
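For example (image name and shell are just placeholders; depending on your seccomp/AppArmor setup you may need additional tweaks):
docker run --rm -it --cap-add=SYS_ADMIN debian:10 bash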
This is a bit of a guess, but I suspect it is because Docker is already using overlayfs, and overlayfs refuses to use another overlayfs as its upperdir.
I suspect this may be due to whiteout files:
In order to support rm and rmdir without changing the lower
filesystem, an overlay filesystem needs to record in the upper
filesystem that files have been removed. This is done using whiteouts
and opaque directories (non-directories are always opaque).
A whiteout is created as a character device with 0/0 device number.
When a whiteout is found in the upper level of a merged directory, any
matching name in the lower level is ignored, and the whiteout itself
is also hidden.
To delete a file that exists in a lowerdir, overlayfs creates a whiteout file in the upperdir, and overlayfs hides all whiteout files (character devices with device number 0,0). This logically means that you cannot create a character device with number 0,0 inside an overlayfs, because such a file would be hidden by overlayfs itself.
If you were allowed to use an overlayfs as an upperdir, the new overlay wouldn't be able to create whiteout files there (since a 0,0 character device can't be created on another overlayfs), and therefore it wouldn't be able to rm or rmdir any files from its lower layers.
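A quick way to see this in action (outside a container, where the mount works; example-file is just an illustrative name):
mkdir /tmp/{low,up,work,merged}
touch /tmp/low/example-file          # a file that only exists in the lower layer
mount -t overlay overlay -o lowerdir=/tmp/low/,upperdir=/tmp/up/,workdir=/tmp/work/ /tmp/merged/
rm /tmp/merged/example-file          # delete it through the overlay
ls -l /tmp/up/
# c--------- 1 root root 0, 0 ... example-file   <- the 0/0 whiteout character device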
I'm running a Mac-native Docker (no virtualbox/docker-machine).
I have a huge image with a lot of infrastructure in it (Postgres, etc.).
I have run cleanup scripts to get rid of a lot of cruft--unused images and so forth.
When I run my image I get an error like:
could not create directory "/var/lib/postgresql/data/pg_xlog": No space left on device
On my host Mac /var is sitting at 60% space available and generally my disk has lots of storage free.
Is this some Docker configuration I need to bump up to give it more resources?
Relevant lines from mount inside docker:
none on / type aufs (rw,relatime,si=5b19fc7476f7db86,dio,dirperm1)
/dev/vda1 on /data type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /etc/resolv.conf type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /etc/hostname type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /etc/hosts type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /var/lib/postgresql/data type ext4 (rw,relatime,data=ordered)
Here’s df:
Filesystem 1K-blocks Used Available Use% Mounted on
none 202054928 4333016 187269304 3% /
tmpfs 1022788 0 1022788 0% /dev
tmpfs 1022788 0 1022788 0% /sys/fs/cgroup
/dev/vda1 202054928 4333016 187269304 3% /data
shm 65536 4 65532 1% /dev/shm
tmpfs 204560 284 204276 1% /run/docker.sock
I haven't found many options for this; the main issue on GitHub is https://github.com/docker/for-mac/issues/371
Some of the options suggested there are:
If you can remove all images/containers, you can follow these instructions:
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)
docker volume rm $(docker volume ls -q)
rm -rf ~/Library/Containers/com.docker.docker/Data/*
You can try to prune all unused images/containers but this has proven not very effective:
docker system prune
Use a larger template image: install qemu using Homebrew and move the image around; see this specific comment: https://github.com/docker/for-mac/issues/371#issuecomment-242047368. Note that you need at least 2x the space free to do this without losing containers/images.
See also: How do you get around the size limitation of Docker.qcow2 in the Docker for Mac?
And https://forums.docker.com/t/no-space-left-on-device-error/10894/26
I ran into the same issue, running docker system prune --volumes resolved the problem.
"Volumes are not pruned by default, and you must specify the --volumes flag for docker system prune to prune volumes."
See: https://docs.docker.com/config/pruning/#prune-everything
I ran into this recently with a Docker installation on Linux that uses the devicemapper storage driver (the default). There was indeed a Docker configuration setting I needed to change to fix this.
Docker images are made of read-only layers of filesystem snapshots, each layer created by a command in your Dockerfile, which are built on top of a common base storage snapshot. The base snapshot is shared by all your images and has a file system with a default size of 10GB. When you run your image you get a new writable layer on top of all the layers in the image, so you can add new files in your running container but it's still eventually based on the same base snapshot with the 10GB filesystem. This is at least true for devicemapper, not sure about other drivers. Here is the relevant documentation from docker.
To change this default value to something else, there's a daemon parameter you can set, e.g. docker daemon --storage-opt dm.basesize=100G. Since you probably don't run the daemon manually, you'll need to edit the Docker daemon options in a configuration file, depending on how you run the Docker daemon. With Docker for Mac you can edit the daemon parameters as JSON in the preferences under Daemon -> Advanced. You probably need to add something like this:
{
"storage-opts": ["dm.basesize=100G"]
}
(But like I said, I had this problem on Linux, so I didn't try the above.)
Anyway in order for this to take effect, you'll need to remove all your existing images (so that they're re-created on top of the new base snapshot with the new size). See storage driver options.
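On Linux, a sketch of the equivalent change is to put the option into /etc/docker/daemon.json and restart the daemon (assuming systemd and the devicemapper driver; careful, this overwrites any existing daemon.json):
echo '{ "storage-opts": ["dm.basesize=100G"] }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
# existing images have to be removed and re-pulled before the new base size applies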
I would like to run a Docker instance in RAM... totally inside RAM... using tmpfs.
Can it be done?
I'm not sure how Docker uses filesystems, as I'm too used to KVM and Xen; both of those need a default size set up before they can be used.
So how does "docker fs" work?
This can be done. If you mount /var/lib/docker on a tmpfs, Docker can use other storage backends, like OverlayFS, on top of it.
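A minimal sketch of that approach (the size and the systemd service name are assumptions, and anything already under /var/lib/docker is hidden while the tmpfs is mounted):
systemctl stop docker
mount -t tmpfs -o size=8g tmpfs /var/lib/docker
systemctl start docker    # the daemon now stores images and containers in RAM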
Docker uses what it calls a "Union File System", made up of multiple read-only layers with a copy-on-write layer on top (see http://docs.docker.com/terms/layer/). It can use one of several storage drivers for this (in order of preference): AUFS, BTRFS, devicemapper, overlayfs or VFS. Because of this, no, I don't think you will be able to use tmpfs.
More information at https://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
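To check which storage driver a given daemon is actually using (output will vary per installation):
docker info | grep -i 'storage driver'
# Storage Driver: aufs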
So far, I really only understand VOLUME as a way to
Specify a directory inside of a data container that will be persistent
Specify a location that will link to your host container
What I am failing to understand is why I see so many Dockerfiles that use VOLUME /path/to/app or, even worse, VOLUME /var/lib/mysql. I understand that you might want to create a container that has this volume and then use --volumes-from to link to that container for persistence. But why would you make that specification inside the container that is actually using that data? How does it help? As far as I can see, VOLUME /var/data is no different from just saying RUN mkdir /var/data. How are volumes beneficial when they are not inside a data container, shared with the host, or being used by other containers?
Docker images and Docker containers have a layered file system, which is slow. By defining directories as data volumes you instruct Docker to keep those directories outside of the slow layered file system. This has multiple consequences, among which:
fast file system
ability to share a volume between multiple containers
persistence (as long as at least one container that uses that volume exists)
This is why volumes are not only a convenience but a necessity for directories where good I/O performance is expected.
As far as I can see, VOLUME /var/data is not any different than just saying RUN mkdir /var/data.
The difference is that, with volumes, the directory /var/data is a mount point on a different (and faster) file system. You can see that /var/data is not just another directory by running the mount command in a container:
$ docker run --rm -v /var/data busybox mount
rootfs on / type rootfs (rw)
none on / type aufs (rw,relatime,si=6c354c296f850c3c)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,mode=755)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
/dev/mapper/vg0-root on /etc/resolv.conf type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /etc/hostname type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /etc/hosts type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /var/data type ext4 (rw,relatime,errors=remount-ro,data=ordered)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
tmpfs on /proc/kcore type tmpfs (rw,nosuid,mode=755)
/ is on an aufs layered (and slow) file system
/var/data is on an ext4 (and fast) file system
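To illustrate the sharing point: the anonymous volume created by -v /var/data (or by a VOLUME instruction) can be attached to a second container with --volumes-from. Container and image names below are just examples:
docker run -d --name datastore -v /var/data busybox sleep 3600
docker run --rm --volumes-from datastore busybox ls /var/data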