CPU load of JSON export within Docker container - docker

I have a Couchbase instance running as a container, and I'm thinking of exporting a bucket from it with cbexport.
The thing is that the bucket has 3M documents and is roughly 2 GB in size, and I'm not sure whether the export will be easy on CPU and RAM. Is that too much for the system to handle?
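For reference, a rough sketch of the export I have in mind, assuming a container named my-couchbase and placeholder credentials and bucket names (cbexport ships with the server image, typically under /opt/couchbase/bin):

    docker exec my-couchbase /opt/couchbase/bin/cbexport json \
      -c couchbase://127.0.0.1 \
      -u Administrator -p password \
      -b my-bucket \
      -f lines \
      -o /tmp/my-bucket-export.jsonl \
      -t 4                      # number of export threads; the main CPU knob

    # Copy the result out of the container:
    docker cp my-couchbase:/tmp/my-bucket-export.jsonl .

As far as I can tell, the -t (threads) flag is the main way to throttle how hard the export hits the CPU.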

Related

Docker Container shows different CPU Usage with different tools

I am building a project inside a Docker container that was created without any resource limits. When I monitor it, I see different results for CPU usage:
from ctop
from Grafana (Full Node Exporter chart)
and from cAdvisor
I do not understand why the results are different, especially for the ctop command.
But my main question is: does Docker really use all the CPUs? This machine has 16 vCPUs and 16 GB of RAM.
It's not entirely clear from the node exporter which instance or container you are monitoring, but it looks like the node exporter is showing total machine CPU usage on a 0-100 scale, while ctop reports 100% per vCPU.
Also try docker stats: it shows resource usage for every running container, from CPU to network and disk I/O. With it, each vCPU counts as 100%, so your total can reach 1600% on 16 vCPUs.
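For example, a one-shot snapshot with a custom output format (container names are whatever you happen to be running):

    docker stats --no-stream \
      --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}"
    # With 16 vCPUs, the CPU column can legitimately read anywhere up to 1600%.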
Regarding the cAdvisor output: it doesn't cover the same time range as the Grafana node exporter chart, so it's hard to draw a firm conclusion, but it appears that, like ctop and docker stats, it reports per core, only in units of 'cores' rather than percentages.

GlusterFS storage for docker images

I have a Docker swarm with 10 worker nodes, and I'm running into problems with Docker image storage (in the thin pool). It keeps filling up because the disks are fairly small (30-60 GB).
The error:
Thin Pool has 7330 free data blocks which is less than minimum required 8455 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
Because of that, the cleaning strategy has to be aggressive: all images are deleted three times a day. This aggressive cleaning results in broken pulls (when cleanup runs while someone is pulling an image), and developers can't benefit from cached images; instead they have to re-download images that the cleaning job just deleted.
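For context, the cleanup is essentially a cron job along these lines (the schedule is illustrative):

    # Remove every image not used by a running container, three times a day:
    0 2,10,18 * * *  docker image prune -a -f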
However, there is the option of using GlusterFS storage, and I want to mount GlusterFS volumes on each Docker node and use them for the thin pool for Docker images and /var/lib/docker.
I'm looking for a guide on how to do exactly that. Has anyone tried it?
P.S. I did research shared storage for Docker images across multiple Docker nodes, and it seems that isn't possible, as stated here. However, mounting a separate volume on each Docker node should be possible.
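For what it's worth, a hedged sketch of the per-node setup I have in mind (hostnames and volume names are placeholders, and whether a given storage driver supports a GlusterFS-backed /var/lib/docker is something I'd still need to verify):

    # On each worker node:
    sudo systemctl stop docker
    sudo mount -t glusterfs gluster1:/docker-data /var/lib/docker
    sudo systemctl start docker

    # Or persist the mount in /etc/fstab:
    # gluster1:/docker-data  /var/lib/docker  glusterfs  defaults,_netdev  0 0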

GCP: how to access cloud storage bucket from a VM instance

I'm trying to deploy and run a docker image in a GCP VM instance.
I need it to access a certain Cloud Storage Bucket (read and write).
How do I mount a bucket inside the VM? And how do I mount a bucket inside the Docker container running on that VM?
I've been reading the Google Cloud documentation for a while, but I'm still confused. All the guides show how to access a bucket from a local machine, not how to mount it on a VM.
https://cloud.google.com/storage/docs/quickstart-gsutil
I found something about FUSE, but it looks overly complicated for just mounting a single bucket into the VM's filesystem.
Google Cloud Storage is an object storage API; it is not a filesystem. As a result, it isn't really designed to be "mounted" inside a VM. It is designed to be highly durable and to scale to extraordinarily large objects (and large numbers of objects).
Though you can use gcsfuse to mount it as a filesystem, that approach has pretty significant drawbacks. For example, operations that are trivial on a normal filesystem can be expensive in terms of operation count against GCS.
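If you do accept those drawbacks, the basic gcsfuse flow looks roughly like this (bucket, mount point, and image names are placeholders):

    # On the VM, after installing gcsfuse (see the gcsfuse docs):
    mkdir -p /mnt/my-bucket
    gcsfuse my-bucket /mnt/my-bucket

    # Expose the mount to a container as a bind mount:
    docker run -v /mnt/my-bucket:/data my-image

    # Unmount when done:
    fusermount -u /mnt/my-bucket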
Likewise, there are many surprising behaviors that are a result of the fact that it is an object store. For example, you can't edit objects -- they are immutable. To give the illusion of writing to the middle of an object, the object is, in effect, deleted and recreated whenever a call to close() or fsync() happens.
The best way to use GCS is to design your application to use the API (or the S3-compatible API) directly. That way the semantics are well understood by the application, and you can optimize for them to get better performance and keep costs under control. To access it from your Docker container, make sure the container has a way to authenticate to GCS (either through the credentials on the instance, or by deploying a key for a service account with the necessary permissions on the bucket), then have the application call the API directly.
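As a minimal sketch of the service-account route (key path, bucket, and file names are placeholders, and it assumes the Cloud SDK/gsutil is available in the image; on a VM with an attached service account, gsutil will usually pick up the instance credentials without any key file):

    # Authenticate with a deployed service-account key
    # (skip this if the instance's own credentials are enough):
    gcloud auth activate-service-account --key-file=/secrets/sa-key.json

    # Read and write objects through the API via gsutil:
    gsutil cp gs://my-bucket/input/config.json /tmp/config.json
    gsutil cp /tmp/results.json gs://my-bucket/output/results.json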
Finally, if what you need is actually a filesystem, and not specifically GCS, Google Cloud does offer at least 2 other options if you need a large mountable filesystem that is designed for that specific use case:
Persistent Disk is the baseline storage you get with a VM, and you can attach many of these devices to a single VM. However, you can't mount them read/write on multiple VMs at once; if a disk needs to be mounted on multiple VMs, it must be read-only for all the instances it is attached to.
Cloud Filestore is a managed service that provides an NFS server in front of a persistent disk, so the filesystem can be mounted read/write and shared across many VMs. However, it is significantly more expensive than PD (as of this writing, about $0.20/GB/month vs. $0.04/GB/month in us-central1), and there is a minimum size requirement (1 TB).
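For reference, mounting a Filestore share is a plain NFS mount (the IP address and share name below are placeholders for a hypothetical instance):

    # On each VM that needs the shared filesystem:
    sudo apt-get install -y nfs-common
    sudo mkdir -p /mnt/filestore
    sudo mount -t nfs 10.0.0.2:/share1 /mnt/filestore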
Google Cloud Storage buckets cannot be mounted in Compute Engine instances or containers without third-party software such as FUSE. Neither Linux nor Windows has a built-in driver for Cloud Storage.
Compute Engine VM images come with the Google Cloud SDK installed, so even without mounting the bucket you can copy files in and out using those commands:
gsutil ls gs://
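If you need to move whole directories rather than single objects, gsutil can also sync them (bucket and paths are placeholders):

    gsutil rsync -r /tmp/output gs://my-bucket/output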

Running Docker in Memory?

As far as I understand, Docker uses memory-mapped files to start from an image. Since I can do this over and over again and, as far as I remember, start several instances of the same image in parallel, I assume Docker abstracts the filesystem and stores the changes somewhere else.
I wonder whether Docker can be configured (or does so by default) to run in a memory-only mode, without any sort of temporary file?
Docker uses a union/copy-on-write filesystem that lets it work in "layers" (devicemapper, BTRFS, etc.). Copy-on-write makes starting new containers cheap; the first write inside a container is what actually creates a new layer.
When you start a container from an image, you are not using memory-mapped files to restore a frozen process (unless you built all of that into the image yourself...). Rather, you're starting a normal Unix process but inside a sandbox where it can only see its own unionfs filesystem.
Starting many copies of an image where no copy writes to disk is generally cheap and fast. But if you have a process with a long start-up time, you'll still pay that cost for every instance.
As for running Docker containers wholly in memory, you could create a RAM disk and use it as Docker's storage location (configurable, but typically /var/lib/docker).
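A minimal sketch, with an arbitrary size (note that everything under /var/lib/docker, including images and containers, disappears when the tmpfs does):

    sudo systemctl stop docker
    sudo mount -t tmpfs -o size=8g tmpfs /var/lib/docker
    sudo systemctl start docker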
In typical use cases I would not expect this to be a useful performance tweak. First, you'll spend a lot of memory holding files you won't access: the base layer of an image contains most of a Linux system's files, and if you pull 10 images from Docker Hub you can easily end up with 20 GB worth of image data (after which the storage cost tends to plateau). Second, the system already manages memory and swapping pretty well, and everything running inside a container benefits from that, which makes a RAM disk a questionable tweak to begin with. Third, for most of the cases where a RAM disk might actually help, you can use the -v flag to mount the RAM disk as a volume in the container rather than putting the whole union filesystem on it.
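As a sketch of that last point (paths, sizes, and image names are placeholders), you can give just the write-heavy path an in-memory mount instead of moving the whole storage driver:

    # Per-container tmpfs mount:
    docker run --tmpfs /app/cache:rw,size=256m my-image

    # Or bind-mount a host RAM disk as a volume:
    sudo mount -t tmpfs -o size=1g tmpfs /mnt/ramdisk
    docker run -v /mnt/ramdisk:/scratch my-image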

Docker AUFS... underlying fs... optimization by mount option?

So I understand that Docker uses /var/lib/docker/ to store every container and image, right?
Does that mean the only optimization I can do for my containers is to optimize the underlying filesystem that /var/lib/docker/ sits on?
In that sense, can I assume I should be optimizing the mount options of that underlying filesystem, e.g. ext4 with noatime, nodiratime, etc.?
Also, can I use a separate mount for the /var/lib/docker/ folder? Are there any limitations or optimization settings to consider for the underlying disk Docker sits on?
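For context, the kind of setup I have in mind is a dedicated filesystem for /var/lib/docker/, something like this fstab sketch (the UUID is just a placeholder):

    # /etc/fstab - dedicated ext4 filesystem for Docker's storage area;
    # noatime/nodiratime skip access-time updates on reads.
    UUID=xxxx-xxxx  /var/lib/docker  ext4  defaults,noatime,nodiratime  0  2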
I would rather squash my images:
https://github.com/jwilder/docker-squash
and prefer Debian as the base image, as suggested here:
https://docker.cn/p/6-dockerfile-tips-official-images-en
Excerpt:
The main advantage of the Debian image is the smaller size – it clocks in at around 85.1 MB compared to around 200 MB for Ubuntu.
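For reference, a hedged sketch of the docker-squash workflow as I recall it from the project's README (image tags are placeholders; check the README for the exact invocation):

    docker save my-image:latest | sudo docker-squash -t my-image:squashed | docker load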
