Container disk usage in DataDog - docker

Is there any way to monitor disk usage of docker containers in DataDog?
I can see all the CPU, RAM and IO metrics for my containers in the DataDog web UI.
But I can't see any disk space related metrics.
Their page https://docs.datadoghq.com/integrations/docker/ mentions:
docker.disk.used (now reported as docker.data.used)
docker.disk.free (now reported as docker.data.free)
docker.disk.total (now reported as docker.data.total)
I can't find these either in Dashboards > Docker or in Metrics > Explorer.
I'm new to DataDog, so possibly missing something obvious here.

There are 2 relevant options in /etc/dd-agent/conf.d/docker_daemon.yaml:
collect_disk_stats
If you use devicemapper-backed storage (the default in ECS, but not in vanilla Docker or Kubernetes), the docker.data.* and docker.metadata.* statistics should give you what you are looking for.
collect_container_size
A more generic way that uses the Docker API but effectively runs df in every container. This enables the docker.container.* metrics.
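A minimal sketch of enabling both options, assuming the agent v5 layout from the question (the option names come from the example config linked below; adapt this to your existing file rather than overwriting it blindly):

# sketch only: write a minimal docker_daemon.yaml with both options enabled,
# then restart the agent (v5 file paths and service name assumed)
sudo tee /etc/dd-agent/conf.d/docker_daemon.yaml <<'EOF'
init_config:

instances:
  - url: "unix://var/run/docker.sock"
    collect_disk_stats: true       # devicemapper-backed docker.data.* / docker.metadata.* metrics
    collect_container_size: true   # per-container docker.container.* size metrics (df-like scan)
EOF
sudo service datadog-agent restart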
See more here:
https://help.datadoghq.com/hc/en-us/articles/115001786703-How-to-report-host-disk-metrics-when-dd-agent-runs-in-a-docker-container-
and here:
https://github.com/DataDog/docker-dd-agent/blob/master/conf.d/docker_daemon.yaml#L46

Related

Is it possible to run a large number of docker containers?

A bit of background first. I am building a small service (a website) where users are provided with various tools that run according to parameters the users specify themselves. In my implementation the tools are one big script that runs in Docker, so my service ends up launching a new Docker container for each user.
I was thinking about using AWS Fargate or Google Cloud Run, or any other service that makes it possible to run a Docker container.
But I'm curious: what if there are 1,000 or 10,000 users, each with their own Docker container, is that good? Do the services (AWS, Google Cloud) have any restrictions, or is this a bad design?
Based on my understanding, you are suggesting instantiating a Docker container for each of your users. I think there are a couple of issues with this:
Scale: depending on how many users you have, you get into the realm of too many containers (each container consumes resources; not just memory and CPU, but also ports, risking TCP/IP pool exhaustion).
Isolation: keep in mind that containers are not VMs.

Docker Container shows different CPU Usage with different tools

I am building a project inside a Docker container that was created without any resource limits. When I monitor it, I see different results for CPU usage:
from ctop, from Grafana (the Node Exporter Full chart), and from cAdvisor.
I do not understand why the results are different, especially with the ctop command.
But my main question is: does Docker really use all CPUs? This machine has 16 vCPUs and 16 GB RAM.
It's not entirely clear from the node exporter chart which instance or container you are monitoring, but it seems the node exporter is showing total machine CPU usage on a 0-100% scale, while ctop reports 100% per vCPU.
Also try docker stats; it shows resource usage for all running containers, from CPU to network and disk I/O. There, each vCPU counts as 100%, so the total for 16 vCPUs is 1600%.
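For example (a quick snapshot; the format fields are docker stats' built-in template columns):

# one-shot snapshot of every running container's resource usage
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}"
# CPUPerc is per-vCPU, so on a 16 vCPU host a fully busy container can read up to 1600%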
Regarding the cAdvisor output: it doesn't cover the same time range as the Grafana node exporter chart, so it's hard to draw a firm conclusion, but it seems that, like ctop and docker stats, it reports on a per-core basis, except that instead of a percentage it uses 'cores' as the unit of measurement.

Google Cloud Platform: how to monitor memory usage of VM instances

I have recently performed a migration to Google Cloud Platform, and I really like it.
However, I can't find a way to monitor the memory usage of the Dataproc VM instances. As you can see in the attached screenshot, the console provides utilization info about CPU, disk and network, but not about memory.
Without knowing how much memory is being used, how is it possible to understand if there is a need of extra memory?
By installing the Stackdriver agent in GCE VMs, additional metrics like memory can be monitored. Stackdriver also offers alerting and notification features. Note, however, that agent metrics are only available for premium-tier accounts.
See this answer for Dataproc VMs.
At the moment the Stackdriver agent only supports RAM monitoring for the E2 machine family; other instance types such as N1, N2, ... are not supported.
See the latest documentation on what is supported: https://cloud.google.com/monitoring/api/metrics_gcp#gcp-compute
You can use the /proc/meminfo virtual file system to get information on current memory usage. A simple bash script can read the memory usage from /proc/meminfo, run periodically as a cron job, and send an alert email if the usage exceeds a given threshold; a sketch follows the link below.
See this link: https://pakjiddat.netlify.app/posts/monitoring-cpu-and-memory-usage-on-linux
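A minimal sketch of such a script (the threshold, recipient address and use of mailx are assumptions; adjust to taste):

#!/bin/bash
# check_mem.sh - email an alert when memory usage crosses a threshold, based on /proc/meminfo
THRESHOLD=90                                                  # percent used that triggers an alert (assumed value)
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
used_pct=$(( (total_kb - avail_kb) * 100 / total_kb ))
if [ "$used_pct" -ge "$THRESHOLD" ]; then
    echo "Memory usage is ${used_pct}% on $(hostname)" | mailx -s "Memory alert" admin@example.com   # assumes mailx is configured
fi
# example crontab entry to run it every 5 minutes:
# */5 * * * * /usr/local/bin/check_mem.sh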
The most up-to-date answer here.
How to see memory usage in GCP?
Install the agent on your virtual machine. Takes less than 5 minutes.
curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
sudo bash add-monitoring-agent-repo.sh
sudo apt-get update
sudo apt-get install stackdriver-agent
The code snippet should install the most recent version of the agent, but for an up-to-date guide you can always refer to
https://cloud.google.com/monitoring/agent/installation#joint-install.
After it's installed, within a minute or two you should see the additional metrics in the Monitoring section of GCP:
https://console.cloud.google.com/monitoring
Explanation: why is it invisible by default?
Metrics such as CPU usage or memory usage can be collected in different places. For instance, CPU usage is a piece of information that the host (the machine whose virtualization software runs your virtual machine) can collect.
The thing with memory usage and virtual machines is that it is managed by the guest operating system (the operating system of your virtual machine). The host cannot really know how much is used; all it sees in the memory given to that virtual machine is a stream of bytes.
That's why the idea is to install an agent inside the virtual machine that collects the metrics from within and ships them somewhere they can be interpreted. There are many agents available, but Google promotes its own - the Monitoring Agent - which integrates well with the entire GCP suite.
The agent metrics page may be useful:
https://cloud.google.com/monitoring/api/metrics_agent
You'll need to install stackdriver. See: https://app.google.stackdriver.com/?project="your project name"
The stackdriver metrics page will provide some guidance. You will need to change the "project name" (e.g. sinuous-dog-133823) to suit your account:
https://app.google.stackdriver.com/metrics-explorer?project=sinuous-dog-133823&timeSelection={"timeRange":"6h"}&xyChart={"dataSets":[{"timeSeriesFilter":{"filter":"metric.type=\"agent.googleapis.com/memory/bytes_used\" resource.type=\"gce_instance\"","perSeriesAligner":"ALIGN_MEAN","crossSeriesReducer":"REDUCE_NONE","secondaryCrossSeriesReducer":"REDUCE_NONE","minAlignmentPeriod":"60s","groupByFields":[],"unitOverride":"By"},"targetAxis":"Y1","plotType":"LINE"}],"options":{"mode":"COLOR"},"constantLines":[],"timeshiftDuration":"0s","y1Axis":{"label":"y1Axis","scale":"LINEAR"}}&isAutoRefresh=true
This REST call will get you the memory usage (the metric in the filter is agent.googleapis.com/memory/bytes_used). You will need to modify the project name (e.g. sinuous-dog-133823) and other parameters to suit your needs.
GET /v3/projects/sinuous-cat-233823/timeSeries?filter=metric.type="agent.googleapis.com/memory/bytes_used" resource.type="gce_instance"& aggregation.crossSeriesReducer=REDUCE_NONE& aggregation.alignmentPeriod=+60s& aggregation.perSeriesAligner=ALIGN_MEAN& secondaryAggregation.crossSeriesReducer=REDUCE_NONE& interval.startTime=2019-03-06T20:40:00Z& interval.endTime=2019-03-07T02:51:00Z& $unique=gc673 HTTP/1.1
Host: content-monitoring.googleapis.com
authorization: Bearer <your token>
cache-control: no-cache
Postman-Token: 039cabab-356e-4ee4-99c4-d9f4685a7bb2
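If you prefer curl over Postman, roughly the same call can be made as below (a sketch: the project ID and time range are the example values from above, the filter is URL-encoded, and gcloud is assumed to be available for the access token):

curl -s \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/sinuous-cat-233823/timeSeries?filter=metric.type%3D%22agent.googleapis.com%2Fmemory%2Fbytes_used%22%20resource.type%3D%22gce_instance%22&aggregation.alignmentPeriod=60s&aggregation.perSeriesAligner=ALIGN_MEAN&interval.startTime=2019-03-06T20:40:00Z&interval.endTime=2019-03-07T02:51:00Z"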
VM memory metrics are not available by default; they require the Cloud Monitoring agent.
The UI you are showing is Dataproc, which already has the agent installed but disabled by default, so you don't have to reinstall it. To enable the Cloud Monitoring agent for Dataproc clusters, set --properties dataproc:dataproc.monitoring.stackdriver.enable=true when creating the cluster. Then you can monitor VM memory and create alerts in the Cloud Monitoring UI (not yet integrated with the Dataproc UI).
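For example, when creating the cluster (the cluster name and region are placeholders):

gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties=dataproc:dataproc.monitoring.stackdriver.enable=true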
Also see this related question: Dataproc VM memory and local disk usage metrics
This article is now out of date: Stackdriver is a legacy agent and has been replaced by the Ops Agent. Please read the latest GCP documentation about migrating to the Ops Agent.
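At the time of writing, the documented Ops Agent install looks roughly like this (check the Ops Agent installation page for the current script):

curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install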

How to limit Docker filesystem space available to container(s)

The general scenario is that we have a cluster of servers and we want to set up virtual clusters on top of that using Docker.
For that we have created Dockerfiles for different services (Hadoop, Spark etc.).
Regarding the Hadoop HDFS service, however, we have the situation that the disk space available to the Docker containers equals the disk space available to the server. We want to limit the available disk space on a per-container basis so that we can dynamically spawn an additional datanode with some storage size to contribute to the HDFS filesystem.
We had the idea of using loopback files formatted with ext4 and mounting them on directories that we use as volumes in the Docker containers. However, this implies a large performance loss.
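A rough sketch of that loopback idea (the file path, size, mount point and image name are illustrative):

# create a sparse 20 GB file, format it as ext4 and mount it via a loop device
dd if=/dev/zero of=/data/dn1.img bs=1M count=0 seek=20480
mkfs.ext4 -F /data/dn1.img
mkdir -p /mnt/dn1
mount -o loop /data/dn1.img /mnt/dn1
# hand the capped mount to a datanode container as a volume (hypothetical image name)
docker run -d -v /mnt/dn1:/hadoop/dfs/data hadoop-datanode-image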
I found another question on SO (Limit disk size and bandwidth of a Docker container), but the answers are almost 1.5 years old, which - given the speed of Docker development - is ancient.
Which approach or storage backend would allow us to:
limit storage on a per-container basis,
achieve near bare-metal performance,
and avoid repartitioning the server drives?
You can specify runtime constraints on memory and CPU, but not disk space.
The ability to set constraints on disk space has been requested (issue 12462, issue 3804), but isn't yet implemented, as it depends on the underlying filesystem driver.
This feature is going to be added at some point, but not right away. It's a bit more difficult to add this functionality right now because a lot of chunks of code are moving from one place to another. After this work is done, it should be much easier to implement this functionality.
Please keep in mind that quota support can't be added as a hack to devicemapper, it has to be implemented for as many storage backends as possible, so it has to be implemented in a way which makes it easy to add quota support for other storage backends.
Update August 2016: as shown below, and in an issue 3804 comment, PR 24771 and PR 24807 have since been merged. docker run now allows setting storage driver options per container:
$ docker run -it --storage-opt size=120G fedora /bin/bash
This (size) allows setting the container rootfs size to 120G at creation time.
This option is only available for the devicemapper, btrfs, overlay2, windowsfilter and zfs graph drivers.
Documentation: docker run/#Set storage driver options per container.
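If you want a daemon-wide default instead of a per-run flag, overlay2 also has a size storage option; a hedged sketch (it requires the backing filesystem to be xfs mounted with the pquota option, so check the dockerd storage driver documentation for your setup):

# /etc/docker/daemon.json - give every new container a 20G rootfs quota by default
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.size=20G"]
}
EOF
sudo systemctl restart docker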

Is there a formula for calculating the overhead of a Docker container?

Supposed I want to run several Docker containers at the same time.
Is there any formula I can use to find out in advance on how many containers can be run at the same time by a single Docker host? I.e., how much CPU, memory & co. do I have to take into account for the containers themselves?
It's not a formula per se, but you can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup.
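For example, on a cgroup v1 host with the default cgroupfs layout (paths differ under cgroup v2 or the systemd cgroup driver):

# pick a running container and read its memory/CPU accounting straight from the cgroup tree
CID=$(docker ps -q | head -n 1)
FULL_ID=$(docker inspect --format '{{.Id}}' "$CID")
cat /sys/fs/cgroup/memory/docker/"$FULL_ID"/memory.usage_in_bytes   # current memory usage in bytes
cat /sys/fs/cgroup/cpuacct/docker/"$FULL_ID"/cpuacct.usage          # cumulative CPU time in nanoseconds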
Links
See this excellent post by Jérôme Petazzoni of Docker, Inc on the subject.
See also Google's cAdvisor tool to view container resource usage.
This IBM research paper documents that Docker performance is higher than KVM in every measurement.
docker stats is also useful for getting a rough idea of how much CPU and memory your containers use.
cAdvisor will provide resource usage and other interesting stats about all containers on a host. We have a preliminary setup for root usage, but we are adding a lot more this week.
