How to measure CPU/memory usage of "main" container from a sidecar container in k8s?

I would like to deploy a sidecar container that is measuring the memory usage (and potentially also CPU usage) of the main container in the pod and then send this data to an endpoint.
I was looking at cAdvisor, but Google Kubernetes Engine uses a hardcoded 10s measuring interval, and I need 1s granularity. Deploying another cAdvisor instance is an option, but I need those metrics only for a subset of pods, so it would be wasteful.
Is it possible to write a sidecar container that monitors the main container metrics? If so, what tools could the sidecar use to gather the data?

That one-second granularity will probably be the main showstopper for many monitoring tools. In theory you can script it on your own. You can use the Docker stats API and read the stats stream for the main container only. You will need to mount /var/run/docker.sock into the sidecar container. Curl example:
curl -N --unix-socket /var/run/docker.sock http://localhost/containers/<container-id>/stats
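The stream emits one JSON document per second, which matches the required granularity. A minimal sketch for pulling just the memory usage out of that stream (assuming jq is available in the sidecar image; <container-id> remains a placeholder):
curl -sN --unix-socket /var/run/docker.sock http://localhost/containers/<container-id>/stats | jq '.memory_stats.usage'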
Another option is to read metrics from cgroups, but you will need to do more calculations yourself in that case. Mounting the cgroup filesystem into the sidecar container will be required. See some examples of cgroup pseudo-files at https://docs.docker.com/config/containers/runmetrics/
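As an illustration, a sidecar polling loop over cgroup v1 pseudo-files could look like the sketch below. The paths are an assumption: they match the docker cgroup driver with cgroupfs mounted at /sys/fs/cgroup, and can differ under kubelet-managed cgroups.
CG=/sys/fs/cgroup
while true; do
  cat "$CG/memory/docker/<container-id>/memory.usage_in_bytes"   # memory usage, in bytes
  cat "$CG/cpuacct/docker/<container-id>/cpuacct.usage"          # cumulative CPU time in ns; diff two reads for usage
  sleep 1
done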

This could be done by sharing the process namespace for the Pod. The sidecar container is then able to see the processes of the main container (e.g. via ps) and can monitor their CPU/memory usage with standard Unix tools.
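A minimal sketch of such a Pod (the names and images here are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-monitor
spec:
  shareProcessNamespace: true
  containers:
  - name: main
    image: my-app:latest
  - name: monitor
    image: alpine:3
    # every process of the main container is now visible to ps here
    command: ["sh", "-c", "while true; do ps aux; sleep 1; done"]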
One tool could be node-exporter with the processes collector enabled; the sidecar can then be scraped by Prometheus.
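The processes collector is disabled by default; assuming the node_exporter binary is baked into the sidecar image, it can be enabled with a flag:
node_exporter --collector.processes --web.listen-address=:9100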

See topz, a simple utility that exposes the top command as a web interface.

You can use Prometheus and Grafana for memory and CPU usage monitoring. Both are open-source tools and can be used in production environments as well.

Related

docker swarm get cpu and memory via api

Can I get the CPU and memory utilization of all worker nodes via the Docker API, with the simplest possible dependencies, or ideally just the Docker API itself?
I need this for alerting (writing custom alerting code).

Jenkins in a container is much slower than on the server itself

We recently had our Jenkins redone. We decided to run the new version in a Docker container on the server.
While migrating, I noticed that Jenkins is MUCH slower when it runs in a container than when it ran on the server itself.
This is a major issue and could mess up our migration.
I tried looking for ways to give more resources to the container, without much success.
How can I speed up the Jenkins container / give it all the resources it needs on the server (the server is dedicated to Jenkins)?
Also, how do I divide these resources when I want to start up slave containers as well?
Disk operations
One thing that can be slow with Docker is when the process running in a container makes a lot of I/O calls to the container file system. The container file system is a union file system, which is not optimized for speed.
This is where Docker volumes are useful. In addition to providing a location on the file system that survives container deletion, disk performance on a Docker volume is good.
The Jenkins Docker image defines the JENKINS_HOME location as a docker volume, so as long as your Jenkins jobs are making their disk operations within that location you should be fine.
If you determine that disk access on that volume is still too slow, you could customize the mount location of that volume on your Docker host so that it ends up on a fast drive such as an SSD.
Another trick is to create a Docker volume backed by RAM with tmpfs. Note that such a volume does not offer persistence: data at that location will be lost when the container is stopped or deleted.
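Putting both tricks together, a hedged example based on the official jenkins/jenkins image (the volume name jenkins_home is arbitrary; /var/jenkins_home is the image's JENKINS_HOME):
docker run -d -p 8080:8080 -p 50000:50000 \
  -v jenkins_home:/var/jenkins_home \
  --tmpfs /tmp \
  jenkins/jenkins:lts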
JVM memory exhaustion / Garbage collector
As Jenkins is a Java application, another potential issue comes to mind: memory exhaustion. If the JVM that the Jenkins process runs on is too limited in memory, the Java garbage collector will run too frequently. You can spot this when your Java app is using a lot of CPU (the garbage collector uses CPU). If that is the case, give more memory to the JVM:
docker run -p 8080:8080 -p 50000:50000 --env JAVA_OPTS="-Xmx2048m -Djava.awt.headless=true" jenkins/jenkins:lts
Network
Docker containers have a virtual network stack and custom network settings. You also want to make sure that all network-related operations are fast.
The DNS server might be an issue; check it by executing ping <some domain name> from the Jenkins container.
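From the host, the same check can be run inside the container with docker exec (the container name jenkins is a placeholder, and this assumes ping is installed in the image):
docker exec jenkins ping -c 3 example.com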

Application (JMX) monitoring inside a Kubernetes cluster using Prometheus

I have multiple Java applications running inside containers, all managed by Kubernetes.
I am using Prometheus to monitor container-level metrics, i.e. CPU, memory, etc.
Now I want to do application-level monitoring using jmx_exporter, but with every deploy the container IP keeps changing.
Can I somehow use the Kubernetes service IP (cluster IP), which doesn't change?
I cannot just point at the service IP directly, as it load-balances among the containers, so every scrape would return metrics from only one container instead of all of them.
Or is there a way in Prometheus to dynamically discover containers by service name or replication controller name?
Yes, you can scrape pods in Kubernetes.
You can find an example of how to do that here
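For illustration, a minimal pod-discovery scrape configuration might look like this (a sketch; the job name and the prometheus.io/scrape annotation convention are assumptions, not taken from the linked example):
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"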

Docker telemetry and performance monitoring

What will telemetry and monitoring tools show if I launch them in (2 options)
a docker container
the host system
Will they show CPU/memory etc. usage of the container only, or of the host system?
What are the best practices? Monitoring software in each container, or on the host system?
What you want to do is monitor both the host(s) and the containers running on them. A good way to do that is to run a container that collects all data on each Docker host. That is how Sematext Docker Agent runs, for example: a tiny container on each Docker host that collects all host and container metrics, events, and logs. It then parses logs, can route them, blacklist/whitelist them, auto-discover new containers, and so on. In the end logs end up in Logsene and metrics and events end up in SPM, which gives you a single pane of glass sort of view into all your Docker ops bits, with alerting, anomaly detection, correlation, and so on. I hope this helps and points you in the right direction.
The results should be exactly the same, because Docker containers share the host's kernel and resources (unlike virtual machines).
Putting an agent in your containers is not advisable, not just for performance reasons, but it is an anti-pattern in the Docker world, where each container should run a single process. Better is to run a monitoring agent on the host or in a separate container that can be configured to extract metrics from the other containers. This is the way we work at CoScale. If you are interested, have a look at our solution for monitoring Docker.

Working monitoring solution for Docker Containers and Swarm?

I'm looking for a monitoring solution for a web application, deployed as a Swarm of Docker containers spread across 7-10 VMs. High-level requirements are:
Configurable Web and REST interface to performance dashboard
General performance metrics on VM levels (CPU/Memory/IO)
Alerts when containers and/or VMs are going offline/restart
Possibility to drill down into containers process activity when needed
Host OSes are CoreOS and Ubuntu
Any recommendations/best practices here?
NOTE: external Kibana installation is being used to collect application logs from Logstash agents deployed on VMs.
Based on your requirements, it sounds like Sematext Docker Agent would be a good fit. It runs as a tiny container on each Docker host and collects all host+containers metrics, events, and logs. It can parse logs, route them, blacklist/whitelist them, has container auto-discovery, and so on. In the end logs end up in Logsene and metrics and events end up in SPM, which gives you a single pane of glass sort of view into all your Docker ops bits, with alerting, anomaly detection, correlation, and so on.
I am currently evaluating bosun with scollector + cAdvisor support. Looks OK so far.
Edit:
It should meet all the listed requirements and a little bit more. :)
Take a look at the Axibase Time Series Database / Google cAdvisor / collectd stack.
Disclosure: I work for the company that develops ATSD.
Deploy one cAdvisor container per VM to collect Docker container statistics; a sample run command is shown after these steps. The cAdvisor front-end allows you to view top container processes.
Deploy one ATSD container to ingest data from the multiple cAdvisor instances.
Deploy a collectd daemon on each VM to collect host statistics, and configure the collectd daemons to stream data into ATSD using the write_atsd plugin.
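For reference, a typical cAdvisor run command looks like this (a sketch following the standard cAdvisor documentation; the exact mounts can vary by host OS):
docker run -d --name=cadvisor -p 8080:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  google/cadvisor:latest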
Dashboards: sample host and container dashboards (screenshots omitted).
API / SQL:
https://github.com/axibase/atsd/tree/master/api#api-categories
Alerts:
ATSD comes with a built-in rule engine. You can configure a rule to watch for containers that stop collecting data and trigger an email or a system command.
