Can I get the CPU and memory utilization of all worker nodes via the Docker API, with the simplest possible dependency or just the Docker API itself?
I need this for alerting (I am writing custom alerting code).
Sorry if this is a dumb question, but I'm quite new to Docker.
I understand that if the --memory parameter is set and the container uses all of that memory, Docker will kill the container.
I wonder if it's possible to create a new container (without killing the previous one) when the container reaches a certain memory limit defined by me.
Docker does not have built-in service scaling.
Most implementations I've seen that do this for Docker use:
Prometheus, a monitoring server that can scrape Docker container metrics.
Alertmanager, a server that handles the alerts Prometheus fires when metric thresholds are reached and routes them to receivers such as email or webhooks.
A custom piece of code using the Docker Go SDK that increases or decreases the number of service replicas in response to those alerts.
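As a rough sketch of that last piece, the scaling decision can be kept separate from the call that applies it. The example below uses the Docker SDK for Python rather than the Go SDK mentioned above, purely for brevity. The thresholds, the replica bounds, and the function names are all assumptions made for illustration, not values from any particular setup.

```python
def desired_replicas(current, cpu_percent, scale_up_at=80.0, scale_down_at=20.0,
                     min_replicas=1, max_replicas=10):
    """Return the replica count to request for one averaged CPU reading.

    All thresholds and bounds are illustrative defaults (assumptions).
    """
    if cpu_percent >= scale_up_at:
        target = current + 1
    elif cpu_percent <= scale_down_at:
        target = current - 1
    else:
        target = current
    # Clamp so the service never drops below min or grows past max.
    return max(min_replicas, min(max_replicas, target))


def apply_scale(service_name, replicas):
    """Ask the Swarm manager to resize a service.

    Requires the `docker` package and access to a manager node.
    """
    import docker  # imported lazily so the decision logic has no dependencies

    client = docker.from_env()
    client.services.get(service_name).scale(replicas)
```

A webhook receiver for Alertmanager could call desired_replicas with the metric value carried in the alert and then pass the result to apply_scale. Stepping by one replica at a time is a deliberately conservative choice; it avoids overshooting when a single reading is noisy.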
A single Docker container works well with a small number of parallel processes, but when we increase the number of parallel processes to 20-30, process execution slows down. Even then, Docker is using only 30-40% of the CPU.
I have tried the following to make Docker use the CPU fully and keep the processes from slowing down:
I have explicitly allocated CPU and RAM to the Docker container.
I have also increased the number of file descriptors, the number of processes, and the stack size using ulimit.
Even after doing these two things, the container still does not use the CPU properly. I am using docker exec to start multiple processes in a single running container. Is there an efficient way to use a single Docker container for executing multiple processes, or to make the container use 100% of the CPU?
The configuration I am using is:
Server - AWS EC2 t2.2xlarge (8 cores, 32 GB RAM)
Docker version - 18.09.7
OS - Ubuntu 18.04
When you run something on a machine, it consumes the following resources: 1. CPU 2. RAM 3. disk I/O 4. network bandwidth. If your container is exhausting any one of these resources, the others can still appear to be available, which would explain slow processes alongside low CPU usage. So monitor your system metrics to find the root cause.
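One way to check whether the CPU is idle because it is waiting on disk is to compare two samples of the aggregate "cpu" line in /proc/stat on the host. The sketch below is a minimal example under that assumption (a Linux host with the standard /proc/stat layout); the function names are made up for illustration.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def cpu_breakdown(before, after):
    """Return (busy_percent, iowait_percent) between two samples.

    Field order in /proc/stat: user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice.
    """
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only user..steal; guest time is already counted inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0, 0.0
    idle, iowait = deltas[3], deltas[4]
    busy = total - idle - iowait
    return 100.0 * busy / total, 100.0 * iowait / total


def read_cpu_sample(path="/proc/stat"):
    """Read the current aggregate counters from the host."""
    with open(path) as handle:
        return parse_cpu_line(handle.readline())
```

Calling read_cpu_sample twice a few seconds apart and passing both results to cpu_breakdown gives the split. A low busy percentage together with a high iowait percentage would point at disk I/O rather than CPU as the limiting resource.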
We recently had our Jenkins redone. We decided to run the new version in a Docker container on the server.
While migrating, I noticed that Jenkins is MUCH slower when it's in a container than when it ran on the server itself.
This is a major issue and could mess up our migration.
I tried looking for ways to give more resources to the container, without much success.
How can I speed up the Jenkins container and give it all the resources it needs on the server (the server is dedicated to Jenkins)?
Also, how do I divide these resources when I want to start up slave containers as well?
Disk operations
One thing that can be slow with Docker is a process in a container making a lot of I/O calls to the container file system. The container file system is a union file system, which is not optimized for speed.
This is where Docker volumes are useful. In addition to providing a location on the file system that survives container deletion, a Docker volume offers good disk performance.
The Jenkins Docker image defines the JENKINS_HOME location as a docker volume, so as long as your Jenkins jobs are making their disk operations within that location you should be fine.
If you determine that disk access on that volume is still too slow, you could customize the mount location of that volume on your Docker host so that it ends up on a fast drive such as an SSD.
Another trick is to make a docker volume mounted to RAM with tmpfs. Note that such a volume does not offer persistence and that data at that location will be lost when the container is stopped or deleted.
JVM memory exhaustion / Garbage collector
As Jenkins is a Java application, another potential issue comes to mind: memory exhaustion. If the JVM that runs the Jenkins process has too little memory, the Java garbage collector will run too frequently. You can spot this when your Java app is using too much CPU (the garbage collector uses CPU). If that is the case, give more memory to the JVM:
docker run -p 8080:8080 -p 50000:50000 --env JAVA_OPTS="-Xmx2048m -Djava.awt.headless=true" jenkins/jenkins:lts
Network
Docker containers have a virtual network stack and custom network settings. You also want to make sure that all network-related operations are fast.
The DNS server might be an issue; check it by executing ping <some domain name> from the Jenkins container.
I would like to deploy a sidecar container that is measuring the memory usage (and potentially also CPU usage) of the main container in the pod and then send this data to an endpoint.
I was looking at cAdvisor, but Google Kubernetes Engine has a hardcoded 10s measurement interval, and I need 1s granularity. Deploying another cAdvisor is an option, but I need those metrics only for a subset of pods, so it would be wasteful.
Is it possible to write a sidecar container that monitors the main container metrics? If so, what tools could the sidecar use to gather the data?
That one-second granularity will probably be the main showstopper for many monitoring tools. In theory you can script it on your own. You can use the Docker stats API and read the stats stream only for the main container. You will need to mount /var/run/docker.sock into the sidecar container. Curl example:
curl -N --unix-socket /var/run/docker.sock http://localhost/containers/<container-id>/stats
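Each line of that stream is a JSON object, and the usual way to turn it into a CPU percentage is to compare cpu_stats against precpu_stats within the same sample. The sketch below assumes the field names used by the Docker Engine stats endpoint; the function name and the returned dictionary keys are made up for illustration. Memory is reported as raw usage against the limit, without subtracting page cache.

```python
import json


def summarize_stats(sample):
    """Reduce one Docker stats sample (a dict) to CPU and memory figures."""
    cpu = sample.get("cpu_stats", {})
    pre = sample.get("precpu_stats", {})
    cpu_usage = cpu.get("cpu_usage", {})
    pre_usage = pre.get("cpu_usage", {})

    # Container CPU time and total host CPU time consumed since the previous sample.
    cpu_delta = cpu_usage.get("total_usage", 0) - pre_usage.get("total_usage", 0)
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)

    # Fall back to the per-CPU list, then to 1, when online_cpus is absent.
    online = cpu.get("online_cpus") or len(cpu_usage.get("percpu_usage") or []) or 1

    cpu_percent = 0.0
    if cpu_delta > 0 and system_delta > 0:
        cpu_percent = cpu_delta / system_delta * online * 100.0

    memory = sample.get("memory_stats", {})
    usage = memory.get("usage", 0)
    limit = memory.get("limit", 0)
    memory_percent = usage / limit * 100.0 if limit else 0.0

    return {
        "cpu_percent": cpu_percent,
        "memory_bytes": usage,
        "memory_percent": memory_percent,
    }


def summarize_line(line):
    """Convenience wrapper for one raw line read from the stats stream."""
    return summarize_stats(json.loads(line))
```

With this convention 100% means one fully used core, so a container on an 8-core host can report up to 800%. The first sample of a stream usually has no meaningful precpu_stats, so it is reasonable to discard it.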
Another option is to read metrics from cgroups, but you will need more calculations in this case. The cgroups file system will have to be mounted into the sidecar container. See some examples of cgroup pseudo-files at https://docs.docker.com/config/containers/runmetrics/
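The extra calculation is mostly turning a cumulative CPU counter into a rate. A minimal sketch, assuming cgroup v1 pseudo-files (cpuacct.usage in nanoseconds and memory.usage_in_bytes); the default paths below are assumptions about where the main container's cgroup is mounted inside the sidecar and will differ per setup, and cgroup v2 uses different file names.

```python
import time


def cpu_percent_from_cpuacct(usage_ns_before, usage_ns_after, interval_seconds):
    """Convert two cumulative cpuacct.usage readings into a percentage.

    100.0 means one core was fully used during the interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    used_seconds = (usage_ns_after - usage_ns_before) / 1e9
    return 100.0 * used_seconds / interval_seconds


def read_int(path):
    """Read a cgroup pseudo-file that contains a single integer."""
    with open(path) as handle:
        return int(handle.read().strip())


def sample_once(cpu_path="/sys/fs/cgroup/cpuacct/cpuacct.usage",
                mem_path="/sys/fs/cgroup/memory/memory.usage_in_bytes",
                interval_seconds=1.0):
    """Take one (cpu_percent, memory_bytes) measurement.

    Both default paths are hypothetical and depend on how the cgroup is mounted.
    """
    start = time.monotonic()
    before = read_int(cpu_path)
    time.sleep(interval_seconds)
    after = read_int(cpu_path)
    elapsed = time.monotonic() - start
    return cpu_percent_from_cpuacct(before, after, elapsed), read_int(mem_path)
```

Measuring the elapsed time rather than trusting the sleep duration keeps the rate accurate when the sidecar itself is scheduled late.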
This could be done by sharing the process namespace for the Pod. Then the sidecar container would be able to see the processes from the main container (e.g. via ps), and would be able to monitor the CPU / Memory usage with standard unix tools.
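With the process namespace shared (shareProcessNamespace: true in the Pod spec), the sidecar can also read the main container's entries under /proc directly instead of shelling out to ps. A small sketch, assuming the standard Linux /proc/<pid>/status format; the function names are illustrative, and finding the right PID is left out.

```python
def rss_kib(status_text):
    """Extract resident memory (VmRSS, in KiB) from /proc/<pid>/status text.

    Returns None when the field is absent, as it is for kernel threads.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     204800 kB".
            return int(line.split()[1])
    return None


def read_rss_kib(pid):
    """Read VmRSS for a process visible in the shared PID namespace."""
    with open("/proc/{}/status".format(pid)) as handle:
        return rss_kib(handle.read())
```

Polling read_rss_kib once per second gives the 1s granularity asked for, at the cost of tracking individual processes rather than the container as a whole.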
One tool could be node-exporter, with the processes collector enabled. This can then be monitored by Prometheus.
See topz, a simple utility to expose top command as web interface.
You can use Prometheus and Grafana for monitoring memory and CPU usage. These are open source tools and can be used in production environments as well.
I'm looking for a monitoring solution for a web application deployed as a Swarm of Docker containers spread across 7-10 VMs. High-level requirements are:
Configurable Web and REST interface to a performance dashboard
General performance metrics at the VM level (CPU/Memory/IO)
Alerts when containers and/or VMs go offline or restart
Ability to drill down into container process activity when needed
Host OSes are CoreOS and Ubuntu
Any recommendations/best practices here?
NOTE: external Kibana installation is being used to collect application logs from Logstash agents deployed on VMs.
Based on your requirements, it sounds like Sematext Docker Agent would be a good fit. It runs as a tiny container on each Docker host and collects all host+containers metrics, events, and logs. It can parse logs, route them, blacklist/whitelist them, has container auto-discovery, and so on. In the end logs end up in Logsene and metrics and events end up in SPM, which gives you a single pane of glass sort of view into all your Docker ops bits, with alerting, anomaly detection, correlation, and so on.
I am currently evaluating Bosun with scollector + cAdvisor support. Looks OK so far.
Edit:
It should meet all the listed requirements and a little bit more. :)
Take a look at the Axibase Time-Series Database / Google cAdvisor / collectd stack.
Disclosure: I work for the company that develops ATSD.
Deploy one cAdvisor container per VM to collect Docker container statistics. The cAdvisor front-end allows you to view top container processes.
Deploy one ATSD container to ingest data from multiple cAdvisor instances.
Deploy a collectd daemon on each VM to collect host statistics, and configure the collectd daemons to stream data into ATSD using the write_atsd plugin.
Dashboards: available per host and per container.
API / SQL:
https://github.com/axibase/atsd/tree/master/api#api-categories
Alerts:
ATSD comes with a built-in rule engine. You can configure a rule that watches for containers that stop reporting data and triggers an email or a system command.