Prometheus with Dockerfile - docker

I have the following Dockerfile:
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/
with prometheus.yml:
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: 'prometheus'
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'auth-service'
    scrape_interval: 15s
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['localhost:8080']
I build and run it with the following commands:
docker build -t prometheus .
docker run -d -p 9090:9090 --rm prometheus
prometheus has status up
auth-service has status down (Get "http://localhost:8080/actuator/prometheus": dial tcp 127.0.0.1:8080: connect: connection refused)
How can I fix the auth-service target? From my local machine I can fetch the metrics from http://localhost:8080/actuator/prometheus without any problem:
v.balun@macbook-vbalun Trainter-Prometheus % curl -X GET http://localhost:8080/actuator/prometheus
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{area="heap",id="G1 Survivor Space",} 4194304.0
jvm_memory_committed_bytes{area="heap",id="G1 Old Gen",} 3.145728E7
jvm_memory_committed_bytes{area="nonheap",id="Metaspace",} 3.0982144E7
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_committed_bytes{area="heap",id="G1 Eden Space",} 2.7262976E7
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",} 4325376.0
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 6291456.0

The issue you are having does not seem to be related to Prometheus; it looks like a problem at the Docker network level.
Inside your Prometheus container you have configured this:
static_configs:
- targets: ['localhost:8080']
But remember that localhost is no longer your physical host (as it was when you ran things locally, outside Docker). It now refers to the container itself, and your service is most likely not running inside that same container.
With the information provided, I suggest the following:
1. First try your host's real IP address instead of localhost. Depending on the network configuration of your container, that may be enough.
2. Alternatively, replace localhost with the IP address of auth-service, the one assigned by Docker. You can run docker inspect... to get it.
3. If #1 and #2 didn't work and auth-service is running in another container on the same physical host, you can use a bridge network to make communication between the containers possible. More details here: https://docs.docker.com/network/bridge/
👆 Once both containers are running on the same network, you can reference the service by its container name instead of localhost, something like:
static_configs:
- targets: ['auth-service:8080']
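As a rough sketch of option 3, assuming auth-service is itself packaged as a container (the image name my-org/auth-service and the network name monitoring are hypothetical, not taken from the question), a Compose file along these lines would put both containers on one user-defined bridge network, so that the name auth-service resolves from inside the Prometheus container:

```yaml
# docker-compose.yml (sketch; image and network names are assumptions)
services:
  prometheus:
    build: .                          # the Dockerfile from the question
    ports:
      - "9090:9090"
    networks:
      - monitoring
  auth-service:
    image: my-org/auth-service:latest # hypothetical image name
    networks:
      - monitoring
networks:
  monitoring:
    driver: bridge
```

With this layout the scrape target would be 'auth-service:8080', the container's own port, so auth-service does not need a published port for Prometheus to reach it. If the service keeps running directly on the host, as in the question, Docker Desktop for Mac exposes the host under the name host.docker.internal, so host.docker.internal:8080 may work as a target instead.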

Related

Connect the Cassandra container to application web container failed - Error: 202 Connecting to Node

I created two Docker images and I want to connect them to each other with Docker Compose. The first image is Cassandra 3.11.11 (the official image from Docker Hub); the other I built myself, with Tomcat 9.0.54 and my Spring Boot application.
I ran the docker-compose.yml below to connect the two containers, where cassandra:latest is the Cassandra image and centos7-tomcat9-myapp is my web application's image.
version: '3'
services:
  casandra:
    image: cassandra:latest
  myapp:
    image: centos7-tomcat9-myapp
    depends_on:
      - casandra
    environment:
      - CASSANDRA_HOST=cassandra
I ran this command to start the web application's image: docker run -it --rm --name fe3c2f120e01 -p 8888:8080 centos7-tomcat9-app .
In the console log, Spring Boot shows the error below. It happens because the myapp container cannot connect to the Cassandra container.
2021-10-15 15:12:14.240 WARN 1 --- [ s0-admin-1]
c.d.o.d.i.c.control.ControlConnection : [s0] Error connecting to
Node(endPoint=127.0.0.1:9042, hostId=null, hashCode=47889c49), trying
next node (ConnectionInitException: [s0|control|connecting...]
Protocol initialization request, step 1 (OPTIONS): failed to send
request (io.netty.channel.StacklessClosedChannelException))
What am I doing wrong?
EDIT
This is the nodetool status output for the Cassandra container:
[root@GDBDEV04 cassandradb]# docker exec 552d359d177e nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.18.0.3 84.76 KiB 16 100.0% 685b6e0a-13c2-4d41-ba99-f3b0fa94477c rack1
EDIT 2
I need to connect the Cassandra database container to the web application container, which is different from connecting microservices. I tried changing 127.0.0.0 (inside cassandra.yaml) to 0.0.0.0 (only as a test) and the error persists. I am sure something is missing in my docker-compose.yml, but I do not know what.
Finally I found the error. In my case, I needed to fix the docker-compose.yml file by adding the Cassandra and Tomcat ports, and in my application.properties (the Spring Boot config file) I changed the cluster's name.
Docker-compose.yml:
version: '3'
services:
  cassandra:
    image: cassandra:latest
    ports:
      - "9044:9042"
  myapp:
    image: centos7-tomcat9-myapp
    ports:
      - "8086:8080"
    depends_on:
      - cassandra
    environment:
      - CASSANDRA_HOST=cassandra
Application.config :
# CASSANDRA (CassandraProperties)
cassandra.cluster = Test Cluster
cassandra.contactpoints=${CASSANDRA_HOST}
This question helped me resolve my problem: Accessing docker container mysql databases

how to pull from a private registry in gitlab CI, with docker DIND

Currently I'm using GitLab runners with the Docker executor, and I'm trying to pull some Docker images to run tests. To spare my network connection, I've created a private Docker registry to "cache" the images.
My registry is linked to my GitLab runner (through the configuration in config.toml: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runnersdocker-section).
This works; my job image can query the registry:
$ wget http://registry:5000/v2/_catalog
--2019-02-15 10:40:54-- http://registry:5000/v2/_catalog
Resolving registry... 172.17.0.3
Connecting to registry|172.17.0.3|:5000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20 [application/json]
Saving to: '_catalog'
0K 100% 1.17M=0s
2019-02-15 10:40:54 (1.17 MB/s) - '_catalog' saved [20/20]
but the DIND service can't :
pull registry:5000/arminc/clair-db:latest
Error response from daemon: Get http://registry:5000/v2/: dial tcp: lookup registry on 192.168.9.254:53: no such host
My gitlab-ci conf for this task
scan:image:
  stage: scans
  image: docker:git
  services:
    - name: docker:dind
      command: ["--insecure-registry=registry:5000"]
  variables:
    DOCKER_DRIVER: overlay2
  allow_failure: true
  script:
    - chmod 777 ./docker/scan.sh
    - docker login -u $DOCKER_USERNAME -p $DOCKER_PASSWORD $DOCKER_REGISTRY
    - ./docker/scan.sh
  artifacts:
    paths: [gl-container-scanning-report.json]
  only:
    - master
You probably need to add a DNS entry for the registry to your DNS server or to the Docker host's hosts file:
192.168.xx.xxx registry

cAdvisor prometheus integration returns container_cpu_load_average_10s as 0

I have configured Prometheus to scrape metrics from cAdvisor. However, the metric "container_cpu_load_average_10s" only returns 0. I can see the CPU metrics correctly in the cAdvisor web UI, but Prometheus receives only 0. Other metrics, such as "container_cpu_system_seconds_total", work fine. Could someone point out what I am missing here?
Prometheus version: 2.1.0
Prometheus config:
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - 172.17.0.2:8080
cAdvisor version: 0.29.0
To get the metric container_cpu_load_average_10s, cAdvisor must run with the option
--enable_load_reader=true
which is set to false by default. This is described here.
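A sketch of how the flag might be passed when cAdvisor runs under Docker Compose. The image tag, port mapping, and volume mounts are the commonly documented ones and are assumptions here, since the question does not show how cAdvisor is started:

```yaml
# docker-compose.yml (sketch; tag, port, and mounts are assumptions)
services:
  cadvisor:
    image: google/cadvisor:v0.29.0
    command:
      - "--enable_load_reader=true"   # appended to the image's cadvisor entrypoint
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
```

After restarting cAdvisor this way, container_cpu_load_average_10s should be worth checking again on the /metrics endpoint; a value of 0 for a busy container would then point to something other than the disabled load reader.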
If the value is zero, it means the container is idle.
You don't need 'enable_load_reader'. I don't enable it as it may make cAdvisor unstable.
Some useful links:
Linux Load Averages: Solving the Mystery
High CPU utilization but low load average question
cAdvisor enable_load_reader

Prometheus create label from metric label

We are running the node-exporter in containers. To quickly identify on which host each node-exporter is running, I created a metric that looks like this: host{host="$HOSTNAME",node="$CONTAINER_ID"} 1
I'm looking for a way to extract the hostname in host= and create a label for each node-exporter instance as a hostname label. I tried numerous configurations and none seem to work. Current prometheus config looks like this:
scrape_configs:
  - job_name: 'node'
    scrape_interval: 10s
    scrape_timeout: 5s
    metrics_path: /metrics
    scheme: http
    dns_sd_configs:
      - names:
          - tasks.master-nodeexporter
        refresh_interval: 30s
        type: A
        port: 9100
    relabel_configs:
      - source_labels: ['host']
        regex: '"(.*)".*'
        target_label: 'hostname'
        replacement: '$1'
This is not possible, as target relabelling happens before the scrape.
What you want to do here is use service discovery to have the right hostname in the first place, which is not possible with dns_sd_configs. You might look at something like Consul and https://www.robustperception.io/controlling-the-instance-label/
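To illustrate what "having the right hostname in the first place" could look like, here is a minimal sketch that attaches a hostname label at discovery time using static_configs. The addresses and host names are hypothetical, and a real swarm setup would more likely generate such entries through file_sd_configs or Consul:

```yaml
scrape_configs:
  - job_name: 'node'
    scrape_interval: 10s
    static_configs:
      # One entry per node-exporter; the labels are attached to every series scraped from it.
      - targets: ['10.0.0.11:9100']   # hypothetical address
        labels:
          hostname: 'swarm-node-1'    # hypothetical host name
      - targets: ['10.0.0.12:9100']   # hypothetical address
        labels:
          hostname: 'swarm-node-2'    # hypothetical host name
```

Because the label is part of the target definition, it is known before the scrape happens, which is exactly what target relabelling on a scraped metric label cannot provide.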
If someone comes across this:
Create this as docker-entrypoint.sh and make it executable.
#!/bin/sh -e
# Must be executable by others
NODE_NAME=$(cat /etc/nodename)
echo "node_meta{node_id=\"$NODE_ID\", container_label_com_docker_swarm_node_id=\"$NODE_ID\", node_name=\"$NODE_NAME\"} 1" > /etc/node-exporter/node-meta.prom
set -- /bin/node_exporter "$@"
exec "$@"
Then create a Dockerfile like this:
FROM prom/node-exporter:latest
ENV NODE_ID=none
USER root
COPY conf /etc/node-exporter/
ENTRYPOINT [ "/etc/node-exporter/docker-entrypoint.sh" ]
CMD [ "/bin/node_exporter" ]
Then build it, and you will always get the hostname as a node_meta metric.
This answer explains how to export node name via node_name label at node_meta metric. Then it is possible to add node_name label to any metric exposed by node_exporter with group_left() modifier during querying. For example, the following PromQL query adds node_name label from node_meta metric to node_memory_Active_bytes metric:
node_memory_Active_bytes
* on(job,instance) group_left(node_name)
node_meta
See more details about group_left() modifier in these docs and this article.

Docker - Prometheus container dies immediately

I have cadvisor running with port mapping 4000:8080 and I have to link it with a container with prometheus.
My prometheus.yml is:
scrape_configs:
# Scrape Prometheus itself every 2 seconds.
- job_name: 'prometheus'
scrape_interval: 2s
target_groups:
- targets: ['localhost:9090', 'cadvisor:8080']
This file has path /home/test/prometheus.yml.
To run the container with prometheus, I do:
docker run -d -p 42047:9090 --name=prometheus -v /home/test/prometheus.yml:/etc/prometheus/prometheus.yml --link cadvisor:cadvisor prom/prometheus -config.file=/etc/prometheus/prometheus.yml -storage.local.path=/prometheus -storage.local.memory-chunks=10000
The container is created, but it dies immediately.
Can you tell me where the problem is?
Messages from docker events& :
2016-11-21T11:43:04.922819454+01:00 container start 69d03c68525c5955cc40757dc973073403b13fdd41c7533f43b7238191088a25 (image=prom/prometheus, name=prometheus)
2016-11-21T11:43:05.152141981+01:00 container die 69d03c68525c5955cc40757dc973073403b13fdd41c7533f43b7238191088a25 (exitCode=1, image=prom/prometheus, name=prometheus)
The config format has changed. In the latest version, targets go under static_configs.
scrape_configs:
  # Scrape Prometheus itself every 2 seconds.
  - job_name: 'prometheus'
    scrape_interval: 2s
    static_configs:
      - targets: ['localhost:9090', 'cadvisor:8080']
See the Prometheus documentation for further help.
Yes, target_groups is renamed to static_configs. Please use the latest Prometheus image with the following.
static_configs:
- targets: ['localhost:9090', 'cadvisor:8080']
The above worked for me.
I think target_groups has been removed from scrape_configs in the latest version of Prometheus.
You can try static_configs or file_sd_configs instead:
scrape_config
static_config
file_sd_config
scrape_configs:
  - job_name: node_exporter
    static_configs:
      - targets:
          - "stg-elk-app-01:9100"
          - "stg-app-02:9100"
The indentation isn't correct, try:
scrape_configs:
  # Scrape Prometheus itself every 2 seconds.
  - job_name: 'prometheus'
    scrape_interval: 2s
    target_groups:
      - targets: ['localhost:9090', 'cadvisor:8080']
As you said in your earlier comment:
from logs: time="2016-11-21T11:21:40Z" level=error msg="Error loading config: couldn't load configuration (-config.file=/etc/prometheus/prometheus.yml): unknown fields in scrape_config: target_groups" source="main.go:149"
This clearly means that the field "target_groups" is causing the problem. The newer versions of Prometheus (v1.5 onwards) have dropped the "target_groups" field and simply expect the targets. I faced this issue myself about 6 months ago. Please try it with a new version; docker pull prom/prometheus might be getting you an old one.
Hope this helps...!!!
The name of the container is prometheus.
Generally, when a container exits immediately after it starts, I would recommend adding -log.level=debug right after -config.file:
docker run -d -p 42047:9090 --name=prometheus -v /home/test/prometheus.yml:/etc/prometheus/prometheus.yml --link cadvisor:cadvisor prom/prometheus -config.file=/etc/prometheus/prometheus.yml -log.level=debug -storage.local.path=/prometheus -storage.local.memory-chunks=10000
Next, see the logs for the container:
docker logs prometheus
Any issues with the configuration will be there.

Resources