Why is Loki's Docker Driver Client stopping to log after some time?

Why is Loki's Docker Driver Client stopping to log after some time? - docker

I want to send logs of my Docker containers to Grafana Loki. Therefore, I installed Loki's Docker Driver Client and started my containers with it. First I can see logs, but after some time I see no more logs.
Installation
I installed Loki's Docker Driver Client as a Docker plugin on my Docker Engine (version 20.10.2):
$ docker plugin install grafana/loki-docker-driver:master-54d1d3b --alias loki --grant-all-permissions
I didn't use the tag lastest, because of the bug Unable to connect to logging plugin in Swarm
Configuration
I started my Docker containers with Loki's Docker Driver Client as log driver:
$ docker container run
--log-driver=loki
--log-opt loki-url="$LOKI_URL"
--log-opt loki-retries=5
--log-opt loki-batch-size=400
--log-opt max-size="10m"
--log-opt max-file=5
--detach
--name $CONTAINER_NAME
--restart unless-stopped
$IMAGE:$TAG
I also added json-log driver's max-size and max-file to limit disk space, see Configuring the Docker Driver.
Problem
First I could see logs in Grafana and in command line with docker container logs, but after some time no more logs were shown. If I tried to look into the logs on Docker host and I saw an error:
$ docker container logs 75d4b13eb3e8
error from daemon in stream: Error grabbing logs: error getting log reader: LogDriver.ReadLogs: logger does not exist for 75d4b13eb3e8203b9247ecdeb41fdf495cc8fea7dcfc4775fd8261263b1dcd32
Research
I looked into the directories of the containers (see Where is a log file with logs from a container?), but I couldn't see any log files:
$ sudo ls /var/lib/docker/containers/75d4b13eb3e8203b9247ecdeb41fdf495cc8fea7dcfc4775fd8261263b1dcd32
checkpoints config.v2.json hostconfig.json hostname hosts mounts resolv.conf resolv.conf.hash
I also checked the log path (see Get an instance’s log path), but it was empty:
$ docker inspect --format='{{.LogPath}}' 75d4b13eb3e8
I found container's logs in plugin's directory (see Loki log driver not storing logs as files on disk, even with keep-file: true), but the log files don't change anymore:
$ sudo ls -la /var/lib/docker/plugins/eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288/rootfs/var/log/docker/75d4b13eb3e8203b9247ecdeb41fdf495cc8fea7dcfc4775fd8261263b1dcd32
total 912
drwxr-xr-x 2 root root 4096 Jan 22 12:59 .
drwxr-xr-x 17 root root 4096 Jan 22 15:46 ..
-rw-r----- 1 root root 923177 Jan 22 13:34 json.log
I looked into Docker daemon's logs (see Read the logs) and found errors and a warning (at the same time logging stopped):
$ sudo journalctl -u docker.service | grep eac33cc9913c
[...]
[...]level=error msg="panic: send on closed channel" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="goroutine 153 [running]:" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="main.(*loki).Log(0xc0000c5e00, 0xc0001d81c0, 0xc0000c5e80, 0x0)" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="\t/src/loki/cmd/docker-driver/loki.go:69 +0x2fb" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="main.consumeLog(0xc0002c0480)" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="\t/src/loki/cmd/docker-driver/driver.go:165 +0x4c2" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="created by main.(*driver).StartLogging" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="\t/src/loki/cmd/docker-driver/driver.go:116 +0xa75" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=warning msg="Unable to connect to plugin: /run/docker/plugins/eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288/loki.sock/LogDriver.StopLogging: Post http://%2Frun%2Fdocker%2Fplugins%2Feac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288%2Floki.sock/LogDriver.StopLogging: EOF, retrying in 1s"
[...]
What did I do wrong?

I was experiencing the same issue.
My only differences in configuration are that I'm trialing the latest Enterprise Edition (19.03) as it brings dual logging capability although this is also supported in the latest CE versions, and I'm using the latest Loki Docker driver client now that the Github issue previously mentioned has been resolved.
I ended up setting the log-opts properties no-file and keep-file in docker-compose.yml:
logging:
driver: "loki"
options:
loki-url: "http://${LOKI_URL}:3100/loki/api/v1/push"
loki-batch-size: "400"
no-file: "false"
keep-file: "true"
max-size: "5m"
max-file: "3"
Since making this change I am receiving logs in Loki and can still use docker container logs and docker service logs on my Docker hosts.
no-file: "false" tells the driver to continue creating logs on disk and keep-file: "true" tells the driver to keep json logs if the container is stopped (by default files are removed).
Note: Originally I was adding these settings to /etc/docker/daemon.json on the host but would still see the error getting log reader issue, I had to switch to specifying the log driver per container/swarm service.

Regarding this issue
First I could see logs in Grafana and in command line with docker container logs, but after some time no more logs were shown.
On Grafana please select Query type: Range not Instant and you will see all the logs for the selected period of time, if exists in loki.

Related

Logspout container in Docker

I am trying to deploy logspout container in docker, but keep running into an issue which I have searched in this website and github but to no avail, so hoping someone knows.
I followed the following commands as per the Readme here: https://github.com/gliderlabs/logspout
(1) docker pull gliderlabs/logspout:latest (also tried with logspout:master, same results)
(2) docker run -d --name="logspout" --volume=/var/run/docker.sock:/var/run/docker.sock --publish=127.0.0.1:8000:80 gliderlabs/logspout (also tried with -v /var/run/docker.sock:/var/run/docker.sock, same results)
The container gets created but stops immediately. When I check the container logs (docker container logs logspout), I only see the following entries:
2021/12/19 06:37:12 # logspout v3.2.14 by gliderlabs
2021/12/19 06:37:12 # adapters: raw syslog tcp tls udp multiline
2021/12/19 06:37:12 # options :
2021/12/19 06:37:12 persist:/mnt/routes
2021/12/19 06:37:12 # jobs : pump routes http[health,logs,routes]:80
2021/12/19 06:37:12 # routes : none
2021/12/19 06:37:12 pump ended: Get http://unix.sock/containers/json?: dial unix /var/run/docker.sock: connect: no such file or directory
I checked docker.sock as ls -la /var/run/docker.sock results in srw-rw---- 1 root docker 0 Dec 12 09:49 /var/run/docker.sock. So docker.sock does exist, which adds to the confusion as to why the container can't find it.
I am new to linux/docker, but my understanding is that using -v or --version would automatically mount the location to the container, but does not seem to be happening here. So I am wondering if anyone has any suggestion on what needs to be done so that the logspout container can find the docker.sock.
System Info: Docker version 20.10.11, build dea9396; Raspberry Pi 4 ARM 64, OS: Debian GNU/Linux 11 (bullseye)
EDIT: added comment about -v tag in step (2) above

The container must be able to access the Docker Unix socket to mount it. This is typically a problem when namespace remapping is enabled. To disable remapping for the logspout container, pass the --userns=host flag to docker run, .. create, etc.

Can I run k8s master INSIDE a docker container? Getting errors about k8s looking for host's kernel details

In a docker container I want to run k8s.
When I run kubeadm join ... or kubeadm init commands I see sometimes errors like
\"modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could
not open moddep file
'/lib/modules/3.10.0-1062.1.2.el7.x86_64/modules.dep.bin'.
nmodprobe:
FATAL: Module configs not found in directory
/lib/modules/3.10.0-1062.1.2.el7.x86_64",
err: exit status 1
because (I think) my container does not have the expected kernel header files.
I realise that the container reports its kernel based on the host that is running the container; and looking at k8s code I see
// getKernelConfigReader search kernel config file in a predefined list. Once the kernel config
// file is found it will read the configurations into a byte buffer and return. If the kernel
// config file is not found, it will try to load kernel config module and retry again.
func (k *KernelValidator) getKernelConfigReader() (io.Reader, error) {
possibePaths := []string{
"/proc/config.gz",
"/boot/config-" + k.kernelRelease,
"/usr/src/linux-" + k.kernelRelease + "/.config",
"/usr/src/linux/.config",
}
so I am bit confused what is simplest way to run k8s inside a container such that it consistently past this getting the kernel info.
I note that running docker run -it solita/centos-systemd:7 /bin/bash on a macOS host I see :
# uname -r
4.9.184-linuxkit
# ls -l /proc/config.gz
-r--r--r-- 1 root root 23834 Nov 20 16:40 /proc/config.gz
but running exact same on a Ubuntu VM I see :
# uname -r
4.4.0-142-generic
# ls -l /proc/config.gz
ls: cannot access /proc/config.gz
[Weirdly I don't see this FATAL: Module configs not found in directory error every time, but I guess that is a separate question!]
UPDATE 22/November/2019. I see now that k8s DOES run okay in a container. Real problem was weird/misleading logs. I have added an answer to clarify.

I do not believe that is possible given the nature of containers.
You should instead test your app in a docker container then deploy that image to k8s either in the cloud or locally using minikube.
Another solution is to run it under kind which uses docker driver instead of VirtualBox
https://kind.sigs.k8s.io/docs/user/quick-start/

It seems the FATAL error part was a bit misleading.
It was badly formatted by my test environment (all on one line.
When k8s was failing I saw the FATAL and assumed (incorrectly) that was root cause.
When I format the logs nicely I see ...
kubeadm join 172.17.0.2:6443 --token 21e8ab.1e1666a25fd37338 --discovery-token-unsafe-skip-ca-verification --experimental-control-plane --ignore-preflight-errors=all --node-name 172.17.0.3
[preflight] Running pre-flight checks
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.4.0-142-generic
DOCKER_VERSION: 18.09.3
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-142-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/4.4.0-142-generic\n", err: exit status 1
[discovery] Trying to connect to API Server "172.17.0.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.17.0.2:6443"
[discovery] Failed to request cluster info, will try again: [the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps cluster-info)]
There are other errors later, which I originally though were a side-effect of the nasty looking FATAL error e.g. .... "[util/etcd] Attempt timed out"]} but I now think root cause is Etcd part times out sometimes.
Adding this answer in case someone else puzzled like I was.

Docker (Spotify) API - cannot connect to Docker

In my Docker (Spring Boot) application I would like to execute Docker commands. I use the docker-spotify-api (client).
I get different connection errors. I start the application as part of a docker-compose.yml.
This is what I tried so far on an EC2 AWS VPS:
docker = DefaultDockerClient.builder()
.uri(URI.create("tcp://localhost:2376"))
.build();
=> TCP protocol not supported.
docker = DefaultDockerClient.builder()
.uri(URI.create("tcp://localhost:2375"))
.build();
=> TCP protocol not supported.
docker = new DefaultDockerClient("unix:///var/run/docker.sock");
==> No such file
docker = DefaultDockerClient.builder()
.uri("unix:///var/run/docker.sock")
.build();
==> No such file
docker = DefaultDockerClient.builder()
.uri(URI.create("http://localhost:2375")).build();
or
docker = DefaultDockerClient.builder()
.uri(URI.create("http://localhost:2376")).build();
or
docker = DefaultDockerClient.builder()
.uri(URI.create("https://localhost:2376"))
.build();
==> Connect to localhost:2376 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
Wthat is my environment on EC2 VPS:
$ ls -l /var/run
lrwxrwxrwx 1 root root 6 Nov 14 07:23 /var/run -> ../run
$ groups ec2-user
ec2-user : ec2-user adm wheel systemd-journal docker
$ ls -l /run/docker.sock
srw-rw---- 1 root docker 0 Feb 14 17:16 /run/docker.sock
echo $DOCKER_HOST $DOCKER_CERT_PATH
(empty)

This situation is similar to https://github.com/spotify/docker-client/issues/838#issuecomment-318261710.
You use docker-compose on the host to start up your application; Within the container, the Spring Boot application is using docker-spotify-api.
What you can try is to mount /var/run/docker.sock:/var/run/docker.sock in you compose file.

As #Benjah1 indicated, /var/run/docker.sock had to be mounted first.
To do so in a docker-compose / Docker Swarm environment you can do:
volumes:
- /var/run/docker.sock:/var/run/docker.sock
Furthermore, the other options resulted in errors because the default setting of Docker is that it won't open up to tcp/http connections. You can change this, of course, taking a small risk.

What is your DOCKER_HOST and DOCKER_CERT_PATH env vars value.
Try below as docker-client communicates with your local Docker daemon using the HTTP Remote API
final DockerClient docker = DefaultDockerClient.builder()
.uri(URI.create("https://localhost:2376"))
.build();
please also verify the privileges of docker.sock is it visible to your app and check weather your docker service is running or not as from above screenshot your docker.sock looks empty but if service is running it should contain pid in it

It took me some time to figure out, but I was running https://hub.docker.com/r/alpine/socat/ locally, and also wanted to connect to my Docker daemon and couldn't (same errors). Then it struck me: the solution on that webpage uses 127.0.0.1 as the ip address to bind to. Instead, start that container with 0.0.0.0, and then inside your container, you can do this: DockerClient dockerClient = new DefaultDockerClient("http://192.168.1.215:2376"); (use your own ip-address of course).
This worked for me.

startNodeManager.sh not found

I have been trying to run Oracle weblogic in Docker containers and i am facing trouble in starting the NodeManager.I ran the following command.
docker run -d --name MS1 --link wlsadmin:wlsadmin -p 8001:8001 -e ADMIN_PASSWORD=#123 \
-e MS_NAME=MS1 --volumes-from wlsadmin a5e55 createServer.sh
Under normal circumstances it is expected to start the Nodemanager.
I am able to access the weblogic console and start the Managed Server which then returns the error-
-- Warning For server MS1, the Node Manager associated with machine Machine_MS1 is not reachable
This is the part of the log file that is returned on executing the above "docker run" command :
Domain Home: /u01/oracle/user_projects/domains/base_domain
Managed Server Name: MS1
NodeManager Name:
----> 'weblogic' admin password: ctebs#123
Waiting for WebLogic Admin Server on wlsadmin:7001 to become available...
WebLogic Admin Server is now available. Proceeding...
Setting NodeManager
----> No NodeManager Name set
Node Manager Name: Machine_MS1
Node Manager Home for Container: /u01/oracle/user_projects/domains/base_domain/Machine_MS1
cp: cannot stat '/u01/oracle/user_projects/domains/base_domain /bin/startNodeManager.sh': No such file or directory
cp: cannot stat '/u01/oracle/user_projects/domains/base_domain/nodemanager/*': No such file or directory
NODEMGR_HOME_STR: NODEMGR_HOME="/u01/oracle/user_projects/domains/base_domain/Machine_MS1"
NODEMGRHOME_STR: NodeManagerHome=/u01/oracle/user_projects/domains/base_domain/Machine_MS1
DOMAINSFILE_STR: DomainsFile=/u01/oracle/user_projects/domains/base_domain/Machine_MS1/nodemanager.domains
LOGFILE_STR: LogFile=/u01/oracle/user_projects/domains/base_domain/Machine_MS1/nodemanager.log
sed: can't read /u01/oracle/user_projects/domains/base_domain/Machine_MS1/startNodeManager.sh: No such file or directory
sed: can't read /u01/oracle/user_projects/domains/base_domain/Machine_MS1/nodemanager.properties: No such file or directory
sed: can't read /u01/oracle/user_projects/domains/base_domain/Machine_MS1/nodemanager.properties: No such file or directory
sed: can't read /u01/oracle/user_projects/domains/base_domain/Machine_MS1/nodemanager.properties: No such file or directory
Starting NodeManager in background...
NodeManager started.
Connection refused (Connection refused). Could not connect to NodeManager. Check that it is running at /172.17.0.3:5556.
Starting server MS1 ...No stack trace available.
This Exception occurred at Tue Dec 12 03:38:06 GMT 2017.
weblogic.management.scripting.ScriptException: Error occurred while performing start : Server with name MS1 failed to be started
No stack trace available.
How can I get past this error message?

You can try and follow this OracleWebLogic workshop intro which points out:
The ~/docker-images/OracleWebLogic/samples/1221-domain/container-scripts has useful Bash and WLST scripts that provide three possible modes to run WebLogic Managed Servers on a Docker container. Make sure you have an AdminServer container running before starting a ManagedServer container.
The sample scripts will by default, attempt to find the AdminServer running at t3://wlsadmin:8001. You can change this.
But most importantly, the AdminServer container has to be linked with Docker's --link parameter.
Below, are the three suggestions for running ManagedServer Container within the sample 12c-domain:
Start NodeManager (Manually):
docker run -d --link wlsadmin:wlsadmin startNodeManager.sh
Start NodeManager and Create a Machine Automatically:
docker run -d --link wlsadmin:wlsadmin createMachine.sh
Start NodeManager, Create a Machine, and Create a ManagedServer Automatically
docker run -d --link wlsadmin:wlsadmin createServer.sh
See more at "Example of Image with WLS Domain", removed in commit e49bb4d in Apr. 2019, 2 yers later, since Oracle no longer supports WebLogic versions.

docker - driver "devicemapper" failed to remove root filesystem after process in container killed

I am using Docker version 17.06.0-ce on Redhat with devicemapper storage. I am launching a container running a long-running service. The master process inside the container sometimes dies for whatever reason. I get the following error message.
/bin/bash: line 1: 40 Killed python -u scripts/server.py start go
I would like the container to exit and to be restarted by docker. However docker never exits. If I do it manually I get the following error:
Error response from daemon: driver "devicemapper" failed to remove root filesystem.
After googling, I tried a bunch of things:
docker rm -f <container>
rm -f <pth to mount>
umount <pth to mount>
All result in device is busy. The only remedy right now is to reboot the host system which is obviously not a long-term solution.
Any ideas?

I had the same problem and the solution was a real surprise.
So here is the error om docker rm:
$ docker rm 08d51aad0e74
Error response from daemon: driver "devicemapper" failed to remove root filesystem for 08d51aad0e74060f54bba36268386fe991eff74570e7ee29b7c4d74047d809aa: remove /var/lib/docker/devicemapper/mnt/670cdbd30a3627ae4801044d32a423284b540c5057002dd010186c69b6cc7eea: device or resource busy
Then I did the following (basically go through all processes and look for docker in mountinfo):
$ grep docker /proc/*/mountinfo | grep 958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
/proc/20416/mountinfo:629 574 253:15 / /var/lib/docker/devicemapper/mnt/958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,relatime shared:288 - xfs /dev/mapper/docker-253:5-786536-958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota
This got be the PID of the offending process keeping it busy - 20416 (the item after /proc/)
So I did a ps -p and to my surprise find:
[devops#dp01app5030 SeGrid]$ ps -p 20416
PID TTY TIME CMD
20416 ? 00:00:19 ntpd
A true WTF moment. So I pair problem solved with Google and found this:
Then found this https://github.com/docker/for-linux/issues/124
Turns out I had to restart ntp daemon and that fixed the issue!!!

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart