Fluentd: flush every time unit or when the buffer reaches a certain size

How can Fluentd version 1.15.2 be configured to flush the buffer to its store/output every 5 minutes, or as soon as the buffer size reaches 10 megabytes, whichever comes first?
I've tried the following buffer configuration, but it didn't achieve the desired behaviour.
<buffer tag,time>
@type file
path /var/log/fluent/foo
timekey 5m
timekey_wait 0s
chunk_limit_size 10m
</buffer>
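A minimal sketch of a <buffer> section that expresses "flush every 5 minutes, or as soon as a chunk reaches 10 MB, whichever comes first" could look like the following. It uses flush_mode interval plus flush_interval instead of timekey, since timekey only groups records into time-window chunks rather than forcing earlier flushes; the time chunk key is dropped because it only makes sense together with timekey. Treat this as a starting point, not a verified fix:
<buffer tag>
  @type file
  path /var/log/fluent/foo
  # time-based flush: write staged chunks out every 5 minutes
  flush_mode interval
  flush_interval 5m
  # size-based flush: a chunk is enqueued and written once it reaches 10 MB
  chunk_limit_size 10m
</buffer>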

Related

What does an increasing NET I/O value in docker stats mean?

I am running docker stats <container_id> > performance.txt over a period of 1 hour during multi-user testing. Some stats, such as memory and CPU, increase and then normalize, but the NET I/O value keeps increasing.
At the start, the output was:
NAME CPU % MEM USAGE / LIMIT NET I/O BLOCK I/O PIDS
my-service 0.10% 5.63GiB / 503.6GiB 310MB / 190MB 0B / 0B 80
NAME CPU % MEM USAGE / LIMIT NET I/O BLOCK I/O PIDS
my-service 0.20% 5.63GiB / 503.6GiB 310MB / 190MB 0B / 0B 80
After 1 hour, it is:
NAME CPU % MEM USAGE / LIMIT NET I/O BLOCK I/O PIDS
my-service 116.26% 11.54GiB / 503.6GiB 891MB / 523MB 0B / 0B 89
NAME CPU % MEM USAGE / LIMIT NET I/O BLOCK I/O PIDS
my-service 8.52% 11.54GiB / 503.6GiB 892MB / 523MB 0B / 0B 89
As shown above, the value of NET I/O is always increasing. What could this mean?
The Docker documentation says it is the input received and output sent by the container. If so, why does it keep increasing? Is there some issue with the image running in the container?
NET I/O is a cumulative counter. It only goes up (when your app receives and sends data).
https://docs.docker.com/engine/reference/commandline/stats/
NET I/O: The amount of data the container has sent and received over its network interface
So it accumulates over time, unlike, say, CPU %, which shows how much CPU the container is using right now.
The docker stats command returns a live data stream for running containers.
It's the total amount of data passed over the network since the container started. From the definition of stream:
computing: a continuous flow of data or instructions
It doesn't say so explicitly, but you can infer it from the terms continuous and stream. Perhaps the documentation could be a bit clearer in that respect.
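If what you actually need is a rate rather than the running total, one approach is to sample the cumulative counter periodically and diff successive samples yourself; docker stats does not do that for you. A rough sketch, assuming the container is still named my-service:
# Print a timestamped NET I/O sample once a minute; subtracting consecutive
# samples gives the amount of network traffic transferred per minute.
while true; do
  date '+%F %T'
  docker stats --no-stream --format '{{.Name}}: {{.NetIO}}' my-service
  sleep 60
done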

Why is my cgroup write throughput not limited?

I am trying to set an upper write-throughput limit per cgroup via the blkio cgroup controller.
I have tried it like this:
echo "major:minor 10485760" > /sys/fs/cgroup/blkio/docker/XXXXX/blkio.throttle.write_bps_device
This should limit the throughput to 10 MB/s. However, the tool that monitors the server's disk shows writes going well above that limit.
I thought the line should hold somewhere around 10M. Can somebody explain this behaviour to me, and maybe propose a better way to limit throughput?
Are you sure that the major/minor numbers you specified on the command line are correct? Moreover, as you are running in Docker, the limitation applies to the processes running inside the container, not to processes running outside it. So you need to check where the monitoring tool gets its numbers from (does it measure all processes, inside and outside the container, or only the processes inside the container?).
To check the setting, the Linux documentation provides an example with the dd command and a device limited to 1 MB/s on reads. You can try the same with a limit on writes to see whether the monitoring tool is consistent with the output of dd. Run the latter inside the container.
For example, my home directory is located on /dev/sdb2:
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
[...]
/dev/sdb2 2760183720 494494352 2125409664 19% /home
[...]
$ ls -l /dev/sdb*
brw-rw---- 1 root disk 8, 16 mars 14 08:14 /dev/sdb
brw-rw---- 1 root disk 8, 17 mars 14 08:14 /dev/sdb1
brw-rw---- 1 root disk 8, 18 mars 14 08:14 /dev/sdb2
I check the write speed to a file:
$ dd oflag=direct if=/dev/zero of=$HOME/file bs=4K count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4,2 MB, 4,0 MiB) copied, 0,131559 s, 31,9 MB/s
I set the 1 MB/s write limit on the whole disk (8:16), since it does not work on the individual partition (8:18) on which my home directory resides:
# echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
# cat /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
8:16 1048576
dd's output confirms the limitation of the I/O throughput to 1 MB/s:
$ dd oflag=direct if=/dev/zero of=$HOME/file bs=4K count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4,2 MB, 4,0 MiB) copied, 4,10811 s, 1,0 MB/s
So, it is possible to do the same from inside a container.
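As a side note, not part of the answer above: if the goal is simply to cap a single container's write throughput, Docker exposes the same blkio throttle as a docker run flag, which avoids writing to the cgroup files by hand. A sketch, assuming the data lives on /dev/sdb:
# Limit the container's writes to /dev/sdb to 10 MB/s, then verify with dd
docker run --rm --device-write-bps /dev/sdb:10mb ubuntu \
  dd oflag=direct if=/dev/zero of=/tmp/testfile bs=4K count=10240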

Fluentd file output does not output to file

On Ubuntu 18.04, I am running td-agent v4, which uses the Fluentd v1.0 core. First I configured it with a TCP input and stdout output, and it receives and outputs the messages fine. I then configured it to output to a file with a 10 s flush interval, yet I do not see any output files generated in the destination path.
This is my file output configuration:
<match>
@type file
path /var/log/td-agent/test/access.%Y-%m-%d.%H:%M:%S.log
<buffer time>
timekey 10s
timekey_use_utc true
timekey_wait 2s
flush_interval 10s
</buffer>
</match>
I run this check every 10 s to see whether log files are generated, but all I see is a directory whose name still contains the placeholders I set for the path parameter:
ls -la /var/log/td-agent/test
total 12
drwxr-xr-x 3 td-agent td-agent 4096 Feb 5 23:14 .
drwxr-xr-x 6 td-agent td-agent 4096 Feb 6 00:17 ..
drwxr-xr-x 2 td-agent td-agent 4096 Feb 5 23:14 access.%Y-%m-%d.%H:%M:%S.log
Following the Fluentd docs, I was expecting this to be fairly straightforward, since the file output and buffering plugins are bundled with Fluentd's core.
Am I missing something trivial here?
I figured it out, and it works now. I had two outputs, one to file and another to stdout. Apparently that won't work if they're both defined separately in the config file, each with its own <match> ... </match>. I believe the stdout output appeared first in the config, so Fluentd routed events to it and not to the file output. Both should instead be nested under the copy output, like this:
<match>
@type copy
<store>
@type file
...
</store>
<store>
@type stdout
</store>
</match>
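For completeness, a sketch of what the combined configuration could look like, reusing the path and buffer settings from the question (the match pattern is illustrative, not from the original post):
<match **>
  @type copy
  <store>
    @type file
    path /var/log/td-agent/test/access.%Y-%m-%d.%H:%M:%S.log
    <buffer time>
      timekey 10s
      timekey_use_utc true
      timekey_wait 2s
      flush_interval 10s
    </buffer>
  </store>
  <store>
    @type stdout
  </store>
</match>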

Kubernetes node ulimit settings

I am running a Kubernetes v1.11.1 cluster. At some point my kube-apiserver started throwing 'too many open files' messages, and I noticed too many open TCP connections to the node kubelet port 10250.
My server is configured with 65536 file descriptors. Do I need to increase the number of open files for the container host? What are the recommended ulimit settings for the container host?
API server log messages:
I1102 13:57:08.135049 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
I1102 13:57:09.135191 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
I1102 13:57:10.135437 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
I1102 13:57:11.135589 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
I1102 13:57:12.135755 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
my host ulimit values:
# ulimit -a
-f: file size (blocks) unlimited
-t: cpu time (seconds) unlimited
-d: data seg size (kb) unlimited
-s: stack size (kb) 8192
-c: core file size (blocks) unlimited
-m: resident set size (kb) unlimited
-l: locked memory (kb) 64
-p: processes unlimited
-n: file descriptors 65536
-v: address space (kb) unlimited
-w: locks unlimited
-e: scheduling priority 0
-r: real-time priority 0
Thanks
SR
65536 seems a bit low, although there are many apps that recommend that number. This is what I have on one K8s cluster for the kube-apiserver:
# kubeapi-server-container
# |
# \|/
# ulimit -a
-f: file size (blocks) unlimited
-t: cpu time (seconds) unlimited
-d: data seg size (kb) unlimited
-s: stack size (kb) 8192
-c: core file size (blocks) unlimited
-m: resident set size (kb) unlimited
-l: locked memory (kb) 16384
-p: processes unlimited
-n: file descriptors 1048576 <====
-v: address space (kb) unlimited
-w: locks unlimited
-e: scheduling priority 0
-r: real-time priority 0
This differs from the system limits of a regular bash process:
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15447
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024 <===
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15447
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
And yet the total maximum for the whole system is:
$ cat /proc/sys/fs/file-max
394306
As you can see, nothing on the system can exceed /proc/sys/fs/file-max, so I would also check that value. I would also check the number of file descriptors in use (first column of /proc/sys/fs/file-nr), which will give you an idea of how many open files the system has:
$ cat /proc/sys/fs/file-nr
2176 0 394306
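If the limits do need to be raised, the usual knobs are the system-wide fs.file-max sysctl and the per-service LimitNOFILE of whatever systemd unit runs the kubelet or API server. A sketch under those assumptions (the unit name and values are illustrative, not a recommendation for this specific cluster):
# Raise the system-wide ceiling on open files
echo 'fs.file-max = 1048576' > /etc/sysctl.d/90-file-max.conf
sysctl --system

# Raise the per-process limit for a systemd-managed kubelet (illustrative unit name)
mkdir -p /etc/systemd/system/kubelet.service.d
cat <<'EOF' > /etc/systemd/system/kubelet.service.d/20-nofile.conf
[Service]
LimitNOFILE=1048576
EOF
systemctl daemon-reload
systemctl restart kubelet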

Openshift Monitoring - cAdvisor + Prometheus - Docker

I tried to implement a monitoring solution for an OpenShift cluster based on Prometheus + node-exporter + Grafana + cAdvisor.
I have a huge problem with the cAdvisor component. I have tried a lot of configurations (the changes are always to the volumes), but none of them work well: the container either restarts every ~2 minutes or does not collect all metrics (processes).
Here is an example configuration (with this config the container does not restart every 2 minutes, but it does not collect process metrics). I know I don't have /rootfs in the volumes, but when I add it the container runs for about 5 seconds and then goes down:
containers:
  - image: >-
      google/cadvisor@sha256:fce642268068eba88c27c666e92ed4144be6188447a23825015884741cf0e352
    imagePullPolicy: IfNotPresent
    name: cadvisor-new-version
    ports:
      - containerPort: 8080
        protocol: TCP
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
      - mountPath: '/sys/fs/cgroup/cpuacct,cpu'
        name: sys
        readOnly: true
      - mountPath: /var/lib/docker
        name: docker
        readOnly: true
      - mountPath: /var/run/containerd/containerd.sock
        name: docker-socketd
        readOnly: true
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: cadvisor-sa
serviceAccountName: cadvisor-sa
terminationGracePeriodSeconds: 300
volumes:
  - hostPath:
      path: '/sys/fs/cgroup/cpu,cpuacct'
    name: sys
  - hostPath:
      path: /var/lib/docker
    name: docker
  - hostPath:
      path: /var/run/containerd/containerd.sock
    name: docker-socketd
I use a service account in my OpenShift project with the privileged SCC.
OpenShift version: 3.6
Docker version: 1.12
cAdvisor version: I tried every one from v0.26.3 to the newest
I found a post saying that the problem may be the old Docker version; can anyone confirm this? Has anyone found the right configuration to run cAdvisor on OpenShift?
Example of the logs:
I0409 08:41:46.661453 1 manager.go:231] Version:
{KernelVersion:3.10.0-693.17.1.el7.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:1.12.6 DockerAPIVersion:1.24 CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
E0409 08:41:50.823560 1 factory.go:340] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
I0409 08:41:50.825280 1 factory.go:356] Registering Docker factory
I0409 08:41:50.826394 1 factory.go:54] Registering systemd factory
I0409 08:41:50.826949 1 factory.go:86] Registering Raw factory
I0409 08:41:50.827388 1 manager.go:1178] Started watching for new ooms in manager
I0409 08:41:50.838169 1 manager.go:329] Starting recovery of all containers
W0409 08:41:56.853821 1 container.go:393] Failed to create summary reader for "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc323db44_39a9_11e8_accd_005056800e7b.slice/docker-26db795af0fa28047f04194d8169cf0249edf2c918c583422a1404d35ed5b62c.scope": none of the resources are being tracked.
I0409 08:42:03.953261 1 manager.go:334] Recovery completed
I0409 08:42:37.874062 1 cadvisor.go:162] Starting cAdvisor version: v0.28.3-1e567c2 on port 8080
I0409 08:42:56.353574 1 fsHandler.go:135] du and find on following dirs took 1.20076874s: [ /rootfs/var/lib/docker/containers/2afa2c457a9c1769feb6ab542102521d8ad51bdeeb89581e4b7166c1c93e7522]; will not log again for this container unless duration exceeds 2s
I0409 08:42:56.453602 1 fsHandler.go:135] du and find on following dirs took 1.098795382s: [ /rootfs/var/lib/docker/containers/65e4ad3536788b289e2b9a29e8f19c66772b6f38ec10d34a2922e4ef4d67337f]; will not log again for this container unless duration exceeds 2s
I0409 08:42:56.753070 1 fsHandler.go:135] du and find on following dirs took 1.400184357s: [ /rootfs/var/lib/docker/containers/2b0aa12a43800974298a7d0353c6b142075d70776222196c92881cc7c7c1a804]; will not log again for this container unless duration exceeds 2s
I0409 08:43:00.352908 1 fsHandler.go:135] du and find on following dirs took 1.199079344s: [ /rootfs/var/lib/docker/containers/aa977c2cc6105e633369f48e2341a6363ce836cfbe8e7821af955cb0cf4d5f26]; will not log again for this container unless duration exceeds 2s
There's a cAdvisor process embedded in OpenShift's kubelet. Maybe there's a race condition that makes the pod crash.
I'm seeing something similar in a three-node Docker swarm, where cAdvisor on one node - and only that one - keeps dying after a few minutes. I've watched the process and looked at its resource usage: it's running out of memory.
I set a 128 MB limit, but I've tried higher limits as well. That just buys it more time; even at 500 MB it soon died because it ran out of memory.
The only thing that seems abnormal is the same "du and find on following dirs took" messages:
I0515 15:14:37.109399 1 fsHandler.go:135] du and find on following dirs took 46.19060577s: [/rootfs/var/lib/docker/aufs/diff/69a2bd344a635cde23e6c27a69c165ed001178a9093964d73bebdbb81d90369b /rootfs/var/lib/docker/containers/6fd8113e383f78e20608be807a38e17b14715636b94aa99112dd6d7208764a2e]; will not log again for this container unless duration exceeds 5s
I0515 15:14:35.511417 1 fsHandler.go:135] du and find on following dirs took 58.306835696s: [/rootfs/var/lib/docker/aufs/diff/bed9b7ad307f36ae97659b79912ff081f5b64fb8d57d6a48f143cd3bf9823e64 /rootfs/var/lib/docker/containers/108f4b879f7626023be8790af33ad6b73189a27e7c9bb7d6f219521d43099bbe]; will not log again for this container unless duration exceeds 5s
I0515 15:14:47.513604 1 fsHandler.go:135] du and find on following dirs took 45.911742867s: [/rootfs/var/lib/docker/aufs/diff/c9989697f40789a69be47511c2b931f8949323d144051912206fe719f12e127d /rootfs/var/lib/docker/containers/4cd1baa15522b58f61e9968c1616faa426fb3dfd9ac8515896dcc1ec7a0cb932]; will not log again for this container unless duration exceeds 5s
I0515 15:14:49.210788 1 fsHandler.go:135] du and find on following dirs took 46.406268577s: [/rootfs/var/lib/docker/aufs/diff/7605c354c073800dcbb14df16da4847da3d70107509d27f8f1675aab475eb0df /rootfs/var/lib/docker/containers/00f37c6569bb29c028a90118cf9d12333907553396a95390d925a4c2502ab058]; will not log again for this container unless duration exceeds 5s
I0515 15:14:45.614715 1 fsHandler.go:135] du and find on following dirs took 1m1.573576904s: [/rootfs/var/lib/docker/aufs/diff/62d99773c5d1be97863f90b5be03eb94a4102db4498931863fa3f5c677a06a06 /rootfs/var/lib/docker/containers/bf3e2d8422cda2ad2bcb433e30b6a06f1c67c3a9ce396028cdd41cce3b0ad5d6]; will not log again for this container unless duration exceeds 5s
What's interesting is that it starts out taking only a couple of seconds:
I0515 15:09:48.710609 1 fsHandler.go:135] du and find on following dirs took 1.496309475s: [/rootfs/var/lib/docker/aufs/diff/a11190ca4731bbe6d9cbe1a2480e781490dc4e0e6c91c404bc33d37d7d251564 /rootfs/var/lib/docker/containers/d0b45858ae55b6613c4ecabd8d44e815c898bbb5ac5c613af52d6c1f4804df76]; will not log again for this container unless duration exceeds 2s
I0515 15:09:49.909390 1 fsHandler.go:135] du and find on following dirs took 1.29921035s: [/rootfs/var/lib/docker/aufs/diff/62d99773c5d1be97863f90b5be03eb94a4102db4498931863fa3f5c677a06a06 /rootfs/var/lib/docker/containers/bf3e2d8422cda2ad2bcb433e30b6a06f1c67c3a9ce396028cdd41cce3b0ad5d6]; will not log again for this container unless duration exceeds 2s
I0515 15:09:51.014721 1 fsHandler.go:135] du and find on following dirs took 1.502355544s: [/rootfs/var/lib/docker/aufs/diff/5264e7a8c3bfb2a4ee491d6e42e41b3300acbcf364455698ab232c1fc9e8ab4e /rootfs/var/lib/docker/containers/da355f40535a001c5ba0e16da61b6340028b4e432e0b2f14b8949637559ff001]; will not log again for this container unless duration exceeds 2s
I0515 15:09:53.309486 1 fsHandler.go:135] du and find on following dirs took 2.19038347s: [/rootfs/var/lib/docker/aufs/diff/8b0fd9287d107580b76354851b75c09ce47e114a70092305d42f8c2b5f5e23b2 /rootfs/var/lib/docker/containers/5fd8ac9fd8d98d402851f2642266ca89598a964f50cfabea9bdf50b87f7cff66]; will not log again for this container unless duration exceeds 2s
So something seems to be getting progressively worse until the container dies.
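For comparison, the volume layout from cAdvisor's own run instructions mounts the host root, /var/run, /sys and /var/lib/docker. A sketch of that layout is below; it has not been verified on OpenShift 3.6, and the --docker_only flag is an assumption worth testing, since it limits how much of the raw filesystem cAdvisor walks, which is what those long du and find runs point at:
containers:
  - name: cadvisor
    image: google/cadvisor:v0.28.3
    args:
      # limit cAdvisor to Docker containers; reduces raw-filesystem scanning
      - --docker_only=true
    volumeMounts:
      - mountPath: /rootfs
        name: rootfs
        readOnly: true
      - mountPath: /var/run
        name: var-run
        readOnly: true
      - mountPath: /sys
        name: sys
        readOnly: true
      - mountPath: /var/lib/docker
        name: docker
        readOnly: true
volumes:
  - name: rootfs
    hostPath:
      path: /
  - name: var-run
    hostPath:
      path: /var/run
  - name: sys
    hostPath:
      path: /sys
  - name: docker
    hostPath:
      path: /var/lib/docker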
