How to parse kubelet log with Fluentd

The original kubelet log looks like this:
I0605 09:03:41.463195 28799 setters.go:72] Using node IP: "10.127.7.174"
I can parse it in fluentd as:
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
However, Kubespray deploys kubelet as follows:
1. journald collects the kubelet log;
2. I write an rsyslog config file, so the kubelet log is stored in /var/log/kubelet.log.
And the log changes to:
Jun 5 09:03:41 k8s-4 kubelet: I0605 09:03:41.463195 28799 setters.go:72] Using node IP: "10.127.7.174"
I wonder how to parse this.

I've tried to parse your example log line and used the following regexp to achieve the result:
format /(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
This splits the keys as follows, as per the Fluentular output:
time 2019/06/06 08:19:35 +0000
host k8s-4
ident kubelet
message I0605 09:03:41.463195 28799 setters.go:72] Using node IP: "10.127.7.174"
To learn more about Fluentd regexp parsing, just read the documentation.
FYI, there is also the option of capturing logs from systemd directly via fluent-plugin-systemd.
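A minimal sketch of such a systemd source, assuming the journal lives at /var/log/journal (the tag and cursor path are arbitrary):
<source>
  @type systemd
  tag kubelet
  path /var/log/journal
  matches [{ "_SYSTEMD_UNIT": "kubelet.service" }]
  read_from_head true
  <storage>
    @type local
    persistent true
    path /var/log/fluentd-journald-kubelet-cursor.json
  </storage>
</source>
This reads the kubelet unit's records straight from journald, so the rsyslog hop to /var/log/kubelet.log is no longer needed.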

Related

How to parse kubernetes logs with Fluentd

I run a few services in an EKS cluster, and I want the logs from one of my services to be parsed.
kubectl logs "pod_name" --> these are the logs when I check the pod directly:
2022-09-21 10:44:26,434 [springHikariCP housekeeper ] DEBUG HikariPool - springHikariCP - Fill pool skipped, pool is at sufficient level.
2022-09-21 10:44:36,316 [springHikariCP housekeeper ] DEBUG HikariPool - springHikariCP - Before cleanup stats (total=10, active=0, idle=10, waiting=0)
This service uses Java-based logging (Apache Commons Logging), and in Kibana the whole log message is currently displayed as a single field: date and time + log level + message.
Is it possible for this whole log to be parsed into separate fields (date and time + log level + message) and displayed in Kibana that way?
This is my fluentd config file for the source and pattern:
<source>
  @type tail
  path /var/log/containers/*background-executor*.log
  pos_file fluentd-docker.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_type string
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      keep_time_key false
    </pattern>
    <pattern>
      format regexp
      expression /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{3})\s+(?<level>[^\s]+)\s+(?<pid>\d+).*?\[\s+(?<thread>.*)\]\s+(?<class>.*)\s+:\s+(?<message>.*)/
      time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
      keep_time_key false
    </pattern>
  </parse>
</source>
You just have to update the filter as per your needs:
<filter **>
  @type record_transformer
  enable_ruby
  <record>
    foo "bar"
    KEY "VALUE"
    podname "${record['tailed_path'].to_s.split('/')[-3]}"
    test "passed"
    time "${record['message'].match('[0-9]{2}\\/[A-Z][a-z]{2}\\/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}\\s+[0-9]{4}').to_s}"
  </record>
</filter>
You have to parse the record and message with character classes like [0-9] or [A-Z], the same way as shown in the example above.
Edit the filter.conf.
You can create your own key and value; in the value you parse the field, and Fluentd will populate the value.
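If the goal is specifically to split those HikariCP lines into separate time, thread, level, and message fields, a parser filter along these lines could work. This is a sketch: the kubernetes.** match and the log key are assumptions (container runtimes differ in whether the line lands in log or message), and the regexp is only checked against the two sample lines above.
<filter kubernetes.**>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type regexp
    expression /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?<thread>[^\]]+)\]\s+(?<level>\w+)\s+(?<message>.*)$/
    time_key time
    time_format %Y-%m-%d %H:%M:%S,%L
  </parse>
</filter>
With reserve_data true the existing record fields are kept alongside the newly extracted ones, so Kibana can filter on level directly.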

Fluent-bit Filter seems to only work when Match is *

I'm using docker-compose, which generates over 20 services. Most of them are similar, but they parse different datetime formats or their values differ slightly. My logging idea is to log everything to systemd and then pick it up with fluent-bit.
Most of the services in docker-compose look something like this (the tag prefix gets a different name depending on the parser I will want to use later):
A-service:
  image: A-service
  restart: always
  network_mode: host
  depends_on:
    - kafka
    - schema-registry
  environment:
    - KAFKA_BROKERS=127.0.0.1:9092
    - SCHEMA_REGISTRY_URL=127.0.0.1:8081
  logging:
    driver: journald
    options:
      tag: "dockerC/{{.ImageName}}/{{.Name}}/{{.ID}}"
B-service:
  image: B-service
  restart: always
  network_mode: host
  depends_on:
    - kafka
    - schema-registry
  environment:
    - KAFKA_BROKERS=127.0.0.1:9092
    - SCHEMA_REGISTRY_URL=127.0.0.1:8081
  logging:
    driver: journald
    options:
      tag: "dockerJ/{{.ImageName}}/{{.Name}}/{{.ID}}"
fluent-bit.conf:
[SERVICE]
    Flush           5
    Daemon          Off
    Log_Level       info
    parsers_file    parsers.conf
[INPUT]
    Name            systemd
    Tag             *
    Path            /run/log/journal
    Systemd_Filter  _SYSTEMD_UNIT=docker.service
    Systemd_Filter  _SYSTEMD_UNIT=kubelet.service
[FILTER]
    Name            parser
    Parser          dockerJ
    Match           dockerJ*
    Key_Name        MESSAGE
    Reserve_Data    On
    Preserve_Key    On
[FILTER]
    Name            parser
    Parser          dockerC
    Match           dockerC*
    Key_Name        MESSAGE
    Reserve_Data    On
    Preserve_Key    On
[OUTPUT]
    Name            es
    Match           *
    Index           fluent_bit
    Type            json
    Retry_Limit     false
    Host            ${ELASTICSEARCH_HOST}
    Port            ${ELASTICSEARCH_PORT}
    HTTP_User       ${ELASTICSEARCH_USERNAME}
    HTTP_Passwd     ${ELASTICSEARCH_PASSWORD}
    tls             off
    tls.verify      Off
parsers.conf:
[PARSER]
    Name            dockerJ
    Format          json
    Time_Key        timeStamp
    Time_Format     %Y-%m-%dT%H:%M:%S.%L
    Time_Keep       On
    # Command       | Decoder      | Field   | Optional Action
    # ==============|==============|=========|=================
    Decode_Field_As   escaped_utf8   MESSAGE   do_next
    Decode_Field      json           MESSAGE
[PARSER]
    Name            dockerC
    Format          json
    Time_Key        time
    Time_Format     %Y/%m/%d %H:%M:%S.%L
    Time_Keep       On
    # Command       | Decoder      | Field   | Optional Action
    # ==============|==============|=========|=================
    Decode_Field_As   escaped_utf8   MESSAGE   do_next
    Decode_Field      json           MESSAGE
If I change the filter Match values:
Match dockerC* -> Match *
Match dockerJ* -> Match *
everything matches and the JSON gets parsed into ES without any problem, but the different time formats then cause trouble: either later in Elasticsearch, or as a fluent-bit invalid time format error.
I could make like 8 different [INPUT] sections with different tags, but that seems like a waste of resources.
So my question is: how do I actually use tags/filters and route messages based on tags that are set outside the scope of fluent-bit (as in this case, in docker-compose.yml)?
It turns out Systemd_Filter _SYSTEMD_UNIT=docker.service sets the tag to "docker.service", not the tag field I was expecting. To use the tag I want, I have to rewrite every tag manually, which is achievable by adding a rewrite_tag filter:
[FILTER]
    Name          rewrite_tag
    Match         docker.service*
    Rule          $NAME_OF_THE_TAG_KEY .* $NAME_OF_THE_TAG_KEY false
    Emitter_Name  re_emitted
Since I want to rewrite every record, I just used the .* regex, which matches anything.
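Concretely, assuming the docker journald logging driver stores the compose tag in the CONTAINER_TAG journal field, the rule can reference that field directly (a sketch, untested):
[FILTER]
    Name          rewrite_tag
    Match         docker.service*
    Rule          $CONTAINER_TAG ^(dockerC|dockerJ) $CONTAINER_TAG false
    Emitter_Name  re_emitted
After this, records are re-emitted with tags like dockerC/... and dockerJ/..., so the existing Match dockerC* and Match dockerJ* parser filters apply as originally intended.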

Ingest logs as JSON in Container-Optimized OS

I am able to ingest logs into the Google Log Viewer from Container-Optimized OS with the help of the Stackdriver logging agent. With the default configuration, though, it ingests each log line as a string value of message, not as a JSON payload.
What have I tried?
I have changed the fluentd config in /etc/stackdriver/logging.config.d/fluentd-lakitu.conf to the following:
<source>
  @type tail
  format json
  path /var/lib/docker/containers/*/*.log
  <parse>
    @type json
  </parse>
  pos_file /var/log/google-fluentd/containers.log.pos
  tag reform_contain
  read_from_head true
</source>
But it's unable to send logs to the Log Viewer.
OS: Container Optimized OS cos-81-12871-1196-0
I've found this issue on Google's Public Issue Tracker which discusses the same problem you mentioned in your use case. The Google product team has been notified about this limitation and is working on it. You just have to go there and click the star next to the title, so that you get updates on the issue and give it more visibility.
As @Kamelia Y mentioned regarding https://issuetracker.google.com/issues/137517429, there is a workaround mentioned there:
<filter cos_containers.**>
  @type parser
  format json
  key_name message
  reserve_data false
  emit_invalid_record_to_error false
</filter>
The above snippet parses the message field as JSON and ingests the result into Cloud Logging.
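For illustration with a hypothetical record, the filter replaces the stringified payload with its parsed fields (reserve_data false drops the original message key):
# incoming record (hypothetical)
{"message": "{\"level\":\"info\",\"event\":\"started\"}"}
# after the filter
{"level": "info", "event": "started"}
Cloud Logging then receives a structured jsonPayload instead of a plain text message.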
In this discussion in the Google Groups on Stackdriver, we discussed how to apply it with a startup script.
Here is the snippet for the startup script:
cp /etc/stackdriver/logging.config.d/fluentd-lakitu.conf /etc/stackdriver/logging.config.d/fluentd-lakitu.conf-save
# Shorter version of the above: cp /etc/stackdriver/logging.config.d/fluentd-lakitu.conf{,-save}
(
# keep everything except the last two lines of the saved config, then append the filter below
head -n -2 /etc/stackdriver/logging.config.d/fluentd-lakitu.conf-save; cat <<EOF
<filter cos_containers.**>
#type parser
format json
key_name message
reserve_data false
emit_invalid_record_to_error false
</filter>
EOF
) > /etc/stackdriver/logging.config.d/fluentd-lakitu.conf
sudo systemctl start stackdriver-logging
This image can be used to generate random JSON logs.
https://hub.docker.com/repository/docker/patelathreya/json-random-logger

Is it possible to prepend and append to fluentd events?

I'm currently using fluentd to stream a third party's application logs to stdout.
The log received by fluentd is:
Jun 12, 2020 11:40:00 PM UTC INFO [com.app.purge.PurgeManager run] PURGE: appAtom purge local data complete
Essentially, I want to be able to manipulate this log entry to become:
[LOG_START] [APP_LOG] Jun 12, 2020 11:40:00 PM UTC INFO [com.app.purge.PurgeManager run] PURGE: appAtom purge local data complete [LOG_END]
I went through a lot of the plugins in the fluentd documentation but couldn't find anything that does this.
Fluentd configuration:
<source>
  @type tail
  path "path/Molecule/logs/*.shared_http_server.deployment.log"
  pos_file "path/fluentd/access.pos"
  tag app.access
  read_from_head true
  refresh_interval 1s
  <parse>
    @type none
  </parse>
</source>
<match app.access>
  @type stdout
  <format>
    @type single_value
  </format>
</match>
Any help would be appreciated. Thanks.
You can use the fluentd record_transformer plugin to prepend or append any string to your log record. Quoting this link from the fluentd docs:
<filter foo.bar>
  @type record_transformer
  <record>
    message yay, ${record["message"]}
  </record>
</filter>
An input like {"message":"hello world!"} is transformed into {"message":"yay, hello world!"}
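Adapted to your configuration, a sketch (assuming the none parser keeps the raw line under its default message key):
<filter app.access>
  @type record_transformer
  <record>
    message [LOG_START] [APP_LOG] ${record["message"]} [LOG_END]
  </record>
</filter>
Placed between your source and match blocks, this prepends and appends the markers before the single_value formatter writes the line to stdout.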

fluentd failed loading custom parser

Fluentd fails to run with my custom parser.
My parser is named dn_multiline_parser and is located at lib/fluent/plugin/parser_dn_log.rb:
require 'fluent/plugin/parser'
# default format: "%time [%level] %message"
module Fluent
  module Plugin
    class DnLogParser < Parser
      Plugin.register_parser('dn_multiline_parser', self)
      # ... (parser impl)
    end
  end
end
fluentd.conf:
<worker 0>
  <source>
    @type tail
    # path /external/core/traces/*/*
    path /core/dn_gen*
    path_key logfile
    pos_file /tmp/traces.position.traces
    tag dn.traces
    rotate_wait 5
    read_from_head true
    multiline_flush_interval 5s
    read_lines_limit 100000
    refresh_interval 1
    exclude_path ["*.gz","*.zip","*.backup"]
    <parse>
      @type dn_multiline_parser
      time_key dn_timestamp
      log_format "%t [%l] %m"
      time_length 23
      time_format %Y-%m-%d %H:%M:%S
    </parse>
  </source>
</worker>
... (rest of configuration)
When I run fluentd, I get:
2019-12-23 14:32:24 +0000 [error]: config error file="/etc/fluent/fluent.conf" error_class=Fluent::ConfigError error="Unknown parser plugin 'dn_multiline_parser'. Run 'gem search -rd fluent-plugin' to find plugins"
What I've tried:
1. Running fluentd with '-p /path/to/plugin_folder'; it failed with the above error.
2. Building a gem, installing it, and running fluentd without '-p'; it also failed with the above error (my plugin shows up in 'gem list' output).
Any ideas what I am doing wrong?
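For reference, one likely cause (an assumption on my part, based on how fluentd resolves plugin types): fluentd looks up @type dn_multiline_parser by requiring a file named fluent/plugin/parser_dn_multiline_parser.rb, both from plugin folders given with '-p' and from installed gems, so a parser registered under a name that does not match its filename is never found. A minimal sketch of the rename:
# lib/fluent/plugin/parser_dn_multiline_parser.rb
# (filename must be parser_<registered type>.rb for lookup to succeed)
require 'fluent/plugin/parser'

module Fluent
  module Plugin
    class DnLogParser < Parser
      Plugin.register_parser('dn_multiline_parser', self)
      # ... (parser impl)
    end
  end
end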
