Multiple filter and formatting issue - fluentd

I'm trying to send logs from td-agent to Datadog using the configuration below. I want to keep only the records that contain certain keywords and format those logs as CSV. How can I do this?
I tried combining the grep plugin and a format section inside the filter, as shown below, but it doesn't work as expected. The current and expected output are shown in the pictures below. How can I solve this?
<source>
  @type syslog
  port 8888
  tag rsyslog
</source>
<filter rsyslog.**>
  @type grep
  <regexp>
    key message
    pattern /COMMAND/
  </regexp>
  <format>
    @type csv
    fields hostname,from,to
  </format>
</filter>
<match rsyslog.**>
  @type datadog
  @id awesome_agent
  api_key xxxxxxxxxx
</match>
[Image: current output]
[Image: expected output]
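One possible reading of the problem, offered as a hedged sketch rather than a confirmed fix: the grep filter only selects records and does not take a <format> section, while CSV formatting is done by formatter plugins inside outputs that support them. The untested example below keeps the grep filter as is and moves the CSV formatting into a stdout output so the result can be inspected locally. Whether the Datadog output honours a <format> section is not established here, and the fields hostname, from and to are assumed to already exist in each record.

```
<filter rsyslog.**>
  @type grep
  # keep only records whose "message" field contains COMMAND
  <regexp>
    key message
    pattern /COMMAND/
  </regexp>
</filter>

<match rsyslog.**>
  # stdout is used here only to inspect the formatted result locally
  @type stdout
  <format>
    @type csv
    # assumption: these keys are present in every record
    fields hostname,from,to
  </format>
</match>
```

If the records only carry the default syslog keys, the listed fields would first have to be extracted from message, for example with a parser filter.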

Related

How to parse kubernetes logs with Fluentd

I run a few services in an EKS cluster and want the logs from one of them to be parsed.
kubectl logs "pod_name" shows these logs when I check the pod directly:
2022-09-21 10:44:26,434 [springHikariCP housekeeper ] DEBUG HikariPool - springHikariCP - Fill pool skipped, pool is at sufficient level.
2022-09-21 10:44:36,316 [springHikariCP housekeeper ] DEBUG HikariPool - springHikariCP - Before cleanup stats (total=10, active=0, idle=10, waiting=0)
This service uses Java-based logging (Apache Commons Logging), and at the moment Kibana displays the whole log line (date and time + log level + message) as a single message:
Is it possible to parse this log into separate fields (date and time, log level, message) and display them that way in Kibana?
This is my fluentd config file for the source and pattern:
<source>
  @type tail
  path /var/log/containers/*background-executor*.log
  pos_file fluentd-docker.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_type string
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      keep_time_key false
    </pattern>
    <pattern>
      format regexp
      expression /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{3})\s+(?<level>[^\s]+)\s+(?<pid>\d+).*?\[\s+(?<thread>.*)\]\s+(?<class>.*)\s+:\s+(?<message>.*)/
      time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
      keep_time_key false
    </pattern>
  </parse>
</source>
You just have to update the filter as needed:
<filter **>
  @type record_transformer
  enable_ruby
  <record>
    foo "bar"
    KEY "VALUE"
    podname "${record['tailed_path'].to_s.split('/')[-3]}"
    test "passed"
    time "${record['message'].match('[0-9]{2}\\/[A-Z][a-z]{2}\\/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}\\s+[0-9]{4}').to_s}"
  </record>
</filter>
You have to parse the record's message field with a regular expression built from character classes such as [0-9] or [A-Z], in the same way as shown in the example above.
Edit filter.conf accordingly.
You can create your own keys and values; in each value you write an expression that parses the field, and Fluentd will populate the value.
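As an alternative to extracting each field with Ruby expressions, the sample lines in the question could be split with a parser filter. The following is an untested sketch: the regular expression is written against the two sample lines shown in the question, and key_name log is an assumption that holds only if the container runtime writes Docker JSON logs (with CRI-formatted logs the text would normally be in message instead).

```
<filter kubernetes.**>
  @type parser
  # assumption: the raw application line is stored under "log"
  key_name log
  # keep the other keys of the original record
  reserve_data true
  <parse>
    @type regexp
    # 2022-09-21 10:44:26,434 [thread name ] LEVEL message
    expression /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?<thread>[^\]]*)\]\s+(?<level>[A-Z]+)\s+(?<message>.*)$/
    time_key time
    time_format %Y-%m-%d %H:%M:%S,%L
  </parse>
</filter>
```

With this in place each record would carry separate thread, level and message fields, and the event time would be taken from the parsed timestamp.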

FluentD How to ignore pattern not match log not to forward to endpoint

We have a requirement to forward only logs containing a specific string to the Kibana endpoint/console. Currently we get "pattern not match" warnings for lines where the string is not found. How can we ignore those lines so that they are not sent to the forwarder and only matching logs are sent?
<source>
  @type tail
  path session.txt
  pos_file session.txt.pos
  tag sessionlog
  <parse>
    @type regexp
    expression /^\<#\>\s+(?<time>\w+/\w+/\w+\s+[:0-9]+)\s+(?<hostname>[-0-9A-Z]+)\s+(?<message>.*Clip.*)$/
  </parse>
</source>
<match sessionlog>
  @type stdout
</match>
<#> 2019/11/16 13:56:33 ABC-Hostanme 278424 Dispatcher_1 Msg [Unit1] error emitted: '404'from session start: 2021-11-16T08:54:01
<#> 2019/11/16 13:56:33 ABC-Hostanme 278424 Dispatcher_1 Msg [Unit1] clip result: a1=0, a2=217, a3=152475, a4=148692
Result:
[warn]: #0 pattern not match: <#> 2019/11/16 13:56:33 ABC-Hostanme 278424 Dispatcher_1 Msg [Unit1] error emitted: '404'from session start: 2021-11-16T08:54:01
sessionlog: {"hostname":"DESKTOP-3JOOBVV","message":"278424 Dispatcher_1 Msg [Unit1] clip result: a1=0, a2=217, a3=152475, a4=148692"}
We want to get only the logs that match the pattern.
@sunshine, if the regexp parser cannot match a log line, it emits that warning. So it's recommended that every log line passing through the regexp parser can be matched by the expression. I recommend using the grep filter before the regexp parser to avoid those "pattern not match" logs from Fluentd.
I've pasted an example below, but you can also use <exclude> blocks in the grep filter. See here for more info and examples: https://docs.fluentd.org/filter/grep
<source>
  @type tail
  path session.txt
  pos_file session.txt.pos
  tag sessionlog
  # in_tail needs a <parse> section; "none" keeps each line in the "message" field
  <parse>
    @type none
  </parse>
</source>
<filter sessionlog>
  @type grep
  <regexp>
    key message
    pattern /INCLUDE_PATTERN_HERE/
  </regexp>
</filter>
<filter sessionlog>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /^\<#\>\s+(?<time>\w+/\w+/\w+\s+[:0-9]+)\s+(?<hostname>[-0-9A-Z]+)\s+(?<message>.*Clip.*)$/
  </parse>
</filter>
<match sessionlog>
  @type stdout
</match>
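For the <exclude> variant mentioned above, a minimal untested sketch could look like the following. The pattern error emitted is taken from the first sample line in the question and is only an illustration; any record whose message matches it is dropped before it reaches the parser.

```
<filter sessionlog>
  @type grep
  <exclude>
    key message
    # illustrative pattern, based on the sample line that failed to parse
    pattern /error emitted/
  </exclude>
</filter>
```

Using <regexp> keeps only what matches, while <exclude> drops what matches, so the choice depends on whether the wanted or the unwanted lines are easier to describe.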
@renegaderyu's answer is a very clear solution. Fluentd, however, offers a less verbose, built-in solution: set emit_invalid_record_to_error to false inside the <filter> in which you parse. Note that this option only works in a <filter> and has no effect within a <source>.
<source>
  @type tail
  path session.txt
  pos_file session.txt.pos
  tag sessionlog
  # in_tail needs a <parse> section; "none" keeps each line in the "message" field
  <parse>
    @type none
  </parse>
</source>
<filter sessionlog>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /^\<#\>\s+(?<time>\w+/\w+/\w+\s+[:0-9]+)\s+(?<hostname>[-0-9A-Z]+)\s+(?<message>.*Clip.*)$/
  </parse>
  emit_invalid_record_to_error false
</filter>
<match sessionlog>
  @type stdout
</match>

Fluentd - How to parse logs whose messages are JSON formatted parsed AND whose messages are in text; as is without getting lost due to parse error

Certain log messages from certain services are in JSON format, and this fluentd configuration parses them properly. However, it discards all logs from other components whose message field is not valid JSON.
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
  exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
  read_from_head true
  #https://github.com/fluent/fluentd-kubernetes-daemonset/issues/434#issuecomment-752813739
  #<parse>
  # @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
  # time_format %Y-%m-%dT%H:%M:%S.%NZ
  #</parse>
  #https://github.com/fluent/fluentd-kubernetes-daemonset/issues/434#issuecomment-831801690
  <parse>
    @type cri
    <parse> # this will parse the nested fields properly - like message in JSON; but if message is not in JSON then this is lost
      @type json
    </parse>
  </parse>
  #emit_invalid_record_to_error # when nested logging fails, see if we can parse via JSON
  #tag backend.application
</source>
But all other messages that are not valid JSON are lost.
If I comment out the nested parse section inside the cri parser, I get all logs, but logs whose messages are in JSON format are not parsed further, especially the severity field. See the last two lines in the screenshot below.
<parse>
  @type cri
</parse>
To overcome this, I tried to use the @ERROR label: if nested parsing fails for logs whose message is not in JSON format, I still need to see the pod name, other details, and the message as text in Kibana. However, with the config below, only logs whose message is valid JSON are parsed.
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
  exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
  read_from_head true
  #https://github.com/fluent/fluentd-kubernetes-daemonset/issues/434#issuecomment-752813739
  #<parse>
  # @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
  # time_format %Y-%m-%dT%H:%M:%S.%NZ
  #</parse>
  #https://github.com/fluent/fluentd-kubernetes-daemonset/issues/434#issuecomment-831801690
  <parse>
    @type cri
    <parse> # this will parse the nested fields properly - like message in JSON; but if message is not in JSON then this is lost
      @type json
    </parse>
  </parse>
  #emit_invalid_record_to_error # when nested logging fails, see if we can parse via JSON
  #tag backend.application
</source>
<label @ERROR> # when nested logs fail this is not working
  <filter **>
    @type parser
    key_name message
    <parse>
      @type none
    </parse>
  </filter>
  <match kubernetes.var.log.containers.elasticsearch-kibana-**> #ignore from this container
    @type null
  </match>
</label>
How do I get logs whose messages are JSON parsed, and logs whose messages are plain text passed through as is, without losing any?
Config here (last three commits): https://github.com/alexcpn/grpc_templates.git
One way to solve this issue is to prepare the logs before parsing them with the cri plugin. To do so, perform the following steps:
1. Collect container logs and tag them with a given tag.
2. Classify the logs into JSON and non-JSON logs using rewrite_tag_filter and a regex.
3. Parse JSON logs with cri.
4. Parse non-JSON logs.
Example config (not tested):
## collect raw logs from files
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
  read_from_head true
  format json
</source>
# add metadata to the records (container_name, image etc..)
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>
# classify the logs to different categories
<match kubernetes.**>
  @type rewrite_tag_filter
  <rule>
    key message
    pattern /^\{.+\}$/
    tag json.${tag}
  </rule>
  <rule>
    key message
    pattern /^\{.+\}$/
    tag nonejson.${tag}
    invert true
  </rule>
</match>
# filter or match logs that match the json tag
<filter json.**>
</filter>
<match json.**>
</match>
# filter or match logs that match the none json tag
<filter nonejson.**>
</filter>
<match nonejson.**>
</match>
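The empty <filter json.**> placeholder above could be filled in along these lines. This is an untested sketch that assumes the JSON text is stored under the message key, as in the rewrite rules; records tagged nonejson.** would simply skip this filter and keep their message as plain text.

```
<filter json.**>
  @type parser
  # assumption: the JSON string lives in the "message" field
  key_name message
  # keep the other keys of the original record
  reserve_data true
  # drop the raw JSON string once it has been parsed
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>
```

Because only records that already look like JSON reach this filter, parse failures should be rare, which is the point of classifying the logs first.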

Ingest logs as JSON in Container Optimized OS

I am able to ingest logs from Container Optimized OS into the Google Logs Viewer with the help of the Stackdriver logging agent, and I want them ingested as JSON.
With the default configuration, it ingests each log line as the value of message, not as a JSON payload.
What have I tried?
I changed the fluentd config in /etc/stackdriver/logging.config.d/fluentd-lakitu.conf to the following:
<source>
  @type tail
  format json
  path /var/lib/docker/containers/*/*.log
  <parse>
    @type json
  </parse>
  pos_file /var/log/google-fluentd/containers.log.pos
  tag reform_contain
  read_from_head true
</source>
But it is unable to send logs to the Logs Viewer.
OS: Container Optimized OS cos-81-12871-1196-0
I found an issue on Google's Public Issue Tracker that discusses the same problem you describe. The Google product team has been notified about this limitation and is working on it. You can click the star next to the issue title to receive updates and give the issue more visibility.
As @Kamelia Y mentioned, the issue is https://issuetracker.google.com/issues/137517429
It mentions the following workaround:
<filter cos_containers.**>
  @type parser
  format json
  key_name message
  reserve_data false
  emit_invalid_record_to_error false
</filter>
The above snippet parses the logs as JSON and ingests them into Cloud Logging.
In a discussion in the Stackdriver Google Group, we discussed how to use it with a startup script.
Here is the snippet for the startup script:
cp /etc/stackdriver/logging.config.d/fluentd-lakitu.conf /etc/stackdriver/logging.config.d/fluentd-lakitu.conf-save
# Shorter version of the above: cp /etc/stackdriver/logging.config.d/fluentd-lakitu.conf{,-save}
(
head -n -2 /etc/stackdriver/logging.config.d/fluentd-lakitu.conf-save; cat <<EOF
<filter cos_containers.**>
@type parser
format json
key_name message
reserve_data false
emit_invalid_record_to_error false
</filter>
EOF
) > /etc/stackdriver/logging.config.d/fluentd-lakitu.conf
sudo systemctl start stackdriver-logging
This image can be used to generate random JSON logs.
https://hub.docker.com/repository/docker/patelathreya/json-random-logger

Fluentd log source

I'm using Fluentd to ship two types of logs to an Elasticsearch cluster (application logs and other logs).
The logs are located in the same folder, /var/log/containers/, and share the same name format, e.g. app-randomtext.log, dts-randomtext.log, etc.
I'd like to assign different indices to them to separate app logs from any others that are present now or will appear in this folder.
Here is my attempt at a wildcard for "path" in the <source> block, but it doesn't work. Could anybody point out my mistake? Thanks
##source for app logs
<source>
  @type tail
  path /var/log/containers/app*.log
  pos_file /var/log/fluentd-containers-app.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag app.*
  keep_time_key true
  format json
</source>
##source for everything else
<source>
  @type tail
  path /var/log/containers/!(app*.log)
  pos_file /var/log/fluentd-containers-non-app.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag non-app.*
  keep_time_key true
  format json
</source>
<match app.**>
  @type "aws-elasticsearch-service"
  type_name "kube-fluentd-aws-es"
  index_name app
  include_tag_key true
  tag_key "@log_name"
  @log_level info
  <endpoint>
    url "#{ENV['ELASTICSEARCH_URL']}"
    region "#{ENV['ELASTICSEARCH_REGION']}"
    access_key_id "#{ENV['ELASTICSEARCH_ACCESS_KEY']}"
    secret_access_key "#{ENV['ELASTICSEARCH_SECRET_KEY']}"
  </endpoint>
</match>
<match non-app.**>
  @type "aws-elasticsearch-service"
  type_name "kube-fluentd-aws-es"
  index_name non-app
  include_tag_key true
  tag_key "@log_name"
  @log_level info
  <endpoint>
    url "#{ENV['ELASTICSEARCH_URL']}"
    region "#{ENV['ELASTICSEARCH_REGION']}"
    access_key_id "#{ENV['ELASTICSEARCH_ACCESS_KEY']}"
    secret_access_key "#{ENV['ELASTICSEARCH_SECRET_KEY']}"
  </endpoint>
</match>
I expect Fluentd to tail all the logs in the folder, but with this config Fluentd only tails app-randomtext.log
Thanks
I finally managed to find an answer: exclude_path is what I need.
##source for everything else
<source>
  @type tail
  path /var/log/containers/*.log
  exclude_path ["/var/log/containers/app*.log"]
  pos_file /var/log/fluentd-containers-non-app.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag non-app.*
  keep_time_key true
  format json
</source>
Here Fluentd follows all the *.log files except those whose names start with app.

Resources