Duplicate logs in fluentd when integrating the fluent-plugin-detect-exceptions plugin

I have a basic fluentd config that uses a plugin to detect exceptions and bundle a multi-line stack trace into a single event. The problem is that it duplicates the exception logs: I get both the bundled lines and the raw lines.
The difference between the plugin's documentation and the config below is that it uses several match directives. The Fluentd documentation does not explain exactly how multiple match directives are interpreted. One solution would be to remove the second match directive, but then how could the events be sent to Elasticsearch?
I suspect this is a misunderstanding of the match directive on my part, but I can't find documentation that helps me understand why the config below does what it does.
Here's the relevant part of the fluentd config:
<label @DISPATCH>
<match kubernetes.**>
@type detect_exceptions
remove_tag_prefix kubernetes
multiline_flush_interval 0.2
</match>
<match **>
@type relabel
@label @OUTPUT
</match>
</label>
<label @OUTPUT>
<match **>
@type elasticsearch
host "elasticsearch-master"
port 9200
path ""
user elastic
password changeme
</match>
</label>
Any help / pointers are very appreciated.
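For reference, the setup in the plugin's documentation (as far as I recall its README) relies on tag rewriting rather than on two match directives seeing the same events: input records arrive with an extra tag prefix, detect_exceptions consumes them and re-emits everything (stack traces combined, other lines as-is) with that prefix stripped, so only the re-emitted records fall through to the Elasticsearch match. A rough sketch of that shape, where the raw. prefix and the options are purely illustrative assumptions:
<label @DISPATCH>
<match raw.kubernetes.**>
@type detect_exceptions
remove_tag_prefix raw
# re-emitted records are tagged kubernetes.*, so they do not re-match this block
multiline_flush_interval 0.2
</match>
<match kubernetes.**>
# only the re-emitted (possibly combined) records arrive here
@type relabel
@label @OUTPUT
</match>
</label>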

Related

Adding client-unique record to a log event, fluentd side. E.g., using filter

I succeeded in getting dockerized fluentd TCP logging to run! Meaning: there are
remote python containers using a slightly modified
logging.handlers.SocketHandler to send some JSON to fluentd - and
it actually arrives there, looking like this:
2020-08-31T09:06:31+00:00 paws.tcp {"service_uuid":"paws_log","loglvl":"INFO","file":"paws_log.paws_log","line":59,"msg":"Ping log line #2"}
I have multiple such python containers and would like to have fluentd add some
kind of source id to each log event. Reading the docs made me give the filter -> record
mechanism a chance, leading to the following config snippet with a newly added
filter block:
<source>
@type tcp
@label @stream_paws
@id paws_tcp
tag paws.tcp
port 5170
bind 0.0.0.0
# https://docs.fluentd.org/parser/regexp
<parse>
#type regexp
expression /^(?<service_uuid>[a-zA-Z0-9_-]+): (?<logtime>[^\s]+) (?<loglvl>[^\s]+) \[(?<file>[^\]:]+):(?<line>\d+)\]: (?<msg>.*)$/
time_key logtime
time_format %H:%M:%S
types line:integer
</parse>
</source>
# Add meta data fluentd side.
# https://docs.fluentd.org/deployment/logging
<filter **> # << Does NOT seem to work if kept outside the label-block! Inside is fine.
@type record_transformer
<record>
host "#{Socket.gethostname}"
</record>
</filter>
<label @stream_paws>
<match paws.tcp>
@type file
@id output_paws_tcp
path /fluentd/log/paws/data/tcp.*.log
symlink_path /fluentd/log/paws/tcp.log
</match>
</label>
I have two questions here:
The above config works if I put the filter block inside the label block. But I do not want to do that, because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
I suspect "#{Socket.gethostname}" yields information on the fluentd server. However, I want something on the client. Ideally including some id that is unique on a docker container level (might be the container id. However, any old client-unique uuid would be fine). Do you know of such a property accessible to fluentd?
If you are using the fluentd docker logging driver, it will already add container metadata (including the container id) to every log record:
https://docs.docker.com/config/containers/logging/fluentd/
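For example, a container started with the fluentd logging driver will show up in fluentd with container_id, container_name, source and log fields already set on each record. A minimal sketch (the image name, address and tag are placeholders):
docker run --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag="paws.{{.Name}}" \
  your-python-image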
The above config works if I put the filter block inside the label block. But I do not want to do that, because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
A global filter is usually implemented on the server like this:
<source>
...
</source>
<filter **> # filter globally
...
</filter>
<match tag.one>
...
</match>
<match tag.two>
...
</match>
<match **> # the rest
...
</match>
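Concretely, with the pieces from your config, that layout might look something like the sketch below (the source no longer routes into a label, so the top-level filter sees every event before it reaches a match; the regexp parse details are trimmed down to a none parser just to keep the sketch short):
<source>
@type tcp
tag paws.tcp
port 5170
bind 0.0.0.0
<parse>
@type none
</parse>
</source>
<filter **> # global: runs for every event, whatever the tag
@type record_transformer
<record>
host "#{Socket.gethostname}"
</record>
</filter>
<match paws.tcp>
@type file
path /fluentd/log/paws/data/tcp.*.log
</match>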
I suspect "#{Socket.gethostname}" yields information on the fluentd server.
Correct, see: https://docs.fluentd.org/filter/record_transformer#example-configurations. This can be useful when you also want to track which server processed the log record.
If you are using Kubernetes, then use the kubernetes_metadata filter; it will add pod details to each log entry.
<filter kubernetes.**>
@id filter_kubernetes_metadata
@type kubernetes_metadata
</filter>
For Docker
I've not really used fluentd before, so apologies for a slightly abstract response here. But, checking on http://docs.fluentd.org/, I guess you're probably using in_tail for the logs? From the example there, it looks like you'd probably want to get the path to the file into the input message:
path /path/to/file
tag foo.*
which apparently tags events with foo.path.to.file
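For reference, a tail source along those lines might look roughly like this (the docker json-file path and pos_file are assumptions; with tag foo.*, in_tail substitutes the file path for the *, with slashes turned into dots, giving tags like foo.var.lib.docker.containers.ID.ID-json.log):
<source>
@type tail
path /var/lib/docker/containers/*/*-json.log
pos_file /var/log/fluentd-docker.pos
tag foo.*
<parse>
@type json
</parse>
</source>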
You could probably use http://docs.fluentd.org/articles/filter_record_transformer with enable_ruby. From this, it looks like you could process the foo.path.to.file tag and use a little Ruby to extract the container ID and then parse out the JSON file.
For example, testing with the following Ruby file, say foo.rb:
tag = 'foo.var.lib.docker.containers.ID.ID-json.log'
require 'json'; id = tag.split('.')[5]; puts JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']
where config.v2.json was something like:
{"image":"foo"}
will print
foo
Fluentd might already be including json for you, so maybe you could leave out the require 'json'; bit. Then, putting this in fluentd terms, perhaps you could use something like:
<filter foo.**>
@type record_transformer
enable_ruby
<record>
container ${tag.split('.')[5]}
image ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']}
</record>
</filter>
In your case, you might be able to use something like the below:
<filter raw.**>
@type record_transformer
enable_ruby
<record>
container ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))["Name"][1..-1]}
hostname "#{Socket.gethostname}"
</record>
</filter>

Accessing plugin variables in td-agent/fluentd config

I've got a fluentd config that I'm using to push logs to CloudWatch Logs, and I'd like to extract a certain piece of the container name to use as the log group name. I see that I can embed Ruby expressions in my config, but I can't figure out how to access the "container_name" variable from inside that embedded expression. Is this possible?
This is my config, which works but uses the raw container_name value as the log group name:
<match container.**>
@type cloudwatch_logs
region us-west-2
log_group_name container_name
log_stream_name "#{File.open('/etc/machine-id').read.strip()}"
auto_create_stream true
retention_in_days 7
</match>
This is what I want to do, but container_name is not defined in the embedded Ruby expression:
<match container.**>
@type cloudwatch_logs
region us-west-2
log_group_name "#{container_name.match(/\w+/)[0]}"
log_stream_name "#{File.open('/etc/machine-id').read.strip()}"
auto_create_stream true
retention_in_days 7
</match>
Is this possible?

Configure fluentd to properly parse and ship a Java stacktrace, which is formatted using the docker json-file logging driver, to Elastic as a single message

Our service runs as a docker instance.
A given limitation is that the docker logging driver cannot be changed to anything other than the default json-file driver.
The (Scala micro)service outputs a log that looks like this:
{"log":"10:30:12.375 [application-akka.actor.default-dispatcher-13] [WARN] [rulekeepr-615239361-v5mtn-7]- c.v.r.s.logic.RulekeeprLogicProvider(91) - decision making have failed unexpectedly\n","stream":"stdout","time":"2017-05-08T10:30:12.376485994Z"}
{"log":"java.lang.RuntimeException: Error extracting fields to make a lookup for a rule at P2: [failed calculating amount/amountEUR/directive: [failed getting accountInfo of companyId:3303 from deadcart: unexpected status returned: 500]]\n","stream":"stdout","time":"2017-05-08T10:30:12.376528449Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.BasicRuleService$$anonfun$lookupRule$2.apply(BasicRuleService.scala:53)\n","stream":"stdout","time":"2017-05-08T10:30:12.376537277Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.BasicRuleService$$anonfun$lookupRule$2.apply(BasicRuleService.scala:53)\n","stream":"stdout","time":"2017-05-08T10:30:12.376542826Z"}
{"log":"\u0009at scala.concurrent.Future$$anonfun$transform$1$$anonfun$apply$2.apply(Future.scala:224)\n","stream":"stdout","time":"2017-05-08T10:30:12.376548224Z"}
{"log":"Caused by: java.lang.RuntimeException: failed calculating amount/amountEUR/directive: [failed getting accountInfo of companyId:3303 from deadcart: unexpected status returned: 500]\n","stream":"stdout","time":"2017-05-08T10:30:12.376674554Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.logic.TlrComputedFields$$anonfun$calculatedFields$1.applyOrElse(AbstractComputedFields.scala:39)\n","stream":"stdout","time":"2017-05-08T10:30:12.376680922Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.logic.TlrComputedFields$$anonfun$calculatedFields$1.applyOrElse(AbstractComputedFields.scala:36)\n","stream":"stdout","time":"2017-05-08T10:30:12.376686377Z"}
{"log":"\u0009at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)\n","stream":"stdout","time":"2017-05-08T10:30:12.376691228Z"}
{"log":"\u0009... 19 common frames omitted\n","stream":"stdout","time":"2017-05-08T10:30:12.376720255Z"}
{"log":"Caused by: java.lang.RuntimeException: failed getting accountInfo of companyId:3303 from deadcart: unexpected status returned: 500\n","stream":"stdout","time":"2017-05-08T10:30:12.376724303Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.mixins.DCartHelper$$anonfun$accountInfo$1.apply(DCartHelper.scala:31)\n","stream":"stdout","time":"2017-05-08T10:30:12.376729945Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.mixins.DCartHelper$$anonfun$accountInfo$1.apply(DCartHelper.scala:24)\n","stream":"stdout","time":"2017-05-08T10:30:12.376734254Z"}
{"log":"\u0009... 19 common frames omitted\n","stream":"stdout","time":"2017-05-08T10:30:12.37676087Z"}
How can I harness fluentd directives to properly combine the above log event that contains a stack trace, so that it is all shipped to Elastic as a single message?
I have full control of the logback appender pattern used, so I can change the order of occurrence of log values to something else, and even change the appender class.
We're working with k8s, and it turns out it's not straightforward to change the docker logging driver, so we're looking for a solution that can handle the given example.
I don't care so much about extracting the loglevel, thread, and logger into specific keys so that I could later easily filter by them in Kibana. It would be nice to have, but it is less important.
What is important is to accurately parse the timestamp, down to the millisecond, and use it as the actual log event timestamp as it is shipped to Elastic.
You can use fluent-plugin-concat.
For example with Fluentd v0.14.x,
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
read_from_head true
<parse>
@type json
</parse>
@label @INPUT
</source>
<label @INPUT>
<filter kubernetes.**>
@type concat
key log
multiline_start_regexp ^\d{2}:\d{2}:\d{2}\.\d+
continuous_line_regexp ^(\s+|java.lang|Caused by:)
separator ""
flush_interval 3s
timeout_label @PARSE
</filter>
<match kubernetes.**>
@type relabel
@label @PARSE
</match>
</label>
<label @PARSE>
<filter kubernetes.**>
@type parser
key_name log
inject_key_prefix log.
<parse>
@type multiline_grok
grok_failure_key grokfailure
<grok>
pattern YOUR_GROK_PATTERN
</grok>
</parse>
</filter>
<match kubernetes.**>
@type relabel
@label @OUTPUT
</match>
</label>
<label @OUTPUT>
<match kubernetes.**>
@type stdout
</match>
</label>
Similar issues:
https://github.com/fluent/fluent-plugin-grok-parser/issues/36
https://github.com/fluent/fluent-plugin-grok-parser/issues/37
You can try using fluent-plugin-grok-parser, but I am having the same issue: it seems that the \u0009 tab character is not being recognized, and so fluent-plugin-detect-exceptions will not detect the multiline exceptions - at least not yet in my attempts.
In fluentd 1.0 I was able to achieve this with fluent-plugin-concat. The concat plugin starts and continues concatenation until it sees the multiline_start_regexp pattern again. This captures Java exceptions and multiline slf4j log statements. Adjust your multiline_start_regexp pattern to match your slf4j log output line.
Any line, including exceptions, starting with a timestamp matching the pattern 2020-10-05 18:01:52.871 will be concatenated. For example:
2020-10-05 18:01:52.871 ERROR 1 --- [nio-8088-exec-3] c.i.printforever.DemoApplication multiline statement
I am using container_id as the identity key:
<system>
log_level debug
</system>
# Receive events from 24224/tcp
# This is used by log forwarding and the fluent-cat command
<source>
@type forward
@id input1
@label @mainstream
port 24224
</source>
# All plugin errors
<label @ERROR>
<match **>
@type file
@id error
path /fluentd/log/docker/error/error.%Y-%m-%d.%H%M
compress gzip
append true
<buffer>
@type file
path /fluentd/log/docker/error
timekey 60s
timekey_wait 10s
timekey_use_utc true
total_limit_size 200mb
</buffer>
</match>
</label>
<label @mainstream>
<filter docker.**>
@type concat
key log
stream_identity_key container_id
multiline_start_regexp /^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d{1,3}/
</filter>
# Match events with docker tag
# Send them to S3
<match docker.**>
@type copy
<store>
@type s3
@id output_docker_s3
aws_key_id "#{ENV['AWS_KEY_ID']}"
aws_sec_key "#{ENV['AWS_SECRET_KEY']}"
s3_bucket "#{ENV['S3_BUCKET']}"
path "#{ENV['S3_OBJECT_PATH']}"
store_as gzip
<buffer tag,time>
@type file
path /fluentd/log/docker/s3
timekey 300s
timekey_wait 1m
timekey_use_utc true
total_limit_size 200mb
</buffer>
time_slice_format %Y%m%d%H
</store>
<store>
@type stdout
</store>
<store>
@type file
@id output_docker_file
path /fluentd/log/docker/file/${tag}.%Y-%m-%d.%H%M
compress gzip
append true
<buffer tag,time>
@type file
timekey_wait 1m
timekey 1m
timekey_use_utc true
total_limit_size 200mb
path /fluentd/log/docker/file/
</buffer>
</store>
</match>
<match **>
@type file
@id output_file
path /fluentd/log/docker/catch-all/data.*.log
</match>
</label>

Splitting docker stdout and stderr with the fluentd fluent-plugin-rewrite-tag-filter plugin

I currently have the following config:
<match docker.nginx>
@type rewrite_tag_filter
rewriterule1 source stdout docker.nginx.stdout
rewriterule2 source stderr docker.nginx.stderr
</match>
but this means that I have to do the same for each container.
This isn't working, but probably you'll get what I want from it:
<match docker.*>
@type rewrite_tag_filter
rewriterule1 source stdout docker.*.stdout
rewriterule2 source stderr docker.*.stderr
</match>
So my question is: can I somehow refer to the matched tag in the match block? So that if it's nginx/rabbitmq/zookeeper/anything, it will split all event flows into docker.<fluentd-tag>.stdout and docker.<fluentd-tag>.stderr.
Thanks in advance!
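Not something I can verify in your setup, but a sketch of what this might look like, assuming your version of fluent-plugin-rewrite-tag-filter supports its documented tag placeholders (${tag}, ${tag_parts[N]}):
<match docker.*>
@type rewrite_tag_filter
# ${tag} expands to the incoming tag, e.g. docker.nginx -> docker.nginx.stdout
rewriterule1 source stdout ${tag}.stdout
rewriterule2 source stderr ${tag}.stderr
</match>
Since docker.* matches exactly one tag part, the rewritten docker.<name>.stdout / .stderr tags are not matched by this block again, so events are not re-processed in a loop.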

How do you forward an event from one match block to the next in fluentd?

I came across Fluentd last week. I liked it at first (still do), but there seem to be a few holes that are preventing me from using it.
I'm trying to forward our logs to two different locations - an S3 bucket to archive, and an Elasticsearch database for analytics with kibana. I looked at the fluent-forest-plugin, but I realize that won't work because of this. I tried using the copy plugin, but I'm getting this error:
[error]: config error file="/etc/td-agent/td-agent.conf" error="Other 's3' plugin already use same buffer_path: type = s3, buffer_path = /tmp/fluent-plugin-s3"
with this config
<source>
type tail
path /var/log/nginx/web__error.log
pos_file /var/tmp/nginx_web__error.pos
tag web__error
format /^(?<time>[^ ]+ [^ ]+) \[(?<log_level>.*)\] (?<pid>\d*).(?<tid>[^:]*): (?<message>.*)$/
</source>
<match web__error>
type copy
<store>
type s3
aws_key_id ACC_KEY
aws_sec_key SEC_KEY
s3_bucket log-bucket
path web__error/
buffer_path /tmp/fluent-plugin-s3
s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
time_slice_format %Y-%m-%d/%H
flush_interval 15s
utc
</store>
<store>
type elasticsearch
logstash_format true
logstash_prefix web__error
flush_interval 15s
include_tag_key true
utc_index true
</store>
</match>
From what I've read, once an event is caught in one match block, it can't be caught by any subsequent ones. As a last resort, I need to know: is there any way to do this that I haven't found yet?
This is a non-issue - I forgot I was using the same buffer_path in other config files, which caused this error.
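For anyone else hitting the same error: every s3 output needs its own buffer_path, so the fix is simply to make the paths distinct across stores and config files, e.g. (a sketch based on the config above):
<store>
type s3
aws_key_id ACC_KEY
aws_sec_key SEC_KEY
s3_bucket log-bucket
path web__error/
buffer_path /tmp/fluent-plugin-s3-web__error # must be unique per s3 output
</store>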
