Accessing plugin variables in td-agent/fluentd config

I've got a fluentd config that I'm using to push logs to CloudWatch Logs, and I'd like to extract a certain piece of the container name to use as the log group name. I see that I can embed Ruby expressions in my config, but I can't figure out how to access the "container_name" variable from inside that embedded expression. Is this possible?
This is my config, which works but uses the raw container_name value as the log group name:
<match container.**>
  @type cloudwatch_logs
  region us-west-2
  log_group_name container_name
  log_stream_name "#{File.open('/etc/machine-id').read.strip()}"
  auto_create_stream true
  retention_in_days 7
</match>
This is what I want to do, but container_name is not defined in the embedded Ruby expression:
<match container.**>
  @type cloudwatch_logs
  region us-west-2
  log_group_name "#{container_name.match(/\w+/)[0]}"
  log_stream_name "#{File.open('/etc/machine-id').read.strip()}"
  auto_create_stream true
  retention_in_days 7
</match>
Is this possible?
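One thing worth noting: the "#{...}" embedded Ruby in a fluentd config is evaluated once, when the configuration is parsed, so per-record fields such as container_name are not visible there. A minimal sketch of an alternative, assuming the records actually carry a container_name field and that the installed fluent-plugin-cloudwatch-logs version supports the log_group_name_key option (check the plugin's README for your version): derive the group name in a record_transformer filter and point the output at that field.

<filter container.**>
  @type record_transformer
  enable_ruby
  <record>
    # keep only the first word of the container name ("log_group" is a made-up field name)
    log_group ${record["container_name"].match(/\w+/)[0]}
  </record>
</filter>

<match container.**>
  @type cloudwatch_logs
  region us-west-2
  # read the group name from the record instead of a static value
  log_group_name_key log_group
  log_stream_name "#{File.open('/etc/machine-id').read.strip()}"
  auto_create_stream true
  retention_in_days 7
</match>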

Related

Is there a way to conditionally configure a source input plugin in fluentd?

I have a generic fluentd conf that I am using for multiple services.
fluent.conf
<source>
  @type dummy
  dummy {"hello":"world"}
</source>
<match>
  @type stdout
</match>
In this config I want to inject the S3 source input plugin (snippet below) based on a condition. The condition could be: if a specific env variable is set to true, inject this source; otherwise skip it. Is there a way to do this?
<source>
  @type s3
  tag input.s3
  s3_bucket "#{ENV['S3_BUCKET']}"
  s3_region "#{ENV['S3_REGION']}"
  add_object_metadata true
  store_as gzip
  <sqs>
    queue_name "#{ENV['SQS_QUEUE']}"
  </sqs>
</source>
Or, if there are other ways to handle this, please let me know.
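Fluentd's config language has no conditional directive of its own, so one common workaround is to keep the optional source in its own file and pull it in with @include, letting something outside fluentd (for example the container entrypoint) decide whether that file exists. A rough sketch, where the conf.d directory and the deciding env variable are made-up for illustration:

# fluent.conf
<source>
  @type dummy
  dummy {"hello":"world"}
</source>

# pull in any optional sources dropped into this directory
@include conf.d/*.conf

<match>
  @type stdout
</match>

The entrypoint script would then copy the S3 source snippet into conf.d/ only when, say, a hypothetical ENABLE_S3_SOURCE variable is set to true, and leave the directory empty otherwise.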

How to use tag in the output path of fluentd?

I'm trying to use fluentd to copy a bunch of log files. All log files need to be written to the same destination directory.
<source>
  @type tail
  @id container-input
  format none
  path "/var/log/containers/plugin*.log"
  # This path would match multiple files that I want to log
  pos_file "/var/log/plugin.log.pos"
  refresh_interval 5
  rotate_wait 5
  read_from_head "true"
  tag plugin.*
</source>
<filter plugin.**>
  @type record_transformer
  <record>
    filename ${tag_suffix[-2]}
  </record>
</filter>
<match plugin**>
  @type file
  path /destlogs/plugin.log
</match>
What I want is to use the filename somewhere in the output path, something like
path /destlogs/plugin-${filename}.log
However, when I use such a configuration, fluentd does not treat filename as a variable; it just creates the path as-is.
How to use a tag as a variable in the output path?
The issue here was the version: v0.12 seems not to support tags in the match section, whereas v1.0 does. I was using the fluentd image fluent/fluentd-kubernetes-daemonset:elasticsearch, which I realized uses the older fluentd version. Updating it to fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch resolved the issue.
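For reference, a minimal sketch of what the v1-style output can look like, assuming the record_transformer filter above has already added a filename field to each record: in v1, a record field can be used as a placeholder in path when it is listed as a buffer chunk key (the file output may still append its own time/suffix to the path).

<match plugin.**>
  @type file
  path /destlogs/plugin-${filename}.log
  # chunk by the record field so ${filename} can be expanded in the path
  <buffer filename>
    flush_interval 5s
  </buffer>
</match>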

Duplicate logs in fluentd when integrating the fluent-plugin-detect-exceptions plugin

I have a basic fluentd config that uses a plugin to detect exceptions and bundle a multi-line stack trace into a single event. The problem is that it duplicates the exception logs: I get both the bundled lines and the raw lines.
The difference between the plugin's documentation and the config below is that mine uses several match directives, and the fluentd documentation does not say much about exactly how multiple match directives are interpreted. One solution would be to remove the second match directive, but then how could the events be sent to Elasticsearch?
I suspect this is a misunderstanding of the match directive on my part, but I can't find documentation that helps me understand why the config below does what it does.
Here's the relevant part of the fluentd config
<label @DISPATCH>
  <match kubernetes.**>
    @type detect_exceptions
    remove_tag_prefix kubernetes
    multiline_flush_interval 0.2
  </match>
  <match **>
    @type relabel
    @label @OUTPUT
  </match>
</label>
<label @OUTPUT>
  <match **>
    @type elasticsearch
    host "elasticsearch-master"
    port 9200
    path ""
    user elastic
    password changeme
  </match>
</label>
Any help / pointers are very much appreciated.
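For what it's worth, here is a sketch of how the routing above is usually read (inside a <label>, an event is consumed by the first <match> whose pattern fits its tag, evaluated top to bottom); this only annotates the tag flow and does not by itself explain the duplication:

<label @DISPATCH>
  # 1. events tagged kubernetes.* are consumed by this match
  <match kubernetes.**>
    @type detect_exceptions
    remove_tag_prefix kubernetes      # re-emitted events lose the "kubernetes" prefix
    multiline_flush_interval 0.2
  </match>
  # 2. the re-emitted events no longer match kubernetes.**, so they fall
  #    through to this catch-all and are handed to the @OUTPUT label
  <match **>
    @type relabel
    @label @OUTPUT
  </match>
</label>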

Adding client-unique record to a log event, fluentd side. E.g., using filter

I succeeded in getting dockerized fluentd TCP logging to run! Meaning: there are remote Python containers using a slightly modified logging.handlers.SocketHandler to send some JSON to fluentd, and it actually arrives there, looking like this:
2020-08-31T09:06:31+00:00 paws.tcp {"service_uuid":"paws_log","loglvl":"INFO","file":"paws_log.paws_log","line":59,"msg":"Ping log line #2"}
I have multiple such Python containers and would like to have fluentd add some kind of source id to each log event. Reading the docs made me give the filter -> record mechanism a chance, leading to the following config snippet with a newly added filter block:
<source>
  @type tcp
  @label @stream_paws
  @id paws_tcp
  tag paws.tcp
  port 5170
  bind 0.0.0.0
  # https://docs.fluentd.org/parser/regexp
  <parse>
    @type regexp
    expression /^(?<service_uuid>[a-zA-Z0-9_-]+): (?<logtime>[^\s]+) (?<loglvl>[^\s]+) \[(?<file>[^\]:]+):(?<line>\d+)\]: (?<msg>.*)$/
    time_key logtime
    time_format %H:%M:%S
    types line:integer
  </parse>
</source>
# Add meta data fluentd side.
# https://docs.fluentd.org/deployment/logging
<filter **> # << Does NOT seem to work if kept outside the label-block! Inside is fine.
  @type record_transformer
  <record>
    host "#{Socket.gethostname}"
  </record>
</filter>
<label @stream_paws>
  <match paws.tcp>
    @type file
    @id output_paws_tcp
    path /fluentd/log/paws/data/tcp.*.log
    symlink_path /fluentd/log/paws/tcp.log
  </match>
</label>
I have two questions here:
Above config works if I put the filter-block inside the label-block. But this I do not want to do because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
I suspect "#{Socket.gethostname}" yields information on the fluentd server. However, I want something on the client. Ideally including some id that is unique on a docker container level (might be the container id. However, any old client-unique uuid would be fine). Do you know of such a property accessible to fluentd?
If you are using the fluentd Docker logging driver, it will already add container metadata (including the id) to every log record:
https://docs.docker.com/config/containers/logging/fluentd/
Above config works if I put the filter-block inside the label-block. But this I do not want to do because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
A global filter is usually implemented on the server side like this:
<source>
...
</source>
<filter **> # filter globally
...
</filter>
<match tag.one>
...
</match>
<match tag.two>
...
</match>
<match **> # the rest
...
</match>
I suspect "#{Socket.gethostname}" yields information on the fluentd server.
Correct, see: https://docs.fluentd.org/filter/record_transformer#example-configurations. This can be useful when you also want to track which server processed the log record.
If you are using Kubernetes, then use the kubernetes_metadata filter; it will add pod details to each log entry.
<filter kubernetes.**>
  @id filter_kubernetes_metadata
  @type kubernetes_metadata
</filter>
For Docker
I've not really used fluentd before, so apologies for a slightly abstract response here. But checking on http://docs.fluentd.org/, I guess you're probably using in_tail for the logs? From the example there, it looks like you'd probably want to get the path to the file into the input message:
path /path/to/file
tag foo.*
which apparently tags events with foo.path.to.file
You could probably use http://docs.fluentd.org/articles/filter_record_transformer with enable_ruby. From this, it looks like you could process the foo.path.to.file tag and use a little Ruby to extract the container ID and then parse the JSON file.
For example, testing with the following Ruby file, say, foo.rb:
tag = 'foo.var.lib.docker.containers.ID.ID-json.log'
require 'json'; id = tag.split('.')[5]; puts JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']
where config.v2.json was something like:
{"image":"foo"}
will print you
foo
Fluentd might already be including json for you, so maybe you could leave out the require 'json'; bit. Then, putting this in fluentd terms, perhaps you could use something like
<filter>
  @type record_transformer
  enable_ruby
  <record>
    container ${tag.split('.')[5]}
    image ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']}
  </record>
</filter>
In your case, you might be able to use something like the below:
<filter raw.**>
  @type record_transformer
  enable_ruby
  <record>
    container ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))["Name"][1..-1]}
    hostname "#{Socket.gethostname}"
  </record>
</filter>

fluentd - how to source log file name with timestamp

Catalina logs are generated with a timestamp in the file name, e.g.:
catalina.2018-11-05.log
catalina.2018-12-03.log
catalina.2018-12-10.log
I would like fluentd to access the latest log file based on the timestamp in the file name. Can you suggest what the source path should look like in td-agent.conf?
<source>
  @type tail
  path D:\apache-tomcat-9.0.12\logs\catalina.**[TODAY]**.log
  pos_file C:\opt\td-agent\javalogs.log.pos
  tag javalogs
  <parse>
    @type json
  </parse>
</source>
<match javalogs>
  @type stdout
</match>
Try the below path syntax:
path D:\apache-tomcat-9.0.12\logs\catalina.%Y-%m-%d.log
Note - Make sure your files are created according to the same timezone as the fluentd agent process so it can properly tail the correctly created files. Also, the fluentd process should have the correct read permissions on the catalina files.
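Put together with the rest of the original source block, that would look roughly like this (same tag, pos_file and parse section as in the question; in_tail expands the date pattern against the agent's local time on each refresh):

<source>
  @type tail
  path D:\apache-tomcat-9.0.12\logs\catalina.%Y-%m-%d.log
  pos_file C:\opt\td-agent\javalogs.log.pos
  tag javalogs
  <parse>
    @type json
  </parse>
</source>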
