Fluentd: change the "source" field value

Using fluentd, I wanted to change the "source" field value, but I'm not sure how to go about it.
Currently it is picking up the container IP address as "source" but I need that to be a self-defined name.
Any ideas?
EDIT:
Things I tried:
This is an Alpine container running on Fargate ECS, so I tried setting the hostname value in the task definition.
Error: ClientException: hostname is not supported on container when networkMode=awsvpc.
Using record_transformer to set the hostname/source:
<filter foo.**>
@type record_transformer
<record>
hostname "foo-#{Socket.gethostname}"
</record>
</filter>
Also
<filter foo.**>
@type record_transformer
<record>
source "foo-#{Socket.gethostname}"
</record>
</filter>
No change in the records at all but I can see in the logs the filter being read and gethostname is working.
So looking further at record_transformer I have been able to write a new field with this config:
<filter foo.**>
@type record_transformer
<record>
server "foo-#{Socket.gethostname}"
</record>
</filter>
How do I change the contents of an existing field?
How about this one:
<filter foo.**>
@type record_modifier
<replace>
key source
expression /[\s\S]*/
replace "foo-#{Socket.gethostname}"
</replace>
</filter>
The "source" field should it contain anything its contents should replaced with "foo-#{Socket.gethostname}" only it doesn't.

Related

Fluentd sql output plugin configuration for auto incremented column

I have a fluentd configuration that pulls data from a file and pushes it to SQL Server; however, there is a primary key with an auto-incremented column. If I don't mention that column in my fluentd configuration, it throws an error saying that the field is missing, and if I include the column in the configuration, it gives an identity error. In the configuration below, "Id" is the primary, auto-incremented column. Also, let me know if adapter "sqlserver" is the right thing to use.
<filter record.**>
@type record_transformer
enable_ruby true
<record>
Id ${id}
</record>
<record>
timestamp ${time}
</record>
</filter>
<filter record.**>
@type stdout
</filter>
<match record.**>
@type sql
host myhost
username myuser
password mypassword
database mydb
adapter sqlserver
<table>
table simple_table
column_mapping 'Id:Id,timestamp:timestamp'
</table>
flush_interval 1s
# disable_retry_limit
# num_threads 8
# slow_flush_log_threshold 40.0
</match>
Well, I figured this out: it's mandatory to send the column name in column_mapping even if it is the primary key and auto-incremented. If you log in with some other SQL credentials it will give you an error; however, if you log in with the same credentials used at the time of table creation, it works.

Configuration of fluent-plugin-concat makes the logs disappear

My configuration of "fluent-plugin-concat" is causing my long logs to disappear instead of being concatenated and sent to a Kinesis stream.
I use fluentd to send logs from containers deployed on AWS ECS to a Kinesis stream (and then on to an ES cluster somewhere).
On rare occasions, some of the logs are very big. Most of the time they are under the Docker limit of 16K. However, those rare long logs are very important and we don't want to miss them.
My configuration file is attached.
Just before the final match sequence, I added:
<filter>
@type concat
key log
stream_identity_key container_id
partial_key partial_message
partial_value true
separator ""
</filter>
Another configuration I tried:
With the below options, only the second partial log is sent to ES; the first part can only be seen in the fluentd logs. The logs from this config are attached as a file.
<filter>
@type concat
key log
stream_identity_key partial_id
use_partial_metadata true
separator ""
</filter>
and
<filter>
@type concat
key log
use_partial_metadata true
separator ""
</filter>
The log I'm testing with is also attached as a JSON document.
If I remove this configuration, this log is sent in 2 chunks.
What am I doing wrong?
the full config file:
<system>
log_level info
</system>
# just listen on the unix socket in a dir mounted from host
# input is a json object, with the actual log line in the `log` field
<source>
@type unix
path /var/fluentd/fluentd.sock
</source>
# tag log line as json or text
<match service.*.*>
@type rewrite_tag_filter
<rule>
key log
pattern /.*"logType":\s*"application"/
tag application.${tag}.json
</rule>
<rule>
key log
pattern /.*"logType":\s*"exception"/
tag exception.${tag}.json
</rule>
<rule>
key log
pattern /.*"logType":\s*"audit"/
tag audit.${tag}.json
</rule>
<rule>
key log
pattern /^\{".*\}$/
tag default.${tag}.json
</rule>
<rule>
key log
pattern /.+/
tag default.${tag}.txt
</rule>
</match>
<filter *.service.*.*.*>
@type record_transformer
<record>
service ${tag_parts[2]}
childService ${tag_parts[3]}
</record>
</filter>
<filter *.service.*.*.json>
@type parser
key_name log
reserve_data true
remove_key_name_field true
<parse>
@type json
</parse>
</filter>
<filter *.service.*.*.*>
@type record_transformer
enable_ruby
<record>
@timestamp ${ require 'time'; Time.now.utc.iso8601(3) }
</record>
</filter>
<filter>
@type concat
key log
stream_identity_key container_id
partial_key partial_message
partial_value true
separator ""
</filter>
<match exception.service.*.*.*>
@type copy
<store>
@type kinesis_streams
region "#{ENV['AWS_DEFAULT_REGION']}"
stream_name the-name-ex
debug false
<instance_profile_credentials>
</instance_profile_credentials>
<buffer>
flush_at_shutdown true
flush_interval 10
chunk_limit_size 16m
flush_thread_interval 1.0
flush_thread_burst_interval 1.0
flush_thread_count 1
</buffer>
</store>
<store>
@type stdout
</store>
</match>
<match audit.service.*.*.json>
@type copy
<store>
@type kinesis_streams
region "#{ENV['AWS_DEFAULT_REGION']}"
stream_name the-name-sa
debug false
<instance_profile_credentials>
</instance_profile_credentials>
<buffer>
flush_at_shutdown true
flush_interval 1
chunk_limit_size 16m
flush_thread_interval 0.1
flush_thread_burst_interval 0.01
flush_thread_count 15
</buffer>
</store>
<store>
@type stdout
</store>
</match>
<match *.service.*.*.*>
@type copy
<store>
@type kinesis_streams
region "#{ENV['AWS_DEFAULT_REGION']}"
stream_name the-name-apl
debug false
<instance_profile_credentials>
</instance_profile_credentials>
<buffer>
flush_at_shutdown true
flush_interval 10
chunk_limit_size 16m
flush_thread_interval 1.0
flush_thread_burst_interval 1.0
flush_thread_count 1
</buffer>
</store>
<store>
@type stdout
</store>
</match>
<match **>
@type stdout
</match>
example log message - long single line:
{"message": "some message", "longlogtest": "averylongjsonline", "service": "longlog-service", "logType": "application", "log": "aaa .... ( ~18000 chars )..longlogThisIsTheEndOfTheLongLog"}
fluentd-container-log ... contains only the first part of the message:
and the following error message:
dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not match with data
2021-03-05 13:45:41.886672929 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not match with data '{\"message\": \"some message\", \"longlogtest\": \"averylongjsonline\", \"service\": \"longlog-service\", \"logType\": \"application\", \"log\": \"aaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilonglogaaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilonglogaaaasssssdddddjjjjjjjkkkkkkklllllllwewww
< .....Many lines of original log erased here...... >
djjjjjjjkkkkkkklllllllwewwwiiiaaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilongloglonglogaaaassss'>","location":null,"tag":"application.service.longlog.none.json","time":1614951941,"record":{"source":"stdout","log":"{\"message\": \"some message\", \"longlogtest\": \"averylongjsonline\", \"service\": \"longlog-service\", \"logType\": \"application\", \"log\": \"aaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilonglogaaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilonglogaaaasssssdddddjjjjjjjkkkkkkklllllllwewww
< .....Many lines of original log erased here...... >
wwiiiiiilonglogaaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiaaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilongloglonglogaaaassss","partial_message":"true","partial_id":"5c752c1bbfda586f1b867a8ce2274e0ed0418e8e10d5e8602d9fefdb8ad2b7a1","partial_ordinal":"1","partial_last":"false","container_id":"803c0ebe4e6875ea072ce21179e4ac2d12e947b5649ce343ee243b5c28ad595a","container_name":"/ecs-longlog-18-longlog-b6b5ae85ededf4db1f00","service":"longlog","childService":"none"},"message"
:"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not match with data '{\\\"message\\\": \\\"some message\\\", \\\"longlogtest\\\": \\\"averylongjsonline\\\", \\\"service\\\": \\\"longlog-service\\\", \\\"logType\\\": \\\"application\\\", \\\"log\\\": \\\"aaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilonglogaaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilonglogaaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilonglogaaaasssssdddddjjjjjjjkkkkkkklllllllwewwwiiiiiilonglogaaaasss

Fluentd <filter> element emits message record

We have a td_agent.conf file with the following tag:
#this filter is used for C API which remove "[stdout]" from log
#if CLOG Unified Logging C API won't be used, this filter can be removed
<filter k.**.log>
@type parser
format /^(\[stdout\])*(?<log>.+)$/
key_name log
suppress_parse_error_log true
</filter>
and the following sample log line:
{"host":"omer","level":"TRACE","log":{"classname":"Manager:452","message":"^~\"DD\"-^ TRACE Added context","stacktrace":"","threadname":"Processing-ThreadPool-2"},"process":"Context","service":"","time":"2020-11-04T13:37:12.979Z","timezone":"Kolkata","type":"log"}
With the above logic in Fluentd, the log is output with an empty log: {} field, which means the info we want does not end up in the Elastic DB. When we remove this <filter> block, it all works fine.
Can anyone explain why this is needed?
The start of the td-agent config is:
<source>
@type tail
path /var/log/containers/*s*.log
pos_file /var/log/td-agent/containers.json.access.pos
tag k.*
#read_from_head true
<parse>
@type regexp
expression /(^(?<header>[^\{]+)?(?<message>\{.+\})$)|(^(?<log>[^\{].+))/
</parse>
</source>
<filter k.var.log.containers.**.log>
@type parser
key_name message
format json
#time_parse false
time_key time
time_format %iso8601
keep_time_key true
</filter>
#this filter is used for C API which remove "[stdout]" from log
#if CLOG Unified Logging C API won't be used, this filter can be removed
<filter k.**.log>
@type parser
format /^(\[stdout\])*(?<log>.+)$/
key_name log
suppress_parse_error_log true
</filter>
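A hedged guess about the behaviour described: in the sample line the log field is already a JSON object rather than a "[stdout]..."-prefixed string, so the regexp has nothing meaningful to capture, and because the parser filter defaults to reserve_data false, the original fields are discarded when the filter rewrites the record, leaving the empty log: {}. If the filter must stay for the C API case, keeping the rest of the record when the pattern does not apply could look like this (the same filter as above with one added parameter):
<filter k.**.log>
@type parser
format /^(\[stdout\])*(?<log>.+)$/
key_name log
# keep the record's other fields instead of replacing the whole record with the parsed result
reserve_data true
suppress_parse_error_log true
</filter>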

Adding client-unique record to a log event, fluentd side. E.g., using filter

I succeeded in getting dockerized fluentd TCP logging to run! Meaning: there are
remote python containers using a slightly modified
logging.handlers.SocketHandler to send some JSON to fluentd - and
it actually arrives there, looking like this:
2020-08-31T09:06:31+00:00 paws.tcp {"service_uuid":"paws_log","loglvl":"INFO","file":"paws_log.paws_log","line":59,"msg":"Ping log line #2"}
I have multiple such python containers and would like to have fluentd add some
kind of source id to each log event. Reading the docs made me give the filter -> record
mechanism a chance. Leading to the following config snippet with a newly added
filter block:
<source>
@type tcp
@label stream_paws
@id paws_tcp
tag paws.tcp
port 5170
bind 0.0.0.0
# https://docs.fluentd.org/parser/regexp
<parse>
@type regexp
expression /^(?<service_uuid>[a-zA-Z0-9_-]+): (?<logtime>[^\s]+) (?<loglvl>[^\s]+) \[(?<file>[^\]:]+):(?<line>\d+)\]: (?<msg>.*)$/
time_key logtime
time_format %H:%M:%S
types line:integer
</parse>
</source>
# Add meta data fluentd side.
# https://docs.fluentd.org/deployment/logging
<filter **> # << Does NOT seem to work if kept outside the label-block! Inside is fine.
@type record_transformer
<record>
host "#{Socket.gethostname}"
</record>
</filter>
<label stream_paws>
<match paws.tcp>
@type file
@id output_paws_tcp
path /fluentd/log/paws/data/tcp.*.log
symlink_path /fluentd/log/paws/tcp.log
</match>
</label>
I have two questions here:
Above config works if I put the filter-block inside the label-block. But this I do not want to do because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
I suspect "#{Socket.gethostname}" yields information on the fluentd server. However, I want something on the client. Ideally including some id that is unique on a docker container level (might be the container id. However, any old client-unique uuid would be fine). Do you know of such a property accessible to fluentd?
If you are using the fluentd Docker logging driver, it will already add container metadata (including the container id) to every log record:
https://docs.docker.com/config/containers/logging/fluentd/
Above config works if I put the filter-block inside the label-block. But this I do not want to do because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
A global filter is usually implemented on a server like this:
<source>
...
</source>
<filter **> # filter globally
...
</filter>
<match tag.one>
...
</match>
<match tag.two>
...
</match>
<match **> # the rest
...
</match>
I suspect "#{Socket.gethostname}" yields information on the fluentd server.
Correct, see: https://docs.fluentd.org/filter/record_transformer#example-configurations. This can be useful when you also want to track which server processed the log record.
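If the clients already send something unique, a record_transformer on the fluentd side can also copy or derive a per-client field from the record or the tag rather than from the server's hostname. A sketch along those lines, borrowing the service_uuid field from the sample event above (the derived key names are made up for illustration):
<filter paws.tcp>
@type record_transformer
enable_ruby
<record>
# copy an identifier that the client itself puts into each event
client_id ${record["service_uuid"]}
# the tag is also available, e.g. its last part
source_tag ${tag_parts[-1]}
</record>
</filter>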
If you are using Kubernetes, then use the kubernetes metadata filter; it will add pod details to each log entry.
<filter kubernetes.**>
@id filter_kubernetes_metadata
@type kubernetes_metadata
</filter>
For Docker
I've not really used fluentd before, so apologies for a slightly abstract response here. But, checking http://docs.fluentd.org/, I guess you're probably using in_tail for the logs? From the example there, it looks like you'd probably want to get the path to the file into the input message:
path /path/to/file
tag foo.*
which apparently tags events with foo.path.to.file
You could probably use http://docs.fluentd.org/articles/filter_record_transformer with enable_ruby. From this, it looks like you could probably process the foo.path.to.file tag and use a little Ruby to extract the container ID and then parse the JSON file.
For example, testing with the following ruby file, say, foo.rb
tag = 'foo.var.lib.docker.containers.ID.ID-json.log'
require 'json'; id = tag.split('.')[5]; puts JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']
where config.v2.json was something like:
{"image":"foo"}
will print you
foo
Fluentd might already be including json for you, so maybe you could leave out the require 'json'; bit. Then, putting this in fluentd terms, perhaps you could use something like
<filter>
@type record_transformer
enable_ruby
<record>
container ${tag.split('.')[5]}
image ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']}
</record>
</filter>
In your case, you might be able to use something like this:
<filter raw.**>
@type record_transformer
enable_ruby
<record>
container ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))["Name"][1..-1]}
hostname "#{Socket.gethostname}"
</record>
</filter>

Fluentd: How to place the time key inside the json string

This is the record that fluentd writes to my log.
2016-02-22 14:38:59 {"login_id":123,"login_email":"abc@gmail.com"}
The date time is the time key of fluentd. How can I place that time inside the JSON string?
My friend helped me with this. He used this fluentd plugin:
http://docs.fluentd.org/articles/filter-plugin-overview
This is the config:
<filter trackLog>
type record_modifier
<record>
fluentd_time ${Time.now.strftime("%Y-%m-%d %H:%M:%S")}
</record>
</filter>
<match trackLog>
type record_modifier
tag trackLog.finished
</match>
<match trackLog.finished>
type webhdfs
host localhost
port 50070
path /data/trackLog/%Y%m%d_%H
username hdfs
output_include_tag false
remove_prefix trackLog.finished
output_include_time false
buffer_type file
buffer_path /mnt/ramdisk/trackLog
buffer_chunk_limit 4m
buffer_queue_limit 50
flush_interval 5s
</match>
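An alternative sketch that writes the event's own time into the record, rather than the wall-clock time at which the filter runs, assuming record_transformer with enable_ruby (which exposes the event time to the ${} expression):
<filter trackLog>
@type record_transformer
enable_ruby
<record>
# inject the leading "2016-02-22 14:38:59" event time into the JSON body
fluentd_time ${Time.at(time).strftime("%Y-%m-%d %H:%M:%S")}
</record>
</filter>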
