fluentd output json value as json without message_key

I'm creating a log pipeline with filtering, transformations and multiple output routes.
I have a problem with outputting the raw log (without the "message_key").
Currently, the log looks like:
{"log": {"type": "debug", "log": "This is the log message", <More Entries>}}
I would like to drop the "log" message_key and output:
{"type": "debug", "log": "This is the log message", <More Entries>}
I've tried:
1.
<filter *>
  @type parser
  key_name log
  <parse>
    @type json
  </parse>
</filter>
And got an error, probably because the value is already parsed JSON.
2.
<filter *>
  @type parser
  key_name log
  <parse>
    @type none
  </parse>
</filter>
And got this output (a "message" key instead of the original "log" key):
{"message": {"type": "debug", "log": "This is the log message"}}
Tried to use @type record_transformer, but the <record> block wants a key-value pair, and I would like to select the value only.
Tried to format the output with the single_value formatter, but the output was:
{"type" => "debug", "log" => "This is the log message"}
How can this be done? What's the best way to drop the message_key before outputting the log?

After skimming through the fluentd plugins here I didn't find a way to do what I wanted, so I ended up writing my own plugin.
I'm not going to accept my own answer, since I hope someone will provide a better one using a certified plugin.
Just in case you are desperate for a solution, here's the plugin:
require "fluent/plugin/filter"

module Fluent
  module Plugin
    class JsonRecordByKeyFilter < Fluent::Plugin::Filter
      Fluent::Plugin.register_filter("json_record_by_key", self)

      # the key whose (hash) value becomes the whole new record
      config_param :key, :string

      def filter(tag, time, record)
        record[@key]
      end
    end
  end
end
Usage:
<filter *>
  @type json_record_by_key
  key log
</filter>
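With key log, the filter above should rewrite each event roughly like this (a sketch of the expected behavior, not verified output):
{"log": {"type": "debug", "log": "This is the log message"}}
=> {"type": "debug", "log": "This is the log message"}
Note that since Fluentd drops a record when a filter returns nil, events missing the configured key would be silently discarded by this plugin.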

Ran into this same problem, but found a solution outlined on Fluentd's website using the remove_key_name_field option:
https://docs.fluentd.org/filter/parser#remove_key_name_field
<filter *>
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>
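Applied to the record from the question, this should work as follows (my reading of the documented options, not verified output): key_name log selects the nested field, the json parser expands it, reserve_data true keeps the other top-level fields, and remove_key_name_field true drops the original "log" wrapper:
{"log": {"type": "debug", "log": "This is the log message"}}
=> {"type": "debug", "log": "This is the log message"}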

Related

How to unnest/flatten json in fluentd using parser filter?

This is my current config.
<source>
  @type dummy
  rate 1
  tag eggsample
  dummy {"event":"signup","context":{"ip":"105.175.82.28"}}
</source>
<filter eggsample>
  @type parser
  key_name context
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>
<match eggsample>
  @type stdout
</match>
As shown in the example above, I'm trying to unnest/flatten the "context" object.
Hence:
{"event":"purchase","context":{"ip":"105.175.82.28"}} > {"event":"purchase","ip":"105.175.82.28"}
However, I'm getting an error where the parser plugin raises a pattern not matched error, even though my config seems similar to the one in the documentation.
2022-11-28 07:43:03 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '{\"ip\"=>\"105.175.82.28\"}'" location=nil tag="eggsample" time=2022-11-28 07:43:03.006905641 +0000 record={"event"=>"signup", "log"=>{"ip"=>"105.175.82.28"}}
2022-11-28 07:43:03.006905641 +0000 eggsample: {"event":"signup","log":{"ip":"105.175.82.28"}}
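A likely cause, judging from the error text: the dummy input emits context as an already-parsed hash rather than a JSON string, so the parser filter receives its Ruby to_s form ({"ip"=>"105.175.82.28"}), which is not valid JSON. One hedged workaround is to stringify the field before parsing it; the extra record_transformer filter below is my own sketch, not part of the original config:
<filter eggsample>
  @type record_transformer
  enable_ruby
  <record>
    # re-serialize the nested hash so the downstream json parser can read it
    context ${record["context"].to_json}
  </record>
</filter>
Placed before the parser filter, and combined with remove_key_name_field true, this should yield the flattened {"event":"signup","ip":"105.175.82.28"} record.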

Fluentd <filter> element emits message record

We have a td_agent.conf file with the following filter:
#this filter is used for C API which remove "[stdout]" from log
#if CLOG Unified Logging C API won't be used, this filter can be removed
<filter k.**.log>
  @type parser
  format /^(\[stdout\])*(?<log>.+)$/
  key_name log
  suppress_parse_error_log true
</filter>
and the following sample log line:
{"host":"omer","level":"TRACE","log":{"classname":"Manager:452","message":"^~\"DD\"-^ TRACE Added context","stacktrace":"","threadname":"Processing-ThreadPool-2"},"process":"Context","service":"","time":"2020-11-04T13:37:12.979Z","timezone":"Kolkata","type":"log"}
With the above filter in place, the log is output with an empty log: {} field, meaning the info we want never reaches the Elasticsearch db. When we remove the filter, it all works fine.
Can anyone explain why this is needed?
The start of the td-agent config is:
<source>
  @type tail
  path /var/log/containers/*s*.log
  pos_file /var/log/td-agent/containers.json.access.pos
  tag k.*
  #read_from_head true
  <parse>
    @type regexp
    expression /(^(?<header>[^\{]+)?(?<message>\{.+\})$)|(^(?<log>[^\{].+))/
  </parse>
</source>
<filter k.var.log.containers.**.log>
  @type parser
  key_name message
  format json
  #time_parse false
  time_key time
  time_format %iso8601
  keep_time_key true
</filter>
#this filter is used for C API which remove "[stdout]" from log
#if CLOG Unified Logging C API won't be used, this filter can be removed
<filter k.**.log>
  @type parser
  format /^(\[stdout\])*(?<log>.+)$/
  key_name log
  suppress_parse_error_log true
</filter>
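A plausible explanation, offered as an assumption rather than a confirmed diagnosis: in the sample line, log is a nested JSON object, not a string, so the second parser filter receives its serialized form and the (?<log>.+) capture replaces the structured value, leaving the empty log: {} seen in Elasticsearch. A sketch of a more defensive variant using documented filter_parser options, so that non-matching records pass through unchanged instead of being mangled:
<filter k.**.log>
  @type parser
  key_name log
  reserve_data true
  emit_invalid_record_to_error false
  suppress_parse_error_log true
  <parse>
    @type regexp
    expression /^(\[stdout\])*(?<log>.+)$/
  </parse>
</filter>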

Adding client-unique record to a log event, fluentd side. E.g., using filter

I succeeded in getting dockerized fluentd TCP logging to run! Meaning: there are
remote python containers using a slightly modified
logging.handlers.SocketHandler to send some JSON to fluentd - and
it actually arrives there, looking like this:
2020-08-31T09:06:31+00:00 paws.tcp {"service_uuid":"paws_log","loglvl":"INFO","file":"paws_log.paws_log","line":59,"msg":"Ping log line #2"}
I have multiple such python containers and would like to have fluentd add some
kind of source id to each log event. Reading the docs made me give the filter -> record
mechanism a chance. Leading to the following config snippet with a newly added
filter block:
<source>
  @type tcp
  @label stream_paws
  @id paws_tcp
  tag paws.tcp
  port 5170
  bind 0.0.0.0
  # https://docs.fluentd.org/parser/regexp
  <parse>
    @type regexp
    expression /^(?<service_uuid>[a-zA-Z0-9_-]+): (?<logtime>[^\s]+) (?<loglvl>[^\s]+) \[(?<file>[^\]:]+):(?<line>\d+)\]: (?<msg>.*)$/
    time_key logtime
    time_format %H:%M:%S
    types line:integer
  </parse>
</source>
# Add metadata fluentd side.
# https://docs.fluentd.org/deployment/logging
<filter **> # << Does NOT seem to work if kept outside the label-block! Inside is fine.
  @type record_transformer
  <record>
    host "#{Socket.gethostname}"
  </record>
</filter>
<label stream_paws>
  <match paws.tcp>
    @type file
    @id output_paws_tcp
    path /fluentd/log/paws/data/tcp.*.log
    symlink_path /fluentd/log/paws/tcp.log
  </match>
</label>
I have two questions here:
Above config works if I put the filter-block inside the label-block. But this I do not want to do because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
I suspect "#{Socket.gethostname}" yields information on the fluentd server. However, I want something on the client. Ideally including some id that is unique on a docker container level (might be the container id. However, any old client-unique uuid would be fine). Do you know of such a property accessible to fluentd?
If you are using fluentd docker logging driver it will already add container metadata (including id) to every log record:
https://docs.docker.com/config/containers/logging/fluentd/
Above config works if I put the filter-block inside the label-block. But this I do not want to do because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
A global filter is usually implemented on a server like this:
<source>
...
</source>
<filter **> # filter globally
...
</filter>
<match tag.one>
...
</match>
<match tag.two>
...
</match>
<match **> # the rest
...
</match>
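One caveat worth noting, based on Fluentd's documented routing behavior: the source in the question sets @label stream_paws, and events routed into a label bypass all top-level filters. That would explain why the <filter **> block only works inside the label block. The options, as far as I can tell, are to drop the @label from the source and rely on tag matching as in the layout above, or to keep a copy of the filter inside the <label> section.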
I suspect "#{Socket.gethostname}" yields information on the fluentd server.
Correct, see: https://docs.fluentd.org/filter/record_transformer#example-configurations. This can be useful when you also want to track which server processed the log record.
If you are using Kubernetes, use the kubernetes_metadata filter; it will add pod details to each log entry.
<filter kubernetes.**>
  @id filter_kubernetes_metadata
  @type kubernetes_metadata
</filter>
For Docker
I've not really used fluentd before, so apologies for a slightly abstract response here. But, checking on http://docs.fluentd.org/, I guess you're probably using in_tail for the logs? From the example there, it looks like you'd want to get the path to the file into the input message:
path /path/to/file
tag foo.*
which apparently tags events with foo.path.to.file
You could probably use http://docs.fluentd.org/articles/filter_record_transformer with enable_ruby. From this, it looks like you could process the foo.path.to.file tag and use a little ruby to extract the container ID and then parse the JSON file.
For example, testing with the following ruby file, say, foo.rb
tag = 'foo.var.lib.docker.containers.ID.ID-json.log'
require 'json'; id = tag.split('.')[5]; puts JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']
where config.v2.json was something like:
{"image":"foo"}
will print you
foo
Fluentd might already be loading json for you, so maybe you could leave out the require 'json'; bit. Then, putting this in fluentd terms, perhaps you could use something like:
<filter>
  @type record_transformer
  enable_ruby
  <record>
    container ${tag.split('.')[5]}
    image ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']}
  </record>
</filter>
In your case you might be able to use something like this:
<filter raw.**>
  @type record_transformer
  enable_ruby
  <record>
    container ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))["Name"][1..-1]}
    hostname "#{Socket.gethostname}"
  </record>
</filter>
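(The ["Name"][1..-1] indexing, assuming Docker's usual config.v2.json layout, strips the leading slash from the stored container name, e.g. "/my-container" becomes "my-container".)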

Fluentd (td-agent) secondary type should be same with primary one error

I'm using td-agent with the http plugin to send log data to another server.
But when I start td-agent with my config file, I get a warning message like the one below:
2019-09-06 11:02:15 +0900 [warn]: #0 secondary type should be same with primary one primary="Fluent::TreasureDataLogOutput" secondary="Fluent::Plugin::FileOutput"
Here is my config file:
<source>
  @type tail
  path /pub/var/log/mylog_%Y%m%d.log
  pos_file /var/log/td-agent/www_log.log.pos
  tag my.log
  format /^(?<log_time>\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2}) \[INFO\] (?<message>.*)$/
</source>
<filter my.log>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>
<match my.log>
  @type http
  endpoint_url http://localhost:3000/
  custom_headers {"user-agent": "td-agent"}
  http_method post
  raise_on_error true
</match>
It is sending the log data correctly, but I need to resolve the warning message too. How can I resolve the warning?
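One observation, offered as an assumption since the snippet does not show it: the warning names Fluent::TreasureDataLogOutput as the primary output, and no such output appears in the config above, so it most likely comes from the default <match td.*.*> block that ships in td-agent.conf, which pairs the tdlog output with a <secondary> file output. Fluentd warns whenever a <secondary> type differs from the primary; the usual remedies are to delete that unused default match block, or to switch its secondary to the purpose-built secondary_file output, which is exempt from the check. A sketch of the latter:
<match td.*.*>
  @type tdlog
  # ... existing tdlog settings ...
  <secondary>
    @type secondary_file
    directory /var/log/td-agent/failed_records
  </secondary>
</match>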

google-fluentd : change severity in Cloud Logging log_level

We are running Spark jobs (a lot of Spark streaming) on Google Cloud Dataproc clusters.
We are using Cloud Logging to collect all the logs generated by the Spark jobs.
Currently it is generating a lot of "INFO" messages, which causes the whole log volume to grow to a few TBs.
I want to edit the google-fluentd config to restrict the log level to "ERROR" instead of "INFO".
I tried setting "log_level error" in the config, but it did not work.
It's also mentioned in the comment section of /etc/google-fluentd/google-fluentd.conf: # Currently severity is a seperate field from the Cloud Logging log_level.
# Fluentd config to tail the hadoop, hive, and spark message log.
# Currently severity is a seperate field from the Cloud Logging log_level.
<source>
  type tail
  format multi_format
  <pattern>
    format /^((?<time>[^ ]* [^ ]*) *(?<severity>[^ ]*) *(?<class>[^ ]*): (?<message>.*))/
    time_format %Y-%m-%d %H:%M:%S,%L
  </pattern>
  <pattern>
    format none
  </pattern>
  path /var/log/hadoop*/*.log,/var/log/hadoop-yarn/userlogs/**/stderr,/var/log/hive/*.log,/var/log/spark/*.log,
  pos_file /var/tmp/fluentd.dataproc.hadoop.pos
  refresh_interval 2s
  read_from_head true
  tag raw.tail.*
</source>
Correct. As the comment states, @log_level and severity are not the same, which is confusing at best. @log_level configures the verbosity of the component's own logger, whereas severity is the field that Stackdriver Logging ingests.
In order to make fluentd exclude any severity below ERROR you can add a grep filter to /etc/google-fluentd/google-fluentd.conf that explicitly excludes these by name.
At some point before the <match **> block add the following:
<filter raw.tail.**>
  @type grep
  exclude1 severity (DEBUG|INFO|NOTICE|WARNING)
</filter>
This will check the record for the severity field and reject the record if the value matches the regex.
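For reference, the same rule in the grep filter's newer <exclude> block syntax, which should be equivalent (exclude1 is the legacy form):
<filter raw.tail.**>
  @type grep
  <exclude>
    key severity
    pattern /DEBUG|INFO|NOTICE|WARNING/
  </exclude>
</filter>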
