I need to remove keys from JSON messages, which is pretty easy with record_transformer and keep_keys or remove_keys.
But the list of keys to keep comes from the value of a specific key in each message (ex: filter).
input like:
{"message":"hello world!", "key1":"test1", "key2"="test2", "key3"="test3", "**filter**"="message,key3"}
want transformed into:
{"message":"hello world!", "key3"="test3"}
I want the keep_keys parameter to be dynamic for each message.
How can this be achieved? With configuration alone, or does it require a plugin modification?
Any suggestions?
You can try Ruby-based logic that uses a record field to drive the removal, like below.
Example 1:
https://github.com/repeatedly/fluent-plugin-record-modifier/issues/15
Sample Ruby implementation:
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    log ${record["message"] or record["log"]}
  </record>
  remove_keys message
</filter>
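If configuration alone is not enough, a small custom filter plugin is another option. Below is a minimal sketch (the plugin name keep_by_filter and its filter_key parameter are invented for illustration) that keeps only the keys listed in the record's filter field, dropping everything else including filter itself:

# keep_by_filter.rb -- place under the plugin directory (e.g. /etc/fluent/plugin)
require 'fluent/plugin/filter'

module Fluent
  module Plugin
    class KeepByFilterFilter < Filter
      Fluent::Plugin.register_filter('keep_by_filter', self)

      # Record key whose value is a comma-separated list of keys to keep
      config_param :filter_key, :string, default: 'filter'

      def filter(tag, time, record)
        list = record[@filter_key]
        return record unless list                   # no filter field: pass through unchanged
        keep = list.split(',').map(&:strip)
        record.select { |k, _v| keep.include?(k) }  # drops the filter field as well
      end
    end
  end
end

It would then be used like:

<filter kubernetes.**>
  @type keep_by_filter
  filter_key filter
</filter>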
I intend to write some logs to a file called output.log using fluentd. I'm using this configuration:
<match foo.*>
  @type file
  path /var/log/output
  path_suffix .log
  append true
  <buffer>
    flush_mode interval
    flush_interval 1m
  </buffer>
  format json
</match>
However, fluentd is appending timestamps to the output file, making it output..log. Is there a workaround to make this file output.log?
Check out the documentation. Seems like you can define custom log formats. ( https://docs.fluentd.org/v/0.12/articles/common-log-formats )
format /^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\] \[pid (?<pid>[^\]]*)\] \[client (?<client>[^\]]*)\] (?<message>.*)$/
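For reference, that pattern looks like the Apache error-log style format from the page above; an invented line it would match is:

[Wed Oct 11 14:32:52 2000] [error] [pid 12345] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico

which captures the bracketed timestamp as time, plus level, pid, client, and the trailing message.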
I have the following time key: "2019-05-12T14:52:13.136621898Z"
I can't figure out the time_format to parse this. I tried "%Y-%m-%dT%H:%M:%S.%NZ", which should work from my understanding. When I parse with it, my logs are stored starting from the epoch, suggesting that parsing is failing.
If you are using the configuration provided in your earlier question, then within the source's parse section specify the time_format pattern; that should take care of creating the files based on your input's time.
Here is an example source configuration:
<source>
  @type dummy
  tag dummy
  dummy [
    {"message": "blah","time":"2019-05-12T14:52:13.136621898Z"}
  ]
  <parse>
    @type json
    time_format "%Y-%m-%dT%H:%M:%S.%NZ"
  </parse>
</source>
<match dummy>
  @type stdout
</match>
If your input time is under a different JSON key name, then additionally provide the time_key.
{"message": "blah","mytimekey":"2019-05-12T14:52:13.136621898Z"}
time_key mytimekey
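For example (keeping the hypothetical mytimekey name from above), the parse section would become:

<parse>
  @type json
  time_key mytimekey
  time_format "%Y-%m-%dT%H:%M:%S.%NZ"
</parse>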
https://docs.fluentd.org/v1.0/articles/parse-section#parse-parameters
I am trying to receive data with fluentd from an external system; the payload looks like:
data={"version":"0.0";"secret":null}
Response is:
400 Bad Request
'json' or 'msgpack' parameter is required
If I send the same string with "json" instead of "data" (like json={"version":"0.0";"secret":null}), everything is OK, but I cannot change the real source. How can I configure fluentd to accept it the same way? Thanks.
example of fluent.conf:
<source>
  @type http
  port 24224
  bind 0.0.0.0
  # accept {"key":"value"} input
  format json
  # accept json={"key":"value"} input
  #format default
</source>
<match **>
  @type file
  @id output1
  path /fluentd/log/data.*.log
  symlink_path /fluentd/log/data.log
  format json
  append true
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y%m%dT%H%M%S%z
</match>
I have tried using a regex and modifying the data with nginx. A regex is not feasible because the data is encoded and complex, and I did not find a way to modify POST data with nginx (which would be a bad approach anyway).
I'll answer myself. After trying a lot of configurations (and hours of reading the official fluentd/nginx documentation and various blogs), I decided to create a plugin (http://docs.fluentd.org/articles/plugin-development#parser-plugins). I ended up with this solution:
Parser plugin
require 'webrick/httputils'
require 'json'

module Fluent
  class TextParser
    class CMXParser < Parser
      # Register this parser as "parser_CMX"
      Plugin.register_parser("parser_CMX", self)

      # Name of the form field that carries the JSON payload ("data" by default)
      config_param :format_hash, :string, :default => "data"

      def configure(conf)
        super
      end

      # This is the main method. The input "text" is the unit of data to be parsed.
      def parse(text)
        # The body arrives form-encoded as "data={...}", so decode the form first
        params = WEBrick::HTTPUtils.parse_query(text)
        record = JSON.parse(params[@format_hash])
        # No time in the payload, so yield nil and let Fluentd assign the current time
        yield nil, record
      end
    end
  end
end
Config for Fluentd
<source>
  @type http
  port 24224
  bind 0.0.0.0
  body_size_limit 32m
  keepalive_timeout 5s
  format parser_CMX
</source>
<match **>
  @type file
  @id output1
  path /fluentd/log/data.*.log
  symlink_path /fluentd/log/data.log
  format json
  append true
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y%m%dT%H%M%S%z
</match>
I think there is room to implement this in the core code, because the base in_http script does the same thing, except it only uses the hardcoded string params['json']. It could take a new parameter like "format_hash"/"format_map" containing a map for this purpose.
http://docs.fluentd.org/articles/in_http
This article shows accepted formats.
"How can i config fluentd to accept it same way?"
Do you mean you want to parse data={"k":"v"} with format json? If so, it can't be done.
I'm using fluentd to tail log files and push the logs to an Elasticsearch index. I have two questions:
1) How does fluentd store the position it last read into for a given file?
An example in my pos file is -
/myfolder/myfile.log 00000000004cfccb 0000000000116ce0
What do the values 00000000004cfccb and 0000000000116ce0 denote?
2) This particular file (myfile.log) has 2520 lines in total. For some reason the last 100 lines were not read. I restarted the td agent but it still failed to read the last 100 lines. When can that happen?
My td-agent source looks something like this -
<source>
  type tail
  format none
  path /myfolder/*.log
  pos_file /var/log/td-agent/mylogfiles.pos
  tag mylog.*
  read_from_head true
</source>
Thanks,
For 1, see this comment: https://github.com/fluent/fluentd/blob/5ed4b29ebda0815edb7bdd444e7e5ca2e9874a27/lib/fluent/plugin/in_tail.rb#L652.
The two values are the hex byte offset last read in the file and the hex inode of the file.
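For illustration, a small Ruby sketch (assuming the tab-separated pos-file line from the question) that decodes the two hex fields:

# Decode one line of an in_tail pos file: path, position, inode
line = "/myfolder/myfile.log\t00000000004cfccb\t0000000000116ce0"
path, pos_hex, inode_hex = line.split(/\s+/)
pos   = pos_hex.to_i(16)   # bytes already read, 5045451 here
inode = inode_hex.to_i(16) # inode number of the tailed file
puts "#{path}: read up to byte #{pos}, inode #{inode}"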
For 2, when you also write the events to a file, does it contain 2420 lines as well?
You can check whether the problem is in in_tail or in out_elasticsearch with the configuration below.
<match mylog.*>
  type copy
  <store>
    type file
  </store>
  <store>
    type elasticsearch
  </store>
</match>
1)
pos_file /var/log/td-agent/mylogfiles.pos
The parameter above is highly recommended. Fluentd records the position it last read into this file; whenever you restart td-agent, it resumes from that last recorded position. (mylogfiles.pos internally stores values such as 00000000004cfccb and 0000000000116ce0; these values change as td-agent reads further and whenever it restarts.)
2)
Please see the URL below:
https://docs.fluentd.org/v1.0/articles/in_tail#read_lines_limit
Also try removing read_from_head true once and see whether that helps.
From the fluentd docs:
The regexp for the format parameter can be specified. If the parameter value starts and ends with "/", it is considered to be a regexp. The regexp must have at least one named capture (?<NAME>PATTERN). If the regexp has a capture named 'time', it is used as the time of the event. You can specify the time format using the time_format parameter.
I'm just getting started with fluentd, but I would like to be able to set up a single output match rule, like so:
<match myapp.**>
  type file
  path logs/
  time_slice_format %Y%m%dT%H
  time_slice_wait 5m
  time_format %Y%m%dT%H%M%S%z
</match>
This works great, but I would like to find a way to also add the tag name to the output filename; is this possible? For example, if I log with myapp.debug I would like it to write to logs/myapp.debug20140918T12_0.log, and if I log with myapp.info it would write to logs/myapp.info20140918T12_0.log, etc.
Is there a way to add the tag into the filename?
Use the fluent-plugin-forest plugin.
https://github.com/tagomoris/fluent-plugin-forest
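A rough sketch of how that could look for the match rule above (untested; forest expands the ${tag} placeholder per tag):

<match myapp.**>
  type forest
  subtype file
  <template>
    path logs/${tag}
    time_slice_format %Y%m%dT%H
    time_slice_wait 5m
    time_format %Y%m%dT%H%M%S%z
  </template>
</match>

With events tagged myapp.debug this should produce files named something like logs/myapp.debug.20140918T12_0.log.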