I am using fluentd version 0.14.6. I want to have the milliseconds (or better) captured by fluentd and then passed on to ElasticSearch, so that the entries are shown in the correct order. Here is my fluentd.conf:
<source>
@type tail
path /home/app/rails/current/log/development.log
pos_file /home/app/fluentd/rails.pos
tag rails.access
format /^\[(?<time>\S{10}T\S{8}.\d{3})\] \[(?<remoteIP>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\] \[(?<requestUUID>\w+)\] \[AWSELB=(?<awselb>\w+)\] (?<message>.*)/
time_format %Y-%m-%dT%H:%M:%S.%L
</source>
<match rails.access>
@type stdout
time_as_integer false
</match>
And here is a sample log entry from Rails:
[2016-09-27T19:10:05.732] [xxx.xxx.xxx.46] [46171c9870ab2d06bc3a9a0bb02] [AWSELB=97B1C1B51866B68887CF7F5B8C352C45CA31592743CF389F006C541D59ED5E01852E7EF67C807B1CFC8BC145D569BCB9859AFCA73D10A87920CF2269DE5A47D16536B33873DEEF4A24967661232B38E564] Completed 200 OK in 39.8ms (Views: 0.5ms | ActiveRecord: 14.9ms)
This all parses fine, except that the milliseconds are dropped. Here is a result from STDOUT:
2016-09-27 19:43:56 +0000 rails.access: {"remoteIP":"xxx.xxx.xxx.46","requestUUID":"0238cb3d812534487181b2c54bd20","awselb":"97B1C1B51866B68887CF7F5B8C352C43CA21592743CF389F006C541D59ED5E01852E7EF67C807B1CFC8BC145D569BCB9859AFCA73D10A87920CF2269DE5A47D16536B33873DEEF4A24967661232B38E564","message":""}
I have searched SO, but the two posts listed are from a time before this PR, which is supposed to add millisecond support. It has been merged. The PR mentions adding a time_as_integer option, which I have done. I tried setting it to both true and false, as there is some confusion in the PR, but it made no difference. I also tried putting it into the source, but that threw an error.
I also looked at this post, which is trying to get to nanoseconds, which I don't need. It is also not a good solution for me, as the time would then come from fluentd, not Rails.
Thanks for your help!
Your source is configured properly; the milliseconds are available to the output plugin. The stdout output plugin just does not print milliseconds (see the plugin code).
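To check the parsing side on its own, here is a small Ruby sketch (fluentd itself is written in Ruby) that applies the question's regexp and time_format to a shortened copy of the sample line. It uses only the standard library's Regexp and Time.strptime, not fluentd's parser classes, so treat it as an approximation of what in_tail does; the IP address and the shortened AWSELB value are made up for illustration.

```ruby
require 'time'

# Same expression as the `format` line in the question's <source> block.
PATTERN = /^\[(?<time>\S{10}T\S{8}.\d{3})\] \[(?<remoteIP>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\] \[(?<requestUUID>\w+)\] \[AWSELB=(?<awselb>\w+)\] (?<message>.*)/

# Hypothetical sample: IP invented, AWSELB value shortened.
LINE = "[2016-09-27T19:10:05.732] [10.0.0.46] [46171c9870ab2d06bc3a9a0bb02] " \
       "[AWSELB=97B1C1B5] Completed 200 OK in 39.8ms"

match = PATTERN.match(LINE)
# Same pattern as `time_format`; %L parses the three millisecond digits.
parsed = Time.strptime(match[:time], "%Y-%m-%dT%H:%M:%S.%L")

puts parsed.usec            # 732000
puts parsed.strftime("%L")  # 732
```

If this prints 732, the parse step keeps the milliseconds, which fits the point that they are lost only when the stdout plugin renders the event.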
You can verify that the milliseconds are available by using the file output plugin:
<match rails.access>
@type file
path example.out.log
time_format %Y-%m-%dT%H:%M:%S.%L
</match>
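The difference between the two outputs comes down to the format string used when the event time is turned back into text. A minimal Ruby sketch, assuming the stdout plugin renders times roughly like the question's output line (date, time, UTC offset, no fraction); that shape is inferred from the output line, not taken from the plugin's code:

```ruby
# Event time with millisecond precision (the value from the sample Rails line).
event_time = Time.utc(2016, 9, 27, 19, 10, 5, 732000)

# Roughly the shape of the stdout line in the question: no sub-second field.
stdout_like = event_time.strftime("%Y-%m-%d %H:%M:%S %z")
# The time_format given to the file output keeps milliseconds via %L.
file_like = event_time.strftime("%Y-%m-%dT%H:%M:%S.%L")

puts stdout_like  # 2016-09-27 19:10:05 +0000
puts file_like    # 2016-09-27T19:10:05.732
```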
The Elasticsearch output plugin does take milliseconds into account (see the plugin source).
Related
I'm trying to use fluentd to copy a bunch of log files. All log files need to be written to the same destination directory.
<source>
@type tail
@id container-input
format none
path "/var/log/containers/plugin*.log"
# This path would match multiple files that I want to log
pos_file "/var/log/plugin.log.pos"
refresh_interval 5
rotate_wait 5
read_from_head "true"
tag plugin.*
</source>
<filter plugin.**>
@type record_transformer
<record>
filename ${tag_suffix[-2]}
</record>
</filter>
<match plugin**>
@type file
path /destlogs/plugin.log
</match>
What I want is to use the filename somewhere in the output path, something like:
path /destlogs/plugin-${filename}.log
However, when I use such a configuration, fluentd does not pick up the filename tag as a variable; it just creates the path literally.
How can I use a tag as a variable in the output path?
The issue here was the version. v0.12 does not seem to support using tags as variables in the match section, whereas v1.0 does. I was using the fluentd image fluent/fluentd-kubernetes-daemonset:elasticsearch, which I realized uses the older fluentd version. Updating it to fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch resolved the issue.
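For reference, here is a Ruby sketch of what `${tag_suffix[-2]}` evaluates to in the question's setup, plus a hypothetical expand_path helper showing the substitution the asker wanted in the output path. tag_suffix follows record_transformer's documented meaning (tag parts from index N to the end, joined with dots); the gsub-based expansion is only an illustration, not fluentd's implementation. In v1, a record field such as filename can only be used as a path placeholder if it is also declared as a buffer chunk key (for example `<buffer filename>`).

```ruby
# record_transformer's tag_suffix[N]: tag parts from index N to the end, joined by ".".
def tag_suffix(tag, n)
  (tag.split(".")[n..-1] || []).join(".")
end

# Hypothetical helper: replace ${key} with record[key], leaving unknown keys untouched.
def expand_path(template, record)
  template.gsub(/\$\{(\w+)\}/) { record.fetch(Regexp.last_match(1), Regexp.last_match(0)) }
end

# in_tail with `tag plugin.*` turns /var/log/containers/plugin-abc.log into this tag
# (the file name "plugin-abc.log" is made up for illustration).
tag = "plugin.var.log.containers.plugin-abc.log"
record = { "filename" => tag_suffix(tag, -2) }

puts record["filename"]                                       # plugin-abc.log
puts expand_path("/destlogs/plugin-${filename}.log", record)  # /destlogs/plugin-plugin-abc.log.log
```

The doubled `.log` comes from tag_suffix[-2] including the file extension; `${tag_parts[-2]}` would give just `plugin-abc` here.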
I succeeded in getting dockerized fluentd TCP logging to run! Remote Python containers use a slightly modified logging.handlers.SocketHandler to send some JSON to fluentd, and it actually arrives there, looking like this:
2020-08-31T09:06:31+00:00 paws.tcp {"service_uuid":"paws_log","loglvl":"INFO","file":"paws_log.paws_log","line":59,"msg":"Ping log line #2"}
I have multiple such Python containers and would like fluentd to add some kind of source ID to each log event. Reading the docs made me give the filter -> record mechanism a chance, leading to the following config snippet with a newly added filter block:
<source>
@type tcp
@label stream_paws
@id paws_tcp
tag paws.tcp
port 5170
bind 0.0.0.0
# https://docs.fluentd.org/parser/regexp
<parse>
@type regexp
expression /^(?<service_uuid>[a-zA-Z0-9_-]+): (?<logtime>[^\s]+) (?<loglvl>[^\s]+) \[(?<file>[^\]:]+):(?<line>\d+)\]: (?<msg>.*)$/
time_key logtime
time_format %H:%M:%S
types line:integer
</parse>
</source>
# Add meta data fluentd side.
# https://docs.fluentd.org/deployment/logging
<filter **> # << Does NOT seem to work if kept outside the label-block! Inside is fine.
@type record_transformer
<record>
host "#{Socket.gethostname}"
</record>
</filter>
<label stream_paws>
<match paws.tcp>
@type file
@id output_paws_tcp
path /fluentd/log/paws/data/tcp.*.log
symlink_path /fluentd/log/paws/tcp.log
</match>
</label>
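To see what the <parse> block does with a line, here is a Ruby sketch applying the same expression with plain Regexp (not fluentd's parser). It casts line to an integer as `types line:integer` does and removes logtime from the record, which is what fluentd does with time_key unless keep_time_key is set. The input line is hypothetical, shaped to fit the expression.

```ruby
EXPRESSION = /^(?<service_uuid>[a-zA-Z0-9_-]+): (?<logtime>[^\s]+) (?<loglvl>[^\s]+) \[(?<file>[^\]:]+):(?<line>\d+)\]: (?<msg>.*)$/

# Returns [event_time_string, record], or nil when the line does not match.
def parse_line(text)
  m = EXPRESSION.match(text)
  return nil unless m
  record = m.named_captures
  logtime = record.delete("logtime")       # used as the event time (time_key logtime)
  record["line"] = Integer(record["line"]) # types line:integer
  [logtime, record]
end

time, record = parse_line("paws_log: 09:06:31 INFO [paws_log.paws_log:59]: Ping log line #2")
puts time  # 09:06:31
p record
```

The resulting record has the same fields as the event shown at the top of the question.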
I have two questions here:
The above config works if I put the filter block inside the label block. But I do not want to do this, because I want the filter to act globally. @include directives might offer a workaround here. Anything better?
I suspect "#{Socket.gethostname}" yields information on the fluentd server. However, I want something on the client, ideally an ID that is unique at the Docker container level (the container ID, for example; any client-unique UUID would be fine). Do you know of such a property accessible to fluentd?
If you are using the fluentd Docker logging driver, it already adds container metadata (including the container ID) to every log record:
https://docs.docker.com/config/containers/logging/fluentd/
The above config works if I put the filter block inside the label block. But I do not want to do this, because I want the filter to act globally. @include directives might offer a workaround here. Anything better?
A global filter is usually implemented on the server like this:
<source>
...
</source>
<filter **> # filter globally
...
</filter>
<match tag.one>
...
</match>
<match tag.two>
...
</match>
<match **> # the rest
...
</match>
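The reason the question's top-level `<filter **>` has no effect is routing: a <source> that sets @label sends its events straight into that <label> section, so top-level <filter> and <match> sections never see them. The top-level layout above therefore only applies to events from sources without @label. A toy Ruby model of that rule (not fluentd's actual router; the section contents are hypothetical):

```ruby
# Toy model of fluentd routing; each filter is a lambda that takes and returns a record.
TOP_LEVEL_FILTERS = [->(rec) { rec.merge("host" => "fluentd-server") }].freeze
LABEL_FILTERS = { "stream_paws" => [] }.freeze # <label stream_paws> has no <filter>

def apply_filters(record, label: nil)
  # Labeled events only pass through their own <label> section.
  filters = label ? LABEL_FILTERS.fetch(label, []) : TOP_LEVEL_FILTERS
  filters.reduce(record) { |rec, filter| filter.call(rec) }
end

p apply_filters({ "msg" => "ping" }, label: "stream_paws") # no "host" added
p apply_filters({ "msg" => "ping" })                       # "host" added
```

So the options are to keep the filter inside the label, as the question found, or to drop @label from the source so the global filter applies.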
I suspect "#{Socket.gethostname}" yields information on the fluentd server.
Correct, see: https://docs.fluentd.org/filter/record_transformer#example-configurations. This is useful when you also want to track which server processed the log record.
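On the client side, one option (an assumption, not something the answer covers) is to have each container stamp its own identity into the record before sending it. Docker sets a container's hostname to its short container ID unless a hostname is set explicitly, so the hostname seen inside the container is usually enough. The question's clients are Python; here is the same idea sketched in Ruby, with build_log_line as a hypothetical helper:

```ruby
require 'json'
require 'socket'

# Hypothetical helper: serialize one event, stamped with the sender's hostname.
# Inside a Docker container this is the short container ID by default.
def build_log_line(service_uuid, msg, host: Socket.gethostname)
  JSON.generate("service_uuid" => service_uuid, "host" => host, "msg" => msg) + "\n"
end

print build_log_line("paws_log", "Ping log line #2")
```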
If you are using Kubernetes, use the kubernetes_metadata filter; it adds pod details to each log entry:
<filter kubernetes.**>
@id filter_kubernetes_metadata
@type kubernetes_metadata
</filter>
For Docker
I've not really used fluentd before, so apologies for a slightly abstract response here. But, checking http://docs.fluentd.org/, I guess you're probably using in_tail for the logs? From the example there, it looks like you'd want to get the path to the file into the event's tag:
path /path/to/file
tag foo.*
which apparently tags events with foo.path.to.file
You could probably use http://docs.fluentd.org/articles/filter_record_transformer with enable_ruby. It looks like you could process the foo.path.to.file tag with a little Ruby to extract the container ID and then parse the container's JSON config file.
For example, test with the following Ruby file, say foo.rb:
require 'json'
tag = 'foo.var.lib.docker.containers.ID.ID-json.log'
id = tag.split('.')[5] # "ID": the container ID segment of the tag
puts JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']
where config.v2.json was something like:
{"image":"foo"}
it prints:
foo
Fluentd might already load the json library for you, so maybe you could leave out the require 'json' bit. Then, putting this in fluentd terms, perhaps you could use something like:
<filter>
@type record_transformer
enable_ruby
<record>
container ${tag.split('.')[5]}
image ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']}
</record>
</filter>
In your case, you might be able to use something like the following:
<filter raw.**>
@type record_transformer
enable_ruby
<record>
container ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))["Name"][1..-1]}
hostname "#{Socket.gethostname}"
</record>
</filter>
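A runnable version of the lookup above, as a Ruby sketch: it reads config.v2.json from a temporary directory standing in for /var/lib/docker/containers (the directory name abc123 and container name web are made up). It assumes, as the filter does, that the container ID is the sixth dot-separated part of the tag, and that Docker stores Name with a leading "/", which is what the [1..-1] strips.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Same logic as the ${...} expression in the filter, with a configurable base directory.
def container_name_for_tag(tag, base_dir = "/var/lib/docker/containers")
  id = tag.split(".")[5] # raw.var.lib.docker.containers.<ID>.<ID>-json.log
  config = JSON.parse(File.read(File.join(base_dir, id, "config.v2.json")))
  config["Name"][1..-1]  # drop the leading "/"
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "abc123"))
  File.write(File.join(dir, "abc123", "config.v2.json"), JSON.generate("Name" => "/web"))
  puts container_name_for_tag("raw.var.lib.docker.containers.abc123.abc123-json.log", dir) # web
end
```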
fluentd: how to source a log file whose name contains a timestamp
For example, Catalina logs are generated with a timestamp in the file name:
catalina.2018-11-05.log
catalina.2018-12-03.log
catalina.2018-12-10.log
I would like fluentd to read the latest log file based on the timestamp in the file name. Can you suggest what the source path should look like in td-agent.conf?
<source>
@type tail
path D:\apache-tomcat-9.0.12\logs\catalina.[TODAY].log
pos_file C:\opt\td-agent\javalogs.log.pos
tag javalogs
<parse>
@type json
</parse>
</source>
<match javalogs>
@type stdout
</match>
Try the following path syntax:
path D:\apache-tomcat-9.0.12\logs\catalina.%Y-%m-%d.log
Note: make sure your files are named using the same timezone as the fluentd agent process, so that it tails the correct (current) file. The fluentd process also needs read permission on the catalina files.
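The timezone note matters because the date in path is filled in from the agent's clock. A small Ruby sketch with a hypothetical expected_log_name helper shows how the same instant maps to different file names depending on the UTC offset:

```ruby
# Hypothetical helper: the file name the strftime-style path resolves to at `time`.
def expected_log_name(time)
  time.strftime("catalina.%Y-%m-%d.log")
end

instant = Time.utc(2018, 12, 10, 23, 30)
puts expected_log_name(instant)                    # catalina.2018-12-10.log
puts expected_log_name(instant.getlocal("+09:00")) # catalina.2018-12-11.log
```

If Tomcat names its files in one timezone and the agent runs in another, then around midnight the agent may look for a file that does not exist yet.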
We are running Spark jobs (a lot of Spark Streaming) on Google Cloud Dataproc clusters.
We are using Cloud Logging to collect all the logs generated by the Spark jobs.
Currently it generates a lot of "INFO" messages, which push the total log volume to a few TBs.
I want to edit the google-fluentd config to restrict the log level to "ERROR" instead of "INFO".
I tried setting "log_level error" in the config, but it did not work.
The comment section in /etc/google-fluentd/google-fluentd.conf also mentions: # Currently severity is a seperate field from the Cloud Logging log_level.
# Fluentd config to tail the hadoop, hive, and spark message log.
# Currently severity is a seperate field from the Cloud Logging log_level.
<source>
type tail
format multi_format
<pattern>
format /^((?<time>[^ ]* [^ ]*) *(?<severity>[^ ]*) *(?<class>[^ ]*): (?<message>.*))/
time_format %Y-%m-%d %H:%M:%S,%L
</pattern>
<pattern>
format none
</pattern>
path /var/log/hadoop*/*.log,/var/log/hadoop-yarn/userlogs/**/stderr,/var/log/hive/*.log,/var/log/spark/*.log,
pos_file /var/tmp/fluentd.dataproc.hadoop.pos
refresh_interval 2s
read_from_head true
tag raw.tail.*
</source>
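The first <pattern> is what produces the severity field that any filtering has to act on. A Ruby sketch applying that regexp to a made-up Hadoop-style line (plain Regexp, not fluentd's multi_format parser):

```ruby
# First <pattern> regexp from google-fluentd.conf.
HADOOP_PATTERN = /^((?<time>[^ ]* [^ ]*) *(?<severity>[^ ]*) *(?<class>[^ ]*): (?<message>.*))/

# Hypothetical log line in the "date time,millis LEVEL class: message" shape.
sample = "2018-11-05 10:15:30,123 INFO org.apache.hadoop.yarn.server.Example: container started"
m = HADOOP_PATTERN.match(sample)

puts m[:time]     # 2018-11-05 10:15:30,123
puts m[:severity] # INFO
puts m[:class]    # org.apache.hadoop.yarn.server.Example
```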
Correct. As the comment states, @log_level and severity are not the same, which is confusing at best. @log_level configures the verbosity of the component's own logger, whereas severity is the field that Stackdriver Logging ingests.
In order to make fluentd exclude any severity below ERROR you can add a grep filter to /etc/google-fluentd/google-fluentd.conf that explicitly excludes these by name.
At some point before the <match **> block add the following:
<filter raw.tail.**>
@type grep
exclude1 severity (DEBUG|INFO|NOTICE|WARNING)
</filter>
This checks each record's severity field and drops the record if the value matches the regex.
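A Ruby sketch of the check the grep filter performs on each record (a model, not the plugin's code): the pattern is unanchored, and a record is dropped if its severity matches. One thing worth verifying against your logs: log4j, which Hadoop and Spark use, writes WARN rather than WARNING, and the pattern WARNING does not match the value WARN, so such lines would still get through unless WARN is added to the list.

```ruby
EXCLUDE_SEVERITY = /(DEBUG|INFO|NOTICE|WARNING)/

# Keep a record unless its severity matches the exclude pattern.
def keep?(record)
  !EXCLUDE_SEVERITY.match?(record["severity"].to_s)
end

records = %w[DEBUG INFO WARN WARNING ERROR].map { |s| { "severity" => s } }
kept = records.select { |r| keep?(r) }.map { |r| r["severity"] }
p kept # ["WARN", "ERROR"]
```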
I came across Fluentd last week. I liked it at first (still do), but there seem to be a few holes that are preventing me from using it.
I'm trying to forward our logs to two different locations - an S3 bucket to archive, and an Elasticsearch database for analytics with kibana. I looked at the fluent-forest-plugin, but I realize that won't work because of this. I tried using the copy plugin, but I'm getting this error:
[error]: config error file="/etc/td-agent/td-agent.conf" error="Other 's3' plugin already use same buffer_path: type = s3, buffer_path = /tmp/fluent-plugin-s3"
with this config
<source>
type tail
path /var/log/nginx/web__error.log
pos_file /var/tmp/nginx_web__error.pos
tag web__error
format /^(?<time>[^ ]+ [^ ]+) \[(?<log_level>.*)\] (?<pid>\d*).(?<tid>[^:]*): (?<message>.*)$/
</source>
<match web__error>
type copy
<store>
type s3
aws_key_id ACC_KEY
aws_sec_key SEC_KEY
s3_bucket log-bucket
path web__error/
buffer_path /tmp/fluent-plugin-s3
s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
time_slice_format %Y-%m-%d/%H
flush_interval 15s
utc
</store>
<store>
type elasticsearch
logstash_format true
logstash_prefix web__error
flush_interval 15s
include_tag_key true
utc_index true
</store>
</match>
From what I've read, once an event is caught by one match block, it can't be caught by any subsequent ones. As a last resort: is there any way to do this that I haven't found yet?
This turned out to be a non-issue: I forgot I was using the same buffer_path in other config files, which caused this error.
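Since the root cause was the same buffer_path appearing in more than one config file, a quick check can catch it. A Ruby sketch with a hypothetical duplicate_buffer_paths helper that scans config text line by line (a plain regex scan, not a real fluentd config parser):

```ruby
# Hypothetical helper: buffer_path values that appear more than once across config texts.
def duplicate_buffer_paths(config_texts)
  paths = config_texts.flat_map { |text| text.scan(/^\s*buffer_path\s+(\S+)/).flatten }
  paths.group_by(&:itself).select { |_path, uses| uses.size > 1 }.keys
end

a = "<store>\n  type s3\n  buffer_path /tmp/fluent-plugin-s3\n</store>\n"
b = "<match other>\n  type s3\n  buffer_path /tmp/fluent-plugin-s3\n</match>\n"
p duplicate_buffer_paths([a, b]) # ["/tmp/fluent-plugin-s3"]
```

On a td-agent host this could be fed with something like `Dir.glob("/etc/td-agent/**/*.conf").map { |f| File.read(f) }`.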