fluentd failed loading custom parser

Fluentd fails to run with my custom parser.
My parser is named dn_multiline_parser and is located at lib/fluent/plugin/parser_dn_log.rb:
require 'fluent/plugin/parser'

# default format: "%time [%level] %message"
module Fluent
  module Plugin
    class DnLogParser < Parser
      Plugin.register_parser('dn_multiline_parser', self)
      ... (parser impl)
    end
  end
end
fluentd.conf:
<worker 0>
<source>
@type tail
# path /external/core/traces/*/*
tag dn.traces
path /core/dn_gen*
path_key logfile
pos_file /tmp/traces.position.traces
tag dn.traces
rotate_wait 5
read_from_head true
multiline_flush_interval 5s
read_lines_limit 100000
refresh_interval 1
exclude_path ["*.gz","*.zip","*.backup"]
<parse>
@type dn_multiline_parser
time_key dn_timestamp
log_format "%t [%l] %m"
time_length 23
time_format %Y-%m-%d %H:%M:%S
</parse>
</source>
</worker>
... (rest of configuration)
When I run Fluentd, I get:
2019-12-23 14:32:24 +0000 [error]: config error file="/etc/fluent/fluent.conf" error_class=Fluent::ConfigError error="Unknown parser plugin 'dn_multiline_parser'. Run 'gem search -rd fluent-plugin' to find plugins"
What I've tried:
Running Fluentd with '-p /path/to/plugin_folder'; it failed with the above error.
Building a gem, installing it, and running Fluentd without '-p'; it also failed with the above error (my plugin shows up in the 'gem list' output).
Any ideas what I'm doing wrong?
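For reference, a minimal registered custom parser usually looks like the sketch below. The parse body is an illustrative placeholder, not the actual implementation, and the file name in the comment is only an assumption based on Fluentd's usual lookup of parser_<type>.rb for an "@type <type>" directive.

# lib/fluent/plugin/parser_dn_multiline_parser.rb -- hypothetical file name
# matching the registered type, assuming the parser_<type>.rb lookup convention
require 'fluent/plugin/parser'

module Fluent
  module Plugin
    class DnLogParser < Parser
      # First argument is the value used for "@type" inside <parse> sections
      Fluent::Plugin.register_parser('dn_multiline_parser', self)

      # Placeholder parse method: yields the whole line as the message
      def parse(text)
        yield Fluent::EventTime.now, { 'message' => text }
      end
    end
  end
end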

Related

OpenSearch Dashboard time field

I have a Fluentd + OpenSearch + OpenSearch Dashboard stack for working with logs. The problem is that my time field in OpenSearch Dashboard is a string, so my filter by time doesn't work.
Does anybody know what's wrong with my configuration?
Fluentd parser:
<source>
@type tail
path /opt/liferay/logs/*.json.log
pos_file /var/log/td-agent/test1_gpay.pos
read_from_head true
follow_inodes true
refresh_interval 10
tag gpay1
<parse>
@type json
time_type string
time_format %Y-%m-%d %H:%M:%S.%L
time_key time
keep_time_key true
</parse>
</source>
My log format is:
{"time":"2023-02-07 14:00:00.039", "level":"DEBUG", "thread":"[liferay/scheduler_dispatch-3]", "logger":"[GeneralListener:82]", "message":"Found 0 tasks for launch."}
And what I have in OpenSearch Dashboard:
I tried to use scripted fields in OpenSearch Dashboard, but my filter for time doesn't work.
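As a quick sanity check outside Fluentd, the time_format from the <parse> section can be tested against the sample timestamp in plain Ruby (a throwaway snippet, not part of the configuration):

require 'time'

# time_format from the <parse> section and a timestamp from the sample log line
sample = '2023-02-07 14:00:00.039'
format = '%Y-%m-%d %H:%M:%S.%L'

parsed = Time.strptime(sample, format)
puts parsed        # 2023-02-07 14:00:00 in the machine's local zone
puts parsed.usec   # 39000, so the .039 milliseconds were picked up

If this parses cleanly, the format string itself matches the log, and the problem is more likely in how the field ends up mapped on the OpenSearch side.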

How to parse kubernetes logs with Fluentd

I use a few services in an EKS cluster. I want the logs from one of my services to be parsed.
kubectl logs "pod_name" --> these are the logs when I check directly in the pod:
2022-09-21 10:44:26,434 [springHikariCP housekeeper ] DEBUG HikariPool - springHikariCP - Fill pool skipped, pool is at sufficient level.
2022-09-21 10:44:36,316 [springHikariCP housekeeper ] DEBUG HikariPool - springHikariCP - Before cleanup stats (total=10, active=0, idle=10, waiting=0)
This service has Java-based logging (Apache Commons Logging), and at the moment the whole log message (date and time + log level + message) is displayed in Kibana as one field:
Is it possible for this whole log to be parsed into separate fields (date and time + log level + message) and displayed in Kibana that way?
This is my fluentd config file for the source and pattern:
<source>
@type tail
path /var/log/containers/*background-executor*.log
pos_file fluentd-docker.pos
tag kubernetes.*
read_from_head true
<parse>
@type multi_format
<pattern>
format json
time_key time
time_type string
time_format "%Y-%m-%dT%H:%M:%S.%NZ"
keep_time_key false
</pattern>
<pattern>
format regexp
expression /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{3})\s+(?<level>[^\s]+)\s+(?<pid>\d+).*?\[\s+(?<thread>.*)\]\s+(?<class>.*)\s+:\s+(?<message>.*)/
time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
keep_time_key false
</pattern>
</parse>
</source>
You just have to update the filter as needed:
<filter **>
@type record_transformer
enable_ruby
<record>
foo "bar"
KEY "VALUE"
podname "${record['tailed_path'].to_s.split('/')[-3]}"
test "passed"
time "${record['message'].match('[0-9]{2}\\/[A-Z][a-z]{2}\\/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}\\s
+[0-9]{4}').to_s}"
</record>
</filter>
You have to parse the record and message with a pattern, matching the data with character classes such as [0-9] or [A-Z], the same way as shown in the example above.
Edit the filter.conf.
You can create your own key and value; in the value you parse the field and Fluentd will populate the value.
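As an illustration of the same idea against the log lines shown in the question, here is a hypothetical standalone Ruby test with a pattern adapted to the "2022-09-21 10:44:26,434 ... DEBUG ..." layout (the named captures are assumptions, not the answer's exact regex):

line = '2022-09-21 10:44:26,434 [springHikariCP housekeeper ] DEBUG HikariPool - springHikariCP - Fill pool skipped, pool is at sufficient level.'

# Pull out timestamp, thread, level and message with named captures
m = line.match(/^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?<thread>[^\]]*)\] (?<level>\S+)\s+(?<message>.*)$/)

puts m[:time]     # "2022-09-21 10:44:26,434"
puts m[:level]    # "DEBUG"
puts m[:message]  # "HikariPool - springHikariCP - Fill pool skipped, pool is at sufficient level."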

Fluentd/Bindplane wrong timestamp for collecting HAProxy logs

I'm facing an issue when I send the logs generated by HAProxy to Google Bindplane.
The log time format is "%b %d %H:%M:%S", and the format GCP Stackdriver accepts is UTC, in a form such as "2020-10-12T07:20:50.52Z".
When I try to transform the log time it doesn't show the right log time; it only shows the time at which I started the Fluentd service, ignoring the log's real time.
I've tried a couple of approaches using Fluentd's time_format, Ruby Time.parse(), and Ruby Time.strptime(); none of them worked.
Here's the config for collecting HAProxy logs
<source>
@type tail
tag varnish.haproxy
path e:/haproxy/log/haproxy.log*
pos_file 'C:\Program Files (x86)\Stackdriver\LoggingAgent\Main\pos\varnish_haproxy.pos'
<parse>
@type multiline
format_firstline /\w{3} \d{2} \d{2}:\d{2}:\d{2}/
format1 /(?<message>.*)/
</parse>
read_from_head true
refresh_interval 2
</source>
<filter varnish.haproxy>
@type parser
key_name message
remove_key_name_field false
reserve_data true
reserve_time true
<parse>
@type multiline
format_firstline /\w{3} \d{2} \d{2}:\d{2}:\d{2}/
format1 /(?<timestamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?<servername>[^\s]) (?<ps>[^\[]+)(?<pid>\[[^\]]+\]): (?<client_ip>[\w\.]+):(?<client_port>\d+) \[(?<request_date>.+)\] (?<frontend_name>[\w\.-]+)~ (?<backend_name>[\w\.-]+)\/(?<server_name>[\w\.-]+) (?<TR>\d+)\/(?<Tw>\d+)\/(?<Tc>\d+)\/(?<Tr>\d+)\/(?<Ta>\d+) (?<status_code>\d+) (?<bytes_read>\d+) (?<captured_request_cookie>.+) (?<captured_response_cookie>.+) (?<termination_state>.+) (?<actconn>\d+)\/(?<feconn>\d+)\/(?<beconn>\d+)\/(?<srv_conn>\d+)\/(?<retries>\d+) (?<srv_queue>\d+)\/(?<backend_queue>\d+) \"(?<message>.*)\"/
</parse>
</filter>
<filter varnish.haproxy>
@type record_transformer
enable_ruby
<record>
timestamp ${t = Time.parse(record['timestamp']).utc; {'seconds' => t.tv_sec, 'nanos' => t.tv_nsec}}
</record>
</filter>
The log file sample:
Jun 23 14:00:00 localhost haproxy[26781]: xx.xx.xx.xxx:xxxxx [23/Jun/2021:14:00:00.561] https-in~ nodes-http/xxxxxxxx 0/0/0/314/314 200 278 - - --NI 274/274/100/23/0 0/0 "GET /api/customer/xxxxxxxx HTTP/1.1"
Jun 23 14:00:00 localhost haproxy[26781]: xx.xx.xx.xxx:xxxxxx [23/Jun/2021:13:59:59.901] https-in~ nodes-http/xxxxxxxxx 0/0/1/994/995 200 1485 - - --NI 274/274/100/22/0 0/0 "GET /api/customer/view HTTP/1.1"
Here's a screenshot of the timestamp in GCP Logging:
Have you tried using Time.strftime?
timestamp ${Time.parse(record['timestamp']).strftime("%Y-%m-%dT%k:%M:%S:00Z")}
I assumed the nanoseconds would be unimportant, since it seems that Fluentd doesn't provide them.
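For reference, the conversion from the syslog-style timestamp in the sample log to the RFC 3339 form that Stackdriver expects can be sketched in plain Ruby; the year is supplied explicitly here because "%b %d %H:%M:%S" carries no year or zone, so this is an assumption rather than the poster's code:

require 'time'

raw = 'Jun 23 14:00:00'   # timestamp captured by the <parse> regex above

# Prepend a known year, parse with the log's pattern, then convert to UTC
t = Time.strptime("2021 #{raw}", '%Y %b %d %H:%M:%S').utc

puts t.strftime('%Y-%m-%dT%H:%M:%SZ')   # e.g. "2021-06-23T11:00:00Z", depending on the local offset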

format of fluentd logs

I'm having issues figuring out how to parse logs in my k8s cluster using fluentd.
I get some logs that look like
2019-09-19 16:05:44.257 [info] some log message
I would like to parse out the time, log level, and log message.
I've tested the regex here and it seems to parse out the pieces I need. But when I add it to the ConfigMap that contains the Fluentd configuration, when my logs ship I still just see one long log message like the above, and the level is not parsed out. I'm brand new to Fluentd, so I'm not sure how to do this.
The ConfigMap looks like:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-config-map
  namespace: logging
  labels:
    k8s-app: fluentd-logzio
data:
  fluent.conf: |-
    <match fluent.**>
      # this tells fluentd to not output its log on stdout
      @type null
    </match>
    # here we read the logs from Docker's containers and parse them
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      tag raw.kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format /^(?<time>.+) (\[(?<level>.*)\]) (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
        </pattern>
        <pattern>
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
        </pattern>
        <pattern>
          format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
        </pattern>
      </parse>
    </source>
    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>
    # Concatenate multi-line logs
    <filter **>
      @id filter_concat
      @type concat
      key message
      multiline_end_regexp /\n$/
      separator ""
    </filter>
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
    </filter>
    <match kubernetes.**>
      @type logzio_buffered
      @id out_logzio
      endpoint_url ###
      output_include_time true
      output_include_tags true
      <buffer>
        # Set the buffer type to file to improve the reliability and reduce the memory consumption
        @type file
        path /var/log/fluentd-buffers/stackdriver.buffer
        # Set queue_full action to block because we want to pause gracefully
        # in case of the off-the-limits load instead of throwing an exception
        overflow_action block
        # Set the chunk limit conservatively to avoid exceeding the GCL limit
        # of 10MiB per write request.
        chunk_limit_size 2M
        # Cap the combined memory usage of this buffer and the one below to
        # 2MiB/chunk * (6 + 2) chunks = 16 MiB
        queue_limit_length 6
        # Never wait more than 5 seconds before flushing logs in the non-error case.
        flush_interval 5s
        # Never wait longer than 30 seconds between retries.
        retry_max_interval 30
        # Disable the limit on the number of retries (retry forever).
        retry_forever true
        # Use multiple threads for processing.
        flush_thread_count 2
      </buffer>
    </match>
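As a quick sanity check outside Fluentd, the regex from the first multi_format pattern above can be exercised against the sample line in plain Ruby (a hypothetical test snippet, not part of the ConfigMap):

# Same expression as the first <pattern> block, applied to the sample log line
pattern = /^(?<time>.+) (\[(?<level>.*)\]) (?<log>.*)$/
line = '2019-09-19 16:05:44.257 [info] some log message'

m = line.match(pattern)
puts m[:time]   # "2019-09-19 16:05:44.257"
puts m[:level]  # "info"
puts m[:log]    # "some log message"

Note that the pattern's time_format (%Y-%m-%dT%H:%M:%S.%N%:z) does not match the captured time string above, which is one thing worth checking when records arrive unparsed.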

Fluentd log source

I'm using Fluentd to ship two types of logs (application and other logs) to an Elasticsearch cluster.
The logs are located in the same folder, /var/log/containers/, and have the same name format, e.g. app-randomtext.log, dts-randomtext.log, etc.
I'd like to assign different indices to them, to separate the app logs from any others that are present now or will appear in this folder.
Here is my attempt to use a wildcard for "path" in the source block, but it doesn't work. Could anybody point out my mistake? Thanks.
##source for app logs
<source>
@type tail
path /var/log/containers/app*.log
pos_file /var/log/fluentd-containers-app.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag app.*
keep_time_key true
format json
</source>
##source for everything else
<source>
@type tail
path /var/log/containers/!(app*.log)
pos_file /var/log/fluentd-containers-non-app.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag non-app.*
keep_time_key true
format json
</source>
<match app.**>
#type "aws-elasticsearch-service"
type_name "kube-fluentd-aws-es"
index_name app
include_tag_key true
tag_key "@log_name"
@log_level info
<endpoint>
url "#{ENV['ELASTICSEARCH_URL']}"
region "#{ENV['ELASTICSEARCH_REGION']}"
access_key_id "#{ENV['ELASTICSEARCH_ACCESS_KEY']}"
secret_access_key "#{ENV['ELASTICSEARCH_SECRET_KEY']}"
</endpoint>
</match>
<match non-app.**>
#type "aws-elasticsearch-service"
type_name "kube-fluentd-aws-es"
index_name non-app
include_tag_key true
tag_key "@log_name"
@log_level info
<endpoint>
url "#{ENV['ELASTICSEARCH_URL']}"
region "#{ENV['ELASTICSEARCH_REGION']}"
access_key_id "#{ENV['ELASTICSEARCH_ACCESS_KEY']}"
secret_access_key "#{ENV['ELASTICSEARCH_SECRET_KEY']}"
</endpoint>
</match>
I expect Fluentd to tail all the logs in the folder, but with this config Fluentd only tails app-randomtext.log.
Thanks
Finally I managed to find an answer: exclude_path is what I need.
##source for everything else
<source>
@type tail
path /var/log/containers/*.log
exclude_path ["/var/log/containers/app*.log"]
pos_file /var/log/fluentd-containers-non-app.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag non-app.*
keep_time_key true
format json
</source>
Here Fluentd follows all the *.log files, excluding those that start with app.
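A likely reason the !(app*.log) form didn't work is that in_tail expands path as an ordinary glob, which has no shell-style negation, while exclude_path filters the matched files afterwards. The hypothetical snippet below mimics that two-step behaviour with Ruby's File.fnmatch (the file names are made up for illustration):

# Hypothetical file list standing in for /var/log/containers/
files = ['app-randomtext.log', 'dts-randomtext.log', 'other-randomtext.log']

# Same idea as exclude_path ["/var/log/containers/app*.log"]
exclude = ['app*.log']

# Keep every *.log file except those matching an exclude pattern,
# roughly what in_tail does with path + exclude_path
kept = files.select { |f| File.fnmatch('*.log', f) }
            .reject { |f| exclude.any? { |pat| File.fnmatch(pat, f) } }

puts kept   # dts-randomtext.log, other-randomtext.log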
