Fluentd how to get source from a file by executing a script - fluentd

I have a script called script.py. After running it, I get various .log files in a folder. How do I write a config file that runs the script and then successfully ships the log files?
Here is my configuration, which appears to produce no output from the @type stdout match:
<source>
@type exec
tag sensor_1.log-raw-data
command python /home/cool/Desktop/script.py
run_interval 5m
<parse>
keys something
</parse>
</source>
<source>
@type tail
read_from_head true
path /home/cool/Desktop/logs/0slaprunner.log
tag foo.*
<parse>
@type none
</parse>
</source>
<match pattern>
@type stdout
</match>

The argument of the match section is pattern, which means it handles events tagged pattern. But none of the tags in the source sections is pattern; hence, nothing is routed to stdout.
From your description, it looks like you want to route events from the tail input plugin to the stdout output plugin, so the relevant configuration would be something like this:
<source>
@type tail
# ...
tag foo # tag for the events
# ...
</source>
<match foo> # cater events with tag `foo`
@type stdout
</match>
For debugging purposes, run fluentd with -v or -vv command-line option.
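Putting the answer together, a minimal end-to-end sketch would look like this (the tag and path are assumptions carried over from the question; adjust to your setup):

```
<source>
  @type tail
  read_from_head true
  path /home/cool/Desktop/logs/0slaprunner.log
  tag foo
  <parse>
    @type none
  </parse>
</source>

<match foo>
  @type stdout
</match>
```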

Fluentd input_tail plugin generates different event key_name

I was told to collect three Java logs into Elasticsearch; the logs are rotated with logrotate.
I'm using the in_tail plugin to collect the logs.
I'm using td-agent v4, installed via yum.
I noticed that td-agent emits some contradictory error logs:
2022-07-15 09:58:32 +0800 [warn]: #0 dump an error event: error_class=ArgumentError error="message does not exist" location=nil tag="condition-service" time=2022-07-15 09:57:26.032947914 +0800 record={"Log"=>"2022-07-15 09:57:26.032 [DubboServerHandler-10.65.8.13:20882-thread-100] ..."}
... I noticed the error above and then modified key_name:
2022-07-15 16:44:37 +0800 [warn]: #0 dump an error event: error_class=ArgumentError error="Log does not exist" location=nil tag="condition-service" time=2022-07-15 16:44:37.938689464 +0800 record={"message"=>"2022-07-15 16:44:37.938 [checkQuoteAutoPush-thread] ..."}
I think in_tail generates different event key names. Why? How do I control the key_name and fix it?
Here is my td-agent configuration:
<system>
log_level info
</system>
<source>
@type tail
@label @condition-service
path /XXXX/log1.log,/XXXX/log2.log
pos_file /var/log/td-agent/tmp/condition-service.log.pos
tag condition-service
pos_file_compaction_interval 24h
<parse>
@type none
</parse>
</source>
<source>
@type tail
@label @condition-quotes
path /XXXX/log3.log
pos_file /var/log/td-agent/tmp/condition-quotes.log.pos
tag condition-quotes
pos_file_compaction_interval 24h
<parse>
@type none
</parse>
</source>
<label @condition-quotes>
<filter condition-quotes>
@type parser
key_name Log
<parse>
@type regexp
expression /^(?<Log>.*)/gm
</parse>
</filter>
<match condition-quotes>
@type elasticsearch
host XX.XX.XX.XX
port 9200
logstash_format true
logstash_prefix ${tag}
</match>
</label>
<label @condition-service>
<filter condition-service>
@type parser
key_name Log
<parse>
@type regexp
expression /^(?<Log>.*)/gm
</parse>
</filter>
<match condition-service>
@type elasticsearch
host XX.XX.XX.XX
port 9200
logstash_format true
logstash_prefix ${tag}
</match>
</label>
After reviewing my configuration,
I think the root cause is the none parser plugin: I should specify the message_key parameter.
...
<parse>
@type none
message_key Log
</parse>
...
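Applied to the first source above, the corrected tail source would look something like this (a sketch; paths kept as in the question):

```
<source>
  @type tail
  path /XXXX/log1.log,/XXXX/log2.log
  pos_file /var/log/td-agent/tmp/condition-service.log.pos
  tag condition-service
  <parse>
    @type none
    message_key Log   # record key now matches key_name Log in the filter
  </parse>
</source>
```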

multi_format with multiple lines (logs)

Yoo!
I'm new to fluentd and I've been messing around with it to make it work with GKE, and stumbled upon one issue.
Problem
I'm using a filter to parse the container logs, and I need different regex expressions, so I added multi_format and it worked perfectly. After that I noticed that trace logs and exceptions were being split into separate logs/lines, so I tried the multiline parser, but it didn't go as planned (it didn't work). I then found out why in the README.md of the multi-format repo:
NOTE
This plugin doesn't work with multiline parsers because parser itself doesn't store previous lines.
Question
So I was wondering whether it is somehow possible to work around this limitation and have multiple regexes to filter the logs while also having a multiline parser or concatenation.
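The thread doesn't include an answer, but one commonly suggested workaround (an assumption, not from this thread) is to join multiline events first with the fluent-plugin-concat filter, and only then apply the multi_format parser, e.g.:

```
# 1. Join continuation lines into one record before any parsing
<filter reform.var.log.containers.backend*.log>
  @type concat
  key log
  multiline_start_regexp /^\[/   # assumed start-of-event pattern; adjust to your logs
  flush_interval 5
</filter>
# 2. Then the existing multi_format parser filter runs on whole events
```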
Config file
kind: ConfigMap
apiVersion: v1
metadata:
#[[START configMapNameCM]]
name: fluentd-gcp-config-custom
namespace: kube-system
labels:
k8s-app: fluentd-gcp-custom
#[[END configMapNameCM]]
data:
containers.input.conf: |-
# This configuration file for Fluentd is used
# to watch changes to Docker log files that live in the
# directory /var/lib/docker/containers/ and are symbolically
# linked to from the /var/log/containers directory using names that capture the
# pod name and container name. These logs are then submitted to
# Google Cloud Logging which assumes the installation of the cloud-logging plug-in.
#
# Example
# =======
# A line in the Docker log file might look like this JSON:
#
# {"log":"2014/09/25 21:15:03 Got request with path wombat\\n",
# "stream":"stderr",
# "time":"2014-09-25T21:15:03.499185026Z"}
#
# The original tag is derived from the log file's location.
# For example a Docker container's logs might be in the directory:
# /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
# and in the file:
# 997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
# where 997599971ee6... is the Docker ID of the running container.
# The Kubernetes kubelet makes a symbolic link to this file on the host
# machine in the /var/log/containers directory which includes the pod name,
# the namespace name and the Kubernetes container name:
# synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
# ->
# /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
# The /var/log directory on the host is mapped to the /var/log directory in the container
# running this instance of Fluentd and we end up collecting the file:
# /var/log/containers/synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
# This results in the tag:
# var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
# where 'synthetic-logger-0.25lps-pod' is the pod name, 'default' is the
# namespace name, 'synth-lgr' is the container name and '997599971ee6..' is
# the container ID.
# The record reformer is used to extract pod_name, namespace_name and
# container_name from the tag and set them in a local_resource_id in the
# format of:
# 'k8s_container.<NAMESPACE_NAME>.<POD_NAME>.<CONTAINER_NAME>'.
# The reformer also changes the tags to 'stderr' or 'stdout' based on the
# value of 'stream'.
# local_resource_id is later used by google_cloud plugin to determine the
# monitored resource to ingest logs against.
# Json Log Example:
# {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}
# CRI Log Example:
# 2016-02-17T00:04:05.931087621Z stdout F [info:2016-02-16T16:04:05.930-08:00] Some log text here
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/gcp-containers.log.pos
# Tags at this point are in the format of:
# reform.var.log.containers.<POD_NAME>_<NAMESPACE_NAME>_<CONTAINER_NAME>-<CONTAINER_ID>.log
tag reform.*
read_from_head true
<parse>
@type multi_format
<pattern>
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</pattern>
<pattern>
format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
time_format %Y-%m-%dT%H:%M:%S.%N%:z
</pattern>
</parse>
</source>
<filter reform.var.log.containers.backend*.log>
@type parser
key_name log
<parse>
@type multi_format
<pattern>
# Gunicorn logs
format regexp
expression /^\[(?<time>[\d -:,+ZT\.]+)\] \[(?<pid>\d+)\] \[(?<severity>[a-zA-Z]+)\] (?<log>.+)$/
</pattern>
<pattern>
# Backend logging
format regexp
expression /^(?<time>[\d -:,+ZT\.]+) - (?<severity>[a-zA-Z]+) - (?<log>.+)$/
</pattern>
<pattern>
# Alembic logs
format regexp
expression /^(?<severity>[a-zA-Z]+)[ \t\x0B\f\r]+\[(?<name>.+)\] (?<log>.+)$/
</pattern>
<pattern>
# Access logs
format regexp
expression /^(?<target>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}) - (?<log>.+)$/
</pattern>
<pattern>
format none
</pattern>
</parse>
reserve_data true
emit_invalid_record_to_error false
</filter>
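multi_format tries each pattern in order and uses the first one that matches. The first-match-wins behavior can be sketched in Python (the regex subset and the sample lines in the usage note are assumptions based on the patterns above):

```python
import re

# Subset of the patterns above, tried in order; the first match wins.
PATTERNS = [
    # Gunicorn logs
    re.compile(r'^\[(?P<time>[\d -:,+ZT\.]+)\] \[(?P<pid>\d+)\] \[(?P<severity>[a-zA-Z]+)\] (?P<log>.+)$'),
    # Backend logging
    re.compile(r'^(?P<time>[\d -:,+ZT\.]+) - (?P<severity>[a-zA-Z]+) - (?P<log>.+)$'),
]

def parse(line):
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return m.groupdict()
    return {'log': line}  # fallback, like the final "format none" pattern
```

For example, parse('[2020-01-01 10:00:00 +0000] [42] [INFO] Booting worker') is handled by the first pattern, while a line matching neither regex falls through to the none-style fallback.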
<filter reform.var.log.containers.listener*.log>
@type parser
key_name log
format /^(?<time>[\d -:,+ZT\.]+) - (?<name>.+) - (?<severity>[a-zA-Z]+) - (?<log>.+)$/
reserve_data true
emit_invalid_record_to_error false
</filter>
<match reform.**>
@type record_reformer
enable_ruby true
<record>
# Extract local_resource_id from tag for 'k8s_container' monitored
# resource. The format is:
# 'k8s_container.<namespace_name>.<pod_name>.<container_name>'.
"logging.googleapis.com/local_resource_id" ${"k8s_container.#{tag_suffix[4].rpartition('.')[0].split('_')[1]}.#{tag_suffix[4].rpartition('.')[0].split('_')[0]}.#{tag_suffix[4].rpartition('.')[0].split('_')[2].rpartition('-')[0]}"}
# Rename the field 'log' to a more generic field 'message'. This way the
# fluent-plugin-google-cloud knows to flatten the field as textPayload
# instead of jsonPayload after extracting 'time', 'severity' and
# 'stream' from the record.
message ${record['log']}
# If 'severity' is not set, assume stderr is ERROR and stdout is INFO.
severity ${record['severity'] || if record['stream'] == 'stderr' then 'ERROR' else 'INFO' end}
</record>
tag ${if record['stream'] == 'stderr' then 'raw.stderr' else 'raw.stdout' end}
remove_keys stream,log
</match>
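The Ruby expression in the record stanza above pulls the pod, namespace and container names out of the tag. The same logic, sketched in Python (the helper name and sample tag are illustrative assumptions):

```python
def local_resource_id(tag):
    # tag looks like: reform.var.log.containers.<pod>_<ns>_<container>-<container_id>.log
    suffix = tag.split('.', 4)[4]             # like tag_suffix[4]: "<pod>_<ns>_<container>-<cid>.log"
    base = suffix.rpartition('.')[0]          # strip the trailing ".log"
    pod, namespace, container_cid = base.split('_')
    container = container_cid.rpartition('-')[0]  # strip "-<container_id>"
    return f"k8s_container.{namespace}.{pod}.{container}"
```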
# Detect exceptions in the log output and forward them as one log entry.
<match {raw.stderr,raw.stdout}>
@type detect_exceptions
remove_tag_prefix raw
message message
stream "logging.googleapis.com/local_resource_id"
multiline_flush_interval 5
max_bytes 500000
max_lines 1000
</match>
monitoring.conf: |-
# This source is used to acquire approximate process start timestamp,
# which purpose is explained before the corresponding output plugin.
<source>
@type exec
command /bin/sh -c 'date +%s'
tag process_start
time_format %Y-%m-%d %H:%M:%S
keys process_start_timestamp
</source>
# This filter is used to convert process start timestamp to integer
# value for correct ingestion in the prometheus output plugin.
<filter process_start>
@type record_transformer
enable_ruby true
auto_typecast true
<record>
process_start_timestamp ${record["process_start_timestamp"].to_i}
</record>
</filter>
system.input.conf: |-
# Example:
# Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
<source>
@type tail
format syslog
path /var/log/startupscript.log
pos_file /var/log/gcp-startupscript.log.pos
tag startupscript
</source>
# Examples:
# time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json"
# time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" err="No such image: -f" statusCode=404
# TODO(random-liu): Remove this after cri container runtime rolls out.
<source>
@type tail
format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=(?<status_code>\d+))?/
path /var/log/docker.log
pos_file /var/log/gcp-docker.log.pos
tag docker
</source>
# Example:
# 2016/02/04 06:52:38 filePurge: successfully removed file /var/etcd/data/member/wal/00000000000006d0-00000000010a23d1.wal
<source>
@type tail
# Not parsing this, because it doesn't have anything particularly useful to
# parse out of it (like severities).
format none
path /var/log/etcd.log
pos_file /var/log/gcp-etcd.log.pos
tag etcd
</source>
# Multi-line parsing is required for all the kube logs because very large log
# statements, such as those that include entire object bodies, get split into
# multiple lines by glog.
# Example:
# I0204 07:32:30.020537 3368 server.go:1048] POST /stats/container/: (13.972191ms) 200 [[Go-http-client/1.1] 10.244.1.3:40537]
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kubelet.log
pos_file /var/log/gcp-kubelet.log.pos
tag kubelet
</source>
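The glog format1 regex above can be exercised in Python to show what each capture group yields (the sample line is taken from the example comment; the named-group syntax is translated from Ruby):

```python
import re

# Same capture structure as the format1 line above.
GLOG = re.compile(
    r'^(?P<severity>\w)(?P<time>\d{4} [^\s]*)\s+(?P<pid>\d+)\s+'
    r'(?P<source>[^ \]]+)\] (?P<message>.*)'
)

line = ('I0204 07:32:30.020537    3368 server.go:1048] '
        'POST /stats/container/: (13.972191ms) 200 [[Go-http-client/1.1] 10.244.1.3:40537]')
fields = GLOG.match(line).groupdict()
# severity='I', time='0204 07:32:30.020537', pid='3368', source='server.go:1048'
```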
# Example:
# I1118 21:26:53.975789 6 proxier.go:1096] Port "nodePort for kube-system/default-http-backend:http" (:31429/tcp) was open before and is still needed
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kube-proxy.log
pos_file /var/log/gcp-kube-proxy.log.pos
tag kube-proxy
</source>
# Example:
# I0204 07:00:19.604280 5 handlers.go:131] GET /api/v1/nodes: (1.624207ms) 200 [[kube-controller-manager/v1.1.3 (linux/amd64) kubernetes/6a81b50] 127.0.0.1:38266]
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kube-apiserver.log
pos_file /var/log/gcp-kube-apiserver.log.pos
tag kube-apiserver
</source>
# Example:
# I0204 06:55:31.872680 5 servicecontroller.go:277] LB already exists and doesn't need update for service kube-system/kube-ui
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kube-controller-manager.log
pos_file /var/log/gcp-kube-controller-manager.log.pos
tag kube-controller-manager
</source>
# Example:
# W0204 06:49:18.239674 7 reflector.go:245] pkg/scheduler/factory/factory.go:193: watch of *api.Service ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [2578313/2577886]) [2579312]
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kube-scheduler.log
pos_file /var/log/gcp-kube-scheduler.log.pos
tag kube-scheduler
</source>
# Example:
# I1104 10:36:20.242766 5 rescheduler.go:73] Running Rescheduler
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/rescheduler.log
pos_file /var/log/gcp-rescheduler.log.pos
tag rescheduler
</source>
# Example:
# I0603 15:31:05.793605 6 cluster_manager.go:230] Reading config from path /etc/gce.conf
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/glbc.log
pos_file /var/log/gcp-glbc.log.pos
tag glbc
</source>
# Example:
# I0603 15:31:05.793605 6 cluster_manager.go:230] Reading config from path /etc/gce.conf
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/cluster-autoscaler.log
pos_file /var/log/gcp-cluster-autoscaler.log.pos
tag cluster-autoscaler
</source>
# Logs from systemd-journal for interesting services.
# TODO(random-liu): Keep this for compatibility, remove this after
# cri container runtime rolls out.
<source>
@type systemd
filters [{ "_SYSTEMD_UNIT": "docker.service" }]
pos_file /var/log/gcp-journald-docker.pos
read_from_head true
tag docker
</source>
<source>
@type systemd
filters [{ "_SYSTEMD_UNIT": "{{ fluentd_container_runtime_service }}.service" }]
pos_file /var/log/gcp-journald-container-runtime.pos
read_from_head true
tag container-runtime
</source>
<source>
@type systemd
filters [{ "_SYSTEMD_UNIT": "kubelet.service" }]
pos_file /var/log/gcp-journald-kubelet.pos
read_from_head true
tag kubelet
</source>
<source>
@type systemd
filters [{ "_SYSTEMD_UNIT": "node-problem-detector.service" }]
pos_file /var/log/gcp-journald-node-problem-detector.pos
read_from_head true
tag node-problem-detector
</source>
# BEGIN_NODE_JOURNAL
# Whether to include node-journal or not is determined when starting the
# cluster. It is not changed when the cluster is already running.
<source>
@type systemd
pos_file /var/log/gcp-journald.pos
read_from_head true
tag node-journal
</source>
<filter node-journal>
@type grep
<exclude>
key _SYSTEMD_UNIT
pattern ^(docker|{{ fluentd_container_runtime_service }}|kubelet|node-problem-detector)\.service$
</exclude>
</filter>
# END_NODE_JOURNAL
---
kind: ConfigMap
apiVersion: v1
metadata:
name: fluentd-gcp-main-config
namespace: kube-system
labels:
k8s-app: fluentd-gcp-custom
data:
google-fluentd.conf: |-
@include config.d/*.conf
# This match is placed before the all-matching output to provide metric
# exporter with a process start timestamp for correct exporting of
# cumulative metrics to Stackdriver.
<match process_start>
@type prometheus
<metric>
type gauge
name process_start_time_seconds
desc Timestamp of the process start in seconds
key process_start_timestamp
</metric>
</match>
# This filter allows to count the number of log entries read by fluentd
# before they are processed by the output plugin. This in turn allows to
# monitor the number of log entries that were read but never sent, e.g.
# because of liveness probe removing buffer.
<filter **>
@type prometheus
<metric>
type counter
name logging_entry_count
desc Total number of log entries generated by either application containers or system components
</metric>
</filter>
# This section is exclusive for k8s_container logs. Those come with
# 'stderr'/'stdout' tags.
# TODO(instrumentation): Reconsider this workaround later.
# Trim the entries which exceed slightly less than 256KB, to avoid
# dropping them. It is a necessity, because Stackdriver only supports
# entries that are up to 256KB in size.
<filter {stderr,stdout}>
@type record_transformer
enable_ruby true
<record>
message ${record['message'].length > 256000 ? "[Trimmed]#{record['message'][0..256000]}..." : record['message']}
</record>
</filter>
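The trimming expression above keeps entries under Stackdriver's 256KB limit. Its logic in Python (the helper name and the small limit used in the example are assumptions for illustration):

```python
def trim_message(message, limit=256000):
    # Mirrors the Ruby ternary: messages over the limit are prefixed with
    # "[Trimmed]", cut (Ruby's [0..limit] keeps limit+1 chars), and suffixed with "..."
    if len(message) > limit:
        return "[Trimmed]" + message[:limit + 1] + "..."
    return message
```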
# Do not collect fluentd's own logs to avoid infinite loops.
<match fluent.**>
@type null
</match>
# Add a unique insertId to each log entry that doesn't already have it.
# This helps guarantee the order and prevent log duplication.
<filter **>
@type add_insert_ids
</filter>
# This section is exclusive for k8s_container logs. These logs come with
# 'stderr'/'stdout' tags.
# We use a separate output stanza for 'k8s_node' logs with a smaller buffer
# because node logs are less important than user's container logs.
<match {stderr,stdout}>
@type google_cloud
# Try to detect JSON formatted log entries.
detect_json true
# Collect metrics in Prometheus registry about plugin activity.
enable_monitoring true
monitoring_type prometheus
# Allow log entries from multiple containers to be sent in the same request.
split_logs_by_tag false
# Set the buffer type to file to improve the reliability and reduce the memory consumption
buffer_type file
buffer_path /var/log/fluentd-buffers/kubernetes.containers.buffer
# Set queue_full action to block because we want to pause gracefully
# in case of the off-the-limits load instead of throwing an exception
buffer_queue_full_action block
# Set the chunk limit conservatively to avoid exceeding the recommended
# chunk size of 5MB per write request.
buffer_chunk_limit 512k
# Cap the combined memory usage of this buffer and the one below to
# 512KiB/chunk * (6 + 2) chunks = 4 MiB
buffer_queue_limit 6
# Never wait more than 5 seconds before flushing logs in the non-error case.
flush_interval 5s
# Never wait longer than 30 seconds between retries.
max_retry_wait 30
# Disable the limit on the number of retries (retry forever).
disable_retry_limit
# Use multiple threads for processing.
num_threads 2
use_grpc true
# Skip timestamp adjustment as this is in a controlled environment with
# known timestamp format. This helps with CPU usage.
adjust_invalid_timestamps false
</match>
# Attach local_resource_id for 'k8s_node' monitored resource.
<filter **>
@type record_transformer
enable_ruby true
<record>
"logging.googleapis.com/local_resource_id" ${"k8s_node.#{ENV['NODE_NAME']}"}
</record>
</filter>
# This section is exclusive for 'k8s_node' logs. These logs come with tags
# that are neither 'stderr' or 'stdout'.
# We use a separate output stanza for 'k8s_container' logs with a larger
# buffer because user's container logs are more important than node logs.
<match **>
@type google_cloud
detect_json true
enable_monitoring true
monitoring_type prometheus
# Allow entries from multiple system logs to be sent in the same request.
split_logs_by_tag false
detect_subservice false
buffer_type file
buffer_path /var/log/fluentd-buffers/kubernetes.system.buffer
buffer_queue_full_action block
buffer_chunk_limit 512k
buffer_queue_limit 2
flush_interval 5s
max_retry_wait 30
disable_retry_limit
num_threads 2
use_grpc true
# Skip timestamp adjustment as this is in a controlled environment with
# known timestamp format. This helps with CPU usage.
adjust_invalid_timestamps false
</match>

Parse different formats using fluentd from same source given different tag?

I'm using fluentd in a docker-compose file, where I want it to parse the log output of an Apache container as well as other containers with a custom format.
To differentiate the formats, I'm planning to set tags in docker-compose like this:
logging:
driver: "fluentd"
options:
tag: "apache2"
So fluentd should be able to use different formats based on the tag. But how can I configure fluentd to do this?
The documentation says one should put this in the source section (which I can't do because I need two different formats):
<parse>
@type apache2
</parse>
My very basic source looks like this:
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
Is it possible to use fluentd routing to use two different formats for data coming from the same source with different tags?
Yes, it's possible:
# 1. Omit parsing at the source
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
# 2. Write a dedicated filter for each format you want
<filter docker.apache2> # Check the exact tag produced; it depends on versions. Just guessing here.
@type parser
key_name log
<parse>
@type apache2
</parse>
</filter>
<filter docker.backend>
@type parser
key_name log
<parse>
@type json
</parse>
</filter>
# 3. Match and store all
<match **>
@type s3
...
</match>
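On the docker-compose side, each service then gets its own tag so the filters above can tell them apart (service names here are assumptions):

```
services:
  apache:
    logging:
      driver: "fluentd"
      options:
        tag: "docker.apache2"   # matched by <filter docker.apache2>
  backend:
    logging:
      driver: "fluentd"
      options:
        tag: "docker.backend"   # matched by <filter docker.backend>
```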

Unable to Run a Simple Python Script on Fluentd

I have a Python script called script.py. When I run this script, it creates a logs folder on the Desktop and downloads all the necessary logs from a website, writing them as .log files into this folder. I want Fluentd to run this script every 5 minutes and do nothing more. The next source in the config file does the real job of sending this log data elsewhere. If the logs folder already exists on the Desktop, these log files are uploaded correctly to the next destination, but the script never runs. If I delete my local logs folder, this is the output fluentd gives:
2020-07-27 10:20:42 +0200 [trace]: #0 plugin/buffer.rb:350:enqueue_all: enqueueing all chunks in buffer instance=47448563172440
2020-07-27 10:21:09 +0200 [trace]: #0 plugin/buffer.rb:350:enqueue_all: enqueueing all chunks in buffer instance=47448563172440
2020-07-27 10:21:36 +0200 [debug]: #0 plugin_helper/child_process.rb:255:child_process_execute_once: Executing command title=:exec_input spawn=[{}, "python /home/zohair/Desktop/script.py"] mode=[:read] stderr=:discard
This never produces a logs folder on my Desktop, which the script does create when run locally with python script.py.
If the logs folder already exists, I can see the logs on stdout normally. Here is my config file:
<source>
@type exec
command python /home/someuser/Desktop/script.py
run_interval 5m
<parse>
@type none
keys none
</parse>
<extract>
tag_key none
</extract>
</source>
<source>
@type tail
read_from_head true
path /home/someuser/Desktop/logs/*
tag sensor_1.log-raw-data
refresh_interval 5m
<parse>
@type none
</parse>
</source>
<match sensor_1.log-raw-data>
@type stdout
</match>
I just need fluentd to run the script and do nothing else, and let the other source take this data and send it to somewhere else. Any solutions?
The problem was solved by creating another @type exec source that runs pip install -r requirements.txt, which fixed a missing-module error that was not shown in the fluentd error log (fluentd was running as superuser).

fluentd file plugin \ how to keep only latest X files?

I'm using the fluentd file output plugin. Is there a way to configure it to keep only the last X files (for example, the last 10)? I didn't find it in the documentation.
A second and less important question: how can I set the file name with an index?
myApp_1.log
myApp_2.log
.
.
myApp_10.log
My fluent.conf is:
# Receive events from 24224/tcp
# This is used by log forwarding and the fluent-cat command
<source>
@type forward
port 24224
</source>
<filter **>
@type record_transformer
remove_keys source,container_id,container_name
</filter>
<match tagger*>
@type copy
<store>
@type file
path /fluentd/log/tagger/tagger
append true
time_slice_format %Y%m%d%H
output_time false
</store>
<store>
@type stdout
</store>
</match>
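The thread has no accepted answer. The file output plugin itself does not expose a retention count, so one common approach (an assumption, not from this thread) is to rotate the written files externally with logrotate, e.g.:

```
# /etc/logrotate.d/fluentd-tagger (hypothetical path)
/fluentd/log/tagger/tagger*.log {
  rotate 10      # keep only the last 10 rotated files
  size 10M
  missingok
  notifempty
}
```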
