My system time is: Tue Jan 6 09:44:49 CST 2015, and this is my td-agent.conf:
<match apache.access>
  type webhdfs
  host test.com
  port 50070
  path /apache/%Y%m%d_%H/access.log.${hostname}
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y-%m-%dT%H:%M:%S.%L%:z
  timezone +08:00
  flush_interval 1s
</match>
The timestamp in the directory name is right:
[hadoop#node1 ~]$ hadoop fs -ls /apache/20150106_09
Found 1 items
-rw-r--r-- 2 webuser supergroup 17496 2015-01-06 09:47 /apache/20150106_09/access.log.node1.test.com
but the timestamp inside the log lines is wrong, and I don't know why:
2015-01-06T01:47:00.000+00:00 apache.access {"host":"::1","user":null,"method":"GET","path":"/06","code":404,"size":275,"referer":null,"agent":"ApacheBench/2.3"}
My config works fine:
type webhdfs
host 192.168.80.41
port 50070
path /log/fluent_%Y%m%d_%H.log
time_format %Y-%m-%d %H:%M:%S
localtime
flush_interval 10s
Adding localtime to the config makes fluentd write timestamps in local time instead of UTC.
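Putting that fix into the question's webhdfs match would look roughly like this (untested sketch; the host value is a placeholder, and localtime replaces the explicit timezone line):

```
<match apache.access>
  type webhdfs
  host test.com
  port 50070
  path /apache/%Y%m%d_%H/access.log.${hostname}
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y-%m-%dT%H:%M:%S.%L%:z
  localtime
  flush_interval 1s
</match>
```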
I have created a sample fluentd config to send output to Kafka with SSL, but when running the fluentd pod with the config below I get this error:
2022-05-11 18:55:05 +0000 [warn]: #0 suppressed same stacktrace
2022-05-11 18:55:07 +0000 [warn]: #0 Send exception occurred: Local: Invalid argument or configuration (invalid_arg) at /fluentd/vendor/bundle/ruby/2.7.0/gems/rdkafka-0.7.0/lib/rdkafka/producer.rb:135:in `produce'
What is the exact config for output kafka with SSL?
fluent.conf: |
<source>
  @type dummy
  dummy {"hello":"world"}
  tag test
</source>
<match test>
  @type rdkafka
  brokers 'test_kafka_broker:9001'
  ssl_ca_cert '/fluentd/etc/client/ca-cert'
  ssl_client_cert '/fluentd/etc/client/client-cert.pem'
  ssl_client_cert_key '/fluentd/etc/client/client.key'
  <format>
    @type json
  </format>
  topic_key 'test_fluentd_kafka_topic'
  <buffer topic>
    flush_interval 10s
  </buffer>
</match>
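One hedged guess at the invalid_arg: with topic_key, the rdkafka output looks up the topic in a record field of that name, and the dummy record {"hello":"world"} has no test_fluentd_kafka_topic field, so the produce call may get an empty topic. A sketch that pins the topic with default_topic instead (assuming fluent-plugin-kafka's rdkafka output; untested):

```
<match test>
  @type rdkafka
  brokers 'test_kafka_broker:9001'
  default_topic 'test_fluentd_kafka_topic'
  ssl_ca_cert '/fluentd/etc/client/ca-cert'
  ssl_client_cert '/fluentd/etc/client/client-cert.pem'
  ssl_client_cert_key '/fluentd/etc/client/client.key'
  <format>
    @type json
  </format>
</match>
```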
Can someone help me configure the file buffer for multiprocess workers in fluentd?
I use the config below, but when I add a @type file buffer with an @id to the redis_store plugin's buffer, it throws this error:
failed to configure sub output copy: Plugin 'file' does not support multi workers configuration"
Without the @id it fails with:
failed to configure sub output copy: Other 'redis_store' plugin already use same buffer path
But there is a tag placeholder in the path, and for the other (file) outputs it works; it fails only with the Redis output.
I don't want to use the default memory buffer here, because memory grows when there is too much data. Is it possible to configure this combination (multiprocess workers + file buffer) for the redis_store or Elasticsearch plugin?
Configuration:
<system>
  workers 4
  root_dir /fluentd/log/buffer/
</system>

<worker 0-3>
  <source>
    @type forward
    bind 0.0.0.0
    port 9880
  </source>

  <label @TEST>
    <match test.**>
      @type forest
      subtype copy
      <template>
        <store>
          @type file
          @id "file_${tag_parts[2]}/${tag_parts[3]}/${tag_parts[3]}-#{worker_id}"
          @log_level debug
          path "fluentd/log/${tag_parts[2]}/${tag_parts[3]}/${tag_parts[3]}-#{worker_id}.*.log"
          append true
          <buffer>
            flush_mode interval
            flush_interval 3
            flush_at_shutdown true
          </buffer>
          <format>
            @type single_value
            message_key log
          </format>
        </store>
        <store>
          @type redis_store
          host server_ip
          port 6379
          key test
          store_type list
          <buffer>
            # @type file  - can't use
            # @id test_${tag_parts[2]}/${tag_parts[3]}/${tag_parts[3]}-#{worker_id}  - with @id: doesn't support multiprocess
            # path fluentd/log/${tag_parts[2]}/${tag_parts[3]}/${tag_parts[3]}-#{worker_id}.*.log  - without @id: other plugin uses same buffer path
            flush_mode interval
            flush_interval 3
            flush_at_shutdown true
            flush_thread_count 4
          </buffer>
        </store>
      </template>
    </match>
  </label>
</worker>
Versions:
Fluentd v1.14.3
fluent-plugin-redis-store v0.2.0
fluent-plugin-forest v0.3.3
Thanks!
The redis_store config was wrong; the correct version has the @id directly under the plugin's @type (not inside the buffer section):
<store>
  @type redis_store
  @id test_${tag_parts[2]}/${tag_parts[3]}/${tag_parts[3]}-#{worker_id}
  host server_ip
  port 6379
  key test
  store_type list
  <buffer>
    @type file
    flush_mode interval
    flush_interval 3
    flush_at_shutdown true
    flush_thread_count 4
  </buffer>
</store>
Thank you for your time Azeem :)
On Ubuntu 18.04, I am running td-agent v4 which uses Fluentd v1.0 core. First I configured it with TCP input and stdout output. It receives and outputs the messages fine. I then configure it to output to file with a 10s flush interval, yet I do not see any output files generated in the destination path.
This is my file output configuration:
<match>
  @type file
  path /var/log/td-agent/test/access.%Y-%m-%d.%H:%M:%S.log
  <buffer time>
    timekey 10s
    timekey_use_utc true
    timekey_wait 2s
    flush_interval 10s
  </buffer>
</match>
I check every 10s to see if log files are generated, but all I see is a directory whose name still contains the placeholders I set in the path param:
ls -la /var/log/td-agent/test
total 12
drwxr-xr-x 3 td-agent td-agent 4096 Feb 5 23:14 .
drwxr-xr-x 6 td-agent td-agent 4096 Feb 6 00:17 ..
drwxr-xr-x 2 td-agent td-agent 4096 Feb 5 23:14 access.%Y-%m-%d.%H:%M:%S.log
From following the Fluentd docs, I was expecting this should be fairly straight forward since the file output and buffering plugins are bundled with Fluentd's core.
Am I missing something trivial here?
I figured it out, and it works now. I had two outputs, one to file and another to stdout. That won't work if they're each defined in their own <match> ... </match> block: fluentd routes an event only to the first <match> whose pattern matches its tag, so the stdout output (which came first in my config) received everything and the file output received nothing. They should instead both be nested as stores under a copy output, like this:
<match>
  @type copy
  <store>
    @type file
    ...
  </store>
  <store>
    @type stdout
  </store>
</match>
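Filling in the file store with the buffer settings from the original question, the combined copy output looks like:

```
<match>
  @type copy
  <store>
    @type file
    path /var/log/td-agent/test/access.%Y-%m-%d.%H:%M:%S.log
    <buffer time>
      timekey 10s
      timekey_use_utc true
      timekey_wait 2s
      flush_interval 10s
    </buffer>
  </store>
  <store>
    @type stdout
  </store>
</match>
```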
This is my config:
<source>
  @type http
  port 8080
  bind 0.0.0.0
  body_size_limit 1m
  keepalive_timeout 20s
  cors_allow_origins ["http://example.net"]
  add_remote_addr true
</source>

<match log>
  @type sql
  host 127.0.0.1
  port 3306
  database user
  adapter mysql2
  username user
  password pass
  socket /var/run/mysqld/mysqld.sock
  flush_interval 1s
  num_threads 2
  <table>
    table http_logs
    column_mapping 'timestamp:created_at,REMOTE_ADDR:ip,name:name,value:value,value2:value2,url:url'
  </table>
</match>
I am sending the data into MySQL. The MySQL and fluentd servers use the same time and timezone, but fluentd always inserts the time 3 hours behind. So if the real time is:
root#fluentd:~# date
Mon Aug 7 21:22:04 IDT 2017
fluentd inserts new data with a time of:
Mon Aug 7 18:22:04 IDT 2017
I looked through the fluentd input and output plugin options and found no timezone setting.
I think what happens is that fluentd converts your timestamp to UTC (with no time zone attached); MySQL then stores that UTC time as-is, which is 3 hours behind your GMT+3 timezone. You can configure the fluentd timezone as in this example:
<match pattern>
  type file
  path /var/log/fluent/myapp
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y%m%dT%H%M%S%z
  timezone +08:00
</match>
I'm struggling with data loss between fluentd and influxdb.
Using the fluent-plugin-influxdb plugin with this configuration:
<source>
  id test_syslog
  type syslog
  port 42185
  protocol_type tcp
  time_format %Y-%m-%dT%H:%M:%SZ
  tag test_syslog
  format /^(?<time>[^ ]*) (?<fastly_server>[^ ]*) (?<log_name>[^ ]*) (?<host>[^ ]*) ([^ ]*) ([^ ]*) (?<http_method>[^ ]*) (?<http_request>[^ ]*) (?<http_status>[^ ]*) (?<cache_status>[^ ]*) (?<uuid>[^ ]*) *(?<device_model>.*)$/
</source>

<match test_syslog.**>
  type copy
  <store>
    type file
    path /var/log/td-agent/test_syslog
  </store>
  <store>
    id test_syslog
    type influxdb
    dbname test1
    flush_interval 10s
    host localhost
    port 8086
    remove_tag_suffix .local0.info
  </store>
</match>
When comparing the file output and the data in influxdb I find this:
user#ip-xxx-xxx-xxx-xxx:/var/log/td-agent# curl -G 'http://localhost:8086/query' --data-urlencode "db=test1" --data-urlencode "q=SELECT COUNT(host) FROM log_data" ; cat test_syslog.20150901.b51eb4653c54c63e7 | wc -l
{"results":[{"series":[{"name":"log_data","columns":["time","count"],"values":[["1970-01-01T00:00:00Z",582]]}]}]}2355
2355 lines in the log, but only 582 records in the database.
I've enabled debug/trace logging from both influxdb and fluentd but nothing interesting in the logs so far.
Any ideas?
Just stumbled over this problem this week.
In my case, the logs were lost because of duplicated points.
I am using fluentd v0.12, which only supports second precision; subsecond precision arrived in newer versions (see https://github.com/fluent/fluentd/issues/461). It is therefore quite easy to produce measurements with the same tags and the same timestamp, and InfluxDB treats such points as duplicates and overwrites them.
The solution is to set "sequence_tag" in the influxdb output, like:
@type influxdb
host {{INFLUXDB_HOST}}
port {{INFLUXDB_PORT}}
dbname {{INFLUXDB_DBNAME}}
sequence_tag sequence
Records with the same timestamp then get an additional tag:
time requestDurationMillis sequence success testClients testGroup testRunId
---- --------------------- -------- ------- ----------- --------- ---------
2018-01-17T11:09:13Z 530 0 1 2 warmup 20180117/130908
2018-01-17T11:09:13Z 529 1 1 2 warmup 20180117/130908
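Applied back to the influxdb store from the original question, that would look roughly like this (assuming a plugin version new enough to support sequence_tag; untested):

```
<store>
  id test_syslog
  type influxdb
  dbname test1
  flush_interval 10s
  host localhost
  port 8086
  sequence_tag sequence
  remove_tag_suffix .local0.info
</store>
```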