Fluentd SQL output plugin configuration for an auto-incremented column - ruby-on-rails

I have a Fluentd configuration that pulls data from a file and pushes it to SQL Server. The target table has a primary key that is an auto-incremented column, so if I don't mention that column in my Fluentd configuration it throws an error saying the field is missing, and if I do include the column it gives an identity error. In the configuration below, "Id" is the primary, auto-incremented column. Also, let me know if "sqlserver" is the right adapter to use.
<filter record.**>
  @type record_transformer
  enable_ruby true
  <record>
    Id ${id}
  </record>
  <record>
    timestamp ${time}
  </record>
</filter>
<filter record.**>
  @type stdout
</filter>
<match record.**>
  @type sql
  host myhost
  username myuser
  password mypassword
  database mydb
  adapter sqlserver
  <table>
    table simple_table
    column_mapping 'Id:Id,timestamp:timestamp'
  </table>
  flush_interval 1s
  # disable_retry_limit
  # num_threads 8
  # slow_flush_log_threshold 40.0
</match>

Well, I figured this out. It is mandatory to include the column name in column_mapping even if it is the primary key and auto-incremented. If you log in with other SQL credentials it will give you an error; however, if you log in with the same credentials used at the time of table creation, it works.
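A likely explanation for the credential difference (my reading, not stated in the answer above): inserting an explicit value into a SQL Server IDENTITY column requires SET IDENTITY_INSERT, which needs ownership of, or ALTER permission on, the table, and the login that created the table has that implicitly. A minimal T-SQL sketch of what such an insert has to be allowed to do (table and column names from the config above, values purely illustrative):
-- Allow explicit values in the IDENTITY column; this needs ALTER permission
-- on the table (or ownership), which the creating login has by default.
SET IDENTITY_INSERT simple_table ON;
INSERT INTO simple_table (Id, [timestamp]) VALUES (1, '2020-08-31 09:06:31');
SET IDENTITY_INSERT simple_table OFF;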

Related

Fluentd change the source field value

Using fluentd, I want to change the "source" field value but am not sure how to go about it.
Currently it picks up the container IP address as "source", but I need that to be a self-defined name.
Any ideas?
EDIT:
Things I tried:
This is an Alpine container running on Fargate ECS, so I tried setting the hostname value in the task definition.
Error: ClientException: hostname is not supported on container when networkMode=awsvpc.
Then I tried using record_transformer to set the hostname/source:
<filter foo.**>
  @type record_transformer
  <record>
    hostname "foo-#{Socket.gethostname}"
  </record>
</filter>
Also
<filter foo.**>
  @type record_transformer
  <record>
    source "foo-#{Socket.gethostname}"
  </record>
</filter>
No change in the records at all, but I can see in the logs that the filter is being read and gethostname is working.
So, looking further at record_transformer, I have been able to write a new field with this config:
<filter foo.**>
  @type record_transformer
  <record>
    server "foo-#{Socket.gethostname}"
  </record>
</filter>
How do I change the contents of an existing field?
How about this one:
<filter foo.**>
  @type record_modifier
  <replace>
    key source
    expression /[\s\S]*/
    replace "foo-#{Socket.gethostname}"
  </replace>
</filter>
The "source" field should it contain anything its contents should replaced with "foo-#{Socket.gethostname}" only it doesn't.

Fluentd configuration for ORACLE clob datatype

I'm using a Fluentd configuration to read data from a text file and push it to an Oracle database. Some columns have the CLOB and NCLOB datatypes; when Fluentd pushes the data, those columns are always null and I don't see any errors. I'm not sure how to resolve this issue in Fluentd; below is the configuration I have.
I'm using the oracle-enhanced adapter and the sql plugin:
https://github.com/rsim/oracle-enhanced
https://github.com/fluent/fluent-plugin-sql/issues
#
# Fluentd configuration file
#
# Config input
<source>
  @type forward
  port 24224
</source>
# Config output
<match cpu_*>
  @type stdout
</match>
<match foo_*>
  @type stdout
</match>
<match memory_*>
  @type sql
  host {DATABASE_HOSTNAME}
  port 1521
  database {DATABASE_NAME}
  adapter oracle_enhanced
  username {DATABASE_USERNAME}
  password {DATABASE_PASSWORD}
  <table>
    table fluentd_log
    column_mapping 'timestamp:created_at,Mem.text:mem_text,Mem.used:mem_used'
    # This is the default table because it has no "pattern" argument in <table>;
    # if all non-default <table> blocks do not match, the default one is chosen.
    # The default table is required.
  </table>
</match>
CREATE TABLE FLUENTD_LOG
(
  ID         NUMBER(8),
  CREATED_AT VARCHAR2(50 BYTE),
  MEM_TEXT   CLOB,
  MEM_USED   VARCHAR2(50 BYTE)
)

ID  CREATED_AT  MEM_TEXT  MEM_USED
1   29-08-99    null      test

Fluentd hide password or encrypt in configuration

For security reasons we can't keep the SQL credentials in plain text. Is there a way to hide or encrypt the password?
<source>
  @type sql
  @id output_sql
  host "sqlserverhost.aws_region.rds.amazonaws.com"
  database db_name
  adapter sqlserver
  username user
  password pwd
  tag_prefix myrdb # optional, but recommended
  select_interval 60s # optional
  select_limit 500 # optional
  state_file /var/run/fluentd/sql_state
  <table>
    table tbl_name
    update_column insert_timestamp
  </table>
</source>
<match **>
  @type stdout
</match>
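One common approach (a sketch, not from an accepted answer here): keep the password out of the config file by reading it from an environment variable with Fluentd's embedded Ruby, the same "#{ENV[...]}" pattern the S3 example further down this page uses. Assuming a variable named SQL_PASSWORD (name is illustrative):
<source>
  @type sql
  ...
  username user
  password "#{ENV['SQL_PASSWORD']}"
  ...
</source>
The value is then supplied at runtime (for example via the container's environment or a secrets manager) rather than stored in plain text in the configuration.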

Adding client-unique record to a log event, fluentd side. E.g., using filter

I succeeded in getting dockerized fluentd TCP logging to run! Meaning: there are remote Python containers using a slightly modified logging.handlers.SocketHandler to send some JSON to fluentd, and it actually arrives there, looking like this:
2020-08-31T09:06:31+00:00 paws.tcp {"service_uuid":"paws_log","loglvl":"INFO","file":"paws_log.paws_log","line":59,"msg":"Ping log line #2"}
I have multiple such Python containers and would like fluentd to add some kind of source id to each log event. Reading the docs made me give the filter -> record mechanism a chance, leading to the following config snippet with a newly added filter block:
<source>
  @type tcp
  @label @stream_paws
  @id paws_tcp
  tag paws.tcp
  port 5170
  bind 0.0.0.0
  # https://docs.fluentd.org/parser/regexp
  <parse>
    @type regexp
    expression /^(?<service_uuid>[a-zA-Z0-9_-]+): (?<logtime>[^\s]+) (?<loglvl>[^\s]+) \[(?<file>[^\]:]+):(?<line>\d+)\]: (?<msg>.*)$/
    time_key logtime
    time_format %H:%M:%S
    types line:integer
  </parse>
</source>
# Add meta data fluentd side.
# https://docs.fluentd.org/deployment/logging
<filter **> # << Does NOT seem to work if kept outside the label-block! Inside is fine.
  @type record_transformer
  <record>
    host "#{Socket.gethostname}"
  </record>
</filter>
<label @stream_paws>
  <match paws.tcp>
    @type file
    @id output_paws_tcp
    path /fluentd/log/paws/data/tcp.*.log
    symlink_path /fluentd/log/paws/tcp.log
  </match>
</label>
I have two questions here:
The above config works if I put the filter block inside the label block. But this I do not want to do, because I want the filter to act globally. @include directives might offer a workaround here. Anything better?
I suspect "#{Socket.gethostname}" yields information on the fluentd server. However, I want something on the client, ideally including some id that is unique at the Docker container level (it might be the container id; however, any client-unique uuid would be fine). Do you know of such a property accessible to fluentd?
If you are using the fluentd Docker logging driver, it already adds container metadata (including the container id) to every log record:
https://docs.docker.com/config/containers/logging/fluentd/
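With that driver, each record arrives with container_id, container_name, and source fields, so a record_transformer in the same style as above can copy the container name into a field of your own. A sketch, assuming the containers log via the driver and the driver's tag option routes their records under paws.** (the field name source_id is mine, purely illustrative):
<filter paws.**>
  @type record_transformer
  <record>
    # container_name is added by the fluentd Docker logging driver
    source_id ${record["container_name"]}
  </record>
</filter>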
The above config works if I put the filter block inside the label block. But this I do not want to do, because I want the filter to act globally. @include directives might offer a workaround here. Anything better?
A global filter is usually implemented on the server like this:
<source>
...
</source>
<filter **> # filter globally
...
</filter>
<match tag.one>
...
</match>
<match tag.two>
...
</match>
<match **> # the rest
...
</match>
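Note that in the question's config the source carries @label, and events routed to a label bypass top-level filters, which matches the observation that the filter only works inside the label block. If the label routing is not strictly needed, a sketch of the same pipeline without it, so the global filter applies (all blocks taken from the question's config, parse section elided):
<source>
  @type tcp
  @id paws_tcp
  tag paws.tcp
  port 5170
  bind 0.0.0.0
  <parse>
    ...
  </parse>
</source>
<filter **> # now applies to every event
  @type record_transformer
  <record>
    host "#{Socket.gethostname}"
  </record>
</filter>
<match paws.tcp>
  @type file
  @id output_paws_tcp
  path /fluentd/log/paws/data/tcp.*.log
  symlink_path /fluentd/log/paws/tcp.log
</match>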
I suspect "#{Socket.gethostname}" yields information on the fluentd server.
Correct, see: https://docs.fluentd.org/filter/record_transformer#example-configurations. This can be useful when you also want to track which server processed the log record.
If you are using Kubernetes, use the kubernetes_metadata filter; it adds pod details to each log entry.
<filter kubernetes.**>
  @id filter_kubernetes_metadata
  @type kubernetes_metadata
</filter>
For Docker
I've not really used fluentd before, so apologies for a slightly abstract response here. But, checking on http://docs.fluentd.org/, I guess you're probably using in_tail for the logs? From the example there, it looks like you'd want to get the path to the file into the input message:
path /path/to/file
tag foo.*
which apparently tags events with foo.path.to.file
You could probably use http://docs.fluentd.org/articles/filter_record_transformer with enable_ruby. From this, it looks like you could process the foo.path.to.file tag and use a little Ruby to extract the container ID and then parse the JSON file.
For example, testing with the following ruby file, say, foo.rb
tag = 'foo.var.lib.docker.containers.ID.ID-json.log'
require 'json'; id = tag.split('.')[5]; puts JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']
where config.v2.json was something like:
{"image":"foo"}
will print you
foo
Fluentd might already be including json for you, so maybe you could leave out the require 'json'; bit. Then, putting this in fluentd terms, perhaps you could use something like
<filter>
  @type record_transformer
  enable_ruby true
  <record>
    container ${tag.split('.')[5]}
    image ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']}
  </record>
</filter>
In your case, you might be able to use something like this:
<filter raw.**>
  @type record_transformer
  enable_ruby true
  <record>
    container ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))["Name"][1..-1]}
    hostname "#{Socket.gethostname}"
  </record>
</filter>

How to inject `time` attribute based on certain json key value?

I am still new to fluentd; I've tried various configurations, but I am stuck.
Suppose I have this record pushed to fluentd, with _epoch holding the epoch time at which the record was created:
{"data":"dummy", "_epoch": <epochtime_in_second>}
Instead of using the time attribute assigned by fluentd, I want to override the time with this _epoch field. How do I produce fluentd output with the time overridden?
I've tried this
# TCP input to receive logs from the forwarders
<source>
  @type forward
  bind 0.0.0.0
  port 24224
</source>
# HTTP input for the liveness and readiness probes
<source>
  @type http
  bind 0.0.0.0
  port 9880
</source>
# rds2fluentd_test
<filter rds2fluentd_test.*>
  @type parser
  key_name _epoch
  reserve_data true
  <parse>
    @type regexp
    expression /^(?<time>.*)$/
    time_type unixtime
    utc true
  </parse>
</filter>
<filter rds2fluentd_test.*>
  @type stdout
</filter>
<match rds2fluentd_test.*>
  @type s3
  @log_level debug
  aws_key_id "#{ENV['AWS_ACCESS_KEY']}"
  aws_sec_key "#{ENV['AWS_SECRET_KEY']}"
  s3_bucket foo-bucket
  s3_region ap-southeast-1
  path ingestion-test-01/${_db}/${_table}/%Y-%m-%d-%H-%M/
  #s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  # if you want to use ${tag} or %Y/%m/%d/ like syntax in path / s3_object_key_format,
  # need to specify tag for ${tag} and time for %Y/%m/%d in <buffer> argument.
  <buffer time,_db,_table>
    @type file
    path /var/log/fluent/s3
    timekey 1m # 1 minute partition
    timekey_wait 10s
    timekey_use_utc true # use utc
    chunk_limit_size 256m
  </buffer>
  time_slice_format %Y%m%d%H
  store_as json
</match>
But upon receiving data like the above, it shows a warning like this:
#0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="parse failed no implicit conversion of Integer into Hash" location="/usr/local/bundle/gems/fluentd-1.10.4/lib/fluent/plugin/filter_parser.rb:110:in `rescue in filter_with_time'" tag="rds2fluentd_test." time=1590578507 record={....
I was getting the same warning message; setting hash_value_field parsed under the filter section solved the issue.
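For reference, a sketch of the filter with that option added (the field name parsed comes from the comment above; everything else is unchanged from the config earlier in this question):
<filter rds2fluentd_test.*>
  @type parser
  key_name _epoch
  reserve_data true
  hash_value_field parsed
  <parse>
    @type regexp
    expression /^(?<time>.*)$/
    time_type unixtime
    utc true
  </parse>
</filter>
With hash_value_field, the parsed result is nested under the parsed key instead of being merged into the root of the record, which, per the comment above, avoids the "no implicit conversion of Integer into Hash" error here.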
