How to use FluentD as a buffer between Telegraf and InfluxDB

Is there any way to ship metrics gathered from Telegraf to FluentD, and then into InfluxDB?
I know it's possible to write data from FluentD into InfluxDB; but how does one ship data from Telegraf into FluentD, basically using FluentD as a buffer (as opposed to using Kafka or Redis)?

While it might be possible to do with FluentD using one of the available (although outdated) output plugins, such as InfluxDB-Metrics, I couldn't get that plugin to work properly, and it hasn't been updated in over six years, so it will probably not work with newer releases of FluentD.
Fluent Bit, however, has an InfluxDB output built right into it, so I was able to get it to work that way. The caveat is that Fluent Bit has no Telegraf input plugin. The solution I found was to set up a tcp input plugin in Fluent Bit and configure Telegraf to write JSON-formatted data to it from its output section.
The catch is that the JSON data arrives nested and is not formatted properly for InfluxDB. The workaround is to use nest filters in Fluent Bit to 'lift' the nested data and re-format it properly for InfluxDB.
Below is an example for disk space, a metric that Fluent Bit does not collect natively but Telegraf does:
@SET me=${HOST_HOSTNAME}

[INPUT]
    # tcp recipe: collect data from Telegraf
    Name         tcp
    Listen       0.0.0.0
    Port         5170
    Tag          telegraf.${me}
    Chunk_Size   32
    Buffer_Size  64
    Format       json

[FILTER]
    # rename the three keys sent from Telegraf to prevent duplicates
    Name       modify
    Match      telegraf.*
    Condition  Key_Value_Equals name disk
    Rename     fields fieldsDisk
    Rename     name nameDisk
    Rename     tags tagsDisk

[FILTER]
    # un-nest the JSON nested under the 'fieldsDisk' key
    Name          nest
    Match         telegraf.*
    Operation     lift
    Nested_under  fieldsDisk
    Add_prefix    disk.

[FILTER]
    # un-nest the JSON nested under the 'tagsDisk' key
    Name          nest
    Match         telegraf.*
    Operation     lift
    Nested_under  tagsDisk
    Add_prefix    disk.

[OUTPUT]
    # ship the now properly formatted data to InfluxDB
    Name          influxdb
    Match         telegraf.*
    Host          influxdb.server.com
    Port          8086
    #HTTP_User    whatever
    #HTTP_Passwd  whatever
    Database      telegraf.${me}
    Sequence_Tag  point_in_time
    Auto_Tags     On

NOTE: This is just a quick, awkward config from my own proof of concept.
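On the Telegraf side, a minimal sketch of the matching output using the socket_writer output plugin (the Fluent Bit host name is a placeholder):

[[outputs.socket_writer]]
  ## write metrics as JSON over TCP to the Fluent Bit tcp input above
  address = "tcp://fluentbit.server.com:5170"
  data_format = "json"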

Related

Can Telegraf combine/add values of metrics that are per-node, say for a cluster?

Let's say I have some software running on a VM that emits two metrics, which are fed through Telegraf to be written into InfluxDB. Let's say the metrics are the number of successfully handled HTTP requests (S) and the number of failed HTTP requests (F) on that VM. However, I might configure three such VMs, each emitting those two metrics.
Now suppose I would like a computed metric that is the sum of S across the VMs and the sum of F across the VMs, stored as new metrics at various instants of time. Is this something that can be achieved using Telegraf? Or is there a better, more efficient, more elegant way?
Kindly note that my knowledge of Telegraf and InfluxDB is theoretical, as I've only recently started reading up on them, so I have not actually tried any of the above yet.
This isn't something Telegraf would be responsible for.
With Influx 1.x, you'd use a TICKscript or a Continuous Query to calculate the sum and inject the new sampled value.
Roughly, this would look like:
CREATE CONTINUOUS QUERY "sum_sample_daily" ON "database"
BEGIN
  SELECT sum(*) INTO "daily_measurement" FROM "measurement" GROUP BY time(1d)
END
CQ docs
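For the S/F example specifically, a hedged sketch that sums the counters across all hosts every minute might look like this (the measurement name "http_requests" and the field names are assumptions, not from the question):

CREATE CONTINUOUS QUERY "sum_requests_1m" ON "database"
BEGIN
  SELECT sum("S") AS "S", sum("F") AS "F" INTO "cluster_requests" FROM "http_requests" GROUP BY time(1m)
END

Because the query groups only by time and not by any host tag, the sums are computed across all VMs reporting into the measurement.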

Renaming a Telegraf measurement with a wildcard

I'm collecting measurements with a Telegraf client. Unfortunately, the measurement name isn't static. Rather, it encodes a timestamp (terrible design choice, but out of my hands) as part of its name.
For example, the following 3 lines represent 3 instances of the same measurement, but have different names:
info.quorum.2902864.agree: 6
info.quorum.2902865.agree: 6
info.quorum.2902866.agree: 5
...
Is there a way to transform these measurement names into one static name? In other words, I'd like to transform the entries above to:
info.quorum.hello.agree: 6
info.quorum.hello.agree: 6
info.quorum.hello.agree: 5
I saw the rename processor (https://github.com/influxdata/telegraf/tree/master/plugins/processors/rename), but that doesn't support wildcards.
I also saw the regex processor (https://github.com/influxdata/telegraf/tree/master/plugins/processors/regex), but that doesn't support measurement names.
Any ideas on how to get this done?
EDIT: Some background: the measurements are collected with the http input, using a GJSON path like a.b.*.c
EDIT 2: Here's what I'm trying to parse. The problem is with the key '2931747', which grows on each subsequent reading:
"quorum" : {
"2931747" : {
"agree" : 8,
"disagree" : 0,
}
So they put an actual value there as the key... Downright, eh, unsmart, let's put it this way.
And I won't blame the writers of the JSON parser for not providing a handle for that situation.
So, the answer is: in its current form, with the HTTP plugin and the available parsers & processors, there's no way to shape it into any proper form (unless you can drop that damned number-key completely - then it's trivial).
I would suggest pushing the data providers to stop this stupidity.
If that is not an option, you need to write your own processor for that, alas.
It could either be fully standalone (poll the http endpoint, parse the stuff, form a batch of line protocol records, send them to Influx), or it could stop at producing lines in InfluxDB line protocol as its output and be executed with the Exec input plugin.
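A minimal sketch of the second option, assuming a hypothetical endpoint URL and that the "quorum" object sits at the top level of the response; it keeps the measurement name static by emitting the growing number-key as a field:

#!/usr/bin/env python3
# Hypothetical exec-input script: fetch the JSON, lift the changing
# number-key, and print InfluxDB line protocol on stdout.
import json
import time
import urllib.request

URL = "http://example.com/status"  # placeholder endpoint, not from the source

def main():
    with urllib.request.urlopen(URL) as resp:
        doc = json.load(resp)
    ts = int(time.time() * 1e9)  # line protocol timestamps are in nanoseconds
    for key, fields in doc.get("quorum", {}).items():
        # 'key' is the growing number (e.g. "2931747"); emit it as a field
        # so the measurement name stays static
        print("quorum round={}i,agree={}i,disagree={}i {}".format(
            key, fields["agree"], fields["disagree"], ts))

if __name__ == "__main__":
    main()

Telegraf would then run it via the Exec input plugin (script path is a placeholder):

[[inputs.exec]]
  commands = ["/usr/local/bin/lift_quorum.py"]
  data_format = "influx"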

How to transform "Tag Values" in Telegraf

How can I transform the Tag Values in Telegraf?
I am trying to import web access logs into InfluxDB with Telegraf. However, some of the URL paths include identifiers (session IDs, product IDs, etc.).
I need to search and aggregate per path type (IDs excluded), therefore I can't(?) have them vary like that.
In the input plugin "logparser" I can use a grok extraction pattern, but I can't do any transformations of the extracted values that I know of.
And the only processor plugin (in between Input and Output) is merely a "printer".
I can't find any clean way of doing this with Telegraf. Maybe I could do some gymnastics with multiple grok parsers plus ex/inclusions, but after some quite extensive attempts I didn't manage to make anything work - it all seemed quite fiddly.
This is only half an answer, but:
I managed to achieve what I was trying to do with LogStash instead, outputting to InfluxDB (LogStash has its own InfluxDB output plugin). Not ideal, since I now have to run both Telegraf and LogStash, but it's working.
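For reference, a minimal sketch of the LogStash side, assuming the community logstash-output-influxdb plugin; the log path, database name, field mapping, and ID-collapsing pattern are all placeholders, and the exact plugin options vary by version:

input {
  file { path => "/var/log/nginx/access.log" }
}
filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  # collapse numeric IDs in the request path so paths aggregate cleanly
  mutate { gsub => [ "request", "/\d+", "/_id_" ] }
}
output {
  influxdb {
    host        => "localhost"
    db          => "weblogs"
    measurement => "access"
    data_points => { "path" => "%{request}" "response" => "%{response}" }
  }
}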
I've created a feature request on Telegraf's GitHub:
https://github.com/influxdata/telegraf/issues/2667

Wrong timestamps in NetFlow data generated by ESXi

I have a problem with the "Date first seen" column in the output generated by nfdump. I have enabled NetFlow on an ESXi 5.5 host to send NetFlow data to a collector. Up to now everything is OK, and I can capture the NetFlow data with nfcapd using the following command:
nfcapd -D -z -u netflow -p 9996 -n Esxi,192.168.20.54,/data/nfdump -S2 -e
But the problem is that when I filter the traffic with nfdump (e.g. with nfdump -R nfdump5/2016/ -c 10), I see "1970-01-01 03:30:00.000" in the "Date first seen" column for all entries! What should I do to get the right timestamps?
Any help is appreciated.
The NetFlow header has a timestamp for the whole datagram; most likely, their export is using the "first seen" field as an offset from that. It's possible nfdump isn't correctly interpreting that field; I'd recommend having a look at the capture in Wireshark, which I've found to be pretty reliable in decoding NetFlow. That will also let you examine the flow records directly to see if the timestamps are really coming across that small, or are just being misinterpreted. Just remember that if you're capturing NetFlow v9 or IPFIX, you'll need to make sure that your capture includes a template datagram.
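If you prefer the command line, a quick sketch with tshark (Wireshark's CLI) works too; the interface name is a placeholder, the port matches the nfcapd listener above, and "cflow" is tshark's name for the NetFlow dissector:

# capture the export stream on the collector and fully decode each flow record
tshark -i eth0 -f "udp port 9996" -d udp.port==9996,cflow -V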
If the ESXi's NetFlow isn't exporting timestamps correctly, you can also look into monitoring using a small virtual machine running a software flow exporter (there are a number of free ones available - just Google "free flow exporter") with an interface in promiscuous mode.

Why does Riak store all my documents on only one node? n_val is equal to 3

I have built a 5-node cluster using Riak 2.0pre11 on EC2 servers. I installed Riak, got it working, then repeated the same steps on 4 more servers using a bash script. At that point I used riak-admin cluster join riak@node1.example.com on nodes 2 thru 5 to form a cluster.
Using the Python Riak client I wrote a script to send 10,000 documents to Riak. That works fine, and I wrote another script to retrieve a doc, which also worked fine. Other than specifying the use of protobufs, I haven't specified any other options when storing keys. I stored all the docs via a connection to node1.
However, Riak seems to be storing all 3 replicas on the same node; in other words, the storage used on node1 is about 3x the size of the original HTML docs.
The script connected to node 1, and that is where all the docs are stored. I changed the script to connect to node 2 and sent 10,000 more, which also all ended up on node 1. I used the command du -h /data/riak/bitcask to verify the aggregate stored size of the objects. On nodes 2 thru 4 there are only a few K, which is the overhead of an empty Bitcask datastore.
For each document I specified a key similar to this:
http://www.example.com/blogstore/007529.html4787somehash4787947:2014-03-12T19:14:32.887951Z
The first part of all the keys is identical (testing); only the .html name and the ISO 8601 timestamp differ. Is it possible that I have somehow subverted the perfect hashing function?
Basically I used a default config. What could be wrong? Since Riak 2.0 uses a different config format, here is a fragment of the generated riak_core config in the old format:
{riak_core, [
    {enable_consensus, false},
    {platform_log_dir, "/var/log/riak"},
    {platform_lib_dir, "/usr/lib/riak/lib"},
    {platform_etc_dir, "/etc/riak"},
    {platform_data_dir, "/var/lib/riak"},
    {platform_bin_dir, "/usr/sbin"},
    {dtrace_support, false},
    {handoff_port, 8099},
    {ring_state_dir, "/datapool/riak/ring"},
    {handoff_concurrency, 2},
    {ring_creation_size, 64},
    {default_bucket_props, [
        {n_val, 3},
        {last_write_wins, false},
        {allow_mult, true},
        {basic_quorum, false},
        {notfound_ok, true},
        {rw, quorum},
        {dw, quorum},
        {pw, 0},
        {w, quorum},
        {r, quorum},
        {pr, 0}
    ]}
]}
If the Bitcask directory only grows on a single node, it sounds like the nodes might not be communicating. Run riak-admin member-status to verify that all nodes in the cluster are active.
Once you have issued the riak-admin cluster join <node> commands on all the nodes joining the cluster, you also need to run riak-admin cluster plan to verify that the plan is correct before committing it with riak-admin cluster commit. These commands are described in greater detail here.
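Putting the commands from this answer together, the join sequence looks roughly like this (node names follow the question's example):

# on each of nodes 2 thru 5:
riak-admin cluster join riak@node1.example.com
# then, on any one node:
riak-admin cluster plan      # review the proposed ring changes
riak-admin cluster commit    # apply the plan
riak-admin member-status     # all nodes should now be listed as valid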
