As far as I know, Kapacitor can work on streams or batches. In the batch case, it fetches data from InfluxDB and operates on it.
But how does it work with streams? Does it subscribe to InfluxDB or to Telegraf? I hope it subscribes to InfluxDB, so that whenever any client writes data to InfluxDB, Kapacitor also receives that data. Is this understanding correct? Or does it subscribe directly to Telegraf?
This question is important to us because we want to use Azure IoT Hub in place of Telegraf. We will read the data from Azure IoT Hub and write it to InfluxDB, and we hope we can use Kapacitor stream tasks there.
Thanks In Advance
Kapacitor subscribes to InfluxDB. By default, if you do not specify which databases to subscribe to, it subscribes to all of them. In Kapacitor's config file you can list the InfluxDB databases you want to subscribe to.
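For illustration, here is a minimal sketch of what that looks like in kapacitor.conf; the connection name, the database mydb, and the retention policy autogen are placeholders, and the rest of the file is omitted:

```toml
# Sketch of the InfluxDB section of kapacitor.conf.
# "mydb" and "autogen" are placeholder database / retention-policy names.
[[influxdb]]
  enabled = true
  name = "localInfluxDB"
  urls = ["http://localhost:8086"]

  # Subscribe only to mydb.autogen instead of every database.
  [influxdb.subscriptions]
    mydb = ["autogen"]
```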
I'm just researching InfluxDB at this stage and have a question: is it possible to do anything with an MQTT message before ingesting it into InfluxDB?
I get some sensor data through MQTT in JSON format, but the main parameters arrive encrypted in a field of the JSON, and before ingesting them I need to preprocess them and extract the real values. What would be the best practice for this?
I understand that I can write my own service that subscribes to MQTT, decodes the data, and then sends it to InfluxDB, but is that the only option? Can I extend Telegraf somehow to do this?
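If you do go the custom-service route, a minimal Python sketch of that idea is below, assuming the paho-mqtt and influxdb client libraries; the topic name, JSON field names, and the decrypt_value() helper are hypothetical placeholders for your real decryption step. (Telegraf also has processor plugins that can transform fields in flight, but custom decryption is often simpler to keep in a small bridge service like this.)

```python
# Minimal MQTT -> decrypt -> InfluxDB bridge (sketch, not production code).
# Assumes the paho-mqtt (1.x-style API) and influxdb (v1) Python packages.
# Topic name, JSON layout, and decrypt_value() are hypothetical placeholders.
import json

import paho.mqtt.client as mqtt
from influxdb import InfluxDBClient

influx = InfluxDBClient(host="localhost", port=8086, database="sensors")

def decrypt_value(encrypted):
    # Placeholder: replace with the real decryption/decoding routine.
    return float(encrypted)

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    point = {
        "measurement": "sensor_data",
        "tags": {"device": payload.get("device_id", "unknown")},
        "fields": {"value": decrypt_value(payload["encrypted_value"])},
    }
    influx.write_points([point])

# For paho-mqtt 2.x, pass mqtt.CallbackAPIVersion.VERSION1 to Client().
client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("sensors/#")
client.loop_forever()
```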
I wanted to ask what options make sense with Redis, as I am unsure about Redis Pub/Sub in particular. Suppose I have a service A (Java client) that processes images. Unfortunately, it can't process all kinds of images (because the language/framework doesn't support them yet). This is where service B (Node.js) comes into play.
Service A streams the image bytes to Redis. Service B should read these bytes from Redis, encode them into the correct format, and stream the result back to Redis, and service A is somehow notified that it can read the result from Redis.
There are two strategies I consider for this:
Use the Pub/Sub feature of Redis. Service A streams the chunks to Redis (e.g. via writeStream) and then, as publisher, publishes some metadata to Service B (and its replicas) as subscribers. Service B then reads the stream (locking it against the other replicas), processes it, and streams the result back to Redis. It then publishes a message telling Service A that the result can be fetched from Redis. (A rough sketch of this flow follows the list.)
Put everything, metadata and bytes, directly into the Redis Pub/Sub message and then proceed as in 1). But how do I then lock the message against the other replicas of B? I want to avoid all of them processing the same image.
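For what it's worth, here is a rough Python sketch of strategy 1 with redis-py. The key and channel names, the lock TTL, and the encode_image() helper are made up for illustration; the bytes live under a plain key, only metadata goes through Pub/Sub, and SET NX acts as a simple per-image lock so that only one replica of B claims each image:

```python
# Sketch of strategy 1 with redis-py. Key/channel names are illustrative only.
import json
import redis

r = redis.Redis()

def encode_image(raw: bytes) -> bytes:
    # Placeholder for the real format conversion done by service B.
    return raw

# --- Service A side: store the bytes, publish only metadata ---
def submit_image(image_id: str, data: bytes):
    r.set(f"image:{image_id}:input", data)
    r.publish("images:todo", json.dumps({"id": image_id}))

# --- Service B side (each replica runs this loop) ---
def worker():
    sub = r.pubsub()
    sub.subscribe("images:todo")
    for msg in sub.listen():
        if msg["type"] != "message":
            continue
        image_id = json.loads(msg["data"])["id"]
        # SET NX is the lock: only the first replica to claim the id proceeds.
        if not r.set(f"image:{image_id}:lock", "1", nx=True, ex=60):
            continue
        result = encode_image(r.get(f"image:{image_id}:input"))
        r.set(f"image:{image_id}:result", result)
        r.publish("images:done", json.dumps({"id": image_id}))
```

As an aside, Redis Streams with consumer groups (XADD / XREADGROUP) give you the one-consumer-per-message behaviour natively, which may be a better fit than Pub/Sub plus manual locking.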
So my question is:
Does the Pub/Sub feature of Redis allow strategy no. 2 in terms of performance, or is it intended exclusively for "lightweight" messages such as log data, metadata, and IDs?
And if Redis in general is not a good solution for this approach, which one would be? Async REST endpoints?
I've used InfluxDB for a while now, but never had to continuously stream data from it; a simple GET /query was sufficient. But now I need a way to stream data to the frontend to draw pretty graphs and such.
So far we've been running GET /query periodically from the frontend, but this is highly inefficient. I would much rather keep the connection open and receive the data as it's written to the DB. Searching the web, there doesn't seem to be support for either WebSockets or HTTP/2 in InfluxDB right now.
So, a question to others who have possibly hit this issue: how did you solve it?
InfluxDB v1.x supports subscriptions. As data is written to InfluxDB, writes are duplicated to subscriber endpoints via HTTP, HTTPS, or UDP in line protocol.
https://docs.influxdata.com/influxdb/v1.7/administration/subscription-management/
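To make that concrete, here is a minimal sketch of the receiving side in Python. The database, retention policy, host, and port are placeholders; the handler just prints the line protocol it receives, and in practice you would forward it to the frontend (e.g. over a WebSocket or server-sent events):

```python
# Sketch of a subscription endpoint that receives duplicated line-protocol writes.
# First create the subscription in InfluxDB (placeholder names):
#   CREATE SUBSCRIPTION "frontend_sub" ON "mydb"."autogen"
#     DESTINATIONS ALL 'http://localhost:9090'
from http.server import BaseHTTPRequestHandler, HTTPServer

class SubscriptionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        line_protocol = self.rfile.read(length).decode("utf-8")
        # Each POST body contains line-protocol points as they are written.
        # Forward them to the frontend here instead of printing.
        print(line_protocol)
        self.send_response(204)
        self.end_headers()

# Accept POSTs on any path, since the exact path InfluxDB posts to may vary.
HTTPServer(("0.0.0.0", 9090), SubscriptionHandler).serve_forever()
```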
I'm going over their documentation here:
http://www.rubydoc.info/github/github/statsd-ruby/Statsd
There are methods for recording data, but I can't seem to find anything about retrieving recorded data. I'm adopting a project with an existing statsd addition. Its host is likely a defunct URL. Is the host perhaps where those stats are recorded?
The statsd server implementations that Mircea links to just take care of receiving and aggregating metrics and publishing them to a backend service. Etsy's statsd definition (emphasis mine):
"A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP or TCP and sends aggregates to one or more pluggable backend services (e.g., Graphite)."
To retrieve the recorded data you have to query the backend. Check the list of available backends. The most common one is Graphite.
See also this question: How does StatsD store its data?
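As a concrete illustration, if the backend is Graphite, the recorded series can be read back over its render API. A small Python sketch, where the Graphite host and the metric path stats.counters.foo.count are placeholders:

```python
# Sketch: query a Graphite backend for data that was recorded via statsd.
# The host and the metric path are illustrative; adjust them to your setup.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({
    "target": "stats.counters.foo.count",  # hypothetical metric path
    "from": "-1h",
    "format": "json",
})
with urlopen(f"http://graphite.example.com/render?{params}") as resp:
    series = json.loads(resp.read())

for s in series:
    print(s["target"], s["datapoints"][:5])  # datapoints are [value, timestamp] pairs
```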
There are 2 parts to statsd: a client and a server.
What you're looking at is the client part. You will not see functionality related to retrieving the data because it isn't there; that normally lives on the server side.
Here is a list of statsd server implementations:
http://www.joemiller.me/2011/09/21/list-of-statsd-server-implementations/
Research and pick one that fits your needs.
Statsd originally started at etsy: https://github.com/etsy/statsd/wiki
I plan to create a system where I can read web logs in real time and use Apache Spark to process them. I am planning to use Kafka to pass the logs to Spark Streaming to aggregate statistics. I am not sure if I should do some data parsing (raw to JSON ...), and if so, where the appropriate place to do it is (Spark script, Kafka, somewhere else...). I would be grateful if someone could guide me. It's kind of new stuff to me. Cheers
Apache Kafka is a distributed pub-sub messaging system. It does not provide any way to parse or transform data; it is not for that. But any Kafka consumer can process, parse, or transform the data published to Kafka and republish the transformed data to another topic or store it in a database or file system.
There are many ways to consume data from Kafka. One way is the one you suggested: real-time stream processors (Apache Flume, Apache Spark, Apache Storm, ...).
So the answer is no, Kafka does not provide any way to parse the raw data. You can transform/parse the raw data with Spark, but you can also write your own consumer, since there are Kafka client ports for many languages, or use any other pre-built consumer such as Apache Flume, Apache Storm, etc.
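As a rough illustration of the "write your own consumer" route, here is a small Python sketch using the kafka-python client. The topic names and the log line layout are assumptions; the point is just that a consumer can read the raw lines, parse them to JSON, and republish them to a topic that Spark Streaming then consumes:

```python
# Sketch: consume raw log lines, parse them to JSON, republish to a second topic.
# Topic names ("weblogs-raw", "weblogs-json") and the log layout are assumptions.
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("weblogs-raw", bootstrap_servers="localhost:9092")
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for record in consumer:
    line = record.value.decode("utf-8")
    # Hypothetical space-separated access-log layout: ip, timestamp, path, status.
    ip, ts, path, status = line.split(" ", 3)
    producer.send("weblogs-json", {"ip": ip, "ts": ts, "path": path, "status": status})
```

Doing the parsing inside the Spark Streaming job itself is equally valid; a separate parsing consumer just gives you a clean JSON topic that other consumers can reuse.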