I am trying to write an InfluxDB query such that a measurement looks like
dummydata value=1,2,3,4
and InfluxDB doesn't like this format. I'm guessing InfluxDB cannot do this, but I can't find any documentation that says it cannot, nor do I see a feasible workaround. I have to write 500 points per timestamp: it seems to me that 500 separate measurements per timestamp would get hairy quick.
So:
Can InfluxDB accept an array/list as a value?
If not, is there a workaround?
Or is InfluxDB just the wrong tool for this job?
Thanks in advance.
InfluxDB accepts strings, float64s, int64s, and booleans as field values.
"it seems to me that 500 separate measurements per timestamp would get hairy quick."
That's where you are mistaken. InfluxDB 0.10+ is specifically designed to encourage multiple fields per point, where a field is a measured value. What you want to write is a point like this:
dummydata value=1,value2=2,value3=3,value4=4...
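As a rough illustration of the point above, here is a minimal Python sketch that turns a list of values into one line-protocol point with one field per element. The helper name to_line_protocol and the value1, value2, ... naming scheme are assumptions made for this example, not anything InfluxDB prescribes.

```python
def to_line_protocol(measurement, values, prefix="value"):
    """Build a single point with one field per list element."""
    # Hypothetical naming scheme: value1, value2, ..., valueN.
    fields = ",".join(
        f"{prefix}{index}={value}" for index, value in enumerate(values, start=1)
    )
    # No timestamp is appended, so the server would assign one on write.
    return f"{measurement} {fields}"


line = to_line_protocol("dummydata", [1, 2, 3, 4])
print(line)
```

The same helper would work for 500 values; the result is still one point, just with 500 fields.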
Related
https://github.com/influxdata/telegraf/pull/1557
Apparently some people have been asking for this, and this GitHub PR is the closest thing I can find to a solution, but it was ultimately denied (I think?).
Basically, I have a JSON object I'm getting from Stackdriver, which includes a timestamp in ISO8601 that I convert to Unix time. I can insert the entire JSON response into Influx fine, but the timestamp from Stackdriver appears as a tag on the series rather than as the time index of the series itself. As a result, it is infeasible to query by Stackdriver's timestamp. I could simply drop it and use the timestamp that Influx assigns, but then I would essentially be querying incorrect/imprecise data.
Does anyone have a clever way to approach this?
tl;dr How can I use Telegraf to override InfluxDB's timestamps with my own timestamps?
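Not a Telegraf-specific answer, but for reference: the InfluxDB line protocol lets a point carry its own timestamp as the last element of the line, in nanoseconds since the Unix epoch by default. The Python sketch below shows the ISO8601 conversion and the resulting line. The measurement name stackdriver_metric, the field name, and both helper names are made up for the example, and timestamps without a UTC offset are assumed to be UTC.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso8601_to_epoch_ns(stamp):
    """Convert an ISO8601 string to integer nanoseconds since the epoch."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    delta = parsed - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale.
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def point_with_timestamp(measurement, field, value, stamp):
    """Build a line-protocol point that carries an explicit timestamp."""
    return f"{measurement} {field}={value} {iso8601_to_epoch_ns(stamp)}"


print(point_with_timestamp("stackdriver_metric", "value", 0.5, "2016-08-01T00:00:00Z"))
```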
Is it possible to downsample older data in InfluxDB in a way that keeps only the changes in value?
My example is the following:
I have a binary sensor sending data every 10 min, so naturally the consecutive values look something like this: 0,0,0,0,0,1,1,0,0,0,0...
My goal is to keep this kind of raw data for a certain period of time using retention policies and to downsample the data for longer storage. I want to delete all successive points with the same value, so that I keep only the data points (with their timestamps) at which the value actually changed. The downsampled data should look like this: 0,1,0,1,0,1,0.... but with the correct timestamp of when each change actually occurred.
Currently this isn't possible with InfluxDB, though the plan is to eventually support this kind of use case.
I would encourage you to open a feature request on the InfluxDB repo asking for this.
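In the meantime, the filtering described in the question could be done on the client before writing the points into the long-term retention policy. A minimal Python sketch, assuming the raw data has already been queried into a time-ordered list of (timestamp, value) pairs; keep_changes is a hypothetical helper name:

```python
def keep_changes(points):
    """Keep only the points whose value differs from the previous point."""
    kept = []
    previous = object()  # sentinel that never equals a real sensor value
    for timestamp, value in points:
        if value != previous:
            # The original timestamp of the change is preserved.
            kept.append((timestamp, value))
            previous = value
    return kept


# Readings every 10 minutes (timestamps in seconds), values as in the question.
raw = [(0, 0), (600, 0), (1200, 1), (1800, 1), (2400, 0), (3000, 0)]
print(keep_changes(raw))
```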
Is it possible to aggregate measurements or create custom queries beyond the standard dateFrom dateTo queries?
As an example, I have measurements which have a time delta of 1 minute (2015-01-01T05:05:00, 2015-01-01T05:06:00, 2015-01-01T05:07:00, ...) and I would like to query the measurements at 15 minute intervals (2015-01-01T05:15:00, 2015-01-01T05:30:00, 2015-01-01T05:45:00, ...)
So far I have only come up with these solutions:
1. Using the standard API request as in
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-10-01&dateTo=2015-11-05
and then throwing away most of the data, which spends a massive amount of time loading data I do not need.
2. Using CEP (Cumulocity Event Language) to generate a new measurement every 15 minutes from the nearest 1 minute measurement, which seems like a bit of overkill and not very elegant.
3. Batch requesting the exact minute
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-11-05T05:15:00%2B01:00&dateTo=2015-11-05T05:16:00%2B01:00
which will result in a massive amount of API requests and also does not seem very efficient.
4. Using the /measurements/series endpoint, which will only give me all series, even those I do not want, and which only offers hourly and daily aggregation (as far as I can tell).
Is there a better way of doing this?
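For reference, the batch-request approach (one request per 15 minute mark, each covering a one minute window) can be sketched in Python as below. The base URL and the dateFrom/dateTo parameters are taken from the example requests above; the function name window_urls and the default step and window sizes are assumptions for the illustration, and the sketch only builds the URLs without sending anything.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "https://tenant.cumulocity.com/measurement/measurements"


def window_urls(start, end, step=timedelta(minutes=15), window=timedelta(minutes=1)):
    """Build one request URL per step, each covering a short time window."""
    urls = []
    current = start
    while current < end:
        params = {
            "dateFrom": current.isoformat(),
            "dateTo": (current + window).isoformat(),
        }
        # safe=":" leaves colons readable; "+" in the offset becomes %2B.
        urls.append(f"{BASE_URL}?{urlencode(params, safe=':')}")
        current += step
    return urls


tz = timezone(timedelta(hours=1))
start = datetime(2015, 11, 5, 5, 15, tzinfo=tz)
urls = window_urls(start, start + timedelta(minutes=45))
for url in urls:
    print(url)
```

This makes the cost visible: a full day at 15 minute resolution is already 96 requests per device.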
You have captured nearly all of the mechanisms that are currently available. There is one more possibility, though I'm not sure if it is an option for you:
Mark every fifteenth measurement when sending it from the device, e.g. by using a different type.
I would normally use 2. It's actually quite efficient: it's similar to a materialized view in traditional SQL, plus you can use the data everywhere and in all widgets.
Good luck :-)
Cheers,
André
I would prefer the CEP solution. The rule wouldn't be that complicated. You would of course then store these measurements twice, which is not that nice, but having your desired measurement with a specific type or fragment will give you the fastest way to query it.
Instead of copying the measurement, you could just add a special fragment to the measurement every 15 min in the CEP rule. You cannot update measurements, so you would have to delete the measurement that comes in every 15 min and then create a new measurement with exactly the same values plus an added fragment (e.g. "aggregatedMeasurement": {}).
Your query then looks like this:
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-10-01&dateTo=2015-11-05&fragmentType=aggregatedMeasurement
One more idea for point 3:
You could use SmartREST to create a template with the query string and leave the dateFrom and dateTo as placeholders.
From the client side you then would have to make only one request using the bulking feature in SmartREST.
On the server side this would still be transformed into the single requests so you wouldn't gain anything in speed.
I write two points to InfluxDB: one marks the start of an event and the other marks the end. I just need to determine the duration between those two events and make queries around it. InfluxDB has a difference() function, but it doesn't work on the time meta field.
Is supplying a custom timestamp value the only way to accomplish this?
As per "Can I perform mathematical operations against timestamps?"
No:
"Currently, it is not possible to execute mathematical operators against timestamp values in InfluxDB. Most time calculations must be carried out by the client receiving the query results."
and yes, maybe:
The function ELAPSED() returns the difference between subsequent timestamps in a single field.
So it depends on the shape of your data.
If you write only the two entries mentioned, you can follow these steps:
Limit the result to two points (e.g. select * from timeseries limit 2)
Extract the time from each point in the result set
Take the difference between the two times
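The last two steps, done on the client, might look like the following Python sketch. It assumes the two timestamps come back as RFC3339 strings with at most microsecond precision; nanosecond-precision strings would need trimming, or the times could be requested as integers instead. The function name duration_seconds is made up for the example.

```python
from datetime import datetime


def duration_seconds(start_stamp, end_stamp):
    """Return the number of seconds between two RFC3339 timestamps."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    start = datetime.fromisoformat(start_stamp.replace("Z", "+00:00"))
    end = datetime.fromisoformat(end_stamp.replace("Z", "+00:00"))
    return (end - start).total_seconds()


# Hypothetical start and end points of one event.
print(duration_seconds("2016-01-01T00:00:00Z", "2016-01-01T00:05:30Z"))
```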
I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:
the time resolution should be variable from one second to a year
there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished
Are there best practices, or tools/plugins for rails that help handle this kind and amount of data? Are there maybe database engines specifically tailored towards this, or having helpful functions (e.g. CouchDB indexes)?
EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: Run a query for each bucket, merge in app - probably way too slow. GROUP BY timestamp/granularity - does not count connections correctly. Preprocessing data into rows by smallest granularity and downsampling on query - probably the best way.
I think you can use MySQL timestamps for this.
The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes it easy and fast enough to select, and it yields correct results. To get a different granularity, you can do integer arithmetic on the timestamp column: select floor(timestamp/factor)*factor and group by floor(timestamp/factor)*factor.
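To make the bucketing idea concrete, here is a small Python sketch of the same approach outside the database. It expands each connection into the per-minute buckets it touches, then regroups to a coarser granularity by flooring the bucket timestamp to a multiple of factor. The function names, the (start, end) tuple layout with Unix timestamps in seconds, and the choice to count each connection once per coarse bucket are all assumptions for the illustration.

```python
from collections import Counter


def minute_buckets(start, end, size=60):
    """Return the start of every size-second bucket touched by [start, end)."""
    first = start // size * size  # floor the start to a bucket boundary
    return range(first, end, size)


def count_per_bucket(connections, factor):
    """Count how many connections were open in each factor-second bucket."""
    counts = Counter()
    for start, end in connections:
        # A set ensures a connection counts once per coarse bucket,
        # no matter how many of its minutes fall into that bucket.
        coarse = {minute // factor * factor for minute in minute_buckets(start, end)}
        counts.update(coarse)
    return dict(counts)


# Three hypothetical connections as (start, end) Unix timestamps in seconds.
connections = [(30, 150), (3600, 3660), (100, 4000)]
print(count_per_bucket(connections, 3600))
```

The third connection spans two hourly buckets and is counted in both, which is the case a plain GROUP BY on the start timestamp would miss.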