InfluxDB query measurements with multiple retention policy - influxdb

If I create multiple retention policies for a measurement, do I have to use the following when I query the measurement?
select * from RetentionPolicy.measurement
It looks to me as if points with different retention policies have to be queried from a different measurement name, with the retention policy name as a prefix.

Related

How can I delete measurements within a time range for a given RP?

Question
Is it possible to delete measurement data within a time range for a specific retention policy?
DELETE
FROM "SensorData"."Quarantine"./.*/
WHERE "time" >= '2018-02-28T02:26:08.0000000Z'
AND "time" <= '2018-02-28T02:27:08.0000000Z'
This is our current attempt at a query to drop all data within a time period. However, DELETE doesn't appear to be happy to have a database or a retention policy listed.
Possible XY Problem
The reason we are asking (I suspect it's an unsolved XY problem, see github://influxdata/influxdb#8088) is step 3 below.
We have a database called SensorData that has a primary buffer: a default retention policy of 30d, so we don't run out of disk space.
However, if the sensors register an 'exceedance', we are required to keep that data, plus an hour either side, as evidence. We call this Quarantine.
We have so far implemented this as a retention policy called Quarantine.
So we have Primary and Quarantine and, possibly in the future, some sort of high-frequency retention policy that might be downsampled to Primary.
The XY problem is: "How do you transactionally copy/move/change the retention policy on some recorded data in Influx?"
Our solution (after failing to find an existing one) was, e.g.:
1. Create a temp DB, named so as to uniquely identify an in-progress quarantine operation:
   create "TempDB"+"_Quarantine_"+startUnixTime+"_"+endUnixTime
2. Copy the data from Primary to the temp DB:
   Copy Primary -> TempDB
3. Delete the data from Primary:
   Delete Primary
4. Copy the data to Quarantine:
   Copy TempDB -> Quarantine
5. Drop the temp DB:
   Drop TempDB
This would allow rollback for a failed operation, or rollback/resume in the case of a crash.
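The five steps above can be sketched as a list of InfluxQL statements generated in Python. This is only a sketch under assumptions, not the author's implementation: the helper names `rfc3339` and `quarantine_statements` are hypothetical, the temp DB is assumed to use the default `autogen` retention policy, and the copies are assumed to use `SELECT ... INTO ... :MEASUREMENT` with `GROUP BY *` so that tags stay tags. Step 3 has to be run with SensorData as the current database, and it removes the time window from every retention policy in that database.

```python
from datetime import datetime, timezone


def rfc3339(unix_seconds):
    """Format a Unix timestamp (seconds) as an RFC3339 UTC string."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )


def quarantine_statements(start_unix, end_unix, db="SensorData"):
    """Return the InfluxQL statements for the five steps, in order."""
    # The temp DB name encodes the window, so an interrupted run can be found.
    temp_db = "TempDB_Quarantine_%d_%d" % (start_unix, end_unix)
    window = "time >= '%s' AND time <= '%s'" % (
        rfc3339(start_unix),
        rfc3339(end_unix),
    )
    return [
        # 1. Create the temp DB.
        'CREATE DATABASE "%s"' % temp_db,
        # 2. Copy Primary -> TempDB (assumes the temp DB's RP is "autogen").
        'SELECT * INTO "%s"."autogen".:MEASUREMENT FROM "%s"."Primary"./.*/ '
        "WHERE %s GROUP BY *" % (temp_db, db, window),
        # 3. Delete the window; run with `db` as the current database.
        "DELETE FROM /.*/ WHERE %s" % window,
        # 4. Copy TempDB -> Quarantine.
        'SELECT * INTO "%s"."Quarantine".:MEASUREMENT FROM "%s"."autogen"./.*/ '
        "GROUP BY *" % (db, temp_db),
        # 5. Drop the temp DB once the copy has succeeded.
        'DROP DATABASE "%s"' % temp_db,
    ]


statements = quarantine_statements(1519784768, 1519784828)
```

Because the temp DB name is derived from the window, a crashed run leaves behind a database whose name says which operation to resume or roll back.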
Chronograf was being really funky with parsing the query, which caused a lot of confusion.
Influx (as of 1.4) does not have the ability to delete data for a specific retention policy, and Chronograf did not have the ability to parse the delete command without a database specified.
What ended up working was calling (via an API):
DELETE FROM /.*/ WHERE "time" >= '2018-02-28T02:26:08.0000000Z' AND "time" <= '2018-02-28T02:27:08.0000000Z'
The database isn't specified in the statement because it is specified elsewhere in the API call.
This is expected to be equivalent to running use SensorData on the line before, or in the CLI.
So for now the workaround is to delete the data for all RPs and hope you don't need a high-frequency retention policy in the future.
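As an illustration of "the database is specified elsewhere", the request to the InfluxDB 1.x `/query` HTTP endpoint could be assembled as follows. This is a sketch only: the helper name `build_delete_request` and the host are assumptions, and nothing is sent. The database goes in the `db` URL parameter, while the DELETE statement names neither a database nor a retention policy.

```python
from urllib.parse import urlencode


def build_delete_request(db, start, end, host="http://localhost:8086"):
    """Return (url, body) for a POST to the /query endpoint.

    The database travels in the `db` URL parameter, so the DELETE
    statement itself names no database or retention policy.
    """
    statement = (
        "DELETE FROM /.*/ WHERE \"time\" >= '%s' AND \"time\" <= '%s'" % (start, end)
    )
    url = "%s/query?%s" % (host, urlencode({"db": db}))
    body = urlencode({"q": statement})  # form-encoded POST body
    return url, body


url, body = build_delete_request(
    "SensorData", "2018-02-28T02:26:08Z", "2018-02-28T02:27:08Z"
)
```

The pair could then be sent with `urllib.request.urlopen(url, data=body.encode())`, which issues a POST when `data` is given.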
If you just want to change the retention policy on some data range, I suggest just copying such data range into another retention policy:
USE "SensorData"
SELECT *
INTO "Quarantine"."MeasurementName"
FROM "Primary"."MeasurementName"
WHERE "time" >= '2018-02-28T02:26:08.0000000Z'
AND "time" <= '2018-02-28T02:27:08.0000000Z'
The data will be deleted from "Primary"."MeasurementName" as usual after duration specified for "Primary" RP ends (30 days), while the copied range will be preserved in "Quarantine" RP.
And if you want to delete the data from Primary immediately, you can try the following:
USE "SensorData"."Primary"
DELETE
FROM "MeasurementName"
WHERE "time" >= '2018-02-28T02:26:08.0000000Z'
AND "time" <= '2018-02-28T02:27:08.0000000Z'
Specifying a database and retention policy in this way is not supported in InfluxQL. I hope it will be in the future, or in IFQL.
For now I would recommend using a different measurement for the aggregated data.

Select from multiple measurements

I have a bunch of measurements, all starting with task_runtime.
i.e.
task_runtime.task_a
task_runtime.task_b
task_runtime.task_c
Is there a way to select all of them by a partial measurement name?
I'm using grafana on top of influxdb and I want to display all of these measurements in a single graph, but I don't have a closed list of these measurements.
I thought about something like
select * from (select table_name from all_tables where table_name like "task_runtime.*")
But I'm not sure of the InfluxDB syntax for this.
You can use a regular expression when specifying measurements in the FROM clause as described in the InfluxDB documentation.
For example, in your case:
SELECT * FROM /^task_runtime.*/
Grafana also supports this and will display all measurements separately.
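If the prefix comes from a variable, the regex literal can be built programmatically. The sketch below is an illustration under assumptions: `measurement_regex` is a hypothetical helper, Python 3.7+ `re.escape` behaviour is assumed, and the prefix is assumed to contain only ordinary name characters. Unlike the pattern above, it escapes the dot, so a name such as `task_runtimeX` is not selected. The local `re.search` check only approximates the server-side matching.

```python
import re


def measurement_regex(prefix):
    """Build an InfluxQL regex literal matching names that start with prefix."""
    # re.escape turns the literal dot into "\."; "/" must also be escaped
    # because it delimits the regex literal in InfluxQL.
    escaped = re.escape(prefix).replace("/", r"\/")
    return "/^%s/" % escaped


names = ["task_runtime.task_a", "task_runtime.task_b", "task_runtimeX", "other.task_a"]
pattern = measurement_regex("task_runtime.")
query = "SELECT * FROM %s" % pattern
# Local check of which names the pattern selects (strip the / delimiters).
matched = [n for n in names if re.search(pattern[1:-1], n)]
```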

Add tag to existing data point in InfluxDB

Is there a way to add a tag to an existing entry in an InfluxDB measurement? If not in the existing measurement, is there a way to insert the records with a new tag into a new InfluxDB measurement?
Currently I have a set of measurements that should probably be entries in a single measurement where their current measurement names should be tag-keys in the superset of the merged measurements.
e.g.
show measurements
measurement1
measurement2
measurement3
measurement4
These should instead be tags on the data included in each measurement, unioned to form a single measurement joinedmeasurement with indexed tags measurement1, measurement2, ...
It would have to be done manually via queries.
For example, in python using the official client:
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', database='my_db')
measurement = 'measurement1'
# Read every point of the old measurement.
db_data = client.query('select value from "%s"' % measurement)
# Re-emit each point into the merged measurement. Tags must be a dict of
# key/value pairs; the tag key 'source' is an illustrative choice.
data_to_write = [{'measurement': 'joinedmeasurement',
                  'tags': {'source': measurement},
                  'time': d['time'],
                  'fields': {'value': d['value']},
                  }
                 for d in db_data.get_points()]
client.write_points(data_to_write)
And so on for the rest of the measurements. You can run the above in a loop to do all of them in one go.
Consider using named fields in addition to tags, though. The above example only uses one field, but you can have as many as you want.
Using several fields per point can improve performance further. Fields are not indexed, though, so do not use them for values that queries need to filter on.
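The loop mentioned above might look like the following sketch. It is an illustration under assumptions, not the answerer's code: `merge_measurements` is a hypothetical helper, the tag key `source` is an arbitrary choice, and `client` is only assumed to behave like `influxdb.InfluxDBClient` (a `query()` whose result has `get_points()`, plus `write_points()`).

```python
def merge_measurements(client, measurements, target="joinedmeasurement",
                       tag_key="source"):
    """Copy each measurement's points into `target`, tagged with their origin.

    `client` is expected to behave like influxdb.InfluxDBClient:
    query() returns an object with get_points(), and write_points()
    accepts a list of point dicts.
    """
    total = 0
    for name in measurements:
        result = client.query('SELECT value FROM "%s"' % name)
        points = [
            {
                "measurement": target,
                "tags": {tag_key: name},  # tags are a dict of key/value pairs
                "time": p["time"],
                "fields": {"value": p["value"]},
            }
            for p in result.get_points()
        ]
        if points:  # skip empty measurements instead of writing nothing
            client.write_points(points)
            total += len(points)
    return total
```

Called as `merge_measurements(client, ['measurement1', 'measurement2', 'measurement3', 'measurement4'])`, it returns the number of points written.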

Can I create different retention policy for different measurements in influxdb?

Is it possible to treat different measurements in InfluxDB with a different retention policy?
This is entirely possible with InfluxDB. To do this you'll need to create a database that has two retention policies and then write the data to the associated retention policy.
Example:
$ influx
> create database mydb
> create retention policy rp_1 on mydb duration 1h replication 1
> create retention policy rp_2 on mydb duration 2h replication 1
Now that our retention policies have been created, we simply write data in the following manner (the URL is quoted so that the shell does not interpret the &):
Sensor 1 will write data to rp_1
curl "http://localhost:8086/write?db=mydb&rp=rp_1" --data-binary SOMEDATA
Sensor 2 will write data to rp_2
curl "http://localhost:8086/write?db=mydb&rp=rp_2" --data-binary SOMEDATA
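The same routing can be expressed in Python by choosing the `rp` query parameter per sensor. This is a small sketch under assumptions: the `SENSOR_RP` mapping and the `write_url` helper are hypothetical, and only the URL is built, nothing is sent.

```python
from urllib.parse import urlencode

# Hypothetical mapping of sensor name -> retention policy.
SENSOR_RP = {"sensor1": "rp_1", "sensor2": "rp_2"}


def write_url(sensor, db="mydb", host="http://localhost:8086"):
    """Return the /write URL that routes this sensor's points to its RP."""
    params = urlencode({"db": db, "rp": SENSOR_RP[sensor]})
    return "%s/write?%s" % (host, params)


url_1 = write_url("sensor1")
url_2 = write_url("sensor2")
```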

Time Series Databases - Metrics vs. tags

I'm new to TSDBs and I have a lot of temperature sensors to store in my database, with one point per second. Is it better to use one unique metric per sensor, or only one metric (temperature, for example) with distinct tags depending on the sensor?
I searched the Internet for the best practice, but I didn't find a good answer...
Thank you! :-)
Edit:
I will have 8 types of measurements (temperature, setpoint, energy, power,...) from 2500 sources
If you are storing your data in InfluxDB, I would recommend storing all the metrics in a single measurement and using tags to differentiate the sources, rather than creating a measurement per source. The reason being that you can trivially merge or decompose the metrics using tags within a measurement, but it is not possible in the newest InfluxDB to merge or join across measurements.
Ultimately the decision rests with both your choice of TSDB and the queries you care most about running.
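To make the single-measurement layout concrete, here is a sketch that formats one point in InfluxDB line protocol. The measurement name `readings`, the tag key `source`, and the `to_line` helper are assumptions for illustration. The helper does no escaping and handles numeric fields only.

```python
def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point in InfluxDB line protocol.

    Sketch only: no escaping of spaces or commas, numeric fields only.
    """
    tag_part = ",".join("%s=%s" % (k, v) for k, v in sorted(tags.items()))
    field_part = ",".join("%s=%s" % (k, v) for k, v in sorted(fields.items()))
    return "%s,%s %s %d" % (measurement, tag_part, field_part, timestamp_ns)


# One measurement for all sources; the source is a tag, metric types are fields.
line = to_line(
    "readings",
    {"source": "sensor-001"},
    {"temperature": 42.2, "setpoint": 40.0},
    1438560000000000000,  # 2015-08-03T00:00:00Z in nanoseconds
)
```

With this layout, a per-source aggregate is a single query that groups by the `source` tag.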
For comparison purposes, in Axibase Time-Series Database you can store temperature as a metric and sensor id as entity name. ATSD schema has a notion of entity which is the name of system for which the data is being collected. The advantage is more compact storage and the ability to define tags for entities themselves, for example sensor location, sensor type etc. This way you can filter and group results not just by sensor id but also by sensor tags.
To give you an example, in this blog article 0601911 stands for entity id - which is EPA station id. This station collects several environmental metrics and at the same time is described with multiple tags in the database: http://axibase.com/environmental-monitoring-using-big-data/.
The bottom line is that you don't have to stage a second database, typically a relational one, just to store extended information about sensors, servers etc. for advanced reporting.
UPDATE 1: Sample network command:
series e:sensor-001 d:2015-08-03T00:00:00Z m:temperature=42.2 m:humidity=72 m:precipitation=44.3
Tags that describe sensor-001 such as location, type, etc are stored separately, minimizing storage footprint and speeding up queries. If you're collecting energy/power metrics you often have to specify attributes to series such as Status because data may not come clean/verified. You can use series tags for this purpose.
series e:sensor-001 d:2015-08-03T00:00:00Z m:temperature=42.2 ... t:status=Provisional
You should use one metric per sensor. You probably won't be needing to aggregate values from different temperature sensors, but you will be needing to aggregate values of a given sensor (average over a minute for instance).
Metrics correspond to data coming from the same source, or at least data you will be likely to aggregate. You can create almost as many metrics as you want (up to 16 million metrics in OpenTSDB for instance).
Tags make distinctions between these pieces of data. For instance, you could tag data differently if they suddenly change a lot, in order to retrieve only relevant data if needed, without losing the rest of the data. Although for a temperature sensor getting data every second, the best would probably be to filter and only store data when the value changed...
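The "only store data when the value changed" idea can be sketched as a small filter applied before writing. The function name `only_changes` and the sample readings are made up for illustration.

```python
def only_changes(readings):
    """Yield (timestamp, value) pairs whose value differs from the previous one."""
    previous = object()  # sentinel that equals no real reading
    for timestamp, value in readings:
        if value != previous:
            yield timestamp, value
            previous = value


samples = [(0, 21.5), (1, 21.5), (2, 21.6), (3, 21.6), (4, 21.5)]
stored = list(only_changes(samples))
```

A real deployment might also want a tolerance band or a periodic heartbeat point, so that long flat stretches remain distinguishable from a dead sensor.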
Best practices are summed up here

Resources