Drop a measurement that belongs to a specific retention policy - influxdb

Recently, we migrated our data from measurement 'users' with retention policy autogen (default) to retention policy sixty_days (sixty_days.users).
So we no longer need the data in autogen.users.
How can we remove it without harming the data in sixty_days.users?
This is strange behavior: when we insert data into sixty_days.users it is not applied to autogen.users, but when we try to DELETE / DROP, the data is removed from both of them.
We tried to use DROP, and DELETE by time, but without success.
We expect to remove only the data in a measurement that belongs to a specific RP.

As Nikolay Manolov mentioned in his answer, you cannot specify a retention policy as part of a DELETE or DROP command. But do we really need to?
After testing, I confirmed that your issue is related to using the same measurement name with different retention policies. So a better hack may be to use a temporary measurement name other than 'user' until you have deleted the old copy under the default retention policy.
select * into "sixty_days".temp_user from "autogen".user group by *;
delete from user;
select * into "sixty_days".user from "sixty_days".temp_user group by *;
delete from temp_user;
You need the GROUP BY * clause in the above queries to prevent tags from being converted into fields, which SELECT INTO does by default.

This was not supported and it seems it still is not.
It is the intended behavior that DELETE affects all retention policies. And, yes, this is inconsistent with, for example, SELECT, where you can select from a particular retention policy only. See the docs on this: https://docs.influxdata.com/influxdb/v1.7/query_language/database_management/#retention-policy-management
I suppose a not very elegant workaround could be to apply a retention_policy_tag to the data which would then allow you to delete data from particular retention policies.
EDIT: in your case, since the data is already there, you can't apply a tag. You could do a migration via copying the data into a temporary measurement, cleaning up, and then copying the data back. Alternatively, wait 60 days and then delete everything that is older, until which time your data will be duplicated.
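For new writes, a minimal sketch of that tag-based workaround, using the influx CLI (the tag key retention_policy_tag and the field value are illustrative assumptions, not from the question):
INSERT INTO sixty_days users,retention_policy_tag=sixty_days value=42
DELETE FROM "users" WHERE "retention_policy_tag" = 'sixty_days'
The DELETE still runs against every retention policy, but only the points carrying that tag value are removed.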

Related

"Transactional safety" in influxDB

We have a scenario where we want to frequently change the tag of a (single) measurement value.
Our goal is to create a database which stores prognosis values. But it should never lose data, and it should track changes to already written data, like modifications or overwrites.
Our current plan is to have an additional field "write_ts", which indicates at which point in time the measurement value was inserted or changed, and a tag "version" which is updated with each change.
Furthermore the version '0' should always contain the latest value.
name: temperature
-----------------
time                  write_ts (val)  current_mA (val)  version (tag)  machine (tag)
2015-10-21T19:28:08Z  1445506564      25                0              injection_molding_1
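In line protocol, one such point might look like this (a sketch; the tag/field split follows the table above, and the nanosecond timestamp corresponds to 2015-10-21T19:28:08Z):
temperature,machine=injection_molding_1,version=0 write_ts=1445506564,current_mA=25 1445455688000000000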
So let's assume I have an updated prognosis value for this example value.
So, I do:
SELECT curr_measurement
INSERT curr_measurement with new tag (version = 1)
DROP curr_measurement
//then
INSERT new_measurement with version = 0
Now my question:
If I lose the connection for whatever reason in between the SELECT, INSERT, DROP:
I would get double records.
(Or if I do SELECT, DROP, INSERT: I lose data)
Is there any method to prevent that?
Transactions don't exist in InfluxDB
InfluxDB is a time-series database, not a relational database. Its main use case is not one where users are editing old data.
In a relational database that supports transactions, you are protecting yourself against UPDATE and similar operations. Data comes in, existing data gets changed, you need to reliably read these updates.
The main use case in time-series databases is a lot of raw data coming in, followed by some filtering or transforming to other measurements or databases. Picture a one-way data stream. In this scenario, there isn't much need for transactions, because old data isn't getting updated much.
How you can use InfluxDB
In cases like yours, where there is additional data being calculated based on live data, it's common to place this new data in its own measurement rather than as a new field in a "live data" measurement.
As for version tracking and reliably getting updates:
1) Does the version number tell you anything the write_ts number doesn't? Consider not using it, if it's simply a proxy for write_ts. If version only ever increases, it might be duplicating the info given by write_ts, minus the usefulness of knowing when the change was made. If version is expected to decrease from time to time, then it makes sense to keep it.
2) Similarly, if you're keeping old records: does write_ts tell you anything that the time value doesn't?
3) Logging. Do you need to over-write (update) values? Or can you get what you need by adding new lines, increasing write_ts or version as appropriate? The latter is a more "InfluxDB-ish" approach.
4) Reading values. You can read all values as they change with updates. If a client app only needs to know the latest value of something that's being updated (and the time it was updated), querying becomes something like:
SELECT LAST(write_ts), current_mA, machine FROM temperature
You could also try grouping the machine values together:
SELECT LAST(*) FROM temperature GROUP BY machine
So what happens instead of transactions?
In InfluxDB, inserting a point with the same tag keys and timestamp over-writes any existing data with the same field keys, and adds new field keys. So when duplicate entries are written, the last write "wins".
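For example (a sketch reusing the temperature schema from the question; both points share the same measurement, tag set, and timestamp, so the second write simply overwrites the fields of the first):
temperature,machine=injection_molding_1,version=0 write_ts=1445506564,current_mA=25 1445455688000000000
temperature,machine=injection_molding_1,version=0 write_ts=1445510000,current_mA=26 1445455688000000000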
So instead of the traditional SELECT, UPDATE approach, it's more like: SELECT from A, calculate on A, and INSERT the results into B, possibly with a new timestamp.
Personally, I've found InfluxDB excellent for its ability to accept streams of data from all directions, and its simple protocol and schema-free storage mean that new data sources are almost trivial to add. But if my use case has old data being regularly updated, I use a relational database.
Hope that clears up the differences.

How can I delete measurements within a time range for a given RP?

Question
Is it possible to delete measurement data using a time range, for a specific retention policy?
DELETE
FROM "SensorData"."Quarantine"./.*/
WHERE "time" >= '2018-02-28T02:26:08.0000000Z'
AND "time" <= '2018-02-28T02:27:08.0000000Z'
This is our current attempt at a query to drop all data within a time period; however, DELETE doesn't appear to accept a database or a retention policy.
Possible XY Problem
The reason (I suspect it's an unsolved XY problem; see github://influxdata/influxdb#8088) is step 3 below.
We have a database called SensorData that has a default retention policy, Primary, of 30d acting as a buffer so we don't run out of disk space.
However, if the sensors register an 'exceedance', we are required to keep that data, plus an hour either side, for evidence. We call this Quarantine.
We have so far implemented this as a retention policy called Quarantine.
So we have Primary and Quarantine, and possibly in the future, some sort of high freq retention policy that might be down sampled to Primary.
The XY problem is, "How do you transactionally copy/move/change the retention policy on some recorded data in Influx?"
Our solution (after failing to find one)
Was, e.g.:
1. Create a temp DB, named so as to uniquely identify an in-progress quarantine operation.
   create "TempDB"+"_Quarantine_"+startUnixTime+"_"+endUnixTime
2. Copy the data from Primary to TempDB.
   Copy Primary -> TempDB
3. Delete the data from Primary.
   Delete Primary
4. Copy the data to Quarantine.
   Copy TempDB -> Quarantine
5. Drop TempDB.
   Drop TempDB
This would allow rollback for a failed operation, or rollback/resume in the case of a crash.
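A rough InfluxQL sketch of that sequence (the measurement name "SensorReadings", the autogen RP in the temp database, and the Unix timestamps in the temp database name are assumptions for illustration; GROUP BY * keeps tags as tags):
CREATE DATABASE "TempDB_Quarantine_1519784768_1519784828"
SELECT * INTO "TempDB_Quarantine_1519784768_1519784828"."autogen"."SensorReadings"
  FROM "SensorData"."Primary"."SensorReadings"
  WHERE "time" >= '2018-02-28T02:26:08.0000000Z' AND "time" <= '2018-02-28T02:27:08.0000000Z'
  GROUP BY *
USE "SensorData"
DELETE FROM "SensorReadings"
  WHERE "time" >= '2018-02-28T02:26:08.0000000Z' AND "time" <= '2018-02-28T02:27:08.0000000Z'
SELECT * INTO "SensorData"."Quarantine"."SensorReadings"
  FROM "TempDB_Quarantine_1519784768_1519784828"."autogen"."SensorReadings"
  GROUP BY *
DROP DATABASE "TempDB_Quarantine_1519784768_1519784828"
Note that the DELETE still applies to every retention policy in SensorData, which is exactly why the range is parked in the temp database before deleting and only copied into Quarantine afterwards.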
Chronograf was being really funky with parsing the query, causing a lot of confusion.
Influx (as of 1.4) does not have the ability to delete data for a specific retention policy, and Chronograf did not have the ability to parse the delete command without a database specified.
What ended up working was (via an API) calling:
DELETE FROM /.*/ WHERE "time" >= '2018-02-28T02:26:08.0000000Z' AND "time" <= '2018-02-28T02:27:08.0000000Z'
The database isn't specified, as it was specified elsewhere in the API.
This is expected to be equivalent to calling USE SensorData on the line before, or in the CLI.
So for now the workaround is to just delete the data for all RPs and hope you don't need a high-frequency data retention policy in the future.
If you just want to change the retention policy on some data range, I suggest just copying that data range into another retention policy:
USE "SensorData"
SELECT *
INTO "Quarantine"."MeasurementName"
FROM "Primary"."MeasurementName"
WHERE "time" >= '2018-02-28T02:26:08.0000000Z'
AND "time" <= '2018-02-28T02:27:08.0000000Z'
The data will be deleted from "Primary"."MeasurementName" as usual after the duration specified for the "Primary" RP ends (30 days), while the copied range will be preserved in the "Quarantine" RP.
And if you want to delete the data from Primary immediately, you can try the following:
USE "SensorData"."Primary"
DELETE
FROM "MeasurementName"
WHERE "time" >= '2018-02-28T02:26:08.0000000Z'
AND "time" <= '2018-02-28T02:27:08.0000000Z'
Specifying the database and retention policy is not supported via InfluxQL. I hope it will be in the future or in IFQL.
For now I would recommend using a different measurement for the aggregated data.
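If that future high-frequency retention policy materialises, one way to follow this advice is a continuous query that writes the downsampled aggregates into their own measurement. A sketch, in which the RP name "HighFreq", the measurement "SensorReadings", the field "value", and the 1m interval are all assumptions:
CREATE CONTINUOUS QUERY "cq_downsample_1m" ON "SensorData"
BEGIN
  SELECT mean("value") AS "value"
  INTO "SensorData"."Primary"."SensorReadings_1m_mean"
  FROM "SensorData"."HighFreq"."SensorReadings"
  GROUP BY time(1m), *
END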

How to delete field for a given measurement from influxdb?

I created multiple fields to test output in Grafana; however, I now want to delete the unwanted fields from InfluxDB. Is there a way?
Q: I want to delete the unwanted fields from influxdb, is there a way?
A: Short answer: no. Up to and including the latest release, 1.4.0, there is no straightforward way to do this.
This is because InfluxDB is explicitly designed to optimise point creation; functionality on the "UPDATE" and "DELETE" side of things is compromised as a trade-off.
To drop fields of a given measurement, the easiest way would be to (see the sketch after this list):
Retrieve the data out first
Modify its content
Drop the measurement
Re-insert the modified data back
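One way to approximate that sequence entirely within InfluxQL, assuming a measurement "cpu" where "usage_user" and "usage_system" are the fields you want to keep (names are illustrative), is to copy only those fields through a temporary measurement; GROUP BY * keeps tags as tags:
SELECT "usage_user", "usage_system" INTO "cpu_tmp" FROM "cpu" GROUP BY *
DROP MEASUREMENT "cpu"
SELECT * INTO "cpu" FROM "cpu_tmp" GROUP BY *
DROP MEASUREMENT "cpu_tmp"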
Reference:
https://docs.influxdata.com/influxdb/v1.4/concepts/insights_tradeoffs/

how to remove a particular row of data from influxdb

I have some data in InfluxDB that is unnecessary, like some values such as "0", so how do I delete those particular ones? My database name is "bootstrap" and my measurement name is "response_time".
Tried this "delete from response_time where time > 2016-01-22T06:32:44Z"
but it says "Server returned error: error parsing query: found -01, expected SELECT, DELETE, SHOW, CREATE, DROP, GRANT, REVOKE, ALTER, SET at line 1, char 44"
Tried this also: "delete from bootstrap where time > 2016-01-22T06:32:44Z"
The current release of InfluxDB is a bit painful with deletes. You can drop an entire measurement, or a particular series, or an entire database, or the part of a series older than 'x' (retention policy). Anything finer-grained than this is still a bit alpha. Apparently, it was more flexible in v 0.7, but that feature has gone away. Probably not the answer you were hoping for, sorry.
See here:
https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/
(shameless self-promotion follows)
A similar set of questions was asked here. Beware: it seems some answers depend on which version of InfluxDB you use.
My answer, which seems to be version-independent (so far):
Because InfluxDB is a bit painful about deletes, we use a schema that has a boolean field called "ForUse", which looks like this when posting via the line protocol (v0.9):
your_measurement,your_tag=foo ForUse=TRUE,value=123.5 1262304000000000000
You can overwrite the same measurement, tag key, and time with whatever field keys you send, so we do "deletes" by setting "ForUse" to false, and letting retention policy keep the database size under control.
Since the overwrite happens seamlessly, you can retroactively add the schema too. Noice.
Doing this, you can set up your Grafana queries to include "WHERE ForUse = TRUE". By filtering this way, and updating the "ForUse" field, you can replicate the functionality of "deleting" or "undeleting" points.
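For example (a sketch reusing the line-protocol point above), a "delete" is just another write of ForUse=FALSE at the same timestamp, and reads then filter on the flag:
your_measurement,your_tag=foo ForUse=FALSE 1262304000000000000
SELECT "value" FROM "your_measurement" WHERE "ForUse" = true
Fields omitted from the overwrite (here, value) keep their existing values, since duplicate-timestamp writes merge field sets.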
It's a bit kludgy, but I'm used to kludgy - every time series database I've worked with seems a bit awkward with partial deletes, so it must be something about their nature.

Why does a select with consistent read from Amazon SimpleDB yield different results?

I have a domain on SimpleDB and I never delete from it.
I am doing the following query on it.
select count(*) from table_name where last_updated > '2012-09-25';
Though I am setting the ConsistentRead parameter to true, it still returns different results in different executions. As I am not deleting anything from this domain, ideally the results should be in increasing order, but that is not happening.
Am I missing something here?
If I understand your use case correctly, you might be misreading the semantics of the ConsistentRead parameter in the context of Amazon SimpleDB; see Select:
When set to true, ensures that the most recent data is returned. For
more information, see Consistency
The phrase most recent can admittedly be misleading, but it doesn't address or affect result ordering in any way; rather, it means most recently updated, and ConsistentRead guarantees that every update operation preceding your select statement is already visible to that select operation. See the description:
Amazon SimpleDB keeps multiple copies of each domain. When data is
written or updated, all copies of the data are updated. However, it
takes time for the update to propagate to all storage locations. The
data will eventually be consistent, but an immediate read might not
show the change. If eventually consistent reads are not acceptable for
your application, use ConsistentRead. Although this operation might
take longer than a standard read, it always returns the last updated
value. [emphasis mine]
The linked section on Consistency provides more details and an illustration regarding this concept.
Sort order
To achieve the results you presumably desire, a simple order by statement should do the job, e.g.:
select * from table_name where last_updated > '2012-09-25' order by last_updated;
There are a couple of constraints/subtleties regarding this operation on SimpleDB, so make sure to skim the short documentation of Sort for details.
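One such subtlety, per the Sort documentation: the attribute you sort on must appear in at least one predicate of the WHERE clause, so a query along these lines works, while leaving last_updated out of the WHERE clause would be rejected (the limit value here is illustrative):
select * from table_name where last_updated is not null order by last_updated desc limit 100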
