How to remove a particular row of data from InfluxDB

I have some data in InfluxDB that is unnecessary, like some values of "0", so how do I delete those particular ones? My database name is "bootstrap" and my measurement name is "response_time".
Tried this "delete from response_time where time > 2016-01-22T06:32:44Z"
but it says "Server returned error: error parsing query: found -01, expected SELECT, DELETE, SHOW, CREATE, DROP, GRANT, REVOKE, ALTER, SET at line 1, char 44"
Tried this also: "delete from bootstrap where time > 2016-01-22T06:32:44Z"

The current release of InfluxDB is a bit painful with deletes. You can drop an entire measurement, or a particular series, or an entire database, or the part of a series older than 'x' (retention policy). Anything finer-grained than this is still a bit alpha. Apparently it was more flexible in v0.7, but that feature has gone away. Probably not the answer you were hoping for, sorry.
See here:
https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/
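For reference, the coarse-grained operations that do exist look roughly like this in InfluxQL (names taken from the question; a sketch against the v0.9 syntax, so check the docs for your version):
-- drop everything in one measurement
DROP MEASUREMENT response_time
-- drop a single series within it (the host tag is hypothetical)
DROP SERIES FROM response_time WHERE host = 'server01'
-- drop the whole database
DROP DATABASE bootstrap
-- age out data older than 30 days automatically
CREATE RETENTION POLICY one_month ON bootstrap DURATION 30d REPLICATION 1 DEFAULT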
(shameless self-promotion follows)
A similar set of questions was asked here. Beware: it seems some answers depend on which version of InfluxDB you use.
My answer, which seems to be version-independent (so far):
Because InfluxDB is a bit painful about deletes, we use a schema that has a boolean field called "ForUse", which looks like this when posting via the line protocol (v0.9):
your_measurement,your_tag=foo ForUse=TRUE,value=123.5 1262304000000000000
You can overwrite the same measurement, tag key, and time with whatever field keys you send, so we do "deletes" by setting "ForUse" to false, and letting retention policy keep the database size under control.
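For example, "deleting" the point above is just a second write at the same measurement, tag set, and timestamp; only the ForUse field needs to be sent, and the existing value field survives the merge:
your_measurement,your_tag=foo ForUse=FALSE 1262304000000000000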
Since the overwrite happens seamlessly, you can retroactively add the schema too. Noice.
Doing this, you can set up your Grafana queries to include "WHERE ForUse = TRUE". By filtering this way, and updating the "ForUse" field, you can replicate the functionality of "deleting" or "undeleting" points.
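A minimal sketch of such a query, using the names from the line-protocol example above:
SELECT value FROM your_measurement WHERE ForUse = TRUE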
It's a bit kludgy, but I'm used to kludgy - every time series database I've worked with seems a bit awkward with partial deletes, so it must be something about their nature.


"Transactional safety" in influxDB

We have a scenario where we want to frequently change the tag of a (single) measurement value.
Our goal is to create a database which stores prognosis values. But it should never lose data, and it should track changes to already-written data, such as updates or overwrites.
Our current plan is to have an additional field "write_ts", which indicates at which point in time the measurement value was inserted or changed, and a tag "version" which is updated with each change.
Furthermore, the version '0' should always contain the latest value.
name: temperature
-----------------
time                  write_ts (val)  current_mA (val)  version (tag)  machine (tag)
2015-10-21T19:28:08Z  1445506564      25                0              injection_molding_1
So let's assume I have an updated prognosis value for this example value.
So, I do:
SELECT curr_measurement
INSERT curr_measurement with new tag (version = 1)
DROP curr_measurement
//then
INSERT new_measurement with version = 0
Now my question:
If I lose the connection for whatever reason between the SELECT, INSERT, and DROP:
I would get duplicate records.
(Or if I do SELECT, DROP, INSERT: I lose data.)
Is there any method to prevent that?
Transactions don't exist in InfluxDB
InfluxDB is a time-series database, not a relational database. Its main use case is not one where users are editing old data.
In a relational database that supports transactions, you are protecting yourself against UPDATE and similar operations. Data comes in, existing data gets changed, you need to reliably read these updates.
The main use case in time-series databases is a lot of raw data coming in, followed by some filtering or transforming to other measurements or databases. Picture a one-way data stream. In this scenario, there isn't much need for transactions, because old data isn't getting updated much.
How you can use InfluxDB
In cases like yours, where there is additional data being calculated based on live data, it's common to place this new data in its own measurement rather than as a new field in a "live data" measurement.
As for version tracking and reliably getting updates:
1) Does the version number tell you anything the write_ts number doesn't? Consider not using it, if it's simply a proxy for write_ts. If version only ever increases, it might be duplicating the info given by write_ts, minus the usefulness of knowing when the change was made. If version is expected to decrease from time to time, then it makes sense to keep it.
2) Similarly, if you're keeping old records: does write_ts tell you anything that the time value doesn't?
3) Logging. Do you need to over-write (update) values? Or can you get what you need by adding new lines, increasing write_ts or version as appropriate? The latter is a more "InfluxDB-ish" approach.
4) Reading values. You can read all values as they change with updates. If a client app only needs to know the latest value of something that's being updated (and the time it was updated), querying becomes something like:
SELECT LAST(write_ts), current_mA, machine FROM temperature
You could also try grouping the machine values together:
SELECT LAST(*) FROM temperature GROUP BY machine
So what happens instead of transactions?
In InfluxDB, inserting a point with the same tag keys and timestamp over-writes any existing data with the same field keys, and adds new field keys. So when duplicate entries are written, the last write "wins".
So instead of the traditional SELECT, UPDATE approach, it's more like SELECT A, then calculate on A, and put the results in B, possibly with a new timestamp INSERT B.
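As a concrete illustration of that last-write-wins behaviour, writing these two line-protocol points in order (names from the temperature example above; the second write_ts is made up) leaves a single point whose current_mA is 30, with both fields present:
temperature,version=0,machine=injection_molding_1 current_mA=25,write_ts=1445506564 1445455688000000000
temperature,version=0,machine=injection_molding_1 current_mA=30,write_ts=1445506600 1445455688000000000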
Personally, I've found InfluxDB excellent for its ability to accept streams of data from all directions, and its simple protocol and schema-free storage means that new data sources are almost trivial to add. But if my use case has old data being regularly updated, I use a relational database.
Hope that clears up the differences.

How do I fix inconsistent types in InfluxDB?

In InfluxDB (1.5), I have a table where the fields have become inconsistently typed. Most rows in the table are integers; however, some rows have become strings.
How is this possible? I thought that once a field's type was set (upon first insert), any insert into the table with incorrect typing would fail.
What do I do now? If I go back and attempt to overwrite the data in the inconsistent rows, I get errors saying the field is a string.
After some more research, here's what I've discovered:
Answer to Part 1:
InfluxDB uses a system they refer to as 'sharding' - while I don't know the specifics, I do know that data from the same measurement/table can be stored across multiple, different 'shards'.
According to the InfluxDB documentation, field types can differ between these shards, within the same field, on the same table.
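A quick way to inspect what type a field currently reports is SHOW FIELD KEYS (the measurement name here is hypothetical); it lists each fieldKey with its fieldType, and in my experience a field with conflicting types across shards can show up with more than one type:
SHOW FIELD KEYS FROM my_measurement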
Answer to Part 2:
In order to fix this, the currently-suggested answer is to make a new table, download all the data, and re-insert it while ensuring everything inserted has the proper types.
If you had a tag which changed type and became a field, this can be especially difficult to fix; the suggested approach does not address that. To select only on a tag or only on a field, you can use tag_name::tag or field_name::field within a select statement.
The GROUP BY * clause suggested for the copy is required in order to preserve tags, but seemed to cause issues when I used it.
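For what it's worth, the copy step looks roughly like this (the new measurement name is hypothetical). Note that InfluxQL's ::integer cast only converts between numeric types, which is why rows that became strings still need client-side fixing:
SELECT value::integer INTO my_measurement_fixed FROM my_measurement GROUP BY *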
My current solution is a PHP script that uses curl: it downloads the points, chunks them, then re-inserts the points into the new table, ensuring each point is cast to the new, uniform type before insertion.
The best way to stop future issues is simply not to have them. I went looking for how to lock field types in all cases, across all shards, for a particular measurement table.
Unfortunately, it seems impossible to guarantee 100% type consistency across all current and future shards. "Don't make mistakes because it's really difficult to clean up" seems to be InfluxDB's modus operandi.

How to delete a field for a given measurement from InfluxDB?

I created multiple fields to test output in Grafana; however, I want to delete the unwanted fields from InfluxDB. Is there a way?
Q: I want to delete the unwanted fields from InfluxDB, is there a way?
A: Short answer: no. As of the latest release, 1.4.0, there is no straightforward way to do this.
The reason is that InfluxDB is explicitly designed to optimise point creation, so functionality on the "UPDATE" and "DELETE" side of things is compromised as a trade-off.
To drop fields of a given measurement, the easiest way would be the following (a sketch follows the list):
Retrieve the data out first
Modify its content
Drop the measurement
Re-insert the modified data back
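A rough InfluxQL sketch of those steps, keeping only the fields you want (all measurement and field names here are hypothetical):
-- copy only the wanted fields; GROUP BY * keeps tags as tags
SELECT wanted_field_1, wanted_field_2 INTO my_measurement_tmp FROM my_measurement GROUP BY *
-- drop the original, then copy the cleaned data back under the old name
DROP MEASUREMENT my_measurement
SELECT * INTO my_measurement FROM my_measurement_tmp GROUP BY *
DROP MEASUREMENT my_measurement_tmp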
Reference:
https://docs.influxdata.com/influxdb/v1.4/concepts/insights_tradeoffs/

AppSheet: prevent duplicate entries

I would like to know how I can prevent a duplicate entry (based on my own client/project definition of what that means-below), in an AppSheet mobile app connected to Google Sheets.
AppSheet talks a lot about UNIQUEID(), which they also encourage using and designating as the KEY field. row_number is another possibility.
This is fine for the KEY in the sense of its purpose is to be unique, meaningless, and uniquely identify a record, and relate to other tables.
However, it doesn't prevent a duplicate ("duplicate" again, as defined by my own client's business rules and process) from occurring. I mean, I assume UNIQUEID() theoretically would, but that's abstract theory, because it would only ever produce unique IDs anyway.
MY TABLE HAS THESE COLUMNS: [FACILITY NUMBER] and [TIMESTAMP] (date and time of event). We consider it a duplicate event, and want to DISALLOW the adding of such a record to this table, if the 2nd record has the same DATE (time irrelevant) with the same FACILITY. (We just do one facility per day, ever.)
In AppSheet, how can I create some logic that disallows the add based on that criteria? I even basically know some ways I would do it; it just seems like I can't find a place to "put" it. I created an expression that perfectly evaluates to TRUE or FALSE and nothing else (by referencing whether or not the FACILITY NUMBER on the new record being added is in a SLICE which I've defined as today's entries). I wanted to place this expression in another (random) field's VALIDIF. To me it seemed like that would meet the platform documentation: the other random field would be considered valid only if the expression evaluated to true. But instead AppSheet thought I wanted to convert the entire [other random column] to a dependent dropdown.
Please help! I will cry tears of joy when AppSheet introduces FORM events and RECORD events that can be hooked into at the time of keying, saving, etc.
Surprised to see this question here on Stack Overflow --- most AppSheet questions are at http://community.appsheet.com.
The brief answer is that you are doing the right thing providing a Valid_If constraint. Your constraint is of the form IN([_THIS], <list>), so AppSheet is doing the "smart" thing by automatically converting that list into a dropdown of allowed values. From your post, it appears that you may instead want to say NOT(IN([_THIS], <list>)) -- thereby saying that the value [_THIS] is valid as long as it is not in the list specified (making sure it is not a duplicate).
Old question, but in case someone stumbles upon the same:
The (not so simple) answer is given in https://help.appsheet.com/en/articles/961274-list-expressions-and-aggregates.
From the reference:
NOT(IN([_THIS], SELECT(Customers[State], NOT(IN([CustomerId], LIST([_THISROW].[CustomerId])))))): when used as the Valid_If condition for the State column, it ensures that every customer has a unique value for State. In this example, we assume that CustomerId is the key for the Customers table.
This could be written more schematic like this:
NOT(IN([_THIS], SELECT(<TableName>[<UniqueColumnName>], NOT(IN([<KeyColumnName>], LIST([_THISROW].[<KeyColumnName>]))))))
Technically it says:
Get me a list of the current values of the <UniqueColumnName> column of the <TableName> table
Ignore the value of the current row (identified by [_THISROW], looking into the <KeyColumnName> column)
Check if the given value exists in the resulting list
This statement has to be defined - with the correct values for <TableName>, <UniqueColumnName> & <KeyColumnName> - as the Valid_If statement.
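Applied to the scenario in the original question, a Valid_If on [FACILITY NUMBER] could look roughly like this ([FACILITY NUMBER] and [TIMESTAMP] come from the question; the table name Events and the key column [ID] are assumptions):
NOT(IN([_THIS],
  SELECT(Events[FACILITY NUMBER],
    AND(DATE([TIMESTAMP]) = DATE([_THISROW].[TIMESTAMP]),
        NOT(IN([ID], LIST([_THISROW].[ID])))))))
This rejects a new record whenever another record with the same facility number already exists on the same calendar date, ignoring the time-of-day component.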

Why does a select with consistent read from Amazon SimpleDB yield different results?

I have a domain on SimpleDB and I never delete from it.
I am doing the following query on it.
select count(*) from table_name where last_updated > '2012-09-25';
Though I am setting the ConsistentRead parameter to true, it is still returning different results in different executions. As I am not deleting anything from this domain, ideally the results should be in increasing order, but that is not happening.
Am I missing something here?
If I understand your use case correctly, you might be misreading the semantics of the ConsistentRead parameter in the context of Amazon SimpleDB; see Select:
When set to true, ensures that the most recent data is returned. For
more information, see Consistency
The phrase most recent can admittedly be misleading, but it doesn't address or affect result ordering in any way. Rather, it means most recently updated: ConsistentRead guarantees that every update operation preceding your select statement is already visible to that select operation. See the description:
Amazon SimpleDB keeps multiple copies of each domain. When data is
written or updated, all copies of the data are updated. However, it
takes time for the update to propagate to all storage locations. The
data will eventually be consistent, but an immediate read might not
show the change. If eventually consistent reads are not acceptable for
your application, use ConsistentRead. Although this operation might
take longer than a standard read, it always returns the last updated
value. [emphasis mine]
The linked section on Consistency provides more details and an illustration regarding this concept.
Sort order
To achieve the results you presumably desire, a simple order by statement should do the job, e.g.:
select * from table_name where last_updated > '2012-09-25' order by last_updated;
There are a couple of constraints/subtleties regarding this operation on SimpleDB, so make sure to skim the short documentation of Sort for details.
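Chief among those constraints: the attribute you sort by must also appear in a comparison predicate of the where clause, which the query above satisfies via the last_updated condition. A query like this one, with no predicate on the sort attribute, would be rejected:
select * from table_name order by last_updated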
