How to delete a field for a given measurement from InfluxDB?

I created multiple fields to test output in Grafana. Now I want to delete the unwanted fields from InfluxDB. Is there a way?

Q: I want to delete the unwanted fields from InfluxDB, is there a way?
A: Short answer: no. As of the latest release (1.4.0), there is no straightforward way to do this.
The reason is that InfluxDB is explicitly designed to optimise point creation; functionality on the "UPDATE" and "DELETE" side of things is compromised as a trade-off.
To drop fields of a given measurement, the easiest way would be to (a sketch follows the list):
Retrieve the data out first
Modify its content
Drop the measurement
Re-insert the modified data back
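A minimal sketch of that workflow, assuming the Python influxdb 1.x client and hypothetical measurement, tag and field names:

from influxdb import InfluxDBClient  # pip install influxdb (the 1.x client)

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

# 1. Retrieve the data. With a plain SELECT *, tags come back as ordinary
#    columns, so you need to know your schema to rebuild the points correctly.
points = list(client.query("SELECT * FROM my_measurement").get_points())

# 2. Modify the content: keep only the fields you want.
rewritten = [
    {
        "measurement": "my_measurement",
        "time": p["time"],
        "tags": {"some_tag": p["some_tag"]},
        "fields": {"current_mA": p["current_mA"]},  # unwanted fields omitted
    }
    for p in points
]

# 3. Drop the measurement.
client.query("DROP MEASUREMENT my_measurement")

# 4. Re-insert the modified data.
client.write_points(rewritten)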
Reference:
https://docs.influxdata.com/influxdb/v1.4/concepts/insights_tradeoffs/

Related

"Transactional safety" in influxDB

We have a scenario where we want to frequently change the tag of a (single) measurement value.
Our goal is to create a database that stores prognosis values. It should never lose data, and it should track changes to already-written data, such as modifications or overwrites.
Our current plan is to have an additional field "write_ts", which indicates at which point in time the measurement value was inserted or changed, and a tag "version" which is updated with each change.
Furthermore, version '0' should always contain the latest value.
name: temperature
-----------------
time                  write_ts (val)  current_mA (val)  version (tag)  machine (tag)
2015-10-21T19:28:08Z  1445506564      25                0              injection_molding_1
So let's assume I have an updated prognosis value for this example point.
So, I do:
SELECT curr_measurement
INSERT curr_measurement with new tag (version = 1)
DROP curr_measurement
// then
INSERT new_measurement with version = 0
Now my question:
If I lose the connection for whatever reason between the SELECT, INSERT and DROP:
I would get duplicate records.
(Or if I do SELECT, DROP, INSERT: I lose data.)
Is there any method to prevent that?
Transactions don't exist in InfluxDB
InfluxDB is a time-series database, not a relational database. Its main use case is not one where users are editing old data.
In a relational database that supports transactions, you are protecting yourself around UPDATE and similar operations: data comes in, existing data gets changed, and you need to read those updates reliably.
The main use case in time-series databases is a lot of raw data coming in, followed by some filtering or transforming to other measurements or databases. Picture a one-way data stream. In this scenario, there isn't much need for transactions, because old data isn't getting updated much.
How you can use InfluxDB
In cases like yours, where there is additional data being calculated based on live data, it's common to place this new data in its own measurement rather than as a new field in a "live data" measurement.
As for version tracking and reliably getting updates:
1) Does the version number tell you anything the write_ts number doesn't? If it's simply a proxy for write_ts, consider not using it: if version only ever increases, it duplicates the info given by write_ts while lacking the usefulness of knowing when the change was made. If version is expected to decrease from time to time, then it makes sense to keep it.
2) Similarly, if you're keeping old records: does write_ts tell you anything that the time value doesn't?
3) Logging. Do you need to over-write (update) values, or can you get what you need by adding new lines, increasing write_ts or version as appropriate? The latter is the more "InfluxDB-ish" approach.
4) Reading values. You can read all values as they change with updates. If a client app only needs to know the latest value of something that's being updated (and the time it was updated), querying becomes something like:
SELECT LAST(write_ts), current_mA, machine FROM temperature
You could also try grouping the machine values together:
SELECT LAST(*) FROM temperature GROUP BY machine
So what happens instead of transactions?
In InfluxDB, inserting a point with the same measurement, tag set and timestamp over-writes the existing values for matching field keys, and adds any new field keys. So when duplicate entries are written, the last write "wins".
So instead of the traditional SELECT then UPDATE approach, it's more like: SELECT A, calculate on A, put the results in B (possibly with a new timestamp), then INSERT B.
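To illustrate the overwrite behaviour, here is a small sketch with the Python influxdb 1.x client (database name is hypothetical): writing a second point with the same measurement, tag set and timestamp replaces the matching field values rather than creating a second row.

from influxdb import InfluxDBClient  # pip install influxdb (the 1.x client)

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

point = {
    "measurement": "temperature",
    "tags": {"machine": "injection_molding_1", "version": "0"},
    "time": "2015-10-21T19:28:08Z",
    "fields": {"write_ts": 1445506564, "current_mA": 25},
}
client.write_points([point])

# Same measurement, same tag set, same timestamp: this write replaces the
# existing field values instead of creating a second row.
point["fields"] = {"write_ts": 1445507000, "current_mA": 27}
client.write_points([point])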
Personally, I've found InfluxDB excellent for its ability to accept streams of data from all directions, and its simple protocol and schema-free storage mean that new data sources are almost trivial to add. But if my use case has old data being regularly updated, I use a relational database.
Hope that clears up the differences.

How to SELECT, modify and INSERT data in InfluxDB?

I'm new to InfluxDB. I currently have two databases in InfluxDB. I want to copy certain data points from a measurement in database1, then introduce a couple of field sets manually, modify the few field values that I copied, and finally insert the changed data points into database2 under a different measurement.
I can use the SELECT ... INTO statement, but that will not allow me to make changes to the data points.
If I use the INSERT command, I will have to type all the field sets and tag sets individually.
One solution is to use Python to query the data points, manipulate the data, then insert it back. However, that will be a lengthy process.
Is there any easy way to accomplish this task?
Thanks.
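For reference, a minimal sketch of the Python approach the question mentions, assuming the influxdb 1.x client and hypothetical measurement, tag and field names:

from influxdb import InfluxDBClient  # pip install influxdb (the 1.x client)

source = InfluxDBClient(host="localhost", port=8086, database="database1")
target = InfluxDBClient(host="localhost", port=8086, database="database2")

# Query the points to copy (measurement name and filter are hypothetical).
result = source.query("SELECT * FROM source_measurement WHERE time > now() - 1d")

rewritten = []
for p in result.get_points():
    rewritten.append({
        "measurement": "target_measurement",
        "time": p["time"],
        "tags": {"copied_from": "database1"},   # manually introduced tag set
        "fields": {
            "value": p["value"] * 2,            # modified copied field value
            "note": "adjusted",                 # manually introduced field
        },
    })

target.write_points(rewritten)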

How do I fix inconsistent types in InfluxDB?

In InfluxDB (1.5), I have a table where the fields have become inconsistently typed: most rows hold integers, but some rows have become strings.
How is this possible? I thought that once a field's type was set (upon first insert), any insert with an incorrect type would fail.
What do I do now? If I go back and attempt to overwrite the data in the inconsistent rows, I get errors saying the field is a string.
After some more research, here's what I've discovered:
Answer to Part 1:
InfluxDB uses a system they refer to as 'sharding' - while I don't know the specifics, I do know that data from the same measurement/table can be stored across multiple, different 'shards'.
According to the InfluxDB documentation, field types can differ between these shards, within the same field, on the same table.
Answer to Part 2:
In order to fix this, the currently-suggested answer is to make a new table, download all the data, and re-insert it while ensuring that everything inserted has the proper type.
If you had a tag which changed type and became a field, this can be especially difficult to fix; the link above does not address that. To select only the tag or only the field, you can use tag_name::tag or field_name::field within a SELECT statement.
The GROUP BY * clause suggested in the link is required in order to preserve tags, but seemed to cause issues when I used it.
My current solution is a PHP script that uses curl to download the points, chunk them, and re-insert them into the new table, ensuring that each point is cast to the new, uniform type before it is inserted.
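For comparison, here is the same download-cast-reinsert approach sketched with the Python influxdb 1.x client rather than PHP/curl (measurement and field names are hypothetical). Note the ::field syntax from above, and GROUP BY * to preserve tags:

from influxdb import InfluxDBClient  # pip install influxdb (the 1.x client)

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

# ::field skips a same-named tag if one exists; GROUP BY * preserves tags.
result = client.query("SELECT value::field FROM broken_measurement GROUP BY *")

fixed = []
for (measurement, tags), series in result.items():
    for p in series:
        fixed.append({
            "measurement": "fixed_measurement",
            "tags": tags,                          # preserved via GROUP BY *
            "time": p["time"],
            "fields": {"value": int(p["value"])},  # cast to the uniform type
        })

# Re-insert in chunks so a large measurement doesn't become one huge request.
for i in range(0, len(fixed), 5000):
    client.write_points(fixed[i:i + 5000])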
The best way to stop future issues is simply not to have them. I went looking for how to lock field types in all cases, across all shards, for a particular measurement table.
Unfortunately, it seems impossible to guarantee 100% type consistency across all current and future shards. "Don't make mistakes because it's really difficult to clean up" seems to be InfluxDB's modus operandi.

Most effective, secure way to delete Postgres column data

I have a column in my Postgres table whose data I want to remove for expired rows. What's the best way to do this securely? It's my understanding that simply writing 0s to those columns is ineffective, because Postgres creates a new row on UPDATE and marks the old row as dead.
Is the best way to set the column to null and manually vacuum to clean up the old records?
I will first say that it is bad practice to alter data like this - you are changing history. Also, the below is only ONE way to do this (a quick and dirty way, not to be recommended):
1. Back up your database first.
2. Open pgAdmin, select the database, open the Query Editor and run a query.
3. It would be something like this:
UPDATE <table_name> SET <column_name>=<new value (eg null)>
WHERE <record is dead>
The WHERE part is for you to figure out based on how you are identifying which rows are dead (e.g. is_removed=true or is_deleted=true are common for identifying soft-deleted records).
Obviously you would have to run this script regularly. The better way would be to update your application to do this job instead.
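On the asker's null-and-vacuum idea: a sketch of that pass using psycopg2, with hypothetical table and column names. VACUUM cannot run inside a transaction block, hence the autocommit setting; plain VACUUM only marks the dead rows' space as reusable, while VACUUM FULL rewrites the table so the old values are physically gone.

import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=mydb user=myuser")
conn.autocommit = True  # VACUUM cannot run inside a transaction block

with conn.cursor() as cur:
    # Null out the sensitive column for expired rows (names are hypothetical).
    cur.execute("UPDATE sessions SET auth_token = NULL WHERE expires_at < now()")
    # VACUUM FULL rewrites the table, physically removing the dead row
    # versions that still hold the old column values.
    cur.execute("VACUUM FULL sessions")

conn.close()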

Find changes quickly in larger SQL database?

There is a Java Swing application which uses an Informix database. I have user rights granted for the Swing application (i.e. no source code), and read-only access to a mirror of the database.
Sometimes I need to find a database column, which is backing a GUI element (TextBox, TableField, Label...). What would be best approach to find out which database column and table is holding the data shown e.g. in a TextBox?
My general approach is to capture the state of the database, commit a change using the GUI, then capture the state of the database again and examine the difference. I've already tried:
Use the nrows field of systables: didn't work, because nrows does not seem to be a real-time representation of the row count.
Create a script with SELECT COUNT(*) ... for all tables: didn't work, because there are too many tables (> 5000). I also tried to optimize by removing empty tables, but there are still too many left.
Is there a simple solution that I'm missing?
Please look at the Change Data Capture API and check if it suits your needs.
There probably isn't a simple solution.
You probably need to build yourself a map of the database, or a data dictionary for it. It sounds as though you can eliminate many of the tables from consideration since they're empty, at least for a preliminary pass. If you're dealing with information in a text box, the chances are it is some sort of character data; you can analyze which (non-empty) tables contain longer character strings, and those would be the primary targets of your searches. If the schema is badly designed, with lots of VARCHAR(255) columns even though the columns normally hold only short strings, life is more difficult. Over time, you can begin to classify tables and columns so that you end up knowing where to look for parts of the application.
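As a starting point for such a map, a first pass over the system catalog can list candidate character columns. A sketch using pyodbc (the DSN name is hypothetical; any Informix connection route would do):

import pyodbc  # Informix supports ODBC; other driver routes work too

conn = pyodbc.connect("DSN=my_informix_dsn")  # hypothetical DSN name
cur = conn.cursor()

# List CHAR/VARCHAR columns in user tables (tabid >= 100 skips the catalog).
# coltype gains 256 for NOT NULL columns, hence the MOD; 0 = CHAR, 13 = VARCHAR.
# Note: for VARCHAR columns, collength packs the min and max sizes together.
cur.execute("""
    SELECT t.tabname, c.colname, c.collength
      FROM systables t
      JOIN syscolumns c ON c.tabid = t.tabid
     WHERE t.tabid >= 100
       AND t.tabtype = 'T'
       AND MOD(c.coltype, 256) IN (0, 13)
     ORDER BY t.tabname, c.colname
""")
for tabname, colname, collength in cur.fetchall():
    print(tabname, colname, collength)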
One problem to beware of: the tabid in informix.systables isn't necessarily as stable as you'd like. Your data dictionary needs to record its own dd_tabid for the table it describes, and can store the last known tabid from informix.systables, but it needs to be ready to find a new tabid value on occasion. You should probably only mark data in your dictionary for logical deletion.
To some extent, this assumes you can create a database in which to record this information. If you can't create an Informix database, you may have to use something else (MySQL, or SQLite, perhaps) to store the data dictionary. Alternatively, go to your DBA team and ask them for the information. Unless you're trying something self-evidently untoward, they're likely to help (but politics can get in the way — I've no idea how collegial your teams are).
