Inserting with a specific time? - time-series

I am looking through all the InfluxDB examples, and they all seem to insert with "time now" (time of insert). There is a well-defined "time" field, but none of the examples use it.
Recording the time of an event as "insert time into the DB" is a poor pattern. It's always better to have the sensor attach its own idea of the current time to each value, pass that record around, and insert it into the various analytics DBs with that time value. (Really small sensors might rely on a "controller" that knows the time better, but that's still not the database insert time.)
An obvious example is log files. Each line has a timestamp, right at the beginning. Love it or hate it, but that's your best view of the time the event happened.
I'm looking for examples of inserting into InfluxDB with a specified time value, and haven't come up with one yet. Time appears to always be the implied current time.

Simply specify a timestamp alongside the tags and values in your points. See here for examples:
https://docs.influxdata.com/influxdb/v1.3/guides/writing_data/#writing-data-using-the-http-api
Docs for the 0.9 version:
http://influxdb.com/docs/v0.9/concepts/schema_and_data_layout.html
If you are using 0.8, then you'll want your points to start with a time column instead:
http://influxdb.com/docs/v0.8/api/reading_and_writing_data.html

Yes, it's perfectly possible. You just have to specify a "time" column together with a value field. For instance:
{
  "name": "my_serie",
  "columns": ["time", "value1", "value2"],
  "points": [
    [1429807111, 1, 2],
    [1429807111, 11, 22],
    [1429807111, 111, 222]
  ]
}
Of course you can specify as many columns as you want.

In the influx CLI, you can add the timestamp at the end of the line, in nanosecond-precision Unix time, per the Line Protocol:
$ influx
Connected to http://localhost:26131 version 1.3.5
InfluxDB shell version: 1.3.5
> insert log value=1 1504225728000123456
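For scripts that generate points instead of typing them into the CLI, the same kind of line can be assembled programmatically. The following is a minimal Python sketch, not an official client: `line_protocol` is a hypothetical helper name, and the sketch assumes that measurement and field names need no escaping, that there are no tags, and that all field values are plain numbers.

```python
def line_protocol(measurement, fields, ts_ns):
    """Build one Line Protocol line with an explicit timestamp.

    ts_ns is the event time in integer nanoseconds since the Unix epoch.
    Assumption: names need no escaping and values are plain numbers
    (numbers written without a suffix are stored as floats).
    """
    field_str = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement} {field_str} {ts_ns}"


# The event time comes from the sensor or log line, not from the insert.
event_seconds = 1504225728   # whole seconds since the epoch
event_nanos = 123456         # sub-second part, in nanoseconds
ts_ns = event_seconds * 1_000_000_000 + event_nanos

print(line_protocol("log", {"value": 1}, ts_ns))
```

Keeping the timestamp as an integer avoids the rounding that a floating-point number of seconds would introduce at nanosecond precision.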

Related

How to find longest run of a certain value in InfluxDB

I have an InfluxDB measurement which includes a field that holds either 0 or 1.
How do I find the longest unbroken run of a given value?
Imagine that the field represents whether the sun is up or not, and I have a year's worth of data. I would like the query which finds the longest unbroken run of 1's, which would represent the longest day of the year and return me something like "23rd June 5am to 23rd June 9pm". (I'm in the northern hemisphere, and totally made those times up, but hopefully you get the idea.)
I don't think this can be done with InfluxQL. In many RDBMS, it's possible to do similar operations in a single SQL query using window functions and grouping.
I've experimented a few ways, but as of v1.3 I believe InfluxQL is just not expressive enough for this task. Limitations include:
No window functions (although some functions exhibit similar behaviour, e.g. DIFFERENCE, DERIVATIVE).
time cannot be manipulated like an ordinary tag or field value. For example, it's not possible to take the FIRST(time) of a group of measurements.
Can only GROUP BY time or tag, not by field value (or derived value from a subquery result). Additionally, when grouped by time, only group interval timestamps are returned by selector functions.
Can only ORDER BY time.
The best way to do this is therefore at the application level.
Edit: for the record, the closest I can get is to use ELAPSED to find the longest gap(s) between subsequent 0 values. This might work for you if your data model is a specific shape and data comes in at regular intervals:
SELECT TOP(elapsed, N) AS elapsed FROM (SELECT ELAPSED(field) FROM measurement WHERE field != 1)
Returns e.g. for N = 1:
time elapsed
---- -------
2000 500
However, there is no guarantee that there is a value of 1 in the gap. Use FIRST to retrieve the first measurement with field == 1 within the gap, or nothing if there are none:
SELECT FIRST(field) FROM measurement WHERE field = 1 AND time < 2000 AND time > (2000 - 500)
Returns e.g.:
time first
---- -----
1000 1
Therefore the longest run of 1 values is from 1000 -> 2000.
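The application-level approach recommended above could look roughly like this. It is a minimal Python sketch under assumptions: the points have already been fetched from InfluxDB as `(timestamp, value)` pairs sorted by time, and `longest_run` is a hypothetical helper name. A run is measured from the first to the last timestamp that carries the target value.

```python
from itertools import groupby


def longest_run(points, target=1):
    """Return (start, end) timestamps of the longest unbroken run of target.

    points: iterable of (timestamp, value) pairs, sorted by timestamp.
    Returns None if target never occurs. On ties the earliest run wins.
    """
    best = None
    # groupby() splits the sequence into stretches of consecutive equal values.
    for value, group in groupby(points, key=lambda point: point[1]):
        if value != target:
            continue
        run = list(group)
        start, end = run[0][0], run[-1][0]
        if best is None or end - start > best[1] - best[0]:
            best = (start, end)
    return best


points = [(0, 0), (1000, 1), (1500, 1), (2000, 0), (2500, 1), (2600, 0)]
print(longest_run(points))
```

If the run should instead extend to the next 0 reading (as in the ELAPSED query above), the end timestamp would need to be taken from the first point of the following group.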

Is there a way to manually insert records into InfluxDB with custom timestamps via telegraf?

https://github.com/influxdata/telegraf/pull/1557
Apparently some people have been asking for this, and this GitHub PR is the closest thing I can find to a solution, but it was ultimately rejected (I think?).
Basically, I have a JSON object I'm getting from Stackdriver, which includes a timestamp in ISO 8601 that I convert to Unix time. I can insert the entire JSON response into Influx fine, but the timestamp from Stackdriver appears as a tag for a series, rather than as the index of the time series itself. As a result, it is infeasible to query by Stackdriver's timestamp. I could simply drop it and use the timestamp Influx assigns, but then I would essentially be querying incorrect/imprecise data.
Does anyone have a clever way to approach this?
tl;dr How can I use Telegraf to override InfluxDB's timestamps with my own timestamps?
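The conversion step described in this question can be sketched as follows. This is only an illustration of turning an ISO 8601 string into the integer nanosecond timestamp that the Line Protocol expects in its timestamp position; it says nothing about how Telegraf itself handles the value. The helper names and the `"timestamp"` key are hypothetical, and the sketch assumes at most microsecond precision in the input.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso8601_to_unix_ns(text):
    """Convert an ISO 8601 timestamp string to integer Unix nanoseconds."""
    # fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so map it to an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    delta = moment - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale.
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def split_timestamp(record, key="timestamp"):
    """Separate the event time from a record so it is not stored as a tag."""
    fields = dict(record)
    ts_ns = iso8601_to_unix_ns(fields.pop(key))
    return fields, ts_ns


fields, ts_ns = split_timestamp({"timestamp": "2017-09-01T00:28:48Z", "value": 1})
print(fields, ts_ns)
```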

InfluxDB - Query milliseconds since last data point in a time series

Is it possible to write an InfluxDB query that will give me the number of milliseconds since the last entry in a time series? I'd like to add a single-stat panel in Grafana displaying how old the data is.
I don't think it is possible, since you can't query the time alone: an InfluxDB query needs at least one non-time field. You could work around that by also saving the time in an extra field, which you can then query on its own.
You would still need to compute now() - "the extra time field", though, and as far as I could find out you can't use now() inside Grafana either.
Update: there is now a feature request on Grafana's GitHub. Make sure to vote it up so it gets implemented one day: https://github.com/grafana/grafana/issues/6710
Update 2: The feature finally got implemented. See my answer here: How to show "33 minutes ago" on Grafana dashboard with InfluxDB?
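Outside Grafana, the same "age of the data" figure can be computed on the client. A minimal Python sketch, assuming the timestamp of the newest point has already been fetched as epoch milliseconds (for example from the extra time field mentioned above); `age_ms` is a hypothetical helper name.

```python
import time


def age_ms(last_point_ms, now_ms=None):
    """Milliseconds elapsed since the newest point in the series.

    last_point_ms: timestamp of the newest point, in epoch milliseconds.
    now_ms: current time in epoch milliseconds; defaults to the local clock.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    return now_ms - last_point_ms


# Fixed "now" so the result is reproducible.
print(age_ms(1504225728000, now_ms=1504225730500))
```

The result depends on the client clock agreeing with the clock that stamped the points, so a small or even negative value is possible when the two drift.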

InfluxDB performance

For my case, I need to capture 15 performance metrics for devices and save them to InfluxDB. Each device has a unique device id.
Metrics are written into InfluxDB in the following way. Here I show only one as an example:
new Serie.Builder("perfmetric1")
.columns("time", "value", "id", "type")
.values(getTime(), getPerf1(), getId(), getType())
.build()
Writing data is fast and easy. But I see bad performance when I run a query. I'm trying to get all 15 metric values for the last hour.
select value from perfmetric1, perfmetric2, ..., perfmetric15
where id='testdeviceid' and time > now() - 1h
For an hour, each metric has 120 data points, in total it's 1800 data points. The query takes about 5 seconds on a c4.4xlarge EC2 instance when it's idle.
I believe InfluxDB can do better. Is this a problem of my schema design, or is it something else? Would splitting the query into 15 parallel calls go faster?
As @valentin's answer says, you need to build an index for the id column for InfluxDB to perform these queries efficiently.
In 0.8 stable you can do this "indexing" using continuous fanout queries. For example, the following continuous query will expand your perfmetric1 series into multiple series of the form perfmetric1.id:
select * from perfmetric1 into perfmetric1.[id];
Later you would do:
select value from perfmetric1.testdeviceid, perfmetric2.testdeviceid, ..., perfmetric15.testdeviceid where time > now() - 1h
This query will take much less time to complete since InfluxDB won't have to perform a full scan of the timeseries to get the points for each testdeviceid.
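To make the effect of the fanout concrete, here is a small in-memory Python sketch of the idea. It is not how InfluxDB implements continuous queries; `fan_out` is a hypothetical helper that only illustrates why a per-id series can be read without scanning the points of every other device.

```python
from collections import defaultdict


def fan_out(series_name, points, key="id"):
    """Split one series into several, named '<series_name>.<id>'.

    points: list of dicts, each containing the column named by key.
    """
    fanned = defaultdict(list)
    for point in points:
        fanned[f"{series_name}.{point[key]}"].append(point)
    return dict(fanned)


points = [
    {"time": 1, "value": 0.5, "id": "testdeviceid"},
    {"time": 2, "value": 0.7, "id": "otherdevice"},
    {"time": 3, "value": 0.9, "id": "testdeviceid"},
]
series = fan_out("perfmetric1", points)

# Reading one device is now a direct lookup instead of a filter over all points.
print(series["perfmetric1.testdeviceid"])
```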
Build an index on the id column. It seems that the engine uses a full scan of the table to retrieve the data. If you split your query across 15 threads, the engine will run 15 full scans and performance will be much worse.

How to store frequency dates

My users have the following frequency options: Daily, Weekly, Biweekly, Monthly.
For the last three they can also choose which days, for example on weekly/biweekly they can choose every (Monday, Tuesday, Friday) and for monthly (10, 15, 25, 30).
For the weekly frequency I can get the days of the week they selected; in Rails they are numbered 0-6 (Sunday-Saturday).
So I came up with the table:
Settings: setting_id, resource_id, frequency(daily, weekly, biweekly, monthly), days[]: [0, 1, 2]
Now I need to build a daily PostgreSQL view that takes the resource_id from my Settings table and returns only the resources whose frequency and days are scheduled for the current day (now()).
One solution I can think of is to use a PostgreSQL CASE expression to check which type of frequency was set for the resource. In the CASE block I can parse the current date to get the day of the month, or the week number together with the day of the week, and compare them with what was stored in the table.
Is there a better way of doing this? I can see some performance issues with the DB view.
Another option, I guess, would be to have a different view for each report.
You have several options. One possibility is to use a string, and encode the frequency into a cron-like syntax.
That is a well-known format. It's not immediately readable, but it is excellent for storing somewhere, for instance in a file (like cron does) or in a database field.
There are Ruby libraries, such as whenever and rufus-scheduler, that already deal with the hassle of converting a cron syntax into a Ruby representation, and vice versa.
Therefore you can write
every :day, :at => '12:20am'
or
every '3h'
at '2030/12/12 23:30:00'
and get a nicely formatted cron schedule definition. You can pull the parser of one of those libraries into your project, and use it to convert your database serialized string into a Ruby representation, and vice-versa.
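To illustrate the cron-like encoding, here is a minimal sketch in Python (rather than Ruby) of how the Settings rows from the question might be turned into standard five-field cron strings. `to_cron` is a hypothetical helper, the run time of midnight is an assumption, and biweekly is rejected because the standard cron fields have no way to express "every other week".

```python
def to_cron(frequency, days=(), hour=0, minute=0):
    """Encode a frequency setting as 'minute hour day-of-month month day-of-week'.

    days: weekday numbers 0-6 (Sunday-Saturday) for weekly,
          or days of the month for monthly.
    """
    if frequency == "daily":
        return f"{minute} {hour} * * *"
    if frequency not in ("weekly", "monthly"):
        # Biweekly needs extra state (a reference week) that cron lacks.
        raise ValueError(f"cannot express {frequency!r} as a cron string")
    if not days:
        raise ValueError("weekly and monthly schedules need at least one day")
    day_list = ",".join(str(day) for day in sorted(days))
    if frequency == "weekly":
        return f"{minute} {hour} * * {day_list}"
    return f"{minute} {hour} {day_list} * *"


print(to_cron("weekly", [1, 2, 5]))          # Monday, Tuesday, Friday
print(to_cron("monthly", [10, 15, 25, 30]))
```

Cron's day-of-week field uses the same 0-6 (Sunday-Saturday) numbering as the Rails values in the question, so the stored days can be passed through unchanged.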

Resources