No results when running query against data written to InfluxDB - influxdb

I have InfluxDB 0.13 and I'm sending data via the HTTP API. I'm getting status code 204 back, assuming that means OK. I can see the series if I query with "SHOW SERIES", I see the measurement and tags. But I cannot query on any of the data, it just says no results (query: SELECT * FROM "sql-query").
This is the raw data sent to Influx from Fiddler. Any idea what's wrong?
sql-query,Environment=QA,Service=XTAM_Lag SubscriberName="TXXOff",LagMinutes=141278i 1472628420980000000
sql-query,Environment=QA,Service=XTAM_Lag SubscriberName="TXXTIMEDEPOT",LagMinutes=248i 1472628420980000000
sql-query,Environment=QA,Service=XTAM_Lag SubscriberName="TXXOffMirror",LagMinutes=0i 1472628420980000000
sql-query,Environment=QA,Service=XTAM_Lag SubscriberName="TXXOffMirrorQA",LagMinutes=527i 1472628420980000000
sql-query,Environment=QA,Service=XTAM_Lag SubscriberName="TXXOff",LagMinutes=141279i 1472628480390000128
sql-query,Environment=QA,Service=XTAM_Lag SubscriberName="TXXTIMEDEPOT",LagMinutes=249i 1472628480390000128
sql-query,Environment=QA,Service=XTAM_Lag SubscriberName="TXXOffMirror",LagMinutes=0i 1472628480390000128
sql-query,Environment=QA,Service=XTAM_Lag SubscriberName="TXXOffMirrorQA",LagMinutes=528i 1472628480390000128

By default, all InfluxDB queries with no time constraint will use the current time in UTC on the InfluxDB server as an implicit upper time bound. Essentially, the query SELECT * FROM "sql-query" is interpreted as SELECT * FROM "sql-query" WHERE time < now().
The current UTC time on the server running InfluxDB can be different from the current time on the server generating metrics. This difference can be due to either a bad clock or, more likely, the use of a time zone other than UTC.
If there is an offset, new data will sometimes be written with timestamps in the relative future. Due to the implicit upper time bound on queries explained above, those points will then be excluded from a basic query.
To confirm whether this is the issue, try running a query with the upper time bound set a few days in the future.
SELECT * FROM "sql-query" WHERE time < now() + 1w
The query above will return all points in the sql-query measurement, plus any points written written with a relative time up to a week in the future.

Related

How to space out influxdb continuous query execution?

I have many influxdb continuous queries(CQ) used to downsample data over a period of time on several occasions. At one point, the load became high and influxdb went to out of memory at the time of executing continuous queries.
Say I have 10 CQ and all the 10 CQ execute in influxdb at a time. That impacts the memory heavily. I am not sure whether there is any way to evenly space out or have some delay in executing each CQ one by one. My speculation is executing all the CQ at the same time makes a influxdb crash. All the CQ are specified in influxdb config. I hope there may be a way to include time delay between the CQ in the influx config. I didn't know exactly how to include the time delay in the config. One sample CQ:
CREATE CONTINUOUS QUERY "cq_volume_reads" ON "metrics"
BEGIN
SELECT sum(reads) as reads INTO rollup1.tire_volume FROM
"metrics".raw.tier_volume GROUP BY time(10m),*
END
And also I don't know whether this is the best way to resolve the problem. Any thoughts on this approach or suggesting any better approach will be much appreciated. It would be great to get suggestions in using debugging tools for influxdb as well. Thanks!
#Rajan - A few comments:
The canonical documentation for CQs is here. Much of what I'm suggesting is from there.
Are you using back-referencing? I see your example CQ uses GROUP BY time(10m),* - the * wildcard is usually used with backreferences. Otherwise, I don't believe you need to include the * to indicate grouping by all tags - it should already be grouped by all tags.
If you are using backreferences, that runs the CQ for each measurement in the metrics database. This is potentially very many CQ executions at the same time, especially if you have many CQ defined this way.
You can set offsets with GROUP BY time(10m, <offset>) but this also impacts the time interval used for your aggregation function (sum in your example) so if your offset is 1 minute then timestamps will be a sum of data between e.g. 13:11->13:21 instead of 13:10 -> 13:20. This will offset execution but may not work for your downsampling use case. From a signal processing standpoint, a 1 minute offset wouldn't change the validity of the downsampled data, but it might produce unwanted graphical display problems depending on what you are doing. I do suggest trying this option.
Otherwise, you can try to reduce the number of downsampling CQs to reduce memory pressure or downsample on a larger timescale (e.g. 20m) or lastly, increase the hardware resources available to InfluxDB.
For managing memory usage, look at this post. There are not many adjustments in 1.8 but there are some.

Grafana: Panel with time of last result

I have an elasticsearch instance that receives logs from multiple backup routines. I'd like to query ES for these logs from Grafana and set up a panel that shows the last time for the different backups. Ideally I would also like to be able to show this in color if the time is longer than a certain threshold.
Basically the idea is to have a display that shows, for instance, green if a certain backup has been completed in the last 24 hours, and red if it hasn't.
How would I do this in Grafana with ES as the datasource?
Exact implementation depends on the used panel.
Example for singlestat: write ES query and then select Stat: Time of last point, you may need to select suitable unit/format:
Unfortunately, Grafana doesn't understand thresholds in your requested time format (older than 24 hours). You will need to return it as metric (for example as age of last backup in seconds) = you will need to write query for that. That means, that you will have 2 stats to show (last time + age), so you won't be able to use singlestat. Probably table panel will be better - you can use thresholding based on the age metric there.
In addition to the great answer by Jan Garaj, it looks like there is work being done to make this type of thing much easier in the future.
Check out this issue to check progress.

Grafana + Prometheus: Display single stat of how often an event occurred

How do I use Prometheus + Grafana to tell how many time an event occurs
during a given time period?
I have a Prometheus counter that I increment every time this event happens. I would like to display it in a Singlestat number. It seems like this should be as simple as:
sum(increase(some_event_happened{application="example-app"}[$__range]))
And the display set to "Current" value.
However, this gives numbers that are much higher than the actual number of events in the given range. Also, it seems to vary based on how much I offset the range, and how large the range is.
More importantly, it crashes our Prometheus server with an out of memory error when I have three or four of these on a single dashboard.
I've tried setting a recorded rule to address the crashes, but I haven't figured out the right way to slice up the record rule to still be able to display the Grafana range.
So in summary, I want a Singlestat displaying the number of times an event happened in the current time range set in the Grafana dashboard. It seems like this is a very basic thing for a monitoring system. Am I just using the wrong approach?
I've encountered similar issues and they appear to be due to discrepancies between the query interval (in Prometheus) and the min step (in Grafana). Try using this global, built-in variable for your interval, which will make sure Prometheus is always in sync with the Grafana step: $__interval.
sum(increase(some_event_happened{application="example-app"}[$__interval]))
http://docs.grafana.org/reference/templating/
https://www.stroppykitten.com/technical/prometheus-grafana-statistics

Measure service latency with Prometheus

I am new to Prometheus and Grafana. My primary goal is to get the response time per request.
For me it seemed to be a simple thing - but whatever I do I do not get the results I require.
I need to be able to analyse the service latency in the last minutes/hours/days. The current implementation I found was a simple SUMMARY (without definition of quantiles) which is scraped every 15s.
Is it possible to get the average request latency of the last minute from my Prometheus SUMMARY?
If YES: How? If NO: What should I do?
Currently I am using the following query:
rate(http_response_time_sum{application="myapp",handler="myHandler", status="200"}[1m])
/
rate(http_response_time_count{application="myapp",handler="myHandler", status="200"}[1m])
I am getting two "datasets". The value of the first is "NaN". I suppose this is the result from a division by zero.
(I am using spring-client).
Your query is correct. The result will be NaN if there have been no queries in the past minute.

Issue in GraphAware TimeTree - bad insertions for minute resolution

EDIT: Solved problem.
TL;DR: TimeTree wants milliseconds since epoch. I was using seconds since epoch as my time values.
Versions:
Neo4j community : 3.0.3
GraphAware / TimeTree community server plugins : 3.0.3.39
I recently started using a time tree to search my graph by time ranges. I noticed some funny behavior the other day when I made a query like this:
"
WITH ({start:1350542000,end:1350543000}) as tr
CALL ga.timetree.events.range(tr) YIELD node as n
RETURN n
LIMIT 5;
"
Note that the time range here is only 1000 seconds apart. What was strange is that my return nodes (which are all of the same type) looked like this:
Node[343421]{gtype:1,bbox:[121.01454162597656,20.602155685424805,121.01454162597656,20.602155685424805],meta:"KAOU_20110613v20001_1422",time:1308026580,lat:20.602155685424805,lon:121.01454162597656}
Specifically, note that the value time:1308026580 is NOT within the bounds I provided. Now, I made this example up (because the query takes forever to run right now), but I was getting similar results the last time I ran the query.
So I investigated a little. First off, this is how I insert my data into the TimeTree:
MATCH (r:record {meta:"KAOU_20110613v20001_1422"})
WITH r
CALL ga.timetree.events.attach({node: r, time: r.time, relationshipType: "observedOn", resolution:"Minute"})
YIELD node
RETURN node.meta;
Notice the resolution: "Minute". When I first wrote this query as a function, I forgot to specify the resolution. So when I added about 4-5 records with this method, the resolution defaulted to "Day".
I didn't think this was an issue, so I just left these records in the graph at "Day" resolution and everything following would be at resolution "Minute".
So I decided to check out the graph using the Neo4J browser to see if anything weird is going on. From here, I executed the following query:
MATCH p=(:TimeTreeRoot)-[:CHILD*5]-()-[]-(:record) RETURN p LIMIT 25;
Ah ha! I noticed that all those records attached to the Minute node are consecutive records in terms of their time value. For example:
KAOU_20110613v20001_0956 has time:1307998620
KAOU_20110613v20001_0957 has time:1307998680
These consecutive records are all 1 minute apart. (i.e. time1 - time2 == 60)
So why are they being added to the same Minute node? I used an epoch time converter to verify that these time stamps are in fact 1 minute apart and represent the dates they are intended to.
I believe this problem is contributing to my performance lag since all my records are globbing up on Minute nodes.
So, either I missed something regarding the time values and how the Time Tree handles them, or something else fishy is going on.
I figured out my problem. Going back through some documentation, I found the following in the Examples section:
"time instant represented by {time} which is a long (the number of milliseconds since 1/1/1970)."
I must have misread this somewhere, and assumed the value was represented in seconds. That explains the behavior I am experiencing perfectly. All I need to do is multiply all my time values by 1000 to get milliseconds instead.

Resources