Graphing CPU Usage % on Grafana using influxDB data from Telegraf - influxdb

I have grafana 3.0.4 / influxdb 0.13.0 / telegraf 0.13.1
I am trying to graph overall CPU usage as a percentage.
When I create a query using the idle time, I get exactly that: idle time. What I'm looking for is usage, i.e. 100 - idle time.
I switched to manual text mode and wrote the query exactly that way:
and it works great.
But is there a way to use a math function or something in the normal editor, rather than dropping into the manual text mode?

Some slightly convoluted math using Grafana's Math field setting should do what you want without dropping into manual text mode.
Math(* -1 + 100)
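For comparison, the manual-mode query being referenced probably looks something like this (a sketch, assuming Telegraf's default cpu measurement and usage_idle field, and Grafana's $timeFilter/$interval template variables):

```
SELECT 100 - mean("usage_idle") FROM "cpu"
WHERE "cpu" = 'cpu-total' AND $timeFilter
GROUP BY time($interval) fill(null)
```

The Math field setting applies the same arithmetic (idle * -1 + 100 = 100 - idle) without leaving the query editor.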

Related

How to use cAdvisor data to calculate network bandwidth usage (per month) in grafana?

I’m using Prometheus (incl. cAdvisor) and Grafana to monitor my server, on which docker containers are running. cAdvisor gives me the data for my docker containers.
I’m trying to monitor the network bandwidth usage for the selected time (on the top right corner of grafana). It should output a value like 15 GB (for the selected month) or 500 MB (for the selected day).
My approach so far:
In Grafana I am using the Stat UI with Value options > Calculation of Total, while using the following query:
sum(rate(container_network_receive_bytes_total{instance=~"$host",name=~"$container",name=~".+"}[1m]))
(FYI: I have a container variable to filter the values for the selected container. This is why you can find the part ,name=~"$container" in the query above.)
The problem with the approach above is that the output values do not seem to be right: if I change the time range to a smaller one, I get a bigger value. For instance, selecting Last 2 days outputs 1.19 MB, while selecting Last 24 hours gives me 2.38 MB. Of course, this does not make sense, because yesterday + today can't be smaller than just today.
What am I overlooking? How can I get it to output correct values?
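One way to sidestep this pitfall (a sketch, not a verified fix - it assumes Grafana's built-in $__range variable and a Last rather than Total calculation in the Stat panel) is to let Prometheus compute the byte increase over the whole selected range in a single query:

```
sum(increase(container_network_receive_bytes_total{instance=~"$host",name=~"$container",name=~".+"}[$__range]))
```

With rate(...[1m]) plus a Total calculation, the panel sums per-second averages at however many points Grafana happens to render, so the result depends on the panel's resolution rather than on the actual traffic - which is consistent with the inverted values described above.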

Execution time of neo4j cypher query

I'm trying to find the execution time of GDS algorithms using the Community Edition of Neo4j. Is there any way to find it other than query logging, since that facility is specific to the Enterprise Edition?
Update:
I did as suggested. Why is the result 0 for computeMillis and preProcessingMillis?
Update 2:
The following table indicates the time in ms required to run the Yen algorithm to retrieve one path for each topology. However, the time does not depend on the graph size. Why? Is it normal to have such results?
When you execute the mutate or write mode of the algorithm, you can YIELD the computeMillis property, which tells you the execution time of the algorithm. Note that some algorithms, like PageRank, have more properties available to be YIELD-ed:
preProcessingMillis - Milliseconds for preprocessing the graph.
computeMillis - Milliseconds for running the algorithm.
postProcessingMillis - Milliseconds for computing the centralityDistribution.
writeMillis - Milliseconds for writing result data back.
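A minimal Cypher sketch of the above, using PageRank in mutate mode (the projected graph name 'myGraph' and the mutate property are illustrative; in write mode you would YIELD writeMillis instead of mutateMillis):

```
CALL gds.pageRank.mutate('myGraph', { mutateProperty: 'score' })
YIELD preProcessingMillis, computeMillis, postProcessingMillis, mutateMillis
RETURN preProcessingMillis, computeMillis, postProcessingMillis, mutateMillis
```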

How long do Prometheus time series last without an update?

If I send a gauge to Prometheus then the payload has a timestamp and a value like:
metric_name {label="value"} 2.0 16239938546837
If I query it in Prometheus I can see a continuous line. When I stop sending payloads for that metric, the line stops. If I send the same metric again after a few minutes I get another continuous line, but it is not connected to the old one.
Is it fixed in Prometheus how long a time series lasts without getting an update?
I think the first answer by Marc is in a different context.
Any timeseries in prometheus goes stale in 5m by default if the collection stops - https://www.robustperception.io/staleness-and-promql. In other words, the line stops on graph (or grafana).
So if you resume the metric collection within 5 minutes, it will connect the line by default. But if there is no collection for more than 5 minutes, it will show a disconnect on the graph. You can tweak that in Grafana to ignore drops, but that is not ideal in some cases, as you do want to see when collection stopped instead of getting the false impression that it was continuous. Alternatively, you can avoid the disconnect using functions like avg_over_time(metric_name[10m]) as needed.
There are two questions here:
1. How long does Prometheus keep the data?
This depends on your storage configuration. By default, on local storage, Prometheus has a retention of 15 days. You can find out more in the documentation. You can also change this value with the option --storage.tsdb.retention.time
2. When will I have a "hole" in my graph?
The line you see on a graph is made by joining the points from each scrape. Those scrapes are done regularly, based on the scrape_interval value in your scrape_config. So basically, if you have no data during one scrape, you'll have a hole.
So there is no definitive answer; this depends essentially on your scrape_interval.
Note that if you're using a function that evaluates metrics over a certain amount of time, missing one scrape will not alter your graph. For example, using rate(...[5m]) will not alter your graph if you scrape every 1m (as you'll have 4 other samples to compute the rate).
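For reference, the scrape_interval mentioned above is set in prometheus.yml; a minimal sketch (the job name and target are illustrative):

```
global:
  scrape_interval: 15s        # one sample per series every 15 seconds
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
```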

How to space out influxdb continuous query execution?

I have many InfluxDB continuous queries (CQs) used to downsample data over a period of time on several occasions. At one point the load became high and InfluxDB ran out of memory while executing the continuous queries.
Say I have 10 CQs and all 10 execute in InfluxDB at the same time. That impacts memory heavily. I am not sure whether there is any way to evenly space out or add some delay between executing each CQ. My suspicion is that executing all the CQs at the same time makes InfluxDB crash. All the CQs are specified in the InfluxDB config. I hope there may be a way to include a time delay between the CQs in the config, but I don't know exactly how. One sample CQ:
CREATE CONTINUOUS QUERY "cq_volume_reads" ON "metrics"
BEGIN
  SELECT sum(reads) AS reads INTO rollup1.tier_volume
  FROM "metrics".raw.tier_volume
  GROUP BY time(10m),*
END
And I don't know whether this is the best way to resolve the problem. Any thoughts on this approach, or suggestions for a better one, will be much appreciated. It would also be great to get suggestions on debugging tools for InfluxDB. Thanks!
@Rajan - a few comments:
The canonical documentation for CQs is here. Much of what I'm suggesting is from there.
Are you using back-referencing? I see your example CQ uses GROUP BY time(10m),* - the * wildcard is usually used with backreferences. Otherwise, I don't believe you need to include the * to indicate grouping by all tags - it should already be grouped by all tags.
If you are using backreferences, that runs the CQ for each measurement in the metrics database. This is potentially very many CQ executions at the same time, especially if you have many CQ defined this way.
You can set offsets with GROUP BY time(10m, <offset>) but this also impacts the time interval used for your aggregation function (sum in your example) so if your offset is 1 minute then timestamps will be a sum of data between e.g. 13:11->13:21 instead of 13:10 -> 13:20. This will offset execution but may not work for your downsampling use case. From a signal processing standpoint, a 1 minute offset wouldn't change the validity of the downsampled data, but it might produce unwanted graphical display problems depending on what you are doing. I do suggest trying this option.
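Applied to the sample CQ in the question, the offset variant would look something like this (the 1m offset is illustrative):

```
CREATE CONTINUOUS QUERY "cq_volume_reads" ON "metrics"
BEGIN
  SELECT sum(reads) AS reads INTO rollup1.tier_volume
  FROM "metrics".raw.tier_volume
  GROUP BY time(10m, 1m), *
END
```

Giving each of the 10 CQs a different offset would stagger their execution, at the cost of shifting each one's aggregation windows by that offset.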
Otherwise, you can try to reduce the number of downsampling CQs to reduce memory pressure or downsample on a larger timescale (e.g. 20m) or lastly, increase the hardware resources available to InfluxDB.
For managing memory usage, look at this post. There are not many adjustments in 1.8 but there are some.

How to use cAdvisor data to calculate CPU usage and load in Prometheus

When I use cAdvisor to get information about the CPU in a Docker container, I get the following information:
My question is: how do I calculate the CPU usage and load from the information returned by cAdvisor, the same way Prometheus does? How does Prometheus calculate CPU usage?
The algorithm that Prometheus uses for rate() is a little intricate due to handling of issues like alignment and counter resets as explained in Counting with Prometheus.
The short version is to subtract first value from the last value, and divide by the time they are over. It's probably easiest to use Prometheus rather than doing this yourself.
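As a worked example of the short version (the numbers are illustrative): a counter sampled at 100 at t=0s and at 160 at t=60s gives (160 - 100) / 60 = 1 per second. In PromQL, the equivalent over a 5-minute window is simply:

```
rate(container_cpu_usage_seconds_total[5m])
```

Since the counter is in CPU-seconds, the resulting rate is the fraction of one core in use; multiply by 100 to express it as a percentage.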
The query below should return the top 10 containers consuming the most CPU time:
topk(10, sum(irate(container_cpu_usage_seconds_total{container_label_com_docker_swarm_node_id=~".+", id=~"/docker/.*"}[$interval])) by (name)) * 100