I am relatively new to time series db world . I am running a Influxdb 1.8.x as a docker container, and I have configured the influxdb.conf file as a default config. Currently I am facing a issue of high CPU usage by influxdb, the CPU jumps to 80 to 90% and creating a problem for other process running on same machine.
I have tried a solution given here ->> Influx high CPU issue but unfortunately It did not work? I am unable to understand the reason behind the issue and also struggling to get support in terms of documentation or community help.
What I have tried so far:
updated the monitor section of influxdb.conf file like this ->> monitor DB
Checked the series cardinality SHOW SERIES CARDINALITY and it looks well within limits--9400(I am also not sure about the ideal number for high cardinality red flag)
I am looking for an approach, which will help me understand this problem the root cause?
Please let me know if you need any further information on same.
After reading about Influxdb debug and CPU profiling HTTP API influxdb I was able to pin-down the issue, the problem was in the way I was making the query, my query involved more complex functions and also GROUP BAY tag.I also tried query analysis using EXPLAIN ANALYZE (query) command to check how much time a query is taking to execute. I resolved that and noticed a huge Improvement in CPU load.
Basically I can suggest the following:
Run the CPU profile analysis using influxdb HTTP API with the command curl -o <file name> http://localhost:8086/debug/pprof/all?cpu=true e and collect result.
Visualize the result using tool like Pprof tool and find the problem
Also one can run basic commands like SHOW SERIES CARDINALITY and EXPLAIN ANALYZE <query> to understand the execution of the query
Before designing any schema and Influx client check the hardware recommendation ->> Hardware sizing guidelines
Related
I have two controllers accessing distributed database. I am receiving some data from a device to the controllers and i store them in Cassandra database. I use Docker to install cassandra
The node 1 is on controller 1 and node 2 is on controller 2. I would like to know if there is a possibility to measure the time it takes to update the node 2, when i receive data at node 1.
I would like to draw a graph with it. So could someone tell me how do i measure it.
Thanks
Cassandra provides tools and insights of all this internal information with the nodetool gossipinfo command and cqlsh tracing.
In the scenario that you are proposing, I'm inferring that you are using a Replication Factor of 2, and that you are interested in the exact time that is taking to have the information written in all the nodes, you can measure the time required to do a write with the consistency level set to ALL, and compare it with similar writes using the consistency level of ONE. The difference of the times will be the propagation from one node to the other.
Finally, if you are interested in measuring the performance of the queries in Cassandra, there are several tools that enhance the tracing functionality, in our team we have been using zipkin with good results.
Using JConsole someone can access to the metrics that were gathered by default for OS like memory, CPU load and ..., in addition to process specific metrics. My question is can we add some OS customized metrics, like the usage of some directory using Java Files API or checking if a specific port is responsive?
I gather so-called metrics using remote SSH and the commands like du -sh /directory that has so many delays and I want to get it using JMX so it could run faster.
This question talked about adding spring metrics.
As the linked question shows it is easy to expose a Java class as an MBean, so you could certainly write a class that collects the metrics you need. Implementing du in Java is not difficult. However, I'm not sure that it will solve your problem. The example of du -sh /directory is probably slow because it needs to recursively measure the size of a directory hierarchy. That will be just as slow (probably slower!) in Java.
As a side note I would normally use collectd or Telegraf for that kind of thing, but again the I/O cost for finding disk usage would be the same.
I would suggest adding some logs with times to your current script so that you can see where it spends time. If it takes less than a second to connect with SSH and 15 seconds to determine the directory size, for example, moving from SSH to JMX won't help.
hello i really want to use bosun/tsdbrelay/opentsdb with the telegraf collector, as it gets all the metrics we want to monitor out of the box.
i allready have a small setup to push metrics from 5 servers to bosun for indexing and opentsdb for storage.
i used the haproxy configs from kyle brandts bosun infrastructure blog to make the tsdbs ha-ready
but bosun is showing that it cannot use the auto-type for metrics, and also in the primary stats view does not show any graphs for cpu / mem etc.
what can i provide that the graphs show up.
kind regards.
Both of these features are mostly scollector specific. The "host" view (I've considered ripping that out, it was done in the early days, better to use something like grafana) depends on scollector specific metrics such as os.cpu.
As far as "Auto" for rate vs gauge, that is also metadata that comes from scollector and sent to bosun. If you want to try to mimic the behavior see https://github.com/bosun-monitor/bosun/blob/master/metadata/metadata.go#L30 and https://github.com/bosun-monitor/bosun/blob/master/metadata/metadata.go#L195 - you would need to create at least the "rate" key for each metric you are getting from telegraph.
Sorry I just started to learn docker. My question may seem stupid for some of you.
In fact, I would like to know if there is a way to collect performance metrics from "CAdvisor" container (not from cgroup) at runtime ? I mean, extract performance values from the curves designed by cadvisor like memory usage or network traffic.
I need to record this values and save them in a database so that, I can perform a statistic analyzes upon these generated values (like comparing memory consumption for two docker containers at t=50s).
Thanks in advance.
As other answers mention, cAdvisor doesn't provide its own performance data API, instead it exposes metrics which are typically handled in a separate database if one wants to derive performance data beyond "real time". For example, cAdvisor exports Prometheus metrics natively:
http://prometheus.io/docs/instrumenting/exporters/
The Prometheus metric types:
http://prometheus.io/docs/concepts/metric_types/
Prometheus supports a fairly rich functional expression language that can be used for querying and visualization:
http://prometheus.io/docs/querying/basics/
cAdvisor does provide a rest endpoint to get any stats in real time. By default, it keeps latest two minute of data. You can configure it to keep more or less. It also supports a storage backend to keep dumping stats to an influxdb database.
REST Api:
eg. /api/v1.3/containers
doc: https://github.com/google/cadvisor/blob/master/docs/api.md
Doc on setting up InfluxDB:
https://github.com/google/cadvisor/blob/master/docs/influxdb.md
I think you could use https://github.com/tutumcloud/container-metrics for this. Basically what that would be doing is using influxdb http://influxdb.com/ as a time series data store.
There is some more information available here: http://blog.tutum.co/2014/08/25/panamax-docker-application-template-with-cadvisor-elasticsearch-grafana-and-influxdb/
A couple of people seemed to be looking into the ELK stack (Elastic Search, Logstash, Kibana) for visualising some of this data here: https://github.com/google/cadvisor/issues/634
I am using grinder in order to generate reports for the performance tests for my application. But I noticed that it does not generate any report on CPU and memory usage. On further investigation, I found that Grinder does not provide this information. Now, my question is, is there any tool that can be hooked up with grinder, to record the CPU and memory usage details?
As you have discovered, this is not supported directly in The Grinder itself. You will need to use a collection of tools to accomplish this.
I use a combination of Quickstatd, Graphite, and Grinder to Graphite to get all my results in the same place where I can see them. If you need to support Windows, you can probably use collectd (with ssc-serv and the Graphite plugin) instead of Quickstatd, which is based on bash scripts.
You can also pull in server side metrics (like DB lookups per second, etc.) with tools like jmxtrans, statsd, and metrics.
Having all that information in the same place is really powerful, and can give you some good insights.
If you grind a Java server, you can get data via JMX from OperatingSystemMXBean and MemoryMXBean.
Then add the data to a Grinder user Statistic and the data will end up in the -data.log
grinder.statistics.registerDataLogExpression("Load", "userDouble0")
..
grinder.statistics.forCurrentTest.setDouble("userDouble0", systemLoadAverage)
the -data.log can directly be fed into Gnuplot
gnuplot> plot 'client-0-data.log' using 2:7 title "System Load"