Capture / Monitor system data of application server in Graphite - memory

I am using a Graphite server to capture my metrics data and render it as graphs. I have 4 application servers behind a load balancer. My aim is to capture system data such as CPU usage, memory usage, disk load, etc., for all 4 application servers. I set up a Graphite environment on a separate server, and I want to push the system data from all the application servers to Graphite and display it as graphs. I don't know what needs to be done to feed system data to Graphite. My thinking was to install statsd on all the application servers and feed the system data to Graphite, but it looks like statsd is meant for application data rather than system data.
Can anyone point me in the right direction? Thanks in advance.

Running collectd with a Graphite plugin would be an excellent start for gathering the information you're after.
There are many ways to get your data into Graphite.
You can find a list of tools that are known to work well with Graphite on the readthedocs.org page: http://graphite.readthedocs.org/en/0.9.10/tools.html
There is also an example script in the carbon project that gathers the system load average: example-client.py
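To give an idea of what such a script involves, here is a minimal sketch (not the actual example-client.py) that sends the load average to carbon over its plaintext protocol, where each line is "<metric path> <value> <timestamp>". The host name graphite.example.com and the "servers.<host>.loadavg" metric naming are assumptions for illustration; port 2003 is carbon's default plaintext port.

```python
import os
import socket
import time

CARBON_HOST = "graphite.example.com"  # assumption: replace with your Graphite host
CARBON_PORT = 2003  # carbon's default plaintext listener port


def format_metric(path, value, timestamp):
    """Return one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    return "%s %s %d\n" % (path, value, int(timestamp))


def load_average_lines(hostname, loadavg, now):
    """Build lines for the 1, 5 and 15 minute load averages.

    Dots in the host name are replaced so they do not create extra
    levels in the Graphite metric tree (a naming choice, not a requirement).
    """
    prefix = "servers.%s.loadavg" % hostname.replace(".", "_")
    names = ("1min", "5min", "15min")
    return [
        format_metric("%s.%s" % (prefix, name), value, now)
        for name, value in zip(names, loadavg)
    ]


def send_lines(lines, host=CARBON_HOST, port=CARBON_PORT):
    """Open a TCP connection to carbon and send all lines at once."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    while True:
        # os.getloadavg() is only available on Unix-like systems.
        lines = load_average_lines(socket.gethostname(), os.getloadavg(), time.time())
        send_lines(lines)
        time.sleep(60)
```

Running something like this on each of the 4 application servers would give one metric subtree per server. For CPU, memory and disk, a ready-made collector such as collectd is likely less work than extending a script like this.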

Related

Solution for data pipeline and ETL load monitoring

My team is looking for a solution (either in-house or an existing tool) for performance monitoring and operations management of 500+ SSIS ETL loads (with varied run frequencies: daily, weekly, monthly, etc.) and 100+ data pipelines (Hadoop is currently used as the data lake storage layer, but the plan is to migrate to Databricks hosted on AWS). The number of pipelines will grow as ML and AI needs evolve over time. The solution should be able to handle input from SSIS as well as from the data pipelines. The primary goals for this solution are:
Generate a dashboard that shows data quality anomalies (e.g. the ETL source sent 100 but the destination received only 90; for pipeline X, the average data volume received dropped by 90%).
Send alerts on failures, and integrate with ServiceNow to create tickets for severe failures.
Allow the operations team to quickly find important performance metrics: execution run time, and any slowness or bottlenecks in a particular pipeline.
Allow the dashboards to be customized.
Right now we are thinking of a centralized logging table in SQL Server that receives metrics from the ETL loads and pipelines, exposing this data to Power BI to build custom dashboards, and writing an API integration with ServiceNow to create alerts. But this solution might be hard to scale.
Can someone suggest some good data monitoring tools that would serve our needs? My Google search came up with these tools:
Datadog, Datafold, Dynatrace, and the ELK stack.

How to get Average Disk Response Time in Linux/Redhat through SNMP

I am developing a system monitoring tool using several open source components: InfluxDB, Telegraf, Grafana, and SNMP (Simple Network Management Protocol).
I enable SNMP on the various Linux or Red Hat systems, fetch the system statistics through Telegraf, and store the data in InfluxDB.
So Telegraf acts as the central collector (gathering data from multiple systems) and SNMP as the agent on each system. I already get CPU, RAM/swap, disk usage, network packets, etc. through MIBs.
I am trying to monitor average disk response time (wait + service time) as well.
I found UCD-DISKIO-MIB, but it only provides raw data.
So the question is how to calculate the average response time from that data.

What is recommended solution for monitoring heterogeneous infrastructure?

I am looking for a monitoring tool for the following use cases:
Collect basic metrics about virtual machines (CPU usage, memory usage, I/O, available disk space)
Extract metrics from SQL Server (probably by running some queries)
Extract information about processing from an external service, i.e. how many processing jobs are currently running and for how long. I am thinking about writing Python scripts, but I don't know how to combine them with a monitoring tool
Plot charts and manage alerts. It would be nice to be able to send not only emails but also messages to Slack/MS Teams.
I was thinking about Prometheus, because it has wmi_exporter, node_exporter, a SQL exporter, and Alertmanager, which can send notifications to multiple destinations, but I don't know what to do about the external service and the Python scripts.
Any suggestions?
Prometheus can definitely do what you say you need done. Some of it may not be trivial, but you can definitely fill in the blanks yourself.
E.g. you can get machine metrics basically out of the box by firing up a node_exporter and having it scraped by Prometheus, but I don't think it has e.g. information on all running processes. The latter might require you to write an agent/exporter: a simple web server that exposes metrics on /metrics; there exists a Python client library to help with that. Or have said processes (assuming they're your code) push metrics to a Pushgateway instead, if they're short lived batch jobs.
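As a rough illustration of the "simple web server that exposes metrics on /metrics" idea, here is a sketch using only the Python standard library. In practice the Python client library would normally generate this output for you. The metric names, the port 9200, and the collect_jobs() function are all hypothetical stand-ins for whatever your external service actually reports.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_jobs():
    """Hypothetical placeholder: ask the external service which jobs are
    running and return {job name: seconds running so far}."""
    return {"import_orders": 42.0, "rebuild_index": 310.5}


def escape_label(value):
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(jobs):
    """Render the job data in the Prometheus text exposition format."""
    lines = [
        "# HELP external_jobs_running Number of jobs currently running.",
        "# TYPE external_jobs_running gauge",
        "external_jobs_running %d" % len(jobs),
        "# HELP external_job_running_seconds Time each job has been running.",
        "# TYPE external_job_running_seconds gauge",
    ]
    for name, seconds in sorted(jobs.items()):
        lines.append(
            'external_job_running_seconds{name="%s"} %s' % (escape_label(name), seconds)
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_jobs()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 9200 is an arbitrary choice; add it as a scrape target in Prometheus.
    HTTPServer(("", 9200), MetricsHandler).serve_forever()
```

Prometheus would then scrape this endpoint like any other exporter, and alert rules can be written against external_jobs_running or external_job_running_seconds.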
Oh, and for charts/dashboards you probably want Grafana, as Prometheus' abilities in that area are rather limited and Grafana integrates rather well with Prometheus.

How do I retrieve data from statsd?

I'm looking over their documentation here:
http://www.rubydoc.info/github/github/statsd-ruby/Statsd
There are methods for recording data, but I can't find anything about retrieving recorded data. I'm taking over a project with an existing statsd integration. Its configured host is likely a defunct URL. Is the host where those stats are recorded?
The statsd server implementations that Mircea links to just take care of receiving and aggregating metrics and publishing them to a backend service. Etsy's statsd definition:
A network daemon that runs on the Node.js platform and listens for
statistics, like counters and timers, sent over UDP or TCP and sends
aggregates to one or more pluggable backend services (e.g.,
Graphite).
To retrieve the recorded data you have to query the backend. Check the list of available backends. The most common one is Graphite.
See also this question: How does StatsD store its data?
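If the backend turns out to be Graphite, retrieving the data means querying Graphite's render API with format=json. The sketch below assumes a Graphite web host at graphite.example.com and a metric path of stats.counters.myapp.logins.count. Both are placeholders, and the actual path depends on how the statsd server is configured.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(base_url, target, since="-1h"):
    """Build a Graphite render API URL that returns JSON for one target."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return "%s/render?%s" % (base_url.rstrip("/"), query)


def parse_datapoints(payload):
    """Convert Graphite's JSON into {target: [(timestamp, value), ...]}.

    Graphite returns a list of series, each with a 'datapoints' list of
    [value, timestamp] pairs; value is null where no data was recorded.
    """
    return {
        series["target"]: [
            (timestamp, value)
            for value, timestamp in series["datapoints"]
            if value is not None
        ]
        for series in json.loads(payload)
    }


def fetch(base_url, target, since="-1h"):
    """Query Graphite over HTTP and return the parsed datapoints."""
    with urlopen(render_url(base_url, target, since), timeout=10) as response:
        return parse_datapoints(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder host and metric path; adjust to your setup.
    print(fetch("http://graphite.example.com", "stats.counters.myapp.logins.count"))
```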
There are 2 parts to statsd: a client and a server.
What you're looking at is the client part. You will not see functionality related to retrieving the data as it's not there - it normally is on the server side.
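To make the split concrete: all a statsd client does is fire small text packets of the form "name:value|type" at the server, usually over UDP. A minimal sketch follows. The metric names are made up, and 127.0.0.1:8125 assumes a statsd server listening locally on the conventional port.

```python
import socket


def statsd_packet(name, value, metric_type):
    """Encode one metric in the statsd line format, e.g. b'myapp.logins:1|c'."""
    return ("%s:%s|%s" % (name, value, metric_type)).encode("ascii")


def send(packet, host="127.0.0.1", port=8125):
    """Fire and forget: UDP gives no confirmation that anything received it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    send(statsd_packet("myapp.logins", 1, "c"))     # counter increment
    send(statsd_packet("myapp.render", 320, "ms"))  # timer, in milliseconds
```

Nothing comes back over that socket, which is why the client library has no methods for reading data.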
Here is a list of statsd server implementations:
http://www.joemiller.me/2011/09/21/list-of-statsd-server-implementations/
Research and pick one that fits your needs.
Statsd originally started at Etsy: https://github.com/etsy/statsd/wiki

Best tool to record CPU and memory usage with Grinder?

I am using The Grinder to generate reports for the performance tests of my application, but I noticed that it does not generate any report on CPU and memory usage. On further investigation, I found that The Grinder does not provide this information. Is there any tool that can be hooked up to The Grinder to record CPU and memory usage details?
As you have discovered, this is not supported directly in The Grinder itself. You will need to use a collection of tools to accomplish this.
I use a combination of Quickstatd, Graphite, and Grinder to Graphite to get all my results in the same place where I can see them. If you need to support Windows, you can probably use collectd (with ssc-serv and the Graphite plugin) instead of Quickstatd, which is based on bash scripts.
You can also pull in server side metrics (like DB lookups per second, etc.) with tools like jmxtrans, statsd, and metrics.
Having all that information in the same place is really powerful, and can give you some good insights.
If you grind a Java server, you can get data via JMX from OperatingSystemMXBean and MemoryMXBean.
Then add the data to a Grinder user statistic, and it will end up in the -data.log file:
grinder.statistics.registerDataLogExpression("Load", "userDouble0")
..
grinder.statistics.forCurrentTest.setDouble("userDouble0", systemLoadAverage)
The -data.log file can be fed directly into gnuplot:
gnuplot> plot 'client-0-data.log' using 2:7 title "System Load"

Resources