Get SNMP monitoring metrics to Ganglia - monitoring

I have a cluster of servers monitored with Ganglia, and I just added a new application on one of them. The new application uses an SNMP handler to report on its activity.
I have never used SNMP before and would like to get most of the SNMP metrics that I see in the MIB file into the RRD files used by Ganglia.
Would this be possible?
I wanted to write a new Ganglia module that would take the new application's metrics into account, so I tried to read the code of the application's SNMP handler, but I cannot work out where it gets its information from.
What would be a good way to approach this?
Thank you!

Related

What is recommended solution for monitoring heterogeneous infrastructure?

I am looking for a monitoring tool for the following use cases:
Collect basic metrics about virtual machines (CPU usage, memory usage, I/O, available space)
Extract metrics from SQL Server (probably by running some queries)
Extract information about processing from an external service, i.e. how many processes are currently running and for how long. I am thinking about writing Python scripts, but I don't know how to combine them with a monitoring tool
Plot charts and manage alerts; it would be nice to be able to send not only emails but also messages to Slack/MS Teams.
I was thinking about Prometheus, because it has wmi_exporter, node_exporter, a SQL exporter, and Alertmanager, which can send notifications to multiple destinations, but I don't know what to do about the external service and the Python scripts.
Any suggestions?
Prometheus can definitely do what you say you need done. Some of it may not be trivial, but you can definitely fill in the blanks yourself.
E.g. you can get machine metrics basically out of the box by firing up a node_exporter and having it scraped by Prometheus, but I don't think it has, e.g., information on all running processes. The latter might require you to write an agent/exporter: a simple web server that exposes metrics on /metrics; there is a Python client library to help with that. Or, if said processes are your own code and are short-lived batch jobs, have them push metrics to a Pushgateway instead.
Oh, and for charts/dashboards you probably want Grafana, as Prometheus' abilities in that area are rather limited and Grafana integrates rather well with Prometheus.
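To make the "simple web server that exposes metrics on /metrics" idea concrete, here is a minimal sketch that uses only the Python standard library rather than the client library mentioned above. The `get_running_jobs` function, the metric names, and the port are all hypothetical placeholders; in practice you would replace the function body with a real query to your external service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_running_jobs():
    """Hypothetical stand-in for a call to the external service.

    Returns a mapping of job name -> seconds the job has been running.
    """
    return {"import_orders": 42.5, "rebuild_index": 310.0}


def render_metrics(jobs):
    """Render the job data in the Prometheus text exposition format."""
    lines = [
        "# HELP external_jobs_running Number of jobs currently running.",
        "# TYPE external_jobs_running gauge",
        f"external_jobs_running {len(jobs)}",
        "# HELP external_job_runtime_seconds How long each job has been running.",
        "# TYPE external_job_runtime_seconds gauge",
    ]
    for name, seconds in sorted(jobs.items()):
        lines.append(f'external_job_runtime_seconds{{name="{name}"}} {seconds}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(get_running_jobs()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(port):
    """Create (but do not start) the exporter's HTTP server."""
    return HTTPServer(("", port), MetricsHandler)
```

Calling `make_server(8000).serve_forever()` starts the exporter; you would then add the host and port as a scrape target in the Prometheus configuration. The data is collected at scrape time, so the values are always as fresh as the last scrape.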

Zabbix & external monitoring systems

I need to make Zabbix work together with other monitoring systems.
My company uses Zabbix for monitoring. Our partner plans to use another system.
We need to exchange monitoring data.
I'm interested in integrating with the following systems: BMC Patrol, MS SCOM, NetCool, Portal.
What is the best way to integrate them?
Maybe via SNMP?
Replicate the hosts and metrics in your Zabbix (use the Zabbix trapper item type and also set the Allowed hosts value), then use a suitable zabbix-sender implementation to push the data into Zabbix.
IMO it's a terrible idea because of latency, syncing, ... Do you really need the data (item values), or do you only need to visualize data from different data sources in one graph?
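If you do go the trapper route, a sender can be quite small. The following is only a sketch of what a zabbix-sender implementation does, assuming the commonly documented sender protocol (a `ZBXD` header, a flags byte, a little-endian length, then a JSON body) and the default trapper port 10051. The host name and item key in the usage note are hypothetical and must match a trapper item you have defined in Zabbix.

```python
import json
import socket
import struct


def build_sender_packet(host, key, value):
    """Build one 'sender data' request for a single trapper item value."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    # Header: "ZBXD", flags byte 0x01, then the payload length as
    # 8 little-endian bytes (assumed framing, see the lead-in).
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload


def _recv_exact(sock, count):
    """Read exactly `count` bytes from the socket or raise."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("connection closed before full response")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def send_value(server, host, key, value, port=10051, timeout=5.0):
    """Send one value to a Zabbix server/proxy and return its JSON reply."""
    packet = build_sender_packet(host, key, value)
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(packet)
        header = _recv_exact(sock, 13)  # same framing on the response
        (length,) = struct.unpack("<Q", header[5:13])
        return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

For example, `send_value("zabbix.example.com", "partner-host-01", "partner.cpu.load", 0.73)` would push one value. The sending machine's address has to be permitted by the item's Allowed hosts setting, otherwise the server rejects the value.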
Regarding BMC Patrol, you can use the History Loader/Propagator KM to export the monitoring data:
https://docs.bmc.com/docs/display/public/unixlinux912/PATROL+KM+for+History+Loader
or you can use the 'dump_hist' command to dump the history data from the agents:
https://docs.bmc.com/docs/display/pia9600/dump_hist+uility
Regarding Netcool events, you could get the information using different approaches, for example, depending on the version, you could get the events from the HTTP interface, as described below:
https://www.ibm.com/support/knowledgecenter/en/SSNFET_9.2.0/com.ibm.netcool_OMNIbus.doc_7.4.0/omnibus/wip/api/reference/omn_api_http_httpinterface.html
Or perhaps you could create a flat file gateway to read the events and write them to a file:
https://www.ibm.com/support/knowledgecenter/en/SSSHTQ/omnibus/gateways/flatfilegw/wip/concept/flatfilegw_intro.html

Windows Service Bus Topic/Queue Monitoring

What is the recommended way of monitoring Windows Service Bus subscriptions and queues? Ideally, I would like to monitor and alert on:
- Dead letter count
- Total message count
- Messages older than a given timespan
I have looked at the SCOM packs http://www.microsoft.com/en-gb/download/details.aspx?id=35383 but they appear to monitor the farm, hosts, etc., not individual queues or topics.
Ideally I would like a pre-built application rather than having to develop one, if at all possible.
Any advice would be much appreciated.
The Windows Azure SDK 2.0 release added "Message Browse" features:
http://msdn.microsoft.com/en-us/library/dn198643.aspx
You can try using these features to monitor the queues and obtain the data you need.
Paolo.

How to use ganglia ui with flume?

I am interested in monitoring my multi-agent Apache Flume setup. I have enabled the built-in Ganglia server, which provides me with the Flume metrics as JSON data. Now I would like to view this information in graphs/charts. To achieve this I am using the Ganglia web UI, and I have these questions: do I have to install gmond and gmetad to achieve it? If not, how can I use the existing Ganglia info with the Ganglia web UI?
Thanks in advance.
You'll need both, IMHO. Moreover, I think Flume can report directly to a gmond if you append some options to JAVA_OPTS; see the Hortonworks docs.
You'll need gmetad because it stores your data in RRD files, and the web UI queries those files to display graphs.
Graphite can do the job too.

Capture / Monitor system data of application server in Graphite

I am using a Graphite server to capture my metrics data and render it as graphs. I have 4 application servers in a load-balanced setup. My aim is to capture system data such as CPU usage, memory usage, disk load, etc., for all 4 application servers. I set up a Graphite environment on a separate server, and I want to push the system data from all the application servers to Graphite and display it as graphs. I don't know what needs to be done to feed system data to Graphite. My thinking was to install statsd on all the application servers and feed the system data to Graphite, but it looks like statsd handles application data rather than system data.
Can anyone help me get on the right track? Thanks in advance.
Running collectd with a Graphite agent would be an excellent start for gathering the information you're after.
There is an almost unlimited number of ways to get your data into Graphite.
You can find a list of tools known to work very well with Graphite on the readthedocs.org page: http://graphite.readthedocs.org/en/0.9.10/tools.html
There is also an example script in the carbon project that gathers the system load average: example-client.py
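To give a feel for what such a client involves, here is a rough sketch in the same spirit (not the carbon project's actual script). It assumes carbon's plaintext listener on its usual default port 2003, where each line is `metric.path value timestamp`. The metric prefix and host name are made up for illustration, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import socket
import time


def format_metric(path, value, timestamp):
    """Format one data point for carbon's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def load_average_lines(prefix, now=None):
    """Build plaintext lines for the 1, 5 and 15 minute load averages."""
    now = time.time() if now is None else now
    one, five, fifteen = os.getloadavg()  # Unix-only
    return [
        format_metric(f"{prefix}.loadavg_1min", one, now),
        format_metric(f"{prefix}.loadavg_5min", five, now),
        format_metric(f"{prefix}.loadavg_15min", fifteen, now),
    ]


def send_to_carbon(lines, host, port=2003, timeout=5.0):
    """Send already formatted lines to a carbon plaintext listener."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

On each application server you could run something like `send_to_carbon(load_average_lines("system.app1"), "graphite.example.com")` from cron or in a loop with a sleep. Using a different prefix per server keeps the four machines apart in the Graphite tree.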

Resources