What is the difference between a statsd client and the statsd daemon? - monitoring

I have an application that I wish to monitor graphically.
I am using this StatsD client. I am using Graphite as the backend. I have a question about the basic workflow:
We use the StatsD client to include metrics within our application. These metrics are then sent (usually) in the form of UDP packets. Graphite (specifically, Carbon within Graphite) captures these packets and stores them in the Whisper database as time-series data.
What exactly, then, is the role of the StatsD daemon? I have written a working application using only the StatsD client and Graphite, so where am I missing the StatsD daemon?
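For reference, the client-side instrumentation I'm describing looks roughly like this (a minimal sketch assuming the Python statsd package; the host, port, and metric names are just placeholders):

```python
# Minimal sketch of StatsD client-side instrumentation.
# Assumes the Python "statsd" package (pip install statsd); names are illustrative.
from statsd import StatsClient

statsd = StatsClient(host="localhost", port=8125, prefix="myapp")

def handle_request():
    statsd.incr("requests")             # counter: one more request handled
    with statsd.timer("request_time"):  # timer: how long the handler took
        ...                             # application logic goes here
    statsd.gauge("queue_depth", 42)     # gauge: a point-in-time value

handle_request()
```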

I had the same question, so I'm going to answer it here even though the post is 7 months old.
From what I could gather (as explained here), a StatsD daemon is synonymous with a StatsD server. In your case, that is Carbon/Graphite, or perhaps a StatsD-specific component within your Graphite stack.
At my company, for instance, we use the StatsD Beats daemon within the ELK stack.
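To make the daemon's role concrete, here is a toy sketch (not the real etsy/statsd implementation) of what a StatsD daemon does: listen for "name:value|type" lines over UDP, aggregate them in memory, and periodically flush the aggregates to a backend such as Carbon. Everything here (port, flush interval, printing instead of forwarding) is illustrative:

```python
# Toy sketch of what a StatsD daemon does; NOT the real etsy/statsd server.
# It listens for "name:value|type" lines over UDP, aggregates counters in
# memory, and flushes the totals once per interval (to stdout here; a real
# daemon would forward them to a backend such as Carbon).
import socket
import time
from collections import defaultdict

FLUSH_INTERVAL = 10  # seconds

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 8125))
sock.settimeout(1.0)

counters = defaultdict(float)
last_flush = time.time()

while True:
    try:
        data, _ = sock.recvfrom(1024)
        for line in data.decode().splitlines():
            name, rest = line.split(":", 1)
            value, mtype = rest.split("|", 1)
            if mtype.startswith("c"):          # only counters in this toy
                counters[name] += float(value)
    except socket.timeout:
        pass

    if time.time() - last_flush >= FLUSH_INTERVAL:
        for name, total in counters.items():
            # Carbon-style plaintext line: "path value timestamp"
            print(f"stats.{name} {total} {int(time.time())}")
        counters.clear()
        last_flush = time.time()
```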

Related

How to collect messages (total number and size) between microservices?

I have a microservices based software architecture.
There is a PHP application which orchestrates the communication among the microservices and the application's overall logic.
I need to model the communication between the microservices as a graph.
There will be weighted edges, which will represent the affinities between microservices.
I am searching for a tool to collect all messages and their sizes.
I have read that there are distributed tracing systems like Zipkin, which I have already deployed, and which could accomplish this task.
But I cannot figure out how to collect the messages I want.
This is the PHP library I used for the instrumentation of my app:
[https://github.com/openzipkin/zipkin-php]
Any ideas about other tools or how to use Zipkin differently to achieve my goal?
Let me add my two cents to this thread. Speaking of Envoy: yes, when attached to your application it adds a lot of useful features from the observability bucket, e.g. network-level statistics and tracing.
Here is a question: have you considered running your legacy apps inside a service mesh, like Istio?
Istio simplifies the deployment and configuration of Envoy for you. It injects a sidecar container (istio-proxy, in fact an Envoy instance) into your application Pod, and gives you extra features like a set of service metrics out of the box*.
Example: stats produced by Envoy in Prometheus format, like istio_request_bytes, are visualized in the Kiali metrics dashboard for inbound traffic as request_size.
*As mentioned by @David Kruk, you still need to have a Prometheus server deployed in your cluster to be able to pull these metrics into the Kiali dashboards.
You can learn more about Istio here. There is also a dedicated section on how to visualize the metrics collected by Istio (e.g. request size).
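If you want to pull those Envoy/Istio stats programmatically rather than through Kiali, you can also query the Prometheus HTTP API directly. A rough sketch follows; the Prometheus address, the exact metric name, and the label names depend on your installation:

```python
# Rough sketch: query Istio's request-size metric from the Prometheus HTTP API.
# The Prometheus URL, metric name, and label names are assumptions here.
import requests

PROMETHEUS = "http://prometheus.istio-system:9090"  # assumed in-cluster address

resp = requests.get(
    f"{PROMETHEUS}/api/v1/query",
    params={"query": "sum(rate(istio_request_bytes_sum[5m])) by (destination_service)"},
)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    service = result["metric"].get("destination_service", "unknown")
    bytes_per_sec = float(result["value"][1])
    print(f"{service}: {bytes_per_sec:.1f} bytes/s inbound")
```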

What is recommended solution for monitoring heterogeneous infrastructure?

I am looking for monitoring tool for the following use cases:
Collect basic metrics about virtual machines (CPU usage, memory usage, I/O, available disk space)
Extract metrics from SQL Server (probably by running some queries)
Extract information from an external service about processing, i.e. how many processing jobs are currently running and for how long. I am thinking about writing Python scripts, but I don't know how to combine them with a monitoring tool
Have the ability to plot charts and manage alerts; it would also be nice to be able to send not only emails but also messages to Slack/MS Teams
I was thinking about Prometheus, because it has wmi_exporter, node_exporter, a SQL exporter, and Alertmanager with the possibility of sending notifications to multiple destinations, but I don't know what to do about the external service and the Python scripts.
Any suggestions?
Prometheus can definitely do what you say you need done. Some of it may not be trivial, but you can definitely fill in the blanks yourself.
E.g. you can get machine metrics basically out of the box by firing up a node_exporter and having it scraped by Prometheus, but I don't think it has information on all running processes, for instance. The latter might require you to write an agent/exporter: a simple web server that exposes metrics on /metrics; there is a Python client library to help with that. Or have said processes (assuming they're your code) push metrics to a Pushgateway instead, if they're short-lived batch jobs.
Oh, and for charts/dashboards you probably want Grafana, as Prometheus' abilities in that area are rather limited and Grafana integrates rather well with Prometheus.
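For the external service and the Python scripts, the exporter approach mentioned above might look something like this (a sketch using the prometheus_client library; the metric names and the way the values are obtained are assumptions, placeholders for your own logic):

```python
# Sketch of a tiny custom exporter for the "external processing service",
# using the prometheus_client library. Metric names and the polling logic
# are placeholders; replace poll_external_service() with a real API call.
import time
from prometheus_client import Gauge, start_http_server

RUNNING = Gauge("processing_jobs_running", "Number of processing jobs currently running")
LONGEST = Gauge("processing_job_longest_seconds", "Age of the longest-running job in seconds")

def poll_external_service():
    # Replace with a real call to the external service's API.
    return {"running": 3, "longest_seconds": 127.0}

if __name__ == "__main__":
    start_http_server(9200)   # exposes /metrics for Prometheus to scrape
    while True:
        status = poll_external_service()
        RUNNING.set(status["running"])
        LONGEST.set(status["longest_seconds"])
        time.sleep(15)
```

For short-lived batch scripts, the same library also offers push_to_gateway, so the script can push its metrics to a Pushgateway instead of exposing an HTTP endpoint.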

How to send server metrics data to statsd?

Our monitoring stack is Grafana + InfluxDB + StatsD.
We use it for application monitoring.
We need to add server metrics (CPU, memory, network connections, etc.) to Grafana, so I'm guessing we'll need some agent to collect server metrics and pass them to StatsD.
Do you know of any agent that can do that? or any other way to implement this?
Probably the simplest option for you would be to switch over to using collectd https://collectd.org/, and replace statsd with the statsd plugin for collectd https://collectd.org/wiki/index.php/Plugin:StatsD
Check https://my-netdata.io.
It can monitor a ton of things, it is a statsd server by itself, it can visualize all metrics by itself and can push all metrics to graphite, opentsdb, prometheus, influxdb, etc.
Free and open source: GPL v3+.
EDIT: it also allows you to send statsd metrics from shell scripts: https://github.com/firehol/netdata/wiki/statsd#sending-statsd-metrics-from-shell-scripts
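For what it's worth, the statsd wire format is simple enough that you can also emit metrics from an ad-hoc Python script with a bare UDP socket (a sketch; the host, port, and metric names are assumptions, so point them at wherever your statsd server, e.g. netdata, is listening):

```python
# Sketch: send statsd metrics with a bare UDP socket, no client library needed.
# The statsd line format is "name:value|type" (c = counter, g = gauge, ms = timer).
# Host/port and metric names are assumptions; adjust to your statsd server.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
statsd_addr = ("127.0.0.1", 8125)

sock.sendto(b"backup.runs:1|c", statsd_addr)                 # count one backup run
sock.sendto(b"backup.duration:5320|ms", statsd_addr)         # timing in milliseconds
sock.sendto(b"backup.size_bytes:104857600|g", statsd_addr)   # gauge
```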

How do I retrieve data from statsd?

I'm looking through their documentation here:
http://www.rubydoc.info/github/github/statsd-ruby/Statsd
There are methods for recording data, but I can't seem to find anything about retrieving recorded data. I'm adopting a project with an existing statsd integration. Its host is likely a defunct URL. Perhaps the host is where those stats are recorded?
The statsd server implementations that Mircea links to just take care of receiving and aggregating metrics and publishing them to a backend service. Etsy's statsd definition:
A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP or TCP and sends aggregates to one or more pluggable backend services (e.g., Graphite).
To retrieve the recorded data you have to query the backend. Check the list of available backends. The most common one is Graphite.
See also this question: How does StatsD store its data?
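As a concrete example of querying the backend: if your backend is Graphite, you can pull the recorded series back out through its render API (a sketch; the Graphite host and metric path are assumptions):

```python
# Sketch: retrieve recorded data from a Graphite backend via its render API.
# The Graphite host and metric path are assumptions; adjust to your setup.
import requests

resp = requests.get(
    "http://graphite.example.com/render",
    params={
        "target": "stats.myapp.requests",  # hypothetical metric path
        "from": "-1h",                     # last hour of data
        "format": "json",
    },
)
resp.raise_for_status()

for series in resp.json():
    for value, timestamp in series["datapoints"]:
        if value is not None:
            print(timestamp, value)
```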
There are 2 parts to statsd: a client and a server.
What you're looking at is the client part. You will not see functionality related to retrieving the data as it's not there - it normally is on the server side.
Here is a list of statsd server implementations:
http://www.joemiller.me/2011/09/21/list-of-statsd-server-implementations/
Research and pick one that fits your needs.
Statsd originally started at etsy: https://github.com/etsy/statsd/wiki

Capture / Monitor system data of application server in Graphite

I am using a Graphite server to capture my metrics data and turn it into graphs. I have 4 application servers in a load-balanced setup. My aim is to capture system data such as CPU usage, memory usage, disk load, etc., for all 4 application servers. I set up a Graphite environment on a separate server, and I want to push the system data from all the application servers to Graphite and have it displayed as graphs. I don't know what needs to be done to feed system data into Graphite. My thinking was to install StatsD on all application servers and feed the system data to Graphite, but it looks like StatsD is geared toward application data rather than system data.
Can anyone help me get on the right track? Thanks in advance.
Running collectd with a Graphite agent would be an excellent start to gather the information you're after.
There are almost unlimited ways to get your data into Graphite.
You can find a list of tools that have known to work very well with graphite on the readthedocs.org page: http://graphite.readthedocs.org/en/0.9.10/tools.html
There is also an example script that gathers load average from the system in the carbon project: example-client.py
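In the same spirit as example-client.py, a minimal script that pushes a system metric straight to Carbon's plaintext port might look like this (a sketch; the Carbon host and metric prefix are assumptions):

```python
# Sketch: push a system metric (load average) to Carbon's plaintext listener.
# Carbon host, port (2003 is the default), and metric prefix are assumptions.
import os
import socket
import time

CARBON_HOST = "graphite.example.com"
CARBON_PORT = 2003
HOSTNAME = socket.gethostname().replace(".", "_")

load1, load5, load15 = os.getloadavg()
now = int(time.time())
lines = [
    f"servers.{HOSTNAME}.load.1min {load1} {now}",
    f"servers.{HOSTNAME}.load.5min {load5} {now}",
]

sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
sock.sendall(("\n".join(lines) + "\n").encode())
sock.close()
```

Run something like this from cron (or a small loop) on each of the 4 application servers and the metrics will show up under the servers.* tree in Graphite.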

Resources