Any API to get service-level performance metrics - monitoring

We would like to know if there is a way to get service-level monitoring metrics (requests per second, latency per request, etc.) from a Kubernetes Service.
I understand that if the Kubernetes Service is created with type LoadBalancer, we can leverage the cloud provider's interfaces for those metrics; however, I would like to know whether there is any provision to get these metrics at the service or container level without additional latency.

Not presently. This is being tracked in issue 9215. As is pointed out in the issue, use of iptables makes this non-trivial.

Related

How to collect messages (total number and size) between microservices?

I have a microservices based software architecture.
There is a PHP application which orchestrates the communication among the microservices and the application's whole logic.
I need to represent the communication between microservices as a graph.
There will be edges with weights, which will represent the affinities between microservices.
I am searching for a tool to collect all messages and their sizes.
I have read that there are distributed tracing systems like Zipkin, which I have already deployed, that could accomplish this task.
But I cannot find how to collect the messages I want.
This is the PHP library I used for the instrumentation of my app:
[https://github.com/openzipkin/zipkin-php]
Any ideas about other tools, or how to use Zipkin differently to achieve my goal?
Let me add my two cents to this thread. Speaking of Envoy: yes, when attached to your application it adds a lot of useful observability features, e.g. network-level statistics and tracing.
Here is the question: have you considered running your legacy apps inside a service mesh, like Istio?
Istio simplifies the deployment and configuration of Envoy for you. It injects a sidecar container (istio-proxy, in fact an Envoy instance) into your application Pod, and gives you extra features like a set of service metrics out of the box*.
Example: stats produced by Envoy in Prometheus format, like istio_request_bytes, are visualized in the Kiali metrics dashboard for inbound traffic as request_size (check the screenshot).
*As mentioned by David Kruk, you still need to have a Prometheus server deployed in your cluster to be able to pull these metrics into the Kiali dashboards.
You can learn more about Istio here. There is also a dedicated section on how to visualize metrics collected by Istio (e.g. request size).
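To give a concrete feel for what you get, here is a minimal sketch that pulls the average request size per destination service through the Prometheus HTTP API. The in-cluster Prometheus address is a placeholder, and the query assumes Istio's standard istio_request_bytes histogram is being recorded; adjust both to your setup.

    import requests

    # Hypothetical in-cluster Prometheus address; adjust to where Prometheus runs.
    PROMETHEUS = "http://prometheus.istio-system.svc:9090"

    # Average request size per destination service over the last 5 minutes.
    # istio_request_bytes is a histogram, so we divide the _sum series by _count.
    query = (
        "sum by (destination_service) (rate(istio_request_bytes_sum[5m]))"
        " / "
        "sum by (destination_service) (rate(istio_request_bytes_count[5m]))"
    )

    resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": query}, timeout=10)
    resp.raise_for_status()

    for series in resp.json()["data"]["result"]:
        service = series["metric"].get("destination_service", "unknown")
        avg_bytes = float(series["value"][1])
        print(f"{service}: {avg_bytes:.1f} bytes per request on average")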

How to configure Prometheus in a multi-location scenario?

I love using Prometheus for monitoring and alerting. Until now, all my targets (nodes and containers) lived on the same network as the monitoring server.
But now I'm facing a scenario where we will deploy our application stack (as a bunch of Docker containers) to several client machines in their networks. Nearly all of the clients' networks are behind a firewall or NAT, so scraping becomes quite difficult.
As we're still accountable for our stack, I'd like to have a central monitoring server, alerting and dashboards.
I was wondering what the best architecture could be if I want to implement this with Prometheus, but I couldn't find any convincing approaches. My ideas so far:
Use a Pushgateway on our side and push all data out of the client networks. As the docs state, it's not intended to be used that way: https://prometheus.io/docs/practices/pushing/
Use a federation setup (https://prometheus.io/docs/prometheus/latest/federation/): place a Prometheus server in every client network behind a reverse proxy (to enable SSL and authentication) and aggregate relevant metrics there. Open/forward just a single port for federation scraping.
Other more experimental setups, such as SSH Tunneling (e.g. here https://miek.nl/2016/february/24/monitoring-with-ssh-and-prometheus/) or VPN!?
Thank you in advance for your help!
Nobody has posted an answer, so I will give my opinion on the second option, because that's what I think I would do in your situation.
The second setup seems the most flexible: you have access to the data and only need to open one port for the federating server, so it should still be secure.
Another bonus of this type of setup is that even if the connection breaks for one reason or another, the local Prometheus will keep scraping. You will get an alert because you can't reach the server(s), but when the connection comes back you will have all the data, so there won't be a hole in the Grafana dashboards apart from during the incident itself.
The issue with this setup is that you need to maintain as many Prometheus servers as there are client networks. A solution for this would be to have a Packer image, or maybe an Ansible playbook, to automate their deployment.
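To illustrate what the central server would actually be scraping in that federation setup, here is a small sketch against a client-side Prometheus's /federate endpoint. The reverse-proxy URL, credentials and match[] selectors are placeholders; in the real setup the central Prometheus performs this scrape itself via a federation job in its scrape config.

    import requests

    # Hypothetical reverse-proxy address and credentials in front of one client's Prometheus.
    CLIENT_PROM = "https://client1.example.com/prometheus"
    AUTH = ("federation-user", "s3cret")

    # /federate returns the current value of every series matching the match[]
    # selectors, in the same text exposition format Prometheus itself scrapes.
    params = {"match[]": ['{job="node"}', '{__name__=~"job:.*"}']}

    resp = requests.get(f"{CLIENT_PROM}/federate", params=params, auth=AUTH, timeout=10)
    resp.raise_for_status()
    print(resp.text[:500])  # inspect a few of the federated series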

Collect metrics from Docker containers in Kubernetes

I work with Docker and Kubernetes.
I would like to collect application-specific metrics from each Docker container.
There are various applications, each running in one or more containers.
I would like to collect the metrics in JSON format in order to perform further processing on each type of metrics.
I am trying to understand what the best practice is, if any, and what tools I can use to achieve my goal.
Currently I am looking into several options, none looks too good:
Connecting with kubectl, getting a list of pods, and performing a command (exec) on each pod to make the application print/send JSON with metrics. I don't like this option, as it means I need to be aware of which pods exist and access each one, while the whole point of having Kubernetes is to avoid dealing with this.
I am looking for a Kubernetes API HTTP GET request that will allow me to pull a specific file.
The closest I found is GET /api/v1/namespaces/{namespace}/pods/{name}/log, and it seems it is not quite what I need.
And again, it forces me to address each pod by name.
I am considering using an ExecAction in a Probe to send JSON with metrics periodically. It is a hack (as this is not the purpose of a Probe), but it removes the need to handle each specific pod.
I can't use Prometheus for reasons that are out of my control, but I wonder how Prometheus collects metrics. Maybe I can use a similar approach?
Any possible solution will be appreciated.
From an architectural point of view you have 2 options here:
1) Pull model: your application exposes metrics data through some mechanism (for instance over HTTP on a separate port) and an external tool scrapes your pods at a timed interval (getting pod addresses from the k8s API); this is the model used by Prometheus, for instance.
2) Push model: your application actively pushes metrics to an external server, typically a time-series database such as InfluxDB, when it is most relevant to it.
In my opinion, option 2 is the easiest to implement, because:
you don't need to deal with the k8s APIs in order to discover pod addresses;
you don't need to create a local storage layer to store your metrics (because you push them one by one as they occur).
But there is a drawback: you need to be careful how you implement this; it could slow down your API, and you might need to make the calls to your metrics server asynchronous.
This is obviously a very generic answer, but I hope it could point you in the right direction.
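As a rough illustration of the push model (the collector URL and the JSON payload shape below are made up for the example, not any specific product's API):

    import json
    import time
    import urllib.request

    # Hypothetical collector endpoint that accepts JSON metrics; replace with your
    # own time-series database or ingestion service.
    COLLECTOR_URL = "http://metrics-collector.example.com/ingest"

    def push_metric(name, value, labels=None):
        """Send a single metric sample as JSON, fire-and-forget style."""
        payload = {
            "name": name,
            "value": value,
            "labels": labels or {},
            "timestamp": time.time(),
        }
        req = urllib.request.Request(
            COLLECTOR_URL,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            urllib.request.urlopen(req, timeout=2)
        except OSError:
            pass  # never let metrics delivery break the application path

    # Example: record a request duration from inside the application.
    push_metric("request_duration_seconds", 0.042, {"endpoint": "/orders"})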
It's a pity you cannot use Prometheus, but it's a good reference for what can be done in this space. What Prometheus does is as follows:
1: It assumes that the metrics you want to scrape (collect) are exposed via some HTTP endpoint that Prometheus can access.
2: It connects to the Kubernetes API to perform discovery of endpoints to scrape metrics from (there is a config for it, but generally it means it has to be able to connect to the API, list services/deployments/pods, and analyze their annotations (as they have info about the metrics endpoints) to compose a list of places to scrape data from).
3: Periodically (every 15s, 60s, etc.) it connects to those endpoints and collects the exposed metrics.
That's it. The rest is storage/post-processing. The Kubernetes-related part might be a significant amount of work, though, so it would be much better to go with something that already exists.
Sidenote: while this is generally a pull-based model, there are cases where pulling is not possible (e.g. short-lived scripts like PHP), which is where the Prometheus Pushgateway comes into play, allowing metrics to be pushed to a place Prometheus will pull from.
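For reference, here is a minimal sketch of that discover-and-scrape loop. It assumes the code runs in-cluster with permission to list pods; the prometheus.io/* annotation names and the /metrics path are borrowed from the common Prometheus convention and are only an assumption here, so adapt them to whatever your pods actually expose.

    import requests
    from kubernetes import client, config

    config.load_incluster_config()  # assumes in-cluster credentials with RBAC to list pods
    v1 = client.CoreV1Api()

    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        annotations = pod.metadata.annotations or {}
        if annotations.get("prometheus.io/scrape") != "true":
            continue
        port = annotations.get("prometheus.io/port", "8080")
        path = annotations.get("prometheus.io/path", "/metrics")
        ip = pod.status.pod_ip
        if not ip:
            continue  # pod not scheduled or not ready yet
        try:
            resp = requests.get(f"http://{ip}:{port}{path}", timeout=2)
            print(pod.metadata.name, resp.status_code, len(resp.text), "bytes")
        except requests.RequestException as err:
            print(pod.metadata.name, "scrape failed:", err)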

microservices & service discovery with random ports

My question is related to microservices & service discovery of a service which is spread between several hosts.
The setup is as follows:
2 docker hosts (host A & host B)
a Consul server (service discovery)
Let’s say that I have 2 services:
service A
service B
Service B is deployed 10 times (with random ports): 5 times on host A and 5 times on host B.
When service A communicates with service B, for example, it sends a request to serviceB.example.com (hard coded).
In order to get an IP and a port, service A should query the Consul server for an SRV record.
It will get 10 ip:port pairs, for which the client should apply some load-balancing logic.
Is there a simpler way to handle this without me developing a client resolver (+LB) library for that matter?
Is there anything like that already implemented somewhere ?
Am I doing it all wrong ?
There are a few options:
Load balance on the client as you suggest, for which you'll either need to find a ready-built service discovery library that works with SRV records and handles load balancing and circuit breaking, or build your own. Another answer suggested Netflix's Ribbon, which I have not used and which will only be interesting if you are on the JVM. Note that if you are building your own, you might find it simpler to just use Consul's HTTP API for discovering services rather than DNS SRV records; that way you can also "watch" for changes rather than caching the list and letting it get stale (see the sketch after these options).
If you don't want to reinvent that particular wheel, another popular and simple option is to use an HAProxy instance as the load balancer. You can integrate it with Consul via consul-template, which will automatically watch for new/failed instances of your services and update the LB config. HAProxy then provides robust load balancing and health checking with a lot of options (HTTP/TCP, different balancing algorithms, etc.). One possible setup is to have a local HAProxy instance on each Docker host and a fixed port assigned statically to each logical service (you can store it in the Consul KV store), so you connect to localhost:1234 for service A, for example, and localhost:2345 for service B. A local instance means you don't pay for an extra round trip to a load-balancer instance and then to the actual service instance, but this might not be an issue for you.
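To make the first option concrete, here is a minimal client-side resolver sketch against Consul's HTTP health API (the agent address and service name are placeholders, and a random choice stands in for whatever load-balancing policy you actually want):

    import random
    import requests

    # Local Consul agent address (placeholder); /v1/health/service is Consul's
    # standard HTTP API and returns only healthy instances with passing=true.
    CONSUL = "http://127.0.0.1:8500"

    def resolve(service_name):
        """Return a (host, port) pair for one healthy instance, chosen at random."""
        resp = requests.get(
            f"{CONSUL}/v1/health/service/{service_name}",
            params={"passing": "true"},
            timeout=2,
        )
        resp.raise_for_status()
        instances = resp.json()
        if not instances:
            raise RuntimeError(f"no healthy instances of {service_name}")
        entry = random.choice(instances)
        address = entry["Service"]["Address"] or entry["Node"]["Address"]
        return address, entry["Service"]["Port"]

    host, port = resolve("serviceB")
    print(f"calling serviceB at {host}:{port}")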
I suggest you check out Kontena. It will solve this kind of problem out of the box. Every service gets an internal DNS name that you can use for communication between services. Kontena also has a built-in load balancer that is very easy to use, which makes it simple to create and scale microservices.
There are also lots of built-in features that help with developing containerized applications, like a private image registry, VPN access to running services, secrets management, stateful services, etc.
Kontena is an open-source project and the code is available on GitHub.
If you are looking for a minimal setup, you can wrap the values you receive from Consul with Ribbon, Netflix's client-side load balancer.
You will find it as a module for Spring Cloud.
I didn't find an up-to-date standalone example, only this link to chrisgray's dropwizard-consul implementation, which uses it in a Dropwizard context. But it might serve as a starting point for you.

Best practice for rate limiting users of a REST API?

I am putting together a REST API, and as I'm unsure how it will scale or what the demand for it will be, I'd like to be able to rate limit its use as well as temporarily refuse requests when the box is over capacity or in some kind of slashdotting scenario.
I'd also like to be able to gracefully bring the service down temporarily (while giving clients results that indicate the main service is offline for a bit) when/if I need to scale the service by adding more capacity.
Are there any best practices for this kind of thing? The implementation is Rails with MySQL.
This is all done with an outer web server that listens to the world (I recommend nginx or lighttpd).
Regarding rate limits, nginx is able to limit requests, e.g. to 50 req/minute per IP; everything over that gets a 503 page, which you can customize.
Regarding an expected temporary shutdown, in the Rails world this is done via a special maintenance.html page. There is some kind of automation that creates or symlinks that file when the Rails app servers go down. I'd recommend relying not on the file's presence, but on the actual availability of the app server.
But really you are able to start/stop services without losing any connections at all. That is, you can run a separate instance of the app server on a different UNIX socket/IP port and have the balancer (nginx/lighty/haproxy) use that new instance too. Then you shut down the old instance and all clients are served by only the new one. No connections are lost. Of course, this scenario is not always possible; it depends on the type of change you introduced in the new version.
HAProxy is a balancer-only solution; it can balance requests to the app servers in your farm extremely efficiently.
For a fairly big service you end up with something like:
api.domain resolving to N round-robin balancers
each balancer proxying requests to M web servers for static content and P app servers for dynamic content. Oh well, your REST API doesn't have static files, does it?
For a fairly small service (under 2K rps) all balancing is done inside one or two web servers.
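For illustration, the per-IP limiting nginx does can be approximated with a simple fixed-window counter like the sketch below. This is only to show the idea; in practice you would keep it in the web server (or a shared store) rather than in-process, as recommended above.

    import time
    from collections import defaultdict

    # Minimal fixed-window rate limiter: at most LIMIT requests per WINDOW seconds
    # per client IP. In-memory only, so it is per-process; a real deployment would
    # enforce this in nginx or back it with a shared store such as Redis.
    LIMIT = 50
    WINDOW = 60  # seconds

    _counters = defaultdict(lambda: [0, 0.0])  # ip -> [count, window_start]

    def allow_request(ip):
        count, window_start = _counters[ip]
        now = time.time()
        if now - window_start >= WINDOW:
            _counters[ip] = [1, now]  # start a new window
            return True
        if count < LIMIT:
            _counters[ip][0] += 1
            return True
        return False  # caller should respond with 429 or 503

    # Example: the 51st request from the same IP within one minute is rejected.
    results = [allow_request("203.0.113.7") for _ in range(51)]
    print(results.count(True), "allowed,", results.count(False), "rejected")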
Good answers already. If you don't want to implement the limiter yourself, there are also solutions like 3scale (http://www.3scale.net), which does rate limiting, analytics, etc. for APIs. It works using a plugin (see here for the Ruby API plugin) which hooks into the 3scale architecture. You can also use it via Varnish and have Varnish act as a rate-limiting proxy.
I'd recommend implementing the rate limits outside of your application, since otherwise high traffic will still have the effect of killing your app. One good solution is to implement it as part of your Apache proxy, with something like mod_evasive.

Resources