I love using Prometheus for monitoring and alerting. Until now, all my targets (nodes and containers) lived on the same network as the monitoring server.
But now I'm facing a scenario where we will deploy our application stack (as a bunch of Docker containers) to several client machines in their networks. Nearly all of the client networks are behind a firewall or NAT, so scraping becomes quite difficult.
As we're still accountable for our stack, I'd like to have a central monitoring server, alerting and dashboards.
I was wondering what the best architecture could be if I want to implement this with Prometheus, but I couldn't find any convincing approaches. My ideas so far:
Use a Pushgateway on our side and push all data out of the client networks. As the docs state, it's not intended to be used that way: https://prometheus.io/docs/practices/pushing/
Use a federation setup (https://prometheus.io/docs/prometheus/latest/federation/): place a Prometheus server in every client network behind a reverse proxy (to enable SSL and authentication) and aggregate the relevant metrics there. Open/forward just a single port for federation scraping.
Other, more experimental setups, such as SSH tunneling (e.g. https://miek.nl/2016/february/24/monitoring-with-ssh-and-prometheus/) or a VPN?
Thank you in advance for your help!
Nobody has posted an answer yet, so I'll give my opinion on the second option, because that's what I would do in your situation.
The federation setup seems the most flexible: you have access to the data and only need to open one port for the federating server, so it should still be reasonably secure.
Another bonus of this kind of setup is that even if the connection to a client network stops working for one reason or another, the local Prometheus keeps scraping. You will get an alert because you can't reach that server, but once the connection comes back you still have all the data, so you won't have holes in the Grafana dashboards beyond the incident itself.
The downside of this setup is that you need to maintain as many Prometheus servers as you have client networks. A solution for that would be a Packer image or perhaps an Ansible playbook to automate the deployment.
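For illustration, here is a minimal sketch of what the scrape job on the central Prometheus could look like for one client network, assuming TLS and basic auth are terminated at the client-side reverse proxy (hostnames, job names and credentials below are placeholders):

    scrape_configs:
      - job_name: 'federate-client-a'      # one job per client network
        scrape_interval: 60s
        honor_labels: true                 # keep the labels set by the client-side Prometheus
        metrics_path: '/federate'
        scheme: https                      # TLS terminated at the client's reverse proxy
        params:
          'match[]':
            - '{job=~"node|cadvisor"}'     # only pull the aggregated series you need centrally
        basic_auth:                        # credentials checked by the reverse proxy
          username: prometheus
          password: <client-a-password>
        static_configs:
          - targets: ['client-a.example.com:9443']   # the single forwarded port

Alerting rules and Grafana then only need to talk to the central server, while each client-side Prometheus keeps its full-resolution data locally.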
Related
tl;dr: does Docker Swarm have an enforced, centralised proxy setting that explicitly proxies all internet traffic of all services hosted in the cluster? Or any other tips on how to go about a global proxy solution in a Swarm cluster?
Note: this is not a question about a reverse proxy.
I have a Docker Swarm cluster (moving to Kubernetes as a solution is off-topic).
I have 3 managers and 3 workers, and I label the workers according to the containers they are expected to host. The cluster only deploys Docker Swarm services; when I write "container" in this post, I'm referring to a Docker Swarm service container.
One of the workers is unlabelled, though active, and therefore does not host containers for any service. If I labelled that worker so it could host any container, I would run into issues with various firewalls that I don't always control, because its IP is simply not allowed.
This means I can't scale horizontally: when I add a new worker to the cluster, I also add a new IP that requests can originate from. The number of firewalls that would need to be updated after every scaling event is quite large, and updating them all is simply not an option.
In my attempt to solve this on my own, I did what every desperate developer does and googled for a solution... and there is simple, official documentation for exactly this: https://docs.docker.com/network/proxy/
I followed the environment-variable examples on that page. However, that didn't really help: none of the traffic goes through the proxy I configured. After some digging, I noticed that this is because Node.js (all services are written in Node.js) ignores the proxy settings set through the environment. To make Node.js honour these proxy settings, I would have to refactor a lot of components in a lot of services... a workload that is quite tremendous and possibly dangerous, given the different protocols and ports I use to connect to various infrastructure services outside the cluster...
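For reference, this is roughly the shape of the stack file I ended up with after following that page (the image name and proxy address are placeholders):

    version: "3.8"
    services:
      api:
        image: registry.example.com/my-node-service:latest
        environment:
          HTTP_PROXY: "http://proxy.example.com:3128"
          HTTPS_PROXY: "http://proxy.example.com:3128"
          NO_PROXY: "localhost,127.0.0.1,.internal"
        deploy:
          replicas: 2
          placement:
            constraints:
              - node.labels.tier == backend   # workers are labelled for the workloads they may host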
I expect there to be a better solution for this: built-in functionality that forces all internet access from the containers through this proxy, a setting I don't have to make in the code of my implementations, a wrapping solution that I can control centrally.
Reading this again, I think maybe I should have tested the Docker client configuration on the same page to see if it has the desired effect, but I assume both approaches have the same outcome, since they are described on the same page with no noticeable difference mentioned in the documentation.
My question is: is there a solution, which I just don't seem to be able to find, that wraps this proxy functionality around all the services? Or is it a requirement to solve these issues in the implementation itself?
My thought is to maybe depend on a base image, which in turn depends on the Node.js image I use today, and make that image responsible for the wrapping functionality, though that is still at the implementation level. It would also still force a distributed solution of this kind on every service: if I need to change the proxy configuration, I need to change it everywhere and redeploy everything... absent a less complex solution with a common data access layer.
The Google Cloud Platform Kubernetes Engine based backend deployment I work on has between 4 and 60 nodes running at all times, spanning two different services.
However, I want to interface with an API that employs IP whitelisting, which means that all outgoing requests would have to be funneled through one single IP address.
How do I do this? The deployment uses an Nginx Ingress controller, which doesn't allow many options when it comes to the egress part of things.
I tried setting up a VM outside of the deployment, but still on GCP in the same region, and was unable to set up a forward proxy there. At least, not one that I could connect to from my local device; I'm not sure whether that was because of GCP's firewall or something of that sort. I tried this with Squid as well as Apache, with no success in either.
I also looked at the Cloud NAT option, but it seems I would have to recreate all the services, CI/CD pipelines, DNS settings, etc. I would ideally like to avoid that, as it would be a few days' worth of work and would also require some downtime of the systems.
Ideally I would have a working forward proxy. I tried looking for Docker images that would function as one, but that does not seem to be a thing, sadly. SSHing into a VM to set up such a proxy hasn't led to success yet, either.
You have already found the solution: you have to rebuild things using either Cloud NAT or an equivalent solution of your own. Even that option is relatively recent and I haven't actually tried it myself; as recently as six months ago we were told this was not supported for GKE. Our solution was the proxy idea you mentioned: an HTTP proxy running outside of GKE, with traffic directed through it at the application-code level rather than at the infrastructure level. It was not fun.
I’m doing on-prem deployments using docker swarm and I need application and DB high availability.
As far as application HA is concerned, it works great within Docker (service discovery and load balancing), but I'm not sure how to expose it on my network. I mean: how can I assign a virtual IP to all of my Docker managers so that if any of them goes down, that virtual IP automatically points to another Docker manager in the cluster? I don't want a single point of failure in my architecture, which is why I'm not inclined to put any (single) reverse proxy solution in front of my Swarm cluster (because, to my understanding, if that nginx/HAProxy instance goes down, the whole system goes into the abyss; I would love to learn that I'm wrong).
Secondly, I use WebSockets in my application for push notifications, which don't behave well with all the load balancing because the socket handshakes get broken.
I want a solution to these problems without writing anything HA-specific and non-generic in code (like hard-coding IPs). Any suggestions? I hope I explained my problem correctly.
Docker Flow Proxy or Traefik can be placed on the set of Swarm nodes that you want to receive incoming connections, and use DNS routing to get packets to the correct containers. Both have a sticky-sessions option (I know Docker Flow Proxy does; I'm not sure about Traefik).
Then you can either:
1. If your incoming connections are just client HTTP/S requests, you can use DNS round robin with multiple A records, which works great; or
2. Buy an expensive, fault-tolerant hardware reverse proxy like an F5; or
3. Use some network-layer IP failover at the OS and physical-network level (not really related to Docker), though I'm not sure how well that would work with Swarm.
Number 2 is the typical solution in private datacenters that need full HA at all layers.
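For illustration, here is a rough sketch of what the ingress tier could look like with Traefik on Swarm (Traefik v2 does ship a sticky-cookie option; service names, domain and port below are placeholders, so check the Traefik docs for your version):

    version: "3.8"
    services:
      traefik:
        image: traefik:v2.10
        command:
          - --providers.docker.swarmMode=true
          - --providers.docker.exposedByDefault=false
          - --entryPoints.web.address=:80
        ports:
          - "80:80"
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
          placement:
            constraints:
              - node.role == manager        # or a dedicated "edge" node label
      app:
        image: registry.example.com/my-app:latest
        deploy:
          replicas: 3
          labels:                           # in Swarm mode Traefik reads labels from the service, under deploy
            - traefik.enable=true
            - traefik.http.routers.app.rule=Host(`app.example.com`)
            - traefik.http.services.app.loadbalancer.server.port=3000
            - traefik.http.services.app.loadbalancer.sticky.cookie=true   # keeps a WebSocket client pinned to one replica

DNS round robin (multiple A records) would then point at the nodes running Traefik, so losing one of them doesn't take the entry point down.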
I am developing a Spring Boot application with the Netflix cloud stack and deploying each module (microservice) in a separate Docker container. The structure is as follows:
Eureka
Zuul
Business logic in Microservices
MySQL
Angular4 UI
Keycloak - User management and Authentication
ELK - for log maintenance
Hystrix
Zipkin
Okay, so after facing a lot of problems and spending a whole lot of network bandwidth googling the matter, I have deployed it in the following way. What I need to know is whether this is the correct way to do it.
The limitation here is that I have been provided with 2 hosts to test this configuration, and there is no further action plan yet.
So here is what I have done (I have not yet used the full stack mentioned above):
Server 1
Eureka
Zuul
ELK
Server 2
Keycloak
Business Logic microservices
MySQL
Angular4 UI
I haven't configured or used Hystrix and Zipkin yet.
So I have given the IP:PORT of Server 1 in the Eureka configuration of all the microservices that need to register with Eureka. The same goes for Zuul (it is given the IP:PORT of Eureka).
In the Angular4 UI I have given the URL:PORT of the Zuul deployment, because all the services are called through Zuul.
This, I understand, is correct, because the services need to know where Eureka is located and the rest can be managed through Eureka.
Now my key question: since MySQL and ELK can't be registered with Eureka, is it correct to give the IP:PORT of MySQL and ELK wherever they are required?
The same goes for the configuration of ELK. With ELK, my requirement is also that all the logs end up in a common place. For this I have used Docker volume mounting, but I don't know how to accomplish it in a multi-host environment; I can only make the containers write their logs to an external volume, which could then probably be accessed by ELK over a URL. I haven't tested this configuration yet.
If so, isn't this configuration not very independent, if we expect it to be able to manage itself?
I have configured my Docker Compose files to use "network_mode": host so that host-to-host Docker communication works.
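To make it concrete, each microservice's application.yml currently looks roughly like this (IPs, ports and credentials below are placeholders):

    eureka:
      client:
        serviceUrl:
          defaultZone: http://<SERVER1_IP>:8761/eureka/   # Eureka runs on Server 1
    spring:
      datasource:
        url: jdbc:mysql://<SERVER2_IP>:3306/appdb         # MySQL addressed directly, since it can't register with Eureka
        username: app
        password: <secret>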
Again, all I need to know is: is my configuration/architecture correct for a multi-host environment and, in the future, for cloud environments?
If not, then please kindly guide me onto the correct path.
Thank you!
P.S. Excuse my English and grammar; I have tried my best to make this understandable. Please point things out and ask questions if you need more input from my side.
This kind of question is really beyond the scope of Stack Overflow, but it really sounds like you haven't yet come to understand the pieces of your infrastructure.
The Netflix stack (Eureka, Zuul, etc.) and things like Zipkin, Hystrix and the whole ELK stack only start to make sense when you have really large, multi-site deployments of many services across many hosts, where managing things "by hand" becomes a real problem and where the architecture has so many moving parts that something can break (a host disconnects, a database node dies) and your system still needs to keep running.
With 2 hosts and a couple of services it doesn't make sense to introduce all this complexity; it will just overwhelm and confuse you (it already has). If one of your 2 hosts dies, using Eureka and Zuul will not save you: the whole system will go down.
Throw out all those latest buzzword libraries (you're not Netflix yet) and just think through a simple architecture where you run your services on, say, one host and the database on another host (no need for Eureka or Zuul). Think of a shared location for logs and organise a nice, easy-to-use folder structure to store them, so they're easy to find and search with simple command-line tools, which are much better than Kibana (which is TERRIBLE for looking at logs).
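For example, one low-tech way to get that shared log location with Docker Compose, assuming each service already writes its logs to a file inside the container (image names and paths below are placeholders):

    services:
      orders-service:
        image: registry.example.com/orders-service:latest   # hypothetical service image
        volumes:
          - /srv/logs/orders:/app/logs    # each service gets its own folder under /srv/logs on the host

grep, less and tail on /srv/logs then go a long way before you need anything like Kibana.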
Stay simple and only introduce new pieces when you feel it is getting difficult to manage.
I'm compiling things on a Raspberry Pi and it's not going fast enough, even when I use my desktop's CPU to help.
I could just install distcc the old-fashioned way on a cloud server, but what if someday I wanted to quickly spin up a bunch of servers for a few minutes with Docker Machine?
distccd can use SSH auth, but I don't see a good way to run both SSH and distccd, and it seems there would be some hassle managing SSH keys.
What if I configured distccd to only accept connections from the WAN IP of my house (and then shut the instances down as soon as the build was done)?
But it would be great to make something that other Raspberry Pi users could easily spin up.
You seem to already know the answer to this: set up distcc to use SSH. This ensures encrypted communication between your distcc client and the distcc servers you have deployed as Docker images in the cloud. You have highlighted that the cost of doing this would be the time spent setting up an SSH key that is accepted by all of your Docker images. From memory, this key could be the same for all the Docker nodes, as long as they all use the same user name with the same key. Is that really such a complex task?
You ask about a slightly less secure option for building your compile farm. Limiting access based on the internet-facing IP address of your house would limit the exposure, although it also makes it harder for others to use your build cluster. Someone might spoof that IP address and get access to your distcc servers, but that would just cost you their runtime. The larger concern is that your code would be transmitted in plain text over the internet to these distcc servers. If that is not a big concern, then it could be considered low risk.
An alternative might be to set up a secure remote network of Docker nodes and set up VPN access to them. This would bind your local machine to the remote network, and you could consider the whole thing a secured LAN. If it is considered safe to have the Docker nodes talk to each other unencrypted within the cloud, it should be just as secure to have a VPN link to them and do the same.
The best option might be to dig out some old PCs and set those up as local distcc servers. Within a LAN there is no need for this kind of security.
You mention a wish to share this with other Raspberry Pi users. There have been public compile farms in the past, but many of them have fallen out of favour. Distributing such things publicly, as computational projects such as BOINC do, works poorly because network latency and transfer rates can slow the builds significantly.