The Google Cloud Platform Kubernetes Engine based backend deployment I work on has between 4 and 60 nodes running at all times, spanning two different services.
However, I want to interface with an API that employs IP whitelisting, which means that all outgoing requests would have to be funneled through a single IP address.
How do I do this? The deployment uses an Nginx Ingress controller, which doesn't allow many options when it comes to the egress part of things.
I tried setting up a VM outside of the deployment, but still on GCP in the same region, and was unable to set up a forward proxy. At least, not one that I could connect to from my local device. I'm not sure whether this was because of GCP's firewall or something of that sort. I tried both Squid and Apache, with no success in either case.
I also looked at the Cloud NAT option, but it seems like I would have to recreate all the services, CI/CD pipelines, DNS settings, etc. I would ideally avoid that, as it would be a few days' worth of work and would call for some downtime of the systems as well.
Ideally I would have a working forward proxy. I tried looking for Docker images that would function as one, but that does not seem to be a thing, sadly. SSHing into a VM to set up such a proxy hasn't led to success yet, either.
You have already found the solution: you have to rebuild things using either Cloud NAT or an equivalent solution you build yourself. Even that is relatively recent and I've not actually tried it myself; as recently as 6 months ago we were told this was not supported for GKE. Our solution was the proxy idea you mentioned: an HTTP proxy running outside of GKE, with traffic directed through it at the app code level rather than the infrastructure level. It was not fun.
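For what it's worth, here is a minimal sketch of what "directing things through it at the app code level" can look like, using only the Python standard library. The proxy address is hypothetical (it stands for the VM with the whitelisted static IP), and your services may well be written in another language, so treat this as an illustration of the idea rather than what we actually ran.

```python
import urllib.request

# Hypothetical address of the forward proxy VM (e.g. Squid) that owns the
# whitelisted static IP. Replace with your own host and port.
PROXY_URL = "http://10.128.0.5:3128"


def build_proxied_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS requests via the proxy."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    # Passing our own ProxyHandler replaces the default one, so the
    # environment's proxy variables are not consulted for this opener.
    return urllib.request.build_opener(proxy_handler)


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL through the forward proxy and return the response body."""
    opener = build_proxied_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Every outbound call to the whitelisted API then goes through `fetch` (or an equivalent wrapper in your HTTP client of choice), which is exactly the part that was not fun: each service has to be touched.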
My organization manages systems where each client is provisioned a VPS and then their tech stack is spun up on that system via Docker Compose.
Data is stored on-system, using Docker Compose volumes. None of the fancy named storage - just good old direct path volumes.
While this solution is workable, the problem is that this method does not scale. We can always give the VPS more CPU/Memory but that does not fix the underlying issues.
Staging / development environments must be brought up manually - and there is no service redundancy. Hot swapping is impossible with our current system.
Kubernetes has been pitched to me to solve our problems, but honestly I have no idea where to begin - most of the documentation is obtuse and I have failed to find somebody with our particular predicament.
The end goal would be to have just a few high-spec machines running Kubernetes - with redundancy, staging, and the ability to spin up new clients as necessary (without having to provision additional machines or external IPs).
What specific tools would my organization need to use to achieve this goal?
Are there any tools that would allow us to bring over our existing Docker Compose stacks into Kubernetes?
Where to begin: given what you're telling us, I would first look into my options for implementing some software-defined storage (SDS).
You're currently using local volumes, which you probably won't be able to do with Kubernetes - or at least shouldn't, if you don't want to bind your containers to a unique node.
The easiest way - while not necessarily the one I would recommend - would be to use some NFS servers. Even better: with some DRBD, pacemaker / corosync, using a VIP for failover -- or the FreeBSD way: hastd, carp, ifstated, maybe some zfs. You would probably have to deploy distinct systems as your Kubernetes cluster scales, distributing IOs, ... a single NFS server doesn't last long without its load going over 50 and iowaits spiking ...
A better way would be to look into actual SDS solutions. One I could recommend is Ceph, though there are a lot of newer solutions I'm less familiar with ... and there's GlusterFS, which I would definitely avoid. An easy way to deploy Ceph would be to use ceph-ansible.
Given what corporate hardware you have at your disposal, maybe you would have some NetApp or equivalent, something that can implement NFS shares, and/or some iSCSI gateways.
Now, those are all solutions you could run on the side, although note that you would also find "CNS" solutions (container native), which are meant to be deployed on top of Kubernetes. Ceph clusters can be managed using Rook. These can be interesting, though in terms of maintenance and operations, they require good knowledge of both the solution you operate and kubernetes/containers in general: troubleshooting issues and fixing outages may not be as easy as with a good old bare-metal/VM setup. For a first Kubernetes experience, I would refrain. Once you feel comfortable enough, go ahead.
In any case, another critical consideration before deploying your cluster would be the network that would host your installation. Consider that Kubernetes should not be directly deployed on public instances: you would probably want to have some private VLAN, maybe an internal DNS, a local registry (could be Kubernetes-hosted), or other tools such as an LDAP server, some SMTP relay, HTTP cache/proxies, loadbalancers to put in front of your API, ...
Once you've made up your mind regarding those issues, you can look into deploying a Kubernetes cluster using tools such as Kubespray (ansible) or Kops (targets cloud providers and thus requires some cloud API, eg: aws; it can also generate Terraform configuration). Both projects are part of the Kubernetes project and maintained by its community. Kubespray would cover all scenarios (IAAS & bare-metal), integrates with popular SDS out of the box, can ship with various ingress controllers, ... overall it offers good defaults, and lots of variables to customize your installation.
Start with a 3-master 2-workers cluster, make sure the resulting cluster matches what you would expect.
Before going to prod, take your time to properly translate your existing configurations. Sometimes, refactoring code or images could be worth it.
Going to prod, consider adding a group of "infra" nodes if you want to host some logging solution or other internal services that are somewhat critical to users and shouldn't suffer outages caused by end-user workloads (eg: ingress routers, monitoring, logging, integrated registry, ...).
Kubespray: https://github.com/kubernetes-sigs/kubespray/
Kops: https://github.com/kubernetes/kops
Ceph: https://ceph.com/en/discover/
Ceph Ansible: https://github.com/ceph/ceph-ansible
Rook (Ceph CNS): https://github.com/rook/rook
tl;dr: does Docker Swarm have a forced, central proxy setting that explicitly proxies all internet traffic from all services hosted in the cluster? Or is there any other tip on how to go about using a global proxy solution in a swarm cluster?
Note: this is not a question about a reverse proxy.
I have a docker swarm cluster (moving to Kubernetes as a solution is off-topic).
I have 3 managers and 3 workers, and I label the workers according to the containers they are expected to host. The cluster only deploys docker swarm services; when I write "container" here I'm referring to a docker swarm service container.
One of the workers has no labels, though it is active, and therefore does not host containers for any service. If I labeled the worker to allow it to host any container, I would run into issues with different firewalls that I don't always control, because its IP simply is not allowed.
This means that I can't do horizontal scaling, because when I add a new worker to the cluster, I also add a new IP that requests can originate from. The number of firewalls that would need to be updated for each horizontal scaling step is quite large, so that is simply not an option.
In my attempt to solve this on my own, I did what every desperate developer does and googled for a solution... and there is a simple and official documentation to be able to achieve this: https://docs.docker.com/network/proxy/
I followed the environment variable examples on that page. Doing so did not really help, however: none of the traffic goes through the proxy I configured. After some digging, I noticed that this is because Node.js (all services are written in Node.js) ignores the proxy settings set in the environment. To make Node.js use these proxy settings, I would have to refactor a lot of components in a lot of services... a workload that is quite tremendous and possibly dangerous to perform, given the different protocols and ports I use to connect to different infrastructure services outside the cluster...
I expect there to be a better solution for this, I expect there to be a built in functionality that forces all internet access from the containers to go through this proxy, a setting I don't have to make in the code, in my implementations. I expect there to be a wrapping solution that I can control in a central manner.
Now reading this again, I think maybe I should have tested the docker client configuration on the same page to see if it has the desired effect I'm requiring, but I assume they both would have the same outcome, being described on the same page with no noticeable difference written in the documentation.
My question is, is there a solution, that I just don't seem to be able to find, that wraps the proxy functionality around all the services? or is it a requirement to solve these issues in the implementation itself?
My thought is to maybe depend on an image that in turn depends on the nodejs image I use today and that is responsible for this wrapping functionality, though still at an implementation level. Doing so would however still force me to inherit a distributed solution of this kind - if I need to change the proxy configuration, then I need to change it everywhere and redeploy everything... given a less complex solution without a common data access layer.
I love using Prometheus for monitoring and alerting. Until now, all my targets (nodes and containers) lived on the same network as the monitoring server.
But now I'm facing a scenario where we will deploy our application stack (as a bunch of Docker containers) to several client machines in their networks. Nearly all of the clients' networks are behind a firewall or NAT, so scraping becomes quite difficult.
As we're still accountable for our stack, I'd like to have a central monitoring server, alerting and dashboards.
I was wondering what the best architecture would be if I want to implement it with Prometheus, but I couldn't find any convincing approaches. My ideas so far:
Use a Pushgateway on our side and push all data out of the client networks. As the docs state, it's not intended that way: https://prometheus.io/docs/practices/pushing/
Use a federation setup (https://prometheus.io/docs/prometheus/latest/federation/): Place a Prometheus server in every client network behind a reverse proxy (to enable SSL and authentication) and aggregate relevant metrics there. Open/forward just a single port for federation scraping.
Other more experimental setups, such as SSH Tunneling (e.g. here https://miek.nl/2016/february/24/monitoring-with-ssh-and-prometheus/) or VPN!?
Thank you in advance for your help!
Nobody posted an answer so I will try to give my opinion on the second choice because that's what I think I would do in your situation.
The second setup seems the most flexible: you have access to the data and only need to open one port for the federating server, so it should still be secure.
Another bonus of this type of setup is that even if the firewall stops working for one reason or another, the local Prometheus keeps scraping. You will get an alert because you won't be able to reach the server(s), but when the connection comes back you will have all the data. You won't have a hole in the Grafana dashboards caused by missing data, apart from during the incident itself.
The issue with this setup is that you need to maintain as many servers as there are networks. A solution for this would be a Packer image or maybe an Ansible playbook for deployment.
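To make the federation idea a bit more concrete: the central server scrapes each client-side Prometheus on its `/federate` endpoint, passing one or more `match[]` selectors for the series it wants. Below is a small sketch that builds such a URL. The host name and the selectors are made up for the example; in a real setup you would put the same values into the central server's scrape configuration instead of calling the endpoint by hand.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, matchers: list[str]) -> str:
    """Build a Prometheus /federate URL with one match[] parameter per selector."""
    # urlencode percent-encodes the brackets, braces and quotes in the selectors.
    query = urlencode([("match[]", matcher) for matcher in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical client-side Prometheus, exposed through the reverse proxy.
url = federate_url(
    "https://prometheus.client-a.example",
    ['{job="app"}', '{__name__=~"job:.*"}'],
)
```

Restricting the selectors to the series you really need (for example pre-aggregated recording rules) keeps the traffic through that single open port small.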
I’m doing on-prem deployments using docker swarm and I need application and DB high availability.
As far as application HA is concerned, it works great within docker (service discovery and load balancing), but I’m not sure how to use it on my network. I mean how can I assign a virtual IP to all of my docker managers so that if any of them goes down, that virtual IP automatically points to the other docker manager in the cluster. I don’t want to have a single point of failure in my architecture, that’s why I’m not inclined to use any (single) reverse proxy solution in front of my swarm cluster (because to my understanding, if nginx/HAProxy goes down, the whole system goes into abyss. I would love to know that I’m wrong).
Secondly, I use WebSockets in my application for push notifications, and they don’t behave normally with all the load balancing stuff because socket handshakes get distorted.
I want a solution to these problems without writing anything in code (HA-specific and non-generic like hard coding IPs etc). Any suggestions? I hope I explained my problem correctly.
Docker Flow Proxy or Traefik can be placed on a set of swarm nodes that you want to receive traffic for incoming connections, and use DNS routing to get packets to the correct containers. Both have a sticky sessions option (I know Docker Flow does, not sure about Traefik).
Then you can do one of the following:
1. If your incoming connections are just client HTTP/S requests, use DNS Round Robin with multiple A records, which works great.
2. Buy an expensive hardware fault-tolerant reverse proxy like F5.
3. Use some network-layer IP failover at the OS and physical network level (not really related to Docker), though I'm not sure how well that would work with Swarm.
Number 2 is the typical solution in private datacenters that need full HA at all layers.
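To illustrate why the DNS Round Robin option gives you failover without a single proxy in front: a client that resolves the name gets several A records back and can simply try the next address when one node is down. This is a rough sketch of that client-side behaviour, not something Swarm provides; the function names are mine, and real HTTP clients and browsers implement their own variant of this.

```python
import socket


def resolve_all(host: str, port: int) -> list[str]:
    """Return every distinct IP address the name resolves to, in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_with_failover(addresses, port, timeout=3.0,
                          connect=socket.create_connection):
    """Try each address in turn and return the first connection that succeeds."""
    last_error = None
    for ip in addresses:
        try:
            return connect((ip, port), timeout=timeout)
        except OSError as exc:
            # This node is unreachable; fall through to the next A record.
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

With one A record per swarm node running the proxy, losing a node only costs clients a connection timeout before they move on to the next address.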
TL;DR Kubernetes allows all containers to access all other containers on the entire cluster, which seems to greatly increase the security risks. How can this be mitigated?
Unlike Docker, where one would usually only allow network connection between containers that need to communicate (via --link), each Pod on Kubernetes can access all other Pods on that cluster.
That means that for a standard Nginx + PHP/Python + MySQL/PostgreSQL, running on Kubernetes, a compromised Nginx would be able to access the database.
People used to run all those on a single machine, but that machine would have serious periodic updates (more than containers), and SELinux/AppArmor for serious people.
One can mitigate the risks a bit by running each project (if you have various independent websites, for example) on its own cluster, but that seems wasteful.
The current Kubernetes security seems to be very incomplete. Is there already a way to get decent security for production?
In the not-too-distant future we will introduce controls for network policy in Kubernetes. As of today that is not integrated, but several vendors (e.g. Weave, Calico) have policy engines that can work with Kubernetes.
As #tim-hockin says, we do plan to have a way to partition the network.
But, IMO, for systems with more moving parts, (which is where Kubernetes should really shine), I think it will be better to focus on application security.
Taking your three-layer example, the PHP pod should be authorized to talk to the database, but the Nginx pod should not. So, if someone figures out a way to execute an arbitrary command in the Nginx pod, they might be able to send a request to the database Pod, but it should be rejected as not authorized.
I prefer the application-security approach because:
I don't think the --links approach will scale well to tens of different microservices or more. It will be too hard to manage all the links.
I think as the number of devs in your org grows, you will need fine grained app-level security anyhow.
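As a rough sketch of what I mean by application-level authorization, using the three-layer example: the database-facing side checks who is calling and rejects callers that are not authorized, even if the network lets the packets through. The service names, secrets and the HMAC scheme here are assumptions made up for the illustration; in practice this is usually the database's own user/password or certificate authentication.

```python
import hashlib
import hmac

# Hypothetical per-service secrets; in a real cluster these would come from
# a secret store, not from source code.
SERVICE_SECRETS = {"php": b"php-secret", "nginx": b"nginx-secret"}

# Only the PHP tier is authorized to talk to the database.
ALLOWED_DB_CLIENTS = {"php"}


def sign(service: str, payload: bytes, secret: bytes) -> str:
    """Sign a request so the receiver can verify which service sent it."""
    message = service.encode() + b":" + payload
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def authorize_db_request(service: str, payload: bytes, signature: str) -> bool:
    """Accept the request only if it is authentic and the caller is allowed."""
    secret = SERVICE_SECRETS.get(service)
    if secret is None:
        return False  # unknown caller
    expected = sign(service, payload, secret)
    if not hmac.compare_digest(expected, signature):
        return False  # caller does not hold the claimed service's secret
    return service in ALLOWED_DB_CLIENTS
```

A compromised Nginx pod can still reach the database pod over the network, but it only holds its own credentials, so the request is rejected as not authorized.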
In terms of being like docker compose, it looks like docker compose currently only works on single machines, according to this page:
https://github.com/docker/compose/blob/master/SWARM.md