How can I somewhat securely run distccd on a docker image in the cloud?

I'm compiling things on a raspberry pi and it's not going fast enough, even when I use my desktop's CPU to help.
I could just install distcc the old-fashioned way on a cloud server, but what if someday I wanted to quickly spin up a bunch of servers for a few minutes with docker-machine?
distccd can use SSH auth, but I don't see a good way to run both SSH and distccd, and managing SSH keys seems like a hassle.
What if I configured distccd to only accept connections from my house's WAN IP (and then turned the containers off as soon as they were done)?
But it'd be great to make something other raspberry pi users could easily spin up.

You seem to already know the answer to this: set up distcc to use SSH. This will ensure encrypted communication between your distcc client and the distcc servers you have deployed as Docker images in the cloud. You have highlighted that the cost of doing this is the time spent setting up an SSH key that is accepted by all of your Docker images. From memory, this key could be the same for all the Docker nodes, as long as they all use the same user name with the same key. Is that really such a complex task?
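In case it helps, here is a rough sketch of what the SSH route could look like; the image name, host names, user and key paths are placeholders, and the exact packages depend on your base image:

    # On each cloud node: run an image that contains sshd, distcc and the
    # compilers you need, with your public key mounted in for the build user
    docker run -d --name distcc-ssh -p 2222:22 \
        -v $PWD/id_distcc.pub:/home/builder/.ssh/authorized_keys:ro \
        my-distcc-ssh-image

    # On the Raspberry Pi: a leading "@" in DISTCC_HOSTS tells distcc to reach
    # that host over SSH and start distccd on demand
    export DISTCC_SSH="ssh -p 2222 -i ~/.ssh/id_distcc -l builder"
    export DISTCC_HOSTS="@node1.example.com/8 @node2.example.com/8 localhost/4"
    make -j16 CC="distcc gcc" CXX="distcc g++"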
You ask for a slightly less secure option for building your compile farm. Limiting access to the Internet-facing IP address of your house would narrow the exposure and make it harder for others to use your build cluster. Someone might spoof that IP address and get access to your distcc servers, but that would only cost you the compute time they use. The larger concern is that your code would be transmitted in plain text over the internet to these distcc servers. If that is not a big concern for you, then this could be considered low risk.
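If you do go the IP-allowlist route, a minimal sketch might look like this (the image name is a placeholder and 203.0.113.7 stands in for your home WAN address; distccd's --allow option accepts an address or CIDR range):

    docker run -d --name distccd -p 3632:3632 my-distcc-image \
        distccd --daemon --no-detach --log-stderr \
                --allow 203.0.113.7/32 --jobs 8

On the Pi you would then list the nodes as plain TCP hosts, e.g. DISTCC_HOSTS="node1.example.com/8 localhost/4".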
An alternative might be to set up a secure remote network of Docker nodes and provide VPN access to it. This would bind your local machine to the remote network, and you could consider the whole thing a secured LAN. If it is considered safe to have the Docker nodes talk among themselves unencrypted within the cloud, it should be just as secure to have a VPN link to them and do the same.
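For example, a WireGuard tunnel between the Pi and the cloud network could look roughly like this (WireGuard is just one VPN option; keys, addresses and the endpoint are all placeholders):

    # /etc/wireguard/wg0.conf on the Raspberry Pi
    [Interface]
    PrivateKey = <pi-private-key>
    Address = 10.8.0.2/24

    [Peer]
    PublicKey = <cloud-gateway-public-key>
    Endpoint = vpn.example.com:51820
    AllowedIPs = 10.8.0.0/24
    PersistentKeepalive = 25

Bring it up with wg-quick up wg0 and point DISTCC_HOSTS at the nodes' 10.8.0.x addresses.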
The best option might be to dig out some old PCs and set those up as local distcc servers. Within a LAN there is no need for this kind of security.
You mention a wish to share this with other Raspberry Pi users. There have been public compile farms in the past, but many of them have fallen out of favour. Distributing such work publicly, as computational projects such as BOINC do, works poorly because network latency and transfer rates can slow the builds significantly.

Related

What is the "proper" way to migrate from Docker Compose to Kubernetes?

My organization manages systems where each client is provisioned a VPS and then their tech stack is spun up on that system via Docker Compose.
Data is stored on-system, using Docker Compose volumes. None of the fancy named storage - just good old direct path volumes.
While this solution is workable, the problem is that this method does not scale. We can always give the VPS more CPU/Memory but that does not fix the underlying issues.
Staging / development environments must be brought up manually - and there is no service redundancy. Hot swapping is impossible with our current system.
Kubernetes has been pitched to me to solve our problems, but honestly I have no idea where to begin - most of the documentation is obtuse and I have failed to find somebody with our particular predicament.
The end goal would be to have just a few high-spec machines running Kubernetes - with redundancy, staging, and the ability to spin up new clients as necessary (without having to provision additional machines or external IPs).
What specific tools would my organization need to use to achieve this goal?
Are there any tools that would allow us to bring over our existing Docker Compose stacks into Kubernetes?
Where to begin: given what you're telling us, I would first look into the options for implementing some SDS (software-defined storage).
You're currently using local volumes, which you probably won't be able to do with Kubernetes - or at least shouldn't, if you don't want to bind your containers to a single node.
The easiest way - though not necessarily the one I would recommend - would be to use some NFS servers. Better still: pair them with DRBD and pacemaker/corosync, using a VIP for failover -- or the FreeBSD way: hastd, carp, ifstated, maybe some ZFS. As your Kubernetes cluster scales, you would probably have to deploy several distinct systems to distribute the I/O; a single NFS server doesn't last long before its load climbs past 50 and iowait spikes ...
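For illustration, consuming such an NFS export from Kubernetes could look roughly like this (server address, export path and sizes are placeholders):

    # nfs-pv.yaml -- a PersistentVolume backed by an NFS export, plus a claim for it
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: client-a-data
    spec:
      capacity:
        storage: 10Gi
      accessModes: ["ReadWriteMany"]
      nfs:
        server: 10.0.0.10
        path: /exports/client-a
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: client-a-data
    spec:
      accessModes: ["ReadWriteMany"]
      storageClassName: ""
      volumeName: client-a-data
      resources:
        requests:
          storage: 10Gi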
A better way would be to look into actual SDS solutions. One I can recommend is Ceph, though there are a lot of newer solutions I'm less familiar with ... and there's GlusterFS, which I would definitely avoid. An easy way to deploy Ceph is ceph-ansible.
Depending on what corporate hardware you have at your disposal, you may already have a NetApp or equivalent: something that can serve NFS shares and/or act as an iSCSI gateway.
Now, those are all solutions you could run on the side. Note that you will also find "CNS" (container-native storage) solutions, which are meant to be deployed on top of Kubernetes; Ceph clusters, for instance, can be managed with Rook. These can be interesting, though in terms of maintenance and operations they require good knowledge of both the storage solution you operate and Kubernetes/containers in general: troubleshooting issues and fixing outages may not be as easy as with a good old bare-metal/VM setup. For a first Kubernetes experience I would hold off; once you feel comfortable enough, go ahead.
In any case, another critical consideration before deploying your cluster is the network that will host your installation. Kubernetes should not be deployed directly on public instances: you will probably want a private VLAN, maybe an internal DNS, a local registry (which could be Kubernetes-hosted), and other tools such as an LDAP server, an SMTP relay, HTTP caches/proxies, and load balancers to put in front of your API, ...
Once you've made up your mind regarding those issues, you can look into deploying a Kubernetes cluster using tools such as Kubespray (Ansible-based) or Kops (uses Terraform, and thus requires a cloud API, e.g. AWS). Both are part of the Kubernetes project and maintained by its community. Kubespray covers all scenarios (IaaS and bare-metal), integrates with popular SDS out of the box, can ship with various ingress controllers, ... overall it offers good defaults and lots of variables to customize your installation.
Start with a 3-master, 2-worker cluster and make sure the resulting cluster matches what you expect.
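As a rough sketch, a Kubespray inventory for that layout might look like this (host names are placeholders, and the group names below follow recent Kubespray releases - older ones used kube-master/kube-node):

    # inventory/mycluster/inventory.ini
    [kube_control_plane]
    master-1
    master-2
    master-3

    [etcd]
    master-1
    master-2
    master-3

    [kube_node]
    worker-1
    worker-2

    [k8s_cluster:children]
    kube_control_plane
    kube_node

    # then, from the kubespray checkout:
    #   ansible-playbook -i inventory/mycluster/inventory.ini -b cluster.yml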
Before going to prod, take your time to properly translate your existing configurations. Sometimes refactoring code or images is worth it.
When going to prod, consider adding a group of "infra" nodes if you want to host a logging solution or other internal services that are somewhat critical to users and shouldn't suffer outages caused by end-user workloads (e.g. ingress routers, monitoring, logging, an integrated registry, ...).
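For example, dedicating such nodes is usually done with a label and a taint, which the infra workloads then select and tolerate (node name and the taint key/value are placeholders):

    # mark the dedicated node
    kubectl label node infra-1 node-role.kubernetes.io/infra=""
    kubectl taint node infra-1 dedicated=infra:NoSchedule

    # infra workloads then carry a matching nodeSelector and toleration
    # in their pod spec, so only they are scheduled onto those nodes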
Kubespray: https://github.com/kubernetes-sigs/kubespray/
Kops: https://github.com/kubernetes/kops
Ceph: https://ceph.com/en/discover/
Ceph Ansible: https://github.com/ceph/ceph-ansible
Rook (Ceph CNS): https://github.com/rook/rook

Guidance on when to choose virtual machines or physical machines over containers

There are many articles and videos comparing containers, virtual machines, and physical machines. However, almost all of the information is theoretical: containers are fast, VMs are secure, etc. I could not find descriptions of specific use cases, or guidance on when to choose virtual machines or physical machines rather than containers. So, currently, I cannot imagine a situation where somebody would recommend not using containers.
Question:
Could you please list specific applications or solutions for which you would recommend using VMs, but not containers?
Could you please list specific applications or solutions for which you would recommend running the OS on bare metal, but not containers or VMs?
Here is an example of the kind of answer I would appreciate (note that I am not sure whether this information is correct):
Use case 1: Edge Router
An edge router is a router which connects an organization's network to the Internet. In this case it is also assumed that the vendor provides the router not as a device but as a software package (a virtualized router).
An edge router will most probably be one of the targets of attackers, so security requirements come first.
Containers are not recommended in this case. By default, containers provide a mediocre level of security. Strong security can be achieved with complex configuration (what configuration?), but this is more difficult than in the case of a VM or bare metal. In addition, a high security level may require a specially hardened Linux kernel, and container technology does not allow adjusting the kernel configuration.
Virtual machines would be a good choice if the router vendor provides the software as a VM image, or when the organization has many edge routers (for example, many offices with internet access points) and has, or is ready to create, a well-established process for preparing VM images. In this case, using VMs will simplify rollout, updates and healing of the virtualized edge router. A VM also provides a high security level; nevertheless, it is still recommended to place such a VM on a separate server and not to share that server with other applications/VMs, to avoid cross-VM attacks.
A physical machine would be a good choice if the router vendor provides the router's software as an application package (not as a VM), such as an .rpm, and the rollout, update and healing processes are not expected to take much effort; this might be the case when the company has few routers (so updates can be performed manually or automated with tools like Ansible) and a couple of hours of planned and unplanned downtime is acceptable.
Use case 2: ...
Thank you in advance.
The question is a bit vague so I'll try my best:
you'd usually allocate work to containers when you have several separate applications, limited physical resources, and you'd like to run each of them with its own environment (different runtime version, architecture and dependencies), which would be cumbersome to manage on a single machine (physical or virtual).
you'd use a VM when you specifically want a feature that containers can't satisfy, or that would be a headache to set up with containers but is quick and easy to solve with a VM (and again, you have limited resources you'd like to share between use cases).
and finally, a physical machine when performance is of the essence, such as I/O throughput and latency.
you can also mix and match to meet each tier's needs:
say we need to run many applications for which VMs would be too much overhead, and containers would make handling them more automated and streamlined, so we use containers with k8s; but on the other hand, we want the local storage offered to those containers to be very fast, so we run the k8s cluster on physical machines.
if recoverability were of the essence, we would have used VMs because of the option to snapshot VM state over time.
It's all a big LEGO set you can mix and match depending on your use case and needs.

How to configure Prometheus in a multi-location scenario?

I love using Prometheus for monitoring and alerting. Until now, all my targets (nodes and containers) lived on the same network as the monitoring server.
But now I'm facing a scenario where we will deploy our application stack (as a bunch of Docker containers) to several client machines in their networks. Nearly all of the clients' networks are behind a firewall or NAT, so scraping becomes quite difficult.
As we're still accountable for our stack, I'd like to have central monitoring, alerting and dashboards.
I was wondering what the best architecture would be if I want to implement this with Prometheus, but I couldn't find any convincing approaches. My ideas so far:
Use a Pushgateway on our side and push all data out of the client networks. As the docs state, it's not intended for that: https://prometheus.io/docs/practices/pushing/
Use a federation setup (https://prometheus.io/docs/prometheus/latest/federation/): place a Prometheus server in every client network behind a reverse proxy (to enable SSL and authentication) and aggregate the relevant metrics there. Open/forward just a single port for federation scraping.
Other more experimental setups, such as SSH Tunneling (e.g. here https://miek.nl/2016/february/24/monitoring-with-ssh-and-prometheus/) or VPN!?
Thank you in advance for your help!
Nobody posted an answer so I will try to give my opinion on the second choice because that's what I think I would do in your situation.
The second setup seems the most flexible: you have access to the data and only need to open one port for the federating server to scrape, so it should still be secure.
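For what it's worth, the central server's scrape job for that federation setup could look roughly like this (host names, credentials and the match[] selector are placeholders):

    # prometheus.yml on the central server
    scrape_configs:
      - job_name: 'federate-clients'
        honor_labels: true
        metrics_path: /federate
        scheme: https
        basic_auth:
          username: prometheus
          password: <password>
        params:
          'match[]':
            - '{job=~".+"}'
        static_configs:
          - targets:
              - client-a.example.com:443
              - client-b.example.com:443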
Another bonus of this type of setup is that even if the connection through the firewall stops working for one reason or another, you will still have a Prometheus scraping locally. You will get an alert because you won't be able to reach the server(s), but when the connection comes back you will still have all the data; you won't have a hole in the Grafana dashboards due to missing data, apart from during the incident itself.
The issue with this setup is that you need to maintain as many servers as you have client networks. A solution for this would be to have a Packer image, or maybe an Ansible playbook, to deploy them.

On-prem docker swarm deployment with HA

I’m doing on-prem deployments using docker swarm and I need application and DB high availability.
As far as application HA is concerned, it works great within Docker (service discovery and load balancing), but I'm not sure how to expose it on my network. I mean, how can I assign a virtual IP to all of my Docker managers so that if any of them goes down, that virtual IP automatically points to another Docker manager in the cluster? I don't want a single point of failure in my architecture, which is why I'm not inclined to put any (single) reverse proxy in front of my swarm cluster (because to my understanding, if nginx/HAProxy goes down, the whole system goes into the abyss. I would love to learn that I'm wrong).
Secondly, I use WebSockets in my application for push notifications, which don't behave normally with all the load balancing because the socket handshakes get disrupted.
I want a solution to these problems without writing anything HA-specific or non-generic into the code (like hard-coded IPs, etc.). Any suggestions? I hope I explained my problem correctly.
Docker Flow Proxy or Traefik can be placed on the set of swarm nodes that you want to receive incoming connections, and they use DNS routing to get packets to the correct containers. Both have a sticky-sessions option (I know Docker Flow Proxy does; I'm not sure about Traefik).
Then you can either:
If your incoming connections are just client HTTP/S requests, you can use DNS Round Robin with multiple A records, which works great, or
Buy an expensive fault-tolerant hardware reverse proxy like an F5, or
Use some network-layer IP failover that is at the OS and physical network level (not related to Docker really), but I'm not sure how well that would work with Swarm.
Number 2 is the typical solution in private datacenters that need full HA at all layers.
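As an illustration of option 3, a common approach is keepalived managing a VRRP virtual IP across the manager nodes (interface name, VIP and password are placeholders; I have not verified this specific combination with Swarm myself):

    # /etc/keepalived/keepalived.conf on each manager node
    vrrp_instance SWARM_VIP {
        state MASTER            # BACKUP on the other managers
        interface eth0
        virtual_router_id 51
        priority 100            # give the backup nodes a lower priority
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass <password>
        }
        virtual_ipaddress {
            10.0.0.100/24
        }
    }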

Reverse engineering a Docker deployment on private cloud

I am working on software that has to be deployed on a client's private cloud. The client has root access, as well as access to the hardware. I don't want the client to reverse engineer our software.
We can control two things here:
we have access to a secure port of the server, which we can use to send tokens to decrypt the code, and shut it down if necessary;
we can do manual installation (keying in a password at installation time) or use a tamper-resistant device if we have to.
Can a Docker deployment prevent our client from reverse engineering our code? We plan to open a single port and use SSL to protect incoming and outgoing data.
If the user has root, or is able to use a custom kernel (or even kernel modules), he can do anything - dump memory, stop processes, attach a debugger - to start reverse engineering. If the user has access to the hardware, he can also get root or load a custom kernel. The only way to protect software from the user is good DRM, for example with the help of a TPM (Trusted Platform Module) or ARM TrustZone. Secure Boot will not fully protect your software (on x86 it can usually be turned off). Another option is tamper-resistant hardware (http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-tamper-resistant-hardware.htm), like what banks use to store master encryption keys for processing PIN codes (http://en.wikipedia.org/wiki/Hardware_security_module), but such hardware is very expensive.
It is well known that Docker does not protect your code from the user:
https://stackoverflow.com/a/26108342/196561 -
The root user on the host machine (where the docker daemon runs) has full access to all the processes running on the host. That means the person who controls the host machine can always get access to the RAM of the application as well as the file system. That makes it impossible to hide a key for decrypting the file system or protecting RAM from debugging.
Any user capable of deploying a Docker container (a user in the docker group) has full access to the container filesystem, has root access to the container's processes, and can debug them and dump their memory.
https://www.andreas-jung.com/contents/on-docker-security-docker-group-considered-harmful
Only trusted users should be allowed to control your Docker daemon
http://docs.docker.com/articles/security/#docker-daemon-attack-surface
Docker allows you to share a directory between the Docker host and a guest container; and it allows you to do so without limiting the access rights of the container.
So, Docker gives your code no additional protection from the user; we can consider it just another packaging system, like rpm and deb. Rpm and deb let you pack your code into a single file and list its dependencies, and Docker packs your code and its dependencies into a single image.
Our solution is hosted on our client's cloud server, so they do have access to both root and the hardware. However, we have two advantages here: 1) we have access to a secure port, which we can use to send tokens to decrypt the code, and audit suspicious activities; 2) we can do manual installation (key in a token at the time of installation)
You can only protect code that runs on hardware you own (and even then you should turn off all the NSA/Intel ME/IPMI/UEFI backdoors to really own the hardware). If the user runs your code on his hardware, he will have all the binaries and will be able to dump the process memory (after receiving the token from you).
Virtualization on his hardware will not give your code any additional protection.
Does "secure port" mean SSL/TLS/SSH? That only protects the data while it is being sent over the network; both endpoints will have the data in plain, unencrypted form.
Manual installation will not help to protect the code after you leave the user's datacenter.
I think you can buy a typical software protection solution, like FlexLM, maybe with hardware tokens required to run the software. But any protection can be cracked: older (cheaper) schemes are cracked more easily, and modern (more expensive) protection is a bit harder to crack.
You may also run some part of the software on your own servers; that part cannot be cracked.
or use a tamper-resistant device if we have to.
You can't use tamper-resistant hardware if there is no such hardware in the user's server. And it is very expensive.
