Why doesn't Docker support multi-tenancy?

I watched this YouTube video on Docker and at 22:00 the speaker (a Docker product manager) says:
"You're probably thinking 'Docker does not support multi-tenancy'...and you are right!"
But no explanation of why is ever given. So I'm wondering: what did he mean by that? Why doesn't Docker support multi-tenancy? If you Google "Docker multi-tenancy", you surprisingly get nothing!

One of the key features most people assume with a multi-tenancy tool is isolation between the tenants. They should not be able to see or administer each other's containers and/or data.
The docker-ce engine is a sysadmin-level tool out of the box. Anyone who can start containers with arbitrary options effectively has root access on the host. There are third-party tools like Twistlock that hook into the authorization (authz) plugin interface, but they only provide coarse access controls: each person is either allowed or disallowed an entire class of activities, such as starting containers or viewing logs. Giving users access to either the TLS port or the Docker socket lumps them all into a single category; there is no concept of groups or namespaces for the users connecting to a Docker engine.
For multi-tenancy, Docker would need a way to define users, place them in a namespace that is only allowed to act on specific containers and volumes, and restrict options that allow breaking out of the container, like changing capabilities or mounting arbitrary filesystems from the host. Docker's enterprise offering, UCP, does begin to add these features by using labels on objects, but I haven't had the time to evaluate whether it would provide a full multi-tenancy solution.
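As a concrete illustration of that root-access point, here is a minimal sketch (the image choice is arbitrary). Anyone allowed to pass arbitrary options to docker run can do something like this:

    # Bind-mount the host's root filesystem and chroot into it: the user is now
    # effectively root on the host and can read /etc/shadow, add users, and so on.
    docker run --rm -it -v /:/host alpine chroot /host /bin/sh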

Tough question that others might know how to answer better than I can. But here goes.
Let's take this definition of multi-tenancy (source):
Multi-tenancy is an architecture in which a single instance of a software application serves multiple customers.
It's really hard to place Docker in this definition. It can be argued that it's both the instance and the application. And that's where the confusion comes from.
Let's break Docker up into three different parts: the daemon, the container and the application.
The daemon is installed on a host and runs Docker containers. The daemon does actually support multi-tenancy, as it can be used by many users on the same system, each of whom has their own configuration in ~/.docker.
Docker containers run a single process, which we'll refer to as the application.
The application can be anything. For this example, let's assume the Docker container runs a web application like a forum or something. The forum allows users to sign in and post under their name. It's a single instance that serves multiple customers. Thus it supports multi-tenancy.
What we skipped over is the container and the question of whether or not it supports multi-tenancy. And this is where I think the answer to your question lies.
It is important to remember that Docker containers are not virtual machines. When using docker run [IMAGE], you are creating a new container instance. These instances are ephemeral and immutable. They run a single process, and exit as soon as that process exits. They are not designed to have multiple users connect to them and run commands simultaneously, which is what multi-tenancy would imply. Instead, Docker containers are just isolated execution environments for processes.
Conceptually, echo Hello and docker run echo Hello are the same thing in this example. They both execute a command in a new execution environment (a process vs. a container), neither of which supports multi-tenancy.
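If it helps, here is a tiny sketch of that comparison (the alpine image is just an arbitrary choice):

    # Both run one short-lived process and then exit; the second simply wraps the
    # process in an isolated namespace and removes the container afterwards.
    echo Hello
    docker run --rm alpine echo Hello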
I hope this answer is readable and answers your question. Let me know if there is any part I should clarify.

Related

Is it possible to run a large number of docker containers?

A little background first. I am building a small service (a website) where the user is provided with all sorts of tools that work according to parameters the user specifies. In my implementation, the tools end up being one big script that runs in Docker, so my service has to launch a new Docker container for each user.
I was thinking about using AWS Fargate or Google Cloud Run, or any other service that makes it possible to run a Docker container.
But I'm curious: what if there are 1,000 or 10,000 users, each with their own Docker container? Is that OK? Do the services (AWS, Google Cloud) have any restrictions, or is this just a bad design?
Based on my understanding, you are suggesting that you instantiate a Docker container for each of your users. I think there are a couple of issues with this:
Depending on how many users you have, you get into the realm of too many containers (each container consumes resources, not just memory and CPU but also ports from the TCP/IP pool); one way to cap the per-container cost is sketched after this list.
Isolation -> remember that containers are not VMs
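On the resource point: if you do go the container-per-user route anyway, you would at least want to cap each container. A rough sketch, with made-up names and limits:

    # One container per user, with explicit memory, CPU, and process caps so that
    # a single tenant cannot exhaust the host (image name and limits are placeholders).
    docker run -d --name user-1234 \
      --memory 256m --cpus 0.5 --pids-limit 100 \
      my-tool-image:latest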

"Workspace virtulization" vs. Docker container

There is a Wikipedia article about something called "workspace virtualization". The article isn't perfect and doesn't have any good references, but there are some other ones:
https://www.businessnewsdaily.com/5951-workspace-virtualization.html
https://www.cio.com/article/3104533/virtualization/workspace-virtualization.html
https://en.wikipedia.org/wiki/Symantec_Workspace_Virtualization
I'm trying to understand how this "workspace virtualization" differs from Docker containers in the case where we have multiple applications inside a single container.
I'd expect that term to include setups where you have a complete desktop environment, with multiple interactive bundled applications, that either you can log into remotely or you can distribute as a self-contained virtual machine.
That might include:
Multiple applications bundled into one environment
A notion of a "user"
Data persisted across login sessions
The ability to transparently migrate the session across hosts
Running interactive GUI applications, not server-type applications
All of the above things are significant challenges in Docker. In Docker you typically have:
Only one thing running in a container
Run as the Unix root user or a single, non-configurable, system account
Content is lost when the container exits unless storage was explicitly configured at startup time
Migration is usually done by moving data (if any) and recreating the environment, not a live migration
Server-type programs, like HTTP-based services
I might implement the kind of "workspace virtualization" you're asking about using a full virtual machine environment, which has more of the right properties. It wouldn't be impossible per se to implement it on Docker, but you'd have to reinvent a lot of the pieces that get omitted in a typical Docker setup to make it lighter-weight, and you'd still be missing things like live migrations that are very mature in VM setups.
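For instance, the "data persisted across login sessions" piece has to be wired up explicitly in Docker. A minimal sketch, assuming a named volume mounted as the user's home directory:

    # Data written under /home/user survives container restarts because it lives
    # in the named volume; everything else in the container is lost on exit.
    docker volume create workspace-home
    docker run -it --rm -v workspace-home:/home/user ubuntu bash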

Container orchestration and some Docker functions

I am familiarizing myself with the architecture and practices used to package, build, and deploy software, or at least small pieces of software.
If I end up mixing concepts with specific tools (sometimes that is unavoidable), please let me know if I am wrong.
Along the way, I have been reading and learning about the terms "image" and "container" and how they relate, in order to start building software workflows in the best possible way.
And I have a question about service orchestration in the context of Docker:
Containers are lightweight and portable encapsulations of an environment in which we have all the binaries and dependencies we need to run our application. OK.
I can set up communication between containers using container links (the --link flag).
I can replace container links with docker-compose in order to automate my service workflow, running multiple containers from a .yaml configuration file.
And I am reading about the term "container orchestration", which describes the relationship between containers when we have distinct pieces of software separated from each other, and how those containers interact as a system.
Well, I suppose I've read the documentation properly :P
My question is:
At the Docker level, are container links and docker-compose a form of container orchestration?
Or, if I want to do container orchestration with Docker... should I use Docker Swarm?
You should forget you ever read about container links. They've been obsolete in pure Docker for years. They're also not especially relevant to the orchestration question.
Docker Compose is a simplistic orchestration tool, but I would in fact class it as an orchestration tool. It can start up multiple containers together, and if the configuration of individual containers in the stack changes, it can restart just those containers. It is fairly oriented towards Docker's native capabilities.
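As a rough sketch of what that looks like in practice (the service and image names here are invented):

    # Assuming a hypothetical docker-compose.yml along these lines:
    #
    #   services:
    #     web:
    #       image: nginx:alpine
    #       ports: ["8080:80"]
    #     db:
    #       image: postgres:15
    #       environment:
    #         POSTGRES_PASSWORD: example
    #
    # Compose starts both containers together on a shared network, where "web"
    # can reach "db" by its service name.
    docker-compose up -d     # create and start the whole stack
    docker-compose ps        # show the running containers of the stack
    docker-compose down      # stop and remove them again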
Docker Swarm is mostly just a way to connect multiple physical hosts together in a way that docker commands can target them as a connected cluster. I probably wouldn't call that capability on its own "orchestration", but it does have some amount of "scheduling" or "placement" ability (Swarm, not you, decides which containers run on which hosts).
Of the other things I might call "orchestration" tools, I'd probably divide them into two camps:
General-purpose system automation tools that happen to have some Docker capabilities. You can use both Ansible and Salt Stack to start Docker containers, for instance, but you can also use these tools for a great many other things. They have the ability to say "run container A on system X and container B on system Y", but if you need inter-host communication or other niceties then you need to set them up as well (probably using the same tool).
Purpose-built Docker automation tools like Docker Compose, Kubernetes, and Nomad. These tend to have a more complete story around how you'd build up a complete stack with a bunch of containers, service replication, rolling updates, and service discovery, but you mostly can't use them to manage tasks that aren't already in Docker.
Some other functions you might consider:
Orchestration: How can you start multiple connected containers all together?
Networking: How can one container communicate with another, within the cluster? How do outside callers connect to the system?
Scheduling: Which containers run on which system in a multi-host setup?
Service discovery: When one container wants to call another, how does it know who to call?
Management plane: As an operator, how do you do things like change the number of replicas of some specific service, or cause an update to a newer image for a service?
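To tie the networking and service discovery points together: plain Docker already gives you name-based discovery on a user-defined network. A small sketch with made-up names:

    # On a user-defined network, Docker's embedded DNS lets containers resolve
    # each other by name, which is the simplest form of service discovery.
    docker network create app-net
    docker run -d --name api --network app-net nginx:alpine
    docker run --rm --network app-net alpine wget -qO- http://api   # fetches the nginx welcome page
    docker rm -f api && docker network rm app-net                   # clean up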

Manage docker containers in a college lab environment

Is there a proper way to manage and configure Docker containers in a college lab environment?
I requested that Docker be installed so that I could experiment with it for a project, but after speaking with our sysadmin, it seems very complicated. Wondering if SO has any insight.
Some exceptions that need to be handled:
Students will download images, which may be bad
Students may leave containers running indefinitely
Some containers will require elevated privileges, for networking/IO/et cetera
Students will make their own images, so images may be buggy; if docker is given a sticky permission bit or an elevated user group, this may lead to a breach
One of the solutions that comes to mind is to just allow students to use a hypervisor within which they can install whatever software they like, including docker (we currently cannot do so), but that kinda bypasses the advantage of lightweight containers.
Your sysadmin's concerns are reasonable, but using Docker should add only minor refinements to your existing security practices.
If your students have internet access today from these machines, then they can:
download binaries that may be bad
leave processes running indefinitely
run processes that require elevated privileges
create buggy or insecure binaries
Containers provide some partitioning between processes on a machine, but essentially all that happens is that namespaces are created and Linux processes run inside them; the name "containers" is slightly misleading, and ps aux on the host will show you all the processes (including container-based processes) running for the user on the machine.
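You can check this for yourself; a quick illustration (the container name is arbitrary):

    # Start a long-running container, then look at the host's process table:
    docker run -d --name lab-demo alpine sleep 600
    ps aux | grep 'sleep 600'     # the container's process shows up as an ordinary host process
    docker rm -f lab-demo         # clean up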
So... Assuming you still need to control what students are downloading from the Internet and what roles they have on the machines:
Private image registries may be used, either cloud-hosted or local
Registries can be coupled with vulnerability-scanning tools to help identify bad images
Tidying students' "sessions" will cover the processes in Docker containers too
Handling privilege escalation isn't more complex (different, but not complex)
Using some form of VM virtualization on the bare-metal machines is a good idea
If you were to use cloud-based VMs (or containers), you could destroy these easily
One area where I find Docker burdensome is managing the container life-cycle (removing old containers, tidying up images), but this should be manageable.
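Much of that tidying can be scripted, though; a rough sketch (the retention window is an arbitrary choice):

    # Remove stopped containers and dangling images older than 24 hours, plus
    # unused networks; "docker system prune" is the more aggressive catch-all.
    docker container prune -f --filter "until=24h"
    docker image prune -f --filter "until=24h"
    docker network prune -f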

Should I use separate Docker containers for my web app?

Do I need to use separate Docker containers for my complex web application, or can I put all the required services in one container?
Could anyone explain to me why I should divide my app into many containers (for example a php-fpm container, a mysql container, a mongo container) when I have the ability to install and launch everything in one container?
Something to think about when working with Docker is how it works inside. Docker replaces your PID 1 with the command you specify in the CMD (and ENTRYPOINT, which is slightly more complex) directive in your Dockerfile. PID 1 is normally where your init system lives (sysvinit, runit, systemd, whatever). Your container lives and dies by whatever process is started there. When the process dies, your container dies. Stdout and stderr for that process in the container is what you are given on the host machine when you type docker logs myContainer. Incidentally, this is why you need to jump through hoops to start services and run cronjobs (things normally done by your init system). This is very important in understanding the motivation for doing things a certain way.
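You can see the PID 1 behaviour directly; a tiny sketch, overriding the command on the command line and using the stock alpine image:

    # The process Docker starts really is PID 1 inside the container, and the
    # container exits the moment that process does.
    docker run --rm alpine sh -c 'echo "I am PID $$"'
    # prints: I am PID 1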
Now, you can do whatever you want. There are many opinions about the "right" way to do this, but you can throw all that away and do what you want. So you COULD figure out how to run all of those services in one container. But now that you know how docker replaces PID 1 with whatever command you specify in CMD (and ENTRYPOINT) in your Dockerfiles, you might think it prudent to try and keep your apps running each in their own containers, and let them work with each other via container linking. (Update -- 27 April 2017: Container linking has been deprecated in favor of regular ole container networking, which is much more robust, the idea being that you simply join your separate application containers to the same network so they can talk to one another).
If you want a little help deciding, I can tell you from my own experience that it ends up being much cleaner and easier to maintain when you separate your apps into individual containers and then link them together. Just now I am building a Wordpress installation from HHVM, and I am installing Nginx and HHVM/php-fpm with the Wordpress installation in one container, and the MariaDB stuff in another container. In the future, this will let me drop in a replacement Wordpress installation directly in front of my MariaDB data with almost no hassle. It is worth it to containerize per app. Good luck!
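For what it's worth, here is a rough sketch of that kind of two-container split using networks instead of links (the image names, passwords, and ports are placeholders, not my exact HHVM/Nginx setup):

    # Database and application in separate containers on a shared network; the
    # app finds the database by its container name.
    docker network create wp-net
    docker run -d --name mariadb --network wp-net \
      -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=wordpress \
      -v wp-db-data:/var/lib/mysql \
      mariadb:10
    docker run -d --name wordpress --network wp-net \
      -e WORDPRESS_DB_HOST=mariadb -e WORDPRESS_DB_USER=root \
      -e WORDPRESS_DB_PASSWORD=secret -e WORDPRESS_DB_NAME=wordpress \
      -p 8080:80 \
      wordpress:latest
    # Replacing the front end later means recreating only the "wordpress"
    # container; the database container and its volume are untouched.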
When you divide your web application into many containers, you don't need to restart all the services when you deploy your application, just as traditionally you don't restart your MySQL server when you update your web layer.
Also, if you want to scale your application, it is easier if it is divided into separate containers. Then you can scale just those parts of your application that need it to resolve your bottlenecks.
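For example, with Compose you could scale only a hypothetical bottlenecked service (the service name here is invented):

    # Run three replicas of the "worker" service while leaving the database alone.
    docker-compose up -d --scale worker=3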
Some will tell you that you should run only one process per container. Others will say one application per container. That advice is based on the principles of microservices.
I don't believe microservices are the right solution for all cases, so I would not follow that advice blindly just for that reason. If it makes sense to have multiple processes in one container in your case, then do so. (See Supervisor and the Phusion baseimage for that.)
But there is also another reason to separate containers: In most cases, it is less work for you to do.
On Docker Hub, there are plenty of ready-to-use Docker images. Just pull the ones you need.
What's remaining for you to do is then:
read the docs for those Docker images (what environment variables to set, etc.)
create a docker-compose.yml file to ease operating those containers
It is probably better to have your web app in a single container and your supporting services, like databases, in separate containers. By doing this, if you need to do rolling updates or restarts, you can keep your database online while your application nodes restart individually, so you won't experience downtime. If you have caching with something like Redis, this is also useful for the same reason. It will also allow you to add nodes more easily to scale in a loosely coupled fashion, and to manage the containers in a manner more suitable to each specific purpose. For the type of application you are describing, I see very few arguments for running all services in a single container.
It depends on the vision and roadmap you have for your application. Putting all components of an application in one tier (in this case, one Docker container) is like putting all your eggs in one basket.
Whenever your application has security or performance-related requirements, separating those components into their own containers is an ideal solution. Needless to say, this division of labor across containers comes at some cost, related to wiring those containers together for communication, security, and so on.
