I am pretty familiar with Docker itself, but I am now facing the question of how to orchestrate a varying set of Docker containers on demand for a system architecture that I am developing. I have heard of Kubernetes and Docker Swarm, but I am not sure which orchestration tool would best suit my requirements.
Here's how I imagine my software architecture working:
There's 1 to N clients and 1 to M servers (probably only 1 server in the beginning).
The clients are running 1 to X Docker containers that also communicate with each other. The Docker containers do not necessarily run the same Docker image.
The clients can request to run a particular function on one of the servers.
In that case, one (or more) servers spin up 1 to Y new Docker containers on demand. At the same time, the client pauses or shuts down one or more of its X containers.
Multiple sets of different Docker containers and their server replacements could be allowed. So e.g. containers x1, x2 on the client could be substituted by containers y1, y2, y3 on the server.
The exact configuration of which 1 to Y containers to launch on-demand and how they are connected would depend on the request made to the server.
Given the need for a particular container y1 on the server side, it could also help to dynamically scale this container to M servers to serve more clients.
I believe one of the core questions is: How can I spin up 1 to X (pre-defined, but configurable) Docker containers on-demand, i.e., given some external condition? The request would be triggered by some kind of call from client to server.
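To make the request-to-containers mapping above more concrete, here is a minimal Python sketch. It only builds `docker run` argument lists and does not execute anything. The plan names ("render", "index"), the image names, and the network name `offload-net` are hypothetical placeholders, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ContainerSpec:
    name: str   # container name on the server, e.g. "y1"
    image: str  # image to start (hypothetical image names below)


# Hypothetical plans: which server-side containers replace the client's
# containers for a given requested function.
OFFLOAD_PLANS: Dict[str, List[ContainerSpec]] = {
    "render": [
        ContainerSpec("y1", "example/render-worker"),
        ContainerSpec("y2", "example/render-cache"),
        ContainerSpec("y3", "example/render-api"),
    ],
    "index": [ContainerSpec("y1", "example/indexer")],
}


def docker_run_commands(function_name: str, network: str = "offload-net") -> List[List[str]]:
    """Build one `docker run` argument list per container in the plan.

    The user-defined network is assumed to exist already, so that the
    containers can reach each other by name.
    """
    if function_name not in OFFLOAD_PLANS:
        raise KeyError(f"no offload plan for {function_name!r}")
    return [
        ["docker", "run", "-d", "--name", spec.name, "--network", network, spec.image]
        for spec in OFFLOAD_PLANS[function_name]
    ]
```

A server-side request handler could pass each returned list to `subprocess.run(cmd, check=True)`. On a multi-host cluster the same idea would map to one `docker service create` per container instead of `docker run`.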
I hope that I was able to understandably present my current thoughts and that they make sense. Feel free to point me to any existing tools, workflows, or even completely alternate system designs. Thank you in advance!
I'm moving from a local docker-compose build to a production environment in which I have 4 VPSs. The first (the manager) is the one with the fewest resources. The other 3 (the workers) are identical and bigger. I decided to use Docker Swarm to manage this infrastructure. My question is: should I be concerned about which host a given container is running on? Or is that a misconception on my part? I mean, is Docker Swarm meant to let me abstract away from the underlying nodes and create services and containers, trusting that Docker will manage the resources successfully?
Answer is... both!
The goal is to let Docker Swarm manage things for you as much as possible, but also to add constraints so that your application is deployed on the hardware that best matches its requirements.
For example, if you have a reverse proxy and machine learning models, you might want to deploy your reverse proxy on a CPU optimized server, and your machine learning models on a memory optimized instance.
You need to label your nodes properly, and then add constraints so that services are only deployed to the nodes that match your labels. In the example above, you could add 2 labels: reverse-proxy and ml.
I explain how to do this in more detail in this article, in case you're interested: https://juliensalinas.com/en/container-orchestration-docker-swarm-nlpcloud/
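As a rough illustration of the label-then-constrain idea, the Python sketch below builds the two Docker CLI invocations involved: `docker node update --label-add` and `docker service create --constraint`. It only assembles argument lists. The label key `role`, the node name `worker-1`, and the image name are assumptions made for the example.

```python
from typing import List


def label_node_command(node: str, key: str, value: str) -> List[str]:
    """Argument list that attaches a key=value label to a swarm node."""
    return ["docker", "node", "update", "--label-add", f"{key}={value}", node]


def constrained_service_command(
    name: str, image: str, key: str, value: str, replicas: int = 1
) -> List[str]:
    """Argument list that creates a service restricted to matching nodes."""
    return [
        "docker", "service", "create",
        "--name", name,
        "--replicas", str(replicas),
        # Only nodes carrying this label are eligible for the service's tasks.
        "--constraint", f"node.labels.{key}=={value}",
        image,
    ]


# Hypothetical usage: mark worker-1 as an ML node, then pin the models there.
label_cmd = label_node_command("worker-1", "role", "ml")
service_cmd = constrained_service_command("models", "example/ml-api", "role", "ml")
```

Both commands would be run on a manager node. Swarm still chooses among all nodes that satisfy the constraint, so the scheduler keeps its freedom within each labelled group.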
This question illustrates the theoretical differences between docker run and docker service.
What I don't understand is when would one need to use the exact same container replicated multiple times (as per the Docker documentation example)?
There, they run the same web app replicated 5 times.
Is deployment on Kubernetes (for example) a potential use case, where the developer does not want to centralize the app on one host, in order to make it more resilient, hence why 5 replicas are created?
To help me understand, can someone please provide an example use case where docker service is useful?
Swarm is an orchestrator, just like Kubernetes. docker service deploys services to Swarm just as you deploy your services to Kubernetes using kubectl.
Swarm is essentially a built-in, primitive orchestrator. One possible use for replicas is running a proxy that directs requests to the proper containers. You could expose multiple machines and have one take the place of another if it fails. Or any other high-availability case you can think of.
Your question could be rephrased as "What's the difference between running a single container and running containers in a cluster?", which would be another question altogether, but that rephrasing might help illustrate what docker service does.
If you want to scale your application, you can run multiple instances of it (horizontal scaling) or you beef up the machine(s) that it runs on (vertical scaling). For the first, you would have to put a load balancer in front of your application so that the traffic is evenly distributed between the different instances. The idea is that those instances run on different hosts, so if one goes down, your application is still up. Some controlling instance (a Kubernetes service, for example) will notice that one of your instances has gone south and won't direct any more traffic to it. Nowadays, with all the cloud stuff going on, this is typically the way to go.
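The horizontal-scaling behaviour described above can be sketched as a tiny round-robin balancer that skips instances marked as down. This is a toy model in Python, not how Kubernetes or Swarm implement it. The class and method names are made up for the illustration.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Cycle through instances, skipping any that are marked as down."""

    def __init__(self, instances: Iterable[str]) -> None:
        self._instances: List[str] = list(instances)
        self._healthy: Set[str] = set(self._instances)
        self._next = 0  # index of the next candidate to try

    def mark_down(self, instance: str) -> None:
        self._healthy.discard(instance)

    def mark_up(self, instance: str) -> None:
        if instance in self._instances:
            self._healthy.add(instance)

    def pick(self) -> str:
        # Try each instance at most once per call, starting where we left off.
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```

With three instances on three hosts, marking one as down leaves traffic alternating between the other two, which is the "application is still up" property the replicas are there to provide.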
You don't need Kubernetes for such a setup, but you're right, this would be a typical use case for it. At least if you run your application in a Docker container.
One use case is running on Docker Swarm, where your swarm cluster consists of n nodes. You can run replicas of your application on the swarm cluster, with a load balancer/reverse proxy in front to balance the load. If any one of the nodes goes down, the application can still run.
But the main use case for running multiple instances is scalability. Suppose you know that one instance of your app can serve 10,000 users at a time (assume bank authentication).
If you want your application to serve 50K users, just run 5 replicas (using docker service create).
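The arithmetic behind that replica count is a ceiling division. Here it is as a small Python sketch that also builds the matching `docker service scale` argument list. The service name `auth` is a hypothetical placeholder.

```python
from typing import List


def replicas_needed(expected_users: int, users_per_replica: int) -> int:
    """Smallest replica count whose combined capacity covers the load."""
    if users_per_replica <= 0:
        raise ValueError("users_per_replica must be positive")
    # Integer ceiling division; always keep at least one replica running.
    return max(1, (expected_users + users_per_replica - 1) // users_per_replica)


def scale_command(service: str, replicas: int) -> List[str]:
    """Argument list for resizing an existing swarm service."""
    return ["docker", "service", "scale", f"{service}={replicas}"]


# Hypothetical service name; 50,000 users at 10,000 users per replica.
cmd = scale_command("auth", replicas_needed(50_000, 10_000))
```

Note that 50,001 users would already require a sixth replica, so in practice some headroom is usually added on top of the computed number.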
I just want to know: is there any such facility available in Docker now? I have already gone through some of the Docker documentation on multi-host facilities such as:
Docker swarm
Docker service (with replicas)
I am also aware of the volume problems in swarm mode, and that the maximum resource limit (RAM and CPU) for a container varies depending on which machine the swarm manager assigns it to. So here is my question:
How can I run a single container instance over multiple machines (not as a service)? (That is, a single container could acquire all the resources [RAM1 + RAM2 + ... + RAMn] of these connected machines.)
Is there any way to achieve this?
My question may be idiotic, but I am curious to know how to achieve this.
The answer is no. Containerization technologies cannot treat the compute, network, and storage resources of a cluster as one unit. They only orchestrate them.
Docker and co. are based on cgroups, namespaces, layered filesystems, virtual networks, etc. All of these are tied to a specific machine and its running processes, and they require additional tooling (for example Mesos, k8s, or Swarm) to manage containers not just on a single machine but across a cluster.
You can look at products such as Hadoop, Spark, Cassandra, the Akka framework, and other distributed computation implementations for examples of how to manage cluster resources as one unit.
PS: Always keep in mind that system complexity increases as components become more distributed.
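Put as arithmetic: because a container lives on exactly one machine, the most memory it can get is bounded by the largest node, not by the sum over all nodes. A small Python sketch, with hypothetical node names and sizes:

```python
from typing import Dict


def max_single_container_memory(node_memory_gib: Dict[str, int]) -> int:
    """Upper bound for one container: the biggest single node."""
    return max(node_memory_gib.values())


def total_cluster_memory(node_memory_gib: Dict[str, int]) -> int:
    """What the cluster holds overall; only reachable with many containers."""
    return sum(node_memory_gib.values())


# Hypothetical three-node cluster (memory in GiB).
nodes = {"node-1": 8, "node-2": 16, "node-3": 4}
largest = max_single_container_memory(nodes)
total = total_cluster_memory(nodes)
```

Here a single container tops out at 16 GiB even though the cluster holds 28 GiB. Using the full 28 GiB requires an application that is itself distributed across several containers.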
I am familiarizing myself with the architecture and practices for packaging, building, and deploying software, or at least small pieces of software.
If I mix up concepts with specific tools along the way (sometimes it is unavoidable), please let me know if I am wrong.
Along the way, I have been reading and learning about images and containers and how they relate, in order to build software workflows in the best possible way.
And I have a question about service orchestration in the context of Docker:
Containers are lightweight, portable encapsulations of an environment that holds all the binaries and dependencies we need to run our application. OK.
I can set up communication between containers using container links (the --link flag).
I can replace container links with docker-compose in order to automate my services workflow and run multiple containers using .yaml configuration files.
And I am reading about the term container orchestration, which defines the relationship between containers when we have distinct "software pieces" separate from each other, and how these containers interact as a system.
Well, I suppose I've read the documentation correctly :P
My question is:
At the Docker level, are container links and docker-compose a form of container orchestration?
Or, with Docker, if I want to do container orchestration, should I use docker-swarm?
You should forget you ever read about container links. They've been obsolete in pure Docker for years. They're also not especially relevant to the orchestration question.
Docker Compose is a simplistic orchestration tool, but I would in fact class it as an orchestration tool. It can start up multiple containers together, and within the stack it can restart individual containers if their configurations change. It is fairly oriented towards Docker's native capabilities.
Docker Swarm is mostly just a way to connect multiple physical hosts together in a way that docker commands can target them as a connected cluster. I probably wouldn't call that capability on its own "orchestration", but it does have some amount of "scheduling" or "placement" ability (Swarm, not you, decides which containers run on which hosts).
Of the other things I might call "orchestration" tools, I'd probably divide them into two camps:
General-purpose system automation tools that happen to have some Docker capabilities. You can use both Ansible and Salt Stack to start Docker containers, for instance, but you can also use these tools for a great many other things. They have the ability to say "run container A on system X and container B on system Y", but if you need inter-host communication or other niceties then you need to set them up as well (probably using the same tool).
Purpose-built Docker automation tools like Docker Compose, Kubernetes, and Nomad. These tend to have a more complete story around how you'd build up a complete stack with a bunch of containers, service replication, rolling updates, and service discovery, but you mostly can't use them to manage tasks that aren't already in Docker.
Some other functions you might consider:
Orchestration: How can you start multiple connected containers all together?
Networking: How can one container communicate with another, within the cluster? How do outside callers connect to the system?
Scheduling: Which containers run on which system in a multi-host setup?
Service discovery: When one container wants to call another, how does it know who to call?
Management plane: As an operator, how do you do things like change the number of replicas of some specific service, or cause an update to a newer image for a service?
We are working with a dockerized Kafka environment. I would like to know the best practices for deploying Kafka connectors and Kafka Streams applications in such a scenario. Currently we deploy each connector and stream as a Spring Boot application, started as a microservice via systemctl. I do not see a significant advantage in dockerizing each Kafka connector and stream. Please share your insights on this.
To me the Docker vs non-Docker thing comes down to "what does your operations team or organization support?"
Dockerized applications have an advantage in that they all look and act the same: you docker run a Java app the same way as you docker run a Ruby app. Whereas with the approach of running programs under systemd, there's usually no common abstraction layer around "how do I run this thing?"
Dockerized applications may also abstract away some small operational details, like port management: for example, making sure your apps' management.ports don't clash with each other. An application in a Docker container listens on one port inside the container, and you can publish that port as some other number outside (either random, or one of your choosing).
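As a sketch of the port point: several apps can all listen on the same port inside their containers while each is published on a distinct host port through `docker run -p HOST:CONTAINER`. The Python helper below only computes the flags. The app names and port numbers are hypothetical.

```python
from typing import Dict, List


def publish_flags(
    app_names: List[str], container_port: int, first_host_port: int
) -> Dict[str, List[str]]:
    """Give each app its own host port while keeping the same container port."""
    flags: Dict[str, List[str]] = {}
    for offset, app in enumerate(app_names):
        host_port = first_host_port + offset
        # docker run -p HOST_PORT:CONTAINER_PORT
        flags[app] = ["-p", f"{host_port}:{container_port}"]
    return flags


# Hypothetical apps that all use management port 8081 internally.
flags = publish_flags(["orders-stream", "billing-connector"], 8081, 9001)
```

Every app keeps the same internal configuration, and the clash is resolved entirely at the host level.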
Depending on the infrastructure support, a normal Docker scheduler may auto-scale a service when that service reaches some capacity. However, in Kafka Streams applications the concurrency is limited by the number of partitions in the Kafka topics, so scaling up beyond that just means some consumers in your consumer groups go idle (if there are more consumers than partitions).
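The partition limit can be written as a one-line calculation. This Python sketch assumes the simplified model in the paragraph above, where each partition is consumed by at most one consumer in the group. The function name is made up for the example.

```python
from typing import Tuple


def consumer_utilisation(instances: int, partitions: int) -> Tuple[int, int]:
    """Return (active, idle) consumer counts for one consumer group."""
    if instances < 0 or partitions < 0:
        raise ValueError("counts must be non-negative")
    active = min(instances, partitions)  # a partition feeds at most one consumer
    idle = instances - active            # the rest have nothing to read
    return active, idle
```

For a topic with 6 partitions, scaling from 6 to 8 instances adds 2 idle consumers and no throughput, which is why auto-scaling rules for these apps are best capped at the partition count.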
But it also adds complications: if you use RocksDB as your local store, you'll likely want to persist that outside the (disposable, and maybe read only!) container. So you'll need to figure out how to do volume persistence, operationally / organizationally. With plain ol' Jars with Systemd... well you always have the hard drive, and if the server crashes either it will restart (physical machine) or hopefully it will be restored by some instance block storage thing.
By this I mean to say that kstream apps are not stateless web apps that serve HTTP traffic, where auto-scaling will always give you some more power. The people making these decisions at an organization or operations level may not fully know this. Then again, hey, if everyone writes Docker stuff then the organization / operations team "just" has some Docker scheduler clusters (like a Kubernetes cluster, or Amazon ECS cluster) to manage, and doesn't have to manage VMs as directly anymore.
Dockerizing and clustering with Kubernetes provides many benefits, like auto healing and auto horizontal scaling.
Auto healing: if the Spring application crashes, Kubernetes will automatically start another instance and ensure the required number of containers is always up.
Auto horizontal scaling: if you get a burst of messages, you can configure the Spring applications to scale up or down automatically using the HPA (Horizontal Pod Autoscaler), which can also use custom metrics.
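For reference, the core rule the HPA applies is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The Python sketch below reproduces only that rule. It leaves out the tolerance band and stabilization behaviour of the real controller, and the example metric (messages waiting per pod) is an assumption.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Core HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 3 pods each seeing 200 waiting messages against a target of 100 would scale to 6 pods. For a Kafka Streams app, setting the maximum to the partition count avoids starting consumers that would sit idle.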