I have a web application consisting of a few services - web, DB and a job queue/worker. I host everything on a single Google VM and my deployment process is very simple and naive:
I manually install all services like the database on the VM
a bash script scheduled by crontab polls a remote git repository for changes every N minutes
if there are changes, it simply restarts all services (job queue, web, etc.) using supervisord
Now, I am starting a new web project where I enjoy using docker-compose for local development. However, I seem to be stuck in analysis paralysis deciding between the available options for production deployment - I looked at Kubernetes, Swarm, docker-compose, container registries, etc.
I am looking for a recipe that will keep me productive with a single-machine deployment. Ideally, I should be able to scale it to multiple machines when the time comes, but simplicity and staying frugal (one machine) is more important for now. I want to consider 2 options - when the VM already exists and when a new bare VM can be allocated specifically for this application.
I wonder if docker-compose is a reasonable choice for a simple web application. Do people use it in production, and if so, what does the entire process look like from bare VM to rolling out an updated application? Do people use Kubernetes or Swarm for a simple single-machine deployment, or is that overkill?
I wonder if docker-compose is a reasonable choice for a simple web application.
It can be, sure, if the development time is best spent focused on the web application and less on the non-web stuff such as the job queue and database. The other asterisk is whether the development environment works OK with hot-reloads or port-forwarding and that kind of jazz. I say it's a reasonable choice because 99% of the work of creating an application suitable for use in a clustered environment is the work of containerizing the application. So if the app already works under docker-compose, then there is a high likelihood that you can take the Docker image that docker-compose builds on your behalf and roll it out to the cluster.
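As a hedged sketch of that handoff (the service, image, and registry names are all hypothetical): build the same image docker-compose uses locally, then tag and push it so a cluster can pull it.

    # Build the image exactly as docker-compose would, then publish it
    # for a cluster to consume. All names here are hypothetical.
    docker-compose build web
    docker tag myapp_web:latest registry.example.com/myapp/web:v1.2.3
    docker push registry.example.com/myapp/web:v1.2.3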
Do people use it in production
I hope not; I am sure there are people who use docker-compose to run production workloads, just like there are people who use Windows batch files to deploy, but don't be that person.
Do people use Kubernetes or Swarm for a simple single-machine deployment or is it an overkill?
Similarly, don't be the person who deploys the entire application on a single virtual machine, or be mentally prepared for one failure to wipe out everything that you value. That's part of what clustering technologies are designed to protect against: one mistake taking down the entirety of the application (web, queuing, and persistence) in one fell swoop.
Now, whether deploying Kubernetes in your situation is "overkill" depends on whether you get benefit from the other things Kubernetes brings aside from mere scaling. We get benefit from developer empowerment, log aggregation, CPU and resource limits, the ability to take down one Node without introducing any drama, secrets management, configuration management, and hosting a large number of applications on a small number of Nodes (unlike creating a single virtual machine per deployed application because the deployments have no discipline over the placement of config files or ports or whatever). I can keep going, because Kubernetes is truly magical; but, as many people will point out, running a cluster successfully is not zero human cost.
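To make the resource-limit and secret-management points concrete, here is a hedged sketch of a Deployment that caps CPU and memory and pulls its environment from a Secret; every name in it (web, app-secrets, the registry image) is hypothetical:

    # Apply a Deployment with resource limits and a Secret-backed environment.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: registry.example.com/myapp/web:v1.2.3
            resources:
              requests:        # what the scheduler reserves for the pod
                cpu: 100m
                memory: 128Mi
              limits:          # hard caps enforced at runtime
                cpu: 500m
                memory: 256Mi
            envFrom:
            - secretRef:
                name: app-secrets  # e.g. kubectl create secret generic app-secrets
    EOF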
Many companies I have worked with are shifting their entire production environment towards Kubernetes. That makes sense because all cloud providers are currently pushing Kubernetes, and we can be quite positive about Kubernetes being the future of cloud-based deployment. If your application is meant to run in any private or public cloud, I would personally choose Kubernetes as the operating platform for it. If you plan to add additional services, you will easily be able to connect them and scale your infrastructure with a growing number of requests to your application. However, if you already know that you do not expect to scale your application, it may be overkill to use a Kubernetes cluster to run it, although Google Cloud etc. make it fairly easy to set up such a cluster with a few clicks.
Regarding an automated development workflow for Kubernetes, you can take a look at my answer to this question: How to best utilize Kubernetes/minikube DNS for local development
I don't know much about kubernetes, but as far as I know, it is a system that enables you to control and manage containerized applications. So, generally speaking, the essence of the benefit that we get from kubernetes is the ability to "tell" kubernetes what containers we want running, how many of them, on which machines, among other details, and kubernetes will take care of doing that for us. Is that correct?
If so, I just can't see the benefit of running a CI pipeline using a Kubernetes pod, as I understand some people do. Say you have your build tools in Docker containers instead of having them installed on a specific machine; that's great - you can just use those containers in the build process, but why Kubernetes? Is there any performance gain or something like that?
Appreciate some insights.
It is highly recommended to get a good understanding of what Kubernetes is and what it can and cannot do.
Generally, containers combined with an orchestration tool can provide better management of your machines and services. They can significantly improve the reliability of your application and reduce the time and resources spent on DevOps.
Some of the features worth noting are:
Horizontal infrastructure scaling: New servers can be added or removed easily.
Auto-scaling: Automatically change the number of running containers, based on CPU utilization or other application-provided metrics.
Manual scaling: Manually scale the number of running containers through a command or the interface (a command sketch follows this list).
Replication controller: The replication controller makes sure your cluster is running the desired number of pods. If there are too many pods, the replication controller terminates the extra pods. If there are too few, it starts more.
Health checks and self-healing: Kubernetes can check the health of nodes and containers so failures are detected rather than silently degrading your application. It also offers self-healing and auto-replacement, so you don't need to worry about individual container or pod failures.
Traffic routing and load balancing: Traffic routing sends requests to the appropriate containers. Kubernetes also comes with built-in load balancers so you can balance resources in order to respond to outages or periods of high traffic.
Automated rollouts and rollbacks: Kubernetes handles rollouts of new versions or updates without downtime while monitoring the containers' health. If a rollout doesn't go well, you can roll it back.
Canary Deployments: Canary deployments enable you to test the new deployment in production in parallel with the previous version.
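Two of the items above, manual scaling and rollouts/rollbacks, map directly onto kubectl one-liners. A brief hedged sketch, where the deployment name web and the image tag are hypothetical:

    kubectl scale deployment web --replicas=5            # manual scaling
    kubectl set image deployment/web web=myrepo/web:v2   # start a rolling update
    kubectl rollout status deployment/web                # watch it progress
    kubectl rollout undo deployment/web                  # roll back if it misbehaves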
However, you should also know what Kubernetes is not:
Kubernetes is not a traditional, all-inclusive PaaS (Platform as a Service) system. Since Kubernetes operates at the container level rather than at the hardware level, it provides some generally applicable features common to PaaS offerings, such as deployment, scaling, and load balancing, and lets users integrate their logging, monitoring, and alerting solutions. However, Kubernetes is not monolithic, and these default solutions are optional and pluggable. Kubernetes provides the building blocks for building developer platforms, but preserves user choice and flexibility where it is important.
Especially relevant to your use case, note that Kubernetes:
Does not deploy source code and does not build your application. Continuous Integration, Delivery, and Deployment (CI/CD) workflows are determined by organization cultures and preferences as well as technical requirements.
The decision is yours, but keeping the main concepts above in mind will help you make it.
An important detail is that you do not tell Kubernetes which nodes a given pod should run on; it picks them itself, and if the cluster is low on resources, in many cases it can actually allocate more nodes on its own (via the cluster autoscaler).
So if your CI system is fairly busy, and uses containers for everything, it could make more sense to run an individual build as a Kubernetes Job. If you have 100 builds that all start at the same time, it's possible for the cluster to give itself more hardware, and the build queue will clear out faster. Particularly if you're using Kubernetes for other tasks, this can save you some administrative effort over maintaining a dedicated pool of CI-system workers that need to be separately updated and will sit mostly idle until that big set of builds arrives.
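As a hedged illustration of the "individual build as a Kubernetes Job" idea (the job name, image, and repository URL are all hypothetical):

    # Submit one build as a Job; the scheduler places it wherever there is
    # room, and the Job object records success or failure.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: build-1234
    spec:
      backoffLimit: 0          # a failed build should fail, not retry forever
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: build
            image: golang:1.13  # any image carrying your build toolchain
            command: ["sh", "-c",
              "git clone https://github.com/example/app && cd app && go build ./..."]
    EOF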
Kubernetes's security settings are also substantially better than Docker's. Say your CI system needs to launch containers as part of a build. In Kubernetes, it can run under a service account, and be given permissions to create and delete deployments in a specific namespace, and nothing else. In Docker the standard approach is to give your CI system access to the host's Docker socket, but this can be easily exploited to take over the host.
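The namespace-scoped permissions described above can be sketched with plain kubectl commands; the ci namespace and ci-bot service account are hypothetical names:

    # Give the CI system an identity that can manage Deployments in one
    # namespace and nothing else.
    kubectl create namespace ci
    kubectl create serviceaccount ci-bot -n ci
    kubectl create role deploy-manager -n ci \
      --verb=get,list,create,delete --resource=deployments
    kubectl create rolebinding ci-bot-deploys -n ci \
      --role=deploy-manager --serviceaccount=ci:ci-bot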
I am new to Docker, and I have done some groundwork on it: how to build an image, how to create a container, what a Dockerfile is, what the docker-compose.yml file does, etc. I have practiced on my local machine. I have the following questions when it comes to real-world development using Docker.
1) Does each developer on a team have to develop on Docker, create images on their local machine, and push them to a Docker registry? Or do developers work without Docker, with one person creating the final image from the committed code?
2) For each release, do we maintain a container or an image for that release?
3) What is the best practice for the database? Do we incorporate the database in the image and build the container, or do we include only application-related things in the image and have that container communicate with a database in a separate container or on a physical database server?
Thanks in advance.
Questions like these generally do not have a definitive answer. It depends heavily on your company, your team, the tooling that is being used, the software stack, etc. The best thing to do would be to lean on the senior development resources and senior technical leadership in your organization to help shape the answers to questions like these.
Take the following answers with a silo of salt as there is no way to answer these kinds of questions definitively.
1) Depends on what is most convenient for the developers and what language you are using. Some developers find an all container workflow to be convenient, some developers find they can iterate faster with their existing IDE/CLI workflow and test against running container images last.
In most cases you will want to let CI/CD tooling take care of the builds that are intended for production.
2) Yes. You can use image tagging for this purpose.
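A small hedged sketch of per-release tags (registry and names are hypothetical):

    # Tag each release's image immutably and push it; rolling back is then
    # just deploying the previous tag.
    docker build -t registry.example.com/myapp:1.4.0 .
    docker push registry.example.com/myapp:1.4.0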
3) Running databases in containers is possible, but unless your team is experienced with containers and container orchestration I would leave the databases on traditional bare-metal or VMs.
Containers are a fancy wrapper around a single Linux process. Generally the rule of thumb is one container for one process; you should not be combining multiple things in a single container. (This story gets slightly more complicated when you go to something like Red Hat OpenShift or Kubernetes, where the discussion becomes how many containers per pod.)
The setup I'm used to is that developers mostly ignore Docker until they need it to deploy. Tools like Node's node_modules directory or Python's virtual environments can help isolate subprojects from each other. Any individual developer should be able to run docker build to build an image, but won't usually need to until the final stages of testing a particular change. You should deploy a continuous integration system, and that will take responsibility for testing and building a final Docker image of each commit.
You never "maintain containers". If a container goes wrong, delete it and start a new one. Your CI system should build an image for each release, and you should deploy a registry to hold these.
You should never keep the database in the same container as the application. (See the previous point about routinely deleting containers.) My experience has generally been that production databases are important and special enough to merit their own dedicated non-Docker hosts, but there's nothing actually wrong with running a database in Docker; just make sure you know how to do backups and restores and migrations and whatever else on it.
There's no technical reason you can't use Docker Compose for production, but if you wind up needing to deploy your application on more than one physical server you might find it limiting. Kubernetes is more complex but seems to be the current winner in this space; Docker Swarm has some momentum; Hashicorp Nomad is out there; or you can build a manual deployment system by hand. (Note that at least Kubernetes and Nomad are both very big on the "something changed so I'm going to delete and recreate a container" concept, and both make it extremely tricky to do live development in a quasi-production setup.)
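If you do run Compose on a single production host, a hedged sketch of the release step might look like this; it assumes your CI pushes tagged images to a registry, per the previous point:

    # One-time setup on a bare VM: install Docker and docker-compose, then
    # copy over a docker-compose.yml that references registry image tags.
    # Per release:
    docker-compose pull       # fetch the images your CI published
    docker-compose up -d      # recreate only containers whose images changed
    docker image prune -f     # discard superseded image layers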
Also note that where I say "deploy" there are public-cloud versions of all of these things (Docker Hub, CircleCI, end-to-end solutions including a registry and Kubernetes built on top of AWS or Azure or GCP), and if you're comfortable with the cost-to-effort tradeoff and the implications of using an external service in your build/deploy chain, these can help you get started faster.
I'm using docker on a bare metal server. I'm pretty happy with docker-compose to configure and setup applications.
Still, some features are missing, like configuration management and monitoring. Maybe there are other solutions to these issues, but I'm a bit overwhelmed by the feature set of Kubernetes and can't judge whether it would help me here.
I'm also open for recommendations to solve the requirements separately:
Configuration / Secret management
Monitoring of my Docker-hosted applications (e.g. having some kind of dashboard)
Remote container control (SSH is okay with only one server)
Being ready to scale my environment (based on multiple different Dockerized applications) to more than one server in the future - already thinking about networking/service discovery issues with a pure docker-compose setup
I'm sure Kubernetes covers some of these features, but I have the feeling that it's too focused on cloud platforms where machines are created on the fly (since I only have at most a few bare-metal servers).
I hope the question's scope is not too broad; otherwise please use the comment section and help me narrow it down.
Thanks.
I think Kubernetes absolutely matches your requirements, and it is what you need.
Let's start one by one.
I have the feeling that it's too focused on cloud platforms where machines are created on the fly (since I only have at most a few bare-metal servers)
No, it is not focused on clouds. Kubernetes can be installed on almost any bare-metal platform (including ARM), and there are many tools and instructions that can help you do it. Also, it is easy to deploy locally on your PC using Minikube, which will prepare a local cluster for you within VMs or right in your OS (Linux only).
Configuration / Secret management
Kubernetes has powerful configuration and secret management based on special objects (ConfigMaps and Secrets) which can be attached to your containers. You can read more about configuration management in that article.
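A minimal hedged sketch of those objects (the names and values are hypothetical):

    # Create configuration and a secret as first-class cluster objects; pods
    # then reference them via envFrom (configMapRef / secretRef) or volumes.
    kubectl create configmap app-config --from-literal=LOG_LEVEL=info
    kubectl create secret generic app-secrets --from-literal=DB_PASSWORD=changeme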
Moreover, tools like Helm can provide you with more automation and a range of preconfigured applications, which you can install with a single command. And you can prepare your own charts for it.
Monitoring of my Docker-hosted applications (e.g. having some kind of dashboard)
Kubernetes has its own dashboard where you can see many kinds of information: current application status, configuration, statistics and more. Also, Kubernetes integrates well with Heapster, which can be used with Grafana for powerful visualization of almost anything.
Remote container control (SSH is okay with only one server)
Kubernetes's control tool, kubectl, can fetch logs from and connect to containers in the cluster without any problems. For example, to connect to a container in the pod "myapp" you just call kubectl exec -it myapp sh, and you get an sh session inside the container. Also, you can reach any application inside your cluster using the kubectl port-forward command, which forwards a port you need to your PC.
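A quick hedged sketch of that remote access (the pod name myapp is hypothetical):

    kubectl logs myapp                      # fetch container logs
    kubectl exec -it myapp sh               # interactive shell in the container
    kubectl port-forward pod/myapp 8080:80  # reach the app on localhost:8080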
Being ready to scale my environment (based on multiple different Dockerized applications) to more than one server in the future - already thinking about networking/service discovery issues with a pure docker-compose setup
Kubernetes can be scaled up to thousands of nodes. Or can have only one. It is your choice. Independent of a cluster size, you will get production-grade networking, service discovery and load balancing.
So, do not be afraid; just try it locally with Minikube. It will make many operational tasks simpler, not more complex.
I would like to know the strong reasons to go or not to go with Docker for an Elixir/Erlang application in production. This is the first time I have been asked to start with Docker in production. I have worked on production without Docker. I am an Erlang/Elixir developer. I have worked on high-traffic production servers with millions of transactions per second that run without Docker. I spent one day creating and running an Elixir application image, with lots of issues with the network; I had to do lots of configuration for DNS setup, etc. After that I started thinking: what are the strong reasons for proceeding further? Are there any strong reasons to go or not to go with Docker for Elixir/Erlang applications in production?
I went through some of the reasons in the forums, but I am still not convinced. All the advantages that Docker provides are already there in the Erlang VM. Could any Erlang expert on the forum please help me?
I deploy Elixir packaged in Docker on AWS in production.
This used to be my preferred way of doing things but now I am more inclined to create my own AMI using Packer with everything preinstalled.
The central matter in deployments is control, which to a certain extent I feel is relinquished when leveraging Docker.
The main disadvantage of Docker is that it limits the capabilities of Erlang/Elixir, such as internode connection over epmd. This also means that remsh is practically out of the question, and the cool :observer.start is a no-no. If you ever need to interact with a production node for whatever reason, there is an extra barrier to entry: first ssh-ing into the server, then going inside Docker, etc. Fine when it is just about checking something; frustrating when production is burning down in agony. Launching multiple containers on one node is kind of useless, as the BEAM already makes efficient use of all your cores. Hot upgrades are practically out of the question, but that is not really a feature we personally have an intrinsic business need for.
Effort has been made to get epmd working within a container setup, such as: https://github.com/Random-Liu/Erlang-In-Docker but that will require you to rebuild Erlang for custom net_kernel modifications.
Amazon has recently released a new feature to AWS ECS, AWS VPC Networking Mode, which may facilitate inter-container epmd communication and thus connecting to your node directly. I haven't validated that as yet.
Besides the issue of epmd communication, there is the matter of deployment time. Even though there are base images that boast only 5 MB, creating your image with Docker will quickly end up taking 300 MB, with 200 MB of that just for the various dependencies needed to create your release. There may be ways to reduce that, but it requires specialized knowledge and dedicated effort. I would classify this extra space as an annoyance rather than a deal breaker, but believe me: if you have to wait 25 minutes for your immutable deployments to complete, any minute you can shave off is worthwhile.
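One common way to claw that space back is a multi-stage build: compile the release in a full Elixir image, then copy only the self-contained release into a slim runtime image. A hedged sketch; the app name myapp and all versions are hypothetical:

    cat > Dockerfile <<'EOF'
    FROM elixir:1.9 AS build
    WORKDIR /app
    COPY . .
    RUN mix local.hex --force && mix local.rebar --force \
     && MIX_ENV=prod mix deps.get && MIX_ENV=prod mix release

    FROM debian:buster-slim
    RUN apt-get update \
     && apt-get install -y --no-install-recommends openssl locales \
     && rm -rf /var/lib/apt/lists/*
    # the release bundles its own ERTS, so no Erlang install is needed here
    COPY --from=build /app/_build/prod/rel/myapp /opt/myapp
    CMD ["/opt/myapp/bin/myapp", "start"]
    EOF
    docker build -t myapp:slim .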
Performance-wise, I did not notice a significant difference between bare-metal deployments and Docker deployments. AWS EB Docker nicely expands the container resources to those of the EC2 instance.
The advantage, of course, is portability. If you have a front-end engineer who needs to hit a JSON API, then in terms of local development it is a huge win that, with some careful setup, they can just spin up the latest API running locally without having to know about Erlang/Elixir/Rserve/Postgres.
Also, vendor lock-in is greatly reduced, especially since AWS launched their support for Kubernetes.
This is a question of tradeoffs. If you are a developer who needs to get to production and has very little DevOps knowledge, then perhaps a Docker deployment is warranted. If you are more familiar with infrastructure, deployments, etc., then as a developer I believe that creating your own AMI gives you more control over your environment.
All in all, I would encourage you to at least play around with Docker and experiment with it; it may open up a new realm of possibilities.
Maybe it depends on the server you want to use. From what I know, for example, Docker facilitates the deployment of a Phoenix application on AWS Elastic Beanstalk a lot, but I'm not competent enough to give you very specific reasons at the moment.
Maybe someone can elaborate more.
Docker is primarily a deployment and distribution tool. From the Docker docs:
Docker streamlines the development lifecycle by allowing developers to work in standardized environments using local containers which provide your applications and services. Containers are great for continuous integration and continuous delivery (CI/CD) workflows.
If your application has external dependencies (for example, a crypto library), interacts with another application written in another language (for example, a database running as a separate process), or relies on certain operating system / environment configuration (you mentioned you had to do some DNS configuration), then packaging your application in a Docker container helps you avoid doing duplicate work installing dependencies and configuring the environment. It helps you avoid extra work keeping your testing and production environments in sync in terms of dependencies, or investigating why an application works on one machine in one environment but not on another.
The above is not specific to an Erlang application, though I can agree that Erlang helps eliminate some of these problems by being cross-platform and abstracting away some of the dependencies, and OTP release handling helps you package your application.
Since you mentioned you are a developer, it is worth mentioning that Docker offers more advantages for an administrator or a team running the infrastructure than it does for a developer.
There are many reasons for deploying containerized applications on Kubernetes. But we may get overwhelmed by its usefulness and start deploying applications where we should not.
Can there be a case where deploying an application on Kubernetes would not add any value, and would in fact be disadvantageous?
To be specific, and to take an example: would deploying a support tool like Jenkins on Kubernetes be a wrong decision if scaling and high availability are not really concerns?
Deploying Jenkins or other services on Kubernetes can be a good decision if you'd like an existing k8s infrastructure to monitor and manage those pods for you. There is more to k8s than scaling. Running two pods can provide higher availability, for example; maybe you don't need to scale, but you really do not want any downtime. Also, things like rolling updates can be useful in some situations.
I'm finding that deploying anything that requires complex individual configuration, and especially data persistence, is not awesome in k8s currently. PetSets aims to change that but is only Alpha at this point in time.
I suggest avoiding deploying complex databases like Cassandra and Vertica in k8s. I'd also avoid deploying Elasticsearch, Zookeeper, and Kafka in k8s. These all require individual node configuration and data persistence that currently will cause you more grief than benefit, in my experience.