DevOps Simple Setup - docker

I'm looking to start creating proper isolated environments for Django web apps. My first inclination is to use Docker. Also, it's usually recommended to use virtualenv with any Python project to isolate dependencies.
Is virtualenv still necessary if I'm isolating projects via Docker images?

If your Docker container is relatively long-lived, or your project dependencies change over time, there is still value in using a Python virtual environment. Beyond isolating a codebase's dependencies from other projects and from the underlying system, it also gives you a way of capturing the state of your requirements at a given point in time.
For example, say that you build a Docker image for your Django app today and end up using it for the following three weeks. Do you see your requirements.txt file being modified between now and then? Can you imagine a scenario in which you put out a hotfix that comes with changes to the environment?
As of Python 3.3, the venv module is part of the standard library, which means it's very cheap to use, so I'd continue using it, just in case the Docker container isn't as disposable as you originally planned. Stated another way, even if your Docker image pipeline is quite mature and the version of Python and its dependencies are "pre-baked", it's such low-hanging fruit that, while not strictly necessary, it's worth sticking with best practices.
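For illustration, a minimal sketch of what that might look like in a Dockerfile (the base image tag, paths, and the runserver command are assumptions, not part of the original question):
FROM python:3.12-slim
WORKDIR /app
# Create the virtual environment inside the image
RUN python -m venv /opt/venv
# Putting the venv's bin directory first on PATH makes its python/pip the defaults
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Django's development server; swap for gunicorn or similar in production
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]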

No, not really, if each Python / Django app is going to live in its own container.


Is it possible to manage Dockerfile for a project externally
So instead of ProjectA/Dockerfile and ProjectB/Dockerfile,
can we do something like project-deploy/Dockerfile.ProjectA and project-deploy/Dockerfile.ProjectB, which would somehow know how to build the ProjectA and ProjectB Docker images?
We would like to allow separation of the developer and DevOps roles.
Yes, this is possible, though not recommended (I'll explain why in a second). First, here's how you would accomplish what you asked:
Docker Build
The command to build an image in its simplest form is docker build . which performs a build with a build context pulled from the current directory. That means the entire current directory is sent to the docker service, and the service will use it to build an image. This build context should contain all of the local resources docker needs to build your image. In this case, docker will also assume the existence of a file called Dockerfile inside of this context, and use it for the actual build.
However, we can override the default behavior by specifying the -f flag in our build command, e.g. docker build -f /path/to/some.dockerfile . This command still uses your current directory as the build context, but reads the Dockerfile from the path given to -f, which can live elsewhere.
So in your case, let's assume the code for ProjectA is housed in the directory project-a and the deployment files in project-deploy. You can build and tag your Docker image as project-a:latest like so:
docker build -f project-deploy/Dockerfile.ProjectA -t project-a:latest project-a/
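For clarity, the layout that command assumes would look roughly like this (only the names come from the question; the comments are illustrative), and ProjectB builds the same way:
project-a/                      # ProjectA source code (used as the build context)
project-b/                      # ProjectB source code
project-deploy/
    Dockerfile.ProjectA
    Dockerfile.ProjectB

docker build -f project-deploy/Dockerfile.ProjectB -t project-b:latest project-b/
Keep in mind that COPY and ADD instructions inside Dockerfile.ProjectA are resolved against the build context you pass (project-a/), not against the project-deploy directory.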
Why this is a bad idea
There are many benefits to using containers over traditional application packaging strategies. These benefits stem from the extra layer of abstraction that a container provides. It enables operators to use a simple and consistent interface for deploying applications, and it empowers developers with greater control and ownership of the environment their application runs in.
This aligns well with the DevOps philosophy, increases your team's agility, and greatly alleviates operational complexity.
However, to enjoy the advantages containers bring, you must make the organizational changes to reflect them, or all you're doing is making things more complex and further separating operations and development:
If your operators are writing your Dockerfiles instead of your developers, then you're just adding more complexity to their job with few tangible benefits;
If your developers are not in charge of their application environments, there will continue to be conflict between operations and development, accomplishing basically nothing for them either.
In short, Docker is a tool, not a solution. The real solution is to make organizational changes that empower and accelerate the individual with logically consistent abstractions, and docker is a great tool designed to complement that organizational change.
So yes, while you could separate the application's environment (the Dockerfile) from its code, it would be in direct opposition to the DevOps philosophy. The best solution would be to treat the docker image as an application resource and keep it in the application project, and allow operational configuration (like environment variables and secrets) to be accomplished with docker's support for volumes and variables.
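As a rough sketch of what that runtime configuration could look like (the image name, the variable, and the paths are hypothetical):
# Environment-specific settings via -e, secrets and data via volumes
docker run -d \
  -e DATABASE_URL=postgres://db.internal:5432/app \
  -v /srv/app/secrets:/run/secrets:ro \
  -v app-data:/var/lib/app \
  project-a:latest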

Why do we build "inside" docker?

When I first learned Docker I expected a config file, image producer, CLI, and options for mounting and networks. That's all there.
I did not expect to put build commands inside a Dockerfile. I thought docker would wrap/tar/include a prebuilt task I made. Why give build commands in Docker?
Surely it can import a task, thus keeping Jenkins/Bazel etc. distinct and apart from making an image/container?
I guess we are dealing with a misconception here. Docker is NOT a lightweight version of VMware/Xen/KVM/Parallels/FancyVirtualization.
Disclaimer: The following is heavily simplified for the sake of comprehensibility.
So what is Docker?
In one sentence: Docker is a system to isolate processes from the other processes within an operating system as much as possible while still providing all means to run them. Put differently:
Docker is a package manager for isolated processes.
Its closest ancestors are chroot and BSD jails. What those basically do is isolate (more in the case of BSD jails, less in the case of chroot) a part of your OS resources and run a complete environment independently from the rest of the OS - except for the kernel.
In order to be able to do that, a Docker image obviously needs to contain everything except a kernel. So you need to provide a shell (if you choose to do so), standard libraries like glibc, and even resources like CA certificates. For reference: in order to set up chroot jails, you once did all of this by hand, preinstalling your chroot environment with each and every piece of software required. Docker basically takes that heavy lifting off your hands.
The isolation mentioned above, down to the installed (and usable) software, sounds cumbersome, but it gives you several advantages as a developer. Since you provide basically everything except a (compatible) kernel, you can develop and test your code in the same environment it will run in later down the road. Not a close approximation, but literally the same environment, bit for bit. A rather famous proverb in relation to Docker is:
"Runs on my machine" is no excuse any more.
Another advantage is that you can add static resources to your Docker image and access them via quite ordinary file system semantics. While it is true that you can do that with virtualisation images as well, they usually do not come with a language for provisioning. Docker does - the Dockerfile:
FROM alpine
LABEL maintainer="you@example.com"
COPY file/in/host destination/on/image
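Building and checking that image would then be something like this (the tag is arbitrary):
docker build -t my-provisioned-image .
docker run --rm my-provisioned-image ls /destination/on/image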
Ok, got it, now why the build commands?
As described above, you need to provide all dependencies (and transitive dependencies) your application has. The easiest way to ensure that is to build your application inside your Docker image:
FROM somebase
RUN yourpackagemanager install long list of dependencies && \
make yourapplication && \
make install
If the build fails, you know you have missing dependencies. Now you can tweak and tune your Dockerfile until it compiles and is tested. Once your Docker image is finished, you can confidently distribute it, since you know that as long as the Docker daemon runs on the machine somebody tries to run your image on, your image will run.
In the Go ecosystem, you basically ensure that your go.mod and go.sum are up to date and working, and your work stays reproducible.
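A sketch of how that is commonly expressed in a Dockerfile for a Go project (the Go version, paths, and binary name are assumptions):
FROM golang:1.22
WORKDIR /src
# Copy the module files first so the dependency download layer is cached
COPY go.mod go.sum ./
RUN go mod download
# Then copy the sources and build the application
COPY . .
RUN go build -o /usr/local/bin/yourapplication .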
Again, this works with virtualisation as well, so what is the big deal?
A (good) docker image only runs what it needs to run. In the vast majority of docker images, this means exactly one process, for example your Go program.
Side note: It is very bad practice to run multiple processes in one Docker container, say your application plus a database server plus a cache and whatnot. That is what docker-compose is there for, or more generally container orchestration. But this is far too big of a topic to explain here.
A virtualised OS, however, needs to run a kernel, a shell, drivers, log systems and whatnot.
So the deal basically is that you get all the good stuff (isolation, reproducibility, ease of distribution) with far less waste of resources than running five copies of the same OS with all their shenanigans.
Because we want to have an environment for reproducible builds. We don't want to depend on the version of the language, the existence of a compiler, the versions of libraries, and so on.
Building inside a Dockerfile allows you to have all the tools and the environment you need inside the image, independently of your platform and ready to use. From a development perspective, it is easier to have everything you need inside the container.
But you have to think about the objective of building inside a Dockerfile: if you have a very complex build process with a lot of dependencies, you have to be aware that keeping all those tools inside reflects on the final size of your resulting image, because building to generate an artifact is not the same as building to produce the final container.
Thinking about these two aspects, you should learn to use Docker's multi-stage build process. The main idea is close to your question, because you can have as many stages as you need depending on your build process, and use a different FROM image in each stage to ensure you have the correct requirements and dependencies, finally generating an image with the minimum dependencies and the smallest size.
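A minimal sketch of such a multi-stage build, assuming a Go application (base images and names are illustrative; the same pattern works for other toolchains):
# Stage 1: build with the full toolchain
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: copy only the finished artifact into a small runtime image
FROM alpine:3.19
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
Only the final stage ends up in the image you ship, so the compiler and build-time dependencies never inflate its size.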
I'll add to the answers above:
Doing builds in or out of Docker is a choice that depends on your goal. In my case I am more interested in Docker containers for Kubernetes, and in addition we have mature builds already.
This link shows how you can take prebuilt tasks and add them to an image. This strategy, together with adding libs, env, etc., leverages Docker well and shows that Docker is indeed flexible: https://medium.com/@chemidy/create-the-smallest-and-secured-golang-docker-image-based-on-scratch-4752223b7324

Advice on how to set up personal services with Helm and Kubernetes

This is more a request for advice than a specific technical question.
I did some searching but it's hard to find the exact same issue. If you think it's a duplicate of another question, please give me some links! :-)
The context
Like many developers (I guess), I have one "Ali Baba's cave" server hosting my blog and multiple services: GitLab, Minio, a billing system for my freelance account, etc.
All services are set up on an Ubuntu server in different ways, according to the possibilities I have: apt-get install, tar extraction, or Capistrano deployments for personal projects.
This is working, but it's maintenance hell for me. Some projects can't be upgraded because a system dependency conflicts with another one or simply isn't available on my OS, or an update may have side effects on some other project. For example, a PHP upgrade needed for my personal project completely broke a manually installed PHP service because the new version was not supported.
The needs
I'm currently learning Kubernetes and Helm charts. The goal is to set up a new CoreOS server and a Kubernetes ecosystem with all my apps and projects on it.
With that, I'll be able:
To get rid of maintenance hell with completely independent applications.
To maintain configuration with ease thanks to a simple git project with CI deployment
How to use Helm for that?
I did a test by creating a basic chart with helm create my-network, creating a basic nginx app, perfect for my network homepage!
But now I would like to add and connect some applications; let's start with GitLab.
I found two ways to add it:
Just running the helm upgrade --install gitlab gitlab/gitlab command with a YAML values file for configuration, outside my own chart.
Adding GitLab as a dependency in requirements.yaml.
Both work, giving me nearly the same result.
The first solution seems more "independent" but I don't really know how to build/test it under CI (I would like upgrade automation).
The second allows me to configure everything with a single values.yaml file, but I don't know what is done during an upgrade (are GitLab's upgrade processes run during my chart upgrade?), and everything is combined into one "project".
GitLab is just an example; I want to add more "ready-to-use" apps this way.
What would you advise? Solution 1 or 2? And what should I really take care of with either solution, especially for upgrades/backups?
If you have a completely different third solution to propose using Helm, feel free! :-)
Thanks
My experience has generally been that using a separate helm install for each piece/service is better. If those services have dependencies (“microservice X needs a Redis cache”) then those are good things to put in the requirements.yaml files.
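For example, a per-service chart could declare that kind of dependency in its requirements.yaml roughly like this (the chart name service-x, the version range, and the repository URL are illustrative):
# service-x/requirements.yaml
dependencies:
  - name: redis
    version: "10.x.x"
    repository: "https://charts.bitnami.com/bitnami"
Fetching the dependency and installing with both a global and a per-service values file (file names are made up) would then look like:
helm dependency update ./service-x
helm install --name service-x -f global-values.yaml -f service-x-values.yaml ./service-x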
A big “chart of charts” runs into a couple of issues:
Helm will flatten dependencies, so if service X needs Redis and service Y also needs Redis, a chart-of-charts setup will install one Redis and let it be shared; but in practice that’s often not what you want.
Separating out “shared” vs. “per-service” configuration gets a little weird. With separate charts you can use helm install -f twice to provide two separate values files, but in a chart-of-charts it’s harder to have a set of really global settings and also a set of per-component settings without duplicating everything.
There’s a standard naming convention that incorporates the Helm release name (from helm install --name) and the specific component name. This looks normal if it’s service-x-redis, a little weird if it’s service-x-service-x, and kind of strange if you have one global release name like the-world-service-x.
There can be good reasons to want to launch multiple independent copies of something, or to test out just the deployment scripting for one specific service, and that’s harder if your only deployment is “absolutely everything”.
For your use case you also might consider whether non-Docker systems management tools (Ansible, Chef, Salt Stack) could reproduce your existing hand deployment without totally rebuilding your system architecture; Kubernetes is pretty exciting but the old ways work very well too.

Docker, update image or just use bind-mounts for website code?

I'm using Django but I guess the question is applicable to any web project.
In our case, there are two types of code: the first is Python code (run in Django), and the other is static files (HTML/JS/CSS).
I could publish a new image whenever there is a change in any of the code.
Or I could use bind mounts for the code. (For Django, we could bind-mount the project root and the static directory.)
If I use bind mounts for code, I could just update the production machine (probably with git pull) when there's a code change.
Then the Docker image would handle updates that are not strictly our own code changes (such as a library update, or a new setup such as setting up Elasticsearch).
Does this approach imply any obvious drawback?
For security reasons it is advised to keep an operating system up to date with the latest security patches, but Docker images are meant to be released in an immutable fashion so that we are always able to reproduce production issues outside production; thus the OS inside the image will not update itself as security patches are released. This means we need to rebuild and deploy our Docker image frequently in order to stay on the safe side.
So I would prefer to release a new Docker image with my code and static files, because they are bound to change more often and thus require frequent releases, meaning that you keep the OS more up to date in terms of security patches without needing to rebuild Docker images in production just for that purpose.
Note that I assume here that you release new code or static files at least on a weekly basis; otherwise I would still recommend rebuilding the Docker images at least once a week in order to get the latest security patches for all the software being used.
Generally the more Docker-oriented solutions I've seen to this problem lean towards packaging the entire application in the Docker image. That especially includes application code.
I'd suggest three good reasons to do it this way:
If you have a reproducible path to docker build a self-contained image, anyone can build and reproduce it. That includes your developers, who can test a near-exact copy of the production system before it actually goes to production. If it's a Docker image, plus this code from this place, plus these static files from this other place, it's harder to be sure you've got a perfect setup matching what goes to production.
Some of the more advanced Docker-oriented tools (Kubernetes, Amazon ECS, Docker Swarm, Hashicorp Nomad, ...) make it fairly straightforward to deal with containers and images as first-class objects, but trickier to say "this image plus this glop of additional files".
If you're using a server automation tool (Ansible, Salt Stack, Chef, ...) to push your code out, then it's straightforward to also use those to push out the correct runtime environment. Using Docker to just package the runtime environment doesn't really give you much beyond a layer of complexity and some security risks. (You could use Packer or Vagrant with this tool set to simulate the deploy sequence in a VM for pre-production testing.)
You'll also see a sequence in many SO questions where a Dockerfile COPYs application code to some directory, and then a docker-compose.yml bind-mounts the current host directory over that same directory. In this setup the container environment reflects the developer's desktop environment and doesn't really test what's getting built into the Docker image.
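That setup typically looks something like the following docker-compose.yml fragment (service and path names are made up for illustration):
version: "3.8"
services:
  web:
    build: .            # the image's Dockerfile COPYs the code into /app
    volumes:
      - .:/app          # but this bind mount hides whatever COPY baked in
Whatever you test this way is the code on the host, not the code inside the image you will later deploy.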
("Static files" wind up in a gray zone between "is it the application or is it data?" Within the context of this question I'd lean towards packaging them into the image, especially if they come out of your normal build process. That especially includes the primary UI to the application you're running. If it's things like large image or video assets that you could reasonably host on a totally separate server, it may make more sense to serve those separately.)

Docker's standardization of environments

I am struggling with a question that nobody seems to answer in detail on the Internet.
"Standardizing service infrastructure across the entire pipeline allows every team member to work in a production parity environment"
This is a key benefit of Docker : it allows everybody to develop, test or whatever in a production-like environment. Because the container that is passed through the pipeline is always the same.
I get that. I understand that this is necessary and that Docker allows this easily.
But what I don't understand is why it was so hard before Docker. If I have a production machine and a testing machine, I won't have any problem building a script that installs the right dependencies, no matter what the machine is. So my environment in terms of libraries or frameworks will be the same.
The only thing that I understand about this whole environment-related benefit is that Docker allows a developer to choose their OS without fear of platform-related bugs. I've already run into features that worked on Windows and not on Mac. Worst kind of bugs in my opinion. So yeah, if I had Docker at the time, I wouldn't have had this problem. But I don't understand why Docker was such a miracle for other environment-related stuff.
I think I am not understanding this because I've only worked on small scale projects. Maybe I also don't realize the full meaning of the word "environment".
What am I missing here? Why were containers a breakthrough for standardizing environments, when scripts can achieve that?
The following list is not exhaustive; it presents only three important advantages of Docker. Please note that Docker is not a magical solution and may not be suited to every context.
Firstly, with containers you don't have conflicts between dependencies.
If you have two programs using the same library at different versions, you'll have to manually install both versions and specify custom environment variables (for example LD_LIBRARY_PATH) before executing your programs. Please note that some tools exist to address this issue, but only in specific cases (virtualenvs in Python, for example).
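A hypothetical illustration of that juggling (library paths and program names are made up):
# Each program is pointed at the library version it was built against
LD_LIBRARY_PATH=/opt/libfoo-1.2/lib ./program-a
LD_LIBRARY_PATH=/opt/libfoo-2.0/lib ./program-b
With containers, each program simply ships the library version it needs inside its own image.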
Secondly, with containers you don't have persistence.
For example, if you write a little bash script to install your development environment based on Nginx and PHP, and by mistake I install Apache, my package will still be present even if you run your script again. The thing is, Apache will sometimes start before Nginx and block port 80, breaking your development environment.
To sum up, without Docker you're not sure about the state of untracked elements, and they may break your environment.
Thirdly, docker allows you to reduce the gap between development and production.
The close environment is "everything needed for your code to run". For example libraries, config files, your interpreter (python, php, ...). Docker packages the application with its close environment so you don't have mismatches between what your app needs and the environment you provided.
This is especially important when you update dependencies during development and may forget to update them in production.
A false argument is security and isolation. The security process starts with defining your threat model and then choosing countermeasures. Adding Docker because it increases security in a risky environment won't be enough (there is no kernel-space isolation), and adding Docker for security when you don't need more is just paranoia. Docker adds userspace isolation and default seccomp profiles, but this is not a reason to use it, unless it matches your threat model.

Resources