Using docker to run a distributed computation

I just came across Docker and was looking through its docs to figure out how to use it to distribute a Java project across multiple nodes while keeping the distribution platform independent, i.e. the nodes can be running any platform. Currently I'm sending classes to different nodes and running them there under the assumption that those nodes have the same environment as the client. I couldn't quite figure out how to do this; any suggestions would be greatly appreciated.

I do something similar. In my humble opinion, whether or not you use Docker is not your biggest problem. That said, using Docker images for this purpose can and will save you a lot of headaches.
We have a build pipeline where a very large Java project is built using Maven. The outcome of this is a single large JAR file that contains the software we need to run on our nodes.
But some of our nodes also need to run third-party software such as ZooKeeper and Cassandra. So after the Maven build we use packer.io to create a Docker image that contains all the needed components; the image ends up on a web server that can be reached only from within our private cloud infrastructure.
If we want to roll out our system, we use a combination of Python scripts that talk to the OpenStack API and create virtual machines in our cloud, and Puppet, which performs the actual software provisioning inside the VMs. Our VMs are CentOS 7 images, so what Puppet actually does is add the Docker yum repos, install Docker through yum, pull the Docker image from our repository server, and finally use a custom bash script to launch our Docker image.
For each of these steps there are certainly even more elegant ways of doing it.
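As a rough illustration of that last step (the base image, paths, and image names here are assumptions, not details from our setup), a Dockerfile that packages such a fat JAR and the commands to build and launch it could look like this:
# Hypothetical Dockerfile that packages the Maven fat JAR
FROM openjdk:8-jre
# Copy the JAR produced by the Maven build into the image
COPY target/app.jar /opt/app/app.jar
# Start the application when the container launches
ENTRYPOINT ["java", "-jar", "/opt/app/app.jar"]
# Build once, then run it on any node that has Docker installed:
# docker build -t registry.internal/app:1.0 .
# docker run -d --name app registry.internal/app:1.0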

Related

Building a docker image on EC2 for web application with many dependencies

I am very new to Docker and have some very basic questions. I was unable to get my doubts clarified elsewhere and hence am posting them here. Pardon me if the queries are very obvious. I know I lack some basic understanding regarding images, but I had a hard time finding an easy-to-understand explanation for the whole of it.
Problem at hand:
I have my application running on an EC2 node (r4.xlarge). It is a web application with a LOT of dependencies (system dependencies + other libraries, etc.). I would like to create a Docker image of my machine so that I can easily run it when I launch a new EC2 instance.
Questions:
Do I need to build the Docker image from scratch, or can I use some base image?
If I can use a base image, which one do I select? (It is hard to know the OS version on the EC2 machine, and hence I am not sure which base image to start from.)
I referred to this documentation:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html#install_docker
But it starts from an Ubuntu base image.
The above example has instructions on installing Apache (and other things needed for the application). Let's say my application needs server X to be installed + 20 system dependencies + 10 other libraries.
Ex:
yum install gcc
yum install gfortran
wget <abc>
When I create a Dockerfile, do I need to specify all the installation instructions like the above? I thought creating an image was like taking a copy of your existing machine. What is the Dockerfile supposed to contain in this case?
Pointing me to some good documentation on building a Docker image on EC2 for a web app with dependencies would be very useful too.
Thanks in advance.
First, if you want to move toward Docker, then I suggest using AWS ECS, which is specifically designed for Docker containers and has auto-scaling and load-balancing features.
As far as your questions are concerned:
You need a Dockerfile that installs all the packages and applications that are already installed on your EC2 instance. As for the base image, I recommend Alpine; many official Docker images use Alpine as their base.
Why Alpine?
Alpine describes itself as:
Small. Simple. Secure. Alpine Linux is a security-oriented,
lightweight Linux distribution based on musl libc and busybox.
https://nickjanetakis.com/blog/the-3-biggest-wins-when-using-alpine-as-a-base-docker-image
https://hub.docker.com/_/alpine/
Let's say my application needs server X to be installed + 20 system
dependencies + 10 other libraries.
So you need to write a Dockerfile that covers everything you mentioned.
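As a rough sketch only (the packages and paths below are placeholders, not taken from your setup), an Alpine-based Dockerfile could look like this:
# Hypothetical Alpine-based Dockerfile; package and path names are placeholders
FROM alpine:3.8
# Install system dependencies with Alpine's package manager
RUN apk add --no-cache gcc gfortran nginx
# Copy the application into the image
COPY ./app /srv/app
# Run the web server in the foreground
CMD ["nginx", "-g", "daemon off;"]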
Again, I suggest ECS for Docker-based applications, because it is ECS that was designed for Docker, not EC2.
CONTAINERIZE EVERYTHING
Amazon ECS lets you easily build all types of
containerized applications, from long-running applications and
microservices to batch jobs and machine learning applications. You can
migrate legacy Linux or Windows applications from on-premises to the
cloud and run them as containerized applications using Amazon ECS.
https://aws.amazon.com/ecs/
https://aws.amazon.com/getting-started/tutorials/deploy-docker-containers/
https://caylent.com/containers-kubernetes-docker-swarm-amazon-ecs/
You can use a base image; you specify it with the first line of your Dockerfile, with FROM.
The base OS of the EC2 instance doesn't matter for the container. That's the point of containers: you can run Linux on Windows, Arch on Debian, whatever you want.
Yes, dependencies that don't exist in your base image will need to be specified and installed. (Depending on the default package manager of the base image you are working from, you might use dpkg, yum, or apt-get.)
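To make that concrete, here is a hedged sketch of a Dockerfile mirroring the yum commands from the question; the base image, package list, and start command are assumptions:
# Hypothetical Dockerfile for a yum-based base image
FROM amazonlinux:2
# Install the system dependencies the application needs
RUN yum install -y gcc gfortran wget && yum clean all
# Copy the application code into the image
COPY ./app /srv/app
WORKDIR /srv/app
# Placeholder start command for "server X"
CMD ["./start-server.sh"]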

What are the advantages of running Jenkins in a docker container

I've found quite a few blogs on how to run Jenkins in Docker, but none really explain the advantages of doing so.
These are the only reasons I found to use Docker:
1) I want most of the configuration for the server to be under version control.
2) I want the ability to run the build server locally on my machine when I’m experimenting with new features or configurations
3) I want to easily be able to set up a build server in a new environment (e.g. on a local server, or in a cloud environment such as AWS)
Luckily I have people who take care of my Jenkins server for me so these points don't matter as much.
Are these the only reasons or are there better arguments I'm overlooking, like automated scaling and load balancing when many builds are triggered at once (I assume this would be possible with Docker)?
This answer to "Docker, what is it and what is its purpose?" covers what Docker is and why you would use it.
The official Docker site also provides an explanation.
The simple guide here is:
Faster delivery of your applications
Deploy and scale more easily
Get higher density and run more workloads
Faster deployment makes for easier management
For Jenkins usage, it's faster and easier to deploy/install the Docker way.
Maybe you don't need the easier scaling right now. And since Docker is quite lightweight, you can run more workloads.
However, the Docker way also brings some problems of its own, mostly around access privileges.
For example, when you need to run Docker inside Jenkins (which itself runs in Docker), things become complicated. This blog post provides some background on that situation.
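One common workaround for that situation (an assumption on my part, not something the blog is quoted on here) is to mount the host's Docker socket into the Jenkins container so that builds use the host's Docker daemon:
# Give a Jenkins container access to the host's Docker daemon by
# mounting the Docker socket (note: this has security implications)
docker run -d --name jenkins \
  -p 8080:8080 \
  -v jenkins_home:/var/jenkins_home \
  -v /var/run/docker.sock:/var/run/docker.sock \
  jenkins/jenkins:lts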
So, as always, there is no silver bullet: there is no single development, in either technology or management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.
The choice should be made based on the specific scenario.
Jenkins as Code
You list mainly the advantages of having "Jenkins as Code", which is a very powerful setup indeed, but one that does not necessarily require Docker.
So why is Docker the best choice for a Jenkins as Code setup?
Docker
The main reason is that Jenkins pipelines work really well with Docker. Without Docker you need to install additional tools and add different agents to Jenkins. With Docker, there is no need to install additional tools; you just use images of those tools, and Jenkins downloads them from the internet (Docker Hub) for you.
For each stage in the pipeline you can use a different image (i.e. tool). Essentially you get "micro Jenkins agents" which only exist temporarily. Hence you do not need fixed agents anymore, which makes your Jenkins setup much cleaner.
Getting started
A while ago I wrote a small blog post on how to get started with Jenkins and Docker, i.e. how to create a Jenkins image for development which you can launch and destroy in seconds.
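As a minimal sketch of that idea (the image name and plugin setup are placeholders, not the blog's actual content), a throwaway development Jenkins could be built and run like this:
# Hypothetical development Jenkins image
FROM jenkins/jenkins:lts
# Bake plugins and configuration into the image here (details omitted)
# Build and launch a throwaway instance; --rm deletes the container when it stops:
# docker build -t dev-jenkins .
# docker run --rm -p 8080:8080 dev-jenkins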

docker is great for run-anywhere but what about the machines to host docker?

I am wondering how we can make the machines that host Docker easily replaceable. I would like something like a Dockerfile that contains instructions on how to set up the machine that will host Docker. Is there a way to do that?
The naive solution would be to create an official "Docker host" binary image to install on new machines, but I would like something that is reproducible and transparent, like a Dockerfile.
It seems like tools such as Vagrant, Puppet, or Chef may be useful, but they appear to be aimed at virtual machine provisioning, and they all seem to require setting up some sort of "master node" server. I am not going to be spinning machines up and tearing them down regularly, so a master server is a waste of a server; I just want something reproducible in case I need to set up or replace a machine.
This is basically what docker-machine does for you: https://docs.docker.com/machine/overview/
Other "orchestration" systems will make this automated and easier as well.
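For illustration (the driver and machine name are placeholders), provisioning a Docker host with docker-machine looks roughly like this:
# Create a new Docker host; the driver and name are placeholders
docker-machine create --driver virtualbox my-docker-host
# Point the local Docker client at the new host and verify it
eval "$(docker-machine env my-docker-host)"
docker info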
There are lots of solutions to this, with no real one-size-fits-all answer.
Chef and Puppet are the popular configuration management tools that typically use a centralized server. Ansible is another option that typically runs without a server and just connects over ssh to configure the host. All three work very similarly, so if your concern is simply managing the CM server, Ansible may be the best option for you.
For VMs, Vagrant is the typical solution, and it can be combined with other tools like Ansible to provision the VM after creating it.
In the cloud space, there are tools like Terraform or vendor-specific tools like CloudFormation.
Docker is working on a project called Infrakit to deploy infrastructure the way compose deploys containers. It includes hooks for several of the above tools, including Terraform and Vagrant. For your own requirements, this may be overkill.
Lastly, for designing VM images, Docker recently open-sourced their Moby project, which creates a VM image containing a minimal container OS, the same one used under the covers in Docker for Windows, Docker for Mac, and possibly some of the cloud hosting providers.
We automate Docker installation on hosts using Ansible + Jenkins. Given proper SSH access, provisioning new Docker hosts is a matter of triggering a Jenkins job.
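The playbook itself isn't shown here; as a rough stand-in, the shell steps such a job typically automates on a new host look like this (the user name is a placeholder):
# Install Docker with the convenience script and let a deploy user run it
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
usermod -aG docker deploy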

Docker for non-code deployments?

I am trying to help a sysadmin group reduce server and service downtime on the projects they manage. Their biggest issue is that they have to take down a service, install an upgrade or reconfigure it, and then restart it and hope it works.
I have heard that Docker is a solution to this problem, but usually from developer circles in the context of deploying their Node/Python/Ruby/C#/Java, etc. applications to production.
The group I am trying to help is using vendor software that requires a lot of configuration and management. Can docker still be used in this case? Can we install any random software on a container? Then keep that in a private repository, upgrade versions, etc.?
This is a Windows environment, if that makes any difference.
Docker excels at stateless applications. You can use it for applications with persistent data, but that requires the use of volumes.
Can docker still be used in this case?
Yes, but it depends on the application. It should be installable headless, plus a couple of other things that are pretty specific (e.g. talking to third-party servers to obtain a license can create issues).
Can we install any random software on a container?
Yes... but: remember that when the container restarts, that software will be gone. It's better to bake it into an image and then deploy that. See my example below.
Then keep that in a private repository, upgrade versions, etc.?
Yes.
Here is an example pipeline:
Create a Dockerfile for the OS and what steps it takes to install the application. (Should be headless)
Build the image (at this point, it's called an image, not a container)
Test the image locally by creating a local container. This container holds the configuration data such as environment variables, the volumes for persistent data it needs, etc.
If it satisfies the local developer's wants, then you can either:
Let your build servers create the image and publish it to an internal Docker registry (best practice), or
Let your local developer publish it to an internal Docker registry.
At that point, your next-level environments can pull the image down from the Docker registry, configure it, and create the container.
In short, it will require a lot of elbow grease but is possible.
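A hedged sketch of that pipeline with the Docker CLI (the registry address, image name, environment variable, and volume are placeholders):
# Build the image from the Dockerfile
docker build -t registry.internal/vendor-app:1.0 .
# Test it locally; configuration goes in via environment variables and volumes
docker run -d --name vendor-app-test \
  -e LICENSE_SERVER=license.internal \
  -v vendor_data:/var/lib/vendor-app \
  registry.internal/vendor-app:1.0
# Publish to the private registry once it looks good
docker push registry.internal/vendor-app:1.0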
Can we install any random software on a container?
Generally yes, but you can have many problems with legacy software that was developed to run on bare metal.
First, there can be a persistence problem, but it can be solved using volumes.
Second, a program that works well on a full OS may not work as well in a container. Containers differ in some ways from VMs or bare metal; for example, due to the missing init process, some containers have zombie-process issues. You can read about other differences here.
Docker brings big benefits for stateless apps, but some heavy legacy apps may not work as well inside containers and should be tested thoroughly before being used in production.
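As a small illustration of the init-process point (general Docker usage, not something this answer spells out), Docker can start a minimal init that reaps zombie processes:
# --init runs a tiny init process inside the container so orphaned
# children are reaped instead of becoming zombies (image name is a placeholder)
docker run -d --init --name legacy-app registry.internal/legacy-app:1.0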

A Docker workflow for a developers team

In our team we currently use Vagrant as a development environment. Now I want to replace it with Docker, but I can't figure out what the team workflow with it looks like.
This is what confuses me: with Vagrant I create a project repo with a Vagrantfile in it, and every developer pulls the repo and runs vagrant up. If the project needs some changes in the environment, I edit the Vagrantfile, a Chef recipe, or a requirements file, and developers must run vagrant provision to get an updated environment.
But with Docker I see at least two options:
create a Dockerfile and put it in the repo; every developer builds an image from it and rebuilds their own image on every change.
build an image, put it on a server, and have every developer pull and run it; on every change, rebuild the image on the server (maybe with auto-rebuilds on the server and auto-pull scripts).
Docker's philosophy is 'build once, run anywhere', but at the same time we have a Dockerfile in the repo... What do you think about this? How do you do it in your team?
Docker is more for production than for development
Docker is a deployment tool for packaging apps for production (or test environments). It is not so much a tool for development. It is meant to create an isolated environment to run your already developed application somewhere: on a server, in the cloud, or on your laptop.
Use a Dockerfile to document the packaging
I think it is nice to have a Dockerfile in your project. Similar to a Vagrantfile, it is a kind of executable documentation which describes what your production environment should look like. Somebody who is new to your project can just run the file and will get a packaged and ready-to-run container. Cool!
Use a registry to integrate Docker
I think you should provide a (private) Docker registry if you integrate Docker into your (CI) workflow (e.g. into test and build systems). A single repository to store validated and tested images of all your products will definitely speed up the time to create new test or production systems (e.g. to scale your app or to set up an installation for a demo or a customer). If your product is open source, consider the public Docker index so people can find your stuff there. You can configure your build system to create a new Docker image after each (successful) build and push it to the registry. Since the images are layered (and those layers are shared), this will be fast and will not take too much disk space.
If you want to integrate Docker into your development, I don't see that many possibilities:
You can create a repository with final images (as described before)
Or you can use Docker images to develop against them (e.g. to run a MongoDB)
Maybe you have a team A which programs against the API of team B and always needs a running instance of team B's product. Then you could package this product into a Docker image and share it with team A. In this case, team B should provide the image in a repository (and team A shouldn't care how it is built; they just use it as a black box).
Edit: If you depend on many external apps
To make this "team A and team B" thing clearer: if you develop an app against many other tools, e.g. an app from another team, a MongoDB, or an Elasticsearch, you can package those apps into Docker images, run them (locally), and develop against them. You also have a good chance of finding popular apps (such as MongoDB) in the public Docker Index, so instead of installing them manually you can just pull and start them. But to put together an environment like this, you will need Vagrant again.
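For example, pulling and starting a MongoDB to develop against is a one-liner (the container name is arbitrary; 27017 is the image's default port):
# Start a throwaway MongoDB from the public index
docker run -d --name dev-mongo -p 27017:27017 mongo
# Remove it when you are done
docker rm -f dev-mongo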
You could also use Docker for test environments (build and run an image and test against it). But this wouldn't be a replacement for Vagrant in development.
Vagrant + Docker
I would suggest using both: provide a Vagrantfile to build the development environment and a Dockerfile to build the production environment.
Also take a look at http://docs.vagrantup.com/v2/provisioning/docker.html. Vagrant has had Docker integration for a while, so you can create Docker containers/environments with Vagrant.
I work for Docker.
Think of Docker (and the Docker index/registry) as equivalent to Git. You don't have to make this very hard. If you change a Dockerfile, updating an image is a cheap and quick operation. If you use "Trusted Builds" in our registry, you can have it build automatically off of any branch at any time you want.
These are basic building blocks, but it works great for development. Docker itself is built and developed inside of Docker containers, so we know it works fine.
