Docker Image Requirements for a GitLab-CI Runner

I have GitLab, GitLab-CI and gitlab-ci-multi-runner running on different machines. I've successfully added a runner using Docker and the ruby:2.1 image, as detailed on https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/master/docs/install/linux-repository.md
What I'd like to do next is have runners configured for minimal Ubuntu 12.04 and 14.04 environments. Not knowing Docker, I thought I'd try the ubuntu:14.04 and ubuntu:12.04 images. However, when they start up and try to clone my project repo, they complain about git not being found. I then assumed that git isn't installed in these images. So my questions are:
What tools need to be available in a Docker image for it to be used "out-of-the-box" by gitlab-ci-multi-runner?
Are there images already available for various OSes with these tools included?
Should I really be looking to create my own Docker images for this purpose?

Popular base images that contain commonly used software for building are (based on) buildpack-deps (https://hub.docker.com/r/library/buildpack-deps/); openjdk, for example, is based on it.
In your case you could specify FROM buildpack-deps:stretch-scm, which is based on Debian Stretch and includes source code management (SCM) tools like git.
I suppose you could create a GitLab Runner that uses these images by default, but I think you should always specify the needed image in your .gitlab-ci.yml file.
I would always hesitate to create my own Docker images, because a lot are already available. Still, many specific ones are maintained (or, worse, not maintained) by one person for a particular use case that often does not fit your application. The best choice is to pick a base image and add only minor RUN commands for customization.
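For instance, a minimal .gitlab-ci.yml along these lines would select such an image per job (the job name and script steps are placeholders, not from the question):

```yaml
# Base image that already ships git and other SCM tools,
# so the runner can clone the repository out of the box.
image: buildpack-deps:stretch-scm

build:
  script:
    - git --version
    - make          # placeholder build step
```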

Related

Docker best practice: use OS or application as base image?

I would like to build a docker image which contains Apache and Python3. What is the suggested base image to use in this case?
There is an official apache:2.4.43-alpine image I can use as the base image, or I can install Apache on top of an Alpine base image.
What would be the best approach in this case?
Option1:
FROM apache:2.4.43-alpine
<Install python3>
Option2:
FROM alpine:3.9.6
<Install Apache>
<Install Python3>
Here are my rules.
rule 1: If an image is an official image (such as node, python, apache, etc.), it is fine to use it directly as your application's base image, rather than building your own.
rule 2: If an image is built by the product's owner, such as hashicorp/terraform (HashiCorp is the owner of Terraform), then it is better to use it than to build your own.
rule 3: If you only want to save time, choose the most-downloaded image that has similar applications installed and use it as your base image.
Make sure you can view its Dockerfile. Otherwise, don't use it at all, no matter how many downloads it has.
rule 4: Never pull images from public registries if your company has security compliance concerns; build your own instead.
Another reason to build your own is that the existing image is not built on the operating system you prefer. For example, some images provided by AWS are built on Amazon Linux 2; in most cases I rebuild those myself.
rule 5: When building your own, don't worry too much about which base image to start from; no need to reinvent the wheel, so reuse an existing image's Dockerfile from github.com if you can.
Avoid Alpine; it will often make Python library installs much slower (https://pythonspeed.com/articles/alpine-docker-python/).
In general, the Python version matters more than the Apache version. The latest Apache from a stable Linux distro is fine even if it is not the newest release, but that distro's Python might be annoyingly old. Like, when 3.9 comes out, do you want to be stuck on 3.7?
As such, I would recommend python:3.8-slim-buster (or whatever Python version you want), and installing Apache with apt-get.
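A minimal sketch of that recommendation, assuming the Debian apache2 package is what you want (adjust the package list to your application):

```dockerfile
FROM python:3.8-slim-buster

# Python comes from the base image; Apache is added from the
# Debian repositories via apt-get.
RUN apt-get update \
 && apt-get install -y --no-install-recommends apache2 \
 && rm -rf /var/lib/apt/lists/*
```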

How to compose it?

Target: build an OpenCV Docker image
Dockerfile creation:
FROM ubuntu:14.04
or
FROM python:3.7
Which should I choose, and why?
I am trying to write the Dockerfile from scratch, without copy-pasting from others' Dockerfiles.
I would usually pick the highest-level Docker Hub library image that matches what I need. It's also worth trying the search box at https://hub.docker.com/, which will often find relevant things, though of rather varied ownership and maintenance levels.
The official Docker Hub images tend to have thought through a lot of issues around persistence and configuration and first-time setup. Compare "I'll just apt-get install mysql-server" with all of the parts that go into the official mysql image; just importing that real-world experience and reusing it can save you some trouble.
I'd consider building my own from an OS base like ubuntu:16.04 if:
There is a requirement that Docker images must be built from some specific distribution base ("my job requires everything to be built off of CentOS so I need a CentOS-based MySQL image")
I need a combination of software versions or patches that the Docker Hub image no longer supports (jruby:9.1.16.0 is no longer being built, so if I need OS updates, I need to build my own base image)
I need an especially exotic set of build options for whatever reason ("I have a C extension that only works if the interpreter is specifically built with UTF-16 Unicode support")
I need or want very detailed control over what version(s) of software are embedded; for example if it's something Java-based where there's a JVM version and a runtime version and an application version that all could matter
In my opinion you should choose FROM python:3.7.
Since you are writing a Dockerfile for OpenCV, an open-source computer vision and machine learning library, you will likely need Python in your container anyway.
Now if you use FROM ubuntu:14.04 you would need to add Python in the Dockerfile yourself, whereas with FROM python:3.7 that becomes redundant and also keeps the Dockerfile a bit shorter.
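As a hedged sketch, one way to get OpenCV on top of that base is pip's prebuilt wheels (the opencv-python-headless package name is an assumption here; building OpenCV from source would need considerably more):

```dockerfile
FROM python:3.7

# Prebuilt OpenCV Python bindings; the headless variant skips
# GUI libraries that a container rarely needs.
RUN pip install --no-cache-dir opencv-python-headless
```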

Building a docker image on EC2 for web application with many dependencies

I am very new to Docker and have some very basic questions. I was unable to get my doubts clarified elsewhere and hence am posting them here. Pardon me if the queries are very obvious. I know I lack some basic understanding regarding images, but I had a hard time finding an easy-to-understand explanation of the whole thing.
Problem at hand:
I have my application running on an EC2 instance (r4.xlarge). It is a web application with a LOT of dependencies (system dependencies plus other libraries etc.). I would like to create a Docker image of my machine so that I can easily run it whenever I launch a new EC2 instance.
Questions:
Do I need to build the Docker image from scratch, or can I use some base image?
If I can use a base image, which one do I select? (It is hard to know the OS version on the EC2 machine, and hence I am not sure which base image to start from.)
I referred this documentation-
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html#install_docker
But it builds from an Ubuntu base image.
The above example has instructions on installing apache (and other things needed for the application). Let's say my application needs server X to be installed + 20 system dependencies + 10 other libraries.
Ex:
yum install gcc
yum install gfortran
wget <abc>
When I create a Dockerfile, do I need to specify all the installation instructions like the above? I thought creating an image was like taking a copy of your existing machine. What is the Dockerfile supposed to contain in this case?
Pointers to some good documentation on building a Docker image on EC2 for a web app with dependencies would be very useful too.
Thanks in advance.
First, if you want to move toward Docker then I suggest using AWS ECS, which is specifically designed for Docker containers and has auto-scaling and load-balancing features.
As far as your questions are concerned:
You need a Dockerfile that installs all the packages and applications already installed on your EC2 instance. As for the base image, I recommend Alpine; it is one of the most widely used minimal base images.
Why Alpine?
Alpine describes itself as:
Small. Simple. Secure. Alpine Linux is a security-oriented,
lightweight Linux distribution based on musl libc and busybox.
https://nickjanetakis.com/blog/the-3-biggest-wins-when-using-alpine-as-a-base-docker-image
https://hub.docker.com/_/alpine/
Let's say my application needs server X to be installed + 20 system dependencies + 10 other libraries.
So you need to write a Dockerfile that covers everything you mentioned.
Again, I suggest ECS as the best fit for Docker-based applications, because it is ECS that was designed for Docker, not EC2.
CONTAINERIZE EVERYTHING
Amazon ECS lets you easily build all types of containerized applications, from long-running applications and microservices to batch jobs and machine learning applications. You can migrate legacy Linux or Windows applications from on-premises to the cloud and run them as containerized applications using Amazon ECS.
https://aws.amazon.com/ecs/
https://aws.amazon.com/getting-started/tutorials/deploy-docker-containers/
https://caylent.com/containers-kubernetes-docker-swarm-amazon-ecs/
You can use a base image; you specify it with the first line of your Dockerfile, with FROM.
The base OS of the EC2 instance doesn't matter for the container. That's the point of containers: you can run Linux on Windows, Arch on Debian, whatever you want.
Yes, dependencies that don't exist in your base image will need to be specified and installed. (Depending on the default package manager for the base image you are working from, you might use dpkg, yum, or apt-get.)
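Tying that together with the yum example from the question, a sketch of such a Dockerfile might look like this (the base image and package list are assumptions; every dependency the old EC2 host had must be listed explicitly):

```dockerfile
FROM amazonlinux:2

# The image is built from these instructions, not copied from a
# running machine, so each dependency is installed explicitly.
RUN yum install -y gcc gfortran wget \
 && yum clean all

# Application code goes on top of the installed dependencies.
COPY . /app
WORKDIR /app
```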

Docker image just contain software without OS

I have a PROD environment running on a RHEL 7 server. I want to use Docker for deployment. I want to package all the software and apps in a Docker image without a base OS, because I don't want to add an additional layer on top of RHEL. Also, I could not find an official base image for RHEL. Is that possible?
I saw some old posts mentioning "FROM scratch", but it looks like it does not work in the latest version of Docker (1.12.5).
If this is impossible, any suggestions for this?
Docker is designed to abstract away OS dependencies as well; that is what it was built for. Besides encapsulating the runtime, memory and such, it is specifically used as a far better variant of chroot (let's say chroot on ultra-steroids).
It seems you want neither the runtime separation nor the OS-layer separation (dependencies), so Docker makes very little sense for you.
Deploying with Docker is not "simple" or simpler than using other tools. You could use Capistrano or perhaps something like https://www.habitat.sh/, which does not require software to be bundled in Docker containers to be deployable; it also works on bare metal and uses its own packaging format. That gives you a state-of-the-art deployment solution, and with Habitat you can later even upgrade to using Docker containers.

Using docker to run a distributed computation

I just came across Docker and was looking through its docs to figure out how to use it to distribute a Java project across multiple nodes, while keeping the distribution platform independent, i.e. the nodes can be running any platform. Currently I'm sending classes to different nodes and running them there, on the assumption that these nodes have the same environment as the client. I couldn't quite figure out how to do this; any suggestions would be greatly appreciated.
I do something similar. In my humble opinion, whether or not to use Docker is not your biggest problem. However, using Docker images for this purpose can and will save you a lot of headaches.
We have a build pipeline where a very large Java project is built using Maven. The outcome of this is a single large JAR file that contains the software we need to run on our nodes.
But some of our nodes also need to run third-party software such as Zookeeper and Cassandra. So after the Maven build we use packer.io to create a Docker image that contains all the needed components, which ends up on a web server that can be reached only from within our private cloud infrastructure.
If we want to roll out our system, we use a combination of Python scripts that talk to the OpenStack API and create virtual machines on our cloud, and Puppet, which performs the actual software provisioning inside the VMs. Our VMs are CentOS 7 images, so what Puppet actually does is add the Docker yum repos, install Docker through yum, pull the Docker image from our repository server, and finally run a custom bash script to launch our Docker image.
For each of these steps there are certainly even more elegant ways of doing it.
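As a sketch, such a launch script could be as small as the following (the registry host, image name, and port are hypothetical placeholders, not the author's actual setup):

```shell
#!/bin/sh
set -e
# Hypothetical private registry and image name.
IMAGE="registry.internal/myorg/java-node:latest"

docker pull "$IMAGE"                          # fetch the prebuilt image
docker rm -f java-node 2>/dev/null || true    # replace any old container
docker run -d --name java-node -p 8080:8080 "$IMAGE"
```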

Resources