Microservice with long build time in Docker

We have an in-house C++ tool that we are investigating running as a Docker microservice, and we're wondering if it's even a good idea.
The issue is that the tool has a lot of dependencies, including GDAL, which can take 30 minutes to download and compile.
Normally my provisioning steps would look like:
git clone gdal
./configure; make; make install;
git clone myTool
make myTool
My question is: how should I approach this problem using Docker? I could just put RUN statements in my Dockerfile, but then it takes a long time to build the image and each one is 600MB+. I'd like to know if there's a better way.

Create a separate base image for GDAL, then base your final images on that. This way you rarely have to rebuild GDAL.
As for image size, there is currently no clean way to distinguish the build and runtime dependencies of a Docker image. At work we resorted to some bash glue that essentially enables nested Docker builds. For further details on that you can check out this repo.
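For illustration, a minimal sketch of that split; the image names and base distro are hypothetical, and the GDAL build simply mirrors the configure/make steps from the question:
# gdal-base/Dockerfile -- built and pushed once, rebuilt only when GDAL itself changes
FROM ubuntu:20.04
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential git
# the slow part lives here; add GDAL's own build dependencies as needed for your GDAL version
RUN git clone https://github.com/OSGeo/gdal.git /src/gdal \
    && cd /src/gdal && ./configure && make && make install

# mytool/Dockerfile -- rebuilt on every code change; the expensive GDAL layer is already baked in
FROM mycompany/gdal-base:latest
COPY . /src/mytool
RUN make -C /src/mytool
Day-to-day builds of the tool then only pay for the final few layers, since the base image (docker build -t mycompany/gdal-base:latest gdal-base/) is rebuilt rarely.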

Related

Do I need to share the docker image if I can just share the docker file along with the source code?

I am just starting to learn about Docker. Is a Docker repository (like Docker Hub) useful? I see the Docker image as a package of source code and environment configuration (Dockerfile) for deploying my application. If it's just a package, why can't I simply share my source code along with the Dockerfile (via GitHub, for example)? The user then downloads it all and runs docker build and docker run, and there is no need to push the Docker image to a repository.
There are two good reasons to prefer pushing an image somewhere:
As a downstream user, you can just docker run an image from a repository, without additional steps of checking it out or building it.
If you're using a compiled language (C, Java, Go, Rust, Haskell, ...) then the image will just contain the compiled artifacts and not the source code.
Think of this like any other software: for most open-source things you can download its source from the Internet and compile it yourself, or you can apt-get install or brew install a built package using a package manager.
By the same analogy, many open-source things are distributed primarily as source code, and people who aren't the primary developer package and redistribute binaries. In this context, that's the same as adding a Dockerfile to the root of your application's GitHub repository, but not publishing an image yourself. If you don't want to set up a Docker Hub account or CI automation to push built images, but still want to have your source code and instructions to build the image be public, that's a reasonable decision.
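To make the difference concrete, here is what the two models look like from a consumer's point of view; the image and repository names below are hypothetical:
# Option 1: a pre-built image is published -- one command, no toolchain needed locally
docker run --rm example/mytool:1.0

# Option 2: only source + Dockerfile are published -- clone, build, then run
git clone https://github.com/example/mytool.git
cd mytool
docker build -t mytool:local .
docker run --rm mytool:local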
That is how it works. You need to put the configuration files in your code, i.e., the Dockerfile and docker-compose.yml.
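As a minimal, hypothetical illustration of keeping that configuration next to the code, a docker-compose.yml at the repository root might look like this (the service name and port are made up):
# docker-compose.yml -- lives in the same repository as the Dockerfile and the source
services:
  app:
    build: .          # build the image from the Dockerfile in this directory
    ports:
      - "8000:8000"   # publish the hypothetical application port
Anyone who clones the repository can then bring everything up with docker compose up --build.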

Docker dealing with images instead of Dockerfiles

Can someone explain to me why the normal Docker process is to build an image from a Dockerfile and then upload it to a repository, instead of just moving the Dockerfile to and from the repository?
Let's say we have a development laptop and a test server with Docker.
If we build the image, that means uploading and downloading all of the packages installed by the Dockerfile. Sometimes these can be very large (e.g. PyTorch is > 500MB).
Instead of transporting the large image file to and from the server, doesn't it make sense to perhaps compile the image locally to verify it works, but mostly transport the small Dockerfile and build the image on the server?
This started out as a comment, but it got too long. It is likely not a comprehensive answer, but it may contain useful information regardless.
Often the Dockerfile will form part of a larger build process, with output files from previous stages being copied into the final image. If you want to host the Dockerfile instead of the final image, you’d also have to host either the (usually temporary) processed files or the entire source repo & build script.
The latter is often done for open source projects, but for convenience pre-built Docker images are also frequently available.
One tidy solution to this problem is to write the entire build process in the Dockerfile using multi-stage builds (introduced in Docker CE 17.05 & EE 17.06). But even with the complete build process described in a platform-independent manner in a single Dockerfile, the complete source repository must still be provided.
TL;DR: Think of a Docker image as a regular binary. It’s convenient to download and install without messing around with source files. You could download the source for a C application and build it using the provided Makefile, but why would you if a binary was made available for your system?
Instead of transporting the large image file to and from the server, doesn't it make sense to perhaps compile the image locally to verify it works, but mostly transport the small Dockerfile and build the image on the server?
Absolutely! You can, for example, set up an automated build on Docker Hub which will do just that every time you check in an updated version of your Dockerfile to your GitHub repo.
Or you can set up your own build server / CI pipeline accordingly.
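To make the comparison concrete, a hedged sketch of the two transport options (the server name, repository, and tags are hypothetical):
# Option A: build locally and ship the (large) image to the server
docker build -t myapp:1.0 .
docker save myapp:1.0 | gzip > myapp-1.0.tar.gz        # can easily be hundreds of MB
scp myapp-1.0.tar.gz testserver:
ssh testserver 'gunzip -c myapp-1.0.tar.gz | docker load'

# Option B: ship only the small Dockerfile (plus build context) and build on the server
ssh testserver 'git clone https://github.com/example/myapp.git && cd myapp && docker build -t myapp:1.0 .'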
IMHO, one of the reasons for the image concept, and for putting images into a repository, is sharing with other people. For example, we pull Python's out-of-the-box image in a Dockerfile to get everything a Python program needs in order to run. Similarly, we can create custom images: for instance, I did an Apache installation with some custom steps (like changing ports, plus a few additional tweaks), built an image from it, and finally pushed it to my company's repository.
I came to know after a few days that many other teams were using it too, and now when it is shared they do NOT need to make any changes; they simply use my image and they are done.
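As a hedged sketch of that kind of shared custom image (the base image, port change, and registry name here are illustrative, not the original setup):
# Dockerfile for a team-wide Apache image with the custom steps baked in once
FROM httpd:2.4
RUN sed -i 's/^Listen 80/Listen 8080/' /usr/local/apache2/conf/httpd.conf
EXPOSE 8080

# build once, push to the company registry, and other teams reuse it unchanged
docker build -t registry.mycompany.example/shared/apache-custom:1.0 .
docker push registry.mycompany.example/shared/apache-custom:1.0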

Efficient svn checkout in docker container

I want to check out some files (specifically, the test suite at http://llvm.org/svn/llvm-project/test-suite/trunk) in my Docker container.
Right now I just use RUN svn co http://llvm.org/svn/llvm-project/test-suite/trunk train.out/llvm-test-suite inside the Dockerfile.
It works, but doesn't look efficient: on each docker-compose build I have to wait ~5 minutes while the tests download.
Is there a better way to prevent Docker from checking these files out each time? The only alternative I see for now is including the files in the container.
You generally don’t run source-control tools from inside a Dockerfile. Check them out in a host directory (better still, if you can manage it, add the Dockerfile directly to the repository you’re trying to build) and run docker build with all of its inputs directly on disk.
There are a couple of good reasons for this:
Docker image caching can often mean that Docker won’t repeat a “clone”, “checkout”, or “pull” type operation: it knows it’s done it once and already knows the output of it and skips the step, even if there are new commits you don’t have.
Adding tools like svn or git that you only need to build the image makes it unnecessarily larger. (Multi-stage builds can avoid this, but they’re relatively new.)
The more common use case for this is to clone a private repository that needs credentials, and it’s hard to avoid leaking those credentials into the final image. (Again multi-stage builds can avoid this, with some care, but it’s better to not have the security exposure at all.)
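A sketch of the host-checkout approach applied to the svn question above (the destination path follows the question):
# on the host or in CI, outside of any Dockerfile
svn co http://llvm.org/svn/llvm-project/test-suite/trunk train.out/llvm-test-suite

# in the Dockerfile, the checkout is now just part of the build context
COPY train.out/llvm-test-suite /train.out/llvm-test-suite
Repeated builds no longer hit the network; updating the suite becomes an explicit svn up on the host.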

Best practice/way to develop Golang app to be run in Docker container

Basically what the title says... Is there a best practice or an efficient way to develop a Golang app that will be Dockerized? I know you can mount volumes to point to your source code, and it works great for languages like PHP where you don't need to compile your code. But for Go, it seems like it would be a pain to develop alongside Docker since you pretty much only have two options I guess.
The first would be to have a Dockerfile that is just onbuild, so it starts the Go app when a container is run, which means building a new image on every change (however small). Or, you mount your source code directory into the container, then attach to the container itself and do the go build/run manually, as you normally would.
Those two ways are really the only ways I see it happening, unless you just don't develop your Go app in a Docker container at all: develop it as normal, then use the scratch image method, where you pre-build the Go binary and copy it into your container when you are ready to run it. I assume that is probably the way to go, but I wanted to ask people with more experience on the subject and maybe get some feedback on the topic.
Not sure it's the best practice, but here is my way:
1. A Makefile is MANDATORY
2. Use my local machine and my Go tools for small iterations
3. Use a dedicated build container based on golang:{1.X,latest}, mounting the code directory to build a release, mainly to ensure that my code will build correctly on the CI; see the sketch after this list. (Tip: here is my standard go build command for a release build: CGO_ENABLED=0 GOGC=off go build -ldflags "-s -w")
4. Test the code
5. Then use a FROM scratch image to build a release container (copy in the binary + entrypoint)
6. Push your image to your registry
Steps 3 to 6 are tasks for the CI.
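A rough sketch of what step 3 can look like in the pre-multi-stage world; the GOPATH-style path and image tag are assumptions (with a modules-era toolchain, a go.mod in the project root is assumed as well):
# dedicated build container: mount the source, produce a static release binary on the host
docker run --rm \
    -v "$(pwd)":/go/src/myrepos/myproject \
    -w /go/src/myrepos/myproject \
    golang:latest \
    sh -c 'CGO_ENABLED=0 GOGC=off go build -ldflags "-s -w" -o mybin'
The resulting mybin lands in the mounted directory on the host, ready to be copied into the FROM scratch release image in step 5.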
Important note: this is changing due to the new multi-stage builds feature (https://docs.docker.com/engine/userguide/eng-image/multistage-build/), so there are no longer separate build vs. release containers.
The build container and the release container are merged into a single multi-stage build, i.e. one Dockerfile along these lines (not sure about the exact syntax, but you will get the idea):
FROM golang:latest AS build
WORKDIR /go/src/myrepos/myproject
# copy the source into the build stage (assumes a go.mod at the project root)
COPY . .
RUN CGO_ENABLED=0 go build -o mybin

FROM scratch
# only the compiled binary ends up in the final image
COPY --from=build /go/src/myrepos/myproject/mybin /usr/local/bin/mybin
ENTRYPOINT [ "/usr/local/bin/mybin" ]
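It is then built and run like any single-stage image, so the separate build and release containers disappear:
docker build -t myproject:latest .
docker run --rm myproject:latest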
Lately, I've been using
https://github.com/thockin/go-build-template
as a base for all of my projects. The template comes with a Makefile that will build and test your application in a Docker container.
As far as I understood from your question, you want to have a running container in which to develop a Golang application. The same thing can be done on your host machine as well, but the good thing is that if you can build such an application, it could be considered a cloud Platform-as-a-Service (PaaS).
The basic requirements for the container will be: an Ubuntu image plus other packages such as an editor, the Go compiler, and so on.
I would suggest looking at the Docker development environment:
https://docs.docker.com/opensource/project/set-up-dev-env/
The Docker development environment runs inside a container, and the files are mounted from a host directory. The container image is built from a base Ubuntu image, with the packages added that are needed to compile the Docker source code.
I hope this is close to what you are looking for.
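For a Go project, a minimal version of that mounted-source workflow might look like this; the path and image tag are assumptions, and a Go module in the current directory is assumed:
# edit on the host; compile and run inside a throwaway Go container with the source mounted
docker run --rm -it \
    -v "$(pwd)":/src \
    -w /src \
    golang:latest \
    go run .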

Build chain in the cloud?

(I understand this question is somewhat out of scope for Stack Overflow, because it contains several problems and is somewhat vague. Suggestions on better ways to ask it are welcome.)
I have some open source projects that depend on each other.
The code resides on GitHub and the builds happen on Shippable, using Docker images which in turn are built on Docker Hub.
I have set up an artifact repo and a Debian repository; the Shippable builds put packages there, and the Docker builds use them.
The build chain looks like this in terms of deliverables:
pre-zenta docker image
zenta docker image (two steps of docker build because it would time out otherwise)
zenta debian package
zenta-tools docker image
zenta-tools debian package
xslt docker image
adadocs artifacts
Currently I am triggering the builds by pushing to GitHub, and sometimes rerunning failed builds on Shippable after the Docker build has run.
I am looking for solutions for the following problems:
Where should I put the Dockerfiles? Right now they are in the repo of the package that needs the resulting Docker image for its build. This way all the information needed to build the package is in one place, but sometimes I have to trigger an extra build to get the package actually built.
How can I trigger builds automatically?
..., in a way that supports git-flow? For example, if I change the code on the zenta develop branch, I want to make sure that zenta-tools will build and test against the development version of it before merging with master.
Is there a tool with which I can get an overview of the health of the whole build chain?
Since your question is related to Shippable, I've created a support issue for you here: https://github.com/Shippable/support/issues/2662. If you are interested in discussing the best way to handle your scenario, you can also send me an email at support#shippable.com. You can set up your entire flow, including building the Docker images, using Shippable.
