My goal is to compare two docker solutions for my golang app:
use ubuntu as base image
use golang:alpine as base image
My Dockerfile is quite straightforward, similar to:
FROM ubuntu:20.04
# FROM golang:alpine for alpine based images
# myApp binary is pre-built before running docker build
COPY bin/myApp app/myApp
COPY myApp-config.json app/myApp-config.json
CMD MYAPP_CONFIG=app/myApp-config.json ./app/myApp
With the alpine-based image, I hit the same issue described here: /app/myApp cannot be started because the dynamic libraries it was linked against via cgo are missing from the generated image. I addressed it by disabling cgo during go build:
CGO_ENABLED=0
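For reference, the full build command I use now looks something like this (the package path is just illustrative of my layout):
CGO_ENABLED=0 GOOS=linux go build -o bin/myApp ./cmd/myApp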
Since I'm quite new to docker, my questions are:
Is there any risk in disabling CGO? My understanding is that go build will fall back to the pure Go implementations of packages that would otherwise use cgo, but I don't know whether there are any hidden traps. My app has a critical dependency on net/http, which seems to require cgo at runtime.
It seems that alpine images are the de facto base image to use for Go apps; what is the standard way to deal with this CGO problem?
Thanks!
If your app is net/http based, then really the only consideration you may need to worry about is DNS resolution.
TL;DR
The only gotcha you may see is in a Kubernetes environment with a hostname ending in .local.
With CGO_ENABLED=1 (the go build default), the binary uses the operating system's native DNS resolver via the C library.
With CGO_ENABLED=0 (as used for scratch Docker builds), Go's built-in DNS resolver is used instead.
What's the difference? See the official Go docs:
The method for resolving domain names, whether indirectly with
functions like Dial or directly with functions like LookupHost and
LookupAddr, varies by operating system.
On Unix systems, the resolver has two options for resolving names. It
can use a pure Go resolver that sends DNS requests directly to the
servers listed in /etc/resolv.conf, or it can use a cgo-based resolver
that calls C library routines such as getaddrinfo and getnameinfo.
By default the pure Go resolver is used, because a blocked DNS request
consumes only a goroutine, while a blocked C call consumes an
operating system thread. When cgo is available, the cgo-based resolver
is used instead under a variety of conditions: on systems that do not
let programs make direct DNS requests (OS X), when the LOCALDOMAIN
environment variable is present (even if empty), when the RES_OPTIONS
or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG
environment variable is non-empty (OpenBSD only), when
/etc/resolv.conf or /etc/nsswitch.conf specify the use of features
that the Go resolver does not implement, and when the name being
looked up ends in .local or is an mDNS name.
Personally, I've built dozens of net/http based microservices running in kubernetes in scratch Docker containers with CGO_ENABLED=0 without issue.
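If you want to verify or force which resolver a particular binary ends up using, the net package honors the GODEBUG=netdns setting; a quick sketch (the binary name is illustrative):
CGO_ENABLED=0 go build -o myApp .    # statically linked; only the pure Go resolver is available
GODEBUG=netdns=go+1 ./myApp          # force the Go resolver and print which resolver was chosen
GODEBUG=netdns=cgo+1 ./myApp         # force the cgo resolver (only effective if the binary was built with cgo)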
Related
I am new to Docker and confused about the concept of containerization. I wonder about the key difference between a Docker container and the Binder project. Here are the definitions from a Google search:
A Docker container is a standardized, encapsulated environment that runs applications.
A Binder or "Binder-ready repository" is a code repository that contains both code and content to run, and configuration files for the environment needed to run it.
Can anyone elaborate on this a bit? Thanks!
Your confusion is understandable. Docker itself is a lot to follow, and adding in Binder makes it even more complex if you look behind the curtain.
One big point to be aware of is that much of the use of MyBinder.org by a typical user is aimed at eliminating the need for those users to learn about Docker, Dockerfile syntax, the concept of a container, etc. The idea of the configuration files you include to make your repository 'Binder-ready' is to make it easier to produce the resulting container without writing a Dockerfile in Dockerfile syntax. You can more or less simply list the packages you need in requirements.txt or environment.yml and not deal with writing a Dockerfile, while still getting those dependencies installed in the container you end up working in. environment.yml is a step up in complexity from requirements.txt, as the .yml file has a syntax to it while requirements.txt at its most basic is simply a list.
The active container the user gets at the end of the launch is not readily apparent to the typical user. Typically, they go from launching a session to having the environment they specified in an active JupyterHub session on a remote machine.
Binder combines quite a bit of tech, including Docker, to make things like MyBinder.org work. MyBinder.org is a public BinderHub, and a BinderHub is essentially a specialized JupyterHub running on a cluster that uses images to serve up containers to users.
You can point MyBinder.org at a repo and it will spin up a JupyterHub session with that content and an environment based on any configuration files in the repository. If there aren't any configuration files, you'll still have the content, but you just get a default Python stack.
Binder uses repo2docker to take a repository and make it into something that can work with docker. You can run repo2docker yourself locally on your own machine and use the produced image to spawn a running container if you want.
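For example, a rough local run could look like this (the repository URL is a placeholder):
pip install jupyter-repo2docker
# builds an image from the repo's configuration files and starts a Jupyter server in a container
repo2docker https://github.com/some-user/some-repo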
The built image specifies the environment backing the JupyterHub session you get from MyBinder.org. In fact, the session served to you by MyBinder.org is a Docker container running on a Kubernetes cluster.
While testing new Docker builds (modifying the Dockerfile) it can take quite some time for the image to rebuild due to the huge download size (either directly via wget, or indirectly via apt, pip, etc.).
One way around this that I personally use often is to split the commands I plan to modify into their own RUN instructions. This avoids re-downloading some parts, because the previous layers are cached. However, this doesn't cut it if the command that requires "tuning" comes early in the Dockerfile.
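For example, something like this (package names are placeholders), where the step I keep changing sits in its own layer so the earlier, expensive downloads stay cached:
FROM ubuntu:20.04
# big, stable downloads live in their own layer and stay cached between rebuilds
RUN apt-get update && apt-get install -y --no-install-recommends big-stable-package
# the command I keep tweaking gets its own layer, so only this step is rebuilt
RUN apt-get install -y --no-install-recommends package-i-keep-tweaking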
Another solution is to use an image that already contains most of the required packages so that it would just be pulled once and cached, but this can come with unnecessary "baggage".
So is there a straightforward way to cache all downloads done by Docker while building/running? I'm thinking of running Memcached on the host machine, but that seems like overkill. Any suggestions?
I'm also aware that I can test in an interactive shell first, but sometimes you need to test the Dockerfile itself and make sure it's production-ready (including arguments and defaults), especially if the only view you'll get of what's going on after that point is ELK or cluster crash logs.
This question on Super User:
https://superuser.com/questions/303621/cache-for-apt-packages-in-local-network
asks the same thing for a local network instead of a single machine, but the answer can be used in this scenario too; a single machine is actually simpler than a network with multiple machines.
If you install Squid locally you can use it to cache all your downloads including your host-side downloads.
But more specifically, there's also a Squid Docker image!
Heads-up: if you use a Squid service in a docker-compose file, don't forget to use the Squid service name instead of Docker's subnet gateway: 172.17.0.1:3128 becomes squid:3128.
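A minimal docker-compose sketch of that setup; the Squid image name and the app service are assumptions (any Squid image listening on 3128 works):
services:
  squid:
    image: ubuntu/squid
    ports:
      - "3128:3128"
  app:
    image: my-app
    environment:
      - http_proxy=http://squid:3128
      - https_proxy=http://squid:3128
(For image builds themselves, Docker's predefined http_proxy/https_proxy build args can be pointed at the same cache.)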
The way I did this was:
used the new --mount=type=cache,target=/home_folder/.cache/curl (a BuildKit feature)
wrote a script that looks into the cache before calling curl (a wrapper over curl with caching)
called the script in the Dockerfile during the build
It is more of a hack, but it works.
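A rough sketch of that idea, with the base image, file name, and URL as placeholder assumptions; the cache mount persists across builds, so the download only happens when the file isn't already cached:
# syntax=docker/dockerfile:1
FROM alpine:3.19
# the cache mount lives outside the image, so copy the file out of it after (re)using the cached download
RUN --mount=type=cache,target=/root/.cache/curl \
    if [ ! -f /root/.cache/curl/big-archive.tar.gz ]; then \
        wget -O /root/.cache/curl/big-archive.tar.gz https://example.com/big-archive.tar.gz; \
    fi \
    && cp /root/.cache/curl/big-archive.tar.gz /tmp/big-archive.tar.gz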
I have a software suite (node web server, database, other tools) that I'm developing inside a corporate firewall, building into docker images, and deploying with docker-compose. In order to actually install all the software into the images, I need to set up the environment to use a network proxy, and also to disable strict SSL checking (because the firewall includes ssl inspection), not only in terms of environment variables but also for npm, apt and so on.
I've got all this working so that I can build within the firewall and deploy within the firewall. My Dockerfiles and build scripts are set up so that enabling all the proxy/SSL configuration depends on a docker --build-arg, which sets an environment variable via ENV enable_proxies=$my_build_arg, so I can just as easily skip all that configuration when building and deploying outside the firewall.
However, I need to be able to build everything inside the firewall, and deploy outside it. Which means that all the proxy stuff has to be enabled at build time (so the software packages can all be installed) if the relevant --build-arg is specified, and then also separately either enabled or disabled at runtime using --env enable_proxies=true or something similar.
I'm still relatively new to some aspects of Docker, but my understanding is that the only thing executed when the image is run is the contents of the CMD entry in the Dockerfile, and that CMD can only execute a single command.
Does anyone have any idea how I can/should go about separating the proxy/ssl settings during build and runtime like this?
You should be able to build and ship a single image; “build inside the firewall, deploy outside” is pretty normal.
One approach that can work for this is to use Docker’s multi-stage build functionality to have two stages. The first maybe has special proxy settings and gets the dependencies; the second is the actual runtime image.
FROM ... AS build
ARG my_build_arg
ENV enable_proxies=$my_build_arg
WORKDIR /artifacts
RUN curl http://internal.source.example.com/...
FROM ...
COPY --from=build /artifacts/ /artifacts/
...
CMD ["the_app"]
Since the second stage doesn't have an ENV directive, it will never have $enable_proxies set, which is what you want for the actual runtime image.
Another similar approach is to write a script that runs on the host that downloads dependencies into a local build tree and then runs docker build. (This might be required if you need to support particularly old Dockers.) Then you could use whatever the host has set for $http_proxy and not worry about handling the proxy vs. non-proxy case specially.
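A hedged sketch of that host-side approach, assuming a Node app and npm (the Dockerfile would then only COPY the pre-fetched build tree in):
#!/bin/sh
# runs on the host, where http_proxy and the corporate CA are already configured
npm ci                      # fetches node_modules using the host's proxy settings
docker build -t my-app .    # the Dockerfile just COPYs the build tree; no downloads during the build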
I have a custom kernel module I need to build for a specific piece of hardware. I want to automate setting up my system so I have been containerizing several applications. One of the things I need is this kernel module. Assuming the kernel headers et al in the Docker container and the kernel on the host are for the exact same version, is it possible to have my whole build process containerized and allow the host to use that module?
Many tasks that involve controlling the host system are best run directly on the host, and I would avoid Docker here.
At an insmod(8) level, Docker containers generally run with a restricted set of permissions and can't make extremely invasive changes like this to the host. There's probably a docker run --cap-add option that would theoretically make it possible, but a significant design statement of Docker is that container processes aren't supposed to be able to impact other containers or the host like this.
At an even broader Linux level, a custom kernel module has to be built against the exact kernel version running on the host. This means that if you update the host kernel (for a routine security update, for example) you also have to rebuild and reinstall any custom modules. Mainstream Linux distributions have support for this, but if you've boxed the management of it away into a container, you have to remember how to rebuild the container with the newer kernel headers and make sure it doesn't get restarted until you reboot the host. That can be tricky.
At a Docker level, you’re in effect building an image that can only be used on one very specific system. Usually the concept is to build an image that can be reused in multiple contexts; you want to be able to push the image to a registry and run it on another system with minimal configuration. It’s hard to do this if an image is tied to an extremely specific kernel version or other host-level dependency.
There is an option to use FROM scratch, which to me looks like a really attractive way of building my Go containers.
My question is: what does it still have natively for running binaries, and do I need to add anything in order to reliably run Go binaries? A compiled Go binary seems to run, at least on my laptop.
My goal is to keep the image size to a minimum, both for security and infra-management reasons. In an optimal situation, my container would not be able to execute binaries or shell commands outside of the build phase.
The scratch image contains nothing. No files. But actually, that can work to your advantage: it turns out that Go binaries built with CGO_ENABLED=0 require absolutely nothing other than what they use. There are a couple of things to keep in mind:
With CGO_ENABLED=0, you can't use any C code. In practice that's usually not a problem.
With CGO_ENABLED=0, your app will not use the system DNS resolver. It usually doesn't by default anyway, because a blocked cgo resolver call ties up an OS thread while Go's native resolver only blocks a goroutine.
Your app may depend on some things that are not present (a sketch of how to add them follows this list):
Apps that make HTTPS calls (as in, to other services, e.g. Amazon S3 or the Stripe API) will need CA certificates in order to verify HTTPS certificate authenticity. These also have to be updated over time. This is not needed for serving HTTPS content.
Apps that need timezone awareness will need the timezone info files.
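A minimal sketch of pulling both into a scratch image from Alpine (the file paths assume the standard Alpine ca-certificates and tzdata packages, and a pre-built myApp binary):
FROM alpine:3.19 AS certs
RUN apk add --no-cache ca-certificates tzdata

FROM scratch
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=certs /usr/share/zoneinfo /usr/share/zoneinfo
COPY myApp /myApp
ENTRYPOINT ["/myApp"]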
A nice alternative to FROM scratch is FROM alpine, which gives you a base Alpine image. It is very tiny (around 5 MiB) and includes musl libc, which is compatible with Go and lets you link against C libraries and compile without setting CGO_ENABLED=0. You can also leverage the fact that alpine is regularly updated, and use its tzdata and ca-certificates packages.
(It's worth noting that the overhead of Docker layers is amortized a bit by Docker's deduplication, though of course that benefit shrinks the more often your base image is updated. Still, it helps sell the idea of using the quite small Alpine image.)
You may not need tzdata or ca-certificates now, but it's better to be safe than sorry; you can accidentally add a dependency on them without realizing it. So I recommend using alpine as your base. alpine:latest should be fine.
Bonus: If you want the advantages of reproducible builds inside Docker, but with small image sizes, you can use the new Docker multi-stage builds available in Docker 17.06+.
It works a bit like this:
FROM golang:alpine AS build
# may need some go get-ing here if you don't vendor your dependencies
ADD . /go/src/github.com/some/gorepo
RUN go build -o /app github.com/some/gorepo

# or FROM alpine
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
(I apologize if I've made any mistakes, I'm typing that from memory.)
Note that when using FROM scratch you must use the exec form of ENTRYPOINT, because the shell form won't work (it depends on the image having /bin/sh, which scratch doesn't). The shell form will work fine in Alpine.