Setting up a multi-stage Docker build on Heroku

[Edit: It looks like my specific question is how to push a multi-stage Docker build to Heroku]
I'm trying to set up an NLP server using the spacy-api-docker GitHub repository.
The project README lists a base image (jgontrum/spacyapi:base_v2) with no included language models, as well as an English-language-model image (jgontrum/spacyapi:en_v2), which is what I'm looking for.
When I pull and run the English-language image, the localhost API works perfectly. But when I try to build an image from the cloned GitHub repository, the main Dockerfile seems to build only the base image (which is useless to me), and when I follow the steps listed in the Heroku Docker documentation and another third-party tutorial to push the container to Heroku, it likewise only seems to use that base Dockerfile - I can get the API running, but it's useless with no models.
The repository also has a bunch of shorter, language-specific Dockerfiles in a subfolder, which I'm guessing need to be specified in some way? Just appending the English Dockerfile to the main Dockerfile didn't work, at any rate.
My guess is that I might have to:
a. figure out how to push an image from Docker Hub to Heroku without a repository (the only image that has worked completely is the one I pulled directly from Docker Hub)
b. figure out how to make a repository from a pulled image, which I can then turn into a Heroku project with heroku create
c. figure out how to specify the :en_v2 part when I build to Heroku from the repository (is that a Docker tag? does it somehow specify which additional Dockerfile to use?)
d. look into multi-stage Docker builds
I'm new to programming and have been banging my head against this for a while now, so would be very grateful for any pointers (and please pardon any terms I've used poorly, my vocabulary is pretty basic for this stuff).
Thanks!

If all you want to know is how to set up a multi-stage build and how to build it, the best I can do is show you sample code.
I'm also using a multi-stage Docker build, because my system requires several containers; the relevant source is below.
Multi-stage build Dockerfile:
https://github.com/hiromaily/go-goa/blob/master/docker/Dockerfile.multistage.heroku
How I deploy to Heroku from Travis CI in my case:
https://github.com/hiromaily/go-goa/blob/master/.travis.yml
I didn't read your question carefully, so if I've missed the point, please ignore me.
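That said, for the Heroku part of the question, a rough sketch of the flow might look like the following. It assumes the cloned spacy-api-docker repository keeps a separate English Dockerfile somewhere in its language subfolder (the exact path may differ) and uses a made-up Heroku app name; Heroku's container registry expects the image to be named registry.heroku.com/<app>/<process-type>.
$ heroku login
$ heroku container:login
$ heroku create my-spacy-api    # example app name
# Build from the language-specific Dockerfile (-f) rather than the base one;
# the trailing "." is the build context.
$ docker build -f <path-to-english-Dockerfile> -t registry.heroku.com/my-spacy-api/web .
$ docker push registry.heroku.com/my-spacy-api/web
$ heroku container:release web -a my-spacy-api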

Related

Do I need to share the docker image if I can just share the docker file along with the source code?

I am just starting to learn about Docker. Is a Docker repository (like Docker Hub) actually useful? I see a Docker image as a package of source code plus environment configuration (the Dockerfile) for deploying my application. Well, if it's just a package, why can't I simply share my source code with the Dockerfile (via GitHub, for example)? Then users download it all and run docker build and docker run, and there is no need to push the Docker image to a repository.
There are two good reasons to prefer pushing an image somewhere:
As a downstream user, you can just docker run an image from a repository, without additional steps of checking it out or building it.
If you're using a compiled language (C, Java, Go, Rust, Haskell, ...) then the image will just contain the compiled artifacts and not the source code.
Think of this like any other software: for most open-source things you can download its source from the Internet and compile it yourself, or you can apt-get install or brew install a built package using a package manager.
By the same analogy, many open-source things are distributed primarily as source code, and people who aren't the primary developer package and redistribute binaries. In this context, that's the same as adding a Dockerfile to the root of your application's GitHub repository, but not publishing an image yourself. If you don't want to set up a Docker Hub account or CI automation to push built images, but still want to have your source code and instructions to build the image be public, that's a reasonable decision.
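As a concrete illustration of the first point (the image and repository names below are only examples), compare the two workflows:
# With a published image: one command, no checkout or build step.
$ docker run -d -p 8080:80 example/webapp:1.4

# With only a Dockerfile in the source repo: fetch and build first.
$ git clone https://github.com/example/webapp.git
$ cd webapp
$ docker build -t webapp:local .
$ docker run -d -p 8080:80 webapp:local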
That is how it works. You keep the configuration files alongside your code, i.e., the Dockerfile and docker-compose.yml.

Docker dealing with images instead of Dockerfiles

Can someone explain to me why the normal Docker process is to build an image from a Dockerfile and then upload it to a repository, instead of just moving the Dockerfile to and from the repository?
Let's say we have a development laptop and a test server with Docker.
If we build the image, that means uploading and downloading all of the packages inside the Dockerfile. Sometimes this can be very large (e.g. PyTorch > 500MB).
Instead of transporting the large image file to and from the server, doesn't it make more sense to build the image locally to verify that it works, but mostly transport the small Dockerfile and build the image on the server?
This started out as a comment, but it got too long. It is likely to not be a comprehensive answer, but may contain useful information regardless.
Often the Dockerfile will form part of a larger build process, with output files from previous stages being copied into the final image. If you want to host the Dockerfile instead of the final image, you’d also have to host either the (usually temporary) processed files or the entire source repo & build script.
The latter is often done for open source projects, but for convenience pre-built Docker images are also frequently available.
One tidy solution to this problem is to write the entire build process in the Dockerfile using multi-stage builds (introduced in Docker CE 17.05 & EE 17.06). But even with the complete build process described in a platform-independent manner in a single Dockerfile, the complete source repository must still be provided.
TL;DR: Think of a Docker image as a regular binary. It’s convenient to download and install without messing around with source files. You could download the source for a C application and build it using the provided Makefile, but why would you if a binary was made available for your system?
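To make the multi-stage point concrete, here is a minimal sketch for a hypothetical Go program (not tied to any project mentioned here): the final image keeps only the compiled binary, not the source tree or the toolchain.
# Build stage: contains the compiler and the full source tree.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# Final stage: only the compiled artifact is copied across.
FROM debian:bookworm-slim
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]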
Instead of transporting the large image file to and from the server, doesn't it make more sense to build the image locally to verify that it works, but mostly transport the small Dockerfile and build the image on the server?
Absolutely! You can, for example, set up an automated build on Docker Hub which will do just that every time you check in an updated version of your Dockerfile to your GitHub repo.
Or you can set up your own build server / CI pipeline accordingly.
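If you do want the build-on-the-server workflow from the question, the simplest version is a small script run on the server after each push (the image and container names here are placeholders):
$ git pull
$ docker build -t myapp:latest .
$ docker rm -f myapp || true    # remove the previous container if it exists
$ docker run -d --name myapp -p 8080:8080 myapp:latest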
IMHO, one reason for the image concept and for putting images into a repository is sharing with other people. For example, we pull Python's out-of-the-box image in a Dockerfile to run a Python program. Similarly, I once created a custom image (for example, an Apache installation with some custom steps, like port changes and a few extra configuration tweaks) and pushed it to my company's repository.
A few days later I found out that other teams were using it too; now, when it is shared, they don't need to make any changes - they simply use my image and they're done.
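A sketch of that kind of shared internal image (the port change and the registry/image names are made up for illustration): the customization is done once in the base image, and other teams simply build FROM it instead of repeating the setup.
# Company-wide Apache base image with a non-default port preconfigured.
FROM httpd:2.4
RUN sed -i 's/^Listen 80$/Listen 8081/' /usr/local/apache2/conf/httpd.conf
EXPOSE 8081
Built and pushed once, e.g. with docker build -t registry.mycompany.example/apache-base:1.0 . followed by docker push, it can then be reused across teams.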

How to tell the software version under a tag on Docker hub

I am quite a newbie with Docker, and I am trying to find a way to tell the version of a tagged image on Docker Hub.
For instance, the jenkins/jenkins:lts-latest image, listed here https://hub.docker.com/r/jenkins/jenkins/tags/ - what image version does it actually alias? And how can I infer the corresponding Dockerfile/branch in the Jenkins repo?
I tried docker search, but that didn't help. I also tried to find a clue in the official Jenkins Docker repo on GitHub: https://github.com/jenkinsci/docker, but I don't see any binding tag or anything that gives me a hint about the source of the image.
Another example: I have a Kubernetes cluster, and when I check my Nexus pod, I see likewise that the image is defined as sonatype/nexus3:latest.
In this case at least I have the image ID: docker-pullable://sonatype/nexus3@sha256:434a2564aa64646464afaf.. but once again I don't know how to map it to the actual version of the software.
For the repos you asked about, the answer is no.
When setting up a repo on Docker Hub, there are two options to choose from:
1) Create Repository:
Docker Hub just creates a repo for the user; the user builds the image on a local machine, tags it, and pushes it to Docker Hub.
When the user pushes the image, no additional information about the source version is attached, so no source mapping can be obtained from Docker Hub.
jenkins/jenkins is this kind of repo.
2) Create Automated Build:
Docker Hub fetches the code from GitHub or Bitbucket and builds the image on its own infrastructure, so it knows exactly which source commit the current image was built from.
jenkins/jnlp-slave is this kind of repo.
You can click Build Details on its web page and open one of the links, e.g. 3.26-1-alpine; the log mentions that 0a0239228bf2fd26d2458a91dd507e3c564bc7d0 is the source commit.
To sum up: the repos you mentioned in the question are not automated builds, so you cannot map the image back to its source code. But if you later come across a Docker Hub repo that is an automated build and want to know the mapping, you can.
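Even without an automated build, the image itself sometimes carries version information in its environment variables or labels; the jenkins/jenkins image, for instance, should set a JENKINS_VERSION environment variable (worth verifying for your particular tag). A quick, generic way to look:
$ docker pull jenkins/jenkins:lts-latest
$ docker inspect --format '{{json .Config.Env}}' jenkins/jenkins:lts-latest
$ docker inspect --format '{{json .Config.Labels}}' jenkins/jenkins:lts-latest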
If I understand your question correctly, you want to tag your Docker image with the same version as your software. For that I create the image tag like this:
$ export VERSION="2.31-b19"
$ docker tag "<user>/<image>:${VERSION}" "<docker_hub_user>/<repo>:latest"
If that is not your use case, please explain it in a bit more detail so that we can suggest a better workaround.
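To make both the versioned tag and the latest alias available on Docker Hub afterwards, something like this (same placeholders as above):
$ docker tag "<user>/<image>:${VERSION}" "<docker_hub_user>/<repo>:${VERSION}"
$ docker push "<docker_hub_user>/<repo>:${VERSION}"
$ docker push "<docker_hub_user>/<repo>:latest"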

Build chain in the cloud?

(I understand this question is somewhat out of scope for Stack Overflow, because it contains several problems and is somewhat vague. Suggestions on how to ask it properly are welcome.)
I have some open source projects that depend on each other.
The code resides on GitHub; the builds happen on Shippable, using Docker images which in turn are built on Docker Hub.
I have set up an artifact repo and a Debian repository where the Shippable builds put the packages, and the Docker builds use them.
The build chain looks like this in terms of deliverables:
pre-zenta docker image
zenta docker image (two steps of docker build because it would time out otherwise)
zenta debian package
zenta-tools docker image
zenta-tools debian package
xslt docker image
adadocs artifacts
Currently I am triggering the builds by pushing to GitHub, and sometimes rerunning failed builds on Shippable after the Docker build has run.
I am looking for solutions for the following problems:
Where should I put the Dockerfiles? At the moment they live in the repo of the package that needs the resulting Docker image to build. That way all the information needed to build the package is in one place, but sometimes I have to trigger an extra build to get the package actually built.
How do I trigger builds automatically, in a way that supports git-flow? For example, if I change code on the zenta develop branch, I want to make sure that zenta-tools builds and tests against the development version of it before merging to master.
Is there a tool that gives me an overview of the health of the whole build chain?
Since your question is related to Shippable, I've created a support issue for you here - https://github.com/Shippable/support/issues/2662. If you are interested in discussing the best way to handle your scenario, you can also send me an email at support@shippable.com. You can set up your entire flow, including building the Docker images, using Shippable.

Where to keep Dockerfiles in a project?

I am learning about Docker and I have the following questions:
Where are Dockerfiles kept in a project?
Are they kept together with the source?
Are they kept outside of the source? Do you have an own Git repository just for the Dockerfile?
If the CI server should create a new image for each build and run that on the test server, do you keep the previous image? I mean, do you tag the previous image or do you remove the previous image before creating the new one?
I am a Java EE developer, so I use Maven, Jenkins, etc., if that matters.
The only restriction on where a Dockerfile is kept is that any files you ADD to your image must be beneath the Dockerfile in the file system. I normally see them at the top level of projects, though I have a repo that combines a bunch of small images where I have something like
top/
  project1/
    Dockerfile
    project1_files
  project2/
    Dockerfile
    project2_files
The Jenkins docker plugin can point to an arbitrary directory with a Dockerfile, so that's easy. As for CI, the most common strategy I've seen is to tag each image built with CI as 'latest'. This is the default if you don't add a tag to a build. Then releases get their own tags. Thus, if you just run an image with no arguments you get the last image built by CI, but if you want a particular release it's easy to say so.
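With a layout like that, each subdirectory is its own build context, so from the top-level directory the builds are simply (the tags are examples):
$ docker build -t project1:latest project1/
$ docker build -t project2:latest project2/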
I'd recommend keeping the Dockerfile with the source as you would a makefile.
The build context issue means most Dockerfiles are kept at or near the top-level of the project. You can get around this by using scripts or build tooling to copy Dockerfiles or source folders about, but it gets a bit painful.
I'm unaware of best practice with regard to tags and CI. Tagging with the git hash or similar might be a good solution. You will want to keep at least one generation of old images in case you need to roll back.
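One way to implement the git-hash idea (a sketch; substitute your own registry and image name) is to tag every CI build with the commit hash and move latest alongside it:
$ GIT_SHA=$(git rev-parse --short HEAD)
$ docker build -t registry.example.com/myapp:"$GIT_SHA" .
$ docker tag registry.example.com/myapp:"$GIT_SHA" registry.example.com/myapp:latest
$ docker push registry.example.com/myapp:"$GIT_SHA"
$ docker push registry.example.com/myapp:latest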
