I'm building a Yocto image for a project but it's a long process. On my powerful dev machine it takes around 3 hours and can consume up to 100 GB of space.
The thing is that the final image is not "necessarily" the end goal; it's my application that runs on top of it that matters. As such, the Yocto recipes don't change much, but my application does.
I would like to run continuous integration (CI) for my app and even continuous delivery (CD). But both are quite hard for now because of the size of the yocto build.
Since the build does not change much, I thought of "caching" it in some way and using it for my application's CI/CD, and Docker came to mind. That would be quite interesting as I could maintain that image, share it with colleagues who need to work on the project, and use it in CI/CD.
Could a custom Docker image be built for that kind of use?
Would it be possible to build such an image completely offline? I don't want to have to upload the 100GB and have to re-download it on build machines...
Thanks!
1. Yes.
I've used docker to build Yocto images for many different reasons, always with positive results.
2. Yes, with some work.
You want to take advantage of the fact that Yocto caches all the stuff you need to do your build in what it calls "Shared State Cache". This is normally located in your build directory under ${BUILDDIR}/sstate-cache, and it contains exactly what you are looking for in this case. There are a couple of options for how to get these files to your build machines.
Option 1 is using sstate mirrors:
This isn't completely offline, but lets you download a much smaller cache and build from that cache, rather than from source.
Here's what's in my local.conf file:
SSTATE_MIRRORS ?= "\
file://.* http://my.shared-computer.com/some-folder/PATH"
Don't forget the PATH at the end. That is required. The build system substitutes the correct path within the directory structure.
Option 2 lets you keep a local copy of your sstate-cache and build from that locally.
In your Dockerfile, create the sstate-cache directory (the location isn't important here; I like /opt for my purposes):
RUN mkdir -p /opt/yocto/sstate-cache
Then be sure to bind-mount that directory when you run your build in order to preserve its contents, like this:
docker run ... -v /place/to/save/cache:/opt/yocto/sstate-cache
Edit the local.conf in your build directory so that it points at that folder:
SSTATE_DIR ?= "/opt/yocto/sstate-cache"
In this way, you can get your cache onto your build machines in whatever way is best for you (scp, nfs, sneakernet).
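Putting option 2 together, a minimal sketch could look like the following (the image name and host paths are placeholders, and mounting Yocto's download directory via DL_DIR is an extra suggestion of mine so that source tarballs are cached alongside the sstate objects):
# run the build container with both caches bind-mounted
docker run --rm -it \
  -v /srv/yocto/sstate-cache:/opt/yocto/sstate-cache \
  -v /srv/yocto/downloads:/opt/yocto/downloads \
  -v $(pwd):/workdir \
  my-yocto-builder:latest
# and in the build directory's local.conf, point Yocto at the mounted paths
SSTATE_DIR ?= "/opt/yocto/sstate-cache"
DL_DIR ?= "/opt/yocto/downloads"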
Hope this helps!
Let's say I need A.exe and B.exe installed in my final node image at runtime. Both A.exe and B.exe happen to be available on Docker Hub, but they're from separate images. Does Docker have a way to somehow make both executables from different images available in my final image?
I don't think Docker's multi-stage build is relevant as it only simplifies passing artefacts that we want available on the next image. Whereas in my case, I need the whole runtime environment from previous images to be available. I have the option to RUN shell commands to manually install these dependencies but is this really the only way?
You could use a multi-stage build, since you can declare A.exe/B.exe together with the required runtime as the artefacts to copy into the final image.
But I agree it could be easier if you install the runtime from packages and just copy the application.
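As a rough sketch of what that multi-stage Dockerfile could look like (the image names, paths, and final base are assumptions of mine; you still have to know where each image keeps its executable and which runtime files it needs):
# hypothetical images that each ship one of the executables
FROM imageA:latest AS a
FROM imageB:latest AS b
# the final node image
FROM node:18-slim
COPY --from=a /usr/local/bin/A.exe /usr/local/bin/A.exe
COPY --from=b /usr/local/bin/B.exe /usr/local/bin/B.exe
If A.exe or B.exe depend on shared libraries from their original images, those files have to be copied across as well (or installed from packages in the final stage), which is exactly why the install-from-packages route is often simpler.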
If you feel sure enough about what you need, you could export image A and image B as tarballs.
Now comes the tricky part: merging the two filesystem structures. Extract both archives so that you end up with a single target filesystem structure, then wrap that up into a tarball again.
Finally, import that tarball into a new Docker image.
So it is not impossible to get the files - but you need to know exactly what you are doing.
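A hedged shell sketch of that flow, using docker create/export to flatten each image's filesystem and docker import to load the merged result (image names are placeholders, and conflicting files between the two trees still need a manual decision):
# flatten each image's filesystem into a tarball
docker create --name tmp_a imageA:latest
docker create --name tmp_b imageB:latest
docker export tmp_a > a.tar
docker export tmp_b > b.tar
docker rm tmp_a tmp_b
# merge the two trees (files from b.tar overwrite duplicates from a.tar - verify that is what you want)
mkdir merged
tar -xf a.tar -C merged
tar -xf b.tar -C merged
# re-pack and import as a new image; metadata such as ENTRYPOINT/CMD/ENV is lost and must be re-declared
tar -cf merged.tar -C merged .
docker import merged.tar combined:latest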
Another option to get a combined image is not to merge the two images at all, but to merge the two Dockerfiles. The merged Dockerfile may need a review once in a while, but it might even change less often than the images themselves.
We need to generate a docker image which takes a very long time to build, maybe around 45 minutes to an hour. The build consists of first downloading many (100+) .tgz files from different npm locations, then decompressing them to be served up by a web server (details will be omitted).
For simplicity, let's just assume there are exactly 100 modules, and that this app is just a combination of a bunch of different developer teams all delivering their web content to this one image. On demand, any one of the 100 modules can get a new version with a new build, and that new build needs to regenerate the image with the other 99 modules unchanged and only 1 module changed.
The build directory just before running the docker build command looks like this:
/build/libs/server.jar
/embeddedDir/
module1-1.0.0/package/...
module2-1.0.0/package/...
module3-1.0.0/package/...
module4-1.0.0/package/...
// 100 modules in total
The Dockerfile looks like this:
FROM certifiedBaseImage:1.0.0
WORKDIR /app
COPY /embeddedDir/ ./embeddedDir/
COPY /build/libs/server.jar ./server.jar
ENTRYPOINT ["startmyserver"]
The first COPY layer takes so long to finish (45m-1hr). The "/embeddedDir" has maybe 2GB of just static content copied from various module packages from NPM. There are probably hundreds of thousands of small static files, css, images, etc.
It seems weird that 2GB of web content would take so long to copy into the docker image during the build. Especially if you consider that this build machine can download and decompress all this content in less than a minute, which is amazing speed.
So my optimization idea was to make an intermediate base image. Now I have 2 files like this:
Dockerfile_base
FROM certifiedBaseImage:1.0.0
WORKDIR /app
COPY /embeddedDir/ ./embeddedDir/
COPY /inventory.json ./inventory.json
The "inventory.json" is a list of all modules that are in the base image and their version.
Dockerfile_new
FROM intermediateBaseImage:1.0.0
COPY /embeddedDir/ ./embeddedDir/
COPY /build/libs/server.jar ./server.jar
ENTRYPOINT ["startmyserver"]
Before I build Dockerfile_new, I first pull the intermediateBaseImage, copy the inventory.json out of it, and make sure not to download any of the modules that are already there, so embeddedDir only contains the files that are new in this build. It's OK to have older versions of a few modules in the final image, since the server only bothers to serve the latest version of each module anyway. A separate build regenerates Dockerfile_base every week and automatically bumps the base image version referenced in Dockerfile_new.
This SIGNIFICANTLY improved my build times, by an order of magnitude: the Docker image build went down from 45 minutes to 4 minutes.
But now my problem is that the security scans think my output image's base is the intermediateBaseImage, and they complain that this image is not the "certified one". Of course my examples above aren't real base image names, just an example. But suffice to say the security scan doesn't trust the "intermediate one" that I generate once a week.
Here is my question:
Is there any strategy out there that can improve my build times while still having the final Docker image be seen as having the certified base image (so the scans are satisfied)?
Multi-stage builds are what you are looking for.
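For instance, a sketch along these lines (names taken from the question; the exact layout is an assumption) keeps the certified image as the base of the final image while still reusing the weekly intermediate image as a cheap source of the unchanged content:
# stage 1: the weekly image that already contains last week's modules
FROM intermediateBaseImage:1.0.0 AS content
# stage 2: the final image, based on the certified image again
FROM certifiedBaseImage:1.0.0
WORKDIR /app
# bulk content comes from the intermediate stage, not from the build context
COPY --from=content /app/embeddedDir/ ./embeddedDir/
# only the modules that changed in this build are in the (now small) build context
COPY /embeddedDir/ ./embeddedDir/
COPY /build/libs/server.jar ./server.jar
ENTRYPOINT ["startmyserver"]
Because the last FROM determines the final image's base layers, the scanner should now see certifiedBaseImage underneath; the large COPY --from still materializes a layer, so it is worth measuring whether it is fast enough on your builder.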
I am wondering if it is possible to swap out a layer of a container image for another. Here is my scenario:
I have a docker file that does the following:
It pulls the .NET Core 3.1 Debian slim (buster) image
Builds the source code of my application and adds it as a layer on top of the .NET Core image
When a new version of the .net core 3.1 runtime image comes out, I would like to make a new image that has that new version, but has the same application layer on top of it.
I do NOT want to have to find the exact version of the code I used to build the application and re-build.
The idea is to replicate upgrading the machine and runtime, but not have any alteration to the application (less to test with the upgrade).
Is there a docker command that I can use to swap out a layer of an image?
This answer is broken into two distinct sections:
Part I: Build From Sources Instead: Explains why this idea should not be done.
Part II: A Proof-of-concept Implemented CLI Tool: A CLI tool I implemented specifically for this question, with a demonstration.
The first section can be skipped if potentially adverse side effects are not a concern.
Part I: Build From Sources Instead
No, there is probably not a docker command for this - and there likely never will be, due to a whole slew of technical issues. Even by design, this is not meant to be trivially possible; images and layers are meant to be immutable after being built. It is strongly recommended that the application image instead be rebuilt from the original sources, but with a new base image by modifying the FROM command.
There are numerous consistency issues that make this idea ill-advised, of which a few are listed below:
Certain Dockerfile commands do not actually create a layer; instead, they update the final layer's internal manifest and the image's manifest. Build steps that print Removing intermediate container XXXXXXXXXXXX are updating these manifests rather than creating new layers. This means that, when swapping from the old base image to the new base image, only the relevant changes must be reconciled correctly; e.g. changes from ENV/LABEL/MAINTAINER/EXPOSE/CMD/ENTRYPOINT commands.
ENV commands that alter the application image's configuration from variables inherited in the previous base image may not be updated correctly. For example, in the application image, there might be the following ENV command:
ENV PATH="/path:${PATH}"
If the application image's old base image layers are swapped out, and the new base image contains a different ${PATH} variable, it is ambiguous how to reconcile the differences without a manual decision by a developer.
If apt/apt-get/apk/yum is used to install Linux packages, these packages are installed in the application image as subsequent layers. The installed packages are not guaranteed to be compatible with the new base image if the underlying OS or executables change in the new base image layers.
Part II: A Proof-of-concept Implemented CLI Tool
"Upgrading" an image's base is technically possible by doing direct manipulation on the image archive of the already-built Docker application image. However, I cannot reiterate this enough - you should not even be attempting to edit the bottom layers of an existing image. Seriously - stop it, get some help. I am 90% sure this is a war crime.
For the sake of humoring this question thoroughly though, I have developed a proof-of-concept CLI tool written in Java over on my GitHub project concision/docker-base-image-swapper that is designed to swap base images (but not arbitrary layers). The design choices I made to resolve various consistency issues are a "best guess" on what action should be taken.
Included is a demonstration for swapping the base image of an already-built Java application image from JDK 8 to JDK 11, implemented in the demo/demo.sh script. All core code is run in isolated Docker containers, so Bash and Docker are the only dependencies needed on the host to run this demonstration. The demo application image is only built once on JDK 8, but is run twice - once on the original JDK 8 image, and another time on a swapped-base JDK 11 image.
If you experience some technical difficulty with the tool, I may potentially be able to fix the issue. This project was quickly hacked together and has pretty poor code quality; furthermore, it likely suffers from various unaccounted edge cases. I might thoroughly rewrite this in Rust within the next month or two with a focus on maintainability and handling all edge cases.
Warning: I still do not advise trying to edit images; the use of the tool is at your own risk.
The Concept
There are three images relevant in this process, all of which are already-built (i.e. no original sources are needed):
application image: the application image that is currently based off of the old base image, but needs to be swapped to a newer base image.
old base image: the older base image that the application image is based off using a FROM command in the original Dockerfile.
new base image: the newer base image that should replace the application image's layers inherited solely from the FROM old-base-image layers.
By knowing which layers and configurations of the application image are inherited from the old base image, they can be replaced with the layers and configurations from the new base image. All image layers and manifests can be obtained as a tar archive by using the docker save command. With archives of all three relevant images, a tool can analyze the differences between the application image and the old base image, and then splice the new base image's layers and configuration in their place.
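For reference, producing those archives is a one-liner per image with docker save (the tags here are placeholders):
docker save app:current -o application-image.tar
docker save base:old -o old-base-image.tar
docker save base:new -o new-base-image.tar
# each archive contains a manifest.json, an image config JSON, and the layer tarballs
tar -tf application-image.tar | head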
Warnings on Alternatives
Beware of simply doing a COPY --from=... from the old application image, as the original application's image configuration (through commands such as CMD, ENTRYPOINT, ENV, EXPOSE, LABEL, USER, VOLUME, WORKDIR) will not be properly replicated.
Two tools that provide this type of functionality that I have found are listed below.
Crane rebase https://github.com/google/go-containerregistry/blob/main/cmd/crane/rebase.md#using-crane-rebase
Buildpack (rebase) https://buildpacks.io/docs/concepts/operations/rebase/
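For example, crane rebase can be invoked roughly like this; the image references are placeholders and the flag names have changed between crane versions, so check crane rebase --help before relying on this exact form:
crane rebase registry.example.com/app:latest \
  --old_base registry.example.com/base:3.1.20 \
  --new_base registry.example.com/base:3.1.22 \
  --tag registry.example.com/app:rebased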
NOTE: You do have to watch out for items in the higher layers that need updates, and for higher layers that contain updates that would conflict with the new base. For example, suppose the base image is a Tomcat container: if layer 4 of the original image had updated the tomcat package, those files would shadow the same files in the NEW BASE, which could result in a non-functional system. So, as always, your mileage may vary.
Is it possible? Yes. But it's rarely done because of how error prone it is. For example, if the application build would have installed a library, but that library already existed in the original base image, it won't be included in the application layers. If the new base image removes that library, then the resulting merged image will be missing it, since it's neither in the new base layers nor in the old application layers.
There are a few ways I can think of to do this.
Option 1: If you know the specific files in one image, then use the COPY --from syntax to copy those files between images. This is the least error prone method, but requires that you know every file you want to include. The resulting Dockerfile looks like:
FROM new_base
# if every file is in /usr/local/bin:
COPY --from=old_image /usr/local/bin/ /usr/local/bin/
Option 2: you can export the images and create your own new image by combining the layers between the two. For this, there's docker save and docker load. That would look like:
docker save old_image >old_image.tar
docker save new_base >new_base.tar
mkdir old_image new_base new_image
tar -xvf old_image.tar -C old_image
tar -xvf new_base.tar -C new_base
cp old_image/*.json new_image/
# manually: copy each layer directory you want to save from old_image, you can view the nested tar files to find the proper ones
# manually: copy each layer directory from new_base into new_image
# manually: modify the new_image/manifest.json to have the new tag for your image, and adjust the layer references
tar -cvf new_image.tar -C new_image .
docker load <new_image.tar
Option 3: this could be done directly to the registry with API calls. You would pull the old manifest, adjust it with the new layers, and then push any new layers and then the new manifest. This would require a fair bit of coding (see regclient/regclient for how I've implemented some of these API's in Go).
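A hedged sketch of option 3 against the registry HTTP API (the registry host, repository names, and digests are placeholders, and real registries also expect an Authorization bearer token on each request):
# 1. fetch the manifest of the old application image
curl -s -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  https://registry.example.com/v2/myapp/manifests/old > manifest.json
# 2. cross-repo mount each new base layer into the application repository
curl -X POST \
  "https://registry.example.com/v2/myapp/blobs/uploads/?mount=sha256:<layer-digest>&from=newbase"
# 3. edit manifest.json: swap the old base layer digests/sizes for the new ones, and
#    upload an updated image config blob (its rootfs diff_ids must change too)
# 4. push the adjusted manifest under a new tag
curl -X PUT -H "Content-Type: application/vnd.docker.distribution.manifest.v2+json" \
  --data-binary @manifest.json \
  https://registry.example.com/v2/myapp/manifests/rebased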
Option 4: I know I've seen a tool that does this in a specific scenario but the name of it escapes me. I believe it required that you use their base images that were curated to reduce the risk of incompatibilities between versions, and limited what base images could be swapped, so I don't think it was a general purpose tool.
Note that option 2 and 3 both require either manual steps or for you to write some code if you want to automate it. Because of how error prone this is (as described above) I don't think you'll find anyone maintaining and supporting a tool to implement it. The vast majority rebuild from a Dockerfile using a CI tool.
I have developed a small utility script using Python which lets you append a tarball to an existing container image in a container registry (without having to pull existing image data), available at https://github.com/malthe/appendlayer.
The following illustrates how it works:
[ base layer ] => image1
[ base layer ] [ appended layer ] => image2
The script supports any registry that implements the OCI Distribution Spec.
The title of this question might suggest that it has already been answered, but trust me, I searched intensively here on SO :-)
As I understand it, when building a Docker image the current folder is packaged up and sent to the Docker daemon as the build context. From this build context the Docker image is built by "ADD"ing or "COPY"ing files and "RUN"ning the commands in the Dockerfile.
And furthermore: in case I have sensitive configuration files in the folder of the Dockerfile, these files will be sent to the Docker daemon as part of the build context.
Now my question:
Let's say I did not use any COPY or ADD in my Dockerfile... will these configuration files be included somewhere in the Docker image? I ran a bash inside the image and could not find the configuration files, but maybe they are still somewhere in the deeper layers of the image?
Basically my question is: Will the context of the build be stored in the image?
Only things you explicitly COPY or ADD to the image will be present there. It's common to have lines like COPY . . which will copy the entire context into the image, so it's up to you to check that you're not copying in things you don't want to have persisted and published.
It still is probably a good idea to keep these files from being sent to the Docker daemon at all. If you know which files have this information, you can add them to a .dockerignore file (syntax similar to .gitignore and similar files). There are other ways to more tightly control what's in the build context (by making a shadow install tree that has only the context content) but that's a relatively unusual setup.
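For example, a .dockerignore next to the Dockerfile might look like this (the file names are only illustrations of what sensitive configuration files could be called in your project):
# keep secrets and local configuration out of the build context entirely
secrets.env
config/*.local.json
*.pem
.git/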
As you said, only COPY, ADD and RUN operations create layers, and therefore only those operations add something to the image.
The build context is only the directory with the resources those operations (specifically COPY and ADD) will have access to while building the image. But it's not anything like a "base layer".
In fact, you said you ran bash and double-checked that nothing sensitive was there. Another way to make sure is to check the layers of the image. To do so, run: docker history --no-trunc <image>
I have a personal ASP.NET Core project which scrapes data from the web using Selenium and Chromium and saves it in a local SQLite database.
I want to be able to run this app in a Docker image on my Synology NAS. I managed to create and run the Docker image (on my Mac); it displays data from the SQLite db correctly, but I get this error when trying to scrape:
The chromedriver file does not exist in the current directory or in a directory on the PATH environment variable.
From my very limited understanding of Docker in general, I gather that I need to add the chromedriver binary inside the Docker image somehow.
I've searched a lot, went through ~30 different examples, and still can't get this to work.
Any help is appreciated!
You need to build a new image based on the existing one, in which you add the chromedriver binary. In other words you need to extend your current image.
So create a directory containing a Dockerfile and the chromedriver binary.
Your Dockerfile should look like this:
FROM your_existing_image_name:version
COPY chromedriver desired_path_inside_container
Then open a terminal inside this directory and execute:
docker build -t your_existing_image_name:version++ .
After that you should be able to start a container from the newly created image.
Some notes:
I have assumed that your existing image has been tagged with a version. If that is not the case, then remove :version from the Dockerfile.
Similarly, remove :version++ from the build command. However, it is good practice to include versioning in your images.
I have not added an entrypoint, assuming that you do not need to change the existing one.
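To make the pattern above concrete, here is a sketch that puts the driver somewhere that is usually already on the PATH (the /usr/local/bin location and image tag are assumptions of mine; also make sure the chromedriver build matches the container's OS and architecture, e.g. a Linux binary for a Linux image):
FROM your_existing_image_name:version
# place the chromedriver binary on the PATH and make sure it is executable
COPY chromedriver /usr/local/bin/chromedriver
RUN chmod +x /usr/local/bin/chromedriver
Selenium should then find the driver via the PATH lookup mentioned in the error message, provided Chromium itself is also present in the image.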