Lightweight rustfmt installation in docker - docker

I want a Docker image that contains rustfmt, but I don't want the whole Rust toolchain, so that the image stays small.
I tried the following (the same approach worked for gofmt):
COPY --from=rust:1.65 /usr/local/cargo/bin/rustfmt /usr/bin/rustfmt
but that failed with the error:
error: rustup could not choose a version of rustfmt to run, because one wasn't specified explicitly, and no default is configured.
help: run 'rustup default stable' to download the latest stable release of Rust and set it as your default toolchain.
How can I handle this?

Related

Swap out a layer of a container image

I am wondering if it is possible to swap out a layer of a container image for another. Here is my scenario:
I have a docker file that does the following:
It pulls the .net core 3.1 debian slim (buster) image
Builds the source code of my application and adds it as a layer on top of the .net core image
When a new version of the .net core 3.1 runtime image comes out, I would like to make a new image that has that new version, but has the same application layer on top of it.
I do NOT want to have to find the exact version of the code I used to build the application and re-build.
The idea is to replicate upgrading the machine and runtime, but not have any alteration to the application (less to test with the upgrade).
Is there a docker command that I can use to swap out a layer of an image?
This answer is broken into two distinct sections:
Part I: Build From Sources Instead: Explains why this idea should not be done.
Part II: A Proof-of-concept Implemented CLI Tool: A CLI tool I implemented specifically for this question, with a demonstration.
The first section can be skipped if potentially adverse side effects are not a concern.
Part I: Build From Sources Instead
No, there is probably not a docker command for this, and there likely never will be, due to a whole slew of technical issues. Even by design, this is not meant to be trivially possible; images and layers are meant to be immutable after being built. It is strongly recommended that the application image instead be rebuilt from the original sources, but with a new base image, by modifying the FROM command.
There are numerous consistency issues that make this idea ill-advised, of which a few are listed below:
Certain Dockerfile commands do not actually create a layer; instead, they update the final layer's internal manifest and the image's manifest. Commands that state Removing intermediate container XXXXXXXXXXXX are actually updating these aforementioned manifests, and not creating new layers. Swapping from the old base image to the new base image therefore requires correctly updating only the relevant changes, e.g. reconciling changes from ENV/LABEL/MAINTAINER/EXPOSE/CMD/ENTRYPOINT commands.
ENV commands that alter the application image's configuration from variables inherited in the previous base image may not be updated correctly. For example, in the application image, there might be the following ENV command:
ENV PATH="/path:${PATH}"
If the application image's old base image layers are swapped out, and the new base image contains a different ${PATH} variable, it is ambiguous how to reconcile the differences without a manual decision by a developer.
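To make the PATH ambiguity concrete, here is a minimal Python sketch. It only models the fact that ${PATH} is expanded at build time and the expanded string is what ends up in the image configuration. The function name bake_env and both PATH values are hypothetical, chosen for illustration.

```python
from string import Template


def bake_env(instruction_value: str, inherited: dict) -> str:
    """Model build-time expansion: only the expanded result is stored."""
    return Template(instruction_value).substitute(inherited)


# Hypothetical PATH values for the two base images.
old_base_env = {"PATH": "/usr/local/sbin:/usr/bin"}
new_base_env = {"PATH": "/opt/runtime/bin:/usr/bin"}

# What the already-built application image carries in its configuration.
stored = bake_env("/path:${PATH}", old_base_env)
# What a rebuild on the new base would have produced.
wanted = bake_env("/path:${PATH}", new_base_env)

print(stored)  # /path:/usr/local/sbin:/usr/bin
print(wanted)  # /path:/opt/runtime/bin:/usr/bin
```

A swapping tool only sees the stored string, so it has to guess which part came from the old base and which part the application added.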
If apt/apt-get/apk/yum is used to install Linux packages, these packages are installed in the application image as subsequent layers. The installed packages are not guaranteed to be compatible with the new base image if the underlying OS or executables change in the new base image layers.
Part II: A Proof-of-concept Implemented CLI Tool
"Upgrading" an image's base is technically possible by doing direct manipulation on the image archive of the already-built Docker application image. However, I cannot reiterate this enough - you should not even be attempting to edit the bottom layers of an existing image. Seriously - stop it, get some help. I am 90% sure this is a war crime.
For the sake of humoring this question thoroughly though, I have developed a proof-of-concept CLI tool written in Java over on my GitHub project concision/docker-base-image-swapper that is designed to swap base images (but not arbitrary layers). The design choices I made to resolve various consistency issues are a "best guess" on what action should be taken.
Included is a demonstration for swapping the base image for an already-built Java application image from JDK 8 to JDK 11, implemented in the demo/demo.sh script. All core code is run in isolated Docker containers, so only Bash and Docker are necessary dependencies on the host to run this demonstration. The demo application image is only built once on JDK 8, but is run twice: once on the original JDK 8 image, and another time on a swapped-base JDK 11 image.
If you experience some technical difficulty with the tool, I may potentially be able to fix the issue. This project was quickly hacked together and has pretty poor code quality; furthermore, it likely suffers from various unaccounted edge cases. I might thoroughly rewrite this in Rust within the next month or two with a focus on maintainability and handling all edge cases.
Warning: I still do not advise trying to edit images; the use of the tool is at your own risk.
The Concept
There are three images relevant in this process, all of which are already-built (i.e. no original sources are needed):
application image: the application image that is currently based off of the old base image, but needs to be swapped to a newer base image.
old base image: the older base image that the application image is based off using a FROM command in the original Dockerfile.
new base image: the newer base image that should replace the application image's layers inherited solely from the FROM old-base-image layers.
By knowing which layers and configurations of the application image are inherited from the old base image, they can be replaced with the layers and configurations from the new base image. All image layers and manifests can be obtained as a tar archive by using the docker save command. With an archive(s) of all relevant three images, a tool can analyze the differences between
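The layer part of this idea can be sketched in a few lines of Python. This is not the code of the tool mentioned above, just an illustration under the assumption that each image is described by an ordered list of layer digests; rebase_layers and the digests are hypothetical, and the configuration (ENV, CMD, and so on) is not handled here.

```python
from typing import List


def rebase_layers(app: List[str], old_base: List[str], new_base: List[str]) -> List[str]:
    """Replace the old-base prefix of an application image's layer list."""
    # The application image must start with exactly the old base's layers,
    # otherwise it was not built FROM that base and swapping is unsafe.
    if app[:len(old_base)] != old_base:
        raise ValueError("application image is not based on the old base image")
    # Keep only the layers the application added on top of the old base.
    app_only = app[len(old_base):]
    return new_base + app_only


# Hypothetical layer digests, shortened for readability.
old_base = ["sha256:aaa", "sha256:bbb"]
new_base = ["sha256:ccc", "sha256:ddd", "sha256:eee"]
app = ["sha256:aaa", "sha256:bbb", "sha256:f00"]

print(rebase_layers(app, old_base, new_base))
```

The prefix check is the important part: if the application image does not begin with the old base's layers, there is no safe way to tell which layers to drop.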
Warnings on Alternatives
Beware of doing simply a COPY --from=... from the old application image, as the original application's image configuration (through commands such as CMD, ENTRYPOINT, ENV, EXPOSE, LABEL, USER, VOLUME, WORKDIR) will not be properly replicated.
Two tools that provide this type of functionality that I have found are listed below.
Crane rebase https://github.com/google/go-containerregistry/blob/main/cmd/crane/rebase.md#using-crane-rebase
Buildpack (rebase) https://buildpacks.io/docs/concepts/operations/rebase/
NOTE: You have to make sure that no items in the higher layers need updates, and that the higher layers do NOT contain updates that would conflict with the new base. For example, in the image below, the base is a Tomcat container. If layer 4 of the original container updated the Tomcat package, that would overshadow the updated versions of the same files in the NEW BASE, which could result in a non-functional system. So, as always, your mileage may vary.
Is it possible? Yes. But it's rarely done because of how error prone it is. For example, if the previous build would have created a library, but the library already exists in the original base image, that library won't be included in the first build. If the base image removes that library, then the resulting merged image will be missing that library since it's not in the new base layers and isn't in the old application layer.
There are a few ways I can think of to do this.
Option 1: If you know the specific files in one image, then use the COPY --from syntax to copy those files between images. This is the least error prone method, but requires that you know every file you want to include. The resulting Dockerfile looks like:
FROM new_base
# if every file is in /usr/local/bin:
COPY --from=old_image /usr/local/bin/ /usr/local/bin/
Option 2: you can export the images and create your own new image by combining the layers between the two. For this, there's docker save and docker load. That would look like:
docker save old_image >old_image.tar
docker save new_base >new_base.tar
mkdir old_image new_base new_image
tar -xvf old_image.tar -C old_image
tar -xvf new_base.tar -C new_base
cp old_image/*.json new_image/
# manually: copy each layer directory you want to save from old_image, you can view the nested tar files to find the proper ones
# manually: copy each layer directory from new_base into new_image
# manually: modify the new_image/manifest.json to have the new tag for your image, and adjust the layer references
tar -cvf new_image.tar -C new_image .
docker load <new_image.tar
Option 3: this could be done directly to the registry with API calls. You would pull the old manifest, adjust it with the new layers, and then push any new layers and then the new manifest. This would require a fair bit of coding (see regclient/regclient for how I've implemented some of these APIs in Go).
Option 4: I know I've seen a tool that does this in a specific scenario but the name of it escapes me. I believe it required that you use their base images that were curated to reduce the risk of incompatibilities between versions, and limited what base images could be swapped, so I don't think it was a general purpose tool.
Note that options 2 and 3 both require either manual steps or for you to write some code if you want to automate it. Because of how error prone this is (as described above) I don't think you'll find anyone maintaining and supporting a tool to implement it. The vast majority rebuild from a Dockerfile using a CI tool.
I have developed a small utility script using Python which lets you append a tarball to an existing container image in a container registry (without having to pull existing image data), available at https://github.com/malthe/appendlayer.
The following illustrates how it works:
[ base layer ] => image1
[ base layer ] [ appended layer ] => image2
The script supports any registry that implements the OCI Distribution Spec.

Update dependencies in the Dockerfile and create an image without re-downloading the previously mentioned dependencies

Consider the following scenario. There is an application that depends on the libraries "A", "B", and "C" to build and run; otherwise it throws an error. Not knowing about the dependencies "B" and "C", someone writes a Dockerfile that builds an image with only the dependency "A" installed.
The app is run inside a container started from the image and the app fails to build, since the container is missing the dependencies "B" and "C".
Now if the image is destroyed and rebuilt, the previously downloaded dependencies will again be re-downloaded. A workaround could be to write a Dockerfile to import from the existing image (that has the dependency "A" installed) and mention the installation of the dependencies "B" and "C" on top of it.
But this way, every time a new dependency needs to be added, a new Docker image has to be built that imports from the old image, so both the old and the new image have to be kept.
My question is:
Is there any way to keep building images that mention the new dependencies without re-downloading the old dependencies,
without importing the dependencies from the old image,
and without the hassle of writing a new "FROM" in the Dockerfile?
What is the most clean solution for a scenario like this?
1. If there is any way to keep building images mentioning the new dependencies without re-downloading the old dependencies?
Well, I often optimize a Dockerfile using layer caching. Every command you write in a Dockerfile creates a new layer. Between two builds, Docker compares the Dockerfile's commands from the top down and rebuilds from the first command where it detects a change. So I put stable layers (like dependencies and environment setup) at the top of the Dockerfile, and commands that I change often, like EXPOSE or CMD, at the bottom. This saves a lot of time whenever I rebuild the image.
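The top-down comparison can be illustrated with a small Python sketch. This is a simplification and not Docker's actual implementation (the real builder also looks at file contents for COPY/ADD, for example); first_rebuilt_step and the instruction lists are hypothetical.

```python
from typing import List


def first_rebuilt_step(previous: List[str], current: List[str]) -> int:
    """Return the index of the first instruction that cannot come from cache."""
    for i, (old, new) in enumerate(zip(previous, current)):
        if old != new:
            return i
    # All shared instructions match; anything appended after them is new work.
    return min(len(previous), len(current))


# Hypothetical Dockerfile instructions before and after adding "B" and "C".
previous = ["FROM python:3.11", "RUN pip install A", 'CMD ["app"]']
current = ["FROM python:3.11", "RUN pip install A", "RUN pip install B C", 'CMD ["app"]']

print(first_rebuilt_step(previous, current))  # 2
```

In this example the step that installs "A" is unchanged and sits above the new line, so it is reused from cache and only the new install step and everything after it are rebuilt.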
You can also use multi-stage builds, but I don't use them often, so you can read about them here: https://docs.docker.com/develop/develop-images/multistage-build/
2. without keeping the old image and import from that into the new one?
Sometimes, when I want to reinstall everything from scratch, I just rebuild the image with the --no-cache option:
docker build --no-cache=true .
3. Without the hassle of writing a new "FROM" in the dockerfile
Sometimes I use a base image like Alpine Linux and install everything I need from scratch, so my image is smaller and does not contain things that I don't need. FROM just pulls an image from Docker Hub, and that image was itself created in the same way.
For example Dockerfile of image nginx-alpine :
https://github.com/nginxinc/docker-nginx/blob/2ef3fa66f2a434cd5e44e35a02f4ac502cf50808/mainline/alpine/Dockerfile
You can checkout alpine linux for more details: https://alpinelinux.org/

Building tensorflow 2.2.0 pip wheel file, for use in CentOS system (older libc)

Introduction:
I have to create a pip wheel of Tensorflow 2.2.0 with the CUDA libraries dynamically linked (specifically cudart.so). To accomplish this I am currently using the tensorflow-dev docker image.
I am able to build the TF wheel file, and am able to install and use it while inside the build container.
Issue:
The issue is that when importing the generated wheel on a CentOS server, I get the following error:
ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home1/private/mavridis/Vineyard/tensorflowshared/test/lib64/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
Having looked around, the issue is caused by the build container using a newer libc:
ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27
Compared to CentOS older version:
ldd --version
ldd (GNU libc) 2.17
Expected behavior:
Having already tried the 'vanilla' tensorflow 2.2.0 version with no issues, installed using pip:
pip install tensorflow==2.2.0
I expected my own build to also work.
So I assume there is some configuration option or Docker configuration that allows me to use the Docker-built wheel file in a CentOS setup, just like the pip-installed version. As this wheel file is intended to be deployed to setups beyond my control, solutions involving alternate OSes and/or libc replacement are not applicable.
Build configuration:
During the build I use the following configuration and command line:
export TF_NEED_CUDA=1
export TF_USE_XLA=0
export TF_SET_ANDROID_WORKSPACE=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_ROCM=0
bazel build --config=opt --config=cuda --output_filter=DONT_MATCH_ANYTHING --linkopt=-L/usr/local/cuda/lib64 --linkopt=-lcudart --linkopt=-static-libstdc++ //tensorflow/tools/pip_package:build_pip_package
Regarding options used:
--output_filter=DONT_MATCH_ANYTHING : Silence warnings
--linkopt=-L/usr/local/cuda/lib64 --linkopt=-lcudart : Dynamic linking of cudart.so
--linkopt=-static-libstdc++ : Statically link libstdc++, as libstdc++ also caused the libc error; this, however, is not possible for libm
I expected my own build to also work.
That expectation is (obviously) incorrect. The symbols your program or library requires from GLIBC depend on exactly which functions you call.
Consider the following program:
int main() { exit(0); }
When compiled/linked on a GLIBC-2.30 system, this program only depends on GLIBC_2.2.5 (because it doesn't call any newer symbols).
Now change the program slightly:
int main() { gettid(); exit(0); }
Compile/link it again, and all of a sudden this program now requires GLIBC_2.30 (because that's where gettid() was added to GLIBC), and will not work on any system which has older GLIBC.
So I assume there is some configuration option or docker configuration
Sure: your Docker image must have a GLIBC that is not newer than what your target system has, i.e. GLIBC-2.17. Your current image contains GLIBC-2.27 (or newer).
You need a different Docker image, and you'll likely have to build it yourself, since GLIBC-2.17 is over 7 years old, and predates TensorFlow by many years.
Update:
What I don't understand is how come the pip tensorflow package (which I assumed was built with the docker image I am using) works with CentOS?
It works by accident, just like my first program would work on CentOS, but the second one wouldn't.
In short I wanted to generate a pip package that would work on 'any' linux/libc version
That is an impossible goal: Linux predates GLIBC, and it is impossible to build a single package that will work on a Linux distribution which didn't include GLIBC and on a distribution that did.
You have to draw a line somewhere. The developers of tensorflow-dev docker image drew a line at GLIBC-2.27. Packages built on this image should work on any system with 2.27 or later, and might (but are not at all guaranteed to) work on older systems.
just like the pip installed version.
You claim that the pip installed version has no "only GLIBC-xx or later" requirement, but that is not true. I am 99.9% sure that it requires at least GLIBC-2.14.
To find which GLIBC versions that package requires, run this command:
readelf -WV _pywrap_tensorflow_internal.so | grep GLIBC_
I assumed, the pip installed version was built using the publicly available tensorflow-devel docker image.
That is quite likely. And like I said, it happens to work on CentOS, but minute changes may make it not work anymore.
Update 2:
So running the readelf command as you suggested does show the most recent required versions to be:
- pip version: GLIBC_2.12
- mine: GLIBC_2.27
So from what I understand, the pip version uses an older version even than CentOS's, which explains why it works.
It doesn't "use" an older version; it uses whatever version is available.
It requires a minimum version 2.12, while your build requires a minimum version 2.27.
How do they achieve this? Do they use a different image that has an older libc? If so, where can I get it? Or do they use the public image, but build with some bazel flag that 'limits' symbols to the ones contained up to libc 2.12?
You are still not getting it.
The version that your program requires depends on exactly which functions you call. In my example program, if I only call exit, my program requires version 2.2.5, but if I also call gettid, then my program requires version 2.30. Note: these two programs are built on the same system with the same flags.
So no: they (most likely) didn't use a different Docker image, and didn't use "magic" bazel flags. They just happened to not call any functions which require GLIBC version > 2.12, and you did.
P.S. You can find which symbol(s) are causing "bad" dependency in your build like so:
readelf -Ws _pywrap_tensorflow_internal.so | egrep 'GLIBC_2.2[0-9]'
readelf -Ws _pywrap_tensorflow_internal.so | egrep 'GLIBC_2.1[89]'
This would produce output similar to (using my second program):
readelf -Ws a.out | egrep 'GLIBC_2.[23][0-9]'
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND gettid@GLIBC_2.30 (2)
48: 0000000000000000 0 FUNC GLOBAL DEFAULT UND gettid@@GLIBC_2.30
The output above shows that the only symbol my binary requires from GLIBC 2.20 or above is gettid.
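If you want to automate that check, here is a small Python sketch that scans readelf -Ws output for versioned symbols (readelf prints them as name@GLIBC_x.y or name@@GLIBC_x.y) and reports the ones that need a GLIBC newer than the target system has. The function name symbols_newer_than is hypothetical, and the sample text is hand-written for illustration (the exit line is an assumed example, not real output from the build above).

```python
import re
from typing import Dict, Tuple

# Matches e.g. "gettid@GLIBC_2.30" or "gettid@@GLIBC_2.30".
VERSIONED = re.compile(r"(\w+)@{1,2}GLIBC_(\d+(?:\.\d+)+)")


def symbols_newer_than(readelf_output: str, target: Tuple[int, ...]) -> Dict[str, Tuple[int, ...]]:
    """Map each symbol to the GLIBC version it needs, if newer than target."""
    found: Dict[str, Tuple[int, ...]] = {}
    for name, version in VERSIONED.findall(readelf_output):
        # Compare versions numerically, so that 2.2.5 < 2.17 < 2.30.
        parsed = tuple(int(part) for part in version.split("."))
        if parsed > target:
            found[name] = max(parsed, found.get(name, parsed))
    return found


# Hand-written sample in the shape of `readelf -Ws` output.
sample = """\
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND gettid@GLIBC_2.30 (2)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND exit@GLIBC_2.2.5 (3)
    48: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND gettid@@GLIBC_2.30
"""

print(symbols_newer_than(sample, (2, 17)))  # {'gettid': (2, 30)}
```

With a target of (2, 17), which is what CentOS has here, anything the function returns is a symbol that would stop the library from loading on that system.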
To make a counter point to what Employed Russian wrote:
The version that your program requires depends on exactly which functions you call. In my example program, if I only call exit, my program requires version 2.2.5, but if I also call gettid, then my program requires version 2.30. Note: these two programs are built on the same system with the same flags.
I don't think that's quite accurate. My understanding, which is corroborated by https://github.com/wheybags/glibc_version_header, is that things work like so (quoting that project, emphasis mine):
Glibc uses something called symbol versioning. This means that when you use e.g., malloc in your program, the symbol the linker will actually link against is malloc@GLIBC_YOUR_INSTALLED_VERSION (actually, it will link to malloc from the most recent version of glibc that changed the implementation of malloc, but you get the idea).
So my guess (I have not checked) would be that the Tensorflow releases are built against an older glibc (perhaps by way of being built on an older release of their target Linux distro).

Docker Image which uses another image for running test cases

I have a BDD framework in Java which I am planning to dockerize. I am able to build and run that image as a whole. But what I want is:
To build 2 images, Image-1: Entire project (without feature files) & Image-2: Feature files.
Reason to do this is: My feature file will change often. I don't want to create my image again every time to install JDK and maven when there is only a change in the feature file.
What I expect is - Image-1 runs always as a container in background and when there is a change in feature files, I build Image-2 and start it as a container. This should trigger test by using already running container which has an entire dependency.
Reason to do this is: My feature file will change often. I don't want to create my image again every time to install JDK and maven when there is only a change in the feature file.
If you just want to meet the above requirement, what you need is simple image inheritance, like this:
base/Dockerfile:
FROM ubuntu:16.04
# install JDK/MAVEN here
RUN xxx
Build a base image now:
$ docker build -t mybase:1 .
Then, for your application, use this base image:
app/Dockerfile:
FROM mybase:1
# add new feature files here
ADD ... ...
Every time your feature files change, you can rebuild your app Dockerfile and run a container based on the newly built image. Because the JDK and Maven live in a separate base image (mybase:1) that has already been built, they won't be built again.

How do I install a project built with bazel?

I am working on a project that is transitioning from CMake to Bazel. One critical feature that we are apparently losing in the bargain is the ability to install the project, so that it can be used by other (not necessarily Bazel) projects.
AFAICT, there is currently no built in support for installing a project?!
I need to create a portable (must work on at least Linux and MacOS) way to install the project. Specifically:
I need to be able to specify libraries, headers, executables, and other files (e.g. LICENSE) that need to be installed.
The user needs to be able to specify an absolute prefix where things should be installed.
I really, really should be able to execute the "install" step more than once, giving different prefixes each time, without Bazel getting confused (i.e. it must not try to "remember" what files it already installed, or if it does, must understand when the prefix is different from last time).
Libraries should be installed to the right place (e.g. lib64), or at least it should be possible for the user to specify the correct libdir.
The install step MUST NOT touch the time stamp on any file from a previous install that has not changed. (Ideally, Bazel itself would handle this; using the install command is tricky and has potential portability issues. Note platform requirements, above.)
What is the best way to go about doing this?
Unless you want to produce a specific package format (e.g. deb or rpm), you probably want to create an executable rule that does the install for you.
You can create a rule that produces an executable (e.g. a shell script) that does the install for you (e.g. it computes checksums to check whether an installed file has changed, and only copies the files that have). You would have to use the extension language to do this; it would look similar to what the Docker rules do to load an image with the incremental loader.
Addition: I forgot to say that the install itself would be run by using the run command: bazel run install if the rule is named install in the top level BUILD file.
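To give an idea of what such an install executable could do, here is a minimal Python sketch. It is not a Bazel rule and not part of any existing rule set; it is only the kind of script a rule could generate or wrap. The install function, the file layout, and the destination paths are hypothetical, and it compares file contents directly rather than storing checksums.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def install(files: Dict[str, str], prefix: str) -> List[str]:
    """Copy each source to prefix/relative_dest, skipping unchanged files."""
    copied = []
    for src, rel_dest in files.items():
        dest = Path(prefix) / rel_dest
        # shallow=False compares file contents, not just size and mtime.
        if dest.exists() and filecmp.cmp(src, dest, shallow=False):
            continue  # unchanged: leave the file and its timestamp alone
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(str(dest))
    return copied


# Demo with a throwaway source file and prefix.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "LICENSE"
    src.write_text("example license text\n")
    layout = {str(src): "share/doc/myproject/LICENSE"}
    prefix = str(Path(tmp) / "prefix")
    print(len(install(layout, prefix)))  # 1: first run copies the file
    print(len(install(layout, prefix)))  # 0: nothing changed, nothing copied
```

Because the script keeps no state of its own and only looks at what is under the given prefix, running it again with a different prefix simply installs there without getting confused by earlier installs.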

Resources