Is it possible to run Bazel in a mode that executes all actions on a remote executor? Ideally I would like to limit unnecessary network traffic, such as transferring *.o files that are only needed to link the final binary.
It sure is. This is something I tend to keep as my default (as I often develop on cellular data).
If you add the following to your .bazelrc you can minimise the downloads.
build --remote_download_minimal
build --remote_download_outputs=minimal
There are a whole host of other flags that can be used to fine-tune what is downloaded from the remote executor and when. You can find these flags by searching through the Bazel command-line reference. e.g. use ctrl-F and 'remote_download_'.
NOTE: This does not eliminate downloads for targets that are tagged as 'local'.
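For reference, a minimal .bazelrc sketch that combines remote execution with reduced downloads might look like the following; the executor endpoint is a placeholder for your own remote execution service, and --remote_download_toplevel is the variant that still fetches top-level outputs such as the final binary:
# point Bazel at a remote executor (endpoint is a placeholder)
build --remote_executor=grpc://remote.example.com:8980
# skip downloading intermediate outputs such as *.o files
build --remote_download_minimal
# alternative: still download top-level outputs like the final binary
# build --remote_download_toplevel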
Having needed several times in the last few days to upload a 1 GB image after some micro change, I can't help but wonder why there isn't a deploy path built into Docker and related tech (e.g. k8s) to push just the application files (Dockerfile, docker-compose.yml and app-related code) and have it build out the infrastructure from within the (live) Docker host?
In other words, why do I have to upload an entire linux machine whenever I change my app code?
Isn't the whole point of Docker that the configs describe a purely deterministic infrastructure output? I can't even see why one would need to upload the whole container image unless they make changes to it manually, outside of Dockerfile, and then wish to upload that modified image. But that seems like bad practice at the very least...
Am I missing something, or is this just a peculiarity of the system?
Good question.
Short answer:
Because storage is cheaper than processing power, building images "live" would be complex, time-consuming, and potentially unpredictable.
On your Kubernetes cluster, for example, you just want to pull the "cached" layers of an image that you know works and run it... in seconds, instead of compiling binaries and downloading things (as you would specify in your Dockerfile).
About building images:
You don't have to build these images locally, you can use your CI/CD runners and run the docker build and docker push from the pipelines that run when you push your code to a git repository.
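As a rough sketch, the CI job step can be as simple as the following (the registry, image name, and tag variable are illustrative):
docker build -t registry.example.com/myapp:$GIT_COMMIT .
docker push registry.example.com/myapp:$GIT_COMMIT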
Also, if the image is too big, you should look into ways of reducing its size: use multi-stage builds, use lighter/minimal base images, use fewer layers (for example, multiple RUN apt install lines can be grouped into one apt install command listing several packages), and use a .dockerignore file so unnecessary files are not shipped into your image. Finally, read more about caching in Docker builds, as it may reduce the size of the layers you push when making changes.
Long answer:
Think of the Dockerfile as the source code, and the Image as the final binary. I know it's a classic example.
But just consider how long it would take to build/compile the binary every time you want to use it (either by running it, or by importing it as a library in a different piece of software). Then consider how nondeterministic it would be to download that software's dependencies, or to compile them on different machines, every time you run it.
You can take for example Node.js's Dockerfile:
https://github.com/nodejs/docker-node/blob/main/16/alpine3.16/Dockerfile
Which is based on Alpine: https://github.com/alpinelinux/docker-alpine
You don't want your application to perform all the operations specified in these files (and their scripts) at runtime before actually starting, as that would be unpredictable, time-consuming, and more complex than it should be (for example, you would need firewall exceptions for egress traffic from the cluster to the internet to download dependencies that may not even be available).
Instead, you just ship an image based on the base image you tested and built your code to run on. That image is built and sent to the registry, and then k8s runs it as a black box, which is predictable and deterministic.
Then about your point of how annoying it is to push huge docker images every time:
You might cut that size down by following some best practices and designing your Dockerfile well, for example:
Reduce your layers: for example, pass multiple arguments to a command whenever possible, instead of re-running it multiple times.
Use multi-stage builds, so you only push the final image, not the intermediate stages you needed to compile and configure your application (see the sketch after this list).
Avoid injecting data into your images; you can pass it to the containers later at runtime.
Order your layers so that you don't have to rebuild untouched layers when making changes.
Don't include unnecessary files, and use .dockerignore.
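To illustrate a few of these points together, here is a minimal multi-stage Dockerfile sketch for a Go application; the image tags, paths, and binary name are only illustrative:
# build stage: contains the toolchain, never gets pushed
FROM golang:1.22 AS build
WORKDIR /src
# copy the dependency manifests first so this layer stays cached until they change
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# final stage: only the compiled binary is shipped
FROM alpine:3.19
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
The final image contains only the small runtime base plus the binary, so that is all you push and pull on each change.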
And last but not least:
You don't have to push images from your machine; you can do it with CI/CD runners (for example the build-push GitHub Action), or you can use your cloud provider's build products (like Cloud Build on GCP and AWS CodeBuild).
Intro
There is a --platform option for running a Docker image, and a platform config key for docker-compose.
Also, almost all official Docker images on hub.docker.com support several architectures under a single tag.
For example, the official Ubuntu image lists multiple supported architectures per tag.
Most servers (including Kubernetes nodes) are linux/amd64.
I upgraded my MacBook to a new one with an Apple Silicon chip (M1/M2...), and now Docker Desktop shows me a warning message (the yellow note mentioned below).
For official images (you can see them without the yellow note) Docker automatically downloads the needed platform variant (I guess).
But for custom-built images (in a private repository like Nexus or an artifact registry) I have no influence. Yes, I can build appropriate images for different platforms (e.g. with buildx) and push them to the private repository, but in companies where the repos are managed by DevOps this is tricky to do. They say that the server architecture is linux/amd64, and if I develop web-oriented software (PHP etc.) on a different platform, then even if the version (tag) is the same, the environment is different and there is no guarantee that it will work on the server.
I assumed the only difference is in how instructions are translated between the software and the hardware.
I would like to understand the subject better. There is a lot of superficial information on the web, no details.
Questions
what "platform/architecture" for Docker image does it really means? Like core basics.
Will you really get different code for interpreted programming languages?
It seems to me that if the wrong platform is specified, containers run very slowly. But how can I measure this (script performance, interaction with the host file system, etc.)?
TLDR
Build multi-arch images supporting multiple architectures
Always ensure that the image you're trying to run has compatible architecture
what "platform/architecture" for docker image does it really means? Like core basics. Links would be appreciated.
It means that some of the compiled binary code within the image contains CPU instructions exclusive to that specific architecture.
If you run that image on the incorrect architecture, it'll either be slower due to the incompatible code needing to run through an emulator, or it might even not work at all.
Some images are "multi-arch", where your Docker installation selects the most suitable architecture of the image to run.
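For example, building and pushing a multi-arch image with buildx might look like this (the registry and image name are illustrative, and the builder needs either emulation or native nodes for the extra platforms):
# one-time: create and select a builder that can target multiple platforms
docker buildx create --use
# build for both architectures and push the multi-arch manifest
docker buildx build --platform linux/amd64,linux/arm64 -t registry.example.com/myapp:latest --push .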
Will you really get different code for interpreted programming languages?
Different machine code, yes. But it will be functionally equivalent.
It seems to me that if the wrong platform is specified, containers run very slowly. But how can I measure this (script performance, interaction with the host file system, etc.)?
I recommend always making sure you're running images built for your machine's architecture.
For the sake of science, you could do an experiment.
You can build an image that is meant to run a simple batch job for two different architectures, and then you can try running them both on your machine. Compare the time it takes the containers to finish.
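A rough sketch of such an experiment, assuming your Docker installation has emulation (e.g. QEMU in Docker Desktop) for the non-native architecture:
# build the same image for two architectures (tag names are illustrative)
docker build --platform linux/amd64 -t bench:amd64 .
docker build --platform linux/arm64 -t bench:arm64 .
# run both and compare wall-clock time; the non-native one runs through emulation
time docker run --rm bench:amd64
time docker run --rm bench:arm64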
Sources:
https://serverfault.com/questions/1066298/why-are-docker-images-architecture-specific#:~:text=Docker%20images%20obviously%20contain%20processor,which%20makes%20them%20architecture%20dependent.
https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/
https://www.reddit.com/r/docker/comments/o7u8uy/run_linuxamd64_images_on_m1_mac/
I have a development use-case where I use a script to configure a shell with docker-machine or another environment, and then open a directory containing source and settings (/.vscode/, .devcontainer/) that I can edit/build/debug in the VS Code Remote Containers extension.
In short, I'm looking to implement the following sequence when a "start-development.sh" script/hook runs:
Set up host-side env or remote resources (reverse sshfs to mount source to a remote docker-machine, modprobe, docker buildx, xhost for x-passthrough, etc.)
Run VS Code in that shell so settings aren't thrown away, opening a specified directory (which may be mounted via sshfs or other means) in the container, not just on the host
Run cleanup scripts to clean-up and/or destroy real resources (unmount, modprobe -r, etc.) when the development container is stopped (by either closing VS Code or rebuilding the container).
See this script for a simple example of auto-configuring a shell with an AWS instance via docker-machine. I'll be adding a few more examples to this repository over the coming day or so.
It's easy enough to open VS Code in that directory (code -w -n --folder-uri /path/here) and wait for it to quit (so I can perform cleanup steps like taking down the remote docker-machine, un-mounting reverse-sshfs mounted code or disabling kernel mods I use for development, etc.).
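In script form, the intended flow is roughly the following; this is only a sketch, and the setup/cleanup functions are placeholders for the real work:
#!/usr/bin/env bash
# hypothetical start-development.sh outline
set -e
setup_remote_resources()   { echo "docker-machine create, reverse sshfs mount, modprobe, xhost..."; }
cleanup_remote_resources() { echo "unmount, modprobe -r, tear down the docker-machine..."; }

setup_remote_resources
code -w -n --folder-uri "${PROJECT_DIR:-$PWD}"   # blocks until that VS Code window is closed
cleanup_remote_resources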
However, VS Code currently opens in "host mode", and when I choose "Reopen in container" or "Rebuild container" via the UI or command palette, it kills that process and opens another top-level(?) process, quitting the shell and throwing away my configuration and/or prematurely running the cleanup portion of my script, so it has the wrong env when it finally launches in-container. Sadness.
So finally, my question is:
Is there a way to tell VS Code to open a folder "in-container"? This would solve a ton of problems for me, instead of a janky dev cycle where I have to ensure that the code instance isn't restarting itself and messing things up - whenever I rebuild the container, for example.
Alternatively, it'd be great not to have the top-level code process I started quit altogether, so I could wait on it, or perhaps monitor it in other ways I'm not aware of, to prevent my settings being erased and my cleanup script running prematurely.
Thanks in advance!
PS: Please read the entire question before flagging it as "not related to development". If the idea of a zero-install development environment for a complex native project, live on-device development/debugging or deep learning using cloud instances with giant GPUs for Docker where you don't have to manually manage everything and write pages of readmes appeals to you - this is very much about programming.
After a whole weekend of trying different things, I finally figured it out! The key was this section in the awesome articles about advanced container configuration.
I put that into a bash script and used jq to merge docker.host and other docker env settings into .vscode/settings.json. See this example here.
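A trimmed-down sketch of that merge step (assuming DOCKER_HOST is already exported by the docker-machine setup; the exact keys you merge will vary):
# merge the remote daemon address into the workspace settings
SETTINGS=.vscode/settings.json
TMP="$(mktemp)"
jq --arg host "$DOCKER_HOST" '. + {"docker.host": $host}' "$SETTINGS" > "$TMP" && mv "$TMP" "$SETTINGS"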
After running a script that generates this file, the user only needs to reload/relaunch VS Code in that workspace folder (where the settings were created) and, yay, everything works as expected.
I plan to add some actual samples now that I have the basics working. Unfortunately, I had to split my create and teardown steps into separate activate and deactivate hooks. Not a bad workflow still, IMO.
When I first learned Docker I expected a config file, image producer, CLI, and options for mounting and networks. That's all there.
I did not expect to put build commands inside a Dockerfile. I thought docker would wrap/tar/include a prebuilt task I made. Why give build commands in Docker?
Surely it could just import a task, thus keeping Jenkins/Bazel etc. distinct and apart from making an image/container?
I guess we are dealing with a misconception here. Docker is NOT a lightweight version of VMware/Xen/KVM/Parallels/FancyVirtualization.
Disclaimer: The following is heavily simplified for the sake of comprehensibility.
So what is Docker?
In one sentence: Docker is a system to isolate processes from the other processes within an operating system as much as possible while still providing all means to run them. Put differently:
Docker is a package manager for isolated processes.
Its closest ancestors are chroot and BSD jails. What those basically do is isolate (more in the case of BSD, less in the case of chroot) a part of your OS resources and have a complete environment running independently from the rest of the OS - except for the kernel.
In order to be able to do that, a Docker image obviously needs to contain everything except for a kernel. So you need to provide a shell (if you choose to do so), standard libraries like glibc and even resources like CA certificates. For reference: In order to set up chroot jails, you did all this by hand once upon a time, preinstalling your chroot environment with each and every piece of software required. Docker is basically taking the heavy lifting from you here.
The mentioned isolation, even down to the installed (and usable) software, sounds cumbersome, but it gives you several advantages as a developer. Since you provide basically everything except for a (compatible) kernel, you can develop and test your code in the same environment it will run in later down the road. Not a close approximation, but literally the same environment, bit for bit. A rather famous proverb in relation to Docker is:
"Runs on my machine" is no excuse any more.
Another advantage is that you can add static resources to your Docker image and access them via quite ordinary file system semantics. While it is true that you can do that with virtualisation images as well, they usually do not come with a language for provisioning. Docker does - the Dockerfile:
FROM alpine
LABEL maintainer="you@example.com"
COPY file/in/host destination/on/image
Ok, got it, now why the build commands?
As described above, you need to provide all dependencies (and transitive dependencies) your application has. The easiest way to ensure that is to build your application inside your Docker image:
FROM somebase
RUN yourpackagemanager install long list of dependencies && \
    make yourapplication && \
    make install
If the build fails, you know you have missing dependencies. Now you can tweak and tune your Dockerfile until it compiles and is tested. So now your Docker image is finished, and you can confidently distribute it, since you know that as long as the Docker daemon runs on the machine somebody tries to run your image on, your image will run.
In the Go ecosystem, you basically make sure your go.mod and go.sum are up to date and working, and your work stays reproducible.
Again, this works with virtualisation as well, so what is the big deal?
A (good) docker image only runs what it needs to run. In the vast majority of docker images, this means exactly one process, for example your Go program.
Side note: It is very bad practice to run multiple processes in one Docker container, say your application and a database server and a cache and whatnot. That is what docker-compose is there for, or more generally container orchestration. But this is far too big of a topic to explain here.
A virtualised OS, however, needs to run a kernel, a shell, drivers, log systems and whatnot.
So the deal basically is that you get all the good stuff (isolation, reproducibility, ease of distribution) with less waste of resources (running 5 versions of the same OS with all its shenanigans).
Because we want to have an environment for reproducible builds. We don't want to depend on the language version, the existence of a compiler, library versions, and so on.
Building inside a Dockerfile allows you to have all the tools and the environment you need inside the image, independently of your platform and ready to use. From a development perspective, it is easier to have everything you need inside the container.
But you have to think about the objective of building inside a Dockerfile: if you have a very complex build process with a lot of dependencies, you have to worry about having all the tools inside, and that reflects on the final size of your resulting image, because building to generate an artifact is not the same as building to produce the final container.
Thinking about these two aspects, you should learn to use the multi-stage build process in Docker. The main idea is close to your question, because you can have as many stages as you need depending on your build process, and use different FROM images to ensure you have the correct requirements and dependencies at each stage, to finally generate an image with the minimum dependencies and the smallest size.
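Purely schematically, extending the earlier placeholder example into a multi-stage build looks like this (the base images, package manager, and install path are placeholders, not real names):
# stage 1: full build environment
FROM somebase AS build
RUN yourpackagemanager install long list of dependencies && \
    make yourapplication && \
    make install DESTDIR=/out

# stage 2: only the built artifacts end up in the final image
FROM someminimalbase
COPY --from=build /out /
CMD ["yourapplication"]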
I'll add to the answers above:
Doing builds in or out of docker is a choice that depends on your goal. In my case I am more interested in docker containers for kubernetes, and in addition we have mature builds already.
This link shows how you can take prebuilt tasks and add them to an image. This strategy, together with adding libs, env, etc., leverages Docker well and shows that Docker is indeed flexible. https://medium.com/@chemidy/create-the-smallest-and-secured-golang-docker-image-based-on-scratch-4752223b7324
Some general questions about the docker nodemcu-build process:
Is there a way to specify which modules are included in the build? (similar to the way the cloud build service works)
Is there a way to include a description that will appear when the resultant firmware is run?
Is SSL enabled?
The size of the bin file created by the docker nodemcu-build process (from dev branch source) is 405k. A recent build using the cloud service resulted in a bin file of size 444k. The cloud service build only included the following modules: cjson, file, gpio, http, net, node, tmr, uart, wifi, ssl. Why is the docker build bin file, which contains all modules(?), smaller than the cloud build bin file that only contains 10 modules? (I am concerned that my local docker build version is missing something - even though the build process was error free.)
You specify the modules to be built by uncommenting them in the /app/include/user_modules.h file in the source tree. The default build from the source tree is relatively minimal - not an "all modules" build.
The banner at connection is the "Version" field. The nodemcu-build.com builds change this out for custom text. It is defined in /app/include/user_version.h as the USER_VERSION define. You'll need to embed "\n" newlines in the string to get separate lines.
Yes, the Net module can include limited SSL support (TLS 1.1 only; TLS 1.2 in dev, per Marcel's comment below). You need to enable it in /app/include/user_config.h by defining CLIENT_SSL_ENABLE.
The default module and config setup in user_modules.h / user_config.h is different than the defaults on nodemcu-build.com, so the builds are not likely to be the same out of the box.