I need to trigger a deploy of my targets after they are built.
To avoid wasting any time I would like to have that as a build rule, and for that to work the deploy would have to run every time.
So the question is:
How do I force a target to be rebuilt from scratch deterministically?
It would be more bazel-y to do this as a second step, e.g.,
java_binary(
name = "target1",
...
)
java_binary(
name = "target2",
...
)
sh_binary(
name = "deploy-targets",
srcs = ["deploy-targets.sh"],
data = [":target1.jar", ":target2.jar", ...],
)
Then do bazel run //path/to:deploy-targets when you want to deploy.
deploy-targets.sh would look something like:
#!/bin/bash
# The jars end up in the runfiles tree under the workspace name ("ws" here).
for t in ws/path/to/*.jar; do
    mvn deploy:deploy-file -Dfile="$t" ...
done
Actions (which are what happen during a build) are not supposed to interact with the outside environment, so deploying would break that contract. run, on the other hand, can do anything; it's just running a binary.
Using run would also solve your "run every time" problem: Bazel can't "cache" forking a binary.
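For example, running it twice in a row deploys twice; only the build portion is incremental:
bazel run //path/to:deploy-targets   # builds any stale jars, then runs deploy-targets.sh
bazel run //path/to:deploy-targets   # nothing left to rebuild, but the deploy still runs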
I frequently seem to have to write Dockerfiles like this (line numbers added for clarity):
1. FROM somebase
2. RUN cp /some/local/stuff /some/docker/container/path
3. RUN some-other-local-commands
4. RUN wget http://some.remote.server/some.remote.path.for.example.json
5. RUN some-other-local-commands-which-may-depend-on-the-json
On line (4), I'm fetching a remote resource. Let's assume for now that it's a JSON file. It might change from time to time, maybe not on every build, but perhaps every few hours or days.
What this means is that every time I build my container, I want to ensure the freshest JSON file is fetched. One way to force this is to add the --no-cache parameter to my docker build command, but this forces all of the lines/layers to rebuild, including (1)-(3), where that is likely not necessary. Is there a pattern or technique to automatically 'taint' or 'mark' line (4) so that Docker knows it always has to re-run the wget (presumably this would also have to force a rebuild of line 5), whilst still getting the layer caching behaviour for lines (1)-(3) when Docker detects the pre-req files haven't changed?
If the specific thing you're trying to trigger rebuilds on is the result of RUN wget ... against a specific URL, Docker actually has native support for this.
There are two similar commands to copy files into a container. COPY only copies files from the build context. ADD can also fetch external URLs and unpack local archives (but not both at the same time). The general recommendation is to use COPY, unless you need one of the specific things ADD does differently.
So you should be able to say
ADD http://some.remote.server/some.remote.path.for.example.json .
RUN some-other-local-commands-which-may-depend-on-the-json
and the RUN command will use the Docker layer cache based on the contents of the fetched file.
If this approach doesn't work for you (maybe you need special authentication to fetch the file) you can also fetch the file outside of Docker before you run docker build, and then COPY it in. Again, it will work like any other file you COPY in, and layer caching will take effect based on whether the file has changed or not.
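A minimal sketch of that fallback, assuming a Dockerfile line like COPY example.json . (add whatever auth flags your fetch needs):
#!/bin/sh
# Fetch the file outside Docker, then build; the COPY layer is cached
# on the file's checksum, so an unchanged file still hits the cache.
curl -fsSL -o example.json http://some.remote.server/some.remote.path.for.example.json
docker build -t myimage .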
I have a java_binary target in my workspace that I'm later passing as an executable to ctx.actions.run inside the rule. So far so good.
Now I want to debug this java_binary while Bazel is executing the rule. In order to attach a debugger I need the java_binary to run in debug mode. So far, the only thing I've come up with is setting jvm_flags on the java_binary. I was able to get that to work, but I was wondering if there is a way to achieve it from the command line instead of baking it into the java_binary.
java_binary(
...
jvm_flags = [
"-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
],
)
Is it possible to achieve this from the command line without hard coding jvm_flags?
Try:
bazel run //:my-target -- --debug
One strategy is to run the build with --subcommands, which will tell bazel to print out all the commands it's running during the build. Then find the command line corresponding to the invocation of the java_binary you're interested in. Then you can copy/paste that command (including the cd part) and modify it to include the debug flags, and debug it as you would any other process.
Note also that java_binary outputs a wrapper script that includes a --debug[=<port>] flag, so that should be all that needs to be added to the command line.
Note also that --subcommands will only print the commands that are actually executed during the build, so a fully cached / fully incremental build will print nothing. You may need to do a clean, or delete some of the outputs of the action you're interested in so that bazel runs that command.
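A hypothetical session, with made-up target and paths, putting the two notes together:
bazel build //my/pkg:my_rule_target --subcommands
# Among the printed subcommands, find the java_binary invocation, e.g.:
#   cd /home/user/.cache/bazel/execroot/myws && bazel-out/k8-fastbuild/bin/tools/my_tool --in foo
# Re-run it by hand, adding the wrapper script's debug flag:
cd /home/user/.cache/bazel/execroot/myws
bazel-out/k8-fastbuild/bin/tools/my_tool --debug=8000 --in foo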
It looks like you can pass the --jvm_flag option as part of the program options after the --.
BUILD:
java_binary(
name = "extract",
main_class = "com.pkg.Main",
resources = glob(["src/main/resources/**/*"]),
)
CLI:
bazel run //:extract -- --jvm_flag="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7942" -path %cd%\config.json
It seems that the --jvm_flag option needs to come immediately after the --, before the program options (-path in the example). This is with Bazel 3.7.0.
I'm creating an auto-testing service for my university. I need to take student code, put it into the project directory, and run tests.
This needs to be done for multiple different languages in an extensible way.
My initial plan:
Have a "base image" for each language (i.e. install the language runtime on buildpack-deps:stretch)
Take user files & pre-made project structure
Put user files into the correct location in the project
Build an image of the project extending the base image
Run the container. It will compile the project and run tests.
Save test results to the database, stop & delete the image
Rinse repeat for every submission
When testing manually, the image sizes are huge! Almost 1.5GB in size! I'm installing the runtime for one language, and I was testing with Hello World - so the project wasn't big either.
This "works", but feels very inefficient. I'm also very new to Docker – is there a better way to do this?
Cheers
In this specific application, I'd probably compile the program inside a container and not build an image out of it: you're throwing it away immediately, the compilation and testing are the important part, and, unusually, you don't need the built program for anything afterwards.
If you assume that the input file gets into the container somehow, then you can write a script that does the building and testing:
#!/bin/sh
# Unpack the submission into the project tree, build it, and run the tests.
cd /project/src/student
tar xzf "/app/$1"    # $1 names the submission tarball mounted under /app
cd ../..
make
...
curl ??? # send the test results somewhere
Then your Dockerfile just builds this script into an image, without any student code in it:
FROM buildpack-deps:stretch
RUN apt-get update && apt-get install ...
RUN adduser user
COPY build_and_test.sh /usr/local/bin
USER user
ADD project-structure.tar.gz /project
Then when you actually go to run it, you can use the docker run -v option to inject the submitted code.
docker run --rm -v $HOME/submissions:/app theimage \
build_and_test.sh student_name.tar.gz
In your original solution, note that the biggest things are likely to be the language runtime, C toolchain, and associated header files, and so while you get an apparently huge image, all of these things come from layers in the base image and so are shared across the individual builds (it's not taking up quite as much space as you think).
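You can confirm that sharing from the command line:
docker history theimage   # most of the size sits in the buildpack-deps base layers
docker system df -v       # breaks image disk usage into shared vs. unique size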
I've seen many Dockerfiles include all build steps in a single RUN statement, like:
RUN echo "Hello" &&
cd /tmp &&
mv a.txt b.txt &&
...
and so on...
My question is: what are the benefits/drawbacks of replacing these instructions with a single bash script that gives me syntax highlighting, loop capabilities, etc.?
Something like:
COPY ./script.sh /tmp
RUN bash /tmp/script.sh
and then
#!/bin/bash
echo "hello" ;
cd /tmp ;
mv a.txt b.txt ;
...
Thanks!
The primary difference is that when you COPY the bash script into the image it will be available for inspection in the running container, whereas the RUN command is a little more opaque. Putting your commands in a file like that is arguably more manageable for other reasons: changes in your VCS history will be a little more clear, and for longer or more complex scripts you will probably find it easier to format things cleanly with the script in a separate file rather than embedded in your Dockerfile in a RUN command.
Otherwise the result is the same (in both cases, you are executing the same set of commands), although the COPY and RUN will result in an extra image layer (vs. just the RUN by itself).
I guess running it as a shell script gives you more control.
For instance, you can use if-else statements to check whether a command has failed and provide a code path to handle it, whereas RUN is more straightforward: when the return code is not 0, it fails the build immediately.
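For example, a sketch of error handling that a bare RUN can't express as cleanly (the command name is illustrative):
#!/bin/bash
# Retry once before giving up, instead of failing the build on the first error.
if ! flaky-download-step; then
    echo "download failed, retrying once" >&2
    flaky-download-step || exit 1
fi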
Obviously the case you have there is a relatively simple one, and it would not have made a huge difference. The only impact I can see here is readability: someone would have to open the shell script to know what is happening, compared to having everything in a single file.
I guess it all comes down to using the right tool for the right job. If it is a simple command and you don't need complex logic handling then do RUN.
I have a few RUN commands in my Dockerfile that I would like to run with --no-cache each time I build a Docker image.
I understand the docker build --no-cache will disable caching for the entire Dockerfile.
Is it possible to disable cache for a specific RUN command?
There's always an option to insert some meaningless and cheap-to-run command before the region you want to disable cache for.
As proposed in this issue comment, one can add a build argument block (name can be arbitrary):
ARG CACHEBUST=1
before such a region, and modify its value on each run by adding --build-arg CACHEBUST=$(date +%s) as a docker build argument (the value can be arbitrary; here it is the current datetime, which ensures uniqueness across runs).
This will, of course, disable the cache for all following blocks too, as the hash sum of the intermediate image will be different. That makes truly selective cache disabling a non-trivial problem, given how Docker currently works.
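A minimal sketch of the pattern, applied to the Dockerfile from the earlier question:
FROM somebase
RUN cp /some/local/stuff /some/docker/container/path
ARG CACHEBUST=1
RUN wget http://some.remote.server/some.remote.path.for.example.json
# build with: docker build --build-arg CACHEBUST=$(date +%s) .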
Use
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
before the RUN line you want to always run. This works because ADD always fetches the file/URL, and the above URL generates random data on each request; Docker then compares the result to see if it can use the cache.
I have also tested this and it works nicely, since it does not require any additional Docker command-line arguments and also works from a docker-compose.yaml file :)
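Placed in the same example Dockerfile, the trick looks like this; everything above the ADD stays cached:
FROM somebase
RUN cp /some/local/stuff /some/docker/container/path
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN wget http://some.remote.server/some.remote.path.for.example.json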
If your goal is to include the latest code from GitHub (or similar), one can use the GitHub API (or equivalent) to fetch information about the latest commit using an ADD command.
docker build will always fetch a URL from an ADD command, and if the response is different from the one received the last time docker build ran, it will not use the subsequent cached layers.
eg.
ADD "https://api.github.com/repos/username/repo_name/commits?per_page=1" latest_commit
RUN curl -sLO "https://github.com/username/repo_name/archive/main.zip" && unzip main.zip
As of February 2016 it is not possible.
The feature has been requested on GitHub.
Not directly, but you can divide your Dockerfile into several parts: build an image, start the next Dockerfile with FROM thisimage, and build that image with or without caching.
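A hedged sketch of that split (file and image names are made up):
docker build -t myapp-base -f Dockerfile.base .        # the stable lines, cached as usual
docker build --no-cache -t myapp -f Dockerfile.rest .  # Dockerfile.rest begins with: FROM myapp-base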
The feature was added a week ago:
ARG FOO=bar
FROM something
RUN echo "this won't be affected if the value of FOO changes"
ARG FOO
RUN echo "this step will be executed again if the value of FOO changes"
FROM something-else
RUN echo "this won't be affected because this stage doesn't use the FOO build-arg"
https://github.com/moby/moby/issues/1996#issuecomment-550020843
Building on @Vladislav's solution above, I used in my Dockerfile
ARG CACHEBUST=0
to invalidate the build cache from here on.
However, instead of passing a date or some other random value, I call
docker build --build-arg CACHEBUST=`git rev-parse ${GITHUB_REF}` ...
where GITHUB_REF is a branch name (e.g. main) whose latest commit hash is used. That means that docker’s build cache is being invalidated only if the branch from which I build the image has had commits since the last run of docker build.
I believe that this is a slight improvement on @steve's answer, above:
RUN git clone https://sdk.ghwl;erjnv;wekrv;qlk@gitlab.com/your_name/your_repository.git
WORKDIR your_repository
# Calls for a random number to break the caching of the git clone
# (https://stackoverflow.com/questions/35134713/disable-cache-for-specific-run-commands/58801213#58801213)
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN git pull
This uses the Docker cache of the git clone, but then runs an uncached update of the repository.
It appears to work, and it is faster - but many thanks to @steve for providing the underlying principles.
Another quick hack is to write some random bytes before your command
RUN head -c 5 /dev/random > random_bytes && <run your command>
This writes out 5 random bytes, which forces a cache miss.