Abort docker build if nested RPM command's %post failed - docker

Given
a postinstall.sh shell script,
an example.rpm package with a %post scriptlet that runs postinstall.sh,
a Dockerfile with a RUN rpm -ih -vv /opt/my/example.rpm directive,
and a docker build ... command execution.
In general it works fine: docker builds the image, the image is deployed, the container works fine, and there is clear evidence that postinstall.sh succeeded.
Then some evil person introduces a mistake into postinstall.sh and it starts to fail.
Let's say this evil person deletes the /usr/bin/some_removed_program program from the RPM; then we see the following text in the rpm (-vv) output:
#7 1.261 /var/tmp/rpm-tmp.Ks2Ej8: line 34: /usr/bin/some_removed_program: No such file or directory
#7 1.261 D: %post(***): waitpid(25) rc 25 status 7f00
#7 1.261 warning: %post(***) scriptlet failed, exit status 127
This log looks fairly informative, but there is a problem: the docker build command returned successfully (exit status 0), which makes %post errors detectable only at runtime, and that is bad.
The question is:
How can I make the whole build fail if the RPM's %post script fails?
My thoughts about the solution
rpm -ih -vv ... 2>&1 | if grep --quiet "scriptlet failed"; then exit 1; else echo "rpm ok"; fi
(the 2>&1 is needed because rpm prints the warning to stderr)
It should work, but it looks like reinventing the wheel. I'd prefer to fall back to this solution only if there really is no better way.
There could be a magic flag for rpm that makes it fail if %post failed. Is there one?
Yes, I have read about the ideology that "rpm %post should never fail, because the app is already installed". That ideology is relevant for bare metal and VMs, but here we install into an ephemeral container, which we don't want to exist in case of %post failure.
And I don't want to "check everything in %pre", because it's more verbose and error-prone than solution #1.
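For completeness, solution #1 as a Dockerfile directive — a minimal sketch, assuming the warning text matches the -vv output quoted above, and using a hypothetical /tmp/rpm-install.log path:

RUN rpm -ih -vv /opt/my/example.rpm 2>&1 | tee /tmp/rpm-install.log \
    && ! grep -q "scriptlet failed" /tmp/rpm-install.log

Here tee keeps the full -vv output visible in the build log, and the negated grep turns the presence of the warning into a non-zero exit status, which aborts docker build at this step.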

Related

How to gain visibility of the output of a bash script executed from a Dockerfile?

I received this error message, which means something is failing inside a bash script executed by the Dockerfile.
As an example, if something inside test.sh errors:
RUN test.sh
# 16 ERROR: executor failed running [/bin/sh -c test.sh]: exit code: 127
Question
What is the recommended way to gain visibility into the exact error message (i.e. to find out what's gone wrong) and to diagnose which line(s) of a bash script executed from a Dockerfile are problematic? Can docker be made to provide the output of the bash script, so the exact error message is shown rather than just the somewhat cryptic
executor failed running exit code: 127
as seen here?
What I know so far
One way to diagnose which line(s) are playing up is to survey the script, assess which line(s) might be causing problems, and comment out the suspect line and everything after it. If the error goes away, you've found the (first) problem line and can address it. Rinse and repeat until the script is error-free. But this seems more manual than one would hope.
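A less manual sketch, assuming test.sh is a bash script and BuildKit is in use: enable shell tracing inside the script so every command is echoed before it runs, making the failing line the last one printed.

#!/bin/bash
# -e aborts at the first failing command; -x echoes each command before it runs
set -eux
# ... rest of test.sh ...

Then build with plain progress output so the RUN output (including the trace) is not collapsed:

docker build --progress=plain .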

Running docker pipeline in snakemake using singularity without specifying singularity exec docker://

I'm trying to tie scripts from an existing pipeline on docker into my snakemake pipeline. I have the docker pipeline set up using singularity and it works. For instance,
singularity exec docker://mypipeline some_command.sh file.bam out_file.bam
works perfectly when I run it interactively on the command line. Similarly, when I incorporate the exact same command into my Snakefile it also works:
rule myrule:
    input:
        "file.bam"
    output:
        "out_file.bam"
    shell:
        "singularity exec docker://mypipeline some_command.sh {input} {output}"
However, when I try to follow this tutorial https://reproducibility.sschmeier.com/container/index.html#using-a-container-in-our-workflow to incorporate the container into my workflow as follows:
singularity: "docker://mypipeline"

rule myrule:
    input:
        "file.bam"
    output:
        "out_file.bam"
    shell:
        "some_command.sh {input} {output}"
And when I run snakemake -p --use-singularity --cores 1, I get the following output:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 myrule
1
[Sun May 17 15:28:11 2020]
rule myrule:
input: file.bam
output: out_file.bam
jobid: 0
some_command.sh file.bam out_file.bam
Activating singularity image myImage.simg
Then I get a very long report that I'm not sure what to make of, followed by this error message:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 3 of Snakefile:
Job completed successfully, but some output files are missing. Missing files after 5 seconds:
out_file.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2020-05-17T152810.484310.snakemake.log
My questions:
Why does one work and not the other, and how can I get the last example to work?
Is it good practice to declare singularity: "docker://... upfront, or does it not matter?
The error message suggests the singularity command executed successfully, but snakemake doesn't see the output file. Is the output file out_file.bam shown in your code the same one you actually use, or did you strip out part of the file path? I would suggest adding the --verbose flag to snakemake and reviewing the actual singularity command that snakemake executes.
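For reference, that is the invocation from the question with the extra flag added (no flags beyond those already mentioned in this thread):

snakemake -p --verbose --use-singularity --cores 1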

Bazel fails to run command, by hand it works. How to debug?

I am trying to compile tensorflow in a somewhat difficult environment (CentOS 6.2 with a load of home-built packages).
I have already solved a lot of problems, but the one baffling me is a SWIG error. I run with "--verbose_failures" and can see the command that bazel tries to run:
(cd /home/../.cache/bazel/_bazel../[long-hexa]/execroot/org_tensorflow && \
exec env - \
bazel-out/host/bin/external/swig/swig -c++ ...
It fails with "Exit 1: swig failed: error executing command".
However, when I cd to the mentioned directory and run the mentioned bazel-out/../swig command by hand, it succeeds! What I have understood is that this is probably a problem of bazel sanitizing the environment.
How can I debug this? Is there a way to show which environment variables bazel is using?
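One way to narrow it down — a sketch reusing the execroot path and swig command from the verbose output above, with the build target as a placeholder: reproduce bazel's sanitized environment by hand, then forward whichever variable turns out to be missing.

# "env -" clears the environment, mimicking the "exec env -" line bazel printed;
# if swig now fails the same way as under bazel, a missing variable
# (e.g. PATH or LD_LIBRARY_PATH) is the likely culprit.
cd /home/../.cache/bazel/_bazel../[long-hexa]/execroot/org_tensorflow
env - bazel-out/host/bin/external/swig/swig -c++ ...

# --action_env forwards a variable from the client environment into bazel's actions.
bazel build --verbose_failures --action_env=LD_LIBRARY_PATH //your/target:here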

Dockerfile build - possible to ignore error?

I've got a Dockerfile. When building the image, the build fails on this error:
automake: error: no 'Makefile.am' found for any configure output
Error build: The command [/bin/sh -c aclocal && autoconf && automake -a] returned a non-zero code: 1
which in reality is harmless. The library builds fine, but Docker stops the build once it receives this error. Is there any way I can instruct Docker to just ignore this?
Sure. Docker is just responding to the error codes returned by the RUN shell scripts in the Dockerfile. If your Dockerfile has something like:
RUN make
You could replace that with:
RUN make; exit 0
This will always return a 0 (success) exit code. The disadvantage here is that your image will appear to build successfully even if there are actual errors in the build process.
This might be of interest to those whose potential errors are not harmless enough to go unnoticed or unlogged. (Also, not enough rep to comment, so posting this as an answer.)
As pointed out, the disadvantage of RUN make; exit 0 is that you don't get to know if your build failed. Hence, rather use something like:
make test > /where/ever/make.log 2>&1 || echo "There were failing tests!"
This way you get notified via the docker image build log, and you can see in make.log exactly what went wrong during make (or whatever else you execute; this is not restricted to make). Note the redirection order: > /where/ever/make.log 2>&1 captures stderr in the log as well, whereas 2>&1 > make.log would send stderr to the console instead.
You can also use the standard bash error-ignoring idiom || true, which is nice if you are in the middle of a chain:
RUN <first stage> && <job that might fail> || true && <next stage>
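One caveat: in the line above, || applies to the whole chain on its left, so a failure in <first stage> is swallowed too and <next stage> still runs. To scope the escape hatch to just the flaky command, group it — a sketch using the autotools commands from this question plus a hypothetical final make:

RUN aclocal && autoconf && (automake -a || true) && make

Here only automake -a is allowed to fail; a failure in aclocal, autoconf, or make still aborts the build.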

Nagios return status unknown

I installed Nagios on CentOS to monitor some servers, one of which is a TSM server.
I downloaded a plugin written in bash; when I execute it on the command line, it works:
/usr/lib64/nagios/plugins/check_tsm db -v6
db - database utilization 42%, OK
and the return code of the bash script is 0 (from the command echo $?).
So the script works fine and returns 0, which should mean an OK status in Nagios, but the status is still UNKNOWN, and I really don't know why.
I also checked the Nagios logs, etc. It's not a problem with the command definition in commands.cfg or the service declaration, because I copied the command that Nagios sends automatically every 5 minutes, and it works fine on the command line, but the status is still UNKNOWN.
Definition of command:
define command{
    command_name check_tsm_v6
    command_line /usr/lib64/nagios/plugins/check_tsm $ARG1$ -v6 $ARG2$ $ARG3$
}
Service declaration:
define service{
    use generic-service
    host_name tsm-test
    service_description database utilization
    check_command check_tsm_v6!db!85!90
}
And here's the bash script.
One thing that's caught me out in the past with Nagios scripts is user rights. When testing your script directly on the command line be sure to precede it with:
sudo -u nagios
So yours would be:
sudo -u nagios /usr/lib64/nagios/plugins/check_tsm db -v6
This assumes that your nagios instance is being run by the nagios user, which is a fairly safe bet.
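To match what Nagios actually runs, also pass the threshold arguments from the service definition and check the exit code it sees — a quick sketch (85 and 90 come from check_tsm_v6!db!85!90):

sudo -u nagios /usr/lib64/nagios/plugins/check_tsm db -v6 85 90
echo $?   # Nagios plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN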
Good luck
Brad
Try using the yum install sysstat -y command to install the package.
If that works, great. If you are still facing the same issue, please post the complete error that is shown in the browser.
