Running a docker pipeline in Snakemake using Singularity without specifying singularity exec docker://

I'm trying to tie scripts from an existing pipeline on docker into my Snakemake pipeline. I have the docker pipeline set up through Singularity and it works. For instance,
singularity exec docker://mypipeline some_command.sh file.bam out_file.bam
works perfectly when I run it interactively on the command line. Similarly, when I incorporate the exact same command into my Snakefile, it also works:
rule myrule:
    input:
        "file.bam"
    output:
        "out_file.bam"
    shell:
        "singularity exec docker://mypipeline some_command.sh {input} {output}"
However, when I try to follow this tutorial https://reproducibility.sschmeier.com/container/index.html#using-a-container-in-our-workflow to incorporate the container into my workflow as follows:
singularity: "docker://mypipeline"
rule myrule:
input:
"file.bam"
output:
"out_file.bam"
shell:
"some_command.sh {input} {output}"
and I run snakemake -p --use-singularity --cores 1, I get the following output:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 myrule
1
[Sun May 17 15:28:11 2020]
rule myrule:
input: file.bam
output: out_file.bam
jobid: 0
some_command.sh file.bam out_file.bam
Activating singularity image myImage.simg
Then I get a very long report that I'm not sure what to make of, followed by this error message
Waiting at most 5 seconds for missing files.
MissingOutputException in line 3 of Snakefile:
Job completed successfully, but some output files are missing. Missing files after 5 seconds:
out_file.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2020-05-17T152810.484310.snakemake.log
My questions:
Why does one work and not the other, and how can I get the last example to work?
Is it good practice to declare singularity: "docker://..." upfront, or does it not matter?

The error message suggests the singularity command executed successfully, but Snakemake doesn't see the output file. Is the output file out_file.bam shown in your code the same as the one you actually use, or did you strip out part of the file path? I would suggest adding the --verbose flag to snakemake and reviewing the actual singularity command that Snakemake executes.
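For example, a minimal debugging sketch (assuming the problem might be that the container cannot see or write to the host working directory; the --bind argument is only a guess at the fix):
# print the exact singularity invocation snakemake constructs, and
# bind the host working directory into the container just in case
snakemake -p --verbose --use-singularity \
    --singularity-args "--bind $(pwd)" --cores 1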

Related

How to gain visibility of the output of a bash script executed from a Dockerfile?

I received this error message, which means something is erroring inside a bash script executed by the Dockerfile.
As an example, if something inside test.sh errors:
RUN test.sh
#16 ERROR: executor failed running [/bin/sh -c test.sh]: exit code: 127
Question
What is the recommended way to gain visibility of the exact error message (i.e. to find out what's gone wrong) and to diagnose which line(s) of a bash script executed from a Dockerfile are problematic? Can docker be made to provide the output of the bash script, so the exact error message is shown rather than just the somewhat cryptic:
executor failed running exit code: 127
as seen here.
What I know so far
One way to diagnose which line is playing up is to survey the script, assess which lines might be causing problems, and comment out the first suspect line and everything after it. If the error goes away, you've found the (first) problem line and can address it. Rinse and repeat until the script is error-free. But this seems more manual than one would hope.
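Worth noting: exit code 127 conventionally means "command not found", so the script's shebang, its location in the image, and PATH are the first suspects. Beyond that, a less manual sketch (assuming BuildKit and a test.sh already copied into the image; myimage is a placeholder tag) is to make the build echo every line the script runs:
# in the Dockerfile, trace the script instead of running it silently:
# RUN bash -x ./test.sh
# then build with plain progress output so RUN output is not collapsed:
DOCKER_BUILDKIT=1 docker build --progress=plain -t myimage .
bash -x prints each command with its expanded arguments just before executing it, so the failing line is the last one echoed in the build log.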

Is it possible to run a command in a Docker image as a test in Bazel?

I would like to run a command inside a container to test that it works. It should be invoked by bazel test.
Something like this:
container_test(
    image = "//:my_image",
    test_command = "exit 1",
)
I noticed this: https://github.com/bazelbuild/rules_docker/blob/master/contrib/test.bzl#L125
However it isn't documented.
How should I approach this in Bazel?
Take a look at the sample test rule here.
This is a test rule which creates a script (script) that can be invoked from the CLI.
The script will then exit with a non-zero exit code to indicate that the test failed (or 0 for success).
The script is then written as an executable output (ctx.actions.write), and declares the list of files it needs available at runtime (runfiles).
This python function is then wrapped as a bazel rule (see the full guide here).
So, how would you proceed towards creating your container test rule?
The script we want to generate is probably some usage of docker run --rm IMAGE [COMMAND] [ARG...] to create a container from an image, run a command, and remove the container when done.
Don't forget to set the script's exit status based on the exit status of the docker command (as done in the example, where they copy the exit status of grep as the exit status for the overall script).
Update the sample above to use that docker command, and plant the path to the image accordingly.
See f.path in the script above for how they access the path of an individual source file.
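Putting those pieces together, the script such a rule generates might look roughly like this (my_image:latest and the test command are placeholders, not actual rules_docker output):
#!/usr/bin/env bash
# run the test command in a throwaway container; bazel test treats a
# non-zero exit status from this script as a test failure
docker run --rm my_image:latest /bin/sh -c "exit 1"
exit $?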
You will need to make sure docker is available on the machine where the test actually runs.
I haven't done this fully myself since I don't have a computer with both bazel and docker, but this should be enough to get you started :)
Good luck!

gdbserver does not attach to a running process in a docker container

In my docker container (based on SUSE distribution SLES 15) both the C++ executable (with debug enhanced code) and the gdbserver executable are installed.
Before doing anything productive the C++ executable sleeps for 5 seconds, then initializes and processes data from a database. The processing time is long enough to attach it to gdbserver.
The C++ executable is started in the background and its process id is returned to the console.
Immediately afterwards the gdbserver is started and attempts to attach to the same process id.
Problem: The gdbserver complains not being able to connect to the process:
Cannot attach to lwp 59: No such file or directory (2)
Exiting
In another attempt, I have copied the same gdbserver executable to /tmp in the docker container.
Starting this gdbserver gave a different error response:
Cannot attach to process 220: Operation not permitted (1)
Exiting
It has been verified that in both cases the process is still running: ps -e clearly shows the process id and the process name.
If the process is already finished, a different error message is thrown; this is clear and needs not be explained:
gdbserver: unable to open /proc file '/proc/79/status'
The gdbserver was started once from outside the container and once from inside.
In both scenarios the gdbserver refused to attach to the running process:
$ kubectl exec -it POD_NAME --container debugger -- gdbserver --attach :44444 59
Cannot attach to lwp 59: No such file or directory (2)
Exiting
$ kubectl exec -it POD_NAME -- /bin/bash
bash-4.4$ cd /tmp
bash-4.4$ ./gdbserver 10.0.2.15:44444 --attach 220
Cannot attach to process 220: Operation not permitted (1)
Exiting
Can someone explain what causes gdbserver to refuse to attach to the specified process,
and give advice on how to overcome the mismatch, i.e. what do I need to examine to prepare the right handshake between the C++ executable and the gdbserver?
The basic reason why gdbserver could not attach to the running C++ process is a security enhancement in Ubuntu (versions >= 10.10), the kernel's Yama ptrace_scope setting:
By default, process A cannot trace a running process B unless B is a direct child of A
(or A runs as root).
Direct debugging is still always allowed, e.g. gdb EXE and strace EXE.
The restriction can be loosened by changing the value of /proc/sys/kernel/yama/ptrace_scope from 1 (= default) to 0 (= tracing allowed for all processes). The security setting can be changed with:
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
All credit for the description of ptrace scope belongs to the following post (see the second answer, by Eliah Kagan - thank you for the thorough explanation!):
https://askubuntu.com/questions/143561/why-wont-strace-gdb-attach-to-a-process-even-though-im-root
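One container-specific caveat on top of ptrace_scope: Docker's default capability set drops ptrace, and older default seccomp profiles may block it as well, so the container itself typically needs to be launched with extra privileges. A sketch, assuming you control the launch command (my_image is a placeholder; in Kubernetes the equivalent is adding SYS_PTRACE to the pod's securityContext):
# grant the ptrace capability and disable the default seccomp profile
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined my_image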

How to run repo from a script inside a container in a jenkins job

I am unable to run repo non-interactively inside a container as part of a freestyle job.
It prompts for the username and email. I got round that by doing a git config --global inside the job.
But then it does the color test, and that hangs indefinitely.
Looking at the source code for repo I see this
if os.isatty(0) and os.isatty(1) and not self.manifest.IsMirror:
    if opt.config_name or self._ShouldConfigureUser():
        self._ConfigureUser()
    self._ConfigureColor()
So, I ran the following inside the container:
python -c "import os; print os.isatty(0), os.isatty(1)"
and, sure enough, it printed out True True
Looking at the Jenkins log, it launches the container with --tty specified, and there seems to be no way to configure that option.
I can't find a bash option to force a script to be run in a non-interactive shell. If I put the above python line in a file and execute it with almost any combination of commands and options, it still prints out True True.
The only way I see something different is if I use I/O redirection
bash <a.sh
which prints out False True - i.e. stdin is not a tty, and
bash <a.sh >a.log
which prints False False.
For a complex script, are there any problems using the bash <script approach?
Does anyone know any jenkins magic to prevent docker being launched using --tty?
I know that the --tty is the culprit. I built the container locally and ran the following
$ docker run repotest python -c "import os;print os.isatty(0), os.isatty(1)"
False False
$ docker run --tty repotest python -c "import os;print os.isatty(0), os.isatty(1)"
True True
Running Versions:
repo: 1.12.37 (per Ubuntu 16.04 apt-get)
Jenkins: 2.149
Cloudbees Docker Plugin: 1.7.3
Container base is ubuntu:xenial
I'm using the "Build inside a docker container" option.
To run the bash script repo_script.sh "non-interactively", or more exactly speaking, without terminals associated with the standard streams, you could run your script simply as
repo_script.sh < /dev/null 2>&1 | cat
assuming you want to see the output the way you would see it when running repo_script.sh directly. By piping standard output and standard error to a different process, those file descriptors appear to repo_script.sh as a pipe and not a TTY. You could also direct output to a file, or even to /dev/null if you do not care about the output:
log_file=/dev/null
repo_script.sh < /dev/null > "${log_file}" 2>&1
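A quick way to convince yourself the redirections behave as claimed is to rerun the probe from the question inside a --tty container (reusing the repotest image built earlier):
$ docker run --tty repotest sh -c 'python -c "import os; print os.isatty(0), os.isatty(1)" < /dev/null | cat'
False False
Even though the container has a TTY, the command under test sees a non-TTY stdin and stdout.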
Running the script as
bash < repo_script.sh | cat
might work too, though it is a very unorthodox and, to my mind, hackish way of running a script just to break the association of the TTY with standard input. From the script engine's point of view, reading a script program from a file is different from reading it from standard input (which, if it is a terminal, is typically not seekable), so there might be some subtle differences that could bite you in unexpected ways. This way also does not clearly communicate your intention to the next person who needs to understand your code, and may lead to partial hair loss in that person due to extraneous head scratching.
There is no need for any bash options; just using output redirections from within the interpreting shell, as described above, is an easy-to-comprehend, multi-platform-compatible, standard convention for changing the standard stream associations.
P.S. I think it should be enough for your repo script to just test whether the standard input is a TTY. It looks to me like the author of that script did not think this through. There is simply no use waiting for input if no terminal device is associated with standard input; the script could determine from that alone that everything needs to run without user interaction, or stop with an error if that is not possible.

Why does dockerized ZAP hang at the end of a baseline scan?

Fresh image and container of
owasp/zap2docker-stable:latest
The command:
docker exec zap1 ./zap-baseline.py
Hangs or processes forever after:
FAIL-NEW: 0 FAIL-INPROG: 0 WARN-NEW: 4 WARN-INPROG: 0 INFO: 0 IGNORE: 0 PASS: 12
While earlier (2-3 months ago) it executed properly. By the way, when I execute the same command inside the container, it executes and shuts down properly. How do I fix this so that the jenkins job won't be stuck forever at the summary?
BTW, why does zap-baseline.py always print out the help section if I add '-r report.html' at the end? (EDIT: a typo, -t instead of -r, but the problem stays)
That command doesn't look right to me.
The recommended command is:
docker run -t owasp/zap2docker-stable zap-baseline.py -t https://www.example.com
As per https://github.com/zaproxy/zaproxy/wiki/ZAP-Baseline-Scan
It's always printing out the help because '-t report.html' isn't valid. Look at the help shown to see the valid arguments. For an HTML report you should be using '-r report.html'.
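Putting the two together, a sketch of a full run that also saves the HTML report (the target URL is a placeholder): mount the current directory as /zap/wrk so report.html ends up on the host:
docker run --rm -v "$(pwd):/zap/wrk/:rw" -t owasp/zap2docker-stable \
    zap-baseline.py -t https://www.example.com -r report.html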
