MPI on Docker: making python the main container process

The recommended way of combining Horovod and Docker is https://github.com/uber/horovod/blob/master/docs/docker.md. That's bad in a way, because it leaves bash as the primary Docker process and the python process as a secondary one: docker logs reports bash's output, the container's state depends on bash's state, the container exits when bash exits, and so on. Docker thinks its main process is bash, while it should be the python process we're starting. Is it possible to make the python process the main process in all worker containers, primary and secondary?
I tried starting the mpirun process outside the containers instead of inside one, using an interactive docker start command as the mpirun command (the containers were already prepared with nvidia-docker create):
mpirun -H localhost,localhost \
-np 1 \
-bind-to none \
-map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH \
-x NCCL_SOCKET_IFNAME=^docker0,lo \
-mca btl_tcp_if_exclude lo,docker0 \
-mca oob_tcp_if_exclude lo,docker0 \
-mca pml ob1 \
-mca btl ^openib \
docker start -a -i bajaga_aws-ls0-l : \
-np 1 \
-bind-to none \
-map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH \
-x NCCL_SOCKET_IFNAME=^docker0,lo \
-mca btl_tcp_if_exclude lo,docker0 \
-mca oob_tcp_if_exclude lo,docker0 \
-mca pml ob1 \
-mca btl ^openib \
docker start -a -i bajaga_aws-ls1-l
But that failed: the processes didn't communicate via Horovod and ran as independent processes.
Does anyone know how I could make the python process the container's main process?

I managed to get this working well enough with a few tricks:
* Start the container with an entrypoint that runs forever, until SIGTERM is received.
* Start the MPI job as a separate process.
* Write the job's output to PID 1's stdout/stderr, so that docker logs works.
* At the end of the job, send SIGTERM to PID 1 so that the whole container shuts down.
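The first trick can be sketched as a minimal entrypoint (a hypothetical sketch, not the actual script from the setup above): PID 1 just idles until it receives SIGTERM, while the real job runs alongside it, writes to /proc/1/fd/1 and /proc/1/fd/2 so docker logs picks the output up, and finally signals PID 1.

```shell
#!/bin/sh
# keep_alive: idle until SIGTERM/SIGINT arrives, then exit cleanly.
# Inside the container this function would be called as the last line of
# the entrypoint; the MPI/python job runs as a separate process, writes
# to /proc/1/fd/1, and sends SIGTERM to PID 1 when it finishes.
keep_alive() {
    trap 'exit 0' TERM INT
    # Sleep in short background intervals so the trap is handled
    # promptly instead of only after a long sleep completes.
    while :; do
        sleep 1 &
        wait $!
    done
}
```

In the entrypoint itself the last line would simply be `keep_alive`; the worker process would end with something like `kill -TERM 1`.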

Related

Connect Jenkins swarm client over an HTTP proxy

I'm attempting to run a swarm client on an RFC1918 node. The idea is to use a Squid proxy so that it can communicate with Hudson, which is in AWS.
However, when I attempt to run
/usr/bin/java \
-Dhttp.proxyHost=my.proxy.com -Dhttps.proxyHost=my.proxy.com \
-Dhttp.proxyPort=3128 -Dhttps.proxyPort=3128 \
-Dhttp.nonProxyHosts=127.0.0.0/8,192.168.0.0/16,10.0.0.0/8,.proxy.com \
-jar /usr/share/jenkins/swarm-client-3.15.jar \
-mode normal -executors 1 \
-username user -passwordEnvVariable JSWARM_PASSWORD \
-name my-agent \
-master https://external.host.com \
-labels 'docker ldfc' \
-fsroot /j \
-disableClientsUniqueId \
-deleteExistingClients
I can see from packet traces that it makes no attempt to go through my.proxy.com and instead tries to communicate directly with external.host.com (which of course fails).
I believe I'm following the official docs at https://github.com/jenkinsci/swarm-plugin/blob/master/docs/proxy.adoc; what am I doing wrong?
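One detail worth checking (my observation, not something stated in the post): the value given to http.nonProxyHosts above is a comma-separated list of CIDR ranges, but Java's networking properties expect a '|'-separated list of wildcard patterns and do not understand CIDR notation, so that value may silently match nothing. A hedged sketch of what the property might look like instead (the host patterns are assumptions derived from the ranges above):

```
-Dhttp.nonProxyHosts="127.*|192.168.*|10.*|*.proxy.com"
```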

How can I run Firefox from within a Docker container?

I'm trying to create a docker container that will let me run Firefox, so I can eventually use a Jupyter notebook. Right now, although I have successfully installed Firefox, I cannot get a window to open.
Following the instructions from running-gui-apps-within-docker, I created an image (i.e. "sample") with Firefox and then tried to run it using
$ docker run -it --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --net=host sample
When I did so, I got the following error:
root@machine:~# firefox
No protocol specified
Unable to init server: Could not connect: Connection refused
Error: cannot open display: :1
Using man docker run to understand the flags, I was not able to find the --net flag, though I did see a --network flag. However, replacing --net with --network didn't change anything. How do I specify a protocol that will let me create an image whose containers can run Firefox?
PS - For what it's worth, when I check the value of DISPLAY, I get the predictable:
~# echo $DISPLAY
:1
I have been running Firefox inside docker for quite some time, so this is possible. With regard to the security aspects, I think the following are the relevant parts:
Building
The build needs to match up uid/gid values with the user that is running the container. I do this with UID and GID build args:
Dockerfile
...
FROM fedora:35 as runtime
ENV DISPLAY=:0
# uid and gid in container needs to match host owner of
# /tmp/.docker.xauth, so they must be passed as build arguments.
ARG UID
ARG GID
RUN \
groupadd -g ${GID} firefox && \
useradd --create-home --uid ${UID} --gid ${GID} --comment="Firefox User" firefox && \
true
...
ENTRYPOINT [ "/entrypoint.sh" ]
Makefile
build:
docker pull $$(awk '/^FROM/{print $$2}' Dockerfile | sort -u)
docker build \
-t $(USER)/firefox:latest \
-t $(USER)/firefox:`date +%Y-%m-%d_%H-%M` \
--build-arg UID=`id -u` \
--build-arg GID=`id -g` \
.
entrypoint.sh
#!/bin/sh
# Assumes you have run
# pactl load-module module-native-protocol-tcp auth-ip-acl=127.0.0.1 auth-anonymous=1
# on the host system.
PULSE_SERVER=tcp:127.0.0.1:4713
export PULSE_SERVER
if [ "$1" = /bin/bash ]
then
exec "$@"
fi
exec /usr/local/bin/su-exec firefox:firefox \
/usr/bin/xterm \
-geometry 160x15 \
/usr/bin/firefox --no-remote "$@"
So I am running Firefox as a dedicated non-root user, and I wrap it in xterm so that the container does not die if Firefox accidentally exits, or when you want to restart it. It is a bit annoying having all these extra xterm windows, but I have not found any other way of preventing accidental loss of the .mozilla directory contents. Mapping it out to a volume would prevent running multiple independent docker instances, which I definitely want, and from a privacy point of view, not dragging along a long history is also something I want. Whenever I do want to save something, I make a copy of the .mozilla directory on the host computer (and restore it later in a new container).
Running
run.sh
#!/bin/bash
export XSOCK=/tmp/.X11-unix
export XAUTH=/tmp/.docker.xauth
touch ${XAUTH}
xauth nlist ${DISPLAY} | sed -e 's/^..../ffff/' | uniq | xauth -f ${XAUTH} nmerge -
DISPLAY2=$(echo $DISPLAY | sed s/localhost//)
if [ $DISPLAY2 != $DISPLAY ]
then
export DISPLAY=$DISPLAY2
xauth nlist ${DISPLAY} | sed -e 's/^..../ffff/' | uniq | xauth -f ${XAUTH} nmerge -
fi
ARGS=$(echo "$@" | sed 's/[^a-zA-Z0-9_.-]//g')
docker run -ti --rm \
--user root \
--name firefox-"$ARGS" \
--network=host \
--memory "16g" --shm-size "1g" \
--mount "type=bind,target=/home/firefox/Downloads,src=$HOME/firefox_downloads" \
-v ${XSOCK}:${XSOCK} \
-v ${XAUTH}:${XAUTH} \
-e XAUTHORITY=${XAUTH} \
-e DISPLAY=${DISPLAY} \
${USER}/firefox "$@"
With this you can, for instance, run ./run.sh https://stackoverflow.com/ and get a container named firefox-httpsstackoverflow.com. If you then want to log into your bank completely isolated from all other Firefox instances (protected by operating-system process boundaries, not just some internal browser separation), you run ./run.sh https://yourbank.example.com/.
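The container name comes from a simple sanitization of the arguments; in isolation it can be sketched like this (same sed expression as in run.sh, wrapped in a hypothetical helper):

```shell
#!/bin/sh
# Keep only characters that are safe in a docker container name:
# letters, digits, underscore, dot and dash.
sanitize() {
    echo "$@" | sed 's/[^a-zA-Z0-9_.-]//g'
}
sanitize "https://stackoverflow.com/"   # prints httpsstackoverflow.com
```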
Try running xhost + on your docker host to allow connections to the X server.

Create a Docker container that auto-exits after 1 hour

Is it possible to create a docker container that exits and deletes itself after a specific amount of time?
For example, if I have an app that I run using:
docker run -d \
--name=my_name\
-p 3800:3800 \
-v /docker/appdata/folder:/folder:rw \
-v $HOME:/storage:rw \
image/here
I normally do docker ps, find the container id, stop it manually, then rm it. Is it possible to replace the manual part by setting a 1-hour expiry, so each container self-destructs 1 hour after the run command?
Thanks in advance.
You can add these parameters:
--stop-timeout # (API 1.25+) Timeout (in seconds) to stop a container
--rm # automatically remove the container when it exits
So, your command will look like:
docker run -d \
--stop-timeout 3600 \
--rm \
--name=my_name \
-p 3800:3800 \
-v /docker/appdata/folder:/folder:rw \
-v $HOME:/storage:rw \
image/here
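A note on the flags (my understanding, not from the original answer): --stop-timeout sets how long docker stop waits before sending SIGKILL; it does not by itself make the container exit after an hour. A hedged alternative is a small watchdog that kills the job after a TTL. run_with_ttl below is a hypothetical helper, not a docker feature; in the question's setting the command would be the docker run line (without -d, and with --rm so the container removes itself):

```shell
#!/bin/sh
# run_with_ttl TTL CMD...: run CMD in the background and terminate it
# once TTL seconds have passed (or return its status if it exits first).
run_with_ttl() {
    ttl=$1; shift
    "$@" &
    child=$!
    # Watchdog: after the TTL, terminate the command if still running.
    ( sleep "$ttl"; kill "$child" 2>/dev/null || true ) &
    watchdog=$!
    status=0
    wait "$child" || status=$?
    kill "$watchdog" 2>/dev/null || true
    return "$status"
}
# e.g.: run_with_ttl 3600 docker run --rm --name=my_name -p 3800:3800 image/here
```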

gdb debugging in Docker fails

envs:
host: CentOS
docker: Ubuntu 16, nvidia-docker
program: C++ websocket
desc:
When I use gdb in docker, I can't use breakpoints; it just says: warning: error disabling address space randomization: operation not permitted. I've seen a lot of resolutions to this question; all of them tell me to add --cap-add=SYS_PTRACE --security-opt seccomp=unconfined to my docker run command, so I did. Here is my launch script:
#!/bin/sh
SCRIPT_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd)
PROJECT_ROOT="$( cd "${SCRIPT_DIR}/.." && pwd )"
echo "PROJECT_ROOT = ${PROJECT_ROOT}"
run_type=$1
docker_name=$2
sudo docker run \
--name=${docker_name} \
--privileged \
--network host \
-it --rm \
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
-v ${PROJECT_ROOT}/..:/home \
-v /ssd3:/ssd3 \
xxxx/xx/xxxx:xxxx \
bash
But when I restart the container and run gdb, the program always gets killed, like below:
(gdb) r -c conf/a.json -p 8075
Starting program: /home/Service/bin/Service --args -c conf/a.json -p 8075
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Killed
I don't know what's wrong; does anyone have any ideas?
Try this:
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined
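A quick way to confirm the seccomp part took effect (an assumption about the environment; this inspects Linux's /proc and is not docker-specific): inside the container, the Seccomp line of /proc/self/status should read 0 (disabled) when seccomp=unconfined is active, rather than 2 (filter mode, the default docker profile).

```shell
#!/bin/sh
# Print the seccomp mode of the current process:
# 0 = disabled, 1 = strict, 2 = filter (docker's default profile).
grep '^Seccomp:' /proc/self/status
```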

Error when running Docker images on a Raspberry Pi

I managed to run this Firefox docker container on a standard Linux box based on this article. However, after I installed docker on my Raspberry Pi, I get this error when I try to run the same:
docker: Error response from daemon: rpc error: code = 2 desc = "oci runtime error: exec format error".
I've followed the instructions from a commenter in the discussion:
Windows 7+
It's a bit easier on Windows 7+ with MobaXterm:
Install MobaXterm for windows
Start MobaXterm
Configure X server: Settings -> X11 (tab) -> set X11 Remote Access to full
Use this BASH script to launch the container
run_docker.bash:
#!/usr/bin/env bash
CONTAINER=py3:2016-03-23-rc3
COMMAND=/bin/bash
DISPLAY="$(hostname):0"
USER=$(whoami)
docker run \
-it \
--rm \
--user=$USER \
--workdir="/home/$USER" \
-v "/c/Users/$USER:/home/$USER:rw" \
-e DISPLAY \
$CONTAINER \
$COMMAND
On my pi this is the start script:
#!/usr/bin/env bash
CONTAINER=creack/firefox-vnc
COMMAND=/bin/bash
#DISPLAY="$(hostname):1.0"
DISPLAY="CCKK4H2:0.0"
USER=$(whoami)
docker run \
-it \
--rm \
--user=$USER \
--workdir="/home/$USER" \
-v "/c/Users/$USER:/home/$USER:rw" \
-e DISPLAY \
$CONTAINER \
$COMMAND
This is how it worked for me on a normal CentOS box.
Any idea how to troubleshoot this, or what this error means?
On a Raspberry Pi, you can only run images designed for ARM architectures.
You can find some in the arm32v* repositories: arm32v6 or arm32v7.
If you want to create your own ARM-compatible images, you must build them on an ARM device (your Raspberry Pi, for example).
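To confirm the mismatch, the host's architecture can be compared with the image's (a hedged sketch; the image name is taken from the start script above, and the docker call is guarded in case docker is not on PATH):

```shell
#!/bin/sh
# An "exec format error" means the image's binaries were built for a
# different CPU architecture than the host's.
host_arch=$(uname -m)
echo "host architecture: $host_arch"   # e.g. armv7l on a Raspberry Pi
# Read the architecture recorded in the image metadata:
if command -v docker >/dev/null 2>&1; then
    docker image inspect --format '{{.Architecture}}' creack/firefox-vnc
fi
```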
