I'm configuring Dask to run on an HPC cluster. I set up a client as follows:
First modify ~/.config/dask/*.yaml, then run some code like this:
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster()
cluster.scale(100) # Start 100 workers in 100 jobs
from distributed import Client
client = Client(cluster)
print(cluster.job_script())
Here's what the resultant job_script looks like:
#!/bin/bash
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH -t 00:30:00
JOB_ID=${SLURM_JOB_ID%;*}
/path/to/python3 -m distributed.cli.dask_worker tcp://192.168.*.*:* --nthreads 1 --memory-limit 1000.00MB --name dask-worker--${JOB_ID}-- --death-timeout 60 --local-directory /scratch
So the script launches python3 immediately, but I need it to do some setup first, like activating a conda environment or a Python virtual environment, before launching Python. How can I add some pre-commands to the job_script?
I got it by reading the source code of dask_jobqueue/core.py, which was thankfully very easy to follow.
In ~/.config/dask/jobqueue.yaml, edit env-extra. Each string in the list is added to the job script as a command. For example, when using
env-extra: ['cd foo', 'mkdir bar', 'cd bar', 'conda foo']
The job_script comes out like this:
#!/bin/bash
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH -t 00:30:00
JOB_ID=${SLURM_JOB_ID%;*}
cd foo
mkdir bar
cd bar
conda foo
/path/to/python3 -m distributed.cli.dask_worker tcp://192.168.*.*:* --nthreads 1 --memory-limit 1000.00MB --name dask-worker--${JOB_ID}-- --death-timeout 60 --local-directory /scratch
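Side note: instead of editing the YAML, the same list of strings can, as far as I can tell, also be passed when constructing the cluster via the env_extra keyword argument (newer dask-jobqueue releases rename it to job_script_prologue). A minimal sketch, with a placeholder activation command:
from dask_jobqueue import SLURMCluster
# each string is inserted into the generated job script before the worker command
cluster = SLURMCluster(env_extra=['source activate my_env'])
print(cluster.job_script())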
Related
I am trying to profile an MPI+OpenACC program with nsys.
I am using OpenMPI (3.1.6) from the Nvidia HPC SDK (20.7) with UCX enabled.
There are three executables, exec1, exec2, exec3. I want to profile exec3, but I am failing.
Following is the run script:
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=40
#SBATCH --output=app.out
#SBATCH --error=app.err
#SBATCH -p Intel_6248_2s_20c_2t_GPU_hdr100_192GB_2933
#SBATCH --exclusive
#SBATCH --gres=gpu:4
WRAPPER=/run/acc_round_robin.sh
exec1=$workdir/exec/prog1
exec2=$workdir/exec/prog2
exec3=$workdir/exec/prog3
echo "0 $WRAPPER $exec1> $workdir/file.conf
echo "2-9,11-19,21-29,32-39 $WRAPPER $exec2">> $workdir/file.conf
echo "nsys profile 1,10,20,30,31 $WRAPPER $exec3">> $workdir/file.conf
echo "#!/bin/bash" > $workdir/file1_cmd
echo "srun --multi-prog $workdir/file.conf" >> $workdir/file1_cmd
echo "exit 1" >> $workdir/file1_cmd
chmod +x $workdir/file1_cmd
/usr/bin/time ./CASTING cast ./configure
date
TEND=$(echo "print time();" | perl)
echo "++++ Total elapsed time $(expr $TEND - $TBEGIN) seconds"
Run: sbatch run.sh
I am trying to import a pipeline into StreamSets during container startup, by using the Docker CMD command in the Dockerfile. The image builds, but when the container is created there is no error, yet it exits with code 0. So it never comes up. Here is what I did:
Dockerfile:
FROM streamsets/datacollector:3.18.1
COPY myPipeline.json /pipelinejsonlocation/
EXPOSE 18630
ENTRYPOINT ["/bin/sh"]
CMD ["/opt/streamsets-datacollector-3.18.1/bin/streamsets","cli","-U", "http://localhost:18630", \
"-u", \
"admin", \
"-p", \
"admin", \
"store", \
"import", \
"-n", \
"myPipeline", \
"--stack", \
"-f", \
"/pipelinejsonlocation/myPipeline.json"]
Build image:
docker build -t cmp/sdc .
Run image:
docker run -p 18630:18630 -d --name sdc cmp/sdc
This outputs the container id. But the container is in the Exited status as shown below.
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
537adb1b05ab cmp/sdc "/bin/sh /opt/stream…" 5 seconds ago Exited (0) 3 seconds ago sdc
When I do not specify the CMD command in the Dockerfile, the StreamSets container spins up, and when I then run the streamsets import command in a shell inside the running container, it works. But how do I get it done during provisioning itself? Is there something I am missing in the Dockerfile?
In your Dockerfile you overwrite the default CMD and ENTRYPOINT from the StreamSets Data Collector Dockerfile. So the container only executes your command during startup and exits without errors afterwards. This is the reason why your container is in Exited (0) status.
In general this is good and expected behavior. If you want to keep your container alive you need to execute another command in the foreground, which never ends. But unfortunately, you cannot run multiple CMDs in your Dockerfile.
I dug a little deeper. The default entry point of the image is ENTRYPOINT ["/docker-entrypoint.sh"]. This script sets up a few things and starts the Data Collector.
It is required that the Data Collector is running before the pipeline is imported. So a solution could be to copy the default docker-entrypoint.sh and modify it to start the Data Collector and import the pipeline afterwards. You could do it like this:
Dockerfile:
FROM streamsets/datacollector:3.18.1
COPY myPipeline.json /pipelinejsonlocation/
# Replace docker-entrypoint.sh
COPY docker-entrypoint.sh /docker-entrypoint.sh
EXPOSE 18630
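One small thing to watch for: COPY keeps the file mode from the build context, so make sure the modified docker-entrypoint.sh is executable before you build, e.g. on the host, inside the build context:
chmod +x docker-entrypoint.sh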
docker-entrypoint.sh (https://github.com/streamsets/datacollector-docker/blob/master/docker-entrypoint.sh):
#!/bin/bash
#
# Copyright 2017 StreamSets Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -e
# We translate environment variables to sdc.properties and rewrite them.
set_conf() {
if [ $# -ne 2 ]; then
echo "set_conf requires two arguments: <key> <value>"
exit 1
fi
if [ -z "$SDC_CONF" ]; then
echo "SDC_CONF is not set."
exit 1
fi
grep -q "^$1" ${SDC_CONF}/sdc.properties && sed 's|^#\?\('"$1"'=\).*|\1'"$2"'|' -i ${SDC_CONF}/sdc.properties || echo -e "\n$1=$2" >> ${SDC_CONF}/sdc.properties
}
# support arbitrary user IDs
# ref: https://docs.openshift.com/container-platform/3.3/creating_images/guidelines.html#openshift-container-platform-specific-guidelines
if ! whoami &> /dev/null; then
if [ -w /etc/passwd ]; then
echo "${SDC_USER:-sdc}:x:$(id -u):0:${SDC_USER:-sdc} user:${HOME}:/sbin/nologin" >> /etc/passwd
fi
fi
# In some environments such as Marathon $HOST and $PORT0 can be used to
# determine the correct external URL to reach SDC.
if [ ! -z "$HOST" ] && [ ! -z "$PORT0" ] && [ -z "$SDC_CONF_SDC_BASE_HTTP_URL" ]; then
export SDC_CONF_SDC_BASE_HTTP_URL="http://${HOST}:${PORT0}"
fi
for e in $(env); do
key=${e%=*}
value=${e#*=}
if [[ $key == SDC_CONF_* ]]; then
lowercase=$(echo $key | tr '[:upper:]' '[:lower:]')
key=$(echo ${lowercase#*sdc_conf_} | sed 's|_|.|g')
set_conf $key $value
fi
done
# MODIFICATIONS:
#exec "${SDC_DIST}/bin/streamsets" "$#"
check_data_collector_status () {
watch -n 1 ${SDC_DIST}/bin/streamsets cli -U http://localhost:18630 ping | grep -q 'version' && echo "Data Collector has started!" && import_pipeline
}
function import_pipeline () {
sleep 1
echo "Start to import pipeline"
${SDC_DIST}/bin/streamsets cli -U http://localhost:18630 -u admin -p admin store import -n myPipeline --stack -f /pipelinejsonlocation/myPipeline.json
echo "Finished importing pipeline"
}
# Start checking if Data Collector is up (in background) and start Data Collector
check_data_collector_status & ${SDC_DIST}/bin/streamsets $#
I commented out the last line exec "${SDC_DIST}/bin/streamsets" "$@" of the default docker-entrypoint.sh and added two functions. check_data_collector_status () pings the Data Collector service until it is available. import_pipeline () imports your pipeline.
check_data_collector_status () runs in the background, and ${SDC_DIST}/bin/streamsets "$@" is started in the foreground as before. So the pipeline is imported after the Data Collector service has started.
Run this image with a sleep command:
docker run -p 18630:18630 -d --name sdc cmp/sdc sleep 300
300 is the time to sleep in seconds.
Then exec your script manually within the docker container and find out what's wrong.
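For example, assuming the container is still up under the name sdc and using the paths from your Dockerfile:
docker exec -it sdc /bin/sh
# inside the container, run the import command by hand and read its output
/opt/streamsets-datacollector-3.18.1/bin/streamsets cli -U http://localhost:18630 -u admin -p admin store import -n myPipeline --stack -f /pipelinejsonlocation/myPipeline.json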
I am trying to find a "global" solution for injecting an SSH key into a container. I know that there are several solutions, including Docker BuildKit and so on, but I don't want to build an image and inject the SSH key. I want to inject the SSH key into an existing image by using docker compose.
I use the following docker compose file:
version: '3.1'

services:
  server1:
    image: XXXXXXX
    container_name: server1
    command: bash -c "/root/init.sh && python3 /root/my_python.py"
    environment:
      - MANAGED_HOST=mserver
    volumes:
      - ./init.sh:/root/init.sh
    secrets:
      - id_rsa

secrets:
  id_rsa:
    file: /home/user/.ssh/id_rsa
The init.sh is as follows:
#!/bin/bash
eval "$(ssh-agent -s)" > /dev/null
if [ ! -d "/root/.ssh/" ]; then
mkdir /root/.ssh
ssh-keyscan $MANAGED_HOST > /root/.ssh/known_hosts
fi
ssh-add -k /run/secrets/id_rsa
If I run docker compose with the command parameter
bash -c "/root/init.sh && python3 /root/my_python.py", then the SSH authentication to the appropriate remote host ($MANAGED_HOST) does not work.
An agent process is running:
root 8 1 0 12:50 ? 00:00:00 ssh-agent -s
known_hosts is OK:
root@c67655d87ced:~# cat /root/.ssh/known_hosts
BLABLABLA ssh-rsa AAAAB3BLABLABLA....
and the agent is running, but the private key is not added:
root@c67655d87ced:~# ssh-add -l
Could not open a connection to your authentication agent.
Now, if I log in to the container (docker exec -it server1 /bin/bash) and run the commands from init.sh one by one from the command line, the SSH authentication to the appropriate remote host ($MANAGED_HOST) works?!?
Any idea how I can get it working by using docker compose?
It should be enough to cause the file $HOME/.ssh/id_rsa to exist with appropriate permissions; you don't need an ssh agent running.
#!/bin/sh
if ! [ -d "$HOME/.ssh" ]; then
mkdir "$HOME/.ssh"
fi
chmod 0700 "$HOME/.ssh"
if [ -n "$MANAGED_HOST" ]; then
ssh-keyscan "$MANAGED_HOST" >> "$HOME/.ssh/known_hosts"
fi
if [ -f /run/secrets/id_rsa ]; then
cp /run/secrets/id_rsa "$HOME/.ssh/id_rsa"
chmod 0400 "$HOME/.ssh/id_rsa"
fi
# exec "$#"
A typical pattern is to use the Dockerfile ENTRYPOINT to do first-time setup tasks like this. That will get passed the CMD as arguments, and the commented exec "$@" line at the end of the file runs them as the main command. You'd set this up in your image's Dockerfile like:
FROM XXXXXX
...
# Script must be executable on the host, and must start with a
# #!/bin/sh "shebang" line
COPY init.sh /root
# MUST use JSON-array form
ENTRYPOINT ["/root/init.sh"]
# CMD can use either Dockerfile form (JSON-array or shell)
CMD ["python3", "/root/my_python.py"]
In your specific example, you're launching init.sh as a subprocess. The ssh-agent setup sets some environment variables, like $SSH_AUTH_SOCK, but when these are set in a subprocess they don't get propagated back to the parent shell. You can use the standard POSIX shell . builtin (the bash source builtin is equivalent, but non-standard) to have those environment variables set in the context of the calling shell:
command: sh -c ". /root/init.sh && exec python3 /root/my_python.py"
The exec replaces the shell wrapper with the Python script, which you generally want. The Python process will also wind up being the parent process of ssh-agent, which could lead to surprises if it happens to exit.
1) I am running a docker container with the following command (passing a few env variables with the -e option)
$ docker run --name=xyz -d -e CONTAINER_NAME=xyz -e SSH_PORT=22 -e NWMODE=HOST -e XDG_RUNTIME_DIR=/run/user/0 --net=host -v /mnt:/mnt -v /dev:/dev -v /etc/sysconfig/network-scripts:/etc/sysconfig/network-scripts -v /:/hostroot/ -v /etc/hostname:/etc/host_hostname -v /etc/localtime:/etc/localtime -v /var/run/docker.sock:/var/run/docker.sock --privileged=true cf3681e04bfb
2) After running the container as above, I check the env variable NWMODE inside the container, and it shows correctly, as below:
$ docker exec -it xyz bash
$ env | grep NWMODE
NWMODE=HOST
3) Now, I created a sample service 'b', shown below, which executes a script b.sh (where I try to access NWMODE):
root@ubuntu16:/etc/systemd/system# cat b.service
[Unit]
Description=testing service b
[Service]
ExecStart=/bin/bash /etc/systemd/system/b.sh
root@ubuntu16:/etc/systemd/system# cat b.sh
#!/bin/bash
systemctl import-environment
echo "NWMODE:" $NWMODE
4) Now if I start service 'b' and look at its logs, it shows that it is not able to access the NWMODE env variable:
$ systemctl start b
$ journalctl -fu b
...
systemd[1]: Started testing service b.
bash[641]: NWMODE:    // blank for $NWMODE here
5) Now, rather than having 'systemctl import-environment' in b.sh, if I do the following, then the b.service logs show the correct value of the NWMODE env variable:
$ systemctl import-environment
$ systemctl start b
Though step 5 above works, I can't go for it, as all the services in my system will be started automatically by systemd. In that case, can anyone please let me know how I can access the environment variables (passed using the 'docker run ...' command above) in a service file (e.g. in b.sh above)? Can this be achieved somehow with systemctl import-environment, or is there some other way?
systemd unsets all environment variables to provide a clean environment. AFAIK that is intended as a security feature.
Workaround: Create a file /etc/systemd/system.conf.d/myenvironment.conf:
[Manager]
DefaultEnvironment=CONTAINER_NAME=xyz NWMODE=HOST XDG_RUNTIME_DIR=/run/user/0
systemd will set the environment variables declared in this file.
You can set up an ENTRYPOINT script that automatically creates this file before running systemd. Example:
RUN echo '#! /bin/bash \n\
echo "[Manager] \n\
DefaultEnvironment=$(while read -r Line; do echo -n "$Line " ; done < <(env)) \n\
" >/etc/systemd/system.conf.d/myenvironment.conf \n\
exec /lib/systemd/systemd \n\
' >/usr/local/bin/setmyenv && chmod +x /usr/local/bin/setmyenv
ENTRYPOINT /usr/local/bin/setmyenv
Instead of creating the script within the Dockerfile you can store it outside and add it with COPY:
#! /bin/bash
echo "[Manager]
DefaultEnvironment=$(while read -r Line; do echo -n "$Line " ; done < <(env))
" >/etc/systemd/system.conf.d/myenvironment.conf
exec /lib/systemd/systemd
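The corresponding Dockerfile lines could then look roughly like this (setmyenv.sh is just an example name for the file in your build context):
COPY setmyenv.sh /usr/local/bin/setmyenv
RUN chmod +x /usr/local/bin/setmyenv
ENTRYPOINT ["/usr/local/bin/setmyenv"]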
TL;DR
Run the command using bash: first store the docker environment variables in a file (or just pipe them to awk), extract and export the variable, and finally run your main script.
ExecStart=/bin/bash -c "cat /proc/1/environ | tr '\0' '\n' > /home/env_file; export MY_ENV_VARIABLE=$(awk -F= -v key="MY_ENV_VARIABLE" '$1==key {print $2}' /home/env_file); /usr/bin/python3 /usr/bin/my_python_script.py"
What @mviereck says is true; still, I have found another solution to this problem.
My use case is to pass an environment variable to my systemd container in the docker run command (docker run -e MY_ENV_VARIABLE="some_val") and use that in the Python script that is run through the systemd unit file.
According to this post (https://forums.docker.com/t/where-are-stored-the-environment-variables/65762), the container environment variables can be found in /proc/1/environ of the running process inside the container. Performing a cat does show that the environment variable MY_ENV_VARIABLE=some_val does exist, though in a somewhat mangled form.
$ cat /proc/1/environ
HOSTNAME=271fbnd986bdMY_ENV_VARIABLE=some_valcontainer=dockerLC_ALL=CDEBIAN_FRONTEND=noninteractiveHOME=/rootroot#271fb0d986bd
The main task now is to extract the MY_ENV_VARIABLE="some_val" value and pass it to the ExecStart directive in the systemd unit file.
(extraction code referenced from How to grep for value in a key-value store from plain text)
# this outputs a nice key,value pair
$ cat /proc/1/environ | tr '\0' '\n'
HOSTNAME=861f23cd1b33
MY_ENV_VARIABLE=some_val
container=docker
LC_ALL=C
DEBIAN_FRONTEND=noninteractive
HOME=/root
# we can store this in a file for use, too
$ cat /proc/1/environ | tr '\0' '\n' > /home/env_file
# we can then reuse the file to extract the value of interest against a key
$ awk -F= -v key="MY_ENV_VARIABLE" '$1==key {print $2}' /home/env_file
some_val
Now, in the ExecStart directive of the systemd unit file, we can do this:
[Service]
Type=simple
ExecStart=/bin/bash -c "cat /proc/1/environ | tr '\0' '\n' > /home/env_file; export MY_ENV_VARIABLE=$(awk -F= -v key="MY_ENV_VARIABLE" '$1==key {print $2}' /home/env_file); /usr/bin/python3 /usr/bin/my_python_script.py"
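A variant of the same idea, sketched here on the assumption that (as in the answer above) an entrypoint script runs before systemd is exec'd: dump the environment to a file once and let systemd load it through its EnvironmentFile= directive. This only handles simple VAR=value pairs; values containing spaces or newlines would need extra quoting.
Entrypoint sketch:
#!/bin/bash
# persist the container environment where unit files can read it
# (/etc rather than /run, because systemd mounts a fresh tmpfs over /run at boot)
tr '\0' '\n' < /proc/1/environ > /etc/container.env
exec /lib/systemd/systemd
Unit file sketch:
[Service]
# the leading "-" makes a missing file non-fatal
EnvironmentFile=-/etc/container.env
ExecStart=/usr/bin/python3 /usr/bin/my_python_script.py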
I have built a Docker image and afterwards run a container using Docker Compose. The following command does the job for me:
docker-compose up -d
I have restarted the PC and now I want to start the container that I created before. So I have tried the following command:
$ docker-compose start
Starting php-apache ... done
Apparently it works, but it doesn't, as the output of the following command shows:
$ docker-compose ps
Name Command State Ports
---------------------------------------------------------------------------
php55devwork_php-apache_1 /bin/sh -c bash -C '/usr/l ... Exit 0
For sure something is wrong and I am trying to find out what.
How do I find why the command is failing?
Is there any place where I could see a log file or something that help me to identify and fix the error?
Here is the repository if you want to give it a try.
Update
If I remove the container: docker rm <container-id> and recreate it by running docker-compose up -d --build it works again.
Update #1
I am not able to see such weird characters myself.
This is what helped me to resolve this issue:
Under one of your services in the docker-compose YAML file, add the following:
tty: true, so it'll look like:
version: '3'

services:
  web:
    tty: true
Hopefully this helps someone; thumbs up if it helps you :)
I took a look into your Docker GitHub repo and setup_php_settings.
On line 27 there is source /etc/apache2/envvars && exec /usr/sbin/apache2 -DFOREGROUND,
and that runs apache2 in the foreground, so it shouldn't exit with status code 0.
But it seems to me that your setup_php_settings contains some weird characters (when I run your image with Compose).
[screenshot: the weird character, with the original file shown on the right side]
I have changed it to plain newlines and it worked for me. Let us know if it helped.
If you want to debug your docker container you can run it with the entrypoint overridden, like:
docker run -it --entrypoint bash yourImage
After some investigation:
There were still some errors when I restarted the docker container (like in your case: a stopped container started again after reboot). There were problems: symbolic links already existed, and Apache gets grumpy about a pre-existing PID file, so we need to do something like the official PHP Docker image does.
This is the full setup_php_settings that worked for me after a container restart.
#!/bin/bash -x
set -e
PHP_ERROR_REPORTING=${PHP_ERROR_REPORTING:-"E_ALL & ~E_DEPRECATED & ~E_NOTICE"}
sed -ri 's/^display_errors\s*=\s*Off/display_errors = On/g' /etc/php5/apache2/php.ini
sed -ri 's/^display_errors\s*=\s*Off/display_errors = On/g' /etc/php5/cli/php.ini
sed -ri "s/^error_reporting\s*=.*$//g" /etc/php5/apache2/php.ini
sed -ri "s/^error_reporting\s*=.*$//g" /etc/php5/cli/php.ini
echo "error_reporting = $PHP_ERROR_REPORTING" >> /etc/php5/apache2/php.ini
echo "error_reporting = $PHP_ERROR_REPORTING" >> /etc/php5/cli/php.ini
mkdir -p /data/tmp/php/uploads
mkdir -p /data/tmp/php/sessions
mkdir -p /data/tmp/php/xdebug
chown -R www-data:www-data /data/tmp/php*
ln -sf /etc/php5/mods-available/zz-php.ini /etc/php5/apache2/conf.d/zz-php.ini
ln -sf /etc/php5/mods-available/zz-php-directories.ini /etc/php5/apache2/conf.d/zz-php-directories.ini
# Add symbolic link to get Zend out of the current install dir
ln -sf /usr/share/php/libzend-framework-php/Zend/ /usr/share/php/Zend
a2enmod rewrite
php5enmod mcrypt
# Apache gets grumpy about PID files pre-existing
: "${APACHE_PID_FILE:=${APACHE_RUN_DIR:=/var/run/apache2}/apache2.pid}"
rm -f "$APACHE_PID_FILE"
source /etc/apache2/envvars && exec /usr/sbin/apache2 -DFOREGROUND "$@"
You can check the logs with docker-compose logs.
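For example, using the service name from the docker-compose ps output above:
docker-compose logs php-apache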
Looking through your repo, you have
ENTRYPOINT bash -C '/usr/local/bin/setup_php_settings';'bash'
which means that, without an interactive session, bash will exit immediately (with exit code 0) after reading end-of-file on stdin.
Normally getting an exit 0 should be a reason to celebrate, as it indicates that your command has ended successfully (http://www.tldp.org/LDP/abs/html/exit-status.html).
Having had a look at your Dockerfile, it looks like you're just invoking bash in your entrypoint, which then will exit for sure (as it is non-blocking). In order to serve some data, you should rather be calling php (which is a blocking operation that keeps the container up), as done in the official Dockerfiles for PHP (see the CMD ["php", "-a"] at https://github.com/docker-library/php/blob/1c56325a69718a3e3cf76179e75d070b7e23da62/5.6/Dockerfile).