Nsys profile with MPMD (multiple program, multiple data) simulation - NVIDIA

I am trying to profile an MPI+OpenACC program with nsys.
I am using OpenMPI 3.1.6 from the NVIDIA HPC SDK 20.7 with UCX enabled.
There are three executables: exec1, exec2, and exec3. I want to profile exec3, but I am failing.
Here is the run script:
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=40
#SBATCH --output=app.out
#SBATCH --error=app.err
#SBATCH -p Intel_6248_2s_20c_2t_GPU_hdr100_192GB_2933
#SBATCH --exclusive
#SBATCH --gres=gpu:4
WRAPPER=/run/acc_round_robin.sh
exec1=$workdir/exec/prog1
exec2=$workdir/exec/prog2
exec3=$workdir/exec/prog3
echo "0 $WRAPPER $exec1" > $workdir/file.conf
echo "2-9,11-19,21-29,32-39 $WRAPPER $exec2">> $workdir/file.conf
echo "nsys profile 1,10,20,30,31 $WRAPPER $exec3">> $workdir/file.conf
echo "#!/bin/bash" > $workdir/file1_cmd
echo "srun --multi-prog $workdir/file.conf" >> $workdir/file1_cmd
echo "exit 1" >> $workdir/file1_cmd
chmod +x $workdir/file1_cmd
/usr/bin/time ./CASTING cast ./configure
date
TEND=$(echo "print time();" | perl)
echo "++++ Total elapsed time $(expr $TEND - $TBEGIN) seconds"
Run with: sbatch run.sh
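The likely culprit is the third line of file.conf: srun's multi-prog format is `<ranks> <command>`, so `nsys` has to be part of the command after the rank list, not placed before it. A sketch of a corrected file.conf (paths shortened to hypothetical relative names; the `%q{SLURM_PROCID}` output naming is an nsys feature worth verifying against your nsys version):

```shell
# Corrected multi-prog config: ranks first, then the command for those ranks.
# nsys is prefixed to exec3's command so only ranks 1,10,20,30,31 are profiled,
# each writing its own report named after its SLURM rank.
cat > file.conf <<'EOF'
0 /run/acc_round_robin.sh ./prog1
2-9,11-19,21-29,32-39 /run/acc_round_robin.sh ./prog2
1,10,20,30,31 nsys profile -o report_%q{SLURM_PROCID} /run/acc_round_robin.sh ./prog3
EOF
cat file.conf
```

With this layout, `srun --multi-prog file.conf` launches nsys only on the ranks running prog3, and each profiled rank produces a separate report file.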

Related

Play a beep from a Docker image

I'd like to play a beep with the buzzer from a Docker image.
So far, I've been able to play a beep using the following command:
echo -e "\a" > /dev/console
This works correctly under Ubuntu 20.04.
I've tried to encapsulate this code into a Docker image:
FROM ubuntu:focal
RUN echo '#!/bin/bash' > /bootstrap.sh
RUN echo 'for i in {1..5}' >> /bootstrap.sh
RUN echo 'do' >> /bootstrap.sh
RUN echo ' echo "B"' >> /bootstrap.sh
RUN echo ' echo -e "\\a" > /dev/console' >> /bootstrap.sh
RUN echo ' sleep 0.5' >> /bootstrap.sh
RUN echo 'done' >> /bootstrap.sh
RUN echo 'sleep infinity' >> /bootstrap.sh
RUN chmod +x /bootstrap.sh
CMD /bootstrap.sh
To run the image, I've used the following command:
docker run -t -i --privileged -v /dev/console:/dev/console bell
This does not produce any sound. I've also tried to start a shell in the image, but the command only returns an empty string.
Any idea how to fix this?
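As a side note on the Dockerfile itself: the eight `RUN echo` layers can be generated more readably in one step. A sketch that builds the same bootstrap script locally with a heredoc (so it could be added to the image with a single `COPY bootstrap.sh /bootstrap.sh`; the filename is arbitrary):

```shell
# Build the same bootstrap script in one step instead of eight RUN echo layers.
cat > bootstrap.sh <<'EOF'
#!/bin/bash
for i in {1..5}; do
  echo "B"
  echo -e "\a" > /dev/console
  sleep 0.5
done
sleep infinity
EOF
chmod +x bootstrap.sh
```

The quoted heredoc delimiter (`'EOF'`) avoids the `\\a` escaping gymnastics needed when each line goes through `RUN echo`.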

nginx-alpine: how to view its Dockerfile

I have the following image: nginx:1.19.0-alpine
I want to see its Dockerfile. I checked https://github.com/nginxinc/docker-nginx but could not figure out how to find it.
Basically I want to change /docker-entrypoint.sh.
Currently it's:
#!/usr/bin/env sh
# vim:sw=4:ts=4:et
set -e
if [ "$1" = "nginx" -o "$1" = "nginx-debug" ]; then
if /usr/bin/find "/docker-entrypoint.d/" -mindepth 1 -maxdepth 1 -type f -print -quit 2>/dev/null | read v; then
echo "$0: /docker-entrypoint.d/ is not empty, will attempt to perform configuration"
echo "$0: Looking for shell scripts in /docker-entrypoint.d/"
find "/docker-entrypoint.d/" -follow -type f -print | sort -n | while read -r f; do
case "$f" in
*.sh)
if [ -x "$f" ]; then
echo "$0: Launching $f";
"$f"
else
# warn on shell scripts without exec bit
echo "$0: Ignoring $f, not executable";
fi
;;
*) echo "$0: Ignoring $f";;
esac
done
echo "$0: Configuration complete; ready for start up"
else
echo "$0: No files found in /docker-entrypoint.d/, skipping configuration"
fi
fi
# Handle enabling SSL
if [ "$ENABLE_SSL" = "True" ]; then
echo "Enabling SSL support!"
cp /etc/nginx/configs/default_ssl.conf /etc/nginx/conf.d/default.conf
fi
exec "$@"
I want to modify it as:
#!/usr/bin/env sh
# vim:sw=4:ts=4:et
set -x -o verbose;
echo $1
if [ "$1" = "nginx" -o "$1" = "nginx-debug" ]; then
if /usr/bin/find "/docker-entrypoint.d/" -mindepth 1 -maxdepth 1 -type f -print -quit 2>/dev/null | read v; then
echo "$0: /docker-entrypoint.d/ is not empty, will attempt to perform configuration"
echo "$0: Looking for shell scripts in /docker-entrypoint.d/"
find "/docker-entrypoint.d/" -follow -type f -print | sort -n | while read -r f; do
case "$f" in
*.sh)
if [ -x "$f" ]; then
echo "$0: Launching $f";
"$f"
else
# warn on shell scripts without exec bit
echo "$0: Ignoring $f, not executable";
fi
;;
*) echo "$0: Ignoring $f";;
esac
done
echo "$0: Configuration complete; ready for start up"
else
echo "$0: No files found in /docker-entrypoint.d/, skipping configuration"
fi
fi
# Handle enabling SSL
if [ "$ENABLE_SSL" = "True" ]; then
echo "Enabling SSL support!"
cp /etc/nginx/configs/default_ssl.conf /etc/nginx/conf.d/default.conf
fi
exec "$@"
I see that $1 is being passed, but I have no clue what to pass.
Normally, the CMD from the Dockerfile is sent as parameters to the ENTRYPOINT. In the generated Dockerfile (found here for current alpine: https://github.com/nginxinc/docker-nginx/blob/master/stable/alpine/Dockerfile), you can see that the CMD is CMD ["nginx", "-g", "daemon off;"].
So in normal use, the parameters for the entrypoint script are "nginx", "-g" and "daemon off;".
Since the first parameter is "nginx", the script will enter the if block and run the code in there.
The if is there in case you want to run a different command. For instance if you want to enter the shell and look around in the image, you could do docker run -it nginx:alpine /bin/sh. Now the parameter for the entrypoint.sh script is just "/bin/sh" and now the script won't enter the if-block.
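The dispatch described above can be simulated in plain shell, with the entrypoint reduced to a function (a sketch; the echoed messages are made up for illustration):

```shell
# Whatever CMD supplies arrives in the entrypoint script as "$@".
# The if-block only fires when the first argument is nginx or nginx-debug.
entrypoint() {
  if [ "$1" = "nginx" ] || [ "$1" = "nginx-debug" ]; then
    echo "configuring, then exec: $*"
  else
    echo "exec directly: $*"
  fi
}
entrypoint nginx -g 'daemon off;'   # takes the configuration path
entrypoint /bin/sh                  # bypasses configuration
```

This mirrors why `docker run -it nginx:alpine /bin/sh` skips the configuration scripts: the overriding command replaces CMD, so `$1` is no longer `nginx`.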
If you want to build your own, modified version of the image, you can fork https://github.com/nginxinc/docker-nginx/ and build from your own version of the repo.
If you need the 1.19 version of the Dockerfile, you'll have to look through the old versions on the repo. The 1.19.0 version is here: https://github.com/nginxinc/docker-nginx/blob/1.19.0/stable/alpine/Dockerfile

How to make a Docker container keep running in the foreground (not exit) so I can see the log output

I want to run my Docker container in the foreground so that I can see the log output. Currently I am using this command to run it:
docker run -p 11110:11110 -p 11111:11111 -p 11112:11112 --name canal-server dolphinjiang/canal-server:v1.1.5
this is the Dockerfile of my project:
FROM centos:7
RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo ZONE=\"Asia/Shanghai\" > /etc/sysconfig/clock
RUN rm -rf /etc/yum.repos.d/*.repo
COPY CentOS6-Base-163.repo /etc/yum.repos.d/
RUN yum clean all
RUN groupadd -g 2500 canal; useradd -u 2501 -g canal -d /home/canal -m canal
RUN echo canal:De#2018er | chpasswd; echo root:dockerroot | chpasswd
RUN yum -y update && yum -y install wget vi openssl.x86_64 glibc.x86_64 tar tar.x86_64 inetutils-ping net-tools telnet which file
RUN yum clean all
COPY jdk-8u291-linux-x64.tar.gz /opt
RUN tar -zvxf /opt/jdk-8u291-linux-x64.tar.gz -C /opt && \
rm -rf /opt/jdk-8u291-linux-x64.tar.gz && \
chmod -R 755 /opt/jdk1.8.0_291 && \
chown -R root:root /opt/jdk1.8.0_291
RUN echo 'export JAVA_HOME=/opt/jdk1.8.0_291' >> /etc/profile
RUN echo 'export JRE_HOME=$JAVA_HOME/jre' >> /etc/profile
RUN echo 'export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH' >> /etc/profile
RUN echo 'export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH' >> /etc/profile
RUN source /etc/profile
RUN yum install kde-l10n-Chinese -y
RUN yum install glibc-common -y
RUN localedef -c -f UTF-8 -i zh_CN zh_CN.utf8
ENV JAVA_HOME /opt/jdk1.8.0_291
ENV PATH $PATH:$JAVA_HOME/bin
ENV LANG zh_CN.UTF-8
ENV LC_ALL zh_CN.UTF-8
ADD canal-server /home/canal/
RUN chmod 755 /home/canal/bin
WORKDIR /home/canal/bin
RUN chmod 777 /home/canal/bin/restart.sh
RUN chmod 777 /home/canal/bin/startup.sh
RUN chmod 777 /home/canal/bin/stop.sh
RUN chmod 777 /home/canal/bin/config.sh
CMD /home/canal/bin/config.sh
this is the config.sh:
cat > /home/canal/conf/canal.properties <<- EOF
# register ip
canal.register.ip = ${HOSTNAME}.canal-server-discovery-svc-stable.testcanal.svc.cluster.local
# canal admin config
canal.admin.manager = canal-admin-stable:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =
EOF
sh /home/canal/bin/restart.sh
and this is the restart.sh:
#!/bin/bash
args=$@
case $(uname) in
Linux)
bin_abs_path=$(readlink -f $(dirname $0))
;;
*)
bin_abs_path=$(cd $(dirname $0) ||exit ; pwd)
;;
esac
sh "$bin_abs_path"/stop.sh $args
sh "$bin_abs_path"/startup.sh $args
and this is the startup.sh:
#!/bin/bash
current_path=`pwd`
case "`uname`" in
Linux)
bin_abs_path=$(readlink -f $(dirname $0))
;;
*)
bin_abs_path=`cd $(dirname $0); pwd`
;;
esac
base=${bin_abs_path}/..
canal_conf=$base/conf/canal.properties
canal_local_conf=$base/conf/canal_local.properties
logback_configurationFile=$base/conf/logback.xml
export LANG=en_US.UTF-8
export BASE=$base
if [ -f $base/bin/canal.pid ] ; then
echo "found canal.pid, please run stop.sh first, then startup.sh" >&2
exit 1
fi
if [ ! -d $base/logs/canal ] ; then
mkdir -p $base/logs/canal
fi
## set java path
if [ -z "$JAVA" ] ; then
JAVA=$(which java)
fi
ALIBABA_JAVA="/usr/alibaba/java/bin/java"
TAOBAO_JAVA="/opt/taobao/java/bin/java"
if [ -z "$JAVA" ]; then
if [ -f $ALIBABA_JAVA ] ; then
JAVA=$ALIBABA_JAVA
elif [ -f $TAOBAO_JAVA ] ; then
JAVA=$TAOBAO_JAVA
else
echo "Cannot find a Java JDK. Please either set JAVA or put java (>=1.5) in your PATH." >&2
exit 1
fi
fi
case "$#"
in
0 )
;;
1 )
var=$*
if [ "$var" = "local" ]; then
canal_conf=$canal_local_conf
else
if [ -f $var ] ; then
canal_conf=$var
else
echo "THE PARAMETER IS NOT CORRECT.PLEASE CHECK AGAIN."
exit
fi
fi;;
2 )
var=$1
if [ "$var" = "local" ]; then
canal_conf=$canal_local_conf
else
if [ -f $var ] ; then
canal_conf=$var
else
if [ "$1" = "debug" ]; then
DEBUG_PORT=$2
DEBUG_SUSPEND="n"
JAVA_DEBUG_OPT="-Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=$DEBUG_PORT,server=y,suspend=$DEBUG_SUSPEND"
fi
fi
fi;;
* )
echo "THE PARAMETERS MUST BE TWO OR LESS.PLEASE CHECK AGAIN."
exit;;
esac
str=`file -L $JAVA | grep 64-bit`
if [ -n "$str" ]; then
JAVA_OPTS="-server -Xms2048m -Xmx3072m -Xmn1024m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError"
else
JAVA_OPTS="-server -Xms1024m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m "
fi
JAVA_OPTS=" $JAVA_OPTS -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8"
CANAL_OPTS="-DappName=otter-canal -Dlogback.configurationFile=$logback_configurationFile -Dcanal.conf=$canal_conf"
if [ -e $canal_conf -a -e $logback_configurationFile ]
then
for i in $base/lib/*;
do CLASSPATH=$i:"$CLASSPATH";
done
CLASSPATH="$base/conf:$CLASSPATH";
echo "cd to $bin_abs_path for workaround relative path"
cd $bin_abs_path
echo LOG CONFIGURATION : $logback_configurationFile
echo canal conf : $canal_conf
echo CLASSPATH :$CLASSPATH
$JAVA $JAVA_OPTS $JAVA_DEBUG_OPT $CANAL_OPTS -classpath .:$CLASSPATH com.alibaba.otter.canal.deployer.CanalLauncher 2>&1 &
echo $! > $base/bin/canal.pid
echo "cd to $current_path for continue"
cd $current_path
else
echo "canal conf("$canal_conf") OR log configration file($logback_configurationFile) is not exist,please create then first!"
fi
After I start the container, it exits automatically; canal does not stay up and there is no log output. What should I do to make it run in the foreground (and, once it starts successfully, switch to the background)? I also tried to run it as a daemon like this (to keep the container running in the background):
docker run -it -d -p 11110:11110 -p 11111:11111 -p 11112:11112 --name canal-server canal/canal-server:v1.1.5
The process still exits automatically, and the container does not stay up.
Basically, you should get the point (based on your latest comment):
Docker runs a container around a single command; when that command finishes, the container stops.
So to keep the container running, you need a command that runs indefinitely.
Also check this answer, which has more explanation:
Why docker exiting with code 0
One of the easiest solutions is to tail some logs, like:
tail -f /dev/null
Taken from here
You can use tail -f /dev/null to keep the container from stopping. Try this:
docker run -it -d -p 11110:11110 -p 11111:11111 -p 11112:11112 --name canal-server canal/canal-server:v1.1.5 tail -f /dev/null
see also this post
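In this particular case, startup.sh backgrounds the Java process, so PID 1 exits as soon as the scripts finish. A common pattern is to background the service and then keep the foreground occupied by following its log; a sketch with the "service" simulated by a short loop so it runs anywhere (file names are made up):

```shell
# Background a "service" that writes to a log, then keep the foreground busy.
logfile=$(mktemp)
( for i in 1 2 3; do echo "log line $i"; done ) >> "$logfile" &
svc_pid=$!
# In a real container CMD this would be:  exec tail -F "$logfile"
# which blocks forever as PID 1 while streaming the log to docker logs.
# Here we wait for the simulated service instead so the example terminates.
wait "$svc_pid"
cat "$logfile"
```

Applied to this image, config.sh could end with `exec tail -F /home/canal/logs/canal/canal.log` (path assumed) instead of returning after restart.sh.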

cron not running in alpine docker

I have created an entry-point.sh for my Dockerfile and added the entry below to it:
# start cron
/usr/sbin/crond &
exec "${DIST}/bin/ss" "$@"
my crontab.txt looks like below:
bash-4.4$ crontab -l
*/5 * * * * /cleanDisk.sh >> /apps/log/cleanDisk.log
So when I run the Docker container, I don't see any file called cleanDisk.log being created.
I have set up all the permissions, and crond is running as a process in my container; see below:
bash-4.4$ ps -ef | grep cron
12 sdc 0:00 /usr/sbin/crond
208 sdc 0:00 grep cron
So, can anyone guide me on why the log file is not getting created?
My cleanDisk.sh looks like below. Since it runs for the very first time and doesn't match all the criteria, I would expect it at least to print "No Error file found on Host $(hostname)" in cleanDisk.log.
#!/bin/bash
THRESHOLD_LIMIT=20
RETENTION_DAY=3
df -Ph /apps/ | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $5,$1 }' | while read output
do
#echo $output
used=$(echo $output | awk '{print $1}' | sed s/%//g)
partition=$(echo $output | awk '{print $2}')
if [ $used -ge ${THRESHOLD_LIMIT} ]; then
echo "The partition \"$partition\" on $(hostname) has used $used% at $(date)"
FILE_COUNT=$(find ${SDC_LOG} -maxdepth 1 -mtime +${RETENTION_DAY} -type f -name "sdc-*.sdc" -print | wc -l)
if [ ${FILE_COUNT} -gt 0 ]; then
echo "There are ${FILE_COUNT} files older than ${RETENTION_DAY} days on Host $(hostname)."
for FILENAME in $(find ${SDC_LOG} -maxdepth 1 -mtime +${RETENTION_DAY} -type f -name "sdc-*.sdc" -print);
do
ERROR_FILE_SIZE=$(stat -c%s ${FILENAME} | awk '{ split( "B KB MB GB TB PB" , v ); s=1; while( $1>1024 ){ $1/=1024; s++ } printf "%.2f %s\n", $1, v[s] }')
echo "Before Deleting Error file ${FILENAME}, the size was ${ERROR_FILE_SIZE}."
rm -rf ${FILENAME}
rc=$?
if [[ $rc -eq 0 ]];
then
echo "Error log file ${FILENAME} with size ${ERROR_FILE_SIZE} is deleted on Host $(hostname)."
fi
done
fi
if [ ${FILE_COUNT} -eq 0 ]; then
echo "No Error file found on Host $(hostname)."
fi
fi
done
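To sanity-check the script's logic independently of cron, the threshold branch can be exercised with simulated `df` output (the "85% /dev/sda1" line below is made up for illustration):

```shell
# Feed the while-loop a fake df line instead of real df output.
THRESHOLD_LIMIT=20
result=$(echo "85% /dev/sda1" | while read -r used_pct partition; do
  used=${used_pct%\%}            # strip the trailing % sign
  if [ "$used" -ge "$THRESHOLD_LIMIT" ]; then
    echo "The partition \"$partition\" has used $used%"
  fi
done)
echo "$result"
```

If this prints as expected but the cron run still produces nothing, the problem is with crond's crontab registration rather than the script (note that the crontab line also discards stderr; appending `2>&1` after the log redirect would capture any errors).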
Edit:
My Dockerfile looks like this:
FROM adoptopenjdk/openjdk8:jdk8u192-b12-alpine
ARG SDC_UID=20159
ARG SDC_GID=20159
ARG SDC_USER=sdc
RUN apk add --update --no-cache bash \
busybox-suid \
sudo && \
echo 'hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4' >> /etc/nsswitch.conf
RUN addgroup --system ${SDC_USER} && \
adduser --system --disabled-password -u ${SDC_UID} -G ${SDC_USER} ${SDC_USER}
ADD --chown=sdc:sdc crontab.txt /etc/crontabs/sdc/
RUN chgrp sdc /etc/cron.d /etc/crontabs /usr/bin/crontab
# Also tried to run like this but not working
# RUN /usr/bin/crontab -u sdc /etc/crontabs/sdc/crontab.txt
USER ${SDC_USER}
EXPOSE 18631
RUN /usr/bin/crontab /etc/crontabs/sdc/crontab.txt
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["dc", "-exec"]

In dask_jobqueue, how to add extra lines to job_script?

I'm configuring dask to run on an HPC cluster. I setup a client as follows:
First modify ~/.config/dask/*.yaml then run some code like this:
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster()
cluster.scale(100) # Start 100 workers in 100 jobs
from distributed import Client
client = Client(cluster)
print(cluster.job_script())
Here's what the resultant job_script looks like:
#!/bin/bash
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH -t 00:30:00
JOB_ID=${SLURM_JOB_ID%;*}
/path/to/python3 -m distributed.cli.dask_worker tcp://192.168.*.*:* --nthreads 1 --memory-limit 1000.00MB --name dask-worker--${JOB_ID}-- --death-timeout 60 --local-directory /scratch
So the script launches python3 immediately, but I need it to do some things first, such as activating a conda environment or a Python virtual env, before launching Python. How can I add some pre-commands to the job_script?
I got it by reading the source code of dask_jobqueue/core.py, which was thankfully very easy.
In ~/.config/dask/jobqueue.yaml, edit env-extra. Each string in the list is added to the script as a command. For example, with
env-extra: ['cd foo', 'mkdir bar', 'cd bar', 'conda foo']
The job_script comes out like this:
#!/bin/bash
#!/usr/bin/env bash
#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH -t 00:30:00
JOB_ID=${SLURM_JOB_ID%;*}
cd foo
mkdir bar
cd bar
conda foo
/path/to/python3 -m distributed.cli.dask_worker tcp://192.168.*.*:* --nthreads 1 --memory-limit 1000.00MB --name dask-worker--${JOB_ID}-- --death-timeout 60 --local-directory /scratch
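For reference, the corresponding fragment of ~/.config/dask/jobqueue.yaml might look like this (a sketch: the conda environment name is a placeholder, and newer dask_jobqueue releases rename this key to job-script-prologue, so check your installed version):

```yaml
# ~/.config/dask/jobqueue.yaml (fragment)
jobqueue:
  slurm:
    env-extra:
      - 'source activate myenv'   # hypothetical conda env name
```

Each list entry is emitted verbatim into the job script, after the #SBATCH headers and before the dask-worker launch line.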
