Using GNU Parallel on a Slurm cluster

I am looking for a command to parallelize the following command with GNU Parallel:
OpenSees 1.tcl
OpenSees is an executable: OpenSees.exe on Windows, OpenSees on Linux. It is a seismic analysis tool, and 1.tcl is an input file for it. I want to run a parameter study in parallel.
Please bear in mind that the input files run from 1.tcl to 360.tcl, and I would like to control the number of processors (that is, how many executions run side by side). There are parallel MPI builds of OpenSees, but it is the sequential version I am asking about.

This is the Slurm sh script I used, but it only worked for one machine; I could not add more than one machine, so I built more shell scripts covering the numbers above 28 up to 360. These are the relevant parts of the script.
#!/bin/bash
#SBATCH -n 28 # total number of cores
#SBATCH -N 1 # machine number
parallel --bar ./OpenSees {}.tcl ::: {1..28}
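One way to extend this to more than one machine is to ask Slurm for the allocated hostnames and hand them to GNU Parallel as ssh logins. A sketch only (the node file name is invented, and it assumes passwordless ssh between compute nodes plus a shared filesystem; some sites prefer `srun` as the launcher):

```shell
#!/bin/bash
#SBATCH -N 2                  # two machines
#SBATCH --ntasks-per-node=28  # 28 cores on each

# List the allocated nodes so GNU Parallel can ssh jobs to all of them.
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile.txt

# -j 28 runs 28 simultaneous jobs per node; with 2 nodes that is 56 at a time.
parallel -j 28 --sshloginfile nodefile.txt --wd "$PWD" --bar \
    ./OpenSees {}.tcl ::: {1..360}
```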

Related

PBS array job parallelization

I am trying to submit a job on a high-compute cluster that needs to run a Python code, let's say, 10000 times. I used GNU Parallel, but then the IT team sent me a mail stating that my job was creating too many ssh login logs in their monitoring system. They asked me to use job arrays instead. My code takes about 12 seconds to run. I believe I need to use the #PBS -J statement in my PBS script, but then I am not sure it will run in parallel. I need to execute my code on, say, 10 nodes with 16 cores each, i.e. 160 instances of my code running in parallel. How can I parallelize it, i.e. run many instances of my code at a given time, utilizing all the resources I have?
Below is the initial pbs script with gnu parallel:
#!/bin/bash
#PBS -P My_project
#PBS -N my_job
#PBS -l select=10:ncpus=16:mem=4GB
#PBS -l walltime=01:30:00
module load anaconda
module load parallel
cd $PBS_O_WORKDIR
JOBSPERNODE=16
parallel --joblog jobs.log --wd $PBS_O_WORKDIR -j $JOBSPERNODE --sshloginfile $PBS_NODEFILE --env PATH "python $PBS_O_WORKDIR/xyz.py" :::: inputs.txt
inputs.txt is a file with integer values 0-9999, one per line, which are fed to my Python code as an argument. The code is highly independent and the output of one instance does not affect another.
A little late, but I thought I'd answer anyway.
Arrays will run in parallel, but the number of jobs running at once will depend on the availability of nodes and the limit of jobs per user per queue. Essentially, each HPC will be slightly different.
Adding #PBS -J 1-10000 will create an array of 10000 jobs. Assuming the syntax is the same as on the HPC I use, something like ID=$(sed -n "${PBS_ARRAY_INDEX}p" /path/to/inputs.txt) will then pick up the integers from inputs.txt, whereby PBS array index 123 will return the 123rd line of inputs.txt.
Alternatively, since you're on an HPC, if the jobs only take 12 seconds each and you have 10000 iterations, then even a plain sequential for loop would complete the entire process in about 33.33 hours.
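The sed lookup above can be sanity-checked locally. A minimal sketch (the inputs.txt here is a stand-in built on the spot) simulating what array index 123 would see:

```shell
# Build a stand-in inputs.txt with the integers 0-9999, one per line,
# then extract the line a given PBS_ARRAY_INDEX would select.
seq 0 9999 > inputs.txt
PBS_ARRAY_INDEX=123
ID=$(sed -n "${PBS_ARRAY_INDEX}p" inputs.txt)
echo "$ID"   # line 123 holds 122, since the file starts at 0
```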

Use all cores to make OpenCV 3 [duplicate]

Quick question: what is the compiler flag to allow g++ to spawn multiple instances of itself in order to compile large projects quicker (for example 4 source files at a time for a multi-core CPU)?
You can do this with make - with gnu make it is the -j flag (this will also help on a uniprocessor machine).
For example if you want 4 parallel jobs from make:
make -j 4
You can also run gcc in a pipe with
gcc -pipe
This will pipeline the compile stages, which will also help keep the cores busy.
If you have additional machines available too, you might check out distcc, which will farm compiles out to those as well.
There is no such flag, and having one runs against the Unix philosophy of having each tool perform just one function and perform it well. Spawning compiler processes is conceptually the job of the build system. What you are probably looking for is the -j (jobs) flag to GNU make, a la
make -j4
Or you can use pmake or similar parallel make systems.
People have mentioned make but bjam also supports a similar concept. Using bjam -jx instructs bjam to build up to x concurrent commands.
We use the same build scripts on Windows and Linux and using this option halves our build times on both platforms. Nice.
If using make, issue it with -j. From man make:
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously.
If there is more than one -j option, the last one is effective.
If the -j option is given without an argument, make will not limit the
number of jobs that can run simultaneously.
And most notably, if you want to script or detect the number of cores you have available (depending on your environment, and if you run in many environments, this can change a lot), you may use the ubiquitous Python function cpu_count():
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count
Like this:
make -j $(python3 -c 'import multiprocessing as mp; print(int(mp.cpu_count() * 1.5))')
If you're asking why 1.5 I'll quote user artless-noise in a comment above:
The 1.5 number is because of the noted I/O bound problem. It is a rule of thumb. About 1/3 of the jobs will be waiting for I/O, so the remaining jobs will be using the available cores. A number greater than the cores is better and you could even go as high as 2x.
make will do this for you. Investigate the -j and -l switches in the man page. I don't think g++ is parallelizable.
distcc can also be used to distribute compiles not only on the current machine, but also on other machines in a farm that have distcc installed.
I'm not sure about g++, but if you're using GNU Make then "make -j N" (where N is the number of threads make can create) will allow make to run multiple g++ jobs at the same time (so long as the files do not depend on each other).
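If pulling in Python just to count cores feels heavy, coreutils ships nproc, and the 1.5x rule of thumb works in plain shell arithmetic. A sketch (integer math rounds down):

```shell
# nproc reports the number of processing units available to this process.
CORES=$(nproc)
JOBS=$(( CORES * 3 / 2 ))   # integer equivalent of CORES * 1.5
echo "make -j $JOBS"
```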
GNU parallel
I was making a synthetic compilation benchmark and couldn't be bothered to write a Makefile, so I used:
sudo apt-get install parallel
ls | grep -E '\.c$' | parallel -t --will-cite "gcc -c -o '{.}.o' '{}'"
Explanation:
{.} takes the input argument and removes its extension
-t prints out the commands being run to give us an idea of progress
--will-cite removes the request to cite the software if you publish results using it...
parallel is so convenient that I could even do a timestamp check myself:
ls | grep -E '\.c$' | parallel -t --will-cite "\
  if ! [ -f '{.}.o' ] || [ '{}' -nt '{.}.o' ]; then
    gcc -c -o '{.}.o' '{}'
  fi
"
xargs -P can also run jobs in parallel, but it is a bit less convenient to do the extension manipulation or run multiple commands with it: Calling multiple commands through xargs
Parallel linking was asked at: Can gcc use multiple cores when linking?
TODO: I think I read somewhere that compilation can be reduced to matrix multiplication, so maybe it is also possible to speed up single file compilation for large files. But I can't find a reference now.
Tested in Ubuntu 18.10.

Decrease bazel memory usage

I'm using bazel on a computer with 4 GB RAM (to compile the tensorflow project). Bazel, however, does not take into account the amount of memory I have and spawns too many jobs, causing my machine to swap and leading to a longer build time.
I already tried setting the ram_utilization_factor flag through the following lines in my ~/.bazelrc
build --ram_utilization_factor 30
test --ram_utilization_factor 30
but that did not help. How are these factors to be understood anyway? Should I just randomly try out some others?
Some other flags that might help:
--host_jvm_args can be used to set how much memory the JVM should use by setting -Xms and/or -Xmx, e.g., bazel --host_jvm_args=-Xmx4g --host_jvm_args=-Xms512m build //foo:bar (docs).
--local_resources in conjunction with the --ram_utilization_factor flag (docs).
--jobs=10 (or some other low number, it defaults to 200), e.g. bazel build --jobs=2 //foo:bar (docs).
Note that --host_jvm_args is a startup option so it goes before the command (build) and --jobs is a "normal" build option so it goes after the command.
For me, the --jobs argument from @kristina's answer worked:
bazel build --jobs=1 tensorflow:libtensorflow_all.so
Note: --jobs=1 must follow, not precede build, otherwise bazel will not recognize it. If you were to type bazel --jobs=1 build tensorflow:libtensorflow_all.so, you would get this error message:
Unknown Bazel startup option: '--jobs=1'.
Just wanted to second @sashoalm's comment that the --jobs=1 flag was what made bazel build finally work.
For reference, I'm running bazel on Lubuntu 17.04, running as a VirtualBox guest with about 1.5 GB RAM and two cores of an Intel i3 (I'm running a Thinkpad T460). I was following the O'Reilly tutorial on TensorFlow (https://www.oreilly.com/learning/dive-into-tensorflow-with-linux), and ran into trouble at the following step:
$ bazel build tensorflow/examples/label_image:label_image
Changing this to bazel build --jobs=1 tensorflow/... did the trick.
I ran into quite a bit of instability where bazel build failed in my k8s cluster.
Besides --jobs=1, try this:
https://docs.bazel.build/versions/master/command-line-reference.html#flag--local_resources
E.g. --local_resources=4096,2.0,1.0
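Putting the flags from the answers above together, a hypothetical ~/.bazelrc for a 4 GB machine might look like this (the numbers are guesses to tune; in this older syntax --local_resources takes RAM in MB, CPU cores, and available I/O):

```
# Hypothetical ~/.bazelrc for a low-memory machine; tune the numbers.
startup --host_jvm_args=-Xmx2g        # cap the Bazel server JVM
build --jobs=2                        # few concurrent actions
build --local_resources=2048,2.0,1.0  # advertise only 2 GB RAM to the scheduler
```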

To use lsf bsub command without all the verbosity output

My problem is this: I have a bash script that does something and then calls 800 bsub jobs like this:
pids=""
rm -f ~/.count-*
for i in `ls _some_files_`; do
  of=~/.count-${i}
  bsub -I "grep _something_ $i > $of" &
  pids="${!} ${pids}"
done
wait ${pids}
Then the script processes the output files $of and echoes the results.
The trouble is that I got a lot of lines like:
Job <7536> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on hostA>>
It's actually the 3 lines above, 800 times. Is there a way of suppressing these LSF lines?
I've tried in the loop above:
bsub -I "grep _something_ $i > $of" &> /dev/null
It does remove the LSF verbosity, but instead of submitting almost all 800 jobs at once and taking less than 4 minutes to run, it submits just a few jobs at a time and I have to wait more than an hour for the script to finish.
AFAIK, LSF's bsub doesn't seem to have an option to suppress all this verbosity. What can I do here?
You can suppress this output by setting the environment variable BSUB_QUIET to any value (including empty) before your bsub. So, before your loop say you could add:
export BSUB_QUIET=
Then if you want to return it back to normal you can clear the variable with:
unset BSUB_QUIET
Hope that helps you out.
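To see the effect end to end, here is a runnable sketch of the question's loop with BSUB_QUIET set. Since LSF is rarely installed locally, bsub is stubbed with a shell function here purely for illustration (the stub just runs the quoted command); the input files are toys:

```shell
# Illustration only: stub bsub when LSF is absent, so the sketch can run.
command -v bsub >/dev/null 2>&1 || bsub() { eval "${@: -1}"; }

export BSUB_QUIET=   # set (even empty) BEFORE submitting to silence the chatter
pids=""
for i in 1 2 3; do
  echo "_something_ $i" > in$i
  bsub -I "grep -c _something_ in$i > count-$i" &
  pids="$! $pids"
done
wait $pids
unset BSUB_QUIET     # restore normal LSF output afterwards
cat count-1 count-2 count-3
```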
Have you considered using job dependencies and post-process the logfiles?
1) Run each "child" job (removing the -I) and send its output to a separate output file. Each job should be submitted with a job name (see -J); the job names could form an array.
2) Your final job would be dependent on the children finishing (see -w).
Besides running concurrently across the cluster, another advantage of this approach is that your overall process is not susceptible to IO issues.

Inheriting environment variables with GNU Parallel

I would like to inherit environment variables in GNU Parallel. I have several 'scripts' (really just lists of commands, designed for use with GNU Parallel) with hundreds of lines each that all call different external programs. However, these external programs (out of my control) require that several environment variables be set before they will even run.
Setting/exporting them locally doesn't seem to help, and I don't see any way to add this information to a profile.
The documentation doesn't seem to cover this, and similar SO pages suggest wrapping the command in a script. However, this seems like an inelegant solution. Is there a way to export the current environment, or perhaps specify the required variables in a script?
Thanks!
This works for me:
FOO="My brother's 12\" records"
export FOO
parallel echo 'FOO is "$FOO" Process id $$ Argument' ::: 1 2 3
To make it work for remote connections (through ssh) you need to quote the variable for shell expansion. parallel --shellquote can help you do that:
parallel -S server export FOO=$(parallel --shellquote ::: "$FOO")\;echo 'FOO is "$FOO" Process id $$ Argument' ::: 1 2 3
If that does not solve your issue, please consider showing an example that does not work.
-- Edit --
Look at --env, introduced in version 20121022.
-- Edit --
Look at env_parallel, introduced in version 20160322.
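A sketch of --env in action (guarded, since GNU parallel may not be installed where you run this; --env mostly matters for remote ssh workers, and is only exercised locally here; the variable name is made up):

```shell
export MYVAR="hello from the parent shell"
if command -v parallel >/dev/null 2>&1; then
  # --env copies the named variable into each job's environment.
  out=$(parallel --will-cite --env MYVAR 'echo "$MYVAR" job {}' ::: 1 2)
else
  out="parallel not installed; each job would print: $MYVAR job N"
fi
echo "$out"
```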
