GNU Parallel with multiple commands and multiple configurations

I start several gnu parallel jobs from a bash file like this:
parallel -a jobs_A.sh --workdir workDir_A_Path --results logDir_A_Path --joblog logDir_A_Path
parallel -a jobs_B.sh --workdir workDir_B_Path --results logDir_B_Path --joblog logDir_B_Path
I can concatenate jobs_A.sh and jobs_B.sh into one file.
What I want is one single parallel call that submits all the jobs to the workers.
However, how can I tell parallel which workdir, results and joblog folder to use for each job?

You cannot do that, because neither --results nor --joblog is computed per job.
You can get the workdir, though:
parallel --xapply --workdir {1} --results logDir_Path --joblog logDir_common_Path {2} \
:::: <(perl -ne 'print "workDir_A_Path\n"' jobs_A.sh; perl -ne 'print "workDir_B_Path\n"' jobs_B.sh;) \
:::: <(cat jobs_A.sh jobs_B.sh)
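To see what the two linked input sources feed into {1} and {2}, the pairing can be previewed with paste before GNU Parallel is involved. This is only a sketch: the job files and their contents are made up, and paste merely mimics the line-by-line linking.

```shell
#!/bin/sh
# Made-up job files, created in a temp dir purely for illustration.
tmp=$(mktemp -d)
printf 'echo a1\necho a2\n' > "$tmp/jobs_A.sh"
printf 'echo b1\n' > "$tmp/jobs_B.sh"

# First source: one workdir name per job line, in job order.
{
  sed 's/.*/workDir_A_Path/' "$tmp/jobs_A.sh"
  sed 's/.*/workDir_B_Path/' "$tmp/jobs_B.sh"
} > "$tmp/workdirs"

# Second source: the concatenated job lines.
cat "$tmp/jobs_A.sh" "$tmp/jobs_B.sh" > "$tmp/jobs"

# Each output line is one (workdir, command) pair, tab-separated.
pairs=$(paste "$tmp/workdirs" "$tmp/jobs")
printf '%s\n' "$pairs"
rm -rf "$tmp"
```

If the two columns line up as expected here, the same two sources should pair up the same way when passed to parallel.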


How to take substring from input file as an argument to a program to be executed in GNU-parallel?

I am trying to execute a program (say, biotool) using GNU Parallel. It takes 3 arguments, i, o and a:
- the input file (i)
- the output file name to write to (o)
- an argument that takes a substring of the input file name (a)
For example, say I have 10 text files like this:
1_a_test.txt
2_b_test.txt
3_c_test.txt
...
10_j_test.txt
I want to run my tool (say biotool) on all the 10 text files. I tried this
parallel biotool -i {} -o {.}.out -a {} ::: *.txt
I want to pass whatever comes before the first underscore in the input file name as the argument to the -a option, like this (dry run):
biotool -i 1_a_test.txt -o 1_a_test.out -a 1
biotool -i 2_b_test.txt -o 2_b_test.out -a 2
biotool -i 3_c_test.txt -o 3_c_test.out -a 3
...
{} supplies the complete file name to -a, but I only want the substring before the first underscore to be supplied to -a.
The shortest option, though harder to read, is a Perl expression replacement string:
parallel --dry-run biotool -i {} -o {.}.out -a '{= s/_.*// =}' ::: *test.txt
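The {= ... =} replacement string runs a Perl expression on each argument. The same regular expression can be tried with sed to check what it keeps. This only approximates what GNU Parallel does with the expression; the file names are the ones from the question.

```shell
#!/bin/sh
# s/_.*// deletes everything from the first underscore to the end of the name.
prefixes=$(printf '%s\n' 1_a_test.txt 2_b_test.txt 10_j_test.txt | sed 's/_.*//')
printf '%s\n' "$prefixes"
```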
Alternatively, you can write a bash function that uses bash parameter substitution to extract the part before the underscore, then export it so that GNU Parallel can see it:
#!/bin/bash
doit(){
i=$1
o=$2
# Use internal bash parameter substitution to extract whatever precedes "_"
# See https://www.tldp.org/LDP/abs/html/parameter-substitution.html
a=${i/_*/}
echo biotool -i "$i" -o "$o" -a "$a"
}
export -f doit
parallel doit {} {.}.out ::: *test.txt
Sample Output
biotool -i 10_j_test.txt -o 10_j_test.out -a 10
biotool -i 1_a_test.txt -o 1_a_test.out -a 1
biotool -i 2_b_test.txt -o 2_b_test.out -a 2
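A variant of the extraction step, sketched on the assumption that only the text before the first underscore matters: ${1%%_*} strips the longest suffix matching _* and also works in plain sh. prefix_of is a hypothetical helper name, not part of the answer above.

```shell
#!/bin/sh
# prefix_of NAME: print the part of NAME before its first underscore.
# A name without an underscore is printed unchanged.
prefix_of() {
  printf '%s\n' "${1%%_*}"
}

prefix_of 10_j_test.txt
prefix_of 1_a_test.txt
```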

GNU parallel arguments

From the example
seq 1 100 | parallel -I ## \
> 'mkdir top-##;seq 1 100 | parallel -X mkdir top-##/sub-{}'
How do -X, ## and {} work? Also, what happens when 1 or . is placed inside {}? Is \ > used for redirection here?
I have been going through the tutorial at https://www.youtube.com/watch?v=P40akGWJ_gY&list=PL284C9FF2488BC6D1&index=2 and reading the parallel man page. I have picked up some basics, but not how to actually use these options.
Let's do the easy stuff first.
The backslash (\) is just telling the shell that the following line is a continuation of the current one, and the greater than sign (>) is the shell prompting for the continuation line. It is no different from typing:
echo \
hi
where you will actually see this:
echo \
> hi
hi
So, I am saying you can ignore \> and just run the command on a single line.
Next, the things in {}. These are described in the GNU Parallel manual page, but essentially:
{1} refers to the first parameter
{2} refers to the second parameter, and so on
Test this with the following where the column separator is set to a space but we use the parameters in the reverse order:
echo A B | parallel --colsep ' ' echo {2} {1}
B A
{.} refers to a parameter, normally a filename, with its extension removed
Test this with:
echo fred.dat | parallel echo {.}
fred
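For comparison, plain shell parameter expansion can produce the same result as {.}. A rough equivalent, assuming the extension is whatever follows the last dot (strip_ext is a hypothetical name):

```shell
#!/bin/sh
# strip_ext NAME: remove the shortest suffix that starts with a dot,
# i.e. only the last extension.
strip_ext() {
  printf '%s\n' "${1%.*}"
}

strip_ext fred.dat
strip_ext archive.tar.gz
```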
Now let's come to the actual question, with the continuation line removed as described above and with everything on a single line:
seq 1 100 | parallel -I ## 'mkdir top-##;seq 1 100 | parallel -X mkdir top-##/sub-{}'
So, this is essentially running:
seq 1 100 | parallel -I ## 'ANOTHER COMMAND'
Ole has used ## in place of {} in this command so that the substitutions used in the second, inner, parallel command don't get confused with each other. So, where you see ## you just need to replace it with the values from first seq 1 100.
The second parallel command is much the same as the first one, but here Ole has used -X. If you watch the video you link to, you will see that he shows how it works earlier on. It passes as many parameters as possible to each command, up to the command-line length limit (the system's ARG_MAX). So, if you want 10,000 directories created, instead of this:
seq 1 10000 | parallel mkdir {}
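which will start 10,000 separate processes, each one running mkdir, you start only a handful of mkdir processes, each with many parameters (the arguments are spread across the job slots; with -j1 everything goes to a single mkdir, as long as it fits on one command line):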
seq 1 10000 | parallel -X mkdir
That avoids the need to create 10,000 separate processes and speeds things up.
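The effect of packing arguments can be seen with xargs, which batches arguments in a similar way. This is an analogy rather than GNU Parallel itself, and echo stands in front of mkdir so that nothing is actually created:

```shell
#!/bin/sh
# One process per argument: five separate command lines.
one_each=$(printf '%s\n' 1 2 3 4 5 | xargs -n 1 echo mkdir)

# As many arguments as fit: a single command line.
packed=$(printf '%s\n' 1 2 3 4 5 | xargs echo mkdir)

printf '%s\n' "$one_each"
printf '%s\n' "$packed"
```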
Let's now look at the outer parallel invocation and do a dry run to see what it would do, without actually doing anything:
seq 1 100 | parallel -k --dry-run -I ## 'mkdir top-##;seq 1 100 | parallel -X mkdir top-##/sub-{}'
Output
mkdir top-1;seq 1 100 | parallel -X mkdir top-1/sub-{}
mkdir top-2;seq 1 100 | parallel -X mkdir top-2/sub-{}
mkdir top-3;seq 1 100 | parallel -X mkdir top-3/sub-{}
mkdir top-4;seq 1 100 | parallel -X mkdir top-4/sub-{}
mkdir top-5;seq 1 100 | parallel -X mkdir top-5/sub-{}
mkdir top-6;seq 1 100 | parallel -X mkdir top-6/sub-{}
mkdir top-7;seq 1 100 | parallel -X mkdir top-7/sub-{}
mkdir top-8;seq 1 100 | parallel -X mkdir top-8/sub-{}
...
...
mkdir top-99;seq 1 100 | parallel -X mkdir top-99/sub-{}
mkdir top-100;seq 1 100 | parallel -X mkdir top-100/sub-{}
So, now you can see it is going to start 100 jobs, each of which makes one top-level directory and then runs an inner parallel that creates the 100 subdirectories inside it, packing many directory names into each mkdir call.
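Ignoring the parallelism, the resulting directory tree is the same one a nested loop would create. A scaled-down sketch with 3 x 3 directories in a temporary location (the sizes and the temp dir are choices made for this illustration):

```shell
#!/bin/sh
root=$(mktemp -d)
for i in 1 2 3; do
  mkdir "$root/top-$i"              # the job of the outer parallel
  for j in 1 2 3; do
    mkdir "$root/top-$i/sub-$j"     # the job of the inner parallel
  done
done

# Count the sub-directories by expanding the glob into positional parameters.
set -- "$root"/top-*/sub-*
subdirs=$#
echo "$subdirs"
rm -rf "$root"
```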

Groovy shell script with a sed command in a Jenkins Pipeline

So writing Groovy with basic shell scripts seems to be much more difficult than it really should be.
I have a pipeline that needs to replace an entry in a file after running a packer command. It seems sensible to do this in the same shell script as the packer command as the variables are not available outside of the shell script even when exported.
The problem is that the sed command needs escape upon escape and still doesn't work. So this is what the Jenkins Pipeline Syntax generator suggested:
parallel (
"build my-application" : {
sh '''#!/bin/bash
export PATH=$PATH:~/bin
cd ${WORKSPACE}/platform/packer
packer build -machine-readable template.json | tee packer.out
AMI_APP=$(grep amazon-ebs,artifact,0,id,eu-west-2:ami- packer.out | awk -F: \'{ print $NF }\')
[[ ! ${AMI_APP} ]] && exit 1
sed -i.bak \'s!aws_ami_app = \\".*\\"!aws_ami_app = \\"\'"${AMI_APP}"\'\\"!\' ${WORKSPACE}/platform/terraform/env-${ENV}/env.auto.tfvars
'''
},
"build some-more-apps" : {
sh ''' *** same again different name ***
'''
}
)
What is the correct way to get a variable in a sed command working in a bash script running in Groovy?
Any tips for the correct syntax going forward with Jenkins, groovy and bash - any documentation that actually helps?
EDIT
The original sed command that is running in a Jenkins Job shell is:
sed -i.bak 's!aws_ami_app = \".*\"!aws_ami_app = \"'"${AMI_APP}"'\"!' ${WORKSPACE}/platform/terraform/env-${ENV}/env.auto.tfvars
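You put the shell script inside ''', and a triple-single-quoted Groovy string does not trigger Groovy string interpolation.
So there is no need to escape quotes or dollar signs: write the script as you would type it in a shell window. The one exception is the backslash, which Groovy still treats as an escape character, so a literal backslash has to be written as \\.
Below is an example: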
sh '''#!/bin/bash +x
echo "aws_ami_app = docker.xy.com/xy-ap123/conn:7et45u.1.23" > test.txt
echo "cpu = 512" >> test.txt
cat test.txt
AMI_APP=docker.xy.com/xy-ap123/conn:7et45u.1.25
sed -i 's,aws_ami_app.*,aws_ami_app = '"$AMI_APP"',' test.txt
cat test.txt
'''
Output in jenkins console:
[Pipeline] sh
[poc] Running shell script
aws_ami_app = docker.xy.com/xy-ap123/conn:7et45u.1.23
cpu = 512
aws_ami_app = docker.xy.com/xy-ap123/conn:7et45u.1.25
cpu = 512
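The quoting pattern from the question's sed command (close the single quotes, insert the variable in double quotes, reopen the single quotes) can be checked in an ordinary shell before it goes into the pipeline. A minimal sketch; the file contents and the AMI id are made up:

```shell
#!/bin/sh
dir=$(mktemp -d)
f="$dir/env.auto.tfvars"
printf 'aws_ami_app = "ami-00000000"\nregion = "eu-west-2"\n' > "$f"

AMI_APP=ami-12345678   # made-up value standing in for the packer result

# Inside single quotes a double quote needs no backslash; the variable is
# spliced in between two single-quoted pieces.
sed -i.bak 's!aws_ami_app = ".*"!aws_ami_app = "'"${AMI_APP}"'"!' "$f"

result=$(cat "$f")
printf '%s\n' "$result"
rm -rf "$dir"
```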

jenkins pipeline: multiline shell commands with pipe

I am trying to create a Jenkins pipeline where I need to execute multiple shell commands and use the result of one command in the next. I found that wrapping the commands in a pair of triple single quotes ''' makes this possible. However, I am facing issues when using a pipe to feed the output of one command to another. For example:
stage('Test') {
sh '''
echo "Executing Tests"
URL=`curl -s "http://localhost:4040/api/tunnels/command_line" | jq -r '.public_url'`
echo $URL
RESULT=`curl -sPOST "https://api.ghostinspector.com/v1/suites/[redacted]/execute/?apiKey=[redacted]&startUrl=$URL" | jq -r '.code'`
echo $RESULT
'''
}
Commands with pipe are not working properly. Here is the jenkins console output:
+ echo Executing Tests
Executing Tests
+ curl -s http://localhost:4040/api/tunnels/command_line
+ jq -r .public_url
+ URL=null
+ echo null
null
+ curl -sPOST https://api.ghostinspector.com/v1/suites/[redacted]/execute/?apiKey=[redacted]&startUrl=null
I tried entering all these commands in the jenkins snippet generator for pipeline and it gave the following output:
sh ''' echo "Executing Tests"
URL=`curl -s "http://localhost:4040/api/tunnels/command_line" | jq -r \'.public_url\'`
echo $URL
RESULT=`curl -sPOST "https://api.ghostinspector.com/v1/suites/[redacted]/execute/?apiKey=[redacted]&startUrl=$URL" | jq -r \'.code\'`
echo $RESULT
'''
Notice the escaped single quotes in the commands jq -r \'.public_url\' and jq -r \'.code\'. Using the code this way solved the problem.
UPDATE: After a while even that started to give problems. Certain commands are executed before these ones: one of them is grunt serve and the other is ./ngrok http 9000. I added some delay after each of them, and that solved the problem for now.
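A fixed delay works only while the services happen to start within it. One alternative, sketched here as an assumption rather than something the original poster used, is to poll until a check succeeds. retry is a hypothetical helper, and the choice of check command is up to you.

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds, trying at most MAX times
# and sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Demo with a stand-in check that only succeeds on its third call.
calls=0
flaky() { calls=$((calls + 1)); [ "$calls" -ge 3 ]; }
retry 5 0 flaky && echo "ready after $calls attempts"
```

In the pipeline, the check could be a curl -sf call against the tunnel API URL from the question; whether that is a sufficient readiness signal for grunt and ngrok is an assumption.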
The following scenario shows a real case where multiline shell commands are needed. Say you are using a plugin like Publish Over SSH and need to execute a set of commands on the destination host in a single SSH session:
stage ('Prepare destination host') {
sh '''
ssh -t -t user@host 'bash -s << 'ENDSSH'
if [[ -d "/path/to/some/directory/" ]];
then
rm -f /path/to/some/directory/*.jar
else
sudo mkdir -p /path/to/some/directory/
sudo chmod -R 755 /path/to/some/directory/
sudo chown -R user:user /path/to/some/directory/
fi
ENDSSH'
'''
}
Special notes:
- The last ENDSSH' must not have any characters before it, so it has to sit at the very start of a new line.
- Use ssh -t -t if you have sudo within the remote shell command.
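Whether the here-document delimiter is quoted decides which shell expands variables in the body, which matters once the remote commands use variables. The sketch below uses a local sh -s in place of ssh so that it runs anywhere; that substitution is an assumption made for illustration.

```shell
#!/bin/sh
NAME=local

# Quoted delimiter: the body is passed through untouched, so the inner
# shell sees $NAME and expands it itself.
quoted=$(sh -s <<'EOF'
NAME=inner
echo "$NAME"
EOF
)

# Unquoted delimiter: the outer shell expands $NAME before the inner
# shell ever runs.
unquoted=$(sh -s <<EOF
NAME=inner
echo "$NAME"
EOF
)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```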
I split the commands with &&
node {
FOO = 'world'
stage('Preparation') { // for display purposes
sh "ls -a && pwd && echo ${FOO}"
}
}
The example outputs:
- ls -a (the files in your workspace)
- pwd (the workspace location)
- echo world
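One thing to keep in mind with && is that the chain stops at the first command that fails. A small sketch, using false as a stand-in for a failing step:

```shell
#!/bin/sh
# The second command fails, so the third never runs and the chain's
# exit status is that of the failing command.
out=$(echo first && false && echo third; echo "status=$?")
printf '%s\n' "$out"
```

In a Jenkins sh step this means a failing command in the middle of the chain fails the whole step.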

Why is capistrano interpreting a flag passed with a command to `run` as input?

I'm trying to do this:
run "echo -n 'foo' > bar.txt"
and the contents of bar.txt end up being:
-n foo \n
(With \n representing an actual newline)
I use run for other commands like rm -rf and, to my knowledge, it works fine.
I just found this in man echo:
Some shells may provide a builtin echo command which is similar or identical to this utility. Most notably, the builtin echo in sh(1) does not accept the -n option. Consult the builtin(1) manual page.
My version of bash has an echo builtin but seems to be respecting the -n flag. It looks like the shell on your deployment machine doesn't, in which case using the full path to the echo binary might do what you want here:
run "/bin/echo -n 'foo' > bar.txt"
It appears as though the -n flag isn't being interpreted as a flag by the shell. If, from the command line, one executes echo -Y hi, the output will be -Y hi.
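Another way around the differences between echo implementations is printf, which is commonly recommended when the output must not end in a newline. A minimal sketch that checks the byte count of the result (the temp file is only for this illustration):

```shell
#!/bin/sh
f=$(mktemp)
# printf writes exactly the given text, with no trailing newline,
# regardless of which shell's echo is in use.
printf '%s' 'foo' > "$f"
bytes=$(wc -c < "$f" | tr -d ' ')
echo "$bytes"
rm -f "$f"
```

Passed to run, the command string would be printf '%s' 'foo' > bar.txt; this has not been tested under Capistrano here.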
