I'm executing the following command, which runs a group of scripts, each of which is a curl download.
parallel --resume-failed --joblog logshd.log {1} ::: SH/*.sh
The set of files downloaded is quite large. I've noticed some files don't download.
I hoped that the resume-failed parameter would ensure that all the downloads that fail resume and complete.
I'm not clear on whether that means I need to run the process a second time, or whether that should happen during the single run.
From the GNU documentation:
Where --resume-failed reads the commands from the command line (and
ignores the commands in the joblog), --retry-failed ignores the
command line and reruns the commands mentioned in the joblog.
I'm not clear on what "ignores the command line" or "ignores the commands in the joblog" means. Could that be clarified?
Can --resume-failed and --retry-failed be given in the same command, and if so, what is the effect?
Regards
Conteh
If we assume the downloads fail intermittently, then your answer is --retries 10. It will run a failing command up to 10 times before giving up.
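For example, with the scripts and joblog from the question, that would look something like:
parallel --retries 10 --joblog logshd.log {1} ::: SH/*.sh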
--resume-failed and --retry-failed are both used when GNU Parallel has finished, and you then figure out that you want to retry some of the jobs again.
The difference between the two is in how to retry the command.
--retry-failed will run exactly the same command as failed before. It does that by looking in the joblog for the command. This is typically what you want.
--resume-failed is used if you figure out that the failing command actually needed to be changed somehow: i.e. GNU Parallel should not run exactly the same command, but should instead run a (typically slightly changed) command with the same arguments.
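A quick sketch of the two, reusing the joblog from the question (the bash -x in the second example is just an illustrative change to the command):
# rerun exactly the failed commands recorded in the joblog; the command line here is ignored:
parallel --retry-failed --joblog logshd.log {1} ::: SH/*.sh
# rerun the failed arguments, but through the (changed) command template given on the command line:
parallel --resume-failed --joblog logshd.log bash -x {1} ::: SH/*.sh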
I received the following error message, which means something is failing inside a bash script executed by the Dockerfile.
As an example, if something inside test.sh errors:
RUN test.sh
#16 ERROR: executor failed running [/bin/sh -c test.sh]: exit code: 127
Question
What is the recommended way to gain visibility into the exact error message (i.e. to find out what's gone wrong) and to diagnose which line(s) of a bash script executed from a Dockerfile are problematic? Can Docker be made to provide the output of the bash script so the exact error message is shown, rather than just the somewhat cryptic:
executor failed running exit code: 127
as seen here.
What I know so far
One way to diagnose which line(s) are playing up is to survey the script, assess which line(s) might be causing problems, and comment out the suspect line and everything after it. If the error goes away, you've found the (first) problem line, and it can be addressed. Rinse and repeat until the script is error-free. But this is more manual than one would hope.
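A rough sketch of that approach (the step names below are placeholders, not taken from the question):
#!/bin/sh
# test.sh -- placeholder steps; comment out the suspect line and everything after it, then rebuild
step_one
# step_two       <- suspect line, commented out for this rebuild
# step_three     <- everything after it commented out as well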
I'm trying to get a Jenkins job to run sfdx force:data:soql:query commands in order to migrate configuration data sets between our production org and our sandboxes after a refresh. Certain configurations do not persist on a refresh so we need a way to move that data.
Running the queries from the command line on the Jenkins server works as expected; however, when the job runs it fails with the following error:
'C:\Program' is not recognized as an internal or external command, operable program or batch file.
Build step 'Execute shell' marked build as failure
The job does three things:
Authorizes to the DevHub, lists out the connected orgs, and then performs a SOQL query to just print some data - 16 lines to be exact. Here are the commands in the shell script of the job:
sfdx force:auth:jwt:grant -i ${CONNECTED_APP_CONSUMER_KEY} -u ${DEV_HUB} -f ${JENKINS_HOME}/certs/prod/server.key -r [...] -a DevHub
sfdx force:org:list
sfdx force:data:soql:query -u ${DEV_HUB} -q "SELECT Id, Name FROM [...tablename...]" -r human
I am completely stumped on why this is happening. Again, running the SOQL command directly on the server through PowerShell or Command Line works as expected. I would appreciate any help with this.
This one stumped me for a long time but we finally got it figured out.
If you are seeing this error, make sure to check your machine's environment variables. I saw a TON of other answers pointing to this as the issue, where the SFDX install path had spaces in it, as in C:\Program Files\SFDX\bin, but they only showed some weird command-line FOR loop that made no sense whatsoever.
What we did was completely uninstall SFDX, making sure none of it was left on the machine, and reinstall it into a folder we made where there were no spaces in the path name.
Once we did that, our job worked like it was supposed to. I hope this helps others who run into this same issue.
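If you want to confirm the same root cause before reinstalling, one quick check (my addition, assuming the job's 'Execute shell' step really runs through sh) is to print which sfdx the job resolves:
command -v sfdx    # a path containing spaces, e.g. .../Program Files/..., points to this problem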
I currently have a build of an application that is set to run infinitely. It is designed to run on a Raspberry Pi as a service, so it will continuously be running.
Whenever I try to test it on Travis-CI, the infinite loop portion causes an error even though the project builds correctly, because it runs forever. Is there any way to stop this error, or do I have to remove the ability to run the build from the .travis.yml?
language: cpp
compiler:
- clang
- g++
script:
- make
- cd main
- ./jsonWeatherPrediction
I would expect it to error; I'm just not sure of a way to stop it without removing - ./jsonWeatherPrediction
I don't know if this will help, but the build is located at https://travis-ci.org/DMoore12/json-weather-prediction
Thanks in advance :)
In almost any reasonable CI workflow, the job should have a well-defined start and finish. The software you are testing may run forever, but your tests should not. So, first, I suggest rethinking how you run your build.
Looking at a build such as https://travis-ci.org/DMoore12/json-weather-prediction/jobs/474719832, I see that you are simply running your command (which raises a different question: the command prints the same output forever in a tight loop. Is this the desired behavior?).
For testing, you need a different kind of behavior, one that can be tested (e.g., take input from STDIN or a command-line flag, print, and terminate).
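As a sketch only, the script section could then terminate on its own; the --single-run flag below is hypothetical and would have to be implemented in the program:
script:
- make
- cd main
# --single-run is a made-up flag: print one forecast, then exit
- ./jsonWeatherPrediction --single-run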
I am unable to run repo non-interactively inside a container as part of a freestyle job.
It prompts for the user-name and email. I got round that by doing a git config --global inside the job.
But then it does the color test, and that hangs indefinitely.
Looking at the source code for repo I see this
if os.isatty(0) and os.isatty(1) and not self.manifest.IsMirror:
    if opt.config_name or self._ShouldConfigureUser():
        self._ConfigureUser()
    self._ConfigureColor()
So, I ran the following inside the container:
python -c "import os; print os.isatty(0), os.isatty(1)"
and, sure enough, it printed out True True
Looking at the Jenkins log, it launches the container with --tty specified, and there seems to be no way to configure that option.
I can't find a bash option to force a script to be run in a non-interactive shell. If I put the above python line in a file and execute it with almost any combination of commands and options, it still prints out True True
The only way I see something different is if I use I/O redirection
bash <a.sh
which prints out False True - i.e. stdin is not a tty, and
bash <a.sh >a.log
which prints False False.
For a complex script, are there any problems using the bash <script approach?
Does anyone know any jenkins magic to prevent docker being launched using --tty?
I know that the --tty is the culprit. I built the container locally and ran the following
$ docker run repotest python -c "import os;print os.isatty(0), os.isatty(1)"
False False
$ docker run --tty repotest python -c "import os;print os.isatty(0), os.isatty(1)"
True True
Running Versions:
repo: 1.12.37 (per Ubuntu 16.04 apt-get)
Jenkins: 2.149
Cloudbees Docker Plugin: 1.7.3
Container base is ubuntu:xenial
I'm using the "Build inside a docker container" option.
To run the bash script repo_script.sh "non-interactively", or more exactly, without terminals associated with the standard streams, you could run your script simply as
repo_script.sh < /dev/null 2>&1 | cat
assuming you want to see the output the way you would see it when running simply repo_script.sh. By piping the standard output and error to a different process, the file descriptors appear to repo_script.sh as pipes and not TTYs. You could also direct output to a file, or even to /dev/null if you do not care about the output:
log_file=/dev/null
repo_script.sh < /dev/null > "${log_file}" 2>&1
Running the script as
bash < repo_script.sh | cat
might work too, though it is a very unorthodox and, to my mind, hackish way of running a script just to break the association of a TTY with the standard input. From the script engine's point of view, reading a script from a file is different from reading it from standard input (which typically, if it is a terminal, is not seekable), so there might be some subtle differences that could bite you in unexpected ways. This way also does not clearly communicate your intention to the next person who needs to understand your code, and may lead to partial hair loss in that person due to extraneous head scratching.
There is no need for any bash options; just using the output redirections from within the interpreting shell, as described above, is an easy-to-comprehend, multi-platform-compatible, standard convention for changing the standard stream associations.
P.S. I think it should be enough for your repo script to just test whether the standard input is a TTY. It looks to me like the author of that script did not think deeply enough there. There is simply no use waiting for input if there is no terminal device associated with standard input; from that alone, the script could determine that everything needs to run without user interaction, or stop with an error if that is not possible.
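In shell terms, the check that P.S. describes boils down to something like this (a sketch of mine, not code from repo):
if [ -t 0 ]; then
    # stdin is a terminal, so prompting the user is possible
    read -r -p "Enable color output? (y/n) " reply
else
    # no terminal on stdin: fall back to a default, no prompt, no hang
    reply=n
fi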
I'm running a job on a cluster (using PBS) that runs out of memory. I'm trying to print the memory status for each node separately while my other job is running. I created a shell script and included a call to that script from inside my job submission script. But when I submit my job, it gives me a permission denied error on the line that calls the script. I don't understand why I get that error.
Secondly, I was thinking that I could have a 'watch free' or 'watch ps aux' in my script file, but now I'm wondering whether that will cause my submitted job to get stuck in the memory-watching script and never continue to the main line that calls my parallel program.
In short, how can I log memory usage in PBS for the jobs I'm submitting? My code is a C++ program using the MRMPI (MPI MapReduce) library.
To see how much memory is being used throughout the job, run qstat -f:
$ qstat -f | grep used
resources_used.cput = 00:02:51
resources_used.energy_used = 0
resources_used.mem = 6960kb
resources_used.vmem = 56428kb
resources_used.walltime = 00:01:26
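If you want this sampled over time while the job runs, a small loop on the submit host could log it periodically. A sketch (202.napali below stands in for the real job id, and the 30-second interval is arbitrary):
while qstat -f 202.napali > /dev/null 2>&1
do
    qstat -f 202.napali | grep resources_used >> mem_202.log
    sleep 30
done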
To examine past jobs you can look in the accounting file. This is located in the server_priv/accounting directory, the default is /var/spool/torque/server_priv/accounting/.
The entries look like this:
09/14/2015 10:52:11;E;202.napali;user=dbeer group=company jobname=intense.sh queue=batch ctime=1442248534 qtime=1442248534 etime=1442248534 start=1442248536 owner=dbeer#napali exec_host=napali/0-2 Resource_List.neednodes=1:ppn=3 Resource_List.nodect=1 Resource_List.nodes=1:ppn=3 session=20415 total_execution_slots=3 unique_node_count=1 end=0 Exit_status=0 resources_used.cput=1989 resources_used.energy_used=0 resources_used.mem=9660kb resources_used.vmem=58500kb resources_used.walltime=995
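For example, to pull just the memory figures out of that day's file (the date-named file 20150914 is an assumption based on the usual one-file-per-day layout):
grep -o 'resources_used.mem=[^ ]*' /var/spool/torque/server_priv/accounting/20150914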
NOTE: if your ssh access to computing nodes of the cluster is closed, this method won't work!
This is how I ended up doing this. It might not be the best way but it works:
In summary, I added some short sleep periods between my map and reduce steps by calling the C++ sleep() function. I also wrote a script that ssh's to the nodes my job is running on and then gets the memory status on those nodes, writing it to a file (using the 'free' or 'top' commands).
In more detail: in my PBS job script, somewhere before the call to my binary, I added this line:
#this goes in job script, before the call to the job binary:
cat $PBS_NODEFILE > /some/path/nodelist.log
This writes a list of the nodes that my job runs on, into a file.
I have a second script "watchmem.sh":
#!/bin/bash
for i in $(seq 60)
do
    while read line
    do
        ssh "$line" 'bash -s' < /some/path/remote.sh "$line"
    done < /some/path/nodelist.log
    sleep 10
done
This script reads the file nodelist.log that we generated before, performs an ssh into each node, and calls a third (and last) script, remote.sh, on each of those nodes.
remote.sh contains the commands that we run on every node of our job. In this case it prints the current time and the result of 'free' into separate files for each node:
#remote.sh
echo "Current time : $(date)" >> "$1"
free >> "$1"   # 'free' can be replaced by 'top -b -n 1' for a one-shot batch-mode snapshot
Comparing the times from these files with the times I'm printing from my binary lets me find out the memory consumption (alloc/dealloc) in each step.
The sleep periods in my job are to make sure my scripts capture the memory status between steps. The 'sleep 10' in my script is to avoid unnecessary writes to the file; this period should be comparable to the sleep duration in the main job.