How do I increase timeout for a cronjob/crontab?

I have written a script that gets data from Solr for dates that fall within a specified period, and I run it as a daily cron job.
The problem is that the cron job does not complete the task. If I run the script manually (for the same time period), it works well. If I reduce the specified time period, the script also runs fine from cron. So my guess is that the cron job is timing out when there is too much data to process.
How do I increase the timeout for the cron job?
PS: The script I am running in the cron job is a bash script which in turn runs a Python script.

Note that the ulimit -t solution suggested will limit the amount of CPU time used, not the amount of actual time that has passed.
From the bash manpage:
ulimit [-HSTabcdefilmnpqrstuvx [limit]]
...
-t The maximum amount of cpu time in seconds
And more importantly, cron doesn't impose any timeouts in the first place. It simply kicks off whatever process and moves on.
BTW: Sorry for posting this as an answer, but I don't have enough points to add comments yet.

You could try to use ulimit -t [number of seconds] in the cronjob before running the script.
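A sketch of what either approach could look like in a crontab (the schedule and script path are placeholders I'm using for illustration):
# set the per-process CPU-time limit (in seconds) before the script runs
0 2 * * * ulimit -t 7200; /path/to/my_script.sh
# or put a wall-clock bound on it instead with GNU coreutils' timeout
0 2 * * * timeout 2h /path/to/my_script.sh
Keep in mind the distinction above: ulimit -t caps CPU time, while timeout caps elapsed time.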

Related

Limit the docker containers executed in a loop

I am very new to Docker and have created a Dockerfile to build an image that executes Protractor tests.
That Dockerfile has an entry point that expects a parameter with the name of the suite I want to execute.
It all runs very well when I provide suite names on the command line.
I have about 30 test suites, so I use another .sh file which filters out the suite names and, in a for loop, runs the docker command with each suite name.
Now I do not want to run all 30 suites simultaneously; I want to set a limit of, say, 6 at a time and keep the others waiting until one finishes.
I execute like this:
for (( i=0; i<${tests}; i++ )); do
    docker run -dit containername $testSuiteName
done
So how can I limit the maximum number of executions at a time?
There are going to be a number of ways of tackling this problem. Here is one possible solution.
You can treat this as a shell scripting problem, rather than a Docker problem. For example, consider the following, which instead of docker run ... just uses sleep:
#!/bin/bash
# bash (not plain sh) is needed here for (( )), RANDOM and wait -n
let count=5
let tests=20
for (( i=0; i<tests; i++ )); do
    sleep $((RANDOM % 10)) &     # stand-in for the real per-test work
    echo "started $!"
    let count--
    if (( count <= 0 )); then
        wait -n                  # block until one background job exits
        let count++
    fi
done
echo "waiting for remaining jobs"
wait
echo "all done"
This starts $count processes in parallel, and then waits for one to exit. When a process exits, it immediately starts a new one. Once it has started all the jobs, it simply waits for everything to finish.
Using this model, you would drop the -d from your docker run command line, since you need the shell to track the background processes. Instead of sleep ... &, you would run:
docker run containername $testSuiteName > /dev/null 2>&1 &
Note that I've dropped the -i and -t options here, since it looks like you're running these tests non-interactively.
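Putting that together with the loop from the question, a sketch could look like this (it reuses the question's $tests and $testSuiteName placeholders and sets the concurrency limit of 6 you asked for):
#!/bin/bash
count=6                                    # at most 6 containers at a time
for (( i=0; i<${tests}; i++ )); do
    # $testSuiteName is assumed to be set per iteration, as in your .sh file
    docker run containername "$testSuiteName" > /dev/null 2>&1 &
    let count--
    if (( count <= 0 )); then
        wait -n                            # block until one container exits
        let count++
    fi
done
wait                                       # wait for the remaining containers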
A few more possible solutions:
If you run your tests in Jenkins, you can create a job that runs one test suite, set the maximum number of executors to 6, and start as many test jobs as you want. Jenkins won't run more than 6 jobs at a time.
This is, I think, the ideal and most correct approach, yet the most difficult one: use an orchestrator such as Kubernetes, which actually controls all your Docker containers. Unfortunately, I don't have a step-by-step guide for how to achieve this, but it is really the most professional way to tackle your problem.

Stop JMeter script after a certain time in jenkins build

While running a JMeter build in Jenkins, it runs in infinite mode ("never stops") even though the duration is configured in the JMX file.
How do I stop the JMeter build after a certain time?
I have tried providing command-line arguments, thinking it isn't honoring the JMX configuration:
PATH/jmeter -Jjmeter.save.saveservice.output_format=xml -Jduration=60 -n -t Main.jmx -l Mainreport.csv
I expected the JMeter build to stop after 60 seconds, but it never ends; I monitored it for 30 minutes.
You need to read the value with ${__P(duration,)} in the JMeter Scheduler's Duration field.
You can give __P a default value in its second argument; for example, for a default of 30:
${__P(duration,30)}
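Putting it together with the command line from the question (the file names are the question's own):
# with the Scheduler's Duration field set to ${__P(duration,30)}, this run
# should stop after 60 seconds; dropping -Jduration falls back to 30
PATH/jmeter -n -t Main.jmx -l Mainreport.csv -Jduration=60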

Java process killed due to out of memory after running job in container for an hour

I have set up a job for running automation tests in CircleCI (https://hub.docker.com/r/jiteshsojitra/docker-headless-vnc-container). It works fine, but after running tests for an hour it reaches the memory limit and suddenly kills the running Java/Ant job. So is there any way to increase the container memory so tests can run for 5-6 hours in the container, or is that a paid feature?
I tried putting JAVA_OPTS: -Xms512m -Xmx1024m in the YAML script, but the overall container memory usage still reaches ~4 GB.
References:
https://circleci.com/gh/jiteshsojitra/zm-selenium/231
https://circleci.com/api/v1.1/project/github/jiteshsojitra/zm-selenium/231/output/106/0?file=true
Log trail:
BUILD FAILED
/headless/zm-selenium/build.xml:348: Java returned: 137
Total time: 76 minutes 26 seconds
Exited with code 137
Hint: Exit code 137 typically means the process is killed because it was running out of memory
Hint: Check if you can optimize the memory usage in your app
Hint: Max memory usage of this container is 4286337024
according to /sys/fs/cgroup/memory/memory.max_usage_in_bytes
We had this problem. It is a limit in CircleCI (or the VM, really). The only solution is to make your app use less memory.
fiskeben is right. I think the container you mentioned is a fork of our consol/docker-headless-vnc-container image, so you can add the following lines to the startup script:
# set correct java startup
export _JAVA_OPTIONS="-Duser.home=$HOME -Xmx${JVM_HEAP_XMX}m"
# add docker JVM flags; these can maybe be removed with JDK 9
export _JAVA_OPTIONS="$_JAVA_OPTIONS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
Now you can set the environment variable JVM_HEAP_XMX to the number of megabytes your JVM should use, e.g.
docker run -e JVM_HEAP_XMX=512 ...
If you want to determine the size dynamically, take a look at the jvm_options.sh script.
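Not the actual jvm_options.sh, but a minimal sketch of that dynamic approach, assuming a cgroup v1 container like the one in the log trail above: read the container's memory limit and size the heap from it.
# sketch only: derive the heap size from the cgroup v1 memory limit
limit_bytes=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
heap_mb=$(( limit_bytes / 1024 / 1024 * 3 / 4 ))   # leave ~25% for non-heap use
export _JAVA_OPTIONS="-Duser.home=$HOME -Xmx${heap_mb}m"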

watching memory in PBS

I'm running a job on a cluster (using PBS) that runs out of memory. I'm trying to print the memory status of each node separately while my job is running. I created a shell script and call it from inside my job submission script, but when I submit the job I get a 'permission denied' error on the line that calls the script, and I don't understand why.
Secondly, I was thinking I could put a 'watch free' or 'watch ps aux' in my script file, but now I'm worried that my submitted job would get stuck in the memory-watching script and never reach the line that calls my parallel program.
In short, how can I log memory usage in PBS for the jobs I'm submitting? My code is a C++ program using the MRMPI (MPI MapReduce) library.
To see how much memory is being used throughout the job, run qstat -f:
$ qstat -f | grep used
resources_used.cput = 00:02:51
resources_used.energy_used = 0
resources_used.mem = 6960kb
resources_used.vmem = 56428kb
resources_used.walltime = 00:01:26
To examine past jobs you can look in the accounting file. This is located in the server_priv/accounting directory; the default is /var/spool/torque/server_priv/accounting/.
The entries look like this:
09/14/2015 10:52:11;E;202.napali;user=dbeer group=company jobname=intense.sh queue=batch ctime=1442248534 qtime=1442248534 etime=1442248534 start=1442248536 owner=dbeer#napali exec_host=napali/0-2 Resource_List.neednodes=1:ppn=3 Resource_List.nodect=1 Resource_List.nodes=1:ppn=3 session=20415 total_execution_slots=3 unique_node_count=1 end=0 Exit_status=0 resources_used.cput=1989 resources_used.energy_used=0 resources_used.mem=9660kb resources_used.vmem=58500kb resources_used.walltime=995
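For example, to pull the resources_used fields for a particular job out of a day's accounting file (a sketch; the job id and the date-named file come from the entry above):
grep "202.napali" /var/spool/torque/server_priv/accounting/20150914 | \
    tr ' ' '\n' | grep "resources_used"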
NOTE: if your ssh access to computing nodes of the cluster is closed, this method won't work!
This is how I ended up doing it. It might not be the best way, but it works:
In summary, I added some short sleep periods between my map and reduce steps by calling the C++ sleep() function, and I wrote a script that ssh's to the nodes my job is running on and records the memory status of those nodes to a file (using the 'free' or 'top' commands).
In more detail: in my PBS job script, somewhere before the call to my binary, I added this line:
#this goes in job script, before the call to the job binary:
cat $PBS_NODEFILE > /some/path/nodelist.log
This writes a list of the nodes that my job runs on into a file.
I have a second script "watchmem.sh":
#!/bin/bash
for i in $(seq 60); do
    while read line; do
        ssh $line 'bash -s' < /some/path/remote.sh "$line"
    done < /some/path/nodelist.log
    sleep 10
done
This script reads the nodelist.log file we generated before, ssh's into each node, and calls a third (and last) script, remote.sh, on each of those nodes.
remote.sh contains the commands that we run on every node of our job. In this case it prints the current time and the output of 'free' into a separate file for each node:
#remote.sh
echo "Current time : $(date)" >> $1
free >> $1   # 'free' can be replaced by 'top -b -n 1' if you want per-process detail
Comparing the timestamps in these files with the times I print from my binary lets me find the memory consumption (alloc/dealloc) at each step.
The sleep periods in my job are there to make sure my scripts capture the memory status between steps. The 'sleep 10' in the watch script avoids unnecessary writes to the file; this period should be comparable to the sleep duration in the main job.
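Putting the pieces together, the submission script might look roughly like this (a sketch: the #PBS resource line, paths and binary name are placeholders, and the chmod is only a guess at the cause of the 'permission denied' error):
#!/bin/bash
#PBS -l nodes=2:ppn=8
cat $PBS_NODEFILE > /some/path/nodelist.log   # list of nodes used by this job
chmod +x /some/path/watchmem.sh               # likely fix for 'permission denied'
/some/path/watchmem.sh &                      # monitor memory in the background
mpirun ./my_mrmpi_binary                      # the actual MRMPI program
kill %1 2>/dev/null                           # stop the monitor when the job is done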

To use lsf bsub command without all the verbosity output

My problem is this: I have a bash script that does something and then submits 800 bsub jobs like this:
pids=
rm -f ~/.count-*
for i in `ls _some_files_`; do
    of=~/.count-${i}
    bsub -I "grep _something_ $i > $of" &
    pids="${!} ${pids}"
done
wait ${pids}
Then the script processes the output files $of and echoes the results.
The trouble is that I get a lot of lines like:
Job <7536> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on hostA>>
It's actually the 3 lines above repeated 800 times. Is there a way of suppressing these LSF lines?
I've tried in the loop above:
bsub -I "grep _something_ $i > $of" &> /dev/null
It does remove the LSF verbosity, but instead of submitting almost all 800 jobs at once and taking less than 4 minutes to run, it submits just a few jobs at a time and I have to wait more than an hour for the script to finish.
AFAIK LSF's bsub doesn't seem to have an option to suppress all this verbosity. What can I do here?
You can suppress this output by setting the environment variable BSUB_QUIET to any value (including empty) before your bsub. So, before your loop, you could add:
export BSUB_QUIET=
Then if you want to return it to normal, you can clear the variable with:
unset BSUB_QUIET
Hope that helps you out.
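Applied to the loop from the question, that would look something like this (a sketch reusing the question's placeholders):
export BSUB_QUIET=             # suppress the submission/dispatch messages
pids=
rm -f ~/.count-*
for i in `ls _some_files_`; do
    of=~/.count-${i}
    bsub -I "grep _something_ $i > $of" &
    pids="${!} ${pids}"
done
wait ${pids}
unset BSUB_QUIET               # restore normal bsub output afterwards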
Have you considered using job dependencies and post-processing the logfiles?
1) Run each "child" job (removing the "-I") and send its output to a separate output file. Each job should be submitted with a job name (see -J); the job names could form an array.
2) Your final job would be dependent on the children finishing (see -w).
Besides running concurrently across the cluster, another advantage of this approach is that your overall process is not susceptible to IO issues.
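A rough sketch of that approach (the job names, the wildcard dependency and the post-processing script are illustrative, and you should check that your LSF version accepts wildcard job names in -w expressions):
for i in `ls _some_files_`; do
    # non-interactive child jobs, each with its own name and output file
    bsub -J "count_${i}" "grep _something_ $i > ~/.count-${i}"
done
# the final job starts only after every count_* job has finished
bsub -w 'done("count_*")' ./process_counts.sh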
