GNU parallel not spawning new jobs

I've been using GNU parallel (20120422) to run ~4000 commands (inside the commands.list file) like:
cat commands.list | parallel --keep-order --max-procs 40
It all started OK, with 40 jobs running, but after a while only 5 jobs were running (some jobs take far longer than others). I waited for a couple of hours, and at some point one of those 5 jobs finished and all of a sudden more jobs were spawned.
Is this behaviour intended? I would expect it to always try to keep ~40 jobs running, no? Is there some buffer limit for the --keep-order option that prevents more jobs from being launched?
thanks,

That is not the intended behaviour. Please submit a bug report as per: http://www.gnu.org/software/parallel/man.html#REPORTING-BUGS
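To make the expected behaviour concrete, here is a minimal Python sketch of the scheduling the asker had in mind: results are handed back in input order, yet a free slot is refilled as soon as any job finishes. It illustrates the semantics only and is not how GNU parallel is implemented; run_in_order, durations and max_procs are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_in_order(durations, max_procs):
    """Run one sleep job per duration; return job indexes in input order."""

    def job(index, seconds):
        time.sleep(seconds)
        return index

    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        # Everything is submitted up front, so the pool starts the next
        # job the moment any worker becomes free.
        futures = [pool.submit(job, i, s) for i, s in enumerate(durations)]
        # Reading the futures in submission order holds back later results
        # until earlier ones are ready, without blocking the workers.
        return [f.result() for f in futures]
```

With this shape, a single slow job at the front delays the output of the jobs behind it, but it does not stop them from running.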

Related

Jenkins executor busy - job with loading bar but no link or id - ghost job

After a Jenkins restart we found a few nodes with a busy executor. The job that occupies the executor has a striped white-and-blue loading bar and does not link to any specific build (in fact, no build is ongoing for that job). So we have no ID and no UI way to abort it.
(Screenshot: how the job looks on the Jenkins node)
Now, I wanted to find a way to kill it without really looking into the cause of the issue. Maybe it's related to "Jenkins pipeline job won't finish in the UI", but in our case there is no underlying finished job.
We tried to kill it by:
Restarting node
Killing any jenkins/agent threads on node - it just caused node to disconnect
Locating it somehow via ui
None of the above worked; the ghost job was still there. Any clues on how to kill such a job, or at least point to it without an ID?
Edit: I found a similar thread, "How to stop an unstoppable zombie job on Jenkins without restarting the server?", with plenty of answers, though the solutions there didn't work for me.
OK, so I've found a way to free these executors via a Groovy script executed in the Jenkins Script Console.
The way I managed to kill it was to get the node by name as a Computer, iterate through its executors (or rather, take the only one ;) ) and call interrupt() on it.
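import jenkins.model.Jenkins

// Look up the node by name; its Computer object owns the executors
def computer = Jenkins.instance.getNode("myNode").toComputer()
println "Busy executors list"
println computer
// This node has a single executor, so index 0 is the stuck one
def executor = computer.getExecutors().get(0)
println executor
executor.interrupt()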
The important part is of course the interrupt() call; the prints above it are just for information.

The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h

My Dataflow batch job has not ended after 5 hours; it is still cancelling.
I'm running this type of job from a scheduler every 10 minutes.
Normally it finishes within 10 minutes,
but this time it took over 5 hours!
My job is
2018-08-26_13_30_17-1172470278423820020
Error log is here
Stackdriver
2018-08-27 (06:33:14) Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been se...
2018-08-27 (08:34:08) Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been se...
2018-08-27 (10:34:58) Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been se...
Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. You can get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
Generally, "The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h" can be caused by a worker setup phase that takes too long. Increasing the worker resources (via the --machine_type parameter) can overcome the issue.
In my case I was installing several dependencies that required building wheels (pystan, fbprophet), and that took more than an hour on the minimal machine (n1-standard-1 with 1 vCPU and 3.75 GB RAM). Using a more powerful instance (n1-standard-4, which has 4 times the resources) solved my problem.
This is possibly not the cause if only a single job is stuck, but it may help someone else getting the same error.
Such a situation can happen in three major cases:
a) Some task takes more than an hour to process.
b) Some task got stuck processing.
Cases a) and b) are generally caused by transforms that take too long to process or that enter a blocking state.
The best way to debug is to check the earlier logs for errors or unexpected state. It seems that you have tried to re-run the job and it still failed. In this case you can add extra logging to the step that got stuck and see which data it got stuck on.
c) There is a failure on the Apache Beam / Dataflow side.
This is a rare case. Please create a support ticket if you believe this is the issue.
I just recovered from this issue.
As mentioned in other responses, this error is due to a long setup phase. The likely cause is that pip cannot resolve the dependencies in the setup.py file. It does not fail; it tries in vain "forever" to find suitable versions, which leads to a timeout.
Recreate the environment locally using the necessary packages.
Mark the versions that satisfy the dependencies.
Define these versions explicitly in the setup.py.
I don't know why this worked but updating the package dependencies in setup.py got the job to run.
dependencies = [
'apache-beam[gcp] == 2.25.0',
'google-cloud-storage == 1.32.0',
'mysql-connector-python == 8.0.22',
]
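As a small aid for the pinning step, the following Python sketch flags requirement strings that are not pinned to an exact version. It assumes the list is later passed to setuptools as install_requires; the helper name unpinned is hypothetical, and the check is a plain string test rather than a full requirement parser.

```python
def unpinned(requirements):
    """Return the requirement strings that lack an exact '==' pin."""
    loose = []
    for req in requirements:
        _name, sep, version = req.partition("==")
        # Pinned means: '==' is present, a version follows it,
        # and that version contains no wildcard such as '2.*'.
        if not sep or not version.strip() or "*" in version:
            loose.append(req)
    return loose


dependencies = [
    "apache-beam[gcp] == 2.25.0",
    "google-cloud-storage == 1.32.0",
    "mysql-connector-python == 8.0.22",
]

# An empty list means every dependency names one exact version.
print(unpinned(dependencies))
```

Running this check before submitting the job is a cheap way to notice a loose requirement that could send pip into a long resolution.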

Triggering a Jenkins job on a timer from a job run through polling

We currently poll svn and run a job if there are any changes. We then trigger a job if the initial job passes.
Additionally, I'd like to trigger a second job that only runs once a day. So if the initial job (job 1) runs 40 times, job 2 would also run 40 times, but job 3 would only run 1 time. (It can be decoupled as long as job 3 knows exactly what machine the last instance of job 1 ran on)
My initial thought was to use a plugin similar to Node stalker (https://wiki.jenkins-ci.org/display/JENKINS/Node+Stalker+Plugin) to just get the value of the node the previous run was on. The plugin doesn't appear to be working (it runs on whatever node as if the plugin does nothing).
Is there another way of doing this?
I am unaware of another way to do this in a manner similar to Node Stalker; however, two other options come to mind.
The ugly:
If all the machines can access a shared network drive, keep a text file there. When a machine completes jobs 1 and 2 successfully, it updates that file with a unique identifier for that machine. Job 3 then reads the file and knows which machine ran jobs 1 and 2 last.
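The shared-file idea could look roughly like the Python sketch below. It assumes the build step can read the NODE_NAME environment variable that Jenkins sets for builds; the function names and the marker file path are hypothetical.

```python
import os
from pathlib import Path


def record_node(marker_file, node_name=None):
    """Run at the end of job 2: store which machine just ran jobs 1 and 2."""
    # Use the given name, else NODE_NAME from the environment, else a placeholder.
    name = node_name or os.environ.get("NODE_NAME", "unknown")
    path = Path(marker_file)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(name + "\n")
    # Write to a temporary file, then rename it over the marker, so a
    # reader is less likely to see a half-written file.
    os.replace(tmp, path)
    return name


def last_node(marker_file):
    """Run by job 3: read which machine ran jobs 1 and 2 last."""
    return Path(marker_file).read_text().strip()
```

Job 3 would call last_node() on the same path and use the result to decide where to run.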
The less ugly:
This one depends on how long it takes to run jobs 1 and 2 (shorter is better, longer may not be feasible). Run a 4 job chain:
job3 launcher -> job 1 -> job 2 -> job 3.
This way you can track what machine is being used for job 1 and job 2 and pass those along as build variables into job 3.

Jenkins gets very slow after saving some Jobs

I know there are several issues with a topic close to this one, but as far as I searched I did not find a thread/question with the same topic.
So here is the situation:
On our Jenkins server we have many build jobs (maybe a few hundred). Some of them run on slaves, some on the master. Now I was asked to change the settings of some of them (let's say 50) so that they have project-based security, and I also had to change the slave server they run on. They already ran on a slave before, but a different one.
The Problem:
In the beginning everything went fine. I changed the settings of several jobs quickly and started to change the settings of the next job. But after some time the configuration pages began to load slower and slower. At first it took a few seconds (after 10 jobs), then a few more seconds (after 20 jobs), then about one minute (after 30 jobs), and now several minutes (after 40 jobs). I open every settings page in a new tab and close the tab once I have finished my configuration.
My Question:
Why does it take Jenkins so long to open the configuration page? In the beginning there was nearly no loading time, and now, after I have changed a few jobs, I have to wait minutes for it. What could be the reason?
As a first step, check the configuration of the new slave.
The issue can be memory-related, so check the memory usage on the master instance and look at what the Java process is doing; depending on your environment, this can be done with strace -p <PID>.
In most cases the instance is slow because of memory usage.

Jenkins builds all starting late

We've inherited a set of Jenkins builds. They all seem to start 90-100 seconds after the desired time. For example, a build with a schedule of */5 * * * * starts at :01:37, :06:29, :11:43, etc., instead of the :00, :05, :10, etc. that I would expect.
There are a few builds set to run at */5, but they are all delayed and anyway only last a few seconds each.
I see a global 'quiet period' setting of 5.
The system as a whole does not seem busy. There are usually idle executors, and often nothing at all is building.
For most of the builds this isn't a concern, but there are a couple that we would like to make as precise as possible.
Is my expectation wrong? Is there a config option I'm missing? I should add that I am new to Jenkins and may be missing something obvious.
Thanks
We did not find the cause of Jenkins jobs starting late. We hacked up a workaround by having Jenkins start a script on the remote server that sleeps until the desired time. This creates a new problem by tying up a Jenkins executor for several minutes, so we have the remote script spawn a wait task and then return to Jenkins immediately. This creates another problem in that the output of the remote script is lost because when it completes it has no connection to Jenkins anymore. We get around that by having the remote script write its results into a tmp file, and returning the results of the previous run.
So we have a seriously hacked-up solution that actually works fine for our purposes.
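The "sleep until the desired time" part of that workaround can be sketched in Python as follows. This is an illustration under assumptions, not the script we actually used: seconds_until_next_slot is a hypothetical name, and the interval is assumed to divide the hour evenly.

```python
from datetime import datetime, timedelta


def seconds_until_next_slot(now, interval_minutes=5):
    """Seconds from `now` to the next wall-clock multiple of the interval."""
    start_of_hour = now.replace(minute=0, second=0, microsecond=0)
    interval = timedelta(minutes=interval_minutes)
    elapsed = now - start_of_hour
    # Whole intervals already passed this hour, plus one, gives the next slot.
    next_slot = start_of_hour + interval * (elapsed // interval + 1)
    return (next_slot - now).total_seconds()
```

A wrapper on the remote server could call time.sleep(seconds_until_next_slot(datetime.now())) before starting the real work, so a job triggered at :01:37 would begin its work at :05:00.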
We updated Jenkins from 1.492 to 1.565 and the problem went away. Jobs now start within a few seconds of the expected time.
