I am trying to add a new independent pipeline to a running job, and it keeps failing. It also fails quite ungracefully, sitting in a "-" state for a long time before moving to "not started" and then failing with no errors reported.
I suspect that this may be impossible, but I cannot find confirmation anywhere.
Job ID for the latest attempt is 2016-02-02_02_08_52-5813673121185917804 (it's still sitting at "not started").
Update: It should now be possible to add a PubSub source when updating a pipeline.
Original Response:
Thanks for reporting this problem. We have identified an issue when updating a pipeline with additional PubSub sources.
For the time being we suggest either running a new (separate) pipeline with the additional PubSub source, or stopping and starting the job over with the additional sources installed from the beginning.
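For reference, here is a minimal sketch of what such an update might look like with the Beam Python SDK; the project, topics, bucket, and job name are placeholders, and the downstream transforms are omitted.

# Minimal sketch of updating a streaming job in place with an extra PubSub source.
# The project, topics, bucket, and job_name values are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    job_name="my-streaming-job",  # must match the name of the running job
    streaming=True,
    update=True,                  # request an in-place update rather than a new job
)

with beam.Pipeline(options=options) as p:
    original = p | "ReadOriginal" >> beam.io.ReadFromPubSub(
        topic="projects/my-project/topics/original")
    extra = p | "ReadExtra" >> beam.io.ReadFromPubSub(
        topic="projects/my-project/topics/extra")
    merged = (original, extra) | "Merge" >> beam.Flatten()
    # ... downstream transforms unchanged from the running job ...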
Related
My Dataflow batch job did not end within 5 hours and is still canceling.
I'm running this type of job from a scheduler every 10 minutes.
Normally it finishes within 10 minutes,
but this time it has taken over 5 hours!
My job ID is
2018-08-26_13_30_17-1172470278423820020
The error log (from Stackdriver) is here:
2018-08-27 (06:33:14) Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been se...
2018-08-27 (08:34:08) Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been se...
2018-08-27 (10:34:58) Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been se...
Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. You can get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
Generally, "The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h" can be caused by setup taking too long. Just increase worker resources (via the --machine_type parameter) to overcome the issue.
In my case I was installing several dependencies that required building wheels (pystan, fbprophet), and it took more than an hour on the minimal machine (n1-standard-1 with 1 vCPU and 3.75GB RAM). Using a more powerful instance (n1-standard-4, which has four times the resources) solved my problem.
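For illustration, this is roughly how the worker machine type can be set when launching a Beam Python pipeline; the project, region, and bucket values below are placeholders, not part of the original question.

# Rough sketch: launching a Beam Python pipeline on Dataflow with a larger worker.
# The project, region, and bucket values are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    machine_type="n1-standard-4",  # larger worker so dependency setup finishes in time
    setup_file="./setup.py",
)

with beam.Pipeline(options=options) as p:
    p | beam.Create(["hello"]) | beam.Map(print)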
It is possibly not the case if there is a single job stuck, but it may help someone else getting the same error.
Such a situation can happen in three major cases:
a) Some task takes more than an hour to process.
b) Some task got stuck processing.
This is generally caused by transforms that take too long to process or that enter a blocking state.
The best way to debug is to check the previous logs and see whether there were any errors or unexpected states. It seems that you have tried to re-run the job and it still failed. In this case you can add extra logging to the step that got stuck and see which data it got stuck on (see the sketch after this list).
c) There is a failure on the Apache Beam / Dataflow side.
This is a rare case. Please create a support ticket if you believe this is the issue.
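For case b), here is a possible sketch of such extra logging in a Beam Python DoFn; the class name and the expensive_call placeholder are illustrative, not taken from the original job.

# Sketch of extra logging around a slow step, so the worker logs show which
# element was being processed when the step got stuck. The DoFn name and
# expensive_call are illustrative placeholders.
import logging
import apache_beam as beam

def expensive_call(element):
    # Placeholder for the real, potentially slow processing logic.
    return element

class LoggedStuckStep(beam.DoFn):
    def process(self, element):
        logging.info("Starting element: %r", element)
        result = expensive_call(element)
        logging.info("Finished element: %r", element)
        yield result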
I just recovered from this issue.
As mentioned in other responses, this error is due to setup taking too long, most likely because pip cannot resolve the dependencies in the setup.py file. It does not fail; it keeps trying "forever" to find suitable versions, which eventually leads to a timeout.
Recreate the environment locally using the necessary packages.
Mark the versions that satisfy the dependencies.
Define these versions explicitly in the setup.py.
I don't know why this worked, but updating the package dependencies in setup.py got the job to run:
dependencies = [
'apache-beam[gcp] == 2.25.0',
'google-cloud-storage == 1.32.0',
'mysql-connector-python == 8.0.22',
]
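For completeness, here is a minimal setup.py sketch showing where such a pinned list goes; the package name and version below are placeholders.

# Minimal setup.py sketch; the package name and version are placeholders.
import setuptools

dependencies = [
    'apache-beam[gcp] == 2.25.0',
    'google-cloud-storage == 1.32.0',
    'mysql-connector-python == 8.0.22',
]

setuptools.setup(
    name='my-dataflow-job',         # placeholder package name
    version='0.0.1',
    install_requires=dependencies,  # the pinned versions from above
    packages=setuptools.find_packages(),
)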
I have a pipeline running, triggered by several gerrit review hooks.
e.g.
branch-v1.0
branch-v2.0
Normally I receive my verifies according to the result of the appropriate job run; e.g. if the run finished successfully with passed tests, I get Verified+1 back in my Gerrit system.
My problem:
If a job is running to verify my Gerrit change, a newer "verify job" for another change or patch set always cancels the currently running job. It doesn't matter whether the new change comes from a different branch, and it makes no difference whether the new change has anything to do with the current one. The currently running change is always superseded.
In the console:
In this case, job A canceled an older job B, and later A was canceled by a newer job C:
Canceling older #3128
Waiting for builds [3126]
Canceled since #3130 got here
So, does anybody know how to avoid the canceling of the currently running job?
I wanted to use the multibranch pipeline (though I really do not know if this would help), but as far as I know the Gerrit plugin is currently not supported by the multibranch pipeline or the Blue Ocean project.
https://issues.jenkins-ci.org/browse/JENKINS-38046
There is a new Gerrit plugin in development, but there is no information on when it will be available (or 'production ready'). See the following comment in the issue:
lucamilanesio added a comment - 2017-08-18 15:40
Thanks for your support!
Recently, in our enterprise production setup, it seems someone tried to set up a new job / test definition by copying an identical job. However, they seem to have NOT saved it (and probably, I am guessing here, closed the browser and lost the session).
But the new job got saved even though it was not set to stable or active; we knew about this because changes uploaded to Gerrit started failing in this newly created, partial job (because these changes were in certain repos that matched certain TDD settings).
Question: The Jenkins system does not keep a trace of who set up the job in the 'configure versions' option. Is there any way to find out who set up the job and when that was done?
No, Jenkins does not store that information by default.
If your Jenkins instance happens to be running behind an Apache or Nginx web server, there might be access logs that can help you. To find out when the job was created, you could look at when its config.xml file was created or modified.
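As a rough illustration, something like this can read that timestamp; the JENKINS_HOME path and job name below are assumptions about your setup.

# Rough sketch: reading when a job's config.xml was last modified.
# The JENKINS_HOME path and job name are assumptions about your setup.
import datetime
import pathlib

config = pathlib.Path("/var/lib/jenkins/jobs/the-copied-job/config.xml")
mtime = datetime.datetime.fromtimestamp(config.stat().st_mtime)
print("config.xml last modified:", mtime)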
However, there are a few plugins that can add this functionality so that you won't have this problem again:
JobConfigHistory Plugin – Tracks changes in your job configurations and gives the ability to restore old versions.
Audit Trail Plugin – Keeps a log of who performed particular Jenkins operations, such as configuring jobs.
Our pipeline failed, but the graph in the Developers Console still shows it as having been successful. However, the results were not written to BigQuery, and the job log clearly shows it failed.
Shouldn't the graph show it as failed too?
Another example:
This was a bug with the handling of BigQuery outputs that was fixed in a release of the Dataflow service. Thank you for your patience.
We've inherited a set of Jenkins builds. They all seem to start 90-100 seconds after the desired time. For example, a build with a schedule of */5 * * * * starts at :01:37, :06:29, :11:43, etc., instead of the :00, :05, :10, etc. that I would expect.
There are a few builds set to run at */5, but they are all delayed, and in any case they only last a few seconds each.
I see a global 'quiet period' setting of 5.
The system as a whole does not seem busy. There are usually idle executors, and often nothing at all is building.
For most of the builds this isn't a concern, but there are a couple that we would like to make as precise as possible.
Is my expectation wrong? Is there a config option I'm missing? I should add that I am new to Jenkins and may be missing something obvious.
Thanks
We did not find the cause of Jenkins jobs starting late. We hacked up a workaround by having Jenkins start a script on the remote server that sleeps until the desired time. This creates a new problem by tying up a Jenkins executor for several minutes, so we have the remote script spawn a wait task and then return to Jenkins immediately. This creates another problem in that the output of the remote script is lost, because by the time it completes it no longer has a connection to Jenkins. We get around that by having the remote script write its results into a tmp file and return the results of the previous run.
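A rough sketch of that kind of remote script, in Python for illustration; the 5-minute mark, the command being run, and the result-file path are all assumptions, not the original script.

# Rough sketch of the workaround described above; the 5-minute schedule,
# the command, and the result-file path are assumptions for illustration.
import datetime
import subprocess
import sys
import time

RESULT_FILE = "/tmp/precise_job_result.txt"

def sleep_until_next_5_minute_mark():
    now = datetime.datetime.now()
    next_mark = (now + datetime.timedelta(minutes=5 - now.minute % 5)).replace(
        second=0, microsecond=0)
    time.sleep((next_mark - now).total_seconds())

def wait_and_run():
    # Detached child: wait for the mark, run the real task, record its output.
    sleep_until_next_5_minute_mark()
    out = subprocess.run(["/usr/local/bin/the-real-task"],  # placeholder command
                         capture_output=True, text=True)
    with open(RESULT_FILE, "w") as f:
        f.write(out.stdout)

if __name__ == "__main__":
    if "--wait-and-run" in sys.argv:
        wait_and_run()
    else:
        # Called from Jenkins: print last run's results, spawn the waiter, return.
        try:
            print(open(RESULT_FILE).read())
        except FileNotFoundError:
            print("no previous results yet")
        subprocess.Popen([sys.executable, __file__, "--wait-and-run"],
                         start_new_session=True)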
So we have a seriously hacked-up solution that actually works fine for our purposes.
We updated Jenkins from 1.492 to 1.565 and the problem went away. Jobs now start within a few seconds of the expected time.