Apache Beam Dataflow Jobs started failing with: Workflow failed - google-cloud-dataflow

I've been running batch jobs with DataflowRunner for over a week without a problem, but starting today the jobs began failing with the error message below. The workers don't seem to start, and there are no logs in Stackdriver at all.
Anything I'm missing here?
Dataflow SDK version: 2.0.0
Submitted job: 2017-08-29_09_43_20-9537473353894635176
2017-08-29 16:44:24 ERROR MonitoringUtil$LoggingHandler:101 - 2017-08-29T16:44:22.277Z: (54a5da9d57fd266d): Workflow failed.
EDIT:
If I remove --zone=europe-west2-b from the batch run, it works, which suggests that something is wrong with this zone.

I took a look at your job. It failed because it couldn't get quota to bring up the workers; most likely you do not have quota in that zone. This error is not reported back correctly, but that should be fixed in the next release.

Related

Jenkins Builds - PhantomJS was not killed in 2000 ms, sending SIGKILL

We have started seeing intermittent issues on our Jenkins build server when running jasmine unit tests with Karma.
We see the following error:
PhantomJS was not killed in 2000 ms, sending SIGKILL
This usually goes away the next time we run the build, and we may not see the issue again for a couple of days.
We don't see this when running tests locally, so I'm wondering what could be different about our Jenkins environment that could cause this.
Can anyone offer any suggestions please?
Thanks

Concourse CI will not run Hello-World after setting up with the official docker image. It fails showing "no workers"

I recently set up concourse CI using the following docker-compose: https://concourse-ci.org/docker-repository.html and then tried the flight school training here: https://concourse-ci.org/flight-school.html and then, when that failed showing "no workers", I attempted the hello-world here: https://concourse-ci.org/hello-world.html.
I keep seeing an error saying "no workers". If I had to guess, this is because of a simple configuration issue on my end, but I am having trouble tracking it down.
Can someone please help me figure out how to debug this? I do not see errors in the Docker startup logs, and searching for the problem turns up seemingly unrelated errors.
I talked with the team and it appears it was a race condition in v3.1 that I was running. Updating to v3.2 fixed the issues and everything is running smoothly.

Tensorflow not building

I followed the instructions given here to set-up my machine to run SyntaxNet. I have installed all the required software and ensured the versions are the same as the instructions. But when I run the bazel tests using command bazel test --linkopt=-headerpad_max_install_names syntaxnet/... util/utf8/... on my Mac OS, it fails every time. I'm getting the following error message
Sending SIGTERM to previous Bazel server (pid=42104)... Sending SIGKILL to previous Bazel server process group (pid=42104)... Error: SIGKILL unsuccessful after 10s: Operation not permitted
Not sure what's going wrong. Kindly advise.

please wait while jenkins is restarting- waiting long

I updated some plugins and restarted the jenkins but now it says:
Please wait while Jenkins is restarting
Your browser will reload automatically when Jenkins is ready.
It is taking too long (I have been waiting for the last 40 minutes). I have only 1 project with around 20 builds. I have restarted Jenkins many times before and it worked fine, but now it is stuck.
Is there any way to kill/suspend Jenkins to avoid this wait?
I had a very similar issue when using Jenkins' built-in restart function. To fix it I killed the service (with crossed fingers), but somehow it kept serving the "Please wait" page. I guess it is served by a separate thread, but since I could not see any running java or jenkins processes, I rebooted the server to stop it.
After the reboot Jenkins worked, but it was not updated. To fix that, I ran the update again and restarted the Jenkins service manually. It took less than a minute and worked just fine.
Jenkins seems to have a number of bugs related to restarting, and at least one is unresolved: jenkins issue
Windows ONLY....
None of the solutions here worked for me, and restarting the server was not an option. If you are in the same situation, this is what I did:
I had to kill java.exe and restart the Jenkins service. After I did this, Jenkins reloaded several times and then went back to normal.
I was stuck on the Jenkins restarting page for 10-ish minutes until I did this.
Hope this helps.
Running this in the command line helped me:
service jenkins restart
I had a similar issue when updating plugins from the plugin update page with the restart Jenkins option checked. Jenkins only showed the waiting message for a long time.
I solved the issue by restoring the .bak files of the plugins I had tried to update to .jpi files.
I ran the following on my Jenkins server:
cd $JENKINS_HOME/plugins/
>sudo mv git.bak git.jpi
.
. (more plugins files)
.
>sudo mv ldap.bak ldap.jpi
>sudo /sbin/service jenkins restart
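The renames above can also be done in one pass. This is a minimal sketch, assuming every .bak file in the plugins directory is the pre-update copy of a plugin whose update failed; restore_plugin_backups is a hypothetical helper name, not a Jenkins command. It overwrites any existing .jpi of the same name, so stop Jenkins and review the directory first.

```shell
# Hypothetical helper: rename every NAME.bak in a plugins directory to NAME.jpi.
# Assumption: each .bak file is the pre-update copy of a plugin.
restore_plugin_backups() {
  plugins_dir=$1
  for backup in "$plugins_dir"/*.bak; do
    # Skip the unexpanded pattern when no .bak files exist.
    [ -e "$backup" ] || continue
    # -f replaces the .jpi left behind by the failed update.
    mv -f "$backup" "${backup%.bak}.jpi"
    printf 'restored %s\n' "${backup%.bak}.jpi"
  done
}

# Usage (commented out so that sourcing this snippet changes nothing):
# restore_plugin_backups "$JENKINS_HOME/plugins"
```

Depending on file ownership, the call may need to run as the Jenkins user or with sudo, followed by the service restart shown above.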
Check Event Viewer.
I found that my Java died.
Faulting application java.exe, version 7.0.250.17, time stamp 0x51c4b3fd, faulting module ntdll.dll, version 6.0.6002.18541, time stamp 0x4ec3e39f, exception code 0xc0000374, fault offset 0x000abc4f, process id 0x1188, application start time 0x01cee4f42968bc81.
Finally I found that it's Jenkins 1.540 problem. Don't use it.
https://issues.jenkins-ci.org/browse/JENKINS-20630
I faced the same issue after upgrading some plugins on Windows. jenkins.err.log showed this error:
Exception in thread "main" java.io.IOException: Jenkins has failed to create a temporary file in C:\Users\builder\AppData\Local\Temp\
at Main.extractFromJar(Main.java:350)
at Main._main(Main.java:194)
at Main.main(Main.java:91)
Caused by: java.io.IOException: There is not enough space on the disk
at java.io.WinNTFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(Unknown Source)
at Main.extractFromJar(Main.java:347)
... 2 more
The problem was that the TEMP folder of the jenkins user had lots of temporary files. After cleaning that folder jenkins restarted correctly.
I just restarted the server. That fixed the issue!
In Command prompt execute this
C:\>service jenkins restart
Or
You can go to the services currently running on your machine (Win + R), search for Jenkins, and click Restart.
For me, the cause seemed to be having lots of old job build logs hanging around. To clean them up, I ran:
cd $JENKINS_HOME/jobs
find -name 'builds' | xargs -n 1 bash -c 'rm -rf $0/[1-9]*'
Then I stopped and started Jenkins again, and it came up within a minute.
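Because the cleanup above deletes data irreversibly, it can help to list what would be removed first. This is a rough dry-run sketch using the same matching rule as the command above (entries under a builds directory whose names start with a digit from 1 to 9); list_numbered_builds is a hypothetical helper name.

```shell
# Hypothetical helper: print the entries that the rm -rf above would delete,
# without deleting anything.
list_numbered_builds() {
  jobs_dir=$1
  find "$jobs_dir" -type d -name builds | while IFS= read -r builds_dir; do
    for entry in "$builds_dir"/[1-9]*; do
      # Skip the unexpanded pattern when a builds directory has no matches.
      if [ -e "$entry" ]; then
        printf '%s\n' "$entry"
      fi
    done
  done
}

# Usage (commented out so that sourcing this snippet prints nothing):
# list_numbered_builds "$JENKINS_HOME/jobs"
```

Entries such as lastStable are not listed, which matches what the original command leaves in place.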
Credit to: https://stackoverflow.com/a/39230597/2255242
This is an old thread.. but my personal recommendation is to WAIT before attempting to do anything (such as restarting service, etc).
I wasted hours once trying to fix something that turned out to be not an issue in the first place. In the end, I messed things up and wasted a lot of time.
Just because you see errors in the logs doesn't necessarily mean that you need to take action.
The upgrade took about 45 minutes in the end for me. All I did at one point was refresh my browser window. It can take a while.
Just my opinion
On Win 10: Stopping with the service command from the command line reported failure to stop the service, but I was able to stop it from services.msc (running as administrator). The updates were applied. Sorry, no definitive answer from me. YMMV.
I used TCPView and killed the processes that were using port 8080. Basically, they were all java.exe processes from Jenkins. I killed all of them and restarted the Jenkins service.
Try restarting it from the Windows Services console; it will work.
I have observed the same issue after installing a plugin and opting to restart Jenkins when no jobs are running.
When I looked at the jenkins server process, it was running fine and no issues.
On restarting the jenkins service using the below command and reloading the browser, Jenkins was up.
sudo service jenkins restart
If Jenkins is taking an unusually long time to restart the best recourse is to check the generated logs to see what may be wrong. However, even that may be of little help because many plugins try to be "quiet" by default, even if they are furiously working to load content. So if all else fails, you may have to resort to manually disabling plugins.
However, here is a free tip: some plugins are known to be messy. For example, we observed the Job Config History plugin writing hundreds of thousands of records for both job configuration changes AND agent changes. Removing this plugin and deleting the configHistory folder fixed one problem where our startup literally took > 4 hours.
In our case, the problem was we were launching ephemeral agents (via docker and/or kubernetes). Each new "agent" was treated as a configuration change. With thousands of agents per day, it didn't take long to fill up a substantial part of the disk with history that never was effectively cleared.
There are other plugins that leak data in this way. And you can also create self-inflicted wounds, e.g. by using a standalone process to remove "obsolete" files. An example where we were "bitten" is a process that tried to discard old build records, but did an incomplete job - and was "warring" with the running Jenkins process. Jenkins will try breaking its neck to load a build.xml record that is empty or incomplete.
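One way to look for the empty build records mentioned above is a find over the jobs directory. This is only a sketch: find_empty_build_records is a hypothetical helper name, and it detects zero-byte build.xml files only, not records that are truncated or otherwise incomplete.

```shell
# Hypothetical helper: list build.xml files that are exactly zero bytes long.
# -size 0 matches only empty files, because find rounds sizes up to whole blocks.
find_empty_build_records() {
  find "$1" -type f -name build.xml -size 0
}

# Usage (commented out so that sourcing this snippet prints nothing):
# find_empty_build_records "$JENKINS_HOME/jobs"
```

The helper only reports paths; deciding whether to remove the affected build directories is left to the administrator.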
Three more tips:
You can install the monitoring plugin. Often when the jenkins UI proper didn't start, we were able to see the /monitoring in action.
Likewise, /userContent can often be loaded even when the rest of the UI is not fully up.
Don't rule out bad actors. It just takes one aggressive script that tries, e.g. to load the entire build history and ship it back via a REST call to effectively deny service to all other UI users.
I fixed it by editing a file named hudson.model.UpdateCenter.xml, located in /var/lib/jenkins.
I changed the URL to https://mirrors.tuna.tsinghua.edu.cn/jenkins/updates/update-center.json
Finally, I restarted Jenkins. That solved my problem.
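The URL change can be scripted. This is a minimal sketch, assuming the file stores the update site URL on a single line as <url>...</url>; set_update_center_url is a hypothetical helper name, and the new URL must not contain the characters | or &, which sed treats specially here.

```shell
# Hypothetical helper: replace the <url> value in an update center config file.
# Assumption: the URL sits on one line in the form <url>...</url>.
set_update_center_url() {
  config_file=$1
  new_url=$2
  tmp_file=$(mktemp) || return 1
  # Write to a temporary file first, then copy the content back so that the
  # original file keeps its owner and permissions.
  sed "s|<url>.*</url>|<url>${new_url}</url>|" "$config_file" > "$tmp_file" &&
    cat "$tmp_file" > "$config_file"
  rm -f "$tmp_file"
}

# Usage (commented out so that sourcing this snippet changes nothing):
# set_update_center_url /var/lib/jenkins/hudson.model.UpdateCenter.xml \
#   https://mirrors.tuna.tsinghua.edu.cn/jenkins/updates/update-center.json
```

Keeping a copy of the original file before running it makes the change easy to undo.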

jenkins hang up: Please wait while Jenkins is getting ready to work

I was trying to run Jenkins on my server, but I always get the same message; I wait and wait and nothing happens. The official website reports this problem, but I wanted to ask whether anyone knows how to fix it. Any ideas?
Because finding the solution was a little tricky for me, I'll post my own answer. It is working now:
In the end I ran Jenkins by hand on Linux, using the war with the following command rather than the service ("service jenkins start"):
java -jar /usr/lib/jenkins/jenkins.war --httpPort=8081 --ajp13Port=8081 --prefix=/jenkins
