Gerrit replication hangs indefinitely - gerrit

I'm having trouble with the Gerrit replication plugin. I'm trying to replicate a repository to GitLab over HTTPS. The most important configuration:
etc/replication.config
[gerrit]
replicateOnStartup = true
[remote "gitlab-mirror"]
url = https://<name.surname>:<password>@gitlab.domain/<Name.Surname>/${name}.git
push = +refs/heads/*:refs/heads/*
push = +refs/tags/*:refs/tags/*
mirror = true
projects = hello-world
rescheduleDelay = 15
The repository on the GitLab side does exist under: https://gitlab.domain/<Name.Surname>/hello-world
I even cloned the repository from Gerrit, added another remote pointing at GitLab called mirror, and pushed to it without hassle:
git clone ssh://admin@gerrit.domain:29418/hello-world
git remote add mirror https://<name.surname>:<password>@gitlab.domain/<Name.Surname>/hello-world.git
git push -u mirror --all
I'm scheduling replication as follows:
ssh -p 29418 gerrit.domain replication start
which produces the following log:
gerrit | [2020-03-23 22:01:40,019 +0000] 6c533415 [sshd-SshDaemon[33060020](port=22)-nio2-thread-1] admin a/1000000 LOGIN FROM 172.64.1.1
gerrit | [2020-03-23 22:01:40,071 +0000] 6c533415 [SSH replication start (admin)] admin a/1000000 replication.start 7ms 1ms 0
gerrit | [2020-03-23 22:01:40,102 +0000] 6c533415 [sshd-SshDaemon[33060020](port=22)-nio2-thread-2] admin a/1000000 LOGOUT
But then, when the process takes place, I get the following stack trace:
gerrit | [2020-03-23 22:02:04,660] [ReplicateTo-gitlab-mirror-1] ERROR com.googlesource.gerrit.plugins.replication.ReplicationTasksStorage : Error while deleting task d44f53430eda0b204ca13da6aab17c2173531c94
gerrit | java.nio.file.NoSuchFileException: /srv/gerrit/data/replication/ref-updates/running/d44f53430eda0b204ca13da6aab17c2173531c94
gerrit | at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
gerrit | at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
gerrit | at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
gerrit | at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
gerrit | at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
gerrit | at java.nio.file.Files.delete(Files.java:1126)
gerrit | at com.googlesource.gerrit.plugins.replication.ReplicationTasksStorage$Task.finish(ReplicationTasksStorage.java:232)
gerrit | at com.googlesource.gerrit.plugins.replication.ReplicationTasksStorage.finish(ReplicationTasksStorage.java:130)
gerrit | at com.googlesource.gerrit.plugins.replication.Destination.notifyFinished(Destination.java:574)
gerrit | at com.googlesource.gerrit.plugins.replication.PushOne.runPushOperation(PushOne.java:413)
gerrit | at com.googlesource.gerrit.plugins.replication.PushOne.lambda$run$0(PushOne.java:300)
gerrit | at com.google.gerrit.server.util.RequestScopePropagator.lambda$cleanup$1(RequestScopePropagator.java:182)
gerrit | at com.google.gerrit.server.util.RequestScopePropagator.lambda$context$0(RequestScopePropagator.java:170)
gerrit | at com.google.gerrit.server.git.PerThreadRequestScope$Propagator.lambda$scope$0(PerThreadRequestScope.java:70)
gerrit | at com.googlesource.gerrit.plugins.replication.PushOne.run(PushOne.java:303)
gerrit | at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:87)
gerrit | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
gerrit | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
gerrit | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
gerrit | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
gerrit | at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:610)
gerrit | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
gerrit | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
gerrit | at java.lang.Thread.run(Thread.java:748)
This is what the data directory for replication looks like (the whole time, I think):
gerrit:/srv/gerrit$ find data/replication/
data/replication/
data/replication/ref-updates
data/replication/ref-updates/running
data/replication/ref-updates/building
data/replication/ref-updates/waiting
data/replication/ref-updates/waiting/50d5b9f61203cdd9223f21c21de7174f58a89bd3
data/replication/ref-updates/waiting/d44f53430eda0b204ca13da6aab17c2173531c94
Yes, Gerrit tries to delete the task from running (I have no idea why), but the task is in waiting. The GitLab repository does not receive the changes, which is the biggest problem.
I also tried queueing the replication event as follows, but that blocks indefinitely until CTRL+C:
ssh -p 29418 gerrit.domain replication start --wait
Any idea what I'm missing or what more I could look for?
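In case it helps, these are the extra checks I could still run (a sketch; gerrit show-queue is Gerrit's built-in queue dump, and the task-file path comes from the find output above):
# Dump the Gerrit work queue to see whether the push task is stuck, blocked, or rescheduling
ssh -p 29418 gerrit.domain gerrit show-queue -w
# Inspect one of the waiting task descriptors to confirm which ref and remote it refers to
cat /srv/gerrit/data/replication/ref-updates/waiting/d44f53430eda0b204ca13da6aab17c2173531c94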

Related

Getting Internal Server Error on prisma deploy

I have a Postgres database on Heroku. Upon deploying the data model with prisma deploy, the following error is often produced.
ERROR: Whoops. Looks like an internal server error. Search your server logs for request ID: local:cjxrmcnpx00hq0692zuwttqwv
{
  "data": {
    "addProject": null
  },
  "errors": [
    {
      "message": "Whoops. Looks like an internal server error. Search your server logs for request ID: local:cjxrmcnpx00hq0692zuwttqwv",
      "path": [
        "addProject"
      ],
      "locations": [
        {
          "line": 2,
          "column": 9
        }
      ],
      "requestId": "local:cjxrmcnpx00hq0692zuwttqwv"
    }
  ],
  "status": 200
}
and on checking the Docker logs I am seeing this error:
Jul 14, 2019 12:18:34 PM org.postgresql.Driver connect
prisma_1 | SEVERE: Connection error:
prisma_1 | org.postgresql.util.PSQLException: FATAL: too many connections for role "bcueventxumaik"
prisma_1 | at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433)
prisma_1 | at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:2566)
prisma_1 | at org.postgresql.core.v3.QueryExecutorImpl.<init>(QueryExecutorImpl.java:131)
prisma_1 | at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:210)
prisma_1 | at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
prisma_1 | at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195)
prisma_1 | at org.postgresql.Driver.makeConnection(Driver.java:452)
prisma_1 | at org.postgresql.Driver.connect(Driver.java:254)
prisma_1 | at slick.jdbc.DriverDataSource.getConnection(DriverDataSource.scala:101)
prisma_1 | at slick.jdbc.DataSourceJdbcDataSource.createConnection(JdbcDataSource.scala:68)
prisma_1 | at slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:453)
prisma_1 | at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
prisma_1 | at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
prisma_1 | at slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:249)
prisma_1 | at slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:248)
prisma_1 | at slick.jdbc.JdbcBackend$DatabaseDef.acquireSession(JdbcBackend.scala:37)
prisma_1 | at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:274)
prisma_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
prisma_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
prisma_1 | at java.lang.Thread.run(Thread.java:748)
prisma_1 |
prisma_1 | Exception in thread "main" org.postgresql.util.PSQLException: FATAL: too many connections
prisma_1 | at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:2566)
prisma_1 | at org.postgresql.core.v3.QueryExecutorImpl.<init>(QueryExecutorImpl.java:131)
prisma_1 | at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:210)
prisma_1 | at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
prisma_1 | at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195)
prisma_1 | at org.postgresql.Driver.makeConnection(Driver.java:452)
prisma_1 | at org.postgresql.Driver.connect(Driver.java:254)
prisma_1 | at slick.jdbc.DriverDataSource.getConnection(DriverDataSource.scala:101)
prisma_1 | at slick.jdbc.DataSourceJdbcDataSource.createConnection(JdbcDataSource.scala:68)
prisma_1 | at slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:453)
prisma_1 | at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
prisma_1 | at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
prisma_1 | at slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:249)
prisma_1 | at slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:248)
prisma_1 | at slick.jdbc.JdbcBackend$DatabaseDef.acquireSession(JdbcBackend.scala:37)
prisma_1 | at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:274)
prisma_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
prisma_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
prisma_1 | at java.lang.Thread.run(Thread.java:748)
prisma_prisma_1 exited with code 1
The error says there are too many connections, but I am firing prisma deploy from only one terminal, and at the same time I am able to connect to the database using pgAdmin 4. Moreover, the database seems to be perfectly reachable, as I am able to ping it from inside the container.
P.S. I updated the Docker logs: earlier, running docker logs -f on the process ID was giving me older logs, but after building the container again with docker-compose up I got the latest logs.
As the error clearly states, there are too many connections to the database. So we need to investigate how many connections there are, who is creating them, and why they are created, in order to either limit the consumers or increase the number of available connections.
First, we can use the Heroku CLI to check the number of used and available connections:
$ heroku pg:info
=== DATABASE_URL
Plan: Private 2
Status: Available
HA Status: Available
Data Size: 2.23 GB
Tables: 83
PG Version: 10.1
Connections: 26/400
Connection Pooling: Available
For more information on how to investigate Heroku Postgres databases, see: https://devcenter.heroku.com/articles/heroku-postgresql#pg-info
To further investigate who is connected to your database, you can use either psql or pgAdmin. If using pgAdmin, you can select the database, click on the dashboard tab, and select the server activity panel at the bottom of the page, revealing all connected sessions. If using psql, you could write a select like this:
SELECT pid as process_id,
       usename as username,
       datname as database_name,
       client_addr as client_address,
       application_name,
       backend_start,
       state
FROM pg_stat_activity;
For a more detailed how-to, see: https://dataedo.com/kb/query/postgresql/list-database-sessions
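If you prefer a quick summary from the shell, a sketch along these lines also works (assumes psql is installed locally and that you fetch the connection string via the Heroku CLI):
# Count connections grouped by application and role, using the Heroku connection string
psql "$(heroku config:get DATABASE_URL)" \
  -c "SELECT application_name, usename, count(*) FROM pg_stat_activity GROUP BY application_name, usename ORDER BY count(*) DESC;"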
By now you have probably identified who is creating the connections to your database and can limit that client to use fewer (or increase the number of available database connections).
One possible consumer of database connections is, of course, the Prisma server itself. Luckily, the Prisma config provides a setting to limit database connections.
The connectionLimit property in PRISMA_CONFIG determines the number of
database connections a Prisma service is going to use.
You can read more about it here: https://www.prisma.io/docs/prisma-server/database-connector-POSTGRES-jgfr/#managing-database-connections
If you are using Heroku to run the Docker container with your Prisma server, a PRISMA_CONFIG could look like this:
port: $PORT
managementApiSecret: ${PRISMA_MANAGEMENT_API_SECRET}
databases:
  default:
    connector: postgres
    migrations: true
    connectionLimit: 2
    uri: ${DATABASE_URL}?ssl=1
I hope this structured approach helped. Let me know if you need more clarification. If so please provide details regarding the nature of the existing database connections.
run this command
docker logs <YOUR_PRISMA_CONTAINER_NAME>
use pooling:
import dotenv from 'dotenv'
dotenv.config()
import { PrismaClient } from '@prisma/client'

// add prisma to the NodeJS global type
interface CustomNodeJsGlobal extends NodeJS.Global {
  prisma: PrismaClient
}

// Prevent multiple instances of Prisma Client in development
declare const global: CustomNodeJsGlobal

const prisma = global.prisma || new PrismaClient()

if (process.env.NODE_ENV === 'development') global.prisma = prisma

export default prisma
plus use:
await prisma.$disconnect()

Github webhook no longer triggering Jenkins builds

We've recently noticed our Jenkins builds stopped getting triggered automatically. After further investigation there were numerous issues.
GitHub webhooks were unsuccessful, with GitHub reporting "Couldn't connect to the server" in GitHub's webhook configuration UI. I can confirm our ELB and the EC2 instance hosting Jenkins are live and healthy. No DNS changes have been made here.
Jenkins logs report various failures:
Invalid credentials, despite having valid Jenkins username & password credentials (the password is a personal API token):
There is no credentials with admin access to manage hooks on GitHubRepositoryName[host=github.com,username=REDACTED,repository=REDACTED]
Failed to delete post-commit hook:
ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Apr 12, 2019 6:15:43 PM WARNING org.jenkinsci.plugins.github.webhook.WebhookManager$2 applyNullSafe
Failed to add GitHub webhook for GitHubRepositoryName[host=github.com,username=REDACTED,repository=REDACTED]
java.io.FileNotFoundException: https://api.github.com/repos/REDACTED/REDACTED/hooks/101704125
at com.squareup.okhttp.internal.huc.HttpURLConnectionImpl.getInputStream(HttpURLConnectionImpl.java:243)
at com.squareup.okhttp.internal.huc.DelegatingHttpsURLConnection.getInputStream(DelegatingHttpsURLConnection.java:210)
at com.squareup.okhttp.internal.huc.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:25)
at org.kohsuke.github.Requester.parse(Requester.java:617)
at org.kohsuke.github.Requester.parse(Requester.java:599)
at org.kohsuke.github.Requester._to(Requester.java:277)
Caused: org.kohsuke.github.GHFileNotFoundException: {"message":"Not Found","documentation_url":"https://developer.github.com/v3/repos/hooks/#delete-a-hook"}
at org.kohsuke.github.Requester.handleApiError(Requester.java:691)
at org.kohsuke.github.Requester._to(Requester.java:298)
at org.kohsuke.github.Requester.to(Requester.java:239)
at org.kohsuke.github.Requester.to(Requester.java:227)
at org.kohsuke.github.GHHook.delete(GHHook.java:56)
at org.jenkinsci.plugins.github.webhook.WebhookManager$10.applyNullSafe(WebhookManager.java:344)
Caused: org.kohsuke.github.GHException: Failed to delete post-commit hook
at org.jenkinsci.plugins.github.webhook.WebhookManager$10.applyNullSafe(WebhookManager.java:347)
at org.jenkinsci.plugins.github.webhook.WebhookManager$10.applyNullSafe(WebhookManager.java:341)
at org.jenkinsci.plugins.github.util.misc.NullSafePredicate.apply(NullSafePredicate.java:19)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:649)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:647)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at com.google.common.collect.Lists.newArrayList(Lists.java:138)
at com.google.common.collect.Lists.newArrayList(Lists.java:119)
at org.jenkinsci.plugins.github.util.FluentIterableWrapper.toList(FluentIterableWrapper.java:147)
at org.jenkinsci.plugins.github.webhook.WebhookManager$2.applyNullSafe(WebhookManager.java:202)
at org.jenkinsci.plugins.github.webhook.WebhookManager$2.applyNullSafe(WebhookManager.java:175)
at org.jenkinsci.plugins.github.util.misc.NullSafeFunction.apply(NullSafeFunction.java:18)
at com.google.common.collect.Iterators$8.next(Iterators.java:812)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:648)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:647)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at com.google.common.collect.Lists.newArrayList(Lists.java:138)
at com.google.common.collect.Lists.newArrayList(Lists.java:119)
at org.jenkinsci.plugins.github.util.FluentIterableWrapper.toList(FluentIterableWrapper.java:147)
at org.jenkinsci.plugins.github.webhook.WebhookManager$1.run(WebhookManager.java:127)
at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failed to create hook:
Apr 12, 2019 6:15:44 PM WARNING org.jenkinsci.plugins.github.webhook.WebhookManager$2 applyNullSafe
Failed to add GitHub webhook for GitHubRepositoryName[host=github.com,username=REDACTED,repository=REDACTED]
java.io.FileNotFoundException: https://api.github.com/repos/REDACTED/REDACTED/hooks
at com.squareup.okhttp.internal.huc.HttpURLConnectionImpl.getInputStream(HttpURLConnectionImpl.java:243)
at com.squareup.okhttp.internal.huc.DelegatingHttpsURLConnection.getInputStream(DelegatingHttpsURLConnection.java:210)
at com.squareup.okhttp.internal.huc.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:25)
at org.kohsuke.github.Requester.parse(Requester.java:617)
at org.kohsuke.github.Requester.parse(Requester.java:599)
at org.kohsuke.github.Requester._to(Requester.java:277)
Caused: org.kohsuke.github.GHFileNotFoundException: {"message":"Validation Failed","errors":[{"resource":"Hook","code":"custom","message":"Hook already exists on this repository"}],"documentation_url":"https://developer.github.com/v3/repos/hooks/#create-a-hook"}
at org.kohsuke.github.Requester.handleApiError(Requester.java:691)
at org.kohsuke.github.Requester._to(Requester.java:298)
at org.kohsuke.github.Requester.to(Requester.java:239)
at org.kohsuke.github.GHHooks$Context.createHook(GHHooks.java:49)
at org.kohsuke.github.GHRepository.createHook(GHRepository.java:1206)
at org.jenkinsci.plugins.github.webhook.WebhookManager$9.applyNullSafe(WebhookManager.java:329)
Caused: org.kohsuke.github.GHException: Failed to create hook
at org.jenkinsci.plugins.github.webhook.WebhookManager$9.applyNullSafe(WebhookManager.java:331)
at org.jenkinsci.plugins.github.webhook.WebhookManager$9.applyNullSafe(WebhookManager.java:316)
at org.jenkinsci.plugins.github.util.misc.NullSafeFunction.apply(NullSafeFunction.java:18)
at org.jenkinsci.plugins.github.webhook.WebhookManager$2.applyNullSafe(WebhookManager.java:204)
at org.jenkinsci.plugins.github.webhook.WebhookManager$2.applyNullSafe(WebhookManager.java:175)
at org.jenkinsci.plugins.github.util.misc.NullSafeFunction.apply(NullSafeFunction.java:18)
at com.google.common.collect.Iterators$8.next(Iterators.java:812)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:648)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:647)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at com.google.common.collect.Lists.newArrayList(Lists.java:138)
at com.google.common.collect.Lists.newArrayList(Lists.java:119)
at org.jenkinsci.plugins.github.util.FluentIterableWrapper.toList(FluentIterableWrapper.java:147)
at org.jenkinsci.plugins.github.webhook.WebhookManager$1.run(WebhookManager.java:127)
at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
It's odd because I also see positive logs such as:
GitHub webhooks activated for job REDACTED_ORG_NAME/REDACTED_REPO with [GitHubRepositoryName[host=github.com,username=REDACTED,repository=REDACTED_REPO]] (events: [PULL_REQUEST, PUSH])
Apr 12, 2019 6:15:43 PM INFO org.jenkinsci.plugins.github.webhook.WebhookManager$1 run
GitHub webhooks activated for job REDACTED_ORG_NAME/REDACTED_REPO/develop with [GitHubRepositoryName[host=github.com,username=REDACTED,repository=REDACTED_REPO]] (events: [PULL_REQUEST, PUSH])
Apr 12, 2019 6:15:43 PM INFO org.jenkinsci.plugins.github.webhook.WebhookManager$1 run
GitHub webhooks activated for job REDACTED_ORG_NAME/REDACTED_REPO/master with [] (events: [PULL_REQUEST, PUSH])
Apr 12, 2019 6:15:43 PM INFO org.jenkinsci.plugins.github.webhook.WebhookManager$1 run
GitHub webhooks activated for job REDACTED_ORG_NAME/REDACTED_REPO/release%2F0.x with [] (events: [PULL_REQUEST, PUSH]
We've configured the following Jenkins plugins:
Git Plugin
Github Organizations with Github Branch Source with https://support.cloudbees.com/hc/en-us/articles/224543927-GitHub-webhook-configuration
I've followed all the troubleshooting steps at https://support.cloudbees.com/hc/en-us/articles/224621648-GitHub-webhook-troubleshooting but I'm getting nowhere. Steps that fail:
A.2. I tried redelivering the webhook payload but get "Couldn't connect to server".
B.4. Displays lists of "Failed to delete post-commit hook", "Failed to create hook", and "There is no credentials with admin access to manage hooks on GitHubRepositoryName[host=github.com,username=REDACTED,repository=REDACTED]".
What Does Work
Able to scan organization
Able to trigger builds manually by using Build Now
It seems like there are so many issues that I have no idea where to look next. Thank you for your help in advance.
It looks like our AWS Security Group was whitelisting GitHub hook IP addresses (https://api.github.com/meta), and GitHub recently added or changed an IP address that was not in the Security Group, so the Security Group was rejecting requests. We refreshed the IP addresses and it now works.
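If you maintain such a whitelist, the current hook ranges can be pulled straight from that meta endpoint, for example (assumes curl and jq are available):
# List the CIDR ranges GitHub currently uses to deliver webhooks
curl -s https://api.github.com/meta | jq -r '.hooks[]'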

Jenkins Pipeline's deleteDir() fails with java.nio.file.AccessDeniedException exception

I have a Jenkins Pipeline job for building my Android project.
The basic steps are to check out the project's repository, run a Docker container (mapping the host's repository folder to the appropriate folder in the container), execute the build script within the container, and extract the artifacts.
The very first step deletes the workspace using the deleteDir() function:
node("jenkins-slaves") {
deleteDir() // <-------------- HERE
stage('checkout repo') {
// REDACTED
}
// REDACTED
}
However, during one of the first attempts to run it, I received the following error:
[Pipeline] node
Running on jenkins-slave14 in /home/jenkins/workspace/REDACTED
[Pipeline] {
[Pipeline] deleteDir
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
java.nio.file.AccessDeniedException: /home/jenkins/workspace/REDACTED/app/all-apk/apks
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
at java.nio.file.Files.deleteIfExists(Files.java:1165)
at hudson.Util.tryOnceDeleteFile(Util.java:290)
at hudson.Util.deleteFile(Util.java:245)
at hudson.FilePath.deleteRecursive(FilePath.java:1211)
at hudson.FilePath.deleteContentsRecursive(FilePath.java:1220)
at hudson.FilePath.deleteRecursive(FilePath.java:1202)
at hudson.FilePath.deleteContentsRecursive(FilePath.java:1220)
at hudson.FilePath.deleteRecursive(FilePath.java:1202)
at hudson.FilePath.deleteContentsRecursive(FilePath.java:1220)
at hudson.FilePath.deleteRecursive(FilePath.java:1202)
at hudson.FilePath.access$1000(FilePath.java:197)
at hudson.FilePath$14.invoke(FilePath.java:1181)
at hudson.FilePath$14.invoke(FilePath.java:1178)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2750)
at hudson.remoting.UserRequest.perform(UserRequest.java:208)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:360)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused: java.io.IOException: Unable to delete '/home/jenkins/workspace/REDACTED/app/all-apk/apks'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
at hudson.Util.deleteFile(Util.java:250)
at hudson.FilePath.deleteRecursive(FilePath.java:1211)
at hudson.FilePath.deleteContentsRecursive(FilePath.java:1220)
at hudson.FilePath.deleteRecursive(FilePath.java:1202)
at hudson.FilePath.deleteContentsRecursive(FilePath.java:1220)
at hudson.FilePath.deleteRecursive(FilePath.java:1202)
at hudson.FilePath.deleteContentsRecursive(FilePath.java:1220)
at hudson.FilePath.deleteRecursive(FilePath.java:1202)
at hudson.FilePath.access$1000(FilePath.java:197)
at hudson.FilePath$14.invoke(FilePath.java:1181)
at hudson.FilePath$14.invoke(FilePath.java:1178)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2750)
at hudson.remoting.UserRequest.perform(UserRequest.java:208)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:360)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at ......remote call to jenkins-slave14(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1654)
at hudson.remoting.UserResponse.retrieve(UserRequest.java:311)
at hudson.remoting.Channel.call(Channel.java:905)
at hudson.FilePath.act(FilePath.java:987)
Caused: java.io.IOException: remote file operation failed: /home/jenkins/workspace/REDACTED at hudson.remoting.Channel#16d096ce:jenkins-slave14
at hudson.FilePath.act(FilePath.java:994)
at hudson.FilePath.act(FilePath.java:976)
at hudson.FilePath.deleteRecursive(FilePath.java:1178)
at org.jenkinsci.plugins.workflow.steps.DeleteDirStep$Execution.run(DeleteDirStep.java:77)
at org.jenkinsci.plugins.workflow.steps.DeleteDirStep$Execution.run(DeleteDirStep.java:69)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:49)
at hudson.security.ACL.impersonate(ACL.java:260)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:46)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Finished: FAILURE
What is the root cause of this problem, and how can I solve it?
A delete error is generally caused either by insufficient permissions or by someone/something locking the file.
I noticed that the folder Jenkins unsuccessfully tried to delete was dynamically created inside the container by the build script. The folder and all the files beneath it were created as root/root (user/group).
This helped me understand the root cause of the problem: the pipeline runs as jenkins/jenkins (user/group) and is unable to delete files/folders created by another user (root/root in my case).
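A quick way to confirm this on the agent is to list the numeric ownership of the offending path (a sketch; substitute your real workspace path):
# UID/GID 0 here means the files were created by root inside the container
ls -ln /home/jenkins/workspace/<your-job>/app/all-apk/apks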
The solution I came up with was to build the Docker image with the same user (and group) as the host system uses (my container OS is based on Alpine):
RUN addgroup -S -g 6002 jenkins
RUN adduser -S -u 6002 -G jenkins jenkins
USER jenkins
Note that the GID and UID (both 6002 in the above example) have to match the IDs on your host machine. To find them, you can use the following commands:
id -u jenkins
id -g jenkins
From documentation:
The USER instruction sets the user name (or UID) and optionally the
user group (or GID) to use when running the image and for any RUN, CMD
and ENTRYPOINT instructions that follow it in the Dockerfile.
Once this step is performed, any files/folders created by a build script within the container use a matching host user, which allows the host OS to manipulate/delete them without any issues.
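To avoid hard-coding the IDs, one option is to pass them in at build time (a sketch, assuming the Dockerfile declares matching ARG JENKINS_UID and ARG JENKINS_GID values and uses them in the addgroup/adduser lines; the image tag is just a placeholder):
# Build the image with the host's jenkins UID/GID so ownership matches on the agent
docker build \
  --build-arg JENKINS_UID="$(id -u jenkins)" \
  --build-arg JENKINS_GID="$(id -g jenkins)" \
  -t android-build .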

Jenkins - Unexpected executor death

I see all my executors frequently changing to the Dead state on one of my Jenkins slave machines (Windows 2008 R2 SP2).
Jenkins ver. 1.651.3
I have restarted the Jenkins server as well as the service.
Error logs:
Unexpected executor death
java.io.IOException: Failed to create a temporary file in /var/lib/jenkins/jobs/ABCD/jobs/EFGH/jobs/Build
at hudson.util.AtomicFileWriter.<init>(AtomicFileWriter.java:68)
at hudson.util.AtomicFileWriter.<init>(AtomicFileWriter.java:55)
at hudson.util.TextFile.write(TextFile.java:118)
at hudson.model.Job.saveNextBuildNumber(Job.java:293)
at hudson.model.Job.assignBuildNumber(Job.java:351)
at hudson.model.Run.<init>(Run.java:284)
at hudson.model.AbstractBuild.<init>(AbstractBuild.java:167)
at hudson.model.Build.<init>(Build.java:92)
at hudson.model.FreeStyleBuild.<init>(FreeStyleBuild.java:34)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:175)
at hudson.model.AbstractProject.newBuild(AbstractProject.java:1018)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1209)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
at hudson.model.Executor$1.call(Executor.java:364)
at hudson.model.Executor$1.call(Executor.java:346)
at hudson.model.Queue._withLock(Queue.java:1365)
at hudson.model.Queue.withLock(Queue.java:1230)
at hudson.model.Executor.run(Executor.java:346)
Caused by: java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1006)
at java.io.File.createTempFile(File.java:1989)
at hudson.util.AtomicFileWriter.<init>(AtomicFileWriter.java:66)
... 21 more
I see these logs on my slave machine:
INFO: File download attempt 1
Oct 17, 2017 10:32:00 AM com.microsoft.tfs.core.clients.versioncontrol.VersionControlClient downloadFileToStreams
INFO: File download attempt 1
Oct 17, 2017 10:32:00 AM com.microsoft.tfs.core.ws.runtime.client.SOAPService executeSOAPRequestInternal
INFO: SOAP method='UpdateLocalVersion', status=200, content-length=367, server-wait=402 ms, parse=0 ms, total=402 ms, throughput=913 B/s, gzip
Oct 17, 2017 10:32:00 AM com.microsoft.tfs.core.clients.versioncontrol.VersionControlClient downloadFileToStreams
INFO: File download attempt 1
Oct 17, 2017 10:32:00 AM com.microsoft.tfs.core.clients.versioncontrol.VersionControlClient downloadFileToStreams
INFO: File download attempt 1
Oct 17, 2017 10:32:00 AM com.microsoft.tfs.core.clients.versioncontrol.VersionControlClient downloadFileToStreams
INFO: File download attempt 1
Can you please check the owner of the path /var/lib/jenkins/jobs/ABCD/jobs/EFGH/jobs/Build? If it was by any chance created manually, you will get a permission-denied error when the owner is not the Jenkins user. Also check the free disk space on the server as well as the agent, and try rebooting the slave agent; that has helped at times.
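For example (a sketch; only chown if the directory really should belong to the Jenkins user):
# Check who owns the job directory and whether the disk is full
ls -ld /var/lib/jenkins/jobs/ABCD/jobs/EFGH/jobs/Build
df -h /var/lib/jenkins
# If the directory was created manually as root, hand it back to Jenkins
sudo chown -R jenkins:jenkins /var/lib/jenkins/jobs/ABCD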
How long are the real job names for ABCD and EFGH?
I've run into the 260 character maximum path length with Jenkins on Windows 2008 R2 before.
The path in:
java.io.IOException: Failed to create a temporary file in /var/lib/jenkins/jobs/ABCD/jobs/EFGH/jobs/Build
with the three /jobs segments in it seems strange to me. In Jenkins, it should normally be more like:
+- /var/lib/jenkins/jobs
+- ABCD
| +- builds
| | +- ...
| +- ...
+- EFGH
| +- builds
| | +- ...
| +- ...
+- Build
+- builds
| +- ...
+- ...
Maybe there's some misconfiguration concerning paths and Jenkins tries a mkdir /var/lib/jenkins/jobs/ABCD/jobs/EFGH/jobs/Build and the Jenkins user or the user under which the job runs doesn't have permissions to do that.
See also File permissions and attributes:
| w | ... | The directory's contents can be modified (create new files or folders; [...]); requires the execute permission to be also set, otherwise this permission has no effect. |
In my situation, this happened because the server was very low on space. Click on "Build Executor Status" from the dashboard and see if there is low disk space or 0 swap space. Try to free up some space. Then restart the Jenkins server / service and try again.
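On a Linux master you can verify the same thing from the shell (a sketch):
# Check free disk space and swap on the machine running the executors
df -h /var/lib/jenkins
free -m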

Unable to connect to slave from master. "Invalid encoded sequence encountered:"

I have a master box and a slave box running as AWS EC2 instances. I created a jenkins user on the slave box and copied the master's public key to the slave. I then created a new node on the Jenkins master. However, when I connect to the slave using "Launch agent via execution of command on master" with the command ssh -tt jenkins@10.15.0.10, it gives me the following error:
just before slave Services-Slave gets launched ...
executing pre-launch scripts ...
[06/26/17 16:25:28] Launching agent
$ ssh -tt jenkins@10.15.0.10
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-1020-aws x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
5 packages can be updated.
0 updates are security updates.
Last login: Mon Jun 26 20:19:51 2017 from 10.15.0.5
<===[JENKINS REMOTING CAPACITY]===>To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
jenkins@ip-10-94-0-63:~$ <===[JENKINS REMOTING CAPACITY]===ERROR: Unable to launch the agent for Services-Slave
java.io.IOException: Invalid encoded sequence encountered: 08 08 08 08
at hudson.remoting.BinarySafeStream$1._read(BinarySafeStream.java:194)
at hudson.remoting.BinarySafeStream$1.read(BinarySafeStream.java:80)
at hudson.remoting.BinarySafeStream$1.read(BinarySafeStream.java:97)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at hudson.remoting.BinarySafeStream$1._read(BinarySafeStream.java:189)
at hudson.remoting.BinarySafeStream$1.read(BinarySafeStream.java:125)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at hudson.remoting.BinarySafeStream$1._read(BinarySafeStream.java:189)
at hudson.remoting.BinarySafeStream$1.read(BinarySafeStream.java:125)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2338)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2351)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3092)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2892)
at java.io.ObjectInputStream.readUTF(ObjectInputStream.java:1075)
at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.java:684)
at java.io.ObjectInputStream.readClassDescriptor(ObjectInputStream.java:833)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1609)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at hudson.remoting.Capability.read(Capability.java:140)
at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:391)
at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:310)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:389)
at hudson.slaves.CommandLauncher.launch(CommandLauncher.java:132)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:262)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
If I run the same command as my own user from the master, it is able to ssh successfully. Any idea why this is happening?
I tried providing the .pem file.
I also tried sudo -u jenkins. Nothing works.
Several things were going wrong here too while configuring master-node communication. At times it seemed like the master was caching the node configurations; sometimes removing and re-adding the node did work. But in the end, this helped every time:
https://docs.google.com/document/d/1Qq-EkiUnC5x8BuM4AZWo-yRUQTrkberzz8JfdCM6yuc/edit?pli=1
