Spring Cloud Skipper errors out immediately after start on local MicroK8s - spring-cloud-dataflow

I'm trying to deploy the entire Spring Cloud Data Flow platform to a MicroK8s cluster running on one of our servers, a VM with Ubuntu 20.04. Before performing any actions on the target server, I tried to deploy it on my local computer (same OS), and I even succeeded and created/ran one stream. Nevertheless, I am currently experiencing an error both on my local computer and on the VM, and I can't manage to pinpoint the root cause.
My current situation:
I'm following the official guide for deploying SCDF using kubectl, the only difference being that I'm using tag v2.9.4 (the latest at the time of writing) instead of v2.9.1. I also skipped the configuration of the monitoring frameworks and therefore commented out the relevant lines in the SCDF server configuration, as suggested in the docs. The Kafka message broker and the MySQL database are deployed without issues.
However, after executing the kubectl commands to create the config map, service and deployment for Skipper, the Skipper pod goes into "CrashLoopBackOff" status. Checking the pod logs, the only thing I see is that the application is terminated right after it appears to have started:
[...]
2022-04-11 15:00:11.713 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:00:11.907 INFO 1 --- [ main] o.s.c.s.s.app.SkipperServerApplication : Started SkipperServerApplication in 78.901 seconds (JVM running for 82.435)
2022-04-11 15:00:12.531 INFO 1 --- [ionShutdownHook] o.s.s.s.DefaultStateMachineService : Entering stop sequence, stopping all managed machines
2022-04-11 15:00:12.617 INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-04-11 15:00:12.703 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2022-04-11 15:00:12.799 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
Native Memory Tracking:
Total: reserved=961864767, committed=325411903
- Java Heap (reserved=356515840, committed=138334208)
(mmap: reserved=356515840, committed=138334208)
- Class (reserved=269444100, committed=94409732)
(classes #17623)
( instance classes #16455, array classes #1168)
(malloc=3355652 #45645)
(mmap: reserved=266088448, committed=91054080)
( Metadata: )
( reserved=79691776, committed=78340096)
( used=76414680)
( free=1925416)
( waste=0 =0.00%)
( Class space:)
( reserved=186396672, committed=12713984)
( used=11544696)
( free=1169288)
( waste=0 =0.00%)
- Thread (reserved=14794856, committed=1323112)
(thread #14)
(stack: reserved=14729216, committed=1257472)
(malloc=51792 #86)
(arena=13848 #25)
- Code (reserved=255686068, committed=26629556)
(malloc=2053556 #8654)
(mmap: reserved=253632512, committed=24576000)
- GC (reserved=1728178, committed=1019570)
(malloc=560818 #2163)
(mmap: reserved=1167360, committed=458752)
- Compiler (reserved=35543622, committed=35543622)
(malloc=71174 #1162)
(arena=35472448 #19)
- Internal (reserved=432627, committed=432627)
(malloc=399859 #1104)
(mmap: reserved=32768, committed=32768)
- Other (reserved=10248, committed=10248)
(malloc=10248 #3)
- Symbol (reserved=22101496, committed=22101496)
(malloc=19867360 #240000)
(arena=2234136 #1)
- Native Memory Tracking (reserved=4899928, committed=4899928)
(malloc=9656 #122)
(tracking overhead=4890272)
- Arena Chunk (reserved=81808, committed=81808)
(malloc=81808)
- Tracing (reserved=1, committed=1)
(malloc=1 #1)
- Logging (reserved=4572, committed=4572)
(malloc=4572 #192)
- Arguments (reserved=19063, committed=19063)
(malloc=19063 #495)
- Module (reserved=310496, committed=310496)
(malloc=310496 #2710)
- Synchronizer (reserved=283672, committed=283672)
(malloc=283672 #2348)
- Safepoint (reserved=8192, committed=8192)
(mmap: reserved=8192, committed=8192)
No matter how many times the pod is restarted, it always exits at this phase. This is the output of kubectl get all:
NAME READY STATUS RESTARTS AGE
pod/kafka-zk-6b6f4976cf-9hjzn 1/1 Running 0 69m
pod/kafka-broker-0 1/1 Running 0 58m
pod/mysql-7c57b4cfdf-njb97 1/1 Running 0 39m
pod/skipper-b46bfd5fd-wrnqv 0/1 CrashLoopBackOff 13 (57s ago) 38m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 148m
service/kafka-zk ClusterIP 10.152.183.62 <none> 2181/TCP,2888/TCP,3888/TCP 69m
service/kafka-broker ClusterIP None <none> 9092/TCP 69m
service/mysql ClusterIP 10.152.183.139 <none> 3306/TCP 40m
service/skipper LoadBalancer 10.152.183.250 <pending> 80:31955/TCP 38m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kafka-zk 1/1 1 1 69m
deployment.apps/mysql 1/1 1 1 39m
deployment.apps/skipper 0/1 1 0 38m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kafka-zk-6b6f4976cf 1 1 1 69m
replicaset.apps/mysql-7c57b4cfdf 1 1 1 39m
replicaset.apps/skipper-b46bfd5fd 1 1 0 38m
NAME READY AGE
statefulset.apps/kafka-broker 1/1 69m
What I tried:
Changing the Skipper service type from LoadBalancer to NodePort (I have not enabled MetalLB, so load balancing is not provided), but it didn't work;
Changing the port exposed by the container: in the default configuration it is port 80, and I changed it to 7577 (also in the service configuration; see the sketch further below), but the error still occurs;
Downgrading to Skipper version 2.8.2, the one used in the documentation above; the behaviour was exactly the same;
Increasing the logging level by setting logging.level.org.springframework to DEBUG and then to TRACE, which didn't result in anything useful showing up in the logs, except a cryptic line which I could not find anywhere on Google:
[...]
2022-04-11 15:22:38.818 DEBUG 1 --- [ main] o.s.c.c.CompositeCompatibilityVerifier : All conditions are passing
2022-04-11 15:22:39.098 DEBUG 1 --- [ main] ocalVariableTableParameterNameDiscoverer : Cannot find '.class' file for class [class org.springframework.statemachine.boot.autoconfigure.StateMachineAutoConfiguration$StateMachineMonitoringConfiguration$$EnhancerBySpringCGLIB$$b266f314] - unable to determine constructor/method parameter names
2022-04-11 15:22:39.925 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:22:40.244 INFO 1 --- [ main] o.s.c.s.s.app.SkipperServerApplication : Started SkipperServerApplication in 76.267 seconds (JVM running for 79.716)
[...]
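For reference, this is roughly how I applied the service and port changes from the list above (a sketch of the changed sections only; resource names follow the official guide, and the exact manifests may differ slightly):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: skipper
spec:
  template:
    spec:
      containers:
        - name: skipper
          ports:
            - containerPort: 7577   # changed from the default 80
---
apiVersion: v1
kind: Service
metadata:
  name: skipper
spec:
  type: NodePort                    # changed from LoadBalancer (no MetalLB)
  ports:
    - port: 80
      targetPort: 7577              # must match the containerPort above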
Can anyone suggest what to try next, or point me to a way to further diagnose this issue?
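If more detail would help, these are the standard kubectl commands I can use to gather it (a sketch; the pod name is the one from the output above):
# Events, last state and exit code of the crashed container
kubectl describe pod skipper-b46bfd5fd-wrnqv
# Logs of the previous (crashed) container instance
kubectl logs skipper-b46bfd5fd-wrnqv --previous
# Recent cluster events, sorted by time
kubectl get events --sort-by=.metadata.creationTimestamp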

Related

In EKS, Worker pods going offline abruptly with 'hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel JNLP4-connect connection'

Our Environment:
Jenkins version - Jenkins 2.319.1
Jenkins Master image : jenkins/jenkins:2.319.1-lts-alpine
Jenkins worker image: jenkins/inbound-agent:4.11-1-alpine
Installed plugins:
Kubernetes - 1.30.6
Kubernetes Client API - 5.4.1
Kubernetes Credentials Plugin - 0.9.0
Java version on master: openjdk 11.0.13
Java version on agent/worker: openjdk 11.0.14
Hi team,
We are facing an issue in Jenkins where the agent disconnects (or goes offline) from the master while a job is still running on the agent/worker. We are getting the error below (highlighted) and have tried the things listed further down, but the issue is still not fully resolved. Jenkins is deployed on EKS.
Error:
5334535:2022-11-02 14:07:54.573+0000 [id=140290] INFO hudson.slaves.NodeProvisioner#update: worker-7j4x4 provisioning successfully completed. We have now 2 computer(s)
5334695:2022-11-02 14:07:54.675+0000 [id=140291] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes done-jenkins/worker-7j4x4
5334828:2022-11-02 14:07:56.619+0000 [id=140291] INFO o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes done-jenkins/worker-7j4x4
5334964-2022-11-02 14:07:58.650+0000 [id=140309] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #97 from /100.122.254.111:42648
5335123-2022-11-02 14:09:19.733+0000 [id=140536] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
5335275-2022-11-02 14:09:19.733+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
5335409-2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2608, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
5335965-2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 1 nodes assigned to this Jenkins instance, which we will check
5336139-2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
5336279-2022-11-02 14:09:19.734+0000 [id=140536] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
5336438-groovy.lang.MissingPropertyException: No such property: envVar for class: groovy.lang.Binding
5336532- at groovy.lang.Binding.getVariable(Binding.java:63)
5336585- at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onGetProperty(SandboxInterceptor.java:271)
–
5394279-2022-11-02 15:09:19.733+0000 [id=141899] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
5394431-2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
5394565-2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2620, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
5395121-2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 3 nodes assigned to this Jenkins instance, which we will check
5395295-2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
5395435-2022-11-02 15:09:19.734+0000 [id=141899] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
5395594-2022-11-02 15:11:59.502+0000 [id=140320] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel JNLP4-connect connection from ip-100-122-254-111.eu-central-1.compute.internal/100.122.254.111:42648.
5395817-java.util.concurrent.TimeoutException: Ping started at 1667401679501 hasn't completed by 1667401919502
5395920- at hudson.remoting.PingThread.ping(PingThread.java:134)
5395977- at hudson.remoting.PingThread.run(PingThread.java:90)
5396032:2022-11-02 15:11:59.503+0000 [id=141914] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting 5049 for worker-7j4x4 terminated: java.nio.channels.ClosedChannelException
5396231-2022-11-02 15:12:35.579+0000 [id=141933] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started Periodic background build discarder
5396368-2022-11-02 15:12:36.257+0000 [id=141933] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished Periodic background build discarder. 678 ms
5396514-2022-11-02 15:14:15.582+0000 [id=141422] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel JNLP4-connect connection from ip-100-122-237-38.eu-central-1.compute.internal/100.122.237.38:55038.
5396735-java.util.concurrent.TimeoutException: Ping started at 1667401815582 hasn't completed by 1667402055582
5396838- at hudson.remoting.PingThread.ping(PingThread.java:134)
5396895- at hudson.remoting.PingThread.run(PingThread.java:90)
5396950-2022-11-02 15:14:15.584+0000 [id=141915] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting 5050 for worker-fjf1p terminated: java.nio.channels.ClosedChannelException
5397149-2022-11-02 15:14:19.733+0000 [id=141950] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
5397301-2022-11-02 15:14:19.733+0000 [id=141950] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
5397435-2022-11-02 15:14:19.734+0000 [id=141950] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2621, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
Any suggestions or resolutions, please?
Things we have tried:
Increased idleMinutes to 180 from default
Verified that resources are sufficient as per the Grafana dashboard
Changed podRetention to onFailure from Never
Changed podRetention to Always from Never
Increased readTimeout
Increased connectTimeout
Increased slaveConnectTimeoutStr
Disabled the ping thread from the UI by disabling the "response time" checkbox under preventive node monitoring
Increased activeDeadlineSeconds
Verified same java version on master and agent
Updated kubernetes and kubernetes API client plugins
The expectation is that the worker/agent should disconnect once the job has run successfully and terminate after the defined idleMinutes, but a few times it terminates while a job is still running on the agent. (A sketch of where these settings live in our setup is shown below.)
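For reference, this is roughly where the settings above live, assuming the agents are configured through the Jenkins Helm chart's agent block (a sketch with illustrative values, not our exact configuration; field names may differ if the pod templates are defined elsewhere):
agent:
  idleMinutes: 180           # increased from the default
  podRetention: "OnFailure"  # also tried "Always" and the default "Never"
  connectTimeout: 300        # increased from the default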

kubeflow dsl-compile - kfp_server_api.exceptions.ApiException: (400)

Please help me understand what is causing the error and how to resolve it.
The question "Kubeflow sdk - error in client.list_experiments()" refers to GitHub issue #6120 of the same name, through which its author appears to have fixed the problem. The resolution given there is: "I received feedback from the developers (see the closed issue). This is one of the current caveats of multi-user mode (see documentation). This usage is now being supported through #5138."
However, I could not figure out exactly what the cause is and how to fix it. It looks like "Connecting to Kubeflow Pipelines using the SDK client" describes the required configuration, but I am not sure exactly what I need to do.
Reproduction steps
Deployed Minikube on a remote instance and set up the kubectl connection.
Deployed Kubeflow 1.5.0 by following "Install with a single command".
Verified the connection and confirmed the pods are running from the local laptop:
$ kubectl get pods -n kubeflow
NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-7df7558c67-drdzw 1/1 Running 5 2d18h
cache-deployer-deployment-6f4bcc969-8kpm6 2/2 Running 15 2d18h
cache-server-575d97c95-k7rv4 2/2 Running 10 2d18h
centraldashboard-5dd4f57bbd-gcxn5 2/2 Running 10 2d18h
jupyter-web-app-deployment-5886974887-8c2cf 1/1 Running 5 2d18h
katib-controller-58ddb4b856-mzq2l 1/1 Running 46 2d18h
katib-db-manager-6df878f5b8-c9dhr 1/1 Running 8 2d18h
katib-mysql-6dcb447c6f-lz5b8 1/1 Running 5 2d18h
katib-ui-f787b9d88-8h25n 1/1 Running 6 2d18h
kfserving-controller-manager-0 2/2 Running 50 2d18h
kfserving-models-web-app-7884f597cf-m9n59 2/2 Running 10 2d18h
kserve-models-web-app-5c64c8d8bb-bpdsb 2/2 Running 10 2d18h
kubeflow-pipelines-profile-controller-84bcbdb899-669hr 1/1 Running 5 2d18h
metacontroller-0 1/1 Running 6 2d18h
metadata-envoy-deployment-7b847ff6c5-d2fjv 1/1 Running 5 2d18h
metadata-grpc-deployment-6f6f7776c5-2vqp6 2/2 Running 21 2d18h
metadata-writer-78fc7d5bb8-q8hfq 2/2 Running 11 2d18h
minio-5b65df66c9-fttpm 2/2 Running 10 2d18h
ml-pipeline-75b5c59d7f-k7mm7 2/2 Running 59 2d18h
ml-pipeline-persistenceagent-87b6888c4-swv8k 2/2 Running 10 2d18h
ml-pipeline-scheduledworkflow-665847bb9-4b5vr 2/2 Running 10 2d18h
ml-pipeline-ui-68cc764f66-892rz 2/2 Running 14 2d18h
ml-pipeline-viewer-crd-68777557fb-6lq88 2/2 Running 16 2d18h
ml-pipeline-visualizationserver-58ccb76855-qz2rc 2/2 Running 12 2d18h
mysql-f7b9b7dd4-2dpqv 2/2 Running 10 2d18h
notebook-controller-deployment-6c5f5d6cfc-mxmzw 2/2 Running 17 2d18h
profiles-deployment-5cdc5dc577-szhjk 3/3 Running 61 2d18h
tensorboard-controller-controller-manager-5cbddb7fb5-xgq2v 3/3 Running 21 2d18h
tensorboards-web-app-deployment-7c5db448d7-t8xqp 1/1 Running 5 2d18h
training-operator-7b8cc9865d-qr8hm 1/1 Running 7 2d18h
volumes-web-app-deployment-87484c848-qvsnc 1/1 Running 5 2d18h
workflow-controller-6bf87db995-snfdn 2/2 Running 20 2d18h
Installed the Kubeflow SDK on the local laptop:
$ pip list | grep kfp
kfp 1.8.12
kfp-pipeline-spec 0.1.15
kfp-server-api 1.8.1
Connected to the Kubeflow pipeline as per "Connecting to Kubeflow Pipelines using the SDK client":
kubectl port-forward svc/ml-pipeline-ui 3000:80 --namespace kubeflow
Verified the pipeline UI appears as described in the document:
You can verify that port forwarding is working properly by visiting http://localhost:3000 in your browser. If port forwarding is working properly, the Kubeflow Pipelines UI appears.
Ran the code:
import kfp
client = kfp.Client(host='http://localhost:3000', namespace='kubeflow')
print(client.list_experiments(namespace='kubeflow'))
Got the error.
Traceback (most recent call last):
File "connect_kubeflow_pipeline.py", line 8, in <module>
print(client.list_experiments(namespace='kubeflow'))
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp/_client.py", line 540, in list_experiments
filter=filter)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api/experiment_service_api.py", line 567, in list_experiment
return self.list_experiment_with_http_info(**kwargs) # noqa: E501
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api/experiment_service_api.py", line 682, in list_experiment_with_http_info
collection_formats=collection_formats)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 369, in call_api
_preload_content, _request_timeout, _host)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 188, in __call_api
raise e
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 185, in __call_api
_request_timeout=_request_timeout)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 393, in request
headers=headers)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/rest.py", line 234, in GET
query_params=query_params)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/rest.py", line 224, in request
raise ApiException(http_resp=r)
kfp_server_api.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'X-Powered-By': 'Express', 'content-type': 'application/json', 'date': 'Tue, 24 May 2022 04:58:42 GMT', 'x-envoy-upstream-service-time': '2', 'server': 'envoy', 'connection': 'close', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).canAccessExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:249\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:148\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:150\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357","code":13,"message":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource 
references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).canAccessExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:249\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:148\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:150\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357","details":[{"#type":"type.googleapis.com/api.Error","error_message":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource 
references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).canAccessExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:249\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:148\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:150\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357","error_details":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource 
references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).canAccessExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:249\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:148\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:150\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"}]}
I then found "How to do programmatic authentication with Dex? #140". By modifying the code as per the solution provided there, it worked:
import requests
import kfp
import kfp.dsl as dsl
from kfp.components import create_component_from_func

# Does not work
#import kfp
#client = kfp.Client(host='http://localhost:3000', namespace='kubeflow')
#print(client.list_experiments(namespace='kubeflow'))

# --------------------------------------------------------------------------------
# https://github.com/kubeflow/kfctl/issues/140#issuecomment-719894529
# How to do programmatic authentication with Dex? #140
# --------------------------------------------------------------------------------
HOST = "http://localhost:8080/"
USERNAME = "user@example.com"
PASSWORD = "12341234"
NAMESPACE = "kubeflow-user-example-com"

# Authenticate against Dex and grab the session cookie
session = requests.Session()
response = session.get(HOST)
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
}
data = {"login": USERNAME, "password": PASSWORD}
session.post(response.url, headers=headers, data=data)
session_cookie = session.cookies.get_dict()["authservice_session"]

client = kfp.Client(
    host=f"{HOST}/pipeline",
    cookies=f"authservice_session={session_cookie}",
    namespace=NAMESPACE,
)
print(client.list_pipelines())


def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    return a + b


add_op = create_component_from_func(
    add, output_component_file='add_component.yaml')


@dsl.pipeline(
    name='Addition pipeline',
    description='An example pipeline that performs addition calculations.'
)
def add_pipeline(
    a='1',
    b='7',
):
    # Passes a pipeline parameter and a constant value to the `add_op` factory
    # function.
    first_add_task = add_op(a, 4)
    # Passes an output reference from `first_add_task` and a pipeline parameter
    # to the `add_op` factory function. For operations with a single return
    # value, the output reference can be accessed as `task.output` or
    # `task.outputs['output_name']`.
    second_add_task = add_op(first_add_task.output, b)


# Specify argument values for your pipeline run.
arguments = {'a': '7', 'b': '8'}

# Create a pipeline run, using the client you initialized in a prior step.
#client.create_run_from_pipeline_func(add_pipeline, arguments=arguments)

kfp.compiler.Compiler().compile(
    pipeline_func=add_pipeline,
    package_path='pipeline.yaml')

SCDF server failing to start with the k8s version 1.20

We recently upgraded to k8s version 1.20.9 and, while I'm not sure that is the root cause, the SCDF server pod fails to come up with the error below.
I usually deploy the SCDF server using the kubectl-based deployment.
Does anyone have any idea? The error is attached below.
2022-01-05 05:08:56.207  INFO 1 --- [main] o.a.coyote.http11.Http11NioProtocol : Starting ProtocolHandler ["http-nio-80"]
2022-01-05 05:08:56.300  WARN 1 --- [main] ConfigServletWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.context.ApplicationContextException: Failed to start bean 'webServerStartStop'; nested exception is org.springframework.boot.web.server.WebServerException: Unable to start embedded Tomcat server
2022-01-05 05:08:56.798  INFO 1 --- [main] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-01-05 05:08:56.893  INFO 1 --- [main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2022-01-05 05:08:57.194  INFO 1 --- [main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
2022-01-05 05:08:57.197  INFO 1 --- [main] o.a.coyote.http11.Http11NioProtocol : Pausing ProtocolHandler ["http-nio-80"]
2022-01-05 05:08:57.197  INFO 1 --- [main] o.apache.catalina.core.StandardService : Stopping service [Tomcat]
2022-01-05 05:08:57.292  INFO 1 --- [main] o.a.coyote.http11.Http11NioProtocol : Stopping ProtocolHandler ["http-nio-80"]
2022-01-05 05:08:57.293  INFO 1 --- [main] o.a.coyote.http11.Http11NioProtocol : Destroying ProtocolHandler ["http-nio-80"]
2022-01-05 05:08:57.793 ERROR 1 --- [main] o.s.boot.SpringApplication : Application run failed
org.springframework.context.ApplicationContextException: Failed to start bean 'webServerStartStop'; nested exception is org.springframework.boot.web.server.WebServerException: Unable to start embedded Tomcat server
Caused by: org.springframework.boot.web.server.WebServerException: Unable to start embedded Tomcat server
Caused by: java.lang.IllegalArgumentException: standardService.connector.startFailed
Caused by: org.apache.catalina.LifecycleException: Protocol handler start failed
Caused by: java.net.SocketException: Permission denied
What stands out in the trace is SocketException: Permission denied. It is likely due to some security configuration change in the upgrade affecting the TCP layer. I would start with your security configuration. Keep us posted.
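As an illustration of the kind of thing to check (not a confirmed fix for this particular cluster): the trace shows Tomcat binding to port 80 ("http-nio-80"), and on Linux a non-root process can only bind to ports below 1024 if it has the NET_BIND_SERVICE capability. Two alternative ways to rule that out in the server deployment manifest, sketched below (the container name and values are placeholders):
      containers:
        - name: scdf-server
          # Alternative 1: move the server to an unprivileged port
          # (the service targetPort must then be changed to match)
          env:
            - name: SERVER_PORT
              value: "8080"
          # Alternative 2: keep port 80 but allow binding to it as non-root
          securityContext:
            capabilities:
              add: ["NET_BIND_SERVICE"]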

Jenkins agent jobs killed when running Spring Boot Tests with DB

I am running a self-hosted OKD 4 cluster with minimum production requirements (three control planes and two compute nodes). This setup includes a Jenkins installation, installed via Helm (https://www.jenkins.io/doc/book/installing/kubernetes/). So far everything has worked fine: builds start automatically when changes are pushed to GitHub and, when they are successful, are deployed to the same cluster that Jenkins runs in.
But currently I am facing a problem when a build job executes a Spring Boot test which fires up a persistence context: the build agent (a jdk-11 image, see the additionalAgents configuration below) gets killed as soon as Spring starts up the persistence context. Downloading dependencies and compilation work fine, by the way.
additionalAgents:
  jdk-11:
    podName: jdk-11
    customJenkinsLabels: jdk-11
    image: jenkins/jnlp-agent-jdk11
    tag: latest
...
When the tests are disabled the job runs fine, but as soon as the persistence layer gets initialised the agent gets killed.
These are the configurations I have tried for the test:
Starting with an in-memory h2 database and flyway provisioning.
Without flyway provisioning.
Even without the database connection string set.
The point at which the job gets killed is almost the same:
For 1. it is
2021-10-20 22:44:06.637 INFO 299 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2021-10-20 22:44:07.032 INFO 299 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 310 ms. Found 2 JPA repository interfaces.
2021-10-20 22:44:08.240 INFO 299 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=1c9e8306-7514-338e-8a9f-3cfba5c1169b
2021-10-20 22:44:10.527 INFO 299 --- [ main] o.f.c.internal.license.VersionPrinter : Flyway Community Edition 7.7.3 by Redgate
2021-10-20 22:44:10.532 INFO 299 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2021-10-20 22:44:11.744 INFO 299 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2021-10-20 22:44:12.041 INFO 299 --- [ main] o.f.c.i.database.base.DatabaseType : Database: jdbc:h2:mem:testdb (H2 1.4)
Killed
For 2.
2021-10-21 19:50:51.604 INFO 306 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2021-10-21 19:50:52.005 INFO 306 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 391 ms. Found 2 JPA repository interfaces.
2021-10-21 19:50:53.510 INFO 306 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=0fd77ef3-b5a2-35cb-b157-6d27c0cfe9a5
2021-10-21 19:50:56.405 INFO 306 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]
2021-10-21 19:50:56.708 INFO 306 --- [ main] org.hibernate.Version : HHH000412: Hibernate ORM core version 5.4.32.Final
2021-10-21 19:50:57.503 INFO 306 --- [ main] o.hibernate.annotations.common.Version : HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
Killed
And for 3.
2021-10-21 22:02:48.810 INFO 309 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2021-10-21 22:02:49.198 INFO 309 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 380 ms. Found 2 JPA repository interfaces.
2021-10-21 22:02:50.509 INFO 309 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=0fd77ef3-b5a2-35cb-b157-6d27c0cfe9a5
2021-10-21 22:02:53.523 INFO 309 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]
2021-10-21 22:02:53.898 INFO 309 --- [ main] org.hibernate.Version : HHH000412: Hibernate ORM core version 5.4.32.Final
Killed
The log of the Jenkins pod just states
Terminated Kubernetes instance for agent jenkins/jdk-11-bjtz5
Disconnected computer jdk-11-bjtz5
2021-10-21 22:02:57.342+0000 [id=465] INFO o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent jenkins/jdk-11-bjtz5
2021-10-21 22:02:57.342+0000 [id=465] INFO o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer jdk-11-bjtz5
2021-10-21 22:02:57.356+0000 [id=436] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting [#56] for jdk-11-bjtz5 terminated: java.nio.channels.ClosedChannelException
In all cases there are no exceptions, stack traces or suspicious events. And these steps are reproducible: when I run the build with the same configuration again, the agent gets killed at exactly the same step in the test.
The setup:
Jenkins Version 2.303.2
Jenkins uses a MySQL database running in the same cluster
all Jenkins plugins are up-to-date
OKD currently running at version 4.8.0-0.okd-2021-10-10-030117
currently there are no resource quotas set and the system still has plenty of free resources
I am presuming that a little bit of configuration is missing to make this work, but I just cannot find what it could be. So I am asking: has anyone had the same issue here? Or any guesses as to what the missing part could be?
If there is some information missing, please point it out and I will add it.
After taking a small step back I looked at the actual pod which runs the build and found out that the memory limit of the agents was the problem.
So increasing the limit solved the problem!
I did that by modifying the local jenkins-values.yaml and updating the limits section of the agent: block in it (see the sketch below).
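This is roughly the part of jenkins-values.yaml that was changed (a sketch; the request values and the new limit are illustrative, not the exact numbers from my cluster):
agent:
  resources:
    requests:
      cpu: "512m"
      memory: "512Mi"
    limits:
      cpu: "1"
      memory: "1Gi"    # raised from the previous 512Mi limit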
A little confusing for me was the fact that no log entry stated that the memory limit had been exceeded.
My next thought is to set a memory limit for the test step via Java options, so that the Maven process fails before the pod exceeds the limit (see the sketch below). I guess that would be more transparent in the build.
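A minimal sketch of that idea, assuming the tests run through the Maven Surefire plugin (the 384m value is just an illustration):
# Cap the heap of the forked test JVM so it fails with an OutOfMemoryError
# inside the build instead of the whole pod being killed at the limit
mvn test -DargLine="-Xmx384m"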
And as a side note: the limit had been set to 512Mi previously and was exceeded by ~10 MiB -.-
Luckily I found it at this point; the other build jobs were only running fine because of their lower resource usage (I hadn't figured that just starting Hibernate would push past the 512 MiB mark).

PassengerTempDir issue ERROR: Phusion Passenger doesn't seem to be running

I am having trouble trying to set PassengerTempDir.
I have Redmine on Apache + mod_passenger, on CentOS. In Redmine I get a 500 Internal Error while uploading files. While researching the issue I found that I should change PassengerTempDir to a custom folder, because of
wrong permissions set in the webserver_private directory of Passenger.
(See section 6.6, PassengerTempDir, of the Passenger documentation.)
As a test I created the folder /home/tmp_passenger and set permissions 777 on it.
Next I ran export PASSENGER_TMPDIR=/tmp_passenger
The result is:
passenger-status
ERROR: Phusion Passenger doesn't seem to be running.
For comparison, this is what I got before running export PASSENGER_TMPDIR=/tmp_passenger:
passenger-status
Version : 4.0.53
Date : 2014-12-29 12:43:36 +0100
Instance: 1416
----------- General information -----------
Max pool size : 20
Processes : 1
Requests in top-level queue : 0
----------- Application groups -----------
/home/admin/web/MYDOMAIN/public_html/redmine#default:
App root: /home/admin/web/MYDOMAIN/public_html/redmine
Requests in queue: 0
* PID: 6440 Sessions: 0 Processed: 8 Uptime: 22m 1s
CPU: 0% Memory : 52M Last used: 10m 29s ago
And this is what I get after running export PASSENGER_TMPDIR=/tmp_passenger:
passenger-status
ERROR: Phusion Passenger doesn't seem to be running.
Please help me resolve this issue. What should I do now?
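For completeness, I understand from the PassengerTempDir section of the docs that the directory can also be set with an Apache directive rather than the environment variable. A sketch of what I could try (the config file path is an assumption for a CentOS setup; adjust to wherever mod_passenger is configured):
# Add the directive to the Apache/Passenger configuration and restart Apache
echo 'PassengerTempDir /home/tmp_passenger' | sudo tee -a /etc/httpd/conf.d/passenger.conf
sudo service httpd restart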
