kubeflow dsl-compile - kfp_server_api.exceptions.ApiException: (400) - kubeflow

Please help me understand what is causing this error and how to resolve it.
The GitHub issue Kubeflow sdk - error in client.list_experiments() #6120 appears to describe the same problem, and its author seems to have fixed it.
I received feedback from the developers (see the closed issue): this is one of the current caveats of multi-user mode (see the documentation), and this usage is now being supported through #5138.
However, I could not figure out exactly what the cause is or how to fix it. It looks like Connecting to Kubeflow Pipelines using the SDK client covers the relevant configuration, but I am not sure exactly what I need to do.
Reproduction steps
Deployed Minikube on a remote instance and set up the kubectl connection.
Deployed Kubeflow 1.5.0 by following Install with a single command.
Verified the connection from the local laptop and confirmed the pods are running.
$ kubectl get pods -n kubeflow
NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-7df7558c67-drdzw 1/1 Running 5 2d18h
cache-deployer-deployment-6f4bcc969-8kpm6 2/2 Running 15 2d18h
cache-server-575d97c95-k7rv4 2/2 Running 10 2d18h
centraldashboard-5dd4f57bbd-gcxn5 2/2 Running 10 2d18h
jupyter-web-app-deployment-5886974887-8c2cf 1/1 Running 5 2d18h
katib-controller-58ddb4b856-mzq2l 1/1 Running 46 2d18h
katib-db-manager-6df878f5b8-c9dhr 1/1 Running 8 2d18h
katib-mysql-6dcb447c6f-lz5b8 1/1 Running 5 2d18h
katib-ui-f787b9d88-8h25n 1/1 Running 6 2d18h
kfserving-controller-manager-0 2/2 Running 50 2d18h
kfserving-models-web-app-7884f597cf-m9n59 2/2 Running 10 2d18h
kserve-models-web-app-5c64c8d8bb-bpdsb 2/2 Running 10 2d18h
kubeflow-pipelines-profile-controller-84bcbdb899-669hr 1/1 Running 5 2d18h
metacontroller-0 1/1 Running 6 2d18h
metadata-envoy-deployment-7b847ff6c5-d2fjv 1/1 Running 5 2d18h
metadata-grpc-deployment-6f6f7776c5-2vqp6 2/2 Running 21 2d18h
metadata-writer-78fc7d5bb8-q8hfq 2/2 Running 11 2d18h
minio-5b65df66c9-fttpm 2/2 Running 10 2d18h
ml-pipeline-75b5c59d7f-k7mm7 2/2 Running 59 2d18h
ml-pipeline-persistenceagent-87b6888c4-swv8k 2/2 Running 10 2d18h
ml-pipeline-scheduledworkflow-665847bb9-4b5vr 2/2 Running 10 2d18h
ml-pipeline-ui-68cc764f66-892rz 2/2 Running 14 2d18h
ml-pipeline-viewer-crd-68777557fb-6lq88 2/2 Running 16 2d18h
ml-pipeline-visualizationserver-58ccb76855-qz2rc 2/2 Running 12 2d18h
mysql-f7b9b7dd4-2dpqv 2/2 Running 10 2d18h
notebook-controller-deployment-6c5f5d6cfc-mxmzw 2/2 Running 17 2d18h
profiles-deployment-5cdc5dc577-szhjk 3/3 Running 61 2d18h
tensorboard-controller-controller-manager-5cbddb7fb5-xgq2v 3/3 Running 21 2d18h
tensorboards-web-app-deployment-7c5db448d7-t8xqp 1/1 Running 5 2d18h
training-operator-7b8cc9865d-qr8hm 1/1 Running 7 2d18h
volumes-web-app-deployment-87484c848-qvsnc 1/1 Running 5 2d18h
workflow-controller-6bf87db995-snfdn 2/2 Running 20 2d18h
Installed the Kubeflow Pipelines SDK on the local laptop.
$ pip list | grep kfp
kfp 1.8.12
kfp-pipeline-spec 0.1.15
kfp-server-api 1.8.1
Connected to Kubeflow Pipelines as described in Connecting to Kubeflow Pipelines using the SDK client.
kubectl port-forward svc/ml-pipeline-ui 3000:80 --namespace kubeflow
Verified the pipeline UI appears as described in the document:
You can verify that port forwarding is working properly by visiting http://localhost:3000 in your browser. If port forwarding is working properly, the Kubeflow Pipelines UI appears.
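As an extra sanity check (not part of the original steps), the same thing can be confirmed from Python before creating the client; a minimal sketch, assuming the requests package is installed:
import requests

# Hedged sanity check: confirm the port-forward answers before using the SDK.
resp = requests.get("http://localhost:3000")
print(resp.status_code)  # expect 200 when the Kubeflow Pipelines UI is reachable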
Run the code.
import kfp
client = kfp.Client(host='http://localhost:3000', namespace='kubeflow')
print(client.list_experiments(namespace='kubeflow'))
Got the error.
Traceback (most recent call last):
File "connect_kubeflow_pipeline.py", line 8, in <module>
print(client.list_experiments(namespace='kubeflow'))
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp/_client.py", line 540, in list_experiments
filter=filter)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api/experiment_service_api.py", line 567, in list_experiment
return self.list_experiment_with_http_info(**kwargs) # noqa: E501
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api/experiment_service_api.py", line 682, in list_experiment_with_http_info
collection_formats=collection_formats)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 369, in call_api
_preload_content, _request_timeout, _host)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 188, in __call_api
raise e
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 185, in __call_api
_request_timeout=_request_timeout)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 393, in request
headers=headers)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/rest.py", line 234, in GET
query_params=query_params)
File "/Users/1245095/venv/ml/lib/python3.7/site-packages/kfp_server_api/rest.py", line 224, in request
raise ApiException(http_resp=r)
kfp_server_api.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'X-Powered-By': 'Express', 'content-type': 'application/json', 'date': 'Tue, 24 May 2022 04:58:42 GMT', 'x-envoy-upstream-service-time': '2', 'server': 'envoy', 'connection': 'close', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).canAccessExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:249\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:148\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:150\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357","code":13,"message":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource 
references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).canAccessExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:249\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:148\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:150\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357","details":[{"#type":"type.googleapis.com/api.Error","error_message":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource 
references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).canAccessExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:249\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:148\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:150\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357","error_details":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource 
references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).canAccessExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:249\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:148\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:279\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer).ListExperiment\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/experiment_server.go:150\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler.func1\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1089\nmain.apiServerInterceptor\n\t/go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30\ngithub.com/kubeflow/pipelines/backend/api/go_client._ExperimentService_ListExperiment_Handler\n\t/go/src/github.com/kubeflow/pipelines/backend/api/go_client/experiment.pb.go:1091\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc#v1.38.0/server.go:934\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"}]}

The error body shows the API server rejecting the request because there is no user identity header; in multi-user mode that header is normally injected by the Istio/Dex authentication chain, which a plain port-forward to ml-pipeline-ui bypasses. Found How to do programmatic authentication with Dex? #140; after modifying the code as per the solution provided there, it worked.
import requests
import kfp
import kfp.dsl as dsl
from kfp.components import create_component_from_func

# Does not work
# import kfp
# client = kfp.Client(host='http://localhost:3000', namespace='kubeflow')
# print(client.list_experiments(namespace='kubeflow'))

# --------------------------------------------------------------------------------
# https://github.com/kubeflow/kfctl/issues/140#issuecomment-719894529
# How to do programmatic authentication with Dex? #140
# --------------------------------------------------------------------------------
# HOST is the Dex-protected Kubeflow endpoint (e.g. the Istio ingress gateway
# port-forwarded locally), not ml-pipeline-ui.
HOST = "http://localhost:8080"
USERNAME = "user@example.com"
PASSWORD = "12341234"
NAMESPACE = "kubeflow-user-example-com"

# Log in through Dex and grab the session cookie.
session = requests.Session()
response = session.get(HOST)
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
}
data = {"login": USERNAME, "password": PASSWORD}
session.post(response.url, headers=headers, data=data)
session_cookie = session.cookies.get_dict()["authservice_session"]

# Pass the cookie so the API server receives a user identity.
client = kfp.Client(
    host=f"{HOST}/pipeline",
    cookies=f"authservice_session={session_cookie}",
    namespace=NAMESPACE,
)
print(client.list_pipelines())


def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    return a + b


add_op = create_component_from_func(
    add, output_component_file='add_component.yaml')


@dsl.pipeline(
    name='Addition pipeline',
    description='An example pipeline that performs addition calculations.'
)
def add_pipeline(
    a='1',
    b='7',
):
    # Passes a pipeline parameter and a constant value to the `add_op` factory
    # function.
    first_add_task = add_op(a, 4)
    # Passes an output reference from `first_add_task` and a pipeline parameter
    # to the `add_op` factory function. For operations with a single return
    # value, the output reference can be accessed as `task.output` or
    # `task.outputs['output_name']`.
    second_add_task = add_op(first_add_task.output, b)


# Specify argument values for your pipeline run.
arguments = {'a': '7', 'b': '8'}

# Create a pipeline run, using the client you initialized in a prior step.
# client.create_run_from_pipeline_func(add_pipeline, arguments=arguments)
kfp.compiler.Compiler().compile(
    pipeline_func=add_pipeline,
    package_path='pipeline.yaml')
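For completeness, the compiled pipeline.yaml can also be uploaded and run through the same authenticated client. This is only a sketch using kfp 1.8.x client methods; the experiment and run names are illustrative, not from the original post.
# Hypothetical follow-up: run the compiled package through the authenticated,
# namespace-aware client created above.
experiment = client.create_experiment(name="addition-demo", namespace=NAMESPACE)
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name="add-pipeline-run",
    pipeline_package_path="pipeline.yaml",
    params=arguments,
)
print(run.id)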

Related

UWSGI Works Within Network But Not Over Domain

I have an RPi running NGINX and uWSGI, serving a web page and an API via uWSGI.
The web page works fine, both locally and from the web.
The API works locally, but not via the web. My guess is that it's either the router or the NGINX configuration.
I am using cloudflare for the DNS, and all appears fine there.
I can GET / POST locally using Postman, but not via the web address. I would greatly appreciate any ideas on where to look.
Output from uwsgi is:
*** Starting uWSGI 2.0.20 (32bit) on [Sat May 14 12:35:08 2022] ***
compiled with version: 8.3.0 on 06 October 2021 05:59:48
os: Linux-5.10.103-v7l+ #1529 SMP Tue Mar 8 12:24:00 GMT 2022
nodename: xxx
machine: armv7l
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /var/www/xxx.xxx/public
detected binary path: /home/pi/.local/bin/uwsgi
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 12393
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on :9090 fd 4
spawned uWSGI http 1 (pid: 3176)
uwsgi socket 0 bound to TCP address 127.0.0.1:34881 (port auto-assigned) fd 3
Python version: 3.7.3 (default, Jan 22 2021, 20:04:44) [GCC 8.3.0]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0xd5c950
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 64408 bytes (62 KB) for 1 cores
*** Operational MODE: single process ***
<<<<<<<<<<<<<<<< Loaded script >>>>>>>>>>>>>>>>
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0xd5c950 pid: 3175 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (and the only) (pid: 3175, cores: 1)
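No answer was posted for this one, but one way to narrow down whether the Cloudflare/router side or the NGINX/uWSGI side is dropping the API calls is to compare a request made directly against uWSGI's HTTP listener (port 9090 in the log above) with one made through the public hostname. The endpoint path and domain below are placeholders, not from the original post:
curl -i http://127.0.0.1:9090/api/some-endpoint    # run on the RPi, bypassing NGINX and Cloudflare
curl -i https://example.com/api/some-endpoint      # run from outside, through Cloudflare and NGINX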

Spring Cloud Skipper errors out immediately after start on local MicroK8s

I'm trying to deploy the entire Spring Cloud Data Flow platform to a MicroK8s cluster running on one of our servers, a VM with Ubuntu 20.04. Before performing any actions on the target server, I tried to deploy it on my local computer (same OS) and even succeeded, creating and running one stream. Nevertheless, I am currently experiencing an error both on my local computer and on the VM, and I can't manage to pinpoint the root cause.
My current situation:
I'm following the official guide for deploying SCDF using kubectl, the only difference being that I'm using tag v2.9.4, the latest at the time of writing, instead of v2.9.1. I also skipped the configuration of the monitoring frameworks and hence commented out the relevant lines in the SCDF server configuration, as suggested in the docs. The Kafka message broker and MySQL database are deployed without issues.
However, after executing the kubectl commands to create the config map, service and deployment for Skipper, I can see that the Skipper pod goes into status "CrashLoopBackOff". Checking the logs of the pod, the only thing I see is that the application is terminated right after it seems to have started:
[...]
2022-04-11 15:00:11.713 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:00:11.907 INFO 1 --- [ main] o.s.c.s.s.app.SkipperServerApplication : Started SkipperServerApplication in 78.901 seconds (JVM running for 82.435)
2022-04-11 15:00:12.531 INFO 1 --- [ionShutdownHook] o.s.s.s.DefaultStateMachineService : Entering stop sequence, stopping all managed machines
2022-04-11 15:00:12.617 INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-04-11 15:00:12.703 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2022-04-11 15:00:12.799 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
Native Memory Tracking:
Total: reserved=961864767, committed=325411903
- Java Heap (reserved=356515840, committed=138334208)
(mmap: reserved=356515840, committed=138334208)
- Class (reserved=269444100, committed=94409732)
(classes #17623)
( instance classes #16455, array classes #1168)
(malloc=3355652 #45645)
(mmap: reserved=266088448, committed=91054080)
( Metadata: )
( reserved=79691776, committed=78340096)
( used=76414680)
( free=1925416)
( waste=0 =0.00%)
( Class space:)
( reserved=186396672, committed=12713984)
( used=11544696)
( free=1169288)
( waste=0 =0.00%)
- Thread (reserved=14794856, committed=1323112)
(thread #14)
(stack: reserved=14729216, committed=1257472)
(malloc=51792 #86)
(arena=13848 #25)
- Code (reserved=255686068, committed=26629556)
(malloc=2053556 #8654)
(mmap: reserved=253632512, committed=24576000)
- GC (reserved=1728178, committed=1019570)
(malloc=560818 #2163)
(mmap: reserved=1167360, committed=458752)
- Compiler (reserved=35543622, committed=35543622)
(malloc=71174 #1162)
(arena=35472448 #19)
- Internal (reserved=432627, committed=432627)
(malloc=399859 #1104)
(mmap: reserved=32768, committed=32768)
- Other (reserved=10248, committed=10248)
(malloc=10248 #3)
- Symbol (reserved=22101496, committed=22101496)
(malloc=19867360 #240000)
(arena=2234136 #1)
- Native Memory Tracking (reserved=4899928, committed=4899928)
(malloc=9656 #122)
(tracking overhead=4890272)
- Arena Chunk (reserved=81808, committed=81808)
(malloc=81808)
- Tracing (reserved=1, committed=1)
(malloc=1 #1)
- Logging (reserved=4572, committed=4572)
(malloc=4572 #192)
- Arguments (reserved=19063, committed=19063)
(malloc=19063 #495)
- Module (reserved=310496, committed=310496)
(malloc=310496 #2710)
- Synchronizer (reserved=283672, committed=283672)
(malloc=283672 #2348)
- Safepoint (reserved=8192, committed=8192)
(mmap: reserved=8192, committed=8192)
No matter how many times the pod is restarted, it always exits at this phase. This is the output of kubectl get all:
NAME READY STATUS RESTARTS AGE
pod/kafka-zk-6b6f4976cf-9hjzn 1/1 Running 0 69m
pod/kafka-broker-0 1/1 Running 0 58m
pod/mysql-7c57b4cfdf-njb97 1/1 Running 0 39m
pod/skipper-b46bfd5fd-wrnqv 0/1 CrashLoopBackOff 13 (57s ago) 38m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 148m
service/kafka-zk ClusterIP 10.152.183.62 <none> 2181/TCP,2888/TCP,3888/TCP 69m
service/kafka-broker ClusterIP None <none> 9092/TCP 69m
service/mysql ClusterIP 10.152.183.139 <none> 3306/TCP 40m
service/skipper LoadBalancer 10.152.183.250 <pending> 80:31955/TCP 38m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kafka-zk 1/1 1 1 69m
deployment.apps/mysql 1/1 1 1 39m
deployment.apps/skipper 0/1 1 0 38m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kafka-zk-6b6f4976cf 1 1 1 69m
replicaset.apps/mysql-7c57b4cfdf 1 1 1 39m
replicaset.apps/skipper-b46bfd5fd 1 1 0 38m
NAME READY AGE
statefulset.apps/kafka-broker 1/1 69m
What I tried:
Changing the Skipper service type from LoadBalancer to NodePort (I have not enabled MetalLB, so load balancing is not provided), but it didn't work;
Changing the port exposed by the container; in the default configuration it is port 80, and I changed it to 7577 (also in the service configuration), but the error still occurs;
Downgrading to version 2.8.2 of Skipper, the same as in the documentation above; the behaviour was exactly the same;
Increasing the logging level by setting logging.level.org.springframework to DEBUG and then to TRACE, which didn't result in anything useful showing up in the logs, except a cryptic line which I could not find anywhere on Google:
[...]
2022-04-11 15:22:38.818 DEBUG 1 --- [ main] o.s.c.c.CompositeCompatibilityVerifier : All conditions are passing
2022-04-11 15:22:39.098 DEBUG 1 --- [ main] ocalVariableTableParameterNameDiscoverer : Cannot find '.class' file for class [class org.springframework.statemachine.boot.autoconfigure.StateMachineAutoConfiguration$StateMachineMonitoringConfiguration$$EnhancerBySpringCGLIB$$b266f314] - unable to determine constructor/method parameter names
2022-04-11 15:22:39.925 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:22:40.244 INFO 1 --- [ main] o.s.c.s.s.app.SkipperServerApplication : Started SkipperServerApplication in 76.267 seconds (JVM running for 79.716)
[...]
Can anyone suggest what to try next, or give me some way to further diagnose this issue?
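Not from the original thread, but two generic ways to get more signal out of a CrashLoopBackOff pod are to check the container's last state (for example, whether it was OOMKilled) and to read the logs of the previous attempt; the pod name below is the one shown in the kubectl get all output above:
kubectl describe pod skipper-b46bfd5fd-wrnqv | grep -A 10 "Last State"
kubectl logs skipper-b46bfd5fd-wrnqv --previous
kubectl get events --sort-by=.lastTimestamp | grep skipper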

Why are Google Pipeline VM instances hanging indefinitely?

I am using Dockerflow to run parallel tasks through the Google Pipelines API on Google Cloud Platform. I started a single-step task running 1389 VMs in parallel and found that 233 of the VMs were apparently doing nothing and hanging indefinitely.
I did a spot check of the serial console output and repeatedly saw the VMs running into "Getting controller config failed" errors.
When I tried logging into the VMs I received the error: "Connection Failed. We are unable to connect to the VM on port 22".
I am wondering why my VM instances are hanging, and if there is something I can do to avoid running into these issues.
I've included a snippet of the serial console output below:
startupscript: +++ readlink -f /usr/share/google-genomics/startup.sh
startupscript: ++ dirname /usr/share/google-genomics/startup.sh
startupscript: + cd /usr/share/google-genomics
startupscript: + ./controller --operation_id <id> --validation_token <token> --base_path https://genomics.googleapis.com
create controller[2905]: Getting controller config
create controller[2905]: Getting controller config failed, will retry: Get <link>: Get <service_account_token_link>: net/http: timeout awaiting response headers
create controller[2905]: Getting controller config failed, will retry: Get <link>: dial tcp 74.125.26.95:443: i/o timeout
collectd[2342]: write_gcm: Asking metadata server for auth token
collectd[2342]: write_gcm: curl_easy_perform() failed: Couldn't connect to server
collectd[2342]: write_gcm: Error -1 from wg_curl_get_or_post
collectd[2342]: write_gcm: wg_transmit_unique_segment failed.
collectd[2342]: write_gcm: wg_transmit_unique_segments failed. Flushing.
There was a temporary networking issue in us-east1-b. All 3 VMs above were in us-east1-b. These minor incidents do not appear on https://status.cloud.google.com/
Serial console output for a successful run looks like:
A Feb 21 19:05:06 ggp-5629907348021283130 startupscript: + ./controller --operation_id --validation_token --base_path https://autopush-genomics.sandbox.googleapis.com
A Feb 21 19:05:06 ggp-5629907348021283130 create controller[2689]: Getting controller config
A Feb 21 19:05:36 ggp-5629907348021283130 create controller[2689]: Getting controller config failed, will retry: Get https://genomics.googleapis.com/v1alpha2/pipelines:getControllerConfig?alt=json&operationId=&validationToken=: dial tcp 173.194.212.81:443: i/o timeout
A Feb 21 19:05:43 ggp-5629907348021283130 controller[2689]: Switching to status: pulling-image
A Feb 21 19:05:43 ggp-5629907348021283130 controller[2689]: Calling SetOperationStatus(pulling-image)
A Feb 21 19:05:44 ggp-5629907348021283130 controller[2689]: SetOperationStatus(pulling-image) succeeded
The "Getting controller config failed, will retry" is fine. It succeeded upon retry. The "SetOperationStatus(pulling-image) succeeded" indicates networking is working.
In theory, you can submit any number of jobs to Pipelines API and the API will take care of queueing.
If these temporary networking hiccups become common, we may consider changing Pipelines API to somehow detect and retry.
There may have been a temporary networking issue. Can you give me some failed operation ids (or failed VM names)?
Have you tried again since then; can you reproduce the problem?

Random failure of creating a New Cassandra Cluster using OpsCenter

OpsCenter version: 5.1.0 and
DSE Version: 4.6.0
Creating a brand new cluster using OpsCenter directly gives us the following error. It randomly works with the same settings, but 95% of the time it fails with the same error. OpsCenter is running on its own box but shares the same security groups as the cluster instances. For good measure, I have opened up all TCP ports to all IPs. The following is the stack trace of the error from the opscenterd.log:
2015-03-19 10:06:12+0000 [] INFO: Starting provisioning process
2015-03-19 10:06:12+0000 [] INFO: Starting installation phase of cluster provisioning
2015-03-19 10:06:13+0000 [] WARN: HTTP request http://10.x.x.x:61621/alive? failed: Connection was refused by other side: 111: Connection refused.
2015-03-19 10:06:13+0000 [] INFO: Beginning install of OpsCenter agent to 54.x.x.x
2015-03-19 10:06:26+0000 [] WARN: HTTP request http://10.x.x.x:61621/alive? failed: Connection was refused by other side: 111: Connection refused.
2015-03-19 10:06:31+0000 [] INFO: Agent for ip 10.x.x.x is version None
2015-03-19 10:06:31+0000 [] INFO: Agent for ip 10.x.x.x is version u'5.1.0'
2015-03-19 10:07:23+0000 [] INFO: Successfully installed agent and dse on node 10.x.x.x
2015-03-19 10:07:23+0000 [] INFO: Beginning "stop" phase of cluster provisioning
2015-03-19 10:07:25+0000 [] WARN: Marking request '10.x.x.x: /ops/stop' (f6708fa2-b45f-42b4-b992-90a82b460ac7) as failed: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] ERROR: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] WARN: Marking request 'stop stage' (0b6fcb6b-96ba-404e-a484-b4b6b167b309) as failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] ERROR: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] WARN: Marking request 'provision' (daf1c15d-92e3-40b0-83ca-34d548ea835b) as failed: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] ERROR:
2015-03-19 10:07:25+0000 [] ERROR: Cluster provisioning failed: Exception: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] ERROR: Failed to provision cluster: Cluster provisioning failed: Exception: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:25+0000 [] WARN: Marking request 28c021fd-d21a-4fed-bb5c-a4fe17d362e0 as failed: Cluster provisioning failed: Exception: Stop stage failed: Failed to stop node 10.x.x.x: /usr/sbin/service dse stop failed
exit status: 1
stdout:
log_daemon_msg is a shell function
Cassandra 2.0 and later require Java 7 or later.
2015-03-19 10:07:41+0000 [] WARN: Unable to find a matching cluster for node with IP [u'fe80:0:0:0:2000:aff:feeb:31c7%2', u'10.x.x.x', u'0:0:0:0:0:0:0:1%1', u'127.0.0.1']; the message was [u'5.1.0', u'/1947480708/conf']. This usually indicates that an OpsCenter agent is still running on an old node that was decommissioned or is part of a cluster that OpsCenter is no longer monitoring.
Appreciate any help!
Thanks in advance
Harsha
OpsCenter developer here. I make the OpsCenter provisioning features go zoom (or splat occasionally, as you've seen). It is with sadness and shame that I must tell you that you're hitting a bug.
The Datastax AMI version 2.4 used by OpsCenter provisioning (https://github.com/riptano/ComboAMI/tree/2.4) does quite a bit of work at boot time via startup scripts. One of those tasks is to set up some gpg repository keys used to validate packages. Intermittently that process can fail, breaking package installs and leading to the series of errors that you're seeing. This failure is intermittent and has greatly increased in frequency recently. If you check /home/ubuntu/datastax-ami/ami.log you should see the gpg key failures that begin the rest of the failure chain.
Unfortunately, this error is pretty far down the technology stack and is difficult to manually work around. If you just need to provision a single cluster, you can retry until you get a good run. Otherwise, your best bet is to manually launch the instances and use local provisioning to deploy dse/dsc to their private IP addresses:
Launch instances using ami-ada2b6c4 (assuming you're in us-east-1)
Make sure to add the instances to the OpsCenterSecurity group.
Make sure you have the private half of the keypair you use (you'll need it during local provisioning)
On the instance data page, hit the advanced pulldown and add the following userdata as text "--raidonly --java7"
Do a local-provisioning run against the private-ip's
Not a super-simple workaround. I wish your experience with OpsCenter this time around was more awesome. The good news is I'm on this bug and it will be fixed in an upcoming point release.
Edit: No longer necessary to manually remove /etc/security/limits.d/cassandra.conf
If it's just complaining about Java, then install Java 7; DataStax prefers the Oracle JDK and JRE. You might already have Java 7 and another version on your nodes, but Java 7 is not the default version. To change this, do:
sudo update-java-alternatives -s java-7-oracle
This is a command you can script to run over ssh so you don't have to log in to each node (see the sketch below).
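For example, a minimal loop of that kind (host addresses are placeholders) might look like:
for host in 10.x.x.1 10.x.x.2 10.x.x.3; do
  ssh "$host" "sudo update-java-alternatives -s java-7-oracle"
done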

PassengerTempDir issue ERROR: Phusion Passenger doesn't seem to be running

I am having trouble trying to set PassengerTempDir:
I have Redmine on Apache + mod_passenger, CentOS. In Redmine I get a 500 Internal Error while uploading files. While investigating the issue, I found that I should change PassengerTempDir to a custom folder:
wrong permissions set in the webserver_private directory of passenger.
6.6 PassengerTempDir
For testing I created the folder /home/tmp_passenger and set 777 permissions on it.
Next I ran export PASSENGER_TMPDIR=/tmp_passenger
The result is:
passenger-status
ERROR: Phusion Passenger doesn't seem to be running.
So, before running export PASSENGER_TMPDIR=/tmp_passenger:
passenger-status
Version : 4.0.53
Date : 2014-12-29 12:43:36 +0100
Instance: 1416
----------- General information -----------
Max pool size : 20
Processes : 1
Requests in top-level queue : 0
----------- Application groups -----------
/home/admin/web/MYDOMAIN/public_html/redmine#default:
App root: /home/admin/web/MYDOMAIN/public_html/redmine
Requests in queue: 0
* PID: 6440 Sessions: 0 Processed: 8 Uptime: 22m 1s
CPU: 0% Memory : 52M Last used: 10m 29s ago
After running export PASSENGER_TMPDIR=/tmp_passenger:
passenger-status
ERROR: Phusion Passenger doesn't seem to be running.
Please help me resolve this issue. What should I do now?
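One detail worth double-checking (my observation, not part of the original post): the folder created above is /home/tmp_passenger, while the exported path is /tmp_passenger. In Passenger 4.x the server's temp directory is set with the PassengerTempDir directive in the Apache configuration, and passenger-status looks wherever PASSENGER_TMPDIR points, so the two have to agree. A hedged sketch, with paths matching the folder that was actually created:
# In the Apache/Passenger configuration (restart Apache afterwards):
PassengerTempDir /home/tmp_passenger
# When querying status from the shell:
export PASSENGER_TMPDIR=/home/tmp_passenger
passenger-status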
