Kubeflow Pipelines v2 is giving Permission Denied on OutputPath - kubeflow
In Kubeflow Pipelines v2, running on EKS with the default install, I'm getting a "permission denied" error.
The same pipeline ran correctly in KFP v1.
time="2022-04-26T21:53:30.710Z" level=info msg="capturing logs" argo=true
I0426 21:53:30.745547 18 launcher.go:144] PipelineRoot defaults to "minio://mlpipeline/v2/artifacts".
I0426 21:53:30.745908 18 cache.go:120] Connecting to cache endpoint 10.100.244.104:8887
I0426 21:53:30.854201 18 launcher.go:193] enable caching
F0426 21:53:30.979055 18 main.go:50] Failed to execute component: failed to create directory "/tmp/outputs/output_context_path" for output parameter "output_context_path": mkdir /tmp/outputs/output_context_path: permission denied
time="2022-04-26T21:53:30.980Z" level=info msg="/tmp/outputs/output_context_path/data -> /var/run/argo/outputs/artifacts/tmp/outputs/output_context_path/data.tgz" argo=true
time="2022-04-26T21:53:30.981Z" level=info msg="Taring /tmp/outputs/output_context_path/data"
Error: failed to tarball the output /tmp/outputs/output_context_path/data to /var/run/argo/outputs/artifacts/tmp/outputs/output_context_path/data.tgz: stat /tmp/outputs/output_context_path/data: permission denied
failed to tarball the output /tmp/outputs/output_context_path/data to /var/run/argo/outputs/artifacts/tmp/outputs/output_context_path/data.tgz: stat /tmp/outputs/output_context_path/data: permission denied
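One way to narrow this down (a throwaway sketch of mine, not part of the failing pipeline) would be a component that just prints which user the step runs as and whether the launcher's target directories are writable in the step's image:

from kfp.v2.dsl import component

# Hypothetical debugging component; the base image matches the failing step
# (see the generated YAML further down).
@component(base_image="public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/codeserver-python:v1.5.0")
def probe_permissions_fn():
    import os
    print("uid:", os.getuid(), "gid:", os.getgid())
    for path in ("/tmp", "/tmp/outputs", "/var/run/argo"):
        print(path, "exists:", os.path.exists(path), "writable:", os.access(path, os.W_OK))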
The code that produces this is here:
import kfp
from kfp.v2.dsl import component, Artifact, Input, InputPath, Output, OutputPath, Dataset, Model
from typing import NamedTuple
def same_step_000_afc67b36914c4108b47e8b4bb316869d_fn(
    input_context_path: InputPath(str),
    output_context_path: OutputPath(str),
    run_info: str = "gAR9lC4=",
    metadata_url: str = "",
):
    from base64 import urlsafe_b64encode, urlsafe_b64decode
    from pathlib import Path
    import datetime
    import requests
    import tempfile
    import dill
    import os

    input_context = None
    with Path(input_context_path).open("rb") as reader:
        input_context = reader.read()

    # Helper function for posting metadata to mlflow.
    def post_metadata(json):
        if metadata_url == "":
            return

        try:
            req = requests.post(metadata_url, json=json)
            req.raise_for_status()
        except requests.exceptions.HTTPError as err:
            print(f"Error posting metadata: {err}")

    # Move to writable directory as user might want to do file IO.
    # TODO: won't persist across steps, might need support in SDK?
    os.chdir(tempfile.mkdtemp())

    # Load information about the current experiment run:
    run_info = dill.loads(urlsafe_b64decode(run_info))

    # Post session context to mlflow.
    if len(input_context) > 0:
        input_context_str = urlsafe_b64encode(input_context)
        post_metadata({
            "experiment_id": run_info["experiment_id"],
            "run_id": run_info["run_id"],
            "step_id": "same_step_000",
            "metadata_type": "input",
            "metadata_value": input_context_str,
            "metadata_time": datetime.datetime.now().isoformat(),
        })

    # User code for step, which we run in its own execution frame.
    user_code = f"""
import dill

# Load session context into global namespace:
if { len(input_context) } > 0:
    dill.load_session("{ input_context_path }")

{dill.loads(urlsafe_b64decode("gASVGAAAAAAAAACMFHByaW50KCJIZWxsbyB3b3JsZCIplC4="))}

# Remove anything from the global namespace that cannot be serialised.
# TODO: this will include things like pandas dataframes, needs sdk support?
_bad_keys = []
_all_keys = list(globals().keys())
for k in _all_keys:
    try:
        dill.dumps(globals()[k])
    except TypeError:
        _bad_keys.append(k)

for k in _bad_keys:
    del globals()[k]

# Save new session context to disk for the next component:
dill.dump_session("{output_context_path}")
"""

    # Runs the user code in a new execution frame. Context from the previous
    # component in the run is loaded into the session dynamically, and we run
    # with a single globals() namespace to simulate top-level execution.
    exec(user_code, globals(), globals())

    # Post new session context to mlflow:
    with Path(output_context_path).open("rb") as reader:
        context = urlsafe_b64encode(reader.read())
        post_metadata({
            "experiment_id": run_info["experiment_id"],
            "run_id": run_info["run_id"],
            "step_id": "same_step_000",
            "metadata_type": "output",
            "metadata_value": context,
            "metadata_time": datetime.datetime.now().isoformat(),
        })
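For reference, a stripped-down component that does nothing but write to its OutputPath should reproduce the failure if the problem is in the launcher rather than in the dill/session logic above. A minimal sketch (my naming, using the same kfp.v2 imports as above):

from kfp.v2.dsl import component, OutputPath

# Hypothetical minimal repro, not part of the original pipeline.
@component(base_image="public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/codeserver-python:v1.5.0")
def write_output_fn(output_context_path: OutputPath(str)):
    # The launcher is expected to create /tmp/outputs/output_context_path before
    # this runs; if it cannot, the step fails before user code even starts.
    with open(output_context_path, "w") as f:
        f.write("ok")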
Environment
How did you deploy Kubeflow Pipelines (KFP)?
From manifests
KFP version:
1.8.1
KFP SDK version:
1.8.12
I SUSPECT this is because I'm using Kubeflow's native OutputPath functionality to write files out to a local temp directory, and I theorize that KFP v2 doesn't auto-create that directory. Do I need to create a bucket for this purpose on KFP v2 on AWS?
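In case it matters, this is roughly how I'd point the pipeline root at an S3 bucket when submitting in v2-compatible mode (a sketch only; the bucket name is made up, root_pipeline_compilation stands in for the actual pipeline function, and the failing mkdir above is a local path inside the pod, so I'm not sure this alone would fix anything):

import kfp
from kfp import dsl

client = kfp.Client()  # in-cluster: kfp.Client(host="http://ml-pipeline:8888")
client.create_run_from_pipeline_func(
    root_pipeline_compilation,            # hypothetical pipeline function name
    arguments={},
    mode=dsl.PipelineExecutionMode.V2_COMPATIBLE,
    pipeline_root="s3://my-kfp-artifacts/v2/artifacts",  # made-up bucket
)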
EDIT TWO: here's the generated YAML - line 317 is the one that worries me. It APPEARS to be putting in a literal string for output_context_path; shouldn't that be a variable, or is it substituted at runtime? --
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: root-pipeline-compilation-
annotations:
pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
pipelines.kubeflow.org/pipeline_compilation_time: '2022-04-29T18:04:24.336927'
pipelines.kubeflow.org/pipeline_spec: '{"inputs": [{"default": "", "name": "context",
"optional": true, "type": "String"}, {"default": "", "name": "metadata_url",
"optional": true, "type": "String"}, {"default": "", "name": "pipeline-root"},
{"default": "pipeline/root_pipeline_compilation", "name": "pipeline-name"}],
"name": "root_pipeline_compilation"}'
pipelines.kubeflow.org/v2_pipeline: "true"
labels:
pipelines.kubeflow.org/v2_pipeline: "true"
pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
spec:
entrypoint: root-pipeline-compilation
templates:
- name: root-pipeline-compilation
inputs:
parameters:
- {name: metadata_url}
- {name: pipeline-name}
- {name: pipeline-root}
dag:
tasks:
- name: run-info-fn
template: run-info-fn
arguments:
parameters:
- {name: pipeline-name, value: '{{inputs.parameters.pipeline-name}}'}
- {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
- name: same-step-000-d5554cccadc4445f91f51849eb5f2de6-fn
template: same-step-000-d5554cccadc4445f91f51849eb5f2de6-fn
dependencies: [run-info-fn]
arguments:
parameters:
- {name: metadata_url, value: '{{inputs.parameters.metadata_url}}'}
- {name: pipeline-name, value: '{{inputs.parameters.pipeline-name}}'}
- {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
- {name: run-info-fn-run_info, value: '{{tasks.run-info-fn.outputs.parameters.run-info-fn-run_info}}'}
- name: run-info-fn
container:
args:
- sh
- -c
- |2
if ! [ -x "$(command -v pip)" ]; then
python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip
fi
PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp' 'dill' 'kfp==1.8.12' && "$0" "$@"
- sh
- -ec
- |
program_path=$(mktemp -d)
printf "%s" "$0" > "$program_path/ephemeral_component.py"
python3 -m kfp.v2.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"
- |2+
import kfp
from kfp.v2 import dsl
from kfp.v2.dsl import *
from typing import *
def run_info_fn(
run_id: str,
) -> NamedTuple("RunInfoOutput", [("run_info", str),]):
from base64 import urlsafe_b64encode
from collections import namedtuple
import datetime
import base64
import dill
import kfp
client = kfp.Client(host="http://ml-pipeline:8888")
run_info = client.get_run(run_id=run_id)
run_info_dict = {
"run_id": run_info.run.id,
"name": run_info.run.name,
"created_at": run_info.run.created_at.isoformat(),
"pipeline_id": run_info.run.pipeline_spec.pipeline_id,
}
# Track kubernetes resources associated wth the run.
for r in run_info.run.resource_references:
run_info_dict[f"{r.key.type.lower()}_id"] = r.key.id
# Base64-encoded as value is visible in kubeflow ui.
output = urlsafe_b64encode(dill.dumps(run_info_dict))
return namedtuple("RunInfoOutput", ["run_info"])(
str(output, encoding="ascii")
)
- --executor_input
- '{{$}}'
- --function_to_execute
- run_info_fn
command: [/kfp-launcher/launch, --mlmd_server_address, $(METADATA_GRPC_SERVICE_HOST),
--mlmd_server_port, $(METADATA_GRPC_SERVICE_PORT), --runtime_info_json, $(KFP_V2_RUNTIME_INFO),
--container_image, $(KFP_V2_IMAGE), --task_name, run-info-fn, --pipeline_name,
'{{inputs.parameters.pipeline-name}}', --run_id, $(KFP_RUN_ID), --run_resource,
workflows.argoproj.io/$(WORKFLOW_ID), --namespace, $(KFP_NAMESPACE), --pod_name,
$(KFP_POD_NAME), --pod_uid, $(KFP_POD_UID), --pipeline_root, '{{inputs.parameters.pipeline-root}}',
--enable_caching, $(ENABLE_CACHING), --, 'run_id={{workflow.uid}}', --]
env:
- name: KFP_POD_NAME
valueFrom:
fieldRef: {fieldPath: metadata.name}
- name: KFP_POD_UID
valueFrom:
fieldRef: {fieldPath: metadata.uid}
- name: KFP_NAMESPACE
valueFrom:
fieldRef: {fieldPath: metadata.namespace}
- name: WORKFLOW_ID
valueFrom:
fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
- name: KFP_RUN_ID
valueFrom:
fieldRef: {fieldPath: 'metadata.labels[''pipeline/runid'']'}
- name: ENABLE_CACHING
valueFrom:
fieldRef: {fieldPath: 'metadata.labels[''pipelines.kubeflow.org/enable_caching'']'}
- {name: KFP_V2_IMAGE, value: 'python:3.7'}
- {name: KFP_V2_RUNTIME_INFO, value: '{"inputParameters": {"run_id": {"type":
"STRING"}}, "inputArtifacts": {}, "outputParameters": {"run_info": {"type":
"STRING", "path": "/tmp/outputs/run_info/data"}}, "outputArtifacts": {}}'}
envFrom:
- configMapRef: {name: metadata-grpc-configmap, optional: true}
image: python:3.7
volumeMounts:
- {mountPath: /kfp-launcher, name: kfp-launcher}
inputs:
parameters:
- {name: pipeline-name}
- {name: pipeline-root}
outputs:
parameters:
- name: run-info-fn-run_info
valueFrom: {path: /tmp/outputs/run_info/data}
artifacts:
- {name: run-info-fn-run_info, path: /tmp/outputs/run_info/data}
metadata:
annotations:
pipelines.kubeflow.org/v2_component: "true"
pipelines.kubeflow.org/component_ref: '{}'
pipelines.kubeflow.org/arguments.parameters: '{"run_id": "{{workflow.uid}}"}'
labels:
pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
pipelines.kubeflow.org/pipeline-sdk-type: kfp
pipelines.kubeflow.org/v2_component: "true"
pipelines.kubeflow.org/enable_caching: "true"
initContainers:
- command: [launcher, --copy, /kfp-launcher/launch]
image: gcr.io/ml-pipeline/kfp-launcher:1.8.7
name: kfp-launcher
mirrorVolumeMounts: true
volumes:
- {name: kfp-launcher}
- name: same-step-000-d5554cccadc4445f91f51849eb5f2de6-fn
container:
args:
- sh
- -c
- |2
if ! [ -x "$(command -v pip)" ]; then
python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip
fi
PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'dill' 'requests' 'kfp==1.8.12' && "$0" "$@"
- sh
- -ec
- |
program_path=$(mktemp -d)
printf "%s" "$0" > "$program_path/ephemeral_component.py"
python3 -m kfp.v2.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"
- |2+
import kfp
from kfp.v2 import dsl
from kfp.v2.dsl import *
from typing import *
def same_step_000_d5554cccadc4445f91f51849eb5f2de6_fn(
input_context_path: InputPath(str),
output_context_path: OutputPath(str),
run_info: str = "gAR9lC4=",
metadata_url: str = "",
):
from base64 import urlsafe_b64encode, urlsafe_b64decode
from pathlib import Path
import datetime
import requests
import tempfile
import dill
import os
input_context = None
with Path(input_context_path).open("rb") as reader:
input_context = reader.read()
# Helper function for posting metadata to mlflow.
def post_metadata(json):
if metadata_url == "":
return
try:
req = requests.post(metadata_url, json=json)
req.raise_for_status()
except requests.exceptions.HTTPError as err:
print(f"Error posting metadata: {err}")
# Move to writable directory as user might want to do file IO.
# TODO: won't persist across steps, might need support in SDK?
os.chdir(tempfile.mkdtemp())
# Load information about the current experiment run:
run_info = dill.loads(urlsafe_b64decode(run_info))
# Post session context to mlflow.
if len(input_context) > 0:
input_context_str = urlsafe_b64encode(input_context)
post_metadata({
"experiment_id": run_info["experiment_id"],
"run_id": run_info["run_id"],
"step_id": "same_step_000",
"metadata_type": "input",
"metadata_value": input_context_str,
"metadata_time": datetime.datetime.now().isoformat(),
})
# User code for step, which we run in its own execution frame.
user_code = f"""
import dill
# Load session context into global namespace:
if { len(input_context) } > 0:
dill.load_session("{ input_context_path }")
{dill.loads(urlsafe_b64decode("gASVGAAAAAAAAACMFHByaW50KCJIZWxsbyB3b3JsZCIplC4="))}
# Remove anything from the global namespace that cannot be serialised.
# TODO: this will include things like pandas dataframes, needs sdk support?
_bad_keys = []
_all_keys = list(globals().keys())
for k in _all_keys:
try:
dill.dumps(globals()[k])
except TypeError:
_bad_keys.append(k)
for k in _bad_keys:
del globals()[k]
# Save new session context to disk for the next component:
dill.dump_session("{output_context_path}")
"""
# Runs the user code in a new execution frame. Context from the previous
# component in the run is loaded into the session dynamically, and we run
# with a single globals() namespace to simulate top-level execution.
exec(user_code, globals(), globals())
# Post new session context to mlflow:
with Path(output_context_path).open("rb") as reader:
context = urlsafe_b64encode(reader.read())
post_metadata({
"experiment_id": run_info["experiment_id"],
"run_id": run_info["run_id"],
"step_id": "same_step_000",
"metadata_type": "output",
"metadata_value": context,
"metadata_time": datetime.datetime.now().isoformat(),
})
- --executor_input
- '{{$}}'
- --function_to_execute
- same_step_000_d5554cccadc4445f91f51849eb5f2de6_fn
command: [/kfp-launcher/launch, --mlmd_server_address, $(METADATA_GRPC_SERVICE_HOST),
--mlmd_server_port, $(METADATA_GRPC_SERVICE_PORT), --runtime_info_json, $(KFP_V2_RUNTIME_INFO),
--container_image, $(KFP_V2_IMAGE), --task_name, same-step-000-d5554cccadc4445f91f51849eb5f2de6-fn,
--pipeline_name, '{{inputs.parameters.pipeline-name}}', --run_id, $(KFP_RUN_ID),
--run_resource, workflows.argoproj.io/$(WORKFLOW_ID), --namespace, $(KFP_NAMESPACE),
--pod_name, $(KFP_POD_NAME), --pod_uid, $(KFP_POD_UID), --pipeline_root, '{{inputs.parameters.pipeline-root}}',
--enable_caching, $(ENABLE_CACHING), --, input_context_path=, 'metadata_url={{inputs.parameters.metadata_url}}',
'run_info={{inputs.parameters.run-info-fn-run_info}}', --]
env:
- name: KFP_POD_NAME
valueFrom:
fieldRef: {fieldPath: metadata.name}
- name: KFP_POD_UID
valueFrom:
fieldRef: {fieldPath: metadata.uid}
- name: KFP_NAMESPACE
valueFrom:
fieldRef: {fieldPath: metadata.namespace}
- name: WORKFLOW_ID
valueFrom:
fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
- name: KFP_RUN_ID
valueFrom:
fieldRef: {fieldPath: 'metadata.labels[''pipeline/runid'']'}
- name: ENABLE_CACHING
valueFrom:
fieldRef: {fieldPath: 'metadata.labels[''pipelines.kubeflow.org/enable_caching'']'}
- {name: KFP_V2_IMAGE, value: 'public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/codeserver-python:v1.5.0'}
- {name: KFP_V2_RUNTIME_INFO, value: '{"inputParameters": {"input_context_path":
{"type": "STRING"}, "metadata_url": {"type": "STRING"}, "run_info": {"type":
"STRING"}}, "inputArtifacts": {}, "outputParameters": {"output_context_path":
{"type": "STRING", "path": "/tmp/outputs/output_context_path/data"}}, "outputArtifacts":
{}}'}
envFrom:
- configMapRef: {name: metadata-grpc-configmap, optional: true}
image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/codeserver-python:v1.5.0
volumeMounts:
- {mountPath: /kfp-launcher, name: kfp-launcher}
inputs:
parameters:
- {name: metadata_url}
- {name: pipeline-name}
- {name: pipeline-root}
- {name: run-info-fn-run_info}
outputs:
artifacts:
- {name: same-step-000-d5554cccadc4445f91f51849eb5f2de6-fn-output_context_path,
path: /tmp/outputs/output_context_path/data}
metadata:
annotations:
pipelines.kubeflow.org/v2_component: "true"
pipelines.kubeflow.org/component_ref: '{}'
pipelines.kubeflow.org/arguments.parameters: '{"input_context_path": "", "metadata_url":
"{{inputs.parameters.metadata_url}}", "run_info": "{{inputs.parameters.run-info-fn-run_info}}"}'
pipelines.kubeflow.org/max_cache_staleness: P0D
labels:
pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
pipelines.kubeflow.org/pipeline-sdk-type: kfp
pipelines.kubeflow.org/v2_component: "true"
pipelines.kubeflow.org/enable_caching: "true"
initContainers:
- command: [launcher, --copy, /kfp-launcher/launch]
image: gcr.io/ml-pipeline/kfp-launcher:1.8.7
name: kfp-launcher
mirrorVolumeMounts: true
volumes:
- {name: kfp-launcher}
arguments:
parameters:
- {name: context, value: ''}
- {name: metadata_url, value: ''}
- {name: pipeline-root, value: ''}
- {name: pipeline-name, value: pipeline/root_pipeline_compilation}
serviceAccountName: pipeline-runner
It's DEFINITELY a regression - here's the same pipeline compiled with the compiler in its two modes. The first works, the second doesn't.
using the compiler in v1 mode - https://gist.github.com/aronchick/0dfc57d2a794c1bd4fb9bff9962cfbd6
using the compiler in v2 mode - https://gist.github.com/aronchick/473060503ae189b360fbded04d802c80
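For clarity, the two gists were produced roughly like this (a sketch of the compile calls; root_pipeline_compilation again stands in for the actual pipeline function):

from kfp import dsl
from kfp.compiler import Compiler

# v1 mode (the first gist, which works):
Compiler(mode=dsl.PipelineExecutionMode.V1_LEGACY).compile(
    root_pipeline_compilation, "pipeline_v1.yaml")

# v2-compatible mode (the second gist, which hits the permission error):
Compiler(mode=dsl.PipelineExecutionMode.V2_COMPATIBLE).compile(
    root_pipeline_compilation, "pipeline_v2.yaml")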
Related
Folder deleted/not created inside the common dir mounted with emptyDir{} type on EKS Fargate pod
We are facing strange issue with EKS Fargate Pods. We want to push logs to cloudwatch with sidecar fluent-bit container and for that we are mounting the separately created /logs/boot and /logs/access folders on both the containers with emptyDir: {} type. But somehow the access folder is getting deleted. When we tested this setup in local docker it produced desired results and things were working fine but not when deployed in the EKS fargate. Below is our manifest files Dockerfile FROM anapsix/alpine-java:8u201b09_server-jre_nashorn ARG LOG_DIR=/logs # Install base packages RUN apk update RUN apk upgrade # RUN apk add ca-certificates && update-ca-certificates # Dynamically set the JAVA_HOME path RUN export JAVA_HOME="$(dirname $(dirname $(readlink -f $(which java))))" && echo $JAVA_HOME # Add Curl RUN apk --no-cache add curl RUN mkdir -p $LOG_DIR/boot $LOG_DIR/access RUN chmod -R 0777 $LOG_DIR/* # Add metadata to the image to describe which port the container is listening on at runtime. # Change TimeZone RUN apk add --update tzdata ENV TZ="Asia/Kolkata" # Clean APK cache RUN rm -rf /var/cache/apk/* # Setting JAVA HOME ENV JAVA_HOME=/opt/jdk # Copy all files and folders COPY . . RUN rm -rf /opt/jdk/jre/lib/security/cacerts COPY cacerts /opt/jdk/jre/lib/security/cacerts COPY standalone.xml /jboss-eap-6.4-integration/standalone/configuration/ # Set the working directory. WORKDIR /jboss-eap-6.4-integration/bin EXPOSE 8177 CMD ["./erctl"] Deployment apiVersion: apps/v1 kind: Deployment metadata: name: vinintegrator namespace: eretail labels: app: vinintegrator pod: fargate spec: selector: matchLabels: app: vinintegrator pod: fargate replicas: 2 template: metadata: labels: app: vinintegrator pod: fargate spec: securityContext: fsGroup: 0 serviceAccount: eretail containers: - name: vinintegrator imagePullPolicy: IfNotPresent image: 653580443710.dkr.ecr.ap-southeast-1.amazonaws.com/vinintegrator-service:latest resources: limits: memory: "7629Mi" cpu: "1.5" requests: memory: "5435Mi" cpu: "750m" ports: - containerPort: 8177 protocol: TCP # securityContext: # runAsUser: 506 # runAsGroup: 506 volumeMounts: - mountPath: /jboss-eap-6.4-integration/bin name: bin - mountPath: /logs name: logs - name: fluent-bit image: 657281243710.dkr.ecr.ap-southeast-1.amazonaws.com/fluent-bit:latest imagePullPolicy: IfNotPresent env: - name: HOST_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace resources: limits: memory: 200Mi requests: cpu: 200m memory: 100Mi volumeMounts: - name: fluent-bit-config mountPath: /fluent-bit/etc/ - name: logs mountPath: /logs readOnly: true volumes: - name: fluent-bit-config configMap: name: fluent-bit-config - name: logs emptyDir: {} - name: bin persistentVolumeClaim: claimName: vinintegrator-pvc Below is the /logs folder ownership and permission. Please notice the 's' in drwxrwsrwx drwxrwsrwx 3 root root 4096 Oct 1 11:50 logs Below is the content inside logs folder. Please notice the access folder is not created or deleted. 
/logs # ls -lrt total 4 drwxr-sr-x 2 root root 4096 Oct 1 11:50 boot /logs # Below is the configmap of Fluent-Bit apiVersion: v1 kind: ConfigMap metadata: name: fluent-bit-config namespace: eretail labels: k8s-app: fluent-bit data: fluent-bit.conf: | [SERVICE] Flush 5 Log_Level info Daemon off Parsers_File parsers.conf HTTP_Server On HTTP_Listen 0.0.0.0 HTTP_Port 2020 #INCLUDE application-log.conf application-log.conf: | [INPUT] Name tail Path /logs/boot/*.log Tag boot [INPUT] Name tail Path /logs/access/*.log Tag access [OUTPUT] Name cloudwatch_logs Match *boot* region ap-southeast-1 log_group_name eks-fluent-bit log_stream_prefix boot-log- auto_create_group On [OUTPUT] Name cloudwatch_logs Match *access* region ap-southeast-1 log_group_name eks-fluent-bit log_stream_prefix access-log- auto_create_group On parsers.conf: | [PARSER] Name docker Format json Time_Key time Time_Format %Y-%m-%dT%H:%M:%S.%LZ Below is error log of Fluent-bit container AWS for Fluent Bit Container Image Version 2.14.0 Fluent Bit v1.7.4 * Copyright (C) 2019-2021 The Fluent Bit Authors * Copyright (C) 2015-2018 Treasure Data * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd * https://fluentbit.io [2021/10/01 06:20:33] [ info] [engine] started (pid=1) [2021/10/01 06:20:33] [ info] [storage] version=1.1.1, initializing... [2021/10/01 06:20:33] [ info] [storage] in-memory [2021/10/01 06:20:33] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128 [2021/10/01 06:20:33] [error] [input:tail:tail.1] read error, check permissions: /logs/access/*.log [2021/10/01 06:20:33] [ warn] [input:tail:tail.1] error scanning path: /logs/access/*.log [2021/10/01 06:20:38] [error] [net] connection #33 timeout after 5 seconds to: 169.254.169.254:80 [2021/10/01 06:20:38] [error] [net] socket #33 could not connect to 169.254.169.254:80
Suggest remove the following from your Dockerfile: RUN mkdir -p $LOG_DIR/boot $LOG_DIR/access RUN chmod -R 0777 $LOG_DIR/* Use the following method to setup the log directories and permissions: apiVersion: v1 kind: Pod # Deployment metadata: name: busy labels: app: busy spec: volumes: - name: logs # Shared folder with ephemeral storage emptyDir: {} initContainers: # Setup your log directory here - name: setup image: busybox command: ["bin/ash", "-c"] args: - > mkdir -p /logs/boot /logs/access; chmod -R 777 /logs volumeMounts: - name: logs mountPath: /logs containers: - name: app # Run your application and logs to the directories image: busybox command: ["bin/ash","-c"] args: - > while :; do echo "$(date): $(uname -r)" | tee -a /logs/boot/boot.log /logs/access/access.log; sleep 1; done volumeMounts: - name: logs mountPath: /logs - name: logger # Any logger that you like image: busybox command: ["bin/ash","-c"] args: # tail the app logs, forward to CW etc... - > sleep 5; tail -f /logs/boot/boot.log /logs/access/access.log volumeMounts: - name: logs mountPath: /logs The snippet runs on Fargate as well, run kubectl logs -f busy -c logger to see the tailing. In real world, the "app" is your java app, "logger" is any log agent you desired. Note Fargate has native logging capability using AWS Fluent-bit, you do not need to run AWS Fluent-bit as sidecar.
What is kubeflow gpu resource node allocation criteria?
I’m curious about the Kubeflow GPU Resource. I’m running the job below. The only part where I specified the GPU Resource is on first container with only 1 GPU. However, the event message tells me 0/4 nodes are available: 4 Insufficient nvidia.com/gpu. Why is this job searching for 4 nodes though I specified only 1 GPU resource? Does my interpretation have a problem? Thanks much in advance. FYI) I have 3 worker nodes with each 1 gpu. apiVersion: batch/v1 kind: Job metadata: name: saint-train-3 annotations: sidecar.istio.io/inject: "false" spec: template: spec: initContainers: - name: dataloader image: <AWS CLI Image> command: ["/bin/sh", "-c", "aws s3 cp s3://<Kubeflow Bucket>/kubeflowdata.tar.gz /s3-data; cd /s3-data; tar -xvzf kubeflowdata.tar.gz; cd kubeflow_data; ls"] volumeMounts: - mountPath: /s3-data name: s3-data env: - name: AWS_ACCESS_KEY_ID valueFrom: secretKeyRef: {key: AWS_ACCESS_KEY_ID, name: aws-secret} - name: AWS_SECRET_ACCESS_KEY valueFrom: secretKeyRef: {key: AWS_SECRET_ACCESS_KEY, name: aws-secret} containers: - name: trainer image: <Our Model Image> command: ["/bin/sh", "-c", "wandb login <ID>; python /opt/ml/src/main.py --base_path='/s3-data/kubeflow_data' --debug_mode='0' --project='kubeflow-test' --name='test2' --gpu=0 --num_epochs=1 --num_workers=4"] volumeMounts: - mountPath: /s3-data name: s3-data resources: limits: nvidia.com/gpu: "1" - name: gpu-watcher image: pytorch/pytorch:latest command: ["/bin/sh", "-c", "--"] args: [ "while true; do sleep 30; done;" ] volumeMounts: - mountPath: /s3-data name: s3-data volumes: - name: s3-data persistentVolumeClaim: claimName: test-claim restartPolicy: OnFailure backoffLimit: 6
0/4 nodes are available: 4 Insufficient nvidia.com/gpu means that none of your nodes currently advertises an allocatable nvidia.com/gpu resource (usually a sign that the NVIDIA device plugin is not running on them), so the scheduler rejects all four nodes it considered.
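A quick way to check is to look at each node's allocatable resources. A hedged sketch with the official Kubernetes Python client (kubectl describe nodes shows the same information):

# Print each node's allocatable nvidia.com/gpu count.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run in-cluster
for node in client.CoreV1Api().list_node().items:
    allocatable = node.status.allocatable or {}
    print(node.metadata.name, allocatable.get("nvidia.com/gpu", "0"))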
Openshift oc patch not executing initdb.sql from /docker-entrypoint-initdb.d
OpenShift: I have the below MySQL Deployment apiVersion: apps/v1 kind: Deployment metadata: name: mysql-master spec: selector: matchLabels: app: mysql-master strategy: type: Recreate template: metadata: labels: app: mysql-master spec: volumes: - name: mysql-persistent-storage persistentVolumeClaim: claimName: ro-mstr-nfs-datadir-claim containers: - image: mysql:5.7 name: mysql-master env: - name: MYSQL_SERVER_CONTAINER value: mysql - name: MYSQL_ROOT_PASSWORD valueFrom: secretKeyRef: name: mysql-secret key: MYSQL_ROOT_PASSWORD - name: MYSQL_DATABASE valueFrom: secretKeyRef: name: mysql-secret key: MYSQL_DATABASE - name: MYSQL_USER valueFrom: secretKeyRef: name: mysql-secret key: MYSQL_USER - name: MYSQL_PASSWORD valueFrom: secretKeyRef: name: mysql-secret key: MYSQL_PASSWORD ports: - containerPort: 3306 name: mysql-master volumeMounts: - name: mysql-persistent-storage mountPath: /var/lib/mysql I created a deployment using this yml file which created a deployment and pod which is successfully running. And I have a configmap apiVersion: v1 kind: ConfigMap metadata: name: ro-mstr-mysqlinitcnfgmap data: initdb.sql: |- CREATE TABLE aadhaar ( name varchar(255) NOT NULL, sex char NOT NULL, birth DATE NOT NULL, death DATE NULL, id int(255) NOT NULL AUTO_INCREMENT, PRIMARY KEY (id) ); CREATE USER 'usera'#'%' IDENTIFIED BY 'usera'; GRANT REPLICATION SLAVE ON *.* TO 'usera' IDENTIFIED BY 'usera'; FLUSH PRIVILEGES; Now I need to patch the above deployment using this configmap. I am using the below command oc patch deployment mysql-master -p '{ "spec": { "template": { "spec": { "volumes": [ { "name": "ro-mysqlinitconf-vol", "configMap": { "name": "ro-mstr-mysqlinitcnfgmap" } } ], "containers": [ { "image": "mysql:5.7", "name": "mysql-master", "volumeMounts": [ { "name": "ro-mysqlinitconf-vol", "mountPath": "/docker-entrypoint-initdb.d" } ] } ] } } } }' So the above command is successful, I validated the Deployment description and inside the container it placed the initdb.sql file successfully, and recreated the pod. But the issue is it has not created the aadhaar table. I think it has not executed the initdb.sql file from docker-entrypoint-initdb.d.
If you dive into the entrypoint script in your image (https://github.com/docker-library/mysql/blob/75f81c8e20e5085422155c48a50d99321212bf6f/5.7/docker-entrypoint.sh#L341-L350), you can see it only runs the /docker-entrypoint-initdb.d files when it is also initializing the database for the first time, i.e. when the data directory is empty. I think maybe you assumed it always runs them on startup?
Is there a sneaky way to run a command before the entrypoint (in a k8s deployment manifest) without having to modify the dockerfile/image? [duplicate]
This official document shows that you can run a command from a YAML config file: https://kubernetes.io/docs/tasks/configure-pod-container/ apiVersion: v1 kind: Pod metadata: name: hello-world spec: # specification of the pod’s contents restartPolicy: Never containers: - name: hello image: "ubuntu:14.04" env: - name: MESSAGE value: "hello world" command: ["/bin/sh","-c"] args: ["/bin/echo \"${MESSAGE}\""] If I want to run more than one command, how do I do that?
command: ["/bin/sh","-c"] args: ["command one; command two && command three"] Explanation: The command ["/bin/sh", "-c"] says "run a shell, and execute the following instructions". The args are then passed as commands to the shell. In shell scripting a semicolon separates commands, and && conditionally runs the following command if the first succeed. In the above example, it always runs command one followed by command two, and only runs command three if command two succeeded. Alternative: In many cases, some of the commands you want to run are probably setting up the final command to run. In this case, building your own Dockerfile is the way to go. Look at the RUN directive in particular.
My preference is to multiline the args; this is the simplest and easiest to read. Also, the script can be changed without affecting the image; you just need to restart the pod. For example, for a mysql dump, the container spec could be something like this: containers: - name: mysqldump image: mysql command: ["/bin/sh", "-c"] args: - echo starting; ls -la /backups; mysqldump --host=... -r /backups/file.sql db_name; ls -la /backups; echo done; volumeMounts: - ... The reason this works is that yaml actually concatenates all the lines after the "-" into one, and sh runs one long string "echo starting; ls... ; echo done;".
If you're willing to use a Volume and a ConfigMap, you can mount ConfigMap data as a script, and then run that script: --- apiVersion: v1 kind: ConfigMap metadata: name: my-configmap data: entrypoint.sh: |- #!/bin/bash echo "Do this" echo "Do that" --- apiVersion: v1 kind: Pod metadata: name: my-pod spec: containers: - name: my-container image: "ubuntu:14.04" command: - /bin/entrypoint.sh volumeMounts: - name: configmap-volume mountPath: /bin/entrypoint.sh readOnly: true subPath: entrypoint.sh volumes: - name: configmap-volume configMap: defaultMode: 0700 name: my-configmap This cleans up your pod spec a little and allows for more complex scripting. $ kubectl logs my-pod Do this Do that
If you want to avoid concatenating all commands into a single command with ; or && you can also get true multi-line scripts using a heredoc: command: - sh - "-c" - | /bin/bash <<'EOF' # Normal script content possible here echo "Hello world" ls -l exit 123 EOF This is handy for running existing bash scripts, but has the downside of requiring both an inner and an outer shell instance for setting up the heredoc.
I am not sure if the question is still active, but since I did not find the solution in the answers above, I decided to write it down. I use the following approach: readinessProbe: exec: command: - sh - -c - | command1 command2 && command3 I know my example is related to readinessProbe, livenessProbe, etc., but I suspect the same applies to container commands. This provides flexibility, as it mirrors standard script writing in Bash.
IMHO the best option is to use YAML's native block scalars. Specifically in this case, the folded style block. By invoking sh -c you can pass arguments to your container as commands, but if you want to elegantly separate them with newlines, you'd want to use the folded style block, so that YAML will know to convert newlines to whitespaces, effectively concatenating the commands. A full working example: apiVersion: v1 kind: Pod metadata: name: myapp labels: app: myapp spec: containers: - name: busy image: busybox:1.28 command: ["/bin/sh", "-c"] args: - > command_1 && command_2 && ... command_n
Here is my successful run apiVersion: v1 kind: Pod metadata: labels: run: busybox name: busybox spec: containers: - command: - /bin/sh - -c - | echo "running below scripts" i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done name: busybox image: busybox
Here is one more way to do it, with output logging. apiVersion: v1 kind: Pod metadata: labels: type: test name: nginx spec: containers: - image: nginx name: nginx volumeMounts: - name: log-vol mountPath: /var/mylog command: - /bin/sh - -c - > i=0; while [ $i -lt 100 ]; do echo "hello $i"; echo "$i : $(date)" >> /var/mylog/1.log; echo "$(date)" >> /var/mylog/2.log; i=$((i+1)); sleep 1; done dnsPolicy: ClusterFirst restartPolicy: Always volumes: - name: log-vol emptyDir: {}
Here is another way to run multi line commands. apiVersion: batch/v1 kind: Job metadata: name: multiline spec: template: spec: containers: - command: - /bin/bash - -exc - | set +x echo "running below scripts" if [[ -f "if-condition.sh" ]]; then echo "Running if success" else echo "Running if failed" fi name: ubuntu image: ubuntu restartPolicy: Never backoffLimit: 1
Just to bring another possible option, secrets can be used as they are presented to the pod as volumes: Secret example: apiVersion: v1 kind: Secret metadata: name: secret-script type: Opaque data: script_text: <<your script in b64>> Yaml extract: .... containers: - name: container-name image: image-name command: ["/bin/bash", "/your_script.sh"] volumeMounts: - name: vsecret-script mountPath: /your_script.sh subPath: script_text .... volumes: - name: vsecret-script secret: secretName: secret-script I know many will argue this is not what secrets must be used for, but it is an option.
Named arguments not getting picked up from my kubernetes template
I'm trying to update a Kubernetes template that we have so that I can pass in arguments such as --db-config <value> when my container starts up. This is obviously not right, because the arguments are not getting picked up ... containers: - name: {{ .Chart.Name }} ... args: ["--db-config", "/etc/app/cfg/db.yaml", "--tkn-config", "/etc/app/cfg/tkn.yaml"] <-- WHY IS THIS NOT WORKING
Here's an example showing your approach working: main.go: package main import "flag" import "fmt" func main() { db := flag.String("db-config", "default", "some flag") tk := flag.String("tk-config", "default", "some flag") flag.Parse() fmt.Println("db-config:", *db) fmt.Println("tk-config:", *tk) } Dockerfile [simplified]: FROM scratch ADD kube-flags / ENTRYPOINT ["/kube-flags"] Test: docker run kube-flags:180906 db-config: default tk-config: default docker run kube-flags:180906 --db-config=henry db-config: henry tk-config: default pod.yaml: apiVersion: v1 kind: Pod metadata: name: test spec: containers: - image: gcr.io/.../kube-flags:180906 imagePullPolicy: Always name: test args: - --db-config - henry - --tk-config - turnip test: kubectl logs test db-config: henry tk-config: turnip