How to use ExitHandler with Kubeflow Pipelines SDK v2 - kubeflow

I'm trying to move all my Kubeflow Pipelines from the previous SDK v1 (kfp) to the newer Pipelines SDK v2 (kfp.v2). I'm using version 1.8.12. This refactoring has proved successful for almost all code, except for the ExitHandler, which still exists: from kfp.v2.dsl import ExitHandler. It seems like the previous way of compiling the pipeline object into a tar.gz file using kfp.compiler.Compiler().compile(pipeline, 'basic_pipeline.tar.gz') preserved some type of Argo placeholders, while the new .json pipelines compiled with compiler.Compiler().compile(pipeline_func=pipeline, package_path="basic-pipeline.json") don't work the same way. Below, I'll go into detail about what works in Pipelines SDK v1 and how I've tried to implement it in v2.
Previously, using Kubeflow Pipelines v1, I could use an ExitHandler as shown in this StackOverflow question to, e.g., send a message to Slack when one of the pipeline components failed. I would define the pipeline as
import kfp.dsl as dsl

@dsl.pipeline(
    name='Basic-pipeline'
)
def pipeline(...):
    exit_task = dsl.ContainerOp(
        name='Exit handler that catches errors and post them in Slack',
        image='eu.gcr.io/.../send-error-msg-to-slack',
        arguments=[
            'python3', 'main.py',
            '--message', 'Basic-pipeline failed',
            '--status', "{{workflow.status}}"
        ]
    )
    with dsl.ExitHandler(exit_task):
        step_1 = dsl.ContainerOp(...)
        step_2 = dsl.ContainerOp(...) \
            .after(step_1)

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(pipeline, 'basic_pipeline.tar.gz')
where the exit_task would send the message to our Slack if any of the steps of the pipeline failed. The code for the exit_task image looks like
import argparse

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--message', type=str)
    parser.add_argument('--status', type=str)
    return parser.parse_known_args()

def main(FLAGS):
    def post_to_slack(msg):
        ...
    if FLAGS.status == "Failed":
        post_to_slack(FLAGS.message)
    else:
        pass

if __name__ == '__main__':
    FLAGS, unparsed = get_args()
    main(FLAGS)
This worked, because the underlying Argo workflow could somehow understand the "{{workflow.status}}" notion.
However, I'm now trying to use Vertex AI to run the pipeline, leveraging the Kubeflow Pipelines SDK v2, kfp.v2. Using the same exit-handler image as before, 'eu.gcr.io/.../send-error-msg-to-slack', I now define a yaml component file (exit_handler.yaml) instead,
name: Exit handler
description: Prints to Slack if any step of the pipeline fails
inputs:
  - {name: message, type: String}
  - {name: status, type: String}
implementation:
  container:
    image: eu.gcr.io/.../send-error-msg-to-slack
    command: [
      python3,
      main.py,
      --message, {inputValue: message},
      --status, {inputValue: status}
    ]
The pipeline code now looks like this instead,
from google.cloud import aiplatform
from google.cloud.aiplatform import pipeline_jobs
from kfp.v2 import compiler
from kfp.v2.dsl import pipeline, ExitHandler
from kfp.components import load_component_from_file

@pipeline(name="Basic-pipeline",
          pipeline_root='gs://.../basic-pipeline')
def pipeline():
    exit_handler_spec = load_component_from_file('./exit_handler.yaml')
    exit_handler = exit_handler_spec(
        message="Basic pipeline failed.",
        status="{{workflow.status}}"
    )
    with ExitHandler(exit_handler):
        step_0_spec = load_component_from_file('./comp_0.yaml')
        step0 = step_0_spec(...)
        step_1_spec = load_component_from_file('./comp_1.yaml')
        step1 = step_1_spec(...) \
            .after(step0)

if __name__ == '__main__':
    compiler.Compiler().compile(
        pipeline_func=pipeline,
        package_path="basic-pipeline.json"
    )

    from google.oauth2 import service_account
    credentials = service_account.Credentials.from_service_account_file("./my-key.json")
    aiplatform.init(project='bsg-personalization',
                    location='europe-west4',
                    credentials=credentials)

    job = pipeline_jobs.PipelineJob(
        display_name="basic-pipeline",
        template_path="basic-pipeline.json",
        parameter_values={...}
    )
    job.run()
This "works" (no exceptions) to compile and run, but the ExitHandler code interprets the status as a string with value {{workflow.status}}, which is also indicated by the compiled pipeline json generated from the code above (basic-pipeline.json), which you can see below ("stringValue": "{{workflow.status}}"):
...
"exit-handler": {
  "componentRef": {
    "name": "comp-exit-handler"
  },
  "dependentTasks": [
    "exit-handler-1"
  ],
  "inputs": {
    "parameters": {
      "message": {
        "runtimeValue": {
          "constantValue": {
            "stringValue": "Basic pipeline failed."
          }
        }
      },
      "status": {
        "runtimeValue": {
          "constantValue": {
            "stringValue": "{{workflow.status}}"
          }
        }
      }
    }
  },
  "taskInfo": {
    "name": "exit-handler"
  },
  "triggerPolicy": {
    "strategy": "ALL_UPSTREAM_TASKS_COMPLETED"
  }
}
...
Any idea of how I can refactor my old ExitHandler code using v1 to the new SDK v2, to make the exit handler understand if the status of my pipeline is failed or not?

This is probably not yet fully documented, but in V2 we introduced a different variable, PipelineTaskFinalStatus, that can be automatically populated for you so you can send it to your Slack channel.
Here is an example of the exit handler in the official doc https://cloud.google.com/vertex-ai/docs/pipelines/email-notifications#sending_a_notification_from_a_pipeline
And here is the corresponding email notification component
https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/v1/vertex_notification_email/component.yaml
You can write your own component with the following parameter, which will be automatically populated when the exit handler runs.
inputs:
  ...
  - name: pipeline_task_final_status
    type: PipelineTaskFinalStatus
(Note: this feature is not yet available in the Kubeflow Pipelines open-source distribution and will come with KFP V2. It is currently only available in the Vertex Pipelines distribution.)

The replacement for "{{workflow.status}}" in KFP SDK v2 is the special type annotation PipelineTaskFinalStatus, as IronPan mentioned above.
Its usage is documented in https://www.kubeflow.org/docs/components/pipelines/v2/author-a-pipeline/pipelines/#dslexithandler
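For concreteness, here is a minimal sketch of that pattern, loosely following the documentation linked above. It assumes a current KFP SDK v2 install (with kfp 1.8.x the same names are meant to live under the kfp.v2.dsl namespace, which I have not verified here); the component bodies and names are illustrative only:

# Minimal sketch, assuming KFP SDK v2 (`pip install kfp>=2`); with kfp 1.8.x
# the import path may differ (kfp.v2.dsl).
from kfp import dsl
from kfp.dsl import PipelineTaskFinalStatus

@dsl.component
def notify_slack(message: str, status: PipelineTaskFinalStatus):
    """Exit task: `status` is injected by the backend when the handler runs."""
    # status.state is e.g. "SUCCEEDED" or "FAILED"; error fields are set on failure.
    print(message, status.state, status.error_code, status.error_message)

@dsl.component
def flaky_step():
    raise RuntimeError("something broke")

@dsl.pipeline(name="basic-pipeline")
def pipeline():
    exit_task = notify_slack(message="Basic pipeline failed.")
    with dsl.ExitHandler(exit_task):
        flaky_step()

The key point is that you never pass status yourself at authoring time; the backend fills in the final status (state, error code, error message) when the exit task actually runs.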

Related

What is the correct format of BatchGetBuilds parameters for 'AWS CDK - StepFunction - CallAwsService'

I'm building a state machine with the CDK
and running into an issue checking the CodeBuild project status inside the state machine...
Q. Could you let me know the correct format of the batchGetBuilds parameters in CallAwsService?
import { CallAwsService } from "aws-cdk-lib/aws-stepfunctions-tasks"
import { JsonPath } from "aws-cdk-lib/aws-stepfunctions"

new CallAwsService(scope, "Check 1-1: Codebuild Status", {
  service: "codebuild",
  action: "batchGetBuilds",
  parameters: {
    Ids: [JsonPath.stringAt("$.results.codebuild.id")],
  },
  iamResources: ["*"],
  inputPath: "$",
  resultSelector: { "status.$": "$.builds[0].buildStatus" },
  resultPath: "$.results.bulidAmi",
})
I tried 2 ways.
JsonPath.stringAt("$.results.codebuild.id")
Then it returns the error below and the execution fails:
"An error occurred while executing the state 'Check 1-1: Codebuild Status' (entered at the event id #9).
The Parameters '{\"Ids\":\"******-generate-new-ami-project:05763ec2-89a6-4b56-8b44-************\"}' could not be used to start the Task:
[Cannot deserialize instance of `java.util.ArrayList<java.lang.Object>` out of VALUE_STRING token]"
[JsonPath.stringAt("$.results.codebuild.id")]
If I use an array, it fails in the build stage (I'm deploying this with a CDK pipeline); the error message is below:
Cannot use JsonPath fields in an array, they must be used in objects
+ Extra Question
I found this during the search
https://stackoverflow.com/questions/70978385/aws-step-functions-wait-for-codebuild-to-finish
Can I use this `sync` on the `CallAwsService`? (Main 1... state is using `CallAwsService` also)
If yes, how can I use it..?
Or do I need to change the `CallAwsService` to `CodeBuildStartBuild`?
Could you let me know the correct format of batchGetBuilds parameters in CallAwsService?
Use the States.Array intrinsic function. These CDK syntaxes are equivalent:
parameters = {
  'Ids.$': 'States.Array($.results.codebuild.id)',
  Ids: JsonPath.stringAt('States.Array($.results.codebuild.id)'),
  Ids: JsonPath.array(JsonPath.stringAt('$.results.codebuild.id'))
}
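For reference, the same fix looks roughly like this in CDK Python (a sketch only; scope is whatever construct scope you are using, and the JSON paths are copied from the question):

# Sketch: CallAwsService with the Ids array built via the States.Array intrinsic.
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks

check_build = tasks.CallAwsService(
    scope, "Check 1-1: Codebuild Status",
    service="codebuild",
    action="batchGetBuilds",
    parameters={
        # Wrap the single build id in an array with the States.Array intrinsic.
        "Ids": sfn.JsonPath.string_at("States.Array($.results.codebuild.id)"),
    },
    iam_resources=["*"],
    result_selector={"status.$": "$.builds[0].buildStatus"},
)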
Can I use this sync on the CallAwsService?
No. The CallAwsService task implements the AWS SDK service integrations, which does not support .sync for CodeBuild actions. As of v2.15, CDK should throw an error if you pass the RUN_JOB (= .sync) pattern to CallAwsService. See this github issue for context.
Or do I need to change the CallAwsService to CodeBuildStartBuild?
Yes. CodeBuildStartBuild works as expected with the RUN_JOB integration pattern.
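If waiting for the build is the end goal, a sketch of the CodeBuildStartBuild route in CDK Python (assuming project is an existing aws_codebuild.IProject; the result_path is illustrative):

# Sketch: start the build and let Step Functions wait for it to finish (.sync).
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks

start_build = tasks.CodeBuildStartBuild(
    scope, "Start build and wait",
    project=project,
    integration_pattern=sfn.IntegrationPattern.RUN_JOB,
    result_path="$.results.codebuild",
)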

How do i add Input transformation to a target using aws cdk for a cloudwatch event rule?

After I create a CloudWatch event rule I am trying to add a target to it, but I am unable to add an input transformation. Previously addTarget had props that allowed for input transformation, but it does not anymore.
codeBuildRule.addTarget(new SnsTopic(props.topic));
The AWS CDK page provides this solution, but I don't exactly understand what it says:
You can add additional targets, with optional input transformer using eventRule.addTarget(target[, input]). For example, we can add a SNS topic target which formats a human-readable message for the commit.
You should specify the message prop and use RuleTargetInput static methods. Some of these methods can use strings returned by EventField.fromPath():
// From a path
codeBuildRule.addTarget(new SnsTopic(props.topic, {
  message: events.RuleTargetInput.fromEventPath('$.detail')
}));

// Custom object
codeBuildRule.addTarget(new SnsTopic(props.topic, {
  message: RuleTargetInput.fromObject({
    foo: EventField.fromPath('$.detail.bar')
  })
}));
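If you are working in CDK Python rather than TypeScript, the same two targets look roughly like this (a sketch; code_build_rule and topic stand in for your existing rule and SNS topic):

# Sketch: the two RuleTargetInput variants in CDK Python.
from aws_cdk import aws_events as events
from aws_cdk import aws_events_targets as targets

# From a path
code_build_rule.add_target(targets.SnsTopic(
    topic,
    message=events.RuleTargetInput.from_event_path("$.detail"),
))

# Custom object
code_build_rule.add_target(targets.SnsTopic(
    topic,
    message=events.RuleTargetInput.from_object({
        "foo": events.EventField.from_path("$.detail.bar"),
    }),
))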
I had the same question trying to implement this tutorial in CDK: Tutorial: Set up a CloudWatch Events rule to receive email notifications for pipeline state changes
I found this helpful as well: Detect and react to changes in pipeline state with Amazon CloudWatch Events
NOTE: I could not get it to work using the Pipeline's class method onStateChange().
I ended up writing a Rule:
const topic = new Topic(this, 'topic', {
  topicName: 'codepipeline-notes-failure',
});

const description = `Generated by the CDK for stack: ${this.stackName}`;

new Rule(this, 'failed', {
  description: description,
  eventPattern: {
    detail: { state: ['FAILED'], pipeline: ['notes'] },
    detailType: ['CodePipeline Pipeline Execution State Change'],
    source: ['aws.codepipeline'],
  },
  targets: [
    new SnsTopic(topic, {
      message: RuleTargetInput.fromText(
        `The Pipeline '${EventField.fromPath('$.detail.pipeline')}' has ${EventField.fromPath(
          '$.detail.state',
        )}`,
      ),
    }),
  ],
});
After implementing, if you navigate to Amazon EventBridge -> Rules, then select the rule, then select the Target(s) and then click View Details you will see the Target Details with the Input transformer & InputTemplate.
Input transformer:
{"InputPathsMap":{"detail-pipeline":"$.detail.pipeline","detail-state":"$.detail.state"},"InputTemplate":"\"The
Pipeline '<detail-pipeline>' has <detail-state>\""}
This would work for CDK Python. CodeBuild to SNS notifications.
sns_topic = sns.Topic(...)
codebuild_project = codebuild.Project(...)

sns_topic.grant_publish(codebuild_project)

codebuild_project.on_build_failed(
    'rule-on-failed',
    target=events_targets.SnsTopic(
        sns_topic,
        message=events.RuleTargetInput.from_multiline_text(
            f"""
            Name: {events.EventField.from_path('$.detail.project-name')}
            State: {events.EventField.from_path('$.detail.build-status')}
            Build: {events.EventField.from_path('$.detail.build-id')}
            Account: {events.EventField.from_path('$.account')}
            """
        )
    )
)
Credits to @pruthvi-raj's comment on an answer above.

CDK generating empty targets for CfnCrawler

I'm using the CDK Python API to define a Glue crawler; however, the CDK-generated template contains an empty 'Targets' block in the Crawler resource.
I've not been able to find an example to emulate. I've tried varying the definition of the targets object, but the object definition seems to be ignored by CDK.
from aws_cdk import cdk

BUCKET = 'poc-1-bucket43879c71-5uabw2rni0cp'

class PocStack(cdk.Stack):
    def __init__(self, app: cdk.App, id: str, **kwargs) -> None:
        super().__init__(app, id)

        from aws_cdk import (
            aws_iam as iam,
            aws_glue as glue,
            cdk
        )

        glue_role = iam.Role(
            self, 'glue_role',
            assumed_by=iam.ServicePrincipal('glue.amazonaws.com'),
            managed_policy_arns=['arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole']
        )

        glue_crawler = glue.CfnCrawler(
            self, 'glue_crawler',
            database_name='db',
            role=glue_role.role_arn,
            targets={"S3Targets": [{"Path": f'{BUCKET}/path/'}]},
        )
I expect the generated template to contain a valid 'targets' block with a single S3Target. However, cdk synth outputs a template with empty Targets in the AWS::Glue::Crawler resource:
gluecrawler:
  Type: AWS::Glue::Crawler
  Properties:
    DatabaseName: db
    Role:
      Fn::GetAtt:
        - glueroleFCCAEB57
        - Arn
    Targets: {}
Resolved, thanks to a clever colleague!
Changing "S3Targets" to "s3Targets", and "Path" to "path" resolved the issue. See below.
Hi Bob,
When I use TypeScript, the following works for me:
new glue.CfnCrawler(this, 'glue_crawler', {
  databaseName: 'db',
  role: glue_role.roleArn,
  targets: {
    s3Targets: [{ path: "path" }]
  }
});
When I used Python, the following appeared to work too:
glue_crawler = glue.CfnCrawler(
    self, 'glue_crawler',
    database_name='db',
    role=glue_role.role_arn,
    targets={
        "s3Targets": [{"path": f'{BUCKET}/path/'}]
    },
)
In TypeScript, TargetsProperty is an interface with s3Targets as a property, and within s3Targets, path is a property as well. I guess that during the JSII transformation, it forces us to use the same names in Python instead of the original CFN resource names.
A more general way to approach this problem is to dig inside the cdk library in 2 steps:
1.
from aws_cdk import aws_glue
print(aws_glue.__file__)
(.env/lib/python3.8/site-packages/aws_cdk/aws_glue/__init__.py)
2.
Go to that file and see how the mappings/types are defined. As of 16 Aug 2020, you find
@jsii.data_type(
    jsii_type="@aws-cdk/aws-glue.CfnCrawler.TargetsProperty",
    jsii_struct_bases=[],
    name_mapping={
        "catalog_targets": "catalogTargets",
        "dynamo_db_targets": "dynamoDbTargets",
        "jdbc_targets": "jdbcTargets",
        "s3_targets": "s3Targets",
    }
)
I found that lowerCamelCase always works, while the pythonic snake_case does not.
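A related way to sidestep the key-casing question entirely is to use the typed property classes that the CDK generates for CfnCrawler; a sketch in CDK Python, reusing the names from the question:

# Sketch: typed property classes instead of raw dicts, so kwargs are snake_case
# and typos fail loudly at synth time.
from aws_cdk import aws_glue as glue

glue_crawler = glue.CfnCrawler(
    self, 'glue_crawler',
    database_name='db',
    role=glue_role.role_arn,
    targets=glue.CfnCrawler.TargetsProperty(
        s3_targets=[glue.CfnCrawler.S3TargetProperty(path=f'{BUCKET}/path/')],
    ),
)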

Set Gatling report name in Jenkins pipeline through taurus

I'm writing a declarative Jenkins pipeline and I'm facing problems with Gatling reports:
the mean response time trend is not correct. Is there a way to replace the following cloud of dots with a curve?
Extracts from my Jenkinsfile:
stage('perf') {
    steps {
        bzt params: './taurus/scenario.yml', generatePerformanceTrend: false, printDebugOutput: true
        perfReport configType: 'PRT', graphType: 'PRT', ignoreFailedBuilds: true, modePerformancePerTestCase: true, modeThroughput: true, sourceDataFiles: 'results.xml'
        dir ("taurus/results") {
            gatlingArchive()
        }
    }
}
Extract from my scenario.yml:
modules:
  gatling:
    path: ./bin/gatling.sh
    java-opts: -Dgatling.core.directory.data=./data
In scenario.yml, I tried to set gatling.core.outputDirectoryBaseName:
java-opts: -Dgatling.core.directory.data=./data -Dgatling.core.outputDirectoryBaseName=./my_scenario
In this case it only replaces gatling with my_scenario, but the huge number is still present.
I finally found a solution to this problem, but it's not simple since it involves extending the taurus code.
The problem is here, at line 309 of the file gatling.py in the taurus repo: it explicitly adds a 'gatling-' prefix when looking for a Gatling report.
However, the parameter -Dgatling.core.outputDirectoryBaseName=./my_scenario in scenario.yml changes this prefix to my_scenario. What I describe below is a way to quickly extend taurus to work around this.
Create a file ./extensions/gatling.py with this code to extend class GatlingExecutor:
from bzt.modules.gatling import GatlingExecutor, DataLogReader

class GatlingExecutorExtension(GatlingExecutor):
    def __init__(self):
        GatlingExecutor.__init__(self)

    def prepare(self):
        # From method bzt.modules.gatling.GatlingExecutor:prepare, copy the code before the famous line 309
        # Replace line 309 with:
        self.dir_prefix = self.settings.get('dir_prefix', 'gatling-%s' % id(self))
        # From method bzt.modules.gatling.GatlingExecutor:prepare, copy the code after the famous line 309
Create a file ./bztx.py to wrap the bzt command:
import signal
import logging
from bzt.cli import main, signal_handler

if __name__ == "__main__":
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)
    main()
Update the file scenario.yml to use the new dir_prefix setting and the new executor class:
modules:
  gatling:
    path: ./bin/gatling.sh
    class: extensions.GatlingExecutorExtension
    dir_prefix: my_scenario
    java-opts: -Dgatling.core.directory.data=./data -Dgatling.core.outputDirectoryBaseName=./my_scenario
Finally, update your Jenkinsfile by replacing the bzt call with a call to your new file bztx.py:
stage('perf') {
    steps {
        sh 'python bztx.py ./taurus/scenario.yml'
        perfReport configType: 'PRT', graphType: 'PRT', ignoreFailedBuilds: true, modePerformancePerTestCase: true, modeThroughput: true, sourceDataFiles: 'results.xml'
        dir ("taurus/results") {
            gatlingArchive()
        }
    }
}
That's all, and it works for me. Bonus: this solution gives a way to easily extend taurus with your own plugins ;-)

how to get the trigger information in Jenkins programmatically

I need to add the next build time scheduled in a build email notification after a build in Jenkins.
The trigger can be "Build periodically" or "Poll SCM", or anything with schedule time.
I know the trigger info is in the config.xml file e.g.
<triggers>
  <hudson.triggers.SCMTrigger>
    <spec>8 */2 * * 1-5</spec>
    <ignorePostCommitHooks>false</ignorePostCommitHooks>
  </hudson.triggers.SCMTrigger>
</triggers>
and I also know how to get the trigger type and spec with custom scripting from the config.xml file, and calculate the next build time.
I wonder if Jenkins has an API to expose this information out of the box. I have searched, but not found anything.
I realise you probably no longer need help with this, but I just had to solve the same problem, so here is a script you can use in the Jenkins console to output all trigger configurations:
#!groovy

Jenkins.instance.getAllItems().each { it ->
    if (!(it instanceof jenkins.triggers.SCMTriggerItem)) {
        return
    }
    def itTrigger = (jenkins.triggers.SCMTriggerItem)it
    def triggers = itTrigger.getSCMTrigger()
    println("Job ${it.name}:")
    triggers.each { t ->
        println("\t${t.getSpec()}")
        println("\t${t.isIgnorePostCommitHooks()}")
    }
}
This will output all your jobs that use SCM configuration, along with their specification (cron-like expression regarding when to run) and whether post-commit hooks are set to be ignored.
You can modify this script to get the data as JSON like this:
#!groovy
import groovy.json.*

def result = [:]

Jenkins.instance.getAllItems().each { it ->
    if (!(it instanceof jenkins.triggers.SCMTriggerItem)) {
        return
    }
    def itTrigger = (jenkins.triggers.SCMTriggerItem)it
    def triggers = itTrigger.getSCMTrigger()
    triggers.each { t ->
        def builder = new JsonBuilder()
        result[it.name] = builder {
            spec "${t.getSpec()}"
            ignorePostCommitHooks "${t.isIgnorePostCommitHooks()}"
        }
    }
}

return new JsonBuilder(result).toPrettyString()
And then you can use the Jenkins Script Console web API to get this from an HTTP client.
For example, in curl, you can do this by saving your script as a text file and then running:
curl --data-urlencode "script=$(<./script.groovy)" <YOUR SERVER>/scriptText
If Jenkins is using basic authentication, you can supply that with the -u <USERNAME>:<PASSWORD> argument.
Ultimately, the request will result in something like this:
{
  "Build Project 1": {
    "spec": "H/30 * * * *",
    "ignorePostCommitHooks": "false"
  },
  "Test Something": {
    "spec": "@hourly",
    "ignorePostCommitHooks": "false"
  },
  "Deploy ABC": {
    "spec": "H/20 * * * *",
    "ignorePostCommitHooks": "false"
  }
}
You should be able to tailor these examples to fit your specific use case. It seems you won't need to access this remotely but just from a job, but I also included the remoting part as it might come in handy for someone else.
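For completeness, here is a sketch of the same remote call in Python with the requests package (assuming basic auth with a username and API token; depending on your Jenkins security settings you may also need a CSRF crumb):

# Sketch: POST the Groovy script to Jenkins' /scriptText endpoint, like the curl
# example above. JENKINS_URL, USERNAME, and API_TOKEN are placeholders.
import requests

JENKINS_URL = "https://jenkins.example.com"

with open("script.groovy") as f:
    groovy = f.read()

resp = requests.post(
    f"{JENKINS_URL}/scriptText",
    data={"script": groovy},          # same form field curl sends with --data-urlencode
    auth=("USERNAME", "API_TOKEN"),   # same as curl's -u USERNAME:PASSWORD
)
resp.raise_for_status()
print(resp.text)  # the pretty-printed JSON returned by the Groovy script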
