How to define a component/step using training operators such as TFJob in a Kubeflow pipeline

I know there is a way to use the TFJob operator via kubectl, like the example here (https://www.kubeflow.org/docs/components/training/tftraining/):
kubectl create -f https://raw.githubusercontent.com/kubeflow/training-operator/master/examples/tensorflow/simple.yaml
But I don't know how to incorporate it into a Kubeflow pipeline. A normal component/job defined via the @component decorator or ContainerOp is a Kubernetes Job kind that runs in a Pod, but I don't know how to define a component with a special training operator such as TFJob, so that my code runs as
apiVersion: "kubeflow.org/v1"
kind: TFJob
rather than:
apiVersion: "kubeflow.org/v1"
kind: Job
in kubernetes.
P.S.: there is an example here: https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/launcher/sample.py
but I don't see TFJob specified anywhere.

The example you reference leverages code that actually creates a TFJob (look at the folder containing your example):
TFJob is instantiated here: https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/launcher/src/launch_tfjob.py#L97
...and created as Kubernetes resource here: https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/launcher/src/launch_tfjob.py#L126
...the previous code is accessed by a Kubeflow component as specified here: https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/launcher/component.yaml
...which is imported into your referenced example here: https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/launcher/sample.py#L60
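Putting those pieces together, a minimal sketch of using the launcher component from the KFP v1 SDK might look like this (the image, command, and the worker_spec parameter name follow the linked component.yaml, but treat the details as illustrative):
import json
from kfp import components, dsl

# Load the launcher component straight from the repo (same component.yaml as above).
tfjob_launcher_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/launcher/component.yaml'
)

@dsl.pipeline(name='tfjob-launcher-example')
def train_pipeline():
    # worker_spec mirrors the tfReplicaSpecs section of a TFJob manifest;
    # image and command are placeholders.
    worker_spec = {
        'replicas': 2,
        'restartPolicy': 'OnFailure',
        'template': {
            'spec': {
                'containers': [{
                    'name': 'tensorflow',
                    'image': 'your-registry/your-tf-image:latest',
                    'command': ['python', '/opt/train.py'],
                }]
            }
        },
    }
    tfjob_launcher_op(
        name='my-tfjob',
        namespace='kubeflow',
        worker_spec=json.dumps(worker_spec),
    )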
The general question you raised is still the subject of ongoing discussion. Using tfjob_launcher_op appears to be the currently recommended way. Alternatively, some people natively use ResourceOps to simulate your kubectl create call.
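For the ResourceOp route, a sketch along these lines (KFP v1 SDK; the manifest is abbreviated and the success/failure conditions are assumptions you would adjust for your TFJob) simulates the kubectl create call from within a pipeline:
from kfp import dsl

@dsl.pipeline(name='tfjob-resourceop-example')
def tfjob_pipeline():
    # The same manifest you would feed to `kubectl create -f`.
    tfjob_manifest = {
        'apiVersion': 'kubeflow.org/v1',
        'kind': 'TFJob',
        'metadata': {'name': 'my-tfjob'},
        'spec': {
            'tfReplicaSpecs': {
                'Worker': {
                    'replicas': 1,
                    'restartPolicy': 'OnFailure',
                    'template': {
                        'spec': {
                            'containers': [{
                                'name': 'tensorflow',
                                'image': 'your-registry/your-tf-image:latest',
                            }]
                        }
                    },
                }
            }
        },
    }
    dsl.ResourceOp(
        name='create-tfjob',
        k8s_resource=tfjob_manifest,
        action='create',
        # Conditions are evaluated against the TFJob's status by Argo.
        success_condition='status.replicaStatuses.Worker.succeeded == 1',
        failure_condition='status.replicaStatuses.Worker.failed > 0',
    )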

Related

How do I specify multiple experiments in a flex template run?

I am using Dataflow flex templates and am trying to launch a job with a GPU. I am following the docs here to build my template from the base nvidia image: https://cloud.google.com/dataflow/docs/guides/using-gpus
I want to run the template with a GPU attached. This requires two experiments:
"worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver"
"use_runner_v2"
I believe these need to be specified separately rather than as a list; at least I haven't found a way to do that, and the docs specify two --experiment arguments. From looking at the docs I also believe that I need to specify the experiments as part of the --parameters argument, as there is no --experiments argument for running flex templates.
I have tried the following:
In gcloud command line:
Specified the experiments argument twice under --parameters. In this case only the second specified experiment is assigned.
gcloud dataflow flex-template run "test-flex" --template-file-gcs-location=<LOCATION> --parameters=source="bigquery",experiments="worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver",experiments="use_runner_v2" --max-workers=1 --region=us-east4 --worker-zone=us-east4-b
Assigned experiments in --parameters to specify the GPU and set --additional-experiments to use_runner_v2.
gcloud dataflow flex-template run "test-flex" --template-file-gcs-location=<LOCATION> --parameters=source="bigquery",experiments="worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver" --max-workers=1 --region=us-east4 --worker-zone=us-east4-b --additional-experiments='use_runner_v2'
This caused an error:
ERROR: (gcloud.dataflow.flex-template.run) INVALID_ARGUMENT: The template parameters are invalid. Details:
experiments: Runtime parameter experiments should not be specified in both parameters field and environment field.
I can get each experiment to work separately but cannot get them to both work. Is there a simple fix for this? I haven't been able to find anything in the documentation nor figured it out myself.
Please let me know any additional information you need me to provide.
You can pass both experiments in additional-experiments. Either a single flag with a comma-separated list:
--additional-experiments=worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver,use_runner_v2
or the flag repeated:
--additional-experiments=worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver \
--additional-experiments=use_runner_v2
Both should work.
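For example, with the rest of your command unchanged, the single-flag variant would be:
gcloud dataflow flex-template run "test-flex" --template-file-gcs-location=<LOCATION> --parameters=source="bigquery" --max-workers=1 --region=us-east4 --worker-zone=us-east4-b --additional-experiments="worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver,use_runner_v2"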

In jenkins-kubernetes-plugin, how to generate labels in Pod template that are based on a pattern

Set-Up
I am using the jenkins-kubernetes-plugin to run our QE jobs. The QE jobs are executed over multiple pods, and each pod has a static set of labels like testing, chrome.
Issue:
In these QE jobs, there is one port, say 7900, that I want to expose through the Kubernetes Ingress Controller.
The issue is that we have multiple pods running from the same pod template, and they all have the same set of labels. For the Ingress Controller to work, I want these pods to have labels that come from a pattern.
For example, POD1 would have a label chrome-1, POD2 a label chrome-2, and so on...
Is this possible?
This is not currently possible directly, but you could use Groovy in the pipeline to customize it, e.g. add the build ID as a label.
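A sketch of that idea (the label names, image, and pod yaml are illustrative; BUILD_ID and POD_LABEL are provided by Jenkins and the kubernetes plugin):
podTemplate(yaml: """
apiVersion: v1
kind: Pod
metadata:
  labels:
    testing: chrome
    instance: chrome-${env.BUILD_ID}
spec:
  containers:
  - name: chrome
    image: selenium/node-chrome:latest
    tty: true
""") {
    node(POD_LABEL) {
        // QE steps run here; the pod carries the per-build label above
    }
}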

Labelling Openshift Build transient pods

I have a simple S2I build running in an OpenShift 3.11 project that outputs to an image stream.
The build is working fine, and the resulting images are tagged correctly and available in the stream. My issue is that each build spins up a transient pod to handle the actual build. I would like to label these pods. This project is shared between multiple teams, and we have several scripts that differentiate pods based on a label.
Right now each one automatically gets labelled like so:
openshift.io/build.name: <buildname>-<buildnum>
Which is fine; I don't want to get rid of that label. I just want to add an additional custom label (something like owner: <teamname>). How can I do that?
You can add your custom label using the imageLabels section in the BuildConfig, as follows. Further information is here: Output Image Labels.
spec:
  output:
    to:
      kind: "ImageStreamTag"
      name: "your-image:latest"
    imageLabels:
    - name: "owner"
      value: "yourteamname"

Automated ansible output parsing in jenkins pipeline

How do you automatically parse ansible warnings and errors in your jenkins pipeline jobs?
I greatly enjoy the power of leveraging Ansible in Jenkins when it works. Upon a failure, the hunt to locate the actual error can be challenging.
I use WarningsNG, which supports custom parsers (and allows their programmatic generation).
Do you know of any plugins or addons that already transform these logs into the kind of charts similar to WarningsNG's?
I figured I'd ask as I go off into deep regex land and make my own.
One good way to achieve this seems to be the following:
select an existing structured-output ansible callback plugin (json, junit and yaml are all viable). I selected junit, as I can play with the format to get a really nice view into the playbook, with errors reported in a very obvious way.
fork that GPL file (yes, so be careful with that license) to augment it with the following:
store output as a file
implement the missing callback methods (the three mentioned above do not implement the v2...item callbacks)
forward events to the default or debug callback to ensure operators see something when they execute the plan
add a secrets cleaner - if you use the jenkins credentials-binding-plugin, it will hide secrets from the console but it will not hide secrets within stored files. You'll need to handle that in your playbook or via some groovy code (if groovy, try{...} finally { clean } seems a good pattern)
Snippet - forwarding to the default callback
from ansible.plugins.callback import CallbackBase
from ansible.plugins.callback.default import CallbackModule as CallbackModule_default
...
class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'stdout'
    CALLBACK_NAME = 'json'

    def __init__(self, display=None):
        super(CallbackModule, self).__init__(display)
        # keep an instance of the default callback so events can be forwarded to it
        self.default_callback = CallbackModule_default()
    ...
    def v2_on_file_diff(self, result):
        self.default_callback.v2_on_file_diff(result)
        # ... do whatever you'd want to ensure the content appears in the json file
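To actually pick the forked plugin up, something along these lines in ansible.cfg should work (the directory and plugin name are illustrative):
[defaults]
# where the forked callback file lives
callback_plugins = ./callback_plugins
# use it as the stdout callback so its output replaces the default
stdout_callback = my_junit_callback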

How to use dockerhub-notification-plugin in Jenkins scripted pipeline?

I want to trigger a pipeline when a new image is pushed to docker hub.
I installed dockerhub-notification-plugin.
If I use the web UI, it is possible to specify the Docker Hub repo.
I tried to use the pipeline snippet generator, but it does not work correctly: if I specify a repo, it is ignored in the generated code.
For example, it generates this code:
properties([pipelineTriggers([[$class: 'DockerHubTrigger', options: []]])])
As you can see, there is no Docker Hub repo specified in the generated code.
The correct way to do this is to write your properties as below:
properties([
pipelineTriggers([[$class: 'DockerHubTrigger', options: [[$class: 'TriggerOnSpecifiedImageNames', repoNames: ["YOUR_REPO_NAME"].toSet()]]]])
])
First, notice the additional brackets around the options value. This is due to the way Groovy scripts are evaluated in Jenkins.
But why a set?
According to the javadoc, the TriggerOnSpecifiedImageNames class has three constructors: one without parameters, one with a varargs of strings, and one with a collection. But Groovy will use reflection to instantiate this class, which means that the default constructor is called and the respective properties are applied afterwards. And this brings us to toSet(), because as you can see in the javadoc, the setter for the repoNames property looks as follows: setRepoNames(Set<String> repoNames).
