How to make a Kubeflow pipeline step depend on multiple previous steps

I am running several Kubeflow (kf) steps in parallel. When they have all completed AND all succeeded, I would like to trigger one final step. With my current implementation, the final step triggers as soon as any one of the previous steps succeeds, which is not what I intend.
I have been looking at the documentation but I could not find a straightforward way to do it. Could someone provide an example?

If Kubeflow knows about a dependency between your steps (e.g. step B consumes the output of step A), this ordering happens automatically. Otherwise, you can declare the dependency explicitly with .after():
from kfp import components, dsl

@components.func_to_container_op
def echo(text: str) -> str:
    print(text)
    return text

@dsl.pipeline("with-afters")
def my_pipeline():
    parallel_1 = echo("bish")
    parallel_2 = echo("bash")
    # serial_1 starts only after both parallel steps have finished
    serial_1 = echo("bosh").after(parallel_1).after(parallel_2)
Waiting for every iteration of a loop is just as easy:
@dsl.pipeline("with-loop")
def loop_pipeline():
    with dsl.ParallelFor(["bish", "bash"]) as word:
        loop_step = echo(word)
    # Referencing loop_step outside the loop makes serial_1 wait for the whole loop
    serial_1 = echo("bosh").after(loop_step)
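To make the intended semantics concrete outside Kubeflow, here is a plain-Python sketch (it does not use the kfp API): the parallel steps run in a thread pool, the final step waits for all of them, and it only runs if none of them raised. The `echo` stand-in and the `"fail"` input used to simulate a failing step are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def echo(text: str) -> str:
    # Hypothetical stand-in for a pipeline step; the input "fail" simulates a failure
    if text == "fail":
        raise RuntimeError("step failed")
    print(text)
    return text


def run_fan_in(parallel_inputs, final_input):
    """Run steps in parallel, then run the final step only if all of them succeeded."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(echo, text) for text in parallel_inputs]
        # Block until every parallel step has finished, successfully or not
        wait(futures)
    # exception() returns None for a step that completed without raising
    if any(f.exception() is not None for f in futures):
        return None  # at least one step failed, so skip the final step
    return echo(final_input)
```

For instance, `run_fan_in(["bish", "bash"], "bosh")` returns `"bosh"`, while `run_fan_in(["bish", "fail"], "bosh")` returns `None` without running the final step.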

Alternatively, return a Boolean output from each step and check in the downstream step whether they are all true. I don't know the Kubeflow syntax off the top of my head, but with ZenML (which can run on Kubeflow) it would look like this:
from zenml.steps import step  # legacy ZenML 0.x import paths
from zenml.pipelines import pipeline

successful = True  # placeholder: stands in for each step's real success check

@step
def step1() -> bool:
    if successful:
        return True
    return False

@step
def step2() -> bool:
    if successful:
        return True
    return False

@step
def step3() -> bool:
    if successful:
        return True
    return False

@step
def downstream(step1: bool, step2: bool, step3: bool) -> bool:
    if step1 and step2 and step3:
        # execute stuff
        return True
    return False

@pipeline
def p(step1, step2, step3, downstream):
    downstream(
        step1(),
        step2(),
        step3(),
    )

p(step1(), step2(), step3(), downstream()).run()
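The Boolean-flag pattern itself does not depend on ZenML. A framework-free sketch, where `step` is a hypothetical stand-in for real work, shows that `all()` generalizes the check to any number of upstream results:

```python
def step(succeeded: bool) -> bool:
    # Hypothetical step: reports whether its work succeeded
    return succeeded


def downstream(*upstream_results: bool) -> bool:
    # Only do the work when every upstream flag is True
    if all(upstream_results):
        # execute stuff
        return True
    return False
```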

Related

Why is my for-each loop not working properly in parallel stages in Jenkins scripted syntax?

In the context of Jenkins pipelines, I have some Groovy code that enumerates a list, creates closures, and then uses the current value inside each closure as a key to look up another value in a map. Almost every time, this behaves as if there were some sort of anomaly or race condition.
This is a simplification of the code:
def tasks = [:]
for (platformName in platforms) {
    // ...
    tasks[platformName] = {
        def componentUploadPath = componentUploadPaths[platformName]
        echo "Uploading for platform [${platformName}] to [${componentUploadPath}]."
        // ...
    }
}
tasks.failFast = true
parallel(tasks)
platforms has two values. I usually see two iterations and two registered tasks, and the keys in tasks are correct, but the echo statement inside the closure shows that we're just running one of the platforms twice:
14:20:02 [platform2] Uploading for platform [platform1] to [some_path/platform1].
14:20:02 [platform1] Uploading for platform [platform1] to [some_path/platform1].
It's ridiculous.
What do I need to add or do differently?
It's the same issue you'd see in JavaScript with var-declared loop variables.
When you generate the closures in a for loop, they are bound to the variable, not to the value the variable held at that moment.
By the time the closures run, the loop has exited, so they all see the same value: the last value the loop variable held.
For example, you'd expect the following to print 1 2 3 4, but it doesn't:
def closures = []
for (i in 1..4) {
    closures << { -> println i }
}
closures.each { it() }
It prints 4 4 4 4
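Python closures bind late in the same way, so the pitfall can be reproduced there as a rough analogue (Python, not Groovy):

```python
closures = []
for i in range(1, 5):
    # Each lambda refers to the variable i, not to its value in this iteration
    closures.append(lambda: i)

print([c() for c in closures])  # [4, 4, 4, 4]
```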
To fix this, you can do one of two things. First, you can capture the value in a locally scoped variable and close over that variable instead:
for (i in 1..4) {
    def n = i
    closures << { -> println n }
}
Second, you can use Groovy's each or collect: the closure parameter is a fresh variable on every call, so each closure captures its own value:
(1..4).each { i ->
    closures << { -> println i }
}
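Both fixes have rough Python analogues: binding the current value as a default argument (similar to copying into `def n = i`), or creating each closure inside its own function call so it gets a fresh variable (similar to `each { i -> ... }`). The helper name `make_printer` is hypothetical.

```python
# Fix 1: a default argument is evaluated when the lambda is defined
closures_default = [lambda n=i: n for i in range(1, 5)]


# Fix 2: each call to make_printer creates a new scope holding its own n
def make_printer(n):
    return lambda: n


closures_factory = [make_printer(i) for i in range(1, 5)]

print([c() for c in closures_default])  # [1, 2, 3, 4]
print([c() for c in closures_factory])  # [1, 2, 3, 4]
```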
For your case, you can loop over platforms and collect into a map at the same time by using collectEntries:
def tasks = platforms.collectEntries { platformName ->
    [
        platformName,
        { ->
            def componentUploadPath = componentUploadPaths[platformName]
            echo "Uploading for platform [${platformName}] to [${componentUploadPath}]."
        }
    ]
}
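As a Python analogue of `collectEntries` plus `parallel(tasks)`, the sketch below builds a name-to-task map with a factory function, so each task captures its own platform name, and then runs the tasks concurrently in a thread pool. `component_upload_paths` is a hypothetical stand-in for `componentUploadPaths`, and the tasks return their message instead of calling Jenkins' `echo`.

```python
from concurrent.futures import ThreadPoolExecutor

platforms = ["platform1", "platform2"]
# Hypothetical stand-in for componentUploadPaths
component_upload_paths = {name: f"some_path/{name}" for name in platforms}


def make_task(platform_name):
    # platform_name is a parameter, so each task captures its own value
    def task():
        path = component_upload_paths[platform_name]
        return f"Uploading for platform [{platform_name}] to [{path}]."
    return task


tasks = {name: make_task(name) for name in platforms}

# Run all tasks concurrently, roughly like parallel(tasks)
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(task) for name, task in tasks.items()}
    results = {name: future.result() for name, future in futures.items()}

for name, message in results.items():
    print(name, message)
```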
Hope this helps!

Test if there is a match when using a regular expression in a Jenkins pipeline

I am using a regex to grab a number from a string in my pipeline.
It works fine as long as there is a match, but when there is no match I get an error:
java.lang.IndexOutOfBoundsException: index is out of range 0..-1 (index = 0)
The error happens when I try to capture the group on the following line:
env.ChangeNr = chngnr[0][1]
How can I test whether my capture group matched anything?
This is the pipeline:
pipeline {
    agent {
        node {
            label 'myApplicationNode'
        }
    }
    environment {
        GIT_MESSAGE = "${bat(script: "git log --no-walk --format=format:%%s ${GIT_COMMIT}", returnStdout: true)}".readLines().drop(2).join(" ")
    }
    stages {
        stage('get_commit_msg') {
            steps {
                script {
                    def gitmsg = env.GIT_MESSAGE
                    def chngnr = gitmsg =~ /([0-9]{1,8})/
                    env.ChangeNr = chngnr[0][1] /* put test if nothing is extracted */
                }
            }
        }
    }
}
In Groovy, the =~ (find) operator creates a java.util.regex.Matcher, so you can call its standard methods such as find(), as well as Groovy's own extensions such as size(). In your case, you can just use size() to test whether there are any matches before you attempt to extract a group:
def chngnr = gitmsg =~/([0-9]{1,8})/
assert chngnr.size() > 0
env.ChangeNr = chngnr[0][1]
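For comparison, Python's `re` module has no matcher object with `size()`, but counting the results of `re.findall` serves the same purpose. A minimal sketch; `extract_change_nr` is a hypothetical helper name:

```python
import re


def extract_change_nr(message: str):
    # With exactly one group, findall returns a list of that group's strings
    matches = re.findall(r"([0-9]{1,8})", message)
    # Like matcher.size() > 0: make sure there is a match before indexing
    if len(matches) > 0:
        return matches[0]
    return None
```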
Another nice option is to use the matcher in a boolean context. Groovy then implicitly invokes matcher.find(), so the expression evaluates to true if any part of the string matches the pattern:
def chngnr = gitmsg =~ /([0-9]{1,8})/
if (chngnr) {
    env.ChangeNr = chngnr[0][1]
}
else {
    // handle the no-match case here
}
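Python offers a similar boolean shortcut: `re.search` returns `None` when nothing matches, and a match object is truthy, so the test reads much like the Groovy version. A sketch with a hypothetical function name and default value:

```python
import re


def change_nr_or_default(message: str, default: str = "none") -> str:
    match = re.search(r"([0-9]{1,8})", message)
    # A Match object is truthy; None (no match) is falsy
    if match:
        return match.group(1)
    return default
```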
You can read more about regular expressions in the Groovy documentation.

Jenkins Pipeline errors out accepting multiple values from a groovy method

I'm trying to return multiple values from a Groovy method into a Jenkins pipeline and keep hitting Pipeline Workflow errors. Any pointers as to what I'm doing wrong would be greatly appreciated.
(env.var1, env.var2, env.var3) = my_func()

def my_func() {
    def a = 10
    def b = 10
    def c = 10
    return [a, b, c]
}
I get the following error:
expecting ')', found ',' #(env.var1, env.var2, env.var3) = my_func()
You are using Groovy's multiple assignment feature incorrectly. It works when you assign a collection of values to a list of variables, but you can't use this type of assignment to set properties of an existing object. Your code also fails when executed in plain Groovy:
def env = [foo: 'bar']
(env.var1, env.var2, env.var3) = my_func()
println env

def my_func() {
    def a = 10
    def b = 10
    def c = 10
    return [a, b, c]
}
Output:
1 compilation error:
expecting ')', found ',' at line: 3, column: 14
In a Jenkins environment, the env variable is not a map but an EnvActionImpl object, which does not even support the plus() or putAll() methods. It only overrides getProperty() and setProperty(), which is why you can access values with env.name dot notation.
Solution
The simplest solution is to use multiple assignment correctly, into new local variables, and then set the env variables from those. Consider the following example:
node {
    stage("A") {
        def (var1, var2, var3) = my_func()
        env.var1 = var1
        env.var2 = var2
        env.var3 = var3
    }
    stage("B") {
        println env.var1
    }
}

def my_func() {
    def a = 10
    def b = 10
    def c = 10
    return [a, b, c]
}
Keep in mind that the variables var1, var2 and var3 must not already be declared in the current scope; otherwise the compiler throws an exception.
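The same two-step pattern (destructure into fresh local names, then copy each value into the environment) looks like this in Python, with a plain dict as a hypothetical stand-in for Jenkins' `env`:

```python
def my_func():
    a = 10
    b = 10
    c = 10
    return [a, b, c]


# Hypothetical stand-in for Jenkins' env object: a plain dict
env = {}

# Step 1: destructure the returned list into fresh local names
var1, var2, var3 = my_func()

# Step 2: copy each value into the environment explicitly
env["var1"] = var1
env["var2"] = var2
env["var3"] = var3
```

Unlike Groovy, Python would also accept subscripted targets directly, as in `env["var1"], env["var2"], env["var3"] = my_func()`.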

Making a variable number of parallel HTTP requests with Gatling?

I am trying to model a server-to-server REST API interaction in Gatling 2.2.0. There are several interactions of the type "request a list and then request all items on the list in parallel", but I can't seem to model this in Gatling. Code so far:
def groupBy(dimensions: Seq[String], metric: String) = {
  http("group by")
    .post(endpoint)
    .body(...).asJSON
    .check(
      ...
      .saveAs("events")
    )
}

scenario("Dashboard scenario")
  .exec(groupBy(dimensions, metric)
    .resources(
      // a http() for each item in session("events"), plz
    )
  )
I have gotten as far as figuring out that parallel requests are performed by .resources(), but I don't understand how to generate a list of requests to feed it. Any input is appreciated.
The approach below works for me. A Seq of HttpRequestBuilder passed to .resources() is executed concurrently:
val numberOfParallelReq = 1000

val scn = scenario("Some scenario")
  .exec(
    http("first request")
      .post(url)
      .resources(parallelRequests: _*)
      .body(StringBody(firstReqBody))
      .check(status.is(200))
  )

def parallelRequests: Seq[HttpRequestBuilder] =
  (0 until numberOfParallelReq).map(i => generatePageRequest(i))

def generatePageRequest(id: Int): HttpRequestBuilder = {
  val body = "Your request body here...."
  http("page")
    .post(url)
    .body(StringBody(body))
    .check(status.is(200))
}
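The underlying pattern (one request that returns a list, then one request per item, all issued concurrently) can be sketched in Python with `asyncio.gather`. This is not Gatling: `fetch_list` and `fetch_item` are hypothetical coroutines that sleep instead of making real HTTP calls, and the item names are made up.

```python
import asyncio


async def fetch_list():
    # Hypothetical first request, standing in for the "group by" call
    await asyncio.sleep(0.01)
    return ["event-1", "event-2", "event-3"]


async def fetch_item(item: str) -> str:
    # Hypothetical per-item request
    await asyncio.sleep(0.01)
    return f"details of {item}"


async def list_then_fan_out():
    items = await fetch_list()
    # Issue all per-item requests concurrently; gather keeps the input order
    return await asyncio.gather(*(fetch_item(item) for item in items))


results = asyncio.run(list_then_fan_out())
print(results)
```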
I'm not entirely sure what you're asking, but if you need to send parallel requests, that can be done by injecting several users at once:
setUp(scenorio.inject(atOnceUsers(NO_OF_USERS)));
Refer to http://gatling.io/docs/2.0.0-RC2/general/simulation_setup.html

How to parse text in Groovy

I need to parse a text (output from a svn command) in order to retrieve a number (svn revision).
This is my code. Note that I need to retrieve the whole output stream as text to do other operations.
def proc = cmdLine.execute() // Call *execute* on the string
proc.waitFor()               // Wait for the command to finish
def output = proc.in.text
//other stuff happening here
output.eachLine { line ->
    def revisionPrefix = "Last Changed Rev: "
    if (line.startsWith(revisionPrefix)) res = new Integer(line.substring(revisionPrefix.length()).trim())
}
This code works fine, but since I'm still a Groovy novice, I'm wondering whether there is a more idiomatic way to avoid the ugly if.
Example of svn output (the problem is of course more general):
Path: .
Working Copy Root Path: /svn
URL: svn+ssh://svn.company.com/opt/svnserve/repos/project/trunk
Repository Root: svn+ssh://svn.company.com/opt/svnserve/repos
Repository UUID: 516c549e-805d-4d3d-bafa-98aea39579ae
Revision: 25447
Node Kind: directory
Schedule: normal
Last Changed Author: ubi
Last Changed Rev: 25362
Last Changed Date: 2012-11-22 10:27:00 +0000 (Thu, 22 Nov 2012)
I took inspiration from the answer below and solved it using find(). My solution is:
def revisionPrefix = "Last Changed Rev: "
def line = output.readLines().find { line -> line.startsWith(revisionPrefix) }
def res = new Integer(line?.substring(revisionPrefix.length())?.trim()?:"0")
3 lines, no if, very clean
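The same find-with-fallback idea in Python, as a sketch: `next()` with a default stands in for `find` plus the `?: "0"` fallback. The sample text is taken from the svn output above; the function name is hypothetical.

```python
sample = """Path: .
Revision: 25447
Last Changed Author: ubi
Last Changed Rev: 25362
Last Changed Date: 2012-11-22 10:27:00 +0000 (Thu, 22 Nov 2012)"""

REVISION_PREFIX = "Last Changed Rev: "


def last_changed_rev(output: str) -> int:
    # First line with the prefix, or None if there is no such line
    line = next((ln for ln in output.splitlines() if ln.startswith(REVISION_PREFIX)), None)
    # Fall back to 0, like the ?: "0" in the Groovy version
    return int(line[len(REVISION_PREFIX):].strip()) if line else 0


print(last_changed_rev(sample))  # 25362
```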
One possible alternative is:
def output = cmdLine.execute().text
Integer res = output.readLines().findResult { line ->
    (line =~ /^Last Changed Rev: (\d+)$/).with { m ->
        if (m.matches()) {
            m[0][1] as Integer
        }
    }
}
Not sure whether it's better or not. I'm sure others will have different alternatives.
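A Python counterpart of `findResult` is `next()` over a generator that skips non-matching lines, with `re.fullmatch` playing the role of `Matcher.matches()`. A sketch, with a hypothetical function name:

```python
import re

REV_RE = re.compile(r"Last Changed Rev: (\d+)")


def last_changed_rev_regex(output: str):
    # fullmatch returns None for lines that are not exactly "Last Changed Rev: <digits>"
    matches = (REV_RE.fullmatch(line) for line in output.splitlines())
    # Like findResult: the first non-None result, or None if no line matched
    return next((int(m.group(1)) for m in matches if m), None)
```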
Edit:
Also, beware of using proc.text: if your process outputs a lot, you could end up blocking when the stream buffer fills up.
Here is a heavily commented alternative, using waitForProcessOutput (unlike consumeProcessOutput, it waits until both streams have been fully read):
// Run the command
String output = cmdLine.execute().with { proc ->
    // Then, with a StringWriter
    new StringWriter().with { sw ->
        // Read stdout into sw and stderr into System.err, waiting until both are consumed
        proc.waitForProcessOutput(sw, System.err)
        // Make sure we worked
        assert proc.waitFor() == 0
        // Return the output (goes into `output` var)
        sw.toString()
    }
}

// Extract the revision by looking through all the lines
Integer version = output.readLines().findResult { line ->
    // Pass the line through a regular expression
    (line =~ /Last Changed Rev: (\d+)/).with { m ->
        // And if it matches
        if (m.matches()) {
            // Return the \d+ part as an Integer
            m[0][1] as Integer
        }
    }
}
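In Python, `subprocess.run(..., capture_output=True)` reads stdout and stderr to completion while the process runs, which avoids the full-buffer blocking problem described above, and `check=True` plays the role of asserting a zero exit code. In this sketch the command is a hypothetical stand-in for `svn info`: a Python one-liner that prints two lines of sample output.

```python
import re
import subprocess
import sys


def run_and_get_revision(cmd):
    # run() drains both pipes until the process exits; check=True raises on a non-zero exit code
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    for line in proc.stdout.splitlines():
        m = re.fullmatch(r"Last Changed Rev: (\d+)", line)
        if m:
            return int(m.group(1))
    return None


# Hypothetical command standing in for `svn info`
demo_cmd = [sys.executable, "-c", "print('Path: .'); print('Last Changed Rev: 25362')"]
print(run_and_get_revision(demo_cmd))  # 25362
```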
