How to skip already-run steps in kubeflow pipeline? - kubeflow

I'm building an ML pipeline in Kubeflow and I have a question. Is there anything out of the box that allows me to configure my pipeline, such that a step is not rerun if its output exists? I've thought of ways to do this manually (either checking for existing outputs as I'm compiling the pipeline, or having an initial step that returns a list of steps to run, or manually configuring which steps to run as an input parameter) but I cannot find a native way of handling this.
The common use case for me would be to rerun the model step without rerunning any pre-processing of the data; but without having to have a specific "model development" pipeline that would differ from the more general prod one that would include the data pre-processing step. Or perhaps I'm iterating on an evaluation phase and I don't even need retraining but I would still like to use the same pipeline. Right now, colleagues are using several pipelines, that each start at a different step, to work around this.
I'm coming at it from a map-reduce perspective where this is trivial - the framework automatically detects which outputs are present and doesn't rebuild them as default, but easily gives you the option to rebuild some or all of them. Maybe this is biasing my way of working with kubeflow?
Any help appreciated!

Ok, I thought I'd put on here what I've found to solve this.
As of September 2019, this is not a feature of Kubeflow (according to people working on it), but there is a caching feature in the works that should not rerun any steps whose outputs exist.
In the meantime I manually implemented it, via a pipelineParam 'startingStep' from which everything needs to be rerun. Something like this:
with dsl.Condition(first_step_to_run == "prep"):
create_ops(StartingStep.prep)
with dsl.Condition(first_step_to_run == "train"):
create_ops(StartingStep.train)
with dsl.Condition(first_step_to_run == "evaluate"):
create_ops(StartingStep.evaluate)
with a create_ops method that understands what order to create steps in and chains them appropriately (we actually have seven steps so I really wanted to avoid copy/pasting all over).

Related

Jenkins pipeline from YAML file

Jenkins declarative pipeline is too powerful for us, often users can abuse it. We are thinking to use an opinionated YAML to describe CI/CD pipeline. And it seems there are two choices.
Write a plugin and consume YAML and dynamically create stage / steps.
Write a plugin to convert a YAML to Jenkins pipeline.
I am not expert on Jenkins, so I hope some expert can give some guidance and maybe an example.
using official plugin pipeline-as-yaml, but it has a fixed grammar.
using or customization wolox-ci
create your own shared libaray. However, they are easy from beginning but full grammer design is required when used widely. Here is a psudo code based on curry.
// create a file named yamlCompiler.groovy in shared library,
def call(str){
def rawMap = readYaml(text: str)
// consume yaml and get a lambda function
return {
stage{
steps.each{it ->
it."$type"(it)
}
}
}
}
Use yamlCompiler in your jenkinsfile code block.
#Library('your libs name')
def str =
'''
steps:
- type: sh
script: ls -la
- type: echo
message: xxx
'''
Closure closure = yamlCompiler(str)
closure.call()
I'm looking for a similar solution. We run hardened predefined pipelines for every project, but still want to allow dev teams to customise certain steps within the process —without allowing them the full power of a Jenkinsfile.
I'm also exploring the possibility of an —in your words— "opinionated YAML".
I've so far only found one example of such an implementation: Wolox-CI supports their own pre-defined build steps via YAML. You'll be able to see the steps they support here.
I'm thinking of parsing the YAML using Snake YAML. Here's an SO answer with an example on how to do it.
Two solutions:
create a shared library to abstract the actual pipeline and provide to your users some guidance on how to setup a shared library and a Jenkinsfile sample. Here is an example of embeded pipeline https://github.com/SAP/jenkins-library/blob/master/vars/piperPipeline.groovy
use another tool like https://drone.io/
If you're not an expert and don't want/have the time to become one, the second solution might be the best one.
Really? Is the only difference here when the plugin is executed?:
Write a plugin and consume YAML and dynamically create stage / steps.
Write a plugin to convert a YAML to Jenkins pipeline.
Forgive me, because I may be a little hardened, but abstracting a layer for the dynamic creation of a declarative, or scripted, Jenkinsfile written in the simple groovy lang syntax so that it can be pretty-printed in yml prevents users from updating your yml exactly how? It seems to me your abstraction only adds to the complexity with which you wish to implement usability.
One, all the current yml plugins for Jenkins do exactly that. Two, they don't actually have the full breadth of "features" (yes, I'm using that term loosely here) accessible by implementing the groovy/(java) classes already available in the Jenkins domain (referencing the DSL). Two solutions exist right now for this, and I've investigated both, and implemented both, extensively. One is wolox-ci, which is the better of the two, and the other is Pipeline-as-YAML. In my opinion, it's easy to use, but both lack the full breadth of implementation features simply using groovy provides. So why force it? Simply so your users can have a pretty-printed yml file, and not have to be concerned with simple syntax, which you claim hardens your infrastructure-as-code backend so that the same users can't screw it up? Sorry, I'm calling bull pucky on that assertion. What's to stop anyone from totally screwing up your builds by pushing a change to the yml file which breaks the integration with groovy, or worse, completely changes an algorithm you worked hard to customize?
Sorry, I just don't get it. Sure, making something more human readable is always a good thing. Doing it because of the reasons you've stipulated makes no sense, though. Also, unless you have a super simple defined algorithm in your CI/CD process, without any non-continuous-passing-style transform methods being implemented, then using the current iterations of the yml-as-Jenkinsfile-templates plugins is probably not the way you want to go.
Now, you could write your own plugin to do this, but what's the technical debt on that, versus just learning the groovy syntax? Also, it still doesn't prevent users from making code changes to your build infrastructure, then integrating those changes in a simple yml file.

combine allure reports from several machines into one without retry

I ask you to consult on the following question about allure: I use jenkins + pytest to run the tests. The same tests run on several virtual machines, these machines differ in operating systems (different linux distributions) and test environment. After running the tests, I want to combine the results from all the machines into one report. - here the question arises - if I put all the reports in one directory and generate a report, then the results from different machines will be considered as rerun of the same test and combined into one. How can I get around this? so as not to be combined and so that it was possible to somehow sort out which result from which machine. Thanks.
i have solve this by override the names of tests/suites.
Meaning you have to make some code implementation, work with the before listeners, there you can get the current test name and override it. Set the test name by OS + Browser or something unique.
When you combine reports, they will be unique and properly displayed.
I ran into a similar issue with behave where Allure was treating each parallel build as a retry of the first build. I realize this isn't the same as pytest, but perhaps it'll help.
I was inspired by the previous answer and started experimenting. By changing the scenario name(s) within the feature, I was able to make Allure recognize each parallel build as separate tests. I accomplished this by adding a before_feature method to my environment.py file that simply added the hostname to each scenario name within that feature:
def before_feature(context, feature):
for scenario in feature.scenarios:
scenario.name = f'[{socket.gethostname()}] {scenario.name}'
Originally, I tried to directly change scenario.name in before_scenario but that seemed to have no effect in Allure.

Add condition to transition using script runner

I am using the scriptrunner plugin for Jira.
Is it possible to add a condition to a transition using scriptrunner?
Currently, my condition is in a script which I have manually added to the workflow.
But I was wondering if there is a way to do it automatically?
I was looking through documentation on: https://docs.atlassian.com/
I came across this method:
replaceConditionInTransition which is a method of WorkFlowManager.
But I'm unsure on how to use this.
Any help would be appreciated.
Conditions as any another scripts can be added from file system. You can store scripts in any VCS (bitbucket, github, gitlab, etc) and automatically deploy them to Jira server file system through any CI/CD system (teamcity, jenkins, bamboo, gitlab, etc).
So, as result process will be looks like. 1. commit changes in you script to vcs 2. wait a bit for auto deploy (e.g. triggered by commit) 3. done. As additional you can write any script/service/etc for commit these changes automatically if needed.
Also look at script roots it's helpful way which allows reuse any of script fragments through helpers classes.
It's rather conceptual answer basically because implementation is depends on environment, but I hope that you get at least one more point of view to solve this task.
I think that using the Java API to modify Jira workflows is pretty tough. You could dig around in the workflow editor to see how conditions are added there. Remember that you have to do this in a draft workflow and then publish it, which takes some time in large projects
I like the idea of replacing a script file as easier, if it can be done when no issues are transitioning

configure grid extra (groupon) with jenkins in order to run cucumber tests

I'm struggling with something for quite a while and can't find a solution,
I got a test project (cucumber, maven) I configure jenkins to pull the project from github, build and execute the code (selenium test script) on jenkins slave and that works perfect, I added few more slave, tagged them and I'm able to execute the same job parallel(the same test cases on different machines)
my next step is to use grid extra (https://github.com/groupon/Selenium-Grid-Extras) in order to use some cool features like video recording, browser updating, selenium updates etc...
now, I know that in order to use the grid I need to address it via my code and also define the desired capabilities (browser, os etc...).
currently when I'm running the same job twice, my second request will be queued till the first one ends, if I will run the same code from two developers machine it will run at 2 different nodes and the grid can handle both request.
not sure what is wrong with my jenkins configuration or my grid hub configuration, I checked it again and again and all looks good :-)
so guess I'm missing something
any advice/direction/idea will be highly appreciated.
Thanks
Ronen

Revert snapshot before or after each test method in a TFS build-deploy-test workflow?

I have a scenario that requires me to revert to a clean snapshot before or after each Coded UI test method is executed. I have researched using the TFS Lab Management API (see http://blogs.microsoft.co.il/shair/2011/12/22/tfs-api-part-42-getting-started-with-lab-management-api/) to revert to a specific snapshot as part of the TestInitialize and/or TestCleanup method, but I can only get this to work when executed locally. When executed on a remote machine I get errors authenticating to the TFS service.
My other option is to somehow do a 'foreach test in testrun' into the build process template (LabDefaultTemplate.11.xaml). I have identified the area that I think this would fit best, but cannot find any documentation on running a loop on each test.
Is this something that is possible, or is there somehow a built in method to accomplish this that I have overlooked?
To do what you propose you should switch to Release Management and create a separate test run for each of your groupings, in your case each test. You can use RM to orchestrate looping through each of your runs and executing then.
http://nakedalm.com/execute-tests-release-management-visual-studio-2013/
However running a UI test should not break your application and I would suggest that either your tests are way too long, or there is some flaw in the design of your application.

Resources