In Kubeflow Pipelines, how do I send a list of elements to a lightweight Python component?

I am trying to send a list of elements as a PipelineParameter to a lightweight component.
Here is a sample that reproduces the problem. Here is the function:
def my_func(my_list: list) -> bool:
    print(f'my_list is {my_list}')
    print(f'my_list is of type {type(my_list)}')
    print(f'elem 0 is {my_list[0]}')
    print(f'elem 1 is {my_list[1]}')
    return True
And if I execute it with this:
test_data = ['abc', 'def']
my_func(test_data)
It behaves as expected:
my_list is ['abc', 'def']
my_list is of type <class 'list'>
elem 0 is abc
elem 1 is def
But if I wrap it in an op and set up a pipeline:
import kfp

my_op = kfp.components.func_to_container_op(my_func)

@kfp.dsl.pipeline()
def my_pipeline(my_list: kfp.dsl.PipelineParam = kfp.dsl.PipelineParam('my_list', param_type=kfp.dsl.types.List())):
    my_op(my_list)

kfp.compiler.Compiler().compile(my_pipeline, 'my_pipeline.zip')
And then run a pipeline:
client = kfp.Client()
experiment = client.create_experiment('Default')
client.run_pipeline(experiment.id, 'my job', 'my_pipeline.zip', params={'my_list': test_data})
Then it seems at some point my list was converted to a string!
my_list is ['abc', 'def']
my_list is of type <class 'str'>
elem 0 is [
elem 1 is '
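This output is consistent with the function receiving the string representation of the list rather than the list itself. A quick plain-Python check (no KFP involved, just assuming the parameter gets stringified on the way in) reproduces the same symptoms:

# If the parameter arrives as str(test_data) rather than a list,
# indexing it returns individual characters.
received = str(['abc', 'def'])  # "['abc', 'def']"
print(received[0])  # [  -- matches "elem 0 is [" above
print(received[1])  # '  -- matches "elem 1 is '" above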

Here is a workaround I discovered: serializing the arguments as a JSON string. Not sure this is really the best way...
The bare function becomes:
def my_func(json_arg_str: str) -> bool:
    import json
    args = json.loads(json_arg_str)
    my_list = args['my_list']
    print(f'my_list is {my_list}')
    print(f'my_list is of type {type(my_list)}')
    print(f'elem 0 is {my_list[0]}')
    print(f'elem 1 is {my_list[1]}')
    return True
Which still works, as long as you pass the args as a JSON string instead of a list:
test_data = '{"my_list":["abc", "def"]}'
my_func(test_data)
Which produces the expected results:
my_list is ['abc', 'def']
my_list is of type <class 'list'>
elem 0 is abc
elem 1 is def
And now the pipeline is changed to accept a str instead of a PipelineParam of type kfp.dsl.types.List:
import kfp

my_op = kfp.components.func_to_container_op(my_func)

@kfp.dsl.pipeline()
def my_pipeline(json_arg_str: str):
    my_op(json_arg_str)

kfp.compiler.Compiler().compile(my_pipeline, 'my_pipeline.zip')
Which, when executed like this:
client = kfp.Client()
experiment = client.create_experiment('Default')
client.run_pipeline(experiment.id, 'my job', 'my_pipeline.zip', params={'json_arg_str': test_data})
Produces the same result:
my_list is ['abc', 'def']
my_list is of type <class 'list'>
elem 0 is abc
elem 1 is def
Although it works, I nevertheless find this workaround annoying. What, then, is the point of kfp.dsl.types.List, if not to allow a PipelineParam that is a list?

Currently the best option seems to be serializing the arguments. There is one issue related to this: https://github.com/kubeflow/pipelines/issues/1901
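For completeness, the round trip in that workaround is plain JSON handling on both sides. A minimal sketch of the idea, using json.dumps on the client instead of a hand-written JSON string:

import json

test_data = ['abc', 'def']

# Client side: serialize the list before submitting the run
params = {'json_arg_str': json.dumps({'my_list': test_data})}

# Component side: deserialize inside the function, as my_func does above
args = json.loads(params['json_arg_str'])
assert args['my_list'] == test_data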

Related

How to iterate over a compile-time seq in a manner that unrolls the loop?

I have a sequence of values that I know at compile-time, for example: const x: seq[string] = @["s1", "s2", "s3"]
I want to loop over that seq in a manner that keeps the variable a static string instead of a string, as I intend to use these strings with macros later.
I can iterate on objects in such a manner using the fieldPairs iterator, but how can I do the same with just a seq?
A normal loop such as
for s in x:
  echo s is static string
does not work, as s will be a string, which is not what I need.
The folks over at the Nim forum were very helpful (see the forum thread).
The solution appears to be writing your own macro to do this. Two solutions I managed to make work for me came from the users mratsim and hlaaftana (a specialized version).
Hlaaftana's version:
This one unrolls the loop over the values in the sequence. That is, the "iteration variable" s takes on the value of each entry of the compile-time seq x (or, in this example, a) in turn, so it functions basically like a normal for-in loop.
import macros

macro unrollSeq(x: static seq[string], name, body: untyped) =
  result = newStmtList()
  for a in x:
    result.add(newBlockStmt(newStmtList(
      newConstStmt(name, newLit(a)),
      copy body
    )))

const a = @["la", "le", "li", "lo", "lu"]

unrollSeq(a, s):
  echo s is static
  echo s
mratsim's version:
This one doesn't unroll a loop over the values, but over a range of indices.
You basically tell the staticFor macro what range of indices you want an unrolled for loop over, and it generates that for you. You can then access the individual entries in the seq with that index.
import std/macros

proc replaceNodes(ast: NimNode, what: NimNode, by: NimNode): NimNode =
  # Replace "what" ident node by "by"
  proc inspect(node: NimNode): NimNode =
    case node.kind:
    of {nnkIdent, nnkSym}:
      if node.eqIdent(what):
        return by
      return node
    of nnkEmpty:
      return node
    of nnkLiterals:
      return node
    else:
      var rTree = node.kind.newTree()
      for child in node:
        rTree.add inspect(child)
      return rTree
  result = inspect(ast)

macro staticFor*(idx: untyped{nkIdent}, start, stopEx: static int, body: untyped): untyped =
  result = newStmtList()
  for i in start .. stopEx: # Slight modification here to make indexing behave more in line with the rest of nim-lang
    result.add nnkBlockStmt.newTree(
      ident("unrolledIter_" & $idx & $i),
      body.replaceNodes(idx, newLit i)
    )

staticFor(index, x.low, x.high):
  echo index
  echo x[index] is static string
Elegantbeef's version:
Similar to Hlaaftana's version, this unrolls the loop itself and provides you with a value, not an index.
import std/[macros, typetraits]

proc replaceAll(body, name, wth: NimNode) =
  for i, x in body:
    if x.kind == nnkIdent and name.eqIdent x:
      body[i] = wth
    else:
      x.replaceAll(name, wth)

template unrolledFor*(nameP, toUnroll, bodyP: untyped): untyped =
  mixin
    getType,
    newTree,
    NimNodeKind,
    `[]`,
    add,
    newIdentDefs,
    newEmptyNode,
    newStmtList,
    newLit,
    replaceAll,
    copyNimTree
  macro myInnerMacro(name, body: untyped) {.gensym.} =
    let typ = getType(typeof(toUnroll))
    result = nnkBlockStmt.newTree(newEmptyNode(), newStmtList())
    result[^1].add nnkVarSection.newTree(newIdentDefs(name, typ[^1]))
    for x in toUnroll:
      let myBody = body.copyNimTree()
      myBody.replaceAll(name, newLit(x))
      result[^1].add myBody
  myInnerMacro(nameP, bodyP)

const x = @["la", "le", "Li"]

unrolledFor(value, x):
  echo value is static
  echo value
All of them are valid approaches.

Jenkins Pipeline errors out accepting multiple values from a groovy method

I'm trying to accept multiple values from a Groovy method into a Jenkins pipeline and keep hitting Pipeline workflow errors; any pointers as to what I'm doing wrong here are greatly appreciated.
(env.var1, env.var2, env.var3) = my_func()

def my_func() {
    def a = 10
    def b = 10
    def c = 10
    return [a, b, c]
}
I get the following error:
expecting ')', found ','
(env.var1, env.var2, env.var3) = my_func()
You are using Groovy's multiple assignment feature incorrectly. It works when you assign a collection of values to a list of new variables. You can't use this type of assignment to assign values to an existing object. Your code also fails when executed in plain Groovy:
def env = [foo: 'bar']

(env.var1, env.var2, env.var3) = my_func()

println env

def my_func() {
    def a = 10
    def b = 10
    def c = 10
    return [a, b, c]
}
Output:
1 compilation error:
expecting ')', found ',' at line: 3, column: 14
In the Jenkins environment, the env variable is represented not by a map but by an EnvActionImpl object, which means it does not even support the plus() or putAll() methods. It only overrides the getProperty() and setProperty() methods, so you can access properties using env.name dot notation.
Solution
The simplest solution to your problem is to use multiple assignment correctly and then set env variables from the local variables. Consider the following example:
node {
    stage("A") {
        def (var1, var2, var3) = my_func()
        env.var1 = var1
        env.var2 = var2
        env.var3 = var3
    }
    stage("B") {
        println env.var1
    }
}

def my_func() {
    def a = 10
    def b = 10
    def c = 10
    return [a, b, c]
}
Keep in mind that the var1, var2 and var3 variables cannot already exist in the current scope, otherwise the compiler will throw an exception.

Lua, Modify print function

I am writing a generic Log() function in Lua which utilizes the Lua print function:
Log (variable, 'String: %s ', str, 'Word: %d', w)
Currently I'm using the approach below:
print(string.format (variable, 'String: %s ', str, 'Word: %d', w))
I tried something like:
Log = function(...) begin
    return print(string.format(...))
end
But it doesn't work. Is this the correct approach, or is there a better, more generic way to get this done?
If you just want to print a sequence of values, you can do that with print:
print(variable, 'String: %s ', str, 'Word: %d', w)
What you seem to want is something more complicated. Your algorithm seems to be:
For each argument:
- If the argument is not a string, then convert it to a string and print it.
- If the argument is a string, figure out how many % patterns it has (let us call this number k). Pass string.format the current argument string and the following k parameters, printing the resulting string. Advance k parameters.
That's a much more complicated algorithm than can be done in a one-line system.
Using Lua 5.3, here's what such a function would look like (note: barely tested code):
function Log(...)
    local values = {}
    local params = table.pack(...)
    local curr_ix = 1
    while (curr_ix <= params.n) do
        local value = params[curr_ix]
        if(type(value) == "string") then
            --Count the number of `%` characters, *except* for
            --sequential `%%`.
            local num_formats = 0
            for _ in value:gmatch("%%[^%%]") do
                num_formats = num_formats + 1
            end
            value = string.format(table.unpack(params, curr_ix, num_formats + curr_ix))
            curr_ix = curr_ix + num_formats
        end
        values[#values + 1] = value
        curr_ix = curr_ix + 1
    end
    print(table.unpack(values))
end
I don't think your current approach works, because the first argument of string.format expects the format specifier, not the rest of the arguments.
Anyway, this is the way to combine formatting and printing together:
Log = function(...)
    return print(string.format(...))
end
And call it like this:
Log("String: %s Number: %d", 'hello' , 42)
Also, it might be better to make the format specifier argument more explicit, and use io.write instead of print to get more control over printing:
function Log(fmt, ...)
    return io.write(string.format(fmt, ...))
end

Implementing torch's __len__ meta function

In our torch-dataframe project we're trying to implement the __len__ meta function as follows:
MyClass.__len__ = argcheck{
    {name="self", type="MyClass"},
    {name="other", type="MyClass"},
    call=function(self, other)
        return self.n_rows
    end}
This works in Lua 5.2 and 5.3, but in Lua 5.1, LuaJIT 2.0 and 2.1 the returned value is not the actual row number but 0. It seems that it returns a new instance of MyClass, but it's hard to understand why. There is a note about __len changing here, but that's the best documentation hint we've managed to locate so far.
A little surprising is the need for two arguments. When argcheck is provided with a single-argument version:
MyClass.__len__ = argcheck{
    {name = "self", type = "MyClass"},
    call=function(self)
        return self.n_rows
    end}
it throws:
[string "argcheck"]:28:
Arguments:
({
self = MyClass --
})
Got: MyClass, MyClass
We currently rely on the argcheck overload operator for handling this:
MyClass.__len__ = argcheck{
    {name="self", type="MyClass"},
    {name="other", type="MyClass"},
    call=function(self, other)
        return self.n_rows
    end}

MyClass.__len__ = argcheck{
    overload=MyClass.__len__,
    {name="self", type="MyClass"},
    call=function(self)
        return self.n_rows
    end}
For more details, here are the full class and the Travis report:
Full metatable class
Travis report
Test case
Here's a full test case that works as expected in 5.2 and 5.3, and that perhaps illustrates the problem in a more concise way than the full package:
require 'torch'
local argcheck = require "argcheck"

local MyClass = torch.class("MyClass")

function MyClass:init()
    self.n_rows = 0
end

MyClass.__len__ = argcheck{
    {name = "self", type = "MyClass"},
    {name = "other", type = "MyClass"},
    call=function(self, other)
        print(self.n_rows)
        print(other.n_rows)
        return(self.n_rows)
    end}

local obj = MyClass.new()
obj.n_rows = 1
local n = #obj
print(n)
This prints as expected:
1
1
1
The issue is related to this SO question. There simply is no support for it in 5.1:
__len on tables is scheduled to be supported in 5.2. See LuaFiveTwo.

Use a string of expressions in a method for Python

For example, if I have the string:
"id=100 id2=200 id3=300 ..."
where the variable names, values, and number of expressions can be anything.
How can I then use that string in a method that is used like this:
method(id=100,id2=200,id3=300,...)
I get the string from a command line argument.
We parse them iteratively:
pairs = "id=100 id2=200 id3=300".split(' ')
res = {}
for p in pairs:
    k, v = p.rsplit('=', 1)
    res[k] = v

print res  # prints {'id2': '200', 'id': '100', 'id3': '300'}
# now we can send the dictionary to the method
You can first convert it to a dictionary:
>>> s = "id=100 id2=200 id3=300"
>>> d = dict(a.split('=') for a in s.split())
>>> print d
{'id2': '200', 'id': '100', 'id3': '300'}
And now use it in functions:
>>> method(**d)
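Putting the two answers together: since the string comes from a command-line argument, an end-to-end sketch might look like the following (Python 3; method and the numeric conversion are illustrative assumptions, and note that parsed values arrive as strings unless you convert them):

import sys

def method(**kwargs):
    # illustrative stand-in for the real method
    print(kwargs)

arg_str = sys.argv[1] if len(sys.argv) > 1 else "id=100 id2=200 id3=300"
d = dict(pair.split('=', 1) for pair in arg_str.split())
d = {k: int(v) if v.isdigit() else v for k, v in d.items()}  # optional numeric conversion
method(**d)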
