Sharing Beam State across different DoFns - google-cloud-dataflow

Is Beam State shared across different DoFns?
Let's say I have 2 DoFns:
StatefulDoFn1: { myState.write(1)}
StatefulDoFn2: { myState.read() ; do something ... output}
And then the pipeline in pseudocode:
pipeline = readInput.........applyDoFn(StatefulDoFn1)......map{do something else}.......applyDoFn(StatefulDoFn2)
If I annotate myState identically in both StatefulDoFns, will what I write in StatefulDoFn1 be visible to StatefulDoFn2? We implemented a pipeline on the assumption that the answer is yes, but it seems to be no.

No, state is local to each stateful DoFn, and it is also actually local to each key (and window, if you are using a window) inside that DoFn.
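The scoping described above can be pictured with a toy model. This is only a sketch and not the Beam API: the class `ToyStateStore` and its methods are hypothetical, and the point is just that a state cell is addressed by the DoFn, the key, and the window together, so an identically named state in another DoFn is a different cell.

```python
# Toy model only: this is NOT the Beam API. It pictures state cells as
# entries addressed by (transform name, element key, window).


class ToyStateStore:
    """Hypothetical stand-in for a runner's state backend."""

    def __init__(self):
        self._cells = {}

    def write(self, transform, key, window, value):
        # Each (transform, key, window) triple owns its own cell.
        self._cells[(transform, key, window)] = value

    def read(self, transform, key, window, default=None):
        return self._cells.get((transform, key, window), default)


store = ToyStateStore()
store.write("StatefulDoFn1", key="user-a", window="global", value=1)

# Same DoFn, same key, same window: the value is visible.
same_dofn = store.read("StatefulDoFn1", "user-a", "global")
# A different DoFn with an identically named state sees nothing.
other_dofn = store.read("StatefulDoFn2", "user-a", "global")
# The same DoFn processing a different key also sees nothing.
other_key = store.read("StatefulDoFn1", "user-b", "global")
```

If the second DoFn needs the value, the usual route is for the first DoFn to emit it as part of its output elements rather than to rely on state.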

anylogic agent communication and message sending

In my model, I have some agents;
"Demand" agent,
"EnergyProducer1" agent
"EnergyProducer2" agent.
When my hourly energy demands are created in the Main agent by a function, the "EnergyProducer1" agent has priority for satisfying this demand. In this agent, I have a function that calculates energy production based on certain conditions. Part of this function is as follows:
if (statechartA.isStateActive(Operating.busy) && main.heatLoadDemandPerHour >= heatPowerNominal) {
    producedHeatPower = heatPowerNominal;
    naturalGasConsumptionA = naturalGasConsumptionNominal;
    send("boilerWorking", boiler);
} else .....
My question concerns the 4th line of the code. If agent1 fails to satisfy the hourly demand, I have to tell agent2 to satisfy the rest of the demand. If I send this message to agent2, its statechart will become active and agent2's function will run. Will all of this happen within the same hour? If not, is accessing the variables and parameters of agent2 directly a more appropriate approach?
I hope I have explained my problem clearly.
Thanks in advance for your help.
As a general comment on your question, within the AnyLogic environment sending messages is always preferable to directly accessing the variables and parameters of another agent.
Specifically, in the example presented, the send() function will schedule message delivery for the next instant after the completion of the current function.
Update: A message in AnyLogic can be any Java class. Sending strings such as "boilerWorking", as in the example, is good for general control; however, if more information needs to be shared (such as a double value), it is good practice to create a new Java class (let's call it ModelMessage and follow these instructions) with at least two properties, msgStr and msgVal. With this new class, sending a message changes from this:
...
send("boilerWorking", boiler);
...
to this:
...
send(new ModelMessage("boilerWorking",42.0), boiler);
...
and firing transitions in the statechart has to be changed to use "if expression is true", with the expression being msg.msgStr.equals("boilerWorking").
More information about Agent communication is available here.

Using global shared libraries in Jenkins to define parameter options

I am trying to use a global class that I've defined in a shared library to help organise job parameters. It's not working, and I'm not even sure if it is possible.
My job looks something like this:
pipelineJob('My-Job') {
    definition {
        // Job definition goes here
    }
    parameters {
        choiceParam('awsAccount', awsAccount.ALL)
    }
}
In a file in /vars/awsAccount.groovy I have the following code:
class awsAccount implements Serializable {
    static final String SANDPIT = "sandpit"
    static final String DEV = "dev"
    static final String PROD = "prod"
    static final String[] ALL = [SANDPIT, DEV, PROD]
}
Global pipeline libraries are configured to load implicitly from my repository's master branch.
When attempting to update the DSL scripts I receive the error:
ERROR: (myJob.groovy, line 67) No such property: awsAccount for class: javaposse.jobdsl.dsl.helpers.BuildParametersContext
Why does it not find the class, and is it even possible to use shared library classes like this in pipeline job?
Disclaimer: I know it works using a Jenkinsfile. Unfortunately, I have not tested it using Declarative Pipelines, but there are no answers yet, so it may be worth a try.
Regarding your first question: there are several reasons why a class from your shared-lib might not be found, ranging from the library import to the library syntax. But they definitely work for DSL. To be more precise, additional information would be needed. In any case, be sure that:
You have your groovy class definition using exactly the directory structure as described in the documentation (https://www.jenkins.io/doc/book/pipeline/shared-libraries/)
Give the shared-lib a name in Jenkins when you configure it, and be sure it is exactly the name you use in the import
Use the import as described in the documentation (under Using Libraries)
Regarding your second question (the one that names this SO question): yes, you can define job parameters from information in your shared-lib. At least, using Jenkinsfiles. You can even define properties to be included in the pipeline. I got it working with a tricky syntax due to different problems.
Again, I am using Jenkinsfile and this is what worked for me:
In my shared-lib class, I added a static function that introduces the build parameters. Notice the input parameters that function needs and its usage:
class awsAccount implements Serializable {
    //
    static giveMeParameters (script) {
        return [
            // Some parms
            script.string(defaultValue: '', description: 'A default parameter', name: 'textParm'),
            script.booleanParam(defaultValue: false, description: 'If set to True, do whatever you need - otherwise, do not do it', name: 'boolOption'),
        ]
    }
}
To introduce those parameters in the pipeline, you need to place the returned value of the function into the parameters array
properties (
    parameters (
        awsAccount.giveMeParameters (this)
    )
)
Again, notice the syntax when calling the function. Similar to this, you can also define functions in the shared-lib that return properties and use them in multiple jobs (disableConcurrentBuilds, buildDiscarder, etc)

How to find the concurrent.future input arguments for a Dask distributed function call

I'm using Dask to distribute work to a cluster. I'm creating a cluster and calling .submit() to submit a function to the scheduler. It returns a Future object. I'm trying to figure out how to obtain the input arguments to that future object once it has completed.
For example:
from dask.distributed import Client
from dask_yarn import YarnCluster
def somefunc(a,b,c ..., n ):
    # do something
    return
cluster = YarnCluster.from_specification(spec)
client = Client(cluster)
future = client.submit(somefunc, arg1, arg2, ..., argn)
# ^^^ how do I obtain the input arguments for this future object?
# `future.args` doesn't work
Futures don't hold onto their inputs. You can do this yourself though.
futures = {}
future = client.submit(func, *args)
futures[future] = args
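A runnable sketch of the same bookkeeping pattern follows. It assumes the standard library's ThreadPoolExecutor as a stand-in for the Dask client, purely so it can run without a cluster; the names `somefunc`, `inputs`, and `args_by_future` are illustrative.

```python
# Sketch: ThreadPoolExecutor stands in for the Dask client here (an
# assumption, so the example runs without a cluster). The bookkeeping
# pattern is the point: map each future to the arguments it was given.
from concurrent.futures import ThreadPoolExecutor, as_completed


def somefunc(a, b, c):
    return a + b + c


inputs = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
args_by_future = {}

with ThreadPoolExecutor(max_workers=2) as pool:
    for args in inputs:
        future = pool.submit(somefunc, *args)
        args_by_future[future] = args  # remember the inputs ourselves

    # As each future completes, look up the arguments that produced it.
    results = {}
    for future in as_completed(args_by_future):
        results[args_by_future[future]] = future.result()
```

Because the dictionary is keyed by the future itself, the lookup works regardless of the order in which the futures complete.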
A future only knows the key by which it is uniquely known on the scheduler. At the time of submission, if it has dependencies, these are transiently found and sent to the scheduler, but no copy is kept locally.
The pattern you are after sounds more like delayed, which keeps hold of its graph, and indeed client.compute(delayed_thing) returns a future.
from dask import delayed
d = delayed(somefunc)(a, b, c)
future = client.compute(d)
dict(d.dask) # graph of things needed by d
You could communicate directly with the scheduler to find the dependencies of some key, which will in general also be keys, and so reverse-engineer the graph, but that does not sound like a great path, so I won't try to describe it here.

How often does PubsubIO.readStrings pull from subscription

I'm trying to understand Beam/Dataflow concepts better, so pretend I have the following streaming pipeline:
pipeline
    .apply(PubsubIO.readStrings().fromSubscription("some-subscription"))
    .apply(ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            String message = c.element();
            LOGGER.debug("Got message: {}", message);
            c.output(message);
        }
    }));
How often will the unbounded source pull messages from the subscription? Is this configurable at all (potentially based on windows/triggers)?
Since no custom windowing/triggers have been defined, and there are no sinks (just a ParDo that logs + re-outputs the message), will my ParDo still be executed immediately as messages are received, and is that setup problematic in any way (not having any windows/triggers/sinks defined)?
It will pull messages from the subscription continuously - as soon as a message arrives, it will be processed immediately (modulo network and RPC latency).
Windowing and triggers do not affect this at all - they only affect how the data gets grouped at grouping operations (GroupByKey and Combine). If your pipeline doesn't have grouping operations, windowing and triggers are basically a no-op.
The Beam model does not have the concept of a sink - writing to various storage systems (e.g. writing files, writing to BigQuery etc) is implemented as regular Beam composite transforms, made of ParDo and GroupByKey like anything else. E.g. writing each element to its own file could be implemented by a ParDo whose @ProcessElement opens the file, writes the element to it and closes the file.

Jenkins Pipeline Multiconfiguration Project

Original situation:
I have a job in Jenkins that runs an Ant script. I easily managed to test this Ant script against more than one software version using a "Multi-configuration project".
This type of project is really cool because it lets me specify all the versions of the two pieces of software I need (in my case Java and Matlab), and it runs my Ant script with every combination of my parameters.
Those parameters are then used as strings that are concatenated into the location of the executable to be used by Ant.
example: env.MATLAB_EXE=/usr/local/MATLAB/${MATLAB_VERSION}/bin/matlab
This works perfectly, but now I am migrating these scripts to a pipeline version.
Pipeline migration:
I managed to implement the same script as a pipeline using the Parametrized pipelines plugin. With this, I can manually select which version of my software is used when I trigger the build manually, and I also found a way to execute it periodically, selecting the parameter I want for each run.
This solution more or less works, but it is not really satisfying.
My multi-config project had some features that this does not:
With more than one parameter, I can have them combined so that each combination is executed
The executions are clearly separated, and in the build history/build details it is easy to recognize which settings have been used
Just adding a new "possible" value to a parameter spawns the desired executions
Request
So I wonder if there is a better solution to my problem that also satisfies the points above.
Long story short: is there a way to implement a multi-configuration project in Jenkins using the pipeline technology?
I've seen this and similar questions asked a lot lately, so it seemed that it would be a fun exercise to work this out...
A matrix/multi-config job, visualized in code, would really just be a few nested for loops, one for each axis of parameters.
You could build something fairly simple with some hard coded for loops to loop over a few lists. Or you can get more complicated and do some recursive looping so you don't have to hard code the specific loops.
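The equivalence between hard-coded nested loops and a general combination builder can be sketched in Python (not Groovy, so this is an illustration of the idea rather than pipeline code). The axis values and the names `axis_list`, `nested`, and `combos` are illustrative assumptions.

```python
from itertools import product

# Illustrative axes; each inner list is one axis of the matrix.
axis_list = [
    ["ubuntu", "rhel"],    # agents
    ["jdk7", "jdk8"],      # tools
    ["banana", "apple"],   # fruit
]

# Hard-coded version: one loop per axis.
nested = []
for agent in axis_list[0]:
    for jdk in axis_list[1]:
        for fruit in axis_list[2]:
            nested.append((agent, jdk, fruit))

# General version: works for any number of axes without extra loops.
combos = list(product(*axis_list))

# A readable name per combination, e.g. for labelling a task.
names = ["-".join(combo) for combo in combos]
```

Both versions enumerate the combinations in the same order, with the last axis varying fastest; the general version simply does not need to know how many axes there are.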
DISCLAIMER: I do ops much more than I write code. I am also very new to groovy, so this can probably be done more cleanly, and there are probably a lot of groovier things that could be done, but this gets the job done, anyway.
With a little work, this matrixBuilder could be wrapped up in a class so you could pass in a task closure and the axis list and get the task map back. Stick it in a shared library and use it anywhere. It should be pretty easy to add some of the other features from the multiconfiguration jobs, such as filters.
This attempt uses a recursive matrixBuilder function to work through any number of parameter axes and build all the combinations. Then it executes them in parallel (obviously depending on node availability).
/*
    All the config axes are defined here
    Add as many lists of axes in the axisList as you need.
    All combinations will be built
*/
def axisList = [
    ["ubuntu","rhel","windows","osx"], //agents
    ["jdk6","jdk7","jdk8"], //tools
    ["banana","apple","orange","pineapple"] //fruit
]
def tasks = [:]
def comboBuilder
def comboEntry = []
def task = {
    // builds and returns the task for each combination
    /* Map the entries back to a more readable format
       the index will correspond to the position of this axis in axisList[] */
    def myAgent = it[0]
    def myJdk = it[1]
    def myFruit = it[2]
    return {
        // This is where the important work happens for each combination
        node(myAgent) {
            println "Executing combination ${it.join('-')}"
            def javaHome = tool myJdk
            println "Node=${env.NODE_NAME}"
            println "Java=${javaHome}"
        }
        //We won't declare a specific agent for this part
        node {
            println "fruit=${myFruit}"
        }
    }
}
/*
    This is where the magic happens
    recursively work through the axisList and build all combinations
*/
comboBuilder = { def axes, int level ->
    for ( entry in axes[0] ) {
        comboEntry[level] = entry
        if (axes.size() > 1 ) {
            comboBuilder(axes[1..-1], level + 1)
        }
        else {
            tasks[comboEntry.join("-")] = task(comboEntry.collect())
        }
    }
}
stage ("Setup") {
    node {
        println "Initial Setup"
    }
}
stage ("Setup Combinations") {
    node {
        comboBuilder(axisList, 0)
    }
}
stage ("Multiconfiguration Parallel Tasks") {
    //Run the tasks in parallel
    parallel tasks
}
stage("The End") {
    node {
        echo "That's all folks"
    }
}
You can see a more detailed flow of the job at http://localhost:8080/job/multi-configPipeline/[build]/flowGraphTable/ (available under the Pipeline Steps link on the build page).
EDIT:
You can move the stage down into the "task" creation and then see the details of each stage more clearly, but not in a neat matrix like the multi-config job.
...
    return {
        // This is where the important work happens for each combination
        stage ("${it.join('-')}--build") {
            node(myAgent) {
                println "Executing combination ${it.join('-')}"
                def javaHome = tool myJdk
                println "Node=${env.NODE_NAME}"
                println "Java=${javaHome}"
            }
            //Node irrelevant for this part
            node {
                println "fruit=${myFruit}"
            }
        }
    }
...
Or you could wrap each node with their own stage for even more detail.
As I did this, I noticed a bug in my previous code (fixed above now). I was passing the comboEntry reference to the task. I should have sent a copy, because, while the names of the stages were correct, when it actually executed them, the values were, of course, all the last entry encountered. So I changed it to tasks[comboEntry.join("-")] = task(comboEntry.collect()).
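The reference-versus-copy bug described above is not specific to Groovy. A minimal Python sketch of the same mistake follows; the names `make_task`, `combo_entry`, `shared_tasks`, and `copied_tasks` are hypothetical, and `list(...)` plays the role that `collect()` plays in the Groovy code.

```python
def make_task(combo):
    # Returns a closure that reads `combo` later, when the task runs.
    return lambda: "-".join(combo)


combo_entry = [None, None]  # one mutable list reused for every combination
shared_tasks = {}
copied_tasks = {}

for agent in ["ubuntu", "rhel"]:
    combo_entry[0] = agent
    for jdk in ["jdk7", "jdk8"]:
        combo_entry[1] = jdk
        name = "-".join(combo_entry)
        # Bug: every task holds the same list, which keeps being overwritten.
        shared_tasks[name] = make_task(combo_entry)
        # Fix: each task gets its own snapshot of the current combination.
        copied_tasks[name] = make_task(list(combo_entry))

# The tasks only run after all combinations have been built.
shared_results = {name: task() for name, task in shared_tasks.items()}
copied_results = {name: task() for name, task in copied_tasks.items()}
```

The task names are computed while the loop runs, so they are correct in both dictionaries; only the values read later inside the tasks differ, which matches the symptom of correct stage names but wrong executed values.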
I noticed that you can leave the original stage ("Multiconfiguration Parallel Tasks") {} around the execution of the parallel tasks. Technically now you have nested stages. I'm not sure how Jenkins is supposed to handle that, but it doesn't complain. However, the 'parent' stage timing is not inclusive of the parallel stages timing.
I also noticed that when a new build starts to run, on the "Stage View" of the job, all the previous builds disappear, presumably because the stage names don't all match up. But after the build finishes running, they all match again and the old builds show up again.
And finally, Blue Ocean doesn't seem to visualize this the same way. It doesn't recognize the "stages" in the parallel processes, only the enclosing stage (if it is present), or "Parallel" if it isn't. And then it only shows the individual parallel processes, not the stages within.
Points 1 and 3 are not completely clear to me, but I suspect you just want to use “scripted” rather than “Declarative” Pipeline syntax, in which case you can make your job do whatever you like—anything permitted by matrix project axes and axis filters and much more, including parallel execution. Declarative syntax trades off syntactic simplicity (and friendliness to “round-trip” editing tools and “linters”) for flexibility.
Point 2 is about visualization of the result, rather than execution per se. While this is a complex topic, the usual concrete request which is not already supported by existing visualizations like Blue Ocean is to be able to see test results distinguished by axis combination. This is tracked by JENKINS-27395 and some related issues, with design in progress.
