How to change dask job_name to SGECluster - dask

I am using dask_jobqueue.SGECluster() and when I submit jobs to the grid they are all listed as dask-worker. I want to have different names for each submitted job.
Here is one example:
futures = []
for i in range(1, 10):
    res = client.submit(slow_pow, i, 2)
    futures.append(res)

[future.result() for future in as_completed(futures)]
All of the submitted jobs appear with the name dask-worker when I check their status with qstat.
I have tried adding client.adapt(job_name=f'job{i}') within the loop, but with no success; the name is still dask-worker.
Any hints?

dask-worker is a generic name for the compute allocation from the cluster; it can be changed by providing cluster-specific arguments when the cluster is created. For example, for SLURMCluster this would be:
cluster = SLURMCluster(job_extra=['--job-name="func"'])
SGECluster might have a slightly different syntax; a possible version is sketched below.
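For SGECluster, an unverified sketch using SGE's -N directive to set the job name; depending on your dask_jobqueue version, the keyword may be job_extra (older releases) or job_extra_directives (newer ones):

from dask_jobqueue import SGECluster

# Sketch only: pass the SGE job-name directive through the extra job
# directives; the keyword name varies between dask_jobqueue versions.
cluster = SGECluster(
    cores=1,
    memory="2GB",
    job_extra_directives=["-N myjobname"],  # or job_extra=[...] on older versions
)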
The actual tasks submitted to the dask scheduler will have their names generated automatically by dask and can be viewed through the dashboard, by default at http://localhost:8787/status.
It's possible to specify a custom name for each task submitted to the scheduler by using the key kwarg:
fut = client.submit(myfunc, my_arg, key='custom_key')
Note that if you are submitting multiple futures, you will want them to have unique keys:
futs = [client.submit(myfunc, i, key=f'custom_key_{i}') for i in range(3)]

Related

str() is not usable anymore to get true value of a Text tfx.data_types.RuntimeParameter during pipeline execution

How do I get a string as the true value of tfx.orchestration.data_types.RuntimeParameter during pipeline execution?
Hi,
I'm defining a runtime parameter like data_root = tfx.orchestration.data_types.RuntimeParameter(name='data-root', ptype=str) for a base path, from which I define many subfolders for various components, for example str(data_root)+'/model' for the model serving path in tfx.components.Pusher().
It was working like a charm before I moved to tfx==1.12.0: str(data_root) now returns a JSON dump instead of the value.
To overcome that, I tried to define a runtime parameter for the model path, model_root = tfx.orchestration.data_types.RuntimeParameter(name='model-root', ptype=str), and then feed the Pusher component the way I saw in many tutorials:
pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(base_directory=model_root)
    )
)
but I get a TypeError saying tfx.proto.PushDestination.Filesystem does not accept a RuntimeParameter.
This completely breaks the existing setup, as I receive those parameters from an external client for each Kubeflow run.
Thanks a lot for any help.
I was able to fix it.
First of all, the docstring is not clear about which parameters of Pusher can be a RuntimeParameter and which cannot.
I finally went to the __init__ definition of the Pusher component and saw that only the push_destination parameter can be a RuntimeParameter:
def __init__(
    self,
    model: Optional[types.BaseChannel] = None,
    model_blessing: Optional[types.BaseChannel] = None,
    infra_blessing: Optional[types.BaseChannel] = None,
    push_destination: Optional[Union[pusher_pb2.PushDestination,
                                     data_types.RuntimeParameter]] = None,
    custom_config: Optional[Dict[str, Any]] = None,
    custom_executor_spec: Optional[executor_spec.ExecutorSpec] = None):
Then I defined the component accordingly, using my RuntimeParameter:
model_root = tfx.orchestration.data_types.RuntimeParameter(name='model-serving-location', ptype=str)

pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=model_root
)
Since the push_destination parameter is expected to be a tfx.proto.pusher_pb2.PushDestination proto message, you then have to respect the associated schema when instantiating and running a pipeline execution, meaning the value of the runtime parameter should look like:
{'model-serving-location': '{"filesystem": {"base_directory": "path/to/model/serving/for/the/run"}}'}
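For example, one way to build that JSON string in Python before triggering the run (the path below is only a placeholder):

import json

# Build the JSON-serialized PushDestination expected by the
# 'model-serving-location' runtime parameter; the path is a placeholder.
push_destination_value = json.dumps(
    {"filesystem": {"base_directory": "path/to/model/serving/for/the/run"}}
)
runtime_parameters = {'model-serving-location': push_destination_value}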
Regards

anylogic agent communication and message sending

In my model, I have some agents:
a "Demand" agent,
an "EnergyProducer1" agent, and
an "EnergyProducer2" agent.
When the hourly energy demand is created in the Main agent with a function, the priority for satisfying this demand belongs to the "EnergyProducer1" agent. In that agent, I have a function that calculates energy production based on some conditions. Part of this function looks like the following:
**" if (statechartA.isStateActive(Operating.busy)) && ( main.heatLoadDemandPerHour >= heatPowerNominal) {
producedHeatPower = heatPowerNominal;
naturalGasConsumptionA = naturalGasConsumptionNominal;
send("boilerWorking",boiler);
} else ..... "**
My question is related to the 4th line of the code. If EnergyProducer1 fails to satisfy the hourly demand, I have to tell EnergyProducer2 to satisfy the rest of the demand. If I send this message to agent2, its statechart will become active and its function will run. Will all of this happen within the same hour? If not, is accessing the variables and parameters of agent2 directly a more appropriate way?
I hope I could explain my problem.
Thanks for your help in advance...
Edited question...
As a general comment on your question: within the AnyLogic environment, sending messages is always preferable to directly accessing the variables and parameters of another agent.
Specifically, in the example presented, the send() function will schedule message delivery for the next instant after the completion of the current function.
Update: A message in AnyLogic can be any Java class. Sending strings such as the "boilerWorking" used in the example is fine for general control, but if more information needs to be shared (such as a double value), then it is good practice to create a new Java class (let's call it ModelMessage and follow these instructions) with at least two properties, msgStr and msgVal. With this new class, sending a message changes from this:
...
send("boilerWorking", boiler);
...
to this:
...
send(new ModelMessage("boilerWorking",42.0), boiler);
...
and the transitions in the statechart have to be changed to fire on "if expression is true", with the expression being msg.msgStr.equals("boilerWorking").
More information about Agent communication is available here.

Dask - diagnostics dashboard - custom info about task

I'm using Dask to schedule and run research batches.
Those mostly produce side effects and are quite heavy (ranging from a few minutes to a couple of hours). There's no communication between the tasks.
In code it looks like this, first I'm passing all the batches to process:
def process_batches(batches: Iterator[Batch], log_dir: Path):
    cluster = LocalCluster(
        n_workers=os.cpu_count(),
        threads_per_worker=1
    )
    client = Client(cluster)
    futures = []
    for batch in batches:
        futures += process_batch(batch, client, log_dir)
    progress(futures)
Then I'm submitting repetitions from each batch as tasks:
def process_batch(batch: Batch, client: Client, log_dir: Path) -> List[Future]:
    batch_dir = log_dir.joinpath(batch.nice_hash)
    batch_futures = []
    num_workers = len(client.scheduler_info()['workers'])
    with Logger(batch_dir, clear_dir=True) as logger:
        logger.save_json(batch.as_dict, 'batch')
        for repetition in range(batch.n_repeats):
            cpu_index = repetition % num_workers
            future = client.submit(
                process_batch_repetition,
                batch,
                repetition,
                cpu_index,
                logger
            )
            batch_futures.append(future)
    return batch_futures
Is there any way to pass some custom info about the submitted task to the dashboard?
All I'm seeing is just process_batch_repetition tasks. Could I replace that with a custom string, so I can see which batch configurations are being processed at the moment?
Got an answer from Dask's BDFL mrocklin.
You can use the key= keyword to specify a key for the future. This should be unique per future. Dask will use the prefix of the key name to determine how it is rendered on the dashboard. See the docstring for dask.utils.key_split for examples on how a key prefix is generated from a key.
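For reference, a quick illustration of how key_split maps a key to the prefix shown on the dashboard (example values taken from the key_split docstring):

from dask.utils import key_split

# The dashboard groups tasks by the prefix key_split extracts from each key.
print(key_split('x-1'))            # x
print(key_split('hello-world-1'))  # hello-world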
So you can use it like this:
future = client.submit(
    process_batch_repetition,
    batch,
    repetition,
    cpu_index,
    logger,
    key=f'{str(batch)}_repetition_{repetition}'
)
You just pass a unique string for this task. There are some forbidden characters (e.g. spaces), so expect some key errors if you don't sanitize the string; a possible helper is sketched below.
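A minimal sketch of such a helper; make_key is a hypothetical function you would add yourself (it is not part of dask):

import re

# Hypothetical helper: strip characters dask rejects in keys (such as
# spaces) while keeping a readable, batch-specific prefix.
def make_key(batch, repetition):
    prefix = re.sub(r'[^0-9A-Za-z_]+', '_', str(batch))
    return f'{prefix}_repetition_{repetition}'

future = client.submit(
    process_batch_repetition, batch, repetition, cpu_index, logger,
    key=make_key(batch, repetition),
)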

How to find the concurrent.future input arguments for a Dask distributed function call

I'm using Dask to distribute work to a cluster. I'm creating a cluster and calling .submit() to submit a function to the scheduler. It returns a Future object. I'm trying to figure out how to obtain the input arguments of that future once it has completed.
For example:
from dask.distributed import Client
from dask_yarn import YarnCluster
def somefunc(a, b, c, ..., n):
    # do something
    return
cluster = YarnCluster.from_specification(spec)
client = Client(cluster)
future = client.submit(somefunc, arg1, arg2, ..., argn)
# ^^^ how do I obtain the input arguments for this future object?
# `future.args` doesn't work
Futures don't hold onto their inputs. You can do this yourself though.
futures = {}
future = client.submit(func, *args)
futures[future] = args
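For example, a small sketch of that bookkeeping pattern, using as_completed to look the arguments up again as results arrive (somefunc and the argument tuples stand in for the question's setup):

from dask.distributed import as_completed

# Map each future back to the arguments it was submitted with.
futures = {}
for args in [(1, 2), (3, 4), (5, 6)]:
    fut = client.submit(somefunc, *args)
    futures[fut] = args

for fut in as_completed(futures):
    print(futures[fut], fut.result())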
A future only knows the key by which it is uniquely known to the scheduler. At the time of submission, if it has dependencies, these are transiently found and sent to the scheduler, but no copy is kept locally.
The pattern you are after sounds more like delayed, which keeps hold of its graph, and indeed client.compute(delayed_thing) returns a future.
d = delayed(somefunc)(a, b, c)
future = client.compute(d)
dict(d.dask) # graph of things needed by d
You could communicate directly with the scheduler to find the dependencies of some key, which will in general also be keys, and so reverse-engineer the graph, but that does not sound like a great path, so I won't try to describe it here.

Jenkins: How To Run the Same Job as Many Times as Parameters are Defined?

My requirement is that I have written a bash script which monitors telnet connectivity for several IPs and ports. I use a CSV as the input data; the script reads each row in the CSV and checks whether the IP can be reached via telnet.
However, I now have a requirement to Jenkinize it, and I am wondering whether there is a way to define my parameter in the Jenkins job with different combinations of values,
say for example:
PARAM_KEY : VAL_1
PARAM_KEY : VAL_2
PARAM_KEY : VAL_3
and so on, so that I can use PARAM_KEY in the script and the Jenkins job gets executed once for each parameter defined, i.e. based on the number of parameters defined (3 in the above case).
Can anyone guide me on this requirement?
If you mean to run one job and iterate over the IPs inside it, you can parse the CSV file inside a pipeline or pass it as a parameter (and then split it):
// example of pipeline code
node('slave80') {
    csvString = "1.1.1.1,2.2.2.2,3.3.3.3" // can be passed in as a parameter
    def ips = csvString.split(',')
    ips.each { ip ->
        sh """
        ./bash_script ${ip}
        """
    }
}
