I am working on a LeafSystem like this:
class exampleLeafSystem(LeafSystem):
    def __init__(self, plant):
        LeafSystem.__init__(self)
        self._plant = plant
        self._plant_context = plant.CreateDefaultContext()
        self.DeclareVectorInputPort("q_v", BasicVector(6))
        self.DeclareVectorOutputPort("tau", BasicVector(3), self.TauCalcOutput)
        self.DeclareVectorOutputPort("xe", BasicVector(3), self.xeCalcOutput)

    def TauCalcOutput(self, context, output):
        q_v = self.get_input_port(0).Eval(context)
        self._plant.SetPositionsAndVelocities(self._plant_context, q_v)
        # Do some calculation with self._plant and self._plant_context to get the output
        output.SetFromVector(tau)

    def xeCalcOutput(self, context, output):
        q_v = self.get_input_port(0).Eval(context)
        self._plant.SetPositionsAndVelocities(self._plant_context, q_v)
        # Do some calculation with self._plant and self._plant_context to get the output
        output.SetFromVector(xe)
In the two methods TauCalcOutput and xeCalcOutput here, I first need to update the MultibodyPlant's state information and then do the calculation to compute the output. However, since I do not know the order in which the two CalcOutput methods are called, for both methods to use the newest state information I have to write
q_v = self.get_input_port(0).Eval(context)
self._plant.SetPositionsAndVelocities(self._plant_context, q_v)
in both methods, which seems a bit unnecessary. If I could assume the order in which the CalcOutput methods are run, for example that TauCalcOutput always runs first, then I could have only
q_v = self.get_input_port(0).Eval(context)
self._plant.SetPositionsAndVelocities(self._plant_context, q_v)
in the TauCalcOutput method, without worrying that the xeCalcOutput method will use MultibodyPlant state information that is lagged by one step.
So my question is: is there a specific order in which the CalcOutput methods are called?
No. The contract is that output ports can be called in any order at any time, and are only called when they are evaluated (e.g. by a downstream system). The order in which they are called depends on the other systems in the Diagram; they might not all be called during a single simulation step (e.g. if one is consumed by a discrete-time system with time step 0.1 and another with time step 0.2), or might not be called at all if they are not connected. Users can even call get_output_port().Eval() manually.
For a general approach to avoiding duplicate computation in output ports, you should store the result of the shared computation in the Context, either as state or as a "cache entry". See DeclareCacheEntry for more details.
For this workflow, specifically, perhaps the simplest solution is to check whether the positions and velocities are already set to the same value, just to avoid repeating the kinematics evaluation, for instance as you see here:
https://github.com/RobotLocomotion/drake/blob/6e6e37ffa677362245773f13c0628f0042b47414/multibody/inverse_kinematics/kinematic_constraint_utilities.cc#L47-L54
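As a rough sketch of that check applied to your system (hedged: the helper name below is made up, and it assumes GetPositionsAndVelocities is the accessor paired with the setter you are already calling):

    import numpy as np

    def _UpdateKinematicsIfNeeded(self, context):
        q_v = self.get_input_port(0).Eval(context)
        # Only redo the (potentially expensive) update if the stored plant
        # context does not already hold these positions and velocities.
        if not np.array_equal(
                q_v, self._plant.GetPositionsAndVelocities(self._plant_context)):
            self._plant.SetPositionsAndVelocities(self._plant_context, q_v)

Both TauCalcOutput and xeCalcOutput could call this helper first; whichever one happens to run second would then skip the redundant update.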
I have a LeafSystem (using pydrake) with a couple of inputs and an output which is calculated from the inputs. The CalcOutput callback function blocks execution of the program until the output is set. In some cases, I prefer not to set the output even if there is an input (e.g. out-of-limit values).
Is there a way to continue execution, without setting the output?
Drake's systems framework uses a "pull" architecture. All of the system evaluations happen in a single thread, and CalcOutput is only called when a downstream method is evaluated that requests the input (e.g. in a downstream CalcOutput or CalcTimeDerivatives). So you need to return some value.
I guess that instead of returning some null value, you probably want the output port to continue to have the value it output last time? In that case, the solution is to store the output in a state variable (which means moving the work from CalcOutput into a state update), and then have your output callback just write the state variable out to the port.
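A minimal sketch of that pattern (hedged: the port sizes, update period, and limit check below are placeholders, and the pydrake calls are the ones I believe apply here):

    import numpy as np
    from pydrake.systems.framework import BasicVector, LeafSystem

    class HeldOutputSystem(LeafSystem):
        def __init__(self):
            LeafSystem.__init__(self)
            self.DeclareVectorInputPort("u", BasicVector(3))
            # A discrete state vector holds the most recent accepted output.
            self.DeclareDiscreteState(3)
            self.DeclarePeriodicDiscreteUpdateEvent(
                period_sec=0.01, offset_sec=0.0, update=self._Update)
            self.DeclareVectorOutputPort("y", BasicVector(3), self._CopyStateOut)

        def _Update(self, context, discrete_state):
            u = self.get_input_port(0).Eval(context)
            if np.all(np.abs(u) <= 1.0):  # hypothetical limit check
                discrete_state.get_mutable_vector(0).SetFromVector(u)
            # Otherwise leave the state, and hence the output, unchanged.

        def _CopyStateOut(self, context, output):
            output.SetFromVector(context.get_discrete_state(0).CopyToVector())

Whenever the limit check fails, the update simply does not overwrite the state, so downstream systems keep seeing the last valid value.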
I have trialled TFF tutorial (MNIST) on my single machine and now I am trying to perform a multi-machine process using MNIST data.
Clearly, I cannot use create_tf_dataset_for_client, so I have used gRPC to learn how to pass data from one machine to another.
My scenario is that the server will dispatch the initial model (with zeroes) to all the participating clients, where the model will run on local data. Each client will then dispatch its new weights to the server, which will perform federated_mean.
I was thinking of using tff.learning.build_federated_averaging_process, where I could hopefully customise the next function (2nd argument), but I failed... I am not even sure whether this is the right approach for sending the model and getting the weights back from remote clients.
Then I thought I could use tff.federated_mean under the @tff.federated_computation decorator. However, since the weights are arrays and I have a list of them (one per client), I am unable to understand how to create a tff.FederatedType that points to that list of lists. Any help from someone who has modelled federated learning on a distributed dataset would be appreciated.
Regards,
Dev.
TFF computations are designed to be platform/runtime agnostic; a single computation can be executed by several different backends.
TFF's type system can be helpful here in reasoning about how data is expected to flow in your computation. See the custom federated algorithms part 1 tutorial for an intro to how TFF thinks about types.
The result of build_federated_averaging_process expects an argument of datasets placed at the clients; for a dataset of element type T, in TFF's usual notation this would be denoted {T*}@C. This signature in particular is agnostic with respect to how the datasets arrive at the clients, or indeed how the clients themselves are represented.
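You can inspect this directly from the process's type signature (a hedged sketch; model_fn here stands for whatever model-building function you already pass to TFF):

    import tensorflow_federated as tff

    ip = tff.learning.build_federated_averaging_process(model_fn)
    # Should print something roughly of the shape
    # (<state, {T*}@CLIENTS> -> <state, metrics>), showing that next()
    # wants client-placed datasets as its second argument.
    print(ip.next.type_signature)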
Materializing the data and representing the clients is really the job of the runtime. TFF provides a few so-called native options here.
For example, in the local Python runtime clients are represented by threads on your local machine. Datasets are simply eager tf.data.Dataset objects, and the threads pull data from the datasets during training.
In the remote Python runtime, clients are represented by (threads on) remote workers, so that a single remote worker could be running more than one client. In this case, as you note, data must be materialized on the remote worker in order to train.
There are several options for accomplishing this.
One, TFF will actually handle serialization and deserialization of eager datasets across this RPC connection for you, so you could use the identical pattern of specifying data as in the local runtime, and it should "just work". This pattern actually got significantly better in March of 2021, via the use of tf.raw_ops.DatasetToGraphV2.
Perhaps better mapping to the concepts of federated computation, however, is the use of some library functions to simply instantiate the datasets on the workers.
Suppose you have an iterative process ip, which accepts a state and a data argument, where data is of type {T*}@C. Suppose further that we have a TFF computation get_dataset_for_client_id, which accepts a string and returns a dataset of the appropriate type (i.e., its TFF type signature is tf.string -> T*).
Then we can compose these two computations into another:
@tff.federated_computation(STATE_TYPE, tff.FederatedType(tf.string, tff.CLIENTS))
def new_next(state, client_ids):
    datasets_on_clients = tff.federated_map(get_dataset_for_client_id, client_ids)
    return ip.next(state, datasets_on_clients)
new_next now requires the controller to only specify the ids of clients on which to train, and delegates responsibility for pointing to a data store to whoever is representing the clients.
This pattern, I think, is likely what you want; TFF provides some helpers like the dataset_computation attribute on tff.simulation.ClientData and tff.simulation.compose_dataset_computation_with_iterative_process, which will more or less perform the wiring we did above for you.
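A hedged sketch of using that helper (it assumes you already have a ClientData object named client_data and an iterative process ip as above):

    trainable = tff.simulation.compose_dataset_computation_with_iterative_process(
        client_data.dataset_computation, ip)
    state = trainable.initialize()
    # next() now takes client ids; the workers materialize the datasets themselves.
    state, metrics = trainable.next(state, ['client_0', 'client_1'])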
Let's do this step by step. Please let us know if the explanation below answers your question.
Let's start with an example of TF (non-federated, just local) code that takes a dataset and does something with it, say add numbers:
@tff.tf_computation(tff.SequenceType(tf.int32))
def process_data(ds):
    return ds.reduce(np.int32(0), lambda x, y: x + y)
This code takes a dataset of integer numbers at input, and returns a single integer with the sum at output.
You can confirm this by looking at the type signature, like this:
str(process_data.type_signature)
You should see this:
(int32* -> int32)
So, process_data takes a set of integers, and returns an integer.
Now, using TFF's federated operators we can create a federated computation that does this on multiple clients, like this:
@tff.federated_computation(tff.FederatedType(tff.SequenceType(tf.int32), tff.CLIENTS))
def process_data_on_clients(federated_ds):
    return tff.federated_map(process_data, federated_ds)
If you look at the type signature of this new computation (just like above), you will see this:
({int32*}@CLIENTS -> {int32}@CLIENTS)
It means process_data_on_clients takes a federated set of integers (one set per client), and returns a federated integer (one integer with the sum on each client).
What happens in the above is that, the TF logic in process_data will be executed once on each client. This is how the federated_map operator works.
Now, process_data_on_clients is a little bit like the iterative process you are working with. It wants you to provide a federated dataset as an argument.
Let's see how we can make one by following the same pattern as above.
Here's some TF code that creates a single dataset of integers; say you supply an integer n and want to create a dataset with the numbers from 1 up to n, i.e., {1, 2, ..., n}:
@tff.tf_computation(tf.int32)
def make_data(n):
    return tf.data.Dataset.range(tf.cast(n, tf.int64)).map(lambda x: tf.cast(x + 1, tf.int32))
This is obviously a silly example, you could do something more along the lines of what you need (e.g., read data from a file specified by a name, etc.).
And here's what its type signature looks like:
(int32 -> int32*)
You can see the similarity to process_data.
And, just like with processing data, here's how we can make data on all clients by using the federated_map operator:
@tff.federated_computation(tff.FederatedType(tf.int32, tff.CLIENTS))
def make_data_on_clients(federated_n):
    return tff.federated_map(make_data, federated_n)
This is the type signature:
({int32}@CLIENTS -> {int32*}@CLIENTS)
Great, so make_data_on_clients takes a federated integer (that tells us how many data items to produce on each client), and returns a federated dataset, just like what process_data_on_clients wants.
You can check that the two work together as intended:
federated_n = [2, 3, 4]
federated_ds = make_data_on_clients(federated_n)
result = process_data_on_clients(federated_ds)
result
You should get the sums 1+2, 1+2+3, and 1+2+3+4 on the 3 clients involved in this computation (note there were 3 numbers in the federated integer above, so there are 3 clients here):
[<tf.Tensor: shape=(), dtype=int32, numpy=3>,
<tf.Tensor: shape=(), dtype=int32, numpy=6>,
<tf.Tensor: shape=(), dtype=int32, numpy=10>]
Note that all the TF code you have seen so far, including both the dataset creation and the dataset reduce, was executed on the clients (using federated_map).
Now, you can put the two together:
@tff.federated_computation(tff.FederatedType(tf.int32, tff.CLIENTS))
def make_and_process_data_on_clients(federated_n):
    federated_ds = make_data_on_clients(federated_n)
    return process_data_on_clients(federated_ds)
And now, you can invoke the make and process data combo in one shot:
make_and_process_data_on_clients(federated_n)
Again, all TF code here is executing on clients, just like in the above.
So where does this leave you?
Going back to Keith's explanation, the iterative process you got from TFF wants a federated dataset at input, just like process_data_on_clients.
The function get_dataset_for_client_id in Keith's example is like our make_data in that it is assumed to contain TensorFlow code that you want to run on each client to physically construct a dataset on that client.
In our silly example, the dataset construction logic used range, but it can be anything. For example, you could load data on each client from the same local file my_data, or using a custom TF op, or by whatever other means. Just like in our example, you can pass parameters to that function to give you more centralized control (similarly to what we did above with the federated integer).
The code snippet new_next in Keith's example is just like our make_and_process_data_on_clients, in that it combines two federated computations: one that makes federated data on the clients (supplied by you, just as discussed here), and one that processes that data (from tff.learning, the iterative process).
Does this help?
If it is still unclear, I would recommend trying the examples I included above on your distributed setup, since you already have one. You could inject some TF print ops into that code to confirm that the TF code you wrote is executing on the client machines in your system.
Once you get that part working, it's a simple tweak to replace the silly dataset construction logic in make_data with one that loads a dataset on each client from whatever local data source you are using.
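For instance, a hedged sketch of such a replacement (it assumes each client machine has a local text file with one integer per line; the file-path argument is purely illustrative):

    @tff.tf_computation(tf.string)
    def make_data_from_file(path):
        # Read one int32 per line from a file that lives on the client machine.
        return tf.data.TextLineDataset(path).map(
            lambda line: tf.strings.to_number(line, out_type=tf.int32))

Its type signature would be (string -> int32*), so it can be wired up with federated_map exactly like make_data above.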
EDITS:
Re: how to print, any TensorFlow code that appears in the body of a @tff.tf_computation is executed in eager mode, and you can use standard TensorFlow mechanisms such as tf.print to print from within TensorFlow.
tensorflow.org/api_docs/python/tf/print
On how to configure a multi-machine system with multiple worker nodes, see the Kubernetes tutorial. Note that the machine that drives the process connects to worker nodes, not the other way round.
https://www.tensorflow.org/federated/tutorials/high_performance_simulation_with_kubernetes
I am trying to find the correct syntax for using a for loop with dask delayed. I have found several tutorials and other questions but none fit my condition, which is extremely basic.
First, is this the correct way to run a for-loop in parallel?
%%time
from dask import delayed

list_names = ['a', 'b', 'c', 'd']
keep_return = []

@delayed
def loop_dummy(target):
    for i in range(1000000000):
        pass
    print('passed value is:' + target)
    return 1

for i in list_names:
    c = loop_dummy(i)
    keep_return.append(c)

total = delayed(sum)(keep_return)
total.compute()
This produced
passed value is:a
passed value is:b
passed value is:c
passed value is:d
Wall time: 1min 53s
If I run this in serial,
%%time
list_names = ['a', 'b', 'c', 'd']
keep_return = []

def loop_dummy(target):
    for i in range(1000000000):
        pass
    print('passed value is:' + target)
    return 1

for i in list_names:
    c = loop_dummy(i)
    keep_return.append(c)
it is actually faster.
passed value is:a
passed value is:b
passed value is:c
passed value is:d
Wall time: 1min 49s
I have seen examples stating that Dask adds a small amount of overhead, but this seems like far more than a small amount, no?
My actual for loop involves heavier computation where I build a model for various targets.
This computation
for i in range(...):
    pass
is bound by the global interpreter lock (GIL). You will want to use the multiprocessing or dask.distributed Dask backends rather than the default threading backend. I recommend the following:
total.compute(scheduler='multiprocessing')
However, if your actual computation is mostly Numpy/Pandas/Scikit-Learn/Other numeric package code, then the default threading backend is probably the right choice.
More information about choosing between schedulers is available here: http://dask.pydata.org/en/latest/scheduling.html
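If you go the dask.distributed route instead, a minimal sketch looks like this (it assumes the distributed package is installed; once a Client exists, compute() uses it by default):

    from dask.distributed import Client

    client = Client()        # starts a local cluster of worker processes
    print(total.compute())   # the delayed graph now runs on those workers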
I have a dataset with potentially corrupted/malicious data. The data is timestamped. I'm rating the data with a heuristic function. After a period of time I know that all new data items coming with certain IDs need to be discarded, and they represent a significant portion of the data (up to 40%).
Right now I have two batch pipelines:
The first one just runs the rating over the data.
The second one first filters out the corrupted data and then runs the analysis.
I would like to switch from batch mode (say, running every day) into an online processing mode (hope to get a delay < 10 minutes).
The second pipeline uses a global window, which makes processing easy. When a corrupted data key is detected, all further records for that key are simply discarded (and using the discarded keys from previous days as a pre-filter is also easy). It also makes it easier to make decisions about the output data, since all historic data for a given key is available during processing.
The main question is: can I create a loop in a Dataflow DAG? Let's say I would like to accumulate the quality rates given to each session window I process, and if the rate sum is over X, a filter function in an earlier stage of the pipeline should filter out the malicious keys.
I know about side inputs; I don't know if they can change during runtime.
I'm aware that a DAG by definition cannot have cycles, but how can I achieve the same result without one?
An idea that comes to mind is to use a side output to mark an ID as malicious and create a fake unbounded output/input. The output would dump the data to some storage, and the input would load it every hour and stream it so it can be joined.
Side inputs in the Beam programming model are windowed.
So you were on the right path: it seems reasonable to have a pipeline structured as two parts: 1) computing a detection model for the malicious data, and 2) taking the model as a side input and the data as a main input, and filtering the data according to the model. This second part of the pipeline will get the model for the matching window, which seems to be exactly what you want.
In fact, this is one of the main examples in the Millwheel paper (page 2), upon which Dataflow's streaming runner is based.
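A rough Beam (Python) sketch of that two-part shape, hedged: the detection DoFn, the field names, and the threshold below are placeholders for whatever your rating heuristic actually produces, and `records` stands for your already-windowed main input PCollection:

    import apache_beam as beam

    class DetectMaliciousKeysFn(beam.DoFn):
        # Hypothetical: emit the key of any record whose heuristic rate is too low.
        def process(self, rec):
            if rec['rate'] < 0.2:   # placeholder threshold
                yield rec['key']

    # Part 1: compute the "model" -- the set of malicious keys, per window.
    malicious_keys = (
        records
        | 'DetectBad' >> beam.ParDo(DetectMaliciousKeysFn()))

    # Part 2: filter the main input against that model, supplied as a
    # side input; Beam matches the side input to the main input's window.
    filtered = (
        records
        | 'DropBad' >> beam.Filter(
            lambda rec, bad: rec['key'] not in bad,
            bad=beam.pvalue.AsList(malicious_keys)))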