custom aggregators with client_states as states - tensorflow-federated

I want to create a custom aggregator where the state is the unique client state of each client. To initialize, I can define the client states as usual and then use federated_collect to give them #SERVER placement, since that's what initialize_fn() wants. I can do the same for creating new_state in next_fn(). The problem is that I then don't know how to "broadcast" these states back into the clients. Normally federated_broadcast takes, say, A#SERVER and makes copies of it equal to the number of clients; so for two clients it would be {A}#CLIENTS, i.e. (A, A). What I want is to have AB#SERVER turning into (A, B).
I am currently defining the client states outside the aggregation process and passing them into run_one_round of the iterative process. I use federated_collect to collect these states from the measurements of the aggregator, and then unstack them outside. So from outside the federated computations it looks like:
server_state, train_metrics, client_states, aggregation_state = iterative_process.next(
    server_state, sampled_train_data, client_states, aggregation_state)
client_states = [x for x in client_states]
In TFF:
output = aggregation_process.next(aggregation_state, client_outputs.weights_delta, client_states)
new_aggregation_state = output.state
round_model_delta = output.result
new_client_states = output.measurements
In the aggregator:
measurements = tff.federated_collect(new_client_states)
return tff.templates.MeasuredProcessOutput(
    state=new_state, result=round_model_delta, measurements=measurements)
But I am trying to define and handle these client states completely inside the aggregator, so that I can plug the aggregator into tff.learning.build_federated_averaging_process like:
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    model_update_aggregation_factory=my_aggregation_factory)
Is that possible? If so, how?

tff.federated_collect is likely not the desired tool in this situation, and it will be removed in future versions of TFF (see commit #030a406).
Alternatively, a tff.federated_computation can both take #CLIENTS-placed parameters as input and return #CLIENTS-placed values as output. Instead of collecting all the values on the server first (which implies the system is communicating the states), it may be best to leave the values on the clients.
When executing TFF in a simulation environment (e.g. invoking a tff.Computation in a Colab notebook), a T#CLIENTS-placed value will be returned as a list of T objects, one for each client. This list can later be used as a parameter to a future tff.Computation invocation.
Example:
@tff.tf_computation(tf.int32)
def sqrt(value):
    return tf.math.sqrt(tf.cast(value, tf.float32))

@tff.federated_computation(tff.types.at_clients(tf.int32))
def federated_sqrt(values):
    return tff.federated_map(sqrt, values)
client_values = [1,2,3,4]
federated_sqrt(client_values)
>>> [<tf.Tensor: shape=(), dtype=float32, numpy=1.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=1.4142135>,
<tf.Tensor: shape=(), dtype=float32, numpy=1.7320508>,
<tf.Tensor: shape=(), dtype=float32, numpy=2.0>]
Important caveat: the order of inputs and outputs is not necessarily guaranteed to be the same across invocations. An example of how to index and track state across invocations can be found in the tensorflow_federated/python/examples/stateful_clients/ directory inside the repository.
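Applied to the question, the same pattern lets a next_fn accept the per-client states at CLIENTS placement and return them at CLIENTS placement, so they are never collected on the server. A minimal sketch (the names and the int32 toy state are hypothetical, reusing the tff.types.at_clients API from the example above):

@tff.tf_computation(tf.int32)
def increment(state):
    return state + 1  # toy per-client state update

@tff.federated_computation(tff.types.at_clients(tf.int32),
                           tff.types.at_clients(tf.float32))
def next_fn(client_states, client_values):
    new_states = tff.federated_map(increment, client_states)
    round_total = tff.federated_sum(client_values)
    return new_states, round_total

client_states = [0, 0]
client_states, round_total = next_fn(client_states, [1.0, 2.0])
# client_states comes back as a Python list with one entry per client and can
# be fed into the next invocation, subject to the ordering caveat above.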

Related

How to properly reset the ContinuousState in a class derived from LeafSystem?

I want to write a continuous-time system derived from LeafSystem that can have its continuous state reset to other values if some conditions are met. However, the system does not work as I expected. To find out why, I implemented a simple multi-step integrator system as below:
from pydrake.systems.framework import LeafSystem

class MultiStepIntegrator(LeafSystem):
    def __init__(self):
        LeafSystem.__init__(self)
        self.state_index = self.DeclareContinuousState(1)
        self.DeclareStateOutputPort("x", self.state_index)
        self.flag_1 = True
        self.flag_2 = True

    def reset_state(self, context, value):
        state = context.get_mutable_continuous_state_vector()
        state.SetFromVector(value)

    def DoCalcTimeDerivatives(self, context, derivatives):
        t = context.get_time()
        if t < 2.0:
            V = [1]
        elif t < 4.0:
            if self.flag_1:
                self.reset_state(context, [0])
                print("Have done the first reset")
                self.flag_1 = False
            V = [1]
        else:
            if self.flag_2:
                self.reset_state(context, [0])
                print("Have done the second reset")
                self.flag_2 = False
            V = [-1]
        derivatives.get_mutable_vector().SetFromVector(V)
What I expect from this system is that it will give me a piecewise, discontinuous trajectory. Given that I set the state initially to 0, the state will first go from 0 to 2 for $t \in [0,2]$, then again from 0 to 2 for $t \in [2,4]$, and then from 0 to -2 for $t \in [4,6]$.
Then I simulate this system and plot the logged data with:
import matplotlib.pyplot as plt
from pydrake.multibody.plant import AddMultibodyPlantSceneGraph
from pydrake.systems.analysis import Simulator
from pydrake.systems.framework import DiagramBuilder
from pydrake.systems.primitives import LogVectorOutput

builder = DiagramBuilder()
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, 1e-4)
plant.Finalize()
integrator = builder.AddSystem(MultiStepIntegrator())
state_logger = LogVectorOutput(integrator.get_output_port(), builder, 1e-2)
diagram = builder.Build()
simulator = Simulator(diagram)
context = simulator.get_mutable_context()
simulator.AdvanceTo(6.0)  # simulate over [0, 6]

log_state = state_logger.FindLog(context)
fig = plt.figure()
t = log_state.sample_times()
plt.plot(t, log_state.data()[0, :])
fig.set_size_inches(10, 6)
plt.tight_layout()
It seems that the resets never happen. However, I do see the two printouts indicating that the resets were done:
Have done the first reset
Have done the second reset
What happened here? Is there some checking done behind the scenes so that the ContinuousState cannot jump (as the name indicates)? How can I reset the state value when some conditions are met?
Thank you very much for your help!
In DoCalcTimeDerivatives, the context is a const (input-only) argument. It cannot be modified. The only thing DoCalcTimeDerivatives can do is report the derivative, to enable the integrator to integrate the continuous state.
Not all integrators use fixed-size time steps. Some might need to evaluate the derivatives multiple times before deciding what step size(s) to use. Therefore, it's not reasonable for a dx/dt calculation to have any side effects. It must be a pure function, where its only consequence is to report a dx/dt.
To change a continuous state value other than through pure integration, the System needs to use an "unrestricted update" event. That event can mutate any and all elements of the State (including continuous state).
If the timing of the discontinuities is periodic (even if some events make no change to the state), you can use DeclarePeriodicUnrestrictedUpdateEvent to declare the update calculation (a sketch follows below).
If the discontinuities happen per a witness function, see bouncing_ball or rimless_wheel or compass_gait for an example.
If you need a generalized (bespoke) triggering schedule for the discontinuity events, you'll need to override DoCalcNextUpdateTime to manually inject the next event timing, something like the LcmSubscriberSystem. We don't have many good examples of this to my knowledge.
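For the periodic case, here is a minimal sketch of the event-based approach (assuming pydrake's DeclarePeriodicUnrestrictedUpdateEvent binding; the reset schedule is illustrative, not the exact logic from the question):

from pydrake.systems.framework import LeafSystem

class ResettingIntegrator(LeafSystem):
    def __init__(self):
        LeafSystem.__init__(self)
        self.state_index = self.DeclareContinuousState(1)
        self.DeclareStateOutputPort("x", self.state_index)
        # Fire at t = 2, 4, 6, ...; the event handler may mutate any part of State.
        self.DeclarePeriodicUnrestrictedUpdateEvent(
            period_sec=2.0, offset_sec=2.0, update=self._reset)

    def _reset(self, context, state):
        # Unlike DoCalcTimeDerivatives, this handler receives a mutable State.
        state.get_mutable_continuous_state().get_mutable_vector().SetFromVector([0.0])

    def DoCalcTimeDerivatives(self, context, derivatives):
        # Pure function: its only consequence is to report dx/dt.
        v = [1.0] if context.get_time() < 4.0 else [-1.0]
        derivatives.get_mutable_vector().SetFromVector(v)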

Different access methods to Pyro Paramstore give different results

I am following the Pyro introductory tutorial on forecasting; trying to access the learned parameters after training the model, I get different results using different access methods for some of them (while getting identical results for others).
Here is the stripped-down reproducible code from the tutorial:
import torch
import pyro
import pyro.distributions as dist
from pyro.contrib.examples.bart import load_bart_od
from pyro.contrib.forecast import ForecastingModel, Forecaster
pyro.enable_validation(True)
pyro.clear_param_store()
pyro.__version__
# '1.3.1'
torch.__version__
# '1.5.0+cu101'
# import & prepare the data
dataset = load_bart_od()
T, O, D = dataset["counts"].shape
data = dataset["counts"][:T // (24 * 7) * 24 * 7].reshape(T // (24 * 7), -1).sum(-1).log()
data = data.unsqueeze(-1)
T0 = 0  # beginning
T2 = data.size(-2) # end
T1 = T2 - 52 # train/test split
# define the model class
class Model1(ForecastingModel):
    def model(self, zero_data, covariates):
        data_dim = zero_data.size(-1)
        feature_dim = covariates.size(-1)
        bias = pyro.sample("bias", dist.Normal(0, 10).expand([data_dim]).to_event(1))
        weight = pyro.sample("weight", dist.Normal(0, 0.1).expand([feature_dim]).to_event(1))
        prediction = bias + (weight * covariates).sum(-1, keepdim=True)
        assert prediction.shape[-2:] == zero_data.shape
        noise_scale = pyro.sample("noise_scale", dist.LogNormal(-5, 5).expand([1]).to_event(1))
        noise_dist = dist.Normal(0, noise_scale)
        self.predict(noise_dist, prediction)
# fit the model
pyro.set_rng_seed(1)
pyro.clear_param_store()
time = torch.arange(float(T2)) / 365
covariates = torch.stack([time], dim=-1)
forecaster = Forecaster(Model1(), data[:T1], covariates[:T1], learning_rate=0.1)
So far so good; now I want to inspect the learned latent parameters stored in the Paramstore. It seems there is more than one way to do this; using the get_all_param_names() method:
for name in pyro.get_param_store().get_all_param_names():
    print(name, pyro.param(name).data.numpy())
I get
AutoNormal.locs.bias [14.585433]
AutoNormal.scales.bias [0.00631594]
AutoNormal.locs.weight [0.11947815]
AutoNormal.scales.weight [0.00922901]
AutoNormal.locs.noise_scale [-2.0719821]
AutoNormal.scales.noise_scale [0.03469057]
But using the named_parameters() method:
pyro.get_param_store().named_parameters()
gives the same values for the location (locs) parameters, but different values for all scales ones:
dict_items([
('AutoNormal.locs.bias', Parameter containing: tensor([14.5854], requires_grad=True)),
('AutoNormal.scales.bias', Parameter containing: tensor([-5.0647], requires_grad=True)),
('AutoNormal.locs.weight', Parameter containing: tensor([0.1195], requires_grad=True)),
('AutoNormal.scales.weight', Parameter containing: tensor([-4.6854], requires_grad=True)),
('AutoNormal.locs.noise_scale', Parameter containing: tensor([-2.0720], requires_grad=True)),
('AutoNormal.scales.noise_scale', Parameter containing: tensor([-3.3613], requires_grad=True))
])
How is this possible? According to the documentation, the Paramstore is a simple key-value store, and there are only these six keys in it:
pyro.get_param_store().get_all_param_names() # .keys() method gives identical result
# result
dict_keys([
'AutoNormal.locs.bias',
'AutoNormal.scales.bias',
'AutoNormal.locs.weight',
'AutoNormal.scales.weight',
'AutoNormal.locs.noise_scale',
'AutoNormal.scales.noise_scale'])
so there is no way that one method accesses one set of items and the other a different one.
Am I missing something here?
pyro.param() returns parameters transformed to their constrained space; in this case, to the positive reals for the scales.
Here is the situation, as revealed in the GitHub thread I opened in parallel with this question...
The Paramstore is no longer just a simple key-value store; it also performs constraint transformations. Quoting a Pyro developer from the above link:
here's some historical background. The ParamStore was originally just a key-value store. Then we added support for constrained parameters; this introduced a new layer of separation between user-facing constrained values and internal unconstrained values. We created a new dict-like user-facing interface that exposed only constrained values, but to keep backwards compatibility with old code we kept the old interface around. The two interfaces are distinguished in the source files [...] but as you observe it looks like we forgot to mark the old interface as DEPRECATED.
I guess in clarifying docs we should:
clarify that the ParamStore is no longer a simple key-value store
but also performs constraint transforms;
mark all "old" style interface methods as DEPRECATED;
remove "old" style interface usage from examples and tutorials.
As a consequence, it turns out that, while pyro.param() returns the results in the constrained (user-facing) space, the older method named_parameters() returns the unconstrained (i.e. internal-use-only) values, hence the apparent discrepancy.
It's not difficult to verify that the scales values returned by the two methods above are indeed related by a logarithmic transformation:
import numpy as np
items = list(pyro.get_param_store().named_parameters())  # unconstrained space
i = 0
for name in pyro.get_param_store().keys():
    if 'scales' in name:
        temp = np.log(
            pyro.param(name).item()  # constrained space
        )
        print(temp, items[i][1][0].item(), np.allclose(temp, items[i][1][0].item()))
    i += 1
# result:
-5.027793402915326 -5.0277934074401855 True
-4.600319371162187 -4.6003193855285645 True
-3.3920585732532835 -3.3920586109161377 True
Why does this discrepancy affect only the scales parameters? That's because scales (i.e. standard deviations) are by definition constrained to be positive; that doesn't hold for locs (i.e. means), which are not constrained, hence the two representations coincide for them.
As a result of the question above, a new bullet has now been added to the Paramstore documentation, giving a relevant hint:
in general parameters are associated with both constrained and unconstrained values. for example, under the hood a parameter that is constrained to be positive is represented as an unconstrained tensor in log space.
as well as in the documentation of the named_parameters() method of the old interface:
Note that, in the event the parameter is constrained, unconstrained_value is in the unconstrained space implicitly used by the constraint.
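For reference, the mapping between the two spaces is the standard transform_to machinery from torch.distributions. A small illustration, plugging in the scales.bias numbers printed earlier:

import torch
from torch.distributions import constraints, transform_to

t = transform_to(constraints.positive)  # for the positive constraint, this is exp
u = torch.tensor(-5.0647)               # unconstrained value from named_parameters()
print(t(u))                             # ~0.0063, the constrained value pyro.param() gives
print(t.inv(t(u)))                      # back to the unconstrained ~-5.0647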

`knitr_in`, `file_out` and `vis_drake_graph` usage in R:drake

I'm trying to understand how to use knitr_in, file_out and vis_drake_graph properly in drake.
I have three questions.
Q1: Usage of knitr_in and file_out to create markdown reports
While a code like this works correctly for one of my smaller projects:
make_hyp_data_aggregated_report <- function() {
  render(
    input = knitr_in("rmd/hyptest-is-data-being-aggregated.Rmd"),
    output_file = file_out("~/projectname/reports/01-hyp-test.html"),
    quiet = TRUE
  )
}
plan <- drake_plan(
  ...
  ...
  hyp_data_aggregated_report = make_hyp_data_aggregated_report()
  ...
  ...
)
Essentially the same code in my large project (with ~10+ reports) doesn't work quite right: while the reports get built, the knitr_in objects don't get displayed as blue squares in the graph produced by drake::vis_drake_graph().
Both projects use drake::loadd(...) within the markdown to get the objects from the cache.
Is there some code in vis_drake_graph that removes these squares once the graph gets busy?
Q2: file_out objects in vis_drake_graph
Is there a way to display the file_out objects themselves as circles/squares in vis_drake_graph?
Q3: packages showing up in vis_drake_graph
Is there a way to avoid vis_drake_graph from printing the packages explicitly? (Basically anything with the ::)
Q1
Every literal file path needs its own knitr_in() or file_out(). If you have one function with one knitr_in(), even if you use the function multiple times, that still only counts as one file path. I recommend writing these keywords at the plan level, e.g.
plan <- drake_plan(
  r1 = render(knitr_in("report1.Rmd"), output_file = file_out("report1.html")),
  r2 = render(knitr_in("report2.Rmd"), output_file = file_out("report2.html")),
  r3 = render(knitr_in("report3.Rmd"), output_file = file_out("report3.html"))
)
Q2
They should appear unless you set show_output_files = FALSE in vis_drake_graph().
Q3
No, but if it's any consolation, I do regret the decision to track namespaced functions and objects at all in drake. drake's approach is fundamentally suboptimal for tracking packages, and I plan to get rid of it if there ever comes time for a round of breaking changes. Otherwise, there is no way to get rid of it except vis_drake_graph(targets_only = TRUE), which also gets rid of all the imports in the graph.

how to print local outputs in tensorflow federated?

I want to print local outputs of clients in the tensorflow federated tutorial https://www.tensorflow.org/federated/tutorials/federated_learning_for_image_classification. What should I do?
If you only want a list of the values that go into the aggregations (e.g. into tff.federated_mean), one option would be to add additional outputs to aggregate_mnist_metrics_across_clients() to include metrics computed using tff.federated_collect().
This might look something like:
@tff.federated_computation
def aggregate_mnist_metrics_across_clients(metrics):
    return {
        'num_examples': tff.federated_sum(metrics.num_examples),
        'loss': tff.federated_mean(metrics.loss, metrics.num_examples),
        'accuracy': tff.federated_mean(metrics.accuracy, metrics.num_examples),
        'per_client/num_examples': tff.federated_collect(metrics.num_examples),
        'per_client/loss': tff.federated_collect(metrics.loss),
        'per_client/accuracy': tff.federated_collect(metrics.accuracy),
    }
These will get printed a few cells later, when the computation runs:
state, metrics = iterative_process.next(state, federated_train_data)
print('round 1, metrics={}'.format(metrics))
round 1, metrics=<...,per_client/accuracy=[0.14516129, 0.10642202, 0.13972603],per_client/loss=[3.2409852, 3.417463, 2.9516447],per_client/num_examples=[930.0, 1090.0, 730.0]>
Note, however: if you want to know the value of a specific client, there is intentionally no way to do that. By design, TFF's language avoids any notion of client identity; there is a desire to avoid making clients addressable.
If you want to print something inside the client_update function, you can use tf.print().
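A minimal standalone sketch of the tf.print approach (the dataset and function body here are made up): inside a tf.function, which client_update typically is, Python's print only runs at trace time, whereas tf.print executes on every call:

import tensorflow as tf

@tf.function
def client_update(dataset):
    total = tf.constant(0.0)
    for batch in dataset:
        total += tf.reduce_sum(batch)
        tf.print("running client total:", total)  # runs on every invocation
    return total

client_update(tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0]).batch(2))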

Dask Delayed ignores name for dependent variables

When creating a graph of calculations using delayed, I'm trying to assign names so that the graph is readable when I visualize it. However, for delayed variables that depend on functions, the name parameter doesn't seem to affect the key. Here's a toy example:
import numpy as np
import pandas as pd
from dask import delayed

def calc_avg(a, b):
    return pd.concat([a, b], axis=1).mean(axis=1)

def calc_ratio(a, b):
    return a / b

a = delayed(pd.Series(np.random.rand(10)), name='a')
b = delayed(pd.Series(np.random.rand(10)), name='b')
c = delayed(pd.Series(np.random.rand(10)), name='c')

x = delayed(calc_avg, name='avg_result')(a, b)
y = delayed(calc_ratio, name='ratio_result')(x, c)

y.visualize()
You can see the visualization here (I can't embed images), but rather than 'avg_result' I see 'calc_avg-#0', and rather than 'ratio_result' I see 'calc_ratio-#1'. If I look at x.key or y.key, they do not match the names that I provided. Is this the expected behavior?
The key of a dask result needs to be unique for every combination of the function that was delayed, and the inputs you give it. What you see above is the expected behaviour: you are naming the function, but a call with different inputs would expect a different output, so the key must be different.
You can specify the key you'd like associated not when you define the delayed function, but when you call it:
x = delayed(calc_avg)(a, b, dask_key_name='avg_result')
y = delayed(calc_ratio)(x, c, dask_key_name='ratio_result')
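With dask_key_name, the given string is used verbatim as the task key, which you can confirm by continuing the snippet:

print(x.key)  # 'avg_result'
print(y.key)  # 'ratio_result'
y.visualize()  # the graph now shows the chosen names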
