How to print local outputs in TensorFlow Federated?

I want to print the local outputs of clients in the TensorFlow Federated tutorial https://www.tensorflow.org/federated/tutorials/federated_learning_for_image_classification. What should I do?

If you only want a list of the values that go into the aggregations (e.g. into tff.federated_mean), one option would be to add additional outputs to aggregate_mnist_metrics_across_clients() to include metrics computed using tff.federated_collect().
This might look something like:
@tff.federated_computation
def aggregate_mnist_metrics_across_clients(metrics):
  return {
      'num_examples': tff.federated_sum(metrics.num_examples),
      'loss': tff.federated_mean(metrics.loss, metrics.num_examples),
      'accuracy': tff.federated_mean(metrics.accuracy, metrics.num_examples),
      'per_client/num_examples': tff.federated_collect(metrics.num_examples),
      'per_client/loss': tff.federated_collect(metrics.loss),
      'per_client/accuracy': tff.federated_collect(metrics.accuracy),
  }
These values will get printed a few cells later when the computation runs:
state, metrics = iterative_process.next(state, federated_train_data)
print('round 1, metrics={}'.format(metrics))
round 1, metrics=<...,per_client/accuracy=[0.14516129, 0.10642202, 0.13972603],per_client/loss=[3.2409852, 3.417463, 2.9516447],per_client/num_examples=[930.0, 1090.0, 730.0]>
Note, however: if you want to know the value of a specific client, there is intentionally no way to do that. TFF's language deliberately avoids any notion of client identity; the design goal is to keep clients from being addressable.

If you want to print something in the client_update function, you can use tf.print().
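For instance, a minimal sketch (client_update_step here is a hypothetical stand-in for a step inside the tutorial's client_update, not code from the tutorial itself):

import tensorflow as tf

@tf.function
def client_update_step(loss):
  # Unlike Python's print(), which fires only while the function is being
  # traced, tf.print() executes inside the graph on every invocation.
  tf.print('client loss:', loss)
  return loss

client_update_step(tf.constant(3.14))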

Related

custom aggregators with client_states as states

I want to create a custom aggregator where the state is the unique client state of each client. To initialize, I can define client states as usual and then use federated_collect to get @SERVER placement, since that's what initialize_fn() wants. I can do the same when creating new_state in next_fn(). The problem is that I don't know how I can "broadcast" these states back onto the clients. Normally, federated_broadcast takes, say, A@SERVER and makes copies of it equal to the number of clients, so for two clients it would be {A}@CLIENTS, let's say (A A). What I want is for AB@SERVER to turn into (A B).
I am currently defining client states outside the aggregation process and passing them into run_one_round of the iterative process. I use federated_collect to collect these states from the aggregator's measurements, and then unstack them outside. So from outside the federated computations it looks like:
server_state, train_metrics, client_states, aggregation_state = iterative_process.next(
    server_state, sampled_train_data, client_states, aggregation_state)
client_states = [x for x in client_states]
In TFF:
output = aggregation_process.next(aggregation_state, client_outputs.weights_delta, client_states)
new_aggregation_state = output.state
round_model_delta = output.result
new_client_states = output.measurements
In the aggregator:
measurements = tff.federated_collect(new_client_states)
return tff.templates.MeasuredProcessOutput(
    state=new_state, result=round_model_delta, measurements=measurements)
But I am trying to define and handle these client states completely inside the aggregator, so that I can plug the aggregator into tff.learning.build_federated_averaging_process like:
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    model_update_aggregation_factory=my_aggregation_factory)
Is that possible? If so, how?
tff.federated_collect is likely not the desired tool in this situation, and it will be removed in future versions of TFF (see commit #030a406).
Alternatively, a tff.federated_computation can both take @CLIENTS-placed parameters as input and return @CLIENTS-placed values as output. Instead of collecting all the values on the server first (which implies the system is communicating the states), it may be best to leave the values on the clients.
When executing TFF in a simulation environment (e.g. invoking a tff.Computation in a Colab notebook), a T@CLIENTS-placed value will be returned as a list of T objects, one for each client. This list can later be used as a parameter to a future tff.Computation invocation.
Example:
@tff.tf_computation(tf.int32)
def sqrt(value):
  return tf.math.sqrt(tf.cast(value, tf.float32))

@tff.federated_computation(tff.types.at_clients(tf.int32))
def federated_sqrt(values):
  return tff.federated_map(sqrt, values)

client_values = [1, 2, 3, 4]
federated_sqrt(client_values)
>>> [<tf.Tensor: shape=(), dtype=float32, numpy=1.0>,
     <tf.Tensor: shape=(), dtype=float32, numpy=1.4142135>,
     <tf.Tensor: shape=(), dtype=float32, numpy=1.7320508>,
     <tf.Tensor: shape=(), dtype=float32, numpy=2.0>]
Important caveat: the order of inputs and outputs is not guaranteed to be the same across invocations. An example of how to index and track states across invocations can be found in the tensorflow_federated/python/examples/stateful_clients/ directory inside the repository.
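A minimal sketch of that idea, assuming nothing from that directory (the names bump and next_round are invented here): carry an explicit id inside each client's state, so states can be re-associated regardless of output order.

import tensorflow as tf
import tensorflow_federated as tff

@tff.tf_computation(tf.int32, tf.int32)
def bump(client_id, counter):
  # The id travels with the payload, so outputs can be matched back to
  # clients even if the runtime reorders the returned list.
  return client_id, counter + 1

@tff.federated_computation(tff.types.at_clients((tf.int32, tf.int32)))
def next_round(states):
  return tff.federated_map(bump, states)

# In simulation: pass a list of (id, counter) pairs, get a list back.
print(next_round([(0, 5), (1, 7), (2, 9)]))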

Mismanagement of Dask future results slows down performance

I'm looking for suggestions on how to solve the bottleneck described below.
Within a Dask distributed infrastructure I map some futures and gather results as they become ready. Once retrieved, I have to invoke a time-consuming, blocking pandas function that, unfortunately, can't be avoided.
The optimum would be something that lets me create another process, detached from the for loop, that is able to ingest the flow of results. Due to other constraints, not present in the example, the output can't be serialized and sent to workers; it must be processed on the master.
Here is a small mockup. Just grab the idea and don't focus too much on the details of the code.
import random
import time

import numpy as np
import pandas as pd
from dask.distributed import Client, as_completed


class pxldrl(object):
    def __init__(self, df):
        self.table = df

def simulation(list_param):
    time.sleep(random.random())
    val = sum(list_param) / 4
    if val < 0.5:
        result = {'param_e': val}
    else:
        result = {'param_f': val}
    return pxldrl(result)

def costly_function(result, output):
    time.sleep(1)
    # blocking pandas function
    output = output.append(result.table, sort=False, ignore_index=True)
    return output

def main():
    client = Client(n_workers=4, threads_per_worker=1)
    output = pd.DataFrame(columns=['param_e', 'param_f'])
    input = pd.DataFrame(np.random.random(size=(100, 4)),
                         columns=['param_a', 'param_b', 'param_c', 'param_d'])
    for i in range(2):
        futures = client.map(simulation, input.values)
        for future, result in as_completed(futures, with_results=True):
            output = costly_function(result, output)
It sounds like you want to run costly_function in a separate thread. Perhaps you could use the threading or concurrent.futures module to run your entire routine on a separate thread.
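A minimal sketch of the threading approach, reusing the names from the question (client, simulation, costly_function, input, output):

import threading
from dask.distributed import as_completed

def consume(futures, holder):
    # Drain results as they complete and apply the blocking pandas step;
    # holder is a one-element list so the thread can publish its result.
    for future, result in as_completed(futures, with_results=True):
        holder[0] = costly_function(result, holder[0])

futures = client.map(simulation, input.values)
holder = [output]
worker = threading.Thread(target=consume, args=(futures, holder))
worker.start()
# ... the main thread is free to map more work or do other things here ...
worker.join()
output = holder[0]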
If you wanted to get fancy, you could even use Dask again and create a second client that ran within this process:
local_client = Client(processes=False)
and use that. (Although you'll have to be careful about mixing futures between clients; that won't work.)
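A sketch of how that might look, again with the question's names (untested against the question's real constraints; note that with processes=False everything stays in this process, so the unserializable output never leaves the master, and the loop passes concrete results rather than futures between the two clients):

from dask.distributed import Client, as_completed

local_client = Client(processes=False)

out_future = local_client.scatter(output)
for future, result in as_completed(futures, with_results=True):
    # Chain each pandas step onto the previous output; the steps run in the
    # local client's threads while this loop keeps draining cluster results.
    out_future = local_client.submit(costly_function, result, out_future)
output = out_future.result()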

Issue returning desired data with Lua

Wondering if I could get some help with this:
function setupRound()
    local gameModes = {'mode 1', 'mode 2', 'mode 3'} -- game modes
    local maps = {'map1', 'map2', 'map3'}
    --local newMap = maps[math.random(1,#maps)]
    local mapData = {maps[math.random(#maps)], gameModes[math.random(#gameModes)]}
    local mapData = mapData
    return mapData
end

a = setupRound()
print(a[1], a[2]) -- fix from Egor
What the problem is:
When trying to get the info from setupRound(), I get table: 0x18b7b20.
How I am trying to get mapData:
a = setupRound()
print(a)
Edit:
Output Issues
With the current script I will always get the following output: map3 mode 2.
What is the cause of this?
Efficiency; is this the best way to do it?
While this really isn't a question, I just wanted to know whether the method I am using is truly the most efficient way of doing this.
First of all
This line does nothing useful and can be removed (it does something, just not something you'd want):
local mapData = mapData
Output Issues
The problem is math.random. Write a script that's just print(math.random(1,100)) and run it 100 times. It will print the same number each time. This is because Lua, by default, does not set its random seed on startup. The easiest fix is to call math.randomseed(os.time()) at the beginning of your program.
Efficiency; is this the best way to do it?
Depends. For what you seem to want, yes, it's definitely efficient enough. If anything, I'd change it to the following to avoid magic numbers, which would make the code harder to understand in the future.
-- etc.
local mapData = {
    map = maps[math.random(#maps)],
    mode = gameModes[math.random(#gameModes)]
}
-- etc.
print(a.map, a.mode)
And remember:
Premature optimization is the root of all evil.
— Donald Knuth
You did very well by creating a separate function for generating your modes and maps. This separates code and keeps it modular and neat.
Now, you have your game modes in a table modes = {} (which is basically a list of strings).
And you have your maps in another table maps = {}.
Each table item has a key that, when omitted, becomes a number counting upwards. In your case, there are 3 items in modes and 3 items in maps, so the keys are 1, 2, 3. The key is used to grab a certain item in that table (= list). E.g. maps[2] grabs the second item in the maps table, whose value is 'map 2'. The same applies to the modes table. Hence the output you asked about.
To get a random index into the game modes, you just call math.random(#modes). math.random can accept up to two parameters, which define the range to pick the random number from. You can also pass a single parameter; Lua then assumes you want to start at 1, so math.random(3) effectively becomes math.random(1, 3). #modes in this case stands for "count all game modes in that table and give me that count", which is 3.
To return the chosen map and game mode from the function, we can use another table, just to hold both values. This time, however, the table has named keys to access the values inside it: "map" and "mode".
A complete example would be:
local function setupRound()
    local modes = {"mode 1", "mode 2", "mode 3"} -- different game modes
    local maps = {"map 1", "map 2", "map 3"} -- different maps
    return {map = maps[math.random(#maps)], mode = modes[math.random(#modes)]}
end

for i = 1, 10 do
    local freshRound = setupRound()
    print(freshRound.map, freshRound.mode)
end

dask equivalent of df.loc[df.index.intersection(mylabels)]

When I run df.loc[mylabels] in Dask I get a warning that links to the pandas documentation:
Warning: Starting in 0.21.0, using .loc or [] with a list with one or more missing labels is deprecated, in favor of .reindex.
This page also says:
Alternatively, if you want to select only valid keys, the following is idiomatic and efficient; it is guaranteed to preserve the dtype of the selection.
In [106]: labels = [1, 2, 3]

In [107]: s.loc[s.index.intersection(labels)]
Out[107]:
1    2
2    3
dtype: int64
Dask indexes do not have an intersection method.
So what is the recommended way to achieve the above effect in Dask?
The problem with df.loc[mylabels] is that mylabels contains items not in df.index.
For now it looks like you should continue calling df.loc[labels].
It looks like things have changed upstream, and dask.dataframe probably needs to catch up a bit. I recommend submitting a bug report to https://github.com/dask/dask/issues/new
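In the meantime, a workaround sketch: since a Dask index lacks intersection, compute the index to a pandas Index first and intersect there. This materializes the full index on the client, so it is only a stopgap for indexes that fit in memory (pdf, df, and mylabels are illustrative names):

import dask.dataframe as dd
import pandas as pd

pdf = pd.DataFrame({'x': range(6)}, index=list('abcdef'))
df = dd.from_pandas(pdf, npartitions=2)

mylabels = ['a', 'c', 'z']                         # 'z' is not in the index
valid = df.index.compute().intersection(mylabels)  # pandas Index on the client
print(df.loc[list(valid)].compute())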

How can I filter a SET by its concatenated values according to another SET in Redis?

I have a filter optimization problem in Redis.
I have a Redis SET which keeps the doc and pos pairs of a type in a corpus.
example:
smembers type_in_docs.1
returns doc.pos pairs:
array (size=216627)
  0 => string '2805.2339' (length=9)
  1 => string '2410.14208' (length=10)
  2 => string '3516.1810' (length=9)
  ...
Another Redis set is created live according to user choices; it contains the selected docs:
smembers filteredDocs
I want to filter the doc.pos pairs in the "type_in_docs" set according to the user's doc id choices.
In fact, if I didn't use concatenated values in the set, this would be easy with SINTER.
So I implemented PHP filter code as below.
It works, but it needs optimization: with a big doc.pos set it takes too much time (starting at roughly 150,000 members!).
$concordance = $this->redis->smembers('types_in_docs.' . $typeID);
$filteredDocs = $this->redis->smembers('filteredDocs');
$filtered = array_filter($concordance, function ($pairs) use ($filteredDocs) {
    return in_array(substr($pairs, 0, strpos($pairs, '.')), $filteredDocs);
});
I tried a sorted set with scores as the docId, but couldn't find an intersect or filter option for score values.
I am looking for a Redis-based solution using keys, sets, or a Lua script for time optimization, but have found nothing.
How can I filter Redis sets with concatenated values?
Thanks for the help.
Your code is slow primarily because you're moving a lot of data from Redis to your PHP filter. The general approach should be to perform as much filtering as possible on the server. To do that, you'll need to pay some price in CPU and RAM.
There are many ways to do this, here's one:
Ensure you're using Redis v2.8.9 or above.
To allow efficient lookups by doc alone, keep your doc.pos pairs as they are, but store them in a Sorted Set with all scores set to 0, e.g.:
ZADD type_in_docs.1 0 2805.2339 0 2410.14208 0 3516.1810
This will allow you to mimic SISMEMBER for a doc in the set with:
ZRANGEBYLEX type_in_docs.1 [<docId> (<docId + "\xff">
You can now just SMEMBERS on the (usually) smaller filterDocs set and then call ZRANGEBYLEX on each for immediate gains.
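A sketch of that flow, shown in Python with redis-py purely for illustration (the question uses PHP, but the commands map one-to-one; key names and sample members are taken from the question):

import redis

r = redis.Redis()

# Score-0 sorted set: members sort lexicographically, enabling ZRANGEBYLEX.
r.zadd('type_in_docs.1', {'2805.2339': 0, '2410.14208': 0, '3516.1810': 0})
r.sadd('filteredDocs', '2805', '3516')

filtered = []
for doc_id in r.smembers('filteredDocs'):  # members come back as bytes
    # All members whose doc part equals doc_id: range [doc_id, doc_id\xff)
    filtered.extend(r.zrangebylex(
        'type_in_docs.1', b'[' + doc_id, b'(' + doc_id + b'\xff'))
print(filtered)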
If you want to do better: in extreme cases (i.e. a large filterDocs and a small type_in_docs), you should do the reverse.
If you want to do even better, use Lua to wrap up the filtering logic - something like:
-- @usage: redis-cli --eval filter_doc_pos.lua <filter set keyname> <type pairs keyname>
-- @returns: list of matching doc.pos pairs
local r = {}
for _, fv in pairs(redis.call("SMEMBERS", KEYS[1])) do
    local t = redis.call("ZRANGEBYLEX", KEYS[2], "[" .. fv, "(" .. fv .. "\xff")
    for _, tv in pairs(t) do
        r[#r+1] = tv
    end
end
return r
