How to perform asynchronous model training using TFF framework?
I have reviewed the iterative training process loop, but I am not sure how to tell which clients' models have been received.
It's quite possible to simulate something akin to "asynchronous FL" in TFF. One way to think about this is to conceptually decouple simulation time from wall-clock time.
Sampling a different number of clients each round (rather than the uniform K clients that is commonly used), perhaps with a distribution that weights clients by how long they are expected to train, could simulate asynchronous FL. It's also possible to process only a portion of the selected clients first; the researcher is free to slice up the data and computation as desired.
The Python-esque pseudocode below demonstrates both techniques (different client sampling and delayed gradient application):
state = fed_avg_iter_proc.initialize()
for round_num in range(NUM_ROUNDS):
    # Here we conceptualize a "round" as a block of time, rather than a synchronous
    # round. We have a function that determines which clients will "finish" within
    # our configured block of time. This might even return only a single client.
    participants = get_next_clients(time_window=timedelta(minutes=30))
    num_participants = len(participants)
    # Here we only process the first half, and then update the global model.
    # Integer division (//) is needed because slice indices must be integers.
    state2, metrics = fed_avg_iter_proc.next(state, participants[:num_participants // 2])
    # Now process the second half of the selected clients.
    # Note: this now applies the 'pseudo-gradient' that was computed on clients
    # (the difference between the original `state` and their local training result)
    # to the model that has already taken one step (`state2`). This possibly has
    # undesirable effects on the optimisation process, or may be improved with
    # techniques that handle "stale" gradients.
    state3, metrics = fed_avg_iter_proc.next(state2, participants[num_participants // 2:])
    # Finally update the state for the next iteration of the simulation loop.
    state = state3
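As a rough, self-contained illustration of the two ideas above (time-window client selection and applying stale pseudo-gradients), here is a toy sketch with no TFF dependency. Everything in it is a hypothetical stand-in: `get_next_clients`, the simulated finish times, the fake local training step, and the `1 / (1 + staleness)` discount are assumptions chosen for illustration, not TFF APIs or a recommended algorithm.

```python
import random
from datetime import timedelta


def get_next_clients(finish_times, now, time_window):
    """Return ids of clients whose simulated training ends inside the window."""
    end = now + time_window
    return sorted(cid for cid, t in finish_times.items() if now < t <= end)


def apply_update(model, pseudo_gradient, staleness, server_lr=1.0):
    """Apply a pseudo-gradient, down-weighted by how stale it is (assumed rule)."""
    scale = server_lr / (1.0 + staleness)
    return [w + scale * g for w, g in zip(model, pseudo_gradient)]


rng = random.Random(0)
# Hypothetical clients: each finishes local training after a random delay.
finish_times = {cid: timedelta(minutes=rng.uniform(1, 90)) for cid in range(10)}

model = [0.0, 0.0]
now = timedelta(0)
window = timedelta(minutes=30)
for round_num in range(3):
    participants = get_next_clients(finish_times, now, window)
    round_start = list(model)  # the model every participant started from
    for staleness, cid in enumerate(participants):
        # Stand-in for local training: a small step from the round-start
        # model toward 1.0. It is applied to the *current* model, which may
        # already have absorbed `staleness` earlier updates in this round.
        pseudo_gradient = [0.1 * (1.0 - w) for w in round_start]
        model = apply_update(model, pseudo_gradient, staleness)
    now += window
    print(round_num, len(participants), model)
```

The number of participants varies from window to window, and later arrivals within a window have less influence because their update was computed against an older model.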
I'm reading over the implementation of the dask-lightgbm estimators (specifically, the _train_part function in dask_lightgbm/core.py), and I'm failing to see how the entirety of the training set gets used to fit the final estimator.
The _train_part function accepts the boolean argument return_model, and in the implementation of the train function (which uses client.submit to call _train_part on each worker), return_model is only true when the worker is the "master_worker" (which itself appears to be a randomly chosen Dask worker). Logically, each worker gets dispatched 1/n chunks of the overall model training set - where n = total number of workers - then each worker trains its own independent model on its own subset of the training set. The return_model parameter controls whether each worker's model gets returned by _train_part, so it returns None for all workers - and therefore, models - except for one worker.
Code:
def _train_part(params, model_factory, list_of_parts, worker_addresses, return_model, local_listen_port=12400,
                time_out=120, **kwargs):
    network_params = build_network_params(worker_addresses, get_worker().address, local_listen_port, time_out)
    params.update(network_params)
    # Concatenate many parts into one
    parts = tuple(zip(*list_of_parts))
    data = concat(parts[0])
    label = concat(parts[1])
    weight = concat(parts[2]) if len(parts) == 3 else None
    try:
        model = model_factory(**params)
        model.fit(data, label, sample_weight=weight)
    finally:
        _safe_call(_LIB.LGBM_NetworkFree())
    return model if return_model else None
Is this not equivalent to training a non-distributed version of a lightgbm estimator on a 1/n subsample of the training set? Am I missing something? I feel like I am missing a part where either the workers' independent models get combined into one, or where a single estimator is getting updated with the individual trees learned by separate workers.
Thank you!
The answer is yes: dask_lightgbm uses all available training samples. Dask's responsibility is only to distribute data across workers. LightGBM handles all distributed learning once its network parameters are set. The workers are not training independent models: LightGBM trains a single model, and each worker ends up with a copy of it. For this reason, only the chosen worker returns the fitted estimator, and every other worker returns None.
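To build intuition for how a single model can reflect every worker's data, here is a toy NumPy sketch. It assumes a data-parallel setup in which workers exchange per-bin gradient sums (histograms) rather than raw rows; that assumption, the helper `gradient_histogram`, and the random data are all illustrative and are not taken from the dask_lightgbm or LightGBM source.

```python
import numpy as np


def gradient_histogram(feature_bins, gradients, num_bins):
    """Sum the gradients that fall into each bin of one feature."""
    return np.bincount(feature_bins, weights=gradients, minlength=num_bins)


rng = np.random.default_rng(0)
num_rows, num_bins, num_workers = 1000, 8, 4
feature_bins = rng.integers(0, num_bins, size=num_rows)
gradients = rng.normal(size=num_rows)

# Each hypothetical worker only sees its own 1/n slice of the rows.
row_slices = np.array_split(np.arange(num_rows), num_workers)
local_histograms = [
    gradient_histogram(feature_bins[idx], gradients[idx], num_bins)
    for idx in row_slices
]

# Summing the local histograms gives the same statistics as one machine
# that holds all rows, so a split chosen from them uses the full data.
merged = np.sum(local_histograms, axis=0)
full = gradient_histogram(feature_bins, gradients, num_bins)
print(np.allclose(merged, full))
```

Under that assumption, every worker grows the same tree from the merged statistics, which is consistent with each worker holding an identical copy of the model at the end.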
I want to use tf.metrics.accuracy to track the accuracy of my predictions, but I am unsure of how to use the update_op (acc_update_op below) that the function returns:
accuracy, acc_update_op = tf.metrics.accuracy(labels, predictions)
I was thinking that adding it to tf.GraphKeys.UPDATE_OPS would make sense, but I am not sure how to do this.
tf.metrics.accuracy is one of the many streamed metric TensorFlow operations (another one of which is tf.metrics.recall). Upon creation, two variables (total and count) are created in order to accumulate all incoming results for one final outcome. The first returned value is a tensor for the calculation total / count. The second returned value is a stateful op that updates these variables. Streamed metric functions are useful when evaluating the performance of a classifier over multiple batches of data. A quick example of use:
# building phase
with tf.name_scope("streaming"):
    accuracy, acc_update_op = tf.metrics.accuracy(labels, predictions)
test_fetches = {
    'accuracy': accuracy,
    'acc_op': acc_update_op
}
# when testing the classifier
with tf.name_scope("streaming"):
    # clear counters for a fresh evaluation
    sess.run(tf.local_variables_initializer())
for _i in range(n_batches_in_test):
    fd = get_test_batch()
    outputs = sess.run(test_fetches, feed_dict=fd)
print("Accuracy:", outputs['accuracy'])
Regarding "I was thinking that adding it to tf.GraphKeys.UPDATE_OPS would make sense, but I am not sure how to do this":
That would not be a good idea unless you are only using the UPDATE_OPS collection for testing purposes. Usually, the collection will already contain control operations for the training phase (such as updates to batch normalization moving averages) that are not meant to be run during the validation phase. It may be best to either keep the metric update ops in a new collection or add them to the fetch dictionary manually.
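To make the roles of the two returned values concrete, here is a plain-Python sketch of the same bookkeeping. The class `StreamingAccuracy` is a hypothetical illustration, not a TensorFlow API: `update` plays the role of the update op, `result` plays the role of the accuracy tensor, and `reset` stands in for re-running the local variable initializer.

```python
class StreamingAccuracy:
    """Toy stand-in for the total/count variables behind a streamed metric."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Analogous to clearing the counters before a fresh evaluation.
        self.total = 0.0  # number of correct predictions seen so far
        self.count = 0.0  # number of predictions seen so far

    def update(self, labels, predictions):
        # Analogous to the update op: accumulate one batch, return the value.
        self.total += sum(1 for y, p in zip(labels, predictions) if y == p)
        self.count += len(labels)
        return self.result()

    def result(self):
        # Analogous to the accuracy tensor: reads state without changing it.
        return self.total / self.count if self.count else 0.0


metric = StreamingAccuracy()
batches = [([1, 0], [1, 1]), ([1, 1], [1, 1])]
for labels, predictions in batches:
    metric.update(labels, predictions)
print("Accuracy:", metric.result())  # 3 correct out of 4 -> 0.75
```

Reading `result()` repeatedly never changes the answer; only `update` does, which is why the update op has to be fetched once per batch.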
I am attempting to use reinforcement learning to choose the closest point to the origin out of a given set of points repeatedly, until a complex (and irrelevant) end condition is reached. (This is a simplification of my main problem.)
A 2D array containing possible points is passed to the reinforcement learning algorithm, which makes a choice as to which point it thinks is the most ideal.
A [1, 10]
B [100, 0]
C [30, 30]
D [5, 7]
E [20, 50]
In this case, D would be the true best choice. (The algorithm should ideally output 3, from the range 0 to 4.)
However, whenever I train the algorithm, it seems to not learn what the "concept" is, but instead just that choosing, say, C is usually the best choice, so it should always choose that.
import numpy as np
import rl.core as krl
class FindOriginEnv(krl.Env):
    def observe(self):
        return np.array([
            [np.random.randint(100), np.random.randint(100)] for _ in range(5)
        ])
    def step(self, action):
        observation = self.observe()
        done = np.random.rand() < 0.01  # eventually
        reward = 1 if done else 0
        return observation, reward, done, {}
    # ...
What should I modify about my algorithm such that it will actually learn about the goal it is trying to accomplish?
Observation shape?
Reward function?
Action choices?
Keras code would be appreciated, but is not required; a purely algorithmic explanation would also be extremely helpful.
Sketching out the MDP from your description, there are a few issues:
Your observation function appears to be returning 5 points, so that means a state can be any configuration of 10 integers in [0,99]. That's 100^10 possible states! Your state space needs to be much smaller. As written, observe appears to be generating possible actions, not state observations.
You suggest that you are picking actions from [0,4], where each action is essentially an index into an array of points available to the agent. This definition of the action space doesn't give the agent enough information to learn the distinction you want (a point with smaller magnitude is better), because it acts based only on the point's index! If you wanted to tweak the formulation a bit to make this work, you would define an action to be selecting a 2D point with each dimension in [0,99]. This would mean you would have 100^2 total possible actions, but to maintain the multiple-choice aspect, you would restrict the agent to selecting amongst a subset at a given step (5 possible actions) based on its current state.
Finally, the reward function that gives zero reward until termination means that you're allowing a large number of possible optimal policies. Essentially, any policy that terminates, regardless of how long the episode took, is optimal! If you want to encourage policies that terminate quickly, you should penalize the agent with a small negative reward at each step.
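A minimal sketch of the reformulation described above, under stated assumptions: each action is scored from the point's coordinates rather than its index, and every non-terminal step carries a small penalty. `score_point` is a hand-written stand-in for whatever value function the agent would learn, and the `step_penalty` and `terminal_reward` values are arbitrary illustrative choices.

```python
import numpy as np


def score_point(point):
    # Hypothetical stand-in for a learned value of choosing this point.
    # It depends on the coordinates, so it generalizes across positions.
    return -float(np.dot(point, point))


def choose_action(candidates):
    """Pick, among the currently available points, the best-scoring one."""
    scores = [score_point(p) for p in candidates]
    return int(np.argmax(scores))


def step_reward(done, step_penalty=-0.01, terminal_reward=1.0):
    """Small negative reward per step so that shorter episodes score higher."""
    return terminal_reward if done else step_penalty


# The five points from the question, in the order A, B, C, D, E.
candidates = np.array([[1, 10], [100, 0], [30, 30], [5, 7], [20, 50]])
print(choose_action(candidates))  # 3, i.e. point D
```

Because the same scoring function is applied to every candidate, shuffling the order of the points changes only the returned index, not which point is picked.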
I am using TensorFlow input pipelines like the ones in the TensorFlow cifar10 model, and I tried to use tf.cond to switch to validation data. I wrote something like this:
train_data = model.input(istrain=True)
val_data = model.input(istrain=False)
# This selects which stream to use.
select_val = tf.placeholder(dtype=bool, shape=[], name='select_test')
data = tf.cond(
    select_val,
    lambda: val_data,
    lambda: train_data
)
# Here is the model.
loss = ...
train_op = ...
...
with tf.Session():
    ...
If I delete the cond and use only the training data, the speed is 4000 samples/s; with the code above, the speed decreases to 2300 samples/s. The validation pipeline capacity is set very small so that it won't take too much GPU memory, and validation is run very infrequently.
I'm not sure what is going wrong. Any help would be appreciated.
tf.cond is not fully lazy. Any operations that are required by either of the branches of the cond will be run even if the branch that requires them is not the branch being executed. So in your case, both model.input(istrain=True) and model.input(istrain=False) are executed every time your data op is evaluated. The result of one of them is simply discarded.
The documentation for cond gives a minimal code example:
Note that the conditional execution applies only to the operations
defined in fn1 and fn2. Consider the following simple program:
z = tf.multiply(a, b)
result = tf.cond(x < y, lambda: tf.add(x, z), lambda: tf.square(y))
If x < y, the tf.add operation will be executed and tf.square
operation will not be executed. Since z is needed for at least one
branch of the cond, the tf.multiply operation is always executed,
unconditionally. Although this behavior is consistent with the
dataflow model of TensorFlow, it has occasionally surprised some users
who expected a lazier semantics.
Also note that if your model.input pulls some set of data from a larger pool (say, a batch from an entire dataset), then each time the cond is run, data gets pulled from both the validation and the training pipelines, and one set is thrown away. In some cases this causes problems more serious than inefficiency. For example, if you intend to process only a certain number of epochs, this code does not actually train on that many epochs, because some of the data that was pulled was never used.
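The difference between producing both inputs outside the branches and producing them inside the branch functions can be shown with a plain-Python analogy. This is not TensorFlow code: `BatchSource` and `cond` are hypothetical stand-ins. Going by the documentation passage quoted above, the corresponding change in TensorFlow would be to create the input ops inside the functions passed to tf.cond; whether that fits a particular input pipeline is something to verify rather than assume.

```python
class BatchSource:
    """Hypothetical stand-in for an input pipeline; counts batches pulled."""

    def __init__(self, name):
        self.name = name
        self.pulled = 0

    def next_batch(self):
        self.pulled += 1
        return f"{self.name}-batch-{self.pulled}"


def cond(pred, true_fn, false_fn):
    # Only the selected branch function is called.
    return true_fn() if pred else false_fn()


# Pattern from the question: both batches are produced before the cond,
# so the validation source is drained even though it is never selected.
train, val = BatchSource("train"), BatchSource("val")
for _ in range(3):
    train_data = train.next_batch()
    val_data = val.next_batch()
    data = cond(False, lambda: val_data, lambda: train_data)
outside_counts = (train.pulled, val.pulled)

# Alternative: each batch is produced inside its own branch function,
# so only the selected source is touched.
train2, val2 = BatchSource("train"), BatchSource("val")
for _ in range(3):
    data = cond(False, lambda: val2.next_batch(), lambda: train2.next_batch())
inside_counts = (train2.pulled, val2.pulled)

print(outside_counts, inside_counts)  # (3, 3) (3, 0)
```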
I'm implementing policy gradient and trying to work out the best objective function for the task. The task is the OpenAI CartPole-v0 environment, in which the agent receives a reward of 1 for each timestep it survives and a reward of 0 upon termination. I've come up with 3 possible functions:
def total_reward_objective_function(self, episode_data):
    # Undiscounted sum of all rewards in the episode.
    return sum(timestep_data['reward'] for timestep_data in episode_data)
def average_reward_objective_function(self, episode_data):
    # Mean reward per timestep.
    return self.total_reward_objective_function(episode_data) / len(episode_data)
def sum_of_discounted_rewards_objective_function(self, episode_data, discount_rate=0.7):
    # Reward at timestep t is weighted by discount_rate ** t.
    return sum(timestep_data['reward'] * pow(discount_rate, timestep)
               for timestep, timestep_data in enumerate(episode_data))
Note that the average reward objective function will always return 1 unless I intervene and modify the reward function to return a negative value upon termination. The reason I'm asking rather than just running a few experiments is that there are errors elsewhere. If someone could point me towards good practice in this area, I could focus on the more significant mistakes in the algorithm.
You should use the last one (sum of discounted rewards), since the cart-pole problem is an infinite horizon MDP (you want to balance the pole as long as you can). The answer to this question explains why you should use a discount factor in infinite horizon MDPs.
The first one, instead, is just an undiscounted sum of the rewards, which could be used if episodes have a fixed length (for instance, in the case of a robot performing a 10-second trajectory). The second one is usually used in finite-horizon MDPs, but I am not very familiar with it.
For the cart-pole, a discount factor of 0.9 should work (or, depending on the algorithm used, you can search for scientific papers and see the discount factor used).
A final note: the reward function you described (+1 at each timestep) is not the only one used in the literature. A common one (and I think also the "original" one) gives 0 at each timestep and -1 if the pole falls. Other reward functions are related to the angle between the pole and the cart.
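As a small illustration of the recommended objective, the sketch below computes the discounted return of an episode and, since policy-gradient updates often weight each action by the return that follows it, the per-timestep discounted reward-to-go. The function names and the 0.9 default are illustrative choices, not part of any library.

```python
def discounted_return(rewards, discount_rate=0.9):
    """Sum of rewards[t] * discount_rate ** t, accumulated back to front."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount_rate * total
    return total


def rewards_to_go(rewards, discount_rate=0.9):
    """Discounted return from each timestep to the end of the episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + discount_rate * running
        returns.append(running)
    return returns[::-1]


# A CartPole-style episode: +1 for each of three surviving timesteps.
episode_rewards = [1.0, 1.0, 1.0]
print(discounted_return(episode_rewards))
print(rewards_to_go(episode_rewards))

# With +1 at every step the average reward is 1 regardless of episode
# length, so it cannot distinguish long episodes from short ones.
print(sum(episode_rewards) / len(episode_rewards))
```

With a discount rate below 1, longer episodes still yield a larger return, but the value stays bounded however long the pole is balanced.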