TFF: Remote Executor - tensorflow-federated

We are setting up a federated scenario with Server and Client on different physical machines.
On the server, we have used the docker container to kickstart:
The above has been borrowed from Kubernetes tutorial. We believe this creates a 'local executor' [Ref 1] which helps create a gRPC server [Ref 2].
Ref 1:
Ref 2:
Next on the client 1, we are calling tff.framework.RemoteExecutor that connects to the gRPC server.
Our understanding based on the above is that the Remote Executor runs on the client which connects to the gRPC server.
Assuming the above is correct, how can we send a
tff.tf_computation
from the server to the client and print the output on the client side to ensure the whole setup works well.

Your understanding is definitely correct.
If you construct an ExecutorFactory directly, as seems to be the case in the code above, passing it to tff.framework.set_default_context will install your remote stack as the default mechanism for executing computations in the TFF runtime. You should additionally be able to pass the appropriate channels to tff.backends.native.set_remote_execution_context to handle the remote executor construction and context installation if desired, but the way you are doing it certainly works, and allows for greater customization.
Once you have set this up, running an example end-to-end should be fairly simple. We will set up a computation which takes a set of federated integers, prints on the clients, and sums the integers up. Let:
#tff.tf_computation(tf.int32)
def print_and_return(x):
# We must use tf.print here, as this logic will be
# serialized and run on the clients as TensorFlow.
tf.print('hello world')
return x
#tff.federated_computation(tff.FederatedType(tf.int32, tff.CLIENTS))
def print_and_sum(federated_arg):
same_ints = tff.federated_map(print_and_return, federated_arg)
return tff.federated_sum(same_ints)
Suppose we have N clients; we simply instantiate the set of federated integers, and invoke our computation.
federated_ints = [1] * N
total = print_and_sum(federated_ints)
assert total == N
This should cause the tf.prints defined above to run on the remote machine; as long as tf.print is directed to an output stream which you can monitor, you should be able to see it.
PS: you may note that the federated sum above is unnecessary; it certainly is. The same effect can be had by simply mapping the identity function with the serialized print.

Related

DiagramBuilder: Cannot operate on ports of System plant until it has been registered using AddSystem

I have an issue working with DiagramBuilder and ManipulationStation classes.
It appears to me, that c++ API and the python bindings work differently in my case.
C++ API behaves as expected, while the python bindings result in the runtime error:
DiagramBuilder: Cannot operate on ports of System plant until it has been registered using AddSystem
How I use C++ API
In one of the ManipulationStation::Setup...() methods I inject a block of code, that adds an extra manipuland
const std::string sdf_path = FindResourceOrThrow("drake/examples/manipulation_station/models/bolt_n_nut.sdf");
RigidTransform<double> X_WC(RotationMatrix<double>::Identity(), Vector3d(0.0, -0.3, 0.1));
bolt_n_nut_ = internal::AddAndWeldModelFrom(sdf_path, "nut_and_bolt", lant_->world_frame(), "bolt", X_WC, plant_);
I inject another block of code into the method ManipulationStation::Finalize:
auto zero_torque = builder.template AddSystem<systems::ConstantVectorSource<double>>(Eigen::VectorXd::Zero(plant_->num_velocities(bolt_n_nut_)));
builder.Connect(zero_torque->get_output_port(), plant_->get_actuation_input_port(bolt_n_nut_));
With these changes, the simulation runs as expected.
How I use python bindings
plant = station.get_multibody_plant()
manipuland_path = get_manipuland_resource_path()
bolt_with_nut = Parser(plant=plant).AddModelFromFile(manipuland_path)
X_WC = RigidTransform(RotationMatrix.Identity(), [0.0, -0.3, 0.1])
plant.WeldFrames(plant.world_frame(), plant.GetFrameByName('bolt', bolt_with_nut), X_WC)
...
station.Finalize()
zero_torque = builder.AddSystem(ConstantValueSource(AbstractValue.Make([0.])))
builder.Connect(zero_torque.get_output_port(), plant.get_actuation_input_port(bolt_with_nut_model))
This triggers a RuntimeError with a message as above; The port, which causes this error is nut_and_bolt_actuation.
My vague understanding of the problem is the (in) visibility of nut_and_bolt System, due to having two distinct DiagramBuilders in a process: 1) a one is inside ManipulationStation 2) another is in the python code, that instantiates this ManipulationStation object.
Using ManipulationStation via python bindings is a preference for me, because that way I would've avoided depending on a custom build of drake library.
Thanks for your insight!
I agree with your assessment: you have two different DiagramBuilder objects here. This does not have anything to due with C++ or Python; the ManipulationStation is itself a Diagram (created using its own DiagramBuilder), and you have a second DiagramBuilder (in either c++ or python) that is connecting the ManipulationStation together with other elements. You are trying to connect a system that is in the external diagram to a port that is in the internal diagram, but is not exposed.
The solution would be to have the ManipulationStation diagram expose the extra nut and bolt actuation port so that you can connect to it from the second builder.
If you prefer Python, I've switched my course to using a completely python version of the manipulation station. I find this version is much easier to adapt to different student projects. (To be clear, the setup is in python, but at simulation time all of the elements are c++ and it doesn't call back to python; so the performance is almost identical.)

Is any way to start go_binary before java_test?

Our project has a few GRPC servers defined as go_binary targets. We develop client SDKs for Java and Python applications and we would like to use java_test and py_test. Is any way to start a specific go_binary target before java_test or py_test?
You can create a test harness that starts the gRPC server before running the tests. For example, you could add the binary to the data attribute of the test, and then started it beforehand:
go_binary(
name = "my_grpc_server",
[...]
)
py_test(
name = "my_test",
[...]
data = [":my_grpc_server"],
)
and then inside the test file:
class ClientTestCase(unittest.TestCase):
def setUp(self):
r = runfiles.Create()
self.server = subprocess.Popen([r.Rlocation("path/to/my_grpc_server")])
def tearDown(self):
self.server.terminate()
self.server.wait()
This example is very simple, you'll probably run into issues regarding the availability of the port the server listens on, or waiting for the server to start up. You could add flags to your gRPC server to allow communication over a domain socket, or make it listen on an unused port and have the test parse the port number from the server's log output.
For details on finding the server with runfiles: https://github.com/bazelbuild/bazel/blob/a7a0d48fbeb059ee60e77580e5d05baeefdd5699/tools/python/runfiles/runfiles.py#L16-L58
If you find yourself copy-pasting this pattern a lot, or having to implement it in multiple languages, you could try using an sh_test() rule to wrap the underlying py_test or java_test, and to start the server, then start the test with an environment variable telling it how to reach the server (eg MY_GRPC_SERVER_ADDRESS=localhost:${test_port}.

Override dask scheduler to concurrently load data on multiple workers

I want to run graphs/futures on my distributed cluster which all have a 'load data' root task and then a bunch of training tasks that run on that data. A simplified version would look like this:
from dask.distributed import Client
client = Client(scheduler_ip)
load_data_future = client.submit(load_data_func, 'path/to/data/')
train_task_futures = [client.submit(train_func, load_data_future, params)
for params in train_param_set]
Running this as above the scheduler gets one worker to read the file, then it spills that data to disk to share it with the other workers. However, loading the data is usually reading from a large HDF5 file, which can be done concurrently, so I was wondering if there was a way to force all workers to read this file concurrently (they all compute the root task) instead of having them wait for one worker to finish then slowly transferring the data from that worker.
I know there is the client.run() method which I can use to get all workers to read the file concurrently, but how would you then get the data you've read to feed into the downstream tasks?
I cannot use the dask data primitives to concurrently read HDF5 files because I need things like multi-indexes and grouping on multiple columns.
Revisited this question and found a relatively simple solution, though it uses internal API methods and involves a blocking call to client.run(). Using the same variables as in the question:
from distributed import get_worker
client_id = client.id
def load_dataset():
worker = get_worker()
data = {'load_dataset-0': load_data_func('path/to/data')}
info = worker.update_data(data=data, report=False)
worker.scheduler.update_data(who_has={key: [worker.address] for key in data},
nbytes=info['nbytes'], client=client_id)
client.run(load_dataset)
Now if you run client.has_what() you should see that each worker holds the key load_dataset-0. To use this in downstream computations you can simply create a future for the key:
from distributed import Future
load_data_future = Future('load_dataset-0', client=client)
and this can be used with client.compute() or dask.delayed as usual. Indeed the final line from the example in the question would work fine:
train_task_futures = [client.submit(train_func, load_data_future, params)
for params in train_param_set]
Bear in mind that it uses internal API methods Worker.update_data and Scheduler.update_data and works fine as of distributed.__version__ == 1.21.6 but could be subject to change in future releases.
As of today (distributed.__version__ == 1.20.2) what you ask for is not possible. The closest thing would be to compute once and then replicate the data explicitly
future = client.submit(load, path)
wait(future)
client.replicate(future)
You may want to raise this as a feature request at https://github.com/dask/distributed/issues/new

why redis cannot execute non deterministic commands in lua script

SPOP is not allowed to be executed in lua. And if you do some non-deterministic commands firstly, then you is not allowed to execute write commands. This seems confusing to me. So why redis has such limitation?
It's explained fairly well in the Redis docs here.
The scripts are replicated to slaves by sending the script over and running it on the slave, so the script needs to always produce the same results every time it's run or the data on the slave will diverge from the data on the master.
You could try the new 'scripts effects replication' in the same link if you need perform non deterministic operations in a script.
you need to run this replicate commands function before any data changing command
redis.replicate_commands()
it's explained here
On a single Redis instance I cannot think about any negativ effect.
But say you're running some master-slave setup. the result of those lua scripts which calls e.g. TIME wouldn't be equal on the master.
you can act like what proposed at the last of error description
Additional information: ERR Error running script (call to
f_082d853d36ea323f47b6b00d36b7db66dac0bebd): #user_script:10:
#user_script: 10: Write commands not allowed after non deterministic
commands. Call redis.replicate_commands() at the start of your script
in order to switch to single commands replication mode.
Call redis.replicate_commands() at the start of your script in order to switch to single commands replication mode.
note that because of redis.call('time') call is non deterministic.
local function ticks(key,ticks)
redis.replicate_commands()
local t = ticks
local time = redis.call('time')
local last=redis.call('get',key..'_last')
if last==t then
local inc=redis.call('incr',key..'cnt')
t = t+inc
else
redis.call('set',key..'cnt',0)
end
redis.call('set',key..'_last',t)
return t
end
return ticks(#key,#ticks)

Using Neo4j with React JS

Can we use graph database neo4j with react js? If not so is there any alternate option for including graph database in react JS?
Easily, all you need is neo4j-driver: https://www.npmjs.com/package/neo4j-driver
Here is the most simplistic usage:
neo4j.js
//import { v1 as neo4j } from 'neo4j-driver'
const neo4j = require('neo4j-driver').v1
const driver = neo4j.driver('bolt://localhost', neo4j.auth.basic('username', 'password'))
const session = driver.session()
session
.run(`
MATCH (n:Node)
RETURN n AS someName
`)
.then((results) => {
results.records.forEach((record) => console.log(record.get('someName')))
session.close()
driver.close()
})
It is best practice to close the session always after you get the data. It is inexpensive and lightweight.
It is best practice to only close the driver session once your program is done (like Mongo DB). You will see extreme errors if you close the driver at a bad time, which is incredibly important to note if you are beginner. You will see errors like 'connection to server closed', etc. In async code, for example, if you run a query and close the driver before the results are parsed, you will have a bad time.
You can see in my example that I close the driver after, but only to illustrate proper cleanup. If you run this code in a standalone JS file to test, you will see node.js hangs after the query and you need to press CTRL + C to exit. Adding driver.close() fixes that. Normally, the driver is not closed until the program exits/crashes, which is never in a Backend API, and not until the user logs out in the Frontend.
Knowing this now, you are off to a great start.
Remember, session.close() immediately every time, and be careful with the driver.close().
You could put this code in a React component or action creator easily and render the data.
You will find it no different than hooking up and working with Axios.
You can run statements in a transaction also, which is beneficial for writelocking affected nodes. You should research that thoroughly first, but transaction flow is like this:
const session = driver.session()
const tx = session.beginTransaction()
tx
.run(query)
.then(// same as normal)
.catch(// errors)
// the difference is you can chain multiple transactions:
const tx1 = await tx.run().then()
// use results
const tx2 = await tx.run().then()
// then, once you are ready to commit the changes:
if (results.good !== true) {
tx.rollback()
session.close()
throw error
}
await tx.commit()
session.close()
const finalResults = { tx1, tx2 }
return finalResults
// in my experience, you have to await tx.commit
// in async/await syntax conditions, otherwise it may not commit properly
// that operation is not instant
tl;dr;
Yes, you can!
You are mixing two different technologies together. Neo4j is graph database and React.js is framework for front-end.
You can connect to Neo4j from JavaScript - http://neo4j.com/developer/javascript/
Interesting topic. I am using the driver in a React App and recently experienced some issues. I am closing the session every time a lifecycle hook completes like in your example. When there where more intensive queries I would see a timeout error. Going back to my setup decided to experiment by closing the driver in some more expensive queries and it looks like (still need more testing) the crashes are gone.
If you are deploying a real-world application I would urge you to think about Authentication and Authorization when using a DB-React setup only as you would have to store username/password of the neo4j server in the client. I am looking into options of having the Neo4J server issuing a token and receiving it for Authorization but the best practice is for sure to have a Node.js server in the middle with something like Passport to handle Authentication.
So, all in all, maybe the best scenario is to only use the driver in Node and have the browser always communicating with the Node server using axios...

Resources