Docker TensorFlow-Serving predictions too large

I'm trying to serve my model using Docker + tensorflow-serving. However, due to restrictions with serving a model that uses an iterator (via make_initializable_iterator()), I had to split up my model.
I'm using gRPC to interface with my model in Docker. The problem is that my predicted tensor is about 10 MB, and about 4.1 MB serialized. The error I'm getting is:
"grpc_message":"Received message larger than max (9830491 vs. 4194304)"
Is there a way to write out my predictions to disk instead of transmitting them in the gRPC response? The output is a 32-channel tensor, so I'm unable to encode it as a PNG before saving it to disk with tf.io.write_file.
Thanks!

The default maximum message length in gRPC is 4 MB, but you can extend it in your gRPC client and server in Python as shown below. You will then be able to send and receive large messages without streaming:
import grpc
MAX_MESSAGE_LENGTH = 100 * 1024 * 1024  # pick a limit larger than your largest message
channel = grpc.insecure_channel('localhost:6060',
    options=[('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
             ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)])
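Applied to a TensorFlow Serving prediction client, a minimal sketch might look like the following (the port 8500, the model name 'my_model', the input key 'input', and the input data are placeholders for your own setup):
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

MAX_MESSAGE_LENGTH = 100 * 1024 * 1024  # 100 MB
channel = grpc.insecure_channel('localhost:8500',  # adjust to your container's gRPC port
    options=[('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
             ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)])
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'         # placeholder model name
input_array = [[1.0, 2.0, 3.0]]              # placeholder input data
request.inputs['input'].CopyFrom(tf.make_tensor_proto(input_array))  # placeholder input key
response = stub.Predict(request, 30.0)       # 30 s timeout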
In Go there are equivalent functions; see:
https://godoc.org/google.golang.org/grpc#MaxMsgSize https://godoc.org/google.golang.org/grpc#WithMaxMsgSize

The code to set the maximum message size to unlimited in a gRPC client request using C++ is shown below:
grpc::ChannelArguments ch_args;
ch_args.SetMaxReceiveMessageSize(-1);
std::shared_ptr<grpc::Channel> ch = grpc::CreateCustomChannel("localhost:6060", grpc::InsecureChannelCredentials(), ch_args);


Does Artifactory support partial file upload?

I'm working on a Python 3 script to download data from a server in chunks (~1 MB each, but configurable externally).
Each downloaded block must be uploaded to a JFrog Artifactory (JA) server (version 5.4.6, revision 50406900).
I'm using the HTTP header 'Content-Range' to send the data blocks, but JA replaces the old data and keeps only the last block.
The test file is 1164 bytes and the headers were sent correctly, with test blocks of 512 bytes (test only, no need for a big file):
- BLK#1: bytes 0-511/1164
- BLK#2: bytes 512-1023/1164
- BLK#3: bytes 1024-1163/1164
NOTE: Each PUT on JA was answered with an HTTP RC 201 (Created).
The syntax looks all right (https://developer.mozilla.org/pt-BR/docs/Web/HTTP/Headers/Content-Range).
The first two blocks were 512 bytes long and the last one 140 bytes, adding up to the 1164 bytes of the file.
I've been digging through the official documentation, but haven't been able to find an answer.
Is JFrog Artifactory able to receive partial uploads?
If so, how can I accomplish it?
I figured out a way to work around JFrog Artifactory's failure on partial uploads of big files.
I created a subclass of io.RawIOBase, a Python 3 base class. After a few tests I found the set of basic methods I had to implement:
class MemoryFile(io.RawIOBase):
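    # NOTE: the original answer does not show the class body; what follows is a
    # hedged sketch of what such a wrapper might look like. It assumes
    # `import io, hashlib` and wraps a streaming requests.Response so that
    # requests can read it like a file while a SHA-256 hash is accumulated.
    def __init__(self, response):
        super().__init__()
        self._chunks = response.iter_content(chunk_size=65536)
        self._buffer = b''
        self._sha256 = hashlib.sha256()

    def readable(self):
        return True

    def readinto(self, b):
        # Refill the internal buffer from the HTTP stream, then copy into b.
        while not self._buffer:
            try:
                self._buffer = next(self._chunks)
            except StopIteration:
                return 0  # EOF
        n = min(len(b), len(self._buffer))
        b[:n] = self._buffer[:n]
        self._sha256.update(self._buffer[:n])
        self._buffer = self._buffer[n:]
        return n

    def hash(self):
        return self._sha256.hexdigest()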
Here is a basic usage sample:
import requests

# Each side must have its own session to avoid issues
downReq = requests.Session()
# The download request with all security tokens and required params (SSL check disabled for testing)
response = downReq.get(urlTargetFile, stream=True, verify=False)
# My custom class takes a Response as its constructor argument
fpIn = MemoryFile(response)
# Another session to upload the data
upReq = requests.Session()
# Call the JFrog Artifactory address
upResp = upReq.put(urlUp, data=fpIn, headers=headers, verify=False, stream=True)
# The SHA-256 hash to save into the audit log
print("SHA-256: ", fpIn.hash())

TFF: Remote Executor

We are setting up a federated scenario with Server and Client on different physical machines.
On the server, we have used the docker container to kickstart:
The above has been borrowed from the Kubernetes tutorial. We believe this creates a 'local executor' [Ref 1] which helps create a gRPC server [Ref 2].
Ref 1:
Ref 2:
Next, on client 1, we are calling tff.framework.RemoteExecutor, which connects to the gRPC server.
Our understanding based on the above is that the Remote Executor runs on the client which connects to the gRPC server.
Assuming the above is correct, how can we send a tff.tf_computation from the server to the client and print the output on the client side, to ensure the whole setup works well?
Your understanding is definitely correct.
If you construct an ExecutorFactory directly, as seems to be the case in the code above, passing it to tff.framework.set_default_context will install your remote stack as the default mechanism for executing computations in the TFF runtime. You should additionally be able to pass the appropriate channels to tff.backends.native.set_remote_execution_context to handle the remote executor construction and context installation if desired, but the way you are doing it certainly works, and allows for greater customization.
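For reference, a minimal sketch of the latter route (assuming a TFF version that exposes tff.backends.native.set_remote_execution_context, and with 'worker-host:8000' as a placeholder for wherever the remote worker's gRPC server listens):
import grpc
import tensorflow_federated as tff

# One channel per remote worker started by the Docker container above.
channels = [grpc.insecure_channel('worker-host:8000')]
tff.backends.native.set_remote_execution_context(channels)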
Once you have set this up, running an example end-to-end should be fairly simple. We will set up a computation which takes a set of federated integers, prints on the clients, and sums the integers up. Let:
@tff.tf_computation(tf.int32)
def print_and_return(x):
  # We must use tf.print here, as this logic will be
  # serialized and run on the clients as TensorFlow.
  tf.print('hello world')
  return x

@tff.federated_computation(tff.FederatedType(tf.int32, tff.CLIENTS))
def print_and_sum(federated_arg):
  same_ints = tff.federated_map(print_and_return, federated_arg)
  return tff.federated_sum(same_ints)
Suppose we have N clients; we simply instantiate the set of federated integers, and invoke our computation.
federated_ints = [1] * N
total = print_and_sum(federated_ints)
assert total == N
This should cause the tf.prints defined above to run on the remote machine; as long as tf.print is directed to an output stream which you can monitor, you should be able to see it.
PS: you may note that the federated sum above is unnecessary; it certainly is. The same effect can be had by simply mapping the identity function with the serialized print.
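For instance, a sketch reusing print_and_return from above, with no aggregation back to the server:
@tff.federated_computation(tff.FederatedType(tf.int32, tff.CLIENTS))
def just_print(federated_arg):
  # Maps the identity-with-print over the clients; nothing is summed.
  return tff.federated_map(print_and_return, federated_arg)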

ROS - How do I publish a message and get the subscribed callback immediately

I have a ROS node that allows you to "publish" a data structure to it, to which it responds by publishing an output. The timestamp of what I published and what it publishes is matched.
Is there a mechanism for a blocking call where I send/publish a message and it waits until I receive the output?
I think you need the ROS Services (client/server) pattern instead of publisher/subscriber.
Here is a simple example to do that in Python:
Client code snippet:
import rospy
from test_service.srv import MySrvFile

rospy.wait_for_service('a_topic')
try:
    send_hi = rospy.ServiceProxy('a_topic', MySrvFile)
    print('Client: Hi, do you hear me?')
    resp = send_hi('Hi, do you hear me?')
    print("Server: {}".format(resp.response))
except rospy.ServiceException as e:
    print("Service call failed: %s" % e)
Server code snippet:
import rospy
from test_service.srv import MySrvFile, MySrvFileResponse

def callback_function(req):
    print(req)
    return MySrvFileResponse('Hello client, your message received.')

rospy.init_node('server')
rospy.Service('a_topic', MySrvFile, callback_function)
rospy.spin()
MySrvFile.srv
string request
---
string response
Server out:
request: "Hi, do you hear me?"
Client out:
Client: Hi, do you hear me?
Server: Hello client, your message received.
Learn more in ros-wiki
Project repo on GitHub.
[UPDATE]
If you are looking for fast communication, TCPROS is not what you want, because it is slower than a brokerless messaging library like ZeroMQ (which has low latency and high throughput):
ROS-Service pattern equivalent in ZeroMQ is REQ/REP (client/server)
ROS publisher/subscriber pattern equivalent in ZeroMQ is PUB/SUB
ROS publisher/subscriber with wait_for_message equivalent in ZeroMQ is PUSH/PULL
ZeroMQ is available in both Python and C++
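For reference, a minimal REQ/REP sketch with pyzmq (port 5555 and the message contents are arbitrary; the two halves run in separate processes):
import zmq

# --- server process (REP) ---
ctx = zmq.Context()
rep = ctx.socket(zmq.REP)
rep.bind("tcp://*:5555")
msg = rep.recv_string()              # blocks until a request arrives
rep.send_string("received: " + msg)

# --- client process (REQ) ---
ctx = zmq.Context()
req = ctx.socket(zmq.REQ)
req.connect("tcp://localhost:5555")
req.send_string("Hi, do you hear me?")
reply = req.recv_string()            # blocks until the reply arrives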
Also, to transfer huge amounts of data (e.g. point clouds), there is a mechanism in ROS called nodelet, which is supported only in C++. This communication is based on shared memory on a machine instead of a TCPROS socket.
What exactly is a nodelet?
Since you want to stick with publishers/subscribers (assuming from your comment that services are too slow), I would have a look at waitForMessage (documentation).
For an example of how to use it, you can have a look at this ROS Answers question.
All you need to do is publish your data, immediately call waitForMessage on the output topic, and manually pass the received message to your "callback", as sketched below.
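A minimal sketch of that pattern in rospy (the topic names, the std_msgs/String type, and my_callback are placeholders for your own):
import rospy
from std_msgs.msg import String  # placeholder message type

rospy.init_node('blocking_requester')
pub = rospy.Publisher('input_topic', String, queue_size=1)
rospy.sleep(1.0)  # give the publisher time to connect

pub.publish(String(data='my request'))
# Block until the other node publishes its matching output
result = rospy.wait_for_message('output_topic', String, timeout=5.0)
my_callback(result)  # hand the message to your existing "callback" yourself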
I hope this is what you were looking for.
To get this request/reply behaviour ROS has a mechanism called ROS service.
You can specify the input and output of your service in a service file similar to a ROS message definition. You can then call the service of a node with your input and the call will receive an output when the service is finished.
Here is a tutorial on how to use this mechanism in Python. If you prefer C++, there is one for that as well.

How to select the preferred file transfer method?

I think I have a problem with my Prosody configuration. When I send files (for example photos) larger than ~2 or 3 megabytes (as I established experimentally) using Conversations 2.* (an Android IM app), it transfers these files over a peer-to-peer connection instead of uploading them to the server and sending a link to my interlocutor. Small files transfer well using HTTP upload, and I couldn't find a reason for this behavior.
Here are some lines for the http_upload module from my config, which I took from the official documentation (where I couldn't find a setting for turning off peer-to-peer file transfer):
http_upload_file_size_limit = 536870912 -- 512 MB in bytes
http_upload_expire_after = 604800 -- 60 * 60 * 24 * 7
http_upload_quota = 10737418240 -- 10 GB
http_upload_path = "/var/lib/prosody"
And this is my full config: https://pastebin.com/V6DNYrhe
Small files are transferred well using http upload. And I couldn't
find a reason for such behavior.
TL;DR: You put options in the wrong place. The default 1MB limit
applies. This is advertised to clients so they know about it and can use
more efficient p2p transfer methods for very large files.
http_upload_path = "/var/lib/prosody"
This line makes Prosody's data directory public, allowing anyone easy
access to all user data. You really don't want to do that. You are
lucky you did not put that in the correct section.
And this is my full config: https://pastebin.com/V6DNYrhe
"http_upload" is in the global modules_enabled list which will load
it onto all VirtualHost(s).
You have added options to the end of the config file, putting them under
a Component section. That makes those options only apply to that
Component.
Thus, the VirtualHost where mod_http_upload is loaded sees no options
set and will use the defaults.
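For example, a hedged sketch of the fix is to move the http_upload options into the global section at the top of the config (before any VirtualHost or Component), keeping the size limit within the HTTP safety limit mentioned below:
-- Global section, before any VirtualHost/Component (sketch; adjust limits to taste)
http_upload_file_size_limit = 10485760 -- 10 MB
http_upload_expire_after = 604800 -- 60 * 60 * 24 * 7
http_upload_quota = 10737418240 -- 10 GB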
http_upload_file_size_limit = 536870912 -- 512 MB in bytes
Don't do this. Prosody's built-in HTTP server is not optimized for very
large uploads. There is a safety limit on HTTP request size that will
cap HTTP upload size limit to 10M to prevent DoS attacks.
While that limit can be changed, I would strongly suggest you look at
https://modules.prosody.im/mod_http_upload_external.html instead.

zlib inflate giving data_error in Erlang

I have a Java client which sends a message to an Erlang server process listening on TCP. The Java client sends the data using an OutputStream. On the server side I am using the following call to decompress the data after initialising zlib:
zlib:inflate(ZStream, Data),
where Data is a binary. I am getting data_error on this call.
Under what conditions do I get data_error with zlib?
Try setting a 0 or -15 WindowBits. It would also help if you pasted more code, such as the zlib:inflateInit call, a binary dump of the Data variable, and the Java-side zlib initialisation.
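A hedged sketch of the raw-deflate case (assuming the Java side created its Deflater with nowrap=true, i.e. no zlib header/trailer):
Z = zlib:open(),
ok = zlib:inflateInit(Z, -15),   %% -15 = raw deflate; use the default init for streams with a zlib header
Decompressed = zlib:inflate(Z, Data),
ok = zlib:inflateEnd(Z),
zlib:close(Z).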
If you are streaming the data in relatively small chunks, you can use my ezlib on GitHub.
Performance-wise it's around 69% faster than the Erlang driver, and it also works better when you have concurrent sessions.
To integrate it, use rebar as you would for any other Erlang app. To run a small example:
StringBin = <<"this is a string compressed with zlib nif library">>,
{ok, DeflateRef} = ezlib:new(?Z_DEFLATE),
{ok, InflateRef} = ezlib:new(?Z_INFLATE),
CompressedBin = ezlib:process(DeflateRef, StringBin),
DecompressedBin = ezlib:process(InflateRef, CompressedBin).
Do not use it to compress large blocks, because you can block the Erlang scheduler. I will change this in subsequent versions.
