How to cache based on size in Varnish? - influxdb

I've been trying to cache based on the response size in Varnish.
Other answers suggested using Content-Length to decide whether or not to cache, but I'm using InfluxDB (Varnish reverse-proxies to it), and it responds with Transfer-Encoding: chunked, which omits the Content-Length header, so I'm not able to figure out the size of the response.
Is there any way I could access the response body size and make a decision in vcl_backend_response?

Cache miss: chunked transfer encoding
When Varnish processes incoming chunks from the origin, it has no idea ahead of time how much data will be received. Varnish streams the data through to the client and stores it byte by byte.
Once the 0\r\n\r\n that marks the end of the stream is received, Varnish finalizes the object storage and calculates the total number of bytes.
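For reference, here is a simplified illustration (not actual InfluxDB output) of how a chunked response is framed on the wire: each chunk is preceded by its size in hexadecimal, and a zero-length chunk terminates the stream, which is the 0\r\n\r\n mentioned above.
HTTP/1.1 200 OK
Transfer-Encoding: chunked

7\r\n
chunk 1\r\n
7\r\n
chunk 2\r\n
0\r\n
\r\n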
Cache hit: content length
The next time the object is requested, Varnish no longer needs to use Chunked Transfer Encoding, because it has the full object in cache and knows the size. At that point a Content-Length header is part of the response, but this header is not accessible in VCL because it seems to be generated after sub vcl_deliver {} is executed.
Remove objects after the fact
It is possible to remove objects after the fact by monitoring their size through VSL.
The following command watches the backend request accounting field (BereqAcct) of the VSL output and checks the size of the body received from the backend. If the size is greater than 5 MB (5242880 bytes), it prints the corresponding backend request URL:
varnishlog -g request -i berequrl -q "BereqAcct[5] > 5242880"
Here's some potential output:
* << Request >> 98330
** << BeReq >> 98331
-- BereqURL /
At that point, you know that the / resource is bigger than 5 MB. You can then attempt to remove it from the cache using the following command:
varnishadm ban "obj.http.x-url == / && obj.http.x-host == domain.com"
Replace domain.com with the actual hostname of your service and replace / with the URL of the endpoint you're trying to remove from the cache.
Don't forget to add the following code to your VCL file to ensure that the x-url and x-host headers are available:
sub vcl_backend_response {
    set beresp.http.x-url = bereq.url;
    set beresp.http.x-host = bereq.http.host;
}

sub vcl_deliver {
    unset resp.http.x-url;
    unset resp.http.x-host;
}
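If you want to automate this clean-up, a minimal sketch along these lines is possible (my own illustration, not a standard tool: it assumes varnishlog and varnishadm are on the PATH, reuses the 5 MB query from above, and domain.com is a placeholder hostname):
import subprocess

# Watch VSL for backend responses whose received body exceeds 5 MB
proc = subprocess.Popen(
    ["varnishlog", "-g", "request", "-i", "BereqURL", "-q", "BereqAcct[5] > 5242880"],
    stdout=subprocess.PIPE, text=True)

for line in proc.stdout:
    if "BereqURL" not in line:
        continue
    url = line.split()[-1]
    # Ban the oversized object after the fact (domain.com is a placeholder)
    ban_expr = 'obj.http.x-url == {} && obj.http.x-host == domain.com'.format(url)
    subprocess.run(["varnishadm", "ban", ban_expr], check=False)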
Conclusion
There is no turn-key solution to access the size of the body in VCL; the hacky approach of removing objects after the fact, as described above, is the only thing I can think of.

Related

Docker Tensorflow-Serving Predictions too large

I'm trying to serve my model using Docker + TensorFlow Serving. However, due to restrictions with serving a model that uses an iterator (via make_initializable_iterator()), I had to split up my model.
I'm using gRPC to interface with my model in Docker. The problem is that my predicted tensor is about 10 MB, and about 4.1 MB serialized. The error I'm getting is:
"grpc_message":"Received message larger than max (9830491 vs. 4194304)"
Is there a way to write my predictions out to disk instead of transmitting them in the gRPC response? The output is a 32-channel tensor, so I'm unable to encode it as a PNG before saving it to disk with tf.io.write_file.
Thanks!
The default maximum message length in gRPC is 4 MB, but we can increase it in both the gRPC client and server in Python as shown below. You will then be able to send and receive large messages without streaming.
import grpc

MAX_MESSAGE_LENGTH = 100 * 1024 * 1024  # adjust to your payload size
channel = grpc.insecure_channel('localhost:6060',
    options=[('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
             ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)])
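For completeness, here is a hedged sketch of the matching server-side options for a plain Python gRPC server (TensorFlow Serving itself is configured through its own command-line flags, so this only applies if you control the server code; the port is a placeholder):
from concurrent import futures
import grpc

MAX_MESSAGE_LENGTH = 100 * 1024 * 1024  # keep in sync with the client

server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=4),
    options=[('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
             ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)])
server.add_insecure_port('[::]:6060')
server.start()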
In Go, there are equivalent functions; refer to these URLs:
https://godoc.org/google.golang.org/grpc#MaxMsgSize
https://godoc.org/google.golang.org/grpc#WithMaxMsgSize
The following C++ code sets the maximum receive message size to unlimited on a gRPC client channel:
#include <grpcpp/grpcpp.h>

grpc::ChannelArguments ch_args;
ch_args.SetMaxReceiveMessageSize(-1);  // -1 means unlimited
std::shared_ptr<grpc::Channel> ch = grpc::CreateCustomChannel(
    "localhost:6060", grpc::InsecureChannelCredentials(), ch_args);

How to select the preferred file transport method?

I think I have a problem with my Prosody configuration. When I send files (for example photos) larger than roughly 2 or 3 megabytes (as I established experimentally) using Conversations 2.* (an Android IM app), the files are transferred over a peer-to-peer connection instead of being uploaded to the server and shared as a link with my interlocutor. Small files transfer fine using HTTP upload, and I couldn't find a reason for this behavior.
Here are some lines for the http_upload module from my config, taken from the official documentation (where I hadn't found a setting for turning off peer-to-peer file transfers):
http_upload_file_size_limit = 536870912 -- 512 MB in bytes
http_upload_expire_after = 604800 -- 60 * 60 * 24 * 7
http_upload_quota = 10737418240 -- 10 GB
http_upload_path = "/var/lib/prosody"
And this is my full config: https://pastebin.com/V6DNYrhe
Small files are transferred well using http upload. And I couldn't find a reason for such behavior.
TL;DR: You put options in the wrong place. The default 1MB limit applies. This is advertised to clients so they know about it and can use more efficient p2p transfer methods for very large files.
http_upload_path = "/var/lib/prosody"
This line makes Prosody's data directory public, allowing anyone easy access to all user data. You really don't want to do that. You are lucky you did not put that in the correct section.
And this is my full config: https://pastebin.com/V6DNYrhe
"http_upload" is in the global modules_enabled list which will load
it onto all VirtualHost(s).
You have added the options to the end of the config file, putting them under a Component section, which makes those options apply only to that Component. Thus, the VirtualHost where mod_http_upload is loaded sees no options set and will use the defaults.
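For illustration, a rough sketch of where those options would need to live so that the VirtualHost actually sees them (example.com is a placeholder hostname; the limit here is deliberately modest for the reasons below):
-- Global section: options set here, before any VirtualHost or Component line,
-- apply to every host, including the one that loads mod_http_upload
http_upload_file_size_limit = 10485760 -- 10 MB; see the caveat below
http_upload_expire_after = 604800
http_upload_quota = 10737418240

VirtualHost "example.com"
    -- mod_http_upload is loaded here via the global modules_enabled list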
http_upload_file_size_limit = 536870912 -- 512 MB in bytes
Don't do this. Prosody's built-in HTTP server is not optimized for very large uploads. There is a safety limit on HTTP request size that will cap the HTTP upload size limit at 10M to prevent DoS attacks. While that limit can be changed, I would strongly suggest you look at https://modules.prosody.im/mod_http_upload_external.html instead.

Deleting an element from a Riak set via protocol buffers using the Erlang client without fetching the whole set

Via the HTTP API, we can delete an arbitrary element from a set without fetching the whole content:
curl -X POST http://127.0.0.1:8098/types/sets/buckets/travel/datatypes/cities -H "content-type: application/json" -d '{ "remove" : "Toronto" }'
(to verify:
tcpdump -i any -s 0 -n 'src port 8087 or src port 8098 and host 127.0.0.1')
However via protocol buffers client, we need to perform the following steps in order to delete an element from a set:
{ok, MySet} = case riakc_pb_socket:fetch_type(Pid, {<<"sets">>, <<"travel">>}, <<"cities">>) of
                  {error, {notfound, set}} -> {ok, riakc_set:new()};
                  {ok, Set} -> {ok, Set}
              end.
ModSet = riakc_set:del_element(lists:last(ordsets:to_list(riakc_set:value(MySet))), MySet).
riakc_pb_socket:update_type(Pid, {<<"sets">>, <<"travel">>}, <<"cities">>, riakc_set:to_op(ModSet)).
As its name suggests, riakc_pb_socket:fetch_type retrieves the whole set. I could not find any methods in the Erlang client using protobuf to just send the delete request without retrieving the whole set first.
Is there a way to avoid fetching the whole set object via the protobuf client when deleting an element?
Update: protocol buffers API to update datatypes seems useful:
http://docs.basho.com/riak/latest/dev/references/protocol-buffers/dt-set-store/
riak_pb/src/riak_pb_dt_codec.erl
The last argument to riakc_pb_socket:update_type (source code) is the set of changes to apply. If you already know which element you want removed, it looks like you could, in theory, create a new empty set and build a remove operation:
Empty = riakc_set:new(Context),
Removal = riakc_set:del_element(<<"Toronto">>,Empty),
Op = riakc_set:to_op(Removal),
riakc_pb_socket:update_type(Pid, {<<"sets">>, <<"travel">>}, <<"cities">>, Op).
The key here is the Context which is an opaque value generated by the server. You may be able to send the request without one, or with an empty one (<<>>), but that is probably not a Good Thing(tm). The context is how Riak determines causality. It is updated by each actor each time an action is taken and is used to determine the final consistent value. So if you send a set operation with no context it may fail or be processed out of order, especially if any other updates are happening around the same time.
In the case of the HTTP API the entire object is fetched by a coordinator to get the context, then the operation is submitted with that context.
When performing a regular get operation, you can specify head in the options to get back just the metadata, which include the context, but not the data. I haven't tested with fetch_type yet, but there may be similar functionality for convergent types. If there is, you would just need to fetch the head to get the context, and submit your operation with that context.
-EDIT-
According to the docs:
%% You cannot fetch a Data Type's context directly using the Erlang
%% client. This is actually quite all right, as the client automatically
%% manages contexts when making updates.
It would appear that you can pass a fun to riakc_pb_socket:modify_type so that you don't have to explicitly fetch the old value, but that will just fetch it behind the scenes, so you only really save a tiny bit of code.
riakc_pb_socket:modify_type(Pid,
    fun(MySet) ->
        riakc_set:del_element(lists:last(ordsets:to_list(riakc_set:value(MySet))), MySet)
    end,
    {<<"sets">>, <<"travel">>}, <<"cities">>, [create]).

How much size is too large for Session on Rails 3

I want to load a text file into the session.
The file size is about 50 KB ~ 100 KB.
When a user triggers the function on my page, it will create the session.
My server's RAM is about 8 GB, and the maximum number of users is about 100.
There is a script running in the background to collect IP and MAC addresses on the LAN, and it continuously writes data into a text file.
At the same time, the webpage uses Ajax to fetch fresh data from the text file and display it on the page.
Is it suitable to keep the result in the session, or is there a better way to achieve this?
Thanks ~
The Python script collects the data on the LAN in 1 ~ 3 minutes (background job).
To avoid blocking for those 1 ~ 3 minutes, I use Ajax to fetch the data from the text file (which the Python script keeps appending to) and show it on the page.
My users should carry the information across pages, so I want to store the data in the session.
00:02:D1:19:AA:50: 172.19.13.39
00:02:D1:13:E8:10: 172.19.12.40
00:02:D1:13:EB:06: 172.19.1.83
C8:9C:DC:6F:41:CD: 172.19.12.73
C8:9C:DC:A4:FC:07: 172.19.12.21
00:02:D1:19:9B:72: 172.19.13.130
00:02:D1:13:EB:04: 172.19.13.40
00:02:D1:15:E1:58: 172.19.12.37
00:02:D1:22:7A:4D: 172.19.11.84
00:02:D1:24:E7:0F: 172.19.1.79
00:FD:83:71:00:10: 172.19.11.45
00:02:D1:24:E7:0D: 172.19.1.77
00:02:D1:81:00:02: 172.19.11.58
00:02:D1:24:36:35: 172.19.11.226
00:02:D1:1E:18:CA: 172.19.12.45
00:02:D1:0D:C5:A8: 172.19.1.45
74:27:EA:29:80:3E: 172.19.12.62
Why does this need to be stored in the browser? Couldn't you fire off what you're collecting to a data store somewhere?
Anyway, assuming you HAD to do this, and the example you gave is pretty close to the data you'll actually be seeing, you have a lot of redundant data there. You could save space for the IPs by creating a hash pointing to each successive value, e.g.
{172 => {19 => {13 => [39], 12 => [40, 73, 21], 1 => [83]}}} ...etc. Similarly for the MAC addresses. But again, you can probably simplify this problem a LOT by storing the info you need somewhere other than the session.
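Since a Python script is producing the file anyway, here is a minimal sketch (my own illustration; scan_results.txt is a placeholder filename) of collapsing the IPs into that kind of nested structure before handing the data to the web app:
ip_tree = {}

with open("scan_results.txt") as f:  # placeholder filename
    for line in f:
        # each line looks like "00:02:D1:19:AA:50: 172.19.13.39"
        mac, _, ip = line.strip().rpartition(": ")
        a, b, c, d = ip.split(".")
        ip_tree.setdefault(a, {}).setdefault(b, {}).setdefault(c, []).append(int(d))

# ip_tree now resembles {'172': {'19': {'13': [39], '12': [40, 73, 21], ...}}}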

REXML :: RuntimeError (entity expansion has grown too large)

After upgrading to Ruby 1.9.3-p392 today, REXML throws a RuntimeError when attempting to retrieve an XML response over a certain size: everything works fine and no error is thrown when receiving fewer than 25 XML records, but once a certain XML response length threshold is reached, I get this error:
Error occurred while parsing request parameters.
Contents:
RuntimeError (entity expansion has grown too large):
/.rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/rexml/text.rb:387:in `block in unnormalize'
I realize this was changed in the most recent Ruby version:
http://www.ruby-lang.org/en/news/2013/02/22/rexml-dos-2013-02-22/
As a quick fix, I've changed the size of REXML::Document.entity_expansion_text_limit to a larger number and the error goes away.
Is there a less risky solution?
This issue is generated when you send too much content in the XML response.
To fix it, restrict the data in each individual node to less than 10 KB (instead of sending the whole content, show truncated data and provide a separate link to view the full content).
The error is raised from the following file:
ruby-2.1.2/lib/ruby/2.1.0/rexml/text.rb
# Unescapes all possible entities
def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil )
  sum = 0
  string.gsub( /\r\n?/, "\n" ).gsub( REFERENCE ) {
    s = Text.expand($&, doctype, filter)
    if sum + s.bytesize > Security.entity_expansion_text_limit
      raise "entity expansion has grown too large"
    else
      sum += s.bytesize
    end
    s
  }
end
The limit in ruby-2.1.2/lib/ruby/2.1.0/rexml/text.rb defaults to 10240 bytes, which means 10 KB of data per node.
REXML already defaults to allowing only 10000 entity substitutions per document, so the maximum amount of text that can be generated by entity substitution will be around 98 megabytes. (See https://www.ruby-lang.org/en/news/2013/02/22/rexml-dos-2013-02-22/ )
That sounds like a LOT of XML. Do you really need to get all of it? Maybe you can just request certain fields from the remote server? One option might be to try another XML parser (Nokogiri, for example). Another option might be to use something other than XML as a transport (JSON? Binary?).
