NoHttpResponseException is a retriable exception? - connection-pooling

We implemented connection pooling in our client code to call a server that closes each connection (it sends Connection: close in the response headers) after 2.5 minutes. Because of this server behaviour we intermittently get NoHttpResponseException, both at high TPS and at low TPS.
We are using Apache HttpClient 4.5.11. PoolingHttpClientConnectionManager has a validateAfterInactivity setting, which defaults to 2000 ms, but I think we can still get the same exception if a connection is leased again within that 2000 ms window.
We could set a more aggressive value for validateAfterInactivity, but I have heard that it can add roughly 20 to 30 ms to each request.
Is retrying this exception a good solution?
In the same context, can we also retry on java.net.SocketException: Connection reset?
@ok2c, any suggestions here?
Thanks in advance.

NoHttpResponseException is considered safe to retry for idempotent methods.
In your particular case, however, I would consider limiting the TTL (time to live) of client connections to 2.5 minutes to match that of the server endpoints.
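As a rough illustration of "retry only for idempotent methods", here is a minimal, library-independent sketch. The class IdempotentRetry, the IoCall interface and the execute method are hypothetical names made up for this example; they are not part of Apache HttpClient. It relies only on the fact that NoHttpResponseException and SocketException are both subclasses of IOException.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not an Apache HttpClient API.
final class IdempotentRetry {
    // Methods that HTTP defines as idempotent.
    private static final Set<String> IDEMPOTENT =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    // A unit of work that may fail with an I/O error (hypothetical interface).
    interface IoCall<T> {
        T call() throws IOException;
    }

    static boolean isIdempotent(String method) {
        return IDEMPOTENT.contains(method.toUpperCase(Locale.ROOT));
    }

    // Runs the call; on IOException it tries again, but only for idempotent methods.
    static <T> T execute(String method, int maxAttempts, IoCall<T> call) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int attempts = isIdempotent(method) ? maxAttempts : 1;
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // e.g. NoHttpResponseException or "Connection reset"
            }
        }
        throw last;
    }
}
```

With this sketch, a GET wrapped in execute("GET", 3, ...) is attempted up to three times, while a POST is attempted once and the exception is passed straight to the caller. For the TTL suggestion itself, HttpClient 4.5 offers HttpClientBuilder.setConnectionTimeToLive(long, TimeUnit); choosing a value slightly below the server's 2.5 minutes is an assumption of mine, not something stated in the answer.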

Related

Flickering HttpClient sometimes throwing IOException

I'm using java.net.http.HttpClient.newHttpClient() under Java 19 (Temurin) and perform sendAsync(...) requests from different threads on the same instance. I assume this is OK, as the javadoc states:
Once built, an HttpClient is immutable...
However, some requests fail with:
java.io.IOException: HTTP/1.1 header parser received no bytes
The weird thing is, it depends on the speed of my requests:
Requests every 5 seconds: 30% failure
Requests every 3 seconds: 0% failure
I've written a test for it:
private final HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://..."))
        .setHeader("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofByteArray("[]".getBytes()))
        .build();

@ParameterizedTest
@ValueSource(ints = {3, 5})
void httpClientTest(int intervalSeconds) throws Exception {
    HttpClient httpClient = HttpClient.newHttpClient();
    httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray()).get();
    Thread.sleep(Duration.ofSeconds(intervalSeconds));
    httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray()).get();
    Thread.sleep(Duration.ofSeconds(intervalSeconds));
    httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray()).get();
    Thread.sleep(Duration.ofSeconds(intervalSeconds));
    httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray()).get();
    Thread.sleep(Duration.ofSeconds(intervalSeconds));
    httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray()).get();
}
I've already tried the following:
Doing the same with curl on the command line. No requests fail whatever interval I try. So it's probably not a problem with the server.
Running the tests multiple times in parallel. Still the 5-second-intervals fail (then multiple times in parallel). So it's probably not a problem with the server.
Creating an HttpClient.newHttpClient() for every request. No requests fail whatever interval. So it's probably not a problem with the server but with an internal state of the HttpClient (although it claims to be immutable?).
Do you have an idea what I could do, without needing to create a new HttpClient for every request?
Here is the answer for the record: java.net.http.HttpClient has a long default HTTP/1.1 keepAlive time, which is longer than what servers are usually configured with. This often results in the server closing idle HTTP/1.1 connections before the client does. If the server closes the connection at about the same time as the client tries to reuse it, an IOException may be raised.
If such exceptions are observed too frequently applications should consider adapting the default keepAlive time in the client to some value shorter than what the servers it connects to are using.
A default value for the HttpClient HTTP/1.1 keepAlive time can be specified on the command line with: -Djdk.httpclient.keepalive.timeout=duration-in-seconds
So for instance - if a server is configured with a keepAlive time of 5s, you could consider supplying -Djdk.httpclient.keepalive.timeout=3 or -Djdk.httpclient.keepalive.timeout=4 on the client's java command line.
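If changing the command line is inconvenient, the same property can also be set from code. The sketch below is a minimal illustration; KeepAliveConfig and its methods are hypothetical names, and the two-second safety margin is an arbitrary choice. It assumes the property is set before the first HttpClient is created, since the JDK may read it only once.

```java
// Hypothetical helper for choosing and applying a client keep-alive value.
final class KeepAliveConfig {
    // Property name taken from the answer above.
    static final String PROPERTY = "jdk.httpclient.keepalive.timeout";

    // Picks a client keep-alive below the server's, but never less than 1 second.
    static int clientKeepAliveSeconds(int serverKeepAliveSeconds, int marginSeconds) {
        return Math.max(1, serverKeepAliveSeconds - marginSeconds);
    }

    // Assumption: call this early in startup, before any HttpClient is built.
    static void apply(int serverKeepAliveSeconds) {
        int seconds = clientKeepAliveSeconds(serverKeepAliveSeconds, 2);
        System.setProperty(PROPERTY, String.valueOf(seconds));
    }
}
```

For a server with a 5 second keep-alive, KeepAliveConfig.apply(5) sets the property to 3, matching the first value suggested above.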

DocumentDB return "Request rate is large", parse on azure

I'm running Parse on Azure (Parse Server on managed Azure services).
It includes DocumentDB as the database, which has a limit on requests per second.
Some Parse cloud functions are large and the request rate is too high (even for the S3 tier), so I'm getting this error (seen using Visual Studio Team Services (formerly Visual Studio Online) and streaming logs).
error: Uncaught internal server error. { [MongoError: Message: {"Errors":["Request rate is large"]}
ActivityId: a4f1e8eb-0000-0000-0000-000000000000, Request URI: rntbd://10.100.99.69:14000/apps/f8a35ed9-3dea-410f-a89a-28650ff41381/services/2d8e5320-89e6-4750-a06f-174c12013c69/partitions/53e8a085-9fed-4880-bd90-f6191765f625/replicas/131091039101528218s]
name: 'MongoError',
message: 'Message: {"Errors":["Request rate is large"]}\r\nActivityId: a4f1e8eb-0000-0000-0000-000000000000, Request URI: rntbd://10.100.99.69:14000/apps/f8a35ed9-3dea-410f-a89a-28650ff41381/services/2d8e5320-89e6-4750-a06f-174c12013c69/partitions/53e8a085-9fed-4880-bd90-f6191765f625/replicas/131091039101528218s' } MongoError: Message: {"Errors":["Request rate is large"]}
ActivityId: a4f1e8eb-0000-0000-0000-000000000000, Request URI: rntbd://10.100.99.69:14000/apps/f8a35ed9-3dea-410f-a89a-28650ff41381/services/2d8e5320-89e6-4750-a06f-174c12013c69/partitions/53e8a085-9fed-4880-bd90-f6191765f625/replicas/131091039101528218s
at D:\home\site\wwwroot\node_modules\mongodb-core\lib\cursor.js:673:34
at handleCallback (D:\home\site\wwwroot\node_modules\mongodb-core\lib\cursor.js:159:5)
at setCursorDeadAndNotified (D:\home\site\wwwroot\node_modules\mongodb-core\lib\cursor.js:501:3)
at nextFunction (D:\home\site\wwwroot\node_modules\mongodb-core\lib\cursor.js:672:14)
at D:\home\site\wwwroot\node_modules\mongodb-core\lib\cursor.js:585:7
at queryCallback (D:\home\site\wwwroot\node_modules\mongodb-core\lib\cursor.js:241:5)
at Callbacks.emit (D:\home\site\wwwroot\node_modules\mongodb-core\lib\topologies\server.js:119:3)
at null.messageHandler (D:\home\site\wwwroot\node_modules\mongodb-core\lib\topologies\server.js:397:23)
at TLSSocket.<anonymous> (D:\home\site\wwwroot\node_modules\mongodb-core\lib\connection\connection.js:302:22)
at emitOne (events.js:77:13)
How to handle this error?
TL;DR;
Upgrade the old S3 collection to a new single collection under the new pricing scheme. This can support up to 10K RU (up from 2500 RU)
Delete the old S3 collection and create a new partitioned collection. Will require support for partitioned collection in parse.
Implement a backoff strategy in line with the x-ms-retry-after-ms response header.
Long answer:
Each request to DocumentDB returns a HTTP header with the Request charge for that operation. The number of request units is configured per collection. As per my understanding you have 1 collection of size S3, so this collection can only handle 2500 Request Units per second.
DocumentDB scales by adding multiple collections. With the old configuration using S1 -> S3 you must do this manually, i.e. you must distribute your data over the collections using an algorithm such as consistent hashing, a map or perhaps a date. With the new pricing in DocumentDB you can use partitioned collections: by defining a partition key, DocumentDB will shard your data for you. If you see sustained rates of RequestRateTooLarge errors, I recommend scaling out the partitions. However, you will need to investigate whether Parse supports partitioned collections.
When you receive an HTTP 429 RequestRateTooLarge, there is also a header called x-ms-retry-after-ms: ###, where ### denotes the number of milliseconds to wait before you retry the operation. You can implement a back-off strategy that retries the operation after that delay. Do note that if you have clients hanging on the server during retries, you may build up request queues and clog the server. I recommend adding a queue to handle such bursts. For short bursts of requests this is a nice way to handle them without scaling up the collections.
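The back-off idea can be sketched as follows. This is a minimal illustration in plain Java, not DocumentDB SDK code: ThrottleBackoff, ThrottledException and Operation are hypothetical names, and the exception is assumed to carry the value of the x-ms-retry-after-ms header from a 429 response.

```java
// Hypothetical back-off helper; not part of any DocumentDB or MongoDB client.
final class ThrottleBackoff {
    // Assumed to be thrown by the caller's code when the server answers 429.
    static final class ThrottledException extends Exception {
        final long retryAfterMs; // value of x-ms-retry-after-ms

        ThrottledException(long retryAfterMs) {
            super("Request rate is large; retry after " + retryAfterMs + " ms");
            this.retryAfterMs = retryAfterMs;
        }
    }

    // The operation to run, e.g. a single query (hypothetical interface).
    interface Operation<T> {
        T run() throws ThrottledException;
    }

    // Runs the operation; when throttled, waits for the server-suggested delay
    // and tries again, giving up after maxRetries retries.
    static <T> T callWithBackoff(Operation<T> op, int maxRetries)
            throws ThrottledException, InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.run();
            } catch (ThrottledException e) {
                if (attempt >= maxRetries) {
                    throw e; // still throttled after all retries
                }
                Thread.sleep(e.retryAfterMs);
            }
        }
    }
}
```

Capping maxRetries keeps a sustained overload from holding callers indefinitely, which is the queue build-up risk mentioned above.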
I used mLab as an external MongoDB database and configured the Parse app in Azure to use it instead of DocumentDB.
I am not willing to pay so much for a "performance" increase.

In ruby/rails, can you differentiate between no network response vs long-running response?

We have a Rails app with an integration with box.com. It happens fairly frequently that a request for a box action to our app results in a Passenger process being tied up for right around 15 minutes, and then we get the following exception:
Errno::ETIMEDOUT: Connection timed out - SSL_connect
Often it's on something that should be fairly quick, such as listing the contents of a small folder, or deleting a single document.
I'm under the impression that these requests never actually got to an open channel, that either at the tcp or ssl levels we got no initial response, or the full handshake/session-setup never completed.
I'd like to get either such condition to timeout quickly, say 15 seconds, but allow for a large file that is successfully transferring to continue.
Is there any way to get TCP or SSL to raise a timeout much sooner when the connection at either of those levels fails to complete setup, but not raise an exception if the session is successfully established and it's just taking a long time to actually transfer the data?
Here is what our current code looks like - we are not tied to doing it this way (and I didn't write this code):
def box_delete(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  request = Net::HTTP::Delete.new(uri.request_uri)
  http.request(request)
end

wso2 esb how to increase endpoint timeout

I have a JMS queue message processor sequence where a request is sent to a SOAP endpoint. However, a request to this endpoint can take a long time, up to 30 minutes or so. How can I configure the ESB to allow long timeout values? Currently I'm getting the following error after 60 seconds:
[2014-01-20 14:18:31,772] WARN - TargetHandler http-outgoing-4: Connection time out while in state: REQUEST_DONE
[2014-01-20 14:18:31,775] WARN - SynapseCallbackReceiver Synapse received a response for the request with message Id : urn:uuid:c6a023c2-7fb4-4321-b1c2-d78e9bb13add But a callback is not registered (anymore) to process this response
Thanks for any help
Edit: I added http.socket.timeout=1800000 -property in repository/conf/passthru-http.properties which seems to solve the timeout issue.
Assuming this is a "Scheduled Message Forwarding Processor", to increase the send timeout to 30 minutes:
In your endpoint, verify that "connection timeout" is set to "never timeout" (edit the endpoint in the console and open "Show Advanced options").
Edit repository/conf/synapse.properties and modify synapse.global_timeout_interval (in ms): this is the maximum time a callback instance will exist in WSO2 to receive the response.
Copy the sample Axis2 configuration file from samples/axis2Client/client_repo/conf/axis2.xml to, for example, repository/conf/axis2/axis2_mp.xml.
Edit this axis2_mp.xml file, find transportSender name="http" and add a "SO_TIMEOUT" parameter (in ms): <parameter name="SO_TIMEOUT" locked="false">108000000</parameter>
Edit your Message Processor and, under Show Additional Parameters, set the "Axis2 Configuration" entry to repository/conf/axis2/axis2_mp.xml.
SO_TIMEOUT is the time to wait for the response. Note that 30 minutes is 1800000 ms; the 108000000 shown above is 30 hours.
You can specify CONNECTION_TIMEOUT for the maximum time to establish the connection.
Be aware that all callbacks will persist for up to 30 minutes in the ESB!
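Since these settings are all expressed in milliseconds, it can help to derive the values rather than type them by hand. The sketch below is only an illustration in plain Java; TimeoutValues and its method are hypothetical names, and the output is the same parameter element quoted in the steps above.

```java
import java.time.Duration;

// Hypothetical helper that renders a timeout as an Axis2 SO_TIMEOUT parameter.
final class TimeoutValues {
    // Converts the duration to milliseconds and embeds it in the parameter element.
    static String soTimeoutParameter(Duration timeout) {
        return "<parameter name=\"SO_TIMEOUT\" locked=\"false\">"
                + timeout.toMillis()
                + "</parameter>";
    }
}
```

For example, TimeoutValues.soTimeoutParameter(Duration.ofMinutes(30)) yields a parameter with the value 1800000, the same figure used for http.socket.timeout in the question's edit.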

Connection in RabbitMQ server auto lost after 600s

I'm using rabbitMQ server with amq.
I am having a difficult problem. After leaving the server alone for about 10 min, the connection is lost.
What could be causing this?
If you look at the Erlang client documentation http://www.rabbitmq.com/erlang-client-user-guide.html you will see a section titled Connecting To A Broker
This gives you a few different options that you can specify when setting up your connection to the RabbitMQ server. One of the options is the heartbeat; as you can see, the default is 0, so no heartbeat is specified.
I don't know the exact Erlang notation, but you will need to do something like:
{ok, Connection} = amqp_connection:start(#amqp_params_network{heartbeat = 5})
The heartbeat timeout is specified in seconds, so this would cause your consumer to send a heartbeat to the server every 5 seconds.
Also take a look at this discussion: https://groups.google.com/forum/?fromgroups=#!topic/rabbitmq-discuss/u227xzvqOr8
The default connection timeout for the RabbitMQ connection factory is 600 seconds (at least in the Java client API), hence your 10 minutes. You can change this by giving the connection factory a timeout of your choice.
It is good practice to ensure your connection is released and recreated after a specific amount of time, to prevent eventual leaks and excessive resource use. Your code should make sure it uses a valid connection that is not close to timing out, and should establish a new connection to replace any that did time out. Overall, adopt a connection-pooling approach.
- Java example:
ConnectionFactory factory = new ConnectionFactory();
factory.setHost(this.serverName);
factory.setPort(this.serverPort);
factory.setUsername(this.userName);
factory.setPassword(this.userPassword);
// Note: the RabbitMQ Java client documents this value in milliseconds.
factory.setConnectionTimeout( YOUR-TIMEOUT-IN-MILLISECONDS );
Connection connection = factory.newConnection();
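The "release and recreate after a specific amount of time" advice can be sketched independently of RabbitMQ. The class below is a minimal illustration in plain Java; ExpiringResource and its parameters are hypothetical names, and the factory and disposer stand in for whatever opens and closes a real connection.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder that replaces its resource once it exceeds a maximum age.
final class ExpiringResource<T> {
    private final Supplier<T> factory;   // opens a new resource
    private final Consumer<T> disposer;  // closes an old resource
    private final long maxAgeMillis;
    private final LongSupplier clock;    // current time in milliseconds
    private T current;
    private long createdAt;

    ExpiringResource(Supplier<T> factory, Consumer<T> disposer,
                     long maxAgeMillis, LongSupplier clock) {
        this.factory = factory;
        this.disposer = disposer;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    // Returns the current resource, recreating it first if it is too old.
    synchronized T get() {
        long now = clock.getAsLong();
        if (current == null || now - createdAt >= maxAgeMillis) {
            if (current != null) {
                disposer.accept(current); // release the expired resource
            }
            current = factory.get();
            createdAt = now;
        }
        return current;
    }
}
```

In real code the clock would typically be System::currentTimeMillis, and maxAgeMillis would be chosen comfortably below the timeout at which the server drops the connection.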

Resources