Golang http automatically retrying on StatusRequestTimeout (408) response - timeout

Setup
I have two different applications, both written in Go. The first is the server, and the second is a smaller app that makes calls to the server. They use the http package for making calls and a router package for setting up endpoints.
The Problem
When the device makes a specific call to the server, a 408 (StatusRequestTimeout) response is returned. This response is not due to our server actually timing out; it is just used to describe the error (more on this below). The first time the device makes this call it receives the 408 and proceeds normally. However, if the same call is made again, a new 'third' call is sent to the server immediately after the second call has finished. This third call is identical to the first two. There is no http retry logic enabled for this call.
The bug
Why is this third call being issued? When the code is updated to return a 400 status response instead of a 408, this third call is no longer made. Additionally, changing other calls to return a 408 instead of a 400 makes them exhibit the same behavior of sending an extra third call. I have been unable to find documentation that explains this behavior, or other articles that describe it.
Hunch
I have found many articles like this one which indicate browsers will sometimes retry requests. Additionally, some other Stack Overflow posts like this one indicate that the http request doesn't retry without setting up our own retry logic. Again, we have set this up, but it is not enabled for this particular call, and debugging shows that we never enter our custom retry logic.
I believe that this is a Chromium feature. I've tried to replicate it with Firefox without success; Edge, however, exhibits the same behavior. Chrome's dev tools (and Edge's) only show two network calls: the first and the third. It could also be the http library, but it is very strange that the behavior differs between browsers.
Bug Fix
Given what a 408 response is supposed to indicate, I have decided to move away from using it for custom error responses. At this point, I'm mostly curious why the behavior is what it is, whether my hunch is correct, or whether there is something else at play.

Let's start with the method is408Message(), which is here. It checks whether the buffer carries a 408 Request Timeout status code. This method is used by another method to inspect the response from the server, and in the case of a 408 Request Timeout the persistConn is closed with an errServerClosedIdle error. The error is assigned to the persistConn.closed field.
In the main loop of the http Transport there is a call to persistConn.roundTrip here, which returns as an error the value stored in the persistConn.closed field. A few lines below you can find a method called pconn.shouldRetryRequest, which takes the error returned by persistConn.roundTrip as an argument and returns true when the error is errServerClosedIdle. Since the whole operation is wrapped in a for loop, the request will be sent again.
It could be valuable for you to analyze the shouldRetryRequest method, because multiple conditions must be met before a request is retried. For example, the request will not be repeated when the connection was being used for the first time.
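Here is a small, self-contained sketch (not the code from the question; the fake server, URL, and counters are made up for illustration) that makes the Transport's silent retry loop visible. Forcing the 408/errServerClosedIdle path itself depends on a timing race, so this sketch instead drops a reused keep-alive connection without answering the second request; shouldRetryRequest then allows the retry because the request is idempotent and the connection had already been used, and the server ends up seeing three requests for two client-side calls, which is the same kind of invisible extra call described above.

package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
	"sync/atomic"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	var hits int64
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return
			}
			go func(c net.Conn) {
				defer c.Close()
				br := bufio.NewReader(c)
				// Answer the first request on this connection and keep it alive.
				if _, err := http.ReadRequest(br); err != nil {
					return
				}
				fmt.Printf("server: request %d\n", atomic.AddInt64(&hits, 1))
				fmt.Fprint(c, "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
				// Read the next request on the kept-alive connection, then close
				// it without responding, like a server timing the connection out.
				if _, err := http.ReadRequest(br); err != nil {
					return
				}
				fmt.Printf("server: request %d (dropped without a response)\n", atomic.AddInt64(&hits, 1))
			}(conn)
		}
	}()

	url := "http://" + ln.Addr().String() + "/"
	for i := 1; i <= 2; i++ {
		resp, err := http.Get(url)
		if err != nil {
			fmt.Println("client error:", err)
			continue
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
		fmt.Printf("client: call %d -> %s\n", i, resp.Status)
	}
	fmt.Printf("server saw %d requests for 2 client-side calls\n", atomic.LoadInt64(&hits))
}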

Related

POST with TIdHTTP hangs on retrieving the JSON response

This question is perhaps more of a tip for people searching for a solution to the same problem (as I eventually found the solution myself).
I had an application that does some HTTP requests with a local server (a mix of GET/POST with JSON content in the request/response bodies). The server is a third-party application, and after I upgraded it to a recent version, my Delphi app was no longer working.
It turned out that it was now hanging on the statement:
IdHTTP.Post("URL", "Payload", "BytesStreamResult");
Since a manual Postman request was still working, the problem had to be on the Delphi client side.
Further isolating the issue showed that the HTTP POST request did get an HTTP 200 response with valid HTTP response headers, but then was getting stuck reading the response body. It was hanging on:
IOHandler.ReadLn
When I compared the headers with the POSTMAN response, I noticed that 'Transfer-Encoding: chunked' was missing in the Delphi response.
Finally, I noticed the code related to TIdHTTP's hoKeepOrigProtocol option, which is not set by default.
So, my POST request was "downgraded" to an HTTP 1.0 request, and I guess this made the (updated) server respond differently (I'm not an RFC expert, but I guess 'chunked' is an HTTP 1.1-only feature).
After setting this option, everything worked like before (and indeed, the response was now read as "chunked" in Delphi).
Summary:
Shouldn't hoKeepOrigProtocol be the default option? (why punish good citizens for those that are not...)
Can we intercept this? Now my POST assumes upfront a streamed response, and thus it hangs because the server doesn't write anything to the buffer.
What would that high-level code look like? It seems to be a mix of interpreting the response headers and then deciding whether more response reading is required.
(it didn't do anything specific regarding time-outs, either. I have the impression it hangs forever, or at least > 10 minutes...)
TIdHTTP supports non-chunked responses (and yes, chunked is an HTTP 1.1 feature), so the hanging would have to be caused by the server sending a malformed response (a bug that should be reported to the server author).
When reading a non-chunked and non-MIME response, TIdHTTP does not use IOHandler.ReadLn to read the response's body, as you claim; it does so only when reading the response's headers.
But, since you did not show what the response actually looks like, nobody can explain for sure exactly why the hang occurs.
Shouldn't hoKeepOrigProtocol be the default option?
At the time the option was first introduced, no. There were enough buggy HTTP 1.1 servers around that downgrading to HTTP 1.0 was warranted.
However, that was many years ago. Nowadays, HTTP 1.1 is much more mature, and such buggy servers are rare. So, feel free to submit a change/pull request to Indy's GitHub repo if you feel the default behavior should be changed.
Can we intercept this?
No. The behavior you describe is most likely caused by a bug in the HTTP server. Either it is not sending all of the data it should be, or else the response is likely malformed in a way that makes TIdHTTP expect more data than is actually being sent. Either way, all you can do is assign a non-infinite timeout to TIdHTTP.
it didn't do anything specific regarding time-outs, either. I have the impression it hangs forever, or at least > 10 minutes.
Indy is designed to use infinite timeouts by default. You can assign custom timeouts to TIdHTTP's ConnectTimeout and ReadTimeout properties.
Setting this prevents the HTTP protocol downgrade:
IdHTTP.HTTPOptions := IdHTTP.HTTPOptions + [hoKeepOrigProtocol];
This is, of course, dependent upon how the server processes the protocol specification, and whether or not it results in issues.

Chromium Edge - Javascript seems to be affected by automatic checks for Edge updates

We have a single-page web application. One of the functions of the application is to supervise the connection path from the client back to the server. This is implemented with a periodic AJAX HTTP request in JavaScript to the server every 60 seconds. This request acts as a heartbeat.
After a session is started, the server looks for that heartbeat. If it fails to receive a heartbeat request after a reasonable amount of time, it takes specific action.
The client also looks for a response to that heartbeat request. If it fails to receive a response after a reasonable amount of time, it displays a message on the screen via javascript.
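For illustration only (the poster's client is a browser SPA and none of these names come from the question), here is a rough sketch in Go of the supervision pattern described above: the client hits a heartbeat endpoint every 60 seconds, and a server-side watchdog takes its "specific action" if no heartbeat has arrived within a grace period.

package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

func main() {
	var (
		mu       sync.Mutex
		lastBeat = time.Now()
	)

	// Heartbeat endpoint the client calls every 60 seconds.
	http.HandleFunc("/heartbeat", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		lastBeat = time.Now()
		mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	})

	// Server-side watchdog: if no heartbeat arrives within the grace period,
	// take whatever "specific action" the application needs.
	const grace = 3 * time.Minute
	go func() {
		for range time.Tick(30 * time.Second) {
			mu.Lock()
			silent := time.Since(lastBeat)
			mu.Unlock()
			if silent > grace {
				log.Printf("no heartbeat for %v, taking action", silent.Round(time.Second))
			}
		}
	}()

	log.Fatal(http.ListenAndServe(":8080", nil))
}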
We are getting reports from the field where a Chromium-based version of Edge is failing. Communication between the client and server is apparently failing. The server is seeing those heartbeat requests cease, and it takes that specific action. However, the client is not taking the expected action on its side: it is not displaying the message indicating a failed heartbeat request. It almost appears as though the JavaScript stopped running altogether.
The thing is, though… The customer has reported that if they disable automatic updates to Microsoft Edge the application runs fine. If the checking of updates is allowed to occur, the application eventually fails as described above. Note that this is apparently happening when Edge is just checking for updates - it's already up to date.
Updates are being turned off using several guid-named registry keys at [HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\EdgeUpdate].
Any thoughts?

Using Gatling load testing, how do requests get classified as "KO"?

I am using https://gatling.io for load testing an application. I really appreciate the tool's default reporting. After searching the documentation, it's not clear to me how particular requests get classified as "KO" (i.e. not OK).
We are currently using all the default settings from Gatling.
From inspecting gatling.conf, we suspect that requests need to respond within 10 seconds.
Is this assumption correct?
Anything that fails a check() will be a KO.
Even if you haven't added a check yourself, Gatling checks for an HTTP 2xx or 3xx response by default:
https://gatling.io/docs/2.3/http/http_check/
If you're seeing KOs and haven't added any checks, it's likely you're getting some 4xx or 5xx responses.

App architecture when 'state changing' APNS fails

I've seen several questions on this topic, but all of them simply say you just have to recover by other means, and none explain what those other means are! I couldn't find an answer on SO. This is also a follow-up from the comments of this question.
Let's say I'm working on an Uber-like app. Drivers need to know passenger locations.
A passenger sets a pickup location for 123 XYZStreet. 2 minutes later she decides to cancel the entire pickup. So now I need to inform the driver. This is an important state changing update.
The first thought that comes to mind is:
Send a notification that has content-available: 1 so I can update the app as soon as the notification arrives; in didReceiveNotification I call GET(PassengerInfoModel), and also include "alert" : "Pickup has been canceled. Don't go there" so the driver is also informed visually. Obviously, tapping on the notification is not what manages the updates; content-available being set to 1 will manage that.
But even doing that, what happens when that notification fails to arrive at all? Then the latest GET(PassengerInfoModel) won't happen. As a solution I've heard of a HEAD request:
The HEAD method is identical to GET except that the server MUST NOT
return a message-body in the response. The metainformation contained
in the HTTP headers in response to a HEAD request SHOULD be identical
to the information sent in response to a GET request. This method can
be used for obtaining metainformation about the entity implied by the
request without transferring the entity-body itself. This method is
often used for testing hypertext links for validity, accessibility,
and recent modification.
I'm not sure what happens if, using a HEAD request, we figure out that there was an update. Do we then make a GET request in the success case of the HEAD's completion handler?
Question 1: How should we handle the HEAD request response? (I'm guessing that for the server to be able to route HEAD requests there must be some changes, but let's just assume that's outside the scope of the question.)
Question 2: How often do we have to make this request? Based on this comment, one solution could be to set a repeating timer in viewDidAppear, e.g. make a HEAD request every 2 minutes. Is that a good idea?
Question 3: Now let's say we did that HEAD request, but the GET(PassengerInfoModel) is requested from 2 other scenes/viewControllers as well. The server can't differentiate between the different scenes/viewControllers. I'm guessing a solution would be to have all our app's network requests managed through a singleton NetworkHandler. Is that a good idea?
I understand that this question is broad, but I believe the issue needs to be addressed as a whole.
Question 1: How should we handle the HEAD request response? (I'm guessing that for the server to be able to route HEAD requests there must be some changes, but let's just assume that's outside the scope of the question.)
You probably don't need to deal with HEAD requests. Using ETags is a standard mechanism that lets you make a GET request, and the server can just return an empty body with a 304 response if nothing has changed, or the actual new content if something has.
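As a rough illustration of that ETag/304 flow (sketched in Go to stay self-contained; the handler, path, and header values are made up, and the mechanism is the same from an iOS client): the server tags the current state with an ETag, and a poll that presents the same tag via If-None-Match gets back an empty 304 instead of the full body.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

func main() {
	// Would change whenever the passenger's pickup state changes.
	const currentETag = `"passenger-info-v2"`

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", currentETag)
		if r.Header.Get("If-None-Match") == currentETag {
			w.WriteHeader(http.StatusNotModified) // nothing changed: empty body
			return
		}
		fmt.Fprint(w, `{"pickup_state":"requested"}`) // changed (or first fetch): full body
	}))
	defer srv.Close()

	// First poll: no cached ETag yet, so the full body comes back.
	resp, err := http.Get(srv.URL + "/passenger-info")
	if err != nil {
		panic(err)
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	etag := resp.Header.Get("ETag")
	fmt.Printf("first poll: %s etag=%s body=%s\n", resp.Status, etag, body)

	// Later polls: present the cached ETag; a 304 means there is nothing new to fetch.
	req, _ := http.NewRequest(http.MethodGet, srv.URL+"/passenger-info", nil)
	req.Header.Set("If-None-Match", etag)
	resp2, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp2.Body.Close()
	fmt.Printf("second poll: %s\n", resp2.Status)
}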
Question 2: How often do we have to make this request? Based on this comment, one solution could be to set a repeating timer in viewDidAppear, e.g. make a HEAD request every 2 minutes. Is that a good idea?
I think this is reasonable, especially if you want to inform your user when you are unable to make that request successfully. You might also consider using Apple's Reachability code to detect when you can or cannot talk to your server.
Question 3: Now let's say we did that HEAD request, but the GET(PassengerInfoModel) is requested from 2 other scenes/viewControllers as well. The server can't differentiate between the different scenes/viewControllers. I'm guessing a solution would be to have all our app's network requests managed through a singleton NetworkHandler. Is that a good idea?
Yes, I think having a singleton is reasonable, though I'm not sure why the server cares which view controller is making the request. Can't they just request different URLs?

In Rails 3, how do I call some code via a controller but completely after the Request/Response cycle is done?

I have a very weird situation: I have a system where a client app (Client) makes an HTTP GET call to my Rails server, and that controller does some handling and then needs to make a separate call to the Client via a different pathway (i.e. it actually goes via Rabbit to a proxy and the proxy calls the Client). I can't change the pathway for that different call and I can't change the Client at all (it's a 3rd party system).
However, the issue is that the call via the different pathway fails UNLESS the HTTP GET from the Client has completed.
So I'm trying to figure out: is there a way to have Rails finish the HTTP GET response and then make this additional call?
I've tried:
1) after_filter: this doesn't work because the after filter is apparently still within the Request/Response cycle so the TCP/HTTP response back to the Client hasn't completed.
2) enqueuing a worker: this works, but it is not ideal because if the workers are backed up, this call back to the client may not happen right away, and it really does need to happen right after the Client calls the Rails app.
3) starting a separate thread: this may work, but it makes me nervous: adding threading explicitly in Rails could be fraught with peril.
I welcome any ideas/suggestions.
Again, in short, the goal is: process the HTTP GET call to the Rails app and return a 200 OK back to the Client, completely finishing the HTTP request/response cycle, and then call some extra code.
I can provide any further details if that would help. I've seen both #1 and #2 recommended, but neither of them is quite what I need.
Ideally, there would be some "after_response" callback in Rails that allows some code to run but after the full request/response cycle is done.
Possibly use an around filter? Around filters allow us to define methods that wrap around every action that Rails calls. So if I had an around filter for the above controller, I could control the execution of every action, execute code before and after calling the action, and also completely skip calling the action under certain circumstances if I wanted to.
So what I ended up doing was using a gem that I had long ago helped with: Spawnling
It turns out that this works well, although it required a tweak to get it working with Rails 3.2. It allows me to spawn a thread to do the extra, out-of-band callback to the Client, but lets the normal controller process complete. And I don't have to worry about thread management or ActiveRecord connection management; Spawnling handles that.
It's still not ideal, but pretty close. And it's slightly better than enqueuing a Resque/Sidekiq worker as there's no risk of worker backlog causing an unexpected delay.
I still wish there was an "after_response_sent" callback or something, but I guess this is too unusual a request.

Resources