Handling overlapping asynchronous POST requests

We have implemented the approach from "How to Use POST for Asynchronous Tasks" in our project for long-running tasks.
My question is how to handle multiple POST requests issued back-to-back from the same client. In this case, the client is really only interested in the last POST request. Each POST triggers a background thread, so a subsequent POST triggers yet another background thread.
This effectively means that every POST request except the last results in wasted computation.
Are there any design patterns to address this?

I don't believe there is a commonly accepted pattern for this situation.
There are at least 2 situations where the duplicate POST can arise:
The client re-transmits the original POST for whatever reason (for example, it missed the response).
The user re-transmits the original POST (intentionally or not), or issues a new POST intended to replace the earlier ones.
It's best to deal with the problem on the server side.
One solution is to add a unique identifier (such as a GUID or a natural key) to each POST, and then have the server check whether a request with that identifier already exists. Your server will need to save state somewhere while an existing POST is being processed (cookie, session, local file, and database are options).
Note that if you use a GUID, you must make sure the client does not create a new GUID when it re-transmits the same request. You can avoid this problem by using a natural key instead of a GUID generator. The generation of the natural key should be repeatable and (reasonably) unique.
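As a rough illustration of the natural-key idea, here is a minimal Python sketch. It assumes the request body is JSON-serializable and that an in-memory dictionary stands in for the server-side state; the names natural_key and JobRegistry are hypothetical and not part of any framework.

```python
import hashlib
import json
import uuid


def natural_key(client_id, payload):
    """Derive a repeatable key from the client and the request body."""
    canonical = json.dumps({"client": client_id, "payload": payload}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class JobRegistry:
    """In-memory stand-in for the server-side state (session, file, database)."""

    def __init__(self):
        self._jobs = {}  # natural key -> job id

    def submit(self, client_id, payload):
        """Return (job_id, created). A re-transmit maps to the existing job."""
        key = natural_key(client_id, payload)
        if key in self._jobs:
            return self._jobs[key], False
        job_id = str(uuid.uuid4())
        self._jobs[key] = job_id
        # A real server would start the background task here.
        return job_id, True

    def finish(self, client_id, payload):
        """Forget the key once the job is done so a later request can run again."""
        self._jobs.pop(natural_key(client_id, payload), None)
```

Because the key is computed from sorted JSON, the same request yields the same key regardless of field order, which is what makes re-transmits detectable.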

Related

App architecture when 'state changing' APNS fails

I've seen several questions on this topic. But all simply say you just have to recover from other means. But none explain what the other means are! I couldn't find an answer on SO. This is also a follow up from the comments of this question.
Let's say I'm working on a Uber app. Drivers need to know passenger locations.
A passenger sets a pickup location for 123 XYZStreet.
2 minutes later she decides to cancel the entire pickup. So now I need to inform the driver. This is an important state changing update.
The first thought that comes to mind is:
Send a notification that has content-available:1 so I can update the app as soon as the notification arrives, call GET(PassengerInfoModel) in didReceiveNotification, and also include "alert" : "Pickup has been canceled. Don't go there" so the driver is visually informed as well. Obviously tapping on the notification is not what manages the updates. Setting content-available to 1 manages that.
Even so, what happens when that notification fails to arrive at all? Then the latest GET(PassengerInfoModel) won't happen. As a solution I've heard of a HEAD request:
The HEAD method is identical to GET except that the server MUST NOT
return a message-body in the response. The metainformation contained
in the HTTP headers in response to a HEAD request SHOULD be identical
to the information sent in response to a GET request. This method can
be used for obtaining metainformation about the entity implied by the
request without transferring the entity-body itself. This method is
often used for testing hypertext links for validity, accessibility,
and recent modification.
I'm not sure what should happen if the HEAD request shows that there was an update. Do we then make a GET request in the success case of the HEAD's completion handler?
Question1 How should we handle the HEAD request response? (I'm guessing that for the server to be able to route HEAD requests, there must be some changes, but let's just assume that's outside the scope of the question).
Question2 How often do we have to do this request? Based on this comment one solution could be to set a repeating timer in the viewDidAppear e.g. make a HEAD request every 2 minutes. Is that a good idea?
Question3 Now let's say we did that HEAD request, but the GET(PassengerInfoModel) is requested from 2 other scenes/viewControllers as well. The server can't differentiate between the different scenes/viewControllers. I'm guessing a solution would be have all our app's network requests managed through a singleton NetworkHandler. Is that a good idea?
I understand that this question is broad, but I believe the issue needs to be addressed as a whole.
Question1 How should we handle the HEAD request response? (I'm guessing that for the server to be able to route HEAD requests, there must be some changes, but let's just assume that's outside the scope of the question).
You probably don't need to deal with HEAD requests. ETags are a standard mechanism that lets you make a GET request; the server returns an empty body with a 304 response if nothing has changed, or the new content if something has.
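To make the ETag exchange concrete, here is a small Python simulation with no real networking. PassengerInfoServer and Client are hypothetical stand-ins, and deriving the ETag from a hash of the body is an assumption; a real server may compute it differently. The client sends its last known ETag as the If-None-Match value and only replaces its cached copy on a 200.

```python
import hashlib


class PassengerInfoServer:
    """Toy server: holds one resource and answers conditional GETs."""

    def __init__(self, body):
        self.body = body

    def _etag(self):
        # Assumption: the ETag is a short hash of the current body.
        return '"' + hashlib.sha256(self.body.encode("utf-8")).hexdigest()[:16] + '"'

    def get(self, if_none_match=None):
        """Return (status, etag, body); the body is None on 304 Not Modified."""
        etag = self._etag()
        if if_none_match == etag:
            return 304, etag, None
        return 200, etag, self.body


class Client:
    """Keeps the last ETag and body, and revalidates on each refresh."""

    def __init__(self, server):
        self.server = server
        self.etag = None
        self.cached = None

    def refresh(self):
        status, etag, body = self.server.get(if_none_match=self.etag)
        if status == 200:
            self.etag, self.cached = etag, body
        return status
```

A periodic refresh() then costs very little when nothing has changed, and needs no separate HEAD followed by GET.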
Question2 How often do we have to do this request? Based on this comment one solution could be to set a repeating timer in the viewDidAppear e.g. make a HEAD request every 2 minutes. Is that a good idea?
I think this is reasonable, especially if you want to inform your user when you are unable to make that request successfully. You might also consider using Apple's Reachability code to detect when you can or cannot talk to your server.
Question3 Now let's say we did that HEAD request, but the GET(PassengerInfoModel) is requested from 2 other scenes/viewControllers as well. The server can't differentiate between the different scenes/viewControllers. I'm guessing a solution would be have all our app's network requests managed through a singleton NetworkHandler. Is that a good idea?
Yes, I think having a singleton is reasonable, though I'm not sure why the server cares which view controller is making the request. Can't they simply request different URLs?

How to avoid Race Conditions when a restful architecture doesn't make sense

I'm making a website that I don't think makes sense to implement with a restful architecture (at least not the portion relevant to this problem), but it's causing some problems with race conditions across multiple servers which share a database.
My website has info about users of another product, so it has a Users table (not users of my site though). Users have many files.
Users and files are populated by an automated service, not manually on the site. The service posts the files to the server, and the server parses them and gets the username from the file. If the username is new, it creates a new user row in the table. It then returns information about the file to the service that made the request.
The problems I'm seeing are race conditions when multiple requests come in at the same time for related objects, which cause things like violations of unique indexes in the db.
For example, there is a unique key on username. This code can be a problem if 2 requests from the automated service for files from the same user come in at the same time.
var myuser = db.users.FirstOrDefault(u => u.username == username);
if (myuser == null)
{
    myuser = new user(username);
    db.AddObject(myuser);
}
db.SaveChanges();
Request 1 will see that there is no user with username foo, so the if condition returns true. Request 2 sees the same thing, not knowing that request 1 already began creating the user, and when request 2 tries to save, it violates the unique key.
Is there a common pattern or solution to this problem? I know this wouldn't be a problem if the server was RESTful, but I don't think it's really feasible for the service to change the way it makes requests, so I'd like that to stay the same if possible. Right now, it just posts the file to the server, not knowing whether the user of that file existed already, or whether that file was posted to the server yet (it may post it more than once). Those objects are created if they don't exist yet, and if they do, the list of items is updated. But as far as the service is concerned, it just wants to know certain info about the file, and isn't concerned with whether or not it already exists in my db.
I think it'd be too slow for it to try to create a user via a request, then try to create the file via a request, and then request info about the file in another request. Also, the service runs multiple requests at a time via Parallel.ForEach, and it'd be too slow for it to run it in a single thread.
The first thing is separation of concerns. If you have an automated service populating the data, then that service (or another piece of middleware) should be responsible for creating the database records. This shouldn't happen at run time in response to a request to your website.
Second, if you must do it this way, that is what locks are for. Each request to your website runs in its own thread(s). So, if multiple threads need to access the same shared resource (your DB), then you need to institute locking, so that the first thread in wins and any further threads can only interact with that table or row (depending on the type of lock) once the first has completed its work.
Third, this is pretty much exactly what RESTful architecture attempts to solve. You can use ETags to version your resources so any attempt to POST to an outdated resource will return an HTTP error (409 Conflict) directing the client to refetch the original resource.
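One way to get the "first one in wins" behavior without an application-level lock is to let the unique index itself arbitrate the race, then read the row back. The following is only a sketch in Python with SQLite, not the asker's C# stack; INSERT OR IGNORE is SQLite syntax, and the table layout and the function name get_or_create_user are assumptions made for illustration.

```python
import sqlite3


def get_or_create_user(conn, username):
    """Return the user's row id, creating the row if it does not exist yet."""
    with conn:  # commits on success, rolls back on error
        # The unique index decides the race: a losing insert is silently ignored.
        conn.execute("INSERT OR IGNORE INTO users (username) VALUES (?)", (username,))
    row = conn.execute(
        "SELECT id FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row[0]


# In-memory database used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL)"
)
```

Whichever request inserts first creates the row; every other request falls through to the SELECT and gets the same id instead of a unique-key violation.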

Rails application design: Queueing, Resque, Background Services, and Redis

I am designing a Rails app that takes in requests, uses data within the request to call a 3rd party web service, processes the reply, and then sends out a response to the original requestor and also issues a PUT request to yet another service.
I am trying to wrap my head around how to design this Rails app as it's different from the canonical Rails structure.
The objects are Lists and Tasks. Each List has many Tasks, and each Task belongs to a List.
The request I would get is something like:
http://myrailsapp.heroku.com/v1/lists?id=1&from=2012-02-12&to=2012-02-14&priority=high
In this example I am requesting tasks from 2/12/2012 to 2/14/2012 with a high priority in List #1
I would then issue a 3rd party web service call like this:
http://thirdpartywebservice.com/v1/lists?id=4128&from=2012-02-12&to=2012-02-14&priority=high
As you can see some processing was done on the data (id was changed in this case)
The results are then sent back to the requestor and to another web service via PUT.
My question is, how do I set up the Rails app to handle these types of behaviors? How does the controller structure change? This looks like a good use case for queues, how do I distribute multiple concurrent requests among queues?
For one thing I don't need data persistence (data can be discarded after the response is sent out) and also data structure design is simplified. (I don't think I need ruby objects, simply dictionaries or hashes representing these would be lighter weight and quicker to implement)
Edit
So I broke down the work flow of the app into these components
Parse incoming request
Construct 3rd party web service request
Send 3rd party request
Enqueue a worker to process the expected response
Process the response once it arrives
Send the parsed result back as a response
Which of the standard ruby controllers handle each of these steps? What are the models needed besides Lists and Tasks?
You should still use a database because passing data to Resque is messy. Rather, you should store it in the database and then pass the id to the workers, fetch the data, commit any new data or delete the record. It's really up to you but this method is cleaner. You can also use a push service like faye to let the user know when the processing is complete.
If you expect to have many concurrent requests, I would recommend Sidekiq as it's less of a memory hog. Having 4-5 resque workers can already suck up about 512 MB. The controller structure should not change. Please comment on anything you need clarified and I'll be happy to update my answer.
EDIT
You would want to use a separate database store, such as Postgres. Not sure if it's important what models you need, but essentially this is what should be happening.
In your controller, create a Request object which contains the query params you want to query this 3rd party service with. Then enqueue a job to be handled by Sidekiq/Resque, let's call this ThirdPartyRequest and pass in the id of the Request object you just created as an argument. Then render a view here showing the Request object. Let's say that Request#response is still empty cause it hasn't been processed yet, so let the user know it's still processing.
A worker then handles your job ThirdPartyRequest. ThirdPartyRequest should fetch the Request object and obtain the query params needed to contact the third party service. It makes the call and gets a response. Update the Request object with this response, then save it.
class ThirdPartyRequest
  def self.perform(request_id)
    request = Request.find(request_id)
    # contact third party service
    request.response = ...
    request.save
  end
end
The user can continually refresh the page to check on their Request object. Once it gets updated with the response, they will know it's completed. If you want the page to refresh automatically, look into faye/juggernaut/private_pub or a SaaS solution like Pusher.
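The same store-the-record, enqueue-the-id flow can be sketched outside Rails. The Python below is only an illustration: a dictionary stands in for the database, queue.Queue stands in for Resque/Sidekiq, and call_third_party is a hypothetical placeholder for the real web service call.

```python
import queue
import threading

request_store = {}    # request id -> record; stands in for the database
jobs = queue.Queue()  # stands in for the Resque/Sidekiq queue


def call_third_party(params):
    """Hypothetical placeholder for the real third party call."""
    return {"echo": params}


def create_request(request_id, params):
    """Controller step: store the record, then enqueue only its id."""
    request_store[request_id] = {"params": params, "response": None}
    jobs.put(request_id)


def worker():
    """Worker step: load the record by id and fill in the response."""
    while True:
        request_id = jobs.get()
        if request_id is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        record = request_store[request_id]
        record["response"] = call_third_party(record["params"])
        jobs.task_done()
```

Passing only the id keeps the queue payload small, and a record whose response is still None is the "still processing" state the page can show.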

When should I use HttpDelete or HttpPut in an asp.net mvc application

I always use HttpGet or HttpPost, even when my action executes a delete on the database.
What should I use HttpDelete/HttpPut for, then?
Web browsers only support GET and POST, so if you are building a web site, there is no need for PUT or DELETE. If you are building a RESTful api, though, PUT and DELETE are the way to go if you want your users to be able to put and/or delete stuff.
EDIT: It seems browsers do support DELETE and PUT in their implementations of XMLHttpRequest. Hence you can use them in Ajax requests. HTML forms, though, do not support them.
You would use them if you build an OData service:
HTTP DELETE - Deletes the entity data that the specified resource represents. A payload is not present in the request or response messages.
HTTP PUT - Replaces existing entity data at the requested resource with new data that is supplied in the payload of the request message. (msdn)
There's a presentation with Scott Hanselman that might be interesting. (I haven't seen it yet.)
There's also a couple of lectures on pluralsight on OData if you have a subscription there.
I guess you have understood the use of the DELETE request, but PUT is a little different.
If I'm creating a new resource on the server and I decide the URI through which it can be accessed, then I'll go for PUT. In most cases the URI is decided by the server, and hence POST is used for creation and PUT usually for update.
Finally, like GET, both DELETE and PUT are idempotent, meaning that no matter how many times the client sends the same request in a row, the server ends up in the same state as after the first request.
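The difference can be shown with a toy in-memory store. This Python sketch is an illustration only; ResourceStore and its URI scheme are made up and do not correspond to any ASP.NET MVC API.

```python
class ResourceStore:
    """Toy resource collection contrasting POST with PUT and DELETE."""

    def __init__(self):
        self.items = {}
        self._next_id = 1

    def post(self, data):
        """Server picks the URI; every call creates another resource."""
        uri = f"/items/{self._next_id}"
        self._next_id += 1
        self.items[uri] = data
        return uri

    def put(self, uri, data):
        """Client picks the URI; repeating the call changes nothing further."""
        self.items[uri] = data

    def delete(self, uri):
        """Deleting twice leaves the same state as deleting once."""
        self.items.pop(uri, None)
```

Sending the same PUT or DELETE twice leaves the store exactly as it was after the first call, while each repeated POST adds one more item.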

Client Server API pattern in REST (unreliable network use case)

Let's assume we have a client/server interaction happening over unreliable network (packet drop). A client is calling server's RESTful api (over http over tcp):
issuing a POST to http://server.com/products
server is creating an object of "product" resource (persists it to a database, etc)
server is returning 201 Created with a Location header of "http://server.com/products/12345"
! TCP packet containing an http response gets dropped and eventually this leads to a tcp connection reset
I see the following problem: the client will never get an ID of a newly created resource yet the server will have a resource created.
Questions: Is this application level behavior or should framework take care of that? How should a web framework (and Rails in particular) handle a situation like that? Are there any articles/whitepapers on REST for this topic?
The client will receive an error when the server does not respond to the POST. The client would then normally re-issue the request as they assume that it has failed. Off the top of my head I can think of two approaches to this problem.
One is that the client can generate some kind of request identifier, such as a GUID, which it includes in the request. If the server receives a POST request with a duplicate GUID then it can refuse it.
The other approach is to PUT instead of POST to create. If you cannot get the client to generate the URI then you can ask the server to provide a new URI with a GET and then do a PUT to that URI.
If you search for something like "make POST idempotent" you will probably find a bunch of other suggestions on how to do this.
If it isn't reasonable for duplicate resources to be created (e.g. products with identical titles, descriptions, etc.), then unique identifiers can be generated on the server which can be tracked against created resources to prevent duplicate requests from being processed. Unlike Darrel's suggestion of generating unique IDs on the client, this would also prevent separate users from creating duplicate resources (which you may or may not find desirable). Clients will be able to distinguish between "created" responses and "duplicate" responses by their response codes (201 and 303 respectively, in my example below).
Pseudocode for generating such an identifier — in this case, a hash of a canonical representation of the request:
func product_POST
    // the canonical representation need not contain every field in
    // the request, just those which contribute to its "identity"
    tags = join sorted request.tags
    canonical = join [request.name, request.maker, tags, request.desc]
    id = hash canonical
    if id in products
        http303 products[id]
    else
        products[id] = create_product_from request
        http201 products[id]
    end
end
This ID may or may not be part of the created resources' URIs. Personally, I'd be inclined to track them separately — at the cost of an extra lookup table — if the URIs were going to be exposed to users, as hashes tend to be ugly and difficult for humans to remember.
In many cases, it also makes sense to "expire" these unique hashes after some time. For example, if you were to make a money transfer API, a user transferring the same amount of money to the same person a few minutes apart probably indicates that the client never received the "success" response. If a user transfers the same amount of money to the same person once a month, on the other hand, they're probably paying their rent. ;-)
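A hedged Python sketch of the hash-plus-expiry idea follows. The class name ExpiringDedup, the field separator, and the injectable clock are assumptions made for illustration; the 201/303 status codes follow the pseudocode above.

```python
import hashlib
import time


class ExpiringDedup:
    """Treats an identical request as a duplicate only within a time window."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the window can be tested
        self._seen = {}     # request hash -> (created_at, resource id)
        self._count = 0

    @staticmethod
    def request_hash(name, maker, tags, desc):
        # Only the fields that contribute to "identity" go into the hash.
        canonical = "\x1f".join([name, maker, ",".join(sorted(tags)), desc])
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def create(self, name, maker, tags, desc):
        """Return (status, resource_id): 201 if new, 303 for a recent duplicate."""
        key = self.request_hash(name, maker, tags, desc)
        now = self.clock()
        entry = self._seen.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return 303, entry[1]
        self._count += 1
        self._seen[key] = (now, self._count)
        return 201, self._count
```

With a window of a few minutes, an immediate retry is answered with 303 and the original resource, while the same request a month later creates a new one.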
The problem as you describe it boils down to avoiding what are called double-adds. As mentioned by others, you need to make your posts idempotent.
This can be easily implemented at the framework level. The framework can keep a cache of completed responses. Each request must carry a unique request ID so that any retries are recognized as retries, and not treated as new requests.
If the successful response gets lost on its way to the client, the client will retry with the same request ID, and the server will then respond with its cached response.
You are left with the durability of the cache, how long to keep responses, etc. One approach is to remove responses from the server cache after a given period of time; this will depend on your app domain and traffic and can be left as a configurable setting in the framework piece. Another approach is to force the client to send acknowledgements. The acks can be sent either as separate requests (note that these could be lost too), or as extra data piggybacked on real requests.
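A minimal Python sketch of such a response cache, assuming the client supplies the request ID and later acknowledges receipt. ResponseCache and its method names are hypothetical, and durability and time-based eviction are left out.

```python
class ResponseCache:
    """Replays the stored response when a request ID is seen again."""

    def __init__(self):
        self._responses = {}  # request id -> cached response
        self.executions = 0   # how many times a handler actually ran

    def handle(self, request_id, handler):
        """Run handler once per request ID; retries get the cached response."""
        if request_id in self._responses:
            return self._responses[request_id]
        self.executions += 1
        response = handler()
        self._responses[request_id] = response
        return response

    def acknowledge(self, request_id):
        """The client confirmed receipt, so the cached response can be dropped."""
        self._responses.pop(request_id, None)
```

A retry with the same ID never reaches the handler a second time, so the resource is created once even if the first response was lost.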
Although what I suggest is similar to what others suggest, I strongly encourage you to keep this layer of network resiliency focused on one job: dealing with dropped requests/responses. Don't let it handle duplicate resources from separate requests, which is an application-level task. Merging both pieces will mush all the functionality together and will not leave you with a clear separation of responsibilities.
Not an easy problem, but if you keep it clean you can make your app much more resilient to bad networks without introducing too much complexity.
And for some related experiences by others go here.
Good luck.
As the other responders have pointed out, the basic problem here is that the standard HTTP POST method is not idempotent like the other methods. There is an effort underway to establish a standard for an idempotent POST method known as Post-Once-Exactly, or POE.
Now I'm not saying that this is a perfect solution for everybody in the situation you describe, but if it is the case that you are writing both the server and the client, you may be able to leverage some of the ideas from POE. The draft is here: https://datatracker.ietf.org/doc/html/draft-nottingham-http-poe-00
It isn't a perfect solution, which is probably why it hasn't really taken off in the six years since the draft was submitted. Some of the problems, and some clever alternate options are discussed here:
http://tech.groups.yahoo.com/group/rest-discuss/message/7646
HTTP is a client-initiated request/response protocol, meaning the server can't open an HTTP connection to the client. All connections are initiated by the client. So you can't solve such an error on the server side alone.
The only solution I can think of: if you know which client created the product, you can supply it with the products it created when it pulls that information. If the client never contacts you again, you won't be able to transmit information about the new product.

Resources