I have implemented a RESTful API with a few resources, for example:
/products/
/products/1
/products/2
/categories/
/categories/1
/categories/2
etc.
Now I have been told that the app should mainly work offline, so I need to get all the data from the API and store it locally.
Since I am not providing a single chunk of data, but rather several resource URIs that need to be called in order to get all the data, I was wondering if this could be a problem.
How does this work? Will there be many HTTP calls, or will one call do everything?
What is the best approach in this case?
Are these endpoints in themselves?
/products
/categories
It's a pretty well-established convention for those to return the entire collection. You could even add some request parameters for filtering, etc.
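For illustration, a Rails-style index action for such a collection endpoint might look roughly like this (the controller, model, and parameter names here are hypothetical):
# Hypothetical index action: GET /products returns the whole collection,
# optionally narrowed by a query parameter such as ?category_id=1.
class ProductsController < ApplicationController
  def index
    products = Product.all
    products = products.where(category_id: params[:category_id]) if params[:category_id].present?
    render json: products
  end
end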
Each URI represents a single piece of data. The main idea of REST is that, instead of having randomly named setter and getter URLs and using GET for all the getters and POST for all the setters, we try to have the URLs identify resources, and then use the HTTP verbs GET, POST, PUT and DELETE to act on them.
So, using AFNetworking, for example, you get all the benefits of this architecture.
The download flow could look like this:
Ask the server for a specific resource with a GET request
Save the response on a background thread
Ask for the next piece of data
Of course, if you cannot add a new endpoint that returns everything in one response, you will have to download each resource separately (a rough sketch follows the list below):
/products/
/products/1
/products/2
/categories/
/categories/1
/categories/2
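The same idea, language aside (the answer above assumes AFNetworking on iOS), as a minimal Ruby sketch; the host is a placeholder, and it assumes each collection endpoint returns a JSON array of objects with an id field:
require 'net/http'
require 'json'

HOST  = 'example.com'                  # placeholder for your API host
PATHS = ['/products/', '/categories/']

local_store = {}                       # stand-in for whatever local persistence the app uses

PATHS.each do |path|
  # Fetch the collection first...
  list = JSON.parse(Net::HTTP.get(HOST, path))
  local_store[path] = list
  # ...then fetch each individual resource it lists.
  list.each do |item|
    item_path = "#{path}#{item['id']}"
    local_store[item_path] = JSON.parse(Net::HTTP.get(HOST, item_path))
  end
end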
Setting up your endpoints in this way will allow for a user of your app to retrieve a single product/category or a list of products/categories.
Here's what each of these API endpoints should do when they are called.
/products - returns a list of products
/categories - returns a list of categories
/products/:id - returns the product with the specified id
/categories/:id - returns the category with the specified id
As far as allowing the app to work mostly offline goes, the best approach is to do some caching on the client (app) side. Every time a call is made to one of these endpoints for the first time, the result should be stored somewhere on the client. The next time that same request is made, the data has already been retrieved, so no network call needs to be made and the app works offline. The first call, however, does need a network connection.
This can be implemented with a dictionary, where the key is the request (/products, /categories/1, etc.) and the value is the result returned from the API. Every time a request is made, your app should first check whether the data already exists on the client side. If it does, no network call is needed and the app can just return the data that is already present on the client.
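A minimal sketch of that dictionary-based cache (the host is a placeholder, and a real app would persist the hash to disk rather than keep it in memory):
require 'net/http'

class ApiCache
  def initialize(host)
    @host  = host
    @cache = {}   # key: request path ("/products", "/categories/1", ...), value: response body
  end

  def get(path)
    # Serve from the local copy if we already have one (works offline)...
    return @cache[path] if @cache.key?(path)
    # ...otherwise fall back to the network and remember the result.
    @cache[path] = Net::HTTP.get(@host, path)
  end
end

api = ApiCache.new('example.com')
api.get('/products')   # first call hits the network
api.get('/products')   # second call is answered from the cache, no network needed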
Related
I've created an application and a paginated API which are hooked up to each other. However, I'm a bit confused about what the best practice is for only showing updated data. For instance, if I retrieve data one day and save it into my mobile database, how will the app know the next day that it should make a request and only show the data that has just been fetched? Do I need some kind of flag, or should I look at createdAt?
When making the request, include either the If-None-Match header with the local resource's ETag or the If-Modified-Since header with the date the local resource was requested.
Configure your server to look for the header and return a 304 Not Modified if the data hasn't changed. That will at least save you some traffic on the responses.
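As a rough Ruby sketch of such a conditional request (the URL and the stored validator values are placeholders):
require 'net/http'

# Validators remembered from the previous response for this resource.
stored_etag          = '"abc123"'                          # placeholder value
stored_last_modified = 'Tue, 14 Feb 2012 10:00:00 GMT'     # placeholder value

uri = URI('http://example.com/products/1')
req = Net::HTTP::Get.new(uri)
req['If-None-Match']     = stored_etag          if stored_etag
req['If-Modified-Since'] = stored_last_modified if stored_last_modified

res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }

if res.is_a?(Net::HTTPNotModified)
  # 304: the local copy is still valid and no body was transferred.
  puts 'Using local copy'
else
  # 200: store the new body and remember the new validators for next time.
  puts "Fresh copy (#{res.body.bytesize} bytes), new ETag: #{res['ETag']}"
end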
In addition, if the resource data is relatively static, or if the client can tolerate stale data, then you can add caching headers to your response. As long as the cached response is still valid, the request will never leave your client.
Ideally, you want to design your API to support this where possible. For example, have the request "give me all things within 50 meters" return a list of URIs. Then the API only has to hit the server for those URIs which are stale.
I am designing a Rails app that takes in requests, uses data within the request to call a 3rd party web service, process the reply and then sends out a response to the original requestor and also issues a PUT request to yet another service.
I am trying to wrap my head around how to design this Rails app as it's different from the canonical Rails structure.
The objects are Lists and Tasks. Each List has many Tasks, and each Task belongs to a List.
The request I would get is something like:
http://myrailsapp.heroku.com/v1/lists?id=1&from=2012-02-12&to=2012-02-14&priority=high
In this example I am requesting tasks from 2/12/2012 to 2/14/2012 with a high priority in List #1
I would then issue a 3rd party web service call like this:
http://thirdpartywebservice.com/v1/lists?id=4128&from=2012-02-12&to=2012-02-14&priority=high
As you can see some processing was done on the data (id was changed in this case)
The results are then sent back to the requestor and to another web service via PUT.
My question is, how do I set up the Rails app to handle these types of behaviors? How does the controller structure change? This looks like a good use case for queues, how do I distribute multiple concurrent requests among queues?
For one thing, I don't need data persistence (the data can be discarded after the response is sent out), and the data structure design is simplified. (I don't think I need full Ruby objects; simple dictionaries or hashes representing these would be lighter weight and quicker to implement.)
Edit
So I broke the workflow of the app down into these components:
Parse incoming request
Construct the 3rd party web service request
Send 3rd party request
Enqueue a worker to process the expected response
Process the response once it arrives
Send the parsed result back as a response
Which of the standard Rails controllers handles each of these steps? What models are needed besides Lists and Tasks?
You should still use a database, because passing data to Resque is messy. Instead, store it in the database and pass the id to the worker, which fetches the data, commits any new data, or deletes the record. It's really up to you, but this method is cleaner. You can also use a push service like faye to let the user know when the processing is complete.
If you expect to have many concurrent requests, I would recommend Sidekiq as it's less of a memory hog. Having 4-5 resque workers can already suck up about 512 MB. The controller structure should not change. Please comment on anything you need clarified and I'll be happy to update my answer.
EDIT
You would want to use a separate database store, such as Postgres. Not sure if it's important what models you need, but essentially this is what should be happening.
In your controller, create a Request object which contains the query params you want to query this 3rd party service with. Then enqueue a job to be handled by Sidekiq/Resque; let's call it ThirdPartyRequest, and pass in the id of the Request object you just created as an argument. Then render a view showing the Request object. At this point Request#response is still empty because it hasn't been processed yet, so let the user know it's still processing.
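A rough controller sketch of that step (names mirror the description above; it assumes a Request model with serialized query_params and response columns, and routing, validation, and strong-parameter details will vary):
class RequestsController < ApplicationController
  def create
    # Persist the incoming query params so the worker can look them up by id.
    req = Request.create!(query_params: params.permit(:id, :from, :to, :priority).to_h)
    # Hand only the id to the background job (Resque shown; Sidekiq is analogous).
    Resque.enqueue(ThirdPartyRequest, req.id)
    # Show the Request; its response is still empty, so the view can say "processing...".
    redirect_to req
  end
end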
A worker then handles your ThirdPartyRequest job. ThirdPartyRequest fetches the Request object and obtains the query params needed to contact the third party service. It makes that call and gets a response back; update the Request object with this response, then save it.
class ThirdPartyRequest
  # With Resque, the class-level @queue tells it which queue to use
  # (a Sidekiq worker would look slightly different).
  @queue = :third_party_requests

  def self.perform(request_id)
    request = Request.find(request_id)
    # contact third party service
    request.response = ...
    request.save
  end
end
The user can keep refreshing the page to check on their Request object. Once it gets updated with the response, they will know it's completed. If you want the page to refresh automatically, look into faye/juggernaut/private_pub or a SaaS solution like Pusher.
I want to fetch the location of a person and of their connections, so how should I specify the fields for this purpose?
http://api.linkedin.com/v1/people/~/network/updates:(update-content:(person:(id,headline,location)))?type=CONN
If I have to make another call just to get the location, it will be very costly for me, as it will require an extra call for each new connection and will greatly increase the number of calls. So I want a solution that lets me get the location in the network updates API call itself.
EDIT: Another thing I need is to check the privacy settings of connections. As far as I know, LinkedIn doesn't provide any API that returns which connections allow their updates to be seen and which do not. So when I try to get network updates for a particular connection, it returns an error saying that this user doesn't allow the public to see their updates. If I want to check this before calling the network updates API, how can I do it in Ruby?
OR
Let me know of some way to pass multiple dynamic IDs when calling the LinkedIn API.
When retrieving person data associated with a Network Update, it appears that only the basic fields are available. The solution would be to get the id for the person and make a second call to the Profile API:
http://api.linkedin.com/v1/people/id=12345:(first-name,last-name,connections,location)
Currently, LinkedIn doesn't provide an API for this. You have to make multiple calls, but you should make those calls in chunks to avoid timeout issues.
Reference
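A rough Ruby sketch of that chunked approach, using the profile URL format from the earlier answer (OAuth signing and error handling are omitted, and the chunk size and pause are arbitrary):
require 'net/http'

FIELDS = '(first-name,last-name,connections,location)'

# Connection ids collected from the network-updates call.
connection_ids = %w[12345 67890 24680 13579]

connection_ids.each_slice(2) do |chunk|
  chunk.each do |id|
    # A real call must be OAuth-signed; Net::HTTP.get is only a placeholder here.
    body = Net::HTTP.get('api.linkedin.com', "/v1/people/id=#{id}:#{FIELDS}")
    puts body
  end
  sleep 1   # small pause between chunks to avoid hammering the API / hitting timeouts
end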
Try this API:
`String url = "https://api.linkedin.com/v1/people/~/connections:(id,first-name,last-name,location,picture-url,positions:(title,company:(name)))";`
I always use HttpGet or HttpPost, even when my action is executing a delete on the database.
What should I use HttpDelete/HttpPut for, then?
Web browsers only support GET and POST, so if you are building a web site, there is no need for PUT or DELETE. If you are building a RESTful API, though, PUT and DELETE are the way to go if you want your users to be able to put and/or delete stuff.
EDIT: It seems browsers do support DELETE and PUT in their implementations of XMLHttpRequest, so you can use them in Ajax requests. HTML forms, though, do not support them.
If you build an OData service, this is how they are used:
HTTP DELETE - Deletes the entity data that the specified resource represents. A payload is not present in the request or response messages.
HTTP PUT - Replaces existing entity data at the requested resource with new data that is supplied in the payload of the request message. (msdn)
There's a presentation with Scott Hanselman that might be interesting. (I haven't seen it yet.)
There's also a couple of lectures on pluralsight on OData if you have a subscription there.
I guess you have understood the use of the DELETE request, but PUT is a slightly different thing.
If I'm creating a new resource on the server and the URI through which it can be accessed is decided by me, then I'll go for PUT. In most cases, though, the URI is decided by the server, so POST is used for creation and PUT usually for updates.
One final thing: like GET, both DELETE and PUT are idempotent, meaning that no matter how many times the client sends the same request in sequence, the server should end up in the same state as after the first request.
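A hedged sketch of the difference in Ruby (the host, URIs, and payloads are placeholders):
require 'net/http'
require 'json'

Net::HTTP.start('example.com', 80) do |http|
  # POST: the server picks the URI of the new resource. Not idempotent;
  # repeating this request would create a second product.
  post = Net::HTTP::Post.new('/products', 'Content-Type' => 'application/json')
  post.body = { name: 'Widget' }.to_json
  http.request(post)

  # PUT: the client decides the URI up front. Idempotent; repeating it just
  # writes the same representation to the same resource again.
  put = Net::HTTP::Put.new('/products/widget-42', 'Content-Type' => 'application/json')
  put.body = { name: 'Widget' }.to_json
  http.request(put)

  # DELETE is idempotent too: deleting an already-deleted resource leaves
  # the server in the same state.
  http.request(Net::HTTP::Delete.new('/products/widget-42'))
end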
Let's assume we have a client/server interaction happening over an unreliable network (packet drop). A client is calling the server's RESTful API (over HTTP over TCP):
issuing a POST to http://server.com/products
server is creating an object of "product" resource (persists it to a database, etc)
server is returning 201 Created with a Location header of "http://server.com/products/12345"
! The TCP packet containing the HTTP response gets dropped, and eventually this leads to a TCP connection reset
I see the following problem: the client will never get the ID of the newly created resource, yet the server will have created the resource.
Questions: Is this application-level behavior, or should the framework take care of it? How should a web framework (and Rails in particular) handle a situation like that? Are there any articles/whitepapers on REST for this topic?
The client will receive an error when the server does not respond to the POST. The client would then normally re-issue the request as they assume that it has failed. Off the top of my head I can think of two approaches to this problem.
One is that the client can generate some kind of request identifier, such as a GUID, which it includes in the request. If the server receives a POST request with a duplicate GUID, then it can refuse it.
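A minimal sketch of that first approach, with the client generating the identifier and resending it on retries (the X-Request-Id header name is just an illustration; the server has to be built to look for it):
require 'net/http'
require 'securerandom'
require 'json'

uri = URI('http://server.com/products')

# Generate the identifier once and reuse it for every retry of this attempt.
request_id = SecureRandom.uuid

req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json',
                               'X-Request-Id' => request_id)
req.body = { name: 'Widget' }.to_json

attempts = 0
begin
  res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
rescue Errno::ECONNRESET, Net::ReadTimeout
  attempts += 1
  # Safe to retry: the server can spot the duplicate X-Request-Id and refuse
  # to create the product a second time.
  retry if attempts < 3
end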
The other approach is to use PUT instead of POST to create. If you cannot get the client to generate the URI, then you can ask the server to provide a new URI with a GET and then do a PUT to that URI.
If you search for something like "make POST idempotent" you will probably find a bunch of other suggestions on how to do this.
If it isn't reasonable for duplicate resources to be created (e.g. products with identical titles, descriptions, etc.), then unique identifiers can be generated on the server which can be tracked against created resources to prevent duplicate requests from being processed. Unlike Darrel's suggestion of generating unique IDs on the client, this would also prevent separate users from creating duplicate resources (which you may or may not find desirable). Clients will be able to distinguish between "created" responses and "duplicate" responses by their response codes (201 and 303 respectively, in my example below).
A sketch of generating such an identifier (in this case, a hash of a canonical representation of the request), written here as Ruby, with http_201/http_303/create_product_from left as placeholders:
require 'digest'

def product_post(request, products)
  # The canonical representation need not contain every field in
  # the request, just those which contribute to its "identity".
  tags = request.tags.sort.join(',')
  canonical = [request.name, request.maker, tags, request.desc].join('|')
  id = Digest::SHA256.hexdigest(canonical)

  if products.key?(id)
    http_303 products[id]                 # duplicate: point at the existing resource
  else
    products[id] = create_product_from(request)
    http_201 products[id]                 # genuinely new: create it and say so
  end
end
This ID may or may not be part of the created resources' URIs. Personally, I'd be inclined to track them separately — at the cost of an extra lookup table — if the URIs were going to be exposed to users, as hashes tend to be ugly and difficult for humans to remember.
In many cases, it also makes sense to "expire" these unique hashes after some time. For example, if you were to make a money transfer API, a user transferring the same amount of money to the same person a few minutes apart probably indicates that the client never received the "success" response. If a user transfers the same amount of money to the same person once a month, on the other hand, they're probably paying their rent. ;-)
The problem as you describe it boils down to avoiding what are called double-adds. As mentioned by others, you need to make your POSTs idempotent.
This can easily be implemented at the framework level. The framework can keep a cache of completed responses. The requests have to carry a request unique (a unique request identifier) so that any retries are recognized as retries, and not as new requests.
If the successful response gets lost on its way to the client, the client will retry with the same request unique, and the server will then respond with its cached response.
You are left with the durability of the cache, how long to keep responses, etc. One approach is to remove responses from the server cache after a given period of time; this will depend on your app domain and traffic, and can be left as a configurable setting on the framework piece. Another approach is to force the client to send acknowledgements. The acks can be sent either as separate requests (note that these could be lost too), or as extra data piggybacked on real requests.
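A bare-bones sketch of such a framework-level cache (in-memory, with time-based expiry; a real implementation would need persistence and the acknowledgement handling described above, and handle_request stands in for the real handler):
class ResponseCache
  Entry = Struct.new(:response, :stored_at)

  def initialize(ttl_seconds)
    @ttl     = ttl_seconds
    @entries = {}          # key: the request unique, value: the completed response
  end

  # Return the cached response for a retry, or run the block (the real
  # handler) exactly once and remember its result.
  def fetch(request_unique)
    entry = @entries[request_unique]
    return entry.response if entry && (Time.now - entry.stored_at) < @ttl
    response = yield
    @entries[request_unique] = Entry.new(response, Time.now)
    response
  end
end

handle_request = -> { { status: 201, body: 'created' } }   # placeholder for the real work

cache = ResponseCache.new(600)                              # keep responses for 10 minutes
cache.fetch('unique-abc-123') { handle_request.call }       # first attempt runs the handler
cache.fetch('unique-abc-123') { handle_request.call }       # a retry gets the cached response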
Although what I suggest is similar to what others have suggested, I strongly encourage you to keep this layer of network resiliency focused on exactly that: dealing with dropped requests/responses. Don't let it also deal with duplicate resources from separate requests, which is an application-level task. Merging both pieces will mix all the functionality together and will not leave you with a clear separation of responsibilities.
Not an easy problem, but if you keep it clean you can make your app much more resilient to bad networks without introducing too much complexity.
And for some related experiences by others go here.
Good luck.
As the other responders have pointed out, the basic problem here is that the standard HTTP POST method is not idempotent like the other methods. There is an effort underway to establish a standard for an idempotent POST method known as Post-Once-Exactly, or POE.
Now I'm not saying that this is a perfect solution for everybody in the situation you describe, but if it is the case that you are writing both the server and the client, you may be able to leverage some of the ideas from POE. The draft is here: https://datatracker.ietf.org/doc/html/draft-nottingham-http-poe-00
It isn't a perfect solution, which is probably why it hasn't really taken off in the six years since the draft was submitted. Some of the problems, and some clever alternate options are discussed here:
http://tech.groups.yahoo.com/group/rest-discuss/message/7646
HTTP is a client-driven protocol: all connections are initiated by the client, and the server can't open a connection back to the client to notify it. So you can't solve such an error on the server side.
The only solution I can think of: if you know which client created the product, you can supply it with the products it created when it next asks for that information. If the client never contacts you again, you won't be able to tell it about the new product.