Using DELETE or PUT to delete a foreign key in a database?

We are currently creating a web app that mimics how flickr works as a project.
In this app, we have galleries and photos.
In each gallery, we store the photo ids in an array.
gallery.photos = [ photoId1, photoId2, photoId3 ]
If we delete a photo from the gallery, the photo stays in the database and can still be accessed from the user's profile, just not from the gallery.
So if we do DELETE url/gallery/photos/photoId3 and then GET url/gallery/photos/photoId3, the GET would return a 404 error.
There's an argument currently happening on whether we should use DELETE or PUT.
Some say DELETE, as we are deleting things and the photo is no longer accessible from that URL.
Others say PUT, as we are just editing the list of photo ids.
So, my question is: is there a common convention when it comes to this problem?

is there a common convention when it comes to this problem ?
The important thing to recognize is that DELETE and PUT belong to the "transfer of documents over a network" domain.
We use the standard methods in standard ways so that general purpose components can understand what the messages mean and do intelligent things.
If the semantics of the request are "the target uri should answer GET requests with a 404", then DELETE is appropriate. The underlying details of how your implementation achieves this are irrelevant.
Note that the definition of DELETE is pretty explicit about the limits of the semantics:
If the target resource has one or more current representations, they might or might not be destroyed by the origin server, and the associated storage might or might not be reclaimed, depending entirely on the nature of the resource and its implementation by the origin server (which are beyond the scope of this specification).
Where things get really messy: HTTP gives us agreement on the meaning of the semantics of messages, but does not restrict implementations. In particular, it is perfectly valid that a DELETE of one resource would also change the representation of some other resource, or vice versa.
Imagine, if you will, that the gallery is a web page (/gallery/1/photos/webpage.html), with links to images, including a link to /gallery/1/photos/3.
DELETE /gallery/1/photos/3
That removes the association between the URI and the image data, but it doesn't (necessarily) change the web page, so you get a broken link.
PUT /gallery/1/photos/webpage.html
That takes the link out of the web page, but of course the image can still be accessed directly via its URI.
(Note: if your profile was using the same URI for the image as your gallery, then this is more likely to be the model you want to use. We take the link out of this web page, but not out of profile.html. DELETE would produce 404's for ALL web pages that link to the picture).
If you want both - when the DELETE happens, the web page should be updated automatically, or vice versa - you can do that within your implementation (side effects are allowed). BUT... general purpose components will not necessarily know that both things have happened. For example, a general purpose cache isn't going to know that the webpage changed when you DELETE the image. And it won't know that you removed the image when you edit the web page.
Which is to say, we don't have a standard way to include in the HTTP response metadata that describes the other resources that have been changed by a request.
You can, of course, include that information in the response so that a bespoke component can do intelligent things.
DELETE /gallery/1/photos/photoId3
200 OK
Content-Type: text/plain
Deleted: /gallery/1/photos/photoId3
Changed: /gallery/1/photos/webpage.html
GET /gallery/galleryId/photos/photoId3 would return an Error 404 as photo isn't in that gallery, however GET /photos/photoId3 would still return the photo assuming you have the correct permissions
The good news: general purpose components don't know that there is any relationship between /gallery/galleryId/photos/photoId3 and /photos/photoId3. Again, the fact that they share information under the covers is an implementation detail hidden behind the HTTP facade.
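To make the "implementation detail behind the facade" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the in-memory dicts stand in for the database, `handle` is a hypothetical dispatcher, and the URL layout follows the question. DELETE only removes the id from the gallery's list, yet from the outside the gallery URI simply starts answering 404 while /photos/photoId3 keeps working.

```python
# Hypothetical in-memory stand-ins for the database (assumptions, not a real schema).
photos = {"photoId1": b"...", "photoId2": b"...", "photoId3": b"..."}
gallery = {"photos": ["photoId1", "photoId2", "photoId3"]}


def handle(method: str, path: str) -> int:
    """Return an HTTP status code for a tiny subset of routes."""
    parts = path.strip("/").split("/")

    # /gallery/photos/<id>: the photo as a member of the gallery
    if parts[:2] == ["gallery", "photos"] and len(parts) == 3:
        photo_id = parts[2]
        if photo_id not in gallery["photos"]:
            return 404
        if method == "DELETE":
            # Only the association is removed; the photo itself is kept.
            gallery["photos"].remove(photo_id)
            return 204
        return 200 if method == "GET" else 405

    # /photos/<id>: the photo itself, independent of any gallery
    if parts[:1] == ["photos"] and len(parts) == 2:
        if method != "GET":
            return 405
        return 200 if parts[1] in photos else 404

    return 404
```

A client that sends DELETE and then GET to /gallery/photos/photoId3 sees 204 followed by 404, and never needs to know that the server only edited an array.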

Related

Do paths like /profile make proper URIs? Do they violate REST? What are the implications?

I'm developing an app on which regular users should have read and write permissions on their own data, while admins have read permission on everybody's.
In my design, admins can:
GET /users
GET /users/:id
But for regular users, two routing schemas came to mind. The first one being just a continuation of the first:
GET /users/:id
GET /users/:id/edit
PATCH /users/:id
and the second being another resource that is dependent on the user that's logged in:
GET /profile
GET /profile/edit
PATCH /profile
The advantage I see on the second approach is that the design itself doesn't allow users to change the URL and try to edit other people's records.
However, Wikipedia says:
A Uniform Resource Identifier (URI) is a string of characters that unambiguously identifies a particular resource.
and as I understand it, /profile doesn't fit that description since different users will see and update different records.
So, the questions are:
Does /profile make a proper URI?
Does it violate REST?
What might be other implications of such design?
Thanks <3
PS: probably URN is a more accurate term than URI in this situation.
As best I can tell, it isn't really a good idea, but you will probably get away with it if you go that route.
First, it's important to recognize that one of the very powerful implications of URI that identify a resource is that you can easily share that URI (for example, pasting it into a message), and the recipient can just use it. In the usual case, the identifier means the same thing no matter who is using it, which is to say that both clients and the server all agree what the URI refers to.
You lose some of that semantic agreement when you start experimenting with providing personalized representations of resources depending on the identity associated with the query.
A second issue is that the target-uri is an important element in HTTP's caching story; there are other conditions in play, but a primary condition is whether the target-uri in the request matches the target-uri of the stored response.
So it's easy to imagine: Alice asks for a representation of some resource, but instead of seeing her own view of the resource, she sees a representation of Bob's view of the resource, because his was available in some public cache.
Which would be pretty awful.
That doesn't actually happen though; how do we tell Alice from Bob? The standard answer is that we have that information in the Authorization header field. HTTP caching, however, has special rules that take effect for shared caches when the request includes an authorization header.
So these rules are going to protect you unless you go out of your way to make a mess of it (for example, by using the public cache control directive).
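As a rough illustration of those special rules, here is a simplified sketch of the decision a shared cache makes. The function name is hypothetical and the logic is deliberately reduced: it only looks at the Authorization request header and a handful of Cache-Control response directives, and ignores everything else a real cache checks (status codes, freshness, Vary, and so on).

```python
def shared_cache_may_store(request_headers: dict, response_cache_control: str) -> bool:
    """Simplified: may a shared cache store and reuse this response?"""
    # Reduce "public, max-age=60" to {"public", "max-age"}.
    directives = {
        d.strip().split("=")[0].lower()
        for d in response_cache_control.split(",")
        if d.strip()
    }
    if "no-store" in directives or "private" in directives:
        return False

    has_auth = any(name.lower() == "authorization" for name in request_headers)
    if not has_auth:
        return True

    # With Authorization on the request, the response must explicitly opt in.
    return bool(directives & {"public", "must-revalidate", "s-maxage"})
```

So a personalized response to an authenticated request stays out of the shared cache unless the server explicitly opts in with one of those directives.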
In summary: can you? Yes, absolutely. Should you...? I eventually decided that I shouldn't. If I need to be clever with a pronoun URI then I will use it to redirect to the appropriate resource, rather than leaning upon content negotiation via the authorization header.
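A minimal sketch of that redirect idea, assuming the /profile and /users/:id routes from the question. The function name, the 302 status and the 401 challenge scheme are my own choices for illustration, not anything prescribed by the question or by a framework.

```python
from typing import Dict, Optional, Tuple


def resolve_profile(
    path: str, current_user_id: Optional[int]
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Map the pronoun URI /profile onto the caller's canonical /users/<id> URI.

    Returns (status, headers) for /profile paths, or None for any other path
    so that normal routing can continue.
    """
    if path != "/profile" and not path.startswith("/profile/"):
        return None
    if current_user_id is None:
        # Not logged in: the scheme named here is an assumption.
        return 401, {"WWW-Authenticate": "Bearer"}
    suffix = path[len("/profile"):]  # "" or e.g. "/edit"
    return 302, {"Location": f"/users/{current_user_id}{suffix}"}
```

After the redirect, the URI the client ends up on means the same thing to everyone, so it can be shared and cached like any other.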
As with most questions, the answer is "it depends" - in this case it depends on who is the primary consumer of those URIs. If it's a user then /profile is perfectly acceptable since there's the additional requirement of user experience. Together with the state provided by the session cookie it uniquely represents a user. To give another example - which would be better on an e-commerce website /basket or /baskets/:id? Obviously it's the former since it allows a user to navigate directly to a URI without having to remember what their basket id is (which is likely to change over time).
Conversely, if the primary user is an API client then the format /users/:id may be more appropriate since that allows for a more consistent approach to coding. Though even here it may still be worthwhile providing some affordance with a URI like /users/current. Even if you follow the principle of HATEOAS in an API you'll still need to get the relevant URIs to call from some singleton resource like the root path.
In general the thing to remember is that these are guiding principles and not hard and fast rules - what makes sense for your application and context may not be the same for other people's applications.
I think the question is: "Should my route be called /profile based on the context of my program?" I don't think it should. I think you should have a base user and run something like permission levels. Like is_admin or is_moderator.

Storage of user data

When looking at how websites such as Facebook store profile images, the URLs seem to use randomly generated values. For example, the profile picture on Google's Facebook page has the following URL:
https://scontent-lhr3-1.xx.fbcdn.net/hprofile-xft1/v/t1.0-1/p160x160/11990418_442606765926870_215300303224956260_n.png?oh=28cb5dd4717b7174eed44ca5279a2e37&oe=579938A8
However why not just organise it like so:
https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png
Clearly this would be much easier in terms of storage and simplicity. Am I missing something? Thanks.
Companies like Facebook have fairly intense CDNs. The URLs may look randomly generated, but they aren't; each individual route is deliberate and programmed to be handled in that manner.
They aren't after simplicity of storage like you would be if you were just using FTP to connect to a basic marketing website server. While you may put all your images in an /images folder, Facebook is much too complex for this: there are dozens of different types of applications accessing hundreds, if not thousands, of CDNs and servers worldwide.
If you ever build a web app, such as a Ruby on Rails app, and you work with a service such as AWS (Amazon Web Services), you'll also encounter what seem like nonsensical URLs. But it's all part of the fast delivery network provided within the architecture. Every time you "push" your app up to the server, new URLs are generated automatically for each unique resource: CSS files, JavaScript files, image files, etc. are all dynamically created. You don't have to type in each of these unique URLs individually each time you publish the app; the code simply knows where to look for them as part of the publishing process.
Example: you tell the web app to look for
//= require jquery
and it returns you http://example.com/assets/jquery-eb3e278249152b5b5d5170b73d9dbf52.js?body=1 in your header.
It doesn't matter that the url is more complex than it should be, the application recognizes it, and that's all that matters.
Simply put, I think it can boil down to two main reasons: Security and Cache:
Security - Adding these long unpredictable hashes prevents others from guessing photo URLs and makes it pretty hard to download photos you aren't supposed to.
Consider what would happen if I could easily guess your profile photo URL and download it, even when you explicitly chose to share it only with friends.
Cache - by adding "random" query params to each photo, you make sure each photo instance gets its own URL. Thus you can store the photo in browser's cache for a long time, knowing that whenever you replace it with a new one, the new photo will have a fresh URL and the browser won't keep showing you the old photo.
If you were to keep the same URL for each user's profile photo (e.g. https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png), and then upload a new photo, either one of these can happen:
If you stored the photo in browser's cache for a long time, the browser will keep showing you the cached version (as long as URL is the same, and cache hasn't expired, there's no need to re-download the image).
If, instead, you only keep the image in cache for a short period of time, you end up hitting your server much more than actually needed, increasing the load and hurting performance.
I hope this clarifies it.
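The cache point can be sketched in a few lines of Python. This is not how Facebook builds its URLs (I don't know that); it is just one common way to get "new content, new URL", with a hypothetical `photo_url` helper and a made-up cdn.example.com host.

```python
import hashlib


def photo_url(profile_id: str, image_bytes: bytes, size: str = "50x50") -> str:
    """Build a URL that changes whenever the image content changes."""
    # A short fingerprint of the content; any stable hash would do here.
    digest = hashlib.sha256(image_bytes).hexdigest()[:16]
    return f"https://cdn.example.com/{profile_id}/{size}-{digest}.png"


old = photo_url("12345", b"old photo bytes")
new = photo_url("12345", b"new photo bytes")
# old != new, so a browser that cached `old` for a year still fetches `new`.
```

Because the URL is derived from the content, each URL can be served with a very long cache lifetime without ever showing a stale photo.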
With your route scheme, how would you prevent strangers from accessing the pictures of a private account? The hash also prevents bots from downloading all the pictures.
I get your pain :-) Rather than describing further how this problem arises, let me describe a solution. Hashed or even base64-encoded values generally look like a mess to deal with in code, but once each one is paired with an identifier that explains it, not much of the mess remains!
I used to work at a company where we collated Facebook posts: using the Graph API, we fetched each post's Insights object and extracted information from it so that it was easy to pass around within the UI and send back to our Redis cache store. Once we had defined in TaffyDB how the objects would be organized, everything just made sense, because we could query the useful parts out of a long, junk-looking stream of minified JavaScript.
Refer: http://www.taffydb.com/
The extra values in the URL are useful to:
Track access. This is like when a newspaper appends "&homepage" vs. "&email" to an article URL, so their system knows how a reader found the page.
Avoid abuse and control access. Imagine that a user loaded a small, popular pornographic image into a profile image. They could then hijack the CDN to be a free web host for their porn site. But that code is used internally by the CDN to limit the number of views.
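For the access-control idea, a generic signed-URL sketch looks something like the following. To be clear, this is an assumption about how such schemes work in general, not a description of what the parameters in the Facebook URL actually mean; the `expires` and `sig` parameter names and the secret are invented for the example.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key, known only to the server/CDN


def _signature(path: str, expires: int) -> str:
    msg = f"{path}?expires={expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign(path: str, expires: int) -> str:
    """Return the path with an expiry time and a signature appended."""
    return f"{path}?expires={expires}&sig={_signature(path, expires)}"


def verify(url: str, now: int) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    path, _, query = url.partition("?")
    params = dict(p.split("=", 1) for p in query.split("&") if "=" in p)
    try:
        expires = int(params["expires"])
    except (KeyError, ValueError):
        return False
    expected = _signature(path, expires)
    supplied = params.get("sig", "")
    return now <= expires and hmac.compare_digest(expected.encode(), supplied.encode())
```

Without the secret, nobody can mint a valid URL for another photo or extend the lifetime of one they already have.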

Rails - Store unique data for each open tab/window

I have an application that has different data sets depending on which company the user has currently selected (dropdown box on sidebar currently used to set a session variable).
My client has expressed a desire to have the ability to work on multiple different data sets from a single browser simultaneously. Hence, sessions no longer cut it.
Googling seems to imply get or post data along with every request is the way, which was my first guess. Is there a better/easier/rails way to achieve this?
You have a few options here, but as you point out, the session system won't work for you since it is global across all instances of the same browser.
The standard approach is to add something to the URL that identifies the context in which to execute. This could be as simple as a prefix like /companyx/users instead of /users where you're fetching the company slug and using that as a scope. Generally you do this by having a controller base class that does this work for you, then inherit from that for all other controllers that will be affected the same way.
Another approach is to move the company identifying component from the URL to the host name. This is common amongst software-as-a-service providers because it makes sharding your application much easier. Instead of myapp.com/companyx/users you'd have companyx.myapp.com/users. This has the advantage of preserving the existing URL structure, and when you have large amounts of data, you can partition your app by customer into different databases without a lot of headache.
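Both variants come down to pulling the company out of the request URL instead of the session. The question is about Rails, but the idea is framework-independent, so here is a sketch in Python with hypothetical helper names, using the example URLs from above.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def company_from_path(url: str) -> Tuple[str, str]:
    """Split /companyx/users into the company slug and the remaining path."""
    parts = urlsplit(url).path.strip("/").split("/", 1)
    rest = "/" + parts[1] if len(parts) > 1 else "/"
    return parts[0], rest


def company_from_host(url: str, base_domain: str = "myapp.com") -> Optional[str]:
    """Return 'companyx' for companyx.myapp.com, or None on the bare domain."""
    host = urlsplit(url).hostname or ""
    suffix = "." + base_domain
    if host.endswith(suffix):
        return host[: -len(suffix)]
    return None
```

Either way, each tab carries its own company in its own URL, so two tabs in the same browser no longer fight over a single session variable.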
The answer you found with tagging all the URLs using a GET token or a POST field is not going to work very well. For one, it's messy, and secondly, a site with every link being a POST is very annoying to work with as it makes navigating with the back-button or forcing a reload troublesome. The reason it has seen use is that out of the box PHP and ASP do not support routes, so people have had to make do.
You can create a temporary database table, or use a key-value database and store all data you need in it. The unique key can be used as a window id. Furthermore, you have to add this window id to each link. So you can receive the corresponding data for each browser tab out of the database and store it in the session, object,...
If you have an object, let's say #data, you can store it in the database using Marshal.dump and get it back with Marshal.load.

PUT vs. POST for Uploading Files - RESTful API to be Built Using Zend Framework

I'm building a RESTful API using Zend Framework via the Zend_Rest_Route. For uploading of files, should I use PUT or POST to handle the process? I'm trying to be as consistent as possible with the definition of the REST verbs. Please refer to: PUT or POST: The REST of the Story.
The way I understand this is that I should use PUT if and only if I'm updating the full content of the specified resource. I'll have to know the exact URL to use PUT. On the other hand, I should use POST if I'm sending a command to the server to create a subordinate of the specified resource, using some server-side algorithm.
Let's assume this is a REST API for uploading images. Does that mean I should use POST if the server is to manipulate the image file (i.e. create thumbnail, resize, etc); and use PUT if I just want to save the raw image file to the server?
If I use PUT to handle a file upload, should the process be as follows:
The user sends a GET request to retrieve the specific URL to upload the file by PUT.
Then the user sends a PUT request to that URL.
The file being uploaded is raw - exactly the one the user uploaded.
I'm quite new to this stuff; so hopefully I'm making sense here...
If you know the "best" way to do this, feel free to comment as well.
There seems to be quite a bit of misunderstanding here. PUT versus POST is not really about replace versus create, but rather about idempotency and resource naming.
PUT is an idempotent operation. With it, you give the name of a resource and an entity to place as that resource's content (possibly with server-generated additions). Crucially, doing the operation twice in a row should result in the same thing as if it was done just once or done 20 times, for some fairly loose definition of “the same thing” (it doesn't have to be byte-for-byte identical, but the information that the user supplied should be intact). You wouldn't ever want a PUT to cause a financial transaction to be triggered.
POST is a non-idempotent operation. You don't need to give the name of the resource which you're looking to have created (nor does a POST have to create; it could de-duplicate resources if it wished). POST is often used to implement “create a resource with a newly-minted name and tell me what the name is” — the lack of idempotency implied by “newly-minted name” fits with that. Where a new resource is created, sending back the locator for the resource in a Location header is entirely the right thing to do.
Now, if you are taking the policy position that clients should never create resource names, you then get POST being the perfect fit for creation (though theoretically it could do anything based on the supplied entity) and PUT being how to do update. For many RESTful applications that makes a lot of sense, but not all; if the model being presented to the user was of a file system, having the user supply the resource name makes a huge amount of sense and PUT becomes the main creation operation (and POST becomes delegated to less common things like making an empty directory and so on; WebDAV reduces the need for POST even further).
The summary: Don't think in terms of create/update, but rather in terms of who makes the resource names and which operations are idempotent. PUT is really create-or-update, and POST is really do-anything-which-shouldnt-be-repeated-willy-nilly.
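A toy sketch of that distinction, with an in-memory dict standing in for the server's storage. The function names and status codes are illustrative assumptions, not any framework's API.

```python
import itertools

store = {}                 # resource name -> entity
_ids = itertools.count(1)  # server-side name generator


def put(name: str, entity: bytes) -> int:
    """Client chooses the name; repeating the call changes nothing further."""
    created = name not in store
    store[name] = entity
    return 201 if created else 200


def post(collection: str, entity: bytes):
    """Server mints a new name on every call and reports it back."""
    name = f"{collection}/{next(_ids)}"
    store[name] = entity
    return 201, {"Location": name}


put("/images/cat.png", b"raw bytes")
put("/images/cat.png", b"raw bytes")   # same state as after the first call
post("/images", b"raw bytes")
post("/images", b"raw bytes")          # a second, distinct resource
```

After those four calls the store holds three resources: one from the two PUTs and two from the two POSTs, which is exactly the idempotency difference described above.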
For file upload, unless it is replacing an existing resource, definitely use POST.
In REST, POST is to create new resources, PUT to replace existing resources, GET to retrieve resources, and DELETE to delete resources.
Source: http://en.wikipedia.org/wiki/Representational_state_transfer#RESTful_web_services
REST isn't a standard so this can easily turn into a religious battle. AtomPub and OData standards which are considered to be "RESTful" do agree on this though: POST = creation while PUT = updates
The simple answer is you should use PUT instead of POST in your case since you will be replacing the entire content of the file. Take a look at PUT vs POST
I'll have to know the exact URL to PUT to
No. You don't have to know the URL to PUT to, i.e. the resource at the PUT URI needn't exist before the PUT operation. If the resource doesn't exist, it is created. If the resource is already present, it is replaced with the new representation.
To quote the linked article:
PUT puts a page at a specific URL. If there’s already a page there, it’s replaced in toto. If there’s no page there, a new one is created. This means it’s like a DELETE followed by an insert of a new record with the same primary key

ASP.NET MVC: Do GET requests on private web pages have to be nondestructive?

In ASP.NET MVC it seems to be common practice not to use GET requests for calls to a controller that modify the model. For example, deleting a customer should not be possible by clicking a simple HTML link.
The only reason for this rule I am aware of is to safeguard against web crawlers which might inadvertently alter the database. GET requests are commonly regarded as safe, whereas POST requests are not.
Does this mean that this rule does not apply to non-public portions of a website (Example: Your password-protected user administration area)? Or is there any other reason not to use destructive GET requests?
This is generally part of HTTP. From the HTTP 1.1 RFC 2616:
Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others.
In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.
Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.
In other words, it's not enforced, but it's really bad form for a GET request to have side-effects. Imagine if a user bookmarks a URL which updates something, for example - they probably wouldn't expect that to happen.
Another good reason is accelerator plug-ins for browsers. These attempt to speed up page loads by pre-fetching links on the current page. Imagine if you had a bunch of GET requests to delete all the objects in a list, the plug-in would delete them!
The short of it is that you can't predict what a browser will do with GET requests; if it looks like a plain old hyperlink, then it's fair game for a browser to go fetch it.
Yes.
It's not just about web crawlers, it's about CSRF - Cross Site Request Forgery.
So imagine that someone is logged into your web site, and browses to www.hax0rs.com
In the source for hax0rs.com is the following tag
<img src="http://mysite.com/members/statusChange?status=I%20am%20looking%20for%20a%20gimp%20mask" height="0" width="0">
Because your user is logged in, and because the request is going to your site, the authentication cookie goes with it. And bang, suddenly your user's status has changed.
What fun :)
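A sketch of what the server side could do about it, in plain Python rather than ASP.NET MVC. The handler name, the dict-based `session` and `form`, and the `csrf_token` field are all assumptions for illustration. Note that refusing GET only stops the `<img>` trick above; the per-session token check is an extra step (not mentioned in the answer) that also covers forged POST forms.

```python
import hmac

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def change_status(method: str, session: dict, form: dict) -> int:
    """Hypothetical handler for /members/statusChange; returns a status code."""
    if method in SAFE_METHODS:
        # A state change must never be reachable through a safe method,
        # so an <img src=...> or a prefetcher cannot trigger it.
        return 405

    expected = session.get("csrf_token")
    supplied = form.get("csrf_token", "")
    if not expected or not hmac.compare_digest(expected.encode(), supplied.encode()):
        # The forged request carries the cookie but cannot know the token.
        return 403

    session["status"] = form.get("status", "")
    return 200
```

With this in place, the image tag on the attacker's page produces a 405 and the user's status is left alone.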
But I suppose you can still do some sort of "non-retrieval" actions on GET requests. For example, updating the "LastVisit" records can be considered non-destructive and relatively safe.
