How would you design a hackable url - url

Imagine you had a group of product categories organized in a nice tree hierarchy and you wanted to provide hackable urls to browse these. You could do something like this
/catalog/categorya/categoryb/categoryc
You could then quite easily figure out which category you should list the products for (note that the full URL is needed since you could have categories with the same name but at different locations in the hierarchy)
Now what would be a good approach to add product information in that as well? To give you an example, you wanted to display the product Oblivion for this category
/catalog/games/consoles/playstation/adventure
It's tempting to just add the product at the end of the url
/catalog/games/consoles/playstation/adventure/oblivion
but the moment you do so you loose the ability to know if its category or a product which is called oblivion. I personally feel that not being forced to add a suffix such as .html
/catalog/games/consoles/playstation/adventure/oblivion.html
would be the nicest solution and using some sort of prefix, such as
/catalog/games/consoles/playstation/adventure/product:oblivion
You could also add some sort of trigger like
/catalog/games/consoles/playstation/adventure/PRODUCT/oblivion
not as nice either and you would (even though its very unlikely it would be a problem) restrict yourself from having a category called product
So far a suffix solution looks like the most user-friendly approach that I can think of from the top of my head but I'm not fond of having to use an extension
What are your thoughts on this?

Deep paths irk me. They're hideous to share.
/product/1234/oblivion --> direct page
/product/oblivion --> /product/1234/oblivion if oblivion is a unique product,
--> ~ Diambiguation page if oblivion is not a unqiue product.
/product/1234/notoblivion -> /product/1234/oblivion
/categories/79/adventure --> playstation adventure games
/categories/75/games --> console games page
/categories/76/games --> playstation games page
/categories/games --> Disambiguation Page.
Otherwise, the long urls, while seeming hackable, require you to get all node elements right to hack it.
Take php.net
php.net/str_replace --> goes to
http://nz2.php.net/manual/en/function.str-replace.php
And this model is so hackable people use it all the time blindly.
Note: The .html suffix is regarded by the W3C as functionally meaningless and redundant, and should be avoided in URLs.
http://www.w3.org/Provider/Style/URI

Lets disect your URL in order to be more DRY (non-repetitive). Here is what you are starting with:
/catalog/games/consoles/playstation/adventure/oblivion
Really, the category adventure is redundant as the game can belong to multiple genres.
/catalog/games/consoles/playstation/oblivion
The next thing that strikes me is that consoles is also not needed. It probably isn't a good idea to differentiate between PC's and Console machines as a subsection. They are all types of machines and by doing this you are just adding another level of complexity.
/catalog/games/playstation/oblivion
Now you are at the point of making some decisions about your site. I would recommend removing the playstation category on your page, as a game can exist across multiple platforms and also the games category. Your url should look like:
/catalog/oblivion
So how do you get a list of all the action games for the Playstation?
/catalog/tags/playstation+adventure
or perhaps
/catalog/tags/adventure/playstation
The order doesn't really matter. You have to also make sure that tags is a reserved name for a product.
Lastly, I am assuming that you cannot remove the root /catalog due to conflicts. However, if your site is tiny and doesn't have many other sections then reduce everything to the root level:
/oblivion
/tags/playstation/adventure
Oh and if oblivion isn't a unique product just construct a slug which includes it's ID:
/1234-oblivion

Those all look fine (except for the one with the colon).
The key is what to do when they guess wrong -- don't send them to a 404 -- instead, take the words you don't know and send them to your search page results for that word -- even better if you can spell check there.

If you see the different pieces as targets then the product itself is just another target.
All targets should be accessable by target.html or only target.
catalog/games/consoles/playstation.html
catalog/games/consoles/playstation
catalog/games/consoles/playstation/adventure.html
catalog/games/consoles/playstation/adventure
catalog/games/consoles/playstation/adventure/oblivion.html
catalog/games/consoles/playstation/adventure/oblivion
And so on to make it consistent.
My 5 cents...

One problem is that your user's notion of a "group of product categories organized in a nice tree hierarchy" may match yours.
Here's a google tech talk by David Weinberger's "Everything is Miscellaneous" with some interesting ideas on categorizing stuff:
http://www.youtube.com/watch?v=x3wOhXsjPYM

#Lou Franco yeah either method needs a sturdy fallback mechanism and sending it to some sort of suggestion page or seach engine would be good candidates
#Stefan the problem with treating both as targets are how to distinguish them (like I described). At worst case scenario is that you first hit your database to see if there is a category which satisfies the path and if it doesn't then you check if there is a product which does. The problem is that for each product path you will end up making a useless call to the database to make sure its not a category.
#some yeah a delimiter could be a possible solution but then a .html suffix is more userfriendly and commonly known of.

i like /videogames/consolename/genre/title" and use the amount of /'s to distinguish between category or product. The only thing i would be worried about multi (or hard to distinguish) genre. I highly recommend no extension on title. You could also do something like videogames(.php)?c=x360;t=oblivion; and just guess the missing information however i like the / method as it looks more neat. Why are you adding genre? it may be easier to use the first letter of the title or just to do videogame/console/title/

My humble experience, although not related to selling games, tells me:
editors often don't use the best names for these "slugs", they don't chose them wisely.
many items belong (logically) to several categories, so why restrict them (technically) to a single category?
Better design item urls by ids, (i.e. /item/435/ )
ids are stable (generated by the db, not editable by the editor), so the url stands a much bigger chance at not being changed over time
they don't expose (or depend on) the organization of the objects in the database like the category/item_name style of urls does. What if you change the underlying design (object structure) to allow an item to belong to multiple categories? the category/item urls suddenly won't make sense anymore; you'll change your url design and old urls might not work anymore.
Labels are better than categories. That is to say, allowing an item to belong to several categories is a better approach than assigning one category to each item.

the problem with treating both as
targets are how to distinguish them
(like I described). At worst case
scenario is that you first hit your
database to see if there is a category
which satisfies the path and if it
doesn't then you check if there is a
product which does. The problem is
that for each product path you will
end up making a useless call to the
database to make sure its not a
category.
So what? There's no real need to make a hard distinction between products and categories, least of all in the URI, except maybe a performance concern over an extra database call. If that's really such a big deal to you, consider these two suggestions:
Most page views will presumably be on products, not categories. So doing the check for a product first will minimize the frequency with which you need to double up on the database lookups.
Add code to your app to display the time taken to generate each page, then go out to the nearest internet cafe (not your internal LAN!) with a stopwatch. Bring up some pages from your site and time how long each takes to come up. Subtract the time taken to generate the page. Also compare the time taken to generate one-database-lookup pages vs. two-database-lookup pages. Then ask yourself, when it takes maybe 1-2 seconds total to establish a network connection, generate the content, and download the content, does it really matter whether you're spending an extra 0.05 second or less for an additional database lookup or not?
Optimize where it matters, like making URLs that will be human-friendly (as in Chris Lloyd's answer). Don't waste your time trying to shave off the last possible fraction of a percent.

Related

Node vs Relationship

So I've just worked through the tutorial and I'm unclear about a few things. The main one, however, is how do you decide when something is a relationship and when it should be a Node?
For example, in the Movies Database,there is a relationship showing who acted in which film. A property of that relationship is the Role. BUT, what if it's a series of films? The role may well be constant between films (say, Jack Ryan in The Hunt for Red October, Patriot Games etc.)
We may also want to have some kind of Character bio, which would obviously remain constant between movies. Worse, the actor may change from one movie to another (Alec Baldwin then Harrison Ford.) There are many others like this (James Bond, for example).
Even if the actor doesn't change (Main roles in Harry Potter) the character is constant. So, at what point would the Role become a node in its own right? When it does, can I have a 3-way relationship (Actor-Role-Movie)? Say I start of with it being a relationship and then, down the line, decide it should've been a node, is there a simple way to go through the database and convert it?
No there is no way to convert your datamodel. When you start your own Database first take time to find a fitting schema. There is no ideal way to create a schema and also there are many different models fitting to the same situation without being totally wrong.
My strategy is to put less information to the relationship itself. I only add properties that directly concern the relationship and store all the other data in the nodes. Also think of properties you could use for traversing the graph. For example you might need some flags or even different labels for relationships even they more or less are the same. The apoc.algo.aStar is only including relationshiptypes you want (you could exclude certain nodes by giving them a special relationshiptype). So keep that in mind that you take a look at procedures that you might use later.
Try to create the schema as simple as possible and find a way to stay consistent in terms of what things are nodes and what deserves a relationship. Dont mix it up.
Choose a design that makes sense for you! (device 1)-[cable]-(device 2) vs (device 1)-[has cable]-(cable)-[has cable]-(device 2) in this case I'd prefer the first because [has cable] wouldn't bring anymore information. Irrespective to what I wrote above I would have a lot of information in this [cable] relationship but it totally makes sense for me because I wouldnt want to search in a device node for cable information.
For your example giving the role a own node is also valid way. For example if you want to espacially query which actors had the same role in common I'll totally go for giving the role a extra node.
Summary:
Think of what you want to do with the data and choose the easiest model.

how to create a replicable, unique code for a pre-ISBN book

I am putting my collection of some 13000 books in a mySQL database. Most of the copies I possess
can be identified uniquely by ISBN. I need to use this distinguishing code as a foreign key into
another database table.
However, quite a few of my books date from pre-ISBN ages. So for these, I am trying to devise a
scheme to uniquely assign a code, sort of like an SKU.
The code would be strictly for private use. It should have the important property that, when I
obtain a pre-ISBN publication, I could build the code from inspecting the work, and based on the
result search the database to see if I already have other copies in my possession.
Many years ago I think I saw a search scheme for some university(?) catalogue, where you could
perform a search of a title based on a concatenated string' (or code) that was made up of let's
say 8 letters from the title, and 4 from the author, and maybe some other data. For example,
to search 'The Nature of Space and Time' by Stephen Hawking and Roger Penrose you might perform
a search on the string 'Nature SHawk', being comprised of 8 characters from the title (omitting
non-filing words and stopwords) and 4 from the author(s).
I haven't been able to find any information on such scheme's, or whether or not such an approach
was standardized in any way.
Something along these lines could be made up of course, but I was wondering if people here have
heard of such schemes, of have ideas on how to come to a solution to this.
So keep in mind the important property of 'replicability': using the scheme, inspection of a pre-
ISBN dated work should --omitting very special or exclusive cases-- in general lead to a code
that can singly be used to subsequently determine if such a copy is already in the database.
Thank you for your time.
Just use the Title (add Author and Publisher as options) and a series id to produce a fake isbn. Take a look at fake_isbn.
NOTE: use the first digit as a series id but don't use 9!

CRUD operations on list of items instead of a single one in ChicagoBoss

I'm a newbie to ChicagoBoss and Erlang in general, so please bear with me.
I have a model of Options which represent a number of site configurations (think of the available options in WordPress, since it's modeled after it), to which I have to perform CRUD operations on.
The model looks like this:
-module(options,
[
Id,
KeyName::string(),
Value::string(),
IsActive::string()
]
).
-compile(export_all).
Each option is prefixed by its category, so general options names look like "general_option_" followed by its specific name.
The views for Options are mostly a list of inputs with each input linked to a specific option, as you might expect.
Since the number and name of options is not known beforehand (except in the view), I would like to know what approaches there are for dealing with this case, as every example I've seen so far deals with a single item, and not a list of them. Please share any advice or constructive criticism you have, as it will be very welcome.

New(?) attept to structure RESTful base URLs

We all love REST, especially when it comes to the development of APIs. Doing so for the last years I always stumble upon the same problem: nested resources. It seems we're living at the two edges of a scale. Let me introduce an example.
/galaxies/8/solarsystems/5/planets/1/continents/4/countries.json
Neato. Cases like that seem to happen everywhere, no matter in what shape they materialize. Now I'd like to being able to fetch all the countries in a solar system while being able to fetch countries deeply scoped as shown above.
It seems I have two choices here. The first one, I flatten my nested structure and introduce a lot of GET parameters (that need to be well documented and understood by my API user) like so:
/countries.json?galaxy=8&solarsystem=5&planet=1&continent=4
I could flatten all my resources like so and won a unique endpoint base URL for each one. Good point … unique endpoints per resource!
But what's the price? Something that does not feel natural, is not discoverable and does not behave like the tree structure below my resources. Conclusion: Bad idea, but well practiced.
On the other hand I could try to get rid of as many additional GET parameters as possible, creating endpoints like that:
/galaxies/8/solarsystems/5/countries.json
But I also needed:
/galaxies/8/solarsystems/5/planets/1/continents/4/countries.json
This seems to be the other side of the scale. Least number of additional GET parameters, more natural behave but still not what I expected as an API user.
The most APIs I worked with in the last year follow the one or the other paradigm. It seems there is at least one bullet to bite. So why not doing the following:
If there are resources that nest naturally, lets nest them exactly in the way we'd expect them to be nested. What we achive is at first a unique endpoint for every resource when we stay like that:
/galaxies.json
/galaxies/8/solarsystems.json
/galaxies/8/solarsystems/5/planets.json
/galaxies/8/solarsystems/5/planets/1/continents.json
/galaxies/8/solarsystems/5/planets/1/continents/4/countries.json
Ok, but how to solve the initial problem, I wanted to fetch all the countries in a solar system while still being able to fetch countries fully scoped under galaxies, solar systems, planets and continents? Here's what feels natural for me:
/galaxies/8/solarsystems/5/planets/0/continents/0/countries.json # give me all countries in the solarsystem 5
/galaxies/8/solarsystems/0/planets/0/continents/0/countries.json # give me all countries in the galaxy 8
… and so on, and so on. Now you may argue "ok, but the zero there ….." and you are right. Does not look really nice. So why not change the two upper calls to something like that:
/galaxies/8/solarsystems/5/planets/all/continents/all/countries.json # give me all countries in the solarsystem 5
/galaxies/8/solarsystems/all/planets/all/continents/all/countries.json # give me all countries in the galaxy 8
Neat eh? So what do we achive? No additional GET parameters and still stable base URLs for each resources endpoint. What's the price? Yep, at least longer URLs especially during testing by hand using tools like curl.
I wonder wether this could be a way to improve not only the maintainability but also the ease of use of APIs. If so, why does not anyone take an approach like that. I can not imagine to be the first one having that idea. So there must be valid counter arguments against an approach like that. I don't see any. Do you?
I would really like to hear your opinion and arguments for or against an approach like that. Maybe there are ideas for improvement … would be great to hear from you. In my opinion this could lead to much better structured APIs, so hopefully someone will read that and reply.
Regards.
Jan
It would all depend on upon how the data is presented. Would the user really need to the know the galaxy # to find a specific country? If so them what you propose makes sense. However, it seems to me that what you are proposing, while structured and presented well, doesn't allow for clients to search for child element unless the parent is a known quantity.
In your example, if I had a specific id for a continent I would need to know the planet, solar system and galaxy as well. In order to find the specific continent I would need to get all for each possible parent until I found the continent.
Presenting structured data in this manner if fine. Using this structure when you only have a piece of the data may be a bit cumbersome. It all depends upon what you are trying to accomplish.
Nested resource URLs are usually bad. The approach I generally take is to use unique IDs.
Design your DB so that it is only going to have one continent with ID 4. Then, instead of the horrible /galaxies/8/solarsystems/5/planets/1/continents/4/countries.json, all you need is the simple /continents/4/countries.json. Clear, sufficient, and memorable.
The :shallow routing option in Rails does this automatically.
For "all countries in a solar system", I'd use /solar_systems/5/countries.json -- that is, don't try to shoehorn it into the generic URL scheme. (And note the underscore.)

Cleaning Up Query Strings

This is more of an open question. What is your opinion on query strings in a URL? While creating sites in ASP.NET MVC you spend a lot of time thinking about and crafting clean URLs only for them to be shattered the first time you have to use query strings, especially on a search form.
For example I recently did a fairly simple search form with half a dozen text field and two or three lists of checkboxes and selects. This produced the query string below when submitted
countrylcid=2057&State=England&StateId=46&Where=&DateFrom=&DateTo=&Tags=&Keywords=&Types
=1&Types=0&Types=2&Types=3&Types=4&Types=5&Costs=0.0-9.99&Costs=10.00-29.99&Costs=30.00-59.99&Costs=60.00-10000.00
Beautiful I think you'll agree. Half the fields had no information in them and the list inputs are very verbose indeed.
A while ago I implemented a simple solution to this for paging which produced a url such as
www.yourdomain.com/browse/filter-on/page-1/perpage-50/
This used a catchall route to grab what is essentially a replacement query string after the filter-on portion. Works quite well but breaks down when doing form submissions.
Id be keen to hear what other solutions people have come up with? There are lots of articles on clean urls but are aimed at asp.net developers creating basic restful urls which MVC has covered. I am half considering diving into model binding to produce a proper solution along those lines. With the above convention the large query string could be rewritten as:
filter-on/countrylcid-2057/state-England/stateId-46/types-{1,0,2,3,4,5}/costs-{0.0-9.99,10.00-29.99,30.00-59.99,60.00-10000.00}/
Is this worth the effort?
Thanks,
My personal view is that where users are likely to want to either bookmark or pass on URLs to other people then a nice, clean "friendly" URL is the way to go. Aesthetically they are much nicer. For simple pagination and ordering then a re-written URL is a good idea.
However, for pages that have a large number of temporary, dynamic fields (such as a search) then I think the humble query string is fine. Like wise for pages whose contents are likely to change significantly given the exact same URL in the future. In these cases, URLs with query strings are fine and, perhaps, even preferable as they at least indicate to the observant user that the page is dynamic. However, in these cases it may be better to use form POST variables, anyway, that way visitors are not tempted to "fiddle" with the values.
In addition to what others have said, a URL implies a hierarchy that is semantic. Whether true today or not, the ancestry is directories and people still think of it as such. That's why you have controller/action/id. Likewise, to me a querystring implies options or queries.
Personally, I think a rewritten URL is best when you can't tell if there's an interpreter behind it -- maybe it's just a generated HTML file?
So however you choose to do it (and it's a pain on the client in a search form -- I'd say more trouble than it's worth), I'd support you doing it for hierarchies.
E.g. /Search/Country/State/City
but once you start getting into prices and types, or otherwise having to preface a "directory" with the type of value (e.g. /prices=50.00/ or worse, with an array), then that's where you've lost me.
In fact, if all elements are filled in, then all you've really done is taken the querystring, replaced "&" with "/", and combined your arrays into a single field.
If you're going to be writing the javascript anyways, why don't you just loop through the form elements and:
Remove the empty ones, cleaning up the querystring from the "&price_low=&price_high=&" sorts of things.
Combine multiple values into an array structure
But then submit as a querystring.
James
Aren't the values of the different fields available in the FormsCollection anyway on post?

Resources