Pros and cons of using hierarchical URLs versus flat ones?

I'm building a large news site and we'll have several thousand articles. So far we have over 20,000. We plan on having a main menu whose links display articles by category. For example, clicking "baking" will show all articles related to "baking", and "baking/cakes" will show everything related to cakes.
Right now, we're weighing whether or not to use hierarchical URLs for each article. If I'm on the "baking/cakes" page, and I click an article that says "Chocolate Raspberry Cake", would it be best to put that article at a specific, hierarchical URL like this:
website.com/baking/cakes/chocolate-raspberry-cake
or a generic, flat one like this:
website.com/articles/chocolate-raspberry-cake
What are the pros and cons of doing each? I can think of cases for each approach, but I'm wondering what you think.
Thanks!

It really depends on the structure of your site. There's no one correct answer for every site.
That being said, here's my recommendation for a news site: instead of embedding the category in the URL, embed the date. For example: website.com/article/2016/11/18/chocolate-raspberry-cake or even website.com/2016/11/18/chocolate-raspberry-cake. This allows you to write about Chocolate Raspberry Cake more than once, as long as you don't do it on the same day. When I'm browsing news I find it helpful to identify the date an article was written as quickly as possible; embedding it in the URL is very helpful.
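A minimal sketch of how such a date-based path could be built. The `slugify` and `article_path` helpers are hypothetical names used only for illustration; they are not part of any particular CMS.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def article_path(title: str, published: date) -> str:
    """Build a /YYYY/MM/DD/slug path from the title and publication date."""
    return f"/{published:%Y/%m/%d}/{slugify(title)}"


print(article_path("Chocolate Raspberry Cake", date(2016, 11, 18)))
```

Two articles with the same title only collide under this scheme if they are published on the same day.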
Hierarchical URLs based on categories lock you into a single category for each article, which may be too limiting. There may be articles which fit multiple categories. If you've set up your site to require each article to have a single primary category, then this may not be an issue for you.
Hierarchical URLs based on categories can also be problematic if any of the categories ever change, for example because of typos, changes to pluralization, a new term coming into vogue and replacing an existing one, or even just a change in wording (e.g. "baking" could become "baked goods"). The terms as they existed when you created the article will be immortalized in your URL structure unless you retroactively change them all. Doing so invalidates old links, so make sure to redirect them, e.g. with Drupal's Redirect module.
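To illustrate what a category rename costs, here is a rough sketch of the lookup a redirect layer would need. The rename table and the `redirect_target` function are assumptions made up for this example. This is not how Drupal's Redirect module works internally.

```python
from typing import Optional

# Hypothetical rename table: old category segment -> new category segment.
RENAMED_CATEGORIES = {"baking": "baked-goods"}


def redirect_target(path: str) -> Optional[str]:
    """Return the new path for an article URL whose categories were renamed, else None."""
    # Everything before the last segment is treated as categories; the last is the article slug.
    *categories, slug = path.strip("/").split("/")
    renamed = [RENAMED_CATEGORIES.get(c, c) for c in categories]
    if renamed == categories:
        return None  # nothing changed, no redirect needed
    return "/" + "/".join(renamed + [slug])


print(redirect_target("/baking/cakes/chocolate-raspberry-cake"))
```

Every old URL that returns a non-empty target here would need a permanent (301) redirect for as long as old links exist.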
If embedding the date in the URL is not an option, then my second choice would be the flat URL structure because it will give you URLs which are shorter and easier to remember. I would recommend using "article" instead of "articles" in the URL because it saves you a character.

Related

URL keyword vs URL readability

This question is about SEO in URL naming. Does SEO really weigh that much more than user experience? Where would you draw the line before SEO starts ruining people's experience? As an example, I have a page that contains information about art contests that are running or have run on my website.
Which URL is better?
example.com/contest/{contest-id}/{name-of-contest}
or
example.com/online-graphic-design-contest/{contest-id}/{name-of-contest}
Is stuffing the URL with keywords such as 'online', 'graphic', 'design' and 'contest' so much more important for SEO than having a shorter, more readable URL such as the first one?
The best way to think about SEO these days is through the perspective of the user, firstly, and then through the search engine perspective. I would argue that your second URL is much better for both cases. It's more descriptive to the user (we have an "online graphic design contest") and also to search engines.
Google has made it apparent that their focus is on providing content that is relevant to the user, and the best way to be relevant is with content that is descriptive and fits with what your users are searching for. I don't think you're keyword stuffing if you're using a single natural language phrase in the URL to describe the content of the page. That portion of the URL should also match your page title, and header tags on the page, etc., etc.
Here are some useful resources:
http://static.googleusercontent.com/media/www.google.com/en/us/webmasters/docs/search-engine-optimization-starter-guide.pdf
http://linchpinseo.com/user-focused-seo-redefining-what-search-engine-optimization-is

Canonical url and localization

In my application I have localized urls that look something like this:
http://example.com/en/animals/elephant
http://example.com/nl/dieren/olifant
http://example.com/de/tiere/elefant
This question is mainly for Facebook Likes, but I guess I will hit similar problems when I start thinking about search engine crawlers.
What kind of URL would you expect as the canonical URL? I don't want to use the exact English URL, because I want people who click the link to be forwarded to their own language (based on browser settings or IP).
The IP lookup is not something that I want to do on every page hit. Besides that, I would need to incorporate more 'state' in my application, because I would have to check whether a user has already been forwarded to their own locale or is browsing the English version on purpose.
I guess it is going to be something like:
http://example.com/something/animals/elephant
or maybe without any language identifier at all:
http://example.com/animals/elephant
but that is a bit harder to implement and has a bigger chance of URL clashes in the future (in the rare case that I get a category called en or de).
Summary
What kind of url would you expect as canonical url? Is there already a standard set for this?
I know this question is a bit old, but I was facing the same issue.
I found this:
Different language versions of a single page are considered duplicates only if the main content is in the same language (that is, if only the header, footer, and other non-critical text is translated, but the body remains the same, then the pages are considered to be duplicates).
That can be found here: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls
From this I can conclude that we should add locales to canonicals.
I did find one resource that recommends not using the canonical tag with localized addresses. However, Google's documentation does not specify and only mentions subdomains in another context.
There is more than language that you need to think about.
It's typically a tuple of three: {region, language, property}.
If you only have one website, then you have {region, language} only.
Every piece of content can either be different across this three-dimensional space, or at least be presented differently. But it is still the same piece of content, so you'd like to centralize the management of editorial signals, promotions, tracking, etc. Think about search systems: you'd like page rank to be merged across all instances of the article, not spread thinly.
I think there is a standard solution: Canonical URL
Put language/region into the domain name
example.com
uk.example.com
fr.example.com
Now you can choose whether to attach a cookie to the subdomain (for language/region) or to the domain (for user tracking)!
On every HTML page, add a link to the canonical URL:
<link rel="canonical" href="http://example.com/awesome-article.html" />
Now you are done.
There certainly is no "standard" beyond the requirement that it be a URL. What you do see on many commercial websites is exactly what you describe:
<protocol>://<server>/<language>/<more-path>
For the language tag you may follow the RFCs as well. I guess your two-letter abbreviation is quite fine.
I only disagree on the <more-path> part of the URL. If I understand you correctly, you are thinking about translating each page's path into the local language? I would not do that. Maybe I am not the typical user, but I personally like to manually monkey around in URLs. If the URL shown is http://example.com/de/tiere/elefant but I don't trust the content to be translated well, I would manually try http://example.com/en/tiere/elefant, and that would not bring me to the expected page. And since I also dislike URLs like http://ex.com/with-the-whole-title-in-the-url-so-the-page-will-be-keyworded-by-search-engines, my favorite would be to exchange only the <language> part and use generic English (or any other single language) for <more-path>. E.g.:
http://example.com/en/animals/elephant
http://example.com/nl/animals/elephant
http://example.com/de/animals/elephant
If your site is something like Wikipedia, then I would agree with your scheme of translating the <more-path> as well.
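With a language-neutral <more-path>, switching languages comes down to swapping the first path segment. A small sketch of that idea follows; the `SUPPORTED_LANGUAGES` set and the `switch_language` function are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of language prefixes the site serves.
SUPPORTED_LANGUAGES = {"en", "nl", "de"}


def switch_language(url: str, language: str) -> str:
    """Replace (or add) the leading language segment of the URL path."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    parts = urlsplit(url)
    # "/de/animals/elephant" -> ["", "de", "animals", "elephant"]
    segments = parts.path.split("/")
    if len(segments) > 1 and segments[1] in SUPPORTED_LANGUAGES:
        segments[1] = language
    else:
        segments.insert(1, language)
    return urlunsplit(parts._replace(path="/".join(segments)))


print(switch_language("http://example.com/de/animals/elephant", "en"))
```

The same swap is not possible when the rest of the path is translated, because the mapping from "tiere/elefant" to "animals/elephant" then needs a lookup table.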
Maybe Google's guidelines can help with your issue: https://support.google.com/webmasters/answer/189077?hl=en
They say that many websites serve users across the world with content targeted to a certain region, and advise using the rel="alternate" hreflang="x" attributes so that the correct language or regional URL is served in Search results.
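As a rough illustration of those annotations, the sketch below renders one link element per localized version of a page. The `hreflang_links` helper and the example URLs are assumptions; check the linked guidelines for the exact requirements (for instance, which pages must list which alternates).

```python
from html import escape
from typing import Dict, List


def hreflang_links(localized_urls: Dict[str, str]) -> List[str]:
    """Render one <link rel="alternate"> element per language code -> URL pair."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in localized_urls.items()
    ]


# Hypothetical localized versions of one page.
versions = {
    "en": "http://example.com/en/animals/elephant",
    "nl": "http://example.com/nl/dieren/olifant",
    "de": "http://example.com/de/tiere/elefant",
}
print("\n".join(hreflang_links(versions)))
```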

Find out various layouts used in website

Is it possible to find out the total number of layouts (templates) used within a website?
For example:
Suppose I want to know how many types of layouts www.flipkart.com uses.
The answer would be something like:
Landing page or Home page
Category Page e.g http://www.flipkart.com/mobiles?_l=GIuT6NCRsZbfL9ID9ZKHNQ--&_r=hCno5y6eFUI8C0iWzaQbAg--&ref=cef19a11-4ebc-4f8e-a0dc-401c2d55de3e&_pop=brdcrumb
This is a category page. All such pages will have the same layout; only the inner content will be different.
Product pages, e.g. http://www.flipkart.com/htc-sensation-mobile-phone/p/itmczbrsnwphgbnw?pid=MOBCYW9HXBUDYJPH&_l=sXQjsX87GxqrvKzhjuOrkw--&_r=n_2yuAC4xgh0SZTuulvAtw--&ref=9305103f-6fc1-497c-807a-8f30ee30c13c
All product pages will have the same layout: they have a "buy now" option and multiple images. Is there any existing tool to find this out?
I hope my question is clear. I just want to classify the site's pages into some buckets.
As far as I know, no tool or algorithm exists for this, but you can write one. Find some attributes of these pages and set them as benchmarks. Then, whenever you encounter a URL and want to identify its category, extract the same attributes and compare them against the benchmarks.
It's not generic, but it will work for specific websites :)
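A very rough sketch of that benchmark comparison. It assumes you have already extracted a set of feature names from each page. The feature names, the benchmark sets and the use of Jaccard similarity are all assumptions chosen for illustration.

```python
from typing import Dict, Set

# Hypothetical benchmark feature sets, one per layout bucket.
BENCHMARKS: Dict[str, Set[str]] = {
    "home": {"hero_banner", "category_menu"},
    "category": {"product_grid", "filters", "breadcrumb"},
    "product": {"buy_now_button", "image_gallery", "breadcrumb"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(features: Set[str]) -> str:
    """Return the bucket whose benchmark features overlap most with the page's features."""
    return max(BENCHMARKS, key=lambda name: jaccard(features, BENCHMARKS[name]))


print(classify({"buy_now_button", "image_gallery"}))
```

Extracting the features (for example, checking whether a "buy now" button or several product images are present) is the site-specific part and is not shown here.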

Rails - extract seo keywords from block of text

I need to generate SEO meta keyword tags based upon user-generated wiki content.
Say I have an article and a predefined list of keywords/phrases. Is there a good method to grab the keywords that match the article? Keywords may be longer than one word, and each has a predefined weight that determines which keywords are used first. Some implementation of Nokogiri seems the obvious choice, but I wondered if there was something more complete for this exact scenario.
You could process your text with a semantic API, which will give you a list of potential keywords and their associated scores.
I've begun to develop this gem: https://github.com/apneadiving/SemExtractor
It still needs some improvements for error handling but it's fully operational to query the following engines:
Zemanta
Semantic Hacker from Textwise
Yahoo Boss
OpenCalais
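If an external API is more than you need, matching against the predefined weighted list can also be done directly. The sketch below is in Python purely for illustration (the same logic ports to Ruby). The function names and the sample weights are assumptions.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase the text and collapse it to single-space-separated alphanumeric words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def match_keywords(text: str, weighted_phrases: Dict[str, int], limit: int = 10) -> List[str]:
    """Return the phrases found in the text as whole words, highest weight first."""
    padded = f" {normalize(text)} "
    found = [p for p in weighted_phrases if f" {normalize(p)} " in padded]
    found.sort(key=lambda p: weighted_phrases[p], reverse=True)
    return found[:limit]


# Hypothetical keyword list with weights.
phrases = {"chocolate cake": 5, "cake": 2, "bread": 4, "recipe": 3}
print(match_keywords("How to bake a chocolate cake: the best chocolate cake recipe", phrases))
```

Padding both sides with spaces keeps "cake" from matching inside "cakes", and multi-word phrases are handled the same way as single words.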
If you're only wanting to grab keywords for the meta keyword tag, that's not really worth your time. Google doesn't pay attention to those anymore.
Here's a good post about it, with a video of Matt Cutts from Google explaining that the meta keyword tag doesn't play a part in search engine rankings.
http://www.stepforth.com/blog/2010/meta-keyword-tag-dead-seo/
What is worth your time? Good title tags.

Lack of invariance in stackoverflow URL. Why? [duplicate]

Possible Duplicate:
Why do some websites add “Slugs” to the end of URLs?
This is not a question about stackoverflow, it's a question about a design decision which stackoverflow implements, and I take it as example.
A question on stackoverflow is identified by the following URL (took one from the suggestions)
https://stackoverflow.com/questions/363292/why-is-visual-c-lacking-refactor
Similarly, my user URL is:
https://stackoverflow.com/users/78374/stefano-borini
The fact is, only the numeric index is actually used:
https://stackoverflow.com/users/78374/
The remaining part can be anything. What is the reason behind this design decision, in particular considering that "cool URIs do not change"?
Edit: voting to close after I saw this question, which substantially puts the same issue forward. My question is a duplicate.
Part of the reason is so you can change your user name or the title of the post (correcting spellings etc.) but leave the URL valid.
It makes SEO sense to have the title in the URL - it makes it a lot more likely that the site will get indexed correctly.
It allows the URL to contain some interesting information for humans and search engines, but still works even if the title changes.
You could store the original "slug" in the database and verify against that as well as the id, but the only thing it prevents is games like this:
Lack of invariance in stackoverflow URL. Why?
:)
Search engines like text in URLs.
Pages are given higher rank when the search terms actually appear in the URL rather than just the page. It's robot sugar, basically.
This lets you see some text in the URL that means something to you. If I look through a history of links to a bunch of questions, the number alone would be meaningless; having the text there gives me some context.
This is SEO (search engine optimization) in action. It helps with the ranking of pages in search results (on Google, Yahoo, Bing, ...), because search engines give higher rankings to pages whose URLs contain keywords the user is searching for.
Theoretically, it's for SEO reasons. The software ignores the part following the identifier (the "slug"), but the idea is that search engine crawlers consider the description part of the link text, and thus weigh the resulting page higher in search results. Whether this actually happens in any meaningful way, I don't know for sure.
A more practical use is that you can gain a better idea of where a link's pointing just by inspecting the slug, which is handy if you've got multiple question URLs.
In addition to the benefit for Search Engines (the text in the URL is powerful), the fact is for all practical purposes this URL does not "change". A change can be defined as something which causes a link at some point in the future not to work - this would never be the case with this URL. The varying text at the end does not affect any user's ability to access the underlying resource.
"cool URIs do not change"
Cool URIs can change, as long as the old ones are still fine.
Maybe someone will decide that “Lack of invariance in stackoverflow URL. Why ?” is a bad question title, and change it to “Why is there redundant information in SO URLs?”. It would be good if the slug can update to reflect the new title, especially if the reason we wanted to change it was an embarrassing typo. But the old URI must continue to work.
One drawback to non-canonical URIs is that search engines can get confused and think they're two different pages. Or they'll spot that the two pages are the same, but decide that the ‘best’ page to link to is the one with the title you don't want. This is especially bad if lots of people link to a completely different title, like:
http://stackoverflow.com/questions/1534161/stackoverflow-smells-of-poo
cue more embarrassment. The best way around this (though few sites bother, and SO doesn't) is to check the slug on submission and do a 301 permanent redirect to the new URI with the up-to-date slug instead. Search engines will pick up the new URI and not any malicious one with poo in it.
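The slug check described above can be sketched roughly as follows. The in-memory `TITLES` store, the `resolve` function and the slug rules are assumptions for illustration. This is not how any particular site implements it.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical store of current titles, keyed by question id.
TITLES: Dict[int, str] = {1534161: "Lack of invariance in stackoverflow URL. Why?"}


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location) for a /questions/<id>/<slug> path."""
    match = re.fullmatch(r"/questions/(\d+)(?:/([^/]*))?/?", path)
    if not match or int(match.group(1)) not in TITLES:
        return 404, None
    question_id = int(match.group(1))
    canonical = f"/questions/{question_id}/{slugify(TITLES[question_id])}"
    if path != canonical:
        # Wrong, outdated or missing slug: redirect permanently to the current one.
        return 301, canonical
    return 200, None


print(resolve("/questions/1534161/stackoverflow-smells-of-poo"))
```

Because the lookup only depends on the id, old links keep working after a title change, while search engines only ever index the URI with the up-to-date slug.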

Resources