internationalization - translate urls or not - url

I've found a lot of topics talking about URL schemes when going international. The most common scheme is this one:
www.mysite.com/fr/products
www.mysite.com/en/products
www.mysite.com/es/products
Is it common or useful to do something like this instead?
www.mysite.com/fr/produits
www.mysite.com/en/products
www.mysite.com/es/productos
i.e. translating the path as well as the page content.

I'm not sure if Stack Overflow is the right forum for this, but here goes:
From a technical point of view, option 1 is probably the easiest to implement. If you are on any English-language page and need to link to, say, the Spanish version of the same page, all you need to do is take your URL and replace /en/ by /es/.
So far I've seen this approach used practically everywhere... I cannot remember ever having seen option 2 in practice.
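The "replace /en/ by /es/" step can be sketched as follows. This is only an illustration: the function name `switch_language` and the set `SUPPORTED_LANGUAGES` are made up for the example, and it assumes the language code is always the first path segment.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of language codes the site supports.
SUPPORTED_LANGUAGES = {"en", "fr", "es"}


def switch_language(url: str, target_lang: str) -> str:
    """Return url with its leading language segment replaced by target_lang."""
    if target_lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {target_lang}")
    parts = urlsplit(url)
    # "/en/products" splits into ["", "en", "products"].
    segments = parts.path.split("/")
    if len(segments) < 2 or segments[1] not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no language prefix in path: {parts.path}")
    segments[1] = target_lang
    return urlunsplit(parts._replace(path="/".join(segments)))


print(switch_language("http://www.mysite.com/en/products", "es"))
```

With option 2 (translated paths) this one-line swap is no longer enough: you would also need a lookup table from each path to its translation.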

Related

Canonical url and localization

In my application I have localized urls that look something like this:
http://examle.com/en/animals/elephant
http://examle.com/nl/dieren/olifant
http://examle.com/de/tiere/elefant
This question is mainly for Facebook Likes, but I guess I will hit similar problems when I start thinking about search engine crawlers.
What kind of URL would you expect as the canonical URL? I don't want to use the exact English URL, because I want people who click the link to be forwarded to their own language (based on browser settings or IP).
The IP lookup is not something that I want to do on every page hit. Besides that, I would need to incorporate more 'state' in my application, because I have to check whether a user has already been forwarded to their own locale or is browsing the English version on purpose.
I guess it is going to be something like:
http://example.com/something/animals/elephant
or maybe without any language identifier at all:
http://example.com/animals/elephant
but that is a bit harder to implement and carries a bigger chance of URL clashes in the future (in the rare case that I get a category called en or de).
Summary
What kind of url would you expect as canonical url? Is there already a standard set for this?
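To make the "forward people to their own language" idea concrete, here is a rough sketch of how a language-neutral URL could be resolved from the browser's Accept-Language header instead of an IP lookup. Everything named here (`SUPPORTED`, `LOCALIZED_PATHS`, `pick_language`, `redirect_target`) is hypothetical, the header parsing is simplified, and the paths are the ones from the question.

```python
from typing import Optional

# Hypothetical list of languages the application supports.
SUPPORTED = ("en", "nl", "de")

# Hypothetical lookup table: language-neutral path -> localized path per language.
LOCALIZED_PATHS = {
    "/animals/elephant": {
        "en": "/en/animals/elephant",
        "nl": "/nl/dieren/olifant",
        "de": "/de/tiere/elefant",
    },
}


def pick_language(accept_language: str, default: str = "en") -> str:
    """Pick the supported language with the highest q-value in the header."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Only the primary subtag is compared, so "nl-NL" counts as "nl".
        candidates.append((-quality, position, tag.split("-")[0]))
    for negative_quality, _, primary in sorted(candidates):
        if negative_quality < 0 and primary in SUPPORTED:
            return primary
    return default


def redirect_target(neutral_path: str, accept_language: str) -> Optional[str]:
    """Return the localized path to redirect to, or None for unknown paths."""
    translations = LOCALIZED_PATHS.get(neutral_path)
    if translations is None:
        return None
    return translations[pick_language(accept_language)]


print(redirect_target("/animals/elephant", "nl-NL,nl;q=0.9,en;q=0.8"))
```

Since the target depends on the visitor, a temporary redirect is probably the safer choice here, but that is a judgment call rather than a rule.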
I know this question is a bit old, but I was facing the same issue.
I found this:
Different language versions of a single page are considered duplicates only if the main content is in the same language (that is, if only the header, footer, and other non-critical text is translated, but the body remains the same, then the pages are considered to be duplicates).
That can be found here: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls
From this I can conclude that we should add locales to canonicals.
I did find one resource that recommends not using the canonical tag with localized addresses. However, Google's documentation does not specify and only mentions subdomains in another context.
There is more than language that you need to think of.
It's typically a tuple of 3: {region, language, property}
If you only have one website then you have {region, language} only.
Every piece of content can either be different in this 3 dimensional space, or at least presented differently. But this is the same piece of content so you'd like to centralize managing of editorial signals, promotions, tracking etc etc. Think about search systems - you'd like page rank to be merged across all instances of the article, not spread thinly out.
I think there is a standard solution: Canonical URL
Put language/region into the domain name
example.com
uk.example.com
fr.example.com
Now you have a choice of how to scope cookies: to the subdomain (for language/region) or to the domain (for user tracking)!
On every HTML page add a link to the canonical URL:
<link rel="canonical" href="http://example.com/awesome-article.html" />
Now you are done.
There certainly is no "standard" beyond the fact that it has to be a URL. What you certainly do see on many commercial websites is exactly what you describe:
<protocol>://<server>/<language>/<more-path>
For the "language-tag" you may follow RFCs as well. I guess your 2-letter-abbrev is quite fine.
I only disagree on the <more-path> of the URL. If I understand you right you are thinking about transforming each page into a local-language URL? I would not do that. Maybe I am not the standard user, but I personally like to manually monkey around in URLs, i.e. if the URL shown is http://examle.com/de/tiere/elefant, but I don't trust the content to be translated well I would manually try http://examle.com/en/tiere/elefant -- and that would not bring me to the expected page. And since I also dislike those URLs http://ex.com/with-the-whole-title-in-the-url-so-the-page-will-be-keyworded-by-search-engines my favorite would be to just exchange the <language> part and use generic english (or any other language) for <more-path>. Eg:
http://examle.com/en/animals/elephant
http://examle.com/nl/animals/elephant
http://examle.com/de/animals/elephant
If your site is something like Wikipedia, then I would agree with your scheme of translating the <more-path> as well.
Maybe these Google guidelines can help with your issue: https://support.google.com/webmasters/answer/189077?hl=en
It says that many websites serve users (across the world) with content targeted to users in a certain region. It is advised to use the rel="alternate" hreflang="x" attributes to serve the correct language or regional URL in Search results.
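As a rough illustration of the rel="alternate" hreflang="x" annotations, the snippet below builds the link tags for the three localized URLs from the question. The helper name `hreflang_links` is made up, and choosing English as the "x-default" fallback is an assumption, not something the question states.

```python
from html import escape


def hreflang_links(urls_by_lang: dict[str, str], default_lang: str = "en") -> str:
    """Build one <link rel="alternate"> tag per language plus an x-default."""
    lines = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(urls_by_lang.items())
    ]
    # "x-default" marks the URL to use when no listed language matches.
    fallback = urls_by_lang[default_lang]
    lines.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(fallback)}" />'
    )
    return "\n".join(lines)


urls = {
    "en": "http://examle.com/en/animals/elephant",
    "nl": "http://examle.com/nl/dieren/olifant",
    "de": "http://examle.com/de/tiere/elefant",
}
print(hreflang_links(urls))
```

The same set of tags would go into the head of every language version, each version listing all the others as well as itself.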

URL SEO for e-commerce website products

I'm currently working out the best setup for URLs on an e-commerce site I'm working on.
The site sells games and as you may already know games can come with demos and multiple dlc packs. On the site a game, a demo and dlc all have their own individual pages.
I have designed the following urls... but can't figure out which one is better and whether potentially they might be too long.
Option One:
.../product/the-game-name/ // the full game
.../product/the-game-name/demo/ // the demo
.../product/the-game-name/dlc/name-of-dlc/ // the specific dlc
Option Two:
.../game/the-game-name/ // the full game
.../demo/the-game-name/ // the demo
.../dlc/the-game-name/name-of-dlc/ // the specific dlc
In both examples "..." is purely the domain name: i.e. http://mysite.com
If anyone can tell me the pros and cons of either option, or whether there are better alternatives that would be handy.
I'd avoid extending the URL with things like /product/ if possible. It doesn't really add anything.
Otherwise, I prefer the first option as it seems more natural.
You may also want to be consistent with using folders (ends with a slash) or files, don't mix.
.../the-game-name/ # the full game
.../the-game-name/demo/ # the demo
.../the-game-name/dlc/name-of-dlc/ # the specific dlc
Your use of dashes for spaces is good.
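A small sketch of how the shorter scheme above could be generated consistently, with every path ending in a slash. The helper names (`slugify`, `game_path`, `demo_path`, `dlc_path`) are invented for the example, and the slug rule (lowercase ASCII letters and digits joined by dashes) is an assumption.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and join its ASCII alphanumeric runs with dashes."""
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


def game_path(game: str) -> str:
    """Path of the full game, always with a trailing slash."""
    return f"/{slugify(game)}/"


def demo_path(game: str) -> str:
    """Path of the demo, nested under the game."""
    return f"{game_path(game)}demo/"


def dlc_path(game: str, dlc: str) -> str:
    """Path of one DLC pack, nested under the game."""
    return f"{game_path(game)}dlc/{slugify(dlc)}/"


print(game_path("The Game Name"))
print(demo_path("The Game Name"))
print(dlc_path("The Game Name", "Name of DLC"))
```

Building all three from one base function keeps the trailing-slash convention in a single place.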
I've always been told that you want your key phrase rich URLs as close to the root as possible. To use hyperbole, domain.com/manufacturers/nintendo/products/super-mario-brothers is less effective than domain.com/super-mario-brothers or domain.com/nintendo/super-mario-brothers.
With the slashes, I assume you have IIS7. If you don't, you can download the Rewrite Module 2.0, and once installed, click it, then click add rule, there's a module right on the interface for taking care of trailing slashes throughout your solution.
After you've downloaded the Rewrite module, I would also recommend the Search Engine Optimization Toolkit that will offer suggestions as well. Some of the suggestions may be feeble, but some may be worth looking at. For instance, maybe you have your pageheaders styled a certain way but you don't have h1 tags, which SEs like.
Lastly, none of the rewrite modules mean anything if you don't get your hands dirty and go out there and make backlinks. Submit your sitemap.xml to google, your urllist.txt to yahoo/bing, and then backlink. Put your site in as many free directories as possible if they're relevant to your industry -- don't "link farm", though. Make a notepad file with a 30 or 40 word description with key phrases from your site, and just copy/paste 'em into backlinking directories. The most important backlinks are the ones that are pertinent to your product.

Lack of invariance in stackoverflow URL. Why? [duplicate]

Possible Duplicate:
Why do some websites add “Slugs” to the end of URLs?
This is not a question about stackoverflow, it's a question about a design decision which stackoverflow implements, and I take it as example.
A question on stackoverflow is identified by the following URL (took one from the suggestions)
https://stackoverflow.com/questions/363292/why-is-visual-c-lacking-refactor
Similarly, my user URL is:
https://stackoverflow.com/users/78374/stefano-borini
The fact is, only the numeric index is actually used:
https://stackoverflow.com/users/78374/
The remaining part can be anything. What is the reason behind such a design decision, particularly considering that "cool URIs do not change"?
Edit: voting for close after I saw this question which substantially puts the same issue forward. My question is a duplicate
Part of the reason is so you can change your user name or the title of the post (correcting spellings etc.) but leave the URL valid.
It makes SEO sense to have the title in the URL - it makes it a lot more likely that the site will get indexed correctly.
It allows the URL to contain some interesting information for humans and search engines, but still works even if the title changes.
You could store the original "slug" in the database and verify against that as well as the id, but the only thing it prevents is games like this:
Lack of invariance in stackoverflow URL. Why?
:)
Search engines like text in URLs.
Pages are given higher rank when the search terms actually appear in the URL rather than just the page. It's robot sugar, basically.
This allows you to see in the url some text which means something to you. If I look in a history of links a bunch of questions, the number alone would be meaningless. However, having the text there allows me to have some context.
This is SEO (search engine optimization) in action. It helps with the ranking of pages in search results (on Google, Yahoo, Bing, ...), because search engines give higher rankings to pages whose URLs contain keywords the user is searching for.
Theoretically, it's for SEO reasons. The software ignores the part following the identifier (the "slug"), but the idea is that search engine crawlers consider the description part of the link text, and thus weigh the resulting page higher in search results. Whether this actually happens in any meaningful way, I don't know for sure.
A more practical use is that you can gain a better idea of where a link's pointing just by inspecting the slug, which is handy if you've got multiple question URLs.
In addition to the benefit for Search Engines (the text in the URL is powerful), the fact is for all practical purposes this URL does not "change". A change can be defined as something which causes a link at some point in the future not to work - this would never be the case with this URL. The varying text at the end does not affect any user's ability to access the underlying resource.
"cool URIs do not change"
Cool URIs can change, as long as the old ones are still fine.
Maybe someone will decide that “Lack of invariance in stackoverflow URL. Why ?” is a bad question title, and change it to “Why is there redundant information in SO URLs?”. It would be good if the slug can update to reflect the new title, especially if the reason we wanted to change it was an embarrassing typo. But the old URI must continue to work.
One drawback to non-canonical URIs is that search engines can get confused and think they're two different pages. Or they'll spot that they're two pages the same, but decide that the ‘best’ page to link to is the one with the title you don't want. This is especially bad if lots of people link to another title completely like:
http://stackoverflow.com/questions/1534161/stackoverflow-smells-of-poo
cue more embarrassment. The best way around this (though few sites bother, and SO doesn't) is to check the slug on submission and do a 301 permanent redirect to the new URI with the up-to-date slug instead. Search engines will pick up the new URI and not any malicious one with poo in it.
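The "check the slug and 301 to the up-to-date one" idea could look roughly like this. It is a sketch, not how Stack Overflow actually works: `QUESTIONS` is a stand-in for the database, `resolve` and `slugify` are made-up names, and the slug rule is an assumption.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its ASCII alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


# Hypothetical in-memory store standing in for the database: id -> current title.
# The id is the one from the example URL above.
QUESTIONS = {1534161: "Lack of invariance in stackoverflow URL. Why?"}

# Matches /questions/<id>, /questions/<id>/ and /questions/<id>/<any-slug>.
PATH_RE = re.compile(r"^/questions/(\d+)(?:/([^/]*))?/?$")


def resolve(path: str) -> tuple[int, str]:
    """Return (status, location): 200 if path is canonical, else 301 or 404."""
    match = PATH_RE.match(path)
    if match is None:
        return 404, path
    question_id = int(match.group(1))
    title = QUESTIONS.get(question_id)
    if title is None:
        return 404, path
    canonical = f"/questions/{question_id}/{slugify(title)}"
    if path == canonical:
        return 200, canonical
    # Wrong, outdated or missing slug: send a permanent redirect.
    return 301, canonical


print(resolve("/questions/1534161/stackoverflow-smells-of-poo"))
```

Only the ID is used for the lookup, so old links keep working after a title edit, while search engines end up with a single URL per question.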

Best way to format pretty URLs for numeric IDs

Alright, so let's say I'm writing a forum application, and I want pretty URLs. However, all my tables use numeric IDs, so I'm not sure the best way to format the URLs for those resources. Let's pretend I'm trying to get a topic with ID 123456 and title This is a forum post. I've seen it done a couple ways:
www.example.com/topic/123456
www.example.com/topic/this-is-a-forum-post
www.example.com/topic/123456/this-is-a-forum-post
Which one would you say is, taking all things into consideration (including SEO), the optimal URL?
Sorry if this question is too vague, but it seems programming-related and it's not incredibly open-ended, as I just want to hear the pros and cons of each method.
I would go with option 3, and make the slug (the last bit) optional
Because?
The ID will always be unique... 2 people may make a thread with the name 'good news' for example
The search bots can access the slug for some SEO goodness
The slug should be optional ... Using just the ID should still give you access to the page. Perhaps if the slug isn't there you could forward to the slug'd version, if you're concerned about duplicate content. You could always use the canonical link tag (rel="canonical") to tell Google to index the slugged version.
Another benefit of the optional slug is if someone copies and pastes the URL into a document, there is a chance it could have characters at the end chopped off (because URLs generally don't have spaces, so they don't break to new lines). Having the slug optional means there is more of a chance people will find your page.
I believe this is what Stack Overflow does.. and also notice they are doing rather well in the Search Engines.
Update
From the comments, be sure to 301 redirect any missing slug version to the correct slug.
URL 1 is definitely suboptimal. URL 2 is attractive but you run the risk of confusion if tags collide, especially if they differ only in punctuation. So I'd say URL 3 is the clear winner.
Also note that just because you display URL 3 is no reason not to accept all 3, with the other two redirecting. If URL 2 is ambiguous, it should redirect to a disambiguation page.
I would think that the 2nd URL would be the best for SEO since it is meaningful and has less depth. It's nicer for people as well since you can look at the URL and know what the content is about.
1. Doesn't include the title, so you'll lose the additional SEO value of having those keywords in the URL.
2. Won't work well, because it doesn't have a unique numerical ID, so what are you going to do if someone else tries to post a topic titled "This is a forum post"? Then you start getting into the weird thing digg does, where it has to give the second one the url "http://www.example.com/topic/this-is-a-forum-post_2", and so on. It makes it harder to take the URL they tried to load, and figure out exactly which topic they were trying to get to.
3. Has the best of both worlds, this would be my style of choice.
Stack Overflow seems to be using pattern 3, with the title being ignored completely (just the ID is used).
That makes for nice semantic URL, and is also easy to implement, and still works if the title changes later.
Of course, the title could be completely fake:
Best way to format pretty URLs for numeric IDs
I'll go for the first one. It really doesn't matter now, since URL shorteners exist and will only proliferate and become the norm in the future. Remember, the longer your URL, the fewer SEO points you'll get.
And you can't control the way people name their forum topics. So really, I'd just choose the first one for simplicity and because it is the norm.
For SEO/traffic, definitely no.2 without a doubt. Get those meaningless numbers out of the URL every single time.
www.example.com/topic/this-is-a-forum-post
Pick up the "this-is-a-forum-post" slug from the URL and map it back to the ID number in your database via a query. Then do an internal URL rewrite to the real page, something like /topic.php?ID=324342
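A minimal sketch of that slug-to-ID lookup, using an in-memory SQLite database. The table layout (`topics` with a unique `slug` column) and the function names are assumptions made for the example; the ID and the /topic.php target are the ones from the answer.

```python
import sqlite3
from typing import Optional


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with one topic (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topics ("
        "id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE, title TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO topics (id, slug, title) VALUES (?, ?, ?)",
        (324342, "this-is-a-forum-post", "This is a forum post"),
    )
    conn.commit()
    return conn


def topic_id_for_slug(conn: sqlite3.Connection, slug: str) -> Optional[int]:
    """Look up the numeric ID for a slug, or None if no topic has that slug."""
    row = conn.execute("SELECT id FROM topics WHERE slug = ?", (slug,)).fetchone()
    return row[0] if row else None


def internal_path(conn: sqlite3.Connection, public_path: str) -> Optional[str]:
    """Map /topic/<slug> to the internal /topic.php?ID=<id> path."""
    prefix = "/topic/"
    if not public_path.startswith(prefix):
        return None
    topic_id = topic_id_for_slug(conn, public_path[len(prefix):].strip("/"))
    return None if topic_id is None else f"/topic.php?ID={topic_id}"


conn = build_demo_db()
print(internal_path(conn, "/topic/this-is-a-forum-post"))
```

The UNIQUE constraint on the slug column is what makes this scheme workable: a second topic with the same title needs a different slug before it can be stored.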
I would go with option 2, as SEO can better understand.
Stack Overflow uses the third way, and that is probably the reason Stack Overflow URLs are not well optimized for SEO. I am not sure about the answer above.
But in my experience with Google, I quite often see solutions from other forums, whereas Stack Overflow solutions are almost invisible.
Best way to format pretty URLs for numeric IDs
Best way to format pretty URLs for numeric IDs
If both URLs were one and the same, the SEO simply goes with option 2, which is less optimized.
I'm not convinced longer URLs are SEO trouble. The depth seems to be a bigger issue, and not by counting slashes, but by the number of steps it takes to get from an indexed page with rank to the content page. I recently created a dummy test page titled /content/roofing/how-much-does-a-shingle-roof-cost.html and threw it on the server just to test pathways and make sure my directories were working correctly. I'm not even sure how Google discovered the page, but it did, and it started getting traffic, so I had to give it content and make it part of the family. The dummy content was a copy of our about page, so it wasn't empty, but I was surprised an unpromoted page would get traction, and I think the URL had something to do with that.
Which brings up a slight alternative to the above 3 choices for a URL. What if you went with number 3 but added .html to the end? I generally do this with dynamic URLs, but I have no concrete evidence that it's helpful. Google brags that it can index dynamic URLs just fine, so there's no need to do URL rewrites at all. Google doesn't mind a bit if the other engines aren't as good at that. Several sites I trust add the .html at the end (Blogger, for example) and it can't hurt, so I still do it.
I would suggest the first one, since the topic title can be changed for clarity by the admins, and then the URL will be inconsistent.
www.example.com/topic/123456
It also allows one to just edit the last bit of the URL (the number) and jump to another topic; not likely to happen, but still a usable feature.

Why do some websites add "Slugs" to the end of URLs? [closed]

Many websites, including this one, add what are apparently called slugs - descriptive but as far as I can tell useless bits of text - to the end of URLs.
For example, the URL the site gives for this question is:
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
But the following URL works just as well:
https://stackoverflow.com/questions/47427/
Is the point of this text just to somehow make the URL more user friendly or are there some other benefits?
The slugs make the URL more user-friendly, and you know what to expect when you click a link. Search engines such as Google rank pages higher if the search word is in the URL.
Usability is one reason, if you receive that link in your e-mail, you know what to expect.
SEO (search engine optimization) is another reason. Search engines such as google will rank your page higher for the keywords contained in the url
I recently changed my website url format from:
www.mywebsite.com/index.asp?view=display&postid=100
To
www.mywebsite.com/this-is-the-title-of-the-post
and noticed that click-through rates to articles increased by about 300% after the change. It certainly helps users decide whether what they're thinking of clicking on is relevant. In terms of SEO, though, I have to say I've seen little impact after the change.
I agree with other responses that any mis-typed slug should 301-redirect to the proper form. In other words, https://stackoverflow.com/questions/47427/wh should redirect to https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls . It has one other benefit that hasn't been mentioned--if you do not do a redirect to a canonical URL, it will appear that you have a near-infinite number of duplicate pages. Google hates duplicate content.
That said, you should really only care about the content ID and allow any input for the slug as long as you redirect. Why?
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
... Oops, the mail software cut off the end of the URL! No problem though because you still can roll with just https://stackoverflow.com/questions/47427
The one big problem with this approach is if you derive the slug from the title of your content, how are you going to deal with non-ASCII, UTF-8 titles?
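One possible way to handle such titles, sketched with the standard library only: either transliterate to ASCII by stripping accents, or keep the Unicode letters and let the URL be percent-encoded. The function name `slugify` and its `keep_unicode` flag are made up for this example.

```python
import re
import unicodedata


def slugify(title: str, keep_unicode: bool = False) -> str:
    """Turn a title into a hyphen-separated slug.

    With keep_unicode=False, accented letters lose their accents and every
    other non-ASCII character is dropped. With keep_unicode=True, letters
    and digits from any script are kept.
    """
    if keep_unicode:
        text = unicodedata.normalize("NFKC", title)
    else:
        # NFKD splits "é" into "e" plus a combining accent; the ASCII
        # round trip then discards the accent and anything else non-ASCII.
        text = (
            unicodedata.normalize("NFKD", title)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
    # [^\W_]+ matches runs of letters and digits (Unicode-aware for str).
    return "-".join(re.findall(r"[^\W_]+", text.lower()))


print(slugify("Crème brûlée à la carte"))
print(repr(slugify("日本語のタイトル")))
print(slugify("日本語のタイトル", keep_unicode=True))
```

The ASCII route is lossy: a title with no Latin letters produces an empty slug, which is one more reason to keep the numeric ID in the URL and treat the slug as optional.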
The reason most sites use it is probably SEO (Search Engine Optimization). Yahoo used to give a reasonable weighting to the presence of the search keyword in the URL itself, and it also helped in the Google result as well.
More recently the search engines have lowered the weighting given to keywords in the URL, likely because the technique is now more common on spam sites than legitimate. Keywords in the URL now have only a very minor impact on the search results, if at all.
As for stackoverflow itself, SEO might be a motivation (old habits die hard) or simply for usability.
It's basically a more meaningful location for the resource. Using the ID is perfectly valid but it means more to machines than people.
Strictly speaking the ID shouldn't be needed if the slug is unique, you can more easily ensure unique slugs by scoping them inside dates.
ie:
/2008/sept/06/why-some-websites-add-slugs-end-of-urls/
Basically this exploits the low likelihood of two identical slugs being in use on the same day. If there is a clash the general convention is to add a counter at the end of the slug but it's rare that you ever see these:
/2008/sept/06/why-some-websites-add-slugs-end-of-urls/
/2008/sept/06/why-some-websites-add-slugs-end-of-urls-1/
/2008/sept/06/why-some-websites-add-slugs-end-of-urls-2/
A lot of slug algorithms also get rid of common words like "the" and "a" to assist in keeping the URL short. This scoped approach also makes it very straightforward to find all resources for a given day, month or year - you simply chop off segments.
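The date-scoped scheme with a clash counter can be sketched like this. The stop-word list, the month abbreviations and the `taken` set (standing in for a database of existing paths) are all assumptions chosen so that the output matches the example paths above.

```python
import re
from datetime import date

# Hypothetical stop-word list; real slug algorithms use longer ones.
STOP_WORDS = {"a", "the", "to", "do"}

# Month abbreviations as used in the example paths above (an assumption).
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sept", "oct", "nov", "dec")


def short_slug(title: str) -> str:
    """Slugify the title and drop common words to keep it short."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(word for word in words if word not in STOP_WORDS)


def scoped_path(title: str, day: date, taken: set[str]) -> str:
    """Return a unique /year/month/day/slug/ path and record it in taken."""
    base = f"/{day.year}/{MONTHS[day.month - 1]}/{day.day:02d}/{short_slug(title)}"
    candidate = f"{base}/"
    counter = 0
    # On a clash, append -1, -2, ... until the path is free.
    while candidate in taken:
        counter += 1
        candidate = f"{base}-{counter}/"
    taken.add(candidate)
    return candidate


taken: set[str] = set()
title = "Why do some websites add slugs to the end of URLs"
for _ in range(3):
    print(scoped_path(title, date(2008, 9, 6), taken))
```

In a real application the uniqueness check would have to happen in the database, since two requests could otherwise claim the same path at the same time.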
Additionally, stackoverflow URLs are bad in the sense that they introduce an additional segment in order to feature the slug, which is a violation of the idea that each segment should represent descending a resource hierarchy.
The term slug comes from the newspaper/publishing business. It's a short title that's used to identify a story in progress. People interested in URL semantics started using a short, abbreviated title in their URLs. It also pays off in SEO land, as keywords in URLs add importance to a page.
Ironically, lots of websites have started placing a full serialized-with-hyphens version of the titles in their URLs for strictly SEO purposes, which means the term slug no longer quite applies. This also rankles semantic purists, as many implementations just tack this serialized version of the title onto the end of their URLs.
I note that you can change the text freely. This URL appears to work just as well.
https://stackoverflow.com/questions/47427/why-is-billpg-so-very-awesome
As already stated, the 'slug' helps people and the search engines...
Something worth noticing is that there is a canonical URL in the source of the page.
This stops the page from being indexed multiple times.
Example:
<link rel="canonical" href="http://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls">
Remove the formatting from your question, and you'll see part of the answer:
https://stackoverflow.com/questions/47427/
vs
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
With no markup, the second one is self-descriptive.
Don't forget readability when sending a link, not just in search engines. If you email someone the first link they can look at the URL and get a general idea of what it is about. The second one gives no indication of the content of that page before they click.
If you emailed someone a link, wouldn't it make more sense to include a description by actually writing one out, rather than making the other person parse the URL where the description lives and try-to-read-a-bunch-of-hyphenated-words-stuck-together?
First off, it's SEO and user friendly, but in the case of the example (this site), it's not done well or correctly
(as it is open to black hat tricks and rank poisoning by others, which would reflect badly on this site).
If
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
has the content, then
https://stackoverflow.com/questions/47427/
and
https://stackoverflow.com/questions/47427/any-other-bollix
should not be duplicates. They should actually automatically detect the link followed is not using the current text (as obviously the slug is defined by the question title and can be later edited) and they should redirect 301 automatically to
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
thus ensuring the "one piece of content to one URI" rule, and if the URI moves/changes, ensure the old bookmarks follow/move with it through 301 redirects (so intelligent browsers can update the bookmarks).
Ideally, the "slug" should be the only identifier needed. In practice, on dynamic sites such as this, you either have to have a unique numerical identifier or start appending/incrementing numbers to the "slug" like Digg does.

Resources