Url format for internationalized web app? - url

Scenario
The web server gets a request for http://domain.com/folder/page. The Accept-Language header tells us the user prefers Greek, with the language code el. That's good, since we have a Greek version of page.
Now we could do one of the following with the URL:
Return a Greek version keeping the current URL: http://domain.com/folder/page
Redirect to http://domain.com/folder/page/el
Redirect to http://domain.com/el/folder/page
Redirect to http://el.domain.com/folder/page
Redirect to http://domain.com/folder/page?hl=el
...other alternatives?
Which one is best? Pros, cons from a user perspective? developer perspective?

I would not go for option 1, if your pages are publically available, i.e. you are not required to log in to view the pages.
The reason is that a search engine will not scan the different language versions of the page.
The same reason goes agains option no 5. A search engine is less likely to identify two pages as separate pages, if the language identification goes in the query string.
Lets look at option 4, placing the language in the host name. I would use that option if the different language versions of the site contains completely different content. On a site like Wikipedia for example, the Greek version contains its own complete set of articles, and the English version contains another set of articles.
So if you don't have completely different content (which it doesn't seem like from your post), you are left with option 2 or 3. I don't know if there are any compelling arguments for one over the other, but no. 3 looks nicer in my eyes. So that is what I would use.
But just a comment for inspiration. I'm currently working on a web application that has 3 major parts, one public, and two parts for two different user types. I have chosen the following url scheme (with en referring to language of course):
http://www.example.com/en/x/y/z for the public part.
http://www.example.com/part1/en/x/y/z for the one private part
http://www.example.com/part2/en/x/y/z for the other private part.
The reason for this is that if I were to split the three parts up into separate applications, it will be a simple reconfiguration in the web server when I have the name of the part at the top of the path. E.g. if we were to use a commercial CMS system for the public part of the site
Edit:
Another argument against option no. 1 is that if you ONLY listen to accept-language, you are not giving the user a choice. The user may not know how to change the language set up in a browser, or may be using a frinds computer setup to a different language. You should at least give the user a choice (storing it in a cookie or the user's profile)

I'd choose number 3, redirect to http://example.com/el/folder/page, because:
Language selection is more important than a page selection, thus selected language should go first in a true human-readable URL.
Only one domain gets all Google's PR. That's good for SEO.
You could advert your site locally with a language code built-in. E.g. in Greece you would advert as http://example.com/el/, so every local visitor will get to a site in Greece and would avoid language-choosing frustration.
Alternatively, you can go for number 5: it is fine for Google and friends, but not as nice for a user.
Also, we should refrain to redirect a user anywhere, unless required. Thus, in my mind, a user opening http://example.com/folder/page should get not a redirect, but a page in a default language.

Number four is the best option, because it specifies the language code pretty early. If you are going to provide any redirects always be sure to use a canonical link tag.

Pick option 5, and I don't believe it is bad for SEO.
This option is good because it shows that the content for say:
http://domain.com/about/corporate/locations is the same as the content in
http://domain.com/about/corporate/locations?hl=el except that the language differs.
The hl parameter should override the Accept-language header so that the user can easily control the matter. The header would only be used when the hl parameter is missing. Granted linking is a little complicated by this, and should probably be addressed through either a cookie which would keep the redirection going to the language chosen by the hl parameter (as it may have been changed by the user from the Accept-language setting, or by having all the links on the page be processed for adding on the current hl parameter.
The SEO issues can be addressed by creating index files for everything like stackoverflow does, these could include multiple sets of indices for the different languages, hopefully encouraging showing up in results for the non-default language.
The use of 1 takes away the differentiator in the URL. The use of 2 and 3 suggest that the page is different, possibly beyond just language, like wikipedia is. And the use of 4 suggests that the server itself is separated, perhaps even geographically.
Because there is a surprisingly poor correlation of geographic location to language preferences, the issue of providing geographically close servers should be left to a proper CDN setup.

My own choice is #3: http://domain.com/el/folder/page. It seems to be the most popular out there on the web. All the other alternatives have problems:
http://domain.com/folder/page --- Bad for SEO?
http://domain.com/folder/page/el --- Doesn't work for pages with parameters. This looks weird: ...page?par1=x&par2=y/el
http://domain.com/el/folder/page --- Looks good!
http://el.domain.com/folder/page --- More work needed to deploy since it requires adding subdomains.
http://domain.com/folder/page?hl=el --- Bad for SEO?

It depends. I would choose number four personally, but many successful companies have different ways of achieving this.
Wikipedia uses subdomains for various
languages (el.wikipedia.org).
So does Yahoo (es.yahoo.com for Spanish), although it doesn't support Greek.
So does Gravatar (el.gravatar.com)
Google uses a /intl/el/ directory.
Apple uses a /gr/ directory (albeit in English and limited to an iPhone page)
It's really up to you. What do you think your customers will like the most?

None of them. A 'normal user' wouldn't understand (and so remember) any of those abbreviations.
In order of preference I'd suggest:
http://www.domain.gr/folder/page
http://www.domain.com/
http://domain.com/gr/folder/page

3 or 4.
3: Can be easily dealt with using htaccess/mod_rewrite. The only downside is that you'd have to write some method of automatically injecting the language code as the first segment of the URI.
4: Probably the best method. Using host headers, it can all be sent to the same web application/content but you can then use code to extract the language code and go from there.
Simples. ;)

I prefer 3 or 4

Related

URL structure for multilingual websites

I'm developing a SPA web app and it will support various languages. It is build with AngularJS and I am using angular-translate to provide i18n.
But I am struggling a little bit with how the URL structure should be. I do no plan on using either gTLDs nor ccTLDs, so that leaves me with three options.
Use query params: ?locale=en-us
Use url paths: /en-us/page
Store the chosen locale in localStorage or a cookie
The first option is a no-go according to Google's guidelines for web apps SEO. So that leaves me with the last two options.
I have a hard time deciding which is more beneficial, though I am inclined to believe that using url paths would probably be more crawler friendly.
P.S: Not sure if this is the best place to ask such a question either.
The second option is your safest bet as according to https://webmasters.stackexchange.com/questions/59652/what-happens-if-i-try-to-set-a-cookie-on-a-bot cookies are ignored. You can test this yourself by going to the Google Console and fetching your website.
As of now most crawlers ignore cookies and DO NOT execute JavaScript. This means that they usually just download the html and make their judgements from there.
Some developers get around the no javascript problem by pre-rendering parts of their content. I haven't done it personally but you might want to check out https://prerender.io/
Edit
As rolandjitsu mentioned google crawls and executes javascript content.
You should go with second option: provide the language tag (and, optionally, region subtags) in the URL path as first segment.
For the simple reason that it allows you, visitors, and bots to link to specific translations.

Canonical url and localization

In my application I have localized urls that look something like this:
http://examle.com/en/animals/elephant
http://examle.com/nl/dieren/olifant
http://examle.com/de/tiere/elefant
This question is mainly for Facebook Likes, but I guess I will hit similar problems when I start thinking about search engine crawlers.
What kind of url would you expect as canonical url? I don't want to use the exact english url, because I want that people clicking the link will be forwarded to their own language (browser setting/dependent on IP).
The IP lookup is not something that I want to do on every page hit. Besides that I would need to incorporate more 'state' in my application, because I have to check wether a user has already been forwarded to his own locale, or is browsing the english version on purpose.
I guess it will going to be something like:
http://example.com/something/animals/elephant
or maybe without any language identifier at all:
http://example.com/animals/elephant
but that is a bit harder to implement, bigger chance on url clashes in the future (in the rare case I would get a category called en or de).
Summary
What kind of url would you expect as canonical url? Is there already a standard set for this?
I know this question is a bit old, but I was facing the same issue.
I found this:
Different language versions of a single page are considered duplicates only if the main content is in the same language (that is, if only the header, footer, and other non-critical text is translated, but the body remains the same, then the pages are considered to be duplicates).
That can be found here: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls
From this I can conclude that we should add locales to canonicals.
I did find one resource that recommends not using the canonical tag with localized addresses. However, Google's documentation does not specify and only mentions subdomains in another context.
There is more that that language that you need to think of.
It's typical a tuple of 3 {region, language, property}
If you only have one website then you have {region, language} only.
Every piece of content can either be different in this 3 dimensional space, or at least presented differently. But this is the same piece of content so you'd like to centralize managing of editorial signals, promotions, tracking etc etc. Think about search systems - you'd like page rank to be merged across all instances of the article, not spread thinly out.
I think there is a standard solution: Canonical URL
Put language/region into the domain name
example.com
uk.example.com
fr.example.com
Now you have a choice how you attach a cookie for subdomain (for language/region) or for domain (for user tracking)!
On every html page add a link to canonical URL
<link rel="canonical" href="http://example.com/awesome-article.html" />
Now you are done.
There certainly is no "Standard" beyond it has to be an URL. What you certainly do see on many comercial websites is exactly what you describe:
<protocol>://<server>/<language>/<more-path>
For the "language-tag" you may follow RFCs as well. I guess your 2-letter-abbrev is quite fine.
I only disagree on the <more-path> of the URL. If I understand you right you are thinking about transforming each page into a local-language URL? I would not do that. Maybe I am not the standard user, but I personally like to manually monkey around in URLs, i.e. if the URL shown is http://examle.com/de/tiere/elefant, but I don't trust the content to be translated well I would manually try http://examle.com/en/tiere/elefant -- and that would not bring me to the expected page. And since I also dislike those URLs http://ex.com/with-the-whole-title-in-the-url-so-the-page-will-be-keyworded-by-search-engines my favorite would be to just exchange the <language> part and use generic english (or any other language) for <more-path>. Eg:
http://examle.com/en/animals/elephant
http://examle.com/nl/animals/elephant
http://examle.com/de/animals/elephant
If your site is something like Wikipedia, then I would agree to your scheme of translating the <more-part> as well.
Maybe this Google's guidelines can help with your issue: https://support.google.com/webmasters/answer/189077?hl=en
It says that many websites serve users (across the world) with content targeted to users in a certain region. It is advised to use the rel="alternate" hreflang="x" attributes to serve the correct language or regional URL in Search results.

translate to multiple languages

I would like to get translation from one ( best - automatically detected) language to 4 different using google-translate. My idea is to wrote a html document which contain 4 frames - in one of them I can find text form and button. After click on it, Internet browser will send demand to google translate and show results in 4 frames.
If you want a self service, hosted service that does translations and content management for you check out Localize.js
This is going to be terribly translated. As someone that speaks English well, Russian poorly, and Spanish even more poorly, I can detect that these auto-translations never come out right.
My recommendation is to serve your page through a basic system that will allow you to respond to submitted form values. Pass in &LANG=two country iso code and then have your backend serve up the correct data.
Have someone that speaks both languages prepare the content for you. Then, whenever you are serving these pages, you can also conditionally adjust CSS to account for differences in format which come from difference in language length.
If you don't have those capabilities available, make 5 pages. One in English and the other 4 in the other languages. You will seriously seem retarded to anyone that speaks those languages well if you use an auto-translate. I think this is a bad idea for any kind of professional page, even if you can work out the technical issues.
-Brian J. Stinar-
Google has an API to its translate tool that will enable you to send it some text and receive back that text translated into any language you choose.
edit: This is now a paid service

Rails - search engine indexing of redirect action

I have a multilingual site with the same content in different languages with descriptive seo urls incorporating the title of each pages article. To switch between said languages of translated articles I have an action which looks up the translated title using the previous language and redirects to it. This all works fine except I noticed, despite there being no view, google has indexed said redirect urls.
Is this bad practice? I don't want to 301 redirect as it seems having links on every page to 301 redirects is a really bad idea. Do I somehow include a meta tag or is there some other approach?
The reason I currently have this is I want each article page to link to all of its translations using flags at the top of each page. The more I think about it I should just generate the direct url as this itself may have seo benefits. The reason I didn't go down this path originally was page rendering speed. I'd have to look up multiple articles solely for their url slug and expire caches of all languages upon any title change (it's a wiki style user generated content). Also, in some cases a translation wouldn't exist in which case I would need to link instead, say, to the category of article with a flash message.
So thinking through this while writing maybe this seems the preferable if more difficult to implement solution?
Hey Mark, from a search engine perspective you definitely don't want to rely on redirects everywhere, if for nothing other than performance. Search engines allocate a certain amount of bandwidth to each site based on ranking, if you're redirecting every page, you're eating up more of that bandwidth than you need to, and potentially not getting as much content crawled as you could otherwise.
Your second solution of generating the localized URLs and sticking them at the top of the page is the best option for search engines. That will give a unique URL for each page, and will provide a direct link to each page that Google and Bing (e.g. Yahoo) can follow and index.
I provided a set of best practices for SEO & Localized sites on another stackoverflow Q&A, here's a link, I think you'll find it valuable too: Internationalization and Search Engine Optimization
Good luck!
I have an app that I'm building that supports ten languages: English, simplified and traditional Chinese, French, Spanish, Russian, Japanese, German, and Hindi.
I tried a number of things but what I ended up doing was making :en default and then switching by where the request was coming from and then when uses signup they can set a default language. So if it was coming from mainland China I use :scn, and if it comes from Hong Kong I use :tcn traditional Chinese/simplified Chinese.
This way the application maintains a state of a language and there is no redirection.
I think any redirection is going to be troublesome so I wouldn't do that. Also, I am working on a dynamic site map that will list all of the links to google, which will have 10 different translations per 'page'.
I haven't deployed my application yet so I cannot check the Chinese search engines etc... to see if they are indexing my content.

What is the advantage of putting the language indicator into the URL?

I'm doing a site which supports multiple languages. At the moment, I’m doing like /en/… in the URL path and using .htaccess to determine which language the user is on. Actually, this is very common for sites with multiple languages to either do http://en.example.com or http://example.com/en/.
My question is: Why is it so common to show in the URL which language the user is viewing? I can't see any technical advantages. Is it for optimizing user experience?
Because you could easily just use sessions/cookies and hide it from the user which I'm leaning to at the moment.
Thanks in advance :)
For easy bookmarking probably.
Specifying the language information in the URL is 1 way to indicate that you want to view in that particular language, ignoring your current locale.
Wrapping this information in the URL is better than using a cookie for example, as some users may delete all cookies after each browsing session.
And because of this pseudo REST like URL, /en/, it is easily bookmarkable, and search engine friendly
I think it's used as a substitute for not owning the domain within each TLD. (ie company.co.uk and company.com).
It's also usable because of the uri's possibility to be localised: ikea.com/se/stolar could be the localised variant of ikea.com/en/chairs; usable both for the end user and SEO.
It is not directory, but mod_rewrite - such url as:
http://google.pl/en
gets rewritten server side for:
http://google.pl?lang=en
and for every language it will be more handy.
Why? Because if client saves link to our page in favorites and sends it to his friend, he can pass also the language of the page he was viewing. If the default language was for example polish, and he changed it to english, he saves friend some time to search and click specific button.
If you put it in the URL the search engines will store every page in every language. If you use cookies, they will only store one. So it's more a SEO advantage I think.

Resources