Good or Bad for SEO: Keeping URLs in English for a non-English website?

I'm planning to release a community website whose primary audience is not English-speaking. This means that URLs pointing to /profile, /forums, and so on will be in English rather than the users' native language. I'm not concerned about users seeing English URL paths while browsing, but I am concerned about search engines: if I used non-English URLs instead, would a search engine pick up the site's pages better or worse?
Anyone care to share their opinions?

In my opinion, it would be better to have URLs that reflect the primary language of your users, as it would make finding your website on search engines easier (supposing they search in their primary language). From an SEO perspective, also try to include in your URLs the relevant search terms you think your audience would use. If you have a forum, for example, include the full thread title in each thread URL where possible, and so on.
Sources: my own experience building and managing powershell.it and sqlserver.it, two of the most important Italian technology-related communities.
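To illustrate the thread-title advice, here is a minimal sketch of building a URL slug from a title; the slugify helper and the example title are made up, and the pattern keeps Unicode letters so non-Latin titles survive intact:

    import re

    def slugify(title: str) -> str:
        # Lowercase the title, then collapse every run of
        # non-word characters into a single hyphen.
        slug = title.lower().strip()
        return re.sub(r"[^\w]+", "-", slug).strip("-")

    # Hypothetical Italian thread title:
    print(slugify("Come ottimizzare le query in SQL Server?"))
    # -> come-ottimizzare-le-query-in-sql-server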

The best place to start on this issue would be Google's Webmaster Central section on Internationalization.
If you will have versions of the same URL in multiple languages, you can connect them using the rel="alternate" hreflang mechanism, which is explained on Google's Webmaster Tools page.
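As a minimal sketch of that mechanism (the locales and URLs are placeholders), each language version of a page emits one link element per alternate, including one pointing at itself, so the annotations stay reciprocal:

    # Hypothetical language versions of one page.
    ALTERNATES = {
        "en": "https://example.com/en/animals/elephant",
        "nl": "https://example.com/nl/dieren/olifant",
        "de": "https://example.com/de/tiere/elefant",
    }

    def hreflang_tags(alternates: dict) -> str:
        # Every version lists all versions, itself included.
        return "\n".join(
            f'<link rel="alternate" hreflang="{lang}" href="{url}" />'
            for lang, url in alternates.items()
        )

    print(hreflang_tags(ALTERNATES))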

1. Summary
Using non-English URLs for non-English websites is fine.
2. Argumentation
Google Senior Webmaster Trends Analyst John Mueller said in a recent SEO snippets video that using non-English URLs for non-English websites is fine and that Google is able to crawl, index and rank them.
This includes non-Latin characters in your URLs. John Mueller said “as long as URLs are valid and unique, that’s fine.” He added, “So to sum it up, yes, non-English words and URLs are fine, and we recommend using them for non-English websites.”
Read full article here.
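For what it's worth, non-Latin paths travel percent-encoded over the wire (RFC 3986). A quick sketch with Python's standard library shows what a crawler actually receives; the Cyrillic path is a made-up example:

    from urllib.parse import quote, unquote

    path = "/товары/слон"  # hypothetical Cyrillic path
    encoded = quote(path)  # what actually appears on the wire
    print(encoded)         # /%D1%82%D0%BE%D0%B2%D0%B0%D1%80%D1%8B/%D1%81%D0%BB%D0%BE%D0%BD
    print(unquote(encoded) == path)  # True: still a valid, unique URL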
3. Disclaimer
The information in this answer was current as of March 2018 and may become obsolete.

Related

Canonical URL and localization

In my application I have localized urls that look something like this:
http://examle.com/en/animals/elephant
http://examle.com/nl/dieren/olifant
http://examle.com/de/tiere/elefant
This question is mainly for Facebook Likes, but I guess I will hit similar problems when I start thinking about search engine crawlers.
What kind of URL would you expect as the canonical URL? I don't want to use the exact English URL, because I want people clicking the link to be forwarded to their own language (based on browser settings or IP).
The IP lookup is not something that I want to do on every page hit. Besides that, I would need to incorporate more 'state' into my application, because I have to check whether a user has already been forwarded to their own locale or is browsing the English version on purpose.
I guess it is going to be something like:
http://example.com/something/animals/elephant
or maybe without any language identifier at all:
http://example.com/animals/elephant
but that is a bit harder to implement, with a bigger chance of URL clashes in the future (in the rare case I get a category called en or de).
Summary
What kind of URL would you expect as the canonical URL? Is there already a standard set for this?
I know this question is a bit old, but I was facing the same issue.
I found this:
Different language versions of a single page are considered duplicates only if the main content is in the same language (that is, if only the header, footer, and other non-critical text is translated, but the body remains the same, then the pages are considered to be duplicates).
That can be found here: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls
From this I can conclude that we should add locales to canonicals.
I did find one resource that recommends not using the canonical tag with localized addresses. However, Google's documentation doesn't address this directly and only mentions subdomains in another context.
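A minimal sketch of that conclusion, reusing the asker's URLs: each locale declares itself as canonical and cross-references its siblings via hreflang. The markup layout is an illustration, not a quote from Google's docs.

    # Hypothetical localized URLs from the question.
    PAGES = {
        "en": "http://examle.com/en/animals/elephant",
        "nl": "http://examle.com/nl/dieren/olifant",
        "de": "http://examle.com/de/tiere/elefant",
    }

    def head_links(locale: str) -> str:
        # The canonical includes the locale, so each language
        # version canonicalizes to itself...
        lines = [f'<link rel="canonical" href="{PAGES[locale]}" />']
        # ...while hreflang alternates tie the translations together.
        lines += [
            f'<link rel="alternate" hreflang="{lang}" href="{url}" />'
            for lang, url in PAGES.items()
        ]
        return "\n".join(lines)

    print(head_links("nl"))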
There is more than just language that you need to think about.
It's typically a tuple of three: {region, language, property}.
If you only have one website, then you have just {region, language}.
Every piece of content can be different in this three-dimensional space, or at least presented differently. But it is the same piece of content, so you'd like to centralize the management of editorial signals, promotions, tracking, and so on. Think about search systems: you'd want PageRank to be merged across all instances of the article, not spread out thinly.
I think there is a standard solution: the canonical URL.
Put the language/region into the domain name:
example.com
uk.example.com
fr.example.com
Now you have a choice how you attach a cookie for subdomain (for language/region) or for domain (for user tracking)!
On every HTML page, add a link to the canonical URL:
<link rel="canonical" href="http://example.com/awesome-article.html" />
Now you are done.
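A minimal sketch of that cookie choice with Python's standard library (the cookie names are made up): scoping to the parent domain shares the cookie across every language subdomain, while scoping to one subdomain keeps it per-language.

    from http.cookies import SimpleCookie

    cookie = SimpleCookie()

    # Shared across example.com and all its subdomains (user tracking):
    cookie["visitor_id"] = "abc123"
    cookie["visitor_id"]["domain"] = ".example.com"

    # Scoped to a single language subdomain (language/region choice):
    cookie["lang"] = "fr"
    cookie["lang"]["domain"] = "fr.example.com"

    print(cookie.output())  # emits the two Set-Cookie headers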
There certainly is no "standard" beyond the fact that it has to be a URL. What you do see on many commercial websites is exactly what you describe:
<protocol>://<server>/<language>/<more-path>
For the "language tag" you may follow the RFCs as well. I guess your two-letter abbreviation is quite fine.
I only disagree on the <more-path> part of the URL. If I understand you correctly, you are thinking about translating each page's path into the local language? I would not do that. Maybe I am not the standard user, but I personally like to tinker with URLs by hand: if the URL shown is http://examle.com/de/tiere/elefant but I don't trust the content to be translated well, I would manually try http://examle.com/en/tiere/elefant, and that would not bring me to the expected page. And since I also dislike URLs like http://ex.com/with-the-whole-title-in-the-url-so-the-page-will-be-keyworded-by-search-engines, my favorite would be to just exchange the <language> part and use generic English (or any other language) for <more-path>. E.g.:
http://examle.com/en/animals/elephant
http://examle.com/nl/animals/elephant
http://examle.com/de/animals/elephant
If your site is something like Wikipedia, then I would agree with your scheme of translating the <more-path> as well.
Maybe Google's guidelines can help with your issue: https://support.google.com/webmasters/answer/189077?hl=en
They note that many websites serve content targeted at users in a certain region, and advise using the rel="alternate" hreflang="x" attributes so that the correct language or regional URL is served in search results.

How to set different languages for different spiders on a website?

I have a multilanguage website. Currently, the website language is chosen according to the web browser language.
Is there any way to set the language according to the search engine spider? For example:
Display the website in Chinese for Baidu search engine spider,
Display the website in Russian for Yandex spider?
This is called crawler identification. When a request is made to your website, the User-Agent field contains information about the browser or the crawler.
Depending on the crawler, the value of this field will be different, so you can associate different values with different languages. You can also take a look at the large list of user agents.
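A minimal sketch of that idea; the substring-to-language mapping is an assumption for illustration, and in practice crawlers should be verified (e.g. via reverse DNS) rather than trusted on the header alone:

    # Map User-Agent substrings to response languages (illustrative only).
    CRAWLER_LANGUAGES = {
        "Baiduspider": "zh",  # Baidu
        "YandexBot": "ru",    # Yandex
    }

    def language_for(user_agent: str, default: str = "en") -> str:
        # Return the language associated with a known crawler,
        # falling back to the site default for everyone else.
        for marker, lang in CRAWLER_LANGUAGES.items():
            if marker in user_agent:
                return lang
        return default

    print(language_for("Mozilla/5.0 (compatible; Baiduspider/2.0)"))    # zh
    print(language_for("Mozilla/5.0 (Windows NT 10.0) Firefox/115.0"))  # en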
I'm still pretty sure that by doing this you'll lower your ranking in search engines, since serving crawlers different responses than real users is essentially cloaking, but I don't have solid references to support this statement.
In all cases, crawlers are expected to gather resources in different languages, and they know how to deal with multilingual websites, except maybe the ones that try to follow every worst practice. Also, the search engines you quoted are not limited to one language: Yandex is available in Turkish, for example. As for Baidu, according to Wikipedia, it serves China, Japan, Thailand, Egypt, and India.

Determining the language of Twitter posts

What is the best way to determine the language of Twitter posts?
There is the language parameter that comes with the streaming API, but it doesn't really seem to be very accurate; even many Japanese posts are labelled as English.
What have others done to sort out the languages?
I've had very good results with this PHP package:
http://pear.php.net/package/Text_LanguageDetect/
It is fast and open source. We use it to select English-only posts for a site we run at http://2012twit.com.
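If you're not on PHP, the same idea exists elsewhere. Here is a minimal sketch using the third-party langdetect package for Python (pip install langdetect); treat the output as probabilistic, since short, noisy tweets are easy to misclassify:

    from langdetect import DetectorFactory, detect, detect_langs

    DetectorFactory.seed = 0  # make results deterministic across runs

    print(detect("I am learning to build multilingual websites"))  # en
    print(detect("象は大きい動物です"))                            # ja
    print(detect_langs("Ceci est un exemple"))  # e.g. [fr:0.999...]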
Google has language detection within its Translate API, if using external services is an option for you:
http://code.google.com/apis/language/translate/v1/reference.html#detectResult

Rails - search engine indexing of redirect action

I have a multilingual site with the same content in different languages and descriptive SEO URLs incorporating the title of each page's article. To switch between languages of translated articles, I have an action that looks up the translated title using the previous language and redirects to it. This all works fine, except that I noticed Google has indexed these redirect URLs, despite there being no view.
Is this bad practice? I don't want to 301 redirect, as having links on every page that point to 301 redirects seems like a really bad idea. Do I somehow include a meta tag, or is there some other approach?
The reason I currently have this is that I want each article page to link to all of its translations using flags at the top of each page. The more I think about it, the more I should just generate the direct URL, as this itself may have SEO benefits. The reason I didn't go down this path originally was page rendering speed: I'd have to look up multiple articles solely for their URL slugs, and expire the caches of all languages upon any title change (it's wiki-style, user-generated content). Also, in some cases a translation wouldn't exist, in which case I would need to link instead to, say, the category of the article with a flash message.
So, thinking this through while writing, maybe that is the preferable, if harder to implement, solution?
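For what it's worth, a language-agnostic sketch of that direct-link approach; all names here are hypothetical, with a slug lookup standing in for the database/cache and a category page as the fallback when a translation is missing:

    # (article_id, locale) -> slug; a missing key means no translation yet.
    SLUGS = {
        (42, "en"): "elephant",
        (42, "de"): "elefant",
    }

    def translation_link(article_id: int, locale: str, category: str) -> str:
        slug = SLUGS.get((article_id, locale))
        if slug is not None:
            return f"/{locale}/articles/{slug}"    # direct, indexable URL
        return f"/{locale}/categories/{category}"  # fallback, no redirect

    print(translation_link(42, "de", "animals"))  # /de/articles/elefant
    print(translation_link(42, "nl", "animals"))  # /nl/categories/animals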
Hey Mark, from a search engine perspective you definitely don't want to rely on redirects everywhere, if for no other reason than performance. Search engines allocate a certain amount of crawl bandwidth to each site based on ranking; if you're redirecting every page, you're eating up more of that bandwidth than you need to and potentially not getting as much content crawled as you otherwise could.
Your second solution, generating the localized URLs and putting them at the top of the page, is the best option for search engines. That gives a unique URL for each page and provides a direct link that Google and Bing (and, via Bing, Yahoo) can follow and index.
I provided a set of best practices for SEO and localized sites in another Stack Overflow Q&A; here's a link, I think you'll find it valuable too: Internationalization and Search Engine Optimization
Good luck!
I have an app that I'm building that supports ten languages: English, simplified and traditional Chinese, French, Spanish, Russian, Japanese, German, and Hindi.
I tried a number of things, but what I ended up doing was making :en the default and then switching based on where the request was coming from; when users sign up, they can set a default language. So if the request comes from mainland China I use :scn (simplified Chinese), and if it comes from Hong Kong I use :tcn (traditional Chinese).
This way the application keeps the language as part of its state and there is no redirection.
I think any redirection is going to be troublesome, so I wouldn't do that. Also, I am working on a dynamic sitemap that will list all of the links for Google, with 10 different translations per 'page'.
I haven't deployed my application yet so I cannot check the Chinese search engines etc... to see if they are indexing my content.
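A minimal sketch of such a sitemap entry: listing every translation of a page inside its <url> element via xhtml:link alternates is a documented sitemap mechanism, though the locales and URLs below are placeholders (the full sitemap must also declare the xmlns:xhtml namespace).

    # Hypothetical translations of one page.
    ALTERNATES = {
        "en": "https://example.com/en/page",
        "zh-Hans": "https://example.com/scn/page",
        "zh-Hant": "https://example.com/tcn/page",
    }

    def sitemap_entry(canonical: str) -> str:
        # One <url> entry listing all language versions of the page.
        links = "\n".join(
            f'    <xhtml:link rel="alternate" hreflang="{lang}" href="{url}"/>'
            for lang, url in ALTERNATES.items()
        )
        return f"  <url>\n    <loc>{canonical}</loc>\n{links}\n  </url>"

    print(sitemap_entry(ALTERNATES["en"]))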

Is there any disadvantage (in SEO terms) to using a country-specific subdomain over the country's TLD?

I'm developing a site at the moment that requires localization for a number of different countries. We own our site's name on many of the countries' TLDs (though not all of them). From a developer's perspective, many things would be simplified if we could simply redirect all traffic from "domainname.co.uk" to "uk.domainname.com" (or "domainname.fr" to "fr.domainname.com"), but my boss is concerned that there may be an adverse SEO impact from doing this.
So, I'm wondering if anyone knows if there is indeed any SEO impact from doing this. The country-specific content is still there, just served from a country-specific subdomain rather than the TLD.
Sorry if this is all a bit confusing! If anyone can offer any help, that would be fantastic.
Many thanks.
From an SEO point of view, it is always better to use domainname.com/fr. Why? Because all the links to domainname.com/uk and domainname.com/fr count toward the same domain's PageRank. If you have individual domains, the links are diluted across them.
What Richie says is not right, because you can tell Google the specific geo-target using Google Webmaster Tools.
Here is an example, searching only sites "from Argentina" (.ar TLD), where the top result is a generic .com:
(Screenshot: http://img2.imageshack.us/img2/8862/capturejl.png)
A country-specific search engine like google.co.uk will understand that domainname.co.uk is a UK site, but it won't understand that about uk.domainname.com.
If I selected google.co.uk's "pages from the UK" option, I'd expect to see the former but not the latter.
(Edit: Yes, you can configure this for Google and some other search engines, but there's more to SEO than one or two specific search engines.)
