I am building an application that uses Wikipedia data to show some information.
I have a link to each of the Wikipedia topics that I need, but I will localize my app into different languages. Given a link in one language, is there a way to get the links for the other languages using the Wikipedia API?
Example:
In my application I have this link: http://en.wikipedia.org/wiki/Stack_Overflow
I need a reliable way to get these links (if they exist; if not, I will show the English one):
de --> http://de.wikipedia.org/wiki/Stack_Overflow_%28Website%29
es --> http://es.wikipedia.org/wiki/Stack_Overflow
pt --> does not exist
You can request this through the API. Here is a request with your example: https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&llprop=url%7Clangname&titles=Stack%20overflow
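A minimal Python sketch of how the request above might be built and its JSON response read. The function names, the `lllimit` value and the fallback behaviour are assumptions for illustration; the sketch does not perform the HTTP call, it only assumes the response has already been decoded into a dict in the default `format=json` layout (`query` -> `pages` -> `langlinks`).

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_langlinks_url(title, limit=500):
    """Build the langlinks query URL for one article title (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "format": "json",
        "llprop": "url|langname",
        # Assumption: raise the limit so a long list of languages is not cut off.
        "lllimit": limit,
        "titles": title,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def pick_localized_url(response, lang, fallback_url):
    """Return the article URL for `lang`, or `fallback_url` if none is listed."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages without translations (or missing pages) have no "langlinks" key.
        for link in page.get("langlinks", []):
            if link.get("lang") == lang and "url" in link:
                return link["url"]
    return fallback_url
```

For the example in the question, a missing "pt" entry would simply return the English link passed as `fallback_url`.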
Related
In my application I have localized urls that look something like this:
http://examle.com/en/animals/elephant
http://examle.com/nl/dieren/olifant
http://examle.com/de/tiere/elefant
This question is mainly for Facebook Likes, but I guess I will hit similar problems when I start thinking about search engine crawlers.
What kind of URL would you expect as the canonical URL? I don't want to use the exact English URL, because I want people who click the link to be forwarded to their own language (based on browser settings or IP).
An IP lookup is not something that I want to do on every page hit. Besides that, I would need to keep more 'state' in my application, because I would have to check whether a user has already been forwarded to their own locale or is browsing the English version on purpose.
I guess it is going to be something like:
http://example.com/something/animals/elephant
or maybe without any language identifier at all:
http://example.com/animals/elephant
but that is a bit harder to implement, and there is a bigger chance of URL clashes in the future (in the rare case that I get a category called en or de).
Summary
What kind of URL would you expect as the canonical URL? Is there already a standard for this?
I know this question is a bit old, but I was facing the same issue.
I found this:
Different language versions of a single page are considered duplicates only if the main content is in the same language (that is, if only the header, footer, and other non-critical text is translated, but the body remains the same, then the pages are considered to be duplicates).
That can be found here: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls
From this I can conclude that we should add locales to canonicals.
I did find one resource that recommends not using the canonical tag with localized addresses. However, Google's documentation does not specify and only mentions subdomains in another context.
There is more than language that you need to think of.
It's typically a 3-tuple: {region, language, property}
If you only have one website, then you have {region, language} only.
Every piece of content can either be different across this 3-dimensional space, or at least be presented differently. But it is still the same piece of content, so you'd like to centralize the management of editorial signals, promotions, tracking, etc. Think about search systems: you'd like page rank to be merged across all instances of the article, not spread thinly.
I think there is a standard solution: Canonical URL
Put language/region into the domain name
example.com
uk.example.com
fr.example.com
Now you have a choice of how to scope cookies: to the subdomain (for language/region) or to the whole domain (for user tracking).
On every html page add a link to canonical URL
<link rel="canonical" href="http://example.com/awesome-article.html" />
Now you are done.
There certainly is no "standard" beyond the requirement that it be a URL. What you do see on many commercial websites is exactly what you describe:
<protocol>://<server>/<language>/<more-path>
For the language tag you may follow the relevant RFCs as well. I guess your two-letter abbreviation is fine.
I only disagree on the <more-path> part of the URL. If I understand you correctly, you are thinking about translating each page's path into the local language? I would not do that. Maybe I am not the typical user, but I like to edit URLs by hand: if the URL shown is http://examle.com/de/tiere/elefant but I don't trust the content to be translated well, I would manually try http://examle.com/en/tiere/elefant -- and that would not bring me to the expected page. And since I also dislike URLs like http://ex.com/with-the-whole-title-in-the-url-so-the-page-will-be-keyworded-by-search-engines, my favorite would be to exchange only the <language> part and use generic English (or any other single language) for <more-path>. E.g.:
http://examle.com/en/animals/elephant
http://examle.com/nl/animals/elephant
http://examle.com/de/animals/elephant
If your site is something like Wikipedia, then I would agree with your scheme of translating the <more-path> as well.
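The "only exchange the <language> part" scheme can be sketched in Python as follows. The set of supported languages and the function name are assumptions for illustration, not part of any framework.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the languages this hypothetical site is available in.
SUPPORTED_LANGUAGES = {"en", "nl", "de"}


def swap_language(url, new_lang):
    """Replace the first path segment (the language) and keep the rest as-is."""
    if new_lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {new_lang}")
    parts = urlsplit(url)
    # "/de/animals/elephant" splits into ["", "de", "animals", "elephant"].
    segments = parts.path.split("/")
    if len(segments) < 2 or segments[1] not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no language segment in: {url}")
    segments[1] = new_lang
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Because the rest of the path is language-neutral, a user (or a language switcher) can move between versions without a lookup table of translated slugs.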
Maybe these Google guidelines can help with your issue: https://support.google.com/webmasters/answer/189077?hl=en
They note that many websites serve users across the world with content targeted at a certain region or language, and advise using the rel="alternate" hreflang="x" attributes so that the correct language or regional URL is served in Search results.
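As a rough illustration of that advice, the following Python sketch renders one rel="alternate" hreflang link element per language version of a page. The helper name and the optional `x-default` entry are assumptions; which URL, if any, should be the default is a decision for the site owner.

```python
from html import escape


def hreflang_links(urls_by_lang, default_lang=None):
    """Render <link rel="alternate" hreflang=...> tags for a page's language versions."""
    lines = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in urls_by_lang.items()
    ]
    # Assumption: optionally mark one version as the fallback for unmatched users.
    if default_lang in urls_by_lang:
        fallback = escape(urls_by_lang[default_lang])
        lines.append(f'<link rel="alternate" hreflang="x-default" href="{fallback}" />')
    return "\n".join(lines)
```

The same set of tags would go into the head of every language version, so that each page lists all of its alternates, including itself.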
I have a multilingual website. Currently, the website language is chosen according to the web browser language.
Is there any way to set the language according to the search engine spider? For example:
Display the website in Chinese for the Baidu search engine spider,
Display the website in Russian for the Yandex spider?
This is called crawler identification. When a request is made to your website, the User-Agent header contains information about the browser or the crawler.
Depending on the crawler, the value of this field will be different. You can then associate different values with different languages. You can also take a look at the large list of user agents.
I'm still pretty sure that by doing this, you'll lower your rank in search engines since you provide different responses to crawlers than to real users, but I don't have solid references to support this statement.
In any case, crawlers are expected to gather resources in different languages, and they know how to deal with multilingual websites, except maybe the ones that follow every worst practice. Also, the search engines you mentioned are not limited to one language. Yandex, for example, is available in Turkish. As for Baidu, according to Wikipedia, it serves China, Japan, Thailand, Egypt and India.
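For completeness, the User-Agent lookup described above could look like this minimal Python sketch. The substring tokens and the language mapping are assumptions based on the user agents these crawlers commonly announce, and the ranking caveat mentioned above still applies.

```python
# Assumption: lowercase substrings expected in each crawler's User-Agent header,
# mapped to the language the question wants to serve.
CRAWLER_LANGUAGES = (
    ("baiduspider", "zh"),
    ("yandex", "ru"),
)


def language_for_user_agent(user_agent, default=None):
    """Return the language mapped to a known crawler, or `default` otherwise."""
    ua = (user_agent or "").lower()
    for token, lang in CRAWLER_LANGUAGES:
        if token in ua:
            return lang
    return default
```

Returning `default` for everything else leaves ordinary browsers on the existing browser-language logic.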
I'm currently writing an ASP.NET MVC 3 web application that supports multiple languages.
I already managed to translate all the routes so that calls like:
www.mysite.de/Kontakt and www.mysite.de/Contact will route to the same Controller/Action.
By design, when www.mysite.de is called, the language (stored in the session object) is automatically set to a default language (here German). The navigation of the site is then set up dynamically to match.
The language in the session object can be changed either by clicking the "English version" link or by manually calling e.g. www.mysite.de/Contact. In that case it is recognized that the link (/Contact) matches a route that is defined as English, so I change the language in the session object to English. Of course, the content of the site is also localized.
My question now is how does that cooperate with SEO, especially with Google?
I already add the Content-Language meta tag dynamically to each page, so I think that, together with a proper sitemap.xml, this should be sufficient.
Does Google recognize this correctly? When searching Google in German, will I get "Kontakt" as the result, and "Contact" when searching in English?
Another issue is what happens when the link is the same for different languages. E.g. the link to "Jobs" would/could be the same in English as in German.
I hope that the question is understandable as my issue is rather complicated.
Cheers,
Simon
Google does not rely only on you telling them what language your site is in; you are only giving them a hint.
The pages will be analyzed and presented as a page in "German" or a page in "English" based on the language of the content.
But your base assumption is correct.
Yes, if I search for your page in German, and Google has indexed the page as a page in German, Google will return Kontakt.
As for your second question, unless you provide another means of changing the language besides the path (a query string or the language in the browser settings), those links will only be in your default (German) language.
If you would like them to appear in English, use a different, additional URL such as Jobs-EN that exists only in your sitemap.xml (and in your routes, of course).
Another issue is what happens when the link is the same for
different languages? E.g. the link to "Jobs" would/could be
the same as well in English as in German.
You might consider having the language as part of your URL, for example:
www.mysite.de/de/Kontakt
www.mysite.de/en-us/Contact
www.mysite.de/en-gb/Contact
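A language prefix like this can be resolved before routing. The following is a minimal, framework-independent Python sketch (the question uses ASP.NET MVC, where a route parameter would play the same role); the supported tags, the default and the function name are assumptions.

```python
# Assumption: the language tags this hypothetical site accepts as a URL prefix.
SUPPORTED_TAGS = ("de", "en-us", "en-gb")
DEFAULT_TAG = "de"


def split_language(path):
    """Split "/en-gb/Contact" into ("en-gb", "/Contact").

    Paths without a known prefix fall back to the default language and are
    returned unchanged.
    """
    segments = path.lstrip("/").split("/", 1)
    first = segments[0].lower()
    if first in SUPPORTED_TAGS:
        rest = segments[1] if len(segments) > 1 else ""
        return first, "/" + rest
    return DEFAULT_TAG, path
```

With the language in the path, a page such as "Jobs" gets a distinct URL per language, so the session no longer has to guess it from the route name.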
I would like to translate text from one language (ideally detected automatically) into 4 different languages using Google Translate. My idea is to write an HTML document that contains 4 frames; one of them holds a text form and a button. After the button is clicked, the browser sends a request to Google Translate and shows the results in the 4 frames.
If you want a self-service, hosted solution that handles translations and content management for you, check out Localize.js
This is going to be terribly translated. As someone that speaks English well, Russian poorly, and Spanish even more poorly, I can detect that these auto-translations never come out right.
My recommendation is to serve your page through a basic system that lets you respond to submitted form values. Pass in &LANG= followed by a two-letter ISO language code, and then have your backend serve up the correct data.
Have someone who speaks both languages prepare the content for you. Then, whenever you are serving these pages, you can also conditionally adjust the CSS to account for layout differences that come from differences in text length between languages.
If you don't have those capabilities available, make 5 pages: one in English and the other 4 in the other languages. You will come across as seriously unprofessional to anyone who speaks those languages well if you use an auto-translation. I think this is a bad idea for any kind of professional page, even if you can work out the technical issues.
-Brian J. Stinar-
Google has an API to its translate tool that will enable you to send it some text and receive back that text translated into any language you choose.
edit: This is now a paid service
What is the best way to determine the language of Twitter posts?
There is a language parameter that comes with the streaming API, but it doesn't seem to be very accurate. Even many Japanese posts are labelled as English.
What have others done to sort out the languages?
I've had very good results with this PHP package:
http://pear.php.net/package/Text_LanguageDetect/
It is fast and open source. We use it to select English only posts for a site we run at http://2012twit.com.
Google has language detection within its Translate API, if using "evil" external services is an option:
http://code.google.com/apis/language/translate/v1/reference.html#detectResult