How detailed should my sitemap be for a multilingual site?

I have a one-page website with an English main page and a French main page. One can access my website through the following URLs:
ENGLISH VERSION OF MAIN PAGE
www.example.org
www.example.org/index.html
example.org
example.org/index.html
FRENCH VERSION OF MAIN PAGE
www.example.org/fr
www.example.org/fr/index.html
example.org/fr
example.org/fr/index.html
For optimal search engine indexing, should I include all of these URLs in my sitemap (with both http:// and https://)? If not, what would be the set of URLs I should include in my sitemap.xml file?

You should include all unique pages in your sitemap once.
All of the different URLs you listed are just different ways of accessing the same page/content, just like most PHP applications can be accessed via site.org/ or site.org/index.php. Your sitemap should include just one reference to each page.

The best practice is to have one canonical URL per document. And each canonical URL should be added to your sitemap (if you have one).
So in your case you may want to use one URL for the English main page and one URL for the French main page, and redirect (with HTTP status code 301) from the other URLs to the canonical ones. In addition, you can declare the canonical URL with the canonical link relation.
If you need to provide HTTP in addition to HTTPS (instead of enforcing HTTPS), you would of course need to have two URLs per document (one with HTTP, one with HTTPS). But you [should only list one variant in the sitemap](http://www.sitemaps.org/faq.html#faq_http_vs_https "Sitemaps.org FAQ: 'My site has both "http" and "https" versions of URLs. Do I need to list both?'"), and you should only declare one as canonical (ideally the same which you added to the sitemap).
Which URLs to choose can depend on various factors (usability, SEO, your backend, …), but it seems safe to assume that index.html is ballast. You’d have to decide whether to use the www subdomain (a common convention) or not. Assuming that you choose to omit it, you could have these canonical URLs:
https://example.org/
https://example.org/fr
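A minimal sitemap.xml listing just these two canonical URLs could then look like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/</loc>
  </url>
  <url>
    <loc>https://example.org/fr</loc>
  </url>
</urlset>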
And you would redirect the following URLs with 301 to the canonical URLs listed above:
https://example.org/index.html
https://www.example.org/
https://www.example.org/index.html
https://example.org/fr/index.html
https://www.example.org/fr
https://www.example.org/fr/index.html
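On an Apache server, these redirects could be implemented in .htaccess roughly as follows (a sketch, assuming mod_rewrite and that HTTPS is enforced as well; adapt it to your setup):
RewriteEngine On
# Drop index.html for both language versions (whatever the host)
RewriteRule ^index\.html$ https://example.org/ [R=301,L]
RewriteRule ^fr/index\.html$ https://example.org/fr [R=301,L]
# Send plain-HTTP and/or www requests to the bare HTTPS host, keeping the path
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteRule ^(.*)$ https://example.org/$1 [R=301,L]
And to declare the canonical URL with the canonical link relation, the English page would include this in its head (the French page the same, but with /fr):
<link rel="canonical" href="https://example.org/">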

Related

Remove cPanel login pages from the Google index

Google has indexed the cPanel login URL of my HostGator hosting. Ex: mysite.com:2082
It has also indexed 5 pages of my site with www, so I have duplicate content.
For example, both mysite.com/page1 and www.mysite.com/page1 are indexed.
I've tried removing them in Webmaster Tools, but it always adds a slash (/) after the domain.
When I try to submit mysite.com:2082 for removal, the slash is added and it becomes mysite.com/:2082
Has anyone had this problem?
Can anything be done to remove these pages?
Thanks.
Google has indexed the cPanel login URL of my HostGator hosting. Ex: mysite.com:2082
If you are on a shared host, I don't think you can do anything about this, unfortunately.
cPanel blocks the crawling of these pages with robots.txt. Unfortunately this can still result in a link-only entry in Google SERPs, with a description such as:
A description for this result is not available because of this site's robots.txt – learn more.
To prevent these pages from being indexed they either need a noindex robots meta tag, or a similar noindex X-Robots-Tag HTTP response header. And remove the Disallow directive in robots.txt (which prevents the pages from being crawled). As far as I'm aware the cPanel pages do not return an appropriate robots meta tag.
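For reference, the meta tag and an Apache mod_headers directive that sends the equivalent X-Robots-Tag response header would look like this - purely illustrative, since on shared cPanel hosting you generally cannot add either to the cPanel pages yourself:
<meta name="robots" content="noindex">
Header set X-Robots-Tag "noindex"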
This issue has been discussed in the cPanel forums (some years ago!) and "fixes" have supposedly been released, however, I have seen no change in this behaviour.
To be honest, using robots.txt to block the crawling of these pages is arguably the most efficient method, as it simply stops the (good) bots from requesting the pages and thus reduces (just a little bit) the load on the server. But in order to block these pages from Google's index, you need to allow the pages to be crawled so that the robots meta tags (which don't currently exist) can be detected. A bit of a catch-22.
If you are thinking in terms of security, then preventing these pages from being indexed does not really help. It's just security by obscurity at best. The cPanel login pages can easily be found by requesting the standard URL, example.com:2082.
It has also indexed 5 pages of my site with www, so I have duplicate content.
You can set a preference for either www or no-www in Google Webmaster Tools. Or you can redirect one to the other in .htaccess. Which URL you prefer is up to you. For instance, to redirect from non-www to www...
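# 301-redirect any request whose host does not start with "www." to the www host, keeping the path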
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*) http://www.%{HTTP_HOST}/$1 [R=301,L]
Although to be honest, Google does a pretty good job of resolving this issue anyway (it is very common). There is no duplicate content "penalty", it's just that if you don't specify a preference then either could be indexed.

TYPO3: How to share sessions / cookies between domains (one for each language)?

I wonder if there's a way to tell TYPO3 to share the sessions / cookies between different domains?
We wrote an Extbase extension on a multi language / multi domain site.
We store search words from a search form in the user session. If the user switches the page language, he should get the same results as before, without having to re-fill the search form.
One way would be to tell the browser to store several cookies at the same time - one for each domain/language. How can this be achieved with TYPO3 / Extbase?
By default, there is no way to set cookies for a different domain - neither with nor without TYPO3. This is a security measure implemented in every browser (or do you want me to set / read your cookies for yourbank.com when you visit my web site? ;-))
You have to create some helper script that does this for you. One way could be:
example.com is loaded
this page includes an iframe to a PHP script (or TYPO3 site, e.g. with eID) on example.org, with a GET parameter containing the session id
the script loaded from example.org reads the GET parameter and sets a cookie with that session id (or whatever parameter you want to transfer).
afterwards the cookie is also available when browsing example.org
I have never tried this, but I'm pretty sure it will work with PHP. Maybe it's even possible with pure JavaScript, but I'm not so sure. In any case, think about what security holes such a script can open up. If in doubt, sign the parameters (or require a token)!
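A minimal sketch of such a helper script on example.org (purely illustrative: the parameter names, the shared secret, and the fe_typo_user cookie name are assumptions to adapt to your setup):
<?php
// Loaded in an iframe from example.com; copies the session id passed as a
// signed GET parameter into a cookie that is valid for example.org.
$sessionId = isset($_GET['sid']) ? $_GET['sid'] : '';
$signature = isset($_GET['sig']) ? $_GET['sig'] : '';
$secret    = 'shared-secret-known-to-both-sites'; // assumption: both sites know this key

// Only accept the value if the signature matches (see the security warning above).
if ($sessionId !== '' && hash_equals(hash_hmac('sha256', $sessionId, $secret), $signature)) {
    // fe_typo_user is TYPO3's frontend session cookie.
    setcookie('fe_typo_user', $sessionId, 0, '/');
}
The page on example.com would then embed this script in a (hidden) iframe, passing sid and sig as GET parameters when the user switches languages.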

IIS 7 redirect and rewrite for retired domain

I have an old domain for a company that has merged with another company and they want to decommission the old site and redirect traffic to the new domain. OldCompany.com will now point to NewCompany.com. However, to keep their SEO rankings we also want to map the pages from the OldCompany.com domain to the corresponding pages on NewCompany.com.
I know it's possible to set up Rewrite Maps in IIS (I've done this), but if the OldCompany domain is now pointing to the NewCompany web server while the old site itself was not migrated, will I still be able to use rewrite rules in conjunction with redirects to point OldCompany.com/about.html to NewCompany.com/subDirectory/about.aspx? Do I need to set up these pages in order to accomplish this? Will rewrite rules work without the pages from the originating site in place?
Right now I am able to set up an HTTP Redirect for the entire OldCompany.com domain by just creating a new site in IIS and using the HTTP Redirect feature. What I really want is the more granular solution outlined above, so that people get to the pages they are looking for and not just the new site's homepage.
You should not do the redirect to the new site at the application level like that; this would just break any existing incoming links. A better approach is to redirect the old domain (with whatever URL path & query string it has) with a 301 redirect and map all relevant old URLs to URLs on the new site.
Usually it's done with multiple steps:
Tell Google Webmaster Tools the new domain address (if you use it)
Create an IIS rewrite rule to redirect (with 301) the old domain to the new domain, preserving the path & query string
Create IIS rewrite rules (in your new site) to map each old URL to the new structure with a permanent redirect (301), or redirect to some other page where the user can continue if no exact match exists in the new structure
This will tell Google that the URLs have changed and point to the new location.
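A rough sketch of steps 2 and 3 in the new site's web.config (assuming the IIS URL Rewrite module; this goes inside <system.webServer>, and the single map entry is just the example page from the question):
<rewrite>
  <rewriteMaps>
    <rewriteMap name="OldToNew">
      <add key="/about.html" value="/subDirectory/about.aspx" />
    </rewriteMap>
  </rewriteMaps>
  <rules>
    <!-- Step 3: known old pages go to their mapped replacements -->
    <rule name="Map old OldCompany pages" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <add input="{HTTP_HOST}" pattern="^(www\.)?oldcompany\.com$" />
        <add input="{OldToNew:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <action type="Redirect" url="http://newcompany.com{C:1}" redirectType="Permanent" />
    </rule>
    <!-- Step 2: everything else keeps its path and query string on the new domain -->
    <rule name="OldCompany to NewCompany" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <add input="{HTTP_HOST}" pattern="^(www\.)?oldcompany\.com$" />
      </conditions>
      <action type="Redirect" url="http://newcompany.com/{R:0}" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>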

Redirecting large amount of indexed links

I am in the process of launching 2 sites that have been recently redesigned (one in RoR and one in WordPress). They both have a very large number of inbound links coming in from search engines and outside sources. For quite some time I have been curious about an efficient way to implement redirects for all of these links.
My main goal is for the sites not to lose the SEO work that has been done, and not to leave any old backlinks pointing to a 404.
What is the best practice when launching a new site for redirecting old URIs?
You'll find that most of your back-links are to your home-page anyway, so that will take care of the bulk of them. In terms of mitigating 404s from broken back-links, try to create a pattern-match (regex) redirect sending a 301 (Moved Permanently) header - using .htaccess (since you're using RoR/WP).
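For example, a pattern-match redirect in .htaccess could look like this (the old and new URL structures here are purely hypothetical placeholders for your own):
RewriteEngine On
# Hypothetical: old /blog/123/some-post URLs now live under /articles/some-post
RewriteRule ^blog/\d+/([^/]+)/?$ /articles/$1 [R=301,L]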
WordPress does have some plugins to handle migrations and redirections - simply search on the wordpress.org site.
Ensure you register your site with Google's Webmaster Tools and monitor your 404 pages (or log them server-side) to catch ones you've missed.
Lastly, to ensure that your new URLs get indexed and canonicalized correctly (beyond ensuring rel=canonical is used properly), submit an XML sitemap of all your new pages.
In terms of redirecting old links to new links, it is general practice to do a 301 redirect (for SEO purposes). In the absolute worst case, if you cannot do this, at the very least redirect to the homepage so that you don't lose visitors to 404 pages.

URL structure for mobile/desktop site

What are the pros and cons of these URL formats for a website that serves both mobile and desktop content...
mobile.example.com
example.com/mobile
no explicit URL, but sending back dynamic content based on the browser or a query-string variable?
thanks
The W3C recommends: "When accessing site entry points users should not have to enter a filename as part of the URI. If possible, configure Web sites so that they can be accessed without having to specify a sub-domain as part of the URI."
So m.example.com or example.com/m would be the best solution.
