Remove cPanel login pages from the Google index

Google has indexed the cPanel login URL for my HostGator hosting account, e.g. mysite.com:2082.
It has also indexed 5 pages of my site with www, so I have duplicate content.
For example, both mysite.com/page1 and www.mysite.com/page1 are indexed.
I've tried the removal tool in Google Webmaster Tools, but it always adds a slash (/) after the domain.
When I submit mysite.com:2082 for removal, the slash is added and it becomes mysite.com/:2082.
Has anyone had this problem?
Can anything be done to remove these pages?
Thanks.

Google has indexed the cPanel login URL for my HostGator hosting account, e.g. mysite.com:2082
If you are on a shared host, I don't think you can do anything about this, unfortunately.
cPanel blocks the crawling of these pages with robots.txt. Unfortunately this can still result in a link-only entry in Google SERPs, with a description such as:
A description for this result is not available because of this site's robots.txt – learn more.
To prevent these pages from being indexed, they either need a noindex robots meta tag or an equivalent noindex X-Robots-Tag HTTP response header, and the Disallow directive in robots.txt (which prevents the pages from being crawled) would need to be removed. As far as I'm aware, the cPanel pages do not return an appropriate robots meta tag.
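For reference, this is roughly what sending such a header looks like for pages you do control (a minimal .htaccess sketch assuming Apache with mod_headers; the file names are placeholders). It can't be applied to cPanel's own :2082/:2083 services on shared hosting, since those are not served by your Apache instance:
# Send a noindex header for these (hypothetical) files on your own site
<FilesMatch "^(login|admin)\.php$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>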
This issue has been discussed in the cPanel forums (some years ago!) and "fixes" have supposedly been released; however, I have seen no change in this behaviour.
To be honest, using robots.txt to block the crawling of these pages is arguably the most efficient method, as it simply stops the (good) bots from requesting the pages and thus reduces (just a little) the load on the server. But to block these pages from Google's index, you need to allow them to be crawled so that the robots meta tags (which don't currently exist) can be detected. A bit of a catch-22.
If you are thinking in terms of security, then preventing these pages from being indexed does not really help; it's security by obscurity at best. The cPanel login pages can easily be found by requesting the standard URL, example.com:2082.
It has also indexed 5 pages of my site with www, so I have duplicate content.
You can set a preference for either www or non-www in Google Webmaster Tools, or you can redirect one to the other in .htaccess. Which URL you prefer is up to you. For instance, to redirect from non-www to www:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*) http://www.%{HTTP_HOST}/$1 [R=301,L]
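Conversely, if you prefer the non-www form, here is a minimal sketch under the same assumptions (mod_rewrite enabled, rules placed in the root .htaccess):
RewriteEngine On
# If the host starts with www., 301-redirect to the bare domain
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule (.*) http://%1/$1 [R=301,L]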
Although, to be honest, Google does a pretty good job of resolving this issue anyway (it is very common). There is no duplicate content "penalty"; it's just that if you don't specify a preference, either could be indexed.

Related

How detailed should my sitemap be for a multilingual site?

I have a one-page website with an English main page and a French main page. The site can be accessed through the following URLs:
ENGLISH VERSION OF MAIN PAGE
www.example.org
www.example.org/index.html
example.org
example.org/index.html
FRENCH VERSION OF MAIN PAGE
www.example.org/fr
www.example.org/fr/index.html
example.org/fr
example.org/fr/index.html
For optimal search engine indexing, should I include all of these URLs in my sitemap (with both http:// and https://)? If not, which URLs should I include in my sitemap.xml file?
You should include all unique pages in your sitemap once.
All of the different URLs you listed are just different ways of accessing the same page/content, just like most PHP applications can be accessed via site.org/ or site.org/index.php. Your sitemap should include just one reference to each page.
The best practice is to have one canonical URL per document. And each canonical URL should be added to your sitemap (if you have one).
So in your case you may want to use one URL for the English main page and one URL for the French main page, and redirect (with HTTP status code 301) from the other URLs to the canonical ones. In addition, you can declare the canonical URL with the canonical link relation.
If you need to provide HTTP in addition to HTTPS (instead of enforcing HTTPS), you would of course need to have two URLs per document (one with HTTP, one with HTTPS). But you [should only list one variant in the sitemap](http://www.sitemaps.org/faq.html#faq_http_vs_https "Sitemaps.org FAQ: 'My site has both "http" and "https" versions of URLs. Do I need to list both?'"), and you should only declare one as canonical (ideally the same which you added to the sitemap).
Which URLs to choose can depend on various factors (usability, SEO, your backend, …), but it seems safe to assume that index.html is ballast. You'd have to decide whether to use the www subdomain (a common convention) or not. Assuming that you choose to omit it, you could have these canonical URLs:
https://example.org/
https://example.org/fr
And you would redirect the following URLs to the canonical URLs listed above with a 301 (a minimal .htaccess sketch follows this list):
https://example.org/index.html
https://www.example.org/
https://www.example.org/index.html
https://example.org/fr/index.html
https://www.example.org/fr
https://www.example.org/fr/index.html
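Assuming Apache with mod_rewrite and the canonical URLs above (HTTPS, no www, no index.html), these redirects could be sketched like this in the root .htaccess:
RewriteEngine On
# Send any non-canonical host or plain-HTTP request to https://example.org/...
RewriteCond %{HTTP_HOST} !^example\.org$ [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule (.*) https://example.org/$1 [R=301,L]
# Drop a trailing index.html (e.g. /index.html -> /, /fr/index.html -> /fr)
RewriteRule ^(.*?)/?index\.html$ /$1 [R=301,L]
Note that a request such as http://www.example.org/fr/index.html may take two redirects before it lands on the canonical URL; that is fine for a sketch, but the rules can be combined if you want a single hop.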

Allow access to a certain page on a site only if the visitor comes from a link on another specific page

For my ASP.NET MVC website, I want to allow access to a certain page only if the user comes from a link on another specific page (which could be on a completely different domain).
Example:
I want to allow a user access to www.MySite.com/thispage only if they come from a link on www.MySite.com/thatpage or www.MyOtherSite.com/thatpage.
How can this be accomplished?
You'll want to check the HTTP_REFERER header. You can do that with:
Request.UrlReferrer
That said, this isn't real security. Someone could set the referer header of their browser manually.
If this is just a means of preventing hotlinking, it's fine. But if you're only using this to keep people out of private/secure information, you'll want to implement some real form of authentication/authorization.
You can restrict it with the Referer header; however, this can easily be manipulated, so if this is very important to you, I would go another route.
What you can do is use .htaccess to block requests coming from the referring link or site. You could deny the incoming traffic from the site or link in question completely, or simply redirect or forward it somewhere else.
This can be applied to any web site with .htaccess as long as your host has mod_rewrite enabled.
If you are using Dolphin and it is installed in your root (httpdocs/public_html) directory, then you would add the rules to:
yoursite.com/.htaccess
If you are using Dolphin and it is installed in a subfolder/subdirectory, then you would add them to:
yoursite.com/dolphin-directory/.htaccess
To block the traffic entirely, add this to your .htaccess file after the RewriteEngine on line:
RewriteEngine on
RewriteCond %{HTTP_REFERER} baddomain.com [NC]
RewriteRule .* - [F]
To forward or redirect the unwanted traffic somewhere else instead, add:
RewriteEngine on
RewriteCond %{HTTP_REFERER} baddomain.com [NC]
RewriteRule .* http://en.wikipedia.org/ [R,L]
Notes:
- In the RewriteCond line, change baddomain.com to the link or site you want to block.
- In the RewriteRule line, change http://en.wikipedia.org/ to the site you would like to redirect or forward the traffic to.
- Be sure to download and back up your original .htaccess file before editing it. .htaccess is extremely sensitive and needs to be exactly right, or it can make your entire site error out.
- And be sure to test afterwards to make sure everything is working properly; you do not want to accidentally block all traffic to your site.
You could always track which sites visitors come from, like the big agencies do, and block them if they aren't coming from where you want. :)

Redirecting a large number of indexed links

I am in the process of launching 2 sites that have recently been redesigned (one in RoR and one in WordPress). Both have a very large number of inbound links coming from search engines and outside sources. For quite some time I have been curious about an efficient way to implement redirects for all of these links.
My main purpose is to make sure the sites do not lose the SEO work that has been done, and that old backlinks do not end up at a 404.
What is the best practice when launching a new site for redirecting old URIs?
You'll find that most of your backlinks point to your home page anyway, so that will take care of the bulk of them. To mitigate 404s from broken backlinks, try to create pattern-match (regex) redirects that send a 301 (Moved Permanently) header, using .htaccess (since you're using RoR/WP).
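As an illustration only (the old and new paths below are made up; substitute your actual URL structures), such pattern-match 301s in .htaccess might look like:
RewriteEngine On
# Hypothetical example: old date-based URLs like /blog/2012/05/some-post -> /posts/some-post
RewriteRule ^blog/\d{4}/\d{2}/([^/]+)/?$ /posts/$1 [R=301,L]
# Hypothetical example: old .php URLs -> extensionless equivalents
RewriteRule ^(.+)\.php$ /$1 [R=301,L]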
WordPress has plugins to handle migrations and redirections; simply search on the wordpress.org site.
Ensure you register your site with Google Webmaster Tools and monitor your 404 pages (or log them server-side) to catch the ones you've missed.
Lastly, to ensure that your new URLs get indexed and canonicalized (beyond making sure rel=canonical is used correctly), submit an XML sitemap of all your new pages.
In terms of redirecting old links to new links, the general practice is to use a 301 redirect (for SEO purposes). In the absolute worst case, if you cannot do this, redirect to the homepage at the very least so you don't lose visitors to 404 pages.

Upgraded a site from PHP to Rails. What about old pages that people reach from Google?

I created a Rails app for my client. The site was PHP, and I rebuilt it from scratch with Rails. The problem is that the site is old and many of its old pages rank in Google. Naturally, many people will click those links in Google and the pages won't be available.
How do you usually handle such a problem?
I need to redirect such requests (for missing old pages) to the main front page of the new Rails app. How can I do that?
Thanks.
Sam
A 301 redirect is meant to be the most efficient and Google-friendly way, and it should preserve your search ranking.
That said, I haven't tried it in real life, as it is the next release of my application that will use this approach to restructure a web site.
You might google ".htaccess", "apache" and "permanent redirect".
Redirecting the user to the front page would be disorienting without a note (flash[:notice]) to let them know what went wrong.
I think it would be better to add some routes in config/routes.rb to handle the old pages and return the new versions of those pages (if they still exist), and otherwise fall back to a 404 page.
If you have been able to maintain the URLs in the new application (e.g. /members.php is now /members), you can do the following if you use Apache:
RewriteRule ^(.*)\.php$ /$1 [R=301,L]
This removes the .php extension with a 301 redirect and should transfer the PageRank to the new page.
If this is not possible and you must redirect to the new main page, this MIGHT work (I have not tried it myself):
RewriteRule ^(.*)\.php$ http://www.example.com/ [R=301,L]

SSL-secured website best practices

I have a website (www.mydomain.com) that is secured with an SSL certificate. It is an ASP.NET website, and I have forced certain pages, via code, to require the https:// prefix; if they are requested over http:// they are redirected to the https:// equivalent. Is this good practice? Is there an easier way to do this? Not every page requires SSL.
Also, when users access my URL as mydomain.com instead of www.mydomain.com, they get a certificate error because the certificate was issued for www.mydomain.com. Should I use the same approach as with the http://-to-https:// issue above, or is there a better way of handling this?
Your approach sounds fine. In my current project, I force HTTPS when a user goes to the login page (based on a config flag, which lets me test locally without needing a certificate). This lets me access other pages unsecured, which is handy.
I have a couple of places where our server grabs the output of other pages (rendering HTML to PDF and fetching dynamic images, for example). Because of our environment, our server can't resolve its own public name, so if we were to force SSL site-wide we'd have to add our internal IP address (or fake the domain name).
As for your second question, you have two options for handling www.example.com vs example.com. You can buy a certificate that covers multiple domain names; these are known as UCC certificates.
Your second option is to redirect example.com to www.example.com, or the other way around. Redirecting is a great option if you want your content to be indexed by Google or other search engines, since they would otherwise see www.example.com and example.com as two separate sites, which means links to your site get split between them, reducing your overall PageRank.
You can configure sites in IIS to require a certificate, but that would (a) generate an error if someone isn't visiting over https and (b) require all pages to use https, so that won't work. You could put a filter on IIS that checks all requests and redirects them to https if they are on your encryption list. The obvious drawback here is the need to update your list of pages every time a new page is added (e.g. from an XML file or database) and restart the filter.
I think you are probably right to build code into the pages that require https so they redirect to the https version when reached via http. As for your certificate error, you could redirect with a full path (that includes the www) instead of a relative path to fix this problem. If you have any questions about how to detect whether the call uses https, or how to get the full path of the current request, let me know; both are pretty straightforward, and I've got sample code if you need it.
UPDATE: Josh, the certificates that cover multiple subdomains are called wildcard certificates. The problem is that they are quite a bit more expensive than standard certificates.
UPDATE 2: One other thing to consider is using a master page or a derived class for the pages that need SSL. That way, instead of duplicating the code in each page, you can just declare each such page as type SSLPage (or use the corresponding master page) and have the master/parent class handle the redirect. Again, you'll need to do some URL processing if you take this approach, but it is pretty trivial.
Here is something that may help you:
If it is fine to serve all of your website's pages over https://, then you can simply update your code to use https:// and set up two bindings in IIS, one for http and one for https. That way, your website is accessible through either protocol.
Your visitors are receiving a name mismatch error because the common name in your SSL certificate is www.mydomain.com. Namecheap provides RapidSSL certificates that secure both names under a single certificate: you can purchase one for www.mydomain.com and it will automatically cover mydomain.com (i.e. without www).
Another option is to write code that redirects visitors to www.mydomain.com even if they browse to mydomain.com.
