Redirecting a large number of indexed links - ruby-on-rails

I am in the process of launching two sites that have recently been redesigned (one in RoR and one in WordPress). Both have a very large number of inbound links coming in from search engines and outside sources. For quite some time I have been curious about an efficient way to implement redirects for all of these links.
My main goal is to make sure the sites don't lose the SEO work they have already done, and that no old backlinks end up forwarding to a 404.
What is the best practice when launching a new site for redirecting old URIs?

You'll find that most of your back-links are to your home-page anyway, so that will take care of the bulk of them. In terms of mitigating 404s from broken back-links, try to create a pattern-match (regex) redirect sending a 301 (Moved Permanently) header - using .htaccess (since you're using RoR/WP).
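Since one of the sites is Rails, the same pattern-match 301 redirects can also live in config/routes.rb rather than .htaccess. A minimal sketch, with made-up old and new paths (adjust to the real URL structure):
# config/routes.rb -- the old and new paths below are hypothetical examples.
Example::Application.routes.draw do
  # One-off rename: /about-us.html now lives at /about (redirect() issues a 301 by default).
  match "/about-us.html" => redirect("/about")

  # Pattern match: carry an id across to the new URL scheme.
  match "/articles/:id" => redirect("/posts/%{id}")
end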
WordPress does have some plugins to handle migrations and redirections - simply search on the wordpress.org site.
Ensure you register your site with Google's Webmaster Tools and monitor your 404 pages (or log them server-side) to catch ones you've missed.
Lastly, to ensure that your new URLs get indexed and canonicalized correctly (beyond making sure rel=canonical is used correctly), submit an XML sitemap of all your new pages.

In terms of redirecting old links to new links, it is general practice to do a 301 redirect (for SEO purposes). In the absolute worst case, where you cannot map an old URL to a new one, redirect to the homepage at the very least so you don't lose visitors to 404 pages.
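If a few old URLs really have no equivalent on the new site, a hedged sketch of that fallback as a Rails catch-all route (kept at the very bottom of config/routes.rb so every real route wins first):
# Absolute last resort: anything that would otherwise 404 is sent to the home page.
# Keep this as the very last line in config/routes.rb.
match "*unmatched" => redirect("/")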

Related

Remove cpanel login pages from the Google index

Google has indexed the cPanel login URL for my Hostgator hosting account, e.g. mysite.com:2082.
It has also indexed 5 pages of my site with www, so I have duplicate content: for example, both mysite.com/page1 and www.mysite.com/page1 are indexed.
I've tried removing the URLs through Webmaster Tools, but it always adds a slash (/) after the domain. When I try to submit mysite.com:2082 for removal, the slash is added and it becomes mysite.com/:2082.
Has anyone had this problem?
Can anything be done to remove these pages?
Thanks.
Google has indexed the cPanel login URL for my Hostgator hosting account, e.g. mysite.com:2082.
If you are on a shared host, I don't think you can do anything about this, unfortunately.
cPanel blocks the crawling of these pages with robots.txt. Unfortunately this can still result in a link-only entry in Google SERPs, with a description such as:
A description for this result is not available because of this site's robots.txt – learn more.
To prevent these pages from being indexed they would need either a noindex robots meta tag, or a similar noindex X-Robots-Tag HTTP response header, and the Disallow directive in robots.txt (which currently prevents the pages from being crawled) would have to be removed. As far as I'm aware, the cPanel pages do not return an appropriate robots meta tag.
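For reference, on pages you do control yourself (not the cPanel screens, which the host serves), the X-Robots-Tag is just an ordinary response header. A minimal Rails-style sketch, with a purely hypothetical controller name:
# Hypothetical controller for a page you want crawled but not indexed.
class LoginController < ApplicationController
  before_filter :set_noindex   # before_action in newer Rails

  private

  def set_noindex
    # Equivalent to a <meta name="robots" content="noindex"> tag in the page itself.
    response.headers["X-Robots-Tag"] = "noindex"
  end
end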
This issue has been discussed in the cPanel forums (some years ago!) and "fixes" have supposedly been released, however, I have seen no change in this behaviour.
To be honest, using robots.txt to block the crawling of these pages is arguably the most efficient method, as it simply blocks the (good) bots from requesting the pages and thus reduces (just a little bit) the load on the server. But in order to block these pages from Google's index you need to allow the pages to be crawled so that the robots meta tags (which, as noted, don't currently exist) can be detected. Bit of a catch-22.
If you are thinking in terms of security, then preventing these pages from being indexed does not really help. It's just security by obscurity at best. The cPanel login pages can easily be found by requesting the standard URL, example.com:2082.
It has also indexed 5 pages of my site with www, so I have duplicate content.
You can set a preference for either www or no-www in Google Webmaster Tools, or you can redirect one to the other in .htaccess. Which URL you prefer is up to you. For instance, to redirect from non-www to www...
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*) http://www.%{HTTP_HOST}/$1 [R=301,L]
Although to be honest, Google does a pretty good job of resolving this issue anyway (it is very common). There is no duplicate content "penalty", it's just that if you don't specify a preference then either could be indexed.

Subdomain security in Rails

Let's say I'm trying to create an application called Blue. Blue is a Ruby on Rails application that turns the background of any website blue. It also allows users to log in and keep track of the websites they've visited and turned blue.
In order to turn a website's background blue, I've created a web proxy that inserts <link href="http://www.example.com/blue.css" rel="stylesheet" type="text/css"> into the response body. The proxy is implemented as a Rack application and is placed inside the Rails routes using the approach from the Rack in Rails 3 Railscast:
root :to => BlueProxy, :constraints => { :subdomain => "proxy" }
I'm very concerned about security with this approach. I know that by default the domain for the cookies in my application would be .example.com. If the user typed in a malicious URL, that website could manipulate the user's account. I could fix this by restricting the application's cookies to the www subdomain only. However, I'd also like the proxy to be able to store cookies for the proxied site as well.
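For reference on the cookie part, a minimal sketch of scoping the Rails session cookie to the www host only (the Blue app name comes from the question above; the default cookie-store setup is an assumption):
# config/initializers/session_store.rb
# Scope the session cookie to www.example.com so it is never sent to
# proxy.example.com or any other sibling subdomain.
Blue::Application.config.session_store :cookie_store,
  :key    => "_blue_session",
  :domain => "www.example.com"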
Here are my three questions:
Is this a bad approach? Is there a better way to solve this problem?
What's the best way to keep sibling subdomain cookies separate in Rails?
Are there any other security concerns I'm missing?
This approach is dangerous, and I caution you against running a proxy for several reasons:
It brings up a host of legal issues ranging from people accessing illegal content to your hosting content for your own benefit (and modifying it).
Your bandwidth (and hosting fees) will explode if the site gets popular.
Loading content inside an iframe has UX issues, like the browser back button not quite performing as the user wants it to.
Running a proxy opens up several more attack vectors to your site (e.g. sending a permalink to a malicious site proxied through your site) that you'll have to consider from a security perspective.
Instead of running an open proxy (okay, maybe it's not completely open, but how hard is it for someone to sign up?) on your back end, consider using a browser extension or greasemonkey script on the front end that can get its set of rules from your rails app and then add the stylesheet changes on the client side.
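If you go the extension/userscript route, the server side can stay very small. A hedged sketch of the kind of endpoint the script could poll (the controller, model, and current_user helper are all assumptions, not part of the question):
# Hypothetical JSON endpoint listing the sites the signed-in user wants turned blue.
class BlueRulesController < ApplicationController
  def index
    rules = current_user.blue_sites.map do |site|
      { :host => site.host, :css => "http://www.example.com/blue.css" }
    end
    render :json => rules
  end
end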

IIS 7 redirect and rewrite for retired domain

I have an old domain for a company that has merged with another company and they want to decommission the old site and redirect traffic to the new domain. OldCompany.com will now point to NewCompany.com. However, to keep their SEO rankings we also want to map the pages from the OldCompany.com domain to the corresponding pages on NewCompany.com.
I know it's possible to set up Rewrite Maps in IIS (I've done this). If the OldCompany domain now points to the NewCompany web server but the old site itself was not migrated, will I still be able to use rewrite rules in conjunction with redirects to point OldCompany.com/about.html to NewCompany.com/subDirectory/about.aspx? Do I need to set up these pages in order to accomplish this? Will rewrite rules work without the pages from the originating site in place?
Right now I am able to set up an HTTP Redirect for the entire OldCompany.com domain by just creating a new site in IIS and using the HTTP Redirect feature. What I really want is the more granular solution outlined above, so that people get to the pages they are looking for and not just the new site's homepage.
You should not do the redirect as a single blanket redirect to the new site at the application level; that would just break any existing incoming links. The better approach is to redirect the old domain (preserving the full URL path and any query string) with a 301, and map all relevant old URLs to the corresponding URLs on the new site.
Usually this is done in multiple steps:
Tell Google Webmaster Tools the new domain address (if you use it)
Create an IIS rewrite rule to redirect (with a 301) the old domain to the new domain, preserving the path and query string
Create IIS rewrite rules (in your new site) to map each old URL onto the new structure with a permanent (301) redirect, or redirect to some other page the user can continue from if the exact page doesn't exist in the new structure
This will tell Google that the URLs have changed and point to the new location.

What are the pros and cons of a default URL with www or without www?

We need to default the URL to a single canonical name: if it's www, then redirect the no-prefix version to it, or vice versa. So the decision to be made is whether to stick with www or with no prefix.
With no prefix, cookies are set for all subdomains. What are the other downsides of that? Or benefits?
Basically we need this for OpenID, since OpenID will make users look different depending on whether they came from www or the no-prefix version.
Our site is new, so we can go with either one. Also, how the domain name looks is not much of a concern.
You probably want to redirect (with an HTTP 301, Moved Permanently) one to the other anyway, since maintaining consistent URLs is much easier that way. So whichever you decide, just make sure the actual authentication is done after the redirect, and users looking different won't be an issue.
That said, whether you want www or not depends entirely on how other things in your application work. You mention that cookies for domain.com will be saved for all subdomains - is this something you want? Are you ever going to need to differentiate (for example, by allowing users to set up their own authentication systems for subdomains, as a shared hosting service might do)?
If none of the differences you find between including and excluding www matter to your application, I'd go for not using www. The main reason for this is my picture of current trends on the internet - more and more applications (SO is an example of this) tend to leave the www out, both when linking to their own sites, and in marketing of different kinds.
However, the main point is to make both work. You don't want your site to break because the user did(n't) type www at the beginning of the URL.
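If the application happens to be Rails (as elsewhere on this page), a hedged sketch of doing that redirect at the application level, so both forms work but everything, including the OpenID flow, lands on one canonical host (the host name is just an example):
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  CANONICAL_HOST = "www.example.com"   # example value; pick www or no-www and stick with it

  before_filter :enforce_canonical_host

  private

  def enforce_canonical_host
    return if request.host == CANONICAL_HOST
    # 301 to the same path and query string on the canonical host.
    redirect_to "#{request.protocol}#{CANONICAL_HOST}#{request.fullpath}",
                :status => :moved_permanently
  end
end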
By not using the www subdomain, you can suffer a performance hit when delivering static content, as noted here: http://developer.yahoo.com/performance/rules.html#cookie_free. As I understand it, if you use http://example.com/ and http://static.example.com for static content, any cookies you set on the main domain will be passed with requests to your static subdomain.
This can be avoided quite easily by buying a distinct domain for static content, but it can also be dealt with simply by using a www subdomain.
Then again, this is a very minor con, and really only comes into play when you're dealing with a high-demand site. (For example, Digg uses http://digg.com and http://*.diggstatic.com).
Ultimately, I would say that this is such a minor problem that it can probably be dealt with if performance starts to suffer. Don't optimize prematurely, and all that...
And, as @Tomas Lycken points out, make sure you account for www even if you don't use the subdomain.

Map a domain to an MVC area

Anybody got any experience in mapping a domain to an MVC area?
Here's our situation:
Old system (still active but will soon redirect to new store):
www.example.com - our main site where we send traffic
store.example.com - our store site which is a completely separate site that is indexed in google
New system:
www.example.com - same site as before
www.example.com/store - new store site - built in an ASP.NET MVC area
Because the store is a separate domain, Google gives it a separate entry in the search results. I'd like to keep this benefit in the future, but I'm wondering whether there is a good way to map a domain (store.example.com) to the MVC area, or if it's just going to be more trouble than it's worth.
PS. I'm not trying to keep the existing indexing - it's a completely separate store, so that's not possible. I just want to redirect to the corresponding page in the new store. I'm just trying not to lose the benefit of two domains for SEO purposes.
I would use URL Rewriting, either in ASP.NET or in IIS7 Application and Request Routing (ARR) to change incoming requests for store.example.com/... to example.com/store/....
MVC will have no issue with this - it doesn't get to see anything but the new URL and it will generate links only for the new layout.
Other alternatives:
Create a website for the store.example.com that just does a wildcard 301 redirect for each page to the corresponding page on the new site.
If the URLs don't overlap at all, point the old domain to the new MVC site and add duplicate routes for each action, e.g. shop.example.com/info.aspx?item27 might have a route "/info.aspx/{*pathinfo}" ... which loads an Action that knows how to handle the old URL parameters and can do a Redirect to the new Action.
I have sites where there are many URLs mapped onto the same Action - in fact, every legacy URL that has ever been used for a page still works today, including even the old .ASPX URLs which are now served up by an MVC Action. Some legacy URLs are dealt with using a 301 response, others which legitimately have duplicate content on the site are handled as normal but the page also includes a canonical URL to point out which one is the preferred URL.
