Disable Google from indexing redirect URLs

I have a news aggregation site with tons of news...
If a news item on the home page has text, its link is a redirect to the source's site (www.site.com/red/23545). If the same item has no text, just the title, it gets a link that is not a redirect and goes to a different page on my site (www.site.com/23545/some_news_title).
So the same news item can appear on some pages with text, where it redirects to the source page, and on other pages with only a title, where it gets a normal link to a page within the site.
The problem is that Google is indexing the redirect links (www.site.com/red/23545).
I added
Disallow: /red/
to my robots.txt
and I also added
rel="nofollow"
to all redirect links
but none of it worked... the links are still indexed...
The reason I use the redirect link is to prevent screen scraping of my site. If someone follows the redirect link without my domain as the referrer, they are redirected to my site and not to the news source page...
Any help would be appreciated.
Thanks!

You can add the following header on your redirect pages to prevent them from being indexed:
X-Robots-Tag: noindex
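Still, I'm surprised that your robots.txt didn't have any effect. How long have you had the robots.txt file there? It tends to take some time for those sorts of things to take effect. Keep in mind that robots.txt blocks crawling rather than indexing, and Google can only see the noindex header on URLs it is allowed to crawl, so the Disallow rule would need to be lifted for the header to work. You can speed up the process by removing the links from Google's index via Google Webmaster Tools.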
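To make the header suggestion concrete, here is a minimal sketch of a redirect handler that sends X-Robots-Tag: noindex with every response and applies the referrer check described in the question. It is written as a plain Python WSGI app purely for illustration; the asker's actual stack is not stated, and the SOURCES lookup table and its target URL are made-up placeholders.

```python
from urllib.parse import urlparse

# Assumptions for illustration only: the host name is taken from the question,
# the SOURCES table and its target URL are hypothetical.
MY_HOST = "www.site.com"
SOURCES = {"23545": "http://news.example.org/story"}

NOINDEX = ("X-Robots-Tag", "noindex")


def redirect_app(environ, start_response):
    """Handle /red/<id>: redirect and ask crawlers not to index the URL."""
    path = environ.get("PATH_INFO", "")
    news_id = path[len("/red/"):] if path.startswith("/red/") else ""

    if news_id not in SOURCES:
        start_response("404 Not Found", [("Content-Type", "text/plain"), NOINDEX])
        return [b"Not found"]

    # Only visitors arriving from our own pages are sent to the source site.
    referrer_host = urlparse(environ.get("HTTP_REFERER", "")).hostname
    if referrer_host == MY_HOST:
        target = SOURCES[news_id]
    else:
        target = "http://%s/" % MY_HOST

    start_response("302 Found", [("Location", target), NOINDEX])
    return [b""]
```

The same idea carries over to any server or framework: the header is attached to the redirect response itself, so no HTML page is needed.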

Related

Get rid of old links to a retired website in Google search

I have a website that has been replaced by another website with a different domain name.
In Google search, I can still find links to pages on the old site, and I would like them to stop showing up in future Google searches.
Here is what I did, but I am not sure whether it is correct or enough.
Access to any page on the old website will be immediately redirected to the homepage of the new website. There is no one-to-one page mapping between the two sites. Here is the code for the redirect on the old website:
<meta http-equiv="refresh" content="0;url=http://example.com" >
I went to Google Webmasters site. For the old website, I went to Fetch as Google, clicked "Fetch and Render" and "Reindex".
Really appreciate any input.
A few things you'll want to do here:
You need to use permanent server redirects, not meta refresh. Also I suggest you do provide one-to-one page mapping. It's a better user experience, and large numbers of redirects to root are often interpreted as soft 404s. Consult Google's guide to site migrations for more details.
Rather than Fetch & Render, use Google Search Console's (Webmaster Tools) Change of Address tool. Bing has a similar tool.
A common mistake is blocking crawler access to a retired site. That has the opposite of the intended effect: old URLs need to be accessible to search engines for the redirects to be "seen".
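The first point above (permanent server redirects with one-to-one mapping) could be sketched roughly like this. This is a Python WSGI illustration rather than a recommendation for any particular server; the paths in REDIRECTS are hypothetical, and answering unmapped URLs with a 404 instead of sending them to the new homepage is one possible choice, made here to avoid the soft-404 issue mentioned above.

```python
# Hypothetical old-path -> new-URL mapping; the real one would come from
# an inventory of the retired site.
REDIRECTS = {
    "/about.html": "http://example.com/about",
    "/contact.html": "http://example.com/contact",
}


def old_site_app(environ, start_response):
    """Answer requests to the retired site with permanent redirects."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)

    if target is None:
        # No equivalent page on the new site: say so instead of
        # redirecting everything to the homepage.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"This page has been retired."]

    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

In practice the same mapping is usually expressed in the web server's own configuration; the point of the sketch is only the 301 status and the per-page targets.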

How to interpret Google Search Console URL Crawl Errors

We have a relatively large website, and by looking at Google Search Console we have found a lot of strange errors. By a lot, I mean 199 URLs give a 404 response.
My problem is that I don't think these URLs can be found on any of our pages, even though we have a lot of dynamically generated content.
Because of this, I wonder if these are URLs the crawler found or requests coming to our site? Like mysite.com/foobar, which obviously would return 404.
Google reports all backlinks to your website that deliver a 404 in the Google Search Console, no matter if there has ever been a webpage with that URL in the past.
When you click on a URL in the list of pages with errors, a pop-up window will give you details. There is a tab "Linked from" listing all (external) links to that page.
(Some of the entries can be outdated. But if these links still exist, try to get them updated or set up redirects for them. The goal in the end is to improve the user experience.)

Strange google result listing, invalid URL created

Would be great if you guys could shed some light on this, has baffled me:
I was asked by a client if I could try and make the search term for his comedy night "sketchercise" put his website top of the Google ranking. I simply changed the title tag of the header for the whole site from "Allnutt and Simpson" to "Allnutt and Simpson - Sketchercise # Ginglik - Sketch Duo". It did the trick and now the site comes up top of the Google listing when typing in "sketchercise". However, it gives off this very strange link:
http://www.allnuttandsimpson.com/index.php/videos/
This is the link to the google search result too:
http://www.google.co.uk/search?sourceid=chrome&ie=UTF-8&q=sketchercise
This link is invalid, it doesn't make any sense. I guess it has something to do with the use of hash tags and the AJAX driven site, but before I changed the title tag, it linked to the site fine using the # tags. What is the deal with this slash?
The strangest part is that the valid URL for the videos page on that site is /index.php#vidspics, I have never used the word "videos" in a url!
If anyone can explain the cause of this or just help me stop it from happening, I'd be very grateful. I realise that this is an SEO question and I hate that stuff generally, but I hope you can see this is a bit of a strange case!
Just to compare, if you google "allnutt and simpson" it works just fine: it links to the site and all of its pages as .php pages (and then my JS converts them to hash tags to keep things clean).
It's because there must be a folder called 'videos' under your hosted files. Use an FTP client to check this.
Google crawls every folder and file unless you tell it not to; look into robots.txt files to learn how to avoid indexing.
Also ask Google to remove that result once you have solved this.
Finally, that behaviour is not related to hash tags; these are just references for JavaScript to display the appropriate content in your webpage.
Not sure why it's listed like this, but the only way to stop that page from appearing is to use a Google Webmaster account for this website and make sure the crawlers can't find this link anymore. The alternative is to have the site admin put this tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> , in the header when isset($_REQUEST['videos']) is true.
The slash in the address is the parsed form of www.allnuttandsimpson.com/index.php?=videos. You can have the web server change all the php parameters into slashes to make the links look pretty.
Best option for correct results is to create a sitemap and submit it to https://www.google.com/webmasters/tools/ for that site. You will need access.
Oh, I forgot: the sitemap tells Google about all the pages you want it to list, so use it for the major pages like those in the main menu. Removing links you don't want requires a robots.txt in the main directory of the site.
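For the sitemap suggestion, a minimal sketch of generating one in Python might look like the following. The namespace is the one defined by the sitemaps.org protocol; the page URLs in the usage lines are placeholders, not the site's real pages.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs for illustration; list the main-menu pages here.
pages = [
    "http://www.example.com/index.php",
    "http://www.example.com/gigs.php",
]
sitemap_xml = build_sitemap(pages)
```

The resulting string would be saved as sitemap.xml in the site root and then submitted through the webmaster tools account mentioned above.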

Good 404 in Rails, with search result on moved pages for new links

I am trying to replace lots of pages in my database at once, and lots of pages that are indexed by Google will have new URLs. As a result, the old URLs will end up on a 404 page.
So I need to design a new 404 page that includes a search box. I also want the 404 page to grab the keywords from the broken URL in the address bar and show search results based on those keywords, so that the user has an idea of where to find the new link.
Old URL:
http://abc.com/123-good-books-on-rails
New URL:
http://abc.com/good-books-on-rails
When a user arrives from a search engine, they land on the old URL. The 404 page would then search for the keywords "good books on rails" and return a list of search results, so the user can find the latest URL for that page.
How do I implement this? I will be using Friendly ID, Sphinx and Rails 2.3.8.
Thanks.
You are far better off simply generating the appropriate redirects yourself than expecting your users to do extra work when a Google link fails. The redirects won't be needed indefinitely; Google will eventually reindex you. If you use 301 (permanent) redirects, Google will be smart enough to stop following the old link once it has reindexed your site. If you don't want to manually create redirects for hundreds of pages, then you'll need to figure out the algorithm for how your old pages map to new pages.
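For the example URLs in the question, the mapping algorithm could be as simple as dropping the leading numeric ID. The sketch below shows that logic in Python rather than Rails, and assumes every old slug has the form "<number>-<new slug>"; the function names are made up for illustration. It also includes the keyword extraction the asker wanted as a fallback for the 404 search box.

```python
import re

# Assumption: old paths look like "/123-good-books-on-rails" and the new
# path is the same slug without the numeric prefix.
OLD_SLUG = re.compile(r"^/\d+-(?P<slug>[^/]+)$")


def new_path_for(old_path):
    """Return the new path for an old URL, or None if it doesn't match."""
    match = OLD_SLUG.match(old_path)
    return "/" + match.group("slug") if match else None


def keywords_for(old_path):
    """Fallback for the 404 page: search keywords taken from the slug."""
    slug = old_path.strip("/").split("/")[-1]
    return [word for word in slug.split("-") if word and not word.isdigit()]


new_path = new_path_for("/123-good-books-on-rails")
search_terms = keywords_for("/123-good-books-on-rails")
```

A request handler would issue a 301 to new_path_for(path) when it returns a value, and only fall back to the 404 page with a keyword search when it returns None.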

URL Redirection for Coming Soon Page?

I have a site with over 100 pages. We need to go live with products that are soon available, however, many site pages will not be prepared at the time of release.
In order to move forward, I would like to reference a "coming soon" page with links to pages that are current and available.
Is there an easy way to forward a URL to a Coming Soon page?
Is this valid, or is there a better way?
Found this at:
http://www.web-source.net/html_redirect.htm
"Place the following HTML redirect code between the <head> and </head> tags of your HTML code."
<meta HTTP-EQUIV="REFRESH" content="0; url=http://www.yourdomain.com/index.html">
Does this negatively affect you if the search engines crawl through your site?
Thank you!
The code you listed will work. However, I would never do this:
You could just show the page you wanted to show immediately, without a redirect. This will be faster for the visitor, as they don't need to load two pages.
If you must use a redirect, why not create it programmatically, for example by instructing your web server (e.g. Apache) to redirect certain pages?
I would not link to pages that don't exist yet. Most visitors will dislike that - clicking on something to find out "come back later" is a disappointment. We've all seen those coming soon pages, with the content never arriving, or only after months or even years. Either leave out those links (or perhaps put a "work in progress sign" without a link), or add the items only after they've been finished.
Search engines should work well with redirect pages, although it is unlikely your "coming soon" page will show up anywhere near the top of the rankings anyway.
Perhaps a better or "more correct way" would be to do the redirection at the header level. Using PHP, you would call
<?php
// Must run before any output is sent to the browser.
header("Location: http://www.yourdomain.com/index.html");
exit; // stop the script so nothing else is sent after the redirect
There are also ways to do this in Apache (assuming you are using it) with .htaccess files. See http://www.webweaver.nu/html-tips/web-redirection.shtml for more info about that.

Resources