What's the correct way to use robots.txt? - ruby-on-rails

I am trying to get robots.txt to work so that search engines start indexing my website and show meta information such as descriptions.
However, I get this message:
A description for this result is not available because of this site's robots.txt – learn more.
Here is what my robots.txt looks like:
# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
# To ban all spiders from the entire site uncomment the next two lines:
User-agent: *
Disallow: /tags/*
Disallow: /users/*
What do I need to change?
This is a Rails 4 application hosted on Heroku, and robots.txt is in the public directory of the Rails repository.

First of all, a robots.txt file is not compulsory. You only need one if you don't want search engines to crawl specific pages or directories of your website.
In your case, you are blocking search engines from crawling the /tags and /users directories under the site root, so any page inside those directories will show this message.
I also recommend verifying your website in Google Webmaster Tools. You can test your robots.txt file from there.

Try removing some asterisks:
User-agent: *
Disallow: /tags/
Disallow: /users/
Meanwhile, providing the location of your sitemap might be helpful too. Use the full URL, including the scheme:
Sitemap: http://www.yoursite.com/sitemap.xml
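
If you want to sanity-check rules like the ones above before deploying, here is a minimal sketch using Python's standard-library robots.txt parser. The host example.com and the sample paths are placeholders, not the asker's real site. As far as I know this parser only does plain prefix matching, so it is a rough check and may not agree with Google's matcher on wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested above, embedded as a string so the check runs offline.
# example.com and the sample paths below are placeholders for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tags/
Disallow: /users/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Report which sample paths a crawler obeying these rules may fetch.
for path in ("/", "/posts/1", "/tags/ruby", "/users/42"):
    allowed = parser.can_fetch("*", "http://example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Only the paths under /tags/ and /users/ should come back as blocked; everything else stays crawlable.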

Related

Remove index.html from url in Weebly without .htaccess file

I want to remove "index.html" from the homepage URL of a Weebly site without an .htaccess file. Please help me resolve this problem.

Weebly does not currently provide an option to redirect /index.html to the root domain URL, nor does it give you the access needed to make that change yourself. However, links to your home page (at least on your own website) should point to domain-root.com and not to /index.html, so you should be OK there.
Keep in mind that index.html is a file that exists as the home page for the folder your website pages live in, and you can't remove it (at least on Weebly).
So the thing to do is submit a Feature Request to Weebly asking them to make the necessary changes on their end, for the sake of ALL Weebly users! ;)
https://community.weebly.com/t5/Vote-on-Features/idb-p/IdeaExchange

remove pages from google dynamic url - robots.txt

I have a few links on Google of the form domain.com/results.php?name=a&address=b
The results page and its parameters have now been renamed, and I need to remove the existing links from Google.
I tried
User-agent: *
Disallow: /results.php
in robots.txt, and then in Google Webmaster Tools I added the URL to be removed:
domain.com/results.php
It says it was removed successfully; however, when I search Google for domain.com, the existing URLs with parameters are all still there.
What am I doing wrong? There are quite a few links so I need a way to deal with all of them at once instead of one by one.
Thanks

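You could put a page at results.php and have it return a 301 redirect back to your home page:
<?php
// Permanently redirect every request for results.php, with or without parameters.
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.Your-Website.com");
exit; // stop here so nothing else is sent after the redirect headers
?>
As Google recrawls your site, the old pages will disappear. You may find this works faster than simply removing the old page. Note that Google can only see the redirect if it is allowed to crawl results.php, so remove the Disallow: /results.php rule from robots.txt first.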
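
If the renamed site no longer runs PHP, the same idea can be sketched in Python with only the standard library. This is an illustration under assumptions, not a drop-in replacement: OLD_PATH, HOME_URL, redirect_target and RedirectHandler are names I made up, and the home page URL is the placeholder from the PHP redirect above.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit

# Hypothetical values for illustration; HOME_URL is the placeholder
# used in the PHP redirect above.
OLD_PATH = "/results.php"
HOME_URL = "http://www.Your-Website.com"


def redirect_target(request_path: str):
    """Return the redirect URL for the retired page, or None otherwise.

    Only the path is compared, so every query-string variant of the old
    page (?name=a&address=b and so on) is covered by the same rule.
    """
    if urlsplit(request_path).path == OLD_PATH:
        return HOME_URL
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers GET requests for the old page with a permanent redirect."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        self.send_response(301)  # 301 = moved permanently
        self.send_header("Location", target)
        self.end_headers()
```

To try it locally you could run http.server.HTTPServer(("", 8000), RedirectHandler).serve_forever() and request /results.php?name=a&address=b. Matching on the path alone is what lets one rule handle all the parameterized URLs at once.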

Joomla Site URL

I have a website on a 1and1 server. I have 2 domains on the package: the default URL somenumbers.websitehome.co.uk and my actual URL.
The problem is that Joomla quotes somenumbers.websitehome.co.uk in system e-mails instead of my actual URL. Both point to the same directory; I just don't want to give my users a stupid URL.
I think I have narrowed it down to the $siteUrl variable, but I'm not sure where to go from here.
Thanks
James

At first I thought this sounded largely like a DNS issue - but then you said you only have 1 Joomla installation and both domains point to that directory.
If they're both pointing to the same directory, then both domains will display the same Joomla installation. That is, unless you're checking to see where they are coming from and having the content display dynamically based upon how they got to your site - but from your comments I doubt you're doing that. How your domains are behaving is the expected response if you're pointing them both to the same directory.
If you want 2 sites on the same hosting, set up 2 databases (or use 1 database with a different table prefix for each site) and set up each website in its own directory.
Adjust the DNS of each domain name (and subdomain name) to point to the appropriate directory.
From there, use the .htaccess file and SEF URLs to hide any indicator of the directory, so that each site displays as if it were in the root directory.
That is the best way to accomplish what you're after, because from the sounds of it Joomla is doing exactly what it should: displaying what is in the directory, since both domain names point to the same directory.

$siteUrl should be left blank.

Is the HTML content published using Google Drive Site Publishing indexed by Search Engines?

Ref: http://googleappsdeveloper.blogspot.com/2012/11/announcing-google-drive-site-publishing.html
Is the HTML content published using Google Drive Site Publishing indexed by Search Engines?

Yes, it is, as long as the files are publicly shared.

No. Please see the robots.txt file:
https://googledrive.com/robots.txt
User-agent: *
Disallow: /
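
To see what those two lines mean for a crawler, here is a small sketch with Python's standard-library parser. It uses the rules exactly as quoted above rather than fetching the live file, and the URL path is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the answer above (not fetched live).
QUOTED_RULES = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(QUOTED_RULES)

# example-file-id is a made-up path used only for illustration.
url = "https://googledrive.com/host/example-file-id/index.html"
blocked = not parser.can_fetch("Googlebot", url)
print("blocked" if blocked else "allowed")
```

With a blanket "Disallow: /", every path on the host is off limits to crawlers that honor robots.txt.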

Rails & Javascript: strange 404s.... perhaps a crawler?

This is perhaps a vague question, but it appears that some bot is crawling my site and doing it VERY poorly. It seems to be guessing IDs from my application JS file and putting them into URLs, for example:
Couldn't find Post with id=keypress
And even more strangely, the HTTP referrer is listed as application.js.
Has anyone experienced this before? Any ideas on how to stop these crawlers?

If it is a legitimate crawler, you can stop it by placing a robots.txt file in the root directory of your domain - http://en.wikipedia.org/wiki/Robots_exclusion_standard
You would include the following text in the robots.txt file:
User-agent: *
Disallow: /YOUR_PATH_TO_FILE/application.js
You can also add this tag to the <head> of your pages:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
If it is a malicious crawler, this will of course not stop it. There are other methods for dealing with crawlers that do not respect robots.txt, but they depend on which web server you are using.
