How to block a search engine from indexing my domain - search-engine

I want to block a search engine from indexing my website. I followed this reference (Here) and created a robot.txt file in the root directory. Its content is:
User-agent: http://search.pch.com
Disallow: /
But it doesn't work. Any help would be appreciated. I want to block the search engine http://search.pch.com, either through .htaccess or some other method.
UPDATE
I have also tried these meta tags:
<meta name="robots" content="noindex, nofollow">
<meta name="googlebot" content="noindex, nofollow">
They had no effect.

Check the log files on your web server to see whether http://search.pch.com is really the User-agent string the crawler sends.
Name the file robots.txt (not robot.txt or reboot.txt) and put this in it:
User-agent: *
Disallow: /
instead, if you want no bots at all (at least none that respect robots.txt) to crawl your pages.
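If you want to confirm what a compliant crawler would conclude from that file, Python's standard-library urllib.robotparser can parse the rules offline. This is only a sketch: the bot names and the example.com URLs are placeholders, and real crawlers use their own parsers.

```python
from urllib.robotparser import RobotFileParser

# The catch-all record from the answer above
rules = """\
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse in memory; no network access needed

# Placeholder agent names: every path is refused for every agent
for agent in ("Googlebot", "SomeOtherBot/2.1"):
    print(agent, parser.can_fetch(agent, "http://example.com/index.html"))
```

Both lines print False, because the `*` record applies to any agent that has no record of its own.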

First: the file name should be robots.txt.
Second: it is each web crawler's choice whether to honor this file. The reference clearly says only "most" crawlers do.
Third, and most important: the user-agent string for PCHSearch might not be the same as its URL. Double-check the user-agent string.
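To illustrate the third point, here is a small sketch with Python's urllib.robotparser. In that parser a record applies only when its name appears in the crawler's User-agent product token, so a record named after a URL never matches. "PCHSearchBot" is a made-up name used only for illustration; check your access logs for the string the crawler really sends.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The record from the question: a URL where the crawler's name belongs
by_url = "User-agent: http://search.pch.com\nDisallow: /\n"
# A record using a product token; "PCHSearchBot" is HYPOTHETICAL
by_token = "User-agent: PCHSearchBot\nDisallow: /\n"

agent = "PCHSearchBot/1.0"  # hypothetical User-Agent header value
print(allowed(by_url, agent, "http://example.com/"))    # True: URL record does not apply
print(allowed(by_token, agent, "http://example.com/"))  # False: token record applies
```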
Alternatively, you can use this .htaccess code:
# block visitors referred from indicated domains
RewriteEngine on
RewriteCond %{HTTP_REFERER} baddomain01\.com [NC,OR]
RewriteCond %{HTTP_REFERER} baddomain02\.com [NC]
RewriteRule .* - [F]
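The logic of those three lines can be modeled in a few lines of Python: `[NC]` makes each pattern case-insensitive, `[OR]` means any one matching condition is enough, and `[F]` answers 403 Forbidden. This is a rough sketch; baddomain01/02 are the answer's own placeholder domains. Keep in mind that it matches the Referer header, so it turns away visitors who click through from those sites; a crawler fetching your pages directly usually sends no Referer and would not be affected.

```python
import re

# Each pattern mirrors one RewriteCond line above (placeholder domains).
# [NC] -> re.IGNORECASE; [OR] -> any() over the conditions.
BLOCKED_REFERERS = [r"baddomain01\.com", r"baddomain02\.com"]


def is_forbidden(referer: str) -> bool:
    """Return True if the [F] rule would answer 403 Forbidden for this Referer."""
    # RewriteCond patterns are unanchored, so use re.search rather than re.match
    return any(re.search(p, referer, re.IGNORECASE) for p in BLOCKED_REFERERS)


print(is_forbidden("http://BadDomain01.com/page"))  # True
print(is_forbidden("https://example.org/"))         # False
```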

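This worked for me:
# Flag requests referred from search.pch.com (case-insensitive)
SetEnvIfNoCase Referer "http://search\.pch\.com" bad_referer
# Apache 2.2 access syntax; on Apache 2.4 this needs mod_access_compat
Order Allow,Deny
Allow from ALL
Deny from env=bad_referer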

Related

mod_rewrite: rewrite URL automatically

I have a problem rewriting this URL:
http://example.org/public/item.php?id=4
I'd like to use an .htaccess file to rewrite it as:
http://example.org/public/item/4.php
This is my .htaccess file:
Options +FollowSymLinks
RewriteEngine On
RewriteBase /public
RedirectMatch ^/$ /public/
RewriteRule ^public/item/([^/]*)\.php$ /public/item.php?id=$1 [L]
That works only if I type the new URL in manually, and then I lose all the CSS styles, JavaScript files, and images. I also want the old URL to redirect to the SEO-friendly URL automatically.
What am I doing wrong?
You have to add a condition so that the URL is not rewritten when the requested filename (a JavaScript, CSS, image, etc. file) actually exists.
All you have to do is add this condition before your rewrite rule:
RewriteCond %{REQUEST_FILENAME} !-f
Check the documentation here.
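Here is a toy Python model of how the rule and a "file does not exist" condition (`!-f`) interact. It assumes the .htaccess sits in the web root, so the matched path starts with `public/` and has no leading slash; the set of existing files is a hypothetical stand-in for the real filesystem check.

```python
import re

# Mirrors: RewriteRule ^public/item/([^/]*)\.php$ /public/item.php?id=$1 [L]
RULE = re.compile(r"^public/item/([^/]*)\.php$")


def rewrite(path: str, existing_files: set[str]) -> str:
    """Toy model of the rule guarded by RewriteCond %{REQUEST_FILENAME} !-f.

    `existing_files` is a HYPOTHETICAL stand-in for the filesystem.
    """
    if path in existing_files:   # file exists, so !-f fails: serve it unchanged
        return path
    match = RULE.match(path)
    if match:                    # rule applies: map to the real script
        return f"/public/item.php?id={match.group(1)}"
    return path                  # no rule matched: request passes through


files = {"public/item.php", "public/css/style.css"}  # hypothetical files
print(rewrite("public/item/4.php", files))     # /public/item.php?id=4
print(rewrite("public/css/style.css", files))  # public/css/style.css
```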

Redirect 301 for Active Forum

Thanks for reading; please help out if you can :)
If I want to redirect the entire forum.jalan2.com to a subfolder so that it lives at jalan2.com/forum/, what's the best way to do it?
How do I make EVERY PAGE redirect correctly when it is accessed from Google?
Take this page:
forum.jalan2.com/topic/9689-mimiland-batu-payung-village-singkawang-bengkayang/
It should become:
jalan2.com/forum/topic/9689-mimiland-batu-payung-village-singkawang-bengkayang/
I don't want thousands of old pages to all redirect to a single page, the forum home at jalan2.com/forum/.
I want each page to redirect to exactly its new location.
Thanks :)
Rudy
Add this to the .htaccess file in the web root directory of forum.jalan2.com:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^forum\.jalan2\.com$ [NC]
# $1 has no leading slash, so each old page maps to the same path under /forum/
RewriteRule ^(.*)$ http://jalan2.com/forum/$1 [R=301,NC,L,QSA]
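The per-page mapping that redirect is meant to perform can be sketched in Python, which makes it easy to check a few old URLs before going live. This is an illustration of the intended mapping only, not of Apache internals; the function name is made up. Note that the old path already begins with "/", so appending it to `http://jalan2.com/forum/` verbatim would produce a double slash.

```python
from typing import Optional
from urllib.parse import urlsplit

OLD_HOST = "forum.jalan2.com"        # from the question
NEW_BASE = "http://jalan2.com/forum"  # no trailing slash on purpose


def new_location(old_url: str) -> Optional[str]:
    """Return the 301 target for an old forum URL, or None for other hosts."""
    parts = urlsplit(old_url)
    # .hostname is lowercased by urlsplit, mirroring the [NC] host check
    if parts.hostname != OLD_HOST:
        return None
    # The path already starts with "/", so append it without adding another
    target = NEW_BASE + (parts.path or "/")
    if parts.query:  # keep the query string, as mod_rewrite does by default
        target += "?" + parts.query
    return target


print(new_location(
    "http://forum.jalan2.com/topic/9689-mimiland-batu-payung-village-singkawang-bengkayang/"
))
```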

How to show HTML website URLs like wordpress [duplicate]

This question already has answers here:
Reference: mod_rewrite, URL rewriting and "pretty links" explained
I'm creating a website for my personal needs. I have several pages like index.html, about.html, contact.html, etc.
By default they are shown as mysite.com/index.html or mysite.com/contact.html.
Is there any way to hide the .html extension and show only the main part of the URL, like mysite.com/about/ or mysite.com/contact/?
Please advise.
You have to use a URL rewrite engine. Edit (or create) your .htaccess file like this:
# Remove .html from URLs: a request for /about is served from about.html
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]
For more details: Remove .html from URLs with a redirect
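A simplified Python model of those three rewrite lines may help when deciding which links to use. It is a sketch only: `dirs` and `files` are hypothetical stand-ins for what exists under the document root, and Apache's real filename resolution has more cases.

```python
def resolve(url_path: str, dirs: set[str], files: set[str]) -> str:
    """Simplified model of the "remove .html" rewrite rules.

    A request maps to "<path>.html" only when <path> is not a directory
    (!-d) and "<path>.html" is an existing file (-f). `dirs` and `files`
    are HYPOTHETICAL stand-ins for the document root.
    """
    path = url_path.lstrip("/")  # per-directory rules see no leading slash
    if path not in dirs and f"{path}.html" in files:
        return f"{path}.html"
    return path  # left alone: Apache serves (or 404s) the original path


site_files = {"about.html", "contact.html"}  # hypothetical site
print(resolve("/about", set(), site_files))   # about.html
print(resolve("/about/", set(), site_files))  # about/ (not rewritten)
```

In this model, as with the rules themselves, mysite.com/about works but mysite.com/about/ with a trailing slash does not, so links should point to the form without the slash.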
It's also possible to show your page as mysite.com/about/ without any rewriting.
You need to rename every page file to index.html,
e.g. about.html becomes index.html,
and keep each index.html in its page's own folder, like this:
Before:
www/
|
|-about/about.html
|
|-contact/contact.html
|
|-etc/etc.html
After:
www/
|
|-about/index.html
|
|-contact/index.html
|
|-etc/index.html
Now when you visit www.yourSite.com/about, that page will be shown (without the about.html part).
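If you have many pages to move, a short Python script can do the renaming. This sketch assumes the "Before" layout above, where each page already sits in a folder named after it; the function name is made up, and it skips any folder that already contains an index.html rather than overwrite it. The approach relies on the web server serving index.html for a directory request, which is Apache's default DirectoryIndex.

```python
from pathlib import Path


def to_index_layout(root: Path) -> list[Path]:
    """Rename <dir>/<dir>.html to <dir>/index.html in each top-level folder.

    HYPOTHETICAL helper: skips folders that already have an index.html,
    so nothing is overwritten. Returns the files that were created.
    """
    moved = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        page = folder / f"{folder.name}.html"   # e.g. about/about.html
        target = folder / "index.html"          # e.g. about/index.html
        if page.is_file() and not target.exists():
            page.rename(target)
            moved.append(target)
    return moved


# Usage (adjust the path to your own site folder):
# to_index_layout(Path("www"))
```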

Remove Page from being indexed in Google, Yahoo, Bing [duplicate]

I don't want the search engines to index my imprint page. How could I do that?
You can also add the following meta tag to the <head> of that page:
<meta name="robots" content="noindex,nofollow" />
You need a simple robots.txt file. Basically, it's a text file that tells search engine crawlers which pages not to crawl.
You don't need to include it in the header of your page; as long as it's in the root directory of your website it will be picked up by crawlers.
Create it in the root folder of your website and put the following text in:
User-Agent: *
Disallow: /imprint-page.html
Note that you'd replace imprint-page.html in the example with the actual name of the page (or the directory) that you wish to keep from being indexed.
That's it! If you want to get more advanced, check out here, here, or here for a lot more info. You can also find free online tools that will generate a robots.txt file for you (for example, here).
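To check the imprint rule before uploading it, Python's urllib.robotparser can evaluate it locally. A minimal sketch; www.example.com and the agent name are placeholders, and imprint-page.html stands for whatever your page is really called.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-Agent: *", "Disallow: /imprint-page.html"])

base = "http://www.example.com"  # placeholder domain
print(parser.can_fetch("Googlebot", base + "/imprint-page.html"))  # False
print(parser.can_fetch("Googlebot", base + "/index.html"))         # True
```

Only the listed page is refused; every other URL stays crawlable.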
You can set up a robots.txt file to ask search engines to ignore certain directories.
See here for more info.
Basically:
User-agent: *
Disallow: /[directory or file here]
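One detail worth knowing about that placeholder: Disallow values are matched as path prefixes. The sketch below uses Python's urllib.robotparser with a hypothetical /private path to show that `Disallow: /private` also blocks /private-notes.html; ending the value with a slash (/private/) limits it to the directory.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# "/private" is a HYPOTHETICAL path used only for illustration
parser.parse(["User-agent: *", "Disallow: /private"])

for path in ("/private/", "/private/notes.html", "/private-notes.html", "/public/"):
    print(path, parser.can_fetch("AnyBot", "http://example.com" + path))
```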
<meta name="robots" content="noindex, nofollow">
Just include this line in the <head> section of your HTML. I'm suggesting this because if you use a robots.txt file to hide URLs, such as login pages or other protected URLs that you don't want to show to anyone else or to search engines,
anyone can simply open your robots.txt file directly and see which URLs you are trying to keep secret. That defeats the purpose of using robots.txt for this.
The better approach is to include the meta tag above, so those URLs are never listed in a public file.
Nowadays, the best method is to use a robots meta tag and set it to noindex,follow:
<meta name="robots" content="noindex, follow">
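To see how a crawler-side check might read that tag, here is a rough Python sketch using the standard html.parser module. The class name is made up, and real search engines parse pages in far more involved ways; this only shows that the directives are a comma-separated, case-insensitive list.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """HYPOTHETICAL helper: collect directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attribute values may be None
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(sorted(parser.directives))  # ['follow', 'noindex']
```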
Create a robots.txt file and set the controls there.
Here are the docs for Google:
http://code.google.com/web/controlcrawlindex/docs/robots_txt.html
A robot that wants to visit a website URL, say http://www.example.com/welcome.html, first checks for http://www.example.com/robots.txt.
In that file you can explicitly disallow a page:
User-agent: *
Disallow: /~joe/junk.html
Please visit the link below for details:
robots.txt
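The check-before-visit step described above can be sketched in Python: the robots.txt location is always the root of the same host, whatever page is requested, and the downloaded rules then decide whether the page may be fetched. In this sketch the rules are parsed from a list instead of being downloaded, and the agent name is a placeholder.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler checks before fetching page_url."""
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://www.example.com/welcome.html"))
# http://www.example.com/robots.txt

parser = RobotFileParser(robots_url("http://www.example.com/welcome.html"))
# A real crawler would call parser.read() to download the file; here the
# rules from the answer are supplied directly.
parser.parse(["User-agent: *", "Disallow: /~joe/junk.html"])
print(parser.can_fetch("AnyBot", "http://www.example.com/welcome.html"))    # True
print(parser.can_fetch("AnyBot", "http://www.example.com/~joe/junk.html"))  # False
```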

HTA redirect - mask URL

I'm having trouble redirecting my main website www.mydomain.com to the folder mydomain.com/stuff/public_html/index.html while keeping www.mydomain.com in the URL. I'd prefer an .htaccess (HTA) solution over an HTML one, but what's the most search-engine-friendly and modern way to do this?
I've tried the simple HTA 301 redirect below, but it shows the file path, which I want to avoid.
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$
RewriteRule ^/?$ "http\:\/\/www\.mydomain\.com\/stuff\/public_html\/" [R=301,L]
Thanks!
It looks like you don't need an external redirect; you need an internal rewrite. To get one, remove the [R] flag from your rule (and keep only [L]). The [R] flag forces an external redirect, here with an HTTP 301 response code, which is why the browser ends up showing the new path.
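The difference can be pictured with a toy Python model of what the browser receives for a request to "/". It is purely illustrative: the Response class and function are made up, and the paths come from the question. With [R=301] the browser gets a Location header and requests the new URL itself, so the address bar changes; with only [L], Apache maps the request to the folder internally and the browser just sees a normal 200 response for www.mydomain.com/.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """HYPOTHETICAL stand-in for what the browser receives."""
    status: int
    headers: dict = field(default_factory=dict)
    served_file: str = ""


DOC_ROOT_TARGET = "stuff/public_html/index.html"  # from the question


def handle_root(external: bool) -> Response:
    """Toy model of a request for "/" under the two kinds of rule."""
    if external:
        # [R=301,L]: tell the browser to re-request a new URL
        return Response(301, {"Location": "http://www.mydomain.com/stuff/public_html/"})
    # [L] only: served internally; the URL in the address bar stays "/"
    return Response(200, served_file=DOC_ROOT_TARGET)


print(handle_root(external=True).status)   # 301
print(handle_root(external=False).status)  # 200
```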
