Remove Page from being indexed in Google, Yahoo, Bing [duplicate]

I don't want the search engines to index my imprint page. How could I do that?

You can also add the following meta tag to the HEAD of that page:
<meta name="robots" content="noindex,nofollow" />

You need a simple robots.txt file. Basically, it's a text file that tells search engine crawlers which pages they should not crawl.
You don't need to reference it in the header of your page; as long as it's in the root directory of your website, crawlers will pick it up.
Create it in the root folder of your website and put the following text in it:
User-Agent: *
Disallow: /imprint-page.html
Note that you'd replace imprint-page.html in the example with the actual name of the page (or the directory) that you wish to keep from being indexed.
That's it! If you want to get more advanced, there is a lot more information about robots.txt online, along with free tools that will generate a robots.txt file for you.
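A quick way to check such a file before uploading it is Python's standard urllib.robotparser module. This is only a sketch under assumptions: the host www.example.com and the page names are placeholders, and the rules are parsed from a string rather than fetched from a real site.

```python
from urllib.robotparser import RobotFileParser

# Rules as in the answer above; imprint-page.html is a placeholder name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /imprint-page.html
"""

def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Ask whether a generic crawler ("*") may fetch each URL.
imprint_ok = parser.can_fetch("*", "http://www.example.com/imprint-page.html")
home_ok = parser.can_fetch("*", "http://www.example.com/index.html")
print(imprint_ok, home_ok)
```

The check only says what a well-behaved crawler would do with these rules; it does not prove that a given search engine will drop the page from its index.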

You can set up a robots.txt file to try to tell search engines to ignore certain directories or files.
Basically:
User-agent: *
Disallow: /[directory or file here]

<meta name="robots" content="noindex, nofollow">
Just include this line inside the <head> of your page. I recommend this over robots.txt because robots.txt is publicly readable. If you use it to hide URLs such as login pages or other protected pages that you don't want to show to anyone else or to search engines, anyone can open the robots.txt file directly on your website and see exactly which URLs you consider secret. That defeats the purpose of listing them there.
The better way is to include the meta tag above, which does not advertise the URL to anyone.

Nowadays, the best method is to use a robots meta tag and set it to noindex,follow:
<meta name="robots" content="noindex, follow">
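To confirm that a page really carries the tag, its HTML can be scanned with Python's standard html.parser module. The sketch below is illustrative; the class name RobotsMetaParser, the helper robots_directives, and the sample page are all made up for this example.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Split "noindex, follow" into individual lower-case directives.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )

def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

# Hypothetical page used only for the demonstration.
PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
<title>Imprint</title>
</head><body></body></html>"""

directives = robots_directives(PAGE)
print(sorted(directives))
```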

Create a robots.txt file and set the controls there.
Here are the docs from Google:
http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

When a robot wants to visit a website URL, say http://www.example.com/welcome.html, it first checks http://www.example.com/robots.txt. In that file you can explicitly disallow a page:
User-agent: *
Disallow: /~joe/junk.html
See the robots.txt documentation for details.

Related

Can I include canonical URLs in sitemap for SEO?

Can I include canonical URLs in sitemaps for SEO?
For example www.example.com/url.html is a duplicate page of www.example2.com/url.html.
So I used the following tag in the www.example.com/url.html page, so that search engines do not penalize it:
<link rel="canonical" href="www.example2.com/url.html">
Now my question is: can I list the www.example.com/url.html URL in www.example.com/sitemap.xml?
I already list the www.example2.com/url.html URL in www.example2.com/sitemap.xml.
Please suggest what I should do.

You can include both pages in your sitemap.xml and there won't be a problem for SEO, because you're using the rel="canonical" tag. When web crawlers try to index the duplicate page, they see the rel="canonical" tag and index the second page (the canonical one) instead.

For better indexing, you should list only one URL, the canonical one, in the XML sitemap.
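The "canonical URLs only" approach can be sketched as a small sitemap generator. This is an illustration under assumptions: the function build_sitemap, the (url, canonical) input format, and the index.html page are invented for the example, and the code follows the second answer rather than the first.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (url, canonical) pairs.

    A page is listed only if it has no canonical link or if its
    canonical link points to itself.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, canonical in pages:
        if canonical is not None and canonical != url:
            continue  # duplicate page: its canonical version lives elsewhere
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages of www.example.com; index.html is not in the question.
pages = [
    ("http://www.example.com/index.html", None),
    ("http://www.example.com/url.html", "http://www.example2.com/url.html"),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```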

MVC / .NET Root URLs

In my layout page, I have:
<link href="~/Content/bootstrap.css" rel="stylesheet">
My understanding is that this should not be altered when it is sent to the client. However, when I set up the website as a virtual application under a "myapp" folder in IIS, the HTML is:
<link href="/myapp/Content/bootstrap.css" rel="stylesheet">
I'm a bit confused as I had thought I would need to change these URLs to:
<link href="@Url.Content("~/Content/bootstrap.css")" rel="stylesheet">
in order for this to work correctly.
So do I need to use Url.Content to get the correct root URL of the app/website, or can I just put tildes into the actual HTML src and href attributes and assume they will be output correctly?

As of ASP.NET MVC version 4 (or actually Razor version 2), the tilde links are essentially shortcuts to Url.Content(..).

You actually answered your own question. Yes, you should use Url.Content() for your relative paths. Where the tilde is not resolved on the server (before Razor version 2), it reaches the client's browser unchanged. The browser treats everything under http://www.foo.com/ as a single site, so it will look for resources relative to http://www.foo.com/ and not http://www.foo.com/myapp/.
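What the server-side tilde expansion amounts to can be sketched in a few lines. This is not the ASP.NET implementation; resolve_app_relative is a hypothetical helper written in Python purely to show the mapping from "~/" to the application's virtual root.

```python
def resolve_app_relative(path, app_root):
    """Expand a leading "~/" to the application's virtual root.

    app_root is the virtual path of the application, for example
    "/" for a site root or "/myapp" for a virtual application.
    """
    if not path.startswith("~/"):
        return path  # not app-relative, leave untouched
    root = "/" + app_root.strip("/")
    if root == "/":
        return "/" + path[2:]
    return root + "/" + path[2:]

at_site_root = resolve_app_relative("~/Content/bootstrap.css", "/")
in_virtual_app = resolve_app_relative("~/Content/bootstrap.css", "/myapp")
print(at_site_root)
print(in_virtual_app)
```

The same source line therefore produces a different href depending on where the application is hosted, which matches the output the question observed under the "myapp" folder.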

Hide web pages to the search engines robots

I need to hide all of my site's pages from ALL spider robots, except for the home page (www.site.com), which should still be crawled.
Does anyone know how I can do that?

Add the tag <meta name="robots" content="noindex" /> to every page you do not want indexed,
or create a robots.txt file in your document root and put something like this in it:
User-agent: *
Allow: /$
Disallow: /*
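How these two rules interact can be sketched with a small hand-written matcher. It assumes the wildcard semantics that major crawlers document (* matches any run of characters, a trailing $ anchors the end of the path) and assumes that the longest matching pattern wins, with ties going to Allow. Not every crawler supports wildcards, so treat this as an illustration; pattern_to_regex and is_allowed are hypothetical names.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each "*" match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(path, rules):
    """rules is a list of (allow, pattern) pairs for one user agent."""
    best = None  # (pattern length, allow flag) of the winning rule
    for allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), allow)
            # Longer pattern wins; on equal length True (Allow) sorts higher.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

RULES = [(True, "/$"), (False, "/*")]

home_allowed = is_allowed("/", RULES)
about_allowed = is_allowed("/about.html", RULES)
print(home_allowed, about_allowed)
```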

How do I set the correct path in the jQuery script?

Dear all... I've been creating my website on another PC. Now I want to copy all the PHP and jQuery files to my notebook, but none of the jQuery widgets show up. How do I put the correct path in my script?
<link href="js/jquery.ui.all.css" rel=.....>
The files are at:
local disc(c:)/xampp/htdocs/js/jquery.ui.all

Since it's under the web root, just add a / before the path to make it absolute, like this:
<link href="/js/jquery.ui.all.css" rel=.....>
When you use a path that's not fully qualified or absolute, it's relative to the page, so your current <link> would only work on a page that is in the root folder as well, for example:
/xampp/htdocs/page.html
With an absolute path, it won't matter how deep the page is; the browser looks for the file here, regardless of the page path:
http://www.mysite.com/js/jquery.ui.all.css
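The difference can be reproduced with urljoin from Python's standard urllib.parse module, which resolves a reference against a base URL the way a browser does. The page URLs below are made up for the demonstration.

```python
from urllib.parse import urljoin

# Hypothetical pages at different depths of the same site.
root_page = "http://www.mysite.com/page.html"
deep_page = "http://www.mysite.com/admin/reports/page.html"

relative = "js/jquery.ui.all.css"    # resolved against the page's folder
absolute = "/js/jquery.ui.all.css"   # resolved against the site root

relative_from_root = urljoin(root_page, relative)
relative_from_deep = urljoin(deep_page, relative)
absolute_from_deep = urljoin(deep_page, absolute)

print(relative_from_root)
print(relative_from_deep)
print(absolute_from_deep)
```

Only the relative form changes with the page's location; the form with the leading slash always lands on the same file.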

grails app root context

I have a test Grails app set up with a context of "/testapp". When I add a link in my GSP that references /, it does not go to the root of my grails.app.context, but to the root of my grails.serverURL property.
For example given a link with href "/css/main.css"
I would expect that this link would actually look in localhost:8080/testapp/css/main.css instead of localhost:8080/css/main.css
Is there a way that I can get references to / to start at my grails.app.context vs the grails.serverURL?

Use the request's contextPath value on the page:
${request.contextPath}
and then prepend the additional host information, if necessary, to construct the complete URL.

The question is, how do you add your links to your GSPs?
We do things like
<link rel="stylesheet" href="${resource(dir: 'css', file: 'stylesheet1.css')}"/>
and
<g:javascript library="prototype"/>
By using the g:javascript and resource tags and methods, you tell Grails to set the path for you.
I suspect you are just putting standard tags in.
Go to
http://grails.org/doc/latest/
and, under Tags in the left-hand nav, look for resource and/or javascript to get an idea (it's difficult to link directly into the docs).

I had a similar issue to the OP: how do you get Grails to form links that start at the context root and NOT the server root?
You can do so using the "uri" attribute for g:link and g:createLink tags. For example:
<g:link uri="/login">login</g:link>
will prefix the context path, if applicable, and produce the following:
<a href="/login">login</a> if your app is at http://server/
<a href="/testapp/login">login</a> if your app is at http://server/testapp/
I'm not sure why it's an undocumented attribute in the reference docs, but I found it in the Javadocs for ApplicationTagLib.

You should probably be using the resource tag to point into your Grails CSS directory, as mentioned above. However, you can also use the resource method to find the root context of your web application:
${resource(uri:'/')}
then just use that string wherever.

And when it comes to elements like stylesheets, I'd recommend creating a simple tag that'll do the trick, something along these lines:
class StylesTagLib {
    static namespace = "g"

    def stylesheet = { args, body ->
        out << """<link rel="stylesheet" href="${resource(dir: 'css', file: args.href)}"/>"""
    }
}
and later on in your code use it like this:
<g:stylesheet href="main.css"/>
Obviously you can fiddle with the conventions (should I use a predefined folder? should I add the .css extension automatically? stuff like that) but the general idea is to hide the ugliness behind a nicely defined tag.
