I have developed an ASP.NET MVC 5 website that uses jQuery and AJAX requests to pull and post data. Google's crawlers found my POST action URLs in the JavaScript code and tried to index them.
In Webmaster Tools I see a lot of errors like /Account/Login with a 500 error response, because obviously a name and a password were not provided. How can I solve this problem? I don't want any crawl errors, but I don't know how to tell Google not to follow these URLs.
Thank you!
Use Google's instructions to create a robots.txt file, which is a request to a search engine not to crawl the listed paths.
A sample robots.txt to put in the root of your domain may look like this:
User-agent: Googlebot
Disallow: /path/to/my/post/url
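Separately, if those endpoints should only ever receive POSTs, you can decorate the actions with MVC's [HttpPost] attribute; a crawler's GET request then no longer matches the action and gets a 404 instead of a 500. A minimal sketch (this AccountController is illustrative, modeled on the /Account/Login URL from the question):

using System.Web.Mvc;

public class AccountController : Controller
{
    // GET requests no longer match this action, so a crawler following
    // the URL gets a 404 rather than a 500 caused by missing form data.
    [HttpPost]
    public ActionResult Login(string name, string password)
    {
        // ... validate credentials and sign the user in ...
        return Json(new { success = true });
    }
}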
I had an old ASP.NET Web Forms site (.aspx) and have redesigned it with ASP.NET MVC using SSL. After I promoted the new site I saw a ton of errors generated by bots looking for old pages, along the lines of: The controller for path '/blablabla/moreBlalbalba/page.aspx' was not found or does not implement IController. So I updated my error handling to return a 301 response redirecting to the home page, and added a sitemap. The Google 404 console errors went away for around a month, but now I have a ton of 404 errors again, and they all point to the old site structure. As a side note, the new MVC/SSL site has no 404 errors in the Webmaster console; all the errors are on the non-SSL site. So, what is the best way to tell the bots about the new site structure?
Thanks!
I would suggest redirecting all traffic from HTTP to HTTPS, using this example. This may already solve your problem.
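In ASP.NET MVC 5, one way to do that redirect (a minimal sketch, assuming the stock FilterConfig layout of the default project template) is to register the built-in RequireHttpsAttribute as a global filter, which sends plain-HTTP GET requests to their HTTPS equivalents:

using System.Web.Mvc;

public class FilterConfig
{
    public static void RegisterGlobalFilters(GlobalFilterCollection filters)
    {
        // Sends HTTP GET requests to the HTTPS version of the same URL;
        // non-GET requests over plain HTTP are rejected.
        filters.Add(new RequireHttpsAttribute());
        filters.Add(new HandleErrorAttribute());
    }
}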
And secondly, create a sitemap so that Google can check the relevant pages of the website; an example is here.
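For reference, a minimal sitemap.xml following the sitemaps.org protocol looks like this (example.com and the date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2015-06-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>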
Given that I use a CMS which makes an article available under the following URL: http://example.com/article/1-my-first-and-famous-article/
Internally I can identify the requested article unambiguously by its id (1).
How should I handle requests for a wrong URL (typing error, manipulation, ...)? For example, someone requests http://example.com/article/1-my-firsz-and-famous-article/ or http://example.com/article/1-this-article-is-stupid-idiot/. Should I respond with HTTP status code 301 and redirect to the right URL, or with a 404 and show a not-found page (maybe with a redirect after a few seconds)? Which is preferable in terms of search engine optimization?
Wrong URLs should return a 404 error, and any existing page that has moved to a new location should get a 301 redirect.
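A minimal ASP.NET MVC sketch of that rule; ArticleRepository, FindById, CanonicalSlug and MovedToUrl are hypothetical names standing in for whatever the CMS provides:

using System.Web.Mvc;

public class ArticleController : Controller
{
    private readonly ArticleRepository _articles = new ArticleRepository();

    public ActionResult Show(int id, string slug)
    {
        var article = _articles.FindById(id);
        if (article == null)
            return HttpNotFound();                        // unknown id: 404

        if (article.MovedToUrl != null)
            return RedirectPermanent(article.MovedToUrl); // moved page: 301

        if (slug != article.CanonicalSlug)
            return HttpNotFound();                        // mangled slug: 404

        return View(article);
    }
}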
I am new to MVC. I have a list of URL redirections:
•website1.domain.com goes to domain.com/websites/1
•website2.domain.com goes to domain.com/websites/2
This is a dynamic mapping, like this: websiteN.domain.com goes to domain.com/websites/N
How can I do this in MVC? Do I need to use routing, or do I only need URL redirection?
This is a duplicate question.
Everything you need can be done in IIS.
Please visit this Stack Overflow link:
handling sub-domains in IIS for a web application
(The same user asked this question and reposted it as How can we make an ASP.NET MVC4 route based on a subdomain?)
You can find more detailed information here:
http://www.dotnetexpertguide.com/2012/04/aspnet-iis-dns-records-sub-domain-on.html
http://content.websitegear.com/article/subdomain_setup.htm
I've had a similar situation where I needed to make sure that the language code was in the URL.
My solution was to write an HTTP module. You'll want this module to inspect the request and see which subdomain it was made under. If it is a subdomain, redirect the request to the corresponding directory under domain.com.
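A minimal sketch of such a module, assuming the websiteN.domain.com naming from the question (the host pattern and target layout are assumptions, and the module still has to be registered in web.config under system.webServer/modules):

using System;
using System.Text.RegularExpressions;
using System.Web;

public class SubdomainRedirectModule : IHttpModule
{
    private static readonly Regex HostPattern =
        new Regex(@"^website(\d+)\.domain\.com$", RegexOptions.IgnoreCase);

    public void Init(HttpApplication application)
    {
        application.BeginRequest += (sender, e) =>
        {
            var app = (HttpApplication)sender;
            var match = HostPattern.Match(app.Request.Url.Host);
            if (match.Success)
            {
                // websiteN.domain.com -> domain.com/websites/N
                var target = "http://domain.com/websites/" + match.Groups[1].Value;
                app.Response.RedirectPermanent(target);
            }
        };
    }

    public void Dispose() { }
}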
I'm stuck at the point where I need to crawl websites that have a form post.
Nutch does not support this.
How do I get around this so I can crawl these websites using Nutch? Is there a better solution?
Create a data file containing: a regex for URLs that require authentication, the URL the login form submits to, and the form data.
Then write your own HTTP protocol plugin by modifying the standard protocol-httpclient plugin: if the URL being fetched requires authentication and none has been performed yet, go to the form and submit it first.
That is the simplest solution. The problem is that there is no single simple solution for a large number of websites: cookies expire, some logins rely on JavaScript, and so on. Search through Nutch's JIRA; there have been many discussions about this.
Here is the answer that you guys are looking for:
http://lifelongprogrammer.blogspot.com/2014/02/part1-using-apache-http-client-to-do-http-post-form-authentication.html
and
https://issues.apache.org/jira/browse/NUTCH-827
These two links contain complete sample code. If you follow each step correctly, you will be able to achieve form-based authentication in Nutch.
I have a website. Two weeks ago I relaunched it as a new media/entertainment site; the old content consisted of download links. Now I get 20,000 "URL not found" errors and 5,000 sitemap errors. How can I remove these URLs? I have tried Google Webmaster Tools. My site is losing visitors.
You should redirect the old locations to the new ones, using the HTTP 301 Moved Permanently status code.
Redirect the old URL to the new one! This can be done in the .htaccess file; note that Apache's Redirect directive takes a URL path as its first argument:
Redirect 301 /old_url http://new_url