This is perhaps a vague question, but it appears that some bot is crawling my site and doing it VERY poorly. It appears to be guessing IDs from my application.js file and putting them into URLs, for example:
Couldn't find Post with id=keypress
And even more strangely, the HTTP referrer is listed as application.js.
Has anyone experienced this before? Any ideas on how to stop these crawlers?
If it is a legitimate crawler, you can stop it by placing a robots.txt file in your domain's root directory - http://en.wikipedia.org/wiki/Robots_exclusion_standard
You would include the following text in the robots.txt file:
User-agent: *
Disallow: /YOUR_PATH_TO_FILE/application.js
You can also add this tag to your page headers:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
If it is a malicious crawler, this of course will not stop it. There are other measures you can take against crawlers that do not respect robots.txt, but that depends on what web server you are using.
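For example, if you happen to be running Apache (an assumption, since the question doesn't say), you could refuse requests from a misbehaving user agent with mod_rewrite in an .htaccess file. A rough sketch, where "BadBot" stands in for whatever user agent string shows up in your access logs:
RewriteEngine On
# Return 403 Forbidden for any request whose User-Agent contains "BadBot" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F,L]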
I have a Rails 4.2.6 application running. On some of my pages, I use an iframe with the Google Maps Embed API to show maps of some locations. My whole website is HTTPS-secured using Let's Encrypt. However, the pages that use the Google Maps API always get a "Not Secure" warning from Chrome or Firefox. When I remove the Google Maps iframes, the warning disappears.
I have googled a lot, and there is a workaround here using a meta tag. The workaround is shown as plain HTML, but I still don't know how to add the meta tag to my Rails application. The keyword seems to be "upgrade-insecure-requests".
Please help, thank you.
Solved!
Just put this line
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
in your main HTML file.
In my case, it is not application.html.erb; it's another file. After I put that line in my home page file, everything is working just fine now.
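For anyone else landing here, a minimal sketch of where the tag goes in a default Rails layout, assuming your layout is app/views/layouts/application.html.erb (as noted above, yours may be a different file):
<!-- app/views/layouts/application.html.erb -- sketch, your layout may differ -->
<head>
  <title>MyApp</title>
  <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
  <%= stylesheet_link_tag 'application', media: 'all' %>
  <%= javascript_include_tag 'application' %>
  <%= csrf_meta_tags %>
</head>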
I am trying to get robots.txt to work so that search engines start indexing my website and show meta info like descriptions, etc.
However, I get this message:
A description for this result is not available because of this site's robots.txt – learn more.
Here is what my robots.txt looks like.
# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
# To ban all spiders from the entire site uncomment the next two lines:
User-agent: *
Disallow: /tags/*
Disallow: /users/*
What do I need to change?
This is a Rails 4 application hosted on Heroku, and robots.txt is in the public directory of the Rails repository.
First of all, it is not compulsory to use a robots.txt file! You only need one if you don't want search engines to crawl specific pages or directories of your website.
In your case, you are blocking search engines from crawling the tags and users directories under the root, so any page inside those directories will give this error.
I also recommend using Google Webmaster Tools and verifying your website. You can test your robots.txt file from there.
Try removing the trailing asterisks:
User-agent: *
Disallow: /tags/
Disallow: /users/
Meanwhile, providing the location of your sitemap might be helpful too:
Sitemap: http://www.yoursite.com/sitemap.xml
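Putting both suggestions together, the whole file would look something like this (the sitemap URL is just a placeholder for your own domain):
User-agent: *
Disallow: /tags/
Disallow: /users/
Sitemap: http://www.yoursite.com/sitemap.xml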
I just spent the last few hours debugging a huge problem, the problem being:
My external CSS stylesheets were not loading when I used Firefox.
Using Firefox's debugging tools, I was able to conclude that the file was not being found; it had nothing to do with the MIME type or encoding, which I checked.
I was using relative URLs to reference my stylesheets, so I decided to use absolute URLs, and it worked! After hours of nearly losing my mind.
However, using absolute URLs on every page is just a pain and not practical if I am debugging on localhost all the time.
Could anyone tell me why I need to provide the absolute URLs? The CSS file is there; Firefox shows the relative URL, and when I go to it manually, it works, yet Firefox just will not find it. Every other browser, including Chrome and Safari, works with the relative URLs.
I could use PHP to define all these URLs and then reference them within my HTML, making it easier to switch domains for debugging, but it's still a pain, and I don't know why I have to do this.
My site here
Thanks in advance,
Jack.
Note: For testing reasons, I am giving the link to the site I am having problems with; this has nothing to do with advertising.
For your stylesheet problem: change the backslash to a forward slash in your <link> element.
<link rel="stylesheet" href="css/main.css">
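For reference, the markup on the site presumably looks something like the line below, with a backslash in the path; some browsers silently convert backslashes in URLs to forward slashes, while Firefox at the time did not, which would explain the difference in behaviour:
<link rel="stylesheet" href="css\main.css">  <!-- broken: backslash instead of forward slash -->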
There are a couple of images with a similar problem.
You have a number of other errors: <script> tags between <head> and <body>, and some loose </article> tags as well.
If you're using Firefox, take a look at the page source and fix anything you see highlighted in red. Then try again.
Thanks in advance for any help you can provide!
I have a website built in Ruby on Rails. My site has a webpage, located at example.com/communityboard, that you can use to enter a separate Community area (an off-the-shelf bulletin board called bbPress).
I want users who type in the URL example.com/community to be redirected to example.com/communityboard. It used to work this way, but for some reason, the redirect no longer works in any browser but IE.
We accomplished this redirect by placing an index.html file in the /community folder where bbPress had been installed. The entire code for the index.html file reads
<meta http-equiv="refresh" content="0;url=http://example.com/communityboard">
Back when we built the site, I was told that a meta refresh redirect using an index.html file was the best option. The redirect had to address ONLY a single page (http://example.com/community) and not all of the sublevels of the community bb (which lives at http://example.com/community/index.php). Otherwise, the community bb and all of its sublevels would be redirected.
So... my questions:
Why is the meta refresh redirect not working anymore?
How can I fix it?
Thanks again for any help you can offer!
If it's only working in IE, it's possible there's a script or parsing issue that's breaking other browsers. I would run the HTML through a validator like http://validator.w3.org/.
Meta refresh is a legacy practice that is now discouraged; the Wikipedia entry contains more info and links to alternative solutions: http://en.wikipedia.org/wiki/Meta_refresh.
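If you can touch the server configuration, a server-side redirect is the usual replacement. A sketch assuming Apache with mod_rewrite, placed in the site root's .htaccess (your setup may differ):
RewriteEngine On
# Redirect only /community and /community/ -- deeper paths such as /community/index.php are left alone
RewriteRule ^community/?$ http://example.com/communityboard [R=301,L]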
Here's what happened according to my developer. I don't fully understand the explanation, so I'm not sure I can answer follow-up questions! "With the old mongrel cluster, Apache would recognize "/community" as a directory, silently forward to "/community/", which would then pick up the forwarding index.html file. With Phusion Passenger," which I guess we're using now, "Apache sends the request directly to Passenger if "/community" is not a regular file, and Passenger was returning the 404 error. As a fix, we've disabled passenger in the community folder, which fixes the problem."
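For anyone hitting the same thing, the fix described above amounts to something like this in an .htaccess file inside the community folder (a sketch; the exact directives depend on how Passenger is set up):
# .htaccess in /community -- tell Phusion Passenger to leave this directory to Apache
PassengerEnabled off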
I've run into a weird issue with a site running ASP.NET MVC on IIS7.
Whenever I do a page refresh (F5), the external stylesheet content gets "injected" into the page itself, instead of the tag pointing to the CSS file. Example:
<head><link type="text/css" rel="stylesheet" href="external.css" /></head>
Renders as:
<head><style type="text/css">body{ color: #000; }</style></head>
Locally, there is no issue at all; it only happens once the site is uploaded to the server.
If I do a hard refresh (Ctrl + F5), it renders as it should, but subsequent requests will not.
I'm inexperienced with IIS7, so I don't know if this issue could be caused by it.
Any help would be appreciated.
Turns out an improperly closed script tag was wreaking havoc with the page.
After fixing it the page renders normally.
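The answer doesn't show the exact markup, but for anyone with similar symptoms, a common form of this mistake is a self-closed script tag, which HTML parsers treat as an unclosed element that swallows the markup after it (the file name here is just an example):
<script src="site.js" />             <!-- broken: <script> cannot be self-closing -->
<script src="site.js"></script>      <!-- fixed -->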
Well, this is a weird issue. I don't know if IIS7 has a setting or handler that would cause this.
Try using a tool like Fiddler or Live HTTP Headers to verify whether the external CSS file is actually being requested at all.