I'm seeing some weird POST errors on my Rails 5 site. They come in the form POST https://www.example.com/pages/:5054, with the odd :5054 at the end. I don't have any route that POSTs to /pages/ or /pages/:id, and I have no idea what is causing these requests. The referrer URLs are from my own site, so as far as I can tell it is not some weird bot or the like.
The only common denominator I can see is the presence of an Rmch-Securitycookie: header on the bad requests. I don't know if this is the root cause of my issue, but it's a start at least. My guess is a bad browser extension or some monitoring software. Google turns up nothing; has anyone encountered this header, and do you know what it is?
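In case it's useful, something along these lines could log the offending requests for later comparison. This is only a rough sketch against a standard Rails 5 ApplicationController; the header name is the one thing taken from my logs, and everything else (the filter name, the log format) is just illustrative:

# app/controllers/application_controller.rb
# Sketch: log any request carrying the mystery header so its method, path,
# user agent and referer can be compared against normal traffic.
class ApplicationController < ActionController::Base
  before_action :log_rmch_requests

  private

  def log_rmch_requests
    return if request.headers["Rmch-Securitycookie"].blank?

    Rails.logger.warn(
      "Rmch-Securitycookie request: #{request.method} #{request.fullpath} " \
      "ua=#{request.user_agent.inspect} referer=#{request.referer.inspect}"
    )
  end
end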
Can anyone help me better understand Unicorn::ClientShutdown errors? I see them occasionally in my web app's error logs and I have no idea what's causing them, how I can replicate the issue, or whether it's safe to ignore them altogether.
From the documentation (https://www.rubydoc.info/gems/unicorn/2.0.0/Unicorn/ClientShutdown), it seems like this has something to do with interrupted sockets, but I'm not sure exactly what that means or how it relates to my app.
I believe I've only ever seen this on POST requests, and the error has almost exclusively been associated with a simple POST request that tracks page views (and is by far the most common non-GET request made by the web app).
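For what it's worth, the best theory I have is that the browser is closing the connection while that page-view POST is still being read (tab closed, user navigated away), and the sketch below is the kind of middleware I've been considering to swallow those disconnects. I haven't verified that the exception actually reaches the Rack layer in my setup, so treat the placement as an assumption rather than a known fix:

# A sketch of a Rack middleware that ignores client-side disconnects.
# Unicorn::ClientShutdown is (I believe) an EOFError subclass raised when the
# client hangs up while the request body is still being read.
class IgnoreClientShutdown
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue Unicorn::ClientShutdown
    # Nobody is left to receive a response; return an empty 204.
    [204, {}, []]
  end
end

# In config/application.rb (or an initializer):
# config.middleware.use IgnoreClientShutdown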
Thanks in advance for your help!
So https://www.jenkins.io/ has been down for at least most of the afternoon. The main page is accessible, but blog posts and plugins etc. aren't available. I get a 503 that looks like this:
I figured I'd try again later, but since it was still down I thought I'd better report it. So I went to their JIRA to report the issue at https://issues.jenkins-ci.org/, which seems to be up, but when trying to log in I get a 502 response, with the following error message:
I went to their GitHub, but they have issues disabled there. I'm running out of options, so I figured I'd ask here to see if anyone knows how to get in touch with someone who can fix it. I found a few tweets about it, but no responses from anyone who seems able to do anything about it.
After the issue was resolved and I was able to log in to JIRA, I found a way to report it, and apparently a few people already had. If this happens again, you can go to https://github.com/jenkins-infra/jenkins.io/issues/ and report the issue there.
Another place to check is the jenkins-infra channel on freenode, as that's where the issue was being discussed during the investigation.
In case you were curious, it seems like this outage was due to a problem with the Kubernetes cluster where it was hosted. I don't know any more details than that.
I hope this might help someone in the future.
My site has some particular pages that are:
1. Already indexed in search engines, but I want to remove them from the indexes.
2. Numerous, as they are dynamic (based on query string).
3. A bit "heavy." (An overzealous bot can strain the server more than I'd like.)
Because of #2, I'm just going to let them slowly get removed naturally, but I need to settle on a plan.
I started out by doing the following:
Bots: Abort execution using user-agent detection in the application, and send a basically blank response. (I don't mind if some bots slip through and render the real page, but I'm just blocking some common ones.)
Bots: Throw a 403 (forbidden) response code.
All clients: Send "X-Robots-Tag: noindex" header.
All clients: Add rel="nofollow" to the links that lead to these pages.
Did not disallow bots from those pages in robots.txt. (I think disallowing is only useful if you do it from the very beginning, or else after the pages have been completely removed from search engines; otherwise, engines can't crawl those pages to discover and honor the noindex header, so they would never drop them. I mention this because robots.txt is commonly misunderstood and might get suggested as an inappropriate silver bullet.)
However, since then, I think some of those steps were either fairly useless toward my goal, or actually problematic.
I'm not sure if throwing a 403 to bots is a good idea. Do the search engines see that and completely disregard the X-Robots-Tag? Is it better to just let them respond 200?
I think rel="nofollow" only potentially affects target page rank, and doesn't affect crawling at all.
The rest of the plan seems okay (correct me if I'm wrong), but I'm not sure about the above bullets in the grand scheme.
I think this is a good plan (a rough code sketch of the bot-handling side follows the list):
Bots: Abort execution using user-agent detection in the application, and send a basically blank response. (I don't mind if some bots slip through and render the real page, but I'm just blocking some common ones.)
Bots: Send a 410 (Gone) response code. "In general, sometimes webmasters get a little too caught up in the tiny little details and so if the page is gone, it's fine to serve a 404, if you know it's gone for real it's fine to serve a 410." - http://goo.gl/AwJdEz
All clients: Send "X-Robots-Tag: noindex" header. I think this would be extraneous for the known bots who got the 410, but it would cover unknown engines' bots.
All clients: Add rel="nofollow" to the links that lead to these pages. This probably isn't completely necessary, but it wouldn't hurt.
Do not disallow bots from those pages in robots.txt. (Disallowing is only useful if you do it from the very beginning, or else after the pages have been completely removed from search engines; otherwise, engines can't crawl those pages to discover and honor the noindex header, so they would never drop them. I mention this because robots.txt is commonly misunderstood and might get suggested as an inappropriate silver bullet.)
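Here's a rough sketch of the bot-handling part of that plan in Rails terms (adjust to whatever stack you're actually on). The user-agent pattern is just a placeholder, and you'd include the concern only in the controllers that serve these heavy, query-string-driven pages:

# app/controllers/concerns/deindexable.rb
# Sketch of the plan above: known bots get an empty 410 (Gone) response,
# everyone else gets the real page plus an X-Robots-Tag: noindex header.
module Deindexable
  extend ActiveSupport::Concern

  # Placeholder pattern; list the crawlers you actually want to cut off.
  BOT_PATTERN = /googlebot|bingbot|slurp|baiduspider/i

  included do
    before_action :gone_for_bots
    before_action :set_noindex_header
  end

  private

  def gone_for_bots
    return unless request.user_agent.to_s =~ BOT_PATTERN

    # Empty body plus 410 Gone: tells engines the page is gone for real.
    head :gone
  end

  def set_noindex_header
    # Covers any crawler that slips past the user-agent check.
    response.headers["X-Robots-Tag"] = "noindex"
  end
end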
I have a page that's been throwing the occasional error due to a strange query string appended to the URL. My error logs show this being added:
&ved=1t:1527...
The ... means there's more. A little googling turned up some existing URLs containing strings like this:
&ved=1t:1527,r:9,s:107
The syntax is alien to me. I'm fairly certain it's not coming from my code and I'd like to know what it's trying to do. Does anyone have a clue what this might be?
I'm not even sure I'm using the right terminology, or whether this is actually a bot. I didn't want to use the word 'spam', because it's not like I have comments or posts being created/spammed. It looks more like something is making the same repeated request to my domain, which is what made me think it was some kind of bot.
I've opened my first Rails app to the 'public', which is really a small group of users, fewer than 50 currently. That was last Friday. I started having performance issues today, so I looked at the log, and I see tons of these RoutingErrors:
ActionController::RoutingError (No route matches "/portalApp/APF/pages/business/util/whichServer.jsp" with {:method=>:get}):
They are filling up the log, and I'm assuming this is causing the slowdown. Note the .jsp at the end; this is a Rails app, so I have no URLs remotely like this in my app. I don't even have a /portalApp path, so I have no idea where this is coming from.
This is hosted at Dreamhost, and I chatted with one of their support people, who suggested a couple of sites that describe using .htaccess to block things. But it looks like you need to know the IP or domain the requests are coming from, which I don't.
How can I block this? How can I find the IP or domain from the request? Any other suggestions?
Follow up info:
After looking at the access logs, it doesn't appear to be a bot. Maybe I'm not reading the logs right, but there are valid URL requests (generated from within my Flex app) coming from the same IP. So now I'm wondering if it's some kind of plugin generating the requests, but I really don't know. I'm also wondering if it's possible to block a certain URL request based on a pattern, but I suppose that's a separate question.
Old question, but for people who are still looking for alternatives, I suggest checking out Kickstarter's rack-attack gem. It allows not only blacklisting and whitelisting (blocklisting and safelisting in newer versions), but also throttling.
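Here's a sketch of what that might look like for the .jsp noise described above. Method names vary with the rack-attack version (newer releases use blocklist/safelist, older ones blacklist/whitelist), and the paths and limits below are only examples:

# config/initializers/rack_attack.rb
class Rack::Attack
  # Drop the bogus JSP/portal probes outright, based on the request path.
  blocklist("bogus jsp probes") do |req|
    req.path.end_with?(".jsp") || req.path.start_with?("/portalApp")
  end

  # Rate-limit everything else per IP (tune the numbers to your traffic).
  throttle("requests per ip", limit: 60, period: 60) do |req|
    req.ip
  end
end

# Depending on the version, you may also need to add the middleware yourself:
# Rails.application.config.middleware.use Rack::Attack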
This page seems to offer some good advice:
Here
The section on blocking by user agent may be something you could look at implementing. Is there any way you can get the bot's user agent from your logs? If so, look for the unique part of the user agent that identifies the bot and add the following to .htaccess, replacing the relevant bits:
# Tag requests whose User-Agent matches the bot's signature (case-insensitive);
# replace "SpammerRobot" with the unique string from your logs.
BrowserMatchNoCase SpammerRobot bad_bot

# Deny requests carrying that environment variable (Apache 2.2 syntax;
# Apache 2.4 uses "Require" directives or mod_access_compat instead).
Order Deny,Allow
Deny from env=bad_bot
It's explained in more detail at that link, and of course, if you can't get the user agent from your logs, this will be of no use to you!
You can also update your public/robots.txt file to allow/disallow robots.
http://www.robotstxt.org/wc/robots.html