Lots of ActionController::UnknownHttpMethod: CONNECT in a Rails application

I'm getting lots of these exceptions in a Rails application:
ActionController::UnknownHttpMethod: CONNECT, accepted HTTP methods are get, head, put, post, delete, and options
As far as I can tell, it seems to be some crawler or something similar trying to use CONNECT as an HTTP verb. I've never heard of it, but the documentation says:
This specification reserves the method name CONNECT for use with a proxy that can dynamically switch to being a tunnel (e.g. SSL tunneling [44]).
Any ideas what might be going on? Some poorly written crawler? Something trying to abuse my application or web server? What can I do about it? Should I block these requests entirely, and if so, how? This is a Ruby on Rails app running with Passenger on Apache.

Are all the requests coming from the same IP or hostname? If so, I would use Apache's mod_authz_host (formerly mod_access) to deny access to what is most likely a crawler. Since Rails doesn't seem to be doing anything with the request, I wouldn't worry about it too much though :)
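If you'd rather deal with it inside the Rack stack instead of (or in addition to) Apache, here is a minimal sketch of a middleware that rejects CONNECT before it reaches the Rails router. The class name and file location are made up for illustration; adjust to your app:

```ruby
# lib/reject_connect.rb -- hypothetical location, make sure it is required/autoloaded.
# Answers CONNECT requests with 405 so they never raise UnknownHttpMethod in Rails.
class RejectConnect
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["REQUEST_METHOD"] == "CONNECT"
      [405, { "Content-Type" => "text/plain" }, ["Method Not Allowed"]]
    else
      @app.call(env)
    end
  end
end

# config/application.rb:
#   config.middleware.insert_before 0, RejectConnect
```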

Related

Rails rewriting link from Blogger

I use rack-reverse-proxy to map my Blogger.com blog onto a path of my Ruby on Rails app: pulpoludo.com/blog
It works, but I have an issue with the links Blogger generates, which point back to blog.pulpoludo.com (where my Blogger blog is hosted).
I would like to rewrite these links, but I don't know how. Can you help me?
(I have found someone who does this in PHP: https://matt-stannard.blogspot.com/2013/02/blogger-in-subdirectory-of-my-domain.html
But I would like to do the same thing with Rails, and maybe a gem)
Indeed, you cannot use rack-reverse-proxy as-is, because it does not let you change the response (you would need to rewrite the retrieved page with a regular-expression replacement, as in the example you link to).
Also, you should probably avoid using rack-reverse-proxy in production, as it will keep your Ruby processes busy waiting for backend responses, which might fail or be slow. And:
It is not meant for production systems
You should instead proxy from your front HTTP acceptor (nginx or other). For nginx you can see a very thorough response, using a combination of proxy_pass and sub_filter, at https://stackoverflow.com/a/32543398/384417.
Edit: if it's not possible to use nginx or another reverse proxy, you can still do it in Ruby.
rack-reverse-proxy supports transformers: you can build one yourself and register it so that it runs on the response. This (closed) issue is exactly what you need and will help: https://github.com/waterlink/rack-reverse-proxy/issues/65. The caveat (as always when changing responses) is that you have to update the Content-Length response header to match the new size of the body.
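To illustrate that caveat (this is not rack-reverse-proxy's actual transformer API, which you should take from the linked issue), a generic Rack middleware that rewrites the proxied HTML and keeps Content-Length in sync might look like this; the class name is hypothetical and the host names are taken from the question:

```ruby
# Hypothetical middleware: rewrites Blogger's absolute links in proxied responses.
class BlogLinkRewriter
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)

    # Only touch HTML responses
    if headers["Content-Type"].to_s.include?("text/html")
      new_body = String.new
      body.each { |part| new_body << part }
      body.close if body.respond_to?(:close)

      # Point Blogger's absolute links back at the proxied path
      new_body.gsub!("http://blog.pulpoludo.com", "http://pulpoludo.com/blog")

      # The crucial part: update Content-Length to match the rewritten body
      headers["Content-Length"] = new_body.bytesize.to_s
      [status, headers, [new_body]]
    else
      [status, headers, body]
    end
  end
end
```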

Using routing with faye::WebSocket in ruby

I am experimenting with WebSockets in my Ruby on Rails server. I am trying faye-websocket as described here.
Initial tests look promising (I am using a Python client and I am able to connect to the WebSocket), but I have a newbie question that keeps bugging me. Including the WebSocket library as a middleware in Ruby seems to capture ALL requests from my client that are WebSocket connections. In that case, how do I differentiate (and reply differently to) client calls with different routes (e.g. calls to http://myserver.com/apple and http://myserver.com/pear both being WebSockets)?
EDIT
I found that the env hash contains the key "REQUEST_PATH", which holds the path requested by the client. I can use it to return the appropriate response for each of the different client calls. Is there a more "elegant" way to do it?
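A small sketch of what that branching can look like, using faye-websocket's documented Rack API (Faye::WebSocket.websocket?, ws.rack_response). PATH_INFO is the standard Rack key for the request path; REQUEST_PATH also works on servers that set it. The endpoint paths below are just examples from the question:

```ruby
# config.ru -- minimal sketch; needs a server faye-websocket supports (thin, puma, ...)
require "faye/websocket"

App = lambda do |env|
  if Faye::WebSocket.websocket?(env)
    ws = Faye::WebSocket.new(env)

    # Branch on the requested path to serve different "endpoints"
    case env["PATH_INFO"]
    when "/apple"
      ws.on(:message) { |event| ws.send("apple: #{event.data}") }
    when "/pear"
      ws.on(:message) { |event| ws.send("pear: #{event.data}") }
    else
      ws.close(4404, "unknown endpoint")
    end

    ws.rack_response
  else
    [404, { "Content-Type" => "text/plain" }, ["Not a WebSocket request"]]
  end
end

run App
```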

How to prevent abusive crawlers from crawling a rails app deployed on Heroku?

I want to restrict crawler access to my Rails app running on Heroku. This would have been a straightforward task if I were using Apache or nginx, but since the app is deployed on Heroku I am not sure how I can restrict access at the HTTP server level.
I have tried using a robots.txt file, but the offending crawlers don't honor robots.txt.
These are the solutions I am considering:
1) A before_filter in the Rails layer to restrict access.
2) A Rack-based solution to restrict access.
I am wondering if there are any better ways to deal with this problem.
I have read about honeypot solutions: You have one URI that must not be crawled (put it in robots.txt). If any IP calls this URI, block it. I'd implement it as a Rack middleware so the hit does not go to the full Rails stack.
Sorry, I googled around but could not find the original article.
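A minimal sketch of that honeypot idea as Rack middleware. The class name and trap path are made up; the trap path is the one you disallow in robots.txt. In production, especially on Heroku where dynos restart and there can be several of them, you would want the block list in Redis or a database rather than in-process memory:

```ruby
# Hypothetical middleware: block any IP that requests the trap URI.
class CrawlerHoneypot
  TRAP_PATH = "/do-not-crawl-this".freeze  # disallowed in robots.txt

  def initialize(app)
    @app = app
    @blocked_ips = {}  # illustration only; use a shared store in production
  end

  def call(env)
    request = Rack::Request.new(env)

    # Anything hitting the trap ignored robots.txt; remember its IP
    if request.path == TRAP_PATH
      @blocked_ips[request.ip] = Time.now
      return [403, { "Content-Type" => "text/plain" }, ["Forbidden"]]
    end

    if @blocked_ips.key?(request.ip)
      [403, { "Content-Type" => "text/plain" }, ["Forbidden"]]
    else
      @app.call(env)
    end
  end
end

# config/application.rb:
#   config.middleware.insert_before 0, CrawlerHoneypot
```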

How can I future-proof my client URL links to my server for future HTTPS migration?

I have a .NET WinForms client talking to my Ruby on Rails backend. If I move the website in the future, I want to make sure the API links from the client don't have to change.
Or is this something a hosting provider lets you configure?
Oh, and when I do migrate, I won't want to allow any non-HTTPS traffic.
PS1: I am not talking about moving servers here, just adding a certificate to the existing web application server and moving to HTTPS-only traffic.
Place a base URL as a config parameter in your client application, then run all new links through a getLinkURL(String relativeDestination) method which gives you a full URL.
If you're worried about clients that haven't been updated still making non-HTTPS requests, just issue a Redirect 301 / https:// in your HTTP (non-secure) vhost on the server.
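Since the backend is Rails, the same non-secure-to-secure redirect can also be enabled in the application itself once the certificate is in place. This is the standard Rails setting, shown here as a sketch of where it would go in your app:

```ruby
# config/environments/production.rb
Rails.application.configure do
  # Redirect all HTTP requests to HTTPS, set secure cookies, and send HSTS headers
  config.force_ssl = true
end
```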
If I understand the question correctly, I think you can solve this by using relative links everywhere; unless there's a reason you can't do that?
I think you need to look into DNS and how it works. It won't protect you against an HTTP-to-HTTPS migration, but it would allow you to move servers without re-engineering your code. Ideally, I think you'd have a config setting in your code to switch from HTTP to HTTPS (and back) when necessary.

Rails/Passenger/Apache: Simple one-off URL redirect to catch stale DNS after server move

One of my Rails apps (using Passenger and Apache) is changing server hosts. I've got the app running on both servers (the new one in testing) and have set the DNS TTL to 5 minutes. A colleague told me (and I've experienced something like this myself) that DNS resolvers sometimes ignore the TTL and may keep the old IP cached for some time after I update DNS to point at the new server.
So, after I've thrown the switch on DNS, what I'd like to do is have the old server issue a forced redirect to the IP address of the new server for all visitors. Obviously I can do a number of redirects (301, 302) in either Apache or the app itself. I'd like to avoid the app method, since I don't want to check in and deploy code just for this one case, so I was thinking a basic HTTP URL redirect would work. But there are SEO implications should Google visit the old site, etc.
How best to achieve the redirect while maintaining search-engine friendliness?
I guess the question is - where would you redirect to? If you are redirecting to the domain name, the browser (or bot) would just get the same old IP address and end up in a redirect loop.
If you redirect to an IP address.. well, that's not going to look very user friendly in someone's browser.
Personally, I wouldn't do anything. There may be some short period where bots get errors trying to access your site, but it should all work itself out in a couple days without any "SEO damage"
One solution might be to use mod_proxy instead of a rewrite, to proxy traffic to the new host. This way you shouldn't see any "SEO damage".
I used rinetd to forward traffic from the old server to the new one at the IP level. No web server or virtual host config is needed. It runs very smoothly and is absolutely transparent to any client.
