How to use proxy when using URI.open in Ruby on Rails - ruby-on-rails

I am parsing a web page using the Nokogiri gem, as in the code below:
require 'nokogiri'
require 'open-uri'
url = 'https://www.google.com/' # example URL
Nokogiri::HTML(URI.open(url))
# ... some other code
I would like to use a proxy with open-uri. I have looked through the documentation, but its example code does not use a proxy.
How can I use a proxy when calling URI.open? I would appreciate any help with a syntax explanation or some example code. Thank you!

URI.open is designed to be an easy-to-use wrapper around Net::HTTP, but I think you have at least a couple of options:
URI.open (and URI generally) honors the http_proxy environment variable (the lowercase name is preferred over HTTP_PROXY); see: Automatically adding proxy to all HTTP connections in ruby
Net::HTTP has options that let you fine-tune this and use different types of authentication with the proxy, and it is not particularly hard; see: How to set a proxy in rubys net/http?
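Both options can be sketched in a few lines. This is a minimal illustration rather than a definitive recipe: the proxy address `proxy.example.com:8080` and the helper names `open_via_proxy` and `proxied_http` are assumptions made up for the example. The `:proxy` option of open-uri and the extra proxy arguments of `Net::HTTP.new` are part of Ruby's standard library.

```ruby
require 'open-uri'
require 'net/http'
require 'uri'

# Hypothetical proxy address, used for illustration only.
PROXY_URL = 'http://proxy.example.com:8080'

# Option 1: open-uri accepts a :proxy option (a URL string or a URI object).
def open_via_proxy(url, proxy_url = PROXY_URL)
  URI.open(url, proxy: proxy_url)
end

# Option 2: Net::HTTP.new takes the proxy host and port (and optional
# credentials) as extra positional arguments.
def proxied_http(url, proxy_url = PROXY_URL, user: nil, password: nil)
  target = URI(url)
  proxy = URI(proxy_url)
  http = Net::HTTP.new(target.host, target.port, proxy.host, proxy.port, user, password)
  http.use_ssl = (target.scheme == 'https')
  http
end

# Usage (performs real network requests, so it is left commented out):
# html = open_via_proxy('https://www.google.com/').read
# body = proxied_http('https://www.google.com/').get('/').body
```

For a proxy that requires a username and password, open-uri also has a separate `:proxy_http_basic_authentication` option that takes the proxy URL, user, and password as an array.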

Related

Rails rewriting link from Blogger

I use rack-reverse-proxy to serve my Blogger.com blog under a subdirectory of my Ruby on Rails app: pulpoludo.com/blog
It works, but I have an issue with the Blogger links, which point back to blog.pulpoludo.com (where my Blogger blog is hosted).
I would like to rewrite these links, but I don't know how. Can you help me?
(I have found someone who does this in PHP: https://matt-stannard.blogspot.com/2013/02/blogger-in-subdirectory-of-my-domain.html
But I would like to do the same thing with Rails and a gem maybe)
Indeed, you cannot use rack-reverse-proxy as is, because it does not let you change the response (you need to rewrite the page you retrieve using a regular expression replacement, as in the example you link to).
Also, you should probably avoid using rack-reverse-proxy in production, as it will keep your Ruby processes busy waiting for the backend responses, which might fail or be slow. And:
It is not meant for production systems
You should instead proxy from your front HTTP acceptor (nginx or other). For nginx you can see a very thorough response, using a combination of proxy_pass and sub_filter, at https://stackoverflow.com/a/32543398/384417.
Edit: If it's not possible to use nginx or another reverse proxy, you can still do it in Ruby.
rack-reverse-proxy supports transformers: you can build one yourself and register it so that it runs on the response. This (closed) issue will help, as it is exactly what you need: https://github.com/waterlink/rack-reverse-proxy/issues/65. The caveat (as always when changing responses) is that you have to update the Content-Length response header to match the new size of the body.
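The rewrite-and-fix-Content-Length idea can be sketched as a plain Rack middleware. This is not the transformer API of rack-reverse-proxy (see the linked issue for that). The class name `LinkRewriter` and its `from:`/`to:` parameters are hypothetical, and the sketch assumes the proxied HTML body is small enough to buffer in memory.

```ruby
# Minimal Rack-style middleware: rewrites a URL prefix in HTML responses
# and recomputes Content-Length. LinkRewriter is a hypothetical name.
class LinkRewriter
  def initialize(app, from:, to:)
    @app = app
    @from = from
    @to = to
  end

  def call(env)
    status, headers, body = @app.call(env)

    # Header names may be capitalised differently depending on the Rack version.
    content_type = headers.find { |key, _| key.to_s.downcase == 'content-type' }&.last
    return [status, headers, body] unless content_type.to_s.include?('text/html')

    # Buffer the whole body so the replacement can span chunk boundaries.
    html = +''
    body.each { |chunk| html << chunk }
    body.close if body.respond_to?(:close)
    html = html.gsub(@from, @to)

    # Drop the stale length and set one that matches the rewritten body.
    headers = headers.reject { |key, _| key.to_s.downcase == 'content-length' }
    headers['content-length'] = html.bytesize.to_s

    [status, headers, [html]]
  end
end

# Possible registration in config.ru (assumed setup, left commented out):
# use LinkRewriter, from: 'http://blog.pulpoludo.com', to: 'http://pulpoludo.com/blog'
```

Using `bytesize` rather than `length` matters here, because Content-Length counts bytes and Blogger pages may contain multi-byte characters.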

Forcing Gibbon Gem (or Faraday) to use QuotaGuard Static HTTP proxy on Heroku

Full disclaimer; I'm not a strong Ruby dev, but I am learning quickly :)
I've set up a simple Ruby script on a Heroku dyno that listens for calls from our donation platform.
When a donation is made, it hits a webhook endpoint within my app, which then sends a donation receipt via Mandrill (which works fine), and updates/inserts a record in a Mailchimp list, via the 'upsert' method of the wonderful Gibbon gem.
That all works fine; except when the Heroku box happens to come up on an IP address that has done something bad in the past, and Mailchimp's API drops with a 403 (Forbidden) error.
I've had this confirmed by the Mailchimp API team; they suggest using something like QuotaGuard Static to tunnel the API requests to Mailchimp through, removing the issue of API calls from inconsistent (and sometimes untrusted) IP addresses.
I'd love some advice on how to make this happen. I can see that Gibbon uses Faraday to handle HTTP requests, but I'm not an advanced enough Ruby dev to fork the code and add in HTTP proxy functionality.
If there's a way to globally force the Faraday calls to use an HTTP proxy (i.e. QuotaGuard Static), that's what I'm looking for. A config setting for Faraday, for example.
Or perhaps there's a tweak I can make to my Procfile:
web: bundle exec ruby webhooks.rb -p $PORT
...that will force the outbound traffic to go via the QuotaGuard Static proxy. I know Proximo has this functionality, but it also blocks inbound access to the app, which doesn't work for this app.
Appreciate any ideas the community can offer. Thanks!
Gibbon Author here. You can simply set the proxy value to the proxy URL in Gibbon 2.2.0 and later.
According to the Faraday documentation, the Connection class uses the proxy specified in the http_proxy environment variable. I have never tried it, but judging by the source code it should work.
I wanted to provide a bit more information, since the two answers pointed me on the right track but still required me to do some digging. I solved this issue by first adding the QuotaGuard Static add-on in Heroku (free for up to 250 uses per month) and then initializing Gibbon like so:
g = Gibbon::Request.new
g.proxy = ENV["QUOTAGUARDSTATIC_URL"]
And here is the relevant section from the Gibbon docs: https://github.com/amro/gibbon#other
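For the environment-variable route mentioned above, the sketch below shows how Ruby's standard library resolves a proxy from the environment with `URI::Generic#find_proxy`. Whether a given HTTP client performs the same lookup is an assumption to verify against that client's documentation. The proxy URL and the target addresses are made up (IP literals are used so the demo needs no DNS lookup). Note that an `https://` target is matched against `https_proxy`, not `http_proxy`.

```ruby
require 'uri'

# Hypothetical environment, passed explicitly so the real ENV is not touched.
env = {
  'http_proxy'  => 'http://user:secret@proxy.example.com:9293',
  'https_proxy' => 'http://user:secret@proxy.example.com:9293',
  'no_proxy'    => '192.0.2.50'
}

# An https target is looked up under https_proxy.
via_proxy = URI('https://203.0.113.10/3.0/lists').find_proxy(env)

# Hosts listed in no_proxy bypass the proxy, so find_proxy returns nil.
direct = URI('https://192.0.2.50/internal').find_proxy(env)

puts "proxied through: #{via_proxy.host}:#{via_proxy.port}"
puts "bypasses proxy:  #{direct.nil?}"
```

On Heroku, this would amount to exporting the add-on's URL under both variable names before the process starts, instead of setting the proxy on each client object.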

http request (using net http or RestClient) inside my rails controller

I have a problem making an HTTP request inside my controller action. I tried both net/http and RestClient, but I can't get the request to work against my local server URL, i.e. http://localhost:3000/engine/do_process. I always get a request timeout, although it works with other valid URLs.
I hope you can enlighten me on this one. I did some research, but I can't find anything that explains why I get this timeout.
Sample controller code:
require 'rest_client'
class LgController < ApplicationController
  def get_lgjson
    response = RestClient.get("http://localhost:3000/engine/do_process_lg")
    #generated_json = response.to_str
  end
end
I encountered this problem today, too, in exactly the same context: using the Ruby RestClient to make an HTTP request inside a controller. It had worked earlier in a different project using OpenURI without problems. This was surprising because both HTTP libraries, RestClient and OpenURI, are built on the same library, Net::HTTP.
It is the URL that makes the difference. We can make a connection to an external URL in the controller, but not to localhost. The problem seems to be the duplicated connection to localhost. There is already a connection to localhost open, and we are trying to open a second one. This does not seem to work in a single-threaded web server like Thin for instance. A multi-threaded web server such as Puma could help.
I think this is because you are using a single-threaded web server. You have two options to fix it:
Use Passenger.
Decide whether it makes sense to make a net/http request to localhost at all.
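If the request to localhost turns out to be unnecessary, one alternative is to move the shared logic into a plain Ruby object that both actions call directly, so no second connection to the same server is needed. The sketch below is only an illustration of that idea: `LgProcessor` and its return value are hypothetical, and the method stands in for the controller action without depending on Rails.

```ruby
require 'json'

# Hypothetical service object holding the logic behind /engine/do_process_lg.
class LgProcessor
  def call
    { 'status' => 'processed' }
  end
end

# Plain-Ruby stand-in for LgController#get_lgjson: it calls the object
# directly instead of issuing an HTTP request to the same server.
def get_lgjson(processor = LgProcessor.new)
  JSON.generate(processor.call)
end

puts get_lgjson
```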

Ruby - Get client IP Address from inside a gem

I'm trying to create a gem and I would like to have a feature inside it that takes the client ip address when using it. Basically it's just like the rails ActionController request.remote_ip, but within the gem I don't want to rely/depend on rails.
Is there any way to have something like this purely using Ruby?
I've found this, but when deployed to a production server it only gets the server ip address and not the client one.
Any help would be pretty much appreciated
Thanks a lot
Yes, it's possible in plain Ruby :) If you are developing a web app, you should use Rack; it's simple and pretty cool. Otherwise you will have to deal with the 'net' module.
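With Rack, the client address is available in the request environment hash. A minimal sketch, assuming the gem is handed a Rack `env`: `REMOTE_ADDR` and `HTTP_X_FORWARDED_FOR` are the standard keys, while the helper name `client_ip` is hypothetical. Behind a proxy or load balancer, `REMOTE_ADDR` holds the proxy's address rather than the client's, so the forwarded header is checked first.

```ruby
# Returns the client IP from a Rack env hash.
# Caveat: X-Forwarded-For is supplied by the client or by proxies and can be
# spoofed; this sketch trusts its first entry without any proxy filtering.
def client_ip(env)
  forwarded = env['HTTP_X_FORWARDED_FOR'].to_s.split(',').map(&:strip).reject(&:empty?)
  forwarded.first || env['REMOTE_ADDR']
end

# Example env hashes with made-up addresses:
behind_proxy = { 'HTTP_X_FORWARDED_FOR' => '203.0.113.7, 10.0.0.1', 'REMOTE_ADDR' => '10.0.0.1' }
direct       = { 'REMOTE_ADDR' => '198.51.100.4' }

puts client_ip(behind_proxy)
puts client_ip(direct)
```

Rails' `request.remote_ip` additionally filters out trusted proxy addresses, which this sketch does not attempt.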

HTTP Logging in rails?

Does anyone know of a plugin / gem that will log any HTTP requests your rails app may be making when responding to a request? For example if you are using HTTParty to hit an API, how can you see what outbound requests are coming out of your rails app?
You have to tell the outbound HTTP client to use a proxy.
For HTTParty it's fairly simple (from the docs),
class Twitter
  include HTTParty
  http_proxy 'http://myProxy', 1080
end
If you're looking for a proxy to set up, personally I like Paros proxy (Java so cross platform and does SSL).
Also try the http_logger gem:
require 'http_logger'
Net::HTTP.logger = Logger.new(...) # defaults to Rails.logger if Rails is defined
Net::HTTP.colorize = true # Default: true
This will log all requests that go through the Net::HTTP library.
https://github.com/railsware/http_logger
If you're doing development on your own machine, Charles Proxy is a good option.
In production, you'd probably be better off creating your own logger.debug() messages.
The only way I got this to work was to specify only the IP as the first parameter to the http_proxy call:
http_proxy '10.2.2.1', 8888
The example above, with the http:// prefix, did not work; I got a SocketError: getaddrinfo: nodename nor servname provided
Try my httplog gem, you can customize it to log requests, responses, headers etc.
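If the client is built on Net::HTTP and you control the connection object, the standard library's `set_debug_output` is another way to see outbound traffic without a gem or proxy. The sketch below is self-contained: it starts a throwaway local TCP server (an assumption made only so the example runs offline) and captures the debug log in a `StringIO`. The Ruby documentation warns against using `set_debug_output` in production, since it can expose sensitive data.

```ruby
require 'net/http'
require 'socket'
require 'stringio'

# Throwaway local server that answers a single request with "ok".
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  # Read the request headers up to the blank line.
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
  client.close
end

log = StringIO.new
# Passing nil as the proxy address disables any proxy taken from the environment.
http = Net::HTTP.new('127.0.0.1', port, nil)
http.set_debug_output(log) # must be set before the connection is opened
response = http.get('/status')

server_thread.join
server.close

puts response.body
puts log.string
```

In a Rails app, the same call could point at a dedicated log file instead of a `StringIO`, in development only.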
