Mechanize - Receiving Errno::EMFILE: Too many open files - socket(2) after a day - ruby-on-rails

I'm running an application that uses mechanize to fetch some data every so often from an RSS feed.
It runs as a heroku worker and after a day or so I'm receiving the following error:
Errno::EMFILE: Too many open files - socket(2)
I wasn't able to find a "close" method within mechanize, is there anything special I need to be doing in order to close out my browser sessions?
Here is how I create the browser + read information:
def mechanize_browser
  @mechanize_browser ||= begin
    agent = Mechanize.new
    agent.redirect_ok = true
    agent.request_headers = {
      'Accept-Encoding' => "gzip,deflate,sdch",
      'Accept-Language' => "en-US,en;q=0.8",
    }
    agent
  end
end
And actually fetching information:
response = mechanize_browser.get(url)
And then closing after the response:
def close_mechanize_browser
  @mechanize_browser = nil
end
Thanks in advance!

Rather than closing each Mechanize instance by hand, you can try invoking Mechanize with a block. According to the docs:
After the block executes, the instance is cleaned up. This includes closing all open connections.
So, rather than abstracting Mechanize.new into a custom function, try running Mechanize via the start class method, which should automatically close all your connections upon completion of the request:
Mechanize.start do |m|
  m.get("http://example.com")
end

I ran into this same issue. The Mechanize.start example by @zeantsoi is the answer that I ended up following, but there is also a Mechanize#shutdown instance method if you want to do this manually without the block.
There is also the option of adding a lambda to post_connect_hooks:
browser = Mechanize.new
browser.post_connect_hooks << lambda { |agent, url, response, response_body| agent.shutdown }
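If you would rather keep a helper method than call Mechanize.start everywhere, the same cleanup can be done with an ensure clause. This is only a minimal sketch: with_agent and its factory argument are hypothetical names, not part of Mechanize, and the only thing assumed about the agent is that it responds to shutdown.

```ruby
# Hypothetical helper: builds an agent, yields it, and always shuts it down,
# even when the block raises. `factory` is any callable that returns an
# object responding to #shutdown (for example -> { Mechanize.new }).
def with_agent(factory)
  agent = factory.call
  yield agent
ensure
  agent&.shutdown
end

# Usage with Mechanize would look like this (requires the mechanize gem):
#   body = with_agent(-> { Mechanize.new }) { |agent| agent.get(url).body }
```

The point of the design is that no agent outlives the request that needed it, so sockets cannot pile up across polling runs.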

Related

Accessing response status after it is sent

I'm working on a feature to store requests made to specific endpoints in my app.
after_action :record_user_activity
def record_user_activity
  return unless current_user
  return if request.url =~ /assets|packs/
  @session.navigation.create!(
    uri: request.url,
    request_method: request.method,
    response_status: response.code.to_i,
    access_time: Time.now
  )
end
The problem is that, even if we get an error response, when getting the response.code at this point (after_action), the response code is still a 2xx. I imagine it's probably because the server hasn't yet faced whatever problem it may face during the data access process.
How can I properly store the status code that was actually sent to the user?
The Rails logs already store the requests and responses. If you just need to track all user requests, you can simply add the user id to your logs. Unless there's a specific reason to keep this data in the DB, where it would grow quickly with the amount of user activity, add this to either config/application.rb or config/environments/production.rb:
MyAppName::Application.configure do
  #...whatever you already have in configs
  # then add the following line
  config.log_tags = [
    -> request { "user-#{request.cookie_jar.signed[:user_id]}" }
  ]
end
Then you can tail the logs and use grep, or write some other processes to parse over the logs with other analytics. There are many tools available for this type of work. But here's a basic example
tail -f log/production.log | grep user-101
# this would show all log requests from user with id 101
If you still need this however, you may wanna try prepend_after_action instead, see
http://api.rubyonrails.org/classes/AbstractController/Callbacks/ClassMethods.html#method-i-prepend_after_action
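Another place to observe the status that actually goes out is a Rack middleware, since it sees the response after the controller and any exception handling further down the stack have finished. The following is a sketch under assumptions, not a drop-in solution: StatusRecorder and the recorder callable are hypothetical names, and current_user is not available at this level, so the user would have to be identified some other way (for example from the session).

```ruby
# Hypothetical Rack middleware: records the status the rest of the stack
# finally returned. `recorder` is any callable taking a Hash; in the app
# from the question it could create the navigation record.
class StatusRecorder
  def initialize(app, recorder)
    @app = app
    @recorder = recorder
  end

  def call(env)
    status, headers, body = @app.call(env)
    @recorder.call({
      uri: env["PATH_INFO"],
      request_method: env["REQUEST_METHOD"],
      response_status: status.to_i,
      access_time: Time.now
    })
    [status, headers, body]
  end
end
```

In a Rails app it would probably be registered with something like config.middleware.insert_before ActionDispatch::ShowExceptions, StatusRecorder, recorder, so that it also sees responses rendered for exceptions. The exact position in your stack is an assumption to verify.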

How to receive data in ActionCable Channel without JS?

I'm writing a Rails application that uses WebSockets to communicate with other machines (no browser and client side logic in this process). I have a channel:
class MachinesChannel < ApplicationCable::Channel
  def subscribed
    ...
  end
  def unsubscribed
    ...
  end
  def handle_messages
    ...
  end
end
To receive the data the only way I know about is the JavaScript client:
ActionCable.createConsumer('/cable').subscriptions.create 'MachinesChannel',
  received: (message) ->
    @perform('handle_messages')
I can call server-side methods from JS via the @perform() method.
Is there any way to omit the JS part and somehow directly handle the incoming data in MachinesChannel?
The ideal situation would be to have the handle_messages method accept a data argument and have this method called on incoming data.
After looking into ActionCable source code I got the following solution. You just have to create a method in MachinesChannel that you want to be called, e.g. handle_messages(data). Then, in the client that connects to your websocket, you need to send a message in the following format (example in ruby):
id = { channel: 'MachinesChannel' }
ws = WebSocket::Client::Simple.connect(url)
ws.send(JSON.generate(command: 'message', identifier: JSON.generate(id), data: JSON.generate(action: 'handle_messages', foo: 'bar', biz: 'baz')))
action has to be the name of the method you want to be called in MachinesChannel. The rest of the key-value pairs are whatever you want. This is the data you receive in the ActionCable channel.
Recently a gem, action_cable_client, has been released which seems perfect for this kind of usage. I haven't used it, so I don't know how it really works.
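The frame format described above can be built with nothing but the JSON standard library. A small sketch, assuming (as far as I can tell from the Action Cable source) that the connection must first send a subscribe command with the same identifier before message commands are accepted; subscribe_frame and message_frame are hypothetical helper names:

```ruby
require "json"

# Builds the frame that subscribes the connection to a channel.
def subscribe_frame(channel)
  JSON.generate({
    command: "subscribe",
    identifier: JSON.generate({ channel: channel })
  })
end

# Builds a frame that calls `action` on the channel. Any extra payload keys
# are passed along inside the JSON-encoded `data` string.
def message_frame(channel, action, payload = {})
  JSON.generate({
    command: "message",
    identifier: JSON.generate({ channel: channel }),
    data: JSON.generate(payload.merge(action: action))
  })
end
```

With a client such as the one in the example above, you would send subscribe_frame('MachinesChannel') once after connecting and then message_frame('MachinesChannel', 'handle_messages', { foo: 'bar' }) for each message. Note that identifier and data are JSON strings nested inside the outer JSON document, not nested objects.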
Instead of:
def handle_messages
  ...
end
This works for me:
def receive(data)
  puts data
  ...
end

The strategy for building a talk-to-talk system using em-websocket in Rails?

This may be a good example of a server push system. There are many users in the system, and users can talk with each other. It can be accomplished like this: one user sends a message (through a websocket) to the server, then the server forwards the message to the other user. The key is to find the binding between the ws (websocket object) and the user. Example code is below:
EM.run {
  EM::WebSocket.run(:host => "0.0.0.0", :port => 8080, :debug => false) do |ws|
    ws.onopen { |handshake|
      # extract the user id from handshake and store the binding between user and ws
    }
    ws.onmessage { |msg|
      # extract the text and receiver id from msg
      # extract the ws_receiver from the binding
      ws_receiver.send(text)
    }
  end
}
I want to figure out the following issues:
Can the ws object be serialized so it can be stored on disk or in a database? Otherwise I can only keep the binding in memory.
What are the differences between em-websocket and websocket-rails?
Which gem do you recommend for websockets?
You're approaching a use case that websockets are pretty good for, so you're on the right track.
You could serialize the ws object with Marshal, but think of websocket objects as being a bit like http request objects in that they are abstractions for a type of communication. You are probably best off marshaling/storing the data.
em-websocket is a lower(ish) level websocket library built more or less directly on EventMachine. websocket-rails is a higher level abstraction on websockets, with a lot of nice tools built in and pretty OK docs. It is built on top of faye-websocket-rails, which is itself built on EventMachine. Note: Action Cable, the new websocket library for Rails 5, is built on faye.
I've used websocket-rails in the past and rather like it. It will take care of a lot for you. However, if you can use Rails 5 and Action Cable, do that, it's the future.
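Since the socket itself is a live connection rather than data, the binding between user and ws usually lives in memory inside the server process, and only the messages go to persistent storage. A minimal sketch of such a binding, assuming a single server process; ConnectionRegistry is a hypothetical class, and a socket is any object that responds to send(text), as em-websocket connections do:

```ruby
# Hypothetical in-memory binding between user ids and open sockets.
class ConnectionRegistry
  def initialize
    @sockets = {}
  end

  # Call from ws.onopen once the user id is known.
  def register(user_id, socket)
    @sockets[user_id] = socket
  end

  # Call from ws.onclose so stale sockets are not kept around.
  def unregister(user_id)
    @sockets.delete(user_id)
  end

  # Returns true when the receiver is connected and the text was handed to
  # their socket, false when the message would have to be stored for later.
  def deliver(receiver_id, text)
    socket = @sockets[receiver_id]
    return false unless socket

    socket.send(text)
    true
  end
end
```

In the onmessage handler you would call deliver(receiver_id, text) and fall back to saving the message in the database when it returns false.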
The following is in addition to Chase Gilliam's succinct answer which included references to em-websocket, websocket-rails (which hadn't been maintained in a long while), faye-websocket-rails and ActionCable.
I would recommend the Plezi framework. It works both as an independent application framework as well as a Rails Websocket enhancement.
I would consider the following points as well:
Do you need the message to persist between connections (i.e. if the other user is offline, should the message wait in a "message box"? For how long should the message wait?)
Do you wish to preserve message history?
These points would help you decide whether to use persistent storage (i.e. a database) for the messages or not.
For example, to use Plezi with Rails, create an init_plezi.rb in your application's config/initializers folder and use the following code:
class ChatDemo
  # use JSON events instead of raw websockets
  @auto_dispatch = true
  protected # protected functions are hidden from regular Http requests
  def auth msg
    @user = User.auth_token(msg['token'])
    return close unless @user
    # creates a websocket "mailbox" that will remain open for 9 hours.
    register_as @user.id, lifetime: 60*60*9, max_connections: 5
  end
  def chat msg, received = false
    unless @user # require authentication first
      close
      return false
    end
    if received
      # this is only true when we sent the message
      # using the `broadcast` or `notify` methods
      write msg # writes to the client websocket
    end
    msg['from'] = @user.id
    msg['time'] = Plezi.time # an existing time object
    unless msg['to'] && registered?(msg['to'])
      # send an error message event
      return {event: :err, data: 'No recipient or recipient invalid'}.to_json
    end
    # everything was good, let's send the message and inform
    # this will invoke the `chat` event on the other websocket
    # notice the `true` is setting the `received` flag.
    notify msg['to'], :chat, msg, true
    # returning a String will send it to the client
    # when using the auto-dispatch feature
    {event: 'message_sent', msg: msg}.to_json
  end
end
# remember our route for websocket connections.
route '/ws_chat', ChatDemo
# a route to the Javascript client (optional)
route '/ws/client.js', :client
Plezi sets up its own server (Iodine, a Ruby server), so remember to remove from your application any references to puma, thin or any other custom server.
On the client side you might want to use the Javascript helper provided by Plezi (it's optional)... add:
<script src='/ws/client.js'></script>
<script>
TOKEN = "<%= @user.token %>";
c = new PleziClient(PleziClient.origin + "/ws_chat") // the client helper
c.log_events = true // debug
c.chat = function(event) {
  // do what you need to print a received message to the screen
  // `event` is the JSON data. i.e.: event.event == 'chat'
}
c.error = function(event) {
  // do what you need to print a received message to the screen
  alert(event.data);
}
c.message_sent = function(event) {
  // invoked after the message was sent
}
// authenticate once connection is established
c.onopen = function(event) {
  c.emit({event: 'auth', token: TOKEN});
}
// // to send a chat message:
// c.emit({event: 'chat', to: 8, data: "my chat message"})
</script>
I didn't test the actual message code because it's just a skeleton and also it requires a Rails app with a User model and a token that I didn't want to edit just to answer a question (no offense).

Optimal way to structure polling external service (RoR)

I have a Rails application that has a Document with the flag available. The document is uploaded to an external server where it is not immediately available (it takes time to propagate). What I'd like to do is poll the availability and update the model when available.
I'm looking for the most performant solution for this process (service does not offer callbacks):
1. Document is uploaded to app
2. app uploads to external server
3. app polls url (http://external.server.com/document.pdf) until available
4. app updates model Document.available = true
I'm stuck on 3. I'm already using sidekiq in my project. Is that an option, or should I use a completely different approach (cron job)?
Documents will be uploaded all the time and so it seems relevant to first poll the database/redis to check for Documents which are not available.
See this answer: Making HTTP HEAD request with timeout in Ruby
Basically you set up a HEAD request for the known url and then asynchronously loop until you get a 200 back (with a 5 second delay between iterations, or whatever).
Do this from your controller after the document is uploaded:
Document.delay.poll_for_finished(@document.id)
And then in your document model:
def self.poll_for_finished(document_id)
  document = Document.find(document_id)
  # make sure the document exists and should be polled for
  return unless document.continue_polling?
  if document.remote_document_exists?
    document.available = true
  else
    document.poll_attempts += 1 # assumes you care how many times you've checked, could be ignored.
    Document.delay_for(5.seconds).poll_for_finished(document.id)
  end
  document.save
end

def continue_polling?
  # this can be more or less sophisticated
  # keep polling only while the document is unavailable and attempts remain
  !available && poll_attempts < 5
end

def remote_document_exists?
  # Net::HTTP.start takes a host name, not a URL. The timeouts are passed as
  # options because the connection is opened before the block runs.
  Net::HTTP.start('external.server.com', 80, open_timeout: 2, read_timeout: 2) do |http|
    return "200" == http.head(path).code
  end
end
This is still a blocking operation. Opening the Net::HTTP connection will block if the server you're trying to contact is slow or unresponsive. If you're worried about it use Typhoeus. See this answer for details: What is the preferred way of performing non blocking I/O in Ruby?
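If propagation times vary a lot, a fixed 5 second delay either hammers the external server or gives up too early. One option is to grow the delay with each attempt. A small sketch, where poll_delay and its default values are hypothetical and not taken from the code above:

```ruby
# Hypothetical helper: delay in seconds before poll attempt number `attempt`
# (0-based). The delay doubles from `base` and never exceeds `cap`.
def poll_delay(attempt, base: 5, cap: 300)
  [base * (2**attempt), cap].min
end

# In the model above the re-enqueue line could then become something like:
#   Document.delay_for(poll_delay(document.poll_attempts).seconds)
#           .poll_for_finished(document.id)
```

With these defaults the waits are 5, 10, 20, 40 seconds and so on, up to 5 minutes, so the attempt limit in continue_polling? would need to be raised accordingly.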

Get online users XMPP4r + Rails

I'm trying to get a user's online friends from an XMPP server (ejabberd). I'm using Ruby on Rails 3.2.
The idea is to collect all online users in an array to use on a view page.
I found asynchronous code (below), but it uses a Thread and is difficult to make work in a controller method.
jid = Jabber::JID.new('user@localhost')
cl = Jabber::Client.new(jid)
cl.connect
cl.auth('123456')
@online_users = [] # online users queue
roster = Jabber::Roster::Helper.new(cl)
mainthread = Thread.current
roster.add_presence_callback { |item, oldpres, pres|
  if item.online?
    @online_users.push item
  else
    @online_users.delete_if { |x| x.jid == item.jid }
  end
  puts @online_users.inspect
  puts "#{item.jid} - online: #{item.online?}"
}
cl.send(Jabber::Presence.new.set_show(:dnd))
t = Thread.new { sleep XMPP_REQUEST_TIMEOUT; mainthread.wakeup }
Thread.stop
cl.close
So I need some synchronous code, or some way to execute this kind of code in controller method.
Thanks.
I found another solution that helped me:
I installed mod_rest on the ejabberd server. This module lets you run ejabberdctl commands through HTTP requests.
One of those commands is "ejabberdctl connected_users", which returns the users who are online.
So in your app's model, using the rest-client gem, you can do something like this:
def online_users
  response = RestClient.post('http://localhost:5280/rest', "connected_users")
  response
end
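The response body still has to be turned into the array the view needs. A rough sketch, assuming the command prints one full JID per line (for example alice@localhost/phone); that output format and the parse_connected_users name are assumptions to check against your ejabberd version:

```ruby
# Hypothetical parser for the plain-text body returned by the REST call.
# Assumes one full JID per line, e.g. "alice@localhost/phone".
def parse_connected_users(body)
  body.to_s.each_line
      .map(&:strip)
      .reject(&:empty?)
      .map { |jid| jid.split("/", 2).first } # drop the resource part
      .uniq
end
```

A user connected from several devices appears once in the result, because the resource part is dropped before duplicates are removed.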
You will be much happier in the long run if you use a library like Strophe.js to do this in the browser, talking to an XMPP server that has BOSH enabled. Snapshots of presence are never anywhere as interesting as you expect them to be, and you're going to have really bad authentication/authorization problems on the path down which you're heading.
