I'm currently building a Ruby on Rails app that lets users sign in via Gmail and maintains a constant IDLE connection to their inbox. Emails need to arrive in the app as soon as they hit the Gmail inbox.
Currently I have the following in terms of implementation, and some issues that I really need some help figuring out.
At the moment, when the Rails app boots up, it creates a thread per user which authenticates and runs in a loop to keep the IDLE connection alive.
Every 10-15 minutes, the thread will "bounce IDLE", so that a little data is transferred to make sure the IDLE connection stays alive.
The major issue, I think, is scalability and the number of connections the app holds to Postgres. It seems that each thread requires its own Postgres connection, and on Heroku the connection count is hard-capped (20 on the basic plan, 500 on the plans after that).
I really need help with the following:
What's the best way to keep all these IDLE connections alive while reducing the number of threads and database connections needed?
Note: a token refresh may be needed if the user's Gmail refresh token runs out, so the threads do require database access
Are there any other suggestions for how this may be implemented?
EDIT:
I have implemented something similar to the OP in this question: Ruby IMAP IDLE concurrency - how to tackle?
There is no need to spawn a new thread for each IMAP session. These can be done in a single thread.
Maintain an Array (or Hash) of all users and their IMAP sessions. Spawn a thread, in that thread, send IDLE keep-alive to each of the connections one after the other. Run the loop periodically. This will definitely give you far more concurrency than your current approach.
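A minimal sketch of that single watchdog thread, assuming `sessions` maps user ids to live, already-authenticated IMAP connections. The `idle_restart` call is a hypothetical stand-in for whatever DONE/re-IDLE sequence your IMAP library provides; only `idle_done` is a real Net::IMAP method.

```ruby
# Hypothetical sketch: one thread bounces IDLE on every session in turn.
# `idle_restart` is a stand-in for your library's re-IDLE call, not a real
# Net::IMAP method.
def keepalive_pass(sessions)
  sessions.each do |user_id, imap|
    begin
      imap.idle_done     # leave IDLE so a little data crosses the wire
      imap.idle_restart  # hypothetical: re-enter IDLE
    rescue => e
      warn "keep-alive failed for user #{user_id}: #{e.message}"
    end
  end
end

def start_keepalive_thread(sessions, interval = 10 * 60)
  Thread.new do
    loop do
      keepalive_pass(sessions)
      sleep interval
    end
  end
end
```

One thread now services every connection, instead of one thread per user.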
A longer-term approach is to use EventMachine, which allows many IMAP connections in the same thread. If you are processing web requests in the same process, you should run EventMachine in a separate thread. This approach can give you phenomenal concurrency. See https://github.com/ConradIrwin/em-imap for an EventMachine-compatible IMAP library.
Start an EventMachine in Rails
Since you are on Heroku, you are probably using Thin, which already starts an EventMachine for you. However, should you ever move to another host and use some other web server (e.g. Phusion Passenger), you can start an EventMachine with a Rails initializer:
module IMAPManager
  def self.start
    if defined?(PhusionPassenger)
      PhusionPassenger.on_event(:starting_worker_process) do |forked|
        # for Passenger, we need to avoid orphaned threads
        if forked && EM.reactor_running?
          EM.stop
        end
        Thread.new { EM.run }
        die_gracefully_on_signal
      end
    else
      # facilitates debugging
      Thread.abort_on_exception = true
      # just spawn a thread and start it up;
      # Thin is built on EventMachine and doesn't need this thread
      Thread.new { EM.run } unless defined?(Thin)
    end
  end

  def self.die_gracefully_on_signal
    Signal.trap("INT")  { EM.stop }
    Signal.trap("TERM") { EM.stop }
  end
end

IMAPManager.start
(adapted from a blog post by Joshua Siler.)
Share 1 connection
What you have is a good start, but having O(n) threads with O(n) connections to the database is probably hard to scale. However, since most of these database connections are not doing anything most of the time, one might consider sharing one database connection.
As Deepak Kumar mentioned, you can use the EM IMAP adapter to maintain the IMAP IDLE connections. In fact, since you are using EM within Rails, you may be able to reuse Rails' database connection pool simply by making your changes through the Rails models. More information on configuring the connection pool can be found here.
Related
Update:
If you have the same problem, read "Indicate to an ajax process that the delayed job has completed" first. Thanks Gene.
I have a problem with concurrency. I have a controller scraping a few web sites, but each call to my controller takes about 4-5 seconds to respond.
So if I make 2 (or more) calls in a row, the second one has to wait for the first to finish before starting.
So how can I fix this problem in my controller? Maybe with something like EventMachine?
Update & Example:
application_controller.rb
def func1
  i = 0
  while i <= 2
    puts "func1 at: #{Time.now}"
    sleep(2)
    i = i + 1
  end
end

def func2
  j = 0
  while j <= 2
    puts "func2 at: #{Time.now}"
    sleep(1)
    j = j + 1
  end
end
whatever_controller.rb
puts ">>>>>>>> Started At #{Time.now}"
func1()
func2()
puts "End at #{Time.now}"
So now I need to request http://myawesome.app/whatever several times at the same time from the same user/browser/etc.
I tried Heroku (and locally) with Unicorn, but without success. This is my setup:
unicorn.rb http://pastebin.com/QL0wdGx0
Procfile http://pastebin.com/RrTtNWJZ
Heroku setup https://www.dropbox.com/s/wxwr5v4p61524tv/Screenshot%202014-02-20%2010.33.16.png
Requirements:
I need a RESTful solution. This is an API, so it needs to respond with JSON.
More info:
Right now I have 2 cloud servers running:
Heroku with Unicorn
Engine Yard Cloud with Nginx + Passenger
You're probably using WEBrick in development mode. WEBrick only handles one request at a time.
You have several solutions; many Ruby web servers can handle concurrent requests.
Here are a few of them.
Thin
Thin was originally based on Mongrel and uses EventMachine to handle multiple concurrent connections.
Unicorn
Unicorn uses a master process that dispatches requests to its web workers: 4 workers means up to 4 concurrent requests.
Puma
Puma is a relatively new Ruby server; its headline feature is that it handles concurrent requests in threads, so make sure your code is thread-safe!
Passenger
Passenger is a Ruby server that runs inside Nginx or Apache; it's great for both production and development.
Others
These are a few alternatives; many others exist, but I think these are the most used today.
To use any of these servers, check their instructions, generally available in their GitHub READMEs.
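As an illustration of the Unicorn approach above, a minimal `unicorn.rb` sketch; the numbers are example values to tune for your machine, not recommendations:

```ruby
# config/unicorn.rb -- 4 workers means up to 4 requests in flight at once
worker_processes 4
timeout 30         # kill a worker stuck for longer than 30s
preload_app true   # load the app once in the master, then fork workers
```

With this config, a slow scrape in one worker no longer blocks requests handled by the other three.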
For any controller action with a long response time, the delayed_job gem is a fine way to go. While it is often used for bulk mailing, it works just as well for any long-running task.
Your controller starts the delayed job and responds immediately with a page that has a placeholder - usually a graphic with a progress indicator - and Ajax or a timed reload that updates the page with the full information when it's available. Some information on how to approach this is in this SO article.
Not mentioned in the article is that you can use redis or some other memory cache to store the results rather than the main database.
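The enqueue-then-poll flow can be sketched without any Rails machinery. Here a plain thread stands in for the delayed_job worker and a Hash stands in for Redis/memcached; all names and the fake "scrape" are illustrative:

```ruby
require 'securerandom'

RESULTS = {}                # stand-in for redis/memcached
RESULTS_LOCK = Mutex.new

def enqueue_scrape(url)
  token = SecureRandom.hex(8)
  Thread.new do             # delayed_job would run this in a worker process
    result = "scraped #{url}"   # placeholder for the real 4-5 second scrape
    RESULTS_LOCK.synchronize { RESULTS[token] = result }
  end
  token                     # the controller renders this with the placeholder page
end

def poll(token)
  RESULTS_LOCK.synchronize { RESULTS[token] }  # nil until the job finishes
end
```

The Ajax endpoint keeps calling `poll` with the token until it gets a non-nil result, then swaps it into the page.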
The answers above are part of the solution: you need a server environment that can properly dispatch concurrent requests to separate workers; Unicorn or Passenger can both do this by creating workers in separate processes or threads. This allows many workers to sit around waiting without blocking other incoming requests.
If you are building a typical bot whose main job is to get content from other sources, these solutions may be fine. But if what you need is a simple controller that can accept hundreds of concurrent requests, all of which send independent requests to other servers, you will need to manage threads or processes yourself. Your goal is to have many workers waiting to do a simple job, and one or more masters whose job it is to send requests and then be there to receive the responses. Ruby's Thread class is simple and works well for cases like this with Ruby 2.x or 1.9.3.
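That fan-out/fan-in can be sketched with plain `Thread` objects; `slow_fetch` here is an illustrative stand-in for the real outbound request:

```ruby
# Fan out one thread per outbound request, then fan back in with Thread#value,
# which joins the thread and returns its block's result.
def slow_fetch(url)
  sleep 0.1                 # simulates a slow remote API call
  "body of #{url}"
end

def fetch_all(urls)
  urls.map { |url| Thread.new { slow_fetch(url) } }
      .map(&:value)
end
```

Because all the threads sleep on I/O concurrently, the total wait is roughly one request's latency rather than the sum.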
You would need to provide more detail about what you need to do to get help with a more specific solution.
Try something like Unicorn, as it handles concurrency via workers. If there's a lot of work to be done per request, something else to consider is spinning up a delayed_job per request.
The one issue with delayed_job is that the response won't be synchronous, meaning it won't return directly to the user's browser.
However, you could have the delayed job save its responses to a table in the DB, and then query that table for all requests and their related responses.
What Ruby version are you using?
Ruby & Webserver
Ruby
If it's a simple application, I would recommend the following: try Rubinius (rbx) or JRuby, as they are better at concurrency. The drawback is that they're not mainline Ruby, so some extensions won't work, but for a simple app you should be fine.
Webserver
Use Puma, or Unicorn if you have the patience to set it up.
If your app is hitting an API service
You indicate that the global interpreter lock is killing you when you scrape other sites (presumably ones that allow scraping). If that's the case, something like Sidekiq or delayed_job should be used, but with caution: these jobs should be idempotent, i.e. safe to run multiple times. And if you start hitting a website repeatedly, you will run into its rate limit pretty quickly; e.g. Twitter limits you to 150 requests per hour. So use background jobs with caution.
If you're the one serving the data
However, reading your question, it sounds like your controller is the API and the lock is caused by users hitting it.
If this is the case, you should use dalli + memcached to serve your data. That way you won't be I/O-bound by SQL lookups, since memcached is memory-based: memory speed > I/O speed.
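The cache-aside pattern looks like this. A Hash stands in for `Dalli::Client` so the sketch is self-contained; with Dalli you would use its `fetch(key, ttl) { ... }` instead, and the lookup function is illustrative:

```ruby
CACHE = {}   # stand-in for Dalli::Client.new("localhost:11211")

def expensive_sql_lookup(key)
  "row-for-#{key}"          # illustrative: the real SQL query goes here
end

def cached_lookup(key)
  CACHE.fetch(key) do       # with Dalli: CACHE.fetch(key, ttl) { ... }
    # cache miss: hit SQL once, store the result, serve from memory after that
    CACHE[key] = expensive_sql_lookup(key)
  end
end
```

Every request after the first is answered from memory, so the controller is no longer bottlenecked on the database.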
I'm contemplating writing a web application with Rails. Each request made by the user will depend on an external API being called. This external API can randomly be very slow (2-3 seconds), and so obviously this would impact an individual request.
During this time when the code is waiting for the external API to return, will further user requests be blocked?
Just for further clarification as there seems to be some confusion, this is the model I'm anticipating:
Alice makes request to my web app. To fulfill this, a call to API server A is made. API server A is slow and takes 3 seconds to complete.
During this wait time when the Rails app is calling API server A, Bob makes a request which has to make a request to API server B.
Is the Ruby (1.9.3) interpreter (or something in the Rails 3.x framework) going to block Bob's request, requiring him to wait until Alice's request is done?
If you only use one single-threaded, non-evented server (or don't use evented I/O with an evented server), yes. Among other solutions, using Thin with EM-Synchrony will avoid this.
Elaborating, based on your update:
No, neither Ruby nor Rails is going to cause your app to block. You left out the part that will, though: the web server. You either need multiple processes, multiple threads, or an evented server coupled with doing your web service requests with an evented I/O library.
alexd described using multiple processes. I personally favor an evented server because I don't need to know or guess ahead of time how many concurrent requests I might have (or use something that spins up processes based on load). A single nginx process fronting a single Thin process can serve tons of parallel requests.
The answer to your question depends on the server your Rails application is running on. What are you using right now? Thin? Unicorn? Apache+Passenger?
I wholeheartedly recommend Unicorn for your situation -- it makes it very easy to run multiple server processes in parallel, and you can configure the number of parallel processes simply by changing a number in a configuration file. While one Unicorn worker is handling Alice's high-latency request, another Unicorn worker can be using your free CPU cycles to handle Bob's request.
Most likely, yes. There are ways around this, obviously, but none of them are easy.
The better question is, why do you need to hit the external API on every request? Why not implement a cache layer between your Rails app and the external API and use that for the majority of requests?
This way, with some custom logic for expiring the cache, you'll have a snappy Rails app and still be able to leverage the external API service.
I'm trying to build a (private, for now) web application that will utilize IMAP IDLE connections to show peoples emails as they arrive.
I'm having a hard time figuring out how to hack this together - and how it would fit together with my Heroku RoR server.
I've written a basic script for connecting to an IMAP server and idling, looks something like this (simplified):
imap = Net::IMAP.new server, port, usessl
imap.login username, password
imap.select "INBOX"

imap.add_response_handler do |resp|
  if resp.kind_of?(Net::IMAP::UntaggedResponse) && resp.name == "EXISTS"
    # New mail received. Ping back and process.
  end
end

imap.idle

loop do
  sleep 10 * 60
  imap.renew_idle
end
This will make one connection to the IMAP server and start idling. As you see, this is blocking with the loop.
I would like to have multiple IMAP connections idling at the same time for my users. Initially, I just wanted to put each of them in a thread, like so:
Thread.new do
  start_imap_idling(server, port, usessl, username, password)
end
I'm not that sharp on threads yet, but with this solution won't I still have to block my main thread to wait for the threads? So if I do something like:
User.find_each do |user|
  Thread.new do
    start_imap_idling(server, port, usessl, username, password)
  end
end

loop do
  # Wait
end
That would work, but not without the loop at the bottom to allow the threads to run?
My question is how best to mesh this with my Ruby on Rails application on Heroku. I can't block the main thread with that last loop, so how do I run this? Another server? An extra dyno, perhaps a worker? I've been reading a bit about EventMachine - could this solve my problem, and if so, how should I go about writing it?
Another thing is that I would like to be able to add new IMAP clients and remove current ones on the fly. What might that look like? Something with a queue, perhaps?
Any help and comments are very much appreciated!
I'm not familiar with the specifics of RoR, Event Machine, etc. -- but it seems like you'd want to set up a producer/consumer.
The producer is your thread that's listening for changes from the IMAP server. When it gets changes it writes them to a queue. It seems like you'd want to set up multiple producers, one for each IMAP connection.
Your consumer is a thread that blocks on read from the queue. When something enters the queue it unblocks and processes the event.
Your main thread would then be free to do whatever you want. It sounds like you'd want your main thread doing things like adding new IMAP clients (i.e., producers) and removing current ones on the fly.
As for where you'd run these things: You could run the consumers and producer in one executable, in separate executables on the same machine, or on different machines... all depending upon your circumstances.
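The producer/consumer layout described above maps directly onto Ruby's thread-safe `Queue`; the event shape and the stub producer here are illustrative:

```ruby
EVENTS = Queue.new   # thread-safe; producers push, the consumer blocks on pop

# Producer: in the real app, one per IMAP connection, pushing from the IDLE
# response handler. This stub pushes a single fake "new mail" event.
def start_producer(mailbox)
  Thread.new { EVENTS << { mailbox: mailbox, event: :exists } }
end

# Consumer: blocks on the queue and processes events as they arrive.
# Pushing nil shuts it down.
def start_consumer(handled)
  Thread.new do
    while (event = EVENTS.pop)   # Queue#pop blocks until something is pushed
      handled << event
    end
  end
end
```

The main thread stays free; it only starts and stops producers as users come and go.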
HTH
For example, if I assign Thread.current[:user] at the beginning of some requests, do I have to clean it up at the end of those requests? Does this differ between Rails versions or between servers such as Passenger, Mongrel, and JRuby + GlassFish?
Hongli Lai (http://groups.google.com/group/phusion-passenger/msg/8c3fc0ba589726bf) says that Mongrel spawns a new thread for every request, but all other app servers process subsequent requests in the same thread. Cleaning up Thread.current at the beginning of every request (or not using it at all) seems to be the best way to deal with this.
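The cleanup itself is just an ensure block; in Rails you would wire it in as an around filter. The method name is illustrative:

```ruby
# Set the per-request slot, run the request, and always clear it -- even when
# the action raises -- so no value leaks into the next request on this thread.
def with_current_user(user)
  Thread.current[:user] = user
  yield
ensure
  Thread.current[:user] = nil
end
```

Because the `ensure` runs on both the normal and the exception path, it is safe under any of the threading models above.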
Are there any issues to consider when using DRb for implementing an in-memory message queue and for synchronizing actions between processes? I heard that it could be unreliable, but haven't found anything on the web that confirms this assertion.
In case it's relevant, these would be processes running with a rails app environment that will update models in the database.
DRb is pretty established and widely used. I don't know of anything that would make it unreliable, but I don't use it as a message queue.
I'd say you'll have more luck using a message queue as a message queue instead of rolling your own with DRb. There are a bunch of solutions depending on your needs; memcacheq is pretty easy to interact with, is in-memory, and is pretty solid.
I personally use DRb in two separate processes on my web server: one to perform minutes-long calculations, allowing the website to poll and check in on the progress, and another as a shared captcha server with its own DB connection for various applications on my server. In neither case have I ever had the DRb server fail (except where it was a programming mistake on my part).
Even when the DRb server does fail, you can restart it and the still-running client will reconnect cleanly the next time it needs to communicate.
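For reference, a minimal DRb round trip looks like this. Both halves are shown in one script for brevity, but the client half would normally live in another process; the `JobBoard` class is illustrative:

```ruby
require 'drb/drb'

# The object you pass to start_service becomes remotely callable.
class JobBoard
  def initialize
    @jobs = Queue.new
  end

  def push(job)
    @jobs << job
  end

  def pop
    @jobs.pop
  end
end

# Server side (e.g. a long-running process next to your Rails app).
# Port 0 asks the OS for any free port; DRb.uri reports what was chosen.
DRb.start_service("druby://localhost:0", JobBoard.new)

# Client side: connect with nothing but the URI.
board = DRbObject.new_with_uri(DRb.uri)
board.push("resize-image-42")
```

In the calculation-server setup described above, the web app plays the client role and the calculation process plays the server role.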