Ryan Bates mentions the LISTEN/NOTIFY functionality of Postgres when discussing push notifications in this episode, but I haven't been able to find any hint on how to implement LISTEN/NOTIFY in my Rails app.
Here is the documentation for a wait_for_notify function inside the pg adapter, but I can't figure out what exactly it does or is designed for.
Do we need to tap directly into the connection variable of the pg adapter?
You're looking in the right place with the wait_for_notify method, but since ActiveRecord apparently doesn't provide an API for using it, you'll need to get at the underlying PG::Connection object (or one of them, if you're running a multithreaded setup) that ActiveRecord is using to talk to Postgres.
Once you've got the connection, simply execute whatever LISTEN statements you need, then pass a block (and an optional timeout period) to wait_for_notify. Note that this will block the current thread, and monopolize the Postgres connection, until the timeout is reached or a NOTIFY occurs (so you wouldn't want to do this inside a web request, for example). When another process issues a NOTIFY on one of the channels you're listening to, the block will be called with three arguments: the channel that was notified, the pid of the Postgres backend that triggered the NOTIFY, and the payload that accompanied the NOTIFY (if any).
I haven't used ActiveRecord in quite a while, so there may be a cleaner way to do this, but this seems to work alright in 4.0.0.beta1:
# Be sure to check out a connection, so we stay thread-safe.
ActiveRecord::Base.connection_pool.with_connection do |connection|
  # connection is the ActiveRecord::ConnectionAdapters::PostgreSQLAdapter object
  conn = connection.instance_variable_get(:@connection)
  # conn is the underlying PG::Connection object, and exposes #wait_for_notify

  begin
    conn.async_exec "LISTEN channel1"
    conn.async_exec "LISTEN channel2"

    # This will block until a NOTIFY is issued on one of these two channels.
    conn.wait_for_notify do |channel, pid, payload|
      puts "Received a NOTIFY on channel #{channel}"
      puts "from PG backend #{pid}"
      puts "saying #{payload}"
    end

    # Note that you'll need to call wait_for_notify again if you want to pick
    # up further notifications. This time, bail out if we don't get a
    # notification within half a second.
    conn.wait_for_notify(0.5) do |channel, pid, payload|
      puts "Received a second NOTIFY on channel #{channel}"
      puts "from PG backend #{pid}"
      puts "saying #{payload}"
    end
  ensure
    # Don't want the connection to still be listening once we return
    # it to the pool - could result in weird behavior for the next
    # thread to check it out.
    conn.async_exec "UNLISTEN *"
  end
end
For an example of a more general usage, see Sequel's implementation.
Edit to add: Here's another description of what's going on. This may not be the exact implementation behind the scenes, but it seems to describe the behavior well enough.
Postgres keeps a list of notifications for each connection. When you use a connection to execute LISTEN channel_name, you're telling Postgres that any notifications on that channel should be pushed to this connection's list (multiple connections can listen to the same channel, so a single notification can wind up being pushed to many lists). A connection can LISTEN to many channels at the same time, and notifications to any of them will all be pushed to the same list.
What wait_for_notify does is pop the oldest notification off the connection's list and pass its information to the block; if the list is empty, it sleeps until a notification becomes available and does the same for that (or until the timeout is reached, in which case it just returns nil). Since wait_for_notify only handles a single notification, you're going to have to call it repeatedly if you want to handle multiple notifications.
When you UNLISTEN channel_name or UNLISTEN *, Postgres will stop pushing those notifications to your connection's list, but the ones that have already been pushed will stay there, and wait_for_notify will still return them when it is next called. This might cause an issue where notifications that accumulate after wait_for_notify but before UNLISTEN stick around and are still present when another thread checks out that connection. In that case, after UNLISTEN you might want to call wait_for_notify with short timeouts until it returns nil. Unless you're making heavy use of LISTEN and NOTIFY for many different purposes, though, it's probably not worth worrying about.
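For example, that drain could look something like this (a minimal sketch, not part of the original answer; it reuses the conn variable from the code above):

# Hedged sketch: after UNLISTEN, drain anything already pushed to this
# connection's notification list so the next thread to check it out starts clean.
conn.async_exec "UNLISTEN *"
loop do
  # A short timeout returns nil as soon as the list is empty.
  break if conn.wait_for_notify(0.1).nil?
end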
I added a better link to Sequel's implementation above, I'd recommend looking at it. It's pretty straightforward.
The accepted answer looks good to me. Some promising resources I found while exploring postgres LISTEN/NOTIFY:
https://gist.github.com/chsh/9c9f5702919c83023f83
https://github.com/schneems/hey_you
The source in hey_you is easy to read and looks similar to the other examples
Related
I'm working on a Rails 7 app with some pretty tight response time SLAs. I am well within SLA during normal runtime. Where I fall painfully short is the first request. I've added an initializer that loads up ActiveRecord and makes sure all of my DB models are loaded. It hydrates various memory caches, etc. This took me pretty far: my first response time was reduced by about 60%. However, I've been trying to figure out a couple of things that are still slowing down first response time.
First API request does a check to see if I need to do a Rails migration. I've not figured out how to move this check to init.
First API request appears to be using a fresh DB pool, not the one that was used in the init phase. I've tried fully hydrating the pool to spare the API from creating connections when Rails kicks on, but I've not figured it out.
In an initializer I can do something like this:
connections = []
ActiveRecord::Base.connection.pool.size.times do
  connections << ActiveRecord::Base.connection.pool.checkout
end
connections.each { ActiveRecord::Base.connection.pool.checkin(_1) }
According to my PG logs, this opens up the connections and Rails does all of its typing queries, sets session properties, etc. However, when I go to fire off my first API call, my pool is empty.
In the end, the general issue was that I needed to hydrate the pool with the correct connections. The on_worker_boot block is used because this is running behind Puma.
on_worker_boot do
  ActiveRecord::Base.connected_to(role: :reading) do
    # spin up db connections
    connections = []
    (ActiveRecord::Base.connection.pool.size - 1).times do
      connections << ActiveRecord::Base.connection.pool.checkout
    end
    connections.each { |x| ActiveRecord::Base.connection.pool.checkin(x) }
  end
end
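For the migration-check part of the question, one option is to run the pending-migration check eagerly at boot so the first request doesn't have to pay for it. This is only a sketch, assuming ActiveRecord::Migration.check_pending! is available in your Rails version; whether the per-request check then disappears depends on how the CheckPending middleware is configured:

# config/puma.rb - hedged sketch, not part of the answer above.
on_worker_boot do
  # Raises ActiveRecord::PendingMigrationError if migrations are pending;
  # otherwise it performs the schema_migrations query here instead of on
  # the first API request. In development you can also tune
  # config.active_record.migration_error to control the per-request check.
  ActiveRecord::Migration.check_pending!
end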
I'm working on a Rails-based web backend, and I've run into a bit of an issue. I'm building a crypto trading application, which relies on knowing the exact current price of many cryptos/stocks. To do this I seem to need a websocket to update certain data, however I can't figure out how to store this data. I need to be able to write to it on every websocket update, as well as read from it when sending out data to the front end. Both of these actions seem too fast to rely on my database, so I'm wondering if there is a better option. My idea was to use a class with a class variable that is set on server startup (via a class method), then read from/write to it when needed. The class looks something like this:
class CryptoSocket
  def self.start
    @@cryptos = {
      BTC: 0,
      ETH: 0,
      DOGE: 0
    }
  end

  def self.value(symbol)
    @@cryptos[symbol]
  end
end
Inside the start method a websocket gets opened, and on each message it writes the updated value of the coin to @@cryptos. I call CryptoSocket.start when the server boots up.
To get the value for a symbol I can just call CryptoSocket.value(symbol) anywhere in my app. This seemed to be working, however I've noticed it sometimes fails with NameError: uninitialized class variable @@cryptos in CryptoSocket.
It seems like the issue is running reload! in the rails console, or entering a binding.pry then exiting. My guess is some garbage collection is happening, but overall it's something I'd like to avoid.
Does anyone have a suggestion for a better way to go about setting this class up? Does rails have a better way to persist an object in memory? It's fine to lose it when the server shuts down, but I would like to keep access to it when the server stays up
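One pattern worth considering is keeping the prices in a plain object created in an initializer (so code reloading doesn't redefine the class that holds the data) and guarding it with a mutex for concurrent reads and writes. This is only a rough sketch; the class name, constant, and file location are made up, not from the question:

# lib/crypto_price_store.rb - hypothetical sketch, not the asker's code.
class CryptoPriceStore
  def initialize
    @prices = Hash.new(0)
    @lock = Mutex.new
  end

  # Called from the websocket's on-message handler.
  def write(symbol, price)
    @lock.synchronize { @prices[symbol.to_sym] = price }
  end

  # Called from controllers when sending data to the front end.
  def read(symbol)
    @lock.synchronize { @prices[symbol.to_sym] }
  end
end

# config/initializers/crypto_price_store.rb (hypothetical usage):
# PRICE_STORE = CryptoPriceStore.new
# PRICE_STORE.write(:BTC, 42_000.0)
# PRICE_STORE.read(:BTC)  # => 42000.0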
I have a Sneakers worker (given below) as the backend of a chatbot.
class RabbitMQWorker
  include Sneakers::Worker
  from_queue "message"

  def work(options)
    parsed_options = JSON.parse(options)

    # Initializing some object variables
    @question_id = parsed_options[:question_id]
    @answer = parsed_options[:answer]
    @session_id = parsed_options[:session_id]

    ActiveRecord::Base.connection_pool.with_connection do
      # send next question to the session_id based on answer
    end

    ack!
  end
end
What's happening
The problem I am facing is that when I run Sneakers with more than one thread and multiple users are chatting at the same time, the AMQP event that arrives slightly later overrides @session_id. As a result, the second user gets two questions and the first one gets none. This happens because, by the time the first event is being processed, the second event has come in and overridden @session_id. So when it's time to send the next question to the first user via @session_id, the question gets sent to the second user instead.
My Questions
Do the work method and any instance variables I create in it act like global mutable data for Sneakers' threads?
If yes, then I am guessing I need to make them thread-local variables. If I do that, do I need to make these changes deep down in my Rails logic as well, since this worker works with Rails?
Curiosity question
How does Puma manage these things? It is a multi-threaded app server, and we use instance variables in controllers, yet it manages to serve multiple requests simultaneously. Does that mean Puma handles this multi-contexting implicitly and Sneakers doesn't?
What I have done till now
I read the documentation of Sneakers and couldn't find anything regarding this.
I performed a load test to verify the problem, and it is exactly as I stated above.
I tried to get clear on how multi-threading actually works, but everywhere there is only general stuff. The curiosity question I asked above would help a lot in clearing up the concepts; I have been searching for an explanation for days but couldn't find any.
After 2 days of searching for an issue where messages seemed to get mixed up I was finally able to solve this by removing all instance variables from my workers.
This thread gave me the clue to do so: https://github.com/jondot/sneakers/issues/244
maybe we should simply disallow instance variables in workers since
changing the behavior to instantiate multiple worker instances might
break existing code somehow
and:
I think that an instance per thread is the way to go.
So when you remove your instance variables you should be fine!
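In other words, everything that is per-message should live in local variables inside work. A sketch of the adapted worker (based on the code in the question, not the poster's exact fix):

class RabbitMQWorker
  include Sneakers::Worker
  from_queue "message"

  def work(options)
    parsed_options = JSON.parse(options)

    # Locals are per-invocation, so threads handling concurrent messages
    # can no longer overwrite each other's state.
    question_id = parsed_options[:question_id]
    answer = parsed_options[:answer]
    session_id = parsed_options[:session_id]

    ActiveRecord::Base.connection_pool.with_connection do
      # send the next question to session_id based on answer
    end

    ack!
  end
end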
What I'm doing
I'm using the twitter gem (a Ruby wrapper for the Twitter API) in my app, which is run on Heroku. I use Heroku's Scheduler to periodically run caching tasks that use the twitter gem to, for example, update the list of retweets for a particular user. I'm also using delayed_job so scheduler calls a rake task, which calls a method that is 'delayed' (see scheduler.rake below). The method loops through "authentications" (for users who have authenticated twitter through my app) to update each authorized user's retweet cache in the app.
My question
What am I doing wrong? For example, since I'm using Heroku's Scheduler, is delayed_job redundant? Also, you can see I'm not catching (rescuing) any errors. So, if Twitter is unreachable, or if a user's auth token has expired, everything chokes. This is obviously dumb and terrible because if there's an error, the entire thing chokes and ends up creating a failed delayed_job, which causes ripple effects for my app. I can see this is bad, but I'm not sure what the best solution is. How/where should I be catching errors?
I'll put all my code (from the scheduler down to the method being called) for one of my cache methods. I'm really just hoping for a bulleted list (and maybe some code or pseudo-code) berating me for poor coding practice and telling me where I can improve things.
I have seen this SO question, which helps me a little with the begin/rescue block, but I could use more guidance on catching errors, and on the higher-level "is this a good way to do this?" plane.
Code
Heroku Scheduler job:
rake update_retweet_cache
scheduler.rake (in my app)
task :update_retweet_cache => :environment do
  Tweet.delay.cache_retweets_for_all_auths
end
Tweet.rb, cache_retweets_for_all_auths method:
def self.cache_retweets_for_all_auths
  @authentications = Authentication.find_all_by_provider("twitter")
  @authentications.each do |authentication|
    authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
      # Actually build the cache - this is good - removing to keep this short
    end
  end
end
User.rb, twitter method:
def twitter
  authentication = Authentication.find_by_user_id_and_provider(self.id, "twitter")
  if authentication
    @twitter ||= Twitter::Client.new(:oauth_token => authentication.oauth_token, :oauth_token_secret => authentication.oauth_secret)
  end
end
Note: As I was posting this, I noticed that I'm finding all "twitter" authentications in the "cache_retweets_for_all_auths" method, then calling the "User.twitter" method, which specifically limits to "twitter" authentications. This is obviously redundant, and I'll fix it.
First, what is the exact error you are getting, and what do you want to happen when there is an error?
Edit:
If you just want to catch the errors and log them then the following should work.
def self.cache_retweets_for_all_auths
  @authentications = Authentication.find_all_by_provider("twitter")
  @authentications.each do |authentication|
    begin
      authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
        # Actually build the cache - this is good - removing to keep this short
      end
    rescue => e
      # Either create an object where the error is logged, or output it to whatever log you wish.
    end
  end
end
This way, when it fails it will keep moving on to the next user while still making a note of the error. Most of the time with Twitter it's just better to do something like this than to try to deal with each error on its own. I have seen so many weird things and random errors out of the Twitter API that trying to track down every error almost always turns into a wild goose chase, though it is still good to keep track just in case.
Next, when you should use what.
You should use a scheduler when you need something to happen based on time only, and a delayed job when it's based on a user action but the 'action' you are going to delay would take too long for a normal response. Sometimes you can just put the thing plainly in the controller as well.
So in other words
The scheduler will be fine as long as the time between updates, X, is greater than the time it takes for the update to happen, Y.
If X < Y then you might want to look at calling the logic from the controller when each individual entry is accessed, instead of trying to do them all at once. The idea is that you would only update the cache after a certain amount of time has passed. You could store the last update time either on the model itself, in a field like twitter_update_time, or in a Redis or memcache instance at a unique key for the user/auth.
But if the individual update itself is still too long, that's when you should do the above, but instead of doing the actual update, call a delayed job.
You could even set it up that it only updates or calls the delayed job after a certain number of views, to further limit stuff.
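A sketch of that time-gated check (the twitter_update_time column comes from the suggestion above; the 15-minute interval and the cache_retweets_for_auth helper are made-up names for illustration):

# Hedged sketch of "only update if enough time has passed".
# Assumes a twitter_update_time datetime column on Authentication.
UPDATE_INTERVAL = 15.minutes

def refresh_retweet_cache_if_stale(authentication)
  last = authentication.twitter_update_time
  return if last && last > UPDATE_INTERVAL.ago

  # Do the update inline, or hand it to a delayed job if it is too slow.
  Tweet.delay.cache_retweets_for_auth(authentication.id)
  authentication.update_attribute(:twitter_update_time, Time.now)
end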
Possible Fancy Pants
Or if you want to get really fancy you could still do it as a cron job, but have a point system based on views that weights which entries should be updated. The idea being certain actions would add points to certain users, and if their points are over a certain amount you update them, and then remove their points. That way you could target the ones you think are the most important, or have the most traffic or show up in the most search results etc etc.
Next, a nitpicky thing.
http://api.rubyonrails.org/classes/ActiveRecord/Batches.html
You should be using
@authentications.find_each do |authentication|
instead of
@authentications.each do |authentication|
find_each pulls in only 1000 entries at a time, so if you end up with a lot of Authentications you don't pull a crazy number of entries into memory.
Rails has a nice set of filters (before_validation, before_create, after_save, etc) as well as support for observers, but I'm faced with a situation in which relying on a filter or observer is far too computationally expensive. I need an alternative.
The problem: I'm logging web server hits to a large number of pages. What I need is a trigger that will perform an action (say, send an email) when a given page has been viewed more than X times. Due to the huge number of pages and hits, using a filter or observer will result in a lot of wasted time because, 99% of the time, the condition it tests will be false. The email does not have to be sent out right away (i.e. a 5-10 minute delay is acceptable).
What I am instead considering is implementing some kind of process that sweeps the database every 5 minutes or so and checks to see which pages have been hit more than X times, recording that state in a new DB table, then sending out a corresponding email. It's not exactly elegant, but it will work.
Does anyone else have a better idea?
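For concreteness, the periodic sweep described above could look roughly like this. Everything here is a hypothetical sketch (the hits_count and notified_at columns, the threshold, and the mailer are made up), written in newer ActiveRecord syntax:

# lib/tasks/hit_sweep.rake - hypothetical sketch of the 5-minute sweep.
namespace :hit_sweep do
  task :run => :environment do
    threshold = 1000

    # Find pages over the threshold that haven't triggered an email yet.
    Page.where("hits_count >= ? AND notified_at IS NULL", threshold).find_each do |page|
      PageMailer.threshold_reached(page).deliver
      page.update_attribute(:notified_at, Time.now)
    end
  end
end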
Rake tasks are nice! But you will end up writing more custom code for each background job you add. Check out the Delayed Job plugin http://blog.leetsoft.com/2008/2/17/delayed-job-dj
DJ is an asynchronous priority queue that relies on one simple database table. According to the DJ website, you can create a job using the Delayed::Job.enqueue method, shown below.
class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end
end
Delayed::Job.enqueue( NewsletterJob.new("blah blah", Customers.find(:all).collect(&:email)) )
I was once part of a team that wrote a custom ad server, which has the same requirements: monitor the number of hits per document, and do something once they reach a certain threshold. This server was going to be powering an existing very large site with a lot of traffic, and scalability was a real concern. My company hired two Doubleclick consultants to pick their brains.
Their opinion was: The fastest way to persist any information is to write it in a custom Apache log directive. So we built a site where every time someone would hit a document (ad, page, all the same), the server that handled the request would write a SQL statement to the log: "INSERT INTO impressions (timestamp, page, ip, etc) VALUES (x, 'path/to/doc', y, etc);" -- all output dynamically with data from the webserver. Every 5 minutes, we would gather these files from the web servers, and then dump them all in the master database one at a time. Then, at our leisure, we could parse that data to do anything we well pleased with it.
Depending on your exact requirements and deployment setup, you could do something similar. The computational requirement to check if you're past a certain threshold is still probably even smaller (guessing here) than executing the SQL to increment a value or insert a row. You could get rid of both bits of overhead by logging hits (special format or not), and then periodically gather them, parse them, input them to the database, and do whatever you want with them.
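A rough sketch of the periodic gather-and-load step described above, assuming (as in that setup) each log line is already a complete INSERT statement; the file paths are made up:

# Hypothetical sketch of the cron step that loads the collected log files.
Dir.glob("/var/spool/impressions/*.log").each do |path|
  File.foreach(path) do |line|
    sql = line.strip
    next if sql.empty?
    ActiveRecord::Base.connection.execute(sql)
  end
  File.delete(path) # or move it to an archive so it isn't loaded twice
end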
When saving your Hit model, update a redundant column in your Page model that stores a running total of hits. This costs you two extra queries, so maybe each hit takes twice as long to process, but you can decide whether you need to send the email with a simple if.
Your original solution isn't bad either.
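A minimal sketch of that counter-column idea (the callback, hits_count column, threshold constant, and mailer are assumptions, not from the answer):

# Hypothetical sketch of the redundant-counter approach.
class Hit < ActiveRecord::Base
  belongs_to :page
  after_create :bump_page_counter

  HIT_THRESHOLD = 1000

  private

  def bump_page_counter
    page.increment!(:hits_count)
    PageMailer.deliver_threshold_reached(page) if page.hits_count == HIT_THRESHOLD
  end
end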
class ApplicationController < ActionController::Base
  before_filter :increment_fancy_counter

  private

  def increment_fancy_counter
    # somehow increment the counter here
  end
end

# lib/tasks/fancy_counter.rake
namespace :fancy_counter do
  task :process do
    # somehow process the counter here
  end
end
Have a cron job run rake fancy_counter:process however often you want it to run.
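For example, the two stubs might be filled in like this (a hedged sketch; the Rails.cache keys, threshold, page.path, and PageMailer are assumptions, not part of the original answer):

# Hypothetical fill-in for the stubs above.
def increment_fancy_counter
  # Cheap per-request work: bump a counter keyed by the requested path.
  # Not atomic - a real setup might use Rails.cache.increment or a DB counter.
  key = "hits/#{request.path}"
  Rails.cache.write(key, Rails.cache.read(key).to_i + 1)
end

# In lib/tasks/fancy_counter.rake:
task :process => :environment do
  threshold = 1000
  Page.find_each do |page|
    count = Rails.cache.read("hits/#{page.path}").to_i
    PageMailer.deliver_threshold_reached(page) if count >= threshold
  end
end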