Situation:
In a typical cluster setup, I have 5 instances of Mongrel running behind Apache 2.
In one of my initializer files, I schedule a cron task using Rufus::Scheduler which basically sends out a couple of emails.
Problem:
The task runs 5 times, once for each mongrel instance, and each recipient ends up getting 5 mails (despite the fact that I store logs of each sent mail and check the log before sending). Is it possible that since all 5 instances run the task at the exact same time, they end up reading the email logs before they are written?
I am looking for a solution that will make the tasks run only once. I also have a Starling daemon up and running which can be utilized.
The rooster rails plugin specifically addresses your issue. It uses rufus-scheduler and ensures the environment is loaded only once.
The way I am doing it right now:
Try to open a file in exclusive locked mode
When the lock is acquired, check for messages in Starling
If a message exists, another process has already scheduled the job
Put the message back on the queue and exit
If no message is found, schedule the job, set the message and exit
Here is the code that does it:
starling = MemCache.new("#{Settings[:starling][:host]}:#{Settings[:starling][:port]}")
mutex_filename = "#{RAILS_ROOT}/config/file.lock"
scheduler = Rufus::Scheduler.start_new
# The filelock method, taken from Ruby Cookbook
# This will ensure the file gets unlocked
def flock(file, mode)
  success = file.flock(mode)
  if success
    begin
      yield file
    ensure
      file.flock(File::LOCK_UN)
    end
  end
  return success
end
# open_lock method, taken from Ruby Cookbook
# This will create and hold the locks
def open_lock(filename, openmode = "r", lockmode = nil)
  if openmode == 'r' || openmode == 'rb'
    lockmode ||= File::LOCK_SH
  else
    lockmode ||= File::LOCK_EX
  end
  value = nil
  # Kernel's open method gives an IO object, in our case a file
  open(filename, openmode) do |f|
    flock(f, lockmode) do
      begin
        value = yield f
      ensure
        f.flock(File::LOCK_UN) # Comment this line out on Windows.
      end
    end
    return value
  end
end
# The actual scheduler
open_lock(mutex_filename, 'r+') do |f|
  puts f.read
  digest_schedule_message = starling.get("digest_scheduler")
  if digest_schedule_message
    puts "Found digest message in Starling. Releasing lock. '#{Time.now}'"
    puts "Message: #{digest_schedule_message.inspect}"
    # Read the message and set it back, so that other processes can read it too
    starling.set "digest_scheduler", digest_schedule_message
  else
    # Schedule job
    puts "Scheduling digest emails now. '#{Time.now}'"
    scheduler.cron("0 9 * * *") do
      puts "Begin sending digests..."
      WeeklyDigest.new.send_digest!
      puts "Done sending digests."
    end
    # Add message in queue
    puts "Done Scheduling. Sending the message to Starling. '#{Time.now}'"
    starling.set "digest_scheduler", :date => Date.today
  end
end
# Sleep will ensure all instances have gone through their wait - acquire lock - schedule (or not) cycle
# This will ensure that on next reboot, starling won't have any stale messages
puts "Waiting to clear digest messages from Starling."
sleep(20)
puts "All digest messages cleared, proceeding with boot."
starling.get("digest_scheduler")
Why don't you use mod_passenger (Phusion)? I moved from Mongrel to Passenger and it worked perfectly (the switch took less than 5 minutes)!
I have a Rails application running with the Puma server. Is there any way we can see how many threads are currently being used by the application?
I was wondering about the same thing a while ago and came upon this issue. The author included the code they ended up using to collect those stats:
module PumaThreadLogger
  def initialize *args
    ret = super *args
    Thread.new do
      while true
        # Every X seconds, write out what the state of this dyno is in a format that Librato understands.
        sleep 5
        thread_count = 0
        backlog = 0
        waiting = 0
        # I don't do the logging or string stuff inside of the mutex. I want to get out of there as fast as possible
        @mutex.synchronize {
          thread_count = @workers.size
          backlog = @todo.size
          waiting = @waiting
        }
        # For some reason, even a single Puma server (not clustered) has two booted ThreadPools.
        # One of them is empty, and the other is actually doing work
        # The check above ignores the empty one
        if (thread_count > 0)
          # It might be cool if we knew the Puma worker index for this worker, but that didn't look easy to me.
          # The good news: By using the PID we can differentiate two different workers on two different dynos with the same name
          # (which might happen if one is shutting down and the other is starting)
          source_name = "#{Process.pid}"
          # If we have a dyno name, prepend it to the source to make it easier to group in the log output
          dyno_name = ENV['DYNO']
          if (dyno_name)
            source_name = "#{dyno_name}." + source_name
          end
          msg = "source=#{source_name} "
          msg += "sample#puma.backlog=#{backlog} sample#puma.active_connections=#{thread_count - waiting} sample#puma.total_threads=#{thread_count}"
          Rails.logger.info msg
        end
      end
    end
    ret
  end
end

module Puma
  class ThreadPool
    prepend PumaThreadLogger
  end
end
This code contains logic that is specific to Heroku, but the core of collecting @workers.size and logging it will work in any environment.
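If you are on a reasonably recent Puma, you may not need to patch ThreadPool at all: Puma exposes a built-in stats API once the server is running. A hedged sketch (the exact keys, such as running, backlog and pool_capacity, or a nested worker_status array in clustered mode, vary between Puma versions):
# Hedged alternative: poll Puma's built-in stats instead of patching ThreadPool.
# Assumes Puma.stats is available (recent Puma versions, server already running).
require 'json'

Thread.new do
  loop do
    sleep 5
    begin
      Rails.logger.info "puma.stats=#{JSON.parse(Puma.stats).inspect}"
    rescue StandardError => e
      Rails.logger.warn "Puma stats unavailable: #{e.message}"
    end
  end
end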
I have a Rails application where users upload audio files. I want to send them to a third-party server, and I need to connect to the external server using WebSockets, so I need my Rails application to be a WebSocket client.
I'm trying to figure out how to properly set that up. I'm not committed to any gem just yet, but the 'faye-websocket' gem looks promising. I even found a similar answer in "Sending large file in websocket before timeout", however, using that code doesn't work for me.
Here is an example of my code:
@message = Array.new

EM.run {
  ws = Faye::WebSocket::Client.new("wss://example_url.com")

  ws.on :open do |event|
    File.open('path/to/audio_file.wav','rb') do |f|
      ws.send(f.gets)
    end
  end

  ws.on :message do |event|
    @message << [event.data]
  end

  ws.on :close do |event|
    ws = nil
    EM.stop
  end
}
When I use that, I get an error from the recipient server:
No JSON object could be decoded
This makes sense, because I don't believe it's properly formatted for faye-websocket. Their documentation says:
send(message) accepts either a String or an Array of byte-sized
integers and sends a text or binary message over the connection to the
other peer; binary data must be encoded as an Array.
I'm not sure how to accomplish that. How do I load binary into an array of integers with Ruby?
I tried modifying the send command to use the bytes method:
File.open('path/to/audio_file.wav','rb') do |f|
  ws.send(f.gets.bytes)
end
But now I receive this error:
Stream was 19 bytes but needs to be at least 100 bytes
I know my file is 286KB, so something is wrong here. I get confused as to when to use File.read vs File.open vs. File.new.
Also, maybe this gem isn't the best for sending binary data. Does anyone have success sending binary files in Rails with websockets?
Update: I did find a way to get this working, but it is terrible for memory. For other people who want to load small files, you can simply use File.binread and the unpack method:
ws.on :open do |event|
  f = File.binread 'path/to/audio_file.wav'
  ws.send(f.unpack('C*'))
end
However, if I use that same code on a mere 100MB file, the server runs out of memory. It depletes the entire available 1.5GB on my test server! Does anyone know how to do this in a memory-safe manner?
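One way to keep memory flat, assuming the receiving server can reassemble a file from multiple binary frames (that depends entirely on its protocol), is to stream the file in fixed-size chunks instead of unpacking it all at once. A rough sketch with faye-websocket:
# Hedged sketch: send the file as a series of binary frames, 64 KB at a time,
# so only one chunk is ever held in memory.
ws.on :open do |event|
  File.open('path/to/audio_file.wav', 'rb') do |f|
    until f.eof?
      chunk = f.read(65_536)
      ws.send(chunk.unpack('C*')) # Array of byte-sized integers => binary frame
    end
  end
end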
Here's my take on it:
# do only once when initializing Rails:
require 'iodine/client'
Iodine.force_start!

# this sets the callbacks.
# on_message is always required by Iodine.
options = {}
options[:on_message] = Proc.new do |data|
  # this will never get called
  puts "incoming data ignored? for:\n#{data}"
end
options[:on_open] = Proc.new do
  # believe it or not - this variable belongs to the websocket connection.
  @started_upload = true
  # set a task to send the file,
  # so the on_open initialization doesn't block incoming messages.
  Iodine.run do
    # read the file and write to the websocket.
    File.open('filename','r') do |f|
      buffer = String.new # recycle the String's allocated memory
      write f.read(65_536, buffer) until f.eof?
      @started_upload = :done
    end
    # close the connection
    close
  end
end
options[:on_close] = Proc.new do |data|
  # can we notify the user that the file was uploaded?
  if @started_upload == :done
    # we did it :-)
  else
    # what happened?
  end
end

# will not wait for a connection:
Iodine::Http.ws_connect "wss://example_url.com", options

# OR

# will wait for a connection, raising errors if failed.
Iodine::Http::WebsocketClient.connect "wss://example_url.com", options
It's only fair to mention that I'm Iodine's author; I wrote it for use in Plezi (a RESTful WebSocket real-time application framework you can use standalone or within Rails)... so I'm super biased ;-)
I would avoid gets, because what it returns could be the whole file or a single byte, depending on the location of the next End Of Line (EOL) marker... read gives you better control over each chunk's size.
I built a website crawler that (later on) uses these links to read out information.
The current rake task goes through all the possible pages one by one and checks whether the request goes through (valid response) or returns a 404/503 error (invalid page). If it's valid, the page's URL gets saved into my database.
Now as you can see the task requests 50,000 pages in total thus requires some time.
I have read about Sidekiq and how it can perform these tasks asynchronously thus making this a lot faster.
My question: As you can see, my task increments the counter in each loop iteration. I guess this will not work with Sidekiq, as it will just run this script several times independently of itself, am I right?
How would I get around the problem of each instance needing its own counter then?
Hopefully my question makes sense - Thank you very much!
desc "Validate Pages"
task validate_url: :environment do
require 'rubygems'
require 'open-uri'
require 'nokogiri'
counter = 1
base_url = "http://example.net/file"
until counter > 50000 do
begin
url = "#{base_url}_#{counter}/"
open(url)
page = Page.new
page.url = url
page.save!
puts "Saved #{url} !"
counter += 1
rescue OpenURI::HTTPError => ex
logger ||= Logger.new("validations.log")
if ex.io.status[0] == "503"
logger.info "#{ex} # #{counter}"
end
puts "#{ex} # #{counter}"
counter += 1
rescue SocketError => ex
logger ||= Logger.new("validations.log")
logger.info "#{ex} # #{counter}"
puts "#{ex} # #{counter}"
counter += 1
end
end
end
A simple Redis INCR operation will create and/or increment a global counter for your jobs to use. You can use Sidekiq's redis connection to implement a counter trivially:
Sidekiq.redis do |conn|
  conn.incr("my-counter")
end
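For example, a hypothetical worker could claim the next page number atomically with INCR, so no two jobs ever validate the same page (the class, key and URL pattern below are just illustrative):
# Hedged sketch: each job atomically claims the next counter value from Redis.
require 'open-uri'

class ValidatePageWorker
  include Sidekiq::Worker

  def perform
    counter = Sidekiq.redis { |conn| conn.incr("page-counter") }
    return if counter > 50_000

    url = "http://example.net/file_#{counter}/"
    open(url)                  # raises OpenURI::HTTPError on 404/503
    Page.create!(url: url)
  rescue OpenURI::HTTPError => ex
    Sidekiq.logger.info "#{ex} @ #{counter}"
  end
end

# Enqueue one job per page, e.g. from the rake task:
# 50_000.times { ValidatePageWorker.perform_async }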
If you want to run it asynchronously, that means you will have many instances of the same job. The fastest approach is to use something like Redis, which gives you a simple and fast way to check/update the counter. But also make sure you take care of the counter: if one of your jobs is using it, lock it for the other jobs so there won't be wrong results, etc.
I have added the following code to my initializers:
require 'delayed/worker'

Delayed::Worker.logger = ActiveSupport::BufferedLogger.new("log/#{Rails.env}_delayed_jobs.log", Rails.logger.level)

module Delayed
  class Worker
    def say_with_flushing(text, level = Logger::INFO)
      if logger
        say_without_flushing(text, level)
        logger.flush
      end
    end
    alias_method_chain :say, :flushing
  end
end
But it only seems to output stuff like:
2011-09-06T14:02:04-0700: [Worker(delayed_job host:Caribbean.local pid:80322)] 1 jobs processed at 16.5391 j/s, 0 failed ...
2011-09-06T14:03:19-0700: [Worker(delayed_job host:Caribbean.local pid:80322)] MailTest completed after 0.2072
...
Is there a way to get it to actually output what it's doing? Ideally what I am trying to get is the actual outgoing email content that would normally be seen in development mode, but have it go into the delayed_job.log (regardless of rails environment).
So without having mailing go through delayed_job, in production mode the output in the log is something like:
[Mail sent to foo@foo.com]
But I want to get the actual contents as they'd be displayed if I were in development mode (the entire email body, subject, etc.), but again in the delayed_job log, because delayed_job is handling the mailing.
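I have not found a documented switch for this, but one hedged approach is to point ActionMailer's logger at the same logger delayed_job uses; ActionMailer writes the full rendered message (subject, headers, body) at debug level, so with the level set low enough it should land in the delayed_job log regardless of environment:
# Hedged sketch, in the same initializer: route mailer logging into the
# delayed_job log and make the level verbose enough to include mail bodies.
Delayed::Worker.logger.level = Logger::DEBUG
ActionMailer::Base.logger = Delayed::Worker.logger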
I have a Ruby process that listens on a given device. I would like to spin up/down instances of it for different devices with a rails app. Everything I can find for Ruby daemons seems to be based around a set number of daemons running or background processing with message queues.
Should I just be doing this with Kernel.spawn and storing the PIDs in the database? It seems a bit hacky but if there isn't an existing framework that allows me to bring up/down daemons it seems I may not have much choice.
Instead of spawning another script and keeping the PIDs in the database, you can do it all within the same script, using fork, and keeping PIDs in memory. Here's a sample script - you add and delete "worker instances" by typing commands "add" and "del" in console, exiting with "quit":
@pids = []
@counter = 0

def add_process
  @pids.push(Process.fork {
    loop do
      puts "Hello from worker ##{@counter}"
      sleep 1
    end
  })
  @counter += 1
end

def del_process
  return false if @pids.empty?
  pid = @pids.pop
  Process.kill('SIGTERM', pid)
  true
end

def kill_all
  while del_process
  end
end

while cmd = gets.chomp
  case cmd.downcase
  when 'quit'
    kill_all
    exit
  when 'add'
    add_process
  when 'del'
    del_process
  end
end
Of course, this is just an example; for sending commands and/or monitoring instances you can replace this simple gets loop with a small Sinatra app, a socket interface, named pipes, etc., as sketched below.
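As a rough illustration of that last point (the module, routes and port below are all made up), the fork bookkeeping can be wrapped in a module and driven from a tiny Sinatra app instead of the gets loop:
# Hedged sketch: a minimal HTTP control interface around the same fork logic.
require 'sinatra/base'

module WorkerPool
  @pids = []

  def self.add
    @pids << Process.fork { loop { puts "Hello from worker"; sleep 1 } }
  end

  def self.del
    pid = @pids.pop
    Process.kill('SIGTERM', pid) if pid
  end

  def self.count
    @pids.size
  end
end

class ControlApp < Sinatra::Base
  post('/workers')   { WorkerPool.add; "workers: #{WorkerPool.count}\n" }
  delete('/workers') { WorkerPool.del; "workers: #{WorkerPool.count}\n" }
end

ControlApp.run! port: 4567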