Run scripts in parallel in ruby - ruby-on-rails

I need to convert videos in 4 threads.
For example, I have an Active Record model Video with records titled Video1, Video2, Video3, Video4, Video5.
So I need to execute something like this:
bundle exec script/video_converter start
where the script processes unconverted videos in 4 threads, for example:
Video.where(state: 'unconverted').first.process
But when one of the 4 videos has been converted, the next video must automatically be handed to the free thread.
What is the best solution for this? The Sidekiq gem? The Daemons gem plus manually managed Ruby threads?
For now I am using this script:
spawns = []
loop do
videos = Video.where(state:'unconverted').limit(THREAD_COUNT).reorder("ID DESC")
videos.each do |video|
spawns << Spawnling.new do
result = video.process
if result.nil?
But this script waits for all 4 videos to finish before it takes another 4. I want the next video to start in the freed thread as soon as any one of the 4 has been converted.

If your goal is to keep processing videos using just 4 threads (or whatever Spawnling is configured to use, as it supports both fork and thread), you could use a Queue to hold all the video records to be processed, spawn 4 threads, and let them process records one by one until the queue is empty.
require "rails"
require "spawnling"

# In your case, videos are read from DB, below array is for illustration
videos = ["v1", "v2", "v3", "v4", "v5", "v6", "..."]

spawns = []
q = Queue.new
videos.each {|i| q.push(i) }

4.times do
  spawns << Spawnling.new do
    until q.empty? do
      v = q.pop
      # simulate processing
      puts "Processing video #{v}"
      # simulate processing time
      sleep(1)
    end
  end
end

# wait for all workers to finish
Spawnling.wait(spawns)
This approach was inspired by another answer.
PS: I have added a few requires and defined the videos array to make the above code a self-contained, runnable example.
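For comparison, here is a minimal sketch of the same queue-plus-workers idea using only plain Ruby threads. It is not the Spawnling version above: the helper name process_with_pool is made up for illustration, the strings stand in for Video records, and the block stands in for video.process. It uses a non-blocking pop, so a worker cannot hang if another worker takes the last item between the emptiness check and the pop.

```ruby
require "thread"

# Run the given block for every item, using a fixed number of worker threads.
# As soon as a worker finishes one item it takes the next one from the queue.
# (process_with_pool is a hypothetical helper, not part of any gem.)
def process_with_pool(items, thread_count, &work)
  queue = Queue.new
  items.each { |item| queue << item }
  results = Queue.new

  workers = Array.new(thread_count) do
    Thread.new do
      loop do
        begin
          item = queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break                  # nothing left to do, let this worker finish
        end
        results << work.call(item)
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# Strings stand in for Video records; the block stands in for video.process.
videos = %w[v1 v2 v3 v4 v5 v6]
converted = process_with_pool(videos, 4) { |v| "#{v} converted" }
puts converted.sort
```

In a Rails app each worker thread would presumably also need its own database connection, for example by wrapping the work in ActiveRecord::Base.connection_pool.with_connection.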


Using limit and offset in rails together with updated_at and find_each - will that cause a problem?

I have a Ruby on Rails project with millions of products, each with a different URL. I have a function "test_response" that checks the URL and sets the Product attribute marked_as_broken to true or false. Either way, the Product is saved and its "updated_at" attribute is updated to the current timestamp.
Since this is a very tedious process, I have created a task which in turn starts 15 tasks, each with N/15 products to check. The first one should check, for example, from the first to the 10,000th product, the second one from the 10,001st to the 20,000th, and so on, using limit and offset.
The script works in that it starts 15 processes, but one process after another completes far too early. They are not killed; each finishes with "Process exited with status 0".
My guess is that using find_each together with a condition on updated_at, while the script itself updates "updated_at" as it runs, changes the result set, so the script does not go through the 10,000 items as intended. I can't verify this, though.
Is there something inherently wrong with what I am doing here? For example, does "find_each" run a new SQL query every so often, returning completely different results than anticipated each time? I expect it to return the same records 10,000 to 20,000, just split into batches.
task :big_response_launcher => :environment do
  nbr_of_fps = Product.where(:marked_as_broken => false).where("updated_at < '" + 1.year.ago.to_date.to_s + "'").size.to_i
  nbr_of_processes = 15
  batch_size = ((nbr_of_fps / nbr_of_processes))-2
  heroku = PlatformAPI.connect_oauth(auth_code_provided_elsewhere)
  (0..nbr_of_processes-1).each do |i|
    puts "Launching #{i.to_s}"
    current_offset = batch_size * i
    puts "rake big_response_tester[#{current_offset},#{batch_size}]"
    heroku.dyno.create('kopa', {
      :command => "rake big_response_tester[#{current_offset},#{batch_size}]",
      :attach => false
    })
  end
end

task :big_response_tester, [:current_offset, :batch_size] => :environment do |task,args|
  current_limit = args[:batch_size].to_i
  current_offset = args[:current_offset].to_i
  puts "Launching with offset #{current_offset.to_s} and limit #{current_limit.to_s}"
  Product.where(:marked_as_broken => false).where("updated_at < '" + 1.year.ago.to_date.to_s + "'").limit(current_limit).offset(current_offset).find_each do |fp|
    # fp is checked with test_response and saved here
  end
end
As many have noted in the comments, it seems that find_each ignores the order and limit. I found this answer (ActiveRecord find_each combined with limit and order), which seems to work for me. It doesn't work 100%, but it is a definite improvement. The rest seems to be a memory issue, i.e. I cannot have too many processes running at the same time on Heroku.
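One alternative that avoids limit and offset altogether is to give each process a fixed range of primary keys. This is a different technique from the linked answer, shown only as a rough sketch: the helper id_ranges is made up for illustration, and the Rails calls in the trailing comment assume the Product model from the question. Because updating a row never changes its id, the range does not shift while rows drop out of the updated_at filter.

```ruby
# Split the inclusive id span [min_id, max_id] into at most `workers`
# contiguous, non-overlapping ranges. (Hypothetical helper for illustration.)
def id_ranges(min_id, max_id, workers)
  total = max_id - min_id + 1
  size  = (total.to_f / workers).ceil
  (0...workers).map do |i|
    first = min_id + i * size
    last  = [first + size - 1, max_id].min
    first <= max_id ? (first..last) : nil
  end.compact
end

ranges = id_ranges(1, 100, 4)
ranges.each { |r| puts "ids #{r.first} to #{r.last}" }

# Assumed usage inside a worker task (not run here):
#   Product.where(:marked_as_broken => false)
#          .where("updated_at < ?", 1.year.ago.to_date)
#          .where(:id => range)
#          .find_each { |fp| ... }
```

The launcher would then pass the first and last id to each worker instead of an offset and a batch size.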

Jobs update with Dashing and Ruby

I use Dashing to monitor trends and website statistics.
I created jobs to check Google News trends and Twitter trends.
The data is displayed correctly on the first load, but it is not updated after that. Here is the code for twitter_trends.rb:
require 'nokogiri'
require 'open-uri'
url = ''
data = Nokogiri::HTML(open(url))
list = data.xpath('//ol/li')
tags = list.collect do |tag|
tags = tags.take(10)
tag_counts = Hash.new({value: 0})
SCHEDULER.every '10s' do
tag = tags.sample
tag_counts[tag] = {label: tag}
send_event('twitter_trends', {items: tag_counts.values})
I think I am using rufus-scheduler incorrectly to schedule my jobs.
How can I make the data update correctly on a regular basis?
Your scheduler looks fine, but it looks like you're making one call to the website:
data = Nokogiri::HTML(open(url))
But you never call it again. Is your intent to check that site only once, along with the initial processing?
I assume you'd really want to wrap more of your logic into the scheduler block; only code inside it is rerun when the scheduled job fires.
Once everything is inside the scheduler, you take only one sample every 10 seconds and add it to tag_counts, and tag_counts is reset on every run. The thing to remember about scheduler blocks is that they basically start from a clean slate every time they run. I'd recommend looping through tags and adding each of them to tag_counts instead of sampling. Sampling is unnecessary anyway, since you already reduce the list to 10 each time the scheduler runs.
If I move the SCHEDULER block like this (right after url at the top), it works, but only one random item appears every 10 seconds.
require 'nokogiri'
require 'open-uri'
url = ''
SCHEDULER.every '10s' do
data = Nokogiri::HTML(open(url))
list = data.xpath('//ol/li')
tags = list.collect do |tag|
tags = tags.take(10)
tag_counts = Hash.new({value: 0})
tag = tags.sample
tag_counts[tag] = {label: tag}
send_event('twitter_trends', {items: tag_counts.values})
How can I display a list of 10 items that is updated regularly?
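Following the suggestion above to loop over all tags instead of sampling one, a minimal sketch could build the whole item list on every run. The helper build_items is made up for illustration, and the scheduler part is shown only as a comment because SCHEDULER and send_event exist only inside a running Dashing job; fetch_tags is likewise a hypothetical wrapper around the Nokogiri calls from the question.

```ruby
# Turn up to `limit` tag names into the items array sent to the widget.
# (build_items is a hypothetical helper, not part of Dashing.)
def build_items(tags, limit = 10)
  tags.take(limit).map { |tag| { label: tag } }
end

items = build_items(%w[ruby rails dashing])
puts items.inspect

# Assumed usage inside the job (not run here):
#   SCHEDULER.every '10s' do
#     tags = fetch_tags(url) # hypothetical: the Nokogiri code from the question
#     send_event('twitter_trends', { items: build_items(tags) })
#   end
```

Since the page is fetched and the list rebuilt inside the block, every run sends all 10 items rather than one random tag.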

Run external processes in non-blocking mode

I want to perform some actions in parallel periodically and, once they're all done, show the results to the user on a page. It will happen approximately once every 5 minutes, depending on the users' activity.
These actions are performed by external, third-party applications (processes). There are about 4 of them now, so I have to run 4 external processes for each user request.
While they are running, I show the user a page with an ajax spinner and send ajax requests to the server to check whether everything is done. Once it is, I show the results.
Here is a rough version of what I have:
class MyController
def my_action request_id
res = external_apps_cmds_with_args.each do |x|
# new process
res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
exit_value = wait_thr.value.exitstatus
if exit_value == 0 ....
write_res_to_db res, request_id #each external app writes to the db its own result for each request_id
The calculations CAN be done in parallel because there's NO overall result here, there are only the results from each tool. There is no race condition.
So I want them to run in non-blocking mode, obviously.
Is Open3.popen3 a non-blocking command? Or should I run the external processes in different threads:
threads = []
external_apps_cmds_with_args.each do |x|
# new threads
threads << Thread.new do
# new process
res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
exit_value = wait_thr.value.exitstatus
if exit_value == 0 ....
write_res_to_db res, request_id #each external app writes to the db its own result for each request_id
threads.each &:join
Or should I create only one thread?
# only one new thread
thread = Thread.new do
res = external_apps_cmds_with_args.each do |x|
# new process
res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
exit_value = wait_thr.value.exitstatus
if exit_value == 0 ....
write_res_to_db res, request_id #each external app writes to the db its own result for each request_id
Or should I continue using the approach I'm using now: NO threads at all?
What I would suggest is that you have one action to load the page and then a separate ajax action for each process. As the processes finish they will return data to the user (presumably in different parts of the page) and you will take advantage of the multi-process/threading capabilities of your webserver.
This approach has some issues because, like your original ideas, it ties up some of your web processes while the external processes are running, and you may run into timeouts. If you want to avoid that, you could run them as background jobs (delayed_job, resque, etc.) and then display the data when the jobs have finished.
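On the question of whether Open3.popen3 blocks: when it is called with a block, it waits for the child process to exit before returning, so calling it in a plain each loop runs the tools one after another. A rough sketch of the one-thread-per-process variant from the question follows. It uses Open3.capture3 rather than popen3 for brevity, the helper run_all is made up for illustration, and the two small Ruby one-liners stand in for the real external tools.

```ruby
require "open3"
require "rbconfig"

# Run every command in its own thread. Each thread blocks on its own child
# process, so the children themselves run concurrently.
# (run_all is a hypothetical helper for illustration.)
def run_all(commands)
  threads = commands.map do |cmd|
    Thread.new do
      stdout, stderr, status = Open3.capture3(*cmd)
      { out: stdout, err: stderr, ok: status.success? }
    end
  end
  threads.map(&:value) # Thread#value joins the thread and returns its result
end

# Stand-ins for the third-party applications.
commands = [
  [RbConfig.ruby, "-e", "print 'a'"],
  [RbConfig.ruby, "-e", "sleep 0.2; print 'b'"]
]
results = run_all(commands)
results.each { |r| puts "#{r[:out]} ok=#{r[:ok]}" }
```

This still ties up the web request until the slowest tool has finished, which is the limitation the answer above describes.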

Duplicated results on Ruby threading

I need to improve a rake task that builds clothing looks by fetching the images from an external server.
When I try to create multiple threads, the results are duplicated.
But if I put sleep 0.1 before each Thread.new, the code works! Why?
new_looks = []
threads = []
for look in looks
# sleep 0.1 - when I put it, works!
threads << Thread.new do
# a external http request is being done here
new_looks << look["look_ref"])
puts 'waiting threads to finish...'
puts 'saving...'
Array is not generally thread safe. Switch to a thread-safe data structure such as Queue:
new_look_queue = Queue.new
threads = looks.map do |look|
Thread.new do
new_look_queue.enq look["look_ref"])
puts 'waiting threads to finish...'
puts 'saving...'
new_looks = []
while !new_look_queue.empty?
new_looks << new_look_queue.deq
Queue#enq puts a new entry in the queue; Queue#deq gets one out, blocking if there isn't one.
If you don't need the new_looks saved in order, the code gets simpler:
puts 'saving...'
while !new_look_queue.empty?
Or, even simpler yet, just do the save inside the thread.
If you have a great many looks, the above code will create more threads than is good. Too many threads make the requests take too long to process and consume excess memory. In that case, consider creating a fixed number of producer threads:
As before, there's a queue of finished work:
new_look_queue = Queue.new
But there's now also a queue of work to be done:
look_queue = Queue.new
looks.each do |look|
look_queue.enq look
Each thread will live until it's out of work, so let's add some "out of work" symbols to the queue, one for each thread:
NUM_THREADS.times { look_queue.enq :done }
And now the threads:
threads = NUM_THREADS.times.map do
Thread.new do
while (look = look_queue.deq) != :done
new_look_queue.enq look["look_ref"])
Processing the new_look_queue is the same as above.
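Put together, the pieces above might look like the following sketch. The helper fetch_all, the pool size of 3, and the block passed in are assumptions for illustration; the block stands in for whatever builds a look from its look_ref via the external HTTP request.

```ruby
require "thread"

NUM_THREADS = 3 # assumed pool size

# Process all looks with a fixed number of threads. Each thread stops when
# it reads a :done marker from the work queue.
# (fetch_all is a hypothetical helper for illustration.)
def fetch_all(looks, num_threads = NUM_THREADS, &build)
  look_queue     = Queue.new # work still to be done
  new_look_queue = Queue.new # finished work

  looks.each { |look| look_queue.enq(look) }
  num_threads.times { look_queue.enq(:done) } # one marker per thread

  threads = num_threads.times.map do
    Thread.new do
      while (look = look_queue.deq) != :done
        new_look_queue.enq(build.call(look["look_ref"]))
      end
    end
  end
  threads.each(&:join)

  new_looks = []
  new_looks << new_look_queue.deq until new_look_queue.empty?
  new_looks
end

looks = %w[a b c d e].map { |ref| { "look_ref" => ref } }
new_looks = fetch_all(looks) { |ref| ref.upcase } # upcase stands in for the HTTP call
puts new_looks.sort.inspect
```

The results arrive in completion order, so sort them afterwards if the original order matters.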
Try to update your code to this one:
for look in looks
threads << Thread.new(look) do |lk|
new_looks << lk["look_ref"])
This should help you.
UPD: Forgot about
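A likely reason the argument-passing version helps is that a for loop does not create a new scope, so every thread's block closes over the same look variable, and a thread that starts late sees a later element. That would also explain why sleep 0.1 hides the problem. The following self-contained sketch illustrates this with made-up data; the sleep stands in for the external HTTP request.

```ruby
# `for` reuses one variable for the whole loop, so all closures share it.
procs = []
for n in [1, 2, 3]
  procs << -> { n }
end
shared = procs.map(&:call) # every lambda reads the final value of n

# Passing the value to Thread.new gives each thread its own block-local copy.
looks = [{ "look_ref" => "a" }, { "look_ref" => "b" }, { "look_ref" => "c" }]
threads = []
for look in looks
  threads << Thread.new(look) do |lk|
    sleep 0.01 # stand-in for the external HTTP request
    lk["look_ref"]
  end
end
refs = threads.map(&:value)

puts shared.inspect
puts refs.inspect
```

Iterating with looks.each do |look| instead of for has the same effect, because the block parameter is fresh on every iteration.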

Server Side Timers with Juggernaut 2

I am writing a rails app with Juggernaut 2 for real-time push notifications and am not sure how to approach this problem. I have a number of users in a chat room and I would like to run a timer so that a push can go out to each browser in the chat room every 30 seconds. Juggernaut 2 is built on node.js, so I'm assuming I need to write this code there. I just have no idea where to start in terms of integrating this with Juggernaut 2.
I just browsed through Juggernaut briefly so take my answer with a grain of salt...
You might be interested in the Channel object. You'll notice that Channel.channels is an object (think Ruby's hash) of all the channels that exist. You can set a 30-second recurring timer (setInterval) to do something with all your channels.
What to do in each loop iteration? Well, the link to the aforementioned Channel code has a publish method:
publish: function(message){
var channels = message.getChannels();
delete message.channels;
for(var i=0, len = channels.length; i < len; i++) {
message.channel = channels[i];
var clients = this.find(channels[i]).clients;
for(var x=0, len2 = clients.length; x < len2; x++) {
So you basically have to create a Message object with message.channels set to Channel.channels and if you pass that message to the publish method, it will send out to all your clients.
As to the contents of your message, I don't know what you are using client side (perhaps a chat client someone already built on top of Juggernaut), so that's up to you.
As for where to put the code creating the interval and firing off the callback to publish your message to all channels, you might want to look at the code that creates the actual server listening on the given port. If you attach the interval within init(), then as soon as you start the server it will publish your given message to every channel every 30 seconds.
Here is a sample client which pushes every 30 seconds in Ruby.
Install your Juggernaut with Redis and Node: install ruby and rubygems, then run gem install juggernaut and
#!/usr/bin/env ruby
require "rubygems"
require "juggernaut"

loop do
  Juggernaut.publish("channel1", "some Message")
  sleep 30
end
We implemented a quiz system which pushed out questions on a variable time interval. We did it as follows:
def start_quiz
logger.info("*** Quiz starting at #{}")
$redis.flushall # Clear all scores from database
quiz = Quiz.find(params[:quizz] || 1 )
#quiz_master = quiz.user
quiz_questions = quiz.quiz_questions.order("question_no ASC")
spawn_block do
quiz_questions.each { |q|
logger.info("*** Publishing question #{q.question_no}.")
time_alloc = q.question_time
Juggernaut.publish( select_channel("/quiz_stream"), {:q_num => q.num, :q_txt => q.text, :time => time_alloc} )
scoreboard = publish_scoreboard
Juggernaut.publish( select_channel("/scoreboard"), {:scoreboard => scoreboard} )
respond_to do |format|
format.all { render :nothing => true, :status => 200 }
The key in our case was using 'spawn' to run a background process for the quiz timing so that we could still process the incoming scores.
I have no idea how scalable this is.
