I want to perform some actions in parallel periodically and, once they're all done, show the results to the user on a page. This happens roughly once every 5 minutes, depending on user activity.
The actions are performed by external, third-party applications (processes). There are about 4 of them at the moment, so I have to run 4 external processes for each user request.
While they are running, I show the user a page with an AJAX spinner and send AJAX requests to the server to check whether everything is done. Once done, I show the results.
Here is a rough version of what I have
class MyController
  def my_action request_id
    res = external_apps_cmds_with_args.each do |x|
      # new process
      res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
        exit_value = wait_thr.value.exitstatus
        if exit_value == 0 ....
      end
    end
    write_res_to_db res, request_id # each external app writes to the db its own result for each request_id
  end
end
The calculations CAN be done in parallel because there's NO overall result here, only a separate result from each tool, so there is no race condition.
So I obviously want them to run in a non-blocking way.
Is Open3.popen3 a non-blocking call? Or should I run the external processes in different threads:
threads = []
external_apps_cmds_with_args.each do |x|
  # new thread
  threads << Thread.new do
    # new process
    res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
      exit_value = wait_thr.value.exitstatus
      if exit_value == 0 ....
    end
  end
  write_res_to_db res, request_id # each external app writes to the db its own result for each request_id
end
threads.each &:join
Or should I create only one thread?
# only one new thread
thread = Thread.new do
  res = external_apps_cmds_with_args.each do |x|
    # new process
    res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
      exit_value = wait_thr.value.exitstatus
      if exit_value == 0 ....
    end
  end
  write_res_to_db res, request_id # each external app writes to the db its own result for each request_id
end
thread.join
Or should I continue using the approach I'm using now: NO threads at all?
What I would suggest is that you have one action to load the page and then a separate AJAX action for each process. As each process finishes, it returns its data to the user (presumably in a different part of the page), and you take advantage of the multi-process/multi-threading capabilities of your web server.
This approach still has an issue: like your original ideas, it ties up some of your web processes while the external processes are running, and you may run into timeouts. If you want to avoid that, you could run them as background jobs (delayed_job, resque, etc.) and then display the data once the jobs have finished, for example along the lines of the sketch below.
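To make that concrete, here is a rough, untested sketch of the background-job variant, using Sidekiq purely as an example (delayed_job or resque would look much the same). The job class name is invented, and write_res_to_db / external_apps_cmds_with_args are the methods from your code, assumed to be reachable from a worker:

require 'open3'
require 'sidekiq'

class ExternalCommandJob
  include Sidekiq::Worker

  def perform(cmd, request_id)
    # run one external tool and store its result for this request
    Open3.popen3(cmd) do |stdin, stdout, stderr, wait_thr|
      output = stdout.read
      write_res_to_db(output, request_id) if wait_thr.value.exitstatus == 0
    end
  end
end

# in the controller action: enqueue one job per tool and return immediately;
# the existing AJAX polling then checks how many results exist for request_id
external_apps_cmds_with_args.each do |cmd|
  ExternalCommandJob.perform_async(cmd, request_id)
end

With this, the controller no longer blocks on the external processes at all, so web workers stay free and the spinner page simply polls until all four rows for the request_id are present.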
I have a Sidekiq worker that waits for a remote client to make a change to a record. Something like the following:
# myworker async process to wait for client to confirm status
def perform(myRecordID)
  sendClient(myRecordID)
  didClientAcknowledge = false
  while !didClientAcknowledge
    didClientAcknowledge = myRecords.find(myRecordID).status == :ACK_OK
    if didClientAcknowledge
      break
    end
    # wait for client to perform an update on the record to confirm status
    sleep 5.seconds
  end
  Rails.logger.info("client got the message")
end
My problem is that although I can see the client has in fact performed the acknowledgement and updated the record with the correct status (ACK_OK), my Sidekiq thread continues to see the old status for myRecord.
I'm sure my logic is flawed here, but it seems like the Sidekiq process does not "see" changes to the DB... yet from the Rails console I can see that the client has in fact updated the DB as expected...
Thanks!
Edit 1
OK, so here's a thought: instead of the loop, I'll schedule another call to the worker in 5 seconds... so here's the updated code:
def perform(myRecordID, retry_count)
  retry_count -= 1
  if retry_count < 1
    return
  end
  sendClient(myRecordID)
  didClientAcknowledge = false
  if !didClientAcknowledge
    didClientAcknowledge = myRecords.find(myRecordID).status == :ACK_OK
    if didClientAcknowledge
      Rails.logger.info("client got the message")
      return
    end
    # wait for client to perform an update on the record to confirm status
    myWorker.perform_in(5.seconds)
  end
  Rails.logger.info("client got the message")
end
This seems to work, but I'll test a bit more... One challenge is having a retry count, which means I need to maintain some sort of state between calls to the worker...
Edit 2: possibly this can be done by passing the time in to the first call and then checking whether a timeout has been exceeded before invoking the next instance... assuming time does not stand still inside the async call as well... (see the sketch after Edit 3)
Edit 3: adding the retry_count argument allows us to control how many times this worker will be spawned...
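For reference, a minimal, untested sketch of the deadline idea from Edit 2: pass an absolute deadline on the first call and re-enqueue with the same deadline until the client acknowledges or time runs out. MyWorker stands in for the worker class, the other names follow the pseudocode above, sendClient and the initial enqueue are left out for brevity, and the 2-minute default is arbitrary:

def perform(myRecordID, deadline = 2.minutes.from_now.to_i)
  # give up once the deadline (epoch seconds) has passed
  return if Time.now.to_i > deadline

  if myRecords.find(myRecordID).status == :ACK_OK
    Rails.logger.info("client got the message")
    return
  end

  # not acknowledged yet: check again in 5 seconds, carrying the same deadline along
  MyWorker.perform_in(5.seconds, myRecordID, deadline)
end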
I need to convert videos in 4 threads
For example, I have ActiveRecord Video models with the titles Video1, Video2, Video3, Video4, Video5.
So I need to execute something like this:
bundle exec script/video_converter start
where the script processes unconverted videos in 4 threads, for example:
Video.where(state: 'unconverted').first.process
But as soon as one of the 4 videos is converted, the next video must automatically be picked up by the freed thread.
What is the best solution for this? The Sidekiq gem? The daemons gem plus Ruby threads managed manually?
For now I am using this script:
THREAD_COUNT = 4
SLEEP_TIME = 5
logger = CONVERTATION_LOG
spawns = []

loop do
  videos = Video.where(state: 'unconverted').limit(THREAD_COUNT).reorder("ID DESC")
  videos.each do |video|
    spawns << Spawnling.new do
      result = video.process
      if result.nil?
        video.create_thumbnail!
      else
        video.failured!
      end
    end
  end
  Spawnling.wait(spawns)
  sleep(SLEEP_TIME)
end
But this script waits for all 4 videos and only then takes the next 4. I want that, as soon as one of the 4 videos finishes converting, another one is automatically added to the thread that is now free.
If your goal is to keep processing videos using just 4 threads (or however many Spawnling is configured to use, since it supports both fork and thread), then you can use a Queue: queue all your video records, spawn 4 threads, and let each of them keep processing records one by one until the queue is empty.
require "rails"
require "spawnling"
# In your case, videos are read from DB, below array is for illustration
videos = ["v1", "v2", "v3", "v4", "v5", "v6", "..."]
THREAD_COUNT = 4
spawns = []
q = Queue.new
videos.each {|i| q.push(i) }
THREAD_COUNT.times do
spawns << Spawnling.new do
until q.empty? do
v = q.pop
# simulate processing
puts "Processing video #{v}"
# simulate processing time
sleep(rand(10))
end
end
end
Spawnling.wait(spawns)
This answer is inspired by this answer.
PS: I have added a few requires and defined the videos array to make the above code a self-contained, runnable example.
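One caveat with the sketch above: when several threads share the queue, q.empty? can return false just before another thread pops the last item, leaving a thread blocked forever on q.pop. A non-blocking pop sidesteps that; this is only a suggested tweak to the loop body (shown with a plain Thread for illustration), not something from the original answer:

Thread.new do
  begin
    # pop(true) is non-blocking and raises ThreadError once the queue is empty
    while (v = q.pop(true))
      puts "Processing video #{v}"
      sleep(rand(10)) # simulate processing time, as above
    end
  rescue ThreadError
    # queue drained; let the thread finish
  end
end

This also assumes Spawnling is running in thread mode; with forked processes an in-memory Queue would not be shared between workers in the first place.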
I've got a set of data that I'd like to do some calculations on inside my Rails application. Each calculation is independent of the others, so I'd like to thread them to make my response much faster.
Here's what I've got ATM:
def show
  @stats = Stats.new
  Thread.new {
    @stats.top_brands = # RESULT OF FIRST CALCULATION
  }
  Thread.new {
    @stats.top_retailers = # RESULT OF SECOND CALCULATION
  }
  Thread.new {
    @stats.top_styles = # RESULT OF THIRD CALCULATION
  }
  Thread.new {
    @stats.top_colors = # RESULT OF FOURTH CALCULATION
  }
  render json: @stats
end
Now this returns a bunch of empty arrays for each of the member attributes of @stats. However, if I join the threads, it works, but that defeats the purpose of threading since each of the threads blocks.
Since I'm very new to threads, I'm wondering what I'm doing wrong here, or whether it's even possible to accomplish what I'm trying to do: run 4 calculations in parallel and return the result to the client.
Thanks,
Joe
It first depends on whether your calculations are processor-heavy or spend most of their time on blocking IO such as reading from databases, the file system or the network. Threads won't help much in the former case, since each thread takes up CPU time and no other thread can be scheduled; it's even worse on Ruby MRI, which has a Global Interpreter Lock. If the threads are doing blocking IO, however, each can wait and let another thread run until they all return.
In the end you do have to join all the threads, because you want their results. Do this below all your Thread.new calls, saving the return value of each Thread.new into an array:
threads = []
threads << Thread.new ...
Then join them together before you render:
threads.each &:join
If you really want to be sure this helps you out, just benchmark the entire action (with the joins in place, as above):
def show
  start_time = Time.now.to_f
  @stats = Stats.new

  threads = []
  threads << Thread.new {
    @stats.top_brands = # RESULT OF FIRST CALCULATION
  }
  # ... the other calculations ...
  threads << Thread.new {
    @stats.top_colors = # RESULT OF FOURTH CALCULATION
  }
  threads.each &:join

  @elapsed_time = Time.now.to_f - start_time
  # do something with @elapsed_time, like putsing it or rendering it in your response
  render json: @stats
end
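If mutating @stats from several threads feels fragile, another option (a sketch only, with placeholder calculate_* methods standing in for the real calculations) is to let each thread return its result and collect them with Thread#value, which joins the thread and returns its block's value in one step:

def show
  threads = {
    top_brands:    Thread.new { calculate_top_brands },
    top_retailers: Thread.new { calculate_top_retailers },
    top_styles:    Thread.new { calculate_top_styles },
    top_colors:    Thread.new { calculate_top_colors }
  }

  @stats = Stats.new
  # Thread#value implicitly joins, so every result is in place before rendering
  threads.each { |attr, thread| @stats.public_send("#{attr}=", thread.value) }

  render json: @stats
end

The same caveat about MRI's Global Interpreter Lock applies: this only buys real overlap when the calculations spend their time in blocking IO such as database queries.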
Hope that helps.
I need to improve a rake task that builds clothing looks by fetching the images from an external server.
When I try to create multiple threads, the results are duplicated.
But if I put sleep 0.1 before each Thread.new, the code works. Why?
new_looks = []
threads = []

for look in looks
  # sleep 0.1 - when I put it here, it works!
  threads << Thread.new do
    # an external http request is done here
    new_looks << Look.new(ref: look["look_ref"])
  end
end

puts 'waiting threads to finish...'
threads.each(&:join)

puts 'saving...'
new_looks.sort_by(&:ref).each(&:save)
Array is not generally thread safe. Switch to a thread-safe data structure such as Queue:
new_look_queue = Queue.new
threads = looks.map do |look|
  Thread.new do
    new_look_queue.enq Look.new(ref: look["look_ref"])
  end
end

puts 'waiting threads to finish...'
threads.each(&:join)

puts 'saving...'
new_looks = []
while !new_look_queue.empty?
  new_looks << new_look_queue.deq
end
new_looks.sort_by(&:ref).each(&:save)
Queue#enq puts a new entry in the queue; Queue#deq gets one out, blocking if there isn't one.
If you don't need the new_looks saved in order, the code gets simpler:
puts 'saving...'
while !new_look_queue.empty?
  new_look_queue.deq.save
end
Or, even simpler yet, just do the save inside the thread.
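For example, something along these lines (a sketch against the Look model from the question, skipping the result queue entirely):

threads = looks.map do |look|
  Thread.new(look) do |lk|
    # each thread builds and saves its own record; no shared collection needed
    Look.new(ref: lk["look_ref"]).save
  end
end
threads.each(&:join)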
If you have a great many looks, the code above will create more threads than is good. Too many threads make the requests take too long to process and consume excess memory. In that case, consider creating some fixed number of producer threads:
NUM_THREADS = 8
As before, there's a queue of finished work:
new_look_queue = Queue.new
But there's now also a queue of work to be done:
look_queue = Queue.new
looks.each do |look|
  look_queue.enq look
end
Each thread will live until it's out of work, so let's add some "out of work" symbols to the queue, one for each thread:
NUM_THREADS.times { look_queue.enq :done }
And now the threads:
threads = NUM_THREADS.times.map do
  Thread.new do
    while (look = look_queue.deq) != :done
      new_look_queue.enq Look.new(ref: look["look_ref"])
    end
  end
end
Processing the new_look_queue is the same as above.
Try updating your code to this:
for look in looks
  threads << Thread.new(look) do |lk|
    new_looks << Look.new(ref: lk["look_ref"])
  end
end
This should help you.
UPD: Forgot about Thread.new(args)
When I use win32ole in a standalone application, everything seems to work fine; as soon as I put it into my Rails application running on a Mongrel server, it goes into an infinite loop.
I am trying to access "https://microsoft/sharepoint/document.doc"
def generatertm(issue)
  begin
    word = WIN32OLE.new('word.application')
    logger.debug("Word Initialized...")
    word.visible = true
    myDocLink = "https://microsoft/sharepoint/url.doc"
    myFile = word.documents.open(myDocLink)
    logger.debug("File Opened...")
    puts "Started Reading bookmarks..."
    myBookMarks = myFile.Bookmarks
    puts "bookmarks fetched working background task..."
    print("Bookmark Count : " + myBookMarks.Count.to_s + "\n")
    myBookMarks.each do |i|
      logger.warn("Bookmark Name : " + i.Name + "\n")
    end
  rescue WIN32OLERuntimeError => e
    puts e.message
    puts e.backtrace.inspect
  else
  ensure
    word.activedocument.close( true ) # presents save dialog box
    # word.activedocument.close(false) # no save dialog, just close it
    word.quit
  end
end
When I run this code standalone, a popup appears asking for Microsoft SharePoint credentials. However, under Mongrel/Rails it goes into an infinite loop.
Do I need to do something to handle this popup when running through Rails?
Have you looked into patching the win32ole.rb file?
Basically, here's the reason for the patch:
It turns out that win32ole.rb patches the thread to call the Windows OleInitialize() & OleUninitialize() functions around the yield to the block. However, the MS documentation for CoInitialize (which OleInitialize calls internally) states that: "the first thread in the application that calls CoInitialize with 0 (or CoInitializeEx with COINIT_APARTMENTTHREADED) must be the last thread to call CoUninitialize. Otherwise, subsequent calls to CoInitialize on the STA will fail and the application will not work."
http://msdn.microsoft.com/en-us/library/ms678543(v=VS.85).aspx
And here's the modified win32ole.rb file to fix the threading issue:
require 'win32ole.so'

# Fail if not required by main thread.
# Call OleInitialize and OleUninitialize for main thread to satisfy the following:
#
#   The first thread in the application that calls CoInitialize with 0 (or CoInitializeEx with COINIT_APARTMENTTHREADED)
#   must be the last thread to call CoUninitialize. Otherwise, subsequent calls to CoInitialize on the STA will fail and the
#   application will not work.
#
# See http://msdn.microsoft.com/en-us/library/ms678543(v=VS.85).aspx
if Thread.main != Thread.current
  raise "Require win32ole.rb from the main application thread to satisfy CoInitialize requirements."
else
  WIN32OLE.ole_initialize
  at_exit { WIN32OLE.ole_uninitialize }
end

# re-define Thread#initialize
# bug #2618 (ruby-core:27634)
class Thread
  alias :org_initialize :initialize

  def initialize(*arg, &block)
    if block
      org_initialize(*arg) {
        WIN32OLE.ole_initialize
        begin
          block.call(*arg)
        ensure
          WIN32OLE.ole_uninitialize
        end
      }
    else
      org_initialize(*arg)
    end
  end
end
http://cowlibob.co.uk/ruby-threads-win32ole-coinitialize-and-counin
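If you go this route, one way to wire it in (the file location and require line are just an example, not from the linked post) is to keep the patched file in lib/ and load it from the application's boot code, so the main thread is the one performing OleInitialize:

# config/application.rb (or an early initializer) -- must run on the main thread
require File.expand_path('../../lib/win32ole', __FILE__)

After that, any Thread created by Rails or Mongrel gets the per-thread ole_initialize/ole_uninitialize wrapping from the redefined Thread#initialize above.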