Duplicated results on Ruby threading - ruby-on-rails

I need to improve a rake task that builds clothing looks by fetching images from an external server.
When I create multiple threads, the results are duplicated.
But if I put sleep 0.1 before each Thread.new, the code works! Why?
new_looks = []
threads = []
for look in looks
  # sleep 0.1 - when I put it, it works!
  threads << Thread.new do
    # an external http request is being done here
    new_looks << Look.new(ref: look["look_ref"])
  end
end
puts 'waiting threads to finish...'
threads.each(&:join)
puts 'saving...'
new_looks.sort_by(&:ref).each(&:save)

Array is not generally thread safe. Switch to a thread-safe data structure such as Queue:
new_look_queue = Queue.new
threads = looks.map do |look|
  Thread.new do
    new_look_queue.enq Look.new(ref: look["look_ref"])
  end
end
puts 'waiting threads to finish...'
threads.each(&:join)
puts 'saving...'
new_looks = []
while !new_look_queue.empty?
  new_looks << new_look_queue.deq
end
new_looks.sort_by(&:ref).each(&:save)
Queue#enq puts a new entry in the queue; Queue#deq gets one out, blocking if there isn't one.
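(A small aside, not in the original answer: deq also has a non-blocking form, deq(true), which raises ThreadError instead of waiting; that form is handy when draining the queue after all the worker threads have joined.)
q = Queue.new
Thread.new { sleep 0.5; q.enq :ready }
q.deq          # blocks for ~0.5s, then returns :ready
begin
  q.deq(true)  # non-blocking: the queue is empty again, so this raises
rescue ThreadError
  puts "queue was empty"
end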
If you don't need the new_looks saved in order, the code gets simpler:
puts 'saving...'
while !new_look_queue.empty?
  new_look_queue.deq.save
end
Or, even simpler yet, just do the save inside the thread.
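For example (a minimal sketch of that last suggestion, assuming order really doesn't matter and each save is independent):
threads = looks.map do |look|
  Thread.new do
    # each thread builds and saves its own record; no shared array or queue needed
    Look.new(ref: look["look_ref"]).save
  end
end
threads.each(&:join)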
If you have a great many looks, the above code will create more threads than is good: too many threads make the requests take too long to process and consume excess memory. In that case, consider creating a fixed number of worker threads:
NUM_THREADS = 8
As before, there's a queue of finished work:
new_look_queue = Queue.new
But there's now also a queue of work to be done:
look_queue = Queue.new
looks.each do |look|
  look_queue.enq look
end
Each thread will live until it's out of work, so let's add some "out of work" symbols to the queue, one for each thread:
NUM_THREADS.times { look_queue.enq :done }
And now the threads:
threads = NUM_THREADS.times.map do
  Thread.new do
    while (look = look_queue.deq) != :done
      new_look_queue.enq Look.new(ref: look["look_ref"])
    end
  end
end
Processing the new_look_queue is the same as above.

Try to update your code to this one:
for look in looks
  threads << Thread.new(look) do |lk|
    new_looks << Look.new(ref: lk["look_ref"])
  end
end
This should help you.
UPD: Forgot about Thread.new(args)
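Why this helps (a clarifying note, not part of the original answer): for does not create a new scope per iteration, so every thread's block closes over the same look variable. By the time a thread actually runs, look may already point at a later element, which is why the results come out duplicated and why sleep 0.1 appears to "fix" it. Passing look as an argument to Thread.new hands each thread its own copy at creation time. A tiny demonstration:
items = [1, 2, 3]
results = Queue.new
threads = []

# Shared loop variable: the threads frequently all see the final value
for item in items
  threads << Thread.new { results << item }
end
threads.each(&:join)
# results may now hold duplicates such as 3, 3, 3

# Passing the value in: each thread receives its own copy at creation time
threads = items.map { |item| Thread.new(item) { |i| results << i } }
threads.each(&:join)
# results now also holds 1, 2 and 3 (in some order)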

Related

How can I keep the Tempfile contents from being empty in a separate (Ruby) thread?

In a Rails 6.x app, I have a controller method which backgrounds queries that take longer than 2 minutes (to avoid a browser timeout), advises the user, stores the results, and sends a link that can retrieve them to generate a live page (with Highcharts charts). This works fine.
Now, I'm trying to implement the same logic with a method that backgrounds the creation of a report, via a Tempfile, and attaches the contents to an email, if the query runs too long. This code works just fine if the 2-minute timeout is NOT reached, but the Tempfile is empty at the commented line if the timeout IS reached.
I've tried wrapping the second part in another thread, and wrapping the internals of each thread with a mutex, but this is all getting above my head. I haven't done a lot of multithreading, and every time I do, I feel like I stumble around till I get it. This time, I can't even seem to stumble into it.
I don't know if the problem is with my thread(s), or a race condition with the Tempfile object. I've had trouble using Tempfiles before, because they seem to disappear quicker than I can close them. Is this one getting cleaned up before it can be sent? The file handle actually still exists on the file system at the commented point, even though it's empty, so I'm not clear on what's happening.
def report
  queue = Queue.new
  file = Tempfile.new('report')
  thr = Thread.new do
    query = %Q(blah blah blah)
    @calibrations = ActiveRecord::Base.connection.exec_query query
    query = %Q(blah blah blah)
    @tunings = ActiveRecord::Base.connection.exec_query query
    if queue.empty?
      unless @tunings.empty?
        CSV.open(file.path, 'wb') do |csv|
          csv << ["headers...", @parameters].flatten
          @calibrations.each do |c|
            line = [c["h1"], c["h2"], c["h3"], c["h4"], c["h5"], c["h6"], c["h7"], c["h8"]]
            t = @tunings.select { |t| t["code"] == c["code"] }.first
            @parameters.each do |parameter|
              line << t[parameter.downcase]
            end
            csv << line
          end
        end
        send_data file.read, :type => 'text/csv; charset=iso-8859-1; header=present', :disposition => "attachment; filename=\"report.csv\""
      end
    else
      # When "timed out", `file` is empty here
      NotificationMailer.report_ready(current_user, file.read).deliver_later
    end
  end
  give_up_at = Time.now + 120.seconds
  while Time.now < give_up_at do
    if !thr.alive?
      break
    end
    sleep 1
  end
  if thr.alive?
    queue << "Timeout"
    render html: "Your report is taking longer than 2 minutes to generate. To avoid a browser timeout, it will finish in the background, and the report will be sent to you in email."
  end
end
The reason the file is empty is that you give the query 120 seconds to complete. If it has not finished by then, you add "Timeout" to the queue while the query is still running inside the thread, before it has reached the point where it checks whether the queue is empty. When the query does complete, the queue is no longer empty, so you skip the part that writes the CSV file and go straight to the NotificationMailer.report_ready line. At that point the file is still empty because nothing was ever written into it.
In the end I think you need to rethink the overall logic of what you are trying to accomplish: there needs to be more communication between the threads and the top level.
Each thread needs to tell the top level whether it has already sent the result, and the top level needs to let the thread know that it is past time to send the result directly and that it should email the result instead.
Here is some code that I think / hope will give some insight into how to approach this problem.
timeout_limit = 10
query_times = [5, 15, 1, 15]
timeout = []
sent_response = []
send_via_email = []
puts "time out is set to #{timeout_limit} seconds"
query_times.each_with_index do |query_time, query_id|
  puts "starting query #{query_id} that will take #{query_time} seconds"
  timeout[query_id] = false
  sent_response[query_id] = false
  send_via_email[query_id] = false
  Thread.new do
    ## do query
    sleep query_time
    unless timeout[query_id]
      puts "query #{query_id} has completed, displaying results now"
      sent_response[query_id] = true
    else
      puts "query #{query_id} has completed, emailing result now"
      send_via_email[query_id] = true
    end
  end
  give_up_at = Time.now + timeout_limit
  while Time.now < give_up_at
    break if sent_response[query_id]
    sleep 1
  end
  unless sent_response[query_id]
    puts "query #{query_id} timed out, we will email the result of your query when it is completed"
    timeout[query_id] = true
  end
end
# simulate server environment
loop { }
=>
time out is set to 10 seconds
starting query 0 that will take 5 seconds
query 0 has completed, displaying results now
starting query 1 that will take 15 seconds
query 1 timed out, we will email the result of your query when it is completed
starting query 2 that will take 1 seconds
query 2 has completed, displaying results now
starting query 3 that will take 15 seconds
query 1 has completed, emailing result now
query 3 timed out, we will email the result of your query when it is completed
query 3 has completed, emailing result now
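Translating that back to the report action, one way to get the "more communication" described above is a pair of flags guarded by a Mutex: the worker thread tells the top level whether it finished in time, and the top level tells the worker that the request already timed out and the file should be emailed instead. The sketch below is only an outline (build_report_csv is a hypothetical stand-in for the two queries and the CSV writing), not a drop-in replacement:
def report
  file = Tempfile.new('report')
  mutex = Mutex.new
  state = { done: false, timed_out: false }

  thr = Thread.new do
    build_report_csv(file)                       # hypothetical: run the queries, write the CSV
    mutex.synchronize do
      if state[:timed_out]
        # the request already rendered the "be patient" page, so email the finished file
        NotificationMailer.report_ready(current_user, file.read).deliver_later
      end
      state[:done] = true
    end
  end

  give_up_at = Time.now + 120.seconds
  sleep 1 while Time.now < give_up_at && thr.alive?

  deliver_inline = mutex.synchronize do
    state[:timed_out] = true unless state[:done] # tell the worker it is too late to send_data
    state[:done]
  end

  if deliver_inline
    thr.join                                     # the CSV is fully written at this point
    send_data file.read, :type => 'text/csv; charset=iso-8859-1; header=present', :disposition => "attachment; filename=\"report.csv\""
  else
    render html: "Your report is taking longer than 2 minutes to generate. To avoid a browser timeout, it will finish in the background, and the report will be sent to you in email."
  end
end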

Run scripts in parallel in ruby

I need to convert videos in 4 threads.
For example, I have ActiveRecord Video models with the titles Video1, Video2, Video3, Video4, Video5.
So, I need to execute something like this
bundle exec script/video_converter start
where the script will process unconverted videos in 4 threads, for example
Video.where(state: 'unconverted').first.process
But when one of the 4 videos finishes converting, the next video must automatically be picked up by the freed thread.
What is the best solution for this? The Sidekiq gem? The Daemons gem plus manually managed Ruby threads?
For now I am using this script:
THREAD_COUNT = 4
SLEEP_TIME = 5
logger = CONVERTATION_LOG
spawns = []
loop do
  videos = Video.where(state: 'unconverted').limit(THREAD_COUNT).reorder("ID DESC")
  videos.each do |video|
    spawns << Spawnling.new do
      result = video.process
      if result.nil?
        video.create_thumbnail!
      else
        video.failured!
      end
    end
  end
  Spawnling.wait(spawns)
  sleep(SLEEP_TIME)
end
But this script waits for all 4 videos and only then takes the next 4. I want the next video to be picked up automatically as soon as one of the 4 finishes converting, so that no thread sits idle.
If your goal is to keep processing videos using just 4 threads (or however many Spawnling is configured to use, since it supports both fork and thread), you could push all your video records onto a Queue, spawn 4 threads, and let them keep processing records one by one until the queue is empty.
require "rails"
require "spawnling"
# In your case, videos are read from DB, below array is for illustration
videos = ["v1", "v2", "v3", "v4", "v5", "v6", "..."]
THREAD_COUNT = 4
spawns = []
q = Queue.new
videos.each {|i| q.push(i) }
THREAD_COUNT.times do
spawns << Spawnling.new do
until q.empty? do
v = q.pop
# simulate processing
puts "Processing video #{v}"
# simulate processing time
sleep(rand(10))
end
end
end
Spawnling.wait(spawns)
This answer is inspired by this answer.
PS: I have added a few requires and defined the videos array to make the above code a self-contained, runnable example.
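One caveat with the until q.empty? / q.pop loop above: if two workers both see the queue as non-empty and one of them then takes the last item, the other blocks forever inside pop. When the workers are plain Ruby threads (rather than forked Spawnling processes), a non-blocking pop(true) sidesteps that; a rough sketch of the same pattern:
videos = ["v1", "v2", "v3", "v4", "v5", "v6"]
THREAD_COUNT = 4

q = Queue.new
videos.each { |v| q.push(v) }

workers = THREAD_COUNT.times.map do
  Thread.new do
    loop do
      begin
        v = q.pop(true)        # non-blocking: raises ThreadError once the queue is empty
      rescue ThreadError
        break                  # nothing left to do, let this worker finish
      end
      puts "Processing video #{v}"
      sleep(rand(10))          # simulate processing time
    end
  end
end
workers.each(&:join)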

Run external processes in non-blocking mode

I want to perform some actions in parallel periodically and, once they're all done, show the results to the user on a page. This happens approximately once every 5 minutes, depending on the users' activity.
These actions are performed by external, third-party applications (processes). There are about 4 of them at the moment, so I have to run 4 external processes for each user request.
While they are running, I show the user a page with an ajax spinner and send ajax requests to the server to check whether everything is done. Once done, I show the results.
Here is a rough version of what I have
class MyController
  def my_action request_id
    res = external_apps_cmds_with_args.each do |x|
      # new process
      res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
        exit_value = wait_thr.value.exitstatus
        if exit_value == 0 ....
        end
      end
      write_res_to_db res, request_id # each external app writes to the db its own result for each request_id
    end
  end
end
The calculations CAN be done in parallel because there's NO overall result here, there are only the results from each tool. There is no race condition.
So I want them to run in non-blocking mode, obviously.
Is Open3.popen3 a non-blocking command? Or should I run the external processes in the different threads:
threads = []
external_apps_cmds_with_args.each do |x|
  # new threads
  threads << Thread.new do
    # new process
    res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
      exit_value = wait_thr.value.exitstatus
      if exit_value == 0 ....
      end
    end
    write_res_to_db res, request_id # each external app writes to the db its own result for each request_id
  end
end
threads.each &:join
Or should I create only one thread?
# only one new thread
thread = Thread.new do
  res = external_apps_cmds_with_args.each do |x|
    # new process
    res = Open3.popen3 x do |stdin, stdout, stderr, wait_thr|
      exit_value = wait_thr.value.exitstatus
      if exit_value == 0 ....
      end
    end
    write_res_to_db res, request_id # each external app writes to the db its own result for each request_id
  end
end
thread.join
Or should I continue using the approach I'm using now: NO threads at all?
What I would suggest is that you have one action to load the page and then a separate ajax action for each process. As the processes finish they will return data to the user (presumably in different parts of the page) and you will take advantage of the multi-process/threading capabilities of your webserver.
This approach has some issues because, like your original ideas, it ties up some of your web processes while the external processes are running, and you may run into timeouts. If you want to avoid that, you could run them as background jobs (delayed_job, resque, etc.) and then display the data when the jobs have finished.
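If you do go the background-job route, a rough sketch of "one job per external tool" could look like the following (hedged: ExternalToolJob is a made-up ActiveJob class, and write_res_to_db stands in for whatever per-request persistence the question already does). Each job shells out with Open3.capture3, which blocks only its own worker, and the ajax poller can check the db to see when all four results have arrived:
require "open3"

class ExternalToolJob < ApplicationJob   # hypothetical job class, ActiveJob (Rails 4.2+)
  queue_as :default

  def perform(cmd_with_args, request_id)
    stdout, stderr, status = Open3.capture3(cmd_with_args)
    result = status.exitstatus == 0 ? stdout : stderr
    write_res_to_db(result, request_id)  # same per-request persistence as in the question
  end
end

# In the controller action, instead of running the processes inline:
# external_apps_cmds_with_args.each { |cmd| ExternalToolJob.perform_later(cmd, request_id) }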

Running threads inside my rails controller method

I've got a set of data that I'd like to do some calculations on inside my Rails application. Each calculation is independent of the others, so I'd like to thread them to make my response much faster.
Here's what I've got ATM:
def show
  @stats = Stats.new
  Thread.new {
    @stats.top_brands = #RESULT OF FIRST CALCULATION
  }
  Thread.new {
    @stats.top_retailers = #RESULT OF SECOND CALCULATION
  }
  Thread.new {
    @stats.top_styles = #RESULT OF THIRD CALCULATION
  }
  Thread.new {
    @stats.top_colors = #RESULT OF FOURTH CALCULATION
  }
  render json: @stats
end
Now this returns a bunch of empty arrays for each of the members of @stats. However, if I join the threads together it runs, but that defeats the purpose of threading since each of the threads blocks.
Since I'm very new to threads, I'm wondering what I'm doing wrong here, or whether it's even possible to accomplish what I'm trying to do, that is, run 4 calculations in parallel and return the result to the client.
Thanks,
Joe
It first depends on whether your calculations are doing processor-heavy operations or a lot of blocking IO, like reading from databases, the file system or the network. Threads won't do much good in the former case, since each thread takes up CPU time and no other thread can be scheduled; it's even worse if you're using Ruby MRI, which has a Global Interpreter Lock. If the threads are doing blocking IO, however, they can at least wait, let another thread run, wait, let another run, and so on until they all return.
In the end you do have to join all the threads together, because you want their results. Do this below all your Thread.new calls. Save the return value of each Thread.new call to an array:
threads = []
threads << Thread.new ...
Then join them together before you render:
threads.each &:join
If you really want to be sure this helps you out, just benchmark the entire action:
def show
  start_time = Time.now.to_f
  @stats = Stats.new
  threads = []
  threads << Thread.new {
    @stats.top_brands = #RESULT OF FIRST CALCULATION
  }
  threads << Thread.new {
    @stats.top_colors = #RESULT OF FOURTH CALCULATION
  }
  threads.each(&:join)
  @elapsed_time = Time.now.to_f - start_time
  # do something with @elapsed_time, like putsing it or rendering it in your response
  render json: @stats
end
Hope that helps.
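One more option worth knowing about (not mentioned in the answer above): Thread#value joins the thread and returns the result of its block, which keeps the "collect the return values" step tidy. A minimal sketch, where compute_top_brands and friends are hypothetical stand-ins for the real calculations:
def show
  threads = {
    top_brands:    Thread.new { compute_top_brands },
    top_retailers: Thread.new { compute_top_retailers },
    top_styles:    Thread.new { compute_top_styles },
    top_colors:    Thread.new { compute_top_colors }
  }

  @stats = Stats.new
  threads.each do |attribute, thread|
    # Thread#value waits for the thread to finish and returns its block's result
    @stats.public_send("#{attribute}=", thread.value)
  end

  render json: @stats
end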

Render Word Document from Remote Server using WIN32OLE in ruby on rails

When I use win32ole in a stand-alone application, everything seems to work fine; as soon as I put it into my Rails application, which runs on a Mongrel server, it goes into an infinite loop.
I am trying to access "https://microsoft/sharepoint/document.doc"
def generatertm(issue)
  begin
    word = WIN32OLE.new('word.application')
    logger.debug("Word Initialized...")
    word.visible = true
    myDocLink = "https://microsoft/sharepoint/url.doc"
    myFile = word.documents.open(myDocLink)
    logger.debug("File Opened...")
    puts "Started Reading bookmarks..."
    myBookMarks = myFile.Bookmarks
    puts "bookmarks fetched working background task..."
    print ("Bookmark Count : " + myBookMarks.Count.to_s + "\n")
    myBookMarks.each do |i|
      logger.warn ("Bookmark Name : " + i.Name + "\n")
    end
  rescue WIN32OLERuntimeError => e
    puts e.message
    puts e.backtrace.inspect
  else
  ensure
    word.activedocument.close( true ) # presents save dialog box
    #word.activedocument.close(false) # no save dialog, just close it
    word.quit
  end
end
When I run this code stand-alone, a pop-up appears asking for Microsoft SharePoint credentials; however, under Mongrel/Rails it goes into an infinite loop.
Do I need to handle this pop-up so it can appear through Rails?
Have you looked into patching the win32ole.rb file?
Basically, here's the reason for the patch:
It turns out that win32ole.rb patches the thread to call the Windows
OleInitialize() & OleUninitialize() functions around the yield to the
block. However, the MS documentation for CoInitialize (which
OleInitialize calls internally) state that: "the first thread in the
application that calls CoInitialize with 0 (or CoInitializeEx with
COINIT_APARTMENTTHREADED) must be the last thread to call
CoUninitialize. Otherwise, subsequent calls to CoInitialize on the STA
will fail and the application will not work."
http://msdn.microsoft.com/en-us/library/ms678543(v=VS.85).aspx
And here's the modified win32ole.rb file to fix the threading issue:
require 'win32ole.so'
# Fail if not required by main thread.
# Call OleInitialize and OleUninitialize for main thread to satisfy the following:
#
# The first thread in the application that calls CoInitialize with 0 (or CoInitializeEx with COINIT_APARTMENTTHREADED)
# must be the last thread to call CoUninitialize. Otherwise, subsequent calls to CoInitialize on the STA will fail and the
# application will not work.
#
# See http://msdn.microsoft.com/en-us/library/ms678543(v=VS.85).aspx
if Thread.main != Thread.current
  raise "Require win32ole.rb from the main application thread to satisfy CoInitialize requirements."
else
  WIN32OLE.ole_initialize
  at_exit { WIN32OLE.ole_uninitialize }
end

# re-define Thread#initialize
# bug #2618(ruby-core:27634)
class Thread
  alias :org_initialize :initialize
  def initialize(*arg, &block)
    if block
      org_initialize(*arg) {
        WIN32OLE.ole_initialize
        begin
          block.call(*arg)
        ensure
          WIN32OLE.ole_uninitialize
        end
      }
    else
      org_initialize(*arg)
    end
  end
end
http://cowlibob.co.uk/ruby-threads-win32ole-coinitialize-and-counin
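A hedged usage note: the idea is to save the modified file so it shadows the stock win32ole.rb, require it once from the main thread (for a Rails app, an initializer is a natural place; the path below is an assumption), and only then create threads that touch WIN32OLE. The redefined Thread#initialize then pairs OleInitialize/OleUninitialize around each thread's block:
# e.g. config/initializers/win32ole_patch.rb (hypothetical location and filename)
require Rails.root.join('lib', 'win32ole').to_s   # the patched file shown above

# elsewhere, inside a request or a rake task:
Thread.new do
  word = WIN32OLE.new('word.application')  # OLE is initialized for this thread by the patch
  # ... open the document, read its bookmarks ...
  word.quit
end.join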
