Rails threads testing db lock - ruby-on-rails

Let's say we want to test that the database is being locked:
$transaction = Thread.new {
  Rails.logger.debug 'transaction process start'
  Inventory.transaction do
    inventory.lock!
    Thread.stop
    inventory.units_available=99
    inventory.save
  end
}
$race_condition = Thread.new {
  Rails.logger.debug 'race_condition process start'
  config = ActiveRecord::Base.configurations[Rails.env].symbolize_keys
  config[:flags] = 65536 | 131072 | Mysql2::Client::FOUND_ROWS
  begin
    connection = Mysql2::Client.new(config)
    $transaction.run
    $transaction.join
  rescue NoMethodError
  ensure
    connection.close if connection
  end
}
Rails.logger.debug 'main process start'
$transaction.join
Rails.logger.debug 'main process after transaction.join'
sleep 0.1 while $transaction.status!='sleep'
Rails.logger.debug 'main process after sleep'
$race_condition.join
Rails.logger.debug 'main process after race_condition.join'
In theory, I'd expect the transaction thread to run and then pause at Thread.stop. The main thread would then see that it is sleeping and start the race condition thread (which, once this actually works, will try to alter data in the locked table). The race condition thread would then resume the transaction thread once it was done.
What's weird is the trace:
main process start
transaction process start
race_condition process start
Coming from Node.js, threads don't seem quite as user friendly, though there has to be a way to get this done.
Is there an easier way to lock the database, then try to change it with a different thread?

Thread.new starts the thread automatically, but that does not mean it begins executing right away.
When it actually runs depends on the operating system, on whether you use Ruby or JRuby, on how many cores you have, and so on.
In your example the main thread runs until $transaction.join, and only then does your transaction thread start, just by chance.
It runs until Thread.stop; then your $race_condition thread starts, because both other threads are blocked (it might have started earlier).
So that explains your log.
You have two calls to $transaction.join.
Each waits until the thread exits. Joining a thread that has already finished returns immediately, so the second call is harmless.
The join in the main thread is the real problem: it only returns once the transaction thread has finished, so the status check that follows it can never see 'sleep'.
For your test, you need some sort of explicit synchronization, so that your race condition thread writes exactly when the transaction thread is in the middle of the transaction. You can do this with a Mutex, but some sort of message passing would be better. The following blog post may help:
http://www.engineyard.com/blog/2011/a-modern-guide-to-threads/
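As a rough illustration of the message-passing idea, here is a minimal sketch in plain Ruby that uses Queue objects as signals between two threads. Everything in it is an assumption made for illustration: the Mutex named row_lock merely stands in for the database row lock, and the thread and queue names are made up. In a real test the holder would call inventory.lock! inside a transaction and the contender would attempt its write on a separate connection.

```ruby
lock_taken  = Queue.new  # holder -> contender: "the lock is held now"
may_release = Queue.new  # contender -> holder: "I have made my attempt"
events      = Queue.new  # thread-safe record of what happened, in order

row_lock = Mutex.new     # stand-in for the database row lock (assumption)

holder = Thread.new do
  row_lock.synchronize do
    events << :lock_acquired
    lock_taken << true   # tell the contender it can try now
    may_release.pop      # block here, still holding the lock
    events << :holder_done
  end
end

contender = Thread.new do
  lock_taken.pop         # wait until the holder really has the lock
  # try_lock returns false immediately if another thread holds the lock
  events << (row_lock.try_lock ? :write_succeeded : :write_blocked)
  may_release << true    # let the holder finish
end

[holder, contender].each(&:join)

log = []
log << events.pop until events.empty?
puts log.inspect
```

Because each thread blocks on Queue#pop until the other has signalled, the order of events no longer depends on how the scheduler happens to interleave the threads.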

To make access to a resource mutually exclusive, use the Mutex class and its synchronize method, which locks the resource while one thread is using it. You have to do something like this:
semaphore = Mutex.new
and use it inside the Thread instance.
$transaction = Thread.new {
  semaphore.synchronize {
    # Do whatever you want with *your shared resource*
  }
}
This way only one thread at a time can touch the shared resource, which prevents race conditions.
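A small self-contained sketch of the same pattern, with a shared counter standing in for the shared resource (the counter and the thread count are made up for illustration). Note that a Ruby Mutex only protects data inside one process; it is not a database lock.

```ruby
counter   = 0
semaphore = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # Only one thread at a time may run this block
      semaphore.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter
```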
Hope this helps.

Related

Retry Sidekiq worker from within worker

In my app I am trying to perform two worker tasks sequentially.
First, a PDF is created with Wicked PDF; then, once the PDF exists, an email is sent to two different recipients with the PDF attached.
This is what is called in the controller:
PdfWorker.perform_async(@d.id)
MailingWorker.perform_in(1.minutes, @d.id, @d.class.name.to_s)
First worker creates the PDF and second worker sends email.
Here is second worker :
class MailingWorker
  include Sidekiq::Worker
  sidekiq_options retry: false
  def perform(d_id,model)
    @d = eval(model).find(d_id)
    @model = model
    if @d.pdf.present?
      ProfessionnelMailer.notification_d(@d).deliver
      ClientMailer.notification_d(@d).deliver
    else
      MailingWorker.perform_in(1.minutes, @d.id, @model.to_s)
    end
  end
end
The if statement checks whether the PDF has been created. If it has, the two mails are sent; otherwise the same worker is enqueued again one minute later, to give the Heroku server extra time to create the PDF in case processing is slow or the queue is long.
However, if PDF creation has failed for good, the above ends up in an infinite loop.
Is there a way to fix this?
One option I see is calling the second worker inside the PDF creation worker, though I don't really want to nest workers too deep. Keeping them separate makes my controller clearer, because I can see the sequence of actions. But any advice is welcome.
Another option is to use sidekiq_options retry: 5 and request a retry that counts towards the total of 5 retries, instead of re-enqueuing the worker with else MailingWorker.perform_in(1.minutes, @d.id, @model.to_s), but I don't know how to do this. As per this thread https://github.com/mperham/sidekiq/issues/769 it would mean raising an exception, but I am not sure how to do this. (I am also not sure how long the retry will wait before being processed with the exception method; with the solution above I can control the time frame.)
If you do not want nested workers, then in MailingWorker, instead of enqueuing it again, raise an exception when the PDF is not present.
Also, configure the worker's retry option so that Sidekiq pushes the job to the retry queue and runs it again after some time. According to the documentation,
Sidekiq will retry failures with an exponential backoff using the
formula (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1)) (i.e.
15, 16, 31, 96, 271, ... seconds + a random amount of time). It will
perform 25 retries over approximately 21 days.
The worker code would look more like this:
class MailingWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5
  def perform(d_id,model)
    @d = eval(model).find(d_id)
    @model = model
    if @d.pdf.present?
      ProfessionnelMailer.notification_d(@d).deliver
      ClientMailer.notification_d(@d).deliver
    else
      raise "PDF not present"
    end
  end
end
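To get a feel for how long retry: 5 would keep trying, the backoff formula quoted above can be evaluated in plain Ruby. This is only a sketch of the formula as quoted, not Sidekiq's own code: retry_delay is a hypothetical helper, and the jitter is passed in as a parameter so the deterministic part can be inspected on its own.

```ruby
# Hypothetical helper that evaluates the quoted formula.
# jitter stands in for the rand(30) term.
def retry_delay(retry_count, jitter: rand(30))
  (retry_count**4) + 15 + (jitter * (retry_count + 1))
end

# Delays for the five retries, with the random part set to zero
base_delays = (0...5).map { |n| retry_delay(n, jitter: 0) }

puts base_delays.inspect  # => [15, 16, 31, 96, 271]
puts "minimum total wait: #{base_delays.sum} seconds"
```

So with five retries, and ignoring the random component, the job gives up after roughly seven minutes of waiting, compared with the unbounded loop in the original worker.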
I believe the "correct" and most asynchronous way to do this is to have two queues, and two workers:
Queue 1: CreatePdfWorker
Queue 2: SendPdfWorker
When the CreatePdfWorker has generated the PDF, it then enqueues the SendPdfWorker with the newly generated PDF and recipients.
This way, each worker can work independently and pluck from the queue asynchronously, and you're not struggling against the design choices of Sidekiq.

Puma sleeps an important thread on boot of rails application

I am running Rails 3 with Ruby 2.3.3 on puma with postgresql. I have an initializer/twitter.rb file that starts a thread on boot with a streaming api for twitter. When I use rails server to start my application, the twitter streaming works and I can reach my website like normal. (If I do not put the streaming on a different thread, the streaming works but I can not view my application in the browser since the thread is blocked by the twitter stream). But when I use puma -C config/puma.rb to start my application, I get the following message that is telling me that my thread was found on startup and was put to sleep. How can I tell puma to let me run this thread in the background on startup?
initializer/twitter.rb
### START TWITTER THREAD ### if production
if Rails.env.production?
  puts 'Starting Twitter Stream...'
  Thread.start {
    twitter_stream.user do |object|
      case object
      when Twitter::Tweet
        handle_tweet(object)
      when Twitter::DirectMessage
        handle_direct_message(object)
      when Twitter::Streaming::Event
        puts "Received Event: #{object.to_yaml}"
      when Twitter::Streaming::FriendList
        puts "Received FriendList: #{object.to_yaml}"
      when Twitter::Streaming::DeletedTweet
        puts "Deleted Tweet: #{object.to_yaml}"
      when Twitter::Streaming::StallWarning
        puts "Stall Warning: #{object.to_yaml}"
      else
        puts "It's something else: #{object.to_yaml}"
      end
    end
  }
end
config/puma.rb
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['RAILS_MAX_THREADS'] || 5)
threads threads_count, threads_count
preload_app!
rackup DefaultRackup
port ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || 'development'
on_worker_boot do
  # Valid on Rails up to 4.1 the initializer method of setting `pool` size
  ActiveSupport.on_load(:active_record) do
    config = ActiveRecord::Base.configurations[Rails.env] ||
             Rails.application.config.database_configuration[Rails.env]
    config['pool'] = ENV['RAILS_MAX_THREADS'] || 5
    ActiveRecord::Base.establish_connection(config)
  end
end
Message on startup
2017-04-19T23:52:47.076636+00:00 app[web.1]: Connecting to database specified by DATABASE_URL
2017-04-19T23:52:47.115595+00:00 app[web.1]: Starting Twitter Stream...
2017-04-19T23:52:47.229203+00:00 app[web.1]: Received FriendList: --- !ruby/array:Twitter::Streaming::FriendList []
2017-04-19T23:52:47.865735+00:00 app[web.1]: [4] * Listening on tcp://0.0.0.0:13734
2017-04-19T23:52:47.865830+00:00 app[web.1]: [4] ! WARNING: Detected 1 Thread(s) started in app boot:
2017-04-19T23:52:47.865870+00:00 app[web.1]: [4] ! #<Thread:0x007f4df8bf6240@/app/config/initializers/twitter.rb:135 sleep> - /app/vendor/ruby-2.3.3/lib/ruby/2.3.0/openssl/buffering.rb:125:in `sysread'
2017-04-19T23:52:47.875056+00:00 app[web.1]: [4] - Worker 0 (pid: 7) booted, phase: 0
2017-04-19T23:52:47.865919+00:00 app[web.1]: [4] Use Ctrl-C to stop
2017-04-19T23:52:47.882759+00:00 app[web.1]: [4] - Worker 1 (pid: 11) booted, phase: 0
2017-04-19T23:52:48.148831+00:00 heroku[web.1]: State changed from starting to up
Thanks in advance for the help. I have looked at several other posts mentioning WARNING: Detected 1 Thread(s) started in app boot but the answers say to ignore the warning if the thread is not important. In my case, the thread is very important and I need this thread to not sleep.
From your code I think you have a bigger issue on your hands than a sleeping thread... which I guess might be caused by the fact that some things are misnamed and others are just not often considered when relying on a web framework.
In the world of servers, "workers" refer to forked processes that perform server related tasks, often accepting new connections and handling web requests.
BUT - fork doesn't duplicate threads! - the new process (the worker) starts with only one single thread, which is a copy of the thread that called fork.
This is because processes don't share memory (normally). Whatever global data you have in a process is private to that process (i.e., if you save connected websocket clients in an array, that array is different for each "worker").
This can't be helped, it's part of how the OS and fork are designed.
So, the warning is not something you can circumvent - it's an indication of a design flaw in the app(!).
For example, in your current design (assuming the thread wasn't sleeping), the handle_tweet method will only be called for the original server process and it won't be called for any worker process.
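The point about fork and threads can be checked with a few lines of plain Ruby. This is a minimal sketch, assuming a Unix-like system where Process.fork is available; the sleeping background thread stands in for the Twitter streaming thread, and the variable names are made up.

```ruby
# A long-lived background thread, similar in spirit to the streaming thread
background = Thread.new { sleep }

reader, writer = IO.pipe

pid = fork do
  reader.close
  # In the child, only the thread that called fork is still running
  writer.puts "#{Thread.list.size} #{background.alive?}"
  writer.close
  exit!(0)  # leave the child immediately, skipping exit hooks
end

writer.close
child_report = reader.read.strip
reader.close
Process.wait(pid)

parent_alive = background.alive?
background.kill

puts "child:  threads=#{child_report.split.first} background alive=#{child_report.split.last}"
puts "parent: background alive=#{parent_alive}"
```

The child reports a single thread and a dead background thread, while the same thread is still alive in the parent, which is why a thread started before the workers are forked does nothing inside them.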
If you're using pub/sub, you only need one twitter_stream connection for the whole app (no matter how many servers or workers your application has), so perhaps a twitter_stream process (or background app) would be better than a thread.
But if you're implementing handle_tweet in a process-specific way (i.e., by sending a message to every connected client saved in an array), you need to make sure every "worker" initiates a twitter_stream thread(!).
When I wrote Iodine (a different server than Puma), I handled these use cases using the Iodine.run method, which defers tasks for later. The "saved" task should be performed only after the workers are initialized and the event loop starts running, so it's performed in each process (allowing you to start a new thread in each process).
i.e.
Iodine.run do
  Thread.start do
    twitter_stream.user do |object|
      # ...
    end
  end
end
I assume Puma has a similar solution. From what I understand of the Puma Clustered-Mode Documentation, adding the following block to your config/puma.rb might help:
# config/puma.rb
on_worker_boot do
  Thread.start do
    twitter_stream.user do |object|
      # ...
    end
  end
end
Good luck!
EDIT: relating to the comment about twitter_stream using ActiveRecord
From the comments I gather that the twitter_stream callbacks store data in the database as well as handle "push" events or notices.
Although these two concerns are connected, they are very different from each other.
For example, twitter_stream callbacks should only store data in the database once. Even if your application grows to a billion users, you will only need to save the data in the database once.
This means that the twitter_stream callbacks should have their own dedicated process that runs only once, possibly separate from the main application.
At first, and as long as you limit your application to a single instance (only one server/application running), you might use fork together with the initializer/twitter.rb script, i.e.:
### START TWITTER PROCESS ### if production
if Rails.env.production?
  puts 'Starting Twitter Stream...'
  Process.fork do
    twitter_stream.user do |object|
      # ...
    end
  end
end
On the other hand, notifications should be addressed to a specific user on a specific connection owned by a specific process.
Hence, notifications should be a separate concern from the twitter_stream database update, and they should run in the background of every process, using the on_worker_boot (or Iodine.run) approach described above.
To achieve this, you might have on_worker_boot start a background thread that will listen to a pub/sub service such as Redis, while the twitter_stream callbacks "publish" updates to the pub/sub service.
This would allow each process to review the update and check if any of the connections it "owns" belongs to a client that should be notified of the update.
The way I'm reading your question, this doesn't look like an issue. A sleeping thread is different from a dead thread. Sleep just means that the thread is waiting idle, not consuming any CPU. If all else is hooked up properly, then as soon as the Twitter API delivers an event, it should wake the thread up, run whatever handler you've defined, and then go right back to sleep. Sleeping isn't "running in the background," but it is "waiting for something to happen (e.g. someone tweets @me) so I can run in the background."
A quick example to demonstrate this:
2.4.0 :001 > t = Thread.new { TCPServer.new(1234).accept ; puts "Got a connection! Dying..." }
=> #<Thread:0x007fa3941fed90@(irb):1 sleep>
2.4.0 :002 > t
=> #<Thread:0x007fa3941fed90@(irb):1 sleep>
2.4.0 :003 > t
=> #<Thread:0x007fa3941fed90@(irb):1 sleep>
2.4.0 :004 > TCPSocket.new 'localhost', 1234
=> #<TCPSocket:fd 35>
2.4.0 :005 > Got a connection! Dying...
t
=> #<Thread:0x007fa3941fed90@(irb):1 dead>
Sleeping just means "waiting for action."
Puma is a thread-based server, and is very particular about spinning threads up in its boot process, hence the warning about a thread started at app boot.
For what it's worth though, it's kind of weird to have a thread listening for updates from an api like that in a webserver. Maybe you should look into having a worker handle twitter events using something like Resque? Or maybe ActionCable is relevant to your use case?

Multithreading in Ruby in EC2 causing weird behavior

I have the following code that I run in a rake task in rails:
10.times do |i|
  Thread.new do
    puts "#{i}"
  end
end
When I run this locally, I get the following:
0
3
5
1
7
8
2
4
9
6 (with new lines)
However, when I run the same code in EC2 via the same rake task, it will print out maybe one or two lines, and then the task will terminate. I'm not sure why, but it seems my EC2 instance can't handle the multithreading for some reason.
Any insights why?
You've just been getting lucky locally - there is nothing that guarantees that your 10 threads will execute to completion before your program exits. If you want to wait for your threads then you must do so explicitly:
threads = 10.times.collect do |i|
  Thread.new do
    puts i
  end
end
threads.each(&:join)
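The join method blocks the calling thread until the specified thread has completed, and then returns the thread itself. If you need the return value of the thread's block, call Thread#value instead, which also waits for the thread to finish.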
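A short sketch of the difference between waiting for a thread and collecting its result: Thread#join hands back the thread object, while Thread#value waits and then returns whatever the block returned. The computation inside the threads is made up for illustration.

```ruby
threads = 3.times.map { |i| Thread.new { i * 10 } }

joined = threads.map(&:join)   # join returns each thread object
values = threads.map(&:value)  # value waits, then returns the block's result

puts joined == threads  # => true
puts values.inspect     # => [0, 10, 20]
```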

How can I ensure an operation runs before Rails exits, without using `at_exit`?

I have an operation that I need to execute in my Rails application before the app dies. Is there a hook I can utilize in Rails for this? Something similar to at_exit, I guess.
Ruby itself supports two hooks, BEGIN and END, which are run at the start of a script and as the interpreter stops running it.
See "What does Ruby's BEGIN do?" for more information.
The BEGIN documentation says:
Designates, via code block, code to be executed unconditionally before sequential execution of the program begins. Sometimes used to simulate forward references to methods.
puts times_3(gets.to_i)
BEGIN {
  def times_3(n)
    n * 3
  end
}
The END documentation says:
Designates, via code block, code to be executed just prior to program termination.
END { puts "Bye!" }
Okay so I am making no guarantees as to impact because I have not tested this at all but you could define your own hook e.g.
ObjectSpace.define_finalizer(YOUR_RAILS_APP::Application, proc {puts "exiting now"})
Note this will execute after at_exit so the rails application server output will look like
Stopping ...
Exiting
exiting now
With Tin Man's solution included
ObjectSpace.define_finalizer(YOUR_RAILS_APP::Application, proc {puts "exiting now"})
END { puts "exiting again" }
Output is
Stopping ...
Exiting
exiting again
exiting now
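The behaviour of exit hooks can also be tried out in plain Ruby, outside Rails. The sketch below runs a tiny script in a child interpreter and captures what it prints. It assumes the current interpreter can be launched through RbConfig.ruby; the script contents and messages are made up for illustration. It shows that the hooks run after the main body and that at_exit handlers run in reverse order of registration.

```ruby
require "open3"
require "rbconfig"

# Script executed in a separate Ruby process so its exit hooks
# do not interfere with the current one.
script = <<~'RUBY'
  at_exit { puts "at_exit registered first" }
  at_exit { puts "at_exit registered second" }
  END { puts "END block" }
  puts "main body"
RUBY

output, status = Open3.capture2(RbConfig.ruby, "-e", script)
lines = output.lines.map(&:chomp)

puts lines
```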

Rails threading - multiple tasks

I am trying to run multiple tasks, each of which accesses the database, and I am trying to run the tasks in separate threads.
I played around and tried allow_concurrency (which I set to true) and config.thread_safe!, but I get non-deterministic errors: for example, sometimes a class is missing, or a constant.
Here is some code:
grabbers = get_grabber_name_list
threads = []
grabbers.each { |grabber|
  threads << Thread.new {
    ARGV[0] = grabber
    if (@@last_run_timestamp[grabber.to_sym].blank? || (@@last_run_timestamp[grabber.to_sym] >= AbstractGrabber.aff_net_accounts(grabber, "grab_interval").seconds.ago))
      Rake::Task["aff_net:import:" + grabber].execute
      @@last_run_timestamp.merge!({grabber.to_sym => Time.now})
    end
  }
}
threads.each {|t| t.join }
thanks
I've recently implemented a Rails application that uses threads and made a few discoveries:
First, if you're writing to any arrays or hashes (i.e., complex types) outside your thread, wrap them in a mutex. It looks to me like hash and array references may not be thread safe. It seems unlikely that hash/array element indexing isn't thread safe but all I know is that after I put the external data structures in a mutex before writing, problems disappeared.
Second, close your ActiveRecord connection when the thread terminates, otherwise you can end up creating a large number of stale connections. Here's a post about how to do this. I don't know if it still applies for Rails versions > 2.2 but after I started closing connections explicitly, my problems disappeared. The author suggests monkey-patching ActiveRecord to do this automatically but I decided to release connections explicitly in my code.
Here's a sample of code that's working for me:
mutex = Mutex.new
my_array = []
threads = []
1.upto(10) do |i|
  threads << Thread.new {
    begin
      do_some_stuff
      mutex.synchronize {
        # You'd think that each thread would only touch its own personal
        # array element but without a mutex, I run into problems.
        my_array[i] = some_computed_value
      }
    ensure
      # Return this thread's connection to the pool when the thread ends
      ActiveRecord::Base.connection_pool.release_connection
    end
  }
end
threads.each {|t| t.join}
By the way, if you're using threads to take advantage of multi-core CPUs, you'll need an implementation such as JRuby. As far as I know, the standard Ruby interpreter (MRI) lets only one thread run Ruby code at a time because of its global interpreter lock, whereas JRuby can run threads in parallel on several cores. If you use threads so you can do other things while waiting on network connections or some other non-CPU task, this isn't an issue.
You should probably do this using background workers. There are a few options for background worker libraries, but my favourite is delayed_job (http://github.com/tobi/delayed_job).
It should be pretty easy to convert the code you posted into background jobs.

Resources