Are rails timers reliable when using Net::HTTP? - ruby-on-rails

When reading data from a potentially slow website, I want to ensure that get_response cannot hang, so I added a timer to time out after x seconds. So far, so good. I then read http://ph7spot.com/musings/system-timer, which illustrates that in certain situations timeout.rb doesn't work due to Ruby's implementation of threads.
Does anyone know if this is one of these situations?
url = URI.parse(someurl)
begin
  Timeout::timeout(30) do
    response = Net::HTTP.get_response(url)
    #responseValue = CGI.unescape(response.body)
  end
rescue Exception => e
  dosomething
end

Well, first of all, Timeout is not defined in Rails but in Ruby's standard library. Second, Timeout is not reliable in cases where you make system calls.
Ruby 1.8 uses so-called green threads. Suppose you have 3 threads: you might think all of them run in parallel, but if one of the threads makes a syscall, all the other threads are blocked until the syscall finishes. In that case Timeout won't work as expected, so it's always better to use something reliable like SystemTimer.
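A different way to bound the request, not mentioned in the answer above, is to rely on the timeouts that Net::HTTP itself exposes (open_timeout and read_timeout) rather than wrapping the call in Timeout. The following is only a minimal sketch: fetch_with_limits is a hypothetical helper name and the 5 and 10 second limits are placeholder values.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetches a URL with separate connect and read limits.
# Returns the response, or nil if either limit is exceeded.
def fetch_with_limits(raw_url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(raw_url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: open_timeout,   # seconds allowed for connecting
                  read_timeout: read_timeout) do |http| # seconds allowed per read
    http.get(uri.request_uri)
  end
rescue Net::OpenTimeout, Net::ReadTimeout => e
  warn "request to #{raw_url} timed out: #{e.class}"
  nil
end

# Usage (placeholder URL):
# response = fetch_with_limits('http://example.com/')
```

Note that read_timeout applies to each individual read, so it is not a hard cap on the total duration of the request.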

Related

Is there a way to exit a thread in a Parallel.foreach loop which is stuck

I am using OmniThread Parallel.foreach(). There are instances where the loop takes a long time or gets stuck.
I would like to know, is it possible to timeout each process in the Parallel.foreach() loop?
In short: no, there isn't.
Unless, that is, you program the timeout handling into your 'thread body' code (the code that gets called in the execute).
E.g. my database engine allows sending a CancelProcessing call to a running query from a thread other than the one running the query; this 'cleanly' ends the running subthread.
'Dirty' end of the subthreads:
I added a feature request on OmniThread's GitHub site to add a (dirty) Terminate method to the IOmniParallel interfaces (and the like). It has a drawback: killing subthreads will probably leave you with memory/resource leaks.
Meanwhile you might use this dirty shutdown solution/workaround, which actually comes down to fixing a similar problem (I had a deadlock in my parallel processed routine, so my parallel.Waitfor never returned true, and worse, my IOmniParallelTask interface variable was never released, causing the calling thread to block as well).

Is this Ruby code using threads, thread pools, and concurrency correctly

I am on what I now consider part 3 of completing a task: pinging a very large list of URLs (numbering in the thousands) and retrieving the x509 certificate associated with each URL. Part 1 is here (How do I properly use threads to ping a URL) and Part 2 is here (Why won't my connection pool implement my thread code).
Since I asked these two questions, I have now ended up with the following code:
###### This is the code that pings a url and grabs its x509 cert #####
require 'socket'
require 'openssl'

class SslClient
  attr_reader :url, :port, :timeout

  def initialize(url, port = '443')
    @url = url
    @port = port
  end

  def ping_for_certificate_info
    context = OpenSSL::SSL::SSLContext.new
    tcp_client = TCPSocket.new(url, port)
    ssl_client = OpenSSL::SSL::SSLSocket.new tcp_client, context
    ssl_client.hostname = url
    ssl_client.sync_close = true
    ssl_client.connect
    certificate = ssl_client.peer_cert
    verify_result = ssl_client.verify_result
    tcp_client.close
    {certificate: certificate, verify_result: verify_result }
  rescue => error
    {certificate: nil, verify_result: nil }
  end
end
What is paramount in the above code is that I retrieve the ssl_client.peer_cert. Below is the snippet that makes multiple pings to URLs for their certs:
pool = Concurrent::CachedThreadPool.new
pool.post do
  [LARGE LIST OF URLS TO PING].each do |struct|
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end
pool.shutdown
pool.wait_for_termination
#Do some rails code with the database depending on the results.
So far when I run this code, it is unbelievably slow. I thought that by creating a thread pool with threads, the code would go much faster. That doesn't seem the case and I'm not sure why. A lot of it was because I didn't know the nuances of threads, pools, starvation, locks, etc. However, after implementing the above code, I read some more to try to speed it up and once again I'm confused and could use some clarification as to how I can make the code faster.
For starters, from this excellent article (ruby-concurrency-parallelism) we get the following definitions and concepts:
Concurrency vs. Parallelism
These terms are used loosely, but they do have distinct meanings.
Concurrency: The art of doing many tasks, one at a time. By switching
between them quickly, it may appear to the user as though they happen
simultaneously.
Parallelism: Doing many tasks at literally the same
time. Instead of appearing simultaneous, they are simultaneous.
Concurrency is most often used for applications that are IO heavy. For
example, a web app may regularly interact with a database or make lots
of network requests. By using concurrency, we can keep our application
responsive, even while we wait for the database to respond to our
query.
This is possible because the Ruby VM allows other threads to run while
one is waiting during IO. Even if a program has to make dozens of
requests, if we use concurrency, the requests will be made at
virtually the same time.
Parallelism, on the other hand, is not currently supported by Ruby.
So from this piece of the article, I understand that what I want to do needs to be done concurrently because I am pinging URLs on the network and that Parallelism is not currently supported by Ruby.
Next is where things get confusing for me. In a comment on my part 1 question on Stack Overflow, I was told to do the following:
Use a thread pool; don't just create a thousand concurrent threads. For something like
connecting to a URL where there will be a lot of waiting you can
oversubscribe the number of threads per CPU core, but not by a huge
amount. You'll have to experiment.
Another user says this:
You'd not spawn thousands of threads, use a connection pool
(e.g https://github.com/mperham/connection_pool) so you have maximum
20-30 concurrent requests going (this maximum number should be
determined by testing at which point network performance drops and you
get these timeouts)
So for this part, I turned to concurrent-ruby and implemented both a CachedThreadPool and a FixedThreadPool with 10 threads. I chose a CachedThreadPool because it seemed to me that the number of threads needed would be taken care of for me by the thread pool. Now in concurrent-ruby's documentation for a pool, I see this:
pool = Concurrent::CachedThreadPool.new
pool.post do
  # some parallel work
end
I thought we just established in the first article that parallelism is not supported in Ruby, so what is the thread pool doing? Is it working concurrently or in parallel? What exactly is going on? Do I need a thread pool or not? Also at this point in time I thought connection pools and thread pools were the same just used interchangeably. What is the difference between the two pools and which one do I need?
Another excellent article, How to Perform Concurrent HTTP Requests in Ruby and Rails, introduces the Concurrent::Promise class from concurrent-ruby to avoid locks and get thread safety with two API calls. Here is a snippet of code with the following description:
def get_all_conversations
  groups_thread = Thread.new do
    get_groups_list
  end
  channels_thread = Thread.new do
    get_channels_list
  end
  [groups_thread, channels_thread].map(&:value).flatten
end
Every request is executed in its own thread, which can run in parallel because it is a blocking I/O. But can you see a catch here?
The above quote is another mention of parallelism, which we just said didn't exist in Ruby. Below is the approach with Concurrent::Promise:
def get_all_conversations
  groups_promise = Concurrent::Promise.execute do
    get_groups_list
  end
  channels_promise = Concurrent::Promise.execute do
    get_channels_list
  end
  [groups_promise, channels_promise].map(&:value!).flatten
end
So according to this article, these requests are being made 'in parallel'. Are we still talking about concurrency at this point?
Finally, in these two articles, they talk about using Futures for concurrent http requests. I won't go into the details but I'll paste the links here.
1. Using Concurrent Ruby in a Ruby on Rails Application
2. Learn Concurrency by Implementing Futures in Ruby
Again, what's talked about in the article looks to me like the Concurrent::Promise functionality. I just want to note that the examples show how to use the concepts for two different API calls that need to be combined together. This is not what I need. I just need to make thousands of API calls fast and log the results.
In conclusion, I just want to know what I need to do to make my code faster and thread safe to make it run concurrently. What exactly am I missing to make the code go faster because right now it is going so slow that I might as well not have used threads in the first place.
Summary
I have to ping thousands of URLs using threads to speed up the process. The code is slow and I am confused if I am using threads, thread pools, and concurrency correctly.
Let us look at the problems you have described and try to solve these one at a time:
You have two pieces of code, SslClient and the script which uses this ssl client. From my understanding of the threadpool, the way you have used the threadpool needs to be changed a bit.
From:
pool = Concurrent::CachedThreadPool.new
pool.post do
  [LARGE LIST OF URLS TO PING].each do |struct|
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end
pool.shutdown
pool.wait_for_termination
to:
pool = Concurrent::FixedThreadPool.new(10)
[LARGE LIST OF URLS TO PING].each do |struct|
  pool.post do
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end
pool.shutdown
pool.wait_for_termination
In the initial version, there is only one unit of work that is posted to the pool. In the second version, we are posting as many units of work to the pool as there are items in LARGE LIST OF URLS TO PING.
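The difference between posting one big unit of work and posting one unit per item can be illustrated without any gem. The sketch below is a stdlib-only stand-in for a fixed-size pool, not concurrent-ruby's implementation; process_in_pool and worker_count are hypothetical names, and the block passed in stands in for the per-URL certificate lookup.

```ruby
# Stdlib-only sketch of a fixed-size worker pool: every item becomes its own
# unit of work on a shared queue, and worker_count threads drain that queue.
def process_in_pool(items, worker_count: 10)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once the queue is closed and empty, pop returns nil

  results = Array.new(items.size)
  workers = Array.new(worker_count) do
    Thread.new do
      while (job = queue.pop)
        item, index = job
        results[index] = yield(item) # each worker writes to its own slot
      end
    end
  end
  workers.each(&:join)
  results
end

# Placeholder host names; host.length stands in for the real per-URL work.
hosts = %w[a.example bb.example ccc.example dddd.example]
lengths = process_in_pool(hosts, worker_count: 4) { |host| host.length }
puts lengths.inspect
```

Results are stored by index, so the output order matches the input order regardless of which worker finishes first.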
To add a bit more about concurrency vs. parallelism in Ruby: it is true that Ruby (MRI) doesn't support true parallelism due to the GIL (Global Interpreter Lock), but this only applies while we are actually doing work on the CPU. In the case of a network request, the CPU-bound work is negligible compared to the IO-bound work, which means that your use case is a very good candidate for using threads.
Also, by using a thread pool we can minimize the overhead of thread creation incurred by the CPU. With a bounded pool such as Concurrent::FixedThreadPool.new(10), we literally restrict the number of threads available in the pool. With an unbounded thread pool, a new thread is created every time a unit of work arrives while the rest of the threads in the pool are busy.
In the first article, there was a need to collect the result returned by each individual worker and also to act meaningfully in case of an exception (I am the author). You should be able to use the class given in that blog without any change.
Let's try rewriting your code using Concurrent::Future since in your case, too, we need the results.
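The point about waiting versus CPU work can be checked with a small timing experiment. This is only a sketch: sleep stands in for a blocking network call (an assumption made so the example needs no network), and timed and fake_request are hypothetical names.

```ruby
# Measures how long the given block takes, in seconds.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Stand-in for a blocking network request: the thread just waits.
fake_request = -> { sleep 0.2 }

# Five waits one after another.
sequential = timed { 5.times { fake_request.call } }

# Five waits in five threads: the waits overlap despite the GIL.
threaded = timed do
  Array.new(5) { Thread.new { fake_request.call } }.each(&:join)
end

puts format('sequential: %.2fs, threaded: %.2fs', sequential, threaded)
```

The threaded run should finish in roughly the time of a single wait, because a waiting thread gives up the lock and lets the others proceed.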
thread_pool = Concurrent::FixedThreadPool.new(20)
executors = [LARGE LIST OF URLS TO PING].map do |struct|
  Concurrent::Future.execute({ executor: thread_pool }) do
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
    struct
  end
end
executors.map(&:value)
I hope this helps. In case of questions, please ask in comments, I shall modify this write up to answer those.

Ruby (rails) non-blocking recursive algorithm?

I've written the following pseudo-ruby to illustrate what I'm trying to do. I've got some computers, and I want to see if anything's connected to them. If nothing is connected to them, try again for another two attempts, and if that's still the case, shut it down.
This is for a big deployment so this recursive timer could be running for hundreds of nodes. I just want to check, is this approach sound? Will it generate tonnes of threads and eat up lots of RAM while blocking the worker processes? (I expect it will be running as a delayed_job)
check_status(0)

def check_status(i)
  if instance.connected.true? then return
  if instance.connected.false? and i < 3
    wait.5.minutes
    instance.check_status(i+1)
  else
    instance.shutdown
    return
  end
end
There is not going to be a large problem when the maximum recursion depth here is 3. It should be fine. Recursing a method does not create threads, but each call does store more information about the call stack, and eventually the resources used for that storage could run out. Not after 3 calls though, that is quite safe.
However, there is no need for recursion to solve your problem. The following loop should do just as well:
def check_status
  return if instance.connected.true?
  2.times do
    wait.5.minutes
    return if instance.connected.true?
  end
  instance.shutdown
end
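The loop above is still pseudo-ruby, so here is a runnable sketch of the same idea. FakeInstance is a hypothetical stand-in for the question's instance object, and the retries and wait_seconds parameters are assumptions added so the waiting time can be shortened when trying it out.

```ruby
# Hypothetical stand-in for the question's `instance`: it answers connected?
# from a scripted list of results and records whether it was shut down.
class FakeInstance
  attr_reader :shut_down

  def initialize(connection_results)
    @connection_results = connection_results
    @shut_down = false
  end

  # Returns the next scripted answer, or false once the script runs out.
  def connected?
    @connection_results.shift || false
  end

  def shutdown
    @shut_down = true
  end
end

# One initial check plus `retries` further checks, then shut down.
def check_status(instance, retries: 2, wait_seconds: 300)
  return :connected if instance.connected?

  retries.times do
    sleep wait_seconds
    return :connected if instance.connected?
  end

  instance.shutdown
  :shut_down
end

idle = FakeInstance.new([false, false, false])
puts check_status(idle, wait_seconds: 0.01) # prints "shut_down"

busy = FakeInstance.new([false, true])
puts check_status(busy, wait_seconds: 0.01) # prints "connected"
```

No recursion is involved, so the stack depth stays constant however many retries are configured.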
You got answers from other users already. However, since you are waiting 5 minutes at least two times, you might consider changing the design or using another language.
Ruby (MRI) has a global interpreter lock, which restricts parallel execution of Ruby code, so MRI is not truly parallel and you risk being inefficient here.
Consider using threads (a thread pool of reasonable size might make sense), probably fed by a queue of tasks.
Make sure you don't busy-wait for 5 minutes. Instead, put the threads to sleep for that time, so that other threads can execute while some are sleeping/waiting.
You could also consider using JRuby, since JRuby has true parallelism (MRI is restricted by the GIL, thus it is not truly parallel).
Consider using another programming language that might be more performant.
If it's running via delayed_job why not use the gem's functionality to implement what you want? I, for one, would go for something like the following. No need to sleep the delayed jobs or anything.
class CheckStatusJob
  def before(job)
    @job = job
  end

  def perform
    return if instance.connected.true?

    # attempts counts earlier failures, so it is 0 on the first run
    if @job.attempts < max_attempts - 1
      raise 'The job failed!'
    else
      instance.shutdown
    end
  end

  def max_attempts
    3
  end

  def reschedule_at(current_time, attempts)
    current_time + 5.minutes
  end
end

Can ruby exceptions be handled asynchronously outside of a Thread::handle_interrupt block?

At first glance, I thought the new ruby 2.0 Thread.handle_interrupt was going to solve all my asynchronous interrupt problems, but unless I'm mistaken I can't get it to do what I want (my question is at the end and in the title).
From the documentation, I can see how I can avoid receiving interrupts in a certain block, deferring them to another block. Here's an example program:
duration = ARGV.shift.to_i
t = Thread.new do
  Thread.handle_interrupt(RuntimeError => :never) do
    5.times { putc '-'; sleep 1 }
    Thread.handle_interrupt(RuntimeError => :immediate) do
      begin
        5.times { putc '+'; sleep 1}
      rescue
        puts "received #{$!}"
      end
    end
  end
end
sleep duration
puts "sending"
t.raise "Ka-boom!"
if t.join(20 + duration).nil?
  raise "thread failed to join"
end
When run with argument 2 it outputs something like this:
--sending-
--received Ka-boom!
That is, the main thread sends a RuntimeError to the other thread after two seconds, but that thread doesn't handle it until it gets into the inner Thread.handle_interrupt block.
Unfortunately, I don't see how this can help me if I don't know where my thread is getting created, because I can't wrap everything it does in a block. For example, in Rails, what would I wrap the Thread.handle_interrupt or begin...rescue...end blocks around? And wouldn't this differ depending on what webserver is running?
What I was hoping for is a way to register a handler, like the way Kernel.trap works. Namely, I'd like to specify handling code that's context-independent that will handle all exceptions of a certain type:
register_handler_for(SomeExceptionClass) do
  ... # handle the exception
end
What precipitated this question was how the RabbitMQ gem, bunny, sends connection-level errors to the thread that opened the Bunny::Session using Thread#raise. These exceptions could end up anywhere and all I want to do is log them, flag that the connection is unavailable, and continue on my way.
Ideas?
Ruby provides for this with the Ruby Queue object (not to be confused with an AMQP queue). It would be nice if Bunny required you to create a Ruby Queue before opening a Bunny::Session, and you passed it that Queue object, to which it would send connection-level errors instead of using Thread#raise to send them back to wherever. You could then simply provide your own Thread to consume messages through the Queue.
It might be worth looking inside the RabbitMQ gem code to see if you could do this, or asking the maintainers of that gem about it.
In Rails this is not likely to work unless you can establish a server-wide thread to consume from the ruby Queue, which of course would be web server specific. I don't see how you can do this from within a short-lived object, e.g. code for a Rails view, where the threads are reused but Bunny doesn't know that (or care).
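The Queue-based design described in this answer can be sketched as follows. This illustrates the idea only: FakeSession and simulate_connection_failure are hypothetical, and Bunny does not provide such an interface.

```ruby
# Hypothetical session that reports connection-level errors by pushing them
# onto a Queue instead of calling Thread#raise on the opening thread.
class FakeSession
  def initialize(error_queue)
    @error_queue = error_queue
  end

  # Simulates the library noticing a connection-level failure.
  def simulate_connection_failure(message)
    @error_queue << RuntimeError.new(message)
  end
end

error_queue = Queue.new
logged = []
connection_available = true

# Dedicated consumer thread: logs each error and flags the connection.
consumer = Thread.new do
  while (error = error_queue.pop) # pop returns nil once the queue is closed and empty
    logged << "connection error: #{error.message}"
    connection_available = false
  end
end

session = FakeSession.new(error_queue)
session.simulate_connection_failure('Ka-boom!')

error_queue.close # lets the consumer finish after draining the queue
consumer.join

puts logged.inspect
puts connection_available
```

Because errors arrive as ordinary objects on the queue, the handling code runs at a point the application chooses rather than wherever the session thread happened to be.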
I'd like to raise (ha-ha!) a pragmatic workaround. Here be dragons. I'm assuming you're building an application and not a library to be redistributed, if not then don't use this.
You can patch Thread#raise, specifically on your session thread instance.
module AsynchronousExceptions
  @exception_queue = Queue.new

  class << self
    attr_reader :exception_queue
  end

  def raise(*args)
    # We do this dance to capture an actual error instance, because
    # raise may be called with no arguments, a string, 3 arguments,
    # an error, or any object really. We want an actual error.
    # NOTE: This might need to be adjusted for proper stack traces.
    error = begin
      Kernel.raise(*args)
    rescue => error
      error
    end
    AsynchronousExceptions.exception_queue.push(error)
  end
end

session_thread = Thread.current
session_thread.singleton_class.prepend(AsynchronousExceptions)
Bear in mind that exception_queue is essentially a global. We also patch for everybody, not just the reader loop. Luckily there are few legitimate reasons to call Thread#raise, so you might just get away with this safely.

Problem with nested timeouts in ruby over system calls

I ran into this weird issue while working with Ruby (on Rails) timeouts. This timeout
timeout(10) do
  # some code involving http calls that takes more than 10 seconds
end
is not working. But this timeout
timeout(20) do
  timeout(10) do
    # some code involving http calls that takes more than 10 seconds
  end
end
times out after 20 seconds. I read that timeout in Ruby won't work properly if it involves system calls. If that is the case, then any number of nested timeouts should also not work. Why would this work on the second timeout?
By the way, the link I referred to:
http://ph7spot.com/musings/system-timer
Thanks in advance
You might have better luck using a combination of timeout and terminator to do this sort of thing.
One of the known deficiencies of the timeout method is it's not always strictly enforced and many things can block it.
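For comparison, this is how nested timeouts behave when the wrapped code is interruptible. In this sketch a plain sleep stands in for the HTTP call, which is an assumption, and the limits are shortened to fractions of a second. It uses only the standard Timeout module, not the terminator gem mentioned above.

```ruby
require 'timeout'

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
timed_out = false

begin
  Timeout.timeout(2) do     # outer limit
    Timeout.timeout(0.2) do # inner limit, expected to fire first
      sleep 1               # stand-in for a slow but interruptible call
    end
  end
rescue Timeout::Error
  timed_out = true
end

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format('timed out: %s after %.2fs', timed_out, elapsed)
```

If the inner limit is ignored and only the outer one fires, as in the question, that suggests the wrapped call could not be interrupted at the moment the inner timer expired.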
