I saw some retry code written like this. It tries to call a service up to 3 times if an exception is raised. I'm trying to understand whether, in a non-MRI multi-threaded server, this counter is thread safe. Is it necessary to lock the process using a Mutex?
This is how it's called:
MyClass.new.my_method

class MyClass
  def my_method
    counter = 3
    begin
      call_some_service_raise_some_exception
    rescue SomeException => e
      counter -= 1 # without this decrement the retry would loop forever
      retry if counter.positive?
    end
  end
end
Assuming the variable counter is scoped to that method only, and that there are no shenanigans going on with a Singleton or anything similar, then yes, that method should be thread safe in its current form.
If, however, counter is an instance variable and you are using an accessor to set it, then that method is not thread safe. You may never encounter the race condition if you're using every MyClass once only, but all it takes is one well-meaning refactoring to reuse MyClass and suddenly you've got a race condition there.
Basically - if your method is self-contained, uses variables scoped to it only, and references no external shared data then it is thread safe by default.
As soon as you use something that could be accessed at the same time by another thread, you have a potential race condition in the making and you should start thinking about synchronising access to the shared resources.
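A minimal sketch of that distinction, under stated assumptions: `TransientError`, `call_with_retries` and `SharedTally` are hypothetical names, not part of the original code. The retry counter is a local variable, so each thread has its own copy, while the tally is shared between threads and is therefore guarded by a Mutex.

```ruby
# Hypothetical error class standing in for SomeException.
class TransientError < StandardError; end

# Retries the given block up to max_attempts times.
# attempts_left is a local variable, so every invocation
# (and therefore every thread) gets its own copy.
def call_with_retries(max_attempts = 3)
  attempts_left = max_attempts
  begin
    yield
  rescue TransientError
    attempts_left -= 1
    retry if attempts_left.positive?
    raise
  end
end

# Shared state, by contrast, needs synchronised access.
class SharedTally
  attr_reader :count

  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @count += 1 }
  end
end

tally = SharedTally.new

results = 4.times.map do
  Thread.new do
    failures = 0 # local to this thread's block
    call_with_retries do
      tally.increment # shared object, protected by its Mutex
      if failures < 2
        failures += 1
        raise TransientError # fail the first two attempts
      end
      :ok
    end
  end
end.map(&:value)
```

Here every thread succeeds on its third attempt regardless of what the other threads are doing, because no thread can see another thread's `attempts_left`.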
It's been hammered into my head that I shouldn't use ThreadLocal with Reactor. But I want to know if I can use ThreadLocal within a single execution of a reactor function.
Specifically, when inside a Spring Webflux Controller method, can the thread ever change if I don't invoke a reactor function?
Please let me know if this is correct
@GetMapping
public Mono<String> someControllerMethod() {
    // Thread 1 executing
    ThreadLocal<String> USER_ID = new ThreadLocal<>();
    USER_ID.set("1");
    Thread.sleep(...);
    someMethod();
    // Thread 1 executing
    assertEquals(USER_ID.get(), "1"); // this will ALWAYS be true
    return Mono.just("hello ")
        // this is the only time a new thread executes and USER_ID is not set
        .flatMap(s -> s + USER_ID.get());
}

void someMethod() {
    // Thread 1 executing
    assertEquals(USER_ID.get(), "1"); // this will ALWAYS be true
}
Is my understanding above correct?
Revised this section for clarity
In a reactor chain of many operators, each operator (e.g. map) could be run on a different thread, and even different "instances?" (e.g. map of url N) of the same operator could be on different threads. But once we're in an instance of an operator, will it always be the same thread (i.e. is it safe to declare a ThreadLocal in an instance of a reactor operator)?
// main thread
Flux.fromIterable(urls)
    .map(url -> {
        // each of these instances runs on a different thread
        // but is declaring ThreadLocal here safe to do?
        ThreadLocal<String> URL = new ThreadLocal<>();
        URL.set(url);
        // Will URL always be set deep in the call stack?
        someOtherMethod();
        // Will URL always be set at the end?
        return URL.get();
    })
    .subscribeOn(Schedulers.boundedElastic())
    .subscribe();

void someOtherMethod() {
    URL.get(); // will this ALWAYS be set?
}
Basically, I'd like to know whether it's safe to use ThreadLocal objects like io.grpc.Context within a single instance of a Reactor operator execution.
It's been hammered into my head that I shouldn't use ThreadLocal with Reactor.
You mustn't use ThreadLocal in a reactive chain with reactor (which is the only sensible way to use that library.) In a reactive chain, the thread might change whenever you invoke an asynchronous operator - so a single reactive chain could have operations executing on many different threads throughout. In this case your ThreadLocal might work sometimes, but it's unreliable - introduce an async operator that switches the thread (say a web request that's executed on the netty worker pool), and you've then introduced a subtle and weird bug that's hard to track down (you're arbitrarily leaking information from one reactive chain to another unintentionally.) In short, it's incredibly bad practice to tie your reactive chains to a single thread - while it might seem to work initially, you're going to eventually run into a lot of problems if you do.
That being said, you don't really have a reactive chain in the above method - it's incredibly weird. If you're returning a Mono<String> to try to make the method reactive, then you need to be executing everything as part of a reactive chain. What you're actually doing is:
Using synchronous & blocking logic, a complete no-no as it ties up an event loop thread which isn't allowed;
Calling another method that's not part of a reactive chain;
Using a JUnit test method in a controller class;
Wrapping up a value to return in Mono.just();
Making one flatMap call at the end (which won't work as it's not even mapping to a publisher to flatten, you'd have to use map instead.)
...so while using your ThreadLocal is technically "safe" in this context, from a wider perspective the implementation makes no sense at all. You realistically have two options - either make the entire method non-blocking and reactive properly, not just wrapping blocking logic in a reactive publisher, or make the whole controller just return a standard object and forget the reactive element entirely.
Follow-up:
once we're in an instance of an operator, will it always be the same thread (i.e. is it safe to declare a ThreadLocal in an instance of a reactor operator)?
No, there are at least two cases I can think of where that wouldn't be safe:
Operators can be nested. Once you're "inside" a certain operator, there's no reason why other operators can't be used that would also switch thread.
Code in other threads can be explicitly started even if there's no operator.
I don't think you can wind up in cases where the thread changes under you other than those two, but I could well be missing something, and it's still a rather delicate scenario (someone could break it quite easily.) If you must use a ThreadLocal for some reason then I'd still be seriously considering whether you should be using reactor in this context.
I have this method in a class:
class User < ApplicationRecord
  ...
  def answers
    @answers ||= HTTParty.get("http://www.example.com/api/users/#{self.id}/answers.json")
  end
  ...
end
Since I'm using Puma as a web server, I'm wondering whether this code is thread safe. Can someone confirm it and, if possible, explain why it is thread safe?
This is an instance method, not to be confused with a class method. The answers method is on an instance of User, as opposed to being on the User class itself. It caches the answers on that User instance, but as long as the instance is instantiated within each web request (for example via User.find() or User.find_by()), you’re fine because the instance is not shared between threads. It’s common practice to look records up in the controller on every web request, so you’re likely doing that.
If this method was on the User class directly (such as User.answers), then you’d need to evaluate whether it’s safe for that cached value to be maintained across threads and web requests.
To recap, your only concerns for thread safety are class methods, class variables (variables that use two at signs, such as @@answers), and instance methods where the instance lives on past a single web request.
If you ever find yourself needing to use a class-level variable safely, you can use Thread.current, which is essentially a per-thread Hash (like {}) that you can store values in, for example Thread.current[:foo] = 1. ActiveSupport uses this when setting Time.zone.
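As a rough illustration of the per-thread storage described above, the sketch below has three threads write to the same key and read back only their own value. The key name `:request_id` is made up for the example.

```ruby
# Each thread stores its own value under the same key.
values = 3.times.map do |i|
  Thread.new do
    Thread.current[:request_id] = i
    sleep 0.01 # give the other threads a chance to run
    Thread.current[:request_id] # still this thread's own value
  end
end.map(&:value)

# The main thread never set the key, so it sees nothing.
main_value = Thread.current[:request_id]
```

One caveat worth knowing: strictly speaking `Thread.current[]` is fiber-local rather than thread-local. If fibers are in play, `Thread#thread_variable_set` and `Thread#thread_variable_get` give storage that is shared by all fibers of a thread.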
Alternatively, you may find times where you need a single array that is safely shared across threads, in which case you’d need to look into Mutex, which lets you lock and unlock access to the array so that threads can read and write it safely. The Sidekiq gem uses a Mutex to manage workers, for example. You lock the Mutex so that no one else can change the array, write to it, and then unlock the Mutex. It’s important to note that if any other thread wants to write to the array while the Mutex is locked, it’ll have to wait for it to become unlocked (the thread just pauses while the other thread writes), so it’s important to hold the lock for as short a time as possible.
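A small sketch of the lock, write, unlock pattern, assuming nothing beyond Ruby's standard library. The variable names are illustrative only. The work is done outside the lock and the Mutex is held just for the append, to keep the locked section short.

```ruby
shared = []
lock = Mutex.new

threads = 5.times.map do |i|
  Thread.new do
    100.times do |j|
      item = i * 100 + j # compute outside the lock
      # synchronize locks, runs the block, then unlocks,
      # even if the block raises.
      lock.synchronize { shared << item }
    end
  end
end
threads.each(&:join)
```

Using `synchronize` rather than separate `lock` and `unlock` calls avoids leaving the Mutex locked when an exception is raised part-way through.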
I have an extremely unusual case where I need to connect to a sqlite database using an anonymous class. This class is created on every request, and then connects to a database using ActiveRecord's establish_connection. My issue is that every time establish_connection is called, ActiveRecord creates a new connection pool to track the connections made through the class. Since these are one-time-use classes, this is effectively a memory leak: the number of connection pools tracked by ActiveRecord grows with each request. One way to solve this is to call
model_copy = Class.new(Model) { ... }
model_copy.establish_connection ...
# Do work
model_copy.connection.disconnect!
model_copy.connection_handler.remove_connection model_copy
I would like to do this without the explicit disconnect! and remove_connection calls because it is annoying and very prone to errors and memory leaks. Does anyone have any guidance here?
Thanks!
Assuming the # Do work part is the only part that changes, this looks like a classic use case for a method that takes a block:
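def hit_the_db
  model_copy = Class.new(Model) { ... }
  model_copy.establish_connection ...
  begin
    # Hand the temporary class to the caller's block.
    yield model_copy
  ensure
    # Runs even if the block raises, so the pool is always released.
    model_copy.connection.disconnect!
    model_copy.connection_handler.remove_connection model_copy
  end
end
...
hit_the_db { |model_copy| block of code that does work }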
I have a variable $proxyManager, which is a global instance of a class ProxyManager.
$proxyManager = ProxyManager.new
I have a function getProxy in this class, which is called many times by multiple threads. Inside this function, I am popping an item from an array (let's assume this is thread safe). I am then setting its identifier to the current time, setting an active flag, and returning it.
Is it possible for proxy.identifier to "change" in the same thread after it's been set? As an example, suppose Thread 1 sets the identifier to 1, and immediately afterwards Thread 2 executes the same line and sets it to 2. Does this mean Thread 1's identifier is now 2?
class ProxyManager
  def getProxy
    key = "proxy"
    proxy = popFromGlobalArray() # assume this operation is thread-safe/atomic
    proxy.identifier = Time.now.to_i
    proxy.active = 1
    return proxy
  end
end
It is not inherently thread-safe although it will depend upon exactly what is being done and to what. Also, the implementation - e.g. Ruby 1.8 MRI with "green threads" vs Ruby 2 MRI vs JRuby, etc - will play a role in how race conditions, if any, will materialize.
Remember that race conditions often result from shared data. The variables are not important (a thread will not use another thread's local variables any more than a recursive method will reuse variables), but the object named by the variables is important. (Note: proxy is a local variable, but proxy.identifier is not a local variable!)
Race condition assuming shared data/object:
proxy_A = popFromGlobalArray()
proxy_B = popFromGlobalArray()
# assume same object was returned so that proxy_A.equal? proxy_B is true
proxy_A.identifier = Time.now.to_i
proxy_A.active = 1
proxy_B.identifier = Time.now.to_i # such that it is different
proxy_B.active = 1
This isn't very exciting here, because the outcome at this point is the same, but imagine that the returned object (to which proxy_A and proxy_B both refer) has been used between the assignments (and thread visibility propagation) of the identifier attribute: the result is broken code.
That is, assume the above is expanded:
h = {}
# assume same object was returned so that proxy_A.equal? proxy_B is true
proxy_A.identifier = Time.now.to_i
h[proxy_A.identifier] = proxy_A # i.e. used after return
proxy_B.identifier = Time.now.to_i # such that it is different
h[proxy_B.identifier] = proxy_B # i.e. used after return
# now there may be an orphaned key/value.
Of course, if popFromGlobalArray is guaranteed to return different objects then the above does not apply, but there is another issue - a race condition dependent upon precision of time:
proxy_A = popFromGlobalArray()
proxy_B = popFromGlobalArray()
# assume Time.now.to_i returns x for both threads
proxy_A.identifier = x
proxy_B.identifier = x
# and a lost proxy ..
h = {}
h[proxy_A.identifier] = proxy_A
h[proxy_B.identifier] = proxy_B
Moral of the story: don't rely on luck. I've simplified the above to show race conditions that can occur with instant visibility of data between threads. However, data visibility between threads - i.e. lack of memory fences - makes this problem far worse than it may initially look.
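The time-precision collision above can be reproduced without any threads at all, which the first half of this sketch does. The second half shows one possible alternative (a swapped-in technique, not something from the question): a Mutex-guarded sequence that hands out unique identifiers. `Proxy` and `IdSequence` are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for the proxy objects in the question.
Proxy = Struct.new(:identifier, :active)

# Two distinct proxies stamped within the same second
# end up with the same identifier.
now = Time.now.to_i
proxy_a = Proxy.new(now, 1)
proxy_b = Proxy.new(now, 1)

by_id = {}
by_id[proxy_a.identifier] = proxy_a
by_id[proxy_b.identifier] = proxy_b # overwrites proxy_a: a lost proxy
collided_size = by_id.size

# One alternative: a sequence whose increment is guarded by a Mutex,
# so no two callers can receive the same identifier.
class IdSequence
  def initialize
    @next = 0
    @lock = Mutex.new
  end

  def next_id
    @lock.synchronize { @next += 1 }
  end
end

sequence = IdSequence.new
ids = 8.times.map do
  Thread.new { Array.new(50) { sequence.next_id } }
end.flat_map(&:value)
```

With the sequence, uniqueness no longer depends on clock resolution or on how closely together two threads happen to run.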
If popFromGlobalArray() functions correctly in a multithreaded environment and is guaranteed not to return the same object more than once, and the implementation of the proxy class does not share state between instances, the rest of the function should be fine. You aren't operating on the same data on different threads, so they can't conflict.
If you're worried about the variables themselves, you needn't be. Locals are defined per method invocation, and different threads will be running different invocations of the method. They don't share locals.
Obviously the specifics can make this less true, but this is how it generally works.
Local variables are defined per method invocation, so each thread has its own copies and they are thread safe.
If popping from your array is indeed atomic, the proxy variable is guaranteed to be a different item each time a thread enters this function, and other threads won't be able to overwrite the identifier.
My Rails 3.1 app uses an engine and I want to know if access to this engine is threadsafe.
I have /lib/mymodule.rb in the engine and it looks something like this:
module MyModule
  def self.my_method()
    begin
      data = WebResource.find(:all) # Where WebResource < ActiveResource::Base
    rescue
      data = nil
    end
    return data
  end
end
Then in my views/controllers, I call this method like this:
MyModule::WebResource.headers[:some_id] = cookies[:some_id]
MyModule::my_method()
In my main app, I have the threadsafe! configuration option set. I know that with threadsafe! enabled, each Controller lives in its own thread for each request.
However, is this Module threadsafe? I suspect that there is only one copy of this module for all requests, so it is not inherently thread safe, and requires manual synchronization using something like a Mutex. Specifically, I have code that sets the header for the HTTP request outside of the ActiveResource class WebResource. Could this cause a threading issue?
It will depend on what you do inside this method as to whether it is thread safe. If it touches no class variables, then it is thread safe.
If it stores or sets information at the class level and assumes that no other method is going to touch that information before it uses it again, then it is not thread safe.
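To make the second case concrete, here is a sketch in which a header is stored at the class level and then read by a later call, much like setting headers and then calling my_method in the question. `FakeResource`, `REQUEST_LOCK` and `fetch_for` are hypothetical names; this is not the ActiveResource API. Wrapping the "set, then use" pair in a Mutex prevents another thread from changing the header in between.

```ruby
# Hypothetical stand-in for a class that keeps headers at the class level.
class FakeResource
  @headers = {}

  class << self
    attr_reader :headers

    # Pretends to make a request using whatever header is currently set.
    def find_all
      sleep 0.01 # simulate network latency
      "fetched with id=#{headers[:some_id]}"
    end
  end
end

REQUEST_LOCK = Mutex.new

# Sets the shared header and uses it as one atomic step.
def fetch_for(some_id)
  REQUEST_LOCK.synchronize do
    FakeResource.headers[:some_id] = some_id
    FakeResource.find_all
  end
end

results = %w[a b c].map do |id|
  Thread.new { fetch_for(id) }
end.map(&:value)
```

The trade-off is that the lock serialises these requests. Where the library allows it, passing per-request data as an argument instead of storing it on the class avoids the shared state altogether.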