I have a variable $proxyManager, which is a global instance of a class ProxyManager.
$proxyManager = ProxyManager.new
I have a function getProxy in this class, which is called many times by multiple threads. Inside this function, I am popping an item from an array (let's assume this is thread safe). I am then setting its identifier to the current time, setting an active flag, and returning it.
Is it possible for proxy.identifier to "change" in the same thread after it's been set? As an example, suppose Thread 1 sets the identifier to 1, and immediately afterwards Thread 2 executes the same line and sets it to 2. Does this mean Thread 1's identifier is now 2?
class ProxyManager
  def getProxy
    key = "proxy"
    proxy = popFromGlobalArray() # assume this operation is thread-safe/atomic
    proxy.identifier = Time.now.to_i
    proxy.active = 1
    return proxy
  end
end
It is not inherently thread-safe although it will depend upon exactly what is being done and to what. Also, the implementation - e.g. Ruby 1.8 MRI with "green threads" vs Ruby 2 MRI vs JRuby, etc - will play a role in how race conditions, if any, will materialize.
Remember that race conditions often result from shared data. The variables are not important (a thread will not use another thread's local variables any more than a recursive method will reuse variables), but the object named by the variables is important. (Note: proxy is a local variable, but proxy.identifier is not a local variable!)
Race condition assuming shared data/object:
proxy_A = popFromGlobalArray()
proxy_B = popFromGlobalArray()
# assume same object was returned so that proxy_A.equal? proxy_B is true
proxy_A.identifier = Time.now.to_i
proxy_A.active = 1
proxy_B.identifier = Time.now.to_i # such that it is different
proxy_B.active = 1
This isn't very exciting here, because the outcome at this point is the same, but imagine that the returned object (which proxy_A and proxy_B both refer to) has been used between the two assignments of identifier (and before the change has become visible to the other thread) - broken code.
That is, assume the above is expanded:
h = {}
# assume same object was returned so that proxy_A.equal? proxy_B is true
proxy_A.identifier = Time.now.to_i
h[proxy_A.identifier] = proxy_A # i.e. used after return
proxy_B.identifier = Time.now.to_i # such that it is different
h[proxy_B.identifier] = proxy_B # i.e. used after return
# now there may be an orphaned key/value.
Of course, if popFromGlobalArray is guaranteed to return different objects then the above does not apply, but there is another issue - a race condition dependent upon precision of time:
proxy_A = popFromGlobalArray()
proxy_B = popFromGlobalArray()
# assume Time.now.to_i returns x for both threads
proxy_A.identifier = x
proxy_B.identifier = x
# and a lost proxy ..
h = {}
h[proxy_A.identifier] = proxy_A
h[proxy_B.identifier] = proxy_B
Moral of the story: don't rely on luck. I've simplified the above to show race conditions that can occur even with instant visibility of data between threads. However, delayed data visibility between threads - i.e. a lack of memory fences - makes this problem far worse than it may initially look.
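To make the time-precision collision concrete, here is a minimal sketch. StampedProxy and IdSource are made-up names for illustration and are not part of the question's code. The first half reproduces the lost entry without any threads; the second half shows one possible alternative, a Mutex-guarded counter, assuming all you need is a unique identifier rather than a timestamp.

```ruby
# Hypothetical stand-in for the proxy objects in the question.
StampedProxy = Struct.new(:name, :identifier)

now = Time.now.to_i                  # one-second resolution
proxy_a = StampedProxy.new("a", now)
proxy_b = StampedProxy.new("b", now) # same second, so same identifier

h = {}
h[proxy_a.identifier] = proxy_a
h[proxy_b.identifier] = proxy_b      # silently replaces the entry for proxy_a

lost = !h.value?(proxy_a)            # true: proxy_a is no longer reachable via h

# One possible fix (an assumption, not the asker's design):
# hand out identifiers from a counter protected by a Mutex.
class IdSource
  def initialize
    @lock = Mutex.new
    @last = 0
  end

  # The increment and the read happen under the lock, so no two
  # callers can ever receive the same value.
  def next_id
    @lock.synchronize { @last += 1 }
  end
end

id_source = IdSource.new
threads = 8.times.map { Thread.new { Array.new(100) { id_source.next_id } } }
all_ids = threads.flat_map(&:value)
unique_count = all_ids.uniq.size     # one id per call, no duplicates
```

If the identifier really has to carry a time, combining the timestamp with such a counter is another option, but that is a design choice beyond what the question states.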
If popFromGlobalArray() functions correctly in a multithreaded environment and is guaranteed not to return the same object more than once, and the implementation of the proxy class does not share state between instances, the rest of the function should be fine. You aren't operating on the same data on different threads, so they can't conflict.
If you're worried about the variables themselves, you needn't be. Locals are defined per method invocation, and different threads will be running different invocations of the method. They don't share locals.
Obviously the specifics can make this less true, but this is how it generally works.
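As a rough illustration of that claim, the sketch below runs the same method on several threads. Proxy, get_proxy and the pool argument are hypothetical stand-ins for the question's code; Queue is used because its pop is thread-safe and never hands out the same object twice.

```ruby
# Hypothetical stand-in for the proxy objects in the question.
Proxy = Struct.new(:identifier, :active)

# Each call gets its own `proxy` local, even when many threads
# are inside the method at the same time.
def get_proxy(pool, id)
  proxy = pool.pop       # thread-safe pop: a distinct object per call
  proxy.identifier = id
  sleep 0.01             # give the other threads a chance to interleave
  proxy.active = 1
  proxy
end

pool = Queue.new
4.times { pool << Proxy.new }

proxies = 4.times.map { |i| Thread.new { get_proxy(pool, i) } }.map(&:value)

identifiers = proxies.map(&:identifier).sort
distinct_objects = proxies.map(&:object_id).uniq.size
```

Every thread ends up with the identifier it set itself, because neither the local variable nor the object it refers to is shared with another thread.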
Local variables are defined per thread, and are thread safe.
If your array is indeed atomic, the proxy variable is guaranteed to be a different item each time a thread enters this function, and other threads won't be able to overwrite the identifier.
When I read The Swift Programming Language: Memory Safety, I was confused by the section Conflicting Access to Properties:
The code below shows that the same error appears for overlapping write
accesses to the properties of a structure that’s stored in a global
variable.
var holly = Player(name: "Holly", health: 10, energy: 10)
balance(&holly.health, &holly.energy) // Error
In practice,
most access to the properties of a structure can overlap safely. For
example, if the variable holly in the example above is changed to a
local variable instead of a global variable, the compiler can prove
that overlapping access to stored properties of the structure is
safe:
func someFunction() {
var oscar = Player(name: "Oscar", health: 10, energy: 10)
balance(&oscar.health, &oscar.energy) // OK
}
In the example above, Oscar’s health and energy are passed as the two in-out parameters to balance(_:_:). The compiler can prove that memory
safety is preserved because the two stored properties don’t interact
in any way.
How can the compiler prove that memory safety is preserved?
Being inside a function scope gives the compiler the certainty of which operations will be executed on the struct and when. The compiler knows how structs work, and how and when (relative to the time the function is called) the code inside a function is executed.
In a global or larger scope, the compiler loses visibility over what could be modifying the memory, and when, so it cannot assure safety.
It's because of multiple threads. When "holly" is a global variable, multiple threads could access the global variable at the same time, and you are in trouble. In the case of a local variable, that variable exists once per execution of the function. If multiple threads run someFunction() simultaneously, each thread has its own "oscar" variable, so there is no chance that thread 1 accesses thread 2's "oscar" variable.
An answer from Andrew_Trick on Swift Forums:
"don't interact" is a strong statement. The compiler simply checks that each access to 'oscar' in the call to 'balance' can only modify independent pieces of memory ('heath' vs 'energy'). This is a special case because any access to a struct is generally modeled as a read/write to the entire struct value. The compiler does a little extra work to detect and allow this special case because copying the 'oscar' struct in situations like this would be inconvenient.
I have this method in a class:
class User < ApplicationRecord
  ...
  def answers
    @answers ||= HTTParty.get("http://www.example.com/api/users/#{self.id}/answers.json")
  end
  ...
end
Since I'm using Puma as a web server, I'm wondering whether this code is thread safe. Can someone confirm it and, if possible, explain why it is thread safe?
This is an instance method, not to be confused with a class method. The answers method is on an instance of User, as opposed to being on the User class itself. This method caches the answers on the instance of a User, but as long as this User instance is instantiated with each web request (such as with User.find() or User.find_by()), you're fine, because the instance does not live across threads. It's common practice to look records up on every web request in the controller, so you're likely doing that.
If this method was on the User class directly (such as User.answers), then you’d need to evaluate whether it’s safe for that cached value to be maintained across threads and web requests.
To recap, your only concerns for thread safety are class methods, class variables (variables that use two at signs, such as @@answers), and instance methods where the instance lives on past a single web request.
If you ever find yourself needing to use a class-level variable safely, you can use Thread.current, which is essentially a per-thread Hash (like {}) that you can store values in, for example Thread.current[:foo] = 1. ActiveSupport uses this when setting Time.zone.
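A small sketch of the Thread.current behaviour described above. The key :foo comes from the example in the text; everything else is made up for illustration.

```ruby
Thread.current[:foo] = 1

# A new thread starts with its own, empty storage.
other_thread_view = Thread.new do
  before = Thread.current[:foo]   # nil: the main thread's value is not visible here
  Thread.current[:foo] = 2        # only affects this thread
  [before, Thread.current[:foo]]
end.value

main_thread_view = Thread.current[:foo]   # unchanged by the other thread
```

One caveat worth knowing: in MRI, Thread#[] and Thread#[]= are actually fiber-local. If the code runs inside fibers and you need storage that is truly per thread, Thread#thread_variable_get and Thread#thread_variable_set exist for that purpose.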
Alternatively, you may find times where you need a single array that is safely shared across threads, in which case you'd need to look into Mutex, which lets you lock and unlock access to the array so that threads can read and write it safely. The Sidekiq gem uses a Mutex to manage workers, for example. You lock the Mutex so that no one else can change the array, then you write to the array, and then you unlock the Mutex. It's important to note that if any other thread wants to write while the Mutex is locked, it'll have to wait for it to become unlocked (the thread just pauses while the other thread writes), so it's important to hold the lock for as short a time as possible.
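The lock, write, unlock cycle can be sketched like this. SharedList is a hypothetical class written for this example; Mutex#synchronize takes the lock, runs the block, and releases the lock even if the block raises.

```ruby
# Hypothetical wrapper around an array that is shared between threads.
class SharedList
  def initialize
    @items = []
    @lock = Mutex.new
  end

  # Keep the critical section as short as possible: just the append.
  def add(item)
    @lock.synchronize { @items << item }
  end

  # Return a copy so callers never touch the shared array directly.
  def snapshot
    @lock.synchronize { @items.dup }
  end
end

list = SharedList.new
writers = 10.times.map do |t|
  Thread.new { 100.times { |i| list.add([t, i]) } }
end
writers.each(&:join)

total = list.snapshot.size
```

Any slow work (HTTP calls, database queries) is best done outside the synchronize block, so that other threads are not kept waiting for the lock.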
I saw some retry code written like this. It tries to call a service 3 times if some exception is raised. I'm trying to understand whether this counter is thread safe in a non-MRI multi-threaded server. Is it necessary to lock the process using a Mutex?
This is how it's called:
MyClass.new.my_method
class MyClass
  def my_method
    counter = 3
    begin
      call_some_service_raise_some_exception
    rescue SomeException => e
      counter -= 1 # without this decrement the retry would loop forever
      retry if counter.positive?
    end
  end
end
Assuming the variable counter is scoped to that method only, and that there are no funny shenanigans going on with Singleton or any other weird stuff, then yes, that method should be thread safe in its current form.
If, however, counter is an instance variable and you are using an accessor to set it, then that method is not thread safe. You may never encounter the race condition if you're using every MyClass once only, but all it takes is one well-meaning refactoring to reuse MyClass and suddenly you've got a race condition there.
Basically - if your method is self-contained, uses variables scoped to it only, and references no external shared data then it is thread safe by default.
As soon as you use something that could be accessed at the same time by another thread, you have a potential race condition in the making and you should start thinking about synchronising access to the shared resources.
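Here is a minimal sketch of the "self-contained" case: one MyClass instance is shared by several threads, but the retry counter is a local, so every call counts its own attempts. The service argument is a hypothetical callable added for this example so the sketch can run on its own.

```ruby
class SomeException < StandardError; end

class MyClass
  # `service` is a hypothetical callable that may raise SomeException.
  def my_method(service)
    attempts_left = 3            # local: one counter per invocation
    begin
      service.call
    rescue SomeException
      attempts_left -= 1
      retry if attempts_left.positive?
      :gave_up                   # returned once the attempts are used up
    end
  end
end

shared = MyClass.new             # deliberately reused across threads

outcomes = 5.times.map do
  Thread.new do
    calls = 0                    # local to this thread's block
    always_failing = -> { calls += 1; raise SomeException }
    result = shared.my_method(always_failing)
    [calls, result]
  end
end.map(&:value)
```

If attempts_left were an instance variable on the shared object instead, the threads would decrement the same counter, and the number of attempts per call would depend on how they interleave.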
I have a controller comparing parsed documents with databases of key words. Typical snippets are nested loops that go through the parsed document and the database of key words, counting the matches and building an array of matching words. Here is an example:
@key_words = KeyWords.all
@count = 0
@array_of_matching_words = []
@parsed_document.each do |a|
  @key_words.each do |key_word|
    if a =~ /#{key_word.word}/
      @count = @count + 1
      @array_of_matching_words = @array_of_matching_words.push "#{key_word.word}"
    end
  end
end
Both instance variables, @count and @array_of_matching_words, are passed to a view. I have accumulated numerous such snippets for the other databases of words, and I am trying to refactor the code by moving some of it into separate methods.
One of the options I have considered is to move the code to a private method in the controller:
def get_the_number_and_the_list_of_matching_words
  @key_words = KeyWords.all
  @count = 0
  @array_of_matching_words = []
  @parsed_document.each do |a|
    @key_words.each do |key_word|
      if a =~ /#{key_word.word}/
        @count = @count + 1
        @array_of_matching_words = @array_of_matching_words.push "#{key_word.word}"
      end
    end
  end
end
I then just leave
get_the_number_and_the_list_of_matching_words
in my controller action, and the view will get both @count and @array_of_matching_words.
I find this coding "style" terrible. The method get_the_number_and_the_list_of_matching_words appears out of the blue: it does not instantiate anything, no parameter is passed, and both instance variables (@count, @array_of_matching_words) are not visible (they are buried at the bottom of my .rb file in the private methods section). Is this really a good way to refactor the code, or are there other options? Does it make sense to move code to a method when the method is neither an instance method nor a method that takes a parameter and returns new variables?
A method name like get_the_number_and_the_list_of_matching_words is preposterously long, and if it needs to be that long to convey what it's doing then your method is overly ambitious by design. Break it up into smaller chunks.
Your example method sets three different instance variables, depends on one which isn't passed in as an argument but is just presumed to magically be there, and has no clear objective other than to define @array_of_matching_words.
Generally, including things like the type of something in the variable name is extraneous information. @matching_words is vastly preferable here.
What you should do is break out the logic that does the comparison between @key_words and @parsed_document into a method where both of those are passed in as parameters. Rails encourages putting such methods into "concerns" or, where they're applicable to a particular model, into that model if that's a good fit.
Whatever KeyWords is could be extended to produce the desired output, for example, making your code look a lot more like this:
def prepare_key_words_matching
  @key_words = KeyWords.all
  @matching_words = KeyWords.matching(@parsed_document)
end
You could then use @matching_words.length instead of having a separate @count variable.
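The answer does not show what KeyWords.matching would look like, so the following is only a guess at one possible shape, written in plain Ruby so it runs without Rails. KeyWord, the hard-coded word list in KeyWords.all, and the sample document are all made up for the sketch; in the real application KeyWords.all would come from the database.

```ruby
# Hypothetical record with a `word` attribute, as in the question.
KeyWord = Struct.new(:word)

class KeyWords
  # Stand-in for the database lookup in the real model.
  def self.all
    @all ||= %w[ruby thread mutex].map { |w| KeyWord.new(w) }
  end

  # Returns every key word found in each element of the parsed document,
  # once per (element, key word) pair, like the original nested loops.
  def self.matching(parsed_document)
    words = all.map(&:word)
    parsed_document.flat_map do |text|
      # Regexp.escape keeps characters such as "." or "+" in a key word
      # from being read as regex syntax (the original interpolated them raw).
      words.select { |word| text =~ /#{Regexp.escape(word)}/ }
    end
  end
end

parsed_document = ["ruby has a mutex", "no match here", "thread and ruby"]
matching_words = KeyWords.matching(parsed_document)
count = matching_words.length
```

The method takes the document as an argument and returns a value, so the controller action decides which instance variables to set, and the matching logic can be tested on its own.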
This could be a personal preference question, but here goes.
I would recommend splitting methods to be as small as possible. The benefits are twofold:
Consider a newcomer reading the method. If it is only a few lines long (regardless of the several other methods), it will be easier to follow.
Testing becomes a lot easier if you delegate logic to other methods. You now only need to test that a method is called within your method, rather than test all of that logic.
I agree that using different methods within a method is obfuscating things, especially when instance variables are concerned, but one only needs to look at the method and follow which others are called by it to learn what is happening.
I'm into Ruby on Rails programming for almost 5 weeks now.
I was wondering why people always use instance variables instead of local ones.
I always thought that you would use instance variables only for classes (so these instance variables are the attributes of the class).
But people also use them not only for being attributes of a class. And this is the part where I am getting confused.
For instance, take a look at these lines of code:
class Foo
  def print_a_hello
    puts "Hello World"
  end
end
@instance_variable = Foo.new
@instance_variable.print_a_hello
# => "Hello World"
locale_variable = Foo.new
locale_variable.print_a_hello
# => "Hello World"
So, who of you got a great explanation for me?
I was wondering why people always use instance variables instead of local ones.
I'm not sure how you get that impression. I certainly don't "always" use instance variables. I use instance variables when I need an instance variable, and I use local variables, when I need a local variable, and most code I see does it the same way.
Usually, it doesn't even make sense to interchange them. They have completely different purposes: local variables have static lexical scope, instance variables have dynamic object scope. There's pretty much no way to interchange them, except for the very narrow case of a simple single-file procedural script, where the dynamic scope of the top-level main object and the lexical scope of the script body are identical.
I always thought that you would use instance variables only for classes (so these instance variables are the attributes of the class).
No. Instance variables are attributes of the instance (i.e. object), not the class, that's why they are called "instance variables", after all. Class variables are attributes of the class, but class variables are a different beast and only used in very specific circumstances. (Classes are objects (i.e. instances), too, so they can have instance variables as well; there's generally no need to use class variables, which have some weird and un-intuitive properties, unless you specifically need those weird and un-intuitive properties).
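One of the un-intuitive properties of class variables is that they are shared with subclasses, while an instance variable set on the class object belongs to that one class only. A short sketch of the difference follows; Base, Child and the accessor names are invented for the example.

```ruby
class Base
  @@shared = :base   # class variable: one slot for Base and all its subclasses
  @own = :base       # instance variable of the Base class object itself

  def self.shared
    @@shared
  end

  def self.shared=(value)
    @@shared = value
  end

  def self.own
    @own
  end

  def self.own=(value)
    @own = value
  end
end

class Child < Base; end

Child.shared = :child   # writes the single shared slot
Child.own = :child      # writes an instance variable on Child only

base_shared = Base.shared   # :child, changed through the subclass
base_own = Base.own         # :base, untouched
child_own = Child.own       # :child
```

This is why class-level instance variables are usually the less surprising choice when a class needs some state of its own.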
For instance, take a look at these lines of code:
class Foo
  def print_a_hello
    puts "Hello World"
  end
end
@instance_variable = Foo.new
@instance_variable.print_a_hello
# => "Hello World"
locale_variable = Foo.new
locale_variable.print_a_hello
# => "Hello World"
This is the case I mentioned above: in this specific case (and only in this case), the dynamic scope of the top-level main object and the static lexical scope of the script body are identical, so it doesn't matter whether you use a local variable of the script body or an instance variable of the main object.
However, if we make just a tiny change to that, by adding a second script and requiring it from the first, that condition will no longer hold, because we now have two separate script bodies and thus two separate script scopes, but still only one top-level object.
The idiomatic way in your example would definitely be to use a local variable, and nobody I know would do otherwise.
The best use case for instance variables is in controllers, when you want to pass a parameter to the view.
Then you use something like
class TestController < ActionController::Base
  def show
    @usable_in_view = Test.first
    not_usable_in_view = Test.first
  end
end
In your view you can now use @usable_in_view, but you can't use the variable not_usable_in_view. Most people always use instance variables in controllers even if they do not need them in the view, because they do not understand why they need instance variables.
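The reason only the instance variable reaches the view is that Rails copies the controller's instance variables onto the view object before rendering, while a local variable is gone as soon as the action returns. The sketch below imitates that mechanism in plain Ruby with ERB. TinyController and TinyView are made-up classes, not Rails APIs, and the real implementation is more involved.

```ruby
require "erb"

class TinyController
  def show
    @usable_in_view = "from the controller"
    not_usable_in_view = "local"   # discarded when show returns
  end

  # Collect this object's instance variables as name/value pairs.
  def view_assigns
    instance_variables.map { |name| [name, instance_variable_get(name)] }.to_h
  end
end

class TinyView
  # Copy the controller's instance variables onto the view object.
  def initialize(assigns)
    assigns.each { |name, value| instance_variable_set(name, value) }
  end

  # Evaluate the template with this object as self.
  def render(template)
    ERB.new(template).result(binding)
  end
end

controller = TinyController.new
controller.show
view = TinyView.new(controller.view_assigns)

html = view.render("<p><%= @usable_in_view %></p>")

local_visible =
  begin
    view.render("<%= not_usable_in_view %>")
    true
  rescue NameError
    false   # the view has no such local variable or method
  end
```

So an instance variable in a controller action is really part of the interface to the view; anything the view does not need can stay a local.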
Instance variables are used so that they can be accessed in the view page.
Local variables are not accessible in the view. It has become a habit; even I sometimes write instance variables though they are not required in the view. :-)
People probably get in the [bad] habit of using instance variables everywhere since it's common in Rails to use them to get information from the controller to the view.
In my own Ruby code I use instance variables only when I need to, local variables otherwise. That's the proper way to use Ruby.