I want to create a thread object in environment.rb and use it in some other action of some controller.
How should I do it?
Thanks in advance.
Actually, I want three processes running perpetually, fetching some data and storing it in the database. That's why I am using threads. Is there any other way to do this?
To answer your initial question, constants declared in environment.rb are available throughout the entire codebase. Avoid doing so if you can, though; this can become configuration spaghetti pretty quickly.
More broadly, although Rails has been (from what I understand) thread-safe since version 2.2, threads are still quite uncommon as a way to provide concurrent operation, particularly in MRI, whose green threads are not particularly helpful anyway. Consider using a message queue like Starling, with other Ruby processes performing the asynchronous work.
Further to what Brian says, consider using an initializer (put in config/initializers to have it executed). I think it makes the intent more clear than using environment.rb.
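If you do go the initializer route, here is a minimal sketch of what such a file might look like. The file name, the BackgroundFetchers module and its start/stop methods are all made up for illustration; nothing here is a Rails API. It simply keeps the thread objects in one place so a controller action can reach them later.

```ruby
# config/initializers/background_fetchers.rb (hypothetical file name)
# Hypothetical module: keeps the worker threads reachable from any controller.
module BackgroundFetchers
  @threads = []

  class << self
    attr_reader :threads

    # Starts `count` threads that call the given block forever,
    # pausing `interval` seconds between runs.
    def start(count: 3, interval: 60, &work)
      count.times do |index|
        @threads << Thread.new do
          loop do
            work.call(index)
            sleep interval
          end
        end
      end
      @threads
    end

    # Kills all workers and waits for them to finish.
    def stop
      @threads.each(&:kill).each(&:join)
      @threads.clear
    end
  end
end
```

A controller could then inspect BackgroundFetchers.threads (for example to check alive?). Keep in mind that if the app runs in several server processes, each process gets its own set of threads, which is one more reason the queue-based approach above may be the safer choice.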
Be very careful with this. To the best of my knowledge, Rails is not thread-safe, and trying to use threads safely in the face of all the magic (excuse me, "metaprogramming") it does sounds risky as all get out.
Why do you want a thread object anyway?
In response to the comment: saying Rails is thread-safe might not mean as much as you think. I'd certainly be leery of counting on it if I didn't need to.
First time learning about concurrency and threading within Rails, so any advice is very appreciated.
I currently have an array of 50 strings and a third-party API call that takes a string and returns a numeric value. Right now I am simply calling the API on each string one at a time, which takes a really long time.
After looking at a few SO posts (this one, this other one and finally this one), it seems like I have to use some sort of threading to achieve what I want. My plan is to break the array into five batches of ten strings and process the five batches concurrently, in the hope that this will drastically reduce the time.
I've never done threading of any kind with Rails before, so I'm just wondering whether I am on the right track following the third SO post above, or whether I should use other techniques that may be better suited to my needs.
The approach you take will depend on your use case. Do you need to wait for all the calls to be made to do something with the result? Can it be asynchronous?
If you are looking into threads to distribute the work then the third SO post you mentioned is a good way to do it.
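For the thread-based route, a rough sketch of the batching plan from the question could look like this. fetch_score is a hypothetical stand-in for the third-party API call (here it just sleeps and returns the string length), and the method name and batch_size parameter are assumptions of mine.

```ruby
# Hypothetical stand-in for the third-party API call; replace with the real client.
def fetch_score(string)
  sleep 0.01 # simulate network latency
  string.length
end

# Splits `strings` into batches and processes each batch in its own thread.
# Returns the numeric results in the same order as the input.
def fetch_scores_concurrently(strings, batch_size: 10)
  threads = strings.each_slice(batch_size).map do |batch|
    Thread.new(batch) do |slice|
      slice.map { |s| fetch_score(s) }
    end
  end
  # Thread#value joins the thread and returns the block's result.
  threads.flat_map(&:value)
end
```

With 50 strings and the default batch size this starts five threads. Since the work is waiting on the network rather than burning CPU, threads should help here even on MRI, but do check the API's rate limits before firing requests in parallel.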
If your use case permits the process to be async, I'd definitely look into a scheduler, as mentioned in the first SO post. I've used DelayedJob for this purpose; there are other alternatives as well.
On a related note, I usually implement a micro-service that receives those requests and processes them asynchronously instead of running DelayedJob in the same app, but that is just a matter of preference.
Something REALLY important to keep in mind if you go with the async approach: if you access ActiveRecord records inside a thread, you need to explicitly check out the database connection. Rails only handles checking connections in and out in the main thread. Be really careful with this, since it can cause connection leaks that are really hard to track down.
The first answer on this SO post shows how to ensure the DB connection is released.
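To illustrate the check-out/check-in idea without needing a database, here is a toy sketch. TinyPool is a made-up class, not ActiveRecord's pool; it only shows the pattern of yielding a connection and returning it in an ensure clause. In a real Rails app the corresponding call would be ActiveRecord::Base.connection_pool.with_connection { ... } inside the thread.

```ruby
# Hypothetical minimal pool, standing in for ActiveRecord's connection pool.
class TinyPool
  def initialize(connections)
    @available = Queue.new
    connections.each { |c| @available << c }
  end

  # Checks a connection out, yields it, and always checks it back in,
  # even when the block raises.
  def with_connection
    connection = @available.pop
    yield connection
  ensure
    @available << connection if connection
  end

  # Number of connections currently checked in.
  def available_count
    @available.size
  end
end
```

The point of the ensure is that a thread which blows up halfway through its work still returns its connection, so the pool does not slowly drain.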
Hope that helps.
While researching a deadlock issue, I found the following post:
https://rails.lighthouseapp.com/projects/8994/tickets/6596
The gist of it is as follows:
The MySQL docs say:
Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
Therefore debugging transient deadlocks is an antipattern because MySQL says they are OK and unavoidable.
Therefore, Rails should offer us a way, because it:
makes the assumption that there is the "best" way to do things, and it's designed to encourage that way
But Rails doesn't offer us a way, so we are using a hacky DIY thing.
So if all of this is true, where is the Rails solution?
NOTE: This project is inactive, but seems simple enough to be a solution. Why does Rails not have something like this?
https://github.com/qertoip/transaction_retry
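For reference, the "hacky DIY thing" usually boils down to something like the sketch below. This is not the transaction_retry gem's code; DeadlockDetected, with_deadlock_retry and its parameters are names I made up. In a real app you would rescue whatever exception your adapter raises on deadlock (newer Rails versions have ActiveRecord::Deadlocked, as far as I know) and put the transaction inside the block.

```ruby
# Hypothetical error class standing in for the adapter's deadlock exception.
class DeadlockDetected < StandardError; end

# Runs the block, re-running it when a deadlock is reported,
# up to `max_attempts` times in total.
def with_deadlock_retry(max_attempts: 3, base_delay: 0.05)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue DeadlockDetected
    raise if attempts >= max_attempts
    sleep(base_delay * attempts) # brief, growing pause before retrying
    retry
  end
end
```

The whole transaction has to be inside the block, since MySQL rolls the transaction back on deadlock and everything in it must be re-issued.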
The fix, for me, was a better index.
The update in question was in a query with a join, and existing indexes were not sufficient for MySQL to join and search efficiently.
Adding the appropriate index completely removed the deadlock issue even in tests with unreasonably concurrent loads.
I've got a script in a Rails 2 application that involves performing a lot of tasks in parallel (using the parallel gem) with threads. What I'm running into is an issue with uninitialized constants due to the way Rails (or ActiveSupport?) loads dependencies.
My online research has indicated to me that I'm probably supposed to use config.threadsafe! or something of that nature to address this issue. The problem there, though, is that I don't want to change the way the entire application works (by changing environment.rb) since the use of threads is really localized to this one script.
What I'd done thus far is simply initialize as many constants as I know I need before creating any new Thread objects:
constants_to_load = [
  User,
  Merchant,
  Loggable,
  UserMailer,
  # etc.
]

# code involving threads
This is clearly not a good approach, as it's based entirely on my own guesstimated list of what constants need to be loaded (mostly from reactively adding constants whenever I encounter an exception) and is about as far from future-proof as it gets. Is it possible to just say something like "Load everything I could possibly need"? I know that probably sounds bad; it just seems like the most fool-proof approach to me. But if there's another way to do this right, I'm all ears.
I keep getting conflicting opinions on the practice of storing information in the Thread.current hash (e.g., the current_user, the current subdomain, etc.). The technique has been proposed as a way to simplify later processing within the model layer (query scoping, auditing, etc.).
Why are my thread variables intermittent in Rails?
Alternative to using Thread.current in API wrapper for Rails
Are Thread.current[] values and class level attributes safe to use in rails?
Many consider the practice unacceptable because it breaks the MVC pattern.
Others express concerns about reliability/safety of the approach, and my 2-part question focuses on the latter aspect.
Is the Thread.current hash guaranteed to be available and private to one and only one response, throughout its entire cycle?
I understand that a thread, at the end of a response, may well be handed over to other incoming requests, thereby leaking any information stored in Thread.current. Would clearing such information before the end of the response (e.g. by executing Thread.current[:user] = nil from a controller's after_filter) suffice to prevent such a security breach?
Thanks!
Giuseppe
There is no specific reason to stay away from thread-local variables. The main issues are:
it's harder to test code that uses them, as you have to remember to set the thread-local variables in your tests
classes that use thread locals need to know that these objects are not passed to them but live inside a thread-local variable, and this kind of indirection usually breaks the Law of Demeter
not cleaning up thread locals can be an issue if your framework reuses threads (the thread-local variable would already be initialized, and code that relies on ||= calls to initialize variables might fail)
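The last point is easy to reproduce in plain Ruby. The sketch below is a simulation, not server code: one long-lived thread handles several "requests" from a queue, the way a threaded server may reuse a thread, and the method name is made up.

```ruby
# Simulates one reused thread handling several requests in a row and
# returns the "current user" each request ended up seeing.
def users_seen_by_reused_thread(users)
  requests = Queue.new
  users.each { |u| requests << u }
  requests << nil # nil tells the worker to stop

  seen = []
  worker = Thread.new do
    while (user = requests.pop)
      # Pitfall: ||= keeps whatever the previous request left behind.
      Thread.current[:current_user] ||= user
      seen << Thread.current[:current_user]
    end
  end
  worker.join
  seen
end

users_seen_by_reused_thread(%w[alice bob]) # => ["alice", "alice"]
```

The second request sees alice's value because nothing cleared the thread local in between.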
So, while using them is not completely out of the question, the best approach is to avoid them. From time to time, though, you hit a wall where a thread local is the simplest possible solution, and you have to compromise: either accept a less than perfect object-oriented model with the thread local, or change quite a lot of code to achieve the same thing.
So it's mostly a matter of deciding which is the best solution for your case. If you really are going down the thread-local path, I'd advise you to do it with blocks that clean up after they are done, like the following:
around_filter :do_with_current_user

def do_with_current_user
  Thread.current[:current_user] = self.current_user
  begin
    yield
  ensure
    Thread.current[:current_user] = nil
  end
end
This ensures the thread local variable is cleaned up before being used if this thread is recycled.
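The same pattern works outside controllers, which also helps with the testing issue mentioned earlier. Here is a framework-free sketch; with_thread_local is a name I made up, and restoring the previous value (instead of always setting nil) is my own design choice so that nested calls behave sensibly.

```ruby
# Hypothetical helper: sets a thread-local for the duration of the block only,
# then restores whatever was there before (nil if nothing was set).
def with_thread_local(key, value)
  previous = Thread.current[key]
  Thread.current[key] = value
  yield
ensure
  Thread.current[key] = previous
end
```

In a test you could wrap the code under test in with_thread_local(:current_user, some_user) { ... } and be sure nothing is left behind for the next example, even if the block raises.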
This little gem ensures your thread/request-local variables do not stick around between requests: https://github.com/steveklabnik/request_store
The accepted answer covers the question, but since Rails 5 now provides an abstract superclass, ActiveSupport::CurrentAttributes, which uses Thread.current, I thought I would provide a link to it as a possible (if unpopular) solution.
https://github.com/rails/rails/blob/master/activesupport/lib/active_support/current_attributes.rb
The accepted answer is technically accurate, but as pointed out in the answer gently, and in http://m.onkey.org/thread-safety-for-your-rails not so gently:
Don't use thread local storage, Thread.current if you don't absolutely have to
The request_store gem is another (better) solution, but just read its README for more reasons to stay away from thread-local storage.
There is almost always a better way.
I am trying to spawn a thread in Rails. I am usually not comfortable using threads as I will need to have an in-depth knowledge of Rails' request/response cycle, yet I cannot avoid using one as my request times out.
In order to avoid the timeout, I am using a thread within a request. My question here is simple. The thread that I've used accesses a params[] variable inside it, and things seem to work OK for now. I want to know whether this is right. I'd be happy if someone could shed some light on using threads in Rails during the request/response cycle.
The short answer is yes, but only to a degree: the binding in which the thread was created will continue to persist. The params will still exist as long as no one (including Rails) goes out of their way to modify or delete the params hash; rather than clearing such objects explicitly, Rails relies on the garbage collector to clean them up. Since the thread has access to the context in which it was created (called the "binding" in Ruby), none of the variables reachable from that scope (effectively the entire state at the time the thread was created) can be collected by the garbage collector. However, as execution continues in the main thread, the values of the variables in that context can be changed by the main thread, or even by the thread you created, if it can access them. This is the benefit, and the downfall, of threads: they share memory with everything else.
You can emulate a very similar environment to Rails to test your problem, using a function as such: http://gist.github.com/637719. As you can see, the code still prints 5.
However, this is not the correct way to do this. The better way to pass data to a thread is to pass it to Thread.new, like so:
# always dup objects when passing them into a thread, else you really
# haven't done yourself any good; it would still be the same memory
Thread.new(params.dup) do |params|
  puts params[:foo]
end
This way, you can be sure that modifications to the original params hash will not affect your thread (note that dup makes a shallow copy, so nested objects are still shared). The best practice is to only use data you pass to your thread in this way, or things that the thread itself created. Relying on the state of the program outside the thread is dangerous.
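Here is a small plain-Ruby experiment showing what dup does and does not protect you from. The method name and the shape of the hash are made up for the demonstration; a Queue is used as a gate so the thread only reads its copy after the caller has finished mutating the original.

```ruby
# Returns what a worker thread sees after the caller mutates the original hash.
def snapshot_seen_by_thread(params)
  gate = Queue.new
  worker = Thread.new(params.dup) do |copy|
    gate.pop # wait until the caller has finished mutating
    { foo: copy[:foo], count: copy[:nested][:count] }
  end

  params[:foo] = "changed"     # top-level change: not visible in the copy
  params[:nested][:count] = 99 # nested change: visible, because dup is shallow
  gate << :go
  worker.value
end

snapshot_seen_by_thread({ foo: "bar", nested: { count: 1 } })
# => { foo: "bar", count: 99 }
```

If the thread needs a fully independent copy of nested data, one option is a deep copy such as Marshal.load(Marshal.dump(data)), or simply passing in only the individual values the thread needs.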
So as you can see, there are good reasons that this is not recommended. Multithreaded programming is hard, even in Ruby, and especially when you're dealing with as many libraries and dependencies as are used in Rails. In fact, Ruby seems to make it deceptively easy, but it's basically a trap. The bugs you will encounter are very subtle; they will happen "randomly" and be very hard to debug. This is why things like Resque, Delayed Job, and other background processing libraries are used so widely and recommended for Rails apps, and I would recommend the same.
The question is less whether Rails persists the value and more whether it keeps the request open while the thread is running.
It won't persist the value once the request ends, and I wouldn't recommend holding the request open unless there is a real need. As other users have said, some work is simply better done in a delayed job.
Having said that, we have used threading a couple of times to query multiple sources concurrently and actually reduce the response time of an app (it was only for admins, so it didn't need fast response times). If memory serves, the thread can keep the request open if you call join at the end and wait for each thread to finish before continuing.
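Roughly what that looked like, reconstructed as a sketch rather than our actual code. The method name, the sources hash of callables and the timeout parameter are assumptions for illustration. Thread#join takes an optional limit in seconds and returns nil if the thread has not finished by then.

```ruby
# Queries each source in its own thread and waits for the results.
# `sources` maps a name to a callable; sources that do not finish
# in time get a nil result.
def query_sources_concurrently(sources, timeout: 5)
  threads = sources.map { |name, fetcher| [name, Thread.new { fetcher.call }] }
  threads.map do |name, thread|
    # Waits up to `timeout` seconds for each thread in turn.
    [name, thread.join(timeout) ? thread.value : nil]
  end.to_h
end
```

Because the action does not return until the joins are done, the request stays open for that long. Note that a thread that timed out keeps running in the background in this sketch; whether to kill it is a decision for the real app.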