I know how useless and vague the title is. Sorry. I don't have much to go on other than some observations, and evidence that nothing changed in my code.
I have a Rails 3.2.14 app using DelayedJob and PostgreSQL 9.2. For months, I have had code that has background workers process file contents into the database. Each job/task will load 100K to 1M records. Until very, very recently, when I would watch the database, I could see the records accumulating by calling Product.count, for example.
Now, I only see Product.count update to a new sum when a job/task completes. It is almost as if the entire operation is now being wrapped in a transaction, preventing me from seeing the incremental changes. I have verified that nothing in the relevant areas of code has changed, and I've been on 3.2.14 for some time now.
Does anyone know what this could be? DelayedJob?
I am also using Ruby 2.0.0-p247.
From this post https://github.com/collectiveidea/delayed_job/issues/585#issuecomment-56743773, it appears that delayed_job is not wrapping the job in a transaction.
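If a transaction were involved, the difference would look something like this (a sketch, not my actual import code; rows and the create! loop are placeholders):

# If the import is wrapped like this, Product.count from another session
# won't move until the whole job commits:
Product.transaction do
  rows.each { |row| Product.create!(row) }
end

# Committing per batch (or per record) makes the count climb while the job runs:
rows.each_slice(1000) do |batch|
  Product.transaction do
    batch.each { |row| Product.create!(row) }
  end
end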
Related
Let's say I need to be sure ModelName can't be updated at the same time by two different Rails threads; this can happen, for example, when a webhook POST to the application tries to modify it at the same time some other code is running.
Per Rails documentation, I think the solution would be to use model_name_instance.with_lock, which also begins a new transaction.
This works fine and prevents simultaneous UPDATES to the model, but it does not prevent other threads from reading that table row while the with_lock block is running.
I can prove that with_lock does not prevent other READS by doing this:
Open 2 rails consoles;
On console 1, type something like ModelName.last.with_lock { sleep 30 }
On console 2, type ModelName.last. You'll be able to read the model no problem.
On console 2, type ModelName.last.update_columns(updated_at: Time.now). You'll see that it waits until the 30-second lock is released before it finishes.
This proves that the lock DOES NOT prevent reading, and as far as I could tell there's no way to prevent the database row from being read.
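For context, here is roughly what with_lock does under the hood, which explains why plain reads aren't blocked (a simplified sketch, not the actual Rails source):

# record.with_lock { ... } is approximately:
record = ModelName.last
ModelName.transaction do
  record.lock!   # reloads the row with SELECT ... FOR UPDATE
  # Under PostgreSQL's MVCC, other sessions can still run plain SELECTs on this row;
  # only writes and other FOR UPDATE locks block until this transaction commits.
  sleep 30
end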
This is problematic because if two threads run the same method at the EXACT same time, and I decide whether to run the with_lock block based on some previous checks on the model data, thread 2 could be reading stale data that will soon be updated by thread 1 once its with_lock block finishes: thread 2 CAN READ the model while the with_lock block is in progress in thread 1; it just can't UPDATE it because of the lock.
EDIT: I found the answer to this question, so you can stop reading here and go straight to it below :)
One idea that I had was to begin the with_lock block issuing a harmless update to the model (like model_instance_name.update_columns(updated_at: Time.now) for instance), and then following it with a model_name_instance.reload to be sure that it gets the most updated data. So if two threads are running the same code at the same time, only one would be able to issue the first update, while the other would need to wait for the lock to be released. Once it is released, it would be followed with that model_instance_name.reload to be sure to get any updates performed by the other thread.
The problem is this solution seems way too hacky for my taste, and I'm not sure I should be reinventing the wheel here (I don't know if I'm missing any edge cases). How does one ensure that, when two threads run the exact same method at the exact same time, one thread waits for the other to finish before even reading the model?
Thanks Robert for the Optimistic Locking info. I could definitely see myself going that route, but optimistic locking works by raising an exception at the moment of writing to the database (SQL UPDATE), and I have a lot of complex business logic that I wouldn't even want to run with the stale data in the first place.
This is how I solved it, and it was simpler than what I imagined.
First of all, I learned that pessimistic locking DOES NOT prevent any other threads from reading that database row.
But I also learned that with_lock acquires the lock immediately, regardless of whether you are making a write or not.
So if you start 2 rails consoles (simulating two different threads), you can test that:
If you type ModelName.last.with_lock { sleep 30 } on Console 1 and ModelName.last on Console 2, Console 2 can read that record immediately.
However, if you type ModelName.last.with_lock { sleep 30 } on Console 1 and ModelName.last.with_lock { p "I'm waiting" } on Console 2, Console 2 will wait for the lock held by Console 1, even though it's not issuing any write whatsoever.
So that's a way of 'locking the read': if you have a piece of code that you want to be sure won't run simultaneously (not even for reads!), begin that method by opening a with_lock block and issue your model reads inside it, so they'll wait for any other locks to be released first. If you issue your reads outside it, they will be performed even though some other piece of code in another thread holds a lock on that table row.
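A minimal sketch of that pattern (Order is a placeholder model, and already_processed?/apply_business_logic! are hypothetical methods standing in for the checks and business logic):

def process_webhook(order_id)
  order = Order.find(order_id)
  order.with_lock do
    # with_lock re-reads the row under SELECT ... FOR UPDATE, so the check below
    # sees the latest committed data; a second thread calling this method for the
    # same order blocks on its own with_lock until this transaction commits.
    next if order.already_processed?
    order.apply_business_logic!
    order.processed_at = Time.now
    order.save!
  end
end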
Some other nice things I learned:
As per the Rails documentation, with_lock will not only start a transaction with a lock, but it will also reload your model for you, so you can be sure that inside the block the instance is in its most up-to-date state, since it issues a .reload on that instance.
There are some gems designed specifically to prevent the same piece of code from running at the same time in multiple threads (which is how the majority of Rails apps run in production), regardless of the database lock. Take a look at redis-mutex, redis-semaphore and redis-lock.
There are many articles on the web (I could find at least 3) stating that Rails' with_lock will prevent a READ on the database row, while the tests above easily show that's not the case. Take care and always confirm information by testing it yourself! I tried to comment on them to warn about this.
You were close; you want optimistic locking instead of pessimistic locking: http://api.rubyonrails.org/classes/ActiveRecord/Locking/Optimistic.html
It won't prevent reading an object and submitting a form. But it can detect that the form was submitted when the user was seeing stale version of the object.
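A minimal sketch, assuming the table has an integer lock_version column (which Rails picks up automatically) and the edit form carries it back as a hidden field; the controller/flash details are just illustrative:

# migration: add_column :products, :lock_version, :integer, :default => 0, :null => false
product = Product.find(params[:id])
begin
  # params[:product] should include the lock_version the form was rendered with
  product.update_attributes!(params[:product])
rescue ActiveRecord::StaleObjectError
  # Someone else saved the row after this user loaded it; reload and ask them to retry.
  product.reload
  flash.now[:error] = "This record was changed by someone else. Please review and resubmit."
  render :edit
end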
Recently we deployed a new version of our app, and since then we've been seeing some really weird issues with ActiveRecord. For example, here's a snippet of a query it generates hundreds of times per day, usually correctly:
`entries`.`style` AS t1_r25, `entries`.`pdf_visibility` AS , `entries`.`web_visibility` AS t1_r27
That's not a typo, t1_r26 is missing there although there's a space where it should be. But only that one time. That's not hand-written SQL, either, that's ActiveRecord writing the query and deciding on all the placeholder variables. It has similarly botched other queries leaving things blank that shouldn't be blank (shouldn't even be possible), but only once in a while. Most of the time it's fine.
We're also seeing a lot of instances where it complains about things like table_alias or reflection being an undefined variable or method on false:FalseClass. That's true...but the thing that is a FalseClass should have been an ActiveRecord model. We have no clue how any of this is happening, or how we could possibly have written a bug in our Rails code that would do most of this (especially the invalid query above).
We're on Rails 4.1.16 (we upgraded from 4.1.8 when this started happening) with Ruby 2.2.0 in Passenger 5.0.26 (going to 5.0.30 next). These errors are extremely sporadic and none of them make any sense. Out of thousands of requests per day, only a small handful of them (less than 10 across 5 servers) result in one of these weird errors, and we can't purposely reproduce any of them.
My entire team is stumped. We've spent hours poring over code changes and can't see anything that might cause any of this. We don't even know what we could possibly have written that would cause ActiveRecord to sometimes write a bad query in a way that we shouldn't be able to affect. We have no idea how to begin troubleshooting this kind of thing. Does anyone out there have a hint that might point us in some useful direction?
Update: Here's a new one it threw this morning. Note that LibraryItem is one of our pretty straightforward ActiveRecord models:
NoMethodError: undefined method `__callbacks' for #<LibraryItem:0x007f66cc5b82b0>
I...have no idea.
To close the loop for those who tried to help and for anyone who stumbles into this: We cured it by upgrading MRI. We'd been running on 2.2.0 for around a year, which was why we didn't immediately suspect it, and also because this started with a particular deployment. I was tipped off when we saw a couple of errors about an inability to allocate memory, and when MRI exploded in a hail of shrapnel on one server (by which I mean it segfaulted) and took Passenger down with it.
From there I started looking at MRI changelogs and noticed a ton of memory and GC related bug fixes between 2.2.0 and 2.2.5. Last night we upgraded to 2.2.5 with a deployment, and (fingers crossed) we haven't seen a single one of these weird issues yet. (Previously we were seeing 12-20 per day or more, depending on traffic).
So, why did it start happening following a deployment for us? I don't know for sure, but I have a guess: I'm thinking the size in bytes of our application in memory finally hit some critical mass at which it started triggering one or more of the MRI bugs that were fixed between 2.2.0 and 2.2.5. Best I can come up with.
Huge thanks to those who stepped in to try to assist!
We have a relatively standard Ruby on Rails project, which has quite a few background jobs that run under Resque (with Redis as a backend.)
The issue is that very rarely -- perhaps once a month, maybe a little less -- we'll suddenly see floods of exceptions from Resque. The exceptions are all in the following vein:
undefined method `find_by_id` for User():Class
undefined method `find_by_name` for CustomerAccount():Class
undefined method `find_by_id` for Job():Class
It appears that suddenly, all ActiveRecord::Base models lose their find_by_* methods for the entire thread. Restarting the worker fixes the issue.
I know that generically, the answer must be "someone, somewhere -- probably in a gem -- is breaking method_missing somehow." Or perhaps, somehow the constants are getting reassigned to a different class. But before I begin a really thorough investigation, I wanted to check if anyone has run into this problem and solved it already.
This project is running Ruby 2.1.1p76, Rails 3.2.17, Resque 1.25.1.
Closing the loop on this ancient question: it turns out moonfly's comment was indeed the issue, and in long-running workers, a dropped database connection would result in this (somewhat strange) error message.
Knowing the root cause, we were able to add a periodic conn refresh (to try and keep the connection alive when a worker was idle for too long), and also, to add a detection mechanism for when a db conn was dropped, and to reconnect. So, thanks much #moonfly, if you'd like to turn your comment into an answer I'm happy to award you much-delayed answer credit.
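For anyone who lands here, the guard looked roughly like this (job and model names are placeholders, not our actual code); before_perform_* is a standard Resque hook that runs before each job:

class ProcessAccountJob
  @queue = :default

  # Reconnect if the pooled connection has gone away while the worker sat idle.
  def self.before_perform_verify_db(*args)
    conn = ActiveRecord::Base.connection
    conn.reconnect! unless conn.active?
  end

  def self.perform(account_id)
    account = CustomerAccount.find_by_id(account_id)
    # ... the actual work ...
  end
end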
I have a long running rake task that is gobbling up all my system memory over time? What is the quickest way to track down my issue and get to the bottom of it?
I'm using Rails 2.3.5, Ruby 1.8.7, Ubuntu on Slicehost, and MySQL 5.
I have a Rails app that works fine. I have a nightly job that runs all night and does tons of work (some external calls to Twitter, Google, etc., and lots of db calls using ActiveRecord). Over time that job grows in memory to nearly 4 GB. I need to figure out why the rake task is not releasing memory.
I started looking into bleak_house, but it seems complex to set up and hasn't been updated in over a year. I can't get it to work locally, so I'm reluctant to try it in production.
thanks
Joel
Throwing out two ideas. First, if you're looping as part of this job, make sure you're not holding onto references to objects you don't need, as this will prevent them from being collected. If you're done, remove them from your array, or whatever. Also, put a periodic GC.start into your loop as a way to see if it's simply not getting around to GC-ing.
Second idea is that ruby does not GC symbols, so if your API clients are storing values as symbols you can end up with a huge and growing set of symbols that will never be re-used. Symbols are tiny, but tiny things can still add up.
And of course, don't load more objects than you need to. Use #find_each to load AR objects in batches if you have to iterate over lots of them.
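A small sketch combining both suggestions (Tweet and process are placeholders; the key points are not keeping references to processed records and periodically nudging the GC):

processed = 0
Tweet.find_each(:batch_size => 500) do |tweet|
  process(tweet)   # do the work; don't stash the record in a long-lived array
  processed += 1
  GC.start if processed % 5_000 == 0   # periodic GC nudge helps on old MRI (1.8.7)
end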
I'm working on an application that works like a search engine, and all the time it has workers in the background searching the web and adding results to the Results table.
While everything works perfectly, lately I started getting huge response times while trying to browse, edit or delete the results. My guess is that the Results table is being constantly locked by the workers who keep adding new data, which means web requests must wait until the table is freed.
However, I can't figure out a way to lower that load on the Results table and get faster response times for my web requests. Has anyone had to deal with something like that?
The search bots are constantly reading and adding new stuff; they add new results as they find them. I was wondering whether only adding the results to the database in bulk after the search finishes would help, or whether it would make things worse since it would take longer.
Anyway, I'm at a loss here and would appreciate any help or ideas.
I'm using RoR 2.3.8 and hosting my app on Heroku with PostgreSQL
PostgreSQL doesn't lock tables for reads or writes. Start logging your queries and try to find out what is going on. Guessing doesn't help; you have to dig into it.
To check the current activity:
SELECT * FROM pg_stat_activity;
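If it's easier, you can run the same check from a Rails console (selecting * keeps it portable, since pg_stat_activity column names vary between PostgreSQL versions):

ActiveRecord::Base.connection.select_all("SELECT * FROM pg_stat_activity").each do |row|
  puts row.inspect
end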
Try the NOWAIT option. Since you're only adding new stuff with your background workers, I'd assume there would be no lock conflicts when browsing/editing/deleting.
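If you want writes from the web side to fail fast instead of queueing behind a lock, something like this works in Rails 2.3 (Result, result_id and the reviewed column are assumptions on my part):

begin
  Result.transaction do
    result = Result.find(result_id, :lock => "FOR UPDATE NOWAIT")
    result.update_attributes!(:reviewed => true)
  end
rescue ActiveRecord::StatementInvalid => e
  # PostgreSQL raises immediately when NOWAIT can't acquire the row lock.
  Rails.logger.warn "Row #{result_id} is locked elsewhere: #{e.message}"
end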
You might want to put a cache in front of the database. On Heroku you can use memcached as a cache store very easily.
This'll take some load off your db reads. You could even have your search bots update the cache when they add new stuff so that you can use a very long expiration time and your frontend Rails app will very rarely (if ever) hit the database directly for simple reads.
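A rough sketch of what that could look like in Rails 2.3 with memcached as the cache store (the key naming and query are just examples):

def cached_results_for(query)
  Rails.cache.fetch("results/#{query}", :expires_in => 12.hours) do
    Result.find(:all, :conditions => ["query = ?", query], :limit => 100)
  end
end

# In the search bot, refresh the cache right after inserting new rows so the
# long expiration doesn't serve stale results:
Rails.cache.write("results/#{query}", fresh_results, :expires_in => 12.hours)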