My current Rails 3 app has seen a performance decrease as I introduce more associations to the schema.
This performance hit is not related to page loads, but to a background task that I run using resque. The task parses an external data source to populate the database. It does this through a custom helper method (quite a long one).
I don't expect the task to execute very quickly, because it is parsing a lot of data, but recently after adding a few additional associations, I have seen the execution time for my test method increase from 2 minutes to about 5 minutes. I'm running on a VM, so perhaps that's why it is so slow in general.
Using ruby-prof, it seems that the bulk of the additional computation is spent in methods dealing with the associations I added:
ActiveRecord::Associations::CollectionAssociation#* (where * is any number of methods)
When I have added associations, I have also added indexes on the association table. However, other than this, I'm not sure what, if anything, I can do to mitigate the performance hit I'm seeing from adding these associations.
So my questions are:
1) Is it possible that adding associations can cause such a drastic performance decrease in a Rails app?
2) What are the best practices for having a Rails app with many associations that performs well?
EDIT: Some additional info as requested
database: Postgres
the associations I added were both HABTM
it is the populating of these associations that takes up the time. I am creating hundreds, if not thousands of associations during the population process. For bulk inserts, I have been using activerecord-import, which speeds things up significantly, but I am not aware of an equivalent for associations.
I was doing the following (variables changed for simplicity):
# flavors_array is an array of all the Flavor objects I want to associate with my ice_cream
ice_cream.flavors = flavors_array
I did it this way because I know with certainty that there are no preexisting associations for this 'ice_cream' instance that I would be deleting by using '='.
However, this method is still very slow.
IMPORTANT: When I replace the above with a SQL statement to bulk insert the associations directly into the association table (via IDs), the performance improves dramatically. The operations take almost no time.
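For reference, the direct SQL version looked roughly like this (simplified to match the ice_cream/flavors example above, integer IDs only, and the join table name just follows the Rails HABTM convention):

# bulk insert all join-table rows in one statement
values = flavors_array.map { |f| "(#{ice_cream.id}, #{f.id})" }.join(", ")
ActiveRecord::Base.connection.execute(
  "INSERT INTO flavors_ice_creams (ice_cream_id, flavor_id) VALUES #{values}"
)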
You also need to analyze your queries; there is no magic formula, but evaluate the usage of include vs. join. Usually with associations, the data is not loaded unless it is required.
The Railscast below gives a good explanation of this:
http://railscasts.com/episodes/181-include-vs-joins
Eager loading might help prevent round trips to the DB and may give better performance.
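A quick illustration of the difference, with made-up Product/Category models:

# joins only adds the SQL JOIN; associated records are still loaded lazily
Product.joins(:category).where("categories.visible = ?", true)
# includes eager loads categories up front, so this loop triggers no extra queries
Product.includes(:category).each { |product| puts product.category.name }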
I have a Rails active job running on a Heroku dyno via Sidekiq.
When the job runs, the memory grows from 150MB to 550MB.
I found out the cause was n+1 queries to the DB, and fixed it by doing my calculations on the data in the DB query instead of in the code.
Afterwards I wanted to refactor a bit, as I generally like to keep the SQL somewhat simple and have the logic in the code. For this reason I replaced my "joins" with "includes", which lets me use the associations of the objects. It turned out, though, that this refactoring reintroduced the memory issue.
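Simplified (and with made-up model names), the two versions look roughly like this:

# joins + select: the database does the aggregation, no LineItem objects are built
Order.joins(:line_items).group("orders.id").select("orders.*, SUM(line_items.price) AS total")
# includes: every LineItem is instantiated in Ruby just to be summed
Order.includes(:line_items).each { |order| order.line_items.sum(&:price) }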
So my conclusion is that it is the use of the associations that causes the memory growth, as the number of SQL queries is the same after I fixed the N+1 issue. Please note the job handles a lot of objects, around 250,000. This seems reasonable, cf. the delete vs. destroy functionality, where delete performs better because the objects themselves are not instantiated as they are with destroy.
Is my conclusion accurate, or am I missing something?
Thanks,
-Louise
I am debugging a very complicated query on a mature codebase.
Our performance monitoring tool has identified N+1s in a complex part of the codebase that we have assumed to be free of lazy loading.
I would like to temporarily disable (or crash on) lazy loading while debugging certain sections of code.
# In my test suite or while debugging:
PseudoCode.disable_lazy_loading!
SuspectedNPlusOne.run(params) # Crash if lazy loading occurs
PseudoCode.enable_lazy_loading!
How can I disable lazy loading, or temporarily crash on DB reads for the sake of debugging?
ActiveRecord's querying API revolves around the concept of lazy loading. When I go in to try to clean up N+1 queries I find myself constantly fighting against the way AR was designed to work, which makes it sooo tempting to say "Meh, computers are fast enough".
This is one of several usability reasons I've decided to switch to Elixir / Phoenix for all future development work; the Ecto ORM makes it literally impossible to lazy load data: you either load an association at query time, or you don't have the associated data and must explicitly query it later.
One very partial suggestion: install the Bullet gem and use its Bullet.raise = true setting to crash the app whenever lazy loading occurs (see the configuration sketch after this list). Troubleshooting with Bullet is awkward because it's fighting against ActiveRecord's natural behavior, so it can only tell you where the N+1 query was lazily triggered, not where in the code the query was originally composed (and where you might need to add includes). But at least with this gem in place you can get part of the information you're seeking:
Is this code executing obvious N+1 queries?
Where in the code was the N+1 query triggered?
What class and association name is the culprit?
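For example, Bullet's documented setup in config/environments/development.rb (or test.rb) looks something like this:

config.after_initialize do
  Bullet.enable = true
  Bullet.bullet_logger = true
  Bullet.raise = true # raise as soon as an N+1 query / lazy load is detected
end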
I have an application with a lot of database relationships that depend on each other for the application to operate successfully. The hinge in the application is a model called Schedule, and the schedule pulls in Blocks, an Employee, a JobTitle, and an Assignment (in addition, every Block pulls an Assignment from the database along with it) to assemble an employee's schedule throughout the day.
When I built the app, I put a lot of emphasis on validations that would ensure that all of the pieces had to be in place before everything was saved to the database. This has worked out fantastically so far, and the app has been live and pounded on for almost 6 months, serving approximately 150,000 requests a month with no hiccups or errors. Until last week.
Last week, while someone was altering a schedule, it looks like the database erred, and a Schedule was saved to the database with its Assignment missing. Because the association is called in every view, whenever this schedule was loaded from the database the application would raise a NoMethodError for calling a method on nil.
When designing an application in the way I describe, do you guard against a possible failure on the part of the database/validations? And if so, how do you programmatically defend against it? Do you check every relationship to make sure it is not nil before sending it to the view?
I know this question is awash in generality, and if I can be more specific in what I mean, please let me know in the comments.
I would recommend adding database-enforced foreign key constraints and wrapping important groups of operations into transactions.
If there is a foreign key between Schedule and Assignment somewhere, a database-enforced foreign key constraint would have prevented the errant insert. Additionally, if you wrap the particular action in a transaction, you can be sure that either the entire stream of inserts/updates/deletes happens or fails, reverting to a clean state.
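A rough sketch, assuming schedules has an assignment_id column (older Rails has no built-in add_foreign_key helper, so raw SQL or the foreigner gem does the job):

class AddAssignmentForeignKeyToSchedules < ActiveRecord::Migration
  def self.up
    execute <<-SQL
      ALTER TABLE schedules
        ADD CONSTRAINT fk_schedules_assignment
        FOREIGN KEY (assignment_id) REFERENCES assignments (id)
    SQL
  end
end

And wrapping the related writes so they succeed or fail together:

Schedule.transaction do
  schedule.save!
  schedule.assignment.save!
end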
In addition to your validations, and adding some database constraints as mentioned in other answers, you might also run a background job that periodically sweeps the database looking for orphans.
When it finds one, it cleans it up (if possible), or deletes it, or just marks it inactive and sends you email so you can look at it later. Depending on the amount and nature of your data, once a minute, once an hour, once a day...
That way, if bad data does get in despite whatever safeguards you have in place, you'll know about it sooner rather than later.
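A rough sketch of such a sweep (Rails 3 query syntax; the :active flag is just one hypothetical way to park bad records):

orphans = Schedule.where("assignment_id IS NULL OR assignment_id NOT IN (SELECT id FROM assignments)")
orphans.find_each do |schedule|
  # delete, repair, or just flag it and email yourself the details
  schedule.update_attribute(:active, false)
end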
I'll argue against the conventional wisdom on this. The constraints you describe don't belong in the database; they belong in your OO code. And it's not true that "the database erred"; it's unquestionably the application that inserted improperly validated data.
When you start expecting the database to carry the burden of these checks, you're putting business rules into the schema. At a minimum, this makes it a lot harder to write unit tests (which is where you should probably have caught this in the first place; but now is your chance to add another test.)
Ideally, you should be able to replace the RDBMS with some other generic data store and still have all the functional logic properly active and unchanged in the appropriate other places. The UI shouldn't be talking to the DAL, much less dealing with database exceptions directly.
You can add the additional database constraints if you want, but it should be strictly as a backup. As you can see, handling database structural errors gracefully (especially if the UI is involved) is a lot harder.
If it's something that must be true in order for the app to function, that's really what assert()s are for. I've barely ever used Ruby, but I imagine it must have that concept. Use them to enforce preconditions in various places throughout your code. That combined with sanitizing and validating your external (user) inputs should be enough to protect you. I think if something goes wrong after that amount of checking, your app is righteously allowed to crash (in a controlled manner, of course).
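Ruby has no built-in assert, but a one-line guard raised early serves the same purpose (hypothetical helper, reusing the Schedule/Assignment names from the question):

def assignment_for(schedule)
  raise "Schedule #{schedule.id} has no assignment" if schedule.assignment.nil?
  schedule.assignment
end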
I doubt the problem you're experiencing is a bug in your database. More likely there's some edge case in your validations that you've overlooked.
Good day all,
We are doing a data migration from one system to a Rails application. Some of the tables we are working with are very large and moving them over 1 record at a time using ActiveRecord takes far too long. Therefore we resorted to copying the table over in SQL and validating after the fact.
The one-by-one validation check is still slow, but the speed increase from the SQL copy more than makes up for it. However, that hasn't quenched our thirst to see if we can get the validation check to happen more quickly. We attempted to split the table into chunks and pass each chunk to a Thread but it actually executed slower.
The question is: given a large table, we are currently iterating row by row to do the validation, like so:
Model.find_each do |m|
logger.info "M #{m.id} is not valid" unless m.valid?
end
Anyone have any recommendations on how to speed this up?
Thanks
peer
EDIT: I should say we are not asking about this code specifically. We are looking for recommendations on how we can run this concurrently, giving each process a chunk of data, without needing a machine per process.
find_each uses find_in_batches, which fetches 1000 rows at a time by default. You could try playing with the batch_size option. The way you have it above seems pretty optimal: it fetches from the database in batches and iterates over each record, which you need to do. I would monitor your RAM to see if the batch size is optimal, and you could also try Ruby 1.9.1 to speed things up if you're currently using 1.8.*.
http://api.rubyonrails.org/classes/ActiveRecord/Batches/ClassMethods.html#M001846
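For example, to try a larger batch while keeping the code from the question:

Model.find_each(:batch_size => 5000) do |m|
  logger.info "M #{m.id} is not valid" unless m.valid?
end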
I like zgchurch's response as a starting point.
What I would add is that threading is definitely not going to help here, especially because Ruby uses green threads (at least in 1.8.x), so there is no opportunity to utilize multiple processors anyway. Even if that weren't the case, it's very likely that this operation is IO-heavy enough that you would get IO contention eating into any multi-core benefit.
Now if you really want to speed this up you should take a look at the actual validations and figure out a more efficient way to achieve them. Just loading all the rows and instantiating the ActiveRecord objects tends to dominate performance in most validation situations. You may be spending 90-99.99% of your time just loading and unloading the data from memory.
In these types of situations I tend to go towards raw SQL. You can do things like validating foreign key integrity tens of thousands of times faster than with ActiveRecord validation callbacks. Of course the viability of this approach depends on the actual ins and outs of your validations. Even if you need something a little richer than SQL to define validity, you could still probably get a 10-100x speed increase just by loading the minimal data with a thinner SQL interface and examining the data directly. If that's the case, Perl or Python might be a better choice for raw performance.
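For example, a foreign-key integrity check can be a single set-based query instead of one check per row (table and column names here are made up):

orphan_ids = ActiveRecord::Base.connection.select_values(<<-SQL)
  SELECT orders.id
  FROM orders
  LEFT JOIN customers ON customers.id = orders.customer_id
  WHERE customers.id IS NULL
SQL
logger.info "Orders with missing customers: #{orphan_ids.join(', ')}" unless orphan_ids.empty?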
There is a good chance that we will be tech crunched in the next few days. Unfortunately, we have not gone live yet so we don't have a good estimation of how our system handles a production audience.
Our production setup consists of 2 EngineYard slices each with 3 mongrel instances, using Postgres as the database server.
Obviously a huge portion of how our app will hold up has to do with our actual code and queries etc. However, it would be good to see if there are any tips/pointers on what kind of load to expect, or experiences from people who have been through it. Do 6 mongrel instances (possibly 8 if the servers can take it) sound like they will handle the load, or at least most of it?
I have worked on several rails applications that experienced high load due to viral growth on Facebook.
Your mongrel count should be based on several factors. If your mongrels make API calls or deliver email and must wait for responses, then you should run as many as possible. Otherwise, try to maintain one mongrel per CPU core, with maybe a couple extra left over.
Make sure your server is using a Fair Proxy Balancer (not round robin). Here is the nginx module that does this: http://github.com/gnosek/nginx-upstream-fair/tree/master
And here are some other tips on improving and benchmarking your application performance to handle the load:
ActiveRecord
The most common problem Rails applications face is poor usage of ActiveRecord objects. It can be quite easy to make hundreds of queries when only one is necessary. The easiest way to determine whether this could be a problem with your application is to set up New Relic. After making a request to each major page on your site, take a look at the New Relic SQL overview. If you see a large number of very similar queries in sequence (select * from posts where id = 1, select * from posts where id = 2, select * from posts...), it may be a sign that you need to use an :include in one of your ActiveRecord calls.
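The classic before/after looks like this (Rails 2.x syntax, hypothetical Post/Author models):

# before: 1 query for the posts, then 1 query per post for its author (N+1)
Post.find(:all).each { |post| puts post.author.name }
# after: 2 queries total, authors are eager loaded
Post.find(:all, :include => :author).each { |post| puts post.author.name }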
Some other basic ActiveRecord tips (These are just the ones I can think of off the top of my head):
If you're not doing it already, make sure to correctly use indexes on your database tables.
Avoid making database calls in views, especially partials; it can be very easy to lose track of how many database queries you are making in views. Push all queries and calculations into your models or controllers.
Avoid making queries in iterators. Usually this can be done by using an :include.
Avoid having Rails build ActiveRecord objects for large datasets as much as possible. When you make a call like Post.find(:all).size, a new object is instantiated for every Post in your database (and it could be a large query too). In this case you would want to use Post.count, which will make a single fast query and return an integer without instantiating any objects.
Associations like has_many :objects on User create both a user.objects and a user.object_ids method. The latter skips instantiation of ActiveRecord objects and can be much faster, especially when dealing with large numbers of objects (see the sketch after this list).
Learn and use named_scope whenever possible. It will help you keep your code tiny and makes it much easier to have efficient queries.
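To illustrate the counting and *_ids tips (Rails 2.x syntax, hypothetical models):

Post.find(:all).size   # instantiates every Post just to count them
Post.count             # single SELECT COUNT(*), returns an integer

user.books.map(&:id)   # loads full ActiveRecord objects for every book
user.book_ids          # one query, returns plain integers, no objects built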
External APIs & ActionMailer
As much as you can, do not make API calls to external services while handling a request. Your server will stop executing code until a response is received. Not only will this add to load times, but your mongrel will not be able to handle new requests.
If you absolutely must make external calls during a request, you will need to run as many mongrels as possible since you may run into a situation where many of them are waiting for an API response and not doing anything else. (This is a very common problem when building Facebook applications)
The same applies to sending emails in some cases. If you expect many users to sign up in a short period of time, be sure to benchmark the time it takes for ActionMailer to deliver a message. If it's not almost instantaneous, then you should consider storing emails in your database and using a separate script to deliver them.
Tools like BackgroundRB have been created to solve this problem.
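The simplest version of the store-now-deliver-later idea looks something like this (QueuedEmail and the mailer method are made up):

# during the request: just record what needs to go out
QueuedEmail.create!(:recipient => user.email, :subject => "Welcome", :body => body)

# from cron or a background worker: actually deliver and clean up
QueuedEmail.find_each do |email|
  UserMailer.deliver_queued_message(email) # Rails 2.x style deliver_* call, hypothetical mailer
  email.destroy
end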
Caching
Here's a good guide on the different methods of caching in rails.
Benchmarking (Locating performance problems)
If you suspect a method may be slow, try benchmarking it in console. Here's an example:
>> Benchmark.measure { User.find(4).pending_invitations }
=> #<Benchmark::Tms:0x77934b4 @cutime=0.0, @label="", @total=0.0, @stime=0.0, @real=0.00199985504150391, @utime=0.0, @cstime=0.0>
Keep track of methods that are slow in your application. Those are the ones you want to avoid executing frequently. In some cases only the first call will be slow since Rails has a query cache. You can also cache the method yourself using Memoization.
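For example, with ActiveSupport's built-in memoization (available since Rails 2.2; the query inside the method is just a placeholder):

class User < ActiveRecord::Base
  extend ActiveSupport::Memoizable

  def pending_invitations
    Invitation.find(:all, :conditions => { :recipient_id => id, :accepted_at => nil })
  end
  memoize :pending_invitations
end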
NewRelic will also provide a nice overview of how long methods and SQL calls take to execute.
Good luck!
Look into some load testing software like WEBLoad or if you have money, Quick Test Pro. This will help give you some idea. WEBLoad might be the best test in your situation.
You can generate thousands of virtual nodes hitting your site and you can inspect the performance of your servers from that load.
In my experience, having watched some of our customers absorb a crunching, the traffic was fairly modest, not the bone-crushing spike people seem to expect. Now, if you get syndicated and make it onto Yahoo's page or something, things may be different.
Search for the experiences of Facestat.com if you want to read about how they handled it (the Yahoo front page).
My advice is to just be prepared to turn off signups or switch to a more static version of your site if your servers get too hot. Using a monitoring/profiling tool is a good idea as well; I like the FiveRuns Manage tool for its ease of setup.
Since you're using EngineYard, you should be able to allocate more machines to handle the load if necessary.
Your big problems will probably not come from the number of incoming requests, but from the amount of data in your database: that's what shows you where your queries aren't using the indexes you're expecting, or are returning too much data. For example, the User List page works with 10 users but dies when you try to show 10,000 users on that one page because you didn't add pagination (the will_paginate plugin is almost your friend; watch out for the 'select count(*)' queries that are generated for you).
So the two things to watch:
Missing indexes
Too much data per page
For #1, there's a plugin that runs an 'explain ...' query after every query so you can check index usage manually.
There is also a plugin that can generate various types of test data, which may help you fill your database up to test these queries.
For #2, use the will_paginate plugin or some other way to reduce the data per page.
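will_paginate usage is essentially one line in the controller plus a view helper:

# controller
@users = User.paginate(:page => params[:page], :per_page => 50)
# view
<%= will_paginate @users %>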
We've got basically the same setup as you, 2 prod slices and a staging slice at EY. We found ab to be a great load testing tool: just write a bash script with the URLs that you expect to get hit and point it at your slice. Watch the New Relic stats and it should give you some idea of the load your app can handle and where you might need to optimise.
We also found query_reviewer to be very useful. It is great for finding those un-indexed tables and n+1 queries.