In Rails 2, I'm trying to optimize the performance of a web page that loads slowly.
I'm timing the execution of statements in a model and finding that a surprising amount of the time is spent in a call from inside one model to another, even though there appears to be no database access at all.
To be specific, let's say the slow model is Department, and I'm calculating Department.expenditures. The expenditures method needs to know whether the quarter has been closed, and that information lives in a different model, Quarter.
The first time Department.expenditures calls Quarter.closed? there is a database access, and I can accept that. But I've done something to keep that value in memory inside the model method, so that subsequent calls to Quarter.closed? involve no database access. The code inside Quarter.closed? now runs in around 4 microseconds, but simply invoking Quarter.closed? from inside Department.expenditures takes 400 microseconds, and with hundreds of departments, that adds up.
I could cache the Quarter.closed? value in a global variable, but that seems hairy. Does anyone know what is going on, or have a suggestion for a better practice?
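A class-level instance variable is the usual alternative to a global here. A minimal sketch, assuming a closed boolean column and a hypothetical Quarter.current lookup for the current quarter:

class Quarter < ActiveRecord::Base
  def self.closed?
    # `defined?` rather than `||=` so a legitimate false is cached too
    return @closed if defined?(@closed)
    @closed = current.closed   # one query, then held for the process lifetime
  end
end

Separately, since the remaining 400 microseconds looks like per-call overhead rather than query time, reading Quarter.closed? once into a local variable before looping over the hundreds of departments avoids paying that cost on every iteration.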
I'm not 100% sure this applies to your problem, but in many cases similar loading-time problems are solved by eager loading. You would do it like this:
Department.all(:include => :expenditures)
I'm a bit out of Rails 2 syntax. In Rails 3 you can specify includes quite detailed like this:
Category.includes(:posts => [{:comments => :guest}, :tags]).find(1)
I think (but am not sure) that the :include option in Rails 2 allowed similar syntax, so maybe this would work:
Department.all(:include => [:expenditures => [:quarters]])
(You may need to experiment with combinations of array/hash syntax here.)
Related
In my Ruby on Rails project, I have a mailer that basically prepares a daily digest of things that happened in the system for a given user. In the mailer controller, I am gathering all the relevant records from the various models according to some common pattern (within a certain date, not authored by this user, not flagged, etc) and with minor differences from model to model.
There are half a dozen models involved here (and counting), and most of them have unified column names for certain things (like the date of publishing, or whether an item is flagged by an admin). Hence, the where clauses that go into each query are mostly the same. There are minor differences in conditions, but at least 2 or 3 conditions are exactly the same. I can easily imagine even more conditions becoming shared between models, since we are just starting the feature and haven't figured out the eventual shape of the data yet.
I basically chain the where calls on each model. It irritates me to have 6 lines of code so close to each other, spanning so far to the right of my code editor, and yet so similar. And I dread the idea that at some point we will have to change one of the 'core' conditions, munging that many lines of code all at once.
What I'd love to do is to move a core set of conditions that goes into each query into some sort of Proc or whatever, then simply call it upon each model like a scope, and after that continue the 'where' chain with model-specific conditions. Much like a scope on each model.
What I am struggling with is how exactly to do that while keeping the code inside the mailer. I certainly know that I could declare a complex scope inside a concern, mix it into my models, and start each query with that scope. However, that way the logic would move away from the mailer into the uncharted territory of model concerns, and it would complicate each model with a scope that is currently needed only by one little mailer in a huge system. Also, some queries require a set of details from the User model, and I don't want each of my models to handle User.
I like the way scopes are defined in the Active Record models via lambdas (like scope :pending, -> { where(approved: [nil, false]) }), and was looking for a way to use similar syntax outside model class and inside my mailer method (possibly with a tap or something like that), but I haven't found any good examples of such an approach.
So, is this possible to achieve? Can I collect the core where calls inside some variable in my mailer method and apply them to several models, while still being able to continue the where chain afterwards?
The beauty of Arel, the technology behind ActiveRecord query-building, is that it's all completely composable, using ordinary Ruby.
Do I understand your question correctly, that this is what you want to do?
def add_on_something(arel_scope)
  arel_scope.where("magic = true").where("something = 1")
end

add_on_something(User).where("more").order("whatever").limit(10)
add_on_something(Project.where("whatever")).order("something")
An ordinary Ruby method will do it; you don't need a special ActiveRecord feature, because AR scopes are already composable.
You could do something like:
@report_a = default_scope(ModelA)
@report_b = default_scope(ModelB)

private

# Takes a model class (or any relation) and returns a relation with the
# shared conditions applied; the where chain can be continued on the result.
def default_scope(model)
  model.
    where(approved: [nil, false]).
    order(:created_at)
  # ...
end
I started using the 'flog' and 'flay' gems to bring down code complexity and duplication. As a result, some of my controllers ended up with a lot of before and after filters. For example, even if one line of code was repeated in multiple methods of a controller, I moved that code into a before_filter. flog and flay say my code is optimized, but I was wondering whether it really is. Do so many filters slow down execution?
I don't necessarily think so, but I haven't tested it. One way to keep filters efficient is to add conditions to them. For example: before_filter :store_image, :unless => :has_image?
This way the controller only executes store_image when no image is present.
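As a sketch of the pattern in context (the controller, action names, and both method bodies below are hypothetical, building on the example above):

class ImagesController < ApplicationController
  # Runs only for the actions that used to repeat this code, and is
  # skipped entirely when an image already exists.
  before_filter :store_image, :unless => :has_image?, :only => [:create, :update]

  private

  def store_image
    @image = current_user.images.build(params[:image])
  end

  def has_image?
    current_user.images.exists?
  end
end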
I have a model that reaches the same associated model through two different foreign-key relationships. I do a lot of lookups on those tables (which are 3-4K rows) while importing a lot of data, and I'm trying to eliminate spurious repeated database lookups.
(Ideally, my database would be doing async writes/INSERTs only)
I have played with doing my own caching by ID, and recently switched to using Rails.cache (with MemoryStore for now; I have no need to sync against other instances, and I'm RAM-rich on the import machine). However, I find that I am getting multiple copies of the same associated record in memory, and I'd like to get rid of this.
For instance:
irb> p = Phone.includes([:site => :client, :btn => :client]).first
irb> p.site.client.object_id => 67190640
irb> p.btn.client.object_id => 67170780
Ideally, I'd like these to point to the same object in memory.
Rails.cache would serialize things in/out, which really just makes this worse, but I was surprised by this. Could I override find_by_id() or some such in a way that the association proxies would make use of my cache?
Maybe there is another caching module that I'm missing?
(please note that there is no web front end involved in this process. It's all models and ORM)
Try using IdentityCache (see https://github.com/Shopify/identity_cache). We currently have a similar issue. We're using JRuby because it's fast, but mallocs are expensive in our target environment, making caching these record instances all the more necessary.
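A hedged sketch of the wiring, going by the project's README (cache_belongs_to and the generated fetch_* readers are its documented API, and it expects a memcached-style cache backend to be configured):

class Client < ActiveRecord::Base
  include IdentityCache
end

class Site < ActiveRecord::Base
  include IdentityCache
  belongs_to :client
  cache_belongs_to :client   # adds Site#fetch_client
end

site = Site.fetch(1)         # read-through cache instead of Site.find
client = site.fetch_client   # cached association lookup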
There used to be an IdentityMap within ActiveRecord, but it was removed due to issues with unexpected behaviour around associations.
Just noticed you asked this in August, did you find a good solution?
I am building a Rails app whose data should be reset every "season" but still kept. In other words, the only data retrieved from any table should be for the current season, but previous seasons should remain accessible if you want them.
We basically need to have multiple instances of the entire database, one for each season.
The client's idea was to export the database at the end of the season, save it, and start fresh. The problem with this is that we can't look at all of the data at once.
The only idea I have is to add a season_id column to every model. But in this scenario, every query would need to have where(season_id: CURRENT_SEASON). Should I just make this a default scope for every model?
Is there a good way to do this?
If you want all the data in a single database, then you'll have to filter it, so you're on the right track. This is totally fine; data is filtered all the time anyway, so it's not a big deal. Also, what you're describing is very similar to marking data as archived (anything not in the current season is essentially archived), which is very commonly done, usually by setting a boolean flag on each record to hide it, or some equivalent method.
You'll probably want a scope or default_scope. The main downside of a default_scope is that you must use .unscoped everywhere you want to access data outside the current season, whereas not using a default scope means you must specify the scope on every call. Default scopes can also seem to get applied in funny places from time to time, so in my experience I prefer to be explicit about the scopes I'm using (I therefore never use default_scope), but this is more of a personal preference.
In terms of how to design the database you can either add the boolean flag for every record that tells whether or not that data is in the current season, or as you noted you can include a season_id that will be checked against the current season ID and filter it that way. Either way, a scope of some sort would be a good way to do it.
If using a simple boolean, then either at the end of the current season or at the start of the new one, you would have to go and mark all current-season records as no longer current. This may require a rake task or something similar to make it convenient, and it adds a small amount of maintenance.
If using a season_id plus a constant in the code to indicate which season is current (perhaps via a config file) it would be easier to mark things as the current season since no DB updates will be required from season to season.
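A minimal sketch of the season_id approach (the Game model and Season.current_id are hypothetical; the latter could read the configured current season once at boot):

class Game < ActiveRecord::Base
  belongs_to :season
  scope :current_season, -> { where(season_id: Season.current_id) }
end

Game.current_season                   # normal queries see only this season
Game.where(season_id: old_season_id)  # archived seasons stay reachable

With a default_scope instead, the filter is applied automatically, but every query that needs older seasons has to remember to call Game.unscoped.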
[Disclaimer: I'm not familiar with Ruby so I'll just comment from the database perspective.]
The problem with this is that we can't look at all of the data at once.
If you need to keep the old versions accessible, then you should keep them in the same database.
Designing "versioned" (or "temporal" or "historized") data model is something of a black art - let me know how your model looks like now and I might have some suggestions how to "version" it. Things can get especially complicated when handling connections between versioned objects.
In the meantime, take a look at this post, for an example of one such model (unrelated to your domain, but hopefully providing some ideas).
Alternatively, you could try using a DBMS-specific mechanism such as Oracle's flashback query, but this is obviously not available to everybody and may not be suitable for keeping the permanent history...
Introduction
I have a (mostly) single-page application built with BackboneJS and a Rails backend.
Because most of the interaction happens on one page of the webapp, when the user first visits the page I basically have to pull a ton of information out of the database in one large deeply joined query.
This is causing me some rather extreme load times on this one page.
NewRelic appears to be telling me that most of my problems are because of 457 individual fast method calls.
Now I've done all the eager loading I can do (I checked with the Bullet gem) and I still have a problem.
These method calls are most likely occurring in my Rabl serializer, which I use to serialize a bunch of JSON to embed into the page for initializing Backbone. You don't need to understand all of this, but suffice it to say it could add up to 457 method calls.
object @search

attributes :id, :name, :subscription_limit

# NOTE: Include a list of the members of this search.
child :searchers => :searchers do
  attributes :id, :name, :gravatar_icon
end

# Each search has many concepts (there could be over 100 of them).
child :concepts do |search|
  attributes :id, :title, :search_id, :created_at

  # The person who suggested each concept.
  child :suggester => :suggester do
    attributes :id, :name, :gravatar_icon
  end

  # Each concept has many suggestions (approx. 4 each).
  node :suggestions do |concept|
    # Here I'm scoping suggestions to only ones which meet certain conditions.
    partial "suggestions/show", object: concept.active_suggestions
  end

  # Add a boolean flag to tell if the concept is a favourite or not.
  node :favourite_id do |concept|
    # Another method call which occurs for each concept.
    concept.favourite_id_for(current_user)
  end
end

# Each search has subscriptions to certain services (approx. 4).
child :service_subscriptions do
  # This contains a few attributes and 2 fairly innocuous method calls.
  extends "service_subscriptions/show"
end
So it seems that I need to do something about this but I'm not sure what approach to take. Here is a list of potential ideas I have:
Performance Improvement Ideas
Dumb-Down the Interface
Maybe I can come up with ways to present information to the user which don't require the actual data to be present. I don't see why I should absolutely need to do this, though; other single-page apps such as Trello have incredibly complicated interfaces.
Concept Pagination
If I paginate concepts, it will reduce the amount of data extracted from the database each time. It would produce an inferior user interface, though.
Caching
At the moment, refreshing the page just extracts the entire search out of the DB again. Perhaps I can cache parts of the app to cut down on DB hits. This seems messy, though, because not much of the data I'm dealing with is static.
Multiple Requests
It is technically bad to serve the page without the JSON embedded in it, but perhaps the user will feel like things are happening faster if I serve the page unpopulated and then fetch the data.
Indexes
I should make sure that I have indexes on all my foreign keys. I should also try to think about places where it would help to have indexes (such as favourites?) and add them.
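For instance, a hedged migration sketch, with table names guessed from the Rabl template above:

class AddLookupIndexes < ActiveRecord::Migration
  def change
    # foreign keys traversed on every page load
    add_index :concepts, :search_id
    add_index :suggestions, :concept_id
    # favourite lookups happen per user and per concept
    add_index :favourites, [:user_id, :concept_id]
  end
end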
Move Method Calls into DB
Perhaps I can cache some of the results of the iteration I do in my view layer into the DB and just pull them out instead of computing them. Or I could sync things on write rather than on read.
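On the read side, a sketch of collapsing the per-concept favourite_id_for calls into a single query (the Favourite model and its columns are guesses, and concepts/current_user are assumed available wherever the page is prepared):

# One query for the whole search instead of one per concept:
favourites_by_concept = {}
Favourite.where(user_id: current_user.id,
                concept_id: concepts.map(&:id)).each do |fav|
  favourites_by_concept[fav.concept_id] = fav.id
end

# ...and the Rabl node becomes a hash lookup:
node(:favourite_id) { |concept| favourites_by_concept[concept.id] }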
Question
Does anyone have any suggestions as to what I should be spending my time on?
This is a hard question to answer without being able to see the actual user interface, but I would focus on loading only as much data as is required to display the initial interface. For example, if the user has to drill down to see some of the data you're presenting, then you can load that data on demand rather than as part of the initial payload. You mention that a search can have as many as 100 "concepts"; maybe you don't need to fetch all of those initially?
Bottom line, it doesn't sound like your issue is really on the client side -- it sounds like your server-side code is slowing things down, so I'd explore what you can do to fetch less data, or to defer the complex queries until they are definitely required.
I'd recommend separating your JS code-base into modules that are dynamically loaded using an asset loader like RequireJS. This way you won't have so many XHRs firing at load time.
When a specific module is needed it can load and initialize at an appropriate time instead of every page load.
If you complicate your code a little, each module should be able to start and stop. So, if you have any polling occurring or complex code executing, you can stop the module to increase performance and decrease the network load.