Query Optimization with ActiveRecord for each method - ruby-on-rails

The query below is taking too much time, and I can't work out how to optimize it.
Code and Associations :
temp = []
platforms = current_user.company.advisory_platforms
platforms.each{ |x| temp << x.advisories.published.collect(&:id) }
class Advisory
has_many :advisory_platforms,:through =>:advisory_advisory_platforms
end
class AdvisoryPlatform
has_many :companies,:through => :company_advisory_platforms
has_many :company_advisory_platforms,:dependent => :destroy
has_many :advisory_advisory_platforms,:dependent => :destroy
has_many :advisories, :through => :advisory_advisory_platforms
end

There are three glaring performance issues in your example.
First, you are iterating the records using each, which means you are loading the entire record set into memory at once. If you must iterate records this way, use find_each so that they are loaded in batches.
Second, every iteration of your each loop is performing an additional SQL call to get its results. You want to limit SQL calls to the bare minimum.
Third, you are instantiating entire Rails models simply to collect a single value, which is very wasteful. Instantiating Rails models is expensive.
I'm going to solve these problems in two ways. First, construct an ActiveRecord relation that will access all the data you need in one query. Second, use pluck to grab the id you need without paying the model instantiation cost.
You didn't specify what published is doing so I am going to assume it is a scope on Advisory. You also left out some of the data model so I am going to have to make assumptions about your join models.
advisory_ids = AdvisoryAdvisoryPlatform
.where(advisory_platform_id: current_user.company.advisory_platforms)
.where(advisory_id: Advisory.published)
.pluck(:advisory_id)
If you pass a Relation object as the value of a field, ActiveRecord will convert it into a subquery.
So
where(advisory_id: Advisory.published)
is analogous to
WHERE advisory_id IN (SELECT id FROM advisories WHERE published = true)
(or whatever it is published is doing).
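One caveat worth sketching: the loop in the question builds an array of arrays, and an advisory attached to several platforms can appear more than once in either version. Below is a minimal plain-Ruby sketch of the cleanup. The ids are made-up stand-ins for query results, not data from the question.

```ruby
# Hypothetical result of the original loop: one inner array per platform.
ids_per_platform = [[1, 2], [2, 3], []]

# Flatten into a single list and drop ids that appear under several platforms.
advisory_ids = ids_per_platform.flatten.uniq
```

With the single-query version the list is already flat, so only the uniq step would be needed (or, on reasonably recent Rails versions, a distinct on the relation before pluck).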

Related

Can you temporarily disable mongoid relationships to roll your own queries

I am trying to manually build a series of queries to get around mongo's lack of joins and mongoids lack of eager loading. Suppose I have 2 classes:
class A
include Mongoid::Document
has_many :bs
...
class B
include Mongoid::Document
belongs_to :a
...
If I run a query on bs:
bs = B.where(...)
The result is a Mongoid::Criteria.
If I try to get the first b by calling bs.first, however, it immediately fires a mongo query for the a association. This is exactly what I'm trying to avoid (If I have 1,000 b's, I'm trying to avoid 1000 singleton b queries).
This is fine, but when I have complex relationships, I want to work around the lack of eager loading by manually specifying the models myself, collecting ids, and then only returning the core model, without the associations.
Is there anything that will let me do this? Something like:
bs = B.where(...).disable_automatic_association_queries
Does such a thing exist?
The method is .without. For example:
A.where(...).without(:bs)
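The manual approach described in the question (collect the ids, load the parents once, then match them up in memory) can be sketched in plain Ruby. This is only an illustration under assumptions: hashes stand in for the Mongoid documents, and load_as is a hypothetical helper standing in for one query that fetches all the a documents by id.

```ruby
# Hashes stand in for B documents; real code would use the Mongoid models.
bs = [{ id: 1, a_id: 10 }, { id: 2, a_id: 10 }, { id: 3, a_id: 20 }]

# Hypothetical loader standing in for ONE query that fetches all a's by id.
def load_as(ids)
  ids.map { |id| { id: id, name: "a-#{id}" } }
end

# Collect the distinct parent ids, load them once, and index them by id.
a_ids = bs.map { |b| b[:a_id] }.uniq
as_by_id = load_as(a_ids).each_with_object({}) { |a, index| index[a[:id]] = a }

# Pair every b with its parent without issuing any further lookups.
pairs = bs.map { |b| [b, as_by_id[b[:a_id]]] }
```

The point of the design is that the number of lookups depends on the number of associations, not on the number of b documents.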

Rails Data Modelling

In my company, we are trying to cache some data that we are querying from an API. We are using Rails. Two of my models are 'Query' and 'Response'. I want to create a one-to-many relationship between Query and Response, wherein, one query can have many responses.
I thought this is the right way to do it.
Query = [query]
Response = [query_id, response_detail_1, response_detail_2]
Then, in the Models, I did the following Data Associations:
class Query < ActiveRecord::Base
has_many :responses
end
class Response < ActiveRecord::Base
belongs_to :query
end
So, canonically, whenever I want to find all the responses for a given query, I would do:
query_id = Query.where(:query => "given query").first.id
Response.where(:query_id => query_id)
But my boss made me use an Array column in the Query model, remove the Data Associations between the models and put the id of each response record in that array column in the Query model. So, now the Query model looks like
Query = [query_id, [response_id_1, response_id_2, response_id_3,...]]
I just want to know what are the merits and demerits of doing it both ways and which is the right way to do it.
If the relationship is really a one-to-many relationship, the "standard" approach is what you originally suggested, or using a junction table. You're losing out on referential integrity that you could get with a FK by using the array. Postgres almost had FK constraints on array columns, but from what I researched it looks like it's not currently in the roadmap:
http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/
You might get some performance advantages out of the array approach if you consider it like a denormalization/caching assist. See this answer for some info on that, but it still recommends using a junction table:
https://stackoverflow.com/a/17012344/4280232. This answer and the comments also offer some thoughts on the array performance vs the join performance:
https://stackoverflow.com/a/13840557/4280232
Another advantage of using the array is that arrays will preserve order, so if order is important you could get some benefits there:
https://stackoverflow.com/a/2489805/4280232
But even then, you could put the order directly on the responses table (assuming they're unique to each query) or you could put it on a join table.
So, in sum, you might get some performance advantages out of the array foreign keys, and they might help with ordering, but you won't be able to enforce FK constraints on them (as of the time of this writing). Unless there's a special situation going on here, it's probably better to stick with the "FK column on the child table" approach, as that is considerably more common.
Granted, that all applies mainly to SQL databases, which I notice now you didn't specify in your question. If you're using NoSQL there may be other conventions for this.
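To make the referential-integrity point concrete, here is a minimal plain-Ruby sketch of the check the application would have to run itself under the array design. All ids are hypothetical.

```ruby
require 'set'

# Hypothetical data: ids of Response rows that still exist in the database.
existing_response_ids = Set[1, 2, 4]

# Hypothetical array column on a Query row, as in the array-based design.
query_response_ids = [1, 2, 3]

# Ids the query still points at even though the response row is gone.
dangling_ids = query_response_ids.reject { |id| existing_response_ids.include?(id) }
```

With a query_id foreign key on responses, the reference lives in the response row itself, so deleting a response cannot leave a stale pointer behind. With an array column, nothing stops the array from going stale.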

Can I have a one way HABTM relationship?

Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
belongs_to :foo
has_and_belongs_to_many :bars
...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: [ 'barvalue1', 'barvalue2' ] } which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?
Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other association, instead just consider where your IDs need to sit in order to query the data appropriately and efficiently. Remember, efficiency considerations should above all account for your productivity.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt to use a has_many :through relationship if you need to do anything fancy.
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
class Query < ActiveRecord::Base
serialize :duck_ids
def ducks
transaction do
Duck.where(id: duck_ids)
end
end
end
If you have native array support in your database, you can do something similar from within your DB.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; You get all the Rails goodies (e.g. query.bars, query.bars=, query.bars.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll lose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.

Rails ActiveRecord - Uniqueness and Lookup on Array Attribute

Good morning,
I have a Rails model in which I’m currently serializing an array of information. Two things are important to me:
I want to be able to ensure that this is unique (i.e. can’t have two models with the same array)
I want to be able to search existing models for this hash (in a type of find_or_create_by method).
This model describes a “portfolio” – i.e. a group of stock or bonds. The array is the description of what securities are inside the portfolio, and in what weights. I also have a second model, which is a group of portfolios (lets call it a “Portcollection” to keep things simple). A collection has many portfolios, and a portfolio can be in many collections. In other words:
class Portfolio
serialize :weights
has_and_belongs_to_many :portcollections
class Portcollection
has_and_belongs_to_many :portfolios
When I am generating a “portcollection” I need to build a bunch of portfolios, which I do programmatically (implementation not important). Building a portfolio is an expensive operation, so I’m trying to check for the existence of one first. I thought I could do this via find_or_create_by, but wasn’t having much luck. This is my current solution:
class Portcollection
before_save :build_portfolios
def build_portfolios
……
proposed_weights = ……
yml = proposed_weights.to_yaml
if port = Portfolio.find_by_weights(yml)
self.portfolios << port
else
self.portfolios << Portfolio.create!(:weights => proposed_weights)
end
……..
end
This does work, but it is quite slow. I have a feeling this is because I’m converting stuff to YAML each time it runs when I try to check for an existing portfolio (this is running probably millions of times), and I’m searching for a string, as opposed to an integer. I do have an index on this column though.
Is there a better way to do this? A few thoughts had crossed my mind:
Calculate an MD5 hash of the “weights” array, and save to a database column. I’ll still have to calculate this hash each time I want to search for an array, but I have a gut feeling this would be easier for the database to index & search?
Work on moving from has_and_belongs_to_many to has_many :through, and store the array information as database columns. That way I could try to sort out a database query that could check for the uniqueness, without any YAML or serialization…
i.e. something like :
class Portfolio
has_many :portcollections, :through => :security_weights
class Portcollection
has_many :portfolios, :through => :security_weights
SECURITY_WEIGHTS
id portfolio_id portcollection_id weight_of_GOOG weight_of_APPLE ……
1 14 15 0.4 0.3
In case it is important, the “weights” array would look like this:
[ [‘GOOG’, 0.4] , [‘AAPL’, 0.3] , [‘GE’, 0.3] ]
Any help would be appreciated. Please keep in mind I'm quite an amateur - programming is just a hobby for me! Please excuse me if I'm doing anything really hacky or missing something obvious....
Thanks!
UPDATE 1
I've done some research into the Rails 3.2 "store" method, but that doesn't seem to be the answer either... It just stores objects as JSON, which gives me the same lack of searchability I have now.
I think storing a separate hash in its own column is the only way to do this efficiently. You are using serialization or a key/value store that is designed to not be easily searchable.
Just make sure you consider sorting your values before hashing them, otherwise you could have the same content but differing hashes.
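A minimal sketch of the sort-then-hash idea, assuming the weights array has the shape shown in the question. The helper name weights_digest and the choice of JSON as the canonical form are illustrative assumptions, not part of the original code.

```ruby
require 'digest'
require 'json'

# Returns a digest that is identical for the same securities and weights,
# regardless of the order in which the pairs were built.
def weights_digest(weights)
  canonical = weights.sort_by { |ticker, _weight| ticker }
  Digest::MD5.hexdigest(JSON.generate(canonical))
end

a = weights_digest([['GOOG', 0.4], ['AAPL', 0.3], ['GE', 0.3]])
b = weights_digest([['GE', 0.3], ['GOOG', 0.4], ['AAPL', 0.3]])
# a and b are equal because both inputs sort to the same canonical form.
```

The digest could then be stored in its own indexed column and used for the lookup instead of the YAML string. Note that floating-point weights that differ only by rounding noise would still produce different digests, so rounding them before hashing may be necessary.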

ActiveRecord has_n association

I was wondering what the best way to model a relationship where an object is associated with exactly n objects of another class. I want to extend the has_one relationship to a specific value of n.
For example, a TopFiveMoviesList would belong to user and have exactly five movies. I would imagine that the underlying sql table would have fields like movie_id_1, movie_id_2, ... movie_id_5.
I know I could do a has_many relationship and limit the number of children at the model level, but I'd rather not have an intermediary table.
I think implementing this model through a join model is going to be your best bet here. It allows the List model to worry about List logic and the Movie model to worry about Movie logic. You can create a Nomination (name isn't the greatest, but you know what I mean) model to handle the relationship between movies and lists, and when there's a limit of 5, you could just limit the number of nominations you pull back.
There are a few reasons I think this approach is better.
First, assuming you want to be able to traverse the relationships both ways (movie.lists and list.movies), the 5 column approach is going to be much messier.
While it'd be so much better for ActiveRecord to support has n relationships, it doesn't, and so you'll be fighting the framework on that one. Also, the has n relationship seems a bit brittle to me in this situation. I haven't seen that kind of implementation pulled off in ActiveRecord, though I'd be really interested in seeing it happen. :)
My first instinct would be to use a join table, but if that's not desirable User.movie[1-5]_id columns would fit the bill. (I think movie1_id fits better with Rails convention than movie_id_1.)
Since you tagged this Rails and ActiveRecord, I'll add some completely untested and probably somewhat wrong model code to my answer. :)
class User < ActiveRecord::Base
TOP_N_MOVIES = 5
(1..TOP_N_MOVIES).each { |n| belongs_to "movie#{n}".to_sym, :class_name => 'Movie' }
end
You could wrap that line in a macro-style method, but unless that's a common pattern in your application, doing so will probably just make your code that much harder to read with little DRY benefit.
You might also want to add validations to ensure that there are no duplicate movies on a user's list.
Associating your movie class back to your users is similar.
class Movie < ActiveRecord::Base
(1..User::TOP_N_MOVIES).each do |n|
has_many "users_list_as_top_#{n}".to_sym, :class_name => 'User', :foreign_key => "movie#{n}_id"
end
def users_list_as_top_anything
ary = []
(1..User::TOP_N_MOVIES).each {|n| ary += self.send("users_list_as_top_#{n}") }
return ary
end
end
(Of course that users_list_as_top_anything would probably be better written out as explicit SQL. I'm lazy today.)
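As a rough sketch of what the explicit SQL version might look like, the condition over all five columns can be generated rather than typed out. The variable names here are illustrative, and the column names assume the movie1_id to movie5_id convention used above.

```ruby
top_n = 5 # mirrors User::TOP_N_MOVIES in the model code above

# Build "movie1_id = :id OR movie2_id = :id OR ..." covering every slot.
condition = (1..top_n).map { |n| "movie#{n}_id = :id" }.join(' OR ')
```

Passing this string to something like User.where(condition, :id => id) should fetch all matching users in one query instead of five, though like the rest of this answer it is untested against a real schema.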
I assume you mean "implement" rather than "model"? The modeling's pretty easy in UML, say, where you have a Person entity that is made up of 5 Movie entities.
But the difficulty comes when you say has_one, going to has_5. If it's a simple scalar value, has_one is perhaps a property on the parent entity. Has_5 is probably 2 entities related to one another through an "is made up of" relationship in UML.
The main question to answer is probably, "Can you guarantee that it will always be 'Top 5'?" If yes, model it with columns, as you mentioned. If no, model it with another entity.
Another question is perhaps, "How easy will it be to refactor?" If it's simple, heck, start with 5 columns and refactor to separate entities if it ever changes.
As usual, "best" is dependent on the business and technical environment.
