I am loading data from two models. Once the data are loaded into variables, I need to remove from the first relation those items that are not in the second one.
A sample:
users = User.all
articles = Articles.order('created_at DESC').limit(100)
Both variables now hold relational data. I need to remove from articles every item whose user_id is not included in users, so that articles keeps only the items whose user_id belongs to one of the loaded users.
I tried it with a loop, but it was very slow. How do I do it efficiently?
EDIT:
I know there's a way to avoid this by building a better query, but in my case I cannot do that (although I agree that it is possible in the example above). The thing is that I have data from the database loaded into two variables and I need to process them with Ruby. Is there a command for doing that?
Thank you
Assuming you have a belongs_to relation on the Article model:
articles.where(user: users)
This would give you at most 100, but probably less. If you want to return 100 with the condition (I haven't tested, but the idea is the same, put the conditions for users in the where statement):
Articles.includes(:users).where.not(users: true).order('created_at DESC').limit(100)
The best way to do this would probably be with a SQL join. Would this work?
Articles.joins(:user).order('created_at DESC').limit(100)
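If the filtering really has to happen in Ruby on data that is already loaded, one option is to collect the user ids into a Set and keep only the articles whose user_id is in it. That is a single pass over each collection instead of a nested loop. The following is a minimal sketch under assumptions: LoadedUser, LoadedArticle and articles_with_known_users are hypothetical names standing in for the loaded records, not part of Rails.

```ruby
require 'set'

# Hypothetical stand-ins for records that are already loaded in memory.
LoadedUser    = Struct.new(:id)
LoadedArticle = Struct.new(:id, :user_id)

# Keeps only the articles whose user_id belongs to one of the given users.
# Building the Set once makes each membership check a constant-time lookup.
def articles_with_known_users(articles, users)
  user_ids = users.map(&:id).to_set
  articles.select { |article| user_ids.include?(article.user_id) }
end

users    = [LoadedUser.new(1), LoadedUser.new(2)]
articles = [LoadedArticle.new(10, 1), LoadedArticle.new(11, 3), LoadedArticle.new(12, 2)]

puts articles_with_known_users(articles, users).map(&:id).inspect
```

With real ActiveRecord relations, select with a block should behave the same way, but it loads the records and returns a plain Array, so it belongs after order and limit.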
I have a variable (@cars) containing data from the database, and from it I need to generate an XLS document.
Because of some specifics of the XLS document, I need to know in advance the length of some associations (the Car model has a has_many association to the PreviousOwner model). In particular, I need to know how many previous owners each car had, and I need to capture the highest number of previous owners across all cars.
One way of finding that is to add a counter_cache to the Car model. Is there any other way to deal with this situation? I have the @cars variable, and from it I need to find the car with the most previous owners.
One of the ways of dealing with it is by joining and selecting a count:
Car.left_joins(:previous_owners)
   .select(
     'cars.*',
     'COUNT(previous_owners.*) AS previous_owners_count'
   )
   .group(:id)
   .order('previous_owners_count DESC')
Advantages when compared to a counter cache:
No additional update queries when inserting associated records.
More accurate if the count is critical and you have a lot of write activity.
Disadvantages:
Count is calculated for every query which is less efficient when reading.
It gets in the way of eager loading the records.
More code complexity vs a simple model callback.
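If the cars are already loaded, the highest count can also be computed in plain Ruby. This is only a sketch: LoadedCar and both helper names are hypothetical, and in a real app the previous_owners association would presumably need to be eager loaded (for example with includes(:previous_owners)) so that size does not trigger one query per car.

```ruby
# Hypothetical stand-in for a loaded Car with its previous owners.
LoadedCar = Struct.new(:name, :previous_owners)

# Highest number of previous owners among the given cars (0 for no cars).
def max_previous_owners(cars)
  cars.map { |car| car.previous_owners.size }.max || 0
end

# The car that has the most previous owners (nil for no cars).
def car_with_most_previous_owners(cars)
  cars.max_by { |car| car.previous_owners.size }
end

cars = [
  LoadedCar.new('A', %w[Ann Bob]),
  LoadedCar.new('B', %w[Cid]),
  LoadedCar.new('C', %w[Dee Eli Fay])
]

puts max_previous_owners(cars)
puts car_with_most_previous_owners(cars).name
```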
In my model User, I have scope set up:
scope :count_likes, lambda {
select("(SELECT count(*) from another_model) AS count")
}
If I want to get all attributes of my User + count_likes, I have to do:
Model.count_likes.select("users.*")
because calling select() replaces the default "*".
I use the count_likes scope a lot in my application, and my issue is that I have to append select("users.*") everywhere.
I know about the default scope; however, I don't think doing select("users.*") in the default scope is a good idea.
Is there a DRY / better way of doing this?
Thanks
This isn't really another answer. I wanted to leave a comment about the joins, but comments cannot run long and I wanted to provide code examples.
What you need is to sometimes get all the fields plus the counts of a related table, and other times get the counts without the users.* fields (and maybe sometimes just the users.* fields without the counts). So you are going to have to tell the code which one you want. I think what you are looking for is an "except" type of thing, where by default you get the users.* fields and the counts, but when you only want the counts, you can turn off the select('users.*'). I don't think there is such a solution, except maybe using the default scope. I suggest having one scope for just the counts, and one scope for the users fields and the counts.
Here is what I would do:
class User < ActiveRecord::Base
  has_many :likes

  # All users fields plus the likes count.
  def self.with_count_likes
    joins(:likes)
      .select('users.*, count(likes.id) as count')
      .group('users.id')
  end

  # Only a few identifying fields plus the likes count.
  def self.count_likes
    joins(:likes)
      .select('users.id, users.username, count(likes.id) as count')
      .group('users.id')
  end
  ...
Call with_count_likes (or chain it into a query) when you want all the users fields and the likes counts. Call count_likes when you want just the counts and a few identifying fields.
I'm assuming here that whenever you want the counts, you want some users fields to identify what/(who) the counts are for.
Note that some databases (like Oracle) may require grouping by all of the selected users columns. That is standard SQL behavior, but some databases, such as MySQL, accept grouping by the primary key alone.
You may simply add users.* to the scope.
scope :count_likes, lambda {
select("(SELECT count(*) from another_model) AS count, users.*")
}
HTH
EDIT: I am not sure of exactly what you are trying to achieve, but you should consider using joins and get the data by joining tables appropriately.
EDIT: Usually I am not a big fan of making such changes, but as situation suggests sometimes we need to get our hands dirty. In this case, I would try to reduce the number of operations in terms of making changes. Consider:
scope :count_likes, Proc.new { |all| s = select("(SELECT count(*) from another_model) AS count"); s = s.select("users.*") unless all == false; s }
Now you will get users.* everywhere. For specific places where you just need the count, you may replace it like User.count_likes(false) and it will give you just the counts. Thus minimal changes.
There may be another possibility of appending multiple scopes together, one for counts, one for users.* and use them to achieve the above effect.
Can I order my users in the database, so I don't have to say order_by("created_at desc") each time I query?
It sounds to me like a logical thing to do, but I don't know whether it's possible or whether it's best practice.
SOLUTION
I'm already using the default_scope and as I understand it from you, it is the best way to do it? Thanks a lot for the answers though.
If you are after results sorted by create date desc, the reverse natural order will be close to this (but not guaranteed to be identical).
If you want a specific ordering, adding order_by() to an indexed query is the best way to assure this.
If you are using the default generated ObjectIds, the first 4 bytes are actually a Unix timestamp (seconds since the epoch), and the _id field is indexed by default, aside from a few exceptions noted in the documentation.
So a query like last 50 users created (based on ObjectId) in the mongo shell would be:
db.users.find().sort({_id:-1}).limit(50)
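To illustrate why sorting on _id approximates sorting on creation time, the timestamp can be read back out of an ObjectId's hex form. This is a small sketch in plain Ruby without any MongoDB driver; the helper name object_id_created_at is hypothetical and the sample id is only an illustration.

```ruby
# Reads the creation time from a 24-character hex ObjectId string.
# The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
def object_id_created_at(hex)
  Time.at(hex[0, 8].to_i(16)).utc
end

puts object_id_created_at('507f1f77bcf86cd799439011')
```

Because the timestamp only has one-second resolution, ids created within the same second are not guaranteed to sort in exact creation order.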
There are mixed views about default scopes, but to achieve what you're asking:
http://apidock.com/rails/ActiveRecord/Base/default_scope/class
class User < ActiveRecord::Base
  # Block form, which Rails 4 and later require.
  default_scope { order('created_at DESC') }
  ### other model code here ###
end
You should be able to add an index or indexes to your db table. Be careful with running this on a live system, as creating an index on a large table can add enough overhead to disrupt it.
EDIT: should have expanded.
By creating an index, you will still have to order, but your ordering/sorting will be more efficient.
ref: Is it okay to add database indexes to a database that already has data?
Named scopes really made this problem easier but it is far from being solved. The common situation is to have logic redefined in both named scopes and model methods.
I'll try to demonstrate the edge case with a somewhat complex example. Let's say that we have a Message model that has many Recipients. Each recipient is able to mark the message as read for themselves.
If you want to get the list of unread messages for given user, you would say something like this:
Message.unread_for(user)
That would use the named scope unread_for that would generate the sql which will return the unread messages for given user. This sql is probably going to join two tables together and filter messages by those recipients that haven't already read them.
On the other hand, when we are using the Message model in our code, we are using the following:
message.unread_by?(user)
This method is defined in the Message class, and even though it does basically the same thing, it now has a different implementation.
For simpler projects, this is really not a big thing. Implementing the same simple logic in both sql and ruby in this case is not a problem.
But when the application starts to get really complex, it becomes a problem. If we have a permission system that checks who is able to access which message based on dozens of criteria defined in dozens of tables, this gets very complex. Soon it comes to the point where you need to join 5 tables and write really complex SQL by hand in order to define the scope.
The only "clean" solution to the problem is to make the scopes use the actual ruby code. They would fetch ALL messages, and then filter them with ruby. However, this causes two major problems:
Performance
Pagination
Performance: we are sending a lot more queries to the database. I am not sure about the internals of a DBMS, but how much harder is it for the database to execute 5 queries, each on a single table, than 1 query that joins 5 tables at once?
Pagination: we want to keep fetching records until specified number of records is being retrieved. We fetch them one by one and check whether it is accepted by ruby logic. Once 10 of them are accepted, process will stop.
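The "fetch until the page is full" idea can be sketched with a lazy enumerator, which stops pulling records as soon as enough of them have passed the Ruby check. This is only an illustration under assumptions: paginate_filtered is a hypothetical helper, the numbers stand in for records, and the block stands in for the permission logic.

```ruby
# Returns the requested page of records accepted by the given block,
# examining only as many source records as needed to fill that page.
def paginate_filtered(records, page:, per_page:, &accepted)
  records.lazy
         .select(&accepted)
         .drop((page - 1) * per_page)
         .first(per_page)
end

checked = 0
first_page = paginate_filtered(1..1_000, page: 1, per_page: 3) do |n|
  checked += 1 # counts how many records were actually examined
  n.even?
end

puts first_page.inspect
puts checked
```

Later pages still have to re-check every record that precedes them, so the cost grows with the page number; this approach only avoids loading and checking everything up front.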
Curious to hear your thoughts on this. I have no experience with nosql dbms, can they tackle the issue in different way?
UPDATE:
I was only speaking hypothetically, but here is one real-life example. Let's say that we want to display all transactions on one page (both payments and expenses).
I have created a SQL UNION query to get them both, then I go through each record, check whether it can be :read by the current user, and finally paginate the result as an array.
def form_transaction_log
  # Build one SELECT per record type so they can be combined with UNION.
  sql1 = @project.payments
    .select("'Payment' AS record_type, id, created_at")
    .where('expense_id IS NULL')
    .to_sql
  sql2 = @project.expenses
    .select("'Expense' AS record_type, id, created_at")
    .to_sql
  result = ActiveRecord::Base.connection.execute %{
    (#{sql1} UNION #{sql2})
    ORDER BY created_at DESC
  }
  # Load every record, then keep only the ones the current user may read.
  result = result.map do |record|
    klass = Object.const_get record["record_type"]
    klass.find record["id"]
  end.select do |record|
    can? :read, record
  end
  @transactions = Kaminari.paginate_array(result).page(params[:page]).per(7)
end
Both payments and expenses need to be displayed within same table, ordered by creation date and paginated.
Both payments and expenses have completely different :read permissions (defined in the ability class, CanCan gem). These permissions are quite complex and they require querying several other tables.
The "ideal" thing would be to write one HUGE SQL query that would return what I need. It would make pagination and everything else a lot easier. But that is going to duplicate the logic defined in my ability.rb class.
I'm aware that CanCan provides a way of defining the sql query for the ability, but the abilities are so complex, that they couldn't be defined in that way.
What I did is working, but I'm loading ALL transactions, and then checking which ones I could read. I consider it a big performance issue. Pagination here seems pointless because I'm already loading all records (it only saves bandwidth). An alternative is to write really complex SQL that is going to be hard to maintain.
Sounds like you should remove some duplication and perhaps use DB logic more. There's no reason that you can't share code between named scopes and other methods.
Can you post some problematic code for review?
I have an ActiveRecord model Language, with columns id and short_code (there are other columns, but they are not relevant to this question). I want to create a method that will be given a list of short codes, and return a list of IDs. I do not care about associations, I just need to end up with an array that looks like [1, 2, 3, ...].
My first thought was to do something like
def get_ids_from_short_codes(*short_codes)
Language.find_all_by_short_code(short_codes.flatten, :select => 'id').map(&:id)
end
but I'm not sure if that's needlessly wasting time/memory/processing.
My question is twofold:
Is there a way to run an ActiveRecord find that will just return an array of a certain table column rather than instantiating objects?
If so, would it actually be worthwhile to collect an array of length n rather than instantiating n ActiveRecord objects?
Note that for my specific purpose, n would be approximately 200.
In Rails 3.2 and later, you can use the pluck method, which returns the values from the requested field without instantiating objects to hold them.
This would give you an array of IDs:
Language.where(short_code: short_codes.flatten).pluck(:id)
I should mention that in Rails 3.2 you can pluck only one column at a time, but in Rails 4 you can pass multiple columns to pluck.
By the way, here's a similar answer to a similar question
Honestly, for 200 records, I wouldn't worry about it. When you get to 2000, 20,000, or 200,000 records - then you can worry about optimization.
Make sure you have short_code indexed in your table.
If you are still concerned about performance, take a look at the development.log and see what the database numbers are for that particular call. You can adjust the query and see how it affects performance in the log. This should give you a rough estimate of performance.
Agree with the previous answer, but if you absolutely must, you can try this
sql = Language.send(:construct_finder_sql, :select => 'id', :conditions => ["short_code in (?)", short_codes])
Language.connection.select_values(sql)
A bit ugly as it is, but it doesn't create in-memory objects.
If you're using associations, you can get raw ids directly from ActiveRecord.
E.g.:
class User < ActiveRecord::Base
has_many :users
end
>> User.find(:first).user_ids
=> [1,2,3,4,5]
Phil is right about this, but if you do find that this is an issue, you can send a raw SQL query to the database and work at a level below ActiveRecord. This can be useful for situations like this.
ActiveRecord::Base.connection.execute("SQL CODE!")
Benchmark your code first before you resort to this.
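A rough way to do that with Ruby's standard Benchmark module is sketched below. It only shows the shape of such a measurement: Row is a hypothetical Struct standing in for a model instance, so the timings say nothing about real ActiveRecord objects, and in practice the two blocks would wrap the actual queries being compared.

```ruby
require 'benchmark'

# Hypothetical stand-in for an instantiated model object.
Row = Struct.new(:id, :short_code)

# Builds one object per row and then extracts the ids.
def ids_via_objects(pairs)
  pairs.map { |id, code| Row.new(id, code) }.map(&:id)
end

# Takes the ids straight from the raw [id, short_code] pairs.
def ids_direct(pairs)
  pairs.map(&:first)
end

pairs = Array.new(200) { |i| [i + 1, "code#{i}"] }

Benchmark.bm(12) do |x|
  x.report('objects:')    { 1_000.times { ids_via_objects(pairs) } }
  x.report('raw values:') { 1_000.times { ids_direct(pairs) } }
end
```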
This really is a matter of choice.
Overkill or not, ActiveRecord is supposed to give you objects since it's an ORM. And like Ben said, if you do not want objects, use raw SQL.