Rails 4 - Join with AND clause - ruby-on-rails

The query below gets me the user's next question based on the status of that question. It gets all the questions for that specific section and then the scope does a LEFT JOIN on the statuses that belong to that user.
My question is: this doesn't seem like a very Railsy way to do it. Is there a better way of filtering my table than this clumsy AND and .to_s business? My issue is that, without the user filter, if any user has answered that question then the left join will fill up with that user's answer, whereas I require it to be null.
Essentially the query works but is ugly and I can't figure out if it's the most efficient way!
scope :next_for_user, lambda { |user|
  joins("LEFT JOIN user_question_statuses ON user_question_statuses.question_id = questions.id AND user_question_statuses.user_id = ", user.id.to_s).
    reorder("user_question_statuses.answered ASC NULLS FIRST").
    order("user_question_statuses.updated_at ASC NULLS FIRST").
    limit(1)
}
Edit:
I realise this method is particularly vulnerable to SQL injection so I've replaced the main line in the query with:
joins(sanitize_sql_array(["LEFT JOIN user_question_statuses ON user_question_statuses.question_id = questions.id AND user_question_statuses.user_id = %d", user.id]))
which seems to work and forces the input to be an integer only.
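To illustrate why the %d placeholder helps, here is a minimal sketch that uses only Ruby's Kernel#format and no Active Record at all. As far as I understand it, sanitize_sql_array falls back to this kind of format-string substitution when the statement has no ? or named placeholders, but treat that as an assumption. build_join is a hypothetical helper name.

```ruby
# Hypothetical helper: builds the join fragment with Kernel#format only.
JOIN_TEMPLATE = "LEFT JOIN user_question_statuses " \
                "ON user_question_statuses.question_id = questions.id " \
                "AND user_question_statuses.user_id = %d"

def build_join(user_id)
  # %d converts the argument to an integer and raises if it cannot.
  format(JOIN_TEMPLATE, user_id)
end

puts build_join(42)     # integer goes straight in
puts build_join("42")   # a purely numeric string is converted to 42

begin
  build_join("42 OR 1=1")
rescue ArgumentError => e
  # Anything that is not an integer never reaches the SQL string.
  puts "rejected: #{e.message}"
end
```

So a malicious string is rejected with an exception rather than being interpolated into the query.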
Edit 2:
My other option is to use find_each and then use first_or_create to create empty question statuses for that particular section of questions for the current user. This could happen as and when they need them, before looking for a question. This would allow me to do a RIGHT JOIN from the questions on to those statuses, knowing they exist. But if the first method is efficient and safe (and as Railsy as it can be), then there's no reason to change it.
Edit 3:
I have structured this query in this way because - from the section model that has_many questions - I want to find the next question that should be passed to a user.
To find this I need to join all of the user_question_statuses on to all of the section model's questions. The only way this can be done is on question.id. However, there are many user_question_statuses with that question id for different users. So when joining I need the AND clause to filter the user_question_statuses down to only the ones from that user before the join happens. A user can obviously only have one status per question.
I use a LEFT JOIN so that if a status does not yet exist (they only get created after a user attempts a question for the first time) the question still produces a row, with NULLs in every status column, which can then move to the top (hence NULLS FIRST) and potentially be served to the user.
This may all be extremely unclear!
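In case it helps make the intent clearer, here is a rough in-memory sketch of what the query is meant to do. It is plain Ruby, not Active Record, and every name in it (Question, Status, next_for_user, the sample data) is made up for illustration. It only handles the case where the whole status row is missing, not a NULL inside an existing row.

```ruby
Question = Struct.new(:id, :title)
Status   = Struct.new(:question_id, :user_id, :answered, :updated_at)

# Mimics: LEFT JOIN statuses ON question_id = questions.id AND user_id = ?
#         ORDER BY answered ASC NULLS FIRST, updated_at ASC NULLS FIRST LIMIT 1
def next_for_user(questions, statuses, user_id)
  # The AND in the ON clause: keep only this user's statuses before joining.
  mine = statuses.select { |s| s.user_id == user_id }
                 .to_h { |s| [s.question_id, s] }

  questions.min_by do |q|
    s = mine[q.id]   # nil plays the role of the all-NULL joined row
    [
      s.nil? ? 0 : 1,            # NULLS FIRST: questions with no status row
      s && s.answered ? 1 : 0,   # answered ASC: unanswered before answered
      s ? s.updated_at : 0       # updated_at ASC: least recently touched
    ]
  end
end

QUESTIONS = [Question.new(1, "a"), Question.new(2, "b"), Question.new(3, "c")]
STATUSES = [
  Status.new(1, 7, true, 100),    # user 7 answered question 1
  Status.new(2, 7, false, 200),   # user 7 attempted question 2
  Status.new(3, 8, true, 50)      # a different user's answer to question 3
]

puts next_for_user(QUESTIONS, STATUSES, 7).id   # prints 3
```

User 8's answer to question 3 is ignored for user 7, which is exactly what the AND clause is there for.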

What you did was skip the ORM layer offered by Active Record and construct the queries yourself. Your feeling is correct that this approach has many limitations and is not a good fit for the MVC model that Rails follows. I would suggest reading through the Active Record Query Interface guide to get the concepts of doing it the Rails/OOP way.

Related

Join an ActiveRecord model to a table in another schema with no model

I need to join an ActiveRecord model in my Ruby on Rails app to another table in a different schema that has no model. I've searched for the answer, and found parts of it, but not a whole solution in one place, hence this question.
I have Vehicle model, with many millions of rows.
I have a table (reload_cars) in another schema (temp_cars) in the same database, with a few million records. This is an ad hoc table, to be used for one ad hoc data update, and will never be used again. There is no model associated with that table.
I initially was lazy and selected all the reload_cars records into an array (reload_vins) in one query, and then in a second query did something like:
`Vehicle.where(vin_status: :invalid).where('vin in (?)', reload_vins)`.
That's simplified a bit from the actual query, but demonstrates the join I need. In other queries, I need full sets of inner and outer joins between these tables. I also need to put various selection criteria on the model table and/or the non-model table in various steps.
That blunt approach worked fine in development, but did not scale up to the production database. I thought it would take a few minutes, which is plenty fast enough for a one-time operation. But, it timed out, particularly when looping through sets of records. Small tweaks did not help.
So, I need to do a legit join.
In retrospect, the answer seems pretty obvious. This query ran pretty much instantly, and gave the exact expected result, with various criteria on each table.
Here is one such query:
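# The extra condition goes in the ON clause: a WHERE inside joins() would
# collide with the WHERE that Active Record generates for vin_status.
Vehicle.where(vin_status: :invalid)
  .joins("
    join temp_cars.reload_cars tcar
      on tcar.vin = vehicles.vin
      and tcar.registration_id is not null
  ")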

Include vs Join

I have 3 models
User - has many debits and has many credits
Debit - belongs to User
Credit - belongs to User
Debit and credit are very similar. The fields are basically the same.
I'm trying to run a query on my models to return all fields from debit and credit where user is current_user
User.left_outer_joins(:debits, :credits).where("users.id = ?", @user.id)
As expected returned all fields from User as many times as there were records in credits and debits.
User.includes(:credits, :debits).order(created_at: :asc).where("users.id = ?", @user.id)
It ran 3 queries and I thought it should be done in one.
The second part of this question is: how could I add the record type into the query?
That is, records from credits would have an extra field marking them as credits, and the same for debits.
I have looked into ActiveRecordUnion gem but I did not see how it would solve the problem here
includes can't magically retrieve everything you want it to in one query; it will run one query per model (typically) that you need to hit. Instead, it eliminates future unnecessary queries. Take the following examples:
Bad
users = User.first(5)
users.each do |user|
p user.debits.first
end
There will be 6 queries in total here, one to User retrieving all the users, then one for each .debits call in the loop.
Good!
users = User.includes(:debits).first(5)
users.each do |user|
p user.debits.first
end
You'll only make two queries here: one for the users and one for their associated debits. This is how includes speeds up your application, by eagerly loading things you know you'll need.
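If you want to see the 6 versus 2 for yourself without a database, here is a small simulation. It is plain Ruby, not Active Record: FakeDb and its methods are invented for this sketch and simply count how many "queries" they receive.

```ruby
# In-memory stand-in for a database; it only counts how many "queries" it gets.
class FakeDb
  attr_reader :query_count

  def initialize(user_ids, debits)
    @user_ids = user_ids
    @debits = debits            # array of { user_id:, amount: } hashes
    @query_count = 0
  end

  # One query: the first `limit` users.
  def users(limit)
    @query_count += 1
    @user_ids.first(limit)
  end

  # One query per call: debits of a single user (the lazy user.debits case).
  def debits_for(user_id)
    @query_count += 1
    @debits.select { |d| d[:user_id] == user_id }
  end

  # One query for everyone: debits of many users, grouped by user id
  # (roughly what includes(:debits) does up front).
  def debits_for_all(user_ids)
    @query_count += 1
    @debits.select { |d| user_ids.include?(d[:user_id]) }
           .group_by { |d| d[:user_id] }
  end
end

def lazy_load_query_count(db)
  db.users(5).each { |id| db.debits_for(id).first }
  db.query_count
end

def eager_load_query_count(db)
  ids = db.users(5)
  by_user = db.debits_for_all(ids)
  ids.each { |id| by_user.fetch(id, []).first }
  db.query_count
end

USER_IDS = (1..5).to_a
DEBITS = USER_IDS.map { |id| { user_id: id, amount: id * 10 } }

puts lazy_load_query_count(FakeDb.new(USER_IDS, DEBITS))    # prints 6
puts eager_load_query_count(FakeDb.new(USER_IDS, DEBITS))   # prints 2
```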
As for your comment, yes it seems to make sense to combine them into one table. Depending on your situation, I'd recommend looking into Single Table Inheritance (STI). If you don't go this route, be careful with adding a column called type, Rails won't like that!
First of all, by calling the first query on the User class you are asking for records of type User. If you do not want User objects, you are performing an extra join that could be costly (could be, not will be).
If you want credit and debit records, simply query the Credit and Debit models. If you have already loaded the user object prior to this point, use includes, preload, or eager_load to load the linked credit and debit records all at once.
There are two ways of pre-loading records in Rails. In the first, Rails performs a single query for each type of record; in the second, Rails performs only one query and builds the objects of the different types from the data returned.
includes is a smart pre-loader that uses one of the two ways depending on which one it thinks would be faster.
If you want to force Rails to use one query no matter what, eager_load is what you are looking for.
Please read all about includes, eager_load and preload in the article here.

Rails select in scope

In my model User, I have scope set up:
scope :count_likes, lambda {
select("(SELECT count(*) from another_model) AS count")
}
If I want to get all attributes of my User + count_likes, I have to do:
Model.count_likes.select("users.*")
because calling select() will override the default "*"
I use the count_likes scope in a lot of my application, and my issue is that I have to append select("users.*") everywhere.
I know about the default scope; however, I don't think doing select("users.*") in the default scope is a good idea.
Is there a DRY / better way of doing this?
Thanks
This isn't really another answer. I wanted to leave a comment about the joins, but comments cannot run long and I wanted to provide code examples.
What you need is to sometimes get all the fields and counts of a related table, and other times get the counts without the users.* fields (and maybe sometimes just the users.* fields without the counts). So, you are going to have to tell the code which one you want. I think what you are looking for is an except type of thing, where by default you get the users.* fields and the counts, but when you only want the counts, you specify turning off the select('users.*'). I don't think there is such a solution, except maybe using the default scope. I suggest having one scope for just the counts, and one scope for the users fields and the counts.
Here is what I would do:
class User < ActiveRecord::Base
  has_many :likes

  # All users columns plus the likes count.
  def self.with_count_likes
    joins(:likes)
      .select('users.*, count(likes.id) as count')
      .group('users.id')
  end

  # Just the likes count plus a few identifying columns.
  def self.count_likes
    joins(:likes)
      .select('users.id, users.username, count(likes.id) as count')
      .group('users.id')
  end
  ...
Call with_count_likes (or chain it into a query) when you want all the users fields and the likes counts. Call count_likes when you want just the counts and a few identifying fields.
I'm assuming here that whenever you want the counts, you want some users fields to identify what/(who) the counts are for.
Note that some databases (like Oracle) may require grouping by all of the selected users columns. This is the standard in SQL, but some databases like MySQL only need the primary key.
You may simply add users.* to the scope.
scope :count_likes, lambda {
select("(SELECT count(*) from another_model) AS count, users.*")
}
HTH
EDIT: I am not sure of exactly what you are trying to achieve, but you should consider using joins and get the data by joining tables appropriately.
EDIT: Usually I am not a big fan of making such changes, but as situation suggests sometimes we need to get our hands dirty. In this case, I would try to reduce the number of operations in terms of making changes. Consider:
scope :count_likes, Proc.new { |all| s = select("(SELECT count(*) from another_model) AS count"); s = s.select("users.*") unless all == false; s }
Now you will get users.* everywhere. For specific places where you just need the count, you may replace it like User.count_likes(false) and it will give you just the counts. Thus minimal changes.
There may be another possibility of appending multiple scopes together, one for counts, one for users.* and use them to achieve the above effect.

Duplicating logic in methods and scopes (and sql)

Named scopes really made this problem easier but it is far from being solved. The common situation is to have logic redefined in both named scopes and model methods.
I'll try to demonstrate the edge case of this with a somewhat complex example. Let's say that we have a Message model that has many Recipients. Each recipient is able to mark the message as read for themselves.
If you want to get the list of unread messages for given user, you would say something like this:
Message.unread_for(user)
That would use the named scope unread_for that would generate the sql which will return the unread messages for given user. This sql is probably going to join two tables together and filter messages by those recipients that haven't already read them.
On the other hand, when we are using the Message model in our code, we are using the following:
message.unread_by?(user)
This method is defined in the Message class, and even though it does basically the same thing, it now has a different implementation.
For simpler projects, this is really not a big thing. Implementing the same simple logic in both sql and ruby in this case is not a problem.
But when application starts to get really complex, it starts to be a problem. If we have permission system implemented that checks who is able to access what message based on dozens of criteria defined in dozens of tables, this starts to get very complex. Soon it comes to the point where you need to join 5 tables and write really complex sql by hand in order to define the scope.
The only "clean" solution to the problem is to make the scopes use the actual ruby code. They would fetch ALL messages, and then filter them with ruby. However, this causes two major problems:
Performance
Pagination
Performance: we are creating a lot more queries to the database. I am not sure about the internals of a DBMS, but how much harder is it for the database to execute 5 queries, each on a single table, than 1 query that joins 5 tables at once?
Pagination: we want to keep fetching records until the specified number of records has been retrieved. We fetch them one by one and check whether each is accepted by the ruby logic. Once 10 of them are accepted, the process stops.
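Roughly, the "fetch until the page is full" idea could be sketched like this. It is plain Ruby over an in-memory array, so the batches only stand in for real database fetches; readable? and page_of_readable are hypothetical names, and the even-id rule is just a placeholder for the real permission logic.

```ruby
# Hypothetical stand-in for the Ruby-side permission check.
def readable?(record)
  record[:id].even?
end

# Pulls records in batches and stops as soon as per_page of them pass the
# check. Returns the page and how many records had to be fetched for it.
def page_of_readable(records, per_page:, batch_size: 4)
  fetched = 0
  page = records.each_slice(batch_size).lazy
                .flat_map do |batch|
                  fetched += batch.size   # one "query" worth of records
                  batch
                end
                .select { |record| readable?(record) }
                .first(per_page)
  [page, fetched]
end

RECORDS = (1..20).map { |id| { id: id } }

page, fetched = page_of_readable(RECORDS, per_page: 3)
p page.map { |record| record[:id] }   # prints [2, 4, 6]
puts fetched                          # prints 8
```

Only 8 of the 20 records are touched for the first page, but note that later pages are harder: you need to remember where the previous page stopped, because a plain offset no longer maps to a page number.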
Curious to hear your thoughts on this. I have no experience with nosql dbms, can they tackle the issue in different way?
UPDATE:
I was only speaking hypothetically, but here is one real-life example. Let's say that we want to display all transactions on one page (both payments and expenses).
I have created a SQL UNION query to get them both, then I go through each record, check whether it can be :read by the current user, and finally paginate the result as an array.
def form_transaction_log
  sql1 = @project.payments
    .select("'Payment' AS record_type, id, created_at")
    .where('expense_id IS NULL')
    .to_sql
  sql2 = @project.expenses
    .select("'Expense' AS record_type, id, created_at")
    .to_sql
  result = ActiveRecord::Base.connection.execute %{
    (#{sql1} UNION #{sql2})
    ORDER BY created_at DESC
  }
  result = result.map do |record|
    klass = Object.const_get record["record_type"]
    klass.find record["id"]
  end.select do |record|
    can? :read, record
  end
  @transactions = Kaminari.paginate_array(result).page(params[:page]).per(7)
end
Both payments and expenses need to be displayed within same table, ordered by creation date and paginated.
Both payments and expenses have completely different :read permissions (defined in the ability class, CanCan gem). These permissions are quite complex and require querying several other tables.
The "ideal" thing would be to write one HUGE SQL query that would return what I need. It would make pagination and everything else a lot easier. But that is going to duplicate the logic defined in my ability.rb class.
I'm aware that CanCan provides a way of defining the sql query for the ability, but the abilities are so complex, that they couldn't be defined in that way.
What I did is working, but I'm loading ALL transactions, and then checking which ones I could read. I consider it a big performance issue. Pagination here seems pointless because I'm already loading all records (it only saves bandwidth). An alternative is to write really complex SQL that is going to be hard to maintain.
Sounds like you should remove some duplication and perhaps use DB logic more. There's no reason that you can't share code between named scopes and other methods.
Can you post some problematic code for review?

Querying Mongodb collection based on parent's attribute

I've got a Posts document that belongs to Users, and Users have an :approved attribute. How can I query my Posts using MongoDB such that I only get those where the User has :approved => true?
I could write a loop that creates a new array, but that seems inefficient.
MongoDB does not have any notion of joins.
You've stated in the comments that Posts and Users are separate collections, but your query clearly involves data from both collections, which would imply a join.
"I could write a loop that creates a new array, but that seems inefficient."
A join operation in SQL is basically a loop that happens on the server. With no join support on the server side, you'll have to make your own.
Note that many of the libraries (like Morphia) actually have some of this functionality built-in. You are using Mongoid which may have some of this support, but you'll have to do some hunting.
The easiest way to think about it would be to query for unique user ids of users who are approved and then query for post documents where the poster's user_id is in that set.
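As a rough sketch of that two-step approach, here it is over plain Ruby arrays of hashes. These stand in for the two collections and are not real MongoDB or Mongoid calls; the sample data and the method name are made up. Against a real database, the second step would be a query with an $in condition on user_id.

```ruby
require "set"

# Hypothetical in-memory stand-ins for the users and posts collections.
USERS = [
  { _id: 1, name: "ann",  approved: true },
  { _id: 2, name: "bob",  approved: false },
  { _id: 3, name: "cara", approved: true }
]
POSTS = [
  { _id: 10, user_id: 1, title: "first" },
  { _id: 11, user_id: 2, title: "second" },
  { _id: 12, user_id: 3, title: "third" },
  { _id: 13, user_id: 2, title: "fourth" }
]

def posts_by_approved_users(users, posts)
  # Query 1: the ids of all approved users.
  approved_ids = users.select { |user| user[:approved] }
                      .map { |user| user[:_id] }
                      .to_set
  # Query 2: posts whose user_id is in that set.
  posts.select { |post| approved_ids.include?(post[:user_id]) }
end

p posts_by_approved_users(USERS, POSTS).map { |post| post[:_id] }   # prints [10, 12]
```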
As Rubish said, you could de-normalize by adding an approved field to the post document. When a user's approval status is toggled (they become approved or unapproved) do an update on the posts collection where, for all of that user's posts, you toggle the denormalized approval field.
Using the denormalized method lets you do one query instead of two (simplifying the logic for the most common case) and isn't too much of a pain to maintain.
Let me know if that makes sense.
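For what it's worth, the denormalized variant could be sketched like this, again with plain Ruby hashes standing in for the collections. The field name user_approved and the helper names are assumptions made for the example, not anything MongoDB or Mongoid prescribes.

```ruby
# Hypothetical in-memory stand-ins; posts carry a copy of the approval flag.
def sample_collections
  users = [{ _id: 1, approved: true }, { _id: 2, approved: false }]
  posts = [
    { _id: 10, user_id: 1, user_approved: true },
    { _id: 11, user_id: 2, user_approved: false },
    { _id: 12, user_id: 2, user_approved: false }
  ]
  [users, posts]
end

# Toggling approval updates the user and then every post by that user
# (in a real database this second step would be one multi-document update).
def set_approval(users, posts, user_id, approved)
  users.find { |user| user[:_id] == user_id }[:approved] = approved
  posts.each { |post| post[:user_approved] = approved if post[:user_id] == user_id }
end

# The common read path is now a single query on posts alone.
def approved_posts(posts)
  posts.select { |post| post[:user_approved] }
end

users, posts = sample_collections
p approved_posts(posts).map { |post| post[:_id] }   # prints [10]
set_approval(users, posts, 2, true)
p approved_posts(posts).map { |post| post[:_id] }   # prints [10, 11, 12]
```

The trade-off is the usual one: reads get simpler, and the write path has to keep the copied flag in sync.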
