How to build an inner join in Rails with conditions? - ruby-on-rails

I have a model StockUpdate which keeps track of stock for every product in a store. Its table attributes are :product_id, :stock, and :store_id. I was trying to find the last entry for every product for a given store. I built the query below in pgAdmin, and it works fine. I'm new to Rails and don't know how to express this in the model. Please help.
SELECT a.*
FROM stock_updates a
INNER JOIN
(
  SELECT product_id, MAX(id) max_id
  FROM stock_updates
  WHERE store_id = 9 AND stock > 0
  GROUP BY product_id
) b ON a.product_id = b.product_id
   AND a.id = b.max_id

I don't clearly understand what you want to do, but I think you can do something like this:
class StockUpdate < ActiveRecord::Base
  scope :a_good_name, -> { joins(:product).where('store_id = ? and stock > ?', 9, 0) }
end
You can also call StockUpdate.a_good_name.explain to check the generated SQL.

What you need is really simple and can easily be accomplished with two queries. Otherwise it becomes very complicated in a single query (still doable, though):
store_ids = [0, 9]
latest_stock_update_ids = StockUpdate.
  where(store_id: store_ids).
  group(:product_id).
  maximum(:id).
  values

StockUpdate.where(id: latest_stock_update_ids)
Two queries, without any joins necessary. (group(:product_id).maximum(:id) returns a hash of product_id => max id, hence the .values.) The same would be possible with a single query too, but like your original code, it would involve subqueries.
Something like this should work:
StockUpdate.
  where(store_id: store_ids).
  where("stock_updates.id = (
    SELECT MAX(su.id) FROM stock_updates AS su WHERE (
      su.product_id = stock_updates.product_id
    )
  )")
Or perhaps:
StockUpdate.where("id IN (
SELECT MAX(su.id) FROM stock_updates AS su GROUP BY su.product_id
)")
And to answer your original question, you can manually specify a join like so:
Model1.joins("INNER JOIN #{Model2.table_name} ON #{conditions}")
# That INNER JOIN can also be a LEFT OUTER JOIN, etc.
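For the original query specifically, a minimal sketch along those lines (the scope name latest_per_product is made up, and Rails 4-style syntax is assumed):

class StockUpdate < ActiveRecord::Base
  # Latest row per product for one store, wrapping the asker's
  # MAX(id) subquery in a manual INNER JOIN string.
  scope :latest_per_product, ->(store_id) {
    joins(
      "INNER JOIN (
         SELECT product_id, MAX(id) AS max_id
         FROM stock_updates
         WHERE store_id = #{store_id.to_i} AND stock > 0
         GROUP BY product_id
       ) b ON stock_updates.product_id = b.product_id
          AND stock_updates.id = b.max_id"
    )
  }
end

# Usage: StockUpdate.latest_per_product(9)

Note the store_id.to_i: joins takes a raw SQL fragment, so the interpolated value should be coerced or quoted to avoid injection.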

Related

Rails: Collect records whose join tables appear in two queries

There are three models that matter here: Objective, Student, and Seminar. All are associated with has_and_belongs_to_many.
There is an ObjectiveStudent join model that includes columns "ready" and "points_all_time". There is an ObjectiveSeminar join model that includes column "priority".
I need to collect all of the objectives that are associated with a given student and also with a given seminar.
They need to also be marked with a "priority" above zero in the seminar. So I think I need this line:
obj_sems = ObjectiveSeminar.where(:seminar => given_seminar).where("priority > ?", 0)
Finally, they need to also be objectives where the student is ready, but has not scored above 7. So I think I need this line:
obj_studs = ObjectiveStudent.where(:user => given_student, :ready => true).where("points_all_time <= ?", 7)
Is there a way to gather all the objectives whose join-table records appear in both of the above queries? Note that neither of these queries returns objectives; they return objective_seminars and objective_students, respectively. My end goal is to collect the objectives that meet all of the above criteria.
Or am I approaching this all wrong?
Bonus question: I would also love to sort the objectives by their priority in the given seminar. But I'm afraid that would add too much to the database load. What are your thoughts on this?
Thank you in advance for any insight.
In order to get Objectives, you'll need to start your query from that model.
In order to apply AND conditions across the associated tables, you'll need inner joins with those tables.
Finally, you'll need a distinct operator to fetch each objective only once.
The extended version of what (I think) you need is:
Objective.joins(objective_seminars: :seminar, objective_students: :student).
  where(seminars: seminar_search_params, students: student_search_params).
  where('objective_seminars.priority > 0').
  where('objective_students.ready = 1 AND objective_students.points_all_time <= 7').
  order('objective_seminars.priority ASC').
  distinct
Now for the database load it all depends on your indexes and the size of your tables.
The above query will translate to the following SQL (or something similar).
SELECT DISTINCT objectives.* FROM objectives
INNER JOIN objective_students ON objective_students.objective_id = objectives.id
INNER JOIN students ON students.id = objective_students.student_id
INNER JOIN objective_seminars ON objective_seminars.objective_id = objectives.id
INNER JOIN seminars ON seminars.id = objective_seminars.seminar_id
WHERE seminars_query AND
      students_query AND
      objective_seminars.priority > 0 AND
      objective_students.ready = 1 AND objective_students.points_all_time <= 7
ORDER BY objective_seminars.priority ASC
So you'll need to add or extend your indexes so that the queries against all five tables have an index helping out. The actual index implementation is up to you and depends on your application's specifics (read/write load, table sizes, cardinality, etc.).
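As a rough illustration only (the index choices and column names here are assumptions based on the models above, not a recommendation for your exact workload), a migration adding such indexes could look like:

class AddObjectiveQueryIndexes < ActiveRecord::Migration
  def change
    # Join columns used by the INNER JOINs above
    add_index :objective_students, [:objective_id, :student_id]
    add_index :objective_seminars, [:objective_id, :seminar_id]
    # Filter / sort columns used in the WHERE and ORDER BY clauses
    add_index :objective_students, [:ready, :points_all_time]
    add_index :objective_seminars, :priority
  end
end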

Rails 4 bulk updating array of models

I have an array of ActiveRecord models and I want to renumber one column and bulk-update them.
Code looks like this:
rules = subject.email_rules.order(:number)
rules.each_with_index do |rule, index|
  rule.number = index + 1
end
EmailRule.update(rules.map(&:id), rules.map { |r| { number: r.number } })
But this creates N SQL statements and I would like one. Is there a way to do that?
Assuming you are using Postgres, you can use row_number and the somewhat strange-looking UPDATE ... FROM construct. This is the basic version:
UPDATE email_rules target
SET number = src.idx
FROM (
  SELECT
    email_rules.id,
    row_number() OVER () AS idx
  FROM email_rules
) src
WHERE src.id = target.id
You might need to scope this to a subject and, of course, include the ordering by number (inside the OVER clause), which could look like this:
UPDATE email_rules target
SET number = src.idx
FROM (
  SELECT
    email_rules.id,
    row_number() OVER (PARTITION BY subject_id ORDER BY number ASC) AS idx
  FROM email_rules
) src
WHERE src.id = target.id
(assuming subject_id is the foreign key that associates subjects/email_rules)
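To fire that statement from Rails in a single round trip, a sketch like this should work (renumber_for_subject! is a made-up name, not existing API):

class EmailRule < ActiveRecord::Base
  # Renumber all rules of one subject with a single UPDATE ... FROM.
  def self.renumber_for_subject!(subject_id)
    connection.execute(<<-SQL)
      UPDATE email_rules target
      SET number = src.idx
      FROM (
        SELECT id,
               row_number() OVER (PARTITION BY subject_id ORDER BY number ASC) AS idx
        FROM email_rules
        WHERE subject_id = #{subject_id.to_i}
      ) src
      WHERE src.id = target.id
    SQL
  end
end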
One alternative available to you is to put all the updates in a transaction; that will at least make one single commit at the end, making it way faster:
ActiveRecord::Base.transaction do
  ...
end
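Applied to the loop from the question, that might look like the sketch below (update_column skips validations and callbacks, which may or may not be acceptable here):

ActiveRecord::Base.transaction do
  subject.email_rules.order(:number).each_with_index do |rule, index|
    rule.update_column(:number, index + 1) # still N UPDATEs, but one COMMIT
  end
end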

rails fetch records with no has many relationship and ones with a condition

I have HomeMaker and PerDateUnavailability models; HomeMaker has_many :per_date_unavailabilities. I want all home makers who don't have a record in per_date_unavailabilities, plus the home makers who do have records, but none where per_date_unavailabilities.unavailable_date = somedate.
I usually do the first part when I want HomeMakers without a PerDateUnavailability record using HomeMaker.includes(:per_date_unavailabilities).where(per_date_unavailabilities: {id: nil})
and the second part of the condition using HomeMaker.joins(:per_date_unavailabilities).where.not(per_date_unavailabilities: {unavailable_date: Date.today})
How do I mix these?
The SQL you need looks somewhat like this:
select a.*
from home_makers a
left join per_date_unavailabilities b on a.id = b.home_maker_id
where b.home_maker_id is NULL
   or b.unavailable_date != ?;
Expressing the OR clause is a little tough in ActiveRecord, and it's best we don't fight it. (Edited after OP's comment.)
HomeMaker.joins("LEFT JOIN per_date_unavailabilities on per_date_unavailabilities.home_maker_id = home_makers.id")
.where("per_date_unavailabilities.home_maker_id IS NULL OR per_date_unavailabilities.unavailable_date != ?", somedate)

Rails expanding fields with scope, PG does not like it

I have a model of Widgets. Widgets belong to a Store model, which belongs to an Area model, which belongs to a Company. At the Company model, I need to find all associated widgets. Easy:
class Widget < ActiveRecord::Base
  def self.in_company(company)
    includes(:store => {:area => :company}).where(:companies => {:id => company.id})
  end
end
Which will generate this beautiful query:
> Widget.in_company(Company.first).count
SQL (50.5ms) SELECT COUNT(DISTINCT "widgets"."id") FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1
=> 15088
But I later need to use this scope in a more complex scope. The problem is that AR expands the query by selecting individual fields, which fails in PG because every selected field must appear in the GROUP BY clause or in an aggregate function.
Here is the more complex scope.
def self.sum_amount_chart_series(company, start_time)
  orders_by_day = Widget.in_company(company).archived.not_void.
    where(:print_datetime => start_time.beginning_of_day..Time.zone.now.end_of_day).
    group(pg_print_date_group).
    select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
end

def self.pg_print_date_group
  "CAST((print_datetime + interval '#{tz_offset_hours} hours') AS date)"
end
And this is the select it is throwing at PG:
> Widget.sum_amount_chart_series(Company.first, 1.day.ago)
SELECT "widgets"."id" AS t0_r0, "widgets"."user_id" AS t0_r1,<...BIG SNIP, YOU GET THE IDEA...> FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1 AND "widgets"."archived" = 't' AND "widgets"."voided" = 'f' AND ("widgets"."print_datetime" BETWEEN '2011-04-24 00:00:00.000000' AND '2011-04-25 23:59:59.999999') GROUP BY CAST((print_datetime + interval '-7 hours') AS date)
Which generates this error:
PGError: ERROR: column "widgets.id" must appear in the GROUP BY clause
or be used in an aggregate function
LINE 1: SELECT "widgets"."id" AS t0_r0, "widgets"."user_id...
How do I rewrite the Widget.in_company scope so that AR does not expand the select query to include every Widget model field?
As Frank explained, PostgreSQL will reject any query which doesn't return a reproducible set of rows.
Suppose you've a query like:
select a, b, agg(c)
from tbl
group by a
PostgreSQL will reject it because b is left unspecified in the group by statement. Run that in MySQL, by contrast, and it will be accepted. In the latter case, however, fire up a few inserts, updates and deletes, and the order of the rows on disk pages ends up different.
If memory serves, implementation details are so that MySQL will actually sort by a, b and return the first b in the set. But as far as the SQL standard is concerned, the behavior is unspecified -- and sure enough, PostgreSQL does not always sort before running aggregate functions.
Potentially, this might result in different values of b in result set in PostgreSQL. And thus, PostgreSQL yields an error unless you're more specific:
select a, b, agg(c)
from tbl
group by a, b
What Frank highlighted is that, in PostgreSQL 9.1, if a is the primary key, then you can leave b unspecified -- the planner has been taught to ignore subsequent GROUP BY fields when an applicable primary key implies a unique row.
For your problem in particular, you need to keep your current GROUP BY expression and add every field that your SELECT returns, i.e. "widgets"."id", "widgets"."user_id", [snip] -- but not things like sum(amount), which are aggregate function calls.
As an off-topic side note, I'm not sure how your ORM/model works, but the SQL it's generating isn't optimal. Many of those left outer joins look like they should be inner joins, which would allow the planner to pick an appropriate join order where applicable.
PostgreSQL version 9.1 (beta at this moment) might fix your problem, but only if there is a functional dependency on the primary key.
From the release notes:
Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)

Some other database systems already allowed this behavior, and because of the primary key, the result is unambiguous.
You could run a test and see if it fixes your problem. If you can wait for the production release, this can fix the problem without changing your code.
Firstly, simplify your life by storing all dates in a standard time zone. Converting between time zones should really be done in the view, as a user convenience. This alone should save you a lot of pain.
If you're already in production, write a migration to create a normalised_date column wherever it would be helpful.
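A hedged sketch of such a migration (the widgets table and the fixed -7 hour offset are assumptions carried over from pg_print_date_group above):

class AddNormalisedDateToWidgets < ActiveRecord::Migration
  def up
    add_column :widgets, :normalised_date, :date
    # Backfill from the existing timestamp, shifted into the local zone
    execute "UPDATE widgets SET normalised_date = " \
            "CAST((print_datetime + interval '-7 hours') AS date)"
    add_index :widgets, :normalised_date
  end

  def down
    remove_column :widgets, :normalised_date
  end
end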
I propose that the other problem here is the use of raw SQL, which Rails won't poke around for you. To avoid this, try using the gem called Squeel (aka MetaWhere 2): http://metautonomo.us/projects/squeel/
If you use it, you should be able to remove the hard-coded SQL and let Rails get back to doing its magic.
For example:
.select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
becomes (once you remove the need to normalise the date):
.select{sum(amount).as(total_amount)}
Sorry to answer my own question, but I figured it out.
First, let me apologize to those who thought I might be having an SQL or Postgres issue: it is not that. The issue is with ActiveRecord and the SQL it is generating.
The answer is... use .joins instead of .includes. So I just changed the line in the top code and it works as expected.
class Widget < ActiveRecord::Base
  def self.in_company(company)
    joins(:store => {:area => :company}).where(:companies => {:id => company.id})
  end
end
I'm guessing that when using .includes, ActiveRecord is trying to be smart and use JOINS in the SQL, but it's not smart enough for this particular case and was generating that ugly SQL to select all associated columns.
However, all the replies have taught me quite a bit about Postgres that I did not know, so thank you very much.
To sort in MySQL:
> ids = [11,31,29]
=> [11, 31, 29]
> Page.where(id: ids).order("field(id, #{ids.join(',')})")
In Postgres:
def self.order_by_ids(ids)
  order_by = ["CASE"]
  ids.each_with_index do |id, index|
    order_by << "WHEN id='#{id}' THEN #{index}"
  end
  order_by << "END"
  order(order_by.join(" "))
end
User.where(:id => [3,2,1]).order_by_ids([3,2,1]).map(&:id)
#=> [3,2,1]

Using RoR with a legacy table that uses E-A-V

I need to connect to a legacy database and pull a subset of data from a table that uses the entity-attribute-value (EAV) model to store a contact's information. The table looks like the following:
subscriberid  fieldid  data
1             2        Jack
1             3        Sparrow
2             2        Dan
2             3        Smith
where fieldid is a foreign key to a fields table that lists the custom fields a given customer can have (e.g. first name, last name, phone). The SQL involved is rather hairy, as I have to join the table to itself for every field I want back (currently I need six fields), as well as join to a master contact list based on the current user.
The SQL is something like this:
select t0.data as FirstName, t1.data as LastName, t2.data as SmsOnly
from subscribers_data t0
inner join subscribers_data t1
  on t0.subscriberid = t1.subscriberid
inner join subscribers_data t2
  on t2.subscriberid = t1.subscriberid
inner join list_subscribers ls
  on (t0.subscriberid = ls.subscriberid and t1.subscriberid = ls.subscriberid)
inner join lists l
  on ls.listid = l.listid
where l.name = 'My Contacts'
  and t0.fieldid = 2
  and t1.fieldid = 3;
How should I go about handling this in my RoR application? I would like to abstract this away and still be able to use the normal "dot notation" for pulling the attributes out. Luckily, the data is read-only for the foreseeable future.
This is exactly what #find_by_sql was designed for. I would reimplement #find to do what you need to do, something like this:
class Contact < ActiveRecord::Base
  set_table_name "subscribers_data"

  def self.find(options = {})
    find_by_sql <<-EOS
      select t0.data as FirstName, t1.data as LastName, t2.data as SmsOnly
      from subscribers_data t0
      inner join subscribers_data t1
        on t0.subscriberid = t1.subscriberid
      inner join subscribers_data t2
        on t2.subscriberid = t1.subscriberid
      inner join list_subscribers ls
        on (t0.subscriberid = ls.subscriberid and t1.subscriberid = ls.subscriberid)
      inner join lists l
        on ls.listid = l.listid
      where l.name = 'My Contacts'
        and t0.fieldid = 2
        and t1.fieldid = 3;
    EOS
  end
end
The Contact instances will have #FirstName and #LastName as attributes. You could rename them as AR expects too, such that #first_name and #last_name would work. Simply change the AS clauses of your SELECT.
I am not sure it is totally germane to your question, but you might want to take a look at MagicModel. It can generate models for you based on a legacy database, which might lower the amount of work you need to do.
