Rails relation.count does not equal relation.map(&:id).count - ruby-on-rails

I can't for the life of me figure out why the Active Record relation below returns a different result when you call count on the relation directly, as opposed to mapping over the relation and then counting the result.
Shouldn't they be the same? Does anyone know what is going on?
ActiveRecord::Base.connection.query_cache.clear
# => {}
Panel.connection.schema_cache.clear!
# => nil
Panel.reset_column_information
# => nil
@filtered_skew_panels[agglo_code].map(&:panel_id).count
# => 57
@filtered_skew_panels[agglo_code].all.count
(13.1ms) SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM "panels" WHERE "panels"."agglo_code_id" = $1 AND "panels"."environment_id" = $2 AND "panels"."product_id" = $3 AND "panels"."alcohol_friendly" = $4 AND "panels"."suburb" = 'Marrickville' AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels WHERE campaign_search_panels.panel_id = panels.panel_id AND campaign_search_panels.campaign_id = 52)) AND (NOT EXISTS(SELECT 1 FROM "AIDAAU_Avails" WHERE "AIDAAU_Avails"."PanelID" = panels.panel_uid AND "AIDAAU_Avails"."TillDate" >= '2017-08-21 00:00:00.000000' AND "AIDAAU_Avails"."FromDate" <= '2017-09-03 00:00:00.000000')) LIMIT 100) subquery_for_count [["agglo_code_id", 4], ["environment_id", 2], ["product_id", 1], ["alcohol_friendly", "t"]]
# => 0
Shouldn't @filtered_skew_panels[agglo_code].map(&:panel_id).count be equal to @filtered_skew_panels[agglo_code].count?
The relation itself seems to actually contain all the records:
[3] pry(#<PanelSearch>)> @filtered_skew_panels[agglo_code]
=> [#<Panel:0x007fe4ef779990
location_type: "Street",
move_id: "27779",
panel_id: "11441A1",
address: "Victoria Rd N/O Sydenham Rd E/S",
postal_code: "2204",
suburb: "Marrickville",
latitude: #<BigDecimal:7fe4ee0b14b0,'-0.33909329E2',18(27
[5] pry(#<PanelSearch>)> @filtered_skew_panels[agglo_code].class
=> Panel::ActiveRecord_Relation
[6] pry(#<PanelSearch>)> @filtered_skew_panels[agglo_code].to_sql
=> "SELECT \"panels\".* FROM \"panels\" WHERE \"panels\".\"agglo_code_id\" = 4 AND \"panels\".\"environment_id\" = 2 AND \"panels\".\"product_id\" = 1 AND \"panels\".\"alcohol_friendly\" = 't' AND \"panels\".\"suburb\" = 'Marrickville' AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels WHERE campaign_search_panels.panel_id = panels.panel_id AND campaign_search_panels.campaign_id = 52)) AND (NOT EXISTS(SELECT 1 FROM \"AIDAAU_Avails\" WHERE \"AIDAAU_Avails\".\"PanelID\" = panels.panel_uid AND \"AIDAAU_Avails\".\"TillDate\" >= '2017-08-21 00:00:00.000000' AND \"AIDAAU_Avails\".\"FromDate\" <= '2017-09-03 00:00:00.000000')) LIMIT 500"
[7] pry(#<PanelSearch>)> @filtered_skew_panels[agglo_code].all.to_sql
=> "SELECT \"panels\".* FROM \"panels\" WHERE \"panels\".\"agglo_code_id\" = 4 AND \"panels\".\"environment_id\" = 2 AND \"panels\".\"product_id\" = 1 AND \"panels\".\"alcohol_friendly\" = 't' AND \"panels\".\"suburb\" = 'Marrickville' AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels WHERE campaign_search_panels.panel_id = panels.panel_id AND campaign_search_panels.campaign_id = 52)) AND (NOT EXISTS(SELECT 1 FROM \"AIDAAU_Avails\" WHERE \"AIDAAU_Avails\".\"PanelID\" = panels.panel_uid AND \"AIDAAU_Avails\".\"TillDate\" >= '2017-08-21 00:00:00.000000' AND \"AIDAAU_Avails\".\"FromDate\" <= '2017-09-03 00:00:00.000000')) LIMIT 500"

Related

where.not with where in rails/activerecord excluding all records?

I have a model (pairings) that has a length and a date attribute (among others). I am trying to query all pairings that do NOT touch a specific date or date range while matching other properties of the pairing. The code for all the other queries is working fine, but my where.not is excluding all records. I think this is because of the way I am building the relation/query.
I need something that excludes the selected date range AND matches the length. This needs to be stacked on other scoped queries.
Any ideas on how to accomplish this?
Thanks in advance!
Controller code:
rel = rel.other scoped queries
@all_trip_len.each do |len|
  rel = rel.date_selector(@date_sel_1, @date_start_1, @date_end_1, len)
end
rel = rel.more scoped queries
Model Scope:
def self.date_selector(sel, start_d, end_d, length)
  # Need to get all possible start dates that could touch avoid days for each length of trip
  rel = self
  start = Date.parse(start_d) - (length.to_i - 1)
  rel = rel.where.not(date: start.to_s..Date.parse(end_d).to_s).where(length: length)
  return rel
end
I am getting a query that is:
SELECT "pairings".*
FROM "pairings"
WHERE "pairings"."bid_month_id" = $1
AND NOT ("pairings"."date" BETWEEN $2 AND $3)
AND "pairings"."length" = $4
AND NOT ("pairings"."date" BETWEEN $5 AND $6)
AND "pairings"."length" = $7 [["bid_month_id", 8], ["date", "2020-04-08"], ["date", "2020-04-08"], ["length", 1], ["date", "2020-04-07"], ["date", "2020-04-08"], ["length", 2]
And I need something more like:
SELECT "pairings".*
FROM "pairings"
WHERE "pairings"."bid_month_id" = $1
AND NOT (("pairings"."date" BETWEEN $2 AND $3)
AND "pairings"."length" = $4)
AND NOT (("pairings"."date" BETWEEN $5 AND $6)
AND "pairings"."length" = $7) [["bid_month_id", 8], ["date", "2020-04-08"], ["date", "2020-04-08"], ["length", 1], ["date", "2020-04-07"], ["date", "2020-04-08"], ["length", 2]
Edit:
With the help of MurifoX I got a little further. The query is built almost correctly, but I need an OR between the pairing.date groupings.
What I have now:
SELECT "pairings".*
FROM "pairings"
WHERE "pairings"."bid_month_id" = $3
AND (((date <= '2020-04-06' AND date >= '2020-04-07') AND length = 1))
AND (((date <= '2020-04-05' AND date >= '2020-04-07') AND length = 2))
What I need:
SELECT "pairings".*
FROM "pairings"
WHERE "pairings"."bid_month_id" = $3
AND (((date <= '2020-04-06' AND date >= '2020-04-07') AND length = 1)
OR ((date <= '2020-04-05' AND date >= '2020-04-07') AND length = 2))
I have tried using the Rails 5 or (rel.or(Pairing.date_selector(xxx))), but that does not work because it turns all the where ANDs into ORs, and I just need the OR between the pairing/date groupings. I also need the parens around the date groupings.
Constructing semi-complex queries with ActiveRecord methods can be tricky, so in cases like this I always tell people to write the condition by hand:
rel = self
start = Date.parse(start_d) - (length.to_i - 1)
rel = rel.where("((date <= ? AND date >= ?) AND length = ?)", start.to_s, Date.parse(end_d).to_s, length)
return rel
I ended up having to build this as a string to get it to do exactly what I want. The or relation in Rails 5 created a statement that OR'd ALL of my previous scope clauses with the new one (all AND -> OR), and I needed all my previous scoped queries to stand and just the new dates to be OR'd.
def exclude_dates(rel, date_start_1, date_end_1, all_trip_len)
  dates_query = String.new
  count = 0
  all_trip_len.each do |len|
    # Earliest start date at which a trip of this length would touch the range
    start_d = Date.parse(date_start_1) - (len.to_i - 1)
    end_d = Date.parse(date_end_1)
    count += 1
    # Join the per-length groups with OR
    dates_query += " OR " if count > 1
    dates_query += "((date < '#{start_d.to_s}' OR date > '#{end_d.to_s}') AND length = '#{len}')"
  end
  rel = rel.where(dates_query)
end
This created the query I needed. Wish there was a more 'rails' way to do it...
SELECT COUNT(*) FROM "pairings" WHERE "pairings"."bid_month_id" = $1 AND (((date < '2020-04-02' OR date > '2020-04-02') AND length = '1') OR ((date < '2020-04-01' OR date > '2020-04-02') AND length = '2') OR ((date < '2020-03-31' OR date > '2020-04-02') AND length = '3') OR ((date < '2020-03-30' OR date > '2020-04-02') AND length = '4')) AND (((date < '2020-04-14' OR date > '2020-04-17') AND length = '1') OR ((date < '2020-04-13' OR date > '2020-04-17') AND length = '2') OR ((date < '2020-04-12' OR date > '2020-04-17') AND length = '3') OR ((date < '2020-04-11' OR date > '2020-04-17') AND length = '4')) [["bid_month_id", 8]]
Thanks to @MurifoX for getting me going in the right direction with the dates stuff!
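For anyone wanting the same OR'd grouping without interpolating values into the SQL string, here is a minimal sketch. It assumes the same date/length columns as above; exclusion_condition is a hypothetical helper name, not something from the original posts. It returns one parenthesised fragment with ? placeholders plus the matching bind values.

```ruby
require "date"

# Hypothetical helper (not from the original posts): builds the OR'd
# per-length groups as one parenthesised fragment with ? placeholders.
def exclusion_condition(date_start, date_end, trip_lengths)
  # Nothing to exclude: return a condition that matches every row.
  return ["1=1", []] if trip_lengths.empty?

  end_d = Date.parse(date_end)
  clauses = []
  binds = []
  trip_lengths.each do |len|
    # A trip of len days touches the range if it starts up to len - 1 days early.
    start_d = Date.parse(date_start) - (len.to_i - 1)
    clauses << "((date < ? OR date > ?) AND length = ?)"
    binds.push(start_d.to_s, end_d.to_s, len.to_i)
  end
  ["(" + clauses.join(" OR ") + ")", binds]
end

sql, binds = exclusion_condition("2020-04-02", "2020-04-02", [1, 2])
puts sql
p binds
```

The pair can then be handed to where as rel.where(sql, *binds), so the earlier scopes stay AND'ed and only the date groups are OR'd with each other.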

Weird behavior of CAST with 0 in rails

Sorry for the long query, but it's what I'm working with.
I have this query in Rails:
@posts = Cama::PostType.first.posts.joins(:custom_field_values).where(
  "(status = ?) AND (cama_custom_fields_relationships.custom_field_slug = ? AND
  CAST(cama_custom_fields_relationships.value AS INTEGER) >= ? AND
  CAST(cama_custom_fields_relationships.value AS INTEGER) <= ? ) OR
  (cama_custom_fields_relationships.custom_field_slug = ? AND
  CAST(cama_custom_fields_relationships.value AS INTEGER) >= ? AND
  CAST(cama_custom_fields_relationships.value AS INTEGER) <= ? ) AND
  (LOWER(title) LIKE ? OR LOWER(content_filtered) LIKE ?)",
  "published",
  "filtry-powierzchnia", params['size-start'].to_i, params['size-end'].to_i,
  "filtry-cena", params['price-start'].to_i, params['price-end'].to_i,
  "%#{params[:q]}%", "%#{params[:q]}%"
).group("cama_posts.id").having("COUNT(cama_custom_fields_relationships.objectid) >= 2")
which generates this SQL:
CamaleonCms::Post Load (0.9ms) SELECT "cama_posts".* FROM "cama_posts"
INNER JOIN "cama_custom_fields_relationships" ON
"cama_custom_fields_relationships"."objectid" = "cama_posts"."id" AND
"cama_custom_fields_relationships"."object_class" = $1 WHERE
"cama_posts"."post_class" = $2 AND "cama_posts"."taxonomy_id" = $3 AND
((status = 'published') AND
(cama_custom_fields_relationships.custom_field_slug = 'filtry-powierzchnia'
AND CAST(cama_custom_fields_relationships.value AS INTEGER) >= 1 AND
CAST(cama_custom_fields_relationships.value AS INTEGER) <= 10000 ) OR
(cama_custom_fields_relationships.custom_field_slug = 'filtry-cena' AND
CAST(cama_custom_fields_relationships.value AS INTEGER) >= 0 AND
CAST(cama_custom_fields_relationships.value AS INTEGER) <= 10000000 ) AND
(LOWER(title) LIKE '%%' OR LOWER(content_filtered) LIKE '%%')) GROUP BY
cama_posts.id HAVING (COUNT(cama_custom_fields_relationships.objectid) >= 2)
ORDER BY "cama_posts"."post_order" ASC, "cama_posts"."created_at" DESC LIMIT
$4 OFFSET $5 [["object_class", "Post"], ["post_class", "Post"],
["taxonomy_id", 2], ["LIMIT", 6], ["OFFSET", 0]]
As you can see, I'm using CAST. I can't just change the column to integer, because it belongs to the CMS. I created a custom_field that is supposed to be an integer, but the column in the db is still text, so I'm forced to do it this way.
Now everything is OK if my condition is
..CAST(cama_custom_fields_relationships.value AS INTEGER) >= 1..
but not when this value is 0,
so when my query looks like
CamaleonCms::Post Load (0.9ms) SELECT "cama_posts".* FROM "cama_posts"
INNER JOIN "cama_custom_fields_relationships" ON
"cama_custom_fields_relationships"."objectid" = "cama_posts"."id" AND
"cama_custom_fields_relationships"."object_class" = $1 WHERE
"cama_posts"."post_class" = $2 AND "cama_posts"."taxonomy_id" = $3 AND
((status = 'published') AND
(cama_custom_fields_relationships.custom_field_slug = 'filtry-powierzchnia'
AND ****CAST(cama_custom_fields_relationships.value AS INTEGER) >= 0**** AND
CAST(cama_custom_fields_relationships.value AS INTEGER) <= 10000 ) OR
(cama_custom_fields_relationships.custom_field_slug = 'filtry-cena' AND
CAST(cama_custom_fields_relationships.value AS INTEGER) >= 0 AND
CAST(cama_custom_fields_relationships.value AS INTEGER) <= 10000000 ) AND
(LOWER(title) LIKE '%%' OR LOWER(content_filtered) LIKE '%%')) GROUP BY
cama_posts.id HAVING (COUNT(cama_custom_fields_relationships.objectid) >= 2)
ORDER BY "cama_posts"."post_order" ASC, "cama_posts"."created_at" DESC LIMIT
$4 OFFSET $5 [["object_class", "Post"], ["post_class", "Post"],
["taxonomy_id", 2], ["LIMIT", 6], ["OFFSET", 0]]
I'm getting this error:
ActionView::Template::Error (PG::InvalidTextRepresentation: ERROR: invalid input syntax for integer: "/media/1/asd.jpg"
: SELECT "cama_posts".* FROM "cama_posts" INNER JOIN "cama_custom_fields_relationships" ON "cama_custom_fields_relationships"."objectid" = "cama_posts"."id" AND "cama_custom_fields_relationships"."object_class" = $1 WHERE "cama_posts"."post_class" = $2 AND "cama_posts"."taxonomy_id" = $3 AND ((status = 'published') AND (cama_custom_fields_relationships.custom_field_slug = 'filtry-powierzchnia' AND CAST(cama_custom_fields_relationships.value AS INTEGER) >= 0 AND CAST(cama_custom_fields_relationships.value AS INTEGER) <= 10000 ) OR (cama_custom_fields_relationships.custom_field_slug = 'filtry-cena' AND CAST(cama_custom_fields_relationships.value AS INTEGER) >= 0 AND CAST(cama_custom_fields_relationships.value AS INTEGER) <= 10000000 ) AND (LOWER(title) LIKE '%%' OR LOWER(content_filtered) LIKE '%%')) GROUP BY cama_posts.id HAVING (COUNT(cama_custom_fields_relationships.objectid) >= 2) ORDER BY "cama_posts"."post_order" ASC, "cama_posts"."created_at" DESC LIMIT $4 OFFSET $5):
and @posts raises it at the query level, so it's not an error that occurs somewhere else.
I tried CASTing to numeric, with the same problem. It only happens when it is the first condition. When there is another condition before it, it accepts 0: in my example, the second condition after this one works with >= 0.
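One possible reading of the error above: the value "/media/1/asd.jpg" belongs to a different custom field, and PostgreSQL does not promise to test custom_field_slug before it evaluates the CAST, so the cast can run on rows the slug check would have rejected. A common guard is to wrap the cast in a CASE that only converts all-digit values. The sketch below is an assumption, not something from the post: SAFE_INT_SQL is a hypothetical fragment, and safe_int is a plain-Ruby mirror of the same rule so it can be tried without a database.

```ruby
# Assumed guard fragment (not from the original post): CAST only runs for
# values made up entirely of digits; everything else yields NULL.
SAFE_INT_SQL = <<~SQL.strip
  CASE WHEN cama_custom_fields_relationships.value ~ '^[0-9]+$'
       THEN CAST(cama_custom_fields_relationships.value AS INTEGER)
  END
SQL

# Plain-Ruby mirror of the same rule, to show which values would survive.
def safe_int(value)
  value.to_s.match?(/\A[0-9]+\z/) ? value.to_i : nil
end

values = ["120", "/media/1/asd.jpg", "0", "45000"]
in_range = values.select do |v|
  n = safe_int(v)
  !n.nil? && n >= 0 && n <= 10_000
end
p in_range
```

In the query itself, the fragment would stand in for each CAST(cama_custom_fields_relationships.value AS INTEGER) expression, since a comparison against NULL simply does not match.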

How to debug sql queries in activerecord-sqlserver-adapter in console

If anyone has a couple of free hours (or days) to help me optimise a few calls and want to be paid for it ( i can offer 150USD an hour ) for their help I would really like your help. I'm getting desperate :)
I've got some sql queries that are quite slow:
Panel Load (1075.7ms) EXEC sp_executesql N'SELECT [panels].* FROM [panels] WHERE [panels].[agglo_code_id] = @0 AND [panels].[environment_id] = @1 AND [panels].[product_id] = @2 AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels WHERE campaign_search_panels.panel_id = panels.panel_id AND campaign_search_panels.campaign_id = 32)) AND (NOT EXISTS(SELECT 1 FROM "AIDAAU_Avails" WHERE "AIDAAU_Avails"."PanelID" = panels.panel_uid AND "AIDAAU_Avails"."TillDate" >= ''08-21-2017'' AND "AIDAAU_Avails"."FromDate" <= ''09-03-2017''))', N'@0 int, @1 int, @2 int', @0 = 24, @1 = 14, @2 = 25 [["agglo_code_id", 24], ["environment_id", "14"], ["product_id", "25"]]
I am trying to figure out how to debug this, but I can't quite get it right. I would like to run an explain on it; however, I can't access the db directly via a SQL client, as it's locked down to the IP of the server, so I am trying to do it via the Rails console on the server.
I can do the following (not sure why it runs two queries):
irb(main):049:0> ActiveRecord::Base.connection.execute('SELECT [panels].* FROM [panels] WHERE [panels].[agglo_code_id] = 24 AND [panels].[environment_id] = 14 AND [panels].[product_id] = 25 AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels WHERE campaign_search_panels.panel_id = panels.panel_id AND campaign_search_panels.campaign_id = 32)) AND (NOT EXISTS(SELECT 1 FROM "AIDAAU_Avails" WHERE "AIDAAU_Avails"."PanelID" = panels.panel_uid AND "AIDAAU_Avails"."TillDate" >= ''08-21-2017'' AND "AIDAAU_Avails"."FromDate" <= ''09-03-2017''))')
(47.3ms) SELECT [panels].* FROM [panels] WHERE [panels].[agglo_code_id] = 24 AND [panels].[environment_id] = 14 AND [panels].[product_id] = 25 AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels WHERE campaign_search_panels.panel_id = panels.panel_id AND campaign_search_panels.campaign_id = 32)) AND (NOT EXISTS(SELECT 1 FROM "AIDAAU_Avails" WHERE "AIDAAU_Avails"."PanelID" = panels.panel_uid AND "AIDAAU_Avails"."TillDate" >= 08-21-2017 AND "AIDAAU_Avails"."FromDate" <= 09-03-2017))
(47.3ms) SELECT [panels].* FROM [panels] WHERE [panels].[agglo_code_id] = 24 AND [panels].[environment_id] = 14 AND [panels].[product_id] = 25 AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels WHERE campaign_search_panels.panel_id = panels.panel_id AND campaign_search_panels.campaign_id = 32)) AND (NOT EXISTS(SELECT 1 FROM "AIDAAU_Avails" WHERE "AIDAAU_Avails"."PanelID" = panels.panel_uid AND "AIDAAU_Avails"."TillDate" >= 08-21-2017 AND "AIDAAU_Avails"."FromDate" <= 09-03-2017))
=> 1143
and it's much faster than the above, but is that because I have replaced all the scalar variables, or why is it so much faster? Is there any way I can run the query exactly the same? i.e.:
query = <<-SQL
EXEC sp_executesql N'SELECT [panels].* FROM [panels] WHERE [panels].[agglo_code_id] = @0 AND [panels].[environment_id] = @1 AND [panels].[product_id] = @2 AND (NOT EXISTS(SELECT 1 FROM campaign_search_panels WHERE campaign_search_panels.panel_id = panels.panel_id AND campaign_search_panels.campaign_id = 32)) AND (NOT EXISTS(SELECT 1 FROM "AIDAAU_Avails" WHERE "AIDAAU_Avails"."PanelID" = panels.panel_uid AND "AIDAAU_Avails"."TillDate" >= ''08-21-2017'' AND "AIDAAU_Avails"."FromDate" <= ''09-03-2017''))', N'@0 int, @1 int, @2 int', @0 = 24, @1 = 14, @2 = 25 [["agglo_code_id", 24], ["environment_id", "14"], ["product_id", "25"]]
SQL
ActiveRecord::Base.connection.execute(query)
ActiveRecord::StatementInvalid: TinyTds::Error: Incorrect syntax near '["agglo_code_id", 24'.:
Any ideas how it can be improved?
Without the execution plan, it will be extremely difficult to diagnose exactly what the performance problem is. However, just by looking at your SQL, I see a huge red flag that is likely your performance problem.
SELECT
    [panels].*
FROM [panels]
WHERE
    [panels].[agglo_code_id] = @0
    AND
    [panels].[environment_id] = @1
    AND
    [panels].[product_id] = @2
    AND
    (
        NOT EXISTS( SELECT 1
                    FROM campaign_search_panels
                    WHERE
                        campaign_search_panels.panel_id = panels.panel_id
                        AND
                        campaign_search_panels.campaign_id = 32)
    )
    AND
    (
        NOT EXISTS( SELECT 1
                    FROM AIDAAU_Avails
                    WHERE
                        AIDAAU_Avails.PanelID = panels.panel_uid
                        AND
                        AIDAAU_Avails.TillDate >= '08-21-2017'
                        AND
                        AIDAAU_Avails.FromDate <= '09-03-2017')
    )
When I extracted your dynamic SQL and made it pretty, I discovered two things that can cause performance problems. First, you have a SELECT *, which will grab every column from the table regardless of whether you need it. You could be slowing yourself down by grabbing far more data than you actually need.
The second thing, which is my huge red flag, is that you have two NOT EXISTS clauses that run SQL queries. Depending on the amount of data across the three tables, this can be a very expensive operation. For every record returned by your main query, you need to run each of the NOT EXISTS queries. That means if the main query returns 100 rows, you have to run 200 additional queries to satisfy your where clause.
To fix this, you should be able to replace those NOT EXISTS clauses with two LEFT JOINs. I can guess at how to do it, but without data to work with, I can't be certain and don't want to give you something that makes things worse.
To give you an idea of the performance difference, I had a query doing something similar. It took 36 hours to run because of the size of the data. I replaced the sub-queries with some sort of JOIN and had it running in less than an hour.
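On the TinyTds "Incorrect syntax near" error in the question: the text it points at is the bind list that the Rails logger appends after the statement, which is not part of the SQL. A minimal sketch of stripping it before re-running a logged statement follows. strip_logged_binds is a hypothetical helper, the sample line is shortened, and it is written with SQL Server's @0 parameter syntax.

```ruby
# Hypothetical helper: drops the trailing [["name", value], ...] bind list
# that the Rails logger appends after the SQL statement.
def strip_logged_binds(log_line)
  log_line.sub(/\s*\[\[".*\]\]\s*\z/m, "")
end

logged = %q{EXEC sp_executesql N'SELECT [panels].* FROM [panels] WHERE [panels].[product_id] = @0', N'@0 int', @0 = 25  [["product_id", "25"]]}
sql = strip_logged_binds(logged)
puts sql
```

The cleaned string is what would be passed to ActiveRecord::Base.connection.execute, so the parameterised form runs exactly as the adapter sent it.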

Rails sum on AssociationRelation attribute is incorrect if association has limit clause

I have a method that computes stats (mainly sums) on a number of float attributes in a model.
The models
class GroupPlayer < ActiveRecord::Base
  belongs_to :group
  has_many :scored_rounds
  has_many :rounds, dependent: :destroy
end
class Round < ActiveRecord::Base
  belongs_to :group_player
end
class ScoredRound < Round
  # STI
end
The method below provides stats on up to 4 float attributes. It is called from other methods, depending on whether I'm getting stats for one player or a group of players. An initial filter on ScoredRound is passed to the method (sr):
def method_stats(method,sr,grp)
  rounds = sr.where.not(method => nil)
  number_rounds = rounds.count
  won = rounds.sum(method).round(2)
  if method == :quality
    dues = grp.options[:dues] * number_rounds
  else
    dues = grp.options["#{method.to_s}_dues"] * number_rounds
  end
  balance = (won - dues).round(2)
  perc = dues > 0 ? (won / dues).round(3) : 0.0
  [self.full_name,number_rounds,won,dues,balance,perc]
end
Three of the 4 attributes I am summing in ScoredRounds may not be set (nil) if the player did not win that game, so the rounds are filtered.
Everything worked fine until I decided to add a limit on how many rounds to use. For instance, if I only wanted stats for the last 25 rounds in the query passed to method_stats, I'd call:
def money_stats(grp,method,limit=100)
  sr = self.scored_rounds.where.not(method => nil).order(:date).reverse_order.limit(limit)
  method_stats(method,sr,grp)
end
Again, I just added the limit and order clause to the query. It worked fine for all records.
If I simulate the procedure in the console without using the above methods (or using them!), I get an erroneous sum:
gp = GroupPlayer.find(123)
GroupPlayer Load (2.1ms) SELECT "group_players".* FROM "group_players" WHERE "group_players"."id" = $1 LIMIT $2 [["id", 123], ["LIMIT", 1]]
=> valid group player
sr = gp.scored_rounds.where.not(:quality => nil)
ScoredRound Load (1.7ms) SELECT "rounds".* FROM "rounds" WHERE "rounds"."type" IN ('ScoredRound') AND "rounds"."group_player_id" = $1 AND ("rounds"."quality" IS NOT NULL) [["group_player_id", 123]]
=> #<ActiveRecord::AssociationRelation [#<ScoredRound id: 5706, player_id: 123, group_player_id: 123, event_id: 12, type: "ScoredRound", date: "2016-11-04", team: 3, tee: "White", quota: 32, front: 15, back: 15, total: 30, created_at: "2016-11-04 14:18:27", updated_at: "2016-11-04 19:12:47", quality: 0.0, skins: nil, par3: nil, other: nil>,...]
sr.count
(1.5ms) SELECT COUNT(*) FROM "rounds" WHERE "rounds"."type" IN ('ScoredRound') AND "rounds"."group_player_id" = $1 AND ("rounds"."quality" IS NOT NULL) [["group_player_id", 123]]
=> 44
sr.sum(:quality)
(1.0ms) SELECT SUM("rounds"."quality") FROM "rounds" WHERE "rounds"."type" IN ('ScoredRound') AND "rounds"."group_player_id" = $1 AND ("rounds"."quality" IS NOT NULL) [["group_player_id", 123]]
=> 354.166666666667
# Now if I add the order and limit clause
sr = gp.scored_rounds.where.not(:quality => nil).order(:date).reverse_order.limit(25)
ScoredRound Load (1.6ms) SELECT "rounds".* FROM "rounds" WHERE "rounds"."type" IN ('ScoredRound') AND "rounds"."group_player_id" = $1 AND ("rounds"."quality" IS NOT NULL) ORDER BY "rounds"."date" DESC LIMIT $2 [["group_player_id", 123], ["LIMIT", 25]]
=> #<ActiveRecord::AssociationRelation [...]
sr.count
(1.1ms) SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM "rounds" WHERE "rounds"."type" IN ('ScoredRound') AND "rounds"."group_player_id" = $1 AND ("rounds"."quality" IS NOT NULL) LIMIT $2) subquery_for_count [["group_player_id", 123], ["LIMIT", 25]]
=> 25
sr.sum(:quality)
(1.8ms) SELECT SUM("rounds"."quality") FROM "rounds" WHERE "rounds"."type" IN ('ScoredRound') AND "rounds"."group_player_id" = $1 AND ("rounds"."quality" IS NOT NULL) LIMIT $2 [["group_player_id", 123], ["LIMIT", 25]]
=> 354.166666666667
### This is the error: it returns the sum of all records,
# not the limited ones???? If I use pluck and sum instead:
sr.pluck(:quality)
=> [10.0, 11.3333333333333, 10.0, 34.0, 0.0, 7.33333333333333, 0.0, 0.0, 31.5, 0.0, 21.3333333333333, 0.0, 19.0, 0.0, 0.0, 7.5, 0.0, 20.0, 10.0, 28.0, 8.0, 9.5, 0.0, 3.0, 24.0]
sr.pluck(:quality).sum
=> 254.49999999999994
Don't know if I found a bug in AREL or I'm doing something wrong. I tried it with just Round instead of the STI ScoredRound with the same results.
Any ideas?
If you notice, the SUM results for both, with and without LIMIT, are the same:
sr = gp.scored_rounds.where.not(:quality => nil)
sr.sum(:quality)
(1.0ms) SELECT SUM("rounds"."quality") FROM "rounds" WHERE "rounds"."type" IN ('ScoredRound') AND "rounds"."group_player_id" = $1 AND ("rounds"."quality" IS NOT NULL) [["group_player_id", 123]]
=> 354.166666666667
sr = gp.scored_rounds.where.not(:quality => nil).order(:date).reverse_order.limit(25)
sr.sum(:quality)
(1.8ms) SELECT SUM("rounds"."quality") FROM "rounds" WHERE "rounds"."type" IN ('ScoredRound') AND "rounds"."group_player_id" = $1 AND ("rounds"."quality" IS NOT NULL) LIMIT $2 [["group_player_id", 123], ["LIMIT", 25]]
=> 354.166666666667
That's because LIMIT affects the number of rows returned by the query, and SUM returns just one row, so the function is applied to all 44 records, not the 25 given to LIMIT. That's not what happens with sr.pluck(:quality).sum, which applies only to the 25 records returned by the query.
Don't know if I found a bug in AREL or I'm doing something wrong
Sadly, 99.9% of the time it is not a bug but our fault :(
# File activerecord/lib/active_record/relation/calculations.rb, line 75
def sum(column_name = nil)
  return super() if block_given?
  calculate(:sum, column_name)
end
If you call sr.sum(:quality), then sum takes quality as a column name and calculates the sum of the values in that column.
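The order of operations described in the answers can be mimicked in plain Ruby. This is only an illustration with made-up numbers, not ActiveRecord code: the array stands in for the quality column, newest round first.

```ruby
# Stand-in data: quality values ordered newest first (made up for illustration).
qualities = [10.0, 20.0, 30.0, 40.0]
limit = 2

# What SELECT SUM(quality) ... LIMIT 2 does: aggregate every matching row
# into a single result row, then apply LIMIT to that one-row result.
sum_then_limit = [qualities.sum].first(limit).first

# What the question wanted: keep the newest rows first, then add them up.
limit_then_sum = qualities.first(limit).sum

puts sum_then_limit  # sum over all rows
puts limit_then_sum  # sum over the limited rows only
```

In ActiveRecord terms, sr.pluck(:quality).sum from the question is the second form. Pushing the limited relation into a subquery, for example ScoredRound.where(id: sr.select(:id)).sum(:quality), may be another option on PostgreSQL, but that is an untested suggestion.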

Ruby ActiveRecord group and average "syntax error at or near "AS""

params = {device_id: "u100", device_point: "1", point_no: "1", max_date: "2016-12-01 12:00", min_date: "2016-12-01 11:40"}
I am working in a Ruby/Rails environment with a PG database.
I am trying to extend my device controller to use ActiveRecord to get 10-minute averages from some dummy sensor data.
I can group by minute fine with:
.group("date_trunc('minute', device_time)").average('point_data_val')
e.g
device_data = DeviceDatum.select('device_time, point_data_val').filter(params.slice(:device_id, :device_point, :point_no, :iot_version, :iot_time, :device_time, :point_data_type, :point_data_val, :point_bat_val,:min_val, :max_val, :min_date, :max_date)).limit(1008).group("date_trunc('minute', device_time)").average('point_data_val')
(240.3ms) SELECT AVG("device_data"."point_data_val") AS average_point_data_val, date_trunc('minute', device_time) AS date_trunc_minute_device_time FROM "device_data" WHERE "device_data"."device_id" = $1 AND "device_data"."device_point" = $2 AND "device_data"."point_no" = $3 AND (device_time >= '2016-12-01 11:40') AND (device_time <= '2016-12-01 12:00') GROUP BY date_trunc('minute', device_time) LIMIT 1008 [["device_id", "u100"], ["device_point", 1], ["point_no", 1]]
=> {2016-12-01 11:41:00 +0900=>#<BigDecimal:7f91599cfbc0,'-0.4816E1',18(27)>, 2016-12-01 11:51:00 +0900=>#<BigDecimal:7f91599cf8f0,'-0.4868E1',18(27)>}
I found this method to do it
But when I try and use AS in the group method
DeviceDatum.select('device_time, point_data_val')
.filter(params.slice(:device_id, :device_point, :point_no, :iot_version, :iot_time, :device_time,
:point_data_type, :point_data_val, :point_bat_val,:min_val, :max_val, :min_date,
:max_date)).limit(1008)
.group("date_trunc('hour', device_time) AS hour_stump,
(extract(minute FROM device_time)::int / 10)
AS min10_slot,count(*)").average('point_data_val')
I am getting this error.
ActiveRecord::StatementInvalid: PG::SyntaxError: ERROR: syntax error at or near "AS"
LINE 1: ... 12:00') GROUP BY date_trunc('hour', device_time) AS hour_st... : SELECT AVG("device_data"."point_data_val") AS average_point_data_val, date_trunc('hour', device_time) AS hour_stump,(extract(minute FROM device_time)::int / 10) AS min10_slot,count(*) AS date_trunc_hour_device_time_as_hour_stump_extract_minute_from_d FROM "device_data" WHERE "device_data"."device_id" = $1 AND "device_data"."device_point" = $2 AND "device_data"."point_no" = $3 AND (device_time >= '2016-12-01 11:40') AND (device_time <= '2016-12-01 12:00') GROUP BY date_trunc('hour', device_time) AS hour_stump,(extract(minute FROM device_time)::int / 10) AS min10_slot,count(*) LIMIT 1008
from ...../bundle/ruby/2.1.0/gems/activerecord-4.2.6/lib/active_record/connection_adapters/postgresql_adapter.rb:637:in `prepare'
ActiveRecord seems to be using the options from the group method in sql it generates for average as well.
SELECT
AVG("device_data"."point_data_val") AS average_point_data_val,
date_trunc('hour', device_time) AS hour_stump,
(
extract(minute
FROM
device_time)::int / 10
)
AS min10_slot,
count(*) AS date_trunc_hour_device_time_as_hour_stump_extract_minute_from_d
FROM
"device_data"
WHERE
"device_data"."device_id" = $1
AND "device_data"."device_point" = $2
AND "device_data"."point_no" = $3
AND
(
device_time >= '2016-12-01 11:40'
)
AND
(
device_time <= '2016-12-01 12:00'
)
GROUP BY
date_trunc('hour', device_time) AS hour_stump,
(
extract(minute
FROM
device_time)::int / 10
)
AS min10_slot,
count(*) LIMIT 1008
What am I missing here?
I think your group clause should look like this:
group("hour_stump, min10_slot")
if you define those aliases in the select clause. As @Fallenhero and @Iceman said, declaring aliases with AS inside GROUP BY is not allowed. Note also that count(*) is an aggregate, and PostgreSQL does not allow aggregates in GROUP BY, so it has to be left out of the group list.
If the aliases in select are generated automatically, I think the only way is to pass the expressions as they are, without AS:
group("date_trunc('hour', device_time),
(extract(minute FROM device_time)::int / 10)")
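To see what the two grouping expressions compute, here is a plain-Ruby sketch of the same bucketing: date_trunc('hour', device_time) together with (extract(minute FROM device_time)::int / 10) identifies a 10-minute slot. The readings are made up and ten_minute_slot is a hypothetical helper, so treat it as an illustration rather than the query itself.

```ruby
require "time"

# Maps a timestamp to the start of its 10-minute slot, mirroring
# date_trunc('hour', t) plus (extract(minute FROM t)::int / 10).
def ten_minute_slot(time)
  Time.new(time.year, time.month, time.day, time.hour, (time.min / 10) * 10, 0, time.utc_offset)
end

# Made-up sample readings: [device_time, point_data_val]
readings = [
  [Time.parse("2016-12-01 11:41:10 +0900"), -4.0],
  [Time.parse("2016-12-01 11:48:30 +0900"), -5.0],
  [Time.parse("2016-12-01 11:51:00 +0900"), -6.0]
]

# Group by slot, then average the values in each slot.
averages = readings
  .group_by { |time, _| ten_minute_slot(time) }
  .transform_values { |rows| rows.sum { |_, val| val } / rows.size }

averages.each { |slot, avg| puts "#{slot.strftime('%H:%M')} #{avg}" }
```

The first two readings land in the 11:40 slot and the third in the 11:50 slot, which is the grouping the SQL expressions are meant to produce.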
