Limit query by sum of attributes of objects in Rails and Postgresql - ruby-on-rails

I have a Lead model that has a pre-calculated float column called "sms_price". I want to allow users to send text messages to those leads, and to assign a budget to their campaigns (something similar to what you can find on fb ads).
I need a scope that can limit the number of leads by the total price of those leads. So for example, if the user has defined a total budget of 200:
id: 1 | sms_price: 0.5 | total_price: 0.5
id: 2 | sms_price: 1.2 | total_price: 1.7
id: 3 | sms_price: 0.9 | total_price: 2.6
...
id: 94 | sms_price: 0.8 | total_price: 199.4 <--- stop query here
id: 95 | sms_price: 0.7 | total_price: 200.1
So I need two things in my scope:
Calculate the total price recursively
Get only the leads that have a total price lower than the desired budget
So far I have only managed to do the first task (Calculate the total price recursively) using this scope:
scope :limit_by_total_price, ->{select('"leads".*, sum(sms_price) OVER(ORDER BY id) AS total_price')}
This works and if I do Lead.limit_by_total_price.last.total_price I get 38039.7499999615
Now what I need is a way to retrieve only the leads that have a total price lower than the budget:
scope :limit_by_total_price, ->(budget){select('"leads".*, sum(sms_price) OVER(ORDER BY id) AS total_price').where('total_price < ?', budget)}
But it doesn't recognise the total_price attribute:
ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column "total_price" does not exist
Why does it recognise the total_price attribute in a single object and not in the scope ?

The problem is that the columns calculated in a SELECT clause are not available to the WHERE clause in the same statement. To do what you want, you need a subquery.
You can do this and yet stay in the Rails universe using ActiveRecord's from method. The technique is nicely illustrated in this Hashrocket blog post.
In your case it might look something like this (because of the complexity, I would use a class method rather than a scope):
def self.limit_by_total_price(budget)
  subquery = select('leads.*, sum(leads.sms_price) over(order by leads.id) as total_price')
  from(subquery, :leads).where('total_price < ?', budget)
end
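To make it clearer what the window function plus the outer WHERE compute, here is a minimal plain-Ruby sketch of the same logic on in-memory rows. LeadRow, SAMPLE_LEADS and leads_within_budget are hypothetical names made up for this illustration (they are not part of the question's code), and the budget of 2.0 is only chosen to fit the first three sample rows.

```ruby
# Hypothetical in-memory stand-in for rows of the leads table.
LeadRow = Struct.new(:id, :sms_price)

# Mirrors the subquery approach: compute a running total ordered by id
# (the window function), then keep rows whose running total is below
# the budget (the outer WHERE).
def leads_within_budget(rows, budget)
  running_total = 0.0
  rows.sort_by(&:id).each_with_object([]) do |row, kept|
    running_total += row.sms_price
    kept << [row, running_total] if running_total < budget
  end
end

# First three rows from the question's example.
SAMPLE_LEADS = [
  LeadRow.new(1, 0.5),
  LeadRow.new(2, 1.2),
  LeadRow.new(3, 0.9)
]

leads_within_budget(SAMPLE_LEADS, 2.0).each do |row, total|
  puts "id: #{row.id} | sms_price: #{row.sms_price} | total_price: #{total.round(2)}"
end
```

The running total has to exist before it can be filtered on, which is the same reason the SQL version needs the subquery.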

Related

Advanced search with many to many in Rails 4

I have a list of real estates, and each real estate has some features. I have 2 tables:
real_estates, to store all real estates added by users.
re_home_features, to store all default features, added by admin, such as pool, closet, office, garden and many other features that a real estate can have
The same real_estate can have many features AND the same feature can belong to many real estates. I created these models:
real_estate.rb
class RealEstate < ActiveRecord::Base
  has_many :real_estate_home_features
  has_many :re_home_features, as: :home_features, through: :real_estate_home_features, dependent: :destroy
end
re_home_features.rb
class ReHomeFeature < ActiveRecord::Base
  has_many :real_estate_home_features
  has_many :real_estates, through: :real_estate_home_features
end
real_estate_home_feature.rb
class RealEstateHomeFeature < ActiveRecord::Base
  belongs_to :real_estate
  belongs_to :re_home_feature
end
With this, the many-to-many relation is working fine.
I have a search for real estates with some parameters, like:
Number of living rooms
Number of bathrooms
Sell price (min to max)
Area total (min to max)
Real estate code
And a lot of other params
My search is like that:
real_estates_controller.rb
def search
  r = real_estates
  r = r.where(code: params['code']) if params['code'].present?
  r = r.where(city_name: params['city']) if params['city'].present?
  r = r.where(garage: params['garage']) if params['garage'].present?
  r = r.where(restrooms: params['restrooms']) if params['restrooms'].present?
  r = r.paginate(:page => params[:page], :per_page => 10)
  r
end
This search is working fine too. No problems with this, because all parameters are within the same table real_estates.
But now, the search is a little bit more complex. I have to search for real estates with specific features. Example: I want all real estates which have 4 restrooms, space for 2 cars in the garage AND a pool.
In my example, a search for real estates with 4 restrooms and 2 cars returned 50 real estates, but only 15 of these have a pool.
How can I filter these 50 real estates to show only the records associated with the 'pool feature'?
I can't check this in the view, because that causes a wrong number of results per page. I think the filter must happen in the database query, just before the paginate.
I appreciate any help!
environment
rails -v: Rails 4.2.1
ruby -v: ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
so: Ubuntu 16.04.5 LTS
localhost database: MySQL (development)
server database: Postgres (production)
EDIT 1
Based on @jvillian's answer, I changed my controller, adding a JOIN to the query.
if params['home_features'].present? && params['home_features'].any?
  r = r.joins(:re_home_features).where(re_home_features: {id: params['home_features']}).distinct(:id)
end
In my tests, I had:
params['home_features'] = [1 , 2, 3]
I have 1 real estate which has these 3 features. But, in the view, I got the same real estate shown 3 times. If I add the distinct, the real estate is shown only once. BUT... If I change the params to:
params['home_features'] = [1 , 2, 3, 500]
I have no real estates with these 4 features, but the results are the same.
Without distinct, the real estate is shown 3 times. With distinct, the real estate is shown once. The expected result is zero results, because I want real estates with all the selected features.
But I think we are almost there! I will provide some information about my models:
table real_estates
id | title | description | code | city_name | garage | ...
7 | Your House | Awesome... | 1234 | Rio de Janeiro | 4
table re_home_features
id | name
1 | Pool
2 | Garden
3 | Tenis court
4 | Closet
table real_estate_home_features - association many to many
id | real_estate_id | re_home_feature_id
1 | 7 | 1
2 | 7 | 2
3 | 7 | 3
If I run:
r = r.joins(:re_home_features).where(re_home_features: {id: [1,2,3,500]}).distinct(:id)
I get this query (rails console):
SELECT DISTINCT `real_estates`.* FROM `real_estates` INNER JOIN `real_estate_home_features` ON `real_estate_home_features`.`real_estate_id` = `real_estates`.`id` INNER JOIN `re_home_features` ON `re_home_features`.`id` = `real_estate_home_features`.`re_home_feature_id` WHERE `re_home_features`.`id` IN (1, 2, 3, 500)
And it returns 1 result. The real estate id 7. If I remove the distinct, I have 3 results: the same real estate, 3 times.
The expected is zero results. If params['home_features'] = [1 , 2, 3] I expect 1 result.
EDIT 2
This join method works, but it behaves like an "OR" query. In my case, I need a join query with "AND". The query must be: "Return all real estates which have features 1 AND 2 AND 3".
Tks!
I'm not sure this is an "advanced search". Searching on joined models is covered in the guide under 12.1.4 Specifying Conditions on the Joined Tables. It would look something like:
real_estates.joins(:re_home_features).where(re_home_features: {feature_name: 'pool'})
Naturally, that's probably not going to exactly work because you don't tell us much about how to find the 'pool feature'. But, it should give you the right direction.
BTW, the reason this:
params['home_features'] = [1 , 2, 3, 500]
returns records that have any of the home_features is because it employs an 'or'. If you want real_estates with all home_features, then you want an 'and'. You can google around on that.
I think I would try something like:
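def search
  r = real_estates
  %i(code city_name garage restrooms).each do |attribute|
    r = r.where(attribute => params[attribute]) unless params[attribute].blank?
  end
  unless params[:home_features].blank?
    feature_ids = params[:home_features].uniq
    # Match any of the requested features, then keep only the real estates
    # that matched all of them (one joined row per requested feature).
    # Chaining where(id: x) once per id would require a single joined row
    # to have every id at once, which can never be true.
    r = r.joins(:re_home_features)
         .where(re_home_features: { id: feature_ids })
         .group('real_estates.id')
         .having('COUNT(DISTINCT re_home_features.id) = ?', feature_ids.size)
  end
  r = r.paginate(:page => params[:page], :per_page => 10)
  r
end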
NOTE: I changed params[:city] to params[:city_name] to make that first each iterator cleaner. You will need to change your view if you want to do it this way. If you don't want to do it this way, then you can go back to the non-iterator approach you already have.
This is not tested, so I'm not confident that it will work exactly as presented.
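For what it's worth, the "all selected features" requirement (an AND across several rows of the join table) can be modelled in plain Ruby to see why an IN filter alone is not enough. This is only a sketch: JOIN_ROWS copies the real_estate_home_features rows from the question, and estates_with_all_features is a hypothetical helper name, not something Rails provides. The comments show the SQL clause each step roughly corresponds to.

```ruby
# Rows of the real_estate_home_features join table from the question,
# as [real_estate_id, re_home_feature_id] pairs.
JOIN_ROWS = [[7, 1], [7, 2], [7, 3]]

# Returns the ids of real estates linked to every id in wanted_feature_ids.
def estates_with_all_features(join_rows, wanted_feature_ids)
  wanted = wanted_feature_ids.uniq
  matching = join_rows.select { |_estate_id, feature_id| wanted.include?(feature_id) } # WHERE id IN (...)
  grouped = matching.group_by { |estate_id, _feature_id| estate_id }                   # GROUP BY real_estates.id
  complete = grouped.select do |_estate_id, rows|
    rows.map(&:last).uniq.size == wanted.size                                          # HAVING COUNT(DISTINCT ...) = n
  end
  complete.keys
end

p estates_with_all_features(JOIN_ROWS, [1, 2, 3])      # prints [7]
p estates_with_all_features(JOIN_ROWS, [1, 2, 3, 500]) # prints []
```

The IN filter by itself keeps real estate 7 in both cases; it is the count check per group that turns the "OR" into an "AND".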

How do I write a Rails finder method that will return the greatest date grouped by record?

I'm using Rails 5 with PostGres 9.5. I have a table that tracks prices ...
Table "public.crypto_prices"
Column | Type | Modifiers
--------------------+-----------------------------+------------------------------------------------------------
id | integer | not null default nextval('crypto_prices_id_seq'::regclass)
crypto_currency_id | integer |
market_cap_usd | bigint |
total_supply | bigint |
last_updated | timestamp without time zone |
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
I would like to get the latest price per currency (where last_updated is greatest) for a select set of currencies. I can find all the prices related to certain currencies like so:
current_prices = CryptoPrice.where(crypto_currency_id: CryptoIndexCurrency.all.pluck(:crypto_currency_id).uniq)
Then I can sort them by currency into arrays, looping through each until I find the one with the greatest last_updated value, but how can I write a finder that will return exactly one row per currency with the greatest last_updated date?
Edit: Tried Owl Max's suggestion like so
ids = CryptoIndexCurrency.all.pluck(:crypto_currency_id).uniq
crypto_price_ids = CryptoPrice.where(crypto_currency_id: ids).group(:crypto_currency_id).maximum(:last_updated).keys
puts "price ids: #{crypto_price_ids.length}"
@crypto_prices = CryptoPrice.where(crypto_currency_id: crypto_price_ids)
puts "ids: #{@crypto_prices.size}"
Although the first "puts" only reveals a size of "12" the second puts reveals over 38,000 results. It should only be returning 12 results, one for each currency.
We can write a finder that returns exactly one row per currency with the greatest last_updated date, like this:
current_prices = CryptoPrice.where(crypto_currency_id: CryptoIndexCurrency.all.pluck(:crypto_currency_id).uniq).select("*, id as crypto_price_id, MAX(last_updated) as last_updated").group(:crypto_currency_id)
I hope that this will take you closer to your goal. Thank you.
This only works with Rails 5 because it uses the or query method:
specific_ids = CryptoIndexCurrency.distinct.pluck(:crypto_currency_id)
hash = CryptoPrice.where(crypto_currency_id: specific_ids)
                  .group(:crypto_currency_id)
                  .maximum(:last_updated)
res = nil
hash.each do |k, v|
  latest = CryptoPrice.where(crypto_currency_id: k, last_updated: v)
  # Start with the first condition, then OR each following one onto it
  res = res.nil? ? latest : res.or(latest)
end
Explanation:
You can use group to group all your CryptoPrice objects by each CryptoIndexCurrency present in your table.
Then use maximum (thanks to @artgb) to take the biggest last_updated value. This outputs a Hash whose keys are crypto_currency_id and whose values are last_updated.
Finally, you can use keys to only get an Array of crypto_currency_id.
CryptoPrice.group(:crypto_currency_id).maximum(:last_updated)
=> {2285=>2017-06-06 09:06:35 UTC,
2284=>2017-05-18 15:51:05 UTC,
2267=>2016-03-22 08:02:53 UTC}
The problem with this solution is that you get the maximum date for each currency without getting the whole records.
To get the records, you can loop over the hash pair by pair, using crypto_currency_id and last_updated. It's hacky, but it's the only solution I found.
Using this code you can fetch the most recent updated_at value from a particular table:
CryptoPrice.order(:updated_at).pluck(:updated_at).last
This should help.
This is currently not easy to do in Rails in one statement/query. If you don't mind using multiple statements/queries, then this is your solution:
cc_ids = CryptoIndexCurrency.distinct.pluck(:crypto_currency_id)
result = cc_ids.map do |cc_id|
  max_last_updated = CryptoPrice.where(crypto_currency_id: cc_id).maximum(:last_updated)
  CryptoPrice.find_by(crypto_currency_id: cc_id, last_updated: max_last_updated)
end
The result of the map method is what you are looking for. This produces 2 queries for every crypto_currency_id and 1 query to request the crypto_currency_ids.
If you want to do this with one query you'll need to use OVER (PARTITION BY ...). More info on this in the following links:
Fetch the row which has the Max value for a column
https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql
https://blog.codeship.com/folding-postgres-window-functions-into-rails/
But in this scenario you'll have to write some SQL.
EDIT 1:
If you want a nice Hash result run:
cc_ids.zip(result).to_h
EDIT 2:
If you want to halve the amount of queries you can shove the max_last_updated query in the find_by as sub-query like so:
cc_ids = CryptoIndexCurrency.distinct.pluck(:crypto_currency_id)
result = cc_ids.map do |cc_id|
  CryptoPrice.find_by(<<~SQL.squish)
    crypto_currency_id = #{cc_id} AND last_updated = (
      SELECT MAX(last_updated)
      FROM crypto_prices
      WHERE crypto_currency_id = #{cc_id})
  SQL
end
This produces 1 query for every crypto_currency_id and 1 query to request the crypto_currency_ids.
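As a rough illustration of what "one row per currency with the greatest last_updated" means, here is a plain-Ruby sketch on in-memory rows. PriceRow, SAMPLE_PRICES and latest_price_per_currency are hypothetical names, the sample timestamps are made up, and the sketch assumes last_updated is never nil. The comment shows one way this is often written as a single Postgres query; that query is an untested assumption, not something taken from the answers above.

```ruby
# Hypothetical in-memory stand-in for rows of crypto_prices.
PriceRow = Struct.new(:id, :crypto_currency_id, :last_updated)

# Group rows by currency and keep the row with the greatest last_updated.
#
# In Postgres the same result can usually be fetched in one query with
# DISTINCT ON, e.g. (untested sketch):
#   CryptoPrice.where(crypto_currency_id: cc_ids)
#              .select('DISTINCT ON (crypto_currency_id) crypto_prices.*')
#              .order('crypto_currency_id, last_updated DESC NULLS LAST')
def latest_price_per_currency(rows)
  rows.group_by(&:crypto_currency_id)
      .map { |_currency_id, group| group.max_by(&:last_updated) }
end

# Made-up sample data: two prices for currency 2285, one for 2284.
SAMPLE_PRICES = [
  PriceRow.new(1, 2285, Time.utc(2017, 6, 1)),
  PriceRow.new(2, 2285, Time.utc(2017, 6, 6)),
  PriceRow.new(3, 2284, Time.utc(2017, 5, 18))
]

latest_price_per_currency(SAMPLE_PRICES).each do |row|
  puts "currency #{row.crypto_currency_id}: price id #{row.id}"
end
```

Unlike group plus maximum, this keeps the whole record for each currency rather than only the maximum date.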

How to get objects linked to objects?

I'm confused about something in Rails (using Rails 5). I have this model
class MyEventActivity < ApplicationRecord
belongs_to :event_activity
end
and what I want to do is get a list of all the objects linked to it, in other words, all the "event_activity" objects. I thought this would do the trick
my_event_activities = MyEventActivity.all.pluck(:event_activity)
but it's giving me this SQL error
(2.3ms) SELECT "event_activity" FROM "my_event_activities"
ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column "event_activity" does not exist
LINE 1: SELECT "event_activity" FROM "my_event_activities"
How do I get the objects linked to the MyEventActivity objects? Note that I don't want just the IDs, I want the whole object.
Edit: This is the PostGres table as requested
eactivit=# \d event_activities;
Table "public.event_activities"
Column | Type | Modifiers
--------------------------+-----------------------------+----------------------------------------------------------------
id | integer | not null default nextval('event_activities_id_seq'::regclass)
name | character varying |
abbrev | character varying |
attendance | bigint |
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
EventActivity.joins(:my_event_activities).distinct
Returns all EventActivity objects that have associated MyEventActivity records
Or more along the lines of what you've already tried:
EventActivity.where(id: MyEventActivity.all.pluck(:event_activity_id).uniq)
But the first one is preferable for its brevity and performance.
Update to explain why the first option should be preferred
TL;DR much faster and more readable
Assume we have 100 event_activities, and all but the last (id: 100) have 100 my_event_activities for a total of 9900 my_event_activities.
EventActivity.where(id: MyEventActivity.all.pluck(:event_activity_id).uniq) performs two SQL queries:
SELECT "my_event_activities"."event_activity_id" FROM "my_event_activities" which will return an Array of 9900 non-unique event_activity_ids. We want to reduce this to unique ids to optimize the second query, so we call Array#uniq which has its own performance cost on large arrays, reducing 9900 down to 99. Then we can call the second query: SELECT "event_activities".* FROM "event_activities" WHERE "event_activities"."id" IN (1, 2, 3, ... 97, 98, 99)
EventActivity.joins(:my_event_activities).distinct performs only one SQL query: SELECT DISTINCT "event_activities".* FROM "event_activities" INNER JOIN "my_event_activities" ON "my_event_activities"."event_activity_id" = "event_activities"."id". Once we drop into the database we never have to switch back to Ruby to perform some expensive process and then make a second trip back to the database. joins is designed for performing these types of chainable and composable queries in situations like this.
The performance difference can be checked with a simple benchmark. With an actual Postgres database loaded with 100 event_activities, 99 of which have 100 my_event_activities:
require 'benchmark/ips'
require_relative 'config/environment'
Benchmark.ips do |bm|
  bm.report('joins.distinct') do
    EventActivity.joins(:my_event_activities).distinct
  end
  bm.report('pluck.uniq') do
    EventActivity.where(id: MyEventActivity.all.pluck(:event_activity_id).uniq)
  end
  bm.compare!
end
And the results:
Warming up --------------------------------------
joins.distinct 5.922k i/100ms
pluck.uniq 7.000 i/100ms
Calculating -------------------------------------
joins.distinct 71.504k (± 3.5%) i/s - 361.242k in 5.058311s
pluck.uniq 73.459 (±13.6%) i/s - 364.000 in 5.061892s
Comparison:
joins.distinct: 71503.9 i/s
pluck.uniq: 73.5 i/s - 973.38x slower
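973x slower :-O ! Keep in mind, though, that the first block only builds a lazy relation and never loads it, while pluck runs its query on every iteration, so the measured gap overstates the real difference; calling .to_a on both relations would give a fairer comparison. Even so, the joins method is meant to be used for things just like this, and this is likely one of the happy cases in Ruby where the more readable option is also the more performant one.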

How to make Rails/ActiveRecord return unique objects using join table's boolean column

I have a Rails 4 app using ActiveRecord and Postgresql with two tables: stores and open_hours. A store has many open_hours:
stores:
Column |
--------------------+
id |
name |
open_hours:
Column |
-----------------+
id |
open_time |
close_time |
store_id |
The open_time and close_time columns represent the number of seconds since midnight of Sunday (i.e. beginning of the week).
I would like to get list of store objects ordered by whether the store is open or not, so stores that are open will be ranked ahead of the stores that are closed. This is my query in Rails:
Store.joins(:open_hours).order("#{current_time} > open_time AND #{current_time} < close_time desc")
Note that current_time is the number of seconds since midnight on the previous Sunday.
This gives me a list of stores with the currently open stores ranked ahead of the closed ones. However, I'm getting a lot of duplicates in the result.
I tried using the distinct, uniq and group methods, but none of them work:
Store.joins(:open_hours).group("stores.id").group("open_hours.open_time").group("open_hours.close_time").order("#{current_time} > open_time AND #{current_time} < close_time desc")
I've read a lot of the questions/answers already on Stackoverflow but most of them don't address the order method. This question seems to be the most relevant one but the MAX aggregate function does not work on booleans.
Would appreciate any help! Thanks.
Here is what I did to solve the issue:
In Rails:
is_open = "bool_or(#{current_time} > open_time AND #{current_time} < close_time)"
Store.select("stores.*, CASE WHEN #{is_open} THEN 1 WHEN #{is_open} IS NULL THEN 2 ELSE 3 END AS open").group("stores.id").joins("LEFT JOIN open_hours ON open_hours.store_id = stores.id").uniq.order("open asc")
Explanation:
The is_open variable is just there to shorten the select statement.
The bool_or aggregate function is needed here to group the open_hours records. Otherwise there will likely be two results for each store (one open and one closed), which is why using the uniq method alone doesn't eliminate the duplicates.
LEFT JOIN is used instead of INNER JOIN so we can include the stores that don't have any open_hours objects
The store can be open (i.e. true), closed (i.e. false) or not determined (i.e. nil), so the CASE WHEN statement is needed here: if a store is open, then it's 1, 2 if not determined and 3 if closed
Ordering the results ASC will show open stores first, then the not determined ones, then the closed stores.
This solution works but doesn't feel very elegant. Please post your answer if you have a better solution. Thanks a lot!
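To illustrate what the bool_or plus CASE WHEN combination computes, here is a plain-Ruby sketch of the same ranking on in-memory data. OpenHour, SAMPLE_HOURS, open_rank and stores_by_openness are hypothetical names and the opening times are made up; the sketch only covers the "no open_hours rows at all" case for the undetermined rank.

```ruby
# Hypothetical in-memory stand-in for rows of open_hours.
OpenHour = Struct.new(:store_id, :open_time, :close_time)

# 1 = open now, 2 = no opening hours recorded, 3 = closed.
def open_rank(hours, current_time)
  return 2 if hours.empty? # like bool_or over no rows, which is NULL
  is_open = hours.any? { |h| current_time > h.open_time && current_time < h.close_time }
  is_open ? 1 : 3
end

# Orders store ids by rank, one entry per store (ties broken by id).
def stores_by_openness(store_ids, open_hours, current_time)
  by_store = open_hours.group_by(&:store_id)
  store_ids.sort_by { |id| [open_rank(by_store.fetch(id, []), current_time), id] }
end

# Made-up data: store 1 has two slots, store 2 has none, store 3 has one.
SAMPLE_HOURS = [
  OpenHour.new(1, 0, 100),
  OpenHour.new(1, 500, 600),
  OpenHour.new(3, 150, 250)
]

p stores_by_openness([1, 2, 3], SAMPLE_HOURS, 220) # prints [3, 2, 1]
```

Because every store collapses to a single rank before sorting, there is nothing left to deduplicate, which is what the GROUP BY achieves in the SQL version.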
Have you tried the uniq method? Just append it at the end:
Store.joins(:open_hours).order("#{current_time} > open_time AND #{current_time} < close_time desc").uniq

Sorting by rank and total where multiple entries may exist

Ruby 2.1.5
Rails 4.2.1
My model is contributions, with the following fields:
event, contributor, date, amount
The table would have something like this:
earth_day, joe, 2014-04-14, 400
earth_day, joe, 2015-05-19, 400
lung_day, joe, 2015-05-20, 800
earth_day, john, 2015-05-19, 600
lung_day, john, 2014-04-18, 900
lung_day, john, 2015-05-21, 900
I have built an index view that shows all these fields and I implemented code to sort (and reverse order) by clicking on the column titles in the Index view.
What I would like to do is have the Index view displayed like this:
Event Contributor Total Rank
Where event is only listed once per contributor and the total is sum of all contributions for this event by the contributor and rank is how this contributor ranks relative to everyone else for this particular event.
I am toying with having a separate table where only a running tally is kept for each event/contributor and a piece of code to compute rank and re-insert it in the table, then use that table to drive views.
Can you think of a better approach?
Keeping a running tally is a fine option. Writes will slow down, but reads will be fast.
Another way is to create a database view, if you are using postgresql, something like:
-- Your table structure and data
create table whatever_table (event text, contributor text, amount int);
insert into whatever_table values ('e1', 'joe', 1);
insert into whatever_table values ('e2', 'joe', 1);
insert into whatever_table values ('e1', 'jim', 0);
insert into whatever_table values ('e1', 'joe', 1);
insert into whatever_table values ('e1', 'bob', 1);
-- Your view
create view event_summary as (
select
event,
contributor,
sum(amount) as total,
rank() over (order by sum(amount) desc) as rank
from whatever_table
group by event, contributor
);
-- Using the view
select * from event_summary order by rank;
event | contributor | total | rank
-------+-------------+-------+------
e1 | joe | 2 | 1
e1 | bob | 1 | 2
e2 | joe | 1 | 2
e1 | jim | 0 | 4
(4 rows)
Then you have an ActiveRecord class like:
class EventSummary < ActiveRecord::Base
self.table_name = :event_summary
end
and you can do stuff like EventSummary.order(rank: :desc) and so on. This won't slow down writes, but reads will be a little slower, depending on how much data you are working with.
Postgresql also has support for materialized views, which could give you the best of both worlds, assuming you can have a little bit of lag between when the data is entered and when the summary table is updated.
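One thing to watch: the view above ranks contributors across all events, while the question asks for the rank within each event (in SQL that would mean adding partition by event to the window). Here is a plain-Ruby sketch of the per-event version on the same sample rows. CONTRIBUTIONS and event_summary are hypothetical names used only for this illustration, and ties share a rank the same way SQL's rank() does.

```ruby
# Same sample rows as the insert statements above: [event, contributor, amount].
CONTRIBUTIONS = [
  ['e1', 'joe', 1],
  ['e2', 'joe', 1],
  ['e1', 'jim', 0],
  ['e1', 'joe', 1],
  ['e1', 'bob', 1]
]

# Sums amounts per (event, contributor), then ranks contributors
# within each event by total, highest first.
def event_summary(rows)
  totals = Hash.new(0)
  rows.each { |event, contributor, amount| totals[[event, contributor]] += amount }

  totals.group_by { |(event, _contributor), _total| event }.flat_map do |event, entries|
    sorted_totals = entries.map(&:last).sort.reverse
    entries.map do |(_event, contributor), total|
      # Position of the first equal total gives rank() semantics for ties.
      { event: event, contributor: contributor, total: total,
        rank: sorted_totals.index(total) + 1 }
    end
  end
end

event_summary(CONTRIBUTIONS).each { |row| p row }
```

With this data joe ranks first in both e1 and e2, since each event is ranked separately.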

Resources