I'm using PostgreSQL 9.4. I've got a resources table which has the following columns:
id
name
provider
description
category
Let's say none of these columns is required (save for the id). I want resources to have a completion level, meaning that resources with NULL values for each column will be at 0% completion level.
Now, each column has a percentage weight. Let's say:
name: 40%
provider: 30%
description: 20%
category: 10%
So if a resource has only a provider and a category, its completion level is at 40%.
These weight percentages could change at any time, so keeping a completion_level column that always stores each resource's completion level will not work out (there could be millions of resources to recompute). For example, at any moment, the percentage weight of description could decrease from 20% to 10% and category's could increase from 10% to 20%. New columns could even be created later and given their own weights.
The final objective is to be able to order resources by their completion levels.
I'm not sure how to approach this. I'm currently using Rails so almost all interaction with the database has been through the ORM, which I believe is not going to be much help in this case.
The only query I've found that somewhat resembles a solution (and not really) is something like the following:
SELECT * FROM resources
ORDER BY CASE WHEN name IS NOT NULL AND
              provider IS NOT NULL AND
              description IS NOT NULL AND
              category IS NOT NULL THEN 100
         WHEN name IS NULL AND provider IS NOT NULL...
However, that forces me to enumerate every possible combination of column states, which is pretty bad.
Add a weights table as in this SQL Fiddle:
PostgreSQL 9.6 Schema Setup:
CREATE TABLE resource_weights
( id int primary key check(id = 1)
, name numeric
, provider numeric
, description numeric
, category numeric);
INSERT INTO resource_weights
(id, name, provider, description, category)
VALUES
(1, .4, .3, .2, .1);
CREATE TABLE resources
( id int
, name varchar(50)
, provider varchar(50)
, description varchar(50)
, category varchar(50));
INSERT INTO resources
(id, name, provider, description, category)
VALUES
(1, 'abc', 'abc', 'abc', 'abc'),
(2, NULL, 'abc', 'abc', 'abc'),
(3, NULL, NULL, 'abc', 'abc'),
(4, NULL, 'abc', NULL, NULL);
Then calculate your weights at runtime like this:
Query 1:
select r.*
, case when r.name is null then 0 else w.name end
+ case when r.provider is null then 0 else w.provider end
+ case when r.description is null then 0 else w.description end
+ case when r.category is null then 0 else w.category end weight
from resources r
cross join resource_weights w
order by weight desc
Results:
| id | name | provider | description | category | weight |
|----|--------|----------|-------------|----------|--------|
| 1 | abc | abc | abc | abc | 1 |
| 2 | (null) | abc | abc | abc | 0.6 |
| 3 | (null) | (null) | abc | abc | 0.3 |
| 4 | (null) | abc | (null) | (null) | 0.3 |
SQL's ORDER BY can order things by pretty much any expression; in particular, you can order by a sum. CASE is also fairly versatile (if somewhat verbose) and an expression so you can say things like:
case when name is not null then 40 else 0 end
which is more or less equivalent to name.nil? ? 0 : 40 in Ruby.
Putting those together:
order by case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
Somewhat verbose but it'll do the right thing. Translating that into ActiveRecord is fairly easy:
query.order(Arel.sql(%q{
case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
}))
or in the other direction:
query.order(Arel.sql(%q{
case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
desc
}))
You'll need the Arel.sql call to avoid deprecation warnings in Rails 5.2+: they don't want you calling order(some_string) anymore, they want you ordering by attributes, unless you jump through some hoops to say that you really mean the raw SQL.
Sum up weights like this:
SELECT * FROM resources
ORDER BY (CASE WHEN name IS NULL THEN 0 ELSE 40 END
+ CASE WHEN provider IS NULL THEN 0 ELSE 30 END
+ CASE WHEN description IS NULL THEN 0 ELSE 20 END
+ CASE WHEN category IS NULL THEN 0 ELSE 10 END) DESC;
This is how I would do it.
First: Weights
Since you say that the weights can change from time to time, you have to create a structure to handle the changes. It could be a simple table. For this solution, it will be called weights.
-- Table: weights
CREATE TABLE weights(id serial, table_name text, column_name text, weight numeric(5,2));
id | table_name | column_name | weight
---+------------+--------------+--------
1 | resources | name | 40.00
2 | resources | provider | 30.00
3 | resources | description | 20.00
4 | resources | category | 10.00
So, when you need to change category's weight from 10 to 20 and/or description's from 20 to 10, you update this structure.
Second: completion_level
Since you say that you could have millions of rows, it is OK to have a completion_level column in the resources table, for efficiency purposes.
Making a query that calculates the completion_level works, and you could put it in a view. But when you need the data fast and simple and you have MILLIONS of rows, it is better to store the value "by default" in a column or in another table.
When you have a view, every time you query it, it recalculates the data. When you have it already on the table, it's fast and you don't have to recalculate anything, just query the data.
But how can you handle a completion_level? TRIGGERS
You would have to create a trigger on the resources table, so that whenever you update or insert data, it will calculate the completion level.
First you add the column to the resources table
ALTER TABLE resources ADD COLUMN completion_level numeric(5,2);
And then you create the trigger:
CREATE OR REPLACE FUNCTION update_completion_level() RETURNS trigger AS $$
BEGIN
NEW.completion_level := (
CASE WHEN NEW.name IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='name') END
+ CASE WHEN NEW.provider IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='provider') END
+ CASE WHEN NEW.description IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='description') END
+ CASE WHEN NEW.category IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='category') END
);
RETURN NEW;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER resources_completion_level
BEFORE INSERT OR UPDATE
ON resources
FOR EACH ROW
EXECUTE PROCEDURE update_completion_level();
NOTE: the weights table has a column called table_name; it's just in case you want to expand this functionality to other tables. In that case, you should update the trigger and add AND table_name='resources' to each weight lookup.
With this trigger, every time you update or insert, you'll have your completion_level ready, so getting this data becomes a simple query on the resources table ;)
Third: What about old data and updates on weights?
Since the trigger only works for updates and inserts, what about old data? Or what if I change the weights of the columns?
Well, for those cases you could use a function that recalculates the completion_level for every row.
CREATE OR REPLACE FUNCTION update_resources_completion_level() RETURNS void AS $$
BEGIN
UPDATE resources set completion_level = (
CASE WHEN name IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='name') END
+ CASE WHEN provider IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='provider') END
+ CASE WHEN description IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='description') END
+ CASE WHEN category IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='category') END
);
END $$ LANGUAGE plpgsql;
So every time you update the weights, or to update the OLD data, you just run the function
SELECT update_resources_completion_level();
Finally: What if I add columns?
Well, you would have to insert the new column into the weights table and update the functions (the trigger and update_resources_completion_level()). Once everything is set, you run update_resources_completion_level() to set all the completion levels according to the changes :D
UPDATED with sample data etc.
I am a bit over my head on this complex query. Some background: this is a Rails app, and I have an expenditures model which has many expenditure_items, each of which has an amount column - these sum up to a total for the related expenditure.
A given expenditure can be an order, which can then have multiple (or one, or no) related invoice expenditures. I am looking for a single query that gives me all the orders with their invoice totals and identifies those whose invoices total more than a threshold over the order (in my case 10%).
I get the idea from my searching that I need a sub-select here, but I can't sort it out. I apologize, as raw SQL is not my wheelhouse - normal Rails ActiveRecord calls meet 99% of my needs.
Sample Data:
=> SELECT * FROM expenditures WHERE id = 17;
id | category | parent_id
-----+----------------+----------
17 | purchase_order |
=> SELECT * FROM expenditures_items WHERE expenditure_id = 17;
id | amount
-----+-------------
1 | 1000.00
2 | 2000.00
I need to obtain the SUM of expenditures_items.amount in my result - the original order total of $3,000.00.
Related Expenditures (invoices)
=> SELECT * FROM expenditures WHERE category = 'invoice' AND parent_id = 17;
id | category | parent_id
-----+----------------+----------
46 | invoice | 17
88 | invoice | 17
=> SELECT * FROM expenditures_items WHERE expenditure_id IN (46, 88) ;
id | amount | expenditure_id
-----+----------+---------------
23 | 500.00 | 46
24 | 1000.00 | 46
78 | 550.00 | 88
79 | 1100.00 | 88
Order 17 has two invoices (46 & 88) totalling $3,150.00 - this is the SUM of all the invoice expenditure_item amounts.
In the end I am looking for the SQL that gets me something like this:
=> SELECT * FROM expenditures WHERE category = 'purchase_order';
id | category | expenditure_total | invoice_total | percent
-----+----------------+-------------------+---------------+---------
17 | purchase_order | 3000.00 | 3150.00 | 5
45 | purchase_order | 4000.00 | 3000.00 | -25
75 | purchase_order | 7000.00 | 7000.00 | 0
99 | purchase_order | 10000.00 | 11100.00 | 11
percent is (invoice_total / expenditure_total - 1) * 100.
I also need to filter (perhaps with a HAVING clause) to only the results that have a percent > a threshold (say 10).
From all my searching this seems to be a subquery along with some joins, but I am lost at this point.
UPDATED Further
I had another look - this is close:
SELECT DISTINCT expenditures.*, SUM(invoice_items.amount) AS invoiced_total
FROM expenditures
JOIN expenditures AS invoices
  ON invoices.category = 'invoice'
 AND expenditures.id = CAST(invoices.ancestry AS INT)
JOIN expenditure_items
  ON expenditure_items.expenditure_id = expenditures.id
JOIN expenditure_items AS invoice_items
  ON invoice_items.expenditure_id = invoices.id
WHERE expenditures.category IN ('work_order', 'purchase_order')
GROUP BY expenditures.id
HAVING (SUM(invoice_items.amount) / SUM(expenditure_items.amount)) > 1.1
Here is the odd thing - the invoiced_total in the SELECT works; I get the proper amounts as per my example. The issue seems to be in my HAVING, where it only pulls the SUM of the first invoice.
UPDATE 3
Soooooo close:
SELECT DISTINCT
expenditures.*,
( SELECT
SUM(expenditure_items.amount)
FROM expenditure_items
WHERE expenditure_items.expenditure_id = expenditures.id ) AS order_total,
( SELECT
SUM(expenditure_items.amount)
FROM expenditure_items
JOIN expenditures invoices ON expenditure_items.expenditure_id = invoices.id
AND CAST (invoices.ancestry AS INT) = expenditures.id ) AS invoice_total
FROM "expenditures"
INNER JOIN "expenditure_items" ON "expenditure_items"."expenditure_id" = "expenditures"."id"
WHERE "expenditures"."category" IN ("work_order", "purchase_order")
The only thing I can't get is eliminating the expenditures that either have no invoices or that aren't over my 10% rule. The first was handled in my old solution with the original join - I can't seem to figure out how to sum on that join data.
step-by-step demo: db<>fiddle
I am sure there is a better solution, but this one should work:
WITH cte AS (
SELECT
e.id,
e.category,
COALESCE(parent_id, e.id) AS parent_id,
ei.amount
FROM
expenditures e
JOIN
expenditures_items ei ON e.id = ei.expenditure_id
),
cte2 AS (
SELECT
id,
SUM(amount) FILTER (WHERE category = 'purchase_order') AS expenditure_total,
SUM(amount) FILTER (WHERE category = 'invoice') AS invoice_total
FROM (
SELECT
parent_id AS id,
category,
SUM(amount) AS amount
FROM cte
GROUP BY parent_id, category
) s
GROUP BY id
)
SELECT
*,
(invoice_total/expenditure_total - 1) * 100 AS percent
FROM
cte2
The first CTE joins the two tables. The COALESCE() function mirrors the id as parent_id if the record has none (i.e. if category = 'purchase_order'). This makes it possible to do one single GROUP BY on this id and the category.
This is done within the second CTE (the innermost subquery). [Btw: I chose the CTE variant because I find it much more readable. You could of course do all the steps as subqueries instead.] This grouping sums up the different categories for each (parent_)id.
The outer subquery does a pivot. It shifts the different records per category into your expected result with the help of a GROUP BY and the FILTER clause (have a look at this step in the fiddle to understand it). Don't worry about the SUM() function here: because of the GROUP BY, one aggregate function is necessary, but it does nothing, since the grouping has already been done.
The last step is calculating the percent value from the pivoted table.
Ruby 2.1.5
Rails 4.2.1
My model is contributions, with the following fields:
event, contributor, date, amount
The table would have something like this:
earth_day, joe, 2014-04-14, 400
earth_day, joe, 2015-05-19, 400
lung_day, joe, 2015-05-20, 800
earth_day, john, 2015-05-19, 600
lung_day, john, 2014-04-18, 900
lung_day, john, 2015-05-21, 900
I have built an index view that shows all these fields and I implemented code to sort (and reverse order) by clicking on the column titles in the Index view.
What I would like to do is have the Index view displayed like this:
Event Contributor Total Rank
Where each event is only listed once per contributor, total is the sum of all contributions for this event by the contributor, and rank is how this contributor ranks relative to everyone else for this particular event.
I am toying with having a separate table where only a running tally is kept for each event/contributor and a piece of code to compute rank and re-insert it in the table, then use that table to drive views.
Can you think of a better approach?
Keeping a running tally is a fine option. Writes will slow down, but reads will be fast.
Another way is to create a database view, if you are using PostgreSQL, something like:
-- Your table structure and data
create table whatever_table (event text, contributor text, amount int);
insert into whatever_table values ('e1', 'joe', 1);
insert into whatever_table values ('e2', 'joe', 1);
insert into whatever_table values ('e1', 'jim', 0);
insert into whatever_table values ('e1', 'joe', 1);
insert into whatever_table values ('e1', 'bob', 1);
-- Your view
create view event_summary as (
select
event,
contributor,
sum(amount) as total,
rank() over (order by sum(amount) desc) as rank
from whatever_table
group by event, contributor
);
-- Using the view
select * from event_summary order by rank;
event | contributor | total | rank
-------+-------------+-------+------
e1 | joe | 2 | 1
e1 | bob | 1 | 2
e2 | joe | 1 | 2
e1 | jim | 0 | 4
(4 rows)
Then you have an ActiveRecord class like:
class EventSummary < ActiveRecord::Base
self.table_name = :event_summary
end
and you can do stuff like EventSummary.order(rank: :desc) and so on. This won't slow down writes, but reads will be a little slower, depending on how much data you are working with.
Postgresql also has support for materialized views, which could give you the best of both worlds, assuming you can have a little bit of lag between when the data is entered and when the summary table is updated.
Currently the sum function in Rails returns 0.0 if the provided column's data is NULL
============================================================================
For example:
Tablename: Price
id | name | Cost
-----------------
1 | A | 1200
2 | A | 2500
3 | A | 3000
4 | B | 5000
5 | B | 7000
6 | C |
Now,
Price.group(:name).sum(:cost)
returns 6700, 12000, 0.0, instead of 6700, 12000, nil.
So here I want nil if the given column's values are NULL or empty.
SUM ignores NULL values, and in SQL a group containing only NULLs sums to NULL; it is Rails that type-casts that NULL result to 0.0.
To overcome this I have used a condition like:
Price.where("cost IS NOT NULL").group(:name).sum(:cost)
This request will sum only the non-NULL cost values. After that, I can fill in NULL for the cost totals of the other records.
This way I can make sure that if cost is actually 0.0 then I get sum(cost) as 0.0 instead of NULL.
Due to the implementation of sum in Rails, it is impossible to get the values before type cast.
However, you can get the expected values by selecting the values with raw SQL:
Price.group(:name).select('name, sum(cost) AS total_cost')
I have a model for stocks and a model for stock_price_history.
I want to mass insert with this
sqlstatement = "INSERT INTO stock_histories SELECT datapoint1 AS id,
datapoint2 AS `date` ...UNION SELECT datapoint9,10,11,12,13,14,15,16,
UNION SELECT datapoint 17... etc"
ActiveRecord::Base.connection.execute sqlstatement
However, I don't actually want to use datapoint1 AS id. If I leave it blank I get an error that my model has 10 fields and I'm inserting only 9 and that it is missing the primary key.
Is there a way to force an auto increment on the id when inserting by SQL?
Edit: Bonus question 'cause I'm a noob. I am developing in SQLite3 and deploying to Postgres (i.e. Heroku). Will I need to modify the above mass insert statement for a Postgres database?
2nd edit: my initial question had Assets and AssetHistory instead of Stocks and Stock_Histories. I changed it to stocks / stock price histories because I thought it was more intuitive to understand, which is why some answers refer to asset histories.
You can change your SQL and be more explicit about which fields you're inserting, and leave id out of the list:
insert into asset_histories (date) select datapoint2 as `date` ...etc
Here's a long real example:
jim=# create table test1 (id serial not null, date date not null, name text not null);
NOTICE: CREATE TABLE will create implicit sequence "test1_id_seq" for serial column "test1.id"
CREATE TABLE
jim=# create table test2 (id serial not null, date date not null, name text not null);
NOTICE: CREATE TABLE will create implicit sequence "test2_id_seq" for serial column "test2.id"
CREATE TABLE
jim=# insert into test1 (date, name) values (now(), 'jim');
INSERT 0 1
jim=# insert into test1 (date, name) values (now(), 'joe');
INSERT 0 1
jim=# insert into test1 (date, name) values (now(), 'bob');
INSERT 0 1
jim=# select * from test1;
id | date | name
----+------------+------
1 | 2013-03-14 | jim
2 | 2013-03-14 | joe
3 | 2013-03-14 | bob
(3 rows)
jim=# insert into test2 (date, name) select date, name from test1 where name <> 'jim';
INSERT 0 2
jim=# select * from test2;
id | date | name
----+------------+------
1 | 2013-03-14 | joe
2 | 2013-03-14 | bob
(2 rows)
As you can see, only the selected rows were inserted, and they were assigned new id values in table test2. You'll have to be explicit about all the fields you want to insert, and ensure that the ordering of the insert and the select match.
Having said all that, you might want to look into the activerecord-import gem, which makes this sort of thing a lot more Railsy. Assuming you have a bunch of new AssetHistory objects (not persisted yet), you could insert them all with:
asset_histories = []
asset_histories << AssetHistory.new date: some_date
asset_histories << AssetHistory.new date: some_other_date
AssetHistory.import asset_histories
That will generate a single efficient insert into the table, and handle the id for you. You'll still need to query some data and construct the objects, which may not be faster than doing it all with raw SQL, but may be a better alternative if you've already got the data in Ruby objects.
What query should I execute in MySQL database to get a result containing partial sums of source table?
For example when I have table:
Id|Val
1 | 1
2 | 2
3 | 3
4 | 4
I'd like to get result like this:
Id|Val
1 | 1
2 | 3 # 1+2
3 | 6 # 1+2+3
4 | 10 # 1+2+3+4
Right now I get this result with a stored procedure containing a cursor and while loops. I'd like to find a better way to do this.
You can do this by joining the table on itself. The SUM will add up all rows up to this row:
select cur.id, sum(prev.val)
from TheTable cur
left join TheTable prev
on cur.id >= prev.id
group by cur.id
MySQL also allows the use of user variables to calculate this, which is more efficient but considered something of a hack; note that @running_total has to be initialized, for example in a cross join:
select
    t.id
  , @running_total := @running_total + t.val as RunningTotal
from TheTable t
cross join (select @running_total := 0) vars
order by t.id
SELECT l.Id, SUM(r.Val) AS Val
FROM your_table AS l
INNER JOIN your_table AS r
    ON l.Id >= r.Id
GROUP BY l.Id
ORDER BY l.Id