MySQL query that computes partial sums - stored-procedures

What query should I execute in a MySQL database to get a result containing partial sums of a source table?
For example, when I have this table:
Id|Val
1 | 1
2 | 2
3 | 3
4 | 4
I'd like to get result like this:
Id|Val
1 | 1
2 | 3 # 1+2
3 | 6 # 1+2+3
4 | 10 # 1+2+3+4
Right now I get this result with a stored procedure containing a cursor and while loops. I'd like to find a better way to do this.

You can do this by joining the table to itself. The SUM will add up all rows up to and including the current row:
select cur.id, sum(prev.val)
from TheTable cur
left join TheTable prev
on cur.id >= prev.id
group by cur.id
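The self-join can be tried out without a MySQL server. The following is a minimal sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module; the use of SQLite and the helper name running_totals are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

def running_totals(rows):
    """Return (id, running_sum) pairs computed with a self-join on id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TheTable (id INTEGER PRIMARY KEY, val INTEGER)")
    conn.executemany("INSERT INTO TheTable (id, val) VALUES (?, ?)", rows)
    query = """
        SELECT cur.id, SUM(prev.val)
        FROM TheTable cur
        LEFT JOIN TheTable prev ON cur.id >= prev.id
        GROUP BY cur.id
        ORDER BY cur.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question.
print(running_totals([(1, 1), (2, 2), (3, 3), (4, 4)]))
# prints [(1, 1), (2, 3), (3, 6), (4, 10)]
```

Each row is paired with every row whose id is less than or equal to its own, so the work grows quadratically with the table size.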
MySQL also allows the use of user variables to calculate this, which is more efficient but considered something of a hack:
set @running_total := 0;
select
id
, @running_total := @running_total + val AS RunningTotal
from TheTable
order by id

SELECT l.Id, SUM(r.Val) AS Val
FROM your_table AS l
INNER JOIN your_table AS r
ON l.Id >= r.Id
GROUP BY l.Id
ORDER BY l.Id
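As a possible alternative to the self-join, MySQL 8.0 and later support window functions, which compute a running total in a single pass. The sketch below is not from the original answers; it swaps in SUM() OVER (ORDER BY ...) and, as an assumption for easy testing, runs it on in-memory SQLite (which needs version 3.25 or later for window functions). The function name windowed_running_totals is hypothetical.

```python
import sqlite3

def windowed_running_totals(rows):
    """Return (Id, running_sum) pairs using a window function ordered by Id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (Id INTEGER PRIMARY KEY, Val INTEGER)")
    conn.executemany("INSERT INTO your_table (Id, Val) VALUES (?, ?)", rows)
    query = """
        SELECT Id, SUM(Val) OVER (ORDER BY Id) AS Val
        FROM your_table
        ORDER BY Id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# The values need not increase with Id; the ordering is by Id only.
print(windowed_running_totals([(1, 4), (2, 1), (3, 3)]))
# prints [(1, 4), (2, 5), (3, 8)]
```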

Related

In Rails- how do you get the distinct rows from X column and from that sum Y column

I have a model, view, and controller in my Rails app, which pull in user-entered data from the view and place it into a table in the database. The table, named "table", looks like this:
__X__|__Y__|__Created_At__
A | 1 | 2021-01-02
B | 5 | 2021-01-02
C | 3 | 2021-01-02
A | 4 | 2021-01-01
What I need is a function that finds the unique values in the X column (a single A, B, C, ...), keeping the most recently created row for each value; in this case, the row I'd want for A is the first one, since it was created most recently. It should then take the Y values of those rows and sum them.
This is what I was trying but it didn't work:
Table.distinct(:x).sum(:y)
The issue is that it seems to just sum all values in Y, instead of disregarding the bottom duplicate A.
If it makes any difference I'm using Rails 6.0.0
I'm not sure I'm understanding this perfectly: you have a Table model with an X:string column. You want to get all instances of Table
Table.all
and return an array of them
Table.all.collect {|x| x }
You then want to sort them by created_at
Table.all.collect {|x| x }.sort_by{|x| x.created_at}
and keep the most recent ones. This should do everything so far:
Table.all.collect {|x| x }.sort_by{|x| x.created_at}.reverse.uniq{|x| x.x}
What do you mean by "take these unique valued rows and pull the Y values and sum them together"? Sum all the Y values? Or sum the Xs and the Ys? Are the X integers to be added?
If you're using PostgreSQL, it supports DISTINCT ON, which makes it relatively easy to do this directly in the database. The ORDER BY must start with the DISTINCT ON column; created_at DESC then picks the newest row for each X. That would look like:
SELECT SUM(distinct_on_x.Y)
FROM (
SELECT DISTINCT ON (X) Y
FROM "table"
ORDER BY X, created_at DESC
) AS distinct_on_x
You can execute this directly with select_values, something like:
select_distinct_count_sql = <<-SQL
SELECT SUM(distinct_on_x.Y)
FROM (
SELECT DISTINCT ON (X) Y
FROM "table"
ORDER BY X, created_at DESC
) AS distinct_on_x
SQL
Table.connection.select_values(select_distinct_count_sql)
If you just want the distinct rows so you can sum them in memory, you can do:
Table.select("DISTINCT ON (X) *").order("X, created_at DESC")
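For databases without DISTINCT ON, the same "newest row per X, then sum Y" result can be sketched with a join against the per-X maximum of created_at. The example below is an illustration only: it assumes an in-memory SQLite table called entries (a stand-in for the question's table), ISO-formatted date strings, and a hypothetical helper named sum_latest_y.

```python
import sqlite3

def sum_latest_y(rows):
    """Sum y over the most recently created row of each distinct x."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (x TEXT, y INTEGER, created_at TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    query = """
        SELECT SUM(e.y)
        FROM entries e
        JOIN (
            SELECT x, MAX(created_at) AS latest
            FROM entries
            GROUP BY x
        ) newest ON newest.x = e.x AND newest.latest = e.created_at
    """
    total = conn.execute(query).fetchone()[0]
    conn.close()
    return total

# Sample data from the question: the older A row (y = 4) is ignored.
sample = [
    ("A", 1, "2021-01-02"),
    ("B", 5, "2021-01-02"),
    ("C", 3, "2021-01-02"),
    ("A", 4, "2021-01-01"),
]
print(sum_latest_y(sample))  # prints 9
```

Note that if two rows share the same x and the same created_at, this variant counts both of them, whereas DISTINCT ON would keep only one.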

Complex Nested SUM / Sub-Select

UPDATED with sample data etc.
I am a bit over my head on this complex query. Some background: this is a Rails app, and I have an expenditures model which has many expenditure_items, each of which has an amount column; these all sum up to a total for the related expenditure.
A given expenditure can be an Order, which can then have multiple (or a single, or no) related Invoice expenditures. I am looking for a single query that gives me all the orders along with their invoice totals and identifies those whose invoices total more than a threshold over the order (in my case 10%).
I get the idea from my searching that I need a sub-select here, but I can't sort it out. I apologize, as raw SQL is not my wheelhouse; normal Rails Active Record calls meet 99% of my needs.
Sample Data:
=> SELECT * FROM expenditures WHERE id = 17;
id | category | parent_id
-----+----------------+----------
17 | purchase_order |
=> SELECT * FROM expenditures_items WHERE expenditure_id = 17;
id | amount
-----+-------------
1 | 1000.00
2 | 2000.00
I need to obtain the SUM of the item amounts in my result - the original order of $3,000.00.
Related Expenditures (invoices)
=> SELECT * FROM expenditures WHERE category = 'invoice' AND parent_id = 17;
id | category | parent_id
-----+----------------+----------
46 | invoice | 17
88 | invoice | 17
=> SELECT * FROM expenditures_items WHERE expenditure_id IN (46, 88) ;
id | amount | expenditure_id
-----+----------+---------------
23 | 500.00 | 46
24 | 1000.00 | 46
78 | 550.00 | 88
79 | 1100.00 | 88
Order 17 has two invoices (46 & 88) totalling $3,150.00 - this is the SUM of all the invoice expenditure_item amounts.
In the end I am looking for the SQL that gets me something like this:
=> SELECT * FROM expenditures WHERE category = 'purchase_order';
id | category | expenditure_total | invoice_total | percent
-----+----------------+-------------------+---------------+---------
17 | purchase_order | 3000.00 | 3150.00 | 5
45 | purchase_order | 4000.00 | 3000.00 | -25
75 | purchase_order | 7000.00 | 7000.00 | 0
99 | purchase_order | 10000.00 | 11100.00 | 11
percent is (invoice_total / expenditure_total - 1) * 100.
I also need to keep only the results whose percent is greater than a threshold (say 10), perhaps with a HAVING clause.
From all my searching this seems to need a subquery along with some joins, but I am lost at this point.
UPDATED Further
I had another look - this is close:
SELECT DISTINCT expenditures.*, SUM( invoice_items.amount ) as invoiced_total
FROM "expenditures"
JOIN expenditures AS invoices ON invoices.category = 'invoice' AND expenditures.id = CAST( invoices.ancestry AS INT)
JOIN expenditure_items ON expenditure_items.expenditure_id = expenditures.id
JOIN expenditure_items AS invoice_items ON invoice_items.expenditure_id = invoices.id
WHERE "expenditures"."category" IN ($1, $2)
GROUP BY expenditures.id
HAVING (( SUM( invoice_items.amount ) / SUM( expenditure_items.amount ) ) > 1.1 )
[["category", "work_order"], ["category", "purchase_order"]]
Here is the odd thing - the invoiced_total in the select works. I get the proper amounts as per my example. The issue seems to be in my HAVING where it only pulls the SUM on the first invoice.
UPDATE 3
Soooooo close:
SELECT DISTINCT
expenditures.*,
( SELECT
SUM(expenditure_items.amount)
FROM expenditure_items
WHERE expenditure_items.expenditure_id = expenditures.id ) AS order_total,
( SELECT
SUM(expenditure_items.amount)
FROM expenditure_items
JOIN expenditures invoices ON expenditure_items.expenditure_id = invoices.id
AND CAST (invoices.ancestry AS INT) = expenditures.id ) AS invoice_total
FROM "expenditures"
INNER JOIN "expenditure_items" ON "expenditure_items"."expenditure_id" = "expenditures"."id"
WHERE "expenditures"."category" IN ('work_order', 'purchase_order')
The only thing I can't get is eliminate the expenditures that either have no invoices or that are over my 10% rule. The first was in my old solution with the original join - I can't seem to figure out how to sum on that join data.
step-by-step demo:db<>fiddle
I am sure there is a better solution, but this one should work:
WITH cte AS (
SELECT
e.id,
e.category,
COALESCE(parent_id, e.id) AS parent_id,
ei.amount
FROM
expenditures e
JOIN
expenditures_items ei ON e.id = ei.expenditure_id
),
cte2 AS (
SELECT
id,
SUM(amount) FILTER (WHERE category = 'purchase_order') AS expenditure_total,
SUM(amount) FILTER (WHERE category = 'invoice') AS invoice_total
FROM (
SELECT
parent_id AS id,
category,
SUM(amount) AS amount
FROM cte
GROUP BY parent_id, category
) s
GROUP BY id
)
SELECT
*,
(invoice_total/expenditure_total - 1) * 100 AS percent
FROM
cte2
The first CTE joins both tables. The COALESCE() function copies the id into parent_id if the record has none (that is, if category = 'purchase_order'). This makes it possible to do one single GROUP BY on this id and the category.
This is done within the second CTE (in the innermost subquery). [Btw: I chose the CTE variant because I find it much more readable. In this case you could of course do all steps as subqueries.] This grouping sums up the amounts per category for each (parent_)id.
The outer subquery does a pivot. It shifts the per-category records into columns of your expected result with the help of a GROUP BY and the FILTER clause (have a look at this step in the fiddle to understand it). Don't worry about the SUM() function here. Because of the GROUP BY, an aggregate function is necessary, but it changes nothing, because the summing has already been done.
The last step calculates the percent value from the pivoted table.
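The steps above, plus the threshold filter the question asks for, can be sketched end to end on the sample data for order 17. This is an illustration under stated assumptions rather than the answer's exact query: it runs on in-memory SQLite, uses SUM(CASE ...) in place of FILTER so that it also works on older SQLite versions, collapses the two grouping steps into one, and introduces a hypothetical helper over_threshold with a threshold parameter.

```python
import sqlite3

def over_threshold(threshold_percent):
    """Return (id, order_total, invoice_total, percent) for orders whose
    invoices exceed the order total by more than threshold_percent."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE expenditures (
            id INTEGER PRIMARY KEY, category TEXT, parent_id INTEGER);
        CREATE TABLE expenditures_items (
            id INTEGER PRIMARY KEY, amount REAL, expenditure_id INTEGER);
        -- Sample rows from the question (order 17 and its two invoices).
        INSERT INTO expenditures VALUES
            (17, 'purchase_order', NULL), (46, 'invoice', 17), (88, 'invoice', 17);
        INSERT INTO expenditures_items VALUES
            (1, 1000.0, 17), (2, 2000.0, 17),
            (23, 500.0, 46), (24, 1000.0, 46),
            (78, 550.0, 88), (79, 1100.0, 88);
    """)
    query = """
        SELECT id, expenditure_total, invoice_total,
               ROUND((invoice_total / expenditure_total - 1) * 100, 2) AS percent
        FROM (
            -- COALESCE folds each invoice onto its parent order's id.
            SELECT COALESCE(e.parent_id, e.id) AS id,
                   SUM(CASE WHEN e.category = 'purchase_order'
                            THEN ei.amount END) AS expenditure_total,
                   SUM(CASE WHEN e.category = 'invoice'
                            THEN ei.amount END) AS invoice_total
            FROM expenditures e
            JOIN expenditures_items ei ON ei.expenditure_id = e.id
            GROUP BY COALESCE(e.parent_id, e.id)
        )
        WHERE (invoice_total / expenditure_total - 1) * 100 > ?
        ORDER BY id
    """
    result = conn.execute(query, (threshold_percent,)).fetchall()
    conn.close()
    return result

print(over_threshold(0))   # prints [(17, 3000.0, 3150.0, 5.0)]
print(over_threshold(10))  # prints []
```

Orders without any invoices have a NULL invoice_total, so the comparison in the WHERE clause is NULL and those rows drop out as well.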

Query only records with max value within a group

Say you have the following users table on PostgreSQL:
id | group_id | name | age
---|----------|---------|----
1 | 1 | adam | 10
2 | 1 | ben | 11
3 | 1 | charlie | 12 <-
3 | 2 | donnie | 20
4 | 2 | ewan | 21 <-
5 | 3 | fred | 30 <-
How can I query all columns only from the oldest user per group_id (those marked with an arrow)?
I've tried with group by, but keep hitting "users.id" must appear in the GROUP BY clause.
(Note: I have to work the query into a Rails AR model scope.)
After some digging, I found that you can use PostgreSQL's DISTINCT ON (col):
select distinct on (users.group_id) users.*
from users
order by users.group_id, users.age desc;
-- you might want to add extra column in ordering in case 2 users have the same age for same group_id
Translated in Rails, it would be:
User
.select('DISTINCT ON (users.group_id) users.*')
.order('users.group_id, users.age DESC')
Some doc about DISTINCT ON: https://www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT
Working example: https://www.db-fiddle.com/f/t4jeW4Sy91oxEfjMKYJpB1/0
You could also use the ROW_NUMBER window function (or RANK, if ties are possible and you want every tied row):
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY age DESC) AS rn
FROM users) s
WHERE s.rn = 1;
You can also join against a subquery that holds the aggregated result:
select m.*
from users m
inner join (
select group_id, max(age) max_age
from users
group by group_id
) AS t on (t.group_id = m.group_id and t.max_age = m.age)
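The difference between ROW_NUMBER and RANK only shows up when two users in a group share the maximum age. The sketch below illustrates this on in-memory SQLite (an assumption; window functions need SQLite 3.25 or later). The ids are renumbered 1 to 6 so they are unique, and the tied user "greg" and the helper oldest_per_group are hypothetical additions for the demonstration.

```python
import sqlite3

USERS = [
    (1, 1, "adam", 10), (2, 1, "ben", 11), (3, 1, "charlie", 12),
    (4, 2, "donnie", 20), (5, 2, "ewan", 21), (6, 3, "fred", 30),
]

def oldest_per_group(users, ranking="ROW_NUMBER"):
    """Return the names ranked first by age within each group_id."""
    if ranking not in ("ROW_NUMBER", "RANK"):
        raise ValueError("ranking must be ROW_NUMBER or RANK")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER, group_id INTEGER, name TEXT, age INTEGER)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", users)
    # The function name is validated above, so interpolating it is safe.
    query = f"""
        SELECT name
        FROM (
            SELECT name, group_id,
                   {ranking}() OVER (PARTITION BY group_id ORDER BY age DESC) AS rn
            FROM users
        ) s
        WHERE rn = 1
        ORDER BY group_id, name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names

print(oldest_per_group(USERS))  # prints ['charlie', 'ewan', 'fred']

# Hypothetical tie: a second 30-year-old in group 3.
tied = USERS + [(7, 3, "greg", 30)]
print(oldest_per_group(tied, "RANK"))             # both fred and greg are kept
print(len(oldest_per_group(tied, "ROW_NUMBER")))  # exactly one row per group
```

With ROW_NUMBER, which of the tied rows wins is not defined unless the ORDER BY includes a tie-breaker such as id.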

SQL lookup key defined by LAG function

I want to join two tables on a key based on LAG function. My query doesn't work though. I get an error:
Msg 4108, Level 15, State 1, Line 13 Windowed functions can only appear in the SELECT or ORDER BY clauses.
I would appreciate any suggestions on how to tackle it.
**Table A**
Key
1
2
3
and so on...
**Table B**
MaxKey | Something
3 | A
5 | B
8 | C
**Expected Results**
Key | Something
1 | A
2 | A
3 | A
4 | B
5 | B
6 | C
SELECT
tabA.Key
,tabB.[Something]
,LAG (tabB.MaxKey,1,1) OVER (ORDER BY tabB.MaxKey) AS MinKey
,tabB.[MaxKey]
FROM TableA as tabA
LEFT JOIN TableB as tabB
ON tabA.Key > tabB.MinKey AND tabA.Key <= tabB.MaxKey
I think you can solve this using an outer apply like this:
select * from TableA a
outer apply (
select top 1 something
from TableB b
where b.maxkey >= a.[key]
order by b.maxkey
) oa
Sample SQL Fiddle
Another option is to modify your query to do the lag in a derived table, I believe this might work too:
SELECT
tabA.[Key]
,tabB.[Something]
,MinKey
,tabB.[MaxKey]
FROM TableA as tabA
LEFT JOIN (
SELECT
[Something]
,LAG (MaxKey,1,0) OVER (ORDER BY MaxKey) AS MinKey
,[MaxKey]
FROM TableB) tabB
ON tabA.[key] > tabB.MinKey AND tabA.[key] <= tabB.MaxKey
ORDER BY tabA.[key]
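The derived-table idea can be checked against the expected results from the question. The sketch below is an illustration that assumes in-memory SQLite instead of SQL Server (so double-quoted identifiers replace the square brackets, and SQLite 3.25 or later is needed for LAG). It treats each range as exclusive of the previous MaxKey, with a default lower bound of 0, so that every key matches exactly one range. The helper name lookup_by_range is hypothetical.

```python
import sqlite3

def lookup_by_range(keys, ranges):
    """Map each key to the Something whose (MinKey, MaxKey] range contains it."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE TableA ("Key" INTEGER)')
    conn.execute("CREATE TABLE TableB (MaxKey INTEGER, Something TEXT)")
    conn.executemany("INSERT INTO TableA VALUES (?)", [(k,) for k in keys])
    conn.executemany("INSERT INTO TableB VALUES (?, ?)", ranges)
    query = """
        SELECT a."Key", b.Something
        FROM TableA a
        LEFT JOIN (
            -- The window function lives in the derived table, so the
            -- join condition only sees an ordinary column.
            SELECT Something, MaxKey,
                   LAG(MaxKey, 1, 0) OVER (ORDER BY MaxKey) AS MinKey
            FROM TableB
        ) b ON a."Key" > b.MinKey AND a."Key" <= b.MaxKey
        ORDER BY a."Key"
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(lookup_by_range(range(1, 7), [(3, "A"), (5, "B"), (8, "C")]))
# prints [(1, 'A'), (2, 'A'), (3, 'A'), (4, 'B'), (5, 'B'), (6, 'C')]
```

Keys above the largest MaxKey find no range and come back with a NULL Something because of the LEFT JOIN.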

Left Joins that link to multiple rows only returning one

I'm trying to join two tables (call them table1 and table2) but return only one entry for each match. In table2, there is a column called 'current' that is either 'y', 'n', or 'null'. I have left joined the two tables and added a where clause to get the 'y' and 'null' instances; those are easy. I need help with the table1 rows whose table2 rows are all 'n': each should return a single row with 'none' or 'null'. Here is an example:
table1
ID
1
2
3
table2
ID | table1ID | current
1 | 1 | y
2 | 2 | null
3 | 3 | n
4 | 3 | n
5 | 3 | n
My current query joins on table1.ID=table2.table1ID and then has a where clause (where table2.current = 'y' or table2.current = 'null') but that doesn't work when there is no 'y' and the value isn't 'null'.
Can someone come up with a query that would join the table like I have but get me all 3 records from table1 like this?
Query Return
ID | table2ID | current
1 | 1 | y
2 | null | null
3 | 3 | null or none
First off, I'm assuming the "null" values are actually strings and not the DB value NULL.
If so, the query below should work (notice the inclusion of the where criteria INSIDE the ON sub-clause):
select
table1.ID as ID
,table2.ID as table2ID
,table2.current
from table1 left outer join table2
on (table2.table1ID = table1.ID and
(table2.current in ('y','null')))
If this does work, I would STRONGLY recommend changing the "null" string value to something else as it is entirely misleading... you or some other developer will lose time debugging this in the future.
If "null" actually refers to the null value, then change the above query to:
select
table1.ID as ID
,table2.ID as table2ID
,table2.current
from table1 left outer join table2
on (table2.table1ID = table1.ID and
(table2.current = 'y' or table2.current is null))
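Why the filter belongs in the ON clause rather than in WHERE can be seen by running both variants side by side. The sketch below is an illustration on in-memory SQLite (an assumption), using the question's sample rows with real NULL values; the column name is quoted because CURRENT is a keyword in SQLite, and the helper join_current is hypothetical.

```python
import sqlite3

def join_current(filter_in_on=True):
    """Left join table1 to table2, filtering either in ON or in WHERE."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (ID INTEGER PRIMARY KEY);
        CREATE TABLE table2 (
            ID INTEGER PRIMARY KEY, table1ID INTEGER, "current" TEXT);
        INSERT INTO table1 VALUES (1), (2), (3);
        INSERT INTO table2 VALUES
            (1, 1, 'y'), (2, 2, NULL), (3, 3, 'n'), (4, 3, 'n'), (5, 3, 'n');
    """)
    condition = """(t2."current" = 'y' OR t2."current" IS NULL)"""
    if filter_in_on:
        # Non-matching table2 rows are discarded before the outer join
        # pads unmatched table1 rows with NULLs.
        join = f"ON t2.table1ID = t1.ID AND {condition}"
        where = ""
    else:
        # The filter runs after the join, so table1 rows that matched
        # only 'n' rows disappear entirely.
        join = "ON t2.table1ID = t1.ID"
        where = f"WHERE {condition}"
    query = f"""
        SELECT t1.ID, t2.ID, t2."current"
        FROM table1 t1
        LEFT JOIN table2 t2 {join}
        {where}
        ORDER BY t1.ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(join_current(filter_in_on=True))
# prints [(1, 1, 'y'), (2, 2, None), (3, None, None)]
print(join_current(filter_in_on=False))
# prints [(1, 1, 'y'), (2, 2, None)]
```

With the filter in ON, table1 row 3 survives with NULLs for the table2 columns; with the filter in WHERE, it is lost.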
you need to decide which of the three rows from table2 with table1id = 3 you want:
3 | 3 | n
4 | 3 | n
5 | 3 | n
what's the criterion?
select t1.id
, t2.id
, case when t2.count_current > 0 then
t2.count_current
else
null
end as current
from table1 t1
left outer join
(
select min(id) as id
, table1id
, sum(case when current = 'y' then 1 else 0 end) as count_current
from table2
group by table1id
) t2
on t1.id = t2.table1id
although, as justsomebody has pointed out, this may not work as you expect once you have multiple rows with 'y' in your table 2.

Resources