Multiple column SUMS with single formula - google-sheets

I need to SUM multiple columns (sum for each column, not a total range sum) with a single formula. So the output would look something like this:
+-------+-------+------------+-----------+------------+
| 2019 | 2018 | 2017 | 2016 | 2015 |
+-------+-------+------------+-----------+------------+
| $0.00 | $0.00 | $4,341.00 | $0.00 | $5,281.00 |
| $0.00 | 0 | 0 | 0 | 0 |
| $0.00 | 0 | $10,805.00 | $2,865.00 | $8,295.00 |
| $0.00 | 0 | 0 | 0 | $233.00 |
+-------+-------+------------+-----------+------------+
| $0.00 | $0.00 | $15,146.00 | $2,865.00 | $13,809.00 |
+-------+-------+------------+-----------+------------+
I've tried several approaches (SUM, SUMIF, SUMIFS, MMULT), but can't seem to get it right. The closest I've come is with this formula from a website I found:
=ArrayFormula(MMULT(B2:F5,(transpose(COLUMN(B1:F1)^0))))
I would also prefer to avoid needing a 0 value in empty cells, as the MMULT attempt requires. A blank value would be preferred, but if zeros are what it takes to make it work, so be it. Am I attempting the impossible or just looking in the exact wrong direction?
My sheet

=ARRAYFORMULA(TRANSPOSE(MMULT(TRANSPOSE(IF(B2:5<>"", B2:5, 0)), ROW(B2:5)^0)))
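The formula above works because multiplying the transposed range by a column of ones adds up each original column. A minimal Python sketch of the same matrix trick, using the values from the question's table with blanks treated as 0 (the helper names `transpose` and `mmult` are illustrative, not Sheets functions):

```python
# Values from the question's table (columns 2019..2015); blanks treated as 0,
# mirroring IF(B2:5<>"", B2:5, 0) in the formula.
rows = [
    [0.0, 0.0, 4341.0, 0.0, 5281.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 10805.0, 2865.0, 8295.0],
    [0.0, 0.0, 0.0, 0.0, 233.0],
]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def mmult(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# One "1" per data row, the counterpart of ROW(B2:5)^0.
ones = [[1] for _ in rows]

# (columns x rows) times (rows x 1) gives one sum per original column;
# the final transpose lays the sums out as a single row again.
column_sums = transpose(mmult(transpose(rows), ones))[0]
print(column_sums)
```

The result lines up with the totals row shown in the question.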

Related

Looking to count zero values from right to left until non-zero values appears

I have a large table of monthly values.
I am looking to count the zero values from right to left, stopping once a non-zero value occurs.
I want the last column to display these values.
| JAN | FEB | MAR | APR | MAY | JUN | Value I need |
Ben | 10 | 10 | 10 | 0 | 0 | 0 | =3 |
Tim | 0 | 0 | 10 | 10 | 10 | 0 | =1 |
Susan | 0 | 0 | 5 | 10 | 0 | 10 | =0 |
Frank | 10 | 0 | 0 | 10 | 10 | 10 | =0 |
Many thanks for any help!
I don't think you need anything very sophisticated. Just find the last column that is non-zero:
=ArrayFormula(columns(B:G)-max(if(B2:G2>0,column(B:G)-column(A:A),0)))
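The same logic can be sketched in Python to check it against the sample rows: take the 1-based position of the last value greater than zero and subtract it from the number of columns. The function name `trailing_zeros` is illustrative, and, like the formula, the sketch tests for values greater than zero rather than strictly non-zero.

```python
def trailing_zeros(values):
    """Count zeros at the right-hand end of a row.

    Mirrors columns(B:G) - max(if(B2:G2>0, position, 0)): if no value is
    positive, the maximum is 0 and every column counts as a trailing zero.
    """
    last_positive = max((i + 1 for i, v in enumerate(values) if v > 0), default=0)
    return len(values) - last_positive


# Rows from the question's table.
table = {
    "Ben": [10, 10, 10, 0, 0, 0],
    "Tim": [0, 0, 10, 10, 10, 0],
    "Susan": [0, 0, 5, 10, 0, 10],
    "Frank": [10, 0, 0, 10, 10, 10],
}

for name, values in table.items():
    print(name, trailing_zeros(values))
```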
try:
=ARRAYFORMULA(IF(A2:A="",,LEN(REGEXREPLACE(INDEX(SPLIT(TRANSPOSE(QUERY(TRANSPOSE(
IF(VLOOKUP(A2:A, A2:G, TRANSPOSE(SORT(TRANSPOSE(COLUMN(B:G)), 1, 0)), 0)=0,
"♦", "♥")),,9^9)), "♥", , 0),,1), "^ .+| |#.+", ))))

Data Warehouse dimension for schedules (Dimensional Modeling)

I have not found an example or a way of building a dimension that contains schedule attributes. For example, in my scenario I'm building a data warehouse that will help to gather analytics on podcast/radio show episodes.
We have the following:
dim_episode
dim_podcast_show
dim_date
fact_user_daily_activity
I'm trying to add another dimension that contains schedule attributes for the podcast_show. For example, some shows air their episodes every day, others on Tuesdays and Thursdays, and others only on Saturdays.
dim_show_schedule (Option 1)
| schedule_key | show_key | time | sunday_flag | monday_flag | tuesday_flag | wednesday_flag | thursday_flag | friday_flag | saturday_flag |
|--------------|----------|-------|-------------|-------------|--------------|----------------|---------------|-------------|---------------|
| 1 | 0 | 00:30 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2 | 1 | 12:30 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| 3 | 2 | 21:00 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
However, would it be better to have a bridge table with something like:
bridge_show_schedule (Option 2)
| show_key | day_key |
|----------|---------|
| 0 | 2 |
| 0 | 4 |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
dim_show_schedule (Option 3) (suggested by @nsousa)
| schedule_key | show_key | time | day |
|--------------|----------|-------|-------------|
| 1 | 0 | 00:30 | tuesday |
| 1 | 0 | 00:30 | thursday |
| 2 | 1 | 12:30 | monday |
| 2 | 1 | 12:30 | tuesday |
| 2 | 1 | 12:30 | wednesday |
| 2 | 1 | 12:30 | thursday |
| 2 | 1 | 12:30 | friday |
| 3 | 2 | 21:00 | saturday |
I've searched Kimball's Data Warehouse Lifecycle Toolkit and could not find an example of this use case.
Any thoughts?
If you keep a dimension with a string attribute saying which days a show is on, e.g., “M,W,F”, it can have at most 2^7 = 128 entries. A bridge table is an unnecessary complication.
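To see where the 128 comes from, each of the seven weekdays is either on or off, so the possible labels can simply be enumerated. This is only a sketch: the day codes other than the "M,W,F" style used above are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical day codes, Sunday first; only "M", "W" and "F" appear in the text above.
DAY_CODES = ["Su", "M", "Tu", "W", "Th", "F", "Sa"]


def schedule_labels():
    """Return one comma-separated label per on/off combination of the 7 days."""
    labels = []
    for flags in product([0, 1], repeat=len(DAY_CODES)):
        labels.append(",".join(code for code, on in zip(DAY_CODES, flags) if on))
    return labels


labels = schedule_labels()
print(len(labels))          # number of rows the dimension could ever need
print("M,W,F" in labels)    # the example schedule is one of them
```

The empty label (no airing days) is included in the count; whether a real dimension keeps that row is a modeling choice.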
Option 1
You can create a schedule dimension that has a unique record for every possible weekly schedule (128 day combinations) combined with every reasonable start time. Even at 5-minute intervals (288 start times per day), that is fewer than 37k rows, which is trivial for a dimension.
Option 2
If you want to leverage a date dimension instead, create a "Scheduled" fact that relates the show dimension to the date dimension for each future air date. Your ETL process would handle mapping this relationship. Your date dimension should already include the week and day-of-week logic. You could also leverage your show duration attribute to create a semi-additive calculated measure that makes it easy to get the total programming for a period.
I would opt for Option 2 as it provides many more possibilities for analytics.
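Both options in this answer can be sketched in a few lines of Python. The first part checks the row count for the schedule dimension; the second shows the kind of expansion an ETL step might perform to populate a "Scheduled" fact. The function name `scheduled_fact_rows`, the `(show_key, date_key)` row shape, and the YYYYMMDD integer date key are assumptions made for illustration, not part of the answer.

```python
from datetime import date, timedelta

# Option 1: 2^7 day combinations times the number of 5-minute start times in a day.
SCHEDULE_COMBINATIONS = 2 ** 7
START_TIMES_PER_DAY = 24 * 60 // 5
dimension_rows = SCHEDULE_COMBINATIONS * START_TIMES_PER_DAY
print(dimension_rows)


# Option 2: expand a weekly pattern into one fact row per scheduled date.
def scheduled_fact_rows(show_key, weekdays, start, end):
    """Return (show_key, date_key) pairs for every date in [start, end]
    whose weekday is in `weekdays` (date.weekday(): Monday=0 ... Sunday=6).
    """
    rows = []
    day = start
    while day <= end:
        if day.weekday() in weekdays:
            # Assumed date_key format: YYYYMMDD as an integer.
            rows.append((show_key, int(day.strftime("%Y%m%d"))))
        day += timedelta(days=1)
    return rows


# Show 0 airs on Tuesdays (1) and Thursdays (3), as in the question's example.
fact_rows = scheduled_fact_rows(0, {1, 3}, date(2019, 6, 1), date(2019, 6, 14))
print(fact_rows)
```

With the fact keyed to the date dimension, questions such as "how many episodes were scheduled in a given week" become ordinary joins and counts.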

Time span accumulating fact tables design

I need to design a star schema for order processing. The progress of an order looks like this:
Customer C places an order for item I with quantity 100
Factory F1 takes part of the order with quantity 30
Factory F2 takes part of the order with quantity 20
Buy 50 items from the market
F1 delivers 20 items
F1 delivers 7 items
F1 cancels the contract (we need to buy 3 more items from the market)
F2 delivers 20 items
Buy 3 items from the market
Complete the order
How can I design a fact table in this case, given that the number of steps is not fixed and the events do not all have the same data type?
I'm sorry for my bad English.
The definition of an Accumulating Snapshot Fact table according to Kimball is:
summarizes the measurement events occurring at predictable steps between the beginning and the end of a process.
For this particular use case I would go with a Transaction Fact Table, because the events (steps) are unpredictable. It is more like an event fact table, similar to logs or audits.
| order_key | date_key | full_datetime | entity_key (customer, factory, etc. varchar) | entity_type | state | quantity |
|-----------|----------|---------------------|----------------------------------------------|-------------|----------|----------|
| 1 | 20190602 | 2019-06-02 04:30:00 | C1 | customer | request | 100 |
| 1 | 20190602 | 2019-06-02 05:30:00 | F1 | factory | receive | 30 |
| 1 | 20190602 | 2019-06-02 05:30:00 | F2 | factory | receive | 20 |
| 1 | 20190602 | 2019-06-02 05:40:00 | Company? | company | buy | 50 |
| 1 | 20190603 | 2019-06-03 06:40:00 | F1 | factory | deliver | 20 |
| 1 | 20190603 | 2019-06-03 02:40:00 | F1 | factory | deliver | 7 |
| 1 | 20190603 | 2019-06-03 04:40:00 | F1 | factory | deliver | 3 |
| 1 | 20190603 | 2019-06-03 06:40:00 | F1 | factory | cancel | |
| 1 | 20190604 | 2019-06-04 07:40:00 | F2 | factory | deliver | 20 |
| 1 | 20190604 | 2019-06-04 07:40:00 | Company? | company | buy | 3 |
| 1 | 20190604 | 2019-06-04 09:40:00 | Company? | company | complete | 100 |
I'm not sure about your reporting needs, as they were not specified, but assuming you need to measure lags/durations between unpredictable steps, you could PIVOT and use dynamic SQL to create the required view (see "SQL Server dynamic PIVOT query?").
Let me know if you come up with something different, as I'm interested in this particular use case. Good luck!
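As a rough illustration of the pivot idea, outside SQL, the sketch below turns the event rows for one order into one entry per state holding the earliest timestamp, then measures the lag from request to completion. The timestamps and states are copied from the table above; keeping only the first timestamp per state is an assumption about the reporting need, which the question does not specify.

```python
from datetime import datetime, timedelta

# (state, full_datetime) pairs for order_key 1, copied from the fact table above.
events = [
    ("request", "2019-06-02 04:30:00"),
    ("receive", "2019-06-02 05:30:00"),
    ("receive", "2019-06-02 05:30:00"),
    ("buy", "2019-06-02 05:40:00"),
    ("deliver", "2019-06-03 06:40:00"),
    ("deliver", "2019-06-03 02:40:00"),
    ("deliver", "2019-06-03 04:40:00"),
    ("cancel", "2019-06-03 06:40:00"),
    ("deliver", "2019-06-04 07:40:00"),
    ("buy", "2019-06-04 07:40:00"),
    ("complete", "2019-06-04 09:40:00"),
]


def first_timestamp_per_state(rows):
    """Pivot event rows into {state: earliest timestamp}."""
    pivot = {}
    for state, stamp in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if state not in pivot or when < pivot[state]:
            pivot[state] = when
    return pivot


pivot = first_timestamp_per_state(events)
lag = pivot["complete"] - pivot["request"]
print(sorted(pivot))   # the set of states varies per order, hence dynamic SQL
print(lag)
```

Because the set of states differs from order to order, the pivoted columns are not known in advance, which is why the SQL version needs to be generated dynamically.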

Automated way to create a confusion matrix in Google Sheets?

I have a table of this form in Google Sheets:
+---------+------------+--------+
| item_id | prediction | actual |
+---------+------------+--------+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 0 |
| 4 | 0 | 1 |
| 5 | 0 | 0 |
| 6 | 1 | 1 |
+---------+------------+--------+
And I'd like to know if there's an automated way to get this kind of summary, with the counts of items that fit the criteria specified in that row/column combination:
+----------+--------------+--------------+-------+
| | prediction=0 | prediction=1 | total |
+----------+--------------+--------------+-------+
| actual=0 | 1 | 1 | 2 |
| actual=1 | 1 | 3 | 4 |
+----------+--------------+--------------+-------+
| total | 2 | 4 | |
+----------+--------------+--------------+-------+
I've been doing this somewhat manually in Google Sheets by using COUNTIFS, but I'm wondering if there's a built-in way? I tried using pivot tables, but couldn't figure out how to get the calculated fields to show the data I want.
A coworker figured it out - you can get this by creating a pivot table with the correct columns and rows, and setting the value to item_id summarized by COUNTUNIQUE.
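For checking the pivot table's output outside Sheets, the same counts can be reproduced with a short Python sketch over the sample rows. The variable names are illustrative.

```python
from collections import Counter

# (item_id, prediction, actual) rows from the question's table.
rows = [
    (1, 1, 1),
    (2, 1, 1),
    (3, 1, 0),
    (4, 0, 1),
    (5, 0, 0),
    (6, 1, 1),
]

# Count items per (actual, prediction) pair, the equivalent of a pivot table
# with actual as rows, prediction as columns, and a count of item_id as values.
counts = Counter((actual, prediction) for _, prediction, actual in rows)

labels = [0, 1]
matrix = [[counts[(a, p)] for p in labels] for a in labels]
row_totals = [sum(r) for r in matrix]
col_totals = [sum(c) for c in zip(*matrix)]

for a, r, total in zip(labels, matrix, row_totals):
    print(f"actual={a}", r, total)
print("total", col_totals)
```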

Sum values in a range based on the date from another column above the cells

Here is my data:
+-------+------------+------------+------------+------------+------------+------------+
| Date | 01/01/2017 | 02/01/2017 | 03/01/2017 | 01/02/2017 | 02/02/2017 | 03/02/2017 |
+-------+------------+------------+------------+------------+------------+------------+
| Value | 1 | 0,5 | 0 | 2 | 0,5 | 1 |
+-------+------------+------------+------------+------------+------------+------------+
I'm trying to write a formula that sums the values for each month. With the example above, I would get 1,5 for January and 3,5 for February.
I tried combining =SUMIF(), =OFFSET() and =MONTH() so that it would only sum the values that share the same month, based on the date above them, but whatever I try, I always get a syntax error.
Does anybody have an idea? Is it even possible without scripts?
Thank you very much and have a good day.
OK, so I found a way with =FILTER():
=SUM(FILTER(2:2;MONTH(1:1)=MONTH(XXX)))
Here XXX is the cell holding the month I want to sum. In my case the formula lives in another sheet:
+---+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+
| | A | B | C |
+---+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+
| 1 | Jan. 2017 | Feb. 2017 | Mar. 2017 |
| 2 | =SUM(FILTER('OtherSheet'!2:2;MONTH('OtherSheet'!1:1)=MONTH(A1))) | =SUM(FILTER('OtherSheet'!2:2;MONTH('OtherSheet'!1:1)=MONTH(B1))) | =SUM(FILTER('OtherSheet'!2:2;MONTH('OtherSheet'!1:1)=MONTH(C1))) |
+---+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+
Which gives me:
+-----------+-----------+-----------+
| Jan. 2017 | Feb. 2017 | Mar. 2017 |
+-----------+-----------+-----------+
| 1,5 | 3,5 | 2 |
+-----------+-----------+-----------+
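The filter-then-sum idea can be sketched in Python with the six sample columns from the question. The dates are read as dd/mm/yyyy, which is an assumption inferred from the January and February totals the question expects. Unlike MONTH() alone, this sketch groups by year and month, so data spanning several years would not be merged into one bucket.

```python
from collections import defaultdict
from datetime import datetime

# Row 1 (dates) and row 2 (values) from the question; decimal commas written as points.
dates = ["01/01/2017", "02/01/2017", "03/01/2017", "01/02/2017", "02/02/2017", "03/02/2017"]
values = [1, 0.5, 0, 2, 0.5, 1]

# Sum each value into the bucket for its (year, month).
totals = defaultdict(float)
for raw_date, value in zip(dates, values):
    parsed = datetime.strptime(raw_date, "%d/%m/%Y")  # assumed dd/mm/yyyy
    totals[(parsed.year, parsed.month)] += value

for (year, month), total in sorted(totals.items()):
    print(f"{year}-{month:02d}", total)
```

Only January and February appear here because the question's sample contains no March dates.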
