How can I make sure every change to a dimension is captured in SCD2? - data-warehouse

I work for a finance company. We need to track the exact dimension values at the time of each transaction. We load data incrementally into the warehouse about every 15 minutes, and within that window a dimension record with the same business key can change multiple times (multiple records are collected). Usually, we write scripts that pick only the latest of the changes in the 15-minute window. In our case, though, I want all of those changes to be loaded into the dimension table. How can this be implemented?
EDIT:
Examples in same Batch:
Business Key, Name, email (scd 2), Created_at
1, xyz, xyz#gmail.com, 1/1/21 10:00 AM
1, xyz, abc#gmail.com, 1/1/21 10:05 AM
Expected changes in dimension
SK, BK, Name, Email, Effective_date, Expiration_date, Current
1, 1, efg#gmail.com, 01/01/1900 0:00 AM, 1/1/21 9:59 AM, N
--- New changes from batch ------
2, 1, xyz#gmail.com, 01/01/2021 10:00 AM, 01/01/2021 10:05 AM, N
3, 1, abc#gmail.com, 01/01/2021 10:05 AM, 12/31/9999 00:00 AM, Y
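One possible approach, sketched below in Python rather than in any particular ETL tool, is to process the batch oldest-first instead of collapsing it to the latest record, so that each change closes the row before it. The function name `apply_batch`, the dictionary keys, and the `HIGH_DATE` marker are illustrative assumptions, not part of the question. The sketch also assumes that a row expires exactly when the next one becomes effective; subtract a minute at that step if you prefer the 9:59 AM style shown for the first row.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # assumed "open-ended" expiration marker


def apply_batch(dimension, batch):
    """Append one SCD2 row per change in the batch, in created_at order.

    dimension: list of dicts with keys sk, bk, name, email,
               effective_date, expiration_date, current.
    batch:     list of dicts with keys bk, name, email, created_at.
    """
    next_sk = max((row["sk"] for row in dimension), default=0) + 1
    # Process changes oldest-first so each one closes the row before it.
    for change in sorted(batch, key=lambda c: (c["bk"], c["created_at"])):
        current = next(
            (r for r in dimension if r["bk"] == change["bk"] and r["current"]),
            None,
        )
        if current is not None:
            if current["email"] == change["email"]:
                continue  # tracked attribute unchanged, nothing to version
            # Close the previous version at the moment the new one starts.
            current["expiration_date"] = change["created_at"]
            current["current"] = False
        dimension.append({
            "sk": next_sk,
            "bk": change["bk"],
            "name": change["name"],
            "email": change["email"],
            "effective_date": change["created_at"],
            "expiration_date": HIGH_DATE,
            "current": True,
        })
        next_sk += 1
    return dimension
```

In a SQL warehouse the same idea is usually expressed by ordering the staged rows per business key and taking the next row's timestamp as the expiration date, but the details depend on the platform.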

Related

Time function in sheets

I have the data of 4000 employees in Google Sheets along with their shift timings (9-hour shifts) spread across 24 hours. I want to use a formula to find the most common times these employees are available in the office (09:00 to 18:00). My result buckets would be 09:00 to 11:00, 11:00 to 13:00, 13:00 to 15:00, 15:00 to 18:00, 18:00 to 22:00, 22:00 to 09:00.
I could have used this formula to derive the value:
=IF(AND(TIMEVALUE(A2)>=TIMEVALUE("09:00"), TIMEVALUE(A2)<=TIMEVALUE("11:00")), "09:00 to 11:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("11:00"), TIMEVALUE(A2)<=TIMEVALUE("13:00")), "11:00 to 13:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("13:00"), TIMEVALUE(A2)<=TIMEVALUE("15:00")), "13:00 to 15:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("15:00"), TIMEVALUE(A2)<=TIMEVALUE("18:00")), "15:00 to 18:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("18:00"), TIMEVALUE(A2)<=TIMEVALUE("22:00")), "18:00 to 22:00", "22:00 to 09:00")))))
but the problem is that the timings are stored as text, not in a time format.
Here's my take:
Suppose column A has the clock-ins and column B has the clock-outs. Let column D hold times starting at 00:00 and going up to 33:00 (9am the next day) in 5-minute (or 30, 60, etc.) increments.
Let column E be the number of employees who were in the office at the time listed in column D.
Define E2 as =COUNTIFS($A$2:$A$9999,"<="&D2,$B$2:$B$9999,">="&D2) and fill it down.
Next, apply some conditional formatting to highlight the busiest times.
Note that you only need the times of day, which it sounds like you have, but you will need to convert overnight shifts so that they do not wrap around midnight (for example, a clock-out of 07:00 the next morning becomes 31:00).
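To make the counting logic concrete, here is a rough Python sketch of the same idea outside of Sheets. It assumes the times are "HH:MM" text and that a clock-out at or before the clock-in means the shift ends the next day; the function names `to_minutes`, `unwrap_shift` and `headcount` are made up for illustration.

```python
def to_minutes(text):
    """Parse 'HH:MM' text into minutes since midnight (hours may exceed 23)."""
    hours, minutes = text.strip().split(":")
    return int(hours) * 60 + int(minutes)


def unwrap_shift(clock_in, clock_out):
    """Return (start, end) in minutes, pushing overnight clock-outs past 24:00."""
    start, end = to_minutes(clock_in), to_minutes(clock_out)
    if end <= start:
        end += 24 * 60  # assumed: the shift ends on the following day
    return start, end


def headcount(shifts, step=60, last=33 * 60):
    """Count shifts covering each time slot from 00:00 up to `last` minutes.

    Uses the same inclusive test as the COUNTIFS formula: in <= slot <= out.
    """
    spans = [unwrap_shift(clock_in, clock_out) for clock_in, clock_out in shifts]
    return {
        slot: sum(1 for start, end in spans if start <= slot <= end)
        for slot in range(0, last + 1, step)
    }
```

As with the COUNTIFS version, an overnight shift is counted in the slots past 24:00 (for example 25:00) rather than at 01:00.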

Google Sheets: every month on a specific date increase a cell number from a number in another cell

Keeping track of finances!
Situation: every month I have a Direct Debit that moves an amount (e.g. £25.00) from one bank account to another to pay a bill.
Sheets: Every month on a specific date I want to automatically increase a specific cell, which starts at £0.00, by the amount (£25.00) that is in another cell.
Example:
A1 - Netflix
B1 - £25.00
C1 - Netflix Payments
D1 - £25.00 on Jan 1st, £50.00 on Feb 1st, £75.00 on Mar 1st etc
In D1:
=B1*(DATEDIF(DATEVALUE("Jan 1, 2022"),TODAY(),"m")+1)
This multiplies the value in B1 by the number of full months since the start date of January 1, 2022, plus one. That is, if the B1 value is 25, then during January 2022 the D1 value will be 25; on February 1, 2022, it will automatically read 50, on March 1 it will read 75, ad infinitum.
Alternatively, use NOW() reduced to its month number (for example MONTH(NOW())) and multiply the 25 by that result.
Jan is 1, Feb is 2, March is 3, etc.
So 1 * 25 = 25, 2 * 25 = 50, and so on. Note that the month number goes back to 1 every January, so this only works within a single calendar year.
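The arithmetic behind the DATEDIF formula can be checked with a small Python sketch. It assumes the date being checked is on or after the start date, and the helper names `full_months_between` and `cumulative_paid` are hypothetical. It is meant to mirror DATEDIF(..., "m") for a start date on the 1st of a month; month-end start dates may behave differently in Sheets.

```python
from datetime import date


def full_months_between(start, today):
    """Number of complete months from start to today (today >= start assumed)."""
    months = (today.year - start.year) * 12 + (today.month - start.month)
    if today.day < start.day:
        months -= 1  # the current month has not completed yet
    return months


def cumulative_paid(amount, start, today):
    """Total debited so far: one payment on the start date plus one per full month."""
    return amount * (full_months_between(start, today) + 1)
```

For example, `cumulative_paid(25, date(2022, 1, 1), date(2022, 2, 1))` gives 50, matching the February value described above.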

Google Sheets query output changes upon closing and reopening workbook

I have a workbook where I track game stats for my local community. I added a chart that changes upon a few selections and I use filter to get the desired result. The data comes from a sheet where I use query to calculate month to month differences (since I could not find this easily done with Google's provided pivot options). One of the queries looks like this:
=query('Response Edits'!1:1112,"select A,B,C WHERE A IS NOT NULL AND NOT H matches '"&textjoin("|",TRUE,query('Response Edits'!1:1112,"select min(H) WHERE A IS NOT NULL group by D",0))&"' order by D, C ASC",0)
A converts the month value in the timestamp to the correct survey month (e.g. a 2020-07-01 would be for 06 survey and 2020-07-29 would be for 07 survey)
B converts the year value in the timestamp to the correct survey year
C is the timestamp of the survey submission
D is the player name
H is the player XP of the survey submission (I use this as a lazy solution since it only increases and because I could not figure out a way to include the key phrase date using multiple datetime e.g. NOT C matches date textjoin("|",TRUE,"select min(C)...") did not work)
The textjoin is just there to remove the earliest date submitted, because it would not have a month to month value. Here is a portion of the output of the query above and another query which I believe is correct:
7 2020 2020-07-31 23:18:48 ... 6873449 198 11610
8 2020 2020-08-31 22:15:53 ... 7789713 175 8732
9 2020 2020-09-30 23:03:12 ... 5994347 139 8932
When I close the sheet and reopen it, I notice that my chart has only 0 values because my sheet with the query functions is only outputting 0. The above query and my other query have also given a different output, a portion of which I have provided below:
6 2020 2020-06-30 22:04:02 ... 0 0 0
7 2020 2020-07-31 23:18:48 ... 0 0 0
8 2020 2020-08-31 22:15:53 ... 0 0 0
9 2020 2020-09-30 23:03:12 ... 0 0 0
I am new to using query, but the formula seems correct, because if I change the last 0 in the formula (which is the option for header) to 1 and then back to 0 I get the desired result.
Tl;dr Why does the queried data not output correctly when I close and reopen a workbook? And why does it output correctly after the formula is changed and changed back (including selecting undo)? Is it potentially textjoin or matches causing the problem in the query?
Try running this:
=QUERY('Response Edits'!A1:H1112,
"select A,B,C
where A is not null
and not H matches '"&TEXTJOIN("|", 1,
QUERY('Response Edits'!A1:H1112,
"select min(H)
where A is not null group by D", 0))&"'
order by D, C", 0)

Get max change between rows, ignoring empty cells at end of list

I have a spreadsheet where I'm tracking my net worth over time. Once a month, I add in my account balances.
In one sheet, I have this structure:
Date          Year  Net Worth  Account1  Account2  Account3
------------------------------------------------------------
Jan 31, 2021  2021  $320       $200      $140      -$20
Feb 28, 2021  2021  $340       $200      $150      -$10
Mar 31, 2021  2021  $410       $250      $200      -$40
Apr 30, 2021
May 31, 2021
...rest of months for the year
The formula in the Year column is =if(C3<>"", year(A3), "").
The formula in the Net worth column is =if(sum(D3:F3)<>0, sum(D3:F3), "").
The Problem:
I'd like to have a cell which lists the greatest 1 month change (so $410 - $340 = $70), without having to update the formula every month. (In an ideal world, I never need to touch it again, only ever having to enter account balances once a month.)
What I've got so far:
=if(
abs(min(ArrayFormula(C3:C400 - C2:C399)))=max(ArrayFormula(abs(C3:C400 - C2:C399))),
min(ArrayFormula(C3:C400 - C2:C399)),
max(ArrayFormula(C3:C400 - C2:C399))
)
However, this includes the change from $410 to "", which is coerced to $0. So instead of the expected $70, I'm getting $410.
How can I get the greatest 1 month change, but ignore the empty string values?
Easiest way to fix it is just to put in an if clause I think:
=ArrayFormula(max(if(C3:C400<>"",abs(C3:C400-C2:C399),)))
because Max will ignore the empty string generated by the If statement
or slightly shorter:
=ArrayFormula(max((C3:C400<>"")*abs(C3:C400-C2:C399)))
so that the change for any empty cells is set to zero.
These assume that C2 itself is not blank!
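For comparison, the same "skip the blanks" logic can be sketched in Python. Blank cells are represented by `None`, and the function name `greatest_monthly_change` is hypothetical. Unlike the formulas above, this sketch skips a pair when either side is blank, and like them it returns the size of the change rather than its sign.

```python
def greatest_monthly_change(values):
    """Largest absolute change between consecutive non-blank values.

    values: net worth per month in order, with None for months not yet filled in.
    """
    changes = [
        abs(current - previous)
        for previous, current in zip(values, values[1:])
        if previous is not None and current is not None
    ]
    # No comparable pair yet (fewer than two filled months) counts as 0.
    return max(changes, default=0)
```

With the sample data, `greatest_monthly_change([320, 340, 410, None, None])` returns 70.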

VLOOKUP in a FILTER-ed range while automatically adding new rows

I have an easy-to-append monthly purchase log:
month prod count
-----------------
jan water 10
jan bread 20
feb bread 2
feb water 1
And I want to get a friendlier summary table:
prod jan feb
-------------
water 10 1
bread 20 2
Any idea how I can get this report with new months in the log appearing automatically as new columns?
I managed to get the month heads with a =ArrayFormula(TRANSPOSE(UNIQUE(FILTER(log!A2:A, log!A2:A<>"")))) and I am ok with entering the prod column by hand but I only managed to have a formula per column for count. And that means I need to drag the formula with each new month added to the log...
Any ideas? Thanks!
Try this formula:
=QUERY(A:C,"select B, sum(C) where A <> '' group by B pivot A")
See more info here:
https://developers.google.com/chart/interactive/docs/querylanguage
Use month numbers instead of names so that the pivoted columns come out as 1, 2, 3 in calendar order; with names they are ordered alphabetically (feb, jan).
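To show what the pivot is doing, here is a small Python sketch of the same grouping. The function name `pivot_counts` is made up for illustration, and the rows are assumed to be (month, prod, count) tuples in log order. It keeps months in the order they first appear in the log, which is one way around the alphabetical ordering.

```python
def pivot_counts(rows):
    """Sum counts per product and month.

    rows: iterable of (month, prod, count) tuples in log order.
    Returns (months, table) where months keeps first-seen order and
    table maps prod -> {month: total count}.
    """
    months = []
    table = {}
    for month, prod, count in rows:
        if month not in months:
            months.append(month)  # a new month becomes a new column
        per_month = table.setdefault(prod, {})
        per_month[month] = per_month.get(month, 0) + count
    return months, table
```

Appending a row for a new month to the log adds a new entry to `months` without any change to the code, which is the behaviour the QUERY pivot provides in Sheets.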
