I have four columns I am interested in listing. We collect weekly data from our third-party vendors and sort it by data collection week. Vendors do not always submit this data, so there will be times when a vendor submitted one week but not the next. I need a running total of total enrollments by collection week, broken down by name. I tried the MAX function, but that only gives me the latest date in the whole table; I want the max date for each district individually. For example, if the latest week is 2/21/2020 for Name A and 2/14/2020 for Name B, I want both dates and their enrollment totals. As it stands, I get only the rows for the overall max date, 2/21/2020, and the districts whose latest submission was earlier are not returned.
The code below is what I have.
SELECT DATACOLLECTIONWEEK, NAME,DISTSCH,TOTALENROLLMENTS
FROM DB.SCHEMA.TEST
WHERE datacollectionweek = (SELECT MAX(datacollectionweek)
FROM DB.SCHEMA.TEST)
A correlated subquery that takes the max week per NAME returns each name's own latest row:
SELECT DATACOLLECTIONWEEK, NAME,DISTSCH,TOTALENROLLMENTS
FROM DB.SCHEMA.TEST as DB1
WHERE DB1.datacollectionweek = (SELECT MAX(datacollectionweek)
FROM DB.SCHEMA.TEST AS DB2
WHERE DB1.NAME = DB2.NAME)
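To make the difference between the two queries concrete, here is a minimal sketch of the per-name logic in plain Python. The sample rows and the helper name latest_per_name are hypothetical, made up for illustration; only the column order follows the SELECT list above.

```python
from datetime import date

# Hypothetical sample rows: (datacollectionweek, name, distsch, totalenrollments)
rows = [
    (date(2020, 2, 14), "Name A", "001", 120),
    (date(2020, 2, 21), "Name A", "001", 125),
    (date(2020, 2, 7), "Name B", "002", 80),
    (date(2020, 2, 14), "Name B", "002", 90),
]


def latest_per_name(rows):
    """Keep, for each name, the row(s) dated at that name's own latest week."""
    latest_week = {}
    for week, name, _distsch, _total in rows:
        # Track the max week separately per name, not for the whole table
        if name not in latest_week or week > latest_week[name]:
            latest_week[name] = week
    return [row for row in rows if row[0] == latest_week[row[1]]]


result = latest_per_name(rows)
for row in result:
    print(row)
```

With this sample, Name B is kept with its 2/14/2020 row even though the table-wide max is 2/21/2020, which is the behavior the correlated subquery is meant to give.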
I am working with a dataset containing 22,232,726 entries collected between 2008 and 2021. Because original entries cannot be deleted from the database, an observation is updated by creating a new entry with the same ID.
I want to remove all repeated IDs leaving only the latest entry per ID for my analysis.
I used the following Level of Detail function in Tableau to achieve this:
{FIXED [ID]: MAX([Date])} = [Date]
The function returns a total of 17,980,416 entries. However, when I run a distinct count COUNTD([ID]) before and after applying the LOD filter, I get 17,899,956 distinct IDs. Why is my LOD function returning an extra 80,460 repeated IDs to the result?
FYI, there are no nulls in either the ID or the Date column. There can be repeated dates for the same ID, but I expected Tableau to keep only one of them in the results. How can I remove these extra repeated entries or fix this counting problem?
I eventually found a solution to the problem by using a Row_ID field as the criterion for selecting one of the records with an identical ID and Date. I used two LOD calcs as filters.
The first filter kept all unique IDs with the latest Date, including some repeated IDs with the same latest date.
1:{FIXED [ID]: MAX([Date])} = [Date]
The second filter took the repeated records with identical ID and Date and kept only the one with the last Row_ID.
2:{FIXED [ID],[Date]: MAX([Row_ID])}=[Row_ID]
The original dataset doesn't have a Row_ID variable, so I had to create it by using Pandas in Python by adding index and index_label parameters:
df.to_csv("my-file-name.csv", index=True, index_label='Row_ID')
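Since the data already passes through pandas, the same two-step selection can also be sketched there. This is only an illustration on a hypothetical five-row frame, assuming the columns are named ID, Date and Row_ID as in the calcs above; step1 mirrors the first LOD filter and latest mirrors both filters combined.

```python
import pandas as pd

# Hypothetical sample: ID 1 has two entries sharing the same latest Date
df = pd.DataFrame({
    "Row_ID": [0, 1, 2, 3, 4],
    "ID": [1, 1, 1, 2, 2],
    "Date": pd.to_datetime(
        ["2020-01-01", "2021-05-01", "2021-05-01", "2019-03-01", "2020-07-01"]
    ),
})

# Equivalent of filter 1: rows whose Date equals the max Date of their ID.
# Ties on the latest Date survive, which is where the extra rows come from.
step1 = df[df["Date"] == df.groupby("ID")["Date"].transform("max")]

# Both filters at once: sort so the latest Date (and, on ties, the highest
# Row_ID) comes last within each ID, then keep only that last row per ID.
latest = (
    df.sort_values(["ID", "Date", "Row_ID"])
      .drop_duplicates(subset="ID", keep="last")
      .reset_index(drop=True)
)

print(len(step1), len(latest))
```

On this sample, step1 keeps three rows for two IDs, while latest keeps exactly one row per ID.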
I am trying to create a report from another report (the source sheet).
The source sheet updates daily automatically by inserting new rows with progress on sales on top of the rows completed a day before:
Date    Product  Units sold
11/15   A        35
11/15   B        12
11/15   C        18
11/14   A        30
11/14   C        11
11/14   B        10
11/13   F        88
11/12   B        7
11/12   A        29
11/12   C        10
11/11   C        8
11/11   A        29
11/11   B        3
The "Units sold" column is cumulative, meaning that a newer record for a given product shows a value greater than or equal to the previous record for that product.
New products appear in the source sheet when they enter the company and disappear from it when they are sold out, pretty much randomly (e.g. product "F", which showed up and sold out on the same day).
The first column of the source report already contains a formula that concatenates date and product and is used by other reports.
To solve this, I put the same date-and-product concatenation in column T of the results report. Then, in the results column of my new report, I used the following formula: =vlookup(T2,Source!$A2:$C$10000,3,0)-vlookup(T2,Source!$A3:$C$10000,3,0). The intent is to get the difference between a product's units sold on the latest day and on the day before, in other words the number of units of each product sold on a specific date. Finally, using a =year() column and a =month() column applied to the date column, plus a couple of pivot tables, I was able to get the daily increment for each month and/or year.
The problem I couldn't solve is that when the source sheet updates, the newly inserted rows push down the cell references in the VLOOKUP formula I used in the results sheet.
Please help me find a way to "pin down" the cell references in the VLOOKUP formula so that they are not affected by the row insertions.
Thank you!
To "pin down" the references, you can use INDIRECT. Because the range is passed as text, it is not shifted when rows are inserted.
Example:
A1:A becomes INDIRECT("A1:A")
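Separately from the cell-reference issue, the daily-increment calculation described in the question can be sketched in Python, keyed by date and product rather than by row position, so inserted rows do not matter. This is a rough sketch, not the sheet's actual logic: the function name daily_increments is hypothetical, the rows copy the sample table, and it assumes a product's first record counts in full (as with product F) and that all dates fall within one year so the MM/DD strings sort chronologically.

```python
# Rows mirroring the sample source sheet: (date, product, cumulative units),
# newest first, as the sheet inserts new rows on top.
rows = [
    ("11/15", "A", 35), ("11/15", "B", 12), ("11/15", "C", 18),
    ("11/14", "A", 30), ("11/14", "C", 11), ("11/14", "B", 10),
    ("11/13", "F", 88),
    ("11/12", "B", 7), ("11/12", "A", 29), ("11/12", "C", 10),
    ("11/11", "C", 8), ("11/11", "A", 29), ("11/11", "B", 3),
]


def daily_increments(rows):
    """Return {(date, product): units sold since that product's previous record}."""
    previous = {}    # last cumulative value seen per product
    increments = {}
    # Assumption: MM/DD strings within a single year sort chronologically
    for day, product, cumulative in sorted(rows, key=lambda r: r[0]):
        # Assumption: a product's first record counts in full (previous = 0)
        increments[(day, product)] = cumulative - previous.get(product, 0)
        previous[product] = cumulative
    return increments


increments = daily_increments(rows)
print(increments[("11/15", "A")])
```

Note that the difference is taken against the product's previous record, whatever its date, so a product missing on one day (A, B and C on 11/13 here) still gets a sensible increment on the next.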
In Google Sheets, I have created the table below:
Current Week: 4/12/21 (represented by: =TODAY()-WEEKDAY(TODAY(),3))
Most availability for current week: Person 1

Team Member  4/12/21  4/19/21
Person 1     10%      25%
Person 2     5%       50%
What I am trying to do:
For the current week's column, automatically determine the max value and return the Team Member's name.
What I used thus far:
=INDEX(A8:A,MATCH(MAX(B8:B),B8:B,0))
This formula makes me select the column manually. I am hoping to have the column selected automatically based on the current week, which is why I have the current week in its own cell above the table, computed with the formula shown. Can this be done?
Assuming your table header row is row 7, this formula may work:
=INDEX(A8:A,MATCH(MAX(FILTER(B8:C,B7:C7=TODAY()-WEEKDAY(TODAY(),3))),FILTER(B8:C,B7:C7=TODAY()-WEEKDAY(TODAY(),3)),0))
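The idea behind the formula (find the column whose header equals the Monday of the current week, then take the name with the max value in it) can be sketched in Python. The dictionary layout and the function names week_start and most_available are hypothetical; the values copy the sample table.

```python
from datetime import date, timedelta


def week_start(today):
    """Monday of the week containing `today`, like TODAY()-WEEKDAY(TODAY(),3)."""
    # date.weekday() is 0 for Monday, matching WEEKDAY(..., 3) in Sheets
    return today - timedelta(days=today.weekday())


# Hypothetical layout: week-start date -> {team member: availability in %}
availability = {
    date(2021, 4, 12): {"Person 1": 10, "Person 2": 5},
    date(2021, 4, 19): {"Person 1": 25, "Person 2": 50},
}


def most_available(availability, today):
    """Name of the team member with the highest value in the current week's column."""
    column = availability[week_start(today)]
    return max(column, key=column.get)


print(most_available(availability, date(2021, 4, 14)))
```

A date with no matching week column raises a KeyError here, which corresponds to the FILTER in the formula finding no matching header.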
I have the spreadsheet attached.
I'd like to find the Client No from the lookup sheet based on the date provided in the live sheet.
The same client can appear with a different client number, so I need to look up the name and date (from the live sheet) and find the corresponding client number in the lookup sheet, where the date from the live sheet falls between the two dates on the lookup sheet.
I hope this makes sense.
Any help appreciated.
Thank you
This might do what you're looking for.
=IFERROR(
QUERY(SORT(FILTER(Lookup!A$2:D,Lookup!C$2:C=B2,Lookup!A$2:A<=A2),1,0),
"SELECT * WHERE COL4 >= DATE '"&TEXT(A2,"YYYY-MM-DD")&"' LIMIT 1",0),
QUERY(SORT(FILTER(Lookup!A$2:D,Lookup!C$2:C=B2,Lookup!A$2:A<=A2),1,0),
"SELECT * LIMIT 1",0) )
I've added a tab Live-GK to your sheet, with this formula in C2. It has to be dragged down. There may be another approach where it can be done as an arrayformula, but I haven't figured that out.
Note that on my tab, I'm doing the lookups from Lookup-GK, since I could add more test data there. The above formula can be used as is, pasted into cell C2 in your Live tab.
Note that for debugging purposes, column H of my tab returns all of the columns, not just the client #, so the start and end dates can be verified.
Let me know if this helps you.
Explanation:
The inner filter selects all rows from the Lookup tab where:
i) the client name (column C in Lookup) matches the client name in column B (of Live), and,
ii) the start date (column A in Lookup) is less than or equal to the client date in Live.
These records are sorted in descending date order.
Then the query selects the first record where the end date (column D in Lookup) is greater than or equal to the client date in Live.
If the matching Lookup record has no end date, this gives an error (empty query result), so IFERROR runs a second query without the end-date filter, selecting the one record with no end date but an appropriate start date.
These seemed to work with the few test records I used. If there is a duplication of client dates, the first client # is returned. See client #1 and #7 in my test data. Some more error handling might be necessary if your client records might have overlapping date ranges, as CalculusWhiz asked.
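For anyone who finds the nested FILTER/QUERY hard to follow, the steps in the explanation can be sketched in Python. The tuple layout, the sample clients and the function name find_client_no are hypothetical; in particular, which Lookup column holds the client number is an assumption made only for this sketch.

```python
from datetime import date

# Hypothetical Lookup rows: (start_date, client_no, client_name, end_date or None)
lookup = [
    (date(2020, 1, 1), 1, "Acme", date(2020, 6, 30)),
    (date(2020, 7, 1), 7, "Acme", None),  # open-ended record
    (date(2020, 3, 1), 3, "Bolt", date(2020, 12, 31)),
]


def find_client_no(lookup, name, on_date):
    """Client number whose date range covers `on_date` for `name`, else None."""
    # Inner FILTER + SORT: same name, start date on or before the date, newest first
    candidates = sorted(
        (r for r in lookup if r[2] == name and r[0] <= on_date),
        key=lambda r: r[0],
        reverse=True,
    )
    # First QUERY: first candidate whose end date is on or after the date
    for _start, client_no, _name, end in candidates:
        if end is not None and end >= on_date:
            return client_no
    # IFERROR branch: fall back to the most recent start date, ignoring end date
    return candidates[0][1] if candidates else None


print(find_client_no(lookup, "Acme", date(2020, 3, 15)))
print(find_client_no(lookup, "Acme", date(2020, 8, 1)))
```

As with the formula, the fallback does not check that the chosen record really is open-ended, so overlapping or gapped ranges would need extra handling.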
I am putting together an overtime sheet where staff can show their availability for the Saturday in a table, by name and date, with a simple Y/N. I also have another table with the hours each person has accumulated.
When several staff members say Y to their availability (we have two members of staff in), I would like two cells to display the names of the staff with the fewest hours to their name.
Column A is the name
Column B is the hours they have worked
Column C is the checkbox, checked meaning they will work overtime.
The following formula will return the two willing to work overtime with the least hours.
=query(A:C,"select * where A is not null and C = TRUE order by B limit 2")
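The same selection (non-empty name, box checked, ordered by hours, first two) can be sketched in Python for comparison. The staff names, hours and the function name pick_for_overtime are made up for illustration.

```python
import heapq

# Hypothetical rows: (name, hours worked, checkbox ticked for overtime)
staff = [
    ("Alice", 12.0, True),
    ("Bob", 4.5, True),
    ("Cara", 2.0, False),  # fewest hours, but not available
    ("Dev", 8.0, True),
]


def pick_for_overtime(staff, count=2):
    """Names of the `count` available staff with the fewest hours worked."""
    # "where A is not null and C = TRUE"
    available = [row for row in staff if row[0] and row[2]]
    # "order by B limit 2"
    chosen = heapq.nsmallest(count, available, key=lambda row: row[1])
    return [name for name, _hours, _ticked in chosen]


print(pick_for_overtime(staff))
```

Cara is skipped despite having the fewest hours because her box is not ticked, which is the effect of the C = TRUE condition in the query.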