Sequential number based on repeating value in another column - Google Sheets - google-sheets

this one seems super simple but I'm having a tough time figuring it out, any help would be greatly appreciated.
I have repeating data in Column A, in Column B I need sequential numbering unless the previous row has a repeat value, in which case it would repeat that number in the sequence. Example below.
Is this possible in a single cell array formula?
Column A Column B
7648490 1
7634199 2
7631608 3
7620465 4
7620465 4
7616976 5
7601241 6
7601241 6
7601241 6
7601241 6
7599651 7
7597439 8
7597376 9
7596068 10
7596068 10
7596068 10
7596068 10
7596068 10
7596067 10

Delete everything from Col B (including the header) and place the following formula in B1:
=ArrayFormula({"Header";IF(A2:A="",,VLOOKUP(A2:A,{UNIQUE(FILTER(A2:A,A2:A<>"")),SEQUENCE(COUNTA(UNIQUE(FILTER(A2:A,A2:A<>""))))},2,FALSE))})
This will create header text (which you can change as you like within the formula itself) and will produce the result for each row.
The virtual array formed between the curly brackets { } creates a pairing of each UNIQUE value from Col A with an incremental SEQUENCE that starts at 1. Then VLOOKUP just finds each actual value from Col A within the virtual array and returns the SEQUENCE number.

Related

Google Sheet - It's possible to array sum function in the following condition?

Would it be possible to use arrayformular for this condition?
Sum all the rows that PID are the same, the result should be as in the image.
I tried this code, but I think it's too long, and if the PID exceed over 20 rows, it would not work.
=IF(A3<>A2,BJ3+IF(A3=A4,BJ4,0)+IF(A3=A5,BJ5,0)+IF(A3=A6,BJ6,0)+IF(A3=A7,BJ7,0)+IF(A3=A8,BJ8,0)+IF(A3=A9,BJ9,0)+IF(A3=A10,BJ10,0)+IF(A3=A11,BJ11,0)+IF(A3=A12,BJ12,0)+IF(A3=A13,BJ13,0)+IF(A3=A14,BJ14,0)+IF(A3=A15,BJ15,0)+IF(A3=A16,BJ16,0)+IF(A3=A17,BJ17,0)+IF(A3=A18,BJ18,0)+IF(A3=A19,BJ19,0)+IF(A3=A20,BJ20,0)+IF(A3=A21,BJ21,0)+IF(A3=A22,BJ22,0),0)
With a table like this :
ID
Value
1
5
1
10
2
5
2
10
2
15
You have an expected output of :
ID
Value
Sum
1
5
15
1
10
blank
2
5
30
2
10
blank
2
15
blank
It is achievable with this formula (just drag it in your sum column) :
=IF(A2=A1,"",SUMIFS(B$2:B$12,A$2:A$12,A2))
It check if the ids are the same and then sum them, but only show them on the row where the id first appears
Found it on google by searching google sheets sum group by
The following in C2 will generate the required answer without any copying-down required:
=arrayformula(if(len(A2:A),ifna(vlookup(row(A2:A),query({row(A2:B),A2:B},"select min(Col1),sum(Col3) where Col2 is not null group by Col2"),2,false)),))
We are making a lookup table of grouped sums against the first row of each 'P#' group using QUERY, then using VLOOKUP to distribute the group sums to the first row in each group. Probably also doable using a SCAN/OFFSET combination as well, I think.

Google Sheets - Embedded arrays with multiple columns under each other

I'd like to insert 2 column wide fields under each other. I tried with embedded arrays but was not successful.
So basically from:
a 1 e 5
b 2 f 6
c 3
I would like to get:
a 1
b 2
c 3
e 5
f 6
I tried with
={{A:A,B:B};{C:C,D:D}}
but could not get it working, however
={{A:A,B:B},{C:C,D:D}}
put the columns the same as they were so its intresting that with ; its not working.
The blocks are always 2 column wide but the rows are different length
Thanks for your help in advance!
Try:
=filter({A:B;C:D},{A:A;C:C}<>"")
This will return rows where Columns A or C are not blank.
Just under c assuming a is in A1:
=ArrayFormula(C1:D2)
You're not going to find a clean built-in formulaic solution to this one that doesn't utilize some sort of built-in magic auto expansion (like pnuts's answer). Here is my approach using OFFSET that will also work in Microsoft Excel.
In two columns, copy this formula.
=OFFSET($A$1,(ROW()-ROW($G$1))/2,IF(MOD(ROW()-ROW($G$1),2)=1,2,0))
In the second column, modify the formula, adding 1 to the column offset parameter:
=OFFSET($A$1,(ROW()-ROW($G$1))/2,1+IF(MOD(ROW()-ROW($G$1),2)=1,2,0))
where $A$1 is replaced with the address of the top left of your range and $G$1 is the starting location of your output range. This should be resistant to auto-update of formulas from range insertions and deletions (which I despise butchering my formulae and conditional formatting rules) by using only the bare number of references, which are all absolute.
This works by dividing the row offset from your starting position by 2 and rounding down (via an implicit cast to integer when used as a parameter to the OFFSET function) to get the row number of your input range. Then it shifts over 2 columns on every odd row to get data from the second column pair.
Note this is not a size-aware function, so it interweaves the second column pair:
a 1
e 5
b 2
f 6
c 3

Google Sheets Query Group By / First-N-Per-Group

I'm trying to find a simple solution for first-n-per-group.
I have a table of data, first column dates and rest data. I want to group based around the date, as multiple entries per date are allowed. For the second column some numbers, but want the FIRST record.
Currently the aggregate function I could possibly use is MIN() but that will return the lowest value and not the first.
A B
01/01/2018 10
01/01/2018 15
02/01/2018 10
02/01/2018 2
02/01/2018 100
02/01/2018 20
03/01/2018 5
03/01/2018 2
Desired output
A B
01/01/2018 10
02/01/2018 10
03/01/2018 5
Current results using MIN() - undesired
A B
01/01/2018 10
02/01/2018 2
03/01/2018 2
It's a shame there isn't a FIRST() aggregate function in Google Sheets, which would make this a lot easier.
I saw a couple of examples of using the Row Number and ArrayQuery, but that doesn't seem to work for me. There are about 5000 rows of data so trying to keep this as efficient as possible, and not have to recalculate the entire sheet on any change, each taking a few seconds.
Currently I have this, which appends a third column with the Row Number:
=query({A1:B, arrayformula(row(A1:B))}, "select min(Col1),min(Col2) group by Col1")
Thanks
EDIT 1
A suggested solution was =SORTN(A:B,2^99,2,1,1), which is a clean simple one. However, this requires a large range of "free space" to display the returned dataset. Imagine 3000+ rows.
I was hoping for a QUERY() -based solution, as I wanted to do further operations with the results. Specifically, count the occurrences of distinct values.
For example: I wanted a returned dataset of
A B
01/01/2018 10
02/01/2018 10
03/01/2018 5
Yet I want to count the occurrences of those values (and then ignoring the dates). For example:
B C
10 2
5 1
Perhaps I've confused the situation by using numbers? the "data" in ColB is TEXT (short 3 letter codes), however I used numbers to show I couldn't use MIN() function as that returns the numerically lowest value.
So in brief:
Go through all rows (3000+ rows) and group by the FIRST row of a particular date
return the FIRST value of that row
COUNT() all unique occurrences of those FIRST values, disregarding the date. Just a list with the unique values and their count (again, only the first one of any particular day)
=SORTN(A:B,2^99,2,1,1)
If your data is sorted as in the sample, You can easily remove duplicates with SORTN()

Excluding the last value in a range from an aggregate calculation in Google Sheets

I have a Google Sheet with two columns of data. A is monotonically increasing with many duplicates (based on a coarse timestamp), while B is essentially random. There are many empty rows at the bottom waiting for future data. It resembles the following:
A B
1 5 43
2 5 77
3 13 8
4 21 34
5 27 68
6 27 90
7
8
9
10
I'm trying to write a few formulae which examine all of the (non-empty) values in a column except for the last one. For example, I would like to find the maximum value of B excluding the latest value, so the result should be 77 from B2 instead of 90 from B6.
If the values in the range were strictly increasing and unique, I could filter the values of A into C, excluding any values equal to the maximum value (only the last entry), and then take the MAX(..) of that range. However, my data does not have that property; the final value could be duplicated and the duplicates would be inappropriately ignored.
C D E
1 =FILTER(A:A, A:A < MAX(A:A)) =MAX(C:C) This produces A4's 21 instead of A5's 27.
A similar approach would work if we had a third column of incrementing indices to use:
A B C D E
1 5 43 9 =MAX(FILTER(C:C, A:A <> "")) Value of index in last populated row.
2 5 77 10 =MAX(FILTER(A:A, C:C < D1)) Maximum value from a row with lower index.
3 13 8 11
4 21 34 12
5 27 68 13
6 27 90 14
7 15
8 16
9 17
10 18
But I'm looking for a solution that doesn't require modifying the original spreadsheet, because that's not always possible. I can't just create a new IndexSheet with nothing but an an index column and join it in like this instead...
A B C
1 5 43 =MAX(FILTER(IndexSheet!A:A, A:A <> ""))
2 5 77 =MAX(FILTER(A:A, IndexSheet!A:A < C1))
...
...because that requires that the IndexSheet have the same number of rows as the data sheet, and would break as more data is added.
Without modifying the original data sheet, or relying on properties of the data (beyond values being numeric and rows being empty or full), is there any way to perform an aggregate calculation on a range while excluding the last value?
You can use indirect and address formulas to create dynamic range excluding the last row
=max(indirect("A1:"&Address(count(A:A)-1,1)))
The count function gives the number of non empty cells in the column A. You subtract 1 to exclude the last row.
You use that number to build an address using "A1:"&address(row no, Col no) which in your example case should be A1:$A$5
Use this string to reference your cells using the indirect method indirect(A1:$A$5) and pass the reference to the max function to determine the max in that range.
From another sheet try:
=MAX(Sheet1!B1:indirect("Sheet1!B"&count(Sheet1!B:B)-1))
We can use the FILTER() and ROW() functions to accomplish this:
D
1 =MAX(FILTER(Data!A:A,
ROW(Data!A:A) < MAX(FILTER(ROW(Data!A:A),
Data!A:A <> ""))))
We use FILTER(ROW(DATA!A:A), Data!A:A <> "")) to get an array of row numbers of non-empty rows, and use MAX(...) to take the last row number. We use this to exclude the last row by filtering out values from lower row numbers with FILTER(Data!A:A, ROW(Data!A:A) < ...). We apply MAX(...) to this filtered array and get the result we were looking for.

Comparing a specific set of numbers

I have a google spreadsheet that has 6 cells with specific numbers in them. Every week, a series of numbers is entered in and I would like to flag the numbers in a separate column if they appear for that week. I was using the formula below where my numbers are in D2->I2 and the weekly ones would be in D18->I18 for example.
=arrayformula(sumproduct((D2:I2=D18:I18)))
Now, while this works, it's not quite what I'm trying to do. Unless the numbers match each other exactly, 1 2 3 4 5 6 to 1 2 3 4 5 6 then the addition doesn't happen. What I would like to have happen is that if, for example, the master column has 1 2 3 4 5 6 and the weekly column has 3 7 9 1 8 5 then the cell with the formula would display the value of 3 for matching three of the numbers that week.
Does anyone have a suggestion on how best to accomplish this?
See if this works ?
=ArrayFormula(sum(--regexmatch(D2:I2&"", join("|", D18:I18&""))))
with exclusion of empty cells in both ranges:
=iferror(ArrayFormula(sum(--regexmatch(to_text(filter(D2:I2, len(D2:I2))), "\b("&join("|", to_text(filter(D18:I18, len(D18:I18))))&")\b"))))

Resources