Stack multiple columns in a single column based on another column - google-sheets

In sheet1, I have something like this (A to D are headers - sample sheet here):
A | B  | C  | D
X | X1 | X2 |
Y | Y1 | Y2 | Y3
Z | Z1 |    |
And in sheet2, I want something like this:
A | B
X | X1
  | X2
Y | Y1
  | Y1
  | Y1
Z | Z1
*Values in column A exist only on the first instance (the cell is not merged with the cells below it).
Sheet1 data comes from Google Form submissions, but we want to structure it like the sample table in sheet2, where sheet1 columns B to D are stacked in sheet2 column B.
For now, we're using the following to merge columns B to D in a single cell aligned with the values in column A:
=ArrayFormula(Sheet1!A:A&CHAR(10)&Sheet1!B:B&CHAR(10)&Sheet1!C:C&CHAR(10)&Sheet1!D:D)
However, this causes a lot of problems: the line breaks are still there even when there is no second line to show, and we have to manually update the status of these items (since they're used for monitoring).
If we could have the data line by line as in the expected layout, we'd be able to automate some of the tasks. We tried playing with QUERY, but to no avail (although I think it's possible with that function... not sure).
Hoping to get ideas from the community. Thanks!

I've added a new sheet ("Erik Help") with the following formula in A1:
=ArrayFormula({"Header 1", "Header 2";QUERY(SPLIT(FLATTEN({FILTER(INDIRECT("Sheet1!A2:A"),INDIRECT("Sheet1!A2:A")<>"")&"|"&FILTER(INDIRECT("Sheet1!B2:B"),INDIRECT("Sheet1!A2:A")<>""),IF(FILTER(ROW(INDIRECT("Sheet1!A2:A")),INDIRECT("Sheet1!A2:A")<>""),"")&"|"&FILTER(INDIRECT("Sheet1!C2:D"),INDIRECT("Sheet1!A2:A")<>"")}),"|",1,0),"Select * Where Col2 Is Not Null")})
This formula creates the two headers first.
You'll notice the heavy use of INDIRECT to reference ranges. This is because you'll have form data coming into that sheet; and if the formula doesn't have a way to "lock" ranges, those ranges will shift down one every time a new row is added onto the form-intake sheet. In most other applications, you can "lock" those ranges by using full-column references (e.g., A:A instead of A2:A). But given the specifics of what you're trying to do here, INDIRECT was the least complex approach. Keep in mind that, because INDIRECT is used, the references will not automatically change if you rename Sheet1 to something else (like "Form Responses 1"). You will need to change each reference manually. Or use FIND/REPLACE, select "Specific Range" set to the formula cell, and check the "Also search within formulas" box.
The idea here is that every value from A2:A is concatenated to every value in the same row of B2:D, with a pipe symbol between (as a SPLIT marker for later). Since you only want to see the Col-A values beside Col-B values, Col-A&"|"&Col-B are first processed alone; then a blank is appended instead of Col-A for everything else.
FILTER is used to only process rows for which there is data in Col-A.
SPLIT splits the combinations made (as described above) at the pipe symbol, forming two columns.
QUERY keeps only those results of the SPLIT that have something in the second column.
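To make the steps above concrete, here is a minimal Python sketch of the same logic; it is an illustration, not a translation of the formula. The function name `stack_columns` and the sample rows are assumptions made for this example. Each row is a tuple (A, B, C, D), with empty strings standing in for blank cells.

```python
def stack_columns(rows):
    """Stack columns B..D under each other, showing column A only beside column B."""
    stacked = []
    for a, *rest in rows:
        if a == "":
            # FILTER step: skip rows with nothing in column A
            continue
        for i, value in enumerate(rest):
            if value == "":
                # QUERY step: drop pairs whose second column is empty
                continue
            # Column A is paired with column B only; C and D get a blank instead
            stacked.append((a if i == 0 else "", value))
    return stacked


# Hypothetical sample data shaped like Sheet1 in the question
sample = [
    ("X", "X1", "X2", ""),
    ("Y", "Y1", "Y2", "Y3"),
    ("", "", "", ""),
    ("Z", "Z1", "", ""),
]
result = stack_columns(sample)
for left, right in result:
    print(f"{left:2} | {right}")
```

As in the formula, a row whose column B is blank but whose column C is filled would appear without its column-A label, because the label is attached to column B only.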

Related

ArrayFormula to calculate previous rows

I have 3 sheets:
Sheet1 - list of transactions (Account, Credit, Debit, Date)
Sheet2 - list of transactions (Account, Credit, Debit, Date)
Sheet3 (I plan to lock it) - combined list of transactions, sorted by Date
Sheet3 looks like:
I need to add one more column to Sheet3 that shows the current balance for each row, like this:
I'm able to do this with formula:
=SUM(FILTER($B$2:$B$8, ROW($A$2:$A$8) <= ROW($A2), A$2:A$8=$A2)) - SUM(FILTER($C$2:$C$8, ROW($A$2:$A$8) <= ROW($A2), A$2:A$8=$A2))
But I have to keep dragging this one down.
Question: Is there a way to convert this formula to an ArrayFormula, to avoid dragging?
In G2 on sheet 3 I entered
=ArrayFormula(if(A2:A="",,mmult((A2:A=transpose(A2:A))*(row(A2:A)>= TRANSPOSE(row(A2:A)))*(transpose(B2:B)-transpose(C2:C)),row(A2:A)^0)))
See if that works for you?
In Sheet3 row 1, put your headers.
In Sheet3!A2, put
=sort({filter(Sheet1!A2:D,not(isblank(Sheet1!A2:A)));filter(Sheet2!A2:D,not(isblank(Sheet2!A2:A)))},4,true)
In Sheet3!E2, put
=mmult(transpose(arrayformula(arrayformula(array_constrain(A2:A,counta(A2:A),1)=transpose(array_constrain(A2:A,counta(A2:A),1)))
*arrayformula(array_constrain(row(A2:A),counta(A2:A),1)<=transpose(array_constrain(row(A2:A),counta(A2:A),1))))),
arrayformula(array_constrain(B2:B,counta(A2:A),1)-array_constrain(C2:C,counta(A2:A),1)))
To see why, let's temporarily remove the array_constrain(...,counta(...),1) wrappings, which are meant to auto-detect the last data row:
=mmult(transpose(arrayformula(arrayformula(A2:A9=transpose(A2:A9))
*arrayformula(row(A2:A9)<=transpose(row(A2:A9))))),
arrayformula(B2:B9-C2:C9))
arrayformula(B2:B9-C2:C9) is the per-row net amount, column B minus column C (i.e. credit - debit). It is a column vector whose length is your data size; summing the right entries of it gives each running balance.
For each row, we want to 1) keep only the entries of this vector whose column A (i.e. account name) matches, and 2) keep only the entries at or above the row in question.
arrayformula(A2:A9=transpose(A2:A9)) does 1). arrayformula(row(A2:A9)<=transpose(row(A2:A9))) does 2).
We want the elementwise product of the 2 matrices in order to compose the filter. Hence, arrayformula(...*...).
The columns of our filter matrix are meant to be applied to the net amounts. To use matrix multiplication, we keep the column vector of net amounts as the post-multiplier and transpose the filter matrix as the pre-multiplier, so that the rows of the transposed matrix are multiplied with (i.e. applied to) the net amounts. Hence, mmult(transpose(...),...).
Add back the array_constrain trick. And we are done.
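The walkthrough above can be checked outside the spreadsheet. Below is a small Python sketch of the same idea: build the account mask and the row mask, multiply them elementwise, and apply the result to the B minus C column. The function name `running_balance_mask` and the sample transactions are assumptions for illustration only.

```python
def running_balance_mask(accounts, credits, debits):
    """Running balance per account, computed with a full n-by-n mask like the single formula."""
    n = len(accounts)
    net = [c - d for c, d in zip(credits, debits)]  # column B minus column C, per row
    balances = []
    for i in range(n):          # one output value per row
        total = 0
        for j in range(n):      # every row is compared with every other row
            same_account = accounts[j] == accounts[i]   # mask 1): account matches
            at_or_above = j <= i                        # mask 2): row j is not below row i
            total += (same_account and at_or_above) * net[j]
        balances.append(total)
    return balances


# Hypothetical transactions, already sorted by date
accounts = ["cash", "bank", "cash", "bank", "cash"]
credits = [100, 50, 0, 20, 30]
debits = [0, 0, 40, 0, 10]
balances = running_balance_mask(accounts, credits, debits)
print(balances)
```

The nested loop makes the cost visible: the inner comparison runs n times for each of the n rows, which is the n-by-n matrix the formula builds.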
Feel free to experiment with alternate placings of arrayformula. But remember to keep the () brackets wherever you omit arrayformula. Example:
=arrayformula(mmult(transpose(((array_constrain(A2:A,counta(A2:A),1)=transpose(array_constrain(A2:A,counta(A2:A),1)))
*(array_constrain(row(A2:A),counta(A2:A),1)<=transpose(array_constrain(row(A2:A),counta(A2:A),1))))),
(array_constrain(B2:B,counta(A2:A),1)-array_constrain(C2:C,counta(A2:A),1))))
Nonetheless, the one-formula solution is computationally inefficient compared to a separate formula in each cell. That is because, without varying the formula per row, we are forced to compute the filters as full n-by-n matrices, where n is your data size.
Whereas, if in E2 we put =sum(filter(B$2:B2-C$2:C2,A$2:A2=A2)) and fill it down by double-clicking the fill handle (the small square at the bottom right of the selected cell), the formula changes per row, which removes the row index comparison entirely and also roughly halves the comparisons against column A, since row k is only compared with the k rows at or above it.
Granted, we probably shouldn't rely on Google Sheet for a large database (e.g. >100k entries). But even for thousands of entries, if you square the amount of computations required, getting the results in browser becomes impractically slow well before one may expect.

Google sheets lookup with the query function within an array

I have a following formula in my google sheets
=TEXTJOIN(" -- ",TRUE,QUERY('sheetName'!B2:F,"SELECT F WHERE B = '"&$A3&"'"))
The formula is in a different sheet of the same workbook; let's call it "sheetResult". Basically, it looks up values and returns them if there is a match. There are two further things I would like to achieve with it: I need it to be an array formula so that it applies to all of the rows, and I need it to return only the unique values found. I have tried the following, but it does not work.
=ARRAYFORMULA(IF(A2:A = "" , , TEXTJOIN(", ",TRUE,UNIQUE(QUERY('sheetName'!B2:F,"SELECT F WHERE B = '"&$A2&"'"))) )) --> not sure what syntax to use
I tried FILTER, but it just returns all of the info stacked up; I need the formula to return the data for the row in which each lookup value is held.
EDIT: Added a link to shared file to better describe the question.
I want to make the formula in Y3 on the "Students" sheet apply to all of the cells below it, much like an array formula does.
Example
After studying your situation further, I came up with a simple fix based on your original formula. I understand that you want to apply the Y3 formula to the whole table without altering its behaviour. I assume that the only moving part would be the student ID (column A). Then you only need to modify your formula to lock the fixed ranges, with something like:
=TEXTJOIN(" -- ",TRUE,UNIQUE(QUERY('.data'!$B$2:$F,"SELECT F WHERE B = '"&A3&"'")))
After you write that on Y3 you would need to select it and drag it down to fill the table. Please leave a comment if you need further help.
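For readers who want to see what the dragged-down formula computes for each row, here is a rough Python equivalent. The name `lookup_joined`, the record layout (lookup key, value) and the sample data are assumptions for this sketch; it mirrors the QUERY, UNIQUE and TEXTJOIN steps but is not how Sheets evaluates them.

```python
def lookup_joined(records, student_id, sep=" -- "):
    """Collect the values whose key matches student_id, de-duplicate, and join them."""
    seen = []
    for key, value in records:
        # QUERY step: keep rows where column B equals the lookup value
        # UNIQUE step: keep only the first occurrence of each value
        # TEXTJOIN(..., TRUE, ...) step: ignore empty values
        if key == student_id and value != "" and value not in seen:
            seen.append(value)
    return sep.join(seen)


# Hypothetical data: (column B, column F) pairs from the data sheet
records = [("s1", "Math"), ("s2", "Art"), ("s1", "Math"), ("s1", "Physics")]
ids = ["s1", "s2", "s3"]                      # column A of the result sheet
results = [lookup_joined(records, i) for i in ids]
print(results)
```

The list comprehension plays the role of dragging the formula down: the lookup range stays fixed while the ID changes per row. For an ID with no match, this sketch returns an empty string; the spreadsheet formula may behave differently in that case.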

How do I use minifs with arrayformula so it autopopulates whole column?

Google sheets user here.
I am using the MINIFS formula to return the lowest match (out of multiple possible matches). Is there a way I can use ARRAYFORMULA as well to auto-populate an entire column, so I don't need to copy the same formula down the whole column?
Sample data below:
Column D and J are data manually inputted. Column I is the formula(s).
Essentially what I want to do here is:
Look at Column D - sees the name "Tom"
Sees that "Tom" has 3 scores 100, 90, 70 in Column J
Formula slaps "70" back into Column I because that is the lowest score
Repeats logic for "John" and "Mary"
Note: The actual data type for column J and I is a date instead of a number. But it is easier to illustrate the problem this way.
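The logic described in the steps above can be sketched in Python as follows. The function name `min_per_name` is hypothetical, and only Tom's scores (100, 90, 70) come from the question; the values for John and Mary are made up for the example.

```python
def min_per_name(names, scores):
    """For every row, return the lowest score recorded for that row's name."""
    lowest = {}
    for name, score in zip(names, scores):
        # Remember the smallest score seen so far for each name
        if name not in lowest or score < lowest[name]:
            lowest[name] = score
    # Write the per-name minimum back on every row, like column I
    return [lowest[name] for name in names]


names = ["Tom", "Tom", "Tom", "John", "John", "Mary"]   # column D
scores = [100, 90, 70, 80, 60, 95]                      # column J (John/Mary values assumed)
column_i = min_per_name(names, scores)
print(column_i)
```

Because the result is computed for the whole column at once, a row inserted in the middle is picked up automatically, which is the behaviour the question is after.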
So I can do this elegantly with the formula =minifs(J:J,D:D,D2), and likewise with D3, D4, D5, D6, etc.
However, I have to manually drag the formula down the entire column. This is a problem because my colleagues often insert rows in between (and forget to copy and paste the formula into column I). Is there a way I can auto-populate the entire column like I could with an arrayformula?
Assuming your data is in A2:C, you can get the min or max of each row this way (you can also add a condition to the query):
=query(transpose(query(transpose(A2:C),"select " & "min(Col"&arrayformula(textjoin("),min(Col",,row(A2:C)-1))&")")),"select Col2")
https://docs.google.com/spreadsheets/d/1Ia05jywxlvT2amFDG4vQhYOd0lo68FKdOY733MzU-MQ/copy

google sheets, use formula output for next formula

I'm trying to CONCATENATE two cells in order to compare the results so that I can search by them. However, the values of the two CONCATENATE outputs are different, because one of the inputs comes from another formula.
Screen shots attached
I'm basically trying to compare the start time and channel number from A and B, with the data from G and H, so that I can update D with the relevant information in F (in the same format as A).
I first convert the epoch time to human-readable time, but when I try to CONCATENATE it with the channel number, I get a different value from when I do that with A and B.
formula for c2 =CONCATENATE(A2,B2)
formula for i2 =G2/86400000+date(1970,1,1)
formula for k2 =CONCATENATE(G2,H2)
As you can see, the values for c2 and k2 are different even though a2 and i2 look the same.
I've tried using CELL, INDEX, and INDIRECT but just can't seem to get it right, and I've tried various formatting options.
Hopefully I've explained this right. Any solution is welcome.
Raw data (CSV):
START ,CHANNEL,concat,end?,,EndDateTime epoc,startDateTime epoc,channel,converted start,converted end,concat
12:58:00 AM,10,,,,1520391600000,1520382480000,7,,,
12:28:00 AM,7,,,,1520395200000,1520384280000,10,,,
So you have a couple of issues here.
1. CONCATENATE(A2,B2) will never equal CONCATENATE(I2,H2) because the values in A2 (12:58) and B2 (10) do not equal the values in I2 (12:28) and H2 (7). I think you meant to compare A2,B2 to I3,H3.
2. A2 (12:58) does not equal I3 (12:58) either. You'll see this for yourself if you format both as dates or numbers. The date part of A2 is 12/30/1899, the default when you enter only a time in a cell. The date part of I3 is 3/7/2018, because you converted the exact date and time from the epoch value.
For the two concatenations to equal each other, you need to resolve both issues above. You can do this by adding a date to column A's values.
On another note, I think there are better ways of populating column D based on the data in column F. A simple VLOOKUP should do the trick once you resolve issue #2 above.
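To illustrate the second issue with numbers, here is a short Python sketch that reproduces the conversion formula from the question (G/86400000 + date(1970,1,1)) and compares it with a time-only value. The constant 25569 is the serial number of 1970-01-01 when day 0 is 12/30/1899; the helper names are assumptions for this example.

```python
SHEETS_EPOCH_OFFSET = 25569   # serial number of 1970-01-01 (day 0 is 1899-12-30)
MS_PER_DAY = 86_400_000


def epoch_ms_to_serial(ms):
    """Same arithmetic as =G2/86400000+date(1970,1,1)."""
    return ms / MS_PER_DAY + SHEETS_EPOCH_OFFSET


def time_only_serial(hour, minute):
    """Serial value of a time typed without a date: just the fraction of a day."""
    return (hour * 60 + minute) / 1440


converted = epoch_ms_to_serial(1520384280000)   # G3 in the sample data
typed = time_only_serial(0, 58)                 # "12:58:00 AM" typed into A2

print(int(converted))                            # whole days: the date part
print(abs(converted % 1 - typed) < 1e-9)         # the time-of-day parts agree
print(converted == typed)                        # the full values do not
```

Both cells display 12:58 AM when formatted as a time, but the converted value carries roughly 43166 whole days on top of the time fraction, which is why the concatenated strings differ.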

Google Spreadsheet sum which always ends on the cell above

How to create a Google Spreadsheet sum() which always ends on the cell above, even when new cells are added? I have several such calculations to make on each single column so solutions like this won't help.
Example:
In column B, I have several dynamic ranges which have to be summed. B1..B9 should be summed in B10, and B11..B19 should be summed in B20. I have tens of such calculations to make. Every now and then, I add rows below the last summed row, and I want them to be added to the sum. I add a new row (call it 9.1) before row 10, and a new row (let's call it 19.1) before row 20. I want B10 to contain the sum of B1 through B9.1 and B20 to contain the sum of B11:B19.1.
In Excel, I have the OFFSET function, which works like a charm. But how do I do it in Google Spreadsheet? I tried to use formulas like this:
=SUM(B1:INDIRECT(address(row()-1,column(),false))) # Formula on B10
=SUM(B11:INDIRECT(address(row()-1,column(),false))) # Formula on B20
But in Google Spreadsheet, all it gives is a #NAME error.
I wasted hours trying to find a solution; maybe someone can help?
Please advise
Amnon
You are probably looking for formula like:
=SUM(INDIRECT("B1:"&ADDRESS(ROW()-1,COLUMN(),4)))
Google Sheets' INDIRECT returns a reference to a cell or a range, while, from what I recall, Excel's INDIRECT always returns a reference to a single cell.
Since Google's INDIRECT does have a hard time when used inside SUM as one end of a cell reference, feed SUM the whole range to be summed as a single A1-notation string, e.g. "B1:BX".
You get the address you want in the same way as in Excel (note the "4", which requests a relative row/column reference; by default ADDRESS returns an absolute one):
ADDRESS(ROW()-1,COLUMN(),4)
Then use it to build the range string for SUM by concatenating it with the starting cell:
"B1:"&
and wrap the result in INDIRECT, which returns the range to be summed.
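The string that ends up inside INDIRECT can be sketched in Python as below. The helpers `column_letters` and `range_above` are hypothetical names; unlike the formula above, which hardcodes "B1:", this sketch derives the column letters from the column number, which is an assumption made to keep the example general.

```python
def column_letters(col):
    """Convert a 1-based column number to A1 letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def range_above(row, col, start_row=1):
    """A1 range from start_row down to the row just above the cell at (row, col)."""
    letters = column_letters(col)
    # Equivalent of "B1:" & ADDRESS(ROW()-1, COLUMN(), 4) for a cell in column B
    return f"{letters}{start_row}:{letters}{row - 1}"


print(range_above(10, 2))                 # formula sitting in B10
print(range_above(20, 2, start_row=11))   # formula sitting in B20
```

Because the end of the range is recomputed from the formula's own row, inserting a row above the sum cell extends the range without editing the formula.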
REFERRING TO THE ANSWER BELOW from Druvision (I can't comment yet, and I didn't want to multiply answers)
Instead of making time-consuming formula corrections each time a row is inserted or deleted, to keep them all looking like:
=SUM(INDIRECT(ADDRESS(ROW()-9,COLUMN(),4)&":"&ADDRESS(ROW()-1,COLUMN(),4)))
You can spare one column in separate sheet for holding variables (let's name it "def"), let's say Z, to define starting points e.g.
in Z1 write "B1"
in Z2 write "B11"
etc.
and then use it as a variable in your sum by using INDEX:
SUM(INDIRECT(INDEX(def!Z:Z,1,1)&":"&ADDRESS(ROW()-1,COLUMN(),4))) - sums from B1 to calculated row, since in Z1 we have "B1" ( the 1,1 in INDEX(...,1,1) )
SUM(INDIRECT(INDEX(def!Z:Z,2,1)&":"&ADDRESS(ROW()-1,COLUMN(),4))) - sums from B11 to calculated row, since in Z2 we have "B11" ( the 2,1 in INDEX(...,2,1) )
please note:
Separate sheet named 'def' - you don't want row insert/delete influence that data, thus keep it on side. Useful for adding some validation lists, other stuff you need in your formulas.
"Z:Z" notation - whole column. You said you had a lot of such formulas ;)
Thus you preserve flexibility of defining starting cell for each of your formulas, which is not influenced by calculation sheet changes.
By the way, wouldn't it be easier to write a custom function/script summing up all rows above the cell? If you feel like writing JavaScript, from what I recall, Google Spreadsheet now has a nice script editor. You can make a function called e.g. sumRowsAboveMe() and then just use it in your sheet like =sumRowsAboveMe() in a sheet cell.
Note: you might have to replace commas by semicolons
NOTE
After testing this answer, it will only work if the sum is in a different column due to a circular dependency error. Otherwise, the solution is valid.
It's a bit of algebra, but we can take advantage of Spreadsheets' lower right corner drag.
=SUM(X:X) - SUM(X2:X)
Where X is the column you are working with and X2 is your ending point. Drag the formula down and Sheets will increment the X2, thus changing the ending point.
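The subtraction trick can be verified with a few lines of Python. This is only a sketch: `sum_above` is a hypothetical helper that mimics the formula for a cell in another column at a given 1-based row, and the column values are invented.

```python
def sum_above(column, row):
    """Mimic =SUM(X:X) - SUM(Xrow:X) for a formula placed at `row` in a different column."""
    whole = sum(column)             # SUM(X:X)
    tail = sum(column[row - 1:])    # SUM(Xrow:X): row `row` and everything below it
    return whole - tail             # what is left is rows 1 .. row-1


column = [5, 3, 8, 1, 4]            # hypothetical values in X1:X5
above = [sum_above(column, r) for r in range(1, 6)]
print(above)
```

Each entry is the sum of the cells strictly above that row, so the formula written with X2 on row 2 yields X1 alone, and dragging it down grows the range one row at a time.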
*You mentioned that you had tens of such calculations to make. So in order to fit your exact need, we would subtract your last summation to get that "middle" range that we wanted.
e.g.
B1..B9 should be summed on B10, and B11..B19 should be summed on B20
Because of the circular dependency error mentioned earlier, I can't solve it exactly and put the sum on the same line, but this could work in other cases where the sum needs to be stored in a different column.
=SUM(B:B) - SUM(B10:B) // Formula on C10 (Sum of B1..B9)
=SUM(B:B) - SUM(B20:B) - C10 - B10 // Formula on C20 (Sum of B11..B19)
This is based on #PsychoFish, here is the solution:
=SUM(INDIRECT(SUBSTITUTE(ADDRESS(1,COLUMN(),4),"1","")&"3:"&ADDRESS(ROW()-1,COLUMN(),4)))
Simply replace the "3:" for the row to start sum.
#PsychoFish is correct but cannot be dragged and copied since the column is literal and hard coded, and #Druvision was in the right direction but was wrong... basically ended up with the same issue of having to re-enter the ranges and then sliding the formulas over and over.
You guys are making this harder than you have to. I just leave a couple of empty rows above my "sum" row (you can format them to be filled with color or something to keep them from being inadvertently used), then just add new rows directly above those special rows.
Agree with what user7255446 said that everyone is overcomplicating. Keep one row blank before your sum row. And then whenever you want to insert a new row, click on your blank row and use "Insert row ABOVE" instead of "insert row below". Your sum formula will automatically adjust.
Example: I want to sum from B1 to B19. I leave row 20 blank. In cell B21, put =SUM(B1:B20). Then if you ever need to insert a new row, click on row 20 and choose "Insert row above". The sum formula automatically changes to =SUM(B1:B21) for you. And of course your sum cell is now B22.
General syntax:
=SUM(INDIRECT(cell_reference_as_string1 &":"& cell_reference_as_string2))
with for example:
cell_reference_as_string1 = ADDRESS(ROW(),COLUMN(),4)
cell_reference_as_string2 = ADDRESS(ROW()-1,COLUMN(),4)
I like how #abernier describes the general solution. So far, only the letter-based A1 notation (A being the first column, 1 being the first row) has been used. It keeps confusing me, especially when counting how many columns lie to the left of another column. I like the number-based R1C1 notation much better. To use R1C1 notation with INDIRECT, you need to pass FALSE like so:
=SUM(INDIRECT("R1C"&COLUMN()&":R"&(ROW()-1)&"C"&COLUMN(), FALSE))
I hope you find that helpful, too.
OFFSET() can be used/abused for this purpose. Give it the absolute address of the top left of the range, 0 and 0 for the row/column offsets, and the height/width of the range. Let OFFSET() be the argument to SUM(), SUMIF(), etc.
ROW() and COLUMN() are handy when computing the desired height/width. Be sure to remember to subtract one to exclude the current row/column, or else you're liable to end up with a circular reference. If you have header rows/columns, subtract for them too.
For example, to sum everything from A2 down, excluding the current row, try:
=SUM(OFFSET($A$2,0,0,ROW()-2,1))
To sum everything to the left of the current cell, wherever it may be, try:
=SUM(OFFSET(INDIRECT("RC1",FALSE),0,0,1,COLUMN()-1))
Now let's flip things upside down, to show that this works in the other direction. Suppose you want to sum the B column, starting below the current row, until (and including) row #10. Try this:
=SUM(OFFSET($B$10,ROW()-9,0,10-ROW(),1))
You can avoid negative offsets, while still summing column B:
=SUM(OFFSET(INDIRECT("RC2",FALSE),1,0,10-ROW(),1))
Remove the "2" to instead sum the current column:
=SUM(OFFSET(INDIRECT("RC",FALSE),1,0,10-ROW(),1))
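To double-check that the negative-offset and positive-offset variants cover the same cells, here is a small Python sketch. It only models the row arithmetic of OFFSET for a single column with a positive height; the function names are assumptions for this example.

```python
def offset_rows(anchor_row, row_offset, height):
    """1-based row numbers covered by OFFSET(anchor, row_offset, 0, height, 1)."""
    first = anchor_row + row_offset
    return list(range(first, first + height))


def rows_negative_offset(current_row):
    # =SUM(OFFSET($B$10, ROW()-9, 0, 10-ROW(), 1)): anchored at row 10, moved upwards
    return offset_rows(10, current_row - 9, 10 - current_row)


def rows_positive_offset(current_row):
    # =SUM(OFFSET(INDIRECT("RC2",FALSE), 1, 0, 10-ROW(), 1)): anchored at the current row
    return offset_rows(current_row, 1, 10 - current_row)


for r in range(1, 10):
    # Both variants should select the rows below the current one, down to row 10
    print(r, rows_negative_offset(r), rows_positive_offset(r))
```

For a formula in row 4, both variants select rows 5 through 10, which matches the stated goal of summing from below the current row up to and including row 10.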
(Credit to Tom Sharpe, who commented above.) INDEX() can be used in a range expression. You might prefer this over OFFSET(), so I'm putting it here. The following sums everything from G1 down to the row above the current:
=SUM(G1:INDEX(G:G,ROW()-1))
Here's how I do it.
This formula does not require you to edit or enter anything about the particular column you would like to sum
=SUM(INDIRECT(CONCATENATE(address(1,column(),4),":",LEFT(address(1,column(),4),1))&ROW()-1))
The answer by #PsychoFish led me in the correct way.
The only issue was that I had to rewrite the formula for each column and each sum. So here is the improved formula, which sums the previous 9 cells in the same column, without hardcoding the column or row numbers:
=SUM(INDIRECT(ADDRESS(ROW()-9,COLUMN(),4)&":"&ADDRESS(ROW()-1,COLUMN(),4)))
The only remaining issue is that I have to rewrite the formulas if someone adds or deletes a row. In that case I should change 9 to 10 or 8 correspondingly.
