Google sheets query to remove header not working with offset - google-sheets

Can't figure out why some of these queries work and some don't. Just trying to build a table of my own
=query(TRANSPOSE(ImportHtml(C7, "table",1)),"select * limit 1 offset 1")
=query(TRANSPOSE(ImportHtml(C2, "table",1)),"select * limit 1 offset 1")
Here C2 = http://www.vitalmtb.com/product/compare/2819 and C7 = http://www.vitalmtb.com/product/compare/2775
The ones not working bring back the row of headers, when I want to remove the headers and leave the data.

The query formula has a third, optional parameter: the number of header rows. If it's not provided, a guess is made as to which rows are headers (usually the top one is). In your first example, its guess is that there is 1 header row.
The header row is always a part of the array returned by query, which uses it to label the returned columns (unless you override that by providing different headers with label). This is why you can't get rid of it by changing offset: the offset determines which data rows to return; the header row is present regardless.
If you want to get rid of headers, set the third parameter to 0, so that the headers are treated as data. Then offset will skip past them:
=query(TRANSPOSE(ImportHtml(C2, "table", 1)), "select * limit 1 offset 1", 0)
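To illustrate the explanation above, here is a minimal Python sketch of the behavior as described in this answer. It is a toy model, not Google's implementation: the function name query_limit_offset and the sample table are made up for illustration.

```python
def query_limit_offset(rows, limit, offset, headers):
    """Toy model of QUERY's "limit ... offset ..." with a header-row count.

    The first `headers` rows are always returned; limit and offset apply
    only to the remaining data rows (assumption based on the answer above).
    """
    header_rows = rows[:headers]
    data_rows = rows[headers:]
    return header_rows + data_rows[offset:offset + limit]


# Hypothetical stand-in for the transposed imported table.
table = [
    ["Model", "Bike A"],
    ["Price", "1000"],
    ["Weight", "12"],
]

# headers=1: the first row comes back no matter what the offset is.
with_header = query_limit_offset(table, limit=1, offset=1, headers=1)

# headers=0: the first row is ordinary data, so offset 1 skips it.
without_header = query_limit_offset(table, limit=1, offset=1, headers=0)

print(with_header)
print(without_header)
```

In this model, only the headers=0 call returns a single data row without the header row, which matches the fix of passing 0 as the third parameter.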

Related

Enumerate rows in google sheet

I need to enumerate rows in a google sheet to use as simple unique ids. When I add a new row, I want it to get assigned the next number that hasn't been used. The problem is that I need to be able to delete a row, and not have any of the ids change. So if I had rows 1 through 5 enumerated, and I deleted row 3, I would expect to have this:
DATA ID
A 1
B 2
D 4
E 5
I can easily make a function that enumerates numbers for each row, but I don't know how to make that number immutable once created the first time. Also, if I were to delete the last row (ID 5 above) and then add another row, I don't know how to ensure that the new row's id would be 6 instead of a new 5. Thank you.
In my case all I want is to be sure to have unique numbers that won't repeat even if rows or data get added or subtracted. So I just used this formula. In my case I only had a few hundred rows, but you could increase the "1000" to whatever power of ten is going to be bigger than your total number of rows:
=int((now() - DATE(2021, 7, 19)) * 86400) * 1000 + row()
As @MattKing pointed out, this will recalculate if you do something like insert a column before this function, or even just reopen the spreadsheet. So you can tweak the cell to be self-referential. It's not really recursive, but you do have to turn on "Iterative Calculation" for this to work. Go to:
File -> Spreadsheet Settings -> Calculation -> set "Iterative Calculation" to "On". Then to be safe, you can set the "Max number of iterations" to 1 since this isn't actually recursive. Then use this cell formula, but change both occurrences of the cell reference (AL11 in the example below) to whatever cell this formula is in:
=IF(VALUE(AL11) > 0, AL11, INT((NOW() - DATE(2021, 7, 19)) * 86400) * 1000 + ROW())
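The arithmetic behind these two formulas can be sketched in Python. This is only an illustration of the calculation, assuming the same reference date and factor as above; the names make_id and keep_or_create are hypothetical, and the sketch does not reproduce how Sheets recalculates cells.

```python
from datetime import datetime

EPOCH = datetime(2021, 7, 19)  # same reference date as DATE(2021, 7, 19)
ROW_FACTOR = 1000              # must be larger than the highest row number


def make_id(now, row):
    """Whole seconds since EPOCH, shifted left to make room for the row number."""
    seconds = int((now - EPOCH).total_seconds())
    return seconds * ROW_FACTOR + row


def keep_or_create(existing, now, row):
    """Mirror of IF(VALUE(cell) > 0, cell, <new id>): keep an id once it exists."""
    if existing is not None and existing > 0:
        return existing
    return make_id(now, row)


first = make_id(datetime(2021, 7, 20, 0, 0, 1), 11)
print(first)  # 86401 seconds since the reference date, followed by row 011

# A later recalculation keeps the stored value instead of generating a new one.
later = keep_or_create(first, datetime(2022, 1, 1), 11)
print(later == first)
```

Two ids can only collide if they are created in the same second on the same row, which is why the row factor has to exceed the total number of rows.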

ARRAY formula to find last row to contain value in Google Sheets

I have a Google Sheet that is populated automatically via Zapier integration. For each new row added, I need to evaluate a given cell (Shipper Name) to find the last instance of that Shipper Name in prior rows and, if one exists, return the row number of that entry.
Example Data Sheet
I am trying to create a formula that simply looks at name in new row and returns the number of the most recent row with that name.
Formula needs to run as an Array formula so that the data auto populates with each new row added to the Sheet.
I have tried to use this formula, but when refactored as Array formula, it doesn't populate new values for new rows, it just repeats the first value for all rows.
From column J:
=sumproduct(max(row(A$1:A3)*(F4=F$1:F3)))
I need this formula refactored to be an Array formula that auto populates all the cells below it.
I have tried this version, but it doesn't work:
=ArrayFormula(IF(ISBLANK($A2:$A),"",sumproduct(max(row(A$1:A3)*($F4:$F=F$1:F3))))
A script (custom function maybe?) would be better.
Solution 1
Below is a formula you can place into the header (put it in J1, remove everything below).
It works much faster than the second solution and has no N² size restriction. It also works with empty shippers (the & "♥" is there for those empty ones): as long as column A:A has some value, the row will not be ignored.
={
  "Row of Last Entry";
  ARRAYFORMULA(
    IF(
      A2:A = "",
      "",
      VLOOKUP(
        ROW(F2:F)
          + VLOOKUP(
            F2:F & "♥",
            {
              UNIQUE(F2:F & "♥"),
              SEQUENCE(ROWS(UNIQUE(F2:F)))
                * POWER(10, INT(LOG10(ROWS(F:F))) + 1)
            },
            2,
            0
          ),
        SORT(
          {
            ROW(F2:F) + 1
              + VLOOKUP(
                F2:F & "♥",
                {
                  UNIQUE(F2:F & "♥"),
                  SEQUENCE(ROWS(UNIQUE(F2:F)))
                    * POWER(10, INT(LOG10(ROWS(F:F))) + 1)
                },
                2,
                0
              ),
            ROW(F2:F);
            {
              SEQUENCE(ROWS(UNIQUE(F2:F)))
                * POWER(10, INT(LOG10(ROWS(F:F))) + 1),
              SEQUENCE(ROWS(UNIQUE(F2:F)), 1, 0, 0)
            }
          },
          1,
          1
        ),
        2,
        1
      )
    )
  )
}
Details on how it works
For every row we use VLOOKUP to search for a special number in a sorted virtual range, to get the row number of the previous entry matching the current one.
A special number for a row is constructed like this: we get a sequential number for the current entry among the unique entries and append the current row number to it.
The right part (row number) of the resulting special numbers must be aligned between them. If the entry has sequential number 13, the row number is 1234, and there are 100500 rows, then the number must be 13001234. 001234 is the aligned right part.
Alignment is done by multiplying the sequential number by 10 to the power of (INT(LOG10(total number of rows)) + 1), which gives us 13000000 (from the example above). This approach is used to avoid LEN and TEXT: working with numbers is faster than working with strings.
The virtual range has almost the same special numbers in the first column and the original row numbers in the second.
Almost the same special numbers: they are just increased by 1, so VLOOKUP will stop at most one step before the number corresponding to the current string.
The virtual range also has some special rows (added at the bottom before sorting) which have all 0's as the right part of their special numbers (1st column) and 0 for the row number (2nd column). That is done so VLOOKUP will find them for the first occurrence of an entry.
The virtual range is sorted, so we can set the is_sorted parameter of the outer VLOOKUP to 1: that returns the last match that is less than or equal to the number being looked for.
& "♥" is appended to the entries so that empty entries will also be found by VLOOKUP.
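The lookup scheme described above can be sketched in Python, using bisect in place of the sorted VLOOKUP. This is an illustrative model under stated assumptions, not a translation of the formula: the function names, the default total_rows of 1000, and the sample names are made up, and a dict replaces the inner VLOOKUP over UNIQUE (so the "♥" suffix is not needed here).

```python
import bisect
import math


def special_number(seq, row, total_rows):
    """Sequential number shifted left so the row number fits in the right part."""
    scale = 10 ** (int(math.log10(total_rows)) + 1)
    return seq * scale + row


def previous_rows(names, first_row=2, total_rows=1000):
    """For each entry, return the row of the previous entry with the same name (0 if none)."""
    # Sequential number per unique name, in order of first appearance.
    seq = {}
    for name in names:
        seq.setdefault(name, len(seq) + 1)

    rows = range(first_row, first_row + len(names))

    # Virtual range: (special number + 1, original row), plus one
    # sentinel per unique name whose right part and row number are 0.
    virtual = [(special_number(seq[n], r, total_rows) + 1, r)
               for n, r in zip(names, rows)]
    virtual += [(special_number(s, 0, total_rows), 0) for s in seq.values()]
    virtual.sort()
    keys = [key for key, _ in virtual]

    result = []
    for n, r in zip(names, rows):
        target = special_number(seq[n], r, total_rows)
        # Last key that is less than or equal to the target, like is_sorted = 1.
        idx = bisect.bisect_right(keys, target) - 1
        result.append(virtual[idx][1])
    return result


print(special_number(13, 1234, 100500))
print(previous_rows(["Ann", "Bob", "Ann", "Ann", "Bob"]))
```

Because every key in the virtual range is one larger than the special number of its own row, the search for a row can never land on that row itself; it lands on the closest earlier row with the same name, or on the sentinel with row 0.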
Solution 2 - slow and has restrictions
But for a small enough number of rows this formula works (put it in J1, remove everything below):
={
  "Row of Last Entry";
  ARRAYFORMULA(
    REGEXEXTRACT(
      TRANSPOSE(QUERY(TRANSPOSE(
        IF(
          (FILTER(ROW(F2:F), F2:F <> "") > TRANSPOSE(FILTER(ROW(F2:F), F2:F <> "")))
            * (FILTER(F2:F, F2:F <> "") = TRANSPOSE(FILTER(F2:F, F2:F <> ""))),
          TRANSPOSE(FILTER(ROW(F2:F), F2:F <> "")),
          ""
        )
      ), "", ROWS(FILTER(F2:F, F2:F <> "")))),
      "(\d*)\s*$"
    )
  )
}
But there is a problem. The virtual range inside the formula is of size N², where N is the number of rows. For the current 1253 rows it works, but there is a limit beyond which it will throw an error about the range being too large.
That is the reason to use FILTER(...) and not just F2:F.
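To make the N² cost concrete, here is a rough Python sketch of what the second formula computes: one N-cell matrix row per entry, joined with spaces, from which the trailing number is extracted with the same regular expression. The function name and sample data are hypothetical, and the join stands in for the QUERY header trick.

```python
import re


def previous_rows_quadratic(names, first_row=2):
    """Pairwise model of Solution 2; builds N cells for each of N entries."""
    rows = range(first_row, first_row + len(names))
    # Like FILTER(..., F2:F <> ""): blank shippers are dropped up front.
    kept = [(r, n) for r, n in zip(rows, names) if n != ""]

    result = []
    for row_i, name_i in kept:
        # One matrix row: the row number of every earlier entry with the
        # same name, and an empty string everywhere else.
        cells = [
            str(row_j) if (row_i > row_j and name_i == name_j) else ""
            for row_j, name_j in kept
        ]
        joined = " ".join(cells)
        # The last number in the joined string is the most recent match.
        result.append(re.search(r"(\d*)\s*$", joined).group(1))
    return result


print(previous_rows_quadratic(["Ann", "Bob", "Ann", "Ann", "Bob"]))
```

The inner list has as many cells as there are kept entries, so memory and time grow with the square of the row count, which is the restriction mentioned above.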
Here is a significantly simpler way to get at the information you're interested in. (I think.) I'm mostly guessing about what you want because your question wasn't really about what you want, but rather about how to get something that you think would help you get what you want. This is an example of an XY problem. I attempted to guess based on experience at what you're really after.
This editable sheet contains just 3 formulas: two on the raw data sheet and one in a new tab called "analysis."
The first formula on the Raw data tab extracts a properly formatted timestamp using a combination of MMULT and SPLIT functions and looks like this:
=ARRAYFORMULA({"Good Timestamp";IF(A2:A="",,MMULT(N(IFERROR(SPLIT(A2:A,"T"))),{1;1}))})
The second formula finds the previous timestamp for that shipper and subtracts it from the current timestamp, giving you the time between timestamps. However, it only does this if the time is less than 200 minutes. If it is more than 200 minutes, it assumes that was a different shift for that shipper. It looks like this and uses a combination of LOOKUP() and SUBSTITUTE() to make sure it's pulling the correct timestamps. You can change the 200 value to something more appropriate if that makes sense for your data.
=ARRAYFORMULA({"Minutes/Order";IF(A2:A="",,IF(IFERROR((G2:G-1*SUBSTITUTE(LOOKUP(F2:F&G2:G-0.00001,SORT(F2:F&G2:G)),F2:F,""))*24*60)>200,,IFERROR((G2:G-1*SUBSTITUTE(LOOKUP(F2:F&G2:G-0.00001,SORT(F2:F&G2:G)),F2:F,""))*(24*60))))})
The third formula, on the tab called analysis, uses query to show the average minutes per order and the number of orders per hour that each shipper is processing. It looks like this:
=QUERY({'Sample Data'!F:I},"Select Col1,AVG(Col3),COUNT(Col3)/(SUM(Col3)/60) where Col3 is not null group by Col1 label COUNT(Col3)/(SUM(Col3)/60)'Orders/ hour',AVG(Col3)'Minutes/ Order'")
Hopefully I've guessed correctly at your real goals. Always do your best to explain what they are rather than asking for only a small portion that you think will help you get to the answer. You can end up overcomplicating your process without realizing it.

Insert duplicate rows (based on one cell-value) in google sheets

So I have this formula that copies rows (with data in col A) into a new range. Column A contains a number indicating how many duplicates the row should yield in the result. The output rows also get sorted based on the value in column A.
=sort(arrayformula(vlookup(
transpose(split(query(rept(row(D2:D)&" ",A2:A),,9^9)," ")),
arrayformula({row(D2:D),{A2:A,D2:F}}),
{2,3,4,5},
0)),
1,
TRUE)
This is not exactly what I need, though. Instead of having a single value in each cell of column A that indicates how many times the row should be duplicated, I need a text string like “2,3,5” in every cell of that column, where the individual numbers in the string indicate the positions of the row in the output (rather than the number of times the row should be copied).
For example, in the output I want the row with the string “2,3,5” to be copied three times: one copy should be 2nd from the top, another 3rd from the top, and the last one 5th from the top.
If I could have the A2:A part of this range {A2:A,J2:N} instead contain the matching values for split(A2:A) I think it would do what I want.
This is a copy of my google sheet. Hopefully it's possible to understand what it is I'm trying to achieve.
https://docs.google.com/spreadsheets/d/1sp5DRBwFP0-aG-FvjUPKBmyylz0WoPB63ASOOdUGdnI/edit?usp=sharing

Cannot append uniformly using Google Sheet Api V4 with empty values

I am trying to figure what combination of options for the (Google Sheet Append Api) I need to uniformly append a row of data. My data could include empty values or not.
Problem:
I cannot seem to figure out a way to guarantee my row to append to a sheet starting from A1 every single time.
Here are the different behaviours I have seen:
1) { values: [["", "", 3]] }
If I append this row 3 times, the rows will cover the ranges A1:C1, C2:E2, E3:G3. What I really want is A1:C3.
2) { values: [["3", "", 3]] }
If I append this row 3 times, the rows will do something similar to 1) where it will cover the ranges A1:C1, C2:E2, E3:G3. Again, I want A1:C3.
3) { values: [["3", false, 3]] }
If I append this row 3 times, everything works the way I want it to. It covers the ranges A1:C3.
What options do I specify in my append api request to guarantee that my rows will always start at A1 (even if the first n columns are empty)?

Get column headers with query language

I'm using the query language to query data from spreadsheet.
I would like to retrieve the first row (column headers). How do I do that?
Currently I'm using: select * where ( A = -1 )
The data in column A is never equal to -1, so it returns only the column headers.
Is there a straightforward way to do this?
You can use query(A:Z, "select * limit 0", 1) meaning: select all, return at most 0 rows. The result is that only the header row is returned (the 3rd parameter is to make it clear there is 1 header row).
But it's not really natural to use query for this purpose. The function array_constrain is provided for the purpose of truncating an array of data. For example,
=array_constrain(A:Z, 1, 1e7)
returns the first row of the given array. (Since no limit on the number of columns is needed, I gave 1e7 = 10,000,000 as the maximal number of columns. A spreadsheet can't even have that many cells.)
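As a rough analogy, truncating an array to its first row looks like this in Python. The helper array_constrain below is a hypothetical stand-in written for illustration, and the sample data is made up; it only models the "keep at most this many rows and columns" idea.

```python
def array_constrain(rows, num_rows, num_cols):
    """Keep at most num_rows rows and at most num_cols cells from each row."""
    return [row[:num_cols] for row in rows[:num_rows]]


data = [
    ["Name", "Qty"],
    ["apple", 3],
    ["pear", 5],
]

# A very large column limit simply means "all columns".
headers = array_constrain(data, 1, 10**7)
print(headers)
```

As with the formula, the oversized column limit is harmless because the truncation never pads the result; it only cuts.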

Resources