How to handle data manipulation when using importrange() in Google Sheets? - google-sheets

I am working on speeding up a workbook in Google sheets that is using importrange(). The purpose of the entire workbook is to import data from a mastersheet and then allow us to manipulate it the way we want to outside of the mastersheet.
The problem: because importrange() doesn't allow you to directly manipulate cells we have Sheet1 acting as the import sheet; it doesn't get touched. Sheet2 is where we do the manipulating but, it was literally just taken as a copy of Sheet1, so it is also using importrange(). This bogs down the entire workbook and makes manipulations very slow.
I am thinking of using !Sheet1A1... and copying that to all the cells in the manipulation sheet, but my concern is that this will still bog down the workbook. There is potential that the import data could grow as large as 10k+ rows, and I'm only at about half that currently and running into this problem. Outside of that, I'm not sure what else there is to try.

The QUERY function can help here and there are some great resources online.
=importrange(spreadsheet_url, range_string)
a typical example is:
=importrange("https://docs.google.com/spreadsheets/d/xxxxxxxxxxxxxxxxxxxx","Sheet1!A:Z")
You can wrap a QUERY function around this to manipulate your data.
QUERY is like a version of SQL and very powerful. It's in the format:
=QUERY({},"",1)
Your data range importrange("https://docs.google.com/spreadsheets/d/xxxxxxxxxxxxxxxxxxxx","Sheet1!A:Z") would go within {}.
Then within the "" part of the query, you could write your parameters for manipulating the data.
Example:
select Col1,Col4,Col5 where Col1 is not null and Col6 contains 'hello' order by Col1,Col7 desc label Col1 'new name 1',Col4 'new name 4'
The select bit allows you to specify specific columns from your importrange. If you want the all, then you could use select *.
The where item is where you build up your criteria using various or or and parameters.
is not null is another way of saying you want rows that have data.
contains is useful. You can also have matches, starts with, ends with and like. like can use wildcards %, so where Col1 like '%the%' would find 'hello there'.
order by is ascending unless you add desc, ie. order by Col1,Col2,Col4,Col5 desc,Col3.
label allows you to rename the columns, so let's say input column 1 is called 'Name1' and input column 2 is 'Name2' and you want them to be 'First name' and 'Surname, you would use label Col1 'First name', Col2 'Surname'.
If you like QUERY there are other powerful clauses, and they run in this order within the QUERY(range,"clauses",0):
select
where
group by
pivot
order by
limit
offset
label
format
options
One small point which you may come across, when you use importrange to get your data you need to reference the columns as Col1,Col2,Col3 within the QUERY.
If, however, your range is already in the same sheet (same or different tab), then you would reference column letters instead, eg. select A,B,C where A is not null order by A desc.
To make it more consistent and use the Col1,Col2,Col3 notation, you would put your internal range in an array {}.
QUERY(Sheet1!B:F,"select B,C,D where F is not null order by B,C",0)
would become:
QUERY({Sheet1!B:F},"select Col1,Col2,Col3 where Col5 is not null order by Col1,Col2",0)
{Sheet1!B:F} is smart because you can add columns in front of this range without needing to change your clause. So adding one column in front of Sheet1, would result in:
QUERY({Sheet1!C:G},"select Col1,Col2,Col3 where Col5 is not null order by Col1,Col2",0)
The other method would need you to alter your clause from:
QUERY(Sheet1!B:F,"select B,C,D where F is not null order by B,C",0)
to:
QUERY(Sheet1!C:G,"select C,D,E where G is not null order by C,D",0)
It's a lot to take in, but definitely worth persuing!

Related

Google Sheets Query with Variables in the data section

I am trying to create a health related spreadsheet that has a lot of data - a lot of which isn't relevant to this question so I've simplified it. There is a column for each type of pain where you write on a scale of 0-10 how intense your pain was, and another column for any relevant notes. The data is broken up into named ranges to make it easier to display on different tabs (HeadData = Head Pain, ChestData = Chest Pain, etc. - 15 named ranges in total.)
One of the tabs I'm working on has a table where you are viewing only the specific named range, in this case HeadData.
=query({HeadData}, " Select * where Col1 is not null ",1)
This works perfectly, but I want to replace {HeadData} with a reference cell to a drop menu so you can select the specific pain area column you want to be displayed.
If I put the reference cell in G1 with a drop down list of the named ranges and select ChestData and try to do
=query({&G1&}, " Select * where Col1 is not null ",1)
It is only picking up G1 (ChestData) as a string and not the actual named range.
So my question is, is there a way to make a drop menu containing named ranges that turn into actual sets of data and not strings when placed in the data section of the query?
Here is my spreadsheet, any help is appreciated. Thanks!
https://docs.google.com/spreadsheets/d/1CcuSV2bbfxsUPPkmj-fru2yYYmmtpXk9LKEF85sxVUw/edit?usp=sharing
You can use INDIRECT for this.
INDIRECT
Returns a cell reference specified by a string.
Change your formula to
=query({INDIRECT(G1)}, " Select * where Col1 is not null ",1)
The INDIRECT function will convert the string in G1 to a cell reference and then the rest of your formula will query the relevant named range.

Only apply complex arrayformula() to rows with certain value in dataset

I have a quite complext formula (i mean that is complex to me) that Tom Sharpe helped me building to aggregate values and ordering them by months in a row(you can find the details in the original post but i think you'll only need the final formula which is:
=ArrayFormula(mmult(sequence(1,counta(A2:A),1,0), if((C2:index(C:C,counta(C:C))<=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0)))* (D2:index(D:D,counta(D:D))>=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0))),E2:index(E:E,counta(E:E)),0)))
and here is the result -> [J1:U1]
Now, what i would need to do as the final step is to be able to group data by a certain label (John or Jane in the example) on separate rows, but mantaining the order/aggregate by month on the row. On the example, this would mean having one row with only 'John' data and below, one with 'Jane' values.
I am struggling to understand how to adapt the formula to do so.
I have tried:
Using another array to first return a list of these labels with query(unique()) or something like that, but then i struggle looping in it with the other formula.
A bit more simplistic but it could work after all: on the 1st row (the cell next to where the data will be returned) writing 'John', on row 2 'Jane' and then using filter() to only pull data that matches. The 'John, Jane' value is for the example but the real labels won't be that many, the list of labels don't need to be dynamic.
The thing with these solutions is that they work when used separately, but i can't figure out how to nest this in the first arrayformula() that Tom helped me with...As i am just beginning with the google sheets queries.
I don't really need necessarily the complete formula/code but maybe just directions or tips to visualize the way i could solve this.
Thanks to all who might contribute
With hindsight I might have done better to go down the route of using a query to calculate the sums on my previous answer rather than Mmult.
This uses the same method as before to create a 2d array of amounts vs dates (going across) and individuals (going down). Then it uses Textjoin to generate a query to group by name with the required number of columns.
=ArrayFormula(query({A2:A,if((C2:C<=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0)))* (D2:D>=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0))),E2:E,0)},
"select Col1,sum(Col"&textjoin("),sum(Col",,sequence(1,datedif(G2,H2,"M")+1,2))&") where Col1 is not null group by Col1"))
This is the generated query
select Col1,sum(Col2),sum(Col3),sum(Col4),sum(Col5),sum(Col6),sum(Col7),sum(Col8),sum(Col9),sum(Col10),sum(Col11),sum(Col12),sum(Col13) where Col1 is not null group by Col1
Ideally there should be an extra section saying label sum(Col2) '' etc. to suppress the 'Sum' headers.
=ArrayFormula(query({A2:A,if((C2:C<=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0)))* (D2:D>=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0))),E2:E,0)},
"select Col1,sum(Col"&textjoin("),sum(Col",,sequence(1,datedif(G2,H2,"M")+1,2))&") where Col1 is not null group by Col1 label sum(Col" & textjoin(") '', sum(Col",,sequence(1,datedif(G2,H2,"M")+1,2)) & ") ''"))

How to concatenate strings and select the same columns multiple times using Query (Google Sheets)

I am trying to generate a table for the Gantt chart. Table should have this format:
https://developers.google.com/chart/interactive/docs/gallery/ganttchart#data-format
So,I need task name the same like taks ID, but in Query I can't use Col1 twice (I get error)
=QUERY({Tab1;Tab1};"select Col1,Col1,Col5,Col16,Col17 WHERE Col16>now() ORDER BY Col5 DESC,Col17 ";0)
The second point is that it is also not possible to merge two columns as a result, so it doesn't work:
=QUERY({Tab1;Tab1};"select Col1+Col7,Col1,Col5,Col16,Col17 WHERE Col16>now() ORDER BY Col5 DESC,Col17 ";0)
Here is my data and 2 results what I neet to get by QUERY
https://docs.google.com/spreadsheets/d/1CZYgfYo6oIeONZOH6ZR5rOW615HuH4ICaoe7lj0dapw/edit#gid=0
These are such trivial things in a real SQL, is there no way to do it somehow straightforwardly in Google Query? So far I have found a combination of QUERY and ARRAYFORMULA but then there are very complicated queries - mutants. Not easier?
You don't need Query, just Arrays.
You will get the first result from this code:
={ARRAYFORMULA(B3:B&" "&C3:C)\A3:A}
The second result from this code:
={A3:A\A3:A\B3:B1}
Based on your example I assume that you are not using US spreadsheet settings.
If so formulas have to be change to:
First:
={ARRAYFORMULA(B3:B&" "&C3:C),A3:A}
Second:
={A3:A,A3:A,B3:B}
Link to working example: https://docs.google.com/spreadsheets/d/1eMkOkyFwvDeYSy-8UlhQum4OWcb-4WJqGxy_CXM8pVs/edit?usp=sharing
I see that in your real sheet you would like to compare some data with now(). You can easily do this using array I propose as a source to Query. There will you have something like this (of course now it will not work - its only an example - an array have only 2 columns, not 15):
=QUERY({ARRAYFORMULA(B3:B10&" "&C3:C10)\A3:A10};"select * where Col15>now()";0)
About Query - you can't perform arthmetic operations on column containing strings. Look at the documentation: https://developers.google.com/chart/interactive/docs/querylanguage#arithmetic-operators
"I can't use Col1 twice (I get error)"
You can duplicate your indata that to solve this.
QUERY({Tab1 Column 1\Tab1 Column 1};"Select Col1, Col2......"
"Tab1 Column 1" is now Col1 and Col2
"The second point is that it is also not possible to merge two columns as a result, so it doesn't work:"
Yes, adding result of column is possible "select Col1+Col7......" is correct.

How to definitely use column names in Google Sheet Query

query function doesn't let you use column names; you have instead to use letters if you refer to a cell range or ColN if you refer to an array.
This is very annoying, most of all when you alter the queried table adding, deleting or exchanging columns.
I would like to use column names, like in a standard SQL query.
You can actually get around this by splitting the Query formula and using other formula's to automatically get the desired column names from a list.
For example if you have a table in range A1:E15 with headers "H1, H2, H3, H4, H5", and you'd like to only get columns H3 & H5:
Store the desired headers (H3 & H5) in another table/range as a list - lets say this range is G1:G2
Use MATCH formula along with TextJoin formula to generate an concatenated string like Col3, Col5
=TextJoin(", ",TRUE,ArrayFormula(IFERROR("Col"&MATCH(G1:G6,$A$1:$E$1,0),"")))
Lets say this was in cell H1
You can refer to this cell in your Query formula like below
=QUERY({A1:E20},"SELECT "&H1&" WHERE Col2='w'")
You can see it in action in below screenshot:
One solution could be recurring to some custom function created by a script, but when you have a not so small table you surely will incur in some error due to the exceeding computation time.
The most efficient solution (using only native functions) I found is as follows.
Suppose you are working on a sheet range, your column names are in row 1 and you want to refer to the column "salary"; you can obtain the column letter by
substitute(address(1,match("salary",A1:1,0),4),"1","")
Instead, if you are querying arrays, it is simpler; the string you need is
"Col"&match("salary",A1:1,0)
The final query could be not so elegant, but the efficiency is guaranteed:
query(
employeessheet!A:E,
"select "&substitute(address(1,match("salary",employeessheet!A1:1,0),4),"1","")&" where ...",
1)

Combining multiple spreadsheets in one using IMPORTRANGE

I would like to aggregate the data of multiple spreadsheets into one spreadsheet.
Spreadsheet 1 has a Row of Strings A2:A500
Spreadsheet 2 has a Row of Strings A2:A500
Spreadsheet 3 is supposed to have a Row of both (Spreadsheet1!A2:A500 AND Spreadsheet2!A2:A500).
Duplicates shall not be handled differently. I would like them to appear as often as they appear in the different sheets.
Is it possible to do this without writing a script or using jQuery, e.g. by using IMPORTRANGE?
What does not work: I have tried using IMPORTRANGE as follows:
ARRAY{IMPORTRANGE("key-of-spreadsheet1","list!A2:A500"), IMPORTRANGE("key-of-spreadsheet2", "list!A2:A500")}
This causes an error.
You should be able to use a vertical array in the Spreadsheet 3:
={IMPORTRANGE("Sheet1Key","SheetName!A2:A500");IMPORTRANGE("Sheet2Key","SheetName!A2:A500")}
Of course, it is also possible to combine several IMPORTRANGE() functions with the QUERY() function, which gives us a greater control over the results we import.
For example, we can use such a construction:
=QUERY(
{
IMPORTRANGE("key-or-url-of-spreadsheet-1", "'sheet-name-1'!A2:Z100");
IMPORTRANGE("key-or-url-of-spreadsheet-2", "'sheet-name-2'!A2:Z100");
IMPORTRANGE("key-or-url-of-spreadsheet-3", "'sheet-name-3'!A2:Z100");
IMPORTRANGE("key-or-url-of-spreadsheet-4", "'sheet-name-4'!A2:Z100")
},
"SELECT * WHERE Col1 IS NOT NULL ORDER BY Col3 ASC"
)
###Explanation:
The above query removes blank lines from imported ranges:
SELECT * WHERE Col1 IS NOT NULL
and sorts ascending all data collected together in relation to the third column:
ORDER BY Col3 ASC
For descending, just use DESC in place of ASC.
Of course, we can also arrange any other criteria, or omit them displaying everything without modification:
"SELECT * "
###Note:
In order to use the above constructed query, we first need to call a single IMPORTRANGE() method for each of the spreadsheets we want to refer:
=IMPORTRANGE("key-or-url-of-spreadsheet-1", "'sheet-name-1'!A2:Z100")
We have to do this even if we refer to the same spreadsheet in which we write this formula, but for every spreadsheet it is enough to do it once.
This is to be able to connect these sheets and allow access to the sheets (to which we have the access rights anyway):
                                                    
After giving permission for all spreadsheets, we can use the above query.
I am also applying above given formula for getting data from multiple spreadsheet which is getting an error something is like IN ARRAY_LITERAL An array literal was missing values for one or more rows.
Easy fix: Apply the filter to the entire column / sheet instead of just the current selection. This will automatically update all of the filters to include new additions.

Resources