How to change specific values to be a range instead - google-sheets

I have a list of values in Google Sheets for example:
10
14
36
43
64
110
92
103
and I want to change it to a range of
0-20, 21-40, 41-80, 81-120
so that it outputs
2
1
2
3
(two values in the range 0-20, one value in the 21-40 range, two values in the 41-80 range, and three values in the 81-120 range.)

You can do it in one step with the Frequency function FREQUENCY(data, classes):
=frequency(A2:A10,{20,40,80,120})
Note that Frequency creates one count per class, plus an extra count for values which exceed the highest class value. You can suppress this if you want to, but it could be a useful check for outliers.
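For readers who want to check the binning logic outside the sheet, here is a minimal Python sketch of the counting rule described above. The helper name `frequency` is made up for illustration; it treats each class value as an inclusive upper bound and appends one extra slot for values above the highest class.

```python
from bisect import bisect_left

def frequency(data, classes):
    """Count values per class: one count per upper bound,
    plus a final count for values above the highest bound."""
    bounds = sorted(classes)
    counts = [0] * (len(bounds) + 1)
    for value in data:
        # bisect_left returns the index of the first bound >= value,
        # so a value equal to a bound is counted in that bound's class.
        counts[bisect_left(bounds, value)] += 1
    return counts

values = [10, 14, 36, 43, 64, 110, 92, 103]
counts = frequency(values, [20, 40, 80, 120])
print(counts)  # the last entry is the overflow count
```

With the question's data, the last entry stays at zero because nothing exceeds 120.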

=QUERY(ARRAYFORMULA({A1:A, IF(LEN(A1:A),
IFERROR(VLOOKUP(A1:A, {{0, "0-20" };
{21, "21-40" };
{41, "41-80" };
{81, "81-120" }}, 2), ),)}),
"select Col2, count(Col2)
where Col2 !=''
group by Col2
label count(Col2)''")
alternatives: https://webapps.stackexchange.com/a/123741/186471
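The VLOOKUP in the formula above relies on an approximate match against the lower bound of each range, and the QUERY then counts rows per label. A rough Python sketch of the same idea follows; `label_for` is a hypothetical helper and the bounds are copied from the formula.

```python
from bisect import bisect_right
from collections import Counter

lower_bounds = [0, 21, 41, 81]
labels = ["0-20", "21-40", "41-80", "81-120"]

def label_for(value):
    # Approximate match: pick the last row whose lower bound is <= value.
    index = bisect_right(lower_bounds, value) - 1
    # Values below the first bound have no match (the formula blanks these via IFERROR).
    return labels[index] if index >= 0 else None

values = [10, 14, 36, 43, 64, 110, 92, 103]
counts = Counter(label_for(v) for v in values)
print(counts)
```

Note that under this lookup rule anything above 120 would still be labelled "81-120", since only lower bounds are checked.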

Related

Calculate the average of sets of cells satisfying a certain criterion (AVERAGEIF)

This is my Items sheet
And this is my Values sheet
As can be seen from the table, alfa is associated with 20 40 60 80, beta with 30 40 70 80 and gamma with 50 60 70 80.
In the Items sheet, in cell B1 (next to the first item), I would like a formula (ARRAYFORMULA or similar) that generates the average value for each item. In my example it should be:
alfa -> 50 (that is: (20+40+60+80)/4 = 200/4)
beta -> 55 (that is: (30+40+70+80)/4 = 220/4)
gamma-> 65 (that is: (50+60+70+80)/4 = 260/4)
So the final result should be:
This is my googlesheet: example
P.S. For simplicity's sake, I used just columns A:C for items in the Values sheet. In the real case I have 10 columns, so I want to avoid specifying each one in the formula and use a range instead.
try:
=BYROW(A1:A3, LAMBDA(x, INDEX(QUERY(SPLIT(FLATTEN(Values!A1:C10&"​"&Values!D1:D10), "​"),
"select avg(Col2) where Col1 = '"&x&"'"), 2)))
update
=IFERROR(BYROW(A1:INDEX(A:A, MAX(ROW(A:A)*(A:A<>""))),
LAMBDA(x, INDEX(QUERY(SPLIT(FLATTEN(
FILTER(Values!A:C, Values!D:D<>"")&"​"&
FILTER(Values!D:D, Values!D:D<>"")), "​"),
"select avg(Col2) where Col1 = '"&x&"'"), 2))))
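The formulas above pair every item cell with its row's value (the FLATTEN/SPLIT step) and then average per item. Here is a small Python sketch of that logic. The row layout is an assumption, since the actual Values sheet is not shown: up to three item cells followed by one number, arranged so that the associations match the question.

```python
from collections import defaultdict

# Hypothetical Values sheet: item cells (columns A:C) then the value (column D).
rows = [
    ("alfa", "", "", 20),
    ("beta", "", "", 30),
    ("alfa", "beta", "", 40),
    ("gamma", "", "", 50),
    ("alfa", "gamma", "", 60),
    ("beta", "gamma", "", 70),
    ("alfa", "beta", "gamma", 80),
]

def average_per_item(rows):
    totals = defaultdict(lambda: [0, 0])  # item -> [sum, count]
    for *items, value in rows:
        for item in items:
            if item:  # skip empty cells
                totals[item][0] += value
                totals[item][1] += 1
    return {item: total / count for item, (total, count) in totals.items()}

averages = average_per_item(rows)
print(averages)
```

Because the loop walks every item cell in a row, adding more item columns needs no change to the function.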

Stacking Multiple Arrays In Query/Lambda Function

My question was inspired by this post in that I'm wondering if it's possible to create a formula to stack a dynamic amount of arrays based on a list (see below for clarification).
Sample Starting Data From Three Sources
ID       Amount
India    9
Delta    4
Hotel    8

ID       Amount
Alpha    1
Echo     5
Foxtrot  6

ID       Amount
Bravo    2
Gulf     7
Charlie  3
Desired final result:
ID       Amount
Alpha    1
Bravo    2
Charlie  3
Delta    4
Echo     5
Foxtrot  6
Gulf     7
Hotel    8
India    9
I can get the final result by using a query function as shown in this spreadsheet with a formula referencing the appropriate cells with fileID and range:
=Query({IMPORTRANGE(E2,F2);
IMPORTRANGE(E3,F3);
IMPORTRANGE(E4,F4)},"Select * where Col1 is not null order by Col1",1)
If you want to play with it in your own sheet, you could use this hard-coded formula, which is equivalent to the one above:
=Query({IMPORTRANGE("1WtI56_9mhyArMn_j_H4pZg8E0QdIBaKoJfAr-fDAoE0","'Sheet1'!A:B");
IMPORTRANGE("1HamomAuLtwKJiFEtRKTuEkt--YDTtWChUavetBcAcBA","'Sheet1'!A2:B");
IMPORTRANGE("1WtI56_9mhyArMn_j_H4pZg8E0QdIBaKoJfAr-fDAoE0","'Sheet2'!A2:B")},"Select * where Col1 is not null order by Col1",1)
My Question:
Is there a way to use a formula to generate this result based on the number of file IDs and ranges in columns E and F, so that if a fourth ID and range were added, the desired result would still appear in columns A and B? I suspect LAMBDA would work, but I am not as strong with it as I should be.
Unsuccessful attempt:
=lambda(someIDs,SomeRanges,IMPORTRANGE(someIds,SomeRanges))(filter(E2:E,E2:E<>""),filter(F2:F,F2:F<>""))
REALLY Bad Attempts:
=contact(Player()*1800-CoffeeBribe*Not(Home))
=company(theMaster(emailed)*(false))<>🐇
All helpful answers will be upvoted if not accepted. Thanks.
If the ranges are the same:
=LAMBDA(x, QUERY(REDUCE({"ID", "Amount"}, x,
LAMBDA(a, c, {a; IMPORTRANGE(c, "Sheet1!A2:B")})),
"where Col1 is not null", 1))
(E2:INDEX(E:E, MAX((E:E<>"")*ROW(E:E))))
If the ranges are not the same:
=INDEX(LAMBDA(x, y, QUERY(SPLIT(TRANSPOSE(SPLIT(QUERY(MAP(x, y,
LAMBDA(e, f, QUERY("♣"&FLATTEN(QUERY("♥"&TRANSPOSE(
IMPORTRANGE(e, f)),,9^9)),,9^9))),,9^9),
"♣")), "♥"), "where Col1 <> ' ' order by Col2", 1))(
E2:INDEX(E:E, MAX((E:E<>"")*ROW(E:E))),
F2:INDEX(F:F, MAX((F:F<>"")*ROW(F:F)))))
or:
=LAMBDA(x, QUERY(REDUCE({"ID", "Amount"}, x,
LAMBDA(a, b, {a; IMPORTRANGE(b, OFFSET(b,,1))})),
"where Col2 is not null", 1))
(E2:INDEX(E:E, MAX((E:E<>"")*ROW(E:E))))
In the old days, this would be solved by generating the formula as text:
={""; INDEX("={"&TEXTJOIN("; ", 1, "IMPORTRANGE("""&
FILTER(E2:E, E2:E<>"")&""", """&FILTER(F2:F, F2:F<>"")&""")")&"}")}
REDUCE accepts and returns arrays. We can use it to stack ranges. INDEX/COUNTA can be used to get the range needed without blanks. OFFSET can be used to get the next column's value.
=QUERY(
  REDUCE(
    {"Id","Amount"},
    E2:INDEX(E2:E,COUNTA(E2:E)),
    LAMBDA(
      a,e,
      {a;IMPORTRANGE(e,OFFSET(e,0,1))}
    )
  ),
  "Select * where Col1 is not null order by Col1",
  1
)
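For anyone less familiar with REDUCE, the stacking pattern is the same fold that exists in most languages. Here is a minimal Python sketch using the sample data from the question. The in-memory lists stand in for the IMPORTRANGE results, which is an assumption made purely for illustration.

```python
from functools import reduce

# Stand-ins for the three imported ranges (data rows only, no headers).
sources = [
    [("India", 9), ("Delta", 4), ("Hotel", 8)],
    [("Alpha", 1), ("Echo", 5), ("Foxtrot", 6)],
    [("Bravo", 2), ("Gulf", 7), ("Charlie", 3)],
]

header = [("ID", "Amount")]

# Start from the header and append each source in turn,
# like {a; IMPORTRANGE(...)} inside the LAMBDA.
stacked = reduce(lambda acc, table: acc + table, sources, header)

# Keep the header first and order the data rows by ID, like "order by Col1".
result = [stacked[0]] + sorted(stacked[1:], key=lambda row: row[0])
print(result)
```

Adding a fourth source to the list requires no change to the fold, which is the property the question asks for.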

Get max value from each day from a range of time using google sheets arrayformula

Example
There is range A that stores time and its value and it gets updated dynamically (not in the example).
From that range A, I want to make a dynamic range B of each day and its max value.
FILTER() doesn't work with ARRAYFORMULA, and I don't know whether QUERY does either.
You can do it just with a query:
=ArrayFormula(query({int(A4:A),B4:B},"select Col1,max(Col2) where Col2 is not null group by Col1 label Col1 'Date'"))
as long as you format the date column in the result appropriately.
EDIT
To remove the column labels, just put an empty string as below:
=ArrayFormula(query({int(A4:A),B4:B},"select Col1,max(Col2) where Col2 is not null group by Col1 label Col1 '',max(Col2) ''"))
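The query works by truncating each timestamp to its date (the `int(...)` part) and then taking the maximum per group. A short Python sketch of the same grouping is below. The readings are invented sample data, since the original sheet is not available.

```python
from datetime import datetime

# Hypothetical (timestamp, value) readings.
readings = [
    (datetime(2020, 1, 24, 8, 0), 0.333),
    (datetime(2020, 1, 24, 16, 0), 0.667),
    (datetime(2020, 1, 25, 9, 30), 1.0),
    (datetime(2020, 1, 25, 21, 0), 0.667),
]

def max_per_day(readings):
    best = {}
    for timestamp, value in readings:
        day = timestamp.date()  # drop the time part, like INT() on a date-time
        if day not in best or value > best[day]:
            best[day] = value
    return sorted(best.items())

daily_max = max_per_day(readings)
print(daily_max)
```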
You can try this (tested on Excel because your Google Sheet is blocked, but it should work perfectly)
=SUMPRODUCT(MAX(--($A$5:$A$31>D5)*--($A$5:$A$31<D5+1)*$B$5:$B$31))
This is how it works:
--($A$5:$A$31>D5) will return an array of 1 and 0 if cell value is higher than date reference. Say date reference is 24/01/2020, then the returned array will be {1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1}
--($A$5:$A$31<D5+1) will do the same, but only if the cell value is lower than D5+1 (the next day). So for 24/01/2020 we would obtain {1;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0}
First * will multiply both arrays, so {1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1} * {1;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0} = {1;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0}
Second * will multiply previous array by values in range B5:B31, that means {1;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0} * {0,333;0,667;0,667;0,667;0,667;0,667;0,667;0,667;0,667;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;0,667} = {0,333;0,667;0,667;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0}
MAX will get the max value from previous array (in this case, 0,667)
SUMPRODUCT has been added so we can work with arrays. Normal Max would not do it by itself.
NOTE: Please notice that my decimal separator is the comma and my argument separator is the semicolon, so you will probably need to adjust this according to your language settings.
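The masking steps explained above can be reproduced in a few lines of Python. This is only a sketch with made-up readings; `max_for_day` is a hypothetical helper that mirrors the two comparisons and the multiplication.

```python
from datetime import datetime, timedelta

def max_for_day(timestamps, values, day_start):
    day_end = day_start + timedelta(days=1)
    # 1 where the timestamp is after the start of the day, else 0
    after_start = [1 if t > day_start else 0 for t in timestamps]
    # 1 where the timestamp is before the start of the next day, else 0
    before_end = [1 if t < day_end else 0 for t in timestamps]
    # Multiplying the masks by the values zeroes out everything outside the day.
    masked = [a * b * v for a, b, v in zip(after_start, before_end, values)]
    return max(masked)

timestamps = [
    datetime(2020, 1, 24, 8, 0),
    datetime(2020, 1, 24, 16, 0),
    datetime(2020, 1, 25, 9, 30),
]
values = [0.333, 0.667, 1.0]
result = max_for_day(timestamps, values, datetime(2020, 1, 24))
print(result)
```

Two properties carry over from the formula: a reading exactly at midnight is excluded by the strict comparison, and because rows outside the day become 0, the result can never be negative.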
try:
=SORTN(SORT(FILTER({INT(A5:A), B5:B}, B5:B<>""), 1, 0, 2, 0), 9^9, 2, 1, 1)

Google Sheets: sum two columns by QUERY with some empty cells

I am trying to get the sum of two columns using the QUERY function in Google Sheets.
Col1 Col2 Col3
-----------------------
12 User1
23 44 creature
55 User1
14 User1
This works fine if there is at least one number in each column:
=QUERY(IMPORTRANGE('SomeURL';"Page!A1:C");
"select (sum(Col1) + sum(Col2)) where Col3 = 'User1'")
However, this query causes the error QUERY:AVG_SUM_ONLY_NUMERIC if all cells in one column of the result set are empty.
Col1 Col2 Col3
-----------------------
12 User1
23 44 creature
User1
14 User1
How can I get the sum of the columns using the QUERY function if the cells in one of the columns are sometimes empty?
=ARRAYFORMULA(SUM(query(IMPORTRANGE("url","page!A1:C6"),"select Col1,Col2 where Col3 = 'User1'")))
This should work for a simple sum. But I don't think there's a way inside QUERY to treat blanks as zero or to assume they are numbers. If you can actually import the range into the sheet (i.e., use it as helper columns), then you can use ARRAYFORMULA(Query ({filter (A1:B6*1,NOT(ISEMAIL(A1:A6))),C1:C6}, "select *.... You should either convert the blanks outside QUERY (by *1) or sum them outside QUERY. Alternatively, use a double QUERY with a double IMPORTRANGE, which would hurt performance.
You can SUM the Query, like this:
=SUM(QUERY(IMPORTRANGE("SomeURL";"Page!A1:C"); "select Col1, Col2 where Col3 = 'User1'"))
OR use SUMIF() twice, once for each column. It means 2 importranges, though, so it will probably be slower.
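The idea shared by these answers is to filter the rows first and do the addition outside the query, where blanks can simply count as zero. A minimal Python sketch of that logic follows. Which column each number sits in is an assumption, as the flattened table in the question does not show it.

```python
# Hypothetical rows (Col1, Col2, Col3); None marks an empty cell.
rows = [
    (12, None, "User1"),
    (23, 44, "creature"),
    (None, None, "User1"),
    (14, None, "User1"),
]

def sum_for_user(rows, user):
    # Filter on Col3 first, then add both number columns with blanks as 0.
    return sum((col1 or 0) + (col2 or 0) for col1, col2, name in rows if name == user)

total = sum_for_user(rows, "User1")
print(total)
```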

feeding categorical data to classifier

Suppose I have the dataset in the following format:
col1 col2 col3 col4 col5 (to be predicted)
12 13 4 primary 12
1 15 2 secondary 13
5 7 8 primary 18
14 12 44 college 6
col5 needs to be predicted for some test data using col1, col2, col3 and col4
During training, col1, col2 and col3 can be fed as-is in an array to the classifier, but how should col4 be fed?
I am aware that it is categorical and needs to be converted to a numeric type, but even after assigning numbers it will still be nominal.
So if primary=1, secondary=2 and college=3, the numbers 1, 2 and 3 can't be compared by magnitude, because they are still just labels with no numerical significance.
So how should I proceed after this step? Should they be normalized, or does something else need to be done?
You should use One Hot Encoding in such cases. Each possible categorical value becomes a new binary feature.
One Hot Encoding for Machine learning
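To make that concrete, here is a minimal pure-Python sketch of one-hot encoding applied to col4 from the question. The helper `one_hot` is written for illustration only; in practice a library encoder would usually be used instead.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 flag per category for each value."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded

col4 = ["primary", "secondary", "primary", "college"]
categories, encoded = one_hot(col4)

# Append the binary flags to the numeric columns (col1, col2, col3).
numeric = [(12, 13, 4), (1, 15, 2), (5, 7, 8), (14, 12, 44)]
features = [list(nums) + flags for nums, flags in zip(numeric, encoded)]

print(categories)
print(features)
```

Each row now carries three independent 0/1 columns instead of a single number, so no artificial ordering between primary, secondary and college is implied.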
