Sum specific numbers in a column (Google Sheets)

I have the following spreadsheet containing information about some courses.
I would like to sum the ECTS column, grouping the sums by type (General or Tech). For example, here I would end up with two cells: one containing the number 5 (the total ECTS for courses of type Tech) and another containing the number 27.5 (the total ECTS for courses of type General).
Can this be achieved somehow?

Boris, here is a general formula:
=query(A1:E,"select E,sum(C) where E<>'' group by E",1)
This would return a mini-table of the results.
To get just the two cells, modify the above formula to this
=query(A2:E,"select sum(C) where E<>'' group by E
order by sum(C) label sum(C) '' ",0)
You might need to sort them ("order by") by a different column to get them in the order you want; this sorts them by increasing value.
UPDATE:
Further explanation of the formula:
" where E<>'' " is effectively saying where (column) E is not equal to blank. It is important to note that query only works reliably with consistent data: only numbers or only text/strings in each column. It will still run if you have mixed data, but the results can be surprising, because query tends to treat a column as whatever the majority of its data is, either text or numeric.
So the above test will only work for a text column. If you are looking for numbers, you would not use the single quotes, just the equal sign. Eg. where E <> 0 would find rows with numbers not equal to zero in column E.
order by sorts the results by one or more columns, and can specify ascending or descending order.
And label sum(C) '' turns off the column header that the query adds when you include an aggregating function like "sum". Or it can be used to re-label the default heading to something else - label sum(C) 'Calculated Sum'
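For readers who think more easily in code, here is a minimal Python sketch of what the grouped query is doing. It is not part of the Sheets solution; the rows are hypothetical (the real sheet is not shown in the thread) and were chosen only so that the totals match the 5 and 27.5 from the question.

```python
from collections import defaultdict

# Hypothetical (ECTS, type) rows; the real spreadsheet data is not shown.
courses = [
    (5.0, "Tech"),
    (7.5, "General"),
    (10.0, "General"),
    (10.0, "General"),
    (2.5, ""),  # blank type: skipped, like "where E<>''"
]


def sum_by_type(rows):
    """Sum ECTS per non-blank type, like "select E, sum(C) where E<>'' group by E"."""
    totals = defaultdict(float)
    for ects, course_type in rows:
        if course_type != "":
            totals[course_type] += ects
    return dict(totals)


totals = sum_by_type(courses)
print(totals)                   # {'Tech': 5.0, 'General': 27.5}
print(sorted(totals.values()))  # increasing order, like "order by sum(C)": [5.0, 27.5]
```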
References:
Query - general usage
Query - detailed reference

Formula:
=query(B19:C23, "select sum(B) group by C", -1 )
Everything in the image should be self descriptive.

=SUMPRODUCT(--(B1:B8="a"), A1:A8)
Another way to do it is with the SUMPRODUCT function:
B1:B8="a" : tests each cell for the text "a"
-- : the double unary operator converts TRUE to 1 and FALSE to 0.
A1:A8 : Values to be taken and added
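The same mask-and-multiply idea can be sketched in Python. This is only an illustration of the arithmetic; the two lists are hypothetical stand-ins for B1:B8 and A1:A8.

```python
def sumproduct_where(labels, values, wanted):
    """Multiply a 0/1 mask by the values and add up the products."""
    # int(True) == 1 and int(False) == 0, which is the job "--" does in the formula.
    mask = [int(label == wanted) for label in labels]
    return sum(m * v for m, v in zip(mask, values))


# Hypothetical stand-ins for B1:B8 (labels) and A1:A8 (values).
labels = ["a", "b", "a", "c", "a", "b", "c", "a"]
values = [1, 2, 3, 4, 5, 6, 7, 8]

print(sumproduct_where(labels, values, "a"))  # 1 + 3 + 5 + 8 = 17
```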

Related

Range of Cells as Criteria for COUNTIFS Function in Google Sheets

I have a list that contains multiple items. However, each item has different variants.
I want to sum all occurrences of each item, regardless of the variant.
I am using the COUNTIFS function in Google Sheets but for the criteria, I want to input a range that is an array of strings.
=countifs(!A:A,("B:B"),!C:C,"small")
Where column B includes a list of different variant names and column C is sizing.
For example:
A        | B       | C
apples   | apples  | small
apples   | applez  | small
applez   | applees | small
appleees |         | small
oranges  |         | small
In this case I would want the result to = 4 because there were four total instances in column A where the criteria was met (using any string/row in column B) and since all sizes were small.
I was able to get the result I wanted using this formula however it is extremely cumbersome as there are many variants and they are constantly updated/changed concurrently in column B:
=countifs(A:A,"item variant 1",C:C,"small")
+countifs(A:A,"item variant 2",C:C,"small")
+countifs(A:A,"item variant 3",C:C,"small")
+countifs(A:A,"item variant 4",C:C,"small")
+countifs(A:A,"item variant 5",C:C,"small")
Seeking any improvement at all from there, I tried listing the variants within a range itself (making sure to use a semicolon for Google Sheets based on this answer) and couldn't get that to work either:
=countifs(A:A,{"item variant 1";"item variant 2";"item variant 3";"item variant 4";"item variant 5"},C:C,"small")
In the above case, it only counts instances of first variant mentioned in the range (in this case item variant 1).
Although you said the expected output is 4, I can only see 2 unique items matching, for a count of 3 (apples, applez):
oranges has zero matches
appleees is not a match for applees
Formula:
=COUNTA(IFERROR(filter(A:A,(C:C="small")*(REGEXMATCH(A:A,TEXTJOIN("|",1,B2:B))))))
Alternate formula:
=COUNTA(ARRAYFORMULA(IF(LEN(A2:A)*(C2:C="small"),IFERROR(vlookup(A2:A6,B:B,1,)),)))
Try this formula.
It assumes that your data is always arranged as {ITEMS,VARIANTS,SIZES}.
You can adjust the data range and the search criteria through the values in the last () (the current values are $A:$C and "small").
This formula:
uses BYROW() to iterate over the VARIANTS column,
uses QUERY() to filter the ITEMS column for matches, with VARIANT and FINDSIZE as the criteria,
uses COUNT() on the output of each QUERY() filter, then SUM()s the RESULTS of all the filters to get 3, since only apples and applez among the given VARIANTS have matches. (applees in VARIANTS has only 2 'e's while appleees in ITEMS has 3 'e's, which makes it a non-match.)
=ArrayFormula(
LAMBDA(RANGE,FINDSIZE,
LAMBDA(DATA,FINDSIZE,
LAMBDA(ITEMS,VARIANTS,SIZES,
LAMBDA(RESULTS,
SUM(RESULTS)
)(
BYROW(VARIANTS,LAMBDA(VARIANT,
LAMBDA(FILTER,
SUM(IFNA(FILTER,0))
)(QUERY({ITEMS,SIZES},"SELECT COUNT(Col1) WHERE Col1='"&VARIANT&"' AND Col2='"&FINDSIZE&"' LABEL COUNT(Col1)''",0))
))
)
)(INDEX(DATA,,1),QUERY({RANGE},"SELECT Col2 WHERE Col2 IS NOT NULL",0),LOWER(INDEX(DATA,,2)))
)(QUERY({RANGE},"SELECT Col1,Col3 WHERE Col1 IS NOT NULL OR Col3 IS NOT NULL",0),LOWER(FINDSIZE))
)($A:$C,"small")
)
If you are not concerned about being able to adjust the range and criteria in one place, here is a shorter version:
=SUM(BYROW(B:B,LAMBDA(VARIANT,IFNA(IF(VARIANT="",0,QUERY({A:A,C:C},"SELECT COUNT(Col1) WHERE Col1='"&VARIANT&"' AND Col2='small' LABEL COUNT(Col1)''",0)),0))))
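As a cross-check on the expected count, the matching logic can be sketched in Python using the sample data as read from the question. The function name is hypothetical, and the sketch assumes exact, case-sensitive string matching, which may differ from how a particular Sheets function compares text.

```python
def count_variant_matches(items, variants, sizes, wanted_size):
    """Count rows whose item equals any non-blank variant and whose size matches."""
    variant_set = {v for v in variants if v != ""}
    return sum(
        1
        for item, size in zip(items, sizes)
        if item in variant_set and size == wanted_size
    )


# Columns A, B and C from the question's sample (blank cells as "").
items = ["apples", "apples", "applez", "appleees", "oranges"]
variants = ["apples", "applez", "applees", "", ""]
sizes = ["small", "small", "small", "small", "small"]

print(count_variant_matches(items, variants, sizes, "small"))  # 3
```

The result is 3 rather than 4 because "appleees" (three e's) is not in the variant list, which only contains "applees".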

Counting the number of times a value appears more than once in a column AND where another conditon is met

Any help in figuring this out would be appreciated. I would like a formula to calculate the number of times a code number appears more than once AND where type is A.
A sample set of data looks like the following:
In this case the formula should return 1, as there is one case of a repeated code number (1) where type is (A): the first row and the last row.
Would the formula be any different if I also had a third column and wanted that to be a certain value as well? Again, with the test data below, I would want this to return 1 when counting the number of times any code number appears more than once where type=A and subtype=C:
I have started with the following, which identifies the number of unique combinations in columns A and B, but I can't find any way to only return where a particular combination appears more than once:
=COUNTUNIQUE(IFERROR(FILTER(A2:A,B2:B="A"),""))
I have tried the following but it doesn't return correctly:
=COUNTUNIQUE(IFERROR(FILTER(A2:A,B2:B="A",COUNTIF(A2:A,A2:A)>1)))
Been trying to figure this one out for a while with no success.
Thank you
You can try this (TABLE = the range corresponding to your dataset, including the header row):
=query(query(transpose(query(transpose(TABLE),,9^9)),"select Col1,count(Col1) where Col1 contains 'A' group by Col1",1),"select Col2-1 where Col2>1 label Col2-1 ''")
What we are doing is to concatenate the Code number & type columns into one using the TRANSPOSE/QUERY/TRANSPOSE...9^9 hack, querying it again to make a temporary table of each group against its count for those groups which meet the criteria, then finally subtracting one from each group count and only returning an answer if there were groups with count>1 to begin with. You will get multiple results if multiple groups satisfy the count>1 criteria.
To add the subtype column to the formula as per the second question, change TABLE to suit, then change the inner QUERY to:
"select Col1,count(Col1) where Col1 contains 'A' and Col1 contains 'c' group by Col1"
Note that if your 'real' type & subtype categories share characters, the where/contains approach in the QUERY will fail and a different approach will be needed.
Assuming that you place your data at A1:B10, what this formula does is:
FILTER A1:B10 by the type in B1:B10, which is "A" in this example, and return the filtered array.
Use INDEX to extract only the 1st column of the filtered array, which is the code column, and name it 'DATA' with the LAMBDA function.
Use BYROW to iterate over 'DATA' and check each code with COUNTIF: if the code occurs more than once in the filter result, return that code, else return "".
Use UNIQUE to get rid of duplicate results (since we are looking for codes that repeat, the returned array is sure to have duplicates).
Use QUERY to get rid of the extra empty rows.
=QUERY(UNIQUE(
LAMBDA(DATA,
BYROW(DATA,LAMBDA(ROW,
IF(COUNTIF(DATA,ROW)>1,ROW,"")
))
)(INDEX(FILTER(A1:B10,B1:B10="A"),,1))
),"WHERE Col1 IS NOT NULL")
Just noticed that the INDEX function is not necessary: FILTER can directly return A1:A10 according to the comparison results on B1:B10.
=QUERY(UNIQUE(
LAMBDA(DATA,
BYROW(DATA,LAMBDA(ROW,
IF(COUNTIF(DATA,ROW)>1,ROW,"")
))
)(FILTER(A1:A10,B1:B10="A"))
),"WHERE Col1 IS NOT NULL")
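The filter, count and de-duplicate steps can also be sketched in Python. The rows below are hypothetical (the question's sample data is not shown); they only follow the description that code 1 appears with type A in the first and last rows.

```python
from collections import Counter


def repeated_codes(rows, wanted_type):
    """Return the codes that occur more than once among rows of the wanted type."""
    counts = Counter(code for code, row_type in rows if row_type == wanted_type)
    return sorted(code for code, n in counts.items() if n > 1)


# Hypothetical (code, type) rows.
rows = [(1, "A"), (2, "A"), (1, "B"), (3, "B"), (1, "A")]

repeated = repeated_codes(rows, "A")
print(repeated)       # [1]
print(len(repeated))  # 1, the number the question asks for
```

For the three-column variant, the same sketch would only need the generator's condition extended to also compare the subtype.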

Calculate sum and average treating blank values with specific values based on other column condition without adding helper column

I would like to calculate the sum and average in Google Spreadsheet of a range based on conditions from another column, but treat blanks with a specified value. It can be accomplished using a helper column, but I would like to do it without it. Here is the sample data:
I would like to sum the values in column B based on the value in column A, but replacing blank values with the values specified in E2 and E3 respectively.
Here is a sample in google sheet:
https://docs.google.com/spreadsheets/d/1Cv9YxFMHuGq2biNNCdGsjU8cAD_OwCPTR4YCPLc2r34/edit?usp=sharing
I was trying to use the following formulas for the sum of team A but I am not getting the expected result:
=sumif(A2:A,"A", if(B2:B<>"",B2:B,E2)) returns 6 instead of 19
=sum(if(A2:A="A",if(B2:B<>"", B2:B, E2),)) return 27 instead of 19
I cannot use a combination of sumif and arrayformula like this because it expects a range in the third input argument:
=sumif(A2:A,"A", ARRAYFORMULA(if(B2:B<>"", B2:B,E2)))
I wasn't going to jump in on this one, since it's after midnight and I didn't feel I had the energy to both write and explain such a formula. But I see that you yourself have helped others on this forum. So I'll soldier through for you.
Delete everything from columns G:I (i.e., leave those columns entirely blank); and I suggest removing all of the formatting that you currently have in place in those columns, since it won't make sense after what I propose below.
Place the following formula in G1:
=ArrayFormula(QUERY(FILTER({A2:A,IF(B2:B="",IFERROR(VLOOKUP(A2:A&"*",D:E,2,FALSE),0),B2:B)},A2:A<>""),"Select Col1, SUM(Col2), AVG(Col2) GROUP BY Col1 LABEL Col1 'Team', SUM(Col2) 'Sum', AVG(Col2) 'Average'"))
This one formula will generate all headers, team names and results for all teams' sums and averages.
The virtual array between the curly brackets pairs every element of A2:A with the results of the IF function. That IF function checks to see if B2:B is blank. If so, a VLOOKUP with wildcard is performed to find the Col-A team name at the start of any value in Col D and, if found, returns the corresponding filler value from Col E. (If not found, IFERROR returns 0 as the filler value. It's important to have a numerical value because of the way QUERY works.)
FILTER filters in all of the above results only for those rows where A2:A contains a non-null value.
QUERY then pulls the team names, sums and averages; the LABEL portion of the QUERY assigns your desired column headers to the results.
This formula is not restricted to only two teams. You can add as many teams as you like in A:B and assign as many filler values as necessary in D:E.
It's important to note, however, that the formula relies on the FULL team name found in A:A being found at the beginning of the Col-D values. So if your team name is "Bears," just make sure the corresponding entry in Col-D starts with "Bears" as well (e.g., "Bears blanks" or even just "Bears").
You'll need to format the entire Col H as whole numbers and the entire Col I as 0.00 to match the results you showed in your sample.
ADDENDUM (after further comments from OP):
It seems that what you're saying is that the D:E values in your sample spreadsheet were something you wanted included within the formula itself and that you did not intend for them to be used as a reference list. I think your post's reference to "without a helper column" may have been your attempt to say this; but it was not clear, as with or without that D:E list as a live reference range, the main formula may have relied on its own helper column in addition to that list.
If you want a formula that contains the list:
=ArrayFormula(QUERY(FILTER({A2:A,IF(B2:B="",IFERROR(VLOOKUP(A2:A,{"A",2;"B",4},2,FALSE),0),B2:B)},A2:A<>""),"Select Col1, SUM(Col2), AVG(Col2) GROUP BY Col1 LABEL Col1 'Team', SUM(Col2) 'Sum', AVG(Col2) 'Average'"))
To add further blank-values, just keep adding to this section...
{"A",2;"B",4}
... being sure to follow the pattern of team-comma-value-semi for all but the last entry which will not need the closing semi.
You can use a combination of sumifs and countifs
to add up the non-blanks
=sumifs(B2:B10,A2:A10,"A")
to add up the blanks (and multiply by the default value)
=countifs(A2:A10,"A",B2:B10,"")*E2
all together
=sumifs(B2:B10,A2:A10,"A")+countifs(A2:A10,"A",B2:B10,"")*E2
Average (use countifs to work out how many items):
=(sumifs(B2:B10,A2:A10,"A")+countifs(A2:A10,"A",B2:B10,"")*E2)/countifs(A2:A10,"A")
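The arithmetic behind the sumifs/countifs combination can be sketched in Python. The rows are hypothetical (the linked sheet's contents are not reproduced in the thread); they were picked so that team A totals 19 with a default of 2, matching the expected result mentioned in the question.

```python
def sum_and_average(rows, team, default):
    """Sum and average a team's values, counting blanks (None) as `default`."""
    values = [
        default if value is None else value
        for name, value in rows
        if name == team
    ]
    if not values:
        return 0, None  # no rows for this team, so no average
    total = sum(values)
    return total, total / len(values)


# Hypothetical (team, value) rows; None marks a blank cell in column B.
rows = [("A", 5), ("A", None), ("B", 3), ("A", 12), ("B", None)]

print(sum_and_average(rows, "A", 2))  # total 19, average 19/3
print(sum_and_average(rows, "B", 4))  # (7, 3.5)
```

This is the same split as in the formulas: the non-blank values are summed as they are, and each blank contributes one copy of the default.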

Why my ArrayFormula is giving error? How do I correct it? (I'm not looking for another Arrayformula as solutions!)

I want an ArrayFormula at C1 which gives the required result as shown.
Entry sheet:
(Column C is my required column)
Date Entered is the date when the Name is Assigned a group i.e. a, b, c, d, e, f
Criteria:
The value of count is purely on basis of Date Entered (if john is assigned a on lowest date(10-Jun) then count value is 1, if rose is assigned a on 2nd lowest date(17-Jun) then count value is 2).
The value of count does not change even when the data is sorted in any manner because Date Entered column values is always permanent & does not change.
A new entry's date could be any date, not necessarily the highest date (if a new entry with name Rydu is assigned a on 9-Jun, then its count value will become 1, john's (10-Jun) will become 2, and so on).
Example:
After I sort the data in any random order say like this:
Random ordered sheet:
(Count value remains permanent)
And when I do New entries in between (Row 4th & 14th) and after last row (Row 17th):
Random Ordered sheet:
(Doesn't matter where I do)
I already have an ArrayFormula which gives the required result:
={"AF Formula1"; ArrayFormula(IF(B2:B="", "", COUNTIFS(B$2:B, "="&B2:B, D$2:D, "<"&D2:D)+1))}
I'm not looking for another Arrayformula as solutions. What I want is to know what is wrong in my ArrayFormula? and how do I correct it?
I tried to figure my own ArrayFormula but it's not working:
I got Formula for each cell:
=RANK($D2,FILTER($D$2:$D, $B$2:$B=$B2),1)
I figured out Filter doesn't work with ArrayFormula so I had to take a different approach.
I took help from my previous question answer (Arrayformula at H3) which was similar since in both cases each cell FILTER formula returns more than 1 value. (It was actually answered by player0)
Using the same technique I came up with this Formula which works absolutely fine :
=RANK($D2, ARRAYFORMULA(TRANSPOSE(SPLIT(VLOOKUP($B2, SUBSTITUTE(TRIM(SPLIT(FLATTEN(QUERY(QUERY({$B:$B&"×", $D:$D}, "SELECT MAX(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1", 1),, 9^9)), "×")), " ", ","), 2, 0), ","))), 1)
Now when I tried converting it to ArrayFormula:
($D2 to $D2:$D & $B2 to $B2:$B)
=ARRAYFORMULA(RANK($D2:$D,TRANSPOSE(SPLIT(VLOOKUP($B2:$B, SUBSTITUTE(TRIM(SPLIT(FLATTEN(QUERY(QUERY({$B:$B&"×", $D:$D}, "SELECT MAX(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1", 1),, 9^9)), "×")), " ", ","), 2, 0), ",")), 1))
It gives me an error "Did not find value '' in VLOOKUP evaluation", I figured out that the problem is only in VLOOKUP when I change $B2 to $B2:$B.
I'm sure VLOOKUP works with ArrayFormula, I fail to understand where my formula is going wrong! Please help me correct my ArrayFormula.
Here is the editable sheet link
If I understand correctly, you are trying to "rank" the B column based on the D column dates, taken in ascending order, so that if you randomize your dataset, the "rank" of each entry stays the same regardless of the randomness you introduce.
Therefore the correct formula would be:
={"fx"; INDEX(IFNA(VLOOKUP(B2:B&D2:D,
{INDEX(SORT({B2:B&D2:D, D2:D}, 2, 1),,1),
IFERROR(1/(1/COUNTIFS(
INDEX(SORT(B2:D, 3, 1),,1),
INDEX(SORT(B2:D, 3, 1),,1), ROW(B2:B), "<="&ROW(B2:B))))}, 2, 0)))}
{"fx"; ...} is an array of 2 tables (header & actual table) placed under each other, hence the ;
The outer INDEX (or the longer ARRAYFORMULA; it doesn't matter which one) is needed because we are processing an array.
IFNA removes possible #N/A errors when VLOOKUP fails to find a match.
We VLOOKUP the joined B and D columns B2:B&D2:D in our virtual table {} and return the 2nd column 2 if there is an exact match 0.
Our virtual table {INDEX(SORT({B2:B&D2:D, D2:D}, 2, 1),,1), ...} that we VLOOKUP from is constructed with 2 columns next to each other, hence the ,
We get the first column by creating an array of 2 columns {B2:B&D2:D, D2:D} next to each other, which we SORT by date (the 2nd column, 2) in ascending order 1. All we need after sorting is the 1st column, so we use INDEX to bring all rows ,, and the first column 1.
Now let's take a look at how we get the 2nd column of our virtual table by using COUNTIFS, which mimics the "rank".
IFERROR(1/(1/ is used to remove all zero values from the output (all empty rows would otherwise have 0 as the "rank").
In COUNTIFS we put 2 pairs of arguments: "if column is equal to column" and "if row is larger than or equal to the next row, increment it by 1": ROW(B2:B), "<="&ROW(B2:B))
For "if column is equal to column" we do this twice: we take the range B2:D, sort it by date (the 3rd column, 3) in ascending order 1, and again need only the 1st column, so we INDEX it and return all rows ,, and the first column 1.
With this formula you can add, remove or randomize your dataset and you will always get the right value for each of your rows.
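The "rank" that COUNTIFS mimics here can be sketched in Python as: 1 plus the number of rows in the same group with an earlier date. The names john, rose and Rydu and their day/month values come from the question; the year and the extra row in group b are assumptions added for illustration.

```python
from datetime import date


def counts_by_date(rows):
    """For each (name, group, entered) row, return 1 + the number of rows in
    the same group with a strictly earlier date. Row order does not matter."""
    return [
        1 + sum(1 for _, g, d in rows if g == group and d < entered)
        for _, group, entered in rows
    ]


# Deliberately shuffled rows; the year 2021 and "amy" are hypothetical.
rows = [
    ("rose", "a", date(2021, 6, 17)),
    ("john", "a", date(2021, 6, 10)),
    ("amy", "b", date(2021, 6, 12)),
    ("Rydu", "a", date(2021, 6, 9)),
]

print(counts_by_date(rows))  # [3, 2, 1, 1]
```

Because each count depends only on the group and the dates, sorting or shuffling the rows changes the order of the output but not the value attached to each entry.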
As for why your formula doesn't work: to avoid the #N/A error from VLOOKUP you would need to define the end row of the range, but even then the result won't be what you expect, because the formula is not the right one for this job.
As mentioned, there are functions that are not supported under ARRAYFORMULA (AF), like SUM, AND, OR; there are also functions which work but in a different way, like IFS, or with some limitations, like SPLIT, GOOGLEFINANCE, etc.
I have answered you on the tab in your shared sheet called My Practice thusly:
You cannot split a two column array as you have attempted to do in cell CI2. That is why your formula does not work. You can only split a ONE column array.
I understand you are trying to learn, but attempting to use complicated formulas like that is going to make it harder I'm afraid.

Google sheets query functions: how to do arithmetic on columns with non-constants, i.e. dividing not by a constant but by another cell?

I understand how to do arithmetic on a column using a query function as long as I'm using a fixed constant. However, if I try to do the same using a cell reference instead of a constant, I get an error.
I've tried making the cell a named range and referring to that name hoping it would act like a constant but I still get errors. There must be a way to do this. Here are examples of things that work:
=QUERY(FebMarket,"SELECT (C/5)") //This divides column C by 5
=QUERY(FebMarket,"SELECT AVG(C)") //This gives me the average of column C
=QUERY(FebMarket,"SELECT AVG(C) LABEL AVG(C) ''") //This gives me the average of column C without a header
However, if I do any of the following I get an error:
=QUERY(FebMarket,"SELECT C/(AVG(C) LABEL AVG(C) '')")
=QUERY(FebMarket,"SELECT C/(AVG(C))")
=QUERY(FebMarket,"SELECT C/AVG(C)")
=QUERY(FebMarket,"SELECT C/avgC") // where 'avgC' is a named range given to a cell where I calculated the average of column C separately
=QUERY(QUERY(C1:C,
"select C / "&AVERAGE(C1:C), 0),
"select Col1
where Col1 is not null
label Col1 '' ", 0)
As far as I know you get the same result in all dialects of SQL. You can do one of two things in a single query:
Get the result of an aggregate (like AVG) over the entire dataset
e.g.
=QUERY(FebMarket,"SELECT AVG(C)")
in which case you can only select aggregates in the query
or
Get an aggregate for each one of a set of groups defined by one or more grouping variables.
e.g.
=QUERY(FebMarket,"SELECT GroupVariable,AVG(C) GROUP BY GroupVariable")
in which case you can select grouping variables and aggregates in the query.
Neither of these help.
If you google something like 'SQL divide column by its average' you will probably get an answer using a subquery but at time of writing these are not available in Google Sheets.
So (although you could do this shorter by other means) the solution using queries would have to be
=ArrayFormula(query(A:A,"select A where A is not null")/query(A:A,"select avg(A) label avg(A) ''"))
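The two-step shape of this solution (one pass for the aggregate, a second pass for the per-row division) can be sketched in Python. The column values are hypothetical, and None stands in for blank cells, which are dropped just as "where A is not null" drops them.

```python
def divide_by_average(values):
    """Drop blanks (None), compute the average once, then divide every
    remaining value by it."""
    present = [v for v in values if v is not None]
    if not present:
        return []
    average = sum(present) / len(present)  # first pass: the aggregate
    return [v / average for v in present]  # second pass: per-row division


# Hypothetical column with one blank cell.
column = [2.0, None, 4.0, 6.0]

print(divide_by_average(column))  # [0.5, 1.0, 1.5]
```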
