How to Calculate Conditional Correlation in Google Sheets - google-sheets

Here is my screenshot and link to the spreadsheet. I am trying to calculate the correlation between number of passes students take and their grades, but only for those with greater than 60% attendance. I am not sure what is wrong with my formula.

There are several things wrong with your formula, including that you are using range references (e.g., "B2:B98") where QUERY only calls for "B"; but rather than telling you what is wrong, I'll just share what should be right as well as easier for you:
=ArrayFormula(CORREL(FILTER(B2:B,A2:A>0.6),FILTER(C2:C,A2:A>0.6)))
If you want to stick with QUERY:
=ArrayFormula(CORREL(QUERY(A2:C,"Select B WHERE A > 0.6"),QUERY(A2:C,"Select C WHERE A > 0.6")))
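For readers who want to check the result outside Sheets, here is a minimal Python sketch of what both formulas compute: keep only the rows where attendance is above 0.6, then take the Pearson correlation of the remaining passes and grades. The function names and the sample rows are hypothetical, and the sketch assumes attendance is stored as a fraction (0.6 rather than 60).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def conditional_correlation(rows, threshold=0.6):
    """rows: (attendance, passes, grade) tuples, mirroring columns A, B, C."""
    # Same role as FILTER(..., A2:A>0.6): drop rows at or below the threshold.
    kept = [(p, g) for a, p, g in rows if a > threshold]
    passes = [p for p, _ in kept]
    grades = [g for _, g in kept]
    return pearson(passes, grades)


# Hypothetical sample data, not taken from the asker's sheet.
rows = [
    (0.9, 1, 90),
    (0.8, 2, 80),
    (0.7, 3, 70),
    (0.5, 9, 95),  # excluded: attendance is not above 0.6
]
result = conditional_correlation(rows)
print(result)
```

The key point is that the same condition filters both series, so the two lists stay aligned row by row.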

Related

Google Sheets - Calculate sum of row results - by using one-liner formula

One column has a formula like the one below:
=SUM(MMULT(IFNA(FILTER($H$2:$H;$G$2:$G=M2);0);L2))
I have expanded it in 3 rows (A1:A3):
=SUM(MMULT(IFNA(FILTER($H$2:$H;$G$2:$G=M2);0);L2))
=SUM(MMULT(IFNA(FILTER($H$2:$H;$G$2:$G=M3);0);L3))
=SUM(MMULT(IFNA(FILTER($H$2:$H;$G$2:$G=M4);0);L4))
And I'm getting right results.
Then I want to get the sum of those, but I need to do all of the above in one line.
I mean, I don't want to calculate the rows separately and then do something like SUM(A1:A3).
I was trying with ARRAYFORMULA() but with no success, yeah, how hard can it be, right?
Have a nice evening!
UPDATE:
If someone is looking for the solution, the best one is Mike's. Take a look at the picture he added.
I have just extended it to filter the L and M columns, so it does not matter how long the columns are. Of course, the L and M columns have to have the same number of elements ;)
=SUM(MMULT(ARRAYFORMULA(--(G2:G=TRANSPOSE(FILTER(M2:M;ISTEXT(M2:M))))*(H2:H));FILTER(L2:L;ISNUMBER(L2:L))))
You should be able to nest the evaluations inside a SUM, as below, and get a SUM of the SUMs, so to speak. (The separators between the inner SUMs are semicolons here, to match the locale used in the rest of the formula.)
=SUM(SUM(MMULT(IFNA(FILTER($H$2:$H;$G$2:$G=M2);0);L2));SUM(MMULT(IFNA(FILTER($H$2:$H;$G$2:$G=M3);0);L3));SUM(MMULT(IFNA(FILTER($H$2:$H;$G$2:$G=M4);0);L4)))
There probably is a more efficient way, but why overcomplicate it. :)
Try this
=sum(mmult(arrayformula(--(G2:G=transpose(M2:M4))*(H2:H)),L2:L4))
the formula =arrayformula(--(G2:G=transpose(M2:M4))*(H2:H)) will give you a matrix as follows, then apply mmult
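The matrix step can be sketched in Python as a rough illustration. Each row of G/H becomes a row of the matrix, each key in M becomes a column, and a cell holds the H value only where G matches that key. Multiplying by the L column and summing then gives the same total as adding the three separate formulas. The function name and sample values are hypothetical.

```python
def sum_of_weighted_matches(g, h, m, l):
    """g, h: data columns G and H; m, l: lookup keys (M) and multipliers (L)."""
    # matrix[i][j] is h[i] when g[i] equals m[j], otherwise 0,
    # mirroring --(G2:G=TRANSPOSE(M2:M4))*(H2:H).
    matrix = [[h_i if g_i == m_j else 0 for m_j in m] for g_i, h_i in zip(g, h)]
    # MMULT by the column l: weight column j by l[j] and add across each row.
    row_totals = [sum(v * w for v, w in zip(row, l)) for row in matrix]
    return matrix, sum(row_totals)


# Hypothetical sample data.
g = ["a", "b", "a", "c"]
h = [10, 20, 30, 40]
m = ["a", "b", "c"]
l = [2, 3, 5]

matrix, total = sum_of_weighted_matches(g, h, m, l)
print(matrix)
print(total)
```

With this data the per-key version gives (10+30)*2 + 20*3 + 40*5, which is the same 340 the matrix version returns.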

Google Sheets Count Unique Dates based upon a criteria in different columns

I am trying to find a formula that will give me the count of unique dates on which a person's name appears in either of two different columns, or in both.
I have a set of data where a person's name may show up in a "driver" column or a "helper" column, multiple times over the course of one day. Throughout the day some drivers might also be helpers and some days a driver may come in for duty but only as a helper. Basically all drivers can be helpers, but not all helpers can be drivers.
I've attached a link to a sample sheet for more clarity.
https://docs.google.com/spreadsheets/d/1GqNa1hrViX4B6mkL3wWcqEsy87gmdw77DhkhIaswLyI/edit?usp=sharing
I've created a REPORTS tab with a SORT(UNIQUE(FLATTEN)) Formula to give me a list of the names that appear in the DATA Tab.
I'm looking for a way to count the unique dates on which a name from the name list (Column A of the REPORTS Tab) appears in either of the two columns (Column B and/or C of the DATA Tab). That gives the total number of days worked, so I can calculate the total number of days off over the range queried.
I've tried several iterations of countif, countunique, and countuniqueifs but cannot seem to find a way to return the correct values.
Any advice on how to make this work would be appreciated.
I think if you put this formula in cell B7 you'll be set. You can drag it down.
=Counta(Unique(filter(DATA!A:A,(DATA!C:C=A7)+(DATA!B:B=A7))))
Here's a working version of your file.
For anyone interested, Google Sheets' FILTER function differs slightly from Excel's FILTER function: Sheets tries to make multiple conditions easier by letting users separate each condition with a comma. For example, =filter(A:A,A:A<>"",B:B<>"bad result") will give different results in Sheets and in Excel.
Excel's FILTER requires users to combine multiple conditions inside parentheses, joining criteria with + for an OR condition or with a multiplication sign * for an AND condition. While it can appear daunting and bizarre to multiply arrays that have text in them, this allows for more flexibility.
To Google's credit, if one follows the required Excel Syntax (as I did in this answer) then the functions will behave the same.
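As a rough illustration of the OR logic in the formula above, here is a small Python sketch: a row is kept when the name is the driver or the helper, and the distinct dates of the kept rows are counted. The function name and sample records are hypothetical, and dates are plain strings for simplicity.

```python
def days_worked(records, name):
    """records: (date, driver, helper) tuples, mirroring DATA columns A, B, C."""
    # (DATA!C:C=name)+(DATA!B:B=name) is nonzero when either test is true,
    # so it behaves like "or"; the set plays the role of UNIQUE.
    dates = {date for date, driver, helper in records
             if driver == name or helper == name}
    return len(dates)


# Hypothetical sample data.
records = [
    ("2023-01-01", "Ann", "Bob"),
    ("2023-01-01", "Ann", "Cy"),
    ("2023-01-02", "Bob", "Ann"),
    ("2023-01-03", "Cy", "Bob"),
]

ann_days = days_worked(records, "Ann")
bob_days = days_worked(records, "Bob")
print(ann_days, bob_days)
```

Ann appears twice on the first date but that date is only counted once, which is the behavior the asker wanted from the unique count.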
delete what you got and use:
=QUERY(QUERY(UNIQUE({DATA!A:B; DATA!A:A, DATA!C:C}),
"select Col2,count(Col1),"&D2&"-count(Col2)
where Col2 is not null
group by Col2"),
"offset 1", 0)
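The idea behind this QUERY can be sketched in Python: stack the (date, driver) and (date, helper) pairs, remove duplicates, then count the dates per name. This is only an illustration with hypothetical names and data, and it assumes that D2 holds the total number of days in the range, which the answer does not state explicitly.

```python
from collections import Counter


def days_worked_and_off(records, total_days):
    """records: (date, driver, helper) tuples; returns {name: (worked, off)}."""
    pairs = set()
    for date, driver, helper in records:
        for name in (driver, helper):
            if name:  # skip blanks, like "where Col2 is not null"
                pairs.add((date, name))  # the set deduplicates, like UNIQUE
    worked = Counter(name for _, name in pairs)  # like "group by Col2"
    return {name: (days, total_days - days)
            for name, days in sorted(worked.items())}


# Hypothetical sample data; total_days stands in for the value in D2.
records = [
    ("2023-01-01", "Ann", "Bob"),
    ("2023-01-01", "Ann", "Cy"),
    ("2023-01-02", "Bob", "Ann"),
    ("2023-01-03", "Cy", "Bob"),
]
summary = days_worked_and_off(records, total_days=3)
print(summary)
```

Unlike the per-row FILTER approach, this produces the whole report in one pass, which is why the answer suggests replacing the existing name list.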

How do I make Google Spreadsheet automatically divide a column into another column?

I'm making a spreadsheet that includes a long list of values, with a column that contains a total of values, and after that an average of the values in the row. I need the averaged column to always be 1/6 of the value in the summed column, but I can't figure out a way to make it automatically calculate it for me for each new row.
So far, I have been doing it all manually (type out all the values, manually add them together for the total, then divide by 6 myself for the average) but I'd really like to automate the math parts. I have not found a single way to properly do this - using "=DIVIDE(K2,6)" as a modified version of a suggestion on this other question (modified to use the column I'm actually putting the numbers in) does literally nothing, and I'd have to manually change and paste it into each row, which is actually harder and more tedious than continuing to do the math myself.
Here's an example image of what my columns look like. All the math is correct so far, I just want to automate it so I can type fewer numbers:
EDIT: Combined answers from Scott and Player0 is what worked! thanks for being patient with me! I was able to also use that to make the Sum column function automatically as well, so both columns are fully automated now! :D
You don't have to enter the formula manually on every line. Enter =K2/6 in cell L2; then select cell L2 and drag/fill it down to L12, or however far your sheet actually goes. (That means click on the dot in the lower right corner of the cell and drag it down.) That will automatically fill in L3 with =K3/6, L4 with =K4/6, and so on.
use on row 2:
=INDEX(IFERROR(K2:K/6; 0))
also see: ArrayFormula of Average on Infinite Truly Dynamic Range in Google Sheets

How do I do a SUMPRODUCT in Google Sheets, but conditional on the text in both vectors?

The following spreadsheet shows the exercise submission status for 4 students. There are 4 exercises (1-4), but only 2 of them are homework (and thus graded) - they have a prefix 'H' in their name. A correct submission is marked "complete".
I'm trying to count, for each student, how many "complete" submissions he has, which are also homework. The right-most column is my desired result.
I tried all kinds of countifs, but couldn't get it. I have an ugly solution which uses SUMPRODUCT, but that requires substituting all the "complete" with 1's (which I'd rather not) + some more. I prefer a Google Sheets solution, but excel would work as well...
Have a heart and help out a teacher :-)
I suggest using mmult, which is a standard way of getting row totals from a matrix. As you mention, the first step is to convert each cell containing "complete" into a 1, then check the headers for presence of letter H.
=ArrayFormula(mmult((A2:D6="complete")*(isnumber(SEARCH("h",A1:D1))),transpose(column(A2:D6))^0))
I have tested this in Google Sheets, but it should work in Excel as well.
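A minimal Python sketch of the same idea, for illustration only: build a 0/1 mask from the headers, multiply it against a 0/1 "complete" grid, and add up each row. Multiplying by a column of ones via MMULT is just a way of taking row totals. The function name, headers, and rows are hypothetical, and the text comparison here is exact.

```python
def homework_complete_counts(headers, rows):
    """Count 'complete' cells per row, only in columns whose header contains h/H."""
    # Like ISNUMBER(SEARCH("h", headers)): SEARCH is case-insensitive.
    is_homework = [1 if "h" in header.lower() else 0 for header in headers]
    counts = []
    for row in rows:
        # (cell == "complete") * mask, then summed across the row.
        total = sum((1 if cell == "complete" else 0) * flag
                    for cell, flag in zip(row, is_homework))
        counts.append(total)
    return counts


# Hypothetical sample data: exercises 2 and 4 are homework.
headers = ["1", "H2", "3", "H4"]
rows = [
    ["complete", "complete", "", "complete"],
    ["complete", "", "complete", ""],
    ["", "complete", "", ""],
]
counts = homework_complete_counts(headers, rows)
print(counts)
```

The second student has two complete submissions, but neither is homework, so the count is 0.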
EDIT
(1) The easiest way to make the range accommodate changes is to put some upper limit on number of columns and make the references full-column, e.g.
=ArrayFormula(if(A2:A="","",mmult((A2:M="complete")*(isnumber(SEARCH("h",A1:M1))),transpose(column(A2:M))^0)))
You might want to move the total off onto another sheet:
=ArrayFormula(if(Sheet7!A2:A="","",mmult((Sheet7!A2:Z="complete")*(isnumber(SEARCH("h",Sheet7!A1:Z1))),transpose(column(Sheet7!A2:Z))^0)))
(2) To get the values as percentages, you can use countif:
=ArrayFormula(if(Sheet7!A2:A="","",mmult((Sheet7!A2:Z="complete")*(isnumber(SEARCH("h",Sheet7!A1:Z1))),transpose(column(Sheet7!A2:Z))^0)/countif(Sheet7!A1:Z1,"*h*")))
and format column as percent.
EDIT 2
To check for presence of H in headers but ignore h, use Find instead of Search, and regexmatch instead of countif:
=ArrayFormula(if(Sheet7!A2:A="","",mmult((Sheet7!A2:Z="complete")*(isnumber(find("H",Sheet7!A1:Z1))),transpose(column(Sheet7!A2:Z))^0)/sum(--regexmatch(""&Sheet7!A1:Z1,"H"))))
If you only want to include headers starting with H, change "H" in the regexmatch to "^H" as in @player0's answer.
if position of H columns is known, you can do simple:
=INDEX(IF(A2:A="",,ADD(D2:D="complete", E2:E="complete")))
if the number of columns and position of H's is unknown:
=INDEX(MMULT((INDIRECT("A2:"&ADDRESS(COUNTA($A:$A), COLUMN()-1))="complete")
*(REGEXMATCH(UPPER(INDIRECT("A1:"&ADDRESS(1, COLUMN()-1))), "^H.*")),
ROW(INDIRECT("A1:"&COLUMN()-1))^0))
update:
=INDEX(TEXT(MMULT((INDIRECT("A2:"&ADDRESS(COUNTA($A:$A), COLUMN()-1))="complete")
*(REGEXMATCH(UPPER(INDIRECT("A1:"&ADDRESS(1, COLUMN()-1))), "^H.*")),
ROW(INDIRECT("A1:"&COLUMN()-1))^0)/
SUM(1*REGEXMATCH(UPPER(INDIRECT("A1:"&ADDRESS(1, COLUMN()-1))), "^H.*")), "0.00%"))

Google Spreadsheet: Query String and Numeric (Involving math formula)

Sorry about the imprecise title. Allow me to elaborate. I'm currently in the process of making 'Order' sheets for the small retailer I work for. Some items are easy to count due to low inventory, while other items are abundant and difficult to count but easy to gauge whether we ought to order them.
When an employee takes a store count, the on-hand number they put down is compared with a minimum, which is our lower threshold. The input quantity is subtracted from the minimum, and a formula puts the result in a third column, "Order". If the number in the order column is < 0 then a query function on a separate sheet will copy the entire row. To be clear, there are three columns, "On Hand", "Minimum", "Order", with the "Order" column containing the following mathematical formula:
="Minimum" - "On Hand"
[Cells are specified so that it would look more like "=B2-A2".]
However, I'd also like to include the ability for employees to put a simple 'x' in the count spot, signifying that we need to order the product without having to count every single instance of the item. I'd still like to include the ability for them to enter a number if they so choose. I'd like for them to be able to enter either the number or the 'x' in the same column. I'm currently using the following query function:
=QUERY('String(Fail)'!A:D;"select * where A contains 'x' or C > 0")
[The above is from a sheet I'm experimenting with. I will provide a link below in case you're more hands-on.]
The issue arises when the formula in the "order" column outputs any sort of number. If the formula is functional, no row marked with an 'x' is copied to the new page via the query command. If any row produces a numeric, no 'x' rows are copied over at all. I've experimented a bit but am at a loss as to where to go next.
The sheet I'm currently experimenting with is linked below. If you'd like any additional information I'd be happy to provide it. I'm relatively new to all of this so excuse my stupidity. I do recognize that I could very likely make a script for this but am not well versed in scripting with Google Apps and enjoy the immediate benefits of the query function.
Any help is welcome. Thank you.
Experimental Spreadsheet
All the values in a column need to be of the same type in order to be evaluated by QUERY. The mix of 'x' and numbers is confusing things.
If you use the Format menu to ensure all the values in column A are Plain Text, then your Query will work. (Formatting a numeric value as plain text does not stop it from working in a numeric calculation, so your column C survives.) Here's a screenshot of your query, after doing that formatting:
Based on your specification, your query needs to have the comparison to zero reversed, like this:
=QUERY('String(Fail)'!A:D;"select * where A contains 'x' or C < 0")
^^^
