correlate a demographic column with answers from multiple columns - google-sheets

I have a spreadsheet from a Google Consumer Survey. The survey captured demographics as well as the responses to a question. Respondents could choose zero or more 'answers', and the response for each answer is in its own column. For example:
user id | gender | age | income | answer 1 | answer 2 | answer 3 |
0001 | Female | 20-30 | 50-75 | [empty] | Right | Never |
0002 | Male | 20-30 | 30-50 | Up | Left | [empty] |
I would like to know how to correlate a column of demographic info with each of the possible answers. For example, I want to be able to answer questions like, Were males more likely than females to choose X for answer 1? and Which age group was more likely to choose Y for answer 2?
I prefer an answer using Google Sheets functions, but I am open to learning other ways to understand the data. Thank you for any help!

A good way is to use the QUERY function. Let's first assume your data is stored in range A:G:
A | B | C | D | E | F | G |
user id | gender | age | income | answer 1 | answer 2 | answer 3 |
0001 | Female |20-30| 50-75 | [empty] | Right | Never |
0002 | Male |20-30| 30-50 | Up | Left | [empty] |
With that layout, you can write simple QUERY formulas.
For example, to count all answer 1, group them by gender and age, pivot by answer 1:
=query(A:G,"select B, C, count(D) where not A is null group by B, C pivot E")
where not A is null -- prevents empty rows from being used in the query
count(D) -- can count any column that isn't already used elsewhere in the query
group by B, C -- must contain all selected columns except aggregates (count, sum, etc.)
pivot E -- makes each distinct answer appear in its own column.
The result will look like this:
Left Never Right Up
Female 20-30 1 1 1
Female 30-40 1
Male 20-30 1 1 1
Male 30-40 1
Please see the complete Query Language Reference to learn more.
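For readers open to a route outside Sheets, the grouping and pivoting described above can be sketched in plain Python. This is only an illustration of the logic, not how QUERY is implemented. The rows below are hypothetical and mirror just the first five columns of the sample layout.

```python
from collections import Counter

# Hypothetical rows: (user id, gender, age, income, answer 1).
rows = [
    ("0001", "Female", "20-30", "50-75", ""),
    ("0002", "Male", "20-30", "30-50", "Up"),
    ("0003", "Male", "30-40", "30-50", "Down"),
    ("0004", "Female", "20-30", "50-75", "Up"),
]

# One Counter per (gender, age) group; each Counter key is an answer 1 value.
pivot = {}
for user_id, gender, age, income, answer1 in rows:
    if not user_id:  # plays the role of "where not A is null"
        continue
    pivot.setdefault((gender, age), Counter())[answer1] += 1

for group in sorted(pivot):
    print(group, dict(pivot[group]))
```

In this sketch, respondents who left answer 1 blank are counted under the empty-string key, so they remain visible in each group's total.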

Have you tried using the Pivot Table function of Google Sheets?
Download the data in Excel format after the survey is complete and open it with Google Sheets.
Select the tab with the resulting data from the Google Consumer Survey.
From the menu, select Data -> Pivot Table. This opens a new tab in your spreadsheet.
For the Values area of the pivot table, select User ID, and from the "Summarize by" dropdown, select COUNTUNIQUE.
For the columns and rows, select whichever dimensions you are interested in. For instance, in your example, you would pick
"Gender" and "Answer 1" as a row and column.
"Age" and "Answer 2" as a row and column.
This should answer these kinds of questions easily.
Hope this helps!

What I think I needed was the COUNTIFS function (in Google Sheets). Notice the plural use, which is different than countif (singular).
COUNTIFS allowed me to specify multiple criteria to make a score for each demographic segment. For example, I could count all the Males that responded Up in the answer 1 column.
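Assuming the column layout from the QUERY answer (gender in column B, answer 1 in column E), the count described here would look something like =COUNTIFS(B:B, "Male", E:E, "Up"). The same idea can be sketched in Python. The countifs helper and the sample rows below are hypothetical, written only for this illustration.

```python
def countifs(rows, **criteria):
    """Count rows (dicts) in which every named column equals the given value."""
    return sum(
        all(row[col] == val for col, val in criteria.items())
        for row in rows
    )

# Hypothetical survey rows, reduced to the two columns used here.
rows = [
    {"gender": "Female", "answer1": ""},
    {"gender": "Male", "answer1": "Up"},
    {"gender": "Male", "answer1": "Down"},
    {"gender": "Female", "answer1": "Up"},
    {"gender": "Male", "answer1": "Up"},
]

for gender in ("Male", "Female"):
    chose_up = countifs(rows, gender=gender, answer1="Up")
    total = countifs(rows, gender=gender)
    # Dividing by the group size gives a rate that can be compared across groups.
    print(gender, chose_up, total, round(chose_up / total, 2))
```

Comparing rates rather than raw counts matters when the demographic groups differ in size, which is usually the case in survey data.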

Related

Reference range when column matches string

Budget spreadsheet. Column A contains categories, Row 1 contains paycheck dates, and each cell from B2:AE91 contains numeric values ("how much I spent on categoryX during paycheckY").
Named ranges:
Column A - "Budget_LineItem"
Row 1 - "Budget_PayPeriods"
On another tab, I have a list of specific categories called "Funds," where I want to track how much I've saved so far each paycheck toward the category by adding up the category's values each paycheck up until TODAY().
For example:
| | A | B | C | D |
| - | - | - | - | - |
| 1 | Fund | Balance | Today: | =TODAY() |
| 2 | Auto Insurance | =SUMIF(Budget_PayPeriods,"<="&MAX($D$1:$D$2),Budget!F48:AE48) | Projected Date: | |
As you can see, I just have a static range for the "Auto Insurance" category: Budget!B48:AE48. This works, but I want a formula that looks up the adjacent value in column A against the Budget_LineItem range, and returns the row range from B:AE in the Budget spreadsheet.
Basically reads: "Go find how much I've saved/spent so far toward categoryX in the Budget tab, and add up all the values for each paycheck up through today."
I know I'm close, but I can't make INDEX, MATCH, or any of the LOOKUP functions do what I need. I just can't figure it out.
EDIT: Here's a link to an example: https://docs.google.com/spreadsheets/d/1L4mlMrRCWwDNPSiYHpmFiXU1zNOnga6gAziz_m2awKI/edit?usp=sharing
I also made a change to the OP formula in B2 as I realized it didn't work. I had tweaked it because my original formula had extra complexity and I was trying to KISS for this question. I changed it back to the more complex version so it works properly now.
Delete range B2:B and use this in B2:
=INDEX(MMULT(FILTER(Budget!B2:4, Budget!B1:1<=MAX(D1:D2))*1,
SEQUENCE(SUMPRODUCT((Budget!B1:1<=MAX(D1:D2))))^0))
Update, which also looks up each fund name from column A:
=INDEX(IFNA(VLOOKUP(A2:A,
{Budget!A2:A4, MMULT(FILTER(Budget!B2:4, Budget!B1:1<=MAX(D1:D2))*1,
SEQUENCE(SUMPRODUCT((Budget!B1:1<=MAX(D1:D2))))^0)}, 2, 0)))
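As a rough illustration of what the formulas above compute, here is a Python sketch: keep only the pay periods on or before the cutoff date, sum each category's row over them, then look the fund name up in the result. The dates, categories, and amounts are hypothetical stand-ins for the Budget tab, and saved_so_far is a name invented for this sketch.

```python
from datetime import date

# Hypothetical Budget tab: pay-period dates (row 1) and one row of values per category.
pay_periods = [date(2023, 1, 6), date(2023, 1, 20), date(2023, 2, 3)]
budget = {
    "Auto Insurance": [50, 50, 50],
    "Vacation": [100, 0, 25],
}

def saved_so_far(category, cutoff):
    """Sum a category's values for pay periods on or before cutoff."""
    row = budget.get(category)
    if row is None:  # plays the role of IFNA around VLOOKUP: unknown fund -> blank
        return ""
    return sum(value for day, value in zip(pay_periods, row) if day <= cutoff)

cutoff = date(2023, 1, 25)
for fund in ("Auto Insurance", "Vacation", "Unknown"):
    print(fund, saved_so_far(fund, cutoff))
```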

Google Sheets ArrayFormula that returns an index of the column that matched specified criteria for each row

I've been searching for several hours for what I thought would be a pretty straightforward problem, but without any luck.
I need an array formula (needs to calculate for range without copying down the formula) that returns an index reference to the column containing a match for the passed criteria for each row. I don't need the value returned, which is what I've seen related problems solving for, just the column index. I will be using the returned index value to pull data from a bound matrix containing data such as allocated hours. I tried to use MATCH inside an ArrayFormula with a dynamic index for the lookup range but it doesn't increment the row as I would expect. Below is example data with the desired results shown in the first column (technically the results will be returned in a separate worksheet but included here for illustrative purposes), assignee is the criteria for which to find the matching column index across reviewers 1 - 3.
+---------+----------+------------+------------+------------+
| Results | Assignee | Reviewer 1 | Reviewer 2 | Reviewer 3 |
+---------+----------+------------+------------+------------+
| 2 | Paul | Tim | Paul | Sue |
| 1 | Nick | Nick | Linda | Adam |
| 3 | Bill | Ryan | Paul | Bill |
| 2 | Tom | Paul | Tom | Sarah |
+---------+----------+------------+------------+------------+
I've been struggling with this for a while so any guidance would be appreciated!
Try this:
=MMULT(ARRAYFORMULA(--('Table 2'!A3:D7) * --('Table 1'!A3:A7 = 'Table 1'!B3:E7)), SEQUENCE(COLUMNS('Table 1'!B3:E7), 1, 1, 0))
--('Table 2'!A3:D7) - places 0s instead of blanks in table 2 (needed for MMULT).
--('Table 1'!A3:A7 = 'Table 1'!B3:E7) - gives a table with 1s in cells corresponding to current reviewer, and 0s in all the other.
Then those two ranges are multiplied cell by cell. That gives a table with the right hours in cells with the reviewers' names, one value in a row.
MMULT gives a row-wise sum, which is effectively a column of those hours from the previous step.
If you have a bigger table, you'll just need to adjust 'Table 1'!A3:A7, 'Table 1'!B3:E7, and 'Table 2'!A3:D7 accordingly. The rest will remain the same.
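The mask-and-sum steps explained above can be sketched in Python as follows. The names come from the example in the question. The hours matrix is hypothetical, since the question does not show that data.

```python
assignees = ["Paul", "Nick", "Bill", "Tom"]
reviewers = [
    ["Tim", "Paul", "Sue"],
    ["Nick", "Linda", "Adam"],
    ["Ryan", "Paul", "Bill"],
    ["Paul", "Tom", "Sarah"],
]
# Hypothetical allocated hours; None stands for a blank cell.
hours = [
    [4, 6, None],
    [3, 5, 2],
    [1, None, 7],
    [2, 8, 4],
]

matched_hours = []
column_index = []
for name, names_row, hours_row in zip(assignees, reviewers, hours):
    mask = [1 if reviewer == name else 0 for reviewer in names_row]
    values = [h or 0 for h in hours_row]  # blanks become 0, as the -- coercion does
    # Row-wise sum of mask * values: the job MMULT does with a column of ones.
    matched_hours.append(sum(m * v for m, v in zip(mask, values)))
    # Variation not in the answer: weight the mask by 1-based column position.
    column_index.append(sum(m * (i + 1) for i, m in enumerate(mask)))

print(matched_hours)
print(column_index)
```

The second list is a variation that is not part of the answer above: multiplying the mask by column positions instead of hours yields the column index the question asks for, provided each row has exactly one match.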
The best I've been able to come up with so far is this SWITCH statement. It works, but it's not very elegant:
=ArrayFormula(SWITCH(Current_Assignee, INDEX(Queue,,1), "1", INDEX(Queue,,2), "2", INDEX(Queue,,3), "3", INDEX(Queue,,4), "4", INDEX(Queue,,5), "5"))

Google Sheets: How to eliminate duplicates in some columns and show only the most recent data in others?

I have a spreadsheet of books, with one row for every time a book was checked out (this is a small classroom library). Here are the columns:
BookTitle | Author | DateCheckedOut | CheckedOutBy | Status
=========================================================================
The BFG | Dahl, Roald | 6/1/2016 | Suzy | Out
The BFG | Dahl, Roald | 4/5/2016 | Johnny | Returned
The BFG | Dahl, Roald | 12/4/2015 | Wendy | Returned
Charlotte's Web | White, E.B. | | | Added
Wonder | Palacio, R.J. | 5/29/2016 | Joey | Returned
Wonder | Palacio, R.J. | 3/21/2016 | Mary | Returned
I want to query it to get only the row with the highest date value for each book and then display all columns of that row except CheckedOutBy.
I wanted to get a list of unique book title / author combinations and then join it with the original table the way I would in DB2, but it seems that joins like that are not possible in Google Sheets. I tried grouping and the max function, but when I get those things to work I either haven't been able to eliminate earlier dates or haven't been able to display columns that aren't being used in the aggregate function. My Google Sheets querying skills are not up to par :/
Is there a simple way to do this that I'm missing? I would appreciate any tips.
Here's a copy of the sample data from above in a Google Sheet:
https://docs.google.com/spreadsheets/d/1J384S0fsc8tgxVMehPb_uyRNc5-6cQx-xKN-q8K8Gds/edit?usp=sharing
I created a new sheet and entered in cell A1
=ArrayFormula(iferror(vlookup(unique(Sheet1!A2:A), sort(Sheet1!A2:E, 3, 0), {1, 2, 3, 5}, 0)))
See if that works for you.
BREAKDOWN:
The general idea behind the formula is to make use of the fact that VLOOKUP only returns the first match. We want that 'first match' to be the latest date per book.
So first we sort the table so that the latest dates are on top.
We 'lookup' the unique book titles in that sorted table and we return the columns {1, 2, 3, 5}.
Links:
sort() function
vlookup() function
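The sort-then-first-match idea from the breakdown can be sketched in Python using the sample rows from the question. This is an illustration of the logic only. Placing blank dates last is an assumption of this sketch.

```python
from datetime import datetime

rows = [
    ("The BFG", "Dahl, Roald", "6/1/2016", "Suzy", "Out"),
    ("The BFG", "Dahl, Roald", "4/5/2016", "Johnny", "Returned"),
    ("The BFG", "Dahl, Roald", "12/4/2015", "Wendy", "Returned"),
    ("Charlotte's Web", "White, E.B.", "", "", "Added"),
    ("Wonder", "Palacio, R.J.", "5/29/2016", "Joey", "Returned"),
    ("Wonder", "Palacio, R.J.", "3/21/2016", "Mary", "Returned"),
]

def sort_key(row):
    # Latest dates first; rows with a blank date go last (assumption of this sketch).
    if not row[2]:
        return (1, 0)
    return (0, -datetime.strptime(row[2], "%m/%d/%Y").toordinal())

by_latest = sorted(rows, key=sort_key)

latest = {}
for title, author, checked_out, _checked_out_by, status in by_latest:
    # setdefault keeps only the first match per title, as VLOOKUP does.
    latest.setdefault(title, (title, author, checked_out, status))

for record in latest.values():
    print(record)
```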

Structuring a query between multiple tabs to join values by name

I'm trying to write a SQL query in Google Sheets to try and get data for "matching" results from two different tabs, but running into some trouble.
This is a sheet that's basically an automated scoring engine for instructors who take a two-part test (written and practical). After the results are entered, I'd like to use some SQL to take the results from the two tabs and collate them into a final score.
Link to the sheet in question.
There's a "Practical Scores" tab (which takes all the data from the associated Google Form), and a "Written Scores" tab. I'd like to get the name of the instructors who match in both those tabs, and give the associated score for them, but I'm mostly having trouble with writing the correct SQL.
Most of what I'm trying to do is working fine. I'm able to pull the final practical scores via the following SQL:
=query(PracticalScores!A2:E, "select A, count(E),SUM(E)/3 group by A")
I can also pull the written scores as follows:
=query('Written Scores'!B2:C,"select B,C")
But I want the intersection of the two as well, and that's where I'm running into problems.
=query(A8:E, "select A,C,D where A = E")
will simply return the rows where the names match up, and I want the instances where the names match up, regardless of whether the rows do.
That is, I want all the rows where the names match from tab 1 to tab 2 and not just the few rows that happen to line up perfectly.
If I'm not explaining this well, please let me know and I can provide additional information. Any assistance would be very greatly appreciated!
Since the query function does not support joins, this can't all be done in one query. Instead, the following device can be used:
=arrayformula(vlookup(name column, table, # of column to extract, False))
For example, suppose I have a table
+---+-------+---+
| | A | B |
+---+-------+---+
| 2 | Jim | 3 |
| 3 | Sarah | 4 |
| 4 | Bob | 5 |
+---+-------+---+
to which I want to add another column, taking it from
+---+-------+---+
| | E | F |
+---+-------+---+
| 2 | Sarah | 9 |
| 3 | Bob | 8 |
| 4 | Jim | 7 |
+---+-------+---+
The basic idea is to put in cell C2 the formula
=arrayformula(vlookup(A2:A, E2:F, 2, false))
which will look up every name from the first table (column A) in column E and return the matching value from column F. Result:
+---+-------+---+---+
| | A | B | C |
+---+-------+---+---+
| 2 | Jim | 3 | 7 |
| 3 | Sarah | 4 | 9 |
| 4 | Bob | 5 | 8 |
+---+-------+---+---+
In practice, one should filter out empty lookup values to improve performance:
=arrayformula(vlookup(filter(A2:A, len(A2:A)), E2:F, 2, false))
If the second table contains some names not present in the first, they will not be returned by the above formula. In this case it is better to prepare a full list of names, for example with
=sort(unique({Sheet1!A2:A; Sheet2!A2:A}))
which collects the names from A columns of two sheets, eliminating duplicates and sorting. Then look up those using vlookup as above.
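The same lookup-based join can be sketched in Python with the two small tables from this answer. It is a sketch of the idea rather than of how VLOOKUP works internally.

```python
table1 = [("Jim", 3), ("Sarah", 4), ("Bob", 5)]   # columns A and B
table2 = [("Sarah", 9), ("Bob", 8), ("Jim", 7)]   # columns E and F

# Build the lookup once; setdefault keeps the first match for each name.
lookup = {}
for name, value in table2:
    lookup.setdefault(name, value)

# Add the looked-up value as a third column. Missing names give None here,
# where the spreadsheet formula would show #N/A.
joined = [(name, b, lookup.get(name)) for name, b in table1]
print(joined)

# Full, de-duplicated, sorted list of names from both tables.
all_names = sorted({name for name, _ in table1} | {name for name, _ in table2})
print(all_names)
```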

Using more than one sheet for a pivot table in Google sheets

I'm trying to link three spreadsheets in Google Sheets by using the pivot table functionality.
The problem I have now is that I can't find a way to pull data from more than one sheet. I can only build the pivot table with the information coming from a single sheet.
I have researched quite a lot, but my impression so far is that the documentation available for Google Docs is not very extensive on this point.
Basically what i need to do is the following:
Table 1(main):
Car Name | ModelId | ColorID
ford | 1 | 1
fiat | 2 | 2
Table 2:
ModelID | Name
1 | mustang
2 | bravo
Table 3:
ColorID | Name
1 | Red
2 | Blue
Resulting pivot table:
Car Name | Model| Color
ford | mustang | Red
fiat | bravo | Blue
In SQL terms, I'm basically trying to simulate a JOIN.
I could also write a JavaScript script, but I would like to know if there is a simple way to achieve this without coding.
Thanks!
This formula reproduces your example output and will update if more records are added to the 3 tables:
={"CarName","Model","Color";
Table1!A2:A,
ARRAYFORMULA(IFERROR(VLOOKUP(Table1!B2:B,Table2!A:B,2,0))),
ARRAYFORMULA(IFERROR(VLOOKUP(Table1!C2:C,Table3!A:B,2,0)))}
This example sheet shows the formula working.
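For comparison, here is a minimal Python sketch of the same two-lookup join, using the three tables from the question. The dictionaries stand in for Table2 and Table3.

```python
cars = [("ford", 1, 1), ("fiat", 2, 2)]   # Car Name, ModelId, ColorID
models = {1: "mustang", 2: "bravo"}       # ModelID -> Name
colors = {1: "Red", 2: "Blue"}            # ColorID -> Name

result = [("CarName", "Model", "Color")]
for car_name, model_id, color_id in cars:
    # .get(..., "") mirrors IFERROR(VLOOKUP(...)): an unknown ID becomes a blank.
    result.append((car_name, models.get(model_id, ""), colors.get(color_id, "")))

for row in result:
    print(row)
```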
