Sort pie chart's slices without sorting the data in the columns - google-sheets

I haven't found something similar to that on the web, and it seems that the only way for the data to be sorted on the pie chart is if they're pre-sorted.
The problem is that I have a sheet populated by a third party software (Typeform) that places random data, which I then aggregate to present to the pie chart.
More specifically, Typeform writes
town | salary | cost
London | 1000 | 500
Bristol | 700 | 300
London | 900 | 400
Leeds | 600 | 200
Leeds | 500 | 300
Leeds | 400 | 200
Then I aggregate the data in another sheet (Sheet2) so that I have
town | occurrences
London | 2
Bristol | 1
Leeds | 3
The pie chart therefore draws London first, then Bristol, and then Leeds. There are only 3 entries here, but my real data has 20, and the slices in the pie chart appear in no particular order.
Sheet2's data cannot be sorted in descending order because it is produced by formulas: I use =UNIQUE(Sheet1!A2:A) and, in the column next to it, =COUNTIF(Sheet1!A:A,A2) to populate it from Sheet1, where the third-party software writes. When I select the cells and click sort, they don't get sorted; they reappear as they were.
Is there any way to sort them (and keep them sorted) in Sheet2, or by writing them in a new sheet?

If town is in A1 of Sheet1, please try:
=query(Sheet1!A2:D7, "select A, count(C) group by A order by count(C) desc")

Assuming the Typeform data is in Sheet1!A:C, the following function in Sheet2!A1 should do the trick:
=QUERY(Sheet1!A:C,"select A, count(B) where A is not null group by A order by count(B) desc",1)
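For readers who want to check the logic outside the spreadsheet, here is a minimal Python sketch of what a "group by town, count, order by count descending" query computes. The rows are the sample data from the question; the function name count_by_town is only illustrative and is not part of Google Sheets.

```python
from collections import Counter

# Sample rows from the question: (town, salary, cost)
rows = [
    ("London", 1000, 500),
    ("Bristol", 700, 300),
    ("London", 900, 400),
    ("Leeds", 600, 200),
    ("Leeds", 500, 300),
    ("Leeds", 400, 200),
]


def count_by_town(rows):
    """Return (town, occurrences) pairs, largest count first."""
    counts = Counter(town for town, _salary, _cost in rows)
    # most_common() orders the pairs by count, descending
    return counts.most_common()


for town, occurrences in count_by_town(rows):
    print(town, occurrences)
```

Charting a range that is already in this order is what makes the pie slices come out sorted.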

Another way to do it is to simply apply a filter to your data, then sort A-Z (for lowest to highest percentage) or Z-A (for highest to lowest percentage).
Then you create your pie chart from that range and it comes out sorted.

Related

Can I filter out pivot table results that only have one row for a value in column A?

I created a pivot table in Google Sheets, and it returns results that look like:
first | second | CountOf3
--------------------------
thing | value | 23
| newVal | 3
| cool | 34
that | value | 234
otherThing | cool | 4
| newVal | 345
And I want to filter out results with just one resulting row for the item in the first column.
So in this example, that would be the row: that | value | 234.
I would like the filter to remove that row, and leave the remaining rows. This is a pivot table in a 2nd sheet that updates when Sheet1 changes.
I have been trying all day, and have not been able to come up with a solution. I was hoping there would be some sort of filter, or spreadsheet formula to do this. I've tried multiple combinations of filters, but nothing seems to work - I'm starting to wonder if this is even possible.
It isn't pretty, but a brute-force way is to add a check column beside your pivot table, with a formula starting on the first data row, i.e. beside "thing | value | 23". (The formulas here assume the pivot table's first column is column D and its data starts in row 3.)
The formula flags each row where both that row's cell in column D and the next row's cell in column D are non-blank, which only happens when an item has a single result row. Then use a query (or filter) to list only the output rows you want. Note that you would hide the columns or rows with the actual (unfiltered) pivot output.
This is the simplest version, to see the logic:
=AND(LEN(D3),LEN(D4))
which results in a TRUE value for pivot table rows whose item has only one result row.
A more elegant version is an array formula that also adds the header label and uses "Skip" as the flag for the rows to filter out.
={"Better Check";ARRAYFORMULA(IF(LEN(D3:D998)*LEN(D4:D999)*LEN(E3:E998),"Skip",))}
Note that this formula allows for a pivot table result effectively to the bottom of the sheet, but it does have a finite range, due to the constraint of checking two rows at once. It could be enhanced by using a COUNTA on the third data column to measure the exact length of the pivot table results and control the range dynamically, like this:
={"Better Check";
ARRAYFORMULA( IF( LEN(INDIRECT("D3:D" & (COUNTA(F$3:F)+ROW(F$2)))) *
LEN(INDIRECT("D4:D" & (COUNTA(F$3:F)+1+ROW(F$2)))),
"Skip",))}
Let us know if this helps at all.
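The check-column logic can be sketched in Python to see which rows it flags. This is only an illustration using the pivot output from the question, with an empty string standing in for a blank cell; flag_single_row_groups is a made-up name. Like the simple formula, it leaves a single-row item unflagged if it is the very last row, because the cell below it is blank.

```python
# Pivot output from the question; "" means the cell in the first column is blank.
pivot = [
    ("thing", "value", 23),
    ("", "newVal", 3),
    ("", "cool", 34),
    ("that", "value", 234),
    ("otherThing", "cool", 4),
    ("", "newVal", 345),
]


def flag_single_row_groups(rows):
    """Mirror =AND(LEN(D3),LEN(D4)): flag a row when its first cell and
    the first cell of the row below are both non-blank."""
    flags = []
    for i, row in enumerate(rows):
        next_first = rows[i + 1][0] if i + 1 < len(rows) else ""
        flags.append(bool(row[0]) and bool(next_first))
    return flags


flags = flag_single_row_groups(pivot)
# Keep only the rows that were not flagged, as the query/filter step would.
kept = [row for row, skip in zip(pivot, flags) if not skip]
```

With the sample data, only the "that | value | 234" row is flagged and dropped.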

Combine 2 queries (different columns from 2 different sheets) and filter based on matching results

As usual, I have set a goal, way beyond my skills...
I need to get data from 2 sheets, One has a lot more entries than the other (a master list I guess you could say). Any entry in the smaller sheet will always have a matching entry in the Master, but not necessarily the other way round.
I have written what I need in pseudo query syntax, but I need help getting this to work...
QUERY the 'Catalog' sheet and get TITLE, SUBTITLE, STATUS, TITLE-ID WHERE the STATUS does NOT have the word 'Retired' in it.
Then Query 'Report_Dec 2017' and get UNITS, USD, GPB, EUR WHERE TITLE-ID from 'Report_Dec 2017' Matches TITLE-ID from 'Catalog'
Catalog (master)
| TITLE | SUBTITLE | STATUS | TITLE-ID |
Report_Nov_2017
| UNITS | USD | GPB | EUR | (has TITLE-ID also, but don't need this twice)
Final result should look like this:
| TITLE | SUBTITLE | STATUS | TITLE-ID | UNITS | USD | GPB | EUR |
The end result should only ever have a max number of entries equal to that of from 'Report_Nov 2017', So the Catalog might have 100 total entries but since only 20 units were sold in November, then the result will only show 20
First of all is that possible? And secondly, if it is, can someone point me in the right direction?
EDIT UPDATE
I have made some progress with this, but I am stuck on a strange issue...
This is my google sheet:https://docs.google.com/spreadsheets/d/10uXJVilUqAnSE_ZPlA6VKMBl0DCFRt_WqzYYl-c4Syc/edit?usp=sharing
This is my current formula:
=ArrayFormula(query({to_text(Catalog!B:J),to_text('Report_Nov 2017'!A:J)},"SELECT Col1,Col3,Col4,Col9,Col16,Col17,Col18,Col19 where Col4 != 'Retired' and Col15 MATCHES '"&textjoin("|", TRUE, Catalog!J2:J)&"'",1))
I am getting a result where the entries returned from Catalog do not match the entries returned from 'Report_Nov 2017'. It just seems to be grabbing the first 25 results from Catalog instead of checking whether the TITLE-ID matches in 'Report_Nov 2017'. Any ideas where I'm going wrong?
I suggest you split the task into smaller tasks:
1) Add some columns to the report sheet: | TITLE | SUBTITLE | STATUS |. Get their values from Catalog (master) by matching on TITLE-ID. You may try VLOOKUP inside ARRAYFORMULA to automate this.
2) Then use a simple QUERY formula to get the rest.
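The two-step approach amounts to a lookup by TITLE-ID followed by a filter on STATUS. Here is a rough Python sketch of that logic. The column names follow the question, but every value in the sample data is hypothetical, and join_report_with_catalog is just an illustrative name.

```python
# Hypothetical sample data: column names follow the question, values are made up.
catalog = [
    {"TITLE": "Alpha", "SUBTITLE": "One", "STATUS": "Active", "TITLE-ID": "T1"},
    {"TITLE": "Beta", "SUBTITLE": "Two", "STATUS": "Retired", "TITLE-ID": "T2"},
    {"TITLE": "Gamma", "SUBTITLE": "Three", "STATUS": "Active", "TITLE-ID": "T3"},
]
report = [
    {"TITLE-ID": "T3", "UNITS": 5, "USD": 10.0, "GPB": 8.0, "EUR": 9.0},
    {"TITLE-ID": "T2", "UNITS": 1, "USD": 2.0, "GPB": 1.5, "EUR": 1.8},
]


def join_report_with_catalog(report, catalog):
    """For each report row, look up the catalog entry by TITLE-ID (step 1),
    then drop entries whose STATUS contains 'Retired' (step 2)."""
    by_id = {entry["TITLE-ID"]: entry for entry in catalog}
    result = []
    for sale in report:
        entry = by_id.get(sale["TITLE-ID"])
        if entry is None or "Retired" in entry["STATUS"]:
            continue
        # Catalog columns first, then the report columns without a second TITLE-ID
        sales_columns = {k: v for k, v in sale.items() if k != "TITLE-ID"}
        result.append({**entry, **sales_columns})
    return result


joined = join_report_with_catalog(report, catalog)
```

Because the loop runs over the report rows, the result can never have more rows than the report, which is the constraint described in the question.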

Filter to the latest month and then filter to the best score per person

I've got a Google Sheet which holds the results of a monthly competition. The format is
Name | Date | Score
--------------------------------
Alan Smith | 14/01/2016 | 500
Bob Dow | 14/01/2016 | 450
Bob Dow | 16/01/2016 | 470
Clare Allie| 16/01/2016 | 550
Declan Ham | 16/01/2016 | 350
Alan Smith | 10/02/2016 | 490
Bob Dow | 10/02/2016 | 425
Declan Ham | 12/02/2016 | 400
Declan Ham | 12/02/2016 | 390
Clare Allie| 12/02/2016 | 560
I want to do 2 things with this data
1) I want to create a new sheet which holds the latest 'best' results. For the data presented here that would be
Alan Smith | 10/02/2016 | 490
Bob Dow | 10/02/2016 | 425
Declan Ham | 12/02/2016 | 400
Clare Allie| 12/02/2016 | 560
i.e. the results from February with the 'best' score per person. Here Declan Ham's lower score of '390' was removed.
2) I want another sheet to hold the tournament ranking. People are ranked by their top 3 monthly scores, i.e. the best score for each person for each month is obtained and the top 3 scores are combined to give their place in the tournament.
So far I've attempted to use Google queries, vlookups, filters to get these new sheets. But, just focusing on 1), the best I've been able to achieve is
=FILTER(Results!$A:$B, MONTH(Results!$B:$B) = MONTH(MAX(Results!$B:$B)))
This gets me the results from the latest month, but it does not remove duplicate entries per person.
Does anyone have a suggestion for how I can achieve these requirements? Feel like I'm treading water at the moment.
Rather than trying to remove duplicates, you need to identify the maximum score for each person; you can do that by grouping values by person, then aggregating using max(). Here's how that would look for the month of February 2016:
=query(Results!A1:C,"select A,max(C) where todate(B) >= date '2016-02-01' group by A")
Instead of using a fixed value for the start of the latest month, we can get the year and month using spreadsheet formulas, and concatenate them into the query:
=query(Results!A1:C,"select A,max(C) where todate(B) >= date '"&year(max(Results!B2:B))&"-"&month(max(Results!B2:B))&"-1' group by A")
That addresses your first question.
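As a cross-check of the grouping logic, here is a small Python sketch that does the same thing: find the start of the latest month, keep rows on or after it, and take the maximum score per person. The data is the sample from the question; latest_month_best is an illustrative name only.

```python
from datetime import date

# Sample results from the question: (name, date, score)
results = [
    ("Alan Smith", date(2016, 1, 14), 500),
    ("Bob Dow", date(2016, 1, 14), 450),
    ("Bob Dow", date(2016, 1, 16), 470),
    ("Clare Allie", date(2016, 1, 16), 550),
    ("Declan Ham", date(2016, 1, 16), 350),
    ("Alan Smith", date(2016, 2, 10), 490),
    ("Bob Dow", date(2016, 2, 10), 425),
    ("Declan Ham", date(2016, 2, 12), 400),
    ("Declan Ham", date(2016, 2, 12), 390),
    ("Clare Allie", date(2016, 2, 12), 560),
]


def latest_month_best(results):
    """Best score per person among rows dated in the latest month."""
    latest = max(day for _name, day, _score in results)
    month_start = latest.replace(day=1)
    best = {}
    for name, day, score in results:
        if day >= month_start:
            # Equivalent of max(C) ... group by A
            best[name] = max(score, best.get(name, score))
    return best


best = latest_month_best(results)
```

For the sample data this keeps Declan Ham's 400 and discards his 390, matching the expected output in the question.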
Tournament ranking
Your second goal is too complex for a single spreadsheet formula, in my opinion. Here's a way to accomplish it with multiple formulas, though!
The row and column headers are filled out by spreadsheet formulas. Down the side (orange), we populate participants' names using this in cell A3:
=unique(Results!A2:A)
Across the top are dates (green). These are the start dates of each unique month that there are scores for, calculated using the following formula in cell D2. This results in strings, e.g. 2016-01-1, and that format is specifically required for the later formulas to work.
=TRANSPOSE(SORT(UNIQUE(ARRAYFORMULA(TEXT(Results!B2:B13,"YYYY-MM-1")))))
Here's the formula for cell D3, which calculates the sum of the 3 highest scores recorded for the participant whose name appears in A3, for the month appearing in D2. (Copy and paste the formula across the full range of participants and months, and the references will adjust.)
=sum(query(Results!$A$1:$C,"select C where A='"&$A3&"' and todate(B) >= date '"&D$2&"' and todate(B) < date '"&IF(ISBLANK(E$2),TEXT(TODAY()+1,"yyyy-mm-dd"),E$2)&"' order by C desc limit 3 label C ''"))
Key points about that formula:
The query range needs to use absolute references so that it doesn't shift when the formula is copied to additional cells. However, it's still open-ended, to absorb additional rows of scores on the "Results" sheet.
Results!$A$1:$C
A WHERE clause is used to select rows from the Results sheet that are for the given participant (A='"&$A3&"') and fall within the month that heads the column: on or after the date in D$2 and before the start of the next month in E$2. When E$2 is blank (the last month), tomorrow's date is used as the upper bound.
...and todate(B) < date '"&IF(ISBLANK(E$2),TEXT(TODAY()+1,"yyyy-mm-dd"),E$2)&"'
The best 3 scores for the month are found by first sorting the above result descending, then limiting the result to 3 rows.
...order by C desc limit 3
Finally, the QUERY headers are suppressed by this little trick, so that we get a single number as the result:
...label C ''
Individual tournament totals appear in column C, with a range SUM across the row, e.g. for cell C3:
=SUM(D3:3)
The corresponding ranking in column B is then:
=RANK(C3,C$3:C)
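To see what the grid of formulas computes as written, here is a Python sketch that follows the same steps: sum the three highest scores per participant per month, add the months up, and rank the totals. It mirrors the formulas above rather than proposing a different scoring rule. The data is the sample from the question, and tournament_ranking is an illustrative name.

```python
from collections import defaultdict
from datetime import date

# Sample results from the question: (name, date, score)
results = [
    ("Alan Smith", date(2016, 1, 14), 500),
    ("Bob Dow", date(2016, 1, 14), 450),
    ("Bob Dow", date(2016, 1, 16), 470),
    ("Clare Allie", date(2016, 1, 16), 550),
    ("Declan Ham", date(2016, 1, 16), 350),
    ("Alan Smith", date(2016, 2, 10), 490),
    ("Bob Dow", date(2016, 2, 10), 425),
    ("Declan Ham", date(2016, 2, 12), 400),
    ("Declan Ham", date(2016, 2, 12), 390),
    ("Clare Allie", date(2016, 2, 12), 560),
]


def tournament_ranking(results):
    # One list of scores per (participant, month), like one cell of the grid
    monthly = defaultdict(list)
    for name, day, score in results:
        monthly[(name, (day.year, day.month))].append(score)

    # Column C: add up the three highest scores of every month
    totals = defaultdict(int)
    for (name, _month), scores in monthly.items():
        totals[name] += sum(sorted(scores, reverse=True)[:3])

    # Column B: RANK-style position, 1 = highest total, ties share a rank
    ranks = {
        name: 1 + sum(other > total for other in totals.values())
        for name, total in totals.items()
    }
    return dict(totals), ranks


totals, ranks = tournament_ranking(results)
```

Note that with this scheme a participant with several entries in one month accumulates more points for that month than one with a single entry.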
Tidy
For simpler copy/paste, you can do some error checking in these formulas, so that they can be placed in the sheet before the corresponding data is - for example, at the start of your season. Using IF(ISBLANK(... or IFERROR(... can be very effective for this.
B3 & down:
=IFERROR(RANK(C3,C$3:C))
C3 & down:
=IF(ISBLANK(A3),"",sum(D3:3))
D3 & rest of field:
=IFERROR(sum(query(Results!$A$1:$C,"select C where A='"&$A3&"' and todate(B) >= date '"&D$2&"' and todate(B) < date '"&IF(ISBLANK(E$2),TEXT(TODAY()+1,"yyyy-mm-dd"),E$2)&"' order by C desc limit 3 label C ''")))
Alternatively, for the first part of your question (the latest 'best' results), in addition to the solution provided by Mogsdad, this should also work :-)
=ArrayFormula(iferror(vlookup(unique(A2:A), sort(A2:C, 2, 0, 3, 0), {1,3}, 0)))
EDIT: This formula sorts the table by date (col B) descending and then by score (col C) descending, and (ab)uses the fact that VLOOKUP only returns the first match, so each name picks up its most recent, highest-scoring row. The {1,3} index returns the first and last columns.

correlate a demographic column with answers from multiple columns

I have a spreadsheet from a Google Consumer Survey. The survey captured demographics as well as the responses to a question. Acceptable responses could have chosen zero or more 'answers'. The response for each answer is in a unique column. For example,
user id | gender | age | income | answer 1 | answer 2 | answer 3 |
0001 | Female | 20-30 | 50-75 | [empty] | Right | Never |
0002 | Male | 20-30 | 30-50 | Up | Left | [empty] |
I would like to know how to correlate a column of demographic info with each of the possible answers. For example, I want to be able to answer questions like, Were males more likely than females to choose X for answer 1? and Which age group was more likely to choose Y for answer 2?
I prefer an answer using Google Sheets functions, but I am open to learning other ways to understand the data. Thank you for any help!
A good way is to use the QUERY function. Let's first assume your data is stored in range A:G:
A | B | C | D | E | F | G |
user id | gender | age | income | answer 1 | answer 2 | answer 3 |
0001 | Female |20-30| 50-75 | [empty] | Right | Never |
0002 | Male |20-30| 30-50 | Up | Left | [empty] |
Then you can write simple query functions.
For example, to count responses grouped by gender and age, with one column per value of answer 1:
=query(A:G,"select B, C, count(D) where not A is null group by B, C pivot E")
where not A is null -- prevents empty rows from being used in the query
count(D) -- can count any column that isn't already used elsewhere in the query
group by B, C -- must contain all selected items, except aggregates (count, sum, etc.)
pivot E -- makes each answer show in a separate column.
The result will look like this:
Left Never Right Up
Female 20-30 1 1 1
Female 30-40 1
Male 20-30 1 1 1
Male 30-40 1
Please, look at complete Query Language Reference to learn more.
Have you tried using the Pivot Table function of Google Sheets?
1) Download the data in Excel format after the survey is complete and open it with Google Sheets.
2) Select the tab with the resulting data from the Google Consumer Survey after it is run.
3) From the menu, select Data -> Pivot Table. This opens a new tab in your spreadsheet.
4) For the Values area of the pivot table, select User ID and, from the "Summarize by" dropdown, select COUNTUNIQUE.
5) For the columns and rows, select whichever dimensions you are interested in. For instance, in your example, you would pick:
"Gender" and "Answer 1" as a row and column.
"Age" and "Answer 2" as a row and column.
This should answer these kinds of questions easily.
Hope this helps!
What I think I needed was the COUNTIFS function (in Google Sheets). Notice the plural, which is different from COUNTIF (singular).
COUNTIFS allowed me to specify multiple criteria to produce a count for each demographic segment. For example, I could count all the Males that responded Up in the answer 1 column.
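The idea behind counting with multiple criteria can be sketched in Python as follows. The two rows are the sample responses from the question, with None standing for an empty answer; count_ifs is an illustrative helper, not the Sheets function itself.

```python
# The two sample responses from the question; None stands for an empty answer.
responses = [
    {"user id": "0001", "gender": "Female", "age": "20-30", "income": "50-75",
     "answer 1": None, "answer 2": "Right", "answer 3": "Never"},
    {"user id": "0002", "gender": "Male", "age": "20-30", "income": "30-50",
     "answer 1": "Up", "answer 2": "Left", "answer 3": None},
]


def count_ifs(rows, criteria):
    """Count the rows where every column named in criteria equals the wanted value."""
    return sum(
        all(row.get(column) == wanted for column, wanted in criteria.items())
        for row in rows
    )


males_up = count_ifs(responses, {"gender": "Male", "answer 1": "Up"})
females_up = count_ifs(responses, {"gender": "Female", "answer 1": "Up"})
```

Comparing such counts across segments (ideally divided by the size of each segment) is what answers questions like "were males more likely than females to choose Up for answer 1?".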

Is there a multiply-and-add formula in Google's spreadsheet?

What I want is to easily multiply a number by another number for each column and add them up at the end in Google Sheets. For example:
User | Points 1 | Points 2 | Points 3 | Total
| 5 | 1 | 4 |
-----+----------+----------+----------+------
Jane | 2 | 3 | 0 | 13 (2*5 + 3*1 + 0*4)
John | 1 | 11 | 4 | 32 (1*5 + 11*1 + 4*4)
So it's easy enough to make this formula for the total:
= B3*$B$2 + C3*$C$2 + D3*$D$2
The problem is I frequently need to insert additional columns or even remove some columns. So then I have to mess with all the formulas. It's a pain... we have many spreadsheets with these formulas. I wish there was a formula like SUM(B3:D3) where I could just specify a range. Is there anything like MULTIPLY_AND_SUM(B2:D2, B3:D3) that would do this? Then I could insert columns in the middle and the range would still work.
There is a built-in function in Google Sheets that does exactly what you are looking for: SUMPRODUCT.
In your example the formula would be:
=sumproduct(B$2:D$2,B3:D3)
See the SUMPRODUCT documentation for more information about this function.
You can accomplish that without requiring a special-purpose function.
In E3, try this (and copy it to the rest of your rows):
=sum(arrayformula(B3:D3*B$2:D$2))
ARRAYFORMULA is described in the Google Sheets function documentation.
As long as you introduce new columns between B and D, this formula will automatically adjust. If you add new columns outside of that range, you'll need to edit (and cut & paste).
On its own, arrayformula(B3:D3*B$2:D$2) operates over each value in B3:D3 in turn, multiplying it by the corresponding value in B$2:D$2. (Note the use of absolute references to 'lock down' to row 2.) The result in this case is three values, [10,3,0], arranged horizontally in three columns because that matches the dimensions of the ranges.
The enveloping sum() function adds up the values of the array produced by arrayformula, which is 13 in this case.
As you copy that formula to other rows, the relative range references get updated for the new row.
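Both formulas compute the same thing: multiply the two ranges element by element, then add up the products. A short Python sketch with the numbers from the question makes the arithmetic explicit; sum_product is an illustrative name.

```python
# Multipliers from row 2 and each user's points, as in the question
weights = [5, 1, 4]
points = {"Jane": [2, 3, 0], "John": [1, 11, 4]}


def sum_product(xs, ys):
    """Multiply corresponding elements and add up the products."""
    if len(xs) != len(ys):
        raise ValueError("ranges must have the same length")
    return sum(x * y for x, y in zip(xs, ys))


totals = {user: sum_product(weights, row) for user, row in points.items()}
# Jane: 2*5 + 3*1 + 0*4 = 13, John: 1*5 + 11*1 + 4*4 = 32
```

Adding a column in the spreadsheet corresponds to appending one element to both lists here; the calculation itself does not change.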
