I am working with a spreadsheet where I store the books I read. The format is as follows:
A | B | C | D | E | F
year | book | author | mark | language | country of the author
With entries like:
A | B | C | D | E | F
-------------------------------------------------------------
2004 | Hamlet | Shakespeare | 8 | ES | UK
2005 | Crime and Punishment | Dostoevsky | 9 | CAT | Russia
2007 | El mundo es ansí | Baroja | 8 | ES | Spain
2011 | Dersu Uzala | Arsenyev | 8 | EN | Russia
2015 | Brothers Karamazov | Dostoevsky | 8 | ES | Russia
2019 | ... Shanti Andía | Baroja | 7 | ES | Spain
I have several pivot tables to get different data, such as top countries, top books, etc. In one of them I want to group by author and sort by the number of books I have read by each one.
So I defined:
ROWS
  author (column C)
    Order: Descending, sorted by COUNT of author
VALUES
  author
    Summarize by: COUNT
    Show as: Default
  mark
    Summarize by: AVERAGE
    Show as: Default
This way, the data above is displayed like this:
author | COUNT of author | AVERAGE of mark
-------------------------------------------------------------
Baroja | 2 | 7,5
Dostoevsky | 2 | 8,5
Shakespeare | 1 | 8
Arsenyev | 1 | 8
This works, since it puts the most-read authors on top. However, I would also like to sort by AVERAGE of mark: when COUNT of author is tied, AVERAGE of mark should break the tie and put the author with the better average on top.
On my sample data, Dostoevsky would go above Baroja (8,5 > 7,5).
I have been looking for different options, but I could not find any that works without adding an extra column to the pivot table.
How can I use a second sort criterion to break ties when the first one gives the same value?
You can achieve a custom sort order on a pivot table without any extra columns in the source range. However, you'd definitely need an extra field added to the pivot.
In the Pivot table editor go to Values and add a Calculated Field.
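Use any formula that describes the sort order you want. For example, multiply the count by 100 so that it acts as the first criterion; as long as the average stays below 100, it can only decide between authors with the same count:
=COUNTA(author) * 100 + AVERAGE(mark)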
Note that it is important to set Summarize by to Custom.
Now, just add this new calculated field as your row's Sort by field, and you're done!
Notice though, you do get an extra column added to the pivot.
Of course, you could hide it.
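To see why the calculated field breaks ties the way the question asks, here is a plain JavaScript sketch (not something the pivot editor runs) that applies the same `count * 100 + average` key to the summary rows from the question and compares it with an explicit two-key sort: count first, then average. The names `compositeKey`, `byCompositeKey` and `byTwoKeys` are illustrative only. The factor 100 is safe only while the average is at least 0 and below 100; assuming marks on a 0-10 scale, any factor greater than 10 would also work.

```javascript
// Summary rows from the question's pivot: [author, COUNT of author, AVERAGE of mark].
const summary = [
  ["Baroja", 2, 7.5],
  ["Dostoevsky", 2, 8.5],
  ["Shakespeare", 1, 8],
  ["Arsenyev", 1, 8],
];

// Composite key, as in the calculated field: COUNT * 100 + AVERAGE.
// Matches a count-then-average sort only while the average is at least 0 and below 100.
function compositeKey([, count, average]) {
  return count * 100 + average;
}

// Descending by the single composite key.
const byCompositeKey = [...summary].sort((a, b) => compositeKey(b) - compositeKey(a));

// Descending by count, ties broken by average (what the question asks for).
const byTwoKeys = [...summary].sort((a, b) => b[1] - a[1] || b[2] - a[2]);

console.log(byCompositeKey.map((row) => row[0]).join(", "));
// Dostoevsky, Baroja, Shakespeare, Arsenyev
```

Both sorts give the same order here; Shakespeare and Arsenyev tie on both values, so they keep their input order (JavaScript's sort is stable).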
Translated from my answer to the cross-posted question on es.SO.
Try:
=QUERY(A2:F,
 "select C, count(C), avg(D)
  where A is not null
  group by C
  order by count(C) desc, avg(D) desc
  label C 'author', count(C) 'COUNT of author', avg(D) 'AVERAGE of mark'")
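For reference, here is a JavaScript sketch of the grouping and ordering such a query performs, over the question's author (C) and mark (D) columns: group by author, sort by count descending, and break ties by average descending (in QUERY terms, `order by count(C) desc, avg(D) desc`). The function name `rankAuthors` is hypothetical; authors tied on both values simply keep first-seen order in this sketch.

```javascript
// Author (column C) and mark (column D) from the question's sample rows.
const reads = [
  ["Shakespeare", 8],
  ["Dostoevsky", 9],
  ["Baroja", 8],
  ["Arsenyev", 8],
  ["Dostoevsky", 8],
  ["Baroja", 7],
];

// Returns [author, count, average] rows sorted by count desc, then average desc.
function rankAuthors(rows) {
  const groups = new Map(); // author -> { count, total }
  for (const [author, mark] of rows) {
    if (!author) continue; // skip blank rows
    const group = groups.get(author) ?? { count: 0, total: 0 };
    group.count += 1;
    group.total += mark;
    groups.set(author, group);
  }
  return [...groups]
    .map(([author, { count, total }]) => [author, count, total / count])
    .sort((a, b) => b[1] - a[1] || b[2] - a[2]);
}

console.log(rankAuthors(reads));
```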
If a user orders the same product under two different order_ids, and both orders are created within the same date-hour grain, for example:
order#1 2019-05-05 17:23:21
order#2 2019-05-05 17:33:21
In the data warehouse, should we put them into two rows like this (Option 1):
| id | user_key | product_key | date_key | time_key | price | quantity |
|-----|----------|-------------|----------|----------|-------|----------|
| 001 | 1111 | 22 | 123 | 456 | 10 | 1 |
| 002 | 1111 | 22 | 123 | 456 | 10 | 2 |
Or just put them in one row with the aggregated quantity (Option 2):
| id | user_key | product_key | date_key | time_key | price | quantity |
|-----|----------|-------------|----------|----------|-------|----------|
| 001 | 1111 | 22 | 123 | 456 | 10 | 3 |
I know if I put the order_id as a degenerate dimension in the fact table, it should be Option 1. But in our case, we don't really want to keep the order_id.
I also once read an article saying that once you filter on every dimension, there should be only one row left in the fact table. If that statement is correct, Option 2 is the choice.
Is there a principle I can refer to?
Conceptually, fact tables in a data warehouse should be designed at the most detailed grain available. You can always aggregate data from the lower granularity to the higher one, while the opposite is not true - if you combine the records, some information is lost permanently. If you ever need it later (even though you might not see it now), you'll regret the decision.
I would recommend the following approach: in the data warehouse, keep the order number as a degenerate dimension. Then, when you publish a star schema, you might build a pre-aggregated version of the table (drop the order number and group identical records by date/hour). This way, you can have a smaller, cleaner fact table in your dimensional model and still preserve the more detailed data in the DW.
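To make the "you can always aggregate upward" point concrete, here is a JavaScript sketch that stores the question's two orders at Option 1 grain, with order_id kept as a degenerate dimension, and rolls them up to the Option 2 row by dropping order_id and summing quantity. The order_id values come from the question; the function name `rollUpWithoutOrderId` is hypothetical, and treating price as a unit price (a grouping attribute rather than a summed measure) is an assumption that matches the question's Option 2 row. The reverse, splitting a quantity of 3 back into 1 and 2, is impossible from the aggregated row alone.

```javascript
// Option 1 grain: one row per order, order_id kept as a degenerate dimension.
const detailedFacts = [
  { order_id: "order#1", user_key: 1111, product_key: 22, date_key: 123, time_key: 456, price: 10, quantity: 1 },
  { order_id: "order#2", user_key: 1111, product_key: 22, date_key: 123, time_key: 456, price: 10, quantity: 2 },
];

// Roll up to Option 2 grain: drop order_id, group on the remaining columns, sum quantity.
// Assumption: price is a unit price, so it is part of the grouping key.
function rollUpWithoutOrderId(rows) {
  const groups = new Map();
  for (const { order_id, quantity, ...dims } of rows) {
    const key = JSON.stringify(dims); // every row lists its columns in the same order
    const group = groups.get(key) ?? { ...dims, quantity: 0 };
    group.quantity += quantity;
    groups.set(key, group);
  }
  return [...groups.values()];
}

console.log(rollUpWithoutOrderId(detailedFacts)); // one row with quantity 3
```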
Below is an example of the data in the table:
+--------------+------+---------+
| Expense Name | Cost | mileage |
+--------------+------+---------+
| Costco Gas | 20 | 145200 |
| marathon gas | 2 | 145500 |
| oil change | 35 | 145600 |
| marathon gas | 25 | 145750 |
| A/C Work | 305 | 145800 |
| oil change | 36 | 150000 |
+--------------+------+---------+
Whenever the "Expense Name" string equals "oil change", I want the highest of the corresponding mileages to appear in a separate column.
So with this data I would search through the "Expense Name" column and find two rows that match the string. Of those two, I want the one with the higher mileage (150000) to appear.
Another method that doesn't require dragging or array formulae is
=MAX(FILTER(C2:C, A2:A = "oil change"))
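The MAX(FILTER(...)) formula keeps only the mileages whose expense name is "oil change" and takes the largest. Below is a JavaScript sketch of the same logic over the sample table; the function name `maxMileageFor` is hypothetical. The lowercase comparison mirrors the fact that `=` on text in Sheets ignores case, and returning null stands in for the #N/A that FILTER gives when nothing matches.

```javascript
// [Expense Name, Cost, mileage] rows from the question.
const expenses = [
  ["Costco Gas", 20, 145200],
  ["marathon gas", 2, 145500],
  ["oil change", 35, 145600],
  ["marathon gas", 25, 145750],
  ["A/C Work", 305, 145800],
  ["oil change", 36, 150000],
];

// Highest mileage among rows whose name equals `name`, ignoring case.
// Returns null when nothing matches (where the sheet formula would show #N/A).
function maxMileageFor(rows, name) {
  const wanted = name.toLowerCase();
  const mileages = rows
    .filter(([expense]) => String(expense).toLowerCase() === wanted)
    .map(([, , mileage]) => mileage);
  return mileages.length > 0 ? Math.max(...mileages) : null;
}

console.log(maxMileageFor(expenses, "oil change")); // 150000
```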
Let's say Expense Name is in A1.
In D2, put the formula =IF(A2="oil change", C2, 0).
Grab the handle at the lower right-hand corner of the cell and drag it down to copy it through your data set (in your case, to D7).
In the cell below (D8 in your example), take the MAX of the cells above, so in your case =MAX(D2:D7).
That cell contains your answer.
I have a problem with emoji in my production database. Since it's in production, all I get out of it is an auto-generated Excel spreadsheet (.xls) every so often, with tens of thousands of rows. I use Google Sheets to parse it so I can easily share the results.
What formula can I use to get a count of all cells in column n that contain emoji?
For instance:
Data
+----+-----------------+
| ID | Name |
+----+-----------------+
| 1 | Chad |
+----+-----------------+
| 2 | ✨Darla✨ |
+----+-----------------+
| 3 | John Smith |
+----+-----------------+
| 4 | Austin ⚠️ Powers |
+----+-----------------+
| 5 | Missus 🎂 |
+----+-----------------+
Totals
+----------------------------------+---+
| People named Chad | 1 |
+----------------------------------+---+
| People with emoji in their names | 3 |
+----------------------------------+---+
Edit by Ben C. R. Leggiero:
=COUNTA(FILTER(A2:A6;REGEXMATCH(A2:A6;"[^\x{0}-\x{F7}]")))
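This should work (note that it counts any character other than an ASCII letter, digit, whitespace or colon, so accented letters and punctuation are counted too):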
=arrayformula(countif(REGEXMATCH(A2:A6,"[^a-zA-Z\d\s:]"),true))
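Both formulas above are heuristics: a name counts as containing emoji if it has any character outside an allowed set. The JavaScript sketch below applies the two character classes to the sample names (in JavaScript regex syntax, `\x{F7}` becomes `\u00F7`). The helper `countMatching` is hypothetical, and "José" and "O'Brien" are hypothetical names added only to show where the two patterns disagree.

```javascript
const names = ["Chad", "✨Darla✨", "John Smith", "Austin ⚠️ Powers", "Missus 🎂"];

// First formula: any character outside U+0000..U+00F7.
const outsideLatin1 = /[^\u0000-\u00F7]/u;
// Second formula: any character that is not an ASCII letter, digit, whitespace or colon.
const notPlainText = /[^a-zA-Z\d\s:]/u;

// Number of values in which the pattern finds at least one character.
function countMatching(pattern, values) {
  return values.filter((value) => pattern.test(value)).length;
}

console.log(countMatching(outsideLatin1, names), countMatching(notPlainText, names)); // 3 3

// Hypothetical plain-text names: only the second pattern flags them.
for (const name of ["Jos\u00E9", "O'Brien"]) {
  console.log(name, outsideLatin1.test(name), notPlainText.test(name)); // false true
}
```

Both patterns find the same 3 names in the sample, but the second one would overcount on real-world names with accents or apostrophes.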
You cannot extract emoji with a regular formula, because Google Sheets uses the lightweight RE2 regex engine, which lacks many features, including those necessary to find emoji.
What you need to do is create a custom function. Open the Tools menu, then Script editor... (in newer versions of Sheets, Extensions > Apps Script). In the script editor, add the following:
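// Code-point escapes such as \u{1F300} need the "u" flag (V8 runtime); without it,
// "\u1F60" means U+1F60 (a Greek letter), not the emoji U+1F600.
var EMOJI_RE = /[\u{1F300}-\u{1F6FF}\u{1F900}-\u{1F9FF}\u{2600}-\u{26FF}\u{2702}-\u{27B0}]+/u;

function find_emoji(s) {
  // First run of emoji in a value, or null (shown as an empty cell) if there is none.
  function firstEmoji(v) { var m = String(v).match(EMOJI_RE); return m ? m[0] : null; }
  if (Array.isArray(s)) {
    // A range arrives as a 2-D array: return one result per cell.
    return s.map(function (row) { return row.map(firstEmoji); });
  }
  return firstEmoji(s);
}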
Save the script. Go back to your spreadsheet, then test your formula =find_emoji(A1)
My test yields the following:
| Missus 🎂 | 🎂 |
| Austin ⚠️ Powers | ⚠ |
| ✨Darla✨ | ✨ |
| joke 😆😆 | 😆😆 |
And, to count the entries that do contain emoji (those where find_emoji returns something), you can use this formula:
=countif( arrayformula(isblank( find_emoji(filter(F2:F,not(isblank(F2:F)))))), FALSE)
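EDIT
I was wrong. You can use a regular formula to extract emoji, since RE2 supports \x{...} code-point escapes. A working pattern is [\x{1F300}-\x{1F64F}]|[\x{1F680}-\x{1F6FF}]|[\x{2600}-\x{26FF}]|[\x{2702}-\x{27B0}] (four-digit ranges such as \x{1F68}-\x{1F6C} are Greek letters, not emoji, and the surrogate range only made sense for JavaScript's UTF-16 strings).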
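Checking specific code-point ranges, rather than "anything unusual", counts only characters from those blocks. The JavaScript sketch below counts cells containing at least one character from the main emoji blocks U+1F300-U+1F64F, U+1F680-U+1F6FF, U+2600-U+26FF and U+2702-U+27B0, written with `\u{...}` escapes and the `u` flag. The block list is an approximation chosen for this example, not the full Unicode emoji set (newer emoji in U+1F900-U+1F9FF, for instance, would need another range), and the name `countEmojiCells` is hypothetical.

```javascript
// Approximate emoji blocks; \u{...} code-point escapes require the "u" flag.
const EMOJI = /[\u{1F300}-\u{1F64F}\u{1F680}-\u{1F6FF}\u{2600}-\u{26FF}\u{2702}-\u{27B0}]/u;

// Counts the values that contain at least one character from those blocks.
function countEmojiCells(values) {
  return values.filter((value) => EMOJI.test(String(value))).length;
}

const sample = ["Chad", "✨Darla✨", "John Smith", "Austin ⚠️ Powers", "Missus 🎂"];
console.log(countEmojiCells(sample)); // 3
console.log(countEmojiCells(["Jos\u00E9", "O'Brien"])); // 0
```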
I'm using a spreadsheet to store high scores. I have one column for initials (column D) and one for scores (column E). The rows are already sorted from highest to lowest score. I want to get the first occurrence of each set of initials, together with its score.
For example if I had this:
|Initials|Scores|
| ABC | 5 |
| NOT | 4 |
| ABC | 2 |
| LOL | 1 |
I want to get this:
|Initials|Scores|
| ABC | 5 |
| NOT | 4 |
| LOL | 1 |
I've been able to get just the names portion with =UNIQUE(D:D), but how would one also get the scores from the next column? I've been trying for a while now, and can't figure it out.
Since the values in E are already sorted, try:
=ArrayFormula(vlookup(unique(filter(D2:D, len(D2:D))), D2:E, {1,2}, 0))
or if you want to use a limited range:
=ArrayFormula(vlookup(unique(D2:D50), D2:E50, {1,2}, 0))
See if that works.
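VLOOKUP with exact matching returns the first row that matches, so over a list already sorted by score it picks each player's best score. A JavaScript sketch of the same first-occurrence idea on the example data (the function name `firstScorePerPlayer` is hypothetical):

```javascript
// [Initials, Score] rows, already sorted from highest to lowest score.
const highscores = [
  ["ABC", 5],
  ["NOT", 4],
  ["ABC", 2],
  ["LOL", 1],
];

// Keeps the first row seen for each set of initials, skipping blank initials.
function firstScorePerPlayer(rows) {
  const best = new Map(); // Map keeps insertion order, like UNIQUE
  for (const [initials, score] of rows) {
    if (initials && !best.has(initials)) best.set(initials, score);
  }
  return [...best];
}

console.log(firstScorePerPlayer(highscores)); // ABC 5, NOT 4, LOL 1
```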
A     | B   | C     | D      | E      | F | G
name  | num | quant | item   | quant2
car   | 5   | 100   |        |
      |     |       | wheel  | 4
      |     |       | axle   | 2
      |     |       | engine | 1
truck | 2   | 20    |        |
      |     |       | wheel  | 6
      |     |       | bed    | 1
      |     |       | axle   | 2
I need a formula that computes B*C*E. Given the layout above, it would be something like =B$2*C$2*E3, dragged down, and then for the next set =B$6*C$6*E7, dragged down, and so on. But I'm not sure how to get a "ceiling" of sorts: if B5 is empty, look at each cell above until it finds one that is filled.
I am trying to use this to get the total quantity of parts per car, truck, etc., and then group by part.
I don't have a set of database tables to do this, just a spreadsheet.
I had to add some additional information to resolve this.
I was thinking there would be a way to write a Google Apps Script that would do this and update the file, but I couldn't find one.
I first computed the total for each item in a group:
=B$3*E4
and dragged it down for that group.
Then, in an empty area of the sheet, I wrote a query that sums those totals (column F) per item:
=QUERY(D:F, "select D, sum(F) where D is not null group by D")
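The "look upward until a filled cell is found" step from the question can also be expressed directly. The JavaScript sketch below walks the rows in order, remembers num (B) and quant (C) from the most recent parent row, multiplies them by quant2 (E) on each item row, and totals by item. It assumes, as in the question's layout, that parent rows have a name in A and item rows leave A blank; the function name `totalsByPart` is hypothetical.

```javascript
// Rows as [name (A), num (B), quant (C), item (D), quant2 (E)]; blank cells are "".
const rows = [
  ["car", 5, 100, "", ""],
  ["", "", "", "wheel", 4],
  ["", "", "", "axle", 2],
  ["", "", "", "engine", 1],
  ["truck", 2, 20, "", ""],
  ["", "", "", "wheel", 6],
  ["", "", "", "bed", 1],
  ["", "", "", "axle", 2],
];

// Total parts per item: B * C of the nearest parent row above, times E of the item row.
function totalsByPart(sheetRows) {
  let num = 0;
  let quant = 0;
  const totals = new Map();
  for (const [name, b, c, item, e] of sheetRows) {
    if (name !== "") {
      // Parent row (car, truck, ...): these become the "ceiling" values below it.
      num = b;
      quant = c;
    } else if (item !== "") {
      totals.set(item, (totals.get(item) ?? 0) + num * quant * e);
    }
  }
  return totals;
}

console.log(totalsByPart(rows)); // wheel 2240, axle 1080, engine 500, bed 40
```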