How to calculate total of partial matches for dataset?


I'm trying to figure out how to count the rows in a dataset where at least 2 out of 3 criteria match.
For example (6 isn't the correct answer, just an illustration), I want to count how many rows contain 2 of the three criteria Cats, Dogs, Parrots. Every combination should be accounted for: Cats, Dogs, Lions would be valid, for example, but Cats, Hippos, Gazelle would not.

try:
=ARRAYFORMULA(COUNTIF(LEN(SUBSTITUTE(FLATTEN(QUERY(TRANSPOSE(
IFERROR((REGEXMATCH(FLATTEN(QUERY(TRANSPOSE(A2:C),,9^9)),
TRIM(SPLIT(E2, ",")))/1)^-1)),,9^9)), " ", )), ">=2"))
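To sanity-check what the formula returns, the same count can be sketched outside the sheet. This is only a rough JavaScript sketch, not the formula's own mechanics: the function name `countRowsWithMatches` and the sample rows are made up for illustration, and it compares whole cells exactly, whereas the formula uses REGEXMATCH (substring matching) against each joined row.

```javascript
// Count rows that contain at least `minMatches` of the given criteria.
// Hypothetical helper; matching is exact and case-sensitive.
function countRowsWithMatches(rows, criteria, minMatches) {
  return rows.filter(function (row) {
    // Number of distinct criteria that appear somewhere in this row.
    var hits = criteria.filter(function (c) {
      return row.indexOf(c) !== -1;
    }).length;
    return hits >= minMatches;
  }).length;
}

// Made-up sample data: three animals per row.
var sampleRows = [
  ["Cats", "Dogs", "Lions"],      // 2 matches: counted
  ["Cats", "Hippos", "Gazelle"],  // 1 match: not counted
  ["Parrots", "Dogs", "Cats"],    // 3 matches: counted
  ["Lions", "Tigers", "Bears"]    // 0 matches: not counted
];
var total = countRowsWithMatches(sampleRows, ["Cats", "Dogs", "Parrots"], 2);
console.log(total); // 2
```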

Related

Google Sheets function to divide two arrays and return the lowest number

So I manufacture products and the challenge is I need to work out in a Google Sheets spreadsheet how many of each of my recipes (flower bouquets in this case) I can make with the current stock on hand.
I have a dataset which shows my stock qty (number of each flower) in column B, then I have my products across the top of the page in a row. Product 1 is shown in column C, Product 2 in D etc....
Here is the example.
As I have hundreds of products and also hundreds of different component parts, I need a formula that divides these two arrays and returns the MIN (lowest) number, which is the number of bouquets I can make.
Thanks in advance
I have tried experimenting with SUMPRODUCT, as I wondered whether it can be used for division as well as multiplication, but I cannot seem to work it out.

Use the following formula in C11:
= ROUNDDOWN(MIN(ARRAYFORMULA(IFERROR( $B2:$B9/C2:C9 ))))
You can then drag the formula to D11.
Alternatively, you can use the newer BYCOL function to calculate the result for every product column at once:
= BYCOL(ARRAYFORMULA(IFERROR( B2:B9/C2:D9 )), LAMBDA(x, ROUNDDOWN(MIN(x))))
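For anyone who wants to see the logic spelled out, here is a small JavaScript sketch of what the formula does for one product column: divide stock by the quantity required, ignore components the product doesn't use, take the minimum and round down. The function name `maxBuildable` and the numbers are invented for the example; they are not from the question's sheet.

```javascript
// For one product: how many units can be built from the current stock?
// stock[i]  = quantity on hand of component i (column B in the sheet)
// recipe[i] = quantity of component i needed per unit (0 or "" = not used)
function maxBuildable(stock, recipe) {
  var limits = [];
  for (var i = 0; i < stock.length; i++) {
    if (recipe[i] > 0) {                 // skip unused components, like IFERROR does
      limits.push(stock[i] / recipe[i]);
    }
  }
  if (limits.length === 0) return 0;     // a recipe that uses nothing gives 0 here
  return Math.floor(Math.min.apply(null, limits));
}

// Made-up numbers: three components, two products.
var stock = [100, 50, 30];
var product1 = [10, 0, 4];  // limits: 100/10 = 10, 30/4 = 7.5
var product2 = [3, 5, 0];   // limits: 100/3 = 33.3, 50/5 = 10
console.log(maxBuildable(stock, product1)); // 7
console.log(maxBuildable(stock, product2)); // 10
```

The component with the smallest ratio is the bottleneck, which is why MIN (and not SUMPRODUCT) is the right tool here.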

Reference Specific Row in Named Range within another Named Range

I'm writing a spreadsheet to keep track of a small business' financials. They operate a few Rooms for rent, and the structure of the document is made so that each sheet holds a year's worth of booking for all the rooms.
Essentially, each row defines a specific date, while each room spans a few columns (the reason is that they don't just want to track whether or not a room is booked, but also record the names of clients and other remarks), among which is the daily calculated income (some factors alter the daily rate each room will generate).
So this is all fine and dandy, and I've created named ranges for each month of the year, and for each room.
For example, rows 6:36 will represent the month of January, while columns C:I will represent Room 1. Room 2 will span J:P and so forth.
Now, in another sheet, I wanted to make a dashboard which lists the earning for each room, per month. It's a very simple table with 12 rows (one for each month) and 10 columns (1 for each room) where I planned to sum up all the earnings.
So my issue is that I can't find a way to retrieve a specific column of a named range for a room ('vertical named range'), which is also limited in a named range for a month ('horizontal named range'). I had read about using ARRAYFORMULA(INDEX(named_range, ,wished_column)) but that only works for a single named range. My knowledge of these two functions being non-existent, I didn't manage to extend it to a 2-named-range version...
(I mean I did try something along the lines of ARRAYFORMULA(INDEX(January, , INDEX(Room1, , 3))) but that didn't work)
So because there isn't a one-to-one relation from the Dashboard cells to the Rooms cells, my current only solution is to manually reference everything, which you'll understand is inefficient and time-consuming...
My question, in short, is: how can I retrieve the range that results from the intersection of 2 (or more) named ranges? Once I have that resulting range, I know it will be very easy to use INDEX().

Define a named range Base as
A:Z
Define a range named Horizontal as
6:36
Define a range named Vertical as
C:I
Then the intersection of the vertical and horizontal ranges is given by:
index(Base,row(Horizontal),COLUMN(Vertical)):index(Base,row(Horizontal)+rows(Horizontal)-1,COLUMN(Vertical)+columns(Vertical)-1)
This can be verified by using it in a function e.g.
=countblank(index(Base,row(Horizontal),COLUMN(Vertical)):index(Base,row(Horizontal)+rows(Horizontal)-1,COLUMN(Vertical)+columns(Vertical)-1))
gives the result 7 * 31 = 217 in my sheet because I haven't filled in any of the cells.
The Offset version of this would be:
=countblank(offset(A1,row(Horizontal)-1,COLUMN(Vertical)-1):offset(A1,row(Horizontal)+rows(Horizontal)-2,COLUMN(Vertical)+columns(Vertical)-2))
or more simply:
=countblank(offset(A1,row(Horizontal)-1,COLUMN(Vertical)-1,rows(Horizontal),COLUMNS(Vertical)))
So this works well in OP's case, where the two ranges overlap fully.
Partial Overlap
Suppose instead that you have two ranges that only partially overlap.
You can use a variation on the standard overlap formula (This is one of the early references to it as used with a date range)
max(start1,start2) to min(end1,end2)
So the previous formula becomes
=countblank(index(Base,max(row(index(Partial1,1,1)),row(index(Partial2,1,1))),max(COLUMN(index(Partial1,1,1)),column(index(Partial2,1,1)))):
index(Base,min(row(index(Partial1,1,1))+rows(Partial1)-1,row(index(Partial2,1,1))+rows(Partial2)-1),min(COLUMN(index(Partial1,1,1))+columns(Partial1)-1,column(index(Partial2,1,1))+columns(Partial2)-1)))
and the offset version is
=countblank(offset(A1,max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0)))-1,max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))-1):
offset(A1,min(row(offset(Partial1,0,0))+rows(Partial1)-2,row(offset(Partial2,0,0))+rows(Partial2)-2),min(COLUMN(offset(Partial1,0,0))+columns(Partial1)-2,column(offset(Partial2,0,0))+columns(Partial2)-2)))
I have tested this on ranges C2:F10 and D3:G11 which gives the result 24 as expected.
However, if there is no overlap, this can still give a non-zero result, so a suitable test needs adding to the formula:
=if(and(max(row(index(Partial1,1,1)),row(index(Partial2,1,1)))<=min(row(index(Partial1,1,1))+rows(Partial1)-1,row(index(Partial2,1,1))+rows(Partial2)-1),
max(column(index(Partial1,1,1)),column(index(Partial2,1,1)))<=min(column(index(Partial1,1,1))+columns(Partial1)-1,column(index(Partial2,1,1))+columns(Partial2)-1)),"Overlap","No overlap")
Perhaps the best approach in Google Sheets is to go back to the full version of the Offset call OFFSET(cell_reference, offset_rows, offset_columns, [height], [width]) . Although this is rather long, it will return a #Value! error if there is no overlap:
=Countblank(offset(A1,
max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0)))-1,
max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))-1,
min(row(offset(Partial1,0,0))+rows(Partial1),row(offset(Partial2,0,0))+rows(Partial2))-max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0))),
min(COLUMN(offset(Partial1,0,0))+columns(Partial1),column(offset(Partial2,0,0))+columns(Partial2))-max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))
))
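The max/min overlap rule used in the formulas above is easier to follow when written out in code. The following JavaScript sketch is only an illustration: the object shape (`top`, `left`, `bottom`, `right`, all 1-based and inclusive) and the function names are my own, and the last row of 1000 for the column band is an arbitrary assumption.

```javascript
// Intersection of two rectangular ranges.
// Returns the overlapping range, or null when the ranges do not overlap.
function intersectRanges(a, b) {
  var top = Math.max(a.top, b.top);          // max(start1, start2) for rows
  var left = Math.max(a.left, b.left);       // max(start1, start2) for columns
  var bottom = Math.min(a.bottom, b.bottom); // min(end1, end2) for rows
  var right = Math.min(a.right, b.right);    // min(end1, end2) for columns
  if (top > bottom || left > right) return null; // the "No overlap" test
  return { top: top, left: left, bottom: bottom, right: right };
}

// Number of cells in a range (0 for no overlap).
function cellCount(r) {
  return r === null ? 0 : (r.bottom - r.top + 1) * (r.right - r.left + 1);
}

// C2:F10 and D3:G11 overlap in D3:F10, i.e. 8 rows x 3 columns.
var partial1 = { top: 2, left: 3, bottom: 10, right: 6 };
var partial2 = { top: 3, left: 4, bottom: 11, right: 7 };
console.log(cellCount(intersectRanges(partial1, partial2))); // 24

// Rows 6:36 (across A:Z) and columns C:I (rows 1 to 1000 assumed).
var horizontal = { top: 6, left: 1, bottom: 36, right: 26 };
var vertical = { top: 1, left: 3, bottom: 1000, right: 9 };
console.log(cellCount(intersectRanges(horizontal, vertical))); // 217
```

Checking for the empty case explicitly matters for the same reason as in the sheet: without it, a negative height or width would silently turn into a wrong range.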
Notes
Why did I have to introduce some more indexes (indices?) in the second formula to make it work? Because if you use the row function with a range in an array context, you get an array of row numbers which isn't what I want. As it happens, in the first formula you are not using it in an array context, so you just get the first row and column of the given range which is fine. In the second formula, Max and Min try to evaluate all the rows in the array, which gives the wrong answer, so I have used Index(range,1,1) to force it to look only at the top left hand corner of each range. The other thing is that both index and offset return a reference, so it is valid to use the construct Index(...):Index(...) or Offset(...):Offset(...) to define a new range.
I have also tested the above in Excel (where as mentioned the Index version would be preferable). In this case Base would be set to $1:$1048576.
In Excel, however, you have the Intersect Operator (a single space), so it's not necessary to use an Index or Offset formula at all; e.g. the first example above would simply be:
=COUNTBLANK(Vertical Horizontal)
and if there is no overlap the formula returns a #NULL! error.

"I've created named ranges for each month of the year, and for each
room. For example, rows 6:36 will represent the month of January,
while columns C:I will represent Room 1. Room 2 will span J:P and so
forth."
What I suggest is that if "January" is defined for columns C to whatever (the last column of the last room), then that's all you need.
You haven't shown us the layout of the dashboard. But let's assume that at the very least you're interested in the income generated by each room.
=query({January},"select sum(Col3) label sum(Col3)'' ")
In this image, the range called "January" is highlighted. Note that it does NOT include the header. Note also that it can be many columns wide; in this example, I've just made up a few columns, but your range should cover all the columns for rooms 1 to n.
Syntax: QUERY(data, query, [headers])
Data: This formula queries the range called "January". That range can be on the same sheet, or on another sheet (such as your Dashboard). Reminder: in this screenshot, my version of "January" is highlighted.
Query to sum the income earned: "select sum(Col3) label sum(Col3)'' "
Query to count Number of People: "select count(Col2) label count(Col2)'' "
Col2 & Col4 = Number of People for Room#1 and Room#2 respectively.
Col3 & Col5 = Income for Room#1 and Room#2 respectively.
[headers]: You can ignore them.
This formula delivers just the value of the query; even though it includes a "label", the label will not print.
Modify and adapt these formulae to create the other information required for your Dashboard.
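If it helps to see what the two aggregates do, here is a rough JavaScript equivalent of summing one column and counting the non-empty entries of another over a month's rows. The function names and the sample rows (including the dates and amounts) are invented; column numbers are 1-based to mirror Col1, Col2, ... in QUERY.

```javascript
// rows: the cells of one month's range, one inner array per date.
// Sum the numeric cells of a column (blank cells are ignored).
function sumColumn(rows, col) {
  return rows.reduce(function (total, row) {
    var v = row[col - 1];
    return total + (typeof v === "number" ? v : 0);
  }, 0);
}

// Count the non-empty cells of a column.
function countColumn(rows, col) {
  return rows.filter(function (row) {
    var v = row[col - 1];
    return v !== "" && v !== null && v !== undefined;
  }).length;
}

// Made-up layout: date, people room 1, income room 1, people room 2, income room 2.
var january = [
  ["2023-01-01", 2, 80, 1, 50],
  ["2023-01-02", "", "", 3, 120],
  ["2023-01-03", 1, 40, "", ""]
];
console.log(sumColumn(january, 3));   // 120 (income, room 1)
console.log(sumColumn(january, 5));   // 170 (income, room 2)
console.log(countColumn(january, 2)); // 2   (dates with an entry for room 1)
```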

Populate a new Google Sheets sheet, with a random sample, with at least X samples of a given attribute

I have a data set which I would like to take a random sample from and place in to a new sheet. I have one extra constraint / stratification: I would like X examples of each of a given attribute.
For example, if COL A has 5 rows of Apples, 5 rows of Bananas etc., I would like a random sample which includes 2 Apple rows, 2 Banana rows and so on for as many values of COL A as there are.
I am halfway there having got a formula to populate a new sheet with a random sample:
A1: =ArrayFormula(FILTER( SORT('My list of 100000 rows'!A:A ;RANDBETWEEN( 0+ROW('My list of 100000 rows'!A:A) ; ROWS('My list of 100000 rows'!A:A)); TRUE); ROW('My list of 100000 rows'!A:A)<=100))
but this doesn't give me the ability to select a minimum or exact number of instances of each unique attribute.
Any advice is appreciated!

I would like a random sample which includes 2 Apple rows, 2 Banana rows and so on for as many values of COL A as there are.
Insert two columns to the left of your data and in A1:
=choose(randbetween(1,10),"12","13","14","15","23","24","25","34","35","45")
in B1 and copied down to suit:
=countif(C$1:C1,C1)
then :
=query(A:D,"select C,D where B contains '"&left(A1)&"' or B contains '"&right(A1)&"' ")
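The idea behind these three formulas (pick two occurrence numbers at random, number each row within its category, keep the rows whose number was picked) can be sketched in JavaScript as follows. The function names and the fruit data are made up; like the formulas, the sketch assumes every category has at least 5 rows, and it picks the same occurrence numbers for every category.

```javascript
// Pick k distinct occurrence numbers from 1..n (e.g. 2 out of 5). Requires k <= n.
function randomOccurrences(n, k) {
  var pool = [];
  for (var i = 1; i <= n; i++) pool.push(i);
  // Partial Fisher-Yates shuffle: the first k entries end up a random sample.
  for (var j = 0; j < k; j++) {
    var pick = j + Math.floor(Math.random() * (n - j));
    var tmp = pool[j];
    pool[j] = pool[pick];
    pool[pick] = tmp;
  }
  return pool.slice(0, k);
}

// Keep rows whose running count within their category is in `wanted`.
// rows: arrays whose first element is the category (the attribute column).
function sampleByOccurrence(rows, wanted) {
  var seen = {}; // category -> occurrences so far, like COUNTIF(C$1:C1,C1)
  return rows.filter(function (row) {
    var category = row[0];
    seen[category] = (seen[category] || 0) + 1;
    return wanted.indexOf(seen[category]) !== -1;
  });
}

// Made-up data: 5 Apple rows followed by 5 Banana rows.
var fruitRows = [];
["Apple", "Banana"].forEach(function (name) {
  for (var n = 1; n <= 5; n++) fruitRows.push([name, n]);
});
var sample = sampleByOccurrence(fruitRows, randomOccurrences(5, 2));
console.log(sample.length); // 4 (2 Apple rows and 2 Banana rows)
```

If the categories have different sizes, choosing the occurrence numbers per category rather than once for all would be the natural next step.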

Arranging the ingredient-quantity data in a single row, to be retrieved elsewhere

I want to implement a table in Google Sheets whose rows contain different data, two of the fields being related arrays, and I'd like to know the best way to implement it. There are three different sheets, named Recipies, Ingredients and Dishes. Recipies contains the name of the dish and the list of ingredients, Ingredients contains the nutritional info of each ingredient, and Dishes contains the actual nutritional info of each serving.
I want to keep the Recipies sheet as readable and visual as possible:
Name Time Ingredients
Pasta 20min {150 spaghetti, 30 sauce, 5 marjoram, 20 cheese}
Dishes:
Name Prot. Carb. Fat KCal
Pasta x x x x
I want to get the recipes' ingredients and quantities, look up their nutritional content and sum the quantities on the Dishes sheet. The Ingredients sheet looks like:
Ingredient Prot. Carb. Fat KCal
Pasta x x x x
Marjoram x x x x
I know how to get row data from a different sheet if the ingredients and quantities are arranged in two rows, with each value in its own column; for example, to sum the total protein in the dish:
=ArrayFormula(SUM(LOOKUP(F5:K5;Ingredients!$B$4:$B$1000;Ingredients!$D$4:$D$1000)*((F6:K6)/100)))/E5
The problem is that I want to have the list of ingredients and the quantities used in a single row, like {150 spaghetti, 30 sauce, 5 marjoram, 20 cheese}. I want to keep it readable at a glance. I tried to use an array but don't know how to make it work.
How can I best arrange data to be in a single row and be able to use it to lookup data on other sheets?

Ideally, you would enter the data for each recipe in structured form, such as an ingredient-quantity table, and then display it in readable form by building a text string from the data using join or similar. The general idea is that structured data should be the source of information, while the things displayed for human consumption are derived from it.
But if you insist on using strings such as "150 spaghetti, 30 sauce, 5 marjoram, 20 cheese" as source of information, here is a way to compute nutritional content from them.
Suppose A:B is an ingredient - nutritional content table, such as
Ingredient Protein
spaghetti 2
sauce 4
eggs 10
cheese 6
marjoram 2
and cell E2 has "150 spaghetti, 30 sauce, 5 marjoram, 20 cheese". The following formula computes the amount of protein in this recipe:
=sumproduct(
vlookup(split(regexreplace(E2, "[\d\W]+", " "), " "), A:B, 2, 0),
split(regexreplace(E2, "\D+", " "), " ")
)
Explanation:
the first split-regexreplace combo extracts all words from the recipe, by replacing the digits and punctuation with spaces and then splitting by space. We get the array of four words: spaghetti, sauce, marjoram, cheese
then, vlookup looks up the protein content (column 2 in range A:B). If you also have fat, calories, carbs, etc, then the lookup range will be A:E and the column to use will be one of 2, 3, 4, 5.
the second split-regexreplace combo extracts the amounts of the ingredients: it replaces every run of non-digit characters in the string with a space and then splits by space. We get the array of four numbers: 150, 30, 5, 20
sumproduct adds the products of these two arrays, computing the total protein content.
All this is fragile because of regex-based parsing of text, and will break if the string deviates from expected format; for example, if a recipe calls for "1 1/2 tablespoons of sugar". In which case, see the first paragraph.
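For comparison, here is roughly the same computation as a JavaScript sketch that parses each "amount name" pair and stops with an error when a part does not fit the expected format, instead of silently producing a wrong number. The function names are hypothetical; the protein values are the ones from the example table above.

```javascript
// Parse "150 spaghetti, 30 sauce" into [{ amount: 150, name: "spaghetti" }, ...].
function parseRecipe(text) {
  return text.split(",").map(function (part) {
    var m = part.trim().match(/^(\d+)\s+(.+)$/); // "<digits> <name>"
    if (!m) throw new Error("Unexpected ingredient format: " + part);
    return { amount: Number(m[1]), name: m[2] };
  });
}

// Sum of amount * per-ingredient value, like SUMPRODUCT over the two arrays.
function totalNutrient(text, table) {
  return parseRecipe(text).reduce(function (sum, item) {
    if (!Object.prototype.hasOwnProperty.call(table, item.name)) {
      throw new Error("Unknown ingredient: " + item.name);
    }
    return sum + item.amount * table[item.name];
  }, 0);
}

var protein = { spaghetti: 2, sauce: 4, eggs: 10, cheese: 6, marjoram: 2 };
var recipe = "150 spaghetti, 30 sauce, 5 marjoram, 20 cheese";
// 150*2 + 30*4 + 5*2 + 20*6
console.log(totalNutrient(recipe, protein)); // 550
```

With an input such as "1 1/2 tablespoons of sugar" this sketch throws an "Unknown ingredient" error, which at least makes the format problem visible.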

Compute subranks in spreadsheet column in combination with ArrayFormula (Google Sheets)

I'm trying to find the inverse rank within categories using an ArrayFormula. Let's suppose a sheet containing
A B C
---------- -----
1 0.14 2
1 0.26 3
1 0.12 1
2 0.62 2
2 0.43 1
2 0.99 3
Columns A:B are input data, with an unknown number of useful rows filled-in manually. A is the classifier categories, B is the actual measurements.
Column C is the inverse ranking of B values, grouped by A. This can be computed for a single cell, and copied to the rest, with e.g.:
=1+COUNTIFS($B$2:$B,"<" & $B2, $A$2:$A, "=" & $A2)
However, if I try to use ArrayFormula:
=ARRAYFORMULA(1+COUNTIFS($B$2:$B,"<" & $B2:$B, $A$2:$A, "=" & $A2:$A))
It only computes one row, instead of filling all the data range.
A solution using COUNT(FILTER(...)) instead of COUNTIFS fails likewise.
I want to avoid copy/pasting the formula since the rows may grow in the future and forgetting to copy again could cause obscure miscalculations. Hence I would be glad for help with a solution using ArrayFormula.
Thanks.

I don't see a solution with array formulas available in Sheets. Here is an array solution with a custom function, =inverserank(A:B). The function, given below, should be entered in Script Editor (Tools > Script Editor). See Custom Functions in Google Sheets.
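function inverserank(arr) {
  // Drop rows whose category (first column) is blank.
  // Strict comparison, so that a category of 0 is not treated as blank.
  arr = arr.filter(function(r) {
    return r[0] !== "";
  });
  // For each row r1, start at 1 and add 1 for every row r2 that has
  // the same category and a strictly smaller value.
  return arr.map(function(r1) {
    return arr.reduce(function(rank, r2) {
      return rank + (r2[0] == r1[0] && r2[1] < r1[1] ? 1 : 0);
    }, 1);
  });
}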
Explanation: the double array of values in A:B is
filtered, to get rid of empty rows (where A entry is blank)
mapped, by the function that takes every row r1 and then
reduces the array, counting each row (r2) only if it has the same category and smaller value than r1. It returns the count plus 1, so the smallest element gets rank 1.
No tie-breaking is implemented: for example, if there are two smallest elements, they both get rank 1, and there is no rank 2; the next smallest element gets rank 3.

Well this does give an answer, but I had to go through a fairly complicated manoeuvre to find it:
=ArrayFormula(iferror(VLOOKUP(row(A2:A),{sort({row(A2:A),A2:B},2,1,3,1),row(A2:A)},4,false)-rank(A2:A,A2:A,true),""))
So
Sort cols A and B with their row numbers.
Use a lookup to find where those sorted row numbers now are: their position gives the rank of that row in the original data plus 1 (3,4,2,6,5,7).
Return the new row number.
Subtract the rank obtained just by ranking on column A (1,1,1,4,4,4) to get the rank within each group.
In the particular case where the classifiers (col A) are whole numbers and the measurements (col B) are fractions, you could just add the two columns and use rank:
=ArrayFormula(iferror(rank(A2:A+B2:B,if(A2:A<>"",A2:A+B2:B),true)-rank(A2:A,A2:A,true)+1,""))
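The "overall sorted position minus the group's offset" idea used in these formulas can be written out as a short JavaScript sketch. The function name `rankWithinGroups` is made up; the sample rows are the six rows from the question. Note that, like the sort-based formula, this gives tied values consecutive ranks instead of a shared rank.

```javascript
// rows: [category, value] pairs. Returns, for each row, the 1-based rank of
// its value within its category (smallest value first).
function rankWithinGroups(rows) {
  // Sort row indexes by category, then by value (like SORT(..., 2, 1, 3, 1)).
  var order = rows.map(function (_, i) { return i; });
  order.sort(function (i, j) {
    if (rows[i][0] !== rows[j][0]) return rows[i][0] < rows[j][0] ? -1 : 1;
    return rows[i][1] - rows[j][1];
  });
  var ranks = new Array(rows.length);
  var groupStart = 0; // sorted position where the current category begins
  order.forEach(function (rowIndex, pos) {
    if (pos > 0 && rows[rowIndex][0] !== rows[order[pos - 1]][0]) groupStart = pos;
    ranks[rowIndex] = pos - groupStart + 1; // overall position minus group offset
  });
  return ranks;
}

var data = [[1, 0.14], [1, 0.26], [1, 0.12], [2, 0.62], [2, 0.43], [2, 0.99]];
console.log(rankWithinGroups(data)); // [ 2, 3, 1, 2, 1, 3 ]
```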

Here is my version of an array formula; it works when column A contains text:
=ARRAYFORMULA(RANK(ARRAY_CONSTRAIN(VLOOKUP(A1:A,{UNIQUE(FILTER(A1:A,A1:A<>"")),ROW(INDIRECT("a1:a"&COUNTUNIQUE(A1:A)))},2,)*1000+B1:B,COUNTA(A1:A),1),ARRAY_CONSTRAIN(VLOOKUP(A1:A,{UNIQUE(FILTER(A1:A,A1:A<>"")),ROW(INDIRECT("a1:a"&COUNTUNIQUE(A1:A)))},2,)*1000+B1:B,COUNTA(A1:A),1),1) - COUNTIF(A1:A,"<"&OFFSET(A1,,,COUNTA(A1:A))))
