Counting Unique Values with Contain Criteria - google-sheets

I've searched all over and this simple principle is apparently not so simple.
BTW I'm using Google Sheets which as you probably know has most of the same functionality as Excel, plus an extra function that might be useful in my case: COUNTUNIQUE()
My "Criteria" is two-fold and required for two different expressions:
Count the unique values that "contain" a string.
Count the unique values that "don't contain" a string.
Consider this data:
A
1 snapple
2 snapple
3 grapple
4 orange
5 orange
6 peach
Criteria 1: Say I want to count the unique values in Column A that contain the word "apple."
In the data above, it should render "2," knowing it should void duplicates.
Criteria 2: Say I want to count the unique values in Column A that don't contain the word "apple."
In the data above, it should render "2," knowing it should void duplicates.
Here's a sheet doc to test: https://docs.google.com/spreadsheets/d/1JYZIhZmSuoWoGvmTFFcBAD1EUQVWvosqBQzkGsbwaOI/edit?usp=sharing

Criteria 1:
=COUNTUNIQUE(IFERROR(QUERY(A:A,"select A where A contains 'apple'",0)))
Criteria 2:
=COUNTUNIQUE(IFERROR(QUERY(A:A,"select A where not(A contains 'apple')",0)))
Original answer:
Criteria 1:
=COUNTUNIQUE(IFERROR(FILTER(A:A,SEARCH("apple",A:A))))
Criteria 2:
=COUNTUNIQUE(IFERROR(FILTER(A:A,ISERROR(SEARCH("apple",A:A)))))
This gives unpredictable results when referencing cells with clickable URLs in. IMO, this is a bug associated specifically with the FILTER function, and how it parses URLs. QUERY works around this because (again, IMO) it will first convert the source data to a single data type (in this case text) in each referenced column.
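To sanity-check the expected results outside the sheet, the counting logic can be modelled in a few lines of Python. This is only a sketch of what the formulas compute, not how Sheets evaluates them. The list `column_a` and the helper `count_unique` are hypothetical names, and the match is made case-insensitive on the assumption that SEARCH-like behaviour is wanted.

```python
# Hypothetical stand-in for column A of the sample data in the question.
column_a = ["snapple", "snapple", "grapple", "orange", "orange", "peach"]


def count_unique(values, needle, contains=True):
    """Count distinct non-empty values that do (or do not) contain needle."""
    needle = needle.lower()
    matches = {
        value
        for value in values
        if value != "" and ((needle in value.lower()) == contains)
    }
    return len(matches)


with_apple = count_unique(column_a, "apple")             # snapple, grapple
without_apple = count_unique(column_a, "apple", False)   # orange, peach
print(with_apple, without_apple)
```

Both counts come out as 2 for the sample data, matching what the question expects for Criteria 1 and Criteria 2.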

Related

Advanced filter/configurator based on dataset

I would like help with a problem, or rather a challenge in Excel and/or Google Sheets.
What we want to develop is as follows:
We have a table of products and certain attributes. Now we want to create a kind of search function based on this table.
Example:
Let me give a simple example. Suppose your products are an apple, a banana and an orange. The characteristics associated with these are size, color and country of origin. We then want a search function where you indicate one or more preferences (size, color and/or country of origin) and all products that meet those criteria are displayed.
So if you specify oblong as the size and do not specify any other criteria, it only shows 'Banana'. If the banana and the orange have Holland as their country of origin and you only give Holland as the country-of-origin criterion, it will show 'Banana' and 'Orange'. If you specify Holland as the country of origin and oblong as the size, it again shows only 'Banana'.
See below an image of our document and how we would like this to look approximately.
Currently, there is no existing formula, because we simply do not know if this can be done and how best to do it.
The document can be accessed at:
A copy of our document with sample data:
Document
ADDITION:
Hi, Unfortunately I still am not able to get it to work. I am not really a hero in coding/functions. I created a bit more of a clear view in my file and also set the language of my sample file to english. You can find it here: Sample
What I actually need is just that it shows the data on 'Datasheet' if the conditions on the left (parameters/value) are met, but only if they are filled. Probably an easy one for you, hard for me, haha. Could you help me out once more?
Your question is very generic, so I will try to provide some guidelines on how to achieve it in Excel or Google Sheets, based on my own experience. The approach used for Excel can also be used for Google Spreadsheet, since it is based on the FILTER function, which both tools have but with different signatures. For Google Spreadsheet you can also use QUERY, which is very powerful for situations like this.
In all cases, it is good practice to keep the raw input data in one sheet (let's say an Input tab) and the filtered working data in a second sheet (let's say WorkData). This is especially relevant when the raw data is a big dataset: you don't touch the original data, and the filtered data lives in a separate tab.
Both tools offer filter features in the UI, or slicers. This is something to consider, but with Excel/Google Spreadsheet functions you can show the filter parameters in a friendlier manner, because you can see the selected parameters without an additional click to find out which filter values were selected. The approach here is based on Excel/Google Spreadsheet functions.
Excel
Let's say you have a block of filter conditions that you want to apply to a range of data. You can use data validation lists so that you select from a subset of possible values for each filter condition, and then combine the conditions logically using multiplication (AND) or addition (OR).
=FILTER(dataset, condition1 * condition2...conditionN)
where each condition is based on the filter value you want to restrict and each condition represents an array of {TRUE,FALSE} values all of them of the same size as dataset (number of rows).
I use a wildcard value to represent all values of the column; in my case I use ALL, but you can set it up differently. When ALL is selected, the filter should have no effect; when a specific value is selected, it should apply. The following trick handles both scenarios.
IF(B3="ALL", D3:D15<>"*",D3:D15=B3)
indicating that if B3 is equal to ALL, then the condition to select all of the D3:D15 rows is the following: <>"*". Otherwise select only the rows equals to B3.
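The way the per-column conditions combine can be modelled outside the spreadsheet. The following Python sketch is only an illustration of the logic: each condition becomes a list of TRUE/FALSE values, "ALL" yields an all-TRUE mask, and the masks are AND-ed row by row, which is what multiplying the arrays does in FILTER. The rows, the column layout and the helper names are hypothetical, loosely based on the fruit example in the question.

```python
# Hypothetical rows: (product, size, color, origin).
dataset = [
    ("Apple", "round", "red", "Belgium"),
    ("Banana", "oblong", "yellow", "Holland"),
    ("Orange", "round", "orange", "Holland"),
]


def condition(column_values, selected):
    """Boolean mask for one filter cell; "ALL" keeps every row."""
    if selected == "ALL":
        return [True] * len(column_values)
    return [value == selected for value in column_values]


def filter_rows(rows, selections):
    """selections maps a column index to the selected value (or "ALL")."""
    mask = [True] * len(rows)
    for col, selected in selections.items():
        col_mask = condition([row[col] for row in rows], selected)
        # Multiplying TRUE/FALSE arrays in a sheet acts as a logical AND.
        mask = [a and b for a, b in zip(mask, col_mask)]
    return [row for row, keep in zip(rows, mask) if keep]


def names(rows):
    return [row[0] for row in rows]


print(names(filter_rows(dataset, {1: "oblong", 2: "ALL", 3: "ALL"})))
print(names(filter_rows(dataset, {1: "ALL", 2: "ALL", 3: "Holland"})))
print(names(filter_rows(dataset, {1: "oblong", 2: "ALL", 3: "Holland"})))
```

With this sample data the three calls select Banana; Banana and Orange; and Banana again, which is the behaviour the question describes.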
Sometimes I would like to consider OR conditions for a given filter condition, for example for a given filter condition, consider value1 or value2 and it is represented in the filter value as a list of values delimited by comma, for example: value1, value2.
Here, some Stack Overflow questions I posted with answers about how to deal with that:
Filter an excel range based on multiple dynamic filter conditions
Filter an excel range based on multiple dynamic filter conditions (with column values delimited)
Google Spreadsheet
The FILTER function here lets you add the filter conditions as separate input arguments, so now we have:
=FILTER(dataset, condition1, condition2...,conditionN)
Note: Keep in mind in Google Spreadsheet we don't need to add the conditions by multiplying each one of them. It is added via input argument.
Here you can check some of the questions I posted related to this topic:
Using ARRAYFORMULA with SUMIF for multiple conditions combined with a wildcard to select all values for a given condition
Using ARRAYFORMULA with SUMIF for multiple conditions combined with conditions using a wildcard. Result by Months
In some cases it is better to use QUERY function.
Here, a sample file using QUERY statement and how to combine multiple conditions inserting IF in the where statement.
sample query on C1 cell:
=query('Jira Issues'!$A:$T, "where "
& IF(B2="", "G is not Null", "G >= date '"
& TEXT(startPeriod,"yyyy-mm-dd")&"'")
& IF(B3="", "", " and G <= date '"
& TEXT(endPeriod,"yyyy-mm-dd")&"'")
& IF(OR(B4="ALL",B4=""), "", " and A='"&B4&"'")
& IF(OR(B5="ALL",B5=""), "", " and I='"&B5&"'")
& " label A 'Team', S 'Reporter', T 'Assignee',
P 'Env.', I 'Release'",1)
The raw data is in the Jira Issues tab, and the data shown is based on multiple filter conditions. I am using some named ranges for the filter values to make the formula easier to understand, such as startPeriod, endPeriod, etc. You can check the actual query that will be invoked by looking at the concatenated string passed as the query argument of the QUERY function.
Similarly, you can establish a where statement that considers whether an input parameter is empty or not. In that case, you can build the logic by inserting an IF block as part of the where statement and concatenating the resulting string.
=QUERY(Input!A:Y,
"select * where A " & IF(B2="", "<>'*'", "='"&B2&"'")
& " and B " & IF(B3="", "<>'*'", "='"&B3&"'")
,1)
The above query for column A or B, returns the entire column via condition: "<>'*'" if the input parameter B2 or B3 were not specified. In a similar way you can add additional conditions for more parameters, repeating the third line of the query and changing the column and the parameter cell.
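The string-building idea can be prototyped outside the sheet to see what query text results for different inputs. The Python sketch below is a hypothetical helper, not part of any Sheets API. Note that it takes a slightly different route from the formula above: instead of adding an always-true `<>'*'` placeholder for an empty parameter, it simply leaves that clause out. It also does no escaping of quotes inside values, which a real sheet might need.

```python
def build_query(filters):
    """Build a QUERY-style string from {column letter: selected value}.

    An empty string means the parameter was not filled in, so no clause
    is generated for that column.
    """
    clauses = []
    for column, value in filters.items():
        if value == "":
            continue  # unset parameter: do not restrict this column
        clauses.append(f"{column} = '{value}'")

    query = "select *"
    if clauses:
        query += " where " + " and ".join(clauses)
    return query


print(build_query({"A": "", "B": ""}))
print(build_query({"A": "Banana", "B": ""}))
print(build_query({"A": "Banana", "B": "Holland"}))
```

Printing the generated string like this is the same check as inspecting the concatenated query argument in a spare cell before passing it to QUERY.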
Recommendations
Focus on a specific tool, Excel or Google Spreadsheet. Even though they have some similarities, you need to get familiar with the specifics of each one.
Try to start working on your specific problem. Once you face impediments, do some research; usually you are not the first person facing the problem. If you don't find a solution, post your specific problem using a sample that is an extract of your real problem (in English; your sample is in another language). Generic questions like this one have difficulty getting attention.

Is there a way to specify an input is a single cell in Google Sheets?

I want to iterate over an array of cells, in this case B5:B32, and keep the values that are equal to some reference text in a new array.
However, SPLIT nowadays accepts arrays as inputs. That means that if I use the array notation of "B5:B32" within ARRAYFORMULA or FILTER, it treats it as a range, rather than the array over which we iterate one cell at a time.
Is there a way to ensure that a particular range is the range over which we iterate, rather than the range given at once as an input?
What I considered was using alternative formulations of a cell, using INDEX(ROW(B5), COLUMN(B5)) but ROW and COLUMN also accept array values, so I'm out of ideas on how to proceed.
Example code:
ARRAYFORMULA(
INDEX(
SPLIT(B5:B32, " ", 1), 1
) = "Some text here"
)
Example sheet:
https://docs.google.com/spreadsheets/d/1H8vQqD5DFxIS-d_nBxpuwoRH34WfKIYGP9xKKLvCFkA/edit?usp=sharing
Note: In the example sheet, I can get to my desired answer if I create separate columns containing the results of the SPLIT formula. This way, I first do the desired SPLITS, and then take the values I need from that output by specifying the correct range.
Is there a way to do this without first creating an output and then taking a cell range as an input to FILTER or other similar functions?
For example in cell C35 I've already gotten the desired SPLIT and FILTER done in one go, but I'd still need to find a way to sum up the values of the first character of the second column. Doing this requires that I take the LEFT value of the second column, but for that I need to output the results and continue in a new cell. Is there a way to avoid this?
Ralph, I'm not sure if your sample sheet really reflects what you are trying to end up with, since, for example, I assume you are likely to want the total of the hours per area.
In any case, this formula extracts all of the areas, and the hours worked, and is then easy to do further calculations with.
=ArrayFormula({REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9},"(.*) \d"),
VALUE(REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9}," (\d+)hrs"))})
Try that in cell E13, to see the output.
The first REGEXEXTRACT pulls out all the text in front of the first space and number, and the second pulls out all the digits in a string of " #hrs" in each cell. These criteria could be modified, if necessary, depending on your actual requirements. Note that it requires the use of VALUE to convert the hours from text to numeric values, since REGEXEXTRACT produces text (string) results.
It involved concatenating your multiple data columns into one long column of data, to make it simpler to process all the cells in the same way.
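The two patterns can be tried outside the sheet to see what they capture. The Python sketch below reuses the same regular expressions on a flattened list of cells; the cell contents ("Kitchen 3hrs" and so on) are hypothetical, since the actual sample data is only in the linked sheet, and the `parse` helper is just for illustration.

```python
import re

# Hypothetical cell contents, already stacked into one long column.
cells = ["Kitchen 3hrs", "Garden 2hrs", "Living room 12hrs", "Kitchen 1hrs", ""]


def parse(cell):
    """Return (area, hours) for a cell, or None if it does not match."""
    area = re.search(r"(.*) \d", cell)       # text before the space + digit
    hours = re.search(r" (\d+)hrs", cell)    # digits in front of "hrs"
    if area is None or hours is None:
        return None
    # int() plays the role of VALUE: the regex result is text.
    return area.group(1), int(hours.group(1))


parsed = [p for p in (parse(cell) for cell in cells) if p is not None]

# Example of a further calculation: total hours per area.
totals = {}
for area, hours in parsed:
    totals[area] = totals.get(area, 0) + hours

print(parsed)
print(totals)
```

Blank cells simply fail to match and are skipped here, whereas in the sheet they would produce errors unless handled.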
This next formula will give you a sum, for whatever matching room/task you type into B6, as an example.
=ArrayFormula(QUERY({REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9},"(.*) \d"),
VALUE(REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9}," (\d+)hrs"))},
"select Col1, sum(Col2) where Col1='"&B6&"' group by Col1 label sum(Col2) '' ",0))
I will also answer my own question given what I know from kirkg13's answer and other sources.
Short answer: no, there isn't. If you want to do really convoluted computations with particular cell values, there are a few options and tips:
Script your own functions. You can expand INDEX to accept array inputs and thereby you can select any set of values from an array without outputting it first. Example that doesn't use REGEXMATCH and QUERY to get the SUM of hours in the question's example data set: https://docs.google.com/spreadsheets/d/1NljC-pK_Y4iYwNCWgum8B4NJioyNJKYZ86BsUX6R27Y/edit?usp=sharing.
Use QUERY. This makes your formula more convoluted quite quickly, but is still a readable and universally applicable method of selecting data, for example particular columns. In the question's initial example, QUERY can retrieve only the second column just like an adapted INDEX function would.
Format your input data more effectively. The more easily you can get numbers from your input, the less you have to obfuscate your code with REGEXMATCHES and QUERY's to do computations. Doing a SUM over a RANGE is a lot more compact of a formula than doing a VALUE of a LEFT of a QUERY of an ARRAYFORMULA of a SPLIT of a FILTER. Of course, this will depend on where you get your inputs from and if you have any say in this.
Also, depending on how many queries you will run on a given data set, it may actually be desirable to split up the formula into separate parts and output partial results to keep the code from becoming an amalgamation of 12 different queries and formulas. If the results don't need to be viewed by people, you can always choose to hide specific columns and rows.

Sort Google Sheet by order values are entered in a data validation

Is there a way to sort a Google Sheet by the order in which values are entered into a data validation criteria?
I want to sort the sheet in ascending order Low,Medium,High or descending order High,Medium,Low, not in alphabetical order High,Low,Medium and Medium,Low,High respectively.
Aaron. The easiest way would be to use a helper column (which you can hide later if you like) wherein you assign numerical values to your Low, Medium and High (presumably 1, 2 and 3 respectively). Then you sort using the numerical column. It's fairly easy to write a one-cell array formula that would assign the numerical values to your labels. The numerical column need not be beside the label column; it can be any column.
Without seeing an actual sample sheet, I can't show you. But hopefully the concept is clear, and you can take it from there.
Added description after sheet was shared:
In the example sheet, Sheet1 Column A contained the Priority in words (Low, Medium, High) and Column B contained "other data." I placed the following array formula into C1:
=ArrayFormula({"Priority Val";IF(A2:A="","",VLOOKUP(A2:A,Data!A:B,2,FALSE))})
The formula is an array formula, hence the ArrayFormula() wrap.
Inside this are curly brackets {} which allow the building of arrays that are not "of a type." In this case, the header is listed first ("Priority Val"). The semicolon means "place the next part underneath." Then a VLOOKUP references every value in Column A (i.e., the priority words) against a simple chart in a second sheet named "Data." In that "Data" sheet, Column A lists your exact words (Low, Medium, High) and Column B lists the matching numbers 1, 2, 3, so that VLOOKUP can find the word in the first column and return the number from the second. The IF() function just checks to see if a row in Sheet1!A:A is blank. If so, a null is assigned before trying the VLOOKUP; otherwise, every blank row would show an #N/A error.
If you want to make it even more air tight, it's good practice to wrap VLOOKUP in IFERROR(), just in case you misspell something in Sheet1!A:A. That would look like this:
=ArrayFormula({"Priority Val";IF(A2:A="","",IFERROR(VLOOKUP(A2:A,Data!A:B,2,FALSE)))})
And you can avoid misspelling by applying data validation to Sheet1!A2:A, referencing Data!A:A as the only allowable answers. This is not strictly necessary; but I have done it in the sample sheet to show you.
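The helper-column idea boils down to sorting by a looked-up rank instead of by the label itself. A minimal Python sketch of that logic follows; the `priority_rank` table and the sample rows are hypothetical stand-ins for the "Data" sheet and Sheet1.

```python
# Hypothetical lookup table: label -> numeric rank (the helper column values).
priority_rank = {"Low": 1, "Medium": 2, "High": 3}

# Hypothetical rows of Sheet1: (priority label, other data).
rows = [
    ("High", "task a"),
    ("Low", "task b"),
    ("Medium", "task c"),
    ("Low", "task d"),
]


def sort_by_priority(rows, descending=False):
    """Sort rows by the numeric rank of their priority label."""
    return sorted(rows, key=lambda row: priority_rank[row[0]], reverse=descending)


print([row[0] for row in sort_by_priority(rows)])
print([row[0] for row in sort_by_priority(rows, descending=True)])
```

A misspelled label would raise a KeyError in this sketch, which is the same failure the IFERROR wrapper and the data validation guard against in the sheet.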

Google-Sheets Conditional Formatting based on multiple conditions

I am trying to format a cell based on multiple conditions. I am creating a spreadsheet to keep track of items borrowed. Let's say I am lending books. I want to have a list of books, one name in each cell. Then below that I want to have 3 columns: One column to enter the name of the book borrowed, the borrowing date, and the return date. I want to turn the cell with the book name RED, if the book has been borrowed AND if the return date is BLANK, meaning book is out. In my example screenshot, cell A2, and B2 should be red.
The conditional formula I have come up with is =AND($A6=A2, $C6="") for Book1 conditions, but it only works if C6 is empty, not if C8 is empty or any other cell in column C where Book1 is found and the return date is blank. There is no specific deadline to return items, just that if the book has been borrowed and the return date in the same row is empty then the book name at the top should turn red.
Compare the result of COUNTA applied to the in and out ranges.
E.g. COUNTA(FILTER($B6:$B,$A6:$A=A2)) will count how many times a specific book is checked out, while COUNTA(FILTER($C6:$C, $A6:$A=A2)) will count how many times it is checked back in. If the first count is larger than the second, the book is still out.
Your question title asks about "multiple conditions", but very specifically you're looking to match based on any row that itself matches multiple conditions. That goes beyond the common AND operator and into a function that can process a range. You also need to be prepared for a book to be checked out and returned many times, which means there's no single row that manages the status of a given book; VLOOKUP and INDEX/MATCH are off the table too. Instead, you're effectively looking to generate a list of 0 or 1 values that match whether that book was checked out without being returned, and then coloring the cell based on whether there are any rows that match that condition.
To operate on multiple values at a time, you can use ARRAYFORMULA and then combine the output array with OR. However, one of the tricks about ARRAYFORMULA is that, to preserve the invariant about making single-value functions into array-valued functions, you can't use functions that can take arrays. This means that AND and ISBLANK don't work the way you'd like them to, but you can resolve that by using * instead of AND and = "" for ISBLANK.
One such solution (working example):
=OR(ARRAYFORMULA((A1 = $A$5:$A) * ($C$5:$C = "")))
ARRAYFORMULA isn't the only function to operate on a list of values, though; you could also use FILTER directly to only return matching rows. Here, you're checking whether any row has a matching book name and a blank return value, and then confirming that the value is not the #N/A that FILTER returns when nothing matches.
One such solution (working example):
=NOT(ISNA(FILTER($A$8:$C, $A$8:$A = A1, $C$8:$C = "")))
Of course, you can also take advantage of the fact that you're only checking blanks to use tehhowch's solution with COUNTA and FILTER above. However, since that solution won't work for arbitrary expressions, you can use ARRAYFORMULA or FILTER if your needs become more complex.
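The rule both formulas implement is "does any row have this book's name and a blank return date". As an illustration only, here is that check in Python, with a hypothetical loan log standing in for columns A to C.

```python
# Hypothetical loan log: (book name, borrowed date, return date or "").
loans = [
    ("Book1", "2020-01-01", "2020-01-05"),
    ("Book2", "2020-01-02", ""),
    ("Book1", "2020-01-10", ""),
    ("Book3", "2020-01-11", "2020-01-12"),
]


def is_out(book, loans):
    """True if any row matches the book name AND has a blank return date."""
    return any(name == book and returned == "" for name, _, returned in loans)


for book in ("Book1", "Book2", "Book3"):
    print(book, is_out(book, loans))
```

Book1 counts as out because of its second loan, even though the first one was returned, which is why a single-row test such as =AND($A6=A2, $C6="") is not enough.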

Filter one sheet by a range in another sheet

I have a Google Spreadsheet document that I'm using to maintain a reference of all business logic on various systems. It is comprised of 2 sheets:
Sheet1 is a view of all of the logic. Each row has a unique code column (column B) and many details about the logic being done in other columns
Sheet2 is a mapping of the systems to the logic. Each system is on one row. From column E onward, each cell is exactly a code from Sheet1
The relationship between code and system is many to many, so the same code may be used by many systems, and each system may have many codes.
I would like to be able to filter Sheet1 based on whether the code column in each row is found for particular systems.
Example
System A and System B are in Sheet2 rows 50 and 51
Their codes are from column E to K
Filter Sheet1 by code where code is contained in Sheet2!E50:K51. The end result should be Sheet1 shows only those codes (and of course all columns for them)
I have seen and tried a bit of the usual suspects (ARRAY_FORMULA, INDEX, LOOKUP) but I do not yet grok them fully. I thought the answer would be going to "Filter -> By Condition -> Custom Formula is" but I'm not sure what to put there.
Any help is greatly appreciated!
Short Answer
In custom formulas of filters use INDIRECT to refer to ranges in another sheet.
To test if a value is in a 2D range, compare the value and the range, coerce booleans to numbers and sum them.
Explanation
Part 1: Custom Formulas in filters
Custom formulas in filters and conditional formatting rules can only reference the same sheet, using standard notation (='sheetname'!cell). To reference another sheet in the formula, use the INDIRECT function.
Example
Assuming that the filter criteria are in Sheet2!A2:A3, the filter custom formula in a sheet called Sheet1 is:
=ISNUMBER(MATCH(A2,INDIRECT("Sheet2!$A$2:$A$3"),0))
Part 2: Test if a value is included in a 2D array
LOOKUP can only look for values in a single column or a single row. On the other hand, the AND and OR functions can't be used in array formulas, so instead of using them we compare a scalar value with the 2D range. This returns a 2D array of TRUE/FALSE values, which we coerce to numbers (1 for TRUE, 0 for FALSE) and sum.
The final custom formula is the following one:
=ArrayFormula(SUM(N(A2=INDIRECT("Sheet2!E50:K51"))))
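The compare, coerce and sum steps can be mirrored in Python to show why a non-zero sum means "found". The grid below is a hypothetical stand-in for Sheet2!E50:K51, and `code_in_range` is an illustrative helper.

```python
# Hypothetical contents of the 2D range (two systems, a few codes each).
system_codes = [
    ["C1", "C2", "C3"],
    ["C2", "C5", ""],
]


def code_in_range(code, grid):
    """Compare code with every cell, coerce TRUE/FALSE to 1/0, and sum."""
    matches = sum(int(cell == code) for row in grid for cell in row)
    # Any non-zero total means the code appears at least once.
    return matches > 0


print(code_in_range("C2", system_codes))
print(code_in_range("C9", system_codes))
```

A code used by both systems (C2 here) gives a sum of 2 rather than 1, which still counts as a match.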
References
Filter your data
Apply conditional formatting rules

Resources