Conditionally formatting duplicate rows in Google Sheets

I want to apply conditional formatting so that all the rows which match another row exactly are highlighted.
Let's say I have a spreadsheet like the following:
| | a | b | c | Notes |
|---|---|---|---|---|
| 1 | A | B | C | Matches rows 3 and 6 |
| 2 | A | B | A | Matches row 5 |
| 3 | A | B | C | Matches rows 1 and 6 |
| 4 | B | B | C | Matches no other row |
| 5 | A | B | A | Matches row 2 |
| 6 | A | B | C | Matches rows 1 and 3 |
| 7 | B | B | A | Matches no other row |
All the rows except for row 4 and 7 would be highlighted.
For two rows to be considered duplicates, the value of every cell in a row must exactly match the value of the corresponding cell (the cell in the same column) in the other row.
My attempt so far only handles rows where the first 2 cells are duplicated, and it returns the concatenation of the duplicate values in each row, which is very far from what I want.
CC = arrayformula(A:A&" "&B:B&" "&C:C) returns, for each row, the concatenation of A, B, and C. This coerces the cell values into strings, so "1" and 1, which are not the same, appear to be the same. It also doesn't cover the entire row (I could keep adding columns, but that would look terrible).
=filter(unique(CC), arrayformula(countif(CC, unique(CC)) > 1)), where CC is the value returned by the previous formula.
This would output
A B C
A B A
Then I could add a conditional formatting rule with a custom formula that highlights a row if its concatenated contents match one of the values returned by the previous formula, but I don't know how to do that, and the previous formula is already pretty flawed.
Ideally I want a solution that involves neither string concatenation nor entering every column name.

Let's go over what is needed to create this function.
First, you need to get each row as a string so the rows can be compared, as you did. I left out the spaces you used because they take up room, but you can keep them.
=ARRAYFORMULA(A:A&B:B&C:C)
The issue is that, since the rule will be applied across 3 columns, we don't want the references to shift to C:C&D:D&E:E, so we have to anchor the columns with $.
=ARRAYFORMULA($A:$A&$B:$B&$C:$C)
Yay! Now we have a list of strings that represent the "value" of each row. We can now count, for each row, how many times its string is found. I used A2 because I assume you have a header row; if you don't, simply replace it with A1.
=COUNTIF(ARRAYFORMULA($A:$A&$B:$B&$C:$C);A2&B2&C2)
We also have to anchor the columns here, or the rule will only work on the first column.
=COUNTIF(ARRAYFORMULA($A:$A&$B:$B&$C:$C);$A2&$B2&$C2)
Now all that's left is to decide whether you want to see the rows that are unique or the rows that have matches. The >1 test below flags the rows that have matches.
=COUNTIF(ARRAYFORMULA($A:$A&$B:$B&$C:$C);$A2&$B2&$C2)>1
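To make the counting logic concrete, here is a minimal sketch in plain JavaScript (not Sheets or Apps Script API calls) of what the final formula computes: build one key per row, count how often each key occurs, and flag rows whose key occurs more than once. The name flagDuplicateRows and the sample data are hypothetical. The separator is an assumption that goes beyond the formula above: without one, rows such as ("AB", "C") and ("A", "BC") would produce the same key. As with the formula, every cell is coerced to text, so "1" and 1 still collide.

```javascript
// Plain-JavaScript model of the COUNTIF-over-concatenation idea.
// flagDuplicateRows is a hypothetical name, not a Sheets function.
function flagDuplicateRows(rows, separator = "\u0001") {
  // Like A&B&C, every cell is coerced to text before joining.
  const keys = rows.map((row) => row.map(String).join(separator));

  // Count how many times each key occurs (the COUNTIF step).
  const counts = new Map();
  for (const key of keys) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // A row is highlighted when its key occurs more than once (the ">1" step).
  return keys.map((key) => counts.get(key) > 1);
}

// The seven rows from the question.
const sample = [
  ["A", "B", "C"],
  ["A", "B", "A"],
  ["A", "B", "C"],
  ["B", "B", "C"],
  ["A", "B", "A"],
  ["A", "B", "C"],
  ["B", "B", "A"],
];
console.log(flagDuplicateRows(sample)); // rows 4 and 7 are the only false entries
```

If the collision between "1" and 1 matters, the key could be built differently, for example with JSON.stringify(row), which keeps text and numbers distinct.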

This solution doesn't involve converting the values to strings, but it still requires adding a condition for every column, so it's almost there.
=countifs(arrayformula($A:$A=$A1),TRUE,arrayformula($B:$B=$B1),TRUE,arrayformula($C:$C=$C1),TRUE)>1
It's just one conditional per column, arrayformula($A:$A=$A1), inside a countifs: countifs(conditional, TRUE).
I just need to make it take the column values as an array, which I'm guessing will require an arrayformula.
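The per-column idea can be sketched for any number of columns in plain JavaScript. This is only a model of the logic, not a Sheets formula: countMatchingRows and isDuplicateRow are hypothetical helper names, and JavaScript's strict equality is used as a stand-in for the comparison in the formula, which may differ in details such as how letter case is treated.

```javascript
// Model of the COUNTIFS idea: compare rows cell by cell with strict equality,
// so the text "1" and the number 1 are treated as different values.
// countMatchingRows and isDuplicateRow are hypothetical helper names.
function countMatchingRows(rows, target) {
  return rows.filter(
    (row) =>
      row.length === target.length &&
      row.every((cell, col) => cell === target[col])
  ).length;
}

function isDuplicateRow(rows, index) {
  // Every row matches itself, hence "> 1" as in the formula.
  return countMatchingRows(rows, rows[index]) > 1;
}

const data = [
  [1, "B", "C"],
  ["1", "B", "C"], // text "1", so it does not match the numeric rows
  [1, "B", "C"],
];
console.log(data.map((_, i) => isDuplicateRow(data, i))); // true, false, true
```

Because the comparison loops over whatever cells the row contains, no column has to be named explicitly.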

There is a MUCH simpler way.
Load Conditional Formatting (under Format).
Select "custom formula is" (way at the bottom of the formula list).
Use the formula =countif(A:A,A1)>1, where A is the column that contains the cells you want checked for duplicates. Note that this checks a single column only, not whole rows.

Related

Reference range when column matches string

Budget spreadsheet. Column A contains categories, Row 1 contains paycheck dates, and each cell from B2:AE91 contains numeric values ("how much I spent on categoryX during paycheckY").
Named ranges:
Column A - "Budget_LineItem"
Row 1 - "Budget_PayPeriods"
On another tab, I have a list of specific categories called "Funds," where I want to track how much I've saved so far each paycheck toward the category by adding up the category's values each paycheck up until TODAY().
For example:
| | A | B | C | D |
| - | - | - | - | - |
| 1 | Fund | Balance | Today: | =TODAY() |
| 2 | Auto Insurance | =SUMIF(Budget_PayPeriods,"<="&MAX($D$1:$D$2),Budget!F48:AE48) | Projected Date: | |
As you can see, I just have a static range for the "Auto Insurance" category: Budget!B48:AE48. This works, but I want a formula that looks up the adjacent value in column A against the Budget_LineItem range, and returns the row range from B:AE in the Budget spreadsheet.
Basically reads: "Go find how much I've saved/spent so far toward categoryX in the Budget tab, and add up all the values for each paycheck up through today."
I know I'm close, but I can't make INDEX, MATCH, or any of the LOOKUP functions do what I need. I just can't figure it out.
EDIT: Here's a link to an example: https://docs.google.com/spreadsheets/d/1L4mlMrRCWwDNPSiYHpmFiXU1zNOnga6gAziz_m2awKI/edit?usp=sharing
I also made a change to the OP formula in B2 as I realized it didn't work. I had tweaked it because my original formula had extra complexity and I was trying to KISS for this question. I changed it back to the more complex version so it works properly now.
Delete the range B2:B and use this in B2:
=INDEX(MMULT(FILTER(Budget!B2:4, Budget!B1:1<=MAX(D1:D2))*1,
SEQUENCE(SUMPRODUCT((Budget!B1:1<=MAX(D1:D2))))^0))
Update:
=INDEX(IFNA(VLOOKUP(A2:A,
{Budget!A2:A4, MMULT(FILTER(Budget!B2:4, Budget!B1:1<=MAX(D1:D2))*1,
SEQUENCE(SUMPRODUCT((Budget!B1:1<=MAX(D1:D2))))^0)}, 2, 0)))
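As a rough illustration of what the lookup version is meant to compute, here is a plain-JavaScript sketch: find the category's row, then add up only the amounts whose pay date is on or before a cutoff date. The function name fundBalance, the sample categories, dates, and amounts are all hypothetical, and the cutoff is passed in directly rather than read from D1:D2.

```javascript
// Hypothetical data shaped like the Budget tab: one date per pay period,
// one row of amounts per category. fundBalance is an illustrative name.
function fundBalance(category, categories, payDates, amounts, cutoff) {
  const rowIndex = categories.indexOf(category); // the lookup step
  if (rowIndex === -1) return null;              // like IFNA: unknown category
  // Keep only the pay periods up to the cutoff, then sum that row.
  return amounts[rowIndex].reduce(
    (sum, value, col) => (payDates[col] <= cutoff ? sum + value : sum),
    0
  );
}

const payDates = [new Date(2023, 0, 6), new Date(2023, 0, 20), new Date(2023, 1, 3)];
const categories = ["Rent", "Auto Insurance", "Groceries"];
const amounts = [
  [1000, 0, 1000],
  [50, 50, 50],
  [120, 95, 110],
];
const cutoff = new Date(2023, 0, 25); // stands in for TODAY()

console.log(fundBalance("Auto Insurance", categories, payDates, amounts, cutoff)); // 100
```

Only the first two pay periods fall on or before the cutoff, so the third amount is left out of the total.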

Google Sheets Conditional Formatting - Column comparison w/ changing data

I have a code generated spreadsheet which has the following data:
Col:A B C D
Sect | Lbl | Data1 | Data2
===================================
Sec1 | Lbl1 | 1 | >50
-----------------------------------
Sec2 | Lbl2 | 2 | 1
I have a conditional format rule in place to say that if Data1 is greater than Data2 then make the cell background colour red in the Data1 column. If the data is >50 I extract the 50 in my formula. I ignore the first row and the first 2 columns as they aren't needed for this formatting.
Apply to range: C2:Z1000
Custom Formula is: =AND((1*REGEXEXTRACT(D2,"\d+"))<(1*REGEXEXTRACT(C2,"\d+")))
This seems to work ok.
Next, some new data is inserted between columns B and C in the existing spreadsheet (my newest data is always on the left):
Col:A B C D E
Sect | Lbl | NEWData| Data1 | Data2
==========================================
Sec1 | Lbl1 | 9 | 1 | >50
------------------------------------------
Sec2 | Lbl2 | 3 | 2 | 1
So as you can see, the new data is now in column C and everything has shifted over 1 column.
The rule now affects column D (with no change to the above set up). I want it, however, to affect all columns C onwards so:
NEWData value is red if greater than Data1 value on this row
Data1 value is red if greater than Data2 value on this row
Data2 will always be unformatted as it has no value to compare on its right-hand side.
Every time I add a new column (in column C's position) I want the colouring to be updated for all applicable columns data.
Side Note: I also have another similar rule which colours the cell green if the value is 'less than' as opposed to 'greater than'. This will also be applied once I get this rule working.
I've finally figured this out. The problem seemed to be in the formatting (or assumed format) of the data being entered into the new column.
My code-inserted values work with the rules (see below), but when I manually typed values into a new column (to test the formula), the rule wasn't applied.
Manually copying and pasting a value into a new column DID work though.
Very irritating.
Anyway, I have simplified my rules:
BG color = red if this value greater than the value to the right
Range: C2:Z
Formula: =GT(1*REGEXEXTRACT(C2,"\d+"),1*REGEXEXTRACT(D2,"\d+"))
BG color = green if this value less than the value to the right
Range: C2:Z
Formula: =LT(1*REGEXEXTRACT(C2,"\d+"),1*REGEXEXTRACT(D2,"\d+"))
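The two rules can be modelled in plain JavaScript as follows. This is only a sketch of the comparison logic: firstNumber and rowColours are hypothetical names, and the sketch converts every value to text before extracting digits, which the Sheets formula may not do.

```javascript
// firstNumber mirrors 1*REGEXEXTRACT(cell,"\d+"): take the first run of digits.
// Both function names are hypothetical.
function firstNumber(cell) {
  const match = String(cell).match(/\d+/);
  return match ? Number(match[0]) : null; // null: no digits, so no rule applies
}

// Each cell is compared with its right-hand neighbour, as the relative
// references C2 and D2 do when the rule is applied across the whole range.
function rowColours(values) {
  return values.map((cell, i) => {
    const left = firstNumber(cell);
    const right = i + 1 < values.length ? firstNumber(values[i + 1]) : null;
    if (left === null || right === null) return null; // nothing to compare with
    if (left > right) return "red";
    if (left < right) return "green";
    return null; // equal values stay unformatted
  });
}

// The two data rows from the question, columns C to E.
console.log(rowColours([9, 1, ">50"])); // red, green, null
console.log(rowColours([3, 2, 1]));     // red, red, null
```

The last value in each row has no right-hand neighbour, so it is never coloured, which matches the requirement that Data2 stays unformatted.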

Retrieve the values if the adjacent cell contains a given string

This is what my spreadsheet looks like:
2 spreadsheets: Foo and Bar.
Foo only has one cell, A1.
Bar has n rows and 2 columns.
Now I want to have a formula in A1 that sums up all values in the 2nd column of the Bar spreadsheet if the adjacent column contains a given string.
So for example if my Bar spreadsheet looks like this:
-----------------
Machine A | 500 |
Mach B | 321 |
Door | 34 |
Machines C | 2 |
-----------------
A1 should now sum up all values of the rows where the first column's cell contains either the word Machine, Mach or Machines, thus the value in A1 would be 823.
I suppose it is some combination of IF and SEARCH/FIND, but my main problem is how to address the adjacent value cell when the string is found.
Try:
=sumif(Bar!A:A,"=*Mach*",Bar!B:B)
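For readers who want to see the logic spelled out, here is a small plain-JavaScript model of a "contains" sum over two columns. The name sumWhereLabelContains is hypothetical, and matching without regard to letter case is an assumption of this sketch.

```javascript
// Sum the second column of every row whose first column contains the needle.
// sumWhereLabelContains is a hypothetical name used only for illustration.
function sumWhereLabelContains(rows, needle) {
  const wanted = needle.toLowerCase();
  return rows
    .filter(([label]) => String(label).toLowerCase().includes(wanted))
    .reduce((total, [, amount]) => total + amount, 0);
}

// The Bar sheet from the question.
const bar = [
  ["Machine A", 500],
  ["Mach B", 321],
  ["Door", 34],
  ["Machines C", 2],
];
console.log(sumWhereLabelContains(bar, "Mach")); // 823
```

Because "Machine" and "Machines" both contain "Mach", a single pattern covers all three spellings.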

Get multiple values, selected according to another column, into a single cell

I am trying to find a way to get multiple values from an array to display in one cell
For example I have the two columns as below
| a | 1 |
| b | 2 |
| c | 1 |
| d | 3 |
| e | 2 |
So if the parameter is 2 the cell would display "be"
I want all the values from the first column where the second column is 1.
I have tried to do this with dget but that only returns a single value. Is there a way to do this with formulas or does it require a Javascript solution?
You can do it by using filter to return only the letters next to "2", and then join to join them in one cell.
=join("", filter(A1:A, B1:B = 2))
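The same filter-then-join idea, written as a plain-JavaScript sketch (joinMatching is a hypothetical name, and the two arrays stand in for columns A and B):

```javascript
// Keep the letters whose paired number equals the wanted value, then join them.
function joinMatching(letters, numbers, wanted, delimiter = "") {
  return letters.filter((_, i) => numbers[i] === wanted).join(delimiter);
}

const letters = ["a", "b", "c", "d", "e"];
const numbers = [1, 2, 1, 3, 2];
console.log(joinMatching(letters, numbers, 2)); // be
console.log(joinMatching(letters, numbers, 1, ", ")); // a, c
```

Passing a different delimiter corresponds to changing the first argument of join in the formula.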

Count rows with not empty value

In a Google Spreadsheet: How can I count the rows of a given area that have a value? All the hints I have found so far lead to formulas that count the rows with non-empty content (including formulas), but a cell with
=IF(1=2;"";"") // Shows an empty cell
is counted as well.
What is the solution to this simple task?
I just used =COUNTIF(Range, "<>") and it counted non-empty cells for me.
=counta(range)
counta: "Returns a count of the number of values in a dataset"
Note: CountA considers "" to be a value. Only cells that are blank (press delete in a cell to blank it) are not counted.
Google support: https://support.google.com/docs/answer/3093991
countblank: "Returns the number of empty cells in a given range"
Note: CountBlank considers both blank cells (press delete to blank a cell) and cells that have a formula that returns "" to be empty cells.
Google Support: https://support.google.com/docs/answer/3093403
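If you have a range that includes formulae that result in "", then you can modify your formula from
=counta(range)
to:
=Counta(range) - Countblank(range)
Note that, going by the two definitions above, CountBlank also counts truly blank cells, so this subtraction only gives the right count when the range contains no truly blank cells.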
EDIT: the function is countblank, not countblanks, the latter will give an error.
Here's what I believe is the best solution so far:
=CountIf(ArrayFormula(range<>""),TRUE)
Here's why in 3 easy steps
Step 1: Simple As Pie - Add Extra Column
The answer by eniacAvenger will yield the correct solution without worrying about edge cases as =A1<>"" seems to arrive at the correct truthy/falsy value based on how we intuitively think of blank cells, either virgin blanks or created blanks.
So imagine we have this data and we want the Count of non-blanks in B2:B6:
| | A | B | C |
|---|-------------|-------|---------|
| 1 | Description | Value | B1<>"" |
| 2 | Text | H | TRUE |
| 3 | Number | 1 | TRUE |
| 4 | IF -> "" | | FALSE |
| 5 | IF -> Text | h | TRUE |
| 6 | Blank | | FALSE |
If we relied on Column C, we could get the count of values in B like this:
=COUNTIF(C2:C6,True)
Step 2: Use ArrayFormula to dynamically create Extra Column
However, consideRatio's comment is a valid one - if you need an extra column, you can often accomplish the same goal with an ArrayFormula which can create a column in memory without eating up sheet space.
So if we want to create C dynamically, we can use an array formula like this:
=ArrayFormula(B2:B6<>"")
If we simply put it in C2, it would create the vertical array with a single stroke of the pen:
| | A | B | C |
|---|-------------|-------|--------------------------|
| 1 | Description | Value | =ArrayFormula(B2:B6<>"") |
| 2 | Text | H | TRUE |
| 3 | Number | 1 | TRUE |
| 4 | IF -> "" | | FALSE |
| 5 | IF -> Text | h | TRUE |
| 6 | Blank | | FALSE |
Step 3: Count Values in Dynamic Column
But with that solved, we no longer need the column to merely display the values.
ArrayFormula will resolve to the following range: {True,True,False,True,False}.
CountIf just takes in any range and in this case can count the number of True values.
So we can wrap CountIf around the values produced by ArrayFormula like this:
=CountIf(ArrayFormula(B2:B6<>""),TRUE)
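The three steps can be mirrored in a few lines of plain JavaScript. The cell model here is an assumption made for illustration: null stands for a truly blank cell, "" for a formula result that shows as empty, and anything else for a value. countNonEmpty is a hypothetical name.

```javascript
// Cell model (an assumption): null = truly blank, "" = formula returning "".
function countNonEmpty(cells) {
  // Step 2: the in-memory TRUE/FALSE column produced by range<>"".
  const flags = cells.map((cell) => cell !== null && cell !== "");
  // Step 3: count the TRUE values, as COUNTIF(..., TRUE) does.
  return flags.filter((flag) => flag === true).length;
}

// Column B from the example table: H, 1, IF -> "", h, blank.
const columnB = ["H", 1, "", "h", null];
console.log(countNonEmpty(columnB)); // 3
```

Both kinds of empty cell end up as FALSE, which is why neither is counted.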
Further Reading
The other solutions in this thread are either overly complex, or fail in particular edge cases that I've enumerated in this test sheet:
Google Spreadsheet - CountA Test - Demo
For why CountA works the wonky way it does, see my answer here
For me, none of the answers worked for ranges that include both virgin cells and cells that are empty based on a formula (e.g. =IF(1=2;"";""))
What solved it for me is this:
=COUNTA(FILTER(range, range <> ""))
It works for me:
=SUMPRODUCT(NOT(ISBLANK(F2:F)))
Count of all non-empty cells from F2 to the end of the column
Solved using a solution by Yogi Anand that I found by googling: https://productforums.google.com/d/msg/docs/3qsR2m-1Xx8/sSU6Z6NYLOcJ
The example below counts the number of non-empty rows in the range A3:C. Remember to update both ranges in the formula with your range of interest.
=ArrayFormula(SUM(SIGN(MMULT(LEN(A3:C), TRANSPOSE(SIGN(COLUMN(A3:C)))))))
Also make sure to avoid circular dependencies. One will occur if, for example, you count the number of non-empty rows in A:C and place this formula in one of those columns.
Given the range A:A, I'd suggest:
=COUNTA(A:A)-(COUNTIF(A:A,"*")-COUNTIF(A:A,"?*"))
The problem is COUNTA over-counts by exactly the number of cells with zero length strings "".
The solution is to find a count of exactly these cells. This can be found by looking for all text cells and subtracting all text cells with at least one character
COUNTA(A:A): cells with value, including "" but excluding truly empty cells
COUNTIF(A:A,"*"): cells recognized as text, including "" but excluding truly blank cells
COUNTIF(A:A,"?*"): cells recognized as text with at least one character
This means that the value COUNTIF(A:A,"*")-COUNTIF(A:A,"?*") should be the number of text cells minus the number of text cells that have at least one character i.e. the count of cells containing exactly ""
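The arithmetic can be checked with a small plain-JavaScript model. It follows the behaviour as described in this answer rather than calling any Sheets function, and it assumes null represents a truly blank cell and "" a zero-length string. All function names are hypothetical.

```javascript
// Cell model (an assumption): null = truly blank, "" = zero-length string.
const isText = (cell) => typeof cell === "string";

// COUNTA: everything except truly blank cells, so "" is included.
const countA = (cells) => cells.filter((c) => c !== null).length;
// COUNTIF(range,"*") as described above: all text cells, including "".
const countTextAny = (cells) => cells.filter(isText).length;
// COUNTIF(range,"?*"): text cells with at least one character.
const countTextNonEmpty = (cells) =>
  cells.filter((c) => isText(c) && c.length > 0).length;

function countNonEmptyCells(cells) {
  const emptyStrings = countTextAny(cells) - countTextNonEmpty(cells);
  return countA(cells) - emptyStrings;
}

const columnA = ["x", 42, "", "", null, "y"];
console.log(countA(columnA));             // 5 (over-counts the two "")
console.log(countNonEmptyCells(columnA)); // 3
```

Here COUNTA reports 5, the two counts of text cells differ by 2, and subtracting that difference leaves the 3 cells that actually hold a value.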
A simpler solution that works for me:
=COUNTIFS(A:A;"<>"&"")
It counts numbers, strings, dates, etc. that are not empty.
As far as I can see, most of the solutions here count the number of non empty cells, and not the number of rows with non empty cell inside.
One possible solution for the range B3:E29 is for example
=SUM(ArrayFormula(IF(B3:B29&C3:C29&D3:D29&E3:E29="";0;1)))
Here ArrayFormula(IF(B3:B29&C3:C29&D3:D29&E3:E29="";0;1)) returns a column of 0 (if the row is empty) and 1 (else).
Another one is given in consideRatio's answer.
You can define a custom function using Apps Script (Tools > Script editor) called for example numNonEmptyRows :
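function numNonEmptyRows(range) {
  Logger.log("inside");
  Logger.log(range);
  // A multi-cell range arrives as a 2D array: one inner array per row.
  if (range && range.constructor === Array) {
    // Join each row's cells into one string; rows that join to "" are empty.
    return range.map(function(a){return a.join('')}).filter(Boolean).length
  }
  else {
    // A single cell arrives as a plain value: count it if it is truthy.
    return range ? 1 : 0;
  }
}
And then use it in a cell like this, =numNonEmptyRows(A23:C25), to count the number of non-empty rows in the range A23:C25.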
In Google Sheets, to count the number of rows which contain at least one non-empty cell within a two-dimensional range:
=ARRAYFORMULA(
SUM(
N(
MMULT(
N(A1:C5<>""),
TRANSPOSE(COLUMN(A1:C5)^0)
)
>0
)
)
)
Where A1:C5 is the range you're checking for non-empty rows.
The formula comes from, and is explained in the following article from EXCELXOR - https://excelxor.com/2015/03/30/counting-rows-where-at-least-one-condition-is-met/
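The structure of that formula can be followed step by step in plain JavaScript. This is only a sketch under an assumed cell model (null for a truly blank cell, "" for an empty string), and countNonEmptyRows is a hypothetical name.

```javascript
// Count rows that contain at least one non-empty cell.
function countNonEmptyRows(grid) {
  // N(range<>""): 1 for a non-empty cell, 0 otherwise.
  const flags = grid.map((row) =>
    row.map((cell) => (cell !== null && cell !== "" ? 1 : 0))
  );
  // MMULT with a column of ones: the number of non-empty cells per row.
  const rowTotals = flags.map((row) => row.reduce((a, b) => a + b, 0));
  // SUM(N(total > 0)): count the rows with at least one non-empty cell.
  return rowTotals.filter((total) => total > 0).length;
}

const grid = [
  ["a", "", null],
  [null, null, null], // empty row
  ["", "", ""],       // only empty strings, also an empty row
  [null, 0, null],    // 0 is a value
  ["x", "y", "z"],
];
console.log(countNonEmptyRows(grid)); // 3
```

Multiplying by a column of ones is simply a way to get row sums inside a single array formula, which is why COLUMN(A1:C5)^0 appears in the original.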
A very flexible way to do this kind of thing is to use ARRAYFORMULA.
As an example, imagine you want to count non-empty strings (text fields). You can use this formula:
=ARRAYFORMULA(SUM(IF(Len(B3:B14)>0, 1, 0)))
What happens here is that ARRAYFORMULA lets you operate over a set of values. LEN returns the length of each text field in the range you want to check. The IF clause only distinguishes "empty" from "not empty": 1 for not empty and 0 otherwise. Finally, SUM adds up those values, so the result counts 1 for each field in the range for which LEN returns more than 0.
If you want to check any other condition, just modify the first argument of the IF clause.
Make another column that determines if the referenced cell is blank using the function "CountBlank". Then use count on the values created in the new "CountBlank" column.

Resources