In Google Sheets, I'm finding duplicates using the common approach of:
=IF(COUNTIF(A:A,A2)>1,"Duplicate","Unique")
But it is ignoring punctuation marks like '?'
For example, if I have 'wordA' and 'wordA?' it shows them as duplicates when they are not.
Is there any way around this?
The countif() function treats ? as a wildcard that matches any single character, so the criterion wordA? matches wordA followed by any one character, such as wordA with a trailing space. To get exact matches only, use filter(), like this:
=if( counta(iferror(filter(A:A, A:A = A2))) > 1, "Duplicate", "Unique" )
See countif().
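The Python sketch below shows why the two approaches disagree by modelling the matching rules. It assumes that a COUNTIF text criterion behaves like a case-insensitive whole-cell match where ? stands for one character, * stands for any run of characters, and ~ escapes the next character (the tilde escape Sheets documents for these wildcards). It also assumes filter()'s = comparison is a plain case-insensitive equality. The helper names are hypothetical, and the sketch approximates the semantics rather than reproducing how Sheets is implemented.

```python
import re


def criterion_to_regex(criterion: str) -> re.Pattern:
    """Translate a COUNTIF-style text criterion into a regex (approximation).

    ? -> exactly one character, * -> any run of characters,
    ~x -> the literal character x; everything else is literal.
    """
    parts = []
    i = 0
    while i < len(criterion):
        ch = criterion[i]
        if ch == "~" and i + 1 < len(criterion):
            parts.append(re.escape(criterion[i + 1]))  # escaped wildcard is literal
            i += 2
            continue
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def countif_text(values, criterion):
    """Count cells whose whole text matches the wildcard criterion."""
    pattern = criterion_to_regex(criterion)
    return sum(1 for v in values if pattern.fullmatch(v))


def count_equal(values, target):
    """Count cells equal to target, like filter(A:A, A:A = A2): no wildcards."""
    return sum(1 for v in values if v.casefold() == target.casefold())


column_a = ["wordA", "wordA?", "wordA "]
print(countif_text(column_a, "wordA?"))   # 2: "wordA?" and "wordA " both match
print(count_equal(column_a, "wordA?"))    # 1: only the literal "wordA?"
print(countif_text(column_a, "wordA~?"))  # 1: ~ makes ? literal
```

The exact-match count stays at 1 because plain equality never interprets ? as a wildcard.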
try:
=INDEX(IF(COUNTIF(A:A&"×", A2&"×")>1, "Duplicate", "Unique"))
I'm trying to create a conditional formatting rule that matches a double-quote character followed by a zero, e.g.:
"0 / 10" : this should match as true
"10 / 10": this should match as false
This regex is incorrect, as it matches on both:
=REGEXMATCH(B:B;"0 /")
I expected to be able to use the standard formula convention of escaping the " with an extra quote. It accepts this formula syntactically, but it does not match:
=REGEXMATCH(B:B;"""0 /")
I tried matching with punctuation characters, no match:
=REGEXMATCH(B:B;"[[:punct:]]0 /")
I can use [[:digit:]] to match the 10 / 10 case, but [^[:digit:]] doesn't match the zero with a quote in front of it:
=REGEXMATCH(B:B;"[^[:digit:]]0 /")
I even tried concatenating the specific character, no match:
=REGEXMATCH(B:B;CONCATENATE(CHAR(34), "0 /"))
I'm very confused at this point. If I insert any other special character before the zero, I have no trouble matching it. But it seems like double-quote just isn't treated like a regular character somehow. Does anyone know what I'm doing wrong?
Try
=REGEXMATCH(B2;char(34)&"0 /")
Thanks to Tim for giving me a hint that led to a solution. My problem was that the text in the cell was part of a formula, so the quote was not in the cell's displayed value. I needed to extract the formula as text (FORMULATEXT), and then it works:
=REGEXMATCH(FORMULATEXT($B1);"""0 /")
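The Python sketch below shows why the escaping was never the problem. It assumes that a Sheets string literal such as """0 /" is unwrapped by dropping the outer quotes and collapsing each doubled quote into one, and that the regex is then searched for anywhere in the text. The helper name sheets_literal is hypothetical. Under those assumptions, """0 /" and CHAR(34)&"0 /" produce the same pattern. A failed match therefore points at the text being searched (the displayed value, which has no quote) rather than at the pattern. That is why FORMULATEXT fixes it.

```python
import re


def sheets_literal(source: str) -> str:
    """Unwrap a formula string literal: strip outer quotes, turn "" into "."""
    if len(source) < 2 or not (source.startswith('"') and source.endswith('"')):
        raise ValueError("not a quoted string literal")
    return source[1:-1].replace('""', '"')


doubled = sheets_literal('"""0 /"')   # the """0 /" form
concatenated = chr(34) + "0 /"        # the CHAR(34)&"0 /" form
assert doubled == concatenated == '"0 /'

# '"0 /' contains no regex metacharacters, so it can be compiled as-is
pattern = re.compile(doubled)

print(bool(pattern.search('"0 / 10"')))   # True: quote directly before 0
print(bool(pattern.search('"10 / 10"')))  # False: a 1 sits between quote and 0
print(bool(pattern.search('0 / 10')))     # False: displayed value has no quote
```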
Try this in row 1 of a different column:
=arrayformula(if(B:B<>"",iferror(regexmatch(B:B,"""0"),),))
if(B:B<>"" makes the formula process only the rows where column B has a value.
iferror( suppresses the #VALUE! errors that regexmatch returns for cells in column B that contain numbers.
"""0" is the regex. The double quote to the left of 0 is doubled up to escape it (or use char(34)&"0" as per #Mike Steelson).
Ideally what I'm looking for is to get the dollar amount extracted no matter the format.
Sheet link:
https://docs.google.com/spreadsheets/d/1drTPlnQmVTsbUXwJDfQr7DnHjSbnGx-fLthad6KxfM8/edit?usp=sharing
Delete everything from Column B, including the header. Then place the following formula in cell B1:
=ArrayFormula({"Header"; IF(A2:A="",,VALUE(IFERROR(REGEXEXTRACT(A2:A,"\$(\d+\.?\d*)"))))})
You may change the header text within the formula as you like.
If a cell in A2:A is blank, the corresponding cell in B2:B will be left blank as well.
Otherwise, REGEXEXTRACT will look for a pattern that begins with a literal dollar sign. The parentheses within the quotes mark the beginning and end of a capture group (i.e., what will be returned if found) following that literal dollar sign. The pattern \d+\.?\d* means "one or more digits, followed by zero or one literal period, followed by zero or more digits."
IFERROR will cause null to be rendered instead of an error if such a pattern is not able to be extracted.
VALUE will convert the extracted string (or null) to a real number.
If you would prefer that null be returned instead of 0 where no pattern match is found, you can use the following variation of the formula instead:
=ArrayFormula({"Header"; IFERROR(VALUE(IFERROR(REGEXEXTRACT(A2:A,"\$(\d+\.?\d*)"),"x")))})
If your strings may include numbers with comma separators, use the following versions of the above two formulas, respectively:
=ArrayFormula({"Header V1"; IF(A2:A="",,VALUE(IFERROR(REGEXEXTRACT(SUBSTITUTE(A2:A,",",""),"\$(\d+\.?\d*)"))))})
=ArrayFormula({"Header V2"; IFERROR(VALUE(IFERROR(REGEXEXTRACT(SUBSTITUTE(A2:A,",",""),"\$(\d+\.?\d*)"),"x")))})
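To test the pattern outside the sheet, here is a rough Python equivalent of the comma-stripping versions. It assumes Python's re module treats \$(\d+\.?\d*) the same way RE2 does for a pattern this simple: both return the first match found anywhere in the string. extract_dollar is a hypothetical helper. Its missing_as_zero flag switches between the two variants: 0 for the VALUE(IFERROR(...)) form, or blank (None) for the IFERROR(VALUE(...,"x")) form.

```python
import re

DOLLAR = re.compile(r"\$(\d+\.?\d*)")


def extract_dollar(text: str, missing_as_zero: bool = True):
    """Return the first $ amount in text as a float (commas removed first)."""
    if text == "":
        return None  # blank input -> blank output, like IF(A2:A="",,...)
    match = DOLLAR.search(text.replace(",", ""))  # SUBSTITUTE(A2:A, ",", "")
    if match is None:
        return 0.0 if missing_as_zero else None
    return float(match.group(1))  # VALUE(...) on the captured digits


samples = ["Paid $1,250.75 on delivery", "Fee: $40", "no amount here", ""]
print([extract_dollar(s) for s in samples])
# [1250.75, 40.0, 0.0, None]
print([extract_dollar(s, missing_as_zero=False) for s in samples])
# [1250.75, 40.0, None, None]
```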
try:
=INDEX(IFNA(REGEXEXTRACT(A2:A, "\$(\d+.\d+|\d+)")*1))
Here's my next challenge, and it's related to the previous one (found here: This works for one cell - now how can I apply it to a range?).
I've ended up with a godawful ugly formula for conditional formatting, and somehow (perhaps by dumb luck) it seems to work...
=OR(ARRAYFORMULA(IF(ISNUMBER(SEARCH($B$18,D7)),SIGN(SEARCH($B$18,D7)),IF(ISNUMBER(SEARCH(SPLIT($B$19,","),D7)),SEARCH(SPLIT($B$19,","),D7)))))
It returns true for any single target cell (D7 in this example), checking whether it contains either the string in B18 or one of two or more string values, separated by commas, in B19.
As with the previous scenario, I can't work out how to turn this into a formula (array formula?) which I can apply to a range (D3:D12) and count how many cells meet the criteria.
Or maybe the better question is, what would be the correct way to tackle this in preference to my Frankenstein's Monster of a kludged-up formula quoted above!
Any and all insights appreciated :)
Assuming the values in B19 are separated by a comma, followed by a space, try:
=sum(ArrayFormula(--(REGEXMATCH(D3:D12, B18&"|"&SUBSTITUTE(B19, ", ", "|")))))
If there is no space after the comma use "," instead of ", ".
If you want the match to be case-insensitive, try:
=sum(ArrayFormula(--(REGEXMATCH(D3:D12, "(?i)"&B18&"|"&SUBSTITUTE(B19, ", ", "|")))))
See if that works?
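The REGEXMATCH approach turns B18 and the comma-separated list in B19 into a single alternation such as term1|term2|term3, then counts the cells that contain any of the terms. Here is a Python sketch of the same idea. Unlike the formulas, it passes each term through re.escape, so characters like . or ( inside a search term are treated literally. That escaping is an addition, and the helper name and sample data are hypothetical.

```python
import re


def count_matching(cells, b18, b19, sep=", ", ignore_case=False):
    """Count cells containing b18 or any sep-separated term from b19."""
    terms = [b18] + b19.split(sep)
    pattern = "|".join(re.escape(t) for t in terms)  # B18&"|"&SUBSTITUTE(B19, ", ", "|")
    flags = re.IGNORECASE if ignore_case else 0      # the "(?i)" prefix
    regex = re.compile(pattern, flags)
    return sum(1 for c in cells if regex.search(c))  # SUM(--REGEXMATCH(...))


d3_d12 = ["red apple", "Green Pear", "plum", "banana split", "kiwi"]
print(count_matching(d3_d12, "apple", "pear, banana"))                    # 2
print(count_matching(d3_d12, "apple", "pear, banana", ignore_case=True))  # 3
```

The case-insensitive count is higher because "Green Pear" only matches once case is ignored.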
I'm trying to SUM column C based on the contents of columns A and B. Like this:
=sum(filter(C:C, (A:A="Safari")*(B:B="10.0.1")))
The above formula works. The FILTER function works as an exact match for "Safari" and "10.0.1" for columns A and B respectively.
The problem is... this only captures an exact match: "10.0.1". I need to capture multiple strings e.g. "10.0.1", "10.0.2", "10.0.3", etc.
If helpful, here's an example sheet.
I'm not sure if regex can be used in combination with a filter function. In any case, I've tried hard and failed spectacularly. So... how best to filter for multiple strings instead of exact match only?
=SUMIFS(C:C,A:A,"Safari",B:B,"10.0.*")
Please try:
=filter(C:C, (A:A="Safari")*(REGEXMATCH(B:B, "10\.0\..*")))
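The multiplied-condition filter can be modelled in Python to check the version pattern against sample rows before summing. This sketch assumes the rows are (browser, version, value) tuples, and the data is made up for illustration. REGEXMATCH looks for the pattern anywhere in the cell, so the sketch uses re.search. One consequence is that a version such as 110.0.1 also matches 10\.0\..* unless the pattern is anchored with ^.

```python
import re

rows = [
    ("Safari", "10.0.1", 5),
    ("Safari", "10.0.2", 7),
    ("Safari", "10.1.0", 11),
    ("Chrome", "10.0.1", 13),
    ("Safari", "110.0.1", 17),
]


def sum_matching(rows, browser, version_regex):
    """SUM(FILTER(C:C, (A:A=browser)*REGEXMATCH(B:B, version_regex)))."""
    pattern = re.compile(version_regex)
    return sum(
        value
        for name, version, value in rows
        # = on text ignores case in Sheets; REGEXMATCH searches anywhere
        if name.casefold() == browser.casefold() and pattern.search(version)
    )


print(sum_matching(rows, "Safari", r"10\.0\..*"))   # 29: also picks up 110.0.1
print(sum_matching(rows, "Safari", r"^10\.0\..*"))  # 12: anchored to the start
```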
Notes:
filter is an array formula, and it has a useful property: it converts all the formulas inside it into array formulas.
"10\.0\..*" is the regex for your match: "\." matches a literal dot and ".*" matches any sequence of characters. Please see the RE2 regular expression syntax reference for more.
I have a column XXX like this :
XXX
A
Aruin
Avolyn
B
Batracia
Buna
...
I would like to count a cell only if the string in the cell has a length > 1.
How to do that?
I'm trying :
COUNTIF(XXX1:XXX30, LEN(...) > 1)
But what should I write instead of ... ?
Thank you in advance.
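For ranges that contain strings, I have used a formula like the one below. It counts any value that starts with one character (the ?) followed by zero or more characters (the *). I haven't tested it on ranges that contain numbers.
=COUNTIF(range,"=?*")
That counts every non-empty string. To count only strings longer than one character, as asked, use two ? wildcards instead: =COUNTIF(range,"??*")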
To do this in one cell, without needing to create a separate column or use ARRAYFORMULA(), you can use SUMPRODUCT:
=SUMPRODUCT(LEN(XXX1:XXX30)>1)
If you have an array of True/False values then you can use -- to force them to be converted to numeric values like this:
=SUMPRODUCT(--(LEN(XXX1:XXX30)>1))
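The SUMPRODUCT version works because each LEN(...)>1 comparison yields TRUE or FALSE, and -- converts those to 1 or 0 before they are added. Python's bool is a subclass of int, so the same count fits in a couple of lines. This is a sketch using the sample values from the question. Treating blank cells as empty strings is an assumption.

```python
column_xxx = ["A", "Aruin", "Avolyn", "B", "Batracia", "Buna", "", ""]

# LEN(XXX1:XXX30)>1 -> one True/False per cell
flags = [len(cell) > 1 for cell in column_xxx]

# --(...) coerces True/False to 1/0; SUMPRODUCT then adds them up
count = sum(int(flag) for flag in flags)

print(count)  # 4
```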
Credit to #greg, who posted this in the comments. I think it is arguably the best answer and should be displayed as such. SUMPRODUCT is a powerful function that can often be used to get around shortcomings in COUNTIF-type formulas.
Create another list using an =ARRAYFORMULA(len(XXX1:XXX30)>1) and then do a COUNTIF based on that new list: =countif(XXY1:XXY30,true()).
A simple formula that works for my needs is =ROWS(FILTER(range,LEN(range)>X))
The Google Sheets criteria syntax seems inconsistent, because the expression that works fine with FILTER() gives an erroneous zero result with COUNTIF().
Here's a demo worksheet
Another approach is to use the QUERY function.
This lets you write a simple SQL-like statement to achieve this.
For example:
=QUERY(XXX1:XXX30,"SELECT COUNT(X) WHERE X MATCHES '.{1,}'")
To explain the MATCHES criterion:
It is a regex that matches every cell containing one or more characters.
The . operator matches any character.
The {1,} quantifier requires at least one such character, so only non-empty cells are matched.
Here is a link to another SO question that describes this method.
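QUERY's MATCHES operator tests the whole cell value against the regular expression, and Python's re.fullmatch mirrors that for a pattern this simple. This is an assumption, since QUERY does not use Python's regex engine. The sketch below compares .{1,} with .{2,}. The .{2,} variant counts only strings longer than one character, which is what the question asks for.

```python
import re

column_xxx = ["A", "Aruin", "Avolyn", "B", "Batracia", "Buna", ""]


def query_count(cells, regex):
    """Mimic SELECT COUNT(X) WHERE X MATCHES 'regex' (whole-value match)."""
    pattern = re.compile(regex)
    return sum(1 for cell in cells if pattern.fullmatch(cell))


print(query_count(column_xxx, ".{1,}"))  # 6: every non-empty cell
print(query_count(column_xxx, ".{2,}"))  # 4: length greater than one
```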