Regular expression for Google Sheets Find and Replace - google-sheets

I have rows in some columns that contain values like
#Invalid Ref: 234566
#Invalid Ref: 123445
#Invalid Ref: 235678
I am trying to use Find and Replace with a regular expression to find any cell that contains one of the values above and replace it with an empty string.
What is the best regular expression I can use?

If the trailing number is always six digits, this should work. See Google's explanation for Regex Find Replace for more examples.
^#Invalid Ref: [0-9]{6}$
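As a quick sanity check outside Sheets, the pattern can be exercised with Python's re module. This is only a sketch: Sheets uses RE2 rather than Python's engine, although this particular pattern uses nothing that differs between the two. The helper name clear_invalid_refs and the sample cells are made up for illustration.

```python
import re

# Same pattern as the answer: the whole cell must be "#Invalid Ref: " plus six digits.
INVALID_REF = re.compile(r"^#Invalid Ref: [0-9]{6}$")

def clear_invalid_refs(cells):
    """Blank out any cell whose entire value is '#Invalid Ref: NNNNNN' (hypothetical helper)."""
    return ["" if INVALID_REF.match(cell) else cell for cell in cells]

# Hypothetical sample cells: one match, one unrelated value,
# one with only five digits, and one with extra text before the marker.
cells = ["#Invalid Ref: 234566", "keep me", "#Invalid Ref: 12345", "x #Invalid Ref: 123445"]
cleaned = clear_invalid_refs(cells)
print(cleaned)
```

Because of the ^ and $ anchors, only cells that consist of nothing but the marker are cleared. If the number of digits can vary, [0-9]+ in place of [0-9]{6} would be the natural loosening.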

Related

extracting values based on regex

I have a column of URLs. Below are some of its values:
https://www.example.com/jasja
https://www.example.com/jasdqw?new=exact
What I want to extract is the part after the last slash and before the question mark.
So my output column should be
jasja
jasdqw
How can I get this using regex?
I tried =REGEXEXTRACT(C2:C16, SPLIT()), but I don't know how to use it.
Any help is appreciated.
We can use REGEXEXTRACT with a capture group:
=REGEXEXTRACT(C2, "/([^/]+?)(?:\?|$)")
Here is a regex demo.
You could also try
=RegexExtract(A1,".*/(.*?)\?")
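To see how the two patterns differ, here is a rough Python sketch that runs both against the sample URLs from the question. Python's re is not the RE2 engine Sheets uses, but both patterns use only syntax the two share. The extract helper is hypothetical and exists only for this illustration.

```python
import re

# The two sample URLs from the question.
urls = [
    "https://www.example.com/jasja",
    "https://www.example.com/jasdqw?new=exact",
]

def extract(pattern, text):
    """Return the first capture group, or None when the pattern does not match."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Pattern from the first answer: last path segment, ending at "?" or end of string.
first = [extract(r"/([^/]+?)(?:\?|$)", u) for u in urls]

# Pattern from the second answer: requires a literal "?" after the segment.
second = [extract(r".*/(.*?)\?", u) for u in urls]

print(first)
print(second)
```

The second pattern finds nothing for the URL without a query string, which in Sheets would show up as an #N/A error, so the first pattern looks like the safer choice when some URLs have no question mark.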

Replace everything after specific character in google sheets

I have a document with 30k+ emails. The problem is that during the export, random characters appeared after the emails, something like name#email.com2019-10-10T0545152019-10-10T054515f or name#email.com00000000000700392019-11-28T070033f
My question is: how do I remove everything after ".com" or ".fr" in all the cells?
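You could try using REGEXREPLACE. Capture the suffix and put it back with $1, otherwise ".com" or ".fr" is removed along with the trailing characters.
=REGEXREPLACE(A1,"(\.com|\.fr).*", "$1")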
Try
=REGEXEXTRACT(A1,".+\.com|.+\.fr")
Working from what other people added, you can get all emails from the column A and use regular expressions to get the values. Using ARRAYFORMULA you can do it in a single formula:
=ARRAYFORMULA(IF(A:A<>""; REGEXEXTRACT(A:A; ".+\.(?:com|fr)"); ""))
Rundown
ARRAYFORMULA applies the formula to the entire column.
REGEXEXTRACT extracts part of the string using a regular expression.
IF is a conditional. Here it skips empty cells, which prevents an error.
References
ARRAYFORMULA (Docs Editor Help)
REGEXEXTRACT (Docs Editor Help)
IF (Docs Editor Help)
Supposing your raw-data email list were in A2:A, try this in row 2 of an otherwise empty column (e.g., B2):
=ArrayFormula(IF(A2:A="",,REGEXEXTRACT(A2:A,"^.+\.\D+")))
In plain English, this means "Extract everything up to the last dot that is followed by non-digits, together with those non-digits."
This should pull up to any suffix (e.g., .com, .co, .biz, .org, .ma.gov, etc.).
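For anyone who wants to check the extraction patterns before running them over 30k rows, here is a small Python sketch using the two sample strings from the question. It assumes Python's re behaves like Sheets' RE2 for these patterns, which should hold since they use only basic syntax. The first_match helper and the third test string are made up for illustration.

```python
import re

# The two sample values from the question, kept exactly as posted.
samples = [
    "name#email.com2019-10-10T0545152019-10-10T054515f",
    "name#email.com00000000000700392019-11-28T070033f",
]

def first_match(pattern, text):
    """Return the matched text, or an empty string when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""

# Explicit suffix list, as in the ARRAYFORMULA answer.
explicit = [first_match(r".+\.(?:com|fr)", s) for s in samples]

# Generic "dot followed by non-digits" pattern, as in the last answer.
generic = [first_match(r"^.+\.\D+", s) for s in samples]

# Hypothetical edge case: junk that starts with letters is kept by the generic pattern.
edge = first_match(r"^.+\.\D+", "name#email.comabc123")

print(explicit)
print(generic)
print(edge)
```

Both patterns recover the address from the two samples. The generic one only cuts at the first digit after the suffix, so it would keep any letters that happen to follow ".com" directly, as the edge case shows.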

Sheets: use FILTER for multiple strings instead of exact match only?

I'm trying to SUM column C based on the contents of columns A and B. Like this:
=sum(filter(C:C, (A:A="Safari")*(B:B="10.0.1")))
The above formula works. The FILTER function works as an exact match for "Safari" and "10.0.1" for columns A and B respectively.
The problem is... this only captures an exact match: "10.0.1". I need to capture multiple strings e.g. "10.0.1", "10.0.2", "10.0.3", etc.
If helpful, here's an example sheet.
I'm not sure if regex can be used in combination with a filter function. In any case, I've tried hard and failed spectacularly. So... how best to filter for multiple strings instead of exact match only?
=SUMIFS(C:C,A:A,"Safari",B:B,"10.0.*")
Please try:
=filter(C:C, (A:A="Safari")*(REGEXMATCH(B:B, "10\.0\..*")))
Notes:
filter is an array formula, and it has a useful property: it evaluates all the formulas inside it as array formulas.
"10\.0\..*" is the regex for your match. "\." matches a literal dot, and ".*" matches any sequence of characters. Please see more syntax here.

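The FILTER plus REGEXMATCH combination for summing Safari 10.0.x rows can be mimicked in Python to see what the regex does and why the dots are escaped. This is a sketch under assumptions: the rows below are invented stand-ins for columns A, B and C, and sum_matching is a hypothetical helper, not anything Sheets provides.

```python
import re

# Hypothetical rows: (browser, version, count) standing in for columns A, B, C.
rows = [
    ("Safari", "10.0.1", 5),
    ("Safari", "10.0.2", 7),
    ("Safari", "9.1.3", 11),
    ("Chrome", "10.0.1", 13),
]

# Same regex as the REGEXMATCH answer; search() is a partial match, like REGEXMATCH.
VERSION = re.compile(r"10\.0\..*")

def sum_matching(rows, browser):
    """Sum the counts where the browser is an exact match and the version matches the regex."""
    return sum(count for name, version, count in rows
               if name == browser and VERSION.search(version))

total = sum_matching(rows, "Safari")

# Why the escaping matters: an unescaped "." matches any character.
unescaped_hit = re.search(r"10.0..*", "1000123") is not None
escaped_hit = re.search(r"10\.0\..*", "1000123") is not None

print(total, unescaped_hit, escaped_hit)
```

Since the match is partial, a value such as "110.0.1" would also pass. Starting the pattern with ^ is one way to tighten it if that matters for your data.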
I need to extract a specific word or words from a URL

How can I extract a specific word or words from a URL to display in another column on Google Spreadsheets? The URL is https://seatgeek.com/bands/katy-perry?p=3 and I have to extract "katy perry" from this URL. I also have to create a second formula that will display the same URL with a date from another column on the spreadsheet.
Look up regular expressions for VBA. This way you can perform pattern matching with a lot of flexibility.
Here:
http://www.macrostash.com/2011/10/08/simple-regular-expression-tutorial-for-excel-vba/
or better yet, here:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
How's this - change A3 as needed to match the Cell with the URL:
=SUBSTITUTE(MID(A3,SEARCH(";",SUBSTITUTE(A3,"/",";",4))+1,FIND("?",SUBSTITUTE(A3,"/",";",4))-SEARCH(";",SUBSTITUTE(A3,"/",";",4))-1),"-"," ")
What this is doing is switching out the '/' right before 'katy-perry' with a unique (to that cell) mark, the semi-colon. Then, using MID(), extract the info between the substituted ';' and the '?'.
Edit: This should work with any name length (i.e. 'katy-perry','katyyyyyy-peeerrryyy'). Note that it assumes that you will ALWAYS have a URL with four '/' before the artist's name.
The single sample URL you provided leaves one wondering whether the layout is going to be the same across the other URLs you may have listed. If this is typical of the way the other URLs are constructed, you can use the question mark and the last forward slash to parse out katy-perry. Here it is in steps, then all together.
The following instructions assume that https://seatgeek.com/bands/katy-perry?p=3 is in A1.
Append a question mark to the end just in case there isn't one in the URL and use the first question mark found to strip off anything right of that.
      =LEFT(A1, FIND("?", A1&"?")-1)
      
Replace all forward slashes with 99 spaces.
      =SUBSTITUTE(LEFT(A1, FIND("?", A1&"?")-1), "/", REPT(" ", 99))
      
Peel off the right-most 99 characters and trim off extra spaces.
      =TRIM(RIGHT(SUBSTITUTE(LEFT(A1, FIND("?", A1&"?")-1), "/", REPT(" ", 99)), 99))
      
The result should be katy-perry. This formula is Google-Spreadsheet friendly.
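The three steps above can be traced in Python to see why the 99-space trick works. This is just an illustration of the logic, not something to paste into Sheets, and last_path_segment is a made-up name.

```python
def last_path_segment(url):
    """Mirror the LEFT/FIND, SUBSTITUTE/REPT and TRIM/RIGHT steps (hypothetical helper)."""
    # Step 1: keep everything left of the first "?"; appending "?" guards URLs without one.
    base = url[: (url + "?").index("?")]
    # Step 2: replace every "/" with 99 spaces so the last segment is preceded by padding.
    padded = base.replace("/", " " * 99)
    # Step 3: take the right-most 99 characters and trim the padding.
    return padded[-99:].strip()

with_query = last_path_segment("https://seatgeek.com/bands/katy-perry?p=3")
without_query = last_path_segment("https://seatgeek.com/bands/katy-perry")

# The question asked for "katy perry", so the hyphen still needs swapping for a space.
display_name = with_query.replace("-", " ")

print(with_query, without_query, display_name)
```

One limit worth knowing: a final segment longer than 99 characters would be cut off, so the padding width would need to be raised for very long slugs. In Sheets, wrapping the final formula in SUBSTITUTE(..., "-", " ") would give the "katy perry" form.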
      

Countif with len in Google Spreadsheet

I have a column XXX like this :
XXX
A
Aruin
Avolyn
B
Batracia
Buna
...
I would like to count a cell only if the string in the cell has a length > 1.
How to do that?
I'm trying :
COUNTIF(XXX1:XXX30, LEN(...) > 1)
But what should I write instead of ... ?
Thank you in advance.
For ranges that contain strings, I have used a formula like below, which counts any value that starts with one character (the ?) followed by 0 or more characters (the *). I haven't tested on ranges that contain numbers.
=COUNTIF(range,"=?*")
To do this in one cell, without needing to create a separate column or use arrayformula{}, you can use sumproduct.
=SUMPRODUCT(LEN(XXX1:XXX30)>1)
If you have an array of True/False values then you can use -- to force them to be converted to numeric values like this:
=SUMPRODUCT(--(LEN(XXX1:XXX30)>1))
Credit to #greg who posted this in the comments - I think it is arguably the best answer and should be displayed as such. Sumproduct is a powerful function that can often be used to get around shortcomings in countif type formulae.
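The idea behind SUMPRODUCT(--(LEN(...)>1)) is simply "turn each TRUE/FALSE into 1/0 and add them up". A Python sketch of the same idea, using the sample column from the question:

```python
# The sample values from the question's column.
values = ["A", "Aruin", "Avolyn", "B", "Batracia", "Buna"]

# Like LEN(range)>1: one True/False per cell.
flags = [len(v) > 1 for v in values]

# Like the "--" coercion plus SUMPRODUCT: convert each flag to 1 or 0, then sum.
count = sum(int(f) for f in flags)

print(flags, count)
```

The single-letter entries "A" and "B" are the ones that drop out.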
Create another list using an =ARRAYFORMULA(len(XXX1:XXX30)>1) and then do a COUNTIF based on that new list: =countif(XXY1:XXY30,true()).
A simple formula that works for my needs is =ROWS(FILTER(range,LEN(range)>X))
The Google Sheets criteria syntax seems inconsistent, because the expression that works fine with FILTER() gives an erroneous zero result with COUNTIF().
Here's a demo worksheet
Another approach is to use the QUERY function.
This way you can write a simple SQL-like statement to achieve this.
For example:
=QUERY(XXX1:XXX30,"SELECT COUNT(X) WHERE X MATCHES '.{1,}'")
To explain the MATCHES criteria:
It is a regex that matches every cell that contains 1 or more characters.
The . operator matches any character.
The {1,} quantifier specifies that you only want to match cells that have 1 or more characters in them.
Here is a link to another SO question that describes this method.
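To see what the MATCHES regex actually counts, here is a Python sketch on the sample column from the question. It assumes MATCHES behaves as a whole-value match, which is modelled here with re.fullmatch. The '.{2,}' variant is my own assumption about what a "length greater than 1" condition would look like and is not part of the answer above.

```python
import re

# The sample values from the question's column.
values = ["A", "Aruin", "Avolyn", "B", "Batracia", "Buna"]

def count_matching(pattern, cells):
    """Count the cells whose entire value matches the pattern (hypothetical helper)."""
    return sum(1 for cell in cells if re.fullmatch(pattern, cell))

# Pattern from the answer: 1 or more characters, so every non-empty cell counts.
one_or_more = count_matching(r".{1,}", values)

# Assumed variant for "length > 1": 2 or more characters.
two_or_more = count_matching(r".{2,}", values)

print(one_or_more, two_or_more)
```

So '.{1,}' counts all non-empty cells, while a lower bound of 2 is what excludes the single-letter entries.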

Resources