Combining REGEXEXTRACT and SUBSTITUTE in Google Sheets Formula

I'm extracting text from filename cells into separate metadata field cells. So far I have done this successfully using the REGEXEXTRACT formula, as seen below.
=REGEXEXTRACT(A1, "TILEABLE|ROOM|MAIN|FLOORSHOT|SWATCH|ANGLED")
However, some metadata fields that include multiple words need a space or another character between the words. I'm trying to figure out how to use SUBSTITUTE or REPLACE together with REGEXEXTRACT to find a phrase and replace it with a reformatted version. Ex. replace "TOPDOWN" with "Top Down" or "1TO1" with "1-to-1".

Depending on your purpose, one formula might be better than another. If you want to list the substituted values of this string in a column, you could chain as many phrases as you need using SUBSTITUTE and REGEXEXTRACT.
This returns every phrase you are looking for with its substitution applied, and TRANSPOSE then turns that range into a column (an array literal like this would otherwise be displayed as a single row). This is a simple example:
=TRANSPOSE({SUBSTITUTE(REGEXEXTRACT(A1,"TOPDOWN"),"TOPDOWN","Top Down"),SUBSTITUTE(REGEXEXTRACT(A1,"SHIRTS"),"SHIRTS","Shirts1")})

try:
=SUBSTITUTE(SUBSTITUTE(REGEXEXTRACT(A1,
"TOPDOWN|1TO1|TILEABLE|ROOM|MAIN|FLOORSHOT|SWATCH|ANGLED"),
"TOPDOWN", "Top Down"),
"1TO1", "1-to-1")

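Because a Sheets formula can't be run outside Sheets, here is a Python sketch of the same logic, for illustration only: take the first match of the alternation (as REGEXEXTRACT does), then apply one replacement per SUBSTITUTE layer. The REPLACEMENTS dictionary and the sample filenames are hypothetical.

```python
import re
from typing import Optional

# Hypothetical mapping of raw tokens to display labels (one entry per SUBSTITUTE layer).
REPLACEMENTS = {"TOPDOWN": "Top Down", "1TO1": "1-to-1"}
PATTERN = "TOPDOWN|1TO1|TILEABLE|ROOM|MAIN|FLOORSHOT|SWATCH|ANGLED"


def extract_label(filename: str) -> Optional[str]:
    """Mimic SUBSTITUTE(SUBSTITUTE(REGEXEXTRACT(filename, PATTERN), ...), ...)."""
    match = re.search(PATTERN, filename)  # first (leftmost) match, like REGEXEXTRACT
    if match is None:
        return None  # REGEXEXTRACT would return #N/A instead
    label = match.group(0)
    for raw, pretty in REPLACEMENTS.items():
        label = label.replace(raw, pretty)  # no-op when the token is absent
    return label


print(extract_label("CHAIR_TOPDOWN_01.jpg"))  # Top Down
print(extract_label("RUG_1TO1.png"))          # 1-to-1
print(extract_label("RUG_SWATCH.png"))        # SWATCH
```

Tokens without an entry in the mapping pass through unchanged, which is what the nested SUBSTITUTE calls do as well.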
Related

In Google Sheets, how can I test if a range of cells contains any of the text in another range?

Best explained with an example:
I want to search the blue range and check whether any of its cells contain any of the strings in the green range.
Ideally the search is case-insensitive, and the search string could appear anywhere within the searched cells.
If...
search range: A1:A10
search key: B1:B3
... then use the following formula
=arrayformula(sum(if(regexmatch(textjoin(",",false,",",A1:A10,","),","&B1:B3&","),1,0)))>0
Feel free to read the documentation of the functions in question.
The basic idea here is that we want to join the searched cells into one string and let ARRAYFORMULA apply REGEXMATCH to each search key individually; then we want to match whole words (entire cell values).
So how do we easily search for whole words? Your searched words are already separated into cells, so let's put a comma between them and also around the outside. Now ","&search_key&"," only matches a complete cell value, not just part of one.
The rest is an OR operation over the resulting array. Google Sheets unfortunately doesn't have functions like ANY or ALL, so the most (computationally) efficient approach is to use IF (compared with alternatives like matrix multiplication or FILTER) and check whether the SUM of the hits is greater than 0. The position of ARRAYFORMULA doesn't matter here, so you can just put it outside everything.
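A Python sketch of that delimiter-wrapping idea, for illustration only (the function name is hypothetical, and re.escape is an addition that the formula does not have):

```python
import re


def any_cell_matches(search_range, search_keys):
    """Sketch of =ARRAYFORMULA(SUM(IF(REGEXMATCH(TEXTJOIN(...), ","&key&","),1,0)))>0."""
    # TEXTJOIN(",", FALSE, ",", A1:A10, ",") -> ",,cell1,cell2,...,,"
    haystack = ",".join([","] + list(search_range) + [","])
    hits = 0
    for key in search_keys:
        # Wrapping the key in commas means it can only match a whole cell.
        # re.escape is an addition of this sketch; the formula does not escape keys.
        if re.search("," + re.escape(key) + ",", haystack):
            hits += 1  # IF(..., 1, 0), then SUM
    return hits > 0


print(any_cell_matches(["apple pie", "pear"], ["pear"]))   # True
print(any_cell_matches(["apple pie", "pear"], ["apple"]))  # False: only part of a cell
```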
Here's another possible solution:
=ARRAYFORMULA(IF(BYROW(A5:C,LAMBDA(r,SUM(LEN(r))))=0,,BYROW(REGEXMATCH(A5:C,"\b"&TEXTJOIN("\b|\b",1,E1:E)&"\b"),LAMBDA(r,SUM(--r)>0))))
Note that this formula is entered once in D5 and it doesn't have to be dragged down.
With the newer LAMBDA helper functions such as BYROW, this can be done as easily as this.
The reference ranges A5:C7 and E1:E3 can be changed to match your needs.
=BYROW(A5:C7,LAMBDA(ROW,REGEXMATCH(JOIN(" ",ROW),JOIN("|",$E$1:$E$3))))
To make the search case-insensitive, you can wrap both reference ranges in UPPER().
Since UPPER() does not expand over a range on its own, you'll have to wrap the whole thing in ArrayFormula(), so the result looks like this:
=ArrayFormula(BYROW(UPPER(A5:C7),LAMBDA(ROW,REGEXMATCH(JOIN(" ",ROW),JOIN("|",UPPER($E$1:$E$3))))))
I just found a problem: if the green range contains empty cells, JOIN produces an empty alternative (for example "a||b"), which matches every row and ruins the result. To get rid of this, I added QUERY() to the green reference range like this:
=ArrayFormula(BYROW(UPPER(A5:C7),LAMBDA(ROW,REGEXMATCH(JOIN(" ",ROW),JOIN("|",UPPER(QUERY({E1:E3},"WHERE Col1 IS NOT NULL")))))))
Or, we can put the case-insensitive flag (?i) directly into the RE2 regular expression like this:
=BYROW(A5:C7,LAMBDA(ROW,REGEXMATCH(JOIN(" ",ROW),"(?i)"&JOIN("|",QUERY({E1:E3},"WHERE Col1 IS NOT NULL")))))
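The empty-cell problem can be reproduced in a short Python sketch (illustrative only; row_matches is a hypothetical helper): an empty key turns the alternation into something like cat||dog, and the empty branch matches every string.

```python
import re


def row_matches(row, keys, ignore_case=True):
    """Sketch of BYROW + REGEXMATCH(JOIN(" ", row), JOIN("|", keys))."""
    # Drop blank keys, as QUERY(..., "WHERE Col1 IS NOT NULL") does in the formula;
    # otherwise JOIN yields an empty alternative such as "cat||dog".
    non_empty = [k for k in keys if k]
    if not non_empty:
        return False  # choice of this sketch: no keys means no match
    pattern = "|".join(non_empty)
    if ignore_case:
        pattern = "(?i)" + pattern  # same inline flag as in the RE2 version
    return re.search(pattern, " ".join(row)) is not None


# The unfiltered pattern matches anything because "" matches at position 0.
print(re.search("cat||dog", "zebra") is not None)  # True
print(row_matches(["zebra"], ["cat", "", "dog"]))  # False
print(row_matches(["Hot", "DOG"], ["dog"]))        # True
```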
use:
=INDEX(REGEXMATCH(FLATTEN(QUERY(TRANSPOSE(A5:C7),,9^9)),
"(?i)\b("&TEXTJOIN("|", 1, E1:E)&")\b"))

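When \b is concatenated around a TEXTJOIN("|", ...) alternation, grouping matters because | has the lowest precedence in a regular expression: "\b"&"cat|dog"&"\b" reads as "\bcat" or "dog\b", so the word boundaries attach only to the first and last keys. A Python sketch of the difference (word_pattern is a hypothetical helper; (?:...) is a non-capturing group, and a plain (...) behaves the same for a TRUE/FALSE match):

```python
import re


def word_pattern(keys, grouped=True):
    """Build "(?i)\\b...\\b" around a "|"-joined list of keys."""
    alternation = "|".join(keys)
    if grouped:
        alternation = "(?:" + alternation + ")"  # boundaries apply to every key
    return r"(?i)\b" + alternation + r"\b"


keys = ["cat", "dog"]
print(word_pattern(keys, grouped=False))  # (?i)\bcat|dog\b
print(bool(re.search(word_pattern(keys, grouped=False), "catalog")))  # True: "\bcat" alone matched
print(bool(re.search(word_pattern(keys), "catalog")))                 # False
print(bool(re.search(word_pattern(keys), "My Dog barks")))            # True
```
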
Google Spreadsheet error Text result of TEXTJOIN is longer than the limit of 50000 characters

I am trying to combine cells and show the result in one cell; each cell contains comma-separated product SKUs. I need to combine these cells, comma-separated, into a separate cell in the same column.
For this I am using
=TEXTJOIN(",",TRUE, G5,G10,G19,G27,G39,G46,G59)
But getting error:
Text result of TEXTJOIN is longer than the limit of 50000 characters.
use QUERY (the only workaround I know of):
=QUERY({G5;G10;G19;G27;G39;G46;G59}&",",,9^9)
or:
=QUERY({QUERY({G5;G10;G19;G27;G39;G46}&",",,9^9); G59},,9^9)
A Sheets cell cannot have more than 50,000 characters:
When you convert a document from Excel to Google Sheets, any cell with more than 50,000 characters will be removed in Sheets.
I'd suggest splitting your data into several cells.
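If the SKU list is prepared outside Sheets, a minimal Python sketch of that split might greedily pack comma-joined SKUs into strings that each stay within the 50,000-character cell limit. The function name and the greedy strategy are illustrative assumptions, not a Sheets feature.

```python
CELL_LIMIT = 50_000  # Sheets' per-cell character limit cited above


def chunk_skus(skus, limit=CELL_LIMIT):
    """Greedily pack comma-joined SKUs into strings of at most `limit` characters."""
    chunks, current = [], ""
    for sku in skus:
        candidate = sku if not current else current + "," + sku
        if len(candidate) > limit and current:
            chunks.append(current)  # current chunk is full; start a new one
            current = sku
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks  # a single SKU longer than `limit` still gets its own chunk


print(chunk_skus(["aa", "bb", "cc"], limit=5))  # ['aa,bb', 'cc']
```

Each resulting string can then go into its own cell.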
Reference:
Files you can store in Google Drive
Building on player0's answer and their comment that "query adds one empty space between each cell", you can use the ARRAYFORMULA, SPLIT, and SUBSTITUTE functions to clean up the output of the QUERY function.
=ARRAYFORMULA(SPLIT(SUBSTITUTE(QUERY({G5;G10;G19;G27;G39;G46;G59}&"#####",,9^9),"##### ",""),"#####"))
The idea is to append a marker that is unlikely to exist in your data, such as "#####", to each cell. QUERY then joins the cells as "value##### value##### ...", so replacing "##### " (the marker plus the space QUERY added) with an empty string removes those spaces. One final "#####" without a trailing space remains at the end; splitting on "#####" and keeping the first part removes it.
This formula eventually runs into a SUBSTITUTE length limit as well, but it can handle a little more than 50,000 characters.
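A Python sketch of the marker trick, for illustration only. It assumes, as described above, that QUERY with a header count of 9^9 joins the cells with a single space; the function name is hypothetical. With sep="" it matches the formula as written; passing "," keeps a comma between cells, which may be what a comma-separated SKU list needs.

```python
def join_without_query_spaces(cells, marker="#####", sep=""):
    """Sketch of SPLIT(SUBSTITUTE(QUERY({cells}&marker,,9^9), marker&" ", sep), marker)."""
    # Assumed QUERY behaviour: with headers=9^9 every row becomes part of the
    # header, and the header cells are joined with a single space.
    smashed = " ".join(cell + marker for cell in cells)
    # SUBSTITUTE: remove the marker plus the space QUERY inserted.
    joined = smashed.replace(marker + " ", sep)
    # SPLIT on the marker and keep the first part, dropping the final marker.
    return joined.split(marker)[0]


print(join_without_query_spaces(["a,b", "c", "d"]))           # a,bcd
print(join_without_query_spaces(["a,b", "c", "d"], sep=","))  # a,b,c,d
```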

Is there a way to specify an input is a single cell in Google Sheets?

I want to iterate over an array of cells, in this case B5:B32, and keep the values that are equal to some reference text in a new array.
However, SPLIT nowadays accepts arrays as inputs. That means that if I use the array notation of "B5:B32" within ARRAYFORMULA or FILTER, it treats it as a range, rather than the array over which we iterate one cell at a time.
Is there a way to ensure that a particular range is the range over which we iterate, rather than the range given at once as an input?
What I considered was using alternative formulations of a cell, using INDEX(ROW(B5), COLUMN(B5)) but ROW and COLUMN also accept array values, so I'm out of ideas on how to proceed.
Example code:
ARRAYFORMULA(
INDEX(
SPLIT(B5:B32, " ", 1), 1
) = "Some text here"
)
Example sheet:
https://docs.google.com/spreadsheets/d/1H8vQqD5DFxIS-d_nBxpuwoRH34WfKIYGP9xKKLvCFkA/edit?usp=sharing
Note: In the example sheet, I can get to my desired answer if I create separate columns containing the results of the SPLIT formula. This way, I first do the desired SPLITS, and then take the values I need from that output by specifying the correct range.
Is there a way to do this without first creating an output and then taking a cell range as an input to FILTER or other similar functions?
For example in cell C35 I've already gotten the desired SPLIT and FILTER done in one go, but I'd still need to find a way to sum up the values of the first character of the second column. Doing this requires that I take the LEFT value of the second column, but for that I need to output the results and continue in a new cell. Is there a way to avoid this?
Ralph, I'm not sure if your sample sheet really reflects what you are trying to end up with, since, for example, I assume you are likely to want the total of the hours per area.
In any case, this formula extracts all of the areas, and the hours worked, and is then easy to do further calculations with.
=ArrayFormula({REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9},"(.*) \d"),
VALUE(REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9}," (\d+)hrs"))})
Try that in cell E13, to see the output.
The first REGEXEXTRACT pulls out all the text in front of the last space that is followed by a digit, and the second pulls out the digits in the " #hrs" part of each cell. These patterns can be adjusted to your actual requirements. Note that the hours need VALUE to convert them from text to numbers, since REGEXEXTRACT always returns text (string) results.
It works by stacking your multiple data columns into one long column, which makes it simpler to process all the cells in the same way.
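A Python sketch of the same stacking and extraction, plus the per-area total that the following QUERY computes. It assumes cells look like "Kitchen 2hrs" (implied by the two patterns); the function names and sample data are hypothetical.

```python
import re


def extract_areas_and_hours(columns):
    """Sketch of stacking columns and applying the two REGEXEXTRACT patterns."""
    stacked = [cell for column in columns for cell in column]  # {C5:C9;D5:D9;...}
    rows = []
    for cell in stacked:
        area = re.search(r"(.*) \d", cell)      # greedy: text before the last " <digit>"
        hours = re.search(r" (\d+)hrs", cell)  # digits directly before "hrs"
        if area and hours:  # the formula would show #N/A instead of skipping
            rows.append((area.group(1), int(hours.group(1))))  # int() plays VALUE's role
    return rows


def total_hours(rows, area):
    """Sketch of: select Col1, sum(Col2) where Col1 = '<area>' group by Col1."""
    return sum(hours for name, hours in rows if name == area)  # exact string match


rows = extract_areas_and_hours([["Kitchen 2hrs", "Hall 1hrs"], ["Kitchen 3hrs", ""]])
print(rows)                          # [('Kitchen', 2), ('Hall', 1), ('Kitchen', 3)]
print(total_hours(rows, "Kitchen"))  # 5
```

One difference: for an area with no matches this sketch returns 0, whereas QUERY would not return 0 in that case.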
This next formula will give you a sum, for whatever matching room/task you type into B6, as an example.
=ArrayFormula(QUERY({REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9},"(.*) \d"),
VALUE(REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9}," (\d+)hrs"))},
"select Col1, sum(Col2) where Col1='"&B6&"' group by Col1 label sum(Col2) '' ",0))
I will also answer my own question given what I know from kirkg13's answer and other sources.
Short answer: no, there isn't. If you want to do really convoluted computations with particular cell values, there are a few options and tips:
Script your own functions. You can expand INDEX to accept array inputs and thereby you can select any set of values from an array without outputting it first. Example that doesn't use REGEXMATCH and QUERY to get the SUM of hours in the question's example data set: https://docs.google.com/spreadsheets/d/1NljC-pK_Y4iYwNCWgum8B4NJioyNJKYZ86BsUX6R27Y/edit?usp=sharing.
Use QUERY. This makes your formula more convoluted quite quickly, but is still a readable and universally applicable method of selecting data, for example particular columns. In the question's initial example, QUERY can retrieve only the second column just like an adapted INDEX function would.
Format your input data more effectively. The more easily you can get numbers from your input, the less you have to obscure your formulas with REGEXMATCH and QUERY calls to do computations. A SUM over a range is a far more compact formula than a VALUE of a LEFT of a QUERY of an ARRAYFORMULA of a SPLIT of a FILTER. Of course, this depends on where your inputs come from and whether you have any say in that.
Also, depending on how many queries you will run on a given data set, it may actually be desirable to split up the formula into separate parts and output partial results to keep the code from becoming an amalgamation of 12 different queries and formulas. If the results don't need to be viewed by people, you can always choose to hide specific columns and rows.

Query formula not displaying results that start with (') leading apostrophe strings

I have a sheet here where I need to use query formula
It doesn't display data that start with ' symbol (strings).
How do I make them display? The red cells are empty.
You mentioned
I have a sheet here where I need to use query formula
You can use the following formula:
=QUERY(ARRAYFORMULA(IF(LEN(A2:A),TEXT(A2:A,0),"")))
(Following that, you can leave the cells as text or change them to numbers depending on their further use.)
Functions used:
QUERY
ArrayFormula
IF
LEN
TEXT
Query considers only one data type for each column. As it is stated in the official documentation:
In case of mixed data types in a single column, the majority data type
determines the data type of the column for query purposes. Minority
data types are considered null values.
Therefore, the solution is to change the format to Plain text for column A.
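The documented majority-type rule can be illustrated with a small Python sketch (hypothetical helper; Python types stand in for Sheets data types, and the tie-breaking rule is an assumption). A text value such as '007 in a mostly numeric column becomes null, while converting the whole column to text keeps every value.

```python
from collections import Counter


def query_column_view(values):
    """Sketch of the documented rule: minority data types in a column become null."""
    present = [type(v) for v in values if v is not None]
    if not present:
        return list(values)
    # Ties are broken by first occurrence here; that tie rule is an assumption.
    majority = Counter(present).most_common(1)[0][0]
    return [v if type(v) is majority else None for v in values]


column_a = [101, 102, "007", 103]  # "007" stands for a cell entered as '007
print(query_column_view(column_a))                    # [101, 102, None, 103]
print(query_column_view([str(v) for v in column_a]))  # ['101', '102', '007', '103']
```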
You can also convert column to text inside QUERY:
=ArrayFormula(QUERY(TO_TEXT(Sheet1!A2:A),"select *"))

How to transpose rows into columns of contact details without mixing them up? (Google Sheets)

I have a data set of contact details where the emails and their names are scattered across rows, and I would like to list them in 2 tidy columns. I've tried using "Paste special" and the code below, but none of it worked.
This is what it looks like:
I've tried this code, but it only applies to one row, whereas I want to apply it to all rows and columns.
=transpose(A2:R2)
and
=transpose (A2:R300)
Both don't work. I hope somebody can help me with this, I'd really appreciate it. Thanks in advance!
It looks like you are using the wrong terms, and therefore the wrong functions.
Apparently you have a cell with data separated by spaces and line breaks, and you want each email and name in its own cell, with emails in one column and names in the next.
One way to achieve that: first, replace the separating spaces with a character like | and the line breaks with a different character like $.
Note: Some people use Unicode characters that are very unlikely to appear, like ♦ or ❤.
To do the above for line breaks, you could use Find and replace (Ctrl + H) or formula functions like REGEXREPLACE, SUBSTITUTE, and maybe others. Because spaces are used both as word separators and as value separators, Find and replace can't easily be used for them. For a single cell, the easiest way may be to insert the name/email separator manually.
Then separate the cell data. To do this you could use a formula function like SPLIT or Data > Separate values into columns.
Another way is by using Google Apps Script and JavaScript string handling methods but basically the algorithm is the same.
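The algorithm is the same in any language; here it is as a Python sketch for illustration (a script version would follow the same steps). It assumes the | separator between email and name has already been inserted manually, as suggested above; the function name and sample cell are hypothetical.

```python
def split_contacts(raw, field_sep="|", pair_sep="$"):
    """Sketch: mark line breaks, then split into [email, name] rows."""
    # Step 1: replace line breaks with the pair separator (SUBSTITUTE/REGEXREPLACE).
    marked = raw.replace("\n", pair_sep)
    rows = []
    # Step 2: SPLIT on the pair separator, then on the field separator.
    for pair in marked.split(pair_sep):
        if pair.strip():
            rows.append([part.strip() for part in pair.split(field_sep)])
    return rows


cell = "jane@example.com|Jane Doe\njohn@example.com|John Roe"
print(split_contacts(cell))
# [['jane@example.com', 'Jane Doe'], ['john@example.com', 'John Roe']]
```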
Related
How to split this complex string into 3 columns and 50 rows using Google sheet script
Google Apps Script: Create new rows for cells that contain commas
Google Sheets: string to columns and rows
