When I use the SPLIT function, the data in my cell is converted from text to a number (notice that the leading '0' in cell C3 is removed).
Is there a way to keep the data as text after the split (or at least keep the leading 0)?
=SPLIT( JOIN("!",B1:B), "!")
When entering a string that looks like a number, one can keep it as a string by preceding it with a single apostrophe ' (which does not become a part of the string). Same thing works in formulas:
=split(substitute("'" & B1, "!", "!'"), "!")
This adds ' at the beginning and after each separator (which has to be done before splitting). After splitting, the result is as desired: strings, with no leading apostrophe.
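To see the intermediate string that SPLIT receives, here is a small Python sketch of the SUBSTITUTE step only. The sample value and the helper name mark_as_text are made up for illustration, and Python does not reproduce the Sheets rule that a leading apostrophe marks a value as text.

```python
def mark_as_text(cell: str, sep: str = "!") -> str:
    """Mimic SUBSTITUTE("'" & B1, "!", "!'"): put an apostrophe before every piece."""
    return "'" + cell.replace(sep, sep + "'")

# Hypothetical joined value, similar to what JOIN("!", B1:B) could produce.
joined = "007!12!abc"
marked = mark_as_text(joined)
print(marked)             # '007!'12!'abc
print(marked.split("!"))  # ["'007", "'12", "'abc"]
```

In Sheets, each of those pieces then starts with an apostrophe, so it is stored as text and the apostrophe itself is not displayed.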
I want to cut out substrings from this url XY.com/de/haus/dach-ziegel-stein/ and put the values each in its own columns in Google Spreadsheet.
I want to cut out by / and by -.
With this url example:
Column A should be de
Column B should be haus
Column C should be dach
Column D should be ziegel
Column E should be stein
You can use the following single formula for a range
=INDEX(IFERROR(SPLIT(
REGEXREPLACE(
REGEXREPLACE(A125:A128,"^\w+\.\w+\/"," "),
"\/|\-"," ")," ")))
(do adjust ranges and locale according to your needs)
Or simpler
=INDEX(IFERROR(SPLIT(
REGEXREPLACE(A125:A128,"\w+\.\w+\/"," "),"/|-")))
Functions used:
INDEX
IFERROR
SPLIT
REGEXREPLACE
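For anyone who wants to check the regex logic outside Sheets, the same idea can be sketched in Python. This is only an approximation of the formulas above: the function name split_url_path is hypothetical, and it assumes the URL starts with a simple domain.tld/ prefix like the example.

```python
import re

def split_url_path(url: str) -> list:
    """Drop the leading 'domain.tld/' and split the rest on '/' and '-'."""
    path = re.sub(r"^\w+\.\w+/", "", url)
    # Discard empty pieces (e.g. from the trailing '/'), as SPLIT does by default.
    return [part for part in re.split(r"[/-]", path) if part]

print(split_url_path("XY.com/de/haus/dach-ziegel-stein/"))
# ['de', 'haus', 'dach', 'ziegel', 'stein']
```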
You can use a split formula like so to split by multiple delimiters:
=SPLIT(REGEXREPLACE(right(A1,len(A1)-(find(".com",A1)+3)), "-|/", "😎"), "😎")
This formula works by first trimming off the first part of the URL:
right(A1,len(A1)-(find(".com",A1)+3))
It finds where .com starts within the string and then adds 3 to get the position of its last character (in the example URL, XY.com/..., .com starts at position 3, so adding 3 for the c, o and m gives an ending position of 6).
From there, RIGHT keeps the right-hand part of the URL: the number of characters kept is the length of the entire URL minus the ending position of .com.
Finally, we replace the two delimiters that we want to split by with a unicode character, in this case 😎.
REGEXREPLACE(right(A1,len(A1)-(find(".com",A1)+3)), "-|/", "😎")
And then simply split that string by the unicode character, effectively splitting by both - and /.
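The position arithmetic can be checked with a short Python sketch. It is an illustration only: Python indexes from 0 while FIND counts from 1, so the sketch adds 1 to line up with the numbers in the explanation, and the variable names are my own.

```python
import re

url = "XY.com/de/haus/dach-ziegel-stein/"

# FIND is 1-based, so add 1 to Python's 0-based index.
start = url.find(".com") + 1   # 3
end = start + 3                # 6, the position of the final "m"

# RIGHT(A1, LEN(A1) - 6) keeps everything after position 6.
tail = url[end:]

# Replace both delimiters with one marker, then split on the marker.
marker = "😎"
parts = [p for p in re.sub(r"-|/", marker, tail).split(marker) if p]

print(tail)   # /de/haus/dach-ziegel-stein/
print(parts)  # ['de', 'haus', 'dach', 'ziegel', 'stein']
```

The empty pieces from the leading and trailing / are filtered out here because SPLIT drops empty results by default.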
I have a field where I need to extract the text between two characters.
I've found regexextract and I got it to work when there is one character but I can't for the life get it to work with multiple characters.
2020-02: Test Course (QAS)
I need to extract text after : and before (
So it would just return "Test Course"
TYIA
If it's for just one cell (say A2):
=IFERROR(TRIM(REGEXEXTRACT(A2,":([^\(]+)")))
This will return what you want regardless of spaces after the colon or before the opening parenthesis. If no match is found, null will be returned.
If it's to process an entire range (say, A2:A), place the following in, say, B2 of an otherwise empty Col B:
=ArrayFormula(IF(A2:A="",,IFERROR(TRIM(REGEXEXTRACT(A2:A,":([^\(]+)")),A2:A)))
This will return what you want regardless of spaces after the colon or before the opening parenthesis. If no match is found, the original string will be returned.
In both cases, the REGEX string...
:([^\(]+)
... means "a grouping of any number of characters that aren't an opening parenthesis and which follows a colon."
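As a quick way to test that pattern outside Sheets, here is a minimal Python sketch of the single-cell version. The function name is hypothetical, and returning an empty string stands in for the blank that IFERROR gives when nothing matches.

```python
import re

def between_colon_and_paren(text: str) -> str:
    """Return the trimmed text after ':' and before '(', or '' if there is no match."""
    match = re.search(r":([^(]+)", text)
    return match.group(1).strip() if match else ""

print(between_colon_and_paren("2020-02: Test Course (QAS)"))  # Test Course
```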
One way to do that would be with the INDEX() and SPLIT() functions like this:
=TRIM(INDEX(SPLIT(A2,":("),2)
Split splits the text into 3 parts using the : and (, then INDEX chooses the second part.
The TRIM() just gets rid of the spaces.
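The three parts are easy to see in a small Python sketch. It assumes SPLIT's default behaviour, where each character of ":(" is treated as a separate delimiter; the variable names are only for illustration.

```python
import re

text = "2020-02: Test Course (QAS)"

# Split on ':' or '(' and drop empty pieces, like SPLIT(A2, ":(").
pieces = [p for p in re.split(r"[:(]", text) if p]
print(pieces)             # ['2020-02', ' Test Course ', 'QAS)']

# INDEX(..., 2) picks the second piece; TRIM removes the surrounding spaces.
print(pieces[1].strip())  # Test Course
```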
I have a data set in which emails are populated. I would like to extract the surnames from the emails in each cell and join them all into one single cell, with a separator or delimiter between the surnames obtained from each cell.
Here is the data set:
A | B
john.smith#gmail.com, jane.doe#gmail.com | UPDATE
john.smith#gmail.com | CLOSE
And here is the formula to extract
=ARRAYFORMULA(
PROPER(
REGEXEXTRACT(
A:A,
REGEXREPLACE(
A:A,
"(\w+)#","($1)#"
)
)
)
)
This initially yields the following:
C | D
Smith | Doe
Smith |
I would like to use JOIN() inside the ARRAYFORMULA() but it is not working as I seem to think it would since it outputs an error that it only accepts one row or one column of data. My initial understanding of ARRAYFORMULA() is that it iterates through the course of the data, so I thought it will JOIN() first, and then move on to the next element/row but I guess it doesn't work that way. I can use FLATTEN() but I want to have delimiters or separators in between the row elements. I need help in obtaining my intended final result which will look like this:
UPDATE:
Smith
Doe
CLOSE:
Smith
All are located in one cell, C1. UPDATE and CLOSE are from column B.
EDIT: I would like to clarify that the email entries in column A are dynamic and may be more than two.
I think this will work:
=arrayformula(flatten(if(A2:A<>"",regexreplace(trim(split(B2:B&":"&char(9999)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(9999)&" "),char(9999))),".*\.",),)))
NOTES:
Proper(A2:A) changes the capitalisation.
The regexreplace "#[\w\.]+,\ ?|#.*" finds:
# symbol...
then any number of A-Z, a-z, 0-9, _ [using \w] or . [using \.]
then a comma
then 'optionally' a space \ [the optional bit is ?]
or [using |], the # symbol then any number of characters [using .*]
The result is replaced with a character that you won't expect to have in your text - char(9999) which is a pencil icon, and a trailing space (used later on when the flatten keeps a gap between lines). The purpose is to get all of the 'name.surname' and 'nameonly' values in front of any # symbol and separate them with char(9999).
Then in front of the regexreplace is B2:B&":"&char(9999)& which gets the value from column B, the : character and char(9999).
The split() function then separates them into columns. Trim() is used to get rid of spaces in front of names that don't contain a "." character.
The next regexreplace() function deletes anything before, and including, the "." character, to keep the surname (or the name, if it has no ".").
The if(A2:A<>"" only processes rows where there is a value in col A. The arrayformula() function is required to cascade the formula down the sheet.
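The steps in these notes can be traced for a single row with a rough Python sketch. It is an approximation, not the formula itself: str.title() only stands in for PROPER, the function name surnames_row is made up, and the sample rows are the ones from the question.

```python
import re

SEP = chr(9999)  # the pencil character used as a separator unlikely to be in the data

def surnames_row(emails: str, label: str) -> list:
    # PROPER(A2:A); str.title() is only an approximation of PROPER.
    proper = emails.title()
    # Replace "#domain, " or a trailing "#domain" with the separator plus a space.
    marked = re.sub(r"#[\w.]+, ?|#.*", SEP + " ", proper)
    # B2:B & ":" & char(9999) & ..., then SPLIT on the separator and TRIM each piece.
    cells = [c.strip() for c in (label + ":" + SEP + marked).split(SEP)]
    # Delete everything up to and including the last "." in each piece.
    return [re.sub(r".*\.", "", c) for c in cells]

print(surnames_row("john.smith#gmail.com, jane.doe#gmail.com", "UPDATE"))
# ['UPDATE:', 'Smith', 'Doe', '']
print(surnames_row("john.smith#gmail.com", "CLOSE"))
# ['CLOSE:', 'Smith', '']
```

The trailing empty string comes from the space added after the last separator; it corresponds to the gap that the notes say flatten keeps between groups.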
I didn't output the results in a single cell, but it looks like you've sorted that with textjoin.
Here's my version of getting the results into a single cell.
=arrayformula(textjoin(char(10),1,if(A2:A<>"",REGEXREPLACE(B2:B&":"&char(10)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(10)),".*\.",),)))
Assuming that your A:A cells will always contain only contiguous email addresses separated by commas, you could place this in, say, C1 (being sure that both Columns C and D are otherwise empty beforehand):
=TRANSPOSE(FILTER({B:B,IFERROR(REGEXEXTRACT(SPLIT(PROPER(A:A),","),"([^\.]+)#"))},A:A<>""))
If this produces the desired result and you'd like to know how it works, report back. The first step, however, is to make sure it works as expected.
use:
=INDEX(REGEXREPLACE(TRIM(QUERY(FLATTEN(QUERY(TRANSPOSE({{B1; IF(B2:B="",,"×"&B2:B)},
PROPER(REGEXEXTRACT(A:A, REGEXREPLACE(A:A, "(\w+)#", "($1)#")))})
,,9^9)),,9^9)), " |×", CHAR(10)))
I have a column which contains comma-separated numbers
A1: 004,005,0005,00005
I want to split this and do more with the result. When I split I end up with the following, losing the zeros because the pieces are parsed as numbers rather than text
=split(A1, "," )
4 | 5 | 5 | 5
instead of
004 | 005 | 0005 | 00005
The number of zeros is important. I will pass on the result to get a vertical list
=unique(transpose(arrayformula(trim(split(join(",",!A1:A),",")))))
Adding a single quote before a number will keep the format as is, thus, keeping the leading and trailing zeroes. Please see formula below:
=SPLIT(SUBSTITUTE(","&A1,",","#'"),"#")
Try this:
=ArrayFormula(REGEXREPLACE(SPLIT(REGEXREPLACE(A1,"(\d+)","~$1"),","),"~",""))
What this formula does is first replace every group of numbers with a tilde (~) and that same group of numbers. When SPLIT then acts on this new configuration, splitting at the commas, every group of numbers has the tilde in front of it and so retains all digits (because it is seen as a string and not a number). Finally, the outer REGEXREPLACE just gets rid of the tildes.
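The three steps can be followed with a short Python sketch. Python keeps the pieces as strings anyway, so this only shows the string manipulation, not the number coercion that the tilde works around in Sheets; the variable names are illustrative.

```python
import re

cell = "004,005,0005,00005"

# Inner REGEXREPLACE: put a tilde in front of every group of digits.
marked = re.sub(r"(\d+)", r"~\1", cell)
print(marked)   # ~004,~005,~0005,~00005

# SPLIT at the commas, then the outer REGEXREPLACE removes the tildes.
result = [piece.replace("~", "") for piece in marked.split(",")]
print(result)   # ['004', '005', '0005', '00005']
```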
With just a single value in A1 you can use a little trick I described over at "Web-Applications" in this cross-website post:
=SPLIT(SUBSTITUTE("'"&A1,",","#'"),"#")
The single quote will force GS to format the returned elements as being text. We can use this principle inside an arrayformula if you have to concatenate multiple strings. You don't need ARRAYFORMULA() per se, but instead of JOIN() you'd need TEXTJOIN() to use the 2nd parameter and exclude empty cells from being joined.
Formula in B1:
=UNIQUE(TRANSPOSE(SPLIT(SUBSTITUTE("'"&TEXTJOIN(",",TRUE,A1:A),",","#'"),"#")))
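The join, split and de-duplicate steps can be sketched in Python as follows. The sample column and the function name unique_codes are assumptions for illustration, and the apostrophe step is left out because Python strings never lose their leading zeros.

```python
def unique_codes(column: list) -> list:
    # TEXTJOIN(",", TRUE, A1:A): join the cells, skipping empty ones.
    joined = ",".join(cell for cell in column if cell)
    # SPLIT then UNIQUE: keep the first occurrence of each piece, in order.
    return list(dict.fromkeys(joined.split(",")))

# Hypothetical column A with one empty cell and one repeated code.
print(unique_codes(["004,005,0005,00005", "", "005,07"]))
# ['004', '005', '0005', '00005', '07']
```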
In a Google Sheets spreadsheet, I have the cell A1 with value "people 12-14 ABC". I want to extract the exact match "ABC" into another cell. The contents of cell A1 can change, e.g. to "woman 60+ ABCD". For this input, I would want to extract "ABCD". If A1 was instead "woman 12-20 CAE", I would want "CAE".
There are 5 possible strings that the last part may be: (ABC, ABCD, AB, CAE, C), while the first portions are very numerous (~400 possibilities).
How can I determine which of the 5 strings is in A1?
If the first part "only" has lower case or numbers and the last part "only" UPPER case,
=REGEXREPLACE(D3;"[^A-E]";)
Anchor: Space
=REGEXEXTRACT(A31;"\s([A-E]+)$")
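Both patterns can be tried out with a small Python sketch, which also shows where they differ. The function names are hypothetical, and the last input is an invented example of a first part that contains a capital letter.

```python
import re

def category_by_stripping(text: str) -> str:
    """Like REGEXREPLACE(D3;"[^A-E]";): delete every character that is not A-E."""
    return re.sub(r"[^A-E]", "", text)

def category_by_anchor(text: str) -> str:
    """Like REGEXEXTRACT(A31;"\\s([A-E]+)$"): the A-E run after the last space."""
    match = re.search(r"\s([A-E]+)$", text)
    return match.group(1) if match else ""

print(category_by_stripping("people 12-14 ABC"))  # ABC
print(category_by_anchor("woman 60+ ABCD"))       # ABCD
# A capital letter in the first part breaks the stripping approach only.
print(category_by_stripping("Adult 12 AB"))       # AAB
print(category_by_anchor("Adult 12 AB"))          # AB
```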
If you can guarantee well-formatted input, this is simply a matter of splitting the contents of A1 into its component parts (e.g. "gender_filter", "age range", and "my 5 categories"), and selecting the appropriate index of the resultant array of strings.
To convert a cell's contents into an array of that content, the SPLIT() function can be used.
B1 = SPLIT(A1, " ")
would put entries into B1, C1, and D1, where D1 has the value you want, provided your gender filter and age range each contain no spaces.
Since you probably don't want to have those excess junk values, you want to contain the result of split entirely in B1. To do this, we need to pass the array generated by SPLIT to a function that can take a range or array input. As a bonus, we want to sub-select a part of this range (specifically, the last one). For this, we can use the INDEX() function
B1 = INDEX(SPLIT(A1, " "), 1, COUNTA(SPLIT(A1, " ")))
This tells the INDEX function to access the first row and the last column of the range produced by SPLIT, which for the inputs you have provided, is "ABC", "ABCD", and "CAE".
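The same "last piece of the split" idea looks like this in a minimal Python sketch. The function name last_token is made up, and it assumes the cell is not empty.

```python
def last_token(text: str) -> str:
    # SPLIT(A1, " "): break the text into pieces at the spaces.
    parts = text.split()
    # COUNTA(...) gives the number of pieces; INDEX picks the piece at that position.
    return parts[len(parts) - 1]

for value in ["people 12-14 ABC", "woman 60+ ABCD", "woman 12-20 CAE"]:
    print(last_token(value))
# ABC
# ABCD
# CAE
```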