Cut out substrings of a URL in Google Spreadsheets with different characters?

I want to cut out the substrings of this URL, XY.com/de/haus/dach-ziegel-stein/, and put each value in its own column in Google Spreadsheets.
I want to split on both / and -.
With this example URL:
Column A should be de
Column B should be haus
Column C should be dach
Column D should be ziegel
Column E should be stein

You can use the following single formula for a range:
=INDEX(IFERROR(SPLIT(
REGEXREPLACE(
REGEXREPLACE(A125:A128,"^\w+\.\w+\/"," "),
"\/|\-"," ")," ")))
(do adjust ranges and locale according to your needs)
Or, more simply:
=INDEX(IFERROR(SPLIT(
REGEXREPLACE(A125:A128,"\w+\.\w+\/"," "),"/|-")))
Functions used:
INDEX
IFERROR
SPLIT
REGEXREPLACE
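As a rough illustration of what the first formula does to each cell, here is a Python sketch. `split_url_range` is a hypothetical helper, each string stands for one cell of A125:A128, and Python's `re` is assumed to behave like Sheets' RE2 engine for these simple patterns.

```python
import re


def split_url_range(urls):
    """Return one list of path parts per URL, like the first formula over a range."""
    rows = []
    for url in urls:
        # Inner REGEXREPLACE: replace the leading "domain.tld/" with a space
        no_domain = re.sub(r"^\w+\.\w+/", " ", url)
        # Outer REGEXREPLACE: turn every "/" and "-" into a space as well
        spaced = re.sub(r"/|-", " ", no_domain)
        # SPLIT on " " drops empty pieces; an empty cell gives an empty row,
        # roughly what IFERROR produces for a blank input
        rows.append(spaced.split())
    return rows


print(split_url_range(["XY.com/de/haus/dach-ziegel-stein/", ""]))
# [['de', 'haus', 'dach', 'ziegel', 'stein'], []]
```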

You can use a split formula like so to split by multiple delimiters:
=SPLIT(REGEXREPLACE(right(A1,len(A1)-(find(".com",A1)+3)), "-|/", "😎"), "😎")
This formula works by first trimming off the first part of the URL:
right(A1,len(A1)-(find(".com",A1)+3))
It finds the position where .com starts within the string and adds 3 to get the position of the last character of .com (in the example URL, XY.com/..., .com starts at position 3, so adding 3 for the c, o and m gives an ending position of 6).
From there, RIGHT keeps that many characters from the right-hand side: the length of the entire URL minus the ending position of .com.
Next, we replace both delimiters that we want to split by with a Unicode character that won't otherwise appear in the text, in this case 😎.
REGEXREPLACE(right(A1,len(A1)-(find(".com",A1)+3)), "-|/", "😎")
Finally, we simply split that string by the Unicode character, effectively splitting by both - and /.
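The same steps can be traced in a short Python sketch. The function name is hypothetical, and it assumes the URL contains .com (FIND would return an error otherwise, whereas Python's find() returns -1).

```python
import re


def split_after_dot_com(url):
    """Sketch of the RIGHT/LEN/FIND + REGEXREPLACE + SPLIT formula."""
    # FIND(".com", A1) + 3 is the 1-based position of the final "m";
    # find() is 0-based, so slicing at find() + len(".com") keeps the same tail
    tail = url[url.find(".com") + len(".com"):]
    # Replace both delimiters with one placeholder character...
    marked = re.sub(r"-|/", "😎", tail)
    # ...then split on it, dropping empty pieces as SPLIT does by default
    return [part for part in marked.split("😎") if part]


print(split_after_dot_com("XY.com/de/haus/dach-ziegel-stein/"))
# ['de', 'haus', 'dach', 'ziegel', 'stein']
```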

Related

How to cut out substrings (paths) of a URL in Google Spreadsheets?

I want to cut out the substrings of this URL, XYZ.com/de/haus/dach/, and put each value in its own column in Google Spreadsheets.
With this example URL:
Column A should be "de"
Column B should be "haus"
Column C should be "dach"
How can I do that?

Remove everything up to and including the first /. This can be done in several ways, including with REGEXREPLACE or with a combination of RIGHT, LEN and FIND.
Then SPLIT the resulting string on /.
=SPLIT(RIGHT(A1,(LEN(A1)-(FIND("/",A1)))),"/")
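A minimal Python sketch of the same two steps, with a hypothetical function name; it drops empty pieces the way SPLIT does by default (for example the one produced by the trailing /).

```python
def split_path_after_first_slash(url):
    # RIGHT(A1, LEN(A1) - FIND("/", A1)): keep everything after the first "/"
    rest = url[url.find("/") + 1:]
    # SPLIT(..., "/"): split on "/" and drop empty pieces
    return [part for part in rest.split("/") if part]


print(split_path_after_first_slash("XYZ.com/de/haus/dach/"))  # ['de', 'haus', 'dach']
```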

Google Sheets Extract Text between two characters

I have a field where I need to extract the text between two characters.
I've found REGEXEXTRACT and got it to work with one delimiter character, but I can't for the life of me get it to work with two different characters.
2020-02: Test Course (QAS)
I need to extract text after : and before (
So it would just return "Test Course"
TYIA

If it's for just one cell (say A2):
=IFERROR(TRIM(REGEXEXTRACT(A2,":([^\(]+)")))
This will return what you want regardless of spaces after the colon or before the opening parenthesis. If no match is found, an empty cell is returned.
If it's to process an entire range (say, A2:A), place the following in, say, B2 of an otherwise empty Col B:
=ArrayFormula(IF(A2:A="",,IFERROR(TRIM(REGEXEXTRACT(A2:A,":([^\(]+)")),A2:A)))
This will return what you want regardless of spaces after the colon or before the opening parenthesis. If no match is found, the original string will be returned.
In both cases, the regex...
:([^\(]+)
... means "a capture group of one or more characters that aren't an opening parenthesis, immediately following a colon."
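The single-cell and range behaviour can be sketched in Python as follows. The function names are hypothetical, and str.strip() only approximates TRIM, which also collapses runs of internal spaces.

```python
import re

PATTERN = re.compile(r":([^\(]+)")


def extract_single(cell):
    # REGEXEXTRACT returns the capture group; IFERROR maps "no match" to an empty cell
    match = PATTERN.search(cell)
    return match.group(1).strip() if match else ""


def extract_range(cells):
    # Range version: blank stays blank, no match falls back to the original string
    out = []
    for cell in cells:
        if cell == "":
            out.append("")
            continue
        match = PATTERN.search(cell)
        out.append(match.group(1).strip() if match else cell)
    return out


print(extract_single("2020-02: Test Course (QAS)"))  # Test Course
```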

One way to do that would be with the INDEX() and SPLIT() functions like this:
=TRIM(INDEX(SPLIT(A2,":("),2))
SPLIT treats each character of ":(" as a separate delimiter, so the text is split into three parts at the : and the (; INDEX then picks the second part.
The TRIM() just gets rid of the spaces.
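A Python sketch of the SPLIT/INDEX approach (hypothetical function name). Because each of : and ( acts as a delimiter, this assumes neither character appears in the text before the colon you want to split on.

```python
import re


def second_part(cell):
    # SPLIT(A2, ":(") splits on EACH of ":" and "(", dropping empty pieces
    parts = [p for p in re.split(r"[:(]", cell) if p]
    # INDEX(..., 2) picks the second piece; TRIM removes the surrounding spaces
    return parts[1].strip()


print(second_part("2020-02: Test Course (QAS)"))  # Test Course
```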

ArrayFormula, RegexExtract, and Join in Google Sheets

I have a data set in which each cell contains one or more email addresses. I would like to extract all the surnames from the emails in each cell and join them into one single cell, with a separator or delimiter between the groups of names obtained from each cell.
Here is the data set:
A | B
john.smith#gmail.com, jane.doe#gmail.com | UPDATE
john.smith#gmail.com | CLOSE
And here is the formula I use to extract the surnames:
=ARRAYFORMULA(
PROPER(
REGEXEXTRACT(
A:A,
REGEXREPLACE(
A:A,
"(\w+)#","($1)#"
)
)
)
)
This initially yields the following:
C | D
Smith | Doe
Smith |
I would like to use JOIN() inside the ARRAYFORMULA(), but it doesn't work as I expected: it outputs an error saying that it only accepts one row or one column of data. My initial understanding of ARRAYFORMULA() was that it iterates through the data row by row, so I thought it would JOIN() first and then move on to the next row, but I guess it doesn't work that way. I can use FLATTEN(), but I want delimiters or separators between the groups of row elements. I need help obtaining my intended final result, which looks like this:
UPDATE:
Smith
Doe
CLOSE:
Smith
All are located in one cell, C1. UPDATE and CLOSE are from column B.
EDIT: To clarify, the email entries in column A are dynamic, and a cell may contain more than two.

I think this will work:
=arrayformula(flatten(if(A2:A<>"",regexreplace(trim(split(B2:B&":"&char(9999)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(9999)&" "),char(9999))),".*\.",),)))
NOTES:
Proper(A2:A) changes the capitalisation.
The regexreplace "#[\w\.]+,\ ?|#.*" finds:
# symbol...
then one or more of A-Z, a-z, 0-9, _ [using \w] or . [using \.]
then a comma
then 'optionally' a space \ [the optional bit is ?]
or [using |], the # symbol followed by any number of characters [using .*]
Each match is replaced with a character that you don't expect to have in your text, char(9999), which is a pencil icon, plus a trailing space (used later on so that the flatten keeps a gap between groups). The purpose is to keep all of the 'name.surname' and 'nameonly' values in front of each # symbol and separate them with char(9999).
In front of the regexreplace is B2:B&":"&char(9999)&, which adds the value from column B, the : character and char(9999).
The split() function then separates the pieces into columns. Trim() gets rid of the spaces in front of names that don't contain a ..
The next regexreplace() deletes everything up to and including the last . so that only the surname, or the name if it has no ., is kept.
The if(A2:A<>"" part processes only rows where there is a value in col A. The arrayformula() function is required to cascade the formula down the sheet.
I didn't output the results in a single cell, but it looks like you've sorted that with textjoin.
Here's my version of getting the results into a single cell.
=arrayformula(textjoin(char(10),1,if(A2:A<>"",REGEXREPLACE(B2:B&":"&char(10)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(10)),".*\.",),)))
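To check the logic of the single-cell version outside Sheets, here is a Python sketch. Rows are assumed to be (column A, column B) pairs, the function name is hypothetical, str.title() stands in for PROPER, and [\w.] is the same character class as [\w\.].

```python
import re


def surnames_in_one_cell(rows):
    """Sketch of the TEXTJOIN version; rows are (column A, column B) pairs."""
    blocks = []
    for emails, label in rows:
        if emails == "":  # if(A2:A<>"", ...)
            continue
        # PROPER, then replace each "#domain, " (or the final "#domain") with a line break
        names = re.sub(r"#[\w.]+, ?|#.*", "\n", emails.title())
        # Prefix the label, then delete everything up to the last "." on each line;
        # "." does not match a line break, so each line is handled separately
        blocks.append(re.sub(r".*\.", "", label + ":\n" + names))
    # TEXTJOIN(CHAR(10), TRUE, ...): join the non-empty blocks with line breaks
    return "\n".join(blocks)


data = [
    ("john.smith#gmail.com, jane.doe#gmail.com", "UPDATE"),
    ("john.smith#gmail.com", "CLOSE"),
    ("", ""),
]
print(surnames_in_one_cell(data))
```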

Assuming that your A:A cells will always contain only contiguous email addresses separated by commas, you could place this in, say, C1 (being sure that both Columns C and D are otherwise empty beforehand):
=TRANSPOSE(FILTER({B:B,IFERROR(REGEXEXTRACT(SPLIT(PROPER(A:A),","),"([^\.]+)#"))},A:A<>""))
If this produces the desired result and you'd like to know how it works, report back. The first step, however, is to make sure it works as expected.
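For readers who want to see how the TRANSPOSE/FILTER formula works, here is a hedged Python sketch of its per-row logic (hypothetical function name; rows are assumed to be (column A, column B) pairs). Unlike the sheet, it returns ragged lists rather than a padded grid.

```python
import re


def label_then_surnames(rows):
    columns = []
    for emails, label in rows:
        # FILTER(..., A:A<>""): skip rows whose email cell is empty
        if emails == "":
            continue
        column = [label]
        # SPLIT(PROPER(A), ","), then REGEXEXTRACT with ([^\.]+)# on each piece:
        # the run of non-dot characters directly before "#" is the surname
        for piece in emails.title().split(","):
            match = re.search(r"([^\.]+)#", piece)
            column.append(match.group(1) if match else "")
        # TRANSPOSE turns each filtered row into one output column
        columns.append(column)
    return columns


print(label_then_surnames([
    ("john.smith#gmail.com, jane.doe#gmail.com", "UPDATE"),
    ("john.smith#gmail.com", "CLOSE"),
]))
# [['UPDATE', 'Smith', 'Doe'], ['CLOSE', 'Smith']]
```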

use:
=INDEX(REGEXREPLACE(TRIM(QUERY(FLATTEN(QUERY(TRANSPOSE({{B1; IF(B2:B="",,"×"&B2:B)},
PROPER(REGEXEXTRACT(A:A, REGEXREPLACE(A:A, "(\w+)#", "($1)#")))})
,,9^9)),,9^9)), " |×", CHAR(10)))

Split a comma-separated list of number-like strings as text

I have a column which contains comma-separated numbers:
A1: 004,005,0005,00005
I want to split this and do more with the results. When I split it, I end up with the following, losing the leading zeros because the pieces are parsed as numbers:
=split(A1, "," )
4 | 5 | 5 | 5
instead of
004 | 005 | 0005 | 00005
The number of zeros is important. I will then pass the result on to get a vertical list:
=unique(transpose(arrayformula(trim(split(join(",",!A1:A),",")))))

Adding a single quote before each number makes the spreadsheet keep it as text, thus keeping its leading and trailing zeros. Please see the formula below:
=SPLIT(SUBSTITUTE(","&A1,",","#'"),"#")
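Python has no spreadsheet type coercion, so this sketch uses a deliberately simplified, hypothetical parse_like_a_sheet() to model it: a leading apostrophe forces text, and a digit-only string becomes a number. It contrasts a plain split with the SUBSTITUTE trick.

```python
def parse_like_a_sheet(piece):
    # Hypothetical, simplified model of spreadsheet type coercion:
    # a leading apostrophe forces text, a digit-only string becomes a number
    if piece.startswith("'"):
        return piece[1:]
    if piece.isdigit():
        return int(piece)
    return piece


def split_plain(cell):
    # =SPLIT(A1, ","): each piece is coerced, so "004" becomes 4
    return [parse_like_a_sheet(p) for p in cell.split(",") if p]


def split_keep_zeros(cell):
    # SUBSTITUTE(","&A1, ",", "#'"): every value is now preceded by "#'"
    marked = ("," + cell).replace(",", "#'")
    # SPLIT(..., "#") drops the empty piece before the first "#"
    return [parse_like_a_sheet(p) for p in marked.split("#") if p]


print(split_plain("004,005,0005,00005"))       # [4, 5, 5, 5]
print(split_keep_zeros("004,005,0005,00005"))  # ['004', '005', '0005', '00005']
```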

Try this:
=ArrayFormula(REGEXREPLACE(SPLIT(REGEXREPLACE(A1,"(\d+)","~$1"),","),"~",""))
What this formula does is first replace every group of digits with a tilde (~) followed by that same group of digits. When SPLIT then splits this new string at the commas, every group of digits has a tilde in front of it and so retains all its digits (because it is seen as a string and not a number). Finally, the outer REGEXREPLACE just gets rid of the tildes.
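The tilde approach can be sketched the same way, again with a hypothetical, simplified model of how a spreadsheet types each split-out piece.

```python
import re


def parse_like_a_sheet(piece):
    # Hypothetical, simplified model: digit-only strings become numbers
    return int(piece) if piece.isdigit() else piece


def split_with_tilde(cell):
    # REGEXREPLACE(A1, "(\d+)", "~$1"): put "~" in front of every run of digits
    marked = re.sub(r"(\d+)", r"~\1", cell)
    # SPLIT(..., ","): "~004" is not a pure number, so it stays text
    pieces = [parse_like_a_sheet(p) for p in marked.split(",") if p]
    # Outer REGEXREPLACE(..., "~", ""): remove the tildes again
    return [p.replace("~", "") for p in pieces]


print(split_with_tilde("004,005,0005,00005"))  # ['004', '005', '0005', '00005']
```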

With just a single value in A1 you can use a little trick I described over at "Web-Applications" in this cross-website post:
=SPLIT(SUBSTITUTE("'"&A1,",","#'"),"#")
The single quote forces Google Sheets to treat the returned elements as text. We can use the same principle if you need to concatenate multiple strings first. You don't need ARRAYFORMULA() for this, but instead of JOIN() you need TEXTJOIN(), whose 2nd parameter lets you exclude empty cells from being joined.
Formula in B1:
=UNIQUE(TRANSPOSE(SPLIT(SUBSTITUTE("'"&TEXTJOIN(",",TRUE,A1:A),",","#'"),"#")))
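A Python sketch of the B1 formula (hypothetical function name): each list item stands for one cell of column A, and dict.fromkeys() is used to mimic UNIQUE, which keeps the first occurrence of each value.

```python
def unique_text_values(cells):
    # TEXTJOIN(",", TRUE, A1:A): join the non-empty cells with commas
    joined = ",".join(c for c in cells if c != "")
    # "'" & ... with SUBSTITUTE(..., ",", "#'"): every value gets a leading apostrophe
    marked = ("'" + joined).replace(",", "#'")
    # SPLIT on "#"; the apostrophe only marks a value as text in Sheets, so drop it here
    values = [p[1:] for p in marked.split("#") if p]
    # TRANSPOSE makes the list vertical; UNIQUE keeps the first occurrence of each value
    return list(dict.fromkeys(values))


print(unique_text_values(["004,005", "", "005,00005"]))  # ['004', '005', '00005']
```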

extract number from cell in openoffice calc

I have a column in open office like this:
abc-23
abc-32
abc-1
Now, I need to get only the sum of the numbers 23, 32 and 1 using a formula and regular expressions in calc.
How do I do that?
I tried
=SUMIF(F7:F16,"([:digit:].)$")
But somehow this does not work.

Starting with LibreOffice 6.4, you can use the newly added REGEX function to generically extract all numbers from a cell / text using a regular expression:
=REGEX(A1;"[^[:digit:]]";"";"g")
Replace A1 with the cell-reference you want to extract numbers from.
Explanation of REGEX function arguments:
Arguments are separated by a semicolon ;
A1: Value to extract numbers from. Can be a cell-reference (like A1) or a quoted text value (like "123abc"). The following regular expression will be applied to this cell / text.
"[^[:digit:]]": Match every character which is not a decimal digit. See also list of regular expressions in LibreOffice
The outer square brackets [] encapsulate the list of characters to search for
^ adds a NOT, meaning that every character not included in the search list is matched
[:digit:] represents any decimal digit
"": replace matching characters (every non-digit) with nothing = remove them
"g": replace all matches (don't stop after the first non-digit character)

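The same delete-every-non-digit idea, sketched in Python for the three example cells. Python's re does not support the POSIX [:digit:] class, so [^0-9] is used instead, and int() converts the extracted digits (which are still text) before summing.

```python
import re


def digits_only(text):
    # Same idea as REGEX(A1; "[^[:digit:]]"; ""; "g"): delete every non-digit
    return re.sub(r"[^0-9]", "", text)


cells = ["abc-23", "abc-32", "abc-1"]
total = sum(int(digits_only(c)) for c in cells)
print(total)  # 56
```
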
Unfortunately, LibreOffice only supports regex in find/replace and in search.
If this is a one-off job, I would copy column A to column B, then use Data > Text to Columns on column B with - as the separator, leaving the text in column B and the numbers in column C.
Alternatively, you could use =VALUE(RIGHT(A1,LEN(A1)-FIND("-",A1,1))) in column B, then sum column B.
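A Python sketch of taking everything after the dash with RIGHT/LEN/FIND-style arithmetic; the function name is hypothetical, and it assumes each cell contains exactly one - followed by digits.

```python
def number_after_dash(cell):
    dash_position = cell.find("-") + 1  # 1-based position, like FIND("-", A1, 1)
    count = len(cell) - dash_position   # LEN(A1) - FIND(...): characters after the dash
    # RIGHT(A1, count) keeps the last count characters; int() turns the text into a number
    return int(cell[len(cell) - count:])


cells = ["abc-23", "abc-32", "abc-1"]
print(sum(number_after_dash(c) for c in cells))  # 56
```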

I think this is not exactly what you want, but maybe it can help you or others.
It is all about substrings (in Calc, the MID function):
First: choose your cell (for example, one containing "abc-23").
Second: enter the start position (for "british", start position 4 gives "tish").
After that: to return all the remaining text, you can use the LEN function on the same cell ("abc-23") as the length parameter.
The code now looks like this:
D15="abc-23"
=MID(D15; 5; LEN(D15))
And the output is: 23
When you edit the numbers (in this example 23), there's no problem. However, if you change anything before them (the text "abc-"), the formula breaks, because the start position is fixed at 5.
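To make the MID arithmetic concrete, here is a small Python sketch of a hypothetical mid() helper with Calc's 1-based start position; the last call shows how a fixed start of 5 breaks when the prefix changes length.

```python
def mid(text, start, length):
    """Return length characters of text starting at the 1-based position start."""
    return text[start - 1:start - 1 + length]


print(mid("british", 4, len("british")))  # tish
print(mid("abc-23", 5, len("abc-23")))    # 23
print(mid("abcd-23", 5, len("abcd-23")))  # -23 (the prefix grew by one character)
```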

Paste the string into a cell, open the search and replace dialog (Ctrl+F), and in the extended search options tick Regular expressions. Search for ([\s,0-9])([^0-9\s])+ and replace it with $1.
Adjust the regex to your needs.
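The effect of that search pattern can be tried out in Python (a sketch with a hypothetical sample string; Python writes the back-reference as \1 rather than $1). It removes each run of non-digit, non-space characters that follows a digit, space or comma.

```python
import re

# Same pattern as in the LibreOffice dialog
PATTERN = r"([\s,0-9])([^0-9\s])+"


def strip_trailing_letters(text):
    # Keep the preceding character (group 1) and drop the run after it
    return re.sub(PATTERN, r"\1", text)


print(strip_trailing_letters("12ab 34cd"))  # 12 34
```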

I didn't figure out how to do this in OpenOffice/LibreOffice directly. After a frustrating search online and trying various formulas, I realised my sheet was in a simple CSV format, so I opened it in vim and used vim's built-in sed-like feature to find/replace the text in command mode:
:%s/abc-//g
This only worked for me because there were no other columns with this matching text. If there are other columns with the same text, then the solution would be a bit more complex.
If your sheet is not a CSV, you could copy the column out to a text file and use vim to find/replace, and then paste the data back into the spreadsheet. For me, this was a lot less frustrating than trying to figure this out in LibreOffice...
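The same global substitution can be sketched in Python for readers without vim; the sample text is hypothetical, and like :%s/abc-//g it removes abc- wherever it appears, in any column.

```python
def remove_everywhere(text, target="abc-"):
    # Like :%s/abc-//g: every line (%), every occurrence on the line (g)
    return text.replace(target, "")


sample = "abc-23\nabc-32\nabc-1\n"
print(remove_everywhere(sample))
```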

I won't bother with a full solution without knowing whether there really is interest, but you could write a macro to do this: extract all the numbers from the text and then sum them.

Develop Reference
