TextPad Replace Character and Line Feed with Nothing - str-replace

How do I replace a line in TextPad ' with nothing (ie: delete lines with just that one character)?
I have an Excel Spreadsheet containing three columns:
Column A - single quote
Column B - some number
Column C - single quote plus a comma
There are over 90,000 rows on this spreadsheet with data in column B. There are over one million rows with just a single quote in column A because I did a "Ctrl+D" on that column to copy the value in that column (a single quote) down to all rows.
When I copy and paste these three columns into TextPad, I end up with over one million lines. I replaced the tabs with nothing using the F8/Replace dialog.
(Replace: tab with: empty string)
The majority of what is left are lines that contain only a single quote. I want to delete these 900,000 extra lines.
How do I specify a Replace (delete) of single quote + line feed. I do not want to delete any of the single quotes from the lines that include a number that came from column B.

I just figured it out. The backslash n is the line feed.
If I check Regular Expression and enter this Find what:
'\n
(Keeping empty string for Replace with) and Replace All, I have deleted those extra lines.

I also experienced the same...it did not work for me until I did this:
uncheck the regular expression first before entering \n in the find box and replacing with whatever you chose to (in my case, it was ',').
Your result might be an entire list becoming transposed (that's what happened to my data).

Related

How to remove a piece of text from a cell?

I'm trying to remove a piece of text (Perfomance) from a column in Google Spreadsheet that contains (XX Performance) XX is a number like 89. I'm using:
=REGEXREPLACE(D:D, " Performance "," - ")
But no love...
enter image description here
Try this Example Sheet
=ArrayFormula(IF(D2:D="",, REGEXEXTRACT(D2:D, "[0-9]+")))
You can use the expression \D+:
\D matches any character that's not a digit (equivalent to [^0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed
The formula will be like:
=REGEXREPLACE(D:D, "\D+","")
UPDATE
I did put it in another column otherwise it creates a circular dependency. The data is imported via API from another app.
Then you will need to create another sheet or use a hidden column to put that information and then use the regex on the column you want the final result.

ArrayFormula, RegexExtract, and Join in Google Sheets

I have a data set wherein emails are populated. I would like to list all the surnames extracted in the emails per cell and will be all joined to a one single cell but I want to put a separator or delimeter to the emails obtaine per cell.
Here is the data set:
A
B
john.smith#gmail.com, jane.doe#gmail.com
UPDATE
john.smith#gmail.com
CLOSE
And here is the formula to extract
=ARRAYFORMULA(
PROPER(
REGEXEXTRACT(
A:A,
REGEXREPLACE(
A:A,
"(\w+)#","($1)#"
)
)
)
)
This initially yields the ff:
C
D
Smith
Doe
Smith
I would like to use JOIN() inside the ARRAYFORMULA() but it is not working as I seem to think it would since it outputs an error that it only accepts one row or one column of data. My initial understanding of ARRAYFORMULA() is that it iterates through the course of the data, so I thought it will JOIN() first, and then move on to the next element/row but I guess it doesn't work that way. I can use FLATTEN() but I want to have delimiters or separators in between the row elements. I need help in obtaining my intended final result which will look like this:
UPDATE:
Smith
Doe
CLOSE:
Smith
All are located in one cell, C1. UPDATE and CLOSE are from column B.
EDIT: I would like to clarify that the email entries in column A are dynamic and maybe more than two.
I think this will work:
=arrayformula(flatten(if(A2:A<>"",regexreplace(trim(split(B2:B&":"&char(9999)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(9999)&" "),char(9999))),".*\.",),)))
NOTES:
Proper(A2:A) changes the capitalisation.
The regexreplace "#[\w\.]+,\ ?|#.*" finds:
# symbol...
then any number of A-Z, a-z, 0-9, _ [using \w] or . [using \.]
then a comma
then 'optionally' a space \ [the optional bit is ?]
or [using |], the # symbol then an number of characters [using .*]
The result is replaced with a character that you won't expect to have in your text - char(9999) which is a pencil icon, and a trailing space (used later on when the flatten keeps a gap between lines). The purpose is to get all of the 'name.surname' and 'nameonly' values in front of any # symbol and separate them with char(9999).
Then infront of the regexreplace is B2:B&":"&char(9999)& which gets the value from column B, the : chanracter and char(9999).
The split() function then separates then into columns. Trim() is used to get rid of spaces in front of names that don't contain ..
The next regexreplace() function deletes anything before, and including . to keep surname or name without ..
The if(A2:A<>"" only process rows where there is a value in col A. The arrayformula() function is required to cascade the formula down the sheet.
I didn't output the results in a single cell, but it looks like you've sorted that with textjoin.
Here's my version of getting the results into a single cell.
=arrayformula(textjoin(char(10),1,if(A2:A<>"",REGEXREPLACE(B2:B&":"&char(10)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(10)),".*\.",),)))
Assuming that your A:A cells will always contain only contiguous email addresses separated by commas, you could place this in, say, C1 (being sure that both Columns C and D are otherwise empty beforehand):
=TRANSPOSE(FILTER({B:B,IFERROR(REGEXEXTRACT(SPLIT(PROPER(A:A),","),"([^\.]+)#"))},A:A<>""))
If this produces the desired result and you'd like to know how it works, report back. The first step, however, is to make sure it works as expected.
use:
=INDEX(REGEXREPLACE(TRIM(QUERY(FLATTEN(QUERY(TRANSPOSE({{B1; IF(B2:B="",,"×"&B2:B)},
PROPER(REGEXEXTRACT(A:A, REGEXREPLACE(A:A, "(\w+)#", "($1)#")))})
,,9^9)),,9^9)), " |×", CHAR(10)))

excel to split data after comma and before space

I want to get the first name and middle name from cell. I am able to get the first name with the excel formula:
=LEFT(D2,FIND(",",D2)-1)
The name i.e Shukla,Vinay Devanand is reflecting in cell and I am able to get Shukla with above formula and now want only Vinay (All the characters after first comma and before first space)
Please help with the formula.
Vinay can be extracted by applying what is essentially the same process (replacing space for , however) on what is left once one more character (the ,) is added to the length of what is already known (Shukla) and used as the start point:
=LEFT(MID(D2,LEN(LEFT(D2,FIND(",",D2)))+1,LEN(D2)),FIND(" ",MID(D2,LEN(LEFT(D2,FIND(",",D2)))+1,LEN(D2))))

Get data between number two and three delimiter

I have a large list of people where each person has a line like this.
Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>
I want to extract the company name in a specific cell. In this example, it would be Microsoft, which is between the second and third delimiters (in this case, the delimiter is ", "). How can I do this?
Right now I'm using the split method (=SPLIT(A2, ", ",false)). But it gives me four different cells with information. I would like a command only to output the company in one cell. Can anyone help? I have tried different things, but I can't seem to find anything that works.
Maybe some regex can do it, but I'm not into regex.
Short answer
Use INDEX and SPLIT to get the value between two separators. Example
=INDEX(SPLIT(A1,", ",FALSE),2)
Explation
SPLIT returns an 1 x n array.
The first argument of INDEX could be a range or an array.
The second and third arguments of INDEX are optional. If the first parameter is an array that has only one row or one column, it will assume that the second argument corresponds to the larger side of the array, so there is no need to use the third argument.
A bit nasty, but this formula works, assuming data in cell D3.
=MID(D3,FIND(",",D3,FIND(",",D3)+1)+2,FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
Broken down, this is what it does:
Take the Mid point of D3 =MID(D3
starting two characters after the 2nd comma FIND(",",D3,FIND(",",D3)+1)+2
and the number of characters between the 2nd and 3rd comma, excluding spaces FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
I'll add my favourite ArratFormula, which you could use to expand list automatically without draggind formula down. Assumptions:
you have list with data in range "A1:A20"
all data have same sintax "...,Company Name, <..."
In this case you could use Arrayformula, pasted in cell B1:
=ArrayFormula(REGEXEXTRACT(A1:A20,", ([^,]+), <"))
If your data doest's always look like "...,Company Name, <..." or you wish to get different ounput, use this formula in cell B1:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),"skipping 4")
in this formula:
change 2 in offset 2 to 0, 1, 2, 3 to get name, position, company, link
in skipping 4 4 is a number of items.
Number of items can be counted by formula:
=len(A1)-len(SUBSTITUTE(A1,",",""))+1
and final formula is:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),
"skipping "&len(A1)-len(SUBSTITUTE(A1,",",""))+1)

extract number from cell in openoffice calc

I have a column in open office like this:
abc-23
abc-32
abc-1
Now, I need to get only the sum of the numbers 23, 32 and 1 using a formula and regular expressions in calc.
How do I do that?
I tried
=SUMIF(F7:F16,"([:digit:].)$")
But somehow this does not work.
Starting with LibreOffice 6.4, you can use the newly added REGEX function to generically extract all numbers from a cell / text using a regular expression:
=REGEX(A1;"[^[:digit:]]";"";"g")
Replace A1 with the cell-reference you want to extract numbers from.
Explanation of REGEX function arguments:
Arguments are separated by a semicolon ;
A1: Value to extract numbers from. Can be a cell-reference (like A1) or a quoted text value (like "123abc"). The following regular expression will be applied to this cell / text.
"[^[:digit:]]": Match every character which is not a decimal digit. See also list of regular expressions in LibreOffice
The outer square brackets [] encapsulate the list of characters to search for
^ adds a NOT, meaning that every character not included in the search list is matched
[:digit:] represents any decimal digit
"": replace matching characters (every non-digit) with nothing = remove them
"g": replace all matches (don't stop after the first non-digit character)
Unfortunately Libre-Office only supports regex in find/replace and in search.
If this is a once-only deal, I would copy column A to column to B, then use [data] [text to columns] in B and use the - as a separator, leaving you with all the text in column B and the numbers in column C.
Alternatively, you could use =Right(A1,find("-",A1,1)+1) in column B, then sum Column C.
I think that this is not exactly what do you want, but maybe it can help you or others.
It is all about substring (in Calc called [MID][1] function):
First: Choose your cell (for example with "abc-23" content).
Secondly: Enter the start length ("british" --> start length 4 = tish).
After that: To print all remaining text, you can use the [LEN][2] function (known as length) with your cell ("abc-23") in parameter.
Code now looks like this:
D15="abc-23"
=MID(D15; 5; LEN(D15))
And the output is: 23
When you edit numbers (in this example 23), no problem. However, if you change anything before (text "abc-"), the algorithm collapses because the start length is defined to "5".
Paste the string in a cell, open search and replace dialog (ctrl + f) extended search option mark regular expression search for ([\s,0-9])([^0-9\s])+ and replace it with $1
adjust regex to your needs
I didn't figure out how to do this in OpenOffice/LibreOffice directly. After frustrations in searching online and trying various formulas, I realised my sheet was a simple CSV format, so I opened it up in vim and used vim's built-in sed-like feature to find/replace the text in vim command mode:
:%s/abc-//g
This only worked for me because there were no other columns with this matching text. If there are other columns with the same text, then the solution would be a bit more complex.
If your sheet is not a CSV, you could copy the column out to a text file and use vim to find/replace, and then paste the data back into the spreadsheet. For me, this was a lot less frustrating than trying to figure this out in LibreOffice...
I won't bother with a solution without knowing if there really is interest, but, you could write a macro to do this. Extract all the numbers and then implement the sum by checking for contained numbers in the text.

Resources