Extract last two words in a cell using openoffice - openoffice.org

I need a formula to extract the last two words in a cell using OpenOffice. For example, if a cell contains "enjoy the rest of your day", I would like to extract "your day". I know how to extract the last word:
=RIGHT(A1;LEN(A1)-FIND("*";SUBSTITUTE(A1;" ";"*";LEN(A1)-LEN(SUBSTITUTE(A1;" ";"")))))
which results in "day". But I need a formula for the last two words.

SEARCH supports regular expressions, so use
=RIGHT(A1, LEN(A1) - SEARCH("[^ ]+ +[^ ]+$", A1) + 1)
When I use semicolons as below, Calc silently substitutes commas, but the OP reports success entering it this way:
=RIGHT(A1; LEN(A1) - SEARCH("[^ ]+ +[^ ]+$"; A1) + 1)
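To see why the formula works, the same arithmetic can be traced outside of Calc. This is only a sketch in JavaScript: `lastTwoWords` is a hypothetical helper name, and returning the whole text when no match is found is an assumption (Calc's SEARCH would return an error instead).

```javascript
// Hypothetical helper mirroring =RIGHT(A1; LEN(A1) - SEARCH(pattern; A1) + 1).
function lastTwoWords(text) {
  // Same pattern as in the formula: word, one or more spaces, word, end of text.
  const pattern = /[^ ]+ +[^ ]+$/;
  const start = text.search(pattern); // 0-based index, or -1 if no match
  if (start === -1) {
    return text; // assumption: fall back to the whole text
  }
  const position = start + 1;                 // SEARCH is 1-based
  const count = text.length - position + 1;   // LEN(A1) - SEARCH(...) + 1
  return text.slice(text.length - count);     // RIGHT(A1; count)
}

console.log(lastTwoWords("enjoy the rest of your day"));
```

The pattern is anchored with `$`, so a trailing space in the cell would prevent a match; trimming the text first may be worth considering.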

Related

how to extract 10 digit phone number from a cell in Google sheet

I have multiple rows in a Google Sheet. Each cell in a row contains email messages with lots of text and symbols, and also phone numbers. I need to extract these phone numbers, which are 10 digits with no spaces.
I tried REGEXEXTRACT, but it gives me only the first number:
=REGEXEXTRACT(E2,"\d+")
How do I extract all the phone numbers present in each cell?
Try:
=IFERROR(SPLIT(REGEXREPLACE(A1:A, "\D+", " "), " "))
player0's answer is already good. But if you only need the 10-digit numbers and not the other numbers in the cell (e.g. 123), you have to exclude the numbers that aren't 10 digits long.
I modified the other answer to filter those out with a second REGEXREPLACE before the SPLIT.
Formula:
=split(regexreplace(regexreplace(A1,"\D+", " "),
"^\d{1,9}\s|\s\d{1,9}\s|\s\d{1,9}$|\d{11,}"
, " "), " ")
Patterns to exclude:
We need to exclude any run of digits that isn't exactly 10 digits long. These are the possible patterns:
^\d{1,9}\s fewer than 10 digits at the start
\s\d{1,9}\s fewer than 10 digits in the middle
\s\d{1,9}$ fewer than 10 digits at the end
\d{11,} more than 10 digits
Joining them with | gives "^\d{1,9}\s|\s\d{1,9}\s|\s\d{1,9}$|\d{11,}"
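The two-stage replacement can be traced step by step outside of Sheets. The sketch below reuses the same two patterns in JavaScript; `tenDigitNumbers` is a hypothetical name, and the results stay as strings here, whereas SPLIT in Sheets may convert them to numbers.

```javascript
// Hypothetical helper mirroring split(regexreplace(regexreplace(A1, ...), ...), " ").
function tenDigitNumbers(cell) {
  // Stage 1: collapse every run of non-digits into a single space.
  const digitsOnly = cell.replace(/\D+/g, " ");
  // Stage 2: blank out digit runs that are shorter or longer than 10 digits.
  const filtered = digitsOnly.replace(
    /^\d{1,9}\s|\s\d{1,9}\s|\s\d{1,9}$|\d{11,}/g,
    " "
  );
  // Stage 3: split on spaces and drop the empty pieces.
  return filtered.split(" ").filter(Boolean);
}

console.log(
  tenDigitNumbers("123asd1234567890oia123joieqw9876543210asda123asd12345678910")
);
```

Because each exclusion pattern consumes the whitespace around the short number, two short numbers in a row share a separator and the second one can slip through: in this sketch, `tenDigitNumbers("12 34 1234567890")` still contains "34".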
Sample Cell Value:
123asd1234567890oia123joieqw9876543210asda123asd12345678910
EDIT:
The formula seems to have trouble with multiple occurrences when the string has spaces in between. If a script is an option, I recommend the one below.
Code:
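function get10DigitNums(string) {
  // Match a 10-digit run bounded by non-word characters or by the start/end of the string.
  var regex = /[^\w](\d{10})[^\w]|^(\d{10})[^\w]|[^\w](\d{10})$/g;
  var result = [];
  var m;
  do {
    m = regex.exec(string);
    if (m) {
      m.shift(); // drop the full match, keep only the capture groups
      result.push(m);
    }
  } while (m);
  // Flatten, drop the unmatched (undefined) groups, and return a single row.
  return [result.flat().filter(Boolean)];
}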
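As an alternative sketch (not the answer's method), lookarounds avoid consuming the characters around each number, so two numbers separated by a single space are both found. `extractTenDigitNumbers` is a hypothetical name, and this assumes a runtime with lookbehind support, such as the V8 engine.

```javascript
// Hypothetical alternative: match exactly 10 digits that are not
// preceded or followed by a word character (letter, digit, underscore).
function extractTenDigitNumbers(text) {
  const pattern = /(?<!\w)\d{10}(?!\w)/g;
  // matchAll yields one match object per occurrence; m[0] is the matched text.
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

console.log(extractTenDigitNumbers("call 1234567890 0987654321, or 123"));
```

Like the script above, this treats letters as part of the surrounding word, so a number glued to text (for example `asd1234567890`) is skipped.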
I needed to do something similar (extract all phone numbers, removing spaces, letters, and other symbols) and I think I found an easier solution:
=VALUE(REGEXREPLACE(F3;"[^[:digit:]]";""))
Example:
F3 = "32 3215-2263" ====> Result "3232152263"
Hope it helps.
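The same idea, strip everything that is not a digit and then convert to a number, looks like this as a JavaScript sketch. `digitsOnly` and `toNumber` are hypothetical names used only for illustration.

```javascript
// Hypothetical helpers mirroring VALUE(REGEXREPLACE(F3; "[^[:digit:]]"; "")).
function digitsOnly(text) {
  // Remove every character that is not 0-9.
  return text.replace(/[^0-9]/g, "");
}

function toNumber(text) {
  // The numeric conversion drops leading zeros, just as a number cell would.
  return Number(digitsOnly(text));
}

console.log(digitsOnly("32 3215-2263"));
console.log(toNumber("32 3215-2263"));
```

Note that this merges all digits in the cell into one number, so it suits cells holding a single phone number rather than several.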

Google Sheets Match Any Text

In Google Sheets, I have 2 columns (A and B) of text and I'm trying to set up conditional formatting to identify partial duplicates for when these 2 criteria are both met:
Text in A exactly matches with any other cell in A
and
Any of the individual words in cell B match any of the words in any other cell in B
So, if A2 = "target.com" and B2 = "Big Bonus"
I want it to flag any other cells where A = "target.com" and B = "Bonus Donuts" or "Biggest Exciting Bonus Ever" (because "Bonus" is identified as the duplicate) or "Exciting Big Day" (because "Big" is identified as the duplicate). I need it to be case-agnostic.
Nothing I have tried has even come close to working, so I won't include any of it here.
Sample Data: https://docs.google.com/spreadsheets/d/1DO-0uJRf6MOJ7fJiza5MAmFNIqpCwJ4WMH28j6wp22w/edit#gid=0
I've added a new sheet ("Erik Help") to your sample spreadsheet, with the following custom CF rule applied to the range A3:B ...
=AND($A3=$A$1, REGEXEXTRACT(LOWER($B3),SUBSTITUTE(TRIM(LOWER($B$1))," ","|")))
$A3=$A$1 should be self-explanatory.
For the rest, I used LOWER to make the comparisons case-insensitive. I applied TRIM in case any stray spaces were added around the B1 string, then replaced the remaining spaces with the pipe symbol, which REGEXEXTRACT interprets as OR.
If you don't want partial word matching (Big in Biggest), try this in the conditional custom formula:
=and($A3=$A$1, regexextract(" "&lower($B3)&" "," "&substitute(lower($B$1)," "," | ")&" "))
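The two matching modes can be compared side by side in a small JavaScript sketch. `sharesWord` and `escapeRegex` are hypothetical names; escaping the words is an addition that the formulas above do not perform.

```javascript
// Escape regex metacharacters so each word is matched literally (an addition
// not present in the formulas).
function escapeRegex(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: does `target` contain any word from `reference`?
// wholeWords = false mirrors the first rule, true mirrors the second.
function sharesWord(target, reference, wholeWords) {
  const words = reference.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return false;
  const alternation = words.map(escapeRegex).join("|"); // "big|bonus"
  if (!wholeWords) {
    return new RegExp(alternation).test(target.toLowerCase());
  }
  // Pad with spaces so every word in the target is surrounded by spaces.
  const padded = " " + target.toLowerCase() + " ";
  return new RegExp(" (" + alternation + ") ").test(padded);
}

console.log(sharesWord("Biggest Exciting Bonus Ever", "Big Bonus", false));
console.log(sharesWord("Biggest Day", "Big Bonus", true));
```

The whole-word mode only recognizes spaces as word boundaries, so a word followed by punctuation (for example "Bonus!") would not count as a match.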

Count occurrences of a specific word in Google Spreadsheet

I have some cells with text. I need to count the occurrences of a specific word (not a list) from those cells.
Example sheet: https://docs.google.com/spreadsheets/d/1WECDbepLtZNwNfUmjxfKlCbLJgjyUBB72yvWDMzDBB0/edit?usp=sharing
So far I found one way to count it in English by using SUBSTITUTE to replace all these words with "":
=(LEN(B1)-LEN(SUBSTITUTE(UPPER(B1),UPPER(A5),"")))/LEN(A5)
However, I don't know why but it doesn't work in German.
Edited:
I don't want to count "Hero" in "Heroes". However, I'd like to count "afk" in "AFK-Spiel" (German for example). Is it possible?
If you want to count occurrences of the word "Hero"
=COUNTIF(SPLIT(JOIN(" ", B1:B3), " -."&CHAR(10)), "Hero")
Where:
B1:B3: cells with text
"Hero": the word to count
Explanation
JOIN(" ", B1:B3): Concatenates all cells with text
SPLIT(..., " -."&CHAR(10)): Creates an array of words, splitting on space, hyphen, period, and newline
COUNTIF(..., "Hero"): Counts the array items equal to "Hero"
Example
if input text is:
Hero Hero-666 heroes heroic
➔ then formula will return 2.
If you want to count occurrences of the string "Hero"
(even when nested in another word, e.g. "Heroes")
=COUNTA(SPLIT(UPPER(JOIN(" ",B1:B3)), "HERO", false, false))-1
Where:
B1:B3: cells with text
"HERO": the string to count
Explanation
JOIN(" ", B1:B3): Concatenates all cells with text
UPPER(...): Converts the text to upper case
SPLIT(..., "HERO"): Splits on each occurrence of the string
COUNTA(...)-1: Counts how many splits were made
Example
if input text is:
Hero Hero-666 heroes heroic
➔ then formula will return 4.
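Both counting strategies can be checked against the example text with a short JavaScript sketch. `countWord` and `countSubstring` are hypothetical names; the delimiters are the same ones used in the SPLIT call (space, hyphen, period, newline).

```javascript
// Count whole words: split on the delimiters, then compare case-insensitively
// (COUNTIF is case-insensitive as well).
function countWord(text, word) {
  const tokens = text.split(/[ \-.\n]+/);
  return tokens.filter((t) => t.toLowerCase() === word.toLowerCase()).length;
}

// Count substrings: splitting on the needle yields one more piece than
// there are occurrences.
function countSubstring(text, needle) {
  if (needle === "") return 0; // guard: an empty needle has no meaningful count
  return text.toUpperCase().split(needle.toUpperCase()).length - 1;
}

const sampleText = "Hero Hero-666 heroes heroic";
console.log(countWord(sampleText, "Hero"));      // whole words only
console.log(countSubstring(sampleText, "Hero")); // includes heroes, heroic
```

With the hyphen as a delimiter, `countWord("AFK-Spiel", "afk")` counts the word once, which matches the requirement in the edited question.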
In your sheet you mention that the count should be 14.
Considering that, I believe you are looking for a solution that also counts words like heroes or Hero.
If you want to include variations of hero, like Hero or Heroes you can use the following:
Case insensitive for any language formula:
=COUNTIF(SPLIT(CONCATENATE(B1:B3), " "), "*heRO*")
You can even have *heRO* placed in a cell like A7 and use
=COUNTIF(SPLIT(CONCATENATE(B1:B3), " "), A7)
If you want just the word Hero, remove the asterisks * around it.
It also works for any language (including German).
try:
=ARRAYFORMULA(COUNTA(IFERROR(SPLIT(QUERY(SUBSTITUTE(
UPPER(B1:B3), UPPER(A5), "♦"),,99^99), "♦")))-1)
and for German:
=ARRAYFORMULA(COUNTA(IFERROR(SPLIT(QUERY(SUBSTITUTE(
UPPER(C1:C3), "HELD", "♦"),,99^99), "♦")))-1)

Google Spreadsheet - Not calculating numbers with space

I am trying to do a calculation of two cells, where one of them contains a number like this: 1 250.
If the number is written like that, and not as 1250, I cannot get the spreadsheet to do any calculations with it. Google Sheets no longer treats it as a valid number.
Why not just type 1250 instead of 1 250?
Well, I am getting the cell values from an HTML import function.
Any good advice on how to get around this?
Try something like this:
=Substitute(A2," ","")
In this formula, A2 is a cell. You are finding every space in that cell and replacing it with an empty string.
Use the SUBSTITUTE function to transform your number before using it in a formula. For instance, let's say you wanted to multiply F8 by 2, but F8 may contain spaces. You would then do:
=substitute(F8, " ","") * 2
SUBSTITUTE didn't work for me. But these steps did:
Select one or several columns of data
Press Ctrl + H to get the "Find and Replace" dialog
Make sure "Search using regular expressions" is checked ✅
Enter \s to the "Find" field, and leave "Replace with" empty
Click on the "Replace all" button
Explanation:
\s is a regular expression matching any kind of whitespace character. There may have been some other kind of whitespace in my spreadsheet, not a regular " " (space) character, and that's why regex worked for me, while SUBSTITUTE() didn't.
I've also tried the REGEXREPLACE(A2, "\s", "") function, but it didn't seem to do anything in my case.
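The difference between a regular space and other whitespace can be reproduced in a JavaScript sketch. It assumes the imported value contains a non-breaking space (U+00A0), which is only one possible culprit; `parseSpacedNumber` is a hypothetical name. In JavaScript `\s` covers U+00A0, but other regex engines may define `\s` more narrowly.

```javascript
// Hypothetical helper: remove every whitespace character, then convert.
function parseSpacedNumber(raw) {
  return Number(raw.replace(/\s/g, ""));
}

const regularSpace = "1 250";
const nonBreaking = "1\u00A0250"; // looks the same on screen

// Replacing only the literal " " character leaves the non-breaking space in place.
console.log(nonBreaking.split(" ").join("").length); // still 5 characters
console.log(parseSpacedNumber(regularSpace));
console.log(parseSpacedNumber(nonBreaking));
```

If a regex-based replacement has no effect either, checking the character code of the separator is one way to find out which pattern is needed.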

extract number from cell in openoffice calc

I have a column in OpenOffice like this:
abc-23
abc-32
abc-1
Now, I need to get only the sum of the numbers 23, 32 and 1 using a formula and regular expressions in calc.
How do I do that?
I tried
=SUMIF(F7:F16,"([:digit:].)$")
But somehow this does not work.
Starting with LibreOffice 6.4, you can use the newly added REGEX function to generically extract all numbers from a cell / text using a regular expression:
=REGEX(A1;"[^[:digit:]]";"";"g")
Replace A1 with the cell-reference you want to extract numbers from.
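Applied to the whole column, the digit-stripping approach leads to the requested sum. The following is a JavaScript sketch of that logic, not Calc syntax; `sumEmbeddedNumbers` is a hypothetical name.

```javascript
// Hypothetical helper: strip non-digits from each cell, then add up the results.
function sumEmbeddedNumbers(cells) {
  return cells
    .map((cell) => cell.replace(/[^0-9]/g, ""))   // same idea as [^[:digit:]] with "g"
    .filter((digits) => digits !== "")            // skip cells without any digits
    .reduce((total, digits) => total + Number(digits), 0);
}

console.log(sumEmbeddedNumbers(["abc-23", "abc-32", "abc-1"]));
```

Removing all non-digits concatenates whatever digits remain, so a cell like "a1b2" would contribute 12; this is fine for values shaped like abc-23 but worth keeping in mind for messier data.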
Explanation of REGEX function arguments:
Arguments are separated by a semicolon ;
A1: Value to extract numbers from. Can be a cell-reference (like A1) or a quoted text value (like "123abc"). The following regular expression will be applied to this cell / text.
"[^[:digit:]]": Match every character which is not a decimal digit. See also list of regular expressions in LibreOffice
The outer square brackets [] encapsulate the list of characters to search for
^ adds a NOT, meaning that every character not included in the search list is matched
[:digit:] represents any decimal digit
"": replace matching characters (every non-digit) with nothing = remove them
"g": replace all matches (don't stop after the first non-digit character)
Unfortunately, LibreOffice only supports regex in find/replace and in search.
If this is a one-off, I would copy column A to column B, then use Data > Text to Columns on B with - as the separator, leaving you with the text in column B and the numbers in column C.
Alternatively, you could use =VALUE(RIGHT(A1,LEN(A1)-FIND("-",A1))) in column B, then sum column B.
I think this is not exactly what you want, but maybe it can help you or others.
It is all about substrings (the MID function in Calc):
First: Choose your cell (for example, one containing "abc-23").
Second: Enter the start position (for "british", start position 4 gives "tish").
After that: To return all the remaining text, pass the LEN function (length) of your cell ("abc-23") as the length parameter.
Code now looks like this:
D15="abc-23"
=MID(D15; 5; LEN(D15))
And the output is: 23
Editing the number (23 in this example) is no problem. However, if you change the text before it ("abc-"), the formula breaks because the start position is hard-coded to 5.
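The fragility of a hard-coded start position, and one way around it, can be shown in a JavaScript sketch. `fromFixedStart` and `afterHyphen` are hypothetical names; locating the hyphen is an alternative technique, not part of the answer above.

```javascript
// Mirrors MID(cell; start; LEN(cell)) with a 1-based start position.
function fromFixedStart(cell, start) {
  return cell.slice(start - 1);
}

// Alternative: locate the hyphen and take everything after it.
function afterHyphen(cell) {
  const index = cell.indexOf("-");
  // Assumption: return the whole cell when there is no hyphen.
  return index === -1 ? cell : cell.slice(index + 1);
}

console.log(fromFixedStart("abc-23", 5));   // works for a 3-letter prefix
console.log(fromFixedStart("abcdef-7", 5)); // breaks for a longer prefix
console.log(afterHyphen("abcdef-7"));       // independent of prefix length
```

In Calc, the hyphen position could be obtained with FIND in a similar way, at the cost of a slightly longer formula.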
Paste the string into a cell and open the search and replace dialog (Ctrl + F). Under the extended search options, tick regular expressions, then search for ([\s,0-9])([^0-9\s])+ and replace it with $1.
Adjust the regex to your needs.
I didn't figure out how to do this in OpenOffice/LibreOffice directly. After frustrations in searching online and trying various formulas, I realised my sheet was a simple CSV format, so I opened it up in vim and used vim's built-in sed-like feature to find/replace the text in vim command mode:
:%s/abc-//g
This only worked for me because there were no other columns with this matching text. If there are other columns with the same text, then the solution would be a bit more complex.
If your sheet is not a CSV, you could copy the column out to a text file and use vim to find/replace, and then paste the data back into the spreadsheet. For me, this was a lot less frustrating than trying to figure this out in LibreOffice...
I won't bother with a solution without knowing if there really is interest, but, you could write a macro to do this. Extract all the numbers and then implement the sum by checking for contained numbers in the text.
