I have a list of numbers in a few formats that may or may not include a dot and a comma. The numbers are locked in a string. For example:
hello 1,000 goodbye
hola 2,000.12 ciao
Hallo 3000.00 Auf Wiedersehen
How can I extract the numbers?
I don't care whether the comma is kept, but the dot is obviously important.
I need the regular expression to work in REGEXEXTRACT (and the rest of the REGEX formulas).
The output should be:
1000
2000.12
3000.00
Supposing that your raw data is in A2:A, use this in B2 (or the second cell) of an otherwise empty column:
=ArrayFormula(IF(A2:A="",,IFERROR(VALUE(REGEXEXTRACT(A2:A,"\d[\d,\.]*\d")))))
The REGEX portion reads, in plain English, "Extract any portion that starts with a digit followed by any number of digits, commas or periods (or none of these) and ends with a digit." Note that this requires at least two digits, so a lone single-digit number will not be extracted.
You will likely want to apply Format > Number > Currency to the results column.
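For anyone who wants to test the pattern outside Sheets, here is a minimal Ruby sketch of the same idea. It assumes the pattern behaves the same way in Ruby's regex engine as in Sheets (it uses only basic syntax), and `extract_number` is a hypothetical helper name, not a Sheets function. Instead of VALUE, it simply strips the commas and returns a string.

```ruby
# Same pattern as in the formula: a digit, then any run of digits,
# commas or periods, ending on a digit.
NUMBER_PATTERN = /\d[\d,.]*\d/

# Hypothetical helper: returns the first number found, as a string with
# the commas removed, or nil when nothing matches.
def extract_number(text)
  match = text[NUMBER_PATTERN]
  match && match.delete(",")
end

samples = [
  "hello 1,000 goodbye",
  "hola 2,000.12 ciao",
  "Hallo 3000.00 Auf Wiedersehen"
]

samples.each { |sample| puts extract_number(sample) }
```

Because the pattern needs a digit at both ends, `extract_number("item 7")` returns nil in this sketch.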
I'm making a list for buying groceries in Google Sheets and have the following value in cell B4.
0.95 - Lemon Juice
2.49 - Pringle Chips
1.29 - Baby Carrots
9.50 - Chicken Kebab
What I'm trying to do is split using the dash character and combine the costs (0.95+2.49+1.29+9.50).
I've tried to use Index(SPLIT(B22,"-"), 7) and SPLIT(B22,"-") but I don't know how to use only numbers from the split string.
Does someone know how to do this? Here's a sample sheet.
Answer
The following formula should produce the result you desire:
=SUM(ARRAYFORMULA(VALUE(REGEXEXTRACT(SPLIT(B4,CHAR(10)),"(.*)-"))))
Explanation
The first thing to do is to split the entry in B4 into its component parts. This is done by using the =SPLIT function, which takes the text in B4 and returns a separate result every time it encounters a specific delimiter. In this case, that is =CHAR(10), the newline character.
Next, all non-number information needs to be removed. This is relatively easy in your sample data because the numbers always appear to the left of a dash. =REGEXEXTRACT uses a regular expression to only return the text to the left of the dash.
Before the numbers can be added together, however, they must be converted to be in a number format. The =VALUE function is used to convert each result from a text string containing a number to an actual number.
All of this is wrapped in an =ARRAYFORMULA so that =VALUE and =REGEXEXTRACT parse each returned value from =SPLIT, rather than just the first.
Finally, all results are added together using =SUM.
Functions used:
=CHAR
=SPLIT
=REGEXEXTRACT
=VALUE
=ARRAYFORMULA
=SUM
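The same pipeline can be sketched outside Sheets. The Ruby below is only an illustration of the steps described above, under the assumption that every line has the price to the left of a dash. `CELL_B4` and `total_cost` are made-up names standing in for the cell and the formula.

```ruby
# Stand-in for the contents of cell B4 (one item per line).
CELL_B4 = "0.95 - Lemon Juice\n2.49 - Pringle Chips\n1.29 - Baby Carrots\n9.50 - Chicken Kebab"

# Hypothetical helper mirroring SPLIT -> REGEXEXTRACT -> VALUE -> SUM.
def total_cost(cell)
  lines  = cell.split("\n")                            # like SPLIT(B4, CHAR(10))
  texts  = lines.map { |line| line[/(.*)-/, 1] }       # like REGEXEXTRACT(..., "(.*)-")
  prices = texts.compact.map { |text| text.to_f }      # like VALUE; lines without a dash are skipped
  prices.sum.round(2)                                  # like SUM; round(2) hides floating-point noise
end

puts total_cost(CELL_B4)
```

Mapping over the array plays the role that ARRAYFORMULA plays in the sheet: each piece returned by the split is processed, not just the first.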
First, you can wrap each number in commas with the formula below:
REGEXREPLACE(B4,"([0-9\.]+)",",$1,")
Then split the result on the commas.
SPLIT(A8, ",")
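To see what this wrap-then-split approach produces, here is a rough Ruby sketch of the two steps. The helper names are hypothetical. It also shows that the split result still contains the text pieces, so a filtering step is needed before summing, and that item names containing digits, periods or commas would confuse it.

```ruby
# Step 1 and 2: wrap every run of digits/periods in commas, then split.
def wrap_and_split(text)
  wrapped = text.gsub(/([0-9.]+)/, ',\1,')   # like REGEXREPLACE(B4, "([0-9\.]+)", ",$1,")
  wrapped.split(",").reject(&:empty?)        # like SPLIT(..., ","), which drops empty pieces by default
end

# Extra step (not part of the formulas above): keep only the numeric pieces.
def numeric_pieces(text)
  wrap_and_split(text).select { |piece| piece.match?(/\A[0-9.]+\z/) }
end

cell = "0.95 - Lemon Juice\n2.49 - Pringle Chips"
p wrap_and_split(cell)
p numeric_pieces(cell)
```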
Try the formula below (see your sheet):
=SUM(ArrayFormula(--REGEXEXTRACT(SPLIT(B4,CHAR(10)),"-*\d*\.?\d+")))
I have a word list of over 10,000 words, but this is just a sample:
'Tis midnight
sev'n words spoke
th'Immortal night
A wonder-working pow'r
Wondrous deliv'rer to me
I want to delete all words that contain apostrophes so the list should look like this:
midnight
words spoke
night
A wonder-working
Wondrous to me
How can I do this using Sublime Text so it finds apostrophes and smart apostrophes (’)?
You could use a character class ['’] to match both variants of the apostrophe, with \S* on either side to match zero or more non-whitespace characters before and after it, followed by \h* to match optional horizontal whitespace.
\S*['’]\S*\h*
Regex demo
A slightly more optimized version replaces the first \S*, which causes backtracking, with a negated character class [^\s'’]* that matches up to the first apostrophe.
[^\s'’]*['’]\S*\h*
Regex demo
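If you would rather script this than do it in the editor, a minimal Ruby sketch of the second pattern follows. Two things differ from the Sublime Text version and are my own assumptions: in Ruby's regex engine \h means a hexadecimal digit, so [ \t] stands in for horizontal whitespace, and the trailing rstrip is an addition to tidy lines whose last word was removed. `drop_apostrophe_words` is a hypothetical helper name.

```ruby
# Negated-class variant of the pattern; [ \t] replaces \h because \h
# matches a hex digit in Ruby, not horizontal whitespace.
APOSTROPHE_WORD = /[^\s'’]*['’]\S*[ \t]*/

# Removes every word containing a straight or smart apostrophe.
def drop_apostrophe_words(line)
  line.gsub(APOSTROPHE_WORD, "").rstrip
end

[
  "'Tis midnight",
  "sev'n words spoke",
  "th'Immortal night",
  "A wonder-working pow'r",
  "Wondrous deliv'rer to me"
].each { |line| puts drop_apostrophe_words(line) }
```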
I have some strings containing a sentence, and I need to divide each one into substrings of at most 40 characters.
But I don't want to split the sentence in the middle of a word.
I tried the .gsub function: it returns at most 40 characters and avoids cutting the string in the middle of a word, but it returns only the first occurrence.
sentence[0..40].gsub(/\s\w+$/,'')
I tried split, but I can only select the first 40 characters, and it splits in the middle of a word...
sentence.split(...){40}
My string is "Sure, we will show ourselves only when we know the east door has been opened.".
The output I want is
["Sure, we will show ourselves only when we","know the east door has been opened."]
Do you have a solution? Thanks
Your first attempt:
sentence[0..40].gsub(/\s\w+$/,'')
almost works, but it has one fatal flaw. You are splitting on the number of characters before cutting off the last word. This means you have no way of knowing whether the bit being trimmed off was a whole word, or a partial word.
Because of this, your code will always cut off the last word.
I would solve the problem as follows:
sentence[/\A.{0,39}[a-z]\b/mi]
\A is an anchor to fix the regex to the start of the string.
.{0,39}[a-z] matches on 1 to 40 characters, where the last character must be a letter. This is to prevent the last selected character from being punctuation or space. (Is that desired behaviour? Your question didn't really specify. Feel free to tweak/remove that [a-z] part, e.g. [a-z.] to match a full stop, if desired.)
\b is a word boundary look-around. It is a zero-width matcher, on beginning/end of words.
/mi modifiers: i makes the match case insensitive (so [a-z] also covers A-Z), and m, in Ruby, lets . match newline characters as well.
One very minor note is that because this regex is matching 1 to 40 characters (rather than zero), it is possible to get a nil result. (Although this is seemingly very unlikely, since you'd need a 1-word, 41+ letter string!!) To account for this edge case, call .to_s on the result if needed.
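As a quick check of the points above, here is a small sketch that wraps the expression in a helper (`first_chunk` is just an illustrative name) and applies .to_s so the nil edge case comes back as an empty string.

```ruby
SENTENCE = "Sure, we will show ourselves only when we know the east door has been opened."

# Returns the longest prefix of at most 40 characters that ends on a
# letter at a word boundary, or "" when no such prefix exists.
def first_chunk(sentence)
  sentence[/\A.{0,39}[a-z]\b/mi].to_s
end

puts first_chunk(SENTENCE)        # the prefix ends after a whole word
puts first_chunk("a" * 41).empty? # one 41-letter word: no match, so ""
```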
Update: Thank you for the improved edit to your question, providing a concrete example of an input/result. This makes it much clearer what you are asking for, as the original post was somewhat ambiguous.
You could solve this with something like the following:
sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)
String#scan returns an array of strings that match the pattern - so you can then re-join these strings to reconstruct the original.
Again, I have added a few more characters (!?,;) to the list of "final characters in the substring". Feel free to tweak this as desired.
(?:\b|$) means "either a word boundary, or the end of the line". This fixes the issue of the result not including the final . in the substrings. Note that I have used a non-capture group (?:) to prevent the result of scan from changing.
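Here is a short sketch of the scan version on the sentence from the question, with `chunks` as an illustrative helper name. Note that each piece after the first keeps its leading space, which is what lets you join the pieces back into the original. Strip them if you want clean lines.

```ruby
SENTENCE = "Sure, we will show ourselves only when we know the east door has been opened."

# Splits a sentence into pieces of at most 40 characters without
# cutting words in half.
def chunks(sentence)
  sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)
end

parts = chunks(SENTENCE)
p parts                       # pieces, with leading spaces preserved
p parts.map(&:strip)          # same pieces without surrounding whitespace
p parts.join == SENTENCE      # joining the raw pieces restores the input
```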
As you can see from the title, I would like to write a regular expression pattern to find a string that consists of digits separated by a comma every three digits. The length of the string can vary.
I am still pretty new to this regular expression thingy, so can anyone help me with that? Thanks a lot in advance.
P.S.
Could anyone also suggest some good resources, like websites, books, etc., for learning regular expressions?
This regex should match that:
\d{1,3}(?:,\d{3})*
If you want to avoid matching a substring of an ill-formed number, you could use:
(?:\A|[^,\d])(\d{1,3}(?:,\d{3})*)(?:\z|[^,\d])
Explanation of the first regex
\d{1,3} 1 to 3 consecutive numerals
,\d{3} A comma followed by 3 consecutive numerals
(?:,\d{3})* Zero or more repetition of a non-capturing group of a comma followed by 3 consecutive numerals
Explanation of the second regex
(?:\A|[^,\d]) A non-capturing group of either the beginning of the string, or anything other than comma or numeral
(\d{1,3}(?:,\d{3})*) A capturing group of 1 to 3 consecutive numerals followed by zero or more repetition of a non-capturing group of a comma followed by 3 consecutive numerals
(?:\z|[^,\d]) A non-capturing group of either the end of the string, or anything other than comma or numeral
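To make the difference between the two patterns concrete, here is a small Ruby sketch. The question does not say which language is in use, so Ruby is an assumption here (the \A and \z anchors suggest it), and `loose_match` and `strict_match` are illustrative names.

```ruby
LOOSE  = /\d{1,3}(?:,\d{3})*/
STRICT = /(?:\A|[^,\d])(\d{1,3}(?:,\d{3})*)(?:\z|[^,\d])/

# First regex: happily matches part of an ill-formed number.
def loose_match(text)
  text[LOOSE]
end

# Second regex: returns capture group 1, or nil when the digits around
# the candidate make it ill-formed.
def strict_match(text)
  text[STRICT, 1]
end

p loose_match("12345")                    # matches only the first three digits
p strict_match("12345")                   # no well-formed number here
p strict_match("total: 1,234,567 units")  # the whole grouped number
```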
Try http://regexlib.com for good examples and links to tools to help you get up to speed with RegEx
Also try this regex tester app http://www.ultrapico.com/Expresso.htm
And another tool I've used before here http://osherove.com/tools
I'm trying to split a string and count the number of words using Ruby, but I want to ignore special characters.
For example, in the string "Hello, my name is Hugo ..." I'm splitting by spaces, but the last ... shouldn't count because it isn't a word.
I'm using string.inner_text.split(' ').length. How can I specify that special characters (such as ... ? ! etc.), when separated from the text by spaces, are not counted?
Thank you to everyone,
Kind Regards,
Hugo
"Hello, my name is não ...".scan /[^*!#%\^\s\.]+/
# => ["Hello,", "my", "name", "is", "não"]
/[^*!#%\^\s\.]+/ matches runs of characters other than *, !, #, %, ^, whitespace and the period. You can add more characters to this list if they should not be matched.
This is part answer, part response to @Neo's answer: why not use the proper tools for the job?
http://www.ruby-doc.org/core-1.9.3/Regexp.html says:
POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.
/[[:alnum:]]/ - Alphabetic and numeric character
/[[:alpha:]]/ - Alphabetic character
...
Ruby also supports the following non-POSIX character classes:
/[[:word:]]/ - A character in one of the following Unicode general categories Letter, Mark, Number, Connector_Punctuation
If you want words, use str.scan /[[:word:]]+/
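A minimal sketch of the word count built on that, with `count_words` as an illustrative helper name. One caveat to be aware of: the apostrophe is not part of [[:word:]], so a contraction like "don't" counts as two words with this approach.

```ruby
# Counts runs of word characters (letters, marks, numbers, connector
# punctuation), so stray punctuation such as "..." is ignored.
def count_words(text)
  text.scan(/[[:word:]]+/).length
end

p count_words("Hello, my name is Hugo ...")
p "Hello, my name is não ...".scan(/[[:word:]]+/)  # non-ASCII letters are kept
p "don't stop".scan(/[[:word:]]+/)                 # the apostrophe splits the word
```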