Split words contained in string based on uppercase - lua

I have a string with no spaces; each word is marked by an uppercase letter at its beginning. What would be the best way to extract the words?
Here's what I've got:
str = "TheseAreAFewWordsAndThis-one-contains-wildcards"
Desired output would be:
These
Are
A
Few
Words
And
This-one-contains-wildcards
I don't need to treat any magic characters specially; they can stay in the string without any problem.

for wrd in str:gmatch("%u%U*") do print(wrd) end
"%u%U*" is a string pattern that matches a single uppercase letter followed by any number of characters that are not uppercase letters.
Please read https://www.lua.org/manual/5.4/manual.html#6.4.1

Related

How do I remove all special characters, punctuation and whitespaces from a string in Lua?

In Lua (I can only find examples in other languages), how do I remove all punctuation, special characters and whitespace from a string? So, for example, s t!r#i%p^(p,e"d would become stripped?
In Lua patterns, the character class %p represents all punctuation characters, the character class %c represents all control characters, and the character class %s represents all whitespace characters. So you can represent all punctuation characters, all control characters, and all whitespace characters with the set [%p%c%s].
To remove these characters from a string, you can use string.gsub. For a string str, the code would be the following:
str = str:gsub('[%p%c%s]', '')
(Note that this is essentially the same as Egor's code snippet above.)
If you remove all special chars, whitespace, … all that's left is letters and numbers, right? So if str is your string,
str:gsub( "%W", "" )
will be the cleaned string.
%w matches all alphanumeric characters; its upper-case form, %W, matches all non-alphanumeric characters. If that's not exactly what you want to match, you can build your own character class. For example, if you wanted to include _ as an acceptable character, you could use [^%w_].
This works for me
m=your_string:gsub('%W','')

Ultraedit regex to remove all words which contains number

I am trying to write an UltraEdit regex that removes all words containing a number from a txt file.
For example:
test
test2
t2est
te2st
and...
get only
test
A case-insensitive search with Perl regular expression search string \<[a-z]+\d\w*\> finds entire words containing at least 1 digit.
\< ... beginning of a word. \b for any word boundary could be also used.
[a-z]+ ... any letter 1 or more times. You can put additional characters into the square brackets, like ÄÖÜäöüß, if they are used in the language of the text file.
\d ... any digit, i.e. 0-9.
\w* ... any word character 0 or more times. Any word character means all word characters according to Unicode table which includes language dependent word characters, all digits and the underscore.
\> ... end of a word. \b for any word boundary could be also used.
A case-insensitive search with UltraEdit regular expression search string [a-z]+[0-9][a-z0-9_]++ also finds entire words containing at least 1 digit, provided the find option Match whole word is checked as well.
[a-z]+ ... any letter 1 or more times. You can put additional characters used in the language of the text file into the square brackets.
[0-9] ... any digit.
[a-z0-9_]++ ... any letter, digit or underscore 0 or more times.
The UltraEdit regexp search string [a-z]+[0-9][a-z0-9_]++ in Unix/Perl syntax would be [a-z]+[0-9][a-z0-9_]* which could be also used with find option Match whole word checked instead of the Perl regexp search.

lua string split - how to split a string and get a substring starting from capital letter

I have come across functions that split a string in Lua, but my requirement is to split a string when it starts with a lowercase letter. If it does, in my case the string is bound to contain a part starting with a capital letter, like:
mdmMSH
In this case I would like to split where the string reaches M and add MSH to a table.
How can I do this?
Grab everything after the first uppercase letter in the string:
sub = s:match('[A-Z].*')
Per Egor's comment:
sub = s:match'%u.*'

Splitting strings using Ruby ignoring certain characters

I'm trying to split a string and count the number of words using Ruby, but I want to ignore special characters.
For example, in the string "Hello, my name is Hugo ..." I'm splitting by spaces, but the trailing ... shouldn't count because it isn't a word.
I'm using string.inner_text.split(' ').length. How can I specify that special characters (such as ... ? ! etc.) when separated from the text by spaces are not counted?
Thank you to everyone,
Kind Regards,
Hugo
"Hello, my name is não ...".scan /[^*!#%\^\s\.]+/
# => ["Hello,", "my", "name", "is", "não"]
/[^*!#%\^\s\.]+/ matches runs of characters other than *, !, #, %, ^, whitespace and the dot. You can add more characters to this list if they should not be matched either.
This is part answer, part response to @Neo's answer: why not use the proper tools for the job?
http://www.ruby-doc.org/core-1.9.3/Regexp.html says:
POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.
/[[:alnum:]]/ - Alphabetic and numeric character
/[[:alpha:]]/ - Alphabetic character
...
Ruby also supports the following non-POSIX character classes:
/[[:word:]]/ - A character in one of the following Unicode general categories Letter, Mark, Number, Connector_Punctuation
If you want words, use str.scan /[[:word:]]+/
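A minimal sketch of how the [[:word:]] approach could be applied to the original word-counting problem. The helper name count_words is hypothetical and only used for illustration; the input string is the one from the question.

```ruby
# Hypothetical helper: counts the words in a string by scanning for
# runs of word characters, so punctuation-only tokens such as "..."
# are never counted.
def count_words(text)
  text.scan(/[[:word:]]+/).length
end

sample = "Hello, my name is Hugo ..."

p sample.scan(/[[:word:]]+/)  # prints ["Hello", "my", "name", "is", "Hugo"]
p count_words(sample)         # prints 5
p sample.split(' ').length    # prints 6, because "..." is counted as well
```

Unlike the negated character class in the earlier answer, this also drops punctuation attached to a word, so "Hello," is returned as "Hello".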

string format checking (with partly random string)

I would like to use regular expression to check if my string have the format like following:
mc_834faisd88979asdfas8897asff8790ds_oa_ids
mc_834fappsd58979asdfas8897asdf879ds_oa_ids
mc_834faispd8fs9asaas4897asdsaf879ds_oa_ids
mc_834faisd8dfa979asdfaspo97asf879ds_dv_ids
mc_834faisd111979asdfas88mp7asf879ds_dv_ids
mc_834fais00979asdfas8897asf87ggg9ds_dv_ids
The format is like mc_<random string>_oa_ids or mc_<random string>_dv_ids . How can I check if my string is in either of these two formats? And please explain the regular expression. thank you.
That is, a string that starts with mc_, ends with _oa_ids or _dv_ids, and has some random string in the middle.
P.S. The random string consists of alphabetic letters and numbers.
What I tried(I have no clue how to check the random string):
/^mc_834faisd88979asdfas8897asff8790ds$_os_ids/
Try this.
^mc_[0-9a-z]+_(dv|oa)_ids$
^ matches at the start of the line the regex pattern is applied to.
[0-9a-z] matches digits and lowercase letters.
+ means that there should be one or more chars in this set
(dv|oa) matches dv or oa
$ matches at the end of the string the regex pattern is applied to. It also matches before the very last line break if the string ends with a line break.
Give /\Amc_\w*_(oa|dv)_ids\z/ a try. \A is the beginning of the string, \z the end. \w* is zero or more letters, digits and underscores, and (oa|dv) is either oa or dv.
A nice and simple way to test Ruby Regexps is Rubular; you might have a look at it.
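As a rough sketch, the two answers above could be combined in Ruby: the [0-9a-z]+ class from the first (so the random part must be non-empty and cannot contain underscores) and the \A and \z anchors from the second (in Ruby, ^ and $ match at line boundaries, so they would accept a matching line inside a multi-line string). The names VALID_ID and valid_id? are hypothetical, and the sketch assumes the random part is lowercase letters and digits only, as in the samples.

```ruby
# Hypothetical pattern: mc_, then one or more lowercase letters or
# digits, then _oa_ids or _dv_ids, anchored to the whole string.
VALID_ID = /\Amc_[0-9a-z]+_(?:oa|dv)_ids\z/

# Hypothetical helper: true when the whole string has the expected format.
def valid_id?(id)
  VALID_ID.match?(id)
end

p valid_id?("mc_834faisd88979asdfas8897asff8790ds_oa_ids")  # prints true
p valid_id?("mc_834faisd111979asdfas88mp7asf879ds_dv_ids")  # prints true
p valid_id?("mc__oa_ids")                                   # prints false (empty random part)
p valid_id?("mc_abc123_xx_ids")                             # prints false (unknown suffix)
p valid_id?("junk\nmc_abc123_oa_ids")                       # prints false (\A rejects the extra line)
```

Regexp#match? requires Ruby 2.4 or later; on older versions, !!(id =~ VALID_ID) gives the same boolean.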
This should work
/mc_834([a-z,0-9]*)_(oa|dv)_ids/g
Example: http://regexr.com?2v9q7
