About Regular Expression - automata

Given the set {a,b}, the language L7, "all words that begin with an a and end with a b", can be defined by
a(a+b)*b
What is the meaning of "+"?
And how do I solve this problem?

a # first letter is always 'a'
(a+b)* # zero or more letters, each either 'a' or 'b' [one letter at a time]
b # last letter is always 'b'
Here + means "or" (union): (a+b) stands for a single letter that is either a or b. Consequently the language contains, among others, the following words:
ab
abb
abbb
aaab
abbbb
aaaab
abbbbb
aaaaab
.....
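In most programming-language regex dialects the union written + above is spelled | or as a character class, while + itself means "one or more". A minimal Ruby sketch, assuming that translation (PATTERN and words_up_to are names made up for this illustration), enumerates the short words of L7:

```ruby
# The textbook expression a(a+b)*b, translated to Ruby syntax.
# [ab] plays the role of (a+b): exactly one letter, either 'a' or 'b'.
PATTERN = /\Aa[ab]*b\z/

# Hypothetical helper: every word over {a, b} of length 1..max_len, shortest first.
def words_up_to(max_len)
  (1..max_len).flat_map do |len|
    %w[a b].repeated_permutation(len).map(&:join)
  end
end

matches = words_up_to(4).select { |w| w.match?(PATTERN) }
p matches
#=> ["ab", "aab", "abb", "aaab", "aabb", "abab", "abbb"]
```

Words such as aab and abab belong to the language too: the middle part may mix a's and b's freely.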

Related

Rails string split every other "."

I have a bunch of sentences that I want to break into an array. Right now, I'm splitting every time \n appears in the string.
@chapters = @script.split('\n')
What I'd like to do is .split every OTHER "." in the string. Is that possible in Ruby?
You could do it with a regex, but I'd start with a simple approach: just split on periods, then join pairs of substrings:
s = "foo. bar foo. foo bar. boo far baz. bizzle"
s.split(".").each_slice(2).map {|p| p.join "." }
# => ["foo. bar foo", " foo bar. boo far baz", " bizzle"]
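The same split-then-join idea extends to every n-th period. A minimal sketch, assuming a helper named split_every_nth (a made-up name, not part of Ruby):

```ruby
# Hypothetical helper: split on `sep`, then glue every `n` pieces back together.
def split_every_nth(str, sep, n)
  str.split(sep).each_slice(n).map { |parts| parts.join(sep) }
end

s = "foo. bar foo. foo bar. boo far baz. bizzle"

pairs = split_every_nth(s, ".", 2)
#=> ["foo. bar foo", " foo bar. boo far baz", " bizzle"]

triples = split_every_nth(s, ".", 3)
#=> ["foo. bar foo. foo bar", " boo far baz. bizzle"]
```

The period that separates two groups is dropped, and each piece keeps its leading space; appending .map(&:strip) would tidy that up if needed.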
This is a case where it's easier to use String#scan than String#split.
We can use the following regular expression:
r = /(?<=\.|\A)[^.]*\.[^.]*(?=\.|\z)/
str=<<~_
Now is the time. This is it. It is now. The time to have fun.
The time to make new friends. The time to party.
_
str.scan(r)
#=> [
# "Now is the time. This is it",
# " It is now. The time to have fun",
# "\nThe time to make new friends. The time to party"
# ]
We can write the regular expression in free-spacing mode to make it self-documenting.
r = /
(?<= # begin a positive lookbehind
\A # match the beginning of the string
| # or
\. # match a period
) # end positive lookbehind
[^.]* # match zero or more characters other than periods
\. # match a period
[^.]* # match zero or more characters other than periods
(?= # begin a positive lookahead
\. # match a period
| # or
\z # match the end of the string
) # end positive lookahead
/x # invoke free-spacing regex definition mode
Note that (?<=\.|\A) can be replaced with (?<![^\.]). (?<![^\.]) is a negative lookbehind that asserts the match is not preceded by a character other than a period.
Similarly, (?=\.|\z) can be replaced with (?![^.]). (?![^.]) is a negative lookahead that asserts the match is not followed by a character other than a period.
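A quick way to convince yourself that the two forms behave alike is to run both on the same input. This is only a sketch on one sample string (r1, r2 and str are names chosen for the illustration), not a proof of equivalence:

```ruby
r1 = /(?<=\.|\A)[^.]*\.[^.]*(?=\.|\z)/  # positive lookarounds, as above
r2 = /(?<![^.])[^.]*\.[^.]*(?![^.])/    # negative lookarounds

str = "Now is the time. This is it. It is now. The time to have fun."

p str.scan(r1)
#=> ["Now is the time. This is it", " It is now. The time to have fun"]
p str.scan(r1) == str.scan(r2)
#=> true
```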

Regex does not check the first character after first check

I am trying to write a regex that:
Allows only numbers, lowercase letters and also "-" and "_".
String can only start with a letter, a number, or "uuid:".
String must have at least one letter in it.
It must consist of at least 2 characters.
I managed to create such a regex: \A(?:uuid:|[a-z0-9])(?=(.*[a-z])){1,}(?:\w|-)+\z
I don't understand why a leading letter is not counted toward the "at least one letter" rule, so that, for example, a1 does not pass.
I also don't understand why it allows uppercase letters, as in AA.
Tests: https://rubular.com/r/Q5gEP15iaYkHYQ
Thank you in advance for your help
You could also get the matches without lookarounds using an alternation matching at least 2 characters from the start.
If you don't want to match uppercase chars A-Z, omit the /i flag, which makes matching case insensitive.
\A(?:uuid:|[a-z][a-z0-9_-]|[0-9][0-9_-]*[a-z])[a-z0-9_-]*\z
Explanation
\A Start of string
(?: Non capture group
uuid: match literally
| Or
[a-z][a-z0-9_-] match a char a-z and one of a-z 0-9 _ -
| Or
[0-9][0-9_-]*[a-z] Match a digit, optional chars 0-9 _ - and then a-z
) Close non capture group
[a-z0-9_-]* Match optional chars a-z 0-9 _ -
\z End of string
Regex rubular demo
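The alternation pattern can be checked locally as well. A small sketch, where RE_ALT and the two sample lists are made up for the illustration:

```ruby
RE_ALT = /\A(?:uuid:|[a-z][a-z0-9_-]|[0-9][0-9_-]*[a-z])[a-z0-9_-]*\z/

accepted = %w[a1 ab 0a 0_a uuid: uuid:abc]  # expected to match
rejected = %w[a AA 01 _a -a a$]             # expected not to match

p accepted.all?  { |s| s.match?(RE_ALT) }  #=> true
p rejected.none? { |s| s.match?(RE_ALT) }  #=> true
```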
It looks like AA meets all your requirements: it contains a letter, at least two chars, and starts with a letter, and contains "only numbers, lowercase letters and also - and _". NOTE you have an i flag that makes pattern matching case insensitive, and if you do not want to allow any uppercase letters, just remove it from the end of the regex literal.
To fix the other real issues, you can use
/\A(?=[^a-z]*[a-z])(?=.{2})(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/
See this Rubular demo.
Note that in the demo, (?=[^a-z\n]*[a-z]) is used rather than (?=[^a-z]*[a-z]) because the test is performed on a single multiline string, not an array of separate strings.
Details:
\A - start of string
(?=[^a-z]*[a-z]) - minimum one letter
(?=.{2}) - minimum two chars
(?:uuid:|[a-z0-9]) - uuid:, or one letter or digit
[a-z0-9_-]* - zero or more letters, digits, _ or -
\z - end of string
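To see what each lookahead contributes, here is a small sketch that compares the full pattern with two deliberately weakened variants. The constant names FULL, NO_LETTER and NO_LENGTH are made up for this illustration:

```ruby
FULL      = /\A(?=[^a-z]*[a-z])(?=.{2})(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/
NO_LETTER = /\A(?=.{2})(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/          # letter check removed
NO_LENGTH = /\A(?=[^a-z]*[a-z])(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/  # length check removed

p "a1".match?(FULL)       #=> true
p "AA".match?(FULL)       #=> false (no /i flag)

p "01".match?(FULL)       #=> false
p "01".match?(NO_LETTER)  #=> true, nothing demands a letter any more

p "a".match?(FULL)        #=> false
p "a".match?(NO_LENGTH)   #=> true, nothing demands two characters any more
```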
You can use the following regular expression as the argument of String#match?. I've written the expression in free-spacing mode to make it self-documenting. Free-spacing mode is invoked with the option x (i.e. /x). It causes whitespace and comments to be ignored.
Note that I've defined named capture group common_code so that I can use its instructions as a subexpression (a.k.a. subroutine) in subsequent instructions. The invocation \g<common_code> tells Ruby to repeat the instructions in that capture group. See Subexpression Calls. You will see that numbered capture groups can be used as well.
The use of subexpressions has three benefits:
less code is required;
there is less chance of making errors, both when initially forming the expression and later when modifying it, by confining all instructions that are to repeat through the expression in one place (in the capture group); and
it makes the expression easier to read and understand.
re = /
\A # match the beginning of the string
(?: # begin a non-capture group
[a-z] # match a lc letter
(?<common_code> # begin a capture group named 'common_code'
[\da-z_-] # match a digit, lc letter, underscore or hyphen
) # end capture group
+ # execute the preceding capture group one or more times
| # or
uuid: # match a string literal
\g<common_code>* # match code in capture group common_code >= 0 times
| # or
\d # match a digit
\g<common_code>* # match code in capture group common_code >= 0 times
[a-z] # match a lc letter
\g<common_code>* # match code in capture group common_code >= 0 times
) # end the non-capture group
\z # match the end of the string
/x # invoke free-spacing regex definition mode
test = %w| ab a1 a_ a- 0a 01a 0_a 0-a uuid: uuid:a uuid:0 uuid:_ uuid:- | +
%w| a _a -a a$ auuid: 01 0_ 0- 01_- |
test.each do |s|
puts "#{s.ljust(7)} -> #{s.match?(re)}"
end
ab -> true
a1 -> true
a_ -> true
a- -> true
0a -> true
01a -> true
0_a -> true
0-a -> true
uuid: -> true
uuid:a -> true
uuid:0 -> true
uuid:_ -> true
uuid:- -> true
a -> false
_a -> false
-a -> false
a$ -> false
auuid: -> false
01 -> false
0_ -> false
0- -> false
01_- -> false

How do I replace all the apostrophes that come right before or right after a comma?

I have a string aString = "old_tag1,old_tag2,'new_tag1','new_tag2'"
I want to replace the apostrophes that come right before or right after a comma. For example, in my case the apostrophes enclosing new_tag1 and new_tag2 should be removed.
This is what I have right now
aString = aString.gsub("'", "")
This is problematic, however, because it also removes any apostrophe inside a tag, for example if I had 'my_tag's' instead of 'new_tag1'. How do I get rid of only the apostrophes that come before or after the commas?
My desired output is
aString = "old_tag1,old_tag2,new_tag1,new_tag2"
My guess is to use a regex as well, but in a slightly different way:
aString = "old_tag1,old_tag2,'new_tag1','new_tag2','new_tag3','new_tag4's'"
aString.gsub /(?<=^|,)'(.*?)'(?=,|$)/, '\1\2\3'
#=> "old_tag1,old_tag2,new_tag1,new_tag2,new_tag3,new_tag4's"
The idea is to find a substring enclosed in apostrophes and paste it back without them.
regex = /
(?<=^|,) # watch for start of the line or comma before
' # find an apostrophe
(.*?) # get everything between apostrophes in a non-greedy way
' # find a closing apostrophe
(?=,|$) # watch after for the comma or the end of the string
/x
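The replacement just pastes back the content of the capture group (everything between the parentheses). The pattern has only one group, so '\1' alone would be enough; '\2' and '\3' refer to groups that do not exist and expand to empty strings.
Thanks to @Cary for the /x modifier for regexes; I didn't know about it! Extremely useful for explanations.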
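As a minimal sketch of the same substitution with a single backreference (the variable names tags, pattern, with_backref and with_block are chosen for this illustration), the replacement can be written either as a string or as a block:

```ruby
tags = "old_tag1,old_tag2,'new_tag1','new_tag2','new_tag4's'"

# Same pattern as in the answer; group 1 holds the text between the quotes.
pattern = /(?<=^|,)'(.*?)'(?=,|$)/

with_backref = tags.gsub(pattern, '\1')   # replacement-string form
with_block   = tags.gsub(pattern) { $1 }  # equivalent block form

p with_backref
#=> "old_tag1,old_tag2,new_tag1,new_tag2,new_tag4's"
```

The replacement string is single-quoted on purpose: inside double quotes, "\1" would be read as an escape sequence rather than as a backreference.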
This answers the question, "I want to replace the apostrophes that come right before or right after a comma".
r = /
(?<=,) # match a comma in a positive lookbehind
\' # match an apostrophe
| # or
\' # match an apostrophe
(?=,) # match a comma in a positive lookahead
/x # free-spacing regex definition mode
aString = "old_tag1,x'old_tag2'x,x'old_tag3','new_tag1','new_tag2'"
aString.gsub(r, '')
#=> "old_tag1,x'old_tag2'x,x'old_tag3,new_tag1,new_tag2'"
If the objective is instead to remove single quotes enclosing a substring when the left quote is at the beginning of the string or is immediately preceded by a comma and the right quote is at the end of the string or is immediately followed by a comma, several approaches are possible. One is to use a single, modified regex, as @Dimitry has done. Another is to split the string on commas, process each string in the resulting array and then join the modified substrings, separated by commas.
r = /
\A # match beginning of string
\' # match single quote
.* # match zero or more characters
\' # match single quote
\z # match end of string
/x # free-spacing regex definition mode
aString.split(',').map { |s| (s =~ r) ? s[1..-2] : s }.join(',')
#=> "old_tag1,x'old_tag2'x,x'old_tag3',new_tag1,new_tag2"
Note:
arr = aString.split(',')
#=> ["old_tag1", "x'old_tag2'x", "x'old_tag3'", "'new_tag1'", "'new_tag2'"]
"old_tag1" =~ r #=> nil
"x'old_tag2'x" =~ r #=> nil
"x'old_tag3'" =~ r #=> nil
"'new_tag1'" =~ r #=> 0
"'new_tag2'" =~ r #=> 0
Non regex replacement
Regular expressions can get really ugly. There is a simple way to do it with just string replacement: search for the pattern ,' and ', and replace with ,
aString.gsub(",'", ",").gsub("',", ",")
=> "old_tag1,old_tag2,new_tag1,new_tag2'"
This leaves the trailing ', but that is easy to remove with .chomp("'"). A leading ' can be removed with a simple regex .gsub(/^'/, "")
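Putting those steps together gives something like the following sketch. The method name strip_comma_quotes is made up, and \A is used instead of ^ so that only the very start of the string is affected:

```ruby
# Hypothetical helper chaining the plain string replacements described above.
def strip_comma_quotes(str)
  str
    .gsub(",'", ",")   # apostrophe right after a comma
    .gsub("',", ",")   # apostrophe right before a comma
    .chomp("'")        # trailing apostrophe, if any
    .sub(/\A'/, "")    # leading apostrophe, if any
end

p strip_comma_quotes("'old_tag1',old_tag2,'new_tag1','new_tag2'")
#=> "old_tag1,old_tag2,new_tag1,new_tag2"
p strip_comma_quotes("a,'my_tag's'")
#=> "a,my_tag's"
```

Unlike the approaches that look for a matching pair of quotes, this removes any apostrophe that touches a comma or a string boundary, paired or not.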

How to replace the space between two words with a hyphen if the first and last letter of the two words matches a particular pattern?

I'm working with a language which has some particular rules about spelling. When words are put together, they do not have spaces, but occasionally use ' or - to distinguish where one word begins and another ends, in the rare cases where confusion can occur.
I have the words currently displayed with spaces between them, e.g.:
The cat caught the mouse.
However, I need to remove the spaces, e.g.:
Thecatcaughtthemouse.
Before these spaces can be removed though, the rules regarding the placement of ' and - must be considered:
first, if the first letter of a word (which also follows another word) begins with a vowel (a, a, á, à, ǎ, ā, b, c, d, e, e, é, è, ě, ē, i, i, í, ì, ǐ, ī, o, o, ó, ò, ǒ, ō, u, u, ú, ù, ǔ, ü, ǘ, ǜ, ǚ, ǖ, or ū), then replace the space with a ' (between words), e.g.:
The cat ate the sandwich and the ice cream.
This becomes:
Thecat'atethesandwichandthe'icecream.
This does not apply to words at the beginning of the sentence.
Next, if the last letter of a word is "a", "u", or "ü" (a, a, á, à, ǎ, ā, u, u, ú, ù, ǔ, ü, ǘ, ǜ, ǚ, ǖ, or ū) and the next word in the sentence begins with "n", then replace the space with a - (between words), e.g.:
The people from Australia needed a car to visit the plateau near the river.
This becomes:
Thepeoplefrom'Australia-needed'acartovisittheplateau-neartheriver.
Finally, if a word ends with "n" and the next word in the sentence begins with "g", then replace the space with a - (between words), e.g.:
The Australian grasshopper was lost in the overgrown grove.
This becomes:
The'Australian-grasshopperwaslostinthe'overgrown-grove.
How can I replace the spaces between words matching these patterns with ' and -?
You don't say just why you're doing this. Let's hope it's not a homework problem.
Suppose that a word ends with a vowel and the next begins with 'f' or 't', and I want to replace the space with a star. I would write
sentence:gsub('([aeiouy])%s+([ft])', '%1*%2')
You can take it from there.
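The snippet above appears to be Lua (method-call colon, %s and %1 in the pattern). For readers working in Ruby, a rough equivalent of the same star example might look like this; the sample sentences and variable names are invented for the illustration, and only the "n" + "g" rule from the question is adapted:

```ruby
sentence = "a fine tea"

# Ruby counterpart of the Lua call: \s+ for %s+, \1 and \2 for %1 and %2.
starred = sentence.gsub(/([aeiouy])\s+([ft])/, '\1*\2')
p starred
#=> "a*fine*tea"

# The same shape applied to the question's "n" + "g" rule, with a hyphen instead of a star.
hyphenated = "the overgrown grove".gsub(/(n)\s+(g)/, '\1-\2')
p hyphenated
#=> "the overgrown-grove"
```

The capture groups keep the letters on either side of the space, so only the whitespace itself is replaced.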

How to find all instances of #[XX:XXXX] in a string and then find the surrounding text?

Given a string like:
"#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"
I want to find a way to dynamically return the tagged user and the text that follows each tag. Results should be:
user_id: 19
copy: what's the latest with the TPS report?
user_id: 30
copy: can you help out here?
Any ideas on how this can be done with ruby/rails? Thanks
How is this regex for finding matches?
#\[\d+:\w+\s\w+\]
Split the string, then handle the content iteratively. I don't think it'd take more than:
tmp = string.split('#').reject(&:empty?).map {|str| [str[/\[(\d+)/,1], str[/\]\s*(.*)/,1].to_s.strip] }
tmp.first #=> ["19", "what's the latest with the TPS report?"]
Does that help?
result = subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m)
This grabs the id and the content in backreferences 1 and 2 respectively. The capture stops at the next # or at the end of the string.
"
\[ # Match the character “[” literally
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
. # Match any single character (including line breaks, because of the /m flag)
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\] # Match the character “]” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character (including line breaks, because of the /m flag)
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
\# # Match the character “#” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\Z # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
"
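To get from the raw scan result to the user_id / copy pairs asked for in the question, the captured pairs can be mapped into hashes. A minimal sketch, where the hash keys follow the question's wording and the variable name mentions is made up:

```ruby
subject = "#[19:Sara Mas] what's the latest with the TPS report? " \
          "#[30:Larry Peters] can you help out here?"

# Each scan result is a [id, copy] pair taken from capture groups 1 and 2.
mentions = subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m).map do |id, copy|
  { user_id: id.to_i, copy: copy.strip }
end

mentions.each { |m| puts "user_id: #{m[:user_id]}", "copy: #{m[:copy]}" }
# user_id: 19
# copy: what's the latest with the TPS report?
# user_id: 30
# copy: can you help out here?
```

One caveat of this pattern: a # inside the message text itself would end the capture early.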
This will match everything from a # up to the next punctuation mark. Sorry if I didn't understand correctly.
result = subject.scan(/#.*?[.?!]/)

Resources