Regular expression to avoid a set of characters - ruby-on-rails

I am using Ruby on Rails 3.1.0 and I would like to validate a class attribute so that a string containing any of these characters is never stored in the database: (blank space), <, >, ", #, %, {, }, |, \, ^, ~, [, ] and `.
What is the regex?

Assuming it should also be non-empty:
^[^\] ><"#%{}|\\^~\[`]+$
Since someone is downvoting this, here is some test code:
ary = [' ', '<', '>', '"', '#', '%', '{', '}', '|', '\\', '^', '~', '[', ']', '`', 'a']
ary.each do |i|
puts i =~ /^[^\] ><"#%{}|\\^~\[`]+$/
end
Output:
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
nil
0
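One caveat worth checking before using this in a Rails validation: in Ruby, ^ and $ anchor at line boundaries, not at the start and end of the string, so a multi-line value can slip through. A minimal sketch of the difference, using the same character class with \A and \z instead (the constant names are only illustrative):

```ruby
# Same character class as above, anchored two different ways.
LINE_ANCHORED   = /^[^\] ><"#%{}|\\^~\[`]+$/   # ^ and $ match at line boundaries
STRING_ANCHORED = /\A[^\] ><"#%{}|\\^~\[`]+\z/ # \A and \z match only at string boundaries

multi_line = "good\n<bad>"

p multi_line =~ LINE_ANCHORED   # => 0   (the first line alone satisfies the pattern)
p multi_line =~ STRING_ANCHORED # => nil (the whole string is checked)
p "good" =~ STRING_ANCHORED     # => 0
```

If multi-line input is possible, the \A...\z form is probably the safer choice for a validation.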

bad_chars = [' ', '<', '>', '"', '#', '%', '{', '}', '|', '\\', '^', '~', '[', ']', '`']
re = Regexp.union(bad_chars)
p 'hoh`oho' =~ re #=> 3
Regexp.union takes care of escaping.
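To show what that escaping buys you, here is a small sketch that builds the pattern from the full character list in the question and wraps it in a helper. The names BAD_CHARS, BAD_CHAR_RE and clean? are made up for this example.

```ruby
# Characters the question wants to reject (space and backtick included).
BAD_CHARS = [' ', '<', '>', '"', '#', '%', '{', '}', '|', '\\', '^', '~', '[', ']', '`']

# Regexp.union escapes each string, so '[', '|' or '^' are matched literally.
BAD_CHAR_RE = Regexp.union(BAD_CHARS)

# Hypothetical helper: true when the string is non-empty and has no bad characters.
def clean?(str)
  !str.empty? && (str =~ BAD_CHAR_RE).nil?
end

p Regexp.escape('[')   # => "\\["
p clean?('foobar')     # => true
p clean?('foo|bar')    # => false
p clean?('foo bar')    # => false
p clean?('')           # => false
```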

a = "foobar"
b = "foo ` bar"
re = /[ \^<>"#%\{\}\|\\~\[\]\`]/
a =~ re # => nil
b =~ re # => 3
The inverse expression is:
/\A[^ \^<>"#%\{\}\|\\~\[\]\`]+\Z/

Related

Rails - extract substring with in [ and ] from string

This works.
"<name> <substring>"[/.*<([^>]*)/,1]
=> "substring"
But I want to extract substring within [ and ].
input:
string = "123 [asd]"
output:
asd
Can anyone help me?
You can do:
"123 [asd]"[/\[(.*?)\]/, 1]
will return
"asd"
You can test it here:
https://rextester.com/YGZEA91495
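If the string can contain more than one bracketed part, the same pattern also works with String#scan. A small sketch (the sample strings are invented):

```ruby
pattern = /\[(.*?)\]/  # non-greedy, so each match stops at the first ']'

p "123 [asd]"[pattern, 1]                      # => "asd"
p "123 [asd] 456 [qwe]".scan(pattern).flatten  # => ["asd", "qwe"]
p "no brackets here"[pattern, 1]               # => nil
```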
Here are a few more ways to extract the desired string.
str = "123 [asd] 456"
#1
r = /
(?<=\[) # match '[' in a positive lookbehind
[^\]]* # match 0+ characters other than ']'
(?=\]) # match ']' in a positive lookahead
/x # free-spacing regex definition mode
str[r]
#=> "asd"
#2
r = /
\[ # match '['
[^\]]* # match 0+ characters other than ']'
\] # match ']'
/x # free-spacing regex definition mode
str[r][1..-2]
#=> "asd"
#3
r = /
.*\[ # match 0+ characters other than a newline, then '['
| # or
\].* # match ']' then 0+ characters other than a newline
/x # free-spacing regex definition mode
str.gsub(r, '')
#=> "asd"
#4
n = str.index('[')
#=> 4
m = str.index(']', n+1)
#=> 8
str[n+1..m-1]
#=> "asd"
See String#index.
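Note that approach #4 raises an error when a bracket is missing, because String#index then returns nil. A minimal sketch of a guarded version; between_brackets is a hypothetical helper name, not something from the answer above:

```ruby
# Hypothetical helper: returns the text between the first '[' and the next ']',
# or nil when either bracket is missing.
def between_brackets(str)
  open_at = str.index('[')
  return nil if open_at.nil?

  close_at = str.index(']', open_at + 1)
  return nil if close_at.nil?

  str[(open_at + 1)...close_at]
end

p between_brackets("123 [asd] 456")  # => "asd"
p between_brackets("123 [asd")       # => nil
p between_brackets("123 asd")        # => nil
p between_brackets("[]")             # => ""
```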

Lua gsub chars '(' and ')' fails

For some reason only the opening and closing parentheses won't work; all the others are fine.
RequestEncoded = string.gsub(RequestEncoded, '<', ' ')
RequestEncoded = string.gsub(RequestEncoded, '>', ' ')
RequestEncoded = string.gsub(RequestEncoded, '"', ' ')
RequestEncoded = string.gsub(RequestEncoded, '\'', ' ')
RequestEncoded = string.gsub(RequestEncoded, '\\', ' ')
-- RequestEncoded = string.gsub(RequestEncoded, '(', ' ') keeps failing
-- RequestEncoded = string.gsub(RequestEncoded, ')', ' ')
-- RequestEncoded = string.gsub(RequestEncoded, "\x28", " ") --keeps failing
-- RequestEncoded = string.gsub(RequestEncoded, "\x29", ' ')
-- RequestEncoded = string.gsub(RequestEncoded, '\050', ' ') --keeps failing
-- RequestEncoded = string.gsub(RequestEncoded, '\051', ' ')
) and ( are special characters that form a capturing group in a Lua pattern.
You need to escape them when they are outside of square brackets, [...], to match literal parentheses. You need to escape them with %.
RequestEncoded = string.gsub(RequestEncoded, '%(', ' ')
RequestEncoded = string.gsub(RequestEncoded, '%)', ' ')
However, since you are using the same replacement pattern in all the subsequent gsub calls, you may simplify your code to
RequestEncoded = string.gsub(RequestEncoded, '[<>"\'\\()]', ' ')
Note that here, () are inside a bracket expression and do not need escaping.
See Lua patterns docs:
Some characters, called magic characters, have special meanings when used in a pattern. The magic characters are
( ) . % + - * ? [ ^ $
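For readers coming from the Ruby questions on this page, the same rule holds for Ruby regular expressions: parentheses form a group unless they are escaped (with \ rather than %) or placed inside a character class. A short Ruby sketch for comparison, using an invented sample string and _ as the replacement so the result is easy to read:

```ruby
request = 'call(1) <b>'

# Outside a character class the parenthesis must be escaped.
p request.gsub(/\(/, '_')          # => "call_1) <b>"

# Inside a character class, ( and ) are literal; only the backslash needs escaping.
p request.gsub(/[<>"'\\()]/, '_')  # => "call_1_ _b_"
```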

Ruby: stringA.gsub(/\s+/, '') versus stringA.strip

Say
string = "Johnny be good! And smile :-) "
Is there a difference between
string.gsub(/\s+/, '')
and
string.strip
?
If so, what is it?
strip only removes leading and trailing whitespace; using gsub the way you outline in your question removes all whitespace from the string.
irb(main):004:0* " hello ".strip
=> "hello"
irb(main):005:0> " h e l l o ".strip
=> "h e l l o"
irb(main):006:0> " hello ".gsub(/\s+/, '')
=> "hello"
irb(main):007:0> " h e l l o ".gsub(/\s+/, '')
=> "hello"
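Applied to the exact string from the question, the difference looks like this (a quick sketch you can paste into irb):

```ruby
string = "Johnny be good! And smile :-) "

p string.strip            # => "Johnny be good! And smile :-)"
p string.gsub(/\s+/, '')  # => "Johnnybegood!Andsmile:-)"

# Neither call modifies the receiver; both return a new string.
p string                  # => "Johnny be good! And smile :-) "
```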

Match self-defined token in PARSE

I am working on a string-transforming problem. The requirement is like this:
line: {INSERT INTO `pub_company` VALUES ('1', '0', 'ABC大学', 'B', 'admin', '2014-10-09 11:40:44', '', '105210', null)}
==>
line: {INSERT INTO `pub_company` VALUES ('1', '0', 'ABC大学', 'B', 'admin', to_date('2014-10-09 11:40:44', 'yyyy-mm-dd hh24:mi:ss'), '', '105210', null)}
Note: the '2014-10-09 11:40:44' is transformed to to_date('2014-10-09 11:40:44', 'yyyy-mm-dd hh24:mi:ss').
My code looks like below:
date: use [digit][
digit: charset "0123456789"
[4 digit "-" 2 digit "-" 2 digit space 2 digit ":" 2 digit ":" 2 digit]
]
parse line [ to date to end]
but I got this error:
** Script error: PARSE - invalid rule or usage of rule: digit
** Where: parse do either either either -apply-
** Near: parse line [to date to end]
I have made some tests:
probe parse "SSS 2016-01-01 00:00:00" [thru 3 "S" space date to end] ;true
probe parse "SSS 2016-01-01 00:00:00" [ to date to end] ; the error above
Since the position of the date value differs across my data set, how can I find it, match it, and make the corresponding change?
I did it as follows:
line: {INSERT INTO `pub_company` VALUES ('1', '0', 'ABC大学', 'B', 'admin', '2014-10-09 11:40:44', '', '105210', null)}
d: [2 digit]
parse/all line [some [p1: {'} 4 digit "-" d "-" d " " d ":" d ":" d {'} p2: (insert p2 ")" insert p1 "to_date(" ) | skip]]
>> {INSERT INTO `pub_company` VALUES ('1', '0', 'ABC??', 'B', 'admin', to_date('2014-10-09 11:40:44'), '', '105210', null)}
TO and THRU have historically not allowed arbitrary rules as their parameters. See #2129:
"The syntax of TO and THRU is currently restricted by design, for really significant performance reasons..."
This was relaxed in Red. So for example, the following will work there:
parse "aabbaabbaabbccc" [
thru [
some "a" (prin "a") some "b" (prin "b") some "c" (prin "c")
]
]
However, it outputs:
abababababc
This shows that it really doesn't have a better answer than just "naively" applying the parse rule iteratively at each step. Looping the PARSE engine is not as efficient as atomically running a TO/THRU for which faster methods of seeking exist (basic string searches, for instance). And the repeated execution of code in parentheses may not line up with what was actually intended.
Still...it seems better to allow it. Then let users worry about when their code is slow and performance tune it if it matters. So odds are that the Ren-C branch of Rebol will align with Red in this respect, and allow arbitrary rules.
I got it working in an indirect way:
date: use [digit][
digit: charset "0123456789"
[4 digit "-" 2 digit "-" 2 digit space 2 digit ":" 2 digit ":" 2 digit]
]
line: {INSERT INTO `pub_company` VALUES ('1', '0', 'ABC大学', 'B', 'admin', '2014-10-09 11:40:44', '', '105210', null)}
parse line [
thru "(" vals: (
blk: parse/all vals ","
foreach val blk [
if parse val [" '" date "'"][
;probe val
replace line val rejoin [ { to_date(} at val 2 {, 'yyyy-mm-dd hh24:mi:ss')}]
]
]
)
to end
(probe line)
]
The output:
{INSERT INTO `pub_company` VALUES ('1', '0', 'ABC大学', 'B', 'admin', to_date('2014-10-09 11:40:44', 'yyyy-mm-dd hh24:mi:ss'), '', '105210', null)}
Here is a true Rebol2 solution:
line: {INSERT INTO `pub_company` VALUES ('1', '0', 'ABC??', 'B', 'admin', '2014-10-09 11:40:44', '', '105210', null)}
date: use [digit space][
space: " "
digit: charset "0123456789"
[4 digit "-" 2 digit "-" 2 digit space 2 digit ":" 2 digit ":" 2 digit]
]
>> parse/all line [ some [ [da: "'" date (insert da "to_date (" ) 11 skip de: (insert de " 'yyyy-mm-dd hh24:mi:ss'), ") ] | skip ] ]
== true
>> probe line
{INSERT INTO `pub_company` VALUES ('1', '0', 'ABC??', 'B', 'admin', to_date ('2014-10-09 11:40:44', 'yyyy-mm-dd hh24:mi:ss'), '', '105210', null)}
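For comparison only, here is a sketch of the same transformation in Ruby rather than Rebol. It assumes the date always appears as a single-quoted 'yyyy-mm-dd hh:mm:ss' literal; DATE_LITERAL and converted are names invented for this example.

```ruby
line = "INSERT INTO `pub_company` VALUES ('1', '0', 'ABC大学', 'B', 'admin', " \
       "'2014-10-09 11:40:44', '', '105210', null)"

# A quoted timestamp of the form 'yyyy-mm-dd hh:mm:ss'.
DATE_LITERAL = /'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'/

# Wrap every match in to_date(...), wherever it appears in the line.
converted = line.gsub(DATE_LITERAL) do |quoted|
  "to_date(#{quoted}, 'yyyy-mm-dd hh24:mi:ss')"
end

puts converted
```

Because gsub scans the whole string, the position of the date value in the line does not matter, which is the part that TO could not express directly in the original PARSE rule.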

Haskell Parser Fails on "|" Read

I am working on a parser in Haskell using Parsec. The issue lies in reading in the string "| ". When I attempt to read in the following,
parseExpr = parseAtom
-- | ...
<|> do string "{|"
args <- try parseList <|> parseDottedList
string "| "
body <- try parseExpr
string " }"
return $ List [Atom "lambda", args, body]
I get the following parse error.
Lampas >> {|a b| "a" }
Parse error at "lisp" (line 1, column 12):
unexpected "}"
expecting letter, "\"", digit, "'", "(", "[", "{|" or "."
Another failing case is ^, which produces the following.
Lampas >> {|a b^ "a" }
Parse error at "lisp" (line 1, column 12):
unexpected "}"
expecting letter, "\"", digit, "'", "(", "[", "{|" or "."
However, it works as expected when the string "| " is replaced with "} ".
parseExpr = parseAtom
-- | ...
<|> do string "{|"
args <- try parseList <|> parseDottedList
string "} "
body <- try parseExpr
string " }"
return $ List [Atom "lambda", args, body]
The following is the REPL behavior with the above modification.
Lampas >> {|a b} "a" }
(lambda ("a" "b") ...)
So the question is: (a) does the pipe have special behavior in Haskell strings, perhaps only in <|> chains, and (b) how can this behavior be avoided?
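The character | is most likely in the set of symbol characters that your atom parser accepts, so b| is consumed as a single atom while the argument list is being parsed, and string "| " never gets a chance to match. The same would apply to ^, which explains your second failing case, whereas } is presumably not in that set. The only way around this is probably to change that set of characters, or the structure of your interpreter.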
