Case-insensitive matching in LPeg.re (Lua) - lua

I'm new to the "LPeg" and "re" modules of Lua, currently I want to write a pattern based on following rules:
Match the string that starts with "gv_$/gv$/v$/v_$/x$/xv$/dba_/all_/cdb_", and the prefix "SYS.%s*" or "PUBLIC.%s*" is optional
The string should not follow a alphanumeric, i.e., the pattern would not match "XSYS.DBA_OBJECTS" because it follows "X"
The pattern is case-insensitive
For example, below strings should match the pattern:
,sys.dba_objects, --should return "sys.dba_objects"
SyS.Dba_OBJECTS
cdb_objects
dba_hist_snapshot) --should return "dba_hist_snapshot"
Currently my pattern is below which can only match non-alphanumeric+string in upper case :
p=re.compile[[
pattern <- %W {owner* name}
owner <- 'SYS.'/ 'PUBLIC.'
name <- {prefix %a%a (%w/"_"/"$"/"#")+}
prefix <- "GV_$"/"GV$"/"V_$"/"V$"/"DBA_"/"ALL_"/"CDB_"
]]
print(p:match(",SYS.DBA_OBJECTS"))
My questions are:
How to achieve the case-insensitive matching? There are some topics about the solution but I'm too new to understand
How to exactly return the matched string only, instead of also have to plus %W? Something like "(?=...)" in Java
Highly appreciated if you can provide the pattern or related function.

You can try to tweak this grammar
local re = require're'
local p = re.compile[[
pattern <- ((s? { <name> }) / s / .)* !.
name <- (<owner> s? '.' s?)? <prefix> <ident>
owner <- (S Y S) / (P U B L I C)
prefix <- (G V '_'? '$') / (V '_'? '$') / (D B A '_') / (C D B '_')
ident <- [_$#%w]+
s <- (<comment> / %s)+
comment <- '--' (!%nl .)*
A <- [aA]
B <- [bB]
C <- [cC]
D <- [dD]
G <- [gG]
I <- [iI]
L <- [lL]
P <- [pP]
S <- [sS]
U <- [uU]
V <- [vV]
Y <- [yY]
]]
local m = { p:match[[
,sys.dba_objects, --should return "sys.dba_objects"
SyS.Dba_OBJECTS
cdb_objects
dba_hist_snapshot) --should return "dba_hist_snapshot"
]] }
print(unpack(m))
. . . prints match table m:
sys.dba_objects SyS.Dba_OBJECTS cdb_objects dba_hist_snapshot
Note that case-insensitivity is quite hard to achieve out of the lexer so each letter has to get a separate rule -- you'll need more of these eventually.
This grammar is taking care of the comments in your sample and skips them along with whitespace so matches after "should return" are not present in output.
You can fiddle with prefix and ident rules to specify additional prefixes and allowed characters in object names.
Note: !. means end-of-file. !%nl means "not end-of-line". ! p and & p are constructing non-consuming patterns i.e. current input pointer is not incremented on match (input is only tested).
Note 2: print-ing with unpack is a gross hack.
Note 3: Here is a tracable LPeg re that can be used to debug grammars. Pass true for 3-rd param of re.compile to get execution trace with test/match/skip action on each rule and position visited.

Finally I got an solution but not so graceful, which is to add an additional parameter case_insensitive into re.compile, re.find, re.match and re.gsubfunctions. When the parameter value is true, then invoke case_insensitive_pattern to rewrite the pattern:
...
local fmt="[%s%s]"
local function case_insensitive_pattern(quote,pattern)
-- find an optional '%' (group 1) followed by any character (group 2)
local stack={}
local is_letter=nil
local p = pattern:gsub("(%%?)(.)",
function(percent, letter)
if percent ~= "" or not letter:match("%a") then
-- if the '%' matched, or `letter` is not a letter, return "as is"
if is_letter==false then
stack[#stack]=stack[#stack]..percent .. letter
else
stack[#stack+1]=percent .. letter
is_letter=false
end
else
if is_letter==false then
stack[#stack]=quote..stack[#stack]..quote
is_letter=true
end
-- else, return a case-insensitive character class of the matched letter
stack[#stack+1]=fmt:format(letter:lower(), letter:upper())
end
return ""
end)
if is_letter==false then
stack[#stack]=quote..stack[#stack]..quote
end
if #stack<2 then return stack[1] or (quote..pattern..quote) end
return '('..table.concat(stack,' ')..')'
end
local function compile (p, defs, case_insensitive)
if mm.type(p) == "pattern" then return p end -- already compiled
if case_insensitive==true then
p=p:gsub([[(['"'])([^\n]-)(%1)]],case_insensitive_pattern):gsub("%(%s*%((.-)%)%s*%)","(%1)")
end
local cp = pattern:match(p, 1, defs)
if not cp then error("incorrect pattern", 3) end
return cp
end
...

Related

Is there a way to fix an expression with operators in it after parsing, using a table of associativities and precedences?

I'm currently working on a parser for a simple programming language written in Haskell. I ran into a problem when I tried to allow for binary operators with differing associativities and precedences. Normally this wouldn't be an issue, but since my language allows users to define their own operators, the precedence of operators isn't known by the compiler until the program has already been parsed.
Here are some of the data types I've defined so far:
data Expr
= Var String
| Op String Expr Expr
| ..
data Assoc
= LeftAssoc
| RightAssoc
| NonAssoc
type OpTable =
Map.Map String (Assoc, Int)
At the moment, the compiler parses all operators as if they were right-associative with equal precedence. So if I give it an expression like a + b * c < d the result will be Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d"))).
I'm trying to write a function called fixExpr which takes an OpTable and an Expr and rearranges the Expr based on the associativities and precedences listed in the OpTable. For example:
operators :: OpTable
operators =
Map.fromList
[ ("<", (NonAssoc, 4))
, ("+", (LeftAssoc, 6))
, ("*", (LeftAssoc, 7))
]
expr :: Expr
expr = Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d")))
fixExpr operators expr should evaluate to Op "<" (Op "+" (Var "a") (Op "*" (Var "b") (Var "c"))) (Var "d").
How do I define the fixExpr function? I've tried multiple solutions and none of them have worked.
An expression e may be an atomic term n (e.g. a variable or literal), a parenthesised expression, or an application of an infix operator ○.
e ⩴ n | (e​) | e1 ○ e2
We need the parentheses to know whether the user entered a * b + c, which we happen to associate as a * (b + c) and need to reassociate as (a * b) + c, or if they entered a * (b + c) literally, which should not be reassociated. Therefore I’ll make a small change to the data type:
data Expr
= Var String
| Group Expr
| Op String Expr Expr
| …
Then the method is simple:
The rebracketing of an expression ⟦e⟧ applies recursively to all its subexpressions.
⟦n⟧ = n
⟦(e)⟧ = (⟦e⟧)
⟦e1 ○ e2⟧ = ⦅⟦e1⟧ ○ ⟦e2⟧⦆
A single reassociation step ⦅e⦆ removes redundant parentheses on the right, and reassociates nested operator applications leftward in two cases: if the left operator has higher precedence, or if the two operators have equal precedence, and are both left-associative. It leaves nested infix applications alone, that is, associating rightward, in the opposite cases: if the right operator has higher precedence, or the operators have equal precedence and right associativity. If the associativities are mismatched, then the result is undefined.
⦅e ○ n⦆ = e ○ n
⦅e1 ○ (e2)⦆ = ⦅e1 ○ e2⦆
⦅e1 ○ (e2 ● e3)⦆ =
⦅e1 ○ e2⦆ ● e3, if:
a. P(○) > P(●); or
b. P(○) = P(●) and A(○) = A(●) = L
e1 ○ (e2 ● e3), if:
a. P(○) < P(●); or
b. P(○) = P(●) and A(○) = A(●) = R
undefined otherwise
NB.: P(o) and A(o) are respectively the precedence and associativity (L or R) of operator o.
This can be translated fairly literally to Haskell:
fixExpr operators = reassoc
where
-- 1.1
reassoc e#Var{} = e
-- 1.2
reassoc (Group e) = Group (reassoc e)
-- 1.3
reassoc (Op o e1 e2) = reassoc' o (reassoc e1) (reassoc e2)
-- 2.1
reassoc' o e1 e2#Var{} = Op o e1 e2
-- 2.2
reassoc' o e1 (Group e2) = reassoc' o e1 e2
-- 2.3
reassoc' o1 e1 r#(Op o2 e2 e3) = case compare prec1 prec2 of
-- 2.3.1a
GT -> assocLeft
-- 2.3.2a
LT -> assocRight
EQ -> case (assoc1, assoc2) of
-- 2.3.1b
(LeftAssoc, LeftAssoc) -> assocLeft
-- 2.3.2b
(RightAssoc, RightAssoc) -> assocRight
-- 2.3.3
_ -> error $ concat
[ "cannot mix ‘", o1
, "’ ("
, show assoc1
, " "
, show prec1
, ") and ‘"
, o2
, "’ ("
, show assoc2
, " "
, show prec2
, ") in the same infix expression"
]
where
(assoc1, prec1) = opInfo o1
(assoc2, prec2) = opInfo o2
assocLeft = Op o2 (Group (reassoc' o1 e1 e2)) e3
assocRight = Op o1 e1 r
opInfo op = fromMaybe (notFound op) (Map.lookup op operators)
notFound op = error $ concat
[ "no precedence/associativity defined for ‘"
, op
, "’"
]
Note the recursive call in assocLeft: by reassociating the operator applications, we may have revealed another association step, as in a chain of left-associative operator applications like a + b + c + d = (((a + b) + c) + d).
I insert Group constructors in the output for illustration, but they can be removed at this point, since they’re only necessary in the input.
This hasn’t been tested very thoroughly at all, but I think the idea is sound, and should accommodate modifications for more complex situations, even if the code leaves something to be desired.
An alternative that I’ve used is to parse expressions as “flat” sequences of operators applied to terms, and then run a separate parsing pass after name resolution, using e.g. Parsec’s operator precedence parser facility, which would handle these details automatically.

replacing a value withgsub in lua

function expandVars(tmpl,t)
return (tmpl:gsub('%$([%a ][%w ]+)', t)) end
local sentence = expandVars("The $adj $char1 looks at you and says, $name, you are $result", {adj="glorious", name="Jayant", result="the Overlord", char1="King"})
print(sentence)
The above code work only when I have ',' after the variable name like, in above sentence it work for $ name and $ result but not for $adj and $char1, Why is that ?
Problem:
Your pattern [%a ][%w ]+ means a letter or space, followed by at least one letter or number or space. Since regexp is greedy, it will try to match as large a sequence as possible, and the match will include the space:
function expandVars(tmpl,t)
return string.gsub(tmpl, '%$([%a ][%w ]+)', t)
end
local sentence = expandVars(
"$a1 $b and c $d e f ",
{["a1 "]="(match is 'a1 ')", ["b and c "]="(match is 'b and c ')", ["d e f "]="(match is 'd e f ')", }
)
This prints
(match is 'a1 ')(match is 'b and c ')(match is 'd e f ')
Solution:
The variable names must match keys from your table; you could accepts keys that have spaces and all sort of characters but then you are forcing the user to use [] in the table keys, as done above, this is not very nice :)
Better keep it to alphanumeric and underscore, with the constraint that it cannot start with a number. This means to be generic you want a letter (%a), followed by any number of (including none) (* rather than +) of alphanumeric and underscore [%w_]:
function expandVars(tmpl,t)
return string.gsub(tmpl, '%$(%a[%w_]*)', t)
end
local sentence = expandVars(
"$a $b1 and c $d_2 e f ",
{a="(match is 'a')", b1="(match is 'b1')", d_2="(match is 'd_2')", }
)
print(sentence)
This prints
(match is 'a') (match is 'b1') and c (match is 'd_2') e f; non-matchable: $_a $1a b
which shows how the leading underscore and leading digit were not accepted.

Correctly parsing line indentations in uu-parsinglib in Haskell

I want to create a parser combinator, which will collect all lines below current place, which indentation levels will be greater or equal some i. I think the idea is simple:
Consume a line - if its indentation is:
ok -> do it for next lines
wrong -> fail
Lets consider following code:
import qualified Text.ParserCombinators.UU as UU
import Text.ParserCombinators.UU hiding(parse)
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
-- end of line
pEOL = pSym '\n'
pSpace = pSym ' '
pTab = pSym '\t'
indentOf s = case s of
' ' -> 1
'\t' -> 4
-- return the indentation level (number of spaces on the beginning of the line)
pIndent = (+) <$> (indentOf <$> (pSpace <|> pTab)) <*> pIndent `opt` 0
-- returns tuple of (indentation level, result of parsing the second argument)
pIndentLine p = (,) <$> pIndent <*> p <* pEOL
-- SHOULD collect all lines below witch indentations greater or equal i
myParse p i = do
(lind, expr) <- pIndentLine p
if lind < i
then pFail
else do
rest <- myParse p i `opt` []
return $ expr:rest
-- sample inputs
s1 = " a\
\\n a\
\\n"
s2 = " a\
\\na\
\\n"
-- execution
pProgram = myParse (pSym 'a') 1
parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)
main :: IO ()
main = do
print $ parse pProgram s1
print $ parse pProgram s2
return ()
Which gives following output:
("aa",[])
Test.hs: no correcting alternative found
The result for s1 is correct. The result for s2 should consume first "a" and stop consuming. Where this error comes from?
The parsers which you are constructing will always try to proceed; if necessary input will be discarded or added. However pFail is a dead-end. It acts as a unit element for <|>.
In you parser there is however no other alternative present in case the input does not comply to the language recognised by the parser. In you specification you say you want the parser to fail on input s2. Now it fails with a message saying that is fails, and you are surprised.
Maybe you do not want it to fail, but you want to stop accepting further input? In that case
replace pFail by return [].
Note that the text:
do
rest <- myParse p i `opt` []
return $ expr:rest
can be replaced by (expr:) <$> (myParse p i `opt` [])
A natural way to solve your problem is probably something like
pIndented p = do i <- pGetIndent
(:) <$> p <* pEOL <*> pMany (pToken (take i (repeat ' ')) *> p <* pEOL)
pIndent = length <$> pMany (pSym ' ')

Checking that a price value in a string is in the correct format

I use n <- getLine to get from user price. How can I check is value correct ? (Price can have '.' and digits and must be greater than 0) ?
It doesn't work:
isFloat = do
n <- getLine
let val = case reads n of
((v,_):_) -> True
_ -> False
If The Input Is Always Valid Or Exceptions Are OK
If you have users entering decimal numbers in the form of "123.456" then this can simply be converted to a Float or Double using read:
n <- getLine
let val = read n
Or in one line (having imported Control.Monad):
n <- liftM read getLine
To Catch Erroneous Input
The above code fails with an exception if the users enter invalid entries. If that's a problem then use reads and listToMaybe (from Data.Maybe):
n <- liftM (fmap fst . listToMaybe . reads) getLine
If that code looks complex then don't sweat it - the below is the same operation but doing all the work with explicit case statements:
n <- getLine
let val = case reads n of
((v,_):_) -> Just v
_ -> Nothing
Notice we pattern match to get the first element of the tuple in the head of the list, The head of the list being (v,_) and the first element is v. The underscore (_) just means "ignore the value in this spot".
If Floating Point Isn't Acceptable
Floating values are well known to be approximate, and not suitable for real world financial computations (but perhaps homework, depending on your professor). In this case you'd want to read the values into a Rational (from Data.Ratio).
n <- liftM maybeRational getLine
...
where
maybeRational :: String -> Maybe Rational
maybeRational str =
let (a,b) = break (=='.') str
in liftM2 (%) (readMaybe a) (readMaybe $ drop 1 b)
readMaybe = fmap fst . listToMaybe . reads
In addition to the parsing advice provided by TomMD, consider using the appropriate monad for error reporting. It allows you to conveniently chain computations which can fail, avoiding explicit error checking on every step.
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Error
parsePrice :: MonadError String m => String -> m Double
parsePrice s = do
x <- case reads s of
[(x, "")] -> return x
_ -> throwError "Not a valid real number."
when (x <= 0) $ throwError "Price must be positive."
return x
main = do
n <- getLine
case parsePrice n of
Left err -> putStrLn err
Right x -> putStrLn $ "Price is " ++ show x

Haskell filter option concatenation

exchangeSymbols "a§ b$ c. 1. 2. 3/" = filter (Char.isAlphaNum) (replaceStr str " " "_")
The code above is supposed to first replace all "spaces" with "_", then filter the String according to Char.isAlphaNum. Unfortunately the Char.isAlphaNum part absorbs the already exchanged "_", which isn't my intention and i want to hold the "_". So, i thought it would be nice just add an exception to the filter which goes like:
exchangeSymbols "a§ b$ c. 1. 2. 3/" = filter (Char.isAlphaNum && /='_') (replaceStr str " " "_")
You see the added && not /='_'. It produces a parse error, obviously it is not so easily possible to concatenate filter options, but is there a smart workaround ? I thought about wrapping the filter function, like a 1000 times or so with each recursion adding a new filter test (/='!'),(/='§') and so on without adding the (/='_'). However it doesn't seem to be a handy solution.
Writing
... filter (Char.isAlphaNum && /='_') ...
is actually a type error (the reason why it yields a parse error is maybe that you used /= as prefix - but its an infix operator). You cannot combine functions with (&&) since its an operator on booleans (not on functions).
Acutally this code snipped should read:
... filter (\c -> Char.isAlphaNum c && c /= '_') ...
Replace your filter with a list comprehension.
[x | x <- replaceStr str " " "_", x /= '_', Char.isAplhaNum x]
Naturally, you probably want to have multiple exceptions. So define a helper function:
notIn :: (Eq a) => [a] -> a -> Bool
notIn [] _ = True
notIn x:xs y = if x == y
then False
else notIn xs
EDIT: Apparently you can use notElem :: (Eq a) => a -> [a] -> Bool instead. Leaving above code for educational purposes.
And use that in your list comprehension:
[x | x <- replaceStr str " " "_", notElem x "chars to reject", Char.isAlphaNum x]
Untested, as haskell isn't installed on this machine. Bonus points if you are doing a map after the filter, since you can then put that in the list comprehension.
Edit 2: Try this instead, I followed in your footsteps instead of thinking it out myself:
[x | x <- replaceStr str " " "_", Char.isAlphaNum x || x == ' ']
[x | x <- replaceStr str " " "_", Char.isAlphaNum x || x `elem` "chars to accept"]
At this point the list comprehension doesn't help much. The only reason I did change it was because I you requested an &&, for which using a list comprehension is great.
Since it seems that you don't quite understand the principle of the list comprehension, its basically applying a bunch of filters and then a map with more than one source, for example:
[(x, y, x + y) | x <- [1, 2, 3, 4, 5], y <- [2, 4], x > y]

Resources