function expandVars(tmpl,t)
return (tmpl:gsub('%$([%a ][%w ]+)', t)) end
local sentence = expandVars("The $adj $char1 looks at you and says, $name, you are $result", {adj="glorious", name="Jayant", result="the Overlord", char1="King"})
print(sentence)
The above code work only when I have ',' after the variable name like, in above sentence it work for $ name and $ result but not for $adj and $char1, Why is that ?
Problem:
Your pattern [%a ][%w ]+ means a letter or space, followed by at least one letter or number or space. Since regexp is greedy, it will try to match as large a sequence as possible, and the match will include the space:
function expandVars(tmpl,t)
return string.gsub(tmpl, '%$([%a ][%w ]+)', t)
end
local sentence = expandVars(
"$a1 $b and c $d e f ",
{["a1 "]="(match is 'a1 ')", ["b and c "]="(match is 'b and c ')", ["d e f "]="(match is 'd e f ')", }
)
This prints
(match is 'a1 ')(match is 'b and c ')(match is 'd e f ')
Solution:
The variable names must match keys from your table; you could accepts keys that have spaces and all sort of characters but then you are forcing the user to use [] in the table keys, as done above, this is not very nice :)
Better keep it to alphanumeric and underscore, with the constraint that it cannot start with a number. This means to be generic you want a letter (%a), followed by any number of (including none) (* rather than +) of alphanumeric and underscore [%w_]:
function expandVars(tmpl,t)
return string.gsub(tmpl, '%$(%a[%w_]*)', t)
end
local sentence = expandVars(
"$a $b1 and c $d_2 e f ",
{a="(match is 'a')", b1="(match is 'b1')", d_2="(match is 'd_2')", }
)
print(sentence)
This prints
(match is 'a') (match is 'b1') and c (match is 'd_2') e f; non-matchable: $_a $1a b
which shows how the leading underscore and leading digit were not accepted.
Related
I'm currently working on a parser for a simple programming language written in Haskell. I ran into a problem when I tried to allow for binary operators with differing associativities and precedences. Normally this wouldn't be an issue, but since my language allows users to define their own operators, the precedence of operators isn't known by the compiler until the program has already been parsed.
Here are some of the data types I've defined so far:
data Expr
= Var String
| Op String Expr Expr
| ..
data Assoc
= LeftAssoc
| RightAssoc
| NonAssoc
type OpTable =
Map.Map String (Assoc, Int)
At the moment, the compiler parses all operators as if they were right-associative with equal precedence. So if I give it an expression like a + b * c < d the result will be Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d"))).
I'm trying to write a function called fixExpr which takes an OpTable and an Expr and rearranges the Expr based on the associativities and precedences listed in the OpTable. For example:
operators :: OpTable
operators =
Map.fromList
[ ("<", (NonAssoc, 4))
, ("+", (LeftAssoc, 6))
, ("*", (LeftAssoc, 7))
]
expr :: Expr
expr = Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d")))
fixExpr operators expr should evaluate to Op "<" (Op "+" (Var "a") (Op "*" (Var "b") (Var "c"))) (Var "d").
How do I define the fixExpr function? I've tried multiple solutions and none of them have worked.
An expression e may be an atomic term n (e.g. a variable or literal), a parenthesised expression, or an application of an infix operator ○.
e ⩴ n | (e) | e1 ○ e2
We need the parentheses to know whether the user entered a * b + c, which we happen to associate as a * (b + c) and need to reassociate as (a * b) + c, or if they entered a * (b + c) literally, which should not be reassociated. Therefore I’ll make a small change to the data type:
data Expr
= Var String
| Group Expr
| Op String Expr Expr
| …
Then the method is simple:
The rebracketing of an expression ⟦e⟧ applies recursively to all its subexpressions.
⟦n⟧ = n
⟦(e)⟧ = (⟦e⟧)
⟦e1 ○ e2⟧ = ⦅⟦e1⟧ ○ ⟦e2⟧⦆
A single reassociation step ⦅e⦆ removes redundant parentheses on the right, and reassociates nested operator applications leftward in two cases: if the left operator has higher precedence, or if the two operators have equal precedence, and are both left-associative. It leaves nested infix applications alone, that is, associating rightward, in the opposite cases: if the right operator has higher precedence, or the operators have equal precedence and right associativity. If the associativities are mismatched, then the result is undefined.
⦅e ○ n⦆ = e ○ n
⦅e1 ○ (e2)⦆ = ⦅e1 ○ e2⦆
⦅e1 ○ (e2 ● e3)⦆ =
⦅e1 ○ e2⦆ ● e3, if:
a. P(○) > P(●); or
b. P(○) = P(●) and A(○) = A(●) = L
e1 ○ (e2 ● e3), if:
a. P(○) < P(●); or
b. P(○) = P(●) and A(○) = A(●) = R
undefined otherwise
NB.: P(o) and A(o) are respectively the precedence and associativity (L or R) of operator o.
This can be translated fairly literally to Haskell:
fixExpr operators = reassoc
where
-- 1.1
reassoc e#Var{} = e
-- 1.2
reassoc (Group e) = Group (reassoc e)
-- 1.3
reassoc (Op o e1 e2) = reassoc' o (reassoc e1) (reassoc e2)
-- 2.1
reassoc' o e1 e2#Var{} = Op o e1 e2
-- 2.2
reassoc' o e1 (Group e2) = reassoc' o e1 e2
-- 2.3
reassoc' o1 e1 r#(Op o2 e2 e3) = case compare prec1 prec2 of
-- 2.3.1a
GT -> assocLeft
-- 2.3.2a
LT -> assocRight
EQ -> case (assoc1, assoc2) of
-- 2.3.1b
(LeftAssoc, LeftAssoc) -> assocLeft
-- 2.3.2b
(RightAssoc, RightAssoc) -> assocRight
-- 2.3.3
_ -> error $ concat
[ "cannot mix ‘", o1
, "’ ("
, show assoc1
, " "
, show prec1
, ") and ‘"
, o2
, "’ ("
, show assoc2
, " "
, show prec2
, ") in the same infix expression"
]
where
(assoc1, prec1) = opInfo o1
(assoc2, prec2) = opInfo o2
assocLeft = Op o2 (Group (reassoc' o1 e1 e2)) e3
assocRight = Op o1 e1 r
opInfo op = fromMaybe (notFound op) (Map.lookup op operators)
notFound op = error $ concat
[ "no precedence/associativity defined for ‘"
, op
, "’"
]
Note the recursive call in assocLeft: by reassociating the operator applications, we may have revealed another association step, as in a chain of left-associative operator applications like a + b + c + d = (((a + b) + c) + d).
I insert Group constructors in the output for illustration, but they can be removed at this point, since they’re only necessary in the input.
This hasn’t been tested very thoroughly at all, but I think the idea is sound, and should accommodate modifications for more complex situations, even if the code leaves something to be desired.
An alternative that I’ve used is to parse expressions as “flat” sequences of operators applied to terms, and then run a separate parsing pass after name resolution, using e.g. Parsec’s operator precedence parser facility, which would handle these details automatically.
I'm new to the "LPeg" and "re" modules of Lua, currently I want to write a pattern based on following rules:
Match the string that starts with "gv_$/gv$/v$/v_$/x$/xv$/dba_/all_/cdb_", and the prefix "SYS.%s*" or "PUBLIC.%s*" is optional
The string should not follow a alphanumeric, i.e., the pattern would not match "XSYS.DBA_OBJECTS" because it follows "X"
The pattern is case-insensitive
For example, below strings should match the pattern:
,sys.dba_objects, --should return "sys.dba_objects"
SyS.Dba_OBJECTS
cdb_objects
dba_hist_snapshot) --should return "dba_hist_snapshot"
Currently my pattern is below which can only match non-alphanumeric+string in upper case :
p=re.compile[[
pattern <- %W {owner* name}
owner <- 'SYS.'/ 'PUBLIC.'
name <- {prefix %a%a (%w/"_"/"$"/"#")+}
prefix <- "GV_$"/"GV$"/"V_$"/"V$"/"DBA_"/"ALL_"/"CDB_"
]]
print(p:match(",SYS.DBA_OBJECTS"))
My questions are:
How to achieve the case-insensitive matching? There are some topics about the solution but I'm too new to understand
How to exactly return the matched string only, instead of also have to plus %W? Something like "(?=...)" in Java
Highly appreciated if you can provide the pattern or related function.
You can try to tweak this grammar
local re = require're'
local p = re.compile[[
pattern <- ((s? { <name> }) / s / .)* !.
name <- (<owner> s? '.' s?)? <prefix> <ident>
owner <- (S Y S) / (P U B L I C)
prefix <- (G V '_'? '$') / (V '_'? '$') / (D B A '_') / (C D B '_')
ident <- [_$#%w]+
s <- (<comment> / %s)+
comment <- '--' (!%nl .)*
A <- [aA]
B <- [bB]
C <- [cC]
D <- [dD]
G <- [gG]
I <- [iI]
L <- [lL]
P <- [pP]
S <- [sS]
U <- [uU]
V <- [vV]
Y <- [yY]
]]
local m = { p:match[[
,sys.dba_objects, --should return "sys.dba_objects"
SyS.Dba_OBJECTS
cdb_objects
dba_hist_snapshot) --should return "dba_hist_snapshot"
]] }
print(unpack(m))
. . . prints match table m:
sys.dba_objects SyS.Dba_OBJECTS cdb_objects dba_hist_snapshot
Note that case-insensitivity is quite hard to achieve out of the lexer so each letter has to get a separate rule -- you'll need more of these eventually.
This grammar is taking care of the comments in your sample and skips them along with whitespace so matches after "should return" are not present in output.
You can fiddle with prefix and ident rules to specify additional prefixes and allowed characters in object names.
Note: !. means end-of-file. !%nl means "not end-of-line". ! p and & p are constructing non-consuming patterns i.e. current input pointer is not incremented on match (input is only tested).
Note 2: print-ing with unpack is a gross hack.
Note 3: Here is a tracable LPeg re that can be used to debug grammars. Pass true for 3-rd param of re.compile to get execution trace with test/match/skip action on each rule and position visited.
Finally I got an solution but not so graceful, which is to add an additional parameter case_insensitive into re.compile, re.find, re.match and re.gsubfunctions. When the parameter value is true, then invoke case_insensitive_pattern to rewrite the pattern:
...
local fmt="[%s%s]"
local function case_insensitive_pattern(quote,pattern)
-- find an optional '%' (group 1) followed by any character (group 2)
local stack={}
local is_letter=nil
local p = pattern:gsub("(%%?)(.)",
function(percent, letter)
if percent ~= "" or not letter:match("%a") then
-- if the '%' matched, or `letter` is not a letter, return "as is"
if is_letter==false then
stack[#stack]=stack[#stack]..percent .. letter
else
stack[#stack+1]=percent .. letter
is_letter=false
end
else
if is_letter==false then
stack[#stack]=quote..stack[#stack]..quote
is_letter=true
end
-- else, return a case-insensitive character class of the matched letter
stack[#stack+1]=fmt:format(letter:lower(), letter:upper())
end
return ""
end)
if is_letter==false then
stack[#stack]=quote..stack[#stack]..quote
end
if #stack<2 then return stack[1] or (quote..pattern..quote) end
return '('..table.concat(stack,' ')..')'
end
local function compile (p, defs, case_insensitive)
if mm.type(p) == "pattern" then return p end -- already compiled
if case_insensitive==true then
p=p:gsub([[(['"'])([^\n]-)(%1)]],case_insensitive_pattern):gsub("%(%s*%((.-)%)%s*%)","(%1)")
end
local cp = pattern:match(p, 1, defs)
if not cp then error("incorrect pattern", 3) end
return cp
end
...
I try to remove 1 whitespace from this string:
m y r e a l n a m e i s d o n a l d d u c k
Expected result:
my real name is donald duck
My code are:
def solve_cipher(input)
input.split('').map { |c| (c.ord - 3).chr }.join(' ') # <- Here I tried everything
end
puts solve_cipher('p| uhdo qdph lv grqdog gxfn')
# => m y r e a l n a m e i s d o n a l d d u c k
I tried everything to solve my problem, example:
input.....join(' ').squeeze(" ").strip # => m y r e a l n a m e...
or
input.....join.gsub(' ','') # => myrealnameisdonaldduck
or
input.....join(' ').lstrip # => m y r e a l n a m e...
and so on...
Well, you could split the string into words first, then split each word into characters. Using the same method you used in your code, it could look like this.
def solve_cipher(input) input.split(' ').map{ |w| w.split('').map { |c| (c.ord - 3).chr}.join('')}.join(' ') end
When joining the characters in the same word, we put no space between them; when joining the words together we put one space between them.
As stated in the question, you are using Rails, so you can also try squish method:
def solve_cipher( input )
input.split(' ').map(&:squish).join(' ')
end
str = "m y r e a l n a m e i s d o n a l d d u c k"
str.gsub(/\s(?!\s)/,'')
#=> "my real name is donald duck"
The regex matches a whitespace character not followed by another whitespace character and replaces the matched characters with empty strings. (?!\s) is a negative lookahead that matches a whitespace.
If more than two spaces may be present between words, first replace three or more spaces with two spaces, as follows.
str = "m y r e a l n a m e i s d o n a l d d u c k"
str.gsub(/\s{3,}/, " ").gsub(/\s(?!\s)/,'')
#=> "my real name is donald duck"
I know that it is not a fancy way of doing it but you could just try to create a new string and have a function traversal(input) with a counter initiated at 0, that would return this new string.
It would go through your input (which is here your string) and if the counter is 0 and it sees a space it just ignores it, increments a counter and go to the next character of the string.
If the counter is different of 0 and it sees a space it just concatenates it to the new string.
And if the counter is different of 0 and it sees something different of a space, it concatenates it to the new string and counter equals 0 again.
The trick is to use a capture group
"m y r e a l n a m e i s d o n a l d d u c k".gsub(/(\s)(.)/, '\2')
=> "my real name is donald duck
My Question: What is the cleanest way to pretty print an expression without redundant parentheses?
I have the following representation of lambda expressions:
Term ::= Fun(String x, Term t)
| App(Term t1, Term t2)
| Var(String x)
By convention App is left associative, that is a b c is interpreted as (a b) c and function bodies stretch as far to the right as possible, that is, λ x. x y is interpreted as λ x. (x y).
I have a parser that does a good job, but now I want a pretty printer. Here's what I currently have (pseudo scala):
term match {
case Fun(v, t) => "(λ %s.%s)".format(v, prettyPrint(t))
case App(s, t) => "(%s %s)".format(prettyPrint(s), prettyPrint(t))
case Var(v) => v
}
The above printer always puts ( ) around expressions (except for atomic variables). Thus for Fun(x, App(Fun(y, x), y)) it produces
(λ x.((λ y.x) y))
I would like to have
λ x.(λ y.x) y
Here I'll use a simple grammar for infix expressions with the associativity and precedence defined by the following grammar whose operators are listed in ascending order of precedence
E -> E + T | E - T | T left associative
T -> T * F | T / F | F left associative
F -> G ^ F | G right associative
G -> - G | ( E ) | NUM
Given an abstract syntax tree (AST) we convert the AST to a string with only the necessary parenthesis as described in the pseudocode below. We examine relative precedence and associativity as we recursively descend the tree to determine when parenthesis are necessary. Note that all decisions to wrap parentheses around an expression must be made in the parent node.
toParenString(AST) {
if (AST.type == NUM) // simple atomic type (no operator)
return toString(AST)
else if (AST.TYPE == UNARY_MINUS) // prefix unary operator
if (AST.arg.type != NUM AND
precedence(AST.op) > precedence(AST.arg.op))
return "-(" + toParenString(AST.arg) + ")"
else
return "-" + toParenString(AST.arg)
else { // binary operation
var useLeftParen =
AST.leftarg.type != NUM AND
(precedence(AST.op) > precedence(AST.leftarg.op) OR
(precedence(AST.op) == precedence(AST.leftarg.op) AND
isRightAssociative(AST.op)))
var useRightParen =
AST.rightarg.type != NUM AND
(precedence(AST.op) > precedence(AST.rightarg.op) OR
(precedence(AST.op) == precedence(AST.rightarg.op) AND
isLeftAssociative(AST.op)))
var leftString;
if (useLeftParen) {
leftString = "(" + toParenString(AST.leftarg) + ")"
else
leftString = toParenString(AST.leftarg)
var rightString;
if (useRightParen) {
rightString = "(" + toParenString(AST.rightarg) + ")"
else
rightString = toParenString(AST.rightarg)
return leftString + AST.op + rightString;
}
}
Isn't it that you just have to check the types of the arguments of App?
I'm not sure how to write this in scala..
term match {
case Fun(v: String, t: Term) => "λ %s.%s".format(v, prettyPrint(t))
case App(s: Fun, t: App) => "(%s) (%s)".format(prettyPrint(s), prettyPrint(t))
case App(s: Term, t: App) => "%s (%s)".format(prettyPrint(s), prettyPrint(t))
case App(s: Fun, t: Term) => "(%s) %s".format(prettyPrint(s), prettyPrint(t))
case App(s: Term, t: Term) => "%s %s".format(prettyPrint(s), prettyPrint(t))
case Var(v: String) => v
}
exchangeSymbols "a§ b$ c. 1. 2. 3/" = filter (Char.isAlphaNum) (replaceStr str " " "_")
The code above is supposed to first replace all "spaces" with "_", then filter the String according to Char.isAlphaNum. Unfortunately the Char.isAlphaNum part absorbs the already exchanged "_", which isn't my intention and i want to hold the "_". So, i thought it would be nice just add an exception to the filter which goes like:
exchangeSymbols "a§ b$ c. 1. 2. 3/" = filter (Char.isAlphaNum && /='_') (replaceStr str " " "_")
You see the added && not /='_'. It produces a parse error, obviously it is not so easily possible to concatenate filter options, but is there a smart workaround ? I thought about wrapping the filter function, like a 1000 times or so with each recursion adding a new filter test (/='!'),(/='§') and so on without adding the (/='_'). However it doesn't seem to be a handy solution.
Writing
... filter (Char.isAlphaNum && /='_') ...
is actually a type error (the reason why it yields a parse error is maybe that you used /= as prefix - but its an infix operator). You cannot combine functions with (&&) since its an operator on booleans (not on functions).
Acutally this code snipped should read:
... filter (\c -> Char.isAlphaNum c && c /= '_') ...
Replace your filter with a list comprehension.
[x | x <- replaceStr str " " "_", x /= '_', Char.isAplhaNum x]
Naturally, you probably want to have multiple exceptions. So define a helper function:
notIn :: (Eq a) => [a] -> a -> Bool
notIn [] _ = True
notIn x:xs y = if x == y
then False
else notIn xs
EDIT: Apparently you can use notElem :: (Eq a) => a -> [a] -> Bool instead. Leaving above code for educational purposes.
And use that in your list comprehension:
[x | x <- replaceStr str " " "_", notElem x "chars to reject", Char.isAlphaNum x]
Untested, as haskell isn't installed on this machine. Bonus points if you are doing a map after the filter, since you can then put that in the list comprehension.
Edit 2: Try this instead, I followed in your footsteps instead of thinking it out myself:
[x | x <- replaceStr str " " "_", Char.isAlphaNum x || x == ' ']
[x | x <- replaceStr str " " "_", Char.isAlphaNum x || x `elem` "chars to accept"]
At this point the list comprehension doesn't help much. The only reason I did change it was because I you requested an &&, for which using a list comprehension is great.
Since it seems that you don't quite understand the principle of the list comprehension, its basically applying a bunch of filters and then a map with more than one source, for example:
[(x, y, x + y) | x <- [1, 2, 3, 4, 5], y <- [2, 4], x > y]