Add line break to generated CST node - rascal

I am modifying a CST tree in order to add information in it like this
(EvoQuery) `<Status s> <QlQuery q>`;
But I'd like to have a line break between the Status and the QlQuery. When I try this:
(EvoQuery) `<Status s> \n <QlQuery q>`;
Rascal reports a syntax error. What is the proper way to format a CST node?

The way to introduce newlines in concrete syntax is by writing it literally, just like in string template syntax:
(EvoQuery) `<Status s>
' <QlQuery q>`;
Compare to the string template syntax:
str x = "<s>
' <q>";

Related

Match Symbol specific number of times

When defining a syntax, it is possible to match 1 or more times (+) or 0 or more times (*), similar to how it is done in regex. However, I have not found in the Rascal documentation whether it is also possible to match a Symbol a specific number of times. In regex (and Rascal patterns) this is done with an integer between two curly brackets, but this doesn't seem to work for syntax definitions. Ideally, I'd want something like:
lexical Line = [0-9.]+;
syntax Sym = sym: {Line Newline}{5};
Which would only try to match the first 5 lines of the text below:
..0..
11.11
44.44
1.11.1
33333
55555
No, this meta-syntax does not exist in Rascal. We did not add it.
You could write an over-estimation like this and have a post-parse filter reject more than 5 items:
syntax Sym = fiveLines: (Line NewLine)+ lines;
visit (myParseTree) {
case x:(Sym) `<(Line NewLine)+ lines>` :
if (length(lines) != 5) throw ParseError(x.src);
}
Or unfold the loop like so:
syntax Sym
= Line NewLine
Line NewLine
Line NewLine
Line NewLine
Line NewLine
;
Repetition with an integer parameter sounds like a good feature request for us to consider, if you need it badly. We only have to consider what it means for Rascal's type system; for the parser generator it's a simple rule to add.

Parsing simple markup language with Haskell

I'm trying to implement a very simple markup language. I have an intermediate representation that looks like:
data Token = Str Text
           | Explode Text
type Rep = [Token]
So, the idea is to turn an arbitrary text of the form:
The quick brown %%fox%% %%jumps%% over the %%lazy%% dog.
into:
[Str "The quick brown", Explode "fox", Explode "jumps", Str "over the", Explode "lazy", Str "dog"]
for further processing. Also, it is important that we treat:
%%fox%% %%jumps%%
differently than
%%fox jumps%%
The latter should yield a single (Explode "fox jumps").
I tried to implement this using attoparsec, but I don't think I have the tools I need. But I'm not so good with parsing theory (I studied math, not CS). What kind of grammar is this? What kind of parser combinator library should I use? I considered using Parsec with a stateful monad transformer stack to keep track of the context. Does that sound sensible?
You can take the cheap and easy way, without a proper parser. The important thing to recognise is that this grammar is actually fairly simple – it has no recursion or such. It is just a flat listing of Strs and Explodes.
The easy way
So we can start by breaking the string down into a list containing the text and the separators as separate values. We need a data type to separate the separators (%%) from actual text (everything else.)
data ParserTokens = Sep | T Text
Breaking it down
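Then we need to break the string into its constituents. (This needs intersperse from Data.List, Data.Text imported qualified as Text, and the OverloadedStrings extension for the "%%" literal.)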
tokenise = intersperse Sep . map T . Text.splitOn "%%"
This will first split the string on %%, so in your example it'll become
["The quick brown ","fox"," ","jumps"," over the ","lazy"," dog."]
then we map T over it, to turn it from a [Text] to a [ParserTokens]. Finally, we intersperse Sep over it, to reintroduce the %% separators but in a shape that's easier to deal with. The result is, in your example,
[T "The quick brown ",Sep,T "fox",Sep,T " ",Sep,T "jumps",Sep,T " over the ",Sep,T "lazy",Sep,T " dog."]
Building it up
With this done, all that remains is parsing this thing into the shape you want it. Parsing this amounts to finding the 1-2-3 punch of Sep–T "something"–Sep and replacing it with Explode "something". We write a recursive function to do this.
construct [] = []
construct (T s : rest) = Str s : construct rest
construct (Sep : T s : Sep : rest) = Explode s : construct rest
construct _ = error "Mismatched '%%'!"
This converts T s to Str s and the combination of separators and a T s into an Explode s. If the pattern matching fails, it's because there was a stray separator somewhere, so I've just set it to crash the program. You might want better error handling there, such as wrapping the result in Either String or something similar.
With this done, we can create the function
parseTemplate = construct . tokenise
and in the end, if we run your example through parseTemplate, we get the expected result
[Str "The quick brown ",Explode "fox",Str " ",Explode "jumps",Str " over the ",Explode "lazy",Str " dog."]
For such a simple parser even Attoparsec seems to be overkill:
-- Works on String (not Text), so Str and Explode would need String fields here.
parse = map (\w -> case w of
                     '%':'%':expl -> Explode $ init $ init expl
                     str          -> Str str) . words
Of course, this code needs some sanity checks for the Explode case.
This doesn't handle whitespace the way you specified, but it should get you on the right track.
parseMU = zipWith ($) (cycle [Str,Explode]) . splitps where
  splitps :: String -> [String]
  splitps [] = [[]]
  splitps ('%':'%':r) = [] : splitps r
  splitps (c:r) = let
      (a:r') = splitps r
    in ((c:a):r')

Lua string find - How to handle strings with a hyphen?

I have two strings - each string has many lines like the following:
string1 = " DEFAULT-VLAN | Manual 10.1.1.3 255.255.255.0 "
string2 = " 1 DEFAULT-VLAN | Port-based No No"
The first string I split into the following strings: "DEFAULT-VLAN", "|", "Manual"...
Then I want to look up the ID ("1") in string2 for the vlanName ("DEFAULT-VLAN") from string1.
I use this code to find the correct substring:
vpos1, vpos2 = vlan:find("%d-%s-" .. vlanName .. "%s-|")
But vpos1 and vpos2 are nil. When the hyphen ("-") is deleted from the vlanName, it works.
Shouldn't Lua take care to escape the special characters in such strings? The string is handed over from my C++ application to Lua and there may be lots of special characters.
Is there an easy way to solve this?
Thanks!
Lua is not magic. All the expression "%d-%s-" .. vlanName .. "%s-|" does is concatenate some strings, producing a final string. It has no idea what that string is intended to be used for. Only string.find knows that, and string.find has no effect on how the argument it is given was built.
So yes, vlanName will be interpreted as a Lua pattern. And if you want to use special characters, you will need to escape them. I would suggest using string.gsub for that. It'd be something like this:
vlanName:gsub("[%-...]", "%%%0")
Where ... are any other characters you want to escape.

F# regular expression string pattern change meaning

I have another question for regular expression in F#:
let tagName = "div"
let ptnTagNotClose = "<" + tagName + "(>|\s+[^>]*>)[^<]"
I want to find the matches for unclosed tags in an HTML file. The pattern string works in VB.NET.
But for F#, when I debug the above code, I can see the value for ptnTagNotClose:
ptnTagNotClose "<div(>|\\s+[^>]*>)[^<]"
F# automatically changes "\s+" to "\\s+", but in a regular expression "\s+" and "\\s+" are different, so the results are different too.
Please let me know how to stop F# from automatically changing the string pattern.
A verbatim string literal could be one solution, but since the tagName can change (e.g. let tagName = "br"), how can I apply a verbatim string literal in this case?
Thanks!
John
I don't think that the debug output means what you think it does: the debugger displays the string in escaped form, so a single backslash is shown as "\\". Using a verbatim string (like "<" + tagName + @"(>|\s+[^>]*>)[^<]") will give you the exact same result, because \s isn't a valid escape sequence, so F# interprets the backslash as a literal backslash rather than an escape character.

Removing lines that begin with > in a rails string

I'm trying to remove any lines that begin with the character '>' in a long string (i.e. replies to an email).
In PHP I'd iterate over each line with an if statement, in linux I'd try and use sed or awk.
What's the most elegant rails approach?
You can try this:
your_string.gsub(/^\>.+\n/,'')
Your question is implying that the input is one string, containing multiple lines.
Do you want the output to be just one string with multiple lines as well? I'm assuming yes.
Either use String and Array operations:
str.lines.reject{|x| x =~ /^>/}.join # this will return a new string, without those ">" lines
or using Regular Expressions:
str.gsub(/^>.+\n*/, '')
Better Solution:
A non-greedy match also handles lines that contain only ">":
str.gsub(/^>.*?$\n*/m, '') # by using gsub!() you can modify the string in place
^> matches your ">" character at the start of a line
.*?$ matches any characters after the start character until the end of the line (non-greedy)
\n* matches the newline that ends the line, plus any blank lines directly after it (use \n? if you want to keep those)
the "m" at the end of the regular expression makes "." match newline characters as well; in Ruby, ^ and $ already match at the start and end of every line, so the pattern also works without it. With "m" present, the non-greedy .*? is what keeps the match from running past the end of the line.
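To see how the m flag and greediness interact, here is a small sketch on a made-up three-line string (the sample text and variable names are only for illustration). In Ruby, m lets "." match newlines, so a greedy .* can swallow everything up to the end of the string, while the non-greedy form or a pattern without m stops at the end of each line:

```ruby
# Made-up sample: two quoted lines around one line we want to keep.
text = "> a\nkeep\n> b\n"

# With m, "." also matches "\n", so the greedy .* runs to the end of the string.
greedy_with_m = text.gsub(/^>.*$\n*/m, "")   # => ""

# Non-greedy .*? stops at the first end-of-line, even with m.
lazy_with_m = text.gsub(/^>.*?$\n*/m, "")    # => "keep\n"

# Without m, "." never crosses a newline, so greedy matching is safe.
without_m = text.gsub(/^>.*$\n?/, "")        # => "keep\n"

p greedy_with_m, lazy_with_m, without_m
```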
It should work as you expect:
your_string.lines.to_a.reject{|line| line[0] == '>'}.join
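For comparison, a minimal sketch of the line-based and regex-based approaches side by side. The sample email and the two helper names are hypothetical, and the sample deliberately includes a bare ">" line and a quoted last line without a trailing newline, since those are the cases where pattern details matter:

```ruby
# Hypothetical sample reply: quoted lines include a bare ">" and a final
# quoted line that has no trailing newline.
email = "Hi Bob,\n> old line one\n>\nThanks!\n> trailing quote"

# Line-based: drop every line whose first character is ">".
def strip_quoted_lines(text)
  text.lines.reject { |line| line.start_with?(">") }.join
end

# Regex-based: ^ anchors at each line start in Ruby; .* allows an empty rest
# of line, and \n? removes the line's own newline when there is one.
def strip_quoted_regex(text)
  text.gsub(/^>.*\n?/, "")
end

puts strip_quoted_lines(email)
puts strip_quoted_regex(email)
```

Both helpers return a new string and leave the input untouched; which one reads better is largely a matter of taste.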
