How do I parse a string at compile time in Nimrod? - parsing

Going through the second part of Nimrod's tutorial I've reached the part were macros are explained. The documentation says they run at compile time, so I thought I could do some parsing of strings to create myself a domain specific language. However, there are no examples of how to do this, the debug macro example doesn't display how one deals with a string parameter.
I want to convert code like:
instantiate("""
height,f,132.4
weight,f,75.0
age,i,25
""")
…into something which by hand I would write like:
var height: float = 132.4
var weight: float = 75.0
var age: int = 25
Obviously this example is not very useful, but I want to look at something simple (multiline/comma splitting, then transformation) which could help me implement something more complex.
My issue here is how does the macro obtain the input string, parse it (at compile time!), and what kind of code can run at compile time (is it just a subset of a languaje? can I use macros/code from other imported modules)?
EDIT: Based on the answer here's a possible code solution to the question:
import macros, strutils
# Helper proc, macro inline lambdas don't seem to compile.
proc cleaner(x: var string) = x = x.strip()
macro declare(s: string): stmt =
# First split all the input into separate lines.
var
rawLines = split(s.strVal, {char(0x0A), char(0x0D)})
buf = ""
for rawLine in rawLines:
# Split the input line into three columns, stripped, and parse.
var chunks = split(rawLine, ',')
map(chunks, cleaner)
if chunks.len != 3:
error("Declare macro syntax is 3 comma separated values:\n" &
"Got: '" & rawLine & "'")
# Add the statement, preppending a block if the buffer is empty.
if buf.len < 1: buf = "var\n"
buf &= " " & chunks[0] & ": "
# Parse the input type, which is an abbreviation.
case chunks[1]
of "i": buf &= "int = "
of "f": buf &= "float = "
else: error("Unexpected type '" & chunks[1] & "'")
buf &= chunks[2] & "\n"
# Finally, check if we did add any variable!
if buf.len > 0:
result = parseStmt(buf)
else:
error("Didn't find any input values!")
declare("""
x, i, 314
y, f, 3.14
""")
echo x
echo y

Macros can, by and large, utilize all pure Nimrod code that a procedure in the same place could see, too. E.g., you can import strutils or peg to parse your string, then construct output from that. Example:
import macros, strutils
macro declare(s: string): stmt =
var parts = split(s.strVal, {' ', ','})
if len(parts) != 3:
error("declare macro requires three parts")
result = parseStmt("var $1: $2 = $3" % parts)
declare("x, int, 314")
echo x
"Calling" a macro will basically evaluate it at compile time as though it were a procedure (with the caveat that the macro arguments will actually be ASTs, hence the need to use s.strVal above instead of s), then insert the AST that it returns at the position of the macro call.
The macro code is evaluated by the compiler's internal virtual machine.

Related

How to iterate over a compile-time seq in a manner that unrolls the loop?

I have a sequence of values that I know at compile-time, for example: const x: seq[string] = #["s1", "s2", "s3"]
I want to loop over that seq in a manner that keeps the variable a static string instead of a string as I intend to use these strings with macros later.
I can iterate on objects in such a manner using the fieldPairs iterator, but how can I do the same with just a seq?
A normal loop such as
for s in x:
echo s is static string
does not work, as s will be a string, which is not what I need.
The folks over at the nim forum were very helpful (here the thread).
The solution appears to be writing your own macro to do this. 2 solutions I managed to make work for me were from the users mratsim and a specialized version from hlaaftana
Hlaaftana's version:
This one unrolls the loop over the various values in the sequence. By that I mean, that the "iterating variable s" changes its value and is always the value of one of the entries of that compile-time seq x (or in this example a). In that way it functions basically like a normal for-in loop.
import macros
macro unrollSeq(x: static seq[string], name, body: untyped) =
result = newStmtList()
for a in x:
result.add(newBlockStmt(newStmtList(
newConstStmt(name, newLit(a)),
copy body
)))
const a = #["la", "le", "li", "lo", "lu"]
unrollSeq(a, s):
echo s is static
echo s
mratsim's version:
This one doesn't unroll a loop over the values, but over a range of indices.
You basically tell the staticFor macro over what range of values you want an unrolled for loop and it generates that for you. You can access the individual entries in the seq then with that index.
import std/macros
proc replaceNodes(ast: NimNode, what: NimNode, by: NimNode): NimNode =
# Replace "what" ident node by "by"
proc inspect(node: NimNode): NimNode =
case node.kind:
of {nnkIdent, nnkSym}:
if node.eqIdent(what):
return by
return node
of nnkEmpty:
return node
of nnkLiterals:
return node
else:
var rTree = node.kind.newTree()
for child in node:
rTree.add inspect(child)
return rTree
result = inspect(ast)
macro staticFor*(idx: untyped{nkIdent}, start, stopEx: static int, body: untyped): untyped =
result = newStmtList()
for i in start .. stopEx: # Slight modification here to make indexing behave more in line with the rest of nim-lang
result.add nnkBlockStmt.newTree(
ident("unrolledIter_" & $idx & $i),
body.replaceNodes(idx, newLit i)
)
staticFor(index, x.low, x.high):
echo index
echo x[index] is static string
Elegantbeefs version
Similar to Hlaaftana's version this unrolls the loop itself and provides you a value, not an index.
import std/[macros, typetraits]
proc replaceAll(body, name, wth: NimNode) =
for i, x in body:
if x.kind == nnkIdent and name.eqIdent x:
body[i] = wth
else:
x.replaceAll(name, wth)
template unrolledFor*(nameP, toUnroll, bodyP: untyped): untyped =
mixin
getType,
newTree,
NimNodeKind,
`[]`,
add,
newIdentDefs,
newEmptyNode,
newStmtList,
newLit,
replaceAll,
copyNimTree
macro myInnerMacro(name, body: untyped) {.gensym.} =
let typ = getType(typeof(toUnroll))
result = nnkBlockStmt.newTree(newEmptyNode(), newStmtList())
result[^1].add nnkVarSection.newTree(newIdentDefs(name, typ[^1]))
for x in toUnroll:
let myBody = body.copyNimTree()
myBody.replaceAll(name, newLit(x))
result[^1].add myBody
myInnerMacro(nameP, bodyP)
const x = #["la", "le", "Li"]
unrolledFor(value, x):
echo value is static
echo value
All of them are valid approaches.

Implement heredocs with trim indent using PEG.js

I working on a language similar to ruby called gaiman and I'm using PEG.js to generate the parser.
Do you know if there is a way to implement heredocs with proper indentation?
xxx = <<<END
hello
world
END
the output should be:
"hello
world"
I need this because this code doesn't look very nice:
def foo(arg) {
if arg == "here" then
return <<<END
xxx
xxx
END
end
end
this is a function where the user wants to return:
"xxx
xxx"
I would prefer the code to look like this:
def foo(arg) {
if arg == "here" then
return <<<END
xxx
xxx
END
end
end
If I trim all the lines user will not be able to use a string with leading spaces when he wants. Does anyone know if PEG.js allows this?
I don't have any code yet for heredocs, just want to be sure if something that I want is possible.
EDIT:
So I've tried to implement heredocs and the problem is that PEG doesn't allow back-references.
heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
return text.join('');
}
It says that the marker is not defined. As for trimming I think I can use location() function
I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.
For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)
Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
I don't know if peg.js corresponds with those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication as to how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, and otherwise find a different parser generator which will work for you in this use case.
Here is the implementation of heredocs in Peggy successor to PEG.js that is not maintained anymore. This code was based on the GitHub issue.
heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
&{ return begin === end; }
/ '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`\\s{${min}}`);
return text.map(line => {
return line[0].replace(re, '');
}).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+
_ "whitespace"
= [ \t\n\r]* { return []; }
EDIT: above didn't work with another piece of code after heredoc, here is better grammar:
{ let heredoc_begin = null; }
heredoc = "<<<" beginMarker "\n" text:content endMarker {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`^\\s{${min}}`, 'mg');
return {
type: 'Literal',
value: text.replace(re, '')
};
}
__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*

Lua, Modify print function

I am writing a generic Log() function in lua which utilizes lua print function:
Log (variable, 'String: %s ', str, 'Word: %d', w)
Currently I'm using below approach:
print(string.format (variable, 'String: %s ', str, 'Word: %d', w))
I tried something like:
Log = function(...) begin
return print(string.format(...))
end
But it doesn't work, Is this correct approach? Or Is there any better more generic way to get this done?
If you just want to print a sequence of values, you can do that with print:
print(variable, 'String: %s ', str, 'Word: %d', w)
What you seem to want is something more complicated. Your algorithm seems to be:
For each argument:
If the argument is not a string, then convert it to a string and print it.
If the argument is a string, figure out how many % patterns it has (let us call this number k). Pass string.format the current argument string and the following k parameters, printing the resulting string. Advance k parameters.
That's a much more complicated algorithm than can be done in a one-line system.
Using Lua 5.3, here's what such a function would look like (note: barely tested code):
function Log(...)
local values = {}
local params = table.pack(...)
local curr_ix = 1
while (curr_ix <= params.n) do
local value = params[curr_ix]
if(type(value) == "string") then
--Count the number of `%` characters, *except* for
--sequential `%%`.
local num_formats = 0
for _ in value:gmatch("%%[^%%]") do
num_formats = num_formats + 1
end
value = string.format(table.unpack(params, curr_ix, num_formats + curr_ix))
curr_ix = curr_ix + num_formats
end
values[#values + 1] = value
curr_ix = curr_ix + 1
end
print(table.unpack(values))
end
I don't think your current approach works, because the first argument of string.format expects the format specifier, not the rest of the arguments.
Anyway, this is the way to combine formatting and printing together:
Log = function(...)
return print(string.format(...))
end
And call it like this:
Log("String: %s Number: %d", 'hello' , 42)
Also, it might be better to make the format specifier argument more explicit, and use io.write instead of print to get more control over printing:
function Log(fmt, ...)
return io.write(string.format(fmt, ...))
end

How does string interpolation / string templates work?

#lf_araujo asked in this question:
var dic = new dict of string, string
dic["z"] = "23"
dic["abc"] = "42"
dic["pi"] = "3.141"
for k in sorted_string_collection (dic.keys)
print (#"$k: $(dic[k])")
What is the function of # in print(# ... ) and lines_add(# ...)?
As this is applicable to both Genie and Vala, I thought it would be better suited as a stand-alone question.
The conceptual question is:
How does string interpolation work in Vala and Genie?
There are two options for string interpolation in Vala and Genie:
printf-style functions:
var name = "Jens Mühlenhoff";
var s = string.printf ("My name is %s, 2 + 2 is %d", name, 2 + 2);
This works using varargs, you have to pass multiple arguments with the correct types to the varargs function (in this case string.printf).
string templates:
var name = "Jens Mühlenhoff";
var s = #"My name is $name, 2 + 2 is $(2 + 2)";
This works using "compiler magic".
A template string starts with #" (rather then " which starts a normal string).
Expressions in the template string start with $ and are enclosed with (). The brackets are unneccesary when the expression doesn't contain white space like $name in the above example.
Expressions are evaluated before they are put into the string that results from the string template. For expressions that aren't of type string the compiler tries to call .to_string (), so you don't have to explicitly call it. In the $(2 + 2) example the expression 2 + 2 is evaluated to 4 and then 4.to_string () is called with will result in "4" which can then be put into the string template.
PS: I'm using Vala syntax here, just remove the ;s to convert to Genie.

Lua String Split

Hi I've got this function in JavaScript:
function blur(data) {
var trimdata = trim(data);
var dataSplit = trimdata.split(" ");
var lastWord = dataSplit.pop();
var toBlur = dataSplit.join(" ");
}
What this does is it take's a string such as "Hello my name is bob" and will return
toBlur = "Hello my name is" and lastWord = "bob"
Is there a way i can re-write this in Lua?
You could use Lua's pattern matching facilities:
function blur(data) do
return string.match(data, "^(.*)[ ][^ ]*$")
end
How does the pattern work?
^ # start matching at the beginning of the string
( # open a capturing group ... what is matched inside will be returned
.* # as many arbitrary characters as possible
) # end of capturing group
[ ] # a single literal space (you could omit the square brackets, but I think
# they increase readability
[^ ] # match anything BUT literal spaces... as many as possible
$ # marks the end of the input string
So [ ][^ ]*$ has to match the last word and the preceding space. Therefore, (.*) will return everything in front of it.
For a more direct translation of your JavaScript, first note that there is no split function in Lua. There is table.concat though, which works like join. Since you have to do the splitting manually, you'll probably use a pattern again:
function blur(data) do
local words = {}
for m in string.gmatch("[^ ]+") do
words[#words+1] = m
end
words[#words] = nil -- pops the last word
return table.concat(words, " ")
end
gmatch does not give you a table right away, but an iterator over all matches instead. So you add them to your own temporary table, and call concat on that. words[#words+1] = ... is a Lua idiom to append an element to the end of an array.

Resources